# Set up

In [1]:
import os, os.path
from datetime import date
import pandas as pd
from pandas import json_normalize
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib.dates as mdates

from tqdm import tqdm, tqdm_notebook
import requests
import json
import re

from pandas.plotting import register_matplotlib_converters
import seaborn as sns
sns.set(
    font_scale=1.5,
    style="darkgrid",
    rc={'figure.figsize':(20,7)})

from IPython.display import display, HTML
display(HTML("<style>.container { width:80% !important; }</style>"))
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

pd.set_option('display.float_format', lambda x: '%.3f' % x)
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_colwidth', 300)
pd.set_option('display.width', 1000)
pd.set_option('display.precision',3)

# Fetching data

https://liquipedia.net/counterstrike/Main_Page hosts a lot of information about Counter-Strike: teams, tournaments, players, hosts.

We'll gather it with the help of its API. You can see the help page here: https://liquipedia.net/counterstrike/api.php

Liquipedia's terms of use are here: https://liquipedia.net/api-terms-of-use

## Information required

We will focus on team performance over the years: how much the teams earned, how many tournaments they won, how many sponsors they have and where they are from. In addition, we will take a look at their social media presence.

These are therefore the arguments we will pass in our request:

In [2]:
attributes  = ["?Has name",
"?Is active","?Was created",
"?Was disbanded", "?Has earnings",
"?Has location","?Has region",
"?Has sponsor",
"?Has earnings in 2012",
"?Has earnings in 2013",
"?Has earnings in 2014",
"?Has earnings in 2015",
"?Has earnings in 2016",
"?Has earnings in 2017",
"?Has earnings in 2018",
"?Has earnings in 2019",
"?Has site",
"?Has twitter",
"?Has twitch stream",
"?Has instagram",
"?Has facebook",
"?Has youtube channel",
"?Has faceit profile",
"?Has vk",
"?Has esea id",
"?Has steam profile"]

Since these arguments must be separated by a "|", we prepend it to each of them with a list comprehension, then join all the list elements into a single string ready to be passed in our request.

In [3]:
attributes = ["|" + attribute for attribute in attributes]
attributes = ''.join(attributes)

In [4]:
attributes

'|?Has name|?Is active|?Was created|?Was disbanded|?Has earnings|?Has location|?Has region|?Has sponsor|?Has earnings in 2012|?Has earnings in 2013|?Has earnings in 2014|?Has earnings in 2015|?Has earnings in 2016|?Has earnings in 2017|?Has earnings in 2018|?Has earnings in 2019|?Has site|?Has twitter|?Has twitch stream|?Has instagram|?Has facebook|?Has youtube channel|?Has faceit profile|?Has vk|?Has esea id|?Has steam profile'
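As a side note, the yearly earnings attributes follow a regular pattern, so they could be generated rather than typed out. The following is only a minimal sketch: the names `base_attributes`, `yearly_attributes` and `query_attributes` are illustrative and not part of the notebook, and only a subset of the attributes is listed.

```python
# Illustrative subset of the non-yearly attributes (hypothetical variable names).
base_attributes = ["?Has name", "?Is active", "?Has earnings"]

# One "?Has earnings in <year>" attribute per year, from 2012 to 2019 inclusive.
yearly_attributes = [f"?Has earnings in {year}" for year in range(2012, 2020)]

# Prefix every attribute with "|" and concatenate, as in the cell above.
query_attributes = "".join("|" + attribute for attribute in base_attributes + yearly_attributes)
print(query_attributes)
```

Extending the study to another year would then only require changing the bounds of `range`.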

## Requesting and managing data

We will fetch the Liquipedia data we need with the ".get()" function of the requests module, then load and read the results with the json module.

Our request URL is composed of four parts.

The first one: "https://liquipedia.net/counterstrike/api.php?action=ask&query=",
which is the address of the API and the action it will carry out, here "ask" ("API module to query Semantic MediaWiki using the ask language"), followed by its "query" parameter, which holds the ask query made of the next parts.

The second one: "[[category:CSGO%20Teams]]", which is the category we want to examine, in our case the CS:GO teams.

The third one: our attributes variable, which we created earlier.

The fourth one: "|limit=1500&format=json", which specifies the maximum number of results and the format we want, here JSON.

In [5]:
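api_request = requests.get("https://liquipedia.net/counterstrike/api.php?action=ask&query=[[category:CSGO%20Teams]]" + attributes + "|limit=1500&format=json")
api_request.raise_for_status()  # stop early if the server returned an HTTP error

# parse the JSON body of the response
api = json.loads(api_request.content)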
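To make the four parts of the URL easier to see, they can be named and assembled one by one. This is a sketch that sends no request: the variable names are illustrative, and `printouts` is a shortened stand-in for the full attributes string.

```python
# The four parts of the request URL described above (illustrative variable names).
endpoint = "https://liquipedia.net/counterstrike/api.php?action=ask&query="
category = "[[category:CSGO%20Teams]]"
printouts = "|?Has name|?Is active"  # shortened stand-in for the attributes string
options = "|limit=1500&format=json"

# Concatenate the parts in order; no request is sent here.
url = endpoint + category + printouts + options
print(url)
```

Printing the assembled URL before sending it is a cheap way to check that the separators ended up where expected.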

Before we can create a dataframe from the JSON response, we need to keep only its ["query"]["results"] part and, within it, the ['printouts'] entry of each team.

We will do so by looping through the results to build a list containing one dictionary per team.

In [6]:
teams = []

# each key of "results" is one team; keep only its printouts
for name_team, result in api["query"]["results"].items():
    teams.append(result["printouts"])
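The extraction step can be tried offline on a small hand-written stand-in for the response. The nesting below is inferred from the keys used in the loop ("query", "results", "printouts"); the dictionary is an assumption made for illustration and contains only two shortened entries, not real API output.

```python
# Hand-written stand-in for the parsed response; only the keys used by the loop are included.
sample_api = {
    "query": {
        "results": {
            "/10/": {"printouts": {"Has name": ["/10/"], "Has location": ["Germany"]}},
            "1337": {"printouts": {"Has name": ["1337"], "Has location": ["Sweden"]}},
        }
    }
}

# Keep one "printouts" dictionary per team, in the order the results are listed.
sample_teams = [entry["printouts"] for entry in sample_api["query"]["results"].values()]
print(sample_teams)
```

Each element of the resulting list maps attribute names to lists of values, which is why the cells of the dataframe built from it contain lists.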

In [7]:
# showing the first two teams
teams[:2]

[{'Has name': ['/10/'],
  'Is active': ['f'],
  'Was created': ['2013'],
  'Was disbanded': [{'timestamp': '1398556800', 'raw': '1/2014/4/27'}],
  'Has earnings': [3900],
  'Has location': ['Germany'],
  'Has region': ['Europe'],
  'Has sponsor': [],
  'Has earnings in 2012': [0],
  'Has earnings in 2013': [3900],
  'Has earnings in 2014': [0],
  'Has earnings in 2015': [0],
  'Has earnings in 2016': [0],
  'Has earnings in 2017': [0],
  'Has earnings in 2018': [0],
  'Has earnings in 2019': [0],
  'Has site': [],
  'Has twitter': [],
  'Has twitch stream': [],
  'Has instagram': [],
  'Has facebook': ['https://facebook.com/team10cs'],
  'Has youtube channel': [],
  'Has faceit profile': [],
  'Has vk': [],
  'Has esea id': [],
  'Has steam profile': []},
 {'Has name': ['100 Thieves'],
  'Is active': ['t'],
  'Was created': ['2017-11-20 <br>[[File:Csgo icon.png|link=Counter-Strike: Global Offensive]]: 2017-12-12'],
  'Was disbanded': [],
  'Has earnings': [67000],
  'Has location': ['U

## Creating a dataframe

Now we can finally create a dataframe.

In [8]:
CSGO_teams = pd.DataFrame(teams)
CSGO_teams.head()

Unnamed: 0,Has earnings,Has earnings in 2012,Has earnings in 2013,Has earnings in 2014,Has earnings in 2015,Has earnings in 2016,Has earnings in 2017,Has earnings in 2018,Has earnings in 2019,Has esea id,Has facebook,Has faceit profile,Has instagram,Has location,Has name,Has region,Has site,Has sponsor,Has steam profile,Has twitch stream,Has twitter,Has vk,Has youtube channel,Is active,Was created,Was disbanded
0,[3900],[0],[3900],[0],[0],[0],[0],[0],[0],[],[https://facebook.com/team10cs],[],[],[Germany],[/10/],[Europe],[],[],[],[],[],[],[],[f],[2013],"[{'timestamp': '1398556800', 'raw': '1/2014/4/27'}]"
1,[67000],[0],[0],[0],[0],[0],[0],[0],[67000],[],[https://facebook.com/100Thieves],[],[https://www.instagram.com/100thieves],[United States],[100 Thieves],[North America],[https://www.100thieves.com/],[[https://www.rocketmortgage.com/ Rocket mortgage]<br />[https://www.redbull.com/us-en/ Red Bull]<br />[https://stockx.com/ stockX]<br />[https://www.totinos.com/ Totinos]<br />[http://cash.app/ cashapp]],[],[],[https://twitter.com/100Thieves],[],[https://www.youtube.com/100thieves],[t],[2017-11-20 <br>[[File:Csgo icon.png|link=Counter-Strike: Global Offensive]]: 2017-12-12],[]
2,[6650],[0],[0],[150],[6500],[0],[0],[0],[0],[],[],[],[],[Sweden],[1337],[Europe],[],[],[],[],[],[],[],[f],[],"[{'timestamp': '1436918400', 'raw': '1/2015/7/15'}]"
3,[6497],[0],[0],[0],[0],[3255],[3242],[0],[0],[],[https://facebook.com/2killgaming],[],[https://www.instagram.com/2killgaming],[Brazil],[2Kill Gaming],[South America],[https://www.2killgaming.com],[[http://www.kappabrasil.com.br Kappa]<br />[http://arsenalxfire.com.br Arsenal XFire]<br />[http://www.asrock.com/ ASRock]<br />[http://www.seagate.com/br/pt/ Seagate]<br />],[],[],[https://twitter.com/2KillGaming],[],[https://www.youtube.com/channel/2KillGaming],[f],[[[File:Csgo icon.png|link=Counter-Strike: Global Offensive]] : 2013-05-17],"[{'timestamp': '1514764800', 'raw': '1/2018'}]"
4,[1052],[0],[0],[0],[792],[260],[0],[0],[0],[],[https://facebook.com/31337esportspl],[],[],[Poland],[31337 eSPORTS],[Europe],[http://eleet-esports.pl/],[],[],[],[https://twitter.com/31337GAMING],[],[],[f],[2001-02-05<br>\n[[File:Csgo icon.png|link=Counter-Strike: Global Offensive]] 2012-01-01],"[{'timestamp': '1517443200', 'raw': '1/2018/2/1'}]"


Cleaning the dataset will be the main focus of the next part. For now, we'll just remove the brackets and rearrange the columns so that they follow the order in which we passed the arguments in our query.

In [9]:
# removing brackets and single quotes
CSGO_teams = CSGO_teams.applymap(lambda x: str(x).replace("[","").replace("]","").replace("\'",""))

# listing the columns in the desired order
cols = ["Has name",
"Is active","Was created",
"Was disbanded", "Has earnings",
"Has location","Has region",
"Has sponsor",
"Has earnings in 2012",
"Has earnings in 2013",
"Has earnings in 2014",
"Has earnings in 2015",
"Has earnings in 2016",
"Has earnings in 2017",
"Has earnings in 2018",
"Has earnings in 2019",
"Has site",
"Has twitter",
"Has twitch stream",
"Has instagram",
"Has facebook",
"Has youtube channel",
"Has faceit profile",
"Has vk",
"Has esea id",
"Has steam profile"]

CSGO_teams = CSGO_teams[cols]
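The effect of this cleaning step is easier to see on a tiny example. The sketch below uses two rows modelled on the output shown earlier; `sample`, `strip_list_markup` and `cleaned` are illustrative names, and `apply` with `Series.map` is used here as an equivalent way to transform every cell.

```python
import pandas as pd

# Two rows modelled on the output shown earlier, restricted to three columns.
sample = pd.DataFrame([
    {"Has location": ["Germany"], "Has name": ["/10/"], "Has earnings": [3900]},
    {"Has location": ["Sweden"], "Has name": ["1337"], "Has earnings": [6650]},
])

def strip_list_markup(value):
    """Convert a cell to text and drop square brackets and single quotes."""
    return str(value).replace("[", "").replace("]", "").replace("'", "")

# Apply the cleaner cell by cell, then reorder the columns explicitly.
cleaned = sample.apply(lambda column: column.map(strip_list_markup))
cleaned = cleaned[["Has name", "Has earnings", "Has location"]]
print(cleaned)
```

Note that every cell becomes a string after this step, including the earnings, so numeric columns have to be converted back during the cleaning stage.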

In [10]:
CSGO_teams.head()

Unnamed: 0,Has name,Is active,Was created,Was disbanded,Has earnings,Has location,Has region,Has sponsor,Has earnings in 2012,Has earnings in 2013,Has earnings in 2014,Has earnings in 2015,Has earnings in 2016,Has earnings in 2017,Has earnings in 2018,Has earnings in 2019,Has site,Has twitter,Has twitch stream,Has instagram,Has facebook,Has youtube channel,Has faceit profile,Has vk,Has esea id,Has steam profile
0,/10/,f,2013,"{timestamp: 1398556800, raw: 1/2014/4/27}",3900,Germany,Europe,,0,3900,0,0,0,0,0,0,,,,,https://facebook.com/team10cs,,,,,
1,100 Thieves,t,2017-11-20 <br>File:Csgo icon.png|link=Counter-Strike: Global Offensive: 2017-12-12,,67000,United States,North America,https://www.rocketmortgage.com/ Rocket mortgage<br />https://www.redbull.com/us-en/ Red Bull<br />https://stockx.com/ stockX<br />https://www.totinos.com/ Totinos<br />http://cash.app/ cashapp,0,0,0,0,0,0,0,67000,https://www.100thieves.com/,https://twitter.com/100Thieves,,https://www.instagram.com/100thieves,https://facebook.com/100Thieves,https://www.youtube.com/100thieves,,,,
2,1337,f,,"{timestamp: 1436918400, raw: 1/2015/7/15}",6650,Sweden,Europe,,0,0,150,6500,0,0,0,0,,,,,,,,,,
3,2Kill Gaming,f,File:Csgo icon.png|link=Counter-Strike: Global Offensive : 2013-05-17,"{timestamp: 1514764800, raw: 1/2018}",6497,Brazil,South America,http://www.kappabrasil.com.br Kappa<br />http://arsenalxfire.com.br Arsenal XFire<br />http://www.asrock.com/ ASRock<br />http://www.seagate.com/br/pt/ Seagate<br />,0,0,0,0,3255,3242,0,0,https://www.2killgaming.com,https://twitter.com/2KillGaming,,https://www.instagram.com/2killgaming,https://facebook.com/2killgaming,https://www.youtube.com/channel/2KillGaming,,,,
4,31337 eSPORTS,f,2001-02-05<br>\nFile:Csgo icon.png|link=Counter-Strike: Global Offensive 2012-01-01,"{timestamp: 1517443200, raw: 1/2018/2/1}",1052,Poland,Europe,,0,0,0,792,260,0,0,0,http://eleet-esports.pl/,https://twitter.com/31337GAMING,,,https://facebook.com/31337esportspl,,,,,


Now we can export the dataframe in CSV format and move on to the next part of our CS:GO teams analysis.

In [None]:
# exporting the csv:
CSGO_teams.to_csv(os.path.join(os.getcwd(), "CSGO_teams.csv"), index=False)
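Before moving on, it can be worth checking that the export reloads as expected. Here is a minimal round-trip sketch on a two-row stand-in dataframe, written to a temporary directory; `sample`, `folder` and `reloaded` are illustrative names and the file is not the real export.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the cleaned dataframe (values taken from the output above).
sample = pd.DataFrame({"Has name": ["/10/", "1337"], "Has earnings": ["3900", "6650"]})

# Write to a temporary directory so the example leaves no file behind.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "CSGO_teams.csv")
    sample.to_csv(path, index=False)
    reloaded = pd.read_csv(path)

print(reloaded.dtypes)
```

`read_csv` infers column types on load, so the earnings that were stored as text come back as integers.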