<span style="font-size:25px;"> CS GO E-SPORT ANALYSIS </span>

# Introduction

Counter-Strike: Global Offensive is a game released in 2012 as a sequel to Counter-Strike: Source (released in 2004), which is itself a sequel to the original Counter-Strike (released in 2000). The game's longevity is primarily due to its competitive approach and vibrant professional scene. This longevity has shown in the numbers recently: in March, CS:GO reached its all-time high of concurrent players (1.1M), making it the most played game on Steam 7 years after it was launched. So I thought it would be interesting to celebrate this milestone by grouping relevant data about the game and seeing what insights people can get from it!

# Data Set Description

Counter-Strike is an FPS (First-Person Shooter) game in which two teams of 5 players face each other in a matchup. The game has retained the same gameplay concepts since its first version: a Terrorist side (T) that is tasked with planting a bomb and having it detonate, and a Counter-Terrorist side (CT) that is tasked with defusing the bomb or preventing it from being planted. Both teams can also win a round by eliminating all players on the opposing team before the bomb is planted.

A standard game of Counter-Strike is a best of 30 rounds, the winning team being the first to win 16 rounds. The 30 rounds are played in two halves of 15 on each side of the map, with a round time limit of 1 minute 55 seconds, plus 40 seconds after the bomb is planted.

If both teams are tied 15-15 after the 30th round, 6 more rounds are added, which constitutes overtime. The overtime ends when a team wins 4 of those 6 rounds. If both teams win 3 rounds in overtime, another overtime of 6 rounds is played, and the process can repeat indefinitely until one team wins.
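The regulation and overtime rules above can be sketched as a small check on a scoreline. This is only an illustration of the rules as described in this section; the function names `map_is_over` and `overtimes_played` are hypothetical and are not used elsewhere in the notebook.

```python
def map_is_over(score_a: int, score_b: int) -> bool:
    """Return True if the scoreline ends the map under the rules described above."""
    high, low = max(score_a, score_b), min(score_a, score_b)

    # Regulation: the first team to 16 wins, unless the map was tied 15-15.
    if high == 16 and low <= 14:
        return True

    # Overtime: each overtime adds 6 rounds and ends when a team wins 4 of them.
    # The winner therefore finishes on 19, 22, 25, ... and the loser has won
    # between 0 and 2 rounds of that last overtime.
    if high >= 19 and (high - 16) % 3 == 0:
        return high - 4 <= low <= high - 2

    return False


def overtimes_played(winner_score: int) -> int:
    """Number of overtimes implied by the winner's final score (16 means none)."""
    return (winner_score - 16) // 3


print(map_is_over(16, 14), map_is_over(15, 15), map_is_over(19, 17))
print(overtimes_played(16), overtimes_played(22))
```

A check like this can also be used to spot malformed scorelines in `results.csv` before analysing them.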

There are 7 maps in the map pool that are available to be played competitively at any given time. Maps are removed and added frequently for updates and revamps, so that the game does not become stale. Matches are normally played as a 'bo3' (best of 3 maps), with less important matches played in a 'bo1' fashion and finals often played as 'bo5's.

Counter-Strike has an economic system that governs the acquisition of armor, weapons and grenades by the players. Winning a round awards the players $3250, while losing a round after a winning streak gives them $1400. Losing several rounds in a row increases the losing bonus by $500 for every additional loss, so as not to penalize the losing team too much. Players can also earn money by getting kills and by planting or defusing the bomb.
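As a rough sketch of the loss bonus described above, the amount can be computed from the number of consecutive losses. The $1400 base and $500 step come from the text; the `MAX_LOSS_BONUS` cap is an assumption added for illustration (the text does not state one), and the function name is hypothetical.

```python
BASE_LOSS_BONUS = 1400   # first loss after a winning streak (from the text)
LOSS_BONUS_STEP = 500    # increase per additional consecutive loss (from the text)
MAX_LOSS_BONUS = 3400    # assumption: a cap on the bonus, not stated in the text


def loss_bonus(consecutive_losses: int) -> int:
    """Money each player receives after losing `consecutive_losses` rounds in a row."""
    if consecutive_losses < 1:
        raise ValueError("consecutive_losses must be at least 1")
    bonus = BASE_LOSS_BONUS + LOSS_BONUS_STEP * (consecutive_losses - 1)
    return min(bonus, MAX_LOSS_BONUS)


for losses in range(1, 7):
    print(losses, loss_bonus(losses))
```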

The match in the link https://www.youtube.com/watch?v=EkJu4laFGTs elucidates all of these concepts. It is also one of my all-time favorite matches (even though I was not rooting for any of the teams), so I decided to include it here.


Information pulled from the original data set. Here's the link https://www.kaggle.com/mateusdmachado/csgo-professional-matches

# <p style="text-align:center;">📌General Ideas and Process </p>


 1.  Phase 1, General EDA of the dataset, preprocessing etc. 

 2.  Phase 2, visualize the importance of the 'economy' of each match/map, to get a better understanding of different teams' strategies. 

 3.  Phase 3, Predictive Analysis. ```I would like to try to predict the match results based on the first 1/3 or 1/2 of the game. I'm not too sure which approach I will take just yet, but we will get there.```

In [None]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from os import listdir

pd.set_option('display.max_columns', 100)

listdir('../input/csgo-professional-matches/')

In [None]:
base_dir = '../input/csgo-professional-matches/'

df_results = pd.read_csv(base_dir+'results.csv',low_memory=False)
df_picks = pd.read_csv(base_dir+'picks.csv',low_memory=False)
df_economy = pd.read_csv(base_dir+'economy.csv',low_memory=False)
df_players = pd.read_csv(base_dir+'players.csv',low_memory=False)

While we take a peek at the data, it's important to note that the rows in 'df_results' and 'df_economy' are 'per map' data, while the rows in 'df_picks' and 'df_players' represent the entire match (a series of up to 5 maps).

In [None]:
df_results.head()

In [None]:
df_economy.head()

In [None]:
df_players.head()

In [None]:
df_picks.head()

We can now cut down some of this data to make it easier to manage and chart results. Let's keep only the matches in which both teams are ranked better than 30th and leave the others out for now.

In [None]:
min_rank = 30

df_results = df_results[(df_results.rank_1<min_rank)&(df_results.rank_2<min_rank)]

df_picks = df_picks[df_picks.match_id.isin(df_results.match_id.unique())]

# Assign the filtered frame back; without the assignment the filter has no effect
df_economy = df_economy[df_economy.match_id.isin(df_results.match_id.unique())]

df_players = df_players[df_players.match_id.isin(df_results.match_id.unique())]

In [None]:
winner_1 = df_results[df_results.result_1>=df_results.result_2].result_1.values
loser_1 = df_results[df_results.result_1>=df_results.result_2].result_2.values

winner_2 = df_results[df_results.result_1<df_results.result_2].result_2.values
loser_2 = df_results[df_results.result_1<df_results.result_2].result_1.values


winner = np.concatenate((winner_1,winner_2))

loser = np.concatenate((loser_1,loser_2))

df_scores = pd.DataFrame(np.vstack((winner,loser)).T,columns=['winner','loser'])

In [None]:
gb = df_scores.groupby(by=['winner','loser'])['winner'].count()/df_scores.shape[0]
overtime_percentage = str(round(gb[gb.index.get_level_values(0)!=16].sum()*100,1))+'%'

gb = round(gb[gb>10**-3]*100,1)


index_plot = np.array(gb.index.get_level_values(0).astype('str'))+'-'+np.array(gb.index.get_level_values(1).astype('str'))

## Now we can plot

fig = go.Figure()
fig.add_trace(go.Scatter(x=index_plot,y=gb.values, name='results'))
fig.update_layout(xaxis_type='category',title='Scores Distribution',xaxis_title='Score',yaxis_title='Percentage of Matches (%)')

In the graph above, we can gather the following information about the match results.

1. It is rare for a team to lose a game without winning a single round. In the CS GO community, this is referred to as "The Dream" (beating a team without them scoring a single round). The chance of this happening (if we do not include overtime data) is 0.2%.

2. The most common score in professional CS GO matches is 16-14. While this may be surprising to people who are not familiar with CS GO or the professional CS GO scene, it's not very surprising to me. I will explain my thought process below.
 
    *Diving a bit deeper into this particular distribution: CS GO teams are for the most part running very similar strategies, known as a 'meta'. The skill level of the players on the teams represented in this data set (especially after code block 7, where we started only looking at the top 30 teams) is extremely high. However, the skill gap between these players is extremely small. Think of it as the NBA or the NFL. Only the absolute top athletes make it to professional sports, but you can only run so fast, or kick a ball so far. So the difference between an All-Pro player and a more common athlete is often (statistically) a mere 0.2 seconds on a 40-yard sprint. The data clearly shows how small the gap is, with the final outcome of a game most often coming down to one small action or one blunder.*

3. Given the information in the above paragraph, the chance of a game going into overtime (the score being 15-15 after 30 rounds) is also fairly high compared to traditional sports, but it is a standard occurrence in esports. The chance of a match going to overtime rounds is 9.7% in this filtered data, and that figure would change significantly if we considered all of the teams in the data.
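To make the calculation behind these percentages easier to follow, here is a minimal sketch on a made-up frame. The column names `result_1` and `result_2` match `results.csv`; the five rows are invented purely for illustration, and the code is a simplified equivalent of the cells above rather than the exact code that produced the chart.

```python
import pandas as pd

# Toy data: five maps with invented scorelines
toy = pd.DataFrame({
    'result_1': [16, 14, 19, 16, 10],
    'result_2': [14, 16, 17, 0, 16],
})

# Put the winner's score first regardless of which team it was
winner = toy[['result_1', 'result_2']].max(axis=1)
loser = toy[['result_1', 'result_2']].min(axis=1)

# Share of maps ending on each scoreline, in percent
score_share = (winner.astype(str) + '-' + loser.astype(str)).value_counts(normalize=True) * 100

# Any winning score above 16 means the map went to overtime
overtime_share = (winner > 16).mean() * 100

print(score_share)
print(f'Overtime: {overtime_share:.1f}%')
```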

# CT Vs T side Results

Playing CS GO over the years, I have countless times heard players and content creators say "this map is CT sided" or "this map is so T sided". What players are trying to say is that it is easier to defend, or easier to attack, on that specific map. For instance, it seems to be much easier to attack on the map "Cache", just because of the way the map is designed. Next we will take a look at the competitive map pool and see what commonalities we can find between wins/losses and maps.
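One simple way to put a number on "CT sided" is the share of all rounds on a map that were won by whichever team was on the CT side. The sketch below uses two invented rows; the column names (`_map`, `ct_1`, `t_1`, `ct_2`, `t_2`) are the ones from `results.csv`, while the helper name `ct_round_share` is hypothetical. It ignores the time dimension, which the rolling-window cells in this section add on top.

```python
import pandas as pd

# Toy data: one map result per row, with rounds won on each side by each team
toy = pd.DataFrame({
    '_map': ['Train', 'Cache'],
    'ct_1': [10, 6],   # rounds team 1 won as CT
    't_1':  [6, 10],   # rounds team 1 won as T
    'ct_2': [3, 4],    # rounds team 2 won as CT
    't_2':  [5, 9],    # rounds team 2 won as T
})


def ct_round_share(df: pd.DataFrame) -> pd.Series:
    """Percentage of rounds won by the CT side, per map."""
    rounds = pd.DataFrame({
        '_map': df['_map'],
        'ct': df['ct_1'] + df['ct_2'],
        't': df['t_1'] + df['t_2'],
    }).groupby('_map').sum()
    return 100 * rounds['ct'] / (rounds['ct'] + rounds['t'])


share = ct_round_share(toy)
print(share.round(1))
```

A value above 50 suggests the map favours the defenders, and a value below 50 suggests it favours the attackers.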

In [None]:
ct_1 = df_results[['date','_map','ct_1']].rename(columns={'ct_1':'ct'})
ct_2 = df_results[['date','_map','ct_2']].rename(columns={'ct_2':'ct'})
# Combine the CT rounds of both teams (team 1 and team 2)
ct = pd.concat((ct_1,ct_2))

In [None]:
t_1 = df_results[['date','_map','t_1']].rename(columns={'t_1':'t'})
t_2 = df_results[['date','_map','t_2']].rename(columns={'t_2':'t'})
t = pd.concat((t_1,t_2))

In [None]:
t = t.sort_values('date')
ct = ct.sort_values('date')

In [None]:
maps = ['Cache','Cobblestone','Dust2','Inferno','Mirage','Nuke','Overpass','Train','Vertigo']

In [None]:
# Rolling share of rounds won on the CT side, per map
series_t, series_ct, how_ct = {},{},{}
for key in maps:
    t_map = t[t._map == key]
    ct_map = ct[ct._map == key]
    # Centered rolling sums over 200 rows (at least 20) smooth out single results
    y_t = t_map.t.rolling(min_periods = 20, window= 200, center=True).sum().values
    y_ct = ct_map.ct.rolling(min_periods = 20, window= 200, center=True).sum().values
    
    series_t[key] = pd.Series(data=y_t,index=t_map.date)
    series_ct[key] = pd.Series(data=y_ct,index=ct_map.date)
    
    # CT share as a percentage, floored to one decimal place
    how_ct[key] = series_ct[key]/(series_ct[key]+series_t[key])//0.001/10
    

In [None]:
def add_trace(_map):
    fig.add_trace(go.Scatter(x=how_ct[_map].index, y=how_ct[_map].values, name=_map))

In [None]:
fig = go.Figure()
for _map in maps:
    add_trace(_map)
fig.add_trace(go.Scatter(x=['2015-11-01', '2020-03-12'], y = [50,50],
                        mode='lines', line=dict(color='grey'), showlegend=False))
fig.update_layout(title='Distribution of Rounds', yaxis_title='Percentage of round wins by CT (%)')
fig.show()

📌 Map Breakdown (from a player's perspective)




1. de_Mirage - Mirage is one of the most played maps in the game, so it shouldn't be surprising that the map is one of the most balanced. In late 2016 Mirage leaned very heavily CT sided, but overall the map has consistently been well balanced. Not only is it a popular map in CSGO, it's also one of my favorites. The map allows for fast rushes, yet also for very detailed, organized hits on both A and B bomb sites.
[Click here for more information on de_Mirage](https://counterstrike.fandom.com/wiki/Mirage)





2. de_Overpass - Overpass comes in comfortably CT sided. This map is designed differently than most CSGO maps. Overpass has a multi-level design that allows for very quick CT rotations during the round. Sound is also very important in CSGO: if you are above or below other players, their footsteps are very noticeable, and you can communicate and adjust accordingly. 
[Click here for more information on de_Overpass](https://counterstrike.fandom.com/wiki/Overpass)





3. de_Train - Train is one of the classics. It has been in the map pool for a very long time, one of the longest spans of any map. Unlike Overpass, Train has a two-sided approach to map design rather than a multi-level one (although there are small areas that create this dynamic). We can gather that de_Train is a very CT sided map. From the graph, we can see that it stays CT sided from early 2016 to current-day CSGO matches. The peaks in early 2017 and early 2019 (reaching a staggering 58.7% of rounds won on CT side, and then 59.6%, wow) show just how CT sided this map is. Winning just 3 or 4 rounds out of 15 on T side is often acceptable.
[Click here for more information on de_Train](https://counterstrike.fandom.com/wiki/Train)




4. de_Cobblestone - Formerly known as "Cobble", this map was designed by David Johnston after completing Dust. The map has a significant amount of vertical combat, both in the open spans of the map and in the enclosed indoor areas. Cobblestone was removed from the competitive map pool midway through 2018, which is when we lose information. That being said, Cobblestone remained comfortably T sided for over 80% of the time it was in the map pool. In late 2016/early 2017, and again in January of 2018, the meta shifted slightly and CT round wins were substantially higher, but it didn't last very long.
 [Click here for more information on de_Cobblestone](https://counterstrike.fandom.com/wiki/Cobblestone)
 
 



5. de_DustII - Dust2 is a remake of undoubtedly the most iconic map in Counter-Strike history. This map is notorious for its angles that stretch literally from spawn to spawn, and it gave us everything from "Rush B" to running past the mid doors wondering whether you were going to get killed. Aside from the player experience, this map is strongly T sided. See the above graph for the map meta over time. [Click here for more information on de_Dust II](https://counterstrike.fandom.com/wiki/Dust_II)




6. de_Inferno - Coming into the map pool during Counter-Strike 1.1, this map has a very long history in CS and CSGO. It is a common player favorite and is played in the pro circuit in almost every single match. The map went through a significant overhaul in 2016, which leads us to the data. You can clearly see the map was heavily CT sided prior to 2016. The changes to the map (opening up "banana" to allow faster hits and new grenade lineups) made B site much easier to hit. Since the changes, it has consistently been one of, if not the, most balanced maps in CSGO. [Click here for more information on de_inferno](https://counterstrike.fandom.com/wiki/Inferno)




7. de_Cache - With the introduction of Operation Breakout, Cache became an official map. It is now part of Active Duty in Competitive and Defusal Group Delta in Deathmatch and Casual. Cache was reworked and shown to the public for the first time on 29th September 2019 during the ESL One New York 2019 CSGO tournament. Unlike other maps in Active Duty, it is entirely maintained by its creator, FMPONE. de_Cache has been extremely friendly towards the T side. Even recently, with new map updates and adjusted angles, it's still confidently T sided. [Click here for more information on de_Cache](https://counterstrike.fandom.com/wiki/Cache)




8. de_Nuke - The map takes place in a warehouse containing nuclear materials or a nuclear power plant. In Global Offensive, the original version of the map takes place in a German nuclear power plant, while the revamped version takes place in an American one. Nuke is one of the least played maps in the game. It seems that for the past 5 years professional teams have consistently insta-banned it from the map pool. It's extremely CT sided and, much like de_Train, getting only a handful of rounds on T side is very normal, even for professional teams. [Click here for more information on de_Nuke](https://counterstrike.fandom.com/wiki/Nuke)




9. de_vertigo - Vertigo is set on a skyscraper under construction, which is the center of the conflict between the Counter-Terrorist and Terrorist teams. The main objective for the Terrorists is to bomb the building, while the CTs must prevent them from achieving their goal. There isn't much information on Vertigo in the competitive scene. As you can see in the graph, it was introduced in the first quarter of 2019. Even then, it took a while for the pros to get enough practice on the map to feel comfortable playing it in professional competitions. [Click here for more information on de_vertigo](https://counterstrike.fandom.com/wiki/Vertigo)

In [None]:
print('Total number of matches played on the map:')
df_results.groupby('_map').date.count()

In [None]:
majors = [{'tournament':'01. Cluj-Napoca 2015','start_date':'2015-10-28'},
          {'tournament':'02. Columbus 2016','start_date':'2016-03-29'},
          {'tournament':'03. Cologne 2016','start_date':'2016-07-05'},
          {'tournament':'04. Atlanta 2017','start_date':'2017-01-22'},
          {'tournament':'05. Krakow 2017','start_date':'2017-07-16'},
          {'tournament':'06. Boston 2018','start_date':'2018-01-26'},
          {'tournament':'07. London 2018','start_date':'2018-09-20'},
          {'tournament':'08. Katowice 2019','start_date':'2019-02-28'},
          {'tournament':'09. Berlin 2019','start_date':'2019-09-05'}]

In [None]:
def create_col_time_period(df):
    df['time_period'] = ''
    
    for major_start in majors:
        df.loc[(df['date']>=major_start['start_date']),'time_period'] = major_start['tournament']
        
    return df

In [None]:
df_results = create_col_time_period(df_results)
df_economy = create_col_time_period(df_economy)
df_picks = create_col_time_period(df_picks)
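# df_results has one row per map, so keep one row per match before merging to avoid duplicating player rows
df_players = df_players.merge(df_results[['match_id','time_period']].drop_duplicates('match_id'),on='match_id')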

In [None]:
results_df_team_1 = df_results[['time_period','team_1','_map','ct_1','t_2','ct_2','t_1']].rename(columns={'team_1':'team'})

results_df_team_2 = df_results[['time_period','team_2','_map','ct_1','t_2','ct_2','t_1']].rename(columns={'team_2':'team'})

results_df_teams = pd.concat((results_df_team_1,results_df_team_2))[['time_period','team','_map']]

In [None]:
gb = results_df_teams.groupby(['time_period','_map']).team.count()

gb_text = round(gb*100/gb.groupby('time_period').sum(),1).reset_index().rename(columns={'team':'percentage'})

gb_text.percentage = gb_text.percentage.astype(str)+'%'

gb = gb.reset_index()

In [None]:
fig = go.Figure()
for _map in maps:
    fig.add_bar(name=_map,x=gb[gb._map==_map].time_period,y=gb[gb._map==_map].team,text=gb_text[gb_text._map==_map].percentage,textposition='inside')
    
fig.update_layout(barmode='stack',legend=dict(traceorder='normal'),yaxis_title='Number of Maps Played',font=dict(size=10))

fig.show()

# <p style="text-align:center;">📌Economy Analysis </p>

One of the terms you'll hear discussed a lot in the CS:GO scene is the concept of “economy” - something that isn't very self-explanatory.

```To explain things simply, a team's economy is concerned with the amount of money that all players on the team have gathered collectively in order to buy new weapons and utilities.```


This value is in a constant state of flux between rounds, and depends on how much the team has spent on weapons and armour, the kill awards that have been received per elimination (based on each weapon), bomb plants and defusals, and who won the round in question.

All team members have their own pot of money that shifts and changes based on the events detailed above, and how you choose to manage your money has a significant impact on the rest of the team's purchasing plans.

For this reason you'll find that players will want to work together to purchase and swap items. If your AWP ace can't afford that weapon but you can, you can pick it up for them and swap with a gun that they can afford to purchase for you. It's just another side to the close teamwork and cooperation that's required to succeed in CS:GO.

Winning a round (by eliminating the entire team) provides the winning team with 3250 dollars per player, plus 300 dollars if the bomb is planted by a T. Winning by time on CT-side rewards players 3250 dollars, and winning the round with a defusal (CT) or detonation (T) of the bomb rewards 3500 dollars.
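The round rewards in the paragraph above can be written down as a small lookup. This is a sketch of one reading of that paragraph, not the game's actual rules engine: the names `ROUND_WIN_REWARD`, `PLANT_BONUS` and `round_win_reward` are hypothetical, and applying the extra 300 dollars only to elimination wins with a planted bomb is an assumption about how the text should be interpreted.

```python
# Reward per player for winning a round, by how the round was won (values from the text)
ROUND_WIN_REWARD = {
    'elimination': 3250,
    'time': 3250,        # CT side wins because the clock ran out
    'defusal': 3500,     # CT side defuses the bomb
    'detonation': 3500,  # T side detonates the bomb
}
PLANT_BONUS = 300  # extra money when the bomb was planted by a T


def round_win_reward(win_type: str, bomb_planted: bool = False) -> int:
    """Money awarded for a round win, under the assumptions stated above."""
    reward = ROUND_WIN_REWARD[win_type]
    # Assumption: the plant bonus is added on top of an elimination win
    if win_type == 'elimination' and bomb_planted:
        reward += PLANT_BONUS
    return reward


print(round_win_reward('elimination'))
print(round_win_reward('elimination', bomb_planted=True))
print(round_win_reward('defusal'))
```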

| Weapon | Award Per Kill ($) |
|---|---|
| Knife | 1500 |
| Pistol (excluding CZ75-Auto) | 300 |
| CZ75-Auto | 100 |
| SMG (excluding P90) | 600 |
| P90 | 300 |
| Shotgun | 900 |
| Assault Rifles & Auto Snipers | 300 |
| AWP | 100 |
| Grenade | 300 |
| Zeus x27 | 0 |
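The kill awards listed above translate directly into a lookup table. The short sketch below adds up the kill money for a list of kills; the dollar values come from the list above, while the category keys and the `kill_income` helper are hypothetical names chosen for this example.

```python
# Award per kill in dollars, keyed by weapon category (values from the list above)
KILL_AWARD = {
    'knife': 1500,
    'pistol': 300,      # excluding CZ75-Auto
    'cz75-auto': 100,
    'smg': 600,         # excluding P90
    'p90': 300,
    'shotgun': 900,
    'rifle': 300,       # assault rifles and auto snipers
    'awp': 100,
    'grenade': 300,
    'zeus': 0,
}


def kill_income(kills):
    """Total kill money for a sequence of weapon categories, one entry per kill."""
    return sum(KILL_AWARD[weapon] for weapon in kills)


# Two rifle kills, one AWP kill and one SMG kill
print(kill_income(['rifle', 'rifle', 'awp', 'smg']))  # 1300
```

This also shows why SMGs and shotguns are popular in rounds where a team wants to rebuild its economy: each kill pays two to three times what a rifle kill does.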


This is all I'm going to include about how the economy in CSGO works. I think any more would possibly cause even more confusion. I will, however, leave this last note: economy is the single most important aspect of CSGO, and this stretches from the noobs in silver to the pros playing in 1 million dollar tournaments.

Below we will dive a bit deeper into the analysis. We will estimate the probability of a team winning the game based on its economy, taken at a specific point in the game.

In [None]:
## First we will take a look at the df_economy data we imported earlier

df_economy.head()