# CMSC Final Project

#### By Thomas Purtscher

## Introduction

### Goal

The goal of my project is to analyze Counter-Strike: Global Offensive (CS:GO) performance on a map-by-map basis to construct the best team from my group of friends (or if you are following this as a tutorial, your friends) for each map in the game. 

### Background on CS:GO

CS:GO is a 5v5 competitive shooter built on the Source engine, famed for its large esports scene and for game mechanics that are mostly unique to the Counter-Strike franchise (e.g. no aiming down sights, or ADS, for most weapons, and extreme inaccuracy while running and shooting). It was released in 2012, quickly became a major esports title, and is currently one of the most popular competitive shooters. Most people play the game in 5v5 ranked mode, which limits the map pool to a select few maps that have rotated in and out over the years; the most popular of these are Dust 2, Nuke, Cache, Inferno, and Mirage.

These maps are especially notable in the context of CS:GO because of the franchise's staple mechanics mentioned above. Without ADS, weapons are more limited in their effective ranges, and since running and shooting is inaccurate (at least with rifles), your positioning at the start of a fight is incredibly important. The lack of a sprint button only increases this emphasis on positioning, and the round-to-round in-game economy makes losing much more punishing, since a player who dies loses the weapons they were carrying. Combined with the Source physics engine and the in-game grenade mechanics, these factors make map knowledge absolutely crucial to succeeding in CS:GO, and they lead players to gravitate towards the maps where their experience gives them a serious competitive advantage (especially in professional CS:GO).

Every map we will be looking at (and that most people play on) uses the defusal game mode, indicated by the "de_" prefix on the map name. In this game mode, 5 Counter-Terrorists defend two bombsites while 5 Terrorists attempt to either eliminate the enemy team or plant the bomb before the round timer runs out. Once the Terrorists plant the bomb, the Counter-Terrorists have 40 seconds (in competitive matchmaking) to retake the site and defuse it while the Terrorists defend the site, effectively reversing the attack/defense roles.

## Data

The data I will be using for the analysis is match history data from [csgostats](https://csgostats.gg), which covers all matches the player participated in from 2020 to 2022 (and, in my case, three extra matches from 2017 to 2019, for some reason). This data contains nearly every stat you could conceivably want except for who you were playing with in each game, and since I was unable to find a site that provides that, it can't be helped.

I first attempted to download the HTML with Python directly from the URL, but was quickly blocked by the site's captcha, which was a new and worrying issue for me. The workaround is to save each stats page manually from the browser and then parse the saved HTML files. This is less efficient, but it is far simpler than trying to solve the reCAPTCHA with some AI or paying someone to solve it for you.

## Obtaining The Data

### Processing The HTML

Here we import all the required packages, which I will describe as we use them. The first one is BeautifulSoup, an HTML parser whose bs.prettify() method prints the HTML in an indented, readable form. This gives us a good idea of what we are looking at and helps in finding the correct table when using pd.read_html(). We use pandas in this project for its DataFrames, which give us a powerful and convenient data management tool. pd.read_html() scans the HTML and extracts every table it finds, returning each one as a DataFrame. The table we want is the third of the three tables found on the page, which I will name "my_games".
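Selecting the table by position (readtable[2]) works for this saved page, but it would silently grab the wrong table if the page layout ever changed. A minimal sketch, assuming the match table is the only one with both a "Map" and a "Score" column (true for the columns printed below, but still an assumption about the page), picks the table by its columns instead. The helper names load_tables and pick_match_table are hypothetical, and the toy tables only stand in for the parsed page.

```python
from io import StringIO

import pandas as pd


def load_tables(path):
    """Read every HTML table in a saved page into a list of DataFrames."""
    # Wrapping the markup in StringIO avoids the pandas deprecation of passing
    # literal HTML strings to read_html (read_html needs lxml or bs4 + html5lib).
    with open(path, encoding="utf-8") as f:
        return pd.read_html(StringIO(f.read()))


def pick_match_table(tables, required=("Map", "Score")):
    """Return the first parsed table that has every column in `required`."""
    for table in tables:
        if all(col in table.columns for col in required):
            return table
    raise ValueError(f"no table has all of the columns {required}")


# Toy tables standing in for the three tables parsed from the saved page.
tables = [
    pd.DataFrame({"Rank": [1, 2]}),
    pd.DataFrame({"Player": ["a"], "Elo": [1000]}),
    pd.DataFrame({"Date": ["1 day ago"], "Map": ["de_nuke"], "Score": ["16:8"]}),
]
my_games = pick_match_table(tables)
print(my_games["Map"].tolist())  # ['de_nuke']
```

With a real saved page, the call would look like pick_match_table(load_tables("Evihunger match history.htm")).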

In [1]:
from io import StringIO #lets pd.read_html read markup from a string
from bs4 import BeautifulSoup #html parser & viewer
import numpy as np #mathematics package
import pandas as pd #provides dataframes and related methods
import seaborn as sns #provides our visualizations


#open the manually saved html file; "with" closes it once it has been read
with open("Evihunger match history.htm", "r", encoding='utf-8') as matches:
    bs = BeautifulSoup(matches.read(), "html.parser")
pretty = bs.prettify()
#print(pretty) #used to preview the html
readtable = pd.read_html(StringIO(str(bs))) #read tables from the html
print(len(readtable))
print(readtable[2].columns)
my_games = readtable[2] #fetches the proper table
print(my_games)

3
Index(['Date', 'Unnamed: 1', 'Map', 'Score', 'Rank', 'Unnamed: 5', 'K', 'D',
       'A', '+/-', 'HS%', 'ADR', '1v5', '1v4', '1v3', '1v2', '1v1', '5k', '4k',
       '3k', 'Rating', 'Unnamed: 21'],
      dtype='object')
                Date  Unnamed: 1          Map  Score  Rank  Unnamed: 5   K  \
0          1 day ago         NaN      de_nuke   16:8   NaN         NaN  17   
1     Sat 7th May 22         NaN      de_nuke  16:14   NaN         NaN  12   
2     Sat 7th May 22         NaN      de_nuke  11:16   NaN         NaN  29   
3     Fri 6th May 22         NaN   de_inferno  11:16   NaN         NaN  17   
4     Fri 6th May 22         NaN      de_nuke  14:16   NaN         NaN  25   
..               ...         ...          ...    ...   ...         ...  ..   
674   Sat 2nd May 20         NaN     de_dust2   16:0   NaN         NaN  23   
675  Tue 25th Feb 20         NaN  de_overpass  12:16   NaN         NaN  26   
676  Wed 27th Nov 19         NaN      de_nuke   16:3   NaN         NaN  19   


#### Results

The parsed table looks pretty good, but there are some things we need to clean up. First, there are several all-NaN columns that need to be removed. These are mostly either nonsense columns (I assume they are used for formatting) or the "Rank" column, which holds an image of the rank you were at when you played the game rather than text.

### Processing The Dataframe Of Matches

Our first step is to process the "Score" column into three boolean columns, "W", "T", and "L", representing Win, Tie, and Loss respectively. There is something to be said for counting the number of rounds and averaging kills per round, but I elected not to do that: we already have ADR (average damage per round) as a round-normalized stat, which keeps short games from throwing off our statistics. Round counts could also measure "how much of a win/loss" a game was, but this is up to you; my friends and I tend to throw a few rounds when we're winning, so even when we genuinely dominate the other team the score might not reflect it. Because of that I am not going to count rounds, as it would likely be unreliable and thus not very useful.

I will also delete the "Date" column, because all of these games are *relatively* recent and for most of this period (the last 3 years) my friends and I have been about the same in rank and skill. Finally, I will delete the "Rank" column, not just because it is NaN but because CS:GO has one of the most unreliable and inaccurate ranking systems of any modern competitive game: low ranks are sometimes matched with high ranks, and the variance in skill within each rank is extremely high.
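The W/T/L split can also be done without a row-by-row loop by splitting every score at once with pandas string methods. This is a minimal sketch, assuming scores look like "16:8" with the player's own team's rounds first (the same assumption the loop in the next cell makes); add_result_columns is a hypothetical helper name and the toy rows are made up.

```python
import pandas as pd


def add_result_columns(games):
    """Return a copy of games with boolean W/T/L columns derived from Score."""
    games = games.copy()  # leave the caller's dataframe untouched
    # "16:8" -> two integer columns: our rounds (0) and their rounds (1)
    rounds = games["Score"].astype(str).str.split(":", expand=True).astype(int)
    ours, theirs = rounds[0], rounds[1]
    games["W"] = ours > theirs
    games["T"] = ours == theirs
    games["L"] = ours < theirs
    return games


games = pd.DataFrame({"Map": ["de_nuke", "de_nuke", "de_inferno"],
                      "Score": ["16:8", "15:15", "11:16"]})
print(add_result_columns(games)[["Map", "W", "T", "L"]])
```

Returning a copy rather than modifying in place means rerunning a notebook cell cannot leave a half-processed table behind.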

In [2]:
#changing score to string
my_games['Score'] = my_games['Score'].astype("string")
#creating win/tie/loss columns
my_games['W'] = False
my_games['T'] = False
my_games['L'] = False

#marks wins/losses
index = 0
while index < len(my_games.Map):
    scores = my_games.at[index, 'Score'].split(":")
    if int(scores[0]) > int(scores[1]):
        my_games.loc[index, 'W'] = True
    elif int(scores[0]) == int(scores[1]):
        my_games.loc[index, 'T'] = True
    else:
        my_games.loc[index, 'L'] = True
    index += 1

#removing unnecessary columns
my_games = my_games.drop(['Date', 'Unnamed: 1', 'Rank', 'Unnamed: 5', 'Unnamed: 21', "Score"], axis = 1)
print(my_games)

             Map   K   D  A  +/-  HS%  ADR  1v5  1v4  1v3  1v2  1v1  5k  4k  \
0        de_nuke  17  18  7   -1   59   96    0    0    0    0    3   0   0   
1        de_nuke  12  21  7   -9   75   57    0    0    0    0    1   0   0   
2        de_nuke  29  19  3   10   41  103    0    0    0    1    1   0   0   
3     de_inferno  17  24  3   -7   35   79    0    0    0    0    0   0   0   
4        de_nuke  25  26  4   -1   40   95    0    0    0    0    1   0   1   
..           ...  ..  .. ..  ...  ...  ...  ...  ...  ...  ...  ...  ..  ..   
674     de_dust2  23   8  6   15   35  150    0    0    0    0    0   0   2   
675  de_overpass  26  23  4    3   38   97    0    0    0    0    0   0   0   
676      de_nuke  19  14  5    5   42  103    0    0    0    0    0   0   0   
677   de_inferno  29  23  5    6   66  105    0    0    0    0    0   0   0   
678   de_inferno  28  20  8    8   61  111    0    0    0    0    0   0   0   

     3k  Rating      W      T      L  
0     2    0

#### Results

Now our dataframe is far nicer to look at, and can easily be further processed by our program.

### Maps

Next we need to identify which maps appear in the dataset, since those are the maps we need summary statistics for. The list can be generated by calling .unique() on the "Map" column.

In [3]:
maps = my_games.Map.unique() #get all the maps
print(maps)

['de_nuke' 'de_inferno' 'de_vertigo' 'de_cache' 'de_mirage' 'de_iris'
 'de_basalt' 'de_ancient' 'cs_insertion2' 'de_overpass' 'de_dust2'
 'cs_office' 'de_train' 'cs_agency']


#### Results

We now have the list of maps in the dataset.

### Creating Per Map Summary Statistics

Now that we have the list of maps, we can summarize the statistics across all games on each map. I made a list of every map, turned it into the first column of a new dataframe, and created a column for each summary statistic. Means are calculated with np.mean() and standard deviations with np.std() (np being numpy, a powerful mathematics package; note that np.std computes the population standard deviation, i.e. ddof=0, by default). Most of these columns are self-explanatory, aside from "plus_minus" (kills minus deaths in a game), "ADR" (average damage per round), and "Rating" (the HLTV.org rating statistic, which is calculated from a variety of variables and designed to measure performance, with 1 being average).
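The same per-map summary can be produced with a pandas groupby instead of a hand-written loop. This is a sketch covering a subset of the columns (the remaining ones follow the same pattern), assuming a per-match table with "Map", "K", "ADR", "+/-", and boolean "W"/"T"/"L" columns as built above; per_map_summary is a hypothetical name and the toy rows are made up. Unlike the hard-coded map list, maps with no games simply do not appear.

```python
import pandas as pd


def per_map_summary(games):
    """Aggregate per-match rows into one row of summary stats per map.

    ddof=0 gives the population standard deviation, matching np.std's default.
    """
    grouped = games.groupby("Map")
    summary = pd.DataFrame({
        "mean_K": grouped["K"].mean(),
        "std_K": grouped["K"].std(ddof=0),
        "mean_ADR": grouped["ADR"].mean(),
        "std_ADR": grouped["ADR"].std(ddof=0),
        "sum_plus_minus": grouped["+/-"].sum(),
        "sum_W": grouped["W"].sum(),  # summing booleans counts the True values
        "sum_T": grouped["T"].sum(),
        "sum_L": grouped["L"].sum(),
    })
    return summary.reset_index()  # turn the Map index back into a column


games = pd.DataFrame({
    "Map": ["de_nuke", "de_nuke", "de_inferno"],
    "K": [20, 10, 25],
    "ADR": [100, 80, 110],
    "+/-": [5, -3, 7],
    "W": [True, False, True],
    "T": [False, False, False],
    "L": [False, True, False],
})
summary = per_map_summary(games)
print(summary)
```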

In [4]:
#creating data structure to hold per map stats
my_per_map_stats = pd.DataFrame(['de_nuke', 'de_inferno', 'de_vertigo', 'de_cache', 'de_mirage', 'de_iris', 'de_basalt', 'de_ancient',
                              'cs_insertion2', 'de_overpass', 'de_dust2', 'cs_office', 'de_train', 'cs_agency'],  columns = ['Map'])

#creating columns for all the desired stats
#(means and standard deviations start as floats so that assigning non-integer values later doesn't change a column's dtype)
my_per_map_stats['mean_K'] = 0.0
my_per_map_stats['std_K'] = 0.0
my_per_map_stats['mean_D'] = 0.0
my_per_map_stats['std_D'] = 0.0
my_per_map_stats['mean_A'] = 0.0
my_per_map_stats['std_A'] = 0.0
my_per_map_stats['mean_HS'] = 0.0
my_per_map_stats['std_HS'] = 0.0
my_per_map_stats['mean_plus_minus'] = 0.0
my_per_map_stats['sum_plus_minus'] = 0
my_per_map_stats['mean_ADR'] = 0.0
my_per_map_stats['std_ADR'] = 0.0
my_per_map_stats['mean_Rating'] = 0.0
my_per_map_stats['std_Rating'] = 0.0
my_per_map_stats['sum_W'] = 0
my_per_map_stats['sum_T'] = 0
my_per_map_stats['sum_L'] = 0

#calculating per map stats
index = 0
while index < len(my_per_map_stats.Map):
    map_data = my_games[my_games['Map'] == my_per_map_stats.Map[index]]
    
    my_per_map_stats.loc[index, 'mean_K'] = np.mean(map_data['K'])
    my_per_map_stats.loc[index, 'std_K'] = np.std(map_data['K'])
    my_per_map_stats.loc[index, 'mean_D'] = np.mean(map_data['D'])
    my_per_map_stats.loc[index, 'std_D'] = np.std(map_data['D'])
    my_per_map_stats.loc[index, 'mean_A'] = np.mean(map_data['A'])
    my_per_map_stats.loc[index, 'std_A'] = np.std(map_data['A'])
    my_per_map_stats.loc[index, 'mean_HS'] = np.mean(map_data['HS%'])
    my_per_map_stats.loc[index, 'std_HS'] = np.std(map_data['HS%'])
    my_per_map_stats.loc[index, 'mean_plus_minus'] = np.mean(map_data['+/-'])
    my_per_map_stats.loc[index, 'sum_plus_minus'] = sum(map_data['+/-'])
    my_per_map_stats.loc[index, 'mean_ADR'] = np.mean(map_data['ADR'])
    my_per_map_stats.loc[index, 'std_ADR'] = np.std(map_data['ADR'])
    my_per_map_stats.loc[index, 'mean_Rating'] = np.mean(map_data['Rating'])
    my_per_map_stats.loc[index, 'std_Rating'] = np.std(map_data['Rating'])
    my_per_map_stats.loc[index, 'sum_W'] = sum(map_data['W'])
    my_per_map_stats.loc[index, 'sum_T'] = sum(map_data['T'])
    my_per_map_stats.loc[index, 'sum_L'] = sum(map_data['L'])
    
    index += 1

my_per_map_stats.head() #previewing the stats table

| | Map | mean_K | std_K | mean_D | std_D | mean_A | std_A | mean_HS | std_HS | mean_plus_minus | sum_plus_minus | mean_ADR | std_ADR | mean_Rating | std_Rating | sum_W | sum_T | sum_L |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | de_nuke | 22.385246 | 7.412639 | 17.680328 | 5.387599 | 4.122951 | 2.125481 | 41.226776 | 13.186574 | 4.704918 | 1722 | 97.704918 | 20.510968 | 1.277623 | 0.397608 | 228 | 34 | 104 |
| 1 | de_inferno | 25.093333 | 6.806219 | 19.28 | 4.136214 | 4.653333 | 2.413812 | 43.253333 | 11.754254 | 5.813333 | 436 | 102.306667 | 20.305647 | 1.330133 | 0.377693 | 35 | 7 | 33 |
| 2 | de_vertigo | 20.759259 | 7.717069 | 18.814815 | 4.290705 | 4.703704 | 2.52124 | 44.611111 | 12.474295 | 1.944444 | 105 | 96.518519 | 24.727209 | 1.167222 | 0.41482 | 25 | 4 | 25 |
| 3 | de_cache | 22.764706 | 7.654521 | 18.264706 | 4.590968 | 4.617647 | 2.376536 | 40.529412 | 13.473799 | 4.5 | 153 | 99.441176 | 21.364059 | 1.272353 | 0.4079 | 16 | 4 | 14 |
| 4 | de_mirage | 23.112245 | 7.483155 | 19.193878 | 4.650342 | 4.479592 | 2.454607 | 41.581633 | 13.473272 | 3.918367 | 384 | 95.387755 | 23.217091 | 1.23102 | 0.40117 | 46 | 18 | 34 |


#### Results

Now we have a dataframe of all of our statistics for each map, which should let us investigate our performance on each map.

### Getting Our Friend's Stats

Obviously a team made up entirely of me would be optimal; unfortunately I do not have four clones, so I need some friends to play with and, consequently, their stats to analyze. That means repeating the previous steps for each of my friends. Again, we have to download and read the HTML of each person's stats page (to save space I do this in a loop, building each file name by concatenating strings and passing it to open()). After that, we can repeat the previous processing steps for each friend with another loop.

In [None]:
#getting tables and raw html for each friend
from io import StringIO #lets pd.read_html read markup from a string

friends = ['Alec', 'Anthony', 'Arian', 'Daniel', 'Danny', 'Dylan', 'Eric', 'Ethan', 'Flynn', 'Issac', 'Jackton', 'Kristen', 'Matt', 'Nathan', 'Sarah', 'Tom', 'Tommy', 'Zack']
friend_tables = []
for friend in friends:
    #"with" closes each saved page once it has been read
    with open((friend + " CS GO Stats.htm"), "r", encoding='utf-8') as matches:
        bs = BeautifulSoup(matches.read(), "html.parser")
    readtable = pd.read_html(StringIO(str(bs)))
    friend_tables.append(readtable[2])

In [None]:
friend_stats = []
#I would consider this contrived code; it's simply easier to walk through the explanation when I am only doing it for myself at first.
#I am not counting this towards the 150 lines.
for table in friend_tables:
    table['Score'] = table['Score'].astype("string")
    #creating win/tie/loss columns
    table['W'] = False
    table['T'] = False
    table['L'] = False

    index = 0
    while index < len(table.Map):
        scores = table.at[index, 'Score'].split(":")
        if int(scores[0]) > int(scores[1]):
            table.loc[index, 'W'] = True
        elif int(scores[0]) == int(scores[1]):
            table.loc[index, 'T'] = True
        else:
            table.loc[index, 'L'] = True
        index += 1

    #creating data structure to hold per map stats
    per_map_friend_stats = pd.DataFrame(['de_nuke', 'de_inferno', 'de_vertigo', 'de_cache', 'de_mirage', 'de_iris', 'de_basalt', 'de_ancient', 'cs_insertion2', 'de_overpass', 'de_dust2', 'cs_office', 'de_train', 'cs_agency'],  columns = ['Map'])
    #means and standard deviations start as floats so later assignments keep the column dtype
    per_map_friend_stats['mean_K'] = 0.0
    per_map_friend_stats['std_K'] = 0.0
    per_map_friend_stats['mean_D'] = 0.0
    per_map_friend_stats['std_D'] = 0.0
    per_map_friend_stats['mean_A'] = 0.0
    per_map_friend_stats['std_A'] = 0.0
    per_map_friend_stats['mean_HS'] = 0.0
    per_map_friend_stats['std_HS'] = 0.0
    per_map_friend_stats['mean_plus_minus'] = 0.0
    per_map_friend_stats['sum_plus_minus'] = 0
    per_map_friend_stats['mean_ADR'] = 0.0
    per_map_friend_stats['std_ADR'] = 0.0
    per_map_friend_stats['mean_Rating'] = 0.0
    per_map_friend_stats['std_Rating'] = 0.0
    per_map_friend_stats['sum_W'] = 0
    per_map_friend_stats['sum_T'] = 0
    per_map_friend_stats['sum_L'] = 0

    #calculating per map stats
    index = 0
    while index < len(my_per_map_stats.Map):
        map_data = table[table['Map'] == my_per_map_stats.Map[index]]

        per_map_friend_stats.loc[index, 'mean_K'] = np.mean(map_data['K'])
        per_map_friend_stats.loc[index, 'std_K'] = np.std(map_data['K'])
        per_map_friend_stats.loc[index, 'mean_D'] = np.mean(map_data['D'])
        per_map_friend_stats.loc[index, 'std_D'] = np.std(map_data['D'])
        per_map_friend_stats.loc[index, 'mean_A'] = np.mean(map_data['A'])
        per_map_friend_stats.loc[index, 'std_A'] = np.std(map_data['A'])
        per_map_friend_stats.loc[index, 'mean_HS'] = np.mean(map_data['HS%'])
        per_map_friend_stats.loc[index, 'std_HS'] = np.std(map_data['HS%'])
        per_map_friend_stats.loc[index, 'mean_plus_minus'] = np.mean(map_data['+/-'])
        per_map_friend_stats.loc[index, 'sum_plus_minus'] = sum(map_data['+/-'])
        per_map_friend_stats.loc[index, 'mean_ADR'] = np.mean(map_data['ADR'])
        per_map_friend_stats.loc[index, 'std_ADR'] = np.std(map_data['ADR'])
        per_map_friend_stats.loc[index, 'mean_Rating'] = np.mean(map_data['Rating'])
        per_map_friend_stats.loc[index, 'std_Rating'] = np.std(map_data['Rating'])
        per_map_friend_stats.loc[index, 'sum_W'] = sum(map_data['W'])
        per_map_friend_stats.loc[index, 'sum_T'] = sum(map_data['T'])
        per_map_friend_stats.loc[index, 'sum_L'] = sum(map_data['L'])

        index += 1
    friend_stats.append(per_map_friend_stats)

In [None]:
#printing the first row of each friend's per-map stats
for friend, stats in zip(friends, friend_stats):
    print(friend)
    print(stats.head(1)) #only printing one line to save space

#### Results

Now we have the same table of statistics for each of our friends, which is a big step, as we now have the tools to do some real analysis.

### Joining The Dataframes

Although one dataframe per person is easily readable and convenient for humans, a single combined dataframe is much easier for the plotting functions to work with, so we add each person's name to their dataframe and then join them all into one dataframe with pd.concat().
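As an alternative to setting a "name" column on each dataframe by hand (which modifies the per-player frames in place), pd.concat can take a dict of dataframes and record the dict keys as an extra index level, which is then turned into an ordinary column. A minimal sketch; the helper name join_with_names and the toy frames are hypothetical.

```python
import pandas as pd


def join_with_names(frames_by_name):
    """Stack per-player DataFrames into one, recording each row's owner in 'name'."""
    # The dict keys become the outer index level (named "name"); reset_index
    # moves it into a column, then the old per-player row labels are dropped.
    joined = pd.concat(frames_by_name, names=["name", None])
    return joined.reset_index(level="name").reset_index(drop=True)


a = pd.DataFrame({"Map": ["de_nuke"], "mean_K": [20.0]})
b = pd.DataFrame({"Map": ["de_nuke", "de_mirage"], "mean_K": [18.0, 22.0]})
joined = join_with_names({"Thomas": a, "Zack": b})
print(joined)
```

Because the inputs are never modified and nothing is appended to an existing list, rerunning the cell produces the same result every time.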

In [None]:
#joining together friend_stats and my stats (my_per_map_stats) into a single dataframe
for friend, stats in zip(friends, friend_stats):
    stats['name'] = friend
my_per_map_stats['name'] = "Thomas"
friend_stats.append(my_per_map_stats)
friend_stats_joined = pd.concat(friend_stats, ignore_index=True) #fresh 0..n-1 row labels instead of repeated per-player ones

#### Results

Now we have our combined dataframe of statistics for everyone, meaning we can almost start performing exploratory visualizations.

### Additional Data

As additional data, we will also combine each person's individual matches into a single dataframe, the same way we combined the stats. These per-match rows would be inefficient for selecting players, but they are very useful for visualizing the distribution of each stat, and they make some beautiful violin plots.

In [None]:
#joining together friend_tables and my_games into a single dataframe
for friend, table in zip(friends, friend_tables):
    table['name'] = friend
my_games['name'] = "Thomas"
friend_tables.append(my_games)
friend_tables_joined = pd.concat(friend_tables, ignore_index=True) #fresh 0..n-1 row labels instead of repeated per-player ones

#### Results

Now we have our combined dataframe of matches for everyone, meaning we can start making exploratory visualizations.

## Exploratory Visualizations

### Kills Per Map Per Player

The first visualizations we are going to make are scatter plots of mean kills per map for each player, with the friend's name on the x axis and the map indicated by the color of the point. This is an obvious first visualization because kills are a simple, easily understood statistic for measuring impact on a game. We will be using seaborn (a plotting package) for its ease of use and the aesthetics of its visualizations.

In [None]:
plot = sns.catplot(x = "name", y = "mean_K", hue = "Map", jitter = False, height = 6, aspect = 3, data = friend_stats_joined)
plot.set(xlabel ="Friend", ylabel = "Mean Kills per Game", title ='Mean Kills per Map per Player')

#### Results

The plot worked, but the data isn't nearly as consistent as we would like: the differences between players aren't that large, and there is a huge amount of variance within each player.

### Filtering the Data

This does support the idea that per-map statistics are valuable, but this level of variance is a little strange, so let's filter to maps on which a player has played more than 10 games. A mean over only a handful of games is very noisy, so removing those small samples should make the real differences between players easier to see.
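One way to see why small samples are noisy is the standard error of the mean, s/√n, which shrinks as the number of games n grows. This sketch computes games played, mean kills, and the standard error for every player and map from a per-match table like friend_tables_joined (columns "name", "Map", "K"), keeping only pairs with more than 10 games to mirror the filter below; kill_uncertainty is a hypothetical name and the toy matches are made up. pandas' sem uses the sample standard deviation (ddof=1).

```python
import pandas as pd


def kill_uncertainty(matches, min_games=10):
    """Games, mean kills and standard error of mean kills per player and map."""
    stats = (matches.groupby(["name", "Map"])["K"]
             .agg(games="count", mean_K="mean", sem_K="sem")
             .reset_index())
    # same cut-off as the notebook: strictly more than min_games games
    return stats[stats["games"] > min_games]


matches = pd.DataFrame({
    "name": ["Thomas"] * 11 + ["Zack"] * 2,
    "Map": ["de_nuke"] * 13,
    "K": [20] * 11 + [30, 10],
})
result = kill_uncertainty(matches)
print(result)
```

Here Zack's two games are dropped, which is exactly the kind of small sample that makes a per-map mean unreliable.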

In [None]:
friend_stats_sigificant = friend_stats_joined[friend_stats_joined["sum_W"] + friend_stats_joined["sum_T"] + friend_stats_joined["sum_L"]  > 10]
plot = sns.catplot(x = "name", y = "mean_K", hue = "Map", jitter = False, height = 6, aspect = 3, data = friend_stats_sigificant)
plot.set(xlabel ="Friend", ylabel = "Mean Kills per Game", title ='Mean Kills per Map per Player, Maps with > 10 Games')

#### Results

This plot looks a lot better and shows both variation on a per-map basis (some players are better on certain maps) and variation between players (some people are better than others), and it also matches my personal experience. I love Inferno, so it's no surprise that it's my best map, and Danny sucks in general.

### Testing ADR Per Map Per Player

Before we move into per-map analysis, let's quickly make the same visualization with ADR instead of kills to see if there is any large disparity. Low ADR relative to kills tends to occur when a player is baiting teammates: letting them fight the enemy and die so that the baiter can easily clean up, knowing the enemy's positioning and benefiting from the damage the teammate already dealt.

In [None]:
plot = sns.catplot(x = "name", y = "mean_ADR", hue = "Map", jitter = False, height = 6, aspect = 3, data = friend_stats_sigificant)
plot.set(xlabel ="Friend", ylabel = "Mean ADR per Game", title ='Mean ADR per Map per Player, Maps with > 10 Games')

#### Results

There were no major shifts between the kills plot and the ADR plot, which suggests the kills-based picture is reasonably reliable. The ADR plot does show less variance than the kills plot, as should be expected: damage that doesn't contribute to a kill still boosts ADR, so only a small difference in ADR can correspond to a large difference in kills.

## Per Map Exploratory Visualizations

### Nuke

<img src="https://vignette.wikia.nocookie.net/cswikia/images/5/51/De_nuke_thumbnail.jpg/revision/latest?cb=20180209112248" width="700"> <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.tobyscs.com%2Ffiles%2Fcsgo-de_nuke-callout.jpg&f=1&nofb=1" width="394">

The first map we'll look at is Nuke, which is one of the most popular maps in CS:GO and is notable for its complexity relative to the other Active Duty maps (i.e. those played at a competitive level, usually widely enjoyed by the community) and for its frequent revisions. Nuke is one of the only maps in CS:GO with a double-layered design, so fast rotation times (the time to get from one bombsite to the other) and accurate reading of sound cues are absolutely crucial to success on the map. There is an extreme variety of engagement distances too: some areas on A site feature near point-blank engagements, other areas like Outside feature ranges long enough to discourage any player without a sniper, and one vent is so claustrophobic that knives are a viable weapon. Nuke rewards versatile players with good aim more than anything, and that very fact makes it the most popular map for trolling in the game (look up "FranzJ" on YouTube).

In [None]:
nuke_stats = friend_tables_joined[friend_tables_joined['Map'] == "de_nuke"]
plot = sns.catplot(x = "name", y = "K", hue = "Map", kind = "violin", height = 6, aspect = 3, data = nuke_stats)
plot.set(xlabel ="Friend", ylabel = "Kills per Game", title ='Kills per Player on Nuke')

#### Results

This is a nice violin plot that is pleasing to the eye, but we can likely do better with a more readable plot.

### Box Plot

We can sacrifice a bit of style for readability by creating a box plot instead. A box plot directly shows the median, the quartiles, and the outliers, and it will likely be better for understanding the distribution (all of the distributions are roughly normal, so a violin plot doesn't show much that a box plot won't).
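To make the box plot's elements concrete: the box spans the first to third quartile (Q1 to Q3) with a line at the median, and with seaborn's default whis=1.5 any game below Q1 - 1.5·IQR or above Q3 + 1.5·IQR (IQR = Q3 - Q1) is drawn as an individual outlier point, while the whiskers stop at the most extreme game inside those fences. This sketch computes those numbers per player from a per-match table like nuke_stats (columns "name" and "K"); box_summary and outliers are hypothetical helpers, the toy matches are made up, and pandas' default linear quantile interpolation is assumed to match the plot's.

```python
import pandas as pd


def box_summary(matches, stat="K", whis=1.5):
    """Per-player quartiles, median and Tukey outlier fences for one stat."""
    grouped = matches.groupby("name")[stat]
    summary = pd.DataFrame({
        "q1": grouped.quantile(0.25),
        "median": grouped.median(),
        "q3": grouped.quantile(0.75),
    })
    iqr = summary["q3"] - summary["q1"]
    summary["low_fence"] = summary["q1"] - whis * iqr
    summary["high_fence"] = summary["q3"] + whis * iqr
    return summary


def outliers(matches, stat="K", whis=1.5):
    """Matches whose stat falls outside that player's fences."""
    fences = box_summary(matches, stat, whis)
    # look up each row's fences by player name, keeping the original row index
    low = matches["name"].map(fences["low_fence"])
    high = matches["name"].map(fences["high_fence"])
    return matches[(matches[stat] < low) | (matches[stat] > high)]


matches = pd.DataFrame({
    "name": ["Thomas"] * 5 + ["Zack"] * 4,
    "K": [10, 12, 14, 16, 50, 20, 20, 20, 20],
})
print(box_summary(matches))
print(outliers(matches))
```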

In [None]:
plot = sns.catplot(x = "name", y = "K", hue = "Map", kind = "box", height = 6, aspect = 3, data = nuke_stats)
plot.set(xlabel ="Friend", ylabel = "Kills per Game", title ='Kills per Player on Nuke')

#### Results

We can easily see that Zack, Tommy, and I (I being Thomas) performed the best out of everyone, with Zack having a few exceptional games. Ethan, Flynn, Kristen, and Jackton follow close behind with strong performances of their own.

### Inferno

<img src="http://media.steampowered.com/apps/csgo/images/inferno/asite1-2.jpg?v=1" width="700"> <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fwww.tobyscs.com%2Ffiles%2Fde_inferno-map-callout.jpg&f=1&nofb=1" width="394">

The second map we will look at is Inferno, a far more traditional map by most standards. Roughly speaking, Inferno sports the familiar three-lane layout found in many CS:GO maps: the left and right lanes lead from Terrorist spawn to A site and B site, while the middle lane leads to a longer-range area that allows rotations to either site. Another mainstay of the CS series is a B site with fewer entrances that favors a grenade- and rush-heavy style of play. Like Nuke, Inferno has a range of engagement distances, but all of its close-range engagements are concentrated in Apartments and at the top of Banana, which are natural choke points. This is a map where remembering grenade lineups (spots and angles to throw a grenade from so that it lands in a specific advantageous spot) can be instrumental as a Terrorist, and it also heavily rewards good aim. Rotation times are quite long for Terrorists, so faking out the other team isn't as viable a strategy on offense.

In [None]:
inferno_stats = friend_tables_joined[friend_tables_joined['Map'] == "de_inferno"]
plot = sns.catplot(x = "name", y = "K", hue = "Map", kind = "box", height = 6, aspect = 3, data = inferno_stats)
plot.set(xlabel ="Friend", ylabel = "Kills per Game", title ='Kills per Player on Inferno')

#### Results

Inferno is my strongest map (whereas many of my friends struggle on it), so it's not too surprising that I come out on top, but Eric, Flynn, and Zack (especially in that one exceptional game) all did well too. There is a lot more variation between players here because not all of us have played the map much, so some of us don't know it well enough to be very successful.

### Mirage

<img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fvignette.wikia.nocookie.net%2Fcswikia%2Fimages%2F1%2F1e%2FCSGO_Mirage_latest_version.jpg%2Frevision%2Flatest%3Fcb%3D20200301201524&f=1&nofb=1" width="700"> <img src="https://steamuserimages-a.akamaihd.net/ugc/902129724392201747/56353A73C55D76033FB0E95C156935FA24B34773/?interpolation=lanczos-none&output-format=jpeg&output-quality=95&fit=inside%7C1024%3A1024&composite-to=*,*%7C1024%3A1024&background-color=black" width="394">

Mirage is another extremely popular map with CS's typical three-lane layout, with a B site that supports rushes and an A site that encourages careful use of grenades and a varied attack. Grenades are absolutely crucial on Mirage, especially on A site. Rotation times are long, but the structure of the middle lane allows for more complex plays from the Terrorist side. There also aren't many close-range engagements on Mirage, so rifles and snipers are heavily favored. To succeed on Mirage you need knowledge of grenade lineups and a strong strategy.

In [None]:
mirage_stats = friend_tables_joined[friend_tables_joined['Map'] == "de_mirage"]
plot = sns.catplot(x = "name", y = "K", hue = "Map", kind = "box", height = 6, aspect = 3, data = mirage_stats)
plot.set(xlabel ="Friend", ylabel = "Kills per Game", title ='Kills per Player on Mirage')

#### Results

On Mirage, Eric seems to have done the best, but on closer inspection that is because he has only played 8 games on the map. This is a bit misleading but overall fine, as these visualizations are only meant to give a general idea of our performance and won't be used in the optimal team creation. Arian seems to have only 1 game on this map, hence the single line, but among the rest of the data there isn't a standout "best player". I am also hoping my low outlier was a short game where the other team surrendered, because otherwise that is embarrassing.

### Vertigo

<img src="https://vignette.wikia.nocookie.net/cswikia/images/2/2b/CSGO_Vertigo_18_Nov_2019_update.jpg/revision/latest?cb=20200307130224" width="700"> <img src="https://external-content.duckduckgo.com/iu/?u=https%3A%2F%2Fi.redd.it%2Fqxc0wd7lfip21.png&f=1&nofb=1" width="790">

The last map we will look at is Vertigo, another map outside the CS norm. Like Nuke, Vertigo has a somewhat double-layered design, but unlike Nuke, the Terrorists spawn on the bottom layer and the Counter-Terrorists on the top layer, and the bottom layer is mostly kept in one corner of the map with the top layer in the opposite corner. As a result, the layered design matters much less than it does on Nuke, aside from the Counter-Terrorists almost always having the high-ground advantage. The other thing that does matter about Vertigo's double-layered design is the importance of sound cues: each team can hear the other quite easily despite being far apart in walking distance. This, combined with the rotation times, the choke points in key areas, and the ability to rotate from one site to the other, makes Vertigo a very strategy-dependent map. The Terrorists' grenades are mostly reserved for when they are almost on the site, since they are penned in below and don't have many opportunities to throw them earlier, but grenades can still be vital in certain situations.

In [None]:
vertigo_stats = friend_tables_joined[friend_tables_joined['Map'] == "de_vertigo"]
plot = sns.catplot(x = "name", y = "K", hue = "Map", kind = "box", height = 6, aspect = 3, data = vertigo_stats)
plot.set(xlabel ="Friend", ylabel = "Kills per Game", title ='Kills per Player on Vertigo')

#### Results

Kristen appears to do quite well, but this is again a case with few games played, so her performance should be considered unreliable. Arian still has only one game, and the rest of us perform decently, some with a noticeably high spread, but there are no real standouts.

## Optimal Team Creation

Now we will finally put everything together and generate some optimal teams for each map.

### Maximize Function

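Now we will write a versatile maximize function that takes in the data, the desired maps, and a list of five estimator functions, and then finds the "best" team for each map. It builds each team greedily: for each of the five slots it scores every remaining player (with more than 10 games on that map) using that slot's estimator, picks the highest scorer, and removes them from the pool before filling the next slot.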

In [None]:
# Greedily builds a five-player team for each map.
# methods[i] is the estimator used to score candidates for slot i (0-4).
def maximize(table, maps, methods):
    players_per_map = []  # stores the chosen players for each map

    # for each map
    for csgo_map in maps:
        chosen_players = []  # stores the chosen players for this map
        map_table = table[table['Map'] == csgo_map]
        # only consider players with more than 10 games (wins + ties + losses) on this map
        games_played = map_table["sum_W"] + map_table["sum_T"] + map_table["sum_L"]
        filtered_table = map_table[games_played > 10]

        # fill the five slots one at a time
        for slot in range(5):
            max_score = float('-inf')  # start below any possible score
            max_player = None

            # estimate each remaining player and keep the best-scoring one
            for player in filtered_table['name'].unique():
                score = methods[slot](filtered_table[filtered_table['name'] == player])
                if score > max_score:
                    max_score = score
                    max_player = player

            if max_player is None:
                break  # no eligible players left (or every score was NaN)

            filtered_table = filtered_table[filtered_table['name'] != max_player]  # remove current best player from data
            chosen_players.append(max_player)  # add to chosen players

        players_per_map.append(chosen_players)

    # for each map, print the map name and the players selected
    for csgo_map, chosen_players in zip(maps, players_per_map):
        print("The Best Team for " + csgo_map + " is:")
        print(chosen_players)
    return players_per_map
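
When the same estimator is used for all five slots, as in the first few experiments below, the greedy loop simply picks the five highest-scoring eligible players. The selection for a single map can then be written as a sort. The sketch below assumes the same column names as `friend_stats_joined` (`Map`, `name`, `sum_W`, `sum_T`, `sum_L`, `mean_K`). `top_five` and the toy table are hypothetical, the numbers are made up, and ties may be broken differently than in `maximize`.

```python
import pandas as pd

def top_five(table, csgo_map, estimator):
    """Hypothetical helper: top five players on one map when every slot uses the same estimator."""
    map_table = table[table['Map'] == csgo_map]
    # same eligibility rule as maximize: more than 10 games (wins + ties + losses)
    games_played = map_table['sum_W'] + map_table['sum_T'] + map_table['sum_L']
    eligible = map_table[games_played > 10]
    # score each player's rows once, then keep the five largest scores
    scores = pd.Series({name: estimator(rows) for name, rows in eligible.groupby('name')})
    return scores.nlargest(5).index.tolist()

# toy data with made-up numbers, for illustration only
toy = pd.DataFrame({
    'name':   ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'Map':    ['de_nuke'] * 7,
    'sum_W':  [10, 8, 6, 9, 7, 5, 2],
    'sum_T':  [0, 1, 0, 1, 0, 1, 0],
    'sum_L':  [5, 6, 7, 4, 8, 9, 3],
    'mean_K': [18.0, 21.0, 15.0, 19.5, 16.0, 17.0, 30.0],
})

def kill_estimator(data):
    return data['mean_K'].tolist()[0]

print(top_five(toy, 'de_nuke', kill_estimator))  # → ['B', 'D', 'A', 'F', 'E']
```

Player G has the most kills but only 5 games, so the more-than-10-games filter drops them before scoring. The per-slot loop in `maximize` is still needed once different slots use different estimators.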

### Basic Kill-Based Estimation

First off, let's try a basic kill-based estimator.

In [None]:
target_maps = ['de_nuke', 'de_inferno', 'de_mirage', 'de_vertigo']
def kill_estimator(data):
    return data['mean_K'].tolist()[0]

print("When only accounting for kills:")
value = maximize(friend_stats_joined, target_maps, [kill_estimator] * 5)

#### Results

That was a very naive approach, since it assumes that kills are the only thing that matters, but it does verify that our maximize function is working. The results agree with our previous visualizations of who is best on each map, and they produce a simple but likely effective team composition.

### A Slightly More Nuanced Approach

Now let's try an estimator that is a little bit smarter, combining Kills and Assists. I will weight an Assist as half a Kill, as that is basically what an Assist is.

In [None]:
target_maps = ['de_nuke', 'de_inferno', 'de_mirage', 'de_vertigo']
def kill_assist_estimator(data):
    return kill_estimator(data) + (data['mean_A'].tolist()[0] * .5)

print("When only accounting for kills and assists:")
value = maximize(friend_stats_joined, target_maps, [kill_assist_estimator] * 5)

#### Results

The result is slightly different but mostly the same. This suggests that people who get a lot of kills also tend to get a lot of assists. In other words, no one among the best players is seriously struggling to finish their kills.
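To see how the half-Kill weighting can reorder players, here is a small worked example with two hypothetical players. The averages are made up and are not taken from the data. One player gets fewer kills but assists often, and the other mostly just gets kills. `kill_assist_score` and its `assist_weight` parameter are illustrative names, not part of the notebook.

```python
def kill_assist_score(mean_kills, mean_assists, assist_weight=0.5):
    """Kills plus assists, with each assist counted as a fraction of a kill."""
    return mean_kills + assist_weight * mean_assists

# hypothetical averages for two players
support_player = kill_assist_score(18, 8)  # 18 + 0.5 * 8 = 22.0
fragger = kill_assist_score(20, 2)         # 20 + 0.5 * 2 = 21.0

print(support_player, fragger)  # → 22.0 21.0
```

The fragger would rank first under the kill-only estimator but second here. That reordering only happens when kills and assists are not strongly correlated, which fits why the teams barely changed.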

### Taking Consistency Into Account

Now let's try to find the players who perform the best and are the most consistent by combining standard deviation with the kill_assist_estimator. I account for consistency by multiplying the previous estimator's result by 1/sqrt(standard deviation of Kills), which punishes inconsistent players.

In [None]:
target_maps = ['de_nuke', 'de_inferno', 'de_mirage', 'de_vertigo']
def consistency_estimator(data):
    return kill_assist_estimator(data) * (1 / np.sqrt(data['std_K'].tolist()[0]))

print("When accounting for kills, assists, and consistency:")
value = maximize(friend_stats_joined, target_maps, [consistency_estimator] * 5)

#### Results

This seems a lot better. I am an extremely inconsistent player (especially on Nuke and Vertigo), so it makes sense that my performance would plummet when accounting for consistency. That being said, I think we can do better.
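Here is a quick worked example of the 1/sqrt(standard deviation) penalty. It uses two hypothetical players with the same kill + assist score of 22 but different kill spreads. The numbers are made up for illustration, and `consistency_score` is an illustrative name.

```python
import math

def consistency_score(kill_assist_score, std_kills):
    """Scale a kill + assist score by 1 / sqrt(std of kills) to penalize inconsistency."""
    return kill_assist_score / math.sqrt(std_kills)

steady = consistency_score(22, 2)   # 22 / sqrt(2)
streaky = consistency_score(22, 8)  # 22 / sqrt(8)

print(round(steady, 2), round(streaky, 2))  # → 15.56 7.78
```

Because the penalty uses the square root, quadrupling the spread only halves the score rather than quartering it. Two edge cases follow from the formula. A standard deviation below 1 would boost the score rather than penalize it, and a standard deviation of exactly 0 would make the denominator zero.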

### Including The HLTV Rating

As an additional step, I multiply the value of the previous estimator by the HLTV rating (~0.7 being bad, 1 being average, ~1.3 being good).

In [None]:
target_maps = ['de_nuke', 'de_inferno', 'de_mirage', 'de_vertigo']
def hltv_estimator(data):
    return consistency_estimator(data) * ((data['mean_Rating'].tolist()[0]))

print("When accounting for kills, assists, consistency, and HLTV rating:")
value = maximize(friend_stats_joined, target_maps, [hltv_estimator] * 5)

#### Results

This seems just about right (HLTV rating should be a good all-around estimator), but I have one final variable I want to account for, so I think we can improve this further.

### Accounting For Deaths

As a final percentage change, I apply mean Deaths as a negative percentage (i.e. 25 deaths equals a -25% rating). This sounds like a pretty hefty change, but realistically everyone's average deaths fall between about 12 and 22. The multiplier therefore only ranges from 0.88 down to 0.78, a difference of about 10 percentage points between players. Deaths are already partially accounted for in the HLTV rating, but I want to penalize them again because deaths can be very harmful to a team. A death means the armor and gun are lost and have to be repurchased for the next round, in addition to the impact on the round itself.

In [None]:
target_maps = ['de_nuke', 'de_inferno', 'de_mirage', 'de_vertigo']
def overall_estimator(data):
    # HLTV-weighted score, scaled down by mean deaths (e.g. 15 deaths -> x0.85)
    return hltv_estimator(data) * (1 - (data['mean_D'].tolist()[0] / 100))

print("When accounting for kills, assists, consistency, HLTV rating, and deaths:")
value = maximize(friend_stats_joined, target_maps, [overall_estimator] * 5)

#### Results

This is probably the most accurate estimator yet, and I'm satisfied with it. It accounts for a variety of statistics and balances them in a sensible way.
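To make the combined estimator concrete, the sketch below runs the full chain described in this section on a single hypothetical row. The chain is kills, half-weighted assists, the consistency penalty, the HLTV rating multiplier, and the deaths modifier. The column names match the ones used above, but the values are made up.

```python
import numpy as np
import pandas as pd

# one hypothetical player on one map (made-up numbers)
row = pd.DataFrame({'mean_K': [20.0], 'mean_A': [6.0], 'std_K': [4.0],
                    'mean_Rating': [1.1], 'mean_D': [15.0]})

kills_assists = row['mean_K'].iloc[0] + 0.5 * row['mean_A'].iloc[0]  # 20 + 3 = 23
consistent = kills_assists / np.sqrt(row['std_K'].iloc[0])           # 23 / 2 = 11.5
rated = consistent * row['mean_Rating'].iloc[0]                      # 11.5 * 1.1 = 12.65
overall = rated * (1 - row['mean_D'].iloc[0] / 100)                  # 12.65 * 0.85 = 10.7525

print(round(overall, 2))  # → 10.75
```

Every factor is multiplicative, so the order of the steps does not change the final score. Each step only rescales the previous one.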

### Mixing The Estimators

Finally, as an experiment, I am going to combine two of these estimators: two players chosen by the Kill + Assist estimator and three chosen by the overall estimator. The idea is to have two people who get a lot of kills and three well-balanced players.

In [None]:
print("When accounting for kills and assists for two people and kills, assists, consistency, HLTV rating, and deaths for the other three:")
value = maximize(friend_stats_joined, target_maps, [kill_assist_estimator, kill_assist_estimator, overall_estimator, overall_estimator, overall_estimator])

#### Results

This is really just an experiment, and the two/three split is somewhat arbitrary. Still, it makes sense to want a couple of people who get kills and some more consistent players to back them up. I think these compositions would actually work, so I'd say this is a valid estimator.
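The mixed run depends on `maximize` filling the slots in order and using `methods[i]` for slot `i`, so the kill + assist slots get first pick. The pure-Python sketch below shows that ordering effect. It uses three hypothetical players with made-up stats, two slots instead of five, and simplified stand-in estimators. `greedy_pick` is an illustrative name. It follows the same greedy idea as `maximize` but is not the notebook's code.

```python
def greedy_pick(players, methods):
    """Fill one slot per estimator, in order; each slot takes the best remaining player."""
    remaining = dict(players)
    chosen = []
    for score in methods:
        if not remaining:
            break
        best = max(remaining, key=lambda name: score(remaining[name]))
        chosen.append(best)
        del remaining[best]
    return chosen

# hypothetical per-player averages: kills, assists, std of kills
players = {
    'fragger': {'K': 24, 'A': 3, 'std_K': 9},
    'anchor':  {'K': 17, 'A': 6, 'std_K': 2},
    'support': {'K': 15, 'A': 9, 'std_K': 3},
}

def kills_assists(p):
    return p['K'] + 0.5 * p['A']

def steady(p):
    return kills_assists(p) / p['std_K'] ** 0.5

print(greedy_pick(players, [kills_assists, steady]))  # → ['fragger', 'anchor']
print(greedy_pick(players, [steady, steady]))         # → ['anchor', 'support']
```

With only the consistency-based estimator, the streaky fragger never makes the team. Reserving the first slot for a kill-focused estimator guarantees that player a spot.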

## Conclusion And Final Thoughts

If I were to actually use these teams in competitive matchmaking, I am sure they would do well, as these are generally just the most skilled people out of my friends. I think the latter teams (the ones from the overall and mixed estimators) would be the best, but really it isn't as simple as this analysis makes it out to be. It comes down to who is good at which angles on which maps and who can fulfill which roles on the team, and raw numbers like these cannot properly represent those things. If a single low-ranked college kid could predict the optimal team, every professional team would run like clockwork. Again, though, I think these team compositions make sense and are generally useful for trying to create the optimal team amongst your friends. I actually had a lot of fun doing this project, showing all the statistics to my friends, and enjoying the chaos it caused among them. All it takes is manually downloading the HTML of their stats page and following the tutorial. Then you too can make your friends angry by saying you're statistically better than them.