This code defines a Python class called "Player", which represents a player in a video game. The class has a dictionary attribute called "Player_Statistics", declared in the class body, which holds the player's statistics: in-game name, real name, current team, age, total kills, headshot percentage, and so on.

The "__init__" method sets the "Ingame_Name" statistic of a new player object to the in-game name passed as a parameter when the object is created.

The "updateStat" method is used to update a specific statistic for the player. It takes two parameters: "statname" (the name of the statistic to update) and "value" (the new value to assign to the statistic). This method updates the corresponding entry in the "Player_Statistics" dictionary for the player object with the new value.

(The first part of code on data scraping from individual player profiles is inspired by https://github.com/JHitchiner/PyHLTV)

In [61]:
import requests
from lxml import html
from bs4 import BeautifulSoup
import pandas as pd
import urllib3

In [106]:
class Player:
	# Player Statistics dictionary
	Player_Statistics = {
	"Ingame_Name" : '',
	"IRL_Name" : '',
	"Current_Team" : '',
	"Age" : 0,
	"Total_Kills" : 0,
	"Headshot_Perc" : 0,
	"Total_Deaths" : 0,
	"KD_Ratio" : 0,
	"Damage_Per_Round" : 0,
	"Nade_Damage_Per_Round" : 0,
	"Maps_Played" : 0,
	"Rounds_Played" : 0,
	"Kills_Per_Round" : 0,
	"Assists_Per_Round" : 0,
	"Deaths_Per_Round" : 0,
	"Saved_By_Teammates_Per_Round" : 0,
	"Saved_Teammates_Per_Round" : 0,
	"Rating" : 0
	}

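	def __init__(self, igname):
		# Copy the class-level defaults so each player gets its own dictionary
		self.Player_Statistics = dict(Player.Player_Statistics)
		self.Player_Statistics["Ingame_Name"] = igname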

	def updateStat(self, statname, value):
		# Takes a stat name and a value and updates that stat for the player
		self.Player_Statistics[statname] = value
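
A dictionary declared in a class body is a single object shared by every instance, unless each instance makes its own copy. The sketch below illustrates the difference with two hypothetical classes, SharedStats and OwnStats, which are not part of the notebook:

```python
class SharedStats:
    # Class-level dict: one object shared by every instance
    stats = {"name": ""}

    def __init__(self, name):
        self.stats["name"] = name  # mutates the shared dict


class OwnStats:
    # Class-level dict used only as a template of default values
    DEFAULTS = {"name": ""}

    def __init__(self, name):
        self.stats = dict(self.DEFAULTS)  # fresh copy per instance
        self.stats["name"] = name


a, b = SharedStats("s1mple"), SharedStats("ZywOo")
shared_names = (a.stats["name"], b.stats["name"])  # both show the last name set

c, d = OwnStats("s1mple"), OwnStats("ZywOo")
own_names = (c.stats["name"], d.stats["name"])  # each keeps its own name
```

With a shared dictionary, any statistic that a later page fails to overwrite would silently carry over from the previously scraped player.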

This code defines a function called parseHTLVhtml, which takes a response object r (as returned by requests.get) and returns a dictionary of player statistics. The function checks whether the status code of the response is 200 (OK); if it is not, it prints a message saying that the HTML could not be parsed. For a 200 response, the function extracts the player's statistics from the HTML content, stores them in a Player object, and returns that object's statistics dictionary.

The html module from lxml builds a parse tree from the content of the response. The function uses XPath expressions to extract the player's in-game name, real name, current team, age, total kills, headshot percentage, total deaths, kill-death ratio, damage per round, grenade damage per round, maps played, rounds played, kills per round, assists per round, deaths per round, saves by teammates per round, teammates saved per round, and rating. The name, team, and age are located by their class attributes; the remaining statistics use absolute XPath paths, so they depend on the exact layout of the page. Each value is stored in the Player object with the updateStat method. At the end, the function copies the Player_Statistics dictionary and returns the copy.

In [107]:
def parseHTLVhtml(r):

	# Takes in a response object and returns a dictionary of player statistics

	# Check to see if got 200 response back, if not end parsing
	if r.status_code != 200:
		print("Unable to parse html, got status code", r.status_code)
		return {}  # nothing to parse, so return an empty statistics dictionary
	# Continue if got 200 response
	# Convert response to structured format, create new player object, fill statistics
	tree = html.fromstring(r.content)
	igname_list = tree.xpath('//h1[@class="summaryNickname text-ellipsis"]/text()')
	igname = igname_list[0] if len(igname_list) > 0 else ''

	player = Player(igname) # Create player object to fill in stats

	# Now have player object, time to fill in all statistics
	irlname_list = tree.xpath('//div[@class="text-ellipsis"]/text()')
	irlname = irlname_list[0] if len(irlname_list) > 0 else ''
	player.updateStat("IRL_Name", irlname)

	team_list = tree.xpath('//a[@class="a-reset text-ellipsis"]/text()')
	if len(team_list) > 0:
		currteam = team_list[0]
	else:
		currteam = "N/A"

	player.updateStat("Current_Team", currteam)

	age_list = tree.xpath('//div[@class="summaryPlayerAge"]/text()')
	age = age_list[0] if len(age_list) > 0 else ''
	player.updateStat("Age", age)

	totkills_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[1]/div[1]/span[2]/text()')
	totkills = totkills_list[0] if len(totkills_list) > 0 else ''
	player.updateStat("Total_Kills", totkills)

	hsperc_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[1]/div[2]/span[2]/text()')
	hsperc = hsperc_list[0] if len(hsperc_list) > 0 else ''
    
	player.updateStat("Headshot_Perc", hsperc)

	totdeath_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[1]/div[3]/span[2]/text()')
	totdeath = totdeath_list[0] if len(totdeath_list)>0 else ''
	player.updateStat("Total_Deaths", int(totdeath))

	kdratio_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[1]/div[4]/span[2]/text()')
	kdratio = kdratio_list[0] if len(kdratio_list)>0 else ''
	player.updateStat("KD_Ratio", float(kdratio))

	dmgpround_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[1]/div[5]/span[2]/text()')
	dmgpround = dmgpround_list[0] if len(dmgpround_list)>0 else ''
	player.updateStat("Damage_Per_Round", float(dmgpround))

	nadedmgpround_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[1]/div[6]/span[2]/text()')
	nadedmgpround = nadedmgpround_list[0] if len(nadedmgpround_list)>0 else ''
	player.updateStat("Nade_Damage_Per_Round", float(nadedmgpround))

	mapsplayed_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[1]/div[7]/span[2]/text()')
	mapsplayed = mapsplayed_list[0] if len(mapsplayed_list)>0 else ''
	player.updateStat("Maps_Played", int(mapsplayed))

	roundsplayed_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[2]/div[1]/span[2]/text()')
	roundsplayed = roundsplayed_list[0] if len(roundsplayed_list)>0 else ''
	player.updateStat("Rounds_Played", int(roundsplayed))

	killspround_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[2]/div[2]/span[2]/text()')
	killspround = killspround_list[0] if len(killspround_list)>0 else ''
	player.updateStat("Kills_Per_Round", float(killspround))

	assistsperround_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[2]/div[3]/span[2]/text()')
	assistsperround = assistsperround_list[0] if len(assistsperround_list)>0 else ''
	player.updateStat("Assists_Per_Round", float(assistsperround))

	deathsperround_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[2]/div[4]/span[2]/text()')
	deathsperround = deathsperround_list[0] if len(deathsperround_list)>0 else ''
	player.updateStat("Deaths_Per_Round", float(deathsperround))

	savedbyteam_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[2]/div[5]/span[2]/text()')
	savedbyteam = savedbyteam_list[0] if len(savedbyteam_list)>0 else ''
	player.updateStat("Saved_By_Teammates_Per_Round", float(savedbyteam))

	savedteam_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[2]/div[6]/span[2]/text()')
	savedteam = savedteam_list[0] if len(savedteam_list)>0 else ''
	player.updateStat("Saved_Teammates_Per_Round", float(savedteam))

	rating_list = tree.xpath('/html/body/div[2]/div/div[2]/div[1]/div/div[8]/div/div[2]/div[7]/span[2]/text()')
	rating = rating_list[0] if len(rating_list)>0 else ''
	player.updateStat("Rating", float(rating))

	# Create a copy of the player.Player_Statistics dictionary
	stats_dict = dict(player.Player_Statistics)

	return stats_dict
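
In parseHTLVhtml, a missing XPath match falls back to an empty string, and int('') or float('') raises a ValueError. A minimal sketch of one way to guard those conversions is shown below; the helper names first_or_default and to_number are hypothetical and do not appear in the notebook:

```python
def first_or_default(matches, default=""):
    """Return the first XPath match, or a default when the list is empty."""
    return matches[0] if matches else default


def to_number(text, cast=float, default=None):
    """Convert scraped text such as '41.5%' or '25 years' to a number.

    Returns `default` instead of raising when the text is empty or malformed.
    """
    cleaned = text.strip().rstrip("%")
    parts = cleaned.split()
    # Keep only the leading token so '25 years' becomes '25'
    token = parts[0] if parts else ""
    try:
        return cast(token)
    except ValueError:
        return default


headshot = to_number("41.5%")            # percentage sign stripped
age = to_number("25 years", cast=int)    # trailing unit dropped
missing = to_number(first_or_default([]), cast=int)  # empty match list
```

Under this assumption, a call such as player.updateStat("Total_Deaths", to_number(totdeath, cast=int)) would store None for a missing value instead of stopping the whole scrape.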

The code defines a function scrapePlayer that takes a single argument, the URL of a player's page as a string. It sends a GET request to that URL with the requests library, passing the HEADERS dictionary so that the request carries a browser User-Agent, which HLTV requires before it accepts a connection. The verify=False parameter skips SSL certificate verification, because the request otherwise failed with an SSL verification error. The response is then passed to parseHTLVhtml, and the parsed player statistics are returned.

This function is designed to scrape data about CS:GO players from the HLTV website, given a URL that points to a specific player's statistics page.

In [114]:
# All players' dataset
BASEURL = 'https://www.hltv.org/stats/players/'
# Need user agent cause HLTV won't accept any connection without one
HEADERS = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'}

def scrapePlayer(url):
    # SSL verification is skipped because the request raised an SSL verification error otherwise
    response = requests.get(url, headers=HEADERS, verify=False)
    player = parseHTLVhtml(response)
    return player

This code defines a function scrapePlayers that takes a URL as input. It sends a GET request to the URL using the requests library and parses the HTML of the response with BeautifulSoup. The CSS selector '.playerCol a' selects every link (a element) inside an element with the class playerCol, and these links form the list player_links. For each link, the function builds the full URL of the player's page by prefixing the link's href with 'https://www.hltv.org', calls scrapePlayer on that URL, and appends the resulting statistics to the list player_stats_list. Finally, the function returns player_stats_list, which holds the statistics of every player whose link was found on the input page.

In [109]:
def scrapePlayers(url):
    # Send the same User-Agent and SSL settings as scrapePlayer
    res = requests.get(url, headers=HEADERS, verify=False)
    soup = BeautifulSoup(res.content, 'html.parser')
    player_links = soup.select('.playerCol a')
    player_stats_list = []
    for link in player_links:
        player_url = 'https://www.hltv.org' + link['href']
        player_stats = scrapePlayer(player_url)
        player_stats_list.append(player_stats)
    return player_stats_list
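
The player URL above is built by string concatenation, which assumes that every href is a path relative to the site root. A small sketch using the standard library's urljoin, which also leaves an already absolute href unchanged, is given below; the helper name absolute_url and the example path are hypothetical:

```python
from urllib.parse import urljoin

SITE_ROOT = "https://www.hltv.org"  # same host the notebook hard-codes


def absolute_url(href, base=SITE_ROOT):
    """Resolve an href from the players table against the site root."""
    return urljoin(base, href)


# Hypothetical hrefs, for illustration only
relative = absolute_url("/stats/players/123/example-player")
already_absolute = absolute_url("https://www.hltv.org/stats/players/123/example-player")
```

Both calls resolve to the same absolute URL, whereas plain concatenation would prepend the host a second time in the latter case.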

The code first disables the warnings that the urllib3 library emits for requests made without SSL verification.

Then it calls scrapePlayers with BASEURL as its argument. As described above, scrapePlayers fetches that page, extracts the player links, and calls scrapePlayer for each one.

The returned list, all_data, contains one Python dictionary of statistics for each player linked from the BASEURL page.

In [110]:
urllib3.disable_warnings()

all_data = scrapePlayers(BASEURL)

In [111]:
# Make sure every entry is a plain dictionary (each entry is already a dict, so this only copies it)
all_data = [dict(player_stats) for player_stats in all_data]

In [112]:
# Create pandas dataframe with more readable column names
df = pd.DataFrame(all_data)
df.columns = ['Ingame Name', 'Real Name', 'Current Team', 'Age', 'Total Kills', 'Headshot %', 'Total Deaths', 'K/D Ratio', 'Damage / Round', 'Grenade Damage / Round', 'Maps Played', 'Rounds Played', 'Kills / Round', 'Assists / Round', 'Deaths / Round', 'Saved by Teammates / Round', 'Saved Teammates / Round', 'Rating']

Unnamed: 0,Ingame Name,Real Name,Current Team,Age,Total Kills,Headshot %,Total Deaths,K/D Ratio,Damage / Round,Grenade Damage / Round,Maps Played,Rounds Played,Kills / Round,Assists / Round,Deaths / Round,Saved by Teammates / Round,Saved Teammates / Round,Rating
0,s1mple,Oleksandr Kostyliev,Natus Vincere,25 years,9528,41.5%,7163,1.33,84.3,2.6,433,11460,0.83,0.1,0.63,0.08,0.11,1.25
1,ZywOo,Mathieu Herbaut,Vitality,22 years,5176,39.5%,3933,1.32,84.4,3.6,236,6353,0.81,0.11,0.62,0.1,0.12,1.27
2,coldzera,Marcelo David,00NATION,28 years,5762,48.1%,4768,1.21,79.6,1.9,283,7525,0.77,0.11,0.63,0.07,0.1,1.14
3,NiKo,Nikola Kovač,G2,26 years,8026,50.4%,6920,1.16,85.2,4.5,384,10373,0.77,0.13,0.67,0.08,0.1,1.18
4,device,Nicolai Reedtz,Astralis,27 years,11361,32.9%,9657,1.18,78.1,3.9,572,15142,0.75,0.11,0.64,0.08,0.08,1.12
5,ropz,Robin Kool,FaZe,23 years,4801,51.4%,4187,1.15,75.9,2.3,251,6677,0.72,0.11,0.63,0.06,0.09,1.1
6,GuardiaN,Ladislav Kovács,,31 years,6169,28.0%,5574,1.11,71.2,2.7,332,8751,0.7,0.1,0.64,0.09,0.09,1.05
7,Twistzz,Russel Van Dulken,FaZe,23 years,5346,62.1%,4895,1.09,73.3,3.7,290,7793,0.69,0.11,0.63,0.1,0.11,1.08
8,electroNic,Denis Sharipov,Natus Vincere,24 years,6830,48.7%,6399,1.07,79.9,5.0,365,9731,0.7,0.14,0.66,0.09,0.09,1.11
9,Magisk,Emil Reif,Vitality,24 years,8382,47.7%,7874,1.06,77.9,6.0,452,12129,0.69,0.14,0.65,0.08,0.09,1.09
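
The column names above are assigned by position, so they only line up if the dictionary keys keep their original order and count. A sketch of renaming by an explicit mapping is shown here with plain dictionaries; the names COLUMN_NAMES and rename_keys are hypothetical, and only three of the eighteen keys are mapped for brevity. With pandas, the same mapping could be passed to df.rename(columns=COLUMN_NAMES).

```python
# Partial mapping from Player_Statistics keys to readable column names
COLUMN_NAMES = {
    "Ingame_Name": "Ingame Name",
    "Headshot_Perc": "Headshot %",
    "KD_Ratio": "K/D Ratio",
}


def rename_keys(row, mapping):
    """Return a copy of `row` with keys renamed; unmapped keys are kept as-is."""
    return {mapping.get(key, key): value for key, value in row.items()}


# Values taken from the first row of the output above
row = {"Ingame_Name": "s1mple", "Headshot_Perc": "41.5%", "KD_Ratio": 1.33, "Rating": 1.25}
renamed = rename_keys(row, COLUMN_NAMES)
```

Because each name is looked up by key, a missing or reordered statistic cannot shift the remaining labels onto the wrong columns.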


In [113]:
# stores the dataframe in a csv file named 'Updated_Scrape_Data.csv'
df.to_csv('Updated_Scrape_Data.csv', index=False)