Import the necessary libraries to scrape player stats. These include Selenium and BeautifulSoup.

In [1]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

I found a bunch of seed usernames, one for almost every rank (Iron 1 is the only rank without a seed). These are needed to find the usernames and stats of other players in the same or nearby ranks! I manually (yes, painstakingly) grabbed them from https://tracker.gg/valorant/lfg (use the 'Rank' filter).

Note that we only need a player's username (e.g. IshyWishy#7495) to scrape all their stats.
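Since the username is all we need, here is a minimal sketch of turning one into a profile URL. The URL pattern is the one used in the cells below; the names `BASE_URL` and `profile_url` are hypothetical and not part of the original notebook. It uses `urllib.parse.quote` so that spaces and non-ASCII characters (as in 'Keem あ#1st') are percent-encoded along with the '#'.

```python
from urllib.parse import quote

# Assumption: same URL pattern as the scraping cells in this notebook
BASE_URL = "https://tracker.gg/valorant/profile/riot"

def profile_url(username, page="overview"):
    """Build a profile URL for a full username such as 'IshyWishy#7495'.

    quote(..., safe='') percent-encodes '#', spaces and non-ASCII characters.
    """
    return f"{BASE_URL}/{quote(username, safe='')}/{page}"

print(profile_url("IshyWishy#7495"))
print(profile_url("SinS Kalper#Kal16", page="matches"))
```

The cells below only replace '#' with '%23' and rely on the browser to encode the rest; the helper just makes the encoding explicit.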

In [2]:
usernames = [
    'HabibiWasHere#8127',  # Iron 2
    'choraprajett#S4tsu',  # Iron 3
    'shaHIM#0911',         # Bronze 1
    'roza#meow',           # Bronze 2
    'hawaiiicattt#sigma',  # Bronze 3
    'zezosk#6969',         # Silver 1
    'Orgito29#ALBOZ',      # Silver 2
    'yuuviix#TTV',         # Silver 3
    'AllyThotti#999',      # Gold 1
    'TabletOnly#LN7',      # Gold 2
    'SinS Kalper#Kal16',   # Gold 3
    'DANIELMERS#2505',     # Platinum 1
    'StupidKaj#simp',      # Platinum 2
    'LoserUri#momo',       # Platinum 3
    'Kattu Poochi#Molly',  # Diamond 1
    'Valkyzek#kydae',      # Diamond 2
    'lia#εïз',             # Diamond 3
    'Zheng#0511',          # Ascendant 1
    'Nalapaya#UwU',        # Ascendant 2
    'saranghae#666',       # Ascendant 3
    'General#360zz',       # Immortal 1
    'pierce#agne',         # Immortal 2
    'Keem あ#1st',         # Immortal 3
    'curry#lisa'           # Radiant
]

These are the stat names that we are going to scrape. We won't be using all of them to predict a player's rank, but they are shown here for illustrative purposes, since we scrape all of them anyway.

In [3]:
stat_names = ['Damage/Round', 'K/D Ratio', 'Headshot %', 'Win %',
              'Wins', 'KAST', 'DDΔ/Round', 'Kills', 'Deaths', 'Assists',
              'ACS', 'KAD Ratio', 'Kills/Round', 'First Bloods',
              'Flawless Rounds', 'Aces']

Here is what a mini-scrape looks like! The scraped stats for 2 players are printed below. We also use 2 utility functions, find_rank and find_highlighted_stats, to parse the soup and give us the required data in the right format.

In [4]:
from utils import find_rank, find_highlighted_stats

for username in usernames[:2]:
    with webdriver.Chrome() as driver:
        print("Scraping", username)
        # Scrape from '/overview'
        driver.get(f"https://tracker.gg/valorant/profile/riot/{username.replace('#', '%23')}/overview")
        page_source = driver.page_source
    
    # Parse HTML
    soup = BeautifulSoup(page_source, 'html.parser')
    rank = find_rank(soup)
    others = find_highlighted_stats(soup)
    print(rank, others)
    print()

Scraping HabibiWasHere#8127
Iron 2 {'Damage/Round': '154.5', 'K/D Ratio': '1.12', 'Headshot %': '12.7%', 'Win %': '41.2%', 'Wins': '14', 'KAST': '71.8%', 'DDΔ/Round': '13', 'Kills': '574', 'Deaths': '512', 'Assists': '185', 'ACS': '240.6', 'KAD Ratio': '1.48', 'Kills/Round': '0.8', 'First Bloods': '88', 'Flawless Rounds': '21', 'Aces': '1'}

Scraping choraprajett#S4tsu
Iron 3 {'Damage/Round': '149.1', 'K/D Ratio': '1.03', 'Headshot %': '19.7%', 'Win %': '45.0%', 'Wins': '9', 'KAST': '71.6%', 'DDΔ/Round': '6', 'Kills': '317', 'Deaths': '307', 'Assists': '122', 'ACS': '221.2', 'KAD Ratio': '1.43', 'Kills/Round': '0.8', 'First Bloods': '19', 'Flawless Rounds': '19', 'Aces': '1'}
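Notice that every scraped value above is a string, and some carry a '%' sign. Below is a minimal sketch of converting them to numbers before modelling. The helper name `to_number` is hypothetical, and the handling of thousands separators is an assumption (none appear in the sample output above).

```python
def to_number(raw):
    """Convert a scraped stat string such as '12.7%' or '1,234' to a float."""
    # Assumption: values may carry a trailing '%' or thousands separators
    cleaned = raw.strip().rstrip('%').replace(',', '')
    return float(cleaned)

# A few values copied from the first player's output above
scraped = {'Damage/Round': '154.5', 'Headshot %': '12.7%', 'Kills': '574'}
numeric = {name: to_number(value) for name, value in scraped.items()}
print(numeric)
```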



Here are all the possible ranks that a player can belong to. A player belongs to exactly one rank at any given time. The ranks are sorted from lowest (Iron 1) to highest (Radiant).

We intend to scrape the stats of exactly 5000 players per rank. We initialize rank_counts to all zeroes before we start scraping.

In [5]:
ranks = ["Iron 1", "Iron 2", "Iron 3",
         "Bronze 1", "Bronze 2", "Bronze 3",
         "Silver 1", "Silver 2", "Silver 3",
         "Gold 1", "Gold 2", "Gold 3",
         "Platinum 1", "Platinum 2", "Platinum 3",
         "Diamond 1", "Diamond 2", "Diamond 3",
         "Ascendant 1", "Ascendant 2", "Ascendant 3",
         "Immortal 1", "Immortal 2", "Immortal 3",
         "Radiant"]

rank_counts = { rank: 0 for rank in ranks }
rank_counts

{'Iron 1': 0,
 'Iron 2': 0,
 'Iron 3': 0,
 'Bronze 1': 0,
 'Bronze 2': 0,
 'Bronze 3': 0,
 'Silver 1': 0,
 'Silver 2': 0,
 'Silver 3': 0,
 'Gold 1': 0,
 'Gold 2': 0,
 'Gold 3': 0,
 'Platinum 1': 0,
 'Platinum 2': 0,
 'Platinum 3': 0,
 'Diamond 1': 0,
 'Diamond 2': 0,
 'Diamond 3': 0,
 'Ascendant 1': 0,
 'Ascendant 2': 0,
 'Ascendant 3': 0,
 'Immortal 1': 0,
 'Immortal 2': 0,
 'Immortal 3': 0,
 'Radiant': 0}
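The scraping loop later in this notebook calls should_stop(rank_counts), but its definition is not shown here. A minimal sketch of what it might look like, assuming it should return True only once every rank has reached the 5000-player target (the constant name `TARGET_PER_RANK` is hypothetical):

```python
TARGET_PER_RANK = 5000  # target per rank, as stated in the text above

def should_stop(rank_counts, target=TARGET_PER_RANK):
    """Return True once every rank has at least `target` scraped players."""
    return all(count >= target for count in rank_counts.values())

# Tiny demo with two ranks only
demo_counts = {"Iron 1": 5000, "Iron 2": 4999}
print(should_stop(demo_counts))  # False, Iron 2 is one short
demo_counts["Iron 2"] += 1
print(should_stop(demo_counts))  # True
```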

We first write all the stat names on a single line in a new CSV file. This will serve as the row of column names, and will be useful for when we use pandas for visualization later. Run the cell block below only once, as it overwrites existing data (learnt this the hard way).

In [None]:
import csv

with open('stats.csv', mode='w', newline='', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(stat_names)
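To avoid wiping the file by accident, one option is to write the header only when the file does not exist yet. This is a sketch rather than what the notebook actually does, and `write_header_once` is a hypothetical helper name.

```python
import csv
import os

def write_header_once(path, columns):
    """Write the header row only if `path` does not exist yet.

    Returns True if the header was written, False if the file was left alone.
    """
    if os.path.exists(path):
        return False
    with open(path, mode='w', newline='', encoding='utf-8') as file:
        csv.writer(file).writerow(columns)
    return True
```

With this in place, calling write_header_once('stats.csv', stat_names) a second time leaves the already-scraped rows untouched.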

Now that our stats CSV file is ready to be populated, we can begin scraping. Our strategy is as follows.

We start with our initial list of seed users and scrape their stats one at a time. For each player, we then look at the first (most recent) match in their history, which gives us the names of 9 more players of similar rank! Valorant is a 5v5 game, so each match has 10 players: the one we started from plus 9 others. We look at only one match per player, as this gives us enough players to keep BFS-ing. Because the game's matchmaking puts players of similar rank into the same match, a single username lets us discover many more usernames in and around that rank.
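The BFS idea can be sketched on toy data, with no browser involved. Everything here is hypothetical: `LATEST_MATCH` stands in for the match pages and `bfs_usernames` is not a function from this notebook. The `seen` set is also an addition of this sketch: it keeps a player who shows up in several matches from being queued twice.

```python
from collections import deque

# Hypothetical toy data: each player maps to the other players in their latest match
LATEST_MATCH = {
    "seed#1": ["a#1", "b#1"],
    "a#1": ["seed#1", "c#1"],
    "b#1": ["a#1", "c#1"],
    "c#1": [],
}

def bfs_usernames(seeds, get_neighbours, limit):
    """Visit up to `limit` distinct usernames in breadth-first order."""
    pending = deque(seeds)
    seen = set(seeds)
    visited = []
    while pending and len(visited) < limit:
        username = pending.popleft()
        visited.append(username)      # the real loop would scrape stats here
        for other in get_neighbours(username):
            if other not in seen:     # skip players already queued or visited
                seen.add(other)
                pending.append(other)
    return visited

print(bfs_usernames(["seed#1"], lambda u: LATEST_MATCH.get(u, []), limit=10))
# ['seed#1', 'a#1', 'b#1', 'c#1']
```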

We first import Python's built-in queue module, which gives us the queue that drives our BFS.

In [6]:
import queue

Now we initialize a queue and add the list of seed users into the queue.

In [7]:
username_queue = queue.Queue()
for username in usernames:
    username_queue.put(username)

We define a helper function that scrapes 9 more usernames (players) by looking at one match of the given username. We make use of the '/matches' page for this.

In [8]:
def get_usernames_from_match_history(username):
    with webdriver.Chrome() as driver:
        # Open the matches page
        driver.get(f"https://tracker.gg/valorant/profile/riot/{username.replace('#', '%23')}/matches")
        
        # Put everything in a try block in case it fails, return an empty list (no more usernames) in this case
        try:
            # To get the first match details, we need to find the corresponding row and click on it
            first_match_element = WebDriverWait(driver, 10).until(
                EC.visibility_of_element_located(
                    (By.XPATH, "//div[contains(@class, 'vmr trn-match-row')]")))
            
            # Clickable now
            first_match_element.click()
            
            # Extract page source
            page_source = driver.page_source
            
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            # Could not find more usernames :(
            return []
    
    # Find usernames
    soup = BeautifulSoup(page_source, 'html.parser')
    
    # Find span tags that contain usernames
    username_spans = soup.find_all('span', class_=lambda x: x and 'trn-ign__username fit-long-username' in x) 
    usernames = [span.text for span in username_spans]
    
    # Find span tags that contain the discriminator (part after (including) the #)
    # For example, if IshyWishy#7495 is the full username, IshyWishy is the username and #7495 is the discriminator
    discriminator_spans = soup.find_all('span', class_=lambda x: x and 'trn-ign__discriminator' in x) 
    discriminators = [span.text for span in discriminator_spans]
    
    # Combine usernames and discriminators to get the full username
    # zip pairs the two lists and stops at the shorter one, so a missing span can't raise IndexError
    usernames_with_disc = [name + disc for name, disc in zip(usernames, discriminators)]
    
    return usernames_with_disc
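If the match page lists all ten players, the returned list may also contain the player we started from, and the span text may carry stray whitespace. Both are assumptions about the page, not something verified here. A small sketch of cleaning up the pairs, using a hypothetical helper `combine_riot_ids`:

```python
def combine_riot_ids(names, discriminators, exclude=None):
    """Pair names with discriminators, strip outer whitespace,
    and drop duplicates as well as the `exclude` username."""
    combined = []
    for name, disc in zip(names, discriminators):
        full = name.strip() + disc.strip()
        if full != exclude and full not in combined:
            combined.append(full)
    return combined

print(combine_riot_ids([" IshyWishy ", "roza", "roza"],
                       ["#7495", "#meow", "#meow"],
                       exclude="IshyWishy#7495"))
# ['roza#meow']
```

Only the outer whitespace is stripped, so names with an internal space such as 'SinS Kalper' stay intact.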

Now that we have the helper/utility functions in place, let's start scraping! Remember, we keep going until we have exactly 5000 players from every rank!

In [9]:
from utils import write_to_stats_csv

def scrape_stats():
    '''
    Scrapes everything until the stopping condition is met!
    '''
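    # NOTE: should_stop is not defined in this notebook; it must be defined or imported before running.
    # It is assumed to return True once every rank has reached the 5000-player limit.
    while not should_stop(rank_counts) and not username_queue.empty():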
        # Get the username of the player whose stats we're about to scrape
        username = username_queue.get()
        
        # Try in case something goes wrong
        try:
            with webdriver.Chrome() as driver:
                # Use the /overview page as shown earlier
                driver.get(f"https://tracker.gg/valorant/profile/riot/{username.replace('#', '%23')}/overview")
                page_source = driver.page_source
                
            soup = BeautifulSoup(page_source, 'html.parser')
            
            # Find rank first, and only write other stats to CSV if we haven't already hit the limit (5000) for this rank
            # Note that we also write the rank as this serves as the label for training our prediction model.
            rank = find_rank(soup)
            if rank_counts[rank] < 5000:
                stats = find_highlighted_stats(soup)
                write_to_stats_csv(stats, rank)
                rank_counts[rank] += 1
                
            # Get ~9 more usernames by looking at the most recent match of this user.
            more_usernames = get_usernames_from_match_history(username)
            
            # Add all these usernames to our queue to continue BFS-ing!
            for u in more_usernames:
                username_queue.put(u)
                
        except Exception:
            # Could not scrape this username :(
            # (catching Exception rather than everything keeps Ctrl+C working during a long scrape)
            continue
            
    print("Scraped a bucketload of data!")
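The write_to_stats_csv helper comes from utils and is not shown in this notebook. Purely as an illustration, a hypothetical version might append one row per player, with the rank label last. The name `append_stats_row`, its parameters, and the column order are all assumptions.

```python
import csv

def append_stats_row(path, stats, rank, columns):
    """Append one player's stats, followed by the rank label, to the CSV at `path`.

    `stats` is a dict like the one returned by find_highlighted_stats;
    stats missing from the dict are written as empty strings.
    """
    row = [stats.get(name, '') for name in columns] + [rank]
    # mode='a' appends, unlike the mode='w' used for the header, which overwrites
    with open(path, mode='a', newline='', encoding='utf-8') as file:
        csv.writer(file).writerow(row)
```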

In [10]:
# scrape_stats()

Scraped a bucketload of data!


Phew, that took a hot minute. Now that we're done scraping, let's see what our CSV data looks like in a pandas DataFrame.

In [11]:
import pandas as pd

In [12]:
file_name = 'stats.csv'
df = pd.read_csv(file_name)
df.head(7)

Unnamed: 0,Damage/Round,K/D Ratio,Headshot %,Win %,Wins,KAST %,DDΔ/Round,Kills,Deaths,Assists,ACS,KAD Ratio,Kills/Round,First Bloods,Flawless Rounds,Aces,Rank
0,106.0,1.19,12.17,34.89,9,48.65,9,444,404,126,126.99,0.89,0.79,64,13,1,Iron 1
1,102.05,1.15,13.21,33.5,10,53.19,10,460,421,129,114.37,1.01,0.84,78,15,2,Iron 2
2,103.77,1.17,13.67,32.05,10,48.92,10,484,454,133,133.68,0.89,0.85,79,15,2,Iron 3
3,107.44,1.19,13.68,39.08,10,49.37,10,465,410,157,121.3,1.01,0.83,78,17,2,Bronze 1
4,114.16,1.19,15.03,35.04,10,49.97,10,485,416,153,140.47,0.95,0.81,84,17,2,Bronze 2
5,126.92,1.14,14.8,40.81,11,52.28,11,460,467,155,123.93,1.02,0.94,82,15,2,Bronze 3
6,128.07,1.3,13.73,40.7,11,52.1,11,516,478,146,128.64,1.0,0.89,88,18,2,Silver 1


We have all our data sitting nicely in a dataframe! Onto prediction! :)