# Valorant web scraping

In this notebook I write functions that scrape match data from [vlr.gg](https://www.vlr.gg/) and assemble it into datasets, then fit a logistic regression to the result. The main purpose of this project is to get familiar with web scraping.

In [1]:
import math
import requests
import csv
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from bs4 import BeautifulSoup
from tqdm import tqdm
from sklearn import metrics
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

First I will make functions to scrape the data from vlr.gg.

## Scraping the data

Here are the functions I implemented in order to get the data from Valorant matches.

In [2]:
# Return soups with individual map stats
def get_map_soups(soup):
    return soup.find_all("div", class_="vm-stats-game", attrs={"data-game-id": lambda x: x != "all"})

# Return True if left team won the map, False otherwise
def get_map_result(map_soup):
    left_team, right_team = map_soup.find_all("div", class_="score")
    left_team = int(left_team.text.strip())
    right_team = int(right_team.text.strip())
    return left_team > right_team

# Return one list of stat tags per team - each team's combined (both sides) stats on the map
def get_team_stats_soups(map_soup):
    tables = map_soup.find_all("table")
    ret = []
    for table in tables:
        ret.append(table.find_all("span", class_="mod-both"))
    return ret

# Return dataframe with the team stats (similar to how it's shown at vlr.gg)
# Return None if Rating is not present (some matches don't have player ratings)
def get_team_stats_df(team_stats_soup):
    colnames = ["Rating", "ACS", "K", "D", "A", "KD+/-", "KAST", "ADR", "HS%", "FK", "FD", "FKD+/-"]
    data = {colname: [] for colname in colnames}
    
    i = 0
    for stat in team_stats_soup:
        idx = i % len(colnames)
        col = colnames[idx]
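        text = stat.text.strip()
        if col == "KAST" or col == "HS%":
            text = text[:-1]  # drop the trailing "%"
        elif col == "Rating":
            try:
                # Store the rating as an integer (1.44 -> 144); round() avoids
                # float truncation, e.g. int(100 * 1.15) would give 114
                text = round(100 * float(text))
            except ValueError:
                return None
        
        data[col].append(int(text))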
        i += 1
    return pd.DataFrame(data)

# Return one row of the dataframe - result and stats of both teams on one map flattened to one row
# Return None if Rating is not present
def get_dataframe_row(map_soup):
    result = np.array([get_map_result(map_soup)])
    
    left_stats_soup, right_stats_soup = get_team_stats_soups(map_soup)
    left_stats_df = get_team_stats_df(left_stats_soup)
    right_stats_df = get_team_stats_df(right_stats_soup)
    if left_stats_df is None or right_stats_df is None:
        return None

    left_stats = left_stats_df.to_numpy().flatten(order="F")
    right_stats = right_stats_df.to_numpy().flatten(order="F")
    # np.concatenate also works on NumPy versions older than 2.0, which lack np.concat
    return np.concatenate((result, left_stats, right_stats))

# Return dataframe with all map stats from a tournament (match_list_url)
def get_dataframe(match_list_url, base_url="https://www.vlr.gg"):
    r = requests.get(match_list_url)
    match_list_soup = BeautifulSoup(r.content, "lxml")
    match_anchors = match_list_soup.find_all("a", class_="match-item")

    data = []
    for anchor in tqdm(match_anchors):
        url = base_url + anchor["href"]
        r = requests.get(url)
        match_soup = BeautifulSoup(r.content, "lxml")
        map_soups = get_map_soups(match_soup)
        for map_soup in map_soups:
            row = get_dataframe_row(map_soup)
            if row is not None:
                data.append(row)

    df = pd.DataFrame(data)
    df = df.rename(columns={0: "left_team_won"})
    return df
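
The functions above call `requests.get` directly, without a timeout or a status check. A more defensive fetch step could look like the sketch below. `fetch_soup` is a hypothetical helper that is not part of the original code, and the one-second delay and ten-second timeout are arbitrary assumptions. It uses the built-in `"html.parser"` so that it runs without `lxml`.

```python
import time

import requests
from bs4 import BeautifulSoup


def fetch_soup(url, session=None, delay=1.0, timeout=10):
    """Download a page and parse it (hypothetical helper, not in the original code)."""
    if session is None:
        session = requests.Session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    time.sleep(delay)  # assumed pause so consecutive requests are spaced out
    return BeautifulSoup(response.content, "html.parser")
```

Passing one shared `requests.Session` to every call would let the connection be reused across the match pages of a tournament.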

Let's see how each function works. I will use the [KRÜ vs. FURIA](https://www.vlr.gg/596400/kr-esports-vs-furia-vct-2026-americas-kickoff-ur1) match as an example.

In [3]:
# Get soup with the match page
url = "https://www.vlr.gg/596400/kr-esports-vs-furia-vct-2026-americas-kickoff-ur1"
r = requests.get(url)
soup = BeautifulSoup(r.content, "lxml")

In [4]:
map_soups = get_map_soups(soup)
len(map_soups)

3

The match had three maps, so the number of map soups is 3.

In [5]:
for map_soup in map_soups:
    print("Left team won:", get_map_result(map_soup))

Left team won: True
Left team won: False
Left team won: False


The left team (KRÜ) won only the first map; the right team (FURIA) won the second and third maps.
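
To see the selection logic without a network request, here is a small sketch on made-up markup. The HTML below is an assumption that only imitates the class and attribute names the functions rely on, and it is not copied from vlr.gg. The filter also guards against tags that have no `data-game-id` attribute at all.

```python
from bs4 import BeautifulSoup

# Made-up markup that imitates the structure the scraper relies on
html = """
<div class="vm-stats-game" data-game-id="101">
  <div class="score">13</div><div class="score">9</div>
</div>
<div class="vm-stats-game" data-game-id="all"></div>
<div class="vm-stats-game" data-game-id="102">
  <div class="score"> 7 </div><div class="score">13</div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only the per-map containers and skip the one marked "all"
map_soups = soup.find_all(
    "div",
    class_="vm-stats-game",
    attrs={"data-game-id": lambda value: value is not None and value != "all"},
)

results = []
for map_soup in map_soups:
    left, right = (int(tag.text.strip()) for tag in map_soup.find_all("div", class_="score"))
    results.append(left > right)

print(len(map_soups), results)  # prints: 2 [True, False]
```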

In [6]:
team_stats_soups = get_team_stats_soups(map_soups[0])
print("No. of team stats soups:", len(team_stats_soups))
print("No. of stats for one team", len(team_stats_soups[0]))

No. of team stats soups: 2
No. of stats for one team 60


Each team has its own soup with the stats. Each team has 5 players with 12 stats each, so one soup holds 5 × 12 = 60 values.
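
The way `get_team_stats_df` sorts these 60 values into 12 columns can be sketched with stand-in numbers. The values 0 to 59 below are placeholders for the scraped tags, assumed to arrive player by player as on the page.

```python
colnames = ["Rating", "ACS", "K", "D", "A", "KD+/-", "KAST", "ADR", "HS%", "FK", "FD", "FKD+/-"]
n_players = 5

# Stand-in values 0..59 in place of the 60 scraped <span> elements
flat_values = list(range(n_players * len(colnames)))

columns = {name: [] for name in colnames}
for i, value in enumerate(flat_values):
    # The position within a player's row decides the column
    columns[colnames[i % len(colnames)]].append(value)

print(columns["Rating"])  # [0, 12, 24, 36, 48]
print(columns["FKD+/-"])  # [11, 23, 35, 47, 59]
```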

In [7]:
get_team_stats_df(team_stats_soups[0])

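| | Rating | ACS | K | D | A | KD+/- | KAST | ADR | HS% | FK | FD | FKD+/- |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 144 | 220 | 18 | 14 | 12 | 4 | 83 | 126 | 25 | 0 | 0 | 0 |
| 1 | 100 | 184 | 15 | 16 | 4 | -1 | 74 | 118 | 32 | 3 | 3 | 0 |
| 2 | 86 | 116 | 9 | 14 | 5 | -5 | 61 | 84 | 15 | 0 | 2 | -2 |
| 3 | 84 | 205 | 15 | 19 | 6 | -4 | 70 | 137 | 29 | 5 | 4 | 1 |
| 4 | 71 | 146 | 14 | 19 | 2 | -5 | 52 | 100 | 35 | 2 | 4 | -2 |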


We have a dataframe with the team stats similar to how they are displayed at vlr.gg.

<img src="img/teamstats.png">

In [8]:
get_dataframe_row(map_soups[0])

array([  1, 144, 100,  86,  84,  71, 220, 184, 116, 205, 146,  18,  15,
         9,  15,  14,  14,  16,  14,  19,  19,  12,   4,   5,   6,   2,
         4,  -1,  -5,  -4,  -5,  83,  74,  61,  70,  52, 126, 118,  84,
       137, 100,  25,  32,  15,  29,  35,   0,   3,   0,   5,   2,   0,
         3,   2,   4,   4,   0,   0,  -2,   1,  -2, 145, 122,  98,  91,
        48, 292, 234, 162, 175, 143,  25,  18,  12,  16,  11,  12,  15,
        14,  12,  18,   4,  15,  14,   8,   4,  13,   3,  -2,   4,  -7,
        65,  78,  78,  78,  70, 186, 147, 112, 118,  90,  17,  23,  38,
        33,  31,   7,   3,   0,   0,   3,   2,   3,   1,   2,   2,   5,
         0,  -1,  -2,   1])

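Above we can see all the stats of both teams flattened into a single array of 121 values. The first value indicates whether the left team won (1) or not (0). Because of `order="F"`, the stats are laid out column by column: the five left-team Ratings come first, then the five ACS values, and so on, followed by the right team's stats in the same order.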
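
The effect of `order="F"` is easiest to see on a toy table. The sketch below uses the Rating, ACS and K values of the first two players from the team table shown earlier.

```python
import numpy as np

# Toy table: 2 players (rows) x 3 stats (columns: Rating, ACS, K)
stats = np.array([[144, 220, 18],
                  [100, 184, 15]])

column_major = stats.flatten(order="F")  # walk down each column first
row_major = stats.flatten()              # default order="C": walk along each row

print(column_major)  # [144 100 220 184  18  15]
print(row_major)     # [144 220  18 100 184  15]
```

With the column-major layout, all values of one stat sit next to each other in the final row.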

In [9]:
get_dataframe("https://www.vlr.gg/event/matches/2283/valorant-champions-2025/?series_id=all")

100%|█████████████████████████████████████████████████████████████████████████| 34/34 [00:50<00:00,  1.49s/it]


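| | left_team_won | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 163 | 152 | 117 | 94 | 88 | 258 | 315 | 217 | 176 | ... | 1 | 3 | 1 | 3 | 4 | 2 | -3 | 2 | 0 | -3 |
| 1 | 1 | 159 | 157 | 108 | 87 | 39 | 319 | 308 | 257 | 139 | ... | 3 | 4 | 2 | 1 | 0 | 0 | -2 | 0 | 0 | 0 |
| 2 | 0 | 85 | 78 | 74 | 61 | 55 | 166 | 149 | 162 | 106 | ... | 1 | 1 | 2 | 4 | 0 | 0 | 1 | 1 | 0 | 1 |
| 3 | 1 | 204 | 142 | 131 | 85 | 84 | 362 | 235 | 207 | 161 | ... | 0 | 2 | 2 | 5 | 2 | 1 | 0 | -1 | -4 | -1 |
| 4 | 1 | 128 | 112 | 104 | 99 | 74 | 250 | 286 | 164 | 196 | ... | 4 | 2 | 1 | 2 | 2 | 1 | -1 | 3 | -2 | -1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 83 | 1 | 193 | 141 | 122 | 111 | 98 | 375 | 295 | 178 | 190 | ... | 1 | 4 | 1 | 2 | 2 | 0 | -3 | 0 | 0 | -1 |
| 84 | 1 | 127 | 120 | 104 | 89 | 88 | 275 | 175 | 203 | 158 | ... | 4 | 2 | 1 | 0 | 2 | 2 | -1 | 0 | 1 | -1 |
| 85 | 0 | 120 | 118 | 100 | 82 | 53 | 216 | 222 | 183 | 153 | ... | 3 | 4 | 1 | 2 | 2 | 0 | 3 | 1 | 1 | -1 |
| 86 | 0 | 122 | 100 | 91 | 81 | 80 | 218 | 246 | 164 | 204 | ... | 1 | 2 | 2 | 2 | 2 | 4 | 4 | -1 | -2 | -2 |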


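The result is a dataframe with one row per map played in the tournament whose match list URL was passed to the function. It has 87 rows (index 0 to 86) and 121 columns: the result plus 120 stats, 60 per team.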
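
The numeric column labels 1 to 120 are hard to read. One possible way to generate descriptive names in the same order as the flattened row is sketched below. The helper `make_feature_names` and the `side_stat_pN` naming scheme are my own assumptions and are not part of the original code.

```python
colnames = ["Rating", "ACS", "K", "D", "A", "KD+/-", "KAST", "ADR", "HS%", "FK", "FD", "FKD+/-"]
n_players = 5


def make_feature_names(colnames, n_players, sides=("left", "right")):
    """Build names in the order: team, then stat, then player (hypothetical helper)."""
    names = []
    for side in sides:
        for col in colnames:
            for player in range(1, n_players + 1):
                names.append(f"{side}_{col}_p{player}")
    return names


feature_names = make_feature_names(colnames, n_players)
print(len(feature_names))  # 120
print(feature_names[:6])
```

Given a dataframe `df` returned by `get_dataframe`, the names could then be applied with `df.columns = ["left_team_won"] + feature_names`.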

Let's make a bigger dataframe by concatenating multiple tournament dataframes into one.

In [10]:
match_list_url = "https://www.vlr.gg/event/matches/2449/esports-world-cup-2025/?series_id=all"

ewc_df = get_dataframe(match_list_url)

100%|█████████████████████████████████████████████████████████████████████████| 77/77 [01:30<00:00,  1.17s/it]


In [11]:
match_list_url = "https://www.vlr.gg/event/matches/2379/vct-2025-pacific-stage-1/?series_id=all"
pacific_stage1_df = get_dataframe(match_list_url)

100%|█████████████████████████████████████████████████████████████████████████| 42/42 [00:54<00:00,  1.30s/it]


In [12]:
match_list_url = "https://www.vlr.gg/event/matches/2380/vct-2025-emea-stage-1/?series_id=all"
emea_stage1_df = get_dataframe(match_list_url)

100%|█████████████████████████████████████████████████████████████████████████| 42/42 [00:54<00:00,  1.29s/it]


In [13]:
match_list_url = "https://www.vlr.gg/event/matches/2282/valorant-masters-toronto-2025/?series_id=all"
masters_toronto_df = get_dataframe(match_list_url)

100%|█████████████████████████████████████████████████████████████████████████| 25/25 [00:38<00:00,  1.53s/it]


In [14]:
df = pd.concat([ewc_df, pacific_stage1_df, emea_stage1_df, masters_toronto_df])
df

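| | left_team_won | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 146 | 140 | 101 | 65 | 63 | 241 | 276 | 237 | 149 | ... | 8 | 0 | 3 | 2 | 2 | -4 | 1 | -1 | -1 | -1 |
| 1 | 1 | 182 | 122 | 105 | 101 | 85 | 316 | 161 | 244 | 148 | ... | 4 | 5 | 2 | 1 | 0 | 0 | -1 | 0 | 0 | 0 |
| 2 | 1 | 165 | 150 | 147 | 98 | 56 | 325 | 291 | 270 | 152 | ... | 3 | 0 | 2 | 6 | 2 | -2 | 3 | -2 | -3 | -2 |
| 3 | 0 | 112 | 107 | 102 | 70 | 63 | 252 | 175 | 233 | 163 | ... | 1 | 0 | 1 | 3 | 3 | 1 | 6 | 0 | -1 | -1 |
| 4 | 1 | 193 | 142 | 118 | 111 | 107 | 337 | 184 | 248 | 165 | ... | 1 | 2 | 3 | 2 | 4 | -1 | 0 | -2 | -2 | -3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 54 | 0 | 88 | 88 | 82 | 62 | 30 | 204 | 171 | 159 | 158 | ... | 2 | 0 | 1 | 2 | 3 | 2 | 1 | 0 | 0 | -1 |
| 55 | 1 | 135 | 104 | 97 | 94 | 90 | 255 | 167 | 205 | 190 | ... | 1 | 5 | 0 | 0 | 3 | 3 | 0 | 3 | 3 | -3 |
| 56 | 0 | 125 | 110 | 89 | 85 | 67 | 222 | 223 | 192 | 168 | ... | 3 | 1 | 2 | 2 | 4 | 8 | 0 | 0 | 2 | -2 |
| 57 | 1 | 133 | 132 | 107 | 96 | 77 | 230 | 228 | 217 | 156 | ... | 5 | 1 | 2 | 0 | 4 | -1 | 3 | -2 | 1 | -2 |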


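The final dataframe has 430 rows. The displayed index ends at 57 because `pd.concat` keeps each tournament's own index by default instead of renumbering the rows.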
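
If unique row labels are wanted, `pd.concat` accepts `ignore_index=True`. The sketch below shows the difference on two tiny stand-in frames, which are made up for illustration and are not the scraped data.

```python
import pandas as pd

# Two tiny stand-in "tournament" dataframes
first = pd.DataFrame({"left_team_won": [1, 0, 1]})
second = pd.DataFrame({"left_team_won": [0, 1]})

kept = pd.concat([first, second])                           # original labels are kept
renumbered = pd.concat([first, second], ignore_index=True)  # fresh 0..n-1 labels

print(kept.index.tolist())        # [0, 1, 2, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3, 4]
```

For the model in the next section the index labels do not matter, since `train_test_split` selects rows by position.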

## Predicting the winning team

Let's see if it's possible to predict the winning team from the player stats. I expect it to be, since the winning team usually has more kills, higher ratings and ACS, and so on. I will use logistic regression.

In [15]:
Xdata = df.drop("left_team_won", axis=1)
Ydata = df["left_team_won"]

random_seed = 333

Xtrain, Xtest, Ytrain, Ytest = train_test_split(Xdata, Ydata, test_size=0.3, random_state=random_seed)

scaler = StandardScaler()
Xtrain = scaler.fit_transform(Xtrain)
Xtest = scaler.transform(Xtest)

In [16]:
clf = LogisticRegression()
clf.fit(Xtrain, Ytrain)
metrics.accuracy_score(Ytest, clf.predict(Xtest))

0.9147286821705426

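The accuracy is 91.47%, which is high, as I expected. Note that the features are the stats of the very map whose result is being predicted, so this measures how well the final stats explain the outcome rather than how well it can be forecast.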
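
A single 70/30 split gives one accuracy number that depends on the random seed. A rough sketch of a more stable estimate is 5-fold cross-validation compared against a majority-class baseline. The data below is synthetic and generated only for illustration, so the printed numbers say nothing about the scraped dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scraped data: 200 "maps", 10 features
rng = np.random.default_rng(333)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# The pipeline refits the scaler on the training part of every fold
model = make_pipeline(StandardScaler(), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")

# Accuracy of always guessing the more common class
baseline = max(y.mean(), 1 - y.mean())

print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
print(f"Majority-class baseline: {baseline:.3f}")
```

With the real data, `X` and `y` would be `Xdata` and `Ydata` from the cell above.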

## Conclusion

I implemented functions to get map stats from professional Valorant matches and got familiar with Beautiful Soup along the way. I then tried predicting the winning team from those stats, and the accuracy was high. Predicting the results of future matches based on past matches would be a more difficult problem.