# Sofascore

Sofascore is a website that provides live scores, results, and fixtures for a variety of sports. It also provides statistics and player ratings for football matches. 

It also has more advanced statistics and event data that we can use, such as shots and expected goals.

What sets Sofascore apart is that the website itself is poorly suited to scraping, but we can still get the data through the API the site uses to load it.

In [12]:
import requests

In [13]:
# We'll scrape all of the shot data from the Women's Champions League Final between Barcelona and Wolfsburg in 2023
url = 'https://www.sofascore.com/barcelona-vfl-wolfsburg/glcsOhG#id:11253247'

# In this URL, the match id is 11253247 (everything after 'id:')
match_id = url.split('id:')[-1]
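The `split('id:')` call above works as long as the URL ends with the id. A slightly more defensive sketch is shown here; `extract_match_id` is a hypothetical helper name, and it assumes the id always appears in the URL fragment as `id:` followed by digits, which is what the example URL looks like but is not something Sofascore documents.

```python
import re
from urllib.parse import urlsplit


def extract_match_id(url):
    """Return the match id (as a string) from a Sofascore match URL.

    Assumption: the id sits in the fragment (the part after '#')
    in the form 'id:<digits>', as in the example URL in this notebook.
    """
    fragment = urlsplit(url).fragment  # e.g. 'id:11253247'
    match = re.search(r'id:(\d+)', fragment)
    if match is None:
        raise ValueError(f'No match id found in URL: {url!r}')
    return match.group(1)


url = 'https://www.sofascore.com/barcelona-vfl-wolfsburg/glcsOhG#id:11253247'
match_id = extract_match_id(url)
```

Raising an error when no id is found makes a mistyped URL fail here rather than later, when the API request is built.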

In [14]:
# Open the Network tab in the browser's developer tools and filter by XHR to find the API call that loads the data
# Look for the request named "shotmap"
# Copy it as a cURL command and paste it into curlconverter.com to get the equivalent Python requests code
headers = {
    'authority': 'api.sofascore.com',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9',
    'cache-control': 'max-age=0',
    'dnt': '1',
    'if-none-match': 'W/"c8379b88a8"',
    'origin': 'https://www.sofascore.com',
    'referer': 'https://www.sofascore.com/',
    'sec-ch-ua': '"Chromium";v="121", "Not A(Brand";v="99"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"macOS"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36',
}

response = requests.get(f'https://api.sofascore.com/api/v1/event/{match_id}/shotmap', headers=headers) # replace the id with the id of the match

In [15]:
# The first time you run this you may get a 304 (Not Modified) status code: the server considers the cached copy current and returns no data
response.status_code

200

In [16]:
# We can add the 'If-Modified-Since' header to the request to get the data
# You can hardcode today's date, or build the date and time dynamically
# I've never had a problem hardcoding the date
headers['If-Modified-Since'] = 'Mon, 26 Feb 2024 00:00:00 GMT'
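For the dynamic option mentioned above, a minimal sketch using only the standard library is shown here. `http_date` is a hypothetical helper name; it produces the same date format as the hardcoded string.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_date(moment=None):
    """Format a datetime as an HTTP date, e.g. 'Mon, 26 Feb 2024 00:00:00 GMT'.

    With no argument, the current time in UTC is used.
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    # format_datetime(..., usegmt=True) requires an aware datetime in UTC
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


# Same value as the hardcoded header in this notebook
fixed = http_date(datetime(2024, 2, 26, tzinfo=timezone.utc))

# Header built from the current time instead
example_headers = {'If-Modified-Since': http_date()}
```

In the notebook this would be used as `headers['If-Modified-Since'] = http_date()`.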

In [17]:
response = requests.get(f'https://api.sofascore.com/api/v1/event/{match_id}/shotmap', headers=headers)

In [18]:
response.status_code

200

In [19]:
# Now parse the JSON response body into a Python dict
data = response.json()
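Because a 304 response carries no body, calling `.json()` on it fails with a decoding error that does not point at the real cause. A small guard like the following sketch can make that failure clearer. `parse_shotmap` is a hypothetical helper; it only assumes the response object has a `status_code` attribute and a `json()` method, as `requests` responses do, and that the payload stores the shots under the `'shotmap'` key used in this notebook.

```python
def parse_shotmap(response):
    """Return the list of shots from a requests-style response.

    Raises RuntimeError with a readable message when the server
    did not return usable data.
    """
    if response.status_code == 304:
        # 304 Not Modified: the server sent no body, so there is nothing to parse
        raise RuntimeError('Got 304 Not Modified: adjust the conditional headers and retry')
    if response.status_code != 200:
        raise RuntimeError(f'Unexpected status code: {response.status_code}')
    payload = response.json()
    if 'shotmap' not in payload:
        raise RuntimeError("Response JSON has no 'shotmap' key")
    return payload['shotmap']


# Usage in this notebook (response comes from requests.get above):
# shots = parse_shotmap(response)
```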

In [20]:
import pandas as pd

df = pd.DataFrame(data['shotmap'])

In [21]:
df.head(10)

Unnamed: 0,player,isHome,shotType,situation,playerCoordinates,bodyPart,goalMouthLocation,goalMouthCoordinates,blockCoordinates,id,time,addedTime,timeSeconds,draw,reversedPeriodTime,reversedPeriodTimeSeconds,incidentType,goalType
0,"{'name': 'Pauline Bremer', 'firstName': '', 'l...",False,save,corner,"{'x': 7.7, 'y': 54, 'z': 0}",head,high-centre,"{'x': 0, 'y': 51.7, 'z': 21.5}","{'x': 1, 'y': 52.5, 'z': 0}",2068363,90,9.0,5910,"{'start': {'x': 54, 'y': 7.7}, 'block': {'x': ...",1,390,shot,
1,"{'name': 'Lucy Bronze', 'firstName': '', 'last...",True,block,corner,"{'x': 2.7, 'y': 40.2, 'z': 0}",head,low-centre,"{'x': 0, 'y': 51.2, 'z': 19}","{'x': 1.4, 'y': 44.4, 'z': 0}",2068297,80,,4748,"{'start': {'x': 40.2, 'y': 2.7}, 'block': {'x'...",11,652,shot,
2,"{'name': 'Patricia Guijarro', 'slug': 'patrici...",True,block,set-piece,"{'x': 7.3, 'y': 42.9, 'z': 0}",head,low-centre,"{'x': 0, 'y': 51.6, 'z': 19}","{'x': 5.8, 'y': 44, 'z': 0}",2068287,78,,4675,"{'start': {'x': 42.9, 'y': 7.3}, 'block': {'x'...",13,725,shot,
3,"{'name': 'Aitana Bonmatí', 'firstName': '', 'l...",True,save,assisted,"{'x': 18.1, 'y': 53.6, 'z': 0}",right-foot,low-centre,"{'x': 0, 'y': 49.7, 'z': 17.1}","{'x': 1.2, 'y': 51.3, 'z': 0}",2068275,76,,4511,"{'start': {'x': 53.6, 'y': 18.1}, 'block': {'x...",15,889,shot,
4,"{'name': 'Geyse Ferreira', 'firstName': '', 'l...",True,miss,fast-break,"{'x': 9.8, 'y': 71.3, 'z': 0}",right-foot,high-right,"{'x': 0, 'y': 41.8, 'z': 66.7}",,2068267,72,,4295,"{'start': {'x': 71.3, 'y': 9.8}, 'end': {'x': ...",19,1105,shot,
5,"{'name': 'Fridolina Rolfö', 'firstName': '', '...",True,goal,assisted,"{'x': 6.7, 'y': 44.9, 'z': 0}",left-foot,high-right,"{'x': 0, 'y': 47.1, 'z': 24.7}",,2068254,70,,4177,"{'start': {'x': 44.9, 'y': 6.7}, 'end': {'x': ...",21,1223,shot,regular
6,"{'name': 'Mariona Caldentey', 'firstName': '',...",True,miss,assisted,"{'x': 7.8, 'y': 55.5, 'z': 0}",right-foot,close-right,"{'x': 0, 'y': 40.8, 'z': 9.7}","{'x': 4.5, 'y': 56.7, 'z': 0}",2068262,70,,4171,"{'start': {'x': 55.5, 'y': 7.8}, 'block': {'x'...",21,1229,shot,
7,"{'name': 'Ewa Pajor', 'slug': 'pajor-ewa', 'sh...",False,save,fast-break,"{'x': 11.4, 'y': 32.4, 'z': 0}",right-foot,high-left,"{'x': 0, 'y': 52.3, 'z': 20.9}","{'x': 1.8, 'y': 47.3, 'z': 0}",2068253,69,,4102,"{'start': {'x': 32.4, 'y': 11.4}, 'block': {'x...",22,1298,shot,
8,"{'name': 'Ewa Pajor', 'slug': 'pajor-ewa', 'sh...",False,save,assisted,"{'x': 11.4, 'y': 31.8, 'z': 0}",right-foot,low-centre,"{'x': 0, 'y': 51.5, 'z': 1.9}","{'x': 2.2, 'y': 45.8, 'z': 0}",2068250,67,,4018,"{'start': {'x': 31.8, 'y': 11.4}, 'block': {'x...",24,1382,shot,
9,"{'name': 'Lena Oberdorf', 'firstName': '', 'la...",False,miss,throw-in-set-piece,"{'x': 9.3, 'y': 41, 'z': 0}",right-foot,high-left,"{'x': 0, 'y': 56.3, 'z': 81.9}",,2068237,65,,3886,"{'start': {'x': 41, 'y': 9.3}, 'end': {'x': 43...",26,1514,shot,
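Several columns in the output above (`player`, `playerCoordinates`, `goalMouthCoordinates`, `draw`) still hold nested dictionaries. One way to spread them into flat columns is sketched here in plain Python. `flatten` is a hypothetical helper, and the sample record is a trimmed copy of the Rolfö goal from the table, not the full API payload.

```python
def flatten(record, prefix=''):
    """Recursively flatten nested dicts into a single-level dict.

    Nested keys are joined with '_', so {'playerCoordinates': {'x': 6.7}}
    becomes {'playerCoordinates_x': 6.7}.
    """
    flat = {}
    for key, value in record.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f'{name}_'))
        else:
            flat[name] = value
    return flat


# Trimmed sample of one shot, with values taken from the table above
sample_shot = {
    'player': {'name': 'Fridolina Rolfö'},
    'isHome': True,
    'shotType': 'goal',
    'playerCoordinates': {'x': 6.7, 'y': 44.9, 'z': 0},
    'goalType': 'regular',
}

flat_shot = flatten(sample_shot)
# flat_shot now has keys such as 'player_name' and 'playerCoordinates_x'
```

In the notebook, `pd.DataFrame([flatten(shot) for shot in data['shotmap']])` would build the flat table. pandas also offers `pd.json_normalize(data['shotmap'], sep='_')`, which should give a similar result without a custom helper.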
