## WhoScored Soccer Player Statistics Scraper

This notebook scrapes a soccer player's current-season statistics and displays them in a DataFrame.
When run, it launches a Chrome browser window, navigates to the specified page, waits for the content to load, and scrapes the data. The browser window closes automatically when the run finishes, and the DataFrame appears at the bottom of the cell output.
The next section scrapes two players and compares their statistics to see who is performing better.

## Project Overview

This project started as a simple tool to scrape soccer player statistics from the WhoScored website. The initial goal was to gather performance data for a single player, focusing on key stats like goals, assists, and appearances. Using Python, Selenium, and BeautifulSoup, I built a script that navigates the website, extracts the relevant data, and organizes it into a pandas DataFrame for easy analysis.
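
To make the extraction step concrete, here is a minimal sketch of turning a table's header cells and data cells into rows. It uses Python's built-in `html.parser` on a small hand-written HTML snippet rather than Selenium and BeautifulSoup, so it runs without a browser. The `TableParser` class and the sample markup are illustrative assumptions, not WhoScored's actual page structure; the values come from the Cristiano Ronaldo output shown later in this notebook.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect <th> text as headers and <td> text as data rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self.rows = []
        self._cell_tag = None  # 'th' or 'td' while the parser is inside a cell
        self._cell_text = []
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td'):
            self._cell_tag = tag
            self._cell_text = []

    def handle_data(self, data):
        # Only keep text that appears inside a header or data cell
        if self._cell_tag is not None:
            self._cell_text.append(data)

    def handle_endtag(self, tag):
        if tag in ('th', 'td') and self._cell_tag == tag:
            text = ''.join(self._cell_text).strip()
            if tag == 'th':
                self.headers.append(text)
            else:
                self._row.append(text)
            self._cell_tag = None
        elif tag == 'tr' and self._row:
            # Rows made only of <th> cells leave _row empty and are skipped
            self.rows.append(self._row)
            self._row = []


# Hand-written sample markup (assumed structure, not copied from the site)
SAMPLE_HTML = """
<div id="statistics-table-summary">
  <table>
    <tr><th>Tournament</th><th>Apps</th><th>Goals</th></tr>
    <tr><td>AFC Champions League</td><td>4</td><td>3</td></tr>
    <tr><td>Total / Average</td><td>11</td><td>5</td></tr>
  </table>
</div>
"""

parser = TableParser()
parser.feed(SAMPLE_HTML)

# Pair each data row with the headers, one dict per row
records = [dict(zip(parser.headers, row)) for row in parser.rows]
print(records)
```

Each record maps a column name to its raw string value, which is the same shape of data that `pd.DataFrame(rows, columns=headers)` receives in the cells below.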

After successfully scraping data for one player, I expanded the project to compare stats between two different players. The enhanced script now pulls data for both players, allowing for a side-by-side comparison of their performances. This includes calculating total goals, assists, and goal-to-game ratios, helping to determine which player has the edge in various categories.
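
As a rough illustration of the comparison step, the sketch below computes a goal-to-game ratio from total goals and appearances and picks the higher one. The `goal_to_game_ratio` helper is a hypothetical name introduced here, and the zero-appearance guard is an addition that the notebook code itself does not have. The totals are the "Total / Average" values from the outputs shown in this notebook.

```python
def goal_to_game_ratio(goals, appearances):
    """Return goals per appearance, or None when there are no appearances."""
    if appearances == 0:
        return None
    return goals / appearances


# Totals read from the "Total / Average" rows in this notebook's outputs
totals = {
    'Cristiano Ronaldo': {'goals': 5, 'apps': 11},
    'Vinícius Júnior': {'goals': 26, 'apps': 46},
}

ratios = {
    name: goal_to_game_ratio(stats['goals'], stats['apps'])
    for name, stats in totals.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} goals per game")

# Ignore players without a ratio before picking the highest one
valid = {name: ratio for name, ratio in ratios.items() if ratio is not None}
best = max(valid, key=valid.get)
print(f"Best goal-to-game ratio: {best}")
```

Returning `None` instead of dividing keeps a player with no appearances from raising `ZeroDivisionError`; whether such a player should be skipped or reported separately is a design choice left open here.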

In [8]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
from bs4 import BeautifulSoup

# Specify the path to your ChromeDriver executable
chrome_driver_path = r'C:\Users\zareb\Downloads\chromedriver-win64 (1)\chromedriver-win64\chromedriver.exe'

# Set up the ChromeDriver service
service = Service(executable_path=chrome_driver_path)

# Initialize the WebDriver with the service
driver = webdriver.Chrome(service=service)

# Paste the URL of the player's WhoScored page here as a string
player_url = 'https://www.whoscored.com/Players/5583/Show/Cristiano-Ronaldo'

# Load the page
driver.get(player_url)

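# driver.get() above returns once the page's load event fires. implicitly_wait()
# only sets the timeout for later element lookups; it does not pause here
driver.implicitly_wait(10)  # Adjust the time if needed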

# Get the page source and parse it with BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Find the container that holds the summary statistics table
stats_table = soup.find('div', {'id': 'statistics-table-summary'})

if stats_table is None:
    print("Could not find the statistics table.")
else:
    # Extract the column headers and the data rows
    headers = [header.get_text(strip=True) for header in stats_table.find_all('th')]
    rows = []

    for row in stats_table.find_all('tr')[1:]:
        cells = row.find_all('td')
        row_data = [cell.get_text(strip=True) for cell in cells]
        rows.append(row_data)

    df = pd.DataFrame(rows, columns=headers)
    print(df.head())

# Close the browser
driver.quit()

                      Tournament Apps Mins Goals Assists Yel Red  SpG   PS%  \
0  European ChampionshipPortugal    5  488     -       1   1   -  4.6  84.4   
1          Int. FriendlyPortugal    2  N/A     2     N/A   -   -  N/A   N/A   
2           AFC Champions League    4  N/A     3     N/A   1   -  N/A   N/A   
3                Total / Average   11  488     5       1   2   0  4.6  84.4   

  AerialsWon MotM Rating  
0        1.4    -   6.89  
1        N/A  N/A      -  
2        N/A  N/A      -  
3        1.4    0   6.89  
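
The scraped cells are all strings, and the output above mixes numbers with the placeholders `-` and `N/A`. The sketch below shows one possible way to convert a column to numbers before doing arithmetic on it. Treating `-` as zero is an assumption; it is consistent with the totals above (for example, the Goals rows `-`, `2`, `3` add up to the total of `5`), but it is not documented here. The `to_number` helper is a hypothetical name.

```python
def to_number(cell):
    """Convert a scraped cell to a float.

    Assumptions: '-' means zero and 'N/A' means the value is unavailable.
    """
    text = cell.strip()
    if text == 'N/A':
        return None
    if text == '-':
        return 0.0
    return float(text)


# Goals and Mins columns from the Cristiano Ronaldo output above
goals_column = ['-', '2', '3', '5']
mins_column = ['488', 'N/A', 'N/A', '488']

cleaned_goals = [to_number(value) for value in goals_column]
cleaned_mins = [to_number(value) for value in mins_column]

print(cleaned_goals)
print(cleaned_mins)
```

Keeping `N/A` as `None` rather than zero avoids treating missing minutes as zero minutes played.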


## Comparing Two Players

This section scrapes the statistics of two soccer players of your choice and compares their goals, assists, and goal-to-game ratios. It also reports how long the scrape took.
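
Some `Apps` values in the output further down look like `22(4)`. The sketch below parses that form, assuming the number in parentheses counts substitute appearances. That reading is an assumption, although it matches the data: adding both parts across Player 1's tournament rows gives the `46` shown in the "Total / Average" row. The `parse_apps` helper is a hypothetical name.

```python
import re


def parse_apps(cell):
    """Split an Apps string such as '22(4)' into (starts, substitute_appearances).

    Assumption: the parenthesised number counts substitute appearances.
    """
    match = re.fullmatch(r'(\d+)(?:\((\d+)\))?', cell.strip())
    if match is None:
        raise ValueError(f"Unrecognized Apps value: {cell!r}")
    starts = int(match.group(1))
    subs = int(match.group(2)) if match.group(2) else 0
    return starts, subs


# Apps values from Player 1's per-tournament rows in the output below
apps_column = ['10', '22(4)', '3', '2', '1', '3(1)']

total_apps = sum(starts + subs for starts, subs in map(parse_apps, apps_column))
print(total_apps)
```

The comparison code below reads only the "Total / Average" row, where `Apps` is a plain number, so its `(\d+)` pattern is enough there. A parser like this one matters only if per-tournament rows are summed directly.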

In [13]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
import pandas as pd
from bs4 import BeautifulSoup
import time

# Specify the path to your ChromeDriver executable
chrome_driver_path = r'C:\Users\zareb\Downloads\chromedriver-win64 (1)\chromedriver-win64\chromedriver.exe'

# Set up the ChromeDriver service
service = Service(executable_path=chrome_driver_path)

# Function to scrape player statistics
def scrape_player_stats(player_url):
    # Initialize the WebDriver with the service
    driver = webdriver.Chrome(service=service)
    
    # Load the player's page
    driver.get(player_url)
    
    # driver.get() above returns once the page's load event fires. implicitly_wait()
    # only sets the timeout for later element lookups; it does not pause here
    driver.implicitly_wait(10)
    
    # Get the page source and parse it with BeautifulSoup
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    
    # Find the summary statistics container by its id
    stats_table = soup.find('div', {'id': 'statistics-table-summary'})
    
    if stats_table is None:
        print(f"Could not find the statistics table for URL: {player_url}")
        driver.quit()
        return None
    
    # Extract headers and data rows
    headers = [header.get_text(strip=True) for header in stats_table.find_all('th')]
    rows = []
    for row in stats_table.find_all('tr')[1:]:  # Skip the header row
        cells = row.find_all('td')
        row_data = [cell.get_text(strip=True) for cell in cells]
        rows.append(row_data)
    
    # Convert to DataFrame
    df = pd.DataFrame(rows, columns=headers)
    
    # Close the browser
    driver.quit()
    
    return df

# URLs for two players
player1_url = 'https://www.whoscored.com/Players/337782/Show/Vin%C3%ADcius-J%C3%BAnior'
player2_url = 'https://www.whoscored.com/Players/300713/Show/Kylian-Mbapp%C3%A9'

# Scrape data for both players
start_time = time.time()
player1_stats = scrape_player_stats(player1_url)
player2_stats = scrape_player_stats(player2_url)
print(f"Time taken to scrape both players: {time.time() - start_time} seconds")

# If both DataFrames are successfully retrieved
if player1_stats is not None and player2_stats is not None:
    # Display each player's full stats table
    print("Player 1 Stats:")
    print(player1_stats)
    print("\nPlayer 2 Stats:")
    print(player2_stats)
    
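    # Example comparison: Goals and Assists (check the required columns in both DataFrames)
    required_columns = {'Tournament', 'Goals', 'Assists'}
    if required_columns.issubset(player1_stats.columns) and required_columns.issubset(player2_stats.columns):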
        player1_goals = player1_stats.loc[player1_stats['Tournament'] == 'Total / Average', 'Goals'].astype(int).sum()
        player2_goals = player2_stats.loc[player2_stats['Tournament'] == 'Total / Average', 'Goals'].astype(int).sum()
        
        player1_assists = player1_stats.loc[player1_stats['Tournament'] == 'Total / Average', 'Assists'].astype(int).sum()
        player2_assists = player2_stats.loc[player2_stats['Tournament'] == 'Total / Average', 'Assists'].astype(int).sum()
        
        print(f"\nTotal Goals - Player 1: {player1_goals}, Player 2: {player2_goals}")
        print(f"Total Assists - Player 1: {player1_assists}, Player 2: {player2_assists}")
        
        # Determine which player has more goals and assists
        if player1_goals > player2_goals:
            print(f"Player 1 has more goals ({player1_goals}) than Player 2 ({player2_goals}).")
        elif player2_goals > player1_goals:
            print(f"Player 2 has more goals ({player2_goals}) than Player 1 ({player1_goals}).")
        else:
            print(f"Both players have the same number of goals ({player1_goals}).")
        
        if player1_assists > player2_assists:
            print(f"Player 1 has more assists ({player1_assists}) than Player 2 ({player2_assists}).")
        elif player2_assists > player1_assists:
            print(f"Player 2 has more assists ({player2_assists}) than Player 1 ({player1_assists}).")
        else:
            print(f"Both players have the same number of assists ({player1_assists}).")
        
        # Example: Goal-to-game ratio using the "Apps" (appearances) column
        if 'Apps' in player1_stats.columns and 'Apps' in player2_stats.columns:
            player1_apps = player1_stats.loc[player1_stats['Tournament'] == 'Total / Average', 'Apps'].str.extract(r'(\d+)').astype(int).sum().sum()
            player2_apps = player2_stats.loc[player2_stats['Tournament'] == 'Total / Average', 'Apps'].str.extract(r'(\d+)').astype(int).sum().sum()
            
            player1_goal_to_game_ratio = player1_goals / player1_apps
            player2_goal_to_game_ratio = player2_goals / player2_apps
            
            print(f"\nGoal-to-Game Ratio - Player 1: {player1_goal_to_game_ratio:.2f}, Player 2: {player2_goal_to_game_ratio:.2f}")
            
            # Determine which player has a better goal-to-game ratio
            if player1_goal_to_game_ratio > player2_goal_to_game_ratio:
                print(f"Player 1 has a better goal-to-game ratio ({player1_goal_to_game_ratio:.2f}) than Player 2 ({player2_goal_to_game_ratio:.2f}).")
            elif player2_goal_to_game_ratio > player1_goal_to_game_ratio:
                print(f"Player 2 has a better goal-to-game ratio ({player2_goal_to_game_ratio:.2f}) than Player 1 ({player1_goal_to_game_ratio:.2f}).")
            else:
                print(f"Both players have the same goal-to-game ratio ({player1_goal_to_game_ratio:.2f}).")
        else:
            print("Could not find 'Apps' statistic for one or both players.")
    else:
        print("The required columns 'Tournament', 'Goals', or 'Assists' were not found in the player stats DataFrame.")
else:
    print("One or both players' data could not be retrieved.")

Time taken to scrape both players: 71.97684931755066 seconds
Player 1 Stats:
            Tournament   Apps  Mins Goals Assists Yel Red  SpG   PS%  \
0     Champions League     10   902     6       4   3   -    3  79.6   
1               LaLiga  22(4)  1875    15       5   7   -    3  78.1   
2   Copa AmericaBrazil      3   251     2       -   2   -  1.3  76.8   
3  Supercopa de Espana      2   N/A     3     N/A   -   -  N/A   N/A   
4         Copa del Rey      1   N/A     -     N/A   1   -  N/A   N/A   
5  Int. FriendlyBrazil   3(1)   N/A     -     N/A   -   -  N/A   N/A   
6      Total / Average     46  3028    26       9  13   0  2.9  78.4   

  AerialsWon MotM Rating  
0        0.3    4   7.82  
1          -    4   7.41  
2          -    1   7.21  
3        N/A  N/A      -  
4        N/A  N/A      -  
5        N/A  N/A      -  
6        0.1    9   7.50  

Player 2 Stats:
                    Tournament   Apps  Mins Goals Assists Yel Red  SpG   PS%  \
0                      Ligue 1  2