# Retrieving Historical Weather Data for Bundesliga Matches

In this Jupyter Notebook, we retrieve historical weather data and link it to football matches. In the earlier notebook 'Football_Bundesliga_WebScraping.ipynb', we extracted stadium data and stored it in a MySQL database. Together with the match records, this gives us the location and date of each football match. In this notebook, we use the OpenMeteo Historical Weather API to fetch the weather conditions at each match location on the date the match was played. Weather may influence match outcomes, so adding it to our dataset supports a more complete analysis.

After retrieving and checking the weather data, we store it in our MySQL database. This keeps the data organized and accessible for further analysis and model development, including the question of how weather conditions could have affected past football games.

We retrieve weather data from this source: [OpenMeteo Historical Weather API](https://open-meteo.com/en/docs/historical-weather-api)

## 1. Setup

In this initial chapter, we prepare our Jupyter Notebook by importing necessary libraries and modules that enable data manipulation, database connectivity, and API requests. This setup ensures that we have all the tools required to retrieve and handle weather data effectively.

In [None]:
import pandas as pd
import numpy as np
import mysql.connector
from mysql.connector import Error
import openmeteo_requests # Open-Meteo's Python client library for sending requests to the OpenMeteo API.
import requests_cache # Caches the responses of HTTP requests to enhance efficiency and reduce load times.
from retry_requests import retry # Provides a mechanism to automatically retry HTTP requests on failure.
import time # Allows us to use functionality related to time, such as delays and timestamp calculations.

## 2. Get Football Matches and Stadium Data

In this chapter, we focus on retrieving data from our MySQL database. This includes the football matches and associated stadium information previously stored. This data is crucial as it forms the basis for our subsequent weather data integration.

In [None]:
# Start with no connection so the 'finally' block is safe even if connecting fails.
connection = None
try:
    # Establish a connection to the MySQL database using specified credentials.
    connection = mysql.connector.connect(
        host='localhost',
        database='adsfootball',
        user='root',
        password='abcabc123'
    )
    
    # SQL query to retrieve all entries from the 'matches' table.
    query = "SELECT * FROM matches"
    
    # Execute the SQL query and load the resulting data directly into a pandas DataFrame.
    football_matches_df = pd.read_sql_query(query, connection)
    
    # Output the first few rows of the DataFrame to verify the data has been loaded correctly.
    print(football_matches_df.head())
    
except Error as e:
    # If an error occurs during the database operations, print the error message.
    print(f"Error: {e}")
finally:
    # Close the connection only if it was actually opened.
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed")

In [None]:
# Start with no connection so the 'finally' block is safe even if connecting fails.
connection = None
try:
    # Establish a connection to the MySQL database using the provided credentials.
    connection = mysql.connector.connect(
        host='localhost',
        database='adsfootball',
        user='root',
        password='abcabc123'
    )
    
    # Prepare an SQL query to fetch all records from the 'stadiums' table.
    query = "SELECT * FROM stadiums"
    
    # Execute the SQL query and store the results in a DataFrame for easy data manipulation.
    stadium_data_df = pd.read_sql_query(query, connection)
    
    # Display the first few rows of the DataFrame to check the data and ensure it's loaded correctly.
    print(stadium_data_df.head())
    
except Error as e:
    # Output any errors encountered during the database operation.
    print(f"Error: {e}")

finally:
    # Close the connection only if it was actually opened.
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed")

In our football matches dataset, specific details about match locations, such as coordinates or names, are not directly provided. Instead, the dataset includes a column labeled 'venue' that specifies whether the match was played at 'Home' or 'Away.' This implies that when the 'venue' is marked as 'Home,' the match location corresponds to the home stadium of the team listed in the 'team' column. For instance, if Bayer Leverkusen is listed as the 'team' and the 'venue' is 'Home,' the game was held at Bayer Leverkusen’s stadium. However, for accurate retrieval of weather data, precise geographic coordinates are essential. Thus, we need to merge the football matches dataset with the stadium dataset, which provides detailed information such as team names, stadium names, and the exact coordinates of each stadium.
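The home/away rule described above can be sketched in a few lines. This is only an illustration of the rule, not the merge itself: `host_team` is a hypothetical helper, and the two sample rows are made up for the example rather than taken from the database.

```python
def host_team(team: str, venue: str, opponent: str) -> str:
    """Return the club whose stadium hosted the match."""
    # 'Home' means the match took place at the stadium of the club in the
    # 'team' column; any other value means it took place at the opponent's.
    return team if venue == "Home" else opponent


# Illustrative rows only (not real records from the 'matches' table)
sample_matches = [
    {"team": "Bayer Leverkusen", "venue": "Home", "opponent": "RB Leipzig"},
    {"team": "Bayer Leverkusen", "venue": "Away", "opponent": "RB Leipzig"},
]

for match in sample_matches:
    host = host_team(match["team"], match["venue"], match["opponent"])
    print(f"{match['venue']}: played at the stadium of {host}")
```

Once the hosting club is known, its stadium coordinates can be looked up in the stadium dataset, which is why consistent team names across both datasets matter.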

## 3. Organize Football Dataset

In this chapter, we aim to merge data based on team names, but we face a challenge due to inconsistencies in team naming across the 'team' and 'opponent' columns. For example, 'Eintracht Frankfurt' is shortened to 'Eint Frankfurt' in the 'opponent' column, and 'Bayer Leverkusen' becomes 'Leverkusen.' To ensure accurate merging, we need to standardize the team names in both columns. This is critical because our merging logic depends on the 'team' column to determine the game location based on whether the 'venue' is 'Home' or 'Away'.

This function identifies discrepancies in team naming between the 'team' and 'opponent' columns of the dataset. It collects unique team names from both columns, compares them to find differences, and returns a set of team names that are referred to differently across the two columns.

In [None]:
# Function to identify inconsistencies in team naming between 'team' and 'opponent' columns
def find_teams_with_different_names(df):
    # Collect unique team names from both 'team' and 'opponent' columns into separate sets
    team_names = set(df['team'])
    opponent_names = set(df['opponent'])
    
    # Determine names present in one column but not the other
    diff_in_team = team_names.difference(opponent_names)
    diff_in_opponent = opponent_names.difference(team_names)
    
    # Combine unique names from both comparisons into a single set
    all_differences = diff_in_team.union(diff_in_opponent)
    
    return all_differences

# Execute the function and store the results
different_names = find_teams_with_different_names(football_matches_df)

# Print the team names that differ between columns
print("Teams with different names by column:", different_names)

As we can see, the following teams are named differently in the two columns (the '**team**' name is shown in bold, the 'opponent' name in parentheses):
- **Bayer Leverkusen** (Leverkusen)
- **Monchengladbach** (M'Gladbach)
- **Greuther Furth** (Greuther Fürth)
- **Eintracht Frankfurt** (Eint Frankfurt)
- **Koln** (Köln)

We want to standardize them across both columns.

In [None]:
# Mapping dictionary where keys are alternative names and values are the standard names
name_mapping = {
    'Leverkusen': 'Bayer Leverkusen',
    'Eint Frankfurt': 'Eintracht Frankfurt',
    'M\'Gladbach': 'Monchengladbach',
    'Köln': 'Koln',
    'Greuther Fürth': 'Greuther Furth'
}

# Define a function to standardize team names based on a provided mapping dictionary
def standardize_team_names(df, mapping):
    # Update 'opponent' column in the DataFrame using the mapping to ensure uniformity
    df['opponent'] = df['opponent'].apply(lambda x: mapping.get(x, x))
    return df

# Apply the function to standardize the names in the 'opponent' column of the DataFrame
football_matches_df = standardize_team_names(football_matches_df, name_mapping)

Now we can check again for differences in the names.

In [None]:
# Re-run the check defined above; an empty set means both columns now use the same names
different_names = find_teams_with_different_names(football_matches_df)
print("Teams with different names by column:", different_names)

Another inconsistency we encounter is that the team names in the stadium dataset are written differently compared to those in the football matches dataset. To ensure accurate data integration, we need to standardize these names by addressing variations such as 'Umlauts' (e.g., converting 'ü' to 'u') and replacing or removing certain prefixes and suffixes. This standardization is crucial for maintaining consistency across our datasets and enabling effective data merging.

In [None]:
# Function to replace umlauts and other special characters
def replace_umlauts(name):
    umlaut_replacements = {'ä': 'a', 'ö': 'o', 'ü': 'u', 'ß': 'ss'}
    for umlaut, replacement in umlaut_replacements.items():
        name = name.replace(umlaut, replacement)
    return name

# Function to normalize team names
def normalize_name(name):
    name = replace_umlauts(name)
    # Remove prefixes, spaces, and other characters for standardization
    return name.lower().replace('1.', '').replace('fc', '').replace(' ', '').replace('.', '').replace("'", "")
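To make the effect of the normalization concrete, the short sketch below repeats the two helper functions in compact form so that it runs on its own, then applies them to a few spellings that appear in this notebook. The list `examples` is chosen only for illustration.

```python
def replace_umlauts(name: str) -> str:
    # Same replacement table as in the cell above
    for umlaut, replacement in {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}.items():
        name = name.replace(umlaut, replacement)
    return name


def normalize_name(name: str) -> str:
    # Same chain of replacements as in the cell above
    name = replace_umlauts(name)
    return (name.lower().replace("1.", "").replace("fc", "")
            .replace(" ", "").replace(".", "").replace("'", ""))


examples = ["1. FC Köln", "Koln", "M'Gladbach", "Greuther Fürth", "Greuther Furth"]
for raw in examples:
    print(f"{raw!r} -> {normalize_name(raw)!r}")
```

Both spellings of Köln collapse to `'koln'` and both spellings of Greuther Fürth collapse to `'greutherfurth'`, which is what allows the later comparison between datasets. Note that the replacement table only lists lowercase umlauts, and the replacement runs before `lower()`, so an uppercase 'Ü' or 'Ö' would pass through unchanged.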

We now want to check if there are any teams that are listed in the football matches dataset but are missing in the stadium dataset. This verification is essential to ensure data completeness and consistency across both datasets.

In [None]:
# List of all unique teams in football_matches_df
all_teams = pd.concat([football_matches_df['team'], football_matches_df['opponent']]).unique()
all_teams_normalized = [normalize_name(team) for team in all_teams]

In [None]:
# List of normalized team names in stadium_data_df
stadium_teams_normalized = [normalize_name(team) for team in stadium_data_df['fdcouk']]
stadium_teams_normalized.extend([normalize_name(team) for team in stadium_data_df['team']])

In [None]:
# Find teams in football_matches_df that have no match in stadium_data_df
missing_teams = set()
for team in all_teams_normalized:
    if not any(team in stadium_team for stadium_team in stadium_teams_normalized):
        missing_teams.add(team)

In [None]:
# Output the teams that have no match
print("Teams in football_matches_df with no match in stadium_data_df:")
for team in missing_teams:
    print(team)

Here we can see that the following teams are missing from the stadium dataset:
- VfL Bochum
- FC Heidenheim
- FC Union Berlin
- Arminia Bielefeld
- SV Darmstadt 98
- RB Leipzig

We add them to the stadium dataset manually, together with their stadium details.

In [None]:
# Define new teams to be added to the stadium_data_df
new_teams = [
    # Dictionary for each new team
    # Each dictionary contains all the required information for the team
    {"id": 24, "team": "1. FC Heidenheim", "fdcouk": "Heidenheim", "city": "Heidenheim", "stadium": "Voith-Arena", "capacity": 15000, "latitude": 48.2230555556, "longitude": 9.02666666667, "country": "Germany"},
    {"id": 25, "team": "1. FC Union Berlin", "fdcouk": "Union Berlin", "city": "Berlin", "stadium": "Stadion An der Alten Försterei", "capacity": 22012, "latitude": 52.454331516, "longitude": 13.56749773, "country": "Germany"},
    {"id": 26, "team": "RB Leipzig", "fdcouk": "RB Leipzig", "city": "Leipzig", "stadium": "Red Bull Arena", "capacity": 41122, "latitude": 51.3408086368, "longitude": 12.3422636309, "country": "Germany"},
    {"id": 27, "team": "Arminia Bielefeld", "fdcouk": "Arminia Bielefeld", "city": "Bielefeld", "stadium": "SchücoArena", "capacity": 27332, "latitude": 52.031389, "longitude": 8.516944, "country": "Germany"},
    {"id": 28, "team": "SV Darmstadt 98", "fdcouk": "Darmstadt 98", "city": "Darmstadt", "stadium": "Merck-Stadion am Böllenfalltor", "capacity": 17810, "latitude": 49.854663248, "longitude": 8.66999732, "country": "Germany"},
    {"id": 29, "team": "VfL Bochum", "fdcouk": "Bochum", "city": "Bochum", "stadium": "Vonovia Ruhrstadion", "capacity": 26000, "latitude": 51.4872597176, "longitude": 7.23525905896, "country": "Germany"}
]

# Convert the list of dictionaries into a DataFrame
new_teams_df = pd.DataFrame(new_teams)

# Append the new teams DataFrame to the existing stadium_data_df
stadium_data_df = pd.concat([stadium_data_df, new_teams_df], ignore_index=True)
stadium_data_df

## 3. Match Football Data with Stadium Data

In this chapter, we will merge our organized football match data with stadium data. This integration allows us to map each match to its corresponding stadium, ensuring we have accurate location information for subsequent weather data retrieval.

In [None]:
# Function to match team names between the football matches dataset and the stadium dataset
def match_team_names(football_name, stadium_df):
    # Normalize the football team name for comparison
    football_name_norm = normalize_name(football_name)
    best_match = None
    shortest_length = float('inf')  # Initialize with infinity to find the minimum length string match

    # Iterate through each row in the stadium dataset
    for _, row in stadium_df.iterrows():
        # Check against both 'fdcouk' and 'team' columns in the stadium dataset
        for column in ['fdcouk', 'team']:
            stadium_name_norm = normalize_name(row[column])
            # Check if one normalized name contains the other
            if football_name_norm in stadium_name_norm or stadium_name_norm in football_name_norm:
                # Update the best match if the current name is shorter than previously found names
                if len(stadium_name_norm) < shortest_length:
                    shortest_length = len(stadium_name_norm)
                    best_match = row['fdcouk']
    
    return best_match

# This function returns the 'fdcouk' name of the best match: among all stadium-side names that contain
# the football name (or are contained in it), it picks the one with the shortest normalized form.

In [None]:
# Enhanced function to retrieve coordinates, city, and stadium information for a given team
def get_stadium_info(football_team, venue, opponent, stadium_df):
    # Determine which team to search for based on the venue
    team_to_search = football_team if venue == 'Home' else opponent
    # Find the best matching stadium team name
    matched_stadium_team = match_team_names(team_to_search, stadium_df)
    
    # If a match is found, retrieve additional information from the stadium dataset
    if matched_stadium_team:
        team_row = stadium_df[stadium_df['fdcouk'] == matched_stadium_team]
        if not team_row.empty:
            # Return the latitude, longitude, city, and stadium name
            return (team_row.iloc[0]['latitude'], team_row.iloc[0]['longitude'],
                    team_row.iloc[0]['city'], team_row.iloc[0]['stadium'])
    
    # Return None for all fields if no match is found
    return (None, None, None, None)

# This function is crucial for providing a comprehensive set of stadium data for each match,
# enhancing the analysis capabilities with precise location details.

In [None]:
# Add columns to football_matches_df
# Unpack the tuple directly into the new columns
football_matches_df[['latitude', 'longitude', 'city', 'stadium']] = pd.DataFrame(
    football_matches_df.apply(
        lambda x: get_stadium_info(x['team'], x['venue'], x['opponent'], stadium_data_df), 
        axis=1
    ).tolist(), index=football_matches_df.index
)

# Output the updated DataFrame to check the newly added data
football_matches_df.head()

# This code block effectively adds geographical and venue-related details to the football matches DataFrame.
# It enriches each match entry with precise location data including latitude, longitude, city, and stadium name by utilizing the 'get_stadium_info' function.

## 4. Get Weather Data for Every Game

In this chapter, we transition from organizing match data to collecting historical weather data for each game's date and location. Utilizing the OpenMeteo Historical Weather API, we will retrieve weather conditions that are crucial for comprehensive match analysis. Instead of descriptive weather categories, the API provides weather codes based on the World Meteorological Organization (WMO) standards, which can be referenced using the provided [WMO Code Table](https://www.nodc.noaa.gov/archive/arc0021/0002199/1.1/data/0-data/HTML/WMO-CODE/WMO4677.HTM). Upon gathering this information, we will store the weather data in our MySQL database for future analysis.
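Because the API returns numeric codes rather than descriptions, a small lookup table can make the stored values easier to read later on. The sketch below is an assumption-laden illustration: `WMO_DESCRIPTIONS` and `describe_weather_code` are hypothetical names, the labels are abridged from the weather code interpretation listed in the Open-Meteo documentation, and the table is deliberately incomplete.

```python
# Abridged lookup of WMO weather interpretation codes (not a complete table)
WMO_DESCRIPTIONS = {
    0: "Clear sky",
    1: "Mainly clear", 2: "Partly cloudy", 3: "Overcast",
    45: "Fog", 48: "Depositing rime fog",
    51: "Light drizzle", 53: "Moderate drizzle", 55: "Dense drizzle",
    61: "Slight rain", 63: "Moderate rain", 65: "Heavy rain",
    71: "Slight snow fall", 73: "Moderate snow fall", 75: "Heavy snow fall",
    80: "Slight rain showers", 81: "Moderate rain showers", 82: "Violent rain showers",
    95: "Thunderstorm",
}


def describe_weather_code(code) -> str:
    """Translate a numeric weather code (int or float) into a short label."""
    code = int(code)  # the notebook stores codes as floats, e.g. 61.0
    return WMO_DESCRIPTIONS.get(code, f"Unlisted code {code}")


for sample_code in (0.0, 61.0, 4.0):
    print(sample_code, "->", describe_weather_code(sample_code))
```

Codes that are not in the abridged table fall back to a generic label instead of raising an error. Missing values (NaN) are not handled here and would need to be filtered out first.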

In [None]:
# Set up a cached session for HTTP requests to reduce load times and API calls. Cache does not expire.
cache_session = requests_cache.CachedSession('.cache', expire_after=-1)

In [None]:
# Wrap the cached session in a retry mechanism, configuring it to retry up to 5 times with a backoff factor of 0.2 seconds.
retry_session = retry(cache_session, retries=5, backoff_factor=0.2)

In [None]:
# Initialize the OpenMeteo API client with the retry-enabled cached session.
openmeteo = openmeteo_requests.Client(session=retry_session)

In [None]:
# Define the base URL for the Open-Meteo API's archive endpoint.
url = "https://archive-api.open-meteo.com/v1/archive"

In [None]:
# Create an empty DataFrame with specified columns to store the weather data retrieved from the API.
weather_df = pd.DataFrame(columns=['date', 'latitude', 'longitude', 'weather_code', 'mean_temperature', 'precipitation_sum', 'rain_sum', 'snowfall_sum'])

Open-Meteo's free version permits up to 600 API calls per minute. Given our requirement to make approximately 2500 calls, as dictated by the number of entries in our football matches dataset, we need to manage our requests carefully. Once we reach the limit of 599 calls, we will pause our requests for just over a minute before continuing, ensuring we do not exceed the rate limit.
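As a rough back-of-the-envelope check, the waiting time implied by this scheme can be computed directly. The sketch assumes exactly 2500 requests (the text only says "approximately"), one counted request per match, and no cache hits; `count_pauses` is a hypothetical helper.

```python
def count_pauses(total_requests: int, batch_size: int) -> int:
    """Number of pauses needed when a pause follows every full batch
    that still has requests remaining after it."""
    if total_requests <= 0:
        return 0
    return (total_requests - 1) // batch_size


total_requests = 2500   # assumed; the real number is the row count of the matches table
batch_size = 599        # requests allowed before each pause
pause_seconds = 61      # length of each pause

pauses = count_pauses(total_requests, batch_size)
total_wait = pauses * pause_seconds
print(f"{pauses} pauses, {total_wait} seconds of waiting")
# prints "4 pauses, 244 seconds of waiting"
```

Under these assumptions the rate limit adds about four minutes of waiting on top of the time the requests themselves take.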

In [None]:
# Initialize the request count to keep track of how many API calls have been made.
request_count = 0
# Set the maximum number of requests we can make before needing to pause.
max_requests_before_pause = 599
# Define the duration in seconds for which to pause once the limit is reached to avoid exceeding the API's rate limit.
pause_duration = 61

The parameters we want to obtain are:
- **Weather Code** (WMO-Codes)
- **Mean Temperature** (°C)
- **Total Precipitation** (mm)
- **Total Rainfall** (mm)
- **Total Snowfall** (cm)
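The client returns daily variables by position, in the same order in which they were requested, so it helps to keep the requested variable names and the target column names side by side. The following is a minimal sketch of that idea; `DAILY_VARIABLES`, `COLUMN_NAMES`, `INDEX_TO_COLUMN`, and `build_params` are hypothetical names, and the date in the usage line is illustrative.

```python
# Variables requested from the API, in request order
DAILY_VARIABLES = ["weather_code", "temperature_2m_mean", "precipitation_sum",
                   "rain_sum", "snowfall_sum"]
# Column names used for the same variables in the weather DataFrame
COLUMN_NAMES = ["weather_code", "mean_temperature", "precipitation_sum",
                "rain_sum", "snowfall_sum"]
# Position in the response -> DataFrame column
INDEX_TO_COLUMN = dict(enumerate(COLUMN_NAMES))


def build_params(latitude: float, longitude: float, match_date) -> dict:
    """Build the query parameters for a single match day."""
    day = str(match_date)  # accepts a string or a date object
    return {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": day,
        "end_date": day,
        "daily": DAILY_VARIABLES,
        "timezone": "Europe/Berlin",
    }


# Coordinates of the Red Bull Arena as listed above; the date is made up
params = build_params(51.3408086368, 12.3422636309, "2023-08-19")
print(params["daily"])
print(INDEX_TO_COLUMN)
```

If the order of `DAILY_VARIABLES` ever changes, `COLUMN_NAMES` has to change with it; otherwise values end up in the wrong columns.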

In [None]:
# Iterate over each row in the football matches DataFrame
for index, row in football_matches_df.iterrows():
    # Before making an API call, check if the request limit has been reached
    if request_count >= max_requests_before_pause:
        # Notify and pause execution to adhere to API rate limits
        print(f"Maximum number of requests reached ({max_requests_before_pause}). Pausing for {pause_duration} seconds.")
        # Pause execution for the specified duration
        time.sleep(pause_duration)
        # Reset request count after pausing
        request_count = 0

    # Extract match date, and geographic coordinates from the DataFrame row
    match_date = row['date']
    latitude = row['latitude']
    longitude = row['longitude']

    # Configure the parameters for the weather API request
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": match_date,
        "end_date": match_date,
        "daily": ["weather_code", "temperature_2m_mean", "precipitation_sum", "rain_sum", "snowfall_sum"],
        "timezone": "Europe/Berlin"
    }

    try:
        # Count the request so that the rate-limit pause above is actually triggered
        request_count += 1
        # Perform the API call to retrieve weather data
        responses = openmeteo.weather_api(url, params=params)
        daily = responses[0].Daily()
        
        # Extract the single daily value of each variable (same order as in params["daily"])
        weather_data = daily.Variables(0).ValuesAsNumpy()[0]
        mean_temperature = daily.Variables(1).ValuesAsNumpy()[0]
        precipitation_sum = daily.Variables(2).ValuesAsNumpy()[0]
        rain_sum = daily.Variables(3).ValuesAsNumpy()[0]
        snowfall_sum = daily.Variables(4).ValuesAsNumpy()[0]
        
        # Keep the row only if a weather code was returned
        if pd.notna(weather_data):
            # Create a new DataFrame row with the extracted weather data
            new_row = pd.DataFrame({
                'date': [match_date], 
                'latitude': [latitude], 
                'longitude': [longitude], 
                'weather_code': [weather_data], 
                'mean_temperature':[mean_temperature], 
                'precipitation_sum':[precipitation_sum], 
                'rain_sum':[rain_sum], 
                'snowfall_sum':[snowfall_sum]
            })
            # Append the new row to the main weather DataFrame
            weather_df = pd.concat([weather_df, new_row], ignore_index=True)
    except Exception as e:
        # Print an error message if the API call fails
        print(f"Error retrieving weather for {row['date']}: {str(e)}")

# After all data is collected, convert certain columns to float for consistent data types
weather_df['weather_code'] = weather_df['weather_code'].astype(float)
weather_df['mean_temperature'] = weather_df['mean_temperature'].astype(float)
weather_df['precipitation_sum'] = weather_df['precipitation_sum'].astype(float)
weather_df['rain_sum'] = weather_df['rain_sum'].astype(float)
weather_df['snowfall_sum'] = weather_df['snowfall_sum'].astype(float)

In [None]:
weather_df.head()

## 5. Store Weather Data in MySQL Table

In this chapter, we store the retrieved weather data in our MySQL database. Keeping it in a structured table next to the football match data makes it easy to access and to combine with the matches in later analysis.

In [None]:
# SQL query to create a new table for storing weather data, if it doesn't already exist
create_table_query_weather = """
            CREATE TABLE IF NOT EXISTS weather (
                id INT AUTO_INCREMENT PRIMARY KEY,
                date VARCHAR(255),
                latitude FLOAT,
                longitude FLOAT,
                weather_code FLOAT,
                mean_temperature FLOAT,
                precipitation_sum FLOAT,
                rain_sum FLOAT,
                snowfall_sum FLOAT
            );
        """

In [None]:
# SQL query template for inserting weather data into the weather table
insert_query_weather = """
INSERT INTO weather (date, latitude, longitude, weather_code, mean_temperature, precipitation_sum, rain_sum, snowfall_sum)
VALUES (%s, %s, %s, %s, %s, %s, %s, %s)
"""

In [None]:
# Establish a connection to the MySQL database
connection = None
cursor = None
try:
    connection = mysql.connector.connect(user='root', password='abcabc123', host='localhost', database='adsfootball')
    if connection.is_connected():
        # Create a cursor object to interact with the database
        cursor = connection.cursor()
        # Ensure there is no previous 'weather' table that might conflict with the new data
        cursor.execute("DROP TABLE IF EXISTS weather")
        # Execute the table creation query
        cursor.execute(create_table_query_weather)

        # Insert each row of weather data from the DataFrame into the MySQL table
        for i, row in weather_df.iterrows():
            values = (row['date'], row['latitude'], row['longitude'], row['weather_code'], row['mean_temperature'], row['precipitation_sum'], row['rain_sum'], row['snowfall_sum'])
            cursor.execute(insert_query_weather, values)

        # Commit the transaction to make sure all data is saved in the database
        connection.commit()
        print("All data successfully committed.")

except Error as e:
    # Handle any errors that occur during the database connection or execution
    print(f"Error while working with MySQL: {e}")

finally:
    # Close the cursor and the connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed")
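The cell above inserts the weather rows one at a time. As an alternative sketch, the rows can be prepared as a list of tuples and sent in one call with the cursor's `executemany` method. The helper `to_insert_rows` and the sample record below are hypothetical; the helper also turns NaN into `None`, on the assumption that missing values should be stored as SQL NULL.

```python
import math

# Column order must match the placeholders in the INSERT statement
WEATHER_COLUMNS = ["date", "latitude", "longitude", "weather_code",
                   "mean_temperature", "precipitation_sum", "rain_sum", "snowfall_sum"]


def to_insert_rows(records):
    """Convert a list of dict records into tuples, replacing NaN with None."""
    rows = []
    for record in records:
        row = []
        for column in WEATHER_COLUMNS:
            value = record[column]
            if isinstance(value, float) and math.isnan(value):
                value = None  # stored as NULL by the database driver
            row.append(value)
        rows.append(tuple(row))
    return rows


# Made-up record for illustration; in the notebook the records could come from
# weather_df.to_dict("records")
sample_records = [{
    "date": "2023-08-19", "latitude": 51.34, "longitude": 12.34,
    "weather_code": 3.0, "mean_temperature": 21.5, "precipitation_sum": 0.0,
    "rain_sum": 0.0, "snowfall_sum": float("nan"),
}]

rows = to_insert_rows(sample_records)
print(rows)

# With an open cursor, all rows could then be inserted in a single call:
# cursor.executemany(insert_query_weather, rows)
```

The database call is left as a comment so that the sketch runs without a MySQL server.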

## 6. Update Football Matches Table in MySQL

In this chapter, we address the inconsistencies and updates required for the football matches table in our MySQL database. After standardizing team names and enhancing our dataset with additional location details such as coordinates and stadium information, we need to update our database to reflect these changes. This process involves checking for existing columns, dropping them if necessary, adding new columns, and updating each record with precise data.

In [None]:
# Configuration for MySQL database connection
db_config = {
    'host': 'localhost',
    'database': 'adsfootball',
    'user': 'root',
    'password': 'abcabc123'
}
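Since the same credentials are repeated in several cells, one option is to build the configuration in a single place and read the password from the environment instead of the notebook. This is only a sketch under assumptions: `load_db_config` and the `ADS_DB_*` variable names are hypothetical and not part of the project.

```python
import os


def load_db_config(env=os.environ) -> dict:
    """Build the MySQL connection settings from environment variables."""
    return {
        "host": env.get("ADS_DB_HOST", "localhost"),
        "database": env.get("ADS_DB_NAME", "adsfootball"),
        "user": env.get("ADS_DB_USER", "root"),
        # No default on purpose: a missing password raises a KeyError
        "password": env["ADS_DB_PASSWORD"],
    }


# Usage with an explicit mapping instead of the real environment
example_config = load_db_config({"ADS_DB_PASSWORD": "example-password"})
print({key: value for key, value in example_config.items() if key != "password"})
```

The resulting dictionary has the same keys as `db_config` and could be passed to `mysql.connector.connect(**...)` in the same way.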

In [None]:
# Function to drop the location columns from the table if they already exist
def check_and_drop_columns(cursor, table_name):
    # Execute a query to fetch all column names from the specified table
    cursor.execute(f"SHOW COLUMNS FROM {table_name}")
    columns = [column[0] for column in cursor.fetchall()]

    # List of columns to be potentially dropped
    columns_to_drop = ['latitude', 'longitude', 'city', 'stadium']

    # Loop through the list and drop each column if it exists in the table
    for column in columns_to_drop:
        if column in columns:
            try:
                cursor.execute(f"ALTER TABLE {table_name} DROP COLUMN {column}")
            except mysql.connector.Error as err:
                print(f"Failed to drop column {column}: {err.msg}")

In [None]:
# Function to add the new columns to the table
def add_columns(cursor):
    # SQL statements to add new columns to the matches table
    add_column_statements = [
        "ALTER TABLE matches ADD COLUMN latitude DECIMAL(10, 8)",
        "ALTER TABLE matches ADD COLUMN longitude DECIMAL(11, 8)",
        "ALTER TABLE matches ADD COLUMN city VARCHAR(255)",
        "ALTER TABLE matches ADD COLUMN stadium VARCHAR(255)"
    ]
    # Execute each statement and handle errors such as column already existing
    for statement in add_column_statements:
        try:
            cursor.execute(statement)
        except mysql.connector.Error as err:
            if err.errno == 1060:  # Handle 'Column already exists' error
                print(f"Column already exists: {err.msg}")
            else:
                print(f"An error occurred: {err.msg}")

In [None]:
# SQL statement to update location and stadium details in the matches table
update_statement_matches = """
UPDATE matches
SET latitude = %s, longitude = %s, city = %s, stadium = %s
WHERE id = %s;
"""

In [None]:
# Establish a connection using the database configuration and handle the database operations
connection = None
cursor = None
try:
    connection = mysql.connector.connect(**db_config)
    if connection.is_connected():
        cursor = connection.cursor()

        # Remove any redundant columns from the table
        check_and_drop_columns(cursor, "matches")
        
        # Add new columns to store additional location data
        add_columns(cursor)
        
        # Update each row with new geographic and stadium information
        for index, row in football_matches_df.iterrows():
            data = (row['latitude'], row['longitude'], row['city'], row['stadium'], row['id'])
            cursor.execute(update_statement_matches, data)
        
        # Commit all changes to ensure they are saved in the database
        connection.commit()
        print("Data has been successfully updated.")
        
except Error as e:
    # Handle any errors that occur during the database operations
    print(f"Error while working with the MySQL database: {e}")

finally:
    # Close the cursor and the connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed.")

**We must also update the 'opponent' column to ensure that team names are correctly written.**

In [None]:
# SQL statement to update the 'opponent' names in the matches table
update_opponent_statement = """
UPDATE matches
SET opponent = %s
WHERE id = %s;
"""

In [None]:
# Establish a connection to the database using the predefined configuration
connection = None
cursor = None
try:
    connection = mysql.connector.connect(**db_config)

    # Check if the database connection was successful
    if connection.is_connected():
        cursor = connection.cursor()

        # Iterate through the DataFrame and update the 'opponent' column with corrected names
        for index, row in football_matches_df.iterrows():
            # Data tuple containing new opponent name and the corresponding match ID
            opponent_data = (row['opponent'], row['id'])
            cursor.execute(update_opponent_statement, opponent_data)
        
        # Commit the updates to the database to ensure all changes are saved
        connection.commit()
        print("Data has been successfully updated.")

except Error as e:
    # Handle any errors encountered during the connection or update process
    print(f"Error while working with the MySQL database: {e}")

finally:
    # Close the cursor and the connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed.")

## 7. Update Stadium Table in MySQL

In this final chapter, we focus on refining our stadium dataset stored in MySQL. Our goal is to enhance the dataset by incorporating any missing stadiums. This update ensures that our database reflects the most accurate and comprehensive stadium information available, supporting precise location-based analyses for future projects.

In [None]:
# SQL query to drop the existing 'stadiums' table if it exists, ensuring a fresh start
table_drop_query_stadiums = "DROP TABLE IF EXISTS stadiums;"

In [None]:
# SQL query to create a new 'stadiums' table with columns for team, stadium, and location details
table_creation_query_stadiums = """
        CREATE TABLE IF NOT EXISTS stadiums (
            id INT AUTO_INCREMENT PRIMARY KEY,
            team VARCHAR(255),
            fdcouk VARCHAR(255),
            city VARCHAR(255),
            stadium VARCHAR(255),
            capacity INT,
            latitude FLOAT,
            longitude FLOAT,
            country VARCHAR(255)
        );
        """

In [None]:
# SQL statement for inserting data into the 'stadiums' table
insert_statement_stadiums = (
            "INSERT INTO stadiums (team, fdcouk, city, stadium, capacity, latitude, longitude, country) "
            "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
        )

In [None]:
# Establish a connection to the MySQL database
connection = None
cursor = None
try:
    connection = mysql.connector.connect(
        host='localhost',
        database='adsfootball',
        user='root',
        password='abcabc123'
    )

    # Check if the database connection was successful
    if connection.is_connected():
        cursor = connection.cursor()

        # Execute the query to drop the existing 'stadiums' table
        cursor.execute(table_drop_query_stadiums)
        
        # Create a new 'stadiums' table based on the defined schema
        cursor.execute(table_creation_query_stadiums)
        
        # Iterate over each row in the stadium DataFrame to insert data
        for index, row in stadium_data_df.iterrows():
            data_tuple = (row['team'], row['fdcouk'], row['city'], row['stadium'], row['capacity'], 
                          row['latitude'], row['longitude'], row['country'])
            cursor.execute(insert_statement_stadiums, data_tuple)
        
        # Commit the transaction to save all changes
        connection.commit()
        print("Records inserted.")
        
except Error as e:
    # Handle any errors that occur during the database operations
    print(f"Error: {e}")

finally:
    # Close the cursor and the connection only if they were actually opened
    if cursor is not None:
        cursor.close()
    if connection is not None and connection.is_connected():
        connection.close()
        print("MySQL connection is closed")