# HOL: Soccer Euro Cup 2024 Prediction ⚽⚽⚽
**Building a Forecasting Model** with **Snowpark ML**

---

In this hands-on lab (HOL), we use Snowflake Notebooks and Snowpark ML to build an end-to-end forecasting model, from data ingestion through to model deployment. We cover each step of ML development and demonstrate the capabilities Snowflake provides through Snowpark ML and its supporting MLOps features.


_The model aims to predict the winner of the Euro Cup 2024 — a polarizing subject for Matteo (Italy supporter) and Simon (England supporter)!_
_Are you ready? Let's start!_

![image](https://i.gifer.com/embedded/download/BiCu.gif)


In [None]:
from snowflake.snowpark.context import get_active_session
session = get_active_session()

# Tag every query from this session with origin, name, and version for tracking
app_tag = {
    "origin": "sf_sit",
    "name": "hol_sport_predict",
    "version": '{major: 1, minor: 0}'
}

session.query_tag = app_tag
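
Snowflake stores the query tag as a plain string. One option, sketched below, is to serialize the metadata to JSON first so the tag could be parsed later when filtering query history. The helper name `build_query_tag` is hypothetical and not part of Snowpark; treat this as an illustration rather than the lab's required approach.

```python
import json

def build_query_tag(origin, name, major, minor):
    """Serialize tracking metadata to a JSON string (hypothetical helper)."""
    tag = {
        "origin": origin,
        "name": name,
        "version": {"major": major, "minor": minor},
    }
    # sort_keys keeps the string stable across runs, which simplifies filtering
    return json.dumps(tag, sort_keys=True)

tag_json = build_query_tag("sf_sit", "hol_sport_predict", 1, 0)
print(tag_json)
# In the notebook, the string could then be assigned with:
#   session.query_tag = tag_json
```

Nesting the version as an object, instead of a pre-formatted string, keeps the major and minor numbers individually addressable after parsing.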

## Data Ingestion
---
Once the dataset package has been uploaded to the PUBLIC.DATA stage, we can load it into our staging tables.

**(Time: 5 mins)**

- Step through the notebook
- Move data from __PUBLIC.DATA__ stage to tables
- Get an understanding of the dataset loaded
- Create additional cells to query the dataset further


In [None]:
# Ingest the data.
# The SQL statements are embedded in Python so that all datasets load in one step.

# Create a file format for generic CSV files
session.sql('''
    CREATE OR REPLACE FILE FORMAT ff_generic_csv
        TYPE = CSV
        FIELD_DELIMITER = ','
        FIELD_OPTIONALLY_ENCLOSED_BY='"'
        PARSE_HEADER = True
        TRIM_SPACE = TRUE
        NULL_IF = ('NULL', 'null')
        ESCAPE_UNENCLOSED_FIELD= NONE
        ERROR_ON_COLUMN_COUNT_MISMATCH=false
        replace_invalid_characters=true
        date_format=auto
        time_format=auto
        timestamp_format=auto;
''').collect()

# Create a file format for results.csv (loaded separately so that a unique id column can be added)
session.sql('''
    create or replace file format ff_results_csv
        type=csv
        skip_header=1
        field_delimiter=','
        trim_space=true
        field_optionally_enclosed_by='"'
        replace_invalid_characters=true
        date_format=auto
        time_format=auto
        timestamp_format=auto;
''').collect()

# For each CSV file: infer the schema, create the table from it, then copy the data in
tables = ["fixture", "rankings"]
for table in tables:
    session.sql(f'''
        CREATE OR REPLACE TABLE {table.upper()}
        USING TEMPLATE (
            SELECT ARRAY_AGG(object_construct(*))
            FROM TABLE(
                INFER_SCHEMA(
                    LOCATION=>'@data/{table}.csv',
                    FILE_FORMAT=>'ff_generic_csv',
                    IGNORE_CASE => TRUE
                )
            )
        );
    ''').collect()
    
    session.sql(f'''
        COPY INTO {table.upper()}
        FROM '@data/{table}.csv'
        FILE_FORMAT = ff_generic_csv
        MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
    ''').collect()
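
The loop above interpolates table names directly into SQL text. That is fine for a hard-coded list, but if the names ever came from elsewhere it may be worth validating them first. The sketch below shows one way to do that; `build_copy_statement` is a hypothetical helper, and the regular expression assumes plain unquoted Snowflake identifiers.

```python
import re

# Unquoted identifiers: start with a letter or underscore,
# then letters, digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")

def build_copy_statement(table, stage="data", file_format="ff_generic_csv"):
    """Build the COPY INTO statement, rejecting names that are not plain identifiers."""
    for name in (table, stage, file_format):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    return (
        f"COPY INTO {table.upper()} "
        f"FROM '@{stage}/{table}.csv' "
        f"FILE_FORMAT = {file_format} "
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE"
    )

for table in ["fixture", "rankings"]:
    print(build_copy_statement(table))
```

In the notebook, each returned string could be passed to `session.sql(...).collect()` in place of the inline f-string.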

In [None]:
-- Load results.csv straight from the stage, casting columns and adding a sequential id
create or replace table results as
(
    select 
        ROW_NUMBER() OVER (ORDER BY $1) AS id,
        $1::date date, 
        $2 home_team, 
        $3 away_team, 
        $4::integer home_team_score, 
        $5::integer away_team_score, 
        $6 tournament, 
        $7 city, 
        $8 country,
        (CASE WHEN $9 = 'TRUE' then 1 ELSE 0 END) neutral
    from 
        @data/results.csv
    (file_format => 'ff_results_csv')
);
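
To make the transformation above concrete, here is a minimal pure-Python sketch of the same steps: skip the header, order by the first column, number the rows, cast the scores, and turn the `TRUE`/`FALSE` text into a 0/1 flag. The two sample rows are illustrative and not copied from the dataset, and the header names are assumptions (the header is skipped anyway).

```python
import csv
import io
from datetime import date

# Two illustrative rows; the header names are assumptions.
SAMPLE_CSV = """date,home_team,away_team,home_score,away_score,tournament,city,country,neutral
2006-07-09,Italy,France,1,1,FIFA World Cup,Berlin,Germany,TRUE
1966-07-30,England,Germany,4,2,FIFA World Cup,London,England,FALSE
"""

def load_results(text):
    reader = csv.reader(io.StringIO(text))
    next(reader)  # mirrors skip_header=1
    rows = sorted(reader, key=lambda r: r[0])  # mirrors ORDER BY $1 on the raw date text
    results = []
    for row_id, r in enumerate(rows, start=1):  # mirrors ROW_NUMBER()
        results.append({
            "id": row_id,
            "date": date.fromisoformat(r[0]),
            "home_team": r[1],
            "away_team": r[2],
            "home_team_score": int(r[3]),
            "away_team_score": int(r[4]),
            "tournament": r[5],
            "city": r[6],
            "country": r[7],
            "neutral": 1 if r[8] == "TRUE" else 0,
        })
    return results

results = load_results(SAMPLE_CSV)
for row in results:
    print(row["id"], row["date"], row["home_team"], row["away_team"], row["neutral"])
```

One difference to keep in mind: Python's sort is stable, whereas `ROW_NUMBER() OVER (ORDER BY $1)` does not guarantee an order among matches played on the same date, so ids for same-day matches may change if the table is rebuilt.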

In [None]:
-- Handle a country spelling difference (Turkey vs. Türkiye) so team names match across tables

update rankings 
    set country_full = 'Türkiye'
    where country_abrv = 'TUR';
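
Spelling differences like this one can be found by comparing the team names in two tables. The sketch below shows the idea on small in-memory lists. The sample names are assumptions for illustration and `find_unmatched` is a hypothetical helper; in the notebook the lists would come from the `fixture` and `rankings` tables.

```python
def find_unmatched(fixture_teams, ranking_teams):
    """Return fixture team names that have no exact match in the rankings."""
    return sorted(set(fixture_teams) - set(ranking_teams))

# Assumed sample values: the rankings use a different spelling for one team.
fixture_teams = ["Italy", "England", "Türkiye"]
ranking_teams = ["Italy", "England", "Turkey"]

print(find_unmatched(fixture_teams, ranking_teams))  # ['Türkiye']
```

Any name the check returns would need either a fix like the `update` above or a mapping applied at join time.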

In [None]:
-- Let's check our newly created tables

show tables in schema public;

In [None]:
-- And now let's check that England's 1966 World Cup final victory is there...
SELECT
  *
FROM
  results
WHERE
  home_team = 'England' 
  AND away_team = 'Germany'
  AND tournament = 'FIFA World Cup'
  AND YEAR(date) = 1966;

In [None]:
# We also loaded the Euro Cup 2024 fixtures: these are the matches whose results we'll predict,
# from the group stage through the knockout stage to the final (51 matches in total).

session.table('fixture').limit(51)