## Installing dependencies

 - Have Python 3 installed on your machine
 - Create a virtual environment in your current working directory: `python3 -m venv .venv`
 - Activate the virtual environment with `source .venv/bin/activate`, then run `pip install -r requirements.txt` to install all dependencies

----



In [55]:
import pandas as pd
import numpy as np
import matplotlib as mp
import random
import json
from collections import defaultdict

teams = []

## Question 1: Temperature Modeling

Consider a temperature X(t) that varies randomly over continuous time t, where t ranges from 0 to 1. We assume X(0) = 0, that is, the temperature at time 0 is 0. For a small time increment ∆t, we assume that the change in temperature from time t to t + ∆t, denoted X(t + ∆t) − X(t), follows a normal distribution with mean 0 and variance ∆t.

1. Let $P$ be the random variable denoting the proportion of time in [0, 1] during which the temperature is positive. Estimate the distribution of $P$ by Monte Carlo simulation, experimenting with various values of ∆t (e.g. ∆t = 0.01, 0.001, 0.0001, ...).

2. Let $T_{max}$ be the random variable denoting the time in [0, 1] at which the temperature attains its maximum. Estimate the distribution of $T_{max}$ by Monte Carlo simulation, experimenting with various values of ∆t (e.g. ∆t = 0.01, 0.001, 0.0001, ...).
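
One possible way to set up both estimates is sketched below. It is only a sketch under stated assumptions: the function names, the number of paths, and the seed are illustrative choices, and "proportion of time" is approximated by the fraction of grid points at which the simulated temperature is positive.

```python
import numpy as np

def simulate_paths(n_paths, dt, rng):
    """Simulate temperature paths on [0, 1] with X(0) = 0 on a grid of step dt."""
    n_steps = int(round(1.0 / dt))
    # Each increment X(t + dt) - X(t) is drawn from N(0, dt), i.e. std sqrt(dt).
    increments = rng.normal(loc=0.0, scale=np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    # Prepend the starting value X(0) = 0 to every path.
    return np.hstack([np.zeros((n_paths, 1)), paths])

def estimate_p_and_tmax(n_paths=2000, dt=0.001, seed=0):
    """Return Monte Carlo samples of P and T_max (assumed, illustrative defaults)."""
    rng = np.random.default_rng(seed)
    paths = simulate_paths(n_paths, dt, rng)
    # Fraction of grid points after t = 0 at which the temperature is positive.
    p = (paths[:, 1:] > 0).mean(axis=1)
    # Time of the first grid point at which each path attains its maximum.
    t_max = paths.argmax(axis=1) * dt
    return p, t_max

p_samples, t_max_samples = estimate_p_and_tmax()
```

Histograms of `p_samples` and `t_max_samples` for several values of `dt` then give the estimated distributions. As ∆t shrinks, the simulated process approaches standard Brownian motion, for which Lévy's arcsine laws describe both quantities, so the histograms would be expected to look U-shaped rather than bell-shaped.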

## Question 2: Premier League Forecasting

Create a probabilistic model and perform Monte Carlo simulations to forecast the final points for Premier League teams in the 2024-2025 season. 

This model will include some unknown parameters that you will determine based on the data you gather. 

In Premier League matches, teams earn three points for a win, one point for a draw, and no points for a loss. For your predictions, you could use statistics from the beginning of the season up to a specific date to estimate the parameters of your model, and then run Monte Carlo simulations to project the outcomes of the remaining matches, ultimately predicting the final points for each team at the season's end. 
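
The points rule and the final tally can be sketched as follows. The helper names and the two placeholder teams are hypothetical and used only for illustration; the scores are made-up inputs, not data.

```python
from collections import defaultdict

def match_points(home_goals, away_goals):
    """Return (home_points, away_points): 3 for a win, 1 for a draw, 0 for a loss."""
    if home_goals > away_goals:
        return 3, 0
    if home_goals < away_goals:
        return 0, 3
    return 1, 1

def final_points(current_points, simulated_results):
    """Add points from simulated remaining matches to the points earned so far."""
    totals = defaultdict(int, current_points)
    for home, away, home_goals, away_goals in simulated_results:
        home_pts, away_pts = match_points(home_goals, away_goals)
        totals[home] += home_pts
        totals[away] += away_pts
    return dict(totals)

# Hypothetical example: two placeholder teams and two simulated remaining matches.
current = {"Team A": 10, "Team B": 8}
remaining = [("Team A", "Team B", 2, 1), ("Team B", "Team A", 1, 1)]
projected = final_points(current, remaining)
```

Repeating `final_points` over many simulated sets of remaining results yields a distribution of final points per team rather than a single number.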

----

Here's a brief outline of a relatively straightforward way to model this scenario. 

For each match, you can treat the number of shots attempted by the home and away teams as random variables, such as Poisson random variables. The rate parameters of the Poisson distribution will be influenced by the strengths of both teams. Each shot taken will have an associated probability of scoring, which also varies depending on the teams involved. 
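
A minimal sketch of this outline is shown below, assuming one shot rate and one per-shot scoring probability per side. The numeric values are placeholders chosen for illustration, not estimates from any data, and the function name is hypothetical.

```python
import numpy as np

def simulate_scores(home_shot_rate, away_shot_rate,
                    home_conversion, away_conversion, n_sims, rng):
    """Simulate n_sims scorelines for a single fixture."""
    # Number of shots per side: Poisson with a rate depending on both teams.
    home_shots = rng.poisson(home_shot_rate, size=n_sims)
    away_shots = rng.poisson(away_shot_rate, size=n_sims)
    # Each shot scores independently with the given probability.
    home_goals = rng.binomial(home_shots, home_conversion)
    away_goals = rng.binomial(away_shots, away_conversion)
    return home_goals, away_goals

rng = np.random.default_rng(0)
# Placeholder parameters (assumed values, not fitted to data).
home_goals, away_goals = simulate_scores(14.0, 10.0, 0.11, 0.09, 10_000, rng)

p_home_win = np.mean(home_goals > away_goals)
p_draw = np.mean(home_goals == away_goals)
p_away_win = np.mean(home_goals < away_goals)
```

Under these assumptions the goal count for each side is itself Poisson with rate equal to the shot rate times the scoring probability, which gives a quick sanity check on the simulated means.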

You are encouraged to create your own models, but it's essential to explain and justify your choices. Discuss how you determine the parameters and outline the advantages and limitations of the models you select.

-----
## Question 2: Premier League Forecasting

The script `datascraper.py` extracts a large quantity of player statistics from the website https://www.fbref.com and returns a *players.json* file that we can use for our predictive model.

The general principle of our simulation is a time-based simulation. That is, we start the game with a random player from the home team (excluding the goalkeeper) taking the kick-off. He then passes to another random player (again excluding the goalkeeper). From there, we simulate each player's chances of passing, shooting, keeping, or losing the ball.
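
The loop described above could look roughly like the sketch below. Everything numeric here is an assumption for illustration: the action probabilities, the number of ticks, and the shot success rate are placeholders, whereas the real model would derive them from each player's statistics. As a further simplification, possession switches to a random opponent after any shot or lost ball.

```python
import random

# Hypothetical per-tick action probabilities (placeholders, not fitted values).
ACTION_PROBS = {"pass": 0.60, "keep": 0.25, "lose": 0.12, "shoot": 0.03}

def simulate_possession(home_outfield, away_outfield, n_ticks=90,
                        shot_success=0.1, seed=None):
    """Simulate one match tick by tick and return the goals per side."""
    rng = random.Random(seed)
    squads = {"home": home_outfield, "away": away_outfield}
    goals = {"home": 0, "away": 0}
    side = "home"
    # Kick-off: a random home outfield player passes to a random teammate.
    kicker = rng.choice(squads[side])
    holder = rng.choice([p for p in squads[side] if p != kicker])
    actions = list(ACTION_PROBS)
    weights = list(ACTION_PROBS.values())
    for _ in range(n_ticks):
        action = rng.choices(actions, weights=weights, k=1)[0]
        if action == "pass":
            holder = rng.choice([p for p in squads[side] if p != holder])
        elif action in ("lose", "shoot"):
            if action == "shoot" and rng.random() < shot_success:
                goals[side] += 1
            # Simplification: the other side restarts with a random player.
            side = "away" if side == "home" else "home"
            holder = rng.choice(squads[side])
        # "keep": the current holder retains the ball for this tick.
    return goals

home = [f"H{i}" for i in range(1, 11)]  # placeholder outfield players
away = [f"A{i}" for i in range(1, 11)]
result = simulate_possession(home, away, seed=1)
```

Passing a fixed `seed` makes a single simulated match reproducible, which helps when debugging the transition logic before scaling up to many runs.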

We need to use this data to model a large number of things:
1) Team Lineups
2) 


#### Team Lineup Creation Function:

In [56]:

def predict_starting_lineup(team_name, iterations=10000, file_path ="./players.json"):
    """
    Predict the starting lineup for a team using a Monte Carlo simulation.

    Parameters:
        team_name (str): The name of the team to predict the lineup for.
        iterations (int): Number of Monte Carlo iterations.
        file_path (str): Path to the JSON file containing player stats.

    Returns:
        dict: Predicted starting lineup with one goalkeeper plus defenders,
            midfielders, and forwards in the numbers given by the team's
            formation in teamFormations.json (default 4-3-3).
    """
    # Look up the team's formation (defenders, midfielders, forwards); default to 4-3-3
    with open("./teamFormations.json", 'r') as formationFile:
        formations = json.load(formationFile)

    lineup = "4,3,3"
    for formation in formations:
        if formation["name"] == team_name:
            lineup = formation["formation"]
        
    # Load player data from file
    with open(file_path, 'r') as file:
        players = json.load(file)
    
    lineup = lineup.split(",")

    # Filter players for the specified team
    team_players = [player for player in players if player['team'] == team_name]

    if not team_players:
        raise ValueError(f"No players found for team {team_name}")

    # Separate players by position, accounting for multiple positions
    defenders = [p for p in team_players if 'DF' in p['position'].split(', ')]
    midfielders = [p for p in team_players if 'MF' in p['position'].split(', ')]
    forwards = [p for p in team_players if 'FW' in p['position'].split(', ')]
    goalkeepers = [p for p in team_players if 'GK' in p['position'].split(', ')]

    if not goalkeepers:
        raise ValueError(f"No goalkeepers found for team {team_name}")

    # Monte Carlo simulation to predict lineup
    defender_selection_counts = defaultdict(int)
    midfielder_selection_counts = defaultdict(int)
    forward_selection_counts = defaultdict(int)
    goalkeeper_selection_counts = defaultdict(int)

    for _ in range(iterations):
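        # Randomly choose defenders (count taken from the formation), weighted by games played.
        # Note: random.choices samples with replacement, so one player can be drawn more than once per iteration.
        selected_defenders = random.choices(
            defenders,
            weights=[p['gamesPlayedForCurrentTeam'] if p['gamesPlayedForCurrentTeam'] > 0 else 1 for p in defenders],
            k=int(lineup[0])
        )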

        # Randomly choose midfielders (count taken from the formation), weighted by games played
        selected_midfielders = random.choices(
            midfielders,
            weights=[p['gamesPlayedForCurrentTeam'] if p['gamesPlayedForCurrentTeam'] > 0 else 1 for p in midfielders],
            k=int(lineup[1])
        )

        # Randomly choose forwards (count taken from the formation), weighted by games played
        selected_forwards = random.choices(
            forwards,
            weights=[p['gamesPlayedForCurrentTeam'] if p['gamesPlayedForCurrentTeam'] > 0 else 1 for p in forwards],
            k=int(lineup[2])
        )

        # Randomly choose 1 goalkeeper weighted by games played
        selected_goalkeeper = random.choices(
            goalkeepers,
            weights=[p['gamesPlayedForCurrentTeam'] if p['gamesPlayedForCurrentTeam'] > 0 else 1 for p in goalkeepers],
            k=1
        )[0]

        # Increment selection counts
        for player in selected_defenders:
            defender_selection_counts[player['name']] += 1
        for player in selected_midfielders:
            midfielder_selection_counts[player['name']] += 1
        for player in selected_forwards:
            forward_selection_counts[player['name']] += 1
        goalkeeper_selection_counts[selected_goalkeeper['name']] += 1

    # Determine the most likely starting lineup
    predicted_defenders = sorted(
        defender_selection_counts.items(),
        key=lambda x: x[1],
        reverse=True
    )[:int(lineup[0])]

    predicted_midfielders = sorted(
        midfielder_selection_counts.items(),
        key=lambda x: x[1],
        reverse=True
    )[:int(lineup[1])]

    predicted_forwards = sorted(
        forward_selection_counts.items(),
        key=lambda x: x[1],
        reverse=True
    )[:int(lineup[2])]

    predicted_goalkeeper = max(goalkeeper_selection_counts.items(), key=lambda x: x[1])

    # Format the result
    lineup = {
        'goalkeeper': predicted_goalkeeper[0],
        'defenders': [player[0] for player in predicted_defenders],
        'midfielders': [player[0] for player in predicted_midfielders],
        'forwards': [player[0] for player in predicted_forwards]
    }

    return lineup


# Build the list of team names
with open("./teamFormations.json", 'r') as teamFile:
    teamsJSON = json.load(teamFile)
for team in teamsJSON:
    teams.append(team['name'])







['Arsenal', 'Aston Villa', 'Bournemouth', 'Brentford', 'Brighton', 'Chelsea', 'Crystal Palace', 'Everton', 'Fulham', 'Ipswich Town', 'Leicester City', 'Liverpool', 'Manchester City', 'Manchester Utd', 'Newcastle Utd', 'Nottingham Forest', 'Southampton', 'Tottenham', 'West Ham', 'Wolves']
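
A design note on `predict_starting_lineup`: `random.choices` draws with replacement, so a single iteration can pick the same player more than once for one position group. If distinct players per iteration are wanted, one possible alternative is weighted sampling without replacement, sketched here with NumPy. The helper name and the example players are hypothetical; only the `name` and `gamesPlayedForCurrentTeam` fields come from the code above.

```python
import numpy as np

def sample_distinct_players(players, k, rng):
    """Draw k distinct players, weighted by games played (minimum weight 1)."""
    weights = np.array(
        [max(p["gamesPlayedForCurrentTeam"], 1) for p in players], dtype=float
    )
    # replace=False guarantees that no player is selected twice in one draw.
    idx = rng.choice(len(players), size=k, replace=False, p=weights / weights.sum())
    return [players[i] for i in idx]

# Hypothetical players used only to exercise the helper.
example_defenders = [
    {"name": "Player A", "gamesPlayedForCurrentTeam": 12},
    {"name": "Player B", "gamesPlayedForCurrentTeam": 9},
    {"name": "Player C", "gamesPlayedForCurrentTeam": 0},
    {"name": "Player D", "gamesPlayedForCurrentTeam": 5},
    {"name": "Player E", "gamesPlayedForCurrentTeam": 3},
]
rng = np.random.default_rng(0)
chosen = sample_distinct_players(example_defenders, k=4, rng=rng)
```

This requires at least `k` players in the position group; with fewer, `rng.choice` raises a `ValueError`, which the caller would need to handle.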
