# Premier League Prediction

**Information about the EPL and its Teams**
- Total 20 teams in each season
- Each team play a total of 38 games in a season
- The dataset contains data from 2011 to 2023 i.e. 13 seasons.

### Libraries

In [153]:
import pandas as pd
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score, precision_score
from sklearn.ensemble import RandomForestClassifier

### Initialization

In [100]:
matches = pd.read_csv("../data/premier_league_data.csv")
matches.head()

Unnamed: 0,date,time,comp,round,day,venue,result,gf,ga,opponent,...,sh,sot,dist,fk,pk,pkatt,season,team,xg,xga
0,2011-08-15,,Premier League,Matchweek 1,Mon,Home,W,4,0,Swansea City,...,,,,0.0,0.0,,2011,Manchester City,,
1,2011-08-21,,Premier League,Matchweek 2,Sun,Away,W,3,2,Bolton,...,,,,0.0,0.0,,2011,Manchester City,,
2,2011-08-28,,Premier League,Matchweek 3,Sun,Away,W,5,1,Tottenham,...,,,,0.0,0.0,,2011,Manchester City,,
3,2011-09-10,,Premier League,Matchweek 4,Sat,Home,W,3,0,Wigan Athletic,...,,,,0.0,0.0,,2011,Manchester City,,
4,2011-09-18,,Premier League,Matchweek 5,Sun,Away,D,2,2,Fulham,...,,,,0.0,0.0,,2011,Manchester City,,


### Data Analysis

**Check if any game data is missing**

In [101]:
matches.shape

(9880, 28)

In [102]:
38 * 20 * 13

9880

- Each 20 teams played 38 games over 13 seasons, thus the math checks out ✅.

**Check if the data of teams in dataset matches with the number of games that they played**

In [103]:
matches['team'].value_counts()

team
Manchester City             494
Manchester United           494
Arsenal                     494
Tottenham Hotspur           494
Chelsea                     494
Everton                     494
Liverpool                   494
Newcastle United            456
West Ham United             456
Southampton                 418
Crystal Palace              418
Aston Villa                 380
Leicester City              342
West Bromwich Albion        304
Burnley                     304
Stoke City                  266
Swansea City                266
Brighton and Hove Albion    266
Bournemouth                 266
Wolverhampton Wanderers     266
Fulham                      266
Sunderland                  228
Norwich City                228
Watford                     228
Sheffield United            114
Queens Park Rangers         114
Brentford                   114
Leeds United                114
Hull City                   114
Wigan Athletic               76
Huddersfield Town            76
Car

In [104]:
# Dictionary of expected match counts for each team
expected_matches = {
    'Manchester City': 494, 'Manchester United': 494, 'Arsenal': 494, 'Tottenham Hotspur': 494, 'Chelsea': 494, 
    'Everton': 494, 'Liverpool': 494, 'Newcastle United': 456, 'West Ham United': 456, 'Southampton': 418, 
    'Crystal Palace': 418, 'Aston Villa': 380, 'Leicester City': 342, 'West Bromwich Albion': 304, 
    'Burnley': 304, 'Stoke City': 266, 'Swansea City': 266, 'Brighton and Hove Albion': 266, 
    'Bournemouth': 266, 'Wolverhampton Wanderers': 266, 'Fulham': 266, 'Sunderland': 228, 
    'Norwich City': 228, 'Watford': 228, 'Sheffield United': 114, 'Queens Park Rangers': 114, 
    'Brentford': 114, 'Leeds United': 114, 'Hull City': 114, 'Wigan Athletic': 76, 'Huddersfield Town': 76, 
    'Cardiff City': 76, 'Nottingham Forest': 76, 'Bolton Wanderers': 38, 'Blackburn Rovers': 38, 
    'Reading': 38, 'Middlesbrough': 38, 'Luton Town': 38
}

# Get the actual match counts from your data
actual_match_counts = matches['team'].value_counts()

# Compare actual vs expected
for team, expected_count in expected_matches.items():
    actual_count = actual_match_counts.get(team, 0)
    if actual_count == expected_count:
        continue
    else:
        print(f"{team}: Incorrect ({actual_count} matches, expected {expected_count})")

- No unmatched data, hence each team's data matches ✅.

**Check if there are correct number of games played in each season**

In [105]:
matches['season'].value_counts()

season
2011    760
2012    760
2013    760
2014    760
2015    760
2016    760
2017    760
2018    760
2019    760
2020    760
2021    760
2022    760
2023    760
Name: count, dtype: int64

In [106]:
38*20

760

- This checks out as well ✅.

### Data Cleaning

In [107]:
matches.dtypes

date              object
time              object
comp              object
round             object
day               object
venue             object
result            object
gf                 int64
ga                 int64
opponent          object
poss             float64
attendance       float64
captain           object
formation         object
opp formation     object
referee           object
match report      object
notes            float64
sh               float64
sot              float64
dist             float64
fk               float64
pk               float64
pkatt            float64
season             int64
team              object
xg               float64
xga              float64
dtype: object

In [108]:
matches["date"] = pd.to_datetime(matches["date"])

**Fill missing values**

In [110]:
matches.isnull().sum()

date                0
time             2280
comp                0
round               0
day                 0
venue               0
result              0
gf                  0
ga                  0
opponent            0
poss             2280
attendance        882
captain          3040
formation           0
opp formation       0
referee             0
match report        0
notes            9880
sh               2280
sot              2280
dist             4563
fk                  0
pk                  0
pkatt            2280
season              0
team                0
xg               4560
xga              4560
dtype: int64

In [111]:
average_times = {
    'Mon': '20:00',
    'Tue': '19:45',
    'Wed': '19:45',
    'Thu': '19:45',
    'Fri': '20:00',
    'Sat': '15:00',
    'Sun': '14:00'
}

matches["time"] = matches["time"].fillna(matches["day"].map(average_times))

In [112]:
avg_poss = np.ceil(matches.groupby('team')['poss'].mean())

# Fill missing possession values with the ceiling average possession
matches['poss'] = matches['poss'].fillna(matches['team'].map(avg_poss))

# For teams with Nan as avg poss, put the least poss
matches["poss"] = matches['poss'].fillna(avg_poss.min())

In [113]:
avg_attendance = np.ceil(matches.groupby('team')['attendance'].mean())

# Fill missing attandance values with the ceiling average
matches['attendance'] = matches['attendance'].fillna(matches['team'].map(avg_attendance))

In [114]:
# Calculate average values for each team
avg_dist = matches.groupby('team')['dist'].mean()
avg_xg = matches.groupby('team')['xg'].mean()
avg_xga = matches.groupby('team')['xga'].mean()

# Fill missing values with the average for each team
matches['dist'] = matches['dist'].fillna(matches['team'].map(avg_dist))
matches['xg'] = matches['xg'].fillna(matches['team'].map(avg_xg))
matches['xga'] = matches['xga'].fillna(matches['team'].map(avg_xga))

In [115]:
matches["captain"] = matches['captain'].fillna("")
matches["notes"] = matches['notes'].fillna("")
matches["sh"] = matches['sh'].fillna(0)
matches["sot"] = matches['sot'].fillna(0)
matches["pkatt"] = matches['pkatt'].fillna(0)
matches["dist"] = matches['pkatt'].fillna(0)
matches["xg"] = matches['pkatt'].fillna(0)
matches["xga"] = matches['pkatt'].fillna(0)

In [117]:
matches['formation'] = matches['formation'].str.replace('200', '', regex=False)
matches['opp formation'] = matches['opp formation'].str.replace('200', '', regex=False)

In [118]:
matches.isnull().sum()

date             0
time             0
comp             0
round            0
day              0
venue            0
result           0
gf               0
ga               0
opponent         0
poss             0
attendance       0
captain          0
formation        0
opp formation    0
referee          0
match report     0
notes            0
sh               0
sot              0
dist             0
fk               0
pk               0
pkatt            0
season           0
team             0
xg               0
xga              0
dtype: int64

In [120]:
matches.to_csv("../data/cleaned_premier_league_data.csv")

### Data Preprocessing

In [121]:
matches["venue"] = matches["venue"].astype("category").cat.codes

In [122]:
matches["opp_code"] = matches["opponent"].astype("category").cat.codes

In [125]:
matches["hour"] = matches["time"].str.replace(":.+", "", regex=True).astype('int')

In [127]:
matches['day_code'] = matches['date'].dt.dayofweek

In [132]:
matches['target'] = (matches['result'] == 'W').astype('int')

### Model Training

In [155]:
classifier = RandomForestClassifier(random_state=77)

In [156]:
param_grid = {
    'n_estimators': [100, 200, 300],
    'min_samples_split': [2, 5, 10, 20],
    'min_samples_leaf': [1, 2, 5],
    'max_depth': [None, 10, 20, 30],
    'max_features': ['sqrt', 'log2', 0.5]
}

In [157]:
train = matches[matches['date'] < '2020-01-01']
test = matches[matches['date'] > '2020-01-01']

In [158]:
predictors = ['venue', 'opp_code', 'hour', 'day_code']

In [159]:
# Use GridSearchCV to find the best parameters
grid_search = GridSearchCV(estimator=classifier, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(train[predictors], train['target'])

print("Best Hyperparameters:", grid_search.best_params_)

Best Hyperparameters: {'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 5, 'min_samples_split': 20, 'n_estimators': 100}


In [160]:
classifier = grid_search.best_estimator_
preds = classifier.predict(test[predictors])

In [162]:
acc = accuracy_score(test['target'], preds)
acc

0.6025413711583925

In [166]:
combined = pd.DataFrame(dict(actual=test['target'], prediction=preds)) 
pd.crosstab(index=combined['actual'], columns=combined['prediction'])

prediction,0,1
actual,Unnamed: 1_level_1,Unnamed: 2_level_1
0,1621,453
1,892,418


In [167]:
precision = precision_score(test['target'], preds)
precision

np.float64(0.4799081515499426)

### Improving precision with rolling averages

In [247]:
def rolling_averages(group, cols, new_cols):
    group = group.sort_values('date')
    rolling_stats = group[cols].rolling(3, closed='left').mean()
    group[new_cols] = rolling_stats
    # group = group.dropna(subset=new_cols)
    group[new_cols] = group[new_cols].fillna(0)

    return group

In [248]:
cols = ['gf', 'ga', 'sh', 'sot', 'dist', 'fk', 'pk', 'pkatt']
new_cols = [f'{c}_rolling' for c in cols]

In [249]:
matches_rolling = matches.groupby('team').apply(lambda x: rolling_averages(x, cols, new_cols))

  matches_rolling = matches.groupby('team').apply(lambda x: rolling_averages(x, cols, new_cols))


In [250]:
matches_rolling

Unnamed: 0_level_0,Unnamed: 1_level_0,date,time,comp,round,day,venue,result,gf,ga,opponent,...,day_code,target,gf_rolling,ga_rolling,sh_rolling,sot_rolling,dist_rolling,fk_rolling,pk_rolling,pkatt_rolling
team,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1
Arsenal,76,2011-08-13,15:00,Premier League,Matchweek 1,Sat,0,D,0,0,Newcastle Utd,...,5,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
Arsenal,77,2011-08-20,15:00,Premier League,Matchweek 2,Sat,1,L,0,2,Liverpool,...,5,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
Arsenal,78,2011-08-28,14:00,Premier League,Matchweek 3,Sun,0,L,2,8,Manchester Utd,...,6,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
Arsenal,79,2011-09-10,15:00,Premier League,Matchweek 4,Sat,1,W,1,0,Swansea City,...,5,1,0.666667,3.333333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
Arsenal,80,2011-09-17,15:00,Premier League,Matchweek 5,Sat,0,L,3,4,Blackburn,...,5,0,1.000000,3.333333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Wolverhampton Wanderers,9647,2024-04-24,19:45,Premier League,Matchweek 29,Wed,1,L,0,1,Bournemouth,...,2,0,1.000000,2.000000,9.666667,4.000000,0.333333,0.333333,0.333333,0.333333
Wolverhampton Wanderers,9648,2024-04-27,15:00,Premier League,Matchweek 35,Sat,1,W,2,1,Luton Town,...,5,1,0.666667,1.666667,10.333333,3.333333,0.000000,0.000000,0.000000,0.000000
Wolverhampton Wanderers,9649,2024-05-04,17:30,Premier League,Matchweek 36,Sat,0,L,1,5,Manchester City,...,5,0,0.666667,1.333333,11.000000,4.000000,0.000000,0.000000,0.000000,0.000000
Wolverhampton Wanderers,9650,2024-05-11,15:00,Premier League,Matchweek 37,Sat,1,L,1,3,Crystal Palace,...,5,0,1.000000,2.333333,10.000000,3.333333,0.000000,0.000000,0.000000,0.000000


In [251]:
matches_rolling = matches_rolling.droplevel('team')
matches_rolling

Unnamed: 0,date,time,comp,round,day,venue,result,gf,ga,opponent,...,day_code,target,gf_rolling,ga_rolling,sh_rolling,sot_rolling,dist_rolling,fk_rolling,pk_rolling,pkatt_rolling
76,2011-08-13,15:00,Premier League,Matchweek 1,Sat,0,D,0,0,Newcastle Utd,...,5,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
77,2011-08-20,15:00,Premier League,Matchweek 2,Sat,1,L,0,2,Liverpool,...,5,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
78,2011-08-28,14:00,Premier League,Matchweek 3,Sun,0,L,2,8,Manchester Utd,...,6,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
79,2011-09-10,15:00,Premier League,Matchweek 4,Sat,1,W,1,0,Swansea City,...,5,1,0.666667,3.333333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
80,2011-09-17,15:00,Premier League,Matchweek 5,Sat,0,L,3,4,Blackburn,...,5,0,1.000000,3.333333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9647,2024-04-24,19:45,Premier League,Matchweek 29,Wed,1,L,0,1,Bournemouth,...,2,0,1.000000,2.000000,9.666667,4.000000,0.333333,0.333333,0.333333,0.333333
9648,2024-04-27,15:00,Premier League,Matchweek 35,Sat,1,W,2,1,Luton Town,...,5,1,0.666667,1.666667,10.333333,3.333333,0.000000,0.000000,0.000000,0.000000
9649,2024-05-04,17:30,Premier League,Matchweek 36,Sat,0,L,1,5,Manchester City,...,5,0,0.666667,1.333333,11.000000,4.000000,0.000000,0.000000,0.000000,0.000000
9650,2024-05-11,15:00,Premier League,Matchweek 37,Sat,1,L,1,3,Crystal Palace,...,5,0,1.000000,2.333333,10.000000,3.333333,0.000000,0.000000,0.000000,0.000000


In [252]:
matches_rolling.index = range(matches_rolling.shape[0])

In [253]:
matches_rolling

Unnamed: 0,date,time,comp,round,day,venue,result,gf,ga,opponent,...,day_code,target,gf_rolling,ga_rolling,sh_rolling,sot_rolling,dist_rolling,fk_rolling,pk_rolling,pkatt_rolling
0,2011-08-13,15:00,Premier League,Matchweek 1,Sat,0,D,0,0,Newcastle Utd,...,5,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
1,2011-08-20,15:00,Premier League,Matchweek 2,Sat,1,L,0,2,Liverpool,...,5,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
2,2011-08-28,14:00,Premier League,Matchweek 3,Sun,0,L,2,8,Manchester Utd,...,6,0,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
3,2011-09-10,15:00,Premier League,Matchweek 4,Sat,1,W,1,0,Swansea City,...,5,1,0.666667,3.333333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
4,2011-09-17,15:00,Premier League,Matchweek 5,Sat,0,L,3,4,Blackburn,...,5,0,1.000000,3.333333,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9875,2024-04-24,19:45,Premier League,Matchweek 29,Wed,1,L,0,1,Bournemouth,...,2,0,1.000000,2.000000,9.666667,4.000000,0.333333,0.333333,0.333333,0.333333
9876,2024-04-27,15:00,Premier League,Matchweek 35,Sat,1,W,2,1,Luton Town,...,5,1,0.666667,1.666667,10.333333,3.333333,0.000000,0.000000,0.000000,0.000000
9877,2024-05-04,17:30,Premier League,Matchweek 36,Sat,0,L,1,5,Manchester City,...,5,0,0.666667,1.333333,11.000000,4.000000,0.000000,0.000000,0.000000,0.000000
9878,2024-05-11,15:00,Premier League,Matchweek 37,Sat,1,L,1,3,Crystal Palace,...,5,0,1.000000,2.333333,10.000000,3.333333,0.000000,0.000000,0.000000,0.000000


In [272]:
extended_predictors = predictors + new_cols

train = matches_rolling[matches_rolling['date'] <= '2020-01-01']
test = matches_rolling[matches_rolling['date'] > '2020-01-01']

classifier = RandomForestClassifier(random_state=77)
grid_search = GridSearchCV(estimator=classifier, param_grid=param_grid, cv=5, n_jobs=-1)
grid_search.fit(train[extended_predictors], train['target'])
print("Best Hyperparameters:", grid_search.best_params_)

  _data = np.array(data, dtype=dtype, copy=copy,


Best Hyperparameters: {'max_depth': 10, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 10, 'n_estimators': 300}


In [340]:
# classifier = RandomForestClassifier(min_samples_leaf= 50, n_estimators= 500, random_state=77)
classifier = grid_search.best_estimator_

In [341]:
def make_predictions(predictors):
    classifier.fit(train[predictors], train['target'])
    preds = classifier.predict(test[predictors])
    combined = pd.DataFrame(dict(actual=test['target'], prediction=preds), index=test.index) 
    precision = precision_score(test['target'], preds)

    return combined, precision

In [342]:
combined, precision = make_predictions(extended_predictors)

In [343]:
precision

np.float64(0.5656984785615491)

### Combine Home and Away predictions

In [344]:
combined = combined.merge(matches_rolling[['date', 'team', 'opponent', 'result']], 
                          left_index=True, right_index=True)
combined

Unnamed: 0,actual,prediction,date,team,opponent,result
325,0,0,2020-01-11,Arsenal,Crystal Palace,D
326,0,0,2020-01-18,Arsenal,Sheffield Utd,D
327,0,0,2020-01-21,Arsenal,Chelsea,D
328,0,0,2020-02-02,Arsenal,Burnley,D
329,1,0,2020-02-16,Arsenal,Newcastle Utd,W
...,...,...,...,...,...,...
9875,0,0,2024-04-24,Wolverhampton Wanderers,Bournemouth,L
9876,1,0,2024-04-27,Wolverhampton Wanderers,Luton Town,W
9877,0,0,2024-05-04,Wolverhampton Wanderers,Manchester City,L
9878,0,0,2024-05-11,Wolverhampton Wanderers,Crystal Palace,L


In [345]:
class MissingDict(dict):
    __missing__ = lambda self, key: key

map_values = {
    "Brighton and Hove Albion": "Brighton",
    "Manchester United": "Manchester Utd",
    "Newcastle United": "Newcastle Utd",
    "Tottenham Hotspur": "Tottenham",
    "West Ham United": "West Ham",
    "Wolverhampton Wanderers": "Wolves"
}

mapping = MissingDict(**map_values)

In [346]:
mapping["Wolverhampton Wanderers"]

'Wolves'

In [347]:
combined["new_team"] = combined["team"].map(mapping)
combined

Unnamed: 0,actual,prediction,date,team,opponent,result,new_team
325,0,0,2020-01-11,Arsenal,Crystal Palace,D,Arsenal
326,0,0,2020-01-18,Arsenal,Sheffield Utd,D,Arsenal
327,0,0,2020-01-21,Arsenal,Chelsea,D,Arsenal
328,0,0,2020-02-02,Arsenal,Burnley,D,Arsenal
329,1,0,2020-02-16,Arsenal,Newcastle Utd,W,Arsenal
...,...,...,...,...,...,...,...
9875,0,0,2024-04-24,Wolverhampton Wanderers,Bournemouth,L,Wolves
9876,1,0,2024-04-27,Wolverhampton Wanderers,Luton Town,W,Wolves
9877,0,0,2024-05-04,Wolverhampton Wanderers,Manchester City,L,Wolves
9878,0,0,2024-05-11,Wolverhampton Wanderers,Crystal Palace,L,Wolves


In [348]:
merged = combined.merge(combined, left_on=['date', 'new_team'], right_on=['date', 'opponent'])
merged

Unnamed: 0,actual_x,prediction_x,date,team_x,opponent_x,result_x,new_team_x,actual_y,prediction_y,team_y,opponent_y,result_y,new_team_y
0,0,0,2020-01-11,Arsenal,Crystal Palace,D,Arsenal,0,0,Crystal Palace,Arsenal,D,Crystal Palace
1,0,0,2020-01-18,Arsenal,Sheffield Utd,D,Arsenal,0,0,Sheffield United,Arsenal,D,Sheffield United
2,0,0,2020-01-21,Arsenal,Chelsea,D,Arsenal,0,0,Chelsea,Arsenal,D,Chelsea
3,0,0,2020-02-02,Arsenal,Burnley,D,Arsenal,0,0,Burnley,Arsenal,D,Burnley
4,1,0,2020-02-16,Arsenal,Newcastle Utd,W,Arsenal,0,0,Newcastle United,Arsenal,L,Newcastle Utd
...,...,...,...,...,...,...,...,...,...,...,...,...,...
3171,0,0,2024-04-24,Wolverhampton Wanderers,Bournemouth,L,Wolves,1,0,Bournemouth,Wolves,W,Bournemouth
3172,1,0,2024-04-27,Wolverhampton Wanderers,Luton Town,W,Wolves,0,0,Luton Town,Wolves,L,Luton Town
3173,0,0,2024-05-04,Wolverhampton Wanderers,Manchester City,L,Wolves,1,1,Manchester City,Wolves,W,Manchester City
3174,0,0,2024-05-11,Wolverhampton Wanderers,Crystal Palace,L,Wolves,1,0,Crystal Palace,Wolves,W,Crystal Palace


In [349]:
merged[(merged['prediction_x'] == 1) & (merged['prediction_y'] == 0)]['actual_x'].value_counts()

actual_x
1    372
0    266
Name: count, dtype: int64

In [350]:
cols = ['gf', 'ga', 'sh', 'sot', 'dist', 'fk', 'pk', 'pkatt']
['venue', 'opp_code', 'hour', 'day_code']

['venue', 'opp_code', 'hour', 'day_code']

In [351]:
my_data = np.array([1, 32, 15, 6, 1.39, 1.39, 9.7, 3.2, 0.1, 0.23, 0.1, 0.1]).reshape(1, -1)
my_predictors = ['venue', 'opp_code', 'hour', 'day_code', 'gf_rolling', 'ga_rolling', 'sh_rolling', 
              'sot_rolling', 'dist_rolling', 'fk_rolling', 'pk_rolling', 'pkatt_rolling']

my_data_df = pd.DataFrame(my_data, columns=my_predictors)
my_pred = classifier.predict(my_data_df)

print("Predicted Outcome:", my_pred)


Predicted Outcome: [0]
