# Predicting EPL Results Using Poisson Distribution and Classification

## Notebook Contents:

[Objective](#Objective)  
[Data Prep/Summary](#DPS)  
[Model](#model)

<a id='Objective'></a>
### Objective:
Predict the probability of each match result in the English Premier League from past-season statistics and Pro Evolution Soccer (PES) ratings, using a Poisson distribution and classification. We train on the seasons from 13/14 to 17/18 (five seasons) to predict the 18/19 season. Newly promoted teams in each season are ignored.

In [1]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler  
from scipy.stats import poisson



<a id='DPS'></a>
### Data Prep/Summary:
- Pro Evolution Soccer Data (PES 14 - PES 19)
- English Premier League Stats (2013 - Present)
- English Premier League Fixtures (2013 - Present)

In [2]:
# Current-season Roster Power Index (RPI) from the PES database.
# Each PES edition is used for the season it describes.
# For example, pes19 is used to predict (train) the 18/19 season.
# PES 15 data is merged with the epl_table_1314 table in order to predict the EPL 14/15 season.

pes15 = pd.read_csv('./Data/PES15.csv')
pes16 = pd.read_csv('./Data/PES16.csv')
pes17 = pd.read_csv('./Data/PES17.csv')
pes18 = pd.read_csv('./Data/PES18.csv')
pes19 = pd.read_csv('./Data/PES19.csv')

In [3]:
# Read past-season tables, which have been cleaned beforehand.
# We want team names and the total number of goals each team scored and allowed ('Team', 'HGF', 'HGA'),
# each team's offensive and defensive ratings ('H_Att', 'A_Att', 'H_Def', 'A_Def'),
# and each team's yellow cards, red cards, discipline points and clean sheets for the last season ('YC', 'RC', 'DIS', 'CS').

# IMPORTANT
# table_14 has the final result of the 13/14 season, with the 12/13 season's summary.
# We will be able to predict the next season's results using these features.

table_14 = pd.read_csv('./Data/epl_table_1314.csv')[['Team', 'HGF', 'HGA', 'H_Att', 'A_Att', 'H_Def', 'A_Def', 'YC', 'RC', 'DIS', 'CS']]
table_15 = pd.read_csv('./Data/epl_table_1415.csv')[['Team', 'HGF', 'HGA', 'H_Att', 'A_Att', 'H_Def', 'A_Def', 'YC', 'RC', 'DIS', 'CS']]
table_16 = pd.read_csv('./Data/epl_table_1516.csv')[['Team', 'HGF', 'HGA', 'H_Att', 'A_Att', 'H_Def', 'A_Def', 'YC', 'RC', 'DIS', 'CS']]
table_17 = pd.read_csv('./Data/epl_table_1617.csv')[['Team', 'HGF', 'HGA', 'H_Att', 'A_Att', 'H_Def', 'A_Def', 'YC', 'RC', 'DIS', 'CS']]
table_18 = pd.read_csv('./Data/epl_table_1718.csv')[['Team', 'HGF', 'HGA', 'H_Att', 'A_Att', 'H_Def', 'A_Def', 'YC', 'RC', 'DIS', 'CS']]

table = [table_14, table_15, table_16, table_17, table_18]
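The rating columns ('H_Att', 'H_Def' and so on) arrive precomputed in the CSV files, and the notebook does not show how they were derived. One common convention, assumed here and not confirmed by the source, is to divide a team's goals per home game by the league-average goals per home game. The toy table, its values and the `HOME_GAMES` constant are hypothetical:

```python
import pandas as pd

# Toy three-team table; HGF/HGA are home goals for/against (values are made up).
toy = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "HGF": [30, 20, 10],
    "HGA": [10, 20, 30],
})
HOME_GAMES = 2  # each toy team hosts the other two once; a 20-team league would use 19

# League-average goals scored and conceded per home game.
avg_home_for = toy["HGF"].sum() / (len(toy) * HOME_GAMES)
avg_home_against = toy["HGA"].sum() / (len(toy) * HOME_GAMES)

# Assumed definition: team rate divided by league rate (1.0 = league average).
toy["H_Att"] = (toy["HGF"] / HOME_GAMES) / avg_home_for
toy["H_Def"] = (toy["HGA"] / HOME_GAMES) / avg_home_against
```

Under this assumed definition a rating of 1.0 means league average, which is consistent with how the Poisson step later multiplies ratings by a league-average goal rate.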

In [4]:
# Read all season fixtures from 14/15 to 18/19 (present)
# We need the home/away teams and the full-time result (FTR) to compare with our predictions
# Goal counts, which would be useful for EDA, are not loaded here

epl_fixture_1415 = pd.read_csv('./Data/epl15.csv')[['HomeTeam', 'AwayTeam', 'FTR']]
epl_fixture_1516 = pd.read_csv('./Data/epl16.csv')[['HomeTeam', 'AwayTeam', 'FTR']]
epl_fixture_1617 = pd.read_csv('./Data/epl17.csv')[['HomeTeam', 'AwayTeam', 'FTR']]
epl_fixture_1718 = pd.read_csv('./Data/epl18.csv')[['HomeTeam', 'AwayTeam', 'FTR']]
epl_fixture_1819 = pd.read_csv('./Data/epl19.csv')[['HomeTeam', 'AwayTeam', 'FTR']]

fixture = [epl_fixture_1415, epl_fixture_1516, epl_fixture_1617, epl_fixture_1718, epl_fixture_1819]

In [5]:
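# Merge the two RPI sources.
# For example, merge the PES 15 database with the EPL 13/14 table.
# The outer merge followed by dropna() keeps only teams present in both sources.
# The five merged frames are used to train the model that predicts the 18/19 season.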
pr_15 = pd.merge(pes15, table_14, on='Team', how='outer').sort_values(by='Team').dropna()
pr_16 = pd.merge(pes16, table_15, on='Team', how='outer').sort_values(by='Team').dropna()
pr_17 = pd.merge(pes17, table_16, on='Team', how='outer').sort_values(by='Team').dropna()
pr_18 = pd.merge(pes18, table_17, on='Team', how='outer').sort_values(by='Team').dropna()
pr_19 = pd.merge(pes19, table_18, on='Team', how='outer').sort_values(by='Team').dropna()
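Because each outer merge is followed by `dropna()`, a team survives only if it appears in both the PES file and the previous season's table. A small sketch of how to see which teams are dropped, using the `indicator` option of `pd.merge`; the two miniature frames and their values are made up for illustration:

```python
import pandas as pd

# Hypothetical miniature inputs; column names follow the notebook, values are made up.
pes = pd.DataFrame({"Team": ["Arsenal", "Burnley", "Chelsea"], "Ovr": [80, 74, 82]})
table = pd.DataFrame({"Team": ["Arsenal", "Chelsea", "Norwich City"],
                      "H_Att": [1.2, 1.4, 0.8]})

# indicator=True adds a `_merge` column saying which side each row came from.
merged = pd.merge(pes, table, on="Team", how="outer", indicator=True)
dropped = merged.loc[merged["_merge"] != "both", "Team"].tolist()

# Rows present on both sides, i.e. the ones that survive dropna() in the notebook.
kept = merged[merged["_merge"] == "both"].drop(columns="_merge").sort_values("Team")
```

Here `dropped` lists the teams that fall out of the merge, the same situation as a promoted or relegated side in the real data.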

### Getting W/D/L Probability from Poisson Distribution
Add W/D/L probabilities to each fixture dataframe using the Poisson distribution.
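For reference, the calculation performed by `result_percentage` can be written out as follows. The symbols $\lambda_H$, $\lambda_A$, $\bar g_H$, $\bar g_A$ and $p$ are notation introduced here, not names from the notebook. The sums stop at 5 goals because the code caps each team's score there.

$$
\begin{aligned}
\bar g_H &= \frac{\sum_{\text{teams}} \text{HGF}}{380}, \qquad
\bar g_A = \frac{\sum_{\text{teams}} \text{HGA}}{380} \\
\lambda_H &= \text{H\_Att}_{\text{home}} \cdot \text{A\_Def}_{\text{away}} \cdot \bar g_H, \qquad
\lambda_A = \text{A\_Att}_{\text{away}} \cdot \text{H\_Def}_{\text{home}} \cdot \bar g_A \\
p(k;\lambda) &= \frac{\lambda^{k} e^{-\lambda}}{k!} \\
P(W) &= \sum_{i=1}^{5} \sum_{j=0}^{i-1} p(i;\lambda_H)\, p(j;\lambda_A) \\
P(D) &= \sum_{i=0}^{5} p(i;\lambda_H)\, p(i;\lambda_A) \\
P(L) &= \sum_{j=1}^{5} \sum_{i=0}^{j-1} p(i;\lambda_H)\, p(j;\lambda_A)
\end{aligned}
$$

Since scorelines with more than 5 goals for either side are left out, $P(W) + P(D) + P(L)$ is slightly below 1.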

In [6]:
def result_percentage(dataframe, hometeam, awayteam):
    '''
    Returns a tuple of size 2:
    - list of probabilities in the order [home win, away win, draw]
    - dataframe of the probability of each team scoring 0 to 5 goals in 90 minutes
    '''
    # League-average goals per match for home and away sides (380 matches per season)
    home_avg = dataframe['HGF'].sum()/380
    away_avg = dataframe['HGA'].sum()/380

    home_row = dataframe[dataframe['Team'] == hometeam].iloc[0]
    away_row = dataframe[dataframe['Team'] == awayteam].iloc[0]

    # Expected goals: attack rating * opposing defence rating * league average
    home_score = float(home_row['H_Att']) * float(away_row['A_Def']) * home_avg
    away_score = float(away_row['A_Att']) * float(home_row['H_Def']) * away_avg

    score = []
    # maximum score for a team is 5
    for goals in range(0, 6):
        scores = {}
        scores['Home'] = poisson.pmf(goals, home_score) # home team scores `goals`
        scores['Away'] = poisson.pmf(goals, away_score) # away team scores `goals`
        score.append(scores)

    score = pd.DataFrame(score, columns=['Home', 'Away'])

    # % of home team winning
    # home score > away score
    # home[1] * away[0]
    # home[2] * away[0] + home[2] * away[1]
    # home[3] * away[0] + home[3] * away[1] + home[3] * away[2]
    # home[4] * away[0] + home[4] * away[1] + home[4] * away[2] + home[4] * away[3]
    # home[5] * away[0] + home[5] * away[1] + home[5] * away[2] + home[5] * away[3] + home[5] * away[4]
    home_w = 0
    away_w = 0
    draw = 0
    result = []
    for home in range(1, len(score)):
        for away in range(0, home):
            home_w += (score['Home'][home] * score['Away'][away])
    result.append(home_w)

    # % of away team winning: away score > home score
    for away in range(1, len(score)):
        for home in range(0, away):
            away_w += (score['Home'][home] * score['Away'][away])
    result.append(away_w)

    # % of a draw: both teams score the same number of goals
    for home in range(0, len(score)):
        away = home
        draw += (score['Home'][home] * score['Away'][away])
    result.append(draw)

    return result, score
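The three nested loops in `result_percentage` sum regions of a 6x6 grid of scoreline probabilities. A compact sketch of the same idea with NumPy is shown here. The helper name `wdl_from_rates`, its `max_goals` parameter and the example rates are illustrative assumptions, and it returns the values in (win, draw, loss) order rather than the notebook's [home, away, draw] order:

```python
import numpy as np
from scipy.stats import poisson

def wdl_from_rates(home_rate, away_rate, max_goals=5):
    """Return (home win, draw, away win) probabilities for scores 0..max_goals."""
    goals = np.arange(max_goals + 1)
    # grid[i, j] = P(home scores i) * P(away scores j), assuming independent scoring.
    grid = np.outer(poisson.pmf(goals, home_rate), poisson.pmf(goals, away_rate))
    home_win = np.tril(grid, k=-1).sum()  # cells with i > j
    draw = np.trace(grid)                 # cells with i == j
    away_win = np.triu(grid, k=1).sum()   # cells with i < j
    return home_win, draw, away_win

w, d, l = wdl_from_rates(1.6, 1.1)  # illustrative rates, not taken from the data
covered = w + d + l                 # below 1: scorelines above max_goals are cut off
w20, d20, l20 = wdl_from_rates(1.6, 1.1, max_goals=20)
```

Raising `max_goals` shrinks the missing probability mass, which is one way to check how much the cap at 5 goals affects a given fixture.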

In [7]:
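# Predict each game
def predict(fixture, rating):
    '''
    Adds Poisson W/D/L probabilities to the fixture dataframe in place.
    Fixtures where either team is missing from the rating table are left without values.
    '''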
    for i in range(len(fixture)):
        if fixture['HomeTeam'][i] in list(rating['Team']) and fixture['AwayTeam'][i] in list(rating['Team']):
            result = result_percentage(rating, fixture['HomeTeam'][i], fixture['AwayTeam'][i])[0]
            fixture.loc[i, 'W'] = result[0]
            fixture.loc[i, 'D'] = result[2]
            fixture.loc[i, 'L'] = result[1]


In [8]:
for i in range(len(fixture)):
    predict(fixture[i], table[i])

### Merge PES data to Fixture Dataframe

In [9]:
# Merge all features to fixture data
def fill_df(epl_fixture_data, pes_data):
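    '''
    Pass RPIs to the fixture dataframe.
    A team missing from pes_data raises an IndexError, which is ignored,
    so its feature columns stay NaN.
    The model ignores relegation and promotion of teams.
    '''
    # Each team only needs to be looked up once, so iterate over unique names
    for team in epl_fixture_data['HomeTeam'].unique():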
        try:
            ht_off = pes_data.loc[pes_data['Team'] == team, 'H_Att'].values[0] 
            ht_def = pes_data.loc[pes_data['Team'] == team, 'H_Def'].values[0]
            at_off = pes_data.loc[pes_data['Team'] == team, 'A_Att'].values[0]
            at_def = pes_data.loc[pes_data['Team'] == team, 'A_Def'].values[0]
            
            pes_ovr = pes_data.loc[pes_data['Team'] == team, 'Ovr'].values[0]
            pes_off = pes_data.loc[pes_data['Team'] == team, 'Fwd'].values[0]
            pes_def = pes_data.loc[pes_data['Team'] == team, 'Def'].values[0]
            pes_mid = pes_data.loc[pes_data['Team'] == team, 'Mid'].values[0]
            pes_spd = pes_data.loc[pes_data['Team'] == team, 'Spd'].values[0]
            pes_phy = pes_data.loc[pes_data['Team'] == team, 'Phy'].values[0]
            
            cs = pes_data.loc[pes_data['Team'] == team, 'CS'].values[0]
            yc = pes_data.loc[pes_data['Team'] == team, 'YC'].values[0]
            rc = pes_data.loc[pes_data['Team'] == team, 'RC'].values[0]
            dis = pes_data.loc[pes_data['Team'] == team, 'DIS'].values[0]

            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtYC'] = yc
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtRC'] = rc
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtCS'] = cs
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtDis'] = dis
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtDis'] = dis
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtYC'] = yc
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtRC'] = rc
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtCS'] = cs
            
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtOff'] = ht_off
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtDef'] = ht_def
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtOff'] = at_off
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtDef'] = at_def
            
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtPesOvr'] = pes_ovr
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtPesOvr'] = pes_ovr
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtPesOff'] = pes_off
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtPesOff'] = pes_off
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtPesDef'] = pes_def
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtPesDef'] = pes_def
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtPesMid'] = pes_mid
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtPesMid'] = pes_mid
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtPesSpd'] = pes_spd
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtPesSpd'] = pes_spd
            epl_fixture_data.loc[epl_fixture_data['HomeTeam'] == team, 'HtPesPhy'] = pes_phy
            epl_fixture_data.loc[epl_fixture_data['AwayTeam'] == team, 'AtPesPhy'] = pes_phy
        
        except IndexError:
            pass
        
    return epl_fixture_data
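The per-team lookups in `fill_df` amount to joining the ratings table onto the fixtures twice, once keyed on the home team and once on the away team. A sketch of that join with `DataFrame.merge` follows. The helper `attach_team_features` and the toy frames are hypothetical, and the sketch uses a uniform `Ht`/`At` prefix, so its column names differ from the ones `fill_df` produces (for example `HtOvr` instead of `HtPesOvr`):

```python
import pandas as pd

def attach_team_features(fixtures, ratings, feature_cols):
    """Left-join team ratings onto fixtures for the home side, then the away side."""
    out = fixtures.copy()
    for side, prefix in (("HomeTeam", "Ht"), ("AwayTeam", "At")):
        mapping = {"Team": side, **{c: prefix + c for c in feature_cols}}
        renamed = ratings[["Team"] + feature_cols].rename(columns=mapping)
        # Teams missing from `ratings` (e.g. promoted sides) get NaN features.
        out = out.merge(renamed, on=side, how="left")
    return out

# Toy data; values are made up.
fixtures = pd.DataFrame({"HomeTeam": ["Arsenal", "Burnley"],
                         "AwayTeam": ["Chelsea", "Arsenal"]})
ratings = pd.DataFrame({"Team": ["Arsenal", "Chelsea"], "Ovr": [80, 82], "CS": [17, 18]})
wide = attach_team_features(fixtures, ratings, ["Ovr", "CS"])
```

As in `fill_df`, a fixture involving a team without ratings keeps NaN values and can be removed later with `dropna()`.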


In [10]:
test1 = fill_df(fixture[0], pr_15)
test2 = fill_df(fixture[1], pr_16)
test3 = fill_df(fixture[2], pr_17)
test4 = fill_df(fixture[3], pr_18)
test5 = fill_df(fixture[4], pr_19)


In [11]:
test1

Unnamed: 0,HomeTeam,AwayTeam,FTR,W,D,L,HtYC,HtRC,HtCS,HtDis,...,HtPesOff,AtPesOff,HtPesDef,AtPesDef,HtPesMid,AtPesMid,HtPesSpd,AtPesSpd,HtPesPhy,AtPesPhy
0,Arsenal,Crystal Palace,H,0.670188,0.238443,0.085600,51.0,4.0,17.0,0.881579,...,81.0,73.0,77.0,75.0,76.0,72.0,75.0,74.0,77.0,79.0
1,Leicester City,Everton,D,,,,,,,,...,,76.0,,72.0,,75.0,,75.0,,76.0
2,Manchester Utd,Swansea City,A,0.460424,0.266210,0.269164,64.0,3.0,13.0,1.000000,...,83.0,75.0,74.0,71.0,79.0,73.0,78.0,73.0,77.0,76.0
3,QP Rangers,Hull City,A,,,,,,,,...,,77.0,,75.0,,76.0,,75.0,,78.0
4,Stoke City,Aston Villa,A,0.573491,0.258294,0.163386,69.0,5.0,9.0,1.171053,...,78.0,77.0,72.0,74.0,74.0,75.0,71.0,76.0,79.0,79.0
5,West Bromwich,Sunderland,D,0.399652,0.260440,0.334923,67.0,0.0,7.0,0.881579,...,71.0,71.0,74.0,76.0,76.0,75.0,73.0,73.0,80.0,78.0
6,West Ham Utd,Tottenham,A,0.308918,0.253250,0.431572,59.0,5.0,14.0,1.039474,...,73.0,78.0,72.0,78.0,73.0,75.0,73.0,77.0,79.0,79.0
7,Liverpool,Southampton,H,0.632460,0.195279,0.149472,53.0,1.0,10.0,0.750000,...,80.0,74.0,78.0,74.0,81.0,73.0,79.0,75.0,79.0,77.0
8,Newcastle Utd,Manchester City,A,0.123377,0.164193,0.667755,51.0,6.0,10.0,0.986842,...,73.0,82.0,76.0,81.0,77.0,83.0,76.0,76.0,78.0,82.0
9,Burnley,Chelsea,A,,,,,,,,...,,82.0,,77.0,,82.0,,77.0,,80.0


<a id='model'></a>
### Model
- KNN
- Logistic Regression
- Random Forest
- Neural Network (MLP)

In [12]:
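# Final model: combine the five season frames into one modelling table.
# test5 holds the 18/19 fixtures, so the current season is part of this table.
model_df = pd.concat([test1, test2, test3, test4, test5])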

# Drop NAs from promotion/relegation.
model_df = model_df.dropna()

In [13]:
X = model_df.drop(columns=(['HomeTeam', 'AwayTeam', 'FTR']))
y = model_df['FTR']
new_cols = X.columns

In [14]:
new_cols

Index(['W', 'D', 'L', 'HtYC', 'HtRC', 'HtCS', 'HtDis', 'AtDis', 'AtYC', 'AtRC',
       'AtCS', 'HtOff', 'HtDef', 'AtOff', 'AtDef', 'HtPesOvr', 'AtPesOvr',
       'HtPesOff', 'AtPesOff', 'HtPesDef', 'AtPesDef', 'HtPesMid', 'AtPesMid',
       'HtPesSpd', 'AtPesSpd', 'HtPesPhy', 'AtPesPhy'],
      dtype='object')

In [15]:
scaler = StandardScaler()  
X = scaler.fit_transform(X)

knn = KNeighborsClassifier(n_neighbors=5)
lr = LogisticRegression(random_state=42)
rf = RandomForestClassifier(random_state=42)
mlp = MLPClassifier(hidden_layer_sizes=(2,), solver='sgd', learning_rate_init=0.01, random_state=42)

knn.fit(X, y)
lr.fit(X, y)
rf.fit(X, y)
mlp.fit(X, y);

In [17]:
print('---- Score on Train (Past 5 Seasons) ----')
print('KNN:' + str(knn.score(X, y)))
print('LR:' + str(lr.score(X, y)))
print('RF:' + str(rf.score(X, y)))
print('MLP:' + str(mlp.score(X, y)))

print('---- Score on Test (Current Season) ----')
# Note: new_cols are taken straight from the fixture dataframe here,
# so these features are not standardized with `scaler` as the training features were.
test_df = epl_fixture_1819.dropna()
print('KNN:' + str(knn.score(test_df[new_cols], test_df['FTR'])))
print('LR:' + str(lr.score(test_df[new_cols], test_df['FTR'])))
print('RF:' + str(rf.score(test_df[new_cols], test_df['FTR'])))
print('MLP:' + str(mlp.score(test_df[new_cols], test_df['FTR'])))

---- Score on Train (Past 5 Seasons) ----
KNN:0.6083112290008842
LR:0.5269672855879752
RF:0.9893899204244032
MLP:0.5278514588859416
---- Score on Test (Current Season) ----
KNN:0.4418604651162791
LR:0.6744186046511628
RF:0.4418604651162791
MLP:0.4186046511627907
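In the cells above, the models are fit on `scaler`-transformed features and `model_df` includes the 18/19 frame (`test5`), while the test scores are computed on unscaled columns. One way to keep a held-out season separate and apply identical scaling to both sets is a scikit-learn pipeline. This is a minimal sketch on synthetic data. The `demo` frame, its `Season` column and all values are hypothetical and not part of the notebook's data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic stand-in for model_df: two features, a season label and a result.
n = 300
demo = pd.DataFrame({
    "W": rng.uniform(0.1, 0.8, n),
    "HtPesOvr": rng.normal(78, 3, n),
    "Season": rng.choice(["16/17", "17/18", "18/19"], n),
})
demo["FTR"] = np.where(demo["W"] + rng.normal(0, 0.15, n) > 0.45, "H", "A")

features = ["W", "HtPesOvr"]
train = demo[demo["Season"] != "18/19"]  # earlier seasons only
test = demo[demo["Season"] == "18/19"]   # held-out season

# The pipeline fits the scaler on the training rows and reuses it when scoring.
clf = make_pipeline(StandardScaler(), LogisticRegression(random_state=42))
clf.fit(train[features], train["FTR"])
holdout_accuracy = clf.score(test[features], test["FTR"])
```

With this layout the held-out season never influences the scaler or the classifier, so the test score reflects predictions on fixtures the model has not seen.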


In [20]:
epl_fixture_1819.dropna(inplace=True)
predict_X = epl_fixture_1819[new_cols]
compare = epl_fixture_1819[['HomeTeam', 'AwayTeam', 'W', 'D', 'L']]
compare = compare.reindex(columns=compare.columns.tolist() + ['Actual', 'Predict'])
compare['Actual'] = epl_fixture_1819['FTR']
compare['Predict'] = lr.predict(predict_X)

In [21]:
compare.loc[:, 'Result'] = (compare.loc[:, 'Actual'] == compare.loc[:, 'Predict']) * 1
compare.reset_index().drop(['index'], axis=1)

Unnamed: 0,HomeTeam,AwayTeam,W,D,L,Actual,Predict,Result
0,Manchester Utd,Leicester City,0.738472,0.138564,0.072993,H,H,1
1,Huddersfield,Chelsea,0.113782,0.20424,0.667711,A,A,1
2,Newcastle Utd,Tottenham,0.203776,0.275398,0.517535,A,A,1
3,Arsenal,Manchester City,0.21505,0.203304,0.556066,A,A,1
4,Liverpool,West Ham Utd,0.758934,0.07721,0.032043,H,H,1
5,Southampton,Burnley,0.235455,0.292911,0.469577,D,A,0
6,Chelsea,Arsenal,0.583343,0.239465,0.169422,H,H,1
7,Everton,Southampton,0.512122,0.259121,0.223675,H,H,1
8,West Ham Utd,Bournemouth,0.400191,0.266698,0.329019,A,A,1
9,Burnley,Watford,0.400948,0.342053,0.256443,A,A,1


In [22]:
compare['Result'].mean()

0.6744186046511628
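A single accuracy figure does not show which outcomes are being missed. A sketch of a per-outcome breakdown with `pd.crosstab`, using only the ten rows of `compare` displayed above rather than the full table, so the numbers describe that sample only:

```python
import pandas as pd

# Actual and predicted results copied from the ten displayed rows of `compare`.
shown = pd.DataFrame({
    "Actual":  list("HAAAHDHHAA"),
    "Predict": list("HAAAHAHHAA"),
})

# Rows are actual outcomes, columns are predicted outcomes.
confusion = pd.crosstab(shown["Actual"], shown["Predict"])

# Share of each actual outcome that was predicted correctly.
recall = (shown["Actual"] == shown["Predict"]).groupby(shown["Actual"]).mean()
```

In these ten rows the only miss is the drawn match, and no draw is predicted at all. Running the same breakdown on the full `compare` frame would show whether that pattern holds more widely.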

In [23]:
compare[(compare['AwayTeam'] == 'Liverpool') | (compare['HomeTeam'] == 'Liverpool')]

Unnamed: 0,HomeTeam,AwayTeam,W,D,L,Actual,Predict,Result
8,Liverpool,West Ham Utd,0.758934,0.07721,0.032043,H,H,1
19,Crystal Palace,Liverpool,0.20562,0.182118,0.563511,A,A,1
34,Leicester City,Liverpool,0.227772,0.210801,0.540223,A,A,1
45,Tottenham,Liverpool,0.480283,0.214579,0.283161,A,A,1
56,Liverpool,Southampton,0.78462,0.13178,0.045358,H,H,1
61,Chelsea,Liverpool,0.377169,0.241799,0.371771,D,A,0


In [24]:
compare[(compare['AwayTeam'] == 'Manchester Utd') | (compare['HomeTeam'] == 'Manchester Utd')]

Unnamed: 0,HomeTeam,AwayTeam,W,D,L,Actual,Predict,Result
0,Manchester Utd,Leicester City,0.738472,0.138564,0.072993,H,H,1
29,Manchester Utd,Tottenham,0.517499,0.278438,0.201014,A,H,0
37,Burnley,Manchester Utd,0.165738,0.303306,0.529226,A,A,1
46,Watford,Manchester Utd,0.141468,0.186535,0.644478,A,A,1
66,West Ham Utd,Manchester Utd,0.158151,0.218098,0.610831,H,A,0


In [25]:
compare[(compare['AwayTeam'] == 'Chelsea') | (compare['HomeTeam'] == 'Chelsea')]

Unnamed: 0,HomeTeam,AwayTeam,W,D,L,Actual,Predict,Result
3,Huddersfield,Chelsea,0.113782,0.20424,0.667711,A,A,1
11,Chelsea,Arsenal,0.583343,0.239465,0.169422,H,H,1
27,Newcastle Utd,Chelsea,0.237487,0.284269,0.475667,A,A,1
31,Chelsea,Bournemouth,0.593484,0.238557,0.160225,H,H,1
59,West Ham Utd,Chelsea,0.171621,0.209591,0.601448,D,A,0
61,Chelsea,Liverpool,0.377169,0.241799,0.371771,D,A,0


In [26]:
compare[(compare['AwayTeam'] == 'West Ham Utd') | (compare['HomeTeam'] == 'West Ham Utd')]

Unnamed: 0,HomeTeam,AwayTeam,W,D,L,Actual,Predict,Result
8,Liverpool,West Ham Utd,0.758934,0.07721,0.032043,H,H,1
15,West Ham Utd,Bournemouth,0.400191,0.266698,0.329019,A,A,1
20,Arsenal,West Ham Utd,0.632783,0.080156,0.055083,H,H,1
47,Everton,West Ham Utd,0.547575,0.206789,0.222061,A,A,1
59,West Ham Utd,Chelsea,0.171621,0.209591,0.601448,D,A,0
66,West Ham Utd,Manchester Utd,0.158151,0.218098,0.610831,H,A,0


In [27]:
upcoming = pd.DataFrame(columns=(['HomeTeam', 'AwayTeam']))

In [28]:
upcoming['HomeTeam'] = ['Chelsea', 'West Ham Utd', 'Newcastle Utd', 'Bournemouth',
                       'Cardiff City', 'Wolves', 'Manchester City', 'Huddersfield', 'Everton', 'Arsenal',
                       'Manchester Utd', 'Tottenham', 'Brighton', 'Southampton', 'Fulham',
                       'Watford', 'Burnley', 'Liverpool', 'Crystal Palace', 'Leicester City']
upcoming['AwayTeam'] = ['Manchester Utd', 'Tottenham', 'Brighton', 'Southampton', 'Fulham',
                       'Watford', 'Burnley', 'Liverpool', 'Crystal Palace', 'Leicester City',
                       'Chelsea', 'West Ham Utd', 'Newcastle Utd', 'Bournemouth',
                       'Cardiff City', 'Wolves', 'Manchester City', 'Huddersfield', 'Everton', 'Arsenal']

In [29]:
predict(upcoming, pr_19)

In [30]:
upcoming

Unnamed: 0,HomeTeam,AwayTeam,W,D,L
0,Chelsea,Manchester Utd,0.336037,0.321699,0.341434
1,West Ham Utd,Tottenham,0.166114,0.234248,0.590857
2,Newcastle Utd,Brighton,,,
3,Bournemouth,Southampton,0.424351,0.28268,0.290395
4,Cardiff City,Fulham,,,
5,Wolves,Watford,,,
6,Manchester City,Burnley,0.727774,0.169446,0.078539
7,Huddersfield,Liverpool,0.135411,0.211864,0.639215
8,Everton,Crystal Palace,0.504015,0.293178,0.200685
9,Arsenal,Leicester City,0.657433,0.121496,0.097521


In [31]:
upcoming = fill_df(upcoming, pr_19).head(10)

In [32]:
upcoming

Unnamed: 0,HomeTeam,AwayTeam,W,D,L,HtYC,HtRC,HtCS,HtDis,AtDis,...,HtPesOff,AtPesOff,HtPesDef,AtPesDef,HtPesMid,AtPesMid,HtPesSpd,AtPesSpd,HtPesPhy,AtPesPhy
0,Chelsea,Manchester Utd,0.336037,0.321699,0.341434,40.0,4.0,16.0,0.736842,0.894737,...,85.0,85.0,82.0,81.0,83.0,82.0,78.0,78.0,77.0,78.0
1,West Ham Utd,Tottenham,0.166114,0.234248,0.590857,72.0,2.0,10.0,1.052632,0.75,...,80.0,82.0,79.0,82.0,79.0,83.0,75.0,77.0,77.0,78.0
2,Newcastle Utd,Brighton,,,,51.0,2.0,9.0,0.776316,,...,77.0,,76.0,,77.0,,76.0,,76.0,
3,Bournemouth,Southampton,0.424351,0.28268,0.290395,54.0,1.0,6.0,0.763158,0.921053,...,77.0,77.0,77.0,78.0,76.0,77.0,78.0,75.0,75.0,76.0
4,Cardiff City,Fulham,,,,,,,,,...,,,,,,,,,,
5,Wolves,Watford,,,,,,,,1.026316,...,,75.0,,77.0,,78.0,,75.0,,77.0
6,Manchester City,Burnley,0.727774,0.169446,0.078539,57.0,2.0,18.0,0.855263,0.855263,...,85.0,75.0,83.0,76.0,86.0,76.0,80.0,72.0,76.0,75.0
7,Huddersfield,Liverpool,0.135411,0.211864,0.639215,59.0,3.0,10.0,0.934211,0.631579,...,76.0,85.0,75.0,82.0,74.0,82.0,77.0,79.0,74.0,75.0
8,Everton,Crystal Palace,0.504015,0.293178,0.200685,49.0,3.0,10.0,0.802632,0.947368,...,79.0,79.0,80.0,78.0,79.0,78.0,77.0,76.0,77.0,78.0
9,Arsenal,Leicester City,0.657433,0.121496,0.097521,57.0,2.0,13.0,0.855263,0.894737,...,84.0,79.0,81.0,78.0,82.0,78.0,77.0,76.0,76.0,76.0


In [33]:
upcoming = upcoming.dropna().reset_index(drop=True)

In [34]:
predict_U = upcoming[new_cols]
upcoming.loc[:,'Predict'] = lr.predict(predict_U)

In [35]:
upcoming[['HomeTeam', 'AwayTeam', 'Predict', 'W', 'D', 'L']]

Unnamed: 0,HomeTeam,AwayTeam,Predict,W,D,L
0,Chelsea,Manchester Utd,H,0.336037,0.321699,0.341434
1,West Ham Utd,Tottenham,A,0.166114,0.234248,0.590857
2,Bournemouth,Southampton,A,0.424351,0.28268,0.290395
3,Manchester City,Burnley,H,0.727774,0.169446,0.078539
4,Huddersfield,Liverpool,A,0.135411,0.211864,0.639215
5,Everton,Crystal Palace,A,0.504015,0.293178,0.200685
6,Arsenal,Leicester City,H,0.657433,0.121496,0.097521
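The logistic regression prediction does not always match the outcome that the Poisson probabilities rate as most likely. A short sketch that flags those fixtures, using the seven rows of the final table above. The `PoissonPick` and `Agree` columns are names introduced here for illustration:

```python
import pandas as pd

# Rows copied from the final table above.
final = pd.DataFrame({
    "HomeTeam": ["Chelsea", "West Ham Utd", "Bournemouth", "Manchester City",
                 "Huddersfield", "Everton", "Arsenal"],
    "Predict": ["H", "A", "A", "H", "A", "A", "H"],
    "W": [0.336037, 0.166114, 0.424351, 0.727774, 0.135411, 0.504015, 0.657433],
    "D": [0.321699, 0.234248, 0.282680, 0.169446, 0.211864, 0.293178, 0.121496],
    "L": [0.341434, 0.590857, 0.290395, 0.078539, 0.639215, 0.200685, 0.097521],
})

# Most likely outcome under the Poisson model, mapped to the FTR labels.
label = {"W": "H", "D": "D", "L": "A"}
final["PoissonPick"] = final[["W", "D", "L"]].idxmax(axis=1).map(label)

# Fixtures where the classifier and the Poisson favourite disagree.
final["Agree"] = final["Predict"] == final["PoissonPick"]
disagreements = final.loc[~final["Agree"], "HomeTeam"].tolist()
```

Fixtures in `disagreements` are the ones where the other features moved the classifier away from the Poisson favourite, which makes them natural candidates for a closer look.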
