# 🏏 T20 Cricket Score Predictor


### Overview  
A machine learning model that predicts the final T20 cricket score in real-time based on current match conditions and recent performance trends.  

### How It Works  
The model analyzes the current match state and uses **XGBoost** (Gradient Boosting) to predict the final score by considering:  
- Current score, wickets, and balls remaining  
- Run rate trends (overall and last 5 overs)  
- Historical performance of teams at specific venues  
- Recent momentum (wickets and runs in last 5 overs)  
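
As a rough illustration of how a match snapshot maps onto the numeric inputs listed above, here is a minimal sketch. The `MatchState` class and `derive_features` function are hypothetical names introduced for this example only and are not part of the notebook; the sample values mirror the 19.3-over row shown later in the data.

```python
from dataclasses import dataclass


@dataclass
class MatchState:
    """Snapshot of an innings; all field names are illustrative."""
    current_score: int
    wickets_fallen: int
    balls_bowled: int
    runs_last_30_balls: int  # runs scored in the last 5 overs


def derive_features(state: MatchState) -> dict:
    """Turn a raw snapshot into the numeric features used for prediction."""
    return {
        "current_score": state.current_score,
        "balls_left": max(120 - state.balls_bowled, 0),
        "wicket_left": 10 - state.wickets_fallen,
        "current_run_rate": state.current_score * 6 / state.balls_bowled,
        "last_five": state.runs_last_30_balls,
    }


features = derive_features(MatchState(125, 8, 117, 32))
print(features)
```

The categorical inputs (batting team, bowling team, venue) are omitted here because their encoding depends on the training pipeline.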

### Key Features  
- **Real-time Predictions**: Get score forecasts at any point during an innings  
- **Context-Aware**: Accounts for venue, teams, and match situation  
- **Momentum Analysis**: Weights recent performance (last 5 overs) heavily  
- **High Accuracy**: Trained on ball-by-ball T20I data (63,888 deliveries in `t20i_info.csv` before cleaning)

## üìäüîçData Extraction

In [1]:
import numpy as np
import pandas as pd

In [2]:
df = pd.read_csv("t20i_info.csv")

In [6]:
df.shape

(63888, 15)

In [7]:
df.head()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,city,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets
0,0,2,Australia,Sri Lanka,0.1,0,0,,Melbourne Cricket Ground,0,1,1,119,0,0
1,1,2,Australia,Sri Lanka,0.2,0,0,,Melbourne Cricket Ground,0,2,2,118,0,0
2,2,2,Australia,Sri Lanka,0.3,1,0,,Melbourne Cricket Ground,0,3,3,117,1,0
3,3,2,Australia,Sri Lanka,0.4,2,0,,Melbourne Cricket Ground,0,4,4,116,3,0
4,4,2,Australia,Sri Lanka,0.5,0,0,,Melbourne Cricket Ground,0,5,5,115,3,0


In [8]:
df.isnull().sum()

Unnamed: 0             0
match_id               0
batting_team           0
bowling_team           0
ball                   0
runs                   0
player_dismissed       0
city                8548
venue                  0
over                   0
ball_no                0
ball_bowled            0
balls_left             0
current_score          0
wickets                0
dtype: int64

In [9]:
df.drop('city', axis=1, inplace=True)

In [10]:
df.isnull().sum()

Unnamed: 0          0
match_id            0
batting_team        0
bowling_team        0
ball                0
runs                0
player_dismissed    0
venue               0
over                0
ball_no             0
ball_bowled         0
balls_left          0
current_score       0
wickets             0
dtype: int64

In [11]:
venue_counts = df['venue'].value_counts()
print(venue_counts)
# Keep only venues with at least 30 recorded deliveries
eligible_venues = venue_counts[venue_counts >= 30].index.tolist()
df = df[df['venue'].isin(eligible_venues)]

venue
Shere Bangla National Stadium          3420
R Premadasa Stadium                    2983
Dubai International Cricket Stadium    2969
New Wanderers Stadium                  2819
Eden Park                              2532
                                       ... 
Senwes Park                             122
Sardar Patel Stadium, Motera            121
Hagley Oval                             121
Subrata Roy Sahara Stadium              121
Carrara Oval                             64
Name: count, Length: 94, dtype: int64
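
To make the effect of the venue filter visible, here is a small sketch on toy data. The venue labels and the threshold of 3 are made up for illustration. In the notebook itself the least frequent venue in the output above has 64 rows, so a threshold of 30 appears to remove nothing, which matches the unchanged row count in the next cell.

```python
import pandas as pd

# Toy data: venue "B" appears only twice and should be filtered out.
toy = pd.DataFrame({"venue": ["A"] * 5 + ["B"] * 2 + ["C"] * 4})

min_rows = 3  # illustrative threshold; the notebook uses 30
counts = toy["venue"].value_counts()
eligible = counts[counts >= min_rows].index
filtered = toy[toy["venue"].isin(eligible)]

print(sorted(filtered["venue"].unique()))
print(len(filtered))
```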


In [12]:
df.shape

(63888, 14)

In [13]:
df.head()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets
0,0,2,Australia,Sri Lanka,0.1,0,0,Melbourne Cricket Ground,0,1,1,119,0,0
1,1,2,Australia,Sri Lanka,0.2,0,0,Melbourne Cricket Ground,0,2,2,118,0,0
2,2,2,Australia,Sri Lanka,0.3,1,0,Melbourne Cricket Ground,0,3,3,117,1,0
3,3,2,Australia,Sri Lanka,0.4,2,0,Melbourne Cricket Ground,0,4,4,116,3,0
4,4,2,Australia,Sri Lanka,0.5,0,0,Melbourne Cricket Ground,0,5,5,115,3,0


In [14]:
df['current_score'] = df.groupby('match_id')['runs'].cumsum()
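
The grouped cumulative sum restarts at every new `match_id`, which is what turns per-ball runs into a running score. A minimal sketch on made-up data:

```python
import pandas as pd

toy = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "runs":     [0, 4, 1, 6, 2],
})

# The running total resets when match_id changes from 1 to 2.
toy["current_score"] = toy.groupby("match_id")["runs"].cumsum()
print(toy["current_score"].tolist())
```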

In [15]:
df.head()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets
0,0,2,Australia,Sri Lanka,0.1,0,0,Melbourne Cricket Ground,0,1,1,119,0,0
1,1,2,Australia,Sri Lanka,0.2,0,0,Melbourne Cricket Ground,0,2,2,118,0,0
2,2,2,Australia,Sri Lanka,0.3,1,0,Melbourne Cricket Ground,0,3,3,117,1,0
3,3,2,Australia,Sri Lanka,0.4,2,0,Melbourne Cricket Ground,0,4,4,116,3,0
4,4,2,Australia,Sri Lanka,0.5,0,0,Melbourne Cricket Ground,0,5,5,115,3,0


In [16]:
df['over'] = df['ball'].apply(lambda x : str(x).split(".")[0])
df['ball_no'] = df['ball'].apply(lambda x : str(x).split(".")[1])

In [17]:
df.head()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets
0,0,2,Australia,Sri Lanka,0.1,0,0,Melbourne Cricket Ground,0,1,1,119,0,0
1,1,2,Australia,Sri Lanka,0.2,0,0,Melbourne Cricket Ground,0,2,2,118,0,0
2,2,2,Australia,Sri Lanka,0.3,1,0,Melbourne Cricket Ground,0,3,3,117,1,0
3,3,2,Australia,Sri Lanka,0.4,2,0,Melbourne Cricket Ground,0,4,4,116,3,0
4,4,2,Australia,Sri Lanka,0.5,0,0,Melbourne Cricket Ground,0,5,5,115,3,0


In [18]:
df['ball_bowled'] = (df['over'].astype(int)*6 + df['ball_no'].astype(int))

In [19]:
df['balls_left'] = 120 - df['ball_bowled']

In [20]:
df.tail()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets
63883,121,964,Sri Lanka,Australia,19.3,1,0,R Premadasa Stadium,19,3,117,3,125,8
63884,122,964,Sri Lanka,Australia,19.4,0,0,R Premadasa Stadium,19,4,118,2,125,8
63885,123,964,Sri Lanka,Australia,19.5,0,DM de Silva,R Premadasa Stadium,19,5,119,1,125,9
63886,124,964,Sri Lanka,Australia,19.6,2,0,R Premadasa Stadium,19,6,120,0,127,9
63887,125,964,Sri Lanka,Australia,19.7,1,0,R Premadasa Stadium,19,7,121,-1,128,9


In [21]:
df['balls_left'] = df['balls_left'].apply(lambda x:0 if x<0 else x)
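
The over/ball parsing and the clamp on `balls_left` can also be written without `apply`. The sketch below uses toy ball labels stored as strings, which is an assumption: if the column is read as a float, a label such as `19.10` would be indistinguishable from `19.1`. The `19.7` entry stands for an extra delivery, like the one that produced `balls_left = -1` in the output above.

```python
import pandas as pd

# Ball labels kept as strings (hypothetical toy data).
toy = pd.DataFrame({"ball": ["0.1", "9.6", "19.6", "19.7"]})

# Split "over.ball" into two integer columns (labelled 0 and 1).
parts = toy["ball"].str.split(".", expand=True).astype(int)
toy["ball_bowled"] = parts[0] * 6 + parts[1]

# clip(lower=0) replaces negative values with 0, like the lambda in the notebook.
toy["balls_left"] = (120 - toy["ball_bowled"]).clip(lower=0)
print(toy)
```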

In [22]:
df.head()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets
0,0,2,Australia,Sri Lanka,0.1,0,0,Melbourne Cricket Ground,0,1,1,119,0,0
1,1,2,Australia,Sri Lanka,0.2,0,0,Melbourne Cricket Ground,0,2,2,118,0,0
2,2,2,Australia,Sri Lanka,0.3,1,0,Melbourne Cricket Ground,0,3,3,117,1,0
3,3,2,Australia,Sri Lanka,0.4,2,0,Melbourne Cricket Ground,0,4,4,116,3,0
4,4,2,Australia,Sri Lanka,0.5,0,0,Melbourne Cricket Ground,0,5,5,115,3,0


In [23]:
# Flag each delivery: 1 if a batter was dismissed, 0 otherwise
df['player_dismissed'] = df['player_dismissed'].apply(lambda x: 1 if x != '0' else 0)

In [24]:
df['player_dismissed'] = df['player_dismissed'].astype(int)

In [25]:
df.head()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets
0,0,2,Australia,Sri Lanka,0.1,0,0,Melbourne Cricket Ground,0,1,1,119,0,0
1,1,2,Australia,Sri Lanka,0.2,0,0,Melbourne Cricket Ground,0,2,2,118,0,0
2,2,2,Australia,Sri Lanka,0.3,1,0,Melbourne Cricket Ground,0,3,3,117,1,0
3,3,2,Australia,Sri Lanka,0.4,2,0,Melbourne Cricket Ground,0,4,4,116,3,0
4,4,2,Australia,Sri Lanka,0.5,0,0,Melbourne Cricket Ground,0,5,5,115,3,0


In [33]:
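# Running count of wickets per match. Run this cell only once: it overwrites its own
# input, so each re-run applies cumsum again (the very large player_dismissed values
# in the later output appear to come from repeated execution).
df['player_dismissed'] = df.groupby('match_id')['player_dismissed'].cumsum()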

In [29]:
df['wicket_left'] = 10 - df['player_dismissed']
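
One way to keep the wicket calculation safe to re-run is to write each derived value into its own column instead of overwriting `player_dismissed`. This is a sketch on toy data, not the notebook's code; the column names `is_wicket` and `wickets_fallen` and the batter name "A Batter" are hypothetical.

```python
import pandas as pd

toy = pd.DataFrame({
    "match_id": [1, 1, 1, 1],
    "player_dismissed": ["0", "DM de Silva", "0", "A Batter"],
})

# Flag dismissals in a separate column so the raw data is left untouched.
toy["is_wicket"] = (toy["player_dismissed"] != "0").astype(int)

# Derived columns are always recomputed from is_wicket, so re-running is harmless.
toy["wickets_fallen"] = toy.groupby("match_id")["is_wicket"].cumsum()
toy["wicket_left"] = 10 - toy["wickets_fallen"]
print(toy[["is_wicket", "wickets_fallen", "wicket_left"]])
```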

In [36]:
df['current_run_rate'] = (df['current_score']*6 / df['ball_bowled'])

In [37]:
df.tail()

Unnamed: 0.1,Unnamed: 0,match_id,batting_team,bowling_team,ball,runs,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets,wicket_left,current_run_rate
63883,121,964,Sri Lanka,Australia,19.3,1,13372721,R Premadasa Stadium,19,3,117,3,125,8,2,6.410256
63884,122,964,Sri Lanka,Australia,19.4,0,13949026,R Premadasa Stadium,19,4,118,2,125,8,2,6.355932
63885,123,964,Sri Lanka,Australia,19.5,0,14545289,R Premadasa Stadium,19,5,119,1,125,9,1,6.302521
63886,124,964,Sri Lanka,Australia,19.6,2,15162026,R Premadasa Stadium,19,6,120,0,127,9,1,6.35
63887,125,964,Sri Lanka,Australia,19.7,1,15799762,R Premadasa Stadium,19,7,121,0,128,9,1,6.347107
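
The run rate is runs per ball scaled to a six-ball over. As a quick check, the hypothetical helper below reproduces two of the values in the table above (the function name and the guard against zero balls are additions for this example).

```python
def current_run_rate(current_score: int, balls_bowled: int) -> float:
    """Runs per over so far: runs per ball times six."""
    if balls_bowled <= 0:
        raise ValueError("balls_bowled must be positive")
    return current_score * 6 / balls_bowled


crr_19_3 = current_run_rate(125, 117)  # row for ball 19.3
crr_19_6 = current_run_rate(127, 120)  # row for ball 19.6
print(round(crr_19_3, 6), round(crr_19_6, 6))
```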


In [50]:
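groups = df.groupby('match_id')

last_five = []

# For each match, sum the runs over the previous 30 deliveries (5 overs).
# The first 29 rows of every match have no full window and come out as NaN.
for match_id in df['match_id'].unique():
    rolling_sum = groups.get_group(match_id)['runs'].rolling(window=30).sum()
    last_five.extend(rolling_sum.values.tolist())

# Assumes each match's rows are contiguous, so list order matches row order
df['last_five'] = last_five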
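
The same rolling window can be expressed with `groupby(...).transform`, which aligns the result on the original index and so does not depend on row order. This is an alternative sketch on toy data, with `window=3` standing in for the 30 balls used in the notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    "match_id": [1, 1, 1, 1, 2, 2, 2],
    "runs":     [1, 2, 3, 4, 5, 6, 7],
})

# Rolling sum within each match; rows without a full window are NaN.
toy["last_n"] = (
    toy.groupby("match_id")["runs"]
       .transform(lambda s: s.rolling(window=3).sum())
)
print(toy["last_n"].tolist())
```

The window never crosses a match boundary: the first two rows of match 2 are NaN even though match 1 precedes them.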

In [51]:
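# Total runs per match (the prediction target), attached to every ball of that match.
# Selecting 'runs' before sum() avoids summing the text columns.
final_df = df.groupby('match_id')['runs'].sum().reset_index().merge(df, on="match_id")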
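
To show what the merge produces, here is a minimal sketch on made-up data. Because both frames contain a `runs` column, pandas adds its default suffixes: `runs_x` is the per-match total and `runs_y` is the runs scored off each ball.

```python
import pandas as pd

toy = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "runs":     [0, 4, 1, 6, 2],
})

# Per-match total, computed on the runs column only.
totals = toy.groupby("match_id")["runs"].sum().reset_index()

# Every ball row now carries the final total of its match.
merged = totals.merge(toy, on="match_id")
print(merged[["match_id", "runs_x", "runs_y"]])
```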

In [52]:
final_df

Unnamed: 0.1,match_id,runs_x,Unnamed: 0,batting_team,bowling_team,ball,runs_y,player_dismissed,venue,over,ball_no,ball_bowled,balls_left,current_score,wickets,wicket_left,current_run_rate,last_five
0,2,168,0,Australia,Sri Lanka,0.1,0,0,Melbourne Cricket Ground,0,1,1,119,0,0,10,0.000000,
1,2,168,1,Australia,Sri Lanka,0.2,0,0,Melbourne Cricket Ground,0,2,2,118,0,0,10,0.000000,
2,2,168,2,Australia,Sri Lanka,0.3,1,0,Melbourne Cricket Ground,0,3,3,117,1,0,10,2.000000,
3,2,168,3,Australia,Sri Lanka,0.4,2,0,Melbourne Cricket Ground,0,4,4,116,3,0,10,4.500000,
4,2,168,4,Australia,Sri Lanka,0.5,0,0,Melbourne Cricket Ground,0,5,5,115,3,0,10,3.600000,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
63883,964,128,121,Sri Lanka,Australia,19.3,1,13372721,R Premadasa Stadium,19,3,117,3,125,8,2,6.410256,32.0
63884,964,128,122,Sri Lanka,Australia,19.4,0,13949026,R Premadasa Stadium,19,4,118,2,125,8,2,6.355932,32.0
63885,964,128,123,Sri Lanka,Australia,19.5,0,14545289,R Premadasa Stadium,19,5,119,1,125,9,1,6.302521,32.0
63886,964,128,124,Sri Lanka,Australia,19.6,2,15162026,R Premadasa Stadium,19,6,120,0,127,9,1,6.350000,33.0


In [53]:
final_df.columns

Index(['match_id', 'runs_x', 'Unnamed: 0', 'batting_team', 'bowling_team',
       'ball', 'runs_y', 'player_dismissed', 'venue', 'over', 'ball_no',
       'ball_bowled', 'balls_left', 'current_score', 'wickets', 'wicket_left',
       'current_run_rate', 'last_five'],
      dtype='object')

In [54]:
final_df = final_df[['batting_team', 'bowling_team','venue','balls_left', 'current_score',  'wicket_left',
       'current_run_rate', 'last_five','runs_x']]

In [57]:
final_df = final_df.dropna()


In [58]:
final_df.isnull().sum()

batting_team        0
bowling_team        0
venue               0
balls_left          0
current_score       0
wicket_left         0
current_run_rate    0
last_five           0
runs_x              0
dtype: int64

In [59]:
final_df.shape

(48645, 9)

## üß†üìäModel Training