# Final Data Prep

### Preparing the data that I will use directly to train the neural network

I am going to take the last few steps of data retrieval:

- One-hot encode all the grounds
- Concatenate all the data into a single dataframe
- Select only the relevant columns from the match data

The data we need here is:
- Ground Player Lineup data
- Innings
- Player Stats in Match
- Player Status in match


In [1]:
import numpy as np
import pandas as pd
from pathlib import Path
import random
current_dir = Path.cwd()
parent_dir = current_dir.parent.parent

In [4]:
ground_player_lineup = pd.read_pickle(parent_dir / "Resources/ground_player_lineup.pkl")
innings = pd.read_pickle(parent_dir / "Resources/match_data.pkl")
player_stats = pd.read_pickle(parent_dir / "Resources/MatchData/match_player_stats.pkl")
player_status = pd.read_pickle(parent_dir / "Resources/MatchData/player_status.pkl")

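**It's important to sort all of these by `Match Code` :)** Label-based slicing on a pandas MultiIndex only works reliably (and efficiently) when the index is sorted.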

In [16]:
ground_player_lineup = ground_player_lineup.sort_index(level = 'Match Code')
player_stats.index.names = ['Match Code', "Country"]
player_stats = player_stats.sort_index(level = 'Match Code')
player_status = player_status.sort_index(level = 'Match Code')
innings = innings.sort_index(level = 'Match Code')
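Why the sort matters can be checked on a toy frame. This is a small sketch with a made-up (Match Code, Country) index and a made-up `runs` column, not the real data. `sort_index(level=...)` sorts by the named level first and, with the default `sort_remaining=True`, by the remaining levels afterwards, so the whole MultiIndex ends up sorted and a label slice over match codes works.

```python
import pandas as pd

# Hypothetical toy frame with a (Match Code, Country) MultiIndex, deliberately out of order
toy = pd.DataFrame(
    {"runs": [4, 0, 6, 1]},
    index=pd.MultiIndex.from_tuples(
        [(300, "B"), (100, "A"), (300, "A"), (100, "B")],
        names=["Match Code", "Country"],
    ),
)
print(toy.index.is_monotonic_increasing)  # False

# Sort by Match Code first; sort_remaining=True (the default) then sorts by Country
toy_sorted = toy.sort_index(level="Match Code")
print(toy_sorted.index.is_monotonic_increasing)  # True

# On the sorted index, a label slice over match codes 100..200 returns both rows of match 100
print(toy_sorted.loc[100:200])
```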

In [17]:
innings.loc[1359787]

Country,Over,Ball,batter,bowler,non_striker,batter runs,extra runs,total runs,score,extras,wickets,out,Total Wickets,non_boundary,review,replacements
Argentina,0,0,P Baron,DMW Rawlins,R Escobar,0,0,0,0,,,0,0,,,
Argentina,0,1,P Baron,DMW Rawlins,R Escobar,0,0,0,0,,,0,0,,,
Argentina,0,2,P Baron,DMW Rawlins,R Escobar,4,0,4,4,,,0,0,,,
Argentina,0,3,P Baron,DMW Rawlins,R Escobar,4,0,4,8,,,0,0,,,
Argentina,0,4,P Baron,DMW Rawlins,R Escobar,0,0,0,8,,,0,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Bermuda,19,2,JE Pitcher,P Arrighi,MO Jones,0,0,0,224,,,0,4,,,
Bermuda,19,3,JE Pitcher,P Arrighi,MO Jones,0,0,0,224,,,0,4,,,
Bermuda,19,4,JE Pitcher,P Arrighi,MO Jones,0,0,0,224,,,0,4,,,
Bermuda,19,5,JE Pitcher,P Arrighi,MO Jones,0,1,1,225,{'wides': 1},,0,4,,,


**This is the all-zeros template series, one entry per ground, that each match's one-hot vector is built from**

In [6]:
ground_names = ground_player_lineup['Details', 'Ground Name'].unique()
ground_col = pd.Index(ground_names)
ground_series_original = pd.Series(index = ground_col, dtype = 'float64')
ground_series_original = ground_series_original.fillna(0)
ground_series_original

The Rose Bowl                               0.0
Eden Park                                   0.0
County Ground                               0.0
Brisbane Cricket Ground, Woolloongabba      0.0
New Wanderers Stadium                       0.0
                                           ... 
Mission Road Ground, Mong Kok, Hong Kong    0.0
Santarem Cricket Ground                     0.0
Europa Sports Complex                       0.0
Solvangs Park, Glostrup                     0.0
Sportpark Het Schootsveld, Deventer         0.0
Length: 245, dtype: float64

**A function that returns each match's ground as a one-hot series**

In [7]:
def ground_series(each_match):
    """Return a one-hot Series marking the ground this match was played at."""
    ground_name = each_match['Details', 'Ground Name']
    onehot = ground_series_original.copy()  # start from the all-zeros template
    onehot[ground_name] = 1
    return onehot

In [8]:
## Checking if it works
ground_series(ground_player_lineup.loc[211028])

The Rose Bowl                               1.0
Eden Park                                   0.0
County Ground                               0.0
Brisbane Cricket Ground, Woolloongabba      0.0
New Wanderers Stadium                       0.0
                                           ... 
Mission Road Ground, Mong Kok, Hong Kong    0.0
Santarem Cricket Ground                     0.0
Europa Sports Complex                       0.0
Solvangs Park, Glostrup                     0.0
Sportpark Het Schootsveld, Deventer         0.0
Length: 245, dtype: float64

In [9]:
ground_onehot = ground_player_lineup.apply(ground_series, axis = 1)
ground_onehot

Match Code,The Rose Bowl,Eden Park,County Ground,"Brisbane Cricket Ground, Woolloongabba",New Wanderers Stadium,Sydney Cricket Ground,Westpac Stadium,Kennington Oval,Kingsmead,Newlands,...,"Gahanga International Cricket Stadium, Rwanda","Zahur Ahmed Chowdhury Stadium, Chattogram","St Albans Club, Buenos Aires","Belgrano Athletic Club Ground, Buenos Aires","Hurlingham Club Ground, Buenos Aires","Mission Road Ground, Mong Kok, Hong Kong",Santarem Cricket Ground,Europa Sports Complex,"Solvangs Park, Glostrup","Sportpark Het Schootsveld, Deventer"
211028,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211048,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
225263,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
225271,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
226374,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381454,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381456,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
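`apply(..., axis=1)` calls `ground_series` once per match in Python. A vectorised alternative, swapped in here rather than what this notebook runs, is `pd.get_dummies` followed by `reindex` to pin the column order to the template. The sketch below uses a small hypothetical stand-in for `ground_player_lineup['Details', 'Ground Name']`.

```python
import pandas as pd

# Hypothetical stand-in for ground_player_lineup['Details', 'Ground Name']
ground_name = pd.Series(
    ["The Rose Bowl", "Eden Park", "County Ground", "The Rose Bowl"],
    index=pd.Index([211028, 211048, 225263, 225271], name="Match Code"),
)

# Column order taken from the data, like ground_names = ...unique() in the notebook
ground_col = pd.Index(ground_name.unique())

# get_dummies builds every row at once (its columns come out alphabetically sorted);
# reindex restores the ground_col order and fills grounds missing from this subset with 0
onehot = (
    pd.get_dummies(ground_name, dtype="float64")
    .reindex(columns=ground_col, fill_value=0.0)
)
print(onehot)
```

The result should match the `apply` output column for column, since both use the order returned by `unique()`.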


### Saving this

To load this from disk later, use the following code:

```py
ground_onehot = pd.read_pickle(parent_dir / "Resources/MatchData/ground_onehot.pkl")
```

In [10]:
ground_onehot.to_pickle(parent_dir / "Resources/MatchData/ground_onehot.pkl")
ground_onehot = pd.read_pickle(parent_dir / "Resources/MatchData/ground_onehot.pkl")
ground_onehot

Match Code,The Rose Bowl,Eden Park,County Ground,"Brisbane Cricket Ground, Woolloongabba",New Wanderers Stadium,Sydney Cricket Ground,Westpac Stadium,Kennington Oval,Kingsmead,Newlands,...,"Gahanga International Cricket Stadium, Rwanda","Zahur Ahmed Chowdhury Stadium, Chattogram","St Albans Club, Buenos Aires","Belgrano Athletic Club Ground, Buenos Aires","Hurlingham Club Ground, Buenos Aires","Mission Road Ground, Mong Kok, Hong Kong",Santarem Cricket Ground,Europa Sports Complex,"Solvangs Park, Glostrup","Sportpark Het Schootsveld, Deventer"
211028,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211048,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
225263,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
225271,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
226374,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381454,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381456,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
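Before relying on the pickle in later notebooks, a quick round-trip check can confirm that nothing changes on disk. A minimal sketch on a hypothetical two-match frame, writing to a temporary directory instead of `Resources/`:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical small one-hot frame standing in for ground_onehot
frame = pd.DataFrame(
    {"The Rose Bowl": [1.0, 0.0], "Eden Park": [0.0, 1.0]},
    index=pd.Index([211028, 211048], name="Match Code"),
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ground_onehot.pkl"
    frame.to_pickle(path)
    reloaded = pd.read_pickle(path)

# Raises AssertionError if values, dtypes, index or column labels differ
pd.testing.assert_frame_equal(frame, reloaded)
print("round trip OK")
```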


## Combining all this data into a single main dataframe

Here is the data we are combining:

- Restricted data from innings (`score`, `Total Wickets`, `total runs`, `out`)
- Player Stats
- Player Status
- Ground Details
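The joins below only line up because each frame's index is cut down to the levels its partner has: the per-ball frames share (Match Code, Country, Over, Ball), the player stats are per (Match Code, Country), and the grounds are per Match Code. A toy sketch of the same sequence, with made-up column names (`P1 status`, `score`, `P1 runs`) and two-match data standing in for the real frames:

```python
import pandas as pd

# Hypothetical toy inputs; only the index layout mirrors the notebook
ball_idx = pd.MultiIndex.from_tuples(
    [(1, "A", 0, 0), (1, "A", 0, 1), (2, "B", 0, 0)],
    names=["Match Code", "Country", "Over", "Ball"],
)
status = pd.DataFrame({"P1 status": [10.0, 10.0, 5.0]}, index=ball_idx)
inning = pd.DataFrame({"score": [0, 4, 1]}, index=ball_idx)

stats = pd.DataFrame(
    {"P1 runs": [316.0, 629.0]},
    index=pd.MultiIndex.from_tuples([(1, "A"), (2, "B")], names=["Match Code", "Country"]),
)
ground = pd.DataFrame(
    {"Eden Park": [1.0, 0.0], "Lord's": [0.0, 1.0]},
    index=pd.Index([1, 2], name="Match Code"),
)

# 1. Same per-ball index on both sides, so a column-wise concat lines them up
per_ball = pd.concat([status, inning], axis=1)

# 2. Move Over/Ball into columns so the index matches stats' (Match Code, Country),
#    then copy each innings' stats onto every ball of that innings
per_ball = per_ball.reset_index(level=["Over", "Ball"])
with_stats = per_ball.merge(stats, left_index=True, right_index=True, how="inner")

# 3. Move Country into a column too, so the index matches ground's Match Code index
with_ground = with_stats.reset_index("Country").merge(
    ground, left_index=True, right_index=True, how="inner"
)

# 4. Restore the full per-ball index
final = with_ground.reset_index().set_index(["Match Code", "Country", "Over", "Ball"])
print(final)
```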

In [11]:
innings

Match Code,Country,Over,Ball,batter,bowler,non_striker,batter runs,extra runs,total runs,score,extras,wickets,out,Total Wickets,non_boundary,review,replacements
211028,Australia,0,0,AC Gilchrist,D Gough,ML Hayden,0,0,0,0,,,0,0,,,
211028,Australia,0,1,AC Gilchrist,D Gough,ML Hayden,4,0,4,4,,,0,0,,,
211028,Australia,0,2,AC Gilchrist,D Gough,ML Hayden,0,0,0,4,,,0,0,,,
211028,Australia,0,3,AC Gilchrist,D Gough,ML Hayden,0,0,0,4,,,0,0,,,
211028,Australia,0,4,AC Gilchrist,D Gough,ML Hayden,4,0,4,8,,,0,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381458,Turkey,19,1,Murat Yilmaz,Huzaif Yousuf,Muhammet Kursat,0,0,0,146,,,0,8,,,
1381458,Turkey,19,2,Murat Yilmaz,Huzaif Yousuf,Muhammet Kursat,0,0,0,146,,,0,8,,,
1381458,Turkey,19,3,Murat Yilmaz,Huzaif Yousuf,Muhammet Kursat,0,0,0,146,,,0,8,,,
1381458,Turkey,19,4,Murat Yilmaz,Huzaif Yousuf,Muhammet Kursat,0,1,1,147,{'wides': 1},Muhammet Kursat,1,9,,,


In [12]:
restricted_innings = innings[['score', 'Total Wickets', 'total runs', 'out']]
restricted_innings.columns = pd.MultiIndex.from_product([['Inning Data'], ['score', 'Total Wickets', 'Total Runs', 'Out']])
restricted_innings

,,,,Inning Data,Inning Data,Inning Data,Inning Data
Match Code,Country,Over,Ball,score,Total Wickets,Total Runs,Out
211028,Australia,0,0,0,0,0,0
211028,Australia,0,1,4,0,4,0
211028,Australia,0,2,4,0,0,0
211028,Australia,0,3,4,0,0,0
211028,Australia,0,4,8,0,4,0
...,...,...,...,...,...,...,...
1381458,Turkey,19,1,146,8,0,0
1381458,Turkey,19,2,146,8,0,0
1381458,Turkey,19,3,146,8,0,0
1381458,Turkey,19,4,147,9,1,1


In [15]:
pd.concat([player_status, restricted_innings], axis = 1)

ValueError: Reindexing only valid with uniquely valued Index objects

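**The error above is caused by duplicate index entries (repeated Match Code, Country, Over, Ball combinations) in `innings`, so I drop the duplicates here, keeping the first occurrence of each.**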

In [13]:
restricted_innings_unique = restricted_innings[~restricted_innings.index.duplicated()]
innings_status = pd.concat([player_status, restricted_innings_unique], axis = 1, keys = ['Player Status', 'Inning Data'])
innings_status

,,,,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Inning Data,Inning Data,Inning Data,Inning Data
,,,,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Inning Data,Inning Data,Inning Data,Inning Data
Match Code,Country,Over,Ball,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,...,P6,P7,P8,P9,P10,P11,score,Total Wickets,Total Runs,Out
211028,Australia,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,0,0,0,0
211028,Australia,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,4,0,4,0
211028,Australia,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,4,0,0,0
211028,Australia,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,4,0,0,0
211028,Australia,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,8,0,4,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381458,Turkey,19,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,5.0,...,1.0,1.0,1.0,1.0,1.0,10.0,146,8,0,0
1381458,Turkey,19,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,5.0,...,1.0,1.0,1.0,1.0,1.0,10.0,146,8,0,0
1381458,Turkey,19,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,5.0,...,1.0,1.0,1.0,1.0,1.0,10.0,146,8,0,0
1381458,Turkey,19,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,10.0,0.0,...,1.0,1.0,1.0,1.0,1.0,10.0,147,9,1,1
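`~index.duplicated()` keeps only the first row for each repeated label and silently drops the rest, so it is worth counting how many rows go. A small sketch on a hypothetical per-ball frame with one repeated delivery:

```python
import pandas as pd

# Hypothetical per-ball frame where (Match Code, Country, Over, Ball) repeats once
idx = pd.MultiIndex.from_tuples(
    [(1, "A", 0, 0), (1, "A", 0, 1), (1, "A", 0, 1), (1, "A", 0, 2)],
    names=["Match Code", "Country", "Over", "Ball"],
)
inning = pd.DataFrame({"score": [0, 1, 2, 6]}, index=idx)

print(inning.index.is_unique)  # False

# duplicated() flags every repeat after the first (keep="first" is the default)
dup_mask = inning.index.duplicated()
print("rows dropped:", dup_mask.sum())

# Negating the mask keeps only the first row for each label
inning_unique = inning[~dup_mask]
print(inning_unique)
```

Note that the second `(1, "A", 0, 1)` row (score 2) is the one discarded; if later rows carry the more up-to-date values, `duplicated(keep="last")` would keep those instead.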


In [17]:
innings_status_noIndex = innings_status.reset_index(level = ['Over', 'Ball'])
innings_status_noIndex

,,Over,Ball,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Inning Data,Inning Data,Inning Data,Inning Data
,,,,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Inning Data,Inning Data,Inning Data,Inning Data
Match Code,Country,,,P1,P2,P3,P4,P5,P6,P7,P8,...,P6,P7,P8,P9,P10,P11,score,Total Wickets,Total Runs,Out
211028,Australia,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,0,0,0,0
211028,Australia,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,4,0,4,0
211028,Australia,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,4,0,0,0
211028,Australia,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,4,0,0,0
211028,Australia,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,10.0,1.0,8,0,4,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381458,Turkey,19,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0,10.0,146,8,0,0
1381458,Turkey,19,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0,10.0,146,8,0,0
1381458,Turkey,19,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0,10.0,146,8,0,0
1381458,Turkey,19,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,1.0,1.0,1.0,1.0,10.0,147,9,1,1


In [15]:
innings_status.loc[1359787]

,,,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Inning Data,Inning Data,Inning Data,Inning Data
,,,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Bowling Team,Inning Data,Inning Data,Inning Data,Inning Data
Country,Over,Ball,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,...,P6,P7,P8,P9,P10,P11,score,Total Wickets,Total Runs,Out
Argentina,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,0,0,0,0
Argentina,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,0,0,0,0
Argentina,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,4,0,4,0
Argentina,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,8,0,4,0
Argentina,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,8,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Bermuda,19,2,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,1.0,10.0,1.0,1.0,1.0,1.0,224,4,0,0
Bermuda,19,3,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,1.0,10.0,1.0,1.0,1.0,1.0,224,4,0,0
Bermuda,19,4,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,1.0,10.0,1.0,1.0,1.0,1.0,224,4,0,0
Bermuda,19,5,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,1.0,10.0,1.0,1.0,1.0,1.0,225,4,1,0


In [14]:
player_stats.loc[1359787]

,Batting Stats,Batting Stats,Batting Stats,Batting Stats,Batting Stats,Batting Stats,Batting Stats,Batting Stats,Batting Stats,Batting Stats,...,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats
,P1,P1,P1,P1,P1,P1,P1,P1,P2,P2,...,P10,P10,P11,P11,P11,P11,P11,P11,P11,P11
Country,Mat,Inns,NO,Runs,HS,Ave,BF,SR,Mat,Inns,...,Ave,Econ,Mat,Inns,Overs,Mdns,Runs,Wkts,Ave,Econ
Argentina,17.0,15.0,1.0,316.0,66.0,22.57,308.0,102.59,17.0,15.0,...,20.3,6.54,3.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798
Bermuda,24.0,24.0,4.0,629.0,103.0,31.45,430.0,146.27,12.0,12.0,...,28.038534,3.0,6.0,4.0,10.5,0.0,71.0,1.0,71.0,6.55


In [19]:
innings_status_stats = innings_status_noIndex.merge(player_stats, left_index = True, right_index = True, how = 'inner')
innings_status_stats

,,Over,Ball,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,...,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats
,,,,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,P10,P10,P11,P11,P11,P11,P11,P11,P11,P11
Match Code,Country,,,P1,P2,P3,P4,P5,P6,P7,P8,...,Ave,Econ,Mat,Inns,Overs,Mdns,Runs,Wkts,Ave,Econ
211028,Australia,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381457,Serbia,7,4,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798
1381457,Serbia,7,5,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798
1381457,Serbia,8,0,5.0,0.0,10.0,1.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798
1381457,Serbia,8,1,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798


In [20]:
other_data = innings_status_stats.reset_index("Country")
other_data

,Country,Over,Ball,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,...,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats,Bowling Stats
,,,,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,P10,P10,P11,P11,P11,P11,P11,P11,P11,P11
Match Code,,,,P1,P2,P3,P4,P5,P6,P7,...,Ave,Econ,Mat,Inns,Overs,Mdns,Runs,Wkts,Ave,Econ
211028,Australia,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
211028,Australia,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,...,16.330000,7.170000,2.0,2.000000,6.300000,0.000000,42.000000,1.000000,42.000000,6.460000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381457,Serbia,7,4,10.0,0.0,5.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798
1381457,Serbia,7,5,10.0,0.0,5.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798
1381457,Serbia,8,0,5.0,0.0,10.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798
1381457,Serbia,8,1,10.0,0.0,5.0,1.0,1.0,1.0,1.0,...,28.038534,8.036798,4.0,10.710079,32.795811,0.453754,241.972625,10.520946,28.038534,8.036798


In [21]:
old_columns = ground_onehot.columns.tolist()
old_columns

['The Rose Bowl',
 'Eden Park',
 'County Ground',
 'Brisbane Cricket Ground, Woolloongabba',
 'New Wanderers Stadium',
 'Sydney Cricket Ground',
 'Westpac Stadium',
 'Kennington Oval',
 'Kingsmead',
 'Newlands',
 'Western Australia Cricket Association Ground',
 'Melbourne Cricket Ground',
 'Old Trafford',
 'Brabourne Stadium',
 'Jade Stadium',
 'Gymkhana Club Ground',
 "St George's Park",
 'Kensington Oval, Bridgetown',
 'National Stadium',
 'SuperSport Park',
 "Queen's Park Oval, Port of Spain",
 'Civil Service Cricket Club, Stormont',
 "Lord's",
 'Trent Bridge',
 'Maple Leaf North-West Ground',
 'Seddon Park',
 'AMI Stadium',
 'R Premadasa Stadium',
 'Dubai International Cricket Stadium',
 'Warner Park, Basseterre',
 'Bellerive Oval',
 'Providence Stadium',
 'Beausejour Stadium, Gros Islet',
 'Edgbaston',
 'Sophia Gardens',
 'Vidarbha Cricket Association Stadium, Jamtha',
 'Punjab Cricket Association Stadium, Mohali',
 'Sir Vivian Richards Stadium, North Sound',
 'Adelaide Oval',
 'H

In [22]:
ground_onehot.columns = pd.MultiIndex.from_product([["Ground Data"], ["-"], old_columns])
ground_onehot

,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data
,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-,-
Match Code,The Rose Bowl,Eden Park,County Ground,"Brisbane Cricket Ground, Woolloongabba",New Wanderers Stadium,Sydney Cricket Ground,Westpac Stadium,Kennington Oval,Kingsmead,Newlands,...,"Gahanga International Cricket Stadium, Rwanda","Zahur Ahmed Chowdhury Stadium, Chattogram","St Albans Club, Buenos Aires","Belgrano Athletic Club Ground, Buenos Aires","Hurlingham Club Ground, Buenos Aires","Mission Road Ground, Mong Kok, Hong Kong",Santarem Cricket Ground,Europa Sports Complex,"Solvangs Park, Glostrup","Sportpark Het Schootsveld, Deventer"
211028,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211048,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
225263,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
225271,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
226374,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381454,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381456,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
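The extra `"-"` level exists only so the ground columns have the same three levels as the rest of the data. As far as I know, recent pandas versions refuse to merge frames whose column indexes have different numbers of levels (older versions only warned). A sketch of the padding on hypothetical two-match frames:

```python
import pandas as pd

# Hypothetical 3-level columns, like ('Player Status', 'Batting Team', 'P1') in the notebook
left = pd.DataFrame(
    [[10.0], [5.0]],
    index=pd.Index([211028, 211048], name="Match Code"),
    columns=pd.MultiIndex.from_tuples([("Player Status", "Batting Team", "P1")]),
)

ground = pd.DataFrame(
    {"The Rose Bowl": [1.0, 0.0], "Eden Park": [0.0, 1.0]},
    index=pd.Index([211028, 211048], name="Match Code"),
)

# Pad the flat ground columns out to three levels: a group name, a "-" placeholder, the ground
ground.columns = pd.MultiIndex.from_product(
    [["Ground Data"], ["-"], ground.columns.tolist()]
)

merged = left.merge(ground, left_index=True, right_index=True, how="inner")
print(merged.columns.tolist())
```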


In [23]:
final_data = other_data.merge(ground_onehot, left_index = True, right_index = True, how = "inner")

In [24]:
final_data = final_data.reset_index()

In [25]:
final_data = final_data.set_index(["Match Code", "Country", "Over", "Ball"])
final_data

,,,,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,...,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data
,,,,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,-,-,-,-,-,-,-,-,-,-
Match Code,Country,Over,Ball,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,...,"Gahanga International Cricket Stadium, Rwanda","Zahur Ahmed Chowdhury Stadium, Chattogram","St Albans Club, Buenos Aires","Belgrano Athletic Club Ground, Buenos Aires","Hurlingham Club Ground, Buenos Aires","Mission Road Ground, Mong Kok, Hong Kong",Santarem Cricket Ground,Europa Sports Complex,"Solvangs Park, Glostrup","Sportpark Het Schootsveld, Deventer"
211028,Australia,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381457,Serbia,7,4,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,Serbia,7,5,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,Serbia,8,0,5.0,0.0,10.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,Serbia,8,1,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


# We are done with data retrieval! Storing the result in a pickle file to load later


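### First, reduce the memory footprint

Casting every column to `float16` stores each value in 2 bytes instead of the 8 bytes used by `float64` or `int64`, so the frame shrinks to roughly a quarter of its size.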

In [26]:
final_data_lite = final_data.astype('float16')
final_data_lite

,,,,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,...,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data
,,,,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,-,-,-,-,-,-,-,-,-,-
Match Code,Country,Over,Ball,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,...,"Gahanga International Cricket Stadium, Rwanda","Zahur Ahmed Chowdhury Stadium, Chattogram","St Albans Club, Buenos Aires","Belgrano Athletic Club Ground, Buenos Aires","Hurlingham Club Ground, Buenos Aires","Mission Road Ground, Mong Kok, Hong Kong",Santarem Cricket Ground,Europa Sports Complex,"Solvangs Park, Glostrup","Sportpark Het Schootsveld, Deventer"
211028,Australia,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
211028,Australia,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1381457,Serbia,7,4,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,Serbia,7,5,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,Serbia,8,0,5.0,0.0,10.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1381457,Serbia,8,1,10.0,0.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
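One thing to keep in mind with `float16`: small codes like the 0.0, 1.0, 5.0 and 10.0 visible above are stored exactly, but float16 only has an 11-bit significand, so large counts and most fractional stats get rounded. Below is a minimal sketch for measuring that loss, assuming a numeric DataFrame; `float16_roundtrip_error` and the toy columns are hypothetical names for illustration, not part of this notebook.

```python
import pandas as pd


def float16_roundtrip_error(df: pd.DataFrame) -> pd.Series:
    """Return the largest absolute error per column caused by casting to float16.

    Casting to float16 and back to float64 exposes any rounding. Values such as
    0.0, 1.0, 5.0 and 10.0 survive exactly, but float16 has an 11-bit significand,
    so integers above 2048 are not all representable and most fractions are rounded.
    """
    original = df.astype("float64")
    restored = original.astype("float16").astype("float64")
    return (original - restored).abs().max()


# Hypothetical toy frame standing in for a few columns of final_data
toy = pd.DataFrame({
    "status": [10.0, 5.0, 1.0, 0.0],          # small codes: exact in float16
    "runs": [2048.0, 2049.0, 3001.0, 12.0],   # 2049 and 3001 are not representable
    "strike_rate": [137.45, 100.0, 0.0, 66.67],
})

errors = float16_roundtrip_error(toy)
print(errors)
```

Calling it on `final_data` before the `astype('float16')` step and keeping the columns with a non-zero error would show which features, if any, lose precision; whether any columns in this dataset are affected is not shown here.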


In [27]:
# Check memory usage for the original version (before downcasting)
memory_usage_per_column = final_data.memory_usage(deep=True)
total_memory_usage = memory_usage_per_column.sum()
print("Total memory usage For Original Version:", total_memory_usage / (1024 * 1024), "MB")

# Check memory usage for the lite (float16) version
memory_usage_per_column = final_data_lite.memory_usage(deep=True)
total_memory_usage = memory_usage_per_column.sum()
print("Total memory usage For New Lite Version:", total_memory_usage / (1024 * 1024), "MB")

Total memory usage For Original Version: 975.6436729431152 MB
Total memory usage For New Lite Version: 244.96994018554688 MB
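The drop from about 975.6 MB to 245.0 MB is a factor of roughly 3.98, which lines up with float16 taking 2 bytes per value where an 8-byte dtype such as float64 takes 8. A small sketch with a hypothetical all-zeros frame reproduces the arithmetic; `memory_mb` is a helper name introduced here for illustration only.

```python
import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory of a DataFrame (values + index) in megabytes."""
    return df.memory_usage(deep=True).sum() / (1024 * 1024)


# Hypothetical frame: 100,000 rows x 50 float64 columns of zeros
rows, cols = 100_000, 50
df64 = pd.DataFrame(np.zeros((rows, cols)))
df16 = df64.astype("float16")

# The values alone shrink exactly 4x (8 bytes -> 2 bytes per cell)
values64 = rows * cols * np.dtype("float64").itemsize
values16 = rows * cols * np.dtype("float16").itemsize
print(values64 / values16)  # 4.0

# The index is not downcast by astype, so the overall ratio lands at or just below 4
print(memory_mb(df64) / memory_mb(df16))
```

`astype` leaves the index untouched, which is likely why the real ratio is slightly under 4: the `Match Code`/`Country`/`Over`/`Ball` MultiIndex keeps its original size in both versions.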



### Here is how to load the saved data later:

```py
final_data = pd.read_pickle(parent_dir / "Resources/final_data.pkl")
```

In [28]:
final_data_lite.to_pickle(parent_dir / "Resources/final_data.pkl")
final_data = pd.read_pickle(parent_dir / "Resources/final_data.pkl")
final_data



In [29]:
## final_data.to_pickle("D:/arsalan/final_data.pkl")
## final_data = pd.read_pickle("D:/arsalan/final_data.pkl")
## final_data
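To confirm that the pickle round trip keeps the float16 dtype and the three-level column headers intact, a check along these lines could help. This is a sketch on a hypothetical two-row miniature of `final_data` written to a temporary directory, so it does not touch `Resources/final_data.pkl`.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical miniature of final_data: 4-level row index, 3-level columns, float16 values
index = pd.MultiIndex.from_tuples(
    [(211028, "Australia", 0, 0), (211028, "Australia", 0, 1)],
    names=["Match Code", "Country", "Over", "Ball"],
)
columns = pd.MultiIndex.from_tuples(
    [("Player Status", "Batting Team", "P1"), ("Ground Data", "-", "Europa Sports Complex")]
)
frame = pd.DataFrame([[10.0, 0.0], [10.0, 0.0]], index=index, columns=columns).astype("float16")

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "final_data.pkl"
    frame.to_pickle(path)
    restored = pd.read_pickle(path)

# Raises AssertionError if values, dtypes, index or column levels differ
pd.testing.assert_frame_equal(frame, restored)
print(restored.dtypes.unique())
```

Unlike CSV, pickle stores the dtypes and MultiIndex objects directly, so nothing has to be re-parsed or re-cast after loading.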

In [2]:
final_data = pd.read_pickle(parent_dir / "Resources/final_data.pkl")
final_data



In [3]:
final_data.loc[1359787]

Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,Player Status,...,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data,Ground Data
Unnamed: 0_level_1,Unnamed: 1_level_1,Unnamed: 2_level_1,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,Batting Team,...,-,-,-,-,-,-,-,-,-,-
Unnamed: 0_level_2,Unnamed: 1_level_2,Unnamed: 2_level_2,P1,P2,P3,P4,P5,P6,P7,P8,P9,P10,...,"Gahanga International Cricket Stadium, Rwanda","Zahur Ahmed Chowdhury Stadium, Chattogram","St Albans Club, Buenos Aires","Belgrano Athletic Club Ground, Buenos Aires","Hurlingham Club Ground, Buenos Aires","Mission Road Ground, Mong Kok, Hong Kong",Santarem Cricket Ground,Europa Sports Complex,"Solvangs Park, Glostrup","Sportpark Het Schootsveld, Deventer"
Country,Over,Ball,Unnamed: 3_level_3,Unnamed: 4_level_3,Unnamed: 5_level_3,Unnamed: 6_level_3,Unnamed: 7_level_3,Unnamed: 8_level_3,Unnamed: 9_level_3,Unnamed: 10_level_3,Unnamed: 11_level_3,Unnamed: 12_level_3,Unnamed: 13_level_3,Unnamed: 14_level_3,Unnamed: 15_level_3,Unnamed: 16_level_3,Unnamed: 17_level_3,Unnamed: 18_level_3,Unnamed: 19_level_3,Unnamed: 20_level_3,Unnamed: 21_level_3,Unnamed: 22_level_3,Unnamed: 23_level_3
Argentina,0,0,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Argentina,0,1,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Argentina,0,2,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Argentina,0,3,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Argentina,0,4,10.0,5.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Bermuda,19,2,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Bermuda,19,3,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Bermuda,19,4,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
Bermuda,19,5,0.0,0.0,0.0,0.0,5.0,10.0,1.0,1.0,1.0,1.0,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
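`final_data.loc[1359787]` selects on the outermost row level, so `Match Code` is dropped and only `Country`, `Over` and `Ball` remain, as the output shows. A sketch on a hypothetical three-row frame with the same row levels illustrates this, plus two related selections: `xs` for an inner level and a list key to keep the `Match Code` level.

```python
import pandas as pd

# Hypothetical miniature of final_data with the same four row levels
index = pd.MultiIndex.from_tuples(
    [
        (1359787, "Argentina", 0, 0),
        (1359787, "Bermuda", 19, 5),
        (1381457, "Serbia", 8, 1),
    ],
    names=["Match Code", "Country", "Over", "Ball"],
)
frame = pd.DataFrame({"P1": [10.0, 0.0, 10.0]}, index=index).astype("float16")

# A scalar key on the outermost level drops that level
one_match = frame.loc[1359787]
print(list(one_match.index.names))  # ['Country', 'Over', 'Ball']

# xs() selects on an inner level by name, e.g. every row whose Country is Serbia
serbia = frame.xs("Serbia", level="Country")
print(len(serbia))  # 1

# Passing a list instead of a scalar keeps the Match Code level
with_code = frame.loc[[1359787]]
print(with_code.index.nlevels)  # 4
```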
