## Unveiling Insights through Data-driven Exploration of Cricket World Cup 2023

### Project Description 

The recently concluded <a href='https://www.cricketworldcup.com/'>ICC Men's 50 Over Cricket World Cup 2023</a> holds untapped data that can provide strategic insights. Our project evaluates the performance of all the teams across the tournament by analyzing individual batting and bowling performances, venue-specific trends, the impact of toss decisions on match outcomes, team powerplay tactics, and first- and second-innings performances. We want to understand how these factors contributed to some teams' success and others' underperformance in the tournament. To accomplish this, we scraped data from reliable cricket news websites such as <a href='https://www.espncricinfo.com/'>ESPNCricinfo</a> and <a href='https://www.cricbuzz.com/'>Cricbuzz</a> using the BeautifulSoup and Requests Python modules. We then cleaned and transformed the raw data to prepare it for the subsequent analysis. Finally, we created visualizations to communicate our findings effectively.

### Brief Description of the Tournament Structure

The Men's ODI (One-Day International) World Cup is a cricket tournament held every four years, in which national teams compete in One Day International matches. The tournament is organized by the International Cricket Council (ICC). In 2023, the tournament was hosted by India, and ten national teams participated. The tournament began on October 5th and concluded on November 19th, with Australia being crowned as the champions. The tournament consisted of 48 games, played across 10 different venues in India. Out of these 48 games, 45 were group stage matches, in which each team faced every other team once. The top four teams from the group stage qualified for the knockout stages, which included the semi-finals (46th and 47th games) and the final (48th game). Each team gains 2 points for a win and none for a loss. In this project, the team-level analysis is limited to <b>group stage</b> matches, while the individual performance analysis covers the entire tournament (including knockouts).
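The group-stage numbers above can be checked with a short sketch. It assumes a single round-robin among the ten teams named in this report; the function name `group_stage_fixtures` is ours, for illustration only.

```python
from itertools import combinations

# The ten participating teams, as named throughout this report.
TEAMS = [
    "India", "South Africa", "Australia", "New Zealand", "Pakistan",
    "Afghanistan", "England", "Bangladesh", "Sri Lanka", "Netherlands",
]

POINTS_PER_WIN = 2  # a win earns 2 points, a loss earns none


def group_stage_fixtures(teams):
    """Return every unordered pairing of teams (single round-robin)."""
    return list(combinations(teams, 2))


fixtures = group_stage_fixtures(TEAMS)
max_points = (len(TEAMS) - 1) * POINTS_PER_WIN  # points for winning every game

print(len(fixtures))  # number of group-stage matches
print(max_points)     # maximum points available to one team
```

Each team plays nine group games, so the pairing count matches the 45 group stage matches described above.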

#### Map showing the Participating Teams

<img src = 'plots/maps/world_map.png'/>

#### Map of Venues

In [1]:
from IPython.display import IFrame
IFrame(src='plots/maps/india_map.html', width=1000, height=1000)

### Data Analysis and Visualizations

#### Missing Values Analysis

<p>While performing data cleaning, we observed that the raw <b>match_facts.csv</b> and <b>Batting_Scorecards.csv</b> files have missing values in some of their columns.</p>

<img src = 'plots/with_null_value/match_facts.png' />

<center><b>Fig 1: match_facts.csv with missing records in Team 1 PP-3 and Team 2 PP-3 Scores</b></center>

<img src = 'plots/with_null_value/Batting_Scorecards.png' />

<center><b>Fig 2: Batting_Scorecards.csv with multiple missing records</b></center>

For some context, an innings has three phases: <b>Powerplay 1</b> covers Overs 1-10, <b>Powerplay 2</b> covers Overs 11-40 and <b>Powerplay 3</b> covers Overs 41-50.
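As a minimal sketch, the phase boundaries can be written as a small lookup function. The name `powerplay_phase` and the "PP-n" labels are illustrative choices, not taken from the project code.

```python
def powerplay_phase(over):
    """Map a 1-based over number (1-50) to its powerplay phase."""
    if not 1 <= over <= 50:
        raise ValueError(f"over must be between 1 and 50, got {over}")
    if over <= 10:
        return "PP-1"  # overs 1-10
    if over <= 40:
        return "PP-2"  # overs 11-40
    return "PP-3"      # overs 41-50
```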

In the match_facts.csv file, some teams' Powerplay 3 (PP-3) scores are missing because those teams were bowled out before PP-3. These missing values were replaced with 'NA'. In the Batting_Scorecards.csv file, several batting records (runs scored and balls faced) are missing because not all the batsmen got a chance to bat in the match. Each of these missing records was filled with 0. After these replacements, neither file contains missing values.
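The two fill rules can be sketched as follows. This is a dependency-free illustration using the standard `csv` module, not the project's actual cleaning code; the column names and sample rows are hypothetical.

```python
import csv
import io

# Hypothetical excerpt; the real column names in match_facts.csv may differ.
RAW_MATCH_FACTS = """match_no,team_1_pp3_score,team_2_pp3_score
1,85,
2,,64
"""


def fill_missing(rows, columns, fill_value):
    """Return copies of rows with empty values in `columns` replaced."""
    filled = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if new_row.get(col) in ("", None):
                new_row[col] = fill_value
        filled.append(new_row)
    return filled


# Teams bowled out before PP-3 get 'NA' for that phase.
rows = list(csv.DictReader(io.StringIO(RAW_MATCH_FACTS)))
match_facts = fill_missing(rows, ["team_1_pp3_score", "team_2_pp3_score"], "NA")

# Batsmen who did not bat get 0 runs and 0 balls (made-up record).
scorecard = [{"batsman": "Player A", "runs": "", "balls": ""}]
scorecard = fill_missing(scorecard, ["runs", "balls"], 0)
```

Using different fill values keeps the two cases distinguishable: 'NA' marks a phase that never took place, while 0 is a valid numeric entry for a batsman who did not bat.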

<img src = 'plots/without_null_value/match_facts.png' />

<center><b>Fig 3: match_facts.csv with no missing values</b></center>

<img src = 'plots/without_null_value/Batting_Scorecards.png' />

<center><b>Fig 4: Batting_Scorecards.csv with no missing values</b></center>

#### Team Standings

Now, we take a look at the team standings after the group stage.

In [2]:
IFrame(src='plots/analysis_plots/teams_matches_plot.html', width=980, height=1000)

<center><b>Fig 5: Plot showing the Teams and Matches Won along with their Qualification Status (Q - Qualified, E - Eliminated)</b></center>

From this plot, it can be clearly seen that:
<ul>
    <li>The top four teams - India, South Africa, Australia, and New Zealand - have qualified for the knockouts by winning the highest number of games in the group stage. </li>
    <li>India has won all of their group stage games, while Bangladesh, Sri Lanka, and Netherlands have won the fewest games.</li>
    <li>New Zealand, who are in the 4th position, have won five games, highlighting the fact that teams needed at least five wins to qualify for the knockouts.</li>
</ul>

Next, we discuss the team tactics and factors that helped certain teams perform better than others.

#### Matchwise Trends

In [3]:
IFrame(src='plots/analysis_plots/matchwise_points.html', width=980, height=1000)

<center><b>Fig 6: Plot showing the Match Number v/s Points Gained by different Teams</b></center>

The plot above shows some interesting insights about the performance of different teams in the tournament. Here are a few key takeaways:

- India has a perfectly linear plot because they won all their matches and secured the maximum points.
- After losing their first two matches, Australia won all their subsequent matches.
- South Africa won four matches in a row from their 4th match to their 7th match.
- New Zealand had a four-match winning streak followed by a four-match losing streak. They managed to qualify by winning their last remaining match.
- England recorded their first win in their second game but then lost five consecutive games, which resulted in them being knocked out in the group stage.
- Pakistan won their first two games but then had a four-match losing streak.
- Bangladesh won their first match but then lost six consecutive games.
- Netherlands and Sri Lanka only won a couple of matches each, with multiple losses in between.

Having analysed the matchwise trends for each team, let's dive deeper into the trends observed across the tournament.

#### General Tournament Trends

This section includes some of the most common trends observed throughout the tournament.

In [4]:
IFrame(src='plots/analysis_plots/teams_matches_average_runs_plot.html', width=980, height=1000)

<center><b>Fig 7: Plot showing the Average Runs scored per Wicket Lost by different Teams while Batting</b></center>

It's evident that the top four teams, who made it to the knockout stages, scored considerably more runs per wicket lost than the other teams. Notably, Team India stands out with an impressive batting average per wicket of 57.34, while Netherlands are at the bottom of the table with the lowest average of 21.16.
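The metric in Fig 7 is a simple ratio. A minimal sketch, with a hypothetical function name and made-up totals (not tournament data):

```python
def runs_per_wicket(total_runs, wickets_lost):
    """Average runs scored for each wicket lost."""
    if wickets_lost == 0:
        raise ValueError("no wickets lost; the average is undefined")
    return total_runs / wickets_lost


# Made-up totals for illustration only.
example_average = runs_per_wicket(total_runs=2500, wickets_lost=50)
```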

In [5]:
IFrame(src='plots/analysis_plots/teams_matches_average_wickets_plot.html', width=980, height=1000)

<center><b>Fig 8: Plot showing the Average Wickets Lost by each Team per Match</b></center>

The best-performing teams are those that lost the fewest wickets per match; among them, only Australia lost a comparatively high number of wickets per match. In particular, Team India lost fewer than 5 wickets per match, while both Netherlands and England lost almost all 10 wickets in each match.

<img src = 'plots/analysis_plots/overall_team_stats_heatmap.png' />

<center><b>Fig 9: Heatmap showing the Correlation among the general Team Stats</b></center>

After performing a correlation analysis of the factors involved in <b>Figures 7 and 8</b>, the following conclusions can be drawn:
<ul>
    <li>Teams that have won matches had a higher average of runs scored per wicket lost, and they lost fewer wickets per match.</li>
    <li>This trend is observed with teams that qualified.</li>
</ul>
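A correlation heatmap such as Fig 9 is built from pairwise correlation coefficients. The sketch below assumes the Pearson coefficient (the report does not state which coefficient was used) and runs on made-up numbers, not tournament data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only.
wins = [9, 7, 5, 2]
wickets_lost_per_match = [4.5, 6.0, 7.5, 9.5]
r = pearson(wins, wickets_lost_per_match)  # strongly negative for this sample
```

A value near -1 means that more wins go with fewer wickets lost, which is the kind of relationship the heatmap summarizes.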

Next, we give a brief overview of first and second innings statistics.

#### General First and Second Innings Stats

In [6]:
IFrame(src='plots/analysis_plots/matches_won_batting_1_2.html', width=980, height=1000)

<center><b>Fig 10: Pie Chart showing Wins Batting 1st v/s Batting 2nd</b></center>

When playing cricket in the subcontinent and during day-night games, there is always a possibility of the dew factor favoring the team batting second. However, based on the pie chart above, it was observed that teams had an equal chance of winning the match whether they batted first or chased during the tournament.

In [7]:
IFrame(src='plots/analysis_plots/1st_2nd_batting_counts.html', width=980, height=1000)

<center><b>Fig 11: Plot showing the count of Teams Batting 1st v/s 2nd</b></center>

Almost all teams had equal opportunities to bat first or chase a target, as shown in the plot.

In [8]:
IFrame(src='plots/analysis_plots/First_Batting_Wins.html', width=980, height=1000)

<center><b>Fig 12: Plots showing count of Teams Batting first and the total wins</b></center>

India and South Africa have won all of their matches while batting first, whereas Sri Lanka and Bangladesh did not win a single match while doing the same. Australia has lost only one match while batting first, while the rest of the teams have lost more than two matches.

In [9]:
IFrame(src='plots/analysis_plots/Second_batting_Wins.html', width=980, height=1000)

<center><b>Fig 13: Plots showing count of Teams Batting Second and the Total Wins</b></center>

India has won all of their matches when chasing, whereas Netherlands and England have lost all of their matches when batting second. Afghanistan and Australia have lost one match each while batting second, while the remaining teams have lost more than two matches under the same circumstances.

In [10]:
IFrame(src='plots/analysis_plots/Bowled_out_frequency.html', width=980, height=1000)

<center><b>Fig 14: Plot showing the number of times Teams get Bowled Out while Batting 1st before playing 50 overs</b></center>

Based on the plot, it is evident that three (India, New Zealand and South Africa) out of the four top teams have not lost all their 10 wickets in a single match batting first, with the exception being Australia, who has been bowled out three times. Only England and Bangladesh have been bowled out once, while the remaining teams have been bowled out more than twice batting first.

In [11]:
IFrame(src='plots/analysis_plots/less_than_300_less_than_50.html', width=980, height=1000)

<center><b>Fig 15: Pie chart showing the Win and Loss %age of Teams that scored less than 300 and got Bowled Out Batting 1st</b></center>

In One Day Internationals (ODIs), it is commonly believed that teams which bat first and score over 300 runs have a higher probability of winning the game. This trend is also observed in this tournament, with roughly 80% of teams losing their matches when they scored less than 300 runs and were bowled out batting first.

Now let's visualize the teams:

In [12]:
IFrame(src='plots/analysis_plots/Teams_Bowled_out_losses.html', width=980, height=1000)

<center><b>Fig 16: Plot showing the number of times each Team has lost scoring less than 300 and getting Bowled Out Batting 1st.</b></center>

It is evident that the teams that have not performed well have generally been bowled out for less than 300 runs while batting first. Sri Lanka, for instance, lost four of their five matches batting first after being bowled out for less than 300 runs. Netherlands lost both of their matches while batting first. Similarly, Pakistan lost three matches batting first, and in two of them they were bowled out for less than 300 runs. Furthermore, three of the top four teams (all except Australia) have not been bowled out even once when batting first.

In [13]:
IFrame(src='plots/analysis_plots/over_300.html', width=980, height=1000)

<center><b>Fig 17: Pie chart showing the Win and Loss %age of Teams that scored in excess of 300 Batting 1st</b></center>

As per the pie chart above, it is evident that teams that score over 300 runs in the first innings have a higher probability of winning the match.

In [14]:
IFrame(src='plots/analysis_plots/Teams_over_300_wins.html', width=980, height=1000)

<center><b>Fig 18: Plot showing the number of times each Team has won scoring over 300</b></center>

South Africa has won all of their matches while batting first, and in each of these matches, they have scored over 300 runs. Australia and India each scored over 300 runs in three of their four wins while batting first. Similarly, England scored over 300 runs in all three of their wins while batting first.

To conclude this section, it can be inferred that:
<ul>
    <li>Teams batting first and second had an equal chance of winning their matches.</li>
    <li>Almost all the teams had an equal opportunity to bat 1st and 2nd.</li>
    <li>The teams that scored over 300 runs have won most of their matches, and the top-performing teams have managed to score over 300 runs in most of their first batting wins.</li>
    <li>On the other hand, teams that got bowled out in the first innings while batting first and scored less than 300 runs have lost most of their matches. Most of the underperforming teams have scored less than 300 runs while batting first and have been bowled out.</li>
</ul>

Next, let's dive deeper into the first and second innings stats, starting with the first innings batting stats.

From <b>Figure 12</b>, it can be seen that the top five teams that performed well batting first (bowling second) are India, England, South Africa, Australia and New Zealand.

#### First Innings In-depth analysis

In [15]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_1.html', width=980, height=1000)

<center><b>Fig 19: Plot showing the Average Run Rate for each Team Batting First</b></center>

For some context, <b>Run Rate</b> is calculated as runs scored per over.
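One practical detail when computing run rate is that scorecards write overs in cricket notation, where "49.3" means 49 overs and 3 balls rather than 49.3 decimal overs. The sketch below handles that conversion; the function names are hypothetical, and whether the project data stores overs in this notation is an assumption.

```python
def overs_to_balls(overs):
    """Convert cricket overs notation (e.g. "49.3" = 49 overs, 3 balls) to balls."""
    whole, _, part = str(overs).partition(".")
    extra_balls = int(part) if part else 0
    if extra_balls > 5:
        raise ValueError(f"invalid overs value: {overs}")
    return int(whole) * 6 + extra_balls


def run_rate(runs, overs):
    """Runs scored per six-ball over."""
    balls = overs_to_balls(overs)
    if balls == 0:
        raise ValueError("no balls bowled; run rate is undefined")
    return runs * 6 / balls
```

Passing overs as a string (for example `run_rate(297, "49.3")`) avoids any floating-point surprises in the notation.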

We can see that the five teams with the highest number of wins batting first also had the highest run rates while batting first, with South Africa leading at 7.51.

In [16]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_2.html', width=980, height=1000)

<center><b>Fig 20: Plot showing the Average Wickets lost per Match for different Teams Batting 1st</b></center>

Similarly, three of the five best-performing teams batting first (India, New Zealand and South Africa) have lost the fewest wickets, whereas Australia and England have lost a considerable number of wickets batting first. Only these three teams have lost fewer wickets than the tournament average, with the rest of the teams losing approximately 9 or more wickets per match.

Now let's look at the powerplay-wise segregation.

In [17]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_3.html', width=980, height=1000)

<center><b>Fig 21: Plot showing the Average Run Rate for all the Teams in different Powerplay Phases while Batting First</b></center>

The following can be inferred from the plot:
<ul>
    <li>It can be seen that all the teams except Sri Lanka have significantly higher run rates in Powerplay 3 as compared to the previous two powerplays.
    </li>
    <li>
        Afghanistan, Australia, Bangladesh, Netherlands and India show a common trend: they scored heavily in PP-1, slowed slightly in PP-2 and picked up again in PP-3.
    </li>
    <li>
        However, Pakistan, New Zealand and South Africa have scored at a lower rate in PP-1 and picked up their scoring rate in PP-2 and PP-3. 
    </li>
    <li>Sri Lanka is the exception, with their run rate dropping from PP-1 to PP-3 (possibly due to the fall of wickets, as observed in <b>Fig. 20</b>).
    </li>
</ul>
  

In [18]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_4.html', width=980, height=1000)

<center><b>Fig 22: Plot showing the Average Wickets lost by each Team during different Powerplay Phases while Batting 1st</b></center>

From the above figure, the following can be clearly seen:
<ul>
    <li>
        Very few wickets were lost by the top five performing teams (Batting 1st) during the first PP.
    </li>
    <li>
        Most of the teams who lost wickets in PP-2 have experienced a decrease in the run rate (as observed in <b>Fig 21</b>).
    </li>
    <li>
        In spite of losing wickets in PP-3, teams have scored at a quicker run rate in that phase.
    </li>
</ul>

Next, we have analyzed the contribution of top, middle and lower order batsmen. 

For our analysis, we have considered the top-3 batsmen to be part of <b>top order</b>, 4-7 as <b>middle order</b> and last four batsmen as <b>lower order</b>.
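The grouping above can be sketched as a position lookup plus a percentage split. The function names and the sample innings are hypothetical, and runs from extras are left out of this illustration.

```python
def batting_order_group(position):
    """Classify a batting position (1-11) as top, middle or lower order."""
    if not 1 <= position <= 11:
        raise ValueError(f"position must be between 1 and 11, got {position}")
    if position <= 3:
        return "top"     # positions 1-3
    if position <= 7:
        return "middle"  # positions 4-7
    return "lower"       # positions 8-11


def contribution_percentages(scores):
    """Given (position, runs) pairs, return each group's share of the runs in %."""
    totals = {"top": 0, "middle": 0, "lower": 0}
    for position, runs in scores:
        totals[batting_order_group(position)] += runs
    total_runs = sum(totals.values())
    if total_runs == 0:
        return {group: 0.0 for group in totals}
    return {group: 100 * runs / total_runs for group, runs in totals.items()}


# Made-up innings for illustration only.
innings = [(1, 50), (2, 30), (4, 60), (5, 40), (9, 20)]
shares = contribution_percentages(innings)
```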

In [19]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_5.html', width=980, height=1000)

<center><b>Fig 23: Plot showing the %age contribution of Top, Middle and Lower Order while Batting 1st</b></center>

For three of the five best-performing teams batting first (India, South Africa and New Zealand), the top and middle order have done the bulk of the scoring and little contribution was required from the lower order; Australia and England have had relatively higher contributions from their lower order. The teams which haven't performed well are generally observed to have a higher contribution from the lower order. Interestingly, the lower order of Netherlands has contributed a higher percentage of runs than their top order (which is explained by their highest average wickets lost in PP-1 in <b>Fig 22</b>).

For some context, <b>Strike Rate</b> is the number of runs scored by a batsman per 100 balls faced.
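Batting strike rate is conventionally quoted per 100 balls. A minimal sketch with a hypothetical function name:

```python
def strike_rate(runs, balls_faced):
    """Batting strike rate: runs scored per 100 balls faced."""
    if balls_faced == 0:
        raise ValueError("no balls faced; strike rate is undefined")
    return 100 * runs / balls_faced
```

For example, 50 runs off 40 balls corresponds to `strike_rate(50, 40)`.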

In [20]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_6.html', width=980, height=1000)

<center><b>Fig 24: Plot showing the Phasewise batting Strike Rates while batting first</b></center>

It is observed that the top five teams batting first have had a higher middle-order batting strike rate.

For some context, Fours and Sixes are considered as <b>boundaries</b>.
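The share of runs scored in boundaries follows directly from the counts of fours and sixes. A minimal sketch with a hypothetical function name:

```python
def boundary_run_percentage(fours, sixes, total_runs):
    """Percentage of total runs that came from fours and sixes."""
    if total_runs == 0:
        return 0.0
    boundary_runs = 4 * fours + 6 * sixes
    return 100 * boundary_runs / total_runs
```

For instance, 20 fours and 5 sixes account for 110 runs, so a total of 250 would be evaluated as `boundary_run_percentage(20, 5, 250)`.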

In [21]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_7.html', width=980, height=1000)

<center><b>Fig 25: Plot showing the Run %age scored in Boundaries</b></center>

It is observed that the top five teams (best win percentage batting first) have scored the highest %age of their runs in boundaries (more than the tournament average).

<img src = 'plots/analysis_plots/overall_first_batting_heatmap_1.png' />

<center><b>Fig 26: Correlation of Team statistics (Batting 1st)</b></center>

The above heatmap summarizes the correlation among all the factors (while Batting 1st) from <b>Figures 19 to 25</b> and the win percentage of teams. Below are some key findings:

<ul>
    <li>
        Teams who have won more matches are observed to have a higher Middle Order Strike Rate and PP-3 Run Rate.
    </li>
    <li>
        Teams who have lost more wickets in PP-1 are observed to have a lower scoring rate in PP-2.
    </li>
     <li>
        Teams who have lost more wickets in PP-2 are observed to have a lower scoring rate in PP-2 and PP-3.
    </li>
    <li>
        There is a significant positive correlation between Top Order Contribution and PP-2 rate indicating that teams whose top order has had higher contributions have achieved a higher rate of scoring in PP-2.
    </li>
      <li>
        Teams whose top order has contributed the most towards the final score have scored runs at a higher strike rate.
    </li>
    <li> Higher strike rate by middle order batsmen increases the scoring rate in PP-2 and PP-3.
    </li>
    <li>Teams who have lost fewer wickets in PP-2 have had a higher middle order strike rate.
    </li>
</ul>
Now let's look at the bowling stats for first innings.

For some context, the <b>economy rate</b> of bowlers is the total runs conceded divided by the total overs bowled.
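Because an over can be incomplete, it is often simpler to compute economy from balls bowled. A minimal sketch with a hypothetical function name:

```python
def economy_rate(runs_conceded, balls_bowled):
    """Runs conceded per six-ball over."""
    if balls_bowled <= 0:
        raise ValueError("at least one ball must have been bowled")
    return 6 * runs_conceded / balls_bowled
```

Ten overs are 60 balls, so 48 runs conceded in ten overs would be `economy_rate(48, 60)`.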

From <b>Figure 13</b>, it can be seen that the top five teams that performed well bowling first (batting second) are India, Afghanistan, Australia, Pakistan and New Zealand.

In [22]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_8.html', width=980, height=1000)

<center><b>Fig 27: Plot showing the Average Economy rate of all Teams Bowling 1st</b></center>

From the above plot, it is clearly visible that four of the five teams that have performed well bowling first (all except Pakistan) have conceded runs at a lower economy rate than the tournament average.

In [23]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_9.html', width=980, height=1000)

<center><b>Fig 28: Plot showing the Average Economy Rate of all Teams in different Powerplay Phases (Bowling 1st)</b></center>

The following can be inferred from the plot:
<ul>
    <li>
        India has conceded runs at the lowest economy rate of 4.74 in PP-1.
    </li>
    <li>
        Afghanistan have the lowest economy rate (4.35) during the middle overs in PP-2.
    </li>
    <li>
        In the death overs (PP-3), all the teams have conceded more than 6 runs per over apart from India, who have conceded runs at 4.88.
    </li>
    <li>
        Bangladesh have conceded runs at 5.08 in PP-1 (second-lowest after India). However, they have conceded runs at an economy of 8.6 during PP-3 (third most among all teams).
    </li>
    <li>
        In spite of conceding the second-highest economy rate (6.64) during PP-1, New Zealand have managed to reduce the run flow, conceding runs at 5.11.
    </li>
</ul>

In [24]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_11.html', width=980, height=1000)

<center><b>Fig 29: Plot showing the average wickets taken per match (Bowling 1st)</b></center>

From this graph, it can be inferred that:
<ul>
    <li>
        India and England have managed to take over 9 wickets per match bowling first. However, England have conceded runs at a much higher rate than India (<b>Figure 27</b>).
    </li>
    <li>
        Australia and Netherlands have taken the fewest wickets bowling first. However, Australia has conceded runs at a lower rate than Netherlands (<b>Figure 27</b>).
    </li>
</ul>

In [25]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_10.html', width=980, height=1000)

<center><b>Fig 30: Plot showing the Average Wickets taken per Match during the different Powerplay Phases (Bowling 1st)</b></center>

From the above plot, the following can be concluded:
<ul>
    <li>
        Among the best performing teams bowling 1st, New Zealand and South Africa have managed to take early wickets, averaging nearly 2 wickets in PP-1.
    </li>
    <li>
        India, Afghanistan and Australia have taken fewer wickets in PP-1. However, they have taken the highest average number of wickets during the middle overs (PP-2), which explains why they conceded runs at a lower economy during PP-2 (<b>Figure 28</b>).
    </li>
    <li>
        Pakistan have managed to take the highest number of wickets in PP-3. However, they have taken the least number of wickets during the first two powerplays which explains their high economy rates during these phases (<b>Figure 28</b>).
    </li>
</ul>

In [26]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_12.html', width=980, height=1000)

<center><b>Fig 31: Plot showing the Run %age conceded in Boundaries (Bowling 1st)</b></center>

From the above plot, India, Australia and Afghanistan have conceded the lowest percentage of runs as boundaries while bowling first and also have the lowest economy rates among all the teams <b>(Figure 27)</b>.

For some context, <b>Maiden</b> overs are those where bowlers do not concede a single run in the entire over.

In [27]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_13.html', width=980, height=1000)

<center><b>Fig 32: Plot showing the total Maiden Over count of different Teams (Bowling 1st)</b></center>

From this plot, we can see that New Zealand have bowled the highest number of maiden overs. However, they have conceded the third highest percentage of runs in boundaries (<b>Figure 31</b>), which could have contributed to their high economy rate.

For some context, <b>Extras</b> are extra runs conceded by teams while bowling. The different types are Wides, No Balls, Byes and Leg Byes. For our analysis, we have considered only <b>Wides</b> and <b>No Balls</b>.
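Restricting the count to wides and no balls can be sketched as below. The dictionary keys and the function name are hypothetical, and the sample numbers are made up.

```python
# Only these extras types are counted in the analysis (keys are illustrative).
COUNTED_EXTRAS = ("wides", "no_balls")


def counted_extras(extras):
    """Sum only wides and no balls from a dict of extras by type."""
    return sum(extras.get(kind, 0) for kind in COUNTED_EXTRAS)


# Made-up innings extras for illustration only.
sample = {"wides": 7, "no_balls": 2, "byes": 4, "leg_byes": 3}
total = counted_extras(sample)  # byes and leg byes are ignored
```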

In [28]:
IFrame(src='plots/analysis_plots/first_innings_detailed_stats_14.html', width=980, height=1000)

<center><b>Fig 33: Plot showing the Total extras conceded in Group Stages</b></center>

Among the teams that have performed well bowling first, four out of five (barring South Africa) have conceded fewer extras than the tournament average.

<img src = 'plots/analysis_plots/overall_first_bowling_heatmap_1.png' />

<center><b>Fig 34: Correlation of Team statistics (Bowling 1st)</b></center>

The above heatmap summarizes the correlation among all the factors (while Bowling 1st) from <b>Figures 27 to 33</b> and the win percentage of teams. Below are some key findings:

<ul>
    <li>
        Teams who have conceded the fewest runs in PP-3 (i.e., a lower PP-3 economy rate) have won more matches bowling 1st.
    </li>
    <li>
        Teams who have taken more wickets in the second powerplay have restricted the run scoring rate in middle overs (PP-2).
    </li>
</ul>
Now let's look at the batting stats for second innings.

#### Second Innings In-depth analysis

From <b>Figure 13</b>, it can be seen that the top five teams that performed well bowling first (batting second) are India, Pakistan, Afghanistan, Australia and New Zealand.

In [29]:
IFrame(src='plots/analysis_plots/second_innings_detailed_stats_1.html', width=980, height=1000)

<center><b>Fig 35: Plot showing the Average Wickets lost while Batting 2nd</b></center>

From the bar plot, it is evident that the top five chasing teams (teams batting 2nd) have lost the fewest wickets compared to the rest of the teams.

In [30]:
IFrame(src='plots/analysis_plots/second_innings_detailed_stats_2.html', width=980, height=1000)

<center><b>Fig 36: Plot containing the percentage of Runs Scored while Batting Second in Different phases</b></center>

The following can be derived from this plot:
<ul>
    <li>It is observed that the top chasing teams (barring Pakistan) have scored a higher percentage of their target runs during the first PP.</li>
    <li>These teams have also scored the highest percentage of their runs in the second PP.</li>
</ul>
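The phase-wise percentages discussed above can be sketched as follows. Using the chase target as the denominator is an assumption on our part (the plot may instead use total runs scored), and the function name and numbers are made up.

```python
def target_share_by_phase(phase_runs, target):
    """Return each phase's runs as a percentage of the chase target."""
    if target <= 0:
        raise ValueError("target must be positive")
    return {phase: 100 * runs / target for phase, runs in phase_runs.items()}


# Made-up chase for illustration only: target of 250 runs.
chase = {"PP-1": 60, "PP-2": 150, "PP-3": 40}
chase_shares = target_share_by_phase(chase, target=250)
```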

In [31]:
IFrame(src='plots/analysis_plots/second_innings_detailed_stats_3.html', width=980, height=1000)

<center><b>Fig 37: Plots containing phasewise wickets lost during Run Chases</b></center>

The following can be concluded from the above plot:
<ul>
    <li>
        In the first PP, the top chasing teams (besides Australia) have lost approximately 1 wicket per match while chasing.
    </li>
    <li>
        India lost the lowest number of wickets in PP-2.
    </li>
    <li>
        Despite losing fewer than 2 wickets on average in PP-1, Netherlands and South Africa have lost over 5 wickets in the second powerplay.
    </li>
    <li>
        England lost the joint-most wickets in PP-1 and the joint second-most wickets in PP-2 (which could have contributed to them losing all their matches batting 2nd as per <b>Figure 13</b>).
    </li>
</ul>

In [32]:
IFrame(src='plots/analysis_plots/second_innings_detailed_stats_4.html', width=980, height=1000)

<center><b>Fig 38: Plots containing Percentage-Wise distribution of Runs Scored while chasing</b></center>

From the bar plot, the following can be inferred:
<ul>
    <li>
        For top five teams successfully chasing targets, it was observed that they had the highest top order contribution.
    </li>
    <li>
        The higher percentage of runs scored by the top order for the successful teams could be attributed to fewer wickets lost (<b>Figure 35</b>).
    </li>
    <li>
         A higher percentage contribution of runs is observed in the lower order for the teams who have underperformed while chasing as their top and middle order haven't scored enough runs and have lost wickets (<b>Figure 35</b>).
    </li>
</ul>

<img src = 'plots/analysis_plots/overall_second_batting_heatmap_1.png' />

<center><b>Fig 39: Correlation of Team statistics (Batting 2nd)</b></center>

The above heatmap summarizes the correlation among all the factors (while Batting 2nd) from <b>Figures 35 to 38</b> and the win percentage of teams. Below are some key findings:

<ul>
    <li>
        Teams whose top order has performed well have won matches while batting 2nd.
    </li>
    <li>
        Similarly, it is observed that the teams who have achieved the highest percentage of the target during the first two powerplays and lost fewer wickets during PP-2 have won more matches.
    </li>
    <li>
        Teams which have scored a high percentage of target runs during the first powerplay tend to achieve a higher percentage of the remaining runs during PP-2.
    </li>
    <li>
        Teams whose top order scores the most runs are observed to have lost fewer wickets during PP-1.
    </li>
</ul>
Now let's look at the bowling stats for second innings.

From <b>Figure 12</b>, it can be seen that the top five teams that performed well bowling second (batting first) are India, England, South Africa, Australia and New Zealand.

In [33]:
IFrame(src='plots/analysis_plots/second_innings_detailed_stats_5.html', width=980, height=1000)

<center><b>Fig 40: Plot showing economy rate conceded while defending totals (Bowling 2nd)</b></center>

All of the top five teams that have successfully defended totals (barring South Africa) have conceded fewer than 5.5 runs per over.

In [34]:
IFrame(src='plots/analysis_plots/second_innings_detailed_stats_6.html', width=980, height=1000)

<center><b>Fig 41: Plot showing the Wickets taken by all Teams per Match in different phases</b></center>

It is clearly visible that the top teams who have successfully defended totals have taken wickets in all three powerplays (i.e., some teams have taken more wickets in the first two powerplays, while others have taken more in the last two).

<img src = 'plots/analysis_plots/overall_second_bowling_heatmap_1.png' />

<center><b>Fig 42: Correlation of Team statistics (Bowling 2nd)</b></center>

The above heatmap summarizes the correlation among all the factors (while Bowling 2nd) from <b>Figures 40 and 41</b> and the win percentage of teams. Below are some key findings:

<ul>
    <li>
        Teams who have won bowling 2nd have taken wickets in all three powerplays.
    </li>
    <li>
        Teams who pick more wickets in PP-1 and PP-2 slow down the run scoring rate in PP-2.
    </li>
    <li>
        Teams who pick more wickets in PP-1 tend to take more wickets in PP-2.
    </li>
</ul>

Now let's analyse the toss details and venue trends.

#### Toss Analysis 

In [35]:
IFrame(src='plots/analysis_plots/toss_analysis_1.html', width=980, height=1000)

<center><b>Fig 43: Pie showing the Decisions after winning the toss</b></center>

It shows that captains winning the toss have chosen to bat or field an almost equal number of times.

In [36]:
IFrame(src='plots/analysis_plots/toss_analysis_2.html', width=980, height=1000)

<center><b>Fig 44: Plot showing Total Number of Tosses won by each Team</b></center>

<img src = 'plots/analysis_plots/toss_win_heatmap_1.png' />

<center><b>Fig 45: Correlation between Teams Winning the Toss and Match</b></center>

Interestingly, there is a slight negative correlation between winning the toss and winning the match. It suggests that teams who won the toss were, if anything, slightly more likely to lose the match.

#### Venue Trends

In [37]:
IFrame(src='plots/analysis_plots/venue_trends_1.html', width=980, height=1000)

<center><b>Fig 46: Plot showing the number of times each team played at a particular venue</b></center>

This plot shows that every team apart from India has played at least twice at a single venue, whereas India have played their games at 9 different venues.

In [38]:
IFrame(src='plots/analysis_plots/venue_trends_2.html', width=980, height=1000)

<center><b>Fig 47: Plots showing the number of times that Teams won Batting 1st vs 2nd</b></center>

From the above graph, the following conclusions can be drawn:
<ul>
    <li>
        The teams batting first in Wankhede Stadium and Eden Gardens have a significant advantage over teams batting second.
    </li>
    <li>
        Teams chasing targets in MA Chidambaram Stadium and Narendra Modi Stadium have won more matches as compared to the teams who have batted first.
    </li>
</ul>

In [39]:
IFrame(src='plots/analysis_plots/venue_trends_3.html', width=980, height=1000)

<center><b>Fig 48: Plots showing the Average Runs Scored per Wicket at Different Venues</b></center>

It is observed that Wankhede Stadium has the highest difference in runs scored per wicket between batting first and second (consistent with the high number of wins for teams batting first in <b>Figure 47</b>). On the other hand, Narendra Modi Stadium has higher runs scored per wicket lost in the second innings than in the first innings (consistent with the high number of wins for teams batting second in <b>Figure 47</b>).

In [40]:
IFrame(src='plots/analysis_plots/venue_trends_4.html', width=980, height=1000)

<center><b>Fig 49: Plots showing the Average Run Rate in 1st and 2nd innings at different Venues</b></center>

From the above plot, it is observed that run scoring drops in the second innings as compared to first innings in Wankhede. On the other hand, a higher run rate is observed in the 2nd Batting innings at Narendra Modi Stadium.

Now let's look at some Individual Performances.

#### Individual Performance Stats

In [41]:
IFrame(src='plots/analysis_plots/scatter_1.html', width=980, height=1000)

<center><b>Fig 50: Scatter chart showing Runs Scored v/s strike Rate</b></center>

It shows that teams who have performed well have had batsmen scoring more runs at a higher strike rate.

In [42]:
IFrame(src='plots/analysis_plots/scatter_2.html', width=980, height=1000)

<center><b>Fig 51: Scatter chart showing Runs Scored v/s Batting Average</b></center>

From the above figure, there seems to be a positive correlation between runs scored and batting average.

In [43]:
IFrame(src='plots/analysis_plots/scatter_3.html', width=980, height=1000)

<center><b>Fig 52: Scatter chart showing Wickets Taken v/s Bowling Strike Rate</b></center>

Similarly, it can be seen that the players from the top-performing teams have taken the most wickets at a good strike rate. There seems to be a negative correlation between these two variables.

In [44]:
IFrame(src='plots/analysis_plots/scatter_4.html', width=980, height=1000)

<center><b>Fig 53: Scatter chart showing Wickets Taken v/s Economy Rate</b></center>

From the above graph, it can be observed that some bowlers have managed to take wickets at a lower economy rate. However, many bowlers have taken wickets while conceding runs at a higher economy rate.

In [45]:
IFrame(src='plots/analysis_plots/scatter_5.html', width=980, height=1000)

<center><b>Fig 54: Scatter chart showing Wickets Taken v/s Bowling Average</b></center>

From the above figure, there seems to be a negative correlation between wickets taken and bowling average.