# Using BeautifulSoup

## Let's get the data about Kings XI Punjab's performance in the IPL

We need to import the following libraries.

In [1]:
```python
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup

%matplotlib inline
```

Define the URL and season-label templates. `{:02d}` formats an integer as two digits, zero-padded on the left, so `8` becomes `08`.

In [2]:
```python
URL = 'https://en.wikipedia.org/wiki/20{:02d}_Indian_Premier_League'
SEASON = 'IPL-{:02d}'
```
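Both templates take an integer, so one loop counter can fill them. A quick check of what the formatting produces, using the two templates defined above:

```python
URL = 'https://en.wikipedia.org/wiki/20{:02d}_Indian_Premier_League'
SEASON = 'IPL-{:02d}'

# {:02d} renders an integer with at least two digits, zero-padded on the left
print(URL.format(8))      # https://en.wikipedia.org/wiki/2008_Indian_Premier_League
print(SEASON.format(1))   # IPL-01
print(SEASON.format(12))  # IPL-12
```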

### Generating URLs
We will scrape the data from these Wikipedia pages, one per season:

- IPL-01: https://en.wikipedia.org/wiki/2008_Indian_Premier_League
- IPL-02: https://en.wikipedia.org/wiki/2009_Indian_Premier_League
- IPL-03: https://en.wikipedia.org/wiki/2010_Indian_Premier_League
- ...
- IPL-11: https://en.wikipedia.org/wiki/2018_Indian_Premier_League
- IPL-12: https://en.wikipedia.org/wiki/2019_Indian_Premier_League

**How will you generate these URLs?**

In [None]:
```python
# Generate the Wikipedia URL for every season from 2008 (IPL-01) to 2019 (IPL-12)
urls = [URL.format(i) for i in range(8, 20)]
```

In [4]:
```python
urls
```

['https://en.wikipedia.org/wiki/2008_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2009_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2010_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2011_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2012_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2013_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2014_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2015_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2016_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2017_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2018_Indian_Premier_League',
 'https://en.wikipedia.org/wiki/2019_Indian_Premier_League']
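The season listing earlier pairs each label with its URL. Because both templates are driven by the same counter (season number = `i - 7`), a dictionary comprehension can build those pairs in one step. This is a sketch; `season_urls` is a name introduced here for illustration only.

```python
URL = 'https://en.wikipedia.org/wiki/20{:02d}_Indian_Premier_League'
SEASON = 'IPL-{:02d}'

# i = 8 -> year 2008 -> IPL-01, ..., i = 19 -> year 2019 -> IPL-12
season_urls = {SEASON.format(i - 7): URL.format(i) for i in range(8, 20)}

for season, url in season_urls.items():
    print(season, url)
```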

Now that we have the URLs, we can scrape each season's points table into a DataFrame.

Use the code below to create our output DataFrame. It will look like this:

![](ipl.png)

In [12]:
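```python
rows = []  # collect rows in a list: DataFrame.append was removed in pandas 2.0

for i in range(8, 20):
    url = URL.format(i)
    season = SEASON.format(i - 7)  # 2008 -> IPL-01, ..., 2019 -> IPL-12

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')

    for tr in soup.find_all('tr'):
        # Keep only table rows that contain at least one link (skips header rows)
        if not tr.find_all('a'):
            continue
        tds = tr.find_all('td')
        try:
            name = tds[0].text.strip()
            played = int(tds[1].text)
            wins = int(tds[2].text)
        except (IndexError, ValueError):
            # Too few cells or non-numeric values: not a points-table row
            continue

        # Short labels are not team names
        if len(name) < 10:
            continue
        # Strip a trailing marker such as "(C)" from the team name
        name = name.split('(')[0].strip()

        row = [name, season, played, wins]
        rows.append(row)
        print(row)

# Column names match the ones used in the win % calculation below
df = pd.DataFrame(rows, columns=['Team', 'Season', 'Played', 'Wins'])
```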

['Rajasthan Royals', 'IPL-01', 14, 11]
['Kings XI Punjab', 'IPL-01', 14, 10]
['Chennai Super Kings', 'IPL-01', 14, 8]
['Delhi Daredevils', 'IPL-01', 14, 7]
['Mumbai Indians', 'IPL-01', 14, 7]
['Kolkata Knight Riders', 'IPL-01', 14, 6]
['Royal Challengers Bangalore', 'IPL-01', 14, 4]
['Deccan Chargers', 'IPL-01', 14, 2]
['Delhi Daredevils', 'IPL-02', 14, 10]
['Chennai Super Kings', 'IPL-02', 14, 8]
['Royal Challengers Bangalore', 'IPL-02', 14, 8]
['Deccan Chargers', 'IPL-02', 14, 7]
['Kings XI Punjab', 'IPL-02', 14, 7]
['Rajasthan Royals', 'IPL-02', 14, 6]
['Mumbai Indians', 'IPL-02', 14, 5]
['Kolkata Knight Riders', 'IPL-02', 14, 3]
['Mumbai Indians', 'IPL-03', 14, 10]
['Deccan Chargers', 'IPL-03', 14, 8]
['Chennai Super Kings', 'IPL-03', 14, 7]
['Royal Challengers Bangalore', 'IPL-03', 14, 7]
['Delhi Daredevils', 'IPL-03', 14, 7]
['Kolkata Knight Riders', 'IPL-03', 14, 7]
['Rajasthan Royals', 'IPL-03', 14, 6]
['Kings XI Punjab', 'IPL-03', 14, 4]
['Delhi Daredevils', 'IPL-05', 16, 11]


In [13]:
```python
df
```

| Team | Season | Played | Wins |
|---|---|---|---|
| Rajasthan Royals | IPL-01 | 14 | 11 |
| Kings XI Punjab | IPL-01 | 14 | 10 |
| Chennai Super Kings | IPL-01 | 14 | 8 |
| ... | ... | ... | ... |
| Rajasthan Royals | IPL-12 | 14 | 5 |
| Royal Challengers Bangalore | IPL-12 | 14 | 5 |
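To see which rows the filter in the scraping loop keeps, it helps to run the same logic offline on a tiny table. The HTML below is a made-up, simplified stand-in for a Wikipedia points table (its layout is an assumption, not copied from the real pages), and `parse_points_table` is a hypothetical helper that wraps the loop body:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified points table; real Wikipedia markup has more columns
HTML = """
<table>
  <tr><th>Team</th><th>Pld</th><th>W</th></tr>
  <tr><td><a href="/wiki/RR">Rajasthan Royals (C)</a></td><td>14</td><td>11</td></tr>
  <tr><td><a href="/wiki/KXIP">Kings XI Punjab</a></td><td>14</td><td>10</td></tr>
  <tr><td><a href="/wiki/Ref">Source</a></td><td>n/a</td><td>n/a</td></tr>
  <tr><td><a href="/wiki/X">Final</a></td><td>1</td><td>1</td></tr>
</table>
"""


def parse_points_table(html, season):
    """Apply the same row filter as the scraping loop to an HTML string."""
    rows = []
    soup = BeautifulSoup(html, 'html.parser')
    for tr in soup.find_all('tr'):
        if not tr.find_all('a'):            # the header row has no links
            continue
        tds = tr.find_all('td')
        try:
            name = tds[0].text.strip()
            played = int(tds[1].text)
            wins = int(tds[2].text)
        except (IndexError, ValueError):    # int('n/a') raises ValueError
            continue
        if len(name) < 10:                  # 'Final' is too short to be a team
            continue
        name = name.split('(')[0].strip()   # 'Rajasthan Royals (C)' -> 'Rajasthan Royals'
        rows.append([name, season, played, wins])
    return rows


for row in parse_points_table(HTML, 'IPL-01'):
    print(row)
```

Only the two team rows survive: the header row has no link, the "Source" row fails the integer conversion, and "Final" is rejected by the length check.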


## Aggregating the Data

Create a mean DataFrame (the average win percentage across teams in each season) and a dictionary of team-wise DataFrames.

Calculate the win percentage as shown:

`df['Win %'] = df['Wins'] / df['Played'] * 100`
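The formula is applied element-wise, one value per row. A small worked example using three IPL-01 rows from the scraped output; the column names `Team` and `Season` are assumptions about how the DataFrame is labelled, while `Played` and `Wins` come from the formula itself:

```python
import pandas as pd

# Three IPL-01 rows taken from the scraped output above
df = pd.DataFrame(
    [['Rajasthan Royals', 'IPL-01', 14, 11],
     ['Kings XI Punjab', 'IPL-01', 14, 10],
     ['Deccan Chargers', 'IPL-01', 14, 2]],
    columns=['Team', 'Season', 'Played', 'Wins'],
)

# Win % = wins / matches played * 100, computed for every row at once
df['Win %'] = df['Wins'] / df['Played'] * 100
print(df.round(2))
```

For example, Rajasthan Royals won 11 of 14 matches in IPL-01, which gives about 78.57%.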

In [None]:
```python
seasons = None
teams = None

mean_df = None

team_dfs = {}
```
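One way to fill in the template, as a sketch: it assumes the scraped DataFrame has `Team`, `Season`, `Played` and `Wins` columns, and interprets the "mean DataFrame" as the average win % of the listed teams per season. The sample rows below are a subset of the scraped output, so the means reflect only those teams.

```python
import pandas as pd

# A subset of the scraped rows; 'Team' and 'Season' are assumed column names
df = pd.DataFrame(
    [['Rajasthan Royals', 'IPL-01', 14, 11],
     ['Kings XI Punjab', 'IPL-01', 14, 10],
     ['Deccan Chargers', 'IPL-01', 14, 2],
     ['Delhi Daredevils', 'IPL-02', 14, 10],
     ['Kings XI Punjab', 'IPL-02', 14, 7],
     ['Kolkata Knight Riders', 'IPL-02', 14, 3]],
    columns=['Team', 'Season', 'Played', 'Wins'],
)
df['Win %'] = df['Wins'] / df['Played'] * 100

# 'IPL-01' < 'IPL-02' < ... < 'IPL-12' because the numbers are zero-padded
seasons = sorted(df['Season'].unique())
teams = sorted(df['Team'].unique())

# Average win % of all listed teams, one row per season
mean_df = df.groupby('Season', as_index=False)['Win %'].mean()

# One DataFrame per team, ordered by season
team_dfs = {team: df[df['Team'] == team].sort_values('Season') for team in teams}

print(mean_df)
print(team_dfs['Kings XI Punjab'][['Season', 'Win %']])
```

The zero-padding in `SEASON` is what makes a plain string sort put the seasons in chronological order.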

## Plot the data

Use the code below to plot a chart like the one shown:

![](plot.png)

In [None]:
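```python
plt.figure(figsize=(12, 9))

# Placeholder: plot each team's win % per season (and the mean) here,
# passing label=... to each plot call so the legend can show it
plt.plot()

# Rotate the season labels on the x-axis so they don't overlap
plt.xticks(rotation=45)
plt.subplots_adjust(bottom=0.25)

plt.legend(loc=2)  # loc=2 is 'upper left'

ax = plt.gca()
ax.set_yticks(np.arange(0, 110, 10))  # ticks at 0, 10, ..., 100

ax.set_ylabel('Win Percentage')
ax.set_xlabel('Season')
ax.set_title('Indian Premier League');
```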
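The template leaves `plt.plot()` empty. A sketch of one way to fill it, assuming `team_dfs` maps team names to DataFrames with `Season` and `Win %` columns and `mean_df` holds the per-season mean (the sample data is a subset of the scraped rows). The mean is plotted first because matplotlib orders string categories on the x-axis by first appearance, and the mean covers every season in sorted order.

```python
import matplotlib
matplotlib.use('Agg')  # only needed when running as a script without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['Rajasthan Royals', 'IPL-01', 14, 11],
     ['Kings XI Punjab', 'IPL-01', 14, 10],
     ['Deccan Chargers', 'IPL-01', 14, 2],
     ['Delhi Daredevils', 'IPL-02', 14, 10],
     ['Kings XI Punjab', 'IPL-02', 14, 7],
     ['Kolkata Knight Riders', 'IPL-02', 14, 3]],
    columns=['Team', 'Season', 'Played', 'Wins'],
)
df['Win %'] = df['Wins'] / df['Played'] * 100
mean_df = df.groupby('Season', as_index=False)['Win %'].mean()
team_dfs = {team: tdf.sort_values('Season') for team, tdf in df.groupby('Team')}

plt.figure(figsize=(12, 9))

# Mean first, so the categorical x-axis follows the sorted season order
plt.plot(mean_df['Season'], mean_df['Win %'], 'k--', linewidth=2, label='Mean')

# One labelled line per team
for team, tdf in team_dfs.items():
    plt.plot(tdf['Season'], tdf['Win %'], marker='o', label=team)

plt.xticks(rotation=45)
plt.subplots_adjust(bottom=0.25)
plt.legend(loc=2)

ax = plt.gca()
ax.set_yticks(np.arange(0, 110, 10))
ax.set_ylabel('Win Percentage')
ax.set_xlabel('Season')
ax.set_title('Indian Premier League')
```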