<a href="https://colab.research.google.com/github/dtawneyd/nfl_mvps/blob/main/nfl_mvps.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**NFL MVP/Super Bowl MVP Percentage**

Import packages.

In [81]:
from bs4 import BeautifulSoup as Soup
from pandas import DataFrame
import pandas as pd
import requests

Scrape historical data for NFL MVPs and Super Bowl MVPs.

In [82]:
league = requests.get('https://eatdrinkandsleepfootball.com/history/awards/mvp.html')
league_soup = Soup(league.text, features='lxml')
league_table = league_soup.find_all('table')  # finding all tags labeled table for choosing table in next line
league_table = league_table[0]  # choosing the first table on website
league_rows = league_table.find_all('tr')  # finding all tags labeled tr to find table rows

superbowl = requests.get('https://www.foxsports.com/nfl-super-bowl-mvps')
superbowl_soup = Soup(superbowl.text, features='lxml')
superbowl_table = superbowl_soup.find_all('table')
superbowl_table = superbowl_table[0]
superbowl_rows = superbowl_table.find_all('tr')

Define a function, taken from Learn to Code with Fantasy Football, that takes a BeautifulSoup table row and returns its cells as Python strings.

In [83]:
def parse_row(row):
    return [str(x.string) for x in row.find_all('td')]
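To see what parse_row returns, here is a minimal sketch on a small inline HTML table. The table contents and the parse_row_text variant are illustrative assumptions, not part of the original notebook, and the built-in 'html.parser' is used so the snippet does not depend on lxml. It shows one edge case: a cell with mixed content has no single .string, so str(x.string) yields the text 'None'.

```python
from bs4 import BeautifulSoup as Soup

# Illustrative HTML only; the real pages may be structured differently.
html = """
<table>
  <tr><th>Year</th><th>Player</th></tr>
  <tr><td>1967</td><td>Bart Starr</td></tr>
  <tr><td>1968</td><td>Bart Starr <b>QB</b></td></tr>
</table>
"""
rows = Soup(html, features='html.parser').find_all('tr')


def parse_row(row):
    # Same function as in the notebook.
    return [str(x.string) for x in row.find_all('td')]


def parse_row_text(row):
    # Hypothetical variant: get_text() joins all text inside the cell.
    return [x.get_text(' ', strip=True) for x in row.find_all('td')]


parsed = [parse_row(r) for r in rows]
parsed_text = [parse_row_text(r) for r in rows]

print(parsed)       # header row has no td cells, so it parses to an empty list
print(parsed_text)
```

If the scraped tables contain cells with nested tags, the get_text variant may be the safer choice; for plain-text cells both functions give the same result.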

Use the parse_row function and the DataFrame constructor to convert the parsed rows to pandas DataFrames.

In [84]:
list_parsed_superbowl = [parse_row(row) for row in superbowl_rows]
superbowl_mvp = DataFrame(list_parsed_superbowl)

list_parsed_mvp = [parse_row(row) for row in league_rows[1:]]
league_mvp = DataFrame(list_parsed_mvp)

The following code cleans the Super Bowl data to prepare it for merging.

In [85]:
superbowl_mvp.columns = ['superbowl', 'year', 'superbowl_mvp', 'position', 'winning_team']  # naming the columns
superbowl_mvp = superbowl_mvp[1:].copy()  # slicing to remove first row of irrelevant data; copy() avoids SettingWithCopyWarning below
superbowl_mvp.set_index('superbowl', inplace=True)  # setting index to Super Bowl number
superbowl_mvp.drop('position', axis=1, inplace=True)  # dropping the position column
# keeping only the team nickname (e.g. "Packers" from "Green Bay Packers")
# to match the team name format in the league MVP table
superbowl_mvp['winning_team'] = superbowl_mvp['winning_team'].apply(lambda x: x.split(' ')[-1])
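As a small illustration of the nickname step, the sketch below applies the same last-word split to a few sample team names and compares it with the vectorized .str accessor. The sample names are assumptions chosen for illustration, not scraped data.

```python
import pandas as pd

# Sample values only; the scraped column may contain other names.
teams = pd.Series(['Green Bay Packers', 'New York Jets', 'Kansas City Chiefs'])

# Same approach as the notebook: keep the last space-separated word.
nicknames = teams.apply(lambda x: x.split(' ')[-1])

# Equivalent vectorized form using the .str accessor.
nicknames_vectorized = teams.str.split(' ').str[-1]

print(nicknames.tolist())
```

Note that a nickname made of more than one word would be cut down to its last word by either form.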

The following code does the same cleaning as before, but for the league MVP data.

In [86]:
league_mvp.columns = ['year', 'league_mvp']  # setting the name of the columns
league_mvp.sort_values('year', ascending=True, inplace=True)  # sorting the table by year
league_mvp = league_mvp[9:].copy()  # this slice eliminates the league MVPs that were awarded before the first Super Bowl; copy() avoids SettingWithCopyWarning
league_mvp['mvp_team'] = league_mvp['league_mvp'].apply(lambda x: x.split(', ')[-1])  # text after the last comma is the team
league_mvp['league_mvp'] = league_mvp['league_mvp'].apply(lambda x: x.split(', ')[0])  # text before the first comma is the player
league_mvp['league_mvp'] = league_mvp['league_mvp'].str.replace('NFL: ', '', regex=False)  # removing the league prefix
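The string handling above can be checked on a toy frame. This sketch assumes the scraped cells look like 'NFL: Player, Team', which is inferred from the cleaning code rather than confirmed against the page; the two rows are illustrative.

```python
import pandas as pd

# Toy data in the assumed 'NFL: Player, Team' format.
toy = pd.DataFrame({
    'year': ['1966', '1967'],
    'league_mvp': ['NFL: Bart Starr, Packers', 'NFL: Johnny Unitas, Colts'],
})

# Team first, while the full string is still available.
toy['mvp_team'] = toy['league_mvp'].apply(lambda x: x.split(', ')[-1])
# Then keep only the part before the comma and strip the league prefix.
toy['league_mvp'] = toy['league_mvp'].apply(lambda x: x.split(', ')[0])
toy['league_mvp'] = toy['league_mvp'].str.replace('NFL: ', '', regex=False)

print(toy)
```

The order matters: mvp_team has to be extracted before league_mvp is overwritten, otherwise the team name is already gone.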

Merge the league MVP and Super Bowl data frames.

In [87]:
df = pd.merge(league_mvp, superbowl_mvp, on='year')
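Because pd.merge defaults to an inner join, any year present in only one table is silently dropped. A minimal sketch of how one might check for that, using made-up frames (the names league, superbowl, and the placeholder player values are assumptions for illustration):

```python
import pandas as pd

# Made-up frames with partially overlapping years.
league = pd.DataFrame({'year': ['1966', '1967', '1968'],
                       'league_mvp': ['A', 'B', 'C']})
superbowl = pd.DataFrame({'year': ['1967', '1968', '1969'],
                          'superbowl_mvp': ['B', 'X', 'Y']})

# Default inner join: only years found in both frames survive.
inner = pd.merge(league, superbowl, on='year')

# Outer join with indicator=True adds a '_merge' column that
# records where each row came from.
check = pd.merge(league, superbowl, on='year', how='outer', indicator=True)
unmatched = check[check['_merge'] != 'both']

print(len(inner))
print(unmatched[['year', '_merge']])
```

Running the same check on league_mvp and superbowl_mvp would show whether any seasons fall out of the merged frame.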

Add a boolean column to the merged data frame that states whether the same player won both awards for a year or not.

In [88]:
df['league_and_superbowl'] = df['league_mvp'] == df['superbowl_mvp']
same_player = df['league_and_superbowl']  # the column is already boolean, so comparing it to True is unnecessary

Compute the percentage of years in which the same player won both awards and print the result.

In [89]:
average = round((same_player.mean() * 100), 1)
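The calculation relies on the fact that the mean of a boolean Series is the fraction of True values, since True counts as 1 and False as 0. A tiny sketch with made-up flags (not the real results):

```python
import pandas as pd

# Made-up flags: one match in four years.
flags = pd.Series([True, False, False, False])

# mean() gives the fraction of True values; scale to a percentage.
share = round(flags.mean() * 100, 1)

print(share)  # 25.0
```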

In [90]:
print(f'Since Super Bowl I (1967), the NFL MVP has also won the Super Bowl MVP {average}% of the time!')

Since Super Bowl I (1967), the NFL MVP has also won the Super Bowl MVP 7.1% of the time!
