# NBA Players Statistics Brochure

The goal of this project is to create a PDF brochure with career statistics and earnings for a given NBA player. 

The brochure contains the following information: 

- name,
- position,
- nationality,
- when he was drafted,
- total years in the NBA,
- logo of the current team (last team played for if retired),
- average points per game for each year,
- average assists per game for each year,
- average rebounds per game for each year,
- earnings per year.

The statistics are obtained from the Spotrac.com website. Only data from 2011 onward is available. The logos were downloaded from ESPN.com.


## Description

The project is done in two steps. First we download all the necessary data, and then create the brochure using that data.

### Collecting the data

The data is obtained from Spotrac.com using the **BeautifulSoup** Python library.

The following function is used to collect the data and store it in a CSV file.

Libraries used:

In [1]:
import pandas as pd
import urllib.request
from bs4 import BeautifulSoup

Before moving on to the function itself, we need to define one helper function that cleans up the data in case a player has more than one source of income for a given year.

In [2]:
def clean_salary_entries(salary):
    """ Sums up all the income because sometimes players have retained earnings."""
    sal = salary.replace(',', '').split('$')
    sal = [int(s) for s in sal if s != '']
    return '$' + f"{sum(sal):,}"
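
As a quick check of the helper, the sketch below repeats the function and runs it on two made-up salary strings. The amounts are invented for illustration and are not taken from Spotrac. The second string imitates a cell in which two amounts ended up concatenated.

```python
def clean_salary_entries(salary):
    """Sum every dollar amount found in one salary cell."""
    sal = salary.replace(',', '').split('$')
    sal = [int(s) for s in sal if s != '']
    return '$' + f"{sum(sal):,}"


# Invented amounts, for illustration only.
single = clean_salary_entries('$2,000,000')
combined = clean_salary_entries('$2,000,000$500,000')

print(single)    # $2,000,000
print(combined)  # $2,500,000
```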

The function accepts the player's name as a parameter and uses it to look up the player's page through Spotrac's internal search engine. If only one player with that name exists, the search takes us directly to his page. If there are several players with that name, we choose the first one, since that is usually the most famous player, and we assume he is the one we are looking for.

In [None]:
def download_stats(player):
    """ Collects stats and salary information for a player from Spotrac website as
    of 2024. Only stats after 2011 are available."""    
    # Step 1. Get players ID on Spotrac.
    player_name = player.lower().replace(' ', '+')

    page = urllib.request.urlopen(f"https://www.spotrac.com/search?q={player_name}")
    soup = BeautifulSoup(page, "html.parser")
    rows = soup.find_all("a", class_="list-group-item")

    players_page = ''
    if len(rows) > 0:
        # Use player_id rather than id to avoid shadowing the built-in id().
        player_id = rows[0]['href'].split('/')[-1]
        players_page = f"https://www.spotrac.com/nba/player/statistics/_/id/{player_id}"
    else:
        players_page = f"https://www.spotrac.com/search?q={player_name}"
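
The lookup logic can be isolated into two small helpers, shown below as a sketch. The names `build_search_url` and `player_id_from_href` are hypothetical and not part of the project, and the sample link is invented. The sketch also swaps in `urllib.parse.quote_plus` for the manual `replace(' ', '+')`, so that other special characters in a name are escaped as well.

```python
from urllib.parse import quote_plus

SEARCH_URL = "https://www.spotrac.com/search?q="


def build_search_url(player):
    """Return the Spotrac search URL for a player's name."""
    return SEARCH_URL + quote_plus(player.lower())


def player_id_from_href(href):
    """Return the last path segment of a search-result link."""
    return href.rstrip('/').split('/')[-1]


search_url = build_search_url('Chris Paul')
# Invented link, used only to show the splitting step.
player_id = player_id_from_href('https://example.com/a/b/0000/')

print(search_url)
print(player_id)
```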

First, we collect the player's earnings per year since 2011. Spotrac also has salary information from before 2011, but no playing statistics for those years. To keep the earnings consistent with the statistics collected next, we drop all earnings from before 2011.

In [None]:
    # Step 2. Get salaries.
    page = urllib.request.urlopen(players_page)

    soup = BeautifulSoup(page, "html.parser")
    rows = soup.find_all("tr", class_="")
    earnings = []
    header_met = False
    for row in rows:
        r = row.text.replace(' ', '').split()
        if r == ['Year', 'Team(s)', 'Age', 'Base', 'Signing', 'Incentives', 'CashTotal', 'CashCumulative']:
            header_met = True
        if r == ['Years', 'Team', 'Base', 'Signing', 'Incentives', 'CashCumulative']:
            header_met = False
        if header_met:
            earnings.append(r)

    columns = ['Year', 'Age', 'BaseSalary', 'CashTotal', 'CashCumulative']
    new_columns = ['Year', 'Age', 'BaseSalary']
    salaries = pd.DataFrame(earnings[1:], columns=columns)
    salaries = salaries.drop(salaries[salaries['Year'].astype(int) < 2011].index)
    salaries = salaries[new_columns]
    salaries = salaries.reset_index(drop=True)

    salaries['BaseSalary'] = salaries['BaseSalary'].apply(clean_salary_entries)
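
The header-flag technique used in this step can be tried without any network access. The sketch below works on rows that are already split into fields. The row contents and the helper name `rows_between_headers` are invented for illustration, and the two headers are shortened stand-ins for the real Spotrac headers.

```python
def rows_between_headers(rows, start_header, stop_header):
    """Collect rows from start_header (inclusive) up to stop_header (exclusive)."""
    collected = []
    header_met = False
    for r in rows:
        if r == start_header:
            header_met = True
        if r == stop_header:
            header_met = False
        if header_met:
            collected.append(r)
    return collected


# Invented rows standing in for the text of each table row on the page.
rows = [
    ['Unrelated', 'row'],
    ['Year', 'Age', 'Base'],          # header of the table we want
    ['2010', '25', '$1,000,000'],
    ['2011', '26', '$2,000,000'],
    ['Years', 'Team'],                # header of the next table
    ['3', 'XYZ'],
]

table = rows_between_headers(rows, ['Year', 'Age', 'Base'], ['Years', 'Team'])
header, data = table[0], table[1:]
# Keep only seasons from 2011 on, mirroring the filter in the function.
data = [r for r in data if int(r[0]) >= 2011]

print(header)
print(data)
```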

Next, we collect the playing statistics: average points, assists and rebounds per game for each year. Once we have them, we combine this information with the earnings and save it to a CSV file. The pandas library is used for all of this.

We also store the abbreviation of the player's current team and whether he is an active player in separate variables. We will need them later.


In [None]:
    # Step 3. Get stats.
    rows = soup.find_all("tr", class_="")
    statistics = []
    header_met = False
    for row in rows:
        r = row.text.replace(' ', '').split()
        if r == ['Year', 'Team', 'GP', 'GS', 'Min/Gm', 'Pt/Gm', 'Reb/Gm', 'Ast/Gm', 'Stl/Gm', 'Blk/Gm', 'FG%', '3PT%', 'FT%']:
            header_met = True
        if header_met: # Spotrac doesn't provide 'games started' info for 2011 and 2012.
            if r[0] in ['2011', '2012']:
                r.insert(3, 'n/a')
            statistics.append(r)

    statistics = pd.DataFrame(statistics[1:], columns=statistics[0])

    # The first row is the most recent season.
    current_team_abrv = statistics['Team'].iloc[0]
    is_active = statistics['Year'].iloc[0] == '2024'

    statistics = statistics.drop(statistics[statistics['Year'].astype(int) == 2024].index)
    statistics = statistics.iloc[::-1].reset_index(drop=True)

    # Join stats and salaries into one table.
    statistics['Age'] = salaries['Age'].values
    statistics['BaseSalary'] = salaries['BaseSalary'].values
    final_columns = ['Year', 'Age', 'Team', 'GP', 'GS', 'Min/Gm', 'Pt/Gm', 'Reb/Gm', 'Ast/Gm', 'Stl/Gm', 'Blk/Gm', 'FG%', '3PT%', 'FT%', 'BaseSalary']
    statistics = statistics[final_columns]

    # Save stats to file.
    csv_file_name = f"{player.replace(' ', '_')}.csv"
    statistics.to_csv(f"out_data/{csv_file_name}")
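
The join attaches the salary columns by position, so it relies on both tables having the same number of rows in the same order. The statistics rows are reversed first, which suggests that the two tables arrive in opposite order. A minimal sketch with invented numbers (not real player data, and with the salary ordering stated as an assumption) shows the alignment:

```python
import pandas as pd

# Invented values. Statistics arrive newest season first.
statistics = pd.DataFrame(
    [['2024', 'XYZ', '20.0'], ['2023', 'XYZ', '18.5'], ['2022', 'ABC', '15.1']],
    columns=['Year', 'Team', 'Pt/Gm'],
)
# Assumption: salaries arrive oldest season first.
salaries = pd.DataFrame(
    [['2022', '24', '$1,000,000'], ['2023', '25', '$2,000,000']],
    columns=['Year', 'Age', 'BaseSalary'],
)

# Drop the 2024 season, then flip to oldest-first order.
statistics = statistics[statistics['Year'].astype(int) != 2024]
statistics = statistics.iloc[::-1].reset_index(drop=True)

# Positional assignment: row i of salaries goes to row i of statistics.
statistics['Age'] = salaries['Age'].values
statistics['BaseSalary'] = salaries['BaseSalary'].values

print(statistics)
```

If the two tables differ in length, the positional assignment raises a `ValueError`, which is one way the limitations of this approach can surface.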

Finally, we collect some important bio information and store it in a .txt file.

In [None]:
    # Step 4. Get players info.
    rows = soup.find_all("div", class_="row m-0 mt-0 pb-3")
    information = []
    for row in rows:
        # Split the block's text into lines (the earlier split on '/n' was a no-op).
        r = row.text.split('\n')
        information.append(r)
    information = [info for info in information[0] if info != '']

    players_info = ['Team: ' + information[0], information[2], information[3], information[5], current_team_abrv, is_active]

    # Save to file.
    file_name = f"{player.replace(' ', '_')}.txt"
    with open(f"out_data/{file_name}", "w") as output:
        for row in players_info:
            output.write(str(row) + '\n')  
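
The resulting text file holds one value per line, with the team abbreviation and the active flag on the last two lines. A later step could read it back along the following lines. This is only a sketch: `read_player_info` is a hypothetical helper, not necessarily how create_pdf.py does it, and the sample values are invented.

```python
import os
import tempfile


def read_player_info(path):
    """Return the bio lines, the team abbreviation, and the active flag."""
    with open(path) as source:
        lines = [line.rstrip('\n') for line in source]
    *bio, team_abrv, is_active = lines
    # The flag was written with str(), so it is the text 'True' or 'False'.
    return bio, team_abrv, is_active == 'True'


# Write a file in the same layout, using invented placeholder values.
sample = ['Team: Example Team', 'bio line 2', 'bio line 3', 'bio line 4', 'XYZ', True]
path = os.path.join(tempfile.mkdtemp(), 'Example_Player.txt')
with open(path, 'w') as output:
    for row in sample:
        output.write(str(row) + '\n')

bio, team_abrv, is_active = read_player_info(path)
print(bio, team_abrv, is_active)
```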

All the information is saved to the out_data folder. The player's name, with spaces replaced by underscores, is used as the base filename for both the CSV and the .txt file.

### Creating the Brochure

The brochure is created in PDF format using the **fpdf** library.

All the statistical information is taken from the CSV file that was previously saved.

The code for this step consists mostly of text formatting, which makes it long and repetitive, so it is not reproduced here. It can be found in the create_pdf.py file in the project's root directory.

The resulting PDF file is also stored in the root directory.

Note: some temporary files are created while the program runs. They are deleted at the end.

### Top level code

The following top level code creates a PDF brochure for every player whose name is provided by the user.

The download_stats function is imported from download_stats.py. In the code above, this function is broken into pieces for explanation purposes, so those cells cannot be run directly from this notebook.

Note: The program has the following limitations:

1. If a player retired because of an injury before his contract expired (he was still being paid but did not play), the result might be unexpected. Example: Chris Bosh.
2. If a player has incentives in addition to his salary, the result might also be corrupted. Example: Doug McDermott.

In [None]:
from download_stats import download_stats
from create_pdf import create_pdf

players = ['Chris Paul', 'Russell Westbrook', 'Devin Booker', 'Kevin Love', 'Dwyane Wade']

for player in players:
    download_stats(player)
    create_pdf(player)
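
Given the limitations noted for this program, a single problematic player can interrupt the whole batch. One possible way to keep going is sketched below. `build_brochures` is a hypothetical wrapper that is not part of the project, and the two stand-in functions exist only so the sketch runs without network access.

```python
def build_brochures(players, download, create):
    """Run both steps for each player and return the names that failed."""
    failed = []
    for player in players:
        try:
            download(player)
            create(player)
        except Exception as error:  # broad on purpose: scraping can fail in many ways
            print(f"Skipping {player}: {error}")
            failed.append(player)
    return failed


# Stand-ins for download_stats and create_pdf.
def fake_download(player):
    if player == 'Unknown Player':
        raise ValueError('no data found')


def fake_create(player):
    pass


failed = build_brochures(['Chris Paul', 'Unknown Player'], fake_download, fake_create)
print(failed)
```

In the real script, `download_stats` and `create_pdf` would be passed in place of the stand-ins.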