# Basketball Playoffs Qualification Prediction

## Data Understanding - Information about the Dataset

[**awards_players.csv**](data/original/awards_players.csv) - a table with 95 rows that relates players to awards across 10 seasons.
| Column Name | Description                                  |
|-------------|----------------------------------------------|
| playerID    | A unique code assigned to each player.       |
| award       | The award that the player won.               |
| year        | The year that the player won the award.      |
| lgID        | The league that the player won the award in. |

[**coaches.csv**](data/original/coaches.csv) - a table with 163 rows that describes coaches who've managed teams across 10 seasons.
| Column Name | Description                                           |
|-------------|-------------------------------------------------------|
| coachID     | A unique code assigned to each coach.                 |
| year        | The year that the coach coached the team.             |
| tmID        | The team that the coach coached.                      |
| lgID        | The league that the coach coached in.                 |
| stint       | The order of the coach's stint within that season.    |
| won         | The number of games the coach won in regular season.  |
| lost        | The number of games the coach lost in regular season. |
| post_wins   | The number of games the coach won in playoffs.        |
| post_losses | The number of games the coach lost in playoffs.       |

[**players.csv**](data/original/players.csv) - a table with 894 rows that contains all the details about the players.
| Column Name  | Description                                  |
|--------------|----------------------------------------------|
| bioID        | A unique code assigned to each player.       |
| pos          | The position that the player plays.          |
| firstseason  | The year that the player started playing.    |
| lastseason   | The year that the player stopped playing.    |
| height       | The height of the player in inches.          | 
| weight       | The weight of the player in pounds.          |
| college      | The college that the player attended.        |
| collegeOther | The other colleges that the player attended. |
| birthDate    | The birth date of the player.                |
| deathDate    | The death date of the player.                |

[**players_teams.csv**](data/original/players_teams.csv) - a table with 1877 rows that describes the performance of each player for each team they played for.
| Column Name | Description                                           |
|-------------|-------------------------------------------------------|
| playerID    | A unique code assigned to each player.                |
| year        | The year that the player played for the team.         |
| stint       | The order of the player's stint within that season.   |
| tmID        | The team that the player played for.                  |
| lgID        | The league that the player played in.                 |
| GP          | The number of games the player played.                |
| GS          | The number of games the player started.               |
|...          | ...                                                   |

[**series_post.csv**](data/original/series_post.csv) - a table with 71 rows that describes the result of each playoff series.
| Column Name | Description                                 |
|-------------|---------------------------------------------|
| year        | The year that the series was played.        |
| round       | The round of the series.                    |
| series      | The series label.                           |
| tmIDWinner  | The team that won the series.               |
| lgIDWinner  | The league that the winning team played in. |
| tmIDLoser   | The team that lost the series.              |
| lgIDLoser   | The league that the losing team played in.  |
| W           | The number of games the winning team won.   |
| L           | The number of games the losing team won.    |

[**teams.csv**](data/original/teams.csv) - a table with 143 rows that describes the performance of teams for each season.
| Column Name | Description                                             |
|-------------|---------------------------------------------------------|
| year        | The year that the team played.                          |
| lgID        | The league that the team played in.                     |
| tmID        | The team code.                                          |
| franchID    | The franchise code.                                     |
| confID      | The conference code.                                    |
| divID       | The division code.                                      |
| rank        | The rank of the team in the season.                     |
| playoff     | Whether the team qualified for the playoffs or not.     |
| seeded      | Whether the team was seeded in the playoffs or not.     |
| firstRound  | The result of the team in the first round of playoffs.  |
| semis       | The result of the team in the semi-finals of playoffs.  |
| finals      | The result of the team in the finals of playoffs.       |
| name        | The name of the team.                                   |
| o_fgm       | The number of field goals made by the team.             |
| o_fga       | The number of field goals attempted by the team.        |
| o_ftm       | The number of free throws made by the team.             |
| o_fta       | The number of free throws attempted by the team.        |
| o_3pm       | The number of three-point field goals made by the team. |
| ...         | ...                                                     |

[**teams_post.csv**](data/original/teams_post.csv) - a table with 81 rows that describes each team's results in the post-season.
| Column Name | Description                                             |
|-------------|---------------------------------------------------------|
| year        | The year that the team played.                          |
| lgID        | The league that the team played in.                     |
| W           | The number of games the team won in the post-season.    |
| L           | The number of games the team lost in the post-season.   |

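The tables above share identifiers, so they can be combined once loaded. The following is a minimal sketch, assuming that `bioID` in players.csv holds the same codes as `playerID` in players_teams.csv. Both are described as unique player codes, but the link is an assumption rather than something the descriptions state. The tiny frames and their values are invented for illustration.

```python
import pandas as pd

# Hypothetical miniature versions of players.csv and players_teams.csv;
# the IDs and numbers below are invented, not taken from the dataset.
players = pd.DataFrame({
    'bioID': ['p1', 'p2'],
    'pos': ['G', 'C'],
    'height': [70.0, 76.0],
})
players_teams = pd.DataFrame({
    'playerID': ['p1', 'p1', 'p2'],
    'year': [1, 2, 1],
    'tmID': ['AAA', 'AAA', 'BBB'],
    'GP': [30, 28, 32],
})

# Assumption: playerID and bioID hold the same player codes.
# validate='many_to_one' raises an error if bioID is not unique in players.
player_seasons = players_teams.merge(
    players,
    left_on='playerID',
    right_on='bioID',
    how='left',
    validate='many_to_one',
)

print(player_seasons[['playerID', 'year', 'tmID', 'pos', 'height']])
```

A left join keeps every player-season row even when no biographical record matches, which makes unmatched IDs easy to spot as missing values afterwards.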

## Data Collection

```python
# Import libraries
import pandas as pd

awards_players = pd.read_csv('data/original/awards_players.csv')
coaches = pd.read_csv('data/original/coaches.csv')
players_teams = pd.read_csv('data/original/players_teams.csv')
players = pd.read_csv('data/original/players.csv')
series_post = pd.read_csv('data/original/series_post.csv')
teams_post = pd.read_csv('data/original/teams_post.csv')
teams = pd.read_csv('data/original/teams.csv')

dataframes = {
    'awards_players': awards_players,
    'coaches': coaches,
    'players_teams': players_teams,
    'players': players,
    'series_post': series_post,
    'teams_post': teams_post,
    'teams': teams
}
```

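The descriptions in the Data Understanding section quote a row count for each file, and those counts can be checked against the loaded dataframes. The following is a minimal sketch: `check_row_counts` is a hypothetical helper that is not part of the original notebook, and the `demo` dictionary stands in for the real `dataframes` dictionary.

```python
import pandas as pd

# Row counts quoted in the Data Understanding section
EXPECTED_ROWS = {
    'awards_players': 95,
    'coaches': 163,
    'players': 894,
    'players_teams': 1877,
    'series_post': 71,
    'teams': 143,
    'teams_post': 81,
}


def check_row_counts(dataframes, expected):
    """Return {name: (expected, actual)} for each dataframe whose row count differs."""
    mismatches = {}
    for name, df in dataframes.items():
        if name in expected and len(df) != expected[name]:
            mismatches[name] = (expected[name], len(df))
    return mismatches


# Stand-in data: the 95-row frame matches, the 10-row frame does not.
demo = {
    'awards_players': pd.DataFrame({'playerID': range(95)}),
    'coaches': pd.DataFrame({'coachID': range(10)}),
}
demo_mismatches = check_row_counts(demo, EXPECTED_ROWS)
print(demo_mismatches)
```

With the notebook's data loaded, `check_row_counts(dataframes, EXPECTED_ROWS)` should return an empty dictionary if the documented counts are accurate.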
## Data Cleaning

### Check for missing values

```python
# Check missing values

missing = False

for name, df in dataframes.items():
    # Keep only the columns that have at least one missing value
    missing_columns = df.isnull().sum()
    missing_columns = missing_columns[missing_columns > 0]
    if missing_columns.empty:
        continue

    missing = True
    print(f'{name} has missing values in the following columns:')
    for col, count in missing_columns.items():
        print(f'\t{col}: {count} missing values')

    print()

if not missing:
    print('No missing values found')
```

```text
players has missing values in the following columns:
	pos: 78 missing values
	college: 167 missing values
	collegeOther: 882 missing values

teams has missing values in the following columns:
	divID: 142 missing values
	firstRound: 62 missing values
	semis: 104 missing values
	finals: 122 missing values
```



Only the [players](data/original/players.csv) and [teams](data/original/teams.csv) datasets contain missing values.

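Raw counts are easier to judge relative to table size: `divID` is missing in 142 of the 143 rows of teams, and `collegeOther` in 882 of the 894 rows of players, so both columns are almost entirely empty. One possible way to flag such columns is sketched below. The helper `mostly_missing_columns` is hypothetical and the 90% threshold is arbitrary; neither comes from the original notebook.

```python
import pandas as pd


def mostly_missing_columns(df, threshold=0.9):
    """Return the missing-value fraction of each column that exceeds `threshold`."""
    missing_fraction = df.isnull().mean()
    return missing_fraction[missing_fraction > threshold]


# Toy frame (values invented for illustration): 'divID' is entirely
# missing, 'rank' is complete, 'finals' is missing in half of the rows.
toy_teams = pd.DataFrame({
    'divID': [None, None, None, None],
    'rank': [1, 2, 3, 4],
    'finals': ['W', None, 'L', None],
})

flagged = mostly_missing_columns(toy_teams)
print(flagged)
```

Whether a flagged column should be dropped or kept is a modelling decision that depends on what the missing entries mean in each table.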
### Check for invalid values