# Basketball Playoffs Qualification Prediction

## Data Understanding - Information about the Dataset

[**awards_players.csv**](data/original/awards_players.csv) - a table with 95 rows that relates players to awards across 10 seasons.
| Column Name | Description                                  |
|-------------|----------------------------------------------|
| playerID    | A unique code assigned to each player.       |
| award       | The award that the player won.               |
| year        | The year that the player won the award.      |
| lgID        | The league that the player won the award in. |



[**coaches.csv**](data/original/coaches.csv) - a table with 163 rows that describes coaches who've managed teams across 10 seasons.
| Column Name | Description                                           |
|-------------|-------------------------------------------------------|
| coachID     | A unique code assigned to each coach.                 |
| year        | The year that the coach coached the team.             |
| tmID        | The team that the coach coached.                      |
| lgID        | The league that the coach coached in.                 |
| stint       | Likely the order of the coach's spell with the team within a season (unconfirmed). |
| won         | The number of games the coach won in regular season.  |
| lost        | The number of games the coach lost in regular season. |
| post_wins   | The number of games the coach won in playoffs.        |
| post_losses | The number of games the coach lost in playoffs.       |
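
If `stint` does mark successive coaching spells, a team that changed coach mid-season appears on several rows for the same `year`/`tmID` pair. The sketch below rests on that reading of `stint`. It uses hypothetical toy rows, not data from `coaches.csv`, and sums the per-coach rows into one regular-season record per team-season:

```python
import pandas as pd

# Hypothetical toy rows (not taken from coaches.csv): team "AAA" is assumed
# to have changed coach mid-season, so it has two rows for the same year.
coaches = pd.DataFrame({
    "coachID": ["coach01", "coach02", "coach03"],
    "year": [5, 5, 5],
    "tmID": ["AAA", "AAA", "BBB"],
    "stint": [1, 2, 1],
    "won": [10, 8, 20],
    "lost": [6, 10, 14],
})

# Sum wins and losses over all coaching spells of the same team-season.
team_record = coaches.groupby(["year", "tmID"], as_index=False)[["won", "lost"]].sum()

# Regular-season win percentage per team-season.
team_record["win_pct"] = team_record["won"] / (team_record["won"] + team_record["lost"])
print(team_record)
```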

[**players.csv**](data/original/players.csv) - a table with 894 rows that contains biographical details about each player.
| Column Name  | Description                                  |
|--------------|----------------------------------------------|
| bioID        | A unique code assigned to each player.       |
| pos          | The position that the player plays.          |
| firstseason  | The year that the player started playing.    |
| lastseason   | The year that the player stopped playing.    |
| height       | The height of the player in inches.          | 
| weight       | The weight of the player in pounds.          |
| college      | The college that the player attended.        |
| collegeOther | The other colleges that the player attended. |
| birthDate    | The birth date of the player.                |
| deathDate    | The death date of the player.                |
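
The missing-value check reports no gaps in `birthDate` and `deathDate`, but unknown dates may be stored as placeholder strings rather than blanks. The sketch below is a minimal approach that makes two assumptions: dates are `YYYY-MM-DD` strings, and a placeholder such as `0000-00-00` marks an unknown date. Check the raw file before relying on either. It converts both columns with `pd.to_datetime` and turns any value that is not a valid date into `NaT`:

```python
import pandas as pd

# Hypothetical toy values; "0000-00-00" stands in for an assumed placeholder
# used for unknown dates.
players = pd.DataFrame({
    "bioID": ["player01", "player02"],
    "birthDate": ["1980-05-12", "0000-00-00"],
    "deathDate": ["0000-00-00", "0000-00-00"],
})

for col in ["birthDate", "deathDate"]:
    # errors="coerce" converts strings that are not valid dates into NaT,
    # so placeholders become real missing values that isnull() can count.
    players[col] = pd.to_datetime(players[col], format="%Y-%m-%d", errors="coerce")

print(players.isnull().sum())
```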



[**players_teams.csv**](data/original/players_teams.csv) - a table with 1877 rows that describes the performance of each player for each team they played for.
| Column Name | Description                                           |
|-------------|-------------------------------------------------------|
| playerID    | A unique code assigned to each player.                |
| year        | The year that the player played for the team.         |
| stint       | Likely the order of the player's spell with the team within a season (unconfirmed). |
| tmID        | The team that the player played for.                  |
| lgID        | The league that the player played in.                 |
| GP          | The number of games the player played.                |
| GS          | The number of games the player started.               |
|...          | ...                                                   |
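
Under the same reading of `stint`, a player traded during a season has one row per team. The sketch below uses hypothetical toy rows and collapses them into one season total per player, for example before computing per-game averages:

```python
import pandas as pd

# Hypothetical toy rows: player01 is assumed to have been traded mid-season,
# so the same year appears twice with different teams and stints.
players_teams = pd.DataFrame({
    "playerID": ["player01", "player01", "player02"],
    "year": [3, 3, 3],
    "stint": [1, 2, 1],
    "tmID": ["AAA", "BBB", "AAA"],
    "GP": [12, 20, 34],
    "points": [96, 180, 340],
})

# One row per player and season, summing over all stints.
season_totals = players_teams.groupby(["playerID", "year"], as_index=False)[["GP", "points"]].sum()

# Points per game over the whole season.
season_totals["points_per_game"] = season_totals["points"] / season_totals["GP"]
print(season_totals)
```

Keep `tmID` in the grouping keys instead when a feature should describe a player's contribution to one specific team.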

[**series_post.csv**](data/original/series_post.csv) - a table with 71 rows that describes the result of each playoff series.
| Column Name | Description                                 |
|-------------|---------------------------------------------|
| year        | The year that the series was played.        |
| round       | The round of the series.                    |
| series      | The series label.                           |
| tmIDWinner  | The team that won the series.               |
| lgIDWinner  | The league that the winning team played in. |
| tmIDLoser   | The team that lost the series.              |
| lgIDLoser   | The league that the losing team played in.  |
| W           | The number of games the winning team won.   |
| L           | The number of games the losing team won.    |
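
Both `W` and `L` are counted from the winner's side, so each series can be split into one row per team by swapping the two counts for the loser. The sketch below uses hypothetical toy rows, and its round labels are made up for the example:

```python
import pandas as pd

# Hypothetical toy rows (not taken from series_post.csv).
series_post = pd.DataFrame({
    "year": [1, 1, 1],
    "round": ["FR", "FR", "F"],
    "tmIDWinner": ["AAA", "BBB", "AAA"],
    "tmIDLoser": ["CCC", "DDD", "BBB"],
    "W": [2, 2, 2],
    "L": [1, 0, 1],
})

cols = ["year", "round", "tmID", "W", "L"]

# Winner's view: games won = W, games lost = L.
winners = series_post.rename(columns={"tmIDWinner": "tmID"})[cols]

# Loser's view: the two counts swap (rename applies the mapping simultaneously).
losers = series_post.rename(columns={"tmIDLoser": "tmID", "W": "L", "L": "W"})[cols]

per_team = pd.concat([winners, losers], ignore_index=True)

# Total post-season games won and lost per team and year.
post_totals = per_team.groupby(["year", "tmID"], as_index=False)[["W", "L"]].sum()
print(post_totals)
```

If the columns mean what the table above says, these totals should match the `W` and `L` columns of `teams_post.csv`, which gives a cheap consistency check between the two files.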

[**teams.csv**](data/original/teams.csv) - a table with 143 rows that describes the performance of teams for each season.
| Column Name | Description                                             |
|-------------|---------------------------------------------------------|
| year        | The year that the team played.                          |
| lgID        | The league that the team played in.                     |
| tmID        | The team code.                                          |
| franchID    | The franchise code.                                     |
| confID      | The conference code.                                    |
| divID       | The division code.                                      |
| rank        | The rank of the team in the season.                     |
| playoff     | Whether the team qualified for the playoffs or not.     |
| seeded      | Whether the team was seeded in the playoffs or not.     |
| firstRound  | The result of the team in the first round of playoffs.  |
| semis       | The result of the team in the semi-finals of playoffs.  |
| finals      | The result of the team in the finals of playoffs.       |
| name        | The name of the team.                                   |
| o_fgm       | The number of field goals made by the team.             |
| o_fga       | The number of field goals attempted by the team.        |
| o_ftm       | The number of free throws made by the team.             |
| o_fta       | The number of free throws attempted by the team.        |
| o_3pm       | The number of three-point field goals made by the team. |
| ...         | ...                                                     |
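
`playoff` is the target of the prediction task, so it needs a numeric encoding. The sketch below assumes `playoff` is stored as `"Y"`/`"N"`; verify this with `teams["playoff"].unique()`. On hypothetical toy rows, it builds a binary label and an offensive field-goal percentage from `o_fgm` and `o_fga`:

```python
import pandas as pd

# Hypothetical toy rows (not taken from teams.csv).
teams = pd.DataFrame({
    "year": [1, 1],
    "tmID": ["AAA", "BBB"],
    "playoff": ["Y", "N"],
    "o_fgm": [900, 850],
    "o_fga": [2000, 2125],
})

# Binary target: 1 if the team qualified for the playoffs, else 0.
# Any value other than "Y"/"N" becomes NaN, which exposes a wrong assumption.
teams["playoff_label"] = teams["playoff"].map({"Y": 1, "N": 0})

# Offensive field-goal percentage = field goals made / field goals attempted.
teams["o_fg_pct"] = teams["o_fgm"] / teams["o_fga"]
print(teams)
```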

[**teams_post.csv**](data/original/teams_post.csv) - a table with 81 rows that describes the results of each team in the post-season.
| Column Name | Description                                             |
|-------------|---------------------------------------------------------|
| year        | The year that the team played.                          |
| tmID        | The team code.                                          |
| lgID        | The league that the team played in.                     |
| W           | The number of games the team won in the post-season.    |
| L           | The number of games the team lost in the post-season.   |

## Data Exploration

```python
# Import libraries
import pandas as pd

# Load the original CSV files
awards_players = pd.read_csv('data/original/awards_players.csv')
coaches = pd.read_csv('data/original/coaches.csv')
players_teams = pd.read_csv('data/original/players_teams.csv')
players = pd.read_csv('data/original/players.csv')
series_post = pd.read_csv('data/original/series_post.csv')
teams_post = pd.read_csv('data/original/teams_post.csv')
teams = pd.read_csv('data/original/teams.csv')
```

#### Check for missing values

```python
# Check missing values
print(awards_players.isnull().sum())
# DataFrame.info() prints its summary itself and returns None,
# so calling it inside print() would add a stray "None" line
awards_players.info()
```

```
playerID    0
award       0
year        0
lgID        0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 95 entries, 0 to 94
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   playerID  95 non-null     object
 1   award     95 non-null     object
 2   year      95 non-null     int64 
 3   lgID      95 non-null     object
dtypes: int64(1), object(3)
memory usage: 3.1+ KB
```


```python
print(coaches.isnull().sum())
```

```
coachID        0
year           0
tmID           0
lgID           0
stint          0
won            0
lost           0
post_wins      0
post_losses    0
dtype: int64
```


```python
print(players_teams.isnull().sum())
```

```
playerID              0
year                  0
stint                 0
tmID                  0
lgID                  0
GP                    0
GS                    0
minutes               0
points                0
oRebounds             0
dRebounds             0
rebounds              0
assists               0
steals                0
blocks                0
turnovers             0
PF                    0
fgAttempted           0
fgMade                0
ftAttempted           0
ftMade                0
threeAttempted        0
threeMade             0
dq                    0
PostGP                0
PostGS                0
PostMinutes           0
PostPoints            0
PostoRebounds         0
PostdRebounds         0
PostRebounds          0
PostAssists           0
PostSteals            0
PostBlocks            0
PostTurnovers         0
PostPF                0
PostfgAttempted       0
PostfgMade            0
PostftAttempted       0
PostftMade            0
PostthreeAttempted    0
PostthreeMade   
```

```python
print(players.isnull().sum())
```

```
bioID             0
pos              78
firstseason       0
lastseason        0
height            0
weight            0
college         167
collegeOther    882
birthDate         0
deathDate         0
dtype: int64
```


```python
print(series_post.isnull().sum())
```

```
year          0
round         0
series        0
tmIDWinner    0
lgIDWinner    0
tmIDLoser     0
lgIDLoser     0
W             0
L             0
dtype: int64
```


```python
print(teams_post.isnull().sum())
```

```
year    0
tmID    0
lgID    0
W       0
L       0
dtype: int64
```


```python
print(teams.isnull().sum())
```

```
year        0
lgID        0
tmID        0
franchID    0
confID      0
           ..
confW       0
confL       0
min         0
attend      0
arena       0
Length: 61, dtype: int64
```


- We verified that `players.csv` is the only dataset with missing values: `pos` (78), `college` (167) and `collegeOther` (882).
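
The per-table checks above can also be condensed into a single summary that lists only the columns with missing values. The sketch below assumes the DataFrames are collected in a dictionary keyed by table name. The helper `missing_summary` is hypothetical, and the toy frames only keep the snippet self-contained:

```python
import pandas as pd


def missing_summary(tables: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Return one row per (table, column) pair with at least one missing value."""
    rows = []
    for name, df in tables.items():
        counts = df.isnull().sum()
        # Keep only the columns that actually have missing values.
        for column, count in counts[counts > 0].items():
            rows.append({"table": name, "column": column, "missing": int(count)})
    return pd.DataFrame(rows, columns=["table", "column", "missing"])


# Usage with hypothetical toy data; in the notebook, pass the loaded tables
# instead, e.g. {"players": players, "teams": teams, ...}.
toy = {
    "players": pd.DataFrame({"bioID": ["p1", "p2"], "pos": ["G", None], "college": [None, None]}),
    "teams": pd.DataFrame({"year": [1], "tmID": ["AAA"]}),
}
summary = missing_summary(toy)
print(summary)
```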

#### Check for invalid values