![Data Dunkers Banner](https://github.com/PS43Foundation/data-dunkers/blob/main/docs/top-banner.jpg?raw=true)

# Visualizing Data

In this notebook, we'll visualize data using the [Plotly Express](https://plotly.com/python/plotly-express/) library, imported as `px`.

## Bar Graphs

We will start with the same dataset from our last notebook and create a bar graph.

In [1]:
#import piplite
#await piplite.install(['pandas', 'plotly', 'nbformat', 'statsmodels'])
import pandas as pd
import plotly.express as px

df = pd.read_csv('data/player_stats_2023_24.csv')

px.bar(df, x='PLAYER_NAME', y='PTS')

We can also adjust different parameters in our graph. These adjustments not only enhance visual clarity but also provide essential context. For example, we can change the `x` and `y` axis labels and add a title.

In [None]:
labels={"PLAYER_NAME":"Player Name","PTS":"Points Scored"}
title = "Points Scored by NBA Player in 2023-24"
px.bar(df, x='PLAYER_NAME', y='PTS', labels=labels, title=title)

If we prefer a horizontal bar graph, we can use `orientation='h'` and switch the `x` and `y` columns.

We'll also use `df.head()` to include just the first 5 entries of our dataframe.

In [None]:

px.bar(df.head(), x='PTS', y='PLAYER_NAME', orientation='h', labels={"PLAYER_NAME":"Player Name","PTS":"Points Scored"}, title="Points Scored by NBA Player in 2023-24")
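
Note that `df.head()` returns the first five rows in their existing order, which are not necessarily the five highest scorers. As a minimal sketch, assuming we wanted the top scorers instead, we could use the pandas `nlargest` method. The small dataframe below is made-up illustrative data, not values from our CSV file.

```python
import pandas as pd

# Hypothetical sample data for illustration only (not from the real dataset).
sample = pd.DataFrame({
    'PLAYER_NAME': ['Player A', 'Player B', 'Player C', 'Player D', 'Player E', 'Player F'],
    'PTS': [310, 1250, 780, 95, 1600, 540],
})

first_five = sample.head()              # first 5 rows, in their existing order
top_three = sample.nlargest(3, 'PTS')   # 3 rows with the highest PTS, largest first

print(first_five)
print(top_three)
```

Either result can be passed to `px.bar` in place of `df.head()`.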

Let's take another look at the columns in our dataframe.

In [None]:
df.columns

If we want to include multiple columns from our dataset, we can put them in a list using `[]` brackets.

In [None]:
px.bar(df, x='PLAYER_NAME', y=['PTS', 'REB'], labels={"PLAYER_NAME":"Player Name","value":"Points and Rebounds"}, title="Points and Rebounds by NBA Player in 2023-24")

By default the bars are stacked, but we can use `barmode='group'` to put them side by side.

We'll use `df.head(15)` to display the first 15 entries of our dataframe.

In [None]:
px.bar(df.head(15), x='PLAYER_NAME', y=['PTS', 'REB'], barmode='group', labels={"PLAYER_NAME":"Player Name","value":"Points and Rebounds"}, title="Points and Rebounds by NBA Player in 2023-24")

## Exercise

---

Create a bar graph using your `df` dataframe:

* display the name of the NBA player on the x-axis
* display the number of assists they have on the y-axis
* add an appropriate title and appropriate labels to your plot
* add `, color='TEAM_ABBREVIATION'` to color the bars by team

In [None]:
# Write your code in this cell.




## Scatter Plots

Another way to visualize your data is using a scatter plot. We will choose two of the columns from the dataframe:

| Column Name        | Description                                              |
|--------------------|----------------------------------------------------------|
| PLAYER_ID          | Unique ID for a player                                   |
| PLAYER_NAME        | Player's full name                                       |
| NICKNAME           | Player's nickname (if applicable)                        |
| TEAM_ID            | Unique ID for the player's team                          |
| TEAM_ABBREVIATION  | Abbreviation of the player's team                        |
| AGE                | Player's age                                             |
| GP                 | Games played                                             |
| W                  | Wins                                                     |
| L                  | Losses                                                   |
| W_PCT              | Winning percentage                                       |
| MIN                | Minutes played                                           |
| FGM                | Field goals made                                         |
| FGA                | Field goals attempted                                    |
| FG_PCT             | Field goal percentage                                    |
| FG3M               | Three-point field goals made                             |
| FG3A               | Three-point field goals attempted                        |
| FG3_PCT            | Three-point field goal percentage                        |
| FTM                | Free throws made                                         |
| FTA                | Free throws attempted                                    |
| FT_PCT             | Free throw percentage                                    |
| OREB               | Offensive rebounds                                       |
| DREB               | Defensive rebounds                                       |
| REB                | Total rebounds                                           |
| AST                | Assists                                                  |
| TOV                | Turnovers                                                |
| STL                | Steals                                                   |
| BLK                | Blocks                                                   |
| BLKA               | Blocked attempts against                                 |
| PF                 | Personal fouls                                           |
| PFD                | Personal fouls drawn                                     |
| PTS                | Points scored                                            |
| PLUS_MINUS         | Plus/minus rating (player's team points scored vs. opponent's team points scored while player is on the court) |

In [3]:
px.scatter(df, x='FGA', y='FGM', title='Field Goals Made versus Field Goal Attempts', labels={'FGA':'Field Goal Attempts', 'FGM':'Field Goals Made'})

Looking at the plot, we see the number of field goal attempts by a player and the number of field goals they actually made. To add more *context* to this plot, we can use the `hover_data` parameter.

In [4]:
px.scatter(df, x='FGA', y='FGM', hover_data=['PLAYER_NAME'], title='Field Goals Made versus Field Goal Attempts', labels={'FGA':'Field Goal Attempts','FGM':'Field Goals Made','PLAYER_NAME':'Player'})

Or we could use `color='PLAYER_NAME'` to show players' names.

In [6]:
px.scatter(df, x='FGA', y='FGM', color='PLAYER_NAME', title='Field Goals Made versus Field Goal Attempts', labels={'FGA':'Field Goal Attempts','FGM':'Field Goals Made','PLAYER_NAME':'Player'})

Now we see the actual names of each NBA player in the plot, making the visualization more informative.

We can also include a line of best fit, called a trendline. We often use the [ordinary least squares](https://en.wikipedia.org/wiki/Ordinary_least_squares) method of calculating the trendline. To add a trendline, use the parameter `trendline`.

In [7]:
px.scatter(df, x='FGA', y='FGM', hover_data=['PLAYER_NAME'], trendline='ols', title='Field Goals Made versus Field Goal Attempts', labels={'FGA':'Field Goal Attempts','FGM':'Field Goals Made','PLAYER_NAME':'Player'})
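
To make the idea of a least squares trendline more concrete, here is a minimal sketch that calculates the slope and intercept of a best-fit line by hand for a few made-up points. This is not how Plotly Express calculates its trendline internally (it relies on the `statsmodels` library). The function name `fit_line` and the sample numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of (x - mean_x) * (y - mean_y), divided by sum of (x - mean_x) squared.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The best-fit line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical attempts and makes for four players (illustration only).
attempts = [4, 8, 12, 16]
makes = [2, 3, 6, 7]

slope, intercept = fit_line(attempts, makes)
print(f"makes = {slope:.2f} * attempts + {intercept:.2f}")
```

For these made-up points the slope is about 0.45, meaning each extra attempt goes with roughly 0.45 more made field goals on average.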

We can also use the `color` and `size` parameters to visualize other columns from our dataframe.

In [8]:
labels = {'FGA':'Field Goal Attempts', 'FGM':'Field Goals Made', 'PLAYER_NAME':'Player', 'TEAM_ABBREVIATION':'Team', 'FG_PCT':'Field Goal Percentage'}
title = 'Field Goals Made versus Field Goal Attempts'
px.scatter(df, x='FGA', y='FGM', color='TEAM_ABBREVIATION', size='FG_PCT', hover_data=['PLAYER_NAME'], title=title, labels=labels)

## Exercise

Create a scatter plot using your `df` dataframe. Use the columns `AGE` and `PTS` and add a trendline. Make sure to include an appropriate title and labels.

Compare this trendline to the one in the previous plot of `Field Goal Attempts` and `Field Goals Made`. Do these variables have a stronger correlation than the previous plot's variables?

In [None]:
# Write your code in this cell.




## Other Visualizations

Plotly has functions to create many different types of visualizations, listed on the [Plotly Express](https://plotly.com/python/plotly-express/) page. Let's try a pie chart.

Note: `.tail()` is the opposite of `.head()`. It returns the last 5 rows of our dataframe instead of the first 5.

In [None]:
df.tail()

In [None]:
px.pie(df, values='PTS', names='TEAM_ABBREVIATION', title='Points Scored by each NBA Team', labels={'PTS': 'Points Scored', 'TEAM_ABBREVIATION': 'Team'})
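
Because several players share each team, the pie chart adds up the `PTS` values of all rows with the same `TEAM_ABBREVIATION` to size each slice. Here is a minimal sketch of the equivalent totals using the pandas `groupby` method on made-up data (the team codes and numbers are hypothetical).

```python
import pandas as pd

# Hypothetical sample data for illustration only.
sample = pd.DataFrame({
    'TEAM_ABBREVIATION': ['AAA', 'AAA', 'BBB', 'BBB', 'CCC'],
    'PTS': [1000, 500, 800, 700, 1000],
})

# Total points per team: one number per slice of the pie.
team_points = sample.groupby('TEAM_ABBREVIATION')['PTS'].sum()

# Fraction of all points that each team scored: the size of each slice.
team_share = team_points / team_points.sum()

print(team_points)
print(team_share)
```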

In the [next notebook](03-mini-basketball-data.ipynb) you will collect your own data using a mini-basketball or regular basketball hoop, then filter and visualize it.