![Data Dunkers Banner](https://github.com/PS43Foundation/data-dunkers/blob/main/docs/top-banner.jpg?raw=true)

<a href="https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FData-Dunkers%2Fdata-dunkers-modules&branch=main&subPath=6-hour-module/03-mini-basketball-data.ipynb&depth=1" target="_parent"><img src="https://raw.githubusercontent.com/callysto/curriculum-notebooks/master/open-in-callysto-button.svg?sanitize=true" width="123" height="24" alt="Open in Callysto"/></a>

# Obtaining Basketball Data

In this notebook, we'll collect data from your own basketball shots. From there, we'll visualize, assess, and adjust that data so we can analyze it in new ways.

Take a few shots on your basketball hoop and record:

1. The Distance of the Shot (in metres)
2. Whether the Shot Was Made (Y or N)

Read your data using `.read_csv()` following the steps in the previous notebook about obtaining Google Forms data.
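
If you'd like to try `.read_csv()` before your form responses are ready, here is a minimal sketch that reads a few made-up rows from a string instead of a URL. The rows, the timestamp format, and the name `sample_data` are assumptions for illustration only; your own export may look different.

```python
import io
import pandas as pd

# Made-up rows in the shape a Google Forms export might take (not real data)
csv_text = """Timestamp,Distance of Shot (m),Shot Made?
2024-01-01 10:00:00,1,Yes
2024-01-01 10:01:00,3,No
2024-01-01 10:02:00,2,Yes
"""

# read_csv accepts a URL, a file path, or a file-like object such as StringIO
sample_data = pd.read_csv(io.StringIO(csv_text))

print(sample_data.columns.tolist())
print(len(sample_data), "rows")
```

The first line of the text becomes the column names, and each following line becomes one row of the dataframe.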

In this example, we'll be using a `.csv` file that has already been filled out.

In [None]:
import piplite
await piplite.install(['pandas', 'plotly', 'nbformat'])

import pandas as pd 
import plotly.express as px

# example mini-hoop data from google sheets
sheet_id = '1JhJKdcBwrlTmmr7vNaK7rncwp1TXmF7PlsKz9QQ84io'
mini_hoop_data = pd.read_csv(f"https://docs.google.com/spreadsheets/export?id={sheet_id}&format=csv")
mini_hoop_data

We see that our dataframe has 3 columns: `Timestamp`, `Distance of Shot (m)`, and `Shot Made?`.

Let's make a simple visualization using our data.

In [None]:
px.scatter(mini_hoop_data, x="Distance of Shot (m)", y="Shot Made?")

Looking at the visualization, we can generally see that shots from closer distances are going in more often. Let's create a stacked bar graph to better understand the **frequencies** (the number of times a data value occurs) of shots made.

In [None]:
# group data by distance and shot outcome
mini_hoop_data_stacked = mini_hoop_data.groupby(['Distance of Shot (m)', 'Shot Made?'])

# .size() counts the number of rows in each group
# .unstack() then pivots the outcomes into columns, and fill_value=0 sets missing combinations to 0
# .reset_index() turns the distance index back into a regular column so Plotly can find it by name
mini_hoop_data_stacked = mini_hoop_data_stacked.size().unstack(fill_value=0).reset_index()

px.bar(
    mini_hoop_data_stacked,
    x='Distance of Shot (m)',
    y=['Yes', 'No'],
    title='Shot Made vs. Shot Missed at Different Distances',
    labels={'value': 'Number of Shots', 'Distance of Shot (m)': 'Distance (m)', 'variable': 'Shot Made'},
    color_discrete_sequence=['green', 'red'],
)
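
To see what the grouping steps produce, here is a small sketch that runs them on six made-up shots. The data and the names `shots` and `counts` are assumptions for illustration. The sketch ends with `.reset_index()` so that the distance shows up as an ordinary column in the printed table.

```python
import pandas as pd

# Six made-up shots (not real data)
shots = pd.DataFrame({
    'Distance of Shot (m)': [1, 1, 2, 2, 2, 3],
    'Shot Made?': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
})

counts = (
    shots.groupby(['Distance of Shot (m)', 'Shot Made?'])
    .size()                 # number of rows in each (distance, outcome) group
    .unstack(fill_value=0)  # one column per outcome, 0 where a combination never occurred
    .reset_index()          # distance becomes a regular column again
)

print(counts)
```

Each row of `counts` is one distance, and the `Yes` and `No` columns hold the frequencies that the stacked bars are drawn from.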

## Exercise

---

Filter the basketball data you collected from Google Forms into two separate groups: one containing shots from distances *closer than 4 metres*, and another containing shots from distances of *4 metres or farther*. Then, create a visualization using both groups of data.
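
As a hint, here is a minimal sketch of splitting a dataframe into two groups with a comparison. The data is made up, and the `cutoff` of 3 is an arbitrary choice for illustration; swap in your own dataframe and the cutoff from the exercise.

```python
import pandas as pd

# Made-up shots (not real data)
shots = pd.DataFrame({
    'Distance of Shot (m)': [1, 2, 3, 4, 5, 6],
    'Shot Made?': ['Yes', 'Yes', 'No', 'Yes', 'No', 'No'],
})

cutoff = 3  # arbitrary example value, not the one from the exercise

# A comparison on a column gives True/False for each row;
# indexing the dataframe with it keeps only the True rows
close_shots = shots[shots['Distance of Shot (m)'] < cutoff]
far_shots = shots[shots['Distance of Shot (m)'] >= cutoff]

print(len(close_shots), "close shots and", len(far_shots), "far shots")
```

Using `<` for one group and `>=` for the other means every shot lands in exactly one group.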

In [None]:
# Write your code in this cell.




## Adding More Variables

We can make some nice visualizations with the data we have now, but adding more variables to our dataset lets us look at more specific reasons why certain shots go in more often. Let's look at a similar example dataset where a few external variables have been recorded.

In [None]:
sheet_id = '19CtQAUpTy89QAi2rjyCr-HNoJMOtM_IUMR4tBjFvHyg'

mini_hoop_data = pd.read_csv(f"https://docs.google.com/spreadsheets/export?id={sheet_id}&format=csv")
mini_hoop_data

Looking at our new dataframe, we see there are 4 new columns: `Horizontal Distance (m)`, `Vertical Distance (m)`, `Player Height (cm)`, and `Shot Passed?`.

In this case, a negative value of `Horizontal Distance (m)` indicates a shot from the left of the hoop, and a positive value indicates a shot from the right of the hoop.
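
The sign convention can be turned into a readable label. The sketch below is one way to do that on made-up values; the `Side` column, the `side_of_hoop` function, and the choice to call a value of exactly 0 `Centre` are assumptions, not part of the example dataset.

```python
import pandas as pd

# Made-up horizontal positions (not real data)
positions = pd.DataFrame({'Horizontal Distance (m)': [-2.0, -0.5, 0.0, 1.5]})

def side_of_hoop(horizontal_distance):
    """Label a shot by which side of the hoop it was taken from."""
    if horizontal_distance < 0:
        return 'Left'
    if horizontal_distance > 0:
        return 'Right'
    return 'Centre'  # assumption: exactly 0 means straight on

# .apply() runs the function once for every value in the column
positions['Side'] = positions['Horizontal Distance (m)'].apply(side_of_hoop)

print(positions)
```

A label column like this could then be passed to a Plotly parameter such as `color`.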

In [None]:
px.scatter(mini_hoop_data, x='Horizontal Distance (m)', y='Vertical Distance (m)', color='Shot Made?', size='Player Height (cm)', hover_data=['Shot Passed?'], title='Shot Outcome by Horizontal and Vertical Distance')

In the visualization above, we use a total of 5 different variables as parameters, showing how we can use a variety of variables to improve the quality of a visualization.

## Exercise

---

As in our new dataframe, add some new variables to your own data that could affect whether a shot is made or not. Create a visualization using 3-5 different variables as parameters.

Which variables appear to have the greatest effect on whether a shot is made or not?
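
Alongside a visualization, one rough way to compare variables is to look at the fraction of shots made within each group. The sketch below assumes made-up data, `Yes`/`No` values in both columns, and a helper column named `Made`; none of these come from the example dataset.

```python
import pandas as pd

# Made-up shots (not real data)
shots = pd.DataFrame({
    'Shot Passed?': ['Yes', 'Yes', 'No', 'No', 'No', 'Yes'],
    'Shot Made?': ['Yes', 'No', 'Yes', 'Yes', 'No', 'No'],
})

# True/False column: True counts as 1 and False as 0 when averaged
shots['Made'] = shots['Shot Made?'] == 'Yes'

# The mean of a True/False column is the fraction of True values in each group
make_rate = shots.groupby('Shot Passed?')['Made'].mean()

print(make_rate)
```

A large gap between the groups hints that the variable matters, although with only a handful of shots the difference could easily be due to chance.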

In [None]:
# Write your code in this cell.




---

## Project Ideas

Now that you've gone through the basics of manipulating data in Python, creating visualizations in Plotly, and adding external variables to your own data, it's time to create your own project! Here are some project ideas to get you started:

1. **Shot Analysis Dashboard**
   - Analyze basketball shot data from various games or players.
   - Explore shot accuracy based on factors such as shot distance, player height, shot type, and defender distance.

2. **Player Performance Comparisons**
   - Gather statistics for multiple basketball players from a specific league or team and compare them with your own basketball data.
   - Compare player performance metrics such as points per game, assists, rebounds, field goal percentage, etc.

3. **Player Shot Efficiency Dashboard**
   - Analyze the efficiency of basketball players' shots, including NBA players and yourself.
   - Explore factors affecting shot efficiency such as shot angle, shot distance, player position, and defender proximity.

---

## Congratulations!

You've completed the series of notebooks introducing the basics of data science!

Remember, the best way to solidify your understanding and skills is through practice and experimentation. Continue to apply what you've learned to real-world datasets and projects.

Keep exploring, keep learning, and enjoy your data science journey!

Happy coding! 🎉🐍📊