# Shot Selection & Efficiency Challenge

In basketball, not all points are created equal.

Some players score a lot because they take many shots.  
Others score efficiently by choosing better shots.

In this challenge, you will explore how **shot selection** relates to **scoring efficiency** using real NBA player data.

You will:
- Choose a player to study
- Decide which stats (fields/columns) matter
- Choose a plot type that supports your thinking
- Explain what the data suggests

There is more than one reasonable answer. Strong work is defined by clear reasoning supported by evidence.

# Step 1 – Get Oriented

## Warm-Up: I Notice / I Wonder

Run the code below, then look at the column names and the first few rows of the dataset.


In [17]:
import pandas as pd

url = "https://raw.githubusercontent.com/Data-Dunkers/data/refs/heads/main/NBA/player/nba_player_stats_2025-2026.csv"
df = pd.read_csv(url)

df.head()

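|  | Name | Team | POS | GP | MIN | PTS | FGM | FGA | FG% | 3PM | 3PA | 3P% | FTM | FTA | FT% | REB | AST | STL | BLK | TO | DD2 | TD3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Luka Doncic | LAL | G | 19 | 37.3 | 34.7 | 10.6 | 23.1 | 46.0 | 3.5 | 10.7 | 32.4 | 9.9 | 12.3 | 81.1 | 8.7 | 8.8 | 1.5 | 0.6 | 4.4 | 14 | 2 |
| 1 | Shai Gilgeous-Alexander | OKC | G | 25 | 33.2 | 32.4 | 10.8 | 19.4 | 56.0 | 2.2 | 5.0 | 43.7 | 8.6 | 9.7 | 88.4 | 4.6 | 6.4 | 1.4 | 0.7 | 1.9 | 2 | 0 |
| 2 | Tyrese Maxey | PHI | G | 23 | 39.9 | 31.5 | 10.7 | 22.9 | 46.7 | 3.7 | 9.3 | 39.1 | 6.4 | 7.3 | 88.1 | 4.7 | 7.2 | 1.7 | 0.9 | 2.7 | 2 | 0 |
| 3 | Donovan Mitchell | CLE | G | 25 | 34.5 | 30.7 | 10.7 | 21.6 | 49.4 | 4.0 | 10.6 | 38.1 | 5.3 | 6.3 | 84.1 | 4.7 | 5.5 | 1.4 | 0.3 | 3.3 | 1 | 0 |
| 4 | Nikola Jokic | DEN | C | 25 | 35.0 | 29.8 | 10.7 | 17.5 | 61.3 | 2.2 | 5.1 | 43.3 | 6.2 | 7.4 | 84.2 | 12.4 | 10.8 | 1.4 | 0.8 | 3.4 | 23 | 12 |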


## Getting Oriented With the Data

The table you see above contains statistics for NBA players.

Each column represents a different type of information, such as:
- Basic player information (name, team, position)
- How often a player shoots
- How successful those shots are
- Other game statistics

You do not need to understand every column yet.
For now, this section is just to help you recognize what kind of information is included.
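
If you want a quicker overview than reading the table by eye, the sketch below shows one possible way to count the rows and columns and to separate text columns from number columns. It uses a small hand-typed sample (three players and six columns copied from the table above) instead of the full dataset, and the variable names `sample`, `text_columns`, and `number_columns` are made up for this illustration.

```python
import pandas as pd

# Small sample copied from the table above (only six of the columns are kept).
sample = pd.DataFrame(
    {
        "Name": ["Luka Doncic", "Shai Gilgeous-Alexander", "Nikola Jokic"],
        "Team": ["LAL", "OKC", "DEN"],
        "POS": ["G", "G", "C"],
        "PTS": [34.7, 32.4, 29.8],
        "FGA": [23.1, 19.4, 17.5],
        "FG%": [46.0, 56.0, 61.3],
    }
)

# How many rows and columns are there?
n_rows, n_cols = sample.shape

# Which columns hold numbers? Everything else is treated as text here.
number_columns = sample.select_dtypes(include="number").columns.tolist()
text_columns = [col for col in sample.columns if col not in number_columns]

print("Rows:", n_rows, "Columns:", n_cols)
print("Text columns:", text_columns)
print("Number columns:", number_columns)
```

The same lines should work on the full `df` if you replace `sample` with `df`.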

## Common Types of Columns You May See

- **Player information**  
  (Name, team, position, games played)

- **Shooting volume**  
  (How many shots a player takes)

- **Shooting success**  
  (Percentages showing how often shots go in)

- **Other game statistics**  
  (Rebounds, assists, steals, etc.)

Later in this activity, you will get a full explanation of each statistic.

## First Look: What Do You Notice?

Looking at the table above, write:

- One thing you **notice** about how shooting or scoring information appears in the data
- One thing you **wonder** about how shot choices or efficiency might be represented

Your answers can be based on patterns, column names, or anything that stands out.

**Your answer:**  
(Type here)

# Step 2 – Finding the Correct Player Name



When you search using part of a name (for example, `Joe`), the dataset may match **many rows**.

To make this easier to work with, the search tool shows **unique player names** only:
- If multiple players match your search, **all of their names will appear**
- If a player appears many times in the dataset, their name will still appear **once**

Your goal here is not analysis — it is **discovery**.

Once you see the correct name listed, you will copy it exactly (including spelling and spacing) into the next step.


In [18]:
# This cell helps you find the exact spelling of a player's name.
# You can type part of a name, and the program will show matching player names.

# Start by searching for Pascal Siakam
search_term = "pascal"  # You may change this to part of another player's name

# Find rows where the Name column contains the search term
matches = df[df["Name"].str.contains(search_term, case=False, na=False)]

# Display matching player names in alphabetical order
sorted(matches["Name"].unique())


['Pascal Siakam']
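
To see what happens when a search matches more than one row, here is a small sketch of the same search on a hand-typed list of names. The names come from the tables in this notebook, but the repeated "Luka Doncic" row is made up for this example, only to show that a repeated name is listed once.

```python
import pandas as pd

# Names copied from this notebook. The second "Luka Doncic" row is a made-up
# duplicate, added only to show what .unique() does.
names = pd.DataFrame(
    {
        "Name": [
            "Luka Doncic",
            "Donovan Mitchell",
            "Nikola Jokic",
            "Pascal Siakam",
            "Luka Doncic",
        ]
    }
)

search_term = "on"  # matches "Doncic" and "Donovan"

# Same search as in the cell above: ignore capital letters, skip missing names
matches = names[names["Name"].str.contains(search_term, case=False, na=False)]

# Three rows match, but only two different players
unique_matches = sorted(matches["Name"].unique())

print(len(matches), "matching rows")
print(unique_matches)
```

A short search term like `"on"` is useful for discovery, but you will usually want a longer one to narrow the list down to a single player.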

# Step 3 – Lock in Your Player

You have now explored the dataset and searched for the correct spelling of a player’s name.

In the next step, you will:
- Paste **one exact player name** into the code
- Use that name to isolate the player’s data for analysis

Important:
- The column used for player names is **`Name`** (capital N)
- The name must match the dataset **exactly**, including spacing

Once the player is selected, their full set of statistics will be displayed so you can decide which fields are most useful for analyzing shot selection and efficiency.


In [19]:
# Paste the exact player name you discovered using the search step
player_name = "Pascal Siakam"  # Change only this line

# Filter the dataset for the selected player
player_df = df[df["Name"] == player_name].copy()

# Display the player's data so no columns are hidden
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)
player_df


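|  | Name | Team | POS | GP | MIN | PTS | FGM | FGA | FG% | 3PM | 3PA | 3P% | FTM | FTA | FT% | REB | AST | STL | BLK | TO | DD2 | TD3 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 18 | Pascal Siakam | IND | F | 25 | 34.0 | 23.8 | 8.7 | 18.3 | 47.6 | 1.7 | 4.7 | 35.9 | 4.6 | 6.7 | 69.0 | 6.8 | 4.0 | 1.2 | 0.4 | 2.1 | 6 | 0 |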
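
If the name does not match exactly, the filter returns an empty table instead of showing an error, which can be confusing. The sketch below is one possible way to catch that early. The helper `select_player` is hypothetical (it is not part of this notebook or of pandas), and the two-row `sample` table is typed in by hand from values shown in this notebook.

```python
import pandas as pd

# Tiny hand-typed sample using values shown in this notebook
sample = pd.DataFrame(
    {"Name": ["Pascal Siakam", "Nikola Jokic"], "PTS": [23.8, 29.8]}
)


def select_player(data, player_name):
    """Return the rows for one player, or raise a readable error if none match."""
    player_rows = data[data["Name"] == player_name].copy()
    if player_rows.empty:
        raise ValueError(
            f"No player named {player_name!r}. Check spelling, capitals, and spaces."
        )
    return player_rows


# An exact match returns one row
found = select_player(sample, "Pascal Siakam")

# A lowercase version does not match, because == compares names exactly
try:
    select_player(sample, "pascal siakam")
    lowercase_worked = True
except ValueError as error:
    lowercase_worked = False
    print(error)
```

This is why the search step ignores capital letters (`case=False`) but the selection step does not: searching is for discovery, selecting needs one exact name.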


# Understanding the Available Statistics

Before selecting which fields to use, it is important to understand what each column represents.  
These are standard basketball statistics used in professional analysis.

<table style="width:100%;">
<tr>
<td style="vertical-align: top; width:50%;">

**Player & Context**
- **Name** – Player name  
- **Team** – Team abbreviation  
- **POS** – Player position  
- **GP** – Games played  
- **MIN** – Average minutes played per game  

**Scoring & Shooting Volume**
- **PTS** – Points per game  
- **FGM** – Field goals made per game  
- **FGA** – Field goals attempted per game  
- **3PM** – Three-point shots made per game  
- **3PA** – Three-point shots attempted per game  
- **FTM** – Free throws made per game  
- **FTA** – Free throws attempted per game  

</td>
<td style="vertical-align: top; width:50%;">

**Shooting Efficiency**
- **FG%** – Field goal percentage  
- **3P%** – Three-point percentage  
- **FT%** – Free throw percentage  

**Other Performance Stats**
- **REB** – Rebounds per game  
- **AST** – Assists per game  
- **STL** – Steals per game  
- **BLK** – Blocks per game  
- **TO** – Turnovers per game  

**Game Impact Indicators**
- **DD2** – Double-doubles recorded  
- **TD3** – Triple-doubles recorded  

</td>
</tr>
</table>

You are **not expected to use all of these statistics**.  
Your task is to choose the ones that best help answer the challenge.
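
The percentage columns are linked to the "made" and "attempted" columns. As a rough check, the sketch below assumes that each percentage is 100 × made ÷ attempted (the usual definition) and tries it on Pascal Siakam's numbers from Step 3. The results are close to the listed percentages but not identical, most likely because the table shows per-game averages rounded to one decimal place.

```python
# Pascal Siakam's per-game values, copied from the table in Step 3
made = {"FG": 8.7, "3P": 1.7, "FT": 4.6}
attempted = {"FG": 18.3, "3P": 4.7, "FT": 6.7}
listed = {"FG%": 47.6, "3P%": 35.9, "FT%": 69.0}

# Assumed relationship: percentage = 100 * made / attempted
estimated = {}
for shot in made:
    estimated[shot + "%"] = round(100 * made[shot] / attempted[shot], 1)

# Compare the estimate with the value listed in the dataset
for column, value in estimated.items():
    print(f"{column}: estimated {value}, listed {listed[column]}")
```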


# Step 4 – Framing the Challenge

Before working with the data, it is important to understand **what problem you are trying to solve**.

You do **not** need to be a basketball expert for this challenge.

In basketball, players can score points in different ways:
- Some shots are taken closer to the basket
- Some shots are taken farther away
- Some shots are free throws (with no defender)

**Shot selection** means:
> The kinds of shots a player chooses to take.

Some players take a lot of difficult shots.
Other players take fewer shots, but choose ones that are easier to make.

**Scoring efficiency** means:
> How well those shots actually work.

A player is considered more efficient if the shots they choose lead to points more reliably.

## The Challenge Question

Using the data, you will investigate:

**How does shot selection affect scoring efficiency for a player?**

In the next steps, you will decide:
- Which statistics describe *shot selection*
- Which statistics describe *efficiency*
- Which statistics do **not** help answer this question
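
As one possible way to turn these ideas into numbers, the sketch below builds two "shot selection" measures and one "efficiency" measure for four players from this notebook. None of the three new columns exist in the dataset. The column names are made up for this example, and true shooting percentage (TS%) is a common basketball metric that is an addition here, not something this challenge requires. Treat it as one option, not the expected answer.

```python
import pandas as pd

# Per-game values copied from the tables shown earlier in this notebook
shots = pd.DataFrame(
    {
        "Name": [
            "Luka Doncic",
            "Shai Gilgeous-Alexander",
            "Nikola Jokic",
            "Pascal Siakam",
        ],
        "PTS": [34.7, 32.4, 29.8, 23.8],
        "FGA": [23.1, 19.4, 17.5, 18.3],
        "3PA": [10.7, 5.0, 5.1, 4.7],
        "FTA": [12.3, 9.7, 7.4, 6.7],
    }
).set_index("Name")

# Shot selection: what share of field goal attempts are three-pointers?
shots["3PA share %"] = 100 * shots["3PA"] / shots["FGA"]

# Shot selection: how many free throws does the player earn per field goal attempt?
shots["FTA per FGA"] = shots["FTA"] / shots["FGA"]

# Efficiency: true shooting percentage, which counts free throws as well
# (assumed formula: PTS / (2 * (FGA + 0.44 * FTA)), written as a percentage)
shots["TS%"] = 100 * shots["PTS"] / (2 * (shots["FGA"] + 0.44 * shots["FTA"]))

print(shots[["3PA share %", "FTA per FGA", "TS%"]].round(2))
```

Comparing the three new columns across players is one way to start asking whether a different shot mix goes along with different efficiency.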


# Step 5 – Choosing the Right Data for the Challenge

Based on the challenge you just framed, your next task is to decide **which data actually helps you answer it**.

Remember the challenge question:

**How does shot selection affect scoring efficiency for a player?**

Not every statistic in the dataset is useful for this question.

In this step, you will:
- Identify which statistics describe **shot selection**
- Identify which statistics describe **scoring efficiency**
- Ignore statistics that do not help answer the challenge

This is an important part of data analysis.
Real analysts almost never use every column in a dataset — they choose the ones that matter.


## Decide Which Fields Matter

Look at the player’s full list of statistics above.

In the space below:
1. List **3–6 column names** you think are most relevant  
2. Briefly explain **why each one helps** answer the challenge  

You are not expected to choose the “correct” fields —  
you are expected to choose **reasonable** ones and justify them.


In [20]:
# Choose the columns that best support the challenge.
# Column names must match the dataset exactly.

selected_columns = [
    "Name",
    "Team",
    "PTS",
    "FG%",
    "3P%",
    "FT%"
]

# Reduce the dataset to only the selected columns
focused_df = player_df[selected_columns]

# Display the reduced dataset
focused_df


|  | Name | Team | PTS | FG% | 3P% | FT% |
| --- | --- | --- | --- | --- | --- | --- |
| 18 | Pascal Siakam | IND | 23.8 | 47.6 | 35.9 | 69.0 |
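
A misspelled column name in `selected_columns` stops the cell with a `KeyError`. The sketch below shows one possible way to check your list first. It uses a hand-typed one-row table with a few of Pascal Siakam's values, and the list includes a deliberate mistake (`"3PT%"` instead of `"3P%"`) to show what the check reports. The names `missing` and `valid` are made up for this example.

```python
import pandas as pd

# One-row table typed in by hand from the Step 3 output (only some columns)
player_df = pd.DataFrame(
    {
        "Name": ["Pascal Siakam"],
        "FGA": [18.3],
        "3PA": [4.7],
        "FG%": [47.6],
        "3P%": [35.9],
    },
    index=[18],
)

# "3PT%" is a deliberate typo for this example
selected_columns = ["Name", "FGA", "3PA", "FG%", "3PT%"]

# Split the list into names that exist in the table and names that do not
missing = [col for col in selected_columns if col not in player_df.columns]
valid = [col for col in selected_columns if col in player_df.columns]

if missing:
    print("Not found in the dataset:", missing)

# Keep only the columns that really exist
focused_df = player_df[valid]
print(focused_df)
```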


# Step 6 – Create Evidence With a Visualization

You have now selected the statistics you believe are most relevant to the challenge.

Your next task is to turn those statistics into **evidence** by creating **one clear visualization**.

Remember the challenge question:

**How does shot selection affect scoring efficiency for a player?**

Your visualization should:
- Use **only** the reduced dataset (`focused_df`)
- Make a clear point related to shot selection and efficiency
- Be easy to read and properly labeled

In the next code cell, create a Plotly Express chart that supports your answer.


## Review Your Reduced Dataset

Before building your visualization, take a moment to review the data you selected in Step 5.

In the code cell below:
- The Plotly Express library is imported
- Your reduced dataset (`focused_df`) is displayed

This is your final check:
- Confirm the columns you selected are correct
- Make sure they match the challenge you are trying to answer

You will use this dataset to build your visualization in the next step.


In [21]:
import plotly.express as px

# Reminder:
# Use ONLY the reduced dataset: focused_df
# Choose which columns to plot based on your analysis.

focused_df

|  | Name | Team | PTS | FG% | 3P% | FT% |
| --- | --- | --- | --- | --- | --- | --- |
| 18 | Pascal Siakam | IND | 23.8 | 47.6 | 35.9 | 69.0 |


## Build Your Visualization

In the code cell below, you will create **one** Plotly Express chart using **only** the reduced dataset: `focused_df`.

Your task is to:
- Choose which column(s) from `focused_df` belong in the chart
- Decide which type of chart best supports the challenge
- Update the code so it reflects your choices

Your visualization should help answer the challenge question:

**How does shot selection affect scoring efficiency for a player?**


In [None]:
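# Build your visualization here using Plotly Express.
# Replace the column names below with ones you selected in Step 5.
# Note: every column you plot must be in focused_df. The example x_col, "FGA",
# is not in the Step 5 example selection, so this cell raises an error until you
# either add "FGA" to selected_columns (and re-run Step 5) or pick another column.

x_col = "FGA"   # example: a shot selection column
y_col = "FG%"   # example: an efficiency column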

fig = px.scatter(
    focused_df,
    x=x_col,
    y=y_col,
    title=f"{player_name}: {y_col} vs {x_col}",
    labels={x_col: x_col, y_col: y_col}
)

fig.show()
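
Because `focused_df` holds a single player, a scatter plot of two columns shows only one point. If you would rather compare several of that player's statistics side by side, one option is to reshape the table so each statistic becomes its own row. The sketch below does this with pandas `melt` on a hand-typed copy of the Step 5 output. The names `long_df`, `Stat`, and `Value` are made up for this example.

```python
import pandas as pd

# Hand-typed copy of the reduced dataset shown in Step 5
focused_df = pd.DataFrame(
    {
        "Name": ["Pascal Siakam"],
        "Team": ["IND"],
        "PTS": [23.8],
        "FG%": [47.6],
        "3P%": [35.9],
        "FT%": [69.0],
    },
    index=[18],
)

# Turn the three percentage columns into rows: one row per statistic
long_df = focused_df.melt(
    id_vars=["Name"],
    value_vars=["FG%", "3P%", "FT%"],
    var_name="Stat",
    value_name="Value",
)

print(long_df)
```

A table in this shape can be passed to a bar chart, for example `px.bar(long_df, x="Stat", y="Value")`, so that each bar is one shooting percentage.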


# Step 7 – Explain What the Data Shows

Looking at the visualization you created above, what is one pattern or relationship that catches your attention?

It could be related to:
- Shot type (for example: closer shots vs farther shots)
- Shot volume
- Scoring efficiency
- Or any other pattern you notice in the data

**Your answer:**  
(Type here)

# You’re Finished — What’s Next?



You have now completed this challenge.

Because this notebook may be opened in different environments (such as Callysto or Google Colab), the way you save or submit your work may vary.

Here are some common options:

- **Save a copy of the notebook**  
  You can save a copy to your own account (for example, in Google Drive or Callysto).

- **Download the notebook file**  
  You may download the `.ipynb` file and submit it to your teacher if requested.

- **Export as a PDF**  
  Some platforms allow you to export or print the notebook as a PDF.

Before submitting anything, **check with your teacher** to confirm:
- Which format they prefer
- How they want the work submitted

Well done for working through the full challenge.
