![Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)

# Probability Basics

Probability measures how likely something is to happen. For example, if a player has a shot percentage of 30%, the probability of them making a shot is 0.3.

We can simulate probability using random numbers. We'll generate a random number between 0 and 1 for each shot. If the number is less than 0.3 (30%), the player makes the shot.

In [None]:
import random
import pandas as pd

shot_percentage = 0.3
shots_taken = 100

shots_made = [random.random() < shot_percentage for shot in range(shots_taken)]

df = pd.DataFrame(shots_made, columns=["Shot Made"])
df

Because we are using random numbers, we will get different results every time we run the simulation. The share of shots made should still be close to the expected value of 30%, though.
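
To see how close "close" is, here is a minimal sketch that repeats the simulation with more and more shots. The helper name `simulate_made_fraction` and the fixed seed are assumptions made for this illustration and are not part of the original notebook. The seed only makes the run repeatable.

```python
import random

def simulate_made_fraction(shot_percentage, shots_taken, rng):
    """Return the fraction of simulated shots that go in."""
    shots_made = sum(rng.random() < shot_percentage for _ in range(shots_taken))
    return shots_made / shots_taken

rng = random.Random(42)  # fixed seed (an assumption) so the run is repeatable

# Try the same 30% shooter with different numbers of shots
for shots_taken in [10, 100, 1000, 100000]:
    fraction = simulate_made_fraction(0.3, shots_taken, rng)
    print(f"{shots_taken:>6} shots: {fraction:.3f} made")
```

With only a few shots the fraction can be far from 0.3, while with many shots it tends to settle near 0.3.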

We can display the results of our simulation by using `.value_counts()`.

In [None]:
df.value_counts()
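
As a rough picture of what that count involves, the sketch below tallies the same kind of True/False list using Python's built-in `collections.Counter` instead of pandas. The seed is an assumption added only so the example is repeatable.

```python
import random
from collections import Counter

shot_percentage = 0.3
shots_taken = 100

rng = random.Random(7)  # seeded only so the example is repeatable
shots_made = [rng.random() < shot_percentage for _ in range(shots_taken)]

# Tally how many True (made) and False (missed) values are in the list
counts = Counter(shots_made)
for outcome in (True, False):
    share = counts[outcome] / shots_taken
    print(f"Shot Made = {outcome}: {counts[outcome]} shots ({share:.0%})")
```

In pandas, `df["Shot Made"].value_counts(normalize=True)` returns these shares directly instead of raw counts.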

Let's visualize that using a pie chart.

In [None]:
import plotly.express as px
px.pie(df, names="Shot Made", title="Shots Made")

Let's use some real shot percentages from NBA player statistics, specifically the 2024-2025 season, and visualize their field goal percentages.

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/Data-Dunkers/data/refs/heads/main/NBA/player/nba_player_stats_2024-2025.csv')
px.bar(df.sort_values('FG%'), x='Name', y='FG%', title='Field Goal Percentage')

We can also look at just Pascal Siakam's field goal percentage.

In [None]:
df[df['Name'] == 'Pascal Siakam']['FG%']

We can also display players who have similar field goal percentages, between 51% and 52%.

In [None]:
df[ (df['FG%'] > 51) & (df['FG%'] < 52) ]

Let's run a simulated game of Knockout (Bump) with these 14 players.

In [None]:
filtered_df = df[(df['FG%'] > 51) & (df['FG%'] < 52)]
players = dict(zip(filtered_df['Name'], filtered_df['FG%']))
remaining_players = list(players.keys())
results = []

def simulate_round(player1, player2):
    if random.random() < players[player1]/100:
        results.append([player1, True])
    else:
        results.append([player1, False])
        if random.random() < players[player2]/100:
            results.append([player2, True])
            results.append([player1, 'eliminated by ' + player2])
            remaining_players.remove(player1)
        else:
            results.append([player2, False])
            simulate_round(player1, player2)

current_players = remaining_players[0:2]
while len(remaining_players) > 1:
    simulate_round(current_players[0], current_players[1])
    current_players[0] = current_players[1]
    try:
        current_players[1] = remaining_players[remaining_players.index(current_players[0]) + 1]
    except IndexError:
        current_players[1] = remaining_players[0] # loop back to first player in the list

print(f'The winner after {len(results)} shots is {remaining_players[0]}')

Every time we run the simulation, we can get a different result. This is because each shot is decided randomly, weighted by that player's probability of making a shot.
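
One way to explore this is to play many games and count how often each player wins. The sketch below is only an illustration: the three players and their percentages are made up, and `play_knockout` is a hypothetical helper that follows the same rules as the cell above, written with loops instead of recursion.

```python
import random
from collections import Counter

# Hypothetical players and field goal percentages, made up for illustration
players = {"Player A": 51.2, "Player B": 51.5, "Player C": 51.8}

def play_knockout(players, rng):
    """Play one game with the same rules as above and return the winner."""
    remaining = list(players)
    shooter, chaser = remaining[0], remaining[1]
    while len(remaining) > 1:
        # The pair keeps shooting until the shooter scores or is knocked out
        while True:
            if rng.random() < players[shooter] / 100:
                break  # shooter made the shot and is safe
            if rng.random() < players[chaser] / 100:
                remaining.remove(shooter)  # chaser scored first
                break
        # The chaser shoots first next round, followed by the next player in line
        shooter = chaser
        next_index = (remaining.index(shooter) + 1) % len(remaining)
        chaser = remaining[next_index]
    return remaining[0]

rng = random.Random(2024)  # fixed seed (an assumption) so the run is repeatable
games = 10000
wins = Counter(play_knockout(players, rng) for _ in range(games))
for name, count in wins.most_common():
    print(f"{name}: won {count / games:.1%} of {games} games")
```

A single game tells us little, but the win shares over many games give an estimate of each player's chance of winning under these rules.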

We can also visualize the results of the simulation we ran.

In [None]:
import plotly.express as px
results_data = pd.DataFrame(results, columns=['Player', 'Result'])
px.bar(results_data[results_data['Result'] == True]['Player'].value_counts(), title='Knockout Shots Made by Player')

The simulation could even be adapted to have all of the players who play a certain position compete against each other.

In [None]:
new_df = df[df['POS'] == 'F']
players = dict(zip(new_df['Name'], new_df['FG%']))

remaining_players = list(players.keys())
results = []

def simulate_round(player1, player2):
    if random.random() < players[player1]/100:
        results.append([player1, True])
    else:
        results.append([player1, False])
        if random.random() < players[player2]/100:
            results.append([player2, True])
            results.append([player1, 'eliminated by ' + player2])
            remaining_players.remove(player1)
        else:
            results.append([player2, False])
            simulate_round(player1, player2)

current_players = remaining_players[0:2]
while len(remaining_players) > 1:
    simulate_round(current_players[0], current_players[1])
    current_players[0] = current_players[1]
    try:
        current_players[1] = remaining_players[remaining_players.index(current_players[0]) + 1]
    except IndexError:
        current_players[1] = remaining_players[0] # loop back to first player in the list

print(f'The winner after {len(results)} shots is {remaining_players[0]}')
new_results_data = pd.DataFrame(results, columns=['Player', 'Result'])
px.bar(new_results_data[new_results_data['Result'] == True]['Player'].value_counts(), title='Knockout Shots Made by Player')

## Questions

1. If you change the `shot_percentage` variable to 0.8 (80%), how would you expect the pie chart to look different compared to when it was 0.3?
2. Run the Knockout simulation cell multiple times. Why does the winner keep changing?
3. When we increase the number of players in the simulation, how does that change the chances of any specific player winning? Why?

---

### Online Access
You can run this notebook online using the following links:

*   [**Google Colab**](https://colab.research.google.com/github/Data-Dunkers/student/blob/main/activities/probability-basics.ipynb)
*   [**Callysto Hub**](https://hub.callysto.ca/jupyter/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2FData-Dunkers%2Fstudent&branch=main&subPath=activities/probability-basics.ipynb&depth=1)