![Data Dunkers Banner](https://github.com/Data-Dunkers/lessons/blob/main/images/top-banner.jpg?raw=true)


# Data Dunkers Lesson: Statistics

## Objectives

By the end of this lesson, students will be able to:
- Calculate basic statistical measures such as minimum, maximum, mean, and median using Python.
    - Example: Calculate the minimum, maximum, median, and mean number of games played by Raptors players.
- Manipulate and format statistical data for clear presentation. 
    - Example: Convert the median number of games from a float to an integer and round the mean to one decimal place.
- Generate comprehensive statistics for entire datasets using functions like `describe()`.
    - Example: Use the `describe()` function to display all available statistics for numeric columns in the Raptors dataset.
- Apply f-strings for advanced data formatting in Python to create clear and precise output.
    - Example: Use f-strings to format the output of statistical measures like minimum, maximum, mean, and median in a readable manner.

## Introduction

Statistics are used in many fields of study to investigate why things happen, when they occur, and whether their reoccurrence is predictable. Some everyday examples of how statistics are used include¹:

- **Biology**: Statistics can be used to analyze data from experiments and research studies in biology.
- **Business growth**: Statistics can be used to analyze sales data and other business metrics to identify trends and opportunities for growth.
- **Economics**: Statistics can be used to analyze economic data such as GDP, inflation rates, and unemployment rates.
- **Farming & gardening**: Statistics can be used to analyze crop yields and other agricultural data.
- **Groceries**: Statistics can be used to analyze sales data for grocery stores and other retailers.
- **Housing**: Statistics can be used to analyze housing data such as home prices and rental rates.
- **Infrastructure**: Statistics can be used to analyze data related to infrastructure such as traffic patterns and road conditions.
- **Medicine**: Statistics can be used to analyze medical data such as patient outcomes and drug efficacy.
- **Warranties**: Statistics can be used to analyze warranty claims data to identify trends and potential issues with products.
- **Website performance**: Statistics can be used to analyze website traffic data and user behavior.

1. Source: Conversation with Bing, 2023-07-10

---

Let's calculate some basic statistics, including mean, median, minimum, and maximum.

Recalling the names of the columns in the Raptors file:

| Column | Meaning | Column | Meaning |
|--------|---------|--------|---------|
| Age    | Player's age on February 1 of the season | 3P   | 3-Point Field Goals Per Game |
| Lg     | League  | 3PA    | 3-Point Field Goal Attempts Per Game |
| Pos    | Position| 3P%    | 3-Point Field Goal Percentage |
| G      | Games   | 2P     | 2-Point Field Goals Per Game |
| GS     | Games Started | 2PA | 2-Point Field Goal Attempts Per Game |
| MP     | Minutes Played Per Game | 2P% | 2-Point Field Goal Percentage |
| FG     | Field Goals Per Game | eFG% | Effective Field Goal Percentage* |
| FGA    | Field Goal Attempts Per Game | FT | Free Throws Per Game |
| FG%    | Field Goal Percentage | FTA | Free Throw Attempts Per Game |
| ORB    | Offensive Rebounds Per Game | FT% | Free Throw Percentage |
| DRB    | Defensive Rebounds Per Game | TRB | Total Rebounds Per Game |
| AST    | Assists Per Game | STL | Steals Per Game |
| BLK    | Blocks Per Game | TOV | Turnovers Per Game |
| PF     | Personal Fouls Per Game | PTS | Points Per Game |


<span style="font-size:10px">*This statistic adjusts for the fact that a 3-point field goal is worth one more point than a 2-point field goal.</span>

## min(), max(), median(), mean()

Let's import a data file about the Raptors 2023 season and use the `'G'` column to calculate the minimum, maximum, mean, and median number of games played.

In [None]:
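import pandas as pd

# Load the Raptors 2023 season data into a DataFrame
url = 'https://raw.githubusercontent.com/Data-Dunkers/data/main/NBA/raptors-2023.csv'
raptors_df = pd.read_csv(url)

# Smallest and largest number of games played
print('Minimum =', raptors_df['G'].min())
print('Maximum =', raptors_df['G'].max())

# int() drops any decimal part of the median
print('Median =', int(raptors_df['G'].median()))

# round(1) keeps one decimal place of the mean
print('Mean =', raptors_df['G'].mean().round(1))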

Notice how we used the `round()` method to round the mean to 1 decimal place?

We also converted the median from a float (a number with decimals) to an integer using the `int()` function, which drops the decimal part rather than rounding.
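To see the difference between `int()` and `round()` without downloading anything, here is a small sketch using Python's built-in `statistics` module. The list of games is made up for illustration and is not taken from the Raptors file.

```python
import statistics

# Hypothetical games-played values (not from the Raptors file)
games = [5, 12, 38, 45, 60, 72]

median_games = statistics.median(games)  # 41.5, halfway between 38 and 45
mean_games = statistics.mean(games)      # 232 / 6 = 38.666...

truncated_median = int(median_games)     # int() drops the decimal part
whole_mean = round(mean_games)           # round() goes to the nearest whole number
rounded_mean = round(mean_games, 1)      # keep one decimal place

print('Median =', median_games)
print('int(median) =', truncated_median)
print('round(mean) =', whole_mean)
print('round(mean, 1) =', rounded_mean)
```

Here `int()` turns 41.5 into 41, while `round()` turns 38.666... into 39, so the two are not interchangeable when the decimal part matters.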

## Exercise

What is the average age of the Raptors? 

In [None]:
# Write your program here


## Supplemental

### Average for all numeric columns

What if we want to look at the averages of *all* the numeric columns? Simple!

In [None]:
raptors_df.mean(numeric_only = True).round(2)
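As a rough illustration of what `numeric_only=True` does, the sketch below builds a tiny DataFrame by hand. The player labels and numbers are hypothetical, chosen only so the result is easy to check.

```python
import pandas as pd

# Hypothetical mini-table; labels and numbers are made up for illustration
sample_df = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Pos': ['PG', 'SF', 'C', 'SG'],
    'G': [70, 45, 12, 61],
    'Age': [24, 31, 22, 27],
})

# Text columns ('Player', 'Pos') are skipped; only 'G' and 'Age' are averaged
column_means = sample_df.mean(numeric_only=True).round(2)
print(column_means)
```

The result is a Series with one entry per numeric column, so the text columns never get in the way of the calculation.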

### All stats at once

Display all available statistics for all numeric columns (with `describe()`):

In [None]:
raptors_df.describe()

What if we only want to look at one column?

In [None]:
raptors_df['G'].describe()
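The output of `describe()` on a single column is itself a Series, so individual statistics can be picked out by label. A minimal sketch, using a made-up list of games rather than the Raptors data:

```python
import pandas as pd

# Hypothetical games-played values (not from the Raptors file)
games = pd.Series([5, 12, 38, 45, 60, 72], name='G')

summary = games.describe()
print(summary)

# The '50%' row is the median, so these two lines print the same value
print('Median from describe() =', summary['50%'])
print('Median from median() =', games.median())
```

The labels in the summary are `count`, `mean`, `std`, `min`, `25%`, `50%`, `75%`, and `max`.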

### f-strings

By using some advanced printing techniques in Python (called *f-strings*), we can format the output exactly the way we like:

In [None]:
print(f"Minimum = {raptors_df['G'].min():.0f}")

print(f"Maximum = {raptors_df['G'].max():.0f}")

print(f"Mean = {raptors_df['G'].mean():.1f}")

print(f"Median = {raptors_df['G'].median():.0f}")
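The part after the colon inside the braces is a format specifier: `.1f` means "one digit after the decimal point" and `.0f` means "no decimals". The sketch below tries a few specifiers on a made-up number; the variable names are hypothetical and the width-and-alignment line is an optional extra beyond what the cell above uses.

```python
# Hypothetical value, used only to show the effect of each format specifier
mean_games = 38.6667
label = 'Mean'

line_one_decimal = f"{label} = {mean_games:.1f}"  # one decimal place
line_no_decimal = f"{label} = {mean_games:.0f}"   # rounded to a whole number
# '<8' left-aligns the label in 8 characters, '>8.2f' right-aligns the number in 8
line_padded = f"{label:<8}{mean_games:>8.2f}"

print(line_one_decimal)
print(line_no_decimal)
print(line_padded)
```

Padding to a fixed width is handy when printing several statistics underneath each other, because the numbers line up in a column.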
