# M1L2 NumPy Data Challenge: Basketball Stats Analysis

## Scenario

Imagine you're analyzing basketball player statistics. Each player has several stats, such as points scored, rebounds, and assists. You'll use NumPy to store and manipulate this data.


## Learning Objectives

1. Create and manipulate NumPy arrays.
2. Work with multidimensional arrays (players as rows, stats as columns).
3. Perform mathematical operations on arrays.
4. Transpose arrays and understand their significance.
5. Learn about correlation and calculate it using NumPy.

### Step 1:  Import NumPy

In [3]:
# Import NumPy 
import numpy as np

### Step 2:  Create a 1D Array 

Create a 1D array to store the points scored by 5 players in a game.
Make up any 5 numbers you want, or research your favorite basketball team and use 5 real values.

In [28]:
points = np.array([23,30,16,20,38])
print(points)

[23 30 16 20 38]


In [29]:
points[::2]

array([23, 16, 38])
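
The slice `points[::2]` above uses NumPy's `start:stop:step` form with only the step filled in. As a minimal sketch on the same array, a few other common slices look like this (the variable names are illustrative, not part of the assignment):

```python
import numpy as np

points = np.array([23, 30, 16, 20, 38])

every_other = points[::2]        # start at index 0, take every 2nd element
first_three = points[:3]         # indices 0, 1, 2 (stop is exclusive)
last_two = points[-2:]           # negative indices count from the end
reversed_points = points[::-1]   # a step of -1 walks the array backwards

print(every_other)      # [23 16 38]
print(first_three)      # [23 30 16]
print(last_two)         # [20 38]
print(reversed_points)  # [38 20 16 30 23]
```

Basic slices like these return views of the original array rather than copies, so modifying a slice in place also changes `points`.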

### Step 3:  Create a 2D Array

Now, create a 2D array where:
- Each row represents a player (5 rows total).
- Each column represents a stat (e.g., points, rebounds, assists).

In [42]:
# Create a 2D array for player stats
# Example: [[points, rebounds, assists], ...]

player_stats = np.array([[23,5,10],
                        [21,9,2],
                        [34,8,7]])
print(player_stats)

[[23  5 10]
 [21  9  2]
 [34  8  7]]


In [51]:
# Average points (column 0), median rebounds (column 1),
# and standard deviation across every value in the array
average_point = np.average(player_stats[:, 0])
median_rebounds = np.median(player_stats[:, 1])
std = np.std(player_stats)
print(std)
print(median_rebounds)
print(average_point)

9.86326267607651
8.0
26.0


In [37]:
print(player_stats[0:2, 1:])

[[ 5 10]
 [ 9  2]]


### Step 4:  Perform Mathematical Operations

Calculate the total stats for each player (sum of points, rebounds, and assists).

**This is done for you; however, determine what `axis=1` does. Remove it, run the cell, then add it back in. This will be important for future code.**

In [15]:
# Calculate total stats for each player
total_stats = np.sum(player_stats, axis=1)
print(total_stats)

[38 32 49]
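
To make the effect of `axis` concrete, here is a small sketch that sums the same 3-player array three ways. The variable names `grand_total`, `per_stat`, and `per_player` are illustrative choices, not part of the assignment:

```python
import numpy as np

# Same 3-player array as above: each row is [points, rebounds, assists]
player_stats = np.array([[23, 5, 10],
                         [21, 9, 2],
                         [34, 8, 7]])

grand_total = np.sum(player_stats)         # no axis: every value added together
per_stat = np.sum(player_stats, axis=0)    # collapse the rows: one total per column (stat)
per_player = np.sum(player_stats, axis=1)  # collapse the columns: one total per row (player)

print(grand_total)  # 119
print(per_stat)     # [78 22 19]
print(per_player)   # [38 32 49]
```

One way to remember it: the axis you name is the one that disappears. `axis=1` removes the stat dimension, leaving one number per player.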


### Step 5:  Transpose the Array

Transpose the `player_stats` array so that rows become columns and vice versa. Use the NumPy documentation online to learn about the `transpose()` function.

In [17]:
# Transpose the array
transposed_stats = np.transpose(player_stats)
print(transposed_stats)

[[23 21 34]
 [ 5  9  8]
 [10  2  7]]
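
Because the array above is square (3 x 3), its shape looks the same after transposing. The sketch below uses only the first two players so the shape change is visible. It also uses the `.T` attribute, which for a 2D array gives the same result as `np.transpose()`. The name `first_two` is an illustrative assumption:

```python
import numpy as np

player_stats = np.array([[23, 5, 10],
                         [21, 9, 2],
                         [34, 8, 7]])

first_two = player_stats[:2]   # shape (2, 3): 2 players, 3 stats
transposed = first_two.T       # shape (3, 2): 3 stats, 2 players

print(first_two.shape)   # (2, 3)
print(transposed.shape)  # (3, 2)

# After transposing, each row holds one stat for all players,
# so row 0 of the transpose equals column 0 (points) of the original.
print(transposed[0])     # [23 21]
print(np.array_equal(transposed[0], first_two[:, 0]))  # True
```

Transposing is useful when a function expects each variable in its own row, which is how `np.corrcoef()` treats a single 2D input by default.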


### Step 6:  Correlation 

Correlation measures the relationship between two variables. For example, you can calculate the correlation between points scored and assists.

Use the `np.corrcoef()` function to calculate the correlation between two columns in the `player_stats` array.

In [18]:
# Calculate correlation between points and assists -- what does the output mean?

correlation = np.corrcoef(player_stats[:, 0], player_stats[:, 2])
print(correlation)

[[1.         0.28278381]
 [0.28278381 1.        ]]
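
To read the output: `np.corrcoef()` returns a 2 x 2 matrix. The diagonal entries are each variable's correlation with itself (always 1), and the off-diagonal entries are the Pearson correlation between points and assists. As a rough sketch, the same number can be reproduced by hand from the deviations around each mean. The names `dp`, `da`, `r_manual`, and `r_numpy` are illustrative:

```python
import numpy as np

player_stats = np.array([[23, 5, 10],
                         [21, 9, 2],
                         [34, 8, 7]])
points = player_stats[:, 0]
assists = player_stats[:, 2]

# Pearson r: sum of products of deviations, divided by the
# square root of the product of the summed squared deviations
dp = points - points.mean()
da = assists - assists.mean()
r_manual = np.sum(dp * da) / np.sqrt(np.sum(dp**2) * np.sum(da**2))

# The off-diagonal entry of the matrix is the same coefficient
r_numpy = np.corrcoef(points, assists)[0, 1]

print(round(r_manual, 4))             # 0.2828
print(np.isclose(r_manual, r_numpy))  # True
```

A value near 0.28 suggests a weak positive relationship, but with only three players the estimate is very unstable and should not be read as a real trend.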


## Above and Beyond (Optional Challenge)

### AAB Question 1:  Find the Player with the Best Stat

Task:
Using the `player_stats` array (where rows represent players and columns represent stats), find the player who has the highest total stats (sum of all stats for each player).

1) Calculate the total stats for each player (you may have already done this in a previous step).
2) Use NumPy to find the index of the player with the highest total stats.
3) Print the index of the player and their total stats.


Hint: Use `np.sum()` to calculate totals and `np.argmax()` to find the index of the maximum value.


In [55]:
# Player stats array
player_stats = np.array([
    [25, 10, 5],
    [18, 7, 8],
    [30, 12, 4],
    [22, 9, 6],
    [15, 5, 7]
])
average_point = np.average(player_stats[:, 0])
median_rebounds = np.median(player_stats[:, 1])
std_assist = np.std(player_stats[:, 2])
print(std_assist)
print(median_rebounds)
print(average_point)
# Calculate total stats for each player
total_stats = np.sum(player_stats, axis=1)
# 75th percentile across every value in the array
percentile_75 = np.percentile(player_stats, 75)
print(percentile_75)

# Find the index of the player with the highest total stats
best_player_index = np.argmax(total_stats)


# Print the result
print(total_stats)
print(best_player_index)
print(f"Player with the highest total stats: Player {best_player_index + 1}")
print(f"Total stats: {total_stats[best_player_index]}")

1.4142135623730951
9.0
22.0
16.5
[40 33 46 37 27]
2
Player with the highest total stats: Player 3
Total stats: 46
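
`np.argmax()` returns only the single best index. If a full ranking is wanted, one possible extension (not required by the task) is `np.argsort()`, which returns the indices that would sort the totals. The sketch below assumes the same 5-player array. The name `ranking` is illustrative:

```python
import numpy as np

player_stats = np.array([
    [25, 10, 5],
    [18, 7, 8],
    [30, 12, 4],
    [22, 9, 6],
    [15, 5, 7]
])

total_stats = player_stats.sum(axis=1)   # [40 33 46 37 27]

# argsort sorts ascending, so reverse it to go from highest to lowest
ranking = np.argsort(total_stats)[::-1]

for rank, idx in enumerate(ranking, start=1):
    print(f"{rank}. Player {idx + 1}: {total_stats[idx]}")
```

The first entry of `ranking` matches the index found by `np.argmax()` here because no two players share the same total.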


In [57]:
# Dealing with outliers
# Calculate Q1 and Q3 across every value in the array
Q1 = np.percentile(player_stats, 25)
Q3 = np.percentile(player_stats, 75)
# Calculate the IQR
IQR = Q3 - Q1
# Calculate the lower and upper bounds
lower_bound = Q1 - (1.5 * IQR)
upper_bound = Q3 + (1.5 * IQR)
print(lower_bound)
print(upper_bound)

-8.5
31.5
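
The bounds above pool points, rebounds, and assists into a single distribution, even though the three stats are on different scales. A possible refinement, sketched here as an assumption rather than part of the original exercise, is to compute the quartiles per stat with `axis=0` and then flag outliers with a boolean mask:

```python
import numpy as np

player_stats = np.array([
    [25, 10, 5],
    [18, 7, 8],
    [30, 12, 4],
    [22, 9, 6],
    [15, 5, 7]
])

# axis=0 gives one quartile per column (points, rebounds, assists)
q1 = np.percentile(player_stats, 25, axis=0)
q3 = np.percentile(player_stats, 75, axis=0)
iqr = q3 - q1

lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Broadcasting compares every row against the per-column bounds
is_outlier = (player_stats < lower) | (player_stats > upper)

print(lower)             # [7.5 2.5 2. ]
print(upper)             # [35.5 14.5 10. ]
print(is_outlier.any())  # False
```

For this small dataset no value falls outside its column's bounds, so the mask is `False` everywhere. Indexing with `player_stats[is_outlier]` would return the flagged values if any existed.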
