## NumPy Basics
NumPy Array

In [2]:
import numpy as np
# Create a 1D numpy array
arr = np.array([1, 2, 3, 4, 5]) # NumPy arrays are similar to Python lists but are optimized for mathematical operations
print(arr)
# They allow vectorized operations: the expression below multiplies every element at once
mult = arr * 2
print(mult)

[1 2 3 4 5]
[ 2  4  6  8 10]
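
To make the contrast with plain Python lists concrete, here is a minimal sketch that applies `* 2` to both a list and an array. The names `py_list`, `np_arr`, `repeated`, `doubled`, and `doubled_list` are illustrative and do not appear in the original notebook.

```python
import numpy as np

py_list = [1, 2, 3, 4, 5]        # plain Python list (illustrative name)
np_arr = np.array(py_list)       # the same values as a NumPy array

# On a list, * repeats the sequence instead of multiplying each element
repeated = py_list * 2

# On an array, * is applied element-wise (vectorized)
doubled = np_arr * 2

# The list equivalent needs an explicit loop or comprehension
doubled_list = [x * 2 for x in py_list]

print(repeated)       # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
print(doubled)        # [ 2  4  6  8 10]
print(doubled_list)   # [2, 4, 6, 8, 10]
```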


Basketball Player Heights

In [3]:
heights = np.array([180, 175, 190, 185, 170])
print("Heights: ", heights)
# Accessing elements
print(heights[0])  # First element
print(heights[-1]) # Last element

# Array slicing (subsetting)
print(heights[1:4])  # Elements at indices 1, 2, and 3 (the stop index 4 is excluded)

# NumPy arrays are mutable: elements can be changed in place
heights[0] = 185  # Change the first element
print(heights)

Heights:  [180 175 190 185 170]
180
170
[175 190 185]
[185 175 190 185 170]
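
One consequence of in-place modification is worth a short sketch: a basic slice of a NumPy array is a view that shares memory with the original, so writing through the slice also changes the source array. The names `middle_view` and `middle_copy` are illustrative, and the value `999` is an arbitrary marker chosen only to make the change visible.

```python
import numpy as np

heights = np.array([180, 175, 190, 185, 170])

middle_view = heights[1:4]          # basic slice: a view onto heights
middle_copy = heights[1:4].copy()   # independent copy of the same elements

middle_view[0] = 999                # also changes heights[1]
middle_copy[1] = -1                 # leaves heights untouched

print(heights)       # [180 999 190 185 170]
print(middle_copy)   # [175  -1 185]
```

Calling `.copy()` is the usual way to get a subset that can be edited without affecting the original data.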


#### Subsetting NumPy Arrays
With NumPy, you can use more advanced techniques such as boolean indexing or masking.

In [4]:
# Boolean indexing
tall_players = heights[heights > 180]
print(tall_players)

# Combining conditions: wrap each condition in parentheses and join them with &
filtered_players = heights[(heights >= 175) & (heights <= 185)] # Players between 175 and 185 cm (inclusive)
print(filtered_players)

[185 190 185]
[185 175 185]
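
It can help to look at the mask itself. The sketch below, which assumes the `heights` values after the first element was changed to 185, prints the boolean array and shows the element-wise "or" (`|`) and "not" (`~`) operators alongside `&`. The names `mask`, `extremes`, `not_tall`, `n_tall`, and `tall_idx` are illustrative.

```python
import numpy as np

heights = np.array([185, 175, 190, 185, 170])

mask = heights > 180              # boolean array, one entry per element
print(mask)                       # [ True False  True  True False]
print(heights[mask])              # [185 190 185]

# | is element-wise "or", ~ is element-wise "not"; parentheses are required
extremes = heights[(heights < 175) | (heights > 185)]
not_tall = heights[~mask]

# Count the True entries and recover their positions
n_tall = np.count_nonzero(mask)
tall_idx = np.where(mask)[0]

print(extremes)   # [190 170]
print(not_tall)   # [175 170]
print(n_tall)     # 3
print(tall_idx)   # [0 2 3]
```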


#### 2D NumPy Arrays
A 2D array is simply an array of arrays (think of it as a matrix). Here's how you can create one for more complex data (e.g., baseball players' heights and weights):


In [5]:
# Creating 2D NumPy Arrays
data = np.array([[180, 75], 
                 [175, 80], 
                 [190, 85], 
                 [185, 70], 
                 [170, 65]])
print("2D Array:\n", data)

# Subsetting 2D NumPy Arrays
# Accessing the first row
print(data[0])  # First row (height, weight)

# Accessing the second column (weights)
print(data[:, 1])  # All rows, second column

# Accessing a single element with [row, column]
print(data[1, 1])  # Second row, second column (weight of player 2)

2D Array:
 [[180  75]
 [175  80]
 [190  85]
 [185  70]
 [170  65]]
[180  75]
[75 80 85 70 65]
80
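
As a small extension of the indexing above, this sketch inspects the array's dimensions and slices a block of rows. It reuses the same `data` values; the names `first_three`, `last_height`, and `col_as_2d` are illustrative.

```python
import numpy as np

data = np.array([[180, 75],
                 [175, 80],
                 [190, 85],
                 [185, 70],
                 [170, 65]])

print(data.ndim)    # 2 (number of dimensions)
print(data.shape)   # (5, 2), i.e. (rows, columns)

first_three = data[:3, :]    # rows 0 to 2, all columns
last_height = data[-1, 0]    # last row, first column
col_as_2d = data[:, [1]]     # indexing with a list keeps the column 2D

print(first_three)
print(last_height)       # 170
print(col_as_2d.shape)   # (5, 1)
```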


##### Baseball Data in 2D form

In [6]:
baseball_data = np.array([[180, 75, 0.300],
                          [175, 80, 0.350], 
                          [190, 85, 0.280], 
                          [185, 70, 0.320], 
                          [170, 65, 0.290]])
print(baseball_data)

# Extracting Information
batting_averages = baseball_data[:, 2] # subsetting the third column
print("Batting Averages:\n", batting_averages) 

# Applying conditions to 2D NumPy Arrays
good_batters = baseball_data[baseball_data[:, 2] > 0.300] # Setting a standard for good batters
print("Good Batters:\n",good_batters)

[[180.    75.     0.3 ]
 [175.    80.     0.35]
 [190.    85.     0.28]
 [185.    70.     0.32]
 [170.    65.     0.29]]
Batting Averages:
 [0.3  0.35 0.28 0.32 0.29]
Good Batters:
 [[175.    80.     0.35]
 [185.    70.     0.32]]
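
A row mask can also be combined with a column selection in a single indexing step. The sketch below is one way to do that under the same data; `mask`, `good_heights`, `good_hw`, and `best_row` are illustrative names, and `np.argmax` is used here only as a convenient way to locate the highest batting average.

```python
import numpy as np

baseball_data = np.array([[180, 75, 0.300],
                          [175, 80, 0.350],
                          [190, 85, 0.280],
                          [185, 70, 0.320],
                          [170, 65, 0.290]])

mask = baseball_data[:, 2] > 0.300       # True for rows with average above 0.300

good_heights = baseball_data[mask, 0]    # heights of the matching rows only
good_hw = baseball_data[mask, :2]        # heights and weights of the matching rows

# Row of the player with the highest batting average
best_row = baseball_data[np.argmax(baseball_data[:, 2])]

print(good_heights)   # [175. 185.]
print(good_hw)
print(best_row)
```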


#### 2D Arithmetic
NumPy supports arithmetic operations across entire arrays. You can add, subtract, multiply, or divide arrays element-wise.

In [7]:
# Increase everyone's weight by 10% (element-wise)
new_weights = baseball_data[:, 1] * 1.10
print("New Weights:", new_weights) # Every weight is scaled by a factor of 1.10

# Matrix multiplication (this is NOT element-wise)
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

# For 2D arrays, np.dot computes the matrix product (equivalent to A @ B)
C = np.dot(A, B)
print("Dot Product:\n", C)

New Weights: [82.5 88.  93.5 77.  71.5]
Dot Product:
 [[19 22]
 [43 50]]
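
Because `*` and the matrix product are easy to confuse, here is a short side-by-side sketch using the same `A` and `B`. The names `elementwise` and `matmul` are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B   # multiplies matching positions: A[i, j] * B[i, j]
matmul = A @ B        # matrix product: rows of A times columns of B

print(elementwise)    # [[ 5 12]
                      #  [21 32]]
print(matmul)         # [[19 22]
                      #  [43 50]]

# For 2D arrays, @ and np.dot give the same result
print(np.array_equal(matmul, np.dot(A, B)))   # True
```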


#### NumPy: Basic Statistics
NumPy offers several built-in functions to calculate statistical properties of data.

In [8]:
# Average Height
average_height = np.mean(baseball_data[:, 0])
print("Average Height:", average_height)

# Median Height
median_height = np.median(baseball_data[:, 0])
print("Median Height:", median_height)

# Standard deviation of heights
std_dev_height = np.std(baseball_data[:, 0])
print("Standard Deviation of Height:", round(std_dev_height,8))

# Variance of batting averages
variance_batting_avg = np.var(baseball_data[:, 2])
print("Variance of Batting Averages:", round(variance_batting_avg, 8))

# Quartiles: 25th and 75th percentiles of heights
percentile_25 = np.percentile(baseball_data[:, 0], 25)
percentile_75 = np.percentile(baseball_data[:, 0], 75)
print("25th Percentile of Heights:", percentile_25)
print("75th Percentile of Heights:", percentile_75)


Average Height: 180.0
Median Height: 180.0
Standard Deviation of Height: 7.07106781
Variance of Batting Averages: 0.000616
25th Percentile of Heights: 175.0
75th Percentile of Heights: 185.0
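
Two details of these functions are worth a sketch. First, the `axis` argument computes a statistic per column in one call. Second, `np.std` and `np.var` divide by N by default (population statistics); passing `ddof=1` divides by N - 1 instead, which gives the sample version. The names `col_means`, `pop_std`, and `sample_std` are illustrative.

```python
import numpy as np

baseball_data = np.array([[180, 75, 0.300],
                          [175, 80, 0.350],
                          [190, 85, 0.280],
                          [185, 70, 0.320],
                          [170, 65, 0.290]])

# axis=0 collapses the rows, leaving one mean per column
col_means = np.mean(baseball_data, axis=0)
print("Column means (height, weight, batting average):", col_means)

heights = baseball_data[:, 0]
pop_std = np.std(heights)             # default ddof=0: divide by N
sample_std = np.std(heights, ddof=1)  # ddof=1: divide by N - 1

print("Population std:", round(pop_std, 8))     # 7.07106781
print("Sample std:", round(sample_std, 8))      # 7.90569415
```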


#### Practice Problem: Baseball Team Data Analysis using NumPy
You have been provided with data for a baseball team that includes the following columns:

- Height (cm) of the player
- Weight (kg) of the player
- Batting Average of the player (a value between 0 and 1, where higher means better performance)

Your task is to use NumPy to perform the following:

- Step 1: Create the Data  
Create a 2D NumPy array called baseball_data with the following player data:

  | Player | Height (cm) | Weight (kg) | Batting Average |
  |--------|-------------|-------------|-----------------|
  | 1 | 180 | 75 | 0.300 |
  | 2 | 175 | 80 | 0.350 |
  | 3 | 190 | 85 | 0.280 |
  | 4 | 185 | 70 | 0.320 |
  | 5 | 170 | 65 | 0.290 |

- Step 2: Basic Array Operations  
Extract the heights of the players (first column).  
Extract the weights of the players (second column).  
Extract the batting averages (third column).  

- Step 3: Basic Statistics  
Calculate the mean height and median height of the players.  
Calculate the average batting average for the team.  
Find the standard deviation of the players' weights.  
Calculate the 25th percentile and 75th percentile of the batting averages.  

- Step 4: Manipulating Data  
Increase every player's weight by 5% (element-wise).  
Remove players with a batting average below 0.300, and store the remaining players' heights, weights, and batting averages in a new array good_batters.  
- Step 5: 2D Array Operations  
Find the average height and average weight of all players.  
Create a new 2D array where each player's height and weight are scaled by 10% (multiply by 1.10).

In [None]:
# Create NumPy Array
bb_data = np.array([[180, 75, 0.300],
                    [175, 80, 0.350],
                    [190, 85, 0.280],
                    [185, 70, 0.320],
                    [170, 65, 0.290]])

# Extract specific columns
ht = bb_data[:, 0]
print("Heights:\n", ht)
wt = bb_data[:, 1]
print("Weights:\n", wt)
ba = bb_data[:, 2]
print("Batting Averages:\n", ba)

# Basic Statistics
ht_mean = np.mean(ht)
print("Average Height:", ht_mean)
ht_median = np.median(ht)
print("Median Height:", ht_median)
ba_mean = np.mean(ba)
print("Team Batting Average:", ba_mean)
wt_std = np.std(wt)
print("Team Weight Standard Deviation:", wt_std)
percentile_25 = np.percentile(ba, 25)
print("25th Percentile of Batting Averages:", percentile_25)
percentile_75 = np.percentile(ba, 75)
print("75th Percentile of Batting Averages:", percentile_75)

# Data Manipulation
heavier = wt * 1.05  # Increase every weight by 5%
print("Weights +5%:", heavier)
good_batters = bb_data[ba >= 0.300]  # Keep full rows; drop averages below 0.300
print("Good Batters:\n", good_batters)

# 2D Array Operations
ht_mean = np.mean(ht)
print("Average Height:", ht_mean)
wt_mean = np.mean(wt)
print("Average Weight:", wt_mean)
scaled = bb_data.copy()  # New array, so bb_data itself stays unchanged
scaled[:, :2] *= 1.10    # Scale height and weight (first two columns) by 10%
print(scaled)

Heights:
 [180. 175. 190. 185. 170.]
Weights:
 [75. 80. 85. 70. 65.]
Batting Averages:
 [0.3  0.35 0.28 0.32 0.29]
Average Height: 180.0
Median Height: 180.0
Team Batting Average: 0.308
Team Weight Standard Deviation: 7.0710678118654755
25th Percentile of Batting Averages: 0.29
75th Percentile of Batting Averages: 0.32
Weights +5%: [78.75 84.   89.25 73.5  68.25]
Good Batters:
 [[180.    75.     0.3 ]
 [175.    80.     0.35]
 [185.    70.     0.32]]
Average Height: 180.0
Average Weight: 75.0
[[198.    82.5    0.3 ]
 [192.5   88.     0.35]
 [209.    93.5    0.28]
 [203.5   77.     0.32]
 [187.    71.5    0.29]]


#### Bonus Challenge
Use boolean indexing to find and print the heights of all players who are taller than 180 cm and have a batting average greater than 0.300.

In [22]:
tall_good = (bb_data[:, 0] > 180) & (bb_data[:, 2] > 0.300)  # Boolean mask over rows
tall_good_heights = bb_data[tall_good, 0]  # Heights only, as the task asks
print("Heights of Tall and Good Players:", tall_good_heights)

Heights of Tall and Good Players: [185.]
