# Fun with NumPy
To get comfortable with the basics of NumPy, I will base my practice on the baseball exercises from [DataCamp](http://www.datacamp.com) in this section of the notebook. Specifically, I will demonstrate how well suited **NumPy** is to vector arithmetic.



## NumPy Basics

In [18]:
# Import the numpy package as np
import numpy as np

# Create a list of the heights (in inches) of some baseball players
baseball_height = [74, 74, 72, 72, 73, 69, 69, 71]

# Create a numpy array from baseball_height: np_baseball_height
np_baseball_height = np.array(baseball_height)

# Print out np_baseball_height
print(np_baseball_height)


[74 74 72 72 73 69 69 71]


The baseball height list is currently in inches, so let's convert it to meters.

In [19]:
# Convert the np_baseball_height values to meters: np_baseball_height_m
np_baseball_height_m = np_baseball_height * 0.0254

# Print np_baseball_height_m
print(np_baseball_height_m)


[1.8796 1.8796 1.8288 1.8288 1.8542 1.7526 1.7526 1.8034]
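
The reason this conversion is a one-liner is that NumPy applies arithmetic to every element of an array, whereas a plain Python list does not. The short sketch below contrasts the two; the variable names are illustrative only and are not used elsewhere in the notebook.

```python
import numpy as np

heights_in = [74, 74, 72]  # plain Python list (inches)

# Multiplying a list by an int repeats the list; it does not scale the values.
repeated = heights_in * 2

# A list comprehension is needed to scale a plain list element by element.
heights_m_list = [h * 0.0254 for h in heights_in]

# A NumPy array applies the multiplication to every element at once.
heights_m_array = np.array(heights_in) * 0.0254

print(repeated)         # [74, 74, 72, 74, 74, 72]
print(heights_m_array)
```

Note that multiplying a list by a float, as in `heights_in * 0.0254`, raises a `TypeError`, which is why the list must first be converted with `np.array`.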


Let's suppose the players now let us analyze their BMI. This requires not only each player's height but their weight as well.

In [20]:
# Create a list of the weights (in pounds) of some baseball players
baseball_weight = [180, 215, 210, 210, 188, 176, 209, 200]

# Create a numpy array from baseball_weight, converted to kilograms: np_baseball_weight_kg
np_baseball_weight_kg = np.array(baseball_weight) * 0.453592

# Print out np_baseball_weight_kg
print(np_baseball_weight_kg)

[81.64656  97.52228  95.25432  95.25432  85.275296 79.832192 94.800728
 90.7184  ]


Now that we have collected the players' weight data and stored it in a NumPy array in the right units, let's calculate each player's BMI (Body Mass Index).

In [21]:
# Calculate BMI: np_baseball_bmi
np_baseball_bmi = np_baseball_weight_kg / np_baseball_height_m ** 2

# Print out bmi
print(np_baseball_bmi)

[23.11037639 27.60406069 28.48080465 28.48080465 24.80333518 25.99036864
 30.86356276 27.89402921]
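
BMI is weight in kilograms divided by the square of height in meters. The expression above relies on Python's operator precedence: `**` binds more tightly than `/`, so the height is squared before the division. A minimal check using the first two players' values from the outputs above (the variable names here are illustrative):

```python
import numpy as np

weight_kg = np.array([81.64656, 97.52228])
height_m = np.array([1.8796, 1.8796])

# ** is evaluated before /, so this is weight / (height squared).
bmi = weight_kg / height_m ** 2

# The same calculation with the grouping made explicit.
bmi_explicit = weight_kg / (height_m ** 2)

print(bmi)
print(np.allclose(bmi, bmi_explicit))  # True
```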


Now I can gain some insight into the baseball players. For instance, perhaps I want to identify players that would be considered 'lightweight'.

In [22]:
# Create a boolean array identifying all the players with a BMI less than 24: baseball_light_bmi
baseball_light_bmi = np_baseball_bmi < 24

# Print out BMIs of all baseball players whose BMI is below 24.
print(np_baseball_bmi[baseball_light_bmi])

[23.11037639]
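
A boolean array like this can do more than select values. As a small sketch (using BMI values rounded from the output above, with illustrative variable names), the mask can be summed to count matches, turned into positions, or combined with `&` to express a range:

```python
import numpy as np

bmi = np.array([23.11, 27.60, 28.48, 28.48, 24.80, 25.99, 30.86, 27.89])

# Comparison produces one True/False per player.
light = bmi < 24

# True counts as 1, so the sum is the number of matching players.
n_light = int(light.sum())

# Positions (row indices) of the matching players.
light_idx = np.flatnonzero(light)

# Conditions are combined element-wise with & (parentheses are required).
in_range = bmi[(bmi >= 18.5) & (bmi < 25)]

print(n_light, light_idx, in_range)
```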


## 2D NumPy Arrays
The team's manager is interested in the results produced so far and the player insight that has been delivered, so he sends us four more players to include in our study. Let's use this opportunity to become more familiar with **NumPy** 2D arrays.

In [23]:
# Create a list of lists of the new players' weights (lbs) and heights (in): baseball_new_players
baseball_new_players = [[187, 70.4],
                        [222, 72.7],
                        [207, 68.5],
                        [169, 75.2]]

# Create a 2D numpy array from baseball_new_players: np_baseball_new_players
np_baseball_new_players = np.array(baseball_new_players)

# Print out the number of rows and columns of our 2D numpy array using the shape attribute of numpy arrays.
print(np_baseball_new_players.shape)

(4, 2)


In [24]:

print(np_baseball_new_players)

[[187.   70.4]
 [222.   72.7]
 [207.   68.5]
 [169.   75.2]]
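
With a 2D array, rows and columns are selected with `array[row, column]`, where `:` means "everything along this axis". A brief sketch using the same four players (the names on the left-hand side are illustrative):

```python
import numpy as np

players = np.array([[187, 70.4],
                    [222, 72.7],
                    [207, 68.5],
                    [169, 75.2]])

first_player = players[0]      # first row: [weight, height] of one player
weights_lbs = players[:, 0]    # first column: every player's weight
heights_in = players[:, 1]     # second column: every player's height
third_height = players[2, 1]   # single value: third player's height

print(first_player)
print(weights_lbs)
print(third_height)
```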


Now that we have a 2D **NumPy** array for our new players, before we go any further we will convert the weight and height measurements to their metric values (kilograms and meters). **NumPy** is very useful for this since it can easily perform element-wise calculations, here multiplying each column by its own conversion factor.

In [25]:
# Create a numpy array to store the metric conversion multipliers: metric_conversions (weight, height)
metric_conversions = np.array((0.453592, 0.0254))

# Create a numpy 2D array to store the product of the new players' weights & heights and the metric conversions: np_baseball_new_players_m
np_baseball_new_players_m = np_baseball_new_players * metric_conversions

print(np_baseball_new_players_m)

[[ 84.821704   1.78816 ]
 [100.697424   1.84658 ]
 [ 93.893544   1.7399  ]
 [ 76.657048   1.91008 ]]
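
This multiplication works because of NumPy broadcasting: an array of shape `(4, 2)` multiplied by an array of shape `(2,)` reuses the two multipliers for every row. The sketch below, with illustrative names and only two players, shows that the broadcast result matches converting each column separately:

```python
import numpy as np

players = np.array([[187, 70.4],
                    [222, 72.7]])          # columns: weight (lbs), height (in)
conversions = np.array([0.453592, 0.0254])  # lbs -> kg, in -> m

# Broadcasting: the (2,) array is applied to each row of the (2, 2) array.
broadcast = players * conversions

# Equivalent column-by-column conversion, stacked back together.
manual = np.column_stack((players[:, 0] * 0.453592,
                          players[:, 1] * 0.0254))

print(broadcast)
print(np.allclose(broadcast, manual))  # True
```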


To analyze all of our players together, both the original players and the new players, we must first combine the np_baseball_weight_kg and np_baseball_height_m **NumPy** arrays into one 2D array, using the same (weight, height) column order as the new players' array. We can then append the new players to create a single 2D array with all the data.

In [26]:
# Convert the original players' arrays to a numpy 2D array with columns (weight, height): np_baseball_2d
np_baseball_2d = np.column_stack((np_baseball_weight_kg, np_baseball_height_m))

# Combine original and new players data into one 2D Array: np_baseball_all
np_baseball_all = np.concatenate((np_baseball_2d, np_baseball_new_players_m), 0)

# Let's check the shape now to make sure there are 12 rows and 2 columns.
print(np_baseball_all.shape)

(12, 2)


## Statistics with NumPy
Getting to know our data is important to help deliver insight and business value. So let's get started with some number crunching for our data analysis.


Let's start by generating some summary statistics about our baseball players' data. A good place to start is to calculate some measures of central tendency and spread for our height and weight values.

In [27]:
# Calculate the median height of our players: med_baseball_height_m
med_baseball_height_m = round(np.median(np_baseball_all[:,1]),2)

# Calculate the average height of our players: avg_baseball_height_m
avg_baseball_height_m = round(np.mean(np_baseball_all[:,1]),2)

# Calculate the standard deviation for height of our players: std_baseball_height_m
std_baseball_height_m = round(np.std(np_baseball_all[:,1]),2)

print('Our baseball players sample has an average height of {0} meters and a median height of {1} meters, with a standard deviation of {2} meters.'
      .format(avg_baseball_height_m, med_baseball_height_m, std_baseball_height_m))

Our baseball players sample has an average height of 1.82 meters and a median height of 1.83 meters, with a standard deviation of 0.05 meters.


In [28]:
# Calculate the median weight of our players: med_baseball_weight_kg
med_baseball_weight_kg = round(np.median(np_baseball_all[:,0]),2)

# Calculate the average weight of our players: avg_baseball_weight_kg
avg_baseball_weight_kg = round(np.mean(np_baseball_all[:,0]),2)

# Calculate the standard deviation for weight of our players: std_baseball_weight_kg
std_baseball_weight_kg = round(np.std(np_baseball_all[:,0]),2)


print('Our baseball players sample has an average weight of {0} kilograms and a median weight of {1} kilograms, with a standard deviation of {2} kilograms.'
      .format(avg_baseball_weight_kg, med_baseball_weight_kg, std_baseball_weight_kg))

Our baseball players sample has an average weight of 89.7 kilograms and a median weight of 92.31 kilograms, with a standard deviation of 7.44 kilograms.


We can also use **NumPy** to look for relationships between our variables. For instance, we can use NumPy's corrcoef() function to see whether there is a relationship between the height and weight of our baseball player sample. The question might be: do taller players tend to be heavier?

In [29]:
# Print our correlation between height and weight.
print(np.corrcoef(np_baseball_all[:,1], np_baseball_all[:,0]))

[[ 1.         -0.17385421]
 [-0.17385421  1.        ]]
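
The off-diagonal entries of that matrix are the Pearson correlation coefficient between the two inputs; the diagonal is each variable's correlation with itself, which is always 1. As a rough sketch of what `np.corrcoef` computes, the coefficient can be rebuilt from deviations about the mean. The data below is a five-player subset of the original players, and the variable names are illustrative:

```python
import numpy as np

height_m = np.array([1.8796, 1.8288, 1.8542, 1.7526, 1.8034])
weight_kg = np.array([81.64656, 95.25432, 85.275296, 79.832192, 90.7184])

# Deviations of each value from its own mean.
h_dev = height_m - height_m.mean()
w_dev = weight_kg - weight_kg.mean()

# Pearson r: co-variation divided by the product of the two spreads.
r_manual = (h_dev * w_dev).sum() / np.sqrt((h_dev ** 2).sum() * (w_dev ** 2).sum())

# NumPy's result for the same pair of variables.
r_numpy = np.corrcoef(height_m, weight_kg)[0, 1]

print(r_manual, r_numpy)
```

Values near +1 or -1 indicate a strong linear relationship, while values near 0 indicate little linear relationship.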


## Data Generation
The team manager has found value in the insight we have been able to provide by simply capturing the data and calculating basic summary statistics for the sample of baseball players from his team. Now he wants us to include the whole team, or the population, in our analysis.


Since this notebook is for a fictitious scenario, we will utilize some of the data generation helper functions within NumPy to simulate the remainder of the players on the baseball team.

In [30]:
# Generate height data for the additional 13 players: baseball_addt_players_height
baseball_addt_players_height = np.round(np.random.normal(1.75, 0.20, 13), 2)

# Generate weight data for the additional 13 players: baseball_addt_players_weight
baseball_addt_players_weight = np.round(np.random.normal(60.32, 15, 13), 2)

# Create a 2D array for the additional players: np_baseball_addt_players
np_baseball_addt_players = np.column_stack((baseball_addt_players_weight, baseball_addt_players_height))

# Print additional players data.
print(np_baseball_addt_players)

[[88.1   1.8 ]
 [69.6   1.9 ]
 [16.36  1.84]
 [33.71  1.59]
 [84.03  1.79]
 [67.76  1.99]
 [75.66  1.73]
 [46.42  2.06]
 [89.    1.83]
 [80.43  1.93]
 [58.17  1.9 ]
 [62.42  1.9 ]
 [59.96  1.6 ]]
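
Because these values are random, rerunning the cell produces different players each time. If reproducible output is wanted, one option is a seeded generator from `np.random.default_rng`. This is a sketch of an alternative, not what the cell above does; the seed value 42 and the variable names are arbitrary choices.

```python
import numpy as np

# A seeded generator returns the same sequence of draws on every run.
rng = np.random.default_rng(42)
heights = np.round(rng.normal(1.75, 0.20, 13), 2)

# A second generator with the same seed reproduces the same heights.
rng_again = np.random.default_rng(42)
heights_again = np.round(rng_again.normal(1.75, 0.20, 13), 2)

print(np.array_equal(heights, heights_again))  # True
```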


Again, we want to work with one 2D Array instead of several different arrays, so we will combine this generated array with our 2D array for the team from our last example.

In [31]:
# Combine the existing players and the generated players into one 2D Array: np_baseball_team
np_baseball_team = np.concatenate((np_baseball_all, np_baseball_addt_players),0)

# Let's check the shape now to make sure there are 25 rows and 2 columns.
print(np_baseball_team.shape)

(25, 2)
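
The two combining functions used in this notebook work along different directions: `np.column_stack` places 1D arrays side by side as columns, while `np.concatenate` with axis 0 appends rows. Both only give meaningful results when every array uses the same column order. A small sketch with made-up illustrative values:

```python
import numpy as np

# Illustrative values only: three players, then two more.
weights = np.array([81.6, 97.5, 95.3])
heights = np.array([1.88, 1.88, 1.83])

# Side by side: two (3,) arrays become one (3, 2) array, columns (weight, height).
first_group = np.column_stack((weights, heights))

second_group = np.array([[84.8, 1.79],
                         [100.7, 1.85]])   # same (weight, height) column order

# Axis 0 appends rows: (3, 2) and (2, 2) become (5, 2).
combined = np.concatenate((first_group, second_group), axis=0)

# With consistent columns, axis=0 gives one statistic per column.
column_means = combined.mean(axis=0)

print(combined.shape)   # (5, 2)
print(column_means)
```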


## User Defined Functions with NumPy
Now that we have the whole team on board for our analysis, it might be useful to write some Python functions to help with it. This will save us from rewriting the same block of code every time we need a result.

A user-defined function that would help with this analysis is one that returns the BMI when you pass a player's height and weight into the function.

In [37]:
# Create a function to return BMI and BMI classification for a given person: get_bmi
def get_bmi(height_m, weight_kg):
    """Returns the calculated BMI and classification (tuple) based on the given height (meters)
    and weight (kilograms) parameters passed into the function."""

    # Calculate BMI: bmi
    bmi = weight_kg / height_m ** 2

    # Based on the NIH classify the BMI: bmi_class
    # Each upper bound is exclusive, so values such as 24.95 cannot fall between two classes.
    if bmi < 18.5:
        bmi_class = 'Underweight'
    elif bmi < 25.0:
        bmi_class = 'Normal'
    elif bmi < 30.0:
        bmi_class = 'Overweight'
    elif bmi < 40.0:
        bmi_class = 'Obesity'
    else:
        bmi_class = 'Extreme Obesity'

    # Return the calculated BMI and its classification as a tuple
    return (bmi, bmi_class)

# Let's call our function as a check
print(get_bmi(2.0, 97.5))

(24.375, 'Normal')
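
A function built on `if`/`elif` handles one player at a time. To classify a whole array of BMI values in one step, one possible approach is `np.digitize`, which returns the index of the bin each value falls into. The sketch below is an assumption-laden illustration rather than part of the original exercise: `classify_bmi` is a hypothetical helper, and the cut-offs 18.5, 25, 30 and 40 are treated as half-open intervals.

```python
import numpy as np

def classify_bmi(bmi_values):
    """Hypothetical helper: label each BMI value in an array."""
    # Bin edges; np.digitize returns i where edges[i-1] <= value < edges[i].
    edges = np.array([18.5, 25.0, 30.0, 40.0])
    labels = np.array(['Underweight', 'Normal', 'Overweight',
                       'Obesity', 'Extreme Obesity'])
    return labels[np.digitize(bmi_values, edges)]

height_m = np.array([2.0, 1.8796, 1.7526])
weight_kg = np.array([97.5, 81.64656, 94.800728])

# Element-wise BMI for all players, then one label per player.
bmi = weight_kg / height_m ** 2
classes = classify_bmi(bmi)

print(classes)
```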


Another useful pair of functions would be metric converters for height and weight, so we'll create them.

In [45]:
def convert_inches_to_meters(inch_measure):
    """Returns the metric conversion for inches to meters."""
    try:
        # Multiply inches by 0.0254 to convert from inches to meters.
        return inch_measure * 0.0254
    except TypeError:
        # Non-numeric input (for example a string) cannot be multiplied by a float.
        print('Please input a float or int value for inch_measure parameter.')

# Let's call our function as a check
print('72 inches or 6 feet is equal to {0} meters.'.format(convert_inches_to_meters(72.0)))

72 inches or 6 feet is equal to 1.8288 meters.


In [2]:
def convert_lbs_to_kg(lbs_measure):
    """Returns the metric conversion for US pounds (lbs) to kilograms (kg)."""
    try:
        # Multiply pounds by 0.453592 to convert from pounds (lbs) to kilograms (kg).
        return lbs_measure * 0.453592
    except TypeError:
        # Non-numeric input (for example a string) cannot be multiplied by a float.
        print('Please input a float or int value for lbs_measure parameter.')

# Let's call our function as a check
print('215 lbs is equal to {0} kg.'.format(convert_lbs_to_kg(215.0)))

215 lbs is equal to 97.52228 kg.
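
Since both converters are a single multiplication, they should also accept NumPy arrays and convert every element at once. The sketch below redefines simplified versions of the two functions (without the error handling) so that it runs on its own; the array names are illustrative.

```python
import numpy as np

def convert_inches_to_meters(inch_measure):
    """Simplified converter: works for a number or a NumPy array."""
    return inch_measure * 0.0254

def convert_lbs_to_kg(lbs_measure):
    """Simplified converter: works for a number or a NumPy array."""
    return lbs_measure * 0.453592

heights_in = np.array([74, 72, 69])
weights_lbs = np.array([180, 210, 176])

# The multiplication inside each function is applied element-wise.
heights_m = convert_inches_to_meters(heights_in)
weights_kg = convert_lbs_to_kg(weights_lbs)

print(heights_m)
print(weights_kg)
```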
