# Numpy

Numpy is a Python package for efficient numerical computing in data science. Learn to work with the Numpy array, a faster and more powerful alternative to the list for numerical data, and take your first steps in data exploration.

## Your First Numpy Array

In this chapter, we're going to dive into the world of baseball. Along the way, you'll get comfortable with the basics of Numpy, a powerful package to do data science. 

A list `baseball` has already been defined in the Python script, representing the height of some baseball players in centimeters. Can you add some code here and there to create a Numpy array from it?

In [3]:
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np
import numpy as np

# Import the pandas package as pd
import pandas as pd

# Create a Numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

<class 'numpy.ndarray'>
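
As a small illustration of why the array is often more convenient than the list for numerical work, the sketch below contrasts the `*` operator on the two types. The names `doubled_list` and `doubled_array` are illustrative only and are not part of the exercise.

```python
import numpy as np

baseball = [180, 215, 210, 210, 188, 176, 209, 200]
np_baseball = np.array(baseball)

# On a list, * repeats the whole list
doubled_list = baseball * 2
print(len(doubled_list))      # 16

# On an array, * multiplies every element
doubled_array = np_baseball * 2
print(doubled_array[:3])      # [360 430 420]

# Arrays also provide aggregate methods such as mean()
print(np_baseball.mean())     # 198.5
```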


## Baseball players' height

You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is loaded below from the file `SOCR_MLB.csv`. The height is expressed in inches, in the column `Height(inches)`. Can you make a Numpy array out of it and convert the units to meters?

In [4]:
# Load data
socr = 'SOCR_MLB.csv'
data = pd.read_csv(socr)
list(data)

['Name', 'Team', 'Position', 'Height(inches)', 'Weight(pounds)', 'Age']
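
One detail worth checking before building arrays from a DataFrame is what a column selection returns. The sketch below uses a hypothetical three-row DataFrame named `sample` (a stand-in, not the MLB data) to show that double brackets give a two-dimensional DataFrame, while single brackets give a Series that converts to a one-dimensional array.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for the MLB data
sample = pd.DataFrame({'Height(inches)': [74, 72, 76],
                       'Weight(pounds)': [180, 215, 210]})

# Double brackets return a DataFrame (2-D)
as_frame = sample[['Height(inches)']]
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)

# Single brackets return a Series; np.array turns it into a 1-D array
as_array = np.array(sample['Height(inches)'])
print(type(as_array).__name__, as_array.shape)    # ndarray (3,)
```

The one-dimensional form is usually the one you want for element-wise arithmetic and boolean subsetting.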

In [5]:
# Create array from height with correct units: np_height_m
# Single brackets select a Series, which np.array turns into a 1-D array
np_height = np.array(data['Height(inches)'])
np_height_m = np_height * 0.0254

## Baseball players' BMI

The MLB also offers to let you analyze their weight data. Both measurements are available as columns of `data`: `Height(inches)` is in inches and `Weight(pounds)` is in pounds.

It's now possible to calculate the BMI of each baseball player, defined as weight in kilograms divided by the square of height in meters. The code that converts the height to a Numpy array with the correct units has already been run above. Follow the instructions step by step and finish the game!

In [8]:
# Create array from weight with correct units: np_weight_kg
np_weight = np.array(data['Weight(pounds)'])
np_weight_kg = np_weight * 0.453592

# Calculate the BMI: bmi
bmi = np_weight_kg / np_height_m ** 2
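
To make the calculation concrete, here is a minimal sketch on a hypothetical sample of three players. The arrays `height_in` and `weight_lb` and their values are made up for illustration. It applies the formula BMI = weight in kg / (height in m)², element by element.

```python
import numpy as np

# Hypothetical sample: heights in inches, weights in pounds
height_in = np.array([74, 72, 76])
weight_lb = np.array([180, 215, 210])

# Convert to metric units
height_m = height_in * 0.0254
weight_kg = weight_lb * 0.453592

# Element-wise BMI: kilograms divided by meters squared
bmi_sample = weight_kg / height_m ** 2
print(bmi_sample.round(2))
```

Because both inputs are one-dimensional arrays of the same length, the result is also a one-dimensional array with one BMI per player.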

## Lightweight baseball players

To subset both regular Python lists and Numpy arrays, you can use square brackets:

```python
x = [4, 9, 6, 3, 1]
x[1]
import numpy as np
y = np.array(x)
y[1]
```

For Numpy specifically, you can also use boolean Numpy arrays:

```python
high = y > 5
y[high]
```
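
To see what happens at each step of boolean subsetting, the following sketch prints the intermediate mask for the same small array. It is an illustration only, and the final `sum()` call is an extra not used in the exercise.

```python
import numpy as np

y = np.array([4, 9, 6, 3, 1])

# The comparison yields one boolean per element
high = y > 5
print(high)            # [False  True  True False False]

# Indexing with the mask keeps only the True positions
print(y[high])         # [9 6]

# True counts as 1, so sum() counts the matches
print(high.sum())      # 2
```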

The code that calculates the BMI of all baseball players is already included. Follow the instructions and reveal interesting things from the data!

In [9]:
# Create the light array
light = bmi < 21

# Print out BMIs of all baseball players whose BMI is below 21
print(bmi[light])


## Subsetting Numpy Arrays

You've seen it with your own eyes: Python lists and Numpy arrays sometimes behave differently. Luckily, there are still certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same. To see this for yourself, try the following lines of code in the IPython Shell:

```python
x = ["a", "b", "c"]
x[1]

np_x = np.array(x)
np_x[1]
```

The code below assumes that `numpy` is imported as `np` and that `np_height` and `np_weight` hold the height and weight of the MLB players as Numpy arrays.


In [None]:
# Print out the weight at index 50
print(np_weight[50])

# Print out sub-array of np_height: index 100 up to and including index 110
print(np_height[100:111])
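
The slice bounds are easy to get wrong, since the stop index is excluded. As a quick check, the sketch below uses a hypothetical array `values` whose entries equal their own indices, so the printed numbers show exactly which positions a slice covers.

```python
import numpy as np

# Hypothetical array whose values equal their indices
values = np.arange(200)

# A single index returns one element
print(values[50])               # 50

# A slice includes the start and excludes the stop
window = values[100:111]
print(window[0], window[-1])    # 100 110
print(len(window))              # 11
```

This is why "index 100 up to and including index 110" is written as `[100:111]`.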

In [10]:
# Print out the shape of data
print(data.shape)

(1034, 6)
