# Introduction to NumPy

NumPy is short for Numerical Python. A NumPy array is an alternative to a Python list that stores data more compactly and supports fast calculations over the whole array at once.

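However, a NumPy array holds only one data type. If the values passed in have different types, NumPy converts them all to a common type: integers mixed with floats become floats, and numbers mixed with strings become strings.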

In [4]:
import numpy as np

baseball = [180, 215, 210, 210, 188, 176, 209, 200]

np_baseball = np.array(baseball)

print(np_baseball)

[180 215 210 210 188 176 209 200]
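
To see how the single-data-type rule plays out, the following sketch inspects the `dtype` attribute of a few small arrays. The variable names and values are illustrative and are not part of the original notebook.

```python
import numpy as np

# All integers: the array keeps an integer dtype
ints = np.array([180, 215, 210])

# Integers mixed with a float: every value is upcast to float
mixed_numbers = np.array([180, 78.4])

# Numbers mixed with a string: every value becomes a string
mixed_text = np.array([180, 78.4, "tall"])

print(ints.dtype)           # an integer type; the exact width depends on the platform
print(mixed_numbers.dtype)  # float64
print(mixed_text.dtype)     # a Unicode string type
```

Checking `dtype` after building an array is a quick way to catch accidental conversion to strings, which would make arithmetic on the array fail.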


The next cell calculates BMI from height (in inches) and weight (in pounds). Comparing the bmi array with 21 produces a boolean array, light, that is True for every player with a BMI below 21. Using light as an index into bmi then selects only those players; in this data there is one.

In [17]:
height = [74, 74, 73, 74, 75, 77, 73, 74, 76, 74, 75]
weight = [190, 170, 208, 225, 190, 225, 185, 180, 165, 240, 220]

# Import numpy
import numpy as np

# Calculate the BMI: bmi
np_height_m = np.array(height) * 0.0254
np_weight_kg = np.array(weight) * 0.453592
bmi = np_weight_kg / np_height_m ** 2

# Create the light array
light = bmi < 21

# Print out light
print(light)

# Print out BMIs of all baseball players whose BMI is below 21 and a count
print(bmi[light])
print(np.count_nonzero(light))

[False False False False False False False False  True False False]
[20.0842081]
1
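
A boolean array like `light` can answer a few related questions as well. The sketch below uses a short, made-up `bmi` array (the values are assumptions for illustration, not the players above) to count the matches, find their positions, and select them.

```python
import numpy as np

# Illustrative BMI values, not taken from the baseball data
bmi = np.array([24.4, 20.1, 27.4, 19.8, 23.7])

# Boolean mask: True where the BMI is below 21
light = bmi < 21
print(light)                  # [False  True False  True False]

# True counts as 1, so the sum is the number of matches
print(light.sum())            # 2

# Positions of the True entries in the original array
print(np.flatnonzero(light))  # [1 3]

# Values selected by the mask
print(bmi[light])             # [20.1 19.8]
```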


Arithmetic operators (+, -, *, /) treat boolean values in a NumPy array as numbers: True is converted to 1 and False to 0.

In [4]:
import numpy as np

# these two additions produce the same result; a notebook displays only the last one

np.array([True, 1, 2]) + np.array([3, 4, False])

np.array([4, 3, 0]) + np.array([0, 2, 2])

array([4, 5, 2])
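
As a small check of the conversion described above, this sketch prints what a mixed boolean and integer list becomes, and confirms that the two additions from the cell really are equal. The names `mixed`, `first` and `second` are introduced here for illustration only.

```python
import numpy as np

# Booleans mixed with integers are stored as integers
mixed = np.array([True, 1, 2])
print(mixed)                          # [1 1 2]

# Multiplying a boolean array by a number also uses 1 and 0
print(np.array([True, False]) * 10)   # [10  0]

# The two additions from the cell above, compared element by element
first = np.array([True, 1, 2]) + np.array([3, 4, False])
second = np.array([4, 3, 0]) + np.array([0, 2, 2])
print(np.array_equal(first, second))  # True
```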

Subsetting NumPy arrays with square brackets works much as it does for lists. Arithmetic is different: adding two arrays of the same length adds them element by element, rather than joining them end to end as it would for lists.

In [6]:
import numpy as np

np_data_one = np.array([70, 60, 50, 40, 33, 22, 11, 9, 70, 60])

np_data_two = np.array([60, 10, 40, 50, 22, 11, 22, 40, 9, 68])

print(np_data_one[3])

print(np_data_two[4:9])

print(np_data_one + np_data_two)

40
[22 11 22 40  9]
[130  70  90  90  55  33  33  49  79 128]
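
The difference between list and array arithmetic is easiest to see side by side. This is a minimal sketch with made-up values; the names `list_one`, `list_two`, `arr_one` and `arr_two` are not from the original notebook.

```python
import numpy as np

list_one = [1, 2, 3]
list_two = [10, 20, 30]

# For lists, + joins the two lists and * repeats the list
print(list_one + list_two)  # [1, 2, 3, 10, 20, 30]
print(list_one * 2)         # [1, 2, 3, 1, 2, 3]

arr_one = np.array(list_one)
arr_two = np.array(list_two)

# For arrays, + and * work element by element
print(arr_one + arr_two)    # [11 22 33]
print(arr_one * 2)          # [2 4 6]
```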


2D arrays hold data in rows and columns. They can be created from an existing Python list of lists, and their elements are selected with square brackets in much the same way as for 1D arrays. The shape attribute reports the number of rows and columns, which is useful for understanding the structure of larger datasets.

In [1]:
# Create baseball, a list of lists
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D Numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball which is 4 rows 2 columns
print(np_baseball.shape)

<class 'numpy.ndarray'>
(4, 2)
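
Because `shape` is an attribute rather than a function, it is read without parentheses. The sketch below shows it alongside two related attributes, `ndim` and `size`, using the same four rows as the cell above; the name `grid` is chosen here for illustration.

```python
import numpy as np

grid = np.array([[180, 78.4],
                 [215, 102.7],
                 [210, 98.5],
                 [188, 75.2]])

# shape is a tuple, so it can be unpacked into rows and columns
n_rows, n_cols = grid.shape
print(n_rows, n_cols)  # 4 2

# Number of dimensions and total number of elements
print(grid.ndim)       # 2
print(grid.size)       # 8

# Transposing swaps rows and columns
print(grid.T.shape)    # (2, 4)
```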


To subset and slice a 2D array, use square brackets with the row index first and the column index second, separated by a comma. Remember that Python indexing starts at 0, and this applies to both rows and columns.

In [30]:
import numpy as np

# Create a 2D array
np_baseball = np.array([[180, 78.4],
                        [195, 77.7],
                        [210, 98.5],
                        [188, 75.2],
                        [182, 85.0],
                        [160, 43.0]])

# Print out the 3rd row of np_baseball
print(np_baseball[2,:])

# Print out the entire second column of np_baseball (the weights)
print(np_baseball[:,1])

# Print out height of 4th player
print(np_baseball[3,0])

# Convert height from centimetres to metres
height_m = np_baseball[:,0] / 100

# Print out BMI: weight in kg divided by height in metres, twice
print((np_baseball[:,1] / height_m) / height_m)

[210.   98.5]
[78.4 77.7 98.5 75.2 85.  43. ]
188.0
[24.19753086 20.43392505 22.33560091 21.27659574 25.66115203 16.796875  ]
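
The row and column rule can be written in a few equivalent ways. This sketch, on a small array named `table` (an illustrative name with values borrowed from the first three rows above), compares them and shows a slice over several rows.

```python
import numpy as np

table = np.array([[180, 78.4],
                  [195, 77.7],
                  [210, 98.5]])

# A single row: the column slice can be left out
print(np.array_equal(table[2], table[2, :]))  # True

# A single element: chained brackets and the comma form agree
print(table[2][0] == table[2, 0])             # True

# The first two rows, all columns
print(table[0:2, :].shape)                    # (2, 2)

# Negative indices count from the end: last row, second column
print(table[-1, 1])                           # 98.5
```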


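The np.random module can generate random data, and NumPy functions such as np.mean produce summary statistics for a dataset. Here 5000 heights and 5000 weights are drawn from normal distributions with a given mean and standard deviation, rounded to two decimals, and then stacked together as the two columns of one array. Because the data is random, the printed means change slightly on every run.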

In [49]:
import numpy as np
# np.random.normal(mean, standard deviation, number of samples), rounded to 2 decimals
height = np.round(np.random.normal(1.74, 0.25, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)

data = np.column_stack((height, weight))

print(np.mean(data[:,0]))
print(np.mean(data[:,1]))


1.7396040000000002
60.48364599999999
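
If repeatable numbers are wanted, one option is a seeded generator from `np.random.default_rng`. The sketch below is an assumption about how the cell could be adapted, not the original author's approach; the seed value 42 is arbitrary. It also prints a few more summary statistics.

```python
import numpy as np

# A seeded generator gives the same random numbers on every run
rng = np.random.default_rng(42)

# Same means, standard deviations and sample count as the cell above
height = np.round(rng.normal(1.74, 0.25, 5000), 2)
weight = np.round(rng.normal(60.32, 15, 5000), 2)

data = np.column_stack((height, weight))
print(data.shape)  # (5000, 2)

# Summary statistics for the height column
print(np.mean(data[:, 0]))
print(np.median(data[:, 0]))
print(np.std(data[:, 0]))

# Correlation between height and weight, as a 2 x 2 matrix
print(np.corrcoef(data[:, 0], data[:, 1]))
```

Since the two columns are generated independently, the off-diagonal correlation values should be close to zero.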


Finally, boolean indexing can be used to find the median height of goalkeepers and compare it with that of the other players (attackers, midfielders and defenders).

In [3]:
import numpy as np

heights = np.array([188, 177, 160, 140, 190, 176, 162, 141])
positions = np.array(['GK', 'A', 'M', 'D', 'GK', 'A', 'M', 'D'])

# Heights of the goalkeepers: gk_heights
gk_heights = heights[positions == 'GK']

# Heights of the other players: other_heights
other_heights = heights[positions != 'GK']

# Print out the median height of goalkeepers
print("Median height of goalkeepers: " + str(np.median(gk_heights)))

# Print out the median height of other players
print("Median height of other players: " + str(np.median(other_heights)))

Median height of goalkeepers: 189.0
Median height of other players: 161.0
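
The same boolean indexing can compare every position separately rather than goalkeepers against everyone else. This is a minimal sketch that loops over the distinct labels returned by `np.unique`; the `medians` dictionary is a name introduced here for illustration.

```python
import numpy as np

heights = np.array([188, 177, 160, 140, 190, 176, 162, 141])
positions = np.array(['GK', 'A', 'M', 'D', 'GK', 'A', 'M', 'D'])

# Median height for each distinct position label
medians = {}
for position in np.unique(positions):
    medians[str(position)] = float(np.median(heights[positions == position]))

print(medians)  # {'A': 176.5, 'D': 140.5, 'GK': 189.0, 'M': 161.0}
```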
