# NumPy

NumPy (short for Numerical Python) is a Python library for working with arrays of numbers.

## Why NumPy?

Python has several built-in collection types, including tuples, lists and sets, each with its own purpose and benefits. However, none of them is quite the array we got used to in C and C++.

You might think I'm kidding, because most of us hated arrays, didn't we? But here's the thing:

An array stores elements of only one type: an integer array holds only ints, and the same goes for arrays of floats and chars. Why does that matter?

Because sometimes you want to perform an operation on an entire array at once. For example, say you have a list storing the heights of the people in your class and you want the mean of that list. You'd have to write a function to do that, and before that you'd need to make sure the list contains only heights stored as floats.
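Here is roughly what that means in practice. `list_mean` is a hypothetical helper written only for this illustration: with a plain list you have to validate the element types yourself. A NumPy array, by contrast, always holds a single element type (`dtype`), converting mixed inputs when it is created, and `np.mean` works on it directly:

```python
import numpy as np

def list_mean(values):
    """Hypothetical helper: mean of a list, after checking every item is a number."""
    if not values:
        raise ValueError("cannot take the mean of an empty list")
    for v in values:
        if not isinstance(v, (int, float)):
            raise TypeError(f"expected a number, got {v!r}")
    return sum(values) / len(values)

heights = [1.74, 1.45, 1.58, 1.80]
print(list_mean(heights))

# A NumPy array has exactly one element type (dtype) shared by all elements
np_heights = np.array(heights)
print(np_heights.dtype)     # float64
print(np.mean(np_heights))  # same mean, no manual type checks or loop

# Mixed inputs are converted to one common type: here the ints become floats
print(np.array([1, 2.5, 3]).dtype)  # float64
```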

Still not convinced? Suppose you've collected more data and now have both the heights and the weights of the students, stored in a 2D list with two rows: the first holds heights and the second holds weights. Let's see:

```python
Data = [[1.74, 1.45, 1.58, 1.80],   # row 0: heights (m)
        [58.2, 60.0, 61.2, 63.2]]   # row 1: weights (kg)

heights = Data[0]
weights = Data[1]

print(heights)
print(weights)
```

```
[1.74, 1.45, 1.58, 1.8]
[58.2, 60.0, 61.2, 63.2]
```


Suppose you want the BMI (Body Mass Index) of every person:

$$\text{BMI} = \frac{\text{weight (kg)}}{\text{height (m)}^2}$$
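For example, for the first person in `Data` (height 1.74 m, weight 58.2 kg), the formula works out as:

$$\text{BMI} = \frac{58.2}{1.74^2} = \frac{58.2}{3.0276} \approx 19.22$$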

With plain lists, the obvious attempt fails:

```python
BMI = weights / heights ** 2
```

```
TypeError: unsupported operand type(s) for ** or pow(): 'list' and 'int'
```

Lists don't support element-wise arithmetic, so you have to loop over them yourself:

```python
BMI = []
for weight, height in zip(weights, heights):
    BMI.append(weight / height ** 2)

print(BMI)
```

```
[19.223147047166073, 28.53745541022592, 24.515302034930297, 19.506172839506174]
```


### With NumPy

```python
import numpy as np

# Convert the lists to NumPy arrays
np_heights = np.array(heights)
np_weights = np.array(weights)

bmi = np_weights / np_heights ** 2
print(bmi)
```

```
[19.22314705 28.53745541 24.51530203 19.50617284]
```
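The NumPy version works because arithmetic operators on arrays act element by element, while the same operators on lists mean something else entirely (`+` concatenates, `*` repeats). A small comparison with made-up values:

```python
import numpy as np

a_list = [1, 2, 3]
a_array = np.array([1, 2, 3])

print(a_list + a_list)    # [1, 2, 3, 1, 2, 3]  (concatenation)
print(a_list * 2)         # [1, 2, 3, 1, 2, 3]  (repetition)
print(a_array + a_array)  # [2 4 6]  (element-wise addition)
print(a_array * 2)        # [2 4 6]  (element-wise multiplication)
print(a_array ** 2)       # [1 4 9]  (element-wise power)
```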


### Why does that even matter?

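This was data for only four people. What if you had thousands, or even millions, of rows? Let's simulate 5,000 people, drawing heights (mean 1.75 m, standard deviation 0.20 m) and weights (mean 60.32 kg, standard deviation 15 kg) from normal distributions and rounding them to two decimals. Since the data is random, your numbers will differ from the outputs shown below.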

```python
# 5,000 heights (m) and weights (kg), rounded to 2 decimals
np_heights = np.round(np.random.normal(1.75, 0.20, 5000), 2)
np_weights = np.round(np.random.normal(60.32, 15, 5000), 2)
```
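If you'd like the same random numbers on every run (handy when comparing results), one option, not used in the original notebook, is NumPy's `Generator` API created with a fixed seed. The seed value 42 is arbitrary, and the names `heights_a` and `heights_b` are just for this illustration:

```python
import numpy as np

# Two generators created with the same seed produce identical draws
rng = np.random.default_rng(42)
heights_a = np.round(rng.normal(1.75, 0.20, 5000), 2)

rng = np.random.default_rng(42)
heights_b = np.round(rng.normal(1.75, 0.20, 5000), 2)

print(np.array_equal(heights_a, heights_b))  # True
print(heights_a.shape)                       # (5000,)
```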

```python
print(np_heights)
```

```
[1.48 2.07 2.06 ... 1.7  1.87 1.77]
```


```python
np_heights.shape
```

```
(5000,)
```
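The trailing comma in `(5000,)` is how Python writes a one-element tuple: the array has a single dimension of length 5,000. `shape` is one of several attributes that describe an array; a quick sketch on a small made-up 2-D array:

```python
import numpy as np

arr = np.array([[1.74, 1.45, 1.58],
                [58.2, 60.0, 61.2]])

print(arr.shape)  # (2, 3): 2 rows, 3 columns
print(arr.ndim)   # 2: number of dimensions
print(arr.size)   # 6: total number of elements
print(arr.dtype)  # float64
```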

```python
BMI = np_weights / np_heights ** 2
print(BMI)
```

```
[20.06482834 15.19522043  9.17381469 ... 30.11764706 18.68512111
 23.94905678]
```
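To see why this matters at scale, here is a rough timing sketch comparing the list-and-loop approach with the vectorized NumPy expression on one million simulated people. The sample size, the seed and the use of `time.perf_counter` are choices made for this illustration; the exact timings depend on your machine, but the vectorized version usually runs many times faster:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
heights = rng.normal(1.75, 0.20, n)   # simulated heights (m)
weights = rng.normal(60.32, 15, n)    # simulated weights (kg)

# Plain Python: convert to lists and loop over every pair
h_list, w_list = heights.tolist(), weights.tolist()
start = time.perf_counter()
bmi_loop = [w / h ** 2 for w, h in zip(w_list, h_list)]
loop_seconds = time.perf_counter() - start

# NumPy: one vectorized expression over the whole array
start = time.perf_counter()
bmi_vec = weights / heights ** 2
vec_seconds = time.perf_counter() - start

print(f"loop:       {loop_seconds:.4f} s")
print(f"vectorized: {vec_seconds:.4f} s")
print(np.allclose(bmi_loop, bmi_vec))  # True: both approaches give the same BMIs
```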


```python
# Combine into one 2-D array: column 0 = heights, column 1 = weights
np_data = np.column_stack((np_heights, np_weights))
```
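`np.column_stack` places 1-D arrays side by side as the columns of a single 2-D array. A small sketch with the four-person data from earlier shows the resulting shape and how to slice a row or a column back out:

```python
import numpy as np

heights = np.array([1.74, 1.45, 1.58, 1.80])
weights = np.array([58.2, 60.0, 61.2, 63.2])

data = np.column_stack((heights, weights))
print(data.shape)  # (4, 2): one row per person, one column per variable
print(data[0])     # first person's height and weight
print(data[:, 1])  # every weight (column 1)
```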

### More with NumPy

Calculating the mean and median:

```python
np.mean(np_heights)
```

```
1.7528599999999999
```

```python
np.median(np_heights)
```

```
1.75
```
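Both functions also accept an `axis` argument, which is handy for a 2-D array like `np_data`: `axis=0` computes the statistic down each column, giving one value per variable. A sketch with the four-person data from earlier, small enough to check by hand:

```python
import numpy as np

data = np.column_stack(([1.74, 1.45, 1.58, 1.80],   # heights (m)
                        [58.2, 60.0, 61.2, 63.2]))  # weights (kg)

col_means = np.mean(data, axis=0)      # [mean height, mean weight]
col_medians = np.median(data, axis=0)  # [median height, median weight]
print(col_means)
print(col_medians)
```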

### Correlation coefficient

(More on this in the stats notes.)

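The (Pearson) correlation coefficient is a number between -1 and 1 that measures how strongly two variables x and y, measured on the same records (e.g. the height and weight of each person), are linearly related. Values near 1 or -1 indicate a strong linear relationship; values near 0 indicate little or none.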

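```python
np.corrcoef(np_heights, np_weights)
```

```
array([[ 1.       , -0.0277154],
       [-0.0277154,  1.       ]])
```

The result is a 2×2 matrix. The diagonal entries are each variable's correlation with itself (always 1); the off-diagonal entries, about -0.03, are the correlation between heights and weights. That value is close to 0 because the two arrays were generated independently of each other.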
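`np.corrcoef` computes the Pearson correlation coefficient, r = cov(x, y) / (σx σy). A minimal sketch that computes it by hand and checks it against NumPy on small made-up data; `pearson_r` is a hypothetical helper written only for this illustration:

```python
import numpy as np

def pearson_r(x, y):
    """Hypothetical helper: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0: perfect positive linear relation
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0: perfect negative linear relation

y_noisy = [2.1, 3.9, 6.2, 7.8, 10.1]
print(pearson_r(x, y_noisy), np.corrcoef(x, y_noisy)[0, 1])  # the two values agree
```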