# Numpy

In this lecture, we will cover the basics of Numpy (a Python library) and how to work with arrays. While as an analyst you will rarely work with Numpy directly, it is the basic building block that makes the other Python libraries incredibly useful.

In this section we will learn about:
    
    1) Basics of Numpy Arrays
    2) Numpy Array Built-In Methods
    3) Grabbing part of the array
    4) Selecting part of array based on criteria
    5) Mathematical Operations on Arrays
___

# 1) Basics of Numpy Arrays

The first step to using Numpy is installing the Python library. To do so, we need to "pip install" it. pip is Python's package manager and is normally installed along with Python.

To install a package, the syntax is pip install [Package Name]

In [1]:
# running this cell will install the package for you
# (keep comments on their own line: pip reads a trailing '#' as part of the requirement)
pip install numpy

Note: you may need to restart the kernel to use updated packages.


To check that it was installed correctly, we can import it. After installing a package, it is a good idea to remove the install cell and then restart the kernel.

When we import a module, we can also give it a different name.



In [2]:
import numpy as np # if this runs without error you should be good to go

In the cell above we import numpy but name it np. So when we want to use the Numpy methods, we type np instead, which is faster.

You can choose any name, but for the more popular packages people typically use the same shorthand across the community.

Examples:

    Numpy              --> np
    pandas             --> pd
    matplotlib.pyplot  --> plt
    seaborn            --> sns

Here is an example of accessing Numpy's constant for pi

In [3]:
np.pi

3.141592653589793

## Numpy Arrays

Numpy arrays are the star of the package and the main reason people use it. We went over lists earlier; a Numpy array can be created from a list, and the package adds many functions and ways of working with the data that make data analysis easier.

So let's begin by creating your first Numpy array

In [4]:
list_1D = [1,2,3]
my_list = np.array(list_1D)
my_list

array([1, 2, 3])

We can make 2D arrays as well

In [5]:
list_2D = [[1,2,3],[4,5,6],[7,8,9]]
my_array = np.array(list_2D)
my_array

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
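To check how many dimensions an array has, we can look at its `ndim`, `shape`, and `size` attributes. The cell below is a small sketch; the variable names `arr_1d` and `arr_2d` are only for illustration.

```python
import numpy as np

# A 1D array built from a flat list
arr_1d = np.array([1, 2, 3])
# A 2D array built from a list of lists (3 rows, 3 columns)
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr_1d.ndim, arr_1d.shape)  # 1 (3,)
print(arr_2d.ndim, arr_2d.shape)  # 2 (3, 3)
print(arr_2d.size)                # 9 (total number of elements)
```

The shape is always a tuple with one entry per dimension, which is why the 1D shape is written as `(3,)`.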

# 2) Numpy Array Built-In Methods
While this might look exactly like a list, note that Python treats it as an entirely different data structure. Along with the Numpy array data structure, we get built-in methods that are ready to use

## .arange()
We can create an evenly spaced array of values in a given range, with an optional step size. The end value is not included.

In [6]:
print("Default Step Size:", np.arange(0,10))
print("Set Step Size:    ", np.arange(0,10,2))

Default Step Size: [0 1 2 3 4 5 6 7 8 9]
Set Step Size:     [0 2 4 6 8]


## .linspace()

np.linspace is similar to np.arange, but instead of a step size we give it the number of points we want. It divides the range into n evenly spaced values, includes the end value, and returns decimals/floats.

In [7]:
np.linspace(0,10,50)

array([ 0.        ,  0.20408163,  0.40816327,  0.6122449 ,  0.81632653,
        1.02040816,  1.2244898 ,  1.42857143,  1.63265306,  1.83673469,
        2.04081633,  2.24489796,  2.44897959,  2.65306122,  2.85714286,
        3.06122449,  3.26530612,  3.46938776,  3.67346939,  3.87755102,
        4.08163265,  4.28571429,  4.48979592,  4.69387755,  4.89795918,
        5.10204082,  5.30612245,  5.51020408,  5.71428571,  5.91836735,
        6.12244898,  6.32653061,  6.53061224,  6.73469388,  6.93877551,
        7.14285714,  7.34693878,  7.55102041,  7.75510204,  7.95918367,
        8.16326531,  8.36734694,  8.57142857,  8.7755102 ,  8.97959184,
        9.18367347,  9.3877551 ,  9.59183673,  9.79591837, 10.        ])
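To see the difference between the two functions side by side, here is a small sketch on the range 0 to 1. The names `by_step` and `by_count` are only for illustration.

```python
import numpy as np

# arange: we choose the step size; the stop value (1) is excluded
by_step = np.arange(0, 1, 0.25)

# linspace: we choose how many points; the stop value (1) is included
by_count = np.linspace(0, 1, 5)

print(by_step.tolist())   # [0.0, 0.25, 0.5, 0.75]
print(by_count.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
```

A rough rule of thumb: reach for arange when you know the step, and linspace when you know how many values you need.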

## Common Matrix Creation

We can create an array with n zeros

In [8]:
np.zeros(5)

array([0., 0., 0., 0., 0.])

We can give it a dimension and create a matrix of zeros

In [9]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

Instead of zeros, we can do the same thing but with a matrix of ones

In [10]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

We can also make an identity matrix (all zeros except for ones on the diagonal)

In [11]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
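Notice that the outputs above show `0.` and `1.` with a decimal point: these functions create floats by default. As a small sketch, the shape is passed as a (rows, columns) tuple and the optional `dtype` argument asks for a different element type. The variable names here are only for illustration.

```python
import numpy as np

# Shape is (rows, columns); the default element type is float64
zeros_float = np.zeros((2, 3))

# dtype=int asks for whole numbers instead
zeros_int = np.zeros((2, 3), dtype=int)

print(zeros_float.shape)   # (2, 3)
print(zeros_float.dtype)   # float64
print(zeros_int.tolist())  # [[0, 0, 0], [0, 0, 0]]
```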

## Merging Arrays

If we have two Numpy arrays, we can join them end to end with np.concatenate

<img src="../jupyter-notebook-images/concat.jpg" style="width: 50%"/>

In [25]:
x = np.array([1,2,3])
y = np.array([4,5,6])

concat = np.concatenate((x,y))

print("Array 1", x)
print("Array 2", y)
concat

Array 1 [1 2 3]
Array 2 [4 5 6]


array([1, 2, 3, 4, 5, 6])

We can also stack arrays vertically using vstack

<img src="../jupyter-notebook-images/vstack.jpg" style="width: 50%"/>

In [24]:
x = np.array([1,2,3])
y = np.array([4,5,6])

concat = np.vstack((x,y))

print("Array 1", x)
print("Array 2", y)
concat

Array 1 [1 2 3]
Array 2 [4 5 6]


array([[1, 2, 3],
       [4, 5, 6]])

We can also join arrays horizontally using hstack

<img src="../jupyter-notebook-images/hstack.jpg" style="width: 50%"/>

In [22]:
a = np.array([[1],[2],[3]])
b = np.array([[4],[5],[6]])

concat = np.hstack((a,b))

print("Array 1", a)
print("Array 2", b)
concat


Array 1 [[1]
 [2]
 [3]]
Array 2 [[4]
 [5]
 [6]]


array([[1, 4],
       [2, 5],
       [3, 6]])
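For 2D arrays, np.concatenate can do both jobs through its `axis` argument. The sketch below uses two illustrative one-row arrays, `top` and `bottom`, to show that axis=0 behaves like vstack and axis=1 behaves like hstack.

```python
import numpy as np

top = np.array([[1, 2, 3]])     # shape (1, 3)
bottom = np.array([[4, 5, 6]])  # shape (1, 3)

# axis=0 joins along rows (one on top of the other)
stacked_rows = np.concatenate((top, bottom), axis=0)
# axis=1 joins along columns (side by side)
stacked_cols = np.concatenate((top, bottom), axis=1)

print(stacked_rows.tolist())  # [[1, 2, 3], [4, 5, 6]]
print(stacked_cols.tolist())  # [[1, 2, 3, 4, 5, 6]]

# For these 2D inputs the results match vstack and hstack
print(np.array_equal(stacked_rows, np.vstack((top, bottom))))  # True
print(np.array_equal(stacked_cols, np.hstack((top, bottom))))  # True
```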

## Max, Min, Avg, Median

In [58]:
arr = [1,2,3,5,14,17,20]

Get the MAX value in array

In [60]:
np.max(arr)

20

Get the MIN value in array

In [61]:
np.min(arr)

1

Get the average or mean

In [62]:
np.mean(arr)

8.857142857142858

Get the median value

In [63]:
np.median(arr)

5.0
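These functions also work on 2D arrays. As a small sketch (the array `scores` is made up for illustration), the `axis` argument controls whether we summarize the whole array, each column, or each row.

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])

# No axis: one value for the whole array
print(np.max(scores))                    # 6

# axis=0: collapse the rows, giving one value per column
print(np.max(scores, axis=0).tolist())   # [4, 5, 6]

# axis=1: collapse the columns, giving one value per row
print(np.mean(scores, axis=1).tolist())  # [2.0, 5.0]
```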

# 3) Grabbing part of the array

Getting part of an array is very similar to slicing a list

In [27]:
arr = np.array([1,2,3,4,5])

Grabbing one value

In [28]:
arr[2]

3

Grabbing the values between two indexes (the end index is not included)

In [29]:
arr[1:3]

array([2, 3])

Grabbing the last value

In [30]:
arr[-1]

5
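The same idea extends to 2D arrays, where we give a row index and a column index separated by a comma. Here is a small sketch using an illustrative 3x3 array called `grid`.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6],
                 [7, 8, 9]])

print(grid[1, 2])              # 6  (row 1, column 2)
print(grid[0].tolist())        # [1, 2, 3]  (the whole first row)
print(grid[:, 0].tolist())     # [1, 4, 7]  (the whole first column)
print(grid[:2, 1:].tolist())   # [[2, 3], [5, 6]]  (first two rows, last two columns)
```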

# 4) Selecting part of array based on criteria

Another important feature of Numpy arrays is that we can select parts of an array based on a criterion. Think of it as filtering based on some conditions

In [38]:
arr = np.array([1,3,6,19,30,340,583,8])

Filtering based on numerical values

In [47]:
print(arr[arr<50])
print(arr[(arr > 5) & (arr < 20)])

[ 1  3  6 19 30  8]
[ 6 19  8]
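It helps to see what happens in two steps. The comparison `arr < 50` first produces an array of True/False values (often called a mask), and putting that mask inside the square brackets keeps only the positions that are True. A small sketch, with `mask` as an illustrative name:

```python
import numpy as np

arr = np.array([1, 3, 6, 19, 30, 340, 583, 8])

# Step 1: the comparison gives one True/False per element
mask = arr < 50
print(mask.tolist())       # [True, True, True, True, True, False, False, True]

# Step 2: indexing with the mask keeps only the True positions
print(arr[mask].tolist())  # [1, 3, 6, 19, 30, 8]

# Because True counts as 1, summing the mask counts the matches
print(int(mask.sum()))     # 6
```

When combining conditions, each one needs its own parentheses and we use `&` (and) or `|` (or) rather than the Python keywords `and` / `or`.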


Filtering based on whether a value exists in another list (this will be more useful when we talk about dataframes later)

In [55]:
arr = np.array([1,3,6,19,30,340,583,8])
lucky_nums = [3,8]

np.isin(arr, lucky_nums)  # np.in1d does the same job but is deprecated in newer Numpy releases

array([False,  True, False, False, False, False, False,  True])
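The True/False result is a mask, so we can use it to pull out the matching values themselves. The sketch below uses np.isin for the membership test; `is_lucky` is an illustrative name.

```python
import numpy as np

arr = np.array([1, 3, 6, 19, 30, 340, 583, 8])
lucky_nums = [3, 8]

# True where the element of arr appears in lucky_nums
is_lucky = np.isin(arr, lucky_nums)

print(arr[is_lucky].tolist())   # [3, 8]

# ~ flips the mask, keeping everything that is NOT in lucky_nums
print(arr[~is_lucky].tolist())  # [1, 6, 19, 30, 340, 583]
```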

# 5) Mathematical Operations on Arrays
We can also do math on arrays, much as if they were single values

In [64]:
arr = np.array([1,3,6,19,30,340,583,8])

Unlike with lists, basic math on an array is applied to every element

In [70]:
arr * 2

array([   2,    6,   12,   38,   60,  680, 1166,   16])

In [71]:
arr + 2

array([  3,   5,   8,  21,  32, 342, 585,  10])

In [73]:
arr + arr

array([   2,    6,   12,   38,   60,  680, 1166,   16])

In [74]:
arr * arr

array([     1,      9,     36,    361,    900, 115600, 339889,     64])
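To make the contrast with plain lists concrete, here is a small sketch that runs the same operators on a list and on an array built from it. The name `values` is only for illustration.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# On a list, * repeats the list and + joins two lists
print(values * 2)            # [1, 2, 3, 1, 2, 3]
print(values + values)       # [1, 2, 3, 1, 2, 3]

# On an array, the same operators work element by element
print((arr * 2).tolist())    # [2, 4, 6]
print((arr + arr).tolist())  # [2, 4, 6]
print((arr * arr).tolist())  # [1, 4, 9]
```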

Getting the exp() of every element

In [67]:
np.exp(arr)

array([2.71828183e+000, 2.00855369e+001, 4.03428793e+002, 1.78482301e+008,
       1.06864746e+013, 4.57218555e+147, 1.56200691e+253, 2.98095799e+003])

Getting the square root of every element

In [76]:
np.sqrt(arr)

array([ 1.        ,  1.73205081,  2.44948974,  4.35889894,  5.47722558,
       18.43908891, 24.14539294,  2.82842712])

Getting the natural log of every element

In [75]:
np.log(arr)

array([0.        , 1.09861229, 1.79175947, 2.94443898, 3.40119738,
       5.82894562, 6.36818719, 2.07944154])

# 6) Copying Arrays (BONUS)

An IMPORTANT computer science concept relates to how we copy data structures. When we work with data structures like lists and arrays, we can't just assign an array to another variable and expect it to be a new copy.

Important concepts to understand:
1) When we assign a variable to another variable that refers to an array, both variables can change the array
2) A variable name is only a reference to the location of the array in memory
3) If we want a second variable to hold the same values but be independent of the first, we have to make an explicit copy

In [80]:
arr1 = [1,2,3]

arr2 = arr1

print(arr1)
print(arr2)

[1, 2, 3]
[1, 2, 3]


Changes to arr2 will make the same change in arr1 because they reference the same array

<img src="../jupyter-notebook-images/reference-copy.jpg" style="width: 75%"/>

In [81]:
arr2[2] = 10

print(arr1)
print(arr2)

[1, 2, 10]
[1, 2, 10]


If we want arr1 and arr2 to be independent of each other, we need to use the .copy() method

<img src="../jupyter-notebook-images/deep-copy.jpg" style="width: 75%"/>

In [84]:
arr1 = [1,2,3]
arr2 = arr1.copy()

arr1[1] = 10

print(arr1)
print(arr2)

[1, 10, 3]
[1, 2, 3]
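The examples above use plain Python lists, but the same idea applies to Numpy arrays, with one extra thing to watch for: a slice of a Numpy array is a view onto the same data, not a copy. The sketch below shows all three cases; the variable names are only for illustration.

```python
import numpy as np

original = np.array([1, 2, 3])

alias = original               # second name for the same array
view = original[:2]            # slice: a view that shares the same data
independent = original.copy()  # new array with its own data

original[0] = 99

print(alias.tolist())        # [99, 2, 3]  (changed with the original)
print(view.tolist())         # [99, 2]     (also changed)
print(independent.tolist())  # [1, 2, 3]   (unaffected)
```

So whenever we want to change a piece of an array without touching the source, calling .copy() on it first is the safe choice.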
