# Learning Numpy

### Introduction

- Numpy is a very popular library built specifically for working with numbers.
- It is actually a fairly small and focused library, but it has a lot of useful libraries built on top of it
- Python is very slow at processing large amounts of data $\rightarrow$ Numpy exists specifically to solve this problem
- In actual programs we rarely employ Numpy directly, but it gets used in other libraries such as Pandas and Matplotlib
- We need to have a low level understanding of how computers see numbers to understand exactly how Numpy works

### A Low Level Understanding of Numpy

- A computer can only store binary values i.e. 1's and 0's
- 8 bits = 1 byte
- A computer can't just read data right off a hard disk, it must first be loaded onto the computer's memory
- 8 Gigabytes (GB) = 8192 Megabytes (MB) = 8,388,608 Kilobytes (KB) = 8,589,934,592 Bytes (B) = 68,719,476,736 bits (b)
- Numpy allows us to very accurately select the number of bits we want to use to store a number
- Python uses a lot of memory when storing numbers because every integer is a full Python object: besides the value itself, it carries bookkeeping such as a reference count and a pointer to its type (the type is what provides methods like `__getattr__`, `__setattr__`, etc.)
- This means that you need 28 bytes just to store a single-digit integer on a typical 64-bit CPython build!
- Compare this to Numpy's `int8` which only requires 1 byte.
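A quick way to check the memory claim above is a minimal sketch like this one. The exact size of a Python integer depends on the interpreter and platform, so the first number is printed rather than assumed.

```python
import sys
import numpy as np

# Size of a full Python integer object (typically 28 bytes on 64-bit CPython)
py_int_bytes = sys.getsizeof(5)

# Size of the raw payload of a single int8 value
np_item_bytes = np.dtype(np.int8).itemsize

print(py_int_bytes, np_item_bytes)

# One million int8 values in an array take exactly one byte each
big = np.zeros(1_000_000, dtype=np.int8)
print(big.nbytes)
```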

In [None]:
import numpy as np

In [None]:
# this can be used to declare an integer with 8 bits precisely
np.int8

## Fast Array Processing

#### Introduction

- It is not guaranteed that all numbers in a python list will be stored in the same place
- On the other hand, when we create an `int8` array in Numpy then a contiguous block of memory is created with no more and no less than what was needed. 
- This also allows Numpy to use some very efficient low-level CPU instructions that work on many values at once
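The contiguous layout can be inspected directly. This is a small sketch using the array's `flags`, `nbytes` and `strides` attributes; the name `packed` is just for illustration.

```python
import numpy as np

packed = np.array([1, 2, 3, 4], dtype=np.int8)

# True when the elements sit in one unbroken block of memory
is_contiguous = bool(packed.flags['C_CONTIGUOUS'])

# Total bytes used by the data: 4 elements * 1 byte each
total_bytes = packed.nbytes

# Number of bytes to step in memory to reach the next element
step = packed.strides

print(is_contiguous, total_bytes, step)
```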

#### Declaring Numpy Arrays

In [None]:
# declaring a Numpy array
a = np.array([1,2,3,4])

In [None]:
# Accessing elements works the same way
a[0]

In [None]:
# Slicing to get sub-array also works the same way, it still gives us a Numpy array
a[0:1]

In [None]:
# Multi-indexing to get multiple elements from a Numpy array in the form of a Numpy array
a[[0,1,-1]]

#### Array Types

- By default Numpy uses 64-bit integers here because this is a 64-bit platform (the default integer size can differ depending on the platform and the Numpy version)

In [None]:
a.dtype

In [None]:
# the dtype can be set explicitly when declaring an array
b = np.array([1,2,3,4], dtype=np.float64)

In [None]:
b.dtype
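The dtype also decides how many bytes each element takes. A minimal sketch, assuming we start from an `int8` array and convert it with `astype` (the variable names are only for illustration):

```python
import numpy as np

small_ints = np.array([1, 2, 3, 4], dtype=np.int8)

# astype returns a new array with the requested dtype
as_floats = small_ints.astype(np.float64)

# itemsize is the number of bytes used by one element
print(small_ints.dtype, small_ints.itemsize)  # int8 1
print(as_floats.dtype, as_floats.itemsize)    # float64 8
```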

#### Multi-dimensional Arrays

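- We can also create multi-dimensional Numpy arrays
- The `shape` of a Numpy array gives us a tuple containing the length of each dimension (for a 2-D array: the number of rows and columns)
- The `ndim` gives us the number of dimensions of a particular array
- The `size` gives us the total number of elements in the array
- If the shape isn't consistent (e.g. rows of different lengths) then Numpy cannot build a proper numeric array: older versions fell back to an array of regular Python objects (`dtype=object`), while recent versions raise a `ValueError` unless `dtype=object` is requested explicitly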
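To see what an inconsistent shape produces, here is a small sketch. It requests `dtype=object` explicitly so that it behaves the same way across Numpy versions; the names `regular` and `ragged` are only for illustration.

```python
import numpy as np

# Every row has 3 elements, so this is a proper 2-D array
regular = np.array([[1, 2, 3], [4, 5, 6]])
print(regular.shape, regular.ndim, regular.size)  # (2, 3) 2 6

# Rows of different lengths: we get a 1-D array holding two Python lists
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(ragged.shape, ragged.dtype)  # (2,) object
```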

In [None]:
c = np.array([
    [1,2,3],
    [2,3,4],
    [3,4,5]
])

In [None]:
c.shape

In [None]:
c.ndim

In [None]:
c.size

In [None]:
c[1]

In [None]:
c[1][0]

In [None]:
# We can also use multi-selection for Numpy array
c[1, 0]

In [None]:
# Slicing also works the same way
c[:2, :2]
# this takes rows 0 and 1 and columns 0 and 1 (everything before index 2 on each axis)

In [None]:
c[2] = np.array([10, 11, 12])
c

### Summary Statistics

These are some very useful methods:
- `.sum()` $\rightarrow$ gets the sum of all elements
- `.mean()` $\rightarrow$ gets the mean of all elements
- `.std()` $\rightarrow$ gets the standard deviation
- `.var()` $\rightarrow$ gets the variance
- When using matrices, we can pass a kwarg `axis=` to compute the sums along a particular dimension.
    - Using `axis=0` gets you a Numpy array with the sums of each column. 
    - Using `axis=1` gets you a Numpy array with the sums of each row. 
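The `axis=` kwarg works the same way for the other methods. A small sketch on a 2x3 matrix (the name `m` is only for illustration):

```python
import numpy as np

m = np.array([
    [1, 2, 3],
    [4, 5, 6]
])

total = m.sum()              # 21
col_sums = m.sum(axis=0)     # [5 7 9]   -> one value per column
row_sums = m.sum(axis=1)     # [ 6 15]   -> one value per row
col_means = m.mean(axis=0)   # [2.5 3.5 4.5]

print(total, col_sums, row_sums, col_means)
print(m.std(), m.var())      # standard deviation and variance of all elements
```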

In [None]:
c

In [None]:
c.sum()

In [None]:
c.sum(axis=0)

In [None]:
c.sum(axis=1)

### Broadcasting and Vectorised Operations

- This is one of the most fundamental aspects of Numpy
- These operations are optimised to be extremely fast

In [None]:
arr = np.arange(4)
arr

In [None]:
# the operation gets applied to each element
arr + 10

In [None]:
# we can perform an operation and assign the new values in a similar way
arr += 10

In [None]:
arr

In [None]:
barr = np.array([9,8,7,6])

In [None]:
barr

In [None]:
# We can add all the corresponding elements of both arrays if they have the same shape
arr + barr
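Broadcasting also covers arrays with different but compatible shapes. A minimal sketch, assuming a 2x3 matrix combined with a single row and a single column (all names are only for illustration):

```python
import numpy as np

matrix = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
row = np.array([10, 20, 30])    # shape (3,)
col = np.array([[100], [200]])  # shape (2, 1)

# the row gets added to every row of the matrix
plus_row = matrix + row   # [[11 22 33], [14 25 36]]

# the column gets added to every column of the matrix
plus_col = matrix + col   # [[101 102 103], [204 205 206]]

# shapes that cannot be lined up raise a ValueError
try:
    matrix + np.array([1, 2])
    shapes_mismatch = False
except ValueError:
    shapes_mismatch = True

print(plus_row, plus_col, shapes_mismatch)
```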

### Boolean Arrays

- These are similar to vectorised operations
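- We don't just have mathematical operations in programming (like +,-,*), we also have comparison operators (like >, <, ==) and boolean operators (like and, or, not). With Numpy arrays the element-wise versions of and, or, not are written `&`, `|`, `~`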

In [None]:
a = np.arange(4)
a

There are three ways by which we can select the first and the last element

In [None]:
# first method - Python
a[0], a[-1]

In [None]:
# second method - Numpy multi-indexing
a[[0, -1]]

In [None]:
# third method - boolean arrays
a[[True, False, False, True]]

But writing True/False for every element is obviously not scalable.

**However, we can create those boolean arrays using vectorised operations!**

In [None]:
a >= 2

In [None]:
a[a>=2]

In [None]:
# all elements greater than the mean
a[a > a.mean()]
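Several conditions can be combined into one boolean array. A small sketch using `&`, `|` and `~`; each condition needs its own parentheses because these operators bind more tightly than the comparisons. The name `values` is only for illustration.

```python
import numpy as np

values = np.arange(6)   # [0 1 2 3 4 5]

# element-wise "and": at least 2 and less than 5
middle = values[(values >= 2) & (values < 5)]   # [2 3 4]

# element-wise "or": less than 1 or greater than 4
edges = values[(values < 1) | (values > 4)]     # [0 5]

# element-wise "not": everything that is NOT >= 2
small = values[~(values >= 2)]                  # [0 1]

print(middle, edges, small)
```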

### Linear Algebra

Numpy already contains extremely well optimised implementations of all common linear algebra operations:
- `<matrix_A>.dot(<matrix_B>)` $\rightarrow$ gets the dot product of these matrices
- `np.cross(<vector_a>, <vector_b>)` $\rightarrow$ gets the cross product (this is a function, arrays have no `.cross()` method)
- `.T` $\rightarrow$ for transposing a matrix
- The `@` symbol is actually used as an operator for matrix multiplication
- Combinations of these symbols can be used to create complex arithmetic
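The operations listed above can be tried on two small matrices. This is only a sketch; the names `A`, `B`, `u` and `v` are for illustration.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[5, 6],
              [7, 8]])

dot_product = A.dot(B)    # [[19 22], [43 50]]
matmul = A @ B            # same result as A.dot(B) for 2-D arrays
transposed = A.T          # [[1 3], [2 4]]
combined = A.T @ B        # [[26 30], [38 44]]

# the cross product is defined for 3-element vectors
u = np.array([1, 0, 0])
v = np.array([0, 1, 0])
cross_product = np.cross(u, v)   # [0 0 1]

print(dot_product, matmul, transposed, combined, cross_product)
```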

## Performance Comparison

In [None]:
l = list(range(100000))

In [None]:
a = np.arange(100000)

In [None]:
%time np.sum(a ** 2)

In [None]:
%time sum([x ** 2 for x in l])
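`%time` only works inside IPython/Jupyter. Outside a notebook, the same comparison can be sketched with the standard `timeit` module. The timings depend on the machine, so they are printed rather than assumed. The dtype is set to `int64` explicitly so the squares cannot overflow on platforms with a smaller default integer.

```python
import timeit
import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# both approaches must give the same answer
list_result = sum([x ** 2 for x in py_list])
numpy_result = int(np.sum(np_arr ** 2))

# run each version 10 times and report the total time in seconds
list_time = timeit.timeit(lambda: sum([x ** 2 for x in py_list]), number=10)
numpy_time = timeit.timeit(lambda: np.sum(np_arr ** 2), number=10)

print(f"list:  {list_time:.4f} s")
print(f"numpy: {numpy_time:.4f} s")
```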