# Intro to numpy

####  Review and Outline

Great work! We have made it this far: we know some basic calculations, the built-in data types and structures (lists, tuples, strings, dictionaries), and some key operations such as if/else conditionals and for loops.

Where are we going now? We will get into one of the key scientific computing packages in Python: **numpy**.

What is numpy? The name is short for **Numerical Python**. It can be used for high-performance computing and data analysis.
* **Efficiency**: it provides `ndarray`, a data structure designed for this type of computing. Imagine needing to run calculations on more than 200k rows with 10k columns over and over again.
* **Data analysis**: although it does not provide high-level analytical functions the way `pandas` does, understanding it will help us use the tools in pandas with less pain.
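
To make the efficiency point concrete, here is a minimal sketch comparing a pure Python loop with a single vectorized numpy expression. The array size and the names `values`, `arr`, `squares_list`, and `squares_arr` are chosen only for illustration and do not come from the book.

```python
import numpy as np

# Hypothetical data: 100,000 integers (the size is an arbitrary choice).
values = list(range(100_000))
arr = np.arange(100_000, dtype=np.int64)

# Pure Python: visit every element in a loop.
squares_list = [v * v for v in values]

# numpy: one expression applied to the whole array at once.
squares_arr = arr * arr

# Both approaches give the same numbers.
print(squares_arr[:5], squares_list[:5])
```

Both versions compute the same result; the numpy version avoids the explicit loop, which is typically where the speedup on large data comes from.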

[This notebook largely follows the discussion in the Book.](https://nyudatabootcamp.gitbooks.io/data-bootcamp/content/py-fun2.html)

#### Python

Just like `pandas`, first we need to import the `numpy` package. 

Then we will learn the key data structures in **numpy** and their attributes and methods. Moreover, we will learn how to select data in an **`ndarray`** and then do computations on it.

**Buzzwords.** ndarray, broadcasting, slicing

---
## Basics

In [None]:
import numpy as np

### Array

The array is the primary building block of numpy. It lets us perform mathematical computations efficiently, using syntax similar to the equivalent operations on scalars that we learned in python fundamental notebook 1. So let's create an array object with the `array` function in `numpy`.

In [None]:
data1 = [3.4,2,7,1.4,5]
arr1 = np.array(data1)

print(arr1)
print(arr1.dtype)

In [None]:
# Let's create another array
data2 = [5,2,1,3,4]
arr2 = np.array(data2)


Now we can do some simple computations like we've done for scalars in python fundamental notebook 1.

In [None]:
arr1+arr2

In [None]:
arr1*arr2

In [None]:
print(arr1.shape)

It seems that something is missing after the comma. Why? Is it wrong or undefined?

No, it is not wrong. The shape `(5,)` means the array is one-dimensional, with a single axis of length 5. Such arrays can sometimes lead to unexpected results in computations, especially in operations that mix them with matrices (two-dimensional arrays). So we recommend using the `reshape` method to set the second dimension to 1 explicitly.

In [None]:
arr1=arr1.reshape(5,1)
print(arr1.shape)
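
As a rough illustration of the kind of surprise a one-dimensional shape can cause, the sketch below adds a `(5,)` array to a `(5, 1)` array. The names `flat`, `column`, and `mixed` are made up for this example.

```python
import numpy as np

flat = np.array([3.4, 2, 7, 1.4, 5])   # shape (5,): a single axis
column = flat.reshape(5, 1)            # shape (5, 1): five rows, one column

print(flat.shape, column.shape)

# Adding the two does not give five numbers. numpy stretches both
# arrays to a common shape, and the result is a 5-by-5 array.
mixed = flat + column
print(mixed.shape)  # (5, 5)
```

Being explicit about the second dimension makes it easier to predict what shape an operation will produce.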

Three more ways to initialize arrays

In [None]:
arrZeros = np.zeros((2,3))
arrZeros

In [None]:
arrOnes = np.ones((2,2))
arrOnes

In [None]:
arrEyes = np.eye(3)
arrEyes

In fundamental notebook 2, we learned about the `range` object and used it with for loops. Here we present the `numpy` array version of it.

In [80]:
np.arange(0,10,2)

array([0, 2, 4, 6, 8])
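
For comparison, here is a small sketch that puts `np.arange` next to the built-in `range`. One difference worth knowing is that `np.arange` also accepts a non-integer step; the step of 0.25 below is just an illustrative choice.

```python
import numpy as np

# Same start, stop, and step as the built-in range.
evens = np.arange(0, 10, 2)
print(evens, list(range(0, 10, 2)))

# Unlike range, np.arange also accepts a fractional step.
quarters = np.arange(0, 1, 0.25)
print(quarters)
```

As with `range`, the stop value itself is not included.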

### Transpose an array

In `numpy`, transposing an array is super easy and fast with `.T`.


In [83]:
print(arrZeros.shape)

print(arrZeros.T.shape)

(2, 3)
(3, 2)
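
The sketch below shows what `.T` does to the values, and why the shape matters: on a one-dimensional array there are no two axes to swap. The names `m` and `v` are hypothetical.

```python
import numpy as np

m = np.array([[2, 3, 4], [8, 5, 7]])   # shape (2, 3)
print(m.T)                              # rows become columns: shape (3, 2)

v = np.array([1, 2, 3])                 # shape (3,)
print(v.T.shape)                        # still (3,): nothing to swap
print(v.reshape(3, 1).T.shape)          # (1, 3)
```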


### A Gentle Touch on Broadcasting

Arrays with different shapes cannot, in general, be added, subtracted, or otherwise combined elementwise.

One way around this is to stretch the smaller array so that it has the same dimensionality and size as the larger one. This is called array **broadcasting**, and `numpy` applies it automatically in array arithmetic when the shapes are compatible, which can greatly reduce and simplify your code.

For example, what will be the results?

In [None]:
arr3 = arr1+2
arr3

It broadcasts the scalar value **2** across all five elements and adds it to each value in **arr1**.
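
Broadcasting is not limited to scalars. As a minimal sketch (the names `table`, `row`, and `shifted` are invented for this example), a row of three numbers can be added to every row of a 2 by 3 array, while shapes that cannot be lined up raise an error.

```python
import numpy as np

table = np.array([[2, 3, 4], [8, 5, 7]])   # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)

# The row is matched against each of the two rows of the table.
shifted = table + row
print(shifted)

# Shapes that cannot be lined up raise an error instead.
try:
    table + np.array([1, 2])               # shape (2,) against (2, 3)
except ValueError as err:
    print("broadcast failed:", err)
```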

---
## Slicing

Slicing a numpy array works pretty much like slicing a list. Let's first define a two-dimensional array and then review what we have learned.


In [87]:
arr4=np.array([[2,3,4],[8,5,7]])
arr4

array([[2, 3, 4],
       [8, 5, 7]])

How do we get the number **3** from the 2-dimensional array above?

In [None]:
arr4[0,1]

In [None]:
arr4[0][1]

In [91]:
arr4[0,1:2]

array([3])

Can you figure out why this line of code returns only one number instead of both 3 and 4? The slice `1:2` starts at position 1 and stops before position 2, so it selects a single column. Position-based `iloc` in pandas follows the same rule, but label-based `loc` includes the end of the slice. Be careful with these indexing differences across data structures: they can introduce errors that are hard to identify.

See the example...

In [95]:
import pandas as pd

arr4_datafram=pd.DataFrame(arr4)
arr4_datafram.iloc[0,1:2]


1    3
Name: 0, dtype: int64
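
To see where pandas does differ, the sketch below compares position-based `iloc` with label-based `loc` on the same slice. The names `df`, `by_position`, and `by_label` are chosen for illustration; the DataFrame uses the default integer labels, so labels and positions happen to coincide.

```python
import numpy as np
import pandas as pd

arr = np.array([[2, 3, 4], [8, 5, 7]])
df = pd.DataFrame(arr)             # default integer labels 0, 1, 2

by_position = df.iloc[0, 1:2]      # end excluded: column 1 only
by_label = df.loc[0, 1:2]          # end included: columns 1 and 2

print(by_position.tolist())        # [3]
print(by_label.tolist())           # [3, 4]
```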

In addition, we can keep using a **forward** counter, a **backward** counter, and the **:** operator when selecting data, just as we did with lists and strings.
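
A few selections on the same 2 by 3 array, as a quick sketch of the three tools just mentioned (the variable name `arr` is used here so the example stands alone):

```python
import numpy as np

arr = np.array([[2, 3, 4], [8, 5, 7]])

print(arr[0, 1:3])    # forward counter: first row, columns 1 and 2
print(arr[-1, -1])    # backward counter: last row, last column
print(arr[:, 0])      # colon: every row, first column
print(arr[:, ::-1])   # every row, columns in reverse order
```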

---
## Useful Math Methods in Numpy


### Elementwise Methods

Remember that in python fundamental notebook 1, when we tried to compute the log of a scalar, Python returned an error saying that `log` is not defined. That is expected: most math operations such as log and exp are not built into Python itself but are provided by packages such as `numpy`.

In [None]:
arr1_log = np.log(arr1)
arr1_log

In [None]:
arr1_exp = np.exp(arr1)
arr1_exp

In [None]:
arr1_sqrt = np.sqrt(arr1)
arr1_sqrt
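
As a side note, Python's standard library also ships a `math` module, but its functions work on one number at a time. The sketch below contrasts it with the elementwise numpy version; the sample values are chosen so the logs come out as 0, 1, and 2.

```python
import math
import numpy as np

print(math.log(2.0))                 # fine for a single scalar

values = np.array([1.0, math.e, math.e ** 2])
logs = np.log(values)                # applied to every element at once
print(logs)

try:
    math.log(values)                 # math.log expects a single number
except TypeError as err:
    print("math.log failed:", err)
```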

### Array-wise Operation

In [89]:
arr4

array([[2, 3, 4],
       [8, 5, 7]])

What will we get from the following?

In [None]:
np.sum(arr4)

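Remember that last time we talked about axes. Setting axis=0 performs the operation along the rows (down each column), giving one result per column; setting axis=1 works along the columns, giving one result per row.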

In [None]:
np.sum(arr4,axis=0)

In [None]:
np.sum(arr4,axis=1)
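
One way to keep the two axes straight is to look at the shape of the result: the axis you name is the one that disappears. Here is a small sketch on the same values, with hypothetical names `col_totals` and `row_totals`.

```python
import numpy as np

arr = np.array([[2, 3, 4], [8, 5, 7]])   # shape (2, 3)

col_totals = np.sum(arr, axis=0)   # collapse the rows: one total per column
row_totals = np.sum(arr, axis=1)   # collapse the columns: one total per row

print(col_totals, col_totals.shape)   # three values, shape (3,)
print(row_totals, row_totals.shape)   # two values, shape (2,)
```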

---
### Time to practice

**Exercises.** How do we compute the **column** means?

**Exercises.** How do we compute the **column** means of the second and third columns only?

**Exercises.** How do we compute the **row** means?

### Random number generator 

We can use the `randn` random number generator to create a `numpy` array of the specified shape, filled with samples from a “standard normal” distribution.

For example, we generate a 2 by 4 random number array...

In [90]:
np.random.randn(2,4)

array([[ 1.38234093,  0.3194114 ,  1.24733287, -0.75388242],
       [ 0.14107566, -1.02258821, -1.21415769, -1.38138594]])
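
Because the numbers are random, your output will differ from the one above. If you need repeatable results, one option is to set a seed first. In this sketch the seed value 0 is an arbitrary choice.

```python
import numpy as np

np.random.seed(0)                 # any fixed integer works as a seed
first = np.random.randn(2, 4)

np.random.seed(0)                 # same seed again
second = np.random.randn(2, 4)

# The two draws are identical because they start from the same seed.
print(np.array_equal(first, second))  # True
```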

---
## Conditional Selection and Data Manipulation

Sometimes you might want to apply an operation only to selected elements of an array. `np.where` is one solution. Before I knew about it, I relied on other approaches, but as you will see, `np.where` is often more intuitive and direct.

In this example, we want to apply the exponential function to the elements of **arr1** that are larger than 4, while setting the rest (those less than or equal to 4) to 0.

In [None]:
arr1

In [None]:
np.where(arr1>4,np.exp(arr1),0)

Alternatively, we can use boolean indexing as follows. But this approach changes the original data `arr1` in place, which might not be good practice.

In [None]:
arr1[arr1>4]=np.exp(arr1[arr1>4])
arr1[arr1<=4]=0
arr1
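
If you prefer the boolean-indexing style but want to keep the original data, one possible approach is to work on a copy. This is only a sketch; the names `original`, `work`, and `mask` are made up for the example.

```python
import numpy as np

original = np.array([3.4, 2, 7, 1.4, 5]).reshape(5, 1)

work = original.copy()            # independent copy of the data
mask = work > 4                   # compute the condition once, up front
work[mask] = np.exp(work[mask])
work[~mask] = 0

# Same result as np.where, and the original array is untouched.
same = np.allclose(work, np.where(original > 4, np.exp(original), 0))
print(same)
print(original.ravel())
```

Computing the mask once before any assignment also avoids re-testing values that have already been changed.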

---
### Time to practice

---
## Summary