## NumPy
* A core scientific computing library and toolbox that provides the foundation for several other libraries
* ndarray, an efficient multidimensional array providing fast array-oriented arithmetic
operations and flexible broadcasting capabilities.
* Mathematical functions for fast operations on entire arrays of data without having
to write loops.
* Tools for reading/writing array data to disk and working with memory-mapped
files.
* Linear algebra, random number generation, and Fourier transform capabilities.
* A C API for connecting NumPy with libraries written in C, C++, or FORTRAN.
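
As a minimal sketch of the first two points (the array names and values here are illustrative, not taken from the notebook), broadcasting and whole-array math functions look roughly like this:

```python
import numpy as np

# Illustrative data: a 2x3 array and a length-3 row.
matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
row = np.array([10.0, 20.0, 30.0])

# Broadcasting stretches `row` across both rows of `matrix`.
shifted = matrix + row

# A math function applied to the whole array, with no explicit loop.
roots = np.sqrt(matrix)
```

Here `shifted` has the same shape as `matrix`, even though `row` has only one dimension.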

## Our Focus Topics
* Data cleaning, filtering, transformations
* Common array operations like sorting, unique, and set operations
* Descriptive statistics, aggregating/summarizing data
* Data alignment and relational data manipulations: merging and joining together heterogeneous datasets
* Conditional logic on arrays: if-elif-else branches
* Group-wise data manipulations: aggregation, transformation, higher order functions

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import statsmodels as sm

In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

In [None]:
%matplotlib inline

In [None]:
np.random.seed(12345)
np.set_printoptions(precision=4, suppress=True)

In [None]:
import time
a = np.random.rand(1000000)
b = np.random.rand(1000000)
c = 0


In [None]:
t1 = time.time()
for i in range(1000000):
    c += a[i] * b[i]
t2 = time.time()
print('{:.6f}'.format(c))
delta = (t2 - t1) * 1000.0
print("Loop version: time = {:.6f} ms".format(delta))


In [None]:
t1 = time.time()
c = np.dot(a,b)
t2 = time.time()
delta = (t2 - t1) * 1000.0
print('{:.6f}'.format(c))
print("Vectorized version: time = {:.6f} ms".format(delta))


### Return a sample (or samples) from the “standard normal” distribution.

In [None]:
data = {i : np.random.randn() for i in range(7)}
data

In [None]:
data = np.random.randn(2,3)
data

In [None]:
data * 10

In [None]:
data.dtype

In [None]:
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)
arr2

In [None]:
arr2.dtype

In [None]:
np.zeros(10)

In [None]:
np.ones(19)

In [None]:
np.zeros((3,5))

In [None]:
np.empty((2, 3, 2))

* It’s not safe to assume that np.empty will return an array of all zeros. In some cases, it may return uninitialized “garbage” values.
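
A small sketch of the safer alternatives, assuming the goal is an array with known contents (variable names are illustrative):

```python
import numpy as np

# np.empty only allocates memory; its contents are arbitrary until assigned.
buf = np.empty((2, 3))
buf[:] = 0.0  # fill explicitly before reading any value

# When zeros (or another constant) are required, ask for them directly.
zeros = np.zeros((2, 3))
sevens = np.full((2, 3), 7.0)
```

`np.empty` is mainly useful when every element will be overwritten anyway, as in the first two lines.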

![image.png](attachment:image.png)
Source: Python for Data Analysis, Second Edition, 2017, Wes McKinney, O'Reilly

In [None]:
arr = np.array([1, 2, 3, 4, 5])
arr.dtype

In [None]:
float_arr = arr.astype(np.float64)
float_arr.dtype

In [None]:
arr = np.array([3.7, -1.2, -2.6, 0.5, 12.9, 10.1])
arr

In [None]:
arr.astype(np.int32)

In [None]:
# np.bytes_ is the byte-string dtype; the old alias np.string_ was removed in NumPy 2.0
numeric_strings = np.array(['1.25', '-9.6', '42'], dtype=np.bytes_)
numeric_strings.astype(float)

In [None]:
int_array = np.arange(10)
values = np.array([.22, .270, .357, .380, .44, .50], dtype=np.float64)
int_array.astype(values.dtype)

## Vectorization
* Arrays enable you to express batch operations on data without writing any for loops
* This is called vectorization
* Any arithmetic operation between equal-size arrays is applied element-wise
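
To make the contrast concrete, here is a minimal sketch (with made-up values) that computes the same element-wise product once with an explicit loop and once with a vectorized expression:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# Explicit loop: one Python-level multiplication per element.
looped = np.empty_like(x)
for i in range(len(x)):
    looped[i] = x[i] * y[i]

# Vectorized: the same element-wise product in a single expression.
vectorized = x * y
```

Both approaches give the same values; the vectorized form is shorter and runs the loop in compiled code.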


In [None]:
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr
arr * arr

In [None]:
arr - arr

In [None]:
1 / arr

In [None]:
arr ** 0.5

In [None]:
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])
arr2

In [None]:
arr2 > arr

### NumPy array indexing, views and copying

In [None]:
arr = np.arange(10)
arr

In [None]:
arr[5]

In [None]:
arr[5:8]

In [None]:
arr_slice = arr[5:8]
arr_slice

In [None]:
py_arr = list(range(10))
py_arr

In [None]:
py_arr_slice = py_arr[5:8]
for i in range(len(py_arr_slice)):
    py_arr_slice[i] = 12
py_arr,py_arr_slice

In [None]:
arr_slice = arr[5:8]
arr_slice[:] = 12
arr_slice,arr

In [None]:
arr_slice2 = arr_slice.copy()
arr_slice2[:] = 10
arr,arr_slice,arr_slice2

### Indexing two-dimensional arrays

In [None]:
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr2d[2]

In [None]:
arr2d[0][2]

In [None]:
arr2d[0, 2]

In [None]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

In [None]:
arr3d.shape

In [None]:
arr3d[0]

In [None]:
old_values = arr3d[0].copy()

In [None]:
arr3d[0] = 42

In [None]:
arr3d

In [None]:
arr3d[0] = old_values

In [None]:
arr3d

arr3d[1, 0] gives you all of the values whose indices start with (1, 0), forming a one-dimensional array.

In [None]:
arr3d[1, 0]

In [None]:
x = arr3d[1]
x

In [None]:
x[0]

In [None]:
x[0,1]

In [None]:
arr

In [None]:
arr[1:6]

In [None]:
arr2d

In [None]:
arr2d[:2]

In [None]:
arr2d[:2,:1]

How can I get a slice consisting of the two-dimensional array [[2,3],[5,6]]?

How can I get a slice consisting of column 2 (third column) but only first two rows?
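
One possible answer to both questions, sketched on a fresh copy of the 3x3 example so that `arr2d` itself is left alone (the name `demo2d` is illustrative):

```python
import numpy as np

# Fresh copy of the 3x3 example so the original arr2d is not touched.
demo2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# First two rows, columns 1 onward -> [[2, 3], [5, 6]]
upper_right = demo2d[:2, 1:]

# First two rows of column 2; the integer index drops that axis.
col2_top = demo2d[:2, 2]

# Slicing the column instead (2:3) keeps the result two-dimensional.
col2_top_2d = demo2d[:2, 2:3]
```

Mixing an integer index with a slice lowers the number of dimensions by one, while using slices on both axes preserves it.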

In [None]:
arr2d[:2, 1:] = 0
arr2d

### Conditional indexing

In [None]:
names = np.array(['Ali', 'Sohail', 'Sana', 'Umar', 'Sohail', 'Asim', 'Sania'])
data = np.random.randn(7, 4)

In [None]:
names

In [None]:
data

In [None]:
names == 'Sohail'

In [None]:
data[names == 'Sohail']

In [None]:
np.append(names,['Asif'],axis=0)

In [None]:
data[names == 'Asif']

In [None]:
data[names == 'Sohail',2:]

In [None]:
data[names != 'Sohail']

In [None]:
data[~(names == 'Sohail')]

In [None]:
condition = (names == 'Sohail')

In [None]:
data[~condition]

To combine multiple conditions, use & (and) and | (or). The Python keywords ***and*** and ***or*** do not work with boolean arrays.
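
A short sketch of the difference, using illustrative labels rather than the `names` array above:

```python
import numpy as np

labels = np.array(['a', 'b', 'c', 'a'])  # illustrative data

# Element-wise OR and AND. The parentheses are required because
# & and | bind more tightly than == and !=.
either = (labels == 'a') | (labels == 'c')
neither = (labels != 'a') & (labels != 'c')

# The keyword form asks for a single truth value of a whole array,
# which NumPy refuses for arrays with more than one element.
try:
    (labels == 'a') or (labels == 'c')
    keyword_failed = False
except ValueError:
    keyword_failed = True
```

The `ValueError` is NumPy's way of saying that the truth value of a multi-element array is ambiguous.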

In [None]:
mask = (names == 'Sohail') | (names == 'Sana')

In [None]:
mask

In [None]:
data[mask]

### Setting values in an array based on conditions

In [None]:
data[data < 0] = 0

In [None]:
data

### Fancy Indexing
Fancy indexing is the term NumPy uses for indexing with integer arrays. Unlike slicing, fancy indexing always copies the data into a new array.
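
The copy-versus-view distinction can be checked directly. This is a minimal sketch with an illustrative array:

```python
import numpy as np

base = np.arange(6)

view = base[1:4]          # slice: shares memory with base
picked = base[[1, 2, 3]]  # fancy indexing: independent copy

view[0] = 100             # writes through to base[1]
picked[1] = -1            # leaves base untouched

shares_view = np.shares_memory(base, view)
shares_picked = np.shares_memory(base, picked)
```

After these assignments `base[1]` is 100, while the change made through `picked` is not visible in `base`.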

In [None]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

In [None]:
arr[[4, 3, 0, 6]] # selects rows 4, 3, 0 and 6, in that order

#### Negative indexes start from end

In [None]:
arr[[-3, -5, -7]]

#### Passing multiple index arrays

In [None]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]] # selects elements (1,0),(5,3),(7,1),(2,2)

Regardless of how many dimensions the array has, the result of fancy indexing with
one index array per axis is always one-dimensional.
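
When a rectangular block of rows and columns is wanted instead of paired elements, two common options are to index in two steps or to use `np.ix_`. A sketch on an illustrative 8x4 array:

```python
import numpy as np

grid = np.arange(32).reshape((8, 4))

# One index array per axis: elements are paired up, result is 1-D.
paired = grid[[1, 5, 7, 2], [0, 3, 1, 2]]

# Select rows first, then reorder the columns: result stays 2-D.
block = grid[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

# np.ix_ builds the same rectangular selection in one step.
block_ix = grid[np.ix_([1, 5, 7, 2], [0, 3, 1, 2])]
```

`paired` holds four elements, whereas `block` and `block_ix` are both 4x4 arrays with the same contents.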

In [None]:
arr = np.arange(15)
arr

### Transpose, Reshape and Swapping Axes

In [None]:
arr = arr.reshape((3,5))

In [None]:
arr

In [None]:
arr.transpose()

In [None]:
arr.T

In [None]:
arr2 = np.random.randn(6, 3)
arr3 = arr2.T

In [None]:
results = np.dot(arr3,arr2)

In [None]:
results

In [None]:
results = np.dot(arr2,arr3)

In [None]:
results

In [None]:
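# transpose(1, 0, 2) needs a 3-D array, so build one here (arr is still 2-D at this point)
arr_3d = np.arange(24).reshape((2, 3, 4))
arr_t = arr_3d.transpose(1, 0, 2)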

In [None]:
arr_t

### Reshaping and Flattening

In [None]:
arr = np.arange(24).reshape((2, 3, 4))

In [None]:
arr

In [None]:
arrnew = np.arange(15)
arrnew = arrnew.reshape((5, -1))
arrnew

In [None]:
other_arr = np.ones((3, 5))
other_arr.shape

In [None]:
arrnew.reshape(other_arr.shape)

#### Flattening the array

In [None]:
arrnew.ravel()

In [None]:
arrnew.flatten()

#### The swapaxes method takes a pair of axes to swap

In [None]:
arr.swapaxes(1,2)