# Table of Contents
- Chapter notes
    - NumPy
    - Vectorization and broadcasting
    - NumPy universal functions
    - View vs. copy in NumPy
    - Learning NumPy for pandas (next chapter)

# Chapter 4 - NumPy Foundations 

- NumPy is the core package for scientific computing in Python
- NumPy runs array calculations in compiled code written in C or Fortran, which is many times faster than looping in pure Python
- a one-dimensional array has only one axis (no explicit column or row orientation)
- a two-dimensional array has two axes (rows and columns)
- NumPy uses its own data types (like float64 instead of Python's built-in float)
- vectorization: applying an operation to a whole array at once, element by element, without writing a loop (for example, adding a scalar to a NumPy array)
- scalar: a basic Python data type like float or string (as opposed to data structures that hold multiple elements, like lists and dictionaries)
- broadcasting: when the two arrays in an operation have different shapes, NumPy automatically extends the smaller array across the larger one, if possible, so that their shapes become compatible
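
A minimal sketch of vectorization and broadcasting as described in the notes above. The variable names `row` and `grid` are illustrative assumptions, and the values mirror the arrays used in the cells further down.

```python
import numpy as np

row = np.array([10, 100, 1000.])                 # shape (3,)
grid = np.array([[1., 2., 3.], [4., 5., 6.]])    # shape (2, 3)

# Vectorization: the scalar is added to every element, no loop needed.
plus_one = row + 1

# Broadcasting: the (3,) array is extended across both rows of the (2, 3) array.
product = grid * row

# Shapes that cannot be aligned raise a ValueError.
try:
    grid * np.array([1., 2.])    # (2, 3) with (2,) does not line up
    compatible = True
except ValueError:
    compatible = False

print(plus_one.shape, product.shape, compatible)
```

Broadcasting compares shapes from the last axis backwards, which is why a length-3 array fits a 2x3 array but a length-2 array does not.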

#### Universal Functions
- universal functions (ufuncs): functions that operate on every element of a NumPy array
- ufuncs are much faster on large arrays than Python loops
- ufuncs are called as functions from the NumPy namespace, e.g. np.sqrt(array); some common operations such as sum are also available as array methods
- ufuncs can also be used with pandas DataFrames
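
The points above can be sketched as follows. This is only an illustration: the array `values` and the threshold `10.` are made-up assumptions, and `np.maximum` is used as one example of a ufunc that takes two inputs.

```python
import math
import numpy as np

values = np.array([[1., 4., 9.], [16., 25., 36.]])

# np.sqrt is a ufunc object and is applied to every element at once.
is_ufunc = isinstance(np.sqrt, np.ufunc)
roots = np.sqrt(values)

# The same result with an explicit Python loop, for comparison only.
looped = np.array([[math.sqrt(v) for v in row] for row in values])

# Binary ufuncs such as np.maximum take two inputs and broadcast them.
floored = np.maximum(values, 10.)

print(is_ufunc, np.array_equal(roots, looped))
print(floored)
```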

#### Getting and Setting Array Elements
- chained indexing, e.g. matrix[0][0]: nested lists need one pair of brackets per dimension
- NumPy instead takes all selections in a single pair of brackets: numpy_array[row_selection, column_selection]
- slice syntax: start:end (start is included, end is excluded)
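
A short sketch contrasting the two indexing styles from the list above. The names `nested` and `grid` are assumptions chosen for illustration.

```python
import numpy as np

nested = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
grid = np.array(nested)

# Nested lists need chained indexing: first pick the row, then the element.
from_list = nested[1][2]

# NumPy takes the row and column selections inside one pair of brackets.
from_array = grid[1, 2]

# start:end slices include start and exclude end.
middle_rows = grid[1:3, :2]    # rows 1 and 2, columns 0 and 1

print(from_list, from_array)
print(middle_rows)
```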

#### Useful Array Constructors
- arange function: short for "array range"; similar to Python's range() but returns a NumPy array
- Monte Carlo simulations use arrays of pseudorandom numbers; np.random.randn() generates standard normally distributed ones
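
A minimal sketch of both constructors. The call to `np.random.seed` and the seed value `0` are assumptions added here to make the pseudorandom draws repeatable; the notes above do not mention seeding.

```python
import numpy as np

# arange mirrors range(): start is included, stop is excluded.
counts = np.arange(2, 10, 2)

# reshape arranges the same values into 2 rows and 5 columns.
table = np.arange(10).reshape(2, 5)

# Resetting the seed (arbitrary value) reproduces the same pseudorandom draws.
np.random.seed(0)
first = np.random.randn(2, 3)
np.random.seed(0)
second = np.random.randn(2, 3)

print(counts)
print(table.shape, np.array_equal(first, second))
```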

#### View vs. Copy
- slicing a NumPy array returns a view: a window onto the same data, not a copy
- setting a value on a view therefore also changes the original array
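
One way to check whether an array is a view is sketched below, using `np.shares_memory` and the `.base` attribute. The name `original` is an assumption for illustration.

```python
import numpy as np

original = np.array([[1., 2., 3.], [4., 5., 6.]])
view = original[:, :2]    # slicing returns a view

# Both checks confirm that the view uses the original array's memory.
shares = np.shares_memory(original, view)
base_is_original = view.base is original

# Writing through the view therefore changes the original as well.
view[0, 0] = 1000
print(shares, base_is_original, original[0, 0])
```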


#### Two Main Issues with NumPy (for Data Analysis)
- the whole NumPy array needs to be of the same data type
- NumPy arrays have no row or column labels, which makes it hard to know what each column or row refers to
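
The first issue can be seen in a small sketch: NumPy picks one data type for the whole array, converting elements if necessary. The sample values are made up for illustration.

```python
import numpy as np

# Mixing ints and floats converts every element to float64.
numbers = np.array([1, 2.5, 3])

# Mixing numbers and text converts every element to a string.
mixed = np.array([1, 2.5, "three"])

print(numbers.dtype)
print(mixed.dtype.kind)    # 'U' stands for a Unicode string type
```

Once the numbers have become strings, arithmetic on that array no longer works, which is one reason mixed-type tables are awkward to hold in a plain NumPy array.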

In [2]:
#array calculations without NumPy
matrix = [[1,2,3],[4,5,6],[7,8,9]]
[[i + 1 for i in row] for row in matrix] #list comprehension

[[2, 3, 4], [5, 6, 7], [8, 9, 10]]

In [3]:
import numpy as np

In [4]:
#Constructing an array with a simple list results in a 1d array
array1 = np.array([10, 100, 1000.])

In [5]:
#Constructing an array with a nested list results in a 2d array
array2 = np.array([[1., 2., 3.], [4.,5.,6.]])

In [6]:
array1.dtype

dtype('float64')

In [7]:
#explicitly converting a NumPy array element to a Python float
float(array1[0])

10.0

In [8]:
#example of vectorization
array1 + 1

array([  11.,  101., 1001.])

In [9]:
array1 * array2

array([[  10.,  200., 3000.],
       [  40.,  500., 6000.]])

In [11]:
array2 @ array2.T # array2.T is a shortcut for array2.transpose()

array([[14., 32.],
       [32., 77.]])

In [12]:
#example of Python math library not working with NumPy arrays
import math

math.sqrt(array2) #will raise error

TypeError: only size-1 arrays can be converted to Python scalars

In [13]:
#computing the square roots element by element with a loop avoids the error above

np.array([[math.sqrt(i) for i in row] for row in array2])

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [14]:
#using a ufunc instead

np.sqrt(array2)

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [15]:
#sum of each column
#Returns 1d array

array2.sum(axis=0)

array([5., 7., 9.])

In [16]:
array2.sum()

21.0

In [17]:
array1[2] # returns a scalar

1000.0

In [18]:
array2[0,0] # returns a scalar

1.0

In [19]:
array2[:,1:] # returns a 2d array

array([[2., 3.],
       [5., 6.]])

In [20]:
array2[:,1] # returns a 1d array

array([2., 5.])

In [21]:
array2[1, :2] # returns a 1d array

array([4., 5.])

In [22]:
np.arange(2 * 5).reshape(2,5) # 2 rows, 5 columns

array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

In [23]:
np.random.randn(2,3) # 2 rows, 3 columns

array([[-0.81367624, -0.86541404, -0.08545628],
       [ 1.36118403,  1.66898984, -1.34802406]])

In [24]:
array2

array([[1., 2., 3.],
       [4., 5., 6.]])

In [26]:
subset = array2[:, :2]   #all rows, first 2 columns
subset

array([[1., 2.],
       [4., 5.]])

In [27]:
subset[0,0] = 1000
subset

array([[1000.,    2.],
       [   4.,    5.]])

In [28]:
array2  #array2 was changed by subset

array([[1000.,    2.,    3.],
       [   4.,    5.,    6.]])

In [29]:
#call .copy() to get an independent copy instead of a view
subset = array2[:, :2].copy()
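
As a quick check of the cell above, the sketch below rebuilds `array2` with its starting values (an assumption made so the block runs on its own) and shows that writing to the copy leaves the original untouched.

```python
import numpy as np

array2 = np.array([[1., 2., 3.], [4., 5., 6.]])

subset = array2[:, :2].copy()    # independent copy, not a view
subset[0, 0] = 1000              # changes only the copy

independent = not np.shares_memory(array2, subset)
print(subset[0, 0], array2[0, 0], independent)
```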