# The NumPy Library
NumPy is the foundational package for scientific computing with Python, and especially for data analysis.

## The NumPy Installation
* Linux
sudo apt-get install python3-numpy
* Anaconda
conda install numpy

In [3]:
# Import the NumPy module for this session
import numpy as np

## Ndarray: The Heart of the Library
* ndarray stands for N-dimensional array
    * The number of dimensions and items in an array is defined by its shape, a tuple of N positive integers that specifies the size of each dimension.
    * Dimensions are called axes, and the number of axes is called the rank.

## NdArray 

In [4]:
a = np.array([1,2,3])
a

array([1, 2, 3])

In [5]:
type(a)

numpy.ndarray

In [6]:
a.dtype

dtype('int64')

In [7]:
# Use the ndim attribute to get the number of axes
print(a.ndim)
# Use the size attribute to get the array length
print(a.size)
# Use the shape attribute to get its shape
print(a.shape)
# Use .T to transpose; a 1-D array is unchanged by transposition
print(a.T)

1
3
(3,)
[1 2 3]


## Different Shapes of Arrays

In [12]:
# Two-dimensional 2x2 array
b = np.array([[1.3, 2.4],[0.3, 4.1]])
print(b.dtype)
print(b.ndim)
print(b.shape)
# The array has rank 2, since it has 2 axes, each of length 2

float64
2
(2, 2)


### itemsize Attribute
* Can be used with ndarray objects
* Defines the size in bytes of each item in the array
### data Attribute
* is the buffer containing the actual elements of the array

In [14]:
print(b.itemsize)
print(b.data)

8
<memory at 0x7fc54006c048>


## Create an Array
* The most common way is to pass a list, or a sequence of lists, as the argument to the array() function.

In [16]:
c = np.array([[1,2,3],[4,5,6]])
c

array([[1, 2, 3],
       [4, 5, 6]])

* In addition to lists, the array() function can accept tuples and sequences of tuples.
    * Tuples are immutable and fixed in length; below is a sequence of tuples.

In [21]:
d = np.array(((1,2,3),(4,5,6))) # tuples in () with elements separated by ,
d

array([[1, 2, 3],
       [4, 5, 6]])

* It can also accept sequences that mix tuples and lists.

In [22]:
e = np.array([(1,2,3), [4,5,6], (7,8,9)]) # Mixing tuples and lists in one sequence
e

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

## Types of Data
* NumPy arrays can contain a wide variety of data types
    * Below is an example with the string data type

In [26]:
g = np.array([['a', 'b'],['c', 'd']])
g

array([['a', 'b'],
       ['c', 'd']], dtype='<U1')

In [30]:
g.dtype

dtype('<U1')

In [29]:
g.dtype.name

'str32'

## The dtype Option
* The array() function accepts more than a single argument.
    * By default, it chooses the most suitable type for the items in the array, based on the values passed in.
* The dtype option lets you define the type explicitly by passing dtype as an argument to the function.

In [32]:
# Defining an array with complex values
f = np.array([[1,2,3],[4,5,6]], dtype=complex)
f # Displays each value as a complex number, with the imaginary part marked by j

array([[1.+0.j, 2.+0.j, 3.+0.j],
       [4.+0.j, 5.+0.j, 6.+0.j]])
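
As a small sketch of the dtype option with other types, the block below forces a 32-bit float at creation and converts an existing array with astype(). The variable names are illustrative and not part of the original session.

```python
import numpy as np

ints = np.array([[1, 2, 3], [4, 5, 6]])                       # integer type inferred
floats = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32)   # type forced at creation
converted = ints.astype(np.float64)                           # astype() returns a converted copy

print(floats.dtype, floats.itemsize)
print(converted.dtype)
```

The original array keeps its inferred integer type; only the copy returned by astype() is converted.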

## Intrinsic Creation of an Array
* The NumPy library provides a set of functions that generate ndarrays with initial content, created with different values depending on the function.
    * This allows a single line of code to generate large amounts of data.

### zeros() function
* Creates an array full of zeros with dimensions defined by the shape argument.

In [34]:
# 3x3 (rows, columns) filled with 0s
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

### ones() function
* Creates an array full of ones with dimensions defined by the shape argument.

In [35]:
np.ones((3,3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

### arange() function
* This function generates NumPy arrays with numerical sequences that follow particular rules depending on the passed arguments.
    * For example, it can generate a sequence of values between 0 and 10.

In [39]:
np.arange(0, 10)
# library_name.function_name(start, end)
# Creates an array running from start up to end
# Start at 0, go up to but not including 10

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [40]:
# Starting from a different number
np.arange(4, 10)
# Start at 4, go to 10 but not including 10

array([4, 5, 6, 7, 8, 9])

In [41]:
# The third argument is the step, and it can be a float instead of an int
np.arange(0, 6, 0.6)

array([0. , 0.6, 1.2, 1.8, 2.4, 3. , 3.6, 4.2, 4.8, 5.4])

### reshape() function
* To generate two-dimensional arrays you can still use the arange() function, combined with the reshape() function.
    * This function divides a linear array into parts in the manner specified by the shape.

In [43]:
np.arange(0, 12).reshape(3,4)
# Make this array, give it 3 rows and 4 columns

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
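
A minimal sketch of two reshape() details not shown above: passing -1 lets NumPy infer one dimension, and a shape that does not match the number of elements raises an error. The names seq and table are illustrative.

```python
import numpy as np

seq = np.arange(12)
# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4)
table = seq.reshape(3, -1)
print(table.shape)

# An incompatible shape raises ValueError instead of silently truncating
try:
    seq.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```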

### linspace() function
* The first two arguments are still the start and end values of the sequence, but the 3rd argument, instead of specifying the distance between one element and the next...
    * defines the number of elements into which we want the interval to be split.

In [45]:
np.linspace(0,10,5)
# 5 evenly spaced values from 0 to 10, with both endpoints included

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
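
To make the difference between a step (arange) and a count (linspace) concrete, here is a short sketch using two optional linspace() parameters, retstep and endpoint. The variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing that linspace computed
points, step = np.linspace(0, 10, 5, retstep=True)
print(points, step)

# endpoint=False leaves out the stop value, as arange does
open_points = np.linspace(0, 10, 5, endpoint=False)
print(open_points)
print(np.arange(0, 10, 2))
```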

### random() function
* This is part of the numpy.random module
    * It generates an array with as many elements as specified in the argument.

In [47]:
np.random.random(3)
# Create a 3-entry array of random numbers in the interval [0, 1)

array([0.09554733, 0.54127461, 0.88107371])

In [49]:
np.random.random((3,3)) # (rows, columns)
# Passing a shape tuple creates a multidimensional array

array([[0.19360937, 0.66997144, 0.85191718],
       [0.82468937, 0.19324519, 0.05215793],
       [0.46516633, 0.73678422, 0.46316788]])
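
Random output changes on every run, which makes notebooks hard to reproduce. One possible approach, sketched here with NumPy's Generator interface, is to seed the generator; the seed 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# default_rng() builds a Generator; the seed 42 is an arbitrary choice
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

sample_a = rng_a.random((3, 3))   # floats in [0, 1), like np.random.random
sample_b = rng_b.random((3, 3))

print(np.array_equal(sample_a, sample_b))  # same seed, same numbers
```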

## Basic Operations
* This is how to apply various operations on NumPy arrays.

### Arithmetic Operations
#### The first operations that you will perform on arrays are the arithmetic operations.
* The most obvious are adding a scalar to an array and multiplying an array by a scalar

In [51]:
a = np.arange(4)
a

array([0, 1, 2, 3])

In [52]:
a + 4 # Add 4 to each original element

array([4, 5, 6, 7])

In [53]:
a * 2 # Multiply each original element by 2

array([0, 2, 4, 6])

* These operators can also be used between 2 arrays
    * In NumPy, these operations are element-wise
        * Operators are applied only between corresponding elements.

In [54]:
b = np.arange(4,8)
b

array([4, 5, 6, 7])

* Addition

In [57]:
a + b # a = [0,1,2,3] and b = [4,5,6,7], so a + b = [4,6,8,10]

array([ 4,  6,  8, 10])

* Subtraction

In [59]:
a - b

array([-4, -4, -4, -4])

* Multiplication

In [60]:
a * b

array([ 0,  5, 12, 21])

* Operators can also be combined with functions that return a NumPy array; for example, you can multiply array a by the sine or the square root of the elements of array b

In [62]:
a * np.sin(b) # array a times the sine of every element of b

array([-0.        , -0.95892427, -0.558831  ,  1.9709598 ])

In [63]:
a * np.sqrt(b)

array([0.        , 2.23606798, 4.89897949, 7.93725393])

* In multidimensional cases, the arithmetic operations continue to operate element-wise

In [65]:
A = np.arange(0,9).reshape(3,3)
A

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [66]:
B = np.ones((3, 3))
B

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [67]:
A * B

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

### The Matrix Product
* In many other tools for data analysis, the * operator is understood as a matrix product when it is applied to 2 matrices.
* In NumPy, * is element-wise, so the matrix product is computed with the dot() function instead.

### dot() function
* This operation is not element-wise.
    * The result at each position is the sum of the products of each element in the corresponding row of the first matrix with the corresponding element of the corresponding column of the 2nd matrix.

In [72]:
print(np.dot(A,B))
print(A)
print(B)

[[ 3.  3.  3.]
 [12. 12. 12.]
 [21. 21. 21.]]
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [74]:
A.dot(B)
# Row 1: 0 * 1 + 1 * 1 + 2 * 1 = 3 in every column of the result
# Row 2: 3 * 1 + 4 * 1 + 5 * 1 = 12 in every column
# Row 3: 6 * 1 + 7 * 1 + 8 * 1 = 21 in every column
# The result is float because B is float and the integers in A are upcast

array([[ 3.,  3.,  3.],
       [12., 12., 12.],
       [21., 21., 21.]])

In [76]:
np.dot(B,A)
# The order of the operands is important
# The matrix product is not commutative: np.dot(A,B) is not equal to np.dot(B,A)

array([[ 9., 12., 15.],
       [ 9., 12., 15.],
       [ 9., 12., 15.]])
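
As a brief sketch, Python's @ operator gives the same matrix product as dot() for 2-D arrays, which makes the contrast with the element-wise * easy to see side by side. The arrays are rebuilt here so the block runs on its own.

```python
import numpy as np

A = np.arange(0, 9).reshape(3, 3)
B = np.ones((3, 3))

elementwise = A * B        # element-by-element product
matrix_prod = A @ B        # matrix product, same as np.dot(A, B) for 2-D arrays

print(np.array_equal(matrix_prod, np.dot(A, B)))
print(np.array_equal(A @ B, B @ A))   # operand order matters
```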

### Increment and Decrement Operators
* Python has no ++ or -- operators
    * Use operators such as += and -= instead
* These do not make a new array; they assign the results back to the same array

In [82]:
a = np.arange(4)
a

array([0, 1, 2, 3])

* Increment (+=)
    * writes the result back to the original array

In [83]:
a += 1
a

array([1, 2, 3, 4])

In [84]:
a -= 1
a

array([0, 1, 2, 3])

* These can be applied in many cases.
* You need them every time you want to change the values in an array without generating a new one.

In [85]:
a += 4
a

array([4, 5, 6, 7])

In [86]:
a *= 2
a

array([ 8, 10, 12, 14])
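
The difference between updating in place and building a new array can be checked with a second name bound to the same array. This is a minimal sketch; values and alias are illustrative names.

```python
import numpy as np

values = np.arange(4)
alias = values          # a second name for the same array

values += 1             # in place: the existing buffer is updated
print(alias)            # the alias sees the change

values = values + 1     # builds a new array and rebinds the name
print(alias)            # the alias still refers to the old array
print(values)
```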

### Universal Function (ufunc)
* A universal function, generally called a ufunc,
    * is a function operating on an array in an element-by-element fashion.
    * In the end you obtain an array of the same size as the input.

* Many mathematical and trigonometric operations meet this definition.
    * For example, calculating the square root with sqrt(), the logarithm with log(), or the sine with sin().

In [87]:
a = np.arange(1,5)
a

array([1, 2, 3, 4])

In [88]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081, 2.        ])

In [89]:
np.log(a)

array([0.        , 0.69314718, 1.09861229, 1.38629436])

In [90]:
np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 ])

### Aggregate Functions
* These perform an operation on a set of values, an array for example, and produce a single result
    * The sum of all elements in an array is an aggregate function.
* Many of them are implemented as methods of the ndarray class.

In [92]:
a = np.array([3.3, 4.5, 1.2, 5.7, 0.3])
a.sum()
# Sum = 3.3 + 4.5 + 1.2 + 5.7 + 0.3

15.0

In [94]:
a.min()
# Minimum number in array

0.3

In [95]:
a.max()
# Maximum number in an array

5.7

In [96]:
a.mean()

3.0

In [97]:
a.std()

2.0079840636817816
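
The value returned by std() above can be reproduced by hand. The sketch below assumes the default settings, which use the population formula (dividing by n); ddof=1 switches to the sample formula (dividing by n - 1).

```python
import numpy as np

a = np.array([3.3, 4.5, 1.2, 5.7, 0.3])

# std() defaults to the population formula: sqrt(mean((x - mean)^2))
manual_std = np.sqrt(((a - a.mean()) ** 2).mean())
print(np.isclose(a.std(), manual_std))

# ddof=1 divides by n - 1 instead of n (sample standard deviation)
print(a.std(ddof=1))
```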

### Indexing, Slicing, and Iterating
* This section shows how to manipulate these objects.
    * It shows how to select elements through indexes and slices, in order to obtain the values contained in them or to make assignments that change their values.
* It also shows how to iterate over them.

### Indexing
* Array indexing always uses square brackets ([ ]) to index the elements of the array so that the elements can then be referred to individually for various uses, such as extracting a value, selecting items, or even assigning a new value.
* Every array has an index that is created automatically.
    * The index always starts at 0.
    * Negative indexes count backwards from the end: -1 is the last element, -2 the one before it, and so on.

In [107]:
a = np.arange(10, 16)
a

array([10, 11, 12, 13, 14, 15])

* Positive Indexes

In [113]:
print(a[4]) # Index 4 is the 5th element in the array
print(a[2])

14
12


* Negative Indexes

In [112]:
print(a[-1])
print(a[-6])

15
10


* To select multiple items at once, you can pass an array of indexes inside the square brackets

In [116]:
a[[1, 3, 4]] 
# From the ndarray, make another array by grabbing the elements at indexes 1, 3, and 4
# Separate the positions by commas inside the inner brackets

array([11, 13, 14])

* In two-dimensional cases
    * Rectangular arrays consisting of rows and columns, defined by 2 axes.
        * Rows
            * axis = 0
        * Columns
            * axis = 1
    * Indexing in two-dimensional cases is represented by a pair of values
        * First value is the index of the row
        * Second value is the index of the column

* To access the values or select the elements in a matrix...
    * Still use square brackets [], but with two values.
        * [row index, column index]

In [119]:
A = np.arange(10, 19).reshape((3,3))
A

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

* To retrieve the element in the 3rd column of the 2nd row, you have to insert the pair [1,2]

In [120]:
A[1,2]

15

In [125]:
A[1,2] = 16
print(A)
A[1,2] = 15
print(A)

[[10 11 12]
 [13 14 16]
 [16 17 18]]
[[10 11 12]
 [13 14 15]
 [16 17 18]]


### Slicing

* Allows us to extract portions of an array to generate new arrays
* There is a slice syntax to follow
    * Use a sequence of numbers separated by colons (:) within square brackets.

* To extract a portion of the array, for example the one that goes from the 2nd to the 5th element:
    * Give the index of the starting position, that is 1, and the index just past the final element, that is 5, separated by :. The end index itself is excluded.

In [128]:
a = np.arange(10, 16)
a

array([10, 11, 12, 13, 14, 15])

In [130]:
a[1:5] # start : end index (end excluded)

array([11, 12, 13, 14])

* A 3rd number defines the gap in the sequence of elements.
    * In this example the value is 2

In [132]:
a[1:5:2] # start at index 1, stop before index 5, step 2

array([11, 13])

In [133]:
a[::2] # Every 2nd element from index 0 to the end

array([10, 12, 14])

In [135]:
a[:5:2] # Same result here: index 5 is excluded, and the step of 2 would skip it anyway

array([10, 12, 14])

In [136]:
a[:5:] # All elements before index 5, with the default step of 1

array([10, 11, 12, 13, 14])
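
A few more slice forms follow from the same start:end:step rule. This short sketch omits bounds and uses negative values, which were not shown in the cells above.

```python
import numpy as np

a = np.arange(10, 16)

print(a[3:])      # from index 3 to the end
print(a[-2:])     # last two elements
print(a[::-1])    # negative step walks the array backwards
```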

### Slicing syntax in a Two-Dimensional Array

In [137]:
A = np.arange(10, 19).reshape((3,3))
A

array([[10, 11, 12],
       [13, 14, 15],
       [16, 17, 18]])

* Extracting the first row, all columns

In [141]:
A[0,:] # A colon on its own, with no numbers, selects all columns.

array([10, 11, 12])

* To extract all the values in the first column, you have to write the inverse.

In [143]:
A[:, 0] # All rows of the first column (index = 0)

array([10, 13, 16])

* Extracting a smaller matrix
    * Explicitly define all intervals with the indexes that bound them.

In [146]:
print(A)
A[0:2, 0:2] # 2 x 2 matrix made of rows 0 and 1 and columns 0 and 1

[[10 11 12]
 [13 14 15]
 [16 17 18]]


array([[10, 11],
       [13, 14]])

* If the indexes of the rows or columns to be extracted are not contiguous, you can specify an array of indexes.

In [147]:
A[[0,2], 0:2]

array([[10, 11],
       [16, 17]])

### Iterating an Array
* Use the for construct

In [148]:
for i in a:
    print(i)

10
11
12
13
14
15


* In a two-dimensional case, you could think of applying the solution of two nested loops with the for construct.
    * The first loop will scan the rows of the array.
    * The second loop will scan the columns.
    * A single for loop over a matrix always scans along the first axis (0), so it yields one row at a time.

In [149]:
for row in A:
    print(row)

[10 11 12]
[13 14 15]
[16 17 18]


* Making an iteration element by element
    * use the for loop on A.flat.

In [150]:
for item in A.flat: # A.flat visits every element, row by row
    print(item)

10
11
12
13
14
15
16
17
18
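
When the position of each element is needed along with its value, one option is np.ndenumerate(), sketched below. It visits the elements in the same order as A.flat; only the diagonal is printed to keep the output short.

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))

# ndenumerate yields ((row, column), value) pairs in the same order as A.flat
for (row, col), value in np.ndenumerate(A):
    if row == col:                      # keep the output short: diagonal only
        print(row, col, value)
```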


* If you want to launch an aggregate function that returns a value calculated for every single column or every single row, there is an optimal way that lets NumPy manage the iteration

### apply_along_axis() function
* Takes 3 arguments
    * Aggregate function
    * The axis along which to apply the iteration
    * And the array
* If the option axis = 0, the function is applied column by column; if axis = 1, it is applied row by row
* For example, you can calculate the average values first by column, and then by row.

In [151]:
np.apply_along_axis(np.mean, axis=0, arr=A) #(function, applied axis, array)

array([13., 14., 15.])

In [152]:
np.apply_along_axis(np.mean, axis=1, arr=A)
# Mean of each row: (10+11+12)/3 = 11, (13+14+15)/3 = 14, (16+17+18)/3 = 17

array([11., 14., 17.])
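
The aggregate methods also accept an axis argument directly, which gives the same per-column and per-row results as the calls above. A minimal sketch, rebuilding the same 3x3 array:

```python
import numpy as np

A = np.arange(10, 19).reshape((3, 3))

col_means = A.mean(axis=0)   # collapse the rows: one mean per column
row_means = A.mean(axis=1)   # collapse the columns: one mean per row

print(col_means)
print(row_means)
print(np.allclose(col_means, np.apply_along_axis(np.mean, axis=0, arr=A)))
```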

* Nothing prevents us from using our own defined functions.
* Nothing forbids you from using a ufunc either
    * Below is an example of how an element-wise function, like a ufunc, performs the iteration element by element, so the result is the same along either axis.

In [153]:
def foo(x):
    return x/2

In [155]:
np.apply_along_axis(foo, axis=1, arr=A)
# Halves every element, working row by row

array([[5. , 5.5, 6. ],
       [6.5, 7. , 7.5],
       [8. , 8.5, 9. ]])

In [156]:
np.apply_along_axis(foo, axis=0, arr=A)

array([[5. , 5.5, 6. ],
       [6.5, 7. , 7.5],
       [8. , 8.5, 9. ]])

### Conditions and Boolean Arrays
* So far, indexing and slicing have been used to select or extract a subset of an array.
    * These methods use numerical indexes
* An alternative way to selectively extract the elements in an array is to use conditions and Boolean operators

* For example, select all values that are less than 0.5 in a 4x4 matrix containing random numbers between 0 and 1.

In [158]:
A = np.random.random((4,4))
A

array([[0.41630022, 0.62714843, 0.88129016, 0.42534814],
       [0.29414251, 0.13486971, 0.218007  , 0.33925804],
       [0.71796558, 0.0853269 , 0.04568871, 0.75198672],
       [0.26470374, 0.06715137, 0.03222763, 0.53608054]])

* Once a matrix of random numbers is defined, if you apply a condition with an operator, you receive as the return value a Boolean array containing True in the positions where the condition is satisfied.

In [159]:
A < 0.5

array([[ True, False, False,  True],
       [ True,  True,  True,  True],
       [False,  True,  True, False],
       [ True,  True,  True, False]])

* Boolean arrays are used implicitly for making selections of parts of arrays. 
* Inserting the previous condition directly inside the square brackets extracts all elements smaller than 0.5 and puts them into a new one-dimensional array

In [160]:
A[A < 0.5]

array([0.41630022, 0.42534814, 0.29414251, 0.13486971, 0.218007  ,
       0.33925804, 0.0853269 , 0.04568871, 0.26470374, 0.06715137,
       0.03222763])
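
Conditions can be combined, and a Boolean mask can also be the target of an assignment. The sketch below uses a small fixed array rather than random values so the result is repeatable; M, mask and clipped are illustrative names.

```python
import numpy as np

# Fixed values instead of random ones so the result is repeatable
M = np.array([[0.1, 0.7],
              [0.4, 0.9]])

mask = (M > 0.2) & (M < 0.8)     # & combines conditions; parentheses are required
print(M[mask])

clipped = M.copy()
clipped[clipped >= 0.5] = 0.5    # a Boolean mask can also select targets for assignment
print(clipped)
```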

### Shape Manipulation
* This can convert a 1-dimensional array into a matrix, and back again.

### reshape() function

In [162]:
a = np.random.random(12)
a

array([0.35690086, 0.91328128, 0.90452286, 0.21517588, 0.66729256,
       0.09576871, 0.66065936, 0.42476646, 0.53895151, 0.02097033,
       0.78666216, 0.66391236])

In [163]:
A = a.reshape(3, 4) # 3 rows by 4 columns
A

array([[0.35690086, 0.91328128, 0.90452286, 0.21517588],
       [0.66729256, 0.09576871, 0.66065936, 0.42476646],
       [0.53895151, 0.02097033, 0.78666216, 0.66391236]])

### ravel() function
* This function converts a two-dimensional array into a one-dimensional array.

In [167]:
a = A.ravel()
print(a)
# Output is a flat, 1-dimensional array
print(a.shape)

[0.35690086 0.91328128 0.90452286 0.21517588 0.66729256 0.09576871
 0.66065936 0.42476646 0.53895151 0.02097033 0.78666216 0.66391236]
(12,)


* Assigning the total number of elements to .shape flattens the array in place, giving the same result as ravel()

In [169]:
a.shape = (12)
a

array([0.35690086, 0.91328128, 0.90452286, 0.21517588, 0.66729256,
       0.09576871, 0.66065936, 0.42476646, 0.53895151, 0.02097033,
       0.78666216, 0.66391236])

### transpose() function
* This swaps the rows and the columns

In [175]:
print(A)
print('\n')
print(A.transpose())

[[0.35690086 0.91328128 0.90452286 0.21517588]
 [0.66729256 0.09576871 0.66065936 0.42476646]
 [0.53895151 0.02097033 0.78666216 0.66391236]]


[[0.35690086 0.66729256 0.53895151]
 [0.91328128 0.09576871 0.02097033]
 [0.90452286 0.66065936 0.78666216]
 [0.21517588 0.42476646 0.66391236]]


## Array Manipulation
* This shows how to create new arrays by joining or splitting arrays that are already defined.

### Joining Arrays
* Merging multiple arrays to form a new one that contains all the arrays.
* NumPy uses the concept of stacking
    * Perform vertical stacking with the vstack() function
        * This combines the second array as new rows of the first array
        * The array grows in a vertical direction
    * Perform horizontal stacking using the hstack() function
        * The second array is added to the columns of the first array.

### vstack() function

In [177]:
A = np.ones((3, 3))
B = np.zeros((3,3))
np.vstack((A,B))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

### hstack() function

In [179]:
np.hstack((A,B))

array([[1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])

### column_stack() function
* Generally used with 1-D arrays
    * which are stacked as columns in order to form a new 2-D array

In [181]:
a = np.array([0,1,2])
b = np.array([3,4,5])
c = np.array([6,7,8])
np.column_stack((a,b,c)) # column 0 = a, column 1 = b, column 2 = c

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

### row_stack() function
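* 1-D arrays are stacked as rows in order to form a new 2-D array
* row_stack() is an alias of vstack(); recent NumPy releases deprecate it in favor of vstack()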

In [183]:
np.row_stack((a,b,c))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
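
The stacking functions can also be expressed with the more general np.concatenate() and an explicit axis. A minimal sketch; tall and wide are illustrative names.

```python
import numpy as np

A = np.ones((3, 3))
B = np.zeros((3, 3))

# axis=0 joins along rows (like vstack); axis=1 joins along columns (like hstack)
tall = np.concatenate((A, B), axis=0)
wide = np.concatenate((A, B), axis=1)

print(tall.shape, wide.shape)
```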

### Splitting Arrays
* Here you have a set of functions that work both horizontally and vertically

### hsplit() function

In [186]:
A = np.arange(16).reshape((4,4))
A

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

* Splitting the array horizontally means the width of the array is divided into two parts.
    * The 4x4 matrix A will be split into two 4x2 matrices

In [190]:
[B,C] = np.hsplit(A, 2)
print(B) # All 4 rows, first 2 columns
print('\n')
print(C) # All 4 rows, last 2 columns

[[ 0  1]
 [ 4  5]
 [ 8  9]
 [12 13]]


[[ 2  3]
 [ 6  7]
 [10 11]
 [14 15]]


### vsplit() function
* Splitting the array vertically means the height of the array is divided into 2 parts; the 4x4 matrix A will be split into two 2x4 matrices.

In [193]:
[B,C] = np.vsplit(A,2)
print(B) # First 2 rows of array A
print('\n')
print(C) # Last 2 rows of array A

[[0 1 2 3]
 [4 5 6 7]]


[[ 8  9 10 11]
 [12 13 14 15]]


### split() function
* A more complex command
* Allows you to split the array into nonsymmetrical parts
    * You must specify the indexes at which to divide the array
        * With axis = 1 the indexes refer to columns
        * With axis = 0 the indexes refer to rows
    * Specifying two indexes divides the array into three parts, in the following way.

In [202]:
[A1,A2,A3] = np.split(A, [1,3], axis=1) # Cut before column indexes 1 and 3
print(A)
print('\n')
print(A1) # The matrix is divided into 3 parts; the first includes the first column
print('\n')
print(A2) # The second includes the middle 2 columns (nonsymmetrical)
print('\n')
print(A3) # The third includes the last column

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


[[ 0]
 [ 4]
 [ 8]
 [12]]


[[ 1  2]
 [ 5  6]
 [ 9 10]
 [13 14]]


[[ 3]
 [ 7]
 [11]
 [15]]


* Now doing it by row, with axis = 0

In [205]:
[A1,A2,A3] = np.split(A, [1,3], axis=0)
print(A1) # First row
print('\n')
print(A2) # Middle two rows
print('\n')
print(A3) # Last row

[[0 1 2 3]]


[[ 4  5  6  7]
 [ 8  9 10 11]]


[[12 13 14 15]]
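
As a quick check on how the split indexes work, the sketch below lists the shapes of the three parts and joins them again along the same axis. The names parts and restored are illustrative.

```python
import numpy as np

A = np.arange(16).reshape((4, 4))

parts = np.split(A, [1, 3], axis=1)
print([p.shape for p in parts])          # widths 1, 2 and 1

# Joining the parts along the same axis restores the original array
restored = np.concatenate(parts, axis=1)
print(np.array_equal(restored, A))
```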


## General Concepts
* Some operations return a copy of an array and others return a view; the difference matters when you modify the values.
* Broadcasting, which occurs implicitly, is also covered here

### Copies or Views of Objects
* Depending on the operation, NumPy returns either a copy or a view of the array; assignments never make a copy

In [208]:
a = np.array([1,2,3,4])
b = a
print(b)
a[2] = 0
print(b)

[1 2 3 4]
[1 2 0 4]


* When you assign one array a to another array b, you are not copying it.
    * b is just another way to refer to array a.
* Slicing an array likewise returns a view of the original array, not a copy.

In [211]:
c = a[0:2]
print(c)
a[0] = 0
print(c)

[1 2]
[0 2]
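
Whether two arrays share the same data can be tested with np.shares_memory(). The sketch below contrasts a basic slice, which is a view, with selection by an index array, which produces an independent array; view and fancy are illustrative names.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[0:2]            # basic slice: a view onto a's data
fancy = a[[0, 1]]        # index array: a new, independent array

print(np.shares_memory(a, view))
print(np.shares_memory(a, fancy))

a[0] = 99
print(view, fancy)       # only the view reflects the change
```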


### copy() function
* The copy() function generates a complete and distinct array.

In [212]:
a = np.array([1,2,3,4])
c = a.copy()
c

array([1, 2, 3, 4])

In [214]:
a[0] = 0
print(c)
print(a)

[1 2 3 4]
[0 2 3 4]


### Vectorization
* Vectorization, along with broadcasting, is the basis of the internal implementation of NumPy
* Vectorization is the absence of explicit loops in the code you write
    * The loops cannot actually be omitted; they are implemented internally and run in compiled code
* Applying vectorization leads to more concise and readable code

In [219]:
print(a * b)
print('\n')
print(a)
print('\n')
print(b)

[ 0  4  0 16]


[0 2 3 4]


[0 2 0 4]


#### In languages without vectorization, the first operation would be expressed with an explicit loop
for (i = 0; i < rows; i++) {
    c[i] = a[i]*b[i];
}

#### The element-wise product of two matrices, A * B, would be expressed as follows:
for (i = 0; i < rows; i++) {
    for (j = 0; j < columns; j++) {
        c[i][j] = a[i][j]*b[i][j];
    }
}
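
The same comparison can be sketched in Python: an explicit loop mirroring the C-style pseudocode, next to the vectorized expression. The array values match the session above; looped and vectorized are illustrative names.

```python
import numpy as np

a = np.array([0, 2, 3, 4])
b = np.array([0, 2, 0, 4])

# Explicit loop, mirroring the C-style pseudocode
looped = np.empty_like(a)
for i in range(len(a)):
    looped[i] = a[i] * b[i]

# Vectorized form: the loop runs inside NumPy
vectorized = a * b

print(looped, vectorized)
```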


### Broadcasting
* Allows an operator or a function to act on two or more arrays even if these arrays do not have the same shape.
* Not all dimensions can be subjected to broadcasting
    * They must satisfy certain rules

* Two arrays can be subjected to broadcasting when all their dimensions are compatible
    * The lengths of each dimension must be equal, or one of them must be equal to 1.
    * If not, you get an exception stating that the 2 arrays are not compatible.

In [227]:
A = np.arange(16).reshape(4,4)
b = np.arange(4)
print(A)
print('\n')
print(b)
# Two arrays here, with shapes 4x4 and 4

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]


[0 1 2 3]


* First rule of broadcasting
    * Add a 1 for each missing dimension, on the left of the shape
        * You should get 4 x 4 and 1 x 4

* Second rule of broadcasting
    * This extends the size of the smallest array so that it matches the size of the biggest array.
    * This allows the element-wise function or operator to be applied.
    * It also assumes that the dimensions of length 1 are filled with replicas of the values they contain, until they reach the extended size.

In [229]:
print(b)
A + b

[0 1 2 3]


array([[ 0,  2,  4,  6],
       [ 4,  6,  8, 10],
       [ 8, 10, 12, 14],
       [12, 14, 16, 18]])
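
The compatibility rule can be probed directly. This sketch reports the broadcast shape, reshapes b into a 4x1 column to broadcast along the other axis, and shows the exception raised for incompatible lengths; the name column is illustrative.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
b = np.arange(4)

print(np.broadcast(A, b).shape)      # shape of the broadcast result

# A 4x1 column adds b[i] to every element of row i
column = b[:, np.newaxis]
print(A + column)

# Lengths 4 and 3 are neither equal nor 1, so broadcasting fails
try:
    A + np.arange(3)
except ValueError as err:
    print("not compatible:", err)
```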

* There are more complex cases in which the two arrays have different shapes and each is smaller than the other only in certain dimensions.

In [233]:
m = np.arange(6).reshape(3, 1, 2)
n = np.arange(6).reshape(3, 2, 1)
print(m)
print('\n')
print(n)
# The shapes differ, but they are compatible for broadcasting
# 3 x 1 x 2
# 3 x 2 x 1

[[[0 1]]

 [[2 3]]

 [[4 5]]]


[[[0]
  [1]]

 [[2]
  [3]]

 [[4]
  [5]]]


In [237]:
# m is extended along its second axis (n is extended the same way along its third):
# m* = [[[0,1],
#        [0,1]],
#       [[2,3],
#        [2,3]],
#       [[4,5],
#        [4,5]]]
m + n # Addition operator between the two arrays operating element-wise.

array([[[ 0,  1],
        [ 1,  2]],

       [[ 4,  5],
        [ 5,  6]],

       [[ 8,  9],
        [ 9, 10]]])

## Structured Arrays
* NumPy allows you to create arrays that are much more complex not only in size but also in structure; these are called structured arrays.
    * This type of array contains structs or records instead of individual items.

In [245]:
# Creating a structured array
structured = np.array([(1, 'First', 0.5, 1+2j),(2, 'Second', 1.3, 2-2j),
                       (3, 'Third', 0.8, 1+3j)], dtype=('i2, a5, f4, c8'))
print(structured)
print('\n')
# The dtype string gives the data type of each field in order; a5 holds only 5 bytes, so 'Second' is truncated to b'Secon'
print(structured[1])

[(1, b'First', 0.5, 1.+2.j) (2, b'Secon', 1.3, 2.-2.j)
 (3, b'Third', 0.8, 1.+3.j)]


(2, b'Secon', 1.3, 2.-2.j)


* The names automatically assigned to each item of the struct can be considered as the names of the columns of the array.
* Using them as a structured index, you can refer to all the elements of the same type, or of the same column.

In [246]:
print(structured['f1'])

[b'First' b'Secon' b'Third']


* Names are assigned automatically with an f (for field) and a progressive integer that indicates the position in the sequence
* It is more useful to specify names that are more meaningful.
* This is possible, and can be done at the time of array declaration.

In [248]:
structured = np.array([(1, 'First', 0.5, 1+2j), (2, 'Second', 1.3, 2-2j),
                       (3, 'Third', 0.8, 1+3j)], dtype =[('id', 'i2'), ('position', 'a6'), ('value', 'f4'), ('complex', 'c8')])
print(structured)

[(1, b'First', 0.5, 1.+2.j) (2, b'Second', 1.3, 2.-2.j)
 (3, b'Third', 0.8, 1.+3.j)]


* You can also do it at a later time, by redefining the tuple of names assigned to the dtype attribute of the structured array.

In [250]:
structured.dtype.names = ('id', 'order', 'value', 'complex')

* Now we can use meaningful names for the various field types. 

In [251]:
structured['order']

array([b'First', b'Second', b'Third'], dtype='|S6')
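
Named fields make it possible to filter and sort records by column. The sketch below is a reduced version of the array above (the complex field is left out for brevity) and uses the 'S6' type code for a 6-byte string; record_type and records are illustrative names.

```python
import numpy as np

# Field names follow the example above; 'S6' is the byte-string code, 6 bytes wide
record_type = np.dtype([('id', 'i2'), ('order', 'S6'), ('value', 'f4')])
records = np.array([(1, b'First', 0.5), (2, b'Second', 1.3), (3, b'Third', 0.8)],
                   dtype=record_type)

print(records.dtype.names)
print(records[records['value'] > 0.6]['order'])   # filter rows on one field
print(np.sort(records, order='value')['id'])      # sort records by a field
```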

## Reading and Writing Array Data on Files

* NumPy allows you to read the data written in a file and convert it into an array, and to save arrays to files.

### Loading and Saving Data in Binary Files

* A pair of functions called save() and load() enables you to save and then later retrieve data stored in binary format.

### save() function

* Use the save() function and specify as arguments the name of the file and the array.
* This file will automatically be given the .npy extension

In [254]:
data = np.random.random((4,3))
print(data)
np.save('saved_data', data)

[[0.36838458 0.22779669 0.11709322]
 [0.29277543 0.69756215 0.09948199]
 [0.77627442 0.73976143 0.88067001]
 [0.79832777 0.79865273 0.66143198]]


* Whenever you need to recover the data stored in an .npy file use the load() function

### load() function
* Specify the filename as the argument, this time adding the extension .npy

In [255]:
loaded_data = np.load('saved_data.npy')
print(loaded_data)

[[0.36838458 0.22779669 0.11709322]
 [0.29277543 0.69756215 0.09948199]
 [0.77627442 0.73976143 0.88067001]
 [0.79832777 0.79865273 0.66143198]]
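
A self-contained round trip can be sketched with a temporary directory, so that no file is left behind. The file name saved_data is reused from above; the array contents and other names are illustrative.

```python
import os
import tempfile

import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)

# Write into a temporary directory so no file is left behind
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'saved_data')   # file name without extension
    np.save(path, data)                         # NumPy appends .npy
    written = os.listdir(folder)
    loaded = np.load(path + '.npy')

print(written)
print(np.array_equal(data, loaded))
```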


### Reading Files with Tabular Data
* Many times the data you want to read or save is in TXT or CSV format.
    * CSV: values separated by commas

In [256]:
# Example of data in csv format
# id,value1,value2,value3
# 1,123,1.4,23
# 2,110,0.4,18
# 3,164,2.1,19

* To read the data in a text file and insert the values into an array, NumPy provides a function called genfromtxt().

### genfromtxt() function
* Takes 3 arguments
    * File containing the data
    * Character that separates values from each other (usually a comma)
    * Whether the data contain column headers

In [259]:
data = np.genfromtxt('~/Python-Data-Analytics/ch_data.csv', delimiter=',', names=True)

OSError: ~/Python-Data-Analytics/ch_data.csv not found.

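* The OSError above means the file could not be found at that path; most likely this is because NumPy does not expand ~ to the home directory, so pass a full path instead (for example one built with os.path.expanduser()).
* The genfromtxt() function replaces the blanks in the file with nan values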
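
Since the CSV file itself is not available here, the sketch below feeds the sample data from above to genfromtxt() through an in-memory text buffer, with one value deliberately left blank to show the nan replacement. The blank entry and the variable names are assumptions made for illustration.

```python
import io

import numpy as np

# The CSV sample from above, with value2 left blank in the last row
csv_text = """id,value1,value2,value3
1,123,1.4,23
2,110,0.4,18
3,164,,19
"""

# genfromtxt accepts a file-like object as well as a file name
table = np.genfromtxt(io.StringIO(csv_text), delimiter=',', names=True)

print(table.dtype.names)    # column headers become field names
print(table['value1'])
print(table['value2'])      # the blank entry is read as nan
```

With names=True the result is a structured array, so each column can be selected by its header name.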