# Google Advanced Data Analytics specialization 2

## matplotlib
A library for creating static, animated, and interactive visualizations in Python 

## Seaborn:
A visualization library based on matplotlib that provides a simpler interface for working with common plots and graphs

## NumPy:
An essential library that contains multidimensional array and matrix data structures and functions to manipulate them.

## pandas:
A powerful library built on top of NumPy that's used to manipulate and analyze tabular data

**Module**:
A simple Python file containing a collection of functions and global variables

**Global Variable**:
A variable that can be accessed from anywhere in a program or script
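
As a rough illustration of the two definitions above, the sketch below reads a variable that lives inside a standard-library module (`math.pi`) and defines a global variable in the script itself. The names `GREETING` and `greet` are hypothetical and only serve the example.

```python
import math  # a standard-library module that exposes functions and variables

# math.pi is a variable defined at module level inside the math module
circle_area = math.pi * 2 ** 2

# GREETING is a hypothetical global variable defined at the top level of this script
GREETING = "hello"

def greet(name):
    # The function can read GREETING because it is global to the script
    return f"{GREETING}, {name}"

message = greet("world")
print(circle_area)
print(message)
```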

## Introduction to NumPy
### Vectorization:
- Enables operations to be performed on multiple components of a data object at the same time.


In [None]:
list_a = [1, 2, 3]
list_b = [2, 4, 6]

# using numpy for numerical operations
import numpy as np
array_a = np.array(list_a)
array_b = np.array(list_b)

array_a * array_b

array([ 2,  8, 18])
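
For comparison, here is a small sketch of the same element-wise product done two ways: with plain lists, which need an explicit loop or comprehension, and with NumPy arrays, where a single expression operates on every element. The variable names `loop_result` and `vectorized_result` are illustrative.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [2, 4, 6]

# Plain Python: multiply the lists pair by pair
loop_result = [a * b for a, b in zip(list_a, list_b)]

# NumPy: one vectorized expression covers all elements
vectorized_result = np.array(list_a) * np.array(list_b)

print(loop_result)
print(vectorized_result)
```

Note that `list_a * list_b` on the plain lists would raise a `TypeError`, which is one reason the conversion to arrays is needed.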

### Commonly used built-in modules
1. `datetime` provides many helpful date and time conversions and calculations

In [6]:
import datetime 
date = datetime.date(1977, 5, 8)
print(date)
print(date.year)

delta = datetime.timedelta(days=30)
print(date - delta)

1977-05-08
1977
1977-04-08
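
The cell above covers calculations; the sketch below adds the conversion side, assuming dates arrive as strings in `YYYY-MM-DD` form. The variable names are illustrative.

```python
import datetime

# Convert a string to a date (format assumed to be YYYY-MM-DD)
parsed = datetime.datetime.strptime("1977-05-08", "%Y-%m-%d").date()

# Convert the date back to a string in a different layout
formatted = parsed.strftime("%d/%m/%Y")

# Subtracting two dates gives a timedelta
gap = datetime.date(1977, 6, 7) - parsed

print(parsed)
print(formatted)
print(gap.days)
```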


2. `math` provides access to mathematical functions

In [7]:
import math 
print(math.exp(0))
print(math.log(1))
print(math.factorial(4))
print(math.sqrt(100))

1.0
0.0
24
10.0


3. `random` is useful for generating pseudo-random numbers

In [8]:
import random 
print(random.random())
print(random.choice([1, 2, 3]))
print(random.randint(1, 10))

0.42549589006098254
1
8
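
Because the numbers are pseudo-random, the outputs above will differ from run to run. A minimal sketch of making them repeatable with `random.seed()` follows; the seed value 42 is an arbitrary choice.

```python
import random

random.seed(42)           # fix the generator's starting state
first = random.random()

random.seed(42)           # reset to the same state
second = random.random()

# Both draws come from the same state, so they match
print(first == second)
```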


In [9]:
import numpy as np 
import pandas as pd 
print(np.__version__)
print(pd.__version__)

1.26.4
2.2.3


### N-dimensional array (ndarray)
The core data object of NumPy
- they are mutable
- to change the size of an array, you have to reassign it
- all the elements of an array share the same data type
- NumPy arrays can be multidimensional
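
The properties listed above can be sketched in a few lines. This is only an illustration: `np.append` is one of several ways to get a larger array, and the variable names are made up for the example.

```python
import numpy as np

x = np.array([1, 2, 3])

# Mutable: an element can be changed in place
x[0] = 10

# Resizing: np.append returns a new array and leaves x as it was
bigger = np.append(x, 4)
x_len = x.size            # still 3

# To "grow" x, reassign the name to the new array
x = bigger

# Same data type: mixing ints and floats converts everything to float
mixed = np.array([1, 2.5])

print(x)
print(x_len)
print(mixed.dtype)
```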

In [10]:
import numpy as np 

x = np.array([1, 2, 3, 4])
x

array([1, 2, 3, 4])

In [12]:
x[-1] = 5
x

array([1, 2, 3, 5])

In [14]:
arr = np.array([1, 2, 'cpcpnut'])
arr

array(['1', '2', 'cpcpnut'], dtype='<U21')

>`dtype`:
A NumPy attribute used to check the data type of the contents of an array 

In [15]:
arr.dtype

dtype('<U21')

> `shape`: A NumPy attribute used to check the shape of an array 

In [16]:
arr.shape

(3,)

>`ndim`: A NumPy attribute used to check the number of dimensions of an array.

In [17]:
arr.ndim

1

In [None]:
# A 2-dimensional array is a list of lists
arr_2d = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
print(arr_2d.shape)
print(arr_2d.ndim)
arr_2d

(4, 2)
2


array([[1, 2],
       [3, 4],
       [5, 6],
       [7, 8]])

In [20]:
arr_2d = arr_2d.reshape(2, 4)
arr_2d

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [19]:
# A 3-dimensional array is a list of lists of lists
arr_3d = np.array([[[1, 2, 3],
                    [3, 4, 5]],
                   
                   [[5, 6, 7],
                    [7, 8, 9]]])

print(arr_3d.shape)
print(arr_3d.ndim)
arr_3d

(2, 2, 3)
3


array([[[1, 2, 3],
        [3, 4, 5]],

       [[5, 6, 7],
        [7, 8, 9]]])

In [21]:
# calculating mean 
arr = np.array([1, 2, 3, 4, 5])
np.mean(arr)

3.0

In [22]:
# calculating log
np.log(arr)

array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791])

In [23]:
np.ceil(5.3)

6.0

>`np.array()`: This creates an `ndarray`. There is no limit to how many dimensions a NumPy array can have, but arrays with many dimensions can be more difficult to work with.

In [25]:
# 1d array:
import numpy as np 
array_1d = np.array([1, 2, 3])
array_1d

array([1, 2, 3])

In [27]:
# 2d array:
array_2d = np.array([[1, 2, 3], 
                    [4, 5, 6]])
array_2d

array([[1, 2, 3],
       [4, 5, 6]])

> `np.zeros()`: This creates an array of a designated shape that is pre-filled with zeros:

In [28]:
np.zeros((3, 2))

array([[0., 0.],
       [0., 0.],
       [0., 0.]])

> `np.ones()`: This creates an array of a designated shape that is pre-filled with ones:

In [29]:
np.ones((2, 2))

array([[1., 1.],
       [1., 1.]])

>`np.full()`: This creates an array of a designated shape that is pre-filled with a specified value

In [30]:
np.full((5, 3), 8)

array([[8, 8, 8],
       [8, 8, 8],
       [8, 8, 8],
       [8, 8, 8],
       [8, 8, 8]])

### Array Methods:
- NumPy arrays have many methods that allow you to manipulate and operate on them. 

>`ndarray.flatten()`:
This returns a copy of the array collapsed into one dimension

In [31]:
array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.flatten()

[[1 2 3]
 [4 5 6]]



array([1, 2, 3, 4, 5, 6])
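
Since `flatten()` returns a copy, changing the flattened array should leave the original untouched. A short sketch to check this, with illustrative variable names:

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])
flat = original.flatten()

# Modify the flattened copy only
flat[0] = 99

print(flat)
print(original)   # the 2D array keeps its first element
```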

>`ndarray.reshape()`: This gives a new shape to an array without changing its data.

In [34]:
array_2d = np.array([(1, 2, 3), (4, 5, 6)])
print(array_2d)
print()
array_2d.reshape(3, 2)

[[1 2 3]
 [4 5 6]]



array([[1, 2],
       [3, 4],
       [5, 6]])
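
Two details of `reshape()` that the example above does not show: one dimension may be given as `-1` so NumPy infers it, and the new shape must hold exactly the same number of elements. The sketch below uses illustrative names (`inferred`, `error_message`).

```python
import numpy as np

array_2d = np.array([(1, 2, 3), (4, 5, 6)])

# -1 asks NumPy to work out that dimension: 6 elements / 2 columns = 3 rows
inferred = array_2d.reshape(-1, 2)

# A shape that needs 8 elements cannot be built from 6
try:
    array_2d.reshape(4, 2)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(inferred.shape)
print(error_message)
```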

## Introduction to pandas
### Tabular data 
- Data that is in the form of a table, with rows and columns

In [44]:
import numpy as np 
import pandas as pd
dataframe = pd.read_csv('~/Desktop/data_analysis/DATA/raw_data/train.csv')
print(dataframe.head(20))

    PassengerId  Survived  Pclass  \
0             1         0       3   
1             2         1       1   
2             3         1       3   
3             4         1       1   
4             5         0       3   
5             6         0       3   
6             7         0       1   
7             8         0       3   
8             9         1       3   
9            10         1       2   
10           11         1       3   
11           12         1       1   
12           13         0       3   
13           14         0       3   
14           15         0       3   
15           16         1       2   
16           17         0       3   
17           18         1       2   
18           19         0       3   
19           20         1       3   

                                                 Name     Sex   Age  SibSp  \
0                             Braund, Mr. Owen Harris    male  22.0      1   
1   Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.