# Introduction to Machine Learning

Parts of this notebook are from the CS231n [Python tutorial](http://cs231n.github.io/python-numpy-tutorial/) by Justin Johnson and the [cs228-python-tutorial](https://github.com/kuleshov/cs228-material/blob/master/tutorials/python/cs228-python-tutorial.ipynb) by [Volodymyr Kuleshov](http://web.stanford.edu/~kuleshov/) and [Isaac Caswell](https://symsys.stanford.edu/viewing/symsysaffiliate/21335).

## Introduction to Python

In this tutorial, we will cover:
* **Data Types**: Different data types available in Python
* **Containers**: Various data structures available in Python
* **Functions**: Creating functions and how to use them in Python
* **Classes**: Creating classes and using them in Python
* **File I/O**: Different file-related input/output operations


## Data Types

Like most languages, Python has a number of basic types including integers, floats, booleans, and strings. These data types behave in ways that are familiar from other programming languages:

**Numbers, Strings and Lists**

Python also has a built-in type for complex numbers, and its integers have arbitrary precision (Python 3 has no separate long integer type); you can find all of the details in the [documentation](https://docs.python.org/3/library/stdtypes.html#numeric-types-int-float-complex).

To learn more about numbers, strings, and lists, see the [tutorial introduction](https://docs.python.org/3/tutorial/introduction.html).

For everything you need to know about built-in types, see the [standard types documentation](https://docs.python.org/3/library/stdtypes.html#bytes-and-bytearray-operations).

You can find a list of all string methods in the [documentation](https://docs.python.org/3/library/stdtypes.html#string-methods).
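
As a quick illustration of the basic types described above, here is a minimal sketch. The variable names are illustrative only and do not come from the original notebook.

```python
# Basic built-in types; all names here are illustrative.
n = 3            # int
x = 2.5          # float
flag = n > 2     # bool
word = "hello"   # str

print(type(n).__name__, type(x).__name__, type(flag).__name__, type(word).__name__)

# Integers have arbitrary precision in Python 3.
big = 2 ** 100
print(big)

# Complex numbers are built in.
z = complex(1, 2)
print(z.real, z.imag)

# A few common string methods.
print(word.capitalize(), word.upper(), len(word))
```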

### Functions

Python functions are defined using the def keyword. For example:

In [34]:
def sign(x):
    if x > 0:
        return 'positive'
    elif x < 0:
        return 'negative'
    else:
        return 'zero'

for x in [-1, 0, 1]:
    print(sign(x))
    

negative
zero
positive


We will often define functions to take optional keyword arguments, like this:

In [35]:
def hello(name, loud=False):
    if loud:
        print('HELLO, %s!' % name.upper())
    else:
        print('Hello, %s' % name)

hello('Bob')
hello('Fred', loud=True)

Hello, Bob
HELLO, FRED!


### Classes

The syntax for defining classes in Python is straightforward:

In [36]:
class Greeter(object):

    # Constructor
    def __init__(self, name):
        self.name = name  # Create an instance variable

    # Instance method
    def greet(self, loud=False):
        if loud:
            print('HELLO, %s!' % self.name.upper())
        else:
            print('Hello, %s' % self.name)

g = Greeter('Fred')  # Construct an instance of the Greeter class
g.greet()            # Call an instance method; prints "Hello, Fred"
g.greet(loud=True)   # Call an instance method; prints "HELLO, FRED!"

Hello, Fred
HELLO, FRED!


### File I/O Operations

In Python, you access a file on disk through a file object returned by the built-in function `open()`.

#### Writing to a File

Two built-in methods write data to a file:

* `write()`: writes a single string to the file exactly as given; no newline is appended.
* `writelines()`: writes each string from a list (or any other iterable) in order; it does not add line separators, so include `\n` yourself where needed.

In [37]:
strings = ["Welcome to Nimblebox\n", "Open source is the best"]
with open("output.txt", "w+") as file:
    file.write("Hey there user!\n")
    file.writelines(strings)
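
To make the newline behavior of the two methods concrete, here is a small sketch that writes to a temporary file and reads the result back. The file name `demo.txt` and the sample strings are assumptions made for this example.

```python
import os
import tempfile

# Use a temporary directory so nothing in the working directory is touched.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

lines = ["first", "second", "third"]
with open(path, "w") as f:
    f.write("header\n")                          # newline added by hand
    f.writelines(line + "\n" for line in lines)  # writelines() adds no newlines itself

with open(path) as f:
    content = f.read()

print(repr(content))
```

Printing the `repr()` makes the `\n` characters visible, which helps when checking that each line ends where you expect.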

#### Reading from a File

Three built-in methods read data from a file:

* `readline()`: reads from the current position up to and including the next newline character.
* `read()`: reads the specified number of characters from the current position, or the rest of the file if no size is given.
* `readlines()`: reads all remaining lines until the end of the file and returns them as a list.

In [38]:
with open("output.txt", "r") as file:
    data = file.readline()
    print(data)

Hey there user!
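
The three read methods share one reading position, so each call continues where the previous one stopped. The sketch below shows this on a temporary file; the file name and contents are made up for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

with open(path) as f:
    first_chars = f.read(3)      # exactly 3 characters
    rest_of_line = f.readline()  # continues from the current position to the newline
    remaining = f.readlines()    # list with every line that is left

print(repr(first_chars), repr(rest_of_line), remaining)
```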



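Looping through the contents of a file is straightforward in Python, because a file object yields one line per iteration. Each line keeps its trailing newline, which is why `print()` shows a blank line after each one: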

In [39]:
with open("output.txt", "r") as file:
    for line in file:
        print(line)

Hey there user!

Welcome to Nimblebox

Open source is the best


It is highly recommended to take a look at [Harrison Kinsley's](https://www.youtube.com/playlist?list=PLQVvvaa0QuDeAams7fkdcwOGBpGdHpXln) tutorial series on Python to further enhance your knowledge of Python.

# Introduction to NumPy


In this tutorial, we will cover:

* **Basics**: Different ways to create NumPy arrays and the basics of NumPy
* **Computation**: Computations on NumPy arrays using universal functions and other NumPy routines
* **Aggregations**: Various functions used to aggregate NumPy arrays

### Basics

##### If you're very new to NumPy, start with the [absolute beginners guide](https://numpy.org/devdocs/user/absolute_beginners.html)

**NumPy converts to the most logical data type**

In [7]:
import numpy as np

data1 = np.array([1.2, 2, 3, 4])
print(data1)
print(data1.dtype) # all values will be converted to floats

[1.2 2.  3.  4. ]
float64
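
The same promotion rule applies to other mixes of types. The following sketch, with array names chosen only for illustration, shows a few cases and how an explicit `dtype` overrides the inference.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1.2, 2, 3])          # one float promotes every element
with_complex = np.array([1, 2.0, 3j])  # one complex value promotes to complex

print(ints.dtype.kind)      # 'i' (signed integer; the exact width depends on the platform)
print(mixed.dtype)          # float64
print(with_complex.dtype)   # complex128

# An explicit dtype overrides the inference.
forced = np.array([1, 2, 3], dtype=np.float32)
print(forced.dtype)         # float32
```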


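To perform in-place arithmetic on NumPy arrays, the result must be castable to the dtype of the array being updated. In the cell below, `a` is created as `float64`, so without the cast `c += a + b` would raise a casting error, because a `float64` result cannot be stored safely in the `int16` array `c`.

**The error is resolved by changing the dtype of 'a' manually**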

In [24]:
a = np.random.normal(0,1,10)
a = a.astype(np.int16)
b = np.arange(10, dtype=np.int16)
c = np.arange(10, dtype=np.int16)
c += a + b
print(c)

[ 1  3  4  6  8 10 13 14 16 17]
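
To see the error itself, here is a small sketch with fixed values instead of random ones (the values are chosen for the example). It assumes the casting error is a subclass of `TypeError`, which is how current NumPy releases report it.

```python
import numpy as np

a = np.array([0.4, -1.7, 2.9])      # float64
c = np.arange(3, dtype=np.int16)    # [0 1 2]

failed = False
try:
    c += a                          # float64 result cannot be stored in int16
except TypeError as err:
    failed = True
    print("in-place add failed:", type(err).__name__)

c += a.astype(np.int16)             # the cast truncates toward zero: [0 -1 2]
print(c)
```

Note that `astype(np.int16)` discards the fractional part, so the cast removes the error at the cost of precision.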


NumPy arrays, unlike flat Python lists, can be multi-dimensional.

**Nested lists result in multi-dimensional arrays**

In [14]:
x1 = np.array([range(i, i + 3) for i in [2, 4, 6]])
print(x1)

[[2 3 4]
 [4 5 6]
 [6 7 8]]


For more information and other NumPy operations based on Python lists, refer to the [NumPy documentation](http://numpy.org/).

**Using NumPy routines**

##### Create a 3x3 array of uniformly distributed random values between 0 and 1

In [15]:
print(np.random.random((3, 3)))

[[0.84656241 0.61961238 0.19955991]
 [0.89190871 0.65471397 0.03095116]
 [0.93051268 0.33584247 0.80735685]]


##### Create a 3x3 array of normally distributed random values with mean 0 and standard deviation 1

In [16]:
print(np.random.normal(0, 1, (3, 3)))

[[ 0.50109517 -1.07614452 -0.10107323]
 [ 1.29574169 -0.46030814 -0.25632801]
 [ 1.03215511  0.70735419 -0.66970457]]


You can always explore the [documentation](http://numpy.org/) for more.

**Attributes of NumPy Array**

Each NumPy array has the following attributes:

In [19]:
x3 = np.random.randint(10, size=(3, 4, 5))  # Create a 3-D array

print("x3 ndim: ", x3.ndim)  # .ndim yields the number of dimensions
print("x3 shape:", x3.shape)  # .shape yields the size of each dimension
print("x3 size: ", x3.size)  # .size yields the total number of elements
print("dtype:", x3.dtype)  # .dtype yields the data type of the array
print("itemsize:", x3.itemsize, "bytes")  # .itemsize yields the size (in bytes) of each array element
print("nbytes:", x3.nbytes, "bytes")  # .nbytes yields the total size (in bytes) of the array

x3 ndim:  3
x3 shape: (3, 4, 5)
x3 size:  60
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes


For more information, refer to the [documentation](http://numpy.org/).

##### Accessing elements: Slicing and Indexing

Slicing and indexing of NumPy arrays are quite similar to those of Python lists.

In [20]:
data = np.arange(10) # Create a 1-D array
print("Original Data:\n", data, "\n")

# Indexing
print("Indexing NumPy Array:")
print("  ", data[4]) # Element at index 4 (the 5th element, since indexing starts at 0)
print("  ", data[-1], "\n") # Last element of the array

# Slicing: to access a slice of an array 'data', use `data[start:stop:step]`
print("Slicing NumPy Array:")
print("  ", data[:5]) # First 5 elements of the array
print("  ", data[::-1]) # All the elements of the array in reverse order


Original Data:
 [0 1 2 3 4 5 6 7 8 9] 

Indexing NumPy Array:
   4
   9 

Slicing NumPy Array:
   [0 1 2 3 4]
   [9 8 7 6 5 4 3 2 1 0]


<u><i>Indexing in a multi-dimensional NumPy Array</i></u>: Multi-dimensional indices work in the same way, with multiple indices separated by commas.

##### 3-D array

In [27]:
x3 = np.random.randint(10, size=(3, 4, 5))
print(x3)

[[[5 9 7 3 8]
  [6 0 5 9 1]
  [4 7 0 5 5]
  [4 8 0 0 2]]

 [[8 7 4 1 2]
  [9 0 6 1 5]
  [4 1 5 7 4]
  [1 6 5 0 9]]

 [[1 2 8 9 0]
  [8 4 5 8 7]
  [5 0 1 9 6]
  [3 3 8 9 4]]]
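
Because the array above is random, it is hard to check an index by eye. The sketch below uses `np.arange` instead (an assumption made for this example), so every element equals its flat position and the results can be verified by hand.

```python
import numpy as np

x = np.arange(60).reshape(3, 4, 5)   # values 0..59, so every index is easy to check

print(x[0, 1, 2])      # block 0, row 1, column 2 -> 7
print(x[2, 3, 4])      # last element -> 59
print(x[1].shape)      # a single index selects a whole 4x5 block -> (4, 5)
print(x[:, 0, 0])      # first entry of every block -> [ 0 20 40]
print(x[0, :2, ::2])   # slices can be mixed with integer indices
```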


For further exploration, refer to the NumPy [documentation](https://numpy.org/doc/1.17/user/basics.indexing.html).

**Python Lists and NumPy Arrays**

A NumPy array typically stores its elements in one contiguous block of memory with a single data type, which makes access and computation efficient. A Python list, on the other hand, stores pointers to separate objects that may be scattered in memory.

<u><i>Subarray (default returns)</i></u>: Slicing a NumPy array returns a view of the original data, whereas slicing a Python list returns a copy of the list.

##### Let's create a NumPy Array and slice it

In [31]:
data_numpy = np.random.randint(10, size=(10))
print("Pre-slicing NumPy Array: ", data_numpy)
slicing_numpy = data_numpy[0:3]
print("Slice of NumPy Array: ", slicing_numpy)

import random
data_list = random.sample(range(0, 10), 10)
print("\nPre-slicing Python List: ", data_list)
slicing_list = data_list[0:3]
print("Slice of Python List: ", slicing_list)

Pre-slicing NumPy Array:  [4 2 3 3 2 7 0 9 5 0]
Slice of NumPy Array:  [4 2 3]

Pre-slicing Python List:  [9, 2, 6, 4, 1, 8, 7, 0, 3, 5]
Slice of Python List:  [9, 2, 6]


##### Let's change the first element of both array and list

In [29]:
slicing_numpy[0] = -1
print("Slice of NumPy Array: ", slicing_numpy)
slicing_list[0] = -1
print("Slice of Python List: ", slicing_list)

Slice of NumPy Array:  [-1  4  5]
Slice of Python List:  [-1, 8, 4]


In [30]:
print("Post-slicing NumPy array: ", data_numpy) # has changed
print("Post-slicing Python list: ", data_list) # has not changed

Post-slicing NumPy array:  [-1  4  5  3  8  7  0  4  1  6]
Post-slicing Python list:  [3, 8, 4, 2, 9, 1, 0, 7, 6, 5]


<u><i>Subarray (custom)</i></u>: To get an independent copy of a slice, as Python lists give by default, call `.copy()` on it.

##### Creating copies of the array instead of views

In [31]:
data_numpy = np.random.randint(10, size=(10))
print("Pre-slicing NumPy Array: ", data_numpy)
slicing_numpy_copy = data_numpy[0:3].copy()
print("Slice of NumPy Array: ", slicing_numpy_copy)

Pre-slicing NumPy Array:  [4 5 8 0 8 8 7 4 4 9]
Slice of NumPy Array:  [4 5 8]


##### Let's change the first element of the copy and observe

In [None]:
slicing_numpy_copy[0] = -1
print("Post-slicing NumPy Array: ", slicing_numpy_copy)
print("Pre-slicing NumPy Array: ", data_numpy) # now it is not a view any more but we created a copy of data_numpy
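
One way to check whether two arrays are a view and its parent, without modifying either, is `np.shares_memory` together with the `.base` attribute. The sketch below uses a fixed array rather than a random one.

```python
import numpy as np

data = np.arange(10)
view = data[0:3]          # basic slicing returns a view
copy = data[0:3].copy()   # an explicit copy owns its own buffer

print(np.shares_memory(data, view))   # True
print(np.shares_memory(data, copy))   # False
print(view.base is data)              # True: the view points back at its parent

view[0] = -1
print(data[0])                        # -1, the change is visible through the parent
```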

### Computation 

#### 1. Universal Function
A universal function (or ufunc) is a function that operates on an `ndarray` element by element. That is, a ufunc is a “vectorized” wrapper for a function that takes a fixed number of specific inputs and produces a fixed number of specific outputs.

In [34]:
x = np.random.randint(1, 11, size=(10))
y = np.random.randint(1, 11, size=(10))
print("Array 'x' = ", x)
print("Array 'y' = ", y)

Array 'x' =  [3 8 3 9 1 2 6 6 6 2]
Array 'y' =  [ 9  8  5  7 10  1  6  6  5  5]


Arithmetic operators on arrays are simply convenient wrappers around specific functions built into NumPy; for example, the `+` operator is a wrapper for the `add` function.

In [36]:
print(np.add(x, y))
print(np.subtract(x, y))
print(np.multiply(x, y))
print(np.mod(x, y))

[12 16  8 16 11  3 12 12 11  7]
[-6  0 -2  2 -9  1  0  0  1 -3]
[27 64 15 63 10  2 36 36 30 10]
[3 0 3 2 1 0 0 0 1 2]


The following table lists some of the ufuncs implemented in NumPy:


| Universal Function      | Operator (if any)  | Description                                                    |
|:-----------------------:|:------------------:|:--------------------------------------------------------------:|
|``np.add``               | ``+``              |Addition (e.g., ``[10  6  8] + [3 10  6] = [13 16 14]``)        |
|``np.subtract``          | ``-``              |Subtraction (e.g., ``[10  6  8] - [3 10  6] = [ 7 -4  2]``)     |
|``np.negative``          | ``-``              |Unary negation (e.g., ``-[10  6  8] = [-10  -6  -8]``)          |
|``np.multiply``          | ``*``              |Multiplication (e.g., ``[10  6  8] * [3 10  6] = [30 60 48]``)  |
|``np.divide``            | ``/``              |Division (e.g., ``[10  6  8] / [3 10  6] = [3.33 0.6 1.33]``)   |
|``np.floor_divide``      | ``//``             |Floor division (e.g., ``[10  6  8] // [3 10  6] = [3 0 1]``)    |
|``np.mod``               | ``%``              |Modulus/remainder (e.g., ``[10  6  8] % [3 10  6] = [1 6 2]``)  |
|``np.log``               |                    |Natural logarithm, element-wise                                 |
|``np.log2``              |                    |Base-2 logarithm of x                                           |
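
The operator and function columns of the table can be checked against each other directly. This sketch reuses the example arrays from the table; the `out` argument shown at the end is an optional ufunc feature for writing into an existing array.

```python
import numpy as np

x = np.array([10, 6, 8])
y = np.array([3, 10, 6])

# Each operator dispatches to the ufunc with the same meaning.
print(np.array_equal(x + y, np.add(x, y)))            # True
print(np.array_equal(x // y, np.floor_divide(x, y)))  # True
print(x // y)   # [3 0 1]
print(x % y)    # [1 6 2]

# ufuncs also accept an `out` argument to reuse an existing array.
result = np.empty_like(x)
np.multiply(x, y, out=result)
print(result)   # [30 60 48]
```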


More information on universal functions (including the full list of available functions) can be found in the NumPy [documentation](https://numpy.org/doc/1.17/reference/ufuncs.html).

#### 2. NumPy Routines

NumPy is a scientific computing package with many built-in routines (functions) that aid mathematical and scientific computing. Some of the routines commonly used in machine learning are discussed below.

##### NumPy allows us to concatenate or append different NumPy Arrays

In [38]:
a = np.random.randint(1, 11, size=(3, 3, 2))
b = np.random.randint(1, 11, size=(3, 3, 3))
c = np.ones((1, 3, 2), dtype="int32")
d = np.ones((3, 1, 2), dtype="int32")

print("'a':\n", a, "\n")
print("'b':\n", b, "\n")
print("'c':\n", c, "\n")
print("'d':\n", d, "\n")

'a':
 [[[ 2  4]
  [10  3]
  [ 7  2]]

 [[ 1  3]
  [ 6  6]
  [ 8  7]]

 [[ 3  6]
  [ 6  3]
  [ 3  6]]] 

'b':
 [[[ 3  9  9]
  [ 4 10  3]
  [ 5  5  8]]

 [[ 3  4  4]
  [10  2  5]
  [ 5  8  2]]

 [[ 1  1  5]
  [ 4  9  6]
  [ 9  1  8]]] 

'c':
 [[[1 1]
  [1 1]
  [1 1]]] 

'd':
 [[[1 1]]

 [[1 1]]

 [[1 1]]] 



##### Let's create a random NumPy Array

In [39]:
numpy_array = np.random.randint(1, 11, size=(9))
print("Original Array Shape: ", numpy_array.shape)
print("Original Array: ", numpy_array, "\n")

Original Array Shape:  (9,)
Original Array:  [ 6  3 10  6  9 10  9  2  8] 


##### Using the reshape() routine to reshape an array

In [None]:
numpy_array = numpy_array.reshape(3,3)
print("New Array Shape: ", numpy_array.shape)
print("New Array:\n", numpy_array)

New Array Shape:  (3, 3)
New Array:
 [[ 6  3 10]
 [ 6  9 10]
 [ 9  2  8]]

##### We can also flatten matrices using ravel()

In [40]:
numpy_array = np.random.randint(1, 11, size=(24))
numpy_array = numpy_array.reshape(4,6)
print("Original Array Shape: ", numpy_array.shape)
print("Original Array:\n", numpy_array, "\n")

Original Array Shape:  (4, 6)
Original Array:
 [[ 2  3  4  7  6 10]
 [ 9  5  5  9 10  4]
 [ 7  1 10  1  4  9]
 [ 9  8  3 10  7  9]] 


##### Flattening the 2-D array

In [None]:
numpy_array = numpy_array.ravel()
print("Flattened Array Shape: ", numpy_array.shape)
print("Flattened Array:\n", numpy_array)

Flattened Array Shape:  (24,)
Flattened Array:
 [ 2  3  4  7  6 10  9  5  5  9 10  4  7  1 10  1  4  9  9  8  3 10  7  9]

##### Other useful routines for data analysis using NumPy

In [41]:
numpy_array = np.random.randint(1, 11, size=(3, 4))

print(numpy_array, "\n")
print("Sum of all Elements:", numpy_array.sum())
print("Smallest Element:", numpy_array.min())
print("Highest Element:", numpy_array.max())
print("Cumulative Sum of Elements:", numpy_array.cumsum())
print("Column-wise Sum:", numpy_array.sum(axis=0))
print("Row-wise Sum:", numpy_array.sum(axis=1))

[[8 5 3 6]
 [5 5 9 8]
 [7 7 4 1]] 

Sum of all Elements: 68
Smallest Element: 1
Highest Element: 9
Cumulative Sum of Elements: [ 8 13 16 22 27 32 41 49 56 63 67 68]
Column-wise Sum: [20 17 16 15]
Row-wise Sum: [22 27 19]
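
The `axis` argument names the dimension that is collapsed by the aggregation. Here is a small sketch on a fixed array (chosen for this example) where the results are easy to verify.

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(m.sum(axis=0))         # collapse the rows: one sum per column -> [12 15 18 21]
print(m.sum(axis=1))         # collapse the columns: one sum per row -> [ 6 22 38]
print(m.sum(axis=0).shape)   # (4,)
print(m.max(axis=1))         # largest value in each row -> [ 3  7 11]
print(m.sum(axis=1, keepdims=True).shape)   # keepdims preserves the 2-D shape -> (3, 1)
```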


You can also do matrix multiplication and other matrix manipulations.

##### Dot products of two "arrays"

In [42]:
a = np.random.randint(1, 11, size=(3, 3))
b = np.random.randint(1, 11, size=(3, 3))

print("'a':\n", a, "\n")
print("'b':\n", b, "\n")

print("Dot Product of 'a' and 'b' (arrays):\n", np.dot(a, b))

'a':
 [[ 3  7  2]
 [ 8  8  4]
 [ 2 10  3]] 

'b':
 [[7 6 2]
 [7 6 6]
 [7 4 6]] 

Dot Product of 'a' and 'b' (arrays):
 [[ 84  68  60]
 [140 112  88]
 [105  84  82]]


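##### Let's concatenate 'a' and 'b' together along axis=2

The next three cells use the 3-D arrays 'a', 'b', 'c', and 'd' created at the start of this section. Because 'a' and 'b' were reassigned to 2-D arrays in the dot product cell above, re-run the cell that creates the 3-D arrays before running these.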

In [None]:
print("Concatenate:\n", np.concatenate((a, b), axis=2), "\n")

##### Let's append 'c' to 'a' vertically

In [None]:
print("Vertical Append:\n", np.vstack((a, c)), "\n") # try appending 'd' to 'a' vertically

##### Let's append 'd' to 'a' horizontally

In [None]:
print("Horizontal Append:\n", np.hstack((a, d))) # try appending 'c' to 'a' horizontally
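
For joining to succeed, every dimension except the one being joined must match. The following sketch recreates arrays with the same shapes as 'a', 'b', 'c', and 'd' (filled with zeros and ones, an assumption made to keep the example deterministic) and prints the resulting shapes.

```python
import numpy as np

a = np.zeros((3, 3, 2))
b = np.zeros((3, 3, 3))
c = np.ones((1, 3, 2))
d = np.ones((3, 1, 2))

print(np.concatenate((a, b), axis=2).shape)  # (3, 3, 5)
print(np.vstack((a, c)).shape)               # joins along axis 0 -> (4, 3, 2)
print(np.hstack((a, d)).shape)               # joins along axis 1 -> (3, 4, 2)

stack_failed = False
try:
    np.vstack((a, d))      # axis 1 disagrees (3 vs 1), so this fails
except ValueError as err:
    stack_failed = True
    print("vstack failed:", err)
```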

##### Matrix product of two "arrays"

In [43]:
a = np.random.randint(1, 11, size=(3, 4))
b = np.random.randint(1, 11, size=(4, 2))

print("'a':\n", a, "\n")
print("'b':\n", b, "\n")

print("Matrix Product of 'a' and 'b' (arrays):\n", np.matmul(a, b))

'a':
 [[9 7 7 9]
 [4 1 7 5]
 [2 3 2 5]] 

'b':
 [[8 5]
 [7 5]
 [1 3]
 [5 7]] 

Matrix Product of 'a' and 'b' (arrays):
 [[173 164]
 [ 71  81]
 [ 64  66]]
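
For 2-D arrays, `np.matmul`, `np.dot`, and the `@` operator give the same matrix product, which differs from element-wise `*`. The sketch below uses small fixed matrices (chosen for this example) so the result can be checked by hand.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

print(a @ b)                                   # [[19 22] [43 50]]
print(np.array_equal(a @ b, np.matmul(a, b)))  # True
print(np.array_equal(a @ b, np.dot(a, b)))     # True for 2-D inputs
print(a * b)                                   # element-wise, not a matrix product

# Inner dimensions must agree: (3, 4) @ (4, 2) -> (3, 2)
product_shape = (np.ones((3, 4)) @ np.ones((4, 2))).shape
print(product_shape)
```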


##### Taking the transpose of a matrix

In [44]:
a = np.random.randint(1, 11, size=(3, 4))
print("'a':\n", a, "\n")

'a':
 [[ 9  1  5  5]
 [ 1  5 10  7]
 [ 6  8  9  4]] 


##### You can take the transpose in two ways

In [None]:
print("'a' Transpose (using 'array.T'):\n", a.T, "\n")
print("'a' Transpose (using 'np.transpose()'):\n", np.transpose(a), "\n")

'a' Transpose (using 'array.T'):
 [[ 9  1  6]
 [ 1  5  8]
 [ 5 10  9]
 [ 5  7  4]] 

'a' Transpose (using 'np.transpose()'):
 [[ 9  1  6]
 [ 1  5  8]
 [ 5 10  9]
 [ 5  7  4]] 

There are many more routines available in this package. To explore all the NumPy routines, refer to the [documentation](https://numpy.org/doc/1.17/reference/routines.html).

# Introduction to Pandas

In this tutorial, we will cover:

* **Basics of Pandas**: Introduction to Pandas objects and creation of commonly used Pandas objects.
* **Operations on Data**: Selecting, filtering, and transforming data in a DataFrame.
* **Aggregations**: Various functions used to aggregate Pandas objects

If you're new to Pandas, read these [tutorials](https://pandas.pydata.org/pandas-docs/version/0.15/tutorials.html) first (chapters 1 and 2 and lessons 1-3).

### Reading data from JSON

In [None]:
import pandas as pd

df = pd.read_json('./assets/programming.json')
df

Notice that this time the index came through correctly, since JSON preserves it through nesting. Pandas tries to infer how to build a DataFrame by analyzing the structure of your JSON, and sometimes it doesn't get it right. Often you'll need to set the `orient` keyword argument depending on the structure, so check the [read_json docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_json.html) to see which orientation applies to your data.
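
To illustrate how `orient` changes the result, here is a minimal sketch that parses the same JSON text twice. The JSON content is hypothetical and is not the `programming.json` file used above.

```python
import io
import pandas as pd

# Hypothetical JSON: outer keys are language names, inner keys are fields.
text = '{"Python": {"year": 1991, "rank": 1}, "Go": {"year": 2009, "rank": 2}}'

by_columns = pd.read_json(io.StringIO(text))                # default: outer keys become columns
by_index = pd.read_json(io.StringIO(text), orient="index")  # outer keys become row labels

print(by_columns)
print(by_index)
```

With the default orientation the languages end up as columns, while `orient="index"` makes each language a row, which is usually the more convenient layout for records like these.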

### Reading data from a SQL database

If you’re working with data from a SQL database you need to first establish a connection using an appropriate Python library, then pass a query to pandas. Here we'll use SQLite to demonstrate.

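The `sqlite3` module used below is part of Python's standard library, so nothing extra normally needs to be installed. If your Python build lacks it, the `pysqlite3` package is an alternative:

```
$ pip install pysqlite3
```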


`sqlite3` is used to create a connection to a database which we can then use to generate a DataFrame through a `SELECT` query.

So first we'll make a connection to a SQLite database file:

In [None]:
import sqlite3
con = sqlite3.connect("./assets/database.db")
df = pd.read_sql_query("SELECT * FROM programming", con)

df
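
If you don't have a database file at hand, the same workflow can be tried against an in-memory SQLite database. In this sketch the table name `langs` and its rows are hypothetical; `params` and `index_col` are optional arguments of `pd.read_sql_query`.

```python
import sqlite3
import pandas as pd

con = sqlite3.connect(":memory:")   # throwaway in-memory database
con.execute("CREATE TABLE langs (name TEXT, year INTEGER)")   # hypothetical table
con.executemany("INSERT INTO langs VALUES (?, ?)",
                [("Python", 1991), ("Java", 1995), ("Go", 2009)])
con.commit()

# Passing values through params keeps them out of the SQL string.
df = pd.read_sql_query("SELECT * FROM langs WHERE year > ?", con,
                       params=(1992,), index_col="name")
print(df)
con.close()
```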

Just like with CSVs, we could pass `index_col='index'`, but we can also set an index after-the-fact:

In [None]:
df = df.set_index('index')
df

In fact, we could use `set_index()` on any DataFrame using any column at any time. Indexing Series and DataFrames is a very common task, and the different ways of doing it are worth remembering.

### Converting back to a CSV, JSON, or SQL

So after extensive work on cleaning your data, you’re now ready to save it as a file of your choice. Similar to the ways we read in data, pandas provides intuitive commands to save it:

In [None]:
df.to_csv('./assets/programming.csv')

df.to_json('./assets/programming.json')

df.to_sql('programming_1', con)  # the first argument is a table name inside the database, not a file path

When we save JSON and CSV files, all we have to input into those functions is our desired filename with the appropriate file extension. With SQL, we’re not creating a new file but instead inserting a new table into the database using our `con` variable from before.
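
A quick way to confirm that nothing was lost when saving is to read the data straight back. This sketch does a round trip through CSV text and an in-memory SQLite table; the DataFrame contents and the table name `langs` are made up for the example.

```python
import io
import sqlite3
import pandas as pd

df = pd.DataFrame({"year": [1991, 2009]},
                  index=pd.Index(["Python", "Go"], name="name"))  # hypothetical data

csv_text = df.to_csv()   # no path given, so the CSV comes back as a string
from_csv = pd.read_csv(io.StringIO(csv_text), index_col="name")

con = sqlite3.connect(":memory:")
df.to_sql("langs", con, if_exists="replace")   # overwrite instead of raising if the table exists
from_sql = pd.read_sql_query("SELECT * FROM langs", con, index_col="name")
con.close()

print(from_csv)
print(from_sql)
```

By default `to_sql` raises an error when the table already exists, so `if_exists="replace"` (or `"append"`) is worth setting when a cell may be re-run.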

### Selecting rows and columns
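Using `.loc[]` and `.iloc[]` you can select particular rows and columns in a DataFrame: `.loc[]` selects by label and `.iloc[]` selects by integer position. The examples below assume `movies_df` is a DataFrame of movie records with `Title`, `Director`, `Year`, `Runtime`, and `Rating` columns.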

In [None]:
movies_df.iloc[45]

In [None]:
movies_df_2 = movies_df.set_index('Title')

In [None]:
movies_df_2.head()

In [None]:
movies_df_2.loc['Sing']

In [None]:
movies_df.iloc[[5,11,15,29]]
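
Since the movie data itself is not included here, the following sketch uses a tiny made-up DataFrame (titles and numbers are invented) to show the difference between positional and label-based selection.

```python
import pandas as pd

# Hypothetical stand-in for movies_df; titles and numbers are made up.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Runtime": [95, 172, 130, 181],
    "Rating": [7.1, 8.6, 6.4, 8.9],
})

print(movies.iloc[1]["Title"])         # by position -> 'B'
print(movies.iloc[[0, 3]].shape)       # a list of positions -> (2, 3)

by_title = movies.set_index("Title")
print(by_title.loc["C", "Runtime"])    # by label -> 130
print(by_title.loc["B":"D", "Rating"].tolist())   # label slices include both ends
```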

### Operations on dataframe

In [None]:
movies_df[movies_df['Runtime'] > 170]

In [None]:
movies_df[(movies_df['Runtime'] > 150) & (movies_df['Rating'] > 8.5)]
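
Filters like the ones above work by building a boolean Series and using it as a mask. The sketch below shows the intermediate mask on a small made-up DataFrame. The parentheses around each comparison are required, because `&` and `|` bind more tightly than `>` and `<`.

```python
import pandas as pd

# Hypothetical stand-in for movies_df; titles and numbers are made up.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Runtime": [95, 172, 130, 181],
    "Rating": [7.1, 8.6, 6.4, 8.9],
})

long_mask = movies["Runtime"] > 150   # boolean Series, one entry per row
print(long_mask.tolist())             # [False, True, False, True]

both = movies[(movies["Runtime"] > 150) & (movies["Rating"] > 8.7)]
print(both["Title"].tolist())         # ['D']

either = movies[(movies["Runtime"] > 175) | (movies["Rating"] < 7)]
print(either["Title"].tolist())       # ['C', 'D']
```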

#### Group DataFrame using a mapper or by a Series of columns using .groupby()

In [None]:
# numeric_only=True skips text columns such as Title, which cannot be averaged
directors_df = movies_df.groupby('Director').mean(numeric_only=True).reset_index()
directors_df.head()

In [None]:
directors_df[directors_df['Rating']>8.3]

In [None]:
movies_df.groupby('Year')['Rating'].mean()

In [None]:
movies_df.groupby('Year')['Rating'].max()
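
To make the grouping step concrete, here is a sketch on a small made-up DataFrame where the group results can be checked by hand. The `agg` call computes several statistics per group at once.

```python
import pandas as pd

# Hypothetical stand-in for movies_df; all values are made up.
movies = pd.DataFrame({
    "Title": ["A", "B", "C", "D", "E"],
    "Director": ["X", "X", "Y", "Y", "Y"],
    "Year": [2015, 2016, 2015, 2016, 2016],
    "Rating": [8.0, 6.0, 7.0, 9.0, 6.0],
})

by_year = movies.groupby("Year")["Rating"].mean()
print(by_year.to_dict())              # {2015: 7.5, 2016: 7.0}

summary = movies.groupby("Director")["Rating"].agg(["mean", "max", "count"])
print(summary)

numeric_means = movies.groupby("Director").mean(numeric_only=True)
print(list(numeric_means.columns))    # ['Year', 'Rating']
```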

#### Apply a function along an axis of the DataFrame using .apply()

In [None]:
def times10(x):
    return 10 * x

movies_df['Rating'] = movies_df['Rating'].apply(times10)

In [None]:
movies_df.head()
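
For simple arithmetic like this, a vectorized expression gives the same result as `.apply()` and is usually faster; `.apply()` is more useful when the function has no vectorized equivalent or needs a whole row. A small sketch with made-up values:

```python
import pandas as pd

def times10(x):
    return 10 * x

ratings = pd.Series([7.0, 8.5, 6.5])
applied = ratings.apply(times10)   # calls times10 once per element
vectorized = ratings * 10          # same result, computed in one array operation
print(applied.tolist())            # [70.0, 85.0, 65.0]
print(vectorized.tolist())         # [70.0, 85.0, 65.0]

# With axis=1 the function receives one row at a time.
frame = pd.DataFrame({"Runtime": [100, 150], "Rating": [7.0, 8.5]})
per_row = frame.apply(lambda row: row["Runtime"] * row["Rating"], axis=1)
print(per_row.tolist())            # [700.0, 1275.0]
```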

