# Python Workshop

This workshop is an introduction to Python 3, Jupyter Notebook, and the Python machine learning libraries used throughout CS82 and CS84.


**NOTE**

Basic Python knowledge and general programming skills are assumed. This tutorial is **NOT** comprehensive. For a comprehensive guide to Python syntax and idioms, please study https://docs.python.org/3/tutorial/


### Study Resources

* [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/)
* [Python Tutorial](https://docs.python.org/3/tutorial/)
* [Python API](https://docs.python.org/3/library/)
* [Numpy](https://numpy.org/doc/stable/)
* [Pandas](https://pandas.pydata.org/docs/)
* [MatplotLib](https://matplotlib.org/stable/index.html)
* [sklearn](https://scikit-learn.org/stable/)
* [OpenAI-GYM](https://gym.openai.com/docs/)
* [Markdown](https://www.markdownguide.org/)

#### Additional Libraries
* [glob](https://docs.python.org/3/library/glob.html)
* [Image - PIL / pillow](https://pillow.readthedocs.io/en/stable/reference/Image.html)


## Contents

1. Jupyter Notebook
2. Unix / Linux
3. Python operations
4. List / String comprehension 
5. Python functions / Lambda functions
6. Loop - Pass - Range
7. Exception Handling and understanding https://docs.python.org/3/tutorial/errors.html
8. Installing and using libraries / help() function
9. Examples for libraries used in class

### Jupyter Notebook

Notebooks are programming environments used for prototyping code. They are development environments, and a notebook usually corresponds to a single project. We will use notebooks to run Python code and execute Unix commands. The execution flow is divided into cells, each of which can be run independently of the rest of the notebook. The run-time, or kernel, keeps all existing variables in memory. While the notebook is live, we can modify that memory by executing cells in an arbitrary order.

![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
![image-3.png](attachment:image-3.png)


#### IMPORTANT TOOLS
![image-6.png](attachment:image-6.png)

**LEFT TO RIGHT**

* Save Notebook
* Add a cell
* Cut / Delete a cell
* Copy a cell
* Paste a cell
* Move a cell up
* Move a cell down
* Run a cell
* Stop cell execution
* Restart run-time (memory of the notebook is wiped out and all variable states are lost)
* Restart and run entire notebook
* Choose cell type (programming language of the cell)
    * Markdown is a markup language for documents (this cell is written in it)
    * Code is python3

### Python Operations


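```python

a = 1
b = 2
a + b  # 3

# Lists are resizable sequences (closer to a Java ArrayList than a Java array)
a = []
a.append(1)

# Dictionaries
a = {}
a["a"] = 0
a[0] = 10

# Tuples
a = (10, 10)
# a[0] = 0  # raises TypeError: tuples are immutable

# Strings

a = "Hello World"

# Strings can be indexed like lists of characters (but they are immutable)

a[1]  # 'e'

# calling methods and functions

a.lower()  # 'hello world'

len(a)  # 11 (length is a built-in function, not an attribute)


# casting types

float("2.2")  # 2.2

# int("2.2")  # raises ValueError: not a valid integer literal

int("2")  # 2

float("inf")  # positive infinity

float("-inf")  # negative infinity

int(2.2)  # 2 (truncates toward zero)

# other built-in types

dict  # dictionary

zip  # pairs up items from several iterables; see the Python API

list  # cast to list

iter  # returns an iterator; see the Python API

tuple  # tuple

set  # unordered collection of unique items

```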
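
The built-in types listed at the end of the block above are easier to understand with a small example. The following is a minimal sketch; the `names` and `scores` lists are made-up sample data used only for illustration.

```python
# Hypothetical sample data, chosen only for illustration.
names = ["ada", "bob", "cy"]
scores = [90, 75, 82]

# zip pairs up elements from several iterables, position by position.
pairs = list(zip(names, scores))   # [('ada', 90), ('bob', 75), ('cy', 82)]

# dict can build a dictionary directly from (key, value) pairs.
score_by_name = dict(pairs)        # {'ada': 90, 'bob': 75, 'cy': 82}

# set keeps only unique values; the order is not guaranteed.
unique_letters = set("hello")      # contains 'h', 'e', 'l', 'o'

# iter returns an iterator; next() pulls one item at a time.
it = iter(names)
first = next(it)                   # 'ada'
second = next(it)                  # 'bob'

# tuple and list convert between the immutable and mutable sequence types.
frozen = tuple(scores)             # (90, 75, 82)
thawed = list(frozen)              # [90, 75, 82]
```

A `for` loop calls `iter` and `next` behind the scenes, which is why lists, strings, and dictionaries can all be looped over in the same way.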

### Unix Operations


Start a line in a code cell with `!` to run a Unix command.

`!ls` lists files in current directory


# [COMMAND CHEAT SHEET](http://www.mathcs.emory.edu/~valerie/courses/fall10/155/resources/unix_cheatsheet.html)


`!pwd` prints the current directory

`!unzip something.zip` extracts a **zip** archive

`!tar -xvf something.tar` extracts **tar** files

### [WHAT IS ZIP](https://en.wikipedia.org/wiki/ZIP_(file_format))
### [WHAT IS TAR](https://en.wikipedia.org/wiki/Tar_(computing))
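
The same tasks can also be done from Python itself, which is handy when the shell commands are not available. The sketch below uses only the standard library (`os`, `tempfile`, `zipfile`); the file names `note.txt` and `something.zip` are placeholders, and everything happens in a temporary directory.

```python
import os
import tempfile
import zipfile

# Work in a throwaway directory so nothing in the real file system is touched.
workdir = tempfile.mkdtemp()

# Create a small text file and pack it into a zip archive (placeholder names).
note_path = os.path.join(workdir, "note.txt")
with open(note_path, "w") as f:
    f.write("hello")

archive_path = os.path.join(workdir, "something.zip")
with zipfile.ZipFile(archive_path, "w") as zf:
    zf.write(note_path, arcname="note.txt")

# Rough equivalent of `!unzip something.zip`: extract into a new folder.
extract_dir = os.path.join(workdir, "extracted")
with zipfile.ZipFile(archive_path) as zf:
    zf.extractall(extract_dir)

# Rough equivalents of `!pwd` and `!ls`.
current_directory = os.getcwd()
extracted_files = sorted(os.listdir(extract_dir))
print(current_directory)
print(extracted_files)
```

The standard library's `tarfile` module plays the same role for tar archives.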

### List / String Operations

```python

a="Hello World"

a[:10] # Hello Worl
a[:4] # Hell
a[1:10] # ello Worl (index 10 itself is excluded)
a[-1:] # d (last letter)
a[-2:] # ld (last 2 letters)
a[-2] # l (second letter from the end)
a[:-2] # Hello Wor (everything except the last 2 letters)
a[::1] # Hello World (every letter, step 1)
a[::2] # HloWrd (every second letter, step 2)
a[:-2:2] # HloWr (the substring ending at -2, with step 2)

b=" test"
a+b #Hello World test

a.split(" ")

```

#### FORMATTING

```python

"%s"%a

"this is formatting %s"%a

a=2.202331

"this is formatting %s"%a

"this is formatting %.2f"%a


"this is formatting %.2e"%a

a=2

# PRINT

print("this is formatting %d"%a)


```
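
Besides the `%` operator shown above, Python also offers `str.format` and f-strings, which accept the same kind of format specifications after a colon. A minimal sketch, reusing the number from the block above (the variable names are illustrative only):

```python
value = 2.202331   # same number as in the block above
count = 2

# Three ways to produce the same two-decimal text.
old_style = "this is formatting %.2f" % value
with_format = "this is formatting {:.2f}".format(value)
with_fstring = f"this is formatting {value:.2f}"

# Scientific notation and zero-padded integers use the same mini-language.
scientific = f"{value:.2e}"
padded = f"{count:03d}"

print(with_fstring)
print(scientific, padded)
```

All three styles are valid Python 3; f-strings are usually the shortest to read because the variable sits right where it is printed.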

### List Comprehension


```python

a=[1,2.2,"a"]

b=["b"]

a+b  # combine lists

a.append("b") # add to the end

a.remove("b") # remove value

a.pop() # remove last

# SAME OPERATIONS FOR INDEXING AS ABOVE

a[:] # select all elements

```
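
A list comprehension builds a new list from an existing iterable in a single expression, optionally with a filter. The sketch below uses made-up numbers and shows the equivalent `for` loop for comparison.

```python
numbers = [1, 2, 3, 4, 5, 6]   # illustrative data

# [expression for item in iterable]
squares = [n * n for n in numbers]

# [expression for item in iterable if condition]
evens = [n for n in numbers if n % 2 == 0]

# Comprehensions work on any iterable, including strings.
upper_letters = [ch.upper() for ch in "Hello World" if ch != " "]

# The same idea with braces builds a dictionary.
square_by_number = {n: n * n for n in numbers}

# Equivalent loop for the first comprehension.
squares_loop = []
for n in numbers:
    squares_loop.append(n * n)

print(squares)
print(evens)
print("".join(upper_letters))
```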

### Python functions

```python

def some_function():
    
    return 0

some_function()


new_name=some_function

new_name()


def some_function(argument, keyword_argument="A",keyword_argument_2="A"):
    return argument+keyword_argument+keyword_argument_2

some_function("A") # "AAA"

some_function("A","B") # "ABA"

some_function("A",keyword_argument_2="B") # "AAB"

# some_function(keyword_argument_2="B") # raises TypeError: the required positional argument is missing


lambda_fn=lambda x: x

lambda_fn("A")

lambda_fn=lambda x: x[1] 

lambda_fn("A,B")

def some_function(fn,s="A"):
    return fn(s)

some_function(lambda x:x.lower())

```

### Loop - Pass - Range

```python
a=0

while a < 10:
    a+=1

for i in range(10):
    print(i)
    
for i in range(10):
    pass

l="Hello World"
for i in l:
    print(i)
    

l=[1,2,3,4]
for i in l:
    print(i)
    
for num in range(2, 10):
    if num % 2 == 0:
        print("Found an even number", num)
        continue
    print("Found an odd number", num)
```
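
Two patterns come up often alongside the loops above: `enumerate`, which yields an index together with each item, and `break`, which leaves a loop early. A minimal sketch with a made-up list:

```python
fruits = ["apple", "banana", "cherry"]   # illustrative data

# enumerate yields (index, item) pairs, so no manual counter is needed.
labelled = []
for index, fruit in enumerate(fruits):
    labelled.append(f"{index}: {fruit}")

# break stops the loop as soon as the first match is found.
first_long = None
for fruit in fruits:
    if len(fruit) > 5:
        first_long = fruit
        break

# range also accepts a start, a stop, and a (possibly negative) step.
countdown = list(range(10, 0, -3))

print(labelled)
print(first_long)
print(countdown)
```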


## Exceptions

Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. Errors detected during execution are called exceptions and are not unconditionally fatal: you will soon learn how to handle them in Python programs. Most exceptions are not handled by programs, however, and result in error messages as shown here:

```python

>>> 10 * (1/0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> 4 + spam*3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'spam' is not defined
>>> '2' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "int") to str
    
```

The last line of the error message indicates what happened. Exceptions come in different types, and the type is printed as part of the message: the types in the example are ZeroDivisionError, NameError and TypeError. The string printed as the exception type is the name of the built-in exception that occurred. This is true for all built-in exceptions, but need not be true for user-defined exceptions (although it is a useful convention). Standard exception names are built-in identifiers (not reserved keywords).

The rest of the line provides detail based on the type of exception and what caused it.

The preceding part of the error message shows the context where the exception occurred, in the form of a stack traceback. In general it contains a stack traceback listing source lines; however, it will not display lines read from standard input.

[Built-in Exceptions](https://docs.python.org/3/library/exceptions.html#bltin-exceptions) lists the built-in exceptions and their meanings.
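
Exceptions are handled with `try` / `except`. The sketch below is one possible illustration; the helper names `safe_divide` and `parse_int` are made up for this example and are not part of any library.

```python
def safe_divide(numerator, denominator):
    """Return numerator / denominator, or None when dividing by zero."""
    try:
        result = numerator / denominator
    except ZeroDivisionError as error:
        # Runs only if the division raised ZeroDivisionError.
        print("Could not divide:", error)
        return None
    else:
        # Runs only if no exception was raised.
        return result
    finally:
        # Always runs, whether or not an exception occurred.
        print("safe_divide finished")


def parse_int(text, default=0):
    """Convert text to int, falling back to a default on invalid input."""
    try:
        return int(text)
    except ValueError:
        return default


print(safe_divide(10, 4))
print(safe_divide(1, 0))
print(parse_int("42"), parse_int("2.2"))
```

Catching a specific exception type (here `ZeroDivisionError` or `ValueError`) is generally preferable to a bare `except:`, which would also hide unrelated bugs.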



## If Statement
```python
x = int(input("Please enter an integer: ")) # Please enter an integer: 42
if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
    
```



## Python Libraries

* [Link to all Built-in Libraries](https://docs.python.org/3/library/)
* [Link to all Built-In Functions](https://docs.python.org/3/library/functions.html)

## Mostly Used in this Class

### Libraries 

* [pickle](https://docs.python.org/3/library/pickle.html)
* [os](https://docs.python.org/3/library/os.html)
* [datetime](https://docs.python.org/3/library/datetime.html)
* [collections](https://docs.python.org/3/library/collections.html)
* [copy](https://docs.python.org/3/library/copy.html)
* [string](https://docs.python.org/3/library/string.html)
* [re](https://docs.python.org/3/library/re.html)

### Functions 

* [open](https://docs.python.org/3/library/functions.html#open)
* [eval](https://docs.python.org/3/library/functions.html#eval)
* [enumerate](https://docs.python.org/3/library/functions.html#enumerate)
* [float](https://docs.python.org/3/library/functions.html#float)
* [format](https://docs.python.org/3/library/functions.html#format)
* [help](https://docs.python.org/3/library/functions.html#help)
* [len](https://docs.python.org/3/library/functions.html#len)
* [reversed](https://docs.python.org/3/library/functions.html#reversed)
* [sorted](https://docs.python.org/3/library/functions.html#sorted)
* [zip](https://docs.python.org/3/library/functions.html#zip)

## External Libraries

External libraries are installed with `pip` from https://pypi.org/

Use `pip` or `pip3`, whichever one belongs to Python 3. Run `!pip -V` and `!pip3 -V`; each prints the Python version it is tied to, so use the one that reports Python 3+.

To install a library locally run 

`!pip install --user library_name`
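
To check from Python which interpreter the notebook is running and whether a library is importable, something like the following sketch can help. The helper `is_installed` is a made-up name for this example; it relies on the standard library function `importlib.util.find_spec` and is meant for top-level module names only.

```python
import importlib.util
import sys

# The interpreter that runs this notebook; pip must install into this one.
print(sys.executable)
print(sys.version_info.major)


def is_installed(module_name):
    """Return True if a top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


json_available = is_installed("json")   # part of the standard library
missing_available = is_installed("surely_not_a_real_module_xyz")
print(json_available, missing_available)

# help(len) prints documentation interactively; the same text is in __doc__.
len_doc = len.__doc__
print(len_doc)
```

In a notebook, a common idiom for installing into exactly this interpreter is `!{sys.executable} -m pip install --user library_name`, which avoids guessing between `pip` and `pip3`.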

## Mostly used in this Class

* [Numpy](https://numpy.org/doc/stable/)
* [Pandas](https://pandas.pydata.org/docs/)
* [MatplotLib](https://matplotlib.org/stable/index.html)
* [sklearn](https://scikit-learn.org/stable/)
* [OpenAI-GYM](https://gym.openai.com/docs/)


```python

import numpy

numpy.arange(10)

import numpy as np

np.arange(10)

import numpy as ktbyte

ktbyte.arange(10)


from numpy import arange

arange(10)

```

## Numpy

### Documentation (MUST READ)

[Documentation](https://numpy.org/doc/stable/reference/)

[Tutorial](https://numpy.org/doc/stable/user/absolute_beginners.html)

### Used throughout the Class (MUST READ)

### Array Definitions
* [np.array](https://numpy.org/doc/stable/reference/routines.array-creation.html)
* [apply](https://numpy.org/doc/stable/reference/ufuncs.html#available-ufuncs)
* [nan](https://numpy.org/doc/stable/reference/constants.html#numpy.NAN)
* [random](https://numpy.org/doc/stable/reference/random/index.html)

### Array Manipulation
* [array manipulation](https://numpy.org/doc/stable/reference/routines.array-manipulation.html)
* [shape](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.shape.html#numpy.ndarray.shape)
* [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html#numpy.reshape)
* [flatten](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html#numpy.ndarray.flatten)
* [astype](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.astype.html#numpy.ndarray.astype)

### Sorting 
* [sorting routines](https://numpy.org/doc/stable/reference/routines.sort.html)
* [argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html#numpy.argsort)
* [sort](https://numpy.org/doc/stable/reference/generated/numpy.sort.html#numpy.sort)
* [argmax](https://numpy.org/doc/stable/reference/generated/numpy.argmax.html#numpy.argmax)
* [amax](https://numpy.org/doc/stable/reference/generated/numpy.amax.html#numpy.amax)

### Indexing
* [broadcasting](https://numpy.org/doc/stable/reference/ufuncs.html#broadcasting)
* [indexing](https://numpy.org/doc/stable/reference/arrays.indexing.html)
* [functional](https://numpy.org/doc/stable/reference/routines.functional.html)


### Array Definitions
```python

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
type(x) # <class 'numpy.ndarray'>
x.shape # (2, 3)
x.dtype # dtype('int32')
x[1, 2] # The element of x in the *second* row, *third* column, namely, 6.
# For example slicing can produce views of the array:
y = x[:,1]

np.nan

np.e

np.inf

np.empty([2, 2])
# array([[ -9.74499359e+001,   6.69583040e-309],
#       [  2.13182611e-314,   3.06959433e-309]])         #uninitialized

np.ones_like(x)

np.ones(5)

np.add(1.0, 4.0)


x1 = np.arange(9.0).reshape((3, 3))

x2 = np.arange(3.0)

np.add(x1, x2) # broadcasting: x2 is added to every row of x1


x1 + x2


np.multiply(x1, x2)

a = np.array([[1, 0],
              [0, 1]])
b = np.array([[4, 1],
              [2, 2]])
np.matmul(a, b)

np.random.random((5,))
```

### Array Manipulation

```python
x = np.array([1, 2, 3, 4])
x.shape # (4,)
y = np.zeros((2, 3, 4))
y.shape # (2, 3, 4)
y.shape = (3, 8)
y
# array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
#        [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])
# y.shape = (3, 6)  # not allowed: 3 * 6 does not equal the 24 existing elements
#Traceback (most recent call last):
#  File "<stdin>", line 1, in <module>
#ValueError: total size of new array must be unchanged


a = np.arange(6).reshape((3, 2))

a = np.array([[1,2,3], [4,5,6]])

np.reshape(a, 6)

np.reshape(a, (3,-1))       # the unspecified value is inferred to be 2


a.reshape(-1)

a.flatten()

x = np.array([1, 2, 2.5])

x.dtype

x.astype(int)

```

### Sorting

```python

a = np.array([[1,4],[3,1]])
np.sort(a)                # sort along the last axis

np.sort(a, axis=0)        # sort along the first axis

dtype = [('name', 'S10'), ('height', float), ('age', int)]
values = [('Arthur', 1.8, 41), ('Lancelot', 1.9, 38),
          ('Galahad', 1.7, 38)]
a = np.array(values, dtype=dtype)       # create a structured array
np.sort(a, order='height')                        
# Sort by age, then height if ages are equal:
np.sort(a, order=['age', 'height'])               

x = np.array([3, 1, 2])

np.argsort(x) # array([1, 2, 0]): the indices that would put x in sorted order


```
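
The `argmax` and `amax` functions listed under Sorting have no example above, so here is a minimal sketch with a made-up 2 x 3 array. It also shows how the indices returned by `argsort` can be used to reorder an array.

```python
import numpy as np

# Illustrative data: 2 rows, 3 columns.
scores = np.array([[3, 9, 1],
                   [7, 2, 8]])

# Without an axis, argmax returns the index into the flattened array.
flat_index = np.argmax(scores)

# With axis=1, argmax returns the column of the largest value in each row.
best_column_per_row = np.argmax(scores, axis=1)

# amax returns the largest values themselves; axis=0 works down each column.
max_per_column = np.amax(scores, axis=0)

# argsort gives the ordering; indexing with it produces the sorted row.
order = np.argsort(scores[0])
sorted_first_row = scores[0][order]

print(flat_index, best_column_per_row, max_per_column)
print(order, sorted_first_row)
```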


### Indexing

[Broadcasting Theory](https://numpy.org/devdocs/user/theory.broadcasting.html)

```python
from numpy import array
a = array([1.0, 2.0, 3.0])
b = array([2.0, 2.0, 2.0])
a * b

a = array([1.0,2.0,3.0])
b = 2.0
a * b


a = array([[ 0.0,  0.0,  0.0],
           [10.0, 10.0, 10.0],
           [20.0, 20.0, 20.0],
           [30.0, 30.0, 30.0]])
b = array([1.0, 2.0, 3.0])
a + b

        

```
![image.png](attachment:image.png)
![image-2.png](attachment:image-2.png)
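
Broadcasting only works when the shapes are compatible: compared from the last dimension backwards, each pair of dimensions must be equal or one of them must be 1. The sketch below, with made-up values, shows a case that fails and how adding an axis with `np.newaxis` fixes it.

```python
import numpy as np

column = np.array([0.0, 10.0, 20.0, 30.0])   # shape (4,)
row = np.array([1.0, 2.0, 3.0])              # shape (3,)

# Shapes (4,) and (3,) are incompatible: 4 != 3 and neither is 1.
try:
    column + row
    shapes_compatible = True
except ValueError:
    shapes_compatible = False

# Turning column into shape (4, 1) lets it broadcast against shape (3,).
table = column[:, np.newaxis] + row          # result has shape (4, 3)

print(shapes_compatible)
print(table.shape)
print(table)
```

Each row of `table` is `row` shifted by one entry of `column`, which matches the 4 x 3 example earlier in this section.
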
```python
x = np.arange(10)
x[2]

x.shape = (2,5) # now x is 2-dimensional

x[1,3]
x[1,-1]
x[0]


x = np.arange(10)
x[2:5]

x[:-7]

x[1:7:2]

y = np.arange(35).reshape(5,7)
y[1:5:2,::3]



x = np.arange(10,1,-1)
x[np.array([3, 3, 1, 8])]
x[np.array([3,3,-3,8])]

y[np.array([0,2,4]), np.array([0,1,2])]


```

Boolean Mask index arrays.

Boolean arrays used as indices are treated in a different manner entirely than index arrays. Boolean arrays must be of the same shape as the initial dimensions of the array being indexed. In the most straightforward case, the boolean array has the same shape:


```python

b = y>20
y[b]

b[:,5] # use a 1-D boolean whose first dim agrees with the first dim of y
y[b[:,5]]

y[b[:,5]]=10
```

### numpy.apply_along_axis

Apply a function to 1-D slices along the given axis.

Execute `func1d(a)`, where `func1d` operates on 1-D arrays and `a` is a 1-D slice of `arr` along `axis`.

```python

def my_func(a):
    """Average first and last element of a 1-D array"""
    return (a[0] + a[-1]) * 0.5
b = np.array([[1,2,3], [4,5,6], [7,8,9]])
np.apply_along_axis(my_func, 0, b)
np.apply_along_axis(my_func, 1, b)


b = np.array([[8,1,7], [4,3,9], [5,2,6]])
np.apply_along_axis(sorted, 1, b)

b = np.array([[1,2,3], [4,5,6], [7,8,9]])
np.apply_along_axis(np.diag, -1, b)


```

## PANDAS

### General Functions

* [read_csv](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv)
* [concat](https://pandas.pydata.org/docs/reference/api/pandas.concat.html#pandas.concat)
* [unique](https://pandas.pydata.org/docs/reference/api/pandas.unique.html#pandas.unique)
* [isna](https://pandas.pydata.org/docs/reference/api/pandas.isna.html#pandas.isna)
* [to_numeric](https://pandas.pydata.org/docs/reference/api/pandas.to_numeric.html#pandas.to_numeric)
* [to_datetime](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html#pandas.to_datetime)

### DataFrames

* DataFrame.index
* DataFrame.columns
* DataFrame.dtypes
* DataFrame.values
* DataFrame.shape
* [astype](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.astype.html#pandas.DataFrame.astype)
* [loc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html#pandas.DataFrame.loc)
* [iloc](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html#pandas.DataFrame.iloc)
* [apply](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply)
* [dropna](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dropna.html#pandas.DataFrame.dropna)
* [groupby](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html)


### General Functions

```python

import pandas as pd

df1 = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00432/Data/Facebook_Economy.csv")
df2=pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/00432/Data/Facebook_Microsoft.csv")
pd.concat([df1,df2],axis=0)
pd.concat([df1,df2],axis=1)
pd.unique(df1['TS1'])

array = np.array([[1, np.nan, 3], [4, 5, np.nan]])
pd.isna(array)

s = pd.Series(['1.0', '2', -3])
pd.to_numeric(s)


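pd.to_datetime('13000101', format='%Y%m%d', errors='coerce') # 'coerce' returns NaT instead of raising when a date cannot be represented
s = pd.Series(['3/11/2000', '3/12/2000', '3/13/2000'] * 1000)

pd.to_datetime(s)  # the format is inferred automatically; infer_datetime_format is deprecated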



```

### DataFrames

```python

df1.columns

df1.index


df1.dtypes

df1.values

df1.shape

df1.astype('float32').dtypes

df1.loc[0]

df1.loc[:,'TS1']
df1.loc[0,'TS1']

df1.loc[0,['TS1','TS2']]

df1.iloc[0,[1,2]]

df1.iloc[:,[1,2]].apply(lambda x: x.sum(),axis=0)


df1.iloc[:,[1,2]].apply(lambda x: x.sum(),axis=1)


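df1.iloc[:,[1,2]].apply(lambda x: x.iloc[0]+x.iloc[1],axis=0) # per column: first row + second row


df1.iloc[:,[1,2]].apply(lambda x: x.iloc[0]+x.iloc[1],axis=1) # per row: first selected column + second selected column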

df = pd.DataFrame({"name": ['Alfred', 'Batman', 'Catwoman'],
                   "toy": [np.nan, 'Batmobile', 'Bullwhip'],
                   "born": [pd.NaT, pd.Timestamp("1940-04-25"),
                            pd.NaT]})
df.dropna()

df = pd.DataFrame(
    [
        ("bird", "Falconiformes", 389.0),
        ("bird", "Psittaciformes", 24.0),
        ("mammal", "Carnivora", 80.2),
        ("mammal", "Primates", np.nan),
        ("mammal", "Carnivora", 58),
    ],
    index=["falcon", "parrot", "lion", "monkey", "leopard"],
    columns=("class", "order", "max_speed"),
)

grouped = df.groupby("class")
df
for name, group in grouped:
    print(name)
    print(group)
grouped['max_speed'].mean()


```

## Matplotlib


### Type of Plots

* [plot](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.plot.html)
* [bar](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar.html)
* [hist](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html)
* [scatter](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.scatter.html)
* [imshow](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.imshow.html)
* [contour](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.contour.html)

**Tutorial** https://github.com/rougier/matplotlib-tutorial#regular-plots


### Plot

```python

import numpy as np
import matplotlib.pyplot as plt

n = 256
X = np.linspace(-np.pi,np.pi,n,endpoint=True)
Y = np.sin(2*X)

plt.plot(X, Y+1, color='blue', alpha=1.00)
plt.plot(X, Y-1, color='blue', alpha=1.00)
plt.show()

```
![image.png](attachment:image.png)

### Scatter

```python

import numpy as np
import matplotlib.pyplot as plt

n = 1024
X = np.random.normal(0,1,n)
Y = np.random.normal(0,1,n)
T = np.arctan2(Y,X)

plt.axes([0.025,0.025,0.95,0.95])
plt.scatter(X,Y, s=75, c=T, alpha=.5)

plt.xlim(-1.5,1.5), plt.xticks([])
plt.ylim(-1.5,1.5), plt.yticks([])
# savefig('../figures/scatter_ex.png',dpi=48)
plt.show()

```

![image-2.png](attachment:image-2.png)

### Bar
```python
import numpy as np
import matplotlib.pyplot as plt

n = 12
X = np.arange(n)
Y1 = (1-X/float(n)) * np.random.uniform(0.5,1.0,n)
Y2 = (1-X/float(n)) * np.random.uniform(0.5,1.0,n)

plt.bar(X, +Y1, facecolor='#9999ff', edgecolor='white')
plt.bar(X, -Y2, facecolor='#ff9999', edgecolor='white')

for x,y in zip(X,Y1):
    plt.text(x+0.4, y+0.05, '%.2f' % y, ha='center', va= 'bottom')

plt.ylim(-1.25,+1.25)
plt.show()
```

![image-3.png](attachment:image-3.png)


### Contourf
```python


def f(x,y):
    return (1-x/2+x**5+y**3)*np.exp(-x**2-y**2)

n = 256
x = np.linspace(-3,3,n)
y = np.linspace(-3,3,n)
X,Y = np.meshgrid(x,y)

plt.axes([0.025,0.025,0.95,0.95])

plt.contourf(X, Y, f(X,Y), 8, alpha=.75, cmap=plt.cm.hot)
C = plt.contour(X, Y, f(X,Y), 8, colors='black', linewidths=.5)  # the keyword is linewidths (plural)
plt.clabel(C, inline=1, fontsize=10)

plt.xticks([]), plt.yticks([])
# savefig('../figures/contour_ex.png',dpi=48)
plt.show()

```
![image-4.png](attachment:image-4.png)

### Imshow

```python

import numpy as np
import matplotlib.pyplot as plt

def f(x,y): return (1-x/2+x**5+y**3)*np.exp(-x**2-y**2)

n = 10
x = np.linspace(-3,3,4*n)
y = np.linspace(-3,3,3*n)
X,Y = np.meshgrid(x,y)
plt.imshow(f(X,Y))
plt.show()

```

![image-5.png](attachment:image-5.png)

## SKLearn

## Models used

**NOTE** You do not have to learn or memorize anything from the specific model page. You only need to understand the **IMPORTANT GENERAL FUNCTIONS** and how they work. 

### Regression
* [Linear Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html#sklearn.linear_model.LinearRegression)
* [Decision Tree Regressor](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html#sklearn.tree.DecisionTreeRegressor)
* (Neural Network) [Multi-Layer Perceptron(MLP) Regressor](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html#sklearn.neural_network.MLPRegressor)


### Classification 
* [Logistic Regression](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression)
* [Naive Bayes](https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB)
* [Decision Tree Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html#sklearn.tree.DecisionTreeClassifier)
* [MLP Classifier](https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html#sklearn.neural_network.MLPClassifier)


### Unsupervised 

* [KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
* [PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA)

## Auxiliary Functions 

### Metrics

* (Classification) [Classification Report](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html#sklearn.metrics.classification_report)
* (Regression) [Mean Squared Error](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html#sklearn.metrics.mean_squared_error)

### Cross Validation

* [Train Test Split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html#sklearn.model_selection.train_test_split)


### Preprocessing

* [Label Encoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html#sklearn.preprocessing.LabelEncoder)
* [One Hot Encoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html)
* [Min Max Scaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html#sklearn.preprocessing.MinMaxScaler)
* [Normalizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.Normalizer.html#sklearn.preprocessing.Normalizer)


### Feature Extraction

* [Count Vectorizer](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#sklearn.feature_extraction.text.CountVectorizer)

* [TF-IDF](https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#sklearn.feature_extraction.text.TfidfVectorizer)

### IMPORTANT GENERAL FUNCTIONS

#### MODEL Functions
* `model.fit` - Tunes the parameters of the model using some **training data**
* `model.predict` - Generate predictions using **test data**
* `model.score` - Evaluates the model on **test data** and with a metric corresponding to the type of model

#### NON-MODEL Functions

Used for feature extractors and Preprocessing

* `extractor.fit` - Tunes the parameters of the class / feature extractor using **training data**
* `extractor.transform` - Transforms data (training or **test data**) using the parameters learned by `fit` on the same object
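
The two groups of functions above are typically combined as follows: split the data, fit the preprocessing step on the training part only, then fit and score the model. This is a minimal sketch on synthetic, noise-free data invented for the example (`y = 3 * x + 2`); it is not taken from a class dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic data (an assumption for illustration): y = 3 * x + 2.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3.0 * X[:, 0] + 2.0

# Hold out 25% of the rows as test data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# NON-MODEL object: fit on training data, transform both parts.
scaler = MinMaxScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# MODEL object: fit on training data, predict and score on test data.
model = LinearRegression()
model.fit(X_train_scaled, y_train)
predictions = model.predict(X_test_scaled)
test_score = model.score(X_test_scaled, y_test)

print(X_train.shape, X_test.shape)
print(test_score)
```

Fitting the scaler on the training part only keeps information from the test data out of the model; as a result, scaled test values may fall slightly outside the range 0 to 1.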

```python

import numpy as np
from sklearn.linear_model import LinearRegression
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
# y = 1 * x_0 + 2 * x_1 + 3
y = np.dot(X, np.array([1, 2])) + 3
reg = LinearRegression().fit(X, y)
reg.score(X, y) # 1.0 (R^2; the data is exactly linear)

reg.coef_ # array([1., 2.])

reg.intercept_ # approximately 3.0

reg.predict(np.array([[3, 5]])) # array([16.])

```

## OpenAI

[Documentation](https://gym.openai.com/docs/)

[Atari API](https://github.com/openai/gym/blob/master/gym/envs/atari/atari_env.py)


Tutorial Reinforcement Learning: https://colab.research.google.com/github/ageron/handson-ml2/blob/master/18_reinforcement_learning.ipynb#scrollTo=wzkLmX-xNniA


```python

#!pip install gym[atari]
#!wget -O space_invaders.bin https://github.com/orcax/atari_roms/blob/master/space_invaders.bin?raw=true
#!pwd | xargs python3 -m atari_py.import_roms 
import gym
import matplotlib.pyplot as plt
from IPython.display import display
from PIL import Image as Image
import time

display_handle=display(None, display_id=True)


env = gym.make('SpaceInvaders-ram-v0')
env.reset()


for _ in range(1000):
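    img = env.render(mode="rgb_array")
    img = Image.fromarray(img)  # PIL's Image module is imported above under the name Image
    
    # Resize the frame to a fixed width (keeping the aspect ratio) and display it
    basewidth = 500
    wpercent = (basewidth/float(img.size[0]))
    hsize = int((float(img.size[1])*float(wpercent)))
    img = img.resize((basewidth,hsize), Image.LANCZOS)  # ANTIALIAS was an alias of LANCZOS and was removed in Pillow 10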
    display_handle.update(img)
    
    
    out=env.step(env.action_space.sample()) # take a random action
    
    time.sleep(0.01)
    
env.close()
display_handle.update(None)
```
![image.png](attachment:image.png)

## Homework Items

These items are to be completed by the dates they are assigned; they are not meant to be completed all at once. Please take note of which items are due and when.


1. Register for a [Kaggle](https://www.kaggle.com/) Account 
2. Find a [Dataset](https://www.kaggle.com/dataset) you like 
![image.png](attachment:image.png)

    * The dataset must be small in size i.e. less than 1MB
    * Must be in a tabular format you can import with pandas [Available Formats](https://pandas.pydata.org/docs/reference/io.html)
3. Download the dataset and export it into a folder
4. Open the dataset with pandas
5. Demonstrate your understanding and knowledge of all the pandas functionality demonstrated in this notebook 
6. Convert the pandas dataset to numpy
7. Demonstrate your understanding and knowledge of all the numpy functionality demonstrated in this notebook

----

8. Demonstrate your understanding and knowledge of all the matplotlib functionality demonstrated in this notebook 
9. Demonstrate your understanding and knowledge of the Sklearn functions `fit`, `predict`, `score`, `transform`, and `train_test_split`
10. Run the OpenAI demo in your VM
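
Steps 4 and 6 above can be sketched as follows. The CSV text and its column names (`rank`, `acceptance_rate`) are made-up placeholders so that the example runs without a download; with a real Kaggle file, the path to the downloaded CSV would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Placeholder data standing in for a downloaded CSV file (not real values).
csv_text = "rank,acceptance_rate\n1,0.05\n2,0.07\n3,0.11\n"

# Step 4: open the dataset with pandas.
df = pd.read_csv(io.StringIO(csv_text))

# Step 6: convert to numpy. Double brackets keep the features 2-dimensional,
# which is the layout sklearn models expect for X.
features = df[["rank"]].to_numpy()
target = df["acceptance_rate"].to_numpy()

print(df.shape)
print(features.shape, target.shape)
```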

#TODO
I am going to use SKLearn and MatPlotLib to predict the acceptance rate of a college based on its rank.