# Welcome to the Dark Art of Coding:
## Introduction to Machine Learning
Intro to Scikit-Learn

<img src='../universal_images/dark_art_logo.600px.png' width='300' style="float:right">

# Objectives
---

In this session, students should expect to:

* Explore machine learning types and techniques
   * Supervised learning
   * Unsupervised learning
   * Classification
   * Regression
   * Clustering
   * Dimensionality reduction
* Review key characteristics of Scikit-Learn, especially the application programming interface (API)

# Machine Learning Types and Techniques
---

## Supervised learning

## Unsupervised learning

## Classification

## Regression

## Clustering

## Dimensionality reduction

# Key Characteristics of Scikit-Learn
---

# The Scikit-Learn API
---

The Scikit-Learn interface follows a number of guidelines covered in the API Contract, as defined in the API design paper (https://arxiv.org/abs/1309.0238):

"As much as possible, our design choices have been guided so as to avoid the
proliferation of framework code. We try to adopt simple conventions and to
limit to a minimum the number of methods an object must implement. The API
is designed to adhere to the following broad principles:

**Consistency**. All objects (basic or composite) share a consistent interface composed of a limited set of methods. This interface is documented in a consistent manner for all objects.

**Inspection**. Constructor parameters and parameter values determined by learning algorithms are stored and exposed as public attributes.

**Non-proliferation of classes**. Learning algorithms are the only objects to be
represented using custom classes. Datasets are represented as NumPy arrays
or SciPy sparse matrices. Hyper-parameter names and values are represented
as standard Python strings or numbers whenever possible. This keeps scikit-learn easy to use and easy to combine with other libraries.

**Composition**. Many machine learning tasks are expressible as sequences or
combinations of transformations to data. Some learning algorithms are also
naturally viewed as meta-algorithms parametrized on other algorithms. Whenever feasible, such algorithms are implemented and composed from existing
building blocks.

**Sensible defaults**. Whenever an operation requires a user-defined parameter,
an appropriate default value is defined by the library. The default value
should cause the operation to be performed in a sensible way (giving a baseline solution for the task at hand)."
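
To make the **Inspection** and **Sensible defaults** principles a little more concrete, here is a minimal sketch. The toy data (`X_demo`, `y_demo`) and the variable names are invented for illustration only; `get_params()`, `coef_`, and `intercept_` are part of the scikit-learn estimator interface.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data, invented purely for illustration: exactly y = 2x + 1
X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])
y_demo = np.array([1.0, 3.0, 5.0, 7.0])

# Sensible defaults: no arguments are required to get a usable model
model_demo = LinearRegression()

# Inspection: constructor parameters are exposed via get_params()
params = model_demo.get_params()
print(params)

# Inspection: values determined by learning are exposed as attributes after fitting
model_demo.fit(X_demo, y_demo)
print(model_demo.coef_, model_demo.intercept_)
```

The same pattern (construct, inspect parameters, fit, inspect learned attributes) should carry over to other estimators, which is what the **Consistency** principle is about.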

For some details on how the API is put together:

[Contributors API Overview](https://scikit-learn.org/stable/developers/contributing.html#api-overview)

## Using the Scikit-Learn API

Working with the Scikit-Learn API generally follows four steps:

**Choose the model**: to choose a model, we will import the appropriate estimator class

**Choose appropriate hyperparameters**: to prepare the model, we create a class instance and provide hyperparameters as arguments to the class

**Fit the model**: call the `.fit()` method on the model instance and provide training data

**Apply the model**: lastly, we apply the model to new data, using primarily one of two methods:

* **Supervised learning**: generally, we use the `.predict()` method to predict new labels
* **Unsupervised learning**: generally, we use either the `.predict()` or `.transform()` methods to predict properties OR transform properties of the data.
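
The four steps above can be sketched as follows, once for a supervised model and once for an unsupervised one. This is only an illustration: the toy arrays (`X_toy`, `y_toy`) are made up, and `PCA` is chosen here simply as one example of an estimator that offers `.transform()`.

```python
import numpy as np
from sklearn.decomposition import PCA               # step 1: choose a model (unsupervised)
from sklearn.linear_model import LinearRegression   # step 1: choose a model (supervised)

# Toy data invented for illustration; the target is simply x0 + x1
X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([3.0, 3.0, 7.0, 7.0])

# Supervised learning
reg = LinearRegression(fit_intercept=True)    # step 2: choose hyperparameters
reg.fit(X_toy, y_toy)                         # step 3: fit the model to training data
y_new = reg.predict(np.array([[5.0, 5.0]]))   # step 4: predict a label for new data
print(y_new)

# Unsupervised learning (no y is needed)
pca = PCA(n_components=1)                     # step 2: choose hyperparameters
pca.fit(X_toy)                                # step 3: fit the model
X_reduced = pca.transform(X_toy)              # step 4: transform the data
print(X_reduced.shape)
```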



## A quick demo

For this example, we will take a quick look at coffee prices near the North Shore of Oahu, Hawaii. Our goal will be to predict the price of a cup of coffee, given a cup size.

These prices come from several coffee shops in the area, in 2019.

|Size (oz)|Price ($)|
|----|----|
|12|2.95|
|16|3.65|
|20|4.15|
|14|3.25|
|18|4.20|


### Prep the data

We will start by looking generally at the data using some charting tools and putting it into a format that our machine learning tools can use.

Let's look at the data in a simple scatter plot to compare the cost of coffee versus the size of the cup.

In [None]:
import matplotlib.pyplot as plt

In [None]:
import numpy as np

In [None]:
x = np.array([12, 16, 20, 14, 18])
y = np.array([2.95, 3.65, 4.15, 3.25, 4.20])

In [None]:
plt.scatter(x, y);

In order to put this data into a linear regression machine learning algorithm, our x values need to be in a matrix format, with one "row" per data point (despite the row only having a single value).

It is fairly common to name the features of the model: `X` (as a capital letter)

In [None]:
X = x[:, np.newaxis]
print(X.shape)             # five rows, one value per row

In [None]:
X
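
As a side note, `np.newaxis` is not the only way to get a one-column matrix. The sketch below compares it with `reshape(-1, 1)`, which should produce the same result; the names `X_newaxis` and `X_reshape` are used only for this comparison.

```python
import numpy as np

x = np.array([12, 16, 20, 14, 18])

X_newaxis = x[:, np.newaxis]   # approach used in this lesson
X_reshape = x.reshape(-1, 1)   # -1 asks NumPy to infer the number of rows

print(X_newaxis.shape, X_reshape.shape)
print(np.array_equal(X_newaxis, X_reshape))
```

Either form gives one row per data point and one column per feature, which is the layout the estimators expect for `X`.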

Our target values are generally labeled `y` (lower case) and these values can be a simple array.

In [None]:
y

### Choose the Model

For this example, we are going to import a simple **linear regression** model from the sklearn collection of linear models.

In [None]:
from sklearn.linear_model import LinearRegression

### Choose Hyperparameters

This model comes, as do most of the models in sklearn, with arguments (or hyperparameters) set to sane defaults, so for this case, we won't add or change any arguments.

Here Jupyter simply displays the current default settings for this model.

In [None]:
model = LinearRegression()
model
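
If we did want to change a hyperparameter, we would pass it as a keyword argument when creating the instance. The sketch below uses `fit_intercept`, an argument of `LinearRegression`; turning it off is purely for illustration and is not something this lesson's demo needs. The variable names are hypothetical.

```python
from sklearn.linear_model import LinearRegression

# Default settings: the model estimates a y-intercept
default_model = LinearRegression()

# Illustration only: force the fitted line through the origin
no_intercept_model = LinearRegression(fit_intercept=False)

print(default_model.get_params()["fit_intercept"])
print(no_intercept_model.get_params()["fit_intercept"])
```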

### Fit the model

With a prepared model, we need to give it training data to learn from. For this linear regression model, we give it two arguments: `X` and `y`.

In [None]:
model.fit(X, y)

With these inputs, the model was able to calculate the slope (coefficient) and the y-intercept of the line that aligns most closely with our training data.

Let's look at both of these calculated results.

```python
model.coef_
model.intercept_
```

NOTE: scikit-learn appends an `_` to the names of attributes whose values are calculated during fitting.

In [None]:
model.coef_

In [None]:
model.intercept_
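
To see how these two numbers relate to a prediction, the sketch below applies the line equation `price = slope * size + intercept` by hand and compares it to `model.predict()`. The names `slope`, `size_oz`, `manual_price`, and `model_price` are introduced here only for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([12, 16, 20, 14, 18])
y = np.array([2.95, 3.65, 4.15, 3.25, 4.20])
X = x[:, np.newaxis]

model = LinearRegression().fit(X, y)

slope = model.coef_[0]          # one coefficient, because there is one feature
intercept = model.intercept_

# Apply the line equation by hand for a 16 oz cup
size_oz = 16
manual_price = slope * size_oz + intercept

# The model should give the same answer
model_price = model.predict(np.array([[size_oz]]))[0]
print(manual_price, model_price)
```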

### Apply the model

With a trained model, we can now feed the model some test data to see what values it predicts.

Let's prep several cup sizes to see what price the model will predict.

It is common to see these data sets labeled as `fit` data, hence the labels below (`xfit`, `Xfit`, `yfit`).

We start by pulling together a set of x values (representing size in oz.) and storing them in a matrix for inclusion as an argument in the `model.predict()` method.

In [None]:
xfit = np.array([16, 15, 12, 20, 17])

In [None]:
Xfit = xfit[:, np.newaxis]

In [None]:
yfit = model.predict(Xfit)
yfit
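
One quick sanity check, sketched below, is to predict prices for the original training sizes and compare them with the observed prices from the table. The names `y_pred` and `residuals` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([12, 16, 20, 14, 18])
y = np.array([2.95, 3.65, 4.15, 3.25, 4.20])
X = x[:, np.newaxis]

model = LinearRegression().fit(X, y)

# Predict prices for the same sizes the model was trained on
y_pred = model.predict(X)

# Residual: observed price minus predicted price
residuals = y - y_pred

for size, actual, pred in zip(x, y, y_pred):
    print(f"{size} oz: actual ${actual:.2f}, predicted ${pred:.2f}")
```

Small residuals suggest the straight line is a reasonable description of this small data set, though five points are far too few to draw firm conclusions.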

### Examine the Results

From here, we can plot all of the data points together on one chart:

* original values in purple
* predicted values in red
* the line that best fits the original training data, drawn in red through the predicted values

In [None]:
plt.scatter(x, y, color='rebeccapurple')
plt.scatter(xfit, yfit, color='red')
plt.plot(xfit, yfit, color='red');

### Deep Dive

N/A

### Gotchas

N/A

### How to learn more: tips and tricks

As we explore the Scikit-Learn API, and as we progress through the upcoming examples, I want to position you for success by showing you where and how you can learn more.

One great resource for understanding the many machine learning algorithms and hyperparameters available in scikit-learn is the API Reference.

[API Reference](https://scikit-learn.org/stable/modules/classes.html): A one-stop shop for the classes and functions in `sklearn`
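
The same documentation is also available from inside Python through each class's docstring. A minimal sketch, assuming nothing beyond standard Python introspection (`doc_lines` is just a local name for illustration):

```python
from sklearn.linear_model import LinearRegression

# The class docstring holds the same text that appears in the API Reference
doc_lines = LinearRegression.__doc__.strip().splitlines()
print(doc_lines[0])

# For the full text, call help(LinearRegression),
# or type LinearRegression? in a Jupyter cell
```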

# Experience Points!
---

# delete_this_line: task 01

In **`jupyter`** create a simple script to complete the following tasks:


**REPLACE THE FOLLOWING**

Create a function called `me()` that prints out 3 things:

* Your name
* Your favorite food
* Your favorite color

Lastly, call the function, so that it executes when the script is run

---
When you complete this exercise, please put your **green** post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# Experience Points!
---

# delete_this_line: task 02

In **`jupyter`** create a simple script to complete the following tasks:

**REPLACE THE FOLLOWING**

Task | Sample Object(s)
:---|:---
Compare two items using `and` | 'Bruce', 0
Compare two items using `or` | '', 42
Use the `not` operator to make an object False | 'Selina' 
Compare two numbers using comparison operators | `>, <, >=, !=, ==`
Create a more complex/nested comparison using parentheses and Boolean operators| `('kara' _ 'clark') _ (0 _ 0.0)`

---
When you complete this exercise, please put your **green** post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# Experience Points!
---

# delete_this_line: sample 03

In your **text editor** create a simple script called:

```bash
my_lessonname_03.py
```

Execute your script on the command line using **`ipython`** via this command:

```bash
ipython -i my_lessonname_03.py
```

**REPLACE THE FOLLOWING**

I suggest that as you add each feature to your script that you run it right away to test it incrementally. 

1. Create a variable with your first name as a string AND save it with the label: `myfname`.
1. Create a variable with your age as an integer AND save it with the label: `myage`.

1. Use `input()` to prompt for your first name AND save it with the label: `fname`.
1. Create an `if` statement to test whether `fname` is equivalent to `myfname`. 
1. In the `if` code block: 
   1. Use `input()` to prompt for your age AND save it with the label: `age` 
   1. NOTE: don't forget to convert the value to an integer.
   1. Create a nested `if` statement to test whether `myage` and `age` are equivalent.
1. If both tests pass, have the script print: `Your identity has been verified`

When you complete this exercise, please put your **green** post-it on your monitor. 

If you want to continue on at your own pace, please feel free to do so.

<img src='../universal_images/green_sticky.300px.png' width='200' style='float:left'>

# References
---

Below are references that may assist you in learning more:
    
|Title (link)|Comments|
|---|---|
|[API Reference](https://scikit-learn.org/stable/modules/classes.html)|One-stop shop for the classes and functions in `sklearn`|
|[API design paper](https://arxiv.org/abs/1309.0238)|Source of the API design principles quoted above|