# Machine Learning Demystified


## Agenda

> 3-Hour Hands-On Workshop

1. [Introduction (Python and Jupyter Basics)](01%20-%20Intro.ipynb)
2. [Demystifying ML Terms](02%20-%20Demistifying%20ML%20Terms.ipynb)
3. [Regression and Classification](03%20-%20Regression%20or%20Classification.ipynb)
4. [Classification and Unsupervised Learning Examples (Clustering)](04%20-%20Classification%20and%20Unsupervised%20Learning%20Examples.ipynb)
5. Bio Break
6. [Preparing Data (Data Science!)](05%20-%20Preparing%20Data.ipynb)
7. [Regression Examples (Linear Regression and Neural Network)](06%20-%20Regression%20Examples.ipynb)
8. [Where Do You Go From Here?](07%20-%20From%20Here.ipynb)

## Get Started
Visit **https://github.com/atomantic/ml_class** and follow the setup instructions.

## What is Machine Learning?

All machine learning boils down to one simple concept:
> Is this number very close to, or very far from, this other number?

* Clustering: grouping data points whose numeric values are closer to each other than to the rest of the data
* Linear Regression: finding a formula whose computed result matches the truth very closely
* Neural Networks: a way of creating arbitrarily complex regression functions that feed into each other

So... it's just math (and not even very complex math)
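
As a rough illustration of that "close or far" idea, here is a minimal sketch in plain Python. The sample values, the two group centers, and the candidate formula `y = 2x` are made-up numbers for illustration only, not data from this workshop.

```python
# Hypothetical values, chosen only to illustrate "how close are two numbers?"

def distance(a, b):
    """Absolute difference between two numbers."""
    return abs(a - b)

# Clustering idea: assign each value to whichever center it is closest to.
centers = [2.0, 10.0]
values = [1.5, 2.5, 9.0, 11.0]
groups = [min(range(len(centers)), key=lambda i: distance(v, centers[i]))
          for v in values]
print(groups)  # [0, 0, 1, 1]

# Regression idea: score a formula by how far its predictions are from the truth.
xs = [1.0, 2.0, 3.0]
truth = [2.1, 3.9, 6.2]
predictions = [2 * x for x in xs]  # candidate formula: y = 2x
mean_abs_error = sum(distance(p, t) for p, t in zip(predictions, truth)) / len(xs)
print(round(mean_abs_error, 3))  # 0.133
```

A smaller error means the formula is "closer" to the truth; a regression algorithm is, loosely speaking, a search for the formula that makes this number as small as possible.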

## Follow Along!
This section focuses on getting comfortable with the Jupyter Notebook, reading Python source code, and executing Python statements.

In [1]:
# In Jupyter you can execute command-line programs by prefixing them with a '!'
# press ctrl+enter (or shift+enter) to execute the cell
!python --version

Python 3.6.5


In [2]:
# Import the common packages for exploring Machine Learning
import numpy as np  # <-- common convention for short names of packages...
import pandas as pd
import sklearn
import matplotlib
import matplotlib.pyplot as plt

# Always good to check versions - because DOCS differ!
print('NumPy Version',np.__version__)
print('Pandas Version',pd.__version__)
print('Scikit Learn Version',sklearn.__version__)
print('MatplotLib Version',matplotlib.__version__)

NumPy Version 1.13.3
Pandas Version 0.23.1
Scikit Learn Version 0.19.1
MatplotLib Version 2.2.2


![numpy](images/logo_numpy.jpg)
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
- [docs](https://docs.scipy.org/doc/)
- n-dimensional array object
- random numbers
- complex array navigation: https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.indexing.html

In [None]:
# Create a simple NumPy array
a = np.array([[1,2],
              [3,4],
              [5,6],
              [7,8],
              [9,10],
              [11,12]])

print("full array:", a)

# NumPy uses interesting syntax for slicing data
# Zero-indexed!
print("\nfirst row:", a[0])

# query segments from an array: array[start:stop:step]
b = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# note that stop is non-inclusive
# step defaults to 1
print("\nindexes 1 through 3 (stop index 4 is not included):", b[1:4])
print("\nevery 2nd item from indexes 0 through 3 (stop index 4 is not included):", b[0:4:2])

# let's play with the first array:
#print("\nfirst column values from all rows:", a[:,0])
#print("\nsecond column, second row:", a[1,1])
#print("\nmore complex value pulling:", a[2:4,0])

# Your Turn
#print("\nPlayground:", a[2,:])
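
The commented-out lines above can be tried all at once. The following is a small sketch that rebuilds the same 6x2 array with `np.arange` and `reshape` (a shortcut not used in the cell above) and stores each slice in a variable; the variable names are chosen here for illustration.

```python
import numpy as np

# Same 6x2 array as above: [[1, 2], [3, 4], ..., [11, 12]]
a = np.arange(1, 13).reshape(6, 2)

first_column = a[:, 0]      # every row, column 0 -> values 1, 3, 5, 7, 9, 11
single_value = a[1, 1]      # row 1, column 1     -> 4
partial_column = a[2:4, 0]  # rows 2 and 3, column 0 -> values 5, 7
every_other_row = a[::2]    # rows 0, 2 and 4
reversed_rows = a[::-1]     # a negative step walks the rows backwards

print("first column:", first_column)
print("row 1, column 1:", single_value)
print("rows 2-3 of column 0:", partial_column)
print("every other row:\n", every_other_row)
print("last row first:", reversed_rows[0])
```

The pattern is always `array[rows, columns]`, and each of the two positions accepts either a single index or a `start:stop:step` slice.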

![pandas](images/logo_pandas.png)
Pandas is a Python library that provides powerful data structures for data analysis, time series, and statistics.
- [docs](https://pandas.pydata.org/pandas-docs/stable/)
- powerful data analysis and manipulation
- makes data into something like a spreadsheet

In [None]:
# Let's create a DataFrame with Pandas, which has more advanced utility functions built in
# Pass the previously created NumPy array as an input argument (aka function parameter)
df = pd.DataFrame(a)
# with column names for ease of use
df.columns = ['Feature 1','Feature 2']

# ** note: Jupyter will 'pretty print' the LAST object you reference without a print()
# But you have to use print() to show anything that comes before it

print(df) # <--- this gets printed
df.values              # <--- but this DOESN'T get printed
df                     # <--- but this does (last direct item)
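
To give a feel for the "spreadsheet-like" utilities, here is a short sketch using the same array and column names. The filter threshold and the `Sum` column are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 13).reshape(6, 2)  # same values as the array used above
df = pd.DataFrame(a, columns=['Feature 1', 'Feature 2'])

col = df['Feature 1']           # select one column (a pandas Series)
first_rows = df.head(2)         # first two rows
big = df[df['Feature 2'] > 6]   # keep only rows where Feature 2 is above 6

# add a derived column, like a spreadsheet formula
df['Sum'] = df['Feature 1'] + df['Feature 2']

print(col.mean())     # 6.0
print(len(big))       # 3
print(df.describe())  # count, mean, std, min, quartiles, max per column
```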

![matplotlib](images/logo_matplotlib.png)
matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy.
- [docs](https://matplotlib.org/contents.html)
- powerful data visualization
- interactive with IPython/Jupyter Notebooks

In [None]:
# multiple plots can be created and shown by giving the plots a figure number
plt.figure(1)
# generate some random data (10K numbers between 0-1)
x = np.random.rand(10000)
# create a histogram, placing the values in x into 100 buckets
plt.hist(x, 100)
# render it
plt.show()
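
To see the numbers behind a histogram without drawing it, `np.histogram` performs the same bucketing step. This is a small sketch; the fixed seed is an arbitrary choice added here so the run is repeatable.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary fixed seed, not part of the cell above
x = rng.rand(10000)             # 10K numbers between 0 and 1

# counts[i] is how many values fell between edges[i] and edges[i + 1]
counts, edges = np.histogram(x, bins=100)

print(len(counts))   # 100 buckets
print(len(edges))    # 101 edges (one more than the number of buckets)
print(counts.sum())  # 10000, every value lands in exactly one bucket
```

Because the values are uniformly random, each bucket should hold roughly 100 of them, which is why the plotted histogram looks nearly flat.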

In [None]:
# Use the 'magic' % to have IPython load matplotlib in interactive mode
%matplotlib notebook

In [None]:
# interactive scatterplot
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii

plt.figure(1)
plt.scatter(x, y, s=area, c=colors, alpha=0.5)
plt.show()

In [None]:
# Use the 'magic' % to see what variables are in memory
%who
%whos

See [Magic Commands Docs](http://ipython.readthedocs.io/en/stable/interactive/magics.html)

![scikit-learn](images/logo_scikit.png)
Scikit-learn is a machine learning library for the Python programming language. It provides simple and efficient tools for data mining and data analysis.
- [docs](http://scikit-learn.org/stable/documentation.html)
- complete machine learning toolkit
- clustering tools
- neural networks
- experimental data

### ...We'll Get to This

## But First

Let's continue to [Demystifying ML Terms](02%20-%20Demistifying%20ML%20Terms.ipynb)