# Intro to Machine Learning (ML)

This Jupyter notebook will help you get set up with a very basic machine learning project! Today, we'll learn how to classify iris flowers using a simple linear classifier.

Prerequisites:

- Basic Python knowledge
- The ML background covered in the corresponding slides

Setup:

1. Download and install Python 3 from https://www.python.org/downloads/

2. Using a command prompt, run the command:
``pip install numpy scikit-learn --user``

In [84]:
# Basic imports you'll use in most elementary machine learning projects

import numpy as np # NumPy is an efficient library for array and matrix operations and other math in Python
import sklearn # scikit-learn implements many machine learning algorithms for easy use

In [85]:
from sklearn.datasets import load_iris # Iris flower dataset

In [86]:
from sklearn.linear_model import SGDClassifier # A simple linear classifier

In [87]:
X, y = load_iris(return_X_y=True) # Load the Iris dataset into X (inputs) and y (targets)

num_examples = X.shape[0] # Number of examples (rows) in the dataset


# Randomly shuffle the data: the dataset is ordered by class, so without
# shuffling the test set would contain only one type of flower
indexes = np.random.permutation(num_examples)
X = X[indexes]
y = y[indexes]

# Split the data: the first 120 examples are for training, the remaining 30 for testing

X_train, X_test = X[:120], X[120:]
y_train, y_test = y[:120], y[120:]
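
The shuffle and split above can also be done in one call. The following is a minimal sketch, assuming scikit-learn's `train_test_split` helper; `random_state=0` is an arbitrary seed picked here so the split is repeatable, and `stratify=y` is an optional extra that keeps the three flower types equally represented in both sets.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# test_size=30 reproduces the 120/30 split used in this notebook.
# random_state=0 is an arbitrary seed (an assumption, not part of the original).
# stratify=y keeps the class proportions the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
```

If you run this cell in place of the manual split, the rest of the notebook works unchanged, although the printed examples and the final accuracy will differ from the ones shown here.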

In [88]:
print(X_train[:10]) # View the first 10 training examples (columns: sepal length, sepal width, petal length, petal width, in cm)

[[5.9 3.2 4.8 1.8]
 [6.7 2.5 5.8 1.8]
 [6.1 2.9 4.7 1.4]
 [7.7 3.  6.1 2.3]
 [6.1 3.  4.6 1.4]
 [6.5 3.  5.8 2.2]
 [6.7 3.3 5.7 2.1]
 [5.9 3.  5.1 1.8]
 [6.5 3.  5.5 1.8]
 [5.7 2.5 5.  2. ]]


In [89]:
print(y_train[:10]) # 0, 1, and 2 indicate the type of each flower (setosa, versicolor, virginica)

[1 2 1 2 1 2 2 2 2 2]
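
To check what the columns and labels mean, you can load the dataset without `return_X_y=True`. This is a small sketch: the returned object carries the names alongside the data, and the variable name `iris` is just an illustrative choice.

```python
from sklearn.datasets import load_iris

iris = load_iris()  # Without return_X_y, we get an object that also holds metadata

print(iris.feature_names)  # Names of the four input columns
print(iris.target_names)   # Names of the three flower types

# Pair each numeric label with its flower name
for label, name in enumerate(iris.target_names):
    print(label, "->", name)
```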


In [90]:
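# Set up the classifier with log loss (common for classification) and at most 1000 passes over the training data
# Note: scikit-learn 1.1 renamed this loss to 'log_loss', and version 1.3 removed the old name 'log'
classifier = SGDClassifier(loss='log', max_iter=1000)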


# Train the classifier on the training set
classifier.fit(X_train, y_train)

SGDClassifier(alpha=0.0001, average=False, class_weight=None, epsilon=0.1,
       eta0=0.0, fit_intercept=True, l1_ratio=0.15,
       learning_rate='optimal', loss='log', max_iter=1000, n_iter=None,
       n_jobs=1, penalty='l2', power_t=0.5, random_state=None,
       shuffle=True, tol=None, verbose=0, warm_start=False)

In [91]:
classifier.predict(X_train[:10]) # The predictions match the first 10 training labels printed above!

array([1, 2, 1, 2, 1, 2, 2, 2, 2, 2])
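
To see why this is called a linear classifier, it can help to look inside a trained model. The sketch below is self-contained and makes a few assumptions of its own: it trains on the full dataset, uses the default loss so that it runs regardless of the loss naming in your scikit-learn version, and fixes `random_state=0` for repeatability. The model stores one row of weights per flower type, and a prediction is simply the class with the highest weighted sum of the four measurements.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier

X, y = load_iris(return_X_y=True)

# Default loss and an arbitrary seed (assumptions made for this sketch)
clf = SGDClassifier(max_iter=1000, random_state=0)
clf.fit(X, y)

print(clf.coef_.shape)       # One row of 4 weights for each of the 3 classes
print(clf.intercept_.shape)  # One bias term per class

# Compute the class scores by hand: weighted sum of the inputs plus the bias
scores = X[:5] @ clf.coef_.T + clf.intercept_

# The predicted class is the one with the highest score
manual_predictions = clf.classes_[np.argmax(scores, axis=1)]
print(manual_predictions)
print(clf.predict(X[:5]))
```

The last two printed lines should agree, since `predict` picks the class with the largest score in the same way.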

In [93]:
# Let's test the classifier on the test set we set aside earlier

accuracy = classifier.score(X_test, y_test)
print("Accuracy:", accuracy)

Accuracy: 0.9666666666666667


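96.7% accuracy on the 30 test examples (29 out of 30 correct)! That's great! Because the data is shuffled randomly, your exact number may differ from run to run.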
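
Accuracy is just the fraction of test examples the classifier labels correctly. The sketch below recomputes it by hand and also prints a confusion matrix, which shows which flower types get mixed up. It is self-contained and rests on assumptions not in the original notebook: a seeded `train_test_split`, the default loss, and `random_state=0`, so its numbers will not match the run above.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=30, random_state=0, stratify=y
)

# Default loss and an arbitrary seed (assumptions made for this sketch)
clf = SGDClassifier(max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

predictions = clf.predict(X_test)

# Fraction of predictions that equal the true labels
manual_accuracy = np.mean(predictions == y_test)
print("Manual accuracy:", manual_accuracy)
print("score():", clf.score(X_test, y_test))

# Rows are the true flower types, columns are the predicted ones
matrix = confusion_matrix(y_test, predictions)
print(matrix)
```

Numbers on the diagonal of the matrix count correct predictions; anything off the diagonal is a mistake.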

### Congratulations! You've just created your first machine learning project using scikit-learn!

Try using linear classification on other datasets and see how it goes!
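
As one possible starting point, here is a sketch that applies the same idea to the wine dataset that ships with scikit-learn. The choice of dataset, the 80/20 split, the default loss, and `random_state=0` are all assumptions made for illustration. The wine measurements are on very different scales, so the sketch standardizes them first with `StandardScaler`, which generally helps SGD-based models.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)  # 178 wines, 13 measurements each, 3 classes

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Chain the scaler and the classifier so both are fit on the training data only
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

wine_accuracy = model.score(X_test, y_test)
print("Accuracy:", wine_accuracy)
```

Try removing the `StandardScaler()` step and compare the accuracy to see how much the scaling matters.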