<a href="https://www.bigdatauniversity.com"><img src="https://ibm.box.com/shared/static/wbqvbi6o6ip0vz55ua5gp17g4f1k7ve9.png" width="300" align="center"></a>

# <center>Machine Learning Basics</center>

<img src="https://ibm.box.com/shared/static/860wrw1jvullt57vl470fe7zikucwzzh.png" height="400" width="400" align="right">
<img src="https://ibm.box.com/shared/static/f7wewzfjxozemzlhsf7tay1me0alyofa.png" height="150" width="150" align="left">



### <b>Welcome to Lab 1a of Machine Learning 101 with Python.</b>
<p><b>Machine learning is a subset of artificial intelligence (AI) in which a system "learns" from data instead of being explicitly programmed.</b></p>

In this lab exercise, you will learn the basic functionality for inspecting a dataset: its data, target values, feature names, and so on. You will also get a basic understanding of how to fit a model to data (train it) and take a quick look at prediction. These will be the building blocks for future labs!


### Some Notebook Commands
<p>In case you haven't used a Jupyter Notebook before, here are some quick, useful commands to get started. The single-letter shortcuts work in command mode (press Esc first to leave the cell editor).</p>
<ul>
    <li>Run a cell: CTRL + ENTER</li>
    <li>Insert a cell above the current cell: a</li>
    <li>Insert a cell below the current cell: b</li>
    <li>Change a cell to Markdown: m</li>
    <li>Change a cell to code: y</li>
</ul>

For more keyboard shortcuts, go to <b>Help -> Keyboard Shortcuts</b>.

<b> <i> Before starting the lab, please run the following code in order to access the solutions </i> </b>

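```python
# IPython.display is the public home of the HTML display helper
from IPython.display import HTML

# Inject the CSS that styles the solution boxes (elements with id "ans")
HTML("""
<style type="text/css">
    #ans:hover { background-color: black; }
    #ans {padding: 6px; 
        background-color: white; 
        border: green 2px solid; 
        font-weight: bold; }
</style>
""")
```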

### Hello! We will start by introducing you to the digits dataset.

The digits dataset is made up of 1797 8x8 images, such as the one below.
<img src = "https://ibm.box.com/shared/static/psb68kpyyt0o6kbhcq88cwj7fuv7nlhq.png">
Each image is a handwritten digit that has been scanned and reduced to an 8x8 grid of pixel intensities. <br>
We can use this data to train a model to recognize which digit a new 8x8 image shows! <br>
Sounds like we are <i>classifying</i> data!

---
First, we <b>import</b> the dataset loader from **sklearn** (scikit-learn) and load the dataset into a variable.

```python
from sklearn.datasets import load_digits
digits = load_digits()
```
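As a quick way to see what one of these images contains, the sketch below prints the first sample as text instead of plotting it. It reads the Bunch's `images` field, which holds the same pixels as `data` but keeps the 8x8 layout; the character ramp `shades` is an arbitrary choice made for this illustration.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# digits.images keeps each sample as an 8x8 grid; pixel values run
# from 0 (blank) to 16 (fully inked).
image = digits.images[0]

# Map each intensity (0..16) to a character index (0..9) so the digit shows up as text.
shades = " .:-=+*#%@"
for row in image:
    print("".join(shades[int(value) * (len(shades) - 1) // 16] for value in row))

print("Label:", digits.target[0])
```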

Now let's check out the <b>type</b> and <b>data</b> of digits. The type should be <i>'Bunch'</i>, a dictionary-like object that scikit-learn uses to return its bundled sample datasets; its fields can be read either as keys or as attributes.

```python
print(type(digits))
print(digits.data)
```

```
<class 'sklearn.datasets.base.Bunch'>
[[  0.   0.   5. ...,   0.   0.   0.]
 [  0.   0.   0. ...,  10.   0.   0.]
 [  0.   0.   0. ...,  16.   9.   0.]
 ..., 
 [  0.   0.   1. ...,   6.   0.   0.]
 [  0.   0.   2. ...,  12.   0.   0.]
 [  0.   0.  10. ...,  12.   1.   0.]]
```


In practice, you won't usually create 'Bunch' objects yourself, but they bundle a lot of useful information (data, labels, and a description) that makes them handy for beginners.
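Because a Bunch is a dictionary subclass, its fields can be read either as keys or as attributes. A minimal check, assuming `load_digits` is called with its defaults so it returns a Bunch; the exact list of keys varies between scikit-learn versions, so it is printed rather than assumed:

```python
from sklearn.datasets import load_digits

digits = load_digits()

# A Bunch is a dict subclass whose keys can also be read as attributes.
print(sorted(digits.keys()))
print(isinstance(digits, dict))

# Key access and attribute access return the very same array object.
print(digits["data"] is digits.data)
```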

---
Let's check out the <b>description</b> of this dataset for more information!

```python
print(digits.DESCR)
```

```
Optical Recognition of Handwritten Digits Data Set

Notes
-----
Data Set Characteristics:
    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
http://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each block. This generates
an input matrix of 8x8 where each element is a
```

We can see the category that <i>classifies</i> each image by reading the <b>target</b> field. It holds one integer label per image, giving the digit that the image shows, and each label corresponds to a name in <b>target_names</b>.

```python
print(digits.target)
```

```
[0 1 2 ..., 8 9 8]
```


Now if we print out the <b>target_names</b>, we can see the names of the categories the data is classified into.

```python
print(digits.target_names)
```

```
[0 1 2 3 4 5 6 7 8 9]
```
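Since each label is an index into `target_names`, indexing `target_names` with `target` converts labels to class names, and `np.bincount` counts the images in each class. For the digits dataset the names are simply 0 to 9, so the mapping looks like the identity; the same pattern is more useful for datasets such as iris, whose `target_names` are strings. A short sketch:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each entry of target is an index into target_names,
# so fancy indexing turns labels into class names.
print(digits.target_names[digits.target[:5]])

# Count how many images belong to each of the 10 classes.
counts = np.bincount(digits.target)
for name, count in zip(digits.target_names, counts):
    print(name, count)
```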


An important point to note is that the data is stored as NumPy arrays (<i>numpy.ndarray</i>), which are homogeneous multidimensional arrays: every element in an array has the same type.

```python
print(type(digits.data))
print(type(digits.target))
print(type(digits.target_names))
```

```
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
```
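To see the "homogeneous" part concretely, every ndarray carries a single `dtype` shared by all of its elements, plus a number of dimensions (`ndim`). The sketch below prints both for the three fields; it assumes NumPy is importable as `np`, which holds wherever scikit-learn is installed.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# One dtype covers every element of an array; ndim is its number of axes.
for name in ("data", "target", "target_names"):
    arr = digits[name]
    print(f"{name}: dtype={arr.dtype}, ndim={arr.ndim}, shape={arr.shape}")

# All 1797 * 64 pixel values share the same floating-point type.
print(digits.data.dtype == np.float64)
```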


Now let's confirm that the <b>shapes</b> of the data and the target match, that is, that both have the same number of rows (the first element of each shape). <br>
<b>Note</b>: The shape of the data is a tuple whose first element is the number of observations (samples) and whose second element is the number of attributes (features).

```python
print(digits.data.shape)
print(digits.target.shape)
```

```
(1797, 64)
(1797,)
```
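The 64 attributes are the pixels of each 8x8 image laid out row by row. Using the Bunch's `images` field (shape `(1797, 8, 8)`), reshaping it back to two dimensions should reproduce `data` exactly, which the sketch below checks:

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# images stores each sample as an 8x8 grid.
print(digits.images.shape)                      # (1797, 8, 8)

# Flattening each grid row by row yields the 64 attributes in data.
flattened = digits.images.reshape(len(digits.images), -1)
print(flattened.shape)                          # (1797, 64)
print(np.array_equal(flattened, digits.data))   # True
```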


---
Then we can declare variables for the data and target, which will be used to fit (train) the model!

```python
X = digits.data
y = digits.target
```
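scikit-learn estimators expect `X` to be a 2D array of shape `(n_samples, n_features)` and `y` to be a 1D array with one label per sample. This matters when selecting a single sample: plain indexing drops a dimension, while slicing or `reshape(1, -1)` keeps it, as this sketch shows.

```python
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data
y = digits.target

# The full feature matrix: one row per sample, one column per feature.
print(X.shape)                      # (1797, 64)

# Plain indexing returns a single 1-D row of 64 values.
print(X[-1].shape)                  # (64,)

# Slicing or reshaping keeps it 2-D: one sample with 64 features.
print(X[-1:].shape)                 # (1, 64)
print(X[-1].reshape(1, -1).shape)   # (1, 64)
```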

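Next, we <b>import svm</b> (scikit-learn's support vector machine module) and create a classifier called `clf`: a support vector classifier (`SVC`) whose hyperparameters `gamma` and `C` are set by hand.
Now we can <b>fit</b> it on every image except the last one and ask it to <b>predict</b> that last image, which it did not see during training. <br>
<b>Note</b>: `predict` expects a 2D array of samples, so we pass the slice `X[-1:]` (shape `(1, 64)`) rather than the single row `X[-1]` (shape `(64,)`); recent versions of scikit-learn raise an error for 1D input.

```python
from sklearn import svm

# Support vector classifier; gamma and C are hyperparameters chosen by hand
clf = svm.SVC(gamma=0.001, C=100)

# Train on all images except the last one
clf.fit(X[:-1], y[:-1])

# Predict the held-out last image; X[-1:] keeps the 2D shape (1, 64)
print('Prediction:', clf.predict(X[-1:]))
print('Actual:', y[-1])
```

```
Prediction: [8]
Actual: 8
```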
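A single prediction says little about how well the classifier works overall. A common next step, sketched here as an addition rather than part of the original lab, is to hold out a test set with `train_test_split` and report the mean accuracy from `clf.score`; the 25% split and `random_state=0` are arbitrary choices.

```python
from sklearn import svm
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X, y = digits.data, digits.target

# Hold out 25% of the images so the model is scored on digits it never saw.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Same hyperparameters as the classifier above.
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(X_train, y_train)

# score() returns the mean accuracy on the held-out set.
accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Fixing `random_state` makes the split reproducible, and `stratify=y` keeps the class proportions similar in the training and test sets.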




---
# Additional Resources
<br>
The digits dataset (scikit-learn example): http://scikit-learn.org/stable/auto_examples/datasets/plot_digits_last_image.html
<br><br>
Introduction to sklearn: http://scikit-learn.org/stable/tutorial/basic/tutorial.html
<br><br>
Difference between Machine Learning and Statistical Modelling: <br>
http://www.analyticsvidhya.com/blog/2015/07/difference-machine-learning-statistical-modeling/

<hr>
Copyright &copy; 2016 [Big Data University](https://bigdatauniversity.com/?utm_source=bducopyrightlink&utm_medium=dswb&utm_campaign=bdu). This notebook and its source code are released under the terms of the [MIT License](https://bigdatauniversity.com/mit-license/).