---
# Introduction to Machine Learning
---

# Machine Learning: What is it? 
**Machine learning** (ML) is a branch of **Artificial Intelligence** (AI) that systematically applies **algorithms** to *synthesize* (vs. *analyze*) the underlying **functional relationships** among **data** and **information**.

> ######  Breviary...
* *Synthesis* - combining multiple sources/ideas/features into a whole object, in order to understand shared qualities between individuals being studied
    * taking individual pieces of a puzzle and putting them together
* *Analysis* - breaking down the whole object under examination, in order to understand its individual parts and how they work.
    * taking an already completed puzzle apart


**Machine learning** (ML) embraces an extensive methodology (*set of methods and principles*) for doing **analysis of data** (*data analytics*).

* ML *automates* analytical model building 
* ML uses *algorithms* that iteratively learn from data 
* ML allows *programs* to find *hidden knowledge* without being *explicitly programmed* for it.

> ######  Breviary...
* **Data analytics** (DA) is the process of examining **data sets** in order to draw conclusions about the **information** they contain
    * Increasingly with the aid of specialized systems and software.
    * Advanced types of data analytics include **data mining** and **big data**. 

## Data career paths:  Scientist vs. Analyst vs. Engineer

![alt text](./Figs/scientistVsEngineer.svg "ScientistVsEngineer")


# Machine Learning: What is it used for?

Machine learning is also a *buzzword* in the **technology world** right now, with applications in:

* Online Search
    * Semantic web
    * Natural language processing
* Web-based financial trading systems
    * Recommendation engines
    * Real-time ads on web pages 
    * Marketing personalization
    * Customer segmentation
    * New pricing models
* Smart Communities
    * Urban analytics
    * Smart homes and buildings (*domotics*)
    * Social behavior modeling
    * Location-based social networks 
    * Internet of things (IoT)

* Healthcare 
    * Computer assisted diagnosis
    * Disease risk factors 
    * Predict hospitalizations
    * Pest control
* Smart vehicular networks
    * City-wide mobile traffic modeling
    * Human mobility modeling
    * Smart roads
    * Autonomous cars
* Public safety (smart policing)    
    * Cognitive apps for social good
    * Situational Awareness
    * Predict criminal patterns
    * Disasters, integration/dissemination of data


... ***a lot more!***

# Machine learning: General workflow (pipeline) 
Most projects can be thought of as a series of discrete steps:
* Data acquisition/loading and preprocessing
* Feature creation, selection, and normalization
* Model building and testing
    * Combining multiple models
* Reporting/Deployment

![alt text](./Figs/ml-process.svg "Machine Learning process")
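
The steps above can be sketched end to end with NumPy alone. This is only a minimal illustration, not a recommended project layout: the synthetic data, the 150/50 split, the least-squares model, and the helper name `prepare` are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# 1. Data acquisition: synthetic data stands in for a real dataset (assumption)
X = rng.uniform(0, 10, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.0 + rng.normal(0, 0.1, size=200)

# Hold out the last 50 rows for testing
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

# 2. Feature normalization, using statistics from the training rows only
mean, std = X_train.mean(axis=0), X_train.std(axis=0)

def prepare(features):
    """Normalize the features and append a constant bias column."""
    scaled = (features - mean) / std
    return np.column_stack([scaled, np.ones(len(scaled))])

# 3. Model building: ordinary least squares on the training split
weights, *_ = np.linalg.lstsq(prepare(X_train), y_train, rcond=None)

# 4. Testing/reporting: mean squared error on the held-out rows
mse = np.mean((prepare(X_test) @ weights - y_test) ** 2)
print(f"test MSE: {mse:.4f}")
```

Because the noise added to `y` has a standard deviation of 0.1, the reported error should be small; a real project would replace each numbered step with its own data source, features, and model.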

# How to learn data analysis?

1. Learn to code
    - Basic data types, control flow, and paradigms 
2. Learn to work with data
    - Find some programming utilities to work with data
    - Find datasets and answer questions about the data
3. Start with applied ML
    - Get familiar with implementing some methods 
    - Progressively add pieces (*pipelining*)
4. More application, and some theory
    - Learn statistics, linear algebra, and more algorithms
5. Get better at coding
    - Learn computer science (algorithm analysis and optimization)
    - Best coding practices (testing, refactoring, versioning)
6. Broaden the scope 
    - Learn about hardware, storage, and networking 

---
# Python for Data Analysis
---

# Data analysis: Python vs. Matlab
![alt text](./Figs/pythonDataAnalysis.svg "Machine Learning process")

# Python NumPy
NumPy (or numpy) is a Python library for **numeric analysis** and **linear algebra**.
* Adds support for large, multi-dimensional arrays and matrices 
* Enables a large collection of high-level mathematical functions to operate on these arrays 
* Almost all libraries in the **PyData Ecosystem**, for **data analysis with Python**, rely on **NumPy** as one of their main building blocks
* Is **computationally fast**, as it has bindings to C libraries 
* Take some time to investigate why you would want to use **NumPy arrays** instead of **Python built-in lists**; e.g., check out this [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists).
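
One practical difference between arrays and lists can be seen in a few lines, assuming NumPy is already installed. The variable names are illustrative only: arithmetic on an array is applied element-wise, while the same operator on a list means repetition.

```python
import numpy as np

values = list(range(5))
arr = np.array(values)

doubled_list = [2 * v for v in values]  # lists need an explicit loop
doubled_arr = 2 * arr                   # arrays apply the operation element-wise

print(doubled_list)  # [0, 2, 4, 6, 8]
print(doubled_arr)   # [0 2 4 6 8]
print(values * 2)    # list repetition: [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```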

For now, we will only cover the basics of NumPy. To get started, we need to install it!

# NumPy installation...
> 
**It is highly recommended that you install Python using the Anaconda distribution, so that all underlying dependencies (such as linear algebra libraries) stay in sync when you use `conda install`. NumPy can be installed from the Linux terminal with one of the following commands:**

* `conda install numpy`
* `pip3 install numpy`
* `apt install python3-numpy`

# Numpy usage...
NumPy has many built-in functions and capabilities.
* We won't cover them all.
* We will focus on some of the most important aspects of NumPy:
    * **arrays**, **matrices**, and **random number generation**.

Once you've installed NumPy you can import it as a library:

In [1]:
import numpy as np

In [2]:
len(dir(np))

607

# Creating numpy `array`
## From a Python `list`
We can create an array by converting a `list` or a `list` of `lists`:

In [3]:
my_list = [1,2,3]
my_list

[1, 2, 3]

In [4]:
my_array = np.array(my_list)  # the original list is left untouched
my_array

array([1, 2, 3])

In [5]:
my_matrix = [[1,2,3],[4,5,6],[7,8,9]]
my_matrix

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [6]:
my_matrix = np.array(my_matrix)
my_matrix

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
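
As a small sketch of what the conversion produces, the attributes `ndim` and `shape` report how the nesting of the input list maps to array dimensions. The names `vector` and `matrix` are chosen for this example only.

```python
import numpy as np

vector = np.array([1, 2, 3])                          # flat list -> 1D array
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # list of lists -> 2D array

print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (3, 3)
print(matrix[1, 2])               # row 1, column 2 -> 6
```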

# Built-in methods for `array` generation
There are lots of built-in ways to generate `array`s
* `zeros`
* `ones`
* `eye`
* `arange`
* `linspace`
* ...

## Built-in methods: ``zeros()``
Generate arrays filled with zeros (`ones()` works the same way, filling with ones):

In [7]:
np.zeros(5) # initialized at Zeros

array([ 0.,  0.,  0.,  0.,  0.])

In [8]:
np.zeros((5,5)) # 2D array initialized at Zeros

array([[ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.]])

## Built-in methods:  ``ones()``

In [9]:
np.ones(5) # 1D array initialized at Ones

array([ 1.,  1.,  1.,  1.,  1.])

In [10]:
np.ones((5,5)) # 2D array initialized at Ones

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
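
Both functions return floating-point arrays by default, as the trailing dots in the outputs above suggest. The sketch below, with illustrative variable names, shows the optional `dtype` argument and one way to fill an array with another constant; `np.full` is an additional NumPy function not covered in these notes.

```python
import numpy as np

counts = np.zeros(4, dtype=int)  # integer zeros instead of the float default
sevens = np.ones((2, 3)) * 7     # scale an array of ones
same = np.full((2, 3), 7.0)      # np.full builds the same array directly

print(counts)                        # [0 0 0 0]
print(np.array_equal(sevens, same))  # True
```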

## Built-in methods:  ``eye()``
Creates an **Identity Matrix**

In [11]:
np.eye(10)

array([[ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.]])
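
A quick way to see why this matrix is called the identity: multiplying any square matrix by it returns the same matrix. The sketch below uses an arbitrary 3x3 example and also shows the optional `k` argument of `eye()`, which shifts the diagonal.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)  # arbitrary 3x3 matrix for illustration
I = np.eye(3)

print(np.array_equal(A @ I, A))  # True: A times identity is A
print(np.array_equal(I @ A, A))  # True: identity times A is A

shifted = np.eye(2, 3, k=1)      # 2x3 array with ones on the first superdiagonal
print(shifted.tolist())          # [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```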

## Built-in methods:  ``arange()``
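Return **evenly spaced values** within the half-open interval `[start, stop)`, using an optional step (default 1). The `stop` value itself is never included.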

In [12]:
np.arange(0,11)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [13]:
np.arange(0,11,2)

array([ 0,  2,  4,  6,  8, 10])
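
The exclusive upper bound is easy to trip over, so here is a short sketch (variable names are illustrative). Note that `arange` also accepts a non-integer step.

```python
import numpy as np

evens = np.arange(0, 10, 2)    # stop=10 is excluded
print(evens)                   # [0 2 4 6 8]

halves = np.arange(0, 2, 0.5)  # a float step is allowed
print(halves.tolist())         # [0.0, 0.5, 1.0, 1.5]
```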

## Built-in methods:  ``linspace()``
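Return a given number of **evenly spaced numbers** over a specified interval. Unlike `arange()`, the third argument is the number of points (not the step), and both endpoints are included by default.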

In [14]:
np.linspace(0,10,5)

array([  0. ,   2.5,   5. ,   7.5,  10. ])

In [15]:
np.linspace(0,10,50) # evenly spaced values

array([  0.        ,   0.20408163,   0.40816327,   0.6122449 ,
         0.81632653,   1.02040816,   1.2244898 ,   1.42857143,
         1.63265306,   1.83673469,   2.04081633,   2.24489796,
         2.44897959,   2.65306122,   2.85714286,   3.06122449,
         3.26530612,   3.46938776,   3.67346939,   3.87755102,
         4.08163265,   4.28571429,   4.48979592,   4.69387755,
         4.89795918,   5.10204082,   5.30612245,   5.51020408,
         5.71428571,   5.91836735,   6.12244898,   6.32653061,
         6.53061224,   6.73469388,   6.93877551,   7.14285714,
         7.34693878,   7.55102041,   7.75510204,   7.95918367,
         8.16326531,   8.36734694,   8.57142857,   8.7755102 ,
         8.97959184,   9.18367347,   9.3877551 ,   9.59183673,
         9.79591837,  10.        ])
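
A minimal sketch of two optional `linspace()` arguments: `retstep=True` also returns the spacing between points, and `endpoint=False` leaves out the upper bound. The variable names are only for illustration.

```python
import numpy as np

points, step = np.linspace(0, 10, 5, retstep=True)
print(points.tolist())  # [0.0, 2.5, 5.0, 7.5, 10.0]
print(step)             # 2.5

open_points = np.linspace(0, 10, 5, endpoint=False)  # upper bound excluded
print(open_points.tolist())  # [0.0, 2.0, 4.0, 6.0, 8.0]
```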

# Built-in methods for *Random Number Generation*
NumPy also has lots of ways to create **random number arrays**:
* ``rand``
* ``randn``
* ``randint``

## Built-in methods:  ``rand()``
Creates an array of the given shape and populates it with random samples from a **uniform distribution** over ``[0, 1)``, i.e., $X\sim U[0,1)$. Each dimension is passed as a separate argument, not as a tuple.

In [16]:
np.random.rand(4)

array([ 0.7129766 ,  0.15353546,  0.63258483,  0.24711646])

In [17]:
np.random.rand(4,4)

array([[ 0.65070493,  0.40990416,  0.96882823,  0.54819764],
       [ 0.03144442,  0.49906145,  0.44495572,  0.0121219 ],
       [ 0.12765592,  0.2993503 ,  0.52396754,  0.8426172 ],
       [ 0.94604857,  0.41335213,  0.40312477,  0.77898079]])
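
Since the numbers are random, the outputs above will differ on every run. If repeatable results are needed, the global seed can be fixed, as in the sketch below. The seed value 42 is arbitrary, and `np.random.default_rng` is the newer generator interface available in recent NumPy versions; it is not otherwise used in these notes.

```python
import numpy as np

np.random.seed(42)          # fix the global seed
first = np.random.rand(3)
np.random.seed(42)          # same seed again
second = np.random.rand(3)
print(np.array_equal(first, second))  # True: same seed, same numbers

rng = np.random.default_rng(42)  # newer Generator interface
sample = rng.random((2, 2))      # uniform on [0, 1); shape given as a tuple
print(sample.shape)              # (2, 2)
```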

## Built-in methods: ``randn()``
Return a sample (or samples) from the **standard normal** distribution. 

In [18]:
np.random.randn(4)

array([ 1.41061103, -0.49236567,  1.09879182,  0.1372343 ])

In [19]:
np.random.randn(4,4)

array([[ 1.84660236,  0.05195818,  0.52348476,  1.38288347],
       [-0.74263512, -0.39761617, -0.82491337, -0.91454133],
       [ 0.37058191,  0.63825387, -0.15442887,  1.2103932 ],
       [-0.28872447, -0.91061098, -0.26317911,  1.39880843]])
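
Samples from a normal distribution with another mean and standard deviation can be obtained by shifting and scaling the standard normal samples. A minimal sketch, where `mu` and `sigma` are hypothetical values picked for the example:

```python
import numpy as np

np.random.seed(0)     # fixed seed so the sketch is repeatable
mu, sigma = 5.0, 2.0  # hypothetical target mean and standard deviation

samples = mu + sigma * np.random.randn(100_000)

# With this many samples the estimates should be close to mu and sigma
print(samples.mean(), samples.std())
```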

## Built-in methods: ``randint()``
Return random integers from `low` (inclusive) to `high` (exclusive).

In [20]:
np.random.randint(1,100)

26

In [21]:
np.random.randint(1,100,10)

array([80,  1, 39, 99, 67, 72, 86, 88, 88, 21])
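
Because `high` is exclusive, simulating a six-sided die requires `high=7`. The sketch below (the die example and the seed are assumptions for illustration) checks that the upper bound never appears.

```python
import numpy as np

np.random.seed(0)                           # fixed seed so the sketch is repeatable
rolls = np.random.randint(1, 7, size=1000)  # integers from 1 to 6

print(rolls.min() >= 1, rolls.max() <= 6)   # True True
print(7 in rolls)                           # False: high is never returned
```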

# Built-in methods for ``array`` manipulation
Let's discuss some useful attributes and methods for ``array``s:

In [22]:
arr = np.arange(15)
ranarr = np.random.randint(0,15,15)

In [23]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

In [24]:
ranarr

array([ 6,  5,  4,  3, 13,  3,  9, 10,  4, 13,  1, 12, 14, 12, 11])

## Built-in methods: ``reshape()``
Returns an array containing the same data with a new shape.

In [25]:
arr.reshape(3,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [26]:
arr.reshape(5,3)

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [27]:
arr  # unchanged

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
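
Two further details of `reshape()` are sketched below, with illustrative names: a dimension given as `-1` is inferred from the array size, and a shape that does not match the number of elements raises a `ValueError`. For a contiguous array such as this one, the reshaped result shares memory with the original rather than copying it.

```python
import numpy as np

arr = np.arange(15)

grid = arr.reshape(3, -1)           # -1 lets NumPy infer the second dimension
print(grid.shape)                   # (3, 5)
print(np.shares_memory(arr, grid))  # True: a view of the same data here

try:
    arr.reshape(4, 4)               # 15 elements cannot fill a 4x4 array
except ValueError as err:
    print("reshape failed:", err)
```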

## Built-in methods: ``max()``, ``argmax()``
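These are useful methods for finding the **maximum** value and its corresponding **index location**. If the maximum occurs more than once, `argmax()` returns the index of the first occurrence.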

In [28]:
ranarr

array([ 6,  5,  4,  3, 13,  3,  9, 10,  4, 13,  1, 12, 14, 12, 11])

In [29]:
ranarr.max()

14

In [30]:
ranarr.argmax()

12

## Built-in methods: ``min()``,   ``argmin()``
These are useful methods for finding the **minimum** value and its corresponding **index location**.

In [31]:
ranarr

array([ 6,  5,  4,  3, 13,  3,  9, 10,  4, 13,  1, 12, 14, 12, 11])

In [32]:
ranarr.min()

1

In [33]:
ranarr.argmin()

10
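
On 2D arrays the same methods work on the flattened data unless an `axis` is given. The sketch below uses a small hand-made array (an assumption for illustration) and `np.unravel_index` to turn a flat index back into a row and column.

```python
import numpy as np

values = np.array([[3, 7, 7],
                   [9, 1, 4]])

print(values.max())     # 9
print(values.argmax())  # 3: index into the flattened array
row, col = np.unravel_index(values.argmax(), values.shape)
print(row, col)         # 1 0

print(values.argmax(axis=1).tolist())  # [1, 0]: first of the tied 7s wins
print(values.min(axis=0).tolist())     # [3, 1, 4]: column-wise minimum
print(values.argmin())                 # 4
```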

# Built-in ``array`` methods and attributes

## Built-in method:  ``reshape()``, attribute: ``shape``
`shape` is an attribute of arrays, but `reshape()` is a method.

In [34]:
arr = np.arange(3)

In [35]:
arr.shape # Vector

(3,)

In [36]:
arr.reshape(3,1)  # Column Vector

array([[0],
       [1],
       [2]])

In [37]:
arr.reshape(3,1).shape  # Column Vector

(3, 1)

In [38]:
arr.reshape(1,3)  # Row Vector

array([[0, 1, 2]])

In [39]:
arr.reshape(1,3).shape # Row Vector

(1, 3)
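
The distinction between row and column vectors matters as soon as they are multiplied. As a brief sketch using the `@` matrix-product operator (not covered elsewhere in these notes), the order of the operands decides whether the result is a 1x1 inner product or a 3x3 outer product.

```python
import numpy as np

arr = np.arange(3)
column = arr.reshape(3, 1)
row = arr.reshape(1, 3)

inner = row @ column  # (1, 3) @ (3, 1) -> (1, 1)
outer = column @ row  # (3, 1) @ (1, 3) -> (3, 3)

print(inner.tolist())  # [[5]]
print(outer.tolist())  # [[0, 0, 0], [0, 1, 2], [0, 2, 4]]
```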

## Built-in attribute: dtype
We can also grab the data type of the array's elements through the `dtype` attribute (the default integer type shown here may differ across platforms):

In [40]:
arr.dtype

dtype('int64')
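
The data type can also be chosen at creation time or changed afterwards. A minimal sketch with illustrative variable names, using the `dtype` argument and the `astype()` method:

```python
import numpy as np

ints = np.arange(3)                      # default integer type (platform-dependent width)
floats = np.arange(3, dtype=np.float64)  # request a type at creation
converted = ints.astype(np.float32)      # astype() returns a converted copy

print(floats.dtype)      # float64
print(converted.dtype)   # float32
print((ints / 2).dtype)  # float64: true division promotes integers to floats
```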