# Machine Learning Begins

## Definition

![Boss: "Internet is trying to kill me." Programmer: "We call it 'Machine Learning'." - Dilbert.com (2013)](images/intro/ml-dilbert.gif)

---

> Machine Learning is a field of study that gives computers the ability to learn without being explicitly programmed.
> 
> *Arthur Samuel (1959)*

---

> A computer is said to *learn* from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
> 
> *Tom Mitchell (1998)*

---

<sub>Source: [Coursera's "What is Machine Learning?" lecture](https://pt.coursera.org/learn/machine-learning/lecture/Ujm7v/what-is-machine-learning)</sub>
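
To make Mitchell's definition concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of the lecture: the target function `x * x`, the nearest-example learner, and the names `make_experience`, `predict`, and `performance`. The task T is predicting `y` from `x`, the experience E is the number of labelled examples the program has seen, and the performance measure P is the mean absolute error on a fixed set of test points.

```python
def target(x):
    """The unknown relationship the program tries to learn (assumed for the demo)."""
    return x * x


def make_experience(n):
    """Experience E: n labelled examples evenly spaced on [0, 1] (n >= 2)."""
    return [(i / (n - 1), target(i / (n - 1))) for i in range(n)]


def predict(experience, x):
    """Task T: predict y for x by copying the label of the nearest known example."""
    nearest_x, nearest_y = min(experience, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def performance(experience, test_xs):
    """Performance measure P: mean absolute error on test points (lower is better)."""
    errors = [abs(predict(experience, x) - target(x)) for x in test_xs]
    return sum(errors) / len(errors)


# Fixed test points, so that only the amount of experience changes between runs.
test_xs = [i / 200 for i in range(201)]

errors = {n: performance(make_experience(n), test_xs) for n in (2, 5, 50)}
for n, err in errors.items():
    print(f"E = {n:>2} examples -> P (mean absolute error) = {err:.4f}")
```

In this toy setup the error shrinks as the number of examples grows, which is what "learning" means under this definition.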


## Scope

![Artificial Intelligence (AI) is any technique that enables computers to mimic human behavior, such as voice, vision, and reasoning. Machine Learning (ML) is a subset of AI that enables computers to learn from experience. Deep Learning (DL) is a subset of ML that works with multi-layer neural networks.](images/intro/ai_ml_dl.jpg)

## Hype

![In 2017, Machine Learning comes right after Deep Learning, which is exactly at the top of the "Peak of Inflated Expectations" - Gartner's Hype Cycle for Emerging Technologies, 2017](images/intro/gartner-hype-cycle-for-emerging-technologies-2017.jpg)

## Skill set

### Data Science Metromap

![Metromap: Fundamentals, Statistics, Programming, Machine Learning, Text mining/NLP, Visualization, Big Data, Data Munging, Data Ingestion, Toolbox](images/intro/data-science-roadmap.png)

### Typical kinds of resources we find on the web

> How neural network layers internally work

![](images/intro/neural-network-internals.jpeg)

---

> Regularized linear regression formula

![](images/intro/regularized-linear-regression.png)


### Typical reactions

![Jurassic Park character taking his glasses off in fear.](images/intro/sam-neill-glasses-off-jurassic-park.gif)

---

![Jurassic Park character trembling with fear.](images/intro/jurassic-park.gif)

### Typical concerns

> But I won't manage to master everything on the Data Science Metromap!

---

> But I don't understand those mathematical formulas! I'm not a statistician!

---

> But I can't understand exactly how the [internals](http://www.kdnuggets.com/2016/05/implement-machine-learning-algorithms-scratch.html) work!

---

> But I don't have Big Data!

### Machine Learning Pyramid

![Data Engineers know how to extract and transform data structures in different infrastructures with performance, quality, scalability, and reliability. Machine Learning Engineers understand and know how to use different algorithms to create solutions. Machine Learning Researchers create new algorithms, paradigms, and mathematical models.](images/intro/skills-pyramid.png)

### Baby steps

> We can start by just understanding how different algorithms work, and when and how to use them to solve problems!

![Keep calm and learn machine learning](images/intro/keep-calm-and-learn-machine-learning.jpg)

## Model

- Simple workflow (generic example)

![Broadly speaking, a typical machine learning workflow consists of passing your data as input to a machine learning algorithm, which generates a machine learning model as output, so you have a prediction service to better understand new data based on the past.](images/workflow/typical-workflow.jpg)

- Supervised learning workflow (example with an image as input)

![In detail, in a supervised learning example (we'll understand each type in the next section), ](images/workflow/machine-learning-phases.png)
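
The simple workflow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the house sizes and prices are made up, ordinary least squares stands in for "a machine learning algorithm", and the function names `train` and `predict` are hypothetical. Past data goes into the algorithm, a model comes out, and the model is then used on new data.

```python
def train(xs, ys):
    """Learning algorithm: ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    # The "model" is just the numbers learned from the data.
    return {"slope": slope, "intercept": intercept}


def predict(model, x):
    """Prediction service: apply the learned model to new, unseen input."""
    return model["slope"] * x + model["intercept"]


# Past data (made up): house size in square metres and price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

model = train(sizes, prices)           # data + algorithm -> model
new_prediction = predict(model, 100)   # model + new data -> prediction
print(f"Learned model: {model}")
print(f"Predicted price for a 100 square metre house: {new_prediction:.1f}")
```

In practice a library such as scikit-learn would replace the hand-written `train` and `predict`, but the shape of the workflow stays the same.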

## Steps to build a great model


![Problem/ROI; Data gathering; Exploratory data analysis; Choose algorithm; Data cleaning; Feature engineering; Evaluation; Tuning; Predict/Discover/Execute;](images/workflow/Process-Overview.png)

---

### Some extra concerns
 - Bias
 - Underfitting
 - Overfitting
 - Cross validation
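
The concerns listed above can be seen together in a small experiment. The sketch below is written in plain Python under several assumptions: the toy data (a line with alternating noise), the three learners, the interleaved 3-fold split, and all function names are made up for illustration. A learner that always predicts the mean underfits, a learner that memorizes the training points overfits, and cross validation exposes the difference by scoring each learner only on data it has not seen.

```python
def fit_mean(xs, ys):
    """Underfitting candidate: ignores x and always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Reasonable candidate: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return lambda x: mean_y + slope * (x - mean_x)


def fit_memorizer(xs, ys):
    """Overfitting candidate: copies the label of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mean_abs_error(predict, xs, ys):
    return sum(abs(predict(x) - y) for x, y in zip(xs, ys)) / len(xs)


def cross_validate(fit, xs, ys, k=3):
    """k-fold cross validation: train on k-1 folds, score on the held-out fold."""
    fold_errors = []
    for fold in range(k):
        train_idx = [i for i in range(len(xs)) if i % k != fold]
        held_idx = [i for i in range(len(xs)) if i % k == fold]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_errors.append(mean_abs_error(
            predict, [xs[i] for i in held_idx], [ys[i] for i in held_idx]))
    return sum(fold_errors) / k


# Toy data: y = 2x plus noise that alternates between +1 and -1.
xs = list(range(12))
ys = [2 * x + (1 if x % 2 == 0 else -1) for x in xs]

learners = {"mean": fit_mean, "linear": fit_linear, "memorizer": fit_memorizer}
train_error, cv_error = {}, {}
for name, fit in learners.items():
    train_error[name] = mean_abs_error(fit(xs, ys), xs, ys)
    cv_error[name] = cross_validate(fit, xs, ys)
    print(f"{name:>9}: train error = {train_error[name]:.2f}, "
          f"cross-validated error = {cv_error[name]:.2f}")
```

On this data the memorizer scores a perfect zero on the data it was trained on, yet it does worse than the straight line once it is scored on held-out points. That gap between training error and cross-validated error is the usual warning sign of overfitting.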

## Types of Algorithms

![](images/workflow/major-types-of-learning.png)

---

### [Machine Learning Types](http://en.proft.me/2015/12/24/types-machine-learning-algorithms/)

- Supervised Learning (predictive modeling, labeled data)
 - Classification (Diagnose Alzheimer's as positive or negative from an image)
 - Regression (Predict population growth)
- Unsupervised Learning (descriptive modeling, unlabeled data)
 - Clustering (Customer segmentation)
 - Association (Product recommendation)
 - Dimensionality Reduction (Big data visualization, Meaningful compression)
- Semi-supervised Learning (mixture of labeled and unlabeled data)
- Reinforcement Learning (learn from environment exposure by trial and error)
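
The difference between the first two types in the list above comes down to whether labels are available. The following sketch, in plain Python, contrasts them on the same made-up numbers: `classify` uses labelled data (a nearest-neighbour rule, chosen here only for brevity), while `cluster` receives no labels and discovers groups on its own (a one-dimensional k-means). The data, the function names, and the choice of both algorithms are assumptions for illustration.

```python
def classify(labelled, x):
    """Supervised: return the label of the nearest labelled example."""
    nearest_value, nearest_label = min(labelled, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def cluster(values, k=2, iterations=10):
    """Unsupervised: 1-D k-means (k >= 2); returns the sorted cluster centres."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centres evenly between min and max.
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(centres[i] - v))
            groups[nearest].append(v)
        # Move each centre to the mean of its group; keep it if the group is empty.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return sorted(centres)


# Made-up customer spending figures.
spend = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
labels = ["low", "low", "low", "high", "high", "high"]

# Supervised: labels are given, so a new customer can be assigned a known class.
predicted = classify(list(zip(spend, labels)), 4.5)
print(f"Supervised prediction for 4.5: {predicted}")

# Unsupervised: no labels at all; the algorithm only finds where the groups are.
centres = cluster(spend, k=2)
print(f"Unsupervised cluster centres: {centres}")
```

Note that the clustering result carries no names such as "low" or "high": interpreting what each discovered group means is left to the analyst.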

---

![](images/types/learning-types-cheat-sheet.jpg)

### Exercise!

- Find similar Stack Overflow posts
- Offer attractive e-commerce discounts in real time
- Categorize new interview data into jr/mid/sr
- Obfuscate sensitive data to later classify
- Discover groups of interest in GitHub
- Choose best NYSE option to invest
- Estimate my house pricing

## Examples of algorithms (to explore)

![](images/types/algorithms-characteristics-2.jpg)

![](images/types/algorithms-characteristics.jpg)

---

![](images/types/algorithm-cheat-sheet.png)

---

![](images/types/algorithm-types.jpg)

## Some technologies

### Programming languages

- [**Python**](https://www.python.org/)
- [R](https://www.r-project.org/)

### Code share and visualization

- [**Jupyter Notebook**](http://jupyter.org/)

### Data structure packages 

- [**NumPy**](http://www.numpy.org/) (N-dimensional array package)
- [**Pandas**](http://pandas.pydata.org/) (Data frame and data analysis package)

### Data visualization

- [**matplotlib**](https://matplotlib.org/) (2D plotting package)
- [d3](https://d3js.org/)
- [Tableau](https://www.tableau.com/) (Data Analysis)

### ML algorithms & ecosystem

- [**scikit-learn**](http://scikit-learn.org)
- [PyML](http://pyml.sourceforge.net/)
- [Keras](https://keras.io/)/[TensorFlow](https://www.tensorflow.org/) (Neural network/Deep learning algorithms)

### Distributed data processing

- [PySpark](https://spark.apache.org/docs/0.9.0/python-programming-guide.html)/[Spark](https://spark.apache.org/)

### AIaaS/MLaaS

- [Google Cloud AI](https://cloud.google.com/products/machine-learning/)
- [Amazon AI](https://aws.amazon.com/pt/amazon-ai/)
- [Azure Machine Learning](https://azure.microsoft.com/pt-br/services/machine-learning/)