# Introduction to Machine Learning

## What is Machine Learning?


Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed.

I like to think of it as a comparison rather than a definition.

- If you **can** give clear instructions on how to do the task - traditional computing
- If you **cannot** give clear instructions but can give lots of examples - machine learning

Let's look at a few illustrations - 

1. Complex mathematical calculations - can give clear instructions: traditional computing
2. Processing a financial transaction - can give clear instructions: traditional computing
3. Differentiating between pictures of cats and dogs - cannot give clear instructions but can give examples: machine learning
4. Playing chess - can give clear instructions for how to play but cannot give instructions on how to win! We can give a lot of past games as examples though: machine learning
5. Customer segmentation - we don't know what the segments/groupings are, so giving clear instructions is out of the question. We can give a large number of examples with customer demographic data and purchase history: machine learning

So we can say that traditional programming takes data and a program and gives us output, while machine learning takes data and output (examples) and gives us a program!

<img src='https://drive.google.com/uc?id=1SAu0GNpDqDNRNxEtXRqBX-t20BuB0HcR' align = 'left'/>
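
The contrast above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Celsius-to-Fahrenheit task, the example values, and the names `celsius_to_fahrenheit`, `fit_line`, and `learned_model` are hypothetical and not part of the course material. The first function is a hand-written program; the second part recovers a rule from input/output examples by fitting a straight line with ordinary least squares.

```python
# Traditional programming: we know the rule, so we write it down ourselves.
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


# Machine learning (toy version): we only have example inputs and outputs,
# and we fit a straight line y = slope * x + intercept to them using
# ordinary least squares.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical examples: temperatures in Celsius and the matching Fahrenheit values.
example_inputs = [-40.0, 0.0, 10.0, 25.0, 100.0]
example_outputs = [-40.0, 32.0, 50.0, 77.0, 212.0]

slope, intercept = fit_line(example_inputs, example_outputs)


# The "program" that learning produced: a function built from the fitted values.
def learned_model(celsius):
    return slope * celsius + intercept


print(f"hand-written rule for 37 C: {celsius_to_fahrenheit(37.0):.1f} F")
print(f"learned rule for 37 C:      {learned_model(37.0):.1f} F")
```

Real problems such as telling cats from dogs have no simple rule to recover, which is exactly why examples are used there, but the direction of the arrow is the same: examples go in and a usable rule comes out.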

## Data is the New Oil!

Data is absolutely **critical** to creating a viable Machine Learning model. Here's a simple representation of how data helps us create a model and how a model helps us make predictions.

<img src="https://drive.google.com/uc?id=1rM6SBXOMeAcFXu1OLtvk4HOWsXdY_xGU" width=500 height=300 align="left"/>

Here's a short explainer video if the pictures didn't really do it for you...

```python
# Run this cell (shift+enter) to see the video

from IPython.display import IFrame
IFrame("https://www.youtube.com/embed/f_uwKZIAeM0", width="600", height="400")
```

## What are the Different Types of Machine Learning?

<img src="https://drive.google.com/uc?id=1ESgroj56fbOoE0_xiMhsaibVa8D-_80H" align="left" width="1000" height="800"/>

---
## Course Overview

This course is designed for the 'do-ers'. Our entire focus during this course will be to apply and experiment. Conceptual understanding is very important and we will build a strong conceptual foundation, but it will always be in the context of a project rather than just a theoretical understanding.

We will be exploring a variety of Machine Learning algorithms. For each we will use an appropriate real world dataset, work on a real problem statement, and execute a project that can become the foundation of your ML skills portfolio and your resume.

You now have access to a full scale ML lab-on-cloud. This is a very powerful tool, IF you use it. Make the most of what you have - explore, experiment, break a few things. You learn the most from failure!

### What Will We Do?

- We will understand the life cycle of a typical ML project and exercise it through real projects
- We will explore a slew of ML algorithms (supervised and unsupervised learning)
- For each of these algorithms we will understand how it works and apply it in a project
- We will extensively work on real world datasets and strive to be hands-on

### What Will We NOT Do?

- We will not cover every ML algorithm under the sun
- We will not cover reinforcement learning or deep learning in this course
- We will not go deep into the mathematical, probabilistic, and statistical foundations of ML

## Course Curriculum

**Key Concepts Covered**
1. Lifecycle of a typical ML project
2. Data Pre-processing
    - Data acquisition and loading
    - Data integration
    - Exploratory data analysis
    - Data cleaning
    - Feature selection
    - Encoding
    - Normalization
3. Picking the Right Algorithm
4. Evaluating Your Model
    - Train - Test Split
    - Evaluation Metrics
    - Under and Over Fitting
5. Other key concepts 
    - Imputation
    - Kernel Functions
    - Bagging
    - Hyperparameters
    - Boosting

**Algorithms Covered**
1. Linear Regression
2. Logistic Regression
3. K Nearest Neighbors
4. Decision Trees
5. Random Forest
6. Naive Bayes
7. Support Vector Machine
8. K Means Clustering
9. Hierarchical Clustering

**Datasets Used**
1. Healthcare - patient data on drug efficacy
2. Telecom - customer profiles
3. Retail - customer profiles
4. Automobile - automobile catalogue make, model, engine specs, etc.
5. Environment - CO2 emissions data
6. Health Informatics - cancer cell biopsy observations

---
## Life Cycle of a Typical ML Project

A typical ML project goes through 5 major steps - 

1. Define Project Objectives
2. Acquire, Explore and Prepare Data
3. Model Data
4. Interpret and Communicate the Insights
5. Implement, Document, and Maintain

We will work through steps 1 through 4 during this course. We will **not** be deploying, documenting, or maintaining our models.

<img src="https://drive.google.com/uc?id=1hQrE2Q7D_j4T8y5aM8pW-ejuS4VUP7Co" align="left"/>




Let's look at each of these steps in further detail -

1. **Define Project Objectives** - this is a very important step that most of us tend to forget. Without a clear understanding of why you are doing a project, the project will fail. What the business or client expects as the outcome of the project has to be discussed and understood before you start.


2. **Acquire, Explore, and Prepare Data** - you will spend a lot of your time on this step when you do an ML project. This is a critical step: exploring the data will help you decide which models you might want to employ, and based on this preliminary hypothesis you will prepare the data for the next step (Model Data). Here are a few things you will end up doing within this step -
    - Data acquisition and loading
    - Data integration
    - Exploratory data analysis
    - Data cleaning
    - Feature selection
    - Encoding
    - Normalization


3. **Model Data** - this is the heart of our project. But most students of ML get stuck on fancy algorithm names. There's a lot more to it than just claiming that you have done a project using SVM or Logistic Regression. You have to be able to articulate how you picked a model, how you trained it, and why you concluded that the output looks good.
    - Select the algorithm(s) to use
    - Train the model(s)
    - Evaluate performance
    - Tweak parameters and re-evaluate


4. **Interpret and Communicate the Insights** - just modeling the data, showing a few visualizations, and reducing the error is not enough. As an ML engineer you have to be able to talk to your client and help them interpret the outcome of all your hard work. Be ready to answer a few questions - 
    - What interesting patterns did you notice in the data?
    - Did you notice any intrinsic dependencies, correlation, or causation in the features?
    - Why did you pick the algorithm that you did?
    - How did you split the train-test data? Why?
    - Is this error rate acceptable? Why?
    - How will the outcome of this project help the client?


5. **Implement, Document, and Maintain** - at a real client, you will have to deploy your model in production, document it extensively, and also maintain it going forward. We will not go into this step given we are not going to be deploying our models in production.
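
Steps 2 and 3 above can be sketched end to end on a tiny, made-up dataset. This is an illustration under stated assumptions, not the course's actual code: the customer rows, the column meanings, the function names, min-max scaling as the normalization, and K Nearest Neighbors as the algorithm are all choices made here for brevity. It uses only the Python standard library, so the projects in the course may well look different.

```python
import random
from collections import Counter

# Hypothetical toy rows: (plan type, monthly usage in GB, tenure in months, churned?)
raw_rows = [
    ("prepaid", 2.0, 3, 1), ("prepaid", 1.5, 5, 1), ("prepaid", 3.0, 2, 1),
    ("prepaid", 2.5, 8, 1), ("postpaid", 4.0, 6, 1), ("prepaid", 1.0, 4, 1),
    ("postpaid", 12.0, 30, 0), ("postpaid", 15.0, 42, 0), ("postpaid", 9.0, 24, 0),
    ("prepaid", 11.0, 36, 0), ("postpaid", 14.0, 28, 0), ("postpaid", 10.0, 48, 0),
]

# Encoding: map the categorical plan type to a number.
plan_codes = {"prepaid": 0.0, "postpaid": 1.0}
features = [[plan_codes[plan], usage, float(tenure)] for plan, usage, tenure, _ in raw_rows]
labels = [row[3] for row in raw_rows]


def train_test_split(rows, targets, test_fraction=0.25, seed=42):
    """Shuffle the row indices and hold out a fraction of rows for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    test_size = max(1, int(len(rows) * test_fraction))
    test_idx, train_idx = indices[:test_size], indices[test_size:]
    return (
        [rows[i] for i in train_idx], [targets[i] for i in train_idx],
        [rows[i] for i in test_idx], [targets[i] for i in test_idx],
    )


def fit_min_max(rows):
    """Learn each column's minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(col) for col in columns], [max(col) for col in columns]


def apply_min_max(rows, lows, highs):
    """Rescale each column so that the training range maps onto [0, 1]."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, point, k):
    """Predict the majority label among the k closest training rows."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, point)), label)
        for row, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_x, train_y, test_x, test_y, k):
    """Fraction of held-out rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train_x, train_y, x, k) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Split first, then fit the scaling on the training rows only.
train_x, train_y, test_x, test_y = train_test_split(features, labels)
lows, highs = fit_min_max(train_x)
train_x = apply_min_max(train_x, lows, highs)
test_x = apply_min_max(test_x, lows, highs)

# "Training" k-NN just means keeping the training rows; the parameter to tweak is k.
scores = {k: accuracy(train_x, train_y, test_x, test_y, k) for k in (1, 3, 5)}
for k, score in scores.items():
    print(f"k={k}: test accuracy = {score:.2f}")
```

The scaling is fitted on the training rows and then reused for the test rows, so the held-out data does not influence how the model is prepared. With only twelve rows the accuracy numbers mean very little; the point is the order of the steps and the habit of re-evaluating after each tweak.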


## Kick Start!
Here's a 12-minute crash course on ML to kick-start our journey!

```python
# Run this cell (shift+enter) to see the video

from IPython.display import IFrame
IFrame("https://www.youtube.com/embed/z-EtmaFJieY", width="814", height="509")
```

Here's a great article that summarizes Machine Learning really well.

https://machinelearningmastery.com/basic-concepts-in-machine-learning/