# Machine Learning With Python
### Covering:

* 1. Introduction to Machine Learning
* 2. Machine Learning Workflow
* 3. Evaluation Metrics
* 4. Underfitting and Overfitting
* 5. Building a Linear Regression Model
* 6. Deploying your model with Streamlit / Gradio

# 1. Introduction to Machine Learning

Machine Learning is the branch of computing in which computers learn from data and then make decisions on their own, with little human intervention.

### Here's the Wikipedia definition of Machine Learning:

**Machine learning (ML) is the study of computer algorithms that can improve automatically through experience and by the use of data. It is seen as a part of artificial intelligence. Machine learning algorithms build a model based on sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to do so. -- Wikipedia**


What exactly does that mean?
It's simple! We don't program machines with explicit rules; rather, we feed them enough food (data) and they figure out the rest on their own.

Still not clear? Let me paint a scenario to help you understand better.

**Machine Learning**

Imagine you have images of dogs and cats and you want your computer to know what they look like. In machine learning, you feed the images of cats and dogs to your computer, and it works out what dogs and cats look like and the differences between them (all by itself!). Now that your computer knows what dogs and cats look like, you can give it a random image of a cat or a dog and it will figure out whether it's a cat or a dog, just like we (humans) would.

**Traditional Programming**

To accomplish this same task without machine learning, you'd have to tell your computer, "Hey! Cats have sharper ears than dogs, longer tails, whiskers, and..." by writing code. And this can get really cumbersome!

This brings us to the next topic, Machine Learning Vs Traditional programming.

## Machine Learning Vs Traditional Programming
The key inputs to any typical machine learning system are data and the answers (or labels). By providing those two things, we get the rules. Let's think about it in terms of our previous example: if you provide a bunch of pictures of cats and dogs to the ML system along with their labels (their names), you get the rules that help you classify cats and dogs.

On the flip side, standard programming works the other way around, and if you are a coder you'll know it well: to get results (the answers), you must provide both the data and the rules.

![Machine%20learning%20image.png](attachment:Machine%20learning%20image.png)
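
To make the contrast concrete, here is a minimal sketch in plain Python. The single "ear pointiness" feature, its scores, and the function names are all made up for illustration; a real image classifier learns from pixels, not from one number. The first function is a rule someone wrote by hand; the second derives its rule (a threshold) from data plus labels.

```python
# Traditional programming: a person writes the rule.
def classify_by_rule(ear_pointiness):
    return "cat" if ear_pointiness > 0.5 else "dog"

# Machine learning: the rule (here, a threshold) is derived from data + labels.
def learn_threshold(features, labels):
    """Return the candidate threshold that classifies the most samples correctly."""
    best_threshold, best_correct = None, -1
    for candidate in sorted(features):
        correct = sum(
            ("cat" if x > candidate else "dog") == y
            for x, y in zip(features, labels)
        )
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

features = [0.9, 0.8, 0.75, 0.3, 0.2, 0.4]           # made-up scores (the data)
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]  # the answers

threshold = learn_threshold(features, labels)

def classify_learned(ear_pointiness):
    return "cat" if ear_pointiness > threshold else "dog"

print(threshold, classify_learned(0.85), classify_learned(0.1))
```

Nobody typed the threshold in; it came out of the labeled examples, which is the "data + answers in, rules out" idea in miniature.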

## Types of Machine Learning

Broadly, there are 5 types of machine learning:

>**Supervised Learning**: This is the type of machine learning where the labels of your data are known. Think of a label as the answer that goes with each input (the features). The supervised learning method consists of input-output pairs where the algorithm learns from examples. Most ML problems fall in the category of supervised learning.

An example of supervised learning is our earlier example of cats and dogs; if cat images are labelled as cats and dog images as dogs, it is much easier for a machine learning model to relate each image to its corresponding label.

>**Unsupervised Learning**: This is the type of machine learning where the labels of your data are unknown. It involves training algorithms with unlabeled data; the algorithm then scans the dataset for meaningful connections.

An example of unsupervised learning is customer segmentation. Let's say that you want to offer promotions to groups of your customers based on their purchasing history, but you don't know these groups and their interests well. You only have their data. Using unsupervised techniques such as clustering, you can group customers who share the same interests and will likely all appreciate the promotion that you're offering. That's just one example; there are many more applications of unsupervised learning.
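
The customer segmentation idea can be sketched with a tiny one-dimensional k-means, one common clustering algorithm. The spend figures, the starting centers, and the helper name `kmeans_1d` are illustrative assumptions, not part of any real dataset or library.

```python
def kmeans_1d(values, centers, iterations=10):
    """Very small 1-D k-means: alternate between assignment and center update."""
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

monthly_spend = [12, 15, 14, 10, 95, 102, 98, 110]  # made-up purchase totals
centers, clusters = kmeans_1d(monthly_spend, centers=[0.0, 50.0])

print("Low spenders: ", clusters[0])
print("High spenders:", clusters[1])
```

No labels were supplied; the two groups emerge purely from how similar the values are to each other.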

>**Semi-supervised Learning**: This is the type of machine learning that combines supervised and unsupervised learning. It uses a small amount of labeled data and a large amount of unlabeled data, which provides the benefits of both approaches while avoiding the challenge of finding a large amount of labeled data. That means you can train a model to label data without having to use as much labeled training data.

An example of semi-supervised learning is combining clustering and classification algorithms. Clustering algorithms are unsupervised machine learning techniques that group data together based on their similarities. The clustering model helps us find the most relevant samples in our dataset. We can then label those and use them to train our supervised machine learning model for the classification task.

>**Self-supervised Learning**: This is the type of machine learning that obtains supervisory signals from the data itself, often leveraging the underlying structure in the data. The general technique of self-supervised learning is to predict any unobserved or hidden part (or property) of the input from any observed or unhidden part of the input. In essence, self-supervised learning allows AI systems to break down complex tasks into simple ones to arrive at a desired output despite the lack of labeled datasets. The basic concept of self-supervision relies on encoding an object successfully. A computer capable of self-supervision must know the different parts of any object so it can recognize it from any angle. Only then can it classify the thing correctly and provide context for analysis to come up with the desired output.

For example, as is common in NLP, we can hide part of a sentence and predict the hidden words from the remaining words. We can also predict past or future frames in a video (hidden data) from current ones (observed data). Since self-supervised learning uses the structure of the data itself, it can make use of a variety of supervisory signals across co-occurring modalities (e.g., video and audio) and across large data sets — all without relying on labels.
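
As a rough illustration of the hidden-word idea, the sketch below turns one unlabeled sentence into (input, target) training pairs by masking one word at a time. The `[MASK]` token and the helper name are assumptions chosen for the example; real systems use far larger corpora and more elaborate masking schemes.

```python
def make_masked_examples(sentence, mask_token="[MASK]"):
    """Build (masked sentence, hidden word) pairs from raw, unlabeled text."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        masked = words[:i] + [mask_token] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

examples = make_masked_examples("the cat sat on the mat")
for masked_sentence, hidden_word in examples:
    print(f"{masked_sentence!r} -> {hidden_word!r}")
```

The "labels" here (the hidden words) come from the data itself, which is why no human annotation is needed.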

The closest we have to self-supervised learning systems are the so-called “Transformers.” These are ML models that successfully perform natural language processing (NLP) tasks without the need for labeled datasets. They are capable of processing massive amounts of unstructured data and “transforming” it into usable information for various purposes. Transformers are behind Google’s BERT and Meena, OpenAI’s GPT-2, and Facebook’s RoBERTa. But while they are better than their predecessors at answering questions, they still require much work to hone their understanding of human language.

>**Reinforcement Learning**: This is the type of machine learning that deals with the behaviour of agents in an environment where they must make decisions in order to maximize some notion of cumulative reward. In Reinforcement Learning (RL) agents are trained on a reward and punishment mechanism. The agent is rewarded for correct moves and punished for the wrong ones. In doing so, the agent tries to minimize wrong moves and maximize the right ones.

Reinforcement learning is applied in areas such as robotics, self-driving cars, and game development.
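
Here is a minimal sketch of the reward mechanism, assuming a made-up environment with two actions that pay a reward of 1 with probability 0.2 and 0.8 respectively. The agent uses an epsilon-greedy strategy (one simple RL technique among many): it mostly picks the action with the best average reward so far, and occasionally tries a random one.

```python
import random

def run_bandit(reward_probs, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy environment with one reward probability per action."""
    rng = random.Random(seed)
    counts = [0] * len(reward_probs)     # how often each action was taken
    values = [0.0] * len(reward_probs)   # running average reward per action
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))                  # explore
        else:
            action = max(range(len(values)), key=lambda a: values[a])  # exploit
        reward = 1 if rng.random() < reward_probs[action] else 0
        counts[action] += 1
        # Incremental update of the average reward for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
        total_reward += reward
    return values, counts, total_reward

values, counts, total_reward = run_bandit([0.2, 0.8])
print("Estimated value per action:", [round(v, 2) for v in values])
print("Times each action was chosen:", counts)
```

Over time the agent should end up choosing the better-paying action far more often, purely from the rewards it has observed.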

### Categories of ML Problems
In terms of problems that we can solve with machine learning, there are 3 categories which are closely connected to the types we saw in the last section.

**Classification**: This falls under supervised learning. As we saw, an example is classifying an image as a dog or a cat. Whether you have two categories or more, they are all classification problems. With two categories it is usually termed binary classification, with more than two it is multi-class classification, and when each sample can receive zero or more class labels it is multi-label classification.

**Regression**: This is also a supervised type. The goal here is to predict a continuous value. An example is predicting the price of a house given its size, region, number of rooms, etc.

**Clustering**: This falls under unsupervised learning. The goal is to group entities based on given characteristics. An example is grouping customers who share similar characteristics.

## ML Applications
Machine learning algorithms are used in a wide variety of applications, such as in medicine, email filtering, speech recognition, and computer vision, where it is difficult or unfeasible to develop conventional algorithms to perform the needed tasks.

It's fair to say that machine learning has transformed many industries, from banking, healthcare, production, and streaming to autonomous vehicles. Here are more detailed scenarios highlighting applications of machine learning in real-world settings:

1. A bank or any credit card provider can detect fraud in real time. Banks can also predict whether a given customer requesting a loan will pay it back, based on their financial history.

2. A medical analyst can diagnose a disease in a handful of minutes, or predict the likelihood or cause of diseases or the survival rate (prognosis).

3. An engineer in a given industry can detect failures or defects in equipment.

4. A telecommunication company can learn that a given customer is not satisfied with the service and is likely to opt out of it (churn).

5. Our email inboxes are smart enough to differentiate spam from important emails.

6. A driverless car can confidently know that an object in front of it is a pedestrian.

7. A streaming service can suggest the best media to its clients based on their interests.




EXERCISE 1: Watch the Video Below and Give Your Own Definition of Machine Learning

```python
from ipywidgets import widgets
from IPython.display import YouTubeVideo, display

# Render the video inside an output widget so it appears within the notebook cell.
out = widgets.Output()

with out:
    video = YouTubeVideo(id='5q87K1WaoFI', width=730, height=410, fs=1, rel=0)
    print("Video available at https://youtube.com/watch?v=" + video.id)
    display(video)

display(out)
```

# 2. Machine Learning Workflow

![MAL%202.jpg](attachment:MAL%202.jpg)

Data science is not just about creating models in a notebook. It's also mastering the key components of the machine learning workflow.

The unspoken rule of thumb in the ML workflow is to first understand the problem statement. Problem definition is the first and most important step in any ML project. This is where you make sure you understand the problem really well. Understanding the problem will give you proper intuition about the next steps to follow, such as choosing the right learning algorithm. After you have defined your problem, the next step is to find the relevant data. From there, the workflow typically covers these components:

* Data ingestion pipelines
* Data wrangling at scale
* Featurization
* Model training and evaluation
* Model deployment and monitoring
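
The components above can be sketched end to end on a toy scale. Everything here is an assumption made for illustration: the inline CSV values are invented, the function names are hypothetical, and the "model" is just a per-group average rather than a real learning algorithm. Deployment and monitoring are left out.

```python
import csv
import io

# Invented sample in the spirit of an insurance dataset; not real records.
RAW_CSV = """age,smoker,charges
19,yes,17000
33,no,4500
,no,3900
45,yes,39000
52,no,9500
28,no,
61,yes,47000
40,no,6500
"""

def ingest(text):
    """Data ingestion: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def wrangle(rows):
    """Data wrangling: drop rows that have any missing value."""
    return [row for row in rows if all(value.strip() for value in row.values())]

def featurize(rows):
    """Featurization: turn strings into numbers a model can use."""
    return [
        {
            "age": float(row["age"]),
            "smoker": 1 if row["smoker"] == "yes" else 0,
            "charges": float(row["charges"]),
        }
        for row in rows
    ]

def train(rows):
    """Training: the 'model' is the average charge per smoker group."""
    model = {}
    for flag in (0, 1):
        group = [row["charges"] for row in rows if row["smoker"] == flag]
        model[flag] = sum(group) / len(group)
    return model

def evaluate(model, rows):
    """Evaluation: mean absolute error on rows the model did not train on."""
    errors = [abs(row["charges"] - model[row["smoker"]]) for row in rows]
    return sum(errors) / len(errors)

clean = featurize(wrangle(ingest(RAW_CSV)))
train_rows, test_rows = clean[:4], clean[4:]
model = train(train_rows)
mae = evaluate(model, test_rows)
print(f"Rows kept: {len(clean)}, test MAE: {mae:.0f}")
```

Keeping each stage in its own function makes it easier to swap in a real data source or a real model later without touching the rest of the pipeline.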

# 3. Evaluation Metrics

Earlier in this introduction to Machine Learning, we saw that most problems are either regression or classification. In this section, we will learn the evaluation metrics that are used in evaluating the performance of the machine learning models.

Let's kick this off with the regression metrics!

### Regression Metrics
In regression, the goal is to predict a continuous value. The difference between the actual value and the predicted value is called the error.

`Error = Actual value - Predicted value`

The average of the squared errors over all samples is called the mean squared error (MSE).

`MSE = SUM(SQUARE(Actual value - Predicted value)) / Number of samples`

*MSE Actual Formula*:

$$\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2$$


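Taking the square root of the mean squared error gives the Root Mean Squared Error (RMSE). RMSE is one of the most commonly used regression metrics, partly because it is expressed in the same units as the value being predicted.

*RMSE Actual Formula*:

$$\sqrt{\frac 1n\sum_{i=1}^n(y_i-\hat{y}_i)^2}$$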
    
There are times when you will work with datasets containing outliers. In this case, the commonly used metric is the Mean Absolute Error (MAE). It is as simple to calculate as MSE: MAE is the average of the absolute errors.

`MAE = SUM(ABSOLUTE(Actual value - Predicted value)) / Number of samples`

*MAE Actual Formula*:

$$\frac 1n\sum_{i=1}^n|y_i-\hat{y}_i|$$
    
Because the errors are not squared, MAE is less sensitive to outliers than MSE and RMSE. This makes it a suitable metric for problems that are likely to have abnormal scenarios, such as time series.
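
The three regression metrics are short enough to write out by hand. The sketch below uses plain Python and made-up numbers, and also shows what a single badly missed prediction does to each metric.

```python
import math

def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    return math.sqrt(mean_squared_error(actual, predicted))

def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [3.0, 5.0, 2.5, 7.0]      # made-up true values
predicted = [2.5, 5.0, 4.0, 8.0]   # made-up predictions

mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
mae = mean_absolute_error(actual, predicted)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}")

# Same data, but the last prediction is now a wild miss (an outlier error).
predicted_with_outlier = [2.5, 5.0, 4.0, 27.0]
mse_outlier = mean_squared_error(actual, predicted_with_outlier)
mae_outlier = mean_absolute_error(actual, predicted_with_outlier)
print(f"With outlier: MSE={mse_outlier:.3f}  MAE={mae_outlier:.3f}")
```

Squaring makes the one large error dominate MSE, while MAE grows only in proportion to the size of the miss.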

### Classification Metrics
In classification problems, the goal is to predict a category (class). Accuracy is the most commonly used metric; it shows the model's ability to make correct predictions. Take a horse/human classifier as an example: if you have 250 images of horses and the same number of humans, and the model correctly predicts 400 of them, then the accuracy is 400/500 = 0.8, so your model is 80% accurate.

Accuracy is simply an indicator of how good your model is at making correct predictions, and it is only useful if you have a balanced dataset (like the 250 images of horses and 250 images of humans above).

When we have a skewed (imbalanced) dataset, we need a different perspective on how we evaluate the model. For example, if we have 450 images of horses and 50 images of humans, a model that always predicts "horse" is already 90% accurate (450/500), simply because the dataset is dominated by horses. But how about humans? The model would misclassify every one of them, and accuracy alone would not reveal it.
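
The effect of imbalance is easy to reproduce. In this sketch the "model" is a stand-in that always answers horse, which is an assumption for illustration rather than a trained classifier.

```python
actual = ["horse"] * 450 + ["human"] * 50
predicted = ["horse"] * 500   # a "model" that always answers horse

# Overall accuracy looks good because horses dominate the dataset.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Share of the actual humans that were found: none of them.
humans_found = sum(a == "human" and p == "human" for a, p in zip(actual, predicted))
human_recall = humans_found / actual.count("human")

print(f"Accuracy: {accuracy:.0%}, humans correctly identified: {human_recall:.0%}")
```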

This is where we introduce other metrics that can be far more useful than accuracy, such as precision, recall, and F1 score.

Precision shows the percentage of the positive predictions that are actually positive. To quote the [Google ML Crash Course](https://developers.google.com/machine-learning/crash-course/classification/precision-and-recall), precision answers the following question: `What proportion of positive identifications was actually correct?`

Recall, on the other hand, shows the percentage of the actual positive samples that were classified correctly. It answers this question: `What proportion of actual positives was identified correctly?`

There is a tradeoff between precision and recall. Often, increasing precision will decrease recall and vice versa. To simplify things, we combine both of these two metrics into a single metric called the F1 score.

F1 score is the harmonic mean of precision and recall, and it shows how good the model is at classifying all classes without having to balance between precision and recall. If either precision or recall is very low, the F1 score is going to be low too.

Accuracy, precision, and recall can all be calculated easily from a confusion matrix. A confusion matrix shows the number of correct and incorrect predictions made by a classifier for every available class.

More intuitively, a confusion matrix is made of 4 main elements: True negatives, false negatives, true positives, and false positives.

* **True Positives(TP)**: Number of samples that are correctly classified as positive, and their actual label is positive.

* **False Positives (FP)**: Number of samples that are incorrectly classified as positive, when in fact their actual label is negative.

* **True Negatives (TN)**: Number of samples that are correctly classified as negative, and their actual label is negative.

* **False Negatives (FN)**: Number of samples that are incorrectly classified as negative, when in fact their actual label is positive.

The accuracy that we talked about is the number of correct examples over total examples. So, that is

`Accuracy = (TP + TN) / (TP + TN + FP + FN)`

Precision is the model accuracy on predicting positive examples.

`Precision = TP / (TP + FP)`

Recall, on the other hand, is the model's ability to find the actual positive examples.

`Recall = TP / (TP+FN)`

The higher the recall and precision, the better the model is at making accurate predictions, but as noted above, raising one usually lowers the other.

A classifier that doesn't have false positives has a precision of 1, and a classifier that doesn't have false negatives has a recall of 1. Ideally, a perfect classifier has a precision and a recall of 1.

Because of that tradeoff, the two are combined into the F1 score, the harmonic mean of precision and recall.

`F1 Score = 2 *(precision * recall) / (precision + recall)`

Take an example of the following confusion matrix.
    
    
![Mal%203.png](attachment:Mal%203.png)
    
From the above confusion matrix:

* Accuracy = `(TP + TN) / (TP + TN + FP + FN) = (71 + 36)/(71 + 36 + 7 + 0) = 0.94, or 94%`
* Precision = `TP / (TP + FP) = 71/(71 + 7) = 0.91, or 91%`
* Recall = `TP / (TP + FN) = 71/(71 + 0) = 1, or 100%`
* F1 score = `2PR / (P + R) = 2 x 0.91 x 1/(0.91 + 1) = 0.95, or 95%`
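
The same arithmetic can be checked in a few lines of plain Python. The function name is hypothetical; the counts are the ones read from the confusion matrix above (TP = 71, TN = 36, FP = 7, FN = 0).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

metrics = classification_metrics(tp=71, tn=36, fp=7, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```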

Accuracy, the confusion matrix, precision, recall, and the F1 score are all implemented in Scikit-Learn, a machine learning framework used to build classical ML algorithms.



# 4. Underfitting and Overfitting

Building a machine learning model that fits the training data well is not a trivial task. Often, at the initial training, the model will either underfit or overfit the data. Some machine learning models demonstrate this really well. For example, when training decision trees, it is very likely that they will overfit the data at first.

There is a tradeoff between underfitting and overfitting, so it's important to understand the difference between them and how to handle each. Understanding and handling underfitting and overfitting is a critical part of diagnosing machine learning models.

### Underfitting (High Bias)

Underfitting happens when the model does poorly on the training data. It can be caused by a model that is too simple for the training data, or by data that does not contain the information needed for what you are trying to predict. Good data has high predictive power, and poor data has low predictive power.

Here are some of the techniques that can be used to deal with a model that has high bias (underfits):

* Use more complex models. If you are using linear models, try more complex models like Random Forests or Support Vector Machines, or neural networks if you are dealing with unstructured data (images, text, sound).
* Add more training data and use good features. Good features have high predictive power.
* Reduce the regularization.
* If you're using neural networks, increase the number of epochs/training iterations. If the epochs are very low, the model may not be able to learn the underlying rules in data and so it will not perform well.

### Overfitting (High Variance)
Overfitting is the reverse of underfitting. An overfitted model does well on the training data but performs poorly when given new data (data that the model never saw).

Overfitting is caused by using a model that is too complex for the dataset, or by having too few training examples.

Here are techniques to handle overfitting:

* Try simpler models or simplify the current model. Some machine learning algorithms can be simplified. For example, in neural networks you can reduce the number of layers or neurons. In classical algorithms like Support Vector Machines, you can try different kernels; a linear kernel is simpler than a polynomial kernel.
* Find more training data.
* Stop the training early (a.k.a. early stopping).
* Use regularization techniques such as dropout (in neural networks).

To summarize, it is very important to understand why the model is not doing well. If the model performs poorly on the data it was trained on, you know it is underfitting and you know what to do about it. If it performs well on the training data but poorly on new data, it is overfitting.
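
One way to diagnose the two failure modes is to compare the error on the training data with the error on data the model has not seen. The sketch below does this with three hand-rolled stand-ins on made-up numbers: a model that always predicts the mean (too simple), a model that memorizes the training points (too complex), and a straight-line fit. None of these is a library implementation.

```python
def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up data: training targets follow y = 2x with noise of +/-1,
# validation targets lie exactly on y = 2x at unseen x values.
train_x = [0, 1, 2, 3, 4, 5, 6, 7]
train_y = [1, 1, 5, 5, 9, 9, 13, 13]
val_x = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
val_y = [1, 3, 5, 7, 9, 11, 13]

def fit_mean(xs, ys):
    """Too simple: ignore x and always predict the average target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_memorizer(xs, ys):
    """Too complex: return the target of the nearest training point, noise included."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict

def fit_line(xs, ys):
    """Least-squares straight line through the training points."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

results = {}
for name, fit in [("mean", fit_mean), ("memorizer", fit_memorizer), ("line", fit_line)]:
    model = fit(train_x, train_y)
    results[name] = (
        mse(train_y, [model(x) for x in train_x]),  # training error
        mse(val_y, [model(x) for x in val_x]),      # error on unseen data
    )
    print(f"{name:10s} train MSE={results[name][0]:.3f}  val MSE={results[name][1]:.3f}")
```

The mean model has a high error on both sets (underfitting), while the memorizer scores a perfect zero on the training set but does worse on unseen points than the straight line (overfitting).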

Apart from improving or expanding the training data, you often have to tune hyperparameters to get a model that generalizes well. While there are techniques that simplify hyperparameter search (like [Grid search](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html#sklearn.model_selection.GridSearchCV), [Random search](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html), and Keras Tuner), it is important to understand the hyperparameters of the model you are using so that you know their proper search space.


<a name='5'></a>
# 5. Building a Linear Regression Model

![](https://i.imgur.com/1EzyZvj.png)

##### <a name='regression'></a>
In the previous parts, we saw that regression is a supervised learning type in which we predict a continuous value. For example, given information about a person's age, body mass index, smoking status, and region of residence, can you predict the medical expenses the person is likely to incur?

This tutorial takes a practical and coding-focused approach. We'll work through a typical machine learning regression problem step by step, and to achieve that, we will use Scikit-Learn.


As much as we can, we will try to structure all machine learning projects in accordance with the standard ML workflow. Here are the typical steps that you will see in most ML projects:

* [5.1 Problem Statement](#7)
* [5.2 Loading the data](#8)
* [5.3 Exploratory Data Analysis](#9)
* [5.4 Feature Engineering](#10)
* [5.5 Correlation Analysis](#11)
* [5.6 Data Processing](#12)
* [5.7 Understanding Linear Regression](#13)
* [5.8 Loss Function](#14)
* [5.9 Optimizer (Minimizing The Loss Function)](#15)
* [5.10 Choosing and Training a model](#16)
* [5.11 Model Evaluation](#17)
* [5.12 Improving the Model](#18)

## 5.1 Problem Statement

> ACME Insurance Inc. offers affordable health insurance to thousands of customers all over the United States. As the lead data scientist at ACME, *you're tasked with creating an automated system to estimate the annual medical expenditure for new customers*, using information such as their age, sex, BMI, children, smoking habits and region of residence. 
>
> Estimates from your system will be used to determine the annual insurance premium (amount paid every month) offered to the customer. Due to regulatory requirements, you must be able to explain why your system outputs a certain prediction.
> 
> You're given a CSV file containing verified historical data, consisting of the aforementioned information and the actual medical charges incurred by over 1300 customers. 
> <img src="https://i.imgur.com/87Uw0aG.png" width="480">
>
Dataset source: [Github](https://raw.githubusercontent.com/JovianML/opendatasets/master/data/medical-charges.csv)

<a name='8'></a>
## 5.2 Loading the Data


We will collect the data from the internet. Before loading it, let's import all the relevant libraries that we will need.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import plotly.express as px
import matplotlib.pyplot as plt
```