# Intro to Machine Learning.

- Analytics = AI + ML + DL + tools used for creating value from data.
- AI = Systems and algos that exhibit human-like intelligence.
- ML = Subset of AI that uses statistical algos to extract intelligence from big data.

- ML: gives computers the ability to learn without being explicitly programmed.
- model = simplified representation of reality created to serve some purpose.
- A prediction model is a formula for estimating the unknown value of interest: **the target**.
- In data science, prediction more generally means to estimate an unknown value.
- Because data mining techniques involve collecting huge amounts of historical data, models are very often built and tested using events from the past.

<pre>

                               transduction
                        data -----------------> prediction
                        |                           ^    
                        |                           |
            induction   |                           |
                        |                           | deduction
                        |                           |
                        V                           |
                        model ----------------------|

</pre>

## Intro:
- AI is a discipline
- ML is a subfield of AI
- DL is a subfield of ML.

## Classification of ML algorithms:
- Supervised
- Unsupervised.

- if y = x -> Unsupervised learning
- if y = {0, 1} -> Supervised **binary classification**
- if y = {0, 1, ...} -> Supervised **multiclass classification**
- if y = (-inf, inf), i.e. any real number -> Supervised **regression**

## Supervised ML algorithms:
- require knowledge of both the outcome variable (dependent variable) and the features (independent variables).
- usually a **loss function** is required.
- eg: linear regression and logistic regression.
- ps: despite its name, logistic regression is a classification algorithm.

- the number of labelled data points (values of y) is very important.
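
As a rough illustration of the two examples named above, the sketch below fits a linear regression to a continuous target and a logistic regression to a binary target. The data values are made up for this example, and the scikit-learn estimators are used with their default settings; this is a minimal sketch, not a tuned model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy feature matrix: one feature, six observations (made-up values).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Regression: the outcome variable is a continuous number.
y_continuous = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2])
reg = LinearRegression().fit(X, y_continuous)
reg_pred = reg.predict(np.array([[7.0]]))

# Binary classification: the outcome variable is a label in {0, 1}.
y_binary = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_binary)
clf_pred = clf.predict(np.array([[0.5], [6.5]]))
clf_proba = clf.predict_proba(np.array([[6.5]]))  # class probabilities

print("regression prediction for x=7:", reg_pred[0])
print("class predictions for x=0.5 and x=6.5:", clf_pred)
```

Both models need `y` during fitting, which is what makes them supervised.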

## Unsupervised ML algorithms:
- No knowledge of outcome variable is given to the algorithms.
- Algorithms must find the possible values of the outcome variable.
- Examples: clustering, principal component analysis, etc.

- Principal component analysis (PCA) helps to reduce the number of features.
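
A minimal sketch of feature reduction with PCA, assuming scikit-learn is available. The data is synthetic: the third feature is deliberately built as a near-copy of the sum of the first two, so two components should capture almost all of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Synthetic data: 100 observations, 3 features.
# The third feature is almost a linear combination of the first two.
base = rng.normal(size=(100, 2))
third = base[:, 0] + base[:, 1] + rng.normal(scale=0.01, size=100)
X = np.column_stack([base, third])

# Keep 2 components instead of the original 3 features.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

# Fraction of the total variance retained by the 2 components.
explained = pca.explained_variance_ratio_.sum()
print(X.shape, "->", X_reduced.shape, "variance kept:", explained)
```

No outcome variable is passed to `fit_transform`, which is why PCA counts as unsupervised.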

# ML algorithms:
- For supervised learning:
<pre>
input data -> Model < ----------------
                |                     | Model update
                V                     |
            predict output -------> Error (Loss function)
                |             ^
        Compare |             |
                V             |
            Expected output --|
</pre>
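
The loop in the diagram can be sketched as plain gradient descent on a one-feature linear model. Everything here is an assumption made for illustration: the synthetic data, the mean squared error loss, the learning rate and the number of iterations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic input data and expected output (true relation: y = 3x + 0.5 plus noise).
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)

# Model parameters, starting from zero.
w, b = 0.0, 0.0
learning_rate = 0.1
losses = []

for epoch in range(500):
    y_pred = w * x + b                  # predict output
    error = y_pred - y                  # compare with expected output
    loss = float(np.mean(error ** 2))   # loss function (mean squared error)
    losses.append(loss)

    # Model update: move the parameters against the gradient of the loss.
    grad_w = 2.0 * np.mean(error * x)
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print("w:", w, "b:", b, "final loss:", losses[-1])
```

Each pass through the loop is one trip around the diagram: predict, compare, compute the error, update the model.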

- For unsupervised learning:
<pre>

    input data -> model -> generated example
</pre>



## Why ML?
- It helps in understanding the association between key performance indicators (KPIs).
- It helps in identifying the factors that have a significant impact on the KPIs, for effective management.

## Steps in ML:
1. Identify the problem or opportunity for value creation.
2. Identify sources of data and create a data lake.
3. Pre-process the data for issues such as missing and incorrect data.
4. Generate derived variables and transform the data if necessary.
5. Divide the dataset into training and validation subsets.
6. Build ML models to identify the best model(s) using model performance in validation data.
7. Implement Solution/Decision/Develop Product.

- There are two phases: training first, then validation.
- In training, we fit the model on the largest section of the available data.
- In validation, we evaluate the trained model on a different section of the data, distinct from the training data.
- The purpose of validation is to check that training has happened properly, i.e. that the model also performs well on data it has not seen.
- Instructor gives the following example:
    - Training is like securing marks in internal exam.
    - Validation is like securing marks in final exam.
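
A small sketch of the training and validation phases, assuming scikit-learn's `train_test_split` and an 80/20 split. The split ratio, the synthetic data and the choice of linear regression are all assumptions for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic dataset: one feature, target roughly 2x + 1 with noise.
X = rng.uniform(0.0, 10.0, size=(200, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(scale=0.5, size=200)

# Keep the larger section for training and hold out the rest for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Training phase: fit on the training section only.
model = LinearRegression().fit(X_train, y_train)

# Validation phase: score on data the model has not seen (R^2).
train_r2 = model.score(X_train, y_train)
valid_r2 = model.score(X_valid, y_valid)
print("train R^2:", train_r2, "validation R^2:", valid_r2)
```

A validation score far below the training score would suggest the model has memorized the training data rather than learned the underlying relation.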



- The main goal is to minimize the loss function.
- Instructor draws a parabola, x = complexity of the ML model, y = loss function.
- This curve is called "loss function vs complexity of ML model".
- The left-hand side of the parabola is **low variance and high bias**.
- The right-hand side of the parabola is **low bias and high variance**.
- The global minimum of the curve is **moderate bias and moderate variance**.
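
One way to see this curve is to fit polynomials of increasing degree and compare training and validation error. This is a hedged sketch: the sine-shaped synthetic data, the noise level and the degrees 1, 3 and 12 are choices made for this example, and the exact numbers depend on the random draw.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    """Draw n noisy samples of y = sin(pi * x) on [-1, 1]."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = np.sin(np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

x_train, y_train = make_data(30)
x_valid, y_valid = make_data(200)

# degree -> (training error, validation error)
results = {}
for degree in (1, 3, 12):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_valid, y_valid))

for degree, (train_err, valid_err) in results.items():
    print(f"degree {degree:2d}: train MSE {train_err:.4f}, validation MSE {valid_err:.4f}")
```

Training error can only go down as the degree grows, while validation error is typically lowest at a moderate degree. The straight line (degree 1) sits on the high-bias side; a very high degree tends toward the high-variance side.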

## Unsupervised machine learning algorithms:
- Objective is to generate labels.
- The key question is how many groups are required so that the data forms clusters which can be labelled.

### K-means clustering algorithm:

- Using distance measures such as Euclidean distance in clustering
- Learn to build clusters using sklearn library in python.
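
A minimal sketch of building clusters with `sklearn.cluster.KMeans`. The (age, income) points are synthetic and the three group centres are invented for this example. Standardizing the features first is an assumption added here, so that income (large numbers) does not dominate the Euclidean distance over age.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Three synthetic customer groups, 50 customers each: columns are (age, income).
centers = np.array([[25.0, 20000.0], [40.0, 50000.0], [60.0, 30000.0]])
points = np.vstack([
    center + rng.normal(scale=[2.0, 2000.0], size=(50, 2)) for center in centers
])

# Put both features on a comparable scale before measuring distances.
scaled = StandardScaler().fit_transform(points)

# k = 3 is chosen by hand here; n_init restarts guard against a poor start.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

print("cluster sizes:", np.bincount(labels))
print("cluster centres (scaled units):")
print(kmeans.cluster_centers_)
```

The returned `labels` are the generated labels mentioned above: one cluster id per observation.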

## Introduction to unsupervised learning (ML)
- Training data = X = {x1, x2, ..., xn}, X ⊂ R<sup>d</sup> (n observations, each with d features)
- Clustering / segmentation: 
    - f : R<sup>d</sup> ---> {C1, ..., Ck} (set of clusters).

## Introduction to clustering
- Clustering is a divide-and-conquer strategy that divides the dataset into homogeneous groups, which can then be used to prescribe the right strategy for each group.
- In clustering, **the objective** is to ensure that the **variation within a cluster is minimized while the variation between clusters is maximized**.
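
The objective can be made concrete by measuring variation as sums of squared distances. In the sketch below the points and their cluster labels are made up. "Within" is measured from each point to its cluster centroid, and "between" from each centroid to the overall mean, weighted by cluster size.

```python
import numpy as np

# Six made-up 2-D points, already assigned to two clusters.
points = np.array([
    [1.0, 1.0], [1.0, 2.0], [2.0, 1.0],
    [8.0, 8.0], [9.0, 8.0], [8.0, 9.0],
])
labels = np.array([0, 0, 0, 1, 1, 1])

overall_mean = points.mean(axis=0)
within_ss = 0.0
between_ss = 0.0

for cluster_id in np.unique(labels):
    members = points[labels == cluster_id]
    centroid = members.mean(axis=0)
    # Variation within the cluster: squared distances to its own centroid.
    within_ss += float(((members - centroid) ** 2).sum())
    # Variation between clusters: centroid vs overall mean, weighted by size.
    between_ss += len(members) * float(((centroid - overall_mean) ** 2).sum())

total_ss = float(((points - overall_mean) ** 2).sum())
print("within:", within_ss, "between:", between_ss, "total:", total_ss)
```

Because the total variation is fixed for a given dataset and equals within plus between, minimizing the within-cluster variation and maximizing the between-cluster variation are two views of the same goal.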

## Case study: Do clustering operation on customer data.
- Establish a relationship between age and salary with k-means clustering.

- Loading data:
```python
import pandas as pd

# Load the customer data and preview the first five rows
customers_df = pd.read_csv("customers.csv")
customers_df.head(5)
```

- Consider grouping as per their income:
    - Low income with low age
    - Medium income with medium age
    - High income with high age etc...
- For this problem statement there can be 4 possibilities for (age,income) pairings:
    - LL, LH, HL, HH

- Visualizing the relationship:
```python
# Visualize the data before clustering
import matplotlib.pyplot as plt
import seaborn as sn

# Plot income against age (lmplot also overlays a fitted regression line by default)
sn.lmplot(data=customers_df, x='age', y='income')
plt.title("Fig 1: Customer segments based on income and age")
```

## Euclidean distance
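- D(X1, X2) = sqrt( Σ<sub>i</sub> (X<sub>i1</sub> - X<sub>i2</sub>)<sup>2</sup> ), where X<sub>i1</sub> and X<sub>i2</sub> are the values of feature i for observations X1 and X2, and the sum runs over all features.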
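
The formula translates directly into NumPy. The two observations below are made-up values chosen so the result is easy to check by hand.

```python
import numpy as np

# Two made-up observations with two features each.
x1 = np.array([25.0, 3.0])
x2 = np.array([28.0, 7.0])

# Square the feature-wise differences, sum them, take the square root.
distance = float(np.sqrt(np.sum((x1 - x2) ** 2)))

# np.linalg.norm of the difference computes the same quantity.
distance_via_norm = float(np.linalg.norm(x1 - x2))

print(distance, distance_via_norm)
```

Here the differences are 3 and 4, so the distance is sqrt(9 + 16) = 5.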


## Other methods of distance measurement:
- Minkowski distance
- Jaccard similarity coefficient
- cosine similarity
- Gower's similarity coefficient
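
Three of these measures are short enough to sketch with NumPy and Python sets. The function names and sample values are illustrative only. Gower's coefficient is left out because it needs per-feature handling of mixed data types.

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def jaccard_similarity(set_a, set_b):
    """Size of the intersection divided by size of the union."""
    union = set_a | set_b
    if not union:
        return 1.0  # assumption: two empty sets are treated as identical
    return len(set_a & set_b) / len(union)

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

manhattan = minkowski_distance(a, b, p=1)
euclidean = minkowski_distance(a, b, p=2)
cosine_same_direction = cosine_similarity(a, 2.0 * a)
jaccard = jaccard_similarity({1, 2, 3}, {2, 3, 4})

print(manhattan, euclidean, cosine_same_direction, jaccard)
```

Note that the first is a distance (smaller means more alike) while the other two are similarities (larger means more alike).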

## Procedure of k-means clustering (all of this happens internally):
1. Decide the value of k.
2. Choose k observations from the data that are likely to be in different clusters, e.g. the observations that are farthest from each other.
3. ...
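
Steps 1 and 2 can be sketched as a "farthest-first" selection of starting observations. This is only one possible reading of "observations that are farthest", and the function name, the starting index and the sample points are hypothetical. The remaining steps are not covered here.

```python
import numpy as np

def farthest_first_seeds(points, k, first_index=0):
    """Pick k seed observations, each as far as possible from those already chosen."""
    chosen = [first_index]
    # Distance from every point to its nearest chosen seed so far.
    nearest = np.linalg.norm(points - points[first_index], axis=1)
    while len(chosen) < k:
        # The next seed is the point farthest from all current seeds.
        next_index = int(np.argmax(nearest))
        chosen.append(next_index)
        dist_to_new = np.linalg.norm(points - points[next_index], axis=1)
        nearest = np.minimum(nearest, dist_to_new)
    return points[chosen]

# Step 1: decide the value of k (chosen by hand here).
k = 3

# Made-up observations forming three loose groups.
points = np.array([
    [0.0, 0.0], [0.1, 0.0],
    [5.0, 5.0], [5.1, 5.0],
    [10.0, 0.0],
])

# Step 2: choose k observations that are far apart.
seeds = farthest_first_seeds(points, k)
print(seeds)
```

Library implementations may initialize differently; for instance, scikit-learn's `KMeans` defaults to the `k-means++` scheme rather than a strict farthest-point rule.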