### Why machine learning?

- Consistency: given lots of data, a model is a reliable decision maker and makes consistent predictions, unlike humans
- Ability to scale: if we were to build a recommender system, segment customers, or perform linear regression on many data points by hand, it would take a lot of manpower to do so
- Efficiency


We need a thorough end-to-end process. We often reuse the same models for different data, so the thing that changes most is:
- Delivery of the ML solution
    - think PowerPoint decks at one end
    - think APIs and live data at the other

Remember, we need to create ***business value***

##### Real world ML examples

- Spam filters
- Costco predicts demand with regression techniques
- TikTok and Facebook recommendation systems




#### When to apply ML?
- Many organisations won't need it. Before ML, they need to work through these steps:
    - Collect data: we need data before we can even define KPIs
        - we can do this with logging and basic modelling
    - Clean data: build ETL pipelines to transform raw data into readable tables
        - often with data engineers (DE) or analytics engineers
    - Define KPIs and track them
    - Analyse KPIs and gain insights, e.g. through EDA
        - once we have directional insights, we can consider ML
    - Finally, improve and optimise predictions with ML

<img src='analytical_hierarchy_of_needs.png' alt='image' width='600' height='400'>


- Since ML comes so late in the analytical hierarchy of needs, when do we actually need it?
    - Lots of data (generally more than 50k rows)
    - A time-consuming problem
    - A problem that is expensive for humans to solve
    - Generally speaking, start with a good rule of thumb, which will capture most of the signal; if you need to keep adding complexity to the rule to gain that last edge, ML is a fantastic idea.

#### When to NOT use ML
- Lack of data
    - How much is enough depends on the problem and on how much the solution needs to scale in production
- Time constraints
- You have not tried a simple solution yet
    - Ockham's razor: prefer a simple solution over a complex one if they lead to similar value, because a complex solution has more places where it can go wrong


https://eugeneyan.com/writing/first-rule-of-ml/

##### ***Summary of above***

The first rule of ML is to start without ML
- Machine learning is great, but it requires data. Often basic rules and heuristics will do
- By trying to solve the problem without ML, you naturally get a better understanding of the data and of the domain
- What does solving the problem manually mean?
    - instead of using a k-means algorithm to group the data, why not try a simple if-else statement?
    - use basic heuristics or rules to predict outcomes!
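As a minimal sketch of the if-else approach, the snippet below segments customers with hand-written rules instead of a clustering model. The customer records, the field names, and the thresholds are all hypothetical, chosen only to illustrate the idea; in practice you would pick thresholds from EDA and domain knowledge.

```python
# Hypothetical customer records: (customer_id, orders_last_90_days, total_spend_dollars)
customers = [
    ("c1", 12, 640.0),
    ("c2", 1, 25.0),
    ("c3", 5, 210.0),
    ("c4", 0, 0.0),
]


def segment(orders: int, spend: float) -> str:
    """Assign a segment with plain if-else rules instead of k-means.

    The thresholds are illustrative assumptions, not tuned values.
    """
    if orders == 0:
        return "inactive"
    if orders >= 10 or spend >= 500:
        return "high_value"
    if orders >= 3:
        return "regular"
    return "occasional"


segments = {cid: segment(orders, spend) for cid, orders, spend in customers}
print(segments)
```

If a rule like this already separates customers in a way the business can act on, a clustering model may not be worth its extra complexity.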

**What should we use instead of ML then?**
Before building an ML model, we do an EDA to find correlations or relationships in the data, using plots and summary statistics:
- e.g. scatter plots, pandas' `.describe()`, etc.
- Use **simple correlations** to figure out relationships
    - select a subset of features with the strongest relationships to visualise

#### Real life ML process?

<img src='ml%20process%201.png' alt='image' width='600' height='400'>
<p>
<img src='ml%20process%202.png' alt='image' width='600' height='400'>
</p>



Summary of above:
- You get an email from an executive who believes they can improve some KPI with ML
- You have a healthy level of skepticism, so you decide to liaise with the product manager to ask more questions
    - What are you doing currently?
    - How are the KPIs at the moment?
    - Is there anything glaring that you see at the moment?
- You look at the data and do an EDA to see what's going on in the data
    - correlation matrices, relationships in the variables
- You set up a project plan
- You spend about a month building models and testing them to find the winning model
- You work with an MLE (machine learning engineer) to create an API that serves predictions
- You re-evaluate performance with an A/B test

<img src='supervised%20and%20unsupervised%20learning.png' alt='image' width='600' height='400'>

#### Regression and classification models

Regression models:
- aim to predict continuous variables
    - e.g. Customer lifetime value
    - forecasting revenue of a company
    - Sales
    - Inventory levels
- real world examples:
    - ETAs for Uber rides or Google Maps routes
    - predicting the value of homes on Airbnb
    - predicting the value of ad requests on Twitter

Classification models:
- aim to predict a **categorical** value
    - whether or not a person churns
    - whether or not a user is retained
    - email outcomes: sent / opened / clicked through
- real world examples:
    - predicting advertiser churn on Google AdWords
    

### **ML for product analytics**

Typical process
1. Data preprocessing
    - null/missing values, duplicate values
    - outliers
2. EDA
3. Feature engineering
4. Modeling
5. Hypothesis/insights generation
    - to test a hypothesis we often need to run an A/B test, which tells us whether there is a statistically significant difference between the variants
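One common way to check an A/B test on conversion rates is a two-proportion z-test. The sketch below implements it with the standard library only; the function name and the experiment numbers (10,000 users per variant, 500 vs 580 conversions) are hypothetical. It assumes independent users and large enough samples for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rate between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis (no difference between variants)
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: control (A) vs a variant driven by the new insight (B)
z, p_value = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=580, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
print("significant at 5%" if p_value < 0.05 else "not significant at 5%")
```

The 5% significance level is a conventional choice, not a rule; decide on it (and on the sample size) before the test starts rather than after looking at the results.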

We use ML for product analytics to find out whether one option worked better than another.

The goal is to land a positive impact on a product by influencing a product decision. Working backwards from this goal:
- What can we actually change in the product?
    - what can we choose and how can we decide which option to choose?
    - Should we even decide to change this thing about a product?
- What information is relevant to make that decision?
    - what isn't relevant to the decision?
- Then, follow the typical process above

### **ML for data products**

ML as a data product needs to be accurate, because it is used to forecast future events. Examples include:
- recommender systems
- Clustering algorithms
- forecasting models

Typical process:
1. Problem framing
2. Data collection
3. Data preprocessing
4. EDA
5. Feature engineering
6. Cross-validation
7. Model training
8. Model evaluation
9. Productionisation
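The cross-validation step can be illustrated with a hand-rolled k-fold splitter: every row lands in exactly one validation fold, and the model is trained on the remaining rows each time. This is a minimal sketch with a hypothetical helper name; in practice scikit-learn's `KFold` provides the same idea.

```python
import random


def k_fold_indices(n_rows: int, k: int = 5, seed: int = 42):
    """Yield (train_idx, validation_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_rows))
    # Shuffle once so folds are random but reproducible for a given seed
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size


for train_idx, val_idx in k_fold_indices(n_rows=10, k=3):
    print(len(train_idx), len(val_idx))
```

For forecasting models, shuffling would leak future information into training; use time-ordered splits (train on the past, validate on the period that follows) instead.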

The subsections below explain each step in the process.

#### 1. Problem framing

Defining the problem:
- There are 2 types of problem statements: vague and defined
    - our goal is to turn a vague problem into a defined problem
    - Vague: "How do we drive more revenue?" or "How do we improve our stock?"
    - Defined: "A bug is causing an error in our pipeline; how do we fix it?" or "We need to order more xxx because of yyy."


Understanding the problem:
- How do we achieve the goal above? By asking questions:

1. Who is the end user?
2. What is the problem we are trying to solve?
3. What impact would you like with this project?
4. How urgent is this project?
5. What are the scale, constraints, and dependencies?

How to answer the above questions?
- Interviewing subject matter experts
- Reading documents
- Light exploratory analysis

What is a quality problem statement?
- Needs to be **quantitative, specific, user-focused, and have constraints**

e.g.

<img src='problem statement.png' alt='image' width='600' height='300'>

#### Calculating business value of the project

By analysing the value of the project, we can determine how accurate a model needs to be and what the business goal is.

Generally speaking, PROFIT is the main goal

profit = revenue - cost
(Mature companies go for profit, startups go for revenue growth)


**So how do we calculate the value the model will drive?**

For example:
- On a website, an item costs $10, and on average 1,000 items are sold per day
- The new model is expected to improve product recommendation accuracy from 50% to 60%
    - this is a (60 - 50) / 50 = 20% relative increase in accuracy. Assuming items sold scale proportionally with accuracy, that means about 200 more items sold per day, or roughly $2,000 more revenue per day
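The value calculation can be written out as a few lines of arithmetic. Note that the relative lift is computed against the baseline accuracy (50%), which gives 20%. The key assumption, flagged in the comments, is that items sold scale proportionally with recommendation accuracy. The daily model cost is a hypothetical figure not given in the example, included only to show profit = revenue - cost.

```python
# Figures from the example
price_per_item = 10          # dollars
items_per_day = 1000
old_accuracy_pct, new_accuracy_pct = 50, 60

# Relative lift is measured against the old (baseline) accuracy, not the new one
relative_lift = (new_accuracy_pct - old_accuracy_pct) / old_accuracy_pct  # 0.2, i.e. 20%

# Assumption: items sold scale proportionally with recommendation accuracy
extra_items_per_day = items_per_day * relative_lift            # about 200 items
extra_revenue_per_day = extra_items_per_day * price_per_item   # about $2,000

# Hypothetical daily cost of running the model (not from the example): profit = revenue - cost
model_cost_per_day = 500
extra_profit_per_day = extra_revenue_per_day - model_cost_per_day  # about $1,500
print(relative_lift, extra_items_per_day, extra_revenue_per_day, extra_profit_per_day)
```

Running the same calculation backwards (what accuracy is needed for the extra profit to cover the cost?) is one way to decide how accurate the model needs to be.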


#### Mapping solutions

Once we have defined the problem and validated the business value:

- Come up with 3-4 candidate solutions and weigh their pros and cons

1. What type of algorithms to use to get the best outcomes?
2. Is this a supervised or unsupervised approach?
3. Should we be using classification or regression to get the best results?
4. What are the pros and cons of each solution?
5. What is the ratio of effort to impact?

Side note: don't spend too long mapping solutions and weighing their pros and cons; once you try one, it will help you decide which solution to pursue!