## What is Machine Learning (ML)?
As humans, we are able to learn and get better at performing tasks. We're born knowing almost nothing and able to do almost nothing for ourselves. But we learn, we try, we improve and optimize, and with experience we get better. Computers can do the same, and that's what machine learning is about.

Machine learning is a very powerful tool, and it's used in many different fields: in medicine, to help doctors diagnose patients; in finance, to help banks detect fraud; in retail, to help companies recommend products to customers; in manufacturing, to help companies optimize their production lines; in transportation, to help companies optimize their routes; and in energy, agriculture, entertainment, and the list goes on.

`<Data Science?>`

Machine learning is the science of getting computers to act without being explicitly programmed to do so. It brings together the fields of statistics, algebra, and computer science to build algorithms that can learn from data.

> A computer program is said to learn from Experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.
> 
> _Tom M. Mitchell_

- **Experience E**: the data we feed to the algorithm in conjunction with the algorithm's actions on that data
- **Tasks T**: the tasks the algorithm is supposed to perform.
  - OCR, image classification, spam detection, Medical Diagnosis by analyzing patient data and finding patterns, ...etc.
- **Performance measure P**: the measure of the algorithm's performance and accuracy.
  - Accuracy, Precision, Recall, F1-Score, ...etc.

If we take filtering spam emails as an example, we can define the following:
- **Tasks T**: filtering spam emails.
- **Experience E**: the emails we feed to the algorithm and whether users have flagged them as spam, in conjunction with the algorithm's actions on that data.
- **Performance measure P**: the fraction of emails that are correctly classified as spam or not spam.
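
To make the performance measure P concrete, here is a minimal sketch that scores a spam filter's output against user flags. The two lists are made-up illustration data, and the function names are hypothetical; the formulas are the standard definitions of accuracy, precision, recall, and F1-score.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives, where True means 'spam'."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)          # spam, flagged as spam
    fp = sum(p and not a for p, a in pairs)      # not spam, flagged as spam
    fn = sum(not p and a for p, a in pairs)      # spam, missed by the filter
    tn = sum(not p and not a for p, a in pairs)  # not spam, left alone
    return tp, fp, fn, tn


def performance(predicted, actual):
    """Compute common performance measures from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(predicted, actual)
    accuracy = (tp + tn) / len(actual)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical data: what users flagged (part of E) and what the filter decided (T).
user_flags = [True, True, True, False, False, False, False, False]
filter_output = [True, True, False, True, False, False, False, False]

scores = performance(filter_output, user_flags)
print(scores)
```

Accuracy here is the measure P from the list above; precision, recall, and F1 look at the same predictions from the point of view of the spam class only.
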

Another great example is building a self-driving car. We don't need to program it with rules like "if you see a car, don't crash into it". Without machine learning, even that one rule would be a lot more complex than it sounds, because you would need to describe programmatically what "another car" is, in all its possible shapes, sizes, and colors. Then you would need to program it to slow down if it sees a car and to stop if it sees a pedestrian; to stop at a red light and to move on green, unless there's a car or a pedestrian in the way. And what is a pedestrian? You would need to define what a pedestrian looks like programmatically too... `<fade>` And that's just one example of a single task.
What machine learning allows us to do is give the program a lot of data about how to drive. We install a lot of cameras, sensors, and actuators on a car and have someone drive it; the longer they drive and the more scenarios they get into and collect data about, the better the algorithm will be. Once we have all of this data, the algorithm learns from it, and eventually it is able to drive the car itself.

So when do we use Machine Learning? Let's talk about that in the next video.

## Why use Machine Learning?
> If the only tool you have is a hammer, you tend to see every problem as a nail.
>
> _The Law of the Instrument, Abraham Maslow_

Obviously, not every problem is a nail, so we need to know when it's appropriate to use machine learning.

Two factors guide the decision:
- The problem's complexity
- The need for adaptivity

### Tasks that are too complex to program
* **Tasks performed by animals or humans**: There are a lot of things that living beings do routinely, yet our introspection into how we do them is not elaborate enough to extract well-defined steps that we could program. Examples include speech recognition, understanding images, objects, and shapes, driving cars, and playing chess. Much like how we humans learn these things, given the right mindset (algorithm), the more data the program is exposed to, the better it gets at doing them.
* **Tasks beyond human capability**: Things like analyzing very large and very complex data sets, finding patterns in millions of records, predicting the weather, or learning from customers' purchasing patterns around different events, holidays, and seasons. We have a lot of data, but without tools like machine learning we wouldn't be able to extract the value from it.

### Tasks that require adaptivity
Programmed solutions are rigid: once the program is written and installed, it doesn't change unless someone rewrites it. Machine learning algorithms are flexible; they can change and adapt to new data and new experiences.

Take a program that needs to adapt to the changing nature of spam emails. If spammers realize that your program is filtering out any email that contains "4 U", they will start sending emails with "For U" instead, and so on.

The same goes for speech recognition: how can you program something to adapt to different accents and dialects, and even to the connotations of the language?
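
The spam scenario above can be sketched in a few lines. This is only an illustration under stated assumptions: the emails are made up, the function names are hypothetical, and the "learned" filter is a deliberately naive word-count scorer rather than any particular production algorithm. The point is that the rule-based filter needs a code change to catch "For U", while the learned one only needs to be retrained on newly flagged emails.

```python
from collections import Counter


def rule_based_is_spam(text):
    """Rigid, hand-written rule: flags only the exact pattern it was written for."""
    return "4 u" in text.lower()


def train(flagged_emails):
    """Count how many spam and non-spam emails each word appears in."""
    spam_words, ham_words = Counter(), Counter()
    for text, is_spam in flagged_emails:
        words = set(text.lower().split())
        (spam_words if is_spam else ham_words).update(words)
    return spam_words, ham_words


def learned_is_spam(text, model):
    """Flag the email if its words were seen more often in spam than in non-spam."""
    spam_words, ham_words = model
    words = set(text.lower().split())
    spam_score = sum(spam_words[w] for w in words)
    ham_score = sum(ham_words[w] for w in words)
    return spam_score > ham_score


# Hypothetical emails with user flags (True means the user marked it as spam).
old_emails = [
    ("Cheap pills 4 U", True),
    ("Free prize 4 U", True),
    ("Can u send the notes for the team", False),
    ("See u at lunch", False),
]
# Spammers change their wording; users flag the new variants.
new_emails = [
    ("Cheap pills For U", True),
    ("Free prize For U", True),
]

message = "Win cash For U"
model_before = train(old_emails)
model_after = train(old_emails + new_emails)

print("rule-based:", rule_based_is_spam(message))
print("learned, before retraining:", learned_is_spam(message, model_before))
print("learned, after retraining:", learned_is_spam(message, model_after))
```

Nothing in `learned_is_spam` was edited between the two calls; only the data it was trained on changed.
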

All of this needs a human-like or near-human intelligence, and that's what machine learning is about.

Within that context, you can see a lot of applications of machine learning.

## Applications of ML
- Analyzing images of products on a production line to detect defects and classify them automatically.
- Detecting tumors in brain scans or body scans. Due to the complex nature of the data, this typically uses a more advanced class of models called Convolutional Neural Networks (CNNs).
- Creating a recommendation system for a website, to recommend products to users based on their previous purchases.
- Segmenting customers based on their purchasing patterns, to create targeted marketing campaigns (clustering).
- Representing high dimensional data in a lower dimensional space, to visualize it and understand it better (dimensionality reduction).
- Forecasting and optimizing a company's revenue, based on historical data (time series analysis).

The list goes on and on, and it will keep growing as we keep discovering new applications for machine learning.
Because there are far too many applications of machine learning and far too many algorithms to cover in this course, we will focus only on the most important ones, the ones most commonly used in industry. Before that, however, it is important to introduce some classifications.

## Types of ML Systems
Learning, even for humans, is a very wide domain with different types and styles, and the same goes for machine learning. It has also branched out into many different subfields and subdomains. While we will not cover all of them, we will cover some of the most commonly used ones. But it is important to get a top-level view of the different types of machine learning systems first.

There are different ways to classify machine learning systems, or to distinguish between them, like:
- Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and reinforcement learning).
- Whether or not they can learn incrementally on the fly (online versus batch learning).
- Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like humans do (instance-based versus model-based learning).
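
As a rough illustration of the online versus batch distinction, the sketch below uses about the simplest "model" possible, an estimate of the mean of some measurements. The numbers and the class name `OnlineMean` are made up for this example. The batch version needs the whole dataset at once; the online version updates its estimate one observation at a time and never stores the history.

```python
def batch_mean(values):
    """Batch learning: compute the estimate from the full dataset in one go."""
    return sum(values) / len(values)


class OnlineMean:
    """Online learning: update the estimate incrementally, one observation at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def learn_one(self, value):
        self.count += 1
        # Move the current estimate toward the new value by a shrinking step.
        self.mean += (value - self.mean) / self.count
        return self.mean


stream = [3.0, 5.0, 4.0, 8.0]  # hypothetical measurements arriving over time

online = OnlineMean()
for value in stream:
    online.learn_one(value)

print("batch estimate: ", batch_mean(stream))
print("online estimate:", online.mean)
```

Both approaches end at the same estimate here; the difference is that the online learner had a usable estimate after every single observation.
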

I will only talk about the first one; the other two may be covered as they arise in the rest of the course.

**Whether or not the system is trained with human supervision:**
  - **Supervised Learning**
    - In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels. In other words, you provide a lot of data that comes with the correct answers, and the algorithm learns from it the patterns and rules that allow it to predict the correct answer for new, unlabeled data.
    - If you're building a classifier to differentiate between cats, rabbits, dogs, and tigers, you need to collect a lot of examples of each and label them. You then feed that data to the algorithm, and it learns from it to classify new images of cats, rabbits, dogs, and tigers.
    - These labels you provide to the algorithm are called **_supervision signals_**.
    - It's not just for classification. It can also be used for regression, where you want to predict a value, like the price of a house, given a set of features about that house, like the number of rooms, the location, and the size. You collect a lot of data about houses, label each one with its price, and feed that data to the algorithm, which learns to predict the price of a new house given its features.

    - Examples:
      - k-Nearest Neighbors
      - Linear Regression
      - Logistic Regression
      - Support Vector Machines (SVMs)
      - Decision Trees and Random Forests
      - Neural networks

  - **Unsupervised Learning**
    - So what is unsupervised learning? It's the opposite of supervised learning: the training data you feed to the algorithm does not include the desired solutions (the labels). You provide a lot of data without the correct answers, and the algorithm learns the patterns that allow it to detect the structure of the data, for example by grouping it into clusters.
    - Clustering is similar to classification, except that the groups are not given in advance. The algorithm finds patterns and groups the data accordingly.
    - For example, if you want to segment your customers based on their purchasing patterns, you feed the algorithm a lot of data about your customers, and it finds patterns and groups them into clusters. Maybe those clusters turn out to be based on demographics and interests. You can then use that information to create targeted marketing campaigns or provide recommendations: people who purchased this product also purchased that product; people who took this course also took that course. (That would be cool.)
    - Examples:
      - Clustering
        - k-Means
        - Hierarchical Cluster Analysis (HCA)
        - Expectation Maximization
      - Visualization and dimensionality reduction
        - Principal Component Analysis (PCA)
        - Kernel PCA
        - Locally-Linear Embedding (LLE)
        - t-distributed Stochastic Neighbor Embedding (t-SNE)
      - Association rule learning
        - Apriori
        - Eclat

  - **Semi-supervised Learning**
    - This is a hybrid of supervised and unsupervised learning, where you have a lot of unlabeled data and a little bit of labeled data, and the algorithm can learn from both. An example is your phone labeling your photos with who's who: you have a lot of unlabeled photos and a few labeled ones, and the algorithm uses both to cluster and classify the data.
    - Examples:
      - Deep Belief Networks
      - Restricted Boltzmann Machines

  - **Reinforcement Learning**
    - Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as shown in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
    - For example, many robots implement Reinforcement Learning algorithms to learn how to walk. DeepMind’s AlphaGo program is also a good example of Reinforcement Learning: it made the headlines in May 2017 when it beat the world champion Ke Jie at the game of Go. It learned its winning policy by analyzing millions of games, and then playing many games against itself. Note that learning was turned off during the games against the champion; AlphaGo was just applying the policy it had learned.
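
The difference between supervised and unsupervised learning described above can be sketched on a toy dataset. Everything here is an assumption made for illustration: the six two-feature points, the "cat"/"dog" labels, and the helper names. The supervised part is a 1-nearest-neighbor classifier that uses the labels; the unsupervised part is a bare-bones k-means that sees only the points and has to discover the two groups on its own.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbor_predict(labeled_points, query):
    """Supervised: return the label of the closest labeled training point."""
    _, label = min(labeled_points,
                   key=lambda item: squared_distance(item[0], query))
    return label


def closest(point, centroids):
    """Index of the centroid nearest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def k_means(points, centroids, iterations=10):
    """Unsupervised: group unlabeled points around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[closest(point, centroids)].append(point)
        for index, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[index] = tuple(
                    sum(coords) / len(members) for coords in zip(*members))
    return [closest(point, centroids) for point in points]


# Hypothetical data: two numeric features per animal.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Supervised: labels are part of the training data.
labeled = list(zip(points, labels))
print(nearest_neighbor_predict(labeled, (1.1, 0.9)))
print(nearest_neighbor_predict(labeled, (4.9, 5.0)))

# Unsupervised: only the points are given; the output is cluster ids, not names.
cluster_ids = k_means(points, centroids=[points[0], points[-1]])
print(cluster_ids)
```

Note that k-means returns anonymous cluster numbers: it can tell that there are two groups, but without labels it has no way to know that one of them is called "cat".
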

2. Batch vs. Online Learning
   - Batch Learning
   - Online Learning
3. Instance-Based vs. Model-Based Learning
   - Instance-Based Learning
   - Model-Based Learning


## Challenges of ML

### Irrelevant Features
Garbage in, garbage out. If you feed the algorithm irrelevant features, it will learn irrelevant patterns and will not generalize well. So it's important to make sure that the features you feed the algorithm are relevant to the task you want it to perform; otherwise you start introducing noise into the system.

There's a whole area of machine learning called feature engineering, which is about selecting the right features, extracting features from the data, or creating new features from existing ones. We may cover some of that (augmentation, etc.) as we start building our models.

### Overfitting the Training Data
Overfitting is when the machine learning model is trained to fit the training data exactly, including the noise in the data, so it cannot generalize the underlying information to new data. It tends to happen when the model learns too much from the training data, or when the data has a lot of features but not enough examples for the model to detect the patterns and relationships between those features and arrive at a generalized state.

A fun example is presidential elections, and statements like "No party candidate has won the election without state X" or "No president has been elected under those circumstances". At the time the comic below was drawn, there had only been 56 presidential elections and 44 presidents. That's not a lot of data to train a model on, especially if we expand the features to include things like the Scrabble point value of names. It's easy for the model to overfit the data and make predictions that are not accurate.
![overfitting-presidential-candidates](https://imgs.xkcd.com/comics/electoral_precedent.png)

Another example of overfitting is the following diagram. If you have this relationship between 2 different features, you can come up with a model that fits the data perfectly, but if you try to use it to get a prediction, the prediction will be far off.

A similar failure can also occur when you have a lot of data but the data is biased.

For example, say we're training a model to differentiate between cats, rabbits, dogs, and tigers, and the training data has 1000 cats, 1000 dogs, 1000 tigers, and 4000 rabbits. There is then a considerable probability that the model will identify a cat as a rabbit. In this example we had a vast amount of data, but it was biased, or nonrepresentative of the population.
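
A quick way to see why this imbalance is a problem is to work out what a useless model would score on it. The sketch below uses the class counts from the example above; the variable names are just for illustration. A "model" that always answers "rabbit" is already right on 4000 of the 7000 training images, so there is a built-in pull toward the majority class.

```python
from collections import Counter

# Class counts taken from the example above.
class_counts = Counter({"cat": 1000, "dog": 1000, "tiger": 1000, "rabbit": 4000})

total = sum(class_counts.values())
majority_class, majority_count = class_counts.most_common(1)[0]

# Accuracy of a baseline that ignores the image and always predicts the majority class.
baseline_accuracy = majority_count / total

# Share of the training data that each class represents.
shares = {label: count / total for label, count in class_counts.items()}

print(f"always predicting '{majority_class}' is right {baseline_accuracy:.0%} of the time")
print(shares)
```

Checking the class shares like this before training is a cheap way to spot nonrepresentative data.
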

We can tackle this issue by:
- Choosing the right algorithm and the right models.
- Using data augmentation techniques to increase the size of the training data. We may talk about that later on in the class.
- Removing the noise in the data, whether that noise is irrelevant features or irrelevant or outlier data points.

https://towardsdatascience.com/an-example-of-overfitting-and-how-to-avoid-it-f6739e67f394
https://www.ibm.com/cloud/learn/overfitting


### Underfitting the Training Data
The opposite of overfitting is underfitting. Underfitting is when the model is not complex enough to fit the data, so it cannot capture the underlying information or generalize it to new data. In overfitting, the model was too complex and learned everything from the data, including the noise. In underfitting, the model is not complex enough to capture the underlying trends and information in the data.

Underfitting can also be caused by a lack of data or features, by biased data, or, as we said, by a bad or overly simple model. What we mean by simple and complex here is the degree of the polynomial.

This is a constant model or function, this is a linear model, this is a quadratic model, this is a cubic model, and this is a quartic model. As you can see, the more complex the model, the more it can fluctuate and the more closely it fits the training data. A model that is too simple underfits; a model that is too complex starts fitting the noise and stops generalizing to new data.
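
The effect of polynomial degree can be sketched with three models of increasing complexity fitted to the same points. The data is made up for this illustration: it follows y = 2x + 1 with alternating +1/-1 noise added to the training points. The three models are a constant (degree 0), a least-squares line (degree 1), and a degree-7 polynomial built by Lagrange interpolation so that it passes through all 8 training points exactly; the function names are hypothetical.

```python
def mse(model, xs, ys):
    """Mean squared error of a model on a set of points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_constant(xs, ys):
    """Degree 0: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Degree 1: ordinary least-squares line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Degree len(xs) - 1: Lagrange polynomial through every training point."""
    def model(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return model


# Hypothetical data: true relationship y = 2x + 1, noisy training points.
train_x = [float(i) for i in range(8)]
train_y = [2 * x + 1 + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
# Test points lie between the training points and carry no noise.
test_x = [i + 0.25 for i in range(7)]
test_y = [2 * x + 1 for x in test_x]

constant_model = fit_constant(train_x, train_y)
line_model = fit_line(train_x, train_y)
poly_model = fit_interpolating_polynomial(train_x, train_y)

for name, model in [("constant", constant_model),
                    ("line", line_model),
                    ("degree 7", poly_model)]:
    print(f"{name:>8}: train MSE = {mse(model, train_x, train_y):.3f}, "
          f"test MSE = {mse(model, test_x, test_y):.3f}")
```

On this data the constant model does badly on both sets (underfitting), the degree-7 polynomial has zero training error but a much larger test error than the line (overfitting), and the line, which matches the true shape of the relationship, generalizes best.
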

### Insufficient Quantity of Training Data
The most important thing a machine learning algorithm goes through is getting trained on the data, so, as a rule of thumb, the more data you have, the better the algorithm will be. If you don't have enough data, you will end up with biased output or inaccurate predictions, classifications, or whatever it is you want to do.

The quantity of data needed is relative; it depends on the complexity of the problem you're trying to solve, so it's not always easy to know how much data you need. You need to experiment with different amounts of data and see how that affects the performance of your algorithm.
However, one factor that matters is the number of features in your data. If you have a lot of features, you need a lot of data to train your algorithm to detect all the patterns and relationships between the different features.

If we're building a prediction algorithm that predicts a house price based on location only, we would need less data than if we're predicting based on location, number of rooms, proximity to certain places, and so on. You need enough data for the algorithm to weigh the effect of each factor on the target value.

A number of research papers have actually found that even poor algorithms and models can produce good results if they have enough data. So it's not always about the algorithm; it is also about the data.


## Hands-on ML

## From Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow
The main steps:
1. Look at the big picture.
2. Get the data.
3. Discover and visualize the data to gain insights.
4. Prepare the data for machine learning algorithms.
5. Select a model and train it.
6. Fine-tune your model.
7. Present your solution.
8. Launch, monitor, and maintain your system.

## Additional Links and Resources
[Roadmap of mathematics for deep learning](https://towardsdatascience.com/the-roadmap-of-mathematics-for-deep-learning-357b3db8569b)



- Diagnostic analysis

  - Linear Regression
  - Logistic Regression
  - Linear Regression with Multiple Variables
  - Logistic Regression with Multiple Variables
  - Classification
  - Correlation vs Causality
  - Hypothesis Testing



https://courses.helsinki.fi/sites/default/files/course-material/4509270/IntroDS-03.pdf

https://share.mindmanager.com/#publish/LNTmuYzLEKtQcVbSd1XU18immd4PsYbm20tlmSlj

https://plotnine.readthedocs.io/en/stable/

Chapter 2 of the Hands-on ML

GDP per capita vs Life Satisfaction - Linear Regression

https://ourworldindata.org/grapher/gdp-vs-happiness?time=2020

## References
- [Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)