# Reference Book: 
### Hands-on Machine Learning with Scikit-Learn, Keras & TensorFlow (By Aurélien Géron)

# What Is Machine Learning?

- Machine Learning is the science (and art) of programming computers so they can learn from data.
- “[Machine Learning is the] field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959)

### Example

- For example, your spam filter is a Machine Learning program that can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails. 

# Why Use Machine Learning?

1. Consider how you would write a spam filter using traditional programming techniques.
<img src="Images/1. ML Traditional.png" width="700" style="margin-left: 40px;">

2. First you would look at what spam typically looks like. You might notice that some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject. Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and so on.

3. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.

4. You would test your program, and repeat steps 2 and 3 until it is good enough.

Since the problem is not trivial, your program will likely become a long list of complex rules—pretty hard to maintain.
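The traditional approach described above can be sketched in a few lines of Python. This is only an illustration: the phrase list comes from the examples in these notes, while the `THRESHOLD` value and the function names are assumptions made up for the sketch.

```python
# Hand-written rules: each phrase is a pattern a person noticed in spam.
# The phrases come from the notes above; the threshold is a hypothetical choice.
SPAM_PHRASES = ["4u", "credit card", "free", "amazing"]
THRESHOLD = 2  # flag an email when at least this many patterns match


def count_matches(subject: str) -> int:
    """Count how many hand-written patterns appear in the subject line."""
    text = subject.lower()
    return sum(phrase in text for phrase in SPAM_PHRASES)


def is_spam(subject: str) -> bool:
    """Flag the email when enough patterns are detected."""
    return count_matches(subject) >= THRESHOLD


print(is_spam("Amazing FREE credit card 4U"))  # True
print(is_spam("Meeting notes for Tuesday"))    # False
print(is_spam("Credit card For U"))            # False: only one rule matches
```

The last call hints at the maintenance problem: a small change in wording slips past the rules until someone adds a new one by hand.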

1. In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples. The program is much shorter, easier to maintain, and most likely more accurate.

<img src="Images/2. ML Modern.png" width="700" style="margin-left: 40px;">

2. Moreover, if spammers notice that all their emails containing “4U” are blocked, they might start writing “For U” instead. A spam filter using traditional programming techniques would need to be updated to flag “For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

3. In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention.
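As a rough sketch of the idea, the snippet below learns which words are unusually frequent in spam compared to ham, then relearns after users flag new spam. The messages, the `min_ratio` cutoff, the add-one smoothing, and the function names are all assumptions chosen for illustration; a real filter would use a proper model and far more data.

```python
from collections import Counter


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def learn_spam_words(spam: list[str], ham: list[str], min_ratio: float = 2.5) -> set[str]:
    """Return words that appear unusually often in spam compared to ham.

    Add-one smoothing keeps words never seen in ham from dividing by zero.
    """
    spam_counts = Counter(w for msg in spam for w in tokenize(msg))
    ham_counts = Counter(w for msg in ham for w in tokenize(msg))
    learned = set()
    for word, count in spam_counts.items():
        spam_rate = (count + 1) / (len(spam) + 2)
        ham_rate = (ham_counts[word] + 1) / (len(ham) + 2)
        if spam_rate / ham_rate >= min_ratio:
            learned.add(word)
    return learned


def is_spam(message: str, spam_words: set[str]) -> bool:
    """Flag a message that contains any learned spam word."""
    return bool(tokenize(message) & spam_words)


# Toy training data (made up for this sketch).
ham = ["lunch on friday", "project notes attached",
       "free on friday for lunch", "see you at the meeting"]
spam = ["free credit card 4u", "amazing offer 4u", "free prize 4u"]

spam_words = learn_spam_words(spam, ham)
print(sorted(spam_words))                            # ['4u']
print(is_spam("special deal for u", spam_words))     # False

# Spammers switch to "For U"; users flag the new emails; we simply relearn.
spam += ["amazing deal for u", "prize for u", "card for u"]
spam_words = learn_spam_words(spam, ham)
print(sorted(spam_words))                            # ['4u', 'u']
print(is_spam("special deal for u", spam_words))     # True
```

No rule was edited by hand between the two runs: the only change is the newly flagged training examples.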

- Finally, Machine Learning can help humans learn (Figure 1-4): ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky). For instance, once the spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem.

- Applying ML techniques to dig into large amounts of data can help discover patterns
that were not immediately apparent. This is called **data mining.**

<img src="Images/3. ML Understanding.png" width="700" style="margin-left: 40px;">


# Types of Machine Learning Systems

1. **Supervised learning**

- In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels.
- Some of the most important supervised learning algorithms:
    1. k-Nearest Neighbors
    2. Linear Regression
    3. Logistic Regression
    4. Support Vector Machines (SVMs)
    5. Decision Trees and Random Forests
    6. Neural networks

<img src="Images/4. Supervised Learning.png" width="700" style="margin-left: 40px;">
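To make the role of labels concrete, here is a minimal from-scratch sketch of k-Nearest Neighbors, the first algorithm in the list. The two features (number of suspicious words, number of links) and the data points are hypothetical, picked only so the example is easy to follow.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, x), label) for point, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Each instance: (suspicious word count, link count). Hypothetical features.
train_X = [(0, 0), (1, 0), (0, 1), (5, 4), (6, 5), (4, 6)]
# The labels are the "desired solutions" supplied with the training data.
train_y = ["ham", "ham", "ham", "spam", "spam", "spam"]

print(knn_predict(train_X, train_y, (5, 5)))  # spam
print(knn_predict(train_X, train_y, (1, 1)))  # ham
```

The algorithm never receives a rule about what spam looks like; it only compares a new instance to labeled examples.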

2. **Unsupervised learning**

- In unsupervised learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher.
- Common unsupervised learning tasks and algorithms:
    1. Clustering
        - K-Means
        - DBSCAN
        - Hierarchical Cluster Analysis (HCA)
    2. Anomaly detection and novelty detection
        - One-class SVM
        - Isolation Forest
    3. Visualization and dimensionality reduction
        - Principal Component Analysis (PCA)
        - Kernel PCA
        - Locally Linear Embedding (LLE)
        - t-distributed Stochastic Neighbor Embedding (t-SNE)
    4. Association rule learning
        - Apriori
        - Eclat

- Example:
    - Say you have a lot of data about your blog’s visitors. You may want to run a clustering algorithm to try to detect groups of similar visitors (Figure 1-8). At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help. For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends, and so on. If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.

<img src="Images/5. Unsupervised Learning.png" width="700" style="margin-left: 40px;">
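The clustering idea can be sketched with a bare-bones K-Means written in plain Python. The visitor features (age, usual visiting hour), the data, and the naive "first k points" initialization are assumptions for illustration; library implementations use smarter initialization and would normally scale the features first.

```python
import math


def k_means(points, k, iterations=10):
    """Plain K-Means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # naive initialization: the first k points
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return centroids, assignments


# Hypothetical visitors: (age, usual visiting hour on a 24-hour clock).
visitors = [(20, 21), (45, 9), (22, 22), (47, 10), (19, 20), (44, 8)]

centroids, groups = k_means(visitors, k=2)
print(groups)  # [0, 1, 0, 1, 0, 1]
```

Note that no group labels were given: the two groups (young evening readers, older morning readers) emerge from the data alone.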

- **Visualization algorithms:**
    - Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted (Figure 1-9). These algorithms try to preserve as much structure as they can (e.g., trying to keep separate clusters in the input space from overlapping in the visualization), so you can understand how the data is organized and perhaps identify unsuspected patterns.

<img src="Images/6. Visualization Algorithm.png" width="700" style="margin-left: 40px;">

- **Dimensionality Reduction:**
    - A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one. For example, a car’s mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car’s wear and tear. This is called feature extraction.

    - It is often a good idea to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm). That second algorithm will run much faster, the data will take up less disk and memory space, and in some cases the resulting model may also perform better.
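The car example can be sketched as a two-feature PCA in plain Python: standardize age and mileage, then project both onto the direction of greatest variance to get a single "wear and tear" value. The car data and function names are made up for this sketch, and the closed-form angle only works for exactly two features; a library PCA handles the general case.

```python
import math
from statistics import mean, pstdev


def standardize(values):
    """Rescale to zero mean and unit variance so both features are comparable."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def first_principal_component(xs, ys):
    """Project two features onto their direction of greatest variance (2D PCA)."""
    zx, zy = standardize(xs), standardize(ys)
    n = len(zx)
    var_x = sum(a * a for a in zx) / n
    var_y = sum(b * b for b in zy) / n
    cov_xy = sum(a * b for a, b in zip(zx, zy)) / n
    # Angle of the major axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return [a * math.cos(theta) + b * math.sin(theta) for a, b in zip(zx, zy)]


# Hypothetical cars: age in years and mileage in kilometres.
ages = [1, 3, 5, 8, 10]
mileage_km = [15_000, 40_000, 70_000, 110_000, 150_000]

wear = first_principal_component(ages, mileage_km)
for age, km, w in zip(ages, mileage_km, wear):
    print(f"age={age:>2}  mileage={km:>7}  wear={w:+.2f}")
```

Two columns become one, and older, higher-mileage cars still get a higher value, so little information is lost.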

- **Anomaly Detection:**
    - Yet another important unsupervised task is anomaly detection: for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is shown mostly normal instances during training, so it learns to recognize them; when it sees a new instance, it can tell whether it looks like a normal one or is likely an anomaly. A very similar task is novelty detection. The difference is that novelty detection algorithms expect to see only normal data during training, while anomaly detection algorithms are usually more tolerant: they can often perform well even with a small percentage of outliers in the training set.

<img src="Images/7. Anomaly Detection.png" width="700" style="margin-left: 40px;">
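A very simple stand-in for the idea is a z-score detector: learn the mean and spread of normal transaction amounts, then flag new amounts that fall too far from the mean. This is not One-class SVM or Isolation Forest; it is a basic statistical sketch, and the amounts and the threshold of 3 standard deviations are assumptions.

```python
from statistics import mean, stdev


def fit(amounts):
    """Learn what 'normal' looks like: the mean and standard deviation."""
    return mean(amounts), stdev(amounts)


def is_anomaly(amount, model, threshold=3.0):
    """Flag an amount more than `threshold` standard deviations from the mean."""
    mu, sigma = model
    return abs(amount - mu) / sigma > threshold


# Hypothetical card transactions (mostly normal instances).
training_amounts = [42.0, 38.5, 51.0, 45.2, 40.1, 47.8, 39.9, 44.3]
model = fit(training_amounts)

print(is_anomaly(46.0, model))   # False: looks like a normal purchase
print(is_anomaly(900.0, model))  # True: far outside the learned range
```

One weakness of this sketch is that outliers in the training set inflate the standard deviation, which is part of why the more tolerant algorithms listed earlier exist.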

- **Association Rule Learning:**
    - Finally, another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes. For example, suppose you own a supermarket. Running an association rule learning algorithm on your sales logs may reveal that people who purchase barbecue sauce and potato chips also tend to buy steak. Thus, you may want to place these items close to each other.
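The supermarket example can be made concrete with the two standard measures used to score a rule, support and confidence. The sketch below only evaluates one given rule on made-up baskets; algorithms such as Apriori additionally search for the frequent item sets, which is not shown here.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(baskets, antecedent, consequent):
    """Among baskets with the antecedent, the fraction that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)


# Hypothetical sales log: one set of items per purchase.
baskets = [
    {"barbecue sauce", "potato chips", "steak"},
    {"barbecue sauce", "potato chips", "steak", "soda"},
    {"barbecue sauce", "potato chips"},
    {"bread", "milk"},
    {"steak", "bread"},
]

rule_if = {"barbecue sauce", "potato chips"}
rule_then = {"steak"}

print(f"support:    {support(baskets, rule_if | rule_then):.2f}")     # 0.40
print(f"confidence: {confidence(baskets, rule_if, rule_then):.2f}")   # 0.67
```

Here two of the three baskets containing barbecue sauce and potato chips also contain steak, which is the kind of relation that might justify shelving those items together.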