# The Machine Learning Landscape

## What Is Machine Learning?
- `Machine Learning` is the science (and art) of programming computers so they can *learn from data*.
    - Your spam filter is a Machine Learning program that, given examples of spam emails (flagged by users) and examples of regular (nonspam, also called "ham") emails, can learn to flag spam.
- The examples that the system uses to learn are called the `training set`. Each training example is called a `training instance` (or sample).
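    - In Tom Mitchell's framing, a program learns from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with E. Here, the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails.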
- This particular performance measure is called `accuracy`, and it is often used in classification tasks.
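
As a minimal sketch of the accuracy measure described above, the ratio of correctly classified emails can be computed as follows. The function name `accuracy` and the five example labels are made up for illustration; they do not come from any real spam filter.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count the positions where the predicted class equals the true class.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five emails (illustrative data only).
actual_labels = ["spam", "ham", "ham", "spam", "ham"]
predicted_labels = ["spam", "ham", "spam", "spam", "ham"]

# Four of the five predictions match, so the ratio is 4 / 5.
print(accuracy(predicted_labels, actual_labels))
```

Here the performance measure P is a single number between 0 and 1, so it can be tracked as the training set (the experience E) grows.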

## Why Use Machine Learning?
- Problems for which existing solutions require a lot of fine-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better than the traditional approach.
- Complex problems for which using a traditional approach yields no good solution: the best Machine Learning techniques can perhaps find a solution.
- Fluctuating environments: a Machine Learning system can adapt to new data.
- Getting insights about complex problems and large amounts of data.
- Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called `data mining`.
    - Ex: patterns in spam emails

## Examples of Applications
- Analyzing images of products on a production line to automatically classify them
- Detecting tumors in brain scans
- Automatically classifying news articles
- Automatically flagging offensive comments on discussion forums
- Summarizing long documents automatically
- Forecasting your company's revenue next year, based on many performance metrics
- Making your app react to voice commands
- Detecting credit card fraud
- Segmenting clients based on their purchases so that you can design a different marketing strategy for each segment
- Representing a complex, high-dimensional dataset in a clear and insightful diagram
- Recommending a product that a client may be interested in, based on past purchases
- Building an intelligent bot for a game

## Types of Machine Learning Systems
- Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)
- Whether or not they can learn incrementally on the fly (online versus batch learning)
- Whether they work by simply comparing new data points to known data points, or instead by detecting patterns in the training data and building a predictive model, much like scientists do (instance-based versus model-based learning)

## Supervised/Unsupervised Learning
- Machine Learning systems fall into four major categories according to the amount and type of supervision they get during training: supervised, unsupervised, semisupervised, and Reinforcement Learning.
#### Supervised Learning
- In `supervised learning`, the training set you feed to the algorithm includes the desired solutions, called `labels`.
- A typical supervised learning task is `classification`.
    - The spam filter is a good example of this: it is trained with many example emails along with their *class* (spam or ham), and it must learn how to classify new emails.
- Another typical task is to predict a `target` numeric value, such as the price of a car, given a set of `features` (mileage, age, brand, etc.) called `predictors`. This sort of task is called `regression`. To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).
- ***Note**: In Machine Learning an `attribute` is a data type (e.g., "mileage"), while a `feature` has several meanings, depending on the context, but generally means an attribute plus its value (e.g., "mileage = 15000"). Many people use the words attribute and feature interchangeably.*
- Some regression algorithms can be used for classification as well, and vice versa. For example, `Logistic Regression` is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., 20% chance of being spam).
- Some of the most important supervised learning algorithms:
    - k-Nearest Neighbors
    - Linear Regression
    - Logistic Regression
    - Support Vector Machines (SVMs)
    - Decision Trees and Random Forests
    - Neural Networks
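
To make the regression task above concrete, here is a minimal sketch that fits a straight line to labeled car examples by ordinary least squares, one simple form of Linear Regression. The single predictor (mileage), the prices, and the helper name `fit_line` are all assumptions chosen for illustration; a real system would use more features and real data.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of the predictor, and how the label varies along with it.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up training set: each instance has a predictor and a label.
mileage = [10, 20, 30, 40, 50]               # thousands of miles (predictor)
price = [18000, 16000, 14000, 12000, 10000]  # dollars (label)

# Training: learn the model parameters from the labeled examples.
slope, intercept = fit_line(mileage, price)

# Prediction: estimate the price of a car that was not in the training set.
predicted_price = slope * 35 + intercept
print(slope, intercept, predicted_price)
```

The labels (prices) are what make this supervised: the algorithm is told the desired answer for every training instance and only has to generalize to new ones.
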
#### Unsupervised Learning
- In `unsupervised learning` the training data is unlabeled. The system tries to learn without a teacher.
- Some of the most important unsupervised learning algorithms:
    - Clustering
        - K-Means
        - DBSCAN
        - Hierarchical Cluster Analysis (HCA)
    - Anomaly detection and novelty detection
        - One-class SVM
        - Isolation Forest
    - Visualization and dimensionality reduction
        - Principal Component Analysis (PCA)
        - Kernel PCA
        - Locally Linear Embedding (LLE)
        - t-Distributed Stochastic Neighbor Embedding (t-SNE)
    - Association rule learning 
        - Apriori
        - Eclat
- For example, say you have a lot of data about your blog's visitors. You may want to run a `clustering` algorithm to try to detect groups of similar visitors. At no point do you tell the algorithm which group a visitor belongs to: it finds those connections without your help. For example, it might notice that 40% of your visitors are males who love comic books and generally read your blog in the evening, while 20% are young sci-fi lovers who visit during the weekends.
- If you use a `hierarchical clustering` algorithm, it may also subdivide each group into smaller groups. This may help you target your posts for each group.
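
The blog visitor example can be sketched with a bare-bones K-Means loop. This is an illustrative implementation under simplifying assumptions, not a library API: the visitor data (age, hour of visit) is invented, the features are left unscaled, and the initial centroids are picked by hand so the run is deterministic.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its old centroid
        if updated == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = updated
    # Final assignment against the centroids that are returned.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Invented visitors: (age, hour of day of a typical visit). No labels are given.
visitors = [(22, 21), (25, 22), (24, 20), (41, 9), (45, 10), (43, 8)]

centroids, labels = k_means(visitors, initial_centroids=[visitors[0], visitors[3]])
print(labels)
print(centroids)
```

At no point does the code receive a group for any visitor; the two groups emerge only from distances between instances.
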
- `Visualization` algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted. These algorithms try to preserve as much structure as they can (e.g., trying to keep separate clusters in the input space from overlapping in the visualization) so that you can understand how the data is organized and perhaps identify unsuspected patterns.
- A related task is `dimensionality reduction`, in which the goal is to simplify the data without losing too much information. One way to do this is to merge several correlated features into one.
    - For example, a car's mileage may be strongly correlated with its age, so the dimensionality reduction algorithm will merge them into one feature that represents the car's wear and tear. This is called `feature extraction`.
- ***Tip**: It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better.*
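
One way to picture the mileage and age merge is to project both features onto their first principal component, which is what PCA does in two dimensions. The sketch below assumes PCA as the technique (the notes do not say which algorithm performs the merge), uses invented car data, and relies on the closed-form principal-axis angle for a 2x2 covariance matrix.

```python
import math


def mean(values):
    return sum(values) / len(values)


def standardize(values):
    """Rescale to zero mean and unit variance so units do not dominate."""
    m = mean(values)
    std = math.sqrt(mean([(v - m) ** 2 for v in values]))
    return [(v - m) / std for v in values]


def first_principal_component(xs, ys):
    """Unit vector along the direction of greatest variance of centered 2D data."""
    cxx = mean([x * x for x in xs])
    cyy = mean([y * y for y in ys])
    cxy = mean([x * y for x, y in zip(xs, ys)])
    # Angle of the major axis of the 2x2 covariance matrix [[cxx, cxy], [cxy, cyy]].
    theta = 0.5 * math.atan2(2 * cxy, cxx - cyy)
    return math.cos(theta), math.sin(theta)


# Invented cars: mileage in miles and age in years, strongly correlated.
mileage = [15000, 42000, 60000, 88000, 120000]
age = [1, 3, 5, 7, 10]

z_mileage = standardize(mileage)
z_age = standardize(age)
wx, wy = first_principal_component(z_mileage, z_age)

# Two correlated features collapse into one "wear and tear" feature.
wear_and_tear = [wx * m + wy * a for m, a in zip(z_mileage, z_age)]
print(wear_and_tear)
```

The extracted feature keeps the ordering of the cars from least to most worn while halving the number of columns passed to a downstream algorithm.
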
- Yet another important unsupervised task is `anomaly detection`.
    - For example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm.
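
As a rough illustration of flagging unusual transactions, the sketch below uses a simple statistical rule (the modified z-score based on the median absolute deviation) rather than the One-class SVM or Isolation Forest algorithms listed earlier. The function name `find_anomalies`, the 3.5 threshold, and the transaction amounts are assumptions made for this example.

```python
import statistics


def find_anomalies(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = statistics.median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no measurable spread; this simple sketch flags nothing
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]


# Invented credit card transaction amounts; no instance is labeled as fraud.
amounts = [23.5, 41.0, 18.2, 36.9, 29.9, 2450.0, 33.1, 27.4]

anomalies = find_anomalies(amounts)
# Removing outliers before feeding the data to another learning algorithm.
cleaned = [v for v in amounts if v not in anomalies]
print(anomalies)
print(cleaned)
```

The median-based spread is used instead of the standard deviation because a single extreme value inflates the standard deviation and can hide itself in a small sample.
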
- A very similar task is `novelty detection`: it aims to detect new instances that look different from all instances in the training set. This requires 
