# Fundamentals of Machine Learning


## Important topics to cover
#### supervised versus unsupervised learning
#### online versus batch learning
#### instance-based versus model-based learning
#### how to evaluate and fine-tune a Machine Learning system


## What is Machine Learning?
#### Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
##### —Arthur Samuel, 1959

### difference between knowledge and machine learning
##### Example: if you just download a copy of Wikipedia, your computer has a lot more data, but it is not suddenly better at any task, so this is not Machine Learning. Your spam filter, by contrast, is a Machine Learning program: it can learn to flag spam given examples of spam emails (e.g., flagged by users) and examples of regular (nonspam, also called “ham”) emails. The examples that the system uses to learn are called the training set. Each training example is called a training instance (or sample). In Tom Mitchell's formulation (a program learns from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with E), the task T is to flag spam for new emails, the experience E is the training data, and the performance measure P needs to be defined; for example, you can use the ratio of correctly classified emails. This particular performance measure is called accuracy, and it is often used in classification tasks.
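#### A minimal sketch of the accuracy measure described above. The function name and the five example labels are illustrative assumptions, not output from a real filter.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels for five emails (made-up data for illustration).
actual = ["spam", "ham", "ham", "spam", "ham"]
predicted = ["spam", "ham", "spam", "spam", "ham"]
score = accuracy(predicted, actual)  # 4 of the 5 predictions are correct
```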


## Why use Machine Learning
#### How would you write a spam filter using traditional programming techniques?
#### Since the problem is not trivial, your program will likely become a long list of complex rules, which is pretty hard to maintain.
#### In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples. The program is much shorter, easier to maintain, and most likely more accurate.
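#### A rough sketch of the idea of finding words that are unusually frequent in spam compared to ham. The helper names, the add-one smoothing, the `min_ratio` threshold, and the six tiny emails are all assumptions made for illustration; a real filter would use a proper learning algorithm and far more data.

```python
from collections import Counter


def word_counts(emails):
    """Count in how many emails each lowercase word appears."""
    counts = Counter()
    for text in emails:
        counts.update(set(text.lower().split()))
    return counts


def spam_indicators(spam, ham, min_ratio=3.0):
    """Return words whose smoothed spam rate is at least min_ratio times their ham rate."""
    spam_counts, ham_counts = word_counts(spam), word_counts(ham)
    indicators = {}
    for word, n_spam in spam_counts.items():
        # Add-one smoothing so words never seen in ham do not cause division by zero.
        spam_rate = (n_spam + 1) / (len(spam) + 2)
        ham_rate = (ham_counts[word] + 1) / (len(ham) + 2)
        ratio = spam_rate / ham_rate
        if ratio >= min_ratio:
            indicators[word] = ratio
    return indicators


# Hypothetical training examples.
spam = ["free money now", "claim your free prize", "free offer for you"]
ham = ["meeting moved to noon", "lunch with you tomorrow", "your report is ready"]
indicators = spam_indicators(spam, ham)
```

#### With this toy data only "free" clears the threshold, because it appears in every spam example and in no ham example.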

#### Another area where Machine Learning shines is problems that either are too complex for traditional approaches or have no known algorithm. Example: speech recognition.


## To summarize, Machine Learning is great for:
#### Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
#### Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
#### Fluctuating environments: a Machine Learning system can adapt to new data.
#### Getting insights about complex problems and large amounts of data.



# Types of Machine Learning Systems


## Supervised/Unsupervised/Reinforcement Learning
#### Machine Learning systems can be classified according to the amount and type of supervision they get during training.
#### There are four main types:
##### supervised learning, unsupervised learning, semisupervised learning, and Reinforcement Learning.

### Supervised Learning
#### In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels
#### A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.
#### Another typical task is regression: predicting a numeric target value (such as the price of a car) from a set of features. Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class (e.g., 20% chance of being spam).
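#### A small sketch of how Logistic Regression turns a weighted score into a class probability. The two features, the weights, and the bias below are hypothetical values chosen by hand; in practice they would be learned from labeled training data.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a value strictly between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for very negative scores
    return e / (1.0 + e)


def spam_probability(features, weights, bias):
    """Estimate the probability that one example belongs to the spam class."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


def classify(features, weights, bias, threshold=0.5):
    """Turn the probability into a class label using a decision threshold."""
    p = spam_probability(features, weights, bias)
    return "spam" if p >= threshold else "ham"


# Hypothetical features: [count of the word "free", count of exclamation marks].
weights, bias = [0.8, 0.3], -2.0
p = spam_probability([1, 2], weights, bias)
label = classify([1, 2], weights, bias)
```

#### The same model does regression-style work (it outputs a number) and classification (the number is compared against a threshold).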
#### Here are some of the most important supervised learning algorithms
##### • k-Nearest Neighbors
##### • Linear Regression
##### • Logistic Regression
##### • Support Vector Machines (SVMs)
##### • Decision Trees and Random Forests
##### • Neural networks


### Unsupervised Learning
#### In unsupervised learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher.
#### Some of the most important unsupervised learning algorithms:
##### • Clustering
#####    — K-Means
#####    — DBSCAN
#####    — Hierarchical Cluster Analysis (HCA)
##### • Anomaly detection and novelty detection
#####    — One-class SVM
#####    — Isolation Forest
##### • Visualization and dimensionality reduction 
#####    — Principal Component Analysis (PCA) 
#####    — Kernel PCA
#####    — Locally-Linear Embedding (LLE)
#####    — t-distributed Stochastic Neighbor Embedding (t-SNE)
##### • Association rule learning
#####    — Apriori 
#####    — Eclat

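#### Example: given data about the visitors on nados, a clustering algorithm can automatically detect groups of similar users, such as the interest and level of a certain user, without being told which group anyone belongs to.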
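#### A minimal one-dimensional K-Means sketch to make the clustering idea concrete. The function name, the naive initialization, and the "activity score" data are assumptions for illustration only; library implementations handle many dimensions and use better initialization.

```python
def k_means_1d(points, k, iterations=20):
    """Group numbers into k clusters by alternating assignment and centroid update."""
    distinct = sorted(set(points))
    if len(distinct) < k:
        raise ValueError("need at least k distinct points")
    centroids = distinct[:k]  # naive initialization: the k smallest distinct values
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters


# Hypothetical activity scores for six visitors (no labels are provided).
scores = [1, 2, 3, 10, 11, 12]
centroids, clusters = k_means_1d(scores, k=2)
```

#### On this toy data the algorithm separates the low-activity visitors from the high-activity ones on its own.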

#### Yet another important unsupervised task is anomaly detection
##### for example, detecting unusual credit card transactions to prevent fraud, 
##### catching manufacturing defects, 
##### or automatically removing outliers from a dataset before feeding it to another learning algorithm.
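#### A simple sketch of outlier removal before further training. It uses a median-based modified z-score as a stand-in technique, not One-class SVM or Isolation Forest; the function name, the 3.5 threshold, and the transaction amounts are assumptions for illustration.

```python
from statistics import median


def split_outliers(values, threshold=3.5):
    """Separate values into (inliers, outliers) using a median-based score."""
    center = median(values)
    mad = median([abs(v - center) for v in values])  # median absolute deviation
    if mad == 0:
        return list(values), []  # no spread to measure against
    inliers, outliers = [], []
    for v in values:
        score = 0.6745 * abs(v - center) / mad
        (outliers if score > threshold else inliers).append(v)
    return inliers, outliers


# Hypothetical card transaction amounts; one is far from the rest.
amounts = [12.0, 15.0, 14.0, 13.0, 16.0, 15.0, 14.0, 13.0, 500.0, 15.0]
inliers, outliers = split_outliers(amounts)
```

#### The median is used instead of the mean because a single extreme value shifts the mean and inflates the standard deviation, which can hide the very outlier you are looking for.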


### Semisupervised learning
#### Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data. This is called semisupervised learning.
#### Example: some photo-hosting services, such as Google Photos, are good examples of this. Once you upload all your family photos to the service, it automatically recognizes that the same person A shows up in photos 1, 5, and 11, while another person B shows up in photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all the system needs is for you to tell it who these people are. Just one label per person, and it is able to name everyone in every photo, which is useful for searching photos.
#### Algorithm example: deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another. The RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.
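#### A sketch of the labeling step in the photo example: once faces are grouped into clusters, a single label per cluster is enough to name every face. The cluster ids, the `(photo, face)` keys, and the names "Alice" and "Bob" are hypothetical; the clustering itself is assumed to have been done already.

```python
def propagate_labels(cluster_of, known_labels):
    """Spread each known label to every face in the same cluster."""
    name_of_cluster = {
        cluster_of[face]: name for face, name in known_labels.items()
    }
    return {face: name_of_cluster.get(c) for face, c in cluster_of.items()}


# Faces are keyed as (photo number, face index); values are cluster ids
# produced by a hypothetical unsupervised clustering step.
cluster_of = {
    (1, 0): "A", (5, 0): "A", (11, 0): "A",
    (2, 0): "B", (5, 1): "B", (7, 0): "B",
}
# The user labels just one face per person.
known_labels = {(1, 0): "Alice", (2, 0): "Bob"}

names = propagate_labels(cluster_of, known_labels)

# Collect the names found in each photo, which is what photo search would use.
names_by_photo = {}
for (photo, _face), name in names.items():
    names_by_photo.setdefault(photo, set()).add(name)
```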


### Reinforcement Learning
#### Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return. It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.
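#### Example: imagine you are left on a new planet. Nobody tells you what to do, so you have to try actions, observe what happens, and learn from the rewards and penalties which strategy works best.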
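#### A tiny sketch of an agent learning a policy from rewards, using tabular Q-learning as one possible technique. The corridor environment, the reward of 1.0 at the goal, and all hyperparameter values are assumptions made up for this illustration.

```python
import random

N_STATES = 5        # corridor cells 0..4; reaching cell 4 ends the episode
ACTIONS = (-1, +1)  # step left or step right


def step(state, action):
    """Hypothetical environment: move along the corridor, reward 1.0 at the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values by trial and error with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _episode in range(episodes):
        state = 0
        for _step in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """The learned policy: the best-valued action in each non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


policy = greedy_policy(train())
```

#### The agent is never told that moving right is correct; it only sees rewards, and the policy it ends up with maps each situation (cell) to an action.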



## Batch and Online Learning
#### Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.

### Batch Learning or Offline Learning
#### In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
#### Fortunately, the whole process of training, evaluating, and launching a Machine Learning system can be automated fairly easily.
#### This solution is simple and often works fine, but training using the full set of data can take many hours, so you would typically train a new system only every 24 hours or even just weekly. If your system needs to adapt to rapidly changing data (e.g., to predict stock prices), then you need a more reactive solution.
#### Also, training on the full set of data requires a lot of computing resources.
#### Fortunately, a better option in all these cases is to use algorithms that are capable of learning incrementally.


### Online Learning
#### In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or by small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives
#### It is a good option if you have limited computing resources: once an online learning system has learned about new data instances, it does not need them anymore, so you can discard them (unless you want to be able to roll back to a previous state and “replay” the data). This can save a huge amount of space.

#### One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data (you don’t want a spam filter to flag only the latest kinds of spam it was shown). Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers)
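#### A minimal sketch of the learning-rate trade-off, using the simplest possible online model: a single number that is nudged toward each new observation. The function name and the data stream are illustrative assumptions.

```python
def online_estimate(stream, learning_rate):
    """Update one running estimate incrementally, one instance at a time."""
    estimate = None
    for x in stream:
        if estimate is None:
            estimate = x  # the first instance initializes the model
        else:
            # Move the estimate a fraction of the way toward the new instance.
            estimate += learning_rate * (x - estimate)
    return estimate


# Hypothetical stream: the data sits at 10.0 for a while, then shifts to 20.0.
stream = [10.0] * 20 + [20.0] * 5
fast = online_estimate(stream, learning_rate=0.9)  # adapts quickly, forgets quickly
slow = online_estimate(stream, learning_rate=0.1)  # more inertia
```

#### After the shift, the high learning rate has almost fully moved to the new level, while the low learning rate is still partway between the old and new levels. The same inertia is what makes the low rate less sensitive to a short burst of noisy data. Each instance is used once and can then be discarded.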
#### A big challenge with online learning is that if bad data is fed to the system, the system’s performance will gradually decline.
#### To reduce this risk, you need to monitor your system closely and promptly switch learning off (and possibly revert to a previously working state) if you detect a drop in performance.



## Instance-Based Versus Model-Based Learning

### Instance-Based Learning
#### Possibly the most trivial form of learning is simply to learn by heart.
#### If you were to create a spam filter this way, it would just flag all emails that are identical to emails that have already been flagged by users. This is not the worst solution, but certainly not the best.
#### Instead of just flagging emails that are identical to known spam emails, your spam filter could be programmed to also flag emails that are very similar to known spam emails. This requires a measure of similarity between two emails. A (very basic) similarity measure between two emails could be to count the number of words they have in common. The system would flag an email as spam if it has many words in common with a known spam email.
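#### A sketch of the word-overlap similarity measure described above. The function names, the `min_common` threshold of 3, and the example emails are assumptions for illustration.

```python
def words(text):
    """Split an email into a set of lowercase words."""
    return set(text.lower().split())


def common_words(email_a, email_b):
    """Very basic similarity: the number of distinct words both emails share."""
    return len(words(email_a) & words(email_b))


def is_spam(email, known_spam, min_common=3):
    """Flag an email if it shares enough words with any known spam email."""
    return any(common_words(email, s) >= min_common for s in known_spam)


# Hypothetical stored instances (emails that users already flagged).
known_spam = ["win a free prize now", "cheap loans approved today"]

flagged = is_spam("claim your free prize now", known_spam)
not_flagged = is_spam("lunch at noon today", known_spam)
```

#### Note that nothing is "trained" here: the system keeps the examples themselves and compares new emails against them at prediction time.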


### Model-Based Learning
#### Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions. This is called model-based learning.
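#### A minimal sketch of model-based learning, assuming a straight-line model fitted by ordinary least squares. The function names and the four data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Use the fitted parameters to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training examples that lie on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
model = fit_line(xs, ys)
prediction = predict(model, 5.0)
```

#### Unlike the instance-based approach, the training examples can be thrown away after fitting: the two model parameters are all that is needed to make predictions.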



## Main Challenges of Machine Learning
### Bad Algorithm or Bad Data

### Bad Data
#### Insufficient data
#### Poor-quality data
#### Irrelevant features

### Bad Algorithm
#### Overfitting the training data
#### Underfitting the training data