# 1. Machine Learning Landscape

This book will feature three production-ready Python frameworks:

1. **Scikit-learn** - easy to use and efficient in implementation (*ideal as entry point*)
2. **TensorFlow** - more complex library for distributed numerical computation (*ideal for large scale implementations*)
3. **Keras** - high level Deep Learning API (*use on top of TensorFlow, Theano or Microsoft Cognitive Toolkit*) 

A working definition of Machine Learning was given by Tom Mitchell in 1997:

_"A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves
with experience E."_
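
Mitchell's three ingredients can be made concrete with a toy sketch. Everything below is an assumption made for illustration and is not taken from the book: the task T is separating two classes of numbers, the experience E is a list of labelled examples, and the performance measure P is accuracy on held-out points. The learner (`fit_threshold`) is a deliberately simple midpoint-of-class-means rule, not a recommended algorithm.

```python
from statistics import mean

def fit_threshold(examples):
    """Learn a decision threshold: the midpoint between the two class means."""
    negatives = [x for x, label in examples if label == 0]
    positives = [x for x, label in examples if label == 1]
    return (mean(negatives) + mean(positives)) / 2

def accuracy(threshold, examples):
    """Performance measure P: fraction of examples classified correctly."""
    correct = sum((x > threshold) == bool(label) for x, label in examples)
    return correct / len(examples)

# Experience E: labelled examples (hypothetical toy data).
training = [(4.0, 0), (10.0, 1), (1.0, 0), (2.0, 0), (5.0, 1), (6.0, 1)]
# Held-out points used only to measure P on task T.
held_out = [(1.5, 0), (2.5, 0), (3.0, 0), (5.5, 1), (6.5, 1), (8.0, 1)]

for n in (2, len(training)):
    threshold = fit_threshold(training[:n])
    print(f"{n} examples -> accuracy {accuracy(threshold, held_out):.2f}")
```

On this particular data the accuracy measured on the held-out points is higher when the learner sees all six examples than when it sees only the first two, which is what "learning" means in the definition above. On real data the improvement is usually noisier.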

**Note**: Applying ML techniques to dig into large amounts of data to discover patterns that were not immediately apparent is **data mining**.

### Classification of algorithms

It is useful to classify ML algorithms according to:

1. Use of human supervision (supervised - unsupervised - semisupervised - reinforcement)
2. Possibility to learn incrementally (batch vs. online)
3. Way of generalizing: simply comparing new data points to known data points, or detecting patterns in the training data and building a predictive model (instance-based vs. model-based learning)
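
The third distinction can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the data points are made up, `predict_instance_based` is a hypothetical 1-nearest-neighbour helper, and `fit_linear_model` is a hand-written least-squares line fit. In practice a library such as Scikit-learn would provide both kinds of estimator.

```python
def predict_instance_based(train, x_new):
    """Instance-based: return the target of the closest known point (1-NN)."""
    _, nearest_y = min(train, key=lambda point: abs(point[0] - x_new))
    return nearest_y

def fit_linear_model(train):
    """Model-based: least-squares fit of y = slope * x + intercept."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in train)
    var = sum((x - mean_x) ** 2 for x, _ in train)
    slope = cov / var
    return slope, mean_y - slope * mean_x

# Hypothetical (x, y) training points.
train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
x_new = 2.4

# Instance-based prediction reuses the stored examples directly.
print(predict_instance_based(train, x_new))

# Model-based prediction uses only the learned parameters.
slope, intercept = fit_linear_model(train)
print(slope * x_new + intercept)
```

The instance-based learner has to keep the training points around at prediction time, while the model-based learner can discard them once `slope` and `intercept` are known.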

### Data

Quoting the authors of a seminal [paper](http://delivery.acm.org/10.1145/1080000/1073017/p26-banko.pdf?ip=114.250.152.226&id=1073017&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1566022014_f658a82c6c95cd915a402a5185bff103) on the nature of the relationship between algorithms and amount of data (what they call _corpora_):   
_"[..] we may want to reconsider the trade-off between spending time and money on algorithm development versus spending it on corpus development."_

![Data](Images/1.1.Test_accuracy_vs_Dataset_size.jpg)

_Michele Banko and Eric Brill (Microsoft, 2001)_

Peter Norvig et al. expanded on this idea in their paper titled ["The unreasonable effectiveness of data"](https://research.google.com/pubs/archive/35179.pdf) (2009).

#### Features

Garbage in, garbage out. In order to learn effectively, a system needs relevant features to train on. Coming up with a good set of features involves:

* **Feature selection**: selecting the most useful features to train on among existing features;  
* **Feature extraction**: combining existing features to produce a more useful one (dimensionality reduction algorithms can help);  
* **Feature building**: creating new features by gathering new data.
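
The first two steps can be sketched on a toy dataset. All names and numbers below are hypothetical (a made-up used-car table with a `price` target), the 0.8 correlation cut-off is an arbitrary assumption, and the hand-made `wear` feature merely stands in for what a dimensionality reduction algorithm would compute automatically.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)

# Hypothetical used-car data: three candidate features and the target.
features = {
    "mileage_km": [20000, 60000, 90000, 150000],
    "age_years": [1, 3, 5, 9],
    "paint_code": [3, 1, 4, 2],
}
price = [18000, 14000, 11000, 5000]

# Feature selection: keep features strongly correlated with the target.
selected = [
    name for name, values in features.items()
    if abs(pearson(values, price)) > 0.8
]
print(selected)

# Feature extraction: merge two related features into one "wear" feature.
wear = [
    km / 10000 + years
    for km, years in zip(features["mileage_km"], features["age_years"])
]
print(pearson(wear, price))
```

Here the uninformative `paint_code` column is dropped by the selection step, and the combined `wear` feature remains strongly (negatively) correlated with the price.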