# Machine learning does not magically create added value
*How machine learning allows us to overcome previously impossible challenges, but is no silver bullet*

# Introduction
"*I thought AlphaGo was based on probability calculation, and it was merely a machine. But when I saw this move, I changed my mind. Surely AlphaGo is creative. The move was really creative and beautiful*" ([AlphaGo documentary](https://www.youtube.com/watch?v=WXuK6gekU1Y), 52:10-52:40). In this quote Lee Sedol, one of the strongest Go players in history, reacts to the famous move 37 in one of his games against the reinforcement learning agent AlphaGo. One of the developers of AlphaGo later comments (56:30) that AlphaGo is "*really a simple program and nowhere near full AI*". Apart from being a beautiful story (you should watch the documentary), it also highlights the kind of magical aura that surrounds machine learning, and especially deep learning: the AI will solve our problems, simply feed it data.

Sadly, this kind of magical thinking about machine learning does not hold up. Granted, new deep learning techniques can solve problems we previously thought unsolvable. However, building these deep learning networks and tweaking them for a specific application is still a massive amount of work.

In this article I want to peel away some of the obfuscating layers that cover machine learning. My goal is to provide a more balanced view of what exactly makes machine learning special, but also of why it is still a lot of work to create a solution that adds value.

# Example: basic data science workflow
To elaborate on this point, consider the following three data science workflows:

![ds_workflows](ds_workflows.png)

1. Multiple Linear Regression (MLR), the classic regression approach. Here we collect a small number (~10) of variables that relate to what we want to predict. Maybe we manually tweak the input variables.
2. tsfresh combined with Lasso. Here we use a feature generation package to generate hundreds of potential features, and tune the regularisation strength to select which of the features end up in the final model.
3. A simple neural network (multi-layer perceptron). Here we feed our initial data into the neural network and let it create and select its own features. In addition, the functional form no longer has to be linear, as it is in the previous two examples.
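The three workflows above can be sketched side by side. This is a minimal illustration under several assumptions, not a benchmark: the data is synthetic, scikit-learn's `PolynomialFeatures` stands in for a feature generation package such as tsfresh (which targets time series), and the values of `alpha`, `hidden_layer_sizes` and `max_iter` are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import PolynomialFeatures


def compare_workflows(seed=0):
    """Fit the three workflows on the same synthetic data set."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(400, 5))
    # Made-up target: two linear effects, one interaction, a little noise.
    y = (2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2] * X[:, 3]
         + rng.normal(scale=0.1, size=400))
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)

    # 1. MLR: we hand the model the raw variables and assume a linear form.
    mlr = LinearRegression().fit(X_train, y_train)

    # 2. Generate many candidate features, let Lasso select among them.
    #    PolynomialFeatures is only a stand-in for a feature generator.
    generator = PolynomialFeatures(degree=2, include_bias=False)
    F_train = generator.fit_transform(X_train)
    F_test = generator.transform(X_test)
    lasso = Lasso(alpha=0.1).fit(F_train, y_train)

    # 3. MLP: the network builds its own internal features from raw input.
    mlp = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                       random_state=seed).fit(X_train, y_train)

    return {
        "mlr_r2": mlr.score(X_test, y_test),
        "lasso_r2": lasso.score(F_test, y_test),
        "mlp_r2": mlp.score(X_test, y_test),
        "generated_features": F_train.shape[1],
        "selected_features": int(np.count_nonzero(lasso.coef_)),
    }


results = compare_workflows()
print(results)
```

Every step down the list removes a manual decision (which variables, which interactions) and replaces it with a setting we have to choose instead (`degree`, `alpha`, the network size).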

Note that over the course of these three approaches we outsource more and more decisions to our statistical approach. **Instead of solving the statistical problem ourselves, we build a machine that solves the problem for us**. This, however, still requires us to do a lot of thinking. For example:

- Which feature generation algorithm do I use in approach 2? 
- Which loss function do I choose in my neural network approach? 

Just because we built a machine to solve our problem does not mean the machine works automatically.
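The loss function question is a good illustration of how much such a choice matters. The sketch below uses made-up data and the simplest possible "model", a single constant prediction, and searches for the constant that minimises each loss. The function names and the candidate grid are hypothetical and exist only for this example.

```python
from statistics import mean, median

observations = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up data with one outlier


def squared_loss(prediction, data):
    """Sum of squared errors of a constant prediction."""
    return sum((y - prediction) ** 2 for y in data)


def absolute_loss(prediction, data):
    """Sum of absolute errors of a constant prediction."""
    return sum(abs(y - prediction) for y in data)


def best_constant(loss, data, candidates):
    """Return the candidate prediction with the lowest total loss."""
    return min(candidates, key=lambda c: loss(c, data))


candidates = [i / 2 for i in range(201)]  # 0.0, 0.5, ..., 100.0

best_squared = best_constant(squared_loss, observations, candidates)
best_absolute = best_constant(absolute_loss, observations, candidates)

print(best_squared, mean(observations))     # 22.0 22.0
print(best_absolute, median(observations))  # 3.0 3.0
```

The same data and the same search procedure give very different answers: the squared loss lands on the mean and is pulled towards the outlier, the absolute loss lands on the median and ignores it. The machine does the optimisation, but which of the two answers is the right one remains our decision.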

# ML as a form of meta-learning: control problems and reinforcement learning
Here we want to expand on the concept of building a machine that does the problem solving for us, using an example from reinforcement learning (RL):

- Simple control problem: a robot should get to the end of a course without hitting a wall. A very classical approach would be an [expert system](https://en.wikipedia.org/wiki/Expert_system):
    - Get a bunch of experts in a room
    - Discuss what rules the robot should follow, e.g.:
        - Use your sensor to look ahead
        - When an obstacle comes within range, stop
        - Pivot around until you find a clear angle, continue driving
    - Perform experiments and check if the robot performs well. Evaluate the rules with the experts and adapt them.
- Reinforcement learning approach:
    - Define what the robot knows about the world, i.e. the reading of its one distance sensor.
    - Also define a function that defines success for the robot, i.e. a reward function: positive if we move, negative if we hit the wall.
    - Let the robot drive around, giving it rewards appropriate to its behavior. The RL agent will slowly learn the policy: what is a good idea in a given situation.
    - The researcher needs to tweak all the settings of the RL algorithm, e.g. learning rate, architecture of the neural network (if you use one), exploration vs exploitation setting, which reward function to use, etc.
- Note that we switched from determining the rules ourselves to building a system that can find the rules on its own. Performance is no longer improved by tweaking the rules themselves, but by tweaking the learning system that then learns the rules. Effectively this is meta-learning. Meta-learning does not make our task easier, but it does allow us to solve different problems.
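The contrast between the two approaches can be sketched in a few lines. This is a toy under strong assumptions: the world is reduced to a single sensor reading (free cells ahead, 0 to 3), "getting to the end" is simplified to "keep making forward progress", and the reward values, `alpha`, `gamma` and `epsilon` are arbitrary picks. Tabular Q-learning is used here as one simple RL algorithm, without a neural network.

```python
import random

MAX_RANGE = 3                  # sensor reports 0..3 free cells ahead
ACTIONS = ("forward", "turn")


def expert_policy(distance):
    """Hand-written rule: drive while the path is clear, otherwise pivot."""
    return "forward" if distance > 0 else "turn"


def step(distance, action):
    """Toy world dynamics. Returns (next_distance, reward, crashed)."""
    if action == "turn":
        return MAX_RANGE, -0.1, False   # pivot to a clear angle, small cost
    if distance == 0:
        return 0, -10.0, True           # drove into the wall
    return distance - 1, 1.0, False     # moved one cell forward


def train(episodes=300, steps=50, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(d, a): 0.0 for d in range(MAX_RANGE + 1) for a in ACTIONS}
    for _ in range(episodes):
        distance = MAX_RANGE
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                            # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(distance, a)])   # exploit
            nxt, reward, crashed = step(distance, action)
            best_next = 0.0 if crashed else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(distance, action)] += alpha * (target - q[(distance, action)])
            if crashed:
                break
            distance = nxt
    return q


def greedy_policy(q):
    """Read the learned rule for each sensor reading off the Q-table."""
    return {d: max(ACTIONS, key=lambda a: q[(d, a)]) for d in range(MAX_RANGE + 1)}


q_table = train()
learned = greedy_policy(q_table)
print(learned)
```

In this toy world the learned policy should end up identical to `expert_policy`, but nobody wrote that rule down. Everything we can tune lives in `step` (the reward values) and in the arguments of `train`, which is exactly the shift from tweaking rules to tweaking the learning system.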

# Meta-learning in full effect: computer vision and CNNs
- Elaborate more on what different classes of models we can solve by providing a nice example. 
- Write about object recognition in images using CNN. Contrast this with an expert system approach. 


# Questions:
- Title is better, but for my taste not yet optimal. 
- Not happy with the term meta-learning. [There is a term already](https://en.wikipedia.org/wiki/Meta_learning_(computer_science)) that is not what I mean. The goal is to find something that fits with the abstraction layer idea. First we learn rules, then we learn the system that makes the rules, then we might go to AGI that learns about learning the rules. 

![tree](agi_to_rules.png)

# Interesting angles

## Increased abstraction of machine learning leads to magical thinking
- ML models become more and more abstract and hard to explain. This has a tendency to introduce an almost magical belief in what these models can do.
- A good example is Lee Sedol, world champion Go player, who says AlphaGo is creative and plays like a human.

## Classes of problems
- A couple of features, relatively uncorrelated, not much feature engineering needed, each labeled, **functional form** reasonably well understood.
- Very large number of features, much feature engineering needed. 

### Less abstract
- Regression of milk spectra to determine fat content of milk 
- Fraud detection on 12 features

### More abstract
- Classification of photos
- Very good chat bots

## Hierarchy of models
- low coef models: single or multiple lin regression
- intermediate coef models: principal component regression, lasso
- high coef models: neural networks, deep neural networks


- Traditional models: regression, lasso, etc. Order of magnitude 1-100 coefs
    - Feature generation and selection by the user
    
- Machine learning: neural nets (1000s of coefs) to deep neural nets (100,000s to millions of coefs)
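
These orders of magnitude follow from simple counting. The sketch below assumes fully connected layers with one bias per unit; the layer sizes are hypothetical and chosen only to show the scale.

```python
def linear_coefs(n_features):
    """One weight per feature plus an intercept."""
    return n_features + 1


def mlp_coefs(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the input size, the hidden sizes and the output size.
    """
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


print(linear_coefs(10))                 # 11
print(mlp_coefs([10, 64, 64, 1]))       # 4929
print(mlp_coefs([784, 512, 512, 10]))   # 669706
```

With the same 10 inputs, going from a linear regression to a small network with two hidden layers of 64 units takes us from 11 to 4929 coefficients.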
