In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ipywidgets import interactive

In [None]:
def plot_predictions_model_conditional(age, nihss, mrs):
    example = np.concatenate([normalize(nihss, train_mean[0] ,train_std[0]),
              normalize(age, train_mean[1] ,train_std[1]),
              one_hot_mrs(mrs).reshape(1, -1)
              ],
             axis=1)
  
    probs = lm.predict_proba(example)
    conditional_graphs(age, nihss, mrs, probs)
    
def conditional_graphs(age, nihss, mrs, probs):
    g = sns.FacetGrid(df, col="MRS",  hue='Outcome', legend_out=True)
    g.map(sns.scatterplot, "Age", 'NIHSS', alpha=.5)
    g.add_legend()
    
    plt.sca(g.axes[0][mrs])

    plt.axvline(age, color='r')
    plt.axhline(nihss, color='r')
    plt.show()
    
    # Bar chart of the predicted probability for each outcome class
    flat_probs = probs.reshape(-1)
    positions = np.arange(len(flat_probs))
    plt.figure(figsize=(5, 5))
    plt.bar(positions, flat_probs, linewidth=.2)
    plt.xticks(positions, list(outcome_dict.values()))
    plt.ylim([0, 1])
    plt.title('Predictions')
    plt.show()
    
    
interactive_plot = interactive(plot_predictions_model_conditional, age=(0, 100, 2), nihss=(0, 35, 1), mrs=(0, 5, 1))
output = interactive_plot.children[-1]
output.layout.height = '500px'
interactive_plot

# Foundations

## Dimensions

The first concept we're going to tackle is that of a dimension. You're probably used to thinking about dimensions in space. Space, as you've probably heard, is 3D. 

What does that mean? It means that I can describe where you are - or where this ball is - using 3 numbers. By convention, these are often called `x, y, z`. I can make those numbers really precise - adding lots of decimal places - but there's no need for more than 3 numbers.

[3D graph with ball]

What happens if I try to describe where you are using just 2 numbers? It's ambiguous. I can specify where you are in terms of longitude and latitude, but then you could be at any height. Or I can choose to specify your height and longitude, but then you could be at any latitude.
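
To make the ambiguity concrete, here is a small sketch. The coordinates and the `drop_height` helper are made up for illustration: two positions that differ only in height become indistinguishable once we keep just two of the three numbers.

```python
# Two hypothetical positions: (longitude, latitude, height in metres).
# The values are invented purely for illustration.
on_the_pavement = (-0.1276, 51.5072, 0.0)
on_the_roof = (-0.1276, 51.5072, 95.0)


def drop_height(position):
    """Keep only longitude and latitude, discarding the third number."""
    longitude, latitude, _height = position
    return (longitude, latitude)


# With all 3 numbers the two positions are different points...
print(on_the_pavement == on_the_roof)  # False

# ...but with only 2 numbers we can no longer tell them apart.
print(drop_height(on_the_pavement) == drop_height(on_the_roof))  # True
```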

[3D graph but with 2D controls]

What if I want to specify your position in time? How many numbers do I need? Just 1 - your position in time can be represented just by a single number. That's why it's called a timeline:

[Timeline]

And so if we want to specify somebody's position in space and time, we're going to need 4 dimensions - 3 for space, and 1 for time.

And actually, that's really hard to draw. Unfortunately, so are most of the spaces we're going to tackle in this piece. Because once you start looking for them, high-dimensional spaces are everywhere. Let's look at the stats for one of my childhood idols, diminutive rugby player Jason Robinson:

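| | | Mat | Start | Sub | Pts | Tries | Conv | Pens | Drop | Won | Lost | Draw | % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| overall | 2001-2007 | 56 | 52 | 4 | 150 | 30 | 0 | 0 | 0 | 39 | 17 | 0 | |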

So we have numbers for how many matches he played, how many he started, how many he appeared in as a substitute, how many points he scored, how many tries, conversions, penalties and drop goals he scored, how many games he won, lost and drew, and his winning percentage.

So what do we have? That's right, we have a 12-dimensional space.
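
As a rough sketch of what "a point in a 12-dimensional space" looks like in code, we can store the row as a plain list with one number per column. The column names and values are copied from the stats above, except the `%` value, which the row omits: here it is recomputed as wins divided by matches, which is only an assumption about how the site defines it.

```python
# Column names and values copied from the stats table above.
columns = ["Mat", "Start", "Sub", "Pts", "Tries", "Conv",
           "Pens", "Drop", "Won", "Lost", "Draw", "%"]

matches, won = 56, 39
# Assumption: the table omits the "%" value, so we recompute it as
# won / matches. The site's own definition may differ.
win_percentage = 100 * won / matches

jason_robinson = [matches, 52, 4, 150, 30, 0, 0, 0, won, 17, 0, win_percentage]

# One number per column: the player is a single point in 12-D space.
print(len(columns), len(jason_robinson))  # 12 12
for name, value in zip(columns, jason_robinson):
    print(f"{name:>5}: {value:g}")
```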

Every rugby player in the `ESPN` database can be represented as a point in this space. Here are a couple more:

[Stats for a couple more players]

If you're used to working with spreadsheets or databases, you're probably thinking: oh I get it, dimensions are basically like columns in my spreadsheet. And that's exactly right - you can think of each row in your database as a point in a high-dimensional space defined by the columns.


### Why does this matter?
Machine learning is (more or less) the business of predicting some dimensions given some others.

Let's dig into this using the examples we have so far. Some prediction tasks might be:

1. Predicting longitude based upon latitude
2. Predicting height above the earth based upon longitude and latitude
3. Predicting how many points you've scored based upon how many tries, conversions, and drop goals you've scored

These range in difficulty, from very hard to very easy. You can probably see this intuitively: 

1. If I know your latitude, I can draw a line around the globe on which you must lie. But there are lots of different possible longitudes for a given latitude. The fact that you're unlikely to be in the sea helps a little, but we can't be very precise.


2. If I know your latitude and longitude, I might actually be able to say quite a lot about how high you are above the earth. For instance, if you're in New York, then you're likely to be higher above the earth than if you're in rural Zimbabwe.


3. This one's actually trivial, because the number of points you've scored is fully determined by the tries, conversions, and drop goals you've scored (plus penalties, which the stats table also lists): each kind of score is worth a fixed number of points, so the total is a weighted sum. So there's a very simple mathematical rule we could write down to describe this relationship. But actually, we could learn it from the data too, as we'll show.
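
Here is a minimal sketch of how the third relationship could be learned from data. It is not the method used later in this piece, just an illustration with ordinary least squares. The player counts are synthetic (randomly generated), and the point values of 5, 2, 3 and 3 for a try, conversion, penalty and drop goal are the rugby union values assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed rugby union scoring values: try, conversion, penalty, drop goal.
true_values = np.array([5, 2, 3, 3])

# Synthetic "players": random counts of each kind of score (made-up data).
counts = rng.integers(0, 40, size=(200, 4))
points = counts @ true_values

# Least squares finds the weights w that minimise ||counts @ w - points||.
learned_values, *_ = np.linalg.lstsq(
    counts.astype(float), points.astype(float), rcond=None
)

# The learned weights should come out very close to 5, 2, 3 and 3.
print(np.round(learned_values, 3))
```

Because the points column is an exact weighted sum of the other columns, the fit recovers the scoring rule almost perfectly. The first two prediction tasks have no such exact rule, which is why they are so much harder.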


Is life really this simple? Is all machine learning predicting one column of a database from a bunch of the others? Almost. So let's think about Go, the ancient Chinese game that DeepMind cracked using something called Deep Reinforcement Learning.

Can we build this kind of 'database-style' representation? It's kind of tricky. We want to be able to pick the next move we make. Let's think about the ingredients we need:

- For every position on the board, whether there is a white piece there, a black piece there, or neither
- Whether that move was good or not
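
As a sketch of what one such 'database row' could look like, the snippet below flattens a board into 361 numbers and appends a label. The encoding (0 for empty, 1 for black, -1 for white), the stone positions and the `move_was_good` label are all assumptions chosen for illustration, not how any particular Go program stores its data.

```python
BOARD_SIZE = 19

# Assumed encoding: 0 = empty, 1 = black stone, -1 = white stone.
EMPTY, BLACK, WHITE = 0, 1, -1

# Start with an empty board, stored as 19 rows of 19 points.
board = [[EMPTY] * BOARD_SIZE for _ in range(BOARD_SIZE)]

# Place a couple of stones at arbitrary (row, column) positions.
board[3][3] = BLACK
board[15][15] = WHITE

# Flatten the grid row by row into one "database row" of 361 columns.
features = [point for row in board for point in row]

# Hypothetical label for the second ingredient: was the move good?
move_was_good = 1

example_row = features + [move_was_good]
print(len(features))     # 361
print(len(example_row))  # 362
```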

The ingredients are simple, but actually getting them is rather hard:

- a Go board is 19 x 19, so there are 361 positions that we need to specify. That's ok - we can have 361 columns. Unfortunately, we're going to need a lot of examples to understand what each column means. Imagine a sport you've never heard of (like, I don't know, rugby), where somebody gives you 361 numbers to describe how good each player is, along with an answer of how good that player actually is. You're going to need *a lot* of examples of players to figure out the significance of each of those columns.

- We don't actually have access to that information
