# Modeling Data

Machine learning and data science are about modeling data. **Modeling** is the representation of an idea as a mathematical equation. All machine learning methods are about training a computer to fit a model to some data. Even the fanciest terms like neural networks are simply choices of model. In this notebook we will start building our first computational model of data.

## Modeling Data is Hard!

Let's pick up where we left off in notebook 1 with fruit. We were left with a riddle. When we load images of apples and bananas

In [None]:
using Images
using Statistics  # provides `mean` on Julia 1.0 and later
apple = load("data/10_100.jpg")

In [None]:
banana = load("data/104_100.jpg")

and then compare their average value for the color red, we end up with something paradoxical:

In [None]:
apple_red_amt = mean(float.(red.(apple)))
banana_red_amt = mean(float.(red.(banana)));

In [None]:
"The average value of red in the apple is $apple_red_amt while the average value of red in the banana is $banana_red_amt."

Were you able to guess why? There are actually two reasons. One of the reasons is the background. The image of the banana has a lot more background than the apple, and the white background has a red value of 1! In our minds we ignore the background and say "the banana is bright yellow, the apple is a dark red", but a computer just has a bundle of numbers and does not know where it should be looking.
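To see the background effect in isolation, here is a minimal sketch that uses a made-up 4×4 red channel rather than the real images. The pixel values and the `is_object` mask are assumptions for illustration; with a real image we would not know the mask in advance.

```julia
using Statistics

# Hypothetical red channel: a dark-red object (0.6) on a white background (1.0).
red_channel = fill(1.0, 4, 4)
red_channel[2:3, 2:3] .= 0.6       # the object occupies the central 2x2 block

# Mask marking which pixels belong to the object rather than the background.
is_object = falses(4, 4)
is_object[2:3, 2:3] .= true

mean_all    = mean(red_channel)             # background included
mean_object = mean(red_channel[is_object])  # object pixels only

println("Mean red over the whole image: $mean_all")
println("Mean red over the object only: $mean_object")
```

Twelve of the sixteen pixels are background, so the whole-image average sits much closer to 1 than the object's own red value does.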

But the other issue is that "bright yellow" isn't a color that exists in a computer. The computer has three colors: red, green, and blue. "Bright yellow" in a computer is a mixture of red and green, and it just so happens that to get this color yellow it needs more red than the apple!

In [None]:
"The amount of red in the apple at (60,60) is $(float(red(apple[60,60]))) while the amount of red in the banana at (60,60) is $(float(red(banana[60,60])))"

This is a clear example that modeling data is hard!

**A note on string interpolation**

In the last two input cells, we *interpolated a string*. This means that when we write the string, we insert a placeholder for some value we want the string to include. When the string is evaluated, the value we want the string to include replaces the placeholder. For example, in the following string,

```julia
mystring = "The average value of red in the apple is $apple_red_amt"
```

`$apple_red_amt` is a placeholder for the value stored in the variable `apple_red_amt`. Julia knows that we want to use the value bound to the variable `apple_red_amt`, **not** the word "apple_red_amt", because of the dollar sign, `$`, that comes before `apple_red_amt`.

Execute the following code to see what the dollar sign does:

```julia
mypi = 3.14159
println("I have a variable called mypi that has a value of $mypi.")
```
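The cell that reported the red value at pixel (60,60) used a slightly different form, `$(...)`, which interpolates the result of a whole expression rather than a single variable. A small sketch of both forms follows; the variable names are made up for illustration.

```julia
n_apples = 2
n_bananas = 3

# `$name` inserts the value of a variable; `$(...)` inserts the value of an expression.
msg = "I have $n_apples apples and $n_bananas bananas, so $(n_apples + n_bananas) fruit in total."
println(msg)

# A backslash before the dollar sign keeps it as a literal character.
literal = "The placeholder \$n_apples was not replaced."
println(literal)
```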

#### Exercise

Alter and execute the code that creates `mystring` below 

```julia
apple_amt_blue = mean(float.(blue.(apple)))
mystring = "The average amount of blue in the apple is apple_amt_blue"
```

so that `println(mystring)` prints a string that reports the mean value of blue coloration in our image of an apple.

## Take some time to think about the data

Apples and bananas are very different, but how do you use the RGB brightness to tell the difference between the two? Here are some quick ideas:

- We can use the shape of the object in the image. But how can we encode ideas about shape from an array?
- We can use the size of the object in the image. But how do we calculate size?
- We can use another color, or combinations of colors, from the image. What colors?

Let's go with the last route. The banana is yellow, which is a combination of red and green, while the apple is red. This means that the color that clearly differentiates between the two is not red but green!

In [None]:
apple_green_amt = mean(float.(green.(apple)))
banana_green_amt = mean(float.(green.(banana)));

In [None]:
"The average value of green in the apple is $apple_green_amt while the average value of green in the banana is $banana_green_amt"

What we just did has fancy names: feature selection and data munging. 

**Feature selection** is the process of subsetting the data to a more relevant and informative set. We took the full image data and decided to select out the green channel. 

**Data munging** is transforming the data into a format more suitable for modeling. Here, instead of keeping the full green channel, we transformed it down to a single data point: the average amount of green.
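The two steps can be sketched on plain Julia data, without the Images package. The four pixels below are invented for illustration and stored as `(red, green, blue)` tuples, which is an assumption about layout and not how Images represents colors.

```julia
using Statistics

# Hypothetical image: four pixels, each stored as a (red, green, blue) tuple.
pixels = [(0.9, 0.8, 0.1), (0.9, 0.9, 0.2), (1.0, 1.0, 1.0), (0.8, 0.7, 0.1)]

# Feature selection: keep only the green value of every pixel.
green_channel = map(p -> p[2], pixels)

# Data munging: collapse the whole channel into a single number.
green_summary = mean(green_channel)

println("Green channel: $green_channel")
println("Average green: $green_summary")
```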

## Building a Model

We want to model the connection between "the average amount of green" and "is an apple or banana". 

<img src="data/data_flow.png" alt="Drawing" style="width: 800px;"/>

This model is a mathematical function which takes in our data and spits out a number that we will interpret as "is an apple" or "is a banana".

<img src="data/what_is_model.png" alt="Drawing" style="width: 500px;"/>


We will interpret the output of the function as "is an apple" if the output is close to zero, and "is a banana" if it's close to one. Anything in the middle is something we are unsure about. A common function for performing this kind of **classification** is the sigmoid:

$$\sigma(x;w,b) := \frac{1}{1 + \exp(-wx + b)}$$

$$ x = \text{data} $$

$$ \sigma(x;w,b) \approx 0 \implies \text{apple} $$

$$ \sigma(x;w,b) \approx 1 \implies \text{banana} $$

In our mathematical notation above, the `;` in the function separates the **data** from the **parameters**. `x` is the data and is determined from the image. The parameters, `w` and `b`, are numbers which we choose to make our function match the results it should be modeling.

Note that in the code below, we don't distinguish between data and parameters - both are just inputs to our function, σ!

In [None]:
σ(x,w,b) = 1 / (1 + exp(-w*x+b))
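If you want the code to mirror the split between data and parameters, one option is a closure that fixes `w` and `b` and returns a function of `x` alone. This is a sketch and not part of the original notebook: `make_model` is a hypothetical helper name and the parameter values are arbitrary.

```julia
σ(x, w, b) = 1 / (1 + exp(-w*x + b))

# Hypothetical helper: fix the parameters and return a function of the data only.
make_model(w, b) = x -> σ(x, w, b)

model = make_model(10.0, 5.0)    # arbitrary parameter values for illustration

# Both calls run the same computation; `model` simply has w and b baked in.
println(model(0.5))
println(σ(0.5, 10.0, 5.0))
```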

What we want is that when we give σ the average green for the apple, `x=0.33820274`, it should give something close to 0, meaning apple. And when we give σ the average green for the banana, `x=0.88079727`, it should output something close to 1, meaning banana.

We can understand how our choice of `w` and `b` affects our model by watching how the plot changes as we vary them:

In [None]:
using Plots; gr()
using Interact
@manipulate for w in -10:0.01:30, b in 0:0.1:30
    plot(x->σ(x,w,b),0,1,label="Model",legend = :topleft,lw=3)
    scatter!([apple_green_amt],[σ(apple_green_amt,w,b)],label="Apple")
    scatter!([banana_green_amt],[σ(banana_green_amt,w,b)],label="Banana")
end

Notice that the two parameters do two very different things. The **weight** `w` determines how fast the transition between 0 and 1 occurs. It encodes how trustworthy we think our data actually is, and over what range of inputs the output falls between 0 and 1, where we would call a point "unsure". The **bias** `b` encodes where on the x-axis the switch should take place. It can be seen as shifting the function left or right. We'll come to understand these *parameters* more in notebook 5 - "Tools - Function parameters".
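With the formula used in this notebook, σ equals 0.5 exactly when `-w*x + b = 0`, that is at `x = b / w`, and a larger `w` makes the curve steeper around that point. The sketch below checks both statements numerically. The `_demo` parameter values are arbitrary and chosen only for illustration.

```julia
σ(x, w, b) = 1 / (1 + exp(-w*x + b))

# Midpoint: the output is 0.5 where -w*x + b == 0, i.e. at x = b / w.
w_demo, b_demo = 20.0, 10.0      # arbitrary values for illustration
x_mid = b_demo / w_demo
println("σ at the midpoint x = $x_mid: ", σ(x_mid, w_demo, b_demo))

# Steepness: doubling both w and b keeps the midpoint but sharpens the switch.
gentle = σ(x_mid + 0.1, 20.0, 10.0)   # here -w*x + b is about -2
sharp  = σ(x_mid + 0.1, 40.0, 20.0)   # here -w*x + b is about -4
println("Slightly right of the midpoint: gentle = $gentle, sharp = $sharp")
```

The sharper curve is already much closer to 1 at the same distance from the midpoint, which is what moving the `w` slider shows in the plot.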

Here are some parameter choices that work well:

In [None]:
w = 25.58; b = 15.6
plot(x->σ(x,w,b),0,1,label="Model",legend = :topleft,lw=3)
scatter!([apple_green_amt],[σ(apple_green_amt,w,b)],label="Apple")
scatter!([banana_green_amt],[σ(banana_green_amt,w,b)],label="Banana")
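The plot can be backed up with numbers. The sketch below evaluates the model at the two average green values quoted earlier and turns the output into a label using a 0.5 cutoff. The `classify` helper and the cutoff are assumptions made for illustration; the notebook itself only says that outputs near 0 mean apple and outputs near 1 mean banana.

```julia
σ(x, w, b) = 1 / (1 + exp(-w*x + b))

w, b = 25.58, 15.6           # the parameter choices from the cell above
apple_x  = 0.33820274        # average green of the apple, quoted earlier
banana_x = 0.88079727        # average green of the banana, quoted earlier

# Hypothetical helper: label an input by which side of 0.5 the output falls on.
classify(x) = σ(x, w, b) < 0.5 ? "apple" : "banana"

println("Model output for the apple:  ", σ(apple_x, w, b))
println("Model output for the banana: ", σ(banana_x, w, b))
println("Labels: ", classify(apple_x), ", ", classify(banana_x))
```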

Once we have a model, we have a computational representation for how to choose between "apple" and "banana". Let's pull in some new images and see what our model says!

In [None]:
apple2 = load("data/107_100.jpg")

In [None]:
green_amt = mean(float.(green.(apple2)))
@show green_amt
scatter!([green_amt],[σ(green_amt,w,b)],label="New Apple")

Our model successfully says that our new image is an apple! Pat yourself on the back: you've actually trained your first neural net.

#### Exercise

Load the image of a banana in `data/8_100.jpg` as `mybanana`. Edit the code below to calculate the amount of green in `mybanana` and to overlay data for this image with the existing model and data points.

In [None]:
mybanana = load("data/8_100.jpg")
# mybanana_green_amt = 
# scatter!(label="my banana")

## Closing Remarks: Bigger Models, More Data, More Accuracy

That last apple should start making you think: not all apples are that red and some are quite yellow. "Redness" is one attribute of being an apple, but it isn't the whole story. What we need to do is incorporate more ideas into our model by allowing more inputs. However, more inputs would mean more parameters to tweak. Also, we would like to have the computer start "learning" on its own. How do we take the next step?

The first thing to think about is, if you wanted to incorporate more data into the model, how would you change the sigmoid function? Play around with some ideas. But also, start thinking about how you chose parameters. What process did you follow to finally end up at good parameters? These two problems (working with models with more data and automatically choosing parameters) are the last remaining steps to understanding deep learning.
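As one possible starting point, and certainly not the only answer, the sketch below swaps the single product `w*x` for a dot product so the model can take several inputs, and defines a squared-error score that could be used to compare parameter choices. The names `σ_multi` and `score`, the second input feature, and the alternative parameter values are all assumptions made for illustration.

```julia
using LinearAlgebra: dot

# Same sign convention as the single-input σ, with w*x replaced by a dot product.
σ_multi(x, w, b) = 1 / (1 + exp(-dot(w, x) + b))

# Hypothetical inputs: [average green, a second feature] for each fruit.
apple_x  = [0.33820274, 0.5]   # 0.5 is a placeholder, not a measured value
banana_x = [0.88079727, 0.9]   # 0.9 is a placeholder, not a measured value

# Squared error against the targets (0 for the apple, 1 for the banana); lower is better.
score(w, b) = (σ_multi(apple_x, w, b) - 0)^2 + (σ_multi(banana_x, w, b) - 1)^2

hand_tuned = score([25.58, 0.0], 15.6)   # earlier parameters, second input ignored
arbitrary  = score([1.0, 1.0], 0.0)      # an arbitrary alternative for comparison
println("hand-tuned score = $hand_tuned, arbitrary score = $arbitrary")
```

A score like this gives a single number to minimize, which is one way to frame the question of how you settled on your own parameters.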