# Computer Representations of Data

At the core of data science and machine learning is data. But how do computers represent data? Before we get into the mathematics, let's find out exactly what an "artificial intelligence" has at its disposal to learn from.

## Data is Represented as Arrays

Let's take a look at some fruit. Using the Images.jl library, we can load some images.

In [None]:
using Images
apple = load("data/10_100.jpg")

In [None]:
banana = load("data/104_100.jpg")

Here we have images of an apple and a banana. We would eventually like to build a program that can automatically distinguish between the two. However, the computer doesn't "see" an apple or a banana; it sees only numbers. An image is encoded in something called an array, which is like a container with "boxes" or "slots" for individual pieces of data:

![data/array_cartoon.png](attachment:array_cartoon.png)

An array is a collection of values laid out in a grid, where each value is reached by its position. For example, our `apple` is a 100x100 array of pixels:

In [None]:
size(apple)
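To make "an image is an array" concrete without loading a file, here is a minimal sketch that builds a tiny 2×3 image by hand. The colors are illustrative, not taken from the fruit photos. Each slot holds an `RGB` color, the same kind of element a loaded image holds (a loaded JPEG typically uses 8-bit channels rather than `Float64`).

```julia
using Images  # re-exports the RGB color type

# A 2×3 "image": 2 rows and 3 columns, every slot starting out white.
tiny = fill(RGB(1.0, 1.0, 1.0), 2, 3)

# Paint one slot (row 1, column 2) a strong red.
tiny[1, 2] = RGB(0.8, 0.1, 0.1)

println(size(tiny))    # (rows, columns)
println(length(tiny))  # total number of pixels
println(eltype(tiny))  # the color type stored in every slot
```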

We can read the value at any location with brackets: `apple[i,j]` returns the pixel in row `i` and column `j`. Let's get the pixel at `(40,60)`:

In [None]:
apple[40,60]
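Julia counts rows and columns from 1, and the first index selects the row, so `apple[40,60]` is row 40, column 60. A minimal sketch on a plain numeric matrix (values chosen only for illustration) shows the convention, along with `:` for taking a whole row or column.

```julia
# A plain 2×3 matrix of numbers; Julia indexes it as [row, column].
A = [10 20 30;
     40 50 60]

println(A[2, 3])  # row 2, column 3: 60
println(A[1, :])  # the whole first row: [10, 20, 30]
println(A[:, 2])  # the whole second column: [20, 50]
```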

## Colors as Numbers

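At each point of the image, we get a color. Computers commonly store colors in RGB format: each pixel holds a red, a green, and a blue value between 0 and 1, where 0 means none of that color and 1 means its brightest form. For 8-bit images like these, Images.jl typically stores each value as a fixed-point number, so we call `float` to turn it into an ordinary floating-point number. For example, we can pull out the `red`, `green`, and `blue` values like this: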

In [None]:
red_value = float(red(apple[40,60]))
green_value = float(green(apple[40,60]))
blue_value = float(blue(apple[40,60]))
print("The RGB values are ($red_value, $green_value, $blue_value)")

Since the red value is high and the others are low, the apple is strongly red at pixel `(40,60)`. If we do the same at one of the corners of the picture:

In [None]:
red_value = float(red(apple[1,1]))
green_value = float(green(apple[1,1]))
blue_value = float(blue(apple[1,1]))
print("The RGB values are ($red_value, $green_value, $blue_value)")

we see that all three channels are bright (close to 1), and equally bright red, green, and blue together make white.
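Two details of this section can be checked on hand-made pixels. First, the `float` calls are needed because Images.jl typically loads an 8-bit JPEG as `RGB{N0f8}`, where each channel is stored as an integer from 0 to 255 and read as that integer divided by 255. Second, white is simply all three channels at full brightness. This is a sketch under assumptions: the channel values are illustrative, and `is_near_white` with its tolerance `tol = 0.1` is a hypothetical helper, not part of Images.jl.

```julia
using Images  # re-exports RGB, N0f8, and the red/green/blue accessors

# A hand-made pixel stored the way an 8-bit image stores it (values are illustrative).
px = RGB{N0f8}(0.8, 0.2, 0.2)
r = red(px)                # an N0f8 channel value
println(reinterpret(r))    # raw stored byte: round(0.8 * 255) = 204
println(float(r))          # the same value as an ordinary float, about 0.8

# Hypothetical helper: a pixel is "near white" when every channel is within `tol` of 1.
is_near_white(c; tol = 0.1) =
    all(v -> float(v) >= 1 - tol, (red(c), green(c), blue(c)))

println(is_near_white(RGB(1.0, 1.0, 1.0)))     # pure white
println(is_near_white(RGB(0.97, 0.95, 0.99)))  # slightly off-white
println(is_near_white(px))                     # strongly red, so not white
```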

## Working on an Image as a Whole

In Julia, to apply a function to every element of an array, we add a `.` after the function name to *broadcast* it. The following gives us the `red` value at every point in the image:

In [None]:
float.(red.(apple))
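Broadcasting is shorthand for a loop over every element. As a minimal sketch on an illustrative 2×2 image (the pixel colors are made up), the broadcast expression and an explicit loop produce the same matrix of red values, with the same shape as the image.

```julia
using Images  # re-exports RGB and red

# Illustrative 2×2 image: all white except one red pixel.
img = fill(RGB(1.0, 1.0, 1.0), 2, 2)
img[1, 1] = RGB(0.8, 0.1, 0.1)

# Broadcasting: apply `red`, then `float`, to every pixel.
reds_broadcast = float.(red.(img))

# The same result with an explicit loop over every index.
reds_loop = similar(reds_broadcast)
for idx in eachindex(img)
    reds_loop[idx] = float(red(img[idx]))
end

println(reds_broadcast == reds_loop)
println(size(reds_broadcast))  # same shape as the image
```

Julia fuses nested dot calls such as `float.(red.(img))` into a single loop, so no intermediate array of `red` values is allocated.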

Julia's [mathematical standard library](https://docs.julialang.org/en/stable/stdlib/math/#Mathematics-1) has many mathematical functions built in, and the `Statistics` standard library provides the `mean` function, which computes the average value. If we apply `mean` to our apple:

In [None]:
using Statistics  # provides `mean` (not loaded by default in Julia 1.0 and later)
mean(float.(red.(apple)))

we see that the average amount of red in the image lies between the amount of red in the apple and the amount of red in the white background, because the mean blends every pixel, apple and background alike.
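`mean` adds up every value and divides by how many there are, so object pixels and background pixels count equally. A minimal sketch on an illustrative 2×2 image, whose top row is a red "apple" and bottom row is white background (made-up values), shows the average landing between the two.

```julia
using Images      # re-exports RGB and red
using Statistics  # provides `mean`

# Illustrative image: row 1 is red "apple", row 2 is white background.
img = [i == 1 ? RGB(0.8, 0.1, 0.1) : RGB(1.0, 1.0, 1.0) for i in 1:2, j in 1:2]

reds = float.(red.(img))
avg_red = mean(reds)

println(avg_red)                          # between 0.8 (apple) and 1.0 (background)
println(sum(reds) / length(reds) ≈ avg_red)  # mean is sum divided by count
```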

*Somehow we need to teach a computer to use this information about a picture to recognize there's an apple there!*

## A Quick Riddle

Here's a quick riddle. Let's check the average value of red in the image of the banana.

In [None]:
mean(float.(red.(banana)))

Oh no, that's more red than our apple? This isn't a mistake! Before you move on to the next exercise, examine the images of the apple and the banana very carefully and see if you can explain why this is expected.

#### Exercise

What is the average value of blue in the banana?

In [None]:
mean(float.(blue.(banana)))

#### Exercise

Does the banana have more blue or more green?

In [None]:
mean(float.(green.(banana)))
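To answer comparison questions like this one in a single step, we can compute all three channel averages together. This is a minimal sketch: `channel_means` is a hypothetical helper (not part of Images.jl), and it is run here on a made-up, uniformly yellowish 2×2 image rather than on the banana photo.

```julia
using Images      # re-exports RGB and the red/green/blue accessors
using Statistics  # provides `mean`

# Hypothetical helper: the average of each color channel over the whole image.
channel_means(img) = (r = mean(float.(red.(img))),
                      g = mean(float.(green.(img))),
                      b = mean(float.(blue.(img))))

# Illustrative 2×2 image of a yellowish object; the values are made up.
yellowish = fill(RGB(0.9, 0.8, 0.2), 2, 2)

m = channel_means(yellowish)
println(m)
println(m.g > m.b ? "more green" : "more blue")
```

Calling `channel_means(banana)` applies the same computation to the real image.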