# Using ML techniques to infer a multiplier

### Scenario

You discover that the number of apple seeds is directly tied to the overall weight of the fruit: the seed count just needs to be multiplied by some fixed number. Create a model that predicts the weight of the fruit from the number of seeds. Use an iterative guessing approach to estimate the value of ***the multiplier.***

### We use only a single package for the model and one more to load in the exact value of pi
* numpy - this package handles matrices (or, more technically, arrays, which may have more dimensions than a matrix)
* math - provides mathematical functions

In [1]:
import numpy as np
import math

## Part 1 - Set up data

### Set the multiplier
This will be the value the seed count is multiplied by, and the number we're trying to discover
* Declare a variable named "actual_multiplier" and set it to the value of pi using the math module

### Collect some apple seeds
Collect some samples of apple seeds, and measure the associated fruits
* To start we'll use 10 samples with different numbers of seeds in each sample. Here we'll use numbers 1, 2, ..., 9, 10
    * Make a numpy array named seed_count_array with these values
* For obvious reasons, we will not be measuring any apples right now. We're going to cheat a bit and say that the weights of the associated apples are the number of seeds times our multiplier value, plus noise
    * Make a numpy array called apple_weight_array that is length 10, and equal to the seed_count_array times the actual_multiplier
    * Use the np.random.random method to create an array of length 10, and name it noise_array
    * Add the values from the elements of the noise array to the elements of apple_weight_array
* Print out the actual_multiplier, seed_count_array, and apple_weight_array
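
The setup steps above can be sketched as follows. This is one possible solution rather than the only one; it uses np.arange to build the seed counts, which is a choice made here and not something the exercise requires.

```python
import math
import numpy as np

# The value we will later try to recover.
actual_multiplier = math.pi

# Seed counts 1, 2, ..., 10 (np.arange excludes the stop value, hence 11).
seed_count_array = np.arange(1, 11)

# Idealised weights: seed count times the multiplier.
apple_weight_array = seed_count_array * actual_multiplier

# Uniform noise in [0.0, 1.0), one value per sample.
noise_array = np.random.random(10)

# Element-wise addition gives the "measured" weights.
apple_weight_array = apple_weight_array + noise_array

print(actual_multiplier)
print(seed_count_array)
print(apple_weight_array)
```

Because the noise is random, the printed weights will differ on every run.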

### Bonus
* Look up np.random.random to find which values it can return
* Use a plotting package (e.g. matplotlib) to plot the apple weight vs seed count
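
For the first bonus item, a quick empirical check can back up what the documentation says. Both np.random.random and np.random.rand draw uniformly from the half-open interval [0.0, 1.0), so the noise never reaches 1 and is never negative. The sample size below is an arbitrary choice.

```python
import numpy as np

# Draw many samples and inspect the extremes.
samples = np.random.random(100_000)

# The minimum can be 0.0, but the maximum is always strictly below 1.0.
print(samples.min(), samples.max())
```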

### Sidenote - How contrived is this exercise?

This is a toy problem where we know the answer before we start. The point of this example is to understand the overall process of iterative improvement. Relationships typically modelled with ML are more complicated than a simple multiplier, but surprisingly little changes for more complex problems. Here we're modelling a single parameter; many models used in biology have tens of millions of parameters, yet they are built out of many simple calculations like the one in our exercise. The math is more advanced (though maybe not as much as you might think) and beyond our scope. It wouldn't serve much practical use here anyway, since these calculations are never done by hand, and a comprehensive understanding of them is not strictly necessary unless you are researching novel algorithm designs.

## Part 2 - Build out a training loop

### Steps
1. Use nested for-loops to 1) make a random prediction for each sample and 2) go through 10 epochs
    * Write a prediction function (named predict_multiplier) for this that guesses a value from -100 to 100
2. Write a loss function (named calculate_loss) which finds the difference between the predicted weight and the measured weight
    * N.B. the multiplier is the parameter we are trying to guess; the prediction is the multiplier times the number of seeds
3. Create a variable that keeps track of the best (lowest) loss value, call it best_loss
    * Make a list called best_param_list. Whenever a new best loss is found, append *another* list to it containing the epoch number, loss, predicted multiplier, actual multiplier, predicted target, actual target, and number of seeds (making a list of lists)
    * Try increasing the number of epochs (make sure you comment out the print statements though!)
4. Update the predict function to take in the previous step’s prediction and loss to make the output more accurate
    * Add a step before your loop to initialise these values
5. Add a learning rate
    * Multiply the loss by a learning rate of 0.001
    * Try a few different learning rates and compare how they affect the predictions
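
Putting steps 1 to 5 together, the finished loop might look like the sketch below. Several details are assumptions rather than part of the exercise: the loss is taken as measured minus predicted (so a positive loss means the guess was too low), the best loss is compared by absolute value, the random generator is seeded for repeatability, and 500 epochs are used because 10 epochs are too few to converge at a learning rate of 0.001.

```python
import math
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so runs are repeatable (assumption)

# Data as described in Part 1.
actual_multiplier = math.pi
seed_count_array = np.arange(1, 11)
apple_weight_array = seed_count_array * actual_multiplier + rng.random(10)


def predict_multiplier(previous_multiplier, previous_loss, learning_rate):
    """Nudge the previous guess by the previous loss, scaled by the learning rate."""
    return previous_multiplier + learning_rate * previous_loss


def calculate_loss(predicted_weight, measured_weight):
    """Signed error: positive when the prediction is too low (assumed convention)."""
    return measured_weight - predicted_weight


learning_rate = 0.001
epochs = 500  # assumption: more than the 10 epochs suggested in step 1

# Step 4: initialise the values that predict_multiplier feeds on.
multiplier = rng.uniform(-100, 100)  # the first guess is random, as in step 1
loss = 0.0

# Step 3: track the lowest absolute loss and the parameters that produced it.
best_loss = float("inf")
best_param_list = []

for epoch in range(epochs):
    for seeds, weight in zip(seed_count_array, apple_weight_array):
        multiplier = predict_multiplier(multiplier, loss, learning_rate)
        predicted_weight = multiplier * seeds
        loss = calculate_loss(predicted_weight, weight)
        if abs(loss) < best_loss:
            best_loss = abs(loss)
            best_param_list.append(
                [epoch, loss, multiplier, actual_multiplier,
                 predicted_weight, weight, seeds]
            )

print(f"final multiplier: {multiplier:.3f} (actual: {actual_multiplier:.3f})")
print(f"best |loss|: {best_loss:.4f} after {len(best_param_list)} improvements")
```

The final multiplier settles slightly above pi because the noise is always non-negative, so the measured weights are biased upwards. Raising the learning rate speeds up convergence, but values that are too large make the guesses overshoot and oscillate.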



In [1]:
#Main loop
