## Modeling Data Exercise

In the world population notebooks, we used different mathematical models to describe existing data.   The advantage of this approach is that if we can "fit" the model to the data, we will have a better understanding of the nature of the data.

<br>  Here we'll try to do the same thing with a new data set.  We'll start with the functions from the world population notebooks, which allow us to create linear, proportional, and quadratic models.  We're going to generalize these here, so we can look at data sets other than world population.

In [None]:
import pandas as pd
import numpy as np

# A run_simulation function that accepts different change functions
def run_simulation(system, change_func):
    results = pd.Series([],dtype=object)
    results[system['t_0']] = system['p_0']

    for t in range(system['t_0'], system['t_end']):
        pop = results[t]
        growth = change_func(t, pop, system)
        results[t+1] = results[t] + growth
    return results

Here are the three change functions.  Notice the similarity in structure, and notice the parameters that each function uses:

In [None]:
# Three different change functions
def change_func_lin(t,pop, system):
    growth = system['annual_growth']
    return growth

def change_func_prop(t, pop, system):
    growth = system['alpha'] * pop
    return growth

def change_func_quad_rk(t, pop, system):
    growth = (system['r'] * pop)*(1 - pop/system['K'])
    return growth

Let's run our simulations using arbitrary values, just to see how they work.  We'll put all of the possible parameters in our system dictionary, just in case we want to use them:

In [None]:
# Initial values and time span
t_0 = 100; p_0 = 100.0; t_end = 500

# Linear parameter
annual_growth = 3.0

# Proportional parameter
alpha = 0.01

# Quadratic parameters
r = 0.025
K = 200.0

# Create the system
system = dict(t_0=t_0, p_0=p_0, t_end=t_end,
              alpha=alpha, annual_growth=annual_growth,
              r=r, K=K)

Now let's run the simulation.  Play with the values and the different parameters to get different behaviors for the models.  What happens when you change the signs of the linear and proportional parameters?  How can you get the quadratic model to reach an equilibrium below its starting point?

In [None]:
results = run_simulation(system, change_func_lin)
results.plot(label='Modeled function',title='Simulated Mathematical Model',
             legend=True);
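
As a quick numerical check, here is a minimal sketch that runs all three change functions through the same loop and prints where each model ends up.  The functions are repeated so the cell runs on its own; using `dtype=float` for the results and the name `final_values` are assumptions made for this sketch, not part of the original notebook.

```python
import pandas as pd

def run_simulation(system, change_func):
    # Same update rule as above; float dtype is an assumption of this sketch
    results = pd.Series([], dtype=float)
    results[system['t_0']] = system['p_0']
    for t in range(system['t_0'], system['t_end']):
        results[t + 1] = results[t] + change_func(t, results[t], system)
    return results

def change_func_lin(t, pop, system):
    return system['annual_growth']

def change_func_prop(t, pop, system):
    return system['alpha'] * pop

def change_func_quad_rk(t, pop, system):
    return (system['r'] * pop) * (1 - pop / system['K'])

# Same illustrative parameter values as the system above
system = dict(t_0=100, p_0=100.0, t_end=500,
              alpha=0.01, annual_growth=3.0, r=0.025, K=200.0)

models = {'linear': change_func_lin,
          'proportional': change_func_prop,
          'quadratic': change_func_quad_rk}

# Store the last value of each simulated series
final_values = {}
for name, change_func in models.items():
    results = run_simulation(system, change_func)
    final_values[name] = results.iloc[-1]
    print(f'{name}: final value = {final_values[name]:.1f}')
```

With these values the linear model adds the same amount every step, the proportional model grows faster and faster, and the quadratic model settles close to `K`.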


## Part 1

Now let's look at some data.  Below is data that represents the population of pumpkinseed fish in Duck Creek in Davenport over the course of a couple of decades (Not really, but I wish.  Prettiest fish in the world).  

<br>

<img src = https://github.com/MAugspurger/ModSimPy_MAugs/raw/main/Images_and_Data/Images/1_3/pumpkinseed.PNG width = 600>

<br>

We'll use the built-in function `len()` to count our data points:

In [None]:
pop = [2749, 3756, 3122, 1843, 2010, 2821, 1174, 1284, 1287,
        2339, 1177, 962, 1176, 2149, 1404, 969, 1237, 1615, 1201]
num_years = len(pop)
len(pop)

To make this easier to plot, let's put it in a series, and then index it according to years.  

<br> Just for variety, we'll use the NumPy function `arange()` instead of `linspace()`.  `arange()` asks for a start, a stop (which is not included in the result), and a step size, so we don't have to count the number of points as we do with `linspace()`:

In [None]:
years = np.arange(2000, 2000+num_years, 1)
len(years)
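
To see how the two functions relate, here is a small sketch that builds the same array of years both ways.  The name `num_points` is a stand-in chosen for this example; it is set to 19 to match the number of data points above.

```python
import numpy as np

num_points = 19  # stand-in for num_years; matches the length of the data above

# arange: start, stop (excluded), step
years_arange = np.arange(2000, 2000 + num_points, 1)

# linspace: start, stop (included), number of points
years_linspace = np.linspace(2000, 2000 + num_points - 1, num_points)

print(years_arange[0], years_arange[-1], len(years_arange))
print(np.array_equal(years_arange, years_linspace))
```

Notice that `linspace()` needs the last year and the point count worked out ahead of time, while `arange()` only needs the step size.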


Since they're the same length, we can make them into a Series (with the years as an index) and plot the data:

In [None]:
pumpkinseed = pd.Series(data=pop, index=years)
pumpkinseed.name = 'Number of Pumpkinseed'
pumpkinseed.index.name = 'Year'
pumpkinseed.plot(title='Pumpkinseed Population', ylabel=pumpkinseed.name,
                 xlim=[2000,2020], xticks=[2000, 2005, 2010, 2015, 2020]);
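
Because the Series is indexed by year, we can look values up by year rather than by position.  The sketch below rebuilds the Series so that it runs on its own (using `range()` for the index is just a shortcut for this example) and shows a few label-based lookups.

```python
import pandas as pd

pop = [2749, 3756, 3122, 1843, 2010, 2821, 1174, 1284, 1287,
       2339, 1177, 962, 1176, 2149, 1404, 969, 1237, 1615, 1201]
years = range(2000, 2000 + len(pop))
pumpkinseed = pd.Series(data=pop, index=years, name='Number of Pumpkinseed')

# Look up a single year by its label
print(pumpkinseed.loc[2005])

# Find the year with the largest population
print(pumpkinseed.idxmax(), pumpkinseed.max())

# Slice by label: with .loc, both endpoints are included
print(pumpkinseed.loc[2010:2012])
```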

Ok, now it's your job.   Using the model simulations above, create a linear model that approximates this data in the cell below.  Notice that you don't really need to write much code: you mostly need to pull code from our generalized simulation at the beginning of this notebook.

In [None]:
# Create a new system



# Run the simulation and plot both the model and the data




Now find the relative error of your linear model and print its mean (look back at Notebook 1.3.1 to remind yourself how to do this).  Then use this error to adjust and correct your parameters:
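
As a reminder of the general idea, here is a minimal sketch on made-up numbers.  Notebook 1.3.1 is not reproduced here, so the definitions are an assumption: absolute error is taken as the size of the difference between model and data, and relative error as the absolute error divided by the data.  The names `data` and `model` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy values, not the pumpkinseed data
data = pd.Series([100.0, 200.0, 400.0], index=[2000, 2001, 2002])
model = pd.Series([110.0, 180.0, 400.0], index=[2000, 2001, 2002])

# Assumed definitions: |model - data|, then divided by the data
abs_error = (model - data).abs()
rel_error = abs_error / data
mean_rel_error = rel_error.mean()

print(f'Mean relative error: {mean_rel_error:.1%}')
```

Subtracting two Series matches values by index label, so the model and the data need to cover the same years for every point to be compared.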

In [None]:
# Define absolute and relative error



Now do the same with a proportional and quadratic model.  Notice that with a negative `alpha`, the proportional model decays toward an equilibrium at zero.  Can you make a change to `change_func_prop` to make it level out at a different point?  Hint: you'll need to add a second parameter to the equation.

In [None]:
# Proportional model
# Create a new system, run the simulation, and plot (just as above)


In [None]:
# Define absolute and relative error for the proportional model



In [None]:
# Quadratic model
# Create a new system, run the simulation, and plot


In [None]:
# Define absolute and relative error for the quadratic model



✅ ✅  Which model allows you to produce the smallest error?  What makes it difficult to get the error below 25-30%?

✅ ✅ Answer here.

## Part 2

The data has some clear oscillations.  Can we capture that?   Try to create a new change function that uses `np.sin()` or `np.cos()` to cause an oscillation in the population.   You're going to need three parameters for this function: one that controls the frequency of the oscillation, one that controls the amplitude (i.e. the "height") of the oscillations, and one that controls the phase (i.e. the right/left "shift" of the oscillations).  

<br> As always, start simple with just the frequency, then add amplitude, and then see if you can figure out how to include phase.
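
If the three parameters are unfamiliar, this sketch shows how each one affects a plain sine wave.  It is not the change function itself: the helper `sinusoid` and its argument names are hypothetical, chosen for illustration, and you will still need to decide how to read the parameters from `system`.

```python
import numpy as np

def sinusoid(t, amplitude, frequency, phase):
    """Hypothetical helper: amplitude * sin(frequency * t + phase)."""
    return amplitude * np.sin(frequency * t + phase)

t = np.arange(0, 8)

# frequency = pi/4 radians per step gives one full cycle every 8 steps
values = sinusoid(t, amplitude=50.0, frequency=np.pi / 4, phase=0.0)
print(np.round(values, 1))

# A phase of pi/2 shifts the wave so that it starts at its peak
shifted = sinusoid(t, amplitude=50.0, frequency=np.pi / 4, phase=np.pi / 2)
print(np.round(shifted, 1))
```

The values swing between positive and negative and never exceed the amplitude in size, which is the behavior we want from the growth term.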

In [None]:
# Define the new sinusoidal change function
def change_func_sinusoid(t, pop, system):


Test the function to see that you sometimes get negative growth and sometimes positive growth, depending only on the value of $t$:

In [None]:
growth = change_func_sinusoid(0, 10, system)
print(growth)

Now use `run_simulation` with the sinusoidal change function:

In [None]:
# Sinusoidal model
# Create a new system, run the simulation, and plot (just as above)




## Part 3

Finally, notice that the data for the pumpkinseed population both oscillates *and* declines.  For *extra special bonus points*, see if you can come up with a change function whose `growth` can duplicate this complex behavior.  You can use either the linear model or the proportional (exponential) model as the basis of the new function.

In [None]:
# Define the new complex change function
def change_func_complex(t, pop, system):


In [None]:
# Complex model
# Create a new system, run the simulation, and plot (just as above)



In [None]:
# Define absolute and relative error

