# Population Modelling

<font size=3>The United States conducts a census every 10 years, which breaks the population down by state at the time of the count. This report fits two models, the exponential growth model and the logistic growth model, to the census data in order to predict the populations of different states.</font>

In [9]:
#Imports of relevant libraries and initializing offline notebooks

#Graphing libraries
import plotly
import plotly.graph_objs as go

#Number libraries
import numpy as np
from scipy.optimize import curve_fit

#Set notebook mode to offline
plotly.offline.init_notebook_mode(connected=True)
# plotly.__version__

## Exponential Growth

A model of exponential population growth has no notion of a limited environment or constrained resources.

The exponential growth model makes one assumption about how the population changes:

* The rate of growth of the population is proportional to the size of the population.

This means that only a few quantities have to be taken into account:

* $t = time$ (independent variable)
* $P = population$ (dependent variable)
* $k =  proportionality \ parameter$ (parameter) between rate of growth of the population and size of the population


$$
\begin{align}
\therefore \frac{dP}{dt} &= kP \\
\implies \frac{1}{P}\,dP &= k \,dt \\
\implies \int \frac{1}{P} \,dP &= \int k \,dt \\
\implies \ln P &= kt + c \\
\implies  P &= e^{kt + c} 
\end{align}
$$

Writing $C = e^{c}$, this becomes $P = Ce^{kt}$, so $C$ is the population at $t = 0$.

The parameters $k$ and $C$ have to be determined so that the function can be fitted to the data for predictive modelling.

In [10]:
#Exponential growth model P = a*exp(k*t); the argument a plays the role of C above
def exponentialGrowth(t,k,a):
    return a*np.exp(k*t)
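
One common way to get a first estimate of $k$ and $C$ (a sketch, not necessarily how the fits later in this report are produced) is to take logarithms: $\ln P = \ln C + kt$ is a straight line in $t$, so an ordinary linear fit to $\ln P$ gives $k$ as the slope and $\ln C$ as the intercept. The example below applies this to the North Carolina series from this report. The names `t`, `population`, `k_est` and `C_est` are illustrative. Note that this minimises errors in $\ln P$ rather than in $P$, so the result will generally differ a little from a direct least-squares fit.

```python
import numpy as np

# North Carolina census populations (thousands), 1790-2000, from this report
population = np.array([394, 478, 556, 639, 738, 753, 869, 993, 1071, 1400, 1618,
                       1893, 2206, 2559, 3170, 3572, 4062, 4556, 5084, 5880, 6628, 8049])

# Time in decades since 1790, one index per census
t = np.arange(len(population))

# ln P = ln C + k t, so a degree-1 fit to ln P returns [slope, intercept] = [k, ln C]
k_est, lnC_est = np.polyfit(t, np.log(population), 1)
C_est = np.exp(lnC_est)

print("k estimate (per decade):", k_est)
print("C estimate (thousands):", C_est)
```

Estimates like these can also serve as starting values for a nonlinear fit.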

## Logistic Growth

Logistic growth amends the exponential growth model to account for an environment with constrained resources.

The logistic growth model makes the following assumptions about how the population changes:

* If the population is small, the rate of growth of the population is proportional to its size.
* If the population is too large to be supported by its environment and resources, the population will decrease.

The quantities it takes into account are:

* $t = time$ (independent variable)
* $P = population$ (dependent variable)
* $k =  proportionality \ parameter$ (parameter) between rate of growth of the population and size of the population
* $N = carrying \ capacity$  (parameter) 

$$
\begin{align}
\therefore \frac{dP}{dt} &= kP(1-\frac{P}{N}) \\
\implies k\,dt &= \frac{1}{P(1-\frac{P}{N})}\,dP\\
\implies \int k\,dt &= \int \frac{1}{P}\,dP + \int \frac{1}{N}\cdot\frac{1}{1-\frac{P}{N}}\,dP\\
\implies kt + c &= \ln(P) - \ln(1-\frac{P}{N})\\
\implies  Ce^{kt} &= \frac{P}{1-\frac{P}{N}}\\
\implies  Ce^{kt} &= P + \frac{PCe^{kt}}{N}\\
\implies P &= (1+\frac{Ce^{kt}}{N})^{-1}(Ce^{kt})
\end{align}
$$

The parameters $C$, $k$ and $N$ have to be chosen so that the model fits the data as closely as possible.

In [11]:
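#Closed-form logistic solution P = C*exp(k*t) / (1 + C*exp(k*t)/N)
def logisticGrowth(t,C,k,N):
    growth = C*np.exp(k*t)    #unconstrained exponential term; can overflow for large k*t
    return growth/(1 + growth/N)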
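
The term $e^{kt}$ in the closed form grows without bound, so `np.exp(k*t)` can overflow for large $kt$. Dividing the numerator and denominator by $Ce^{kt}$ gives the algebraically equivalent form $P = N / (1 + (N/C)e^{-kt})$, which stays finite as $t$ grows. The following is a minimal sketch of that rewrite, assuming $C > 0$, $N > 0$ and $k > 0$. The names `logistic_direct` and `logistic_stable` and the parameter values are hypothetical and are not part of the original notebook.

```python
import numpy as np

def logistic_direct(t, C, k, N):
    # Closed form as derived above: C e^{kt} / (1 + C e^{kt} / N)
    growth = C * np.exp(k * t)
    return growth / (1 + growth / N)

def logistic_stable(t, C, k, N):
    # Same function after dividing through by C e^{kt}; assumes C > 0 and N > 0
    return N / (1 + (N / C) * np.exp(-k * t))

t = np.linspace(0, 26, 27)
C, k, N = 400.0, 0.3, 10000.0   # illustrative values, not fitted parameters

# The two forms agree wherever the direct form does not overflow
max_diff = np.max(np.abs(logistic_direct(t, C, k, N) - logistic_stable(t, C, k, N)))

# For very large k*t the stable form tends to the carrying capacity N
far_future = logistic_stable(1e4, C, k, N)

print("largest difference between the two forms:", max_diff)
print("value far in the future:", far_future)
```

Whether this form helps a particular fit depends on the parameter values the optimiser explores, since a negative $C$ or $k$ changes which exponential is the large one.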
    

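## Euler's Method

Euler's method is a numerical approach to solving the logistic growth differential equation. Starting from an initial population, it steps forward in time using $P_{i} = P_{i-1} + \Delta t \, k P_{i-1}\left(1-\frac{P_{i-1}}{N}\right)$.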

In [12]:
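def eulerLogistic(x0, y0,xf,n,C,k,N):
    #C is unused; it is accepted so the fitted (C,k,N) can be passed in as *popt
    
    #Step size for n evenly spaced points from x0 to xf
    deltaX = (xf-x0)/(n-1)
    
    y = np.zeros([n])
    y[0] = y0
    
    #Forward Euler step: previous value plus deltaX times dP/dt at the previous value
    for i in range(1,n):
        y[i] = deltaX*(k*y[i-1]*(1-(y[i-1]/N))) + y[i-1]
        
    return y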
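
Because forward Euler is a first-order method, its error shrinks roughly in proportion to the step size. The sketch below illustrates this by comparing Euler's method with the closed-form logistic solution for two different numbers of points. The parameter values and the names `logistic_exact`, `euler_logistic` and `max_error` are illustrative assumptions, not fitted values or functions from this report.

```python
import numpy as np

def logistic_exact(t, P0, k, N):
    # Closed-form solution with P(0) = P0, which requires C = P0 / (1 - P0/N)
    C = P0 / (1 - P0 / N)
    growth = C * np.exp(k * t)
    return growth / (1 + growth / N)

def euler_logistic(t0, P0, tf, n, k, N):
    # Forward Euler on dP/dt = k P (1 - P/N) using n evenly spaced points
    dt = (tf - t0) / (n - 1)
    P = np.zeros(n)
    P[0] = P0
    for i in range(1, n):
        P[i] = P[i - 1] + dt * k * P[i - 1] * (1 - P[i - 1] / N)
    return P

P0, k, N = 400.0, 0.3, 10000.0   # illustrative values, not fitted parameters

def max_error(n):
    # Largest absolute gap between Euler and the exact solution on [0, 26]
    t = np.linspace(0, 26, n)
    return np.max(np.abs(euler_logistic(0, P0, 26, n, k, N) - logistic_exact(t, P0, k, N)))

err_coarse = max_error(100)
err_fine = max_error(1000)
print("max error with 100 points: ", err_coarse)
print("max error with 1000 points:", err_fine)
```

The Euler curve only matches the closed form when both start from the same initial population, which is why `logistic_exact` derives $C$ from `P0` here.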

Both of these simple models are compared to the model for linear extrapolation. 

## Linear Extrapolation

Linear extrapolation makes the following assumption about how the population changes:

* The rate of change of the population is constant.

The quantities it takes into account are:

* $t = time$ (independent variable)
* $P = population$ (dependent variable)
* $k =  rate \ of \ change$ (parameter), the constant rate of change of the population

$$
\begin{align}
\therefore \frac{dP}{dt} &= k \\
\implies P &= kt + c
\end{align}
$$

In [5]:
#Linear extrapolation model
def linearExtrapolation(t,k,c):
    return (k*t) + c
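
As a worked example of the formula $P = kt + c$, the sketch below draws the line through the last two Massachusetts census values listed in this report (6016 thousand in 1990 and 6349 thousand in 2000) and evaluates it at 2010. Time is measured in decades since 1790, so those censuses are $t = 20$ and $t = 21$, and 2010 is $t = 22$. The slope is $6349 - 6016 = 333$ thousand per decade, which gives $6349 + 333 = 6682$ thousand for 2010. The name `linear_extrapolation` is illustrative.

```python
import numpy as np

def linear_extrapolation(t, k, c):
    # Straight line P = k t + c
    return k * t + c

# Last two Massachusetts census values (thousands): 1990 and 2000
t_last = np.array([20, 21])          # decades since 1790
p_last = np.array([6016, 6349])

# A degree-1 polyfit through two points returns [slope, intercept] = [k, c]
k, c = np.polyfit(t_last, p_last, 1)

# 2010 corresponds to t = 22
prediction_2010 = linear_extrapolation(22, k, c)
print("slope (thousands per decade):", k)
print("extrapolated 2010 population (thousands):", prediction_2010)
```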

## Population Data

The population data comes from the censuses from 1790 to 2000, with populations given in thousands. The models are fitted to this data in an attempt to predict each state's population in 2010.

In [6]:
#Population Data Raw

populationData = {
    "massachusetts": {
        "years": np.array(range(1790,2010,10)),
        "population": np.array([379,423,472,523,610,738,995,1231,1457,1783,2239,2805,3366,3852,4250,4317,4691,5149,5689,5737,6016,6349])
    },
    "newYork": {
        "years": np.array(range(1790,2010,10)),
        "population": np.array([340,589,959,1373,1919,2429,3097,3881,4383,5083,6003,7269,9114,10385,12588,13479,14830,16782,18241,17558,17990,18976])
    },
    "northCarolina":{
        "years": np.array(range(1790,2010,10)),
        "population": np.array([394,478,556,639,738,753,869,993,1071,1400,1618,1893,2206,2559,3170,3572,4062,4556,5084,5880,6628,8049])
    }
}

# populationData
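
The fitting code in this report works with time measured in decades since 1790 rather than in calendar years: the 22 censuses get the indices 0 to 21, and index 26 corresponds to 2050. The helpers below are a small sketch of that mapping. The names `BASE_YEAR`, `STEP_YEARS`, `year_to_index` and `index_to_year` are hypothetical and do not appear in the original notebook.

```python
import numpy as np

BASE_YEAR = 1790      # first census in the data
STEP_YEARS = 10       # one census per decade

def year_to_index(year):
    # Decades elapsed since the first census
    return (np.asarray(year) - BASE_YEAR) / STEP_YEARS

def index_to_year(t):
    # Inverse mapping from decade index back to calendar year
    return BASE_YEAR + STEP_YEARS * np.asarray(t)

census_years = np.arange(1790, 2010, 10)
census_index = year_to_index(census_years)   # 0, 1, ..., 21
target_index = year_to_index(2010)           # index of the year being predicted

print("index of the 2010 census:", target_index)
print("year at index 26:", index_to_year(26))
```

Working in decades keeps the exponent $kt$ small, which makes the growth parameter easier to estimate than it would be with $t$ in the thousands.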

In [7]:
traceData = []
buttons = []
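#Time is measured in decades since 1790, one index per census
times = np.array(range(0, 22, 1))

#Dense grid for plotting the fitted curves: t = 0..26 corresponds to the years 1790..2050
timeFit = np.linspace(0,26,1000)
yearFit = np.linspace(1790,2050,1000)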

for state in populationData.keys():
    
    trace = go.Scatter(x = populationData[state]["years"],
                       y = populationData[state]["population"],
                       mode = "markers",
                       name = state)
    
#   Exponential Fit
    popt, pcov = curve_fit(exponentialGrowth, times, populationData[state]["population"], p0=(1e-1, 1))
    exponentialFit = exponentialGrowth(timeFit, *popt)
    
    traceExponentialFit = go.Scatter(x = yearFit,
                                     y = exponentialFit,
                                     mode = "lines",
                                     name = state + " exponential fit")

#   Logistic Fit
    popt, pcov = curve_fit(logisticGrowth, times, populationData[state]["population"], p0=(1, 1e-3,5000), bounds=([-np.inf,-np.inf,0],np.inf))
    logisticFit = logisticGrowth(timeFit, *popt)
    
    traceLogisticFit = go.Scatter(x = yearFit,
                                  y = logisticFit,
                                  mode = "lines",
                                  name = state + " logistic fit")
    
#   Euler Method Logistic fit (integrated to t = 26 so its 1000 points line up with yearFit)
    euler = eulerLogistic(0, populationData[state]["population"][0], 26, 1000, *popt)
    traceEulerLogistic = go.Scatter(x = yearFit,
                                    y = euler,
                                    mode = "lines",
                                    name = state + " logistic euler method")
    
#   Linear Extrapolation
    popt = np.polyfit(times[-2:], populationData[state]["population"][-2:],1)
    linearExtrapolate = linearExtrapolation(timeFit, *popt)
    
    traceLinearExtrapolate = go.Scatter(x = yearFit,
                                        y = linearExtrapolate,
                                        mode = "lines",
                                        name = state + " linear Extrapolate")
    
    #Show only this state's five traces when its button is selected
    visible = [False]*len(populationData)*5
    visible[len(traceData):len(traceData)+5] = [True]*5
    
    #Adding Drop Down Buttons
    button = dict(label = state,
                  method = 'update',
                  args = [{'visible': visible},
                          {'title': state,
                           'annotations': []}])
    
    traceData.append(trace)
    traceData.append(traceExponentialFit)
    traceData.append(traceLogisticFit)
    traceData.append(traceLinearExtrapolate)
    traceData.append(traceEulerLogistic)
    
    buttons.append(button)


overflow encountered in exp


overflow encountered in exp


invalid value encountered in true_divide



In [8]:
#styling The Graph
updatemenus = list([
    dict(active=-1,
         buttons=buttons
    )
])

layout = go.Layout(yaxis=dict(
                   rangemode='nonnegative',
                   autorange=True,
                   title="Population (in thousands)"),
                   title="Population by State",
                   updatemenus=updatemenus)

fig = dict(data=traceData, layout=layout)

plotly.offline.iplot(fig, filename='dataPlot')

## Quality of the fit

Depending on the state, the exponential growth model may fit as well as the logistic growth model. This only holds where the population has not come anywhere near its carrying capacity, as in North Carolina.

Overall, however, the logistic model is the better model.

Solving the first-order differential equation for the logistic model analytically gives a better fit than Euler's method and makes the best prediction overall.

The exponential and logistic fits are least-squares fits made with scipy's curve_fit. The linear extrapolation is the straight line through the last two census points.
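
The quality of a fit can be made quantitative with a residual measure such as the root-mean-square error (RMSE). The sketch below computes it for an exponential least-squares fit to the North Carolina series from this report. The two-point starting guess and the names `rmse`, `exponential_growth`, `k0` and `a0` are assumptions of this example, not part of the original analysis. The same helper could be applied to the logistic and Euler curves to compare the models on equal terms.

```python
import numpy as np
from scipy.optimize import curve_fit

def exponential_growth(t, k, a):
    # Exponential model P = a * exp(k t)
    return a * np.exp(k * t)

def rmse(observed, predicted):
    # Root-mean-square error, in the same units as the data (thousands)
    return np.sqrt(np.mean((np.asarray(observed) - np.asarray(predicted)) ** 2))

# North Carolina census populations (thousands), 1790-2000, from this report
population = np.array([394, 478, 556, 639, 738, 753, 869, 993, 1071, 1400, 1618,
                       1893, 2206, 2559, 3170, 3572, 4062, 4556, 5084, 5880, 6628, 8049],
                      dtype=float)
t = np.arange(len(population))   # decades since 1790

# Two-point starting guess (an assumption of this sketch): a is the first value,
# k is the average log growth per decade between the first and last census
k0 = np.log(population[-1] / population[0]) / t[-1]
a0 = population[0]
rmse_guess = rmse(population, exponential_growth(t, k0, a0))

# Least-squares fit starting from that guess
(k_fit, a_fit), _ = curve_fit(exponential_growth, t, population, p0=(k0, a0))
rmse_fit = rmse(population, exponential_growth(t, k_fit, a_fit))

print("RMSE of starting guess (thousands):", rmse_guess)
print("RMSE of least-squares fit (thousands):", rmse_fit)
```

Because RMSE is measured on the fitted points only, it describes how well a model reproduces past censuses, which is not the same as how well it predicts 2010.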