# Fitting a model  

If you want to add a trendline or curve to fit a set of data and get the equation, spreadsheets are **way** easier. But you might need to use Python if your data set is too big for a spreadsheet. Here's how.

In [None]:
import pandas as pd
import numpy as np
from numpy.random import default_rng
import matplotlib as mpl
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

We need a set of data to analyze, so I'll make one that looks somewhat like y=2x+1 but with a little random jitter added in so it isn't so clean-looking. You could read in a set of data instead.

In [None]:
data = pd.DataFrame()
data['x'] = np.linspace(0,5,num=6)                     # evenly spaced x-values
jitter = default_rng().uniform(low=-1, high=1, size=6) # some random numbers
data['y'] = 2 * data['x'] + 1 + jitter                 # y=2x+1 plus some random jitter
data
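If you'd rather read in a data set than generate one, here's a minimal sketch using `pd.read_csv`. The CSV text, its values, and the name `data_from_file` are made-up placeholders for illustration. With a real file you'd pass its path to `read_csv` instead of the in-memory text used here.

```python
import io
import pandas as pd

# Placeholder CSV contents (invented for illustration, not real measurements).
# With a real file you would call pd.read_csv("your_file.csv") instead;
# that filename is hypothetical.
csv_text = """x,y
0,1.3
1,2.8
2,5.4
3,6.9
4,9.2
5,10.7
"""

# io.StringIO lets read_csv treat the string as if it were a file
data_from_file = pd.read_csv(io.StringIO(csv_text))
print(data_from_file)
```

As long as the result has `x` and `y` columns, the rest of the steps work the same way.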

In [None]:
# this should look close to y=2x+1
plt.scatter(data['x'], data['y'])

We'll need to define a function of the form we want. This one defines a linear model (y=mx+b), but you could write any sort of model, like quadratic, exponential, etc.

In [None]:
def my_model(x, m, b):   # first argument (x) is input data, scipy will optimize m & b to fit
    y = m * x + b
    return y
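To show what "any sort of model" might look like, here's a sketch of a quadratic and an exponential model function. The names `quadratic_model`, `exponential_model`, `x_demo`, and `y_demo` are my own for this example, and the demo data is noise-free and generated from known coefficients just to check that the fit recovers them.

```python
import numpy as np
from scipy.optimize import curve_fit

def quadratic_model(x, a, b, c):
    # y = a*x^2 + b*x + c
    return a * x**2 + b * x + c

def exponential_model(x, a, k):
    # y = a * e^(k*x)
    return a * np.exp(k * x)

# Noise-free demo data made from known coefficients (a=1, b=-2, c=3)
x_demo = np.linspace(0, 5, num=6)
y_demo = quadratic_model(x_demo, 1.0, -2.0, 3.0)

# Same call as for the linear model, just with a different model function
popt_quad, pcov_quad = curve_fit(quadratic_model, x_demo, y_demo)
print(popt_quad)  # coefficients in the order a, b, c
```

The pattern is the same for every model: the first argument is the input data and the remaining arguments are the coefficients to optimize. Models that aren't linear in their coefficients (like the exponential one) sometimes need a starting guess, which `curve_fit` accepts through its `p0` argument.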

Then, SciPy's `curve_fit` will fit the model function to our data.

In [None]:
# SciPy's curve_fit takes the arguments (model function, x-values, y-values)
popt, pcov = curve_fit(my_model, data['x'], data['y'])

# popt holds the optimized coefficients (used next); pcov is their estimated covariance matrix.

In [None]:
popt   # optimized coefficients, in the order they appear in the model function

The model function's optimized coefficients above are probably close to 2 and 1. Recall the original data looked like y=2x+1 (plus some random jitter). popt is an *array* and you can access just one of the array's elements (your optimized coefficients) like this:

In [None]:
# the first element in the array "popt". That's the first coefficient.
popt[0]
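The other thing `curve_fit` returns, the covariance matrix, can give a rough idea of how uncertain each coefficient is. The square roots of its diagonal entries are one-standard-deviation error estimates. Here's a self-contained sketch of that, with made-up names (`line`, `x_demo`, `y_demo`, `popt_demo`, `pcov_demo`, `perr_demo`) so it doesn't overwrite anything above, and a fixed random seed (my assumption, only so it's repeatable).

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, m, b):
    # same linear form as the model above: y = m*x + b
    return m * x + b

# Demo data like the earlier example: y = 2x + 1 plus uniform jitter
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability
x_demo = np.linspace(0, 5, num=6)
y_demo = 2 * x_demo + 1 + rng.uniform(low=-1, high=1, size=6)

popt_demo, pcov_demo = curve_fit(line, x_demo, y_demo)

# One-standard-deviation error estimate for each coefficient
perr_demo = np.sqrt(np.diag(pcov_demo))

# Print the fitted equation along with the uncertainties
print(f"y = {popt_demo[0]:.2f}x + {popt_demo[1]:.2f}")
print(f"m = {popt_demo[0]:.2f} +/- {perr_demo[0]:.2f}")
print(f"b = {popt_demo[1]:.2f} +/- {perr_demo[1]:.2f}")
```

With only six points the uncertainties will be fairly large. More data usually tightens them.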

In [None]:
# plotting the original data and the optimized model (i.e., trendline)
plt.scatter(data['x'], data['y'], label="data", color='k')
plt.plot(data['x'], my_model(data['x'], popt[0], popt[1]), label="model fit", color='darkorchid')
plt.legend()
plt.show()