# Modeling Process

#### June 2023

### Imports and set-up

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import altair as alt
import statsmodels.formula.api as smf
import time

# set the baseline colors
color0 = "#121619"  # Dark grey
color1 = "#00B050"  # Green

# improve plot resolution
plt.rcParams["figure.dpi"] = 300
plt.rcParams["savefig.dpi"] = 300

# set a random seed in case it's needed
np.random.seed(41)

### Import Dataset

In [3]:
df = pd.read_csv("../30_intermediate_files/regression_dataset.csv")

### Simple Linear Regression

The first relationship we will look at is the price of Gold to the price of Silver, because this is our main relationship of interest. The other variables will then be analyzed one at a time, and finally all together. We'll start with a simple linear regression and move on to more complex models if warranted.
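As a rough sketch of this plan, the one-at-a-time fits can also be written as a loop over predictors. The data below is synthetic, and the helper name `fit_single_predictors` is hypothetical; only the `smf.ols(...).fit()` call mirrors the cells that follow.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def fit_single_predictors(data, response, predictors):
    """Fit one simple OLS model per predictor and collect key statistics."""
    rows = []
    for name in predictors:
        model = smf.ols(formula=f"{response} ~ {name}", data=data).fit()
        # conf_int() returns the 95% interval bounds, one row per coefficient.
        ci_low, ci_high = model.conf_int().loc[name]
        rows.append(
            {
                "predictor": name,
                "slope": model.params[name],
                "ci_low": ci_low,
                "ci_high": ci_high,
                "r_squared": model.rsquared,
            }
        )
    return pd.DataFrame(rows).set_index("predictor")


# Synthetic stand-in for the real dataset (assumption: only the column
# names are borrowed from the notebook; the values are made up).
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame(
    {
        "Gold": rng.uniform(300, 2000, n),
        "Copper": rng.uniform(1500, 10000, n),
    }
)
demo["Silver"] = 0.015 * demo["Gold"] + rng.normal(0, 2, n)

results = fit_single_predictors(demo, "Silver", ["Gold", "Copper"])
print(results.round(4))
```

Collecting the slope, its confidence interval, and R-squared in one table makes the predictors easier to compare than scrolling through separate summaries, although the full `summary()` output is still worth reading for each model.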

#### Gold

In [8]:
# fit a linear model
lm = smf.ols(
    formula="Silver ~ Gold", data=df
).fit()

print(lm.summary())

                            OLS Regression Results                            
Dep. Variable:                 Silver   R-squared:                       0.849
Model:                            OLS   Adj. R-squared:                  0.849
Method:                 Least Squares   F-statistic:                     2246.
Date:                Sat, 17 Jun 2023   Prob (F-statistic):          5.81e-166
Time:                        21:16:11   Log-Likelihood:                -1057.8
No. Observations:                 401   AIC:                             2120.
Df Residuals:                     399   BIC:                             2128.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.3067      0.311      0.985      0.3

>*This regression summary tells us that the price of Gold accounts for roughly 85% of the variability in the price of Silver (R-squared of 0.849), which is a very strong relationship. Additionally, the confidence interval for the slope is very narrow and does not include zero, so we can be confident that the slope is not zero and that there is a measurable association. From the table above, a one-dollar change in the price of Gold is associated with a 0.0146 dollar change in the price of Silver, roughly a 70:1 ratio.*
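The ratio quoted above follows directly from the slope. A minimal sketch of the arithmetic, using only the 0.0146 coefficient stated in the text (the variable names are illustrative):

```python
# Slope reported in the text above: dollars of Silver per dollar of Gold.
gold_slope = 0.0146

# Dollars of Gold associated with a one-dollar change in Silver.
gold_per_silver_dollar = 1 / gold_slope

# The same slope expressed in cents of Silver per dollar of Gold.
silver_cents_per_gold_dollar = gold_slope * 100

print(f"Implied ratio: about {gold_per_silver_dollar:.1f}:1")
print(f"Silver change per Gold dollar: {silver_cents_per_gold_dollar:.2f} cents")
```

The implied ratio comes out at about 68.5:1, which rounds to the 70:1 figure. It describes how changes in the two prices move together and ignores the intercept, so it is not the same thing as the ratio of the price levels.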

#### Aluminum

In [9]:
# fit a linear model
lm = smf.ols(
    formula="Silver ~ Aluminum", data=df
).fit()

print(lm.summary())

                            OLS Regression Results                            
Dep. Variable:                 Silver   R-squared:                       0.400
Model:                            OLS   Adj. R-squared:                  0.398
Method:                 Least Squares   F-statistic:                     265.8
Date:                Sat, 17 Jun 2023   Prob (F-statistic):           3.67e-46
Time:                        21:21:11   Log-Likelihood:                -1334.6
No. Observations:                 401   AIC:                             2673.
Df Residuals:                     399   BIC:                             2681.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -8.7912      1.360     -6.462      0.0

>*The summary table above shows what we already suspected about Aluminum: it is not a strong predictor of the price of Silver, with the model explaining only about 40% of the variability.*
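For readers who want to see what "variability explained" means in practice, here is a small sketch that computes R-squared by hand. It uses synthetic data and NumPy's `polyfit` rather than the notebook's dataset, so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic predictor and response (illustrative only, not the metals data).
x = rng.uniform(0, 10, 300)
y = 2.0 + 0.5 * x + rng.normal(0, 1.5, 300)

# Least-squares line: np.polyfit returns the highest-degree coefficient first.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

ss_res = np.sum((y - fitted) ** 2)    # variation left unexplained by the line
ss_tot = np.sum((y - y.mean()) ** 2)  # total variation around the mean
r_squared = 1 - ss_res / ss_tot

print(f"R-squared: {r_squared:.3f}")
```

For a model with a single predictor and an intercept, this value equals the squared correlation between the predictor and the response, so an R-squared of 0.400 corresponds to a correlation of roughly 0.63 in magnitude.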

#### Platinum

In [10]:
# fit the model
lm = smf.ols(
    formula="Silver ~ Platinum", data=df
).fit()

print(lm.summary())

                            OLS Regression Results                            
Dep. Variable:                 Silver   R-squared:                       0.695
Model:                            OLS   Adj. R-squared:                  0.694
Method:                 Least Squares   F-statistic:                     908.4
Date:                Sat, 17 Jun 2023   Prob (F-statistic):          7.06e-105
Time:                        21:22:50   Log-Likelihood:                -1199.0
No. Observations:                 401   AIC:                             2402.
Df Residuals:                     399   BIC:                             2410.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.7739      0.537     -3.303      0.0

>*Platinum continues to make a compelling case as a possible variable of interest. The model above indicates a fairly strong relationship, with approximately 70% of the variability in the price of Silver being explained by the price of Platinum.*

#### Copper

In [11]:
# fit the model
lm = smf.ols(
    formula="Silver ~ Copper", data=df
).fit()

print(lm.summary())

                            OLS Regression Results                            
Dep. Variable:                 Silver   R-squared:                       0.812
Model:                            OLS   Adj. R-squared:                  0.812
Method:                 Least Squares   F-statistic:                     1726.
Date:                Sat, 17 Jun 2023   Prob (F-statistic):          5.38e-147
Time:                        21:24:34   Log-Likelihood:                -1101.7
No. Observations:                 401   AIC:                             2207.
Df Residuals:                     399   BIC:                             2215.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.2568      0.385     -3.262      0.0

>*Copper is another strong relationship, with roughly 81% of the variability in the price of Silver being explained. Copper has a very tight confidence interval on its slope, which is reflected in the plots seen in the previous step. A one-dollar change in the price of Copper is associated with a 0.003 dollar change, or less than a third of a cent, in the price of Silver.*
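To compare the four single-predictor models at a glance, the R-squared values printed in the summaries above can be ranked. This sketch simply copies those reported values into a dictionary; nothing is refitted.

```python
# R-squared values copied from the four summaries above.
r_squared = {
    "Gold": 0.849,
    "Aluminum": 0.400,
    "Platinum": 0.695,
    "Copper": 0.812,
}

# Sort predictors from strongest to weakest single-predictor fit.
ranked = sorted(r_squared.items(), key=lambda item: item[1], reverse=True)

for name, value in ranked:
    print(f"{name:<10}{value:.3f}")
```

The ordering is Gold, Copper, Platinum, then Aluminum. R-squared alone is a coarse yardstick, but it gives a quick way to see which predictors explain the most variability on their own.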

#### Combo Model

In [12]:
# fit a linear model with Gold, Platinum, and Copper
combo_lm = smf.ols(
    formula="Silver ~ Gold + Platinum + Copper", data=df
).fit()

print(combo_lm.summary())

                            OLS Regression Results                            
Dep. Variable:                 Silver   R-squared:                       0.923
Model:                            OLS   Adj. R-squared:                  0.923
Method:                 Least Squares   F-statistic:                     1588.
Date:                Sat, 17 Jun 2023   Prob (F-statistic):          1.15e-220
Time:                        21:27:13   Log-Likelihood:                -922.72
No. Observations:                 401   AIC:                             1853.
Df Residuals:                     397   BIC:                             1869.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -2.7287      0.272    -10.041      0.0

>*The combination model has several interesting findings. First, the model accounts for roughly 92% of the variance in the price of Silver, which is quite strong. The estimated effect of each predictor has shrunk somewhat: a one-dollar change in Copper is now associated with essentially zero change in the price of Silver, Platinum's effect has moved from 1.6 cents to less than one cent, and Gold's has decreased from 1.46 cents to 1.06 cents. The confidence intervals are fairly narrow, which adds confidence in the model's estimates.*
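One plausible reason for the smaller coefficients is that the metal prices move together, so each predictor in a one-variable model also picks up some of the effect of the others. That is an assumption about the data rather than something the summary shows. The sketch below illustrates the mechanism with synthetic, correlated predictors and NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Two predictors that share a common driver, so they are strongly correlated.
common = rng.normal(size=n)
x1 = common + 0.3 * rng.normal(size=n)
x2 = common + 0.3 * rng.normal(size=n)

# Assumed true model: each predictor has a coefficient of 1.0.
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

# Simple regression of y on x1 alone (intercept plus one slope).
X_simple = np.column_stack([np.ones(n), x1])
simple_coefs, *_ = np.linalg.lstsq(X_simple, y, rcond=None)

# Multiple regression of y on x1 and x2 together.
X_joint = np.column_stack([np.ones(n), x1, x2])
joint_coefs, *_ = np.linalg.lstsq(X_joint, y, rcond=None)

simple_slope = simple_coefs[1]
joint_slope = joint_coefs[1]
print(f"x1 slope alone:   {simple_slope:.2f}")
print(f"x1 slope with x2: {joint_slope:.2f}")
```

In this toy setup the slope on `x1` is noticeably larger when `x2` is left out, because `x1` is standing in for both variables. Whether the same mechanism explains the change in the Gold, Platinum, and Copper coefficients would need to be checked against the correlations in the actual dataset.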