#Q1

#1

The likelihood for a linear regression model, where the errors are assumed to have covariance $\Sigma$, is given by:

![](h71.png)

This expresses the probability density of observing $y$ given the model parameters $\beta$ and the covariance matrix $\Sigma$.

The prior distribution for $\beta$ is assumed to be a multivariate normal distribution:

![](h72.png)

This represents our belief about the parameters before observing any data, with mean $\beta_0$ and covariance $\Sigma_\beta$.

By Bayes' rule, the posterior distribution is proportional to the product of the likelihood and the prior:

![](h73.png)

Then we substitute the expressions for the likelihood and prior:

![](h74.png)

Next, we expand the exponent and collect the terms in $\beta$:

![](h75.png)

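In this expression the exponent contains terms that are quadratic and linear in $\beta$; any term that does not involve $\beta$ can be absorbed into the proportionality constant.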

To complete the square, we group the terms involving $\beta$ and rewrite the exponent in a form that makes it clear that the posterior is a multivariate normal distribution. The quadratic part becomes:

![](h76.png)

Where:

![](h77.png)
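
The completed-square result can be checked numerically. The sketch below assumes the standard conjugate form, with posterior covariance $(X^\top \Sigma^{-1} X + \Sigma_\beta^{-1})^{-1}$ and posterior mean equal to that covariance times $(X^\top \Sigma^{-1} y + \Sigma_\beta^{-1} \beta_0)$; the function names and the simulated data are illustrative assumptions, not part of the original derivation.

```python
import numpy as np


def beta_posterior(X, y, Sigma, beta0, Sigma_beta):
    """Posterior mean and covariance of beta for
    y ~ N(X beta, Sigma) with prior beta ~ N(beta0, Sigma_beta)."""
    Sigma_inv = np.linalg.inv(Sigma)
    Sigma_beta_inv = np.linalg.inv(Sigma_beta)
    # Posterior precision: data precision plus prior precision
    precision = X.T @ Sigma_inv @ X + Sigma_beta_inv
    cov_post = np.linalg.inv(precision)
    mean_post = cov_post @ (X.T @ Sigma_inv @ y + Sigma_beta_inv @ beta0)
    return mean_post, cov_post


def log_unnormalized_posterior(beta, X, y, Sigma, beta0, Sigma_beta):
    """Log of likelihood times prior, dropping terms that do not involve beta."""
    r = y - X @ beta
    d = beta - beta0
    return -0.5 * (r @ np.linalg.solve(Sigma, r) + d @ np.linalg.solve(Sigma_beta, d))


# Simulated data (hypothetical values chosen only for the check)
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))
Sigma = np.diag(rng.uniform(0.5, 2.0, size=n))
beta0 = np.zeros(p)
Sigma_beta = 4.0 * np.eye(p)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.multivariate_normal(np.zeros(n), Sigma)

mean_post, cov_post = beta_posterior(X, y, Sigma, beta0, Sigma_beta)

# The gradient of the log posterior should vanish at the posterior mean
grad = X.T @ np.linalg.solve(Sigma, y - X @ mean_post) - np.linalg.solve(
    Sigma_beta, mean_post - beta0
)
print("max |gradient| at posterior mean:", np.abs(grad).max())
```

If the formulas are right, the printed gradient is zero up to floating-point error, which confirms that the posterior mean is the mode of the Gaussian obtained by completing the square.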



#Q1

#2

For a linear regression model with independent Gaussian errors of common variance $\sigma^2$, the likelihood is:

![](h721.png)

Here $y$ is the vector of observed values, $X$ is the design matrix, and $\beta$ is the vector of regression coefficients.

The prior distribution on $\sigma^2$ is assumed to be an inverse-gamma distribution:

![](h722.png)

Again using Bayes' rule, the posterior distribution for $\sigma^2$ is proportional to the product of the likelihood and the prior:

![](h723.png)

Now rearrange the terms in the exponent and combine all the terms involving $\sigma^2$: 

![](h724.png)

This expression matches the form of an inverse-gamma distribution.

Thus, the posterior distribution for $\sigma^2$ is:

![](h725.png)
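
The conjugate update can be verified with a short calculation. This sketch assumes the prior is written as an inverse-gamma with shape `a` and scale `b` (hypothetical names; the symbols in the equations above may differ), so that the posterior has shape `a + n/2` and scale `b + RSS/2`, where RSS is the residual sum of squares of $y - X\beta$. The residual values are made up for illustration.

```python
import math


def sigma2_posterior_params(residuals, a, b):
    """Inverse-gamma posterior parameters for sigma^2 given residuals y - X beta."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return a + n / 2.0, b + rss / 2.0


def log_inv_gamma_pdf(s2, shape, scale):
    """Log density of an inverse-gamma distribution evaluated at s2."""
    return (
        shape * math.log(scale)
        - math.lgamma(shape)
        - (shape + 1.0) * math.log(s2)
        - scale / s2
    )


def log_likelihood(residuals, s2):
    """Gaussian log likelihood with common variance s2."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * math.log(2.0 * math.pi * s2) - rss / (2.0 * s2)


# Illustrative residuals and prior hyperparameters (assumed values)
residuals = [0.5, -1.2, 0.3, 0.9, -0.4]
a, b = 2.0, 1.0
a_post, b_post = sigma2_posterior_params(residuals, a, b)


def log_offset(s2):
    """log(likelihood * prior) minus the log of the claimed posterior density."""
    return (
        log_likelihood(residuals, s2)
        + log_inv_gamma_pdf(s2, a, b)
        - log_inv_gamma_pdf(s2, a_post, b_post)
    )


# If the posterior is correct, the offset is the same for every value of s2
offsets = [log_offset(s2) for s2 in (0.2, 1.0, 3.5)]
print("posterior shape and scale:", a_post, b_post)
print("spread of offsets:", max(offsets) - min(offsets))
```

A spread of essentially zero means that likelihood times prior differs from the inverse-gamma density only by a constant factor, which is what proportionality in Bayes' rule requires.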

In [None]:
#Q2

# My dataset: https://www.kaggle.com/datasets/hrish4/cpi-inflation-analysis-and-forecasting
# This dataset provides detailed Consumer Price Index (CPI) data for the United States from 2002 to 2023 to support economic research.

import pandas as pd
from sklearn.preprocessing import StandardScaler
import pymc as pm
import numpy as np
import arviz as az

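# Load the CPI data and one-hot encode any categorical columns
data = pd.read_csv('CPI_dataset.csv')
data = pd.get_dummies(data, drop_first=True)

# Standardize the predictors so the unit-scale priors on the slopes are comparable
scaler = StandardScaler()
X_scaled = scaler.fit_transform(data.drop(columns='forecast_percent_change'))

# Response variable, left on its original scale
y = data['forecast_percent_change'].values

# Number of predictors
p = X_scaled.shape[1]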

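with pm.Model() as model:
    # Priors: wide intercept, unit-scale slopes, half-normal noise scale
    beta0 = pm.Normal('beta0', mu=0, sigma=10)
    betas = pm.Normal('betas', mu=0, sigma=1, shape=p)
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Linear predictor
    mu = beta0 + pm.math.dot(X_scaled, betas)

    # Normal likelihood for the observed responses
    y_obs = pm.Normal('y_obs', mu=mu, sigma=sigma, observed=y)

    # Draw 2000 posterior samples per chain after 1000 tuning steps
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)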

summary = az.summary(trace)
print(summary)

az.plot_trace(trace)


In [None]:
#Q3

import pandas as pd
from sklearn.preprocessing import StandardScaler
import pymc as pm
import numpy as np
import arviz as az

data = pd.read_csv('CPI_dataset.csv')

# Inject outliers: multiply a random 5% of the responses by -30 (seeded for reproducibility)
np.random.seed(1008111151)
outlier_indices = np.random.choice(data.index, size=int(0.05 * len(data)), replace=False)
data.loc[outlier_indices, 'forecast_percent_change'] *= -30

data = pd.get_dummies(data, drop_first=True)

scaler_X = StandardScaler()
X_scaled = scaler_X.fit_transform(data.drop(columns='forecast_percent_change'))

y = data['forecast_percent_change'].values

p = X_scaled.shape[1]

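with pm.Model() as robust_model:
    # Same priors on the intercept, slopes, and scale as in the Q2 model
    beta0 = pm.Normal('beta0', mu=0, sigma=10)
    betas = pm.Normal('betas', mu=0, sigma=1, shape=p)
    # Degrees of freedom of the Student-t likelihood (prior mean 30)
    nu = pm.Exponential('nu', 1/30)
    sigma = pm.HalfNormal('sigma', sigma=10)

    # Linear predictor
    mu = beta0 + pm.math.dot(X_scaled, betas)

    # Heavy-tailed Student-t likelihood, which downweights the injected outliers
    y_obs = pm.StudentT('y_obs', mu=mu, sigma=sigma, nu=nu, observed=y)

    # Draw 2000 posterior samples per chain after 1000 tuning steps
    trace = pm.sample(2000, tune=1000, return_inferencedata=True)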

summary = az.summary(trace)
print(summary)

az.plot_trace(trace)
