# Task
Load the wealth, consumption, and income data from "Wealth-Index-Cleaned.csv", "Consumption-Cleaned.csv", and "Income-Cleaned.csv" into a DataFrame, calculate the `cay` index as the residuals of the regression of log consumption on log wealth and log income, and check whether the `cay` index is stationary.

## Load data

### Subtask:
Load the wealth, consumption, and income data into a pandas DataFrame.


In [None]:
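import pandas as pd

# Load the three cleaned monthly series
wealth_df = pd.read_csv('Wealth-Index-Cleaned.csv')
consumption_df = pd.read_csv('Consumption-Cleaned.csv')
income_df = pd.read_csv('Income-Cleaned.csv')

# Merge on the positional row index; this assumes all three files list the same months in the same order
merged_df = wealth_df.merge(consumption_df, left_index=True, right_index=True).merge(income_df, left_index=True, right_index=True)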

display(merged_df.head())

Unnamed: 0,month_x,national_wealth_index,explained_var_PC1,month_y,weighted_avg_consumption,month,national_weighted_avg_income
0,2014-04,0.208205,0.332771,2014-04,7208.683747,2014-04,12811.15146
1,2014-05,0.223564,0.34148,2014-05,7005.035044,2014-05,14071.83162
2,2014-06,0.222262,0.329031,2014-06,7311.807185,2014-06,13803.13302
3,2014-07,0.246354,0.332035,2014-07,8051.858683,2014-07,14442.45259
4,2014-08,0.211817,0.334374,2014-08,7449.553517,2014-08,13169.36297
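The merge above aligns rows by position, which is why the output carries three copies of the month column (`month_x`, `month_y`, `month`). A minimal sketch of a key-based alternative follows, assuming each file has a `month` column with exactly one row per month. The small frames are made-up stand-ins for the real CSVs, and `merged_on_month` is a hypothetical name.

```python
import pandas as pd

# Made-up stand-ins for the three CSV files (values are illustrative only)
wealth = pd.DataFrame({
    "month": ["2014-04", "2014-05", "2014-06"],
    "national_wealth_index": [0.21, 0.22, 0.22],
})
consumption = pd.DataFrame({
    "month": ["2014-06", "2014-04", "2014-05"],  # deliberately out of order
    "weighted_avg_consumption": [7300.0, 7200.0, 7000.0],
})
income = pd.DataFrame({
    "month": ["2014-05", "2014-06", "2014-04"],  # deliberately out of order
    "national_weighted_avg_income": [14000.0, 13800.0, 12800.0],
})

# Merge on the month key; validate raises if a month appears more than once
merged_on_month = (
    wealth
    .merge(consumption, on="month", how="inner", validate="one_to_one")
    .merge(income, on="month", how="inner", validate="one_to_one")
    .sort_values("month")
    .reset_index(drop=True)
)
print(merged_on_month)
```

With a key-based merge, rows stay aligned even if the files are ordered differently, and only one `month` column survives. Cells that reference `month_x` would need to use `month` instead.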


## Perform regression

### Subtask:
Regress consumption on wealth and income.


**Reasoning**:
Import the statsmodels API and run an OLS regression of log consumption on log wealth and log income, adding a constant term to the independent variables, then display the results summary.



In [None]:
import statsmodels.api as sm
import numpy as np

# Define the independent variables (log wealth and log income) and the dependent variable (log consumption)
X = merged_df[['national_wealth_index', 'national_weighted_avg_income']].apply(np.log)
y = merged_df['weighted_avg_consumption'].apply(np.log)

# Add a constant term to the independent variables
X = sm.add_constant(X)

# Fit the OLS model
model_results = sm.OLS(y, X).fit()

# Display the regression results summary
display(model_results.summary())

Dep. Variable:,weighted_avg_consumption,R-squared:,0.328
Model:,OLS,Adj. R-squared:,0.317
Method:,Least Squares,F-statistic:,31.45
Date:,"Tue, 28 Oct 2025",Prob (F-statistic):,7.52e-12
Time:,20:14:03,Log-Likelihood:,21.692
No. Observations:,132,AIC:,-37.38
Df Residuals:,129,BIC:,-28.73
Df Model:,2,,
Covariance Type:,nonrobust,,

,coef,std err,t,P>|t|,[0.025,0.975]
const,3.8287,1.418,2.700,0.008,1.023,6.634
national_wealth_index,0.0567,0.161,0.351,0.726,-0.263,0.376
national_weighted_avg_income,0.5377,0.129,4.178,0.000,0.283,0.792

Omnibus:,84.104,Durbin-Watson:,0.29
Prob(Omnibus):,0.0,Jarque-Bera (JB):,376.116
Skew:,-2.396,Prob(JB):,2.13e-82
Kurtosis:,9.739,Cond. No.,789.0


## Calculate cay

### Subtask:
Calculate the `cay` index as the residuals of the regression.


**Reasoning**:
Calculate the residuals from the fitted OLS model and store them in a new column 'cay' in the merged_df DataFrame.



In [None]:
merged_df['cay'] = model_results.resid
display(merged_df.head())

Unnamed: 0,month_x,national_wealth_index,explained_var_PC1,month_y,weighted_avg_consumption,month,national_weighted_avg_income,cay
0,2014-04,0.208205,0.332771,2014-04,7208.683747,2014-04,12811.15146,0.057466
1,2014-05,0.223564,0.34148,2014-05,7005.035044,2014-05,14071.83162,-0.025694
2,2014-06,0.222262,0.329031,2014-06,7311.807185,2014-06,13803.13302,0.027865
3,2014-07,0.246354,0.332035,2014-07,8051.858683,2014-07,14442.45259,0.094101
4,2014-08,0.211817,0.334374,2014-08,7449.553517,2014-08,13169.36297,0.07453
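As a sanity check, the first `cay` value can be reproduced by hand from the rounded coefficients in the regression summary and the first data row (2014-04). This is a sketch using only the numbers printed above; the variable names are hypothetical.

```python
import math

# Rounded coefficients from the regression summary
const, b_wealth, b_income = 3.8287, 0.0567, 0.5377

# First row of the merged data (2014-04)
wealth, consumption, income = 0.208205, 7208.683747, 12811.15146

# Fitted log consumption from the log-log model
fitted = const + b_wealth * math.log(wealth) + b_income * math.log(income)

# Residual: actual log consumption minus fitted log consumption
cay_2014_04 = math.log(consumption) - fitted
print(round(cay_2014_04, 4))
```

The result should land close to the 0.057466 reported for that row; any small gap comes from using coefficients rounded to four decimals.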


## Check for stationarity

### Subtask:
Perform a stationarity test (e.g., Augmented Dickey-Fuller test) on the `cay` index.


**Reasoning**:
Perform the Augmented Dickey-Fuller test on the 'cay' column of the merged_df DataFrame and print the results.



In [None]:
from statsmodels.tsa.stattools import adfuller

# Perform the Augmented Dickey-Fuller test on the 'cay' column
adf_test = adfuller(merged_df['cay'])

# Print the results
print('ADF Statistic: %f' % adf_test[0])
print('p-value: %f' % adf_test[1])
print('Critical Values:')
for key, value in adf_test[4].items():
    print('\t%s: %.3f' % (key, value))

ADF Statistic: -3.176350
p-value: 0.021395
Critical Values:
	1%: -3.483
	5%: -2.885
	10%: -2.579


In [None]:
merged_df[['month_x', 'cay']].to_csv('cay_index.csv', index=False)
print("cay index saved to cay_index.csv")

cay index saved to cay_index.csv


## Summary and Interpretation

Based on the analysis:

1.  **Regression Results**: In the regression of log consumption on log wealth and log income, log income is a significant predictor (coefficient 0.5377, p < 0.001), while log wealth is not statistically significant (coefficient 0.0567, p = 0.726). The R-squared value indicates that approximately 32.8% of the variance in log consumption is explained by the model. The Durbin-Watson statistic of 0.29 signals strong positive serial correlation in the residuals, so the nonrobust standard errors and p-values should be read with caution.

2.  **CAY Index**: The `cay` index, calculated as the residuals of this regression, represents the deviations of actual log consumption from the level predicted by log wealth and log income.

3.  **Stationarity Test (ADF Test)**: The Augmented Dickey-Fuller (ADF) test was performed on the `cay` index to check for stationarity.
    *   **ADF Statistic**: -3.176350
    *   **p-value**: 0.021395
    *   **Critical Values**:
        *   1%: -3.483
        *   5%: -2.885
        *   10%: -2.579

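    Since the p-value (0.021395) is below the 0.05 significance level, and the ADF statistic (-3.176) is more negative than the 5% and 10% critical values (-2.885 and -2.579) but not the 1% critical value (-3.483), we reject the null hypothesis of a unit root at the 5% level but not at the 1% level. This suggests that the `cay` index is **stationary**. Because `cay` is an estimated regression residual, the standard ADF critical values are more lenient than the residual-based (Engle-Granger) ones, so this conclusion should be treated as tentative.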

**Interpretation**:

The stationarity of the `cay` index, at the 5% level, is an important finding. In economic terms, a stationary `cay` index implies that deviations of consumption from the level implied by wealth and income tend to revert to their mean over time. This is consistent with cointegration between these variables, suggesting a stable long-run relationship. The `cay` index can be interpreted as a measure of consumption deviations from this long-run relationship, and its stationarity means that these deviations are temporary and do not persist indefinitely.
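One way to make "temporary" concrete is the half-life of a deviation, meaning the number of periods it takes for half of a shock to die out. The sketch below assumes the deviations follow an AR(1) process, which is an illustrative assumption and not something the analysis above establishes. It uses a simulated series with a made-up persistence of 0.8.

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_rho = 2000, 0.8  # made-up sample size and persistence

# Simulated mean-reverting deviation series (illustrative only)
dev = np.zeros(n)
for t in range(1, n):
    dev[t] = true_rho * dev[t - 1] + rng.normal(scale=0.05)

# OLS slope of dev[t] on dev[t-1] (with intercept) estimates the AR(1) coefficient
slope, intercept = np.polyfit(dev[:-1], dev[1:], deg=1)

# Half-life in periods: solve slope**h = 0.5 for h
half_life = np.log(0.5) / np.log(slope)
print("Estimated AR(1) coefficient:", round(slope, 3))
print("Half-life (periods):", round(half_life, 2))
```

Running the same two steps on `merged_df['cay']` would give a rough half-life in months, subject to the AR(1) assumption and the short sample.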