## AI Search Trends and Stock Market Excess Returns: Regression & Lag Analysis

In [1]:
# Load the engineered feature set

import pandas as pd

df = pd.read_csv("final_features.csv")

In [2]:
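# 'Unnamed: 0' is the CSV's unnamed first column (usually a saved index). Rename it
# to 'Date' only if no 'Date' column exists yet, so 'Date' is never duplicated.
if 'Date' not in df.columns:
    df.rename(columns={'Unnamed: 0': 'Date'}, inplace=True)
df.head()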

| Unnamed: 0 | Date | AMZN_ret | META_ret | GOOGL_ret | NVDA_ret | MSFT_ret | AMZN_excess | META_excess | GOOGL_excess | NVDA_excess | ... | Meta AI_lag1 | Meta AI_lag2 | Microsoft AI_lag1 | Microsoft AI_lag2 | NVIDIA AI_lag1 | NVIDIA AI_lag2 | Google AI_lag1 | Google AI_lag2 | Amazon AI_lag1 | Amazon AI_lag2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-01 | 0.035021 | -0.133371 | -0.132388 | -0.023373 | -0.023883 | 0.16014 | -0.008252 | -0.007268 | 0.101746 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 1 | 2020-04-01 | 0.2689 | 0.227278 | 0.159 | 0.108801 | 0.136327 | 0.142056 | 0.100434 | 0.032156 | -0.018043 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 2 | 2020-05-01 | -0.012785 | 0.099555 | 0.064469 | 0.214657 | 0.022543 | -0.058067 | 0.054273 | 0.019187 | 0.169375 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 3 | 2020-06-01 | 0.129567 | 0.008797 | -0.010792 | 0.070109 | 0.113652 | 0.111178 | -0.009592 | -0.02918 | 0.05172 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 4 | 2020-07-01 | 0.147114 | 0.117144 | 0.049293 | 0.118117 | 0.00737 | 0.092012 | 0.062043 | -0.005808 | 0.063016 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
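The rows above hint at how the excess-return columns were built: in each row, `<ticker>_ret - <ticker>_excess` is the same for every ticker shown (about -0.1251 for March 2020 and 0.1268 for April 2020). That is what you would expect if each excess return is the stock's return minus one shared benchmark return. The benchmark is not named in this section, so treat this as an inference from the displayed values. A minimal check, using numbers copied from the first two rows (the `implied_benchmark` helper is hypothetical):

```python
# Values copied from the first two rows of df.head() above (MSFT_excess is hidden by "...")
rows = {
    "2020-03-01": {
        "ret":    {"AMZN": 0.035021, "META": -0.133371, "GOOGL": -0.132388, "NVDA": -0.023373},
        "excess": {"AMZN": 0.16014,  "META": -0.008252, "GOOGL": -0.007268, "NVDA": 0.101746},
    },
    "2020-04-01": {
        "ret":    {"AMZN": 0.2689,   "META": 0.227278,  "GOOGL": 0.159,     "NVDA": 0.108801},
        "excess": {"AMZN": 0.142056, "META": 0.100434,  "GOOGL": 0.032156,  "NVDA": -0.018043},
    },
}


def implied_benchmark(row):
    """Return ret - excess per ticker; equal values suggest a single shared benchmark."""
    return {ticker: row["ret"][ticker] - row["excess"][ticker] for ticker in row["ret"]}


for date, row in rows.items():
    bench = implied_benchmark(row)
    spread = max(bench.values()) - min(bench.values())  # ~0 if the benchmark is shared
    print(date, {t: round(v, 5) for t, v in bench.items()}, "spread:", round(spread, 8))
```

If the spread were far from zero for some ticker, the excess columns would not share a single benchmark.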


## Predictive Lag Models
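The `_lag1` and `_lag2` columns are read straight from `final_features.csv`; this section does not show how they were built. A common way to build such features in pandas is `Series.shift`, sketched below on a made-up monthly series (the `build_lags` helper and the toy values are hypothetical). With `shift(k)`, the first `k` rows of a lag-`k` column are NaN, which is one reason a `dropna()` step usually follows.

```python
import pandas as pd


def build_lags(frame, cols, lags=(1, 2)):
    """Hypothetical helper: add '<col>_lag<k>' columns holding the value k rows earlier.

    Rows must already be in chronological order for shift(k) to mean "k months earlier".
    """
    out = frame.copy()
    for col in cols:
        for k in lags:
            out[f"{col}_lag{k}"] = out[col].shift(k)
    return out


# Toy monthly search-interest series (made-up values, not from final_features.csv)
toy = pd.DataFrame(
    {"Google AI": [10.0, 20.0, 30.0, 40.0]},
    index=pd.date_range("2020-01-01", periods=4, freq="MS"),
)
lagged = build_lags(toy, ["Google AI"])
print(lagged)
print(lagged.dropna())  # only rows where both lags are available remain
```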

In [3]:
import statsmodels.api as sm

# Drop rows with any missing values; the regressions below use the remaining 68 months
df = df.dropna()

# Map each stock ticker to its company-specific search-trend variable
company_map = {
    'NVDA': 'NVIDIA AI',
    'MSFT': 'Microsoft AI',
    'GOOGL': 'Google AI',
    'META': 'Meta AI',
    'AMZN': 'Amazon AI'
}

for ticker, trend in company_map.items():
    print("\n" + "="*80)
    print(f" Lag Analysis for {ticker} Excess Returns")
    print("="*80)

    excess_col = f"{ticker}_excess"
    
    # 1 month lag
    X1 = df[[f"{trend}_lag1"]]
    y = df[excess_col]
    X1 = sm.add_constant(X1)
    model1 = sm.OLS(y, X1).fit()  # Ordinary Least Squares Regression
    
    print(f"\n Lag 1 Regression (1 month Lag): {excess_col} ~ {trend}_lag1")
    print(model1.summary())

    # 2 month lag
    X2 = df[[f"{trend}_lag2"]]
    X2 = sm.add_constant(X2)
    model2 = sm.OLS(y, X2).fit()    # Ordinary Least Squares Regression

    print(f"\n Lag 2 Regression (2 month lag): {excess_col} ~ {trend}_lag2")
    print(model2.summary())

    # Combined OLS with both lag 1 and lag 2 as regressors
    X12 = df[[f"{trend}_lag1", f"{trend}_lag2"]]
    X12 = sm.add_constant(X12)
    model12 = sm.OLS(y, X12).fit()

    print(f"\n Combined Lag Regression: {excess_col} ~ lag1 + lag2")
    print(model12.summary())



 Lag Analysis for NVDA Excess Returns

 Lag 1 Regression (1 month Lag): NVDA_excess ~ NVIDIA AI_lag1
                            OLS Regression Results                            
Dep. Variable:            NVDA_excess   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.015
Method:                 Least Squares   F-statistic:                   0.03133
Date:                Sun, 23 Nov 2025   Prob (F-statistic):              0.860
Time:                        21:24:00   Log-Likelihood:                 50.510
No. Observations:                  68   AIC:                            -97.02
Df Residuals:                      66   BIC:                            -92.58
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                     coef    std err          t      P>|t|      [0.025      0.975]
-------------------------

Results of Lag Models:

The only models that are statistically significant at the 5% level are the GOOGL lag-1 model (p = 0.008) and the GOOGL lag-2 model (p = 0.018). However, both models have low R² values (0.101 and 0.008, respectively), so they explain less than 11% of the variance in Google’s excess returns. This suggests that search interest may lead GOOGL's market movements, but only weakly.
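For a regression with an intercept and a single regressor, R², the F-statistic, and its p-value are tied together: F = R² / (1 - R²) · (n - 2), compared against an F(1, n - 2) distribution (equivalent to the slope's two-sided t-test). The sketch below (helper names are illustrative; it assumes SciPy, which statsmodels already depends on) reproduces the NVDA lag-1 p-value of 0.860 from its printed F-statistic and inverts the relation for the GOOGL p-values quoted above. With n = 68, p = 0.008 corresponds to R² ≈ 0.10, in line with the reported 0.101, while p = 0.018 corresponds to R² ≈ 0.08.

```python
from scipy import stats


def r2_to_pvalue(r2, n):
    """Overall F-test p-value for an OLS fit with an intercept and one regressor."""
    df_resid = n - 2
    f_stat = r2 / (1 - r2) * df_resid
    return stats.f.sf(f_stat, 1, df_resid)


def pvalue_to_r2(p, n):
    """R-squared implied by a one-regressor F-test p-value."""
    df_resid = n - 2
    f_stat = stats.f.isf(p, 1, df_resid)
    return f_stat / (f_stat + df_resid)


n = 68  # "No. Observations" in the printed summaries

# NVDA lag-1: the summary reports F = 0.03133 and Prob (F-statistic) = 0.860
print(round(stats.f.sf(0.03133, 1, n - 2), 3))

# R-squared implied by the GOOGL p-values quoted above
for p in (0.008, 0.018):
    print(p, round(pvalue_to_r2(p, n), 3))

# Going the other way: the p-value an R-squared of 0.008 would produce with n = 68
print(round(r2_to_pvalue(0.008, n), 2))
```

The last line gives roughly 0.47, so the reported lag-2 R² and p-value are worth cross-checking against the printed GOOGL summary.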

All other models have p-values well above 0.1 and R² values very close to 0, so they are not statistically significant.

The lag analysis indicates that most search-trend variables do not significantly predict excess returns at lag 1 or lag 2. This suggests that increases in AI-related searches reflect public attention more than they provide a consistent, tradable signal of future stock performance.
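Because the lag loop fits 15 regressions (three specifications for each of five tickers), one or two small p-values can appear by chance even if no search term has any real effect. A simple way to account for this, not applied in the analysis above, is the Bonferroni correction: compare each p-value against 0.05 divided by the number of tests. A minimal arithmetic sketch with the two GOOGL p-values:

```python
alpha = 0.05
n_tests = 5 * 3  # 5 tickers x (lag 1, lag 2, lag 1 + lag 2) regressions

# Expected number of p-values below alpha if no search term had any real effect
expected_false_positives = alpha * n_tests
# Per-test threshold that keeps the family-wise error rate at or below alpha
bonferroni_threshold = alpha / n_tests

googl_pvalues = {"GOOGL lag 1": 0.008, "GOOGL lag 2": 0.018}
print(f"Expected chance rejections at 5%: {expected_false_positives:.2f}")
print(f"Bonferroni threshold: {bonferroni_threshold:.4f}")
for name, p in googl_pvalues.items():
    print(name, "significant after correction:", p < bonferroni_threshold)
```

About 0.75 chance rejections are expected at the 5% level, and neither GOOGL p-value clears the corrected threshold of roughly 0.0033, which is consistent with reading the GOOGL signal as weak.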

## General AI Interest to Predict Excess Returns

In [4]:
# No lag here: search interest comes from the same row (month) as the excess return
general_terms = ['AI', 'Machine Learning', 'OpenAI', 'ChatGPT', 'Gemini']

for stock in ['AMZN_excess', 'META_excess', 'GOOGL_excess', 'NVDA_excess', 'MSFT_excess']:
    X = df[general_terms]
    X = sm.add_constant(X)
    y = df[stock]
    
    model = sm.OLS(y, X).fit()
    print(f"\n General AI Trends {stock}")
    print(model.summary())


 General AI Trends AMZN_excess
                            OLS Regression Results                            
Dep. Variable:            AMZN_excess   R-squared:                       0.029
Model:                            OLS   Adj. R-squared:                 -0.050
Method:                 Least Squares   F-statistic:                    0.3638
Date:                Sun, 23 Nov 2025   Prob (F-statistic):              0.871
Time:                        21:24:00   Log-Likelihood:                 83.703
No. Observations:                  68   AIC:                            -155.4
Df Residuals:                      62   BIC:                            -142.1
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                       coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------------
const   
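The negative adjusted R² in this output follows from the usual penalty for extra regressors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), which drops below zero whenever R² < k/(n - 1), here 5/67 ≈ 0.075. The sketch below (helper names are illustrative) starts from the printed AMZN F-statistic, which carries more digits than the printed R², recovers R² by inverting F = (R²/k) / ((1 - R²)/(n - k - 1)), and reproduces the reported R² and adjusted R².

```python
def r2_from_f(f_stat, n, k):
    """Invert the overall F-statistic of an OLS fit with an intercept and k regressors."""
    df_resid = n - k - 1
    return k * f_stat / (k * f_stat + df_resid)


def adjusted_r2(r2, n, k):
    """Adjusted R-squared: penalizes R-squared for the number of regressors k."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n, k = 68, 5      # observations and general AI search terms in the AMZN_excess model
f_stat = 0.3638   # F-statistic printed in the AMZN_excess summary above
r2 = r2_from_f(f_stat, n, k)
print(round(r2, 3), round(adjusted_r2(r2, n, k), 3))
```

The recovered values round to the summary's 0.029 and -0.050.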

## Company-Specific Search Interests to Predict Excess Returns

In [5]:
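# Same-row (unlagged) company search interest for each stock's excess return.
# A new name avoids overwriting the ticker-to-trend company_map used by the lag models.
stock_trend_map = {
    'AMZN_excess': ['Amazon AI'],
    'META_excess': ['Meta AI'],
    'GOOGL_excess': ['Google AI'],
    'NVDA_excess': ['NVIDIA AI'],
    'MSFT_excess': ['Microsoft AI']
}

for stock, cols in stock_trend_map.items():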
    X = df[cols]
    X = sm.add_constant(X)
    y = df[stock]
    
    model = sm.OLS(y, X).fit()
    print(f"\n Company-Specific Trend {stock}")
    print(model.summary())


 Company-Specific Trend AMZN_excess
                            OLS Regression Results                            
Dep. Variable:            AMZN_excess   R-squared:                       0.002
Model:                            OLS   Adj. R-squared:                 -0.013
Method:                 Least Squares   F-statistic:                    0.1597
Date:                Sun, 23 Nov 2025   Prob (F-statistic):              0.691
Time:                        21:24:01   Log-Likelihood:                 82.802
No. Observations:                  68   AIC:                            -161.6
Df Residuals:                      66   BIC:                            -157.2
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          