# AIS Evaluation

## Purpose
This notebook is designed to replicate and extend the empirical evaluation conducted in the paper:

"Learning from the Best: Can Artificial Intelligence Replicate Equity Analyst Skill?"

Our objective is to apply the same methodology to our dataset of Analyst Insight Score (AIS) values for the top 100 Indian stocks. The primary goals are:

- Alignment Test: Assess whether the AIS is consistent with analysts' revision behaviors in the Indian market.
- Asset Pricing Test: Examine the AIS's ability to forecast cross-sectional returns for Indian stocks with large market capitalizations.
- Baseline Comparison: Evaluate the efficacy of an alternative baseline score in stock selection.
- Bias Investigation: Investigate whether AIS scores are influenced by potential biases or look-ahead effects.

## Notebook Structure
The notebook is organized into the following key sections, mirroring the steps outlined in the paper's Empirical Evaluation section:

### 1. Data Loading and Preparation
- AIS Scores:
    - Load the ais_scores.csv file containing AIS scores for the top 100 Indian stocks.
- Additional Data:
    - Load datasets for analyst revisions, including consensus price targets, recommendations, and EPS forecasts.
    - Load financial data, including Standardized Unexpected Earnings (SUE), market capitalization, and price-to-book ratios.
    - Load stock returns data for the asset pricing tests.
- Data Merging:
    - Merge the AIS scores with analyst revisions and financial data to create a comprehensive dataset for analysis.
    - Ensure all date formats and ticker symbols are consistent across datasets.
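The key-consistency step can be sketched as follows. This is a minimal sketch under two assumptions: tickers differ across files only in case and surrounding whitespace, and financial records (SUE, MarketCap, Pb) are dated at period ends that rarely coincide with earnings-call dates, so each AIS row takes the most recent financial record on or before its call date. `standardize_keys` and `attach_latest` are hypothetical helper names, and the toy values are made up.

```python
import pandas as pd


def standardize_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with datetime Dates (time of day dropped) and trimmed, upper-case Tickers."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"]).dt.normalize()
    out["Ticker"] = out["Ticker"].astype(str).str.strip().str.upper()
    return out


def attach_latest(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """For each (Ticker, Date) in `left`, attach the latest `right` row dated on or before it."""
    right = right.rename(columns={"Date": "Source_Date"}).sort_values("Source_Date")
    merged = pd.merge_asof(
        left.sort_values("Date"), right,
        left_on="Date", right_on="Source_Date", by="Ticker", direction="backward",
    )
    return merged.sort_values(["Ticker", "Date"]).reset_index(drop=True)


# Toy example with made-up values
ais = standardize_keys(pd.DataFrame({
    "Date": ["2023-05-10", "2023-08-09"], "Ticker": [" tcs", "TCS "], "AIS": [0.6, 0.4],
}))
fin = standardize_keys(pd.DataFrame({
    "Date": ["2023-03-31", "2023-06-30"], "Ticker": ["tcs", "tcs"], "SUE": [0.8, -0.2],
}))
print(attach_latest(ais, fin))
```

An as-of merge keeps every AIS row and never uses a financial record dated after the call, which an exact-date merge cannot guarantee.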

### 2. Variable Calculation
- AIS Change (AIS_Chg):
    - Calculate the quarter-over-quarter change in AIS for each stock to capture shifts in company outlook.
- SUE Change (SUE_Chg):
    - Calculate the quarter-over-quarter change in SUE to measure changes in earnings surprises.
- Dependent Variables:
    - Create discrete variables representing analyst revisions:
        - PT_Rev30/90: Price target revisions within 30/90 days post-earnings call.
        - RC_Rev30/90: Recommendation changes within 30/90 days.
        - EPS_Rev30/90: EPS forecast revisions within 30/90 days.
    - These variables are coded as:
        - 1: Increase beyond a specified threshold.
        - 0: Minor changes within the threshold.
        - -1: Decrease exceeding the threshold.
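A minimal sketch of this three-way coding, assuming the change is already in the same units as the threshold (for example, a relative change compared with a 1% threshold). Rows with no available change are kept as missing rather than coded 0, so they drop out of later regressions instead of being counted as "no revision". `code_revision` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def code_revision(change: pd.Series, threshold: float) -> pd.Series:
    """Map a consensus change to 1 (up), -1 (down) or 0 (within +/- threshold); NaN stays NaN."""
    coded = np.where(change > threshold, 1.0, np.where(change < -threshold, -1.0, 0.0))
    return pd.Series(coded, index=change.index).where(change.notna())


changes = pd.Series([0.05, -0.03, 0.004, np.nan])  # e.g. relative price-target changes
print(code_revision(changes, threshold=0.01).tolist())  # [1.0, -1.0, 0.0, nan]
```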

### 3. Correlation Analysis (Table II)
Objective:
- Examine the relationships between AIS, AIS_Chg, SUE, SUE_Chg, market capitalization, price-to-book ratio, and analyst revision proxies.

Method:
- Calculate the Pearson correlation coefficients between these variables.
- Analyze the correlation matrix to identify significant associations.
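`DataFrame.corr()` returns coefficients only. To flag which associations are statistically significant, the sketch below pairs each Pearson coefficient with its two-sided p-value from `scipy.stats.pearsonr`, using pairwise-complete observations. `corr_with_pvalues` is a hypothetical helper, and the toy columns are random data that only borrow the notebook's variable names.

```python
import numpy as np
import pandas as pd
from scipy import stats


def corr_with_pvalues(df: pd.DataFrame):
    """Pairwise Pearson correlations and two-sided p-values on pairwise-complete rows."""
    cols = df.columns
    corr = pd.DataFrame(np.nan, index=cols, columns=cols)
    pval = pd.DataFrame(np.nan, index=cols, columns=cols)
    for i, a in enumerate(cols):
        corr.loc[a, a] = 1.0
        for b in cols[i + 1:]:
            pair = df[[a, b]].dropna()
            if len(pair) < 3:  # too few observations for a meaningful test
                continue
            r, p = stats.pearsonr(pair[a], pair[b])
            corr.loc[a, b] = corr.loc[b, a] = r
            pval.loc[a, b] = pval.loc[b, a] = p
    return corr, pval


# Toy data: PT_Rev30 is built to correlate with AIS, Noise is independent
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame({
    "AIS": x,
    "PT_Rev30": x + rng.normal(scale=0.5, size=200),
    "Noise": rng.normal(size=200),
})
corr, pval = corr_with_pvalues(toy)
print(corr.round(2))
print(pval.round(4))
```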

### 4. Regression Analysis (Table III)
Objective:
- Explore the predictive power of AIS and other variables on analyst revision behaviors.

Method:
- Perform linear regressions where analyst revision proxies are the dependent variables.
- Independent variables include AIS, AIS_Chg, SUE, SUE_Chg, market capitalization, and price-to-book ratio.
- Interpret the coefficients, t-statistics, and R-squared values to assess significance.

### 5. Asset Pricing Tests (Tables IV and V)
Objective:
- Determine if portfolios formed based on AIS and its change can generate abnormal returns.

Method:
- Portfolio Formation:
    - Sort stocks into quintiles based on AIS and AIS_Chg.
    - Construct equal-weighted and value-weighted portfolios for each quintile.
- Excess Returns Calculation:
    - Calculate monthly excess returns by subtracting the risk-free rate.
- Regression Against Fama-French Factors:
    - Regress portfolio excess returns against the Fama-French three-factor and five-factor models.
    - Analyze alphas (intercepts) to see if portfolios earn significant abnormal returns.
- Comparison:
    - Compare the performance of AIS-based portfolios with those based on traditional metrics like SUE.
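Equal- and value-weighted quintile portfolios can both come from one helper, sketched below. It assumes a long panel with `Date`, `Return`, the sorting signal and, for value weighting, a market-capitalisation column measured at portfolio formation (not during the holding month). `quintile_returns` is a hypothetical helper and the toy panel is made up.

```python
from typing import Optional

import numpy as np
import pandas as pd


def quintile_returns(df: pd.DataFrame, signal: str, weight: Optional[str] = None) -> pd.DataFrame:
    """Monthly quintile returns (1 = lowest signal) plus the Q5-Q1 spread.

    Equal-weighted when `weight` is None; otherwise each stock's return is weighted by
    the `weight` column. Assumes at least five stocks per month.
    """
    needed = [signal, "Return"] + ([weight] if weight else [])
    d = df.dropna(subset=needed).copy()
    # Rank first so ties cannot collapse quintile boundaries
    d["Quintile"] = d.groupby("Date")[signal].transform(
        lambda x: pd.qcut(x.rank(method="first"), 5, labels=False) + 1
    )
    d["w"] = 1.0 if weight is None else d[weight]
    d["wr"] = d["w"] * d["Return"]
    sums = d.groupby(["Date", "Quintile"])[["wr", "w"]].sum()
    out = (sums["wr"] / sums["w"]).unstack("Quintile")
    out["Q5-Q1"] = out[5] - out[1]
    return out


# Toy panel: 10 stocks over 2 months, higher signal -> higher return
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2023-01-31"] * 10 + ["2023-02-28"] * 10),
    "Ticker": [f"S{i}" for i in range(10)] * 2,
    "AIS": np.tile(np.arange(10), 2),
    "Return": np.tile(np.arange(10) * 0.01, 2),
    "MarketCap": np.tile([100.0, 50.0], 10),
})
ew = quintile_returns(toy, "AIS")
vw = quintile_returns(toy, "AIS", weight="MarketCap")
print(ew)
print(vw)
```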

### 6. Baseline Test
Objective:
- Evaluate the effectiveness of a baseline AIS score generated from a simple prompt without the analyst template.

Method:
- Repeat the asset pricing tests using the baseline AIS scores.
- Compare the results to those obtained using the template-based AIS to assess the added value of incorporating analyst expertise.
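A simple way to compare the two scores is to line up their monthly Q5-Q1 returns and test the mean difference with a paired t-test (`scipy.stats.ttest_rel`). This is one option among several, not the paper's stated procedure, and it ignores autocorrelation in monthly returns. `compare_spreads` is a hypothetical helper and the toy series are simulated.

```python
import numpy as np
import pandas as pd
from scipy import stats


def compare_spreads(template: pd.Series, baseline: pd.Series) -> dict:
    """Compare monthly Q5-Q1 returns of two signals over their common months."""
    both = pd.concat({"template": template, "baseline": baseline}, axis=1).dropna()
    diff = both["template"] - both["baseline"]
    t_stat, p_value = stats.ttest_rel(both["template"], both["baseline"])
    return {
        "months": len(both),
        "mean_template": both["template"].mean(),
        "mean_baseline": both["baseline"].mean(),
        "mean_difference": diff.mean(),
        "t_stat": t_stat,
        "p_value": p_value,
    }


# Simulated spreads: the template spread is the baseline plus about 0.5% a month
months = pd.period_range("2020-01", periods=36, freq="M")
rng = np.random.default_rng(2)
baseline = pd.Series(rng.normal(0.002, 0.02, size=36), index=months)
template = baseline + 0.005 + rng.normal(0, 0.002, size=36)
print(compare_spreads(template, baseline))
```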

### 7. Counterfactual Transcript Test (to be done in a separate notebook)

Objective:
- Investigate potential biases or look-ahead effects in AIS scores.

Method:
- Generate counterfactual transcripts by modifying the sentiment of original transcripts (e.g., turning positive statements into negative ones).
- Recalculate AIS scores for these modified transcripts.
- Perform regression analysis to determine if AIS scores are primarily driven by content sentiment or influenced by biases toward certain stocks.

### Import Libraries and Load Data


In [2]:
# Import necessary libraries
import pandas as pd
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Load AIS scores from CSV
ais_df = pd.read_csv('ais_scores.csv')

# Inspect the data
print(ais_df.head())



### Load Analyst Revisions, Financial, and Returns Data

In [3]:

# Load data with columns: Date, Ticker, PT_consensus, RC_consensus, EPS_consensus
analyst_df = pd.read_csv('analyst_revisions.csv')

# Placeholder for Financial Data
# Load data with columns: Date, Ticker, SUE, MarketCap, Pb
financials_df = pd.read_csv('financial_data.csv')

# Placeholder for Stock Returns Data
# Load data with columns: Date, Ticker, Return
returns_df = pd.read_csv('stock_returns.csv')

# Ensure date columns are in datetime format
ais_df['Date'] = pd.to_datetime(ais_df['Date'])
analyst_df['Date'] = pd.to_datetime(analyst_df['Date'])
financials_df['Date'] = pd.to_datetime(financials_df['Date'])
returns_df['Date'] = pd.to_datetime(returns_df['Date'])



### Merge Data

In [4]:
# Merge AIS with Analyst Revisions
data = pd.merge(ais_df, analyst_df, on=['Date', 'Ticker'], how='left')

# Merge with Financial Data
data = pd.merge(data, financials_df, on=['Date', 'Ticker'], how='left')

# Sort data by Ticker and Date
data = data.sort_values(by=['Ticker', 'Date']).reset_index(drop=True)



### Calculate Metrics

In [None]:
# Calculate AIS Chg
data['AIS_Chg'] = data.groupby('Ticker')['AIS'].diff()

# Calculate SUE Chg
data['SUE_Chg'] = data.groupby('Ticker')['SUE'].diff()
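`diff()` compares each row with the previous row for the same ticker, so a missing quarter silently turns a two-quarter change into an "AIS_Chg". A hedged variant that blanks out changes spanning a gap is sketched below; the 120-day cutoff is an assumption (roughly one quarter plus slack for shifting call dates), and `quarterly_change` is a hypothetical helper.

```python
import pandas as pd


def quarterly_change(df: pd.DataFrame, col: str, max_gap_days: int = 120) -> pd.Series:
    """Change in `col` from the previous row of the same ticker, set to NaN when the
    previous observation is more than `max_gap_days` earlier (a quarter is missing)."""
    df = df.sort_values(["Ticker", "Date"])
    change = df.groupby("Ticker")[col].diff()
    gap = df.groupby("Ticker")["Date"].diff().dt.days
    return change.where(gap <= max_gap_days)


# Toy example: the July 2023 call is missing, so the October change is not a one-quarter change
toy = pd.DataFrame({
    "Ticker": ["INFY"] * 4,
    "Date": pd.to_datetime(["2023-01-12", "2023-04-13", "2023-10-12", "2024-01-11"]),
    "AIS": [0.5, 0.7, 0.4, 0.6],
})
toy["AIS_Chg"] = quarterly_change(toy, "AIS")
print(toy)
```

The returned Series keeps the input's index labels, so it can be assigned straight back, e.g. `data['AIS_Chg'] = quarterly_change(data, 'AIS')`.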


In [None]:
# Function to attach the consensus `days` calendar days after each AIS date and the change from the AIS date.
# The data hold one row per stock per quarter, so shift(-days) would move `days` quarters ahead;
# instead, take the latest consensus on or before Date + days from the analyst data.
def calculate_future_change(df, consensus_df, consensus_col, days, relative=False):
    lookup = (consensus_df[['Date', 'Ticker', consensus_col]].dropna()
              .rename(columns={'Date': 'Lookup_Date', consensus_col: 'Future'})
              .sort_values('Lookup_Date'))
    target = df[['Date', 'Ticker']].assign(Target_Date=df['Date'] + pd.Timedelta(days=days))
    target = target.reset_index().sort_values('Target_Date')
    future = pd.merge_asof(target, lookup, left_on='Target_Date', right_on='Lookup_Date',
                           by='Ticker', direction='backward').set_index('index')['Future']
    change = future - df[consensus_col]
    if relative:
        # Express the change as a fraction of the starting consensus
        change = change / df[consensus_col].abs()
    # Separate column per window, e.g. PT_consensus_change30 and PT_consensus_change90
    df[f'{consensus_col}_change{days}'] = change
    return df

# Apply the function for the 30- and 90-day windows
# Assumption: PT and EPS thresholds below are relative (1%) changes; RC uses the raw scale
for col, rel in [('PT_consensus', True), ('RC_consensus', False), ('EPS_consensus', True)]:
    for days in (30, 90):
        data = calculate_future_change(data, analyst_df, col, days, relative=rel)


In [None]:
# Define thresholds
pt_threshold = 0.01   # 1% relative change in the consensus price target
rc_threshold = 0.1    # 0.1 points on the consensus recommendation scale
eps_threshold = 0.01  # 1% relative change in the consensus EPS forecast

# Code a change as 1 (up), -1 (down) or 0 (within threshold); keep NaN when no change is available
def classify_revision(change, threshold):
    coded = np.where(change > threshold, 1, np.where(change < -threshold, -1, 0)).astype(float)
    return pd.Series(coded, index=change.index).where(change.notna())

# Price target, recommendation and EPS revisions for each window
# Note: if RC_consensus uses 1 = Strong Buy, multiply RC revisions by -1 so that 1 means an upgrade
for days in (30, 90):
    data[f'PT_Rev{days}'] = classify_revision(data[f'PT_consensus_change{days}'], pt_threshold)
    data[f'RC_Rev{days}'] = classify_revision(data[f'RC_consensus_change{days}'], rc_threshold)
    data[f'EPS_Rev{days}'] = classify_revision(data[f'EPS_consensus_change{days}'], eps_threshold)

# Log-transform MarketCap
data['MarketCap'] = np.log(data['MarketCap'])



### Correlation Matrix (Table II)

In [None]:
# Select variables for correlation matrix
corr_vars = ['AIS', 'AIS_Chg', 'SUE', 'SUE_Chg', 'MarketCap', 'Pb',
             'PT_Rev30', 'PT_Rev90', 'RC_Rev30', 'RC_Rev90', 'EPS_Rev30', 'EPS_Rev90']

# Calculate correlation matrix
corr_matrix = data[corr_vars].corr()

# Display correlation matrix
print(corr_matrix)


### Regression Analysis (Table III)

In [None]:
# Define dependent variables
dependent_vars = ['PT_Rev30', 'PT_Rev90', 'RC_Rev30', 'RC_Rev90', 'EPS_Rev30', 'EPS_Rev90']

# Define independent variables
independent_vars = ['AIS', 'AIS_Chg', 'SUE', 'SUE_Chg', 'MarketCap', 'Pb']

# Loop over dependent variables and run regressions
for dep_var in dependent_vars:
    # Drop NaN values for this regression
    regression_data = data[independent_vars + [dep_var]].dropna()
    
    # Define X and y
    X = regression_data[independent_vars]
    y = regression_data[dep_var]
    
    # Add constant term
    X = sm.add_constant(X)
    
    # Fit OLS model
    model = sm.OLS(y, X).fit()
    
    # Print summary statistics
    print(f"Regression results for {dep_var}:")
    print(model.summary())
    print("\n")


### Asset Pricing Results (Table IV)


In [None]:
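# Build a monthly panel keyed by the month in which each return is earned
# (assumes returns_df holds one monthly return per stock, dated at month-end)
returns_m = returns_df[['Date', 'Ticker', 'Return']].copy()
returns_m['Return'] = pd.to_numeric(returns_m['Return'], errors='coerce')

# AIS dates are earnings-call dates, so attach the latest AIS on or before each month-end
ais_lookup = ais_df[['Date', 'Ticker', 'AIS']].rename(columns={'Date': 'AIS_Date'}).sort_values('AIS_Date')
portfolio_data = pd.merge_asof(returns_m.sort_values('Date'), ais_lookup, left_on='Date',
                               right_on='AIS_Date', by='Ticker', direction='backward')
portfolio_data['AIS'] = pd.to_numeric(portfolio_data['AIS'], errors='coerce')

# Use the AIS known at the previous month-end to avoid look-ahead (assumes no missing months)
portfolio_data = portfolio_data.sort_values(['Ticker', 'Date'])
portfolio_data['AIS'] = portfolio_data.groupby('Ticker')['AIS'].shift(1)

# Sort data by Date and Ticker
portfolio_data = portfolio_data.sort_values(by=['Date', 'Ticker']).reset_index(drop=True)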


In [None]:
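# Function to assign quintiles (1 = lowest, 5 = highest) within each month.
# Ranking first breaks ties so every month gets exactly five groups; months with fewer
# than five usable stocks are dropped because they cannot fill all five quintiles.
def assign_quintiles(df, variable):
    df = df.dropna(subset=[variable, 'Return']).copy()
    df = df[df.groupby('Date')[variable].transform('count') >= 5].copy()
    df['Quintile'] = df.groupby('Date')[variable].transform(
        lambda x: pd.qcut(x.rank(method='first'), 5, labels=False) + 1
    )
    return df

# Assign quintiles based on AIS
portfolio_data = assign_quintiles(portfolio_data, 'AIS')

# Calculate equal-weighted portfolio returns
portfolio_returns = portfolio_data.groupby(['Date', 'Quintile'])['Return'].mean().reset_index()

# Pivot table to have Quintiles as columns
portfolio_pivot = portfolio_returns.pivot(index='Date', columns='Quintile', values='Return')

# Calculate Q5 - Q1 return
portfolio_pivot['Q5-Q1'] = portfolio_pivot[5] - portfolio_pivot[1]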


In [None]:
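# Load risk-free rate data (columns: Date, RF); assumes RF is a monthly rate in the same units as Return
rf_rate = pd.read_csv('risk_free_rate.csv')
rf_rate['Date'] = pd.to_datetime(rf_rate['Date'])

# Move Date from the index to a column, then merge with portfolio returns
portfolio_pivot = portfolio_pivot.reset_index().merge(rf_rate, on='Date', how='left')

# Calculate excess returns for each quintile portfolio
for quintile in [1, 2, 3, 4, 5]:
    portfolio_pivot[f'Excess_Return_{quintile}'] = portfolio_pivot[quintile] - portfolio_pivot['RF']

# Q5-Q1 is a zero-cost long-short portfolio, so its return is already an excess return
portfolio_pivot['Excess_Return_Q5-Q1'] = portfolio_pivot['Q5-Q1']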
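If the risk-free series is quoted as an annualised percentage, as money-market yields usually are, it needs converting to a monthly decimal rate before it is subtracted from monthly returns. A minimal sketch using compound conversion follows; `annual_pct_to_monthly` is a hypothetical helper, and simple division (`rate_pct / 100 / 12`) is a common approximation.

```python
def annual_pct_to_monthly(rate_pct):
    """Convert an annualised rate in percent (e.g. 7.0) to a compounded monthly decimal rate.

    Works on a float or, element-wise, on a pandas Series.
    """
    return (1 + rate_pct / 100) ** (1 / 12) - 1


print(round(annual_pct_to_monthly(12.0), 6))
```

Applied to the loaded data, this would read `rf_rate['RF'] = annual_pct_to_monthly(rf_rate['RF'])`, run before the excess-return loop.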


In [None]:
import statsmodels.formula.api as smf

# Load Fama-French factors (columns: Date, Mkt_RF, SMB, HML, RMW, CMA), dated like the portfolio returns
ff_factors = pd.read_csv('ff_factors.csv')
ff_factors['Date'] = pd.to_datetime(ff_factors['Date'])

# Merge with portfolio returns
portfolio_pivot = portfolio_pivot.merge(ff_factors, on='Date', how='left')

# Keep months with the long-short return and all five factors, so both models use the same sample
factor_cols = ['Mkt_RF', 'SMB', 'HML', 'RMW', 'CMA']
regression_data = portfolio_pivot.dropna(subset=['Excess_Return_Q5-Q1'] + factor_cols)

# Q("...") quotes the column name; without it the formula parser reads '-' as subtraction.
# The Intercept coefficient in each model is the portfolio's alpha.
formula_3f = 'Q("Excess_Return_Q5-Q1") ~ Mkt_RF + SMB + HML'
formula_5f = 'Q("Excess_Return_Q5-Q1") ~ Mkt_RF + SMB + HML + RMW + CMA'

# Three-factor model regression
model_3f = smf.ols(formula=formula_3f, data=regression_data).fit()
print("Three-Factor Model Regression Results:")
print(model_3f.summary())
print("\n")

# Five-factor model regression
model_5f = smf.ols(formula=formula_5f, data=regression_data).fit()
print("Five-Factor Model Regression Results:")
print(model_5f.summary())
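The regressions above use conventional OLS standard errors. Monthly portfolio returns are often serially correlated, so Newey-West (HAC) standard errors are a common alternative in asset-pricing tests. The sketch below swaps in `fit(cov_type="HAC", ...)` from statsmodels; the lag length of 3 is an assumption, `factor_alpha` is a hypothetical helper, and the toy series is simulated with a true alpha of 0.01.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def factor_alpha(df: pd.DataFrame, ret_col: str, factors, maxlags: int = 3):
    """Regress `ret_col` on `factors`; return the intercept (alpha) and its Newey-West t-statistic."""
    formula = f'Q("{ret_col}") ~ ' + " + ".join(factors)
    fit = smf.ols(formula, data=df).fit(cov_type="HAC", cov_kwds={"maxlags": maxlags})
    return fit.params["Intercept"], fit.tvalues["Intercept"]


# Simulated long-short returns with a known alpha of 0.01 per month
rng = np.random.default_rng(3)
n = 240
toy = pd.DataFrame({
    "Mkt_RF": rng.normal(0.01, 0.05, n),
    "SMB": rng.normal(0, 0.03, n),
    "HML": rng.normal(0, 0.03, n),
})
toy["Excess_Return_Q5-Q1"] = 0.01 + 0.5 * toy["Mkt_RF"] - 0.2 * toy["SMB"] + rng.normal(0, 0.005, n)
alpha, t_alpha = factor_alpha(toy, "Excess_Return_Q5-Q1", ["Mkt_RF", "SMB", "HML"])
print(f"alpha = {alpha:.4f}, Newey-West t = {t_alpha:.2f}")
```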
