# Earnings Call Tone Impact on Short-Term Stock Returns
#### By: Ashton Meyer-Bibbins


## Research Questions

1) Does the tone (positive, negative, uncertain) expressed during an earnings call predict short-window abnormal stock returns, defined as the firm’s actual return over the [0, +1]-day event window surrounding the call in excess of the market return (proxied by the S&P 500 ETF, SPY)?
2) Does the tone-to-return relationship differ across industries, firm sizes, or leadership?
3) Do tone effects weaken or strengthen on high-volatility market days, as measured by SPY (i.e., when SPY's price rises or falls sharply, is the impact of tone amplified or dampened)?
4) *Potential question*: After controlling for EPS surprise (the difference between a company's actual earnings per share and the consensus EPS forecast from external analysts; large positive or negative surprises can significantly move a stock), does tone still explain residual abnormal returns (measured over a [-1,+1] event window)?
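RQ3 leaves "high-volatility market day" open. One possible operationalization, sketched below, flags days whose absolute SPY daily return is at or above a chosen quantile of the sample. The helper name `flag_high_vol_days` and the 90th-percentile default are assumptions, not part of the plan, and because the threshold uses the full sample it carries some look-ahead (a rolling quantile would avoid that).

```python
import pandas as pd


def flag_high_vol_days(spy_ret: pd.Series, q: float = 0.9) -> pd.Series:
    """Hypothetical helper: True where |SPY return| is at or above its q-quantile.

    spy_ret holds daily SPY returns indexed by date; NaNs are skipped by
    quantile() and compare as False. The full-sample threshold is an
    assumption and introduces look-ahead.
    """
    abs_ret = spy_ret.abs()
    threshold = abs_ret.quantile(q)
    return abs_ret >= threshold


# Toy example with made-up returns
toy = pd.Series([0.001, -0.002, 0.03, -0.04, 0.0005])
print(flag_high_vol_days(toy, q=0.6).tolist())  # [False, False, True, True, False]
```

The resulting flag could then enter the regression as an interaction, for example `pos_pct * high_vol` in a statsmodels formula, which expands to both main effects plus their product.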


## Motivation
Corporate earnings calls serve as the main bridge between enterprises and investors. They shape how markets interpret financial performance beyond the raw numbers, providing context for the numeric results. While the quantitative outcomes of an earnings report are easy to measure, the language executives use (their tone, confidence, or underlying uncertainty) can carry additional weight and may influence investor sentiment around each individual call.

It is important to study this relationship because, while markets are a quantitative beast, they also respond to narrative, context, and behavioral signals. Prior work has linked earnings-call tone to abnormal performance that unfolds as gradual post-announcement stock price drift. In contrast, this project isolates the short-window reaction ([0,+1]) to measure the immediate market response to tone, a complementary perspective that is less exposed to the unrelated news and market movements that accumulate over longer windows and confound the analysis. Understanding this aspect of the psychology behind financial decision-making offers an interesting lens on how behavior affects markets, informing future analysis and serving as an input for future models.

Beyond its economic ties, this project is an interesting computational exploration that combines natural language analysis with statistical modeling, an approach that is becoming an ever-larger part of financial and economic research. By linking natural language signals to numerical outcomes, it deepens (my) understanding of how unstructured information can feed into statistical analysis and how that information translates into measurable impacts.

Experience with finance and markets, which I have gained over the last three years, along with my interest in data science, serves as the foundation for my desire to pursue this project. It is particularly interesting in its combination of NLP and statistical analysis, and I look forward to seeing the results.


## Data Setting
This project draws on three publicly available datasets that together support analysis of how executive tone in earnings calls relates to short-window abnormal stock returns.
1. **[Earnings Call Transcripts (Motley Fool / Kaggle)](https://www.kaggle.com/datasets/tpotterer/motley-fool-scraped-earnings-call-transcripts)** - This dataset includes roughly 18,000 quarterly earnings-call transcripts for U.S.-listed companies. Each record provides the company ticker, call date, exchange, quarter, and full transcript text. The data were scraped from The Motley Fool’s public archives and compiled by Kaggle contributors. The transcripts are the unstructured textual foundation for tone analysis, allowing extraction of sentiment features using finance-specific linguistic dictionaries (see #4).
2. **[NASDAQ Daily Prices (Kaggle / Paul Mooney)](https://www.kaggle.com/datasets/svaningelgem/nasdaq-daily-stock-prices)** - This dataset contains daily open, high, low, close, adjusted-close, and volume (OHLCV) data for U.S. equities from roughly 2015-2024. It enables the computation of firm-level daily returns and the construction of event-window returns surrounding each earnings call date.
3. **[S&P 500 ETF (SPY) Prices (Kaggle)](https://www.kaggle.com/datasets/benjaminbtang/spy-historical-prices)** - This dataset provides historical daily prices for the SPY ETF, which is used as a market benchmark. Subtracting SPY’s daily return from a firm’s daily return produces a simple measure of abnormal return, controlling for broad market movements.
4. *Supplemental dataset/tool* **|** ***[Loughran-McDonald Financial Sentiment Dictionary](https://sraf.nd.edu/loughranmcdonald-master-dictionary/)*** - Used to map word occurrences in transcripts to finance-specific tone categories (positive, negative, uncertainty, etc.). This resource, widely adopted in accounting and finance research, helps ensure that the tone scores reflect financial meaning rather than generic sentiment.
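The market-adjusted abnormal return described in item 3 can be written compactly. In the notation below (ours, not the datasets'), $P_{i,t}$ is firm $i$'s closing price on trading day $t$ relative to Day 0, $R_{i,t}$ its close-to-close return, and $R_{\mathrm{SPY},t}$ the same return computed for SPY; no market beta is estimated.

$$
R_{i,t} = \frac{P_{i,t}}{P_{i,t-1}} - 1, \qquad AR_{i,t} = R_{i,t} - R_{\mathrm{SPY},t}, \qquad CAR_i[0,+1] = AR_{i,0} + AR_{i,+1}
$$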

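The price, SPY, and dictionary files are stored as CSVs, while the transcripts load from a pickled DataFrame. The transcript and price data will be merged on ticker and date keys, and SPY on date, to align firm-level and market-level data for each event window.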

#### Potential Challenges
None of the datasets include formal datasheets; however, several contextual details may complicate the analysis or warrant deeper investigation:
1. **Coverage and survivorship bias** - The transcript dataset includes only companies covered by The Motley Fool, potentially omitting small-cap or delisted firms. This may over-represent large, stable firms and bias results toward those with stronger disclosure practices.
2. **Timing misalignment** - Earnings calls often take place after market hours, while price data are recorded at the market close. As a result, for a call held after the close, the “day 0” return (the call date's close-to-close return) may reflect information or expectations formed before the call rather than the call itself, making it important to define the event window ([0,+1]) carefully and account for weekends and holidays.
3. **Linguistic and formatting variation** - Transcripts differ in speaker labeling, punctuation, and inclusion of boilerplate disclaimers or operator remarks. These inconsistencies may distort tone-scoring unless the text is systematically cleaned.
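One way to handle the timing issue in item 2 is to push after-hours calls to the next trading day before applying the [0,+1] window. The sketch below assumes call times are naive Eastern-time timestamps (as parsed from the transcript dates), a 4:00 p.m. regular-session close, and no half-day sessions. `assign_day0` is a hypothetical helper, and its rule is a stricter variant of the Method's "first trading day on or after the call date" definition.

```python
import pandas as pd


def assign_day0(call_ts, trading_days, close_hour=16):
    """Hypothetical helper: map a call timestamp to its Day 0 trading date.

    Assumes naive Eastern-time timestamps and a regular close at close_hour
    (ignores early-close half days). Calls at or after the close move to the
    next session, since that day's close is the first one that can react.
    """
    ts = pd.Timestamp(call_ts)
    days = pd.DatetimeIndex(trading_days).sort_values()
    target = ts.normalize()                   # calendar date of the call
    if ts.hour >= close_hour:
        target += pd.Timedelta(days=1)        # after-hours call: start from the next date
    pos = days.searchsorted(target, side="left")  # first trading day >= target
    return days[pos] if pos < len(days) else pd.NaT


# Made-up calendar: Aug 29-30, 2020 is a weekend
cal = pd.to_datetime(["2020-08-26", "2020-08-27", "2020-08-28", "2020-08-31"])
print(assign_day0("2020-08-27 08:30", cal))  # prints 2020-08-27 00:00:00 (pre-market call)
print(assign_day0("2020-08-28 16:30", cal))  # prints 2020-08-31 00:00:00 (Friday after-hours)
```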


## Method
Step 1: Load and prepare data
- Load the three datasets (earnings call transcripts, stock prices, and SPY benchmark) using pandas.
- Standardize date formats and align all data by ticker and date.
- Functions (data-manipulation): load_data(), standardize_dates()
- Tests: Use small 3–5 row samples to confirm correct data types and successful merges.
- Output: Three clean DataFrames with properly formatted and aligned dates.
- Connection: Establishes the base for the Multiple Datasets challenge goal by merging separate data sources.
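A minimal sketch of what `standardize_dates()` could look like for the transcript timestamps, mirroring the cleaning done in the EDA below: strip the periods in "p.m." and the "ET" label, then parse. It assumes pandas 2.0 or later (for `format="mixed"`) and keeps times as naive Eastern time; the signature is illustrative rather than fixed.

```python
import pandas as pd


def standardize_dates(df, src_col="date", dst_col="datetime_std"):
    """Parse Motley Fool call times like 'Aug 27, 2020, 9:00 p.m. ET'.

    Returns a copy of df with a naive (Eastern-time) datetime column added.
    Requires pandas >= 2.0 for format="mixed".
    """
    cleaned = (
        df[src_col].astype(str)
        .str.replace(".", "", regex=False)   # 'p.m.' -> 'pm' (literal dot, not a regex)
        .str.replace("ET", "", regex=False)  # drop the time-zone label
        .str.strip()
    )
    out = df.copy()
    out[dst_col] = pd.to_datetime(cleaned, format="mixed")
    return out


sample = pd.DataFrame({"date": ["Aug 27, 2020, 9:00 p.m. ET", "Aug 7, 2019, 8:30 a.m. ET"]})
print(standardize_dates(sample)["datetime_std"].tolist())
```

Returning a copy keeps the raw `date` column available for spot checks against the original strings.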


Step 2: Clean transcripts and compute tone features
- Use Python’s built-in re library to remove punctuation, lowercase text, and normalize spacing.
- Count occurrences of positive, negative, and uncertainty words using the Loughran–McDonald financial dictionary.
- Calculate each tone category as a percentage of total words in the transcript.
- Functions (data-manipulation): clean_text(), compute_tone_scores()
- Tests: Verify results on short sample texts (“profits increased,” “uncertain outlook”) with known word counts.
- Output: Dataset with tone metrics (pos_pct, neg_pct, uncert_pct) for each earnings call.
- Connection: Creates the independent variables used in hypothesis testing (RQ1 and RQ2).
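The cleaning and scoring steps above can be sketched as follows. This is an illustration under assumptions, not a fixed implementation: it assumes (per our reading of the master dictionary's documentation) that a category column such as `Positive` holds a positive value, the year a word was added, for member words and 0 otherwise, and it counts only alphabetic tokens, so digits are dropped along with punctuation. The toy dictionary is made up.

```python
import re
from collections import Counter

import pandas as pd

# Output column -> Loughran-McDonald category column
CATEGORIES = {"pos_pct": "Positive", "neg_pct": "Negative", "uncert_pct": "Uncertainty"}


def build_word_sets(lmd):
    """Lowercase word sets per tone category (assumes > 0 marks membership)."""
    return {
        col: set(lmd.loc[lmd[col] > 0, "Word"].dropna().astype(str).str.lower())
        for col in CATEGORIES.values()
    }


def clean_text(text):
    """Lowercase, replace anything that is not a letter with a space, collapse spaces."""
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def compute_tone_scores(text, word_sets):
    """Share of tokens (in %) in each tone category; NaN for empty text."""
    tokens = clean_text(text).split()
    counts = Counter(tokens)
    n = len(tokens)
    scores = {}
    for name, col in CATEGORIES.items():
        hits = sum(c for word, c in counts.items() if word in word_sets[col])
        scores[name] = 100 * hits / n if n else float("nan")
    scores["n_words"] = n
    return scores


# Toy dictionary with made-up entries in the master dictionary's column layout
toy = pd.DataFrame({
    "Word": ["STRONG", "LOSS", "MAY", "UNCERTAIN"],
    "Positive": [2009, 0, 0, 0],
    "Negative": [0, 2009, 0, 0],
    "Uncertainty": [0, 0, 2009, 2009],
})
print(compute_tone_scores("Strong strong results, but a loss may be uncertain today.",
                          build_word_sets(toy)))
# {'pos_pct': 20.0, 'neg_pct': 10.0, 'uncert_pct': 20.0, 'n_words': 10}
```

Matching is exact on whole tokens, so inflected forms count only if the dictionary lists them; the master dictionary does include many inflections.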


Step 3: Compute event-window and abnormal returns
- Compute daily returns for each stock and for SPY using adjusted close prices.
- Define Day 0 as the first trading day on or after the call date and Day +1 as the following trading day.
- Calculate abnormal returns as firm return minus SPY return, then sum over [0,+1] to get cumulative abnormal return (CAR).
- Functions (data-manipulation): compute_returns(), compute_abnormal_returns()
- Tests: Hand-check results on a small, synthetic dataset to confirm correct math and event-window handling.
- Output: Event-level dataset linking each call to its short-window abnormal return.
- Connection: Provides the dependent variable for statistical testing and supports the Multiple Datasets challenge goal.
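A sketch of the return and CAR steps, assuming a long price table with `ticker`, `date_std`, and a price column. It defaults to `close` because the NASDAQ frame loaded in the EDA has no adjusted-close column; with unadjusted closes, splits and dividends would leak into returns. `car_for_event` is a hypothetical per-event helper that uses SPY's dates as the trading calendar and the Method's "first trading day on or after the call date" rule for Day 0.

```python
import numpy as np
import pandas as pd


def compute_returns(prices, price_col="close"):
    """Add close-to-close returns per ticker (expects ticker and date_std columns)."""
    prices = prices.sort_values(["ticker", "date_std"]).copy()
    prev = prices.groupby("ticker")[price_col].shift(1)  # previous close within each ticker
    prices["ret"] = prices[price_col] / prev - 1
    return prices


def car_for_event(call_ts, firm_ret, mkt_ret, start=0, end=1):
    """Hypothetical helper: sum of firm-minus-SPY returns over [start, end] around Day 0.

    firm_ret and mkt_ret are daily return Series indexed by date; mkt_ret's
    (sorted) index serves as the trading calendar. Day 0 is the first trading
    day on or after the call date. Returns NaN if the window runs off the
    calendar or the firm has no return on one of the window's days.
    """
    days = mkt_ret.index
    ar = firm_ret.reindex(days) - mkt_ret          # abnormal return per trading day
    day0 = days.searchsorted(pd.Timestamp(call_ts).normalize(), side="left")
    lo, hi = day0 + start, day0 + end
    if lo < 0 or hi >= len(days):
        return np.nan
    window = ar.iloc[lo:hi + 1]
    return float(window.sum()) if window.notna().all() else np.nan
```

The same helper covers the Step 7 windows by changing `start` and `end` (for example `start=-1, end=1`).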

Step 4: Merge tone and return data
- Merge tone metrics with event returns and add basic controls such as sector and firm-size proxies.
- Functions (data-manipulation): merge_features(), add_controls()
- Tests: Ensure one row per event after merging and confirm correct ticker/date alignment.
- Output: Combined dataset ready for modeling.
- Connection: Prepares data for hypothesis testing (RQ1 and RQ2).
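The "one row per event" test can be enforced by pandas itself. In this sketch (the name `merge_features` and the `call_date` key are assumptions), `validate="one_to_one"` makes the merge raise `pandas.errors.MergeError` on duplicate keys, and `indicator=True` reports how many events failed to match on either side.

```python
import pandas as pd


def merge_features(tone, cars, keys=("ticker", "call_date")):
    """Hypothetical helper: join tone scores to event CARs, one row per event.

    validate="one_to_one" raises pandas.errors.MergeError if either side has
    duplicate keys; the _merge indicator counts events that did not match.
    """
    keys = list(keys)
    both = tone.merge(cars, on=keys, how="outer", validate="one_to_one", indicator=True)
    unmatched = both["_merge"].value_counts()  # left_only / right_only / both
    merged = both[both["_merge"] == "both"].drop(columns="_merge").reset_index(drop=True)
    return merged, unmatched
```

Keeping the unmatched counts makes it easy to report how many calls were lost to missing price data.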

Step 5: Hypothesis testing and modeling
- Run regression models using statsmodels to test whether tone predicts short-term abnormal returns:
  `car_0p1 ~ pos_pct + neg_pct + uncert_pct + sector + size_proxy`
- Evaluate coefficients, p-values, and confidence intervals to test significance.
- Adjust for multiple comparisons (e.g., Benjamini–Hochberg correction) if running across multiple tone types or sectors.
- Functions (modeling): fit_model(), summarize_results()
- Tests: Use synthetic data with known relationships to confirm correct coefficient direction and model behavior.
- Interpretation:
  - RQ1: Statistically significant positive coefficients on pos_pct, or negative coefficients on neg_pct, would indicate that tone predicts abnormal returns.
  - RQ2: Significant interaction terms (e.g., tone × sector) or coefficient differences across sectors would suggest heterogeneity.
- Connection: Directly achieves the Statistical Hypothesis Testing challenge goal.
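A sketch of `fit_model()` and `summarize_results()` using the statsmodels formula API with the regression above, plus a Benjamini–Hochberg adjustment across the three tone terms. Two choices here are assumptions rather than part of the plan: heteroskedasticity-robust (HC1) standard errors, and the `make_synthetic` generator, which plants a known positive effect for `pos_pct` and a negative one for `neg_pct` in line with the synthetic-data test.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

FORMULA = "car_0p1 ~ pos_pct + neg_pct + uncert_pct + sector + size_proxy"
TONE_TERMS = ("pos_pct", "neg_pct", "uncert_pct")


def fit_model(df, formula=FORMULA):
    """OLS via the formula API; string columns (sector) become dummies automatically."""
    return smf.ols(formula, data=df).fit(cov_type="HC1")  # robust SEs (assumed choice)


def summarize_results(res, terms=TONE_TERMS, alpha=0.05):
    """Coefficients, confidence intervals, raw and BH-adjusted p-values for the tone terms."""
    terms = list(terms)
    ci = res.conf_int(alpha=alpha).loc[terms]
    out = pd.DataFrame({
        "coef": res.params[terms],
        "ci_low": ci[0],
        "ci_high": ci[1],
        "p": res.pvalues[terms],
    })
    reject, p_bh, _, _ = multipletests(out["p"], alpha=alpha, method="fdr_bh")
    out["p_bh"] = p_bh
    out["reject_bh"] = reject
    return out


def make_synthetic(n=500, seed=0):
    """Hypothetical test data: +pos_pct and -neg_pct effects, no uncert_pct effect."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "pos_pct": rng.uniform(0, 5, n),
        "neg_pct": rng.uniform(0, 5, n),
        "uncert_pct": rng.uniform(0, 3, n),
        "sector": rng.choice(["Tech", "Energy", "Health"], n),
        "size_proxy": rng.normal(10, 1, n),
    })
    df["car_0p1"] = 0.004 * df["pos_pct"] - 0.006 * df["neg_pct"] + rng.normal(0, 0.01, n)
    return df


print(summarize_results(fit_model(make_synthetic())).round(4))
```

On the synthetic data, the recovered signs should match the planted effects; on real data, the same summary table feeds directly into the coefficient plot.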

Step 6: Visualization
- Create plots to display tone distributions, tone vs. return relationships, and regression coefficients.
- Functions (plotting): plot_tone_vs_returns(), plot_coefficients()
- Tests: No formal testing; figures checked visually for accuracy and clarity.
- Output: Visual summaries of tone–return relationships.
- Connection: Helps interpret quantitative results for RQ1 and RQ2.
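A sketch of `plot_coefficients()` as a dot-and-whisker chart with matplotlib. It assumes a summary DataFrame indexed by term with `coef`, `ci_low`, and `ci_high` columns; the demo numbers are made up purely to show the layout.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_coefficients(summary, ax=None):
    """Dot-and-whisker plot of coefficients with their confidence intervals.

    Expects a DataFrame indexed by term with columns coef, ci_low, ci_high.
    """
    if ax is None:
        _, ax = plt.subplots(figsize=(5, 1 + 0.6 * len(summary)))
    y = np.arange(len(summary))
    # errorbar wants distances from each point, not the interval endpoints
    xerr = np.vstack([
        (summary["coef"] - summary["ci_low"]).to_numpy(),
        (summary["ci_high"] - summary["coef"]).to_numpy(),
    ])
    ax.errorbar(summary["coef"], y, xerr=xerr, fmt="o", capsize=4)
    ax.axvline(0, color="grey", linestyle="--", linewidth=1)  # zero-effect reference line
    ax.set_yticks(y)
    ax.set_yticklabels(summary.index)
    ax.set_xlabel("Coefficient on CAR[0,+1]")
    return ax


# Made-up numbers, only to show the layout
demo = pd.DataFrame(
    {"coef": [0.004, -0.006, 0.001], "ci_low": [0.003, -0.007, -0.001], "ci_high": [0.005, -0.005, 0.003]},
    index=["pos_pct", "neg_pct", "uncert_pct"],
)
ax = plot_coefficients(demo)
```

Intervals that cross the dashed zero line correspond to coefficients that are not significant at the chosen level.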


*Step 7: Robustness and reporting (optional)*
- *Re-run models using alternative event windows ([−1,+1] or [0,+5]) to confirm consistency.*
- *Winsorize extreme returns to check for sensitivity to outliers.*
- *Save outputs, figures, and summary tables for reporting.*
- *Connection: Provides robustness checks for RQ1 and RQ3, testing whether conclusions depend on the choice of event window.*
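A minimal winsorization sketch for the outlier check, assuming CARs sit in a pandas Series and that clipping at the 1st and 99th percentiles is acceptable (both cutoffs are assumptions). `scipy.stats.mstats.winsorize` offers a rank-based alternative.

```python
import pandas as pd


def winsorize(s, lower=0.01, upper=0.99):
    """Clip a Series to its own lower/upper quantiles (1st/99th percentiles by default)."""
    lo, hi = s.quantile([lower, upper])
    return s.clip(lower=lo, upper=hi)


# Made-up CARs with one extreme value
cars = pd.Series(list(range(1, 100)) + [10_000], dtype=float)
print(cars.max(), winsorize(cars).max())
```

Winsorizing separately within each event-window sample keeps the cutoffs comparable across the robustness runs.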

#### Plan

The project will be completed in JupyterHub and divided into five main tasks, each designed to be clear, independent, and reproducible.


1) Setup and data preparation (2 hours): I will create an organized folder structure in JupyterHub with subfolders for raw data, processed data, figures, and reports. After confirming the environment setup, I will load the earnings call transcripts, stock prices, and SPY benchmark data using pandas. During this step, I will standardize date formats, check for missing or duplicated keys, and ensure that tickers and dates align across datasets to prepare for merging.


2) Text cleaning and tone computation (3 hours): Using Python’s re library, I will remove punctuation, normalize spacing, and lowercase the transcript text. I will then apply the Loughran-McDonald financial dictionary (with spaCy as a possible alternative for tokenization) to calculate the percentage of positive, negative, and uncertainty words in each transcript. The resulting tone features will be saved as a separate dataset and checked against a small set of example texts to confirm accuracy.


3) Return calculations and event-window construction (3 hours): I will compute daily returns for both individual tickers and the SPY benchmark. For each earnings call, I will define the event window as [0,+1], where Day 0 represents the first trading day on or after the call. Abnormal returns will be calculated as the firm’s return minus SPY’s return, and cumulative abnormal returns (CAR) will be saved for each event. Manual checks on a small synthetic dataset will verify the accuracy of these calculations.


4) Merging, modeling, and hypothesis testing (5 hours): I will merge the tone dataset with abnormal returns and add control variables such as industry sector and firm size proxies (e.g., log of average volume). Using statsmodels, I will run regression models to test whether tone predicts short-window abnormal returns while controlling for other factors. I will interpret coefficients, p-values, and confidence intervals directly in the context of the research questions.


5) Visualization and reporting (~3 hours): The final step will involve creating plots to display the distribution of tone features, the relationship between tone and abnormal returns, and regression coefficients with confidence intervals. If time allows, I will perform quick robustness checks such as alternate event windows or light outlier filtering. All intermediate results, figures, and tables will be saved for reproducibility.


*This plan builds in buffer time, so the estimates may be generous.*



## EDA Results


```python
import pandas as pd
import csv
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pickle
import doctest
from pathlib import Path
```

### Initial Loading
#### Earnings Reports

```python
reports_path = "data/motley-fool-data.pkl"
with open(reports_path, 'rb') as file:
    er = pickle.load(file)

er.head()
```

| | date | exchange | q | ticker | transcript |
|---|---|---|---|---|---|
| 0 | Aug 27, 2020, 9:00 p.m. ET | NASDAQ: BILI | 2020-Q2 | BILI | Prepared Remarks:\nOperator\nGood day, and wel... |
| 1 | Jul 30, 2020, 4:30 p.m. ET | NYSE: GFF | 2020-Q3 | GFF | Prepared Remarks:\nOperator\nThank you for sta... |
| 2 | Oct 23, 2019, 5:00 p.m. ET | NASDAQ: LRCX | 2020-Q1 | LRCX | Prepared Remarks:\nOperator\nGood day and welc... |
| 3 | Nov 6, 2019, 12:00 p.m. ET | NASDAQ: BBSI | 2019-Q3 | BBSI | Prepared Remarks:\nOperator\nGood day, everyon... |
| 4 | Aug 7, 2019, 8:30 a.m. ET | NASDAQ: CSTE | 2019-Q2 | CSTE | Prepared Remarks:\nOperator\nGreetings and wel... |


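```python
er['date_cleaned'] = er['date'].str.strip()
# regex=False so "." is a literal period, not "any character" (the default differs across pandas versions)
er['date_cleaned'] = er['date_cleaned'].str.replace(".", "", regex=False)
er['date_cleaned'] = er['date_cleaned'].str.replace("ET", "", regex=False)
er['datetime_std'] = pd.to_datetime(er['date_cleaned'], format='mixed')

er['datetime_std'].head()
```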

```
0   2020-08-27 21:00:00
1   2020-07-30 16:30:00
2   2019-10-23 17:00:00
3   2019-11-06 12:00:00
4   2019-08-07 08:30:00
Name: datetime_std, dtype: datetime64[ns]
```

```python
er[["datetime_std","ticker"]]
```

| | datetime_std | ticker |
|---|---|---|
| 0 | 2020-08-27 21:00:00 | BILI |
| 1 | 2020-07-30 16:30:00 | GFF |
| 2 | 2019-10-23 17:00:00 | LRCX |
| 3 | 2019-11-06 12:00:00 | BBSI |
| 4 | 2019-08-07 08:30:00 | CSTE |
| ... | ... | ... |
| 18750 | 2021-11-09 13:00:00 | SWX |
| 18751 | 2021-11-18 12:00:00 | PNNT |
| 18752 | 2022-02-08 11:00:00 | TDG |
| 18753 | 2022-02-28 16:30:00 | DVAX |


#### Nasdaq Price Data

```python
folder_path = Path('data/nasdaq_prices')
nasdaq_files = list(folder_path.glob('*.csv'))

df_list = [pd.read_csv(file) for file in nasdaq_files]
ohlcv = pd.concat(df_list, ignore_index=True)
```

```python
ohlcv.head()
ohlcv.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4621943 entries, 0 to 4621942
Data columns (total 6 columns):
 #   Column  Dtype  
---  ------  -----  
 0   ticker  object 
 1   date    object 
 2   open    float64
 3   high    float64
 4   low     float64
 5   close   float64
dtypes: float64(4), object(2)
memory usage: 211.6+ MB
```


```python
ohlcv['date_std'] = pd.to_datetime(ohlcv['date'])
# ohlcv.head()
ohlcv.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4621943 entries, 0 to 4621942
Data columns (total 7 columns):
 #   Column    Dtype         
---  ------    -----         
 0   ticker    object        
 1   date      object        
 2   open      float64       
 3   high      float64       
 4   low       float64       
 5   close     float64       
 6   date_std  datetime64[ns]
dtypes: datetime64[ns](1), float64(4), object(2)
memory usage: 246.8+ MB
```


#### SPY ETF Data (S&P 500 Benchmark)

```python
spy = pd.read_csv('data/SPY.csv')
spy.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7703 entries, 0 to 7702
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       7703 non-null   object 
 1   Open       7703 non-null   float64
 2   High       7703 non-null   float64
 3   Low        7703 non-null   float64
 4   Close      7703 non-null   float64
 5   Adj Close  7703 non-null   float64
 6   Volume     7703 non-null   int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 421.4+ KB
```


```python
spy['Date'].dtype
```

```
dtype('O')
```

```python
# should i be making a new df for this update
spy['date_std'] = pd.to_datetime(spy['Date'])
# spy.head()
spy.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7703 entries, 0 to 7702
Data columns (total 8 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       7703 non-null   object        
 1   Open       7703 non-null   float64       
 2   High       7703 non-null   float64       
 3   Low        7703 non-null   float64       
 4   Close      7703 non-null   float64       
 5   Adj Close  7703 non-null   float64       
 6   Volume     7703 non-null   int64         
 7   date_std   7703 non-null   datetime64[ns]
dtypes: datetime64[ns](1), float64(5), int64(1), object(1)
memory usage: 481.6+ KB
```


#### Loughran-McDonald Master Dictionary w/ Sentiment Word Lists

```python
lmd = pd.read_csv('data/Loughran-McDonald_MasterDictionary_1993-2024.csv')
# lmd.head()
lmd.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 86553 entries, 0 to 86552
Data columns (total 17 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Word                86552 non-null  object 
 1   Seq_num             86553 non-null  int64  
 2   Word Count          86553 non-null  int64  
 3   Word Proportion     86553 non-null  float64
 4   Average Proportion  86553 non-null  float64
 5   Std Dev             86553 non-null  float64
 6   Doc Count           86553 non-null  int64  
 7   Negative            86553 non-null  int64  
 8   Positive            86553 non-null  int64  
 9   Uncertainty         86553 non-null  int64  
 10  Litigious           86553 non-null  int64  
 11  Strong_Modal        86553 non-null  int64  
 12  Weak_Modal          86553 non-null  int64  
 13  Constraining        86553 non-null  int64  
 14  Complexity          86553 non-null  int64  
 15  Syllables           86553 non-null  int64  
 16  Sour
```

### Table Join
Joining the transcript and price data on their shared tickers.

```python
er['ticker'].isin(ohlcv['ticker']).value_counts()
```

```
ticker
False    15832
True      2923
Name: count, dtype: int64
```

```python
def ticker_compare(df1, col1, df2, col2):
    '''
    Given two DataFrames and a key column in each, return the rows of each
    DataFrame whose key value appears in both columns (as a tuple of two
    separate DataFrames).

    >>> df1 = pd.DataFrame({'ticker': ['AAPL', 'MSFT', 'GOOG'], 'price': [100, 200, 300]})
    >>> df2 = pd.DataFrame({'ticker': ['AAPL', 'TSLA'], 'text': ['apple er', 'tesla er']})
    >>> df1_common, df2_common = ticker_compare(df1, 'ticker', df2, 'ticker')
    >>> sorted(df1_common['ticker'].unique())
    ['AAPL']
    >>> df1_common['price'].unique().tolist()
    [100]
    >>> sorted(df2_common['ticker'].unique())
    ['AAPL']
    >>> sorted(df2_common['text'].unique())
    ['apple er']

    >>> df3 = pd.DataFrame({'ticker': ['AMZN'], 'close': [150]})
    >>> df4 = pd.DataFrame({'ticker': ['NFLX'], 'text': ['netflix']})
    >>> df3_common, df4_common = ticker_compare(df3, 'ticker', df4, 'ticker')
    >>> len(df3_common)
    0
    >>> len(df4_common)
    0
    '''
    # Keys present in both columns
    shared = set(df1[col1]) & set(df2[col2])

    df1_common = df1[df1[col1].isin(shared)]
    df2_common = df2[df2[col2].isin(shared)]

    return df1_common, df2_common

doctest.run_docstring_examples(ticker_compare, globals())
```

```python
ohlcv_common, er_common = ticker_compare(ohlcv, 'ticker', er, 'ticker')
```


```python
ohlcv_common.head()
```

| | ticker | date | open | high | low | close | date_std |
|---|---|---|---|---|---|---|---|
| 28689 | AXGN | 1986-12-17 | 0.0 | 4.1 | 4.0 | 4.0 | 1986-12-17 |
| 28690 | AXGN | 1986-12-18 | 0.0 | 4.1 | 4.0 | 4.0 | 1986-12-18 |
| 28691 | AXGN | 1986-12-19 | 0.0 | 4.1 | 4.0 | 4.0 | 1986-12-19 |
| 28692 | AXGN | 1986-12-22 | 0.0 | 4.1 | 4.0 | 4.0 | 1986-12-22 |
| 28693 | AXGN | 1986-12-23 | 0.0 | 3.89 | 3.67 | 3.67 | 1986-12-23 |


### Test Practice

```python
prices = [100, 112, 125, 180, 111, 96]
```

```python
# average price all-time
total = 0
for price in prices:
    total += price
avg_price = total / len(prices)
print(avg_price)
```

```
120.66666666666667
```


```python
# daily return (as %)
returns = {}
for i in range(len(prices)):
    if i != 0:
        returns[i] = (prices[i] - prices[i-1])/prices[i-1]*100
    else:
        returns[i] = None

print(returns)
```

```
{0: None, 1: 12.0, 2: 11.607142857142858, 3: 44.0, 4: -38.333333333333336, 5: -13.513513513513514}
```
