<a href="https://colab.research.google.com/github/bhatmohit/Financial_Statement_Analysis/blob/main/MDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Objective: Analyze Management's Discussion and Analysis (MD&A) text to measure its tone and relate that tone to company performance. Identify firms with a positive or negative tone, group the observations by tone, and calculate the average next-year return on assets (ROA) for each group.
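For reference, the tone measure computed in the cells below can be written as follows. N_i^+ and N_i^- are the numbers of words in MD&A i that appear in the Loughran and McDonald positive and negative lists, and N_i is the total number of words matched by the `\w+` pattern. The notation is ours; the definition mirrors the `net_tone` and `positive_tone` code in this notebook.

```latex
\text{net\_tone}_i = \frac{N_i^{+} - N_i^{-}}{N_i},
\qquad
\text{positive\_tone}_i =
\begin{cases}
1 & \text{if } \text{net\_tone}_i \ge 0,\\
0 & \text{otherwise.}
\end{cases}
```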

In [38]:
#import all the libraries
import pandas as pd
import re
import numpy as np


In [39]:
#Mount Google drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [40]:
# Read the pickle file containing the sample MD&A data into a DataFrame
main_df = pd.read_pickle('drive/MyDrive/MDA_Project/Data/mda_sample.pkl')

In [41]:
# Load the Loughran and McDonald positive and negative word lists from the Excel file
pos_list_df = pd.read_excel('drive/MyDrive/MDA_Project/Data/LoughranMcDonald_SentimentWordLists_2018.xlsx', 'Positive')
neg_list_df = pd.read_excel('drive/MyDrive/MDA_Project/Data/LoughranMcDonald_SentimentWordLists_2018.xlsx', 'Negative')

In [42]:
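# Put the MD&A texts in a NumPy array
mda_text_list = main_df['txt'].values

# Put the positive and negative words in sets: membership tests on a set are
# much faster than on a NumPy array (the LM words are upper-case)
pos_list = set(pos_list_df['Words'].dropna())
neg_list = set(neg_list_df['Words'].dropna())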
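`word in pos_list` works on a NumPy array, but each check compares the word against every entry in the array. Converting each list to a Python set gives average constant-time lookups. A minimal sketch, assuming the sheet has a `Words` column as in the cell above; the helper `to_word_set` and the toy DataFrame are hypothetical stand-ins for the Excel sheet.

```python
import pandas as pd

def to_word_set(words_df: pd.DataFrame, column: str = "Words") -> set:
    """Convert a column of sentiment words into an upper-case set for fast lookups."""
    return set(
        words_df[column]
        .dropna()          # skip empty cells read from the spreadsheet
        .astype(str)
        .str.strip()       # remove stray whitespace around each word
        .str.upper()       # MD&A tokens are upper-cased before matching
    )

# Hypothetical toy stand-in for the 'Positive' sheet
toy_pos_df = pd.DataFrame({"Words": ["ABLE", "gain", " Improve ", None]})
pos_set = to_word_set(toy_pos_df)
print(sorted(pos_set))
print("GAIN" in pos_set)
```

Here `pos_set` is `{'ABLE', 'GAIN', 'IMPROVE'}`; the `None` cell is dropped and the mixed-case entries are normalized.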

In [43]:
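results = []

# Loop over each MD&A text
for mda in mda_text_list:

  # Split the MD&A text into upper-case words with a regular expression
  # (a raw string avoids Python's invalid-escape warning for '\w')
  mda_words = re.findall(r'\w+', mda.upper())
  total_count = len(mda_words)

  # Count the words that appear in the positive and negative lists
  pos = sum(1 for word in mda_words if word in pos_list)
  neg = sum(1 for word in mda_words if word in neg_list)

  # Net tone = (positive - negative) / total words;
  # an empty text gets NaN instead of raising ZeroDivisionError
  net_tone = (pos - neg) / total_count if total_count else np.nan
  results.append(net_tone)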
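The loop can also be packaged as a function and applied to the `txt` column directly. A minimal sketch, assuming the word lists are collections of upper-case words; `compute_net_tone`, `TOKEN_RE` and the toy documents are hypothetical, and missing or empty texts return NaN rather than raising an error.

```python
import re

import numpy as np
import pandas as pd

TOKEN_RE = re.compile(r"\w+")  # same tokenizer as the notebook: runs of letters, digits, underscores

def compute_net_tone(text, pos_words, neg_words):
    """Return (positive count - negative count) / total word count for one document."""
    if not isinstance(text, str):
        return np.nan                    # missing MD&A text
    words = TOKEN_RE.findall(text.upper())
    if not words:
        return np.nan                    # avoid dividing by zero on empty documents
    pos = sum(w in pos_words for w in words)
    neg = sum(w in neg_words for w in words)
    return (pos - neg) / len(words)

# Hypothetical toy word lists and documents
pos_words = {"GAIN", "IMPROVED"}
neg_words = {"LOSS", "DECLINE"}
docs = pd.Series([
    "Sales improved and we recorded a gain",   # 7 words, 2 positive
    "Net loss widened amid a sharp decline",   # 7 words, 2 negative
    "",                                        # empty document
])
tones = docs.apply(compute_net_tone, args=(pos_words, neg_words))
print(tones)
```

The three toy documents score 2/7, -2/7 and NaN. On the notebook's data the equivalent call would be `main_df['txt'].apply(compute_net_tone, args=(pos_list, neg_list))`.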

In [44]:
main_df['net_tone'] = results
main_df = main_df.sort_values('net_tone')
main_df.head()

       gvkey   fyear  txt                                                      roa     roa_ny   net_tone
18194  10124  2006.0  ITEM 7MANAGEMENTS DISCUSSION AND ANALYSIS OF F...   0.007906   0.088232  -0.033584
  560   1274  2007.0  Item 7. Managements\n Discussion and Analys...      0.049408   0.021977  -0.031345
  878   1487  2008.0  Item 7. Managements\n Discussion and Analys...     -0.093624  -0.012725  -0.029427
 3610   2916  2003.0  Item 7. Managements Discussion and Analysis\n ...  -0.141581     0.0322  -0.024023
 3562   2889  2003.0  Item 6. Management's Discussion and Analysis ...   -0.088564  -0.043896   -0.02228


In [45]:
# Classify each MD&A by tone: positive_tone = 1 if net_tone >= 0, otherwise 0
main_df = main_df.assign(positive_tone=lambda x: np.where(x['net_tone'] >= 0, 1, 0))
main_df.head()
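Note the boundary behaviour of `np.where(x['net_tone'] >= 0, 1, 0)`: a net tone of exactly zero lands in the positive group, and a missing (NaN) tone lands in the negative group because comparisons with NaN evaluate to False. A small sketch on hypothetical tone values illustrates both cases.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"net_tone": [-0.02, 0.0, 0.01, np.nan]})  # hypothetical tones
toy = toy.assign(positive_tone=lambda x: np.where(x["net_tone"] >= 0, 1, 0))
print(toy)
```

If NaN tones should be excluded rather than counted as negative, one option is to drop them first with `dropna(subset=['net_tone'])`.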

       gvkey   fyear  txt                                                      roa     roa_ny   net_tone  positive_tone
18194  10124  2006.0  ITEM 7MANAGEMENTS DISCUSSION AND ANALYSIS OF F...   0.007906   0.088232  -0.033584              0
  560   1274  2007.0  Item 7. Managements\n Discussion and Analys...      0.049408   0.021977  -0.031345              0
  878   1487  2008.0  Item 7. Managements\n Discussion and Analys...     -0.093624  -0.012725  -0.029427              0
 3610   2916  2003.0  Item 7. Managements Discussion and Analysis\n ...  -0.141581     0.0322  -0.024023              0
 3562   2889  2003.0  Item 6. Management's Discussion and Analysis ...   -0.088564  -0.043896   -0.02228              0


In [46]:
# Group by positive_tone and compare the average next-year ROA (roa_ny) of the two groups
main_df.groupby(['positive_tone'])['roa_ny'].mean()

positive_tone
0   -0.037617
1    0.075141
Name: roa_ny, dtype: float64
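The mean alone does not show how many firm-years fall into each group or how dispersed their ROA is. A sketch that reports count, median and standard deviation next to the mean, using toy data (hypothetical values, not the notebook's results):

```python
import pandas as pd

toy = pd.DataFrame({
    "positive_tone": [0, 0, 1, 1, 1],
    "roa_ny":        [-0.05, 0.01, 0.04, 0.08, 0.06],
})
# One row per tone group, one column per statistic
summary = toy.groupby("positive_tone")["roa_ny"].agg(["count", "mean", "median", "std"])
print(summary)
```

On the notebook's data the same call would be `main_df.groupby('positive_tone')['roa_ny'].agg(['count', 'mean', 'median', 'std'])`.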

In this sample, firms with a positive MD&A tone have a much higher average next-year ROA (about 0.075) than firms with a negative tone (about -0.038).

roa_ny = Return On Assets (Next Year)