# Sentiment Analysis of Financial Reports

This module analyses the Management's Discussion and Analysis (MD&A) section of the annual reports of six publicly listed companies: HDFC, Hindustan Unilever, Infosys, Kotak, Reliance and TCS. A negative tone in this section is read here as a sign that the company's performance is deteriorating. Sentiment words are taken from the Loughran-McDonald dictionary rather than the standard NLTK dictionary, because many words carry a different meaning in a financial context (for example, 'liability' or 'cost' is routine accounting vocabulary in a report, not a negative signal).

The process is as follows:
1. Read the MD&A section from an annual report and tokenize it.
2. Remove stop words such as 'and', 'of' and 'the'.
3. Import the Loughran-McDonald word lists and classify the remaining words as positive or negative.
4. Calculate the overall sentiment score.

The same approach can easily be reproduced for other firms to check the sentiment of their reports.
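
The four steps above can be sketched as one reusable function. This is a minimal sketch, not the notebook's own code: the name `sentiment_score`, the `n_stop` parameter and the regex tokenizer are assumptions made for illustration (the cells below use `nltk.word_tokenize`), and the word lists here are toy sets rather than the Loughran-McDonald files.

```python
import re
from collections import Counter


def sentiment_score(text, pos_words, neg_words, n_stop=10):
    """Return (positive - negative) / tokens left after stop-word removal."""
    # Simplified tokenizer (assumption): lower-cased runs of letters only
    tokens = re.findall(r"[a-z]+", text.lower())
    # Treat the n_stop most frequent tokens as stop words
    stop = {word for word, _ in Counter(tokens).most_common(n_stop)}
    kept = [t for t in tokens if t not in stop]
    if not kept:
        return 0.0
    pos = neg = 0
    for t in kept:
        if t in pos_words:
            pos += 1
        elif t in neg_words:
            neg += 1
    return (pos - neg) / len(kept)


# Toy text and toy word lists, for illustration only
sample_text = ("The company reported strong growth and the outlook "
               "for the year is strong despite weak demand")
score = sentiment_score(sample_text, {"strong", "growth"}, {"weak"}, n_stop=1)
print(score)
```

Passing the word lists as sets keeps each membership test cheap, which matters once the full dictionaries are loaded.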

## Tata Consultancy Services (TCS)

In [1]:
import nltk
from nltk.tokenize import RegexpTokenizer
import re

### 2015-16

In [2]:
# TCS in 2015-16

with open("tcs_2015_16.txt", encoding="utf8", errors="ignore") as f:
    tokens_old = nltk.word_tokenize(f.read())

# Keep only tokens that contain at least one letter (drops pure numbers and punctuation)
has_letter = re.compile('.*[A-Za-z].*')
tokens = [w.lower() for w in tokens_old if has_letter.match(w)]
print("Total words in the TCS 2015-16 report are {0}".format(len(tokens)))

Total words in the TCS 2015-16 report are 2283


In [3]:
# Treat the 10 most frequent tokens in this report as its stop words
fdist = nltk.FreqDist(tokens)
stop = [word for word, _ in fdist.most_common(10)]
print("Stop words are {0}".format(stop))

Stop words are ['the', 'to', 'in', 'and', 'of', 'a', 'for', 'company', 'digital', 'our']
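
The list printed above is simply the ten most frequent tokens, so it also picks up report-specific words such as 'company' and 'digital', and nothing guarantees that a negation stays out of it for another report. A minimal sketch of making that guard explicit follows; the helper `frequency_stop_words` and the `NEGATIONS` set are hypothetical names introduced here, not part of the notebook.

```python
from collections import Counter

# Negations that should never be discarded as stop words (assumed set)
NEGATIONS = {"no", "not", "never"}


def frequency_stop_words(tokens, n=10, protected=NEGATIONS):
    """Return the n most frequent tokens, skipping any protected word."""
    stop = []
    for word, _ in Counter(tokens).most_common():
        if word in protected:
            continue
        stop.append(word)
        if len(stop) == n:
            break
    return stop


# Toy token list in which a negation is the most frequent word
toy_tokens = ["not"] * 5 + ["the"] * 4 + ["of"] * 3 + ["growth"] * 2 + ["risk"]
toy_stop = frequency_stop_words(toy_tokens, n=2)
print(toy_stop)  # ['the', 'of']
```

With this guard in place, the removal step can drop every word in the returned list without first checking it by eye.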


In [4]:
# Count how many tokens will be dropped as stop words
count = sum(1 for word in tokens if word in stop)
print(count)

515


In [5]:
# The stop list above contains none of the negations 'no', 'not' or 'never',
# so the stop words can be removed directly without losing negation cues
final_tokens = [x for x in tokens if x not in stop]
total_tokens = len(final_tokens)

In [6]:
# Source: https://sraf.nd.edu/textual-analysis/resources/
# This source provides Loughran-McDonald Sentiment word lists for finance
with open("LM_neg_words.txt") as f_neg, open("LM_pos_words.txt") as f_pos:
    lm_neg = f_neg.read().lower()
    lm_pos = f_pos.read().lower()

# One word per line; sets give fast membership tests
li_neg = set(lm_neg.split('\n'))
li_pos = set(lm_pos.split('\n'))

checking = {'positive': li_pos, 'negative': li_neg}

result_pos = []
result_neg = []
for word in final_tokens:
    if word in checking['positive']:
        result_pos.append(word)
    elif word in checking['negative']:
        result_neg.append(word)

print()
print("Total positive words are {0}".format(len(result_pos)))
print('Positive words are: ')
print(result_pos)
print()
print("Total negative words are {0}".format(len(result_neg)))
print('Negative words are: ')
print(result_neg)


Total positive words are 48
Positive words are: 
['strong', 'profitable', 'enabled', 'valuable', 'reward', 'enjoy', 'highest', 'enabling', 'positive', 'strength', 'exceptional', 'improvement', 'satisfaction', 'opportunities', 'efficiency', 'gain', 'improve', 'leading', 'proactive', 'succeed', 'proactively', 'greater', 'collaboration', 'enabling', 'leadership', 'opportunity', 'delighted', 'collaboration', 'collaborative', 'proactive', 'innovation', 'innovations', 'innovate', 'innovative', 'collaboration', 'innovation', 'innovation', 'opportunity', 'gain', 'benefitting', 'empower', 'innovative', 'opportunities', 'delighted', 'rewarded', 'satisfaction', 'innovative', 'encouragement']

Total negative words are 16
Negative words are: 
['adverse', 'severely', 'slowdown', 'weak', 'forcing', 'default', 'pervasive', 'disruptions', 'threats', 'persistent', 'threat', 'threats', 'stringent', 'impossible', 'challenges', 'challenging']
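
Several words in the positive list above occur more than once, so a frequency summary shows which words drive the score. The sketch below copies the 48 positive matches printed above into a list and counts them with `collections.Counter`; the variable names are illustrative and not taken from the notebook.

```python
from collections import Counter

# The 48 positive matches printed above for TCS 2015-16
positive_matches = [
    'strong', 'profitable', 'enabled', 'valuable', 'reward', 'enjoy',
    'highest', 'enabling', 'positive', 'strength', 'exceptional',
    'improvement', 'satisfaction', 'opportunities', 'efficiency', 'gain',
    'improve', 'leading', 'proactive', 'succeed', 'proactively', 'greater',
    'collaboration', 'enabling', 'leadership', 'opportunity', 'delighted',
    'collaboration', 'collaborative', 'proactive', 'innovation',
    'innovations', 'innovate', 'innovative', 'collaboration', 'innovation',
    'innovation', 'opportunity', 'gain', 'benefitting', 'empower',
    'innovative', 'opportunities', 'delighted', 'rewarded', 'satisfaction',
    'innovative', 'encouragement',
]

# Three most frequent positive words and their counts
top_positive = Counter(positive_matches).most_common(3)
print(top_positive)
```

In the notebook itself, the same call can be applied to `result_pos` and `result_neg` directly.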


In [7]:
# Sentiment score:
# (positive words - negative words) / total words left after stop-word removal

pos = len(result_pos)
neg = len(result_neg)

sentiment = (pos-neg)/total_tokens
print()
print("TCS's total sentiment in 2015-16 is : {0}".format(sentiment))


TCS's total sentiment in 2015-16 is : 0.01809954751131222


### 2016-17

In [8]:
# TCS in 2016-17

with open("tcs_2016_17.txt", encoding="utf8", errors="ignore") as f2:
    tokens_old = nltk.word_tokenize(f2.read())

# Keep only tokens that contain at least one letter (drops pure numbers and punctuation)
has_letter2 = re.compile('.*[A-Za-z].*')
tokens2 = [w.lower() for w in tokens_old if has_letter2.match(w)]
print("Total words in the TCS 2016-17 report are {0}".format(len(tokens2)))

Total words in the TCS 2016-17 report are 1595


In [9]:
# Treat the 10 most frequent tokens in this report as its stop words
fdist = nltk.FreqDist(tokens2)
stop2 = [word for word, _ in fdist.most_common(10)]
print("Stop words are {0}".format(stop2))

Stop words are ['the', 'in', 'and', 'of', 'to', 'our', 'a', 'for', 'tcs', 'company']


In [10]:
# Count how many tokens will be dropped as stop words
count = sum(1 for word in tokens2 if word in stop2)
print(count)

404


In [11]:
# The stop list above contains none of the negations 'no', 'not' or 'never',
# so the stop words can be removed directly without losing negation cues
final_tokens2 = [x for x in tokens2 if x not in stop2]
total_tokens2 = len(final_tokens2)

In [12]:
with open("LM_neg_words.txt") as f_neg, open("LM_pos_words.txt") as f_pos:
    lm_neg2 = f_neg.read().lower()
    lm_pos2 = f_pos.read().lower()

# One word per line; sets give fast membership tests
li_neg2 = set(lm_neg2.split('\n'))
li_pos2 = set(lm_pos2.split('\n'))

checking2 = {'positive': li_pos2, 'negative': li_neg2}

result_pos2 = []
result_neg2 = []
for word in final_tokens2:
    if word in checking2['positive']:
        result_pos2.append(word)
    elif word in checking2['negative']:
        result_neg2.append(word)


print()
print("Total positive words are {0}".format(len(result_pos2)))
print('Positive words are: ')
print(result_pos2)
print()
print("Total negative words are {0}".format(len(result_neg2)))
print('Negative words are: ')
print(result_neg2)


Total positive words are 35
Positive words are: 
['leadership', 'leadership', 'highest', 'stable', 'gain', 'leadership', 'leadership', 'profitable', 'happy', 'leading', 'innovative', 'enable', 'proactive', 'innovation', 'excellence', 'progressing', 'leading', 'progress', 'innovative', 'innovation', 'exciting', 'premier', 'innovation', 'innovation', 'alliances', 'achieving', 'empowering', 'happy', 'better', 'delighted', 'rewarded', 'satisfaction', 'opportunity', 'opportunity', 'encouragement']

Total negative words are 2
Negative words are: 
['critical', 'attrition']


In [13]:
# Sentiment score:
# (positive words - negative words) / total words left after stop-word removal

pos2 = len(result_pos2)
neg2 = len(result_neg2)

sentiment2 = (pos2-neg2)/total_tokens2
print()
print("TCS's total sentiment in 2016-17 was : {0}".format(sentiment2))


TCS's total sentiment in 2016-17 was : 0.027707808564231738


### 2017-18

In [15]:
# TCS in 2017-18

with open("tcs_2017_18.txt", encoding="utf8", errors="ignore") as f3:
    tokens_old = nltk.word_tokenize(f3.read())

# Keep only tokens that contain at least one letter (drops pure numbers and punctuation)
has_letter3 = re.compile('.*[A-Za-z].*')
tokens3 = [w.lower() for w in tokens_old if has_letter3.match(w)]
print("Total words in the TCS 2017-18 report are {0}".format(len(tokens3)))

Total words in the TCS 2017-18 report are 1674


In [16]:
# Treat the 10 most frequent tokens in this report as its stop words
fdist = nltk.FreqDist(tokens3)
stop3 = [word for word, _ in fdist.most_common(10)]
print("Stop words are {0}".format(stop3))

Stop words are ['the', 'and', 'in', 'of', 'our', 'to', 'we', 'a', 'on', 'year']


In [17]:
# Count how many tokens will be dropped as stop words
count = sum(1 for word in tokens3 if word in stop3)
print(count)

437


In [18]:
# The stop list above contains none of the negations 'no', 'not' or 'never',
# so the stop words can be removed directly without losing negation cues
final_tokens3 = [x for x in tokens3 if x not in stop3]
total_tokens3 = len(final_tokens3)

In [19]:
with open("LM_neg_words.txt") as f_neg, open("LM_pos_words.txt") as f_pos:
    lm_neg3 = f_neg.read().lower()
    lm_pos3 = f_pos.read().lower()

# One word per line; sets give fast membership tests
li_neg3 = set(lm_neg3.split('\n'))
li_pos3 = set(lm_pos3.split('\n'))

checking3 = {'positive': li_pos3, 'negative': li_neg3}

result_pos3 = []
result_neg3 = []
for word in final_tokens3:
    if word in checking3['positive']:
        result_pos3.append(word)
    elif word in checking3['negative']:
        result_neg3.append(word)


print()
print("Total positive words are {0}".format(len(result_pos3)))
print('Positive words are: ')
print(result_pos3)
print()
print("Total negative words are {0}".format(len(result_neg3)))
print('Negative words are: ')
print(result_neg3)


Total positive words are 51
Positive words are: 
['benefited', 'outstanding', 'distinctive', 'leadership', 'accomplishments', 'inspirational', 'leadership', 'happy', 'stable', 'opportunity', 'successful', 'leadership', 'success', 'great', 'leadership', 'opportunity', 'opportunity', 'strong', 'friendly', 'leadership', 'successful', 'abundance', 'strengthening', 'popular', 'progress', 'strengthens', 'strong', 'innovation', 'innovate', 'gain', 'advantage', 'innovation', 'strengthened', 'innovation', 'successful', 'successfully', 'innovation', 'innovation', 'innovation', 'premier', 'empowering', 'gaining', 'enjoy', 'worthy', 'benefited', 'winning', 'greater', 'opportunity', 'success', 'benefit', 'encouragement']

Total negative words are 5
Negative words are: 
['crucial', 'riskier', 'critical', 'difficult', 'attrition']


In [20]:
# Sentiment score:
# (positive words - negative words) / total words left after stop-word removal

pos3 = len(result_pos3)
neg3 = len(result_neg3)

sentiment3 = (pos3-neg3)/total_tokens3
print()
print("TCS's total sentiment in 2017-18 was : {0}".format(sentiment3))


TCS's total sentiment in 2017-18 was : 0.037186742118027485
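
Taken together, the three TCS scores rise each year. The sketch below recomputes them from the counts printed in the cell outputs above, which also makes the denominator explicit: it is the number of tokens left after the stop words are removed. The dictionary layout and variable names are illustrative only.

```python
# Counts printed in the cell outputs above:
# (total tokens, stop-word tokens removed, positive matches, negative matches)
reported = {
    "2015-16": (2283, 515, 48, 16),
    "2016-17": (1595, 404, 35, 2),
    "2017-18": (1674, 437, 51, 5),
}

scores = {}
for year, (total, removed, pos, neg) in reported.items():
    # Denominator is the token count after stop-word removal
    scores[year] = (pos - neg) / (total - removed)
    print("{0}: {1:.4f}".format(year, scores[year]))
```

The recomputed values agree with the scores printed by the notebook for each year.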
