### TODO Recording:

- Go to this GitHub page and show what we're using

https://github.com/simplenlg/simplenlg

- Scroll on this page and show
- Click on the link for "Arria NLG" in the first paragraph of the documentation
- Scroll on the main Arria page
- Hover over the "Technology & Products" link on top
- Click on "Arria for Excel"

- Again hover over the "Technology & Products" link on top
- Click on "Arria NLG Studio"

- Go to this URL and show

https://pypi.org/project/simplenlg/

- Go to this URL and show the example

https://github.com/Maleehak/SimpleNLG-Tutorial-in-Python/blob/main/NLG.ipynb

- Come to this notebook

- Paste the code cells in line by line for recording

Installing the SimpleNLG library

In [1]:
!pip install simplenlg

Collecting simplenlg
  Downloading simplenlg-0.2.0-py3-none-any.whl (165 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 165.1/165.1 kB 2.4 MB/s eta 0:00:00
Installing collected packages: simplenlg
Successfully installed simplenlg-0.2.0
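
To confirm from inside the notebook which version ended up installed, one option is the standard library's `importlib.metadata`. This is a small sketch; the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version string if simplenlg is installed, otherwise None.
print(installed_version("simplenlg"))
```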


Importing all required packages

In [43]:
import simplenlg

from simplenlg.framework import *
from simplenlg.lexicon import *
from simplenlg.realiser.english import *
from simplenlg.phrasespec import *
from simplenlg.features import *

Defining the lexicon, realiser and nlgFactory

In [44]:
lexicon = Lexicon.getDefaultLexicon()

realiser = Realiser(lexicon)

nlgFactory = NLGFactory(lexicon)

Generating a simple sentence using SimpleNLG

In [45]:
s1 = nlgFactory.createSentence("budgeting is a fundamental skill for managing personal finances")

s1

{realisation="", category=DocumentCategory.SENTENCE, features={textComponents:[budgeting is a fundamental skill for managing personal finances]}}

Once the sentence has been created, it must be realised to get the text. Note that "budgeting" has been capitalised and a full stop has been added.

In [46]:
output = realiser.realiseSentence(s1)

print(output)

Budgeting is a fundamental skill for managing personal finances.
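
Realisation does two visible things to this string: it capitalises the first letter and adds a terminating full stop. The plain-Python sketch below imitates only that surface behaviour. It is not how SimpleNLG implements it, and `toy_realise_sentence` is a hypothetical helper.

```python
def toy_realise_sentence(text):
    """Capitalise the first character and make sure the text ends with a full stop."""
    text = text.strip()
    if not text:
        return text
    sentence = text[0].upper() + text[1:]
    # Leave sentences alone if they already end with terminal punctuation.
    if sentence[-1] not in ".!?":
        sentence += "."
    return sentence


print(toy_realise_sentence(
    "budgeting is a fundamental skill for managing personal finances"))
```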


In [47]:
p = nlgFactory.createClause()

p.setSubject("She")
p.setVerb("trade")
p.setObject("the stock market")

realiser.realiseSentence(p)

'She trades the stock market.'

In [48]:
p = nlgFactory.createClause()

p.setSubject("She")
p.setVerb("trade")
p.setObject("the stock market")

p.setFeature(Feature.NEGATED,True)

realiser.realiseSentence(p)

'She does not trade the stock market.'

In [49]:
p = nlgFactory.createClause()

p.setSubject("She")
p.setVerb("trade")
p.setObject("the stock market")

p.setFeature(Feature.INTERROGATIVE_TYPE, InterrogativeType.YES_NO)

realiser.realiseSentence(p)

'Does She trade the stock market?'

In [50]:
p = nlgFactory.createClause()

p.setSubject("She")
p.setVerb("trade")
p.setObject("the stock market")

p.setFeature(Feature.INTERROGATIVE_TYPE, InterrogativeType.WHAT_OBJECT)

realiser.realiseSentence(p)

'What does She trade?'
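
In the four cells above, the verb is always given in its base form (`"trade"`) and the realiser picks the surface form: `trades` in the plain statement, `does not trade` when negated, and `does ... trade` in the questions. The toy function below hard-codes those patterns for a third-person singular subject and a regular present-tense verb. It only illustrates the idea and is not SimpleNLG's logic; `toy_clause` and its parameters are made up.

```python
def toy_clause(subject, verb, obj, negated=False, question=None):
    """Realise a present-tense clause for a third-person singular subject.

    Only handles regular verbs that take a plain "-s" ending.
    """
    if question == "yes_no":
        return f"Does {subject} {verb} {obj}?"
    if question == "what_object":
        # The object is what is being asked about, so it is left out.
        return f"What does {subject} {verb}?"
    if negated:
        return f"{subject} does not {verb} {obj}."
    return f"{subject} {verb}s {obj}."


print(toy_clause("She", "trade", "the stock market"))
print(toy_clause("She", "trade", "the stock market", negated=True))
print(toy_clause("She", "trade", "the stock market", question="yes_no"))
print(toy_clause("She", "trade", "the stock market", question="what_object"))
```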

In [51]:
subject = nlgFactory.createNounPhrase("John")

obj = nlgFactory.createNounPhrase("a budget")
verb = nlgFactory.createVerbPhrase("create")

subject.addModifier("careful")

p = nlgFactory.createClause()

p.setSubject(subject)
p.setObject(obj)
p.setVerb(verb)

realiser.realiseSentence(p)

'Careful John creates a budget.'

Loading a dataset of S&P 500 stocks, with each company's financial information in columns.
https://www.kaggle.com/datasets/paytonfisher/sp-500-companies-with-financial-information?select=financials.csv

In [52]:
import pandas as pd

stocks_df = pd.read_csv('datasets/financials.csv')

stocks_df.sample(10)

Unnamed: 0,Symbol,Name,Sector,Price,Price/Earnings,Dividend Yield,Earnings/Share,52 Week Low,52 Week High,Market Cap,EBITDA,Price/Sales,Price/Book,SEC Filings
445,TWX,Time Warner Inc.,Consumer Discretionary,93.02,15.35,1.692777,6.62,103.9,85.88,74185800000.0,7671000000.0,2.373599,2.73,http://www.sec.gov/cgi-bin/browse-edgar?action...
108,CHD,Church & Dwight,Consumer Staples,47.38,24.42,1.836605,2.92,54.1799,43.21,11838960000.0,868000000.0,3.168245,6.28,http://www.sec.gov/cgi-bin/browse-edgar?action...
249,IBM,International Business Machines,Information Technology,147.59,10.67,3.899903,6.11,182.79,139.13,142433000000.0,16557000000.0,1.817167,7.7,http://www.sec.gov/cgi-bin/browse-edgar?action...
470,VAR,Varian Medical Systems,Health Care,112.82,29.93,0.0,2.69,130.29,77.73,10692680000.0,500600000.0,3.965225,7.32,http://www.sec.gov/cgi-bin/browse-edgar?action...
28,GOOG,Alphabet Inc Class C,Information Technology,1001.52,40.29,0.0,22.27,1186.89,803.1903,728535600000.0,32714000000.0,6.772653,4.67,http://www.sec.gov/cgi-bin/browse-edgar?action...
298,MA,Mastercard Inc.,Information Technology,160.62,34.99,0.592663,3.65,177.11,105.8,187102000000.0,7113000000.0,15.020556,26.93,http://www.sec.gov/cgi-bin/browse-edgar?action...
134,CSX,CSX Corp.,Industrials,50.47,21.94,1.510289,6.07,60.04,45.41,47340510000.0,5003000000.0,4.216355,4.27,http://www.sec.gov/cgi-bin/browse-edgar?action...
43,APC,Anadarko Petroleum Corp,Energy,56.2,-21.29,1.702997,-5.9,70.0,39.96,32129090000.0,3115000000.0,3.968221,2.88,http://www.sec.gov/cgi-bin/browse-edgar?action...
413,SPG,Simon Property Group Inc,Real Estate,152.18,13.56,5.036808,6.25,187.35,150.15,48139840000.0,4411515000.0,8.754495,13.24,http://www.sec.gov/cgi-bin/browse-edgar?action...
354,PAYX,Paychex Inc.,Information Technology,61.86,27.49,3.08928,2.26,73.1,54.2,23253670000.0,1414900000.0,7.248487,11.77,http://www.sec.gov/cgi-bin/browse-edgar?action...


Getting a descriptive summary of the financials

In [53]:
stocks_df.describe()

Unnamed: 0,Price,Price/Earnings,Dividend Yield,Earnings/Share,52 Week Low,52 Week High,Market Cap,EBITDA,Price/Sales,Price/Book
count,505.0,503.0,505.0,505.0,505.0,505.0,505.0,505.0,505.0,497.0
mean,103.830634,24.80839,1.895953,3.753743,122.623832,83.536616,49239440000.0,3590328000.0,3.941705,14.453179
std,134.427636,41.241081,1.537214,5.689036,155.36214,105.725473,90050170000.0,6840544000.0,3.46011,89.660508
min,2.82,-251.53,0.0,-28.01,6.59,2.8,2626102000.0,-5067000000.0,0.153186,0.51
25%,46.25,15.35,0.794834,1.49,56.25,38.43,12732070000.0,773932000.0,1.62949,2.02
50%,73.92,19.45,1.769255,2.89,86.68,62.85,21400950000.0,1614399000.0,2.89644,3.4
75%,116.54,25.75,2.781114,5.14,140.13,96.66,45119680000.0,3692749000.0,4.703842,6.11
max,1806.06,520.15,12.661196,44.09,2067.99,1589.0,809508000000.0,79386000000.0,20.094294,1403.38
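
One thing the summary hints at: the mean of `52 Week Low` (122.62) is higher than the mean of `52 Week High` (83.54), which suggests the two column labels may be swapped in this CSV. A quick way to check is to look for rows where the low exceeds the high. The sketch below does this in plain Python on three rows copied from the sample output; on the full DataFrame the same comparison could be done column-wise.

```python
# Three rows copied from the sample output: (symbol, "52 Week Low", "52 Week High").
rows = [
    ("TWX", 103.9, 85.88),
    ("IBM", 182.79, 139.13),
    ("GOOG", 1186.89, 803.1903),
]

# A genuine low can never be above the high, so any hit points at mislabelled columns.
suspicious = [symbol for symbol, low, high in rows if low > high]
print(suspicious)
```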


Defining sentence templates that describe each stock: its sector, current market price, market capitalization, dividend yield, and financial position based on its EBITDA.

In [54]:
def create_descriptions(row):
    subject = nlgFactory.createNounPhrase(
        "The company " + str(row['Name']))

    verb1 = nlgFactory.createVerbPhrase(
        "is part of the " + str(row['Sector']) + " Sector")
    
    object1 = nlgFactory.createNounPhrase(
        " and is currently trading at $" + str(row['Price']) + " per share")
    
    clause1 = nlgFactory.createClause(subject, verb1, object1)
    
    verb2 = nlgFactory.createVerbPhrase("boasts")
    object2 = nlgFactory.createNounPhrase(
        "a market capitalization of " + str(round(row['Market Cap']/1e+9, 2)) + " billion dollars.")
    clause2 = nlgFactory.createClause(
        "the company", verb2, object2)

    object3 = nlgFactory.createNounPhrase(
        " The annual dividend yield for the company is " + str(round(row['Dividend Yield'], 2)))
    
    clause3 = nlgFactory.createClause(object3)   
    
    verb4 = nlgFactory.createVerbPhrase("have")
    object4 = nlgFactory.createNounPhrase(
        "a strong financial position with an EBITDA of " + 
        str(round(row['EBITDA']/1e+9, 2)) + " billion dollars.")
    clause4 = nlgFactory.createClause(
        "It", verb4, object4)
    
    if row['EBITDA'] < 0 :
        # This will negate the sentence
        clause4.setFeature(Feature.NEGATED, True)
        
    s1 = nlgFactory.createSentence(clause1)
    s2 = nlgFactory.createSentence(clause2)
    s3 = nlgFactory.createSentence(clause3)
    s4 = nlgFactory.createSentence(clause4)
    
    paragraph = nlgFactory.createParagraph([s1, s2, s3, s4])
    
    output = realiser.realise(paragraph).getRealisation()
    
    return output.strip()
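
For comparison, the same four sentences can be produced with plain string formatting. The sketch below is a baseline, not part of the notebook: `describe_with_fstrings` is a hypothetical helper, it accepts any mapping with the same keys as a DataFrame row, and the negation is written by hand instead of being switched on with `Feature.NEGATED`.

```python
def describe_with_fstrings(row):
    """Build the four-sentence description using only f-strings."""
    market_cap = round(row["Market Cap"] / 1e9, 2)
    dividend = round(row["Dividend Yield"], 2)
    ebitda = round(row["EBITDA"] / 1e9, 2)
    # Hand-written negation, standing in for Feature.NEGATED in the function above.
    has = "does not have" if row["EBITDA"] < 0 else "has"
    return (
        f"The company {row['Name']} is part of the {row['Sector']} Sector "
        f"and is currently trading at ${row['Price']} per share. "
        f"The company boasts a market capitalization of {market_cap} billion dollars. "
        f"The annual dividend yield for the company is {dividend}. "
        f"It {has} a strong financial position with an EBITDA of {ebitda} billion dollars."
    )


# Values copied from the Abbott Laboratories row of the dataset.
abbott = {
    "Name": "Abbott Laboratories",
    "Sector": "Health Care",
    "Price": 56.27,
    "Market Cap": 102121042306.0,
    "Dividend Yield": 1.908982,
    "EBITDA": 5744000000.0,
}
print(describe_with_fstrings(abbott))
```

The f-string version is shorter, but the grammar lives in the template text. With SimpleNLG the clause is data, so flipping one feature is enough to turn `has` into `does not have`.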

Creating a new text column that holds the generated paragraph for each stock, by applying create_descriptions to every row

In [55]:
stocks_df['text'] = stocks_df.apply(create_descriptions, axis=1)

stocks_df[['Name', 'Sector', 'Price', 'Dividend Yield', 'Market Cap', 'text']].sample(10)

Unnamed: 0,Name,Sector,Price,Dividend Yield,Market Cap,text
43,Anadarko Petroleum Corp,Energy,56.2,1.702997,32129090000.0,The company Anadarko Petroleum Corp is part of...
186,F5 Networks,Information Technology,137.25,0.0,8744186000.0,The company F5 Networks is part of the Informa...
178,Everest Re Group Ltd.,Financials,241.06,2.107823,10131890000.0,The company Everest Re Group Ltd. is part of t...
6,Acuity Brands Inc,Industrials,145.41,0.351185,6242378000.0,The company Acuity Brands Inc is part of the I...
338,Norfolk Southern Corp.,Industrials,136.89,2.018503,40543550000.0,The company Norfolk Southern Corp. is part of ...
161,Eastman Chemical,Materials,93.57,2.263084,14226830000.0,The company Eastman Chemical is part of the Ma...
438,The Cooper Companies,Health Care,223.17,0.026034,11297960000.0,The company The Cooper Companies is part of th...
20,Alexion Pharmaceuticals,Health Care,108.47,0.0,26172440000.0,The company Alexion Pharmaceuticals is part of...
230,Hewlett Packard Enterprise,Information Technology,15.04,1.928021,24800860000.0,The company Hewlett Packard Enterprise is part...
39,AmerisourceBergen Corp,Health Care,91.55,1.613246,20587700000.0,The company AmerisourceBergen Corp is part of ...


Checking the financial information of some stocks

In [56]:
stocks_df.iloc[2, :]

Symbol                                                          ABT
Name                                            Abbott Laboratories
Sector                                                  Health Care
Price                                                         56.27
Price/Earnings                                                22.51
Dividend Yield                                             1.908982
Earnings/Share                                                 0.26
52 Week Low                                                    64.6
52 Week High                                                  42.28
Market Cap                                           102121042306.0
EBITDA                                                 5744000000.0
Price/Sales                                                 3.74048
Price/Book                                                     3.19
SEC Filings       http://www.sec.gov/cgi-bin/browse-edgar?action...
text              The company Abbott Laboratorie

Checking the generated paragraph for the same stock

In [57]:
stocks_df['text'][2]

'The company Abbott Laboratories is part of the Health Care Sector and is currently trading at $56.27 per share. The company boasts a market capitalization of 102.12 billion dollars. The annual dividend yield for the company is 1.91. It has a strong financial position with an EBITDA of 5.74 billion dollars.'

In [58]:
stocks_df.iloc[10, :]

Symbol                                                          AES
Name                                                       AES Corp
Sector                                                    Utilities
Price                                                         10.06
Price/Earnings                                                 9.96
Dividend Yield                                             4.961832
Earnings/Share                                                -1.72
52 Week Low                                                   12.05
52 Week High                                                   10.0
Market Cap                                             6920851212.0
EBITDA                                                 3001000000.0
Price/Sales                                                0.659514
Price/Book                                                      2.2
SEC Filings       http://www.sec.gov/cgi-bin/browse-edgar?action...
text              The company AES Corp is part o

In [59]:
stocks_df['text'][10]

'The company AES Corp is part of the Utilities Sector and is currently trading at $10.06 per share. The company boasts a market capitalization of 6.92 billion dollars. The annual dividend yield for the company is 4.96. It has a strong financial position with an EBITDA of 3.0 billion dollars.'

Checking stocks with negative EBITDA

In [60]:
stocks_df[stocks_df['EBITDA'] < 0]

Unnamed: 0,Symbol,Name,Sector,Price,Price/Earnings,Dividend Yield,Earnings/Share,52 Week Low,52 Week High,Market Cap,EBITDA,Price/Sales,Price/Book,SEC Filings,text
23,AGN,"Allergan, Plc",Health Care,164.2,10.65,1.643289,38.35,256.8,160.07,56668830000.0,-2888100000.0,4.820115,0.83,http://www.sec.gov/cgi-bin/browse-edgar?action...,"The company Allergan, Plc is part of the Healt..."
59,ADSK,Autodesk Inc,Information Technology,104.81,-77.07,0.0,-2.61,131.1,81.75,24348290000.0,-378100000.0,16.50682,224.13,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company Autodesk Inc is part of the Inform...
143,XRAY,Dentsply Sirona,Health Care,56.85,22.65,0.600343,1.99,68.98,52.535,13390510000.0,-411100000.0,4.626262,1.8,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company Dentsply Sirona is part of the Hea...
193,FE,FirstEnergy Corp,Utilities,30.64,11.18,4.673807,-14.49,35.22,27.93,13706080000.0,-5067000000.0,1.299448,2.19,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company FirstEnergy Corp is part of the Ut...
209,GE,General Electric,Industrials,14.45,13.76,3.147541,-0.72,30.59,14.71,132249300000.0,-206000000.0,1.088761,1.7,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company General Electric is part of the In...
229,HES,Hess Corporation,Energy,43.0,-9.33,2.26706,-19.94,55.48,37.25,14016130000.0,-819000000.0,3.780475,1.08,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company Hess Corporation is part of the En...
245,INCY,Incyte,Health Care,83.92,-119.89,0.0,0.54,153.15,84.21,18220960000.0,-81686000.0,17.02699,10.25,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company Incyte is part of the Health Care ...
299,MAT,Mattel Inc.,Consumer Discretionary,16.0,-14.68,0.0,-3.06,26.3,12.71,5843402000.0,-203599000.0,1.186372,3.87,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company Mattel Inc. is part of the Consume...
336,NBL,Noble Energy Inc,Energy,25.43,105.96,1.477105,-2.32,39.6,22.985,13177330000.0,-518000000.0,4.697645,1.44,http://www.sec.gov/cgi-bin/browse-edgar?action...,The company Noble Energy Inc is part of the En...


Generated paragraphs for companies with negative EBITDA. Note that the financial position of these companies is described as not strong.

In [61]:
stocks_df['text'][23]

'The company Allergan, Plc is part of the Health Care Sector and is currently trading at $164.2 per share. The company boasts a market capitalization of 56.67 billion dollars. The annual dividend yield for the company is 1.64. It does not have a strong financial position with an EBITDA of -2.89 billion dollars.'

In [62]:
stocks_df['text'][59]

'The company Autodesk Inc is part of the Information Technology Sector and is currently trading at $104.81 per share. The company boasts a market capitalization of 24.35 billion dollars. The annual dividend yield for the company is 0.0. It does not have a strong financial position with an EBITDA of -0.38 billion dollars.'
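
The generated text shows raw `str()` output for numbers, for example `$164.2`, a dividend yield of `0.0`, and `3.0 billion dollars`. If fixed decimals are preferred, format specifiers are one option. This is a sketch with made-up helper names (`money`, `billions`), not a change the notebook makes.

```python
def money(value):
    """Format a price with a dollar sign, thousands separators and two decimals."""
    return f"${value:,.2f}"


def billions(value):
    """Express a raw dollar amount in billions with two decimals."""
    return f"{value / 1e9:.2f} billion dollars"


print(money(164.2))             # $164.20
print(billions(3001000000.0))   # 3.00 billion dollars
print(billions(-2888100000.0))  # -2.89 billion dollars
```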