# Background

We use [IBM Watson's Tone Analyzer API](https://cloud.ibm.com/apidocs/tone-analyzer?code=python) to score each sentence for tone (for example, joy or analytical).

The text for this analysis was extracted from multiple PDF reports.

In [34]:
import pandas as pd
from ibm_watson import ToneAnalyzerV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
from ibm_watson import ApiException
import json
import re
import os

### Enter API keys

This is the "lazy" way to do it, which is fine if you're not going to share your code with anyone else. Otherwise, load the keys from JSON as described in the next section.

In [35]:
apikey = ''

In [36]:
api_url = ''

### Read keys from JSON

Standard code chunk to keep keys out of the notebook. Remember to exclude the JSON key file from git, for example by listing `keys.json` in `.gitignore`.

In [37]:
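try:
    with open('keys.json') as json_file:
        keys = json.load(json_file)
    apikey = keys['apikey']
    api_url = keys['api_url']
except FileNotFoundError:
    print('keys.json not found: is there a JSON key file?')
except KeyError as missing:
    print('keys.json is missing the entry %s' % missing)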
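
The loader above expects `keys.json` to be a flat JSON object with two entries, `apikey` and `api_url`. As a minimal sketch, the same lookup can be wrapped in a reusable function. `load_keys` is a hypothetical helper name, not part of the IBM SDK, and the file layout is assumed from the keys the notebook reads.

```python
import json


def load_keys(path='keys.json'):
    """Return (apikey, api_url) from a flat JSON file.

    Hypothetical helper: assumes the file looks like
    {"apikey": "...", "api_url": "..."}.
    """
    with open(path) as json_file:
        keys = json.load(json_file)
    # A KeyError here means the file exists but lacks an expected entry.
    return keys['apikey'], keys['api_url']
```

Calling `apikey, api_url = load_keys()` would then replace the two manual assignment cells.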

## Authenticate with IBM Watson

Standard flow from [IBM Watson documentation](https://cloud.ibm.com/apidocs/tone-analyzer?code=python#authentication)

In [38]:
authenticator = IAMAuthenticator(apikey)

tone_analyzer = ToneAnalyzerV3(
    version='2017-09-21',
    authenticator=authenticator
)

tone_analyzer.set_service_url(api_url)

## Helper functions

In [39]:
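def analyse_tone(txt):
    """Return the per-sentence tone results for txt, or [] if the call fails."""
    try:
        tone_analysis = tone_analyzer.tone(
            {'text': txt},
            content_type='application/json').get_result()
    except ApiException as ex:
        print("Method failed with status code " + str(ex.code) + ": " + ex.message)
        return []   # avoid a NameError on the undefined result below

    # 'sentences_tone' can be absent from the response, so default to []
    return tone_analysis.get('sentences_tone', [])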

In [40]:
def build_frame(tone, i):
    """Build one DataFrame row per (sentence, tone) pair; i offsets sentence_id."""
    rows = []

    for data in tone:
        sentence_text = data['text']
        sentence_id = data['sentence_id'] + i

        # one row per tone detected in this sentence
        for row in data['tones']:
            row['text'] = sentence_text
            row['sentence_id'] = sentence_id
            rows.append(row)

    return pd.DataFrame(rows)
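
To make the reshaping concrete, here is a pandas-free sketch of the same flattening that `build_frame` performs: one output row per (sentence, tone) pair. The sample data is invented for illustration and only mirrors the fields the code above reads (`sentence_id`, `text`, `tones`); `flatten_tones` is a hypothetical name.

```python
def flatten_tones(sentences_tone, offset=0):
    """Flatten per-sentence tone results into one dict per (sentence, tone)."""
    rows = []
    for sentence in sentences_tone:
        for tone in sentence['tones']:
            # Copy so the caller's data is not mutated.
            row = dict(tone)
            row['text'] = sentence['text']
            row['sentence_id'] = sentence['sentence_id'] + offset
            rows.append(row)
    return rows


# Illustrative input only; scores are invented, not real API output.
sample = [
    {'sentence_id': 0, 'text': 'First sentence.',
     'tones': [{'score': 0.7, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]},
    {'sentence_id': 1, 'text': 'Second sentence.', 'tones': []},
    {'sentence_id': 2, 'text': 'Third sentence.',
     'tones': [{'score': 0.6, 'tone_id': 'joy', 'tone_name': 'Joy'},
               {'score': 0.8, 'tone_id': 'analytical', 'tone_name': 'Analytical'}]},
]

rows = flatten_tones(sample, offset=40)
```

A sentence with an empty `tones` list contributes no rows, and a sentence with two tones contributes two rows sharing one `sentence_id`. That would account for both the gaps and the repeats in a `sentence_id` column built this way.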

## Main loop

Cycle through all the text files in the `./txt/` directory, breaking each one down into gobbets of 40 lines for the API, and write one CSV of results per file to `./csv/`.
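
The gobbet step can be sketched on its own, separate from the API call. This is a minimal illustration, assuming the 40-line group size used in the loop in this notebook; `chunk_lines` is a hypothetical helper name.

```python
def chunk_lines(lines, size=40):
    """Yield (start_index, joined_text) for consecutive groups of `size` lines."""
    for start in range(0, len(lines), size):
        yield start, ' '.join(lines[start:start + size])


# 95 dummy lines split into groups starting at 0, 40 and 80.
lines = ['line %d' % n for n in range(95)]
chunks = list(chunk_lines(lines))
```

The start index is returned alongside each gobbet because the loop passes it on as the `sentence_id` offset.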

In [42]:
files = os.listdir('./txt/')

In [43]:
for filename in files:
    # read the file, dropping blank lines
    with open('./txt/%s' % filename, 'r') as file:
        text = [line.strip() for line in file]
    text = list(filter(None, text))   # strip empty items

    # analyse the text 40 lines at a time, keeping every gobbet's frame
    frames = []
    for i in range(0, len(text), 40):
        gobbet = ' '.join(text[i:i+40])
        result = analyse_tone(gobbet)
        frames.append(build_frame(result, i))

    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df_tone = pd.concat(frames).reset_index() if frames else pd.DataFrame()

    # create output filename: swap the extension only, not every 'txt' in the name
    output = os.path.splitext(filename)[0] + '.csv'
    df_tone.to_csv('./csv/%s' % output,
                   encoding='utf-8',
                   index=False)

In [13]:
df_tone

| | index | score | tone_id | tone_name | text | sentence_id |
|---|---|---|---|---|---|---|
| 0 | 0 | 0.743682 | analytical | Analytical | The EU targets a reduction of CO2 emissions of... | 0 |
| 1 | 1 | 0.703409 | analytical | Analytical | This has already led to a 21% reduction in CO2... | 1 |
| 2 | 2 | 0.736466 | analytical | Analytical | In addition, Europe is in the process of estab... | 3 |
| 3 | 3 | 0.649361 | analytical | Analytical | Just 4% comes from the chemical industry, desp... | 5 |
| 4 | 4 | 0.765977 | analytical | Analytical | Of that 4%, 2.2% originates from steam and hea... | 6 |
| 5 | 5 | 0.571055 | joy | Joy | Electrification is part of the solution. | 7 |
| 6 | 6 | 0.897416 | analytical | Analytical | Electrification is part of the solution. | 7 |
| 7 | 7 | 0.823864 | analytical | Analytical | However, the need to store energy and mitigate... | 8 |
| 8 | 8 | 0.738413 | joy | Joy | Here, the unique properties of hydrogen make i... | 9 |
| 9 | 9 | 0.919824 | analytical | Analytical | For example, hydrogen: - Can be used to decarb... | 10 |


In [14]:
df_tone.to_csv("/Users/matm/Desktop/Accenture_TOV.csv",
               encoding = 'utf-8',
               index = False)