# News Sentiment - Gleaning Insights

How do we glean insights from news sentiment?
* Look for a sharp increase in the volume of news and of rated news, which is what produces the captured news sentiment scores
* Look at the moving average of positive or negative news sentiment, and at a large spread between positive and negative news sentiment

Let us explore this use case through examples:
* Read credentials from file
* Connect to MySQL database using credentials
* Retrieve all scores for an *Asset*, for example Deutsche Bank
* Carnival Corp, 2020
   * Retrieve Asset's scores and scatter plot *Negative Sentiment* for *Asset* by *Date* 
   * Line chart number of relevant ratings 
   * Line chart average sentiments of relevant rated stories By Date 
   * Chart *Price History* for the same time interval 
* Saudi Oil, 2020
   * Retrieve Asset's scores and scatter plot *Negative Sentiment* for *Asset* by *Date* 
   * Line chart number of relevant ratings 
   * Line chart average sentiments of relevant rated stories By Date 
   * Chart *Price History* for the same time interval 
* Facebook, 2018
   * Retrieve Asset's scores and scatter plot *Negative Sentiment* for *Asset* by *Date* 
   * Line chart number of relevant ratings 
   * Line chart average sentiments of relevant rated stories By Date 
   * Chart *Price History* for the same time interval 

### Read MySQL Creds

In [2]:
import requests, json, time,  sys

credFile = open(r".\creds\creds.txt","r")   # raw string keeps the backslashes literal; one value per line
                                                #--- USER---
                                                #--- PASSWORD---
USERNAME = credFile.readline().rstrip('\n')
PASSWORD = credFile.readline().rstrip('\n')

credFile.close()

# Make sure that creds are read in correctly
#print("USERNAME="+str(USERNAME))
#print("PASSWORD="+str(PASSWORD))
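The credential-reading step can also be wrapped in a small helper that closes the file automatically and fails loudly when a line is missing. This is a minimal sketch, not the notebook's own code: `read_credentials` is a hypothetical name, and it assumes the same two-line layout (user name first, password second).

```python
from pathlib import Path


def read_credentials(path):
    """Return (username, password) from a two-line text file.

    Hypothetical helper: assumes line 1 is the user name and line 2 the password.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    if len(lines) < 2:
        raise ValueError(f"expected two lines in {path}, found {len(lines)}")
    # strip() also removes stray spaces and Windows line endings
    return lines[0].strip(), lines[1].strip()


# Usage in the notebook would look like:
# USERNAME, PASSWORD = read_credentials(Path("creds") / "creds.txt")
```

Building the path with `Path("creds") / "creds.txt"` avoids backslash escapes and works on both Windows and Linux.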

### Connect To MySQL Db

In [3]:
import mysql.connector
from mysql.connector import errorcode   # error codes used in the except branch below

DATABASE='newsanalyticsdb'
myConn = ""

try:
    myConn = mysql.connector.connect(
      host="localhost",
      user=USERNAME,
      passwd=PASSWORD,
      database=DATABASE
)
except mysql.connector.Error as err:
    if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
        print("Something is wrong with your user name or password")
    elif err.errno == errorcode.ER_BAD_DB_ERROR:
        print("Database does not exist")
    else:
        print(err)
        
else:
    print("Connected to "+ DATABASE)   
  

Connected to newsanalyticsdb


### Define Retrieval By assetId
Retrieve all scores for Deutsche Bank

In [4]:
# introduce fields to archive from news
newsArchivedFields = {
    'id': 'VARCHAR(255)', 
    'feedTimestamp' : 'VARCHAR(255)',
    'headline' : 'VARCHAR(520)',   #max is 512 + ...
    'subjects' : 'VARCHAR(10000)'}

# introduce fields to archive from scores
scoresArchivedFields = {
    'id': 'VARCHAR(255)', 
    'assetId': 'VARCHAR(255)', 
    'assetName': 'VARCHAR(255)',
    'emeaTimestamp' : 'VARCHAR(255)',
    'sentimentNegative' : 'DECIMAL(10,6)',
    'sentimentNeutral' : 'DECIMAL(10,6)',
    'sentimentPositive' : 'DECIMAL(10,6)',
    'relevance' : 'DECIMAL(10,7)'}

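def retrieveFromTable(tableName, assetId):

    # The table name cannot be bound as a parameter, so it is concatenated;
    # the asset id is passed separately and escaped by the connector
    RETRIEVE_QUERY = 'SELECT * FROM ' + tableName + ' WHERE assetId = %s'

    myCursor = None
    try:
        #Get cursor to db connection 
        myCursor = myConn.cursor(buffered=True)

        myCursor.execute(RETRIEVE_QUERY, (assetId,))

        rows = myCursor.fetchall()

        print('Total Row(s):', myCursor.rowcount)
        for row in rows:
            print(row)

    except mysql.connector.Error as e:
        print(e)

    finally:
        # Close the cursor only if it was opened; this also closes the shared connection
        if myCursor is not None:
            myCursor.close()
        myConn.close()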
    
retrieveFromTable('SCORES', 'P:4295869482')

Total Row(s): 1929
('tr:ACN58442a_2004062eqcVXMfJrR5/BOEM0tJrPsJv3uVEmRCR+tX83', 'P:4295869482', 'Deutsche Bank AG', '2020-04-06T04:14:11.079Z', Decimal('0.062453'), Decimal('0.140947'), Decimal('0.796600'), Decimal('0.7071070'))
('tr:ASA00QVX__2005191wDLEwbkltyogSLJ1WMJVuZ8VPL7cxOiob8a/x', 'P:4295869482', 'Deutsche Bank AG', '2020-05-19T14:47:00.717Z', Decimal('0.028249'), Decimal('0.115890'), Decimal('0.855861'), Decimal('1.0000000'))
('tr:ASN0002TA_2002101FLytXUPx2oOW3yDtBPlXKZKr3Fe7+5YyDU9vI', 'P:4295869482', 'Deutsche Bank AG', '2020-02-10T16:59:50.083Z', Decimal('0.157619'), Decimal('0.614813'), Decimal('0.227568'), Decimal('1.0000000'))
('tr:ASN0002TA_2002101H6RMvI6Dsg24mo5KJhdo23nAwscqq/osnRYwb', 'P:4295869482', 'Deutsche Bank AG', '2020-02-10T17:00:11.401Z', Decimal('0.052306'), Decimal('0.190794'), Decimal('0.756900'), Decimal('1.0000000'))
('tr:ASN0002TA_2002101LHkPfzbWgVXvPYV/hEYO4+w6webn1y05wZz8Z', 'P:4295869482', 'Deutsche Bank AG', '2020-02-10T16:59:57.877Z', Decimal('0.
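
Binding values as parameters, rather than concatenating them into the SQL text, can be tried out without a MySQL server. The sketch below is an illustration under assumptions: it uses the standard library's `sqlite3` with an in-memory table as a stand-in for the `SCORES` table, `fetch_scores` is a hypothetical helper, and the rows are made up. Note that `sqlite3` uses `?` as its placeholder, whereas `mysql.connector` uses `%s`.

```python
import sqlite3


def fetch_scores(conn, asset_id):
    """Return all SCORES rows for one asset, passing the id as a bound parameter."""
    cur = conn.cursor()
    try:
        # The driver substitutes the value safely; no quoting by hand
        cur.execute("SELECT * FROM SCORES WHERE assetId = ?", (asset_id,))
        return cur.fetchall()
    finally:
        cur.close()


# Toy stand-in for the real table (made-up rows, reduced column set)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SCORES (id TEXT, assetId TEXT, sentimentNegative REAL)")
conn.executemany(
    "INSERT INTO SCORES VALUES (?, ?, ?)",
    [("a", "P:1", 0.1), ("b", "P:2", 0.7), ("c", "P:1", 0.3)],
)

rows = fetch_scores(conn, "P:1")
print("Total Row(s):", len(rows))
```

With a bound parameter, an input such as `P:1' OR '1'='1` is treated as a literal asset id and matches nothing, instead of changing the meaning of the query.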

### Retrieve Into DataFrame, Scatter Plot by Exact Date
(assetId 4295903693 Carnival Corp, year 2020)

In [88]:
from sqlalchemy import create_engine
import pymysql

import pandas as pd
    
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as pyof

import plotly.io as pio  #Lab
pio.renderers.default = 'iframe' # 'browser' # 'png'  #Lab

def retrieveIntoDataFrame(tableName, assetId, year, minRel):

    MIN_RELEVANCE = str(minRel) #'0.75'
    # Note the leading space before AND; without it the relevance value and the keyword run together
    RETRIEVE_QUERY = 'SELECT * FROM ' + tableName + ' WHERE assetId=\''+ assetId + '\' AND relevance > '+ MIN_RELEVANCE+ ' AND emeaTimestamp LIKE \''+year+'%%\''; 

    db_connection_str = 'mysql+pymysql://'+USERNAME+':'+PASSWORD+'@localhost/'+DATABASE
    db_connection = create_engine(db_connection_str)

    df = pd.read_sql(RETRIEVE_QUERY, con=db_connection)
    
    return df

def scatterPlotDf(df):
#    pyof.init_notebook_mode(connected=False)  #Notebook
    
    fig = px.scatter(df, x='emeaTimestamp', y='sentimentNegative', color='assetId') 
    #fig.update_layout(autosize=False,
    #    margin=dict(l=20, r=20, t=20, b=20),
    #    paper_bgcolor="LightSteelBlue",
    #)

    fig.show(renderer='iframe')

dfCarni = retrieveIntoDataFrame('SCORES', 'P:4295903693', '2020', 0.75)

#print(dfCarni)
    
scatterPlotDf(dfCarni)


### Line Chart Number of Rated Stories By Date
(Carnival Corp 2020)

Let us observe how peaks in the volume of rated stories correspond to the stock getting heavy news coverage

In [95]:
def lineChartVolRatedByDate(dfMine):
    # Count rated stories per calendar day (first 10 characters of the ISO timestamp).
    # Naming the columns explicitly keeps this independent of the pandas version.
    dfCounts = (dfMine['emeaTimestamp'].str[:10]
                .value_counts(sort = False)
                .sort_index()
                .rename_axis('date')
                .reset_index(name='ratedStories'))

    fig = px.line(dfCounts, x='date', y='ratedStories')
    fig.show()

lineChartVolRatedByDate(dfCarni)
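
Peaks in the daily volume can also be flagged numerically rather than read off the chart. The following is a minimal sketch under assumptions: `flag_volume_spikes` is a hypothetical helper, the window of 5 days and the factor `k=2.0` are arbitrary illustrative choices, and the counts are made up rather than taken from the database.

```python
import pandas as pd


def flag_volume_spikes(daily_counts, window=5, k=2.0):
    """Return the days whose count exceeds the trailing mean by more than k trailing std devs.

    The baseline is shifted by one day so that a spike does not inflate its own threshold.
    """
    baseline = daily_counts.shift(1).rolling(window)
    threshold = baseline.mean() + k * baseline.std()
    # Days without a full trailing window have a NaN threshold and are never flagged
    return daily_counts[daily_counts > threshold]


# Made-up daily counts of rated stories, with one obvious burst
counts = pd.Series(
    [10, 12, 11, 9, 10, 11, 60, 12, 10],
    index=pd.date_range("2020-02-10", periods=9, freq="D"),
)

spikes = flag_volume_spikes(counts)
print(spikes)
```

On real data the daily counts would come from the same per-day `value_counts` used for the line chart, with the index converted to dates.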

### Line Chart Sum of Ratings of Rated Stories By Date As Is
(Carnival Corp 2020)

* The pattern is not very clear
* At the peaks we can tell which sentiment outweighs the other, negative or positive

In [138]:
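def lineChartAveByDateAsIs(dfMine):
    dfMine['emeaTimestampDate'] = dfMine['emeaTimestamp'].str[:10]
    # Sum only the sentiment columns; the text columns (id, assetName, ...) are not meaningful to add up
    dfMine = dfMine.groupby(['emeaTimestampDate'])[['sentimentNegative', 'sentimentNeutral', 'sentimentPositive']].sum()
 #   print(dfMine)
    
    fig3 = px.line(dfMine,x=dfMine.index, y=['sentimentNegative', 'sentimentPositive'])
    fig3.show()

lineChartAveByDateAsIs(dfCarni)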

### Line Chart Rolling Average Ratings of Rated Stories By Date
(Carnival Corp 2020)

The pattern is clearer

In [90]:
def lineChartAveByDate(dfMine):
    dfMine['emeaTimestampDate'] = dfMine['emeaTimestamp'].str[:10]
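    # Average only the sentiment columns; taking the mean of the text columns fails on recent pandas versions
    dfMine = dfMine.groupby(['emeaTimestampDate'])[['sentimentNegative', 'sentimentNeutral', 'sentimentPositive']].mean()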
    dfMine['DifferencePositiveNeutral'] = dfMine.sentimentPositive  - dfMine.sentimentNeutral 
    dfMine['DifferenceNeutralNegative'] = dfMine.sentimentNeutral - dfMine.sentimentNegative 
    dfMine['SpreadPositiveNegative'] = dfMine.sentimentPositive  - dfMine.sentimentNegative
    
    dfMine['sentimentNegativeRollingMean'] = dfMine.sentimentNegative.rolling(10).mean() 
    dfMine['sentimentPositiveRollingMean'] = dfMine.sentimentPositive.rolling(10).mean() 
    dfMine['SpreadPositiveNegativeRollingMean'] = dfMine.SpreadPositiveNegative.rolling(10).mean()   
 #   print(dfMine)
    
    fig3 = px.line(dfMine,x=dfMine.index, y=['sentimentNegativeRollingMean', 'sentimentPositiveRollingMean','SpreadPositiveNegativeRollingMean'])
    fig3.show()


In [91]:
lineChartAveByDate(dfCarni)
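
A note on the `rolling(10)` calls: with an integer window, pandas only emits a value once the window is full, so the first nine dates of each rolling series are `NaN` and the plotted lines start later than the data. The toy example below illustrates this with a window of 3 on made-up values; `min_periods=1` is shown as one possible alternative, not as what the notebook uses.

```python
import pandas as pd

# Made-up daily average sentiment values
daily = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5])

# Default behaviour: no value until 3 observations are available
strict = daily.rolling(3).mean()

# Alternative: average over whatever is available so far
lenient = daily.rolling(3, min_periods=1).mean()

print(pd.DataFrame({"daily": daily, "strict": strict, "lenient": lenient}))
```

The strict version avoids noisy early averages based on one or two days; the lenient version keeps the chart aligned with the first date in the data.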

### Chart Price History via Eikon Data API
(Carnival Corp, 2020)

Feb 18th appears to be a key point: news sentiment falls sharply from late January until Feb 18th, and that is when the closing price begins to fall sharply

In [59]:
import eikon as ek
ek.set_app_key('8a0a51d096f34eec86dadab1763ad94dae81c80e')

dfD, err = ek.get_data('CCL',['TR.PriceClose.date','TR.PriceClose'],{'SDate':'2020-01-01', 'EDate':'2020-12-31'})
#print(dfD)
fig = px.line(dfD, x='Date', y='Price Close', color='Instrument')
fig.show()
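
The visual observation that sentiment moves ahead of price could be checked with a lagged correlation between a daily sentiment series and daily price returns. This is only a sketch under assumptions: `lagged_correlation` is a hypothetical helper, the data is synthetic (returns are constructed to follow sentiment by exactly two days), and nothing here is computed from the Carnival Corp data.

```python
import pandas as pd


def lagged_correlation(sentiment, returns, max_lag=5):
    """Correlate sentiment on day t with returns on day t + lag, for lag = 0..max_lag."""
    return pd.Series(
        {lag: sentiment.corr(returns.shift(-lag)) for lag in range(max_lag + 1)}
    )


# Synthetic example: returns echo sentiment two days later
dates = pd.date_range("2020-01-01", periods=12, freq="D")
sentiment = pd.Series(
    [0.10, -0.20, 0.30, -0.40, 0.20, 0.50, -0.10, -0.30, 0.40, 0.00, 0.25, -0.15],
    index=dates,
)
returns = 0.01 * sentiment.shift(2)

corr_by_lag = lagged_correlation(sentiment, returns)
best_lag = corr_by_lag.idxmax()
print(corr_by_lag)
print("lag with the highest correlation:", best_lag)
```

On real data the sentiment series could be the rolling spread between positive and negative sentiment, and the returns could be derived from the closing prices with `pct_change()`, after aligning both on the same date index. A high correlation at some lag is suggestive only; it does not establish that news sentiment drives the price.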

### Scatter Plot Negative Sentiment
(Saudi Oil, 2020)

In [92]:
dfSaudiOil = retrieveIntoDataFrame('SCORES', 'P:4298459348', '2020', 0.75)

#print(dfSaudiOil)
    
scatterPlotDf(dfSaudiOil)

In [96]:
lineChartVolRatedByDate(dfSaudiOil)

In [94]:
lineChartAveByDate(dfSaudiOil)

### Chart Price History via Eikon Data API
(Saudi Oil, 2020)

March 16th appears to be a key point, with a heavy volume of news ratings, as do May 12th, Aug 9th and, roughly, Nov 3rd (the price seems to have started moving slightly earlier, on Nov 1st)

In [55]:
dfD, err = ek.get_data('2222.SE',['TR.PriceClose.date','TR.PriceClose'],{'SDate':'2020-01-01', 'EDate':'2020-12-31'})
#print(dfD)
fig = px.line(dfD, x='Date', y='Price Close', color='Instrument')
fig.show()

### Scatter Plot Negative Sentiment
(Facebook 2018)

Here the chart appears to contain a distinctive pattern

Let us note the avalanche of ratings in mid-March, and a smaller one at the end of July

In [67]:
dfFb18 = retrieveIntoDataFrame('SCORES', '4297297477','2018',0.75)

#print(dfFb18)
    
scatterPlotDf(dfFb18)

### Line Chart Number of ratings by Date
(Facebook 2018)

In [97]:
lineChartVolRatedByDate(dfFb18)

In [69]:
lineChartAveByDate(dfFb18)

### Retrieve Prices and Line Chart
(Facebook 2018)

The pattern is reflected in prices for the same period of time

In [71]:
df18, err = ek.get_data('FB.O',['TR.PriceClose.date','TR.PriceClose'],{'SDate':'2018-01-01', 'EDate':'2018-12-31'})
#print(df18)

We note the corresponding change in prices around mid-March and again at the end of July

In [72]:
import plotly.io as pio  #Lab
pio.renderers.default = 'iframe' # 'browser' # 'png'  #Lab
# pyof.init_notebook_mode(connected=True)  #Notebook
 
fig = px.line(df18, x='Date', y='Price Close', color='Instrument')
fig.show()