# Project 4: Predicting Volatility Index price with Sentiment Analysis on News headlines

## This is the final Script to run on daily basis for prediction of Vix index and sentiment review.

# Dataset 

We will retrieve a set of data from a api source called News Api.

News API is a simple HTTP REST API for searching and retrieving live articles from all over the web, in this case we have choosen to retrive top news headlnes.

**News headlines** consist of : 

- Top 10 BBC Headlines
- Top 10 Google Headlines
- Top 10 Tech Crunch Headlines
- Top 20 Trump Headlines
- Top 20 UK headlines
- Top 20 US headlines

Source  :  https://newsapi.org/docs/endpoints/top-headlines

# Model

We will run through the our NewsApi through the models below :

1. Chosen Classifier Model  (3 stacked LSTM)

2. TradingSentiment Tool    (Textblob)

In [0]:
#Remember to install these libraries before hand 
#!pip install newsapi
#!pip install newsapi-python
#!pip install keras
#!pip install tensorflow
#!pip install textblob

In [0]:
# get some libraries that will be useful
import numpy as np # linear algebra
import pandas as pd
import string

# the Naive Bayes model
from sklearn.naive_bayes import MultinomialNB
# function to split the data for cross-validation
from sklearn.model_selection import train_test_split
# function for transforming documents into counts
from sklearn.feature_extraction.text import CountVectorizer

#keras modeling
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.layers.embeddings import Embedding
from keras.layers import Dense, Dropout, Activation
from keras.layers.convolutional import Convolution1D, Conv1D, MaxPooling1D

#Sentiment modelling
from textblob import TextBlob

#to filter out selected dates from dataset
import datetime
import requests

Using TensorFlow backend.


In [0]:
#to import Libraries to import files from Drive into Google-colab
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [0]:
#authenticate email ID
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

To obtain the ID below to get the file, please follow the link attached step by step
LINK to import files from Google Drive to Google Colab : https://buomsoo-kim.github.io/colab/2018/04/16/Importing-files-from-Google-Drive-in-Google-Colab.md/



In [0]:
#to get the file
downloaded = drive.CreateFile({'id':'yourownid'}) #follow the link above to input your own ID
#download the file
downloaded.GetContentFile('final_dataframe.csv')

In [0]:
#Read the file
Finaldf = pd.read_csv("final_dataframe.csv")
Finaldf.head()

Unnamed: 0,Date,all25,upordown
0,2008-08-08,"0,b""georgia 'downs two russian warplanes' as c...",0.0
1,2008-08-11,"1,b'why wont america and nato help us? if they...",0.0
2,2008-08-12,"0,b'remember that adorable 9-year-old who sang...",1.0
3,2008-08-13,"0,b' u.s. refuses israel weapons to attack ira...",0.0
4,2008-08-14,"1,b'all the experts admit that we should legal...",0.0


# Build our model first. 

In [0]:
#we will resue the same dataset from our model to train up the model.
X = Finaldf['all25']
y = Finaldf['upordown']

In [0]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.50,stratify = y)

In [0]:
#num_words - This will be the maximum number of words 
#from our resulting tokenized data vocabulary which are to be used, 
#truncated after the 10000 most common words in our case.
tokenizer = Tokenizer(num_words=10000)
# Tokenize our training data'trainheadlines'
tokenizer.fit_on_texts(X_train)
# Encode training data sentences into sequences for both train and test data.
sequences_train = tokenizer.texts_to_sequences(X_train)
sequences_test = tokenizer.texts_to_sequences(X_val)

In [0]:
#Features for model training
#nb_classes - total number of classes.
nb_classes = 2
# maxlen is feature of maximum sequence length for padding our encoded sentences
maxlen = 200
# Pad the training sequences as we need our encoded sequences to be of the same length. 
# use that to pad all other sequences with extra '0's at the end ('post') and
# will also truncate any sequences longer than maximum length from the end ('post') as well. 
X_train = sequence.pad_sequences(sequences_train, maxlen=maxlen)
X_val = sequence.pad_sequences(sequences_test, maxlen=maxlen)
#convert them into array before we put them into model
y_train = np.array(y_train)
y_val = np.array(y_val)
# np_utils.to_categorical to convert array of labeled data(from 0 to nb_classes-1) to one-hot vector.
Y_train = np_utils.to_categorical(y_train, 2)
Y_val = np_utils.to_categorical(y_val, 2)

In [0]:
print('Build LSTM model...')
# expected input data shape: (batch_size, timesteps, data_dim)
data_dim = 16
timesteps = 8
max_features = 10000
#intialize model
model = Sequential()
#Embedding with 128
model.add(Embedding(max_features, 128))
# returns 16 sequences of vectors of dimension 32
model.add(LSTM(32, return_sequences=True,input_shape=(timesteps, 16)))  
# returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True)) 
# return a single vector of dimension 32
model.add(LSTM(32))  
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
#Compile model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Build LSTM model...


In [0]:
# Final evaluation of the model
model.fit(X_train, Y_train, batch_size=64, epochs=5, validation_data=(X_val, Y_val))

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 994 samples, validate on 995 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x7fd1a2d26668>

# Lets proceed to get our news headlines from NEWSAPI

In [0]:
# BBC Top 10 headlines 

In [0]:
url = ('http://newsapi.org/v2/top-headlines?sources=bbc-news&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')
headers ={'User-agent':'yourownheader'}

To test the connection of API

In [0]:
res = requests.get(url,headers = headers)

In [0]:
res.status_code

200

In [0]:
bbcheadline=res.json()
sorted(bbcheadline.keys())
bbcheadline = bbcheadline['articles']

In [0]:
#convert to dataframe first
df =  pd.DataFrame(bbcheadline)
df

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,India evacuates millions ahead of super cyclone,The storm is expected to make landfall on Wedn...,http://www.bbc.co.uk/news/world-asia-india-527...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-19T05:45:03Z,Image copyrightGetty ImagesImage caption\r\n T...
1,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Trump gives WHO ultimatum over virus handling,The US president accuses the UN agency of havi...,http://www.bbc.co.uk/news/world-us-canada-5271...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-19T04:56:47Z,Image copyrightGetty ImagesImage caption\r\n M...
2,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Myanmar drugs seizure 'off the scale',Police find the biggest ever haul of synthetic...,http://www.bbc.co.uk/news/world-asia-52712014,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-19T03:17:19Z,Image copyrightReutersImage caption\r\n The dr...
3,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Coronavirus updates: Trump slams WHO as 'puppe...,President Trump - who faces re-election this y...,http://www.bbc.co.uk/news/live/world-52717664,https://m.files.bbci.co.uk/modules/bbc-morph-n...,2020-05-19T02:07:25.9107814Z,US President Donald Trump has accused the Worl...
4,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,The secret in your sneakers,The sneaker industry now accounts for almost h...,http://www.bbc.co.uk/news/stories-52708487,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-18T23:39:59Z,
5,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,What's going wrong in Sweden's care homes?,Swedish healthcare is under scrutiny over the ...,http://www.bbc.co.uk/news/world-europe-52704836,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-18T23:13:57Z,Image copyrightGetty ImagesImage caption\r\n S...
6,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Pompeo denies 'retaliation' in State Departmen...,The US Secretary of State denied wanting an in...,http://www.bbc.co.uk/news/world-us-canada-5271...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-18T21:12:25Z,Image copyrightGetty ImagesImage caption\r\n S...
7,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Trump taking unproven drug to ward off coronav...,"Speaking at the White House, he told reporters...",http://www.bbc.co.uk/news/world-us-canada-5271...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-18T21:02:58Z,Image copyrightEPAImage caption\r\n Donald Tru...
8,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,FBI: US naval base attack 'motivated by Al-Qaeda',The Saudi Air Force officer who killed three A...,http://www.bbc.co.uk/news/world-us-canada-5271...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-18T16:41:04Z,Image copyrightEPAImage caption\r\n The attack...
9,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,First hints coronavirus vaccine trains immune ...,Larger trials now needed to see if the vaccine...,http://www.bbc.co.uk/news/health-52677203,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-18T15:05:10Z,Image copyrightGetty Images\r\nThe first hints...


In [0]:
# Google Top 10 headlines

In [0]:
url2 = ('http://newsapi.org/v2/top-headlines?sources=google-news&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [0]:
res = requests.get(url2,headers = headers)

In [0]:
res.status_code

200

In [0]:
googleheadline=res.json()
sorted(googleheadline.keys())
googleheadline = googleheadline['articles']

In [0]:
#convert to dataframe first
df2 =  pd.DataFrame(googleheadline)
df2

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'google-news', 'name': 'Google News'}","Analysis by Oliver Darcy, CNN Business",Fox News can't get its message straight on hyd...,Throughout the late-afternoon and into the nig...,https://www.cnn.com/2020/05/19/media/fox-news-...,https://cdn.cnn.com/cnnnext/dam/assets/2005190...,2020-05-19T05:12:52+00:00,Throughout the late-afternoon and into the nig...
1,"{'id': 'google-news', 'name': 'Google News'}",https://www.facebook.com/bbcnews,Trump gives WHO ultimatum over virus handling,The US president accuses the UN agency of havi...,https://www.bbc.com/news/world-us-canada-52718309,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-05-19T04:58:07+00:00,Image copyrightGetty ImagesImage caption\r\n M...
2,"{'id': 'google-news', 'name': 'Google News'}",Ed O'Keefe,"Trump tells governors on reopening: ""We will s...",The president made the remarks in a teleconfer...,https://www.cbsnews.com/news/trump-tells-gover...,https://cbsnews3.cbsistatic.com/hub/i/r/2020/0...,2020-05-19T04:27:24+00:00,President Trump told the nation's governors Mo...
3,"{'id': 'google-news', 'name': 'Google News'}","Hannah Knowles, Marisa Iati",An officer allegedly showed off explicit photo...,"The officer, Miguel Deras, allegedly bragged t...",https://www.washingtonpost.com/education/2020/...,https://www.washingtonpost.com/wp-apps/imrs.ph...,2020-05-19T04:08:42+00:00,"The officer, Miguel Deras, bragged about being..."
4,"{'id': 'google-news', 'name': 'Google News'}","Fiona Kelliher, Maggie Angst","Most counties may reopen in-store retail, hair...",Counties may have no greater than 5% increase ...,https://www.mercurynews.com/most-counties-may-...,https://www.mercurynews.com/wp-content/uploads...,2020-05-19T01:37:24+00:00,CLICK HERE if you’re having a problem viewing ...
5,"{'id': 'google-news', 'name': 'Google News'}",Victor Garcia,Nunes claims 'dozens' of unmaskings of Trump a...,House Intelligence Committee ranking member De...,https://www.foxnews.com/media/devin-nunes-unma...,https://static.foxnews.com/foxnews.com/content...,2020-05-19T00:16:19+00:00,House Intelligence Committee ranking member De...
6,"{'id': 'google-news', 'name': 'Google News'}",Aris Folley,Beachgoers flock to Virginia Beach oceanfront ...,Beachgoers crowded the Virginia Beach oceanfro...,https://thehill.com/homenews/state-watch/49839...,https://thehill.com/sites/default/files/northa...,2020-05-18T21:39:30+00:00,Beachgoers crowded the Virginia Beach oceanfro...
7,"{'id': 'google-news', 'name': 'Google News'}",Carol Morello,Pompeo says he didn’t know fired inspector gen...,The secretary of state confirmed that he asked...,https://www.washingtonpost.com/national-securi...,https://www.washingtonpost.com/wp-apps/imrs.ph...,2020-05-18T21:09:48+00:00,I went to the president and made clear to him ...
8,"{'id': 'google-news', 'name': 'Google News'}","Alaa Elassar, CNN",A man who wore a watermelon on his head while ...,"A pair of melon heads -- yes, actual people wi...",https://www.cnn.com/2020/05/18/us/watermelon-h...,https://cdn.cnn.com/cnnnext/dam/assets/2005181...,2020-05-18T20:55:00+00:00,"(CNN)A pair of melon heads -- yes, actual peop..."
9,"{'id': 'google-news', 'name': 'Google News'}",Jason Hall,Oil Stocks Are Surging on This News From China,Chinese oil demand is said to be closing in on...,https://www.fool.com/investing/2020/05/18/oil-...,https://g.foolcdn.com/editorial/images/574922/...,2020-05-18T19:13:00+00:00,What happened\r\nShares of independent oil pro...


In [0]:
# Tech Crunch Top 10 Headlines

In [0]:
url4 = ('http://newsapi.org/v2/top-headlines?sources=techcrunch&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [0]:
res = requests.get(url4,headers = headers)
res.status_code

200

In [0]:
techcrunchheadlines=res.json()
sorted(techcrunchheadlines.keys())
techcrunchheadlines = techcrunchheadlines['articles']

In [0]:
df3 =  pd.DataFrame(techcrunchheadlines)
df3

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Catherine Shu,SoftBank reportedly plans to sell about $20 bi...,SoftBank Group Corp. is currently seeking buye...,https://techcrunch.com/2020/05/18/softbank-rep...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-19T05:57:31Z,SoftBank Group Corp. is currently seeking buye...
1,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Rita Liao,"With 170M users, Bilibili is the nearest thing...","Bilibili, a Chinese video streaming website th...",https://techcrunch.com/2020/05/18/with-170m-us...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-19T04:22:17Z,"Bilibili, a Chinese video streaming website th..."
2,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Brian Heater,"36M Americans have filed for unemployment, but...","Last week, U.S. unemployment claims hit 36 mil...",https://techcrunch.com/2020/05/18/36m-american...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-18T22:58:43Z,"Last week, U.S. unemployment claims hit 36 mil..."
3,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Eric Peckham,"With stadiums closed, TV networks turn to live...","Two years from now, people will look back at 2...",https://techcrunch.com/2020/05/18/with-stadium...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-18T21:13:15Z,The COVID-19 pandemic has wiped out the spring...
4,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Anthony Ha,Disney streaming exec Kevin Mayer becomes TikT...,"Kevin Mayer, head of The Walt Disney Company’s...",https://techcrunch.com/2020/05/18/disney-kevin...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-18T21:05:39Z,"Kevin Mayer, head of The Walt Disney Company’s..."
5,"{'id': 'techcrunch', 'name': 'TechCrunch'}","Danny Crichton, Arman Tabatabai",Arm’s financials and the blurring future of th...,Amidst the blitz of SoftBank earnings news tod...,https://techcrunch.com/2020/05/18/arms-financi...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-18T20:51:10Z,Amidst the blitz of SoftBank earnings news tod...
6,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Brian Heater,Google is piloting a simpler Nest Hub Max inte...,"Last week, Mount Sinai showcased how it’s star...",https://techcrunch.com/2020/05/18/google-is-pi...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-18T19:54:19Z,"Last week, Mount Sinai showcased how its start..."
7,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Ron Miller,Verizon wraps up BlueJeans acquisition lickety...,When Verizon (which owns this publication) ann...,https://techcrunch.com/2020/05/18/verizon-wrap...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-18T19:50:28Z,When Verizon (which owns this publication) ann...
8,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Alex Wilhelm,What SoftBank's Vision Fund results tell us ab...,A famous investor published notes today concer...,https://techcrunch.com/2020/05/18/what-softban...,https://techcrunch.com/wp-content/themes/techc...,2020-05-18T18:26:13Z,A famous investor published notes today concer...
9,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Danny Crichton,"As Jack Ma and SoftBank part ways, the open an...",It would be one of the greatest startup invest...,https://techcrunch.com/2020/05/18/as-jack-ma-a...,https://techcrunch.com/wp-content/uploads/2020...,2020-05-18T16:39:30Z,It would be one of the greatest startup invest...


In [0]:
# Trump Top 20 headlines

In [0]:
url5 = ('http://newsapi.org/v2/top-headlines?q=trump&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [0]:
res = requests.get(url5,headers = headers)

In [0]:
res.status_code

200

In [0]:
trumpheadlines=res.json()
sorted(trumpheadlines.keys())
trumpheadlines = trumpheadlines['articles']

In [0]:
df4 =  pd.DataFrame(trumpheadlines)
df4

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'the-irish-times', 'name': 'The Irish T...",The Irish Times,WHO to investigate own response to coronavirus...,Donald Trump claims organisation did a ‘very s...,https://www.irishtimes.com/\t\t\t\t\t\t\t/news...,https://www.irishtimes.com/image-creator/?id=1...,2020-05-19T06:55:56Z,The World Health Organisation (WHO) has bowed ...
1,"{'id': 'google-news-ca', 'name': 'Google News ...",Kate Mayberry,Trump issues WHO ultimatum over coronavirus: L...,Trump gives WHO director-general 30 days to ma...,https://www.aljazeera.com/news/2020/05/trump-a...,https://www.aljazeera.com/mritems/Images/2020/...,2020-05-19T06:47:26+00:00,<ul><li>US President Donald Trump has threaten...
2,"{'id': 'usa-today', 'name': 'USA Today'}",,Trump threatens to permanently cut WHO funding...,"Trump's threat, made in a letter to WHO Direct...",https://www.usatoday.com/story/news/world/2020...,https://www.gannett-cdn.com/presto/2020/05/19/...,2020-05-19T06:45:22+00:00,President Donald Trump said Monday he is takin...
3,"{'id': 'financial-times', 'name': 'Financial T...",,Trump says he is taking hydroxychloroquine to ...,US president dismisses concerns about antimala...,https://www.ft.com/content/971ae825-b6d3-4ff5-...,https://www.ft.com/__origami/service/image/v2/...,2020-05-19T06:37:19.7060959Z,Donald Trump has been taking hydroxychloroquin...
4,"{'id': 'australian-financial-review', 'name': ...",Phillip Coorey,Is Trump behind China's trade war with Australia?,While Australia's relationship with China goes...,http://www.afr.com/politics/federal/is-trump-b...,https://static.ffx.io/images/$zoom_0.3448%2C$m...,2020-05-19T05:56:33Z,"Last week, after Beijing had issued its draft ..."
5,"{'id': 'google-news-au', 'name': 'Google News ...",Eryk Bagshaw,Trump threatens to cut off WHO funding permane...,In a fiery letter to Director General Tedros A...,https://www.smh.com.au/world/north-america/tru...,https://static.ffx.io/images/$zoom_0.1765%2C$m...,2020-05-19T05:47:22+00:00,Washington: President Donald Trump escalated h...
6,"{'id': 'google-news-in', 'name': 'Google News ...",AP,Trump says he's taking malaria drug in case he...,US News: President Donald Trump said Monday th...,https://timesofindia.indiatimes.com/world/us/t...,"https://static.toiimg.com/thumb/msid-75816005,...",2020-05-19T05:31:17+00:00,US News: President Donald Trump said Monday th...
7,"{'id': 'news24', 'name': 'News24'}",,Trump threatens permanent freeze on WHO fundin...,US President Donald Trump has threatened to pe...,https://www.news24.com/World/News/trump-threat...,http://cdn.24.co.za/files/Cms/General/d/10164/...,2020-05-19T05:25:47+00:00,US President Donald Trump threatened to perman...
8,"{'id': 'el-mundo', 'name': 'El Mundo'}",,"Coronavirus España hoy, noticias de última hor...",Arranca la semana en que Pedro Sánchez tratará...,https://www.elmundo.es/ciencia-y-salud/ciencia...,https://i.ytimg.com/vi/_t9DvH1YEMk/maxresdefau...,2020-05-19T05:25:13Z,Arranca la semana en que Pedro Sánchez tratará...
9,"{'id': 'google-news-ca', 'name': 'Google News ...",Sean Boynton,Trump tells WHO he’ll make funding freeze perm...,"In a letter to the WHO's leader, Trump also sa...",http://globalnews.ca/news/6957773/trump-who-fu...,https://shawglobalnews.files.wordpress.com/202...,2020-05-19T05:24:08+00:00,U.S. President Donald Trump told the World Hea...


In [0]:
# UK HEADLINE 20 headlines

In [0]:
url6 = ('http://newsapi.org/v2/top-headlines?country=gb&apiKey=4ac92a95346643fdbdb26a7e4d0e98b1')

In [0]:
res = requests.get(url6,headers = headers)
res.status_code

200

In [0]:
ukheadlines=res.json()
sorted(ukheadlines.keys())
ukheadlines = ukheadlines['articles']

In [0]:
df5 =  pd.DataFrame(trumpheadlines)
df5

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'the-irish-times', 'name': 'The Irish T...",The Irish Times,WHO to investigate own response to coronavirus...,Donald Trump claims organisation did a ‘very s...,https://www.irishtimes.com/\t\t\t\t\t\t\t/news...,https://www.irishtimes.com/image-creator/?id=1...,2020-05-19T06:55:56Z,The World Health Organisation (WHO) has bowed ...
1,"{'id': 'google-news-ca', 'name': 'Google News ...",Kate Mayberry,Trump issues WHO ultimatum over coronavirus: L...,Trump gives WHO director-general 30 days to ma...,https://www.aljazeera.com/news/2020/05/trump-a...,https://www.aljazeera.com/mritems/Images/2020/...,2020-05-19T06:47:26+00:00,<ul><li>US President Donald Trump has threaten...
2,"{'id': 'usa-today', 'name': 'USA Today'}",,Trump threatens to permanently cut WHO funding...,"Trump's threat, made in a letter to WHO Direct...",https://www.usatoday.com/story/news/world/2020...,https://www.gannett-cdn.com/presto/2020/05/19/...,2020-05-19T06:45:22+00:00,President Donald Trump said Monday he is takin...
3,"{'id': 'financial-times', 'name': 'Financial T...",,Trump says he is taking hydroxychloroquine to ...,US president dismisses concerns about antimala...,https://www.ft.com/content/971ae825-b6d3-4ff5-...,https://www.ft.com/__origami/service/image/v2/...,2020-05-19T06:37:19.7060959Z,Donald Trump has been taking hydroxychloroquin...
4,"{'id': 'australian-financial-review', 'name': ...",Phillip Coorey,Is Trump behind China's trade war with Australia?,While Australia's relationship with China goes...,http://www.afr.com/politics/federal/is-trump-b...,https://static.ffx.io/images/$zoom_0.3448%2C$m...,2020-05-19T05:56:33Z,"Last week, after Beijing had issued its draft ..."
5,"{'id': 'google-news-au', 'name': 'Google News ...",Eryk Bagshaw,Trump threatens to cut off WHO funding permane...,In a fiery letter to Director General Tedros A...,https://www.smh.com.au/world/north-america/tru...,https://static.ffx.io/images/$zoom_0.1765%2C$m...,2020-05-19T05:47:22+00:00,Washington: President Donald Trump escalated h...
6,"{'id': 'google-news-in', 'name': 'Google News ...",AP,Trump says he's taking malaria drug in case he...,US News: President Donald Trump said Monday th...,https://timesofindia.indiatimes.com/world/us/t...,"https://static.toiimg.com/thumb/msid-75816005,...",2020-05-19T05:31:17+00:00,US News: President Donald Trump said Monday th...
7,"{'id': 'news24', 'name': 'News24'}",,Trump threatens permanent freeze on WHO fundin...,US President Donald Trump has threatened to pe...,https://www.news24.com/World/News/trump-threat...,http://cdn.24.co.za/files/Cms/General/d/10164/...,2020-05-19T05:25:47+00:00,US President Donald Trump threatened to perman...
8,"{'id': 'el-mundo', 'name': 'El Mundo'}",,"Coronavirus España hoy, noticias de última hor...",Arranca la semana en que Pedro Sánchez tratará...,https://www.elmundo.es/ciencia-y-salud/ciencia...,https://i.ytimg.com/vi/_t9DvH1YEMk/maxresdefau...,2020-05-19T05:25:13Z,Arranca la semana en que Pedro Sánchez tratará...
9,"{'id': 'google-news-ca', 'name': 'Google News ...",Sean Boynton,Trump tells WHO he’ll make funding freeze perm...,"In a letter to the WHO's leader, Trump also sa...",http://globalnews.ca/news/6957773/trump-who-fu...,https://shawglobalnews.files.wordpress.com/202...,2020-05-19T05:24:08+00:00,U.S. President Donald Trump told the World Hea...


In [0]:
# Top 20 Headlines in US

In [0]:
url7 = ('http://newsapi.org/v2/top-headlines?country=us&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [0]:
res = requests.get(url7,headers = headers)
res.status_code

200

In [0]:
usheadlines=res.json()
sorted(usheadlines.keys())
usheadlines = usheadlines['articles']

In [0]:
df6 =  pd.DataFrame(usheadlines)
df6

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': None, 'name': 'Bbc.com'}",https://www.facebook.com/bbcnews,Coronavirus: Trump says he is taking unproven ...,The president claims hydroxychloroquine is har...,https://www.bbc.com/news/world-us-canada-52717161,https://ichef.bbci.co.uk/images/ic/1024x576/p0...,2020-05-19T05:55:13Z,Media playback is unsupported on your device\r...
1,"{'id': None, 'name': 'Nytimes.com'}",Catherine Porter,Toronto Fox Family Transfixes City Under Lockd...,Canada’s largest city was politely abiding by ...,https://www.nytimes.com/2020/05/18/world/canad...,https://static01.nyt.com/images/2020/05/17/wor...,2020-05-19T04:51:31Z,TORONTO A crowd of people bunched shoulder to ...
2,"{'id': 'cnn', 'name': 'CNN'}","Ben Westcott, Vedika Sud and Manveena Suri, CNN",India and Bangladesh brace for the strongest s...,Millions of people in India and Bangladesh are...,https://www.cnn.com/2020/05/19/asia/super-cycl...,https://cdn.cnn.com/cnnnext/dam/assets/2005180...,2020-05-19T04:34:02Z,(CNN)Millions of people in India and Banglades...
3,"{'id': 'cbs-news', 'name': 'CBS News'}",Ed O'Keefe,"Trump tells governors on reopening: ""We will s...",The president made the remarks in a teleconfer...,https://www.cbsnews.com/news/trump-tells-gover...,https://cbsnews3.cbsistatic.com/hub/i/r/2020/0...,2020-05-19T04:27:24Z,President Trump told the nation's governors Mo...
4,"{'id': None, 'name': 'Pitchfork.com'}",Madison Bloom,Billboard Responds to Tekashi 6ix9ine’s Corrup...,The rapper recently accused Billboard of manip...,https://pitchfork.com/news/billboard-responds-...,https://media.pitchfork.com/photos/5ec34effe54...,2020-05-19T03:52:02Z,"On May 8, Tekashi 6ix9ine released GOOBA, his ..."
5,"{'id': None, 'name': 'Marketwatch.com'}",Mark DeCambre,Dow futures retreat as stock-market investors ...,,https://www.marketwatch.com/story/dow-futures-...,https://s.marketwatch.com/public/resources/ima...,2020-05-19T03:35:15Z,U.S. stock-index futures indicated a lackluste...
6,"{'id': 'the-verge', 'name': 'The Verge'}",Sam Byford,Samsung announces 50-megapixel camera sensor w...,Samsung has announced a new 50-megapixel camer...,https://www.theverge.com/2020/5/18/21263245/sa...,https://cdn.vox-cdn.com/thumbor/Y6qofE2NAKKsJ-...,2020-05-19T03:28:55Z,The Galaxy S20 Ultra was plagued with AF issue...
7,"{'id': 'cnn', 'name': 'CNN'}","Gregory Lemos, CNN",A Florida man has been stuck on a ship for 62 ...,"Taylor Grimes, from Winter Springs, Florida, t...",https://www.cnn.com/2020/05/18/us/florida-man-...,https://cdn.cnn.com/cnnnext/dam/assets/2005182...,2020-05-19T03:25:00Z,"(CNN)A man from Winter Springs, Florida, told ..."
8,"{'id': 'cnn', 'name': 'CNN'}","Sandra Gonzalez, CNN",Brian Austin Green opens up about split with w...,Brian Austin Green has opened up about parting...,https://www.cnn.com/2020/05/18/entertainment/b...,https://cdn.cnn.com/cnnnext/dam/assets/1508200...,2020-05-19T03:24:59Z,(CNN)Brian Austin Green has opened up about pa...
9,"{'id': 'cnbc', 'name': 'CNBC'}",Reuters,"Nasdaq to tighten listing rules, restricting C...",Nasdaq's new curbs on Chinese IPOs represent t...,https://www.cnbc.com/2020/05/19/nasdaq-to-tigh...,https://image.cnbcfm.com/api/v1/image/10653707...,2020-05-19T02:21:54Z,A view outside Nasdaq in Times Square during t...


In [0]:
# we will combine all news headlines together
finaldf= pd.concat([df,df2,df3,df4,df5,df6],sort=True).reset_index(drop=True)
finaldf.head()

Unnamed: 0,author,content,description,publishedAt,source,title,url,urlToImage
0,BBC News,Image copyrightGetty ImagesImage caption\r\n T...,The storm is expected to make landfall on Wedn...,2020-05-19T05:45:03Z,"{'id': 'bbc-news', 'name': 'BBC News'}",India evacuates millions ahead of super cyclone,http://www.bbc.co.uk/news/world-asia-india-527...,https://ichef.bbci.co.uk/news/1024/branded_new...
1,BBC News,Image copyrightGetty ImagesImage caption\r\n M...,The US president accuses the UN agency of havi...,2020-05-19T04:56:47Z,"{'id': 'bbc-news', 'name': 'BBC News'}",Trump gives WHO ultimatum over virus handling,http://www.bbc.co.uk/news/world-us-canada-5271...,https://ichef.bbci.co.uk/news/1024/branded_new...
2,BBC News,Image copyrightReutersImage caption\r\n The dr...,Police find the biggest ever haul of synthetic...,2020-05-19T03:17:19Z,"{'id': 'bbc-news', 'name': 'BBC News'}",Myanmar drugs seizure 'off the scale',http://www.bbc.co.uk/news/world-asia-52712014,https://ichef.bbci.co.uk/news/1024/branded_new...
3,BBC News,US President Donald Trump has accused the Worl...,President Trump - who faces re-election this y...,2020-05-19T02:07:25.9107814Z,"{'id': 'bbc-news', 'name': 'BBC News'}",Coronavirus updates: Trump slams WHO as 'puppe...,http://www.bbc.co.uk/news/live/world-52717664,https://m.files.bbci.co.uk/modules/bbc-morph-n...
4,BBC News,,The sneaker industry now accounts for almost h...,2020-05-18T23:39:59Z,"{'id': 'bbc-news', 'name': 'BBC News'}",The secret in your sneakers,http://www.bbc.co.uk/news/stories-52708487,https://ichef.bbci.co.uk/news/1024/branded_new...


Preprocessing of the dataframe for all combined news headlines.

In [0]:
# create a new column named "headlines" to combine title and description together
finaldf['headlines'] = finaldf["title"] 
#Convert headlines into string first so that we can combine via the dates later.
finaldf['headlines'] = finaldf['headlines'].astype(str)
#to remove the columns that we do not need
finaldf = finaldf.drop(['author', 'source','url','description','title','urlToImage','content'], axis=1)
#change to date to datetime.
finaldf['publishedAt'] = finaldf['publishedAt'].str[:10]
finaldf['publishedAt'] = pd.to_datetime(finaldf['publishedAt'])
#group the headlines according to dates.
finaldf = (finaldf.groupby('publishedAt').agg({'headlines' : lambda x: ' '.join(x.unique())}))
finaldf.head()

Unnamed: 0_level_0,headlines
publishedAt,Unnamed: 1_level_1
2020-05-18,The secret in your sneakers What's going wrong...
2020-05-19,India evacuates millions ahead of super cyclon...


In [0]:
#redefine our dataframe to name it new X for modelling purposes.
newX = finaldf['headlines']

# Run through our dataset on the first model 

In [0]:
#num_words - This will be the maximum number of words 
#from our resulting tokenized data vocabulary which are to be used, 
#truncated after the 10000 most common words in our case.
tokenizer = Tokenizer(num_words=10000)
# Tokenize our training data'trainheadlines'
tokenizer.fit_on_texts(newX)
# Encode training data sentences into sequences for both train and test data.
sequences_train = tokenizer.texts_to_sequences(newX)

In [0]:
#Features for model training
#nb_classes - total number of classes.
nb_classes = 2
# maxlen is feature of maximum sequence length for padding our encoded sentences
maxlen = 200

# Pad the training sequences as we need our encoded sequences to be of the same length. 
# use that to pad all other sequences with extra '0's at the end ('post') and
# will also truncate any sequences longer than maximum length from the end ('post') as well. 
newX = sequence.pad_sequences(sequences_train, maxlen=maxlen)

In [0]:
#predict y value which is the fear index for 18 May and 19 May
yhat = model.predict_classes(newX)
#time to convert into dataframe
predicty =  pd.DataFrame(yhat)
predicty

Unnamed: 0,0
0,1
1,0


We have trained our model to predict the fear sentiment in the market. 

In [0]:
# Run through our dataset on the 2nd model (sentiment analysis)

In [0]:
pol = lambda x : TextBlob(x).sentiment.polarity
sentiment = finaldf['headlines'].apply(pol)
sentiment

publishedAt
2020-05-18   -0.093939
2020-05-19    0.028348
Name: headlines, dtype: float64

In [0]:
#time to convert this into datafarme
predictions  = pd.DataFrame(sentiment)
predictions

Unnamed: 0_level_0,headlines
publishedAt,Unnamed: 1_level_1
2020-05-18,-0.093939
2020-05-19,0.028348


In [0]:
#we want to save our news headlines and our predictions as well

In [0]:
from google.colab import drive
drive.mount('/drive')

Drive already mounted at /drive; to attempt to forcibly remount, call drive.mount("/drive", force_remount=True).


In [0]:
#savings our news headlines to google drive folder
finaldf.to_csv('/drive/My Drive/Colab Notebooks/newsheadlines.csv')
#this folder is to append for past recording
finaldf.to_csv('/drive/My Drive/Colab Notebooks/pastnewsheadlines.csv')

In [0]:
#savings our predictions from model to google drive folder
merged_df = pd.concat([predictions, predicty])
merged_df

Unnamed: 0,headlines,0
2020-05-18 00:00:00,-0.093939,
2020-05-19 00:00:00,0.028348,
0,,1.0
1,,0.0


In [0]:
merged_df.to_csv('/drive/My Drive/Colab Notebooks/predictions.csv')
#this file is to append predictions for recording
merged_df.to_csv('/drive/My Drive/Colab Notebooks/pastpredictions.csv')
