# Project 4: Predicting Volatility Index price with Sentiment Analysis on News headlines

## This is the final Script to run on daily basis for prediction of Vix index and sentiment review.

# Dataset 

We will retrieve a set of data from a api source called News Api.

News API is a simple HTTP REST API for searching and retrieving live articles from all over the web, in this case we have choosen to retrive top news headlnes.

**News headlines** consist of : 

- Top 10 BBC Headlines
- Top 10 Google Headlines
- Top 10 Tech Crunch Headlines
- Top 20 Trump Headlines
- Top 20 UK headlines
- Top 20 US headlines

Source  :  https://newsapi.org/docs/endpoints/top-headlines

# Model

We will run through the our NewsApi through the models below :

1. Chosen Classifier Model  (3 stacked LSTM)

2. TradingSentiment Tool    (Textblob)

In [0]:
#Libraries to install before hand 
#!pip install newsapi
#!pip install newsapi-python
#!pip install keras
#!pip install tensorflow
#!pip install textblob

In [0]:
# get some libraries that will be useful
import numpy as np # linear algebra
import pandas as pd
import string

# the Naive Bayes model
from sklearn.naive_bayes import MultinomialNB
# function to split the data for cross-validation
from sklearn.model_selection import train_test_split
# function for transforming documents into counts
from sklearn.feature_extraction.text import CountVectorizer

#keras modeling
from keras.utils import np_utils
from keras.models import Sequential
from keras.layers.recurrent import LSTM
from keras.preprocessing import sequence
from keras.preprocessing.text import Tokenizer
from keras.layers.embeddings import Embedding
from keras.layers import Dense, Dropout, Activation
from keras.layers.convolutional import Convolution1D, Conv1D, MaxPooling1D

#Sentiment modelling
from textblob import TextBlob

#to filter out selected dates from dataset
import datetime
import requests

In [0]:
#to import Libraries to import files from Drive into Google-colab
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

In [0]:
#authenticate email ID
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

In [0]:
#to get the file
downloaded = drive.CreateFile({'id':'1EXS1WWbIMRvI5Tz8OSnEfI-06CF8fLZf'})
#download the file
downloaded.GetContentFile('final_dataframe.csv')

In [28]:
#Read the file
Finaldf = pd.read_csv("final_dataframe.csv")
Finaldf.head()

Unnamed: 0,Date,all25,upordown
0,2008-08-08,"0,b""georgia 'downs two russian warplanes' as c...",0.0
1,2008-08-11,"1,b'why wont america and nato help us? if they...",0.0
2,2008-08-12,"0,b'remember that adorable 9-year-old who sang...",1.0
3,2008-08-13,"0,b' u.s. refuses israel weapons to attack ira...",0.0
4,2008-08-14,"1,b'all the experts admit that we should legal...",0.0


# Build our model first. 

In [0]:
#we will resue the same dataset from our model to train up the model.
X = Finaldf['all25']
y = Finaldf['upordown']

In [0]:
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.50,stratify = y)

In [0]:
#num_words - This will be the maximum number of words 
#from our resulting tokenized data vocabulary which are to be used, 
#truncated after the 10000 most common words in our case.
tokenizer = Tokenizer(num_words=10000)
# Tokenize our training data'trainheadlines'
tokenizer.fit_on_texts(X_train)
# Encode training data sentences into sequences for both train and test data.
sequences_train = tokenizer.texts_to_sequences(X_train)
sequences_test = tokenizer.texts_to_sequences(X_val)

In [0]:
#Features for model training
#nb_classes - total number of classes.
nb_classes = 2
# maxlen is feature of maximum sequence length for padding our encoded sentences
maxlen = 200
# Pad the training sequences as we need our encoded sequences to be of the same length. 
# use that to pad all other sequences with extra '0's at the end ('post') and
# will also truncate any sequences longer than maximum length from the end ('post') as well. 
X_train = sequence.pad_sequences(sequences_train, maxlen=maxlen)
X_val = sequence.pad_sequences(sequences_test, maxlen=maxlen)
#convert them into array before we put them into model
y_train = np.array(y_train)
y_val = np.array(y_val)
# np_utils.to_categorical to convert array of labeled data(from 0 to nb_classes-1) to one-hot vector.
Y_train = np_utils.to_categorical(y_train, 2)
Y_val = np_utils.to_categorical(y_val, 2)

In [33]:
print('Build LSTM model...')
# expected input data shape: (batch_size, timesteps, data_dim)
data_dim = 16
timesteps = 8
max_features = 10000
#intialize model
model = Sequential()
#Embedding with 128
model.add(Embedding(max_features, 128))
# returns 16 sequences of vectors of dimension 32
model.add(LSTM(32, return_sequences=True,input_shape=(timesteps, 16)))  
# returns a sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True)) 
# return a single vector of dimension 32
model.add(LSTM(32))  
model.add(Dense(nb_classes))
model.add(Activation('softmax'))
#Compile model
model.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

Build LSTM model...


In [34]:
# Final evaluation of the model
model.fit(X_train, Y_train, batch_size=64, epochs=5, validation_data=(X_val, Y_val))

  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "


Train on 994 samples, validate on 995 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.callbacks.History at 0x7f9d0ed2a160>

# Lets proceed to get our news headlines from NEWSAPI

In [0]:
# BBC Top 10 headlines 

In [0]:
url = ('http://newsapi.org/v2/top-headlines?sources=bbc-news&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')
headers ={'User-agent':'Derrick'}

In [0]:
res = requests.get(url,headers = headers)

In [38]:
res.status_code

200

In [0]:
bbcheadline=res.json()
sorted(bbcheadline.keys())
bbcheadline = bbcheadline['articles']

In [40]:
#convert to dataframe first
df =  pd.DataFrame(bbcheadline)
df

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Netflix sign-ups jump during coronavirus lockd...,The streaming service behind Tiger King added ...,http://www.bbc.co.uk/news/business-52376022,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T23:50:29Z,Image copyrightGetty Images\r\nNetflix has see...
1,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,"US green cards to be halted for 60 days, Trump...",The US president's immigration ban will not be...,http://www.bbc.co.uk/news/world-us-canada-5237...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T23:40:45Z,Image copyrightGetty ImagesImage caption\r\n P...
2,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Balancing being a doctor with single parenthood,"Dr Melanie Malloy, an intensive care doctor in...",http://www.bbc.co.uk/news/world-us-canada-5237...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T21:44:00Z,
3,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,The (back) pains of working from home,"As people adjust to doing their jobs remotely,...",http://www.bbc.co.uk/news/world-us-canada-5235...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T21:24:25Z,
4,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Canada shooting death toll rises to 23,Police said on Tuesday one of the newly confir...,http://www.bbc.co.uk/news/world-us-canada-5237...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T20:54:34Z,Image copyrightGetty ImagesImage caption\r\n M...
5,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,How 'protester in scrubs' images came about,"Photographer Alyson McClaran ""took off running...",http://www.bbc.co.uk/news/world-us-canada-5237...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T17:15:48Z,Image copyrightAlyson McClaranImage caption\r\...
6,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Deaths hit 20-year high - but peak may be over,Numbers dying nearly double above what would b...,http://www.bbc.co.uk/news/health-52361519,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T17:10:35Z,Image copyrightGetty Images\r\nDeaths in Engla...
7,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,World risks ‘biblical’ famines due to pandemic...,The number of people facing starvation could a...,http://www.bbc.co.uk/news/world-52373888,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T16:50:05Z,Image copyrightReutersImage caption\r\n Millio...
8,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,'Nobody told us about the coronavirus pandemic',"Before setting sail around the world, Elena an...",http://www.bbc.co.uk/news/uk-52332899,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T16:17:07Z,"Image copyrightFamily photo\r\nIn 2017, Elena ..."
9,"{'id': 'bbc-news', 'name': 'BBC News'}",BBC News,Carer's touching gift for D-Day veteran,"Ken Benbow, 94, was given a special cushion wi...",http://www.bbc.co.uk/news/uk-england-lancashir...,https://ichef.bbci.co.uk/news/1024/branded_new...,2020-04-21T12:08:03Z,


In [0]:
# Google Top 10 headlines

In [0]:
url2 = ('http://newsapi.org/v2/top-headlines?sources=google-news&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [0]:
res = requests.get(url2,headers = headers)

In [44]:
res.status_code

200

In [0]:
googleheadline=res.json()
sorted(googleheadline.keys())
googleheadline = googleheadline['articles']

In [46]:
#convert to dataframe first
df2 =  pd.DataFrame(googleheadline)
df2

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'google-news', 'name': 'Google News'}",Chicago Tribune staff,Coronavirus in Illinois updates: Here’s what’s...,Here are the latest updates on the coronavirus...,https://www.chicagotribune.com/coronavirus/ct-...,https://www.chicagotribune.com/resizer/1IN51pO...,2020-04-22T00:45:33+00:00,Here are the latest updates on the coronavirus...
1,"{'id': 'google-news', 'name': 'Google News'}",The Associated Press,"Trump immigration ban halts green cards, not t...",President Donald Trump announced Tuesday he wi...,https://www.cnbc.com/2020/04/21/trump-immigrat...,https://image.cnbcfm.com/api/v1/image/10649967...,2020-04-21T23:34:35+00:00,President Donald Trump announced Tuesday he wi...
2,"{'id': 'google-news', 'name': 'Google News'}",Vincent Barone,At least seven coronavirus cases linked to Wis...,The virus didn’t spare the voting booth. At le...,https://nypost.com/2020/04/21/at-least-7-coron...,https://thenypost.files.wordpress.com/2020/04/...,2020-04-21T23:30:40+00:00,The virus didnt spare the voting booth.\r\nAt ...
3,"{'id': 'google-news', 'name': 'Google News'}","Eric Bradner, CNN",Georgia Gov. Brian Kemp faces resistance over ...,Georgia Gov. Brian Kemp is running into resist...,https://www.cnn.com/2020/04/21/politics/georgi...,https://cdn.cnn.com/cnnnext/dam/assets/2004030...,2020-04-21T23:16:01+00:00,Georgia Gov. Brian Kemp is running into resist...
4,"{'id': 'google-news', 'name': 'Google News'}","Kevin Liptak, CNN",Trump says he doesn't know whether Kim Jong Un...,President Donald Trump said Tuesday he doesn't...,https://www.cnn.com/2020/04/21/politics/trump-...,https://cdn.cnn.com/cnnnext/dam/assets/1806121...,2020-04-21T23:04:13+00:00,(CNN)President Donald Trump said Tuesday he do...
5,"{'id': 'google-news', 'name': 'Google News'}",Christal Hayes,Senate approves measure to replenish halted co...,"The Paycheck Protection Program, which provide...",https://www.usatoday.com/story/news/politics/2...,https://www.gannett-cdn.com/presto/2018/12/22/...,2020-04-21T21:34:24+00:00,When asked about states' rights to decide when...
6,"{'id': 'google-news', 'name': 'Google News'}",Maggie Haberman,Dan Scavino Promoted as Meadows Shuffles White...,"Mr. Scavino, who is said to be the only White ...",https://www.nytimes.com/2020/04/21/us/politics...,https://static01.nyt.com/images/2020/04/21/us/...,2020-04-21T21:06:56+00:00,"Dan Scavino, the White House director of socia..."
7,"{'id': 'google-news', 'name': 'Google News'}","Rong-Gong Lin II, Melanie Mason",New data show coronavirus-related deaths spiki...,A study shows roughly 4% of L.A. County reside...,https://www.latimes.com/california/story/2020-...,https://ca-times.brightspotcdn.com/dims4/defau...,2020-04-21T17:23:06+00:00,A new report says that perhaps 4% of Los Angel...
8,"{'id': 'google-news', 'name': 'Google News'}",Julia Musto,Small business owner helped by Paycheck Protec...,The coronavirus funding from the government's ...,https://www.foxnews.com/media/small-business-o...,https://static.foxnews.com/foxnews.com/content...,2020-04-21T13:05:57+00:00,Get all the latest news on coronavirus and mor...
9,"{'id': 'google-news', 'name': 'Google News'}",James Walker,Majority of Americans Oppose Protests Against ...,The survey also found that the vast majority o...,https://www.newsweek.com/majority-americans-op...,https://d.newsweek.com/en/full/1583201/counter...,2020-04-21T10:17:54+00:00,Most Americans are opposed to protests against...


In [0]:
# Tech Crunch Top 10 Headlines

In [0]:
url4 = ('http://newsapi.org/v2/top-headlines?sources=techcrunch&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [49]:
res = requests.get(url4,headers = headers)
res.status_code

200

In [0]:
techcrunchheadlines=res.json()
sorted(techcrunchheadlines.keys())
techcrunchheadlines = techcrunchheadlines['articles']

In [51]:
df3 =  pd.DataFrame(techcrunchheadlines)
df3

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Manish Singh,Facebook invests $5.7B in India's Reliance Jio,Facebook has enjoyed unparalleled reach in Ind...,https://techcrunch.com/2020/04/21/facebook-rel...,https://techcrunch.com/wp-content/uploads/2019...,2020-04-22T00:44:30Z,Facebook has enjoyed unparalleled reach in Ind...
1,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Kirsten Korosec,Porsche to produce a cheaper version of its al...,Porsche is producing three variants of its all...,https://techcrunch.com/2020/04/21/porsche-to-p...,https://techcrunch.com/wp-content/uploads/2019...,2020-04-22T00:27:40Z,Porsche is producing three variants of its all...
2,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Taylor Hatmaker,Senate passes new $484 billion relief bill to ...,A new federal aid package designed to provide ...,https://techcrunch.com/2020/04/21/when-will-th...,https://techcrunch.com/wp-content/uploads/2020...,2020-04-22T00:03:34Z,A new federal aid package designed to provide ...
3,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Natasha Mascarenhas,Casper winds down European operations and lays...,Mattress company Casper is shutting down its E...,https://techcrunch.com/2020/04/21/casper-winds...,https://techcrunch.com/wp-content/uploads/2017...,2020-04-21T23:11:30Z,Mattress company Casper is shutting down its E...
4,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Manish Singh,Indian food delivery startup Swiggy is cutting...,"Swiggy is cutting about 1,000 jobs, most from ...",https://techcrunch.com/2020/04/21/indian-food-...,https://techcrunch.com/wp-content/uploads/2019...,2020-04-21T23:00:37Z,"Swiggy is cutting about 1,000 jobs, most from ..."
5,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Rocio Wu,Will China's coronavirus-related trends shape ...,There may be dark horses waiting to break out ...,https://techcrunch.com/2020/04/21/will-chinas-...,https://techcrunch.com/wp-content/uploads/2020...,2020-04-21T22:40:40Z,"For the past month, VC investment pace seems t..."
6,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Connie Loizos,"The Dipp, a subscription-only entertainment ne...",There is no shortage of coverage about the spr...,https://techcrunch.com/2020/04/21/the-dipp-a-s...,https://techcrunch.com/wp-content/uploads/2020...,2020-04-21T22:12:38Z,There is no shortage of coverage about the spr...
7,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Brian Heater,Amazon employees plan additional protests over...,As much of the world has ground to a temporary...,https://techcrunch.com/2020/04/21/amazon-emplo...,https://techcrunch.com/wp-content/uploads/2020...,2020-04-21T21:46:40Z,As much of the world has ground to a temporary...
8,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Anthony Ha,Netflix says 64M households 'chose to watch' t...,How big a hit Netflix’s “Tiger King”? In its l...,https://techcrunch.com/2020/04/21/netflix-tige...,https://techcrunch.com/wp-content/uploads/2020...,2020-04-21T21:29:21Z,How big a hit Netflix’s “Tiger King”? In its l...
9,"{'id': 'techcrunch', 'name': 'TechCrunch'}",Megan Rose Dickey,Patreon lays off 13% of workforce,Creative platform Patreon has laid off 13% of ...,https://techcrunch.com/2020/04/21/patreon-lays...,https://techcrunch.com/wp-content/uploads/2019...,2020-04-21T21:13:15Z,Creative platform Patreon has laid off 13% of ...


In [0]:
# Trump Top 20 headlines

In [0]:
url5 = ('http://newsapi.org/v2/top-headlines?q=trump&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [0]:
res = requests.get(url5,headers = headers)

In [55]:
res.status_code

200

In [0]:
trumpheadlines=res.json()
sorted(trumpheadlines.keys())
trumpheadlines = trumpheadlines['articles']

In [57]:
df4 =  pd.DataFrame(trumpheadlines)
df4

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'financial-times', 'name': 'Financial T...",,US experts warn against drugs touted by Donald...,Covid-19 patients alerted mix of antimalarial ...,https://www.ft.com/content/3994960a-9984-4edb-...,https://www.ft.com/__origami/service/image/v2/...,2020-04-22T01:52:20.1643436Z,The US National Institutes of Health has warne...
1,"{'id': 'reuters', 'name': 'Reuters'}",Reuters Editorial,Trump says he will discuss more coronavirus mo...,U.S. President Donald Trump on Tuesday said di...,http://feeds.reuters.com/~r/reuters/topNews/~3...,https://s3.reutersmedia.net/resources/r/?m=02&...,2020-04-22T01:39:00Z,WASHINGTON (Reuters) - U.S. President Donald T...
2,"{'id': 'reuters', 'name': 'Reuters'}",Reuters Editorial,Trump urges U.S. House to approve latest coron...,U.S. President Donald Trump on Tuesday welcome...,http://feeds.reuters.com/~r/reuters/topNews/~3...,https://s2.reutersmedia.net/resources/r/?m=02&...,2020-04-22T01:39:00Z,WASHINGTON (Reuters) - U.S. President Donald T...
3,"{'id': 'google-news-au', 'name': 'Google News ...",,Donald Trump will suspend US immigration for 6...,US President Donald Trump will issue a 60-day ...,https://www.abc.net.au/news/2020-04-22/donald-...,https://www.abc.net.au/news/image/12172004-16x...,2020-04-22T01:38:08+00:00,"Posted \r\nApril 22, 2020 10:15:51\r\nUS Presi..."
4,"{'id': 'google-news-uk', 'name': 'Google News ...",,Coronavirus latest news: Donald Trump outlines...,US President Donald Trump announced what he de...,https://www.telegraph.co.uk/global-health/scie...,https://img.youtube.com/vi/PvP45njbFbg/mqdefau...,2020-04-22T01:33:21+00:00,A second wave of the coronavirus is expected t...
5,"{'id': 'abc-news-au', 'name': 'ABC News (AU)'}",https://www.abc.net.au/news/emily-olson/10480370,"NY Governor Andrew Cuomo, America's anti-Trump...",It may be too soon to tell who'll get the cred...,http://www.abc.net.au/news/2020-04-22/donald-t...,https://www.abc.net.au/news/image/12172182-16x...,2020-04-22T01:17:45Z,"Posted \r\nApril 22, 2020 11:17:45\r\nNew York..."
6,"{'id': 'google-news-uk', 'name': 'Google News ...","Martin Pengelly, Maanvi Singh, Joan E Greve, D...",Coronavirus US live: Trump says he will suspen...,President says order will only apply to those ...,https://www.theguardian.com/world/live/2020/ap...,https://i.guim.co.uk/img/media/d1d2834bfe769fc...,2020-04-22T00:58:51+00:00,A small wrap here on the meeting this afternoo...
7,"{'id': 'abc-news-au', 'name': 'ABC News (AU)'}",,Coronavirus update: True extent of UK death to...,Donald Trump announces the US will suspend imm...,http://www.abc.net.au/news/2020-04-22/coronavi...,https://www.abc.net.au/news/image/12171620-16x...,2020-04-22T00:45:04Z,"Updated \r\nApril 22, 2020 10:45:04\r\nUS Pres..."
8,"{'id': 'usa-today', 'name': 'USA Today'}",,Coronavirus live updates: Senate reaches $484B...,On the day the Senate agreed to a $484 billion...,https://www.usatoday.com/story/news/health/202...,https://www.gannett-cdn.com/presto/2020/04/21/...,2020-04-22T00:38:50+00:00,The Senate approved a $484 billion stimulus pa...
9,"{'id': 'the-verge', 'name': 'The Verge'}",Nick Statt,Peter Thiel’s controversial Palantir built cor...,"Palantir, an analytics company co-founded by T...",https://www.theverge.com/2020/4/21/21230453/pa...,https://cdn.vox-cdn.com/thumbor/PkglppxT7QX75i...,2020-04-22T00:36:21Z,The firm is helping support a tool called HHS ...


In [0]:
# UK HEADLINE 20 headlines

In [0]:
url6 = ('http://newsapi.org/v2/top-headlines?country=gb&apiKey=4ac92a95346643fdbdb26a7e4d0e98b1')

In [60]:
res = requests.get(url6,headers = headers)
res.status_code

200

In [0]:
ukheadlines=res.json()
sorted(ukheadlines.keys())
ukheadlines = ukheadlines['articles']

In [62]:
df5 =  pd.DataFrame(trumpheadlines)
df5

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': 'financial-times', 'name': 'Financial T...",,US experts warn against drugs touted by Donald...,Covid-19 patients alerted mix of antimalarial ...,https://www.ft.com/content/3994960a-9984-4edb-...,https://www.ft.com/__origami/service/image/v2/...,2020-04-22T01:52:20.1643436Z,The US National Institutes of Health has warne...
1,"{'id': 'reuters', 'name': 'Reuters'}",Reuters Editorial,Trump says he will discuss more coronavirus mo...,U.S. President Donald Trump on Tuesday said di...,http://feeds.reuters.com/~r/reuters/topNews/~3...,https://s3.reutersmedia.net/resources/r/?m=02&...,2020-04-22T01:39:00Z,WASHINGTON (Reuters) - U.S. President Donald T...
2,"{'id': 'reuters', 'name': 'Reuters'}",Reuters Editorial,Trump urges U.S. House to approve latest coron...,U.S. President Donald Trump on Tuesday welcome...,http://feeds.reuters.com/~r/reuters/topNews/~3...,https://s2.reutersmedia.net/resources/r/?m=02&...,2020-04-22T01:39:00Z,WASHINGTON (Reuters) - U.S. President Donald T...
3,"{'id': 'google-news-au', 'name': 'Google News ...",,Donald Trump will suspend US immigration for 6...,US President Donald Trump will issue a 60-day ...,https://www.abc.net.au/news/2020-04-22/donald-...,https://www.abc.net.au/news/image/12172004-16x...,2020-04-22T01:38:08+00:00,"Posted \r\nApril 22, 2020 10:15:51\r\nUS Presi..."
4,"{'id': 'google-news-uk', 'name': 'Google News ...",,Coronavirus latest news: Donald Trump outlines...,US President Donald Trump announced what he de...,https://www.telegraph.co.uk/global-health/scie...,https://img.youtube.com/vi/PvP45njbFbg/mqdefau...,2020-04-22T01:33:21+00:00,A second wave of the coronavirus is expected t...
5,"{'id': 'abc-news-au', 'name': 'ABC News (AU)'}",https://www.abc.net.au/news/emily-olson/10480370,"NY Governor Andrew Cuomo, America's anti-Trump...",It may be too soon to tell who'll get the cred...,http://www.abc.net.au/news/2020-04-22/donald-t...,https://www.abc.net.au/news/image/12172182-16x...,2020-04-22T01:17:45Z,"Posted \r\nApril 22, 2020 11:17:45\r\nNew York..."
6,"{'id': 'google-news-uk', 'name': 'Google News ...","Martin Pengelly, Maanvi Singh, Joan E Greve, D...",Coronavirus US live: Trump says he will suspen...,President says order will only apply to those ...,https://www.theguardian.com/world/live/2020/ap...,https://i.guim.co.uk/img/media/d1d2834bfe769fc...,2020-04-22T00:58:51+00:00,A small wrap here on the meeting this afternoo...
7,"{'id': 'abc-news-au', 'name': 'ABC News (AU)'}",,Coronavirus update: True extent of UK death to...,Donald Trump announces the US will suspend imm...,http://www.abc.net.au/news/2020-04-22/coronavi...,https://www.abc.net.au/news/image/12171620-16x...,2020-04-22T00:45:04Z,"Updated \r\nApril 22, 2020 10:45:04\r\nUS Pres..."
8,"{'id': 'usa-today', 'name': 'USA Today'}",,Coronavirus live updates: Senate reaches $484B...,On the day the Senate agreed to a $484 billion...,https://www.usatoday.com/story/news/health/202...,https://www.gannett-cdn.com/presto/2020/04/21/...,2020-04-22T00:38:50+00:00,The Senate approved a $484 billion stimulus pa...
9,"{'id': 'the-verge', 'name': 'The Verge'}",Nick Statt,Peter Thiel’s controversial Palantir built cor...,"Palantir, an analytics company co-founded by T...",https://www.theverge.com/2020/4/21/21230453/pa...,https://cdn.vox-cdn.com/thumbor/PkglppxT7QX75i...,2020-04-22T00:36:21Z,The firm is helping support a tool called HHS ...


In [0]:
# Top 20 Headlines in US

In [0]:
url7 = ('http://newsapi.org/v2/top-headlines?country=us&apiKey=b92f31e6a03f4cf8a1fb120e90ef5451')

In [65]:
res = requests.get(url7,headers = headers)
res.status_code

200

In [0]:
usheadlines=res.json()
sorted(usheadlines.keys())
usheadlines = usheadlines['articles']

In [67]:
df6 =  pd.DataFrame(usheadlines)
df6

Unnamed: 0,source,author,title,description,url,urlToImage,publishedAt,content
0,"{'id': None, 'name': 'Nytimes.com'}",,Coronavirus Live Updates: Trump Pauses Issuing...,A 60-day pause in immigration will not apply t...,https://www.nytimes.com/2020/04/21/us/coronavi...,https://www.nytimes.com/newsgraphics/2020/04/0...,2020-04-22T00:46:08Z,Heres what you need to know:\r\nVideo\r\nBack\...
1,"{'id': None, 'name': 'Cnet.com'}",Katie Conner,"Face masks outside, in cars, in stores: Here's...",The wearing of face masks and face coverings i...,https://www.cnet.com/how-to/face-masks-outside...,https://cnet3.cbsistatic.com/img/1VG-b982tcCJe...,2020-04-22T00:22:43Z,Some states and counties now have mandates for...
2,"{'id': 'cnn', 'name': 'CNN'}","Stella Chan and Theresa Waldrop, CNN",Chipotle Mexican Grill to pay $25 million fine...,Chipotle Mexican Grill has agreed to pay a rec...,https://www.cnn.com/2020/04/21/us/chipotle-fin...,https://cdn.cnn.com/cnnnext/dam/assets/2004211...,2020-04-21T23:43:00Z,
3,"{'id': 'cnbc', 'name': 'CNBC'}",The Associated Press,"Trump immigration ban halts green cards, not t...",President Donald Trump announced Tuesday he wi...,https://www.cnbc.com/2020/04/21/trump-immigrat...,https://image.cnbcfm.com/api/v1/image/10649967...,2020-04-21T23:34:35Z,President Donald Trump announced Tuesday he wi...
4,"{'id': 'cnbc', 'name': 'CNBC'}",Alex Sherman,Disney+ has a big fan: Netflix CEO Reed Hastin...,Netflix CEO Reed Hastings praises Disney's str...,https://www.cnbc.com/2020/04/21/netflix-ceo-re...,https://image.cnbcfm.com/api/v1/image/10648111...,2020-04-21T23:32:32Z,Netflix CEO Reed Hastings is a big fan of Disn...
5,"{'id': None, 'name': 'Youtube.com'}",,U.S. officials seeking answers about health of...,The U.S. is closely monitoring reports concern...,https://www.youtube.com/watch?v=uMDRdch8NMU,https://i.ytimg.com/vi/uMDRdch8NMU/maxresdefau...,2020-04-21T23:29:32Z,
6,"{'id': 'politico', 'name': 'Politico'}",BILL MAHONEY,Cuomo commits to reopening New York state regi...,The governor provided the first taste of how t...,https://www.politico.com/states/new-york/alban...,https://static.politico.com/a3/70/bdf7718347a4...,2020-04-21T23:27:55Z,Gov. Cuomo provides a coronavirus update durin...
7,"{'id': 'fox-news', 'name': 'Fox News'}",Morgan Phillips,Trudeau calls for ban on 'assault-style weapon...,Canadian Prime Minister Justin Trudeau promise...,https://www.foxnews.com/world/trudeau-ban-on-a...,https://static.foxnews.com/foxnews.com/content...,2020-04-21T23:23:32Z,Canadian Prime Minister Justin Trudeau promise...
8,"{'id': None, 'name': 'Espn.com'}",Bill Barnwell,Rob Gronkowski trade grades - Did the Bucs jus...,Rob Gronkowski is headed to Tampa Bay to join ...,https://www.espn.com/nfl/story/_/id/29078662/r...,https://a2.espncdn.com/combiner/i?img=%2Fphoto...,2020-04-21T23:02:09Z,The Buccaneers added one future Hall of Famer ...
9,"{'id': None, 'name': 'Cheatsheet.com'}",Michelle Kapusta,Queen Elizabeth II's Birthday: Royal Pastry Ch...,Find out how you can make cupcakes just like Q...,https://www.cheatsheet.com/entertainment/queen...,https://www.cheatsheet.com/wp-content/uploads/...,2020-04-21T22:37:25Z,Queen Elizabeth II celebrated her birthday on ...


In [0]:
# we will combine all news headlines together
finaldf= pd.concat([df,df2,df3,df4,df5,df6],sort=True).reset_index(drop=True)

In [70]:
# create a new headlines to combine title and description together
finaldf['headlines'] = finaldf["title"] 
#Convert headlines into string first so that we can combine via the dates later.
finaldf['headlines'] = finaldf['headlines'].astype(str)
#to remove the columns that we do not need
finaldf = finaldf.drop(['author', 'source','url','description','title','urlToImage','content'], axis=1)
#change to date to datetime.
finaldf['publishedAt'] = finaldf['publishedAt'].str[:10]
finaldf['publishedAt'] = pd.to_datetime(finaldf['publishedAt'])
#group the headlines according to dates.
finaldf = (finaldf.groupby('publishedAt').agg({'headlines' : lambda x: ' '.join(x.unique())}))
finaldf.head()

Unnamed: 0_level_0,headlines
publishedAt,Unnamed: 1_level_1
2020-04-21,Netflix sign-ups jump during coronavirus lockd...
2020-04-22,Coronavirus in Illinois updates: Here’s what’s...


In [0]:
#redefine to new X for modelling purposes.
newX = finaldf['headlines']

# Run through our dataset on the first model 

In [0]:
#num_words - This will be the maximum number of words 
#from our resulting tokenized data vocabulary which are to be used, 
#truncated after the 10000 most common words in our case.
tokenizer = Tokenizer(num_words=10000)
# Tokenize our training data'trainheadlines'
tokenizer.fit_on_texts(newX)
# Encode training data sentences into sequences for both train and test data.
sequences_train = tokenizer.texts_to_sequences(newX)

In [0]:
#Features for model training
#nb_classes - total number of classes.
nb_classes = 2
# maxlen is feature of maximum sequence length for padding our encoded sentences
maxlen = 200

# Pad the training sequences as we need our encoded sequences to be of the same length. 
# use that to pad all other sequences with extra '0's at the end ('post') and
# will also truncate any sequences longer than maximum length from the end ('post') as well. 
newX = sequence.pad_sequences(sequences_train, maxlen=maxlen)

In [88]:
#predict y value
yhat = model.predict_classes(newX)
#predict values 
yhat
#time to convert into dataframe
predicty =  pd.DataFrame(yhat)
predicty

Unnamed: 0,0
0,0
1,1


In [0]:
# Run through our dataset on the 2nd model (sentiment analysis)

In [76]:
pol = lambda x : TextBlob(x).sentiment.polarity
sentiment = finaldf['headlines'].apply(pol)
sentiment

publishedAt
2020-04-21   -0.012601
2020-04-22    0.149675
Name: headlines, dtype: float64

In [92]:
#time to convert this into datafarme
predictions  = pd.DataFrame(sentiment)
predictions

Unnamed: 0_level_0,headlines
publishedAt,Unnamed: 1_level_1
2020-04-21,-0.012601
2020-04-22,0.149675


# Our machine predicted that for Vix  : 
- Vix to rise on 20 April with a negative sentiment of 0.20 
- Vix to fall on 21 April with a positive sentiment of 0.08


In [0]:
#we want to save our news headlines and our predictions as well

In [83]:
from google.colab import drive
drive.mount('/drive')

Mounted at /drive


In [0]:
#savings our news headlines to google drive folder
finaldf.to_csv('/drive/My Drive/Colab Notebooks/newsheadlines.csv')

In [109]:
#savings our predictions from model to google drive folder
merged_df = pd.concat([predictions, predicty])
merged_df

Unnamed: 0,headlines,0
2020-04-21 00:00:00,-0.012601,
2020-04-22 00:00:00,0.149675,
0,,0.0
1,,1.0


In [0]:
merged_df.to_csv('/drive/My Drive/Colab Notebooks/predictions.csv')