### Phase 1 of the Capstone
##### Let's collect data
##### We'll be using NewsAPI to gather current news data for the past 30 days, since the free plan limits how many days back we can search and I do not want to pay for the upgrade...

Let's start with our libraries

In [64]:
# Data cleaning and manipulation
import pandas as pd
import numpy as np
# NewsAPI client: how we'll get the news data whose sentiment we want to discover
from newsapi import NewsApiClient
# Our NLP toolkit, for any text-cleaning tasks
import nltk
# First method of scoring sentiment (lexicon-based, since we have no labels and this is unsupervised)
from textblob import TextBlob
# Second method of scoring sentiment
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
# Stock data, to look for correlation between price movement and the sentiment we generate
from iex import Stock
# Nice interactive graphs (the standalone plotly_express package now ships inside plotly as plotly.express)
import plotly.express as px
# Nicer statistical graphs than plain matplotlib
import seaborn as sns

##### Starting small: let's see which companies we have to cover

In [2]:
stocks_df = pd.read_csv('data/constituents.csv')
stocks_df.head()

|   | Symbol | Name | Sector |
|---|---|---|---|
| 0 | MMM | 3M Company | Industrials |
| 1 | AOS | A.O. Smith Corp | Industrials |
| 2 | ABT | Abbott Laboratories | Health Care |
| 3 | ABBV | AbbVie Inc. | Health Care |
| 4 | ACN | Accenture plc | Information Technology |


This is the list of companies I will be gathering information on: all securities covered by the S&P 500.

In [5]:
px.bar(stocks_df, x = 'Sector')

### Let's get our news!

In [8]:
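# API key so we can use NewsAPI. Read it from an environment variable instead of
# hard-coding it (NEWSAPI_KEY is an assumed name; export your own key under it)
import os
newsapi = NewsApiClient(api_key=os.environ['NEWSAPI_KEY'])

# Let's check our news sources
newsapi.get_sources(country='us')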


{'status': 'ok',
 'sources': [{'id': 'abc-news',
   'name': 'ABC News',
   'description': 'Your trusted source for breaking news, analysis, exclusive interviews, headlines, and videos at ABCNews.com.',
   'url': 'https://abcnews.go.com',
   'category': 'general',
   'language': 'en',
   'country': 'us'},
  {'id': 'al-jazeera-english',
   'name': 'Al Jazeera English',
   'description': 'News, analysis from the Middle East and worldwide, multimedia and interactives, opinions, documentaries, podcasts, long reads and broadcast schedule.',
   'url': 'http://www.aljazeera.com',
   'category': 'general',
   'language': 'en',
   'country': 'us'},
  {'id': 'ars-technica',
   'name': 'Ars Technica',
   'description': "The PC enthusiast's resource. Power users and the tools they love, without computing religion.",
   'url': 'http://arstechnica.com',
   'category': 'technology',
   'language': 'en',
   'country': 'us'},
  {'id': 'associated-press',
   'name': 'Associated Press',
   'description': 

#### We want to limit our number of sources so we do not get too much fluff, and stick mainly with business journalism

In [10]:
S_P_500_news = newsapi.get_everything(q='S&P 500',
                                     sources = 'business-insider,cnbc,fortune,fox-news,nbc-news,bloomberg,the-wall-street-journal,the-washington-post,the-new-york-times,reuters')
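The free plan only lets us search a limited number of days back, so it can help to compute that window explicitly. A minimal sketch, assuming a 30-day limit; `news_window` is a hypothetical helper, not part of `newsapi-python`:

```python
from datetime import date, timedelta
from typing import Tuple


def news_window(end: date, days: int = 30) -> Tuple[str, str]:
    """Return (from, to) ISO date strings covering `days` days up to `end` (hypothetical helper)."""
    start = end - timedelta(days=days)
    return start.isoformat(), end.isoformat()


# Example: the 30-day window ending on the newest article date in our results (2019-05-02)
print(news_window(date(2019, 5, 2)))  # ('2019-04-02', '2019-05-02')
```

The two strings could then be passed as `newsapi.get_everything(..., from_param=start, to=end)`, which accepts `YYYY-MM-DD` dates.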

In [14]:
S_P_500_news

{'status': 'ok',
 'totalResults': 413,
 'articles': [{'source': {'id': 'business-insider',
    'name': 'Business Insider'},
   'author': 'Arjun Reddy',
   'title': "Warren Buffett isn't sure Berkshire Hathaway can beat the S&P 500 (BRK.A)",
   'description': 'The legendary investor Warre n Buffett predicts Berkshire Hathaway may only modestly outperform the S&P 500, if at all. The Berkshire CEO made the comment in a wide-ranging interview with the Financial Times. Buffett also noted that Berkshire may buy back as …',
   'url': 'https://www.businessinsider.com/warren-buffett-isnt-sure-berkshire-hathaway-can-beat-sp-500-2019-4',
   'urlToImage': 'https://amp.businessinsider.com/images/5cc1f06bb14bf4628b26b9a2-1334-667.jpg',
   'publishedAt': '2019-04-26T10:02:00Z',
   'content': 'Warren Buffett shot down expectations that the conglomerate Berkshire Hathaway will significantly outperform the S&amp;P 500 going forward. \r\n The billionaire investor, dubbed the "Oracle of Omaha," told The F

##### Let's make a DataFrame!

In [12]:
SP_df = pd.DataFrame(S_P_500_news['articles'])
SP_df.head()

|   | author | content | description | publishedAt | source | title | url | urlToImage |
|---|---|---|---|---|---|---|---|---|
| 0 | Arjun Reddy | Warren Buffett shot down expectations that the... | The legendary investor Warre n Buffett predict... | 2019-04-26T10:02:00Z | {'id': 'business-insider', 'name': 'Business I... | Warren Buffett isn't sure Berkshire Hathaway c... | https://www.businessinsider.com/warren-buffett... | https://amp.businessinsider.com/images/5cc1f06... |
| 1 | Arjun Reddy | Markets Insider\r\nThe stock market is set to ... | The stock market's new closing peak may signal... | 2019-04-24T18:31:02Z | {'id': 'business-insider', 'name': 'Business I... | The stock market just hit a record high and hi... | https://markets.businessinsider.com/news/stock... | https://images.markets.businessinsider.com/ima... |
| 2 | Marley Jay | Big stock indexes are setting records again, b... | The S&P 500 and Nasdaq both set record highs t... | 2019-04-29T09:58:00Z | {'id': 'business-insider', 'name': 'Business I... | We spoke to 3 experts who explained why the st... | https://www.businessinsider.com/next-stock-mar... | https://amp.businessinsider.com/images/5cc3557... |
| 3 | Theron Mohamed | Prediction:\r\n "Over a ten-year period commen... | Legendary investor Warren Buffett avoids makin... | 2019-04-19T12:46:00Z | {'id': 'business-insider', 'name': 'Business I... | Warren Buffett made 12 predictions about bitco... | https://www.businessinsider.com/warren-buffett... | https://amp.businessinsider.com/images/5cb5d50... |
| 4 | Akin Oyedele | A good number of investors are feeling left ou... | Investors worried about the durability of the ... | 2019-05-02T16:46:41Z | {'id': 'business-insider', 'name': 'Business I... | Bank of America has devised the perfect tradin... | https://www.businessinsider.com/stock-market-o... | https://amp.businessinsider.com/images/5b55e30... |


### Let's try again to see if we can get more news!

In [17]:
# get_top_headlines returns a plain dict (not a DataFrame), just like get_everything
top_headlines = newsapi.get_top_headlines(q='S&P 500')
top_headlines

{'status': 'ok',
 'totalResults': 1,
 'articles': [{'source': {'id': 'crypto-coins-news',
    'name': 'Crypto Coins News'},
   'author': 'https://facebook.com/no.shit.madore',
   'title': 'S&P 500 Stock Plunges 24% After Brutal Q1 Loss Stuns Investors',
   'description': 'By CCN: Stocks are broadly tanking today, but Fluor, Inc., a Texas construction outfit, is leading the charge, with a roughly 24% loss by press time. The company lost 48 cents per share and announced',
   'url': 'https://www.ccn.com/sp-500-stock-plunges-24-after-brutal-q1-loss-stuns-investors',
   'urlToImage': 'https://www.ccn.com/wp-content/uploads/2019/02/tesla-elon-musk-debt-panic-shutterstock.jpg',
   'publishedAt': '2019-05-02T15:34:53Z',
   'content': 'By CCN: Stocks are broadly tanking today, but Fluor, Inc., a Texas construction outfit, is leading the charge, with a roughly 24% loss by press time. The company lost 48 cents per share and announced the resignation of its 8-year CEO David Seaton. Fluor repor… [+

As we can see, that only returned a single headline (`totalResults: 1`)... but luckily our first query already gave us what we were looking for!

In [18]:
SP_df.tail()

|   | author | content | description | publishedAt | source | title | url | urlToImage |
|---|---|---|---|---|---|---|---|---|
| 15 | Jonathan Garber | Your Personalized Market Center | Here is what you need to know. Uber files to g... | 2019-04-26T11:22:00Z | {'id': 'business-insider', 'name': 'Business I... | 10 things you need to know before the opening ... | https://www.businessinsider.com/stock-market-n... | https://images.markets.businessinsider.com/ima... |
| 16 | Jonathan Garber | Your Personalized Market Center | Here is what you need to know. Trump's trade w... | 2019-04-12T10:54:00Z | {'id': 'business-insider', 'name': 'Business I... | 10 things you need to know before the opening ... | https://markets.businessinsider.com/news/stock... | https://images.markets.businessinsider.com/ima... |
| 17 | Bloomberg |  | Bloomberg's Abigail Doolittle reports on the S... | 2019-04-10T15:04:39Z | {'id': 'bloomberg', 'name': 'Bloomberg'} | The Unimpressive Momentum in the S&P 500 | https://www.bloomberg.com/news/videos/2019-04-... | https://assets.bwbx.io/images/users/iqjWHBFdfx... |
| 18 | Theron Mohamed | Asian stocks closed higher on Wednesday after ... | Asian stocks closed higher on Wednesday after ... | 2019-04-17T08:48:11Z | {'id': 'business-insider', 'name': 'Business I... | China's strong data sparks major stock rally w... | https://www.businessinsider.com/stock-market-n... | https://amp.businessinsider.com/images/5cb6e09... |
| 19 | Jonathan Garber | Reuters/Francis Mascarenhas\r\nHere is what yo... | Here is what you need to know. The UK is at a ... | 2019-04-02T10:47:00Z | {'id': 'business-insider', 'name': 'Business I... | 10 things you need to know before the opening ... | https://markets.businessinsider.com/news/stock... | https://images.markets.businessinsider.com/ima... |
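Notice that `SP_df` stops at index 19, i.e. 20 articles, even though the `get_everything` response reported `totalResults: 413`. That gap suggests the API hands results back one page at a time (and the free plan may also cap how deep we can page). A minimal sketch of walking through the pages, assuming a NewsAPI-shaped response dict; `fetch_all_articles` and `fake_fetch` are hypothetical names, and the fake keeps the example runnable offline:

```python
from typing import Callable, Dict, List


def fetch_all_articles(fetch_page: Callable[[int, int], Dict],
                       page_size: int = 100,
                       max_pages: int = 5) -> List[Dict]:
    """Collect articles across pages from a paged news API (hypothetical helper).

    fetch_page(page, page_size) must return a dict shaped like the NewsAPI
    responses above: {'status': ..., 'totalResults': int, 'articles': [...]}.
    Pages are numbered from 1.
    """
    articles: List[Dict] = []
    for page in range(1, max_pages + 1):
        response = fetch_page(page, page_size)
        batch = response.get('articles', [])
        articles.extend(batch)
        # Stop early on a short page or once every reported result is collected
        if len(batch) < page_size or len(articles) >= response.get('totalResults', 0):
            break
    return articles


def fake_fetch(page: int, page_size: int) -> Dict:
    """Offline stand-in for the API: 45 numbered articles in total."""
    total = 45
    start = (page - 1) * page_size
    end = min(start + page_size, total)
    return {'status': 'ok', 'totalResults': total,
            'articles': [{'id': i} for i in range(start, end)]}


print(len(fetch_all_articles(fake_fetch, page_size=20)))  # 45
```

With `newsapi-python`, `fetch_page` could be something like `lambda page, size: newsapi.get_everything(q='S&P 500', sources=..., page=page, page_size=size)`; `max_pages` keeps us from burning through the request quota.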


### Next steps:
1. Let's turn our dates into usable formats!
2. Let's clean up the content.
3. Let's get stock data for this concept: the S&P 500 index price!

Notice how the `publishedAt` timestamps contain a 'T' (separating the date from the time) and a 'Z' (marking UTC). Let's remove them and convert the column to datetimes!

In [20]:
# 'T' separates date from time and 'Z' marks UTC; strip both so pandas returns naive timestamps
SP_df['publishedAt'] = pd.to_datetime(
    SP_df['publishedAt'].str.replace('T', ' ', regex=False).str.rstrip('Z')
)
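For reference, `pd.to_datetime` can also parse these ISO 8601 strings as-is, without the string surgery; the trailing 'Z' then produces timezone-aware UTC timestamps. A small sketch using two `publishedAt` values from the output above:

```python
import pandas as pd

# Two raw publishedAt strings copied from the NewsAPI output above
raw = pd.Series(['2019-04-26T10:02:00Z', '2019-04-24T18:31:02Z'])

# pandas parses ISO 8601 directly; the trailing 'Z' produces UTC-aware timestamps
parsed = pd.to_datetime(raw)
print(parsed.dt.tz)

# Convert to UTC and drop the timezone, giving naive timestamps like the cell above
naive = parsed.dt.tz_convert(None)
print(naive.tolist())
```

Keeping the timezone is mainly useful if we later want to shift timestamps to US Eastern time to compare against market hours.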

In [26]:
SP_df.head()

|   | author | content | description | publishedAt | source | title | url | urlToImage |
|---|---|---|---|---|---|---|---|---|
| 0 | Arjun Reddy | Warren Buffett shot down expectations that the... | The legendary investor Warre n Buffett predict... | 2019-04-26 10:02:00 | {'id': 'business-insider', 'name': 'Business I... | Warren Buffett isn't sure Berkshire Hathaway c... | https://www.businessinsider.com/warren-buffett... | https://amp.businessinsider.com/images/5cc1f06... |
| 1 | Arjun Reddy | Markets Insider\r\nThe stock market is set to ... | The stock market's new closing peak may signal... | 2019-04-24 18:31:02 | {'id': 'business-insider', 'name': 'Business I... | The stock market just hit a record high and hi... | https://markets.businessinsider.com/news/stock... | https://images.markets.businessinsider.com/ima... |
| 2 | Marley Jay | Big stock indexes are setting records again, b... | The S&P 500 and Nasdaq both set record highs t... | 2019-04-29 09:58:00 | {'id': 'business-insider', 'name': 'Business I... | We spoke to 3 experts who explained why the st... | https://www.businessinsider.com/next-stock-mar... | https://amp.businessinsider.com/images/5cc3557... |
| 3 | Theron Mohamed | Prediction:\r\n "Over a ten-year period commen... | Legendary investor Warren Buffett avoids makin... | 2019-04-19 12:46:00 | {'id': 'business-insider', 'name': 'Business I... | Warren Buffett made 12 predictions about bitco... | https://www.businessinsider.com/warren-buffett... | https://amp.businessinsider.com/images/5cb5d50... |
| 4 | Akin Oyedele | A good number of investors are feeling left ou... | Investors worried about the durability of the ... | 2019-05-02 16:46:41 | {'id': 'business-insider', 'name': 'Business I... | Bank of America has devised the perfect tradin... | https://www.businessinsider.com/stock-market-o... | https://amp.businessinsider.com/images/5b55e30... |


In [27]:
# Perfect! Now let's sort by publication date!
df = SP_df.sort_values('publishedAt')
df.head()

|   | author | content | description | publishedAt | source | title | url | urlToImage |
|---|---|---|---|---|---|---|---|---|
| 19 | Jonathan Garber | Reuters/Francis Mascarenhas\r\nHere is what yo... | Here is what you need to know. The UK is at a ... | 2019-04-02 10:47:00 | {'id': 'business-insider', 'name': 'Business I... | 10 things you need to know before the opening ... | https://markets.businessinsider.com/news/stock... | https://images.markets.businessinsider.com/ima... |
| 5 | JIM TANKERSLEY and ANA SWANSON | Border activity makes up a relatively larger s... | “Security is more important to me than trade,”... | 2019-04-02 23:55:24 | {'id': 'the-new-york-times', 'name': 'The New ... | Trump Vows to Close Border, Even if It Hurts t... | https://www.nytimes.com/2019/04/02/us/politics... | https://static01.nyt.com/images/2019/04/02/bus... |
| 7 | Arjun Reddy | Through the following slides, JPMorgan provide... | Through the following slides, JPMorgan provide... | 2019-04-03 16:28:00 | {'id': 'business-insider', 'name': 'Business I... | JPMorgan: These 66 charts are the ultimate gui... | https://www.businessinsider.com/stock-market-6... | https://amp.businessinsider.com/images/5ca4d89... |
| 6 | Callum Burroughs | Stock markets paused for thought Thursday as i... | Global equities paused for thought Thursday, a... | 2019-04-04 09:02:50 | {'id': 'business-insider', 'name': 'Business I... | Global stocks stall after Trump's tariffs bloc... | https://www.businessinsider.com/stock-markets-... | https://amp.businessinsider.com/images/5b23dd5... |
| 12 | Tanza Loudenback | If you want to get rich quick, we have some ba... | The easiest and safest way to grow your money ... | 2019-04-05 13:30:00 | {'id': 'business-insider', 'name': 'Business I... | How to invest $100,000 to make $1 million | https://www.businessinsider.com/investment-cal... | https://amp.businessinsider.com/images/5ca65a2... |


In [43]:
# Daily S&P 500 (^GSPC) prices downloaded from Yahoo Finance; let's take a look
sp_price = pd.read_csv('data/^GSPC.csv')
sp_price.head()

|   | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 0 | 2019-02-04 | 2706.48999 | 2724.98999 | 2698.75 | 2724.870117 | 2724.870117 | 3359840000 |
| 1 | 2019-02-05 | 2728.340088 | 2738.97998 | 2724.030029 | 2737.699951 | 2737.699951 | 3560430000 |
| 2 | 2019-02-06 | 2735.050049 | 2738.080078 | 2724.149902 | 2731.610107 | 2731.610107 | 3472690000 |
| 3 | 2019-02-07 | 2717.530029 | 2719.320068 | 2687.26001 | 2706.050049 | 2706.050049 | 4099490000 |
| 4 | 2019-02-08 | 2692.360107 | 2708.070068 | 2681.830078 | 2707.879883 | 2707.879883 | 3622330000 |


In [44]:
px.line(sp_price, x='Date',y='Close')

- Not too bad, though it seems to have dipped recently, so let's check our sentiment data.
- Our prices are daily, while the news timestamps go down to the second, so let's work around this by keeping only each article's calendar date.

In [50]:
# Keep only the calendar date (as a midnight timestamp) so articles line up with daily prices
SP_df['Date'] = SP_df['publishedAt'].dt.normalize()
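Several articles can share one calendar date (two of ours are from 2019-04-02), while the price table has exactly one row per trading day. One possible workaround is to average each day's scores. A minimal sketch on toy data shaped like `SP_df`; the `score` values are made up for illustration:

```python
import pandas as pd

# Toy stand-in for SP_df: two articles on 2019-04-02, one on 2019-04-03 (scores are made up)
articles = pd.DataFrame({
    'publishedAt': pd.to_datetime(['2019-04-02 10:47:00', '2019-04-02 23:55:24',
                                   '2019-04-03 16:28:00']),
    'score': [-0.6, 0.2, 0.0],
})

# normalize() floors each timestamp to midnight, giving one calendar date per article
articles['Date'] = articles['publishedAt'].dt.normalize()

# Average every article published on the same day -> one row per date, plus a count
daily = articles.groupby('Date')['score'].agg(['mean', 'count'])
print(daily)
```

This treats an article published after the market close the same as one published before the open; shifting late articles to the next trading day would be a possible refinement.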

In [58]:
#SP_df['Date']

In [57]:
# Sorted copy that includes the new Date column; the cells below use sp_df2
sp_df2 = SP_df.sort_values('publishedAt')
#sp_df2.head()

#### Let's find some sentiment using our libraries

In [66]:
# Let's start small, with just one article
example = sp_df2['description'].loc[0]  # index label 0 is the Berkshire Hathaway article

In [67]:
# Wrap the text in a TextBlob (the TextBlob quickstart names this object `wiki`)
wiki = TextBlob(example)
print(wiki.sentiment)

Sentiment(polarity=0.22000000000000003, subjectivity=0.58)


##### TextBlob reads this as mildly positive (polarity runs from -1 to 1, and this scored 0.22), but also fairly subjective (0.58 on a 0 to 1 scale), which could be a good indication of how much context matters

##### Now with VADER + spaCy sentiment

In [68]:
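from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
from spacy.tokens import Doc
import spacy

sentiment_analyzer = SentimentIntensityAnalyzer()
# spaCy v3 dropped the 'en' shortcut; load a named pipeline instead
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

def polarity_scores(doc):
    # VADER works on raw text, so score the Doc's original string
    return sentiment_analyzer.polarity_scores(doc.text)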

In [69]:
doc = nlp(example)
print(polarity_scores(doc))

{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}


##### 100% neutral, which makes more sense than TextBlob's reading if we take a look at the content

In [70]:
print(example)

The legendary investor Warre n Buffett predicts Berkshire Hathaway may only modestly outperform the S&P 500, if at all. The Berkshire CEO made the comment in a wide-ranging interview with the Financial Times. Buffett also noted that Berkshire may buy back as …


##### Context is very important here!

##### Let's make a data frame with all our data:
- TextBlob sentiment data frame!
- VADER sentiment data frame!

In [81]:
news_df = sp_df2.set_index('publishedAt')
news_df = news_df.sort_index()
news_df['content'] = news_df['content'].fillna('None')
news_df['description'] = news_df['description'].fillna('None')
news_df['title'] = news_df['title'].fillna('None')

#### Let's wrap each text field in a TextBlob object so we can actually score it

In [82]:
news_df['content_wiki'] = [TextBlob(i) for i in news_df['content']]
news_df['description_wiki'] = [TextBlob(i) for i in news_df['description']]
news_df['title_wiki']= [TextBlob(i) for i in news_df['title']]

In [83]:
news_df[['content_wiki','description_wiki','title_wiki']].head()

| publishedAt | content_wiki | description_wiki | title_wiki |
|---|---|---|---|
| 2019-04-02 10:47:00 | (R, e, u, t, e, r, s, /, F, r, a, n, c, i, s, ... | (H, e, r, e, , i, s, , w, h, a, t, , y, o, ... | (1, 0, , t, h, i, n, g, s, , y, o, u, , n, ... |
| 2019-04-02 23:55:24 | (B, o, r, d, e, r, , a, c, t, i, v, i, t, y, ... | (“, S, e, c, u, r, i, t, y, , i, s, , m, o, ... | (T, r, u, m, p, , V, o, w, s, , t, o, , C, ... |
| 2019-04-03 16:28:00 | (T, h, r, o, u, g, h, , t, h, e, , f, o, l, ... | (T, h, r, o, u, g, h, , t, h, e, , f, o, l, ... | (J, P, M, o, r, g, a, n, :, , T, h, e, s, e, ... |
| 2019-04-04 09:02:50 | (S, t, o, c, k, , m, a, r, k, e, t, s, , p, ... | (G, l, o, b, a, l, , e, q, u, i, t, i, e, s, ... | (G, l, o, b, a, l, , s, t, o, c, k, s, , s, ... |
| 2019-04-05 13:30:00 | (I, f, , y, o, u, , w, a, n, t, , t, o, , ... | (T, h, e, , e, a, s, i, e, s, t, , a, n, d, ... | (H, o, w, , t, o, , i, n, v, e, s, t, , $, ... |


#### Now let's run it through and pull out polarity and subjectivity

In [84]:
news_df['content_polarity'] = [i.polarity for i in news_df['content_wiki']]
news_df['description_polarity'] = [i.polarity for i in news_df['description_wiki']]
news_df['title_polarity'] = [i.polarity for i in news_df['title_wiki']]
news_df['content_subjectivity'] = [i.subjectivity for i in news_df['content_wiki']]
news_df['description_subjectivity'] = [i.subjectivity for i in news_df['description_wiki']]
news_df['title_subjectivity'] = [i.subjectivity for i in news_df['title_wiki']]
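The six lines above all follow one pattern: score each text column and store the result under a suffixed name. A sketch of folding that pattern into a single hypothetical helper, `score_columns`; the exclamation-mark counter is only a toy stand-in for `TextBlob(t).polarity` or a VADER compound score:

```python
from typing import Callable, Iterable

import pandas as pd


def score_columns(df: pd.DataFrame, columns: Iterable[str],
                  scorer: Callable[[str], float], suffix: str) -> pd.DataFrame:
    """Return a copy of df with one new `<column>_<suffix>` column per text column.

    Hypothetical helper: `scorer` maps one string to one number.
    """
    out = df.copy()
    for col in columns:
        out[f'{col}_{suffix}'] = out[col].map(scorer)
    return out


# Toy scorer standing in for a sentiment model: count exclamation marks
demo = pd.DataFrame({'title': ['Stocks soar!', 'Markets flat'],
                     'description': ['Record high!!', 'Nothing new']})
result = score_columns(demo, ['title', 'description'], lambda t: t.count('!'), 'bangs')
print(result)
```

With the real data this might look like `score_columns(news_df, ['content', 'description', 'title'], lambda t: TextBlob(t).polarity, 'polarity')`, which also skips storing the intermediate `*_wiki` objects.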

In [85]:
news_df[['content_polarity','description_polarity','title_polarity','content_subjectivity','description_subjectivity'
        ,'title_subjectivity']].head()

| publishedAt | content_polarity | description_polarity | title_polarity | content_subjectivity | description_subjectivity | title_subjectivity |
|---|---|---|---|---|---|---|
| 2019-04-02 10:47:00 | -0.017857 | -0.017857 | 0.0 | 0.053571 | 0.053571 | 0.0 |
| 2019-04-02 23:55:24 | 0.265179 | 0.366667 | 0.0 | 0.591071 | 0.566667 | 0.0 |
| 2019-04-03 16:28:00 | 0.08 | 0.066667 | 0.0 | 0.37 | 0.325 | 1.0 |
| 2019-04-04 09:02:50 | 0.370833 | 0.296667 | 0.0 | 0.516667 | 0.413333 | 0.0 |
| 2019-04-05 13:30:00 | 0.136458 | 0.033333 | 0.0 | 0.483333 | 0.335714 | 0.0 |


In [87]:
news_df[['description','description_polarity','description_subjectivity']].head()

| publishedAt | description | description_polarity | description_subjectivity |
|---|---|---|---|
| 2019-04-02 10:47:00 | Here is what you need to know. The UK is at a ... | -0.017857 | 0.053571 |
| 2019-04-02 23:55:24 | “Security is more important to me than trade,”... | 0.366667 | 0.566667 |
| 2019-04-03 16:28:00 | Through the following slides, JPMorgan provide... | 0.066667 | 0.325 |
| 2019-04-04 09:02:50 | Global equities paused for thought Thursday, a... | 0.296667 | 0.413333 |
| 2019-04-05 13:30:00 | The easiest and safest way to grow your money ... | 0.033333 | 0.335714 |


In [94]:
news_df[['description','description_polarity']]

| publishedAt | description | description_polarity |
|---|---|---|
| 2019-04-02 10:47:00 | Here is what you need to know. The UK is at a ... | -0.017857 |
| 2019-04-02 23:55:24 | “Security is more important to me than trade,”... | 0.366667 |
| 2019-04-03 16:28:00 | Through the following slides, JPMorgan provide... | 0.066667 |
| 2019-04-04 09:02:50 | Global equities paused for thought Thursday, a... | 0.296667 |
| 2019-04-05 13:30:00 | The easiest and safest way to grow your money ... | 0.033333 |
| 2019-04-08 10:45:00 | Here is what you need to know. Brexit's costs ... | 0.228571 |
| 2019-04-10 15:04:39 | Bloomberg's Abigail Doolittle reports on the S... | 0.0 |
| 2019-04-12 10:54:00 | Here is what you need to know. Trump's trade w... | 0.0 |
| 2019-04-16 04:01:00 | BlackRock releases its first quarter earnings ... | 0.016667 |
| 2019-04-17 08:48:11 | Asian stocks closed higher on Wednesday after ... | 0.125325 |


#### One down, VADER to go!

In [95]:
news_df['content_vader'] = [nlp(i) for i in news_df['content']]
news_df['description_vader'] = [nlp(i) for i in news_df['description']]
news_df['title_vader']= [nlp(i) for i in news_df['title']]

In [98]:
news_df['content_vader_output'] = [polarity_scores(i) for i in news_df['content_vader']]
news_df['description_vader_output'] = [polarity_scores(i) for i in news_df['description_vader']]
news_df['title_vader_output'] = [polarity_scores(i) for i in news_df['title_vader']]

In [100]:
news_df['description_vader_output']

publishedAt
2019-04-02 10:47:00    {'neg': 0.117, 'neu': 0.883, 'pos': 0.0, 'comp...
2019-04-02 23:55:24    {'neg': 0.127, 'neu': 0.791, 'pos': 0.083, 'co...
2019-04-03 16:28:00    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
2019-04-04 09:02:50    {'neg': 0.086, 'neu': 0.754, 'pos': 0.16, 'com...
2019-04-05 13:30:00    {'neg': 0.0, 'neu': 0.879, 'pos': 0.121, 'comp...
2019-04-08 10:45:00    {'neg': 0.12, 'neu': 0.8, 'pos': 0.08, 'compou...
2019-04-10 15:04:39    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
2019-04-12 10:54:00    {'neg': 0.082, 'neu': 0.86, 'pos': 0.059, 'com...
2019-04-16 04:01:00    {'neg': 0.0, 'neu': 0.862, 'pos': 0.138, 'comp...
2019-04-17 08:48:11    {'neg': 0.0, 'neu': 0.853, 'pos': 0.147, 'comp...
2019-04-19 12:46:00    {'neg': 0.138, 'neu': 0.742, 'pos': 0.12, 'com...
2019-04-22 13:00:00    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound...
2019-04-24 18:31:02    {'neg': 0.0, 'neu': 0.946, 'pos': 0.054, 'comp...
2019-04-25 10:54:00    {'neg': 0.0, 'ne

In [101]:
vader_content = pd.DataFrame(news_df['content_vader_output'].tolist())
vader_content.head()

|   | compound | neg | neu | pos |
|---|---|---|---|---|
| 0 | -0.6908 | 0.119 | 0.881 | 0.0 |
| 1 | 0.8611 | 0.042 | 0.735 | 0.222 |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 0.431 | 0.034 | 0.852 | 0.114 |
| 4 | 0.4215 | 0.102 | 0.727 | 0.171 |
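`pd.DataFrame(series.tolist())` builds the new frame with a fresh 0..n-1 index (visible above), which is why the compound scores are later copied back with a list comprehension: assigning `vader_content['compound']` straight onto `news_df` would align on index labels and most likely come back as all NaN. A sketch of keeping the timestamps instead, using the first two content scores from the output above:

```python
import pandas as pd

# Toy stand-in for news_df['content_vader_output']: VADER dicts indexed by timestamp
idx = pd.to_datetime(['2019-04-02 10:47:00', '2019-04-02 23:55:24'])
vader_output = pd.Series([
    {'neg': 0.119, 'neu': 0.881, 'pos': 0.0, 'compound': -0.6908},
    {'neg': 0.042, 'neu': 0.735, 'pos': 0.222, 'compound': 0.8611},
], index=idx)

# Without index=..., the expanded frame gets a 0..n-1 RangeIndex
lost = pd.DataFrame(vader_output.tolist())
# Passing the original index keeps each score attached to its publication time
kept = pd.DataFrame(vader_output.tolist(), index=vader_output.index)

print(lost.index.tolist())  # [0, 1]
print(kept['compound'])
```

With `kept`, a plain `news_df['content_compound'] = kept['compound']` would align correctly by timestamp.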


In [103]:
vader_desc = pd.DataFrame(news_df['description_vader_output'].tolist())
vader_desc.head()

|   | compound | neg | neu | pos |
|---|---|---|---|---|
| 0 | -0.6908 | 0.117 | 0.883 | 0.0 |
| 1 | -0.2748 | 0.127 | 0.791 | 0.083 |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | 0.3716 | 0.086 | 0.754 | 0.16 |
| 4 | 0.6705 | 0.0 | 0.879 | 0.121 |


In [106]:
vader_text = pd.DataFrame(news_df['title_vader_output'].tolist())  # VADER scores for the titles
vader_text.head()

|   | compound | neg | neu | pos |
|---|---|---|---|---|
| 0 | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | -0.4767 | 0.237 | 0.763 | 0.0 |
| 2 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | -0.2263 | 0.303 | 0.516 | 0.181 |
| 4 | 0.0 | 0.0 | 1.0 | 0.0 |


##### VADER's `compound` score is a normalized summary of the whole text, running from -1 (most negative) to +1 (most positive), so we'll use it as our single VADER metric instead of the separate neg/neu/pos proportions!
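For intuition: as far as we can tell from the vaderSentiment source, `compound` comes from summing the rule-adjusted valence of every word and squashing that sum into (-1, 1) with x / sqrt(x*x + alpha), where alpha = 15. The VADER README also suggests cut-offs of ±0.05 for calling a text positive, neutral or negative. A small sketch re-implementing just those two pieces (the per-word summing is not reproduced; `vader_normalize` and `vader_label` are our own names):

```python
import math


def vader_normalize(score: float, alpha: float = 15.0) -> float:
    """Squash an unbounded summed valence score into (-1, 1), like VADER's compound."""
    norm = score / math.sqrt(score * score + alpha)
    # Defensive clamp; mathematically the ratio already stays inside (-1, 1)
    return max(-1.0, min(1.0, norm))


def vader_label(compound: float, threshold: float = 0.05) -> str:
    """Bucket a compound score with the cut-offs suggested in the VADER README."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


print(vader_normalize(1.0))  # 0.25
# Compound scores taken from our content output above
print([vader_label(c) for c in (-0.6908, 0.0, 0.8611)])  # ['negative', 'neutral', 'positive']
```

Because of the squashing, a handful of strongly worded sentences can already push `compound` close to ±1, which is worth keeping in mind when comparing long article bodies with short titles.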

In [108]:
news_df['text_compound'] = [i for i in vader_text['compound']]
news_df['content_compound'] = [i for i in vader_content['compound']]
news_df['description_compound'] = [i for i in vader_desc['compound']]
news_df.head()

| publishedAt | author | content | description | source | title | url | urlToImage | Date | content_wiki | description_wiki | title_wiki | content_polarity | description_polarity | title_polarity | content_subjectivity | description_subjectivity | title_subjectivity | content_vader | description_vader | title_vader | content_vader_output | description_vader_output | title_vader_output | text_compound | content_compound | description_compound |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2019-04-02 10:47:00 | Jonathan Garber | Reuters/Francis Mascarenhas\r\nHere is what yo... | Here is what you need to know. The UK is at a ... | {'id': 'business-insider', 'name': 'Business I... | 10 things you need to know before the opening ... | https://markets.businessinsider.com/news/stock... | https://images.markets.businessinsider.com/ima... | 2019-04-02 | (R, e, u, t, e, r, s, /, F, r, a, n, c, i, s, ... | (H, e, r, e, , i, s, , w, h, a, t, , y, o, ... | (1, 0, , t, h, i, n, g, s, , y, o, u, , n, ... | -0.017857 | -0.017857 | 0.0 | 0.053571 | 0.053571 | 0.0 | (Reuters, /, Francis, Mascarenhas, \r\n, Here,... | (Here, is, what, you, need, to, know, ., The, ... | (10, things, you, need, to, know, before, the,... | {'neg': 0.119, 'neu': 0.881, 'pos': 0.0, 'comp... | {'neg': 0.117, 'neu': 0.883, 'pos': 0.0, 'comp... | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... | 0.0 | -0.6908 | -0.6908 |
| 2019-04-02 23:55:24 | JIM TANKERSLEY and ANA SWANSON | Border activity makes up a relatively larger s... | “Security is more important to me than trade,”... | {'id': 'the-new-york-times', 'name': 'The New ... | Trump Vows to Close Border, Even if It Hurts t... | https://www.nytimes.com/2019/04/02/us/politics... | https://static01.nyt.com/images/2019/04/02/bus... | 2019-04-02 | (B, o, r, d, e, r, , a, c, t, i, v, i, t, y, ... | (“, S, e, c, u, r, i, t, y, , i, s, , m, o, ... | (T, r, u, m, p, , V, o, w, s, , t, o, , C, ... | 0.265179 | 0.366667 | 0.0 | 0.591071 | 0.566667 | 0.0 | (Border, activity, makes, up, a, relatively, l... | (“, Security, is, more, important, to, me, tha... | (Trump, Vows, to, Close, Border, ,, Even, if, ... | {'neg': 0.042, 'neu': 0.735, 'pos': 0.222, 'co... | {'neg': 0.127, 'neu': 0.791, 'pos': 0.083, 'co... | {'neg': 0.237, 'neu': 0.763, 'pos': 0.0, 'comp... | -0.4767 | 0.8611 | -0.2748 |
| 2019-04-03 16:28:00 | Arjun Reddy | Through the following slides, JPMorgan provide... | Through the following slides, JPMorgan provide... | {'id': 'business-insider', 'name': 'Business I... | JPMorgan: These 66 charts are the ultimate gui... | https://www.businessinsider.com/stock-market-6... | https://amp.businessinsider.com/images/5ca4d89... | 2019-04-03 | (T, h, r, o, u, g, h, , t, h, e, , f, o, l, ... | (T, h, r, o, u, g, h, , t, h, e, , f, o, l, ... | (J, P, M, o, r, g, a, n, :, , T, h, e, s, e, ... | 0.08 | 0.066667 | 0.0 | 0.37 | 0.325 | 1.0 | (Through, the, following, slides, ,, JPMorgan,... | (Through, the, following, slides, ,, JPMorgan,... | (JPMorgan, :, These, 66, charts, are, the, ult... | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... | 0.0 | 0.0 | 0.0 |
| 2019-04-04 09:02:50 | Callum Burroughs | Stock markets paused for thought Thursday as i... | Global equities paused for thought Thursday, a... | {'id': 'business-insider', 'name': 'Business I... | Global stocks stall after Trump's tariffs bloc... | https://www.businessinsider.com/stock-markets-... | https://amp.businessinsider.com/images/5b23dd5... | 2019-04-04 | (S, t, o, c, k, , m, a, r, k, e, t, s, , p, ... | (G, l, o, b, a, l, , e, q, u, i, t, i, e, s, ... | (G, l, o, b, a, l, , s, t, o, c, k, s, , s, ... | 0.370833 | 0.296667 | 0.0 | 0.516667 | 0.413333 | 0.0 | (Stock, markets, paused, for, thought, Thursda... | (Global, equities, paused, for, thought, Thurs... | (Global, stocks, stall, after, Trump, 's, tari... | {'neg': 0.034, 'neu': 0.852, 'pos': 0.114, 'co... | {'neg': 0.086, 'neu': 0.754, 'pos': 0.16, 'com... | {'neg': 0.303, 'neu': 0.516, 'pos': 0.181, 'co... | -0.2263 | 0.431 | 0.3716 |
| 2019-04-05 13:30:00 | Tanza Loudenback | If you want to get rich quick, we have some ba... | The easiest and safest way to grow your money ... | {'id': 'business-insider', 'name': 'Business I... | How to invest $100,000 to make $1 million | https://www.businessinsider.com/investment-cal... | https://amp.businessinsider.com/images/5ca65a2... | 2019-04-05 | (I, f, , y, o, u, , w, a, n, t, , t, o, , ... | (T, h, e, , e, a, s, i, e, s, t, , a, n, d, ... | (H, o, w, , t, o, , i, n, v, e, s, t, , $, ... | 0.136458 | 0.033333 | 0.0 | 0.483333 | 0.335714 | 0.0 | (If, you, want, to, get, rich, quick, ,, we, h... | (The, easiest, and, safest, way, to, grow, you... | (How, to, invest, $, 100,000, to, make, $, 1, ... | {'neg': 0.102, 'neu': 0.727, 'pos': 0.171, 'co... | {'neg': 0.0, 'neu': 0.879, 'pos': 0.121, 'comp... | {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound... | 0.0 | 0.4215 | 0.6705 |


### This is currently the result for one concept (the S&P 500 in general). Let's now concatenate it with our daily prices and look for any correlations!

In [135]:
testing_data = news_df[['content_polarity','content_compound','description_polarity','description_compound' ,'title_polarity','text_compound','Date']]
testing_data = testing_data.set_index("Date")
testing_data.head()

| Date | content_polarity | content_compound | description_polarity | description_compound | title_polarity | text_compound |
|---|---|---|---|---|---|---|
| 2019-04-02 | -0.017857 | -0.6908 | -0.017857 | -0.6908 | 0.0 | 0.0 |
| 2019-04-02 | 0.265179 | 0.8611 | 0.366667 | -0.2748 | 0.0 | -0.4767 |
| 2019-04-03 | 0.08 | 0.0 | 0.066667 | 0.0 | 0.0 | 0.0 |
| 2019-04-04 | 0.370833 | 0.431 | 0.296667 | 0.3716 | 0.0 | -0.2263 |
| 2019-04-05 | 0.136458 | 0.4215 | 0.033333 | 0.6705 | 0.0 | 0.0 |


In [139]:
working_df = sp_price[-30:]
#testing_df = working_df.mereg(working_df,testing_data)
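# This raises KeyError: 'Date' because testing_data was already indexed by Date in the
# previous cell, so it no longer has a 'Date' column to pass to set_index.
# Even without that error the indexes would not line up: Yahoo's Date column is a
# string while the news Date is a datetime, and several articles share the same day.
testing_df = pd.concat([working_df.set_index('Date'),testing_data.set_index('Date')], axis=1, join='inner').reset_index()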
testing_df.head()

KeyError: 'Date'

In [130]:
testing_df

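|   | Date | Open | High | Low | Close | Adj Close | Volume | content_polarity | content_compound | description_polarity | description_compound | title_polarity | text_compound |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 31 | 2019-03-20 | 2831.340088 | 2843.540039 | 2812.429932 | 2824.22998 | 2824.22998 | 3771200000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 32 | 2019-03-21 | 2819.719971 | 2860.310059 | 2817.379883 | 2854.879883 | 2854.879883 | 3546800000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 33 | 2019-03-22 | 2844.52002 | 2846.159912 | 2800.469971 | 2800.709961 | 2800.709961 | 4237200000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 34 | 2019-03-25 | 2796.01001 | 2809.790039 | 2785.02002 | 2798.360107 | 2798.360107 | 3376580000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 35 | 2019-03-26 | 2812.659912 | 2829.870117 | 2803.98999 | 2818.459961 | 2818.459961 | 3266050000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 36 | 2019-03-27 | 2819.719971 | 2825.560059 | 2787.719971 | 2805.370117 | 2805.370117 | 3372930000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 37 | 2019-03-28 | 2809.399902 | 2819.709961 | 2798.77002 | 2815.439941 | 2815.439941 | 3158170000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 38 | 2019-03-29 | 2828.27002 | 2836.030029 | 2819.22998 | 2834.399902 | 2834.399902 | 3740700000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 39 | 2019-04-01 | 2848.629883 | 2869.399902 | 2848.629883 | 2867.189941 | 2867.189941 | 3500760000.0 | NaN | NaN | NaN | NaN | NaN | NaN |
| 40 | 2019-04-02 | 2868.23999 | 2872.899902 | 2858.75 | 2867.23999 | 2867.23999 | 3246900000.0 | NaN | NaN | NaN | NaN | NaN | NaN |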


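### We seem to have the same problem we dealt with a while back, so let's use the same method to see the fix

Even on 2019-04-02, a day that appears in both tables, the sentiment columns are empty: the price dates are still plain strings while the news dates are datetimes, so nothing matches.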

|   | Date | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|---|
| 31 | 2019-03-20 | 2831.340088 | 2843.540039 | 2812.429932 | 2824.22998 | 2824.22998 | 3771200000 |
| 32 | 2019-03-21 | 2819.719971 | 2860.310059 | 2817.379883 | 2854.879883 | 2854.879883 | 3546800000 |
| 33 | 2019-03-22 | 2844.52002 | 2846.159912 | 2800.469971 | 2800.709961 | 2800.709961 | 4237200000 |
| 34 | 2019-03-25 | 2796.01001 | 2809.790039 | 2785.02002 | 2798.360107 | 2798.360107 | 3376580000 |
| 35 | 2019-03-26 | 2812.659912 | 2829.870117 | 2803.98999 | 2818.459961 | 2818.459961 | 3266050000 |
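A minimal sketch of one way to line the two tables up, on made-up toy data shaped like `sp_price` (a string `Date` column) and `testing_data` (a datetime `Date` index with repeated days): parse the price dates, average each day's articles, inner-join, and correlate the sentiment with the daily return. Substituting the real frames, and choosing which sentiment column to test, are left as assumptions:

```python
import pandas as pd

# Toy inputs shaped like sp_price and testing_data; all values are made up
prices = pd.DataFrame({
    'Date': ['2019-04-01', '2019-04-02', '2019-04-03', '2019-04-04'],
    'Close': [100.0, 101.0, 100.0, 102.0],
})
sentiment = pd.DataFrame(
    {'description_compound': [-0.5, 0.3, 0.2, 0.4]},
    index=pd.to_datetime(['2019-04-02', '2019-04-02', '2019-04-03', '2019-04-04']).rename('Date'),
)

# 1. Parse the string dates so both sides use datetime keys
prices['Date'] = pd.to_datetime(prices['Date'])
prices = prices.set_index('Date')
# 2. Daily return, so we compare sentiment with price moves rather than price levels
prices['Return'] = prices['Close'].pct_change()
# 3. Collapse repeated days into one averaged sentiment value per date
daily_sentiment = sentiment.groupby(level='Date').mean()
# 4. Inner join keeps only the days present in both tables
joined = prices.join(daily_sentiment, how='inner')

print(joined)
print(joined['Return'].corr(joined['description_compound']))
```

Correlating with returns rather than raw closing prices avoids a spurious relationship driven by the index's overall trend. With only a few weeks of overlapping days, any correlation we get here should be treated as exploratory.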
