In [2]:
import pandas as pd
import numpy as np

### To externally validate the sentiment analysis of news headlines, I will use historical SPX data and assign each day's grouped headlines one of the following labels:

* 1 - *SPX value rose or stayed the same*
* 0 - *SPX value decreased*
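As a rough sketch of this labeling rule, the cell below compares each trading day's close with the previous day's close. The `spx` DataFrame, its `date`/`close` column names, and the prices are made-up stand-ins; the real SPX price file is not loaded in this notebook.

```python
import pandas as pd

# Hypothetical SPX closing prices (toy numbers, not real market data).
spx = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-11", "2021-01-12", "2021-01-13", "2021-01-14"]),
    "close": [100.0, 101.5, 101.5, 99.0],
})

# Change versus the previous trading day's close.
change = spx["close"].diff()

# 1 if the close rose or stayed the same, 0 if it decreased.
# The first row has no previous close, so its label is left missing (<NA>).
spx["label"] = (change >= 0).astype("Int64").where(change.notna())
print(spx)
```

Using `>= 0` puts unchanged days in class 1, matching the "rose or stayed the same" definition above.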

In [3]:
news = pd.read_csv('./data/csvs/news_250.csv')
news.head()

,headline,date
0,Gold Price Forecast: Real Yields Weighing on P...,2021-01-14 17:00:00-05:00
1,"Central Bank Watch: Fed Speeches, Interest Rat...",2021-01-14 16:45:00-05:00
2,Top 10 Candlestick Patterns To Trade the Markets,2021-01-14 14:30:00-05:00
3,How to Read a Candlestick Chart,2021-01-14 12:30:00-05:00
4,EUR/GBP Price Analysis: Will Recent Pound Mome...,2021-01-14 12:03:00-05:00


In [4]:
news.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14982 entries, 0 to 14981
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   headline  14982 non-null  object
 1   date      14982 non-null  object
dtypes: object(2)
memory usage: 234.2+ KB


In [5]:
# convert the 'date' strings (which include a UTC offset) to timezone-aware datetimes
news['date'] = pd.to_datetime(news['date'])

In [6]:
news.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14982 entries, 0 to 14981
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype                                 
---  ------    --------------  -----                                 
 0   headline  14982 non-null  object                                
 1   date      14982 non-null  datetime64[ns, pytz.FixedOffset(-300)]
dtypes: datetime64[ns, pytz.FixedOffset(-300)](1), object(1)
memory usage: 234.2+ KB


In [10]:
# 'date' is now timezone-aware; keep only the calendar date so that all headlines from the same day can be grouped into a single row
news['date'] = news.date.dt.date
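One side effect worth knowing: `.dt.date` returns Python `datetime.date` objects, so the column's dtype becomes `object`. A small sketch on two sample timestamps (copied in format from the data above) compares it with `.dt.normalize()`, which sets the time to midnight but keeps a datetime64 dtype. Which one to prefer here is a judgment call, not something the notebook decides.

```python
import pandas as pd

# Two sample timestamps in the same format as the raw 'date' column.
dates = pd.to_datetime(pd.Series(["2021-01-14 17:00:00-05:00",
                                  "2021-01-14 12:30:00-05:00"]))

# .dt.date yields Python datetime.date objects, so the dtype becomes object.
as_date = dates.dt.date
print(as_date.dtype)  # object

# .dt.normalize() zeroes the time of day but keeps a timezone-aware
# datetime64 dtype, so .dt accessors and range filtering still work.
normalized = dates.dt.normalize()
print(normalized.dtype)
```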

In [11]:
news.head()

,headline,date
0,Gold Price Forecast: Real Yields Weighing on P...,2021-01-14
1,"Central Bank Watch: Fed Speeches, Interest Rat...",2021-01-14
2,Top 10 Candlestick Patterns To Trade the Markets,2021-01-14
3,How to Read a Candlestick Chart,2021-01-14
4,EUR/GBP Price Analysis: Will Recent Pound Mome...,2021-01-14


In [15]:
# join each day's headlines with a space ('sum' would concatenate them with no separator)
grouped_news = news.groupby(by='date').agg({'headline': ' '.join}).reset_index()
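pandas' `'sum'` aggregation on a string column concatenates the values with no separator, fusing the last word of one headline to the first word of the next. The toy comparison below (made-up headlines, not rows from `news_250.csv`) shows the difference between `'sum'` and joining with a space.

```python
import pandas as pd

# Toy stand-in for `news` after the date conversion.
toy = pd.DataFrame({
    "date": ["2021-01-14", "2021-01-14", "2021-01-13"],
    "headline": ["Gold slips", "Fed speaks", "Pound rallies"],
})

# 'sum' concatenates strings directly: "Gold slipsFed speaks".
summed = toy.groupby("date")["headline"].sum()

# Joining with a space keeps word boundaries between headlines.
joined = toy.groupby("date")["headline"].agg(" ".join)
print(joined.loc["2021-01-14"])  # Gold slips Fed speaks
```

A more distinctive separator such as `" | "` would also work if the individual headlines need to be recovered later.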

In [16]:
grouped_news.head()

,date,headline
0,2016-09-02,Cramer Remix: How to know when the market's mo...
1,2016-09-06,Cramer Remix: Time to stash an elevated amount...
2,2016-09-07,Cramer: Too much headline risk for me to like ...
3,2016-09-08,Cramer Remix: The black hole that investors ar...
4,2016-09-09,Cramer: A stock with $1 downside and a $3 upsi...


In [3]:
combined_news = pd.read_csv('./data/csvs/Combined_News_DJIA.csv')
combined_news.tail()

,Date,Label,Top1,Top2,Top3,Top4,Top5,Top6,Top7,Top8,...,Top16,Top17,Top18,Top19,Top20,Top21,Top22,Top23,Top24,Top25
1981,2016-06-27,0,Barclays and RBS shares suspended from trading...,Pope says Church should ask forgiveness from g...,Poland 'shocked' by xenophobic abuse of Poles ...,"There will be no second referendum, cabinet ag...","Scotland welcome to join EU, Merkel ally says",Sterling dips below Friday's 31-year low amid ...,No negative news about South African President...,Surge in Hate Crimes in the U.K. Following U.K...,...,German lawyers to probe Erdogan over alleged w...,"Boris Johnson says the UK will continue to ""in...",Richard Branson is calling on the UK governmen...,Turkey 'sorry for downing Russian jet',Edward Snowden lawyer vows new push for pardon...,Brexit opinion poll reveals majority don't wan...,"Conservative MP Leave Campaigner: ""The leave c...","Economists predict UK recession, further weake...","New EU 'superstate plan by France, Germany: Cr...",Pakistani clerics declare transgender marriage...
1982,2016-06-28,1,"2,500 Scientists To Australia: If You Want To ...","The personal details of 112,000 French police ...",S&amp;P cuts United Kingdom sovereign credit r...,Huge helium deposit found in Africa,CEO of the South African state broadcaster qui...,"Brexit cost investors $2 trillion, the worst o...",Hong Kong democracy activists call for return ...,Brexit: Iceland president says UK can join 'tr...,...,"US, Canada and Mexico pledge 50% of power from...",There is increasing evidence that Australia is...,"Richard Branson, the founder of Virgin Group, ...","37,000-yr-old skull from Borneo reveals surpri...",Palestinians stone Western Wall worshipers; po...,Jean-Claude Juncker asks Farage: Why are you h...,"""Romanians for Remainians"" offering a new home...",Brexit: Gibraltar in talks with Scotland to st...,8 Suicide Bombers Strike Lebanon,Mexico's security forces routinely use 'sexual...
1983,2016-06-29,1,Explosion At Airport In Istanbul,Yemeni former president: Terrorism is the offs...,UK must accept freedom of movement to access E...,Devastated: scientists too late to captive bre...,British Labor Party leader Jeremy Corbyn loses...,A Muslim Shop in the UK Was Just Firebombed Wh...,Mexican Authorities Sexually Torture Women in ...,UK shares and pound continue to recover,...,"Escape Tunnel, Dug by Hand, Is Found at Holoca...",The land under Beijing is sinking by as much a...,Car bomb and Anti-Islamic attack on Mosque in ...,Emaciated lions in Taiz Zoo are trapped in blo...,Rupert Murdoch describes Brexit as 'wonderful'...,More than 40 killed in Yemen suicide attacks,Google Found Disastrous Symantec and Norton Vu...,Extremist violence on the rise in Germany: Dom...,BBC News: Labour MPs pass Corbyn no-confidence...,Tiny New Zealand town with 'too many jobs' lau...
1984,2016-06-30,1,Jamaica proposes marijuana dispensers for tour...,Stephen Hawking says pollution and 'stupidity'...,Boris Johnson says he will not run for Tory pa...,Six gay men in Ivory Coast were abused and for...,Switzerland denies citizenship to Muslim immig...,Palestinian terrorist stabs israeli teen girl ...,Puerto Rico will default on $1 billion of debt...,Republic of Ireland fans to be awarded medal f...,...,Googles free wifi at Indian railway stations i...,Mounting evidence suggests 'hobbits' were wipe...,The men who carried out Tuesday's terror attac...,Calls to suspend Saudi Arabia from UN Human Ri...,More Than 100 Nobel Laureates Call Out Greenpe...,British pedophile sentenced to 85 years in US ...,"US permitted 1,200 offshore fracks in Gulf of ...",We will be swimming in ridicule - French beach...,UEFA says no minutes of silence for Istanbul v...,Law Enforcement Sources: Gun Used in Paris Ter...
1985,2016-07-01,1,A 117-year-old woman in Mexico City finally re...,IMF chief backs Athens as permanent Olympic host,"The president of France says if Brexit won, so...",British Man Who Must Give Police 24 Hours' Not...,100+ Nobel laureates urge Greenpeace to stop o...,Brazil: Huge spike in number of police killing...,Austria's highest court annuls presidential el...,"Facebook wins privacy case, can track any Belg...",...,"The United States has placed Myanmar, Uzbekist...",S&amp;P revises European Union credit rating t...,India gets $1 billion loan from World Bank for...,U.S. sailors detained by Iran spoke too much u...,Mass fish kill in Vietnam solved as Taiwan ste...,Philippines president Rodrigo Duterte urges pe...,Spain arrests three Pakistanis accused of prom...,"Venezuela, where anger over food shortages is ...",A Hindu temple worker has been killed by three...,Ozone layer hole seems to be healing - US &amp...
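The combined data stores each day's headlines in wide form (`Top1` to `Top25`), whereas `grouped_news` holds one string per day. A minimal sketch of flattening the wide layout into the same one-row-per-day shape is below; the two-row `wide` DataFrame is hypothetical and trimmed to three `Top` columns.

```python
import pandas as pd

# Hypothetical sample in the same wide layout as Combined_News_DJIA.csv,
# trimmed to three headline columns for brevity.
wide = pd.DataFrame({
    "Date": ["2016-06-30", "2016-07-01"],
    "Label": [1, 1],
    "Top1": ["Headline A", "Headline D"],
    "Top2": ["Headline B", "Headline E"],
    "Top3": ["Headline C", None],
})

# Pick the headline columns by their 'Top' prefix.
top_cols = [c for c in wide.columns if c.startswith("Top")]

# Join each row's headlines into one string, skipping missing values.
wide["headline"] = wide[top_cols].apply(
    lambda row: " ".join(row.dropna().astype(str)), axis=1
)
flat = wide[["Date", "Label", "headline"]]
print(flat)
```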


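The most recent headlines in the combined news data are from 2016-07-01, about two months before the earliest date in `grouped_news` (2016-09-02), so the two datasets do not overlap.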
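A quick way to check whether two date columns overlap is to compare their ranges. The helper `date_overlap` below is a hypothetical name, not part of pandas; the sample dates are taken from the outputs shown above. On the real data it would be called as `date_overlap(combined_news['Date'], grouped_news['date'])`.

```python
import pandas as pd

def date_overlap(a, b):
    """Return (start, end) of the overlap between two date Series, or None."""
    a = pd.to_datetime(a)
    b = pd.to_datetime(b)
    start = max(a.min(), b.min())
    end = min(a.max(), b.max())
    return (start, end) if start <= end else None

# Dates visible in the outputs above: tail of the combined news data
# and the first/last dates seen in the news_250 data.
combined_dates = pd.Series(["2016-06-27", "2016-07-01"])
grouped_dates = pd.Series(["2016-09-02", "2021-01-14"])
print(date_overlap(combined_dates, grouped_dates))  # None
```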