### Stock Prices and News Articles

For this project I decided to explore the financial industry. More specifically, I am curious about the relationship between news articles and their effect on the stock market. I collected two data sets. The first, obtained through Kaggle (News_Category_Dataset_v2), is a collection of news articles gathered from 2012 to 2018. The second, from Yahoo Finance, contains prices for SPY, an exchange-traded fund that tracks the S&P 500, from 2012 to 2020. I chose SPY because it is a good general indicator of how the overall market is performing. This data could help establish a relationship between current news and the stock market and help with predicting stock movements.

To start, we will import the libraries we need.

In [1]:
import pandas as pd
import numpy as np

Next, we will load our data sets.

In [2]:
newsData = pd.read_json('News_Category_Dataset_v2.json', lines=True)
newsData.rename(columns={'date':'Date'}, inplace=True)
newsData.set_index('Date', inplace=True)

spyData = pd.read_csv('SPY.csv.txt',parse_dates=['Date'], index_col='Date')
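# Note: the CSV column is named 'Close' (capital C), so this rename matches nothing
# and the column keeps the name 'Close' in the outputs below
spyData.rename(columns={'close':'SPY Close'}, inplace=True)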
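
One thing to watch in the cell above: `rename` silently ignores keys that do not match any column, and the column checks further down list `Close`, not `close`. Below is a minimal sketch on a made-up two-row frame (the frame and its values are illustrative, not the real SPY data) of how passing `errors="raise"` surfaces such a mismatch:

```python
import pandas as pd

# Toy frame standing in for spyData; the values are illustrative only.
toy = pd.DataFrame({"Close": [141.03, 140.91]})

# Default behaviour: a key that matches no column is silently ignored.
unchanged = toy.rename(columns={"close": "SPY Close"})
print(list(unchanged.columns))

# With errors="raise", the same mismatch becomes a KeyError.
try:
    toy.rename(columns={"close": "SPY Close"}, errors="raise")
    raised = False
except KeyError:
    raised = True
print(raised)
```

Whether to actually rename the column is a design choice; the rest of this notebook works with the name `Close`.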

Let's check out newsData.

In [3]:
# count the missing (null) values in each column
for col, n_missing in newsData.isnull().sum().items():
    print(str(col) + ": " + str(n_missing))

newsData.head()

category: 0
headline: 0
authors: 0
link: 0
short_description: 0


Date,category,headline,authors,link,short_description
2018-05-26,CRIME,There Were 2 Mass Shootings In Texas Last Week...,Melissa Jeltsen,https://www.huffingtonpost.com/entry/texas-ama...,She left her husband. He killed their children...
2018-05-26,ENTERTAINMENT,Will Smith Joins Diplo And Nicky Jam For The 2...,Andy McDonald,https://www.huffingtonpost.com/entry/will-smit...,Of course it has a song.
2018-05-26,ENTERTAINMENT,Hugh Grant Marries For The First Time At Age 57,Ron Dicker,https://www.huffingtonpost.com/entry/hugh-gran...,The actor and his longtime girlfriend Anna Ebe...
2018-05-26,ENTERTAINMENT,Jim Carrey Blasts 'Castrato' Adam Schiff And D...,Ron Dicker,https://www.huffingtonpost.com/entry/jim-carre...,The actor gives Dems an ass-kicking for not fi...
2018-05-26,ENTERTAINMENT,Julianna Margulies Uses Donald Trump Poop Bags...,Ron Dicker,https://www.huffingtonpost.com/entry/julianna-...,"The ""Dietland"" actress said using the bags is ..."


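This data set has no null values according to this check, so no further cleaning is done here. Note that `isnull` does not count empty strings, so a blank text field (such as an empty authors entry) would not be flagged.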
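
As a hedged sketch of how nulls and blank strings could be counted separately, here is the same kind of check on a made-up frame. The column names are borrowed from the dataset, but the rows are invented for illustration:

```python
import pandas as pd

# Toy frame: one blank author and one null author (illustrative values).
toy = pd.DataFrame({
    "headline": ["A", "B", "C"],
    "authors": ["Ron Dicker", "", None],
})

# Null values per column (what the loop above reports).
null_counts = toy.isnull().sum()

# Exactly-empty strings per column; nulls compare as False here.
blank_counts = toy.eq("").sum()

print(null_counts)
print(blank_counts)
```

Running both counts on `newsData` would show whether any text columns contain blanks that the null check alone misses.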

Now let's check out the SPY data.

In [4]:
# count the missing (null) values in each column
for col, n_missing in spyData.isnull().sum().items():
    print(str(col) + ": " + str(n_missing))
    
spyData.head()

Open: 0
High: 0
Low: 0
Close: 0
Adj Close: 0
Volume: 0


Date,Open,High,Low,Close,Adj Close,Volume
2012-09-04,141.039993,141.460007,140.130005,141.029999,120.11879,120226200
2012-09-05,141.089996,141.470001,140.630005,140.910004,120.016602,100660300
2012-09-06,141.759995,143.779999,141.75,143.770004,122.452522,158272500
2012-09-07,144.009995,144.389999,143.880005,144.330002,122.929474,107272100
2012-09-10,144.190002,144.440002,143.460007,143.509995,122.231087,86458500


This dataset isn't missing any data either. However, we are only interested in the closing price, so we will drop everything else.

In [5]:
toDrop = ['Open','High','Low','Adj Close', 'Volume']
spyData.drop(toDrop, inplace=True, axis=1)


In [6]:
spyData.head()

Date,Close
2012-09-04,141.029999
2012-09-05,140.910004
2012-09-06,143.770004
2012-09-07,144.330002
2012-09-10,143.509995


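Now let's combine these datasets into one using the merge function. Because `merge` defaults to an inner join, only dates present in both datasets are kept, and each article row carries that day's closing price.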

In [7]:
result = spyData.merge(newsData, on="Date")
result.head(12)

Date,Close,category,headline,authors,link,short_description
2012-09-04,141.029999,FOOD & DRINK,15 Awesome Craft Beers In A Can,"Menuism, Contributor\nRestaurant reviews, menu...",https://www.huffingtonpost.com/entry/craft-bee...,"Thank Oskar Blues of Lyons, Colorado, for serv..."
2012-09-04,141.029999,TRAVEL,"Meeting, Everywhere, The Rulers Of North Korea","Illya Szilak, Contributor\nwriter, new media a...",https://www.huffingtonpost.com/entry/rulers-of...,"In North Korea, the images of the Great Leader..."
2012-09-04,141.029999,HOME & LIVING,Craft Of The Day: Make A Standout Embroidered ...,Nicole Guzzardi,https://www.huffingtonpost.com/entry/craft-of-...,Photo by Melissa from Look What I Made. We see...
2012-09-04,141.029999,PARENTING,Back to School Advice: Turning the Morning Bli...,"Christine Carter PhD, Contributor\nBest-sellin...",https://www.huffingtonpost.com/entry/back-to-s...,"In my household, there is a vast difference be..."
2012-09-04,141.029999,MONEY,Master the Five Factors That Feed FICO This Fall,"Jeanne Kelly, Contributor\nCredit & Identity T...",https://www.huffingtonpost.com/entry/master-th...,If your credit score has taken a summer vacati...
2012-09-04,141.029999,FOOD & DRINK,No Labor (Day) Pastry Pizza,"Anjali Malhotra, Contributor\nPerfect Morsel",https://www.huffingtonpost.com/entry/puff-past...,As I imagined myself -- for some reason in slo...
2012-09-04,141.029999,WELLNESS,Fearless 'Push Girl' Tiphany Adams: 'It's Okay...,Elizabeth Kuster,https://www.huffingtonpost.com/entry/push-girl...,I saw on the show how there was a time when yo...
2012-09-04,141.029999,PARENTING,The Silent But Deadly War on Drugs,"Gretchen Burns Bergman, Contributor\nCo-Founde...",https://www.huffingtonpost.com/entry/the-silen...,We spend billions of dollars on the war on dru...
2012-09-04,141.029999,WELLNESS,3 Myths About Vulnerability,,https://www.huffingtonpost.comhttp://psychcent...,Vulnerability is scary. But it’s also a powerf...
2012-09-04,141.029999,WELLNESS,Another Belated Update,"Kelcey Harrison, Contributor\nNative of San Fr...",https://www.huffingtonpost.com/entry/great-lun...,"Second was the long Thursday, made longer when..."
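
To make the merge behaviour concrete, here is a small sketch on invented data (the frames `prices` and `news` and all their values are hypothetical stand-ins for `spyData` and `newsData`). It shows that the default inner join repeats the closing price for every article on a trading day and drops dates found in only one frame, while `how="left"` would keep trading days that have no articles:

```python
import pandas as pd

# Hypothetical stand-in for spyData: one close per trading day.
prices = pd.DataFrame(
    {"Close": [141.03, 140.91]},
    index=pd.DatetimeIndex(["2012-09-04", "2012-09-05"], name="Date"),
)

# Hypothetical stand-in for newsData: several articles per day,
# including one dated on a Saturday when the market is closed.
news = pd.DataFrame(
    {"headline": ["h1", "h2", "weekend story"]},
    index=pd.DatetimeIndex(["2012-09-04", "2012-09-04", "2012-09-08"], name="Date"),
)

# Inner join (the default): only dates present in both frames survive.
merged = prices.merge(news, on="Date", how="inner")

# Left join: every trading day is kept, with NaN where no article exists.
left = prices.merge(news, on="Date", how="left")

print(merged)
print(left)
```

Which join is appropriate depends on the analysis: the inner join used above discards weekend and holiday articles, which may or may not be desirable when relating news to the next trading day.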


Now let's export the data to a CSV file.

In [8]:
result.to_csv('output.csv')
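
When the exported file is read back later, the `Date` index needs to be parsed again, and headlines containing commas or quotes should survive the round trip. A minimal sketch of that check, using an in-memory buffer and a made-up one-row frame instead of the real `output.csv`:

```python
import io

import pandas as pd

# Toy row with a comma and quotes in the headline (illustrative values).
toy = pd.DataFrame(
    {"Close": [141.03], "headline": ['Meeting, Everywhere, "quoted"']},
    index=pd.DatetimeIndex(["2012-09-04"], name="Date"),
)

# Write to an in-memory buffer rather than a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer)
buffer.seek(0)

# Read it back, restoring Date as a parsed datetime index.
restored = pd.read_csv(buffer, parse_dates=["Date"], index_col="Date")
print(restored)
```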