# Topic: News Sentiment Analysis (In Preparation for ETF Analysis)

### By analyzing news sentiment about the future economic outlook, we hope to determine whether sentiment can serve as a leading indicator of equity market performance

- Due to time constraints, the scope of the assignment will be limited to the US equities market.
- The S&P 500 will serve as our market index for the time being.
- We will start with news articles from Reuters, eventually expanding to multiple news sources.
- The result of this analysis can be applied to our final project, where we will analyze how news sentiment can affect the performance of investment funds.
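
One way to make the "leading indicator" question concrete is to correlate the sentiment on day t with the index return on a later day. The sketch below is only an illustration under stated assumptions: the sentiment scores and closing prices are made-up numbers, and `lead_correlation` is a hypothetical helper, not part of this notebook's pipeline.

```python
import pandas as pd

# Hypothetical daily sentiment scores and index closes (made-up numbers).
dates = pd.date_range("2019-05-01", periods=6, freq="D")
sentiment = pd.Series([0.2, -0.1, 0.4, 0.3, -0.3, 0.1], index=dates)
close = pd.Series([100.0, 101.0, 100.5, 102.0, 103.0, 101.5], index=dates)


def lead_correlation(sentiment, close, lag=1):
    """Correlate sentiment on day t with the index return on day t + lag."""
    returns = close.pct_change()
    # shift(-lag) moves the return of day t + lag back onto day t,
    # so each sentiment value is paired with a later return.
    future_returns = returns.shift(-lag)
    # Series.corr ignores pairs where either value is NaN.
    return sentiment.corr(future_returns)


corr = lead_correlation(sentiment, close, lag=1)
```

A correlation near zero across several lags would suggest that sentiment carries little leading information; a real test would need far more data than this toy series.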

In [1]:
import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.font_manager as font_manager
import numpy as np
import requests
import time
from bs4 import BeautifulSoup
import pandas_datareader.data as web
import re

In [2]:
# grabs open, high, low, close price data for SP500
def SP500(startDate, endDate):
    sp = web.DataReader('^GSPC', 'yahoo', startDate, endDate)
    sp = sp.resample('D').ffill()
    return sp

startDate = dt.date(2018,10,1)
endDate = dt.date.today()
SP = SP500(startDate, endDate)
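
The `resample('D').ffill()` step in `SP500` turns the trading-day series into a calendar-day series, so weekends and holidays carry the previous close. Presumably this lets articles published on non-trading days be matched to a price. A minimal sketch of the effect, using a made-up two-row frame instead of a network call:

```python
import pandas as pd

# Toy closes for a Friday and the following Monday (made-up numbers).
toy = pd.DataFrame(
    {"Close": [100.0, 102.0]},
    index=pd.to_datetime(["2019-05-03", "2019-05-06"]),
)

# Upsample to calendar days; Saturday and Sunday inherit Friday's close.
daily = toy.resample("D").ffill()
```

Note that forward-filled rows imply a zero return on non-trading days, which matters if daily returns are computed from the resampled series.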

In [49]:
# grab links to news articles from the Reuters archive page
# ten articles are displayed on each page
url_links = []
for i in range(1,10):
    url = 'https://www.reuters.com/news/archive/marketsNews?view=page&page=' + str(i) + '&pageSize=10'
    html = requests.get(url)
    content = html.content
    soup = BeautifulSoup(content, "html.parser")
    # href=True skips anchors without an href, which would otherwise raise a KeyError
    for tags in soup.find_all('a', href=True):
        if re.search('article', tags['href']):
            url_links.append(tags['href'])
            
# some links may be duplicated, so we keep only the first occurrence of each
final_urls = []
for url in url_links:
    if url not in final_urls:
        final_urls.append(url)
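
The loop above checks membership in a list, which rescans the list for every link. An alternative sketch that keeps the same first-occurrence order uses `dict.fromkeys`; the paths below are hypothetical placeholders, not real Reuters links.

```python
# Hypothetical relative links, with duplicates, standing in for url_links.
url_links_example = [
    "/article/a",
    "/article/b",
    "/article/a",
    "/article/c",
    "/article/b",
]

# dict keys are unique and keep insertion order (Python 3.7+),
# so this drops repeats while preserving the first occurrence of each link.
final_urls_example = list(dict.fromkeys(url_links_example))
```

For a few dozen links the difference is negligible; it only matters once more pages or sources are scraped.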

In [83]:
# retrieve the title, publish time and content for each article

title_all = []
time_all = []
content_all = []

for url in final_urls[:10]:
    link = 'https://www.reuters.com' + url
    page = requests.get(link).content
    soup = BeautifulSoup(page, "html.parser")
    newsTitle = soup.title.text
    newsTime = soup.find_all("div", {"class": 'ArticleHeader_date'})[0].text
    newsContent = ''
    for tag in soup.find_all('p'):
        newsContent += tag.text
        
    title_all.append(newsTitle)
    time_all.append(newsTime)
    content_all.append(newsContent)

# remove double spaces and newlines from titles
title_all = [x.replace('  ', '') for x in title_all]
title_all = [x.replace('\n', '') for x in title_all]

In [84]:
pd.DataFrame({'Title' : title_all, 'Time':time_all, 'Content':content_all})

| | Title | Time | Content |
|---|---|---|---|
| 0 | Egypt non-oil private-sector activity expands ... | May 5, 2019 / 4:16 AM / Updated 2 hours ago | 3 Min ReadCAIRO, May 5 (Reuters) - Egypt’s non... |
| 1 | Elliott wants Whitbread to offload chunks of i... | May 4, 2019 / 10:31 PM / Updated 8 hours ago | 2 Min Read(Reuters) - Elliott Advisors has bec... |
| 2 | Warren Buffett defends Kraft, says Wells Fargo... | May 4, 2019 / 5:13 AM / Updated 9 hours ago | 6 Min ReadOMAHA, Neb. (Reuters) - Warren Buffe... |
| 3 | BRIEF-Warren Buffett, Charlie Munger, Greg Abe... | May 4, 2019 / 8:50 PM / in 10 hours | 4 Min ReadMay 4 (Reuters) - Buffett says it is... |
| 4 | Warren Buffett: Berkshire discloses enough abo... | May 4, 2019 / 8:39 PM / Updated 10 hours ago | 2 Min ReadOMAHA, Neb., May 4 (Reuters) - Warre... |
| 5 | 'We screwed up' not buying Google shares, Berk... | May 4, 2019 / 6:40 PM / Updated 12 hours ago | 1 Min ReadMay 4 (Reuters) - One of Warren Buff... |
| 6 | Kraft Heinz's chief marketing officer Eduardo ... | May 4, 2019 / 6:33 PM / Updated 12 hours ago | 2 Min ReadMay 4 (Reuters) - Kraft Heinz Co’s c... |
| 7 | HIGHLIGHTS-Wit and wisdom of Warren Buffett, t... | May 4, 2019 / 3:29 PM / Updated 12 hours ago | 5 Min Read(Adds comments on Apple, 3G Capital ... |
| 8 | BRIEF-Warren Buffett, Charlie Munger, Ajit Jai... | May 4, 2019 / 5:01 PM / Updated 13 hours ago | 3 Min ReadMay 4 (Reuters) - Buffett says first... |
| 9 | Erdogan signals he backs re-run of contested I... | May 4, 2019 / 12:53 PM / in 15 hours | 4 Min ReadANKARA (Reuters) - Turkish President... |
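
The scraped `Time` strings mix an absolute timestamp with a relative suffix such as "Updated 2 hours ago", and the `Content` strings begin with a "N Min Read" label. Before the articles can be aligned with price data, both would likely need cleaning. The sketch below assumes every row follows the format visible in the output above; `parse_publish_time` and `strip_read_time` are hypothetical helpers, and the time zone of the timestamps is not known from the page text.

```python
import re
from datetime import datetime


def parse_publish_time(raw):
    """Parse 'May 5, 2019 / 4:16 AM / ...' into a naive datetime.

    Assumes the first two '/'-separated parts are the date and the time;
    any relative suffix after them is ignored.
    """
    parts = [p.strip() for p in raw.split("/")]
    return datetime.strptime(parts[0] + " " + parts[1], "%B %d, %Y %I:%M %p")


def strip_read_time(text):
    """Remove a leading 'N Min Read' label from the article text."""
    return re.sub(r"^\d+ Min Read", "", text)


published = parse_publish_time("May 5, 2019 / 4:16 AM / Updated 2 hours ago")
body = strip_read_time("3 Min ReadCAIRO, May 5 (Reuters) - ...")
```

Applied to the lists above, this could look like `[parse_publish_time(t) for t in time_all]`, provided no article deviates from the assumed format.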
