# Scraping stock news and analyzing sentiment
For the moderately interested day trader and programmer

In [62]:
import urllib.request
from bs4 import BeautifulSoup  # pip install beautifulsoup4
import time
from IPython.display import clear_output
# import tkinter
# from tkinter import messagebox

In [63]:
companies = ['Facebook','Alibaba']

These are the companies I currently own a few shares of!

## Getting headlines and links from Google News
The news page URL has the same structure for every search term, so I can use `str.replace` to build a URL for each company I am interested in. The Google News results page also has a consistent layout for every search: each headline link is an `<a>` tag with the attribute `aria-level="2"`.
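One catch with plain `replace`: company names with spaces or symbols (say "Alibaba Group" or "AT&T") aren't URL-safe. A minimal sketch, assuming the same URL template as this notebook (Google may have changed it since), that percent-encodes the name with the standard library's `urllib.parse.quote` first; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# URL template from this notebook; Google may have changed this format since
TEMPLATE = "https://news.google.com/news/search/section/q/$REPLACE/$REPLACE?hl=en&gl=US&ned=us"

def build_search_url(company):
    """Hypothetical helper: percent-encode the company name, then fill both slots."""
    encoded = quote(company)  # spaces become %20, '&' becomes %26
    return TEMPLATE.replace("$REPLACE", encoded)

print(build_search_url("Alibaba Group"))
```

Single-word names like "Facebook" come out unchanged, so this is a drop-in for the `replace` line inside `retrieve_links`.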

In [65]:
def retrieve_links(company):
    """Return (headlines, links) for the company's Google News search results."""
    headlines = []
    links = []

    quote_page = "https://news.google.com/news/search/section/q/$REPLACE/$REPLACE?hl=en&gl=US&ned=us"
    quote_page = quote_page.replace("$REPLACE", company)

    page = urllib.request.urlopen(quote_page)
    soup = BeautifulSoup(page, "html.parser")

    # headline links are <a> tags with aria-level="2"
    for aria in soup.find_all('a', attrs={'aria-level': '2'}):
        href = aria.get('href') or ""
        # keep absolute http(s) links only, and skip duplicate links together
        # with their headline so that headlines[i] always matches links[i]
        if href.startswith("http") and href not in links:
            headlines.append(aria.string)
            links.append(href)

    return headlines, links
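A note on deduplication: `list(dict.fromkeys(links))` removes repeated links while keeping their first-seen order, but if it is applied to `links` alone, `headlines` keeps its duplicates and the two lists drift out of step (`headlines[i]` no longer belongs to `links[i]`). A small sketch that drops repeats as headline/link pairs instead; `dedupe_pairs` and the sample data are made up for illustration.

```python
def dedupe_pairs(headlines, links):
    """Drop repeated links while keeping each link's own headline, in first-seen order."""
    seen = set()
    kept_headlines, kept_links = [], []
    for headline, link in zip(headlines, links):
        if link in seen:
            continue  # skip the repeat together with its headline
        seen.add(link)
        kept_headlines.append(headline)
        kept_links.append(link)
    return kept_headlines, kept_links

heads = ["A", "A (dup)", "B"]
urls = ["https://x/1", "https://x/1", "https://x/2"]
print(list(dict.fromkeys(urls)))  # ['https://x/1', 'https://x/2'], but heads still has 3 items
print(dedupe_pairs(heads, urls))  # (['A', 'B'], ['https://x/1', 'https://x/2'])
```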

## Printing the news to Jupyter Notebook

In [None]:
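for refresh in range(6):
    # print the news every 10 minutes for one hour (6 refreshes x 600 s)
    clear_output(wait=True)  # replace the previous batch of headlines
    #root = tkinter.Tk()
    #root.withdraw()
    #messagebox.showinfo("Alert", "You have new news!")
    for company in companies:
        company_headlines, company_links = retrieve_links(company)
        print(company, "\n")
        # show up to three stories; zip stops early if fewer were found
        for headline, link in list(zip(company_headlines, company_links))[:3]:
            print(headline)
            print(link)
            print('\n')
        print('\n')
    time.sleep(600)  # time.sleep takes seconds: 600 s = 10 minutes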

My news automatically refreshes every 10 minutes and prints in this notebook! (I would like to send a desktop notification instead, but tkinter's `messagebox` is a little awkward and its window doesn't close properly when I stop the program, so that code is currently commented out.)

## Analyzing the news sentiment around the company

In [66]:
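import nltk  # pip install -U nltk
from nltk import tokenize
from nltk.sentiment.vader import SentimentIntensityAnalyzer
import re

# one-time downloads: the VADER lexicon and the Punkt sentence tokenizer
# (newer NLTK releases may ask for 'punkt_tab' instead of 'punkt')
nltk.download('vader_lexicon')
nltk.download('punkt')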

In [67]:
headlines, links = retrieve_links('Alibaba')

#It's good practice to separate the retrieval from the analysis, 
#because I'm frequently going to be testing the analysis and don't
#want to send a new request to the web server every time I change something.
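One way to get the same benefit without juggling variables is to memoize the download. A minimal sketch, assuming an in-memory cache per notebook session is enough, using the standard library's `functools.lru_cache`; `download` and `make_cached` are hypothetical helper names.

```python
from functools import lru_cache
import urllib.request

def download(url):
    """Fetch raw page bytes from the web server."""
    with urllib.request.urlopen(url) as page:
        return page.read()

def make_cached(fetch):
    """Hypothetical helper: memoize a fetch function so each URL is requested only once."""
    return lru_cache(maxsize=None)(fetch)

fetch_html = make_cached(download)
# Usage (not run here): BeautifulSoup accepts the cached bytes directly, e.g.
# soup = BeautifulSoup(fetch_html(links[1]), "html.parser")
```

Re-running an analysis cell then hits the cache instead of the server; `fetch_html.cache_clear()` forces fresh downloads.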

### Getting the text from each news article link
I assumed the article body would be in `<p>` tags without any attributes, so I filtered for those. This gives a pretty good rendering of the article text.

In [68]:
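page = urllib.request.urlopen(links[1])
soup = BeautifulSoup(page, "html.parser")
# keep only attribute-free <p> tags, and separate paragraphs with a newline
# so the last sentence of one paragraph doesn't run into the next
paragraphs = [tag.get_text() for tag in soup.find_all('p') if not tag.attrs]
text = "\n".join(paragraphs)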
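Concatenating paragraph text with no separator glues sentences across paragraph breaks, which is likely why the output below contains "analysts said .Using the ongoing". For an offline check of the filtering idea, here is a sketch that swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup (so no network or bs4 is needed); `BareParagraphText`, `extract_text`, and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class BareParagraphText(HTMLParser):
    """Collect the text of <p> tags that have no attributes."""

    def __init__(self):
        super().__init__()
        self.inside = False     # True while reading an attribute-free <p>
        self.current = []       # text pieces of the paragraph being read
        self.paragraphs = []    # finished paragraphs

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not attrs:
            self.inside = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.paragraphs.append("".join(self.current).strip())
            self.inside = False

    def handle_data(self, data):
        if self.inside:  # text inside nested tags like <a> is kept too
            self.current.append(data)

def extract_text(html):
    """Join bare-<p> paragraphs with newlines so sentences stay separate."""
    parser = BareParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(p for p in parser.paragraphs if p)

sample = '<p>First ends here .</p><p class="caption">Photo credit</p><p>Using the <a href="#">second</a>.</p>'
print(extract_text(sample))
```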

Now it's time for sentiment analysis using the VADER analyzer in NLTK (`nltk.sentiment.vader`). I based my code on this guide: http://www.nltk.org/howto/sentiment.html
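`polarity_scores` returns four numbers per sentence: `neg`, `neu`, and `pos` are proportions of the text, while `compound` is a single summary in [-1, 1]. As I understand the VADER source, `compound` is roughly the sum of the sentence's word valences (after VADER's rule-based adjustments) squashed by x / sqrt(x² + α) with α = 15. A sketch of just that normalization step; the raw sums fed in below are made-up inputs, not real lexicon values.

```python
import math

def normalize(score, alpha=15):
    """Squash an unbounded valence sum into [-1, 1], as VADER does for 'compound'."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))  # clamp to the [-1, 1] bounds

for raw in (0.0, 1.5, 4.0, -4.0, 20.0):
    print(raw, round(normalize(raw), 4))
```

The curve is steep near zero and flattens out, so a few strongly worded phrases already push `compound` close to ±1.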

In [69]:
sentences = tokenize.sent_tokenize(text)
sid = SentimentIntensityAnalyzer()
negativity = 0
positivity = 0
for sentence in sentences:
    print(sentence)
    ss = sid.polarity_scores(sentence)
    for k in sorted(ss): # the dictionary categories are compound, neg, neu, and pos
        print('{0}: {1}, '.format(k, ss[k]), end='')
    negativity += ss['neg']
    positivity += ss['pos']
    print()
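print(negativity)
print(positivity)
# the ratio is undefined when no sentence scored as negative
if negativity > 0:
    print("The sentiment score for this article is: ", '%.2f' % (positivity / negativity))
else:
    print("No negative sentences found, so the positive/negative ratio is undefined.")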

Now, Beijing is going all out to change that.
compound: 0.0, neg: 0.0, neu: 1.0, pos: 0.0, 
Authorities are musing fresh policies to engineer a tech homecoming.
compound: 0.3182, neg: 0.0, neu: 0.777, pos: 0.223, 
The move is as much about sharing gains from their rapid growth as increasing control over its tech sector, analysts said .Using the ongoing national parliamentary meetings as the backdrop, Beijing has been vocal about its plan.
compound: 0.7783, neg: 0.0, neu: 0.813, pos: 0.187, 
In his government work report delivered on March 5, Chinese Premier Li Keqiang said the country will “support leading innovative companies in going public.” Just a few days later, the Shanghai Stock Exchange issued a website notice, saying that authorities have visited promising tech startups one by one and pitched them options for China-based listings.
compound: 0.6808, neg: 0.0, neu: 0.901, pos: 0.099, 
And the China Securities Regulatory Commission(CSRC) has formed a special committee to facilita

Cool! Now I have a sentiment score for this article, and by running the same analysis over each link I will be able to tell whether the news about the company is mostly positive today (score > 1.0, more positive than negative wording) or mostly negative (score < 1.0).
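One caveat on the positive/negative ratio: the four sentence scores printed above all have `neg: 0.0`, so for those sentences alone the ratio would divide by zero. A sketch of an article-level summary that guards against that and, as a swapped-in alternative (not the metric used above), also reports the mean `compound` score; `summarize` is a hypothetical helper, and the input is only the four scores visible in the truncated output.

```python
def summarize(scores):
    """Aggregate per-sentence VADER score dicts into an article-level summary."""
    positivity = sum(s["pos"] for s in scores)
    negativity = sum(s["neg"] for s in scores)
    # the pos/neg ratio is undefined when no sentence scores as negative
    ratio = positivity / negativity if negativity > 0 else None
    mean_compound = sum(s["compound"] for s in scores) / len(scores) if scores else 0.0
    return ratio, mean_compound

# the four sentence scores printed above (the rest of the output was cut off)
shown = [
    {"compound": 0.0, "neg": 0.0, "neu": 1.0, "pos": 0.0},
    {"compound": 0.3182, "neg": 0.0, "neu": 0.777, "pos": 0.223},
    {"compound": 0.7783, "neg": 0.0, "neu": 0.813, "pos": 0.187},
    {"compound": 0.6808, "neg": 0.0, "neu": 0.901, "pos": 0.099},
]
ratio, mean_compound = summarize(shown)
print(ratio, round(mean_compound, 4))  # None 0.4443
```

Averaging `compound` keeps the score bounded in [-1, 1] and never divides by zero, at the cost of treating every sentence as equally important.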