# Market News Realtime Custom Notifier

The purpose of this project is to develop a proof-of-concept (POC) application to assess the feasibility of a tool that notifies the user when a certain piece of market news breaks.

To do this, a scraper will be built. The scraper will constantly query the realtime news page of the MarketWatch website, scrape the entries added since the last run, and check them for the trigger keywords the user wants to keep tabs on, e.g. "oil", "surge", "drops". Additionally, it would be good if the user could also choose to be notified when the headline contains a percentage greater than a threshold, 5% for example, in conjunction with the previous words. This would make it possible to identify when a specific stock, commodity or currency drops or surges by more than a certain percentage.
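The keyword and percentage check described above could be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `headline_matches`, the `PERCENT_RE` pattern and the whole-word matching rule are hypothetical choices, not part of the project code.

```python
import re

# Hypothetical pattern: captures figures such as "5%", "7.5%" or "12 %"
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")

def headline_matches(headline, keywords, min_percent=None):
    """Return True if the headline contains any keyword as a whole word and,
    when min_percent is given, at least one percentage greater than it."""
    # Lowercase the headline and split it into words
    words = set(re.findall(r"[a-z']+", headline.lower()))
    if not words.intersection(k.lower() for k in keywords):
        return False
    if min_percent is None:
        return True
    # Collect every percentage figure mentioned in the headline
    percents = [float(p) for p in PERCENT_RE.findall(headline)]
    return any(p > min_percent for p in percents)

# Keyword plus a percentage above the 5% threshold
big_move = headline_matches("Oil prices surge 7.5% after OPEC cut", ["oil", "surge"], 5)
# Keyword present, but the percentage is below the threshold
small_move = headline_matches("Oil drops 2% on inventory data", ["oil"], 5)
```

Because the match is on whole words, the keyword "drop" would not match "drops"; handling such variations is left out of this sketch.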

As a last development, outside the scope of this POC but potentially useful for a real live application if the conclusions of this experiment turn out to be positive, the tool could include a search engine based on machine learning, specifically semantic search. This would allow the user not only to select from a predetermined set of words like "drop", "surge", "unstable", etc., but also to write in their own words what they want to be notified about. The semantic search would then retrieve headlines that agree in meaning with this query, without the need to maintain a huge synonym dictionary.

#### Notifier Function

Function used to create a desktop notification on macOS (OS X).

In [3]:
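# Import module
import subprocess

# The notifier function: shows a desktop notification through the
# terminal-notifier command-line tool, which must be installed
def notify(title, subtitle, message):
    # Passing the arguments as a list avoids shell quoting problems
    # when the text contains quotes or other special characters
    subprocess.run(['terminal-notifier',
                    '-message', message,
                    '-title', title,
                    '-subtitle', subtitle])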

#### Create Database

1. Creates the database table (`user_headlines`) that stores the news headlines that have been relevant to the user.
2. Creates the database table (`realtime_headlines`) that stores the news obtained by the last run of the scraper so the monitor can process them.

In [4]:
# Import module
import sqlite3

# This function creates a database instance which consists of 2 tables.
# The first table is user_headlines, which holds the headlines that have matched the user's criteria
# The second table is realtime_headlines, which holds the headlines that are yet to be matched against the user's criteria
def create_database():
    print("started creating database...")
    # Connect to the "market_news_watcher" database
    conn = sqlite3.connect('market_news_watcher.db')
    # Create the "user_headlines" table if it does not exist
    conn.execute("CREATE TABLE IF NOT EXISTS user_headlines(headline)")
    # Create the "realtime_headlines" table if it does not exist
    conn.execute("CREATE TABLE IF NOT EXISTS realtime_headlines(headline)")
    # Save the changes and close the connection
    conn.commit()
    conn.close()
    print("finished creating database!")
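As a rough sketch of how the scraper might hand its results to the monitor through the `realtime_headlines` table, the helpers below write and read pending headlines. The function names `save_realtime_headlines` and `load_realtime_headlines` are hypothetical, and the demo uses an in-memory database instead of `market_news_watcher.db` so that it leaves no file behind.

```python
import sqlite3

def save_realtime_headlines(conn, headlines):
    """Insert freshly scraped headlines into realtime_headlines."""
    # Parameterized inserts keep quotes in headlines from breaking the SQL
    conn.executemany(
        "INSERT INTO realtime_headlines(headline) VALUES (?)",
        [(h,) for h in headlines],
    )
    conn.commit()

def load_realtime_headlines(conn):
    """Return the pending headlines in insertion order."""
    rows = conn.execute(
        "SELECT headline FROM realtime_headlines ORDER BY rowid"
    ).fetchall()
    return [row[0] for row in rows]

# Demo against an in-memory database (assumption: same table layout as above)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE IF NOT EXISTS realtime_headlines(headline)")
save_realtime_headlines(conn, ["Oil surges 6%", "Dollar drops"])
pending = load_realtime_headlines(conn)
conn.close()
```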

#### Scraper Function

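Scrapes live market news headlines so they can be checked against the user's search parameters.

Although the original plan was to scrape the MarketWatch realtime news page, the scraper below feeds from live Twitter pages reporting news on stock market prices, currency rates and commodities. For now the code only reads the first market news account, @MarketCurrents.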

1. Currency rates:
    - @ForexLive: https://twitter.com/ForexLive
    - @currencynews: https://twitter.com/currencynews
    - @FXstreetNews: https://twitter.com/fxstreetnews
2. Market news:
    - @MarketCurrents: https://twitter.com/marketcurrents?lang=en
    - @WSJmarkets: https://twitter.com/wsjmarkets?lang=en
    - @markets: https://twitter.com/markets

In [58]:
# Import modules
import requests
from bs4 import BeautifulSoup

# This function scrapes the realtime market news headlines from the @MarketCurrents Twitter page
def scrape_headlines():
    url = "https://twitter.com/marketcurrents?lang=en"
    response = requests.get(url)
    html = BeautifulSoup(response.text, "html.parser")
    # Each tweet body is a <p> element carrying these CSS classes
    tweets = html.select("p.TweetTextSize.TweetTextSize--jumbo.js-tweet-text.tweet-text")
    print(len(tweets))
    for tweet in tweets:
        tweet_str = '{0}'.format(tweet)
        # Split the markup at the first link: tweet text before it, expanded URL after it
        tweet_ary = tweet_str.split('<a class="twitter-timeline-link" data-expanded-url="')
        tweet_text = tweet_ary[0].replace('<p class="TweetTextSize TweetTextSize--jumbo js-tweet-text tweet-text" data-aria-label-part="0" lang="en">', '')
        # Tweets without a link have no second piece, so guard against an IndexError
        tweet_url = tweet_ary[1].split('"')[0] if len(tweet_ary) > 1 else ''
        print("{0} :::: {1}".format("tweet", tweet_text))
        print("\n\n")
        print("{0} :::: {1}".format("url", tweet_url))
        print("\n")
        print(tweet)
        print("\n\n\n\n")

**NOTES - CORRECTIONS TO BE APPLIED**

This approach is not useful. `requests` does not download all the tweets, nor is it anywhere close to realtime. The Twitter API is needed instead; the Python library for it is Tweepy, and in particular I need to dig into the Streaming API: http://docs.tweepy.org/en/v3.5.0/streaming_how_to.html#a-few-more-pointers.

I should also use a package for text processing rather than writing the functions myself. It would be a pretty difficult task to cover all the variations, pluralizations, translations and spelling corrections needed to build a decent piece of software. TextBlob is the tech of choice: http://textblob.readthedocs.io/en/dev/quickstart.html#get-word-and-noun-phrase-frequencies

Still left to decide, assuming that after all this the program works fairly well without machine learning, using only simple tokenization and a frequency counter with spelling correction, and perhaps some translation so it can also feed from the main media in any major country: where will this Python script be running continuously? And what about the database? That is a task for another day.

**END OF NOTES**
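To make the "simple tokenization and frequency counter" idea from the notes concrete, here is a minimal sketch that uses only the Python standard library. It is a stand-in for illustration, not TextBlob: it does no spelling correction, pluralization handling or translation, and the function name `word_frequencies` is hypothetical.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Lowercase the text, split it into word tokens and count each token."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

freqs = word_frequencies("Oil surges as oil supply drops")
oil_count = freqs["oil"]    # counts both "Oil" and "oil"
gold_count = freqs["gold"]  # a Counter returns 0 for missing words
```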

In [59]:
scrape_headlines()

5
tweet :::: L Brands leads S&amp;P as March results exceed investor fears 



url :::: https://seekingalpha.com/news/3256054-l-brands-leads-s-and-p-march-results-exceed-investor-fears?source=feed_f


<p class="TweetTextSize TweetTextSize--jumbo js-tweet-text tweet-text" data-aria-label-part="0" lang="en">L Brands leads S&amp;P as March results exceed investor fears <a class="twitter-timeline-link" data-expanded-url="https://seekingalpha.com/news/3256054-l-brands-leads-s-and-p-march-results-exceed-investor-fears?source=feed_f" dir="ltr" href="https://t.co/zDRAgGruok" rel="nofollow noopener" target="_blank" title="https://seekingalpha.com/news/3256054-l-brands-leads-s-and-p-march-results-exceed-investor-fears?source=feed_f"><span class="tco-ellipsis"></span><span class="invisible">https://</span><span class="js-display-url">seekingalpha.com/news/3256054-l</span><span class="invisible">-brands-leads-s-and-p-march-results-exceed-investor-fears?source=feed_f</span><span class="tco-ellipsis

#### Matcher Function

1. Matches the newly scraped market news headlines against the user's search parameters and notifies the user when there is a match.
2. Stores the headlines of the notified news, that is, the news that matched the user's search parameters.
3. Removes from the second database table (`realtime_headlines`) the news that have already been checked against the user's search parameters.
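The three steps above could be sketched roughly as follows. This is an assumption-laden illustration rather than the project's implementation: `run_matcher` is a hypothetical name, the `notify` argument is a hypothetical single-argument callback (the desktop notifier earlier takes a title, subtitle and message instead), and matching is a plain case-insensitive substring test, so "oil" would also match "turmoil".

```python
import sqlite3

def run_matcher(conn, keywords, notify=print):
    """Check pending headlines, notify and archive matches, clear the queue."""
    keywords = [k.lower() for k in keywords]
    rows = conn.execute(
        "SELECT rowid, headline FROM realtime_headlines ORDER BY rowid"
    ).fetchall()
    matched = []
    for rowid, headline in rows:
        text = headline.lower()
        if any(k in text for k in keywords):
            # Step 1: notify the user about the match
            notify(headline)
            # Step 2: store the notified headline
            conn.execute("INSERT INTO user_headlines(headline) VALUES (?)", (headline,))
            matched.append(headline)
        # Step 3: remove the headline once it has been checked
        conn.execute("DELETE FROM realtime_headlines WHERE rowid = ?", (rowid,))
    conn.commit()
    return matched

# Demo against an in-memory database with the two tables used in this document
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_headlines(headline)")
conn.execute("CREATE TABLE realtime_headlines(headline)")
conn.executemany(
    "INSERT INTO realtime_headlines(headline) VALUES (?)",
    [("Oil prices surge 6%",), ("Fed holds rates steady",)],
)
sent = []
matched = run_matcher(conn, ["oil", "surge"], notify=sent.append)
remaining = conn.execute("SELECT COUNT(*) FROM realtime_headlines").fetchone()[0]
stored = [r[0] for r in conn.execute("SELECT headline FROM user_headlines")]
conn.close()
```

Collecting notifications in a list, as the demo does with `sent.append`, makes the matcher easy to exercise without triggering real desktop notifications.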