# Market News Realtime Custom Notifier

The purpose of this project is to develop a proof-of-concept (POC) application to assess the feasibility of a tool that notifies the user as soon as a specific piece of market news is published.

To do this, a scraper will be built. The scraper will continuously poll the real-time news page of the MarketWatch website, scrape the entries added since the last run, and check them for the trigger keywords the user wants to keep tabs on, e.g. "oil", "surge", "drops". It would also be good to let the user ask for a notification only when a headline that contains those words also mentions a percentage larger than a chosen threshold, for example 5%. This would make it possible to detect when a specific stock, commodity, or currency drops or surges by more than a certain percentage.
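A minimal sketch of the matching rule described above, assuming headlines arrive as plain strings. The name matches_criteria and the whole-word comparison are our own choices, not part of the original design. Because words are compared exactly, "surge" does not match "surges", so plural and synonym handling would still be needed.

```python
import re

# Matches a signed or unsigned percentage such as "7%", "-3.5%" or "+12 %"
PERCENT_RE = re.compile(r'([+-]?\d+(?:\.\d+)?)\s*%')

def matches_criteria(headline, keywords, min_percent=None):
    """Return True if the headline contains any trigger keyword and, when
    min_percent is given, also mentions a percentage larger than it."""
    # Compare whole lowercase words so that "oil" does not match "spoiled"
    words = set(re.findall(r'[a-z]+', headline.lower()))
    if not words & {k.lower() for k in keywords}:
        return False
    if min_percent is None:
        return True
    # Use the absolute value so that both drops and surges count
    percents = [abs(float(p)) for p in PERCENT_RE.findall(headline)]
    return any(p > min_percent for p in percents)
```

For example, matches_criteria("Oil surges 7.5% after OPEC cut", ["oil", "surge"], 5) is true, while the same call on "Oil edges up 1.2%" is false.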

As a later development, outside the scope of this POC but potentially useful for a production application if the conclusions of this experiment turn out positive, a search engine based on machine learning, specifically semantic search, could be added. This would let users not only choose from a predetermined set of words such as "drop", "surge", or "unstable", but also describe in their own words what they want to be notified about. Semantic search would then retrieve headlines whose meaning matches that description, without the need to maintain a huge synonym table.

**NOTES ON THE PAST**

- The first attempt, scraping a few Twitter feeds, failed miserably: it was not nearly real-time, and it did not scrape all tweets; it seemed to pick up some at random and miss others.
- It was then decided to use the Twitter Streaming API to test its real-time capabilities and to increase the fidelity of the information (that is, every tweet must be captured). It worked, and it is incredibly close to real-time: perhaps a few seconds of delay.
- The stream should follow users, not keywords; the keyword filtering should be done on our server. This is because the Twitter Streaming API combines the "track" and "follow" parameters with OR. So with track="oil,surge" and follow set to MarketWatch's user ID, the stream returns every tweet from any account that contains "oil" or "surge", plus every tweet from MarketWatch.
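Because the API ORs the two parameters, the AND has to be applied locally. A sketch, assuming the author's ID is read from the tweet as a string (tweepy exposes it as status.user.id_str); should_notify is a hypothetical helper name. The author check is included because the follow filter also delivers tweets the followed accounts did not write, for example replies to and retweets of their tweets.

```python
import re

def should_notify(author_id, text, followed_ids, keywords):
    """AND-combine the conditions the Streaming API would OR together:
    the tweet must come from a followed account AND contain every keyword."""
    # The follow filter can also deliver tweets written by other accounts
    # (e.g. replies to a followed account), so check the author explicitly
    if author_id not in followed_ids:
        return False
    # Whole-word, case-insensitive keyword check
    words = set(re.findall(r'[a-z]+', text.lower()))
    return all(k.lower() in words for k in keywords)
```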

**NOTES ON THE FUTURE**

- Next we need to select a few accounts to stream from. Initially, let's select major news outlets, major market players, and specialized market news outlets. The user should nevertheless be able to follow any account he or she likes, combined with the keywords he/she will be tracking.
- Check the error that appears when a user enters a Twitter handle that does not exist. Catch it and show an error message.
- When we reach the point of tracking keywords in the stream, keep in mind that TextBlob may prove useful for correcting spelling errors, handling synonyms and plural forms, and perhaps even translating if any major local outlets publish in a language different from the user's. The TextBlob documentation is here: http://textblob.readthedocs.io/en/dev/quickstart.html#get-word-and-noun-phrase-frequencies
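A sketch of the handle check from the second bullet, assuming a hypothetical lookup callable stands in for the real Twitter API call (in tweepy, looking up a nonexistent account with API.get_user raises an exception). Handles are first validated against Twitter's rules of 1 to 15 letters, digits, or underscores, so malformed input is rejected before any network call. The broad except is a simplification; real code should catch the specific tweepy exception.

```python
import re

# Twitter handles are 1 to 15 characters of letters, digits and underscores
HANDLE_RE = re.compile(r'^@?([A-Za-z0-9_]{1,15})$')

def resolve_handle(handle, lookup):
    """Return (user_id, None) on success or (None, error_message) on failure.

    `lookup` is a hypothetical callable mapping a screen name to a user ID
    and raising an exception when the account does not exist."""
    match = HANDLE_RE.match(handle.strip())
    if match is None:
        return None, '"{}" is not a valid Twitter handle.'.format(handle)
    screen_name = match.group(1)
    try:
        return lookup(screen_name), None
    except Exception:
        return None, 'The Twitter account @{} does not exist.'.format(screen_name)
```

The returned message can then be shown to the user instead of letting the error stop the notebook.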

#### Notifier Function

Function used to create a desktop notification on macOS (OS X) through the terminal-notifier command-line tool.

In [3]:
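# Import module
import subprocess

# The notifier function: shows a macOS desktop notification through the
# terminal-notifier command-line tool (which must be installed)
def notify(title, subtitle, message):
    # Passing the arguments as a list avoids shell-quoting problems with
    # headlines that contain quotes or other special characters
    subprocess.run(['terminal-notifier',
                    '-title', title,
                    '-subtitle', subtitle,
                    '-message', message])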
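Since terminal-notifier only exists on macOS, a fallback that prints to the console when the tool is not installed may help when testing on other systems. A sketch; notify_or_print is a hypothetical name, and it returns whether a desktop notification was actually sent.

```python
import shutil
import subprocess

def notify_or_print(title, subtitle, message):
    """Send a macOS notification if terminal-notifier is available,
    otherwise print the notification. Returns True if it was sent."""
    if shutil.which('terminal-notifier') is None:
        # Not on macOS, or terminal-notifier is not installed
        print('[{}] {}: {}'.format(title, subtitle, message))
        return False
    subprocess.run(['terminal-notifier', '-title', title,
                    '-subtitle', subtitle, '-message', message])
    return True
```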

#### Create Database

1. Creates the database table that stores the news headlines that have been relevant to the user, i.e. that matched his/her criteria.
2. Creates the database table that stores the news obtained by the latest run of the scraper so that the matcher can process them.

In [4]:
# Import module
import sqlite3

# This function creates a database with 2 tables:
# - user_headlines: headlines that have matched the user's criteria
# - realtime_headlines: headlines that are yet to be matched against the user's criteria
def create_database():
    print("started creating database...")
    # Connect to the "market_news_watcher" database (the file is created if missing)
    conn = sqlite3.connect('market_news_watcher.db')
    # Create both tables only if they do not exist yet
    conn.execute("CREATE TABLE IF NOT EXISTS user_headlines(headline TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS realtime_headlines(headline TEXT)")
    conn.commit()
    # Close connection
    conn.close()
    print("finished creating database!")
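A sketch of how incoming headlines could be queued in realtime_headlines, assuming the schema created by create_database. The helper names store_headlines and pending_headlines are hypothetical. The "?" placeholders let sqlite3 handle quoting, which matters for headlines that contain apostrophes.

```python
import sqlite3

def store_headlines(conn, headlines):
    """Queue freshly collected headlines for the matcher."""
    # "?" placeholders let sqlite3 quote the text safely
    conn.executemany("INSERT INTO realtime_headlines(headline) VALUES (?)",
                     [(h,) for h in headlines])
    conn.commit()

def pending_headlines(conn):
    """Return the headlines not yet checked, in insertion order."""
    rows = conn.execute("SELECT headline FROM realtime_headlines ORDER BY rowid")
    return [row[0] for row in rows]
```

In the notebook, conn would come from sqlite3.connect('market_news_watcher.db').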

#### Twitter Stream

In [17]:
# Import modules
import os
import tweepy

# Credentials are read from environment variables instead of being hardcoded
# in the notebook (the variable names are our own choice)
consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']

In [5]:
# Authenticate application
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

In [19]:
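# Override tweepy.StreamListener to add logic to on_status
# (tweepy 3.x API; StreamListener was removed in tweepy 4.0)
class MyStreamListener(tweepy.StreamListener):

    def on_status(self, status):
        # Called once for every tweet received from the stream
        print(status.text)

# Create stream listener, authenticated with the handler created above
myStreamListener = MyStreamListener()
myStream = tweepy.Stream(auth=auth, listener=myStreamListener)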
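Printing status.text is enough for a first test, but streamed tweets longer than 140 characters arrive truncated, with the full text in an extended_tweet field. A sketch of a helper that on_status could call instead, assuming tweepy 3.x Status objects, which expose the tweet's JSON fields as attributes; tweet_text is a hypothetical name.

```python
def tweet_text(status):
    """Return the full text of a streamed tweet."""
    # Long tweets are flagged as truncated; their complete text is stored
    # in the extended_tweet payload under "full_text"
    if getattr(status, 'truncated', False) and hasattr(status, 'extended_tweet'):
        return status.extended_tweet['full_text']
    return status.text
```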

The following is the list of news outlets, major market players, and specialized market news Twitter feeds the stream will follow.

1. Currency rates:
    - @ForexLive: https://twitter.com/ForexLive
    - @currencynews: https://twitter.com/currencynews
    - @FXstreetNews: https://twitter.com/fxstreetnews
2. Market news:
    - @MarketCurrents: https://twitter.com/marketcurrents (SeekingAlpha for Breaking News)
    - @WSJmarkets: https://twitter.com/wsjmarkets
    - @markets: https://twitter.com/markets (Bloomberg)
    - @MarketWatch: https://twitter.com/MarketWatch
    - @Reuters: https://twitter.com/Reuters
    - @YahooFinance: https://twitter.com/YahooFinance

In [None]:
# Twitter user IDs of the accounts listed above
twitter_ids = {
    "MarketCurrents": "15296897",
    "WSJmarkets": "28164923",
    "markets": "69620713",
    "MarketWatch": "624413",
    "Reuters": "1652541",
    "YahooFinance": "19546277",
    "ForexLive": "19399038",
    "currencynews": "24349486",
    "FXstreetNews": "27652717"
}
# Start stream (sync): blocks and calls on_status for every tweet from these accounts
myStream.filter(follow=list(twitter_ids.values()))
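When a notification fires, showing the source account is more useful than a numeric ID. A small sketch, assuming the author ID is compared as a string (status.user.id_str); notification_subtitle is a hypothetical helper and the dictionary below is a subset of twitter_ids.

```python
# Subset of the twitter_ids mapping (handle -> user ID as a string)
twitter_ids = {
    "MarketWatch": "624413",
    "Reuters": "1652541",
    "ForexLive": "19399038",
}

# Reverse lookup: user ID -> handle
id_to_handle = {user_id: handle for handle, user_id in twitter_ids.items()}

def notification_subtitle(author_id):
    """Return '@handle' for a followed account, or a placeholder otherwise."""
    handle = id_to_handle.get(author_id)
    return '@' + handle if handle else 'Unknown source'
```

The result can be passed as the subtitle argument of the notifier function.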

#### Matcher Function

1. Matches the newly collected market news headlines against the user's search parameters and notifies the user when there is a match.
2. Stores the headlines of the notified news, that is, the news that matched the user's search parameters, in the user_headlines table.
3. Removes from the realtime_headlines table the news that has already been checked against the user's search parameters.
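A minimal sketch of the three steps above, assuming the two tables created by create_database. match_headlines is a hypothetical name; is_match is any predicate over a headline (for example a keyword check), and notify is expected to have the same (title, subtitle, message) signature as the notifier function. Rows are deleted by rowid, so headlines inserted by the stream while the matcher is running are not removed before they have been checked.

```python
import sqlite3

def match_headlines(conn, is_match, notify):
    """Check queued headlines, notify on matches, keep matched ones in
    user_headlines and clear the processed rows from realtime_headlines.
    Returns the number of headlines processed."""
    rows = conn.execute(
        "SELECT rowid, headline FROM realtime_headlines ORDER BY rowid").fetchall()
    for rowid, headline in rows:
        if is_match(headline):
            # Steps 1 and 2: notify the user and record the notified headline
            notify('Market News', 'Keyword match', headline)
            conn.execute("INSERT INTO user_headlines(headline) VALUES (?)", (headline,))
        # Step 3: remove the headline from the queue whether or not it matched
        conn.execute("DELETE FROM realtime_headlines WHERE rowid = ?", (rowid,))
    conn.commit()
    return len(rows)
```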