# Twitter Example
In order to use all of this, though, we need to set up a Developer API account with Twitter and create an application to get credentials. Review the video for instructions on how to do this, or, if you are already familiar with the process, get the credentials from:

    https://apps.twitter.com/
    
Once you have that, you also need to install python-twitter, a Python library for connecting your Python code to your Twitter developer account.

You probably won't be able to run this example right after the previous one in the same notebook, since only one SparkContext can be active per Python process. Restart your kernel first.

Let's get started!

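Begin by running the TweetRead.py file. Make sure to add your own IP address and your credential keys. The host and port it listens on must match the ones passed to `socketTextStream` below (currently 127.0.0.1 and port 5555).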
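TweetRead.py itself is not shown here. As a rough, hypothetical stand-in, the sketch below illustrates the part Spark depends on: a TCP server that accepts one client and writes newline-terminated text to it, which is the format `socketTextStream` reads. The function name `serve_lines` and the fixed list of strings are assumptions for illustration. The real script fetches tweets through the Twitter API instead.

```python
import socket


def serve_lines(lines, host="127.0.0.1", port=5555):
    """Accept one client and send each line followed by a newline.

    Hypothetical stand-in for TweetRead.py: the "tweets" here are a fixed
    list of strings rather than a live feed from the Twitter API.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        conn, _addr = server.accept()  # blocks until Spark (or another client) connects
        with conn:
            for line in lines:
                # socketTextStream splits records on "\n", so embedded newlines
                # are flattened and each tweet is terminated by exactly one
                conn.sendall((line.replace("\n", " ") + "\n").encode("utf-8"))
```

Because `accept()` blocks, a server like this would run in its own process (as TweetRead.py does) and be started before `ssc.start()`.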

# Imports and Configurations

In [None]:
# May cause deprecation warnings, safe to ignore, they aren't errors
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import desc

In [None]:
sc = SparkContext()

In [None]:
ssc = StreamingContext(sc, 10)  # Batch interval: 10 seconds
sqlContext = SQLContext(sc)  # Needed for toDF() and the SQL query below

In [None]:
# Spark connects as a client to the socket TweetRead.py is listening on
socket_stream = ssc.socketTextStream("127.0.0.1", 5555)

In [None]:
lines = socket_stream.window(20)  # 20-second window: the two most recent 10-second batches
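Each windowed RDD is the union of the batches that fall inside the window. With the 10-second batch interval set above and a 20-second window, that is the two most recent batches. Since no slide interval is given, the window advances one batch at a time, so consecutive windows overlap (the window length must be a multiple of the batch interval). The plain-Python model below, using a hypothetical helper `sliding_windows`, only illustrates this overlap. It is a conceptual sketch, not how Spark implements windowing.

```python
from collections import deque


def sliding_windows(batches, window_batches):
    """Yield, after every batch, the records of the last `window_batches` batches.

    Models StreamingContext(sc, 10) with .window(20): window_batches = 20 // 10 = 2,
    and the slide defaults to one batch interval.
    """
    recent = deque(maxlen=window_batches)  # older batches fall out automatically
    for batch in batches:
        recent.append(batch)
        yield [record for b in recent for record in b]
```

For example, `list(sliding_windows([["a"], ["b"], ["c"]], 2))` gives `[["a"], ["a", "b"], ["b", "c"]]`: every record is seen in two consecutive windows.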

In [None]:
from collections import namedtuple
fields = ("tag", "count" )
Tweet = namedtuple( 'Tweet', fields )

In [None]:
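# Get the top 10 hashtags associated with the search key
def register_top_tags(rdd):
    # toDF() cannot infer a schema from an empty RDD, so skip windows with no hashtags
    if rdd.isEmpty():
        return
    ( rdd.toDF()                        # Converts the Tweet objects to a DataFrame
      .sort( desc("count") )            # Sorts by count, highest first
      .limit(10)                        # Keeps the top 10
      .registerTempTable("tweets") )    # Registers them as a SQL table

( lines.flatMap( lambda text: text.split( " " ) ) # Splits each tweet into words
  .filter( lambda word: word.lower().startswith("#") ) # Keeps only hashtags
  .map( lambda word: ( word.lower(), 1 ) ) # Lower-cases the hashtag, pairs it with 1
  .reduceByKey( lambda a, b: a + b ) # Sums the occurrences of each hashtag
  .map( lambda rec: Tweet( rec[0], rec[1] ) ) # Stores each pair in a Tweet object
  .foreachRDD( register_top_tags ) ) # Runs the function above on every window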
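To check the hashtag logic without a running stream, the same chain of transformations can be mirrored on a plain list of strings. The sketch below is a hypothetical local equivalent (`top_hashtags` is not part of the notebook): `Counter` stands in for `reduceByKey`, and `most_common(n)` for `sort(desc("count")).limit(n)`. One difference: when counts tie, Spark's sort does not promise any particular order, whereas `most_common` keeps first-seen order.

```python
from collections import Counter, namedtuple

Tweet = namedtuple("Tweet", ("tag", "count"))


def top_hashtags(lines, n=10):
    """Plain-Python version of the DStream pipeline, for testing without Spark."""
    counts = Counter(
        word.lower()                     # map: lower-case the hashtag
        for text in lines
        for word in text.split(" ")      # flatMap: split on single spaces
        if word.lower().startswith("#")  # filter: keep hashtags only
    )                                    # Counter plays the role of reduceByKey(a + b)
    # sort(desc("count")).limit(n); ties keep first-seen order here
    return [Tweet(tag, count) for tag, count in counts.most_common(n)]
```

For example, `top_hashtags(["I love #Spark and #python", "#spark streaming"], n=2)` returns `[Tweet(tag='#spark', count=2), Tweet(tag='#python', count=1)]`.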

__________
# TweetRead.py is now running and serving tweets 
__________

In [None]:
ssc.start()    

In [None]:
import time
from IPython import display
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline 

In [None]:
import pandas as pd

In [None]:
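df = None
count = 0

# Poll the "tweets" table 50 times, every 3 seconds, and redraw the chart
while count < 50:

    time.sleep( 3 )
    top_10_tweets = sqlContext.sql( 'SELECT tag, count FROM tweets' )
    top_10_df_current = top_10_tweets.toPandas()
    # Accumulate every snapshot; the bar plot shows each tag's mean count across them
    if df is None:
        df = top_10_df_current
    else:
        df = pd.concat([df, top_10_df_current], ignore_index=True)
    display.clear_output(wait=True)  # Replace the previous chart instead of stacking them
    plt.figure( figsize = ( 10, 8 ) )
    sns.barplot( x="count", y="tag", data=df )
    plt.show()
    count = count + 1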
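Because the loop concatenates every snapshot into `df`, and because it polls every 3 seconds while the table only changes once per 10-second batch, the same window is often counted several times. `sns.barplot` then draws one bar per tag whose length is the mean `count` over all rows for that tag (its default estimator), with error bars, rather than the latest count. The stdlib sketch below, using a hypothetical helper `mean_count_per_tag`, computes the bar lengths the chart ends up showing.

```python
from collections import defaultdict
from statistics import mean


def mean_count_per_tag(snapshots):
    """Average each tag's count over the snapshots it appears in.

    Mirrors the bar lengths of sns.barplot(x="count", y="tag", data=df) when df
    is the concatenation of every polled (tag, count) snapshot.
    """
    seen = defaultdict(list)
    for snapshot in snapshots:
        for tag, count in snapshot:
            seen[tag].append(count)
    return {tag: mean(counts) for tag, counts in seen.items()}
```

If you would rather see only the current top 10, plotting `top_10_df_current` instead of `df` avoids the averaging.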

In [None]:
ssc.stop()  # By default this also stops the underlying SparkContext