Now that you've looked at an example of how we can use Spark in streaming mode, it's time to try it out on your own.

For this exercise, you’ll use the [Twitter streaming API](https://developer.twitter.com/en/docs/tweets/filter-realtime/api-reference/post-statuses-filter.html) to analyze tweets in real time. To use Twitter’s API, you’ll first need to [create an account](https://twitter.com/signup) if you don’t already have one. Then you’ll need to register an app to get the API credentials. Ian Anderson Gray has written great [instructions for registering an app with Twitter](https://iag.me/socialmedia/how-to-create-a-twitter-app-in-8-easy-steps/). When asked for your website, use the URL of your Big Data GitHub repo.

This notebook (which is also available in your GitHub repo) contains the starter code to pull tweets from the Twitter streaming API and pass them to your Spark instance.

For this challenge, you need to:

* Pose a question that the live tweet stream can answer.
* Use Spark to process the incoming tweets and compute the data that answers your question.
* Create visualizations of your results.
* Draw a conclusion from what you found.

When you're done, submit a link to your notebook below.

In [None]:
import socket
import sys
import requests
import requests_oauthlib
import json

In [None]:
# Replace the values below with yours
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN'
ACCESS_SECRET = 'YOUR_ACCESS_SECRET'
CONSUMER_KEY = 'YOUR_CONSUMER_KEY'
CONSUMER_SECRET = 'YOUR_CONSUMER_SECRET'
my_auth = requests_oauthlib.OAuth1(CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_SECRET)

In [None]:
def get_tweets():
    url = 'https://stream.twitter.com/1.1/statuses/filter.json'
    query_data = [('language', 'en'), ('locations', '-130,-20,100,50'), ('track', '#')]
    # Pass the parameters via `params` so requests percent-encodes them;
    # a raw '#' in a hand-built URL would be read as the start of a fragment.
    response = requests.get(url, params=query_data, auth=my_auth, stream=True)
    print(response.url, response)
    return response
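
The `track` value `#` only reaches the API if it is percent-encoded, because an unencoded `#` marks the start of a URL fragment. The following is a minimal sketch of that encoding using the standard library's `urllib.parse.urlencode`; the helper name `build_query_url` is hypothetical and not part of the starter code.

```python
from urllib.parse import urlencode, urlsplit


def build_query_url(base_url, query_data):
    """Return base_url with query_data appended as a percent-encoded query string."""
    return base_url + '?' + urlencode(query_data)


url = 'https://stream.twitter.com/1.1/statuses/filter.json'
query_data = [('language', 'en'), ('locations', '-130,-20,100,50'), ('track', '#')]

query_url = build_query_url(url, query_data)
print(query_url)

# After encoding, the '#' is part of the query and the URL has no fragment.
print(urlsplit(query_url).query)
print(repr(urlsplit(query_url).fragment))
```

Printing the encoded URL before you connect is a quick way to confirm that every filter parameter is actually being sent.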

In [None]:
def send_tweets_to_spark(http_resp, tcp_connection):
    for line in http_resp.iter_lines():
        if not line:
            continue  # skip the stream's empty keep-alive lines
        try:
            full_tweet = json.loads(line)
            tweet_text = full_tweet['text']
            print("Tweet Text: " + tweet_text)
            print("------------------------------------------")
            # Sockets send bytes in Python 3, so encode the text first
            tcp_connection.sendall((tweet_text + '\n').encode('utf-8'))
        except Exception as e:
            print("Error: %s" % e)
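
The per-line handling can also be pulled out into a small pure function, which makes it easy to check without a live stream or an open socket. This is a sketch under assumptions: the name `tweet_line_to_payload` is hypothetical, and it assumes that stream lines without a `text` field (for example, notices rather than tweets) should simply be skipped.

```python
import json


def tweet_line_to_payload(line):
    """Return the UTF-8 bytes to send for one raw stream line, or None to skip it."""
    if not line:
        # Empty lines carry no tweet, so there is nothing to forward.
        return None
    message = json.loads(line)
    text = message.get('text')
    if text is None:
        # Assumption: messages without a 'text' field are not tweets.
        return None
    # One tweet per line: terminate with a newline and encode for the socket.
    return (text + '\n').encode('utf-8')


print(tweet_line_to_payload(b'{"text": "hello #spark"}'))
print(tweet_line_to_payload(b''))
print(tweet_line_to_payload(b'{"limit": {"track": 1}}'))
```

Returning `None` for skipped lines keeps the decision about what to forward separate from the code that writes to the socket.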

In [None]:
TCP_IP = "localhost"
TCP_PORT = 9009
conn = None
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind((TCP_IP, TCP_PORT))
s.listen(1)
print("Waiting for TCP connection...")
conn, addr = s.accept()
print("Connected... Starting to get tweets.")
resp = get_tweets()
send_tweets_to_spark(resp, conn)
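
Before pointing Spark at the socket, it can help to confirm that newline-delimited text really arrives over TCP. The sketch below is a plain-Python stand-in for the receiving side, not Spark itself. The names `read_lines` and `serve_once` are hypothetical, and the demo uses a throwaway server on an OS-assigned port with made-up messages instead of the notebook's port 9009 and real tweets.

```python
import socket
import threading


def read_lines(host, port, max_lines):
    """Connect to host:port and return up to max_lines newline-delimited strings."""
    lines = []
    with socket.create_connection((host, port)) as client:
        with client.makefile('r', encoding='utf-8') as stream:
            for line in stream:
                lines.append(line.rstrip('\n'))
                if len(lines) >= max_lines:
                    break
    return lines


def serve_once(server, messages):
    """Accept one connection and send each message as a newline-terminated line."""
    conn, _ = server.accept()
    with conn:
        for message in messages:
            conn.sendall((message + '\n').encode('utf-8'))


# Throwaway server for the demo: port 0 asks the OS for any free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)
port = server.getsockname()[1]

worker = threading.Thread(target=serve_once, args=(server, ['first tweet', 'second tweet']))
worker.start()
received = read_lines('127.0.0.1', port, max_lines=2)
worker.join()
server.close()

print(received)
```

To check the notebook's own server instead, run the cell above until it prints that it is waiting for a connection, then call `read_lines('localhost', 9009, max_lines=5)` from a separate Python process.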