# APIers
You get the Twitter content below as a learning boost. Once you have played with it a bit, you can look at applying similar principles to your specific APIs.

### Setting up credentials
The following needs to be re-run every time you start a new scraping session. Tweepy must be installed first, which you do from the command line: `$ pip install tweepy`.

In [None]:
#from https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/

import tweepy
from tweepy import OAuthHandler
 
consumer_key = 'Your key here'
consumer_secret = 'Your secret here'
access_token = 'Your key here'
access_secret = 'Your secret here'
 
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
 
api = tweepy.API(auth)
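
If you would rather not paste keys into a notebook you might share, one option is to read them from environment variables. This is only a sketch: the variable names (`TWITTER_CONSUMER_KEY` and friends) and the `load_credentials` helper are made up for illustration, and the demo uses a stand-in dictionary so it runs without real keys.

```python
import os

# Hypothetical variable names; use whatever names you export in your shell.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four credentials as a dict, or raise if any are missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not environ.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[var] for name, var in CREDENTIAL_VARS.items()}


# Demo with a stand-in environment so the sketch runs without real keys.
fake_environ = {var: "placeholder-" + name for name, var in CREDENTIAL_VARS.items()}
creds = load_credentials(fake_environ)
print(sorted(creds))
```

With real variables exported, `load_credentials()` would return values you could hand to `OAuthHandler` and `set_access_token` in place of the hard-coded strings.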

### Scraping 100 Tweets
This next bit of code will scrape the 100 most recent tweets from Twitter for a given search term (in this case "#trump"). With the credentials set up and the tweepy library loaded, all the magic happens on just one line. The rest is just looking at the results.

In [None]:
# 100 is the max that the api search will return.
# Written for Tweepy 3.x; Tweepy 4.x renamed this method to api.search_tweets.
results = api.search("#trump", count=100)
print(len(results))
for item in results:
    print(item.text)
    print(type(item))
#    break

### Save the 100 tweets
If you want to save the results to a file, run something like the following. It writes one JSON object per line.

In [None]:
import json
with open("trumpTweets4.json", "w") as f:
    print(len(results))
    for tweet in results:
        f.write(json.dumps(tweet._json) + "\n")
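
Once the tweets are on disk you will probably want to load them again later. The sketch below shows one way to read a one-object-per-line file back into a list. The records and the file name `sampleTweets.json` are stand-ins so the example runs on its own; real tweets have many more fields.

```python
import json
import os
import tempfile

# Stand-in records; real tweets saved via tweet._json have many more fields.
sample_tweets = [{"id": 1, "text": "first tweet"}, {"id": 2, "text": "second tweet"}]

path = os.path.join(tempfile.mkdtemp(), "sampleTweets.json")

# Write one JSON object per line, the same layout as the cell above.
with open(path, "w") as f:
    for tweet in sample_tweets:
        f.write(json.dumps(tweet) + "\n")

# Read the file back, skipping any blank lines.
with open(path) as f:
    loaded = [json.loads(line) for line in f if line.strip()]

print(len(loaded))
for tweet in loaded:
    print(tweet["text"])
```

Note that what comes back is a list of plain dictionaries, so you use `tweet["text"]` rather than `tweet.text`.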

### Tokenize the tweets
We're going to break up each tweet into its respective "words" and count them.  You will likely get an error the first time you run the NLTK code.  _READ_ the error.  You are unlikely to succeed in implementing the fix inside Jupyter Notebook, so I suggest you try doing it with Python on the command line instead.  I'll give you a hint:

    $ python
    <bunch of stuff about Python>
    >>>

In [None]:
import json
from nltk.tokenize import TweetTokenizer
from collections import Counter
from nltk.corpus import stopwords
import string

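print(len(results))

punctuation = list(string.punctuation)
# Stop list: common English words, punctuation, and Twitter boilerplate tokens.
stop = stopwords.words('english') + punctuation + ['rt', 'via', 'RT', '…']

tokenizer = TweetTokenizer()
count_all = Counter()
for tweet in results:  # `results` comes from the search cell above
    text = tweet.text
    #print(text)
    # Lowercase before checking the stop list so "The" is filtered like "the".
    terms = [term.lower() for term in tokenizer.tokenize(text) if term.lower() not in stop]
    count_all.update(terms)
# Print the 10 most frequent words
print(count_all.most_common(10))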
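
If NLTK is giving you trouble and you just want to see the counting logic in isolation, here is a rough sketch that needs nothing beyond the standard library. The assumptions are loud ones: `str.split()` stands in for `TweetTokenizer`, the stop list is a tiny made-up one, and the sample texts are invented. Each term is lowercased before it is compared with the stop list, so "The" and "RT" are dropped along with "the" and "rt".

```python
from collections import Counter
import string

# Tiny stand-in stop list; NLTK's English list is much longer.
stop = set(["the", "a", "is", "rt"]) | set(string.punctuation)

# Invented sample texts standing in for tweet.text values.
sample_texts = [
    "RT The wall is a wall",
    "the vote is the vote !",
]

counts = Counter()
for text in sample_texts:
    # str.split() is a crude substitute for TweetTokenizer: whitespace only.
    terms = [term.lower() for term in text.split() if term.lower() not in stop]
    counts.update(terms)

print(counts.most_common(2))
```

Try adding a word to `stop` or a new sentence to `sample_texts` and watch how the counts change.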

### Plotting Sample
This code is completely self-contained.  It is here so that you can play with it and get a feeling for how the `matplotlib` library works.  Change the values.  Comment out lines.  Play with it.

In [None]:
import matplotlib.pyplot as plt

#special ipython/jupyter command that keeps the output in this window rather than opening another one.
%matplotlib inline 

plt.figure()
plt.plot([1, 2, 3, 4], [10, 20, 25, 30], color='lightblue', linewidth=3)
plt.scatter([0.3, 3.8, 1.2, 2.5], [11, 25, 9, 26], color='darkgreen', marker='^')
plt.bar([1,2,3,4],[12,3,25,18], width=0.2, align='center')
plt.xlim(0.5, 4.5)
plt.ylim(0,50)
plt.show()

### Plotting Your Tweet Data
Now that you have a feel for `matplotlib`, compare what is above to what is below.  Again, make sure that you are testing and poking at the content.

In [None]:
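%matplotlib inline
import matplotlib.pyplot as plt

# Keep only the 10 most common words; dicts preserve this order in Python 3.7+.
count_all_dict = dict(count_all.most_common(10))
#plt.figure(figsize=(100, 40))
# Wrap values() and keys() in list() so matplotlib receives plain sequences.
plt.bar(range(len(count_all_dict)), list(count_all_dict.values()), align='center')
plt.xticks(range(len(count_all_dict)), list(count_all_dict.keys()), rotation='vertical')

plt.show()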
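
The data preparation in that cell can also be done without going through a dictionary. A minimal sketch, using invented counts so it runs on its own, is to unzip the `(word, count)` pairs from `most_common` into two parallel sequences:

```python
from collections import Counter

# Stand-in counts; in the notebook these would come from count_all.
sample_counts = Counter({"vote": 7, "wall": 5, "news": 3, "tax": 1})

top = sample_counts.most_common(3)   # list of (word, count) pairs, highest first
labels, values = zip(*top)           # unzip into two parallel tuples
positions = range(len(labels))       # x positions for the bars

print(labels)
print(values)
# These could then be passed to plt.bar(positions, values, align='center')
# and plt.xticks(positions, labels, rotation='vertical').
```

This keeps the bars in most-common-first order no matter which Python version you are running.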

### Reading from the Twitter stream...
What is below will open a Twitter stream and pull down every tweet that it can until you either turn it off or run out of space to store it.  _Be warned_, you can collect a lot of data very quickly (a popular search term will generate hundreds of megabytes of data in a day).

In [None]:
# from https://marcobonzanini.com/2015/03/02/mining-twitter-data-with-python-part-1/
# IMPORTANT: this cell will run until you stop it, it crashes, or you run out of space.
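# Written for Tweepy 3.x; Tweepy 4.x removed StreamListener.
from tweepy import Stream
from tweepy.streaming import StreamListener

class MyListener(StreamListener):

    def on_data(self, data):
        # Append each raw tweet (a JSON string) to the output file.
        try:
            with open('trumpStream.json', 'a') as f:
                f.write(data)
        except Exception as e:
            # Catch Exception rather than BaseException so that
            # interrupting the kernel can still stop the cell.
            print("Error on_data: %s" % str(e))
        return True  # keep the stream open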
 
    def on_error(self, status):
        print(status)
        return True
 
twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(track=['#trump'])
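
Given how fast the data piles up, you may want a listener that stops itself after a fixed number of tweets. The sketch below shows only the counting logic, with a made-up `CappedWriter` class and a fake stream so that it runs without Tweepy or credentials. It assumes that returning `False` from `on_data` is what tells a Tweepy 3.x stream to disconnect; check that against the version you have installed before relying on it.

```python
import io


class CappedWriter:
    """Hypothetical helper: writes records to a file-like object until a cap is hit."""

    def __init__(self, out, max_records):
        self.out = out
        self.max_records = max_records
        self.seen = 0

    def on_data(self, data):
        # Refuse anything that arrives after the cap has been reached.
        if self.seen >= self.max_records:
            return False
        self.out.write(data)
        self.seen += 1
        # True means "keep going"; False means "stop now".
        return self.seen < self.max_records


# Fake stream of ten JSON lines standing in for incoming tweets.
buffer = io.StringIO()
writer = CappedWriter(buffer, max_records=3)
fake_stream = ['{"id": %d}\n' % i for i in range(10)]
for record in fake_stream:
    if writer.on_data(record) is False:
        break

print(writer.seen)
```

To use the idea for real you would move the counter and the `return` logic into `MyListener.on_data` and keep writing to the file as before.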

Want to check that you know what you are doing?  Here are some things to try:

0. Add comments to the code that are the equivalent of _you_ explaining to your future self what the code is doing.
1. Change the code so that when it is run you are asked what to search for and your response becomes the search term.
2. Change the code so that the 100 tweet grabber runs all at once rather than with you needing to run it one cell at a time.
3. Play with the sentiment analysis tool in the other notebook.