# Accessing Data From Twitter - Lab 

## Introduction
In this lab, we shall use our Twitter developer account and the keys we generated in the previous lesson to make some API calls to Twitter. We shall look at several ways of accessing Twitter data, each suited to a different use case. We shall also get an introduction to the `tweepy` Python library, which helps with mining and parsing tweets.

## Objectives
You will be able to:
* Successfully request tweet data from the Twitter API using the Tweepy library
* Understand and explain the concept of semi-structured data
* Parse tweet data and perform basic twitter analysis

## `tweepy`

Tweepy is an open-source library, hosted on GitHub, that lets Python communicate with the Twitter platform and use its API. See [Tweepy's official documentation](https://pythonhosted.org/tweepy/index.html) for details. Tweepy can be installed with pip, as shown below:

In [None]:
# Uncomment and pip install tweepy if you haven't done so already
# !pip install tweepy

We can now import tweepy into our Python working environment.

In [4]:
# Import tweepy
import tweepy

We are now ready to set up tweepy with our consumer keys and access tokens.

### Using tweepy
Tweepy supports accessing Twitter through OAuth. Twitter no longer accepts Basic Authentication, so OAuth is now the only way to use the Twitter API. Before we can create the API object, however, we must authenticate ourselves with our developer credentials.
* Enter your credentials into `access_token`, `access_token_secret`, `consumer_key`, and `consumer_secret` below

In [5]:
# Set credential variables with your own values (never share or publish real keys)
consumer_key = "YOUR_CONSUMER_KEY"
consumer_secret = "YOUR_CONSUMER_SECRET"
access_token = "YOUR_ACCESS_TOKEN"
access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
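Hard-coding keys in a notebook makes them easy to leak when the notebook is shared. A minimal sketch of an alternative, assuming you export your keys as environment variables before starting Jupyter; the variable names `TWITTER_CONSUMER_KEY` and so on, and the helper `load_twitter_credentials`, are our own hypothetical choices and are not defined by Twitter or tweepy:

```python
import os

# Hypothetical environment variable names; Twitter and tweepy do not define these
CREDENTIAL_ENV_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_twitter_credentials(environ=os.environ):
    """Read the four OAuth 1.0a credentials from environment variables.

    Raises KeyError listing every variable that is missing or empty.
    """
    missing = [var for var in CREDENTIAL_ENV_VARS.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing credential variables: " + ", ".join(missing))
    # Map each credential name to the value of its environment variable
    return {name: environ[var] for name, var in CREDENTIAL_ENV_VARS.items()}
```

In the notebook you could then write `creds = load_twitter_credentials()` and use `creds["consumer_key"]` and the other values wherever the literal strings appear above.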

### Creating the Authentication Object

Our next step is to create a tweepy `OAuthHandler` instance with our consumer key and secret, and then set the access token and secret with its `set_access_token()` method. We can then create an API object from this authentication handler.

We shall set it up as shown below:

```python
# Create the tweepy authentication object with consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# Setting your access token and secret
auth.set_access_token(access_token, access_token_secret)
# Creating the API object while passing in auth information
api = tweepy.API(auth)
```

In [6]:
# Paste the code above here, with your credentials, to create the OAuthHandler instance and the API object

# Create the tweepy authentication object with consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# Setting your access token and secret
auth.set_access_token(access_token, access_token_secret)
# Creating the API object while passing in auth information
api = tweepy.API(auth)

Now we can start using the Twitter API through the `api` object.

NOTE: If you have a web application and need to supply a callback URL dynamically, pass it in as shown below:
```python
auth = tweepy.OAuthHandler(consumer_key, consumer_secret, callback=callback_url)
```

### The API Object - `home_timeline()` method
A detailed overview of Tweepy's API object, including the methods it supports to post, retrieve, and select tweets, can be found in [Tweepy's API Documentation](https://pythonhosted.org/tweepy/api.html#api-reference).

We can collect tweets from our home timeline directly with `api.home_timeline()`, which returns the 20 most recent tweets by default (including retweets). To change the number of tweets, pass the `count` parameter (up to 200 per request); for example, `count=100` requests 100 tweets.

By default, the text of a tweet is truncated to its first 140 characters. We can, however, pass the extra argument `tweet_mode='extended'` to get the full text of the tweet. This is useful if you want to analyze the complete text of tweets rather than only metadata such as geographic locations and hashtags.

In [27]:
# Use the API object to get 10 tweets from your timeline
# Store the tweets in a variable called my_timeline_tweets

my_timeline_tweets = api.home_timeline(count=10, tweet_mode='extended')


### Tweet JSON

All Twitter APIs that return Tweets provide that data encoded using JavaScript Object Notation (JSON). Each Tweet has an **author**, a **message**, a **unique ID**, a **timestamp** of when it was posted, and sometimes **geo metadata** shared by the user. Each User has a Twitter **name**, an **ID**, a **number of followers**, and most often an account **bio**.

With each Tweet we also get **"entity"** objects, which are arrays of common Tweet contents such as **hashtags**, **mentions**, **media**, and **links**. If there are links, the JSON payload can also provide metadata such as the fully unwound URL and the webpage’s title and description.

Here is an abridged example of a Tweet's JSON structure:
```json
{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "text": "1\/ Today we\u2019re sharing our vision for the future of the Twitter API platform!\nhttps:\/\/t.co\/XweGngmxlP",
  "user": {
    "id": 2244994945,
    "name": "Twitter Dev",
    "screen_name": "TwitterDev",
    "location": "Internet",
    "url": "https:\/\/dev.twitter.com\/",
    "description": "Your official source for Twitter Platform news, updates & events. Need technical help? Visit https:\/\/twittercommunity.com\/ \u2328\ufe0f #TapIntoTwitter"
  },
  "place": {   
  },
  "entities": {
    "hashtags": [      
    ],
    "urls": [
      {
        "url": "https:\/\/t.co\/XweGngmxlP",
        "unwound": {
          "url": "https:\/\/cards.twitter.com\/cards\/18ce53wgo4h\/3xo1c",
          "title": "Building the Future of the Twitter API Platform"
        }
      }
    ],
    "user_mentions": [     
    ]
  }
}
```
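Because the payload is made of nested dictionaries and lists, and not every tweet carries every field (this is what makes it semi-structured), Python's standard `json` module is enough to explore it. A minimal sketch using an abridged copy of the example above; the `.get()` calls guard fields that may be missing, empty, or null, such as `place`. The `full_name` key we read from `place` is taken from the v1.1 Place object and does not appear in the example itself:

```python
import json

# Abridged copy of the example payload above; the r''' string keeps the \/ escapes for json.loads
raw_tweet = r'''
{
  "created_at": "Thu Apr 06 15:24:15 +0000 2017",
  "id_str": "850006245121695744",
  "user": {"id": 2244994945, "screen_name": "TwitterDev"},
  "place": {},
  "entities": {
    "hashtags": [],
    "urls": [
      {
        "url": "https:\/\/t.co\/XweGngmxlP",
        "unwound": {"title": "Building the Future of the Twitter API Platform"}
      }
    ]
  }
}
'''

tweet = json.loads(raw_tweet)  # becomes nested dicts and lists

# Fields every tweet has can be indexed directly
screen_name = tweet["user"]["screen_name"]
short_links = [u["url"] for u in tweet["entities"]["urls"]]

# Optional, empty, or null fields are safer to read with .get()
place_name = (tweet.get("place") or {}).get("full_name")
hashtags = [h["text"] for h in tweet["entities"].get("hashtags", [])]
link_titles = [u.get("unwound", {}).get("title") for u in tweet["entities"].get("urls", [])]

print(screen_name, place_name, hashtags, short_links, link_titles)
```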

---

We can iterate through the tweets collected in `my_timeline_tweets` to access any property of each tweet. Let's print the user's name with the `.user.name` property and the content of each tweet with the `.text` property (or `.full_text` when `tweet_mode='extended'` is used to get the full text beyond the 140-character limit).

In [29]:
# Loop through all the tweets pulled
for tweet in my_timeline_tweets:
    # Print the author's name and the full text stored in the tweet object
    print(tweet.user.name)
    print(tweet.full_text)
    print('----------------------------------------------------------------------------')

Dr. Data&Science 🎃
RT @kylegriffin1: Lyft is partnering with Voto Latino to take voters to polls in Dodge City, Kansas after the city's only polling site was…
----------------------------------------------------------------------------
Data Science Central
Wrongness of the Nogs https://t.co/UejGJei2bx
----------------------------------------------------------------------------
Dr. Data&Science 🎃
RT @JustJen2015: @dataandpolitics https://t.co/M2mpRXfLB5
----------------------------------------------------------------------------
Data Science Central
Most popular data science keywords on DSC https://t.co/B10wjC7cDq
----------------------------------------------------------------------------
Data Science Central
Data Science and Machine Learning: Great List of Resources https://t.co/qofGIQlPMz
----------------------------------------------------------------------------
Data Science Central
Using Confusion Matrices to Quantify the Cost of Being Wrong https://t.co/Ff1mnXFi46
---------------
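Notice that the first tweet above is a retweet and is still cut off with `…`, even though we requested `tweet_mode='extended'`. As we understand the v1.1 API (and the tweepy FAQ), a retweet's own `full_text` stays truncated, while the untruncated original is available on its `retweeted_status` attribute. A minimal helper, sketched with stand-in objects so it runs without network access; `tweet_text` is our own hypothetical name, not part of tweepy, and the retweet stand-in's text is made up:

```python
from types import SimpleNamespace


def tweet_text(status):
    """Return the most complete text available on a tweepy Status-like object."""
    # For retweets, read the original tweet, whose text is not truncated
    source = getattr(status, "retweeted_status", status)
    # full_text exists only with tweet_mode='extended'; otherwise fall back to text
    return getattr(source, "full_text", None) or getattr(source, "text", "")


# Stand-ins that mimic the attribute access of tweepy Status objects
plain = SimpleNamespace(text="Wrongness of the Nogs https://t.co/UejGJei2bx")
original = SimpleNamespace(full_text="Complete text of the original tweet")
retweet = SimpleNamespace(full_text="RT @someone: Complete text of the orig…",
                          retweeted_status=original)

print(tweet_text(plain))
print(tweet_text(retweet))
```

In the loop above, `print(tweet_text(tweet))` could replace `print(tweet.full_text)`. Note that for retweets this returns the original text without the `RT @user:` prefix.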

Later in this lab, we'll pull the latest tweets from a user of our choice. A look at the Tweepy documentation shows that the `user_timeline()` method is what we need. It has some useful parameters, specifically `id` or `screen_name` (identifying the user) and `count` (the number of tweets to pull). Note that each request returns at most 200 tweets, and the number of requests we can make is capped by Twitter's rate limits.

First, though, let's save the raw JSON of the timeline tweets we have already collected to a file, and then load selected fields from that file into a pandas DataFrame.

In [35]:
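import json

# Each tweepy Status object keeps the raw API response as a dict in its _json attribute
my_list_of_dicts = []
for each_json_tweet in my_timeline_tweets:
    my_list_of_dicts.append(each_json_tweet._json)

# Write all raw tweet dicts to disk as pretty-printed JSON
with open('test.txt', 'w', encoding='utf-8') as file:
    file.write(json.dumps(my_list_of_dicts, indent=4))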

In [37]:
import json
import pandas as pd

my_demo_list = []
# Load the raw tweets saved above
with open('test.txt', encoding='utf-8') as json_file:
    all_data = json.load(json_file)

# Keep only the fields we care about from each tweet
for each_dictionary in all_data:
    my_demo_list.append({'tweet_id': str(each_dictionary['id']),
                         'text': str(each_dictionary['full_text']),
                         'favorite_count': int(each_dictionary['favorite_count']),
                         'retweet_count': int(each_dictionary['retweet_count']),
                         'created_at': each_dictionary['created_at'],
                         })

# Build the DataFrame once, after all tweets have been processed
tweet_json = pd.DataFrame(my_demo_list, columns=['tweet_id', 'text',
                                                 'favorite_count', 'retweet_count',
                                                 'created_at'])

In [40]:
tweet_json

|   | tweet_id | text | favorite_count | retweet_count | created_at |
|---|---|---|---|---|---|
| 0 | 1054544301940600832 | RT @kylegriffin1: Lyft is partnering with Voto... | 0 | 5625 | Tue Oct 23 01:25:28 +0000 2018 |
| 1 | 1054529839309430784 | Wrongness of the Nogs https://t.co/UejGJei2bx | 0 | 1 | Tue Oct 23 00:28:00 +0000 2018 |
| 2 | 1054525325697146880 | RT @JustJen2015: @dataandpolitics https://t.co... | 0 | 1 | Tue Oct 23 00:10:04 +0000 2018 |
| 3 | 1054514738758144000 | Most popular data science keywords on DSC http... | 3 | 2 | Mon Oct 22 23:28:00 +0000 2018 |
| 4 | 1054507965242396672 | Data Science and Machine Learning: Great List ... | 8 | 3 | Mon Oct 22 23:01:05 +0000 2018 |
| 5 | 1054497124434485248 | Using Confusion Matrices to Quantify the Cost ... | 5 | 3 | Mon Oct 22 22:18:00 +0000 2018 |
| 6 | 1054488563868327937 | RT @LaurenRPfeifer: @dataandpolitics Crazy eye... | 0 | 1 | Mon Oct 22 21:43:59 +0000 2018 |
| 7 | 1054488304136019970 | RT @stlbf: @darth, @dataandpolitics, @stlkerri... | 0 | 1 | Mon Oct 22 21:42:57 +0000 2018 |
| 8 | 1054482998672982016 | A curated list of awesome #machinelearning fra... | 5 | 1 | Mon Oct 22 21:21:52 +0000 2018 |
| 9 | 1054482780959191040 | New Marketing Insight from Unsupervised Bayesi... | 4 | 4 | Mon Oct 22 21:21:00 +0000 2018 |
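The `created_at` column above is still plain text, and row 0 shows a retweet with 0 favorites but 5625 retweets (as far as we can tell, a retweet's counts describe the original tweet rather than the retweet itself). A minimal sketch of a first bit of analysis using only the standard library: parse the timestamps with `datetime.strptime`, then find the most-retweeted tweet while skipping retweets. The `startswith("RT @")` check is a rough heuristic of ours; checking for a `retweeted_status` key in the raw JSON is more reliable. The three rows are copied from the table above, and `parse_created_at` is a hypothetical helper name:

```python
from datetime import datetime

# Matches strings such as "Tue Oct 23 01:25:28 +0000 2018";
# %a and %b expect English day and month names (the default C locale)
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_created_at(value):
    """Convert a Twitter created_at string into a timezone-aware datetime."""
    return datetime.strptime(value, TWITTER_TIME_FORMAT)


# Three rows copied from the DataFrame output above (text truncated as displayed)
rows = [
    {"tweet_id": "1054544301940600832", "text": "RT @kylegriffin1: Lyft is partnering with Voto...",
     "favorite_count": 0, "retweet_count": 5625, "created_at": "Tue Oct 23 01:25:28 +0000 2018"},
    {"tweet_id": "1054514738758144000", "text": "Most popular data science keywords on DSC http...",
     "favorite_count": 3, "retweet_count": 2, "created_at": "Mon Oct 22 23:28:00 +0000 2018"},
    {"tweet_id": "1054497124434485248", "text": "Using Confusion Matrices to Quantify the Cost ...",
     "favorite_count": 5, "retweet_count": 3, "created_at": "Mon Oct 22 22:18:00 +0000 2018"},
]

for row in rows:
    row["created_at"] = parse_created_at(row["created_at"])

# Rough heuristic: treat tweets whose text starts with "RT @" as retweets
originals = [row for row in rows if not row["text"].startswith("RT @")]
most_retweeted = max(originals, key=lambda row: row["retweet_count"])
oldest = min(rows, key=lambda row: row["created_at"])

print(most_retweeted["tweet_id"], most_retweeted["retweet_count"])
print(oldest["created_at"].isoformat())
```

With pandas, the same format string can be passed to `pd.to_datetime(tweet_json['created_at'], format=TWITTER_TIME_FORMAT)` to convert the whole column at once.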


To collect tweets from a particular account (take @NatGeo for example), use `api.user_timeline(screen_name='NatGeo', count=100)`.

Let's try pulling the five latest tweets from the @nytimes account.

In [10]:
# The Twitter user whose tweets we want
name = "nytimes"
# Number of tweets to pull
tweet_count = 5
# Call user_timeline with our parameters
results = api.user_timeline(screen_name=name, count=tweet_count)
# Loop through all the tweets pulled
for tweet in results:
    # Print the text stored in the tweet object
    print(tweet.text)

"Stay-at-home moms in Nebraska who have a limited grocery budget to live off of — no politician can understand that… https://t.co/DDiuNWChSd
Happy Monday! In case you need this today... https://t.co/XBEPHZwA2Q
In his morning tweets, President Trump attempted to stoke fear about a caravan of migrants and blamed Democrats for… https://t.co/OPDascYtpJ
RT @nytpolitics: It’s Monday, President Trump and Ted Cruz are friends again, and there are 15 days until the midterm elections https://t.c…
"The hurt that's been caused to people of faith by people of faith, that's just really hard to come to terms with."… https://t.co/1mpo9GvzIA

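A single `user_timeline` request returns at most 200 tweets. To collect more, v1.1 timelines are paged backwards with the `max_id` parameter: each new request asks for tweets whose IDs are at most one less than the oldest ID seen so far. tweepy's `Cursor` can handle this for you (for example `tweepy.Cursor(api.user_timeline, screen_name=name).items(500)`); the sketch below spells out the idea with a hypothetical `fetch_page(max_id)` callable and a fake in-memory timeline, so it runs without network access:

```python
def collect_timeline(fetch_page, limit):
    """Page backwards through a timeline until `limit` tweets are collected.

    fetch_page(max_id) is a hypothetical callable returning a list of tweet
    dicts (newest first, each with an integer "id"); max_id=None means
    "start from the newest tweet".
    """
    collected = []
    max_id = None
    while len(collected) < limit:
        page = fetch_page(max_id)
        if not page:  # no older tweets left
            break
        collected.extend(page)
        # Next request: only tweets strictly older than the oldest one seen
        max_id = min(tweet["id"] for tweet in page) - 1
    return collected[:limit]


# A fake timeline of 7 tweets (IDs 107 down to 101), served 3 per request
FAKE_IDS = list(range(107, 100, -1))


def fake_fetch_page(max_id, page_size=3):
    eligible = [i for i in FAKE_IDS if max_id is None or i <= max_id]
    return [{"id": i} for i in eligible[:page_size]]


tweets = collect_timeline(fake_fetch_page, limit=5)
print([tweet["id"] for tweet in tweets])
```

With tweepy, `fetch_page` could wrap `api.user_timeline(screen_name=name, count=200, max_id=max_id)` (passing `max_id` only once it is set) and convert each result with `._json`; this wrapper is our assumption, and Twitter's rate limits still apply to every request it makes.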

### Finding Tweets Using a Keyword
Let's do one last example: getting the most recent tweets that contain a keyword. This is useful for monitoring specific topics on Twitter, or for seeing how your business is being mentioned. Let's say we want to see how Twitter has been mentioning Brexit.

In [18]:
# The search term we want to find
query = "brexit"
# Language code (follows the ISO 639-1 standard)
language = "en"
# Call the search method with our parameters (renamed api.search_tweets in tweepy 4.x)
results = api.search(q=query, lang=language)
# Loop through all the tweets pulled, counting them as we go
c = 0
for tweet in results:
    # Print the author's handle and the text stored in the tweet object
    print(tweet.user.screen_name, "Tweeted:", tweet.text)
    c = c + 1
c

suem1951 Tweeted: RT @DAaronovitch: Another morning another ERGer peddling Brexit fantasies on the radio. This time Mark Francois MP describing a delegation…
timbercouk Tweeted: RT @WalesOnline: How Welsh ports plan to tackle Brexit fears with a plan to boost the economy 

https://t.co/O3WxOdBtZK https://t.co/QD5haE…
ophidianpilot Tweeted: Trainspotting Creator: Soros-Funded Anti-Brexit Campaign Is 'Smug, Patronising, Pish' https://t.co/F9ja3rWXQD via @BreitbartLondon
marthafisherlit Tweeted: @guardian the ghost of #brexit - finally captured on a photo!
ChrisDa1917 Tweeted: RT @labourleave: Superb essay!

Long read: how EU membership undermines the left https://t.co/RX0HvZtnZj #Brexit
D1222221 Tweeted: RT @LeaveMnsLeave: .@Nigel_Farage: Theresa May must stand up to the EU's creepy efforts to impose itself on Brexit Britain https://t.co/I8E…
sureduck Tweeted: RT @MarcusJBall: Our crowdfunded private prosecution case against @BorisJohnson  has now achieved financial backing from almost 7

15
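Search results carry the same `entities` object described earlier, so hashtags can be counted without parsing the tweet text. A minimal sketch that works on raw tweet dictionaries (for example `[tweet._json for tweet in results]`), lower-casing tags so that `#Brexit` and `#brexit` count together. The sample dicts are hand-built stand-ins that keep only the fields the function reads; the first two hashtags come from tweets printed above, and `hashtag_counts` is our own hypothetical helper:

```python
from collections import Counter


def hashtag_counts(tweet_dicts):
    """Count hashtags case-insensitively across raw tweet dictionaries."""
    counts = Counter()
    for tweet in tweet_dicts:
        # entities.hashtags is a list of objects whose "text" omits the leading "#"
        for hashtag in tweet.get("entities", {}).get("hashtags", []):
            counts[hashtag["text"].lower()] += 1
    return counts


# Hand-built stand-ins shaped like tweet._json, keeping only the fields used above
sample = [
    {"entities": {"hashtags": [{"text": "brexit"}]}},  # marthafisherlit's tweet
    {"entities": {"hashtags": [{"text": "Brexit"}]}},  # the retweet of @labourleave
    {"entities": {"hashtags": []}},                    # a tweet without hashtags
    {},                                                # no entities at all: skipped
]

counts = hashtag_counts(sample)
print(counts.most_common(1))
```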
