# Using the Twitter Search API

Working with the API within a Python program is straightforward for both Premium and Enterprise clients.

Our group's Python [tweet parser library](https://github.com/twitterdev/tweet_parser) is required.


Before starting your program, an easy way to define your secrets is to set environment variables. If you are an enterprise client, you authenticate with a (username, password) pair. If you are a premium client, you'll need a bearer token that is passed with each call for authentication.

```
export TWITTER_SEARCH_PW=<password>
export TWITTER_SEARCH_ACCOUNT_NAME=<account_name>
export TWITTER_SEARCH_BEARER_TOKEN=<token>
```
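
As a minimal sketch, these variables can be read from Python with a clear failure message when one is missing. The helper name `require_env` is hypothetical and not part of any library; only the variable names come from the block above.

```python
import os


def require_env(name):
    """Return the value of environment variable `name`, or fail loudly.

    Hypothetical helper for illustration; not part of twittersearch.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"environment variable {name} is not set")
    return value


# Placeholder value so the example runs on its own; in practice the
# variable would already be exported in your shell.
os.environ.setdefault("TWITTER_SEARCH_BEARER_TOKEN", "example-token")
token = require_env("TWITTER_SEARCH_BEARER_TOKEN")
```

Failing early like this tends to be easier to debug than a `KeyError` or an authentication error raised later in the session.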


You will also have to set your endpoint in the program, which determines the API you want to use. There are functions to generate correct API endpoints from this information, as well as flags to use the `counts` endpoint instead of the regular endpoint.
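
To give a rough idea of what an endpoint looks like, here is a sketch that assembles URLs by hand. Both functions are hypothetical helpers, not the library's own endpoint functions, and the URL patterns are an assumption inferred from the example endpoints used in this document; check your developer console for the exact form that applies to your account.

```python
def build_premium_endpoint(product, label):
    """Build a premium search URL, e.g. product="30day", label="dev".

    Hypothetical helper; the pattern is inferred from this document's examples.
    """
    return f"https://api.twitter.com/1.1/tweets/search/{product}/{label}.json"


def build_enterprise_endpoint(product, account, label):
    """Build an enterprise search URL for an account and stream label.

    Hypothetical helper; the pattern is inferred from this document's examples.
    """
    return (
        "https://gnip-api.twitter.com/search/"
        f"{product}/accounts/{account}/{label}.json"
    )


print(build_premium_endpoint("30day", "dev"))
print(build_enterprise_endpoint("fullarchive", "shendrickson", "ogformat"))
```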

The following cell demonstrates the basic setup that will be referenced throughout your program's session. Note that any method of storing your credentials is valid here; I am using environment variables for ease of use.

In [1]:
import os
import json
from twittersearch import ResultStream, gen_rule_payload

## Enterprise Setup

If you are an enterprise customer, you'll need to authenticate with a basic username/password method. You can specify that here:

In [22]:
# set your environment variables here for enterprise access if you need to
# os.environ["TWITTER_SEARCH_ACCOUNT_NAME"] = ""
# os.environ["TWITTER_SEARCH_PW"] = ""

enterprise_endpoint = "<ENTER YOUR ENDPOINT HERE>"
enterprise_search_args = {"username": "agonzales@twitter.com",
                          "password": os.environ["TWITTER_SEARCH_PW"],
                          "endpoint": enterprise_endpoint
                         }

## Premium Setup

Premium customers will use a bearer token for authentication. Use the following cell for setup:

In [23]:
# set your environment variables here for premium access if you need to
# os.environ["TWITTER_SEARCH_BEARER_TOKEN"] = ""

premium_search_endpoint = "https://api.twitter.com/1.1/tweets/search/30day/dev.json"

premium_search_args = {"bearer_token": os.environ["TWITTER_SEARCH_BEARER_TOKEN"],
                       "endpoint": premium_search_endpoint,
                      }

print(premium_search_endpoint)

https://api.twitter.com/1.1/tweets/search/30day/dev.json
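
At the HTTP level, the two authentication methods correspond to different `Authorization` headers: a bearer token is sent as `Bearer <token>`, while a username/password pair is sent as HTTP Basic credentials. The sketch below only illustrates those standard header forms. The helper `auth_headers` is hypothetical and does not describe how the library sends credentials internally.

```python
import base64


def auth_headers(search_args):
    """Return the Authorization header implied by a search-args dict.

    Hypothetical helper: shows the standard HTTP header forms only.
    """
    if search_args.get("bearer_token"):
        return {"Authorization": "Bearer " + search_args["bearer_token"]}
    # HTTP Basic: base64 of "username:password"
    pair = f"{search_args['username']}:{search_args['password']}"
    encoded = base64.b64encode(pair.encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + encoded}


print(auth_headers({"bearer_token": "example-token"}))
print(auth_headers({"username": "user", "password": "pass"}))
```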


There is a function called `gen_rule_payload` that formats search API rules into valid JSON queries. It has sensible defaults, such as pulling more tweets per call than the default 100 (note that a sandbox environment is limited to a maximum of 100 here, so check this if you get errors), not including dates, and defaulting to hourly counts when using the counts API. Discussing the finer points of generating search rules is out of scope for these examples; I encourage you to see the docs to learn the nuances, but for now let's see what a rule looks like.

In [24]:
rule = gen_rule_payload("@robotprincessfi", max_results=100) # testing with a sandbox account
print(rule)

{"query":"@robotprincessfi","maxResults":100}


This rule will match tweets that mention `@robotprincessfi`.
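
The printed rule is plain JSON, so its shape can be reproduced with the standard library. This is only a sketch of the payload format: `build_rule_payload` is a hypothetical stand-in that handles the two fields shown above, whereas `gen_rule_payload` is the function to use in practice and supports more options.

```python
import json


def build_rule_payload(query, max_results=None):
    """Serialize a search rule as compact JSON.

    Hypothetical stand-in for illustration only.
    """
    payload = {"query": query}
    if max_results is not None:
        payload["maxResults"] = max_results
    # Compact separators match the spacing of the printed rule.
    return json.dumps(payload, separators=(",", ":"))


print(build_rule_payload("@robotprincessfi", max_results=100))
```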

From this point, there are two ways to interact with the API. There is a quick method that collects smaller numbers of tweets into memory and requires less thought and knowledge, and there is direct interaction with the `ResultStream` object, which will be introduced later.


## Fast Way

We'll use one of the search-args dictionaries defined above (`premium_search_args` or `enterprise_search_args`) to configure access to the API. The call also takes a valid PowerTrack rule and has options to cut off the search when hitting limits on both the number of tweets and the number of API calls.

We'll be using the `collect_results` function, which has three parameters.

- rule: a valid PowerTrack rule, referenced earlier
- max_results: as the API handles pagination, it will stop collecting when we get to this number
- result_stream_args: configuration args that we've already specified


For the remaining examples, please change the args to either premium or enterprise depending on your usage.

Let's see how it goes:

In [5]:
from twittersearch import collect_results

In [6]:
tweets = collect_results(rule, max_results=500, result_stream_args=premium_search_args) # change this if you need to

using bearer token for authentication


By default, tweet payloads are lazily parsed into a `Tweet` object. A large number of tweet attributes are made available directly, like so:

In [7]:
[(tweet.id, tweet.all_text, tweet.hashtags) for tweet in tweets[0:10]]

[('920754829873606657', "@ericmbudd I'm super cute.", []),
 ('920754352716783616', "@RobotPrincessFi that's super cute", []),
 ('920543141614067712', '@RobotPrincessFi https://t.co/z6AioxZkwE', []),
 ('920383435209891841', '@robotprincessfi hi there Fiona', [])]

Voila, we have some tweets. For interactive environments and other cases where you don't mind collecting your data in a single load and don't need to operate on the stream of tweets or counts directly, I recommend using this convenience function.


## Working with the ResultStream

The `ResultStream` object is configured by the search args and takes the rule and other configuration parameters, including a hard stop on the number of pages to limit your API call usage.

In [25]:
rs = ResultStream(rule_payload=rule, max_results=500, max_pages=1, **premium_search_args)

In [26]:
print(rs)

ResultStream: 
	{
    "username":null,
    "endpoint":"https:\/\/api.twitter.com\/1.1\/tweets\/search\/30day\/dev.json",
    "rule_payload":{
        "query":"@robotprincessfi",
        "maxResults":100
    },
    "tweetify":true,
    "max_results":500
}


There is a method, `.stream`, that seamlessly handles requests and pagination for a given query. It returns a generator, and to grab up to 500 tweets that mention `@robotprincessfi` we can do this:

In [10]:
tweets = list(rs.stream())

using bearer token for authentication
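
To make the pagination and cutoff behavior more concrete, here is a minimal sketch of the kind of loop a paginating generator performs. It assumes each page comes back with a token for the following page; `stream_results`, `fetch_page`, and the fake backend are all hypothetical and are not the library's implementation.

```python
def stream_results(fetch_page, max_results=500, max_pages=None):
    """Yield results page by page until a limit is hit or pages run out.

    `fetch_page(token)` is a hypothetical callable returning
    (results, next_token); next_token is None on the last page.
    """
    token = None
    n_results = 0
    n_pages = 0
    while True:
        results, token = fetch_page(token)
        n_pages += 1
        for item in results:
            if n_results >= max_results:
                return
            n_results += 1
            yield item
        if token is None or n_results >= max_results:
            return
        if max_pages is not None and n_pages >= max_pages:
            return


# Fake three-page backend standing in for the API.
PAGES = {None: ([1, 2, 3], "a"), "a": ([4, 5, 6], "b"), "b": ([7], None)}


def fake_fetch(token):
    return PAGES[token]


print(list(stream_results(fake_fetch, max_results=5)))  # [1, 2, 3, 4, 5]
print(list(stream_results(fake_fetch, max_pages=1)))    # [1, 2, 3]
```

Because the function is a generator, no request is made until the caller starts iterating, and wrapping it in `list(...)` pulls everything into memory at once.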


Tweets are lazily parsed using our Tweet Parser, so tweet data is very easily extractable.

In [11]:
[(tweet.id, tweet.all_text, tweet.hashtags) for tweet in tweets[0:10]]

[('920754829873606657', "@ericmbudd I'm super cute.", []),
 ('920754352716783616', "@RobotPrincessFi that's super cute", []),
 ('920543141614067712', '@RobotPrincessFi https://t.co/z6AioxZkwE', []),
 ('920383435209891841', '@robotprincessfi hi there Fiona', [])]

## Counts API

We can also use the counts API to get counts of tweets that match our rule. Each request will return up to *30* results, and each count request can be bucketed by minute, hour, or day. There is a utility function that will convert your regular endpoint to the count endpoint.

The process is very similar to grabbing tweets, but has some minor differences.


**Caveat - premium sandbox environments do NOT have access to the counts API.**
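
As a rough picture of what the endpoint conversion amounts to, the sketch below rewrites a search URL into a counts URL. It assumes the counts URL is the search URL with `.json` replaced by `/counts.json`; `to_count_endpoint` is a hypothetical illustration, and `change_to_count_endpoint` is the function to use in real code.

```python
def to_count_endpoint(endpoint):
    """Point a search URL at its counts variant.

    Hypothetical illustration under the assumption stated above.
    """
    if endpoint.endswith("/counts.json"):
        return endpoint  # already a counts endpoint
    if not endpoint.endswith(".json"):
        raise ValueError("expected an endpoint ending in .json")
    return endpoint[: -len(".json")] + "/counts.json"


print(to_count_endpoint(
    "https://gnip-api.twitter.com/search/fullarchive/accounts/shendrickson/ogformat.json"
))
```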



In [23]:
from twittersearch import change_to_count_endpoint
count_endpoint = change_to_count_endpoint("https://gnip-api.twitter.com/search/fullarchive/accounts/shendrickson/ogformat.json")

count_args = {"username": "agonzales@twitter.com",
                          "password": os.environ["TWITTER_SEARCH_PW"],
                          "endpoint": count_endpoint,
                         }

count_rule = gen_rule_payload("beyonce", count_bucket="day")

counts = collect_results(count_rule, result_stream_args=count_args)

using username and password for authentication


Our results are pretty straightforward and can be rapidly used.

In [29]:
counts

[{'count': 135320, 'timePeriod': '201711100000'},
 {'count': 68532, 'timePeriod': '201711090000'},
 {'count': 67138, 'timePeriod': '201711080000'},
 {'count': 73017, 'timePeriod': '201711070000'},
 {'count': 52290, 'timePeriod': '201711060000'},
 {'count': 79338, 'timePeriod': '201711050000'},
 {'count': 200519, 'timePeriod': '201711040000'},
 {'count': 160512, 'timePeriod': '201711030000'},
 {'count': 220683, 'timePeriod': '201711020000'},
 {'count': 190959, 'timePeriod': '201711010000'},
 {'count': 121580, 'timePeriod': '201710310000'},
 {'count': 39473, 'timePeriod': '201710300000'},
 {'count': 35441, 'timePeriod': '201710290000'},
 {'count': 36198, 'timePeriod': '201710280000'},
 {'count': 36149, 'timePeriod': '201710270000'},
 {'count': 34197, 'timePeriod': '201710260000'},
 {'count': 41497, 'timePeriod': '201710250000'},
 {'count': 47648, 'timePeriod': '201710240000'},
 {'count': 49087, 'timePeriod': '201710230000'},
 {'count': 44945, 'timePeriod': '201710220000'},
 {'count': 548
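
Each `timePeriod` above is a `YYYYmmDDHHMM` string, so the results convert easily into a time series. The sketch below uses only the first three entries from the output above as sample data; the variable names are illustrative.

```python
from datetime import datetime

# First three entries copied from the output above.
sample_counts = [
    {"count": 135320, "timePeriod": "201711100000"},
    {"count": 68532, "timePeriod": "201711090000"},
    {"count": 67138, "timePeriod": "201711080000"},
]

# Parse each timePeriod into a datetime, pair it with its count,
# and sort chronologically.
series = sorted(
    (datetime.strptime(c["timePeriod"], "%Y%m%d%H%M"), c["count"])
    for c in sample_counts
)

total = sum(count for _, count in series)
peak_day, peak_count = max(series, key=lambda pair: pair[1])
print(total, peak_day.date(), peak_count)
```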

## Dated searches / Full Archive Search


Let's make a new rule and pass it dates this time.

`gen_rule_payload` takes dates of the forms `YYYY-mm-DD` and `YYYYmmDD`.


**Note that this will only work with the full archive search option**, which is available to my account only via the enterprise options. Full archive search will likely require a different endpoint or access method; please see your developer console for details.

In [27]:
rule = gen_rule_payload("from:jack", from_date="2017-09-01", to_date="2017-10-30", max_results=100)
print(rule)

{"query":"from:jack","maxResults":100,"toDate":"201710300000","fromDate":"201709010000"}
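
Note that the `fromDate` and `toDate` values in the payload are in `YYYYmmDDHHMM` form, with the time set to midnight. A minimal sketch of that normalization is shown below; `to_api_date` is a hypothetical helper for illustration, since `gen_rule_payload` performs the conversion for you.

```python
from datetime import datetime


def to_api_date(date_string):
    """Convert `YYYY-mm-DD` or `YYYYmmDD` to `YYYYmmDDHHMM` at midnight.

    Hypothetical helper for illustration only.
    """
    for fmt in ("%Y-%m-%d", "%Y%m%d"):
        try:
            parsed = datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # try the next accepted format
        return parsed.strftime("%Y%m%d%H%M")
    raise ValueError(f"unrecognized date format: {date_string!r}")


print(to_api_date("2017-09-01"))
print(to_api_date("20171030"))
```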


In [16]:
tweets = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

using username and password for authentication


In [17]:
[(str(tweet.created_at_datetime), tweet.all_text, tweet.hashtags) for tweet in tweets[0:10]]
  

[('2017-10-27 18:22:07',
  'More clarity on our private information policy and enforcement. Working to build as much direct context into the product too https://t.co/IrwBexPrBA\nTo provide more clarity on our private information policy, we’ve added specific examples of what is/is not a violation and insight into what we need to remove this type of content from the service. https://t.co/NGx5hh2tTQ',
  []),
 ('2017-10-27 18:17:37',
  'Launching violent groups and hateful images/symbols policy on November 22nd https://t.co/NaWuBPxyO5\nWe will now launch our policies on violent groups and hateful imagery and hate symbols on Nov 22. During the development process, we received valuable feedback that we’re implementing before these are published and enforced. See more on our policy development process here 👇 https://t.co/wx3EeH39BI',
  []),
 ('2017-10-27 01:25:39', '@WillStick @lizkelley Happy birthday Liz!', []),
 ('2017-10-26 14:24:05',
  'Off-boarding advertising from all accounts owned by