# Using the Twitter Search API

Working with the API from a Python program is straightforward for both Premium and Enterprise clients.

Our group's Python [tweet parser library](https://github.com/twitterdev/tweet_parser) is required.


Before starting your program, you need to define your secrets; an easy way to do so is to set an environment variable. If you are an Enterprise client, you authenticate with a (username, password) pair. If you are a Premium client, you'll need a bearer token, which is passed with each call for authentication.
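
The two authentication styles correspond to the standard HTTP `Authorization` header schemes (Basic and Bearer). The sketch below shows the general shape of each header. The helper names `basic_auth_header` and `bearer_auth_header` and the environment variable name `TWITTER_BEARER_TOKEN` are assumptions made for illustration; they are not part of the library, which builds these headers for you.

```python
import base64
import os


def basic_auth_header(username, password):
    """Build an HTTP Basic Authorization header (enterprise-style credentials)."""
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def bearer_auth_header(token):
    """Build an HTTP Bearer Authorization header (premium-style credentials)."""
    return {"Authorization": f"Bearer {token}"}


# Hypothetical variable name; falls back to a placeholder if it is not set.
token = os.environ.get("TWITTER_BEARER_TOKEN", "<TOKEN>")

print(basic_auth_header("user", "pass"))
print(bearer_auth_header(token))
```

Reading the token from the environment keeps it out of source control, which is the main reason to prefer that approach.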

Your credentials should be put into a YAML file that looks like this:


```yaml
twitter_search_api:
  endpoint: <FULL_URL_OF_ENDPOINT>
  account: <ACCOUNT_NAME>
  username: <USERNAME>
  password: <PW>
  bearer_token: <TOKEN>
```
Fill in the keys that are appropriate for your account type. Premium users should only have the `endpoint` and `bearer_token`; Enterprise customers should have `account`, `username`, `endpoint`, and `password`.
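
As a rough sanity check before making any calls, the key requirements above can be expressed as a small lookup. This is only a sketch: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the credentials are represented as a plain dictionary rather than being read from the YAML file.

```python
# Keys expected for each account type, as described in the text above.
REQUIRED_KEYS = {
    "premium": {"endpoint", "bearer_token"},
    "enterprise": {"account", "username", "endpoint", "password"},
}


def missing_keys(credentials, account_type):
    """Return the sorted list of required keys absent (or empty) in `credentials`."""
    required = REQUIRED_KEYS[account_type]
    present = {key for key, value in credentials.items() if value}
    return sorted(required - present)


premium_creds = {"endpoint": "<FULL_URL_OF_ENDPOINT>", "bearer_token": "<TOKEN>"}
print(missing_keys(premium_creds, "premium"))     # []
print(missing_keys(premium_creds, "enterprise"))  # ['account', 'password', 'username']
```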


Our credential reader defaults to expecting this file at `"~/.twitter_search.yaml"`, but you can pass a different location as needed.
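
The "default unless told otherwise" behavior can be pictured with a few lines of standard-library code. This is a sketch of the idea only; `resolve_credentials_path` is a hypothetical helper, not the library's reader, and the default path is simply the one named in the text.

```python
import os

DEFAULT_CREDENTIALS_PATH = "~/.twitter_search.yaml"  # default named in the text


def resolve_credentials_path(path=None):
    """Expand `~` and fall back to the default location when no path is given."""
    return os.path.expanduser(path or DEFAULT_CREDENTIALS_PATH)


print(resolve_credentials_path())
print(resolve_credentials_path("~/secrets/twitter.yaml"))
```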

The following cell demonstrates the basic setup that will be referenced throughout your program's session.

In [1]:
from twittersearch import ResultStream, gen_rule_payload, load_credentials

## Enterprise setup

If you are an enterprise customer, you'll need to authenticate with a basic username/password method. You can specify that here:

In [42]:
enterprise_search_args = load_credentials("~/.twitter_keys.yaml",
                                          account_type="enterprise")

## Premium Setup

Premium customers will use a bearer token for authentication. Use the following cell for setup:

In [54]:
premium_search_args = load_credentials("~/.twitter_keys.yaml",
                                       account_type="premium")

A function called `gen_rule_payload` formats search API rules into valid JSON queries. It has sensible defaults: it pulls more tweets per call than the default of 100 (note that a sandbox environment allows a maximum of 100, so check this setting if you get errors), it does not include dates, and it defaults to hourly counts when using the Counts API. The finer points of writing search rules are out of scope for these examples; I encourage you to see the docs to learn the nuances, but for now let's see what a rule looks like.

In [45]:
rule = gen_rule_payload("beyonce", results_per_call=100) # testing with a sandbox account
print(rule)

{"query":"beyonce","maxResults":100}


This rule will match tweets that have the text `beyonce` in them.
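
To make the payload format concrete, here is a minimal stand-in that produces the same JSON shape as the output above. It is not the implementation of `gen_rule_payload`; the name `build_rule_payload` and its behavior are assumptions made purely for illustration.

```python
import json


def build_rule_payload(query, results_per_call=None):
    """Hypothetical stand-in that mimics the payload shape printed above."""
    payload = {"query": query}
    if results_per_call is not None:
        payload["maxResults"] = results_per_call
    # Compact separators reproduce the whitespace-free JSON shown in the output.
    return json.dumps(payload, separators=(",", ":"))


print(build_rule_payload("beyonce", results_per_call=100))
# {"query":"beyonce","maxResults":100}
```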

From this point, there are two ways to interact with the API. The quick method collects a smaller number of tweets into memory and requires less thought and knowledge; the other is direct interaction with the `ResultStream` object, which is introduced later.


## Fast Way

We'll use the `search_args` variable to configure the connection to the API. The call also takes a valid PowerTrack rule and has options to cut off the search when it hits limits on either the number of tweets or the number of API calls.

We'll be using the `collect_results` function, which has three parameters.

- rule: a valid PowerTrack rule, referenced earlier
- max_results: the API handles pagination, so collection stops when we reach this number
- result_stream_args: the configuration args that we've already specified
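
The role of `max_results` can be sketched without touching the API: flatten the pages lazily and stop once enough items have been gathered. Everything below is illustrative. `fake_pages` stands in for paginated responses and `collect` is a hypothetical simplification, not the library's `collect_results`.

```python
from itertools import islice


def fake_pages(page_size, total):
    """Stand-in for paginated API responses: yields lists of fake tweet IDs."""
    for start in range(0, total, page_size):
        yield list(range(start, min(start + page_size, total)))


def collect(pages, max_results):
    """Flatten pages lazily and stop once `max_results` items are gathered."""
    flattened = (item for page in pages for item in page)
    return list(islice(flattened, max_results))


results = collect(fake_pages(page_size=100, total=1000), max_results=250)
print(len(results))  # 250
```

Because the pages come from a generator, pages beyond the cutoff are never produced, which is the property that keeps API usage bounded.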


For the remaining examples, please change the args to either premium or enterprise depending on your usage.

Let's see how it goes:

In [8]:
from twittersearch import collect_results

In [46]:
tweets = collect_results(rule,
                         max_results=100,
                         result_stream_args=enterprise_search_args) # change this if you need to

By default, tweet payloads are lazily parsed into a `Tweet` object. A large number of tweet attributes are available directly as properties, like so:

In [47]:
[print(tweet.all_text) for tweet in tweets[0:10]];

That deep sigh Beyoncé took once she realized she wouldn’t be able to get the earpiece out of her hair before the dance break 😂.  https://t.co/dU1K2KMT7i
4 Years ago today, "BEYONCÉ" by Beyoncé was surprise released. It received acclaim from critics,  debuted at #1 and certified 2x Platinum in the US. https://t.co/wB3C7DuX9o
me mata la gente que se cree superior por sus gustos de música escuches queen beyonce o el polaco no sos mas ni menos que nadie
I’m literally not Beyoncé https://t.co/LwIkllCx6P
#BEYONCÉ ‣ 𝐌𝐄𝐀𝐃𝐃𝐅𝐀𝐍 𝐎𝐅𝐈𝐂𝐈𝐀𝐋 - I Am... 𝐖𝐎𝐑𝐋𝐃 𝐓𝐎𝐔𝐑! https://t.co/TyyeDdXKiM
Beyoncé on how nervous she was to release her self-titled... https://t.co/fru23c6DYC
AAAA ansiosa por esse feat da Beyoncé com Jorge Ben Jor &lt;3 https://t.co/NkKJhC9JUd
I am world tour, the Beyonce experience, revamped hmt. https://t.co/pb07eMyNka
Tell me what studio versions of any artists would u like me to do? https://t.co/Z6YWsAJuhU
Billboard's best female artists over the last decade:

2017: Ariana Grande
2016:

In [48]:
[print(tweet.created_at_datetime) for tweet in tweets[0:10]];

2017-12-13 21:18:17
2017-12-13 21:18:16
2017-12-13 21:18:16
2017-12-13 21:18:15
2017-12-13 21:18:15
2017-12-13 21:18:13
2017-12-13 21:18:12
2017-12-13 21:18:12
2017-12-13 21:18:11
2017-12-13 21:18:10


In [49]:
[print(tweet.generator.get("name")) for tweet in tweets[0:10]];

Twitter for Android
Twitter for Android
Twitter for Android
Twitter for iPhone
Meadd
Twitter for iPhone
Twitter for Android
Twitter for iPhone
Twitter for iPhone
Twitter for Android


Voilà, we have some tweets. For interactive environments and other cases where you don't mind collecting your data in a single load and don't need to operate on the stream of tweets or counts directly, I recommend using this convenience function.
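
Once the results are in memory, ordinary Python tools apply. As a small sketch, the client names printed above can be tallied with `collections.Counter`; the list here is copied from that output so the example runs on its own. With real results, the same tally could presumably be built from `tweet.generator.get("name")` for each tweet.

```python
from collections import Counter

# Client names copied from the output above.
sources = [
    "Twitter for Android", "Twitter for Android", "Twitter for Android",
    "Twitter for iPhone", "Meadd", "Twitter for iPhone",
    "Twitter for Android", "Twitter for iPhone", "Twitter for iPhone",
    "Twitter for Android",
]

for name, count in Counter(sources).most_common():
    print(f"{name}: {count}")
```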


## Working with the ResultStream

The `ResultStream` object is configured by the `search_args` and takes the rule and other configuration parameters, including a hard stop on the number of pages to limit your API call usage.

In [55]:
rs = ResultStream(rule_payload=rule,
                  max_results=500,
                  max_pages=1,
                  **premium_search_args)

print(rs)

ResultStream: 
	{
    "username":null,
    "endpoint":"https:\/\/api.twitter.com\/1.1\/tweets\/search\/30day\/dev.json",
    "rule_payload":{
        "query":"beyonce",
        "maxResults":100
    },
    "tweetify":true,
    "max_results":500
}


The `.stream` method seamlessly handles requests and pagination for a given query. It returns a generator, and to grab up to 500 tweets that mention `beyonce` we can do this:

In [56]:
tweets = list(rs.stream())
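
To illustrate how a paginating generator with both a result limit and a page limit might behave, here is a self-contained sketch. It is not the `ResultStream` implementation: `fetch_page` is a hypothetical stand-in for one API call that returns a page of results plus a token for the next page, and the real object may differ in its details.

```python
def fetch_page(token):
    """Hypothetical stand-in for one API call: returns (results, next_token)."""
    start = 0 if token is None else token
    results = list(range(start, start + 100))
    next_token = start + 100 if start + 100 < 1000 else None
    return results, next_token


def stream(max_results, max_pages=None):
    """Yield results one at a time, requesting a new page only when needed."""
    token, pages, yielded = None, 0, 0
    while True:
        results, token = fetch_page(token)
        pages += 1
        for item in results:
            yield item
            yielded += 1
            if yielded >= max_results:
                return
        # Stop when the data runs out or the page budget is spent.
        if token is None or (max_pages is not None and pages >= max_pages):
            return


print(len(list(stream(max_results=500))))               # 500
print(len(list(stream(max_results=500, max_pages=1))))  # 100
```

In this sketch the page limit wins when it is reached first, so a single page of 100 results caps the output even though `max_results` is 500.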

Tweets are lazily parsed using our Tweet Parser, so tweet data is very easily extractable.

In [57]:
# print the text of the first 10 tweets
[print(tweet.all_text) for tweet in tweets[0:10]];

Everyone: *still dragging Jay for cheating*

Beyoncé: https://t.co/2z1ltlMQiJ
Beyoncé changed the game w/ that digital drop 4 years ago today! 🎉

• #1 debut on Billboard
• Sold 617K in the US / over 828K WW in only 3 days
• Fastest-selling album on iTunes of all time
• Reached #1 in 118 countries
• Widespread acclaim; hailed as her magnum opus https://t.co/lDCdVs6em3
Beyoncé 🔥 #444Tour https://t.co/sCvZzjLwqx
Se presentan casos de feminismo pop basado en sugerencias de artistas famosos en turno, Emma Watson, Beyoncé.
Beyonce. Are you kidding me with this?! #Supreme #love #everything
Dear Beyoncé, https://t.co/5visfVK2LR
At this time 4 years ago today, Beyoncé released her self-titled album BEYONCÉ exclusively on the iTunes Store without any prior announcement. The album remains the ONLY album in history to reach #1 in 118 countries &amp; the fastest-selling album in the history of the iTunes Store. https://t.co/ZZb4QyQYf0
4 years ago today, Beyoncé released her self-titled visual album

## Counts API

We can also use the Counts API to get counts of tweets that match our rule. Each request returns up to *30* results, and counts can be bucketed by minute, hour, or day. The underlying `ResultStream` object handles converting your endpoint to the counts endpoint, and you must specify the `count_bucket` argument when making a rule to use it.

The process is very similar to grabbing tweets, but has some minor differences.


**Caveat - premium sandbox environments do NOT have access to the counts API.**



In [24]:
count_rule = gen_rule_payload("beyonce", count_bucket="day")

counts = collect_results(count_rule, result_stream_args=enterprise_search_args)

Our results are pretty straightforward and can be rapidly used.

In [25]:
counts

[{'count': 85660, 'timePeriod': '201712130000'},
 {'count': 95231, 'timePeriod': '201712120000'},
 {'count': 114540, 'timePeriod': '201712110000'},
 {'count': 165964, 'timePeriod': '201712100000'},
 {'count': 102022, 'timePeriod': '201712090000'},
 {'count': 87630, 'timePeriod': '201712080000'},
 {'count': 195794, 'timePeriod': '201712070000'},
 {'count': 209629, 'timePeriod': '201712060000'},
 {'count': 88742, 'timePeriod': '201712050000'},
 {'count': 96795, 'timePeriod': '201712040000'},
 {'count': 177595, 'timePeriod': '201712030000'},
 {'count': 120102, 'timePeriod': '201712020000'},
 {'count': 186759, 'timePeriod': '201712010000'},
 {'count': 151212, 'timePeriod': '201711300000'},
 {'count': 79311, 'timePeriod': '201711290000'},
 {'count': 107175, 'timePeriod': '201711280000'},
 {'count': 58192, 'timePeriod': '201711270000'},
 {'count': 48327, 'timePeriod': '201711260000'},
 {'count': 59639, 'timePeriod': '201711250000'},
 {'count': 85201, 'timePeriod': '201711240000'},
 {'count':
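
Each entry pairs a `count` with a `timePeriod` string in `YYYYmmDDHHMM` form, so the results are easy to post-process with the standard library. The sketch below copies the first three entries from the output above, parses their time periods, and sums the counts.

```python
from datetime import datetime

# First three entries copied from the output above.
counts = [
    {"count": 85660, "timePeriod": "201712130000"},
    {"count": 95231, "timePeriod": "201712120000"},
    {"count": 114540, "timePeriod": "201712110000"},
]

for entry in counts:
    # The timePeriod string follows the pattern YYYYmmDDHHMM.
    period = datetime.strptime(entry["timePeriod"], "%Y%m%d%H%M")
    print(period.date(), entry["count"])

total = sum(entry["count"] for entry in counts)
print(total)  # 295431
```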

## Dated searches / Full Archive Search


Let's make a new rule and pass it dates this time.

`gen_rule_payload` takes dates of the forms `YYYY-mm-DD` and `YYYYmmDD`.
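
The payload printed later in this section carries dates such as `201710300000`, that is, `YYYYmmDDHHMM` with the time set to midnight. A conversion from the two accepted input forms to that shape might look like the sketch below. `to_api_timestamp` is a hypothetical helper written for illustration, not the library's own date handling.

```python
from datetime import datetime


def to_api_timestamp(date_string):
    """Convert YYYY-mm-DD or YYYYmmDD to the YYYYmmDDHHMM form (midnight)."""
    for fmt in ("%Y-%m-%d", "%Y%m%d"):
        try:
            return datetime.strptime(date_string, fmt).strftime("%Y%m%d%H%M")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {date_string!r}")


print(to_api_timestamp("2017-09-01"))  # 201709010000
print(to_api_timestamp("20171030"))    # 201710300000
```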


**Note that this will only work with the full archive search option**, which is available to my account only via the enterprise options. Full archive search will likely require a different endpoint or access method; please see your developer console for details.

In [26]:
rule = gen_rule_payload("from:jack", from_date="2017-09-01", to_date="2017-10-30", results_per_call=500)
print(rule)

{"query":"from:jack","maxResults":500,"toDate":"201710300000","fromDate":"201709010000"}


In [27]:
tweets = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

In [40]:
# print the text of the first 10 tweets
[print(tweet.all_text) for tweet in tweets[0:10]];

More clarity on our private information policy and enforcement. Working to build as much direct context into the product too https://t.co/IrwBexPrBA
To provide more clarity on our private information policy, we’ve added specific examples of what is/is not a violation and insight into what we need to remove this type of content from the service. https://t.co/NGx5hh2tTQ
Launching violent groups and hateful images/symbols policy on November 22nd https://t.co/NaWuBPxyO5
We will now launch our policies on violent groups and hateful imagery and hate symbols on Nov 22. During the development process, we received valuable feedback that we’re implementing before these are published and enforced. See more on our policy development process here 👇 https://t.co/wx3EeH39BI
@WillStick @lizkelley Happy birthday Liz!
Off-boarding advertising from all accounts owned by Russia Today (RT) and Sputnik.

We’re donating all projected earnings ($1.9mm) to support external research into the use of Twitter in e

In [37]:
rule = gen_rule_payload("from:jack",
                        from_date="2017-09-20",
                        to_date="2017-10-30",
                        count_bucket="day",
                        results_per_call=500)
print(rule)

{"query":"from:jack","toDate":"201710300000","fromDate":"201709200000","bucket":"day"}


In [38]:
counts = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

In [39]:
[print(c) for c in counts];

{'timePeriod': '201710290000', 'count': 0}
{'timePeriod': '201710280000', 'count': 0}
{'timePeriod': '201710270000', 'count': 3}
{'timePeriod': '201710260000', 'count': 6}
{'timePeriod': '201710250000', 'count': 4}
{'timePeriod': '201710240000', 'count': 4}
{'timePeriod': '201710230000', 'count': 0}
{'timePeriod': '201710220000', 'count': 0}
{'timePeriod': '201710210000', 'count': 3}
{'timePeriod': '201710200000', 'count': 2}
{'timePeriod': '201710190000', 'count': 1}
{'timePeriod': '201710180000', 'count': 6}
{'timePeriod': '201710170000', 'count': 2}
{'timePeriod': '201710160000', 'count': 2}
{'timePeriod': '201710150000', 'count': 1}
{'timePeriod': '201710140000', 'count': 64}
{'timePeriod': '201710130000', 'count': 3}
{'timePeriod': '201710120000', 'count': 4}
{'timePeriod': '201710110000', 'count': 8}
{'timePeriod': '201710100000', 'count': 4}
{'timePeriod': '201710090000', 'count': 1}
{'timePeriod': '201710080000', 'count': 0}
{'timePeriod': '201710070000', 'count': 0}
{'timePeri