# Using the Twitter Search APIs Python Wrapper

Working with the API within a Python program is straightforward for both Premium and Enterprise clients.

Our group's Python [tweet parser library](https://github.com/twitterdev/tweet_parser) is required.

## Credential Handling

The premium and enterprise Search APIs use different credentials, and we try to make handling them seamless for all customers.
We support both a YAML credential file and environment variables for access.

A YAML credential file should look like this:

```.yaml

<key>:
  account_type: <OPTIONAL PREMIUM_OR_ENTERPRISE>
  endpoint: <FULL_URL_OF_ENDPOINT>
  username: <USERNAME>
  password: <PW>
  bearer_token: <TOKEN>
```

Premium clients require the `bearer_token` and `endpoint` fields; Enterprise clients require `username`, `password`, and `endpoint`. If you do not specify the `account_type`, we attempt to discern the account type and issue a warning about this behavior. The `load_credentials` function also allows `account_type` to be set.
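
To make the field requirements concrete, here is a minimal sketch of how an account type could be guessed from the fields that are present. The helper `infer_account_type` is hypothetical and written only for illustration; it is an assumption about the general idea, not the library's actual detection logic.

```python
def infer_account_type(creds):
    """Guess the account type from the credential fields that are present.

    Hypothetical helper for illustration; not the library's implementation.
    """
    # An explicit account_type always wins.
    if creds.get("account_type"):
        return creds["account_type"].lower()
    # Premium needs a bearer token and an endpoint.
    if creds.get("bearer_token") and creds.get("endpoint"):
        return "premium"
    # Enterprise needs a username, a password, and an endpoint.
    if creds.get("username") and creds.get("password") and creds.get("endpoint"):
        return "enterprise"
    raise KeyError("credentials match neither the premium nor the enterprise field set")


premium = {"endpoint": "<FULL_URL_OF_ENDPOINT>", "bearer_token": "<TOKEN>"}
enterprise = {"endpoint": "<FULL_URL_OF_ENDPOINT>", "username": "<USERNAME>", "password": "<PW>"}
print(infer_account_type(premium))
print(infer_account_type(enterprise))
```

Setting `account_type` explicitly avoids relying on any guess of this kind.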

Our credential reader defaults to looking for this file at `"~/.twitter_keys.yaml"`, but you can pass a different location as needed.
You can also specify a different key in the YAML file, which can be useful if you have different endpoints, e.g., `dev`, `test`, `prod`, etc. The file might look like this:


```.yaml

search_tweets_dev:
  account_type: premium
  endpoint: <FULL_URL_OF_ENDPOINT>
  bearer_token: <TOKEN>
  
search_tweets_prod:
  account_type: premium
  endpoint: <FULL_URL_OF_ENDPOINT>
  bearer_token: <TOKEN>
  
```

If you want or need to pass credentials via environment variables, set the appropriate variables from the following:

```
export SEARCHTWEETS_ENDPOINT=
export SEARCHTWEETS_USERNAME=
export SEARCHTWEETS_PASSWORD=
export SEARCHTWEETS_BEARER_TOKEN=
export SEARCHTWEETS_ACCOUNT_TYPE=
```

The `load_credentials` function will attempt to find these variables if it cannot load fields from the YAML file, and it will **overwrite any credentials found in the YAML file** with them if they have been parsed. This behavior can be changed by setting the `load_credentials` parameter `env_overwrite` to `False`.
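
The precedence rule can be sketched in a few lines. The function `merge_env_credentials` and the `ENV_FIELDS` mapping are hypothetical names used only for this illustration, and the treatment of `env_overwrite=False` (environment values only fill in missing fields) is one reading of the description above, not a statement of the library's exact behavior.

```python
import os

# Assumed mapping from credential fields to the environment variables listed above.
ENV_FIELDS = {
    "endpoint": "SEARCHTWEETS_ENDPOINT",
    "username": "SEARCHTWEETS_USERNAME",
    "password": "SEARCHTWEETS_PASSWORD",
    "bearer_token": "SEARCHTWEETS_BEARER_TOKEN",
    "account_type": "SEARCHTWEETS_ACCOUNT_TYPE",
}


def merge_env_credentials(yaml_creds, environ=None, env_overwrite=True):
    """Combine YAML credentials with environment variables (illustrative only)."""
    environ = os.environ if environ is None else environ
    # Keep only the variables that are set and non-empty.
    env_creds = {field: environ[var] for field, var in ENV_FIELDS.items() if environ.get(var)}
    if env_overwrite:
        # Environment values replace YAML values.
        return {**yaml_creds, **env_creds}
    # YAML values win; the environment only fills in missing fields.
    return {**env_creds, **yaml_creds}


yaml_creds = {"endpoint": "https://yaml-endpoint", "username": "yaml_user"}
fake_env = {"SEARCHTWEETS_USERNAME": "ENV_USERNAME", "SEARCHTWEETS_PASSWORD": "ENV_PW"}
print(merge_env_credentials(yaml_creds, fake_env, env_overwrite=True))
print(merge_env_credentials(yaml_creds, fake_env, env_overwrite=False))
```

A plain dictionary stands in for `os.environ` here so the example does not touch the real environment.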


The following cells demonstrate credential handling, both in the command line app and in the Python library.

In [1]:
from searchtweets import ResultStream, gen_rule_payload, load_credentials
import os

In [2]:
load_credentials(filename="./search_tweets_creds_example.yaml",
                 yaml_key="search_tweets_ent_example",
                 env_overwrite=False)

{'endpoint': '<MY_ENDPOINT>',
 'password': '<MY_PASSWORD>',
 'username': '<MY_USERNAME>'}

In [3]:
load_credentials(filename="./search_tweets_creds_example.yaml",
                 yaml_key="search_tweets_premium_example",
                 env_overwrite=False)

{'bearer_token': '<A_VERY_LONG_MAGIC_STRING>',
 'endpoint': 'https://api.twitter.com/1.1/tweets/search/30day/dev.json'}

### Environment Variable Overrides

If we set our environment variables, the program will look for them regardless of a YAML file's validity or existence.

In [4]:
os.environ["SEARCHTWEETS_USERNAME"] = "ENV_USERNAME"
os.environ["SEARCHTWEETS_PASSWORD"] = "ENV_PW"
os.environ["SEARCHTWEETS_ENDPOINT"] = "https://endpoint"

load_credentials(filename="nothing", yaml_key="no_key_here")

cannot read file nothing
Error parsing YAML file; searching for valid environment variables


{'endpoint': 'https://endpoint',
 'password': 'ENV_PW',
 'username': 'ENV_USERNAME'}

## Search API usage

We'll now load our proper credentials and move on with the example.

### Enterprise setup

In [5]:
enterprise_search_args = load_credentials("~/.twitter_keys.yaml",
                                          yaml_key="search_tweets_enterprise",
                                          env_overwrite=False)

### Premium Setup


In [6]:
premium_search_args = load_credentials("~/.twitter_keys.yaml",
                                       yaml_key="search_tweets_premium",
                                       env_overwrite=False)

The `gen_rule_payload` function formats search API rules into valid JSON queries. It has sensible defaults: it pulls more tweets per call than the default of 100, does not include dates, and defaults to hourly counts when using the counts API. Note that a sandbox environment can only have a maximum of 100 results per call, so if you get errors, please check this setting. Discussing the finer points of generating search rules is out of scope for these examples; I encourage you to see the docs to learn the nuances, but for now let's see what a rule looks like.

In [7]:
rule = gen_rule_payload("beyonce", results_per_call=100) # testing with a sandbox account
print(rule)

{"query":"beyonce","maxResults":100}


This rule will match tweets that have the text `beyonce` in them.
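
Since the rule is printed as a JSON string, it can be parsed back with the standard library to check individual fields before sending anything to the API. This is only a small sketch: the string below is copied from the printed output above rather than generated by `gen_rule_payload`.

```python
import json

# The rule string as printed above (copied, not generated here).
rule = '{"query":"beyonce","maxResults":100}'

# Parse it into a dictionary to inspect individual fields.
payload = json.loads(rule)
print(payload["query"])
print(payload["maxResults"])
```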

From this point, there are two ways to interact with the API. The quick method collects smaller numbers of tweets into memory and requires less thought and knowledge; the other is to interact with the `ResultStream` object directly, which will be introduced later.


## Fast Way

We'll use the `search_args` variable to configure access to the API. The object also takes a valid PowerTrack rule and has options to cut off the search when hitting limits on both the number of tweets and the number of API calls.

We'll be using the `collect_results` function, which has three parameters.

- rule: a valid PowerTrack rule, referenced earlier
- max_results: as the API handles pagination, it will stop collecting when we get to this number
- result_stream_args: configuration args that we've already specified.
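
The stopping behavior described for `max_results` can be sketched with plain generators. Everything here is a stand-in: `fake_pages` imitates paginated responses and `collect_up_to` is a hypothetical helper, so this shows the general idea only, not how `collect_results` is implemented.

```python
from itertools import chain, islice

requested_pages = []  # records which pages were actually requested


def fake_pages():
    """Stand-in for paginated API responses: three pages of 100 results each."""
    for page in range(3):
        requested_pages.append(page)
        yield [f"tweet-{page}-{i}" for i in range(100)]


def collect_up_to(pages, max_results):
    """Flatten pages lazily and stop once max_results items are gathered."""
    return list(islice(chain.from_iterable(pages), max_results))


results = collect_up_to(fake_pages(), max_results=150)
print(len(results))
print(requested_pages)
```

Because the pages are consumed lazily, the third page is never requested once the limit is reached, which is the property that keeps API usage bounded.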


For the remaining examples, please change the args to either premium or enterprise depending on your usage.

Let's see how it goes:

In [8]:
from searchtweets import collect_results

In [9]:
tweets = collect_results(rule,
                         max_results=100,
                         result_stream_args=enterprise_search_args) # change this if you need to

By default, tweet payloads are lazily parsed into a `Tweet` object. A large number of tweet attributes are made available directly, like so:

In [10]:
[print(tweet.all_text, end='\n\n') for tweet in tweets[0:10]];

If Beyoncé asked me to quit my job and go on tour with her.....I WILL BE A MAMA.

Justin Timberlake Super Bowl Setlist:

Filthy (obvious promo)
Man In The Mirror
Holy Grail ft. Jay Z &amp; Beyoncé
Formation 
End of Time
Run The World
Crazy In Love 
Single Ladies
“Put your hands towards me, I want to feel your energy” Halo

God bless y’all! https://t.co/5WdnxmLzjE

If Beyoncé called me and asked me to quit my job and go on tour with her but she’s not paying me...... I WOULD STILL GO. https://t.co/1iIUvDjyVc

@edsheeran Beyoncé is not credited on "Perfect" for this week as the solo version of the song has been the dominant version this past week.

« quand Beyonce a dit « c’est pas la tromperie qui fait mal, c’est le genre de personne avec qui il me trompe, que mnt je dois le regarder dans les yeux et savoir qu’elle avait la moitié de ce que j’ai alors qu’elle n’est même pas la moitié de ce que je suis » j’ai compris

If @Beyonce played as Nala in The Lion King (lmao I tried, it’s hard to

In [11]:
[print(tweet.created_at_datetime) for tweet in tweets[0:10]];

2018-01-16 21:58:55
2018-01-16 21:58:53
2018-01-16 21:58:49
2018-01-16 21:58:46
2018-01-16 21:58:46
2018-01-16 21:58:45
2018-01-16 21:58:45
2018-01-16 21:58:43
2018-01-16 21:58:42
2018-01-16 21:58:41


In [12]:
[print(tweet.generator.get("name")) for tweet in tweets[0:10]];

Twitter for iPhone
Twitter for iPhone
Twitter for iPhone
Twitter for iPhone
Twitter for iPhone
Twitter for Android
Twitter for iPhone
Twitter for iPhone
Twitter for Android
Twitter for iPhone
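
Attribute values like these are easy to summarize with the standard library. The sketch below tallies the client names using values copied from the output above; with real results, the list would be built from `tweet.generator.get("name")` for each tweet.

```python
from collections import Counter

# Client names copied from the output above, standing in for real tweet data.
client_names = [
    "Twitter for iPhone",
    "Twitter for iPhone",
    "Twitter for iPhone",
    "Twitter for iPhone",
    "Twitter for iPhone",
    "Twitter for Android",
    "Twitter for iPhone",
    "Twitter for iPhone",
    "Twitter for Android",
    "Twitter for iPhone",
]

# Count how many tweets came from each client.
client_counts = Counter(client_names)
for name, count in client_counts.most_common():
    print(name, count)
```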


Voila, we have some tweets. For interactive environments and other cases where you don't care about collecting your data in a single load or don't need to operate on the stream of tweets or counts directly, I recommend using this convenience function.


## Working with the ResultStream

The ResultStream object will be powered by the `search_args`, and takes the rules and other configuration parameters, including a hard stop on number of pages to limit your API call usage.

In [13]:
rs = ResultStream(rule_payload=rule,
                  max_results=500,
                  max_pages=1,
                  **premium_search_args)

print(rs)

ResultStream: 
	{
    "username":null,
    "endpoint":"https:\/\/api.twitter.com\/1.1\/tweets\/search\/30day\/dev.json",
    "rule_payload":{
        "query":"beyonce",
        "maxResults":100
    },
    "tweetify":true,
    "max_results":500
}


The `.stream` method seamlessly handles requests and pagination for a given query. It returns a generator, and to grab up to 500 tweets that mention `beyonce` we can do this:

In [14]:
tweets = list(rs.stream())
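
Because `.stream` returns a generator, results can also be processed one at a time instead of being materialized with `list()`. The sketch below uses a hypothetical stand-in generator, `fake_stream`, in place of a real `ResultStream`, so it runs without credentials; the dictionaries it yields are placeholders, not parsed tweets.

```python
def fake_stream(n):
    """Stand-in generator; a real stream would yield parsed tweets instead."""
    for i in range(n):
        yield {"id": i, "text": f"tweet number {i}"}


# Consume items one at a time and stop early once we have seen enough.
seen = 0
for tweet in fake_stream(500):
    seen += 1
    if tweet["id"] == 9:
        break  # the remaining items are never produced

print(seen)
```

Iterating this way keeps memory use flat and lets a loop stop as soon as it has what it needs.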

Tweets are lazily parsed using our Tweet Parser, so tweet data is very easily extractable.

In [15]:
# print the text of the first 10 tweets
[print(tweet.all_text) for tweet in tweets[0:10]];

perfect predicted to drop off #1 a week after they cut the beyoncé credit https://t.co/tKZ2flay8A
« quand Beyonce a dit « c’est pas la tromperie qui fait mal, c’est le genre de personne avec qui il me trompe, que mnt je dois le regarder dans les yeux et savoir qu’elle avait la moitié de ce que j’ai alors qu’elle n’est même pas la moitié de ce que je suis » j’ai compris
"Todo esse ódio que a Beyoncé recebe é culpa da Beyhive"

Uma fã-base que ama a maior artista da atualidade,ativista, empoderada na maioria das vezes inteligente e com argumentos com fundamento te assusta? 
Vadias básicas, curvem-se para a Beyhive! 🍯🐝 https://t.co/TCBBb1lmMk
Imagine Nicki Minaj, Beyoncé, Rihanna and Ariana Grande doing a song together. A lady marmalade remake with them four would be Iconic. https://t.co/PqoKEtmMgS
NEW EMINEM INTERVIEW IS HERE. 

"I don't know what I'm gonna do when I can't rap anymore. I probably fucking... https://t.co/ficwtt0lGc
I’m ready for more Beyoncé music but am I ready for more 

## Counts Endpoint

We can also use the Search API Counts endpoint to get counts of tweets that match our rule. Each request will return up to *30* results, and each count request can be done on a minutely, hourly, or daily basis. The underlying `ResultStream` object will handle converting your endpoint to the count endpoint, and you have to specify the `count_bucket` argument when making a rule to use it.

The process is very similar to grabbing tweets, but has some minor differences.


_Caveat - premium sandbox environments do NOT have access to the Search API counts endpoint._

In [16]:
count_rule = gen_rule_payload("beyonce", count_bucket="day")

counts = collect_results(count_rule, result_stream_args=enterprise_search_args)

Our results are pretty straightforward and can be rapidly used.

In [17]:
counts

[{'count': 40182, 'timePeriod': '201801160000'},
 {'count': 61955, 'timePeriod': '201801150000'},
 {'count': 59688, 'timePeriod': '201801140000'},
 {'count': 44023, 'timePeriod': '201801130000'},
 {'count': 46618, 'timePeriod': '201801120000'},
 {'count': 41527, 'timePeriod': '201801110000'},
 {'count': 47060, 'timePeriod': '201801100000'},
 {'count': 65513, 'timePeriod': '201801090000'},
 {'count': 95272, 'timePeriod': '201801080000'},
 {'count': 162926, 'timePeriod': '201801070000'},
 {'count': 106393, 'timePeriod': '201801060000'},
 {'count': 93565, 'timePeriod': '201801050000'},
 {'count': 110436, 'timePeriod': '201801040000'},
 {'count': 127564, 'timePeriod': '201801030000'},
 {'count': 131984, 'timePeriod': '201801020000'},
 {'count': 176206, 'timePeriod': '201801010000'},
 {'count': 57241, 'timePeriod': '201712310000'},
 {'count': 72280, 'timePeriod': '201712300000'},
 {'count': 72083, 'timePeriod': '201712290000'},
 {'count': 76396, 'timePeriod': '201712280000'},
 {'count': 616
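
Each count bucket is a plain dictionary, so the results are simple to post-process. The sketch below uses the first three daily buckets copied from the output above, parses each `timePeriod` into a `datetime`, sorts the buckets chronologically, and sums the counts. The helper name `parse_bucket` is hypothetical.

```python
from datetime import datetime

# First three buckets copied from the output above.
counts = [
    {"count": 40182, "timePeriod": "201801160000"},
    {"count": 61955, "timePeriod": "201801150000"},
    {"count": 59688, "timePeriod": "201801140000"},
]


def parse_bucket(bucket):
    """Return (start_time, count) for one count bucket."""
    start = datetime.strptime(bucket["timePeriod"], "%Y%m%d%H%M")
    return start, bucket["count"]


# Sort chronologically (the API output above is newest first).
series = sorted(parse_bucket(b) for b in counts)
total = sum(count for _, count in series)

for start, count in series:
    print(start.date(), count)
print("total:", total)
```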

## Dated searches / Full Archive Search


Let's make a new rule and pass it dates this time.

`gen_rule_payload` takes dates of the forms `YYYY-mm-DD` and `YYYYmmDD`.
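
Both input forms end up in the `YYYYmmDDHHMM` format visible in the `fromDate` and `toDate` values printed in this section. A minimal sketch of that normalization, using a hypothetical helper `to_api_timestamp` rather than the library's own code, might look like this:

```python
from datetime import datetime


def to_api_timestamp(date_string):
    """Convert YYYY-mm-DD or YYYYmmDD into the YYYYmmDDHHMM form (illustrative)."""
    for fmt in ("%Y-%m-%d", "%Y%m%d"):
        try:
            parsed = datetime.strptime(date_string, fmt)
        except ValueError:
            continue  # try the next accepted format
        # A date without a time maps to midnight, hence the trailing 0000.
        return parsed.strftime("%Y%m%d%H%M")
    raise ValueError(f"unsupported date format: {date_string!r}")


print(to_api_timestamp("2017-09-01"))
print(to_api_timestamp("20171030"))
```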


**Note that this will only work with the full archive search option**, which is available to my account only via the enterprise options. Full archive search will likely require a different endpoint or access method; please see your developer console for details.

In [18]:
rule = gen_rule_payload("from:jack",
                        from_date="2017-09-01",
                        to_date="2017-10-30",
                        results_per_call=500)
print(rule)

{"query":"from:jack","maxResults":500,"toDate":"201710300000","fromDate":"201709010000"}


In [19]:
tweets = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

In [20]:
# print the text of the first 10 tweets
[print(tweet.all_text) for tweet in tweets[0:10]];

More clarity on our private information policy and enforcement. Working to build as much direct context into the product too https://t.co/IrwBexPrBA
To provide more clarity on our private information policy, we’ve added specific examples of what is/is not a violation and insight into what we need to remove this type of content from the service. https://t.co/NGx5hh2tTQ
Launching violent groups and hateful images/symbols policy on November 22nd https://t.co/NaWuBPxyO5
We will now launch our policies on violent groups and hateful imagery and hate symbols on Nov 22. During the development process, we received valuable feedback that we’re implementing before these are published and enforced. See more on our policy development process here 👇 https://t.co/wx3EeH39BI
@WillStick @lizkelley Happy birthday Liz!
Off-boarding advertising from all accounts owned by Russia Today (RT) and Sputnik.

We’re donating all projected earnings ($1.9mm) to support external research into the use of Twitter in e

In [21]:
rule = gen_rule_payload("from:jack",
                        from_date="2017-09-20",
                        to_date="2017-10-30",
                        count_bucket="day",
                        results_per_call=500)
print(rule)

{"query":"from:jack","toDate":"201710300000","fromDate":"201709200000","bucket":"day"}


In [22]:
counts = collect_results(rule, max_results=500, result_stream_args=enterprise_search_args)

In [23]:
[print(c) for c in counts];

{'timePeriod': '201710290000', 'count': 0}
{'timePeriod': '201710280000', 'count': 0}
{'timePeriod': '201710270000', 'count': 3}
{'timePeriod': '201710260000', 'count': 6}
{'timePeriod': '201710250000', 'count': 4}
{'timePeriod': '201710240000', 'count': 4}
{'timePeriod': '201710230000', 'count': 0}
{'timePeriod': '201710220000', 'count': 0}
{'timePeriod': '201710210000', 'count': 3}
{'timePeriod': '201710200000', 'count': 2}
{'timePeriod': '201710190000', 'count': 1}
{'timePeriod': '201710180000', 'count': 6}
{'timePeriod': '201710170000', 'count': 2}
{'timePeriod': '201710160000', 'count': 2}
{'timePeriod': '201710150000', 'count': 1}
{'timePeriod': '201710140000', 'count': 64}
{'timePeriod': '201710130000', 'count': 3}
{'timePeriod': '201710120000', 'count': 4}
{'timePeriod': '201710110000', 'count': 8}
{'timePeriod': '201710100000', 'count': 4}
{'timePeriod': '201710090000', 'count': 1}
{'timePeriod': '201710080000', 'count': 0}
{'timePeriod': '201710070000', 'count': 0}
{'timePeri