twarc-ids

This module is a simple reference implementation of a twarc plugin. It uses click-plugins to extend the main twarc command with an 'ids' subcommand that reads tweet data and writes out their identifiers.

It also provides an example of using twarc.ensure_flattened to ensure that data has been "flattened". Flattening makes it easier to process Twitter API data as a stream of tweets in which referenced entities (users, media, etc.) have had their ids turned into objects. While not strictly needed for this plugin, it does make the data easier to read, since an API response can contain either a single tweet or a list of tweets in its payload.
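To illustrate why normalizing the payload shape helps, here is a minimal sketch that uses only the standard library. The `iter_tweet_ids` helper is hypothetical and is not twarc's implementation: it only handles the "one tweet or a list of tweets" cases described above, assumes the tweets sit under a `data` key when the line is a full API response, and does not expand referenced entities the way twarc.ensure_flattened is described as doing.

```python
import json
from typing import Iterator


def iter_tweet_ids(line: str) -> Iterator[str]:
    """Yield tweet ids from one line of JSON, whatever its shape.

    Hypothetical stand-in for the normalization step; not twarc code.
    """
    payload = json.loads(line)
    if isinstance(payload, dict) and "data" in payload:
        # Assumed API response shape: "data" holds one tweet or a list.
        data = payload["data"]
        tweets = data if isinstance(data, list) else [data]
    elif isinstance(payload, list):
        # A bare list of tweets.
        tweets = payload
    else:
        # A single tweet object.
        tweets = [payload]
    for tweet in tweets:
        yield tweet["id"]


# Made-up sample lines covering the three shapes handled above.
lines = [
    '{"data": [{"id": "1"}, {"id": "2"}]}',
    '{"data": {"id": "3"}}',
    '{"id": "4"}',
]
ids = [tweet_id for line in lines for tweet_id in iter_tweet_ids(line)]
print(ids)
```

With the shape handled in one place, the rest of a plugin can treat every input line as a plain sequence of tweets.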

Install

First you need to install twarc and this plugin:

pip install twarc
pip install twarc-ids

Now you can collect data using the core twarc utility:

twarc2 search blacklivesmatter > tweets.jsonl

You also have a new subcommand, ids, that is supplied by twarc-ids:

twarc2 ids tweets.jsonl > ids.txt

It's good practice to include some tests for your module. See test_twarc_ids.py for an example. You can run it directly with pytest or using:

python setup.py test

When creating your setup.py make sure you don't forget the entry_points magic so that twarc will find your plugin when it is installed!
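As a rough sketch of what that entry_points declaration might look like, the keyword arguments below could be passed to `setuptools.setup()`. The group name `twarc.plugins`, the module name `twarc_ids`, and the command object `ids` are assumptions made for illustration; check this repository's own setup.py for the values it actually uses.

```python
# Hypothetical keyword arguments for setuptools.setup() in a plugin's setup.py.
SETUP_KWARGS = {
    "name": "twarc-ids",
    "py_modules": ["twarc_ids"],      # assumed module name
    "install_requires": ["twarc"],
    "entry_points": {
        # Assumed group name that twarc scans when loading plugins.
        "twarc.plugins": [
            # Format: "<subcommand> = <module>:<click command object>"
            "ids = twarc_ids:ids",
        ],
    },
}

# In setup.py you would then call:
#     from setuptools import setup
#     setup(**SETUP_KWARGS)
```

Without an entry point like this, the package would still install, but twarc would have no way to discover the new subcommand.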