twarc2 timeline --no-context-annotations not pulling 500 tweets #687

Open
mr-devs opened this issue Feb 7, 2023 · 2 comments
Labels
bug cli Issues with command line interface

Comments

@mr-devs

mr-devs commented Feb 7, 2023

I am running the following command with twarc version 2.13.0 (with academic level access):

# Pulls my own timeline
twarc2 timeline --no-context-annotations 1312850357555539972 test.json

Based on running

twarc2 timeline --help

setting --no-context-annotations "makes --max-results 500 the default." Unfortunately, I can see in the twarc.log output that the max-results parameter is still equal to 100. Below is a screenshot of the entire process (aborted after a few calls).

[Screenshot: twarc.log output of the timeline run (image not preserved)]

Based on the code here, I think this only applies when using the full-archive search method.

That said, I also tested the --use-search flag, which doesn't seem to fix the issue either. See the screenshot below.

[Screenshot: twarc.log output of the --use-search run (image not preserved)]

Perhaps the help message just needs to be updated, since, based on the Twitter API reference, 500 no longer seems to be an option.

Thoughts?

@igorbrigadir
Contributor

Ha! Yes, I literally just noticed the same "bug".

So, --no-context-annotations is a shortcut that removes context_annotations from tweet.fields. Requesting context_annotations limits the search endpoint to 100 results per page, which is slow given the 1 request per second rate limit for academic access.
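A minimal POSIX shell sketch of the two points above, assuming tweet.fields is a comma-separated list with no spaces and assuming the 1 request per second limit mentioned here. The function names `strip_context_annotations` and `requests_needed` are hypothetical illustrations, not twarc internals.

```shell
#!/bin/sh
# Hypothetical sketch, not twarc code.

# Drop "context_annotations" from a comma-separated tweet.fields value,
# which is what --no-context-annotations is described as doing.
strip_context_annotations() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -vx 'context_annotations' | paste -sd ',' -
}

# Requests needed to fetch N tweets ($1) at a given page size ($2),
# using ceiling division. At 1 request per second this is also the
# minimum number of seconds the fetch takes.
requests_needed() {
  echo $(( ($1 + $2 - 1) / $2 ))
}
```

For example, `strip_context_annotations "author_id,context_annotations,created_at"` prints `author_id,created_at` (an illustrative field list, not twarc's defaults). Fetching 3200 tweets takes 32 requests at 100 per page versus 7 requests at 500 per page, which is why dropping context annotations matters under a 1 request per second limit.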

The twarc2 timeline command uses the timelines API endpoint, which likewise has a 100-tweet-per-page limit and can NOT return 500 per page, with or without context annotations.
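The per-page limits described in this thread can be summarized as a small lookup. This is a hypothetical shell helper (`page_limit` is not a twarc function) that covers only the two endpoints discussed here and encodes the thread's description rather than an authoritative API reference.

```shell
#!/bin/sh
# Hypothetical helper, not twarc code: the largest per-page max_results
# value as described in this thread.
#   $1: "timeline" (timelines endpoint) or "archive" (full-archive search)
#   $2: "yes" if context_annotations is requested in tweet.fields, else "no"
page_limit() {
  case "$1" in
    timeline)
      echo 100 ;;                  # timelines cap at 100 either way
    archive)
      if [ "$2" = "yes" ]; then
        echo 100                   # context annotations force 100 per page
      else
        echo 500                   # without them, up to 500 per page
      fi ;;
    *)
      echo "unknown endpoint: $1" >&2
      return 1 ;;
  esac
}
```

So `page_limit timeline no` still prints `100`: removing context annotations cannot raise the page size for the timelines endpoint, only for full-archive search.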

If you have academic access, twarc2 timeline --use-search uses the search API instead, to get around the timelines API's limit of the most recent 3200 tweets. However, --no-context-annotations doesn't seem to work here either, which is the actual bug I think needs fixing.

The temporary workaround for this is to use the search command explicitly, as these two should end up equivalent:

If you had:

twarc2 timeline --use-search --no-context-annotations 1312850357555539972 results.jsonl

run this instead:

twarc2 search --archive --no-context-annotations "from:1312850357555539972" results.jsonl

igorbrigadir added the bug and cli (Issues with command line interface) labels Feb 7, 2023
@mr-devs
Author

mr-devs commented Feb 7, 2023

This makes perfect sense. Thanks for the workaround!
