remove archive file naming logic #51

Closed
edsu opened this issue Jan 28, 2015 · 0 comments

edsu commented Jan 28, 2015

I think it would simplify the code quite a bit if twarc simply wrote tweets to stdout and let the user decide which file they should go to.

When run repeatedly, twarc tries to determine the since_id to use when talking to the Twitter API based on data that has already been archived. But this functionality depends on twarc being run in the same directory as the other archive files, and on the filenames matching a particular pattern (which can get ugly). The since_id determination also isn't working properly with files created with --stream, since they are ordered differently.
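
To illustrate why file ordering matters, here is a minimal sketch (not twarc's actual code) of picking a since_id from already archived data. It assumes the archive is line-delimited JSON in which each tweet has an integer `id` field; `latest_id` is a hypothetical helper name. Scanning every line for the maximum, rather than trusting the first or last line, gives the same answer whichever way the file is ordered.

```python
import json


def latest_id(lines):
    """Return the largest tweet id in an iterable of JSON lines, or None.

    Assumption: each non-blank line is one JSON object with an integer "id".
    """
    max_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet_id = json.loads(line)["id"]
        if max_id is None or tweet_id > max_id:
            max_id = tweet_id
    return max_id
```

A user could run something like this over whichever files they consider part of the archive, which is the kind of decision the current filename matching makes on their behalf.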

I propose removing this logic and adding a --min_id option to match --max_id. The user can then control what they want to do, and where the data goes.
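
A rough sketch of what that might look like, assuming argparse and line-delimited JSON written to stdout. Only the `--min_id` and `--max_id` option names come from the proposal; `build_parser`, `in_range`, and `write_tweets` are hypothetical names, and the bounds are an assumption (here `min_id` is exclusive, like the API's since_id, and `max_id` is inclusive).

```python
import argparse
import json
import sys


def build_parser():
    # Hypothetical parser; only the two option names come from the proposal.
    parser = argparse.ArgumentParser(description="sketch: write tweets to stdout")
    parser.add_argument("--min_id", type=int, default=None)
    parser.add_argument("--max_id", type=int, default=None)
    return parser


def in_range(tweet, min_id=None, max_id=None):
    """True if the tweet id is > min_id and <= max_id (bounds are assumptions)."""
    if min_id is not None and tweet["id"] <= min_id:
        return False
    if max_id is not None and tweet["id"] > max_id:
        return False
    return True


def write_tweets(tweets, min_id=None, max_id=None, out=sys.stdout):
    """Write matching tweets as one JSON object per line; return the count."""
    count = 0
    for tweet in tweets:
        if in_range(tweet, min_id, max_id):
            out.write(json.dumps(tweet) + "\n")
            count += 1
    return count
```

With output going to stdout, there is no filename logic left in the tool: the user redirects the stream to whatever file they choose.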

edsu added a commit that referenced this issue Jan 30, 2015
edsu closed this as completed Jan 30, 2015