Reconstructing Threads #107

Open
mcolemann opened this issue Aug 24, 2021 · 7 comments

Comments

@mcolemann

Hi everyone!

Does anyone have code to reconstruct the threads?

Thanks a lot!

@edsu
Member

edsu commented Aug 24, 2021

Hydrator doesn't do that currently, but it could be a good option to add. If you need to do it now, you can use twarc's conversation command.

@mcolemann
Author

Thank you very much!
Is twarc compatible with the .csv files from the Hydrator?

@edsu
Member

edsu commented Aug 24, 2021

It depends on what you are doing. Do you want to get all the conversation threads in the CSV file you generated?

@mcolemann
Author

Actually I am trying to identify some good case studies for the conversation threads.
I have 33 .csv files (each around 300-500MB) and I would like to reconstruct all the threads (to better identify the case studies). Do you know how I can do this?

@edsu
Member

edsu commented Aug 24, 2021

Do you have access to the Academic Research product track, which allows searching the historical archive?

In theory it ought to be possible: extract the tweet ids from your CSVs into a file, e.g. ids.txt, and then run twarc conversations ids.txt --archive conversations.json to collect all the threads. It could take a while depending on the sizes of the threads you encounter. But these are all questions for the twarc issue tracker, I guess.
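For the first step, pulling the tweet ids out of the CSVs, something like the following might work. It is only a sketch: it assumes the Hydrator CSV export has a column named id, and the function name extract_ids and its parameters are made up for illustration. The files are streamed row by row, but the set of seen ids is held in memory, which may or may not be acceptable for 33 large files.

```python
import csv


def extract_ids(csv_paths, out_path, id_column="id"):
    """Write each unique tweet id found in the CSVs to out_path, one per line.

    id_column is an assumption about the export's header; adjust it if the
    column is named differently in your files.
    """
    seen = set()  # ids already written, used to skip duplicates across files
    with open(out_path, "w", encoding="utf-8") as out:
        for path in csv_paths:
            # newline="" lets the csv module handle line breaks inside tweet text
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    tweet_id = (row.get(id_column) or "").strip()
                    if tweet_id and tweet_id not in seen:
                        seen.add(tweet_id)
                        out.write(tweet_id + "\n")
    return len(seen)
```

Calling extract_ids with the list of CSV paths and "ids.txt" as the output path would produce the id file mentioned above and return the number of unique ids written.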

@mcolemann
Author

Unfortunately, I don't think I have access to the Academic Research product track...
How can I get access to it?

Thanks a lot! Then I will open an issue for twarc.

@edsu
Member

edsu commented Aug 25, 2021

If you are studying or working at a university, you can apply. The main difference is that you can access 10 million tweets a month from Twitter's V2 API (usually limited to 100,000/month). The V2 API includes things like reply_count for tweets, as well as the conversation_id for a tweet, which lets you easily collect all the tweets in a thread. And most importantly, the Academic Research track lets you search the full archive of tweets rather than just the last week.
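To illustrate how conversation_id helps once the tweets are collected, here is a rough sketch of grouping tweets into threads and recovering who replied to whom. It assumes each tweet is already a plain dict in the V2 shape, with id, conversation_id, and an optional referenced_tweets list whose entries carry type and id. The function names are hypothetical, and loading the tweets from the collected JSON is left out.

```python
from collections import defaultdict


def group_by_conversation(tweets):
    """Group V2 tweet dicts into threads keyed by conversation_id."""
    threads = defaultdict(list)
    for tweet in tweets:
        threads[tweet["conversation_id"]].append(tweet)
    return dict(threads)


def reply_tree(thread):
    """Map each parent tweet id to the ids of its direct replies."""
    children = defaultdict(list)
    for tweet in thread:
        # referenced_tweets is absent on tweets that reference nothing
        for ref in tweet.get("referenced_tweets", []):
            if ref["type"] == "replied_to":
                children[ref["id"]].append(tweet["id"])
    return dict(children)
```

Since the conversation_id is the id of the tweet that started the conversation, looking that id up in the reply tree gives the first level of replies, and walking down from there follows the thread.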
