Prepare data for network analysis: Is it possible to use network.py with twarc2 downloaded tweets? #461
Comments
v2 API responses are completely different from v1.1, so it's very unlikely that existing scripts for v1.1 data will work for v2 data. However, running:
may help, since that writes out one tweet per line and includes all the necessary metadata inline. Given that format, it should be relatively straightforward to edit
should be
to match all the "flattened" fields in v2. This is the section that would require changes: https://github.com/DocNow/twarc/blob/main/utils/network.py#L127-L168 It's a good candidate for making a twarc2 plugin https://twarc-project.readthedocs.io/en/latest/plugins/ |
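As a rough illustration of the renaming described above, here is a minimal sketch that maps one flattened v2 tweet onto the v1.1-style names an older script might read. The helper name `v2_to_v1_like` and the sample record are hypothetical, and the sketch assumes the flattened output inlines the author under an `author` key with `id` and `username` fields; check a line of your own file before relying on it.

```python
import json


def v2_to_v1_like(tweet):
    """Return a small v1.1-shaped dict built from a flattened v2 tweet.

    Assumption: the flattened record carries the author inline under
    "author"; only a handful of fields are mapped here.
    """
    author = tweet.get("author", {})
    return {
        "id_str": tweet["id"],                       # v2 "id" is already a string
        "created_at": tweet.get("created_at"),       # note: different date format in v2
        "user": {
            "id_str": author.get("id"),
            "screen_name": author.get("username"),   # v2 "username" ~ v1.1 "screen_name"
        },
    }


# Hypothetical flattened line, as it might appear in a .jsonl file.
sample_line = json.dumps({
    "id": "1400000000000000000",
    "created_at": "2021-06-02T10:00:00.000Z",
    "author": {"id": "42", "username": "alice"},
})

converted = v2_to_v1_like(json.loads(sample_line))
print(converted["user"]["screen_name"])
```

This only covers identifiers and the author; retweets, quotes and replies need separate handling.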
I started substituting variable names as suggested, but I ran into problems parsing the date-time format. The current format of the v2 API is: Once I solve that, I think it'll work. |
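For reference, a minimal sketch of parsing both timestamp styles with the standard library. It assumes v2 `created_at` values are ISO 8601 in UTC with milliseconds (for example `2021-06-02T10:00:00.000Z`) and v1.1 values look like `Wed Oct 10 20:19:24 +0000 2018`; the function name `parse_created_at` is made up for this example.

```python
from datetime import datetime, timezone

# Assumed formats; adjust if your data differs.
V2_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"      # e.g. 2021-06-02T10:00:00.000Z
V1_FORMAT = "%a %b %d %H:%M:%S %z %Y"    # e.g. Wed Oct 10 20:19:24 +0000 2018


def parse_created_at(value):
    """Parse a created_at string in either format into an aware UTC datetime."""
    try:
        # The trailing "Z" is matched literally, so attach UTC explicitly.
        return datetime.strptime(value, V2_FORMAT).replace(tzinfo=timezone.utc)
    except ValueError:
        # Fall back to the v1.1 style, which carries its own offset.
        return datetime.strptime(value, V1_FORMAT).astimezone(timezone.utc)


v2_time = parse_created_at("2021-06-02T10:00:00.000Z")
v1_time = parse_created_at("Wed Oct 10 20:19:24 +0000 2018")
print(v2_time.isoformat(), v1_time.isoformat())
```

Trying the v2 format first and falling back keeps a single code path for mixed datasets; the `%a` and `%b` codes depend on an English locale.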
Other changes are required that may be slightly more awkward, such as dealing with retweets, quotes, etc. using
this parses the v1.1 data format, so the v2 equivalent would be:
|
Indeed, parsing dates works with your proposal, but the other parts related to retweets and quotes do not. |
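One possible way to handle retweets and quotes in v2 data, sketched under assumptions: in v2 a tweet lists what it refers to under `referenced_tweets`, each entry having a `type` of `retweeted`, `quoted` or `replied_to`. The sketch further assumes the flattened output expands each referenced tweet so that it carries its own `author` object; the function name `interaction_edges` and the sample data are hypothetical.

```python
def interaction_edges(tweet):
    """Return (source, target, type) tuples for one flattened v2 tweet.

    Assumption: each entry in "referenced_tweets" has been expanded to
    include an "author" dict with a "username".
    """
    source = tweet.get("author", {}).get("username")
    edges = []
    for ref in tweet.get("referenced_tweets", []):
        target = ref.get("author", {}).get("username")
        # Skip references whose author could not be resolved
        # (e.g. deleted or protected tweets).
        if source and target:
            edges.append((source, target, ref.get("type")))
    return edges


# Hypothetical flattened tweet: alice retweets bob.
sample_tweet = {
    "id": "2",
    "author": {"id": "42", "username": "alice"},
    "referenced_tweets": [
        {"type": "retweeted", "id": "1", "author": {"id": "7", "username": "bob"}},
    ],
}

edges = interaction_edges(sample_tweet)
print(edges)
```

Keeping the `type` on each edge makes it possible to choose later which relationship to build the network from.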
I'm glad this came up. I can work on creating a twarc-network plugin. In the meantime, keep working on your revised script to solve your immediate need and, if you can, attach it here for reference. |
I just made the minor changes to variable names and date parsing noted above. See the current script: https://gist.github.com/numeroteca/aa040b0488c914d1e4a37e40117ef062 |
Hey @edsu, any chance that you can work on this? I haven't been able to make it work with the changes to the original script. |
Yes, I started on it a few weeks ago and stalled. Thanks for the nudge! |
OK, I've released a port of the old network.py script as a twarc2 plugin. You should be able to install it with:
and then run it to generate a network as HTML D3:
More details about the various format options are available at https://github.com/docnow/twarc-network. Please ask questions about the plugin over in that issue tracker, if you don't mind too much! |
I've downloaded a set of tweets with twarc2 in .jsonl and I am now trying to create .gexf or other network-usable files (a list of nodes and edges, being able to select which relationship to use). While running utils/network.py it throws some errors, as the names of the variables (that's my guess) are not the same in API v2 (id instead of id_str, author instead of user...) and the script is unable to process them. Which way do you recommend to transform the data into files usable for data analysis?
Thanks!
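For the "list of nodes and edges" part of the question, a minimal standard-library sketch of writing an edge list as CSV while selecting one relationship. It is not how network.py or any twarc plugin works; the function name `write_edge_list`, the column names and the sample edges are all made up, and the edges are assumed to be `(source, target, type)` tuples already extracted from the tweets.

```python
import csv
import io


def write_edge_list(edges, out, relationship=None):
    """Write (source, target, type) edges as CSV, optionally filtered by type.

    Returns the number of edge rows written (header excluded).
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["source", "target", "type"])
    count = 0
    for source, target, kind in edges:
        if relationship is None or kind == relationship:
            writer.writerow([source, target, kind])
            count += 1
    return count


# Hypothetical edges; in practice these would come from the tweet file.
sample_edges = [
    ("alice", "bob", "retweeted"),
    ("carol", "bob", "quoted"),
]

buffer = io.StringIO()  # stands in for an open file such as open("edges.csv", "w", newline="")
written = write_edge_list(sample_edges, buffer, relationship="retweeted")
print(buffer.getvalue())
```

A source/target CSV like this can be imported by most network tools; the node list is simply the set of names appearing in either column.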