Json parsing checklist #1

Open
14 of 19 tasks
betsybookwyrm opened this issue Oct 6, 2021 · 2 comments
@betsybookwyrm
Contributor

betsybookwyrm commented Oct 6, 2021

This checklist covers the first pass at parsing every object in the JSON. It's okay if this pass misses some cases, but it should get us to the point where most of the data goes into the database and we can start testing to find what we've missed.

twarc + Twitter JSON:

  • _twarc
  • data
    • tweet fields
    • mentions
    • annotations
    • context annotations
    • references
  • includes
    • media
    • places
    • tweets (as for data above)
    • users
      • user fields
      • urls
      • hashtags
      • mentions
      • other entities???
  • meta
  • errors
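A rough way to check coverage against the checklist above is to count how many objects of each kind appear in one page of results. The following is only a sketch: `count_objects` is a hypothetical helper (not part of tidy_tweet), and it assumes each page is a dict with the usual Twitter API v2 top-level keys (`data`, `includes`, `errors`) and that tweet entities live under an `entities` key.

```python
from collections import Counter


def count_objects(page: dict) -> Counter:
    """Count the objects in one result page, keyed by checklist section.

    Hypothetical helper for illustration only; the key names assume the
    Twitter API v2 response layout.
    """
    counts = Counter()

    # "data" may be a list of tweets or, for single lookups, one tweet.
    data = page.get("data", [])
    if isinstance(data, dict):
        data = [data]
    counts["data"] = len(data)

    for tweet in data:
        # Entities (mentions, annotations, urls, ...) are lists per kind.
        for kind, items in tweet.get("entities", {}).items():
            counts[f"data.entities.{kind}"] += len(items)
        counts["data.context_annotations"] += len(
            tweet.get("context_annotations", [])
        )
        counts["data.referenced_tweets"] += len(
            tweet.get("referenced_tweets", [])
        )

    # "includes" holds expanded objects: media, places, tweets, users.
    for kind, items in page.get("includes", {}).items():
        counts[f"includes.{kind}"] = len(items)

    counts["errors"] = len(page.get("errors", []))
    return counts
```

Comparing these counts with the row counts in the database after a load would be one way to spot sections the parser is silently dropping.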
betsybookwyrm pinned this issue Oct 6, 2021
@betsybookwyrm
Contributor Author

I'm pasting the more complex jq commands I use to explore the twarc/Twitter JSON structure onto the wiki, so we don't have to figure them out again: https://github.com/QUT-Digital-Observatory/tidy_tweet/wiki

@betsybookwyrm
Contributor Author

To start with, I'm ignoring changes over time: initially, it will only be able to accurately ingest one API result file per database. We'll change this later.
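
One way the "one result file per database" restriction could be enforced is to record the source file in the database and refuse a second load. This is a minimal sketch using Python's built-in `sqlite3`; the function `ingest_once` and the `source_file` and `tweet` tables are made up for illustration and are not the actual tidy_tweet schema.

```python
import sqlite3


def ingest_once(conn: sqlite3.Connection, source_name: str, tweets: list) -> None:
    """Load tweets from one result file, refusing a second file.

    Hypothetical sketch: table and column names are assumptions.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS source_file (name TEXT PRIMARY KEY)")
    conn.execute("CREATE TABLE IF NOT EXISTS tweet (id TEXT PRIMARY KEY, text TEXT)")

    # If any file has already been loaded, stop before touching the data.
    already = conn.execute("SELECT name FROM source_file").fetchone()
    if already is not None:
        raise ValueError(
            f"database already holds {already[0]!r}; "
            "only one result file per database is supported for now"
        )

    # Record the file and its tweets in a single transaction.
    with conn:
        conn.execute("INSERT INTO source_file (name) VALUES (?)", (source_name,))
        conn.executemany(
            "INSERT INTO tweet (id, text) VALUES (?, ?)",
            [(t["id"], t.get("text")) for t in tweets],
        )
```

Failing loudly here seems safer than merging, since a second file could contain newer versions of the same tweets or users and we are not tracking changes over time yet.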
