Betsy edited this page Oct 19, 2021 · 5 revisions

Tricks for exploring Twarc json with jq

These snippets are handy for working out the JSON structure of Twarc data. They should work in bash or similar shells, provided jq (a command-line utility for manipulating JSON) is installed; see the jq installation docs.

To get a list of all the tweet object keys in the first page of results (head -n 1 takes the first line of the file, which holds one page of API results):

head -n 1 tests/data/ObservatoryTeam.jsonl | jq '.data | map(keys) | flatten | unique'
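To see what each stage of that filter contributes, here is a minimal sketch run against a tiny made-up page instead of the real test file. The sample JSON and its field values are assumptions for illustration only; real Twarc pages contain many more keys.

```shell
# Hypothetical sample: one line of JSON shaped like a single page of results.
# The field names and values are illustrative, not taken from the real test data.
page='{"data":[{"id":"1","text":"hello"},{"id":"2","text":"hi","lang":"en"}]}'

# map(keys) produces one key list per tweet, flatten merges the lists,
# and unique sorts the result and removes duplicates.
# -c prints the result compactly on a single line.
keys=$(printf '%s\n' "$page" | jq -c '.data | map(keys) | flatten | unique')
echo "$keys"
```

Because the second tweet carries a key the first one lacks, the output is the union of keys across all tweets on the page, not just the keys of the first tweet.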

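To list the keys of objects nested deeper in the structure (objects within arrays within objects, and so on). The objects and arrays filters pass through only values of that type, so entries where the field is missing are skipped: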

head -n 1 tests/data/ObservatoryTeam.jsonl | jq '[.includes.users[].entities | objects | keys] | flatten | unique'
head -n 1 tests/data/ObservatoryTeam.jsonl | jq '[.data[].referenced_tweets | arrays | map(keys)] | flatten | unique'

To see all the distinct values of a specific key (in this case, 'type') in the same kind of nested structure:

head -n 1 tests/data/ObservatoryTeam.jsonl | jq '[.data[].referenced_tweets | arrays | map(.type) ] | flatten | unique'
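The role of the arrays filter is easier to see on a small example. The sketch below uses a made-up page in which one tweet has no referenced_tweets field; the sample data is an assumption for illustration, not real Twarc output.

```shell
# Hypothetical sample page: tweet "2" has no referenced_tweets field.
page='{"data":[{"id":"1","referenced_tweets":[{"type":"retweeted","id":"9"}]},{"id":"2"},{"id":"3","referenced_tweets":[{"type":"quoted","id":"8"},{"type":"replied_to","id":"7"}]}]}'

# .data[].referenced_tweets yields an array, then null, then another array.
# arrays drops the null, so map(.type) only ever runs on real arrays.
types=$(printf '%s\n' "$page" | jq -c '[.data[].referenced_tweets | arrays | map(.type)] | flatten | unique')
echo "$types"
```

Without arrays, map(.type) would be applied to the null from tweet "2", and jq would stop with an error instead of printing the list.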

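Alternatively, to run the same query over every page of API results in the file, drop head and feed the whole file to jq. Redirecting with < is easier than passing the filename directly to jq, which doesn't tab-complete. jq runs the filter once per line, so it prints one list per page: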

jq '[.data[].referenced_tweets | arrays | map(.type) ] | flatten | unique' < tests/data/ObservatoryTeam.jsonl 
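If a single combined list across all pages is more useful than one list per page, jq's -s (--slurp) option is one way to get it. This is a sketch under the assumption that the file is small enough to load at once; the two-page sample file below is made up for illustration.

```shell
# Hypothetical two-page sample written to a temporary file (one page per line).
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
{"data":[{"id":"1","referenced_tweets":[{"type":"retweeted","id":"9"}]},{"id":"2"}]}
{"data":[{"id":"3","referenced_tweets":[{"type":"quoted","id":"8"}]}]}
EOF

# Without -s, jq runs the filter once per line and prints one list per page.
per_page=$(jq -c '[.data[].referenced_tweets | arrays | map(.type)] | flatten | unique' < "$tmp")

# With -s, all pages are first read into one array, so the filter starts with .[]
# to iterate over the pages and the result is a single list.
combined=$(jq -s -c '[.[].data[].referenced_tweets | arrays | map(.type)] | flatten | unique' < "$tmp")

echo "$per_page"
echo "$combined"
rm -f "$tmp"
```

Slurping reads the entire file into memory before the filter runs, so for very large collections the per-page form may be the safer choice.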