Id mismatch? #972

Closed
chmod opened this issue Jun 16, 2023 · 2 comments
Labels
bug Something isn't working upstream

Comments

chmod commented Jun 16, 2023

I am running the command snscrape --jsonl --progress --max-results 1 twitter-search "@Eikonikos_HQ filter:replies -from:Eikonikos_HQ", which currently returns the following (trimmed for brevity):

{
  "_type": "snscrape.modules.twitter.Tweet",
  "url": "https://twitter.com/estfino/status/1669700650467246080",
  "id": 1669700650467246000,
  "conversationId": 1669700100971536400,
  "inReplyToTweetId": 1669700100971536400,
   .....
}
  • Shouldn't id be 1669700650467246080?
  • Shouldn't inReplyToTweetId be 1669700100971536389?

My understanding is that the last part of a tweet URL is the tweet ID. The IDs in the output above lead to an error page.

Edit: I am using version of GitHub

@chmod chmod added the question Further information is requested label Jun 16, 2023
@JustAnotherArchivist
Owner

snscrape returns the correct IDs:

$ snscrape --jsonl --progress --max-results 1 twitter-search "@Eikonikos_HQ filter:replies -from:Eikonikos_HQ"
{"_type": "snscrape.modules.twitter.Tweet", "url": "https://twitter.com/estfino/status/1669700650467246080", "date": "2023-06-16T13:37:11+00:00", "rawContent": "<snip>", "renderedContent": "<snip>", "id": 1669700650467246080, "user": { <snip>

You are likely passing the output through jq or similar software that doesn't parse large integers correctly. (jq fixed that bug some time ago, but the fix hasn't been released yet.) That's what mangles the IDs.
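
For illustration, the effect can be reproduced without jq. The following is a minimal sketch, assuming the downstream tool stores every JSON number as an IEEE 754 double; Python's `parse_int=float` hook stands in for such a parser here, and the `raw` line is a hand-trimmed record built from the two IDs quoted in this issue, not real snscrape output.

```python
import json

# Hand-trimmed record using the two IDs from this issue (illustrative only).
raw = '{"id": 1669700650467246080, "inReplyToTweetId": 1669700100971536389}'

# Python's json module keeps integers exact by default.
exact = json.loads(raw)

# Emulate a parser that converts every integer to a 64-bit float.
lossy = json.loads(raw, parse_int=float)

for key in exact:
    as_float = int(lossy[key])
    print(key, exact[key], as_float, "unchanged" if exact[key] == as_float else "altered")
```

Doubles carry 53 bits of mantissa, so integers above 2**53 cannot all be represented. Around these IDs, neighbouring doubles are 256 apart, which is why inReplyToTweetId comes back altered. An ID that happens to survive the conversion may still be printed in rounded decimal form by such a tool.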

If you can't switch to a JSON parser that isn't broken, you can use --jsonl-for-buggy-int-parser to emit JSONL with additional string fields (id.str etc.) for each field whose integer value exceeds float precision.
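
A consumer could then read the string variant instead of the integer. This is only a sketch under stated assumptions: the `line` below is a hypothetical, trimmed record, the literal key name `id.str` is inferred from the description above rather than from captured output, and `tweet_id` is a helper invented for this example.

```python
import json

# Hypothetical, trimmed record; the "id.str" key name is an assumption.
line = '{"id": 1669700650467246080, "id.str": "1669700650467246080"}'


def tweet_id(record: dict) -> str:
    """Return the ID as a string, preferring the string field when present."""
    # Fall back to the integer field for small IDs that have no .str twin.
    return record.get("id.str") or str(record["id"])


print(tweet_id(json.loads(line)))
```

Keeping IDs as strings end to end avoids the precision problem regardless of which JSON parser sits further down the pipeline.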

@JustAnotherArchivist JustAnotherArchivist added bug Something isn't working upstream and removed question Further information is requested labels Jun 16, 2023
@chmod
Author

chmod commented Jun 16, 2023

Thank you for the reply. I'll update jq.
