Report JSON parsing errors. #42

Open
sourabhnk opened this issue May 21, 2020 · 10 comments

@sourabhnk

I am getting this error when I run a small sample of ~300 tweet IDs. None of the IDs are getting hydrated.
This is what the tool shows:
[screenshot: hydratorerror]

@edsu
Member

edsu commented May 21, 2020

I think this may be a variation on #39.

@sourabhnk is this problem repeatable? For example, if you rehydrate the same tweet ids and try to generate the CSV, does it throw the same error?

@edsu
Member

edsu commented May 21, 2020

Also, would you be willing to send me your ids and the jsonl file that was generated when the error was thrown? You can attach them here or send them to ehs@pobox.com. Thanks!

@sourabhnk
Author

Hi Ed,
I tried with a different file (~300 IDs) but I am getting the same error.
For some reason, the JSON files are also 0 KB. Could it be that there is an issue with the IDs themselves? I will check out #39 and send you the CSV shortly by mail.

@edsu
Member

edsu commented May 21, 2020

Yes, if the JSON file is empty that could definitely be a problem. But Hydrator should handle that situation. It hasn't come up before.

@sourabhnk
Author

I have shared the csv files by mail. #39 didn't help much in my case.
Thanks for your help.

@edsu changed the title from "JS Error: Unexpected end of JSON Input" to "Report JSON parsing errors." on May 25, 2020
@edsu
Member

edsu commented May 25, 2020

@sourabhnk it appears that your tweet ids were corrupted by opening them in Excel and saving them. Excel cannot represent such large numbers exactly: it keeps only 15 significant digits, so the trailing digits are replaced with zeros. It is apparent when looking at your files because all the ids end in four zeros.
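
To illustrate the corruption described above, here is a minimal Python sketch. It is not part of Hydrator; the function names and the sample id are hypothetical, and the 15-digit cutoff is used as a simple approximation of the spreadsheet behavior. It shows how a 19-digit id loses its last four digits and how such ids might be flagged.

```python
def excel_like_truncate(tweet_id: str, digits: int = 15) -> str:
    """Approximate a 15-significant-digit limit by zeroing later digits."""
    if len(tweet_id) <= digits:
        return tweet_id
    return tweet_id[:digits] + "0" * (len(tweet_id) - digits)


def looks_truncated(tweet_id: str, zeros: int = 4) -> bool:
    """Heuristic: a long id ending in several zeros was probably damaged."""
    return len(tweet_id) > 15 and tweet_id.endswith("0" * zeros)


original = "1263456789012345678"  # hypothetical 19-digit tweet id
damaged = excel_like_truncate(original)
print(damaged)  # 1263456789012340000

# The same loss happens whenever the id passes through a 64-bit float.
print(int(float(int(original))) == int(original))  # False
```

A real tweet id can legitimately end in zeros, so `looks_truncated` is only a hint; a whole file in which every id ends in four zeros is the stronger signal.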

This means that none of the tweets are able to hydrate, and when you go to convert the JSON to CSV there is nothing to convert. The Hydrator should report when the JSON file is empty. So I'm going to leave this issue open until it does that.

@sourabhnk
Author

Yes Ed, I had extracted the TSV file and used Excel to save it as CSV. That is probably the cause.
Let me use the original file and redo this.
As to the problem of reporting, I am not sure where it went wrong.

@edsu
Member

edsu commented May 26, 2020

Awesome, thank you @sourabhnk. The reporting issue isn't something you did wrong; it is an improvement we need to make in the application!

@sourabhnk
Author

Hi Ed,
I have a couple of updates on this issue:

  1. For converting the TSV file of tweet IDs, when I use read.table or read.csv in R, the IDs get corrupted the moment the data frame is created in the R environment. But when I use fread, this issue doesn't crop up, so the tweet IDs are maintained in their original form.
  2. Hydrator throws an error prompt only when the first row has a wrong ID.
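
Both points above can be sketched in Python (chosen here for illustration; the thread itself uses R). The sketch reads ids as text so they are never converted to numbers, and it checks every row rather than only the first. The function name, the id pattern, and the sample data are assumptions made for this example, not Hydrator behavior.

```python
import csv
import io
import re

# Assumption: a valid tweet id is a plain run of 1 to 19 ASCII digits.
ID_PATTERN = re.compile(r"^[0-9]{1,19}$")


def read_tweet_ids(handle, column=0, delimiter="\t"):
    """Read ids as strings and report every row that is not a plain integer."""
    valid, problems = [], []
    reader = csv.reader(handle, delimiter=delimiter)
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        value = row[column].strip() if column < len(row) else ""
        if ID_PATTERN.match(value):
            valid.append(value)
        else:
            problems.append((line_number, value))
    return valid, problems


# Hypothetical sample: the second row was mangled into scientific notation.
sample = "1263456789012345678\n1.26346E+18\n1263456789012345679\n"
valid, problems = read_tweet_ids(io.StringIO(sample))
print(valid)     # ['1263456789012345678', '1263456789012345679']
print(problems)  # [(2, '1.26346E+18')]
```

Because the ids never leave string form, no precision is lost, and a bad id anywhere in the file is reported with its line number.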

Having said this, I have almost resolved the issue. If you would like, we can close this thread.

@edsu
Member

edsu commented May 29, 2020

Thanks for the update @sourabhnk. I am starting to think we should encourage people to share tweet ids in a quoted form in .csv files to prevent this sort of thing from happening. You aren't the first person to run into this problem and won't be the last!
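
One way the quoted form suggested above might be produced is sketched below in Python, using the standard `csv` module with `csv.QUOTE_ALL`. The function name and the `id` header are hypothetical. Quoting marks each id as text for tools that honor it; some spreadsheet programs may still convert quoted digits to numbers when the file is opened directly, so this is a mitigation rather than a guarantee.

```python
import csv
import io


def write_quoted_ids(tweet_ids, handle):
    """Write one id per row with every field wrapped in double quotes."""
    writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["id"])  # hypothetical header name
    for tweet_id in tweet_ids:
        writer.writerow([str(tweet_id)])


buffer = io.StringIO()
write_quoted_ids(["1263456789012345678", "1263456789012345679"], buffer)
print(buffer.getvalue())
# "id"
# "1263456789012345678"
# "1263456789012345679"

# Reading the file back with csv.reader yields the ids as unchanged strings.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows[1][0])  # 1263456789012345678
```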

I'd like to keep this issue open until the Hydrator better reports problems with empty or malformed JSON files. Thank you for your help in diagnosing the problem!
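
As a rough idea of what such reporting could look like, here is a minimal Python sketch. Hydrator itself is not written in Python, so this is not its implementation; the function name and messages are hypothetical. It checks line-delimited JSON before any CSV conversion, reports an empty file explicitly, and reports each malformed line with its line number.

```python
import json


def check_jsonl(text):
    """Return (records, errors) for a string of line-delimited JSON."""
    records, errors = [], []
    if not text.strip():
        # An empty file means no tweets were hydrated, so say so directly.
        errors.append("file is empty: no tweets were hydrated")
        return records, errors
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            errors.append(f"line {line_number}: {exc.msg}")
    return records, errors


# Hypothetical input: one good record followed by a truncated one.
records, errors = check_jsonl('{"id_str": "1"}\n{"id_str":\n')
print(len(records), len(errors))  # 1 1
print(check_jsonl("")[1])  # ['file is empty: no tweets were hydrated']
```

Collecting errors instead of stopping at the first one lets the user see every bad line in a single pass.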


2 participants