Report JSON parsing errors when generating CSV #39

mihirp161 · 2020-05-10T15:45:15Z

Hello, I have received this error back to back when exporting to csv file:

edsu · 2020-05-10T20:27:37Z

Thanks for reporting this. Is it a large JSON file? Can you share it with me at ehs@pobox.com?

mihirp161 · 2020-05-10T20:30:32Z

Sure sir, thank you for looking into this. I will share it with you. I put a 20 MB text file into the Hydrator, and in return I got a 2.8 GB JSONL file.

mihirp161 · 2020-05-10T22:49:22Z

Hello @edsu, I just sent the JSON file which led me to this error. Thank you & your team for making this tool!

edsu · 2020-05-11T11:33:32Z

I got it. Interestingly, of the 621,567 lines there appears to be one line (307,400) that contains invalid JSON. It looks truncated in some way. Do you remember whether you had a storage problem, or shut down or stopped the Hydrator about midway through hydrating this tweet id dataset? Stopping and restarting shouldn't be a problem, but I should test it to make sure.

Have you been able to hydrate successfully before?
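
A check like the one described above (finding which lines of a JSONL file fail to parse) can be sketched in Python. This is only an illustrative sketch, not Hydrator's own code; the function name `find_invalid_lines` is hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def find_invalid_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, parser_message) for each line that is not valid JSON."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are skipped, not reported
        try:
            json.loads(text)
        except json.JSONDecodeError as err:
            # A truncated tweet typically fails here, e.g. with an
            # unterminated string or a missing closing brace.
            problems.append((number, err.msg))
    return problems
```

Because an open file iterates line by line, passing `open(path, encoding="utf-8")` as `lines` should keep memory use low even for a multi-gigabyte file.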

edsu · 2020-05-11T12:21:12Z

@opendatasurgeon could you share your ID file with me too if it's not too much trouble? This should hopefully help me figure out what might have happened.

mihirp161 · 2020-05-11T14:32:51Z

> I got it. It's interesting, of the 621,567 lines there appears to be one line (307,400) that has invalid JSON on it. It looks truncated in some way. Do you remember if you happened to have a storage problem or shutdown/stop the Hydrator about midway through hydrating this tweet id dataset? It shouldn't be a problem to start/restart but I should test it out to make sure.
>
> Have you been able to hydrate successfully before?

No sir, I got no warning or note while hydrating. The error came only afterwards, when I hit the CSV button; the JSON output gave no error. Also, my computer didn't sleep, and I don't have a storage issue: I have ~280 GB left on the drive where I ran this program. I didn't stop the hydration; I just waited until I got the option to export. It was a continuous process on my end. The only thing missing hardware-wise is a dedicated audio device. Whenever I need to hear something, I just put on a headset.

mihirp161 · 2020-05-11T14:34:11Z

> @opendatasurgeon could you share your ID file with me too if it's not too much trouble? This should hopefully help me figure out what might have happened.

Sure Mr. Summers. I will email you the ID file that caused this error, and also the ID file and the JSON file where I was able to get output without an error.

mihirp161 · 2020-05-11T18:40:46Z

Mr. Summers, I sent you an email containing the ID files. Please let me know if there is anything else you need. I appreciate your help again!

edsu · 2020-05-11T18:56:52Z

Ok, I will take a look. I rehydrated the tweet ids, and I also tested generating CSV from hydrated data after starting and stopping hydration, quitting the app, and cutting the network partway through. I wasn't able to get a corrupted jsonl file.

edsu · 2020-05-11T22:01:15Z

Thanks @opendatasurgeon. I'm confused about the relationship between the output-2020-03-22.jsonl file you sent me (which has 621,567 lines) and the output-2020-03-22.txt file you sent (which has 1,069,472 lines). I would expect the numbers to be much closer together unless a large number of tweets had been deleted. Can you see the % deleted value on the dataset detail view? It looks like it didn't finish hydrating, or a significant chunk of the file was lost. Can you please try hydrating it again and see if you have the same problem? I can also try hydrating here and see what happens.
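
The comparison above (tweet IDs in the input file versus lines in the hydrated output) can be sketched as follows. This is a rough illustration under assumptions: `percent_missing` is a hypothetical helper, and Hydrator's own "% deleted" figure may be computed differently. Duplicate IDs are counted once, since a merged ID file can contain repeats.

```python
from typing import Iterable


def percent_missing(tweet_ids: Iterable[str], hydrated_count: int) -> float:
    """Percentage of unique tweet IDs that have no hydrated line.

    tweet_ids: lines from the ID text file (one ID per line).
    hydrated_count: number of valid lines in the hydrated JSONL file.
    """
    # Deduplicate and drop blank lines before counting.
    unique_ids = {line.strip() for line in tweet_ids if line.strip()}
    if not unique_ids:
        return 0.0
    missing = len(unique_ids) - hydrated_count
    return 100.0 * missing / len(unique_ids)
```

Treating every input line as unique, the two counts quoted above (1,069,472 IDs and 621,567 hydrated lines) would put roughly 42% of the IDs without a hydrated tweet.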

mihirp161 · 2020-05-11T22:17:03Z

Will do sir. Unfortunately I didn't capture a screenshot of the % lost. I will re-run this, however, and report back to you. Thank you :)

mihirp161 · 2020-05-11T22:32:36Z

Oh yes, before I forget to tell you. The source of the data I am using to hydrate is here: https://github.com/echen102/COVID-19-TweetIDs
This text file (output-2020-03-22.txt) containing Twitter IDs is from the March (2020-03) directory. You will see a name difference because I combined all the chunks pertaining to the same day into one text file. I believe they are separate in the original data source because GitHub only allows 25 MB uploads, which is why they split each giant text file into parts. I am re-running it now.

mihirp161 · 2020-05-12T03:55:11Z

Hello @edsu, I just finished hydrating the text file. No error this time. Same computer, same process (meaning continuous hydration, no computer sleeping, no disconnects, no starting or stopping). I don't get it. The CSV and JSON files are certainly smaller in size. Please see the screenshots below.

edsu · 2020-05-12T12:34:41Z

@opendatasurgeon hydration worked for me too, 53% of the tweets have been deleted! That is a shockingly large percentage for such a recent set of tweets (just over a month old). I guess it's not surprising given that these appear to be COVID-19 related tweets, and there have been widespread disinformation campaigns about it.

Do you think it's possible you may have overwritten an output file accidentally when hydrating two files at the same time?

I'm going to leave this ticket open because Hydrator should report an error in the JSON rather than throwing an exception.
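
The behaviour asked for here (report the bad line instead of throwing) might look like the sketch below. It is a minimal Python illustration, not Hydrator's implementation; `jsonl_to_csv` and the default `fields` are assumptions made for the example.

```python
import csv
import io
import json
from typing import Iterable, List, Sequence, TextIO


def jsonl_to_csv(lines: Iterable[str], out: TextIO,
                 fields: Sequence[str] = ("id", "text")) -> List[str]:
    """Write one CSV row per valid JSON line; return messages for skipped lines."""
    writer = csv.writer(out)
    writer.writerow(fields)
    errors = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError as err:
            # Record the problem and keep going instead of aborting the export.
            errors.append(f"line {number}: {err.msg}")
            continue
        if not isinstance(tweet, dict):
            errors.append(f"line {number}: expected a JSON object")
            continue
        writer.writerow([tweet.get(name, "") for name in fields])
    return errors


if __name__ == "__main__":
    demo = ['{"id": 1, "text": "a"}\n', '{"id": 2, "te', '{"id": 3, "text": "c"}\n']
    buffer = io.StringIO()
    for message in jsonl_to_csv(demo, buffer):
        print("skipped", message)
```

Returning the messages rather than raising lets the caller decide how to surface them, for example as a summary once the export finishes.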

edsu · 2020-05-12T14:29:49Z

I closed this by accident and want to keep it open until Hydrator reports errors better.

mihirp161 · 2020-05-12T15:07:40Z

> Do you think it's possible you may have overwritten an output file accidentally when hydrating two files at the same time?

I don't think I overwrote any of my files, sir. I always name my files differently, and I believe I just merged the files that were chunks of the same day into one. In fact, I left duplicate IDs in these files because I was going to clean the CSV files anyway. So I don't know what could have happened.

Do you think maybe there is a limit? Should I test with a May file? I know for sure that the first time around the deletion % wasn't above 30-40%. I will report back to you. Thanks again for helping out Mr Summers!!

mihirp161 · 2020-05-18T14:53:40Z

I am not getting this error these days, even when hydrating a text file of >1,500,000 Twitter IDs. I will keep you in the loop, Mr. Summers. Thank you!

edsu · 2020-05-18T17:02:50Z

Well, that's a relief. If you notice it again and can figure out a way to reproduce it, please let us know. I will leave this ticket open until the app reports the JSON parse error better.

edsu closed this as completed May 12, 2020

edsu reopened this May 12, 2020

edsu mentioned this issue May 14, 2020

Unexpected End of JSON input #41

Closed

edsu mentioned this issue May 21, 2020

Report JSON parsing errors. #42

Open

edsu changed the title ~~JS error at the main process, unexpected token at JSON in position 0~~ Report JSON parsing errors when generating CSV Sep 20, 2021
