
Error: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) #4

Closed
DehengYang opened this issue Jul 30, 2020 · 10 comments


@DehengYang

I encountered the following error when running python3 -m sosed.run -i input_examples/input.txt -o output/output_example:

(sosed-env) dale@dale:~/sosed$ python3 -m sosed.run -i input_examples/input.txt -o output/output_example
Running tokenizer on repos listed in input_examples/input.txt
Parser successfully initialized.
Enry successfully initialized.
Tokenizing the repositories.
Tokenizing batch 1 out of 1.
  0%|                                                                                                                                   | 0/1 [00:00<?, ?it/s]Segmentation fault
  0%|                                                                                                                                   | 0/1 [05:51<?, ?it/s]
Traceback (most recent call last):
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/apr/apr_tools/sosed/sosed/run.py", line 198, in <module>
    tokenize(args.input, args.output, args.batches, args.local, args.force)
  File "/home/apr/apr_tools/sosed/sosed/run.py", line 36, in tokenize
    run_tokenizer(tokenizer_args)
  File "/home/apr/apr_tools/sosed/tokenizer/identifiers_extractor/run.py", line 19, in main
    batch_size=int(args.batches), local=args.local)
  File "/home/apr/apr_tools/sosed/tokenizer/identifiers_extractor/parsing.py", line 304, in tokenize_repositories
    lang2files = recognize_languages(td)
  File "/home/apr/apr_tools/sosed/tokenizer/identifiers_extractor/parsing.py", line 212, in recognize_languages
    .format(enry_loc=get_enry(), directory=directory)))
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/json/__init__.py", line 348, in loads
    return _default_decoder.decode(s)
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The content of the input.txt file is:

https://github.com/google/closure-compiler

Is there any way to deal with this error? Any guidance or solution would be much appreciated.

Thanks!

@egor-bogomolov
Collaborator

Hi! Could you specify your system details (mainly OS)? I've tried to reinstall Sosed from scratch and run your example, and it worked fine:
python3 -m sosed.run -i input.txt -o output/closure/
...

Query project: https://github.com/google/closure-compiler
https://github.com/st-js/st-js | similarity = 1.2118
https://github.com/google/compile-testing | similarity = 1.2119
https://github.com/rzwitserloot/lombok | similarity = 1.2121
https://github.com/cincheo/jsweet | similarity = 1.2137
https://github.com/peichhorn/lombok-pg | similarity = 1.2138
https://github.com/codemix/babel-plugin-typecheck | similarity = 1.2142
https://github.com/BladeRunnerJS/brjs | similarity = 1.2143
https://github.com/nativelibs4java/Scalaxy | similarity = 1.2144
https://github.com/ceylon/ceylon-compiler | similarity = 1.2144
https://github.com/google/closure-templates | similarity = 1.2145

I will look deeper into your stacktrace and try to understand the reason but information about your system will definitely be helpful.

@egor-bogomolov
Collaborator

egor-bogomolov commented Jul 30, 2020

Based on the stacktrace, something went wrong during the language recognition step.
Could you please run enry from the command line and attach the output?

tokenizer/identifiers_extractor/language_recognition/build/enry -json -mode files [path to some small directory with code]
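For anyone debugging a similar failure: the JSONDecodeError at "char 0" typically means json.loads received an empty string because the enry subprocess crashed before printing anything. A minimal defensive sketch of the call (the run_enry helper name is mine, not part of Sosed; the flags match the command above):

```python
import json
import subprocess

def run_enry(enry_path: str, directory: str) -> dict:
    """Run enry on `directory` and parse its JSON output, raising a
    descriptive error instead of a bare JSONDecodeError when the
    binary crashes (e.g. segfaults) and emits nothing."""
    result = subprocess.run(
        [enry_path, "-json", "-mode", "files", directory],
        capture_output=True,
    )
    if result.returncode != 0 or not result.stdout.strip():
        raise RuntimeError(
            f"enry exited with code {result.returncode} and produced "
            f"{len(result.stdout)} bytes of output; run it manually "
            "to check for a crash"
        )
    return json.loads(result.stdout)
```

A negative returncode on POSIX indicates the process was killed by a signal (-11 is SIGSEGV), which matches the "Segmentation fault" seen in the log above.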

@DehengYang
Author

Thank you so much for the detailed guidance!

My OS is Ubuntu 14.04.

I ran the command you provided and it reported a segmentation fault:

dale@dale:~/dale/sosed$ tokenizer/identifiers_extractor/language_recognition/build/enry -json -mode files Closure/
Segmentation fault
dale@dale:~/dale/sosed$ tokenizer/identifiers_extractor/language_recognition/build/enry -json -mode files Closure/src/com/google/javascript/jscomp/
Segmentation fault

I have no idea why this occurs...

Thank you again for your help!

@egor-bogomolov
Collaborator

Thanks a lot! What I suspect is:

  1. We use enry to detect languages in source code files
  2. Instead of building it from scratch when setting up the tokenizer, we just download a proper release version based on the OS
  3. In your case, the prebuilt version seems to fail :(

A workaround for now would be to build enry from scratch and check whether that works. We will think about the proper way to add a build step for the case when the prebuilt binary fails; see the issue in Buckwheat I've just opened.
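Before rebuilding, it can help to confirm programmatically that the prebuilt binary is the culprit. A minimal sketch (the binary_runs helper is my own illustration, not part of Sosed):

```python
import subprocess

def binary_runs(path: str) -> bool:
    """Return True if the executable at `path` can be launched and does
    not die from a signal. On POSIX, a negative returncode means the
    process was killed by a signal (-11 is SIGSEGV)."""
    try:
        result = subprocess.run([path], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode >= 0
```

If binary_runs("tokenizer/identifiers_extractor/language_recognition/build/enry") returns False on your machine, replacing the prebuilt binary (by rebuilding or downloading a different release) is the next step.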

@DehengYang
Author

Thank you so much for providing this workaround! The error no longer occurs since I downloaded the released version of enry and re-ran python3 -m sosed.run -i input_examples/input.txt -o output/output_example.

However, when it comes to downloading the data, the speed is extremely slow, especially for https://s3-eu-west-1.amazonaws.com/resources.ml.labs.aws.intellij.net/sosed/data_stars_100.tar.xz. I also tried a different network and downloading it via a web browser, but both failed. Is there any other way to obtain this tar.xz? Thank you!

@DehengYang
Author

Or could you please send me a copy of data_stars_100.tar.xz, data_stars_50.tar.xz, and data_stars_10.tar.xz via email? (My email is dehengyang@qq.com.) Any help would be sincerely appreciated.

Thank you again for your great help and patience!

@egor-bogomolov
Collaborator

I've sent you two emails. The archives are too large to send directly, so the emails contain links to Google Drive / OneDrive. If neither works for you, let's try some other file sharing service :)

@DehengYang
Author

Thank you so much for the great and timely help, which really helped me a lot! I now have all the data_stars_{}.tar.xz files and have decompressed them into the folder shown below:

[screenshot: decompressed data_stars_{}.tar.xz archives in the data folder]

However, a new error occurs when I run python3 -m sosed.run -i input_examples/input.txt -o output/closure/ --force or python3 -m sosed.run -i input_examples/input.txt -o output/closure/ (without --force):

(sosed-env) dale@dale:~/dale/sosed$ python3 -m sosed.run -i input_examples/input.txt -o output/closure/ --force
Running tokenizer on repos listed in input_examples/input.txt
Parser successfully initialized.
Enry successfully initialized.
Tokenizing the repositories.
Tokenizing batch 1 out of 1.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [07:08<00:00, 428.97s/it]
Tokenization successfully completed.
Found 1 batches with tokenized data.
Assigning clusters to tokens from vocab file.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 39002982/39002982 [00:23<00:00, 1677005.10it/s]
Computing vectors for 1 repositories.
Extracting stats for 1 repositories
Traceback (most recent call last):
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/apr/apr_tools/sosed/sosed/run.py", line 201, in <module>
    analyze(processed_data, args.min_stars, args.closest, args.explain, args.metric, args.lang)
  File "/home/apr/apr_tools/sosed/sosed/run.py", line 107, in analyze
    clusters_info = get_clusters_info()
  File "/home/apr/apr_tools/sosed/sosed/utils.py", line 129, in get_clusters_info
    return pickle.load(filepath.open('rb'))
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/pathlib.py", line 1203, in open
    opener=self._opener)
  File "/home/apr/anaconda3/envs/sosed-env/lib/python3.7/pathlib.py", line 1058, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'data/clusters_info.pkl'

Any further guidance would be much appreciated.

Thank you again for your great kindness and help!

@egor-bogomolov
Collaborator

Well, as the error states, you are missing a pickle file :)
Fortunately, it is a small one, stored right in the repository. It should have been downloaded when you cloned the repo, but either something went wrong, or you deleted it afterwards :)
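For anyone hitting the same FileNotFoundError, a small defensive variant of the loading code makes the failure self-explanatory. A sketch (the default path comes from the traceback above; the restore hint assumes the file is tracked in git):

```python
import pickle
from pathlib import Path

def get_clusters_info(filepath: Path = Path("data/clusters_info.pkl")):
    """Load the cluster-info pickle, failing with an actionable message
    when the file is missing instead of a bare FileNotFoundError."""
    if not filepath.exists():
        raise FileNotFoundError(
            f"{filepath} is missing; it ships with the repository, so "
            f"`git checkout -- {filepath}` should restore it"
        )
    with filepath.open("rb") as f:
        return pickle.load(f)
```

This keeps the original behavior on the happy path and only changes what the user sees when the file was accidentally deleted.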

@DehengYang
Author

DehengYang commented Aug 2, 2020

Thank you very much for pointing this out! I am sorry that I unintentionally deleted this file at the very beginning, when I was trying to solve the error reported above by myself. Now everything works well, and I obtained the same output as yours:

Found tokenizer output in output/closure/.
If you want to re-run tokenizer, pass --force flag.
Found precomputed vectors in output/closure.
If you wan to re-run vector computation, pass --force flag.

-----------------------
Query project: https://github.com/google/closure-compiler
https://github.com/st-js/st-js | similarity = 1.2102
https://github.com/google/compile-testing | similarity = 1.2102
https://github.com/rzwitserloot/lombok | similarity = 1.2105
https://github.com/cincheo/jsweet | similarity = 1.2119
https://github.com/peichhorn/lombok-pg | similarity = 1.2121
https://github.com/codemix/babel-plugin-typecheck | similarity = 1.2126
https://github.com/nativelibs4java/Scalaxy | similarity = 1.2126
https://github.com/BladeRunnerJS/brjs | similarity = 1.2126
https://github.com/ceylon/ceylon-compiler | similarity = 1.2127
https://github.com/google/closure-templates | similarity = 1.2129
-----------------------

Sorry for my delayed reply, and thank you again for your continued help and great kindness. This issue is now solved.

Wish you a nice day!
