Failed to load json using detect.py #2086

Closed
boesiii opened this issue Apr 4, 2019 · 18 comments

boesiii commented Apr 4, 2019

In which file did you encounter the issue?

python-docs-samples/vision/cloud-client/detect/detect.py

Did you change the file? If so, how?

No.

Describe the issue

I tried using detect.py on a PDF that is stored in Google Cloud Storage. Below is the command I ran:
C:\temp1\google_vision>python detect.py ocr-uri gs://my_bucket_name/file_1003.pdf gs://my_bucket_name/output/

When I run my code I get the following error:

C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output
Waiting for the operation to finish.
Output files:
output/
output/clsoutput-1-to-2.json
output/output-1-to-2.json
outputoutput-1-to-2.json
Traceback (most recent call last):
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 416, in Parse
    js = json.loads(text, object_pairs_hook=_DuplicateChecker)
  File "C:\Program Files (x86)\Python37-32\lib\json\__init__.py", line 361, in loads
    return cls(**kw).decode(s)
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "C:\Program Files (x86)\Python37-32\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "detect.py", line 955, in <module>
    run_uri(args)
  File "detect.py", line 835, in run_uri
    async_detect_document(args.uri, args.destination_uri)
  File "detect.py", line 720, in async_detect_document
    json_string, vision.types.AnnotateFileResponse())
  File "C:\Program Files (x86)\Python37-32\lib\site-packages\google\protobuf\json_format.py", line 418, in Parse
    raise ParseError('Failed to load JSON: {0}.'.format(str(e)))
google.protobuf.json_format.ParseError: Failed to load JSON: Expecting value: line 1 column 1 (char 0).

How can I avoid this error? There is a resulting JSON file in the output folder.

Author

boesiii commented Apr 9, 2019

It looks like the error is more about how it parses the JSON output file.
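
The inner exception in the traceback hints at what the parser received. A minimal sketch using only the standard library: Python's json module reports exactly this message when it is handed an empty string, which is what a zero-byte object would download as. Whether that is what happened here is only an assumption at this point in the thread, and describe_json_failure is a hypothetical helper.

```python
import json


def describe_json_failure(text):
    """Return the JSONDecodeError message for `text`, or None if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return None


# An empty payload fails at the very first character.
empty_message = describe_json_failure("")
# A well-formed document parses, so there is no message.
valid_message = describe_json_failure('{"responses": []}')
```

Here empty_message is "Expecting value: line 1 column 1 (char 0)", the same text that appears in the ParseError above.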

Contributor

nnegrey commented Apr 17, 2019

Hi, from your call C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output

It looks like you might be missing the trailing / on the gcs_destination_uri.

Should be: C:\temp1\google_vision>python detect.py ocr-uri gs://matr/file_1003.pdf gs://matr/output/

Let me know if that works.
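
To see why the trailing slash matters, it helps to look at how a destination URI breaks down into a bucket name and an object prefix. This is a minimal sketch: split_gcs_uri, names_with_prefix, and the regular expression are illustrative assumptions, not necessarily the code in detect.py. Prefix matching is treated here as a plain string-prefix test, so without the slash the prefix "output" also matches a name such as "outputoutput-1-to-2.json".

```python
import re


def split_gcs_uri(uri):
    """Split 'gs://bucket/some/prefix' into (bucket, prefix).

    Hypothetical helper for illustration only.
    """
    match = re.match(r"gs://([^/]+)/(.*)", uri)
    if match is None:
        raise ValueError("not a gs://bucket/prefix URI: {!r}".format(uri))
    return match.group(1), match.group(2)


def names_with_prefix(names, prefix):
    """Mimic prefix-based listing with a plain string-prefix test."""
    return [name for name in names if name.startswith(prefix)]


# Object names taken from the "Output files:" listing in the report.
listing = [
    "output/",
    "output/clsoutput-1-to-2.json",
    "output/output-1-to-2.json",
    "outputoutput-1-to-2.json",
]

_, bare_prefix = split_gcs_uri("gs://matr/output")
_, slash_prefix = split_gcs_uri("gs://matr/output/")

matched_bare = names_with_prefix(listing, bare_prefix)    # all four names
matched_slash = names_with_prefix(listing, slash_prefix)  # drops the last one
```

Note that the prefix with a trailing slash still matches the "output/" entry itself.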

nnegrey self-assigned this Apr 17, 2019
Author

boesiii commented Apr 18, 2019

No, still the same error.

Contributor

nnegrey commented Apr 18, 2019

Is your target GCS bucket empty?

Author

boesiii commented Apr 18, 2019

I created a new folder in my bucket and targeted that folder and still received the error.

Contributor

nnegrey commented Apr 18, 2019

Does it still throw an error if you use our example pdf?
gs://python-docs-samples-tests/HodgeConj.pdf

Author

boesiii commented Apr 19, 2019

Yes, still same error.

Contributor

nnegrey commented Apr 22, 2019

For the example pdf (gs://python-docs-samples-tests/HodgeConj.pdf), can you share a little bit of the contents of the output file?

Author

boesiii commented Apr 22, 2019

Here are the first 75 lines

{
	"inputConfig": {
		"gcsSource": {
			"uri": "gs://python-docs-samples-tests/HodgeConj.pdf"
		},
		"mimeType": "application/pdf"
	},
	"responses": [{
			"fullTextAnnotation": {
				"pages": [{
						"property": {
							"detectedLanguages": [{
									"languageCode": "en",
									"confidence": 0.97
								}, {
									"languageCode": "az",
									"confidence": 0.02
								}
							]
						},
						"width": 595,
						"height": 842,
						"blocks": [{
								"boundingBox": {
									"normalizedVertices": [{
											"x": 0.09243698,
											"y": 0.059382424
										}, {
											"x": 0.5243698,
											"y": 0.066508316
										}, {
											"x": 0.5243698,
											"y": 0.07482185
										}, {
											"x": 0.09243698,
											"y": 0.06769596
										}
									]
								},
								"paragraphs": [{
										"boundingBox": {
											"normalizedVertices": [{
													"x": 0.09243698,
													"y": 0.059382424
												}, {
													"x": 0.5243698,
													"y": 0.066508316
												}, {
													"x": 0.5243698,
													"y": 0.07482185
												}, {
													"x": 0.09243698,
													"y": 0.06769596
												}
											]
										},
										"words": [{
												"property": {
													"detectedLanguages": [{
															"languageCode": "en"
														}
													]
												},
												"boundingBox": {
													"normalizedVertices": [{
															"x": 0.09243698,
															"y": 0.059382424
														}, {
															"x": 0.13781513,
															"y": 0.060570072
														}, {
															"x": 0.13781513,
															"y": 0.06888361
														}, {

Author

boesiii commented Apr 22, 2019

Here are all three output files:
test2_output-1-to-2.zip
test2_output-3-to-4.zip
test2_output-5-to-5.zip

Contributor

nnegrey commented Apr 23, 2019

Alright, cool. It looks like the Vision API call is successful, but when retrieving the results from GCS there seems to be an issue.

Are you on the latest version of the Storage client library?
Could you run pip freeze | grep google and share the output?

Author

boesiii commented Apr 24, 2019

pip freeze | findstr google
google-api-core==1.8.2
google-auth==1.6.3
google-cloud-bigquery==1.10.0
google-cloud-core==0.29.1
google-cloud-storage==1.14.0
google-cloud-vision==0.36.0
google-resumable-media==0.3.2
googleapis-common-protos==1.5.9
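
The same information can be gathered from Python itself, which avoids the grep versus findstr difference between shells. This is a minimal sketch assuming Python 3.8 or newer, where importlib.metadata is in the standard library (Python 3.7, as used in this report, would need the importlib_metadata backport). installed_versions is a hypothetical helper, and it matches on a name prefix rather than on any substring as grep would.

```python
from importlib import metadata  # standard library from Python 3.8


def installed_versions(name_prefix):
    """Map installed distribution names starting with `name_prefix` to versions."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        # Skip distributions with missing metadata rather than failing.
        if name and name.lower().startswith(name_prefix.lower()):
            versions[name] = dist.version
    return versions


if __name__ == "__main__":
    # Print in the same name==version shape that pip freeze uses.
    for name, version in sorted(installed_versions("google").items()):
        print("{}=={}".format(name, version))
```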

Author

boesiii commented Apr 24, 2019

I updated google-cloud-storage to 1.15.0, but I still get the same error.

@benbluhm

I had this issue and determined it was caused by the prefix itself being iterated as part of the blob list. I can see that "output/" is listed as a file in your output; the sample then attempts to parse it as JSON, which causes the error.

Try hardcoding a prefix, something like prefix = 'output/out', so that the folder entry won't be included in the list.

The demo code should probably be modified to handle this simple case a little better.
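
One way to handle this case without hardcoding a prefix is to skip any object that cannot be a result file before parsing it. This is a minimal sketch, not the sample's actual code: is_result_object and result_blobs are hypothetical helpers, and the only assumption about each blob is that it exposes name and size attributes, as google-cloud-storage Blob objects do. The FakeBlob stand-in and the byte sizes exist only so the example runs without network access.

```python
from collections import namedtuple

# Stand-in for a storage blob; only `name` and `size` are used here.
FakeBlob = namedtuple("FakeBlob", ["name", "size"])


def is_result_object(blob):
    """True for non-empty .json objects; False for folder placeholders."""
    if blob.name.endswith("/"):  # "folder" entry such as "output/"
        return False
    if not blob.size:  # an empty object cannot hold a JSON response
        return False
    return blob.name.endswith(".json")


def result_blobs(blobs):
    """Keep only the blobs that look like OCR result files."""
    return [blob for blob in blobs if is_result_object(blob)]


listing = [
    FakeBlob("output/", 0),
    FakeBlob("output/output-1-to-2.json", 1024),  # size is illustrative
    FakeBlob("output/output-3-to-4.json", 2048),  # size is illustrative
]
kept_names = [blob.name for blob in result_blobs(listing)]
```

With the real client, the same filter could be applied to the result of bucket.list_blobs(prefix=prefix) before each blob is downloaded and parsed.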

@APerson101

@benbluhm your suggestion solved my issue, thank you

Author

boesiii commented Apr 29, 2019

Yes. It worked for me also.

Contributor

nnegrey commented Apr 29, 2019

Thanks, @benbluhm!
Closing the issue.

nnegrey closed this as completed Apr 29, 2019
@arindam-halder

Hi guys, can someone post the updated sample code? That would be great.

Thanks
