[Bug] - Handle non-PDF input documents #18

d-v-dlee · 2022-07-05T23:23:00Z

Running this solution (branch lmv2) on JPG inputs causes an error. Two files need to be updated. I'm submitting an issue instead of a PR since this is based on the lmv2 branch rather than main.

Required changes:

1. `preprocess/inference.py`

Update the `SINGLE_IMAGE_CONTENT_TYPES` dictionary on line 520 to include `"image/jpg": "JPG"`.

2. `src/code/inference.py`

Update the thumbnail logic to fix the logged error "Thumbnails expected either array of PNG bytestrings or 4D images array." After the logging message, add the following code:

if thumbnails.ndim == 3:
    logger.info('Resizing thumbnail of dimension 3 to dimension 4.')
    thumbnails = np.expand_dims(thumbnails, axis=0)
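For illustration, here is a minimal standalone sketch of what that snippet does to a single image array. The helper name `ensure_4d` and the 224x224x3 shape are assumptions made for the example, not names or sizes taken from the repository.

```python
import numpy as np


def ensure_4d(thumbnails: np.ndarray) -> np.ndarray:
    """Add a leading batch axis when a single 3D image is passed in.

    Hypothetical helper: the repository applies this logic inline.
    """
    if thumbnails.ndim == 3:
        # (height, width, channels) -> (1, height, width, channels)
        thumbnails = np.expand_dims(thumbnails, axis=0)
    return thumbnails


# A single image (assumed shape, for illustration only)
single_image = np.zeros((224, 224, 3), dtype=np.uint8)
batched = ensure_4d(single_image)

# An array that is already 4D passes through unchanged
already_batched = ensure_4d(np.zeros((2, 224, 224, 3), dtype=np.uint8))
```

A single JPG or PNG input produces one image rather than a stack of pages, which is why the array arrives with 3 dimensions instead of 4.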

The `not images` logic also needs to be updated on lines 428 and 445.

On line 428, change `if processor and not images:` to `if processor and images is None:`. Otherwise, the error will say that the truth value of a numpy array is ambiguous.

Similarly, on line 445, `**({"images": images} if images and processor else {}),` must be changed to `**({"images": images} if images is not None and processor else {}),`
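The reason for both changes can be reproduced in isolation. The sketch below is not the repository's code: `build_kwargs` and the placeholder `processor` object are hypothetical stand-ins used only to show why `images is not None` is needed when `images` is a numpy array.

```python
import numpy as np

# Stand-ins for illustration only (not the real processor or image data)
processor = object()
images = np.zeros((1, 224, 224, 3), dtype=np.uint8)

# Evaluating a multi-element array as a boolean, which is what
# `not images` or `if images` does, raises ValueError.
truthiness_error = None
try:
    bool(images)
except ValueError as err:
    truthiness_error = str(err)


def build_kwargs(images, processor):
    """Hypothetical helper mirroring the corrected expression on line 445."""
    return {**({"images": images} if images is not None and processor else {})}


with_images = build_kwargs(images, processor)
without_images = build_kwargs(None, processor)
```

Comparing against `None` with `is` never asks numpy for a truth value, so it works whether `images` is an array or absent.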

athewsey · 2022-07-07T12:52:36Z

Thanks David!

I think the issue with the wrong `ndim` should have been happening only when the thumbnailer endpoint returns an `image` variable instead of `images`, so have pushed the fix in de7ac69 rather than editing both sides of the `if page_num is None` condition.

From a quick test seems like this should fix the pipeline (up until A2I review of course, which only supports PDFs for now) - but let me know if there's a case I missed!

athewsey added the bug Something isn't working label Jul 7, 2022

athewsey closed this as completed in de7ac69 Aug 3, 2022
