Description
Is your enhancement related to a problem? Please describe.
Ideally we should update any APIs we use to their latest versions on a regular basis. This issue focuses on the Azure APIs we use. The following is a list of those APIs and the version of each we currently use.
- Analyze Image v3.0
- OCR v3.2
- Read v3.2
- Generate Thumbnail v3.1
- Personalizer v1.0
- TTS cognitiveservices/v1
For the Personalizer API, v1.0 is the latest (though there is a v1.1 in preview), so nothing is needed there. The same goes for our Text to Speech API: we are currently using the latest version.
The Analyze Image, OCR, Read and Generate Thumbnail APIs are all under the same service (previously known as Cognitive Services Computer Vision, since renamed to Azure AI Vision). The latest released version of this API is v3.2, while there is a v4.0 public preview API.
Azure is pushing for everyone to use the new v4.0 public preview API, but in researching this I found some limitations that may hold us back. For instance, image captioning and smart cropping are only available in a small set of regions in v4.0 (East US, France Central, Korea Central, North Europe, Southeast Asia, West Europe, West US, and East Asia).
There have also been quite a few changes to these APIs in v4.0, so it will take some refactoring if we pursue these updates. For instance, all existing features we use, outside of reading content from PDFs, are now under a single Analyze API in v4.0. This will require some changes to how our code works.
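To make the region limitation concrete, here is a minimal sketch of a check we could run before enabling captions or smart cropping on v4.0. The region set is copied from the list above and may have changed since; the helper names (`normalize_region`, `supports_v4_captions`) are hypothetical and not part of any Azure SDK.

```python
# Regions listed in this issue as supporting image captions and smart
# cropping in v4.0. Copied from the issue text; Azure may have changed it.
V4_CAPTION_REGIONS = frozenset({
    "eastus", "francecentral", "koreacentral", "northeurope",
    "southeastasia", "westeurope", "westus", "eastasia",
})


def normalize_region(region: str) -> str:
    """Lowercase and drop spaces/punctuation, e.g. 'East US' -> 'eastus'."""
    return "".join(ch for ch in region.lower() if ch.isalnum())


def supports_v4_captions(region: str) -> bool:
    """Return True if the region is in the list quoted in this issue."""
    return normalize_region(region) in V4_CAPTION_REGIONS
```

A site whose resource lives outside these regions could then fall back to v3.2 for captions rather than failing at request time.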
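As a rough illustration of the single Analyze API idea, the sketch below builds one request URL that asks for several features at once. The path `computervision/imageanalysis:analyze`, the `2023-02-01-preview` version string, and the feature names are my assumptions from reading the preview docs and should be verified before we rely on them; `build_analyze_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_analyze_url(endpoint, features, api_version="2023-02-01-preview"):
    """Build a single v4.0-style Analyze URL covering several features.

    The path and default api_version are assumptions to verify against
    the current Azure AI Vision documentation.
    """
    base = endpoint.rstrip("/")
    # safe="," keeps the comma-separated feature list readable in the URL.
    query = urlencode(
        {"api-version": api_version, "features": ",".join(features)},
        safe=",",
    )
    return f"{base}/computervision/imageanalysis:analyze?{query}"


url = build_analyze_url(
    "https://example.cognitiveservices.azure.com/",
    ["caption", "read", "smartCrops"],
)
```

With v3.x we call a separate endpoint per feature, so this is roughly where the request-building side of the refactor would land.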
That said, assuming we're okay with the region limitations, I'd like to pursue updating all of those to v4.0. If we're not okay with that, I think it would be ideal to get all of those on v3.2 (so just Analyze Image and Generate Thumbnail).
I tried updating to v3.2 of the Analyze Image API. The results seem good, but the confidence scores, at least for image captions, are lower, so we would need to determine how best to handle that (judging by the Vision Studio tool, this seems to have been fixed in v4.0). Their docs even mention:
In general, we advise a confidence threshold of 0.4 for the Image Analysis 3.2 API and of 0.0 for the Image Analysis 4.0 API (preview).
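One possible way to handle this is a per-version default threshold, sketched below. The two values come straight from the docs quote above; the caption shape (a dict with `text` and `confidence` keys) and the `filter_captions` helper are assumptions for illustration.

```python
# Default thresholds taken from the Azure docs quote above.
RECOMMENDED_THRESHOLDS = {"3.2": 0.4, "4.0": 0.0}


def filter_captions(captions, api_version, threshold=None):
    """Keep captions at or above the threshold for the given API version.

    `captions` is assumed to be a list of dicts with "text" and
    "confidence" keys. An explicit `threshold` overrides the default.
    """
    if threshold is None:
        threshold = RECOMMENDED_THRESHOLDS[api_version]
    return [c for c in captions if c["confidence"] >= threshold]


sample = [
    {"text": "a dog on a beach", "confidence": 0.52},
    {"text": "a blurry photo", "confidence": 0.31},
]
kept = filter_captions(sample, "3.2")
```

Keeping the override parameter would let sites that already set their own threshold keep it after the upgrade.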
If we decide to update to v4.0, here are the tasks as I see them:
- Update Analyze Image API to v4.0 and address any issues there. I believe we'll need to update how we send data and how we parse the received response
- Update how we handle OCR to use this new API
- Update how we handle generating thumbnails to use this new API
- Investigate the Read API. It seems like this functionality moved to a new API (Document Intelligence). We should investigate what it would mean to use that API instead. We may find it's not worth the effort and leave this on the current v3.2 API
If we stick with v3.2, here's what we'll want to do:
- Update the Analyze Image API to v3.2 and modify how we handle error responses (this changed in v3.2).
- Update how we deal with confidence scores to account for lower scores in v3.2
- Update the Generate Thumbnail API to v3.2 and address any issues there
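For the error-response change, a tolerant parser along these lines might help during the transition. Both JSON shapes here are assumptions based on my reading of the docs (a flat `code`/`message` body before v3.2, and a nested `error` object with an optional `innererror` in v3.2), so they should be checked against real responses; `extract_error` is a hypothetical helper.

```python
def extract_error(body):
    """Return (code, message) from either assumed error shape.

    Assumed v3.2 shape: {"error": {"code", "message", "innererror": {...}}}
    Assumed older shape: {"code", "message"}
    """
    err = body.get("error")
    if isinstance(err, dict):
        inner = err.get("innererror")
        # Prefer the more specific inner error code when one is present.
        if isinstance(inner, dict) and inner.get("code"):
            message = inner.get("message") or err.get("message", "")
            return inner["code"], message
        return err.get("code", "Unknown"), err.get("message", "")
    return body.get("code", "Unknown"), body.get("message", "")
```

Handling both shapes in one place would also mean the Read and OCR calls, which stay on their current versions, can share the same error path.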
Designs
No response
Describe alternatives you've considered
No response
Code of Conduct
- I agree to follow this project's Code of Conduct