[RFC0022] serving images for layout analysis #18

Open
6 of 12 tasks
ta4tsering opened this issue Feb 3, 2023 · 0 comments
@ta4tsering
Contributor

ta4tsering commented Feb 3, 2023

Work Planning

Details

Table of Contents

Housekeeping

[RFC0022] serving images for layout analysis

ALL BELOW FIELDS ARE REQUIRED

Named Concepts

Explain any new concepts introduced in this request.

Summary

Selected unique images need to be processed with the same methodology used for page annotation images, then served to the Prodigy recipe through a CSV file so they can be streamed to layout_analysis instances.

Reference-Level Explanation

  • I will receive a .csv file containing the repo name (OCR###) and the work ID (W######).
  • Download the repo and go through its unique_images folder to get the list of unique images.
  • Use the work_id to get the s3_keys of all the images in that work on S3.
  • Find the s3_keys of the images in the unique-images list and write them to a .txt file.
  • Parse the .txt file of s3_keys for the selected unique images and process those images with the same methodology used for the page annotation sample images: resize, compress, and encode each image using Pillow.
  • Upload the processed images to an S3 bucket such as openpecha.bdrc.io and append each uploaded image's s3_key to a csv_file.
  • Give the csv_file_path to the Prodigy recipe so it can parse the CSV file into a list of s3_keys to stream on prodigy.bdrc.io/layout_analysis/.
  • The proposed changes interact with other systems (or other parts of the system that is changed)
  • The actual implementation will take place
  • Known challenges can be readily overcome
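The selection steps in this list (read the input CSV, then keep only the S3 keys that belong to unique images) can be sketched as follows. This is a minimal illustration, not the existing script: the function names, the two-column CSV layout, and the assumption that an S3 key ends with the image file name are all hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def read_work_list(csv_text):
    """Parse (repo name, work id) pairs from CSV text.

    Assumes two columns, repo name first; rows that do not look like
    OCR... / W... (for example a header row) are skipped.
    """
    rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue
        repo, work_id = row[0].strip(), row[1].strip()
        if repo.startswith("OCR") and work_id.startswith("W"):
            rows.append((repo, work_id))
    return rows


def select_unique_keys(unique_image_names, work_s3_keys):
    """Keep the S3 keys whose file name appears in the unique_images list.

    Assumes the last path component of each key is the image file name.
    """
    wanted = set(unique_image_names)
    return [key for key in work_s3_keys if PurePosixPath(key).name in wanted]


def write_keys(path, keys):
    """Write one S3 key per line to a plain .txt file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(keys) + "\n")
```

Listing the keys of a work on S3 is left out on purpose; `work_s3_keys` stands for whatever list the existing scripts already produce.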

This section includes practical examples and explains how this proposal makes those examples work.
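As one practical example, the resize, compress, and encode step could look roughly like the sketch below. It assumes JPEG output and uses placeholder values for the maximum side length and quality; the settings actually used for page annotation images are not stated in this RFC, and `process_image` is a hypothetical name.

```python
import io

from PIL import Image


def process_image(raw_bytes, max_side=1000, quality=75):
    """Resize, compress, and re-encode an image, returning JPEG bytes.

    max_side and quality are placeholder values, not the project's settings.
    """
    image = Image.open(io.BytesIO(raw_bytes))
    # JPEG cannot store alpha or palette modes, so normalize to RGB first.
    image = image.convert("RGB")
    # thumbnail() resizes in place, keeps the aspect ratio, and never upscales.
    image.thumbnail((max_side, max_side))
    out = io.BytesIO()
    image.save(out, format="JPEG", quality=quality, optimize=True)
    return out.getvalue()
```

The function works on bytes rather than file paths so that it can sit between an S3 download and an S3 upload without touching the disk.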

This section becomes the engineering specification and work plan, so it must be sufficiently detailed to facilitate that.
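For the hand-off to the Prodigy recipe, a sketch of the CSV side might look like this. It assumes one s3_key per row and that the recipe consumes a stream of task dictionaries with an `image` field pointing at a URL; `append_key`, `stream_from_csv`, and the `base_url` parameter are hypothetical names, and the real URL scheme for the bucket is not specified here.

```python
import csv


def append_key(csv_path, s3_key):
    """Append the s3_key of one uploaded image as a new CSV row."""
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([s3_key])


def stream_from_csv(csv_path, base_url):
    """Yield one task dictionary per s3_key listed in the CSV file.

    base_url is an assumed public prefix under which the keys are served.
    """
    prefix = base_url.rstrip("/")
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row or not row[0].strip():
                continue  # skip blank rows
            key = row[0].strip()
            yield {"image": f"{prefix}/{key}", "meta": {"s3_key": key}}
```

Using a generator keeps memory use flat however many keys the CSV holds, which suits streaming to several annotation instances.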

Alternatives

Confirm that alternative approaches have been evaluated and explain those alternatives briefly.

Rationale

  • Why was the currently proposed design selected over the alternatives?
  • What would be the impact of going with one of the alternative approaches?
  • Is the evaluation tentative, or is it recommended to use more time to evaluate different approaches?

Drawbacks

Describe any particular caveats and drawbacks that may arise from fulfilling this particular request.

Useful References

We already have all the scripts needed.

  • What similar work have we already successfully completed?
  • Is this something that has already been built by others?
  • What other related learnings do we have?
  • Is there useful academic literature, or are there other articles, related to this topic? (provide links)
  • Have we built a relevant prototype previously?
  • Do we have a rough mock for the UI/UX?
  • Do we have a schematic for the system?

Unresolved Questions

  • What is there that is unresolved (and will be resolved as part of fulfilling this request)?
  • Are there other requests with same or similar problems to solve?

Parts of the System Affected

  • Which parts of the current system are affected by this request?
  • What other open requests are closely related with this request?
  • Does this request depend on fulfillment of any other request?
  • Does any other request depend on the fulfillment of this request?

Future possibilities

How do you see the system, or the part of the system affected by this request, being altered or extended in the future?

Infrastructure

  • Requires an S3 bucket, such as openpecha.bdrc.io, to upload the processed selected unique images to. @ngawangtrinley

Testing

Image processing is already tested, since it is used to process images for page annotation.

Documentation

Describe the level of documentation fulfilling this request involves. Consider both end-user documentation and developer documentation.

Version History

v0.1

Recordings

Links to audio recordings of related discussion.

Work Phases

Planning

Keep original naming and structure, and keep as first section in Work phases section

  • RFC completed on:
    Estimated time:
    Actual time:
  • RFC reviewed and approved by:
    Estimated time:
    Actual time:

Implementation

A list of checkboxes, one per PR. Each PR should have a descriptive name that clearly illustrates what the work phase is about.

  • PR 1
    Estimated time:
    Actual time:
  • PR 2
    Estimated time:
    Actual time:

Completion

  • Tested and approved by: @username @username
    Estimated time:
    Actual time:
  • Documentation approved @evanyerburgh
    Estimated time:
    Actual time:
@ta4tsering ta4tsering self-assigned this Feb 3, 2023
@ta4tsering ta4tsering transferred this issue from OpenPecha/Requests Feb 8, 2023