[RFC0022] serving images for layout analysis
Named Concepts
Explain any new concepts introduced in this request.
Summary
The selected unique images need to be processed with the same methodology used for the page annotation images, and then served to the Prodigy recipe through a CSV file so they can be streamed to the layout_analysis instances.
Reference-Level Explanation
I will receive a .csv file containing the repo name (OCR###) and the work_id (W######).
Download the repo and go through its unique_images folder to get the list of unique images.
Use the work_id to get the s3_keys of all the images in that work on S3.
From those, get the s3_keys of the images in the unique_images list and write them to a .txt file.
Parse the .txt file of s3_keys for the selected unique images and process the images with the same methodology used for the page annotation sample images: resize, compress and encode each image using Pillow.
Upload the processed images to an S3 bucket such as openpecha.bdrc.io, and append the s3_key of each uploaded image to a csv_file.
Give the csv_file_path to the Prodigy recipe so it can parse the CSV file into a list of s3_keys to stream on prodigy.bdrc.io/layout_analysis/.
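The processing step above can be sketched with Pillow roughly as follows. This is not the existing page annotation script: `max_side=2000` and `quality=75` are placeholder values, we assume "encode" means re-encoding the image as JPEG bytes ready for upload, and `target_size` / `process_image` are hypothetical names.

```python
import io


def target_size(width: int, height: int, max_side: int = 2000) -> tuple[int, int]:
    """Shrink (width, height) so the longer side is at most max_side,
    keeping the aspect ratio; smaller images are returned unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def process_image(raw: bytes, max_side: int = 2000, quality: int = 75) -> bytes:
    """Resize, compress and re-encode one image as JPEG bytes.
    max_side and quality are placeholders for the page annotation settings."""
    from PIL import Image  # Pillow; imported here so target_size works without it

    with Image.open(io.BytesIO(raw)) as img:
        rgb = img.convert("RGB")  # normalise mode; JPEG cannot store alpha or palette images
    resized = rgb.resize(target_size(*rgb.size, max_side=max_side))
    buffer = io.BytesIO()
    resized.save(buffer, format="JPEG", quality=quality, optimize=True)
    return buffer.getvalue()
```

`process_image` needs Pillow installed; `target_size` is plain Python, so the resize rule can be checked on its own against the page annotation settings.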
The proposed changes interact with other systems (or other parts of the system that are changed)
The actual implementation will take place
Known challenges can be readily overcome
This section includes practical examples and explains how this proposal makes those examples work.
This section becomes the engineering specification and work plan, so it must be detailed enough to facilitate that.
Alternatives
Confirm that alternative approaches have been evaluated and explain those alternatives briefly.
Rationale
Why was the currently proposed design selected over the alternatives?
What would be the impact of going with one of the alternative approaches?
Is the evaluation tentative, or is it recommended to use more time to evaluate different approaches?
Drawbacks
Describe any particular caveats and drawbacks that may arise from fulfilling this request.
Useful References
We already have all the scripts needed.
What similar work have we already successfully completed?
Is this something that has already been built by others?
What other related learnings do we have?
Is there useful academic literature, or are there other articles related to this topic? (provide links)
Have we built a relevant prototype previously?
Do we have a rough mock for the UI/UX?
Do we have a schematic for the system?
Unresolved Questions
What is there that is unresolved (and will be resolved as part of fulfilling this request)?
Are there other requests with same or similar problems to solve?
Parts of the System Affected
Which parts of the current system are affected by this request?
What other open requests are closely related with this request?
Does this request depend on fulfillment of any other request?
Does any other request depend on the fulfillment of this request?
Future possibilities
How do you see the system, or the part of it affected by this request, being altered or extended in the future?
Infrastructure
Requires an S3 bucket, such as openpecha.bdrc.io, to upload the processed selected unique images. @ngawangtrinley
Testing
The image processing is already tested, since it is the same processing used for the page annotation images.
Documentation
Describe the level of documentation fulfilling this request involves. Consider both end-user documentation and developer documentation.
Version History
v0.1
Recordings
Links to audio recordings of related discussion.
Work Phases
Parse the .csv file containing the repo name (OCR###) and work_id (W######)
time estimation: 10 min
time taken: 10 min
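A sketch of this parsing phase, assuming each row holds the repo name in the first column and the work_id in the second, with an optional header row. `read_jobs` and `Job` are hypothetical names, and the two regexes only encode the OCR### / W###### shapes used in this RFC.

```python
import csv
import re
from typing import Iterable, NamedTuple

# Assumed id shapes, based only on the OCR### and W###### placeholders.
REPO_RE = re.compile(r"^OCR\w+$")
WORK_RE = re.compile(r"^W\w+$")


class Job(NamedTuple):
    repo_name: str
    work_id: str


def read_jobs(lines: Iterable[str]) -> list[Job]:
    """Return (repo_name, work_id) pairs, skipping a header row, blank rows
    and any row whose first two values do not look like OCR### / W######."""
    jobs = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # blank or malformed row
        repo, work = row[0].strip(), row[1].strip()
        if REPO_RE.match(repo) and WORK_RE.match(work):
            jobs.append(Job(repo, work))
    return jobs
```

Usage: `with open("jobs.csv", newline="", encoding="utf-8") as f: jobs = read_jobs(f)`.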
Download the repo and go through its unique_images folder to get the list of unique images
time estimation: 10 min
time taken: 10 min
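A sketch of this phase, assuming the repo is a git repository and the unique images sit directly inside a top-level `unique_images` folder. `clone_repo`, `list_unique_images` and the accepted file extensions are hypothetical; the repo URL is left to the caller.

```python
import subprocess
from pathlib import Path

# Assumed set of image extensions found in unique_images.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def clone_repo(url: str, dest: str) -> Path:
    """Shallow-clone one OCR### repo (the URL is supplied by the caller)."""
    subprocess.run(["git", "clone", "--depth", "1", url, dest], check=True)
    return Path(dest)


def list_unique_images(repo_dir) -> list[str]:
    """Return the sorted image file names in <repo_dir>/unique_images."""
    folder = Path(repo_dir) / "unique_images"
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
```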
Use the work_id to get the s3_keys of all the images in that work on S3
time estimation: 1 hour
time taken: 1 hour
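A sketch of listing a work's images with a boto3 S3 client. The bucket and the key prefix for a given work_id are not specified in this RFC, so both are parameters here and should come from the existing scripts; `list_work_keys` is a hypothetical name. `list_objects_v2` returns at most 1,000 keys per call, which is why the paginator is used.

```python
def list_work_keys(s3_client, bucket: str, prefix: str) -> list[str]:
    """Collect every object key under `prefix` (the work's image folder).

    `s3_client` is expected to behave like boto3.client("s3")."""
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

Usage: `keys = list_work_keys(boto3.client("s3"), bucket, prefix)`, with `bucket` and `prefix` taken from the existing scripts.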
Get the s3_keys of all the images in the unique_images list and write them to a .txt file on prodigy-tools.
time estimation: 30 min
time taken: 30 min
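A sketch of matching the unique image names against the work's s3_keys and writing the .txt file. It assumes the names in `unique_images` equal the last path segment of the S3 keys; if the extensions differ (for example .png locally and .tif on S3), compare file stems instead. `select_keys` and `write_key_list` are hypothetical names.

```python
import posixpath


def select_keys(all_keys, unique_names) -> list[str]:
    """Keep the s3_keys whose file name is in unique_names,
    preserving the order of all_keys."""
    wanted = set(unique_names)
    return [key for key in all_keys if posixpath.basename(key) in wanted]


def write_key_list(keys, path) -> None:
    """Write one s3_key per line to a .txt file."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(key + "\n" for key in keys)
```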
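Upload the processed images to an S3 bucket such as openpecha.bdrc.io and append the s3_key of each uploaded image to a csv_file (#19)
time estimation: 1 hour
time taken: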
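A sketch of the upload phase using a boto3 S3 client's `put_object`, assuming the processed JPEG bytes from the processing step and a one-column CSV of s3_keys. The destination key layout and `upload_and_record` are hypothetical; the bucket would be something like openpecha.bdrc.io.

```python
import csv


def upload_and_record(s3_client, bucket: str, key: str, data: bytes, csv_path: str) -> None:
    """Upload one processed image, then append its s3_key as a row of csv_path."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentType="image/jpeg")
    # Append mode lets the CSV grow across runs; one s3_key per row.
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow([key])
```

Writing the CSV row only after `put_object` returns means a failed upload never leaves a key in the CSV that Prodigy would then try to load.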
Give the csv_file_path to the Prodigy recipe so it can go through the list of s3_keys to stream on prodigy.bdrc.io/layout_analysis/ (#20)
time estimation: 10 min
time taken:
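A sketch of how the recipe side could turn the CSV back into a stream. Prodigy's image interfaces read the picture from a task's "image" field, which can be a URL; `base_url` (a hypothetical public URL in front of the bucket) turns each s3_key into one. `stream_from_csv` is a hypothetical name, and the recipe that returns this generator as its stream is not shown. If the bucket is private, presigned URLs would be needed instead of a plain base URL.

```python
import csv
from typing import Iterator


def stream_from_csv(csv_path: str, base_url: str) -> Iterator[dict]:
    """Yield one Prodigy image task per s3_key listed in csv_path."""
    root = base_url.rstrip("/")
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row or not row[0].strip():
                continue  # skip blank lines
            key = row[0].strip()
            # "image" is what the image interfaces display;
            # "meta" keeps the original s3_key visible to the annotator.
            yield {"image": f"{root}/{key}", "meta": {"s3_key": key}}
```

Because it is a generator, the recipe reads the CSV lazily instead of loading every task up front.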
Planning
Keep original naming and structure, and keep as first section in Work phases section
Estimated time:
Actual time:
Estimated time:
Actual time:
Implementation
A list of checkboxes, one per PR. Each PR should have a descriptive name that clearly illustrates what the work phase is about.
Estimated time:
Actual time:
Estimated time:
Actual time:
Completion
Estimated time:
Actual time:
Estimated time:
Actual time: