RFW0051: pecha.tools for STT #175

Open
spsither opened this issue Aug 11, 2023 · 0 comments

Housekeeping

Make sure you clearly understand Type-A and Type-B requests and the relevant limitations. Failing to follow the guidelines for the two acceptable types of RFW will automatically disqualify the RFW.

Take time to complete each section below with as much detail as is required to establish a comprehensive understanding of the underlying product specification.

ALL FIELDS BELOW ARE REQUIRED

Owner

@TenzinGayche @spsither

Summary

We propose a new platform for STT transcription work that adds several features Prodigy lacks:

  • Organize users into roles and groups.
  • Reject a task and send it back to the transcriber.
  • Generate reports and track tasks.
  • Use a state machine to manage task state.
  • Edit a transcription before submitting it.

Is This Really Necessary?

We are facing problems scaling up Prodigy. We hope that custom software will resolve some of the issues we have faced and make transcribers' work easier.

Motivation

While working around Prodigy's limitations, we have ended up running up to 87 instances, one for each group. We also run a cron job that queries the database directly and uploads the results to multiple Google Sheets for reporting. In addition, duplicate tasks have appeared, wasting transcribers' and reviewers' time.

Named Concepts

  • State machine: we can use a state machine to represent the state a task is in.
    The possible states for a task are:
    • Imported: the starting state of a task.
    • Transcribing: the task has been assigned to a transcriber and is currently being worked on.
    • Trashed: the audio segment is not usable and has been trashed.
    • Submitted: the work has been submitted for review.
    • Accepted: a reviewer has reviewed and accepted the task.
    • Finalized: the second reviewer has finalized the transcription and accepted the task in its finalized state.
  • Group: every transcriber belongs to a group (e.g. stt_tt_ga, stt_cs_gb, ...).
  • Role: a role can be Transcriber, Reviewer, or Final-Reviewer.
    Each user is assigned a role.
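A minimal Python sketch of the task state machine, assuming a transition table keyed by (state, action). The action names (`assign`, `submit`, `trash`, `accept`, `reject`, `finalize`), the exact set of allowed transitions, and which role may trigger each one are assumptions for illustration; the RFW names the states and roles but not the edges between them. Rejection is modeled here as Submitted → Transcribing, which is one way the "send it back to the transcriber" feature could fit.

```python
from enum import Enum


class State(Enum):
    """Task states listed in the RFW."""
    IMPORTED = "imported"
    TRANSCRIBING = "transcribing"
    TRASHED = "trashed"
    SUBMITTED = "submitted"
    ACCEPTED = "accepted"
    FINALIZED = "finalized"


class Role(Enum):
    """User roles listed in the RFW."""
    TRANSCRIBER = "transcriber"
    REVIEWER = "reviewer"
    FINAL_REVIEWER = "final_reviewer"


# Hypothetical transition table: (current state, action) -> (next state, role allowed to act).
# The action names and the exact edges are assumptions; the RFW only names the states.
TRANSITIONS = {
    (State.IMPORTED, "assign"): (State.TRANSCRIBING, Role.TRANSCRIBER),
    (State.TRANSCRIBING, "submit"): (State.SUBMITTED, Role.TRANSCRIBER),
    (State.TRANSCRIBING, "trash"): (State.TRASHED, Role.TRANSCRIBER),
    (State.SUBMITTED, "accept"): (State.ACCEPTED, Role.REVIEWER),
    (State.SUBMITTED, "reject"): (State.TRANSCRIBING, Role.REVIEWER),  # sent back to the transcriber
    (State.ACCEPTED, "finalize"): (State.FINALIZED, Role.FINAL_REVIEWER),
}


class InvalidTransition(Exception):
    """Raised when an action is not allowed from the current state or for the user's role."""


def apply_action(state: State, action: str, role: Role) -> State:
    """Return the task's next state, enforcing which role may perform `action`."""
    try:
        next_state, allowed_role = TRANSITIONS[(state, action)]
    except KeyError:
        raise InvalidTransition(f"cannot {action!r} a task in state {state.value!r}") from None
    if role is not allowed_role:
        raise InvalidTransition(f"role {role.value!r} may not {action!r}")
    return next_state


# Usage: a task is assigned, submitted, rejected back to the transcriber, then resubmitted.
state = State.IMPORTED
state = apply_action(state, "assign", Role.TRANSCRIBER)
state = apply_action(state, "submit", Role.TRANSCRIBER)
state = apply_action(state, "reject", Role.REVIEWER)
state = apply_action(state, "submit", Role.TRANSCRIBER)
print(state)  # State.SUBMITTED
```

Keeping every transition in one table means an invalid move (for example, a transcriber accepting their own work) is refused in a single place rather than being checked throughout the application.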

Examples

  1. No duplicate task allocation to multiple annotators.
  2. Report generation for users on a common platform.
  3. A transcriber can go back to a task and edit it multiple times before submitting.
  4. Notify when a task has been rejected.
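Examples 1 and 2 can be sketched with a hypothetical in-memory `TaskPool` that keeps one queue of imported tasks per group. Taking a task off the queue and recording its owner happen under a single lock, so two transcribers requesting work at the same moment cannot receive the same task; `report()` counts tasks per state. The class, its methods, and the plain-string state names are illustrative assumptions, not part of the proposed design.

```python
from __future__ import annotations

import threading
from collections import Counter, deque


class TaskPool:
    """Hypothetical in-memory task pool with one queue of imported tasks per group."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queues: dict[str, deque[str]] = {}
        self.owner: dict[str, str] = {}  # task_id -> user currently holding the task
        self.state: dict[str, str] = {}  # task_id -> current state name

    def import_task(self, group: str, task_id: str) -> None:
        """Add a task to its group's queue in the 'imported' state."""
        with self._lock:
            self._queues.setdefault(group, deque()).append(task_id)
            self.state[task_id] = "imported"

    def assign_next(self, group: str, user: str) -> str | None:
        """Hand `user` the next unassigned task in `group`, or None if none are left.

        Popping from the queue and recording the owner happen under one lock,
        so concurrent callers can never receive the same task.
        """
        with self._lock:
            queue = self._queues.get(group)
            if not queue:
                return None
            task_id = queue.popleft()
            self.owner[task_id] = user
            self.state[task_id] = "transcribing"
            return task_id

    def report(self) -> Counter[str]:
        """Count tasks per state, e.g. for a shared progress report."""
        with self._lock:
            return Counter(self.state.values())


# Usage: two transcribers in the same group never get the same task.
pool = TaskPool()
for task_id in ("t1", "t2", "t3"):
    pool.import_task("stt_tt_ga", task_id)
first = pool.assign_next("stt_tt_ga", "alice")
second = pool.assign_next("stt_tt_ga", "bob")
print(first, second)        # t1 t2
print(dict(pool.report()))  # {'transcribing': 2, 'imported': 1}
```

A database-backed version would need the same guarantee from the database itself (for example, an atomic update or a row lock when claiming a task), since a process-local lock does not protect against several server processes claiming work at once.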

Conceptual Design

[Conceptual design diagrams: images not reproduced]

Drawbacks

Building it from scratch will require time and effort.

Alternatives

Extend an existing open-source project such as Label Studio by forking it and customizing it for our needs.

New Data

We will get a better picture of where a task is stuck in the pipeline, as well as more analytics on transcriber performance.
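One way to find where tasks are stuck, assuming every state change is logged with a timestamp in a hypothetical `StateChange` record: take each task's most recent change and flag non-terminal tasks that have not moved for longer than a chosen threshold. Treating Trashed and Finalized as terminal states is an assumption based on the state list in Named Concepts; the threshold value is also illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: Trashed and Finalized are end states, so tasks there are never "stuck".
TERMINAL_STATES = {"trashed", "finalized"}


@dataclass
class StateChange:
    """Hypothetical log record written whenever a task changes state."""
    task_id: str
    state: str
    at: datetime


def stuck_tasks(history: list[StateChange], now: datetime, threshold: timedelta) -> dict[str, str]:
    """Return {task_id: state} for non-terminal tasks that have not moved for longer than `threshold`."""
    latest: dict[str, StateChange] = {}
    for change in history:
        seen = latest.get(change.task_id)
        if seen is None or change.at >= seen.at:
            latest[change.task_id] = change
    return {
        task_id: change.state
        for task_id, change in latest.items()
        if change.state not in TERMINAL_STATES and now - change.at > threshold
    }


# Usage: task "a" has sat in Transcribing for almost four days; "b" was submitted a day ago.
t0 = datetime(2023, 8, 1, 9, 0)
history = [
    StateChange("a", "imported", t0),
    StateChange("a", "transcribing", t0 + timedelta(hours=1)),
    StateChange("b", "imported", t0),
    StateChange("b", "transcribing", t0 + timedelta(hours=1)),
    StateChange("b", "submitted", t0 + timedelta(days=3)),
    StateChange("c", "finalized", t0),
]
print(stuck_tasks(history, now=t0 + timedelta(days=4), threshold=timedelta(days=2)))
# {'a': 'transcribing'}
```

The same per-task history could also feed transcriber performance metrics, such as the time between Transcribing and Submitted for each user.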

Adaption Window

Four weeks.

spsither changed the title from "[RFW0051] pecha.tools for STT" to "RFW0051: pecha.tools for STT" on Feb 26, 2024.