Open
Labels
- C4GT Bounty: Successful closure has a monetary reward
- C4GT Community: Assignable to C4GT community developers
- C4GT coding: Coding related
Description
All required metadata must be extracted from the given dataset, which contains project clearance documents (Agendas and MoMs). Indexing the data on this key metadata during the indexing stage improves data retrieval. Development and testing should be conducted using the training dataset. The code/logic will then be evaluated on the test dataset using the given script.
Goal
To develop an extractor that can accurately extract the required set of metadata from the given dataset.
Expected Outcome
Domain-Specific Metadata Extractor
- Extracts the required fields of metadata from the training and test datasets.
Evaluation Result
- Run the code/logic against the evaluation script (attached below) to evaluate the test dataset.
- The accepted performance score should be greater than or equal to 90%.
Acceptance Criteria
A context-specific extractor that achieves an accuracy of >= 90% in extracting the required metadata.
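As a rough local check before running the attached evaluation script, the 90% threshold can be approximated with a simple exact-match score. This is only a sketch: the metric below (share of matching cells after whitespace and case normalisation) and the helper names `normalise`, `field_accuracy`, and `meets_threshold` are assumptions, and the attached evaluation script remains the authority on how the score is actually computed.

```python
def normalise(value):
    """Collapse whitespace and ignore case so trivial differences do not count as errors."""
    return " ".join(str(value).split()).casefold()


def field_accuracy(predicted, expected):
    """Return the fraction of (row, field) cells where the prediction matches.

    Assumption: both arguments are lists of dicts in the same row order, and
    the expected rows define which fields are scored.
    """
    total = 0
    correct = 0
    for pred_row, exp_row in zip(predicted, expected):
        for field, exp_value in exp_row.items():
            total += 1
            if normalise(pred_row.get(field, "")) == normalise(exp_value):
                correct += 1
    return correct / total if total else 0.0


def meets_threshold(score, threshold=0.90):
    """True when the score reaches the acceptance threshold (>= 90% by default)."""
    return score >= threshold
```

A score from this helper is only indicative. If the official script uses fuzzy matching or per-field weights, the two numbers will differ.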
Implementation Details
- Build a solution to extract the required metadata (attachment) from the dataset.
- Use:
  - Training dataset: Training
  - Test dataset: Test
  - Evaluation script: Evaluation
- Write the parsed data in CSV format and run it, along with the code/logic, against the evaluation script to verify that the desired accuracy is achieved.
- To understand the data in the training and test dataset folders:
  - Two related files are always present: Agenda and MOM.
  - The Agenda file describes the agenda of the meeting and contains important information such as Agenda ID, Agenda creation date, Title of meeting, Meeting start date, and Meeting end date.
  - The rest of the information can be fetched from the MOM (Minutes of Meeting) file. In short, the MOM gives information about the items mentioned in the Agenda.
- The full metadata list (fields that need to be extracted) can be found here.
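The steps above (pair each Agenda with its MOM, pull the Agenda fields, write the result as CSV) could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the file naming scheme `<key>_agenda.<ext>` / `<key>_mom.<ext>`, the "Label: value" layout of the text, the column names in `AGENDA_FIELDS`, and the helper names. The sketch also assumes the documents have already been converted to plain text. The real column names should come from the linked metadata list.

```python
import csv
import io
import re

# Hypothetical column names for the five Agenda fields named in the issue.
AGENDA_FIELDS = [
    "agenda_id",
    "agenda_creation_date",
    "meeting_title",
    "meeting_start_date",
    "meeting_end_date",
]

# Assumed "Label: value" layout; real documents will likely need other patterns.
LABEL_PATTERNS = {
    "agenda_id": r"Agenda\s+ID\s*:\s*(.+)",
    "agenda_creation_date": r"Agenda\s+creation\s+date\s*:\s*(.+)",
    "meeting_title": r"Title\s+of\s+meeting\s*:\s*(.+)",
    "meeting_start_date": r"Meeting\s+start\s+date\s*:\s*(.+)",
    "meeting_end_date": r"Meeting\s+end\s+date\s*:\s*(.+)",
}


def pair_files(file_names):
    """Group names into {key: {"agenda": name, "mom": name}}, keeping complete pairs only.

    Assumes names look like "<key>_agenda.<ext>" and "<key>_mom.<ext>".
    """
    pairs = {}
    for name in file_names:
        match = re.match(r"(?i)^(.*)_(agenda|mom)\.[^.]+$", name)
        if match:
            key, kind = match.group(1), match.group(2).lower()
            pairs.setdefault(key, {})[kind] = name
    return {k: v for k, v in pairs.items() if v.keys() >= {"agenda", "mom"}}


def extract_agenda_fields(text):
    """Return one row of Agenda metadata; fields that are not found stay empty."""
    row = {}
    for field, pattern in LABEL_PATTERNS.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        row[field] = match.group(1).strip() if match else ""
    return row


def rows_to_csv(rows):
    """Serialise extracted rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=AGENDA_FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Fields that come from the MOM file would be merged into the same row before writing. Leaving a missing field empty rather than guessing makes extraction gaps easy to spot when the CSV is compared against the expected output.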
Mockups/Wireframes
NOT APPLICABLE
Product Name
Jugalbandi
Organisation Name
OpenNyAI
Domain
Legal
Tech Skills Needed
- Python
Complexity
Medium
Category
Backend