Extract the required metadata from the given training and test dataset #1

@KaranrajM

Description

All required metadata must be extracted from the given dataset of project clearance documents (Agendas and MoMs). The extracted metadata is used at the indexing stage to index the data by key fields, which improves data retrieval. Development and testing should be done on the training dataset. The code/logic will then be evaluated on the test dataset using the given script.

Goal

To develop an extractor that can accurately extract the required set of metadata from the given dataset.

Expected Outcome

Domain-Specific Metadata Extractor

  • Extracts the required fields of metadata from the training and test datasets.

Evaluation Result

  • Run the code/logic through the evaluation script (attached below) on the test dataset.
  • The performance score must be greater than or equal to 90%.

Acceptance Criteria

A context-specific extractor that achieves an accuracy of >= 90% in extracting the required metadata.
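
As a rough way to check progress against the 90% threshold during development, the sketch below computes a field-level exact-match accuracy between extracted rows and reference rows. This is only an assumption about how accuracy might be measured: the attached evaluation script is the authority and may score differently. The function names, the `agenda_id` key, and the normalization rule are all hypothetical.

```python
def normalize(value):
    """Collapse whitespace and ignore case so trivial differences do not count as errors."""
    return " ".join(str(value).split()).casefold()


def field_accuracy(predicted, expected, key="agenda_id"):
    """Fraction of expected field values that the prediction matches exactly.

    `predicted` and `expected` are lists of dicts (one per document pair).
    Rows are paired on `key`, a hypothetical identifier column.
    """
    predicted_by_key = {row[key]: row for row in predicted}
    total = 0
    correct = 0
    for exp_row in expected:
        pred_row = predicted_by_key.get(exp_row[key], {})
        for field, exp_value in exp_row.items():
            if field == key:
                continue  # the key pairs rows; it is not scored
            total += 1
            if normalize(pred_row.get(field, "")) == normalize(exp_value):
                correct += 1
    return correct / total if total else 0.0


def meets_threshold(score, threshold=0.90):
    """Acceptance criterion from the issue: accuracy >= 90%."""
    return score >= threshold
```

A missing row or missing field counts as a wrong answer here, which is a deliberately strict choice for a local check.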

Implementation Details

  • Build a solution to extract the required metadata (attachment) from the dataset.
  • Use:
  • Run the code/logic, together with the parsed data in CSV format, against the evaluation script to reach the required accuracy.
  • To understand the data in training and test dataset folders:
    • Each record always consists of two related files: an Agenda and a MoM.
    • The Agenda file describes the agenda of the meeting and contains key information such as the Agenda ID, Agenda creation date, Title of meeting, Meeting start date, and Meeting end date.
    • The rest of the information can be fetched from the MoM (Minutes of Meeting) file. In short, the MoM records what was discussed about the items listed in the Agenda.
    • The full metadata list (fields that need to be extracted) can be found here.
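
The Agenda/MoM split described above can be sketched as follows. This is a minimal illustration, not the expected solution: it assumes the Agenda has already been parsed to plain text in which each field appears as a `Label: value` line, which may not match the real documents. The regular expressions, function names, and the `project_name` field are hypothetical. Only the five Agenda field names come from the description above.

```python
import csv
import io
import re

# Field names come from the issue text; the label patterns are ASSUMPTIONS
# about how the parsed Agenda text might look, not the real document layout.
AGENDA_PATTERNS = {
    "agenda_id": re.compile(r"Agenda\s*ID\s*[:\-]\s*(.+)", re.IGNORECASE),
    "agenda_creation_date": re.compile(r"Agenda\s*creation\s*date\s*[:\-]\s*(.+)", re.IGNORECASE),
    "meeting_title": re.compile(r"Title\s*of\s*meeting\s*[:\-]\s*(.+)", re.IGNORECASE),
    "meeting_start_date": re.compile(r"Meeting\s*start\s*date\s*[:\-]\s*(.+)", re.IGNORECASE),
    "meeting_end_date": re.compile(r"Meeting\s*end\s*date\s*[:\-]\s*(.+)", re.IGNORECASE),
}


def extract_fields(text, patterns):
    """Return the first match for each pattern, or an empty string if absent."""
    record = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        record[name] = match.group(1).strip() if match else ""
    return record


def build_record(agenda_text, mom_fields):
    """Combine Agenda fields with fields already extracted from the MoM.

    Agenda values win on a name clash; MoM values fill in everything else.
    """
    record = extract_fields(agenda_text, AGENDA_PATTERNS)
    for name, value in mom_fields.items():
        record.setdefault(name, value)
    return record


def records_to_csv(records):
    """Serialize records to CSV text, using the first record's keys as columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


if __name__ == "__main__":
    sample_agenda = "Agenda ID: A-101\nTitle of meeting: Site clearance review\n"
    # "project_name" stands in for whatever the MoM extractor produces.
    print(records_to_csv([build_record(sample_agenda, {"project_name": "Example"})]))
```

Fields that cannot be found are left empty rather than guessed, so gaps stay visible in the CSV that is passed to the evaluation script.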

Mockups/Wireframes

NOT APPLICABLE

Product Name

Jugalbandi

Organisation Name

OpenNyAI

Domain

Legal

Tech Skills Needed

  • Python

Complexity

Medium

Category

Backend

Labels

  • C4GT Bounty: Successful closure has a monetary reward
  • C4GT Community: Assignable to C4GT community developers
  • C4GT coding: Coding related