SummarAIze: Machine Learning-based Data Summarization from PDFs and Images

Atharva0506/Summarizer

Summarizer

1. Problem

This project uses machine learning to generate summaries of text extracted from PDF books and images.

2. Data

The dataset used in this project is SAMSum, available at: https://huggingface.co/datasets/samsum

3. Evaluation

The evaluation metrics for this project have not yet been defined. Criteria for assessing the quality of the generated summaries still need to be chosen. Candidates include precision, recall, and F1 score computed over the word overlap between a generated summary and its reference summary (the idea behind the ROUGE family of metrics commonly used for summarization), or other metrics suited to the specific objectives of the task.
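As one possible starting point, the sketch below computes precision, recall, and F1 from unigram overlap between a reference summary and a generated one. It is a simplified, ROUGE-1-style illustration, not the project's chosen metric: the function name `unigram_overlap_scores` and the whitespace tokenization are assumptions made for this example.

```python
from collections import Counter


def unigram_overlap_scores(reference: str, candidate: str) -> dict:
    """Score a candidate summary against a reference by unigram overlap.

    Hypothetical helper for illustration: tokens are lowercased and split
    on whitespace, which is cruder than a real ROUGE implementation.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count of each shared token,
    # so repeated words are only credited as often as they appear in both.
    overlap = sum((ref_counts & cand_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f1": f1}


scores = unigram_overlap_scores(
    reference="the cat sat on the mat",
    candidate="the cat sat",
)
print(scores)
```

Here every candidate word appears in the reference, so precision is high, while recall is lower because the candidate covers only part of the reference. F1 balances the two.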

4. Features

The dataset comprises the following key features:

  • Content: This column contains the textual content extracted from PDF books and images. It serves as the input for the summarization task.

  • Dialogue: This column contains the conversation text of each example. Capturing conversational context matters when the summarization task involves dialogue.

  • Summary: This column contains the ground truth or reference summaries for the corresponding content. It represents the desired output of the summarization model.

The dataset consists of approximately 14.7k rows, and it will be split into training, validation, and test sets so that the summarization model can be trained and assessed effectively.
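A minimal sketch of such a split is shown below. The function name `split_rows`, the 80/10/10 fractions, and the fixed seed are assumptions chosen for illustration; the project does not specify its split ratios, and a dataset that already ships with predefined splits can simply use those instead.

```python
import random


def split_rows(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows and split them into train, validation, and test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    Whatever is left after the train and validation slices becomes the
    test set, so no row is dropped or duplicated.
    """
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Stand-in rows; in practice each row would hold a dialogue and its summary.
rows = [{"id": i} for i in range(100)]
train, val, test = split_rows(rows)
print(len(train), len(val), len(test))
```

Fixing the seed makes the split repeatable across runs, which keeps evaluation results comparable between experiments.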
