Summarizer

1. Problem

This project uses machine learning to generate summaries of text extracted from PDF books and images.

2. Data

The dataset used in this project is available at: https://huggingface.co/datasets/samsum

3. Evaluation

The evaluation metrics for this project have not been defined yet. Criteria for assessing the quality of the generated summaries still need to be chosen. Candidates include precision, recall, and F1 score computed against the reference summaries, or other metrics suited to the specific objectives of the summarization task.
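As one possible starting point, the precision, recall, and F1 candidates mentioned above can be sketched as a word-overlap score between a generated summary and its reference. This is only an illustration, not the project's chosen metric: the function name `overlap_scores` and the lowercase whitespace tokenization are assumptions made for the example.

```python
from collections import Counter


def overlap_scores(candidate: str, reference: str) -> dict:
    """Word-overlap precision, recall, and F1 between two summaries.

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which is an assumption, not something the project specifies.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: a word counts at most as often as it appears
    # in both the candidate and the reference.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(cand_tokens) if cand_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = overlap_scores("the cat sat", "the cat sat on the mat")
print(scores)
```

Here precision is the share of generated words that also appear in the reference, and recall is the share of reference words that the generated summary covers. A short summary can therefore score high precision and low recall, which is why F1 is reported alongside both.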

4. Features

The dataset comprises the following key features:

Content: This column contains the textual content extracted from PDF books and images. It serves as the input for the summarization task.
Dialogue: This column contains the dialogues, i.e. the conversational text in the dataset. Handling dialogues matters when the summarization task has to capture conversational context.
Summary: This column contains the ground truth or reference summaries for the corresponding content. It represents the desired output of the summarization model.

The dataset consists of approximately 14.7k rows and will be split into training, validation, and test sets to train and assess the summarization model.
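The three-way split described above could look like the following minimal sketch. The 80/10/10 proportions, the fixed seed, and the helper name `split_rows` are assumptions for illustration; the project does not state which proportions it uses.

```python
import random


def split_rows(rows, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle rows and split them into train, validation, and test lists.

    Hypothetical helper: the default fractions and seed are assumed
    values, not settings taken from the project.
    """
    if train_frac <= 0 or val_frac < 0 or train_frac + val_frac >= 1:
        raise ValueError("fractions must leave a non-empty share for the test set")

    shuffled = list(rows)
    # A dedicated Random instance keeps the split reproducible
    # without touching the global random state.
    random.Random(seed).shuffle(shuffled)

    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)

    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_rows(range(100))
print(len(train), len(val), len(test))
```

Fixing the seed means the same rows land in the same split on every run, so evaluation results stay comparable between experiments.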
