ashokurlana/Indian-Language-Summarization

Indian-Language-Summarization

This repository provides implementations of summarization models for Indian languages. The code and data are available in this repo.

We use a modified fork of Hugging Face Transformers for our experiments.

Creating environment

If you are using conda, create the environment with the following command:

conda env create -f environment.yml

Otherwise, install the dependencies into a Python environment with pip:

pip install -r requirements.txt

Data format:

  • We used the dataset released in the ILSUM shared task.

  • Make sure to create train, dev, and test CSV files with the column names "text" and "summary".
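
As a rough illustration of the data format above, the following sketch writes and checks a CSV split with the "text" and "summary" columns. It is not part of the repository: the helper names `write_split` and `check_split` and the file name `train.csv` are assumptions made for this example, and it uses only the Python standard library.

```python
import csv

# Column names required by the data format described above.
REQUIRED_COLUMNS = ("text", "summary")


def write_split(path, rows):
    """Write (text, summary) pairs to a CSV file with the expected header.

    Hypothetical helper for illustration; not part of this repository.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(REQUIRED_COLUMNS)
        writer.writerows(rows)


def check_split(path):
    """Return the number of data rows, or raise ValueError if a column is missing.

    Hypothetical helper for illustration; not part of this repository.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [c for c in REQUIRED_COLUMNS if c not in header]
        if missing:
            raise ValueError(f"{path}: missing columns {missing}")
        return sum(1 for _ in reader)


if __name__ == "__main__":
    # Placeholder file name and content; replace with the real ILSUM data.
    write_split("train.csv", [("Full article text.", "Short summary.")])
    print(check_split("train.csv"))
```

The same check can be run on the dev and test files before starting fine-tuning, so that a missing or misnamed column is caught early.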

Models:

For Hindi and Gujarati

For English

Run the script

To fine-tune any Hugging Face model, use the run.sh script. When running the different models described in the paper, make sure you pass the appropriate arguments.

sh run.sh

Reference

If you use our code or corpus, please cite:

@article{urlana2023indian,
  title={Indian language summarization using pretrained sequence-to-sequence models},
  author={Urlana, Ashok and Bhatt, Sahil Manoj and Surange, Nirmal and Shrivastava, Manish},
  journal={arXiv preprint arXiv:2303.14461},
  year={2023}
}
