Description: Fine-tuned a BART model to generate keyphrases, using open-source datasets such as KPTimes, KPCrowd, and Inspec.
Split the complete dataset into batches of 2,000 data points and trained the model one batch at a time, to avoid exhausting GPU VRAM.
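The batching step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `iter_batches` and `select_training_batches` are hypothetical, and the defaults (2,000 data points per batch, 8 batches) are taken from the figures reported in this description.

```python
from itertools import islice
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(data: Sequence[T], batch_size: int = 2000) -> Iterator[List[T]]:
    """Yield consecutive slices of `data`, each holding at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(data), batch_size):
        yield list(data[start:start + batch_size])


def select_training_batches(
    data: Sequence[T], batch_size: int = 2000, max_batches: int = 8
) -> List[List[T]]:
    """Return only the first `max_batches` batches (hypothetical helper)."""
    return list(islice(iter_batches(data, batch_size), max_batches))


if __name__ == "__main__":
    # Stand-in for the real data points; integers are used purely for illustration.
    dataset = list(range(298_311))
    batches = select_training_batches(dataset)
    print(len(batches), sum(len(b) for b in batches))
```

Each batch would then be passed to a separate fine-tuning run, so only 2,000 examples need to be tokenized and held in memory at a time.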
Total number of data points = 298,311
Number of batches trained on = 8
Total number of data points used for training = 16,000
Best ROUGE-1 score: 33.8
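For readers unfamiliar with the metric, ROUGE-1 measures unigram overlap between a generated text and a reference. The sketch below is a simplified pure-Python version that assumes lowercased whitespace tokenization; it is not necessarily how the score above was computed, since the evaluation library and settings are not stated here. The function name `rouge1` is hypothetical. It returns values between 0 and 1, so if the reported 33.8 is on a 0 to 100 scale it would correspond to 0.338.

```python
from collections import Counter
from typing import Dict


def rouge1(candidate: str, reference: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall, and F1 (simplified ROUGE-1)."""
    # Assumption: lowercased whitespace tokens, no stemming or stopword removal.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each token counts at most as often as it appears in both.
    overlap = sum((cand_counts & ref_counts).values())

    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1(
        "neural keyphrase generation",
        "keyphrase generation with transformers",
    )
    print(scores)
```

In this example two of the three candidate tokens appear in the four-token reference, giving a precision of 2/3 and a recall of 1/2.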
- Python
- NLTK
- PyTorch
- pickle
- Hugging Face Transformers