Skip to content

d0r1h/SAR

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

22 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation



Website tweet

State-of-the-art Summarization methods for Hindi in ЁЯдЧ

SAR (рд╕рд╛рд░) in Hindi means summary. This repository contains my work on Hindi Text Summarization on news article.

Notebook:

Notebook Colab Kaggle
BaseLine Open In Colab Kaggle

DataSet:

  • As of now I've released a sample Dataset of 2k pairs of text and summary which can be accessed at Link

Models:

  • Inference results are on 2k sample data.
Model Checkpoint Rouge-2[f_score] Inference time
BART ai4bharat/IndicBART 21.48 20min 27s
T5 csebuetnlp/mT5_multilingual_XLSum 20.21 45min 54s

Project Pipeline

API

You can summarize any Hindi news article in just 5 lines of code

>>> import requests
>>> api_endpoint = "https://hf.space/embed/d0r1h/Hindi_News_Summarizer/+/api/predict/"
>>> news_url = "https://www.amarujala.com/uttar-pradesh/shamli/up-news-heroin-caught-in-shaheen-bagh-of-delhi-is-connection-to-kairana-and-muzaffarnagar?src=tlh\u0026position=3"
>>> r = requests.post(url= api_endpoint, 
                  json = {"data": [ news_url, "BART"]})
>>> r.json()['data'][0]
>>> рдпреВрдкреА рд╢рд╛рд╣реАрди рдмрд╛рдЧ рдореЗрдВ 100 рдХрд░реЛрдбрд╝ рд░реБрдкрдпреЗ рдХреАрдордд рдХреА рд╣реЗрд░реЛрдЗрди рдФрд░ рдЕрдиреНрдп рдорд╛рджрдХ рдкрджрд╛рд░реНрде рдХреА рдмрд░рд╛рдорджрдЧреА рд╡ рдЙрд╕реЗ рд▓рд╛рдиреЗ рдЕрдВрддрд░реНрд░рд╛рд╖реНрдЯреНрд░реАрдп рдбреНрд░рдЧреНрд╕ рддрд╕реНрдХрд░реЛрдВ рдХреЗ рдЧрд┐рд░реЛрд╣ рдХреЗ рддрд╛рд░ рд╢рд╛рдорд▓реА рдЬрд┐рд▓реЗ рдХреЗ рдХрд╕реНрдмрд╛ рдХреИрд░рд╛рдирд╛ рдФрд░ рдореБрдЬрдлреНрдлрд░рдирдЧрд░ рд╕реЗ рдЬреБрдбрд╝ рд░рд╣реЗ рд╣реИрдВред рдирд╛рд░рдХреЛрдЯрд┐рдХреНрд╕ рдХрдВрдЯреНрд░реЛрд▓ рдмреНрдпреВрд░реЛ рдПрдирд╕реАрдмреА рджрд┐рд▓реНрд▓реА рдХреА рдЯреАрдо рдиреЗ рдЧреБрд░реБрд╡рд╛рд░ рдХреЛ рдХреИрд▓рд╛рдирд╛ рд╕реЗ рджреЛ рд▓реЛрдЧреЛрдВ рдХреЛ рд╣рд┐рд░рд╛рд╕рдд рдореЗрдВ

Inference Demo:

Application is hosted on ЁЯдЧ space and can be accessed at SAR

Website Supported

ToDO

  • Add support for following website
  • Foramtting Hindi text for wordcloud