E-Ghafour/Persian_Embedrank_Keyphrase_Extraction

Persian Keyphrase Extraction with EmbedRank

Overview

This repository implements a Persian keyphrase extraction system based on the EmbedRank method, which has been adapted and customized for the Persian language. It includes trained models for Persian sentence embedding and part-of-speech (POS) tagging, and it is built on the Hazm library.

[Image: sample of keyphrase output]

Features

  • EmbedRank Method: The system employs the EmbedRank algorithm for keyphrase extraction, adapted and optimized for the Persian language.

  • Trained Models:

    • Persian Sentence Embedding: Pre-trained models that generate embeddings for Persian sentences and capture their semantic features. You can download the pre-trained sentence embedding model from this link.
    • Persian POS Tagger: Pre-trained models for part-of-speech tagging designed specifically for Persian text. You can download the pre-trained POS tagger model from this link.
  • Hazm Library Usage Example: This implementation serves as a sample of Hazm library usage and is featured in the Hazm documentation. Feel free to open the link and read the full Persian documentation for this implementation.

Implementation Details

  • Sentence Embedding Training: The Persian sentence embedding model is trained to capture contextual and semantic features of Persian text, so that phrases relevant to a document can be identified.

  • POS Tagger Training: The Persian POS tagger is trained to tag Persian text accurately, so that sequences of words can be turned into well-formed candidate phrases from which keyphrases are selected.

  • EmbedRank Customization: The EmbedRank algorithm is adapted and tuned for keyphrase extraction from Persian documents.
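The two steps described above, building candidate phrases from POS tags and ranking them by embedding similarity to the document, can be illustrated with a minimal sketch. It is not this repository's code: the function names `extract_candidates`, `cosine` and `rank_candidates` are hypothetical, the tag names `NOUN` and `ADJ` (and the handling of a suffix such as `,EZ`) are assumptions about the tagger's tag set, and the vectors are toy values standing in for the output of a sentence embedding model.

```python
import math

# Assumed tag names; the real tag set depends on the POS tagger model in use.
PHRASE_TAGS = {"NOUN", "ADJ"}


def extract_candidates(tagged_tokens):
    """Group consecutive noun/adjective tokens into candidate phrases.

    tagged_tokens is a list of (word, tag) pairs. A tag such as "NOUN,EZ"
    is reduced to "NOUN" (assumption: suffixes follow a comma).
    """
    candidates, current = [], []
    for word, tag in tagged_tokens:
        if tag.split(",")[0] in PHRASE_TAGS:
            current.append(word)
        else:
            if current:
                candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(candidates))


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_candidates(doc_vector, candidate_vectors, top_n=5):
    """Return the top_n (phrase, score) pairs, most similar to the document first."""
    scored = [(phrase, cosine(doc_vector, vec))
              for phrase, vec in candidate_vectors.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    tagged = [("پردازش", "NOUN,EZ"), ("زبان", "NOUN,EZ"), ("طبیعی", "ADJ"),
              ("است", "VERB")]
    print(extract_candidates(tagged))

    # Toy vectors; a real system would embed the document and each candidate.
    doc_vector = [1.0, 0.0]
    candidate_vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(rank_candidates(doc_vector, candidate_vectors, top_n=2))
```

In a real pipeline, the `(word, tag)` pairs would come from the trained POS tagger, and the document and candidate vectors from the trained sentence embedding model. Grouping adjacent nouns and adjectives is only one simple way to form candidates, and a grammar-based chunker may be a better fit for Persian noun phrases.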
