This part has been integrated into Nanophotonics_design_command_interactive_chatbot project

ZooBeasts/NLP_keyword_Summarization_for_physics_paper

Keyword extraction for short physics letters using TF-IDF. Extractive summarization uses TextRank; abstractive summarization will use a pre-trained model and will then be applied to a QA chatbot. (28/09/23)
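As a rough illustration of the TF-IDF keyword step, here is a minimal standard-library sketch. It is not the code in this repository: the function names (`tokenize`, `tfidf_keywords`), the regex tokenizer, and the smoothed IDF formula are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase first so "Laser" and "laser" are counted as one term.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_keywords(docs, doc_index, top_k=6):
    """Return the top_k terms of docs[doc_index] ranked by TF-IDF.

    Hypothetical helper: uses a smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    if total == 0:
        return []  # empty document, nothing to rank

    scores = {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term, count in tf.items()
    }
    # Sort by score (descending), then alphabetically to break ties.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


docs = [
    "the photonic crystal cavity confines light",
    "the laser emits light",
    "the cat sat on the mat",
]
keywords = tfidf_keywords(docs, 0, top_k=4)
```

Terms that occur in every document (here "the") receive the lowest IDF and fall to the bottom of the ranking, which is why no explicit stop-word list is needed in this toy example.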

For more details on what has changed, see Development_blog.txt.

The task has changed, so this project now uses NLTK to split sentences and words for better results. The networkx package is used for TextRank because the self-written textrank.py has an issue where it returns an empty list.
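The TextRank idea can be sketched with the standard library alone. This is only an illustration under stated assumptions: the regex splitter stands in for NLTK's sentence tokenizer, the power-iteration loop stands in for the PageRank computation that networkx provides, and every function name below is hypothetical rather than taken from this repository.

```python
import math
import re


def split_sentences(text):
    # Naive splitter; a stand-in for a proper sentence tokenizer.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(sentence):
    return set(re.findall(r"[a-z]+", sentence.lower()))


def overlap_similarity(a, b):
    # Word-overlap similarity normalized by sentence lengths.
    wa, wb = words(a), words(b)
    if len(wa) < 2 or len(wb) < 2:
        return 0.0  # avoids log(1) + log(1) == 0 in the denominator
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with PageRank-style power iteration."""
    n = len(sentences)
    if n == 0:
        return []  # guard: never rank an empty list
    weights = [
        [0.0 if i == j else overlap_similarity(sentences[i], sentences[j])
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(text, top_n=2):
    sentences = split_sentences(text)
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:top_n]
    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(best)]


text = (
    "Photonic crystals confine light. "
    "Light in photonic crystals can be slowed. "
    "Cats sleep all day."
)
summary = summarize(text, top_n=2)
```

Sentences that share vocabulary with many others accumulate a higher score, while an isolated sentence keeps only the baseline `(1 - damping) / n`.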

Development blog:

Added text summarization with Maximal Marginal Relevance (MMR). 28/09/23
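For readers unfamiliar with MMR, the following is a minimal sketch of the selection loop, assuming bag-of-words cosine similarity and a trade-off parameter `lam`. The names `bag_of_words`, `cosine`, and `mmr_select` are made up for this example and may differ from the repository's implementation, which may also use a different sentence representation.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def mmr_select(sentences, query, k=2, lam=0.7):
    """Greedily pick k sentences that are relevant to the query
    but not redundant with the sentences already picked."""
    vectors = [bag_of_words(s) for s in sentences]
    query_vec = bag_of_words(query)
    selected = []
    candidates = list(range(len(sentences)))

    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cosine(vectors[i], query_vec)
            # Highest similarity to anything already selected (0 if none).
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)

    return [sentences[i] for i in selected]


sentences = [
    "photonic crystal cavity design",
    "photonic crystal cavity design",
    "inverse design with neural networks",
]
query = "photonic crystal cavity design with neural networks"
picked = mmr_select(sentences, query, k=2, lam=0.5)
```

With `lam=0.5` the duplicate second sentence is penalized for redundancy, so the third sentence is chosen instead. With `lam=1.0` the redundancy term vanishes and the selection reduces to plain relevance ranking.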

TextRank summarization is uploaded and usable for summarizing the BBC news dataset: https://www.kaggle.com/datasets/pariza/bbc-news-summary. The self-written textrank.py works for Chinese; it is not yet clear why it returns an empty list for English, so the investigation continues. Word embeddings are taken from glove.6b.50d.txt: https://www.kaggle.com/datasets/adityajn105/glove6b50d (28/09/23) [image]
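GloVe text files store one word per line followed by its vector components, separated by spaces. A minimal sketch of loading such a file and averaging word vectors into a sentence vector is shown below. The helper names `load_glove` and `sentence_vector` are hypothetical, and averaging is only one assumed way of building sentence vectors; the repository may do this differently.

```python
def load_glove(lines):
    """Parse GloVe-format lines ("word v1 v2 ... vN") into a dict."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(x) for x in parts[1:]]
    return embeddings


def sentence_vector(sentence, embeddings, dim):
    """Average the vectors of all known words in the sentence."""
    # GloVe 6B vocabularies are lowercase, so lowercase the input too.
    tokens = [t for t in sentence.lower().split() if t in embeddings]
    if not tokens:
        return [0.0] * dim  # no known words: fall back to a zero vector
    return [
        sum(embeddings[t][d] for t in tokens) / len(tokens)
        for d in range(dim)
    ]


# Toy 2-dimensional "embedding file" used instead of the real 50-d file.
toy_lines = ["light 1.0 0.0", "wave 0.0 1.0"]
emb = load_glove(toy_lines)
vec = sentence_vector("Light wave", emb, dim=2)
```

With the real file, `load_glove` could be given an open file handle, for example `with open(path, encoding="utf-8") as f: emb = load_glove(f)`, where `path` points to the downloaded embedding file and `dim` would be 50.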

NLTK tokenization seems to have a few flaws, but it is still able to extract 4 important words. 27/09/23 (Update: the problem for M is solved; it was caused by not calling lower() on the content.) [image]

Still an error in TextRank. Added a simple part-of-speech tagger, pos_tagging.py. TF-IDF can extract the correct keywords for a keyword list of length 6. The image below shows the keywords extracted from my paper. 22/09/23 [image]

Problem with TextRank: it raises ValueError: max() arg is an empty sequence. I don't know why an empty sequence is being passed to min and max, and the code did not pass the unit test. (TF-IDF, however, is working perfectly. LDA will be added later.) 21/09/23
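Python's built-in `max()` and `min()` raise this ValueError whenever they receive an empty sequence, so the error usually means an earlier step (for example tokenization or filtering) produced no items. The sketch below shows one defensive pattern. `normalize_scores` is a hypothetical helper, not the function in textrank.py, and this does not claim to identify the actual cause of the bug.

```python
def normalize_scores(scores):
    """Min-max normalize a list of scores, tolerating empty input."""
    if not scores:
        # Without this guard, min([]) / max([]) would raise
        # "ValueError: max() arg is an empty sequence".
        return []
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0] * len(scores)  # all scores equal: avoid dividing by zero
    return [(s - low) / (high - low) for s in scores]


normalized = normalize_scores([1, 2, 3])
empty = normalize_scores([])

# max() also accepts a default that is returned for empty input.
fallback = max([], default=0.0)
```

Returning early on empty input makes the failure visible at the point where the list became empty, which is typically easier to debug than an exception raised deep inside the ranking code.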
