Haodi-Qi/TED-Text-Analysis

TED-Text-Analysis

A text analysis project on TED Talk dataset for tag extraction, summarization and related talk recommendation.

This project was carried out for SMU IS450 Text Mining and Natural Language Processing in AY2019-2020 Semester 2. The team consisted of Wende B., Chengzi Z., May M., Suyee K., Xiaowei L., and me.

The project analyses TED Talk transcripts to automate three tasks for a newly supplied transcript and title: tag extraction, transcript summarization, and related talk recommendation.

The TED Talk dataset is available on Kaggle and covers talks uploaded to the official TED.com site up to September 21st, 2017. In total, it contains information on 2550 talks and 2464 transcripts.

You may wish to read our Medium article to learn more about our system.


System Design

Technology employed

| Step | Main technology |
| --- | --- |
| Transcript Preprocessing | spaCy |
| Topic Modelling | Gensim, Scikit-learn, LDA Mallet |
| TF-IDF Metric Computation | Scikit-learn |
| Tag Generation | WordNet, NetworkX |
| Summarization | TextRank |
| Related Talks Recommendation | Scikit-learn |
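
To illustrate how the TF-IDF and recommendation steps listed above could fit together, here is a minimal sketch that ranks existing talks against a new transcript by cosine similarity of TF-IDF vectors. It is not the project's actual code: the README only names Scikit-learn for these steps, so the use of cosine similarity, the plain-Python implementation (chosen so the sketch runs without dependencies), and the function names `tokenize`, `tfidf_vectors`, `cosine` and `recommend` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document using smoothed TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            # Raw term count times a smoothed inverse document frequency.
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(new_transcript, corpus, top_n=3):
    """Rank talks in `corpus` (title -> transcript) by similarity to the input.

    Hypothetical helper: returns up to `top_n` (title, score) pairs,
    highest score first.
    """
    titles = list(corpus)
    # Vectorize the known talks together with the new transcript so that
    # all vectors share the same document-frequency statistics.
    vectors = tfidf_vectors([corpus[t] for t in titles] + [new_transcript])
    query = vectors[-1]
    scored = [(t, cosine(query, v)) for t, v in zip(titles, vectors[:-1])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Calling `recommend(text, {"Talk A": "...", "Talk B": "..."})` with a dictionary of known transcripts returns the closest titles with their scores. In practice, Scikit-learn's `TfidfVectorizer` and `cosine_similarity` would replace the hand-written helpers, and the input would first go through the preprocessing step.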

GUI

Besides implementing the system backend, we also created a simple GUI for easy use of our system.

To run it:

  1. Download and unzip (or clone) the repository, then navigate to the project directory
  2. Run python SummarySystem in Command Prompt
  3. Fill in the Title and Input your Text fields
  4. Click the Generate button
