TED-Text-Analysis

A text analysis project on the TED Talks dataset for tag extraction, summarization, and related talk recommendation.

This project was carried out for SMU IS450 Text Mining and Natural Language Processing in AY2019-2020 Semester 2. The team consists of Wende B., Chengzi Z., May M., Suyee K., Xiaowei L., and myself.

The project analyses TED Talk transcripts to automate tag extraction, transcript summarization, and related talk recommendation for a new transcript and its title.

The TED Talks dataset is available on Kaggle and covers talks uploaded to the official TED.com website up to September 21st, 2017. In total, it contains information on 2550 talks and 2464 transcripts.

You may wish to read our Medium article to learn more about our system.

System Design

Technology employed

Step	Main technology
Transcript Preprocessing	spaCy
Topic Modelling	Gensim, scikit-learn, LDA Mallet
TF-IDF Metric Computation	scikit-learn
Tag Generation	WordNet, NetworkX
Summarization	TextRank
Related Talks Recommendation	scikit-learn
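
To illustrate how the TF-IDF, summarization, and recommendation steps relate, here is a minimal, dependency-free sketch. It is not the project's code: the project lists scikit-learn and TextRank for these steps, while this sketch uses only the Python standard library, and the function names (tokenize, tfidf_vectors, cosine, recommend, textrank_summary) are hypothetical. It assumes that talks are compared by the cosine similarity of their TF-IDF vectors, and that sentences are ranked with a TextRank-style power iteration over a sentence similarity graph.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (a stand-in for real preprocessing)."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using smoothed TF-IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(new_text, corpus, top_k=2):
    """Return indices of the corpus documents most similar to new_text."""
    # Assumption: the new text is included when computing IDF, for simplicity.
    vectors = tfidf_vectors(corpus + [new_text])
    query = vectors[-1]
    scores = [(cosine(query, v), i) for i, v in enumerate(vectors[:-1])]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))[:top_k]]


def textrank_summary(sentences, top_k=2, damping=0.85, iterations=50):
    """Pick the top_k sentences by a TextRank-style score, in original order."""
    n = len(sentences)
    if n == 0:
        return []
    vectors = tfidf_vectors(sentences)
    # Edge weight between two sentences = cosine similarity of their vectors.
    weights = [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: (-scores[i], i))[:top_k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    talks = [
        "Climate change and renewable energy for the planet",
        "How the brain learns language in early childhood",
        "Solar energy and wind power can slow climate change",
    ]
    print(recommend("Renewable energy against climate change", talks))
    print(textrank_summary(
        ["Solar energy is clean energy.", "Wind energy is clean too.", "My cat sleeps all day."]
    ))
```

The same similarity function serves both tasks here: it ranks existing talks against a new transcript, and it supplies the edge weights of the sentence graph used for summarization. A real pipeline would replace tokenize with proper preprocessing such as lemmatization and stop-word removal.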

GUI

Besides implementing the system backend, we also built a simple GUI to make the system easy to use.

To use it:

1. Download and unzip (or clone) the repository, then navigate to the project directory
2. Run python SummarySystem.py in Command Prompt
3. Fill in the Title and Input your Text fields
4. Click the Generate button

