GitHub - bharatvem/TED-ANLP: Perform topic modelling on the transcripts of the TED Talks

Introduction

It is a human tendency to label the things we encounter. The internet, along with its other advantages, has made data abundant and widely available. The larger the data gets, the greater the need to divide it into smaller chunks so that it can be accessed and used more effectively. The ability to label content by continuously training machine learning models may be an evolutionary step in that direction. The goal is to design a model that trains on a corpus of text files to generate a finite bag of words, which can then be used to predict the topic of an unknown or unlabelled text document. For this project, we designed and built a topic prediction model that predicts topic labels for TED Talk transcripts.

Methods used:

LDA Model
TF-IDF Weight Ranking Model
k-NN Model
Word2Vec Model
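
As an illustration of the weight-ranking idea in the list above, here is a minimal sketch of TF-IDF term ranking in plain Python. It is not taken from this repository's code: the function names (tokenize, tfidf_rank), the top_k parameter, and the three toy "transcripts" are all hypothetical, and the weighting used is the basic tf * log(N / df) form, which may differ from the variant used in the project.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_rank(docs, top_k=3):
    """Return the top_k highest-weighted TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    ranked = []
    for tokens in tokenized:
        tf = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        weights = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending weight, breaking ties alphabetically.
        top = sorted(weights, key=lambda t: (-weights[t], t))[:top_k]
        ranked.append(top)
    return ranked


# Hypothetical toy corpus standing in for TED Talk transcripts.
talks = [
    "the brain learns and the brain adapts",
    "the ocean warms and the climate changes",
    "the robot learns and the robot moves",
]
labels = tfidf_rank(talks, top_k=2)
print(labels[0])  # ['brain', 'adapts']
```

Words that occur in every document (here "the" and "and") get a weight of zero, so the highest-ranked terms are the ones that distinguish a talk from the rest of the corpus. Those terms could then serve as candidate topic labels.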

Folders and files:

Google Word2Vec Model
Readings
Ted-DataCollection-Transcript
code
images
javascripts
stylesheets
.DS_Store
README.md
index.html
params.json
