AhmedTammaa/InterviewQuestionTagging

InterviewQuestionTagging

This project is a helper project for InterviewInsightMine that tags the questions in its scraped data.

Problem

InterviewInsightMine has collected 9,000+ data points, but analyzing them is difficult because they are not tagged.

Solution

Train a model on data from the StackOverflow and StackExchange websites, which publish the Stack Exchange Data Dumps.
In this project, we are interested in the Posts file, which contains the questions and their tags. The first iteration of this project uses stats.meta.stackexchange.com.7z. Because GPU power is limited and not all tags are needed, I extracted only the top 50 tags.
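
A minimal sketch of how the top tags could be pulled from the Posts file, not necessarily how this project does it. It assumes the archive is extracted to an XML file of `row` elements, that question rows carry `PostTypeId="1"`, and that the `Tags` attribute looks like `<a><b>` or `|a|b|`. The function names `parse_tags` and `top_tags` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def parse_tags(raw):
    """Split a Tags attribute such as '<regression><anova>' or '|regression|anova|'."""
    cleaned = raw.replace("><", "|").strip("<>|")
    return [t for t in cleaned.split("|") if t]


def top_tags(posts_xml, k=50):
    """Count tags on question rows and return the k most frequent tag names.

    posts_xml may be a file path or a binary file object.
    Assumption: questions are the rows with PostTypeId == "1".
    """
    counts = Counter()
    for _, elem in ET.iterparse(posts_xml, events=("end",)):
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            counts.update(parse_tags(elem.get("Tags", "")))
        elem.clear()  # release the row's data once it has been counted
    return [tag for tag, _ in counts.most_common(k)]
```

Calling `top_tags(path_to_posts_file, k=50)` would then give the tag list used to build the 50-element target vectors.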

Preprocessing

The preprocessing is basic:

  1. Removing stop words
  2. Lowercasing all strings
  3. Stemming the words
  4. Removing slashes and other symbols
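
The four steps above can be sketched with the standard library alone. This is an illustration under assumptions: the stop-word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer (for example a Porter stemmer), which the project may well use instead. The steps run in a slightly different order than listed so that symbols are gone before tokens are compared with the stop-word list.

```python
import re

# Tiny illustrative stop-word list (assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in",
              "and", "for", "on", "with", "what", "how"}


def naive_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    text = text.lower()                                        # lowercase
    text = re.sub(r"[^a-z0-9\s]", " ", text)                   # drop slashes and other symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]  # remove stop words
    return " ".join(naive_stem(t) for t in tokens)             # stem what is left


print(preprocess("Fitting linear models"))  # prints: fitt linear model
```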

Model

The data is then vectorized with a TF-IDF vectorizer and fed into the convolutional model.
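
To make the vectorization step concrete, here is a small pure-Python TF-IDF sketch. It uses a textbook weighting (`count * (ln(N / df) + 1)` followed by L2 normalization), which is an assumption; the vectorizer used in this project may weight terms differently, and a library implementation would normally be preferred. The names `fit_tfidf` and `transform` are hypothetical. The convolutional model itself is not sketched because its architecture is not described here.

```python
import math
from collections import Counter


def fit_tfidf(docs):
    """Learn a vocabulary and IDF weights from whitespace-tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc.split()))
    vocab = {term: i for i, term in enumerate(sorted(df))}
    idf = {term: math.log(n / df[term]) + 1.0 for term in vocab}
    return vocab, idf


def transform(doc, vocab, idf):
    """Map one document to a dense, L2-normalized TF-IDF vector."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(doc.split()).items():
        if term in vocab:  # terms unseen during fitting are ignored
            vec[vocab[term]] = count * idf[term]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec
```

Each preprocessed question becomes one fixed-length vector, which is the kind of input a model with a 50-element output could be trained on.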

Prediction

The prediction is a vector of 50 elements, each between 0 and 1, giving the probability that the corresponding tag is associated with the question.
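
One way to turn such a probability vector into tags is to keep every tag whose probability clears a cutoff. The sketch below assumes a cutoff of 0.5 and a hypothetical helper name `predicted_tags`; neither comes from this project, and the cutoff would normally be tuned on held-out data.

```python
def predicted_tags(probs, tag_names, threshold=0.5):
    """Return (tag, probability) pairs at or above the threshold, most likely first."""
    if len(probs) != len(tag_names):
        raise ValueError("one probability per tag is expected")
    picked = [(t, p) for t, p in zip(tag_names, probs) if p >= threshold]
    return sorted(picked, key=lambda pair: pair[1], reverse=True)


# Three tags instead of 50, to keep the example short.
print(predicted_tags([0.1, 0.9, 0.6], ["anova", "regression", "r"]))
# prints: [('regression', 0.9), ('r', 0.6)]
```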

Loss Function

MSE loss between the actual and predicted vectors. Current testing loss: 0.0268
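
For reference, this is how the loss could be computed for a single question, assuming the actual vector is a multi-hot encoding (1.0 for each tag the question has, 0.0 otherwise). The helper names `multi_hot` and `mse` are hypothetical, and a training framework would normally provide its own batched version.

```python
def multi_hot(tags, tag_names):
    """Encode a question's tags as a 0/1 vector aligned with tag_names."""
    tagset = set(tags)
    return [1.0 if t in tagset else 0.0 for t in tag_names]


def mse(actual, predicted):
    """Mean squared error between two equal-length vectors."""
    if len(actual) != len(predicted):
        raise ValueError("vectors must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


target = multi_hot(["regression"], ["anova", "regression"])  # [0.0, 1.0]
print(mse(target, [0.5, 0.5]))  # prints: 0.25
```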
