Stackoverflow tag generator

This repository contains the code and resources for building a Stack Overflow tag predictor using machine learning techniques. The goal of this project is to predict relevant tags for Stack Overflow questions based on their content.

About the Project

Stack Overflow is a popular platform where developers ask and answer technical questions. Tagging questions with relevant keywords is crucial for efficient searching and categorization, so this project aims to automate that step using machine learning. At present, the models predict only the top 5 tags in the dataset.

Dataset

The training dataset used in this project was obtained from Kaggle and consists of questions along with their associated tags. Preprocessing steps include lemmatization and stop word removal.
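Since the project currently targets only the top 5 tags, the raw data has to be narrowed down before training. The following is a minimal sketch of one way to do that, assuming "top" means most frequent and that questions carrying none of those tags are dropped. The function name `restrict_to_top_tags` and the toy rows are hypothetical and are not taken from the notebook.

```python
from collections import Counter

def restrict_to_top_tags(rows, k=5):
    """Keep only the k most frequent tags.

    rows: list of (question_text, [tags]) pairs.
    Returns the chosen tags and the rows that still carry at least one of them.
    """
    counts = Counter(tag for _, tags in rows for tag in tags)
    top = [tag for tag, _ in counts.most_common(k)]
    kept = []
    for text, tags in rows:
        common = [t for t in tags if t in top]
        if common:  # assumption: drop questions with none of the top-k tags
            kept.append((text, common))
    return top, kept

# Toy rows for illustration only (k=2 to keep the example small).
rows = [
    ("q1", ["python", "pandas"]),
    ("q2", ["python"]),
    ("q3", ["java"]),
    ("q4", ["java", "spring"]),
    ("q5", ["rust"]),
]
top, kept = restrict_to_top_tags(rows, k=2)
print(top)        # ['python', 'java']
print(len(kept))  # 4
```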

Preprocessing

Lemmatization: The text data is lemmatized to reduce words to their base or dictionary form.
Stop Word Removal: Common stop words are removed from the text data to improve model performance.
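
The two steps above can be sketched in a few lines. This is an illustration only: the stop word set and the lemma table below are tiny hand-written stand-ins, and the function name `preprocess` is hypothetical. A real run would more likely rely on a library's full resources (for example NLTK's `WordNetLemmatizer` and its stop word list); which library the notebook uses is not stated here.

```python
import re

# Tiny illustrative resources; a real pipeline would use a library's full lists.
STOP_WORDS = {"a", "an", "the", "be", "in", "to", "how", "do", "i", "of", "and"}
LEMMAS = {"files": "file", "errors": "error", "running": "run", "was": "be", "is": "be"}

def preprocess(text):
    """Lowercase and tokenize, lemmatize each token, then drop stop words."""
    # Keep '+' and '#' so language names such as c++ and c# survive tokenization.
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    lemmas = [LEMMAS.get(t, t) for t in tokens]
    return [t for t in lemmas if t not in STOP_WORDS]

print(preprocess("How do I read files in C++ without errors?"))
# ['read', 'file', 'c++', 'without', 'error']
```

Lemmatizing before filtering means inflected forms such as "was" are reduced to "be" first and then removed as stop words.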

Vectorization & Model Selection

Two vectorization methods are used in this project:

TF-IDF (Term Frequency-Inverse Document Frequency)
Models: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Artificial Neural Network (ANN)

Word2Vec
Models: Logistic Regression, Support Vector Machine (SVM), Recurrent Neural Network (RNN)
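
As an illustration of the first combination, here is a minimal sketch of TF-IDF features feeding a Logistic Regression model with scikit-learn. It assumes a multi-label setup (one binary classifier per tag); whether the notebook frames the task this way is not stated, and the toy questions, default hyperparameters, and the helper `predict_tags` are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy, hand-written examples (not taken from the real dataset).
questions = [
    "how to merge two dataframes in pandas",
    "python list comprehension with condition",
    "javascript promise async await error",
    "add click event listener in javascript",
    "java nullpointerexception when calling method",
    "java arraylist remove element while iterating",
]
tags = [["python"], ["python"], ["javascript"], ["javascript"], ["java"], ["java"]]

# Turn the tag lists into a binary indicator matrix, one column per tag.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)

# TF-IDF features feeding one logistic regression per tag.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(questions, y)

def predict_tags(texts):
    """Return a tuple of predicted tags for each input text (possibly empty)."""
    return binarizer.inverse_transform(model.predict(texts))

print(predict_tags(["how do I sort a list in python"]))
```

With so little training data a question may receive no tag at all, because each per-tag classifier only fires above a 0.5 probability. Swapping `LogisticRegression` for another scikit-learn estimator such as `LinearSVC` or `MultinomialNB` leaves the rest of the pipeline unchanged.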

Performance

The performance of the models was evaluated using metrics such as accuracy, precision, recall, and F1-score. On the test dataset, the SVM model with Word2Vec achieved the highest accuracy of 90.23%, followed by RNN with 88.43%.
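
For reference, the four metrics named above can be computed for a single tag from the counts of true and false positives and negatives. The sketch below uses made-up labels for illustration, not the project's results, and the function name `binary_metrics` is hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for one tag (1 = tag applies)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Accuracy alone can look high when a tag is rare, which is why precision, recall, and F1 are worth reporting alongside it.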

To further evaluate the models' performance, a new test dataset was created by scraping questions from Stack Overflow. On this new dataset, the SVM model achieved an accuracy of approximately 92% and the RNN model an accuracy of 91%.

Visualization

Usage

Clone the repository:

git clone https://github.com/YourUsername/StackOverflow-Tag-Predictor.git

Get the dataset. (Link)
Run the model training and evaluation scripts.
Predict tags for new Stack Overflow questions using the trained model.

Acknowledgments

Thanks to the sklearn library for the models.
Kaggle for the dataset of questions and tags.
Stack Overflow for permitting data scraping to access its invaluable question dataset.
