# **AutoTagger: Intelligent Question Tagging with Difficulty & Intent Detection**

### Project Overview  
**AutoTagger is an AI-powered NLP system that enhances how technical questions are classified on platforms like Stack Overflow or Quora. It automatically predicts the relevant topic tags, estimates the difficulty level, and detects the user's intent behind the question.**

### Key Features: 
- Tag Prediction: Multi-label classification of questions (e.g., ["Python", "NLP"]) 
- Difficulty Estimation: Predicts whether a question is Easy, Medium, or Hard 
- Intent Detection: Classifies the question’s intent (e.g., How-to, Debugging, Concept Explanation) 
- Similar Questions Retrieval: Uses embeddings to show similar previously answered questions 
- Streamlit Web App: Simple UI with real-time prediction 
- Confidence Scores: Shows how certain the model is about each prediction
- Feedback Integration: Lets users correct predictions so the corrections can be used to improve the model

## Import Libraries

In [None]:
import pandas as pd
import numpy as np

In [38]:
questions = pd.read_csv('././Dataset/Questions.csv', encoding='ISO-8859-1')
answers = pd.read_csv('././Dataset/Answers.csv', encoding='ISO-8859-1')
tags = pd.read_csv('././Dataset/Tags.csv', encoding='ISO-8859-1')
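
A note on the `encoding='ISO-8859-1'` argument used above: ISO-8859-1 maps every possible byte to a character, so reading with it never raises a decode error, but text that was actually saved as UTF-8 would come through garbled. The sketch below illustrates the effect on an in-memory sample. The one-row CSV is made up for illustration and is not taken from the dataset.

```python
import io

import pandas as pd

# Hypothetical one-row CSV, encoded as UTF-8 (not a row from the real files).
raw = "Id,Title\n1,Café latency\n".encode("utf-8")

# Same bytes, decoded two different ways.
as_latin1 = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
as_utf8 = pd.read_csv(io.BytesIO(raw), encoding="utf-8")

latin1_title = as_latin1.loc[0, "Title"]
utf8_title = as_utf8.loc[0, "Title"]

print(repr(latin1_title))  # the accented character is split into two wrong ones
print(repr(utf8_title))    # the accented character survives intact
```

If titles or bodies in the loaded frames show sequences such as `Ã©`, that may be a sign the files are UTF-8 rather than ISO-8859-1.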

In [None]:
# Count answers per question (ParentId is the Id of the question being answered)
answer_counts = answers.groupby("ParentId").size().reset_index(name="AnswerCount")

# Group tags per question
tag_groups = tags.groupby("Id")["Tag"].apply(list).reset_index(name="Tags")
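
To make the shape of these two aggregations concrete, here is a minimal sketch on toy frames. The `_toy` names and their rows are invented stand-ins for `Answers.csv` and `Tags.csv`; only the `groupby` calls mirror the cell above.

```python
import pandas as pd

# Made-up rows: question 1 has two answers and two tags, question 2 has one of each.
answers_toy = pd.DataFrame({"Id": [10, 11, 12], "ParentId": [1, 1, 2]})
tags_toy = pd.DataFrame({"Id": [1, 1, 2], "Tag": ["python", "nlp", "pandas"]})

# One row per question with the number of answers pointing at it.
answer_counts_toy = (
    answers_toy.groupby("ParentId").size().reset_index(name="AnswerCount")
)

# One row per question with all of its tags collected into a Python list.
tag_groups_toy = (
    tags_toy.groupby("Id")["Tag"].apply(list).reset_index(name="Tags")
)

print(answer_counts_toy)
print(tag_groups_toy)
```

Questions with no answers do not appear in the count frame at all, which is why the later left merge leaves gaps to fill.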

In [None]:
# Merge answer count and tags into questions
questions_merged = questions.merge(answer_counts, left_on="Id", right_on="ParentId", how="left")
questions_merged = questions_merged.merge(tag_groups, on="Id", how="left")
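
A left merge keeps every question, but it can silently duplicate rows if the right-hand key is not unique. One way to guard against that, sketched on invented toy frames, is the `validate` argument of `DataFrame.merge` together with a row-count check. The `_toy` frames are assumptions for illustration, not the notebook's data.

```python
import pandas as pd

# Made-up frames: question 3 has no answers, question 2 has no tags.
questions_toy = pd.DataFrame({"Id": [1, 2, 3], "Title": ["a", "b", "c"]})
answer_counts_toy = pd.DataFrame({"ParentId": [1, 2], "AnswerCount": [2, 1]})
tag_groups_toy = pd.DataFrame({"Id": [1, 3], "Tags": [["python"], ["ios"]]})

# validate="one_to_one" raises MergeError if either side has duplicate keys.
merged_toy = questions_toy.merge(
    answer_counts_toy,
    left_on="Id",
    right_on="ParentId",
    how="left",
    validate="one_to_one",
)
merged_toy = merged_toy.merge(
    tag_groups_toy, on="Id", how="left", validate="one_to_one"
)

# A left merge on unique keys must not change the number of questions.
assert len(merged_toy) == len(questions_toy)

# Unmatched questions get NaN in the columns that came from the right side.
missing_counts = int(merged_toy["AnswerCount"].isna().sum())
missing_tags = int(merged_toy["Tags"].isna().sum())
print(missing_counts, missing_tags)
```

Note that an unmatched question ends up with `NaN` in `Tags`, not an empty list, so any later step that iterates over tags may need to handle that case.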

In [None]:
# Fill missing answer counts with 0
questions_merged["AnswerCount"] = questions_merged["AnswerCount"].fillna(0).astype(int)

# Drop columns not needed downstream; errors="ignore" skips any that are absent
questions_merged = questions_merged.drop(columns=['ParentId','OwnerUserId','ClosedDate'], errors="ignore")
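
Two details of this cleanup step are easy to miss: the left merge turns the count column into floats (because of the `NaN` gaps), and `errors="ignore"` also hides misspelled column names. A small sketch on an invented frame shows both; the `toy` frame is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Made-up post-merge frame: the second question had no matching answers.
toy = pd.DataFrame(
    {"Id": [1, 2], "ParentId": [1.0, np.nan], "AnswerCount": [3.0, np.nan]}
)

# NaN forces a float column; fillna(0).astype(int) restores integer counts.
dtype_before = str(toy["AnswerCount"].dtype)
toy["AnswerCount"] = toy["AnswerCount"].fillna(0).astype(int)

# 'ClosedDate' is not in this toy frame; errors="ignore" skips it without warning.
toy = toy.drop(columns=["ParentId", "ClosedDate"], errors="ignore")

print(dtype_before, toy["AnswerCount"].dtype)
print(list(toy.columns))
```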

In [None]:
# Sample 10,000 rows from the merged data, with a fixed seed for reproducibility
df = questions_merged.sample(n=10000, random_state=42).reset_index(drop=True)
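
`DataFrame.sample(n=...)` raises a `ValueError` when `n` exceeds the number of rows (with the default `replace=False`), so a capped sample size can make the cell safer to rerun on smaller inputs. The sketch below uses an invented 50-row frame; `n_wanted` is a hypothetical name.

```python
import pandas as pd

# Made-up frame that is smaller than the requested sample size.
toy = pd.DataFrame({"Id": range(50)})

n_wanted = 10000
n = min(n_wanted, len(toy))  # never ask for more rows than exist

# The same random_state yields the same rows in the same order.
sample_a = toy.sample(n=n, random_state=42).reset_index(drop=True)
sample_b = toy.sample(n=n, random_state=42).reset_index(drop=True)

print(len(sample_a), sample_a.equals(sample_b))
```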

In [48]:
df.sample()

| | Id | CreationDate | Score | Title | Body | AnswerCount | Tags |
|---|---|---|---|---|---|---|---|
| 3521 | 18505250 | 2013-08-29T07:57:33Z | 0 | How to make UIWebview url address area progres... | `<p>I am creating a web-browser app. </p>\n\n<p...` | 0 | [ios, uiwebview, uiprogressbar] |
