##### **2. NLP-Based Support Ticket Classification (TensorFlow, NLTK)**
* Ticket Classification
    * Preprocessed customer support ticket data and built an LSTM-based neural network.
    * Implemented text vectorization using TF-IDF and Word2Vec; achieved 87% F1 score.
    * Integrated the model with a basic Flask API for demonstration.

Here’s a **complete example** for your project:

> **2. NLP-Based Support Ticket Classification (TensorFlow, NLTK)**
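> Using an **LSTM** over Keras `Tokenizer` sequences with a learned embedding, and **Streamlit** for the demo (TF-IDF, Word2Vec, and the Flask API are not implemented in the code below)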

---

## ✅ Project Overview

| Step          | Tool                                        |
| ------------- | ------------------------------------------- |
| Preprocessing | NLTK                                        |
| Vectorization | Keras `Tokenizer` + learned `Embedding`     |
| Model         | LSTM with TensorFlow/Keras                  |
| Interface     | Streamlit                                   |
| Evaluation    | F1 Score                                    |
| Dataset       | Simulated support tickets (`text`, `label`) |
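
To make the preprocessing and vectorization rows concrete, here is a plain-Python sketch of what happens to a ticket before it reaches the LSTM: cleaning, mapping words to integer indices, and left-padding to a fixed length. It is a simplified stand-in, not the Keras implementation. `STOP_WORDS`, `build_vocab`, and `encode` are hypothetical names, and the stop word list is a tiny hand-picked subset rather than NLTK's.

```python
import re
from collections import Counter

# Assumption: a small hand-picked list standing in for NLTK's English stop words.
STOP_WORDS = {"my", "to", "with", "the", "a", "is"}

def clean_text(text):
    """Lowercase, replace non-letters with spaces, and drop stop words."""
    text = re.sub(r"[^a-zA-Z]", " ", text.lower())
    return " ".join(w for w in text.split() if w not in STOP_WORDS)

def build_vocab(texts):
    """Map each word to an index, most frequent first; 0 is reserved for padding."""
    counts = Counter(w for t in texts for w in t.split())
    return {w: i for i, (w, _) in enumerate(counts.most_common(), start=1)}

def encode(text, vocab, maxlen):
    """Turn words into indices (unknown words are skipped), keep the last
    `maxlen` of them, and left-pad with zeros."""
    ids = [vocab[w] for w in text.split() if w in vocab][-maxlen:]
    return [0] * (maxlen - len(ids)) + ids

tickets = ["Update my email", "Change my email address"]
cleaned = [clean_text(t) for t in tickets]
vocab = build_vocab(cleaned)
print(vocab)                          # {'email': 1, 'update': 2, 'change': 3, 'address': 4}
print(encode(cleaned[0], vocab, 5))   # [0, 0, 0, 2, 1]
```

One thing worth checking in the real pipeline: NLTK's English stop word list includes "not", so a ticket like "Internet not working" is reduced to "internet working" before tokenization.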

---

## 🧠 Folder Structure

```
nlp_ticket_classifier/
│
├── data/
│   └── tickets.csv
│
├── model/
│   ├── lstm_model.h5
│   ├── tokenizer.pkl
│   └── label_encoder.pkl
│
├── train_model.py          # Model training script
├── app.py                  # Streamlit app
└── requirements.txt
```

---

## 📁 Step 1: Sample Dataset (`data/tickets.csv`)

```csv
text,label
Internet not working,Technical Support
Unable to login,Technical Support
Want to change my plan,Billing
Need help with charges,Billing
Cancel my subscription,Account
Update my email,Account
```
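
Six rows are enough to exercise the pipeline end to end, but not to produce a meaningful score: with `test_size=0.2`, only two tickets land in the test set. Before training on real data, a quick check of the label distribution is worthwhile. The sketch below inlines the sample rows so it runs on its own; `label_counts` is a hypothetical helper, not part of the project files.

```python
import csv
import io
from collections import Counter

# The sample rows from data/tickets.csv, inlined so the snippet is self-contained.
SAMPLE_CSV = """text,label
Internet not working,Technical Support
Unable to login,Technical Support
Want to change my plan,Billing
Need help with charges,Billing
Cancel my subscription,Account
Update my email,Account
"""

def label_counts(csv_text):
    """Count how many tickets carry each label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["label"] for row in reader)

counts = label_counts(SAMPLE_CSV)
for label, n in counts.items():
    print(f"{label}: {n}")
```

If one category dominates a real dataset, the weighted F1 score can look healthy while the smaller classes are mostly misclassified, so it helps to know the counts up front.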

---

## 🧪 Step 2: `train_model.py`

```python
# train_model.py

import os
import pandas as pd
import numpy as np
import re
import nltk
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import f1_score
import pickle

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

MAX_WORDS = 5000  # vocabulary cap; must match the Embedding input_dim
MAX_LEN = 20      # padded sequence length; app.py must pad to the same length

# Download NLTK stopwords (skipped if already present)
nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

# Load data
df = pd.read_csv('data/tickets.csv')

# Clean text: lowercase, keep letters only, drop stop words
def clean_text(text):
    text = re.sub(r"[^a-zA-Z]", " ", text.lower())
    words = text.split()
    return ' '.join([word for word in words if word not in stop_words])

df['clean_text'] = df['text'].apply(clean_text)

# Encode labels as integers, then one-hot for categorical cross-entropy
label_encoder = LabelEncoder()
df['label_enc'] = label_encoder.fit_transform(df['label'])
y = to_categorical(df['label_enc'])

# Tokenization: map words to integer indices and pad to a fixed length
tokenizer = Tokenizer(num_words=MAX_WORDS)
tokenizer.fit_on_texts(df['clean_text'])
X_seq = tokenizer.texts_to_sequences(df['clean_text'])
X = pad_sequences(X_seq, maxlen=MAX_LEN)

# Train/Test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Build LSTM model (input_length is omitted: recent Keras versions deprecate it)
model = Sequential([
    Embedding(input_dim=MAX_WORDS, output_dim=64),
    LSTM(64),
    Dense(y.shape[1], activation='softmax')
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train (with the six-row sample, the validation split holds out a single ticket)
model.fit(X_train, y_train, epochs=10, batch_size=4, validation_split=0.1)

# Evaluate
y_pred = np.argmax(model.predict(X_test), axis=1)
y_true = np.argmax(y_test, axis=1)
f1 = f1_score(y_true, y_pred, average='weighted')
print(f"F1 Score: {f1:.2f}")

# Save model and preprocessors; create the output folder if it is missing
os.makedirs('model', exist_ok=True)
model.save('model/lstm_model.h5')

with open('model/tokenizer.pkl', 'wb') as f:
    pickle.dump(tokenizer, f)

with open('model/label_encoder.pkl', 'wb') as f:
    pickle.dump(label_encoder, f)
```
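
The evaluation step reports a weighted F1 score. As a rough illustration of what `average='weighted'` means, the sketch below computes it by hand: one F1 per class, averaged with weights proportional to how often each class appears in the true labels. `weighted_f1` is a hypothetical helper written for this explanation, and the label lists are made up rather than taken from a real run.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_true - tp
        f1 = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 because n_true > 0
        score += f1 * n_true / total
    return score

# Made-up labels for three classes (0, 1, 2), two true examples each.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"{weighted_f1(y_true, y_pred):.2f}")  # prints 0.66
```

Here the per-class scores are 0.50, 0.80, and about 0.67; since every class has the same support, the weighted average is simply their mean.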

---

## 🖥️ Step 3: `app.py` (Streamlit App)

```python
# app.py

import streamlit as st
import tensorflow as tf
import pickle
import numpy as np
import re
import nltk
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.sequence import pad_sequences

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))

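# Load model and preprocessors once; st.cache_resource reuses them across reruns
@st.cache_resource
def load_artifacts():
    model = tf.keras.models.load_model('model/lstm_model.h5')
    with open('model/tokenizer.pkl', 'rb') as f:
        tokenizer = pickle.load(f)
    with open('model/label_encoder.pkl', 'rb') as f:
        label_encoder = pickle.load(f)
    return model, tokenizer, label_encoder

model, tokenizer, label_encoder = load_artifacts()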

# Clean text
def clean_text(text):
    text = re.sub(r"[^a-zA-Z]", " ", text.lower())
    words = text.split()
    return ' '.join([word for word in words if word not in stop_words])

# Streamlit UI
st.title("🛠️ Support Ticket Classifier")
st.write("Enter a customer support ticket message to classify it.")

user_input = st.text_area("Ticket Text:")

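if st.button("Classify"):
    if not user_input.strip():
        # Nothing to classify; avoid predicting on an all-padding sequence
        st.warning("Please enter some ticket text first.")
    else:
        cleaned = clean_text(user_input)
        seq = tokenizer.texts_to_sequences([cleaned])
        padded = pad_sequences(seq, maxlen=20)  # must match the length used in training
        pred = model.predict(padded)
        label = label_encoder.inverse_transform([np.argmax(pred)])
        st.success(f"Predicted Category: **{label[0]}**")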
```
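
The app shows only the top category. Since the model ends in a softmax layer, each prediction is already a probability per class, so a confidence readout is a small step. A minimal sketch, assuming the probabilities come from `model.predict(padded)[0]` and the class names from `label_encoder.classes_`; `top_k` is a hypothetical helper and the numbers below are invented for illustration.

```python
def top_k(probs, class_names, k=2):
    """Return the k most probable (label, probability) pairs, highest first."""
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return [(name, float(p)) for name, p in ranked[:k]]

# Invented softmax output for one ticket; in app.py these would come from
# model.predict(padded)[0] and label_encoder.classes_.
probs = [0.10, 0.70, 0.20]
class_names = ["Account", "Billing", "Technical Support"]

for name, p in top_k(probs, class_names):
    print(f"{name}: {p:.0%}")
```

In the Streamlit app, the same pairs could be shown with `st.write` under the predicted category. Keep in mind that softmax probabilities from a model trained on a handful of tickets are not calibrated confidence estimates.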

---

## 📦 `requirements.txt`

```txt
streamlit
tensorflow
nltk
numpy
pandas
scikit-learn

---

## ✅ To Run the App:

1. Install dependencies:

```bash
pip install -r requirements.txt
```

2. Train the model:

```bash
python train_model.py
```

3. Run the Streamlit app:

```bash
streamlit run app.py
```

---

Let me know if you'd like:

* Use **Word2Vec** or **TF-IDF** features instead of the Keras `Tokenizer` and learned embedding
* Deploy the Streamlit app (e.g., with Docker or Streamlit Cloud)
* Add confidence scores or multi-label support (the model is already multi-class)
