# ECE-5831 Final Project — How to Run (final-project.ipynb)

**Project:** Intelligent Ticket Triage for Customer Support Tickets  
**Author:** Nikhil Patil (+ team)  

This *runner notebook* walks through the project end to end:
1) set up the environment, 2) fetch the data (CFPB complaints), 3) run the demo app, and 4) optionally reproduce training.

> ✅ Tip: Once every cell runs cleanly, use **Kernel → Restart & Run All**, then **Save**. This stores fresh outputs in the notebook file, so GitHub displays it as an executed notebook.


## 0) Project files expected in this repo

- `app.py` (Streamlit demo)  
- `model/` (your fine-tuned adapter weights directory)  
- `EDA_and_Baseline.ipynb` (baseline experiments)  
- `Finetuning.ipynb` (LoRA/PEFT finetuning)  
- `requirements.txt` (recommended)  
- `.env` (local only; **do not commit**)  

If your adapter folder has a different name, update it in `app.py` (it uses `adapter_path = "./model"`).


In [None]:
# Sanity check: run this cell to confirm you're in the repo root.
import os, sys, platform
print("Python:", sys.version)
print("Platform:", platform.platform())
print("Working directory:", os.getcwd())
print("Repo files:", sorted(f for f in os.listdir('.') if not f.startswith('.'))[:50])


## 1) Create / activate a Python environment

Run one of the options below in a terminal **from your repo folder**.

### Option A — conda
```bash
conda create -n ece5831-final python=3.10 -y
conda activate ece5831-final
```

### Option B — venv
```bash
python -m venv .venv
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
```


In [None]:
# OPTIONAL (recommended): create a requirements.txt if you don't already have one.
# Safe to re-run: an existing file is never overwritten.
import os  # imported here so this cell also works on its own

req_path = "requirements.txt"
if not os.path.exists(req_path):
    req = "\n".join([
        "streamlit>=1.30",
        "torch",
        "transformers",
        "peft",
        "accelerate",
        "bitsandbytes",
        "python-dotenv",
        "datasets",
        "pandas",
        "numpy",
        "scikit-learn",
        "matplotlib",
        "seaborn",
    ]) + "\n"
    with open(req_path, "w", encoding="utf-8") as f:
        f.write(req)
    print("Created requirements.txt")
else:
    print("requirements.txt already exists — leaving it unchanged.")


## 2) Install dependencies

In your terminal (with your environment activated):

```bash
pip install -r requirements.txt
```

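If you plan to run on a CUDA GPU, install a PyTorch build that matches your CUDA version; a plain `pip install torch` may give you a CPU-only build on some platforms.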
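
To confirm which PyTorch build ended up in the environment, a quick check like the one below can help. It is a minimal sketch: the helper name `describe_torch_device` is made up for this notebook, and it only reports what PyTorch itself sees.

```python
import importlib.util


def describe_torch_device() -> str:
    """Return a one-line summary of the PyTorch install and its CUDA support."""
    # Avoid an ImportError if the dependencies are not installed yet.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed; run `pip install -r requirements.txt` first"

    import torch

    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version this build was compiled against.
        gpu = torch.cuda.get_device_name(0)
        return f"torch {torch.__version__}, CUDA {torch.version.cuda}, GPU: {gpu}"
    return f"torch {torch.__version__}, CUDA not available (CPU only)"


print(describe_torch_device())
```

If the summary says "CPU only" on a machine with an NVIDIA GPU, the installed wheel is probably a CPU build and PyTorch should be reinstalled with a CUDA-enabled one.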


## 3) Add your Hugging Face token safely (DO NOT commit it)

Create a file named `.env` in the repo root:

```text
HUGGING_FACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxx
```
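
To verify the token is visible without ever printing it, something like the following can be run from the repo root. This is a sketch under assumptions: `read_env_value` is a hypothetical helper that handles only plain `KEY=VALUE` lines, and it is not necessarily how `app.py` reads the token (the app may well use `python-dotenv` instead).

```python
import os
from pathlib import Path
from typing import Optional


def read_env_value(key: str, env_path: str = ".env") -> Optional[str]:
    """Return `key` from the process environment, else from a simple KEY=VALUE file."""
    # A variable that is already exported takes priority over the file.
    if os.environ.get(key):
        return os.environ[key]
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


token = read_env_value("HUGGING_FACE_TOKEN")
if token:
    # Report only presence and length, never the token itself.
    print("HUGGING_FACE_TOKEN found, length:", len(token))
else:
    print("HUGGING_FACE_TOKEN is not set")
```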

Also make sure `.env` is listed in `.gitignore` so the token never gets committed. The next cell creates or updates `.gitignore` for you.


In [None]:
# Create/append a minimal .gitignore to protect secrets and large artifacts.
import os  # imported here so this cell also works on its own

gi_path = ".gitignore"
lines = []
if os.path.exists(gi_path):
    with open(gi_path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()

add = [
    ".env",
    "__pycache__/",
    "*.pyc",
    ".venv/",
    "venv/",
    ".ipynb_checkpoints/",
    "*.pt",
    "*.bin",
    "*.safetensors",
    "*.ckpt",
]
changed = False
for item in add:
    if item not in lines:
        lines.append(item)
        changed = True

if changed:
    with open(gi_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    print("Updated .gitignore")
else:
    print(".gitignore already contains the recommended entries.")


## 4) (Optional) Fetch the CFPB dataset

You can download the CFPB consumer complaints dataset from the official site and place it in a `data/` folder.

- CFPB dataset page: https://www.consumerfinance.gov/data-research/consumer-complaints/

> If your baseline/finetuning notebooks already handle dataset loading, you can skip this step.
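
After downloading, a quick look at the file can confirm it is in place and readable. The sketch below makes several assumptions that may not match this project: the path `data/complaints.csv`, `Product` as the label column, and `Consumer complaint narrative` as the text column. Adjust them to whatever the baseline and finetuning notebooks actually use.

```python
import csv
from collections import Counter
from pathlib import Path

# Assumed names: adjust to match your download and your notebooks.
CSV_PATH = Path("data/complaints.csv")
LABEL_COLUMN = "Product"
TEXT_COLUMN = "Consumer complaint narrative"


def summarize_complaints(csv_path, max_rows=100_000):
    """Count labels among the first `max_rows` rows that have a non-empty narrative."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f)):
            if i >= max_rows:
                break
            # Many complaints have no published narrative; skip those.
            if (row.get(TEXT_COLUMN) or "").strip():
                counts[row.get(LABEL_COLUMN) or "(missing)"] += 1
    return counts


if CSV_PATH.is_file():
    for label, n in summarize_complaints(CSV_PATH).most_common(10):
        print(f"{n:>8}  {label}")
else:
    print(f"{CSV_PATH} not found; download the CSV and place it there first.")
```

The standard-library `csv` module streams the file row by row, so this works even when the full download is too large to load comfortably into memory.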


## 5) Run the Streamlit demo app

This step runs `app.py` from this repo, which loads a Llama-based sequence classification model together with your PEFT adapter.

In a terminal:

```bash
streamlit run app.py
```

Streamlit will print a local URL (usually `http://localhost:8501`). Open it in your browser.


In [None]:
# Convenience: show the exact command you should run (copy/paste in a terminal).
print("streamlit run app.py")


## 6) (Optional) Reproduce experiments

Open and run these notebooks as needed:

- `EDA_and_Baseline.ipynb` — EDA + baseline models
- `Finetuning.ipynb` — LoRA/PEFT finetuning and saving adapter weights

After finetuning, ensure your saved adapter directory is present at `./model/` (or update `app.py`).
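
A small check like the one below can catch a missing or incomplete adapter directory before launching the app. It is a sketch, not part of the project code: `check_adapter_dir` is a hypothetical helper, and it assumes the adapter was saved with PEFT's `save_pretrained`, which writes `adapter_config.json` plus an `adapter_model` weights file.

```python
from pathlib import Path


def check_adapter_dir(adapter_path="./model"):
    """Return a list of problems found in a PEFT adapter directory (empty if none)."""
    root = Path(adapter_path)
    if not root.is_dir():
        return [f"{root} is not a directory"]
    problems = []
    if not (root / "adapter_config.json").is_file():
        problems.append("missing adapter_config.json")
    # Recent PEFT versions write .safetensors; older ones write .bin.
    weight_files = ("adapter_model.safetensors", "adapter_model.bin")
    if not any((root / name).is_file() for name in weight_files):
        problems.append("missing adapter_model.safetensors / adapter_model.bin")
    return problems


problems = check_adapter_dir("./model")
if problems:
    print("Adapter issues: " + "; ".join(problems))
else:
    print("Adapter directory looks complete.")
```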


## 7) Final checklist before you push to GitHub

- [ ] `final-project.ipynb` runs top-to-bottom
- [ ] `.env` exists locally but is ignored by git
- [ ] `README.md` contains links (report, slides, video, dataset, demo)
- [ ] Big model weights are **not** committed (use a link or Git LFS if required)
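
For the last item, a scan for large files can act as a quick safety net. This is a rough sketch: `find_large_files` and the skipped directory names are assumptions, and it lists files on disk rather than what git actually tracks, so `git status` remains the authority. The 50 MB default mirrors the size at which GitHub starts warning about large files; files over 100 MB are rejected.

```python
import os

# Directories that should never be part of a commit anyway.
SKIP_DIRS = {".git", ".venv", "venv", "__pycache__", ".ipynb_checkpoints"}


def find_large_files(root=".", min_mb=50.0):
    """Return (path, size_mb) pairs for files of at least `min_mb`, largest first."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size_mb = os.path.getsize(path) / (1024 * 1024)
            except OSError:
                continue  # e.g. a broken symlink
            if size_mb >= min_mb:
                hits.append((path, size_mb))
    return sorted(hits, key=lambda item: item[1], reverse=True)


for path, size_mb in find_large_files("."):
    print(f"{size_mb:8.1f} MB  {path}")
```

Any file this prints should either match a pattern in `.gitignore` or be handled through a link or Git LFS.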
