# OCR Pipeline Runner and App Launcher

This notebook lets you:

1. **Run the training pipeline** to train the Decision Tree and Random Forest models on EMNIST letters.
2. **Save trained models as artifacts** under `data/processed`.
3. **Launch the Streamlit app** that uses those saved models for interactive exploration.

> Tip: Run the cells from top to bottom. Install the dependencies listed in `requirements.txt` first.
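Before running the cells, it can help to confirm that the dependencies are importable in the kernel's environment. The following is a minimal sketch; the package list is an assumption made for illustration and is not taken from this project's requirements file, so adjust it to match what you actually install.

```python
from importlib.util import find_spec

# Hypothetical list of top-level import names; replace it with the
# packages that the project's requirements file really installs.
ASSUMED_PACKAGES = ["numpy", "pandas", "sklearn", "streamlit"]


def missing_packages(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(ASSUMED_PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All assumed packages are importable.")
```

`find_spec` only locates a module without importing it, so the check is quick even for large libraries.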


In [17]:
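# Put the project's src directory on sys.path so that `ocr_project` can be imported.
# This assumes the notebook's working directory is the project root. If it is not,
# the check below fails silently and sys.path is left unchanged, although both prints still run.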
import sys
from pathlib import Path

PROJECT_ROOT = Path.cwd().resolve()
if (PROJECT_ROOT / "src").exists() and str(PROJECT_ROOT / "src") not in sys.path:
    sys.path.insert(0, str(PROJECT_ROOT / "src"))

print("Project root:", PROJECT_ROOT)
print("Python path updated with:", PROJECT_ROOT / "src")


Project root: G:\University\Third Year Term 1\AI\Letter_OCR_Project\notebooks
Python path updated with: G:\University\Third Year Term 1\AI\Letter_OCR_Project\notebooks\src
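The output above shows the working directory resolving to `notebooks`, where no `src` folder is expected. One way to make the setup independent of the working directory is to walk up the parent directories until a `src` folder is found. This is a sketch under that assumption; `find_project_root` is a hypothetical helper, not part of `ocr_project`.

```python
import sys
from pathlib import Path


def find_project_root(start, marker="src"):
    """Return the nearest directory at or above `start` that contains `marker`."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    return None


root = find_project_root(Path.cwd())
if root is None:
    print("No 'src' directory found at or above", Path.cwd())
else:
    src = str(root / "src")
    if src not in sys.path:
        sys.path.insert(0, src)
    print("Project root:", root)
```

With this version the same cell works whether the notebook is started from the project root or from a subdirectory such as `notebooks`.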


In [18]:
# Optional: verify config paths and that data directories exist
from ocr_project import config

print("EMNIST train:", config.EMNIST_LETTERS_TRAIN)
print("EMNIST test:", config.EMNIST_LETTERS_TEST)
print("Data dir:", config.DATA_DIR)
print("Processed dir:", config.PROCESSED_DATA_DIR)

# Ensure the key directories exist
config.ensure_directories()


EMNIST train: G:\University\Third Year Term 1\AI\Letter_OCR_Project\data\raw\emnist-letters-train.csv
EMNIST test: G:\University\Third Year Term 1\AI\Letter_OCR_Project\data\raw\emnist-letters-test.csv
Data dir: G:\University\Third Year Term 1\AI\Letter_OCR_Project\data
Processed dir: G:\University\Third Year Term 1\AI\Letter_OCR_Project\data\processed
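Printing the paths does not show whether the EMNIST CSV files are actually present. A small check like the one below can catch a missing download before training starts. It is only a sketch: `missing_files` is a hypothetical helper, and the relative paths are stand-ins for `config.EMNIST_LETTERS_TRAIN` and `config.EMNIST_LETTERS_TEST`.

```python
from pathlib import Path


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [Path(p) for p in paths if not Path(p).is_file()]


# Stand-in paths for illustration; in the notebook you would pass
# config.EMNIST_LETTERS_TRAIN and config.EMNIST_LETTERS_TEST instead.
expected = [
    Path("data/raw/emnist-letters-train.csv"),
    Path("data/raw/emnist-letters-test.csv"),
]

for path in missing_files(expected):
    print("Missing input file:", path)
```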


In [19]:
# Run the full OCR pipeline: load data, train models, evaluate, and save artifacts
from ocr_project.pipeline import run_default_pipeline

results = run_default_pipeline()

print("Training complete. Model results:")
for name, result in results.items():
    print(f"- {name}: accuracy = {result.accuracy:.4f}")


Training complete. Model results:
- decision_tree: accuracy = 0.5574
- random_forest: accuracy = 0.8354
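To put these numbers in context, the models can be ranked and compared with random guessing. The sketch below assumes the 26-class EMNIST Letters split, so chance accuracy is 1/26. `ModelResult` is a hypothetical stand-in for the objects returned by `run_default_pipeline`, of which only the `accuracy` attribute is known from the cell above.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    """Hypothetical stand-in exposing only the accuracy attribute."""
    accuracy: float


# Accuracies copied from the output above.
results = {
    "decision_tree": ModelResult(accuracy=0.5574),
    "random_forest": ModelResult(accuracy=0.8354),
}

CHANCE = 1 / 26  # assumes 26 balanced letter classes

# Sort from best to worst and report each model relative to chance.
ranked = sorted(results.items(), key=lambda item: item[1].accuracy, reverse=True)
for name, result in ranked:
    ratio = result.accuracy / CHANCE
    print(f"{name}: accuracy = {result.accuracy:.4f} ({ratio:.1f}x chance)")

best_name = ranked[0][0]
print("Best model:", best_name)
```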


In [20]:
# Inspect saved artifacts in data/processed
from ocr_project import config

processed_dir = config.PROCESSED_DATA_DIR
print("Artifacts saved under:", processed_dir)

for p in sorted(processed_dir.glob("*.pkl")):
    print("-", p.name)


Artifacts saved under: G:\University\Third Year Term 1\AI\Letter_OCR_Project\data\processed
- decision_tree.pkl
- random_forest.pkl
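The saved artifacts can be loaded back for a quick check outside the app. The sketch below assumes the `.pkl` files were written with Python's standard `pickle` module; if the pipeline uses another serializer such as `joblib`, load them with that library instead. `load_artifacts` is a hypothetical helper. Unpickling can execute arbitrary code, so only load files you created yourself.

```python
import pickle
from pathlib import Path


def load_artifacts(directory):
    """Load every *.pkl file in `directory` into a dict keyed by file stem."""
    models = {}
    for path in sorted(Path(directory).glob("*.pkl")):
        with path.open("rb") as handle:
            models[path.stem] = pickle.load(handle)
    return models


# Example use in the notebook (assumes standard-pickle artifacts):
# models = load_artifacts(config.PROCESSED_DATA_DIR)
# print(sorted(models))
```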


## Launch the Streamlit app

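The following cell launches the Streamlit app defined in `app.py`. The file name is resolved relative to the notebook's working directory.

- **In Jupyter / VS Code / Cursor:** the cell starts Streamlit as a child process and keeps running until Streamlit exits.
- Open the URL it prints (typically `http://localhost:8501`) in your browser.

When you are done, stop the app by interrupting the kernel, or with **Ctrl+C** if you started it from a terminal.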


In [21]:
import sys
import subprocess

# Launch the Streamlit app from this notebook
cmd = [sys.executable, "-m", "streamlit", "run", "app.py"]
print("Running:", " ".join(cmd))
print("If nothing happens, open http://localhost:8501 manually in your browser.")

# This call blocks until Streamlit exits; interrupt the kernel (or press Ctrl+C in a terminal) to stop it.
# A non-zero returncode in the result means Streamlit exited with an error.
subprocess.run(cmd, check=False)


Running: C:\Users\mohal\AppData\Local\Programs\Python\Python313\python.exe -m streamlit run app.py
If nothing happens, open http://localhost:8501 manually in your browser.


CompletedProcess(args=['C:\\Users\\mohal\\AppData\\Local\\Programs\\Python\\Python313\\python.exe', '-m', 'streamlit', 'run', 'app.py'], returncode=1)
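Because `subprocess.run` blocks the cell, a non-blocking launch can be more convenient inside a notebook. The following is a sketch using `subprocess.Popen`; `start_background` and `stop` are hypothetical helpers, and the optional `cwd` argument is there so the command can run from the directory that contains `app.py`.

```python
import subprocess
import sys


def start_background(cmd, cwd=None):
    """Start `cmd` without blocking and return the process handle."""
    return subprocess.Popen(cmd, cwd=cwd)


def stop(proc, timeout=10):
    """Ask the process to exit, kill it if it does not, and return its exit code."""
    if proc.poll() is None:
        proc.terminate()
        try:
            proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
    return proc.returncode


# Example use in the notebook:
# app = start_background([sys.executable, "-m", "streamlit", "run", "app.py"])
# ... explore the app in the browser ...
# stop(app)
```

Calling `app.poll()` right after starting returns `None` while the app is still running and an exit code if it has already failed, which surfaces startup errors such as the `returncode=1` shown above.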