# Phase 0: Project Setup and Foundation

This notebook provides setup instructions and validates the project environment for the Telco Churn Retention Platform.

## Objectives
- Verify Python environment and dependencies
- Validate project structure
- Test data access
- Confirm all tools are properly configured

## Prerequisites
- Python 3.11 or higher
- pip or poetry package manager


In [None]:
# Check Python version
import sys

print(f"Python version: {sys.version}")
print(f"Python executable: {sys.executable}")

# Verify Python 3.11+
assert sys.version_info >= (3, 11), f"Python 3.11+ required, found {sys.version_info.major}.{sys.version_info.minor}"

print("\nPython version check: PASSED")


Python version: 3.13.7 (tags/v3.13.7:bcee1c3, Aug 14 2025, 14:15:11) [MSC v.1944 64 bit (AMD64)]
Python executable: C:\Python313\python.exe

Python version check: PASSED


## 1. Verify Project Structure


In [None]:
import sys
from pathlib import Path

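# Add the project root to sys.path so that the `src` package is importable.
# This assumes the notebook is run from the `notebooks/` directory.
PROJECT_ROOT = Path().resolve().parent
if str(PROJECT_ROOT) not in sys.path:
    sys.path.append(str(PROJECT_ROOT))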

print(f"Project root: {PROJECT_ROOT}")

# Check key directories
required_dirs = [
    "src",
    "notebooks",
    "scripts",
    "tests",
    "configs",
    "docs",
    "data/raw",
    "data/processed",
]

print("\nChecking project structure...")
for dir_name in required_dirs:
    dir_path = PROJECT_ROOT / dir_name
    exists = dir_path.exists()
    status = "EXISTS" if exists else "MISSING"
    print(f"  {dir_name:20s}: {status}")

print("\nProject structure check: COMPLETE")


Project root: C:\Users\tiwar\OneDrive - The University of Melbourne\Desktop\New folder\telco-churn-retention

Checking project structure...
  src                 : EXISTS
  notebooks           : EXISTS
  scripts             : EXISTS
  tests               : EXISTS
  configs             : EXISTS
  docs                : EXISTS
  data/raw            : EXISTS
  data/processed      : EXISTS

Project structure check: COMPLETE
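
The cell above prints a status per directory and reports COMPLETE whether or not anything is missing. A minimal sketch of a stricter variant follows. The helper name `find_missing_dirs` is hypothetical and not part of the project; it returns the missing entries so that a later step can decide whether to stop.

```python
import tempfile
from pathlib import Path


def find_missing_dirs(root: Path, required: list[str]) -> list[str]:
    """Return the entries of `required` that are not directories under `root`."""
    return [name for name in required if not (root / name).is_dir()]


# Demonstrate on a throwaway tree rather than the real project.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "src").mkdir()
    (root / "data" / "raw").mkdir(parents=True)
    missing = find_missing_dirs(root, ["src", "tests", "data/raw"])
    print(f"Missing: {missing}")  # prints Missing: ['tests']
```

In this notebook, `assert not find_missing_dirs(PROJECT_ROOT, required_dirs)` would turn a MISSING line into a hard failure.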


## 2. Verify Dependencies


In [None]:
# Check core dependencies
import importlib

# Keys are import names, which can differ from the pip distribution name
required_packages = {
    "pandas": "Data manipulation",
    "numpy": "Numerical computing",
    "sklearn": "Machine learning (scikit-learn)",  # Import name is sklearn, not scikit-learn
    "xgboost": "Gradient boosting",
    "lightgbm": "Gradient boosting",
    "optuna": "Hyperparameter tuning",
    "mlflow": "Experiment tracking",
    "shap": "Model explainability",
    "matplotlib": "Visualization",
    "seaborn": "Statistical visualization",
}

print("Checking core dependencies...")
print("=" * 80)

missing_packages = []
for package, description in required_packages.items():
    try:
        # importlib.import_module is the recommended programmatic import
        importlib.import_module(package)
        print(f"  {package:20s}: INSTALLED - {description}")
    except ImportError:
        print(f"  {package:20s}: MISSING - {description}")
        missing_packages.append(package)

if missing_packages:
    print(f"\nMissing packages: {', '.join(missing_packages)}")
    print("Install with: pip install -e .")
else:
    print("\nAll core dependencies installed!")


Checking core dependencies...
  pandas              : INSTALLED - Data manipulation
  numpy               : INSTALLED - Numerical computing
  scikit-learn        : MISSING - Machine learning
  xgboost             : INSTALLED - Gradient boosting
  lightgbm            : INSTALLED - Gradient boosting
  optuna              : INSTALLED - Hyperparameter tuning
  mlflow              : INSTALLED - Experiment tracking
  shap                : INSTALLED - Model explainability
  matplotlib          : INSTALLED - Visualization
  seaborn             : INSTALLED - Statistical visualization

Missing packages: scikit-learn
Install with: pip install -e .
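
The recorded output lists `scikit-learn` as missing, which is what an import by distribution name would produce, because the package is imported as `sklearn`. The sketch below keeps the two names apart: it uses `importlib.util.find_spec` to test importability without importing, and `importlib.metadata.version` to look up the installed version. The `DIST_TO_IMPORT` mapping and the `check_distribution` helper are illustrative assumptions, not part of the project.

```python
from importlib import metadata
from importlib.util import find_spec

# Distribution name (what pip installs) -> import name (what Python imports).
# Illustrative subset; only scikit-learn differs between the two.
DIST_TO_IMPORT = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
}


def check_distribution(dist_name: str, import_name: str) -> str:
    """Report whether a top-level package is importable and, if known, its version."""
    if find_spec(import_name) is None:
        return f"{dist_name}: MISSING"
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no matching distribution metadata was found.
        version = "unknown version"
    return f"{dist_name}: INSTALLED ({version})"


for dist_name, import_name in DIST_TO_IMPORT.items():
    print(check_distribution(dist_name, import_name))
```

Because `find_spec` does not execute the package, this check is also faster than importing heavy libraries just to confirm that they exist.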


## 3. Test Data Access


In [None]:
# Check if data file exists
raw_data_path = PROJECT_ROOT / "data" / "raw" / "telco_data_28_11_2025.csv"

print("Checking data access...")
print("=" * 80)

if raw_data_path.exists():
    print(f"Raw data file: FOUND at {raw_data_path}")

    # Try to load a sample
    try:
        import pandas as pd
        df_sample = pd.read_csv(raw_data_path, nrows=5)
        print("Data file is readable")
        print(f"Sample columns: {list(df_sample.columns)[:5]}...")
        print(f"Sample shape (first 5 rows): {df_sample.shape}")
    except Exception as e:
        print(f"Error reading data file: {e}")
else:
    print(f"Raw data file: NOT FOUND at {raw_data_path}")
    print("Please download the Telco Customer Churn dataset and place it in data/raw/")

# Check processed data
processed_dir = PROJECT_ROOT / "data" / "processed"
if processed_dir.exists():
    timestamp_dirs = [d for d in processed_dir.iterdir() if d.is_dir()]
    if timestamp_dirs:
        latest = max(timestamp_dirs, key=lambda p: p.name)
        print(f"Processed data: FOUND (latest: {latest.name})")
    else:
        print("Processed data: No processed snapshots found")
else:
    print("Processed data directory: Will be created when running Phase 1")


Checking data access...
Raw data file: FOUND at C:\Users\tiwar\OneDrive - The University of Melbourne\Desktop\New folder\telco-churn-retention\data\raw\telco_data_28_11_2025.csv
Data file is readable
Sample columns: ['customerID', 'gender', 'SeniorCitizen', 'Partner', 'Dependents']...
Sample shape (first 5 rows): (5, 21)
Processed data: FOUND (latest: 20251128T132625Z)


## 4. Test Project Imports


In [None]:
# Test importing project modules
import importlib

print("Testing project module imports...")
print("=" * 80)

modules_to_test = [
    "src.data.ingestion",
    "src.data.validation",
    "src.data.eda",
    "src.features.pipeline",
    "src.features.transformers",
    "src.features.store",
    "src.models.baseline",
    "src.models.trainer",
    "src.models.evaluation",
    "src.models.explainability",
]

failed_imports = []
for module_name in modules_to_test:
    try:
        # import_module handles dotted names and returns the submodule itself
        importlib.import_module(module_name)
        print(f"  {module_name:30s}: OK")
    except ImportError as e:
        # Truncate long messages to keep the report on one line
        print(f"  {module_name:30s}: FAILED - {str(e)[:50]}")
        failed_imports.append(module_name)

if failed_imports:
    print(f"\nFailed imports: {len(failed_imports)}")
    print("Make sure you've installed the project: pip install -e .")
else:
    print("\nAll project modules import successfully!")


Testing project module imports...
  src.data.ingestion            : OK
  src.data.validation           : OK
  src.data.eda                  : OK
  src.features.pipeline         : OK
  src.features.transformers     : OK
  src.features.store            : OK
  src.models.baseline           : OK
  src.models.trainer            : OK
  src.models.evaluation         : OK
  src.models.explainability     : OK

All project modules import successfully!


## 5. Verify Development Tools


In [None]:
# Check development tools
import subprocess

dev_tools = {
    "ruff": "Linting",
    "black": "Code formatting",
    "mypy": "Type checking",
    "pytest": "Testing",
}

print("Checking development tools...")
print("=" * 80)

missing_tools = []
for tool, description in dev_tools.items():
    try:
        result = subprocess.run(
            [tool, "--version"],
            capture_output=True,
            text=True,
            timeout=5,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # The executable is not on PATH, or did not answer within the timeout
        print(f"  {tool:20s}: NOT FOUND - {description}")
        missing_tools.append(tool)
        continue

    if result.returncode == 0:
        # Report only the first line of the version output
        version = result.stdout.strip().split("\n")[0]
        print(f"  {tool:20s}: INSTALLED - {description} ({version})")
    else:
        print(f"  {tool:20s}: NOT FOUND - {description}")
        missing_tools.append(tool)

if missing_tools:
    print(f"\nMissing tools: {', '.join(missing_tools)}")
    print('Install with: pip install -e ".[dev]"')
else:
    print("\nAll development tools available!")


Checking development tools...
  ruff                : INSTALLED - Linting (ruff 0.14.6)
  black               : INSTALLED - Code formatting (black, 25.11.0 (compiled: yes))
  mypy                : INSTALLED - Type checking (mypy 1.18.2 (compiled: yes))
  pytest              : INSTALLED - Testing (pytest 8.4.1)

All development tools available!


## 6. Setup Summary

### Next Steps:
1. If any dependencies are missing, install them:
   ```bash
   pip install -e .
   pip install -e ".[dev]"  # For development tools
   pip install -e ".[eda]"  # For EDA tools (jupyter, etc.)
   ```

2. Download the Telco Customer Churn dataset to `data/raw/telco_data_28_11_2025.csv`

3. Run the Phase 1 notebook to begin data exploration:
   - `notebooks/phase1_eda.ipynb`

4. Execute the Phase 1 script to generate processed data:
   ```bash
   python scripts/run_phase1_data_intake.py
   ```

### Project Structure:
- `src/`: Core application code
- `notebooks/`: Exploratory analysis notebooks
- `scripts/`: CLI utilities for pipeline execution
- `tests/`: Unit and integration tests
- `configs/`: Configuration files
- `docs/`: Documentation
- `data/`: Raw (`data/raw`) and processed (`data/processed`) datasets

### Getting Help:
- See `README.md` for project overview
- See `docs/roadmap.md` for phase details
- See `docs/architecture.md` for system design
