# Cure PC Audit Script - Test Notebook

This notebook contains the refactored production code for testing before deployment.

## Features:
- Comprehensive logging
- Parallel processing (10x faster)
- Robust error handling
- Type hints and documentation

## Instructions:
1. Run each cell sequentially
2. Check the logs in the `logs/` directory after execution
3. Verify the results in your database
4. Compare with original notebook output

## 1. Imports and Configuration

In [None]:
"""Import all required libraries."""

import logging
import os
import socket
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from contextlib import contextmanager
from dataclasses import dataclass
from datetime import datetime
from functools import wraps
from pathlib import Path
from typing import Dict, List, Optional, Tuple

import pandas as pd
from natsort import index_natsorted, order_by_index
from sqlalchemy import create_engine
from sqlalchemy.engine import Engine

print("✅ All imports successful")

## 2. Configuration Class

In [None]:
@dataclass
class Config:
    """Configuration for Cure PC audit script."""
    
    # Database settings
    server: str
    database: str
    table_name: str = "Equipment PCs"
    
    # CPC settings
    cpc_source: str = r"C$\Program Files\CPC Client"  # administrative share, appended to \\<host>\
    cpc_client: str = "CPCClient.exe"
    
    # Performance settings
    max_workers: int = 10
    network_timeout: float = 2.0
    
    # Logging settings
    log_dir: Optional[Path] = None
    log_level: str = "INFO"
    
    def __post_init__(self):
        """Set default log directory and ensure it exists."""
        if self.log_dir is None:
            # Set to project root / logs
            base = Path().resolve()
            while not (base / "notebooks").exists() and base.parent != base:
                base = base.parent
            self.log_dir = base / "logs"
        self.log_dir.mkdir(parents=True, exist_ok=True)

print("✅ Configuration class defined")
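
`Config.__post_init__` walks upward from the working directory until it finds a folder that contains `notebooks`. If no ancestor has one, the loop stops at the filesystem root and the logs land in `<root>/logs`. The following is a minimal sketch of the same walk that falls back to the starting directory instead. `find_project_root` and its `marker` parameter are hypothetical names used for illustration, not part of the notebook.

```python
import tempfile
from pathlib import Path


def find_project_root(start: Path, marker: str = "notebooks") -> Path:
    """Return the nearest ancestor of `start` (or `start` itself) containing `marker`.

    Falls back to `start` rather than the filesystem root when nothing matches.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    return start


# Demonstration in a throwaway tree: <tmp>/notebooks/audit
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    nested = root / "notebooks" / "audit"
    nested.mkdir(parents=True)
    found_matches_root = find_project_root(nested) == root

print(found_matches_root)
```

Passing an explicit `log_dir` to `Config` skips the walk entirely, which may be the safer choice under Task Scheduler, where the working directory is not always the project folder.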

## 3. Logging Setup

In [None]:
def setup_logging(config: Config) -> logging.Logger:
    """
    Configure comprehensive logging with file and console handlers.
    
    Args:
        config: Configuration object with logging settings
        
    Returns:
        Configured logger instance
    """
    from logging.handlers import RotatingFileHandler
    
    # Create logger
    logger = logging.getLogger("cure_pc_audit")
    logger.setLevel(getattr(logging, config.log_level))
    
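    # Remove and close handlers left over from earlier runs of this cell,
    # so the log file is not held open by stale handlers
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()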
    
    # Create formatters
    detailed_formatter = logging.Formatter(
        "%(asctime)s - %(name)s - %(levelname)s - %(funcName)s:%(lineno)d - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"
    )
    simple_formatter = logging.Formatter(
        "%(asctime)s - %(levelname)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S"
    )
    
    # File handler (rotating, keeps last 5 files of 10MB each)
    log_file = config.log_dir / f"cure_pc_audit_{datetime.now():%Y%m%d}.log"
    file_handler = RotatingFileHandler(
        log_file,
        maxBytes=10 * 1024 * 1024,  # 10MB
        backupCount=5,
        encoding="utf-8"
    )
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(detailed_formatter)
    logger.addHandler(file_handler)
    
    # Console handler (less verbose)
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.INFO)
    console_handler.setFormatter(simple_formatter)
    logger.addHandler(console_handler)
    
    return logger

print("✅ Logging setup function defined")
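
One detail of `setup_logging` is easy to miss: the logger's own level is checked before any handler sees a record. With `log_level="INFO"`, the file handler's `DEBUG` setting therefore has no effect until the logger itself is set to `DEBUG`. The sketch below illustrates this with in-memory handlers. `ListHandler` and the `level_demo` logger name are hypothetical stand-ins for the notebook's file and console handlers.

```python
import logging
from typing import List


class ListHandler(logging.Handler):
    """Collects messages in memory so they can be inspected."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.messages: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


demo_logger = logging.getLogger("level_demo")
demo_logger.propagate = False
demo_logger.handlers.clear()

verbose = ListHandler(logging.DEBUG)  # plays the role of the file handler
quiet = ListHandler(logging.INFO)     # plays the role of the console handler
demo_logger.addHandler(verbose)
demo_logger.addHandler(quiet)

demo_logger.setLevel(logging.INFO)
demo_logger.debug("dropped by the logger before any handler runs")
demo_logger.info("reaches both handlers")

demo_logger.setLevel(logging.DEBUG)
demo_logger.debug("reaches only the DEBUG-level handler")

print(verbose.messages)
print(quiet.messages)
```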

## 4. Database Functions

In [None]:
@contextmanager
def get_db_engine(config: Config) -> Engine:
    """
    Create and manage database engine with proper cleanup.
    
    Args:
        config: Configuration object with database settings
        
    Yields:
        SQLAlchemy engine instance
    """
    logger = logging.getLogger("cure_pc_audit")
    
    conn_str = (
        f"mssql+pyodbc://{config.server}/{config.database}?"
        f"trusted_connection=yes&"
        f"driver=ODBC+Driver+17+for+SQL+Server"
    )
    
    logger.info(f"Connecting to database: {config.server}/{config.database}")
    engine = create_engine(conn_str, pool_pre_ping=True)
    
    # Raw SQL strings must be wrapped in text() on SQLAlchemy 2.x
    from sqlalchemy import text

    try:
        # Test connection
        with engine.connect() as conn:
            conn.execute(text("SELECT 1"))
        logger.info("Database connection successful")
        yield engine
    except Exception as e:
        logger.error(f"Database connection failed: {e}")
        raise
    finally:
        engine.dispose()
        logger.debug("Database connection closed")


def read_equipment_pcs(engine: Engine, table_name: str) -> pd.DataFrame:
    """
    Read equipment PCs from database.
    
    Args:
        engine: SQLAlchemy engine
        table_name: Name of the table to query
        
    Returns:
        DataFrame containing equipment PC data
    """
    logger = logging.getLogger("cure_pc_audit")
    
    query = f"SELECT * FROM [{table_name}]"
    logger.debug(f"Executing query: {query}")
    
    df = pd.read_sql(query, engine)
    logger.info(f"Retrieved {len(df)} equipment PCs from database")
    
    return df


def write_equipment_pcs(
    df: pd.DataFrame,
    engine: Engine,
    table_name: str
) -> None:
    """
    Write equipment PCs to database.
    
    Args:
        df: DataFrame containing equipment PC data
        engine: SQLAlchemy engine
        table_name: Name of the table to write to
    """
    logger = logging.getLogger("cure_pc_audit")
    
    logger.info(f"Writing {len(df)} equipment PCs to database...")
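    # SQL Server allows at most 2100 parameters per statement and 1000 rows per
    # VALUES clause, so size each multi-row INSERT to stay under both limits.
    chunksize = max(1, min(1000, 2000 // max(1, len(df.columns))))
    df.to_sql(
        name=table_name,
        con=engine,
        if_exists="replace",  # Drops and recreates the table
        index=False,
        method="multi",  # One multi-row INSERT per chunk
        chunksize=chunksize
    )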
    logger.info("Database write successful")

print("✅ Database functions defined")
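
Because `get_db_engine` wraps its `yield` in `try/except/finally`, an exception raised inside the caller's `with` block is also routed through the `except` branch (and logged as a connection failure) before the engine is disposed. The sketch below reproduces that control flow with a stand-in object. `FakeEngine` and `managed_engine` are hypothetical names, and no database is involved.

```python
from contextlib import contextmanager
from typing import Iterator, List

events: List[str] = []


class FakeEngine:
    """Hypothetical stand-in for a SQLAlchemy engine; it only records dispose()."""

    def dispose(self) -> None:
        events.append("disposed")


@contextmanager
def managed_engine() -> Iterator[FakeEngine]:
    engine = FakeEngine()
    try:
        yield engine
    except Exception as exc:
        # Exceptions from the caller's `with` body arrive here as well
        events.append(f"logged: {exc}")
        raise
    finally:
        engine.dispose()


try:
    with managed_engine() as engine:
        raise ValueError("query failed")
except ValueError:
    events.append("re-raised to caller")

print(events)
```

The cleanup always runs, but the log message may be misleading when the failure came from a query rather than from the connection test.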

## 5. Network Functions

In [None]:
def get_ip(pcid: str, timeout: float = 2.0) -> Optional[str]:
    """
    Resolve PCID to IP address using DNS lookup.
    
    Args:
        pcid: PC identifier to resolve
        timeout: DNS lookup timeout in seconds
        
    Returns:
        IP address string or None if lookup fails
    """
    logger = logging.getLogger("cure_pc_audit")
    
    if not pcid or pd.isna(pcid):
        return None
    
    try:
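        # NOTE: setdefaulttimeout() is process-wide and applies only to new
        # socket objects; gethostbyname() is bounded by the OS resolver instead.
        socket.setdefaulttimeout(timeout)
        ip = socket.gethostbyname(str(pcid))
        logger.debug(f"Resolved {pcid} → {ip}")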
        return ip
    except socket.gaierror:
        logger.debug(f"DNS lookup failed for PCID: {pcid}")
        return None
    except socket.timeout:
        logger.warning(f"DNS lookup timeout for PCID: {pcid}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error resolving {pcid}: {e}")
        return None


def get_pcid(ip_address: str, timeout: float = 2.0) -> Optional[str]:
    """
    Resolve IP address to PCID using reverse DNS lookup.
    
    Args:
        ip_address: IP address to resolve
        timeout: DNS lookup timeout in seconds
        
    Returns:
        PCID string or None if lookup fails
    """
    logger = logging.getLogger("cure_pc_audit")
    
    if not ip_address or pd.isna(ip_address):
        return None
    
    try:
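        # NOTE: setdefaulttimeout() is process-wide and applies only to new
        # socket objects; gethostbyaddr() is bounded by the OS resolver instead.
        socket.setdefaulttimeout(timeout)
        hostname = socket.gethostbyaddr(str(ip_address))[0]
        pcid = hostname.split(".")[0].upper()
        logger.debug(f"Resolved {ip_address} → {pcid}")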
        return pcid
    except socket.herror:
        logger.debug(f"Reverse DNS lookup failed for IP: {ip_address}")
        return None
    except socket.timeout:
        logger.warning(f"Reverse DNS lookup timeout for IP: {ip_address}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error resolving {ip_address}: {e}")
        return None


def get_cpc_version(
    ip: str,
    pcid: str,
    source: str,
    client: str,
    timeout: float = 5.0
) -> Optional[str]:
    """
    Get CPC client version by checking file modification date on network share.
    
    Args:
        ip: IP address of target PC
        pcid: PC identifier
        source: Source directory path on target PC
        client: Client executable filename
        timeout: Operation timeout in seconds (accepted but not currently enforced)
        
    Returns:
        Version string (YYYY-MM-DD) or None if unavailable
    """
    logger = logging.getLogger("cure_pc_audit")
    
    if not ip and not pcid:
        return None
    
    try:
        # Try the IP address first, then fall back to the PCID
        for host in (ip, pcid):
            if not host or pd.isna(host):
                continue
            # Build the full UNC string (\\host\share\...) before handing it to
            # pathlib; joining onto a bare "\\host" is parsed differently
            # across Python versions.
            unc_path = Path(f"\\\\{host}\\{source}") / client
            if unc_path.exists():
                mtime = unc_path.stat().st_mtime
                version = datetime.fromtimestamp(mtime).strftime("%Y-%m-%d")
                logger.debug(f"Got CPC version from {host}: {version}")
                return version
        
        logger.debug(f"CPC client not found for {pcid}/{ip}")
        return None
        
    except PermissionError:
        logger.debug(f"Permission denied accessing CPC client on {pcid}/{ip}")
        return None
    except OSError as e:
        logger.debug(f"OS error accessing CPC client on {pcid}/{ip}: {e}")
        return None
    except Exception as e:
        logger.error(f"Unexpected error getting CPC version for {pcid}/{ip}: {e}")
        return None

print("✅ Network functions defined")
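
`socket.setdefaulttimeout()` does not limit how long `gethostbyname()` or `gethostbyaddr()` can block, so the `timeout` arguments above are best-effort. One possible way to enforce a real upper bound is to run the lookup in a helper thread and stop waiting after `timeout` seconds. The sketch below shows that idea. `resolve_with_timeout` and its `resolver` parameter are hypothetical, and the fake resolvers exist only so the example runs without a network.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional


def resolve_with_timeout(
    hostname: str,
    timeout: float = 2.0,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the resolved address, or None if the lookup fails or takes too long."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(resolver, hostname)
    try:
        return future.result(timeout=timeout)
    except FutureTimeout:
        return None  # gave up waiting; the lookup itself keeps running
    except OSError:
        return None  # covers socket.gaierror and socket.herror
    finally:
        pool.shutdown(wait=False)  # do not block on a stuck lookup


def slow_resolver(name: str) -> str:
    time.sleep(0.5)  # simulates an unresponsive DNS server
    return "192.0.2.1"  # documentation-only address


def fast_resolver(name: str) -> str:
    return "192.0.2.1"


timed_out = resolve_with_timeout("pc-001", timeout=0.05, resolver=slow_resolver)
resolved = resolve_with_timeout("pc-001", timeout=1.0, resolver=fast_resolver)
print(timed_out, resolved)
```

The abandoned lookup still occupies a thread until the resolver returns, and creating a pool per call adds overhead, so this trades resources for a predictable wait.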

## 6. Parallel Processing

In [None]:
def parallel_map(
    func: callable,
    items: List,
    max_workers: int = 10,
    desc: str = "Processing"
) -> List:
    """
    Apply function to items in parallel using thread pool.
    
    Args:
        func: Function to apply to each item
        items: List of items to process
        max_workers: Maximum number of worker threads
        desc: Description for logging
        
    Returns:
        List of results in same order as input items
    """
    logger = logging.getLogger("cure_pc_audit")
    
    if not items:
        return []
    
    results = [None] * len(items)
    total = len(items)
    completed = 0
    
    logger.info(f"{desc}: Processing {total} items with {max_workers} workers...")
    
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(func, item): idx
            for idx, item in enumerate(items)
        }
        
        for future in as_completed(future_to_idx):
            idx = future_to_idx[future]
            completed += 1
            
            try:
                results[idx] = future.result()
            except Exception as e:
                logger.error(f"Error processing item {idx}: {e}")
                results[idx] = None
            
            # Log progress every 10% or 10 items, whichever is more frequent
            log_interval = max(1, min(10, total // 10))
            if completed % log_interval == 0 or completed == total:
                progress = (completed / total) * 100
                logger.info(f"{desc}: {completed}/{total} ({progress:.1f}%)")
    
    logger.info(f"{desc}: Complete")
    return results

print("✅ Parallel processing function defined")
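
The key property of `parallel_map` is that results come back in input order even though tasks finish in any order, and that a failing item yields `None` instead of aborting the run. The condensed sketch below shows that behavior without logging. `ordered_parallel_map` and `fake_lookup` are hypothetical names written for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List, Optional


def ordered_parallel_map(
    func: Callable[[int], int], items: List[int], max_workers: int = 4
) -> List[Optional[int]]:
    """Run func over items in threads; slot i always holds the result for items[i]."""
    results: List[Optional[int]] = [None] * len(items)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_idx = {
            executor.submit(func, item): idx for idx, item in enumerate(items)
        }
        for future in as_completed(future_to_idx):
            try:
                results[future_to_idx[future]] = future.result()
            except Exception:
                pass  # a failed item keeps its None placeholder
    return results


def fake_lookup(n: int) -> int:
    """Stand-in for a network call: slower for larger n, fails for negative n."""
    if n < 0:
        raise ValueError("lookup failed")
    time.sleep(n / 100)
    return n * 2


doubled = ordered_parallel_map(fake_lookup, [3, 1, -1, 2])
print(doubled)  # [6, 2, None, 4]
```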

## 7. Data Enrichment

In [None]:
def enrich_equipment_pcs(
    df: pd.DataFrame,
    config: Config
) -> pd.DataFrame:
    """
    Enrich equipment PC data with current network information.
    
    Args:
        df: DataFrame containing base equipment PC data
        config: Configuration object
        
    Returns:
        Enriched DataFrame with updated IP addresses, PCIDs, and CPC versions
    """
    logger = logging.getLogger("cure_pc_audit")
    
    logger.info("Starting equipment PC enrichment...")
    start_time = datetime.now()
    
    # Work on a copy so the caller's DataFrame is left unchanged
    df = df.copy()
    
    # Get current IPs from PCIDs (parallel)
    logger.info("Resolving IP addresses from PCIDs...")
    new_ips = parallel_map(
        lambda pcid: get_ip(pcid, config.network_timeout),
        df["PCID"].tolist(),
        max_workers=config.max_workers,
        desc="IP resolution"
    )
    df["New_IP_Address"] = new_ips
    
    # Update IP addresses (keep old if new lookup failed)
    df["IP_Address"] = df.apply(
        lambda row: row["New_IP_Address"] if row["New_IP_Address"] is not None 
        else row.get("IP_Address"),
        axis=1
    )
    
    # Get current PCIDs from IPs (parallel)
    logger.info("Resolving PCIDs from IP addresses...")
    new_pcids = parallel_map(
        lambda ip: get_pcid(ip, config.network_timeout),
        df["IP_Address"].tolist(),
        max_workers=config.max_workers,
        desc="PCID resolution"
    )
    df["New_PCID"] = new_pcids
    
    # Update PCIDs (keep old if new lookup failed)
    df["PCID"] = df.apply(
        lambda row: row["New_PCID"] if row["New_PCID"] is not None 
        else row.get("PCID"),
        axis=1
    )
    
    # Get CPC versions (parallel)
    logger.info("Checking CPC versions...")
    
    def get_version_wrapper(row_tuple):
        """Wrapper to handle row data in parallel processing."""
        ip, pcid = row_tuple
        return get_cpc_version(ip, pcid, config.cpc_source, config.cpc_client)
    
    new_versions = parallel_map(
        get_version_wrapper,
        list(zip(df["IP_Address"], df["PCID"])),
        max_workers=config.max_workers,
        desc="CPC version check"
    )
    df["New_CPC_Version"] = new_versions
    
    # Update CPC versions (keep old if new check failed)
    df["CPC_Version"] = df.apply(
        lambda row: row["New_CPC_Version"] if row["New_CPC_Version"] is not None 
        else row.get("CPC_Version"),
        axis=1
    )
    
    # Clean up temporary columns
    df = df.drop(columns=["New_IP_Address", "New_PCID", "New_CPC_Version"])
    
    # Generate database names
    df["Database_Name"] = df["Alt_Name"].apply(
        lambda name: f"CPC_{str(name).replace(' ', '')}" if pd.notna(name) else None
    )
    
    # Sort naturally by PC column
    logger.info("Sorting results...")
    indexer = index_natsorted(df["PC"])
    df = df.reindex(order_by_index(df.index, indexer))
    df = df.reset_index(drop=True)
    
    duration = (datetime.now() - start_time).total_seconds()
    logger.info(f"Equipment PC enrichment complete in {duration:.1f}s")
    
    # Log statistics
    total = len(df)
    successful_ips = df["IP_Address"].notna().sum()
    successful_versions = df["CPC_Version"].notna().sum()
    denominator = max(total, 1)  # avoids ZeroDivisionError on an empty table
    logger.info("Statistics:")
    logger.info(f"  - Total PCs: {total}")
    logger.info(f"  - Successful IP lookups: {successful_ips}/{total} "
                f"({successful_ips / denominator * 100:.1f}%)")
    logger.info(f"  - Successful version checks: {successful_versions}/{total} "
                f"({successful_versions / denominator * 100:.1f}%)")
    
    return df

print("✅ Data enrichment function defined")
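
The final sort uses `natsort` so that names with embedded numbers order the way a person would expect (`PC2` before `PC10`). To illustrate the idea only, here is a small standard-library sketch of a natural sort key. It is not how `natsort` is implemented, and the PC names are made up.

```python
import re
from typing import List, Union


def natural_key(value: str) -> List[Union[int, str]]:
    """Split a string into text and integer chunks so digit runs compare numerically."""
    return [
        int(token) if token.isdigit() else token.lower()
        for token in re.split(r"(\d+)", value)
    ]


names = ["PC10", "PC2", "PC1", "PC21", "PC3"]

plain_order = sorted(names)
natural_order = sorted(names, key=natural_key)

print(plain_order)    # ['PC1', 'PC10', 'PC2', 'PC21', 'PC3']
print(natural_order)  # ['PC1', 'PC2', 'PC3', 'PC10', 'PC21']
```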

## 8. Initialize Configuration

⚠️ **Important:** Update the configuration below with your settings before running!

In [None]:
# Define environment
os.environ['ENV'] = 'production'

# Import global config file
base = Path().resolve().parents[2]
sys.path.insert(0, str(base / 'shared/global_config'))

# Import config file variables
import config

# Create configuration
cfg = Config(
    server=config.PROD_SERVER,
    database=config.PYRO_DATABASE,
    max_workers=10,  # Adjust based on your network
    network_timeout=2.0,  # Adjust if you have slow network
    log_level="INFO"  # Use "DEBUG" for more detailed logs
)

print("✅ Configuration initialized")
print(f"   Server: {cfg.server}")
print(f"   Database: {cfg.database}")
print(f"   Log directory: {cfg.log_dir}")
print(f"   Max workers: {cfg.max_workers}")

## 9. Setup Logging

In [None]:
# Setup logging
logger = setup_logging(cfg)

logger.info("=" * 80)
logger.info("CURE PC AUDIT SCRIPT - TEST RUN")
logger.info("=" * 80)
logger.info(f"Start time: {datetime.now():%Y-%m-%d %H:%M:%S}")
logger.info(f"Python version: {sys.version}")
logger.info(f"Log file: {cfg.log_dir / f'cure_pc_audit_{datetime.now():%Y%m%d}.log'}")
logger.info("=" * 80)

print("\n✅ Logging configured successfully")
print(f"\n📝 Logs are being written to: {cfg.log_dir / f'cure_pc_audit_{datetime.now():%Y%m%d}.log'}")

## 10. Read Data from Database

In [None]:
# Read current equipment PC data
with get_db_engine(cfg) as engine:
    equip_pcs = read_equipment_pcs(engine, cfg.table_name)

print(f"\n✅ Retrieved {len(equip_pcs)} equipment PCs from database")
print("\nPreview of data:")
equip_pcs.head()

## 11. Enrich Data (The Main Process)

This cell performs all the network lookups and file checks in parallel.

**Note:** This may take a few minutes depending on the number of PCs.

In [None]:
# Enrich the data with current information
equip_pcs_enriched = enrich_equipment_pcs(equip_pcs, cfg)

print("\n✅ Data enrichment complete!")
print("\nEnriched data preview:")
equip_pcs_enriched.head()

## 12. View Results

In [None]:
# Display the full enriched dataframe
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

print("\n" + "=" * 80)
print("ENRICHED EQUIPMENT PCs")
print("=" * 80)

equip_pcs_enriched

## 13. Write Results to Database

In [None]:
# Write enriched data back to database
with get_db_engine(cfg) as engine:
    write_equipment_pcs(equip_pcs_enriched, engine, cfg.table_name)

print(f"\n✅ Successfully wrote {len(equip_pcs_enriched)} equipment PCs to database")

## 14. Summary and Completion

In [None]:
logger.info("=" * 80)
logger.info("CURE PC AUDIT SCRIPT - TEST RUN COMPLETE")
logger.info("=" * 80)
logger.info(f"End time: {datetime.now():%Y-%m-%d %H:%M:%S}")
logger.info(f"Status: SUCCESS")
logger.info(f"Processed {len(equip_pcs_enriched)} equipment PCs successfully")
logger.info("=" * 80)

print("\n" + "=" * 80)
print("✅ TEST RUN COMPLETE")
print("=" * 80)
print("\n📊 Summary:")
print(f"   - Total PCs processed: {len(equip_pcs_enriched)}")
print(f"   - Successful IP lookups: {equip_pcs_enriched['IP_Address'].notna().sum()}")
print(f"   - Successful CPC version checks: {equip_pcs_enriched['CPC_Version'].notna().sum()}")
print("\n📝 Check the log file for detailed information:")
print(f"   {cfg.log_dir / f'cure_pc_audit_{datetime.now():%Y%m%d}.log'}")
print("\n" + "=" * 80)

## 15. Compare with Original Data (Optional)

Run this cell to compare the original and enriched data side by side.

In [None]:
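# Compare original vs enriched for the first 5 PCs.
# The enriched frame is re-sorted, so rows are matched on the "PC" column
# rather than by position. This assumes `equip_pcs` still holds the values
# read in step 10.
print("\n" + "=" * 80)
print("COMPARISON: Original vs Enriched (First 5 PCs)")
print("=" * 80)

comparison_cols = ['PCID', 'IP_Address', 'CPC_Version']
enriched_by_pc = equip_pcs_enriched.drop_duplicates("PC").set_index("PC")

for idx in range(min(5, len(equip_pcs))):
    original_row = equip_pcs.iloc[idx]
    pc = original_row["PC"]
    print(f"\nPC #{idx + 1} ({pc}):")
    print("  Original:")
    for col in comparison_cols:
        if col in equip_pcs.columns:
            print(f"    {col}: {original_row[col]}")
    print("  Enriched:")
    for col in comparison_cols:
        if col in enriched_by_pc.columns and pc in enriched_by_pc.index:
            print(f"    {col}: {enriched_by_pc.loc[pc, col]}")
    print("  " + "-" * 60)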

## Next Steps

After testing this notebook:

1. **Review the logs** in the `logs/` directory to verify everything worked correctly
2. **Check the database** to ensure the data was written correctly
3. **Compare results** with your original notebook to ensure consistency
4. **If everything looks good**, deploy the Python script version for Task Scheduler

### To deploy as a scheduled task:

1. Copy the `cure_pc_audit_refactored.py` script to your production location
2. Set up Windows Task Scheduler using the instructions in `IMPLEMENTATION_GUIDE.md`
3. Set up monitoring using the `check_audit_status.py` script

### Performance Notes:

- **Parallel processing** runs the lookups on `max_workers` threads (10 by default), which makes this ~10x faster than the original notebook
- Adjust `max_workers` in the configuration if needed (more workers = faster, but more network load)
- Adjust `network_timeout` if you have slow network connections
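
As a rough way to reason about these two settings, the worst-case wall-clock time of one lookup stage is about the number of batches times the per-lookup timeout. The sketch below is a back-of-the-envelope estimate only: the figure of 200 PCs is a made-up example, and real runs are usually faster because most lookups return well before the timeout.

```python
import math


def worst_case_seconds(n_items: int, per_item_seconds: float, workers: int) -> float:
    """Upper bound for an I/O-bound stage where every item takes the full timeout."""
    return math.ceil(n_items / workers) * per_item_seconds


sequential = worst_case_seconds(200, 2.0, workers=1)
default_pool = worst_case_seconds(200, 2.0, workers=10)
larger_pool = worst_case_seconds(200, 2.0, workers=20)

print(sequential, default_pool, larger_pool)  # 400.0 40.0 20.0
```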

### Troubleshooting:

- If DNS lookups are slow, reduce `max_workers` to 5
- If getting permission errors, ensure you have access to the network shares
- If database writes fail, check your SQL Server permissions
- Check the log file for detailed error messages
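
When the log file is long, a quick count of records per level can show where to look first. The sketch below assumes the detailed file format configured in `setup_logging` (fields separated by ` - `). `count_levels` is a hypothetical helper, and the sample lines are invented for illustration rather than taken from a real run.

```python
from collections import Counter
from typing import Iterable


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log records per level for lines shaped like 'time - name - LEVEL - ...'."""
    counts: Counter = Counter()
    for line in lines:
        parts = line.split(" - ", 3)
        if len(parts) >= 3:
            counts[parts[2]] += 1
    return counts


# Invented sample lines in the notebook's file-handler format
sample_lines = [
    "2024-01-15 08:00:01 - cure_pc_audit - INFO - get_db_engine:20 - Database connection successful",
    "2024-01-15 08:00:03 - cure_pc_audit - WARNING - get_ip:26 - DNS lookup timeout for PCID: PC-001",
    "2024-01-15 08:00:04 - cure_pc_audit - ERROR - parallel_map:43 - Error processing item 7: boom",
    "2024-01-15 08:00:05 - cure_pc_audit - INFO - parallel_map:52 - IP resolution: Complete",
]

level_counts = count_levels(sample_lines)
print(dict(level_counts))
```

To run it against a real log, pass an open file object instead of the sample list, for example `count_levels(open(path, encoding="utf-8"))`.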