# BTC Fake - Training Completion Simulator

This notebook simulates employees completing training courses from both manager assignments and AI recommendations.

## How it works:
1. **Preprocessing**: Downloads training content files from SFTP server
2. **Manager Assigns Training**: Selects and assigns up to 3 Daily Dose contents to all employees
3. **Employee Completes Training**: 
   - Loads manager assignments and AI recommendations for each employee
   - Completes training based on employee type (A, B, or F)
4. **Output Generation**:
   - NonCompletedAssignments CSV file (manager assignments)
   - ContentUserCompletion CSV file (completed training with source tracking)
5. **Summary**: Prints completion details showing which training came from manager vs AI

In [1]:
import sys
print("Python executable:", sys.executable)
print("Python version:", sys.version)
from dotenv import load_dotenv
print("Success!")

Python executable: /Users/khansen/craft/stores/python/python-projects-rdi/btc_fake/.venv/bin/python
Python version: 3.13.2 (main, Feb  4 2025, 14:51:09) [Clang 16.0.0 (clang-1600.0.26.6)]
Success!


In [None]:
import pandas as pd
import requests
from datetime import datetime, timedelta
import random
import string
from typing import List, Dict
import urllib3
import pytz

# Disable SSL warnings when ignoring certificate verification
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# Define Pacific timezone globally for all timestamp operations
# This ensures all timestamps in output files use PT timezone, not UTC
PT = pytz.timezone('America/Los_Angeles')

# Configuration
API_BASE_URL = "https://dataiku-api-devqa.lower.internal.sephora.com"
API_ENDPOINT = "/public/api/v1/mltr/v3/run"
EMPLOYEES_FILE = "input/employees.csv"
OUTPUT_DIR = "generated_files"
SFTP_LOCAL_DIR = "downloaded_files"

# Preprocessing - Download Files from SFTP

This section prepares for a fresh simulation run:

## Cleanup
1. Removes all files from `downloaded_files/` directory
2. Removes all files from `generated_files/` directory
3. Ensures each run starts with a clean slate

## Generate UserCompletion File
1. Copies the UserCompletion template from `docs/sample_files/`
2. Renames it with the current date (YYYY_m_d format)
3. Places it in `generated_files/` directory
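
The date-stamped name from step 2 can be built in a platform-independent way. The sketch below is illustrative only: `user_completion_filename` is a hypothetical helper, not a function defined in this notebook, and it takes the date as a parameter so the result is reproducible.

```python
from datetime import datetime

def user_completion_filename(now: datetime) -> str:
    """Build the UserCompletion filename in YYYY_m_d form (no leading zeros)."""
    # Integer attributes avoid the "%-m" / "%-d" strftime codes,
    # which are not available on every platform.
    return f"UserCompletion_v2_{now.year}_{now.month}_{now.day}_1_000001.csv"

example_name = user_completion_filename(datetime(2026, 1, 7))
print(example_name)  # UserCompletion_v2_2026_1_7_1_000001.csv
```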

## Download Files from SFTP Server
1. **CourseCatalog** - Training curriculum elements like Courses and components
2. **StandAloneContent** - All training content (videos, PDFs, documents)

## Requirements:
1. Copy `.env.example` to `.env` and add your SFTP password
2. Files will be downloaded to `downloaded_files/` directory
3. The system finds the most recent file based on the date in the filename

## File Formats:
- CourseCatalog: `CourseCatalog_V2_YYYY_M_DD_1_random.csv`
- StandAloneContent: `StandAloneContent_v2_YYYY_M_DD_1_random.csv`
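
Because month and day are not zero-padded, sorting the filenames as plain strings can pick the wrong file (`2025_9_30` sorts after `2025_10_2`). The sketch below shows the idea of parsing the embedded date before comparing. It is a simplified illustration, not the notebook's implementation: the combined pattern `FILENAME_RE`, the helpers `filename_date` and `most_recent`, and the sample filenames are all assumptions made for the example.

```python
import re
from datetime import datetime
from typing import List, Optional

# Hypothetical combined pattern covering both filename prefixes described above.
FILENAME_RE = re.compile(
    r"(CourseCatalog_V2|StandAloneContent_v2)_(\d{4})_(\d{1,2})_(\d{1,2})_\d+_[a-z0-9]+\.csv",
    re.IGNORECASE,
)

def filename_date(filename: str) -> Optional[datetime]:
    """Return the date embedded in a filename, or None if it does not match."""
    match = FILENAME_RE.fullmatch(filename)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups()[1:])
    try:
        return datetime(year, month, day)
    except ValueError:  # impossible calendar date, e.g. month 13
        return None

def most_recent(filenames: List[str], prefix: str) -> Optional[str]:
    """Pick the filename with the latest embedded date for the given prefix."""
    dated = []
    for name in filenames:
        if not name.lower().startswith(prefix.lower()):
            continue
        parsed = filename_date(name)
        if parsed is not None:
            dated.append((parsed, name))
    return max(dated)[1] if dated else None

# Made-up filenames for illustration only.
sample = [
    "CourseCatalog_V2_2025_9_30_1_abc123.csv",
    "CourseCatalog_V2_2025_10_2_1_def456.csv",
    "StandAloneContent_v2_2025_10_1_1_xyz789.csv",
    "notes.txt",
]
latest_catalog = most_recent(sample, "CourseCatalog")
print(latest_catalog)  # CourseCatalog_V2_2025_10_2_1_def456.csv
```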

In [3]:
# Cleanup: Remove old files from previous runs
import os
import glob

def cleanup_directory(directory: str) -> int:
    """
    Remove all files in a directory (keeps the directory itself and .gitkeep files).
    
    Args:
        directory: Path to directory to clean
    
    Returns:
        Number of files removed
    """
    if not os.path.exists(directory):
        print(f"  Directory does not exist: {directory}")
        return 0
    
    files_removed = 0
    pattern = os.path.join(directory, "*")
    
    for file_path in glob.glob(pattern):
        # Skip .gitkeep files
        if os.path.basename(file_path) == ".gitkeep":
            continue
        
        # Only remove files, not subdirectories
        if os.path.isfile(file_path):
            try:
                os.remove(file_path)
                files_removed += 1
            except Exception as e:
                print(f"  Error removing {file_path}: {e}")
    
    return files_removed

print("=" * 80)
print("PREPROCESSING - Cleanup")
print("=" * 80)
print()

print("Cleaning up directories from previous runs...")
print()

# Clean generated_files directory
print(f"Cleaning {OUTPUT_DIR}/...")
removed = cleanup_directory(OUTPUT_DIR)
print(f"  Removed {removed} file(s)")
print()

# Clean downloaded_files directory
print(f"Cleaning {SFTP_LOCAL_DIR}/...")
removed = cleanup_directory(SFTP_LOCAL_DIR)
print(f"  Removed {removed} file(s)")
print()

print("=" * 80)
print()

PREPROCESSING - Cleanup

Cleaning up directories from previous runs...

Cleaning generated_files/...
  Removed 3 file(s)

Cleaning downloaded_files/...
  Removed 2 file(s)




In [None]:
# Generate UserCompletion file from template
import shutil

def generate_user_completion_file() -> str:
    """
    Copy the UserCompletion template file to generated_files with current date in PT.
    
    Returns:
        Path to the generated file, or None if generation fails
    """
    # Source template file
    source_file = "docs/sample_files/UserCompletion_v2_YYYY_m_d_1_000001.csv"
    
    if not os.path.exists(source_file):
        print(f"  Template file not found: {source_file}")
        return None
    
    # Generate new filename with current date in PT.
    # Integer attributes give month/day without leading zeros on every platform
    # (the "%-m" / "%-d" strftime codes are not supported on Windows).
    now = datetime.now(PT)
    year, month, day = now.year, now.month, now.day
    
    new_filename = f"UserCompletion_v2_{year}_{month}_{day}_1_000001.csv"
    destination_file = os.path.join(OUTPUT_DIR, new_filename)
    
    # Copy the file
    try:
        shutil.copy2(source_file, destination_file)
        return destination_file
    except Exception as e:
        print(f"  Error copying file: {e}")
        return None

print("=" * 80)
print("PREPROCESSING - Generate UserCompletion File")
print("=" * 80)
print()

print("Generating UserCompletion file from template...")
user_completion_path = generate_user_completion_file()

if user_completion_path:
    print("✓ UserCompletion file generated successfully")
    print(f"  File: {user_completion_path}")
else:
    print("✗ Failed to generate UserCompletion file")

print()
print("=" * 80)
print()

In [5]:
# Import SFTP libraries and load environment
import os
import re
from dotenv import load_dotenv
import paramiko
from datetime import datetime

# Load environment variables
load_dotenv()

# SFTP Configuration
SFTP_HOST = "sftp.sephora.com"
SFTP_USER = "SephoraMSL"
SFTP_PASSWORD = os.getenv("SFTP_PASSWORD", "your_sftp_password_placeholder")
SFTP_REMOTE_PATH = "/inbound/BTC/retailData/prod/vendor/mySephoraLearning-archive"

def parse_course_catalog_filename(filename: str) -> tuple:
    """
    Parse course catalog filename to extract date components.
    Format: CourseCatalog_V2_YYYY_M_DD_1_random.csv
    
    Args:
        filename: The course catalog filename
    
    Returns:
        Tuple of (year, month, day, datetime_obj) or None if parsing fails
    """
    pattern = r'CourseCatalog_V2_(\d{4})_(\d{1,2})_(\d{1,2})_\d+_[a-z0-9]+\.csv'
    # fullmatch rejects names with trailing text after ".csv" (e.g. ".csv.tmp")
    match = re.fullmatch(pattern, filename, re.IGNORECASE)
    
    if match:
        year = int(match.group(1))
        month = int(match.group(2))
        day = int(match.group(3))
        
        try:
            date_obj = datetime(year, month, day)
            return (year, month, day, date_obj)
        except ValueError:
            return None
    return None

def parse_standalone_content_filename(filename: str) -> tuple:
    """
    Parse standalone content filename to extract date components.
    Format: StandAloneContent_v2_YYYY_M_DD_1_random.csv
    
    Args:
        filename: The standalone content filename
    
    Returns:
        Tuple of (year, month, day, datetime_obj) or None if parsing fails
    """
    pattern = r'StandAloneContent_v2_(\d{4})_(\d{1,2})_(\d{1,2})_\d+_[a-z0-9]+\.csv'
    # fullmatch rejects names with trailing text after ".csv" (e.g. ".csv.tmp")
    match = re.fullmatch(pattern, filename, re.IGNORECASE)
    
    if match:
        year = int(match.group(1))
        month = int(match.group(2))
        day = int(match.group(3))
        
        try:
            date_obj = datetime(year, month, day)
            return (year, month, day, date_obj)
        except ValueError:
            return None
    return None

def download_most_recent_course_catalog() -> str:
    """
    Connect to SFTP server and download the most recent CourseCatalog file.
    
    Returns:
        Path to the downloaded file, or None if download fails
    """
    try:
        # Create SFTP connection
        transport = paramiko.Transport((SFTP_HOST, 22))
        transport.connect(username=SFTP_USER, password=SFTP_PASSWORD)
        sftp = paramiko.SFTPClient.from_transport(transport)
        
        print(f"Connected to SFTP server: {SFTP_HOST}")
        
        # Change to remote directory
        sftp.chdir(SFTP_REMOTE_PATH)
        print(f"Changed to directory: {SFTP_REMOTE_PATH}")
        
        # List all files in the directory
        files = sftp.listdir()
        print(f"Found {len(files)} files in directory")
        
        # Filter for course catalog files and parse dates
        catalog_files = []
        for filename in files:
            parsed = parse_course_catalog_filename(filename)
            if parsed:
                catalog_files.append((filename, parsed[3]))  # (filename, datetime_obj)
        
        if not catalog_files:
            print("No valid CourseCatalog files found")
            sftp.close()
            transport.close()
            return None
        
        # Sort by date (most recent first)
        catalog_files.sort(key=lambda x: x[1], reverse=True)
        most_recent_file = catalog_files[0][0]
        most_recent_date = catalog_files[0][1]
        
        print(f"Most recent file: {most_recent_file} (date: {most_recent_date.strftime('%Y-%m-%d')})")
        
        # Download the file
        local_path = os.path.join(SFTP_LOCAL_DIR, most_recent_file)
        sftp.get(most_recent_file, local_path)
        print(f"Downloaded to: {local_path}")
        
        # Close connections
        sftp.close()
        transport.close()
        
        return local_path
        
    except Exception as e:
        print(f"Error downloading course catalog: {e}")
        print(f"  SFTP Host: {SFTP_HOST}")
        print(f"  SFTP Path: {SFTP_REMOTE_PATH}")
        print(f"  SFTP User: {SFTP_USER}")
        return None

def download_most_recent_standalone_content() -> str:
    """
    Connect to SFTP server and download the most recent StandAloneContent file.
    
    Returns:
        Path to the downloaded file, or None if download fails
    """
    try:
        # Create SFTP connection
        transport = paramiko.Transport((SFTP_HOST, 22))
        transport.connect(username=SFTP_USER, password=SFTP_PASSWORD)
        sftp = paramiko.SFTPClient.from_transport(transport)
        
        print(f"Connected to SFTP server: {SFTP_HOST}")
        
        # Change to remote directory
        sftp.chdir(SFTP_REMOTE_PATH)
        print(f"Changed to directory: {SFTP_REMOTE_PATH}")
        
        # List all files in the directory
        files = sftp.listdir()
        print(f"Found {len(files)} files in directory")
        
        # Filter for standalone content files and parse dates
        content_files = []
        for filename in files:
            parsed = parse_standalone_content_filename(filename)
            if parsed:
                content_files.append((filename, parsed[3]))  # (filename, datetime_obj)
        
        if not content_files:
            print("No valid StandAloneContent files found")
            sftp.close()
            transport.close()
            return None
        
        # Sort by date (most recent first)
        content_files.sort(key=lambda x: x[1], reverse=True)
        most_recent_file = content_files[0][0]
        most_recent_date = content_files[0][1]
        
        print(f"Most recent file: {most_recent_file} (date: {most_recent_date.strftime('%Y-%m-%d')})")
        
        # Download the file
        local_path = os.path.join(SFTP_LOCAL_DIR, most_recent_file)
        sftp.get(most_recent_file, local_path)
        print(f"Downloaded to: {local_path}")
        
        # Close connections
        sftp.close()
        transport.close()
        
        return local_path
        
    except Exception as e:
        print(f"Error downloading standalone content: {e}")
        print(f"  SFTP Host: {SFTP_HOST}")
        print(f"  SFTP Path: {SFTP_REMOTE_PATH}")
        print(f"  SFTP User: {SFTP_USER}")
        return None

In [None]:
# Execute: Download Course Catalog and Standalone Content from SFTP
print("=" * 80)
print("PREPROCESSING - Download Files from SFTP")
print("=" * 80)
print()

# Download Course Catalog
print("Downloading Course Catalog...")
print("-" * 80)
course_catalog_path = download_most_recent_course_catalog()

if course_catalog_path:
    print()
    print("✓ Course catalog downloaded successfully")
    print(f"  File: {course_catalog_path}")
    
    # Optionally load and preview the file
    try:
        catalog_df = pd.read_csv(course_catalog_path)
        print(f"  Rows: {len(catalog_df)}")
        print(f"  Columns: {list(catalog_df.columns)}")
    except Exception as e:
        print(f"  Note: Could not preview file: {e}")
else:
    print()
    print("✗ Failed to download course catalog")
    print("  Please check:")
    print("    1. .env file contains valid SFTP_PASSWORD")
    print("    2. SFTP server is accessible")
    print("    3. Remote path exists and contains CourseCatalog files")

print()
print("-" * 80)

# Download Standalone Content
print("Downloading Standalone Content...")
print("-" * 80)
standalone_content_path = download_most_recent_standalone_content()

if standalone_content_path:
    print()
    print("✓ Standalone content downloaded successfully")
    print(f"  File: {standalone_content_path}")
    
    # Optionally load and preview the file
    try:
        content_df = pd.read_csv(standalone_content_path)
        print(f"  Rows: {len(content_df)}")
        print(f"  Columns: {list(content_df.columns)}")
    except Exception as e:
        print(f"  Note: Could not preview file: {e}")
else:
    print()
    print("✗ Failed to download standalone content")
    print("  Please check:")
    print("    1. .env file contains valid SFTP_PASSWORD")
    print("    2. SFTP server is accessible")
    print("    3. Remote path exists and contains StandAloneContent files")

print()
print("=" * 80)

PREPROCESSING - Download Files from SFTP

Downloading Course Catalog...
--------------------------------------------------------------------------------
Connected to SFTP server: sftp.sephora.com
Changed to directory: /inbound/BTC/retailData/prod/vendor/mySephoraLearning-archive


In [None]:
# Load employees (used by both Manager and Employee Training sections)
print(f"Loading employees from {EMPLOYEES_FILE}...")
employees_df = pd.read_csv(EMPLOYEES_FILE)

# Filter out comment rows (rows where employee_id starts with '#')
initial_count = len(employees_df)
employees_df['employee_id'] = employees_df['employee_id'].astype(str)
employees_df = employees_df[~employees_df['employee_id'].str.startswith('#')].copy()

# Convert employee_id back to int after filtering comments
employees_df['employee_id'] = employees_df['employee_id'].astype(int)

filtered_count = initial_count - len(employees_df)

if filtered_count > 0:
    print(f"Filtered out {filtered_count} comment row(s)")

print(f"Loaded {len(employees_df)} employees")
print()

# Helper function for formatting content IDs (used by both sections)
def format_content_id(content_id: int) -> str:
    """
    Format content ID with commas for human readability.
    Example: 1915085 -> "1,915,085"
    
    Args:
        content_id: The numeric content ID
    
    Returns:
        Formatted string with commas
    """
    return f"{content_id:,}"

Loading employees from input/employees.csv...
Filtered out 17 comment row(s)
Loaded 12 employees



# Manager - Assign Training to Employees

This section implements the manager functionality:
1. Loads the standalone content file downloaded during preprocessing
2. Filters for content where `Daily_Dose_BA` is TRUE
3. Sorts by `CreateDate` (most recent first)
4. Selects up to 3 of the most recent content items
5. Assigns each selected item to every employee
6. Generates a NonCompletedAssignments CSV file
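
The selection steps above can be sketched on a toy DataFrame. The column names follow the notebook, but the rows, the employee IDs, and the case-insensitive string comparison used for the flag are assumptions made for this illustration.

```python
import pandas as pd

# Toy data; values are invented. The flag column deliberately mixes
# booleans and the string "TRUE", as a CSV export might.
content = pd.DataFrame({
    "ContentId": [101, 102, 103, 104, 105],
    "ContentName": ["A", "B", "C", "D", "E"],
    "Daily_Dose_BA": [True, "TRUE", False, True, True],
    "CreateDate": ["2025-01-05", "2025-01-07", "2025-01-09", "2025-01-01", "2025-01-08"],
})

# Step 2: keep rows whose flag is TRUE in either representation.
is_daily_dose = content["Daily_Dose_BA"].astype(str).str.upper() == "TRUE"

# Steps 3-4: sort by creation date, newest first, and keep at most three.
selected = (
    content[is_daily_dose]
    .assign(CreateDate_dt=lambda df: pd.to_datetime(df["CreateDate"]))
    .sort_values("CreateDate_dt", ascending=False)
    .head(3)
)
selected_ids = selected["ContentId"].tolist()

# Step 5: one assignment per (employee, content) pair.
employee_ids = [9001, 9002]  # hypothetical IDs
assignments = [
    {"UserID": emp, "TrainingElementId": cid}
    for emp in employee_ids
    for cid in selected_ids
]

print(selected_ids)      # [105, 102, 101]
print(len(assignments))  # 6
```

With 2 employees and 3 selected items, the assignments file would hold 6 rows.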

In [None]:
import pytz

# Date/time helper functions - all use PT timezone defined in Cell 2

def get_monday_of_current_week() -> datetime:
    """
    Get Monday of the current week at 00:01 PT.
    
    Returns:
        datetime object for Monday of current week at 00:01 PT
    """
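    now = datetime.now(PT)
    # Monday is 0, Sunday is 6
    days_since_monday = now.weekday()
    
    # Go back to Monday of current week (calendar date only)
    monday = now.date() - timedelta(days=days_since_monday)
    
    # Build 00:01 on that date and localize it, so the UTC offset is correct
    # even when a DST change falls between Monday and today
    return PT.localize(datetime(monday.year, monday.month, monday.day, 0, 1))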

def get_next_future_monday() -> datetime:
    """
    Get the next future Monday at 23:59 PT.
    
    Returns:
        datetime object for the next future Monday at 23:59 PT
    """
    now = datetime.now(PT)
    # Monday is 0, Sunday is 6
    current_weekday = now.weekday()
    
    # Days until the next Monday; when today is Monday this evaluates to 7,
    # so the result is always strictly in the future
    days_until_monday = 7 - current_weekday
    
    next_monday = now.date() + timedelta(days=days_until_monday)
    
    # Build 23:59 on that date and localize it, so the UTC offset matches
    # the target date even across a DST change
    return PT.localize(datetime(next_monday.year, next_monday.month, next_monday.day, 23, 59))

def generate_request_id() -> str:
    """
    Generate RequestId in format: MMDDYY:Random3Digits
    Example: 010726:347
    Uses PT timezone for date components.
    
    Returns:
        RequestId string
    """
    now = datetime.now(PT)
    month = now.strftime("%m")
    day = now.strftime("%d")
    year = now.strftime("%y")
    random_digits = random.randint(100, 999)
    
    return f"{month}{day}{year}:{random_digits}"

def generate_non_completed_assignments_filename() -> str:
    """
    Generate NonCompletedAssignments filename with timestamp and random suffix.
    Format: Non_Completed_Assignments_V2_YYYY_M_D_1_RAND.csv
    Uses PT timezone for date components (month and day have no leading zero).
    
    Returns:
        Generated filename
    """
    now = datetime.now(PT)
    # Integer attributes give month/day without leading zeros on every platform
    # (the "%-m" / "%-d" strftime codes are not supported on Windows).
    year, month, day = now.year, now.month, now.day
    
    # Generate 6-character random alphanumeric suffix
    random_suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
    
    return f"Non_Completed_Assignments_V2_{year}_{month}_{day}_1_{random_suffix}.csv"

In [None]:
# Manager - Select training content to assign to employees
print("=" * 80)
print("MANAGER - Assigning Training to Employees")
print("=" * 80)
print()

# Load the standalone content file that was downloaded earlier
if standalone_content_path and os.path.exists(standalone_content_path):
    print(f"Loading standalone content from: {standalone_content_path}")
    standalone_df = pd.read_csv(standalone_content_path)
    print(f"Loaded {len(standalone_df)} content items")
    print()
    
    # Filter for content where Daily_Dose_BA is TRUE
    print("Filtering for Daily_Dose_BA = TRUE...")
    # Handle both string "TRUE" and boolean True
    daily_dose_content = standalone_df[
        (standalone_df['Daily_Dose_BA'] == 'TRUE') | 
        (standalone_df['Daily_Dose_BA'] == True)
    ].copy()
    
    print(f"Found {len(daily_dose_content)} items with Daily_Dose_BA = TRUE")
    print()
    
    if len(daily_dose_content) > 0:
        # Convert CreateDate to datetime for sorting
        daily_dose_content['CreateDate_dt'] = pd.to_datetime(daily_dose_content['CreateDate'])
        
        # Sort by CreateDate (most recent first)
        daily_dose_content = daily_dose_content.sort_values('CreateDate_dt', ascending=False)
        
        # Select up to 3 most recent contents
        contents_to_assign = daily_dose_content.head(3)
        
        print(f"Selected {len(contents_to_assign)} content(s) to assign:")
        for idx, content in contents_to_assign.iterrows():
            content_id = content['ContentId']
            content_name = content['ContentName']
            create_date = content['CreateDate']
            # ContentId may be read as an int or as a comma-formatted string; handle both
            print(f"  {format_content_id(int(str(content_id).replace(',', '')))} - {content_name} (Created: {create_date})")
        print()
        
        # Create assignments for all employees
        print(f"Creating assignments for {len(employees_df)} employees...")
        
        all_assignments = []
        
        # Calculate dates for assignments in PT
        created_date = (datetime.now(PT) - timedelta(minutes=5)).isoformat()
        start_date = get_monday_of_current_week().isoformat()
        due_date = get_next_future_monday().isoformat()
        
        for _, employee in employees_df.iterrows():
            employee_id = employee['employee_id']
            
            # Assign each selected content to this employee
            for _, content in contents_to_assign.iterrows():
                content_id = content['ContentId']
                
                assignment = {
                    "UserID": employee_id,
                    "CreateDate_text": created_date,
                    "RequestId": generate_request_id(),
                    "TrainingElementId": content_id,
                    "Start_Date_text": start_date,
                    "DueDate_text": due_date,
                    "ContentType": "Media"
                }
                
                all_assignments.append(assignment)
        
        print(f"Created {len(all_assignments)} total assignments")
        print()
        
        # Generate output file
        if all_assignments:
            assignments_filename = generate_non_completed_assignments_filename()
            assignments_path = f"{OUTPUT_DIR}/{assignments_filename}"
            
            # Create DataFrame
            assignments_df = pd.DataFrame(all_assignments)
            
            # Write to CSV with proper quoting
            assignments_df.to_csv(assignments_path, index=False, quoting=1)  # quoting=1 means QUOTE_ALL
            
            print(f"Generated NonCompletedAssignments file: {assignments_filename}")
            print(f"Total assignments: {len(all_assignments)}")
            print()
            
            # Print summary
            print("Assignment Summary:")
            print(f"  Employees: {len(employees_df)}")
            print(f"  Contents assigned: {len(contents_to_assign)}")
            print(f"  Total assignments: {len(all_assignments)}")
            print(f"  Start Date (Monday of current week): {start_date}")
            print(f"  Due Date (next future Monday): {due_date}")
        else:
            print("No assignments created.")
    else:
        print("No content found with Daily_Dose_BA = TRUE")
        print("No assignments created.")
else:
    print("Standalone content file not found.")
    print("Please run the preprocessing section first.")

print()
print("=" * 80)

# Employee Training Simulation

This section simulates employees completing training based on manager assignments and AI recommendations:

## Workflow:
1. **Get Recommendations** (next cell): Calls ML Training Recommender API for each employee
2. **Get Manager Assignments**: Loads assignments from NonCompletedAssignments file created by manager
3. **Combine Training**: Merges manager assignments with AI recommendations
4. **Helper Functions** (following cell): Generates training timestamps
5. **Process Employee**: Determines completions based on employee type:
   - Type A: Completes all training (manager + AI)
   - Type B: Completes one training item (the first in the combined list, so a manager assignment when one exists)
   - Type F: Completes no training
6. **Filename Generator**: Creates unique output filename with timestamp
7. **Main Loop**: Processes all employees and collects completion records
8. **Generate Output**: Writes ContentUserCompletion CSV file
9. **Print Summary**: Displays completion summary with source (manager or AI) for each employee
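
The completion rule in step 5 can be sketched in isolation. `select_completions` below is a hypothetical helper written for this illustration; the notebook's own logic lives in `process_employee`, and the sample records are invented.

```python
from typing import Dict, List

def select_completions(employee_type: str, manager: List[Dict], ai: List[Dict]) -> List[Dict]:
    """Return the training items an employee of the given type completes."""
    # Manager assignments come first, then AI recommendations; each item is
    # copied and tagged with its source so the inputs are left untouched.
    combined = [dict(item, source="manager") for item in manager]
    combined += [dict(item, source="ai") for item in ai]

    kind = employee_type.strip().lower()
    if kind == "a":
        return combined        # Type A: everything
    if kind == "b":
        return combined[:1]    # Type B: first item of the combined list
    return []                  # Type F (and anything else): nothing

# Invented sample records for illustration only.
manager = [{"recommended_content_id": 1, "recommended_content": "Daily Dose 1"}]
ai = [
    {"recommended_content_id": 2, "recommended_content": "AI pick 1"},
    {"recommended_content_id": 3, "recommended_content": "AI pick 2"},
]

for kind in ("A", "B", "F"):
    done = select_completions(kind, manager, ai)
    print(kind, [(c["recommended_content_id"], c["source"]) for c in done])
```

Because manager assignments are placed ahead of AI recommendations, a Type B employee completes a manager assignment whenever one exists.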

In [None]:
def get_training_recommendations(employee_id: int) -> List[Dict]:
    """
    Call the training recommender API for a given employee.
    
    Args:
        employee_id: The employee's ID (ba_id)
    
    Returns:
        List of recommended training courses
    """
    url = f"{API_BASE_URL}{API_ENDPOINT}"
    payload = {"data": {"ba_id": employee_id}}
    
    try:
        # Disable SSL certificate verification for internal APIs
        response = requests.post(url, json=payload, timeout=30, verify=False)
        response.raise_for_status()
        data = response.json()
        
        # Response structure: {"response": {"ml_recommendations": [...], "coaching_note": {...}}, "timing": {...}, "apiContext": {...}}
        if isinstance(data, dict):
            response_data = data.get("response", {})
            if isinstance(response_data, dict):
                # Get ml_recommendations from nested response
                recommendations = response_data.get("ml_recommendations", [])
            else:
                # Response is directly a list
                recommendations = response_data if isinstance(response_data, list) else []
        else:
            print(f"  Unexpected response type: {type(data)}")
            return []
        
        # Print selected fields from API response
        if isinstance(recommendations, list) and recommendations:
            print(f"  API Response for employee {employee_id}:")
            for rec in recommendations:
                ba_id = rec.get("ba_id", "N/A")
                content_id = rec.get("recommended_content_id", "N/A")
                recommended_content = rec.get("recommended_content", "N/A")
                print(f"  {ba_id} | {content_id} | {recommended_content}")
            print()
        
        # Ensure we have a list
        if isinstance(recommendations, list):
            return recommendations
        else:
            print(f"  Recommendations is not a list: {type(recommendations)}")
            return []
            
    except Exception as e:
        print(f"  Error fetching recommendations for employee {employee_id}: {e}")
        return []

In [None]:
def generate_training_times(num_courses: int) -> List[tuple]:
    """
    Generate start and completion times for training courses in PT timezone.
    Start time: Current day at 00:05 PT
    Completion time: Current day at 00:09 PT
    
    All timestamps are returned in ISO-8601 format with PT timezone offset
    (e.g., 2026-01-13T00:05:00-08:00), NOT in UTC.
    
    Args:
        num_courses: Number of courses to generate times for
    
    Returns:
        List of (start_time, end_time) tuples in ISO-8601 format with PT timezone
    """
    times = []
    now = datetime.now(PT)  # Use PT timezone
    
    # Set to current day at 00:05 PT for start time
    start_time = now.replace(hour=0, minute=5, second=0, microsecond=0)
    
    # Set to current day at 00:09 PT for completion time
    end_time = now.replace(hour=0, minute=9, second=0, microsecond=0)
    
    for _ in range(num_courses):
        # .isoformat() preserves PT timezone in output
        times.append((
            start_time.isoformat(),
            end_time.isoformat()
        ))
    
    return times

In [None]:
def process_employee(employee_id: int, employee_type: str, manager_assignments_path: str, standalone_df: pd.DataFrame, ai_recommendations: List[Dict] = None) -> List[Dict]:
    """
    Process a single employee: get AI recommendations and manager assignments, then simulate completions.

    Args:
        employee_id: The employee's ID
        employee_type: The employee's type (a, b, or f)
        manager_assignments_path: Path to the NonCompletedAssignments CSV file
        standalone_df: DataFrame containing standalone content for lookups
        ai_recommendations: Optional pre-fetched AI recommendations (to avoid duplicate API calls)

    Returns:
        List of completed training records with PT timezone timestamps
    """
    employee_type = employee_type.lower().strip()

    # Get AI recommendations (use provided ones or fetch new)
    if ai_recommendations is None:
        ai_recommendations = get_training_recommendations(employee_id)

    # Get manager assignments
    manager_assignments = []
    if os.path.exists(manager_assignments_path):
        assignments_df = pd.read_csv(manager_assignments_path)
        
        # Normalize UserID to int so it compares cleanly with employee_id
        # (guards against the column being read back as strings)
        assignments_df['UserID'] = assignments_df['UserID'].astype(int)
        
        # Filter for this employee
        employee_assignments = assignments_df[assignments_df['UserID'] == employee_id]

        for _, assignment in employee_assignments.iterrows():
            # Get the TrainingElementId and look up the content name
            content_id = assignment['TrainingElementId']

            # Remove commas from content_id if present (it might be formatted)
            if isinstance(content_id, str):
                content_id_numeric = int(content_id.replace(',', ''))
            else:
                content_id_numeric = int(content_id)

            # Look up content name in standalone_df
            # Handle both numeric and string ContentId in standalone_df
            content_row = standalone_df[
                (standalone_df['ContentId'] == content_id) |
                (standalone_df['ContentId'] == str(content_id_numeric))
            ]
            if not content_row.empty:
                content_name = content_row.iloc[0]['ContentName']
            else:
                content_name = "Unknown Manager Assignment"

            manager_assignments.append({
                "recommended_content_id": content_id_numeric,
                "recommended_content": content_name,
                "source": "manager"
            })

    # Tag AI recommendations with source
    for rec in ai_recommendations:
        rec["source"] = "ai"

    # Combine manager assignments and AI recommendations
    all_training = manager_assignments + ai_recommendations

    if not all_training:
        print(f"  No training available for employee {employee_id}")
        return []

    print(f"  Total training available: {len(all_training)} ({len(manager_assignments)} manager + {len(ai_recommendations)} AI)")

    # Determine how many courses to complete based on employee type
    if employee_type == 'a':
        # Type A: complete all training (manager + AI)
        courses_to_complete = all_training
    elif employee_type == 'b':
        # Type B: complete one training (from combined list)
        courses_to_complete = all_training[:1]
    else:
        # Type F: complete no training
        courses_to_complete = []

    # Generate completion records with PT timezone timestamps
    completions = []
    times = generate_training_times(len(courses_to_complete))

    for i, course in enumerate(courses_to_complete):
        try:
            # Validate course is a dict
            if not isinstance(course, dict):
                print(f"  WARNING: Course is not a dict, it's {type(course)}: {course}")
                continue

            start_time, end_time = times[i]
            source = course.get("source", "unknown")
            completions.append({
                "UserId": employee_id,
                "ContentId": format_content_id(course["recommended_content_id"]),
                "DateStarted": start_time,  # ISO-8601 with PT timezone
                "DateCompleted": end_time,  # ISO-8601 with PT timezone
                "CourseName": course.get("recommended_content", "Unknown"),
                "Source": source
            })
        except KeyError as e:
            print(f"  WARNING: Missing key {e} in course data: {course}")
            continue
        except Exception as e:
            print(f"  WARNING: Error processing course: {e}")
            continue

    return completions

def generate_output_filename() -> str:
    """
    Generate output filename with PT timestamp and random suffix.
    Format: ContentUserCompletion_V2_YYYY_MM_DD_1_RAND.csv
    Uses PT timezone for date components for consistency.
    
    Returns:
        Generated filename
    """
    now = datetime.now(PT)  # Use PT timezone for consistency
    year = now.strftime("%Y")
    month = now.strftime("%m")
    day = now.strftime("%d")
    
    # Generate 6-character random alphanumeric suffix
    random_suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=6))
    
    return f"ContentUserCompletion_V2_{year}_{month}_{day}_1_{random_suffix}.csv"
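
The filename format in the docstring above (`ContentUserCompletion_V2_YYYY_MM_DD_1_RAND.csv`) can be checked with a small sketch. `build_output_filename` and `FILENAME_PATTERN` are hypothetical names introduced here for illustration only. Unlike `generate_output_filename`, this variant takes the timestamp and random generator as arguments so the result is reproducible; it is not the notebook's own function.

```python
import random
import re
import string
from datetime import datetime

# Pattern implied by the docstring: ContentUserCompletion_V2_YYYY_MM_DD_1_RAND.csv
# (RAND is assumed to be 6 lowercase letters or digits, as in the code above).
FILENAME_PATTERN = re.compile(
    r"^ContentUserCompletion_V2_\d{4}_\d{2}_\d{2}_1_[a-z0-9]{6}\.csv$"
)


def build_output_filename(now: datetime, rng: random.Random) -> str:
    """Hypothetical variant of generate_output_filename with injected inputs."""
    suffix = "".join(rng.choices(string.ascii_lowercase + string.digits, k=6))
    # datetime supports strftime codes directly inside an f-string format spec.
    return f"ContentUserCompletion_V2_{now:%Y_%m_%d}_1_{suffix}.csv"


# A fixed date and a seeded generator make the name repeatable between runs.
example_name = build_output_filename(datetime(2025, 1, 15, 9, 30), random.Random(0))
print(example_name)
print(bool(FILENAME_PATTERN.match(example_name)))  # True
```

In the notebook itself the timestamp would come from `datetime.now(PT)`, so the date part reflects Pacific time rather than UTC.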

In [None]:
# Main execution - Process employees and simulate training completions
print("=" * 80)
print("EMPLOYEE TRAINING SIMULATION")
print("=" * 80)
print()

# Check if manager assignments were created
if 'assignments_path' not in locals() or not os.path.exists(assignments_path):
    print("WARNING: Manager assignments file not found. Employees will only complete AI recommendations.")
    print()
    assignments_path = ""

# Process each employee
all_completions = []
employee_summaries = []
employee_ml_recommendations = []  # Store ML recommendations for summary

for _, employee in employees_df.iterrows():
    employee_id = employee['employee_id']
    employee_type = employee['employee_edu_type']
    
    print(f"Processing Employee {employee_id} (Type {employee_type.upper()})...")
    
    # Get AI recommendations
    ai_recommendations = get_training_recommendations(employee_id)
    
    # Store ML recommendations for this employee
    if ai_recommendations:
        ml_recs = []
        for rec in ai_recommendations:
            ml_recs.append({
                "content_id": rec.get("recommended_content_id"),
                "content_name": rec.get("recommended_content", "Unknown")
            })
        employee_ml_recommendations.append((employee_id, ml_recs))
    
    # Process employee with pre-fetched AI recommendations
    completions = process_employee(employee_id, employee_type, assignments_path, standalone_df, ai_recommendations)
    
    if completions:
        all_completions.extend(completions)
        # Store ContentId, CourseName, and Source for summary
        course_details = [(c['ContentId'], c['CourseName'], c['Source']) for c in completions]
        employee_summaries.append((employee_id, course_details))
        print(f"  Completed {len(completions)} training(s)")
    else:
        print("  No training completed")
    print()

print("=" * 80)

EMPLOYEE TRAINING SIMULATION

Processing Employee 63419 (Type A)...
  API Response for employee 63419:
  63419 | 657908 | Sell. How to Reassure Your Client
  63419 | 594096 | Sell. When Clients Say No

  Total training available: 5 (3 manager + 2 AI)
  Completed 5 training(s)

Processing Employee 63492 (Type B)...
  Total training available: 3 (3 manager + 0 AI)
  Completed 1 training(s)

Processing Employee 75412 (Type B)...
  Total training available: 3 (3 manager + 0 AI)
  Completed 1 training(s)

Processing Employee 85038 (Type B)...
  API Response for employee 85038:
  85038 | 913731 | Sell. How to Multiworld Sell
  85038 | 1717886 | Get - Client Cues

  Total training available: 5 (3 manager + 2 AI)
  Completed 1 training(s)

Processing Employee 86994 (Type F)...
  API Response for employee 86994:
  86994 | 892298 | Fragrance - Get. Give. Teach. Sell.
  86994 | 574327 | Servicing Multiple Clients

  Total training available: 5 (3 manager + 2 AI)
  No training completed

Processin
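
The counts in the output above follow from the type rule in `process_employee`: Type A completes everything, Type B completes the first item of the combined list, and Type F completes nothing. Below is a minimal sketch of that rule in isolation. `select_courses` is a hypothetical helper, not a function defined in this notebook, and its case normalization is an added assumption (the original compares against lowercase `'a'` and `'b'` directly).

```python
from typing import Dict, List


def select_courses(employee_type: str, manager: List[Dict], ai: List[Dict]) -> List[Dict]:
    """Hypothetical helper mirroring the A/B/F rule in process_employee."""
    # Manager assignments come first in the combined list, so a Type B
    # employee with at least one manager assignment completes a manager item.
    combined = manager + ai
    kind = employee_type.strip().lower()  # assumption: tolerate 'A' or ' a '
    if kind == "a":
        return combined
    if kind == "b":
        return combined[:1]
    return []  # Type F, or any other value, completes nothing


# Illustrative data shaped like the run above: 3 manager items + 2 AI items.
manager = [{"source": "manager"} for _ in range(3)]
ai = [{"source": "ai"} for _ in range(2)]

print(len(select_courses("a", manager, ai)))  # 5
print(len(select_courses("b", manager, ai)))  # 1
print(len(select_courses("f", manager, ai)))  # 0
```

One consequence of slicing the combined list is that a Type B employee only completes an AI recommendation when there are no manager assignments at all.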

In [None]:
# Generate output file
import csv

if all_completions:
    output_filename = generate_output_filename()
    output_path = f"{OUTPUT_DIR}/{output_filename}"
    
    # Keep only the columns required in the CSV; CourseName and Source
    # stay in all_completions and are not written to the file
    output_df = pd.DataFrame(all_completions)
    output_df = output_df[['UserId', 'ContentId', 'DateStarted', 'DateCompleted']]
    
    # Write to CSV, quoting every field
    output_df.to_csv(output_path, index=False, quoting=csv.QUOTE_ALL)
    
    print(f"Generated output file: {output_filename}")
    print(f"Total completions: {len(all_completions)}")
    print()
else:
    print("No training completions to write.")
    print()
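
The summary cell that follows groups employees who share an identical set of trainings by using a sorted tuple as a dictionary key. A minimal sketch of that technique, with made-up IDs; `group_by_item_set` is a hypothetical name and does not exist in this notebook.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def group_by_item_set(pairs: Iterable[Tuple[Hashable, Hashable]]) -> Dict[Tuple, List]:
    """Group owners that share exactly the same collection of items."""
    # Step 1: collect the items belonging to each owner.
    items_by_owner = defaultdict(list)
    for owner, item in pairs:
        items_by_owner[owner].append(item)

    # Step 2: use the sorted items as a key. Lists are not hashable, so the
    # sorted list is converted to a tuple. key=str keeps mixed int/str IDs
    # comparable, at the cost of lexicographic rather than numeric order.
    owners_by_items = defaultdict(list)
    for owner, items in items_by_owner.items():
        owners_by_items[tuple(sorted(items, key=str))].append(owner)
    return dict(owners_by_items)


# Illustrative (owner, item) pairs: owners 1 and 2 share the same two items.
pairs = [(1, "b"), (1, "a"), (2, "a"), (2, "b"), (3, "c")]
print(group_by_item_set(pairs))
# {('a', 'b'): [1, 2], ('c',): [3]}
```

Because the sort key is `str`, numeric IDs are ordered as text, which is why an ID such as 1717886 is listed before 913731 in the summary output.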

In [None]:
# Print summary
print("=" * 80)
print("MANAGER-ASSIGNMENTS GIVEN")
print("=" * 80)
print()

# Display all assignments created by the manager, grouped by assignment set
if 'assignments_path' in locals() and os.path.exists(assignments_path):
    assignments_df = pd.read_csv(assignments_path)
    
    if len(assignments_df) > 0:
        # Step 1: Group assignments by employee to get each employee's set of trainings
        employee_assignments = {}
        
        for _, assignment in assignments_df.iterrows():
            employee_id = assignment['UserID']
            content_id = assignment['TrainingElementId']
            
            if employee_id not in employee_assignments:
                employee_assignments[employee_id] = []
            employee_assignments[employee_id].append(content_id)
        
        # Step 2: Group employees by their set of trainings (a sorted tuple serves as the hashable dict key)
        assignment_groups = {}
        
        for employee_id, content_list in employee_assignments.items():
            # Sort content list for consistent ordering and convert to tuple for hashability
            content_set = tuple(sorted(content_list, key=str))
            
            if content_set not in assignment_groups:
                assignment_groups[content_set] = []
            assignment_groups[content_set].append(employee_id)
        
        # Step 3: Print each group of employees with their common assignment set
        for content_set, employee_list in assignment_groups.items():
            print("Training Assignments:")
            
            # Print each training in the set
            for content_id in content_set:
                # Look up content name in standalone_df
                if isinstance(content_id, str):
                    content_id_numeric = int(content_id.replace(',', ''))
                else:
                    content_id_numeric = int(content_id)
                
                # Find the course name
                content_row = standalone_df[
                    (standalone_df['ContentId'] == content_id) |
                    (standalone_df['ContentId'] == str(content_id_numeric))
                ]
                
                if not content_row.empty:
                    course_name = content_row.iloc[0]['ContentName']
                else:
                    course_name = "Unknown"
                
                print(f"  Content ID: {content_id:<15} | Course Name: {course_name}")
            
            # Print employees who received this assignment set (comma-separated)
            employee_ids_str = ", ".join([str(emp_id) for emp_id in sorted(employee_list)])
            print(f"Employees: {employee_ids_str}")
            print()
    else:
        print("No assignments were created by the manager.")
else:
    print("No manager assignments file found.")

print("=" * 80)
print("ML-RECOMMENDATIONS GIVEN")
print("=" * 80)
print()

# Display all ML recommendations given to employees, grouped by recommendation set
if employee_ml_recommendations:
    # Step 1: Group employees by their recommendation set
    recommendation_groups = {}
    
    for employee_id, ml_recs in employee_ml_recommendations:
        # Sort recommendation list for consistent ordering and convert to tuple for hashability
        rec_set = tuple(sorted([(rec["content_id"], rec["content_name"]) for rec in ml_recs], key=lambda x: str(x[0])))
        
        if rec_set not in recommendation_groups:
            recommendation_groups[rec_set] = []
        recommendation_groups[rec_set].append(employee_id)
    
    # Step 2: Print each group of employees with their common recommendation set
    for rec_set, employee_list in recommendation_groups.items():
        print("ML Recommendations:")
        
        # Print each recommendation in the set
        for content_id, content_name in rec_set:
            # str() guards against a missing (None) content ID, which does not accept a width spec
            print(f"  Content ID: {str(content_id):<15} | Course Name: {content_name}")
        
        # Print employees who received this recommendation set (comma-separated)
        employee_ids_str = ", ".join([str(emp_id) for emp_id in sorted(employee_list)])
        print(f"Employees: {employee_ids_str}")
        print()
else:
    print("No ML recommendations were given to any employee.")

print("=" * 80)
print("MANAGER-ASSIGNED TRAINING COMPLETIONS")
print("=" * 80)
print()

# Track if any manager assignments were completed
manager_completions_found = False

# Collect all manager completions for table display
manager_completion_rows = []

for employee_id, course_details in employee_summaries:
    # Filter for manager-assigned training only
    manager_courses = [(content_id, course_name) for content_id, course_name, source in course_details if source == "manager"]
    
    if manager_courses:
        manager_completions_found = True
        for content_id, course_name in manager_courses:
            manager_completion_rows.append((employee_id, content_id, course_name))

if manager_completions_found:
    # Print header
    print(f"{'Employee ID':<15} | {'Content ID':<15} | {'Course Name'}")
    print(f"{'-' * 15} | {'-' * 15} | {'-' * 50}")
    
    # Print each completion on a separate row
    for employee_id, content_id, course_name in manager_completion_rows:
        print(f"{employee_id:<15} | {content_id:<15} | {course_name}")
else:
    print("No manager-assigned training was completed by any employee.")

print()
print("=" * 80)
print("ML-RECOMMENDED TRAINING COMPLETIONS")
print("=" * 80)
print()

# Track if any ML recommendations were completed
ml_completions_found = False

# Collect all ML completions for table display
ml_completion_rows = []

for employee_id, course_details in employee_summaries:
    # Filter for ML-recommended training only
    ml_courses = [(content_id, course_name) for content_id, course_name, source in course_details if source == "ai"]
    
    if ml_courses:
        ml_completions_found = True
        for content_id, course_name in ml_courses:
            ml_completion_rows.append((employee_id, content_id, course_name))

if ml_completions_found:
    # Print header
    print(f"{'Employee ID':<15} | {'Content ID':<15} | {'Course Name'}")
    print(f"{'-' * 15} | {'-' * 15} | {'-' * 50}")
    
    # Print each completion on a separate row
    for employee_id, content_id, course_name in ml_completion_rows:
        print(f"{employee_id:<15} | {content_id:<15} | {course_name}")
else:
    print("No ML-recommended training was completed by any employee.")

print()
print("=" * 80)
print("Simulation complete!")
print("=" * 80)

MANAGER-ASSIGNMENTS GIVEN

Training Assignments:
  Content ID: 1,995,576       | Course Name: December Training Product
  Content ID: 2,021,630       | Course Name: What's Hot For January
  Content ID: 2,024,318       | Course Name: Skincare Consultations with Teens and Tweens
Employees: 63419, 63492, 75412, 85038, 86994, 88563, 104829, 109828, 151557, 155810, 173789, 175342

ML-RECOMMENDATIONS GIVEN

ML Recommendations:
  Content ID: 594096          | Course Name: Sell. When Clients Say No
  Content ID: 657908          | Course Name: Sell. How to Reassure Your Client
Employees: 63419

ML Recommendations:
  Content ID: 1717886         | Course Name: Get - Client Cues
  Content ID: 913731          | Course Name: Sell. How to Multiworld Sell
Employees: 85038

ML Recommendations:
  Content ID: 574327          | Course Name: Servicing Multiple Clients
  Content ID: 892298          | Course Name: Fragrance - Get. Give. Teach. Sell.
Employees: 86994

ML Recommendations:
  Content ID: 1717885