<a href="https://colab.research.google.com/github/90rdon/ai-dynamic-scheduling/blob/main/Workforce_Optimizer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Idea

This is a dedicated space for brainstorming innovative ideas on how we can leverage AI to enhance the solutions we provide to Skedulo customers.

AI-Driven Workforce Optimizer Idea
----------------------------------

### **Introduction**

The AI-Driven Workforce Optimizer System is an integrated solution designed to enhance workforce management by automating and optimizing job order assignments. It leverages predictive analytics, real-time availability data, intelligent skill matching, dynamic scheduling, and customer feedback mechanisms to ensure efficient resource allocation, optimal job fit, adaptability to changing demands, and continuous improvement.

### **High-level Architecture Diagram**

Attachment: **workforce-optimizer-architecture-diagram.pdf** (15 Nov 2024, 05:28 PM)

### Key components

1.  **Predictive Analytics for Demand Forecasting**
    
    *   Historical Data Analysis: Utilizes machine learning algorithms to analyze historical job order data and identify patterns in demand.
        
    *   Accurate Forecasting: Generates demand forecasts that anticipate job order volumes and, together with the Resource Availability Management component, optimizes resource availability schedules.
        
2.  **Resource Availability Management**
    
    *   Resource Availability Integration: Directly integrates with Human Resource Management Systems (HRMS) to automatically retrieve and monitor paid time off (PTO), sick leave, holidays, and other absence types.
        
    *   Optimized Availability Schedules: Employs AI algorithms to create optimized resource availability schedules by considering HRMS data, labor regulations, workload patterns, and internal business rules.
        
    *   Real-Time Updates: Continuously monitors HRMS changes to perform availability swaps and automatically adjust schedules in real-time.
        
    *   Centralized Database: Maintains a centralized availability database serving as the single source of truth for resource scheduling.
        
3.  **Intelligent Skill Matching for Optimal Job Assignment**
    
    *   Resource Profiles: Maintains detailed resource data, including skills, certifications, experience, and performance metrics.
        
    *   AI-Driven Matching Algorithms: Utilizes AI algorithms to align job order requirements with the most suitable resources based on qualifications, expertise, and historical performance.
        
4.  **Dynamic Scheduling for Adaptive Resource Allocation**
    
    *   Real-Time Adjustments: Dynamically modifies schedules based on incoming job orders, leveraging data from the Resource Availability Management component.
        
    *   Optimized Allocation: Maximizes efficiency by optimizing resource allocation based on weighted objectives such as minimizing cost, reducing drive time, balancing workloads, and ensuring the best fit for each job.
        
5.  **Customer Feedback Mechanism for Continuous Improvement**
    
    *   Feedback Collection Channels: Integrates chat and email feedback systems to gather customer feedback after service delivery.
        
    *   Automated Feedback Analysis: Employs AI algorithms to analyze feedback for sentiment, common issues, and suggestions.
        
    *   Integration with Continuous Learning: Uses insights from feedback to refine predictive models, improve service quality, and enhance resource training programs.
        
    *   Real-Time Alerts: Flags critical feedback as it arrives, allowing immediate corrective action.
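
The weighted objectives listed under Dynamic Scheduling can be read as a single cost function over candidate assignments. The following is a minimal sketch under stated assumptions, not the optimizer itself: the `Candidate` fields, the weight values, and the linear form of the score are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """One possible (resource, job) pairing. All fields are hypothetical."""
    resource: str
    cost: float            # labor cost for the job, in dollars
    drive_minutes: float   # travel time to the job site
    assigned_hours: float  # hours already on the resource's schedule
    fit: float             # skill-match score in [0, 1], higher is better

# Illustrative weights only; real values would come from business rules.
WEIGHTS = {"cost": 0.1, "drive": 1.0, "workload": 2.0, "fit": 50.0}

def assignment_cost(c, weights=WEIGHTS):
    """Lower is better: penalize cost, drive time, and existing load; reward fit."""
    return (
        weights["cost"] * c.cost
        + weights["drive"] * c.drive_minutes
        + weights["workload"] * c.assigned_hours
        - weights["fit"] * c.fit
    )

def best_candidate(candidates, weights=WEIGHTS):
    """Pick the pairing with the lowest weighted cost."""
    return min(candidates, key=lambda c: assignment_cost(c, weights))

candidates = [
    Candidate("Alice", cost=100, drive_minutes=20, assigned_hours=6, fit=0.9),
    Candidate("Bob", cost=80, drive_minutes=35, assigned_hours=2, fit=0.7),
]
print(best_candidate(candidates).resource)
```

Changing the weights shifts the trade-off: raising `workload` favors lightly loaded resources, while raising `fit` favors the closest skill match even when it costs more.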
        

### Telemedicine Workflow Example:

Scenario: A healthcare organization anticipates increased demand for telemedicine services due to an upcoming flu season, requiring efficient coordination of traveling medical providers.

#### Phase 1: Forecasting and Planning

*   Step 1: Demand Forecasting
    
    *   Data Analysis: The system analyzes historical patient data, seasonal illness trends, and public health reports to predict a surge in telemedicine consultations.
        
    *   Machine Learning Predictions: Algorithms forecast increased demand for specific specialties, such as respiratory care and infectious disease experts, over the next few weeks.
        
*   Step 2: Resource Availability Scheduling
    
    *   Scheduling Adjustments: The forecasted demand is communicated to the Resource Availability Management component.
        
    *   Proactive Staffing: AI algorithms adjust the schedules of traveling medical providers to ensure sufficient coverage during peak times.
        
    *   Regulatory Compliance: The system accounts for licensing requirements, time zones, and telemedicine regulations across different regions.
        
    *   Resource Allocation: Additional resources are allocated to high-demand areas, and telemedicine shifts are offered to qualified providers.
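
The link between Step 1 and Step 2 can be sketched as a forecast that is converted into a staffing requirement. This is only an illustration: a plain average with a seasonal multiplier stands in for the machine learning model, and the function names, the `season_factor` parameter, and the sample numbers are assumptions rather than part of the system described above.

```python
import math
from statistics import mean

def forecast_daily_demand(history, season_factor=1.0):
    """Estimate consultations for a day from comparable past days.

    history: consultation counts for past comparable days (hypothetical data).
    season_factor: multiplier for an expected surge, e.g. 1.3 for +30%.
    """
    return mean(history) * season_factor

def providers_needed(demand, consults_per_provider):
    """Round up so that forecast demand is always covered."""
    return math.ceil(demand / consults_per_provider)

past_mondays = [120, 130, 125, 145]  # made-up historical counts
demand = forecast_daily_demand(past_mondays, season_factor=1.3)
print(demand, providers_needed(demand, consults_per_provider=16))
```

Rounding up rather than to the nearest integer reflects the goal of sufficient coverage during peak times; a real planner would also apply the licensing and time-zone constraints before offering shifts.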
        

#### Phase 2: Appointment Booking and Processing

*   Step 3: Patient Appointment Requests
    
    *   Multiple Booking Channels: Patients schedule telemedicine appointments via online portals, mobile apps, or call centers.
        
    *   Data Collection: Appointment requests include patient symptoms, preferred consultation times, language preferences, and insurance information.
        
    *   Priority Identification: Urgent cases are flagged based on symptom severity for expedited scheduling.
        
*   Step 4: Immediate Analysis
    
    *   Availability Matching: The system checks incoming appointment requests against real-time provider availability and schedules.
        
    *   Resource Assessment: Identifies any gaps in coverage and prompts additional resource allocation if necessary.
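
Priority identification in Step 3 amounts to ordering requests by urgency before they are matched. One way to sketch this, assuming a numeric `severity` score has already been derived from the reported symptoms (how that score is produced is not specified here), is a priority queue:

```python
import heapq
from itertools import count

_seq = count()  # tie-breaker keeps equal-severity requests in arrival order

def enqueue(queue, patient, severity):
    """Add a request; higher severity means more urgent (hypothetical scale)."""
    # heapq is a min-heap, so negate severity to pop the most urgent first.
    heapq.heappush(queue, (-severity, next(_seq), patient))

def next_request(queue):
    """Remove and return the most urgent waiting request."""
    _, _, patient = heapq.heappop(queue)
    return patient

queue = []
enqueue(queue, "routine follow-up", severity=1)
enqueue(queue, "high fever, shortness of breath", severity=5)
enqueue(queue, "mild cough", severity=2)

order = [next_request(queue) for _ in range(3)]
print(order)
```

The arrival counter matters: without it, two requests with the same severity would be compared by patient text, which has nothing to do with who has waited longer.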
        

#### Phase 3: Intelligent Skill Matching and Assignment

*   Step 5: Skill-Based Provider Matching
    
    *   Provider Profile: The system references detailed provider profiles, including specialties (e.g., pediatrics, pulmonology), certifications, language skills, and telemedicine experience.
        
    *   AI-Driven Matching: AI algorithms match patients with the most suitable providers, ensuring that medical needs are effectively addressed. The system incorporates patient preferences such as provider gender, language, and prior consultations to enhance the matching process.
        
*   Step 6: Assignment Proposal
    
    *   Optimized Scheduling: Generates an assignment plan that balances objectives such as minimizing patient wait times, balancing provider workloads, and reducing operational costs.
        
    *   Approval Workflow: The proposed assignments can be reviewed by a scheduler or automatically confirmed based on predefined rules.
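
Step 5 combines hard requirements (specialty, licensing) with soft preferences (language). The sketch below shows one simple way to separate the two: filter on the hard requirements, then order by preference. It is rule-based rather than AI-driven, and the `Provider` and `Request` classes and their fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Provider:
    name: str
    specialties: set
    languages: set
    licensed_states: set

@dataclass
class Request:
    specialty: str
    state: str
    language: str = "English"

def eligible(provider, request):
    """Hard constraints: the provider must cover the specialty and be licensed."""
    return (
        request.specialty in provider.specialties
        and request.state in provider.licensed_states
    )

def match(providers, request):
    """Eligible providers, with speakers of the patient's language listed first."""
    pool = [p for p in providers if eligible(p, request)]
    # False sorts before True, so language matches come first; sort is stable.
    return sorted(pool, key=lambda p: request.language not in p.languages)

providers = [
    Provider("Dr. A", {"pulmonology"}, {"English"}, {"TX"}),
    Provider("Dr. B", {"pulmonology"}, {"English", "Spanish"}, {"TX", "NM"}),
    Provider("Dr. C", {"pediatrics"}, {"Spanish"}, {"TX"}),
]
req = Request("pulmonology", "TX", language="Spanish")
print([p.name for p in match(providers, req)])
```

Keeping preferences out of the filter means a patient is still offered a qualified provider when no one matches every preference.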
        

#### Phase 4: Dynamic Scheduling and Communication

*   Step 7: Real-Time Schedule Updates
    
    *   Dynamic Adjustments: Schedules are updated in real-time to reflect new appointments, cancellations, or changes in provider availability.
        
    *   Conflict Resolution: The system automatically resolves scheduling conflicts and reallocates resources as needed.
        
*   Step 8: Notifications and Confirmations
    
    *   Provider Alerts: Medical professionals receive secure notifications detailing appointment times, patient information, and any special considerations.
        
    *   Patient Confirmations: Patients receive confirmations with appointment details and instructions for accessing the telemedicine platform.
        
    *   Reminders: Automated reminders are sent to both patients and providers to reduce no-show rates.
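
Conflict resolution in Step 7 first requires detecting conflicts. A minimal sketch, assuming appointments are stored as `(provider, start, end)` tuples (a hypothetical representation), is to sort each provider's appointments by start time and compare neighbours:

```python
from datetime import datetime

def overlaps(a_start, a_end, b_start, b_end):
    """Half-open intervals [start, end) overlap if each starts before the other ends."""
    return a_start < b_end and b_start < a_end

def find_conflicts(appointments):
    """Return pairs of consecutive appointments that overlap for the same provider."""
    by_provider = {}
    for appt in appointments:
        by_provider.setdefault(appt[0], []).append(appt)

    conflicts = []
    for appts in by_provider.values():
        appts.sort(key=lambda a: a[1])           # order by start time
        for prev, cur in zip(appts, appts[1:]):  # compare neighbours only
            if overlaps(prev[1], prev[2], cur[1], cur[2]):
                conflicts.append((prev, cur))
    return conflicts

def at(hour, minute=0):
    """Helper for building sample times on an arbitrary date."""
    return datetime(2024, 11, 15, hour, minute)

appointments = [
    ("Dr. A", at(9), at(9, 30)),
    ("Dr. A", at(9, 15), at(9, 45)),  # overlaps the 9:00 appointment
    ("Dr. B", at(9), at(9, 30)),
    ("Dr. A", at(10), at(10, 30)),
]
conflicts = find_conflicts(appointments)
print(conflicts)
```

Comparing neighbours is enough to detect that a provider has a conflict, but it does not list every overlapping pair when one long appointment spans several later ones. How a detected conflict is resolved (reassigning, rescheduling, or escalating to a scheduler) is a separate policy decision.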
        

#### Phase 5: Visitation Execution and Monitoring

*   Step 9: Telemedicine Session Management
    
    *   Secure Access: Providers access the visitation data through a secure, compliant mobile application.
        
    *   EHR Integration: Providers have access to patients' electronic health records (EHRs) during consultations for informed decision-making.
        
    *   Support Availability: Technical support is on standby to assist with any technical issues.
        
*   Step 10: Real-Time Monitoring and Support
    
    *   Session Tracking: The system monitors consultation durations, start and end times, and schedule adherence.
        
    *   Issue Detection: Detects any deviations or delays and alerts the dynamic scheduling system to take corrective actions.
        
    *   Data Recording: Collects data on consultation outcomes for quality assurance and reporting purposes.
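
Issue detection in Step 10 can be as simple as comparing scheduled and actual start times. The sketch below assumes each session is a dictionary with `id`, `scheduled_start`, and `actual_start` keys and uses a five-minute grace period; both the record shape and the threshold are illustrative choices.

```python
from datetime import datetime, timedelta

def late_sessions(sessions, now, grace=timedelta(minutes=5)):
    """Return ids of sessions that should have started but have not."""
    return [
        s["id"]
        for s in sessions
        if s["actual_start"] is None and now > s["scheduled_start"] + grace
    ]

day = datetime(2024, 11, 15)
sessions = [
    {"id": "s1", "scheduled_start": day.replace(hour=9), "actual_start": day.replace(hour=9, minute=1)},
    {"id": "s2", "scheduled_start": day.replace(hour=9), "actual_start": None},
    {"id": "s3", "scheduled_start": day.replace(hour=9, minute=30), "actual_start": None},
]
now = day.replace(hour=9, minute=10)
print(late_sessions(sessions, now))
```

The returned ids are what would be handed to the dynamic scheduling component so it can reassign the patient or notify the provider.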
        

#### Phase 6: Post-Consultation Analysis and Continuous Improvement

*   Step 11: Performance Metrics Collection
    
    *   Provider Feedback: Medical professionals input consultation notes, prescribe treatments, and recommend follow-ups as needed.
        
    *   Patient Feedback: Patients are prompted to complete satisfaction surveys rating their experience.
        
    *   Automated Feedback Analysis: AI algorithms analyze feedback for sentiment, common concerns, and suggestions.
        
*   Step 12: Data Analysis and Model Refinement
    
    *   Trend Analysis: Aggregates data to identify patterns, such as common feedback or frequently asked questions.
        
    *   Model Updates: Uses insights to refine demand forecasting models and improve the accuracy of future predictions.
        
    *   Service Enhancement: Identifies areas where providers may benefit from additional training or resources.
        
    *   Real-Time Alerts: Critical feedback triggers immediate alerts for rapid response and issue resolution.
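
The split between real-time alerts and batch analysis in Step 12 can be sketched with a simple rule. Here a low rating or a flagged phrase stands in for the AI sentiment analysis; the `CRITICAL_TERMS` list, the rating threshold, and the feedback record shape are all assumptions made for the example.

```python
# Hypothetical trigger phrases; a real system would use a trained classifier.
CRITICAL_TERMS = ("wrong medication", "no one joined", "emergency")

def triage_feedback(items, low_rating=2):
    """Split feedback into alerts (act now) and a backlog (analyze later)."""
    alerts, backlog = [], []
    for item in items:
        text = item["comment"].lower()
        critical = item["rating"] <= low_rating or any(
            term in text for term in CRITICAL_TERMS
        )
        (alerts if critical else backlog).append(item)
    return alerts, backlog

feedback = [
    {"rating": 5, "comment": "Great visit, very clear explanation."},
    {"rating": 1, "comment": "Waited 40 minutes past my slot."},
    {"rating": 4, "comment": "I think I was sent the wrong medication."},
]
alerts, backlog = triage_feedback(feedback)
print(len(alerts), len(backlog))
```

The third item shows why text matters alongside the rating: a patient can give a high score and still report something that needs an immediate response.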
        

##### **Morning Scheduling Scenario on a Peak Demand Day:**

*   **6:00 AM:** The system forecasts a 30% increase in telemedicine appointment requests based on early morning booking trends and recent public health alerts.
    
*   **6:05 AM:** Resource Availability Management adjusts schedules, activating standby traveling providers and extending available hours for existing providers willing to take additional shifts.
    
*   **6:10 AM:** Patients continue booking appointments for the day. Urgent care requests are flagged for providers with immediate availability.
    
*   **6:15 AM:** The system matches incoming appointments with the best-suited providers, considering factors like specialization in respiratory illnesses and patient language preferences.
    
*   **6:20 AM:** Providers receive notifications of their updated schedules, including any new appointments and changes to existing ones.
    
*   **7:00 AM to 12:00 PM:** Providers conduct telemedicine visits. The system monitors sessions, checking that they start and end on time, and collects real-time feedback.
    
*   **Throughout the Day:** Any cancellations or no-shows are immediately reflected in the system, allowing for dynamic reallocation of time slots to other waiting patients.
    
*   **Post-Consultations:** Patients receive prompts to provide feedback via chat or email, which the system analyzes to improve service quality.
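
The dynamic reallocation of cancelled slots in this scenario can be sketched with a first-come, first-served waitlist. The slot keys, patient identifiers, and the FIFO policy are assumptions for illustration; a real system might instead choose the waiting patient by urgency or provider fit.

```python
from collections import deque

def reallocate(schedule, waitlist, cancelled_slot):
    """Free a cancelled slot and give it to the next waiting patient, if any."""
    schedule.pop(cancelled_slot, None)
    if waitlist:
        schedule[cancelled_slot] = waitlist.popleft()
    return schedule

schedule = {"09:00": "P1", "09:30": "P2"}  # slot -> patient (made-up ids)
waitlist = deque(["P7", "P8"])

reallocate(schedule, waitlist, "09:30")  # P2 cancels
print(schedule, list(waitlist))
```

If the waitlist is empty the slot simply stays open, which leaves it available for new same-day bookings.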
    

### Conclusion

The AI-Driven Workforce Optimizer System offers an integrated solution for modern workforce management. It automates demand forecasting, resource availability scheduling, skill matching, dynamic scheduling, and customer feedback collection and analysis, with minimal human interaction. This level of automation reduces human error and the need for a sizeable administrative workforce, enabling organizations to achieve greater efficiency, accuracy, and cost savings while delivering services effectively and responsively.

# Architecture Diagram

## Mermaid Diagram

graph LR
    subgraph "External Interfaces"
        direction LR
        subgraph "Work Execution Interfaces"
            TPP["Third-Party Platforms"]
            MOB["Mobile Applications"]
            WEB["Web Applications"]
        end
        subgraph "System Interfaces"
            API["REST APIs & WebSockets"]
            IOT["IoT & Sensors"]
        end
        subgraph "Agent-Computer Interfaces"
            ACI["Intuitive User Interfaces"]
            CNV["Conversational Interfaces"]
            VA["Visual Analytics"]
        end
    end

    subgraph "Event Processing Infrastructure"
        direction LR
        KAF["Kafka Streams"]
        CEP["Complex Event Processor"]
    end

    subgraph "Core Intelligence Services"
        direction LR
        PAE["Predictive Analytics (Work Demand, Resource Utilization)"]
        ISM["Skill Matcher (Resource-Work Matching)"]
        RAM["Resource Manager (Availability, Capacity)"]
        DS["Dynamic Scheduler (Work Assignment, Optimization)"]
        CFM["Feedback Manager (Multi-source Quality Analysis)"]
        AIA["AI Agents (Intelligent Assistance)"]
        RAG["RAG (Retrieval-Augmented Generation)"]
    end

    subgraph "Real-Time Operations"
        direction LR
        RTA["Real-Time Analytics (Event Stream Analytics, Performance Monitoring)"]
        RTM["Real-Time Monitor (Status Tracking, Service Health Monitoring)"]
        ALS["Alert System (SLA Monitoring, Notification Management)"]
    end

    subgraph "Data Persistence"
        direction LR
        OD[("Operational Data
        (Cache, Time Series DB,
        Event Store, Command State)")]
        BD[("Business Data
        (Historical, Resource, Profile,
        Feedback, Knowledge Base)")]
    end

    subgraph "Command Processing"
        direction LR
        RMQ["RabbitMQ (Message Bus)"]
        CMD["Command Handler Service"]
    end

    %% Legend
    subgraph "Legend"
        L1[Events]:::legend
        L2[Commands]:::legend
        L3[Metrics & Feedback]:::legend
        L4[Updates & Queries]:::legend
        L5[Agent Interactions]:::legend
        L1 --- LE1[" "]
        L2 --- LC1[" "]
        L3 --- LM1[" "]
        L4 --- LU1[" "]
        L5 --- LA1[" "]

        style LE1 fill:none,stroke:none
        style LC1 fill:none,stroke:none
        style LM1 fill:none,stroke:none
        style LU1 fill:none,stroke:none
        style LA1 fill:none,stroke:none
    end

    %% Consolidated Flows
    TPP & MOB & WEB & API & IOT -->|Events| KAF
    KAF -->|Events| CEP
    CEP -->|Events| PAE & ISM & RAM & DS & CFM & RTA & RTM & ALS
    PAE & ISM & RAM & DS & CFM & RTA -->|Metrics & Feedback| AIA & RAG
    AIA & RAG -->|Updates & Queries| OD & BD
    DS & ALS & RTM & AIA -->|Commands| RMQ
    RMQ <-->|Commands| CMD
    CMD <-->|State & Events| OD & KAF
    ACI & CNV & VA <-->|Agent Interactions| AIA & RAG

    %% Styling
    classDef interface fill:#f8cecc,stroke:#FF0000
    classDef command fill:#ffe6cc,stroke:#d79b00
    classDef event fill:#e1d5e7,stroke:#9673a6
    classDef core fill:#dae8fc,stroke:#6c8ebf
    classDef realtime fill:#fff2cc,stroke:#d6b656
    classDef storage fill:#d5e8d4,stroke:#82b366
    classDef legend fill:white,stroke:#000,stroke-width:1px

    %% Legend lines (matching their respective colors)
    linkStyle 0 stroke:#4caf50,stroke-width:2px
    linkStyle 1 stroke:#ff9800,stroke-width:2px
    linkStyle 2 stroke:#FF0000,stroke-width:2px
    linkStyle 3 stroke:#2196f3,stroke-width:2px
    linkStyle 4 stroke:#7B1FA2,stroke-width:2px

    %% Link Styling for consolidated flows
    linkStyle 5,6,7,8,9,10,11,12,13,14,15,16,17,18 stroke:#4caf50,stroke-width:2px
    linkStyle 19,20,21,22,23,24,25,26,27,28,29,30 stroke:#FF0000,stroke-width:2px
    linkStyle 31,32,33,34 stroke:#2196f3,stroke-width:2px
    linkStyle 35,36,37,38,39 stroke:#ff9800,stroke-width:2px
    linkStyle 40,41 stroke:#4caf50,stroke-width:2px
    linkStyle 42,43,44,45,46,47 stroke:#7B1FA2,stroke-width:2px

    class TPP,MOB,API,WEB,IOT,ACI,CNV,VA interface
    class RMQ,CMD command
    class KAF,CEP event
    class PAE,RAM,ISM,DS,CFM,AIA,RAG core
    class RTA,RTM,ALS realtime
    class OD,BD storage
    class L1,L2,L3,L4,L5 legend

In [None]:
import base64
import json
import requests
from IPython.display import Image, display

def generate_mermaid_diagram():
    try:
        # # Get the contents of the previous cell
        # previous_cell = In[-2]  # This gets the content of the previous cell

        # # Skip if the previous cell contains Python code
        # if "import" in previous_cell or "def" in previous_cell:
        #     raise ValueError("Previous cell appears to be Python code instead of Mermaid markup")

        mark_down = """
graph TB
    subgraph "External Interfaces"
        direction LR
        subgraph "Work Execution Interfaces"
            TPP["Third-Party Platforms"]
            MOB["Mobile Applications"]
            WEB["Web Applications"]
        end
        subgraph "System Interfaces"
            API["REST APIs & WebSockets"]
            IOT["IoT & Sensors"]
        end
    end

    subgraph "Event Processing Infrastructure"
        direction LR
        KAF["Kafka Streams"]
        CEP["Complex Event Processor"]
    end

    subgraph "Core Intelligence Services"
        direction LR
        PAE["Predictive Analytics (Work Demand, Resource Utilization)"]
        ISM["Skill Matcher (Resource-Work Matching)"]
        RAM["Resource Manager (Availability, Capacity)"]
        DS["Dynamic Scheduler (Work Assignment, Optimization)"]
        CFM["Feedback Manager (Multi-source Quality Analysis)"]
    end

    subgraph "Real-Time Operations"
        direction LR
        RTA["Real-Time Analytics (Event Stream Analytics, Performance Monitoring)"]
        RTM["Real-Time Monitor (Status Tracking, Service Health Monitoring)"]
        ALS["Alert System (SLA Monitoring, Notification Management)"]
    end

    subgraph "Data Persistence"
        direction LR
        subgraph "Operational Data"
            CH[("Cache (Active Work Items)")]
            TS[("Time Series DB (Performance Data)")]
            ES[("Event Store (Log History)")]
            CMDDB[("Command State Store")]
        end
        subgraph "Business Data"
            HD[("Historical Data (Job History)")]
            RAD[("Resource DB (Workforce Data)")]
            RPD[("Profile DB (Skills Matrix)")]
            FD[("Feedback DB (Multi-source Metrics)")]
        end
    end

    subgraph "Command Processing"
        direction LR
        RMQ["RabbitMQ (Message Bus)"]
        CMD["Command Handler Service"]
    end

    %% Legend
    subgraph "Legend"
        L1[Events]:::legend
        L2[Commands]:::legend
        L3[Metrics]:::legend
        L4[Updates]:::legend
        L5[Feedback]:::legend
        L1 --- LE1[" "]
        L2 --- LC1[" "]
        L3 --- LM1[" "]
        L4 --- LU1[" "]
        L5 --- LF1[" "]

        style LE1 fill:none,stroke:none
        style LC1 fill:none,stroke:none
        style LM1 fill:none,stroke:none
        style LU1 fill:none,stroke:none
        style LF1 fill:none,stroke:none
    end

    %% Events Flow (Green #4caf50)
    API <-->|Events| KAF
    MOB & WEB <-->|Events| KAF
    TPP <-->|Events| KAF
    IOT -->|Telemetry| KAF
    KAF --> CEP
    CEP --> RTA
    CEP --> RTM
    CEP --> ALS
    CEP -->|Feedback| CFM

    %% Command Flow (Orange #ff9800)
    DS -->|Commands| RMQ
    ALS -->|Alerts| RMQ
    RTM -->|Notifications| RMQ
    MOB & WEB -->|Commands| RMQ
    RMQ -->|Commands| CMD
    CMD -->|State| CMDDB
    CMD <-->|Read/Update| CMDDB
    CMD -->|Responses| RMQ
    RMQ -->|Responses| MOB & WEB
    CMD -->|Command Events| KAF

    %% Metrics Flow (Purple #9c27b0)
    RTA -->|Metrics| PAE
    CFM -->|Insights| ISM
    CFM -->|Ratings| RAM
    PAE -->|Predictions| ISM

    %% Updates Flow (Blue #2196f3)
    ISM -->|Assignments| DS
    RAM -->|Resources| ISM
    RTM -->|Status| RAM
    ALS <-->|Priority| DS
    DS <-->|Assignments| CH
    RAM <-->|Resources| CH
    ISM <-->|Matching| CH
    DS -->|Schedule| RTM
    PAE -->|Forecast| RAM
    RTM -->|Cache| CH
    RTA <-->|Time Series| TS
    ALS <-->|Events| ES
    CFM <-->|Feedback| FD
    PAE <-->|History| HD
    RAM <-->|Resource| RAD
    ISM <-->|Profile| RPD

    %% Feedback Flow (Red #e91e63)
    MOB & WEB -->|Feedback| KAF

    %% Styling
    classDef interface fill:#f8cecc,stroke:#b85450
    classDef command fill:#ffe6cc,stroke:#d79b00
    classDef event fill:#e1d5e7,stroke:#9673a6
    classDef core fill:#dae8fc,stroke:#6c8ebf
    classDef realtime fill:#fff2cc,stroke:#d6b656
    classDef storage fill:#d5e8d4,stroke:#82b366
    classDef legend fill:white,stroke:#000,stroke-width:1px

    %% Legend lines (matching their respective colors)
    linkStyle 0 stroke:#4caf50,stroke-width:2px
    linkStyle 1 stroke:#ff9800,stroke-width:2px
    linkStyle 2 stroke:#9c27b0,stroke-width:2px
    linkStyle 3 stroke:#2196f3,stroke-width:2px
    linkStyle 4 stroke:#e91e63,stroke-width:2px

    %% Link Styling for all other connections
    linkStyle default stroke:#000000,stroke-width:1px

    class TPP,MOB,API,WEB,IOT interface
    class RMQ,CMD command
    class KAF,CEP event
    class PAE,RAM,ISM,DS,CFM core
    class RTA,RTM,ALS realtime
    class CH,TS,ES,HD,RAD,RPD,FD,CMDDB storage
    class L1,L2,L3,L4,L5 legend
        """

        # The mermaid.ink image rendering endpoint
        mermaid_api_url = "https://mermaid.ink/img/"

        # Encode the Mermaid code
        encoded_diagram = base64.urlsafe_b64encode(mark_down.encode()).decode()

        # Create the full URL
        full_url = f"{mermaid_api_url}{encoded_diagram}"

        # Get the image (time out rather than hang if the service is unreachable)
        response = requests.get(full_url, timeout=30)
        response.raise_for_status()

        # Display the image
        display(Image(response.content))

        print("Diagram generated successfully!")

    except requests.RequestException as e:
        # Network failure or a non-2xx response from the rendering service
        print(f"Error fetching diagram from mermaid.ink: {e}")
    except Exception as e:
        print(f"Error generating diagram: {e}")
        print("\nMermaid markup (first 100 characters):")
        # Only show the first 100 characters to avoid flooding the output
        print(mark_down[:100] + "..." if len(mark_down) > 100 else mark_down)

# Generate the diagram
generate_mermaid_diagram()

<IPython.core.display.Image object>

Diagram generated successfully!


# Sample Code

## Skill Matching

In [None]:
# Step 1: Install Necessary Libraries
!pip install transformers torch wandb -q

# Step 2: Import Libraries
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from transformers import pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import torch
import pandas as pd
from google.colab import userdata
import wandb
import os
import re

# Retrieve the Weights & Biases API key from Colab secrets
WANDB_AI_KEY = userdata.get("WANDB_AI_KEY")
if WANDB_AI_KEY:
    os.environ["WANDB_API_KEY"] = WANDB_AI_KEY
    wandb.login()  # Log in to Weights & Biases

# Step 3: Define Dataset

# Sample data: Job description and worker profiles
data = [
    {"text": "Seeking a certified electrician for complex electrical installations in a residential area.", "label": 1},
    {"text": "Experienced electrician with residential installation expertise and safety certifications.", "label": 1},
    {"text": "General technician with basic knowledge of electrical systems.", "label": 0},
    {"text": "Licensed electrician specializing in commercial installations and troubleshooting.", "label": 1},
    {"text": "Looking for a skilled HVAC technician to perform maintenance and repairs on large systems.", "label": 0},
    {"text": "Electrician with a background in industrial and factory setups.", "label": 1},
    {"text": "Maintenance worker familiar with general electrical troubleshooting and repair tasks.", "label": 0},
    {"text": "Certified electrician with experience in both residential and commercial buildings.", "label": 1},
    {"text": "Assistant electrician with limited experience, mostly on residential projects.", "label": 0},
    {"text": "Technician skilled in HVAC systems with over 5 years of field experience.", "label": 0},
    {"text": "Master electrician with a focus on large-scale commercial electrical installations.", "label": 1},
    {"text": "Industrial electrician skilled in heavy machinery and complex repair work.", "label": 1},
]

# Convert to DataFrame
df = pd.DataFrame(data)

# Step 4: Prepare Data for BERT

# Load the tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Tokenize a single text into fixed-length tensors
def preprocess_data(text):
    return tokenizer(text, padding='max_length', truncation=True, max_length=64, return_tensors="pt")

# Apply tokenization to the DataFrame
encoded_data = df['text'].apply(preprocess_data)

# Convert labels to a tensor
labels = torch.tensor(df['label'].values)

# Step 5: Fine-Tune the Pre-trained Model on Domain-Specific Data

# Load the pre-trained BERT model for sequence classification
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Define a custom training dataset class
class WorkerDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: val[idx] for key, val in self.encodings.items()}
        item['labels'] = self.labels[idx]
        return item

    def __len__(self):
        return len(self.labels)

# Split the data into training and validation sets
train_texts, val_texts, train_labels, val_labels = train_test_split(df['text'], labels, test_size=0.2, random_state=42)
train_encodings = tokenizer(list(train_texts), truncation=True, padding=True, max_length=64, return_tensors="pt")
val_encodings = tokenizer(list(val_texts), truncation=True, padding=True, max_length=64, return_tensors="pt")

train_dataset = WorkerDataset(train_encodings, train_labels)
val_dataset = WorkerDataset(val_encodings, val_labels)

# Define training arguments with reduced batch size and mixed precision
training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy="epoch",
    per_device_train_batch_size=2,   # Reduce batch size if memory issues occur
    per_device_eval_batch_size=2,
    num_train_epochs=5,
    weight_decay=0.01,
    logging_dir='./logs',
    report_to="wandb",
    logging_steps=10,
    fp16=torch.cuda.is_available(),  # Mixed precision only when a CUDA GPU is present; fp16 training fails on CPU
)

# Define trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)

# Train the model
trainer.train()

# Step 6: Define the Ranking System for Worker Profiles

# Example job description
job_description = "Looking for an experienced electrician to troubleshoot and repair complex wiring issues in a commercial building."

# Define a function to rank workers by skill relevance, including years of experience
def rank_workers(job_description, worker_profiles):
    results = []

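    model.eval()  # Disable dropout so scores are deterministic

    for worker in worker_profiles:
        # Tokenize and get the model's relevance score
        inputs = tokenizer(job_description + " " + worker, return_tensors="pt", truncation=True, max_length=64)
        # Move inputs to the model's device (the Trainer may have moved the model to the GPU)
        inputs = {key: val.to(model.device) for key, val in inputs.items()}
        with torch.no_grad():  # No gradients needed for inference
            outputs = model(**inputs)
        base_score = torch.nn.functional.softmax(outputs.logits, dim=1)[0][1].item()  # Confidence of 'relevant' label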

        # Extract years of experience from the worker profile
        experience_match = re.search(r'(\d+)\s+years?', worker.lower())
        years_of_experience = int(experience_match.group(1)) if experience_match else 0

        # Apply a boost based on years of experience and keyword presence
        experience_weight = 0.02  # Adjust this weight factor as needed
        experience_boost = years_of_experience * experience_weight
        keyword_boost = 0.1 if "commercial wiring" in worker.lower() else 0

        # Calculate final score
        final_score = base_score + experience_boost + keyword_boost
        results.append((worker, final_score))

    # Sort workers by relevance score in descending order
    ranked_workers = sorted(results, key=lambda x: x[1], reverse=True)
    return ranked_workers

# Worker profiles to rank
worker_profiles = [
    "Electrician with 5 years of commercial wiring experience.",
    "General maintenance worker with basic electrical knowledge.",
    "Certified electrician with 10 years of experience in commercial and residential wiring.",
    "Apprentice electrician with limited experience in residential wiring.",
    "Experienced technician specializing in HVAC systems.",
    "Electrician with a background in industrial settings and complex repairs."
]

# Rank workers by relevance
ranked_workers = rank_workers(job_description, worker_profiles)

# Output the ranking
print(f"Worker Ranking by Skill Relevance (Highest to Lowest) for {job_description}:")
for worker, score in ranked_workers:
    print(f"Worker: {worker}, Relevance Score: {score:.4f}")

## Schedule Optimizing

In [None]:
# Step 1: Install Necessary Libraries
!pip uninstall -y protobuf tensorflow-metadata grpcio-status
!pip install protobuf==4.21.6 tensorflow-metadata==1.13.1 grpcio-status==1.48.2 ortools requests

# Step 2: Import Libraries
import requests
from datetime import datetime, timedelta
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
import os
from google.colab import userdata

# Step 3: Retrieve the API Key
# Retrieve the API key from user data (ensure it's set in your environment)
SKEDULO_API_KEY = userdata.get('SKEDULO_API_KEY')

# Ensure the API key is provided
if not SKEDULO_API_KEY:
    raise Exception("Skedulo API key not found. Please set 'SKEDULO_API_KEY'.")

# Step 4: Define Objective Weights
# You can adjust these weights to prioritize different objectives
cost_weight = 0.1             # Weight for minimizing total cost
workload_balance_weight = 1.0 # Weight for balancing workloads
distance_weight = 3.0         # Weight for minimizing total distance

# Step 5: Define Resource and Job Classes

# Define the Resource class
class Resource:
    def __init__(self, name, hourly_cost, skillset, home_address, shift_start, shift_end):
        self.name = name
        self.hourly_cost = hourly_cost
        self.skillset = skillset
        self.home_address = home_address
        self.shift_start = shift_start  # Shift start time in hours (e.g., 8 AM)
        self.shift_end = shift_end      # Shift end time in hours (e.g., 6 PM)
        self.lat = None  # Latitude (to be filled after geocoding)
        self.lng = None  # Longitude (to be filled after geocoding)

    def __repr__(self):
        return f"{self.name}, Cost: ${self.hourly_cost}/hr, Skills: {self.skillset}"

# Sample resources
resources = [
    Resource("Alice", 50, "Electrician with 5 years of commercial wiring experience", "100 Congress Ave, Austin, TX", 8, 18),
    Resource("Bob", 40, "Electrician with 3 years in residential and commercial repair", "Texas State Capitol, Austin, TX", 8, 18),
    Resource("Charlie", 60, "Electrician and safety inspector, 10 years in industrial settings", "University of Texas, Austin, TX", 8, 18)
]

# Define the Job class
class Job:
    def __init__(self, job_id, requirements, address, time_constraint_start, time_constraint_end, duration):
        self.job_id = job_id
        self.requirements = requirements
        self.address = address
        self.time_constraint_start = time_constraint_start  # Earliest start time (in hours)
        self.time_constraint_end = time_constraint_end      # Latest end time (in hours)
        self.duration = duration  # Duration in hours
        self.lat = None  # Latitude (to be filled after geocoding)
        self.lng = None  # Longitude (to be filled after geocoding)

    def __repr__(self):
        return f"Job {self.job_id}: {self.requirements}, Location: {self.address}, Duration: {self.duration} hrs"

# Sample jobs
jobs = [
    Job(1, "Experienced electrician to troubleshoot wiring issues in a commercial building", "123 4th St, Austin, TX", 9, 12, 2),
    Job(2, "Residential electrician to fix lighting issues", "500 W 2nd St, Austin, TX", 10, 13, 1.5),
    Job(3, "Commercial wiring for an office", "700 E 7th St, Austin, TX", 12, 15, 3),
    Job(4, "Install new electric panel", "100 E 10th St, Austin, TX", 11, 16, 2.5),
    Job(5, "Inspect wiring in an industrial setting", "300 N Lamar Blvd, Austin, TX", 9, 14, 2),
    Job(6, "Emergency repair on circuit breaker", "800 Lavaca St, Austin, TX", 13, 17, 1),
    Job(7, "Install safety switches", "200 W 1st St, Austin, TX", 9, 11, 1),
    Job(8, "Maintenance on electrical panel in office", "400 Guadalupe St, Austin, TX", 14, 17, 1.5)
]

# Step 6: Geocode Addresses Using Skedulo's Geocoding API

# Function to geocode addresses
def geocode_addresses(addresses):
    url = 'https://api.skedulo.com/geoservices/geocode'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {SKEDULO_API_KEY}'
    }
    body = {
        "addresses": addresses
    }
    response = requests.post(url, headers=headers, json=body)
    result = response.json()
    if response.status_code == 200 and 'result' in result:
        geocoded_locations = []
        for item in result['result']:
            if 'GeocodeSuccess' in item:
                location = item['GeocodeSuccess']['latlng']
                geocoded_locations.append((location['lat'], location['lng']))
            else:
                # Handle GeocodeError or missing data
                geocoded_locations.append((None, None))
        return geocoded_locations
    else:
        print("Geocoding response:", result)
        raise Exception(f"Error geocoding addresses: {result.get('message', 'Unknown error')}")

# Geocode all addresses (homes and jobs)
all_addresses = [resource.home_address for resource in resources] + [job.address for job in jobs]
geocoded_locations = geocode_addresses(all_addresses)

# Assign latitudes and longitudes to resources and jobs
for i, resource in enumerate(resources):
    resource.lat, resource.lng = geocoded_locations[i]

for i, job in enumerate(jobs):
    job.lat, job.lng = geocoded_locations[len(resources) + i]

# Step 7: Prepare Distance and Time Matrices Using Skedulo's Distance Matrix API

# Function to get distance and time matrices
def get_distance_time_matrix(origins, destinations):
    url = 'https://api.skedulo.com/geoservices/distanceMatrix'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {SKEDULO_API_KEY}'
    }
    # Set the departure time to 5 minutes in the future
    departure_time = datetime.utcnow() + timedelta(minutes=5)
    departure_time = departure_time.strftime('%Y-%m-%dT%H:%M:%SZ')
    body = {
        "origins": [{"lat": origin[0], "lng": origin[1]} for origin in origins],
        "destinations": [{"lat": dest[0], "lng": dest[1]} for dest in destinations],
        "departureTime": departure_time
    }
    response = requests.post(url, headers=headers, json=body)
    result = response.json()

    if response.status_code == 200 and 'result' in result and 'matrix' in result['result']:
        distance_matrix = []
        time_matrix = []
        matrix = result['result']['matrix']
        for row in matrix:
            distance_row = []
            time_row = []
            for element in row:
                if element['status'] == 'OK':
                    distance_row.append(element['distance']['distanceInMeters'])
                    time_row.append(element['duration']['durationInSeconds'])
                else:
                    distance_row.append(9999999)
                    time_row.append(9999999)
            distance_matrix.append(distance_row)
            time_matrix.append(time_row)
        return distance_matrix, time_matrix
    else:
        print("Distance Matrix response:", result)
        message = result.get('message', 'Unknown error')
        if 'error' in result:
            message = result['error']
        elif 'errorType' in result:
            message = result['errorType']
        elif 'errors' in result:
            message = result['errors']
        raise Exception(f"Error fetching data from Skedulo Distance Matrix API: {message}")

# Prepare locations
home_locations = [(resource.lat, resource.lng) for resource in resources]
job_locations = [(job.lat, job.lng) for job in jobs]

# Check for None values in locations (indicates geocoding failure)
if any(loc[0] is None or loc[1] is None for loc in home_locations + job_locations):
    raise Exception("One or more addresses could not be geocoded. Please verify all addresses.")

# Collect all necessary matrices
distance_matrices = {}
time_matrices = {}

# Step 8: Get Distance and Time Matrices Between Locations

# Home to Jobs
dist_home_to_jobs, time_home_to_jobs = get_distance_time_matrix(home_locations, job_locations)
distance_matrices['home_to_jobs'] = dist_home_to_jobs
time_matrices['home_to_jobs'] = time_home_to_jobs

# Jobs to Jobs
dist_jobs_to_jobs, time_jobs_to_jobs = get_distance_time_matrix(job_locations, job_locations)
distance_matrices['jobs_to_jobs'] = dist_jobs_to_jobs
time_matrices['jobs_to_jobs'] = time_jobs_to_jobs

# Jobs to Home (for returning)
dist_jobs_to_home, time_jobs_to_home = get_distance_time_matrix(job_locations, home_locations)
distance_matrices['jobs_to_home'] = dist_jobs_to_home
time_matrices['jobs_to_home'] = time_jobs_to_home

# Step 9: Build Combined Distance and Time Matrices

def build_combined_matrices(resources, jobs, distance_matrices, time_matrices):
    num_homes = len(resources)
    num_jobs = len(jobs)
    num_locations = num_homes + num_jobs

    # Initialize full matrices
    distance_matrix = [[0]*num_locations for _ in range(num_locations)]
    time_matrix = [[0]*num_locations for _ in range(num_locations)]

    # Fill in distances and times from homes to jobs
    for i in range(num_homes):
        for j in range(num_jobs):
            distance_matrix[i][num_homes + j] = distance_matrices['home_to_jobs'][i][j]
            time_matrix[i][num_homes + j] = time_matrices['home_to_jobs'][i][j]

    # Fill in distances and times from jobs to jobs
    for i in range(num_jobs):
        for j in range(num_jobs):
            distance_matrix[num_homes + i][num_homes + j] = distance_matrices['jobs_to_jobs'][i][j]
            time_matrix[num_homes + i][num_homes + j] = time_matrices['jobs_to_jobs'][i][j]

    # Fill in distances and times from jobs to homes
    for i in range(num_jobs):
        for j in range(num_homes):
            distance_matrix[num_homes + i][j] = distance_matrices['jobs_to_home'][i][j]
            time_matrix[num_homes + i][j] = time_matrices['jobs_to_home'][i][j]

    # Set large values for homes to homes to prevent travel between homes
    for i in range(num_homes):
        for j in range(num_homes):
            if i != j:
                distance_matrix[i][j] = 9999999
                time_matrix[i][j] = 9999999
            else:
                distance_matrix[i][j] = 0
                time_matrix[i][j] = 0

    return distance_matrix, time_matrix

# Build the combined matrices
distance_matrix, time_matrix = build_combined_matrices(resources, jobs, distance_matrices, time_matrices)

# Step 10: Prepare Service Times and Time Windows

# Create service times (duration) for each location (homes have zero duration)
service_times = [0]*len(resources) + [int(job.duration * 3600) for job in jobs]  # durations in seconds

# Create time windows for each location
time_windows = []

# Time windows for homes (resource shift times)
for resource in resources:
    start = resource.shift_start * 3600  # Convert hours to seconds
    end = resource.shift_end * 3600
    time_windows.append((start, end))

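# Time windows for jobs (applied to the service start time at each job)
for job in jobs:
    start = job.time_constraint_start * 3600
    # time_constraint_end is the latest *end* time, so the latest start is end minus duration
    end = (job.time_constraint_end - job.duration) * 3600
    time_windows.append((start, end))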

# Function to convert seconds since midnight to time of day string
def seconds_to_time(sec):
    # sec is seconds since midnight
    hour = int(sec // 3600)
    minute = int((sec % 3600) // 60)
    am_pm = 'AM' if hour < 12 or hour == 24 else 'PM'
    hour_display = hour % 12
    if hour_display == 0:
        hour_display = 12
    return f"{hour_display}:{minute:02d}{am_pm}"

# Step 11: Create the Data Model for OR-Tools

def create_data_model():
    data = {}
    data['distance_matrix'] = distance_matrix
    data['time_matrix'] = time_matrix
    data['num_resources'] = len(resources)
    data['home_indices'] = list(range(len(resources)))
    data['service_times'] = service_times
    data['time_windows'] = time_windows

    # Calculate average costs for normalization
    data['avg_hourly_cost'] = sum(r.hourly_cost for r in resources) / len(resources)

    # Compute maximum values for normalization, excluding placeholder values
    # (filter per element so rows that contain a placeholder are not skipped entirely)
    data['max_travel_time'] = max(v for row in time_matrix for v in row if v < 9999999)
    data['max_distance'] = max(v for row in distance_matrix for v in row if v < 9999999)

    return data

# Step 12: Define the OR-Tools Scheduling Optimization Function

def or_tools_scheduling_optimization():
    data = create_data_model()
    num_locations = len(data['distance_matrix'])
    num_resources = data['num_resources']
    home_indices = data['home_indices']

    manager = pywrapcp.RoutingIndexManager(num_locations, num_resources, home_indices, home_indices)
    routing = pywrapcp.RoutingModel(manager)

    # Step 12.1: Define Combined Cost Function and Constraints

    # Define separate callbacks for each objective
    def distance_callback(from_index, to_index):
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        return data['distance_matrix'][from_node][to_node]

    def cost_callback(from_index, to_index):
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        travel_time = data['time_matrix'][from_node][to_node]
        if travel_time >= 9999999:
            return 9999999
        time_in_hours = travel_time / 3600.0
        cost = time_in_hours * data['avg_hourly_cost']
        return int(cost * 100)  # Scale up to preserve decimals

    # Register callbacks
    distance_callback_index = routing.RegisterTransitCallback(distance_callback)
    cost_callback_index = routing.RegisterTransitCallback(cost_callback)

    # Add distance dimension
    routing.AddDimension(
        distance_callback_index,
        0,  # null capacity slack
        1000000,  # vehicle maximum distance
        True,  # start cumul to zero
        'Distance'
    )
    distance_dimension = routing.GetDimensionOrDie('Distance')
    distance_dimension.SetGlobalSpanCostCoefficient(int(distance_weight * 100))

    # Add cost dimension
    routing.AddDimension(
        cost_callback_index,
        0,  # null capacity slack
        1000000,  # vehicle maximum cost
        True,  # start cumul to zero
        'Cost'
    )
    cost_dimension = routing.GetDimensionOrDie('Cost')
    cost_dimension.SetGlobalSpanCostCoefficient(int(cost_weight * 100))

    # Step 12.2: Define the Time Callback and Add Time Dimension

    # Add time callback for time windows
    def time_callback(from_index, to_index):
        from_node = manager.IndexToNode(from_index)
        to_node = manager.IndexToNode(to_index)
        travel_time = data['time_matrix'][from_node][to_node]
        service_time = data['service_times'][from_node]
        return travel_time + service_time

    time_callback_index = routing.RegisterTransitCallback(time_callback)

    # Add Time dimension
    routing.AddDimension(
        time_callback_index,
        30,  # Maximum waiting time (slack) at each stop, in seconds
        24 * 3600,  # Maximum time per vehicle
        False,  # Don't force start cumul to zero
        'Time'
    )
    time_dimension = routing.GetDimensionOrDie('Time')

    # Step 12.3: Add Time Window Constraints

    # Add time window constraints
    for location_idx, time_window in enumerate(data['time_windows']):
        if time_window[0] > time_window[1]:
            continue
        index = manager.NodeToIndex(location_idx)
        time_dimension.CumulVar(index).SetRange(int(time_window[0]), int(time_window[1]))

    # Set the time windows for start nodes
    for resource_id in range(num_resources):
        index = routing.Start(resource_id)
        time_window = data['time_windows'][resource_id]
        time_dimension.CumulVar(index).SetRange(int(time_window[0]), int(time_window[1]))

    # Step 12.4: Add Workload Dimension to Balance Workloads

    # Add workload dimension for balancing
    def workload_callback(from_index):
        from_node = manager.IndexToNode(from_index)
        return data['service_times'][from_node]

    workload_callback_index = routing.RegisterUnaryTransitCallback(workload_callback)

    routing.AddDimension(
        workload_callback_index,
        0,          # No slack
        24 * 3600,  # Maximum workload
        True,       # Start cumul to zero
        'Workload'
    )
    workload_dimension = routing.GetDimensionOrDie('Workload')
    workload_dimension.SetGlobalSpanCostCoefficient(int(workload_balance_weight * 100))

    # Step 12.5: Allow Dropping Nodes with Penalties

    # Allow dropping jobs with penalties
    penalty = 1000000
    for node in range(len(resources), num_locations):
        routing.AddDisjunction([manager.NodeToIndex(node)], penalty)

    # Step 12.6: Define Search Parameters and Solve the Problem

    # Define search parameters
    search_parameters = pywrapcp.DefaultRoutingSearchParameters()
    search_parameters.first_solution_strategy = (
        routing_enums_pb2.FirstSolutionStrategy.PATH_CHEAPEST_ARC)
    search_parameters.local_search_metaheuristic = (
        routing_enums_pb2.LocalSearchMetaheuristic.GUIDED_LOCAL_SEARCH)
    search_parameters.time_limit.seconds = 60  # Increase time limit
    search_parameters.solution_limit = 200  # Increase solution limit

    # Solve the problem
    solution = routing.SolveWithParameters(search_parameters)

    # Check if a solution was found
    if solution:
        total_cost_all_resources = 0
        total_time_all_resources = 0
        total_travel_time_all_resources = 0
        total_travel_distance_all_resources = 0

        for resource_id in range(num_resources):
            index = routing.Start(resource_id)
            plan_output = f'Route for {resources[resource_id].name}:\n'
            previous_index = None
            total_travel_time = 0    # Initialize total travel time
            total_travel_distance = 0  # Initialize total travel distance
            while not routing.IsEnd(index):
                node_index = manager.IndexToNode(index)
                time_var = time_dimension.CumulVar(index)
                arrival_time = solution.Min(time_var)
                time_str = seconds_to_time(arrival_time)
                if node_index < len(home_indices):
                    if index == routing.Start(resource_id):
                        location = f"Home ({resources[resource_id].home_address}) Start Time({time_str})"
                    else:
                        location = f"Home ({resources[resource_id].home_address}) Arrival Time({time_str})"
                else:
                    job_index = node_index - len(home_indices)
                    location = f"Job {jobs[job_index].job_id} ({jobs[job_index].address}) Arrival Time({time_str})"
                plan_output += f"{location} -> "
                # Calculate travel time and distance from previous node to current node
                if previous_index is not None:
                    prev_node_index = manager.IndexToNode(previous_index)
                    travel_time = data['time_matrix'][prev_node_index][node_index]
                    travel_distance = data['distance_matrix'][prev_node_index][node_index]
                    total_travel_time += travel_time
                    total_travel_distance += travel_distance
                previous_index = index
                index = solution.Value(routing.NextVar(index))

            # Add end node
            time_var = time_dimension.CumulVar(index)
            arrival_time = solution.Min(time_var)
            time_str = seconds_to_time(arrival_time)
            plan_output += f"End Time({time_str})\n"

            # Calculate total time and cost for the resource
            start_time = solution.Min(time_dimension.CumulVar(routing.Start(resource_id)))
            end_time = solution.Min(time_dimension.CumulVar(routing.End(resource_id)))
            total_time = end_time - start_time

            # Use Decimal for precise calculations
            from decimal import Decimal, ROUND_HALF_UP
            total_work_time_hours = Decimal(total_time) / Decimal(3600)
            total_work_time_hours = total_work_time_hours.quantize(Decimal('0.01'), rounding=ROUND_HALF_UP)
            total_cost = resources[resource_id].hourly_cost * float(total_work_time_hours)
            total_cost = round(total_cost, 2)

            total_travel_time_hours = total_travel_time / 3600.0  # Convert to hours

            # Convert total travel distance from meters to kilometers and miles
            total_travel_distance_km = total_travel_distance / 1000.0
            total_travel_distance_miles = total_travel_distance_km * 0.621371

            # Append total travel time and distance to the output
            plan_output += f"Total Travel Time for {resources[resource_id].name}: {total_travel_time_hours:.2f} hours\n"
            plan_output += f"Total Travel Distance for {resources[resource_id].name}: {total_travel_distance_miles:.2f} miles ({total_travel_distance_km:.2f} km)\n"
            plan_output += f"Total Time for {resources[resource_id].name}: {total_work_time_hours} hours\n"
            plan_output += f"Total Cost for {resources[resource_id].name}: ${total_cost:.2f}\n"

            # Accumulate totals
            total_cost_all_resources += total_cost
            total_time_all_resources += float(total_work_time_hours) * 3600  # Convert hours back to seconds
            total_travel_time_all_resources += total_travel_time
            total_travel_distance_all_resources += total_travel_distance

            print(plan_output)

        # Convert accumulated totals to appropriate units
        total_time_all_hours = total_time_all_resources / 3600.0
        total_travel_time_all_hours = total_travel_time_all_resources / 3600.0
        total_travel_distance_all_km = total_travel_distance_all_resources / 1000.0
        total_travel_distance_all_miles = total_travel_distance_all_km * 0.621371

        print(f"Total cost for all resources: ${total_cost_all_resources:.2f}")
        print(f"Total time for all resources: {total_time_all_hours:.2f} hours")
        print(f"Total travel time for all resources: {total_travel_time_all_hours:.2f} hours")
        print(f"Total travel distance for all resources: {total_travel_distance_all_miles:.2f} miles ({total_travel_distance_all_km:.2f} km)\n")

        # List of unperformed nodes (dropped jobs)
        dropped_jobs = []
        for node in range(len(resources), num_locations):
            index = manager.NodeToIndex(node)
            if solution.Value(routing.NextVar(index)) == index:
                job_index = node - len(resources)
                dropped_jobs.append(jobs[job_index].job_id)
        if dropped_jobs:
            print(f"Jobs not assigned due to constraints: {dropped_jobs}")
        else:
            print("All jobs were assigned.")
    else:
        print('No solution found!')


# Step 13: Run the Scheduling Optimization

# or_tools_scheduling_optimization()

def run_optimization_with_weights(cost_w, workload_w, distance_w):
    global cost_weight, workload_balance_weight, distance_weight
    cost_weight = cost_w
    workload_balance_weight = workload_w
    distance_weight = distance_w
    print("\nRunning optimization with weights:")
    print(f"Cost weight: {cost_weight}")
    print(f"Workload balance weight: {workload_balance_weight}")
    print(f"Distance weight: {distance_weight}\n")
    or_tools_scheduling_optimization()

# Test different weight combinations
print("Testing optimization with distance priority:")
run_optimization_with_weights(0.1, 1.0, 3.0)

print("\nTesting optimization with cost priority:")
run_optimization_with_weights(3.0, 1.0, 0.1)
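
The node layout that `build_combined_matrices` relies on (homes first, then jobs) can be illustrated with toy numbers. The sketch below is a simplified, standalone re-implementation for illustration only; the `combine` helper, the `BLOCKED` name, and the sample values are assumptions and are not output from the Skedulo API.

```python
# Toy illustration (hypothetical numbers) of the combined-matrix layout:
# homes occupy indices 0..num_homes-1, jobs occupy num_homes..num_homes+num_jobs-1.
BLOCKED = 9999999  # same placeholder value the notebook uses for forbidden arcs


def combine(home_to_jobs, jobs_to_jobs, jobs_to_home):
    """Assemble one square matrix from the three rectangular blocks."""
    num_homes = len(home_to_jobs)
    num_jobs = len(jobs_to_jobs)
    n = num_homes + num_jobs
    full = [[0] * n for _ in range(n)]
    for i in range(num_homes):
        for j in range(num_homes):
            if i != j:
                full[i][j] = BLOCKED  # no travel between different homes
        for j in range(num_jobs):
            full[i][num_homes + j] = home_to_jobs[i][j]
    for i in range(num_jobs):
        for j in range(num_jobs):
            full[num_homes + i][num_homes + j] = jobs_to_jobs[i][j]
        for j in range(num_homes):
            full[num_homes + i][j] = jobs_to_home[i][j]
    return full


home_to_jobs = [[10, 20], [30, 40]]  # 2 homes x 2 jobs
jobs_to_jobs = [[0, 5], [6, 0]]      # 2 jobs x 2 jobs
jobs_to_home = [[11, 31], [21, 41]]  # 2 jobs x 2 homes

full = combine(home_to_jobs, jobs_to_jobs, jobs_to_home)
for row in full:
    print(row)
```

Each printed row corresponds to one origin node: the first two rows are the homes and the last two are the jobs, so an entry such as `full[0][2]` is the value from home 0 to job 0.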


## Reinforcement Learning Agent - Dynamic Event Handling
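
The environment in this section encodes every (resource, job) pair as one flat action index, with one extra index reserved for "no assignment". A minimal sketch of that decoding is shown below, assuming the same `action // num_jobs` and `action % num_jobs` convention used in `SchedulingEnv.step`; the `decode_action` helper is hypothetical and exists only for illustration.

```python
def decode_action(action, num_resources, num_jobs):
    """Map a flat action index to (resource_idx, job_idx), or None for 'no assignment'."""
    no_assignment = num_resources * num_jobs  # the last index is the no-op action
    if not 0 <= action <= no_assignment:
        raise ValueError(f"Action {action} is outside the action space")
    if action == no_assignment:
        return None
    # divmod returns (action // num_jobs, action % num_jobs)
    return divmod(action, num_jobs)


# With 3 resources and 8 jobs, the action space has 3 * 8 + 1 = 25 entries.
print(decode_action(10, 3, 8))  # (1, 2): resource 1, job 2
print(decode_action(24, 3, 8))  # None: the "no assignment" action

# After a 'new_job' event there are 9 jobs, so the same index means a different pair.
print(decode_action(10, 3, 9))  # (1, 1): resource 1, job 1
```

Because the meaning of an action index depends on the number of jobs, adding or cancelling a job changes the whole mapping, which is one reason the environment rebuilds its spaces and state after such events.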



In [None]:
# Step 1: Install Necessary Libraries
print("Step 1: Install Necessary Libraries")
!pip install ortools requests gymnasium stable-baselines3 shimmy -q

# Step 2: Import Libraries
print("Step 2: Import Libraries")
import requests
from datetime import datetime, timedelta
from ortools.constraint_solver import routing_enums_pb2
from ortools.constraint_solver import pywrapcp
import os
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy
import numpy as np
from google.colab import userdata

# Step 3: Retrieve the API Key
print("Step 3: Retrieve the API Key")
# Retrieve the API key from user data (ensure it's set in your environment)
SKEDULO_API_KEY = userdata.get('SKEDULO_API_KEY')

# Ensure the API key is provided
if not SKEDULO_API_KEY:
    raise Exception("Skedulo API key not found. Please set 'SKEDULO_API_KEY'.")

# Step 4: Define Objective Weights
print("Step 4: Define Objective Weights")
# You can adjust these weights to prioritize different objectives
cost_weight = 0.1             # Weight for minimizing total cost
workload_balance_weight = 1.0 # Weight for balancing workloads
distance_weight = 3.0         # Weight for minimizing total distance

# Step 5: Define Resource and Job Classes
print("Step 5: Define Resource and Job Classes")
# Define the Resource class
class Resource:
    def __init__(self, name, hourly_cost, skillset, home_address, shift_start, shift_end):
        self.name = name
        self.hourly_cost = hourly_cost
        self.skillset = skillset
        self.home_address = home_address
        self.shift_start = shift_start  # Shift start time in hours (e.g., 8 AM)
        self.shift_end = shift_end      # Shift end time in hours (e.g., 6 PM)
        self.lat = None  # Latitude (to be filled after geocoding)
        self.lng = None  # Longitude (to be filled after geocoding)

    def __repr__(self):
        return f"{self.name}, Cost: ${self.hourly_cost}/hr, Skills: {self.skillset}"

# Sample resources
resources = [
    Resource("Alice", 50, "Electrician with 5 years of commercial wiring experience", "100 Congress Ave, Austin, TX", 8, 18),
    Resource("Bob", 40, "Electrician with 3 years in residential and commercial repair", "Texas State Capitol, Austin, TX", 8, 18),
    Resource("Charlie", 60, "Electrician and safety inspector, 10 years in industrial settings", "University of Texas, Austin, TX", 8, 18)
]

# Define the Job class
class Job:
    def __init__(self, job_id, requirements, address, time_constraint_start, time_constraint_end, duration):
        self.job_id = job_id
        self.requirements = requirements
        self.address = address
        self.time_constraint_start = time_constraint_start  # Earliest start time (in hours)
        self.time_constraint_end = time_constraint_end      # Latest end time (in hours)
        self.duration = duration  # Duration in hours
        self.lat = None  # Latitude (to be filled after geocoding)
        self.lng = None  # Longitude (to be filled after geocoding)

    def __repr__(self):
        return f"Job {self.job_id}: {self.requirements}, Location: {self.address}, Duration: {self.duration} hrs"

# Sample jobs
jobs = [
    Job(1, "Experienced electrician to troubleshoot wiring issues in a commercial building", "123 4th St, Austin, TX", 9, 12, 2),
    Job(2, "Residential electrician to fix lighting issues", "500 W 2nd St, Austin, TX", 10, 13, 1.5),
    Job(3, "Commercial wiring for an office", "700 E 7th St, Austin, TX", 12, 15, 3),
    Job(4, "Install new electric panel", "100 E 10th St, Austin, TX", 11, 16, 2.5),
    Job(5, "Inspect wiring in an industrial setting", "300 N Lamar Blvd, Austin, TX", 9, 14, 2),
    Job(6, "Emergency repair on circuit breaker", "800 Lavaca St, Austin, TX", 13, 17, 1),
    Job(7, "Install safety switches", "200 W 1st St, Austin, TX", 9, 11, 1),
    Job(8, "Maintenance on electrical panel in office", "400 Guadalupe St, Austin, TX", 14, 17, 1.5)
]

# Step 6: Geocode Addresses Using Skedulo's Geocoding API
print("Step 6: Geocode Addresses Using Skedulo's Geocoding API")
# Function to geocode addresses
def geocode_addresses(addresses):
    url = 'https://api.skedulo.com/geoservices/geocode'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {SKEDULO_API_KEY}'
    }
    body = {
        "addresses": addresses
    }
    response = requests.post(url, headers=headers, json=body)
    result = response.json()
    if response.status_code == 200 and 'result' in result:
        geocoded_locations = []
        for item in result['result']:
            if 'GeocodeSuccess' in item:
                location = item['GeocodeSuccess']['latlng']
                geocoded_locations.append((location['lat'], location['lng']))
            else:
                # Handle GeocodeError or missing data
                geocoded_locations.append((None, None))
        return geocoded_locations
    else:
        print("Geocoding response:", result)
        raise Exception(f"Error geocoding addresses: {result.get('message', 'Unknown error')}")

# Geocode all addresses (homes and jobs)
all_addresses = [resource.home_address for resource in resources] + [job.address for job in jobs]
geocoded_locations = geocode_addresses(all_addresses)

# Assign latitudes and longitudes to resources and jobs
for i, resource in enumerate(resources):
    resource.lat, resource.lng = geocoded_locations[i]

for i, job in enumerate(jobs):
    job.lat, job.lng = geocoded_locations[len(resources) + i]

# Step 7: Prepare Distance and Time Matrices Using Skedulo's Distance Matrix API
print("Step 7: Prepare Distance and Time Matrices Using Skedulo's Distance Matrix API")
# Function to get distance and time matrices
def get_distance_time_matrix(origins, destinations):
    url = 'https://api.skedulo.com/geoservices/distanceMatrix'
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {SKEDULO_API_KEY}'
    }
    # Set the departure time to 5 minutes in the future
    departure_time = datetime.utcnow() + timedelta(minutes=5)
    departure_time = departure_time.strftime('%Y-%m-%dT%H:%M:%SZ')
    body = {
        "origins": [{"lat": origin[0], "lng": origin[1]} for origin in origins],
        "destinations": [{"lat": dest[0], "lng": dest[1]} for dest in destinations],
        "departureTime": departure_time
    }
    response = requests.post(url, headers=headers, json=body)
    result = response.json()

    if response.status_code == 200 and 'result' in result and 'matrix' in result['result']:
        distance_matrix = []
        time_matrix = []
        matrix = result['result']['matrix']
        for row in matrix:
            distance_row = []
            time_row = []
            for element in row:
                if element['status'] == 'OK':
                    distance_row.append(element['distance']['distanceInMeters'])
                    time_row.append(element['duration']['durationInSeconds'])
                else:
                    distance_row.append(9999999)
                    time_row.append(9999999)
            distance_matrix.append(distance_row)
            time_matrix.append(time_row)
        return distance_matrix, time_matrix
    else:
        print("Distance Matrix response:", result)
        message = result.get('message', 'Unknown error')
        if 'error' in result:
            message = result['error']
        elif 'errorType' in result:
            message = result['errorType']
        elif 'errors' in result:
            message = result['errors']
        raise Exception(f"Error fetching data from Skedulo Distance Matrix API: {message}")

# Prepare locations
home_locations = [(resource.lat, resource.lng) for resource in resources]
job_locations = [(job.lat, job.lng) for job in jobs]

# Check for None values in locations (indicates geocoding failure)
if any(loc[0] is None or loc[1] is None for loc in home_locations + job_locations):
    raise Exception("One or more addresses could not be geocoded. Please verify all addresses.")

# Collect all necessary matrices
distance_matrices = {}
time_matrices = {}

# Home to Jobs
dist_home_to_jobs, time_home_to_jobs = get_distance_time_matrix(home_locations, job_locations)
distance_matrices['home_to_jobs'] = dist_home_to_jobs
time_matrices['home_to_jobs'] = time_home_to_jobs

# Jobs to Jobs
dist_jobs_to_jobs, time_jobs_to_jobs = get_distance_time_matrix(job_locations, job_locations)
distance_matrices['jobs_to_jobs'] = dist_jobs_to_jobs
time_matrices['jobs_to_jobs'] = time_jobs_to_jobs

# Jobs to Home (for returning)
dist_jobs_to_home, time_jobs_to_home = get_distance_time_matrix(job_locations, home_locations)
distance_matrices['jobs_to_home'] = dist_jobs_to_home
time_matrices['jobs_to_home'] = time_jobs_to_home

# Step 8: Define the Custom Environment for RL
print("Step 8: Define the Custom Environment for RL")
class SchedulingEnv(gym.Env):
    def __init__(self, resources, jobs, distance_matrices, time_matrices):
        super(SchedulingEnv, self).__init__()
        self.resources = resources
        self.jobs = jobs
        self.distance_matrices = distance_matrices
        self.time_matrices = time_matrices
        self.num_resources = len(resources)
        self.num_jobs = len(jobs)
        self.action_space = gym.spaces.Discrete(self.num_resources * self.num_jobs + 1)  # +1 for the "no assignment" action
        # Per-resource assignment counts can exceed 1, so the upper bound is left open
        self.observation_space = gym.spaces.Box(low=0, high=np.inf, shape=(self.num_resources + self.num_jobs,), dtype=np.float32)
        self.state = self._get_initial_state()

    def _get_initial_state(self):
        # Initialize the state with zeros, matching the observation space dtype
        return np.zeros(self.num_resources + self.num_jobs, dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)  # Seed the environment's RNG as the Gymnasium API expects
        self.state = self._get_initial_state()
        return self.state, {}  # Return the initial state and an empty info dictionary

    def step(self, action):
        # Implement the step function to update the state based on the action
        # Calculate the reward and determine if the episode is done
        # Return the new state, reward, terminated, truncated, and additional info
        done = False
        reward = 0
        info = {}

        if action < self.num_resources * self.num_jobs:
            resource_idx = action // self.num_jobs
            job_idx = action % self.num_jobs
            self.state[resource_idx] += 1
            self.state[self.num_resources + job_idx] = 1
            reward = self._calculate_reward(resource_idx, job_idx)
        else:
            # No assignment action
            pass

        if np.sum(self.state[self.num_resources:]) == self.num_jobs:
            done = True

        terminated = done  # For now, we assume terminated and done are the same
        truncated = False  # For now, we assume no truncation

        return self.state, reward, terminated, truncated, info

    def _calculate_reward(self, resource_idx, job_idx):
        # Calculate the reward based on the current assignment
        resource = self.resources[resource_idx]
        job = self.jobs[job_idx]
        distance = self._calculate_distance(resource, job)
        cost = self._calculate_cost(resource, job)
        workload_balance = self._calculate_workload_balance()
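        # distance_weight, cost_weight and workload_balance_weight are read from module scope
        # Every term is a penalty, so the reward is the negative weighted sum
        return -distance * distance_weight - cost * cost_weight - workload_balance * workload_balance_weight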

    def _calculate_distance(self, resource, job):
        # Straight-line distance in degrees of latitude/longitude (an approximation, not a road or great-circle distance)
        return np.sqrt((resource.lat - job.lat)**2 + (resource.lng - job.lng)**2)

    def _calculate_cost(self, resource, job):
        # Calculate the cost of assigning the job to the resource
        return resource.hourly_cost * job.duration

    def _calculate_workload_balance(self):
        # Calculate the workload balance among resources
        workload = self.state[:self.num_resources]
        return np.std(workload)

    def handle_dynamic_event(self, event_type, **kwargs):
        reset_required = False
        if event_type == 'new_job':
            new_job = kwargs['new_job']
            self.jobs.append(new_job)
            self.num_jobs += 1
            reset_required = True
        elif event_type == 'cancel_job':
            job_id = kwargs['job_id']
            self.jobs = [job for job in self.jobs if job.job_id != job_id]
            self.num_jobs = len(self.jobs)
            reset_required = True
        elif event_type == 'reschedule_job':
            job_id = kwargs['job_id']
            new_job = kwargs['new_job']
            for i, job in enumerate(self.jobs):
                if job.job_id == job_id:
                    self.jobs[i] = new_job
                    break
        else:
            raise ValueError(f"Unknown event type: {event_type}")

        if reset_required:
            self.action_space = gym.spaces.Discrete(self.num_resources * self.num_jobs + 1)
            # Per-resource assignment counts can exceed 1, so the upper bound is left open
            self.observation_space = gym.spaces.Box(low=0, high=np.inf, shape=(self.num_resources + self.num_jobs,), dtype=np.float32)
            self.state = self._get_initial_state()

        return reset_required
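
# Illustrative sketch (not part of the original pipeline): how the flat Discrete action
# used in SchedulingEnv.step maps to a (resource, job) pair. The helper name
# `decode_action` is hypothetical and exists only to make the index arithmetic explicit.
def decode_action(action, num_resources, num_jobs):
    """Return (resource_idx, job_idx), or None for the trailing "no assignment" action."""
    if action >= num_resources * num_jobs:
        return None
    # Same arithmetic as step(): quotient is the resource, remainder is the job
    return divmod(action, num_jobs)

# With 2 resources and 3 jobs there are 7 actions: 0..5 are assignments, 6 is "no assignment"
print([decode_action(a, 2, 3) for a in range(7)])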

# Step 9: Train the RL Agent
print("Step 9: Train the RL Agent")
def train_rl_agent(env):
    model = PPO('MlpPolicy', env, verbose=1)
    model.learn(total_timesteps=50000)
    return model

# Step 10: Evaluate the RL Agent
print("Step 10: Evaluate the RL Agent")
def evaluate_rl_agent(model, env, num_episodes=10):
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=num_episodes)
    print(f"Mean reward: {mean_reward:.2f} +/- {std_reward:.2f}")
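
# Illustrative sketch (an assumption, not part of the original pipeline): the two numbers
# printed above are the mean and the population standard deviation of the per-episode
# total rewards. The helper name `summarize_episode_rewards` and the sample values are
# hypothetical.
import numpy as np

def summarize_episode_rewards(episode_rewards):
    """Return (mean, std) for a sequence of per-episode total rewards."""
    rewards = np.asarray(episode_rewards, dtype=np.float64)
    return float(rewards.mean()), float(rewards.std())

example_mean, example_std = summarize_episode_rewards([-10.0, -12.0, -14.0])
print(f"Example summary: {example_mean:.2f} +/- {example_std:.2f}")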

# Step 11: Run the RL Agent
print("Step 11: Run the RL Agent")
env = DummyVecEnv([lambda: SchedulingEnv(resources, jobs, distance_matrices, time_matrices)])
print("Environment created")
model = train_rl_agent(env)
print("Model trained")
evaluate_rl_agent(model, env)
print("Model evaluated")

# Step 12: Handle Dynamic Events and Re-optimize
print("Step 12: Handle Dynamic Events and Re-optimize")
def handle_dynamic_events(env, model):
    print("Handling dynamic events")

    def rebuild(env):
        # Adding or removing a job resizes the action and observation spaces.
        # DummyVecEnv allocates its observation buffers at construction and the policy
        # network has fixed input/output sizes, so both are recreated here.
        base_env = env.envs[0]
        new_env = DummyVecEnv([lambda: base_env])
        return new_env, train_rl_agent(new_env)

    def reoptimize(env, model):
        # Roll out the policy until every job has been assigned
        obs = env.reset()
        done = False
        while not done:
            action, _states = model.predict(obs)
            obs, rewards, dones, infos = env.step(action)
            done = dones[0]  # Single-environment VecEnv, so only the first flag matters
            env.render()
            print(f"Observation: {obs}, Reward: {rewards}, Done: {done}, Observation Space Shape: {env.observation_space.shape}")

    # Example of adding a new job
    new_job = Job(9, "Emergency repair on circuit breaker", "900 Congress Ave, Austin, TX", 14, 17, 1)
    if env.envs[0].handle_dynamic_event('new_job', new_job=new_job):
        print("New job added, rebuilding environment and retraining")
        env, model = rebuild(env)
    reoptimize(env, model)

    # Example of canceling a job
    job_id_to_cancel = 3
    if env.envs[0].handle_dynamic_event('cancel_job', job_id=job_id_to_cancel):
        print("Job canceled, rebuilding environment and retraining")
        env, model = rebuild(env)
    reoptimize(env, model)

    # Example of rescheduling a job (the number of jobs is unchanged, so no rebuild is needed)
    job_id_to_reschedule = 5
    rescheduled_job = Job(5, "Inspect wiring in an industrial setting", "300 N Lamar Blvd, Austin, TX", 10, 15, 2)
    env.envs[0].handle_dynamic_event('reschedule_job', job_id=job_id_to_reschedule, new_job=rescheduled_job)
    print("Job rescheduled")
    reoptimize(env, model)

# Step 13: Run the Dynamic Event Handling and Re-optimization
print("Step 13: Run the Dynamic Event Handling and Re-optimization")
handle_dynamic_events(env, model)
print("Dynamic events handled and re-optimization completed")

