# Interactive Machine Data Analysis
This notebook demonstrates how to run LLMAD anomaly detection on the Machine dataset.

### 1. Load Environment Variables

First, we load the necessary environment variables from the `.env` file, such as API keys and the model engine to be used.

```python
import os
from dotenv import load_dotenv

# Load environment variables from the .env file
load_dotenv()

# Check which model engine was loaded
print(f"Using Model: {os.getenv('MODEL_ENGINE')}")
```
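
If a required variable is missing, `os.getenv` returns `None` without complaint, and the problem only surfaces later inside `src/main.py`. A minimal sketch of failing early instead, assuming `MODEL_ENGINE` is the variable you need to check (add whichever API-key variable your provider expects). `require_env` is a hypothetical helper, not part of LLMAD:

```python
import os


def require_env(names):
    """Return a dict of the requested environment variables.

    Hypothetical helper: raises RuntimeError listing every variable that
    is unset or empty, so misconfiguration is caught before inference.
    """
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Call it right after `load_dotenv()`, for example `config = require_env(["MODEL_ENGINE"])`.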

### 2. Set Experiment Parameters

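Configure the parameters for the analysis: window size, prompt mode, data and output paths, the value and label columns, and the retrieval settings.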

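```python
WINDOW_SIZE = 100                  # Number of time steps per window
PROMPT_MODE = 5                    # Prompt mode used for the Machine dataset
DATA_ROOT_DIR = "data/air"         # Data directory, used for both inference and retrieval
SAVE_DIR = "result/machine_test"   # Root directory for results
VALUE_COL = "param1"               # Column holding the sensor values
LABEL_COL = "label"                # Column holding the ground-truth anomaly labels
TEST_RATIO = 1.0                   # Test split ratio (1.0 = use all of the data)
RETRIEVE_POSITIVE_NUM = 0          # Number of positive (anomalous) examples to retrieve
RETRIEVE_NEGATIVE_NUM = 0          # Number of negative (normal) examples to retrieve
DATA_DESCRIPTION = "The data contains sensor readings from a machine."
```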
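
`WINDOW_SIZE` controls how many consecutive points are analyzed together. The actual windowing (stride, treatment of the final partial window) lives in `src/main.py` and is not shown in this notebook; the sketch below only assumes a simple sliding window with a configurable stride to illustrate how a series is cut up. `make_windows` is a hypothetical helper:

```python
def make_windows(values, window_size, stride=None):
    """Split a sequence into fixed-length windows (illustrative only).

    stride defaults to window_size, giving non-overlapping windows.
    A trailing window shorter than window_size is dropped.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    stride = window_size if stride is None else stride
    if stride <= 0:
        raise ValueError("stride must be positive")
    return [
        list(values[start:start + window_size])
        for start in range(0, len(values) - window_size + 1, stride)
    ]
```

Under these assumptions, 1,000 points with `WINDOW_SIZE = 100` and the default stride yield 10 windows; a smaller stride produces overlapping windows.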

```python
import datetime

# Get model name for suffix from environment variables
model_engine_name = os.getenv("MODEL_ENGINE", "unknown_model")
# Extract the main part of the model name for the suffix (e.g., 'gpt' or 'gemini')
model_suffix = model_engine_name.split('-')[0]
timestamp = datetime.datetime.now().strftime("%Y%m%d%H%M")

# Construct the run name for organization
RUN_NAME = f"Machine_interactive_prompt_{PROMPT_MODE}_win_{WINDOW_SIZE}_{model_suffix}_{timestamp}"

# Create the target directory for results
RESULT_DIR = os.path.join(SAVE_DIR, RUN_NAME)
os.makedirs(RESULT_DIR, exist_ok=True)

print(f"Results will be saved in: {RESULT_DIR}")
```
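
The suffix logic keeps everything before the first `-`, so a provider-prefixed engine name such as `openai/gpt-4o` (an assumed example) would give `openai/gpt`, and `os.makedirs` would then create a nested directory. A sketch of the same naming scheme with the suffix sanitized; `make_run_name` is a hypothetical helper, and passing a fixed `now` makes the run name reproducible:

```python
import datetime
import re


def make_run_name(prompt_mode, window_size, model_engine, now=None):
    """Build a run name in the same format as the cell above.

    Characters outside [A-Za-z0-9_.] in the model suffix (such as '/')
    are replaced with '_' so the run name stays one directory component.
    """
    now = now or datetime.datetime.now()
    suffix = re.sub(r"[^A-Za-z0-9_.]", "_", model_engine.split("-")[0])
    timestamp = now.strftime("%Y%m%d%H%M")
    return f"Machine_interactive_prompt_{prompt_mode}_win_{window_size}_{suffix}_{timestamp}"
```

For the default case this produces exactly the same name as the original cell, for example `make_run_name(PROMPT_MODE, WINDOW_SIZE, model_engine_name)`.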

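### 3. Run Inference

Run LLMAD on the data by calling `src/main.py` with the parameters defined above. The evaluation and visualization steps read its output from `RESULT_DIR`.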

```python
!python src/main.py \
    --infer_data_path {DATA_ROOT_DIR} \
    --retreive_data_path {DATA_ROOT_DIR} \
    --sub_company all \
    --window_size {WINDOW_SIZE} \
    --prompt_mode {PROMPT_MODE} \
    --result_save_dir {SAVE_DIR} \
    --run_name {RUN_NAME} \
    --value_col {VALUE_COL} \
    --label_col {LABEL_COL} \
    --prompt_extra_cols Machine Stage \
    --data_description "{DATA_DESCRIPTION}" \
    --no_affine_transform \
    --retrieve_positive_num {RETRIEVE_POSITIVE_NUM} \
    --retrieve_negative_num {RETRIEVE_NEGATIVE_NUM} \
    --cross_retrieve False \
    --test_ratio {TEST_RATIO}
```
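
The `!` shell escape relies on IPython substituting `{...}` into a shell string, so a `DATA_DESCRIPTION` containing double quotes would break the command. A sketch of building the same invocation as an argument list for `subprocess.run`, which bypasses the shell. The flags are copied verbatim from the cell above (including `--retreive_data_path` as spelled there); `build_main_args` is a hypothetical helper:

```python
import sys


def build_main_args(data_root_dir, window_size, prompt_mode, save_dir, run_name,
                    value_col, label_col, data_description, test_ratio,
                    retrieve_positive_num=0, retrieve_negative_num=0):
    """Return the src/main.py command line as a list of strings."""
    return [
        sys.executable, "src/main.py",  # run with the kernel's own interpreter
        "--infer_data_path", data_root_dir,
        "--retreive_data_path", data_root_dir,
        "--sub_company", "all",
        "--window_size", str(window_size),
        "--prompt_mode", str(prompt_mode),
        "--result_save_dir", save_dir,
        "--run_name", run_name,
        "--value_col", value_col,
        "--label_col", label_col,
        "--prompt_extra_cols", "Machine", "Stage",
        "--data_description", data_description,  # passed as one argument, no quoting needed
        "--no_affine_transform",
        "--retrieve_positive_num", str(retrieve_positive_num),
        "--retrieve_negative_num", str(retrieve_negative_num),
        "--cross_retrieve", "False",
        "--test_ratio", str(test_ratio),
    ]
```

Run it with `subprocess.run(build_main_args(DATA_ROOT_DIR, WINDOW_SIZE, PROMPT_MODE, SAVE_DIR, RUN_NAME, VALUE_COL, LABEL_COL, DATA_DESCRIPTION, TEST_RATIO, RETRIEVE_POSITIVE_NUM, RETRIEVE_NEGATIVE_NUM), check=True)`. With `check=True`, a non-zero exit raises `CalledProcessError`, so the later cells are not run against missing results. Using `sys.executable` runs the script with the kernel's interpreter rather than whichever `python` comes first on `PATH`.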

### 4. Evaluate Metrics

After the detection process is complete, run the evaluation script to calculate performance metrics.

```python
!python Eval/Eval_machine.py --path {RESULT_DIR}
```
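
The internals of `Eval/Eval_machine.py` are not shown in this notebook. For a quick sanity check on a single `predict.csv`, a minimal sketch of point-wise precision, recall, and F1 over the `label` and `predict` columns. This is plain point-wise scoring and not necessarily what the evaluation script reports; `pointwise_prf` is a hypothetical helper:

```python
def pointwise_prf(labels, preds):
    """Point-wise precision, recall and F1 for binary 0/1 sequences.

    Returns 0.0 for any metric whose denominator is zero.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For a loaded result file, call `pointwise_prf(df['label'].tolist(), df['predict'].tolist())`.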

### 5. Visualize the Results Directly

The cell below visualizes the results inside this notebook. It recursively searches the result directory for `predict.csv` files and, for each one, plots the value series with predicted anomalies (red dots) and ground-truth anomalies (green crosses).

```python
import os
import glob

import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display
%matplotlib inline

# Recursively find all predict.csv files under the result directory
predict_files = glob.glob(os.path.join(RESULT_DIR, '**', 'predict.csv'), recursive=True)

print(f"Searching for result files in: {RESULT_DIR}")
print(f"Found {len(predict_files)} result file(s) to visualize.")

for file_path in predict_files:
    try:
        df = pd.read_csv(file_path)
        if df.empty:
            print(f"Skipping empty file: {file_path}")
            continue

        # Use the name of the directory containing predict.csv as the series name
        file_name = os.path.basename(os.path.dirname(file_path))

        # Plot the 'value' column if present, otherwise fall back to VALUE_COL
        value_col = 'value' if 'value' in df.columns else VALUE_COL

        plt.figure(figsize=(15, 5))
        plt.plot(df.index, df[value_col], label='Value', alpha=0.7, color='blue')

        # Mark predicted anomalies (predict == 1) with red dots
        if 'predict' in df.columns:
            anomalies = df[df['predict'] == 1]
            plt.scatter(anomalies.index, anomalies[value_col], color='red',
                        label='Predicted Anomaly', zorder=5)

        # Mark ground-truth anomalies (label == 1) with green crosses
        if 'label' in df.columns:
            true_anomalies = df[df['label'] == 1]
            plt.scatter(true_anomalies.index, true_anomalies[value_col], marker='x',
                        color='green', label='True Anomaly', s=50, zorder=6)

        # Add Machine/Stage metadata to the title when those columns exist
        # (df is non-empty here, so iloc[0] is safe)
        title = f'Anomaly Detection Results: {file_name}'
        if 'Machine' in df.columns:
            title += f" | Machine: {df['Machine'].iloc[0]}"
        if 'Stage' in df.columns:
            title += f" | Stage: {df['Stage'].iloc[0]}"

        plt.title(title)
        plt.legend()
        plt.tight_layout()
        plt.show()

        print(f"First 5 rows of {file_name}:")
        display(df.head())

    except Exception as e:
        print(f"Error visualizing {file_path}: {e}")
```
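
When the search finds many result files, a per-file summary can be easier to scan than one plot per file. A sketch, assuming each `predict.csv` has the 0/1 `predict` and `label` columns used in the plotting cell; `summarize_predictions` is a hypothetical helper that takes the loaded DataFrames keyed by series name:

```python
import pandas as pd

SUMMARY_COLUMNS = ["series", "points", "predicted_anomalies", "true_anomalies", "overlap"]


def summarize_predictions(frames):
    """Count predicted and true anomaly points for each named DataFrame.

    frames maps a series name to a DataFrame with 0/1 'predict' and 'label'
    columns. Returns one row per series, sorted by name.
    """
    rows = []
    for name, df in frames.items():
        pred = df["predict"] == 1
        true = df["label"] == 1
        rows.append({
            "series": name,
            "points": len(df),
            "predicted_anomalies": int(pred.sum()),
            "true_anomalies": int(true.sum()),
            "overlap": int((pred & true).sum()),  # points both predicted and labeled anomalous
        })
    # Passing columns explicitly keeps the schema even when frames is empty
    summary = pd.DataFrame(rows, columns=SUMMARY_COLUMNS)
    return summary.sort_values("series").reset_index(drop=True)
```

To reuse the naming from the plotting cell: `summarize_predictions({os.path.basename(os.path.dirname(p)): pd.read_csv(p) for p in predict_files})`.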