____
## Anomaly Detection
____

### Assumed processes and their resource limits under normal behaviour
 

| Process          | Max CPU Usage (%) | Max RAM Usage (MB) |
|------------------|-------------------|--------------------|
| Search           | 1                 | 1                  |
| Service Host     | 0.5               | 30                 |
| Runtime Service  | 15                | 200                |
| Application      | 70                | 4096               |
| Sync Service     | 0.7               | 20                 |
| Network Host     | 5                 | 50                 |
| Client Server    | 30                | 150                |
| Container        | 2                 | 100                |
| Train            | 99                | 10240              |


> Note: These limits are not measurements from a real system. They are assumed values, used only to generate the synthetic data for training the ML model.

#### Importing the Libraries

In [1]:
import pandas as pd
import random

#### Generating the synthetic data for ML model training

In [9]:
# Define process names and their normal behavior limits
processes = ['Search', 'Service Host', 'Runtime Service', 'Application', 
             'Sync Service', 'Network Host', 'Client Server', 'Container', 'Train']

normal_limits = {
    'Search': {'max_cpu_usage': 1, 'max_ram_usage': 1},
    'Service Host': {'max_cpu_usage': 0.5, 'max_ram_usage': 30},
    'Runtime Service': {'max_cpu_usage': 15, 'max_ram_usage': 200},
    'Application': {'max_cpu_usage': 70, 'max_ram_usage': 4096},
    'Sync Service': {'max_cpu_usage': 0.7, 'max_ram_usage': 20},
    'Network Host': {'max_cpu_usage': 5, 'max_ram_usage': 50},
    'Client Server': {'max_cpu_usage': 30, 'max_ram_usage': 150},
    'Container': {'max_cpu_usage': 2, 'max_ram_usage': 100},
    'Train': {'max_cpu_usage': 99, 'max_ram_usage': 10240}
}

# Generate synthetic data
rows = 4000
data = []
for _ in range(rows):
    process = random.choice(processes)
    # Baseline sample: drawn uniformly between 1x and 2x the limit listed in the table
    max_cpu_usage = round(random.uniform(normal_limits[process]['max_cpu_usage'], normal_limits[process]['max_cpu_usage'] * 2), 2)
    max_ram_usage = round(random.uniform(normal_limits[process]['max_ram_usage'], normal_limits[process]['max_ram_usage'] * 2), 2)
    
    # Determine behavior
    if random.random() < 0.5:  # 50% chance for abnormal behavior
        max_cpu_usage *= round(random.uniform(2, 4),2)  # Increase CPU usage
        max_ram_usage *= round(random.uniform(2, 4),2)  # Increase RAM usage
        behaviour = 'Abnormal'
    else:
        behaviour = 'Normal'
    
    data.append([process, round(max_cpu_usage,2), round(max_ram_usage), behaviour])

# Create a DataFrame
df = pd.DataFrame(data, columns=['Process', 'max_cpu_usage', 'max_ram_usage', 'Behaviour'])

# Save the DataFrame to an Excel file
df.to_excel('synthetic_data_for_anomaly_detection.xlsx', index=False)
print(f"Data Successfully Generated with {rows} rows")


Data Successfully Generated with 4000 rows
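
The generation cell above uses the global `random` state, so every run produces a different dataset. The sketch below wraps the same sampling rule in a function with its own seeded generator, then checks the class balance. The names `generate_rows`, `LIMITS`, `seed`, and `abnormal_rate` are hypothetical and not part of the original notebook, and only two of the processes are repeated to keep the example short.

```python
import random

import pandas as pd

# Two of the limits from the table, repeated so the sketch is self-contained
LIMITS = {
    'Search': {'max_cpu_usage': 1, 'max_ram_usage': 1},
    'Train': {'max_cpu_usage': 99, 'max_ram_usage': 10240},
}


def generate_rows(n_rows, limits, seed=0, abnormal_rate=0.5):
    """Hypothetical helper: same sampling rule as the cell above, with a private seeded RNG."""
    rng = random.Random(seed)  # private generator, so the global random state is untouched
    names = list(limits)
    data = []
    for _ in range(n_rows):
        process = rng.choice(names)
        cpu_limit = limits[process]['max_cpu_usage']
        ram_limit = limits[process]['max_ram_usage']
        # Baseline sample between 1x and 2x the listed limit
        cpu = round(rng.uniform(cpu_limit, cpu_limit * 2), 2)
        ram = round(rng.uniform(ram_limit, ram_limit * 2), 2)
        behaviour = 'Normal'
        if rng.random() < abnormal_rate:
            # Abnormal rows are scaled by a random factor between 2 and 4
            cpu *= round(rng.uniform(2, 4), 2)
            ram *= round(rng.uniform(2, 4), 2)
            behaviour = 'Abnormal'
        data.append([process, round(cpu, 2), round(ram), behaviour])
    return pd.DataFrame(data, columns=['Process', 'max_cpu_usage', 'max_ram_usage', 'Behaviour'])


sample_a = generate_rows(200, LIMITS, seed=0)
sample_b = generate_rows(200, LIMITS, seed=0)
print(sample_a.equals(sample_b))  # True when the same seed reproduces the same rows
print(sample_a['Behaviour'].value_counts(normalize=True))  # share of each class
```

With a fixed seed, the accuracy figures reported later could be reproduced from run to run, which is not possible with the unseeded cell.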


### Training the ML model using a Random Forest classifier

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import joblib

In [3]:
# Load the synthetic data
df = pd.read_excel('synthetic_data_for_anomaly_detection.xlsx')

# Split the data into features and target
X = df[['max_cpu_usage', 'max_ram_usage']]
y = df['Behaviour']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a simple Random Forest classifier
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate the model
train_accuracy = clf.score(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

print(f"Training Accuracy: {train_accuracy}")
print(f"Testing Accuracy: {test_accuracy}")

# Save the trained model
joblib.dump(clf, 'anomaly_model.pkl')


Training Accuracy: 0.9990625
Testing Accuracy: 0.90625


['anomaly_model.pkl']
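
Accuracy alone does not show which class the model gets wrong, and the gap between training and testing accuracy above suggests some overfitting. A minimal sketch of a per-class evaluation follows. It trains on stand-in data built with NumPy (an assumption, so the example runs without the Excel file), and the names `df_demo` and `clf_demo` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): two overlapping ranges instead of the generated Excel file
rng = np.random.default_rng(42)
n = 500
normal = pd.DataFrame({'max_cpu_usage': rng.uniform(1, 60, n),
                       'max_ram_usage': rng.uniform(1, 4000, n),
                       'Behaviour': 'Normal'})
abnormal = pd.DataFrame({'max_cpu_usage': rng.uniform(30, 200, n),
                         'max_ram_usage': rng.uniform(2000, 16000, n),
                         'Behaviour': 'Abnormal'})
df_demo = pd.concat([normal, abnormal], ignore_index=True)

X = df_demo[['max_cpu_usage', 'max_ram_usage']]
y = df_demo['Behaviour']
# stratify=y keeps the Normal/Abnormal ratio identical in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

clf_demo = RandomForestClassifier(n_estimators=100, random_state=42)
clf_demo.fit(X_train, y_train)
y_pred = clf_demo.predict(X_test)

# Rows are true labels, columns are predicted labels
labels = ['Normal', 'Abnormal']
cm = confusion_matrix(y_test, y_pred, labels=labels)
print(pd.DataFrame(cm,
                   index=[f'true {name}' for name in labels],
                   columns=[f'pred {name}' for name in labels]))

# Precision, recall and F1 for each class
print(classification_report(y_test, y_pred, labels=labels))
```

For an anomaly detector, the recall of the `Abnormal` class is usually the figure to watch, since a missed anomaly tends to cost more than a false alarm.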

### Running inference on sample data

In [4]:
import pandas as pd
import joblib

# Load the saved model
clf = joblib.load('anomaly_model.pkl')

# Define custom data for inference
custom_data = pd.DataFrame({'max_cpu_usage': [20.5, 90.2, 1.8],
                            'max_ram_usage': [150, 4000, 25.6]})

# Make inference with the custom data
predictions = clf.predict(custom_data)

# Display the predictions
for idx, pred in enumerate(predictions):
    print(f"Custom Data {idx + 1}: Predicted Behaviour - {pred}")


Custom Data 1: Predicted Behaviour - Abnormal
Custom Data 2: Predicted Behaviour - Normal
Custom Data 3: Predicted Behaviour - Normal
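
The predictions above are hard labels only. `RandomForestClassifier` also exposes `predict_proba`, which reports the share of trees voting for each class and helps flag borderline rows. The sketch below is illustrative: it trains a small stand-in model on hand-written rows (an assumption, so it runs without `anomaly_model.pkl`), and the names `train` and `clf_demo` are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Small stand-in training set (assumption): hand-written rows, not the generated data
train = pd.DataFrame({
    'max_cpu_usage': [1.2, 0.8, 25.0, 90.0, 150.0, 4.5, 300.0, 60.0],
    'max_ram_usage': [2, 40, 300, 6000, 30000, 80, 25000, 900],
    'Behaviour': ['Normal', 'Normal', 'Normal', 'Normal',
                  'Abnormal', 'Abnormal', 'Abnormal', 'Abnormal'],
})
clf_demo = RandomForestClassifier(n_estimators=50, random_state=0)
clf_demo.fit(train[['max_cpu_usage', 'max_ram_usage']], train['Behaviour'])

# Same custom rows as in the inference cell above
custom_data = pd.DataFrame({'max_cpu_usage': [20.5, 90.2, 1.8],
                            'max_ram_usage': [150, 4000, 25.6]})

# predict_proba returns one column per class, ordered as in clf_demo.classes_
proba = pd.DataFrame(clf_demo.predict_proba(custom_data), columns=clf_demo.classes_)
result = pd.concat([custom_data, proba], axis=1)
result['Predicted'] = clf_demo.predict(custom_data)
print(result)
```

Because the model is trained only on `max_cpu_usage` and `max_ram_usage`, it never sees the process name, so identical numbers receive the same prediction regardless of which process produced them.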
