# 📌 05 – Predicting Anomalies on New Logs

“This notebook loads the trained models and predicts anomalies on new HTTP logs, showing how the detector behaves on traffic it was not trained on.”


*– Imports:*

In [7]:
import sys, os
sys.path.append(os.path.abspath("../src"))

import pandas as pd
import joblib
import tempfile

*– Import the official Apache parser and predict_file():*

In [8]:
from parse_logs import parse_apache_log_lines
from predict import predict_file

print("✔ Apache parser imported")
print("✔ predict_file imported")

✔ Apache parser imported
✔ predict_file imported


⭐ Test example: predict anomalies from sample_access.log

*– Load the log file:*

In [9]:
log_path = "../data/sample_access.log"

with open(log_path, "r", encoding="utf-8", errors="ignore") as f:
    lines = f.readlines()

print("✔ Loaded", len(lines), "log lines")

✔ Loaded 94 log lines
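
The `open()` call above uses `errors="ignore"`, which silently drops any byte that is not valid UTF-8. The sketch below contrasts that with `errors="replace"` on a made-up log line (the line and the `/caf\xe9` path are illustrative assumptions, not taken from `sample_access.log`), in case unusual bytes in a request should stay visible to later steps.

```python
import os
import tempfile

# A hypothetical log line containing a byte (0xE9) that is not valid UTF-8.
raw_line = b'10.0.0.1 - - [01/Jan/2024:00:00:00 +0000] "GET /caf\xe9 HTTP/1.1" 200 12\n'

# Write the bytes to a throwaway file so the example is self-contained.
fd, demo_path = tempfile.mkstemp(suffix=".log")
with os.fdopen(fd, "wb") as f:
    f.write(raw_line)

try:
    # errors="ignore" silently drops the undecodable byte.
    with open(demo_path, "r", encoding="utf-8", errors="ignore") as f:
        ignored = f.read()
    # errors="replace" keeps a visible U+FFFD marker in its place.
    with open(demo_path, "r", encoding="utf-8", errors="replace") as f:
        replaced = f.read()
finally:
    os.remove(demo_path)

print(repr(ignored))
print(repr(replaced))
```

Whether the marker helps or hurts depends on how the feature code treats non-ASCII characters, which this notebook does not show.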


*– Parse into a DataFrame compatible with features.py:*

In [10]:
df_logs = parse_apache_log_lines(lines)
print("Parsed rows:", len(df_logs))
df_logs.head()

Parsed rows: 94


| | method | url | protocol | status | content_length | user_agent | cookie | content_type | body |
|---|---|---|---|---|---|---|---|---|---|
| 0 | GET | /index.jsp | HTTP/1.1 | 200 | 532 | Mozilla/5.0 | | | |
| 1 | GET | /tienda1/publico/anadir.jsp?id=3&nombre=Vino | HTTP/1.1 | 200 | 645 | Mozilla/5.0 | | | |
| 2 | GET | /images/logo.png | HTTP/1.1 | 200 | 1203 | Chrome/120.0 | | | |
| 3 | GET | /products/list | HTTP/1.1 | 200 | 900 | Safari/17.0 | | | |
| 4 | GET | /contact | HTTP/1.1 | 200 | 800 | Firefox/109.0 | | | |
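
The source of `parse_apache_log_lines` is not shown in this notebook. As a rough illustration of what such a parser might do, here is a minimal sketch that assumes the Apache "combined" log format. The function name `parse_combined_lines`, the regular expression, and the sample line are all hypothetical; the project's parser may handle more fields (cookie, body) and more formats.

```python
import re

# Apache "combined" format (assumed):
# host ident user [time] "method url protocol" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<content_length>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_combined_lines(lines):
    """Hypothetical stand-in for parse_apache_log_lines; skips lines that do not match."""
    rows = []
    for line in lines:
        match = COMBINED_RE.match(line.strip())
        if match is None:
            continue
        row = match.groupdict()
        row["status"] = int(row["status"])
        # Apache writes "-" when no bytes were sent; treat that as 0.
        size = row["content_length"]
        row["content_length"] = 0 if size == "-" else int(size)
        rows.append(row)
    return rows

sample = [
    '127.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.jsp HTTP/1.1" 200 532 "-" "Mozilla/5.0"',
    "not a log line",
]
parsed = parse_combined_lines(sample)
print(parsed)
```

Passing the resulting list of dicts to `pd.DataFrame(...)` would give a table with one row per matched line.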


*– Ensure the columns required by build_features() are present:*

In [11]:
required_cols = [
    "url", "method", "body", "user_agent",
    "cookie", "content_type", "content_length"
]

for col in required_cols:
    if col not in df_logs.columns:
        df_logs[col] = ""

# Cast content_length to string so the column has a single dtype before export
df_logs["content_length"] = df_logs["content_length"].astype(str)

df_logs = df_logs[required_cols]

df_logs.head()


| | url | method | body | user_agent | cookie | content_type | content_length |
|---|---|---|---|---|---|---|---|
| 0 | /index.jsp | GET | | Mozilla/5.0 | | | 532 |
| 1 | /tienda1/publico/anadir.jsp?id=3&nombre=Vino | GET | | Mozilla/5.0 | | | 645 |
| 2 | /images/logo.png | GET | | Chrome/120.0 | | | 1203 |
| 3 | /products/list | GET | | Safari/17.0 | | | 900 |
| 4 | /contact | GET | | Firefox/109.0 | | | 800 |
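
The loop-and-select pattern in the cell above can also be written as a single `DataFrame.reindex` call. This is a minimal sketch on a toy frame (`df_demo` and its values are made up for illustration): `reindex(columns=...)` adds any missing column with `fill_value`, drops columns that are not listed, and applies the requested order.

```python
import pandas as pd

REQUIRED_COLS = [
    "url", "method", "body", "user_agent",
    "cookie", "content_type", "content_length",
]

# Toy frame: lacks most required columns and carries an extra one ("status").
df_demo = pd.DataFrame({
    "method": ["GET"],
    "url": ["/index.jsp"],
    "status": [200],
    "content_length": [532],
})

# Missing columns are created and filled with ""; "status" is dropped.
df_aligned = df_demo.reindex(columns=REQUIRED_COLS, fill_value="")
print(df_aligned)
```

Note that `fill_value` only applies to newly created columns; missing values already present in an existing column are left as they are.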


*– Save a temporary CSV, then run predict_file():*

In [12]:
# Reserve a temporary path, then close the handle before writing to it by name
temp_file = tempfile.NamedTemporaryFile(delete=False, suffix=".csv")
temp_file.close()
df_logs.to_csv(temp_file.name, index=False, encoding="utf-8")

print("✔ Temporary CSV created at:")
print(temp_file.name)

results = predict_file(temp_file.name)

print("\n===  Predictions Preview ===")
results.head()

✔ Temporary CSV created at:
C:\Users\ok\AppData\Local\Temp\tmp5jjfslbv.csv

===  Predictions Preview ===


| | url | method | body | user_agent | cookie | content_type | content_length | RF_Prediction | ISO_Prediction | Anomaly |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | /index.jsp | GET | | Mozilla/5.0 | | | 532 | 1 | 0 | 1 |
| 1 | /tienda1/publico/anadir.jsp?id=3&nombre=Vino | GET | | Mozilla/5.0 | | | 645 | 1 | 0 | 1 |
| 2 | /images/logo.png | GET | | Chrome/120.0 | | | 1203 | 1 | 0 | 1 |
| 3 | /products/list | GET | | Safari/17.0 | | | 900 | 1 | 0 | 1 |
| 4 | /contact | GET | | Firefox/109.0 | | | 800 | 1 | 0 | 1 |
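
The prediction cell above creates its CSV with `delete=False` and never removes it, so each run leaves a file in the temp directory. One possible way to tidy this up is a small helper that writes the file, calls the predictor, and always deletes the file afterwards. This is a sketch under assumptions: `run_on_temp_csv` and `count_rows` are hypothetical names, and `count_rows` only stands in for `predict_file`, whose internals are not shown here.

```python
import csv
import os
import tempfile

def run_on_temp_csv(rows, fieldnames, predict_fn):
    """Write rows to a temporary CSV, call predict_fn(path), and always delete the file."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        # Close the handle before passing the path on, so it can be reopened on any OS.
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        return predict_fn(path)
    finally:
        os.remove(path)

seen_paths = []

def count_rows(path):
    """Stand-in for predict_file: records the path and counts the data rows."""
    seen_paths.append(path)
    with open(path, newline="", encoding="utf-8") as f:
        return sum(1 for _ in csv.DictReader(f))

rows = [
    {"url": "/index.jsp", "method": "GET"},
    {"url": "/contact", "method": "GET"},
]
n_rows = run_on_temp_csv(rows, ["url", "method"], count_rows)
print(n_rows, os.path.exists(seen_paths[0]))
```

With a DataFrame, the `csv.DictWriter` part would be replaced by `df_logs.to_csv(path, index=False, encoding="utf-8")` after closing the descriptor, and `predict_fn` would be `predict_file`.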
