### IT Security Monthly Cost Prediction

Imagine you track the following metrics each month:

- NumPhishingEmails: Number of phishing emails detected.
- NumMalwareInfections: Number of malware infections found.
- AverageResponseTime: Average incident response time (in hours).
- SecurityCost: Total security-related costs (in thousands of dollars).

The goal is to test whether **SecurityCost** can be predicted from the other three metrics. Below is a minimal Python script that uses pandas and scikit-learn to fit a linear regression.

In [25]:
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic cybersecurity data
data = {
    'Month':                list(range(1, 13)),  # 1 to 12
    'NumPhishingEmails':    [50, 60, 55, 65, 80, 90, 120, 100, 95, 85, 70, 60],
    'NumMalwareInfections': [5,  7,  6,  8,  10, 12, 15,  14, 11,  9,  8,  6],
    'AverageResponseTime':  [2.0,2.5,2.2,2.8,3.0,3.5,4.0, 3.8,3.6,3.2,2.7,2.4],
    'SecurityCost':         [5,  6,  5.5,7,  8,  9,  11,  10.5,9.5,8.5,7,  6.5]
}
df = pd.DataFrame(data)
df.head()


| | Month | NumPhishingEmails | NumMalwareInfections | AverageResponseTime | SecurityCost |
|---|---|---|---|---|---|
| 0 | 1 | 50 | 5 | 2.0 | 5.0 |
| 1 | 2 | 60 | 7 | 2.5 | 6.0 |
| 2 | 3 | 55 | 6 | 2.2 | 5.5 |
| 3 | 4 | 65 | 8 | 2.8 | 7.0 |
| 4 | 5 | 80 | 10 | 3.0 | 8.0 |


In [27]:
# Features (X) are the explanatory variables
X = df[['NumPhishingEmails', 'NumMalwareInfections', 'AverageResponseTime']]

# Target (y) is the monthly SecurityCost
y = df['SecurityCost']


In [29]:
# Create a LinearRegression model
model = LinearRegression()

# Fit the model on the entire dataset (for demo purposes)
model.fit(X, y)

# Display the learned coefficients
print("Intercept (beta_0):", model.intercept_)
print("Coefficients (beta_1, beta_2, ...):", model.coef_)


Intercept (beta_0): -0.4980357201809076
Coefficients (beta_1, beta_2, ...): [0.02919927 0.00794791 2.00108943]
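With the three features in the order used for `X`, the fitted model has the form SecurityCost ≈ beta_0 + beta_1 · NumPhishingEmails + beta_2 · NumMalwareInfections + beta_3 · AverageResponseTime. As a minimal sketch, the snippet below rebuilds the same data, refits the model, and reproduces the prediction for the first month by hand from `intercept_` and `coef_`. The names `features`, `first_row`, `manual_pred`, and `sklearn_pred` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the notebook above
df = pd.DataFrame({
    'NumPhishingEmails':    [50, 60, 55, 65, 80, 90, 120, 100, 95, 85, 70, 60],
    'NumMalwareInfections': [5, 7, 6, 8, 10, 12, 15, 14, 11, 9, 8, 6],
    'AverageResponseTime':  [2.0, 2.5, 2.2, 2.8, 3.0, 3.5, 4.0, 3.8, 3.6, 3.2, 2.7, 2.4],
    'SecurityCost':         [5, 6, 5.5, 7, 8, 9, 11, 10.5, 9.5, 8.5, 7, 6.5],
})
features = ['NumPhishingEmails', 'NumMalwareInfections', 'AverageResponseTime']
model = LinearRegression().fit(df[features], df['SecurityCost'])

# Feature values for the first month as a plain float array
first_row = df.loc[0, features].to_numpy(dtype=float)

# Manual prediction: intercept plus the dot product of coefficients and features
manual_pred = model.intercept_ + np.dot(model.coef_, first_row)

# The same prediction obtained through scikit-learn
sklearn_pred = model.predict(df[features].iloc[[0]])[0]

print("Manual prediction: ", manual_pred)
print("sklearn prediction:", sklearn_pred)
```

Both values should agree up to floating-point rounding, which confirms that `coef_` follows the column order of `X`.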


In [31]:
# Predict on the training data and compare with the actual costs
y_pred = model.predict(X)
df['PredictedCost'] = y_pred
df[['NumPhishingEmails','NumMalwareInfections','AverageResponseTime','SecurityCost','PredictedCost']].head()


| | NumPhishingEmails | NumMalwareInfections | AverageResponseTime | SecurityCost | PredictedCost |
|---|---|---|---|---|---|
| 0 | 50 | 5 | 2.0 | 5.0 | 5.003846 |
| 1 | 60 | 7 | 2.5 | 6.0 | 6.312279 |
| 2 | 55 | 6 | 2.2 | 5.5 | 5.558008 |
| 3 | 65 | 8 | 2.8 | 7.0 | 7.066550 |
| 4 | 80 | 10 | 3.0 | 8.0 | 7.920653 |
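The predictions above are for months the model has already seen. A fitted model can also score a month that is not in the table. The sketch below assumes a hypothetical month with 75 phishing emails, 9 malware infections, and a 3.0-hour average response time. These values are invented for illustration and were chosen to sit inside the range of the training data.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same synthetic data as in the notebook above
df = pd.DataFrame({
    'NumPhishingEmails':    [50, 60, 55, 65, 80, 90, 120, 100, 95, 85, 70, 60],
    'NumMalwareInfections': [5, 7, 6, 8, 10, 12, 15, 14, 11, 9, 8, 6],
    'AverageResponseTime':  [2.0, 2.5, 2.2, 2.8, 3.0, 3.5, 4.0, 3.8, 3.6, 3.2, 2.7, 2.4],
    'SecurityCost':         [5, 6, 5.5, 7, 8, 9, 11, 10.5, 9.5, 8.5, 7, 6.5],
})
features = ['NumPhishingEmails', 'NumMalwareInfections', 'AverageResponseTime']
model = LinearRegression().fit(df[features], df['SecurityCost'])

# Hypothetical new month (illustrative values, not part of the original data)
new_month = pd.DataFrame({
    'NumPhishingEmails':    [75],
    'NumMalwareInfections': [9],
    'AverageResponseTime':  [3.0],
})

# Column names and order must match the features used for fitting
predicted_cost = model.predict(new_month[features])[0]
print(f"Predicted SecurityCost: {predicted_cost:.2f} thousand dollars")
```

Inputs far outside the observed ranges (for example, 500 phishing emails) would be extrapolation, and a linear fit on 12 points gives little basis for trusting such predictions.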


In [33]:
# Print regression metrics (computed on the training data, so they are optimistic)
from sklearn.metrics import mean_squared_error, r2_score

mse = mean_squared_error(y, y_pred)
r2 = r2_score(y, y_pred)
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)


Mean Squared Error: 0.03961796270619445
R^2 Score: 0.9886071160665162
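Because the model was fit and scored on the same 12 rows, these metrics describe in-sample fit rather than predictive accuracy. One possible check, sketched below, is leave-one-out cross-validation: each month is predicted by a model trained on the other 11. This uses scikit-learn's `LeaveOneOut` and `cross_val_predict`, and the variable names `loo_pred`, `loo_mse`, and `train_mse` are illustrative. It is one reasonable choice for a dataset this small, not the only one.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import LeaveOneOut, cross_val_predict

# Same synthetic data as in the notebook above
df = pd.DataFrame({
    'NumPhishingEmails':    [50, 60, 55, 65, 80, 90, 120, 100, 95, 85, 70, 60],
    'NumMalwareInfections': [5, 7, 6, 8, 10, 12, 15, 14, 11, 9, 8, 6],
    'AverageResponseTime':  [2.0, 2.5, 2.2, 2.8, 3.0, 3.5, 4.0, 3.8, 3.6, 3.2, 2.7, 2.4],
    'SecurityCost':         [5, 6, 5.5, 7, 8, 9, 11, 10.5, 9.5, 8.5, 7, 6.5],
})
X = df[['NumPhishingEmails', 'NumMalwareInfections', 'AverageResponseTime']]
y = df['SecurityCost']

# In-sample error: fit and evaluate on all 12 months
train_mse = mean_squared_error(y, LinearRegression().fit(X, y).predict(X))

# Leave-one-out: each month is predicted by a model that never saw it
loo_pred = cross_val_predict(LinearRegression(), X, y, cv=LeaveOneOut())
loo_mse = mean_squared_error(y, loo_pred)

print("In-sample MSE:    ", train_mse)
print("Leave-one-out MSE:", loo_mse)
```

For ordinary least squares the leave-one-out error is never smaller than the in-sample error, so the gap between the two numbers gives a rough sense of how optimistic the in-sample figure is.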
