<a href="https://colab.research.google.com/github/VinayNagamallaD9/SmartApplicationPerformanceMonitoring-Auto-Scaling/blob/main/Jupyter.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Problem Statement**

Develop a machine learning system for Smart Application Performance Monitoring and Auto-Scaling. The system should predict resource demand, detect performance anomalies, and automatically scale resources.

## **Data collection and preprocessing**
* Since we don't have real data, we generate synthetic data that resembles application performance metrics.

* We also inject missing values and outliers to make the data more realistic.

* Finally, we clean the data to show how this is done in real situations.




```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Create an hourly timeline covering all of 2024
# ('h' is the hourly alias; the older 'H' is deprecated in recent pandas versions)
date_rng = pd.date_range(start='2024-01-01', end='2024-12-31 23:00:00', freq='h')
df = pd.DataFrame({'timestamp': date_rng}).set_index('timestamp')

# Generate synthetic performance metrics and demand
np.random.seed(15)  # for reproducibility
df['cpu_usage'] = np.random.rand(len(df)) * 80 + 10          # percent, 10-90
df['memory_usage'] = np.random.rand(len(df)) * 70 + 15       # percent, 15-85
df['network_traffic'] = np.random.rand(len(df)) * 500 + 100  # Mbps, 100-600
df['response_time'] = np.random.rand(len(df)) * 200 + 50     # milliseconds, 50-250
df['demand'] = np.random.rand(len(df)) * 1000 + 200          # requests/hour, 200-1200

# Add seasonality and trend to demand
hours = np.arange(len(df))
df['demand'] += 200 * np.sin(hours / (24 * 30) * 2 * np.pi)  # 30-day (roughly monthly) cycle
df['demand'] += hours * 0.05                                  # linear upward trend

metric_cols = ['cpu_usage', 'memory_usage', 'network_traffic', 'response_time', 'demand']

# Blank out 5% of the values in each column to simulate missing data
for col in metric_cols:
    df.loc[df.sample(frac=0.05).index, col] = np.nan

# Introduce outliers in 1% of the rows of each column.
# Note: one multiplier (2 or 0.5) and one offset are drawn per column, so all
# outliers in a given column are distorted the same way; rows already NaN stay NaN.
for col in metric_cols:
    outlier_indices = df.sample(frac=0.01).index
    df.loc[outlier_indices, col] = df.loc[outlier_indices, col] * np.random.choice([2, 0.5]) + np.random.rand() * 500
```




**Data Collection in a Real Scenario**

In practice, metrics are collected with monitoring tools such as Prometheus or Datadog, or extracted from application logs. These tools track metrics such as CPU usage, memory usage, network traffic, and response time. The data is sampled at regular intervals and stored in databases for analysis.
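As a minimal sketch of the ingestion step, the function below converts a response from Prometheus's HTTP range-query endpoint (`/api/v1/query_range`, whose JSON result has `"resultType": "matrix"` and lists samples as `[unix_timestamp, "value"]` pairs) into a pandas Series averaged onto the hourly grid used in this notebook. The function name, the sample payload and the metric name `cpu_usage` are made up for illustration; only the first returned series is used, and fetching the JSON over HTTP is left out.

```python
import pandas as pd

def prometheus_matrix_to_series(payload: dict, name: str) -> pd.Series:
    """Convert the first series of a Prometheus range-query (matrix) response
    into an hourly pandas Series (hypothetical helper)."""
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    result = payload["data"]["result"]
    if not result:
        return pd.Series(dtype=float, name=name)
    values = result[0]["values"]  # list of [unix_seconds, "string value"]
    timestamps = pd.to_datetime([float(t) for t, _ in values], unit="s")
    series = pd.Series([float(v) for _, v in values], index=timestamps, name=name)
    # Average the raw samples within each hour
    return series.resample("h").mean()

# Made-up sample payload in the matrix format (samples at 00:00, 00:30 and 01:00 UTC)
sample = {
    "status": "success",
    "data": {
        "resultType": "matrix",
        "result": [{
            "metric": {"instance": "app-1"},
            "values": [[1704067200, "40"], [1704069000, "60"], [1704070800, "30"]],
        }],
    },
}
cpu = prometheus_matrix_to_series(sample, "cpu_usage")
print(cpu)
```

Averaging to hourly buckets keeps real data on the same regular time grid as the synthetic dataset, so the rest of the pipeline can be reused unchanged.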

## **Data Preprocessing**

In this step we:

* Fill missing values (here, by time-based interpolation) or remove them.

* Handle outliers using statistical rules such as the IQR or Z-score (here, by capping values outside the IQR fences).

* Keep time as the index, with a consistent interval between entries (hourly here).
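The notebook caps outliers with the IQR rule; the Z-score alternative and the regular-interval requirement from the list above can be sketched as follows. The helper names `zscore_cap` and `to_regular_hourly` and the threshold of 3 standard deviations are illustrative choices, not part of the original code.

```python
import numpy as np
import pandas as pd

def zscore_cap(s: pd.Series, z: float = 3.0) -> pd.Series:
    """Clip values lying more than `z` standard deviations from the series mean."""
    mean, std = s.mean(), s.std()
    return s.clip(lower=mean - z * std, upper=mean + z * std)

def to_regular_hourly(frame: pd.DataFrame) -> pd.DataFrame:
    """Sort by time, drop duplicate timestamps and reindex onto a strict hourly grid.
    Missing hours become NaN rows, to be filled afterwards (e.g. by interpolation)."""
    frame = frame.sort_index()
    frame = frame[~frame.index.duplicated(keep='first')]
    return frame.asfreq('h')

# Toy data: twenty normal readings and one extreme spike
s = pd.Series([10.0] * 20 + [1000.0])
capped = zscore_cap(s)
print(capped.iloc[-1])

# Toy data: unsorted, the 02:00 reading is missing and 01:00 appears twice
idx = pd.to_datetime(['2024-01-01 03:00', '2024-01-01 00:00',
                      '2024-01-01 01:00', '2024-01-01 01:00'])
raw = pd.DataFrame({'cpu_usage': [22.0, 20.0, 25.0, 26.0]}, index=idx)
regular = to_regular_hourly(raw)
print(regular)
```

The toy spike shows why the notebook prefers IQR: the outlier inflates the mean and standard deviation it is judged against, so the Z-score cap lands at roughly 705, far above the normal readings of 10, whereas quartiles are barely affected by a few extreme values.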



```python
# Preprocessing
# Fill missing values by time-weighted interpolation
df_preprocessed = df.copy()
df_preprocessed = df_preprocessed.interpolate(method='time')
# Interpolation cannot fill NaNs at the very start of a series, so back-fill any that remain
df_preprocessed = df_preprocessed.bfill()

# Cap outliers at the IQR fences: [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
for col in ['cpu_usage', 'memory_usage', 'network_traffic', 'response_time', 'demand']:
    Q1 = df_preprocessed[col].quantile(0.25)
    Q3 = df_preprocessed[col].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df_preprocessed[col] = df_preprocessed[col].clip(lower=lower_bound, upper=upper_bound)

# Display the preprocessed data
display(df_preprocessed.head())
df_preprocessed.info()  # info() prints its summary itself and returns None, so it is not wrapped in display()
```


## **Time series forecasting**

Next, we implement a time series forecasting model that predicts future resource demand from historical patterns and seasonality.

We use **Prophet**, a library that forecasts time-series data from past observations by modelling trend and seasonal components.
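For reference, Prophet fits an additive model of the form described in its documentation: a trend $g(t)$, seasonal components $s(t)$, holiday effects $h(t)$ and an error term, with each seasonal component approximated by a Fourier series of period $P$ (for the daily seasonality used in this notebook, $P$ = 24 hours):

$$
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
$$

$$
s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
$$

Because the synthetic demand follows a 30-day cycle, which is not one of Prophet's built-in seasonalities (daily, weekly, yearly), adding a custom seasonality with that period is one possible refinement.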

```python
from prophet import Prophet
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np
import matplotlib.pyplot as plt

# Prophet expects two columns: 'ds' (timestamps) and 'y' (the value to forecast)
df_prophet = df_preprocessed.reset_index()[['timestamp', 'demand']].rename(columns={'timestamp': 'ds', 'demand': 'y'})

# Chronological split: first 80% for training, last 20% for testing (no shuffling for time series)
train_size = int(len(df_prophet) * 0.8)
train_df = df_prophet.iloc[:train_size]
test_df = df_prophet.iloc[train_size:]

# Initialize and train the Prophet model
model = Prophet(seasonality_mode='additive', daily_seasonality=True)
model.fit(train_df)

# Forecast exactly the hours covered by the test set
future = model.make_future_dataframe(periods=len(test_df), freq='h', include_history=False)
forecast = model.predict(future)

# Align actual values with predictions by timestamp, then drop rows without a match
evaluation_df = test_df.set_index('ds').join(forecast.set_index('ds')[['yhat']])
evaluation_df = evaluation_df.dropna()

mae = mean_absolute_error(evaluation_df['y'], evaluation_df['yhat'])
mse = mean_squared_error(evaluation_df['y'], evaluation_df['yhat'])
rmse = np.sqrt(mse)

print(f'Mean Absolute Error: {mae:.2f}')
print(f'Mean Squared Error: {mse:.2f}')
print(f'Root Mean Squared Error: {rmse:.2f}')

# Plot the forecast components (trend and seasonalities)
fig1 = model.plot_components(forecast)
plt.show()
```


## **Performance anomaly detection**

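Next, we build a module that detects performance anomalies (e.g., sudden spikes in response time or errors) that might indicate a need for immediate scaling or investigation. A response-time value is flagged as an anomaly when it lies more than 3 standard deviations above or below its 24-hour rolling mean.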


```python
# Calculate the rolling mean and standard deviation over a 24-hour window
window_size = 24
df_preprocessed['response_time_mean'] = df_preprocessed['response_time'].rolling(window=window_size).mean()
df_preprocessed['response_time_std'] = df_preprocessed['response_time'].rolling(window=window_size).std()

# Set upper and lower limits at n_std standard deviations from the rolling mean
n_std = 3
df_preprocessed['response_time_upper_bound'] = df_preprocessed['response_time_mean'] + n_std * df_preprocessed['response_time_std']
df_preprocessed['response_time_lower_bound'] = df_preprocessed['response_time_mean'] - n_std * df_preprocessed['response_time_std']

# Mark points outside these limits as anomalies
df_preprocessed['response_time_anomaly'] = 0
df_preprocessed.loc[df_preprocessed['response_time'] > df_preprocessed['response_time_upper_bound'], 'response_time_anomaly'] = 1
df_preprocessed.loc[df_preprocessed['response_time'] < df_preprocessed['response_time_lower_bound'], 'response_time_anomaly'] = 1

# Display the first 10 rows with all related columns
display(df_preprocessed[['response_time', 'response_time_mean', 'response_time_std',
                         'response_time_upper_bound', 'response_time_lower_bound', 'response_time_anomaly']].head(10))

# Display anomaly rows only
display(df_preprocessed[df_preprocessed['response_time_anomaly'] == 1].head())
```
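In the cell above, the 24-hour rolling window includes the current point, so a spike partly inflates the mean and standard deviation it is compared against. A minimal variant, sketched here as an alternative rather than the notebook's method, computes the statistics from the previous 24 hours only by shifting them one step. The function name `rolling_zscore_anomalies` is hypothetical.

```python
import numpy as np
import pandas as pd

def rolling_zscore_anomalies(s: pd.Series, window: int = 24, n_std: float = 3.0) -> pd.Series:
    """Flag points lying more than n_std rolling standard deviations from the
    rolling mean of the *preceding* `window` observations (current point excluded)."""
    mean = s.rolling(window).mean().shift(1)
    std = s.rolling(window).std().shift(1)
    # The first `window` points have NaN bounds; comparisons with NaN are False, so they are never flagged
    is_anomaly = (s > mean + n_std * std) | (s < mean - n_std * std)
    return is_anomaly.astype(int)

# Toy check: alternating values of 100 and 110 with one large spike at position 30
values = np.tile([100.0, 110.0], 20)
values[30] = 400.0
flags = rolling_zscore_anomalies(pd.Series(values))
print(flags[flags == 1])
```

On the notebook's data this could be applied as `rolling_zscore_anomalies(df_preprocessed['response_time'])`; excluding the current point typically makes isolated spikes easier to detect.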


## **Scaling decision engine**


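Next, we define a scaling decision engine: a function that takes the predicted demand and the response-time anomaly flag as input and returns a scaling action based on predefined thresholds (scale up on any anomaly or when predicted demand exceeds 1,200 requests/hour; scale down below 800). Current resource utilization is not used in this simple version. We then apply the engine to the test-period forecasts and store the results.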




```python
# Decide a scaling action from the predicted demand and the anomaly flag
def scaling_decision_engine(predicted_demand, response_time_anomaly, up=1200, down=800):
    """Return 'scale_up', 'scale_down' or 'no_action'.

    A response-time anomaly always triggers a scale-up; otherwise the predicted
    demand (requests/hour) is compared with the `up` and `down` thresholds.
    """
    if response_time_anomaly == 1:
        return 'scale_up'
    elif predicted_demand > up:
        return 'scale_up'
    elif predicted_demand < down:
        return 'scale_down'
    else:
        return 'no_action'

# Attach the anomaly flag to the test-period forecasts (both are indexed by timestamp)
evaluation_data = evaluation_df.copy()
evaluation_data = evaluation_data.join(df_preprocessed[['response_time_anomaly']])

# Apply the scaling decision to each row
evaluation_data['scaling_action'] = evaluation_data.apply(
    lambda row: scaling_decision_engine(row['yhat'], row['response_time_anomaly']),
    axis=1
)

# Show the results
display(evaluation_data.head())
display(evaluation_data['scaling_action'].value_counts())
```
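`scaling_decision_engine` uses only predicted demand and the anomaly flag. A hypothetical extension that also takes current CPU usage into account, and suppresses rapid back-to-back actions with a cooldown, might look like the sketch below. The function names, the CPU thresholds (80% and 30%) and the 3-step cooldown are illustrative assumptions, not tuned values.

```python
def decide_with_utilization(predicted_demand, anomaly, cpu_usage,
                            up=1200, down=800, cpu_high=80.0, cpu_low=30.0):
    """Hypothetical policy: scale up on an anomaly, high predicted demand or high CPU;
    scale down only when both predicted demand and CPU are low; otherwise do nothing."""
    if anomaly == 1 or predicted_demand > up or cpu_usage > cpu_high:
        return 'scale_up'
    if predicted_demand < down and cpu_usage < cpu_low:
        return 'scale_down'
    return 'no_action'

def apply_cooldown(actions, cooldown=3):
    """Replace any scaling action issued fewer than `cooldown` steps after the
    previous kept scaling action with 'no_action'."""
    result, last_step = [], None
    for step, action in enumerate(actions):
        if action != 'no_action' and last_step is not None and step - last_step < cooldown:
            result.append('no_action')
            continue
        if action != 'no_action':
            last_step = step
        result.append(action)
    return result

smoothed = apply_cooldown(['scale_up', 'scale_up', 'no_action', 'scale_up', 'scale_down'], cooldown=3)
print(smoothed)
```

To try it on the notebook's data, `cpu_usage` can be joined onto `evaluation_data` from `df_preprocessed` the same way the anomaly flag is. Requiring both signals to be low before scaling down is a conservative choice that favours performance over cost.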


## **Scaling execution module**
This module executes the scaling decisions. In production it would interact with the cloud provider's API or an infrastructure management tool to adjust resources; here we simulate the actions by printing them.


```python
# Simulate scaling actions; no real infrastructure is changed
def execute_scaling_action(action):
    if action == 'scale_up':
        print("Scaling up: Adding more resources...")
    elif action == 'scale_down':
        print("Scaling down: Reducing resources...")
    elif action == 'no_action':
        pass
    else:
        print(f"Unknown action: {action}")

# Run the simulation over the test period.
# A plain loop is used because Series.apply would also return (and display) a Series of None values.
print("Running scaling simulation:")
for action in evaluation_data['scaling_action']:
    execute_scaling_action(action)
```
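For a real deployment, the print statements above would be replaced by calls to an infrastructure API. As one hedged example, an AWS Auto Scaling group can be resized through the boto3 Auto Scaling client's `set_desired_capacity` method. The sketch below turns actions into a bounded desired capacity and calls any client that exposes that method; the group name `demo-asg`, the step size and the min/max bounds are hypothetical, and a stand-in client is used so the code runs without AWS credentials.

```python
class DryRunClient:
    """Stand-in with the same method name as boto3's Auto Scaling client; it only records calls."""
    def __init__(self):
        self.calls = []

    def set_desired_capacity(self, **kwargs):
        self.calls.append(kwargs)

def execute_on_asg(action, current, client, group_name="demo-asg",
                   step=1, min_size=1, max_size=10):
    """Translate an action into a new desired capacity clamped to [min_size, max_size],
    call the client only if the capacity actually changes, and return the new capacity."""
    if action == 'scale_up':
        target = current + step
    elif action == 'scale_down':
        target = current - step
    else:
        return current  # 'no_action' or unknown actions leave capacity unchanged
    target = max(min_size, min(max_size, target))
    if target != current:
        client.set_desired_capacity(
            AutoScalingGroupName=group_name,
            DesiredCapacity=target,
            HonorCooldown=True,
        )
    return target

client = DryRunClient()
capacity = 2
for action in ['scale_up', 'no_action', 'scale_down', 'scale_down', 'scale_down']:
    capacity = execute_on_asg(action, capacity, client)
print(capacity, client.calls)
```

With real credentials, `boto3.client('autoscaling')` could be passed instead of `DryRunClient`. Clamping to fixed bounds keeps a burst of anomalies from scaling without limit.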


## **Model evaluation and refinement**

Evaluate the performance of the forecasting and anomaly detection models. Monitor the effectiveness of the auto-scaling system in maintaining desired performance levels and resource utilization. Refine the models and scaling policies based on the evaluation results.


```python
from sklearn.metrics import mean_absolute_error, mean_squared_error
import numpy as np

# Forecasting model: error metrics on the test set
mae = mean_absolute_error(evaluation_df['y'], evaluation_df['yhat'])
mse = mean_squared_error(evaluation_df['y'], evaluation_df['yhat'])
rmse = np.sqrt(mse)

print("Forecasting Model Evaluation Metrics:")
print(f'Mean Absolute Error: {mae:.2f}')
print(f'Mean Squared Error: {mse:.2f}')
print(f'Root Mean Squared Error: {rmse:.2f}')

# Anomaly detection: how many test-period hours were flagged
anomaly_counts = evaluation_data['response_time_anomaly'].value_counts()
print("\nAnomaly Detection Evaluation:")
print(f"Number of detected anomalies: {anomaly_counts.get(1, 0)}")
print("Distribution of anomaly flags:")
print(anomaly_counts)

# Scaling engine: distribution of the chosen actions
scaling_action_counts = evaluation_data['scaling_action'].value_counts()
print("\nScaling Action Distribution:")
print(scaling_action_counts)
```
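The error metrics are easier to judge next to a simple baseline. A common choice for hourly data is a seasonal-naive forecast; the sketch below (not part of the original notebook, helper names hypothetical) repeats the average daily profile of the last `n_seasons` training days across the whole test horizon, so it uses the same information Prophet had at the end of training.

```python
import numpy as np

def seasonal_naive_forecast(train, horizon, season=24, n_seasons=1):
    """Forecast by repeating the average of the last `n_seasons` seasonal cycles
    (n_seasons=1 is the classic seasonal-naive method)."""
    values = np.asarray(train, dtype=float)
    if len(values) < season * n_seasons:
        raise ValueError("not enough history for the requested number of seasons")
    profile = values[-season * n_seasons:].reshape(n_seasons, season).mean(axis=0)
    reps = -(-horizon // season)  # ceiling division
    return np.tile(profile, reps)[:horizon]

def mean_absolute_error_np(actual, predicted):
    """Plain-NumPy MAE, equivalent to sklearn's mean_absolute_error for 1-D inputs."""
    return float(np.mean(np.abs(np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float))))

# Toy check: 48 hours of history, 30-hour horizon, profile averaged over 2 days
toy_forecast = seasonal_naive_forecast(np.arange(48), horizon=30, n_seasons=2)
print(toy_forecast[:3], len(toy_forecast))
```

In the notebook this could be applied as `seasonal_naive_forecast(train_df['y'], len(test_df), n_seasons=7)` and scored with `mean_absolute_error_np(test_df['y'], ...)`. If Prophet's MAE is not clearly lower than the baseline's, the model is adding little beyond the daily pattern.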



**How to tell whether auto-scaling is working well**

Track these key indicators:

1. **Resource usage:** Check CPU, memory, and network utilization to make sure resources are neither wasted nor overloaded.
2. **App speed:** Track response times, especially during peak traffic.
3. **Cost:** Monitor infrastructure spending over time.
4. **Scaling frequency:** See how often the system scales up or down and how quickly it reacts.
5. **Tools:** Use Prometheus, Grafana, or cloud monitoring services to collect and visualize these metrics.
6. **Review and adjust:** Revisit the scaling thresholds weekly or monthly based on observed system performance.
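The scaling-frequency indicator can be quantified directly from the `scaling_action` column. The helper below (name hypothetical) reports how many hours triggered each action, the share of hours with a scaling event, and how often the action changed from one hour to the next, which is a rough indicator of flapping.

```python
import pandas as pd

def scaling_activity_summary(actions: pd.Series) -> dict:
    """Summarize a time-ordered series of scaling actions."""
    counts = actions.value_counts().to_dict()
    scaling_events = int((actions != 'no_action').sum())
    # An action change is any step whose action differs from the previous step's
    changes = int((actions != actions.shift()).iloc[1:].sum())
    return {
        'counts': counts,
        'scaling_events': scaling_events,
        'scaling_rate': scaling_events / len(actions) if len(actions) else 0.0,
        'action_changes': changes,
    }

example = pd.Series(['no_action', 'scale_up', 'scale_up', 'no_action', 'scale_down'])
summary = scaling_activity_summary(example)
print(summary)
```

Applied as `scaling_activity_summary(evaluation_data['scaling_action'])`, a high number of action changes relative to scaling events suggests the thresholds are too close together or a cooldown is needed.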





## **Conclusion**


*   We generated synthetic data that mimics an application's performance: CPU usage, memory usage, network traffic, response time, and user demand.
*   We cleaned the data by interpolating missing values and capping outliers with the IQR rule.
*   We used Prophet to forecast future demand. On the held-out test set (the last 20% of the data), it gave:
    1.   Mean Absolute Error: 67.44 requests/hour
    2.   Mean Squared Error: 6613.51
    3.   Root Mean Squared Error: 81.32 requests/hour
*   We flagged sudden spikes in response time with a rolling 3-standard-deviation rule and found 105 such cases in the test period.
*   A rule-based engine decided when to scale up, scale down, or do nothing. Most of the time it returned "no action"; it scaled up 105 times.
*   We simulated what the scaling actions would look like in practice.
*   Each part of the system was built as a separate module, which makes it easy to understand and change later.







