# Notebook Summary
**Goal:** 
Apply anomaly detection with Prophet to Betacom data. 

For a column of interest, e.g. **peak_upload_speed** from Customer-DFW, we use past upload data to make a prediction for each data point, and detect whether the actual value falls above the prediction's upper bound or below its lower bound. At the end, graphs of the labeled anomalies are plotted.
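
The labeling rule described above can be sketched in plain Python. This is only a minimal illustration of the rule, not the notebook's implementation; `label_point` and its argument names are hypothetical, and the example numbers are made up.

```python
def label_point(actual, lower, upper):
    """Return 1 above the upper bound, -1 below the lower bound, else 0."""
    if actual > upper:
        return 1
    if actual < lower:
        return -1
    return 0


# Made-up (actual, lower, upper) triples for illustration only
examples = [(50, 10, 40), (5, 10, 40), (25, 10, 40)]
labels = [label_point(actual, lower, upper) for actual, lower, upper in examples]
print(labels)
```

A value that sits exactly on a bound is treated as normal, because both comparisons are strict.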



**Prophet Algorithm Documentation:**

Link: https://facebook.github.io/prophet/docs/quick_start.html



**Input 3 Datasets:** 
- Gyan-Database: core_stats (RAN data)
- Gyan-Database: randomized core_stats
- Prometheus Database: **discarded** for now due to low data quality

**Output:**
1. Anomaly graphs generated for each column.
2. Predictions for the featured DataFrame




**Notebook Outline:**
1. Functions using Prophet

 * fit_predict_model:
     - Build a Prophet Model
     - Fit the model
     - Make prediction 
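 * detect_anomalies: 
     - Label data points that fall outside the prediction interval
 * plot_anomalies:
     - Plot the prediction interval, the actual values, and the labeled anomalies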

2. Retrieve Data from CORE_stats customer data
3. Retrieve Data from CORE_stats randomized Data
4. Apply prediction and anomaly labeling for multiple columns and save graphs

## Prophet Functions

In [4]:
from prophet import Prophet


def fit_predict_model(dataframe, interval_width=0.99, changepoint_range=0.8):
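    '''
    Input: dataframe with the two columns Prophet expects:
           'ds' (timestamp) and 'y' (observed value).

    Output: the forecast dataframe returned by Prophet for the same
            timestamps, plus a 'fact' column holding the observed values.
    '''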

    m = Prophet(daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True,
                seasonality_mode='multiplicative',
                interval_width=interval_width,
                changepoint_range=changepoint_range)
    m = m.fit(dataframe)
    forecast = m.predict(dataframe)
    forecast['fact'] = dataframe['y'].reset_index(drop=True)
    return forecast

def detect_anomalies(forecast):
    '''
    What it does: labels each data point as an anomaly when the actual value
    is greater than the upper bound of the prediction (1) or smaller than
    the lower bound of the prediction (-1); all other points are labeled 0.

    Input: forecast dataframe from the Prophet model.
    Output: forecast dataframe with anomalies labeled.
    '''
    forecasted = forecast[['ds', 'trend', 'yhat',
                           'yhat_lower', 'yhat_upper', 'fact']].copy()

    forecasted['anomaly'] = 0
    forecasted.loc[forecasted['fact'] >
                   forecasted['yhat_upper'], 'anomaly'] = 1
    forecasted.loc[forecasted['fact'] <
                   forecasted['yhat_lower'], 'anomaly'] = -1

    # anomaly importance: relative distance of the actual value beyond the violated bound
    forecasted['importance'] = 0.0
    forecasted.loc[forecasted['anomaly'] == 1, 'importance'] = \
        (forecasted['fact'] - forecasted['yhat_upper'])/forecasted['fact']
    forecasted.loc[forecasted['anomaly'] == -1, 'importance'] = \
        (forecasted['yhat_lower'] - forecasted['fact'])/forecasted['fact']

    return forecasted


import altair as alt


def plot_anomalies(forecasted):
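    '''
    Input: labeled dataframe returned by detect_anomalies.
    Output: layered Altair chart with the prediction interval (area),
            normal points (black), and anomalies (red, sized by importance).
    '''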

    interval = alt.Chart(forecasted).mark_area(interpolate="basis", color='#7FC97F').encode(
        x=alt.X('ds:T',  title='date'),
        y='yhat_upper',
        y2='yhat_lower',
        tooltip=['ds', 'fact', 'yhat_lower', 'yhat_upper']
    ).interactive().properties(
        title='Anomaly Detection'
    )

    fact = alt.Chart(forecasted[forecasted.anomaly == 0]).mark_circle(size=15, opacity=0.7, color='Black').encode(
        x='ds:T',
        y=alt.Y('fact', title='PeakUpload Speed'),
        tooltip=['ds', 'fact', 'yhat_lower', 'yhat_upper']
    ).interactive()

    anomalies = alt.Chart(forecasted[forecasted.anomaly != 0]).mark_circle(size=30, color='Red').encode(
        x='ds:T',
        y=alt.Y('fact', title='PeakUpload Speed'),
        tooltip=['ds', 'fact', 'yhat_lower', 'yhat_upper'],
        size=alt.Size('importance', legend=None)
    ).interactive()

    return alt.layer(interval, fact, anomalies)\
              .properties(width=870, height=450)\
              .configure_title(fontSize=20)
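
The importance score in `detect_anomalies` is the distance of the actual value beyond the violated bound, divided by the actual value. The sketch below restates that formula for a single point in plain Python and checks it against two rows from the randomized-data results shown later in this notebook. The function name `importance` is hypothetical, and the guard for an actual value of zero is an assumption added here; the notebook's function has no such guard.

```python
def importance(actual, lower, upper):
    """Relative distance of `actual` beyond the bound it violates, else 0.0."""
    if lower <= actual <= upper:
        return 0.0
    if actual == 0:
        # Assumption: treat a zero actual value as infinitely far in relative terms
        return float("inf")
    if actual > upper:
        return (actual - upper) / actual
    return (lower - actual) / actual


# Row 679 of the randomized data: fact=72806, yhat_upper=50946.067505
above = importance(72806, 17460.187379, 50946.067505)
# Row 1495 of the randomized data: fact=20114, yhat_lower=27419.778606
below = importance(20114, 27419.778606, 62418.820152)
print(round(above, 6), round(below, 6))
```

Because the score divides by the actual value, a point below the lower bound with a small actual value gets a large score, and an actual value of exactly zero has no finite score.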

## Load Core_stats from Database

In [36]:
import numpy as np
import mysql.connector
import pandas as pd
from pandas_profiling import ProfileReport

# Initiate with Parameters
db_name = "core_stats"
col = "peak_upload_speed"


# Start Database Connection
db_connection = mysql.connector.connect(
    host="10.1.2.10",
    user="gyan",
    password="5Gaa$2022",
    database="gyan_db"
)

# Create Database Cursor for SQL Queries
mycursor = db_connection.cursor()
mycursor.execute("SELECT * FROM {} LIMIT 5".format(db_name))

myresult = mycursor.fetchall()
for x in myresult:
    print(x)

# Load data from database and store as pandas Dataframe
df = pd.read_sql(
    'SELECT * FROM gyan_db.{} WHERE client_id not in ("AMD001","BELAP001","DFQ001","BELPROW001","BETBEL01GYN001")'.format(db_name), con=db_connection)
df.head()

('AMD001', datetime.datetime(2022, 2, 21, 9, 35), 0, 0, 3, 1, 0, 0, 0, 0, 8, 1, 6, 9.71, None, None)
('AMD001', datetime.datetime(2022, 2, 21, 10, 10), 1, 0, 4, 2, 0, 0, 1, 0, 9, 3, 5, 4.37, None, None)
('AMD001', datetime.datetime(2022, 2, 21, 10, 15), 1, 0, 4, 0, 1, 0, 0, 10, 8, 5, 6, 3.56, None, None)
('AMD001', datetime.datetime(2022, 2, 21, 10, 35), 0, 0, 1, 0, 0, 0, 1, 10, 8, 2, 4, 5.59, None, None)
('AMD001', datetime.datetime(2022, 2, 21, 10, 55), 1, 0, 5, 1, 0, 0, 0, 10, 7, 4, 6, 9.05, None, None)




Unnamed: 0,client_id,stats_timestamp,total_attached_user,total_rejected_user,peak_upload_speed,peak_download_speed,enodeb_shutdown_count,handover_failure_count,bearer_active_user_count,bearer_rejected_user_count,total_users,total_dropped_packets,enodeb_connected_count,enodeb_connection_status,total_tx_data,total_rx_data
0,BETAZRPDCOR001,2022-03-01 14:55:27,2,0,45751,0,0,0,0,0,0,1,2,33.3333,,
1,BETAZRPDCOR001,2022-03-01 15:16:59,2,0,45901,0,0,0,0,0,0,1,2,33.3333,,
2,BETAZRPDCOR001,2022-03-01 15:18:01,2,0,45901,0,0,0,0,0,0,1,2,33.3333,,
3,BETAZRPDCOR001,2022-03-01 15:24:02,2,0,45921,0,0,0,0,0,0,1,2,33.3333,,
4,BETAZRPDCOR001,2022-03-01 15:30:02,2,0,45940,0,0,0,0,0,0,1,2,33.3333,,


In [5]:
df_prophet = df[["stats_timestamp", "peak_upload_speed"]].rename(
    columns={"stats_timestamp": "ds", "peak_upload_speed": "y"})

In [6]:
df_prophet.head()

Unnamed: 0,ds,y
0,2022-03-01 14:55:27,45751
1,2022-03-01 15:16:59,45901
2,2022-03-01 15:18:01,45901
3,2022-03-01 15:24:02,45921
4,2022-03-01 15:30:02,45940


### Predicted Anomalies Result

In [7]:
pred = fit_predict_model(df_prophet)
pred_anomalies = detect_anomalies(pred)
pred_anomalies

Unnamed: 0,ds,trend,yhat,yhat_lower,yhat_upper,fact,anomaly,importance
0,2022-03-01 14:55:27,47438.579147,47438.579147,-47394.045623,136811.974592,45751,0,0.0
1,2022-03-01 15:16:59,47463.180640,47463.180640,-41417.603983,135433.165061,45901,0,0.0
2,2022-03-01 15:18:01,47464.361207,47464.361207,-41682.830096,147563.611891,45901,0,0.0
3,2022-03-01 15:24:02,47471.235153,47471.235153,-49154.704636,141288.202322,45921,0,0.0
4,2022-03-01 15:30:02,47478.090058,47478.090058,-35212.582023,146174.128465,45940,0,0.0
...,...,...,...,...,...,...,...,...
97287,2022-07-22 09:05:04,30810.000933,30810.000933,-64146.610172,133106.847070,0,0,0.0
97288,2022-07-22 09:05:34,30809.982483,30809.982483,-65816.976455,124641.107599,0,0,0.0
97289,2022-07-22 09:07:53,30809.896998,30809.896998,-54764.529247,121613.610598,0,0,0.0
97290,2022-07-22 09:08:47,30809.863788,30809.863788,-56111.413807,120279.857919,0,0,0.0


In [8]:
# Net rate: +1 and -1 labels cancel in the sum, so this is not the share of flagged points
pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0]

0.007564856308843481

### Anomaly Detection Graph

In [9]:
plot_anomalies(pred_anomalies[:5000])

## Load Generated Random Data from Database

In [37]:
# Load data from database and store as pandas Dataframe
df_rand = pd.read_sql(
    'SELECT * FROM gyan_db.{} WHERE client_id in ("BETBEL01GYN001")'.format(db_name), con=db_connection)
df_rand.head()



Unnamed: 0,client_id,stats_timestamp,total_attached_user,total_rejected_user,peak_upload_speed,peak_download_speed,enodeb_shutdown_count,handover_failure_count,bearer_active_user_count,bearer_rejected_user_count,total_users,total_dropped_packets,enodeb_connected_count,enodeb_connection_status,total_tx_data,total_rx_data
0,BETBEL01GYN001,2022-07-14 10:56:50,0,0,28251,0,0,0,0,0,0,0,0,0.0,,
1,BETBEL01GYN001,2022-07-14 11:05:55,0,0,28251,0,0,0,0,0,0,0,0,0.0,,
2,BETBEL01GYN001,2022-07-14 11:07:04,0,0,28251,0,0,0,0,0,0,0,0,0.0,,
3,BETBEL01GYN001,2022-07-14 11:08:03,0,0,28251,0,0,0,0,0,0,0,0,0.0,,
4,BETBEL01GYN001,2022-07-14 11:08:05,0,0,28251,0,0,0,0,0,0,0,0,0.0,,


In [38]:
df_rand_prophet = df_rand[["stats_timestamp", "peak_upload_speed"]].rename(
    columns={"stats_timestamp": "ds", "peak_upload_speed": "y"})

df_rand_prophet.head()

Unnamed: 0,ds,y
0,2022-07-14 10:56:50,28251
1,2022-07-14 11:05:55,28251
2,2022-07-14 11:07:04,28251
3,2022-07-14 11:08:03,28251
4,2022-07-14 11:08:05,28251


### Predicted Anomalies Result

In [39]:
pred_rand = fit_predict_model(df_rand_prophet)
pred_anomalies_rand = detect_anomalies(pred_rand)
pred_anomalies_rand

Unnamed: 0,ds,trend,yhat,yhat_lower,yhat_upper,fact,anomaly,importance
0,2022-07-14 10:56:50,32656.919977,32656.919977,16391.250064,48727.263683,28251,0,0.000000
1,2022-07-14 11:05:55,32675.987342,32675.987342,15477.754578,50210.492672,28251,0,0.000000
2,2022-07-14 11:07:04,32678.401375,32678.401375,16567.814068,49407.782495,28251,0,0.000000
3,2022-07-14 11:08:03,32680.465549,32680.465549,16513.229612,49461.807837,28251,0,0.000000
4,2022-07-14 11:08:05,32680.535521,32680.535521,14978.956932,49604.344629,28251,0,0.000000
...,...,...,...,...,...,...,...,...
1492,2022-07-22 09:24:24,44569.112315,44569.112315,27196.271927,61591.083288,24856,-1,0.094153
1493,2022-07-22 09:26:24,44569.096407,44569.096407,28015.381750,61621.454690,35555,0,0.000000
1494,2022-07-22 09:28:24,44569.080499,44569.080499,26812.161966,61278.846478,24749,-1,0.083363
1495,2022-07-22 09:30:24,44569.064591,44569.064591,27419.778606,62418.820152,20114,-1,0.363219


In [40]:
pred_anomalies_rand[pred_anomalies_rand["anomaly"]==1]

Unnamed: 0,ds,trend,yhat,yhat_lower,yhat_upper,fact,anomaly,importance
74,2022-07-14 17:02:55,33425.387248,33425.387248,15429.102203,49509.149945,49531,1,0.000441
120,2022-07-14 18:34:59,33618.645595,33618.645595,17879.820505,48964.105276,49856,1,0.017889
675,2022-07-15 13:25:29,35991.326482,35991.326482,18349.871253,51397.209489,54253,1,0.052638
676,2022-07-15 13:27:29,35995.512518,35995.512518,20686.004095,53302.370339,56518,1,0.056896
679,2022-07-15 13:31:17,36003.465986,36003.465986,17460.187379,50946.067505,72806,1,0.300249
680,2022-07-15 13:33:17,36007.652022,36007.652022,18735.576409,53769.611741,59495,1,0.096233
681,2022-07-15 13:35:17,36011.838058,36011.838058,18900.53689,51793.475442,71954,1,0.280186
682,2022-07-15 13:37:17,36016.024094,36016.024094,19121.39007,53250.592728,62885,1,0.153207
683,2022-07-15 13:39:17,36020.21013,36020.21013,19205.016832,52639.053478,59641,1,0.117402
685,2022-07-15 13:43:17,36028.582202,36028.582202,17728.40395,52338.199735,68011,1,0.230445


In [41]:
pred_anomalies_rand["anomaly"].sum()/pred_anomalies_rand.shape[0]

0.0033400133600534404

### Anomaly Detection Graph

In [42]:
plot_anomalies(pred_anomalies_rand[:5000])

In [44]:
rand_list = ['total_attached_user',
             'total_rejected_user', 'peak_upload_speed', 'peak_download_speed',
             'enodeb_shutdown_count', 'handover_failure_count',
             'bearer_active_user_count', 'bearer_rejected_user_count', 'total_users',
             'total_dropped_packets', 'enodeb_connected_count',
             'enodeb_connection_status']

In [45]:
for item in rand_list:

    df_rand_prophet = df_rand[["stats_timestamp", item]].rename(
        columns={"stats_timestamp": "ds", item: "y"})
    pred = fit_predict_model(df_rand_prophet)
    pred_anomalies = detect_anomalies(pred)
    # Net rate: +1 and -1 labels cancel in the sum, so the value can be negative
    print("Anomaly rate is: ",
          pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

    chart = plot_anomalies(pred_anomalies[:5000])

    chart.save('Anomaly_graphs/Anomaly_{}.html'.format(item))

Anomaly rate is:  -0.0006680026720106881
Anomaly rate is:  0.0
Anomaly rate is:  0.004008016032064128
Anomaly rate is:  -0.0033400133600534404
Anomaly rate is:  0.0
Anomaly rate is:  0.0
Anomaly rate is:  0.0
Anomaly rate is:  0.002004008016032064
Anomaly rate is:  0.0
Anomaly rate is:  -0.0006680026720106881
Anomaly rate is:  0.0
Anomaly rate is:  0.0
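
Some of the rates printed above are negative because the labels are +1 and -1, so their sum is a net count in which high and low anomalies cancel. The sketch below contrasts that net rate with the share of flagged points. The function names and the label list are made up for illustration.

```python
def net_rate(labels):
    """Sum of labels over the number of points; +1 and -1 cancel."""
    return sum(labels) / len(labels)


def flagged_rate(labels):
    """Share of points labeled as an anomaly in either direction."""
    return sum(1 for label in labels if label != 0) / len(labels)


# Made-up labels: one high anomaly, two low anomalies, seven normal points
labels = [1, -1, -1, 0, 0, 0, 0, 0, 0, 0]
print(net_rate(labels), flagged_rate(labels))
```

On the notebook's DataFrame the same share could be computed with `(pred_anomalies["anomaly"] != 0).mean()`.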


## Current Anomalies Logic

['total_attached_user',
 'total_rejected_user',
 'peak_upload_speed',
 'peak_download_speed',
 'enodeb_shutdown_count',
 'handover_failure_count',
 'bearer_active_user_count',
 'bearer_rejected_user_count',
 'total_users',
 'total_dropped_packets',
 'enodeb_connected_count',
 'enodeb_connection_status']

In [None]:
for item in rand_list:

    df_rand_prophet = df_rand[["stats_timestamp", item]].rename(
        columns={"stats_timestamp": "ds", item: "y"})
    pred = fit_predict_model(df_rand_prophet)
    pred_anomalies = detect_anomalies(pred)
    # Share of flagged points in either direction (+1 or -1)
    print("Anomaly rate is: ",
          (pred_anomalies["anomaly"] != 0).mean())

    chart = plot_anomalies(pred_anomalies[:5000])
    
    chart.save('Anomaly_graphs/Anomaly_{}.html'.format(item))

In [72]:
# # Filter columns with at least one record that have anomaly label
# export_anomaly_df= pd.DataFrame()
# for col in keeped_column_name:
#     condition= (Stats_summary_core["label_Z-score_"+ col]!=0) | (Stats_summary_core["label_outlier_"+col]!=0)


#     subset_columns=["client_id","stats_timestamp",col,"label_Z-score_"+col, "deviation_Z-score_"+col,"label_outlier_"+col]
#     print("{} rows of anomaly detected for column {}".format(sum(condition),col))
    
#     subset_Summary= Stats_summary_core[condition][subset_columns]
    
#     subset_Summary["Attribute_Name"] = col
    
#     subset_Summary = reorder_columns(subset_Summary,"Attribute_Name",2)
#     #subset_Summary.drop(col, axis=1)
# #print(subset_Summary.shape[1])
# #print(subset_Summary.columns)
#     subset_Summary=subset_Summary.rename(columns={str(col):"Attribute_Value","label_Z-score_"+col: "Attribute_Label_Z_Score",  "deviation_Z-score_"+col: "Attribute_Deviation_Z","label_outlier_"+col: "Attribute_Label_Outlier"})
    
#     export_anomaly_df=export_anomaly_df.append(subset_Summary, ignore_index = True)


## Insert Database Option

For each item in the columns of interest:

In [None]:
df_rand_prophet = df_rand[["stats_timestamp", item]].rename(
    columns={"stats_timestamp": "ds", item: "y"})
pred = fit_predict_model(df_rand_prophet)
pred_anomalies = detect_anomalies(pred)
print("Anomaly rate is: ",
      pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

chart = plot_anomalies(pred_anomalies[:5000])

In [73]:
# attribute_name = item
# attribute_value = 

In [95]:

item= "peak_upload_speed"
df_rand_prophet = df_rand[["stats_timestamp", item]].rename(
    columns={"stats_timestamp": "ds", item: "y"})
pred = fit_predict_model(df_rand_prophet)
pred_anomalies = detect_anomalies(pred)
print("Anomaly rate is: ",
      pred_anomalies["anomaly"].sum()/pred_anomalies.shape[0])

chart = plot_anomalies(pred_anomalies[:5000])

anomaly_df = pred_anomalies[(pred_anomalies["anomaly"]==-1) | (pred_anomalies["anomaly"]==1)]

Anomaly rate is:  0.0033400133600534404


In [96]:
anomaly_df

Unnamed: 0,ds,trend,yhat,yhat_lower,yhat_upper,fact,anomaly,importance
675,2022-07-15 13:25:29,35991.326482,35991.326482,20500.508749,53302.046471,54253,1,0.017528
676,2022-07-15 13:27:29,35995.512518,35995.512518,18951.339387,53536.41559,56518,1,0.052755
679,2022-07-15 13:31:17,36003.465986,36003.465986,18133.423094,53664.654425,72806,1,0.262909
680,2022-07-15 13:33:17,36007.652022,36007.652022,18899.857879,52898.481748,59495,1,0.110875
681,2022-07-15 13:35:17,36011.838058,36011.838058,18774.739125,51968.900169,71954,1,0.277748
682,2022-07-15 13:37:17,36016.024094,36016.024094,20070.09678,53194.739013,62885,1,0.154095
683,2022-07-15 13:39:17,36020.21013,36020.21013,19994.654706,50469.516741,59641,1,0.153778
685,2022-07-15 13:43:17,36028.582202,36028.582202,18894.176614,50716.144441,68011,1,0.254295
686,2022-07-15 13:45:18,36032.803121,36032.803121,19278.213754,52925.689686,58661,1,0.09777
687,2022-07-15 13:47:08,36036.640321,36036.640321,20098.869585,53398.612698,67677,1,0.210978


In [97]:
anomaly_df= anomaly_df.rename(columns= {"ds":"stats_timestamp","yhat":"attribute_mean","anomaly":"attribute_label_prophet","fact":"attribute_value","importance":"attribute_deviation"})
anomaly_df=anomaly_df.drop(["trend","yhat_lower","yhat_upper"],axis=1)

In [99]:
anomaly_df["client_id"]= "BETBEL01GYN001"

In [104]:
# anomaly_df is a pandas DataFrame holding the rows to export.
# The MySQL INSERT query names the table and the columns to insert;
# each row supplies one tuple of values for the %s placeholders.
insert_database_option=True


if insert_database_option:
    # Open one connection for the whole batch instead of one per row
    connection = mysql.connector.connect(
        host="10.1.2.10",
        user="gyan",
        password="5Gaa$2022",
        database="gyan_db"
    )

    cursor = connection.cursor()

    MySQL_insert_query = "INSERT INTO tb_export_anomaly_df (client_id, stats_timestamp, attribute_name, attribute_value, attribute_label_prophet, attribute_deviation,attribute_mean) VALUES (%s, %s, %s, %s, %s, %s, %s)"

    for index, row in anomaly_df.iterrows():
        the_value= (row.client_id, str(row.stats_timestamp), str(item), row.attribute_value, row.attribute_label_prophet, row.attribute_deviation,row.attribute_mean)

        try:
            cursor.execute(MySQL_insert_query, the_value)
            connection.commit()
        except Exception:
            # NOTE: this branch runs only when the insert raises, so the
            # message below actually marks a row that was NOT inserted
            print("Record Inserted")

    cursor.close()
    connection.close()
    print("Insert Complete")

Record Inserted
Record Inserted
Record Inserted
Record Inserted
Record Inserted
Record Inserted
Insert Complete
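
In the cell above, the message is printed from the `except` branch, so each "Record Inserted" line corresponds to an insert that raised an error. A batch insert that reports failures explicitly might look like the sketch below. To keep it runnable without a server, the standard-library `sqlite3` module stands in for MySQL, so the placeholders are `?` instead of `%s`. The table schema, the helper names `to_insert_rows` and `insert_anomalies`, and the sample record layout are assumptions for illustration.

```python
import sqlite3

COLUMNS = ("client_id", "stats_timestamp", "attribute_name", "attribute_value",
           "attribute_label_prophet", "attribute_deviation", "attribute_mean")


def to_insert_rows(records, attribute_name):
    """Convert record dicts to tuples of plain Python types for the driver."""
    return [
        (str(rec["client_id"]), str(rec["stats_timestamp"]), str(attribute_name),
         float(rec["attribute_value"]), int(rec["attribute_label_prophet"]),
         float(rec["attribute_deviation"]), float(rec["attribute_mean"]))
        for rec in records
    ]


def insert_anomalies(connection, rows):
    """Insert all rows in one transaction; return the number of rows written."""
    query = "INSERT INTO tb_export_anomaly_df ({}) VALUES ({})".format(
        ", ".join(COLUMNS), ", ".join("?" for _ in COLUMNS))
    cursor = connection.cursor()
    try:
        cursor.executemany(query, rows)
        connection.commit()
        return len(rows)
    except sqlite3.Error as err:
        connection.rollback()
        print("Insert failed:", err)
        return 0
    finally:
        cursor.close()


# In-memory stand-in database with an assumed schema
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE tb_export_anomaly_df (client_id TEXT, stats_timestamp TEXT, "
    "attribute_name TEXT, attribute_value REAL, attribute_label_prophet INTEGER, "
    "attribute_deviation REAL, attribute_mean REAL)")

records = [{"client_id": "BETBEL01GYN001", "stats_timestamp": "2022-07-15 13:25:29",
            "attribute_value": 54253, "attribute_label_prophet": 1,
            "attribute_deviation": 0.017528, "attribute_mean": 35991.326482}]
inserted = insert_anomalies(connection, to_insert_rows(records, "peak_upload_speed"))
stored = connection.execute("SELECT COUNT(*) FROM tb_export_anomaly_df").fetchone()[0]
print(inserted, stored)
```

With the notebook's DataFrame, the records could come from `anomaly_df.to_dict("records")`. Converting each value to a plain `float`, `int`, or `str` first avoids handing NumPy scalar types to the database driver.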


# Machine Learning Methods

In [7]:
import numpy as np
import mysql.connector
import pandas as pd
from pandas_profiling import ProfileReport

# Initiate with Parameters
db_name = "core_stats"
col = "peak_upload_speed"


# Start Database Connection
db_connection = mysql.connector.connect(
    host="10.1.2.10",
    user="gyan",
    password="5Gaa$2022",
    database="gyan_db"
)
# # Create Database Cursor for SQL Queries
# mycursor = db_connection.cursor()
# mycursor.execute("SELECT * FROM {} LIMIT 5".format(db_name))

# myresult = mycursor.fetchall()
# for x in myresult:
#     print(x)


# Load data from database and store as pandas Dataframe
df = pd.read_sql(
    'SELECT * FROM gyan_db.{} WHERE client_id in ("BETBEL01GYN001")'.format(db_name), con=db_connection)


In [273]:
df['stats_timestamp'] = pd.to_datetime(df['stats_timestamp'])
df=df.iloc[:,:-2]

## Metric Predictor

In [140]:
"""docstring for packages."""
import logging
from prometheus_api_client import Metric
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM

import numpy as np
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Set up logging
_LOGGER = logging.getLogger(__name__)


class MetricPredictor:
    """docstring for Predictor."""

    model_name = "lstm"
    model_description = "Forecasted value from Lstm model"
    model = None
    predicted_df = None
    metric = None

    def __init__(self, metric, rolling_data_window_size="10d", number_of_feature=10, validation_ratio=0.2,
                 parameter_tuning=True):
        """Initialize the Metric object."""
        self.metric = Metric(metric, rolling_data_window_size)

        self.number_of_features = number_of_feature
        self.scalar = MinMaxScaler(feature_range=(0, 1))
        self.parameter_tuning = parameter_tuning
        self.validation_ratio = validation_ratio

    def prepare_data(self, data):
        """Prepare the data for LSTM."""
        train_x = np.array(data[:, 1])[np.newaxis, :].T

        for i in range(self.number_of_features):
            train_x = np.concatenate((train_x, np.roll(data[:, 1], -i)[np.newaxis, :].T), axis=1)

        train_x = train_x[:train_x.shape[0] - self.number_of_features, :self.number_of_features]

        train_yt = np.roll(data[:, 1], -self.number_of_features + 1)
        train_y = np.roll(data[:, 1], -self.number_of_features)
        train_y = train_y - train_yt
        train_y = train_y[:train_y.shape[0] - self.number_of_features]

        train_x = train_x.reshape(train_x.shape[0], 1, train_x.shape[1])
        return train_x, train_y

    def get_model(self, lstm_cell_count, dense_cell_count):
        """Build the model."""
        model = Sequential()
        model.add(LSTM(64, return_sequences=True, input_shape=(1, self.number_of_features)))
        model.add(LSTM(lstm_cell_count))
        model.add(Dense(dense_cell_count))
        model.add(Dense(1))
        return model

    def train(self, metric_data=None, prediction_duration=15):
        """Train the model."""
        if metric_data:
            # because the rolling_data_window_size is set, this df should not bloat
            self.metric += Metric(metric_data)

        # normalising
        metric_values_np = self.metric.metric_values.values
        scaled_np_arr = self.scalar.fit_transform(metric_values_np[:, 1].reshape(-1, 1))
        metric_values_np[:, 1] = scaled_np_arr.flatten()

        if self.parameter_tuning:
            x, y = self.prepare_data(metric_values_np)
            lstm_cells = [2 ** i for i in range(5, 8)]
            dense_cells = [2 ** i for i in range(5, 8)]
            loss = np.inf
            lstm_cell_count = 0
            dense_cell_count = 0
            for lstm_cell_count_ in lstm_cells:
                for dense_cell_count_ in dense_cells:
                    model = self.get_model(lstm_cell_count_, dense_cell_count_)
                    model.compile(loss='mean_squared_error', optimizer='adam')
                    history = model.fit(np.asarray(x).astype(np.float32),
                                        np.asarray(y).astype(np.float32),
                                        epochs=50, batch_size=512, verbose=0,
                                        validation_split=self.validation_ratio)
                    val_loss = history.history['val_loss']
                    loss_ = min(val_loss)
                    if loss > loss_:
                        lstm_cell_count = lstm_cell_count_
                        dense_cell_count = dense_cell_count_
                        loss = loss_
            self.lstm_cell_count = lstm_cell_count
            self.dense_cell_count = dense_cell_count
            self.parameter_tuning = False

        model = self.get_model(self.lstm_cell_count, self.dense_cell_count)
        _LOGGER.info(
            "training data range: %s - %s", self.metric.start_time, self.metric.end_time
        )
        # _LOGGER.info("training data end time: %s", self.metric.end_time)
        _LOGGER.debug("begin training")
        data_x, data_y = self.prepare_data(metric_values_np)
        _LOGGER.debug(data_x.shape)
        model.compile(loss='mean_squared_error', optimizer='adam')
        model.fit(np.asarray(data_x).astype(np.float32), np.asarray(data_y).astype(np.float32), epochs=50, batch_size=512)
        data_test = np.asarray(metric_values_np[-self.number_of_features:, 1]).astype(np.float32)
        forecast_values = []
        prev_value = data_test[-1]
        for i in range(int(prediction_duration)):
            prediction = model.predict(data_test.reshape(1, 1, self.number_of_features)).flatten()[0]
            curr_pred_value = data_test[-1] + prediction
            scaled_final_value = self.scalar.inverse_transform(curr_pred_value.reshape(1, -1)).flatten()[0]
            forecast_values.append(scaled_final_value)
            data_test = np.roll(data_test, -1)
            data_test[-1] = curr_pred_value
            prev_value = data_test[-1]

        dataframe_cols = {"yhat": np.array(forecast_values)}

        upper_bound = np.array(
            [
                (
                        forecast_values[i] + (np.std(forecast_values[:i]) * 2)
                )
                for i in range(len(forecast_values))
            ]
        )
        upper_bound[0] = np.mean(
            forecast_values[0]
        )  # to account for no std of a single value
        lower_bound = np.array(
            [
                (
                        forecast_values[i] - (np.std(forecast_values[:i]) * 2)
                )
                for i in range(len(forecast_values))
            ]
        )
        lower_bound[0] = np.mean(
            forecast_values[0]
        )  # to account for no std of a single value
        dataframe_cols["yhat_upper"] = upper_bound
        dataframe_cols["yhat_lower"] = lower_bound

        data = self.metric.metric_values
        maximum_time = max(data["ds"])
        dataframe_cols["timestamp"] = pd.date_range(
            maximum_time, periods=len(forecast_values), freq="min"
        )

        forecast = pd.DataFrame(data=dataframe_cols)
        forecast = forecast.set_index("timestamp")

        self.predicted_df = forecast
        _LOGGER.debug(forecast)

    def predict_value(self, prediction_datetime):
        """Return the predicted value of the metric for the prediction_datetime."""
        # Index.get_loc(..., method="nearest") was removed in pandas 2.0;
        # get_indexer performs the same nearest-match lookup
        nearest_index = self.predicted_df.index.get_indexer(
            [prediction_datetime], method="nearest"
        )[0]
        return self.predicted_df.iloc[[nearest_index]]
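
The bounds built in `train` are the forecast value plus or minus two standard deviations of the forecasts that precede it, with the first point's bounds set to the forecast itself. The plain-Python sketch below reproduces that rule with `statistics.pstdev`, which, like `np.std`, computes the population standard deviation. The function name `forecast_bounds` and the sample values are hypothetical.

```python
from statistics import pstdev


def forecast_bounds(forecast_values, width=2.0):
    """Return (lower, upper) lists: value -/+ width * std of earlier forecasts."""
    lower, upper = [], []
    for i, value in enumerate(forecast_values):
        history = forecast_values[:i]
        # No earlier forecasts for the first point, so its spread is zero
        spread = width * pstdev(history) if history else 0.0
        lower.append(value - spread)
        upper.append(value + spread)
    return lower, upper


# Made-up forecast values for illustration
lower, upper = forecast_bounds([10.0, 12.0, 14.0])
print(lower, upper)
```

Note that the interval depends only on how much the earlier forecasts vary, not on past prediction errors, so the first two points always have zero width.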


In [156]:
tool = MetricPredictor(metric)

TypeError: oldest_data_datetime can only be datetime.datetime/ datetime.timedelta or None
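
Judging from the error message, one likely cause is that the default `rolling_data_window_size="10d"` is forwarded to `Metric` as a string, while that argument is expected to be a `datetime`, a `timedelta`, or `None`. A possible workaround is to convert the string to a `timedelta` before constructing the predictor. The helper `parse_window` below is hypothetical, and the set of supported unit suffixes is an assumption.

```python
import re
from datetime import timedelta

# Assumed unit suffixes: days, hours, minutes, seconds
_UNITS = {"d": "days", "h": "hours", "m": "minutes", "s": "seconds"}


def parse_window(text):
    """Convert a window such as '10d' or '90m' into a timedelta."""
    match = re.fullmatch(r"(\d+)([dhms])", text.strip())
    if match is None:
        raise ValueError("unsupported window: {!r}".format(text))
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


window = parse_window("10d")
print(window)
```

The result could then be passed as `rolling_data_window_size=window`. This addresses only the argument type named in the error; whether `metric` itself has the structure `Metric` expects would still need to be checked.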

# Prometheus Client

In [150]:
import os
os.listdir()

['.git',
 '.ipynb_checkpoints',
 'Anomaly_graphs',
 'data',
 'Functions',
 'LSTM_Demo.ipynb',
 'mlflow-example',
 'MLflow-example-notebook.ipynb',
 'mlruns',
 'model_artifacts',
 'Prometheus_Documentation.ipynb',
 'Prophet_experiments_on_data.ipynb',
 'Prophet_META_documentation.ipynb',
 'Random_data_generation',
 'Random_data_generation.ipynb',
 'Random_data_generation_ran',
 'Random_data_generation_v2',
 'README.md',
 'Readme.txt',
 'top-machine-learning-algorithms-beginner.ipynb']

In [237]:
%run Prometheus_Documentation.ipynb

We use DFWIRV01COR001_1_peak_dl as an example metric for function exploration.


## Explore metrics

In [238]:
# Initiate the Class
Prometheus_client = PrometheusConnect()    

# Connect with Betacom Prometheus Server
Prometheus_client.connect_Betacom_prometheus()

# Get all metrics from Betacom
Prometheus_client.get_Betacom_metrics()

In [239]:
rand_metrics = Prometheus_client.rand_metrics 

# Convert the random metrics into a DataFrame
rand_metrics_df= Prometheus_client.convert_current_metrics_toDF(rand_metrics)

In [240]:
rand_metrics_df.head()

Unnamed: 0,all_metrics,datetime_list,value_list
0,BETBEL01GYN001_1_ConnEstabFailSum,2022-07-28 11:40:07.006000,10
1,BETBEL01GYN001_1_ConnEstabSuccSum,2022-07-28 11:40:07.031000,6
2,BETBEL01GYN001_1_DrbDLThCapacityAvg,2022-07-28 11:40:07.051000,4
3,BETBEL01GYN001_1_DrbDLThCapacitySum,2022-07-28 11:40:07.071000,8
4,BETBEL01GYN001_1_DrbMaxULThCapacity,2022-07-28 11:40:07.093000,4


In [241]:
# metric= "BETBEL01GYN001_1_ConnEstabFailSum{app='gyan-core-agent-rand', instance='http://127.0.0.1:9090'}"

In [245]:
metric=rand_metrics[0]

In [249]:
start_time=datetime(2022, 7, 1)
end_time=datetime(2022, 7, 28)
metric_df = Prometheus_client.get_metrics_every_5_minutes(metric,start_time,end_time)
metric_df

TypeError: list indices must be integers or slices, not str

In [212]:
for metric in rand_metrics:
    try:
        sub_df = Prometheus_client.get_metrics_every_5_minutes(metric)[10:30]
        print(sub_df)
    except:
        pass
    

In [214]:
metric = rand_metrics[0]

In [258]:
import pandas as pd
from datetime import datetime


def get_metrics_every_5_minutes(metric, start_time=datetime(2022, 7, 1), end_time=datetime(2022, 7, 28)):
    # Query the range data for the metric and keep its [timestamp, value] pairs
    metric_list = Prometheus_client.prom.get_metric_range_data(
        metric_name=metric, start_time=start_time, end_time=end_time)[0]["values"]

    # Keep one sample every 5 minutes; the default scrape interval is 15 seconds
    num = 5 * 60 // 15

    item_list = [item for i, item in enumerate(
        metric_list) if i % num == 0]

    time_list = [datetime.fromtimestamp(item[0]).strftime(
        '%Y-%m-%d %H:%M') for item in item_list]
    val_list = list(map(lambda x: x[1], item_list))

    dic = {"time": time_list,
           "metric_value": val_list}

    df = pd.DataFrame(dic)

    return df
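
The downsampling step in the function above keeps every 20th sample, on the assumption that samples arrive every 15 seconds. The stand-alone sketch below shows that step on synthetic `[timestamp, value]` pairs shaped like the range-query values. The name `downsample` and the sample data are hypothetical, and timestamps are formatted in UTC here so the result does not depend on the local time zone.

```python
from datetime import datetime, timezone


def downsample(samples, scrape_seconds=15, every_seconds=300):
    """Keep one [timestamp, value] sample per `every_seconds` window."""
    step = every_seconds // scrape_seconds
    kept = samples[::step]
    return [
        (datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M"), value)
        for ts, value in kept
    ]


# Synthetic samples: one every 15 seconds starting 2022-07-01 00:00:00 UTC
start = 1656633600
samples = [[start + 15 * i, str(i)] for i in range(41)]
rows = downsample(samples)
print(rows)
```

Selecting by position only lands on 5-minute marks when no scrape is missing; if the series has gaps, selecting by timestamp would be more robust.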

In [261]:
for metric in rand_metrics:
    sub_df = get_metrics_every_5_minutes(metric)
    if sub_df.shape[0]==736:
        print("Metric Name is: ",metric)
        print(sub_df.shape)

Metric Name is:  BETBEL01GYN001_1_ConnEstabFailSum
(736, 2)
Metric Name is:  BETBEL01GYN001_1_ConnEstabSuccSum
(736, 2)
Metric Name is:  BETBEL01GYN001_1_DrbDLThCapacityAvg
(736, 2)
Metric Name is:  BETBEL01GYN001_1_DrbDLThCapacitySum
(736, 2)
Metric Name is:  BETBEL01GYN001_1_DrbMaxULThCapacity
(736, 2)
Metric Name is:  BETBEL01GYN001_1_DrbPdcpSduAirLossRateDlSum
(736, 2)
Metric Name is:  BETBEL01GYN001_1_DrbPdcpSduLossRateUlSum
(736, 2)
Metric Name is:  BETBEL01GYN001_1_DrbULThCapacityAvg
(736, 2)
Metric Name is:  BETBEL01GYN001_1_DrbULThCapacitySum
(736, 2)
Metric Name is:  BETBEL01GYN001_1_IpLateSamplesDl
(736, 2)
Metric Name is:  BETBEL01GYN001_1_PdcpSduAirLossDl
(736, 2)
Metric Name is:  BETBEL01GYN001_1_PdcpSduAirLossRateDl
(736, 2)
Metric Name is:  BETBEL01GYN001_1_PdcpSduAirTotalDl
(736, 2)
Metric Name is:  BETBEL01GYN001_1_PdcpSduDropRateDl
(736, 2)
Metric Name is:  BETBEL01GYN001_1_PdcpTrafficDlKbps
(736, 2)
Metric Name is:  BETBEL01GYN001_1_PdcpTrafficUlKbps
(736, 2)
Metric

In [1]:
API = "http://10.1.2.10:9090/api/v1/query?query=BETBEL01GYN001_1_ConnEstabFailSum"

In [2]:
import requests
userAttachedResponse = requests.get(API)

In [4]:
userAttachedResponse.text

'{"status":"success","data":{"resultType":"vector","result":[{"metric":{"__name__":"BETBEL01GYN001_1_ConnEstabFailSum","instance":"localhost:9900","job":"gyan-ran-agent-scraper"},"value":[1659557504.479,"0"]}]}}'
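
The response text above is JSON, and each entry in `result` carries the metric labels plus a `[timestamp, value]` pair in which the value is a string. A minimal parsing sketch using the response shown above follows. The variable names are hypothetical, and converting the timestamp to UTC is a choice made here.

```python
import json
from datetime import datetime, timezone

# Response text copied from the output above
response_text = (
    '{"status":"success","data":{"resultType":"vector","result":[{"metric":'
    '{"__name__":"BETBEL01GYN001_1_ConnEstabFailSum","instance":"localhost:9900",'
    '"job":"gyan-ran-agent-scraper"},"value":[1659557504.479,"0"]}]}}'
)

payload = json.loads(response_text)
rows = []
if payload["status"] == "success":
    for entry in payload["data"]["result"]:
        timestamp, raw_value = entry["value"]
        rows.append((
            entry["metric"]["__name__"],
            datetime.fromtimestamp(timestamp, tz=timezone.utc),
            float(raw_value),  # sample values are returned as strings
        ))
print(rows)
```

With the live response object, `userAttachedResponse.json()` would return the same dictionary as `json.loads(userAttachedResponse.text)`.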

Unnamed: 0,article_id,article_name
0,10,article_name10
1,11,articlename10
2,12,articlename10
3,14,article_name14
4,18,article18
5,21,articlename40
