# Dynamic Resource Allocation with MLOps Integration in Telecommunication Cloud Environments

### Life Cycle of the Project

* Understanding the Problem Statement
* Data Collection
* Data Preprocessing
* Exploratory Data Analysis
* Model Training
* Model Selection

### 1. Problem Statement

Telecommunication cloud systems must allocate computing power, storage, and bandwidth as user demand changes. The challenge is to do this efficiently while maintaining service quality, reducing cost, and saving energy. These systems must also serve multiple virtual networks, make allocation decisions quickly, and stay reliable when something fails. The goal is a system that uses resources efficiently, adapts to changing demand, and works smoothly for all users.

### 2. Data Collection

* Data Source: https://data.niaid.nih.gov/resources?id=zenodo_10245447


* The data consists of three CSV files; the two described here are:

    * pods_request_workloads.csv - 7 columns
      ('timestamp', 'uid', 'node', 'cpu', 'memory', 'nvidia_com', 'scenario')

    * nodes_allocatable.csv - 8 columns
      ('timestamp', 'node', 'cpu', 'memory', 'nvidia_com', 'status', 'condition', 'scenario')


We merge the two files because they share most of their attributes.
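
The merge itself is not shown in this notebook. The following is a minimal sketch of how two such tables could be joined, assuming the join keys are `timestamp` and `node` and that overlapping columns receive `_workloads` / `_allocatable` suffixes like those in the merged data. The toy frames, their values, and the choice of which table gets which suffix are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the two CSV files (values are illustrative only).
pods = pd.DataFrame({
    "timestamp": ["2023-10-13 12:04:00", "2023-10-13 12:04:00"],
    "uid": ["pod-a", "pod-b"],
    "node": ["node-1", "node-1"],
    "cpu": [0.1, 0.01],
    "memory": [0.0, 40.0],
})
nodes = pd.DataFrame({
    "timestamp": ["2023-10-13 12:04:00"],
    "node": ["node-1"],
    "cpu": [7.91],
    "memory": [14684.88],
    "status": ["True"],
})

# Assumed join keys: timestamp and node.
# Columns present in both tables (cpu, memory) are disambiguated by suffix.
merged = pd.merge(
    pods, nodes,
    on=["timestamp", "node"],
    how="inner",
    suffixes=("_workloads", "_allocatable"),
)
print(merged.columns.tolist())
```

Each pod row is paired with the node row that shares its timestamp and node name, so the result has one row per pod.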

##### 2.1 Import Data and Required Packages / Libraries

* IMPORTING LIBRARIES

In [17]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.preprocessing import LabelEncoder # type: ignore
from sklearn.preprocessing import StandardScaler

* IMPORTING DATASETS

In [22]:
import pandas as pd

# Load your dataset
df = pd.read_csv("data.csv")

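# Keep only the first 50000 rows; .copy() gives an independent DataFrame,
# so the column assignments made later are unambiguous
new_df = df.head(50000).copy()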

# Save the shortened dataset
new_df.to_csv("new_df.csv", index=False)

In [23]:
new_df

Unnamed: 0,timestamp,node,cpu_workloads,memory_workloads,nvidia_com_gpu_workloads,status,condition,scenario_workloads,uid,cpu_allocatable,memory_allocatable,nvidia_com_gpu_allocatable,scenario_allocatable
0,2023-10-13 12:04:00,0xozF0md0I,7.91,14684.878906,0.0,,,1,0a7e3149-7520-44af-be02-6cb0ede2109d,0.100,0.0,0.0,A
1,2023-10-13 12:04:00,0xozF0md0I,7.91,14684.878906,0.0,,,1,327fff0d-3d9d-4c99-8e0d-f8581dfa7373,0.010,40.0,0.0,A
2,2023-10-13 12:04:00,0xozF0md0I,7.91,14684.878906,0.0,,,1,40054bfd-b720-4f90-86c2-974e227c178f,0.025,0.0,0.0,A
3,2023-10-13 12:04:00,1uYdt27oKb,7.91,14684.878906,0.0,,,1,234bebd4-85d4-4068-8dde-0b2a43fc7940,0.100,0.0,0.0,A
4,2023-10-13 12:04:00,1uYdt27oKb,7.91,14684.878906,0.0,,,1,8619d1f4-7029-4425-9f99-725dcb545709,0.010,40.0,0.0,A
...,...,...,...,...,...,...,...,...,...,...,...,...,...
49995,2023-10-13 14:00:30,br8L3VA52I,31.85,60211.082031,0.0,True,Ready,1,8ced0d0c-3827-4c84-a5ec-39f7e36516a3,1.000,1024.0,0.0,A
49996,2023-10-13 14:00:30,br8L3VA52I,31.85,60211.082031,0.0,True,Ready,1,989b4d2b-1b55-40d3-ab5c-03f048ab1233,1.000,1024.0,0.0,A
49997,2023-10-13 14:00:30,br8L3VA52I,31.85,60211.082031,0.0,True,Ready,1,9e1c81e4-3a39-407e-91c3-b2839fe44e45,0.025,0.0,0.0,A
49998,2023-10-13 14:00:30,br8L3VA52I,31.85,60211.082031,0.0,True,Ready,1,9f50dada-a620-4125-8f6b-4ad843f987a0,0.010,40.0,0.0,A


##### 2.2 Dataset Information

* TOP 5 RECORDS OF THE DATASET

In [4]:
new_df.head()

Unnamed: 0,timestamp,node,cpu_workloads,memory_workloads,nvidia_com_gpu_workloads,status,condition,scenario_workloads,uid,cpu_allocatable,memory_allocatable,nvidia_com_gpu_allocatable,scenario_allocatable
0,2023-10-13 12:04:00,0xozF0md0I,7.91,14684.878906,0.0,,,1,0a7e3149-7520-44af-be02-6cb0ede2109d,0.1,0.0,0.0,A
1,2023-10-13 12:04:00,0xozF0md0I,7.91,14684.878906,0.0,,,1,327fff0d-3d9d-4c99-8e0d-f8581dfa7373,0.01,40.0,0.0,A
2,2023-10-13 12:04:00,0xozF0md0I,7.91,14684.878906,0.0,,,1,40054bfd-b720-4f90-86c2-974e227c178f,0.025,0.0,0.0,A
3,2023-10-13 12:04:00,1uYdt27oKb,7.91,14684.878906,0.0,,,1,234bebd4-85d4-4068-8dde-0b2a43fc7940,0.1,0.0,0.0,A
4,2023-10-13 12:04:00,1uYdt27oKb,7.91,14684.878906,0.0,,,1,8619d1f4-7029-4425-9f99-725dcb545709,0.01,40.0,0.0,A


* TOTAL NO OF ROWS & COLUMNS PRESENT IN THE DATASET

In [5]:
rows,cols=new_df.shape
print(f"Total no of Rows in dataset \t\t:\t{rows} \nTotal no of Columns in dataset \t:\t{cols}")

Total no of Rows in dataset 		:	50000 
Total no of Columns in dataset 	:	13


* DATATYPES OF THE ATTRIBUTES PRESENT IN THE DATASET 

In [6]:
new_df.dtypes

timestamp                      object
node                           object
cpu_workloads                 float64
memory_workloads              float64
nvidia_com_gpu_workloads      float64
status                         object
condition                      object
scenario_workloads              int64
uid                            object
cpu_allocatable               float64
memory_allocatable            float64
nvidia_com_gpu_allocatable    float64
scenario_allocatable           object
dtype: object

* DESCRIPTION OF THE DATASET

In [7]:
new_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50000 entries, 0 to 49999
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   timestamp                   50000 non-null  object 
 1   node                        50000 non-null  object 
 2   cpu_workloads               50000 non-null  float64
 3   memory_workloads            50000 non-null  float64
 4   nvidia_com_gpu_workloads    50000 non-null  float64
 5   status                      44546 non-null  object 
 6   condition                   44546 non-null  object 
 7   scenario_workloads          50000 non-null  int64  
 8   uid                         50000 non-null  object 
 9   cpu_allocatable             50000 non-null  float64
 10  memory_allocatable          50000 non-null  float64
 11  nvidia_com_gpu_allocatable  50000 non-null  float64
 12  scenario_allocatable        50000 non-null  object 
dtypes: float64(6), int64(1), object(6)

### 3. Data Preprocessing

##### 3.1 Data Checks to Perform

*  Check Missing Values

In [8]:
new_df.isnull().sum()

timestamp                        0
node                             0
cpu_workloads                    0
memory_workloads                 0
nvidia_com_gpu_workloads         0
status                        5454
condition                     5454
scenario_workloads               0
uid                              0
cpu_allocatable                  0
memory_allocatable               0
nvidia_com_gpu_allocatable       0
scenario_allocatable             0
dtype: int64
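
Only `status` and `condition` have gaps, and both are missing in the same number of rows (5454). The notebook does not state how these gaps should be treated. One option, sketched here on made-up rows, is to turn the gap into an explicit category before encoding. The label `"Unknown"` is an assumption and is not part of the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows that mimic the pattern of missing values.
demo = pd.DataFrame({
    "status": [np.nan, "True", "True"],
    "condition": [np.nan, "Ready", "Ready"],
})

# Share of missing values per column, as a percentage.
missing_pct = demo.isnull().mean() * 100

# Replace gaps with an explicit placeholder category (hypothetical label).
filled = demo.fillna({"status": "Unknown", "condition": "Unknown"})
print(missing_pct.round(1))
print(filled)
```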

*  Check Duplicates

In [9]:
new_df.duplicated().sum()

0

* Feature Engineering

In [10]:
# Workload-to-Allocatable Ratios
new_df['cpu_ratio'] = new_df['cpu_workloads'] / new_df['cpu_allocatable']

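
The ratio above is undefined when `cpu_allocatable` is zero; pandas then yields `inf` (or `NaN` for 0/0) rather than raising an error. Below is a small sketch, on made-up values, of guarding the division. Treating a zero denominator as missing is an assumption about the desired behavior.

```python
import numpy as np
import pandas as pd

# Made-up values, including zero denominators.
demo = pd.DataFrame({
    "cpu_workloads": [7.91, 7.91, 0.0],
    "cpu_allocatable": [0.1, 0.0, 0.0],
})

# Mask zero denominators so the division yields NaN instead of inf.
denominator = demo["cpu_allocatable"].replace(0, np.nan)
demo["cpu_ratio"] = demo["cpu_workloads"] / denominator
print(demo)
```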


In [11]:
# Total Workload
new_df['total_workload'] = new_df['cpu_workloads'] + new_df['memory_workloads'] + new_df['nvidia_com_gpu_workloads']


In [12]:
new_df['cpu_memory_interaction'] = new_df['cpu_workloads'] * new_df['memory_workloads']
new_df['workload_allocatable_interaction'] = (new_df['cpu_workloads'] * new_df['cpu_allocatable']) +  (new_df['memory_workloads'] * new_df['memory_allocatable'])


* Converting Object and Float Columns to Int

In [13]:
new_df.dtypes

timestamp                            object
node                                 object
cpu_workloads                       float64
memory_workloads                    float64
nvidia_com_gpu_workloads            float64
status                               object
condition                            object
scenario_workloads                    int64
uid                                  object
cpu_allocatable                     float64
memory_allocatable                  float64
nvidia_com_gpu_allocatable          float64
scenario_allocatable                 object
cpu_ratio                           float64
total_workload                      float64
cpu_memory_interaction              float64
workload_allocatable_interaction    float64
dtype: object

In [14]:
# Strip every non-digit character, e.g. "2023-10-13 12:04:00" -> "20231013120400"
new_df['timestamp'] = new_df['timestamp'].str.replace(r'[^\d]', '', regex=True)
new_df['timestamp'] = pd.to_numeric(new_df['timestamp'], errors='coerce').astype('int64')

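
Stripping the separators produces integers such as 20231013120400. These sort correctly but are not evenly spaced in time: the step from 12:59:59 to 13:00:00 is 4041, not 1. An alternative sketch, assuming the strings follow the `YYYY-MM-DD HH:MM:SS` layout seen in the data, parses them and derives the seconds elapsed since the first sample. The column name `elapsed_s` is hypothetical.

```python
import pandas as pd

# Two made-up timestamps one second apart, across an hour boundary.
demo = pd.DataFrame({"timestamp": ["2023-10-13 12:59:59", "2023-10-13 13:00:00"]})

# Parse the strings into real datetimes.
parsed = pd.to_datetime(demo["timestamp"], format="%Y-%m-%d %H:%M:%S")

# Seconds elapsed since the earliest sample, as an integer feature.
demo["elapsed_s"] = (parsed - parsed.min()).dt.total_seconds().astype("int64")
print(demo)
```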


In [15]:
# Cast every numeric column to int64. astype truncates toward zero,
# so fractional values are lost (e.g. a cpu_allocatable of 0.1 becomes 0).
numeric_cols = [
    'cpu_workloads', 'memory_workloads', 'nvidia_com_gpu_workloads',
    'cpu_allocatable', 'memory_allocatable', 'nvidia_com_gpu_allocatable',
    'cpu_ratio', 'total_workload', 'cpu_memory_interaction',
    'workload_allocatable_interaction',
]
for col in numeric_cols:
    new_df[col] = pd.to_numeric(new_df[col], errors='coerce').astype('int64')

In [18]:
categorical = ['node', 'status', 'condition', 'scenario_workloads', 'scenario_allocatable', 'uid']

# Fit one encoder per column and keep it, so each encoding can be inverted later
encoders = {}
for col in categorical:
    encoders[col] = LabelEncoder()
    new_df[col] = encoders[col].fit_transform(new_df[col])

In [19]:
new_df.head()

Unnamed: 0,timestamp,node,cpu_workloads,memory_workloads,nvidia_com_gpu_workloads,status,condition,scenario_workloads,uid,cpu_allocatable,memory_allocatable,nvidia_com_gpu_allocatable,scenario_allocatable,cpu_ratio,total_workload,cpu_memory_interaction,workload_allocatable_interaction
0,20231013120400,0,7,14684,0,1,1,0,5,0,0,0,0,79,14692,116157,0
1,20231013120400,0,7,14684,0,1,1,0,37,0,40,0,0,791,14692,116157,587395
2,20231013120400,0,7,14684,0,1,1,0,48,0,0,0,0,316,14692,116157,0
3,20231013120400,1,7,14684,0,1,1,0,23,0,0,0,0,79,14692,116157,0
4,20231013120400,1,7,14684,0,1,1,0,113,0,40,0,0,791,14692,116157,587395


In [20]:
scaler = StandardScaler()
# fit_transform returns a NumPy array, so new_df is no longer a DataFrame after this line
new_df = scaler.fit_transform(new_df)

In [21]:
new_df

array([[-1.49777989, -1.91849583, -1.4312024 , ..., -1.31492526,
        -1.77579529, -0.36453985],
       [-1.49777989, -1.91849583, -1.4312024 , ..., -1.31492526,
        -1.77579529, -0.35887867],
       [-1.49777989, -1.91849583, -1.4312024 , ..., -1.31492526,
        -1.77579529, -0.36453985],
       ...,
       [ 2.22996453,  0.55325112,  0.74382214, ...,  0.06179642,
         0.53898666, -0.36453985],
       [ 2.22996453,  0.55325112,  0.74382214, ...,  0.06179642,
         0.53898666, -0.34132781],
       [ 2.22996453,  0.55325112,  0.74382214, ...,  0.06179642,
         0.53898666,  0.22968854]])
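
`fit_transform` returns a plain NumPy array, which is why the column names are gone in the output above. If the names are needed later, the array can be wrapped back into a DataFrame. This is a minimal sketch on made-up values; `scaled_df` is a hypothetical name.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up values for two of the numeric columns.
demo = pd.DataFrame({
    "cpu_workloads": [7, 7, 31, 31],
    "memory_workloads": [14684, 14684, 60211, 60211],
})

scaler = StandardScaler()
# Wrap the scaled array so the column names and the index survive.
scaled_df = pd.DataFrame(
    scaler.fit_transform(demo),
    columns=demo.columns,
    index=demo.index,
)
print(scaled_df)
```

Each column of `scaled_df` has zero mean and unit variance, and can still be selected by name.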