***Checkpoint problem statement***: Network operational expenditure (OPEX) already accounts for around 25 percent of a telecom operator’s total cost, and 90 percent of it is spent on large energy bills. More than 70 percent of this energy is estimated to be consumed by the radio access network (RAN), particularly by the base stations (BSs). The objective is therefore to build and train an ML model that estimates the energy consumed by different 5G base stations, taking into account the impact of various engineering configurations, traffic conditions, and energy-saving methods.

***Dataset description***: This dataset is derived from the original dataset and simplified for learning purposes. It contains cell-level traffic statistics of 4G/5G sites collected on different days.


![Image](https://i.imgur.com/Agu9zeP_d.webp?maxwidth=760&fidelity=grand)

## Instructions

1. Import your data and perform a basic data exploration phase
 - Display general information about the dataset
 - Create a pandas profiling report to gain insights into the dataset
 - Handle missing and corrupted values
 - Remove duplicates, if they exist
 - Handle outliers, if they exist
 - Encode categorical features
2. Select your target variable and the features
3. Split your dataset into training and test sets
4. Based on your data exploration phase, select an ML regression algorithm and train it on the training set
5. Assess your model's performance on the test set using relevant evaluation metrics
6. Discuss with your cohort alternative ways to improve your model's performance
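
Steps 2 to 5 can be sketched end to end as follows. This is only a minimal illustration: the `demo` frame is synthetic, the linear relationship between `Energy` and the other columns is an assumption made up so the model has something to learn, and the 80/20 split and `random_state=42` are arbitrary choices rather than requirements of the checkpoint.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (all values are made up).
rng = np.random.default_rng(0)
n = 500
demo = pd.DataFrame({
    "load": rng.uniform(0.0, 1.0, n),
    "ESMODE": rng.uniform(0.0, 1.0, n),
    "TXpower": rng.uniform(5.0, 8.5, n),
})
# Assumed relationship plus noise, for illustration only.
demo["Energy"] = (
    10 + 40 * demo["load"] - 5 * demo["ESMODE"] + 2 * demo["TXpower"]
    + rng.normal(0, 1, n)
)

# Step 2: choose the target and the features.
X = demo[["load", "ESMODE", "TXpower"]]
y = demo["Energy"]

# Step 3: hold out 20% of the rows as a test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Step 4: fit a regression model on the training set.
model = LinearRegression().fit(X_train, y_train)

# Step 5: evaluate on the held-out test set.
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"MAE: {mae:.3f}, R^2: {r2:.3f}")
```

On the real data the scores will depend on how the features are prepared, so the numbers printed here say nothing about the actual problem.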

# Import Libraries 

In [20]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn.datasets
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import os

# Extract Data

In [4]:
# Directory that contains the dataset (the file name is joined on below)
data_path = r"C:\Users\Elle\Desktop\MACHINE LEARNING\ML_CheckPoints\5G_energy_consumption"

In [5]:
file_name = "5G_energy_consumption_dataset.csv"
final_path = os.path.join(data_path, file_name)
final_path

'C:\\Users\\Elle\\Desktop\\MACHINE LEARNING\\ML_CheckPoints\\5G_energy_consumption\\5G_energy_consumption_dataset.csv'

In [6]:
data = pd.read_csv(final_path)

# Exploratory Data Analysis

In [7]:
data.shape

(92629, 6)

In [8]:
data.tail()

| | Time | BS | Energy | load | ESMODE | TXpower |
|---|---|---|---|---|---|---|
| 92624 | 20230102 170000 | B_1018 | 14.648729 | 0.087538 | 0.0 | 7.325859 |
| 92625 | 20230102 180000 | B_1018 | 14.648729 | 0.082635 | 0.0 | 7.325859 |
| 92626 | 20230102 210000 | B_1018 | 13.452915 | 0.055538 | 0.0 | 7.325859 |
| 92627 | 20230102 220000 | B_1018 | 13.602392 | 0.058077 | 0.0 | 7.325859 |
| 92628 | 20230102 230000 | B_1018 | 13.303438 | 0.048173 | 0.0 | 7.325859 |


In [9]:
data.head()

| | Time | BS | Energy | load | ESMODE | TXpower |
|---|---|---|---|---|---|---|
| 0 | 20230101 010000 | B_0 | 64.275037 | 0.487936 | 0.0 | 7.101719 |
| 1 | 20230101 020000 | B_0 | 55.904335 | 0.344468 | 0.0 | 7.101719 |
| 2 | 20230101 030000 | B_0 | 57.698057 | 0.193766 | 0.0 | 7.101719 |
| 3 | 20230101 040000 | B_0 | 55.156951 | 0.222383 | 0.0 | 7.101719 |
| 4 | 20230101 050000 | B_0 | 56.053812 | 0.175436 | 0.0 | 7.101719 |


In [10]:
data.loc[0]

Time                      20230101 010000 
BS                                 B_0    
Energy                           64.275037
load                              0.487936
ESMODE                                 0.0
TXpower                           7.101719
Name: 0, dtype: object
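
The `BS` column holds string identifiers such as `B_0`, so it has to be encoded before a regression model can use it. One possible approach, shown here on a small made-up frame, is to map each station to an integer code through the pandas `category` dtype. The `BS_code` column name is an assumption introduced for this sketch.

```python
import pandas as pd

# Tiny illustrative frame; the identifiers mimic the format seen in the dataset.
demo = pd.DataFrame({"BS": ["B_0", "B_0", "B_1018", "B_1"]})

# Categories are sorted lexicographically, then numbered from 0.
demo["BS_code"] = demo["BS"].astype("category").cat.codes
print(demo)
```

Integer codes imply an ordering that base stations do not really have, which a linear model will take literally. One-hot encoding with `pd.get_dummies` avoids that at the cost of one column per station.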

In [11]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 92629 entries, 0 to 92628
Data columns (total 6 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Time                    92629 non-null  object 
 1   BS                      92629 non-null  object 
 2   Energy                  92629 non-null  float64
 3   load                    92629 non-null  float64
 4   ESMODE                  92629 non-null  float64
 5   TXpower                 92629 non-null  float64
dtypes: float64(4), object(2)
memory usage: 4.2+ MB
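
`data.info()` reports `Time` as `object`, meaning the timestamps are stored as plain strings. A minimal sketch of converting them, assuming every value follows the `YYYYMMDD HHMMSS` pattern seen in the rows above, could look like this. The `hour` and `dayofweek` columns are hypothetical derived features, not part of the original dataset.

```python
import pandas as pd

# Two timestamps in the same format as the dataset's Time column.
demo = pd.DataFrame({"Time": ["20230101 010000", "20230102 230000"]})

# Parse the strings with an explicit format so malformed values raise an error.
demo["Time"] = pd.to_datetime(demo["Time"], format="%Y%m%d %H%M%S")

# Derive simple calendar features (Monday is 0, Sunday is 6).
demo["hour"] = demo["Time"].dt.hour
demo["dayofweek"] = demo["Time"].dt.dayofweek
print(demo)
```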


In [12]:
data.isnull().sum()

Time                      0
BS                        0
Energy                    0
load                      0
ESMODE                    0
TXpower                   0
dtype: int64
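
With no missing values found, the next item on the checklist is duplicate rows. The sketch below uses a small made-up frame to show how they could be counted and removed. Whether the real dataset contains any duplicates is not shown in this notebook.

```python
import pandas as pd

# Illustrative frame in which the second row repeats the first.
demo = pd.DataFrame({
    "Time": ["20230101 010000", "20230101 010000", "20230101 020000"],
    "BS": ["B_0", "B_0", "B_0"],
    "Energy": [64.275037, 64.275037, 55.904335],
})

# Count rows that are exact copies of an earlier row.
n_dupes = int(demo.duplicated().sum())

# Keep the first occurrence of each row and rebuild a clean index.
deduped = demo.drop_duplicates().reset_index(drop=True)
print(n_dupes, len(deduped))
```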

In [33]:
describtion = data.describe().T
describtion.shape

(4, 8)

In [45]:
data.describe().T

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Energy | 92629.0 | 28.138997 | 13.934645 | 0.747384 | 18.236173 | 24.06577 | 35.724963 | 100.0 |
| load | 92629.0 | 0.244705 | 0.234677 | 0.0 | 0.05737 | 0.16555 | 0.363766 | 0.993957 |
| ESMODE | 92629.0 | 0.081361 | 0.382317 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 |
| TXpower | 92629.0 | 6.765427 | 0.309929 | 5.381166 | 6.427504 | 6.875934 | 6.875934 | 8.375336 |


In [34]:
selected_columns = data.select_dtypes(exclude=['object'])
describtion['nunique']=selected_columns.nunique()
describtion['NULLS']=selected_columns.isna().sum()
describtion

| | count | mean | std | min | 25% | 50% | 75% | max | nunique | NULLS |
|---|---|---|---|---|---|---|---|---|---|---|
| Energy | 92629.0 | 28.138997 | 13.934645 | 0.747384 | 18.236173 | 24.06577 | 35.724963 | 100.0 | 612 | 0 |
| load | 92629.0 | 0.244705 | 0.234677 | 0.0 | 0.05737 | 0.16555 | 0.363766 | 0.993957 | 58563 | 0 |
| ESMODE | 92629.0 | 0.081361 | 0.382317 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 1713 | 0 |
| TXpower | 92629.0 | 6.765427 | 0.309929 | 5.381166 | 6.427504 | 6.875934 | 6.875934 | 8.375336 | 41 | 0 |
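
The summary shows an `Energy` maximum of 100.0 against a 75th percentile of about 35.7, which suggests a long upper tail. One common way to flag such values is the 1.5 × IQR rule, sketched below on a handful of made-up numbers. The sample values and the decision to clip rather than drop are assumptions for illustration only.

```python
import pandas as pd

# Made-up sample loosely shaped like the Energy column.
energy = pd.Series([18.2, 20.1, 24.0, 24.1, 26.3, 30.0, 35.7, 100.0], name="Energy")

# Interquartile range and the conventional 1.5 * IQR fences.
q1, q3 = energy.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag values outside the fences, then cap them at the fence values.
is_outlier = (energy < lower) | (energy > upper)
clipped = energy.clip(lower=lower, upper=upper)
print(int(is_outlier.sum()), "value(s) flagged")
```

Extreme readings may be genuine rather than errors, so it is worth checking what they represent before clipping or removing them.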


In [None]:
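# Requires the pandas-profiling package (newer releases are published as ydata-profiling)
import pandas_profiling
profile = pandas_profiling.ProfileReport(data)
profile.to_file("data_exploration_report.html")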

In [None]:
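# Fill missing numeric values with the column mean (skips the Time and BS text columns)
data.fillna(data.mean(numeric_only=True), inplace=True)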