This project focuses on analyzing raw data collected from IoT sensors to extract meaningful insights using machine learning. It involves data preprocessing, handling missing values, feature engineering, and applying predictive modeling techniques. The goal is to transform sensor data into actionable intelligence for improved decision-making.

In [1]:
#Importing the Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')

In [13]:
#Importing the Dataset
data = pd.read_csv('dataset-grid-toplology-1.csv',sep='\t',on_bad_lines='skip')
data.head()

Unnamed: 0,No,Time,From,To,sourceaddress,destinationaddress,edgenodeaddress,INSTANCE_ID,rank,rank_min,...,hops,distance,node_type,node_honest_level,energy_efficiency_OF,lattency_min_OF,pdr_OF,congestion_OF,throughput_opt_OF,Data
0,2,00:03.044,1,[25 d],fe80::212:74011101,fe80::212:74033303,fe80::212:74011101,87bb925b-d3ff-46a6-b339-7d48b8993bec,1024.0,256,...,2,-12,EDGE_NODE,793,-0.000479,-1737107271769,0.01,0.25,0.044803,97: 0x41D88ACD ABFFFF01 01010001 7412007A 3B3A...
1,3,00:03.993,20,[20 d],fe80::212:7420202020,fe80::212:74044404,fe80::212:74000101,bd86a2f6-b3d6-4c67-a9d3-2a42904c243c,,256,...,2,128,CHILD_NODE,788,-0.000417,-1737107271609,0.01,0.25,0.043178,102: 0x61DC18CD AB020202 00027412 00141414 001...
2,4,00:03.993,24,0,fe80::212:7424242424,0,0,0233072f-b8cd-4189-818e-2a4e2b8f1ae7,65535.0,256,...,1,256,CHILD_NODE,768,-0.000416,0,,0.0,0.065104,102: 0x61DC18CD AB020202 00027412 00181818 001...
3,5,00:03.997,42,[10 d],fe80::212:7442424242,fe80::212:74044404,fe80::212:74000101,c0a88d50-730f-4c1a-b21e-83c23f59d9b0,,256,...,1,128,CHILD_NODE,778,-0.000417,-1737107271609,0.01,0.25,0.041876,102: 0x61DC18CD AB020202 00027412 002A2A2A 002...
4,6,00:03.997,19,0,fe80::212:7419191919,0,0,3c256cd4-9b1b-43e3-ab9d-a61f12a776ce,65535.0,256,...,1,256,CHILD_NODE,768,-0.000417,0,,0.0,0.065104,102: 0x61DC18CD AB020202 00027412 00131313 001...


In [15]:
#Profile of the data - No. of Rows,Column,Null Values,Data Types,Data Size.

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33889 entries, 0 to 33888
Data columns (total 34 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   No                    33889 non-null  object 
 1   Time                  33889 non-null  object 
 2   From                  33889 non-null  int64  
 3   To                    33889 non-null  object 
 4   sourceaddress         33889 non-null  object 
 5   destinationaddress    33889 non-null  object 
 6   edgenodeaddress       33889 non-null  object 
 7   INSTANCE_ID           33889 non-null  object 
 8   rank                  32541 non-null  float64
 9   rank_min              33889 non-null  int64  
 10  rank_max              33889 non-null  int64  
 11  node_id               33889 non-null  int64  
 12  ver                   33889 non-null  int64  
 13  ver_min               33889 non-null  int64  
 14  ver_max               33889 non-null  int64  
 15  ver_diff           

The IoT sensor dataset has 33,889 records (rows) and 34 features (columns), with column data types int64, float64, and object. It contains missing values in rank and pdr_OF, which require preprocessing before analysis.

In [25]:
#Convert the column names to lowercase

data.columns = data.columns.str.lower()
data.columns

Index(['no', 'time', 'from', 'to', 'sourceaddress', 'destinationaddress',
       'edgenodeaddress', 'instance_id', 'rank', 'rank_min', 'rank_max',
       'node_id', 'ver', 'ver_min', 'ver_max', 'ver_diff', 'sending_time',
       'sending_rate', 'delta_time', 'received_packets', 'forward_packets',
       'drop_count', 'energy_consumption', 'etx', 'hops', 'distance',
       'node_type', 'node_honest_level', 'energy_efficiency_of',
       'lattency_min_of', 'pdr_of', 'congestion_of', 'throughput_opt_of',
       'data'],
      dtype='object')

Observation - The column names are converted to lowercase, which makes them easier to read and to reference consistently in later cells.

In [29]:
# Count of missing (NaN) values per column

data.isna().sum()

no                          0
time                        0
from                        0
to                          0
sourceaddress               0
destinationaddress          0
edgenodeaddress             0
instance_id                 0
rank                     1348
rank_min                    0
rank_max                    0
node_id                     0
ver                         0
ver_min                     0
ver_max                     0
ver_diff                    0
sending_time            33889
sending_rate                0
delta_time                  0
received_packets            0
forward_packets             0
drop_count                  0
energy_consumption          0
etx                         0
hops                        0
distance                    0
node_type                   0
node_honest_level           0
energy_efficiency_of        0
lattency_min_of             0
pdr_of                   7042
congestion_of               0
throughput_opt_of           0
data      
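Observation - The counts above show three columns with gaps: rank (1,348 missing), pdr_of (7,042 missing) and sending_time, which is empty in all 33,889 rows. One possible way to handle them is sketched below on a small toy frame. The frame `toy`, the helper column `rank_missing`, and the choice of median imputation are assumptions made for illustration, not the method used in this project.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "rank": [1024.0, np.nan, 65535.0, np.nan],
    "sending_time": [np.nan, np.nan, np.nan, np.nan],
    "pdr_of": [0.01, 0.01, np.nan, 0.01],
    "hops": [2, 2, 1, 1],
})

# Share of missing values per column (1.0 means the column is entirely empty).
missing_share = toy.isna().mean()

# Drop columns that contain no values at all (here: sending_time).
cleaned = toy.dropna(axis=1, how="all")

# Keep a flag for rows where rank was missing, then fill with the median.
# Median imputation is only one option; it is assumed here for illustration.
cleaned = cleaned.assign(rank_missing=cleaned["rank"].isna())
cleaned["rank"] = cleaned["rank"].fillna(cleaned["rank"].median())

print(missing_share)
print(cleaned)
```

Keeping the `rank_missing` flag preserves the information that a value was imputed, which may itself be a useful feature for a later model.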

In [31]:
data.nunique()

no                      33889
time                    31601
from                       52
to                         87
sourceaddress              52
destinationaddress         39
edgenodeaddress             3
instance_id             33889
rank                        6
rank_min                    1
rank_max                    1
node_id                    52
ver                         1
ver_min                     1
ver_max                     1
ver_diff                    1
sending_time                0
sending_rate                6
delta_time                445
received_packets         3292
forward_packets          2065
drop_count                 44
energy_consumption        161
etx                        89
hops                        3
distance                    3
node_type                   3
node_honest_level          44
energy_efficiency_of      161
lattency_min_of          3245
pdr_of                      1
congestion_of               2
throughput_opt_of          89
data      
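Observation - Several columns above have a single unique value (for example rank_min, rank_max, ver, ver_min, ver_max, ver_diff and pdr_of), and sending_time has none. Such columns carry no information that distinguishes one row from another. A minimal sketch of how they could be detected and dropped is shown below. The toy frame and the names `constant_cols` and `reduced` are hypothetical, and whether to drop these columns in the real dataset is a judgment call.

```python
import pandas as pd

# Toy frame with two constant columns (illustrative values only).
toy = pd.DataFrame({
    "rank_min": [256, 256, 256],
    "ver": [1, 1, 1],
    "hops": [2, 1, 1],
    "node_type": ["EDGE_NODE", "CHILD_NODE", "CHILD_NODE"],
})

# nunique() ignores NaN by default, so an all-NaN column reports 0.
n_unique = toy.nunique()

# Columns with at most one distinct value cannot separate rows.
constant_cols = n_unique[n_unique <= 1].index.tolist()

# Drop them to get a leaner feature set.
reduced = toy.drop(columns=constant_cols)

print(constant_cols)
print(reduced.columns.tolist())
```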

In [39]:
data['time']

0        00:03.044
1        00:03.993
2        00:03.993
3        00:03.997
4        00:03.997
           ...    
33884    03:35.857
33885    03:35.860
33886    03:35.881
33887    03:35.890
33888    03:35.894
Name: time, Length: 33889, dtype: object
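Observation - The time column is stored as text (dtype object), so it cannot be used for arithmetic or sorting by elapsed time as it stands. The sketch below converts such strings to elapsed seconds. It assumes the format is minutes:seconds.milliseconds, which is inferred from values like 00:03.044 and is not confirmed by the dataset documentation. The names `sample` and `elapsed_s` are hypothetical.

```python
import pandas as pd

# A few values copied from the output above.
sample = pd.Series(["00:03.044", "00:03.993", "03:35.894"], name="time")

# Split "MM:SS.mmm" into its minute and second parts (assumed format).
parts = sample.str.split(":", expand=True)
minutes = parts[0].astype(int)
seconds = parts[1].astype(float)

# Elapsed time in seconds as a numeric column.
elapsed_s = minutes * 60 + seconds

print(elapsed_s)
```

A numeric elapsed-time column of this kind can then be used for time-based features such as inter-packet gaps per node.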
