In [None]:
import numpy as np
import os 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

This code imports five libraries: numpy (as np), os, pandas (as pd), matplotlib's pyplot module (as plt), and seaborn (as sns).

numpy is a library for working with arrays and numerical operations. It is commonly used for scientific computing and data analysis in Python.

os is a module from Python's standard library for interacting with the operating system. It provides operating-system-dependent functionality such as changing the working directory, listing files, and reading or writing files.

pandas is a library for working with data in a tabular format, similar to a spreadsheet. It provides powerful data manipulation and analysis tools.

matplotlib is a plotting library for creating static, animated, and interactive visualizations in Python.

seaborn is a library for creating statistical graphics and data visualization built on top of matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

In [None]:
os.chdir("../input/smoke-detection-dataset")
os.listdir()

This code uses Python's built-in os module to change the current working directory to "../input/smoke-detection-dataset". The ".." refers to the parent of the current directory, and "input/smoke-detection-dataset" is the subdirectory reached from there. The next line, os.listdir(), lists the files and subdirectories of the new working directory, i.e. the contents of the "smoke-detection-dataset" folder. This is likely used to confirm that the notebook is in the correct directory before loading the data for the smoke detection task.
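To see where the relative path points, it can be resolved against a starting directory. The sketch below assumes the notebook starts in /kaggle/working (the usual Kaggle layout; this is an assumption, not stated in the original) and uses posixpath so the result is the same on any operating system:

```python
import posixpath

# Assumed starting directory (hypothetical; typical of Kaggle notebooks)
start_dir = "/kaggle/working"

# ".." moves up one level, then "input/smoke-detection-dataset" is appended
target = posixpath.normpath(posixpath.join(start_dir, "../input/smoke-detection-dataset"))
print(target)  # /kaggle/input/smoke-detection-dataset
```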

In [None]:
# patoolib.extract_archive('smoke_detection_iot.csv.zip')
os.listdir()
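The commented-out line suggests the CSV may arrive as a zip archive. As an alternative to extracting it with patoolib, pandas can read a zip that contains a single CSV directly, because read_csv infers the compression from the ".zip" extension. A minimal self-contained sketch that builds a small hypothetical archive in a temporary directory:

```python
import os
import tempfile
import zipfile

import pandas as pd

# Hypothetical toy CSV content, used only to build a small archive
csv_text = "Temperature[C],Fire Alarm\n20.0,0\n21.5,1\n"

with tempfile.TemporaryDirectory() as tmp:
    zip_path = os.path.join(tmp, "smoke_detection_iot.csv.zip")
    # Create a zip archive containing a single CSV member
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("smoke_detection_iot.csv", csv_text)
    # compression="infer" (the default) detects the zip from the file name
    df = pd.read_csv(zip_path)

print(df.shape)
```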

In [None]:
data= pd.read_csv('smoke_detection_iot.csv')
data.head()


This code is using the pandas library in Python to read a CSV file named "smoke_detection_iot.csv" and store it in a variable called "data." The CSV file likely contains data related to smoke detection in an Internet of Things (IoT) environment.

The second line of code, "data.head()," is using the pandas method "head()" to display the first five rows of the data stored in the "data" variable. This is a useful command to quickly view the contents of a large dataset.
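head() returns the first n rows (n=5 by default), and tail() returns the last rows in the same way. A small sketch on a hypothetical 8-row frame standing in for the real data:

```python
import pandas as pd

# Hypothetical toy frame with 8 rows, standing in for the real dataset
toy = pd.DataFrame({"value": range(8), "Fire Alarm": [0, 1] * 4})

first_five = toy.head()    # default n=5
first_three = toy.head(3)  # an explicit n
last_two = toy.tail(2)     # the last rows instead
print(len(first_five), len(first_three), len(last_two))  # 5 3 2
```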

In [None]:
data.columns

In [None]:
data.drop(['Unnamed: 0', 'UTC','CNT'],axis = 1, inplace = True)
data.columns

This code uses the drop() method of a pandas DataFrame to remove certain columns from the dataframe "data". The call takes several arguments:

The first argument is a list of column names to be removed. In this case, the columns 'Unnamed: 0', 'UTC', and 'CNT' are specified.
The second argument, "axis = 1", specifies that the labels refer to columns. The default is axis = 0 (rows), so it cannot be omitted here; passing columns=['Unnamed: 0', 'UTC', 'CNT'] instead would be equivalent.
The third argument, "inplace = True", specifies that the original dataframe should be modified directly rather than a new dataframe being returned; with this option, drop() returns None.
After the columns are removed, the code accesses the columns attribute of the dataframe to view the remaining columns.
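A minimal sketch of the two equivalent ways to drop columns, and of what inplace=True returns, on a hypothetical toy frame that reuses the dropped column names:

```python
import pandas as pd

# Hypothetical toy frame with the same column names that are dropped above
df = pd.DataFrame({"Unnamed: 0": [0, 1], "UTC": [1, 2], "CNT": [0, 1], "Fire Alarm": [0, 1]})

# Equivalent to drop([...], axis=1): the columns= keyword targets columns directly
trimmed = df.drop(columns=["Unnamed: 0", "UTC", "CNT"])

# With inplace=True the frame itself is modified and drop() returns None
result = df.drop(["Unnamed: 0", "UTC", "CNT"], axis=1, inplace=True)
print(list(df.columns), result)  # ['Fire Alarm'] None
```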

In [None]:
print(f"Shape of data {data.shape}")
print("Info of data:")
data.info()  # info() prints its summary itself and returns None

display(data.describe())
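DataFrame.info() writes its summary to a buffer (stdout by default) and returns None, so placing it inside an f-string would only print "None". A minimal sketch of capturing the summary as a string with the buf parameter, on a hypothetical toy frame:

```python
import io

import pandas as pd

# Hypothetical toy frame
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0, 1, 0]})

buf = io.StringIO()
returned = df.info(buf=buf)   # writes the summary into buf instead of stdout
info_text = buf.getvalue()

print(returned)               # None
print(info_text)
```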


In [None]:
data.isnull().sum()

In [None]:
data.skew()
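isnull().sum() counts the missing values in each column, and skew() returns the sample skewness of each numeric column (positive when the right tail is longer, negative when the left tail is). A small sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy frame: one missing value in "a", a long right tail in "b"
df = pd.DataFrame({"a": [1.0, 2.0, None, 4.0], "b": [1.0, 1.0, 1.0, 10.0]})

missing = df.isnull().sum()   # number of missing values per column
skewness = df.skew()          # sample skewness per column (NaN values skipped)
print(missing)
print(skewness)
```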

In [None]:
# Visualize the data
sns.histplot(data=data, x='Raw H2', hue='Fire Alarm', kde=True, bins=100)


This code plots a histogram of the "Raw H2" column of "data" using 100 bins. The bars are colored by the "Fire Alarm" column so the two classes can be compared, and a kernel density estimate (KDE) curve is overlaid for each class.
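With hue set, seaborn's histplot by default uses bin edges shared across the hue levels (common_bins=True) and layers the per-class histograms. A rough NumPy sketch of the counts behind such a plot, using hypothetical normally distributed values rather than the real sensor readings:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical values for two classes (0 = no alarm, 1 = alarm)
values = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(2.0, 1.0, 500)])
labels = np.array([0] * 500 + [1] * 500)

# 100 bins whose edges are shared by both classes
edges = np.histogram_bin_edges(values, bins=100)

# Per-class counts over the shared edges
counts = {c: np.histogram(values[labels == c], bins=edges)[0] for c in (0, 1)}
print({c: int(n.sum()) for c, n in counts.items()})
```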

In [None]:
sns.histplot(data=data, x='Temperature[C]', hue='Fire Alarm', kde=True, bins=100)


This code creates a histogram of the "Temperature[C]" column of the "data" dataset, separating the data by the "Fire Alarm" column, displaying the density of the data using a kernel density estimate, and using 100 bins for the histogram.
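The kde=True option overlays a smoothed density curve on the histogram. A comparable Gaussian kernel density estimate can be computed with scipy.stats.gaussian_kde; the sketch below uses hypothetical normal data and SciPy's default bandwidth (Scott's rule), which may not match the exact smoothing seaborn applies:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical readings, for illustration only
sample = rng.normal(loc=20.0, scale=5.0, size=1000)

kde = gaussian_kde(sample)            # bandwidth chosen by Scott's rule (default)
grid = np.linspace(sample.min(), sample.max(), 200)
density = kde(grid)                   # estimated density at each grid point

# A density estimate is non-negative and integrates to 1 over the real line
total = kde.integrate_box_1d(-np.inf, np.inf)
print(round(total, 6))
```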

In [None]:
sns.histplot(data=data, x='Pressure[hPa]', hue='Fire Alarm', kde=True, bins=100)

This code creates a histogram of the "Pressure[hPa]" variable, separated by "Fire Alarm" status, with a density curve overlaid and 100 bins.

In [None]:
# Split data
X = data.drop(['Fire Alarm'],axis = 1)
y = data['Fire Alarm']
cols = X.columns

This code separates the features from the target: data.drop() returns a copy of the data without the 'Fire Alarm' column, which is assigned to 'X', while the 'Fire Alarm' column itself is assigned to 'y'. The column names of 'X' are saved in 'cols' so they can be reattached after scaling.
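A small sketch of the same split on a hypothetical toy frame, showing that drop() without inplace leaves the original frame untouched and that the target comes out as a Series:

```python
import pandas as pd

# Hypothetical toy frame standing in for "data"
toy = pd.DataFrame({"Temperature[C]": [20.1, 25.3, 30.2],
                    "Raw H2": [12000, 12500, 13000],
                    "Fire Alarm": [0, 1, 1]})

X = toy.drop(["Fire Alarm"], axis=1)  # new DataFrame with the feature columns
y = toy["Fire Alarm"]                 # target column as a Series
cols = X.columns

print(list(cols))                     # ['Temperature[C]', 'Raw H2']
print("Fire Alarm" in toy.columns)    # True: the original frame is unchanged
```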

In [None]:
# Standardize features to zero mean and unit variance
from sklearn.preprocessing import StandardScaler
stdScaler = StandardScaler()
stdScaler.fit(X)
X = stdScaler.transform(X)

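This code imports the StandardScaler class from sklearn.preprocessing, creates an instance of it, fits it to X (computing the mean and standard deviation of each column), and then transforms X so that every column has zero mean and unit variance. The result is a NumPy array, which is why the next cell wraps it back into a DataFrame with the saved column names.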
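Standardization computes z = (x - mean) / std for each column, where StandardScaler uses the population standard deviation (ddof=0). A minimal sketch checking this by hand on a hypothetical toy matrix; fit_transform combines the fit and transform calls:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical 2-feature toy matrix
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])

scaled = StandardScaler().fit_transform(X)   # fit and transform in one call

# Same result by hand: z = (x - mean) / std, with population std (ddof=0)
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(scaled, manual))           # True
```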

In [None]:
X = pd.DataFrame(X,columns= cols)

In [None]:
X.head()

In [None]:
# Check skew
skew_limit = 0.5
skew_cols = X.skew()
skew_cols = skew_cols.sort_values(ascending = False)
skew_df = skew_cols.to_frame()
skew_df = skew_df.rename(columns = {0 : 'Skew'})
skew_cols = skew_df.query(f'abs(Skew) > {skew_limit}')
display(skew_cols)

This code computes the skewness of every column of X, sorts the values from most positive to most negative, keeps only the columns whose absolute skewness exceeds skew_limit (0.5), and displays them in a table.
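The same filter can be written more compactly with boolean indexing instead of to_frame(), rename(), and query(). A sketch with a hypothetical helper named skewed_columns, applied to toy data:

```python
import pandas as pd

def skewed_columns(df, limit=0.5):
    """Hypothetical helper: skewness of columns with |skew| > limit, largest first."""
    skew = df.skew().sort_values(ascending=False)
    return skew[skew.abs() > limit]

# Hypothetical toy frame: "right" is right-skewed, "sym" is symmetric
toy = pd.DataFrame({"right": [1, 1, 1, 1, 10], "sym": [1, 2, 3, 4, 5]})
result = skewed_columns(toy)
print(result)
```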

In [None]:
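from scipy.special import boxcox1p
from scipy.stats import boxcox_normmax

def NormalizeSkewedFeatures(data_modelling):
    # Apply a Box-Cox transform to every column flagged as skewed in skew_cols
    for col in skew_cols.index:
        try:
            # Estimate the optimal lambda on the shifted data (x + 1),
            # then apply the Box-Cox transform of 1 + x with that lambda
            lmbda = boxcox_normmax(data_modelling[col] + 1)
            data_modelling[col] = boxcox1p(data_modelling[col], lmbda)
        except Exception:
            # Box-Cox needs strictly positive input, so some columns may fail
            print(f"Column {col}: cannot apply Box-Cox")
    return data_modelling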

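This code uses SciPy to reduce skewness: for each column listed in skew_cols, it estimates the optimal Box-Cox lambda with boxcox_normmax on the shifted values (x + 1) and applies the transformation with boxcox1p. The Box-Cox transform requires strictly positive input, and because the features were standardized earlier, many values are negative (some below -1), so the transformation may fail for some columns; those columns are reported and left unchanged.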
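For a parameter lambda other than 0, boxcox1p(x, lambda) computes ((1 + x)^lambda - 1) / lambda, and for lambda = 0 it reduces to log(1 + x). A quick numerical check of that definition on hypothetical values, all greater than -1:

```python
import numpy as np
from scipy.special import boxcox1p

x = np.array([0.0, 0.5, 2.0, 9.0])  # hypothetical values, all > -1

lmbda = 0.25
by_formula = ((1 + x) ** lmbda - 1) / lmbda
print(np.allclose(boxcox1p(x, lmbda), by_formula))   # True

# lambda = 0 is the limiting case: the transform becomes log(1 + x)
print(np.allclose(boxcox1p(x, 0.0), np.log1p(x)))    # True
```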

In [None]:
X_Train = NormalizeSkewedFeatures(X.copy())

In [None]:
X_Train['Temperature[C]'] = X['Temperature[C]']

In [None]:
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(max_depth=4, random_state=0)
clf.fit(X, y)

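This code creates a random forest classifier whose trees have a maximum depth of 4, with random_state=0 for reproducibility, and fits it to the features X and the target y. Note that it is fit on the standardized X, not on the Box-Cox-transformed X_Train built in the previous cells.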
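After fitting, a random forest exposes impurity-based feature importances through its feature_importances_ attribute, which can indicate which inputs the model relies on most. A sketch on synthetic data from make_classification (not the smoke dataset), with the same max_depth and random_state as above and a smaller, illustrative n_estimators:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the sensor features (hypothetical data)
X, y = make_classification(n_samples=300, n_features=5, n_informative=3, random_state=0)

clf = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0)
clf.fit(X, y)

importances = clf.feature_importances_   # one value per feature, summing to 1
for i in np.argsort(importances)[::-1]:
    print(f"feature {i}: {importances[i]:.3f}")
```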

In [None]:
y_predict = clf.predict(X)
data['Fire Alarm Predict'] = y_predict
data.head(10)

This code uses a trained classifier (clf) to predict the outcome of the input data (X) and assigns the predicted values to a new column called "Fire Alarm Predict" in a dataframe (data). It then displays the first 10 rows of the updated dataframe.
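For a random forest, predict() returns the class with the highest probability from predict_proba(), which averages the class probabilities of the individual trees. A sketch checking this on synthetic data (hypothetical, not the smoke dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)  # hypothetical data
clf = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=0).fit(X, y)

proba = clf.predict_proba(X)   # class probabilities, columns ordered as clf.classes_
pred = clf.predict(X)

# predict() picks the class with the highest averaged probability
print(np.array_equal(pred, clf.classes_[proba.argmax(axis=1)]))  # True
```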

In [None]:
print(f"Accuracy of model {clf.score(X, y)}")

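This prints the model's mean accuracy (clf.score) on X and y. Since the model was trained on the same data, this is training accuracy and likely overestimates how well the model would perform on unseen sensor readings.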
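A held-out estimate requires scoring the model on rows it did not see during fitting. A sketch using train_test_split on synthetic data; the 25% test size and the stratified split are illustrative choices, not taken from the original notebook:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=6, random_state=0)  # hypothetical data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

clf = RandomForestClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

train_acc = clf.score(X_tr, y_tr)   # accuracy on data the model has seen
test_acc = clf.score(X_te, y_te)    # accuracy on held-out data
print(f"train accuracy {train_acc:.3f}, test accuracy {test_acc:.3f}")
```

clf.score() for a classifier is the same quantity as accuracy_score(y_true, clf.predict(X)).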