Classification model of energy consumption, built on data collected from a smart small-scale steel industry in South Korea. Data source: V E, Sathishkumar, Shin, Changsun, and Cho, Yongyun. (2023). Steel Industry Energy Consumption. UCI Machine Learning Repository. https://doi.org/10.24432/C52G8C.

Hokfu/Energy-Consumption-Model

Table of Contents

  1. Introduction
  2. Methodology
  3. Problem Description
  4. EDA
  5. Model Training
  6. Parameter Tuning
  7. Dependency and Environment Management
  8. Containerization
  9. Deployment

Introduction

This project is a classification model of energy consumption, built on data collected from a smart small-scale steel industry in South Korea.
The data comes from the UCI Machine Learning Repository.

Methodology

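This work uses the dataset from the following source (@sathishkumar2023steel):
V E, Sathishkumar, Shin, Changsun, and Cho, Yongyun. (2023). Steel Industry Energy Consumption. UCI Machine Learning Repository. https://doi.org/10.24432/C52G8C.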

The dataset is included in this repository as Steel_industry_data.csv. It can also be downloaded directly:

wget 'https://raw.githubusercontent.com/Hokfu/Energy-Consumption-Model/main/Steel_industry_data.csv'

Problem Description

Apart from market competition, a steel company faces several challenges: increased energy costs, downtime, inefficient resource allocation, maintenance, and regulatory compliance.

Problem: If the company does not know which conditions lead to high energy consumption and which lead to low or medium energy loads, these challenges can become serious problems.

Opportunity: Conversely, if the company can predict the energy consumption of a process in advance, it can address the challenges above and gain a market advantage.

EDA

First, I examined the relationship between the numerical features and the target, which is the energy load type.
Figure: relationship between numerical features and target variable
The violin plots show clearly that NSM has the strongest effect on the load type. A violin plot or box plot shows the distribution of a numerical feature; here, I checked the distribution of each numerical feature for each load type.
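The per-class comparison that a violin or box plot draws can also be checked numerically. The following is a minimal sketch, not the notebook's actual code: it computes the quartiles of one feature for each load type. The column names `NSM` and `Load_Type`, the class labels, and the synthetic data are assumptions for illustration; the real CSV may name them differently.

```python
import numpy as np
import pandas as pd


def summarize_by_load_type(df, feature, target="Load_Type"):
    """Return the 25%, 50%, and 75% quantiles of `feature` for each class in `target`."""
    return (
        df.groupby(target)[feature]
        .quantile([0.25, 0.5, 0.75])
        .unstack()  # rows: classes, columns: quantile levels
    )


# Synthetic stand-in data (assumption): two load types that occupy
# different ranges of NSM, so their distributions barely overlap.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "NSM": np.concatenate([
        rng.uniform(0, 30_000, 200),
        rng.uniform(30_000, 86_400, 200),
    ]),
    "Load_Type": ["Light_Load"] * 200 + ["Maximum_Load"] * 200,
})

summary = summarize_by_load_type(demo, "NSM")
print(summary)
```

A feature whose quartiles differ strongly between classes, as NSM does in this synthetic example, is the kind of feature that stands out in the violin plot.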


Figure: feature importance
The influence of NSM is even more evident in the feature importances of the trained random forest model.
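As a rough illustration of how such a ranking can be obtained, the sketch below fits a scikit-learn `RandomForestClassifier` and sorts its `feature_importances_`. The helper name `rank_features`, the feature names, and the synthetic data are hypothetical and are not taken from the repository.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_features(X, y, names, seed=0):
    """Fit a random forest and return (name, importance) pairs, most important first."""
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    model.fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    return [(names[i], float(model.feature_importances_[i])) for i in order]


# Synthetic stand-in data (assumption): the label depends only on NSM,
# while the second column is pure noise.
rng = np.random.default_rng(0)
nsm = rng.uniform(0, 86_400, 600)
noise = rng.normal(size=600)
X = np.column_stack([nsm, noise])
y = np.where(nsm < 30_000, "Light_Load", "Maximum_Load")

ranking = rank_features(X, y, ["NSM", "noise"])
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The importances are normalized to sum to 1, so they are best read as relative weights within one fitted model rather than as absolute effect sizes.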

Model Training

I trained two models: logistic regression and random forest. Overall, the random forest performed better, so I chose it as the final model.
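A comparison of this kind might look like the sketch below, which trains both model types on the same split and reports validation accuracy. It is an illustration under assumptions: the split ratio, the hyperparameters, the scaling step, the use of accuracy as the metric, and the synthetic data are choices made here, not the settings used in train.py.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(X, y, seed=0):
    """Train both candidate models on one split and return validation accuracy per model."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    candidates = {
        # Scaling helps logistic regression converge on large-valued features.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "random_forest": RandomForestClassifier(n_estimators=50, random_state=seed),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = float(accuracy_score(y_val, model.predict(X_val)))
    return scores


# Synthetic stand-in data (assumption): three load types defined by NSM bands.
rng = np.random.default_rng(0)
nsm = rng.uniform(0, 86_400, 600)
noise = rng.normal(size=600)
X = np.column_stack([nsm, noise])
y = np.where(
    nsm < 30_000,
    "Light_Load",
    np.where(nsm < 60_000, "Medium_Load", "Maximum_Load"),
)

scores = compare_models(X, y)
print(scores)
```

Which model wins depends on the data; on the real dataset the comparison should be rerun rather than inferred from this toy example.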

Parameter Tuning

The maximum depth and the minimum number of samples per leaf are tuned in a loop to find the best values.
Figures: max depth tuning; min samples leaf tuning
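One way to write such a loop is sketched below. It assumes the two settings correspond to scikit-learn's `max_depth` and `min_samples_leaf` parameters and searches them jointly with nested loops; the candidate values, the accuracy metric, and the synthetic data are hypothetical, and the repository may instead tune each parameter separately.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

DEPTHS = (3, 5, 10, None)  # None lets the trees grow until the leaves are pure
LEAVES = (1, 5, 10)


def tune_random_forest(X_train, y_train, X_val, y_val, seed=0):
    """Try every (max_depth, min_samples_leaf) pair and keep the most accurate one."""
    best = {"max_depth": None, "min_samples_leaf": None, "accuracy": -1.0}
    for depth in DEPTHS:
        for leaf in LEAVES:
            model = RandomForestClassifier(
                n_estimators=30,
                max_depth=depth,
                min_samples_leaf=leaf,
                random_state=seed,
            )
            model.fit(X_train, y_train)
            accuracy = float(accuracy_score(y_val, model.predict(X_val)))
            if accuracy > best["accuracy"]:
                best = {
                    "max_depth": depth,
                    "min_samples_leaf": leaf,
                    "accuracy": accuracy,
                }
    return best


# Synthetic stand-in data (assumption): the label depends on an NSM threshold.
rng = np.random.default_rng(0)
nsm = rng.uniform(0, 86_400, 500)
noise = rng.normal(size=500)
X = np.column_stack([nsm, noise])
y = np.where(nsm < 40_000, "Light_Load", "Maximum_Load")
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

best = tune_random_forest(X_train, y_train, X_val, y_val)
print(best)
```

Scoring on a held-out validation split, rather than on the training data, keeps the loop from simply rewarding the deepest trees.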

Dependency and Environment Management

For the notebook and model training (train.py)
Use conda or any other environment manager. To create a conda environment:

conda create -n 'environment-name' python=3.9.18

Activate the conda environment:

conda activate 'environment-name'

Then install the requirements:

pip install -r requirements.txt

For model prediction
Use pipenv:

pipenv install numpy scikit-learn==1.3.0 gunicorn flask

Containerization


To build the image (the <container_name> placeholder is the tag given to the image):
docker build -t <container_name> .

To run the container:
docker run -it --rm -p 9696:9696 <container_name>

Then, in another terminal, run predict_test.py to check the model.
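For orientation, the sketch below shows the general shape of a Flask prediction endpoint and a check against it. Everything specific here is an assumption for illustration: the `/predict` route, the JSON field names, the `DummyModel` placeholder, and its threshold are hypothetical, and the repository's service and predict_test.py may differ. The check uses Flask's in-process test client, so it runs without a container.

```python
from flask import Flask, jsonify, request

app = Flask("energy-load")


class DummyModel:
    """Placeholder (assumption) for the trained model the real service would load."""

    def predict(self, rows):
        # Arbitrary illustrative rule, not a learned decision boundary.
        return [
            "Light_Load" if row["NSM"] < 30_000 else "Maximum_Load"
            for row in rows
        ]


model = DummyModel()


@app.route("/predict", methods=["POST"])  # route name is hypothetical
def predict():
    record = request.get_json()
    label = model.predict([record])[0]
    return jsonify({"load_type": label})


def check_service(record):
    """Send one record to the endpoint through Flask's test client and return the JSON reply."""
    with app.test_client() as client:
        response = client.post("/predict", json=record)
        return response.get_json()


print(check_service({"NSM": 1_000}))
```

Against the running container, the same JSON body would be sent over HTTP to port 9696, the port published by the docker run command above.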

Deployment

On Render, create an account, create a new web service, and deploy the container.
