The electrical distribution network comprising underground and overhead lines is a critical component of the power system infrastructure. However, it is susceptible to various faults due to equipment failure, environmental conditions, human errors, etc.
Traditional fault detection methods are very time-consuming, so the goal is to build an ML model that determines the type of fault from current and voltage data.
I will use linear models because the problem relies primarily on voltage and current data, so I will focus on finding datasets that include both. I will then extend the model to make it more interactive and build a front end through which it can be accessed. The model takes input and outputs the type of fault; it can be connected to a live server or take data entered directly. For now, the focus is on taking input from the user.
I used a dataset generated from a MATLAB model of a power system that simulates fault analysis. The power system consists of 4 generators of 11 × 10^3 V, with one pair located at each end of the transmission line. Transformers are placed in between to simulate and study the various faults at the midpoint of the transmission line.
The circuit is simulated under normal conditions as well as under various fault conditions. The measured line voltages and line currents at the output side of the power system are collected and saved. The dataset has nearly 12000 data points.
Electrical Fault Detection and Classification Dataset Source
The second dataset is used to predict whether weather plays a role in an electrical fault. It combines weather data with the voltage and current data above to predict the type of fault.
Electrical Faults Analysis India Dataset
In this step I dropped the unused columns and removed outliers and missing values.
I connected the voltage input to both datasets during prediction, so the voltage used to predict the type of fault is the average of the voltages of all phases. In future versions I would prefer per-phase prediction, as it should provide greater accuracy.
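The cleaning step can be sketched roughly as follows. This is an illustration only: the column names (`Ia`, `Va`, `unused`), the sample values, and the 1.5 × IQR outlier rule are assumptions, not the notebook's actual code.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, keep: list[str], k: float = 1.5) -> pd.DataFrame:
    """Keep the selected columns, drop rows with missing values,
    then drop rows that fall outside k * IQR in any column."""
    out = df[keep].dropna()
    q1 = out.quantile(0.25)
    q3 = out.quantile(0.75)
    iqr = q3 - q1
    inside = ((out >= q1 - k * iqr) & (out <= q3 + k * iqr)).all(axis=1)
    return out[inside].reset_index(drop=True)


# Hypothetical sample: one extreme current reading and one missing value.
raw = pd.DataFrame({
    "Ia": [10.0, 12.0, 11.0, 9.0, 500.0, np.nan],
    "Va": [0.40, 0.50, 0.45, 0.50, 0.40, 0.50],
    "unused": list("abcdef"),
})
cleaned = clean(raw, keep=["Ia", "Va"])
print(cleaned)
```

Fault samples are extreme by nature, so any outlier threshold should be chosen with care to avoid discarding the very rows the model needs to learn from.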
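A minimal sketch of the averaging, assuming the three phase voltages live in columns named `Va`, `Vb`, and `Vc` (hypothetical names and values):

```python
import pandas as pd

# Hypothetical per-phase voltage readings in pu.
readings = pd.DataFrame({
    "Va": [0.40, -0.10],
    "Vb": [-0.20, 0.30],
    "Vc": [-0.20, 0.40],
})

# Row-wise mean of the three phases gives the single voltage input.
readings["V_avg"] = readings[["Va", "Vb", "Vc"]].mean(axis=1)
print(readings)
```

Averaging collapses three signals into one, so phase-specific information is lost; this is the trade-off that per-phase prediction would avoid.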
From the start, my focus was on building an effective model, and I had a choice between 2 types of linear models: regressor or classifier. Regressor models are used for regression tasks, where the goal is to predict a continuous output or a numerical value. Classifier models are used for classification tasks, where the goal is to predict the class or category of a given input.
Upon analysis, I found that classifier models gave the best results because the fault labels are discrete.
- Voltage graph
- Current graph
- Voltage and Current graph: large fluctuations in the graph mark where faults have occurred
- Voltage graph
- Current graph
- In a normal (No_Fault) condition, the voltage and current graphs are symmetrical and sinusoidal, with the phases 120 degrees apart; current swings between approximately +100 and -100 amperes and voltage between +0.5 pu and -0.5 pu
- Voltage graph
- Current graph
- During a Line A to ground fault, the current in line A increases roughly tenfold, from the normal 100 amperes to approximately 1000 amperes, and the voltage drops.
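The two conditions described in the graphs above can be illustrated with idealized sine waves. This is a simplified sketch, not the MATLAB simulation: the 50 Hz frequency and the 1 ms sampling step are assumptions, and a real fault also distorts the voltages and the other phases.

```python
import math

FREQ_HZ = 50.0            # assumed system frequency
I_PEAK = 100.0            # normal peak current in amperes
SHIFT = 2 * math.pi / 3   # 120 degrees between phases


def phase_currents(t, peak_a=I_PEAK):
    """Idealized currents in lines A, B, C at time t (seconds)."""
    w = 2 * math.pi * FREQ_HZ
    ia = peak_a * math.sin(w * t)
    ib = I_PEAK * math.sin(w * t - SHIFT)
    ic = I_PEAK * math.sin(w * t + SHIFT)
    return ia, ib, ic


times = [n / 1000 for n in range(20)]  # one 50 Hz cycle, sampled every 1 ms

# Balanced system: the three currents cancel at every instant.
normal_residual = max(abs(sum(phase_currents(t))) for t in times)

# Line A carrying ten times its normal current: the sum no longer cancels.
fault_residual = max(abs(sum(phase_currents(t, peak_a=10 * I_PEAK))) for t in times)

print(normal_residual, fault_residual)
```

In the balanced case the residual is zero up to floating-point error, while the tenfold rise in line A leaves a large residual, which is the kind of asymmetry a classifier can pick up on.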
I decided to use the XGBoost Classifier since it requires minimal parameter tuning and gives accurate results quickly. However, to predict the type of fault, I had to predict a fault in each line. To achieve this, I had to create an ensemble: the model takes 6 inputs and predicts 4 outputs, and the type of fault is derived from them. I therefore used multiclass-multioutput classification (also known as multitask classification), which is a classification task that labels each sample with a set of non-binary properties. Both the number of properties and the number of classes per property are greater than 2. A single estimator thus handles several joint classification tasks. This is both a generalization of the multilabel classification task, which only considers binary attributes, and a generalization of the multiclass classification task, where only one property is considered.
As such, our model can now effectively make multiple predictions!
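A minimal sketch of the 6-input, 4-output setup using scikit-learn's `MultiOutputClassifier`, which fits one classifier per output column. To keep the example dependent on scikit-learn only, it swaps in `GradientBoostingClassifier` as a stand-in for XGBoost; the data is synthetic and the feature and flag meanings are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in data: 6 features (for example three currents and
# three voltages) and 4 binary output flags. Not the real dataset.
X = rng.normal(size=(200, 6))
Y = np.column_stack([
    X[:, 0] > 0,
    X[:, 1] > 0,
    X[:, 2] > 0,
    X[:, 3] > 0,
]).astype(int)

# The wrapper clones the base estimator once per output column.
base = GradientBoostingClassifier(n_estimators=20, random_state=0)
model = MultiOutputClassifier(base)
model.fit(X, Y)

pred = model.predict(X[:5])   # one row per sample, one column per output
print(pred)
```

If XGBoost is installed, an `XGBClassifier` instance could be passed as the base estimator in the same way.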
During this period, I learned multiple new skills, from developing ML models to front-end programming. The most rewarding part was conducting Exploratory Data Analysis and finding effective features, followed by working out which ML model was the best. A particular problem I faced was getting multiple predictions from the model; I struggled to find an effective solution until I found out about MultiOutputClassifier. Another daunting aspect was learning front-end development from scratch, as I learned Streamlit and its functions to host our very own web app. I particularly struggled with output errors, since Streamlit was not accepting multiple outputs, so I had to use if...else statements and arrays to ensure smooth operation.
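One way to turn the model's multiple outputs into a single message for the front end is a lookup table, sketched below as an alternative to chained if...else statements. The flag order `(G, C, B, A)` and the two entries shown are assumptions for illustration; `FAULT_LABELS` and `describe` are hypothetical names, not part of app.py.

```python
# Hypothetical mapping from the four predicted flags, assumed to be
# ordered (G, C, B, A), to a human-readable label. Only two illustrative
# combinations are listed; the rest would be filled in from the dataset.
FAULT_LABELS = {
    (0, 0, 0, 0): "No fault",
    (1, 0, 0, 1): "Line A to ground fault",
}


def describe(flags):
    """Convert one row of model output into a single display string."""
    key = tuple(int(f) for f in flags)
    return FAULT_LABELS.get(key, "Unrecognized combination")


print(describe([0, 0, 0, 0]))
print(describe([1, 0, 0, 1]))
```

Because the function returns one string, the result can be handed to a single display call in the interface regardless of how many outputs the model produces.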
res - folder of images
app.py - contains the frontend streamlit interface
classData.csv - training data for voltage and current
model.ipynb - consists of model construction and EDA
model.pkl - initial model for fault detection using voltage and current (uses multioutput classifier)
multimodel - Simple backend code to test the model.pkl
reason.pkl - ML model to predict reasoning behind the fault (takes weather input)
requirement.txt - libraries required for running this software
trip.pkl - General prediction of whether it's a trip or not
Dhruv Kapur
Khush Trivedi
for their help in this project