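# [이지AI] Power-Quality Classification Model for Responding to Power-Equipment Failures
- Numerical prediction
- Details:  
A multi-class classification task: from 20-odd features measured on the energy-usage patterns of power equipment, predict the equipment's State of Health (SOH: 정상 normal, 주의 caution, 경고 warning) in three domains: "power factor", "current harmonics", and "voltage harmonics".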

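- Background:  
Within AI, a core field of digital ICT innovation, key algorithms are needed for smart grids and smart factories, which are central to the Fourth Industrial Revolution. These include energy-efficiency optimization for individual facilities and equipment, energy-usage pattern analysis, and per-equipment anomaly detection.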

- Potential services:  
An AI solution for energy-usage pattern analysis and equipment anomaly (SOH) diagnosis

- Scoring method  
Macro F1-Score  

Macro F1 = 1/6 * sum( F1 )  

F1-score: the harmonic mean of Precision and Recall  
F1 = (2 * Recall * Precision ) / ( Precision + Recall )  

- Recall (sensitivity): ( TP ) / ( TP + FN )
- Precision: ( TP ) / ( TP + FP )
- TP: predicted True, actually True
- TN: predicted False, actually False
- FP: predicted True, actually False
- FN: predicted False, actually True
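The scoring definitions above can be made concrete with a short sketch. The code counts TP, FP and FN for each class of a single label column, computes precision, recall and F1 per class, and averages the F1 scores with equal weight (macro averaging). It does not reproduce how the competition combines the three target columns into its final score (the 1/6 factor above). `macro_f1` is a hypothetical helper written for illustration. For the same inputs it should agree with scikit-learn's `f1_score(y_true, y_pred, average='macro')`. By assumption, a class whose precision or recall is undefined gets a value of 0.

```python
from collections import Counter

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # predicted p, but the true class was different
            fn[t] += 1          # true class t was missed
    f1_scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# Toy labels using the SOH classes of this task
y_true = ['정상', '정상', '주의', '경고', '경고', '주의']
y_pred = ['정상', '주의', '주의', '경고', '정상', '주의']
print(round(macro_f1(y_true, y_pred), 4))  # → 0.6556
```

Here the per-class F1 scores are 0.5 (정상), 0.8 (주의) and 2/3 (경고), so the macro average is about 0.6556. A model that ignores a rare class is penalized as heavily as one that fails on a common class.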

# 1. Null data check

In [38]:
import pandas as pd
import seaborn as sns
import matplotlib as mlt
import matplotlib.pyplot as plt

In [62]:
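# Use a Hangul-capable font (Malgun Gothic ships with Windows) so Korean labels render in plots
mlt.rc('font', family='Malgun Gothic')
# With a Korean font the Unicode minus sign can render as a box; fall back to ASCII '-'
mlt.rc('axes', unicode_minus=False)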

In [15]:
# load data
train = pd.read_csv(r'./data/train.csv',encoding='utf-8', index_col = 'index')
train

| index | 누적전력량 | 유효전력평균 | 무효전력평균 | 주파수 | 전류평균 | 상전압평균 | 선간전압평균 | 온도 | R상유효전력 | R상무효전력 | ... | S상전압 | S상선간전압 | T상유효전력 | T상무효전력 | T상전류 | T상전압 | T상선간전압 | label_역률평균 | label_전류고조파평균 | label_전압고조파평균 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2248930.50 | 28963.0 | 20237.0 | 59.854076 | 45.197918 | 259.916656 | 449.916656 | 47.5000 | 8663.00 | 6182.00 | ... | 260.750 | 452.750 | 10417.00 | 7304.00 | 48.71875 | 260.750 | 448.750 | 정상 | 경고 | 주의 |
| 1 | 0.00 | 0.0 | 0.0 | 0.000000 | 101.312500 | 0.000000 | 0.000000 | 24.3750 | 0.00 | 0.00 | ... | 0.000 | 0.000 | 0.00 | 0.00 | 100.68750 | 0.000 | 0.000 | 경고 | 정상 | 정상 |
| 2 | 5375707.00 | 35244.0 | 19826.0 | 59.975650 | 107.385414 | 125.416664 | 216.750000 | 18.7500 | 11988.00 | 7384.00 | ... | 125.250 | 216.250 | 12236.00 | 6170.00 | 109.56250 | 125.000 | 216.250 | 정상 | 정상 | 주의 |
| 3 | 17781200.00 | 77056.0 | 39520.0 | 59.863000 | 244.854000 | 118.083000 | 205.333000 | 23.1250 | 25796.00 | 12244.00 | ... | 118.750 | 118.750 | 24992.00 | 13704.00 | 242.18800 | 118.000 | 118.000 | 정상 | 정상 | 경고 |
| 4 | 10143988.00 | 0.0 | 0.0 | 59.798140 | 0.000000 | 133.750000 | 231.500000 | 26.8750 | 0.00 | 0.00 | ... | 134.500 | 231.500 | 0.00 | 0.00 | 0.00000 | 133.000 | 230.250 | 경고 | 정상 | 정상 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2420561 | 6229905.00 | 25925.5 | 22606.0 | 57.945000 | 52.833000 | 215.327000 | 372.958000 | 19.0630 | 8631.50 | 6670.50 | ... | 216.628 | 216.628 | 7818.50 | 7867.00 | 51.12500 | 215.128 | 215.128 | 주의 | 정상 | 경고 |
| 2420562 | 6057307.50 | 91244.0 | 47068.0 | 59.928820 | 152.156250 | 224.416672 | 388.583344 | 33.7500 | 30128.00 | 17328.00 | ... | 224.250 | 386.250 | 29052.00 | 14496.00 | 145.18750 | 223.250 | 388.750 | 정상 | 정상 | 주의 |
| 2420563 | 7966820.00 | 16822.0 | 11924.8 | 59.914000 | 31.363100 | 219.233000 | 379.862000 | 17.3214 | 5157.84 | 3806.66 | ... | 219.276 | 219.276 | 5954.98 | 4172.97 | 32.79530 | 219.560 | 219.560 | 정상 | 경고 | 경고 |
| 2420564 | 0.00 | 0.0 | 0.0 | 0.000000 | 44.572918 | 0.000000 | 0.000000 | 34.3750 | 0.00 | 0.00 | ... | 0.000 | 0.000 | 0.00 | 0.00 | 41.96875 | 0.000 | 0.000 | 경고 | 경고 | 정상 |


In [29]:
# null check
train.isna().sum()

누적전력량              17
유효전력평균             17
무효전력평균             17
주파수                17
전류평균               17
상전압평균              17
선간전압평균             17
온도                 17
R상유효전력             18
R상무효전력             18
R상전류               18
R상전압               18
R상선간전압             18
S상유효전력           1380
S상무효전력           1380
S상전류             1380
S상전압             1380
S상선간전압           1380
T상유효전력           2898
T상무효전력           2898
T상전류             2898
T상전압             2898
T상선간전압           2898
label_역률평균          0
label_전류고조파평균       0
label_전압고조파평균       0
dtype: int64

In [47]:
# null row delete
train2 = train.dropna()
train2.isna().sum()

누적전력량            0
유효전력평균           0
무효전력평균           0
주파수              0
전류평균             0
상전압평균            0
선간전압평균           0
온도               0
R상유효전력           0
R상무효전력           0
R상전류             0
R상전압             0
R상선간전압           0
S상유효전력           0
S상무효전력           0
S상전류             0
S상전압             0
S상선간전압           0
T상유효전력           0
T상무효전력           0
T상전류             0
T상전압             0
T상선간전압           0
label_역률평균       0
label_전류고조파평균    0
label_전압고조파평균    0
dtype: int64

# 2. EDA
- Before EDA: dropped the 2,898 rows that contained null values
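`dropna()` removes every row with at least one NaN. The number of rows it drops therefore equals `isna().any(axis=1).sum()`. If that number matches the largest per-column null count (2,898, for the T-phase columns), then every row with a null in another column also has nulls in the T-phase columns. The toy frame below is made up to mirror the R < S < T pattern in the null counts. On the real data, the same three expressions would be run on `train`.

```python
import numpy as np
import pandas as pd

# Made-up frame: nulls spread to progressively more phase columns (R < S < T)
df = pd.DataFrame({
    'R상전류': [1.0, np.nan, 3.0, 4.0, 5.0],
    'S상전류': [1.0, np.nan, np.nan, 4.0, 5.0],
    'T상전류': [1.0, np.nan, np.nan, np.nan, 5.0],
})

rows_with_null = df.isna().any(axis=1)         # True where a row has at least one NaN
dropped = len(df) - len(df.dropna())           # rows removed by dropna()
largest_column_count = df.isna().sum().max()   # null count of the worst single column

print(rows_with_null.sum(), dropped, largest_column_count)  # → 3 3 3
```

All three numbers are equal here. The nulls are therefore nested inside the rows missing `T상전류`, so dropping them costs no more rows than that one column already forces.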

In [57]:
train2.info()
# Positional split: the first 23 columns are features, the last 3 are the labels
feature_col = train2.columns[:23]
target_col = train2.columns[23:]

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2417652 entries, 0 to 2420565
Data columns (total 26 columns):
 #   Column         Dtype  
---  ------         -----  
 0   누적전력량          float64
 1   유효전력평균         float64
 2   무효전력평균         float64
 3   주파수            float64
 4   전류평균           float64
 5   상전압평균          float64
 6   선간전압평균         float64
 7   온도             float64
 8   R상유효전력         float64
 9   R상무효전력         float64
 10  R상전류           float64
 11  R상전압           float64
 12  R상선간전압         float64
 13  S상유효전력         float64
 14  S상무효전력         float64
 15  S상전류           float64
 16  S상전압           float64
 17  S상선간전압         float64
 18  T상유효전력         float64
 19  T상무효전력         float64
 20  T상전류           float64
 21  T상전압           float64
 22  T상선간전압         float64
 23  label_역률평균     object 
 24  label_전류고조파평균  object 
 25  label_전압고조파평균  object 
dtypes: float64(23), object(3)
memory usage: 562.5+ MB


In [72]:
# Interpreting these statistics requires domain (power-engineering) background knowledge
train2.describe()

|  | 누적전력량 | 유효전력평균 | 무효전력평균 | 주파수 | 전류평균 | 상전압평균 | 선간전압평균 | 온도 | R상유효전력 | R상무효전력 | ... | S상유효전력 | S상무효전력 | S상전류 | S상전압 | S상선간전압 | T상유효전력 | T상무효전력 | T상전류 | T상전압 | T상선간전압 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | ... | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 | 2417652.0 |
| mean | 8250512.0 | 36913.41 | 20119.49 | 57.64316 | 84.74899 | 196.6612 | 340.5679 | 28.9679 | 11522.86 | 7520.982 | ... | 13666.55 | 7088.967 | 86.65303 | 197.1775 | 307.3738 | 11723.84 | 5509.404 | 82.89941 | 196.283 | 306.4923 |
| std | 11347380.0 | 40197.14 | 20869.2 | 11.21846 | 85.22464 | 57.7211 | 99.88848 | 11.3116 | 15347.73 | 8133.155 | ... | 14559.25 | 7941.429 | 87.17585 | 57.83271 | 111.0368 | 13441.44 | 11034.92 | 83.1833 | 57.74115 | 110.8254 |
| min | -1281148.0 | -18412.0 | -30332.0 | 0.0 | 0.0 | 0.0 | 0.0 | -5.625 | -54944.0 | -7732.0 | ... | -34536.0 | -31472.0 | 0.0 | 0.0 | 0.0 | -72632.0 | -63232.0 | 0.0 | 0.0 | 0.0 |
| 25% | 663270.3 | 0.0 | 0.0 | 59.83542 | 0.0 | 134.75 | 233.4167 | 21.0714 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 135.5 | 219.236 | 0.0 | 0.0 | 0.0 | 134.25 | 218.0 |
| 50% | 4076549.0 | 28102.5 | 15388.0 | 59.89142 | 62.47917 | 219.5833 | 380.25 | 28.2143 | 9574.0 | 6092.5 | ... | 10403.0 | 5809.73 | 64.6875 | 220.25 | 376.25 | 9219.0 | 5611.0 | 63.538 | 219.0 | 372.75 |
| 75% | 11235320.0 | 71056.0 | 39612.0 | 59.929 | 137.25 | 227.25 | 393.3333 | 36.875 | 23502.0 | 12780.0 | ... | 24668.0 | 14364.0 | 140.625 | 227.25 | 391.0 | 22486.0 | 13404.0 | 137.312 | 226.75 | 391.5 |
| max | 91706790.0 | 241576.0 | 478960.0 | 60.37 | 4095.94 | 625.5833 | 472.583 | 63.75 | 80880.0 | 167888.0 | ... | 82912.0 | 155344.0 | 4095.94 | 490.5 | 473.0 | 106312.0 | 478960.0 | 4095.94 | 897.75 | 472.25 |
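Some values in the summary look implausible at face value:

- 주파수 (frequency) has a minimum of 0, while its median is about 59.89.
- 누적전력량 (cumulative energy) has a negative minimum, which seems odd for a cumulative reading.
- Every current column shown tops out at exactly 4095.94, just below 4096 = 2^12. That may hint at a saturated reading, but this is only a guess.

A minimal sketch for counting such rows follows, ahead of the domain research planned below. The rules and thresholds (`tol_hz=3.0`, `current_ceiling=4095.0`) and the helper `flag_suspect_rows` are hypothetical placeholders to be revised once domain knowledge is gathered.

```python
import pandas as pd

def flag_suspect_rows(df, nominal_hz=60.0, tol_hz=3.0, current_ceiling=4095.0):
    """Boolean flags for a few HYPOTHETICAL outlier rules (thresholds are guesses)."""
    # Current columns: the overall average plus the per-phase R/S/T currents
    current_cols = [c for c in df.columns if c == '전류평균' or c.endswith('상전류')]
    flags = pd.DataFrame(index=df.index)
    # Frequency far from nominal 60 Hz; this also catches rows logged as 0 Hz
    flags['freq_off_nominal'] = (df['주파수'] - nominal_hz).abs() > tol_hz
    # A cumulative energy reading is assumed never to be negative
    flags['negative_cum_energy'] = df['누적전력량'] < 0
    # Any current at or above the assumed ceiling (observed max is 4095.94)
    flags['current_at_ceiling'] = df[current_cols].ge(current_ceiling).any(axis=1)
    return flags

# Toy rows: the first two echo rows 0 and 1 of the training data, the third is made up
sample = pd.DataFrame({
    '누적전력량': [2248930.50, 0.00, -1281148.0],
    '주파수': [59.854076, 0.0, 59.90],
    '전류평균': [45.197918, 101.3125, 4095.94],
    'T상전류': [48.71875, 100.6875, 12.5],
})
flags = flag_suspect_rows(sample)
print(flags)
print(flags.sum())  # one flagged row per rule in this toy frame
```

On the real data, `flag_suspect_rows(train2).sum()` would give a first count per rule. Whether flagged rows should be dropped, corrected or kept is a domain question the sketch does not answer.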


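- Research domain knowledge -> find and correct outliers
- Look up standardization methods and summarize them
- Convert the target columns to numbers
- Check correlations
- Look into classification methods
- Run the analysis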
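Two items on the list above, converting the targets to numbers and standardizing the features, can be sketched as follows. The ordinal mapping `정상 → 0, 주의 → 1, 경고 → 2` (ordered by severity) is an assumption; the task does not prescribe an encoding. Z-score standardization is only one of several scaling options. `encode_targets`, `standardize` and `LABEL_MAP` are hypothetical names.

```python
import pandas as pd

# ASSUMED ordinal encoding by severity; not prescribed by the competition
LABEL_MAP = {'정상': 0, '주의': 1, '경고': 2}

def encode_targets(df, target_cols):
    """Replace the Korean SOH labels with integers (unknown labels become NaN)."""
    out = df.copy()
    for col in target_cols:
        out[col] = out[col].map(LABEL_MAP)
    return out

def standardize(df, feature_cols):
    """Z-score each feature, (x - mean) / std, using statistics computed from df."""
    out = df.copy()
    mean = out[feature_cols].mean()
    std = out[feature_cols].std(ddof=0).replace(0, 1.0)  # avoid dividing constant columns by 0
    out[feature_cols] = (out[feature_cols] - mean) / std
    return out

# Toy frame; the 주파수 and 온도 values echo rows 0-3 of the training data
toy = pd.DataFrame({
    '주파수': [59.854076, 59.975650, 59.863000, 59.798140],
    '온도': [47.5, 24.375, 18.75, 23.125],
    'label_역률평균': ['정상', '경고', '정상', '주의'],
})
encoded = encode_targets(toy, ['label_역률평균'])
scaled = standardize(encoded, ['주파수', '온도'])
print(scaled)
```

On the real data this would be `standardize(encode_targets(train2, target_col), feature_col)`. Fit the mean and standard deviation on the training split only and reuse them for validation and test data, so no information leaks from the held-out rows. For the correlation item, `DataFrame.corr()` on the encoded frame is a natural starting point.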