# **Smartphone Sensor Data-Based Motion Classification**
# Step 2: Basic Modeling


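## 0. Mission

* Data preprocessing
    * Perform the necessary preprocessing: dummy-variable encoding, data splitting, checking and handling NaN values, scaling, and so on.
* Build classification models with a variety of algorithms
    * Apply at least four algorithms.
    * Compare performance.
    * Create a separate data frame that stores each model's performance and use it for the comparison.
* Optional: the following items are optional. Work on them as time allows.
    * Select the top N features, then model and compare performance
        * A model does not always need every feature.
        * Select the top N features by feature importance, build a model on them, and compare its performance with the other models.
        * To choose N, add features one at a time, fit and validate a model at each step, and look for a suitable cutoff.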

## 1. Environment setup

### (1) Import libraries

* Detailed requirements
    - The code below already imports the basic libraries.
    - Add any other libraries you consider necessary.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Add any other libraries you consider necessary.




* Helper function

In [None]:
# Rank features by importance and optionally plot them
def plot_feature_importance(importance, names, result_only=False, topn='all'):
    feature_importance = np.array(importance)
    feature_name = np.array(names)

    data = {'feature_name': feature_name, 'feature_importance': feature_importance}
    fi_temp = pd.DataFrame(data)

    # Sort features by importance, highest first
    fi_temp.sort_values(by=['feature_importance'], ascending=False, inplace=True)
    fi_temp.reset_index(drop=True, inplace=True)

    # Keep every feature, or only the top `topn`
    if topn == 'all':
        fi_df = fi_temp.copy()
    else:
        fi_df = fi_temp.iloc[:topn]

    # Draw the importances as a horizontal bar chart unless only the table is wanted
    if not result_only:
        plt.figure(figsize=(10, 20))
        sns.barplot(x='feature_importance', y='feature_name', data=fi_df)

        plt.xlabel('importance')
        plt.ylabel('feature name')
        plt.grid()

    return fi_df

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### (2) Load the data

* Provided dataset
    * data01_train.csv : for training and validation
* Detailed requirements
    - Load the full dataset 'data01_train.csv' and store it as 'data'.
        - Drop the variable subject from data.
    - Check basic information about the data frame (.head(), .shape, etc.).

#### 1) Data loading

In [None]:
data=pd.read_csv('/content/drive/MyDrive/미니프로젝트/5차_/2023.10.25_미니프로젝트5차_실습자료, 데이터/데이터/data01_train.csv')
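
The requirements above ask for `subject` to be removed from `data`, but no cell in this notebook does so. A minimal sketch of that step on a toy frame follows; the values are made up for illustration, and on the real data the same `drop` call would apply.

```python
import pandas as pd

# Toy stand-in for `data`; the values are illustrative, not from data01_train.csv.
toy = pd.DataFrame({
    "tBodyAcc-mean()-X": [0.29, 0.27],
    "subject": [21, 15],
    "Activity": ["STANDING", "LAYING"],
})

# Drop the subject identifier so it cannot be used as a feature.
toy = toy.drop(columns="subject")
print(toy.columns.tolist())
```

Applying this to the real data would change the scores recorded further down, since those runs kept `subject`.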

In [None]:
import joblib
fi = joblib.load('/content/drive/MyDrive/미니프로젝트/5차_/2023.10.25_미니프로젝트5차_실습자료, 데이터/데이터/feature_importances.pkl')

#### 2) Basic information

In [None]:
data.head()

Unnamed: 0,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,fBodyBodyGyroJerkMag-kurtosis(),"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)",subject,Activity
0,0.288508,-0.009196,-0.103362,-0.988986,-0.962797,-0.967422,-0.989,-0.962596,-0.96565,-0.929747,...,-0.816696,-0.042494,-0.044218,0.307873,0.07279,-0.60112,0.331298,0.165163,21,STANDING
1,0.265757,-0.016576,-0.098163,-0.989551,-0.994636,-0.987435,-0.990189,-0.99387,-0.987558,-0.937337,...,-0.693515,-0.062899,0.388459,-0.765014,0.771524,0.345205,-0.769186,-0.147944,15,LAYING
2,0.278709,-0.014511,-0.108717,-0.99772,-0.981088,-0.994008,-0.997934,-0.982187,-0.995017,-0.942584,...,-0.829311,0.000265,-0.525022,-0.891875,0.021528,-0.833564,0.202434,-0.032755,11,STANDING
3,0.289795,-0.035536,-0.150354,-0.231727,-0.006412,-0.338117,-0.273557,0.014245,-0.347916,0.008288,...,-0.408956,-0.255125,0.612804,0.747381,-0.072944,-0.695819,0.287154,0.111388,17,WALKING
4,0.394807,0.034098,0.091229,0.088489,-0.106636,-0.388502,-0.010469,-0.10968,-0.346372,0.584131,...,-0.563437,-0.044344,-0.845268,-0.97465,-0.887846,-0.705029,0.264952,0.137758,17,WALKING_DOWNSTAIRS


In [None]:
data.shape

(5881, 563)

In [None]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5881 entries, 0 to 5880
Columns: 563 entries, tBodyAcc-mean()-X to Activity
dtypes: float64(561), int64(1), object(1)
memory usage: 25.3+ MB


In [None]:
data.describe()

Unnamed: 0,tBodyAcc-mean()-X,tBodyAcc-mean()-Y,tBodyAcc-mean()-Z,tBodyAcc-std()-X,tBodyAcc-std()-Y,tBodyAcc-std()-Z,tBodyAcc-mad()-X,tBodyAcc-mad()-Y,tBodyAcc-mad()-Z,tBodyAcc-max()-X,...,fBodyBodyGyroJerkMag-skewness(),fBodyBodyGyroJerkMag-kurtosis(),"angle(tBodyAccMean,gravity)","angle(tBodyAccJerkMean),gravityMean)","angle(tBodyGyroMean,gravityMean)","angle(tBodyGyroJerkMean,gravityMean)","angle(X,gravityMean)","angle(Y,gravityMean)","angle(Z,gravityMean)",subject
count,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,...,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0,5881.0
mean,0.274811,-0.017799,-0.109396,-0.603138,-0.509815,-0.604058,-0.628151,-0.525944,-0.605374,-0.46549,...,-0.305883,-0.623548,0.008524,-0.001185,0.00934,-0.007099,-0.491501,0.059299,-0.054594,17.381568
std,0.067614,0.039422,0.058373,0.448807,0.501815,0.417319,0.424345,0.485115,0.413043,0.544995,...,0.322808,0.310371,0.33973,0.447197,0.60819,0.476738,0.509069,0.29734,0.278479,8.938316
min,-0.503823,-0.684893,-1.0,-1.0,-0.999844,-0.999667,-1.0,-0.999419,-1.0,-1.0,...,-0.979261,-0.999765,-0.97658,-1.0,-1.0,-1.0,-1.0,-1.0,-0.980143,1.0
25%,0.262919,-0.024877,-0.121051,-0.992774,-0.97768,-0.980127,-0.993602,-0.977865,-0.980112,-0.936067,...,-0.541969,-0.845985,-0.122361,-0.294369,-0.481718,-0.373345,-0.811397,-0.018203,-0.141555,8.0
50%,0.277154,-0.017221,-0.108781,-0.943933,-0.844575,-0.856352,-0.948501,-0.849266,-0.849896,-0.878729,...,-0.342923,-0.712677,0.010278,0.005146,0.011448,-0.000847,-0.709441,0.182893,0.003951,19.0
75%,0.288526,-0.01092,-0.098163,-0.24213,-0.034499,-0.26269,-0.291138,-0.068857,-0.268539,-0.01369,...,-0.127371,-0.501158,0.154985,0.28503,0.499857,0.356236,-0.51133,0.248435,0.111932,26.0
max,1.0,1.0,1.0,1.0,0.916238,1.0,1.0,0.967664,1.0,1.0,...,0.989538,0.956845,1.0,1.0,0.998702,0.996078,0.977344,0.478157,1.0,30.0


## **2. Data preprocessing**

* Perform the necessary preprocessing: dummy-variable encoding, data splitting, checking and handling NaN values, scaling, and so on.


In [None]:
data.isnull().sum()

tBodyAcc-mean()-X       0
tBodyAcc-mean()-Y       0
tBodyAcc-mean()-Z       0
tBodyAcc-std()-X        0
tBodyAcc-std()-Y        0
                       ..
angle(X,gravityMean)    0
angle(Y,gravityMean)    0
angle(Z,gravityMean)    0
subject                 0
Activity                0
Length: 563, dtype: int64
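
With 563 columns the printed Series is truncated, so the zeros in the visible rows do not by themselves prove that every column is clean. A sketch of a single-number check, shown on a toy frame (the frame and its values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# Total number of missing cells across the whole frame.
n_missing = int(toy.isnull().sum().sum())

# Only the columns that actually contain missing values.
cols_with_nan = toy.columns[toy.isnull().any()].tolist()

print(n_missing, cols_with_nan)
```

On the real data, `data.isnull().sum().sum()` returning 0 would confirm that no handling is needed.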

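### (1) Data split 1: x, y

* Detailed requirements
    - Split the data into x and y.

In [None]:
# Note: 'subject' is not dropped here, so it remains in x as a feature
# (the outputs recorded below reflect that: LightGBM reports 562 used features).
x = data.drop('Activity', axis=1)
y = data.loc[:, 'Activity']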


### (2) Scaling (if needed)


* Detailed requirements
    - Run this code so that algorithms that require scaling can be used.
    - Use either the min-max or the standard method.

In [None]:
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

### (3) Data split 2: train, validation

* Detailed requirements
    - train : val = 8 : 2 or 7 : 3
    - Use the random_state option so that results are reproducible and can be compared with other models.

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.2, random_state=42, stratify=y)
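
The `stratify=y` argument keeps the class proportions the same in both splits. A sketch that checks this on synthetic, imbalanced labels (the labels and the single feature are assumptions for illustration):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels standing in for `Activity`.
y_demo = pd.Series(["A"] * 60 + ["B"] * 30 + ["C"] * 10)
x_demo = pd.DataFrame({"f": np.arange(len(y_demo))})

x_tr, x_va, y_tr, y_va = train_test_split(
    x_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Class shares in each split; with stratify they match the full data.
shares = pd.DataFrame({
    "all": y_demo.value_counts(normalize=True),
    "train": y_tr.value_counts(normalize=True),
    "val": y_va.value_counts(normalize=True),
}).sort_index()
print(shares)
```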


In [None]:
x_train_s = scaler.fit_transform(x_train)
x_val_s = scaler.transform(x_val)
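
`fit_transform` returns a NumPy array, so `x_train_s` cannot be indexed by column name the way `x_train[cols]` is later in this notebook. A sketch of wrapping the scaled output back into a DataFrame, on toy data (the frames and values here are assumptions for illustration):

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({"f1": [0.0, 5.0, 10.0], "subject": [1, 15, 30]})
val = pd.DataFrame({"f1": [2.5], "subject": [30]})

scaler = MinMaxScaler()
# Fit on the training split only, then reuse the same min/max for validation.
train_s = pd.DataFrame(scaler.fit_transform(train), columns=train.columns, index=train.index)
val_s = pd.DataFrame(scaler.transform(val), columns=val.columns, index=val.index)

print(val_s)
```

Keeping the column names makes it possible to select feature subsets from the scaled data with the same `[cols]` indexing.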

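## **3. Basic modeling**



* Detailed requirements
    - Apply at least four algorithms.
    - For each algorithm, build one model on all features and one on the top N features, then compare performance.
    - (Optional) For one or two of the algorithms, select the top N features by feature importance, build a model, and compare it with the other models.
        * To choose N, add features one at a time, fit and validate a model at each step, and look for a suitable cutoff.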

### (1) Algorithm 1: Random Forest

In [None]:
from sklearn.ensemble import RandomForestClassifier

In [None]:
rf = RandomForestClassifier()
rf.fit(x_train, y_train)

rf.score(x_val, y_val)

0.9753610875106202

In [None]:
rf = RandomForestClassifier()
rf.fit(x_train_s, y_train)

rf.score(x_val_s, y_val)

0.9762107051826678

In [None]:
fi

Unnamed: 0,sensor,agg,axis,feature_name,feature_importance_6class,feature_importance_dynamic,feature_importance_standing,feature_importance_sitting,feature_importance_laying,feature_importance_walking,feature_importance_walking_up,feature_importance_walking_down
0,tBodyAcc,mean(),X,tBodyAcc-mean()-X,0.000250,0.000017,0.000017,0.000236,0.000247,0.000220,0.000250,0.000175
1,tBodyAcc,mean(),Y,tBodyAcc-mean()-Y,0.000269,0.000017,0.000017,0.000362,0.000139,0.000441,0.000871,0.000278
2,tBodyAcc,mean(),Z,tBodyAcc-mean()-Z,0.000188,0.000000,0.000000,0.000206,0.000103,0.000100,0.000361,0.000193
3,tBodyAcc,std(),X,tBodyAcc-std()-X,0.004709,0.000000,0.000000,0.001512,0.001378,0.004702,0.003864,0.014531
4,tBodyAcc,std(),Y,tBodyAcc-std()-Y,0.000332,0.000000,0.000000,0.000477,0.000048,0.000024,0.000397,0.000789
...,...,...,...,...,...,...,...,...,...,...,...,...
556,angle,tBodyGyroMean,gravityMean,"angle(tBodyGyroMean,gravityMean)",0.000565,0.000000,0.000000,0.001082,0.000000,0.000356,0.000225,0.000499
557,angle,tBodyGyroJerkMean,gravityMean,"angle(tBodyGyroJerkMean,gravityMean)",0.000532,0.000000,0.000000,0.000945,0.000000,0.000082,0.000046,0.000120
558,angle,X,gravityMean,"angle(X,gravityMean)",0.030217,0.000043,0.000043,0.031555,0.082389,0.003542,0.007252,0.001507
559,angle,Y,gravityMean,"angle(Y,gravityMean)",0.028289,0.000026,0.000026,0.054235,0.053112,0.001675,0.030536,0.005530


In [None]:
cols = list(fi)[4:]
fi.loc[:, cols]

Unnamed: 0,feature_importance_6class,feature_importance_dynamic,feature_importance_standing,feature_importance_sitting,feature_importance_laying,feature_importance_walking,feature_importance_walking_up,feature_importance_walking_down
0,0.000250,0.000017,0.000017,0.000236,0.000247,0.000220,0.000250,0.000175
1,0.000269,0.000017,0.000017,0.000362,0.000139,0.000441,0.000871,0.000278
2,0.000188,0.000000,0.000000,0.000206,0.000103,0.000100,0.000361,0.000193
3,0.004709,0.000000,0.000000,0.001512,0.001378,0.004702,0.003864,0.014531
4,0.000332,0.000000,0.000000,0.000477,0.000048,0.000024,0.000397,0.000789
...,...,...,...,...,...,...,...,...
556,0.000565,0.000000,0.000000,0.001082,0.000000,0.000356,0.000225,0.000499
557,0.000532,0.000000,0.000000,0.000945,0.000000,0.000082,0.000046,0.000120
558,0.030217,0.000043,0.000043,0.031555,0.082389,0.003542,0.007252,0.001507
559,0.028289,0.000026,0.000026,0.054235,0.053112,0.001675,0.030536,0.005530


In [None]:
# Sum the importance columns row-wise into one overall score per feature
fi['feature_importance_sum'] = fi.loc[:, cols].sum(axis=1)
fi

Unnamed: 0,sensor,agg,axis,feature_name,feature_importance_6class,feature_importance_dynamic,feature_importance_standing,feature_importance_sitting,feature_importance_laying,feature_importance_walking,feature_importance_walking_up,feature_importance_walking_down,feature_importance_sum
0,tBodyAcc,mean(),X,tBodyAcc-mean()-X,0.000250,0.000017,0.000017,0.000236,0.000247,0.000220,0.000250,0.000175,0.001412
1,tBodyAcc,mean(),Y,tBodyAcc-mean()-Y,0.000269,0.000017,0.000017,0.000362,0.000139,0.000441,0.000871,0.000278,0.002394
2,tBodyAcc,mean(),Z,tBodyAcc-mean()-Z,0.000188,0.000000,0.000000,0.000206,0.000103,0.000100,0.000361,0.000193,0.001151
3,tBodyAcc,std(),X,tBodyAcc-std()-X,0.004709,0.000000,0.000000,0.001512,0.001378,0.004702,0.003864,0.014531,0.030696
4,tBodyAcc,std(),Y,tBodyAcc-std()-Y,0.000332,0.000000,0.000000,0.000477,0.000048,0.000024,0.000397,0.000789,0.002066
...,...,...,...,...,...,...,...,...,...,...,...,...,...
556,angle,tBodyGyroMean,gravityMean,"angle(tBodyGyroMean,gravityMean)",0.000565,0.000000,0.000000,0.001082,0.000000,0.000356,0.000225,0.000499,0.002727
557,angle,tBodyGyroJerkMean,gravityMean,"angle(tBodyGyroJerkMean,gravityMean)",0.000532,0.000000,0.000000,0.000945,0.000000,0.000082,0.000046,0.000120,0.001726
558,angle,X,gravityMean,"angle(X,gravityMean)",0.030217,0.000043,0.000043,0.031555,0.082389,0.003542,0.007252,0.001507,0.156547
559,angle,Y,gravityMean,"angle(Y,gravityMean)",0.028289,0.000026,0.000026,0.054235,0.053112,0.001675,0.030536,0.005530,0.173427
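
The loops that follow repeat the same sort-and-slice expression in every iteration. A sketch of a small helper that wraps the ranking step; `top_features` is a hypothetical name, and the toy frame only mimics the two columns used here:

```python
import pandas as pd

def top_features(fi_df, n, score_col="feature_importance_sum", ascending=False):
    """Return the names of the n features ranked by score_col."""
    ranked = fi_df.sort_values(by=score_col, ascending=ascending)
    return ranked["feature_name"].head(n).tolist()

# Toy importance table; the numbers are illustrative only.
fi_demo = pd.DataFrame({
    "feature_name": ["a", "b", "c", "d"],
    "feature_importance_sum": [0.10, 0.40, 0.05, 0.25],
})

most_important = top_features(fi_demo, 2)
least_important = top_features(fi_demo, 2, ascending=True)
print(most_important, least_important)
```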


In [None]:
# Starting from the least important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum')['feature_name'].values[:i]
    temp_x = x_train[cols]
    rf = RandomForestClassifier()
    rf.fit(temp_x, y_train)
    print(f"{i} features : {rf.score(x_val[cols], y_val)}")

50 features : 0.6338147833474936
100 features : 0.719626168224299
150 features : 0.7994902293967715
200 features : 0.8351741716227697
250 features : 0.8708581138487681
300 features : 0.8920985556499575
350 features : 0.913338997451147
400 features : 0.9218351741716228
450 features : 0.9362786745964317
500 features : 0.9490229396771452
550 features : 0.9685641461342396


In [None]:
# Starting from the most important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum',ascending=False)['feature_name'].values[:i]
    temp_x = x_train[cols]
    rf = RandomForestClassifier()
    rf.fit(temp_x, y_train)
    print(f"{i} features : {rf.score(x_val[cols], y_val)}")

50 features : 0.9711129991503823
100 features : 0.973661852166525
150 features : 0.9753610875106202
200 features : 0.9804587935429057
250 features : 0.9796091758708582
300 features : 0.9787595581988106
350 features : 0.9821580288870009
400 features : 0.9787595581988106
450 features : 0.9796091758708582
500 features : 0.9779099405267629
550 features : 0.9711129991503823
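
One way to read these results is to pick the smallest N whose accuracy is within a tolerance of the best score. The sketch below uses the random forest accuracies printed above for the most-important-first ordering. The 0.005 tolerance is an arbitrary assumption, and because the forests were trained without a fixed `random_state`, the chosen N may differ between runs.

```python
# Validation accuracy by number of top-ranked features (copied from the output above).
scores = {
    50: 0.9711129991503823, 100: 0.973661852166525, 150: 0.9753610875106202,
    200: 0.9804587935429057, 250: 0.9796091758708582, 300: 0.9787595581988106,
    350: 0.9821580288870009, 400: 0.9787595581988106, 450: 0.9796091758708582,
    500: 0.9779099405267629, 550: 0.9711129991503823,
}

tolerance = 0.005  # assumed threshold, not part of the assignment
best = max(scores.values())

# Smallest feature count whose accuracy is within `tolerance` of the best.
chosen_n = min(n for n, acc in scores.items() if best - acc <= tolerance)
print(chosen_n, scores[chosen_n])
```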


### (2) Algorithm 2: LightGBM

In [None]:
!pip install lightgbm



In [None]:
from lightgbm import LGBMClassifier

lgb=LGBMClassifier(random_state=42)
lgb.fit(x_train.values,y_train.values)

lgb.score(x_val,y_val)

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 140168
[LightGBM] [Info] Number of data points in the train set: 4704, number of used features: 562
[LightGBM] [Info] Start training from score -1.662702
[LightGBM] [Info] Start training from score -1.739574
[LightGBM] [Info] Start training from score -1.688825
[LightGBM] [Info] Start training from score -1.774060
[LightGBM] [Info] Start training from score -2.005698
[LightGBM] [Info] Start training from score -1.925291


0.9915038232795242
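
The `Start training from score` lines can be read as the log of each class's share of the training set, assuming LightGBM initializes multiclass scores from the class priors (an assumption about the library's behaviour, not something stated in this notebook). Under that assumption, the logged values convert back to class counts as follows:

```python
import math

n_train = 4704  # "Number of data points in the train set" from the log above
init_scores = [-1.662702, -1.739574, -1.688825, -1.774060, -2.005698, -1.925291]

# exp(score) is the class share; multiplying by n_train recovers a row count.
counts = [round(math.exp(s) * n_train) for s in init_scores]
print(counts, sum(counts))
```

The recovered counts add up to the training set size, which supports this reading. The log itself does not name the classes, so mapping each count to an activity remains an assumption.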

In [None]:
# Starting from the least important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum')['feature_name'].values[:i]
    temp_x = x_train[cols]
    lgb = LGBMClassifier(random_state=42)
    lgb.fit(temp_x.values, y_train.values)
    print(f"{i} features : {lgb.score(x_val[cols].values, y_val.values)}")

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 12302
[LightGBM] [Info] Number of data points in the train set: 4704, number of used features: 50
[LightGBM] [Info] Start training from score -1.662702
[LightGBM] [Info] Start training from score -1.739574
[LightGBM] [Info] Start training from score -1.688825
[LightGBM] [Info] Start training from score -1.774060
[LightGBM] [Info] Start training from score -2.005698
[LightGBM] [Info] Start training from score -1.925291
50 features : 0.6890399320305862
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 24841
[LightGBM] [Info] Number of data points in the train set: 4704, number of used features: 100
[LightGBM] [Info] Start training from score -1.662702
[LightGBM] [Info] Start training from score -1.739574
[LightGBM] [Info] Start training from score -1.688825
[LightGBM] [Info] Start training from score -1.774060
[LightGBM] [Info] Start training from score -2.005698
[LightG

In [None]:
# Starting from the most important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum',ascending=False)['feature_name'].values[:i]
    temp_x = x_train[cols]
    lgb = LGBMClassifier(random_state=42)
    lgb.fit(temp_x.values, y_train.values)
    print(f"{i} features : {lgb.score(x_val[cols].values, y_val.values)}")

You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 12750
[LightGBM] [Info] Number of data points in the train set: 4704, number of used features: 50
[LightGBM] [Info] Start training from score -1.662702
[LightGBM] [Info] Start training from score -1.739574
[LightGBM] [Info] Start training from score -1.688825
[LightGBM] [Info] Start training from score -1.774060
[LightGBM] [Info] Start training from score -2.005698
[LightGBM] [Info] Start training from score -1.925291
50 features : 0.9770603228547153
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 25058
[LightGBM] [Info] Number of data points in the train set: 4704, number of used features: 100
[LightGBM] [Info] Start training from score -1.662702
[LightGBM] [Info] Start training from score -1.739574
[LightGBM] [Info] Start training from score -1.688825
[LightGBM] [Info] Start training from score -1.774060
[LightGBM] [Info] Start training from score -2.005698
[LightG

### (3) Algorithm 3: XGBoost

In [None]:
from xgboost import XGBClassifier

In [None]:
from sklearn.preprocessing import LabelEncoder
# Encode the string activity labels as integers for XGBClassifier
le = LabelEncoder()
y_train_le = le.fit_transform(y_train)
y_val_le = le.transform(y_val)

xgb = XGBClassifier(random_state=42)
xgb.fit(x_train.values, y_train_le)

xgb.score(x_val, y_val_le)

0.9932030586236194
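
Because the model was trained on encoded labels, its predictions are integers. A sketch of mapping them back to activity names with the same kind of encoder; the label list is taken from the classification report in this notebook, and the integer predictions are made up:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = ["LAYING", "SITTING", "STANDING", "WALKING",
          "WALKING_DOWNSTAIRS", "WALKING_UPSTAIRS"]

le = LabelEncoder()
le.fit(labels)  # classes_ are stored in sorted order

# Hypothetical integer predictions, as a classifier fit on encoded labels would return.
pred_int = np.array([0, 3, 5])
pred_names = le.inverse_transform(pred_int)
print(pred_names.tolist())
```

In the notebook itself the equivalent call would be `le.inverse_transform(xgb.predict(x_val))`, which makes the predictions comparable with `y_val`.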

In [None]:
# Starting from the least important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum')['feature_name'].values[:i]
    temp_x = x_train[cols]
    xgb = XGBClassifier(random_state=42)
    xgb.fit(temp_x.values, y_train_le)
    print(f"{i} features : {xgb.score(x_val[cols], y_val_le)}")

50 features : 0.6728971962616822
100 features : 0.7926932880203909
150 features : 0.8742565845369583
200 features : 0.897196261682243
250 features : 0.923534409515718
300 features : 0.9396771452846219
350 features : 0.9447748513169074
400 features : 0.9481733220050977
450 features : 0.9558198810535259
500 features : 0.9770603228547153
550 features : 0.9906542056074766


In [None]:
# Starting from the most important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum',ascending=False)['feature_name'].values[:i]
    temp_x = x_train[cols]
    xgb = XGBClassifier(random_state=42)
    xgb.fit(temp_x.values, y_train_le)
    print(f"{i} features : {xgb.score(x_val[cols], y_val_le)}")

50 features : 0.9813084112149533
100 features : 0.9847068819031436
150 features : 0.989804587935429
200 features : 0.989804587935429
250 features : 0.9915038232795242
300 features : 0.9915038232795242
350 features : 0.9923534409515717
400 features : 0.9915038232795242
450 features : 0.9915038232795242
500 features : 0.9923534409515717
550 features : 0.9915038232795242


### (4) Algorithm 4: SVM (SVC)

In [None]:
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
svc = SVC(random_state=42)
svc.fit(x_train, y_train)

y_pred = svc.predict(x_val)
print(classification_report(y_val, y_pred))
print(confusion_matrix(y_val, y_pred))
print('Accuracy :', accuracy_score(y_val, y_pred))

                    precision    recall  f1-score   support

            LAYING       0.99      1.00      1.00       223
           SITTING       0.87      0.80      0.84       206
          STANDING       0.84      0.89      0.86       218
           WALKING       0.98      0.99      0.99       200
WALKING_DOWNSTAIRS       0.99      0.96      0.97       158
  WALKING_UPSTAIRS       0.96      0.97      0.96       172

          accuracy                           0.93      1177
         macro avg       0.94      0.94      0.94      1177
      weighted avg       0.93      0.93      0.93      1177

[[223   0   0   0   0   0]
 [  2 165  38   0   0   1]
 [  0  24 194   0   0   0]
 [  0   0   0 199   0   1]
 [  0   0   0   1 152   5]
 [  0   0   0   4   2 166]]
Accuracy : 0.9337298215802888
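
The raw confusion matrix is easier to read with class names attached. A sketch that labels the matrix printed above and recomputes per-class recall from it; the label order is taken from the classification report:

```python
import numpy as np
import pandas as pd

labels = ["LAYING", "SITTING", "STANDING", "WALKING",
          "WALKING_DOWNSTAIRS", "WALKING_UPSTAIRS"]

# Confusion matrix copied from the output above.
cm = np.array([
    [223, 0, 0, 0, 0, 0],
    [2, 165, 38, 0, 0, 1],
    [0, 24, 194, 0, 0, 0],
    [0, 0, 0, 199, 0, 1],
    [0, 0, 0, 1, 152, 5],
    [0, 0, 0, 4, 2, 166],
])

# Rows are true classes, columns are predicted classes (scikit-learn's convention).
cm_df = pd.DataFrame(cm, index=labels, columns=labels)

# Recall per class = diagonal / row total.
recall = pd.Series(np.diag(cm) / cm.sum(axis=1), index=labels).round(2)
print(cm_df)
print(recall)
```

Most of the errors are SITTING and STANDING being confused with each other (38 and 24 samples), which matches the lower recall for those two classes in the report.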


In [None]:
# Starting from the least important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum')['feature_name'].values[:i]
    temp_x = x_train[cols]
    svc = SVC(random_state=42)
    svc.fit(temp_x, y_train)
    print(f"{i} features : {svc.score(x_val[cols], y_val)}")

50 features : 0.5862361937128292
100 features : 0.7230246389124894
150 features : 0.7629566694987255
200 features : 0.8181818181818182
250 features : 0.8470688190314358
300 features : 0.8717077315208156
350 features : 0.8920985556499575
400 features : 0.892948173322005
450 features : 0.9090909090909091
500 features : 0.9252336448598131
550 features : 0.9592183517417162
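
`SVC` with its default RBF kernel is sensitive to feature scale, and the SVC cells in this section fit on the unscaled `x_train` even though `x_train_s` is available. A sketch of bundling the scaler and the classifier in one pipeline, on synthetic data with one deliberately large-range column (the data and that column are assumptions for illustration, not the sensor dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=5, random_state=42)

# Append an uninformative column with a much larger range, similar in spirit to an ID column.
rng = np.random.default_rng(42)
X = np.column_stack([X, rng.integers(1, 31, size=len(X))])

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# The pipeline fits the scaler on the training split only, then applies it before SVC.
model = make_pipeline(MinMaxScaler(), SVC(random_state=42))
model.fit(X_tr, y_tr)
acc = model.score(X_va, y_va)
print(round(acc, 3))
```

Whether scaling improves the SVC scores on the sensor data would need to be checked by rerunning the cells with the scaled inputs.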


In [None]:
# Starting from the most important features
for i in range(50, 551, 50):
    cols = fi.sort_values(by='feature_importance_sum',ascending=False)['feature_name'].values[:i]
    temp_x = x_train[cols]
    svc = SVC(random_state=42)
    svc.fit(temp_x, y_train)
    print(f"{i} features : {svc.score(x_val[cols], y_val)}")

50 features : 0.9303313508920985
100 features : 0.9362786745964317
150 features : 0.9430756159728122
200 features : 0.9524214103653356
250 features : 0.9634664401019541
300 features : 0.9617672047578589
350 features : 0.9643160577740016
400 features : 0.967714528462192
450 features : 0.967714528462192
500 features : 0.9702633814783348
550 features : 0.9685641461342396
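
The mission asks for a separate data frame that stores each model's performance. A sketch built from the full-feature validation accuracies printed in this notebook (the column names are arbitrary choices, and the random forest value is the unscaled run):

```python
import pandas as pd

# Full-feature validation accuracies copied from the cell outputs in this notebook.
results = pd.DataFrame({
    "model": ["RandomForest", "LightGBM", "XGBoost", "SVC"],
    "val_accuracy": [0.9753610875106202, 0.9915038232795242,
                     0.9932030586236194, 0.9337298215802888],
})

# Rank the models from best to worst validation accuracy.
results = results.sort_values("val_accuracy", ascending=False).reset_index(drop=True)
print(results)
```

The top-N results from the loops could be appended as extra rows, for example with a `n_features` column, to compare reduced and full models side by side.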
