# Build & Save Similarity Model
---

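### Overview
* Loads the **preprocessed** data from the **Preprocessed_repository**, computes the **similarity** between every pair of data files, and assembles the results into a single **similarity model (similarity_model)** that is returned and saved.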

---
* The figure below shows how the similarity between the stored preprocessed_data is computed to build and save the similarity_model.  

<img src="https://raw.githubusercontent.com/jhyun0919/EnergyData_jhyun/master/docs/images/%EC%8A%A4%ED%81%AC%EB%A6%B0%EC%83%B7%202016-05-18%20%EC%98%A4%EC%A0%84%2010.26.43.jpg" alt="Drawing" style="width: 700px;"/>

---
* Import the modules needed to compute the similarity and save the result.

In [1]:
from utils import GlobalParameter
from utils import FileIO
from utils import Similarity
import os



---
* The next step sets the repository path and checks it.

In [2]:
repository4prepodessed_path = os.path.join(GlobalParameter.Repository_Path, GlobalParameter.Preprocessed_Path)
repository4prepodessed_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data'

---
* Build a list of the absolute paths (abs_path) of the preprocessed_data files under that path.

In [3]:
file_list = FileIO.Load.load_filelist(repository4prepodessed_path)
file_list

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']
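
The internals of `FileIO.Load.load_filelist` are not shown here. As a rough idea of what such a helper might do, the sketch below collects the absolute paths of the `.bin` files in a directory using only the standard library. The function name `list_files` and the `suffix` parameter are hypothetical and not part of `utils`.

```python
import os


def list_files(directory, suffix=".bin"):
    """Return the sorted absolute paths of files in `directory` ending with `suffix`.

    Hypothetical stand-in for FileIO.Load.load_filelist; the real helper may differ.
    """
    directory = os.path.abspath(directory)
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix) and os.path.isfile(os.path.join(directory, name))
    )
```

Sorting keeps the order stable between runs, which matters because each file's position in the list is later used as its row and column index in the similarity matrices.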

---
* Pass file_list as the argument to build the **similarity_model**, which returns
    * the model (similarity_model) and
    * the path where it was saved (model_save_path).

In [4]:
similarity_model, model_save_path = Similarity.Model.build_model(file_list)
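
How `Similarity.Model.build_model` works internally is not shown in this notebook. The following is a minimal sketch of the general idea, assuming the model is a plain dict holding the file list plus one pairwise matrix per metric, and assuming it is serialized with `pickle`. The names `pairwise_matrix` and `build_model_sketch`, the toy series, and the pickle format are all assumptions made for illustration.

```python
import pickle

import numpy as np


def pairwise_matrix(series, metric):
    """Apply `metric` to every pair of series and return a symmetric n x n matrix.

    The diagonal is left at zero, as in most of the matrices printed in this notebook.
    """
    n = len(series)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i, j] = matrix[j, i] = metric(series[i], series[j])
    return matrix


def build_model_sketch(file_list, series, metrics, save_path=None):
    """Bundle the file list and one matrix per metric into a dict; optionally pickle it."""
    model = {"file_list": list(file_list)}
    for name, metric in metrics.items():
        model[name] = pairwise_matrix(series, metric)
    if save_path is not None:
        with open(save_path, "wb") as handle:
            pickle.dump(model, handle)  # assumption: the real .bin format may differ
    return model, save_path


# Toy data: three short series standing in for the contents of the .bin files.
series = [np.array([1.0, 2.0, 3.0]), np.array([1.0, 2.0, 4.0]), np.array([3.0, 2.0, 1.0])]
metrics = {"euclidean_distance": lambda a, b: float(np.linalg.norm(a - b))}
model, _ = build_model_sketch(["a.bin", "b.bin", "c.bin"], series, metrics)
```

Computing only the upper triangle and mirroring it halves the number of metric calls and guarantees symmetry.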

---
* Check the returned model_save_path.

In [5]:
model_save_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/model/model.bin'

---
* Check the returned similarity_model.

In [6]:
similarity_model

{'cosine_similarity': array([[ 0.   ,  0.038,  0.909,  0.066,  0.076,  0.931,  0.194,  0.089,
         -0.994],
        [ 0.038,  0.   ,  0.916,  0.084,  0.071,  0.961,  0.184,  0.07 ,
         -0.994],
        [ 0.909,  0.916,  0.   ,  0.916,  0.911,  0.142,  0.93 ,  0.931,
         -0.999],
        [ 0.066,  0.084,  0.916,  0.   ,  0.003,  0.893,  0.082,  0.045,
         -0.995],
        [ 0.076,  0.071,  0.911,  0.003,  0.   ,  0.881,  0.078,  0.046,
         -0.996],
        [ 0.931,  0.961,  0.142,  0.893,  0.881,  0.   ,  0.885,  0.925,
         -0.999],
        [ 0.194,  0.184,  0.93 ,  0.082,  0.078,  0.885,  0.   ,  0.095,
         -0.997],
        [ 0.089,  0.07 ,  0.931,  0.045,  0.046,  0.925,  0.095,  0.   ,
         -0.995],
        [-0.994, -0.994, -0.999, -0.995, -0.996, -0.999, -0.997, -0.995,  0.   ]]),
 'covariance': array([[  9.90440234e-01,   9.41935308e-01,   8.48112269e-02,
           9.57313075e-01,   9.45224056e-01,   6.45930925e-02,
           8.10657063e-01, 

---
### Similarity Model  

* Contents of the **similarity_model**
    * file_list
    * covariance
    * cosine_similarity
    * euclidean_distance
    * manhattan_distance
    * gradient_similarity
    * reversed_gradient_similarity

---
* **file_list**
    * A list of the absolute paths (abs_path) of the data files under the preprocessed_repository.
        * Each file's index in this list (list_idx) matches the row and column index used later in every similarity matrix.

In [7]:
similarity_model['file_list']

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']
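
Because the list index doubles as the matrix index, a matrix entry can be mapped back to file names directly. The sketch below shows one way to do this on toy data; `closest_pair` is a hypothetical helper, and it assumes a matrix in which smaller values mean "more alike" (as in a distance matrix).

```python
import numpy as np


def closest_pair(matrix, file_list):
    """Return the two files with the smallest off-diagonal value in `matrix`."""
    masked = np.array(matrix, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(masked, np.inf)        # ignore self-comparisons
    i, j = np.unravel_index(np.argmin(masked), masked.shape)
    return file_list[i], file_list[j]


files = ["a.bin", "b.bin", "c.bin"]
distances = np.array([[0.0, 5.0, 1.0],
                      [5.0, 0.0, 3.0],
                      [1.0, 3.0, 0.0]])
pair = closest_pair(distances, files)
```

With a real model the call would look like `closest_pair(similarity_model['euclidean_distance'], similarity_model['file_list'])`.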

---
* **covariance**
    * Computes the **covariance** between each pair of data and stores the result as a **symmetric matrix**.

In [8]:
similarity_model['covariance']

array([[  9.90440234e-01,   9.41935308e-01,   8.48112269e-02,
          9.57313075e-01,   9.45224056e-01,   6.45930925e-02,
          8.10657063e-01,   9.24261912e-01,  -3.41390872e-03],
       [  9.41935308e-01,   9.71891045e-01,   7.56023087e-02,
          9.28991314e-01,   9.47172602e-01,   3.58401501e-02,
          8.13103689e-01,   9.24520734e-01,  -3.89648755e-03],
       [  8.48112269e-02,   7.56023087e-02,   8.25948153e-01,
          8.03563700e-02,   8.40187232e-02,   7.19271295e-01,
          6.59052368e-02,   6.26831597e-02,  -3.89355835e-04],
       [  9.57313075e-01,   9.28991314e-01,   8.03563700e-02,
          1.05948132e+00,   1.05548115e+00,   1.03404221e-01,
          9.54919057e-01,   1.00255331e+00,  -3.07638052e-03],
       [  9.45224056e-01,   9.47172602e-01,   8.40187232e-02,
          1.05548115e+00,   1.06951791e+00,   1.14026241e-01,
          9.58744606e-01,   9.93714682e-01,  -2.90897168e-03],
       [  6.45930925e-02,   3.58401501e-02,   7.19271295e-01,
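
As a point of reference, a covariance matrix of this shape can be produced with `numpy.cov`, which treats each row as one variable. The toy data below is made up, and whether `build_model` uses `numpy.cov` or applies any scaling first is not stated in this notebook.

```python
import numpy as np

# Each row is one series; np.cov returns the sample covariance (ddof=1) between rows.
data = np.array([[1.0, 2.0, 3.0, 4.0],
                 [2.0, 4.0, 6.0, 8.0],
                 [4.0, 3.0, 2.0, 1.0]])
cov = np.cov(data)
```

The diagonal of a covariance matrix holds each series' variance, which is why this matrix, unlike the distance matrices, does not have zeros on its diagonal.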
   

---
* **cosine_similarity**
    * Computes the **cosine similarity** between each pair of data and stores the result as a **symmetric matrix**.

In [9]:
similarity_model['cosine_similarity']

array([[ 0.   ,  0.038,  0.909,  0.066,  0.076,  0.931,  0.194,  0.089,
        -0.994],
       [ 0.038,  0.   ,  0.916,  0.084,  0.071,  0.961,  0.184,  0.07 ,
        -0.994],
       [ 0.909,  0.916,  0.   ,  0.916,  0.911,  0.142,  0.93 ,  0.931,
        -0.999],
       [ 0.066,  0.084,  0.916,  0.   ,  0.003,  0.893,  0.082,  0.045,
        -0.995],
       [ 0.076,  0.071,  0.911,  0.003,  0.   ,  0.881,  0.078,  0.046,
        -0.996],
       [ 0.931,  0.961,  0.142,  0.893,  0.881,  0.   ,  0.885,  0.925,
        -0.999],
       [ 0.194,  0.184,  0.93 ,  0.082,  0.078,  0.885,  0.   ,  0.095,
        -0.997],
       [ 0.089,  0.07 ,  0.931,  0.045,  0.046,  0.925,  0.095,  0.   ,
        -0.995],
       [-0.994, -0.994, -0.999, -0.995, -0.996, -0.999, -0.997, -0.995,  0.   ]])
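
For comparison, the textbook cosine similarity is the dot product of two vectors divided by the product of their norms, and it equals 1 on the diagonal. The matrix printed above has zeros on its diagonal, so it appears to be a transformed variant; the exact transformation is not described in this notebook. The sketch below computes the textbook form and the common `1 - similarity` distance on made-up data.

```python
import numpy as np


def cosine_similarity_matrix(data):
    """Textbook cosine similarity between all rows of `data` (rows must be non-zero)."""
    unit = data / np.linalg.norm(data, axis=1, keepdims=True)  # scale rows to unit length
    return unit @ unit.T


data = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [1.0, 1.0]])
similarity = cosine_similarity_matrix(data)
distance = 1.0 - similarity  # one common way to turn similarity into a distance
```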

---
* **euclidean_distance**
    * Computes the **Euclidean distance** between each pair of data and stores the result as a **symmetric matrix**.

In [10]:
similarity_model['euclidean_distance']

array([[   0.   ,   74.411,  354.838,  100.623,  108.247,  361.088,
         170.702,  116.332,  321.428],
       [  74.411,    0.   ,  360.477,  113.019,  107.833,  371.74 ,
         165.212,  104.87 ,  326.448],
       [ 354.838,  360.477,    0.   ,  362.987,  368.799,  136.881,
         362.052,  367.268,  306.696],
       [ 100.623,  113.019,  362.987,    0.   ,   21.754,  360.188,
         112.549,   83.897,  329.375],
       [ 108.247,  107.833,  368.799,   21.754,    0.   ,  365.069,
         109.729,   87.299,  337.346],
       [ 361.088,  371.74 ,  136.881,  360.188,  365.069,    0.   ,
         355.105,  368.355,  309.985],
       [ 170.702,  165.212,  362.052,  112.549,  109.729,  355.105,
           0.   ,  120.636,  324.873],
       [ 116.332,  104.87 ,  367.268,   83.897,   87.299,  368.355,
         120.636,    0.   ,  330.716],
       [ 321.428,  326.448,  306.696,  329.375,  337.346,  309.985,
         324.873,  330.716,    0.   ]])
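
A pairwise Euclidean distance matrix like the one above can be sketched with NumPy broadcasting. The helper name and the toy data are illustrative only, and this assumes all series have the same length.

```python
import numpy as np


def euclidean_distance_matrix(data):
    """Euclidean distance between every pair of rows in `data`."""
    diff = data[:, None, :] - data[None, :, :]  # shape (n, n, length)
    return np.sqrt((diff ** 2).sum(axis=-1))


data = np.array([[0.0, 0.0],
                 [3.0, 4.0],
                 [6.0, 8.0]])
distances = euclidean_distance_matrix(data)
```

Broadcasting builds an intermediate array of size n x n x length, which is fine for a handful of files but can become memory-hungry for many long series.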

---
* **manhattan_distance**
    * Computes the **Manhattan distance** between each pair of data and stores the result as a **symmetric matrix**.

In [11]:
similarity_model['manhattan_distance']

array([[ 0.        ,  0.22389808,  0.74712074,  0.28994398,  0.32272072,
         0.81851911,  0.55159773,  0.36840761,  0.73799388],
       [ 0.22389808,  0.        ,  0.70576342,  0.33172273,  0.2989379 ,
         0.79123097,  0.52648347,  0.32054276,  0.70567165],
       [ 0.74712074,  0.70576342,  0.        ,  0.78048187,  0.78603476,
         0.36567016,  0.83606104,  0.78682505,  0.19612062],
       [ 0.28994398,  0.33172273,  0.78048187,  0.        ,  0.06097491,
         0.79656894,  0.34233685,  0.24679199,  0.78408137],
       [ 0.32272072,  0.2989379 ,  0.78603476,  0.06097491,  0.        ,
         0.78897834,  0.33261979,  0.2544871 ,  0.81120333],
       [ 0.81851911,  0.79123097,  0.36567016,  0.79656894,  0.78897834,
         0.        ,  0.76062935,  0.775794  ,  0.42274913],
       [ 0.55159773,  0.52648347,  0.83606104,  0.34233685,  0.33261979,
         0.76062935,  0.        ,  0.36851547,  0.8740347 ],
       [ 0.36840761,  0.32054276,  0.78682505,  0.24679199,  0
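
The Manhattan (L1) distance is the sum of absolute element-wise differences. The values printed above all lie between 0 and 1, which suggests some scaling is applied, but the scaling is not described in this notebook. The sketch below therefore computes the raw, unscaled distance on made-up data.

```python
import numpy as np


def manhattan_distance_matrix(data):
    """Raw Manhattan (L1) distance between every pair of rows in `data`."""
    diff = np.abs(data[:, None, :] - data[None, :, :])  # shape (n, n, length)
    return diff.sum(axis=-1)


data = np.array([[0.0, 0.0],
                 [3.0, 4.0],
                 [1.0, -1.0]])
distances = manhattan_distance_matrix(data)
```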

---
* **gradient_similarity**
    * Computes the **gradient similarity** between each pair of data and stores the result as a **symmetric matrix**.

In [12]:
similarity_model['gradient_similarity']

array([[  0.00000000e+00,   1.33498079e-04,   1.35509952e-03,
          8.49472186e-05,   1.38104703e-04,   1.88512987e-03,
          1.00123560e-04,   1.57229579e-04,   2.90528889e-03],
       [  1.33498079e-04,   0.00000000e+00,   1.37209716e-03,
          1.43705176e-04,   1.23794989e-04,   1.85728094e-03,
          1.40669908e-04,   1.43558438e-04,   2.75777693e-03],
       [  1.35509952e-03,   1.37209716e-03,   0.00000000e+00,
          1.38138279e-03,   1.34537708e-03,   1.40281063e-03,
          1.35620081e-03,   1.37656111e-03,   3.92666921e-03],
       [  8.49472186e-05,   1.43705176e-04,   1.38138279e-03,
          0.00000000e+00,   9.89954068e-05,   1.86734804e-03,
          7.95347713e-05,   1.43302264e-04,   2.87708507e-03],
       [  1.38104703e-04,   1.23794989e-04,   1.34537708e-03,
          9.89954068e-05,   0.00000000e+00,   1.82846242e-03,
          1.10585834e-04,   1.29403536e-04,   2.72661834e-03],
       [  1.88512987e-03,   1.85728094e-03,   1.40281063e-03,
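
This notebook does not define how gradient similarity is calculated. One plausible reading, offered purely as an assumption, is that two series are compared by their slopes rather than their raw values. The sketch below uses `numpy.gradient` and the mean absolute difference of the slopes; `gradient_distance` is a hypothetical name and this is not necessarily the formula used by `Similarity.Model`.

```python
import numpy as np


def gradient_distance(a, b):
    """Mean absolute difference between the slopes of two equal-length series.

    Assumed definition for illustration; the notebook's actual formula is not shown.
    """
    return float(np.mean(np.abs(np.gradient(a) - np.gradient(b))))


rising = np.array([1.0, 2.0, 3.0, 4.0])
shifted = np.array([11.0, 12.0, 13.0, 14.0])  # same slope, different offset
falling = np.array([4.0, 3.0, 2.0, 1.0])      # opposite slope

same_shape = gradient_distance(rising, shifted)
opposite_shape = gradient_distance(rising, falling)
```

Under this reading, two series with the same shape but different offsets score as identical, which a plain Euclidean distance would not capture.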
   

---
* **reversed_gradient_similarity**
    * Computes the **reversed gradient similarity** between each pair of data and stores the result as a **symmetric matrix**.

In [13]:
similarity_model['reversed_gradient_similarity']

array([[  2.06653415e-04,   2.13556636e-04,   1.40140750e-03,
          1.78436165e-04,   1.80719332e-04,   1.91575116e-03,
          1.45491418e-04,   2.13529775e-04,   2.90524860e-03],
       [  2.13556636e-04,   2.08915172e-04,   1.36755691e-03,
          1.85339386e-04,   1.77731146e-04,   1.85151978e-03,
          1.52515512e-04,   2.09716393e-04,   2.75777693e-03],
       [  1.40140750e-03,   1.36755691e-03,   2.54858197e-03,
          1.38494185e-03,   1.33980669e-03,   2.82647844e-03,
          1.35508609e-03,   1.37064734e-03,   3.92762304e-03],
       [  1.78436165e-04,   1.85339386e-04,   1.38494185e-03,
          1.50218915e-04,   1.52502082e-04,   1.88671466e-03,
          1.17583067e-04,   1.85312525e-04,   2.87707164e-03],
       [  1.80719332e-04,   1.77731146e-04,   1.33980669e-03,
          1.52502082e-04,   1.46597991e-04,   1.82074272e-03,
          1.19893094e-04,   1.78557802e-04,   2.72661834e-03],
       [  1.91575116e-03,   1.85151978e-03,   2.82647844e-03,
   