# Build & Save Similarity Model
---

### Overview
* Load the **preprocessed** data from the **Preprocessed_repository**, compute the **similarity** between every pair of data files, and assemble the results into a single **similarity model (similarity_model)** that is returned and saved.

---
* The figure below shows how the similarity between the stored preprocessed_data files is computed and how the similarity_model is built and saved.

<img src="https://raw.githubusercontent.com/jhyun0919/EnergyData_jhyun/master/docs/images/%EC%8A%A4%ED%81%AC%EB%A6%B0%EC%83%B7%202016-05-18%20%EC%98%A4%EC%A0%84%2010.26.43.jpg" alt="Drawing" style="width: 700px;"/>

---
* Import the modules needed to compute the similarities and save the model.

In [1]:
from utils import GlobalParameter
from utils import FileIO
from utils import Similarity
import os



---
* Next, set the path to the repository and check it.

In [2]:
repository4prepodessed_path = os.path.join(GlobalParameter.Repository_Path, GlobalParameter.Preprocessed_Path)
repository4prepodessed_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data'

---
* Build a list of the absolute paths (abs_path) of the preprocessed_data files under that path.

In [3]:
file_list = FileIO.Load.load_filelist(repository4prepodessed_path)
file_list

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']
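
The source of `FileIO.Load.load_filelist` is not shown here, so the following is only a minimal sketch of how such a listing step could work, using the standard library. The helper name `list_files_abs` is hypothetical, and sorting the names is an assumption made so that the list order (and therefore the later matrix indices) is reproducible.

```python
import os


def list_files_abs(directory):
    """Return the sorted absolute paths of the regular files directly under `directory`.

    Hypothetical helper: an illustration, not the notebook's FileIO implementation.
    """
    directory = os.path.abspath(directory)
    paths = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        # Skip sub-directories; only data files are wanted.
        if os.path.isfile(full_path):
            paths.append(full_path)
    return paths
```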

---
* Pass file_list as the argument to build the **similarity_model**, and receive
    * the model (similarity_model) and
    * the path it was saved to (model_save_path)

In [4]:
similarity_model, model_save_path = Similarity.Model.build_model(file_list)
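
How `Similarity.Model.build_model` works internally is not shown in this notebook. As a rough sketch of the idea (one symmetric matrix per metric, collected in a dictionary next to the file list, then written to disk), something like the following could be used. The names `pairwise_matrix` and `build_model_sketch` are hypothetical, the use of `pickle` is an assumption about the storage format, and the sketch takes already loaded series as input because the layout of the `.bin` files is not described here.

```python
import pickle

import numpy as np


def pairwise_matrix(series_list, metric):
    """Fill a symmetric n x n matrix with metric(series_i, series_j)."""
    n = len(series_list)
    mat = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            # The metric is assumed symmetric, so each pair is computed once.
            mat[i, j] = mat[j, i] = metric(series_list[i], series_list[j])
    return mat


def build_model_sketch(file_list, series_list, metrics, save_path=None):
    """Return (model, save_path); the model maps each metric name to its matrix."""
    model = {"file_list": list(file_list)}
    for name, metric in metrics.items():
        model[name] = pairwise_matrix(series_list, metric)
    if save_path is not None:
        # Assumption: the model is serialized with pickle.
        with open(save_path, "wb") as f:
            pickle.dump(model, f)
    return model, save_path


# Toy usage with two short series and a single metric.
toy_metrics = {
    "euclidean_distance": lambda a, b: float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
}
toy_model, _ = build_model_sketch(["a.bin", "b.bin"], [[0.0, 0.0], [3.0, 4.0]], toy_metrics)
```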

---
* Check the returned model_save_path.

In [5]:
model_save_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/model/model.bin'

---
* Check the returned similarity_model.

In [6]:
similarity_model

{'cosine_similarity': array([[ 0.   ,  0.018,  0.787,  0.03 ,  0.032,  0.658,  0.128,  0.033,  1.   ],
        [ 0.018,  0.   ,  0.791,  0.048,  0.042,  0.665,  0.144,  0.028,  1.   ],
        [ 0.787,  0.791,  0.   ,  0.798,  0.799,  0.147,  0.825,  0.806,  1.   ],
        [ 0.03 ,  0.048,  0.798,  0.   ,  0.002,  0.656,  0.059,  0.022,  1.   ],
        [ 0.032,  0.042,  0.799,  0.002,  0.   ,  0.656,  0.061,  0.023,  1.   ],
        [ 0.658,  0.665,  0.147,  0.656,  0.656,  0.   ,  0.693,  0.666,  1.   ],
        [ 0.128,  0.144,  0.825,  0.059,  0.061,  0.693,  0.   ,  0.081,  1.   ],
        [ 0.033,  0.028,  0.806,  0.022,  0.023,  0.666,  0.081,  0.   ,  1.   ],
        [ 1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  0.   ]]),
 'euclidean_distance': array([[     0.   ,   2642.35 ,  13562.254,   3838.5  ,   3935.521,
          13338.732,   6934.901,   3828.04 ,  13775.754],
        [  2642.35 ,      0.   ,  13858.202,   4656.261,   4383.415,
          13639.555, 

---
### Similarity Model  

* Components of the **similarity_model**
    * file_list
    * cosine_similarity
    * euclidean_distance
    * gradient_similarity
    * reversed_gradient_similarity

---
* **file_list**
    * A list of the abs_path of each data file under the preprocessed_repository
        * Each file's list index (list_idx) matches the row and column index of that file in every similarity_matrix

In [7]:
similarity_model['file_list']

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']
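
Because the list index and the matrix row/column index coincide, a row of any matrix can be mapped straight back to file names. The sketch below uses a hypothetical helper, `most_similar`, and assumes that smaller values mean "more similar", which is consistent with the zero diagonals in the matrices shown in this notebook.

```python
import os

import numpy as np


def most_similar(model, key, query_idx, top_k=3):
    """Return (file name, score) pairs with the smallest scores to `query_idx`.

    Hypothetical helper; assumes smaller scores mean more similar.
    """
    scores = np.asarray(model[key])[query_idx]
    # Sort column indices by score and drop the query file itself.
    order = [int(i) for i in np.argsort(scores) if i != query_idx][:top_k]
    return [(os.path.basename(model["file_list"][i]), float(scores[i])) for i in order]


# Toy model with three files; index i in file_list is row/column i in the matrix.
toy_model = {
    "file_list": ["/data/a.bin", "/data/b.bin", "/data/c.bin"],
    "distance": np.array([[0.0, 2.0, 1.0],
                          [2.0, 0.0, 3.0],
                          [1.0, 3.0, 0.0]]),
}
neighbours = most_similar(toy_model, "distance", query_idx=0, top_k=2)
```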

---
* **cosine_similarity**
    * The **cosine similarity** between every pair of data files, stored as a **symmetric matrix**

In [8]:
similarity_model['cosine_similarity']

array([[ 0.   ,  0.018,  0.787,  0.03 ,  0.032,  0.658,  0.128,  0.033,  1.   ],
       [ 0.018,  0.   ,  0.791,  0.048,  0.042,  0.665,  0.144,  0.028,  1.   ],
       [ 0.787,  0.791,  0.   ,  0.798,  0.799,  0.147,  0.825,  0.806,  1.   ],
       [ 0.03 ,  0.048,  0.798,  0.   ,  0.002,  0.656,  0.059,  0.022,  1.   ],
       [ 0.032,  0.042,  0.799,  0.002,  0.   ,  0.656,  0.061,  0.023,  1.   ],
       [ 0.658,  0.665,  0.147,  0.656,  0.656,  0.   ,  0.693,  0.666,  1.   ],
       [ 0.128,  0.144,  0.825,  0.059,  0.061,  0.693,  0.   ,  0.081,  1.   ],
       [ 0.033,  0.028,  0.806,  0.022,  0.023,  0.666,  0.081,  0.   ,  1.   ],
       [ 1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  0.   ]])
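
The diagonal of this matrix is 0, which suggests the stored values behave like a cosine *distance* (1 minus the cosine of the angle between two series), so that smaller means more similar. The notebook does not confirm this, so the following is only a sketch under that assumption. The function name `cosine_distance` and the handling of an all-zero vector are illustrative choices, not the `Similarity` module's API.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) for two equal-length series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        # Assumption: an all-zero series is treated as maximally dissimilar.
        return 1.0
    return 1.0 - float(np.dot(a, b) / denom)
```

With this definition, two series that differ only by a positive scale factor have distance 0, and orthogonal series have distance 1.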

---
* **euclidean_distance**
    * The **euclidean distance** between every pair of data files, stored as a **symmetric matrix**

In [9]:
similarity_model['euclidean_distance']

array([[     0.   ,   2642.35 ,  13562.254,   3838.5  ,   3935.521,
         13338.732,   6934.901,   3828.04 ,  13775.754],
       [  2642.35 ,      0.   ,  13858.202,   4656.261,   4383.415,
         13639.555,   7423.263,   3501.575,  14067.647],
       [ 13562.254,  13858.202,      0.   ,  14987.302,  15014.644,
           780.26 ,  13443.173,  14660.575,   1270.35 ],
       [  3838.5  ,   4656.261,  14987.302,      0.   ,    863.361,
         14744.373,   5204.77 ,   3186.513,  15191.619],
       [  3935.521,   4383.415,  15014.644,    863.361,      0.   ,
         14771.002,   5278.966,   3224.633,  15218.663],
       [ 13338.732,  13639.555,    780.26 ,  14744.373,  14771.002,
             0.   ,  13225.429,  14424.24 ,   1493.382],
       [  6934.901,   7423.263,  13443.173,   5204.77 ,   5278.966,
         13225.429,      0.   ,   5824.987,  13606.85 ],
       [  3828.04 ,   3501.575,  14660.575,   3186.513,   3224.633,
         14424.24 ,   5824.987,      0.   ,  14853.905],
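
For reference, a pairwise Euclidean distance matrix can be sketched with NumPy broadcasting as below. The function name `euclidean_distance_matrix` is hypothetical, and the sketch assumes all series have been brought to the same length during preprocessing.

```python
import numpy as np


def euclidean_distance_matrix(series):
    """Return the n x n matrix of Euclidean distances between the rows of `series`."""
    X = np.asarray(series, dtype=float)
    # diff[i, j, :] holds X[i] - X[j] for every pair of rows.
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


toy_distances = euclidean_distance_matrix([[0.0, 0.0], [3.0, 4.0]])
```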


---
* **gradient_similarity**
    * The **gradient similarity** between every pair of data files, stored as a **symmetric matrix**

In [10]:
similarity_model['gradient_similarity']

array([[   0.   ,   37.398,  309.711,   48.777,   60.615,  482.503,
          87.293,   65.95 ,  100.963],
       [  37.398,    0.   ,  344.971,   45.805,   30.913,  509.31 ,
          64.103,   39.745,   73.496],
       [ 309.711,  344.971,    0.   ,  323.065,  327.625,  255.386,
         296.747,  352.112,  283.104],
       [  48.777,   45.805,  323.065,    0.   ,   13.468,  466.194,
          58.637,   55.581,   68.29 ],
       [  60.615,   30.913,  327.625,   13.468,    0.   ,  491.866,
          48.001,   43.01 ,   54.981],
       [ 482.503,  509.31 ,  255.386,  466.194,  491.866,    0.   ,
         457.352,  516.074,  448.641],
       [  87.293,   64.103,  296.747,   58.637,   48.001,  457.352,
           0.   ,   69.811,   17.356],
       [  65.95 ,   39.745,  352.112,   55.581,   43.01 ,  516.074,
          69.811,    0.   ,   78.81 ],
       [ 100.963,   73.496,  283.104,   68.29 ,   54.981,  448.641,
          17.356,   78.81 ,    0.   ]])
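
The notebook does not define how the gradient similarity is computed. One plausible reading, given here purely as an assumption, is to compare the *slopes* of two series instead of their raw values, for example by taking first differences and measuring the Euclidean distance between them. The name `gradient_distance` is hypothetical.

```python
import numpy as np


def gradient_distance(a, b):
    """Euclidean distance between the first differences of two equal-length series.

    Assumed definition for illustration; not taken from the Similarity module.
    """
    grad_a = np.diff(np.asarray(a, dtype=float))
    grad_b = np.diff(np.asarray(b, dtype=float))
    return float(np.linalg.norm(grad_a - grad_b))
```

Under this reading, two series with the same shape but a constant offset have distance 0, which a plain Euclidean distance would not report.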

---
* **reversed_gradient_similarity**
    * The **reversed gradient similarity** between every pair of data files, stored as a **symmetric matrix**

In [11]:
similarity_model['reversed_gradient_similarity']

array([[   0.   ,  169.639,  389.035,  165.399,  145.113,  541.429,
         122.604,  171.384,  100.963],
       [ 169.639,    0.   ,  344.031,  130.954,  118.299,  505.112,
          84.136,  145.142,   73.496],
       [ 389.035,  344.031,    0.   ,  347.669,  324.113,  742.964,
         298.68 ,  349.756,  283.311],
       [ 165.399,  130.954,  347.669,    0.   ,  115.094,  526.838,
          88.371,  136.579,   68.29 ],
       [ 145.113,  118.299,  324.113,  115.094,    0.   ,  486.208,
          62.111,  123.953,   54.987],
       [ 541.429,  505.112,  742.964,  526.838,  486.208,    0.   ,
         459.641,  511.492,  448.106],
       [ 122.604,   84.136,  298.68 ,   88.371,   62.111,  459.641,
           0.   ,   88.573,   17.356],
       [ 171.384,  145.142,  349.756,  136.579,  123.953,  511.492,
          88.573,    0.   ,   78.81 ],
       [ 100.963,   73.496,  283.311,   68.29 ,   54.987,  448.106,
          17.356,   78.81 ,    0.   ]])
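
Since every matrix in the model is described as symmetric and indexed by `file_list`, a small sanity check can catch a corrupted or mismatched model. The sketch below uses a hypothetical helper, `is_valid_matrix`; the tolerance of `1e-3` is an assumption chosen to match the three decimals shown in the outputs.

```python
import numpy as np


def is_valid_matrix(mat, n_files, atol=1e-3):
    """Check shape (n_files x n_files), symmetry, and a zero diagonal."""
    mat = np.asarray(mat, dtype=float)
    if mat.shape != (n_files, n_files):
        return False
    symmetric = np.allclose(mat, mat.T, atol=atol)
    zero_diagonal = np.allclose(np.diag(mat), 0.0, atol=atol)
    return bool(symmetric and zero_diagonal)


good = np.array([[0.0, 1.5], [1.5, 0.0]])
bad = np.array([[0.0, 1.5], [2.5, 0.0]])
```

Applied to a model like the one above, the check would be run once per key other than `file_list`, with `n_files` set to `len(similarity_model['file_list'])`.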