## Midterm Project: Is the intravenous infusion pump a risk factor for bacterial soft-tissue infection at the IV site?

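### Background and Objective

    An indwelling intravenous catheter (in everyday terms, an IV drip) is the main route for giving injectable drugs to hospitalized patients. Used with an infusion pump, it is convenient and reliable and spares the patient a new venipuncture for every dose. When the catheter stays in place for a long time, however, the drugs can irritate the vein at the insertion site and cause redness, swelling, and inflammation, and may even lead to a bacterial soft-tissue infection, which in severe cases can be fatal.
    
    According to the US FDA, 56,000 adverse events and 500 deaths between 2005 and 2009 were associated with infusion pumps. Is the infusion pump therefore a risk factor for soft-tissue infection?
    
    This project draws on a study published by Elsevier that examined whether the duration and location of IV therapy are associated with bacterial soft-tissue infection at the IV site. I use that study's publicly available raw data to ask: "Is the infusion pump a risk factor for bacterial soft-tissue infection at the IV site?"
    
    (The raw data and the original paper are both uploaded to GitHub.)
    (PubMed link: https://www.ncbi.nlm.nih.gov/pubmed/20619497)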

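### Methods

    The study design combines data-driven screening with methods from epidemiological data analysis.

    Variable selection
        
        Data driven:
        
            1. Univariate analysis
            2. Kitchen sink model
    
        Epidemiology:
    
            10% rule
    
    Model building and interpretation:
    
        Conditional logistic regression
    
    Risk prediction:
    
        SVM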

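### Introducing the Data

    Before looking at the data, one concept is essential: the confounder. Suppose there are two groups, one that takes a drug and one that does not, and follow-up shows higher mortality in the treated group. It is tempting to conclude that the drug causes death, but if the treated group is older and therefore at higher risk of death, while the untreated group is young, then age is a confounder of the relationship between the drug and death. This idea is used repeatedly below.
    
    Because ward and time of onset are both strong confounders, each case is matched to 4 controls from the same ward with the same onset time, so the data are matched. We first look at the data and then describe the variables.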
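
    The drug-and-age example can be made concrete with a small calculation. The counts below are hypothetical and chosen only for illustration; they are not taken from the dataset. The crude comparison suggests the drug nearly triples the risk of death, while within each age band the risk is identical.

```python
# Hypothetical counts: (group, age_band) -> (deaths, people)
counts = {
    ("drug", "old"): (40, 200),
    ("drug", "young"): (1, 50),
    ("no drug", "old"): (10, 50),
    ("no drug", "young"): (4, 200),
}

def risk(group, age_band=None):
    """Death risk for a group, optionally restricted to one age band."""
    rows = [v for (g, a), v in counts.items()
            if g == group and (age_band is None or a == age_band)]
    deaths = sum(d for d, _ in rows)
    people = sum(n for _, n in rows)
    return deaths / people

# Crude risk ratio ignores age; the stratified ratios compare like with like
crude_rr = risk("drug") / risk("no drug")
rr_old = risk("drug", "old") / risk("no drug", "old")
rr_young = risk("drug", "young") / risk("no drug", "young")
print(round(crude_rr, 2), round(rr_old, 2), round(rr_young, 2))
```

    The gap between the crude and the age-specific ratios arises because the treated group in this made-up table is mostly old. Matching on ward and onset time plays the same role for this study as stratifying on age does here.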

In [143]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import math

In [44]:
data = pd.read_excel("Dataset 4.xlsx")

In [45]:
data.head()

Unnamed: 0,ID,strata,Case,Gender,Age,Age ≥65,Service,Operation,ContIV 24H,Site,...,Pre,hypertonic,Blood,Lipid,Antibiotic,Cvdrug,neuro drug,PPN,pump,Amount
0,1,30,0,1,19,0,1,0,0,0,...,1,1,1,0,1,0,0,0,1,1
1,2,34,0,1,19,0,1,0,0,0,...,0,0,0,0,0,0,0,0,1,0
2,3,45,1,0,20,0,1,0,1,0,...,0,0,0,0,0,0,0,0,1,0
3,4,14,0,1,20,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
4,5,37,1,0,22,0,1,0,0,0,...,0,1,0,0,0,0,1,0,1,0


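### Variables

    ID         : patient ID
    strata     : matched set
    Case       : 0 = control, 1 = case (the outcome of this study)
    Gender     : 0 = female, 1 = male
    Age        : age in years
    Age ≥65    : 0 = younger than 65, 1 = 65 or older
    Service    : 0 = surgical ward, 1 = medical ward
    Operation  : 0 = no surgery, 1 = had surgery
    ContIV 24H : 0 = IV infusion not longer than 24 hours, 1 = IV infusion longer than 24 hours
    Site       : 0 = IV site on the upper limb, 1 = IV site on the lower limb (the source labels both codes "upper limb"; lower limb is assumed for 1)
    Inserter   : 0 = IV inserted by an experienced nurse, 1 = IV inserted by a regular nurse
    Neuro      : 0 = not neurosurgery, 1 = neurosurgery
    DM         : 0 = no diabetes, 1 = diabetes
    renal      : 0 = no renal disease, 1 = renal disease
    liver      : 0 = no liver cirrhosis, 1 = liver cirrhosis
    Pre        : 0 = no prior infection, 1 = prior infection
    hypertonic : 0 = infusate is not a hypertonic solution, 1 = infusate is a hypertonic solution
    Blood      : 0 = infusate is not a blood product, 1 = infusate is a blood product
    Lipid      : 0 = infusate is not a lipid solution, 1 = infusate is a lipid solution
    Antibiotic : 0 = infusate is not an antibiotic, 1 = infusate is an antibiotic
    Cvdrug     : 0 = infusate is not a cardiac drug, 1 = infusate is a cardiac drug
    neuro drug : 0 = infusate is not a neurological drug, 1 = infusate is a neurological drug
    PPN        : 0 = infusate is not a nutrition solution, 1 = infusate is a nutrition solution
    pump       : 0 = no infusion pump used, 1 = infusion pump used
    Amount     : 0 = less than 1000 ml infused per day, 1 = more than 1000 ml infused per day
    
    (See the raw data for details.)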

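### Variable Selection - Data driven (1): Univariate analysis

    Because the data are matched and the outcome is binary, conditional logistic regression is used for the analysis. Univariate analysis fits a separate single-variable model for each variable; a variable is considered important if it is statistically significant.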
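
    To show what conditional logistic regression maximizes, here is a minimal sketch of the conditional likelihood for one matched set with a single covariate. The function name and the example values are assumptions made for illustration; this is not pylogit's internal code.

```python
import math

def stratum_loglik(beta, x_case, x_controls):
    """Log conditional likelihood of one matched set with a single covariate.

    P(the observed case is the case) = exp(beta*x_case) / sum_j exp(beta*x_j),
    where j runs over every member of the matched set.
    """
    scores = [beta * x_case] + [beta * x for x in x_controls]
    log_denominator = math.log(sum(math.exp(s) for s in scores))
    return beta * x_case - log_denominator

# Hypothetical matched set: the case used a pump, one of the four controls did
print(stratum_loglik(0.0, 1, [0, 0, 1, 0]))   # equals -log(5) when beta = 0
print(stratum_loglik(1.0, 1, [0, 0, 1, 0]))
```

    Summing this quantity over all matched sets gives the log-likelihood being maximized. Anything shared by all members of a set, such as ward and onset time, cancels out of the ratio, which is why the matching factors need no coefficients of their own.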

In [53]:
from collections import OrderedDict    # For recording the model specification 

import pylogit as pl                   # For MNL model estimation and
                                       # conversion from wide to long format

In [32]:
10%5

0

In [37]:
# pylogit needs an alternative ID for every row within a matched set,
# so rows are numbered 1-5 in a repeating cycle (assumes 5 rows per stratum)
alt = [(i - 1) % 5 + 1 for i in range(1, len(data["ID"]) + 1)]

In [46]:
data2 = data.sort_values(by=['strata',"Case"])

In [48]:
data2.head()

Unnamed: 0,ID,strata,Case,Gender,Age,Age ≥65,Service,Operation,ContIV 24H,Site,...,Pre,hypertonic,Blood,Lipid,Antibiotic,Cvdrug,neuro drug,PPN,pump,Amount
13,14,1,0,1,30,0,1,0,0,0,...,1,1,0,0,1,0,0,0,0,0
15,16,1,0,0,31,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,0
95,96,1,0,0,65,1,1,0,1,0,...,1,0,0,0,1,0,0,0,0,0
109,110,1,0,0,67,1,1,0,0,0,...,1,0,0,0,1,0,0,0,0,1
124,125,1,1,1,70,1,1,0,1,0,...,0,0,1,0,1,0,0,0,0,0


In [50]:
data2["ALT"] = alt

In [73]:
data2.head()

Unnamed: 0,ID,strata,Case,Gender,Age,Age ≥65,Service,Operation,ContIV 24H,Site,...,hypertonic,Blood,Lipid,Antibiotic,Cvdrug,neuro drug,PPN,pump,Amount,ALT
13,14,1,0,1,30,0,1,0,0,0,...,1,0,0,1,0,0,0,0,0,1
15,16,1,0,0,31,0,1,0,0,0,...,0,0,0,0,0,0,0,0,0,2
95,96,1,0,0,65,1,1,0,1,0,...,0,0,0,1,0,0,0,0,0,3
109,110,1,0,0,67,1,1,0,0,0,...,0,0,0,1,0,0,0,0,1,4
124,125,1,1,1,70,1,1,0,1,0,...,0,1,0,1,0,0,0,0,0,5
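
    The cyclic 1-5 numbering above relies on every stratum having exactly five rows. As a hedged alternative, the sketch below numbers rows by their position inside each stratum, so it would still work if a matched set had fewer members. The helper name and the example input are hypothetical.

```python
from collections import defaultdict

def number_within_strata(strata):
    """Return 1-based positions of rows within their stratum, in row order."""
    seen = defaultdict(int)
    positions = []
    for s in strata:
        seen[s] += 1
        positions.append(seen[s])
    return positions

# Hypothetical strata column with one set of 5 rows and one set of 4 rows
print(number_within_strata([1, 1, 1, 1, 1, 2, 2, 2, 2]))
```

    On a DataFrame the same numbering can be obtained with `data2.groupby("strata").cumcount() + 1`.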


In [99]:
# Store each variable's p-value in a dictionary
univariate_dic = {}
for i in range(3,25):
    # Set up the containers pylogit needs
    spec = OrderedDict()
    variable_names = OrderedDict()
    spec[data2.columns[i]] = [ [1, 2, 3, 4,5] ]
    variable_names[data2.columns[i]] = [data2.columns[i]]

    model_gender = pl.create_choice_model(data = data2,
                        alt_id_col="ALT",
                        obs_id_col="strata",
                        choice_col="Case",
                        specification=spec,
                        model_type = "MNL",
                        names = variable_names
    )
    model_gender.fit_mle(np.zeros(1))
    univariate_dic[data2.columns[i]] = np.round(model_gender.summary,3)["p_values"]

Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201




Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -73.8159
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -73.5657
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -72.6849
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -73.7113
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -74.1446
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -62.6777
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -69.5150
Log-likelihood at zero:

In [101]:
univariate_dic

{'Gender': Gender    0.237
 Name: p_values, dtype: float64, 'Age': Age    0.178
 Name: p_values, dtype: float64, 'Age ≥65': Age ≥65    0.063
 Name: p_values, dtype: float64, 'Service': Service    0.209
 Name: p_values, dtype: float64, 'Operation': Operation    0.383
 Name: p_values, dtype: float64, 'ContIV 24H ': ContIV 24H     0.0
 Name: p_values, dtype: float64, 'Site': Site    0.002
 Name: p_values, dtype: float64, 'Inserter': Inserter    0.749
 Name: p_values, dtype: float64, 'Neuro': Neuro    0.007
 Name: p_values, dtype: float64, 'DM': DM    0.531
 Name: p_values, dtype: float64, 'renal dis(2變項)': renal dis(2變項)    0.986
 Name: p_values, dtype: float64, 'liver dis': liver dis    0.769
 Name: p_values, dtype: float64, 'Pre': Pre    0.855
 Name: p_values, dtype: float64, 'hypertonic ': hypertonic     0.783
 Name: p_values, dtype: float64, 'Blood ': Blood     0.218
 Name: p_values, dtype: float64, 'Lipid': Lipid    0.006
 Name: p_values, dtype: float64, 'Antibiotic': Antibiotic    0

### Univariate analysis results

    Six variables are significant (p-value < 0.05): ContIV 24H, Site, Neuro, Lipid, PPN, pump
    
    These six variables therefore pass the first screening step.

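### Variable Selection - Data driven (2): Kitchen sink model

    "Kitchen sink model" is an epidemiological term for a model that includes every variable at once. A variable that remains significant in it is considered important and should enter the analysis model. Each coefficient in the kitchen sink model answers the question: after controlling for all other variables, is this variable still significant? The kitchen sink model is therefore a stricter screen than univariate analysis.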
    

In [135]:
# Set up the containers pylogit needs
spec = OrderedDict()
variable_names = OrderedDict()
Vars = []

# The effect of Age on infection is probably non-linear, so use Age ≥65 instead of continuous Age
for i in range(3,25):
    if i != 4:
        Vars.append(data2.columns[i])

for i in Vars:
    spec[i] = [[1,2,3,4]]
    variable_names[i] = [i]
    
model_gender = pl.create_choice_model(data = data2,
                        alt_id_col="ALT",
                        obs_id_col="strata",
                        choice_col="Case",
                        specification=spec,
                        model_type = "MNL",
                        names = variable_names)
model_gender.fit_mle(np.zeros(len(variable_names)))
np.round(model_gender.summary,4)

Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201




Estimation Time for Point Estimation: 0.02 seconds.
Final log-likelihood: -46.9814


Unnamed: 0,parameters,std_err,t_stats,p_values,robust_std_err,robust_t_stats,robust_p_values
Gender,-0.6412,0.499,-1.2851,0.1987,0.5323,-1.2046,0.2284
Age ≥65,-0.4636,0.6247,-0.7421,0.458,0.6685,-0.6935,0.488
Service,-0.2316,0.745,-0.3109,0.7559,0.7104,-0.326,0.7444
Operation,-0.4353,1.0659,-0.4084,0.683,1.0232,-0.4254,0.6705
ContIV 24H,1.6198,0.6601,2.4537,0.0141,0.6332,2.5582,0.0105
Site,1.1894,1.0079,1.1801,0.238,0.8159,1.4578,0.1449
Inserter,-0.8021,0.6563,-1.2221,0.2217,0.654,-1.2264,0.2201
Neuro,2.337,0.7356,3.1768,0.0015,0.8246,2.8339,0.0046
DM,0.2407,0.5758,0.418,0.676,0.592,0.4066,0.6843
renal dis(2變項),0.5221,0.8482,0.6156,0.5382,0.7363,0.7091,0.4783


In [138]:
# Find the significant variables
p_values = np.round(model_gender.summary, 6)["p_values"]
for name, p in p_values.items():
    if p < 0.05:
        print(name, " is significant", sep="")

ContIV 24H  is significant
Neuro is significant
pump is significant


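### Kitchen sink model results

    For comparison, univariate analysis found six significant variables (p-value < 0.05): ContIV 24H, Site, Neuro, Lipid, PPN, pump
    
    The kitchen sink model, which is the stricter criterion, retains only three: ContIV 24H, Neuro, pump.
    
    This indicates that ContIV 24H, Neuro, and pump are important factors for soft-tissue infection. Reassuringly, pump, the variable of interest, has not been eliminated.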
    

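### Variable Selection - Epidemiology: 10% rule

    The two screens above are data driven: they follow the data and do not necessarily carry clinical meaning. From an epidemiological point of view, the variables to include are the confounders that need to be controlled. Recall the drug-and-age example: if age is added to the regression of mortality on the drug, the drug's coefficient is read as "the effect of the drug on mortality after controlling for age". The epidemiological convention is that if the estimate changes by more than 10% between the unadjusted and adjusted models, the added variable is an important confounder.
    
    For example:
    
        logit = B1*pump              (B1 is the effect of pump on infection)
        
        logit = B1*pump + B2*Gender  (B1 is the effect of pump on infection after controlling for gender, that is, at the same gender)
        
        In logistic regression, exponentiating a regression coefficient gives the odds ratio. With illustrative numbers:
        
        logit = 1*pump                  (OR of pump = exp(1) = 2.718: the odds of infection with a pump are 2.718 times those without)
        logit = 1.2*pump + 0.5*Gender   (OR of pump = exp(1.2) = 3.320: after controlling for gender, the odds of infection with a pump are 3.320 times those without)
        
        
        (3.320 - 2.718)/2.718 ≈ 22.1%
        
        22.1% > 10%, so Gender is an important confounder when studying the relationship between pump and infection
    
    The 10% rule is now applied to each variable to select the factors that should be controlled in the model.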
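
    The arithmetic of the worked example can be written as a small helper. The function name is an assumption made for illustration; the coefficients 1.0 and 1.2 are the illustrative values from the example, not estimates from the data.

```python
import math

def change_in_estimate(beta_crude, beta_adjusted):
    """Relative change in the odds ratio after adding a covariate."""
    or_crude = math.exp(beta_crude)
    or_adjusted = math.exp(beta_adjusted)
    return abs(or_adjusted - or_crude) / or_crude

# Crude coefficient 1.0 versus adjusted coefficient 1.2
change = change_in_estimate(1.0, 1.2)
print(f"{change:.1%}", change > 0.10)
```

    The result is a proportion, so it is compared against 0.10 rather than 10.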

In [185]:
# 10% rule
pump_OR = 0
univariate_dic = {}
spec = OrderedDict()
variable_names = OrderedDict()
spec["pump"] = [ [1, 2, 3, 4,5] ]
variable_names["pump"] = ["pump"]

model_gender = pl.create_choice_model(data = data2,
                            alt_id_col="ALT",
                            obs_id_col="strata",
                            choice_col="Case",
                            specification=spec,
                            model_type = "MNL",
                            names = variable_names)
model_gender.fit_mle(np.zeros(1))
pump_OR = math.exp(np.round(model_gender.summary,3)["parameters"][0])

# Store the 10% rule results in a dict
ep_10 = {}

for i in range(3,25):   
    spec = OrderedDict()
    variable_names = OrderedDict()
    
    if (data2.columns[i] != "pump") & (data2.columns[i] != "Age"):
        Vars = ["pump",data2.columns[i]]
        
        for k in Vars:
            spec[k] = [[1,2,3,4]]
            variable_names[k] = [k]

        model_gender = pl.create_choice_model(data = data2,
                            alt_id_col="ALT",
                            obs_id_col="strata",
                            choice_col="Case",
                            specification=spec,
                            model_type = "MNL",
                            names = variable_names
        )
        model_gender.fit_mle(np.zeros(2))

        pump_adjOR = math.exp(np.round(model_gender.summary,3)["parameters"][0])
        
        ep_10[data2.columns[i]] = abs(pump_OR-pump_adjOR)/pump_OR



Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201




Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -67.5390
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.01 seconds.
Final log-likelihood: -63.2167
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.01 seconds.
Final log-likelihood: -63.4714
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.01 seconds.
Final log-likelihood: -63.0448
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.01 seconds.
Final log-likelihood: -62.9631
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -59.5033
Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201
Estimation Time for Point Estimation: 0.00 seconds.
Final log-likelihood: -61.6399
Log-likelihood at zero:

In [186]:
# Results
ep_10

{'Gender': 1.183654828863502,
 'Age ≥65': 1.0896571302360567,
 'Service': 1.3419876689913814,
 'Operation': 1.1404155662118944,
 'ContIV 24H ': 0.0343945837424334,
 'Site': 1.052379804307529,
 'Inserter': 1.3537269310346607,
 'Neuro': 2.380420128015566,
 'DM': 1.1468464544196766,
 'renal dis(2變項)': 1.119118075482217,
 'liver dis': 1.1447006810307654,
 'Pre': 1.155451037931604,
 'hypertonic ': 1.1447006810307654,
 'Blood ': 1.1684226199906478,
 'Lipid': 0.7228846360108876,
 'Antibiotic': 1.1404155662118944,
 'Cvdrug': 0.9464359844275911,
 'neuro drug': 1.803868301533657,
 'PPN': 0.6972342254930016,
 'Amount': 1.1404155662118944}

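### 10% rule results

    The values above are proportions, so 1.18 means a 118% change. Every variable except ContIV 24H (3.4%) exceeds 10%, so the 10% rule may be too lenient for this dataset. Note that the crude model applies the pump coefficient to alternatives 1-5 while the adjusted models apply it to alternatives 1-4 only, so part of the change may come from the differing specification rather than from confounding.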

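### Variable selection results, with an LR test for the final model choice:

    Univariate analysis: ContIV 24H, Site, Neuro, Lipid, PPN, pump
    
    Kitchen sink: ContIV 24H, Neuro, pump
    
    10% rule: all variables except ContIV 24H
    
    Should we use the univariate analysis result or the kitchen sink result? A likelihood ratio test (LR test) checks whether the two models, both with Case as the outcome, differ significantly. If they do not, the smaller model is sufficient:
    
            H0: Reduced model (ContIV 24H, Neuro, pump)
            H1: Full model (ContIV 24H, Site, Neuro, Lipid, PPN, pump)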

In [189]:
# Univariate analysis model

spec = OrderedDict()
variable_names = OrderedDict()
Vars = ['ContIV 24H ',"Site","Neuro","PPN","pump","Lipid"]

for i in Vars:
    spec[i] = [[1,2,3,4]]
    variable_names[i] = [i]
    
model_gender = pl.create_choice_model(data = data2,
                        alt_id_col="ALT",
                        obs_id_col="strata",
                        choice_col="Case",
                        specification=spec,
                        model_type = "MNL",
                        names = variable_names)
model_gender.fit_mle(np.zeros(len(variable_names)))
np.round(model_gender.summary,4)


Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201




Estimation Time for Point Estimation: 0.01 seconds.
Final log-likelihood: -53.4744


Unnamed: 0,parameters,std_err,t_stats,p_values,robust_std_err,robust_t_stats,robust_p_values
ContIV 24H,0.9505,0.5127,1.854,0.0637,0.5278,1.8008,0.0717
Site,0.9367,0.8246,1.1359,0.256,0.6894,1.3588,0.1742
Neuro,1.526,0.5553,2.7482,0.006,0.6242,2.4448,0.0145
PPN,0.492,1.6253,0.3027,0.7621,0.8082,0.6088,0.5427
pump,2.1862,0.8182,2.6719,0.0075,0.7399,2.9549,0.0031
Lipid,0.5736,1.2572,0.4562,0.6482,1.1063,0.5185,0.6041


In [191]:
# Univariate analysis model: -2 log-likelihood

LL_uni = -53.4744*-2

In [194]:
# Kitchen sink model

spec = OrderedDict()
variable_names = OrderedDict()
Vars = ['ContIV 24H ',"Neuro","pump"]

for i in Vars:
    spec[i] = [[1,2,3,4]]
    variable_names[i] = [i]
    
model_gender = pl.create_choice_model(data = data2,
                        alt_id_col="ALT",
                        obs_id_col="strata",
                        choice_col="Case",
                        specification=spec,
                        model_type = "MNL",
                        names = variable_names)
model_gender.fit_mle(np.zeros(len(variable_names)))
np.round(model_gender.summary,4)

Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201




Estimation Time for Point Estimation: 0.01 seconds.
Final log-likelihood: -54.2986


Unnamed: 0,parameters,std_err,t_stats,p_values,robust_std_err,robust_t_stats,robust_p_values
ContIV 24H,0.9958,0.5013,1.9862,0.047,0.5151,1.933,0.0532
Neuro,1.6825,0.5435,3.0959,0.002,0.6319,2.6627,0.0078
pump,2.3433,0.7676,3.0528,0.0023,0.6756,3.4683,0.0005


In [201]:
# Kitchen sink model: -2 log-likelihood

K_uni = -54.2986*-2

In [203]:
# Perform LR test
from scipy.stats.distributions import chi2
LRT = K_uni - LL_uni
df = 6-3
chi2.sf(LRT, df)


0.648465908493641

###  LR test results:

    p-value = 0.6485 > 0.05: the two models do not differ significantly, so the reduced model is chosen.
    
    The final model is therefore built from the variables selected by the kitchen sink model.

### Building the Final Model

In [205]:
spec = OrderedDict()
variable_names = OrderedDict()
Vars = ['ContIV 24H ',"Neuro","pump"]

for i in Vars:
    spec[i] = [[1,2,3,4]]
    variable_names[i] = [i]
    
model_gender = pl.create_choice_model(data = data2,
                        alt_id_col="ALT",
                        obs_id_col="strata",
                        choice_col="Case",
                        specification=spec,
                        model_type = "MNL",
                        names = variable_names)
model_gender.fit_mle(np.zeros(len(variable_names)))
np.round(model_gender.summary,4)

Log-likelihood at zero: -74.5201
Initial Log-likelihood: -74.5201




Estimation Time for Point Estimation: 0.01 seconds.
Final log-likelihood: -54.2986


Unnamed: 0,parameters,std_err,t_stats,p_values,robust_std_err,robust_t_stats,robust_p_values
ContIV 24H,0.9958,0.5013,1.9862,0.047,0.5151,1.933,0.0532
Neuro,1.6825,0.5435,3.0959,0.002,0.6319,2.6627,0.0078
pump,2.3433,0.7676,3.0528,0.0023,0.6756,3.4683,0.0005


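### Model interpretation

    logit = 2.3433*pump + 1.6825*Neuro + 0.9958*ContIV 24H
    
    Odds ratio of pump = exp(2.3433) = 10.415
    
    After matching on ward and onset date and controlling for Neuro and ContIV 24H, the odds of infection for patients with a pump are 10.415 times those for patients without one, and the result is statistically significant (p-value < 0.05). This indicates that pump is a risk factor for soft-tissue infection.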
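
    A point estimate is easier to judge with an interval around it. The sketch below computes a Wald 95% confidence interval for the odds ratio from the pump coefficient and standard error in the summary table above. The helper name is hypothetical, and the Wald interval is an addition for illustration, not part of the original analysis.

```python
import math

def odds_ratio_ci(beta, std_err, z=1.96):
    """Odds ratio with a Wald confidence interval (z = 1.96 for about 95%)."""
    return (math.exp(beta),
            math.exp(beta - z * std_err),
            math.exp(beta + z * std_err))

# Coefficient and standard error of pump from the final model summary
or_pump, lower, upper = odds_ratio_ci(2.3433, 0.7676)
print(round(or_pump, 2), round(lower, 2), round(upper, 2))
```

    The interval runs from roughly 2.3 to 47. It excludes 1, which agrees with the significant p-value, but its width shows that the size of the effect is estimated imprecisely.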
    
  
    
    
    

### SVM Prediction

    The selected variables pump, Neuro, and ContIV 24H are used as features for SVM prediction

In [208]:
from sklearn.svm import SVC

In [218]:
X = pd.DataFrame(data2['ContIV 24H '])
X["Neuro"] = data2["Neuro"]
X["pump"] = data2["pump"]
Y = data2["Case"]

In [219]:
# Split the data into 75% training and 25% testing
x_train,x_test,y_train,y_test = train_test_split(X,Y,test_size=0.25,random_state=87)

In [221]:
clf = SVC()

In [223]:
clf.fit(x_train,y_train)

SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [225]:
y_predict = clf.predict(x_test)

In [227]:
from sklearn.metrics import accuracy_score

In [229]:
accuracy_score(y_test, y_predict)

0.847457627118644

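### SVM prediction results

    An SVM using the selected variables pump, Neuro, and ContIV 24H reaches an accuracy of 0.8475. This suggests that, besides pump, Neuro and ContIV 24H should also be considered clinically when assessing whether a patient's IV site is likely to become infected. If the predicted risk of infection is high, extra attention is needed in care.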
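
    Accuracy is easier to interpret against a trivial baseline. With 1:4 matching, cases make up about one row in five, so a classifier that always predicts "control" would already score about 0.8. The sketch below computes that majority-class baseline. The helper name and the example labels are hypothetical; they are not the actual test split.

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common label."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

# Hypothetical test labels with one case per four controls
y_example = [0, 0, 0, 0, 1] * 10
print(majority_baseline_accuracy(y_example))
```

    Calling `majority_baseline_accuracy(list(y_test))` on the real split would give the figure to compare with 0.8475.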

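### Summary

    The data-driven screens and the epidemiological 10% rule identified three important variables, and conditional logistic regression shows that patients with a pump have about 10 times the odds of infection. The SVM also predicts with fairly high accuracy (84.75%), which suggests that pump may indeed be a risk factor that deserves clinical attention.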