In [1]:
from load import load_sav
import pandas as pd

df = load_sav("../data/tscs2014q2.sav")

In [2]:
variable_value_labels = df.attrs['variable_value_labels']
column_names = df.attrs['column_names']
column_names_to_labels = df.attrs['column_names_to_labels']

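def print_labels(key):
    """Print the value labels and the column label stored for variable `key`."""
    # .get() already returns None when the key is missing
    variable_labels = variable_value_labels.get(key)
    column_labels = column_names_to_labels.get(key)

    print(f"variable_labels: {variable_labels}")
    print(f"column_labels: {column_labels}")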

In [4]:
from statistic.categorical_data import crosstab_with_residuals

report = crosstab_with_residuals(row_series=df['EDU_3gp'], col_series=df['v1'])

In [7]:
report['adjusted_residuals']

| EDU_3gp \ v1 | 1.0 | 2.0 |
|---|---|---|
| 1.0 | -4.616877 | 4.616877 |
| 2.0 | 1.686073 | -1.686073 |
| 3.0 | 2.720417 | -2.720417 |
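
The source of `crosstab_with_residuals` is not shown in this notebook, so the following is only a minimal sketch of the standard adjusted standardized residual, (observed − expected) / sqrt(expected · (1 − row share) · (1 − column share)). The function name `adjusted_residuals` and the toy counts are hypothetical and are not the survey data.

```python
import numpy as np
import pandas as pd


def adjusted_residuals(observed: pd.DataFrame) -> pd.DataFrame:
    """Adjusted standardized residuals for a contingency table of counts."""
    obs = observed.to_numpy(dtype=float)
    n = obs.sum()
    row_tot = obs.sum(axis=1, keepdims=True)   # shape (r, 1)
    col_tot = obs.sum(axis=0, keepdims=True)   # shape (1, c)
    expected = row_tot @ col_tot / n           # counts expected under independence
    # Variance correction uses the row and column shares of the total
    denom = np.sqrt(expected * (1 - row_tot / n) * (1 - col_tot / n))
    return pd.DataFrame((obs - expected) / denom,
                        index=observed.index, columns=observed.columns)


# Hypothetical 2x2 counts, used only to illustrate the formula
toy = pd.DataFrame([[30, 10], [20, 40]], index=["a", "b"], columns=["x", "y"])
resid = adjusted_residuals(toy)
print(resid)
```

In a table with only two columns, the two residuals in each row are equal in size and opposite in sign, which is the pattern visible in the output above.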


Chi-square test results

In [8]:
report['chi2_table']

| | Value |
|---|---|
| chi2 | 21.742558 |
| p_value | 1.9e-05 |
| dof | 2.0 |


Compute Cramér's V

In [9]:
from statistic.categorical_data.cramers_v import cramers_v

cramers_v(row_series=df['EDU_3gp'], col_series=df['v1'])

{'cramers_v': 0.106139408936679}
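
The project's `cramers_v` helper is not shown here. As a hedged sketch, the usual uncorrected definition is V = sqrt(χ² / (n · (min(r, c) − 1))), which can be computed from a table of counts with `scipy.stats.chi2_contingency`. The function name `cramers_v_from_table` and the toy counts are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency


def cramers_v_from_table(table) -> float:
    """Uncorrected Cramér's V from a contingency table of counts."""
    table = np.asarray(table, dtype=float)
    # correction=False disables the Yates continuity correction on 2x2 tables
    chi2, p_value, dof, expected = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1       # min(rows, columns) - 1
    return float(np.sqrt(chi2 / (n * k)))


# Hypothetical 2x2 counts, used only to illustrate the formula
toy = [[30, 10], [20, 40]]
v = cramers_v_from_table(toy)
print(round(v, 6))
```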

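Next, we run the Lambda test. Because whether a respondent reduces car or motorcycle use cannot affect their gender, we use the asymmetric Lambda result that treats gender as the independent variable.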

In [13]:
from statistic.categorical_data.pre import goodman_kruskal_lambda, goodman_kruskal_tau

goodman_kruskal_lambda(df, 'EDU_3gp', 'v1')

| | Measure | Type | Value | ASE | Approx_T | Approx_Sig |
|---|---|---|---|---|---|---|
| 0 | Lambda | Symmetric | 0.036883 | 0.010723 | 3.439782 | 0.000582 |
| 1 | Lambda | EDU_3gp Dependent | 0.0 | 0.0 | 0.0 | 1.0 |
| 2 | Lambda | v1 Dependent | 0.084746 | 0.024111 | 3.514883 | 0.00044 |
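
To make the asymmetric Lambda concrete, here is a minimal sketch of the textbook definition: the proportional reduction in prediction errors for the dependent variable once the predictor's category is known. It is not the project's `goodman_kruskal_lambda` (which also reports ASE and significance); the function name `gk_lambda` and the toy counts are hypothetical.

```python
import numpy as np


def gk_lambda(table, dependent="columns") -> float:
    """Asymmetric Goodman-Kruskal lambda from a table of counts."""
    t = np.asarray(table, dtype=float)
    if dependent == "rows":
        t = t.T  # put the dependent variable on the columns
    n = t.sum()
    # Errors when always guessing the overall modal category of the dependent variable
    e1 = n - t.sum(axis=0).max()
    # Errors when guessing the modal category within each predictor level
    e2 = n - t.max(axis=1).sum()
    return 0.0 if e1 == 0 else float((e1 - e2) / e1)


# Hypothetical 2x2 counts, used only to illustrate the formula
toy = [[30, 10], [20, 40]]
lam_cols = gk_lambda(toy, dependent="columns")
lam_rows = gk_lambda(toy, dependent="rows")
print(lam_cols, lam_rows)
```

Under this definition Lambda is exactly 0 whenever every level of the predictor has the same modal category of the dependent variable, which is one way a row such as "EDU_3gp Dependent 0.0" can arise even when the chi-square test is significant.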


In [14]:
goodman_kruskal_tau(df, 'EDU_3gp', 'v1')

| | Measure | Type | Value | ASE | Approx_T | Approx_Sig |
|---|---|---|---|---|---|---|
| 0 | Goodman-Kruskal Tau | EDU_3gp Dependent | 0.00526 | 0.002247 | 2.341106 | 0.019227 |
| 1 | Goodman-Kruskal Tau | v1 Dependent | 0.011266 | 0.004798 | 2.348112 | 0.018869 |
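
Goodman-Kruskal Tau measures the same kind of error reduction as Lambda, but for predictions made in proportion to the observed shares rather than always choosing the mode. The sketch below follows the textbook formula only; it is not the project's `goodman_kruskal_tau`, the name `gk_tau` and the toy counts are hypothetical, and it assumes no predictor level has a zero total.

```python
import numpy as np


def gk_tau(table, dependent="columns") -> float:
    """Asymmetric Goodman-Kruskal tau from a table of counts."""
    t = np.asarray(table, dtype=float)
    if dependent == "rows":
        t = t.T  # put the dependent variable on the columns
    p = t / t.sum()                        # joint proportions
    row = p.sum(axis=1, keepdims=True)     # predictor marginals (assumed non-zero)
    col = p.sum(axis=0)                    # dependent-variable marginals
    within = (p ** 2 / row).sum()          # sum_ij p_ij^2 / p_i.
    marginal = (col ** 2).sum()            # sum_j p_.j^2
    return float((within - marginal) / (1 - marginal))


# Hypothetical 2x2 counts, used only to illustrate the formula
toy = [[30, 10], [20, 40]]
tau_cols = gk_tau(toy, dependent="columns")
print(round(tau_cols, 6))
```

When the dependent variable has only two categories, this quantity equals χ²/n. That is consistent with the outputs above: the "v1 Dependent" value 0.011266 matches the square of the reported Cramér's V (0.106139² ≈ 0.011266).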


### Answer
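Because gender is essentially not influenced by any other factor, this analysis treats gender as the independent variable and "whether the respondent reduces car or motorcycle use" as the dependent variable. In the Lambda test we therefore take the value computed with gender as the independent variable. Its approximate p-value is below 0.05, so we reject the null hypothesis: whether respondents reduce car or motorcycle use differs significantly by gender.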

The p-value of the chi-square test is also below 0.05, which is consistent with the conclusion above.

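We then examine the crosstab and the table of adjusted standardized residuals. Apart from the "usually" category, which shows no significant difference between the two genders, every category differs significantly. Men more often "never" or "seldom" change their habits, whereas women are significantly more likely to "always" change their habits to reduce car or motorcycle use.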

In [15]:
from statistic.categorical_data.rank_correlation import rank_correlation_measures

rank_correlation_measures(df, 'AGE_3gp', 'R_v212')

| | Measure | Value | ASE | Approx_T | Approx_Sig |
|---|---|---|---|---|---|
| 0 | Goodman & Kruskal's Gamma | -0.203746 | 0.057211 | -3.56131 | 0.0003690093 |
| 1 | Kendall's Tau-c | -0.13518 | 0.019575 | -6.905719 | 4.994893e-12 |
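
Both measures above are built from the numbers of concordant (C) and discordant (D) pairs in an ordered crosstab. As a hedged sketch of the textbook definitions only, Gamma = (C − D) / (C + D) and Kendall's Tau-c = 2m(C − D) / (n²(m − 1)) with m = min(rows, columns). This is not the project's `rank_correlation_measures`; the helper names and the toy counts are hypothetical, and the table is assumed to have both its rows and its columns in ascending category order.

```python
import numpy as np


def concordant_discordant(table):
    """Count concordant and discordant pairs in an ordered contingency table."""
    t = np.asarray(table, dtype=float)
    rows, cols = t.shape
    c_pairs = d_pairs = 0.0
    for i in range(rows):
        for j in range(cols):
            # Cells below and to the right are concordant with cell (i, j)
            c_pairs += t[i, j] * t[i + 1:, j + 1:].sum()
            # Cells below and to the left are discordant with cell (i, j)
            d_pairs += t[i, j] * t[i + 1:, :j].sum()
    return c_pairs, d_pairs


def gamma_and_tau_c(table):
    """Return (gamma, tau_c) for an ordered contingency table of counts."""
    t = np.asarray(table, dtype=float)
    c_pairs, d_pairs = concordant_discordant(t)
    n = t.sum()
    m = min(t.shape)
    gamma = (c_pairs - d_pairs) / (c_pairs + d_pairs)
    tau_c = 2 * m * (c_pairs - d_pairs) / (n ** 2 * (m - 1))
    return float(gamma), float(tau_c)


# Hypothetical 2x2 counts, used only to illustrate the formulas
toy = [[30, 10], [20, 40]]
gamma, tau_c = gamma_and_tau_c(toy)
print(round(gamma, 6), round(tau_c, 6))
```

A negative value, as in the output above, means discordant pairs outnumber concordant ones: higher categories of one variable tend to go with lower categories of the other.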


Next, run the chi-square test and output the table of adjusted standardized residuals.

In [15]:
from statistic.categorical_data import crosstab_with_residuals

report = crosstab_with_residuals(row_series=df_2['f1a'], col_series=df_2['f2c'])

In [16]:
report['adjusted_residuals']

| f1a \ f2c | 1.0 | 2.0 | 3.0 | 4.0 |
|---|---|---|---|---|
| 1.0 | 2.539924 | -2.10823 | 0.082925 | 0.775671 |
| 2.0 | 1.543489 | 2.842432 | -3.994861 | -1.611912 |
| 3.0 | -3.833528 | 3.036668 | 1.082096 | -3.817451 |
| 4.0 | 2.379295 | -4.688641 | 1.545248 | 5.149645 |
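
A table like this is usually read by flagging cells whose adjusted residual exceeds a cutoff in absolute value. The sketch below re-enters the values printed above and applies |z| > 1.96, the conventional two-sided 5% threshold. That cutoff is an assumption rather than something stated in the notebook, and it makes no adjustment for testing many cells at once.

```python
import pandas as pd

# Adjusted residuals copied from the output above (rows: f1a, columns: f2c)
resid = pd.DataFrame(
    [[2.539924, -2.10823, 0.082925, 0.775671],
     [1.543489, 2.842432, -3.994861, -1.611912],
     [-3.833528, 3.036668, 1.082096, -3.817451],
     [2.379295, -4.688641, 1.545248, 5.149645]],
    index=pd.Index([1.0, 2.0, 3.0, 4.0], name="f1a"),
    columns=pd.Index([1.0, 2.0, 3.0, 4.0], name="f2c"),
)

CUTOFF = 1.96  # assumed two-sided 5% threshold for a single cell

# Boolean mask of cells that deviate notably from independence
significant = resid.abs() > CUTOFF

# List flagged cells with the direction of the deviation
flagged = [
    (r, c, "more than expected" if resid.loc[r, c] > 0 else "fewer than expected")
    for r in resid.index
    for c in resid.columns
    if significant.loc[r, c]
]
for cell in flagged:
    print(cell)
```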
