# anti_party_FAC
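* Factor analysis -> average the items within each dimension
* Factor analysis -> compute a composite score for each dimension
* Factor analysis -> compute a single composite score over all items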

## Import package

In [None]:
!pip install factor_analyzer pingouin stargazer

In [None]:
! pip install plotnine

In [183]:
# for data ETL
import pandas as pd
import ETL
# Module for spatial data manipulation
import geopandas as gpd
# Module for data viz 
from plotnine import *
import plotnine
import plotly.express as px
import seaborn as sns               
import matplotlib.pyplot as plt
# Module for spatial data viz
import folium
import branca
import branca.colormap as cm
from folium.features import GeoJson, GeoJsonTooltip
# for building the index
import Factor_Analysis 
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo
import pingouin as pg # index reliability testing
# for stats ml
from sklearn.preprocessing import StandardScaler
from scipy.stats import pearsonr
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.miscmodels.ordinal_model import OrderedModel
from stargazer.stargazer import Stargazer

## Load data and filter

In [157]:
raw_data = pd.read_csv('raw_data.csv')

In [159]:
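# Keep respondents with no party leaning ('沒有特定支持' = no particular party, '都不支持' = support none)
filter_data = raw_data.loc[raw_data['目前國內政黨當中，請問您是否偏向哪一個政黨？'].isin(['沒有特定支持', '都不支持'])]
# Drop the consent and e-mail columns; reassign instead of inplace to avoid SettingWithCopyWarning on the slice
filter_data = filter_data.drop(columns=['您有絶對的權力決定是否要參與本研究。若您願意參與，請務必勾選下列選項：', '請填寫您的電子信箱，以利後續抽獎聯繫使用'])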

## Data Cleaning

In [None]:
ml_df = ETL.data_cleaning(filter_data)

In [None]:
ml_df.shape

## Establish anti_party Index with Factor Analysis

### Filter out IV

In [None]:
IV_vars = ['anti_1', 'anti_2', 'anti_3', 'anti_4', 'anti_5']
IV_df = ml_df[IV_vars]

### Bartlett’s test and Kaiser-Meyer-Olkin 

In [None]:
# Adequacy test - Bartlett's test
chiSquareValue, pValue = calculate_bartlett_sphericity(IV_df)
print('Chi-square value : {}'.format(round(chiSquareValue, ndigits = 3)))
print('p-value          : {}'.format(round(pValue, ndigits = 3)))

Bartlett's test of sphericity produces a p-value below 0.05, so we reject the null hypothesis that the correlation matrix is an identity matrix. In other words, the items are correlated with one another, which is a precondition for factor analysis.
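
To make the statistic concrete, here is a minimal sketch of Bartlett's test of sphericity computed by hand. It uses the standard formula, chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix. The function name `bartlett_sphericity` and the synthetic `items` array are illustrative assumptions, not part of this notebook's data.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(data):
    """Bartlett's test of sphericity for an (n_samples, n_variables) array."""
    n, p = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # slogdet is numerically safer than log(det(corr))
    _, logdet = np.linalg.slogdet(corr)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) / 2
    p_value = chi2.sf(statistic, dof)
    return statistic, p_value

# Hypothetical data: three items driven by one shared latent factor
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
items = latent + 0.5 * rng.normal(size=(200, 3))

stat, p_val = bartlett_sphericity(items)
print('Chi-square value :', round(stat, 3))
print('Reject H0 (identity correlation matrix):', p_val < 0.05)
```

Because the synthetic items share a latent factor, the determinant of their correlation matrix is far below 1 and the test rejects the null.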

In [None]:
# Adequacy test - Kaiser-Meyer-Olkin test
# calculate_kmo returns the per-item KMO values first and the overall KMO second
KMO, KMO_model = calculate_kmo(IV_df)
print('KMO value : {}'.format(round(KMO_model, ndigits = 3)))
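
As a rough illustration of what the overall KMO value measures, the sketch below computes it from scratch: the sum of squared off-diagonal correlations divided by that sum plus the sum of squared off-diagonal partial correlations. The helper `kmo_overall` and the synthetic data are assumptions for illustration only.

```python
import numpy as np

def kmo_overall(data):
    """Overall Kaiser-Meyer-Olkin measure for an (n_samples, n_variables) array."""
    corr = np.corrcoef(data, rowvar=False)
    inv = np.linalg.inv(corr)
    # Partial correlations come from the scaled, negated inverse correlation matrix
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = (corr[off_diag] ** 2).sum()
    a2 = (partial[off_diag] ** 2).sum()
    return r2 / (r2 + a2)

# Hypothetical data: four items driven by one shared latent factor
rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 1))
items = latent + 0.5 * rng.normal(size=(300, 4))

kmo = kmo_overall(items)
print('KMO value :', round(kmo, 3))
```

Values closer to 1 mean the partial correlations are small relative to the raw correlations, which suggests the items share common factors.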

### Communality testing


In [None]:
# Create factor analysis object and perform factor analysis
# Extract as many factors as there are items (IV_df has only 5 columns)
fa = FactorAnalyzer(n_factors = IV_df.shape[1], rotation = None)
fa.fit(IV_df)
# The communalities
df_communalities_IV = pd.DataFrame(data = {'Column': IV_df.columns, 'Communality': fa.get_communalities()})
df_communalities_IV.style.apply(Factor_Analysis.highlightCommunalities, subset = ['Communality'])

Yellow highlighting marks communality values that meet the criterion (greater than 0.5). Variables with a communality below 0.5 are eliminated.
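
For reference, a communality is the sum of an item's squared loadings across all factors. The sketch below applies the 0.5 cut-off to a made-up loading matrix; the numbers are hypothetical and do not come from this survey.

```python
import numpy as np

# Hypothetical loading matrix: 4 items (rows) on 2 factors (columns)
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.70],
    [0.30, 0.25],
])

# Communality of each item = row-wise sum of squared loadings
communalities = (loadings ** 2).sum(axis=1)
# Items that pass the 0.5 criterion
keep = communalities > 0.5

print(communalities.round(4))
print(keep)
```

Here the last item would be dropped, because the two factors together explain only about 15% of its variance.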

In [None]:
# Data viz
plotnine.options.figure_size = (15, 9)
communality_bar = (
    ggplot(data = df_communalities_IV)+
    geom_bar(aes(x = 'Column',
                 y = 'Communality'),
             width = 0.75,
             stat = 'identity')+
    geom_hline(yintercept = 0.5)+
    scale_x_discrete(limits = df_communalities_IV['Column'].tolist())+
    labs(title = 'Communalities of factor analysis')+
    xlab('Columns')+
    ylab('Communalities')+
    theme_minimal()
)
# Display the viz
communality_bar

### Reduce Factors

In [None]:
# Check Eigenvalues
eigenValue, value = fa.get_eigenvalues()

In [None]:
# Convert the results into a dataframe
df_eigen = pd.DataFrame({'Factor': range(1, len(eigenValue) + 1), 'Eigen value': eigenValue})
df_eigen.style.apply(Factor_Analysis.highlightEigenvalue, subset = ['Eigen value'])

According to the Kaiser criterion (retain factors with an eigenvalue greater than 1), 2 factors are retained. This means the 5 observed variables will be grouped and interpreted as 2 factors.
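
The Kaiser criterion can be sketched directly with NumPy: take the eigenvalues of the correlation matrix and count those above 1. The two-factor synthetic data below is an assumption chosen so the answer is known in advance; it is not the survey data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
f1 = rng.normal(size=(n, 1))
f2 = rng.normal(size=(n, 1))

# Hypothetical data: items 1-3 load on f1, items 4-5 load on f2
items = np.hstack([
    f1 + 0.5 * rng.normal(size=(n, 3)),
    f2 + 0.5 * rng.normal(size=(n, 2)),
])

corr = np.corrcoef(items, rowvar=False)
# eigvalsh returns ascending eigenvalues for a symmetric matrix; reverse to descending
eigenvalues = np.linalg.eigvalsh(corr)[::-1]
n_factors = int((eigenvalues > 1).sum())

print(eigenvalues.round(3))
print('Factors retained:', n_factors)
```

The eigenvalues of a correlation matrix sum to the number of items, so an eigenvalue above 1 means the factor explains more variance than a single item would on its own.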

In [None]:
# Data viz
plotnine.options.figure_size = (8, 4.8)
scree_eigenvalue = (
    ggplot(data = df_eigen)+
    geom_hline(yintercept = 1)+
    geom_line(aes(x = 'Factor',
                  y = 'Eigen value'))+
    geom_point(aes(x = 'Factor',
                   y = 'Eigen value'),
               size = 2)+
    labs(title = 'Scree plot of eigenvalues from factor analysis')+
    xlab('Factors')+
    ylab('Eigenvalue')+
    theme_minimal()
)
# Display the viz
scree_eigenvalue

In [None]:
# Factor analysis with rotation
fa = FactorAnalyzer(n_factors = 2, rotation = 'varimax')
fa.fit(IV_df)
# Create a factor's names
facs = ['Factors' + ' ' + str(i + 1) for i in range(2)]
print(facs)
# Loading factors
pd.DataFrame(data = fa.loadings_, index = IV_df.columns, columns = facs).style.apply(Factor_Analysis.highlightLoadings)

#### The results above show that the variables can be grouped into 2 dimensions:

* Political polarization (political_polarization): anti_1

* Party image (party_image): anti_3 / anti_4 / anti_5

In [None]:
# Explained variance
idx = ['SS Loadings', 'Proportion Variance', 'Cumulative Variance']
df_variance = pd.DataFrame(data = fa.get_factor_variance(), index = idx, columns = facs)
# Ratio of variance
ratioVariance = fa.get_factor_variance()[1] / fa.get_factor_variance()[1].sum()
df_ratio_var = pd.DataFrame(data = ratioVariance.reshape((1, 2)), index = ['Ratio Variance'], columns = facs)
# New completed dataframe
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result
pd.concat([df_variance, df_ratio_var])

In [None]:
# Factor scores for each respondent (the factors were fitted on IV_df)
df_factors_DV = pd.DataFrame(data = fa.fit_transform(IV_df), columns = facs)
df_factors_DV
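
One common way to turn factor scores into a single composite score is to weight each factor by its share of the explained variance (the "Ratio Variance" row computed earlier). The sketch below assumes that weighting scheme; the scores and variance proportions are made up for illustration, and the notebook's own composite may be built differently.

```python
import numpy as np
import pandas as pd

# Hypothetical factor scores for 3 respondents on 2 factors
scores = pd.DataFrame({
    'Factors 1': [0.5, -1.0, 1.2],
    'Factors 2': [-0.3, 0.8, 0.1],
})

# Hypothetical proportion of variance explained by each factor
proportion_var = np.array([0.40, 0.20])
# Normalize so the weights sum to 1 (same idea as ratioVariance)
weights = proportion_var / proportion_var.sum()

# Composite score = variance-weighted sum of the factor scores
scores['composite'] = scores[['Factors 1', 'Factors 2']].to_numpy() @ weights
print(scores)
```

With these weights the first factor counts twice as much as the second, reflecting that it explains twice as much variance.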

### Establish Index with mean method

In [None]:
political_polarization_vars = ['anti_1']
party_image_vars = ['anti_3', 'anti_4', 'anti_5']

In [None]:
# Calculate the scores for each factor
ml_df['political_polarization_mean'] = ml_df[political_polarization_vars].mean(axis=1)
ml_df['party_image_mean'] = ml_df[party_image_vars].mean(axis=1)
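
A small toy example of the mean method, showing how `mean(axis=1)` behaves. The `toy` DataFrame and its values are hypothetical; only the column names mirror the ones used above.

```python
import pandas as pd

# Hypothetical responses from 3 respondents; the last one skipped anti_4
toy = pd.DataFrame({
    'anti_1': [2, 5, 1],
    'anti_3': [1, 4, 5],
    'anti_4': [2, 4, None],
    'anti_5': [3, 4, 3],
})

political_polarization_vars = ['anti_1']
party_image_vars = ['anti_3', 'anti_4', 'anti_5']

# Row-wise means; missing answers are skipped by default (skipna=True)
toy['political_polarization_mean'] = toy[political_polarization_vars].mean(axis=1)
toy['party_image_mean'] = toy[party_image_vars].mean(axis=1)
print(toy)
```

Two things are worth noting: a dimension with a single item (here `anti_1`) yields a mean identical to the item itself, and a respondent with a missing answer is averaged over the answers that remain rather than dropped.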