In [1]:
import warnings

import numpy as np
import pandas as pd
import scipy.stats
import statsmodels.api as sm

### Group Members

Ge Gao, Runnan Guo, Guangdi Zhong

# I. Introduction

The purpose of our project is to examine the relationship between gender and both changes in Hukou status and inter-provincial migration patterns in China, a relationship tied to the traditional Chinese preference for sons over daughters.

“Hukou” is China's household registration system, which aims to manage the population and safeguard citizens' basic rights in employment, education and social welfare. The division of hukou status into agricultural and non-agricultural categories creates a wealth gap and greatly increases the cost of hukou transfer. There is ample research evidence that the consequences of such hukou status differences are not gender-neutral: transfer costs are greater for women, especially for those in rural areas.

"Inter-provincial migration" is more informal, since it does not require obtaining a formal Hukou. Even without a hukou requirement, however, many factors still constrain women's migration in China, where patriarchy is deeply entrenched.

The link stems mainly from the traditional Chinese preference for sons over daughters. In the traditional feudal family, only men were eligible heirs, since women ceased to be members of their original family when they married. A family therefore had to have at least one boy, and if the first few children were all girls, the parents would keep having children until a boy was born. Although this backward concept of family inheritance has gradually faded in modern Chinese society, some families inevitably still prefer boys to girls, and in these families more of the good resources go to boys. A fairly typical example in rural areas is an older sister who drops out of school at an early age and works in a city to support her younger brothers. Even in families with only daughters, strong patriarchal attitudes can leave girls with less secure access to education and other resources, because parents feel that girls do not need to pursue higher education or attain high social status.

These traditional concepts can push women's hukou transition and inter-provincial migration in two opposite directions. On the one hand, although women move to other cities or provinces to live and work, this often does not change their hukou status, because the cost of and threshold for transferring from agricultural to non-agricultural hukou are very high. Their original families may also resist, not wanting their daughters to live too far away. On the other hand, working in cities also increases the chance of obtaining urban hukou. Moreover, for some women, escaping the shackles of their original families motivates them to work hard to stay in cities or places far from their hometowns. The following models examine which force prevails in practice.

### Literature Review

Liu, Z. (2003) found that people who switch from agricultural to urban hukou tend to have a lower standard of living than those who are originally urban citizens, often with limited access to education, medical care and public welfare resources.

Fan, C.C. (2000) examined the different patterns of inter-provincial migration for males and females. Geographically, males migrate more to the western and central parts of China, while females migrate more to the eastern regions, where the tertiary sector is more developed. In terms of motivation, males migrate mainly for business, job transfer and study, while females migrate mainly for marriage and business. In addition, compared with males, females migrate longer distances on average and show less net movement between adjacent provinces.

In another paper, Fan, C.C. (2003) specifically discussed rural-urban migration and the gender division of labor. She pointed out a latent factor behind the survival of traditional patriarchy in modern China: China's transitional phase, which differs both from its socialist period and from capitalist economies by “prioritizing economic goals and making peasants more vulnerable” and by the “continued prominence of the state”, leaves room for socio-economic traditions rooted in Confucianism to re-emerge. She also argued that the transition is not gender-neutral but instead potentially reinforces a gendered division of labor, which continues to undermine the circumstances of women in rural areas.

Using the Shenzhen Special Economic Zone (SEZ) as a case study, Liang, Z. and Chen, Y.P. (2004) further explained the reasons for female migration. For the period before the SEZ was established, their findings are consistent with Fan (2000); after its establishment, however, the situation changed dramatically, with 80% of the women who moved to Shenzhen doing so for business and work purposes. In addition, they pointed out that raising women's educational level helps improve their social status.

The above articles and other relevant literature have discussed gender, change of Hukou status and inter-provincial migration in considerable depth, and education emerges as one of the essential links between gender and Hukou or migration. Our study adds another layer, patriarchy within the family as reflected in women's sibling composition, and examines a more detailed chain of effects from gender to migration.

# II. Data 

The data come from the China Family Panel Studies (CFPS), a national random-sample survey conducted by the Institute of Social Science Survey (ISSS) of Peking University that tracks individuals, families and communities every two years (except for 2010 and 2011) over 12 years (2008-2020). For each survey year, the database consists of four subsets (family economy, family relationships, adults, and children); community information was surveyed only in 2010 and 2014. Our research uses the 2010 adult dataset, which comprehensively records the subjects' Hukou status, province of residence, gender, age, education, marital status and siblings.

In [2]:
warnings.filterwarnings("ignore")
data=pd.read_stata('ecfps2010adult_201906.dta',convert_categoricals=False)

In [3]:
# Change of hukou status (any change, e.g. agricultural to non-agricultural)
# Measure 1: current hukou status (qa2) differs from hukou status at age 3 (qa302)
data['d_hukou_status1'] = (data['qa2'] != data['qa302']).astype(float)
# Measure 2: current hukou status differs from hukou status at age 12 (qa402)
data['d_hukou_status2'] = (data['qa2'] != data['qa402']).astype(float)
# Inter-provincial migration, with the birth province (qa102acode) as the origin
# Measure 1: province of residence at the survey (provcd) differs from birth province
data['d_migration1'] = (data['provcd'] != data['qa102acode']).astype(float)
# Measure 2: province of current hukou (qa201acode) differs from birth province
data['d_migration2'] = (data['qa201acode'] != data['qa102acode']).astype(float)
# Dummy for having at least one younger brother:
# any sibling relation code (qb301_a_1 ... qb301_a_15) equal to 3
sib_cols = [f'qb301_a_{k}' for k in range(1, 16)]
data['d_youngbro'] = data[sib_cols].eq(3).any(axis=1).astype(float)
# Negative codes denote missing values: number of siblings (qb1),
# marital status (qe1) and education (qc1)
for col in ['qb1', 'qe1', 'qc1']:
    data[col] = data[col].mask(data[col] < 0)

In [4]:
# Female dummy (gender == 0 is female in this coding) and interaction terms
data['d_gender'] = (data['gender'] == 0).astype(float)
data['gender_bro'] = data['d_gender'] * data['d_youngbro']
data['gender_sib'] = data['d_gender'] * data['qb1']

### Summary Statistics

In [5]:
data[['d_hukou_status1', 'd_hukou_status2', 'd_migration1', 'd_migration2', 'd_gender', 'd_youngbro', 'gender_bro', 'gender_sib', 'qa1age']].describe()

| Statistic | d_hukou_status1 | d_hukou_status2 | d_migration1 | d_migration2 | d_gender | d_youngbro | gender_bro | gender_sib | qa1age |
|---|---|---|---|---|---|---|---|---|---|
| count | 33600 | 33600 | 33600 | 33600 | 33600 | 33600 | 33600 | 33185 | 33600 |
| mean | 0.157917 | 0.141786 | 0.082381 | 0.063036 | 0.515298 | 0.432292 | 0.244911 | 1.502637 | 45.514821 |
| std | 0.364668 | 0.348835 | 0.274948 | 0.243031 | 0.499773 | 0.495402 | 0.430041 | 2.00956 | 16.405657 |
| min | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 |
| 25% | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 33 |
| 50% | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 45 |
| 75% | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 3 | 57 |
| max | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 14 | 110 |


The change of Hukou status is a binary variable that takes one if the subject's current Hukou status differs from that in his or her childhood. To check the robustness of the results, it is measured in two ways: the first compares Hukou status at present with that at age 3, and the second compares present status with that at age 12. Under the two measures, 16% and 14% of the sample, respectively, have changed their Hukou status.
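One detail of this construction may be worth checking: the dummies are built with `!=`, and in pandas a missing value (`NaN`) compares as unequal to everything, so a respondent whose childhood status is missing, or who carries a negative non-response code, is counted as having changed status. The sketch below uses hypothetical toy codes to contrast that rule with one alternative that leaves the dummy missing unless both codes are valid. It assumes, as the notebook already does for `qb1`, `qe1` and `qc1`, that negative values are CFPS missing codes; the column names are reused only for illustration and the code values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical toy codes (not real CFPS values); -8 stands for a negative
# non-response code and NaN for a missing childhood status.
toy = pd.DataFrame({'qa2':   [1, 1, 3, 1, -8],
                    'qa302': [1, 3, 3, np.nan, 1]})

# Rule used in the notebook: any inequality counts as a change,
# so NaN and negative codes are also coded as 1.
naive = (toy['qa2'] != toy['qa302']).astype(float)

# Alternative sketch (assumption: negative codes mean "missing"):
# compare only where both codes are valid, leave the dummy missing otherwise.
valid = (toy['qa2'] > 0) & (toy['qa302'] > 0)
strict = (toy['qa2'] != toy['qa302']).astype(float).where(valid)

print(naive.tolist())   # [0.0, 1.0, 0.0, 1.0, 1.0]
print(strict.tolist())  # [0.0, 1.0, 0.0, nan, nan]
```

Whether the two rules give materially different shares in the full sample depends on how many respondents have missing childhood codes, which this sketch does not check.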

Similarly, the migration variable is a dummy that takes one if the subject has ever changed province of residence. Since the provinces of residence at ages 3 and 12 are not available in this dataset, we use the birth province as a proxy for the original province of residence. The current province is also measured in two ways: the first uses the province in which the subject was surveyed, while the second uses the province of the subject's current Hukou. About 8% and 6% of the sample, respectively, have migrated across provinces.

Gender is coded 1 if the subject is female and 0 if male; 51.5% of the subjects are female. The younger-brother variable is also a dummy that takes one if the subject has (had) one or more younger brothers; about 43.2% of the subjects do. The sample ranges from "only children" with no siblings to respondents with as many as 15 siblings. The mean of the interaction term gender × younger brother indicates that 24.5% of the sample are women with one or more younger brothers. The average age of the subjects is 45.5, with the youngest aged 16 and the oldest aged 110.
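The reading of the interaction mean rests on a simple identity: the mean of the product of two 0/1 dummies is the share of observations for which both equal one. A toy sketch with five hypothetical respondents shows this, and also how dividing by the female share converts it into a share among women. Applied to the summary table, 0.2449 / 0.5153 ≈ 0.475, i.e. roughly 47.5% of the women in the sample have (had) at least one younger brother.

```python
import pandas as pd

# Hypothetical toy sample of five respondents
toy = pd.DataFrame({'d_gender':   [1, 1, 0, 1, 0],   # 1 = female
                    'd_youngbro': [1, 0, 1, 1, 0]})  # 1 = has a younger brother
toy['gender_bro'] = toy['d_gender'] * toy['d_youngbro']

# Mean of the product = share of respondents who are female AND have a younger brother
share_both = toy['gender_bro'].mean()
share_check = ((toy['d_gender'] == 1) & (toy['d_youngbro'] == 1)).mean()
print(share_both, share_check)  # 0.4 0.4

# Conditional share among women only
share_among_women = share_both / toy['d_gender'].mean()
print(round(share_among_women, 4))  # 0.6667
```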

In [6]:
# Two-sample t-tests of each outcome by gender
# (scipy's default: Student's t-test assuming equal variances)
data1 = data[data['gender'] == 0]   # female
data2 = data[data['gender'] == 1]   # male
vlist = ['d_hukou_status1', 'd_hukou_status2', 'd_migration1', 'd_migration2']
rows = []
for v in vlist:
    rvs1, rvs2 = data1[v], data2[v]
    t_stat, p_val = scipy.stats.ttest_ind(rvs1, rvs2)
    rows.append({'Variables': v,
                 'G1(Female)': len(data1), 'Mean1': rvs1.mean(),
                 'G2(Male)': len(data2), 'Mean2': rvs2.mean(),
                 'MeanDiff': rvs1.mean() - rvs2.mean(),
                 't': t_stat, 'p': p_val})
ttable = pd.DataFrame(rows)
ttable

| Variable | N (female, G1) | Mean1 (female) | N (male, G2) | Mean2 (male) | MeanDiff (Mean1 - Mean2) | t | p |
|---|---|---|---|---|---|---|---|
| d_hukou_status1 | 17314 | 0.148608 | 16286 | 0.167813 | -0.019205 | -4.826049 | 1e-06 |
| d_hukou_status2 | 17314 | 0.133014 | 16286 | 0.151111 | -0.018098 | -4.754197 | 2e-06 |
| d_migration1 | 17314 | 0.093393 | 16286 | 0.070674 | 0.022718 | 7.575786 | 0.0 |
| d_migration2 | 17314 | 0.072311 | 16286 | 0.053175 | 0.019137 | 7.219 | 0.0 |


This table reports t-tests of Hukou status transfer and inter-provincial migration by gender. Women on average have lower rates of Hukou status transfer than men (by about 1.9 and 1.8 percentage points under the two measures), and the difference is highly significant. For inter-provincial migration (the last two rows), by contrast, women have significantly higher rates than men (by about 2.3 and 1.9 percentage points).

# III. Modeling

The models are binary response regressions of each outcome on gender and sibling variables, estimated as Linear Probability Models (LPM). In the baseline specification, which includes no covariates, the change of Hukou status and the migration variables are regressed on the number of siblings and its square, the gender dummy, the interaction of gender with the younger-brother dummy, and the interaction of gender with the number of siblings. The quadratic term tests whether the relationship between siblings and the outcomes is (inverted) U-shaped. The baseline model is $\Pr(y_i=1)=\beta_0+\beta_1\,\text{siblings}_i+\beta_2\,\text{siblings}_i^2+\beta_3\,\text{gender}_i+\beta_4\,\text{gender}_i\times\text{youngbro}_i+\beta_5\,\text{gender}_i\times\text{siblings}_i$, where $\text{siblings}_i$ is the number of siblings and $\Pr(y_i=1)$ is the probability that individual $i$ changed Hukou status or migrated across provinces.
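Because the baseline model combines a quadratic in siblings with gender interactions, the implied effects follow by differentiation. Treating the younger-brother dummy as fixed (a simplification, since it is correlated with the number of siblings), the marginal effect of one more sibling and the female-male gap implied by the baseline equation are:

$$
\frac{\partial \Pr(y_i=1)}{\partial\,\text{siblings}_i}=\beta_1+2\beta_2\,\text{siblings}_i+\beta_5\,\text{gender}_i,
\qquad
\Pr(y_i=1\mid\text{gender}_i=1)-\Pr(y_i=1\mid\text{gender}_i=0)=\beta_3+\beta_4\,\text{youngbro}_i+\beta_5\,\text{siblings}_i .
$$

Setting the first derivative to zero gives the turning point $\text{siblings}^{*}=-(\beta_1+\beta_5\,\text{gender}_i)/(2\beta_2)$: with $\beta_2<0$ it is the peak of an inverted U, and with $\beta_2>0$ the trough of a U. Under this specification the turning point can differ between men ($\text{gender}_i=0$) and women ($\text{gender}_i=1$) only through $\beta_5$.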

The extended model adds covariates for marital status, education and age. Their inclusion follows the Hukou registration rules, under which marriage and education are two key factors in deciding whether a permit to transfer Hukou status can be issued to an applicant; both also shape individuals' decisions about where to settle. Age is controlled for because the wealth and experience that affect one's Hukou status and migration behavior accumulate over time. The extended model is $\Pr(y_i=1)=\beta_0+\beta_1\,\text{siblings}_i+\beta_2\,\text{siblings}_i^2+\beta_3\,\text{gender}_i+\beta_4\,\text{gender}_i\times\text{youngbro}_i+\beta_5\,\text{gender}_i\times\text{siblings}_i+\beta_6\,\text{marital}_i+\beta_7\,\text{gender}_i\times\text{marital}_i+\beta_8\,\text{education}_i+\beta_9\,\text{gender}_i\times\text{education}_i+\alpha\,\text{age}_i$, where marital status and education enter as sets of category dummies. The interaction terms capture effects specific to women. In particular, including educational attainment helps detect patriarchal influences that operate not directly through siblings and brothers but through educational opportunities.

In both models, standard errors are clustered at the family level (family ID `fid`).

In [7]:
# Squared number of siblings (name follows Stata's c.qb1#c.qb1 notation)
data['c.qb1#c.qb1'] = data['qb1'] ** 2

# Category dummies for marital status (qe1) and education (qc1); dtype=float keeps
# the design matrix numeric, since pandas >= 2.0 returns bool dummies by default
new_data = pd.concat([data,
                      pd.get_dummies(data['qe1'], prefix='qe1', dtype=float),
                      pd.get_dummies(data['qc1'], prefix='qc1', dtype=float)], axis=1)

# Categories entered in the model; unlisted categories form the base group
qe1_dummies = [f'qe1_{k}.0' for k in range(2, 6)]
qc1_dummies = [f'qc1_{k}.0' for k in range(2, 9)]
for col in qe1_dummies + qc1_dummies:
    new_data[f'c.d_gender#i.{col}'] = new_data['d_gender'] * new_data[col]

# Regressors of the baseline and extended specifications
base_x = ['qb1', 'c.qb1#c.qb1', 'd_gender', 'gender_bro', 'gender_sib']
ext_x = (['qb1', 'c.qb1#c.qb1', 'd_gender']
         + qe1_dummies + [f'c.d_gender#i.{c}' for c in qe1_dummies]
         + qc1_dummies + [f'c.d_gender#i.{c}' for c in qc1_dummies]
         + ['gender_bro', 'gender_sib', 'qa1age'])

def fit_model(y, xvars, extra=(), probit=False):
    """LPM (OLS) or Probit of y on xvars with standard errors clustered by family (fid).
    Columns in `extra` are only used to drop rows with missing values."""
    df = new_data[['fid', y, *extra, *xvars]].dropna(how='any')
    estimator = sm.Probit if probit else sm.OLS
    return estimator(df[y], sm.add_constant(df[xvars])).fit(
        cov_type='cluster', cov_kwds={'groups': df['fid']}, use_t=True)

# Odd numbers: baseline; even numbers: extended (rows missing qe1/qc1 are dropped)
model1 = fit_model('d_hukou_status1', base_x)
model2 = fit_model('d_hukou_status1', ext_x, extra=['qe1', 'qc1'])
model3 = fit_model('d_hukou_status2', base_x)
model4 = fit_model('d_hukou_status2', ext_x, extra=['qe1', 'qc1'])
model5 = fit_model('d_migration1', base_x)
model6 = fit_model('d_migration1', ext_x, extra=['qe1', 'qc1'])
model7 = fit_model('d_migration2', base_x)
model8 = fit_model('d_migration2', ext_x, extra=['qe1', 'qc1'])

# Probit counterparts; warnings are silenced in In [2], so check
# modelNp.mle_retvals['converged'] to confirm convergence
model1p = fit_model('d_hukou_status1', base_x, probit=True)
model2p = fit_model('d_hukou_status1', ext_x, extra=['qe1', 'qc1'], probit=True)
model3p = fit_model('d_hukou_status2', base_x, probit=True)
model4p = fit_model('d_hukou_status2', ext_x, extra=['qe1', 'qc1'], probit=True)
model5p = fit_model('d_migration1', base_x, probit=True)
model6p = fit_model('d_migration1', ext_x, extra=['qe1', 'qc1'], probit=True)
model7p = fit_model('d_migration2', base_x, probit=True)
model8p = fit_model('d_migration2', ext_x, extra=['qe1', 'qc1'], probit=True)


Optimization terminated successfully.
         Current function value: 0.435309
         Iterations 5
         Current function value: 0.406303
         Iterations: 35
Optimization terminated successfully.
         Current function value: 0.407047
         Iterations 5
         Current function value: 0.377092
         Iterations: 35
Optimization terminated successfully.
         Current function value: 0.281624
         Iterations 6
         Current function value: 0.274540
         Iterations: 35
Optimization terminated successfully.
         Current function value: 0.232794
         Iterations 6
         Current function value: 0.221931
         Iterations: 35


In [8]:
#LPM1, no covariates, using Hukou status at age 3 as original Hukou status
print(model1.summary())

                            OLS Regression Results                            
Dep. Variable:        d_hukou_status1   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                  0.001
Method:                 Least Squares   F-statistic:                     9.994
Date:                Fri, 06 May 2022   Prob (F-statistic):           1.46e-09
Time:                        01:34:14   Log-Likelihood:                -13575.
No. Observations:               33185   AIC:                         2.716e+04
Df Residuals:                   33179   BIC:                         2.721e+04
Df Model:                           5                                         
Covariance Type:              cluster                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const           0.1548      0.006     26.725      

In [9]:
#LPM2, covariates, using Hukou status at age 3 as original Hukou status
print(model2.summary())

                            OLS Regression Results                            
Dep. Variable:        d_hukou_status1   R-squared:                       0.062
Model:                            OLS   Adj. R-squared:                  0.061
Method:                 Least Squares   F-statistic:                     4964.
Date:                Fri, 06 May 2022   Prob (F-statistic):               0.00
Time:                        01:34:14   Log-Likelihood:                -12514.
No. Observations:               33157   AIC:                         2.509e+04
Df Residuals:                   33128   BIC:                         2.533e+04
Df Model:                          28                                         
Covariance Type:              cluster                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                   -0.1039 

In [10]:
#LPM3, no covariates, using Hukou status at age 12 as original Hukou status
print(model3.summary())

                            OLS Regression Results                            
Dep. Variable:        d_hukou_status2   R-squared:                       0.001
Model:                            OLS   Adj. R-squared:                  0.001
Method:                 Least Squares   F-statistic:                     9.950
Date:                Fri, 06 May 2022   Prob (F-statistic):           1.62e-09
Time:                        01:34:14   Log-Likelihood:                -12087.
No. Observations:               33185   AIC:                         2.419e+04
Df Residuals:                   33179   BIC:                         2.424e+04
Df Model:                           5                                         
Covariance Type:              cluster                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const           0.1362      0.006     24.680      

In [11]:
#LPM4, covariates, using Hukou status at age 12 as original Hukou status
print(model4.summary())

                            OLS Regression Results                            
Dep. Variable:        d_hukou_status2   R-squared:                       0.063
Model:                            OLS   Adj. R-squared:                  0.063
Method:                 Least Squares   F-statistic:                     6505.
Date:                Fri, 06 May 2022   Prob (F-statistic):               0.00
Time:                        01:34:14   Log-Likelihood:                -11007.
No. Observations:               33157   AIC:                         2.207e+04
Df Residuals:                   33128   BIC:                         2.232e+04
Df Model:                          28                                         
Covariance Type:              cluster                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                   -0.1234 

In [12]:
#LPM5, no covariates, using province in which the survey was taken as the current province of settlement
print(model5.summary())

                            OLS Regression Results                            
Dep. Variable:           d_migration1   R-squared:                       0.002
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     20.17
Date:                Fri, 06 May 2022   Prob (F-statistic):           4.13e-20
Time:                        01:34:14   Log-Likelihood:                -4067.3
No. Observations:               33185   AIC:                             8147.
Df Residuals:                   33179   BIC:                             8197.
Df Model:                           5                                         
Covariance Type:              cluster                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const           0.0810      0.005     17.370      

In [13]:
#LPM6, covariates, using province in which the survey was taken as the current province of settlement
print(model6.summary())

                            OLS Regression Results                            
Dep. Variable:           d_migration1   R-squared:                       0.018
Model:                            OLS   Adj. R-squared:                  0.017
Method:                 Least Squares   F-statistic:                     9652.
Date:                Fri, 06 May 2022   Prob (F-statistic):               0.00
Time:                        01:34:14   Log-Likelihood:                -3792.5
No. Observations:               33157   AIC:                             7643.
Df Residuals:                   33128   BIC:                             7887.
Df Model:                          28                                         
Covariance Type:              cluster                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    0.0087 

In [14]:
#LPM7, no covariates, using current province of Hukou as the current province of settlement
print(model7.summary())

                            OLS Regression Results                            
Dep. Variable:           d_migration2   R-squared:                       0.002
Model:                            OLS   Adj. R-squared:                  0.002
Method:                 Least Squares   F-statistic:                     15.56
Date:                Fri, 06 May 2022   Prob (F-statistic):           2.64e-15
Time:                        01:34:15   Log-Likelihood:                 21.639
No. Observations:               33185   AIC:                            -31.28
Df Residuals:                   33179   BIC:                             19.18
Df Model:                           5                                         
Covariance Type:              cluster                                         
                  coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------
const           0.0526      0.004     13.821      

In [15]:
#LPM8, covariates, using current province of Hukou as the current province of settlement
print(model8.summary())

                            OLS Regression Results                            
Dep. Variable:           d_migration2   R-squared:                       0.026
Model:                            OLS   Adj. R-squared:                  0.025
Method:                 Least Squares   F-statistic:                 1.857e+04
Date:                Fri, 06 May 2022   Prob (F-statistic):               0.00
Time:                        01:34:15   Log-Likelihood:                 427.31
No. Observations:               33157   AIC:                            -796.6
Df Residuals:                   33128   BIC:                            -552.8
Df Model:                          28                                         
Covariance Type:              cluster                                         
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                   -0.0409 

# IV. Findings

· Gender and Transfer of Hukou Status
1. Using Hukou status at age 3 as original Hukou status
a. no covariate
The number of siblings has a linear positive correlation with the probability of a Hukou status change. For females, however, this effect is almost offset by the negative coefficient of the interaction term, so additional siblings barely change a woman's probability of changing her Hukou status. The positive coefficient of the interaction between gender and having a younger brother suggests that women with younger brothers are more likely to change their Hukou status, indicating that the positive force prevails in this case.
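In a linear probability model with a gender interaction, the sibling slope for women is the sum of the main sibling coefficient and the female × siblings coefficient, so "almost offset" can be checked by testing that sum directly. The sketch below is a minimal illustration, assuming the coefficient vector and covariance matrix come from a fitted statsmodels result (for example `model.params.values` and `model.cov_params().values`). The index positions and the numbers are hypothetical placeholders, not estimates from this study.

```python
import numpy as np

def combined_effect(params, cov, i_main, i_inter):
    """Point estimate and standard error of b_main + b_inter.

    params: 1-D coefficient array (e.g. model.params.values)
    cov:    coefficient covariance matrix (e.g. model.cov_params().values)
    i_main, i_inter: hypothetical positions of the sibling term and of
                     the female x siblings interaction term
    """
    params = np.asarray(params, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # Linear-combination vector that selects the two coefficients
    r = np.zeros_like(params)
    r[i_main] = 1.0
    r[i_inter] = 1.0
    estimate = r @ params
    # Var(r'b) = r' V r, which includes the covariance between the two terms
    se = np.sqrt(r @ cov @ r)
    return estimate, se

# Illustrative numbers only (const, siblings, female x siblings)
b = np.array([0.05, 0.010, -0.008])
V = np.diag([1e-4, 4e-6, 9e-6])
est, se = combined_effect(b, V, 1, 2)
```

With a fitted statsmodels result, `model.t_test(r)` on the same selection vector should report the same combined estimate, using whatever covariance type (here clustered) the model was fitted with.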

b. covariates
The relation between the number of siblings and the probability of a Hukou status change becomes nonlinear and U-shaped.
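With a quadratic specification, where the predicted probability contains the term b1·s + b2·s² in the number of siblings s, the curve is U-shaped when b2 > 0 and reaches its minimum at s* = −b1 / (2·b2). A small sketch, assuming b1 and b2 are read from the fitted model; the values below are illustrative, not this study's estimates.

```python
def quadratic_turning_point(b1, b2):
    """Turning point and shape of b1*s + b2*s**2."""
    if b2 == 0:
        raise ValueError("no curvature: the relation is linear")
    s_star = -b1 / (2.0 * b2)
    # A positive quadratic coefficient gives a U shape with a minimum
    shape = "U-shaped (minimum)" if b2 > 0 else "inverted U (maximum)"
    return s_star, shape

# Illustrative coefficients: negative linear term, positive quadratic term
s_star, shape = quadratic_turning_point(-0.02, 0.005)
```

The turning point is only informative if it falls inside the observed range of sibling counts; otherwise the relation is effectively monotonic over the data.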

For both men and women, being married raises the probability of a Hukou status change. Divorced or widowed women, however, are even more inclined to change their Hukou status. This suggests that marriage is a burden on women: once out of a marriage, women have more control over important life decisions such as place of residence, instead of being influenced by their husbands.
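The marital-status results compare each category with an omitted baseline, and the effect for women in a given category is the main category coefficient plus the corresponding female interaction. One way to build such a design with pandas is sketched below. The column names (`female`, `marital`), the category labels, and the choice of "never married" as the baseline are all hypothetical and would need to match the actual survey coding.

```python
import pandas as pd

# Hypothetical toy data; real column names and codes may differ
df = pd.DataFrame({
    "female":  [0, 1, 1, 0, 1],
    "marital": ["never", "married", "divorced", "widowed", "never"],
})

# Fix the category order so the first one ("never") is the omitted baseline
cats = ["never", "married", "divorced", "widowed"]
df["marital"] = pd.Categorical(df["marital"], categories=cats)
dummies = pd.get_dummies(df["marital"], prefix="m", drop_first=True, dtype=int)

# Female x marital-status interactions, one per non-baseline category
inter = dummies.mul(df["female"], axis=0).add_prefix("female_x_")

X = pd.concat([df[["female"]], dummies, inter], axis=1)
```

`X` could then be passed to `sm.OLS(y, sm.add_constant(X))`; the divorced effect for women would be read off as the sum of the `m_divorced` and `female_x_m_divorced` coefficients.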

As for education, men with more schooling are more likely to change their Hukou status, and the effect is especially strong for men with a doctoral degree. Women whose schooling ended after senior high school are also more likely to change their Hukou status. The negative interaction term for women with a doctoral degree indicates that highly educated women are less likely than comparable men to change their Hukou status. Several explanations are plausible, such as gender discrimination in the workplace, a lack of support from her family, or a strong attachment to her hometown.


2. Using Hukou status at age 12 as original Hukou status
a. no covariate
The results are consistent with those obtained using Hukou status at age 3.

b. covariates
The pattern is consistent with the results obtained using Hukou status at age 3, except that a similar negative effect now also emerges for women with master’s degrees, suggesting that they face a situation similar to that of women with a doctoral degree.




· Gender and Inter-provincial Migration 
1. Using the province in which the survey was taken as the current province
a. No covariate
The coefficients show a U-shaped relation between the number of siblings and migration for the male benchmark group. For the female group, the estimates show an increase of about 2 percentage points in the probability of migration. Women who have brothers are even more likely to migrate across provinces to support their family members, probably because they start working elsewhere at a young age to support their younger brothers.

b. Covariates
The relation between the number of siblings and inter-provincial migration changes little compared with the model without covariates. Women, however, now have a lower probability of inter-provincial migration. The interaction term between siblings and gender again becomes insignificant, apparently absorbed by the effects of marital status and education.
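Whether the sibling-by-gender interaction terms (for example a linear and a quadratic interaction) are jointly insignificant can be checked with a Wald test, W = (Rb)′(RVR′)⁻¹(Rb), compared with a chi-square distribution whose degrees of freedom equal the number of restrictions. The numpy/scipy sketch below assumes the coefficient vector and (clustered) covariance matrix come from the fitted model; the positions and numbers are hypothetical. statsmodels results also provide `wald_test` and `f_test` for this purpose.

```python
import numpy as np
from scipy import stats

def wald_joint_zero(params, cov, idx):
    """Wald test that the coefficients at positions idx are all zero."""
    params = np.asarray(params, dtype=float)
    cov = np.asarray(cov, dtype=float)
    q = len(idx)
    # Restriction matrix: each row picks out one coefficient
    R = np.zeros((q, params.size))
    R[np.arange(q), idx] = 1.0
    rb = R @ params
    # W = (Rb)' (R V R')^{-1} (Rb), solved without forming an explicit inverse
    W = float(rb @ np.linalg.solve(R @ cov @ R.T, rb))
    p_value = float(stats.chi2.sf(W, df=q))
    return W, p_value

# Illustrative numbers only; positions 2 and 3 stand in for the interactions
b = np.array([0.05, 0.01, 0.002, -0.001])
V = np.diag([1e-4, 1e-4, 4e-6, 1e-6])
W, p = wald_joint_zero(b, V, [2, 3])
```

A large p-value is consistent with the interaction terms being jointly insignificant once marital status and education are controlled for.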

Both men and women who are or have been in a relationship are more likely to migrate across provinces, especially women, whose migration decisions depend more heavily on their relationships.

As for education, men with any level of schooling are more likely to migrate across provinces than illiterate men. Women with less than three years of undergraduate education are more likely than men to migrate across provinces. However, women who attained higher education, including bachelor's, master's, and doctoral degrees, appear less likely than men to migrate across provinces.


2. Using the current Hukou provinces as current province
a. No covariate
The coefficients of the number of siblings and its quadratic term are no longer significant.

b. Covariates
For men, the effects of marital status are insignificant, except for a small negative effect of being married. The patterns in the effects of education and of the marital-status and education interaction terms remain the same as in the model using the first measurement.

Robustness: in both models, the results are generally consistent across the two measurements, although some discrepancies appear in the migration model.
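One way to make the robustness comparison concrete is to line up the coefficients from the two measurements side by side and flag any sign changes. The pandas sketch below assumes two fitted results whose `.params` Series share term names (for instance `model6.params` and `model8.params`, the two covariate migration models); the toy Series are placeholders, not results.

```python
import numpy as np
import pandas as pd

def compare_coefficients(params_a, params_b, label_a="measure 1", label_b="measure 2"):
    """Align two coefficient Series and flag terms whose sign differs."""
    # Outer alignment on term names; a term missing from one model shows NaN
    table = pd.concat({label_a: params_a, label_b: params_b}, axis=1)
    table["difference"] = table[label_b] - table[label_a]
    table["same_sign"] = np.sign(table[label_a]) == np.sign(table[label_b])
    return table

# Placeholder values for illustration only
a = pd.Series({"female": 0.02, "siblings": -0.01, "married": 0.03})
b = pd.Series({"female": 0.015, "siblings": 0.004, "married": 0.03})
table = compare_coefficients(a, b, "survey province", "Hukou province")
```

In the notebook this would be called as `compare_coefficients(model6.params, model8.params, "survey province", "Hukou province")`; significance should still be compared using each model's own standard errors.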

# V. Conclusion

1. Although women on average have lower rates of Hukou status change and higher rates of migration, the net effect of patriarchy in the family on female Hukou status change and inter-provincial migration is unexpectedly positive, though small. This is probably because patriarchal power in the family may push girls to leave their original families for job opportunities at the expense of education, resulting in higher probabilities of inter-provincial migration and thus a higher chance of changing their Hukou status.
2. Being or having been in a relationship raises the probability of female migration, while only leaving a marriage (through divorce or widowhood) raises the probability of a change in female Hukou status.
3. Women with higher educational levels gain less than men from education in both Hukou status transfer and inter-provincial migration. Instead, women with lower educational levels tend to have higher rates of Hukou status change and inter-provincial migration, probably because they start working early.

### Questions to be answered in the future

1. The direction of Hukou status transfer, especially reverse (urban-to-rural) migration among women.
2. Urban-rural migration within a province.
3. Whether the Hukou system stimulates educational investment, as international skilled migration does in developing countries.
4. The causal relationship between migration and education. Although this paper assumes that Hukou status and migration are outcomes of education, some parents transfer their Hukou to places with better educational resources, so Hukou status may in turn affect educational attainment.