# Initial processing of Understanding Society data

Starting from the full Understanding Society dataset (Ind+hh_with_xwave_all_waves.dta), I made the following changes in Stata:

1. Started with 505,477 rows and 8,488 columns
2. Dropped all irrelevant waves (294,270 observations deleted)
3. Dropped all vars with >10% missing values (7,603 columns dropped)
4. Dropped all rows where nbrsnci_dv < 0 or scghq1_dv < 0 (35,872 rows dropped, about 17% of the rows left after step 2)
5. Dropped all informal institutions and subjective wellbeing (SW) variables except nbrsnci_dv and scghq1_dv (21 columns dropped)
6. Dropped vars with >20% negative values (521 columns dropped, 343 vars remain)

This notebook examines the processed dataset (343 vars, 175,335 obs) and runs a lasso regression to see which variables are most predictive of subjective wellbeing (scghq1_dv, measured on the Likert scale).
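
As a quick sanity check, the row and column counts reported in the Stata steps can be reproduced with plain arithmetic. This is only a sketch that uses the numbers quoted in the list above; the variable names are mine and do not come from the Stata do-file.

```python
# Counts quoted in the Stata steps above (names here are illustrative only)
rows_start, cols_start = 505_477, 8_488

rows_after_waves = rows_start - 294_270      # step 2: irrelevant waves dropped
rows_final = rows_after_waves - 35_872       # step 4: negative nbrsnci_dv / scghq1_dv
cols_final = cols_start - 7_603 - 21 - 521   # steps 3, 5 and 6

# Share of the post-step-2 rows removed in step 4
share_dropped_step4 = 35_872 / rows_after_waves

print(rows_after_waves)
print(rows_final, cols_final)
print(f"{share_dropped_step4:.1%}")
```

The final counts should match the shape printed when the file is loaded, (175335, 343).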

Before starting, these are the 15 variables I hand-selected, based on intuition and prior research, to include in my refined variable list:

- gor_dv : Government Office Region
- urban_dv : Urban or rural area, derived
- nbrsnci_dv : Buckner's Neighbourhood Cohesion Instrument, short (α= .88)
- scghq1_dv : Subjective wellbeing (GHQ): Likert
- sex_dv : Sex, derived
- age_dv : Age, derived from dob_dv and intdat_dv
- ethn_dv : Ethnic group (derived from multiple sources)
- marstat_dv : Harmonised de facto marital status
- jbstat : Current economic activity
- qfhigh_dv : Highest educational qualification ever reported
- jbiindb_dv : Current job: Industrial classification (CNEF), two digits
- fimnnet_dv : total net personal income - no deductions
- fihhmnnet1_dv : total household net income - no deductions
- houscost1_dv : monthly housing cost including mortgage principal payments
- health : Long-standing illness or disability
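
Since several column drops follow, it may be worth checking that the hand-selected variables are still present after each step. The sketch below assumes a small helper, `missing_columns`, and a toy DataFrame `df_demo`; both are hypothetical stand-ins, not part of the original notebook.

```python
import pandas as pd

# The 15 hand-selected variables listed above
refined_vars = [
    "gor_dv", "urban_dv", "nbrsnci_dv", "scghq1_dv", "sex_dv",
    "age_dv", "ethn_dv", "marstat_dv", "jbstat", "qfhigh_dv",
    "jbiindb_dv", "fimnnet_dv", "fihhmnnet1_dv", "houscost1_dv", "health",
]

def missing_columns(frame, wanted):
    """Return the names in `wanted` that are not columns of `frame`."""
    return [name for name in wanted if name not in frame.columns]

# Toy stand-in for the real data: only three of the selected variables exist
df_demo = pd.DataFrame(columns=["pidp", "gor_dv", "urban_dv", "age_dv"])

absent = missing_columns(df_demo, refined_vars)
print(f"{len(absent)} of {len(refined_vars)} selected variables are absent:")
print(absent)
```

On the real data the same call would be `missing_columns(df, refined_vars)`, run after every step that drops columns.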

In [29]:
import pandas as pd
pd.set_option('display.max_rows', None)
import matplotlib.pyplot as plt
%matplotlib inline
import pyreadstat

In [30]:
# Import file

path = "/Users/arikatz/VSCode Projects/ukhls-informal-institutions-project/data/droppedvaraibleswithlotsofnegatives.dta"
df_full, meta = pyreadstat.read_dta(path)

print("Shape:", df_full.shape)
df_full.head()

Shape: (175335, 343)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidhh,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,fiyrdia,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,ndepchl_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhigh_dv,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_pno,h_ivfio,h_ivfho,h_month,i_hidp,i_pno,i_ivfio,i_ivfho,i_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
0,22445,f,6,3.4,25,278664010,3,3,3,4,2,1,6,1,11,2,29,1984,26,6,2014,2,1,2,1,2,2,2,1,0,0,0,0,0,0,1,0,2,2,1,2,1,0,0,0,0,0,0,0,0,2,5,2,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,2,1,18,16,57,19,7,12,0,0,2572.590088,0,2572.590088,0,2012.0,0,0.0,0,0.0,90,72.0,2,29,26,6,2014,1984,2,1,6,1,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,2012.0,1,7,1,1,0,6,3,5,0,0,1,1,17,3,2,0,0,2,2,2,272012925,1,0,0,0,0,1,1,0,0,-8,0,3,31,62.12,32.59,1,0,0,0,0,0.0,0.0,0.0,0.0,14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
1,22445,i,9,3.3,11,277344816,1,3,3,4,2,1,6,1,11,2,33,1984,23,10,2017,2,2,2,1,2,-8,2,2,0,0,0,0,0,0,1,0,2,3,1,1,1,0,0,0,0,0,0,0,0,4,4,4,5,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,18,50,38,19,28,40,0,0,2423.030029,0,2333.330078,0,1200.0,0,0.0,0,89.699997,0,0.0,2,33,23,10,2017,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,89.699997,1289.699951,1,7,1,1,0,7,4,6,0,0,1,2,3,1,2,0,-8,2,2,2,22445,1,0,0,0,0,0,0,0,0,-8,0,3,31,57.2,46.08,1,0,0,0,0,0.0,0.0,0.0,0.0,10,23.0,10.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,150.0,30.0,1.0,0.0,0.0,2423.030029,0.0,2333.330078,0.0,1.0,1289.699951,1200.0,0.0,0.0,0.0,0.0,89.699997,1300.0,736.869995,2423.030029,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,2.0,0.0,0.0,1.0,10.0,110.0,9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
2,22445,l,12,1.6,32,276637622,1,3,3,4,2,1,4,1,11,2,35,1984,2,4,2020,2,1,6,1,2,2,2,2,0,0,0,0,0,0,1,0,4,1,1,2,1,2,1,2,2,0,2,0,0,5,3,3,5,2,2,0,0,2,0,1,0,0,0,0,0,0,0,0,0,4,2,21,1,46,21,17,32,0,0,145.169998,0,0.0,0,0.0,0,0.0,0,145.169998,0,0.0,2,35,2,4,2020,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,145.169998,145.169998,1,7,1,1,0,8,4,7,1,0,0,2,11,1,2,2,2,1,2,1,276841780,1,2,2,0,0,0,0,0,0,1,0,1,0,67.18,19.42,0,0,0,0,0,1.0,0.0,1.0,0.0,10,2.0,4.0,2020.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,100.0,0.0,0.0,0.0,5656.390137,0.0,5070.0,0.0,1.0,4146.390137,3560.0,350.0,0.0,0.0,0.0,236.389999,1350.0,705.679993,5656.390137,4.0,1.0,0.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,2.0,2.0,2.1,2.0,0.0,0.0257,1.0,9.35,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,1,2,2,2,0,0,1,0,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,10,48,3,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,-8,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29.0,9.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,350.0,30.0,0.0,1.0,0.0,0.0,2175.620117,0.0,13.82,0.0,3.0,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2.0,0.0,1.0,2.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,7.0,0.0,0.0451,1.0,10.0,110.0,3.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,1,10,8,619024416,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
4,29925,i,9,3.5,9,619024416,1,3,3,6,2,1,8,1,11,2,40,1977,22,8,2017,1,2,2,1,1,2,2,2,0,0,1,0,0,0,0,0,4,3,2,1,1,2,2,2,2,0,2,2,0,6,3,6,5,5,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,2,20,39,52,21,37,24,0,0,3054.530029,0,1400.0,0,1250.0,0,0.0,0,1654.530029,0,0.0,2,40,22,8,2017,1977,2,0,4,0,0,0,0,0,1,0.0,1000.0,0.0,0.0,654.530029,2904.530029,1,7,1,1,0,9,5,8,0,0,1,5,5,1,2,2,2,1,2,1,622866606,1,0,0,0,0,0,0,0,0,-8,0,1,27,62.04,41.06,0,0,0,0,0,0.0,0.0,0.0,0.0,10,22.0,8.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,0.0,1.0,0.0,0.0,3054.530029,0.0,1400.0,0.0,2.0,2904.530029,1250.0,0.0,1000.0,0.0,0.0,654.530029,0.0,0.0,3054.530029,4.0,0.0,1.0,2.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,1.0,0.0,0.0011,1.0,17.0,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,1,10,8,619024416,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


In [31]:
df = df_full

## Dealing with missing and negative values

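Great! Now we need to deal with missing and negative values. In Understanding Society, negative values are generally missing-data codes (for example, -9 for missing and -8 for inapplicable), so I count them as ineligible alongside true missing values.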

First, let's see which columns have the most missing and negative values.

Then, we can drop those columns.

Finally, we'll drop any remaining rows with missing or negative values.

In [32]:
# Calculate % missing and % negative for each column (both on a 0-100 scale)
missing_pct = df.isnull().mean() * 100
negative_pct = (df.select_dtypes(include=['number']) < 0).mean() * 100

# Combine into a summary DataFrame and sort by the sum of % missing and % negative (descending)
summary = pd.DataFrame({
    '% Missing': missing_pct,
    '% Negative': negative_pct
}).fillna(0)

summary['% Total'] = summary['% Missing'] + summary['% Negative']
summary = summary.sort_values(by='% Total', ascending=False)

summary.head(50)

Unnamed: 0,% Missing,% Negative,% Total
i_pno,0.0,19.098868,19.098868
i_month,0.0,18.786894,18.786894
i_ivfho,0.0,17.531582,17.531582
i_ivfio,0.0,17.531582,17.531582
i_hidp,0.0,17.531582,17.531582
h_pno,0.0,16.072661,16.072661
h_month,0.0,15.73217,15.73217
aidhh,0.0,15.215445,15.215445
ndepchl_dv,0.0,15.13902,15.13902
h_hidp,0.0,14.320586,14.320586
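
To make the units of the summary table concrete, here is a minimal sketch of the same calculation on a toy DataFrame. The data and the names `toy` and `toy_summary` are invented for illustration; the point is that both columns have to be on the same 0 to 100 scale before they are added together.

```python
import numpy as np
import pandas as pd

# Toy data: invented values, not Understanding Society records
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],   # 1 of 4 values missing
    "b": [-9, 2, -8, 5],            # 2 of 4 values negative
    "c": ["x", "y", "z", "w"],      # non-numeric, so never counted as negative
})

missing_pct = toy.isnull().mean() * 100
negative_pct = (toy.select_dtypes(include="number") < 0).mean() * 100

# Columns absent from negative_pct (non-numeric ones) become NaN, then 0
toy_summary = pd.DataFrame({
    "% Missing": missing_pct,
    "% Negative": negative_pct,
}).fillna(0)
toy_summary["% Total"] = toy_summary["% Missing"] + toy_summary["% Negative"]

print(toy_summary.sort_values(by="% Total", ascending=False))
```

Without the `* 100` on the missing share, a column that is 25% missing would only add 0.25 to the total, so missing values would barely affect the ranking.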


In [33]:
# Drop the 25 vars with the highest total % ineligible
# (a 5% threshold would instead be: summary[summary['% Total'] > 5].index)
cols_to_drop = summary.head(25).index
df = df.drop(columns=cols_to_drop)

print("Shape after dropping the 25 vars with the highest total % ineligible:", df.shape)
df.head()

Shape after dropping the 25 vars with the highest total % ineligible: (175335, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_ivfio,e_ivfho,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
0,22445,f,6,3.4,25,278664010,3,3,3,4,2,1,6,1,11,2,29,1984,26,6,2014,2,1,2,1,2,2,1,0,0,0,0,0,0,1,2,2,1,2,1,0,0,0,0,0,0,0,0,2,5,2,3,1,1,1,0,0,0,0,0,0,0,0,1,0,1,0,2,1,18,16,57,0,0,2572.590088,0,2572.590088,0,2012.0,0,0.0,0,0.0,90,72.0,2,29,26,6,2014,1984,2,1,6,1,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,2012.0,1,7,1,1,0,6,3,5,0,0,1,1,17,3,2,0,2,2,2,272012925,1,0,0,0,0,1,1,0,0,0,3,31,62.12,32.59,1,0,0,0,0,0.0,0.0,0.0,0.0,14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,1,10,278447092,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
1,22445,i,9,3.3,11,277344816,1,3,3,4,2,1,6,1,11,2,33,1984,23,10,2017,2,2,2,1,2,2,2,0,0,0,0,0,0,1,2,3,1,1,1,0,0,0,0,0,0,0,0,4,4,4,5,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,18,50,38,0,0,2423.030029,0,2333.330078,0,1200.0,0,0.0,0,89.699997,0,0.0,2,33,23,10,2017,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,89.699997,1289.699951,1,7,1,1,0,7,4,6,0,0,1,2,3,1,2,0,2,2,2,22445,1,0,0,0,0,0,0,0,0,0,3,31,57.2,46.08,1,0,0,0,0,0.0,0.0,0.0,0.0,10,23.0,10.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,150.0,30.0,1.0,0.0,0.0,2423.030029,0.0,2333.330078,0.0,1.0,1289.699951,1200.0,0.0,0.0,0.0,0.0,89.699997,1300.0,736.869995,2423.030029,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,2.0,0.0,0.0,1.0,10.0,110.0,9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,1,10,278447092,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
2,22445,l,12,1.6,32,276637622,1,3,3,4,2,1,4,1,11,2,35,1984,2,4,2020,2,1,6,1,2,2,2,0,0,0,0,0,0,1,4,1,1,2,1,2,1,2,2,0,2,0,0,5,3,3,5,2,2,0,0,0,1,0,0,0,0,0,0,0,0,0,4,2,21,1,46,0,0,145.169998,0,0.0,0,0.0,0,0.0,0,145.169998,0,0.0,2,35,2,4,2020,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,145.169998,145.169998,1,7,1,1,0,8,4,7,1,0,0,2,11,1,2,2,1,2,1,276841780,1,2,2,0,0,0,0,0,0,0,1,0,67.18,19.42,0,0,0,0,0,1.0,0.0,1.0,0.0,10,2.0,4.0,2020.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,100.0,0.0,0.0,0.0,5656.390137,0.0,5070.0,0.0,1.0,4146.390137,3560.0,350.0,0.0,0.0,0.0,236.389999,1350.0,705.679993,5656.390137,4.0,1.0,0.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,2.0,2.0,2.1,2.0,0.0,0.0257,1.0,9.35,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,1,10,278447092,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,1,2,2,0,0,1,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29.0,9.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,350.0,30.0,0.0,1.0,0.0,0.0,2175.620117,0.0,13.82,0.0,3.0,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2.0,0.0,1.0,2.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,7.0,0.0,0.0451,1.0,10.0,110.0,3.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,50,61,620316412,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
4,29925,i,9,3.5,9,619024416,1,3,3,6,2,1,8,1,11,2,40,1977,22,8,2017,1,2,2,1,1,2,2,0,0,1,0,0,0,0,4,3,2,1,1,2,2,2,2,0,2,2,0,6,3,6,5,5,1,0,0,0,1,0,0,0,0,0,0,0,0,0,3,2,20,39,52,0,0,3054.530029,0,1400.0,0,1250.0,0,0.0,0,1654.530029,0,0.0,2,40,22,8,2017,1977,2,0,4,0,0,0,0,0,1,0.0,1000.0,0.0,0.0,654.530029,2904.530029,1,7,1,1,0,9,5,8,0,0,1,5,5,1,2,2,1,2,1,622866606,1,0,0,0,0,0,0,0,0,0,1,27,62.04,41.06,0,0,0,0,0,0.0,0.0,0.0,0.0,10,22.0,8.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,0.0,1.0,0.0,0.0,3054.530029,0.0,1400.0,0.0,2.0,2904.530029,1250.0,0.0,1000.0,0.0,0.0,654.530029,0.0,0.0,3054.530029,4.0,0.0,1.0,2.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,1.0,0.0,0.0011,1.0,17.0,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,50,61,620316412,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
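
One possible variant of the column drop is a threshold-based rule that refuses to remove variables I want to keep. The sketch below is an assumption about how that could look, not what the cell above does: `summary_demo`, `threshold` and `protected` are hypothetical names, and the percentages are invented.

```python
import pandas as pd

# Toy summary table with invented percentages and placeholder column names
summary_demo = pd.DataFrame(
    {"% Total": [19.1, 7.2, 5.4, 1.0, 0.0]},
    index=["var_a", "var_b", "var_c", "var_d", "var_e"],
)

threshold = 5.0          # drop columns whose total % ineligible exceeds this
protected = {"var_b"}    # stand-in for the hand-selected variable list

over_threshold = summary_demo.index[summary_demo["% Total"] > threshold]

# Drop everything over the threshold except protected columns,
# and report the protected ones so they can be handled separately
cols_to_drop = [c for c in over_threshold if c not in protected]
flagged = [c for c in over_threshold if c in protected]

print("Dropping:", cols_to_drop)
print("Over threshold but protected:", flagged)
```

On the real data, `protected` would hold the 15 hand-selected variables, and anything printed as flagged would need a decision on how to treat its negative codes.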


In [34]:
# Drop rows with any missing values
df = df.dropna()  # Remove rows with any NaN values

print("Shape after dropping rows with missing values:", df.shape)

df.head()

Shape after dropping rows with missing values: (173487, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_ivfio,e_ivfho,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
1,22445,i,9,3.3,11,277344816,1,3,3,4,2,1,6,1,11,2,33,1984,23,10,2017,2,2,2,1,2,2,2,0,0,0,0,0,0,1,2,3,1,1,1,0,0,0,0,0,0,0,0,4,4,4,5,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,18,50,38,0,0,2423.030029,0,2333.330078,0,1200.0,0,0.0,0,89.699997,0,0.0,2,33,23,10,2017,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,89.699997,1289.699951,1,7,1,1,0,7,4,6,0,0,1,2,3,1,2,0,2,2,2,22445,1,0,0,0,0,0,0,0,0,0,3,31,57.2,46.08,1,0,0,0,0,0.0,0.0,0.0,0.0,10,23,10,2017,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,2,1,1,0,0,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,300,150,30,1,0,0,2423.030029,0,2333.330078,0,1,1289.699951,1200.0,0.0,0.0,0.0,0.0,89.699997,1300.0,736.869995,2423.030029,4,0,0,0,0,0,0,0,0,1,0,1,0,1,1.0,2,0,0.0,1,10.0,110,9,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,1,10,278447092,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
2,22445,l,12,1.6,32,276637622,1,3,3,4,2,1,4,1,11,2,35,1984,2,4,2020,2,1,6,1,2,2,2,0,0,0,0,0,0,1,4,1,1,2,1,2,1,2,2,0,2,0,0,5,3,3,5,2,2,0,0,0,1,0,0,0,0,0,0,0,0,0,4,2,21,1,46,0,0,145.169998,0,0.0,0,0.0,0,0.0,0,145.169998,0,0.0,2,35,2,4,2020,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,145.169998,145.169998,1,7,1,1,0,8,4,7,1,0,0,2,11,1,2,2,1,2,1,276841780,1,2,2,0,0,0,0,0,0,0,1,0,67.18,19.42,0,0,0,0,0,1.0,0.0,1.0,0.0,10,2,4,2020,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,2,1,1,1,0,0,1,2,2,1,1,1,1,1,1,1,1,1,1,0,1,300,100,100,0,0,0,5656.390137,0,5070.0,0,1,4146.390137,3560.0,350.0,0.0,0.0,0.0,236.389999,1350.0,705.679993,5656.390137,4,1,0,2,1,1,0,0,0,1,1,2,2,2,2.1,2,0,0.0257,1,9.35,110,-9,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,1,10,278447092,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,1,2,2,0,0,1,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29,9,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,4,1,1,0,0,0,1,1,2,1,1,1,1,1,0,1,1,1,1,0,1,350,30,0,1,0,0,2175.620117,0,13.82,0,3,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2,0,1,2,0,2,0,0,0,1,0,1,2,1,1.6,7,0,0.0451,1,10.0,110,3,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,50,61,620316412,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
4,29925,i,9,3.5,9,619024416,1,3,3,6,2,1,8,1,11,2,40,1977,22,8,2017,1,2,2,1,1,2,2,0,0,1,0,0,0,0,4,3,2,1,1,2,2,2,2,0,2,2,0,6,3,6,5,5,1,0,0,0,1,0,0,0,0,0,0,0,0,0,3,2,20,39,52,0,0,3054.530029,0,1400.0,0,1250.0,0,0.0,0,1654.530029,0,0.0,2,40,22,8,2017,1977,2,0,4,0,0,0,0,0,1,0.0,1000.0,0.0,0.0,654.530029,2904.530029,1,7,1,1,0,9,5,8,0,0,1,5,5,1,2,2,1,2,1,622866606,1,0,0,0,0,0,0,0,0,0,1,27,62.04,41.06,0,0,0,0,0,0.0,0.0,0.0,0.0,10,22,8,2017,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,1,0,0,0,1,1,2,1,1,1,1,1,0,1,1,1,1,0,1,300,100,0,1,0,0,3054.530029,0,1400.0,0,2,2904.530029,1250.0,0.0,1000.0,0.0,0.0,654.530029,0.0,0.0,3054.530029,4,0,1,2,0,0,2,0,0,1,0,1,2,1,1.6,1,0,0.0011,1,17.0,110,-9,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,50,61,620316412,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
5,29925,l,12,3.8,15,618222022,1,3,3,6,2,1,8,1,11,2,43,1977,8,9,2020,1,2,2,1,1,2,2,0,0,0,0,0,0,1,3,3,2,1,1,2,2,2,2,0,2,2,1,5,2,5,3,5,1,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,9,26,23,0,0,3202.0,0,1733.329956,0,1516.670044,0,0.0,0,1468.670044,0,0.0,2,43,8,9,2020,1977,2,0,4,0,0,0,0,0,1,0.0,1000.0,0.0,0.0,468.670013,2985.340088,1,7,1,1,0,9,5,8,0,0,1,5,5,1,2,2,1,2,1,622866606,1,0,0,0,0,0,0,0,0,0,1,27,59.07,37.34,0,0,0,0,0,0.01,0.0,0.01,0.0,10,5,8,2020,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,1,1,1,0,0,0,1,1,2,1,1,1,1,1,0,1,1,1,1,0,1,320,150,0,1,0,1,3202.0,0,1733.329956,0,2,2985.340088,1516.670044,0.0,1000.0,0.0,0.0,468.670013,0.0,0.0,3202.0,4,0,1,2,0,0,2,0,0,1,0,1,2,1,1.6,1,0,0.0053,1,12.5,110,15,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,50,61,620316412,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


In [35]:
# Drop rows with any negative values
numeric_cols = df.select_dtypes(include='number').columns
df = df[(df[numeric_cols] >= 0).all(axis=1)]
print("Shape after dropping rows with negative values:", df.shape)
df.head()

Shape after dropping rows with negative values: (106771, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_ivfio,e_ivfho,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
8,280165,i,9,3.3,7,754793216,1,3,3,67,15,1,12,1,11,2,38,1979,31,1,2018,2,2,2,1,1,2,2,0,0,0,0,0,0,1,2,1,2,3,1,1,1,1,1,0,1,1,1,5,5,3,6,2,1,2,0,0,1,0,0,0,0,0,2,0,2,0,4,1,19,17,37,0,0,3089.699951,0,3000.0,0,2228.0,0,0.0,0,89.699997,0,0.0,2,38,31,1,2018,1979,2,1,1,1,1,0,0,0,1,0.0,0.0,0.0,0.0,89.699997,2317.699951,1,8,2,1,0,8,4,7,1,0,0,2,20,1,2,1,1,2,1,783876922,1,4,4,0,0,2,2,0,0,1,4,16,55.57,52.43,1,0,0,0,0,0.0,0.0,0.0,0.0,10,31,1,2018,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,2,1,1,0,1,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,500,100,0,3,0,1,10162.950195,0,8228.730469,0,1,8086.029785,6151.810059,0.0,0.0,9.56,1184.959961,739.700012,1536.0,1536.0,10162.950195,5,1,0,1,0,0,1,0,1,2,1,2,1,3,2.3,2,0,0.6242,1,6.0,110,7,0,2,14,2,14,2,783876802,2,1,10,11,759532804,2,2,11,11,758492406,2,2,11,11,756833208,1,10,755847212,1,1,10,12,0,0,3,15,1,1,6,1,1,1,1,0.000342,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999982,0.999959,0.999937,0.999912,0.999884,0.999851,0.999818,0.999791,0.999752,0.99972,0.999688,0.999654,0.999563,0.999513,0.999432,0.999378,0.999318,0.999207,0.99915,0.999076,0.998998,0.998904,0.998814,0.998724,0.998582,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2923.147705
9,280165,l,12,1.9,36,753800422,1,3,3,67,15,1,12,1,11,2,41,1979,12,12,2020,2,1,1,1,2,1,2,0,0,0,0,0,0,1,3,1,2,4,1,1,1,1,1,0,1,1,1,6,1,5,2,2,2,2,0,0,1,0,0,0,0,0,2,0,2,0,4,2,7,20,30,0,0,361.670013,0,0.0,0,0.0,0,0.0,0,361.670013,0,0.0,2,41,12,12,2020,1979,2,1,1,1,1,0,0,0,1,0.0,0.0,0.0,0.0,361.670013,361.670013,1,8,2,3,0,9,5,8,1,0,0,2,20,1,2,1,1,2,1,783876922,1,4,4,0,0,2,2,0,0,1,4,0,60.96,13.6,0,0,0,0,0,0.0,0.0,0.0,0.0,12,27,11,2020,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,2,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,500,300,100,3,0,1,4623.209961,0,1035.390015,0,1,4594.120117,1006.299988,0.0,0.0,0.0,0.0,3587.820068,1500.0,247.149994,4623.209961,5,1,0,1,0,0,0,1,1,1,2,2,1,3,2.3,2,0,0.7637,1,11.16,210,4,0,2,14,2,14,2,783876802,2,1,10,11,759532804,2,2,11,11,758492406,2,2,11,11,756833208,1,10,755847212,1,1,10,12,0,0,3,15,1,1,6,1,1,1,1,0.000342,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999982,0.999959,0.999937,0.999912,0.999884,0.999851,0.999818,0.999791,0.999752,0.99972,0.999688,0.999654,0.999563,0.999513,0.999432,0.999378,0.999318,0.999207,0.99915,0.999076,0.998998,0.998904,0.998814,0.998724,0.998582,1.0,1.0,1.0,1.0,1.0,1.0,1.0,2923.147705
18,665045,f,6,3.3,10,212588410,4,3,3,144,38,1,1,1,11,1,32,1981,18,2,2014,2,2,2,1,2,2,2,0,0,0,0,0,0,1,4,1,1,1,1,0,0,0,0,0,0,0,0,5,3,5,4,1,1,1,0,0,0,0,0,0,0,0,1,0,1,2,4,1,15,26,7,0,0,460.0,0,460.0,0,460.0,0,0.0,0,0.0,0,0.0,1,32,18,2,2014,1981,2,0,6,1,2,0,0,1,1,0.0,0.0,0.0,0.0,0.0,460.0,1,5,2,3,0,7,4,6,0,0,1,1,19,4,2,0,2,2,2,205598685,1,0,0,0,2,1,1,0,0,0,3,24,53.97,50.14,0,0,0,0,0,0.0,0.0,0.0,0.0,12,25,2,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,1,0,0,0,1,1,2,1,-8,-8,-8,-8,-8,-8,-8,-8,-8,-9,1,350,100,0,2,0,0,11270.959961,0,9879.849609,0,1,7817.240234,6426.129883,0.0,0.0,1148.439941,0.0,242.669998,0.0,0.0,11270.959961,3,1,0,0,0,0,0,0,0,3,1,4,0,4,2.5,1,0,0.862,1,17.0,210,20,0,2,14,2,12,3,239074402,4,10,12,3,215192804,4,1,12,3,214812006,4,1,12,3,213071208,1,12,212262012,4,10,12,1,0,0,3,16,1,1,5,2,1,2,2,0.000342,0.0,0.0,0.0,0.0,0.0,0.0,0.001829,0.0,0.0,0.999946,0.999874,0.999806,0.999732,0.999644,0.999545,0.999445,0.999363,0.999242,0.999145,0.999048,0.998943,0.998667,0.998512,0.998267,0.9981,0.997918,0.997581,0.997406,0.99718,0.996943,0.996656,0.996382,0.996107,0.995676,1.0,1.0,1.0,1.0,1.0,1.0,1.0,460.659668
19,665045,l,12,3.6,7,210188022,1,3,3,144,38,1,3,1,11,1,38,1981,27,4,2020,2,2,2,1,2,2,1,0,0,0,0,0,0,1,2,1,2,1,1,0,0,0,0,0,0,0,0,5,5,7,6,1,1,0,0,0,0,0,0,0,0,0,0,0,2,0,2,1,13,29,7,0,0,425.0,0,425.0,0,425.0,0,0.0,0,0.0,100,100.0,1,38,27,4,2020,1981,2,0,6,0,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,425.0,1,5,2,3,0,8,4,7,0,0,1,1,16,1,2,0,2,2,2,665045,2,0,0,0,2,0,0,0,0,0,3,24,53.17,54.96,0,0,0,0,0,0.0,0.0,0.0,0.0,12,27,4,2020,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,1,0,0,0,1,2,2,1,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,1,300,0,0,3,0,0,4291.410156,0,4291.410156,0,1,3348.810059,3348.810059,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4291.410156,3,0,0,0,0,0,0,0,0,2,0,2,0,2,1.5,1,0,0.901,2,17.56,210,6,0,2,14,2,12,3,239074402,4,10,12,3,215192804,4,1,12,3,214812006,4,1,12,3,213071208,1,12,212262012,4,10,12,1,0,0,3,16,1,1,5,2,1,2,2,0.000342,0.0,0.0,0.0,0.0,0.0,0.0,0.001829,0.0,0.0,0.999946,0.999874,0.999806,0.999732,0.999644,0.999545,0.999445,0.999363,0.999242,0.999145,0.999048,0.998943,0.998667,0.998512,0.998267,0.9981,0.997918,0.997581,0.997406,0.99718,0.996943,0.996656,0.996382,0.996107,0.995676,1.0,1.0,1.0,1.0,1.0,1.0,1.0,460.659668
27,1833965,c,3,3.4,8,757615204,3,3,3,46,12,3,11,1,11,1,46,1965,7,12,2011,2,2,2,1,1,2,2,0,0,0,0,0,0,1,3,1,2,2,1,0,0,0,0,0,0,0,0,3,2,2,3,1,1,1,2,2,2,0,0,0,0,0,1,2,1,2,3,1,20,26,46,0,0,1666.72998,0,1660.0,0,1325.660034,0,80.760002,0,0.0,0,0.0,1,46,7,12,2011,1965,2,2,6,2,2,0,0,0,1,0.0,0.0,6.73,0.0,0.0,1332.390015,1,8,2,2,0,10,5,9,0,0,1,1,19,3,2,0,2,2,2,748184965,1,0,0,2,2,1,1,0,0,0,5,20,45.33,37.91,1,0,0,0,1,0.0,0.0,0.0,0.0,11,16,11,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,255,-1,0,1,0,0,3313.25,0,1660.0,0,1,2978.909912,1325.660034,0.0,0.0,726.039978,166.710007,760.5,0.0,0.0,3313.25,4,1,0,0,0,0,0,0,2,1,2,1,0,3,2.0,1,0,0.2123,1,124.0,210,3,0,2,11,2,9,2,782088402,3,1,11,11,757615204,3,1,11,11,756846806,3,2,11,11,755412008,1,10,754494012,1,1,10,10,0,0,3,16,1,1,5,2,2,1,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


## Separating outcome variable and dropping identifiers

Great! Now we need to separate out scghq1_dv (the outcome variable) and drop the identifier and survey-administration columns (pidp, wave, wave_num, hidp, and the other columns listed in the code).

In [41]:
# Separate outcome (scghq1_dv) and predictors (X)

Y = df['scghq1_dv'].astype(float)  # Ensure Y is a numeric pandas Series
X = df.drop(columns=['scghq1_dv'])

# Drop identifiers (pidp, wave, wave_num, hidp, pno, hhorig, memorig, psu, strata, sampst, month, ivfio, ioutcome, birthy, hrpid)
identifiers = ['pidp', 'wave', 'wave_num', 'hidp', 'pno', 'hhorig', 'memorig', 'psu', 'strata', 'sampst', 'month', 'ivfio', 'ioutcome', 'birthy', 'hrpid']
X = X.drop(columns=identifiers)
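
A minimal sketch of the same split on a toy frame, with a check for identifier names that are not actually present (by default `DataFrame.drop` raises a `KeyError` when a listed column is missing). The frame `toy`, the shortened identifier list `toy_identifiers` and the values in it are made up for illustration; they are not the real data.

```python
import pandas as pd

# Toy stand-in for df (hypothetical values, only a handful of columns)
toy = pd.DataFrame({
    "pidp": [280165, 665045],
    "wave": ["i", "f"],
    "scghq1_dv": [7, 10],
    "nbrsnci_dv": [3.3, 3.3],
})

# Shortened, hypothetical identifier list; 'hidp' is deliberately absent from toy
toy_identifiers = ["pidp", "wave", "hidp"]

# Report identifiers that are not in the frame instead of letting drop() raise
missing = [c for c in toy_identifiers if c not in toy.columns]
present = [c for c in toy_identifiers if c in toy.columns]
print("Identifiers not found:", missing)

# Outcome as a float Series, predictors as everything else minus identifiers
y_toy = toy["scghq1_dv"].astype(float)
X_toy = toy.drop(columns=["scghq1_dv"] + present)
print(list(X_toy.columns))
```

Checking the list first makes it obvious when an identifier has already been removed upstream (for example by the Stata preprocessing), rather than failing partway through the cell.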

## Defining categorical and continuous variables

In [37]:
# Strict rule: discrete only if dtype is object/category or integer dtype, and num_unique <= MAX_UNIQUE_DISCRETE
# Anything with more than MAX_UNIQUE_DISCRETE unique values is automatically continuous.

MAX_UNIQUE_DISCRETE = 30

label_map = dict(zip(meta.column_names, meta.column_labels)) if hasattr(meta, 'column_names') else {}
rows = []

for col in X.columns:
    dtype = X[col].dtype
    col_vals = X[col].dropna()
    num_unique = int(col_vals.nunique())

    # Automatic continuous if too many uniques
    if num_unique > MAX_UNIQUE_DISCRETE:
        will_discrete = False
        reason = f'{num_unique} unique > {MAX_UNIQUE_DISCRETE} -> continuous'
    else:
        is_string = dtype == 'object' or str(dtype).startswith('category')
        is_integer = pd.api.types.is_integer_dtype(X[col].dtype)

        will_discrete = bool(is_string or is_integer)
        if is_string:
            reason = 'string/category dtype -> discrete'
        elif is_integer:
            reason = 'integer dtype -> discrete'
        else:
            reason = 'float/numeric dtype -> continuous'

    suggested_action = 'one-hot encode (discrete)' if will_discrete else 'treat as continuous'
    sample_vals = list(pd.Series(col_vals.unique()).sort_values()[:6]) if num_unique > 0 else []

    rows.append({
        'Column': col,
        'Label': label_map.get(col, ''),
        'DataType': str(dtype),
        'NumUnique': num_unique,
        'IsString': dtype == 'object' or str(dtype).startswith('category'),
        'IsInteger': pd.api.types.is_integer_dtype(X[col].dtype),
        'WillBeDiscrete': will_discrete,
        'DecisionReason': reason,
        'SuggestedAction': suggested_action
    })

variable_summary = pd.DataFrame(rows)
# Order discrete first for visibility
variable_summary = variable_summary.sort_values(by=['WillBeDiscrete', 'NumUnique'], ascending=[False, True]).reset_index(drop=True)

print(f"Total variables: {len(variable_summary)}")
print(f"Discrete by rule: {int(variable_summary['WillBeDiscrete'].sum())}")
print(f"MAX_UNIQUE_DISCRETE = {MAX_UNIQUE_DISCRETE}")

pd.set_option('display.max_rows', None)
display(variable_summary)


Total variables: 302
Discrete by rule: 206
MAX_UNIQUE_DISCRETE = 30


Unnamed: 0,Column,Label,DataType,NumUnique,IsString,IsInteger,WillBeDiscrete,DecisionReason,SuggestedAction
0,intdatd_if,"Interview date: Day, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
1,intdatm_if,"Interview date: Month, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
2,intdaty_if,"Interview date: Year, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
3,doby_if,DOB Year imputation flag,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
4,age_if,Imputation flag for age_dv,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
5,school_dv,Never went to/still at school,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
6,sex,Sex,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
7,lkmove,Prefers to move house,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
8,xpmove,Expects to move in next year,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
9,health,Long-standing illness or disability,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
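
To make the rule easier to reason about, here is a small sketch that restates it as a function and applies it to a toy frame. The helper `is_discrete` and the toy columns are hypothetical, written only to mirror the logic of the cell above. One consequence worth noticing: a category code stored as float is treated as continuous even if it has only two distinct values.

```python
import pandas as pd

MAX_UNIQUE_DISCRETE = 30  # same threshold as in the cell above


def is_discrete(series: pd.Series, max_unique: int = MAX_UNIQUE_DISCRETE) -> bool:
    """Hypothetical helper: discrete only if few unique values AND a
    string/category/integer dtype; everything else is continuous."""
    n_unique = series.dropna().nunique()
    if n_unique > max_unique:
        return False
    is_string = series.dtype == "object" or isinstance(series.dtype, pd.CategoricalDtype)
    is_integer = pd.api.types.is_integer_dtype(series.dtype)
    return bool(is_string or is_integer)


# Toy columns (made-up values) covering each branch of the rule
toy = pd.DataFrame({
    "sex": [1, 2] * 20,                           # integer, 2 unique
    "age": list(range(40)),                       # integer, 40 unique
    "income": [i * 10.5 for i in range(40)],      # float, 40 unique
    "flag_as_float": [0.0, 1.0] * 20,             # float, 2 unique
    "region": pd.Categorical(["a", "b"] * 20),    # category, 2 unique
})

decisions = {col: is_discrete(toy[col]) for col in toy.columns}
print(decisions)
```

In this sketch `sex` and `region` come out discrete, while `age`, `income` and `flag_as_float` come out continuous, so it may be worth scanning the summary table for float columns with very few unique values.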


## Scaling continuous variables

Great! Now we need to scale the continuous variables.

In [38]:
# Scale continuous variables to mean 0, std 1
from sklearn.preprocessing import StandardScaler
import numpy as np

# Identify continuous columns using the variable_summary decisions
continuous_cols = variable_summary.loc[~variable_summary['WillBeDiscrete'], 'Column'].tolist()

# Only keep columns that still exist in X (defensive)
continuous_cols = [c for c in continuous_cols if c in X.columns]

print(f'Found {len(continuous_cols)} continuous column(s) to scale')

# Keep a copy of the unscaled X in case we need it later
X_unscaled = X.copy()

scaler = None
if len(continuous_cols) > 0:
    scaler = StandardScaler()
    # Convert to float (safe) and scale in-place on a copy
    X_scaled = X.copy()
    try:
        X_scaled[continuous_cols] = scaler.fit_transform(X_scaled[continuous_cols].astype(float))
    except Exception as e:
        # Fall back to scaling each column separately if there are issues with mixed dtypes
        print('Warning: bulk scaling failed, falling back to per-column scaling. Error:', e)
        for col in continuous_cols:
            try:
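                vals = X_scaled[col].astype(float).values.reshape(-1, 1)
                # Use a fresh scaler per column so the shared `scaler` is not silently refit on every column
                X_scaled[col] = StandardScaler().fit_transform(vals).ravel()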
            except Exception as e2:
                print(f'  Could not scale column {col}:', e2)
    
    # Replace X with scaled version
    X = X_scaled
else:
    print('No continuous columns to scale; X left unchanged')

print('Shape of X after scaling:', X.shape)

# Quick sanity checks
if scaler is not None and len(continuous_cols) > 0:
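    # Show means and stds for a few columns. pandas .std() uses ddof=1 while StandardScaler uses ddof=0,
    # so the printed stds sit fractionally above 1 (by a factor of sqrt(n / (n - 1))).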
    sample_check = continuous_cols[:6]
    means = X[sample_check].mean().round(6)
    stds = X[sample_check].std().round(6)
    print('Sample scaled means (should be near 0):')
    print(means.to_dict())
    print('Sample scaled stds (should be near 1):')
    print(stds.to_dict())

# Display head for quick verification
pd.set_option('display.max_columns', None)
display(X.head())

Found 96 continuous column(s) to scale
Shape of X after scaling: (106771, 302)
Sample scaled means (should be near 0):
{'prob01ni': 0.0, 'b_mortus_tw': -0.0, 'prob91e': -0.0, 'prob91w': -0.0, 'prob91s': -0.0, 'prob99w': 0.0}
Sample scaled stds (should be near 1):
{'prob01ni': 1.000005, 'b_mortus_tw': 1.000005, 'prob91e': 1.000005, 'prob91w': 1.000005, 'prob91s': 1.000005, 'prob99w': 1.000005}


Unnamed: 0,nbrsnci_dv,sex,dvage,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_i
f,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_ivfio,e_ivfho,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
8,-0.440636,2,-0.730395,1.841032,1,2018,2,2,2,1,1,2,2,0,0,0,0,0,0,1,2,1,2,3,1,1,1,1,1,0,1,1,1,5,5,3,6,2,1,2,0,0,1,0,0,0,0,0,2,0,2,0,4,1,19,-0.660555,0.433265,0,0,0.753119,0,1.093031,0,1.173913,0,-0.140384,0,-0.578007,-0.105704,-0.114112,2,-0.730616,1.841032,1,2018,0.910605,2,1,1,1,1,0,0,0,1,-0.083346,-0.094846,-0.096086,-0.305152,-0.510221,0.269495,1,8,2,1,0,8,4,7,1,0,0,2,20,1,2,1,1,2,1,1,4,4,0,0,2,2,0,0,1,4,0.187421,0.55269,0.290543,1,0,0,0,0,-0.316392,-0.279575,-0.412347,-0.112562,10,1.830483,1,2018,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,2,1,1,0,1,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,0.723646,0.055377,-0.619351,3,0,1,0.578201,0,0.544842,0,1,0.663123,0.625091,-0.130702,-0.129852,-0.140547,0.991024,0.26687,0.767635,1.457195,0.485031,5,1,0,1,0,0,1,0,1,2,1,2,1,3,0.821956,2,0,1.340219,1,-0.527096,110,0.798744,0,2,14,2,14,2,0.120716,2,1,10,11,0.075896,2,2,11,11,0.074458,2,2,11,11,0.072513,1,10,0.071291,1,1,10,12,0,0,3,15,1,1,6,1,1,1,1,0.701625,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,-1.363109,-0.186445,-0.239297,0.085939,0.085984,0.086015,0.086112,0.086206,0.086267,0.086321,0.086364,0.106416,0.11067,0.12179,0.127877,0.13438,0.137584,0.140994,0.146117,0.148689,0.152505,0.154641,0.156956,0.157936,0.159864,0.161757,0.164697,0.16532,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,3.596949
9,-2.409916,2,-0.557328,-0.381826,12,2020,2,1,1,1,2,1,2,0,0,0,0,0,0,1,3,1,2,4,1,1,1,1,1,0,1,1,1,6,1,5,2,2,2,2,0,0,1,0,0,0,0,0,2,0,2,0,4,2,7,-0.484519,0.028494,0,0,-0.911076,0,-0.75163,0,-0.818101,0,-0.140384,0,-0.265487,-0.105704,-0.114112,2,-0.557535,-0.381826,12,2020,0.910605,2,1,1,1,1,0,0,0,1,-0.083346,-0.094846,-0.096086,-0.305152,0.109423,-0.455727,1,8,2,3,0,9,5,8,1,0,0,2,20,1,2,1,1,2,1,1,4,4,0,0,2,2,0,0,1,4,-1.047141,1.040586,-3.611892,0,0,0,0,0,-0.316392,-0.279575,-0.412347,-0.112562,12,1.368714,11,2020,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,2,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,0.723646,1.037402,0.591102,3,0,1,0.059409,0,-0.177171,0,1,0.171333,-0.186583,-0.130702,-0.129852,-0.146551,-0.381974,4.661875,0.743419,-0.002362,0.036173,5,1,0,1,0,0,0,1,1,1,2,2,1,3,0.821956,2,0,1.783508,1,-0.478437,210,0.31018,0,2,14,2,14,2,0.120716,2,1,10,11,0.075896,2,2,11,11,0.074458,2,2,11,11,0.072513,1,10,0.071291,1,1,10,12,0,0,3,15,1,1,6,1,1,1,1,0.701625,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,-1.363109,-0.186445,-0.239297,0.085939,0.085984,0.086015,0.086112,0.086206,0.086267,0.086321,0.086364,0.106416,0.11067,0.12179,0.127877,0.13438,0.137584,0.140994,0.146117,0.148689,0.152505,0.154641,0.156956,0.157936,0.159864,0.161757,0.164697,0.16532,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,3.596949
18,-0.440636,1,-1.076528,0.320129,2,2014,2,2,2,1,2,2,2,0,0,0,0,0,0,1,4,1,1,1,1,0,0,0,0,0,0,0,0,5,3,5,4,1,1,1,0,0,0,0,0,0,0,0,1,0,1,2,4,1,15,-0.132447,-1.301465,0,0,-0.851091,0,-0.468782,0,-0.406823,0,-0.140384,0,-0.681081,-0.105704,-0.114112,1,-1.076776,0.320129,2,2014,1.028209,2,0,6,1,2,0,0,1,1,-0.083346,-0.094846,-0.096086,-0.305152,-0.714589,-0.419269,1,5,2,3,0,7,4,6,0,0,1,1,19,4,2,0,2,2,2,1,0,0,0,2,1,1,0,0,0,3,0.804701,0.40786,0.060396,0,0,0,0,0,-0.316392,-0.279575,-0.412347,-0.112562,12,1.13783,2,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,1,0,0,0,1,1,2,1,-8,-8,-8,-8,-8,-8,-8,-8,-8,-9,1,0.107795,0.055377,-0.619351,2,0,0,0.681966,0,0.710568,0,1,0.625268,0.668363,-0.130702,-0.129852,0.574681,-0.381974,-0.500109,-0.265601,-0.282247,0.574808,3,1,0,0,0,0,0,0,0,3,1,4,0,4,1.150608,1,0,2.095875,1,-0.423366,210,2.915853,0,2,14,2,12,3,-1.088965,4,10,12,3,-1.125452,4,1,12,3,-1.125229,4,1,12,3,-1.126584,1,12,-1.127254,4,10,12,1,0,0,3,16,1,1,5,2,1,2,2,0.701644,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,0.734615,-0.186445,-0.239297,-0.057334,-0.058501,-0.059567,-0.060711,-0.062104,-0.063653,-0.065195,-0.066464,-0.045805,-0.04155,-0.028713,-0.021523,-0.014491,-0.011802,-0.010281,-0.004049,-0.001212,0.00064,0.0033,0.005504,0.004144,0.005717,0.006538,0.009469,0.006305,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,-0.109542
19,-0.018647,1,-0.730395,1.373062,4,2020,2,2,2,1,2,2,1,0,0,0,0,0,0,1,2,1,2,1,1,0,0,0,0,0,0,0,0,5,5,7,6,1,1,0,0,0,0,0,0,0,0,0,0,0,2,0,2,1,13,0.043589,-1.301465,0,0,-0.872443,0,-0.490303,0,-0.438116,0,-0.140384,0,-0.681081,0.320406,0.444934,1,-0.730616,1.373062,4,2020,1.028209,2,0,6,0,1,0,0,0,1,-0.083346,-0.094846,-0.096086,-0.305152,-0.714589,-0.432246,1,5,2,3,0,8,4,7,0,0,1,1,16,1,2,0,2,2,2,2,0,0,0,2,0,0,0,0,0,3,0.804701,0.335445,0.544809,0,0,0,0,0,-0.316392,-0.279575,-0.412347,-0.112562,12,1.368714,4,2020,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,1,0,0,0,1,2,2,1,-8,-8,-8,-8,-8,-8,-8,-8,-8,-8,1,-0.097489,-0.435635,-0.619351,3,0,0,0.028337,0,0.149643,0,1,-0.004052,0.182934,-0.130702,-0.129852,-0.146551,-0.381974,-0.874579,-0.265601,-0.282247,0.009289,3,0,0,0,0,0,0,0,0,2,0,2,0,2,-0.492649,1,0,2.219805,2,-0.418085,210,0.635889,0,2,14,2,12,3,-1.088965,4,10,12,3,-1.125452,4,1,12,3,-1.125229,4,1,12,3,-1.126584,1,12,-1.127254,4,10,12,1,0,0,3,16,1,1,5,2,1,2,2,0.701644,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,0.734615,-0.186445,-0.239297,-0.057334,-0.058501,-0.059567,-0.060711,-0.062104,-0.063653,-0.065195,-0.066464,-0.045805,-0.04155,-0.028713,-0.021523,-0.014491,-0.011802,-0.010281,-0.004049,-0.001212,0.00064,0.0033,0.005504,0.004144,0.005717,0.006538,0.009469,0.006305,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,-0.109542
27,-0.299973,1,-0.268883,-0.966788,12,2011,2,2,2,1,1,2,2,0,0,0,0,0,0,1,3,1,2,2,1,0,0,0,0,0,0,0,0,3,2,2,3,1,1,1,2,2,2,0,0,0,0,0,1,2,1,2,3,1,20,-0.132447,0.953683,0,0,-0.114943,0,0.269083,0,0.367147,0,-0.116796,0,-0.681081,-0.105704,-0.114112,1,-0.269068,-0.966788,12,2011,0.087384,2,2,6,2,2,0,0,0,1,-0.083346,-0.094846,-0.090505,-0.305152,-0.714589,-0.09582,1,8,2,2,0,10,5,9,0,0,1,1,19,3,2,0,2,2,2,1,0,0,2,2,1,1,0,0,0,5,0.496061,-0.374221,-1.168725,1,0,0,0,1,-0.316392,-0.279575,-0.412347,-0.112562,11,0.098852,11,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,-0.282245,-0.440546,-0.619351,1,0,0,-0.063267,0,-0.114478,0,1,-0.056148,-0.136206,-0.130702,-0.129852,0.30941,-0.188809,0.298967,-0.265601,-0.282247,-0.069966,4,1,0,0,0,0,0,0,2,1,2,1,0,3,0.328979,1,0,0.031327,1,0.585649,210,0.147326,0,2,11,2,9,2,0.116745,3,1,11,11,0.071663,3,1,11,11,0.070826,3,2,11,11,0.069379,1,10,0.068308,1,1,10,10,0,0,3,16,1,1,5,2,2,1,4,-1.425262,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,-1.363109,-0.186445,-0.239297,0.155589,0.15629,0.156941,0.157626,0.158421,0.159285,0.160132,0.160801,0.180569,0.184823,0.195101,0.200647,0.206905,0.210372,0.214712,0.219297,0.221744,0.226526,0.228416,0.230787,0.232913,0.235029,0.237449,0.240402,0.242887,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,-0.802919
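
The sample standard deviations printed in the sanity check are 1.000005 rather than exactly 1. A likely explanation is the difference in degrees of freedom: `StandardScaler` divides by the population standard deviation (ddof=0), whereas pandas `.std()` defaults to the sample standard deviation (ddof=1). The sketch below checks the arithmetic for the row count in the printed shape, and reproduces the effect on made-up data. It standardises with plain NumPy as a stand-in for `StandardScaler`, and the toy values are assumptions.

```python
import numpy as np
import pandas as pd

# Row count taken from the shape printed in the scaling output
n = 106771

# Ratio between the sample (ddof=1) and population (ddof=0) standard deviation
ratio = np.sqrt(n / (n - 1))
print(round(float(ratio), 6))

# Reproduce the effect on toy data: standardise with the population std,
# which is what StandardScaler does, then measure with pandas' default std
rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=1000)
scaled = (x - x.mean()) / x.std(ddof=0)

pop_std = scaled.std(ddof=0)             # exactly 1 up to floating-point error
sample_std = pd.Series(scaled).std()     # ddof=1, slightly above 1
print(pop_std, sample_std, np.sqrt(1000 / 999))
```

So a value of 1.000005 is what correct scaling should look like at this sample size; it does not indicate a problem with the scaler.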


## Encoding categorical variables

Great, now we need to encode the categorical variables using one-hot encoding.

In [39]:
# Encode categorical variables using one-hot encoding, then show shape and head

# Identify categorical (discrete) columns from variable_summary
cat_cols = variable_summary.loc[variable_summary['WillBeDiscrete'], 'Column'].tolist()

# Defensive: keep only those present in X
cat_cols = [c for c in cat_cols if c in X.columns]
print(f'Found {len(cat_cols)} categorical column(s) to encode')

# If there are categorical columns, create dummies and merge with the rest of X
if len(cat_cols) > 0:
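    # Convert to string to ensure stable dummy names (preserve distinct categories).
    # Note: as strings, levels sort lexicographically ('-8' < '1' < '10' < '2'), which decides the level drop_first removes.
    cat_df = X[cat_cols].astype(str).apply(lambda s: s.str.replace(' ', '_'))
    # Create dummies; drop_first drops one level per variable so the dummies are not perfectly collinear (dummy-variable trap)
    dummies = pd.get_dummies(cat_df, prefix=cat_cols, prefix_sep='_', drop_first=True, dummy_na=False)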
    # Build new X: drop original categorical columns, concat dummies
    X_encoded = X.drop(columns=cat_cols).copy()
    # Ensure no column name collisions (rename if necessary)
    overlap = set(X_encoded.columns).intersection(dummies.columns)
    if overlap:
        # Rare: if a dummy name collides with existing column, prefix dummy names with 'dum_'
        dummies = dummies.rename(columns={c: f'dum_{c}' for c in dummies.columns})
    X_encoded = pd.concat([X_encoded, dummies], axis=1)
else:
    print('No categorical columns selected for encoding')
    X_encoded = X.copy()

# Replace X with encoded version for downstream modeling
X = X_encoded

print('Shape of X after encoding:', X.shape)
pd.set_option('display.max_columns', None)
display(X.head())

Found 206 categorical column(s) to encode
Shape of X after encoding: (106771, 1457)
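
One side effect of casting to string before calling `pd.get_dummies` is that category levels are sorted as text, not as numbers, so `drop_first=True` may drop a level other than the smallest code. This would be consistent with the encoded columns, where `agegr5_dv_4` is present but `agegr5_dv_10` is not. The sketch below shows the behaviour on a made-up column called `agegr`; the values are illustrative, not the real codes.

```python
import pandas as pd

# Toy integer-coded categorical (hypothetical values)
toy = pd.DataFrame({"agegr": [4, 5, 10, 11]})

# Encoding the integers directly: levels sort numerically, so 4 is the dropped reference level
as_int = pd.get_dummies(toy, columns=["agegr"], drop_first=True)
print(list(as_int.columns))

# Encoding after astype(str): levels sort as text ('10' < '11' < '4' < '5'),
# so '10' becomes the dropped reference level instead
as_str = pd.get_dummies(toy.astype(str), columns=["agegr"], drop_first=True)
print(list(as_str.columns))
```

For a lasso this mainly affects which level each coefficient is measured against, so it matters when reading coefficients for individual dummies rather than for the fit itself.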


Unnamed: 0,nbrsnci_dv,dvage,istrtdatd,istrtdatmm,istrtdatss,fimngrs_dv,fimnlabgrs_dv,fimnlabnet_dv,fiyrinvinc_dv,fibenothr_dv,j2pay_dv,j2paynet_dv,age_dv,intdatd_dv,doby_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,intdated,xpfood1_g3,xpfdout_g3,xpaltob_g3,fihhmngrs_dv,fihhmnlabgrs_dv,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ieqmoecd_dv,fihhmngrs_if,aintlen,ivtnc,b_hidp,c_hidp,d_hidp,e_hidp,g_hidp,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd,sex_2,lkmove_2,xpmove_2,health_2,aidxhh_2,j2has_2,bensta2_1,bensta3_1,bensta4_1,bensta5_1,bensta6_1,bensta7_1,bensta96_1,vote1_2,mobuse_2,employ_2,jbhas_2,j2pay_if_1,fimngrs_tc_1,fimnlabgrs_tc_1,fimnlabnet_tc_1,fiyrinvinc_tc_1,fibenothr_tc_1,pensioner_dv_2,urban_dv_2,xtra5min_dv_1,livesp_dv_1,cohab_dv_1,single_dv_1,depchl_dv_2,respm16_dv_2,respf16_dv_2,rach16_dv_2,qfhighfl_dv_1,scflag_dv_1,paygu_if_1,paynu_if_1,seearngrs_if_1,fiyrinvinc_if_1,fihhmngrs_tc_1,fihhmnlabgrs_tc_1,fihhnegsei_if_1,w6osmflag_1,dcsedfl_dv_2,genetics_1,epigenetics_1,xwdat_dv_3,bornuk_dv_2,evercoh_dv_2,evermar_dv_2,anychild_dv_2,finfut_2,finfut_3,respf16_1,respf16_2,respm16_1,respm16_2,sex_dv_1,sex_dv_2,npn_dv_1,npn_dv_2,npns_dv_1,npns_dv_2,ngrp_dv_1,ngrp_dv_2,hhresp_dv_2,hhresp_dv_3,ivfho_11,ivfho_12,nonepar_dv_1,nonepar_dv_2,vote6_2,vote6_3,vote6_4,country_2,country_3,country_4,ivh10_-2,ivh10_-9,ivh10_0,ivh1
1_-2,ivh11_-9,ivh11_0,ivh12_-2,ivh12_-9,ivh12_0,ivh13_-2,ivh13_-9,ivh13_0,ivh14_-2,ivh14_-9,ivh14_0,ivh15_-2,ivh15_-9,ivh15_0,ivh16_-2,ivh16_-9,ivh16_0,ctband_if_1,ctband_if_2,ctband_if_3,outcome_210,outcome_211,outcome_214,ethn_dv_source_2,ethn_dv_source_3,ethn_dv_source_4,finnow_2,finnow_3,finnow_4,finnow_5,ivh1_-2,ivh1_-9,ivh1_0,ivh1_1,ivh2_-2,ivh2_-9,ivh2_0,ivh2_1,ivh3_-2,ivh3_-9,ivh3_0,ivh3_1,ivh4_-2,ivh4_-9,ivh4_0,ivh4_1,ivh5_-2,ivh5_-9,ivh5_0,ivh5_1,ivh6_-2,ivh6_-9,ivh6_0,ivh6_1,ivh7_-2,ivh7_-9,ivh7_0,ivh7_1,ivh8_-2,ivh8_-9,ivh8_0,ivh8_1,ivh9_-2,ivh9_-9,ivh9_0,ivh9_1,fuelhave1_-2,fuelhave1_-9,fuelhave1_0,fuelhave1_1,fuelhave2_-2,fuelhave2_-9,fuelhave2_0,fuelhave2_1,fuelhave3_-2,fuelhave3_-9,fuelhave3_0,fuelhave3_1,fuelhave4_-2,fuelhave4_-9,fuelhave4_0,fuelhave4_1,fuelhave96_-2,fuelhave96_-9,fuelhave96_0,fuelhave96_1,fuelduel_-2,fuelduel_-8,fuelduel_1,fuelduel_2,heatch_-2,heatch_-9,heatch_1,heatch_2,pcnet_-2,pcnet_-8,pcnet_1,pcnet_2,ncouple_dv_1,ncouple_dv_2,ncouple_dv_3,ncouple_dv_4,nch34_dv_0,nch34_dv_1,nch34_dv_2,nch34_dv_3,npens_dv_1,npens_dv_2,npens_dv_3,npens_dv_4,nch10to15_1,nch10to15_2,nch10to15_3,nch10to15_4,nch10to15_5,npensioner_dv_0,npensioner_dv_1,npensioner_dv_2,npensioner_dv_3,npensioner_dv_4,marstat_dv_2,marstat_dv_3,marstat_dv_4,marstat_dv_5,marstat_dv_6,hiqual_dv_2,hiqual_dv_3,hiqual_dv_4,hiqual_dv_5,hiqual_dv_9,xphsdct_-2,xphsdct_-8,xphsdct_-9,xphsdct_1,xphsdct_2,xphsdba_-2,xphsdba_-9,xphsdba_1,xphsdba_2,xphsdba_3,cduse1_-2,cduse1_-8,cduse1_-9,cduse1_0,cduse1_1,cduse2_-2,cduse2_-8,cduse2_-9,cduse2_0,cduse2_1,cduse5_-2,cduse5_-8,cduse5_-9,cduse5_0,cduse5_1,cduse6_-2,cduse6_-8,cduse6_-9,cduse6_0,cduse6_1,cduse7_-2,cduse7_-8,cduse7_-9,cduse7_0,cduse7_1,cduse8_-2,cduse8_-8,cduse8_-9,cduse8_0,cduse8_1,cduse9_-2,cduse9_-8,cduse9_-9,cduse9_0,cduse9_1,cduse12_-2,cduse12_-8,cduse12_-9,cduse12_0,cduse12_1,cduse13_-2,cduse13_-8,cduse13_-9,cduse13_0,cduse13_1,cduse96_-2,cduse96_-8,cduse96_-9,cduse96_0,cduse96_1,n10to15_1,n10to15_2,n10to15_3,n10to15_4,n1
0to15_5,nch02_dv_0,nch02_dv_1,nch02_dv_2,nch02_dv_3,nch02_dv_5,nch1215_dv_0,nch1215_dv_1,nch1215_dv_2,nch1215_dv_3,nch1215_dv_4,fwenum_dv_2,fwenum_dv_3,fwenum_dv_4,fwenum_dv_5,fwenum_dv_6,generation_2,generation_3,generation_4,generation_5,generation_6,nadoptch_1,nadoptch_2,nadoptch_3,nadoptch_4,nadoptch_5,nadoptch_6,sclfsat1_2,sclfsat1_3,sclfsat1_4,sclfsat1_5,sclfsat1_6,sclfsat1_7,sclfsat2_2,sclfsat2_3,sclfsat2_4,sclfsat2_5,sclfsat2_6,sclfsat2_7,sclfsat7_2,sclfsat7_3,sclfsat7_4,sclfsat7_5,sclfsat7_6,sclfsat7_7,sclfsato_2,sclfsato_3,sclfsato_4,sclfsato_5,sclfsato_6,sclfsato_7,agegr10_dv_3,agegr10_dv_4,agegr10_dv_5,agegr10_dv_6,agegr10_dv_7,agegr10_dv_8,nch5to15_1,nch5to15_2,nch5to15_3,nch5to15_4,nch5to15_5,nch5to15_6,nch5to15_7,nch511_dv_0,nch511_dv_1,nch511_dv_2,nch511_dv_3,nch511_dv_4,nch511_dv_5,nch511_dv_6,nch14resp_1,nch14resp_2,nch14resp_3,nch14resp_4,nch14resp_5,nch14resp_6,nch14resp_7,nch14resp_9,nch415resp_1,nch415resp_2,nch415resp_3,nch415resp_4,nch415resp_5,nch415resp_6,nch415resp_7,nch415resp_8,marstat_2,marstat_3,marstat_4,marstat_5,marstat_6,marstat_7,marstat_8,marstat_9,pn2pno_2,pn2pno_3,pn2pno_4,pn2pno_5,pn2pno_6,pn2pno_7,pn2pno_8,pn2pno_9,pns2pno_2,pns2pno_3,pns2pno_4,pns2pno_5,pns2pno_6,pns2pno_7,pns2pno_8,pns2pno_9,hrpno_10,hrpno_2,hrpno_3,hrpno_4,hrpno_5,hrpno_6,hrpno_7,hrpno_8,grfpno_1,grfpno_2,grfpno_3,grfpno_4,grfpno_5,grfpno_6,grfpno_7,grfpno_8,hhintlang_-9,hhintlang_0,hhintlang_2,hhintlang_4,hhintlang_5,hhintlang_6,hhintlang_8,hhintlang_9,nemp_dv_1,nemp_dv_2,nemp_dv_3,nemp_dv_4,nemp_dv_5,nemp_dv_6,nemp_dv_7,nemp_dv_8,nchoecd_dv_1,nchoecd_dv_2,nchoecd_dv_3,nchoecd_dv_4,nchoecd_dv_5,nchoecd_dv_6,nchoecd_dv_7,nchoecd_dv_8,tenure_dv_1,tenure_dv_2,tenure_dv_3,tenure_dv_4,tenure_dv_5,tenure_dv_6,tenure_dv_7,tenure_dv_8,g_pno_2,g_pno_3,g_pno_4,g_pno_5,g_pno_6,g_pno_7,g_pno_8,g_pno_9,nchresp_1,nchresp_2,nchresp_3,nchresp_4,nchresp_5,nchresp_6,nchresp_7,nchresp_8,nchresp_9,nchunder16_1,nchunder16_2,nchunder16_3,nchunder16_4,nchunder16_5,nchunder16_6,
nchunder16_7,nchunder16_8,nchunder16_9,hgbiom_1,hgbiom_13,hgbiom_2,hgbiom_3,hgbiom_4,hgbiom_5,hgbiom_6,hgbiom_7,hgbiom_8,mastat_dv_10,mastat_dv_2,mastat_dv_3,mastat_dv_4,mastat_dv_5,mastat_dv_6,mastat_dv_7,mastat_dv_8,mastat_dv_9,buno_dv_13,buno_dv_2,buno_dv_3,buno_dv_4,buno_dv_5,buno_dv_6,buno_dv_7,buno_dv_8,buno_dv_9,nchild_dv_1,nchild_dv_2,nchild_dv_3,nchild_dv_4,nchild_dv_5,nchild_dv_6,nchild_dv_7,nchild_dv_8,nchild_dv_9,mnpno_1,mnpno_13,mnpno_2,mnpno_3,mnpno_4,mnpno_5,mnpno_6,mnpno_7,mnpno_8,mnspno_1,mnspno_13,mnspno_2,mnspno_3,mnspno_4,mnspno_5,mnspno_6,mnspno_7,mnspno_8,grmpno_1,grmpno_13,grmpno_2,grmpno_3,grmpno_4,grmpno_5,grmpno_6,grmpno_7,grmpno_8,hsownd_-2,hsownd_-8,hsownd_-9,hsownd_1,hsownd_2,hsownd_3,hsownd_4,hsownd_5,hsownd_97,nkids_dv_1,nkids_dv_2,nkids_dv_3,nkids_dv_4,nkids_dv_5,nkids_dv_6,nkids_dv_7,nkids_dv_8,nkids_dv_9,nue_dv_1,nue_dv_10,nue_dv_2,nue_dv_3,nue_dv_4,nue_dv_5,nue_dv_6,nue_dv_7,nue_dv_8,c_pno_10,c_pno_2,c_pno_3,c_pno_4,c_pno_5,c_pno_6,c_pno_7,c_pno_8,c_pno_9,d_pno_10,d_pno_2,d_pno_3,d_pno_4,d_pno_5,d_pno_6,d_pno_7,d_pno_8,d_pno_9,nnatch_1,nnatch_10,nnatch_2,nnatch_3,nnatch_4,nnatch_5,nnatch_6,nnatch_7,nnatch_8,nnatch_9,hgbiof_1,hgbiof_10,hgbiof_2,hgbiof_3,hgbiof_4,hgbiof_5,hgbiof_6,hgbiof_7,hgbiof_8,hgbiof_9,pn1pno_1,pn1pno_10,pn1pno_13,pn1pno_2,pn1pno_3,pn1pno_4,pn1pno_5,pn1pno_6,pn1pno_7,pn1pno_8,pns1pno_1,pns1pno_10,pns1pno_13,pns1pno_2,pns1pno_3,pns1pno_4,pns1pno_5,pns1pno_6,pns1pno_7,pns1pno_8,ppno_1,ppno_11,ppno_2,ppno_3,ppno_4,ppno_5,ppno_6,ppno_7,ppno_8,ppno_9,sppno_1,sppno_11,sppno_2,sppno_3,sppno_4,sppno_5,sppno_6,sppno_7,sppno_8,sppno_9,fnpno_1,fnpno_10,fnpno_2,fnpno_3,fnpno_4,fnpno_5,fnpno_6,fnpno_7,fnpno_8,fnpno_9,fnspno_1,fnspno_10,fnspno_2,fnspno_3,fnspno_4,fnspno_5,fnspno_6,fnspno_7,fnspno_8,fnspno_9,ctband_dv_1,ctband_dv_10,ctband_dv_2,ctband_dv_3,ctband_dv_4,ctband_dv_5,ctband_dv_6,ctband_dv_7,ctband_dv_8,ctband_dv_9,b_pno_10,b_pno_11,b_pno_2,b_pno_3,b_pno_4,b_pno_5,b_pno_6,b_pno_7,b_pno_8,b_pno_9,istrtdatm_10,istrtd
atm_11,istrtdatm_12,istrtdatm_2,istrtdatm_3,istrtdatm_4,istrtdatm_5,istrtdatm_6,istrtdatm_7,istrtdatm_8,istrtdatm_9,intdatm_dv_10,intdatm_dv_11,intdatm_dv_12,intdatm_dv_2,intdatm_dv_3,intdatm_dv_4,intdatm_dv_5,intdatm_dv_6,intdatm_dv_7,intdatm_dv_8,intdatm_dv_9,nnsib_dv_1,nnsib_dv_10,nnsib_dv_11,nnsib_dv_2,nnsib_dv_3,nnsib_dv_4,nnsib_dv_5,nnsib_dv_6,nnsib_dv_7,nnsib_dv_8,nnsib_dv_9,nnssib_dv_1,nnssib_dv_10,nnssib_dv_11,nnssib_dv_2,nnssib_dv_3,nnssib_dv_4,nnssib_dv_5,nnssib_dv_6,nnssib_dv_7,nnssib_dv_8,nnssib_dv_9,gor_dv_10,gor_dv_11,gor_dv_12,gor_dv_2,gor_dv_3,gor_dv_4,gor_dv_5,gor_dv_6,gor_dv_7,gor_dv_8,gor_dv_9,agegr5_dv_11,agegr5_dv_12,agegr5_dv_13,agegr5_dv_14,agegr5_dv_15,agegr5_dv_4,agegr5_dv_5,agegr5_dv_6,agegr5_dv_7,agegr5_dv_8,agegr5_dv_9,agegr13_dv_11,agegr13_dv_12,agegr13_dv_13,agegr13_dv_2,agegr13_dv_3,agegr13_dv_4,agegr13_dv_5,agegr13_dv_6,agegr13_dv_7,agegr13_dv_8,agegr13_dv_9,intdatem_10,intdatem_11,intdatem_12,intdatem_2,intdatem_3,intdatem_4,intdatem_5,intdatem_6,intdatem_7,intdatem_8,intdatem_9,nwage_dv_1,nwage_dv_10,nwage_dv_11,nwage_dv_2,nwage_dv_3,nwage_dv_4,nwage_dv_5,nwage_dv_6,nwage_dv_7,nwage_dv_8,nwage_dv_9,nadoecd_dv_10,nadoecd_dv_11,nadoecd_dv_12,nadoecd_dv_2,nadoecd_dv_3,nadoecd_dv_4,nadoecd_dv_5,nadoecd_dv_6,nadoecd_dv_7,nadoecd_dv_8,nadoecd_dv_9,fwintvd_dv_10,fwintvd_dv_11,fwintvd_dv_12,fwintvd_dv_2,fwintvd_dv_3,fwintvd_dv_4,fwintvd_dv_5,fwintvd_dv_6,fwintvd_dv_7,fwintvd_dv_8,fwintvd_dv_9,istrtdaty_2010,istrtdaty_2011,istrtdaty_2012,istrtdaty_2013,istrtdaty_2014,istrtdaty_2015,istrtdaty_2016,istrtdaty_2017,istrtdaty_2018,istrtdaty_2019,istrtdaty_2020,istrtdaty_2021,istrtdaty_2022,jbstat_10,jbstat_11,jbstat_12,jbstat_13,jbstat_2,jbstat_3,jbstat_4,jbstat_5,jbstat_6,jbstat_7,jbstat_8,jbstat_9,jbstat_97,intdaty_dv_2010,intdaty_dv_2011,intdaty_dv_2012,intdaty_dv_2013,intdaty_dv_2014,intdaty_dv_2015,intdaty_dv_2016,intdaty_dv_2017,intdaty_dv_2018,intdaty_dv_2019,intdaty_dv_2020,intdaty_dv_2021,intdaty_dv_2022,intdatey_2010,intdatey_2011,intd
atey_2012,intdatey_2013,intdatey_2014,intdatey_2015,intdatey_2016,intdatey_2017,intdatey_2018,intdatey_2019,intdatey_2020,intdatey_2021,intdatey_2022,lwenum_dv_10,lwenum_dv_11,lwenum_dv_12,lwenum_dv_13,lwenum_dv_14,lwenum_dv_2,lwenum_dv_3,lwenum_dv_4,lwenum_dv_5,lwenum_dv_6,lwenum_dv_7,lwenum_dv_8,lwenum_dv_9,lwintvd_dv_10,lwintvd_dv_11,lwintvd_dv_12,lwintvd_dv_13,lwintvd_dv_14,lwintvd_dv_2,lwintvd_dv_3,lwintvd_dv_4,lwintvd_dv_5,lwintvd_dv_6,lwintvd_dv_7,lwintvd_dv_8,lwintvd_dv_9,hhsize_10,hhsize_11,hhsize_12,hhsize_13,hhsize_14,hhsize_15,hhsize_2,hhsize_3,hhsize_4,hhsize_5,hhsize_6,hhsize_7,hhsize_8,hhsize_9,racel_dv_10,racel_dv_11,racel_dv_12,racel_dv_13,racel_dv_14,racel_dv_15,racel_dv_16,racel_dv_17,racel_dv_2,racel_dv_4,racel_dv_5,racel_dv_6,racel_dv_7,racel_dv_8,racel_dv_9,racel_dv_97,ethn_dv_10,ethn_dv_11,ethn_dv_12,ethn_dv_13,ethn_dv_14,ethn_dv_15,ethn_dv_16,ethn_dv_17,ethn_dv_2,ethn_dv_4,ethn_dv_5,ethn_dv_6,ethn_dv_7,ethn_dv_8,ethn_dv_9,ethn_dv_97,hhtype_dv_10,hhtype_dv_11,hhtype_dv_12,hhtype_dv_16,hhtype_dv_17,hhtype_dv_18,hhtype_dv_19,hhtype_dv_2,hhtype_dv_20,hhtype_dv_21,hhtype_dv_22,hhtype_dv_23,hhtype_dv_3,hhtype_dv_4,hhtype_dv_5,hhtype_dv_6,hhtype_dv_8,scend_dv_10,scend_dv_11,scend_dv_116,scend_dv_12,scend_dv_13,scend_dv_14,scend_dv_15,scend_dv_16,scend_dv_17,scend_dv_18,scend_dv_19,scend_dv_20,scend_dv_21,scend_dv_22,scend_dv_23,scend_dv_24,scend_dv_7,b_ivfho_11,b_ivfho_12,b_ivfho_13,b_ivfho_39,b_ivfho_50,b_ivfho_51,b_ivfho_52,b_ivfho_53,b_ivfho_54,b_ivfho_56,b_ivfho_59,b_ivfho_60,b_ivfho_61,b_ivfho_62,b_ivfho_81,b_ivfho_91,b_ivfho_96,b_ivfho_97,hsbeds_-2,hsbeds_-8,hsbeds_-9,hsbeds_0,hsbeds_1,hsbeds_10,hsbeds_11,hsbeds_12,hsbeds_15,hsbeds_2,hsbeds_23,hsbeds_29,hsbeds_3,hsbeds_4,hsbeds_5,hsbeds_6,hsbeds_7,hsbeds_8,hsbeds_9,hsrooms_-2,hsrooms_-8,hsrooms_-9,hsrooms_1,hsrooms_10,hsrooms_12,hsrooms_13,hsrooms_15,hsrooms_16,hsrooms_2,hsrooms_20,hsrooms_3,hsrooms_4,hsrooms_5,hsrooms_6,hsrooms_60,hsrooms_7,hsrooms_8,hsrooms_9,d_ivfho_11,d_ivfho_12,d_ivfho_13
,d_ivfho_39,d_ivfho_50,d_ivfho_51,d_ivfho_53,d_ivfho_55,d_ivfho_56,d_ivfho_59,d_ivfho_60,d_ivfho_61,d_ivfho_62,d_ivfho_63,d_ivfho_65,d_ivfho_80,d_ivfho_81,d_ivfho_91,d_ivfho_92,ncars_-2,ncars_-9,ncars_0,ncars_1,ncars_10,ncars_11,ncars_12,ncars_15,ncars_19,ncars_2,ncars_3,ncars_30,ncars_34,ncars_4,ncars_5,ncars_6,ncars_7,ncars_8,ncars_9,ncars_97,c_ivfho_11,c_ivfho_12,c_ivfho_13,c_ivfho_50,c_ivfho_51,c_ivfho_52,c_ivfho_53,c_ivfho_54,c_ivfho_55,c_ivfho_56,c_ivfho_59,c_ivfho_60,c_ivfho_61,c_ivfho_62,c_ivfho_63,c_ivfho_65,c_ivfho_80,c_ivfho_81,c_ivfho_96,c_ivfho_97,b_ivfio_10,b_ivfio_11,b_ivfio_14,b_ivfio_15,b_ivfio_16,b_ivfio_18,b_ivfio_2,b_ivfio_21,b_ivfio_22,b_ivfio_24,b_ivfio_25,b_ivfio_50,b_ivfio_51,b_ivfio_53,b_ivfio_54,b_ivfio_60,b_ivfio_61,b_ivfio_63,b_ivfio_80,b_ivfio_83,b_ivfio_9,issue_num_1,issue_num_10,issue_num_11,issue_num_12,issue_num_13,issue_num_14,issue_num_15,issue_num_16,issue_num_17,issue_num_18,issue_num_19,issue_num_2,issue_num_23,issue_num_25,issue_num_26,issue_num_3,issue_num_4,issue_num_5,issue_num_6,issue_num_7,issue_num_8,issue_num_9,c_ivfio_10,c_ivfio_11,c_ivfio_14,c_ivfio_15,c_ivfio_16,c_ivfio_18,c_ivfio_2,c_ivfio_21,c_ivfio_24,c_ivfio_25,c_ivfio_50,c_ivfio_51,c_ivfio_52,c_ivfio_53,c_ivfio_54,c_ivfio_55,c_ivfio_57,c_ivfio_60,c_ivfio_63,c_ivfio_67,c_ivfio_83,c_ivfio_9,e_ivfho_11,e_ivfho_12,e_ivfho_13,e_ivfho_50,e_ivfho_51,e_ivfho_53,e_ivfho_55,e_ivfho_56,e_ivfho_59,e_ivfho_60,e_ivfho_61,e_ivfho_62,e_ivfho_63,e_ivfho_65,e_ivfho_80,e_ivfho_81,e_ivfho_90,e_ivfho_91,e_ivfho_92,e_ivfho_93,e_ivfho_96,e_ivfho_97,istrtdathh_1,istrtdathh_10,istrtdathh_11,istrtdathh_12,istrtdathh_13,istrtdathh_14,istrtdathh_15,istrtdathh_16,istrtdathh_17,istrtdathh_18,istrtdathh_19,istrtdathh_2,istrtdathh_20,istrtdathh_21,istrtdathh_22,istrtdathh_23,istrtdathh_3,istrtdathh_4,istrtdathh_5,istrtdathh_6,istrtdathh_7,istrtdathh_8,istrtdathh_9,b_month_10,b_month_11,b_month_12,b_month_13,b_month_14,b_month_15,b_month_16,b_month_17,b_month_18,b_month_19,b_month_2,b_month_20,b
_month_21,b_month_22,b_month_23,b_month_24,b_month_3,b_month_4,b_month_5,b_month_6,b_month_7,b_month_8,b_month_9,c_month_10,c_month_11,c_month_12,c_month_13,c_month_14,c_month_15,c_month_16,c_month_17,c_month_18,c_month_19,c_month_2,c_month_20,c_month_21,c_month_22,c_month_23,c_month_24,c_month_3,c_month_4,c_month_5,c_month_6,c_month_7,c_month_8,c_month_9,d_ivfio_10,d_ivfio_11,d_ivfio_14,d_ivfio_15,d_ivfio_16,d_ivfio_18,d_ivfio_2,d_ivfio_21,d_ivfio_24,d_ivfio_25,d_ivfio_50,d_ivfio_51,d_ivfio_52,d_ivfio_53,d_ivfio_54,d_ivfio_55,d_ivfio_57,d_ivfio_60,d_ivfio_63,d_ivfio_67,d_ivfio_80,d_ivfio_81,d_ivfio_9,d_month_10,d_month_11,d_month_12,d_month_13,d_month_14,d_month_15,d_month_16,d_month_17,d_month_18,d_month_19,d_month_2,d_month_20,d_month_21,d_month_22,d_month_23,d_month_24,d_month_3,d_month_4,d_month_5,d_month_6,d_month_7,d_month_8,d_month_9,g_ivfio_10,g_ivfio_11,g_ivfio_14,g_ivfio_15,g_ivfio_16,g_ivfio_2,g_ivfio_21,g_ivfio_24,g_ivfio_25,g_ivfio_50,g_ivfio_51,g_ivfio_52,g_ivfio_53,g_ivfio_54,g_ivfio_57,g_ivfio_60,g_ivfio_63,g_ivfio_80,g_ivfio_81,g_ivfio_83,g_ivfio_84,g_ivfio_9,g_ivfio_99,g_month_10,g_month_11,g_month_12,g_month_13,g_month_14,g_month_15,g_month_16,g_month_17,g_month_18,g_month_19,g_month_2,g_month_20,g_month_21,g_month_22,g_month_23,g_month_24,g_month_3,g_month_4,g_month_5,g_month_6,g_month_7,g_month_8,g_month_9,g_ivfho_11,g_ivfho_12,g_ivfho_13,g_ivfho_14,g_ivfho_16,g_ivfho_39,g_ivfho_50,g_ivfho_51,g_ivfho_52,g_ivfho_53,g_ivfho_54,g_ivfho_55,g_ivfho_59,g_ivfho_60,g_ivfho_61,g_ivfho_62,g_ivfho_63,g_ivfho_65,g_ivfho_66,g_ivfho_81,g_ivfho_90,g_ivfho_91,g_ivfho_96,g_ivfho_97,e_ivfio_10,e_ivfio_11,e_ivfio_14,e_ivfio_15,e_ivfio_16,e_ivfio_18,e_ivfio_2,e_ivfio_21,e_ivfio_24,e_ivfio_25,e_ivfio_50,e_ivfio_51,e_ivfio_52,e_ivfio_53,e_ivfio_54,e_ivfio_55,e_ivfio_57,e_ivfio_60,e_ivfio_62,e_ivfio_63,e_ivfio_67,e_ivfio_80,e_ivfio_81,e_ivfio_82,e_ivfio_83,e_ivfio_9,e_ivfio_99
8,-0.440636,-0.730395,1.841032,-0.660555,0.433265,0.753119,1.093031,1.173913,-0.140384,-0.578007,-0.105704,-0.114112,-0.730616,1.841032,0.910605,-0.083346,-0.094846,-0.096086,-0.305152,-0.510221,0.269495,0.187421,0.55269,0.290543,-0.316392,-0.279575,-0.412347,-0.112562,1.830483,0.723646,0.055377,-0.619351,0.578201,0.544842,0.663123,0.625091,-0.130702,-0.129852,-0.140547,0.991024,0.26687,0.767635,1.457195,0.485031,0.821956,1.340219,-0.527096,0.798744,0.120716,0.075896,0.074458,0.072513,0.071291,0.701625,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,-1.363109,-0.186445,-0.239297,0.085939,0.085984,0.086015,0.086112,0.086206,0.086267,0.086321,0.086364,0.106416,0.11067,0.12179,0.127877,0.13438,0.137584,0.140994,0.146117,0.148689,0.152505,0.154641,0.156956,0.157936,0.159864,0.161757,0.164697,0.16532,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,3.596949,True,True,True,False,True,True,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,True,True,False,True,False,False,True,False,True,False,True,True,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,True,False,False,True,True,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,Fal
se,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False
,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False
,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,Fa
lse,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
9,-2.409916,-0.557328,-0.381826,-0.484519,0.028494,-0.911076,-0.75163,-0.818101,-0.140384,-0.265487,-0.105704,-0.114112,-0.557535,-0.381826,0.910605,-0.083346,-0.094846,-0.096086,-0.305152,0.109423,-0.455727,-1.047141,1.040586,-3.611892,-0.316392,-0.279575,-0.412347,-0.112562,1.368714,0.723646,1.037402,0.591102,0.059409,-0.177171,0.171333,-0.186583,-0.130702,-0.129852,-0.146551,-0.381974,4.661875,0.743419,-0.002362,0.036173,0.821956,1.783508,-0.478437,0.31018,0.120716,0.075896,0.074458,0.072513,0.071291,0.701625,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,-1.363109,-0.186445,-0.239297,0.085939,0.085984,0.086015,0.086112,0.086206,0.086267,0.086321,0.086364,0.106416,0.11067,0.12179,0.127877,0.13438,0.137584,0.140994,0.146117,0.148689,0.152505,0.154641,0.156956,0.157936,0.159864,0.161757,0.164697,0.16532,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,3.596949,True,True,False,True,False,True,False,False,False,False,False,False,True,True,False,True,True,False,False,False,False,False,False,True,True,False,True,False,False,True,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,True,False,False,True,True,False,True,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False,True,False,False,False,True,False,True,False,False,False,True,False,False,False,True,False,False,False,True,Fa
lse,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,Tr
ue,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False
,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,Fa
lse,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
18,-0.440636,-1.076528,0.320129,-0.132447,-1.301465,-0.851091,-0.468782,-0.406823,-0.140384,-0.681081,-0.105704,-0.114112,-1.076776,0.320129,1.028209,-0.083346,-0.094846,-0.096086,-0.305152,-0.714589,-0.419269,0.804701,0.40786,0.060396,-0.316392,-0.279575,-0.412347,-0.112562,1.13783,0.107795,0.055377,-0.619351,0.681966,0.710568,0.625268,0.668363,-0.130702,-0.129852,0.574681,-0.381974,-0.500109,-0.265601,-0.282247,0.574808,1.150608,2.095875,-0.423366,2.915853,-1.088965,-1.125452,-1.125229,-1.126584,-1.127254,0.701644,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,0.734615,-0.186445,-0.239297,-0.057334,-0.058501,-0.059567,-0.060711,-0.062104,-0.063653,-0.065195,-0.066464,-0.045805,-0.04155,-0.028713,-0.021523,-0.014491,-0.011802,-0.010281,-0.004049,-0.001212,0.00064,0.0033,0.005504,0.004144,0.005717,0.006538,0.009469,0.006305,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,-0.109542,False,True,True,True,True,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,True,True,True,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,True,False,True,False,False,False,False,False,False,True,False,True,False,False,True,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,True,False,False,True,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,True,False,False,False,True,False,False,False,False,Fal
se,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,Fals
e,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,
False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,Fals
e,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
19,-0.018647,-0.730395,1.373062,0.043589,-1.301465,-0.872443,-0.490303,-0.438116,-0.140384,-0.681081,0.320406,0.444934,-0.730616,1.373062,1.028209,-0.083346,-0.094846,-0.096086,-0.305152,-0.714589,-0.432246,0.804701,0.335445,0.544809,-0.316392,-0.279575,-0.412347,-0.112562,1.368714,-0.097489,-0.435635,-0.619351,0.028337,0.149643,-0.004052,0.182934,-0.130702,-0.129852,-0.146551,-0.381974,-0.874579,-0.265601,-0.282247,0.009289,-0.492649,2.219805,-0.418085,0.635889,-1.088965,-1.125452,-1.125229,-1.126584,-1.127254,0.701644,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,0.734615,-0.186445,-0.239297,-0.057334,-0.058501,-0.059567,-0.060711,-0.062104,-0.063653,-0.065195,-0.066464,-0.045805,-0.04155,-0.028713,-0.021523,-0.014491,-0.011802,-0.010281,-0.004049,-0.001212,0.00064,0.0033,0.005504,0.004144,0.005717,0.006538,0.009469,0.006305,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,-0.109542,False,True,True,True,True,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,True,True,True,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,True,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,True,False,False,True,False,False,True,False,False,True,False,False,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,False,False,True,False,False,False,Fal
se,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,Fal
se,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
27,-0.299973,-0.268883,-0.966788,-0.132447,0.953683,-0.114943,0.269083,0.367147,-0.116796,-0.681081,-0.105704,-0.114112,-0.269068,-0.966788,0.087384,-0.083346,-0.094846,-0.090505,-0.305152,-0.714589,-0.09582,0.496061,-0.374221,-1.168725,-0.316392,-0.279575,-0.412347,-0.112562,0.098852,-0.282245,-0.440546,-0.619351,-0.063267,-0.114478,-0.056148,-0.136206,-0.130702,-0.129852,0.30941,-0.188809,0.298967,-0.265601,-0.282247,-0.069966,0.328979,0.031327,0.585649,0.147326,0.116745,0.071663,0.070826,0.069379,0.068308,-1.425262,-0.248481,-0.296052,-0.250578,-0.294834,-0.258463,-0.196618,-1.363109,-0.186445,-0.239297,0.155589,0.15629,0.156941,0.157626,0.158421,0.159285,0.160132,0.160801,0.180569,0.184823,0.195101,0.200647,0.206905,0.210372,0.214712,0.219297,0.221744,0.226526,0.228416,0.230787,0.232913,0.235029,0.237449,0.240402,0.242887,0.00306,0.008455,0.017376,0.019656,0.025612,0.031429,0.034611,-0.802919,False,True,True,False,True,True,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,True,True,True,True,False,True,False,False,False,True,False,False,False,False,True,False,False,True,False,True,True,False,False,False,False,True,False,True,True,False,False,True,False,True,False,False,True,False,True,False,False,False,True,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,True,False,False,True,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False,True,False,False,False,True,False,True,False,False,False,True,False,False,False,False,True,False,False,Fa
lse,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,Fal
se,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False
,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,Fa
lse,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


## Fitting the lasso model

Let's move on to fitting the lasso model!

In [40]:
# Build and run Lasso, then map nonzero coefficients back to variable labels (removed BaselineCategory)
from sklearn.linear_model import LassoCV
import numpy as np
import pandas as pd

# Fit Lasso with cross-validation
lasso = LassoCV(cv=5, random_state=0, max_iter=10000)
lasso.fit(X, Y)

# Get nonzero coefficients sorted by absolute value
coefs = pd.Series(lasso.coef_, index=X.columns)
nonzero_coefs = coefs[coefs != 0].reindex(coefs[coefs != 0].abs().sort_values(ascending=False).index)

# Create label mapping from Stata metadata
label_map = dict(zip(meta.column_names, meta.column_labels)) if hasattr(meta, 'column_labels') else {}
original_vars = set(meta.column_names) if hasattr(meta, 'column_names') else set()

def parse_variable(varname):
    """Parse variable to get base variable, category, and labels (no baseline category)."""
    # Find base variable and category
    if varname in original_vars:
        base_var, category = varname, ''
    elif '_' in varname:
        base, cat = varname.rsplit('_', 1)
        if base in original_vars:
            base_var, category = base, cat
        else:
            # Try progressively shorter prefixes for variables with underscores
            parts = varname.split('_')
            base_var, category = varname, ''
            for i in range(len(parts) - 1, 0, -1):
                potential_base = '_'.join(parts[:i])
                if potential_base in original_vars:
                    base_var, category = potential_base, '_'.join(parts[i:])
                    break
    else:
        base_var, category = varname, ''

    # Get variable label
    variable_label = label_map.get(base_var, base_var)

    # Get category label from value labels
    category_label = ''
    if category and hasattr(meta, 'variable_value_labels') and base_var in meta.variable_value_labels:
        value_dict = meta.variable_value_labels[base_var]
        for cat_type in [int, float, str]:
            try:
                cat_key = cat_type(category)
                if cat_key in value_dict:
                    category_label = value_dict[cat_key]
                    break
            except (ValueError, TypeError):
                continue

    return base_var, variable_label, category, category_label

# Build results table (no BaselineCategory)
results_data = []
for var in nonzero_coefs.index:
    base_var, variable_label, category, category_label = parse_variable(var)
    results_data.append({
        'Variable': var,
        'Label': variable_label,
        'Category': category,
        'CategoryLabel': category_label,
        'Coefficient': nonzero_coefs[var]
    })

# Display results with formatted CategoryLabel and wider Coefficient display
result_df = pd.DataFrame(results_data)
pd.set_option('display.max_rows', None)
print(f"Lasso found {len(result_df)} significant variables (nonzero coefficients):")

# Prepare display copy: truncate long category labels and format coefficients for readability
max_cat_len = 40  # max characters to show for category labels
result_df['CategoryLabel'] = result_df['CategoryLabel'].astype(str).apply(
    lambda s: s if len(s) <= max_cat_len else s[:max_cat_len-3] + '...'
)

# Keep numeric coefficient for downstream use, but create a formatted display version
result_df['Coefficient'] = result_df['Coefficient'].astype(float)
result_df_display = result_df.copy()
result_df_display['Coefficient'] = result_df_display['Coefficient'].map(lambda v: f"{v: .6f}")

# Show a compact, clearly formatted table
display_cols = ['Variable', 'Label', 'Category', 'CategoryLabel', 'Coefficient']
pd.set_option('display.max_colwidth', 50)
display(result_df_display[display_cols])

Lasso found 206 significant variables (nonzero coefficients):


Unnamed: 0,Variable,Label,Category,CategoryLabel,Coefficient
0,sf12mcs_dv,SF-12 Mental Component Summary (MCS),,,-3.450475
1,sclfsato_7,Satisfaction with life overall,7.0,completely satisfied,-2.429473
2,finnow_5,Subjective financial situation - current,5.0,Finding it very difficult,2.008797
3,sclfsato_6,Satisfaction with life overall,6.0,mostly satisfied,-1.895336
4,sclfsato_5,Satisfaction with life overall,5.0,somewhat satisfied,-1.239503
5,finnow_4,Subjective financial situation - current,4.0,Finding it quite difficult,1.033134
6,sf12pcs_dv,SF-12 Physical Component Summary (PCS),,,-0.964695
7,sclfsato_4,Satisfaction with life overall,4.0,Neither Sat nor Dissat,-0.691199
8,finfut_2,Subjective financial situation - future,2.0,Worse of than now,0.572704
9,istrtdaty_2011,Individual interview start date (year),2011.0,,-0.381945
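
Because each categorical variable is spread across several dummy columns, it can help to collapse the nonzero coefficients back to their base variable before reading the table. The sketch below is only an illustration: the coefficient values are copied from the rows shown above, `hhsize` is given a zero coefficient purely to show the filtering, and the helper `base_of` is a hypothetical simplification of `parse_variable`.

```python
import pandas as pd

# Illustrative coefficients: names and values mimic a few rows of the table above.
# The zero for hhsize is an assumption, included only to show that zeros are dropped.
coefs = pd.Series({
    "sf12mcs_dv": -3.450475,
    "sclfsato_7": -2.429473,
    "sclfsato_6": -1.895336,
    "finnow_5": 2.008797,
    "finnow_4": 1.033134,
    "hhsize": 0.0,
})

# Stand-in for set(meta.column_names)
original_vars = {"sf12mcs_dv", "sclfsato", "finnow", "hhsize"}

def base_of(name):
    """Return the longest prefix of `name` that is an original variable (hypothetical helper)."""
    if name in original_vars:
        return name
    parts = name.split("_")
    for i in range(len(parts) - 1, 0, -1):
        candidate = "_".join(parts[:i])
        if candidate in original_vars:
            return candidate
    return name

# Keep nonzero coefficients, then count dummies and take the largest |coefficient| per base variable
nonzero = coefs[coefs != 0]
summary = (
    nonzero.abs()
    .groupby(base_of)  # the function is applied to each index label
    .agg(["count", "max"])
    .sort_values("max", ascending=False)
)
print(summary)
```

A summary like this makes it easier to see whether a many-category variable was selected at all, rather than scanning every dummy row.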


## Analysis

**Very interesting outcome! Here are my key takeaways:**

Already included in my refined variable list:

- Neighbourhood social cohesion (nbrsnci_dv) is selected by the lasso as a predictor of subjective well-being (scghq1_dv), as expected
- Demographics are important (e.g. age, gender, ethnicity)
- Education is important (highest qualification)
- Current economic activity is important

Variables to consider adding:
- Subjective financial situation (finnow, finfut): subjective financial strain is clearly a key predictor of well-being
- hhsize (household size)
- tenure_dv (housing tenure)

Variables I could consider deleting from my refined variable list:
- Job industry (not present here, and will be annoying due to lots of dummies)

Noteworthy variables:
- sclfsat* variables (e.g. satisfaction with life, health, income) are strong predictors of wellbeing, but they are effectively alternative wellbeing outcomes. Including them risks circularity (i.e. predicting wellbeing with other measures of wellbeing), so I will exclude them.
- sf12mcs_dv (SF-12 mental health score) and sf12pcs_dv (SF-12 physical health score) are also strong predictors of wellbeing. They are arguably alternative measures of wellbeing too, so including them risks circularity (over-control).
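
One way to act on the circularity concern would be to drop the satisfaction and SF-12 columns by name prefix before refitting the lasso. This is a minimal sketch on a toy design matrix: the column names and values are illustrative stand-ins, and the prefixes `sclfsat` and `sf12` are assumed to match only the variables discussed above.

```python
import pandas as pd

# Toy design matrix; column names and values are illustrative, not the real data
X_toy = pd.DataFrame({
    "sclfsato_7": [1, 0, 0],
    "sclfsat1_3": [0, 1, 0],
    "sf12mcs_dv": [0.2, -1.1, 0.5],
    "sf12pcs_dv": [1.0, 0.3, -0.7],
    "finnow_5": [0, 0, 1],
    "nbrsnci_dv": [0.4, -0.2, 1.3],
})

# Prefixes of the variables treated as alternative wellbeing measures (assumption)
circular_prefixes = ("sclfsat", "sf12")

# str.startswith accepts a tuple, so one pass covers both prefixes
circular_cols = [c for c in X_toy.columns if c.startswith(circular_prefixes)]
X_reduced = X_toy.drop(columns=circular_cols)

print("Dropped:", circular_cols)
print("Kept:", list(X_reduced.columns))
```

Refitting on the reduced matrix would then show which predictors still matter once the alternative wellbeing measures no longer absorb the signal.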

**Final list of variables to include in my refined variable list:**

- pidp : Cross-wave person identifier (public release)
- wave : 
- wave_num : 
- hidp : Household identifier (public release)

- nbrsnci_dv : Buckner's Neighbourhood Cohesion Instrument (**X**)
- gor_dv : Government Office Region
- urban_dv : Urban or rural area, derived
- sex_dv : Sex, derived
- age_dv : Age, derived from dob_dv and intdat_dv
- ethn_dv : Ethnic group (derived from multiple sources)
- marstat_dv : Harmonised de facto marital status
- jbstat : Current economic activity
- qfhigh_dv : Highest educational qualification ever reported
- fimnnet_dv : total net personal income - no deductions
- fihhmnnet1_dv : total household net income - no deductions
- houscost1_dv : monthly housing cost including mortgage principal payments
- health : Long-standing illness or disability
- scghq1_dv : Subjective wellbeing (GHQ): Likert (**Y**)
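
Subsetting the processed dataset to this list could look roughly like the sketch below. It uses a toy frame in place of the real data (the names `df_toy` and `df_refined` are hypothetical) and reports any listed variable that is absent instead of raising a `KeyError`.

```python
import pandas as pd

# Final refined variable list, as given above
final_vars = [
    "pidp", "wave", "wave_num", "hidp",
    "nbrsnci_dv", "gor_dv", "urban_dv", "sex_dv", "age_dv", "ethn_dv",
    "marstat_dv", "jbstat", "qfhigh_dv", "fimnnet_dv", "fihhmnnet1_dv",
    "houscost1_dv", "health", "scghq1_dv",
]

# Toy frame standing in for the processed dataset: it deliberately lacks
# "health" and carries one extra column, to exercise both cases
df_toy = pd.DataFrame({v: [0, 1] for v in final_vars if v != "health"})
df_toy["hhsize"] = [2, 4]

# Split the list into variables that exist and variables that are missing
missing = [v for v in final_vars if v not in df_toy.columns]
present = [v for v in final_vars if v in df_toy.columns]

df_refined = df_toy[present]

print("Missing from data:", missing)
print("Refined shape:", df_refined.shape)
```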


