# Introduction

This notebook looks at the large processed dataset (343 variables, 175,335 observations) and runs a lasso regression to find which variables are most predictive of subjective wellbeing, measured by the GHQ Likert score (`scghq1_dv`).
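The lasso fits a linear model while penalising the sum of absolute coefficient sizes. This pushes the coefficients of weak predictors to exactly zero, and the variables that survive are the "most predictive" ones. In practice scikit-learn's `Lasso`/`LassoCV` would be the usual tool. The sketch below instead implements plain cyclic coordinate descent in NumPy so the mechanics are visible. It assumes standardised predictors and a fixed penalty `alpha`, and `lasso_cd` is a hypothetical helper, not the method this notebook necessarily ends up using.

```python
import numpy as np


def soft_threshold(z: float, gamma: float) -> float:
    """Shrink z towards zero by gamma, returning exactly 0.0 when |z| <= gamma."""
    return float(np.sign(z) * max(abs(z) - gamma, 0.0))


def lasso_cd(X, y, alpha: float, max_iter: int = 1000, tol: float = 1e-8) -> np.ndarray:
    """Lasso via cyclic coordinate descent (hypothetical helper, for illustration).

    Minimises (1 / 2n) * ||y_c - Z @ beta||^2 + alpha * ||beta||_1, where Z is X
    standardised to mean 0 / variance 1 and y_c is y centred. The returned beta is
    on the standardised scale, so coefficient sizes are comparable across variables.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                  # constant columns become all-zero and stay at beta = 0
    Z = (X - X.mean(axis=0)) / sd
    resid = y - y.mean()               # residual when every coefficient is 0
    beta = np.zeros(p)
    for _ in range(max_iter):
        max_step = 0.0
        for j in range(p):
            old = beta[j]
            # Correlation of column j with the partial residual (column j's own fit added back).
            # For non-constant columns Z[:, j] @ Z[:, j] / n == 1, so no division is needed.
            rho = Z[:, j] @ resid / n + old
            beta[j] = soft_threshold(rho, alpha)
            if beta[j] != old:
                resid -= Z[:, j] * (beta[j] - old)
                max_step = max(max_step, abs(beta[j] - old))
        if max_step < tol:
            break
    return beta
```

On simulated data where only the first two columns drive `y`, a moderate `alpha` should keep those two coefficients and set the rest to exactly zero, while a very large `alpha` zeroes everything. Choosing `alpha` by cross-validation (as `LassoCV` does) is the standard way to pick the penalty.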

Before starting, these are the 15 variables I hand-selected:

- gor_dv : Government Office Region
- urban_dv : Urban or rural area, derived
- nbrsnci_dv : Buckner's Neighbourhood Cohesion Instrument, short (α= .88)
- scghq1_dv : Subjective wellbeing (GHQ): Likert
- sex_dv : Sex, derived
- age_dv : Age, derived from dob_dv and intdat_dv
- ethn_dv : Ethnic group (derived from multiple sources)
- marstat_dv : Harmonised de facto marital status
- jbstat : Current economic activity
- qfhigh_dv : Highest educational qualification ever reported
- jbiindb_dv : Current job: Industrial classification (CNEF), two digits
- fimnnet_dv : total net personal income - no deductions
- fihhmnnet1_dv : total household net income - no deductions
- houscost1_dv : monthly housing cost including mortgage principal payments
- health : Long-standing illness or disability
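Several of these variables are category codes rather than quantities (region, ethnic group, marital status, economic activity, qualification, industry), so feeding the raw codes into a linear model such as the lasso would treat, say, region 7 as "more" than region 3. A minimal sketch of one-hot encoding them with `pandas.get_dummies`, assuming the split into categorical and continuous variables below (my reading of the labels, not something stated above); `build_design` is a hypothetical helper:

```python
import pandas as pd

# Assumed split of the 15 hand-selected variables (check against the codebook):
TARGET = "scghq1_dv"
CATEGORICAL = ["gor_dv", "urban_dv", "sex_dv", "ethn_dv", "marstat_dv",
               "jbstat", "qfhigh_dv", "jbiindb_dv", "health"]
CONTINUOUS = ["nbrsnci_dv", "age_dv", "fimnnet_dv", "fihhmnnet1_dv", "houscost1_dv"]


def build_design(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Return (X, y): continuous columns as-is plus one dummy per non-baseline category.

    Assumes negative UKHLS missing-value codes were removed first; otherwise a code
    such as -8 would become its own dummy column.
    """
    X = pd.get_dummies(
        df[CATEGORICAL + CONTINUOUS],
        columns=CATEGORICAL,
        drop_first=True,   # drop one level per variable to act as the reference category
        dtype=float,
    )
    y = df[TARGET].astype(float)
    return X, y
```

Dummy columns are named `<variable>_<code>` (for example `gor_dv_2`). The lasso can also cope with the full set of dummies; dropping the first level just keeps an explicit baseline for interpretation.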

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import pyreadstat
%matplotlib inline

# Show all rows, and make wide outputs such as 'head' scroll across the screen
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)

In [2]:
# Import file

path = "/Users/arikatz/VSCode Projects/ukhls-informal-institutions-project/data/droppedvaraibleswithlotsofnegatives.dta"
df_full, meta = pyreadstat.read_dta(path)

print("Shape:", df_full.shape)
df_full.head()

Shape: (175335, 343)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidhh,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,fiyrdia,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,ndepchl_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhigh_dv,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_pno,h_ivfio,h_ivfho,h_month,i_hidp,i_pno,i_ivfio,i_ivfho,i_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
0,22445,f,6,3.4,25,278664010,3,3,3,4,2,1,6,1,11,2,29,1984,26,6,2014,2,1,2,1,2,2,2,1,0,0,0,0,0,0,1,0,2,2,1,2,1,0,0,0,0,0,0,0,0,2,5,2,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,2,1,18,16,57,19,7,12,0,0,2572.590088,0,2572.590088,0,2012.0,0,0.0,0,0.0,90,72.0,2,29,26,6,2014,1984,2,1,6,1,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,2012.0,1,7,1,1,0,6,3,5,0,0,1,1,17,3,2,0,0,2,2,2,272012925,1,0,0,0,0,1,1,0,0,-8,0,3,31,62.12,32.59,1,0,0,0,0,0.0,0.0,0.0,0.0,14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
1,22445,i,9,3.3,11,277344816,1,3,3,4,2,1,6,1,11,2,33,1984,23,10,2017,2,2,2,1,2,-8,2,2,0,0,0,0,0,0,1,0,2,3,1,1,1,0,0,0,0,0,0,0,0,4,4,4,5,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,18,50,38,19,28,40,0,0,2423.030029,0,2333.330078,0,1200.0,0,0.0,0,89.699997,0,0.0,2,33,23,10,2017,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,89.699997,1289.699951,1,7,1,1,0,7,4,6,0,0,1,2,3,1,2,0,-8,2,2,2,22445,1,0,0,0,0,0,0,0,0,-8,0,3,31,57.2,46.08,1,0,0,0,0,0.0,0.0,0.0,0.0,10,23.0,10.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,150.0,30.0,1.0,0.0,0.0,2423.030029,0.0,2333.330078,0.0,1.0,1289.699951,1200.0,0.0,0.0,0.0,0.0,89.699997,1300.0,736.869995,2423.030029,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,2.0,0.0,0.0,1.0,10.0,110.0,9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
2,22445,l,12,1.6,32,276637622,1,3,3,4,2,1,4,1,11,2,35,1984,2,4,2020,2,1,6,1,2,2,2,2,0,0,0,0,0,0,1,0,4,1,1,2,1,2,1,2,2,0,2,0,0,5,3,3,5,2,2,0,0,2,0,1,0,0,0,0,0,0,0,0,0,4,2,21,1,46,21,17,32,0,0,145.169998,0,0.0,0,0.0,0,0.0,0,145.169998,0,0.0,2,35,2,4,2020,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,145.169998,145.169998,1,7,1,1,0,8,4,7,1,0,0,2,11,1,2,2,2,1,2,1,276841780,1,2,2,0,0,0,0,0,0,1,0,1,0,67.18,19.42,0,0,0,0,0,1.0,0.0,1.0,0.0,10,2.0,4.0,2020.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,100.0,0.0,0.0,0.0,5656.390137,0.0,5070.0,0.0,1.0,4146.390137,3560.0,350.0,0.0,0.0,0.0,236.389999,1350.0,705.679993,5656.390137,4.0,1.0,0.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,2.0,2.0,2.1,2.0,0.0,0.0257,1.0,9.35,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,1,2,2,2,0,0,1,0,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,10,48,3,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,-8,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29.0,9.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,350.0,30.0,0.0,1.0,0.0,0.0,2175.620117,0.0,13.82,0.0,3.0,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2.0,0.0,1.0,2.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,7.0,0.0,0.0451,1.0,10.0,110.0,3.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,1,10,8,619024416,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
4,29925,i,9,3.5,9,619024416,1,3,3,6,2,1,8,1,11,2,40,1977,22,8,2017,1,2,2,1,1,2,2,2,0,0,1,0,0,0,0,0,4,3,2,1,1,2,2,2,2,0,2,2,0,6,3,6,5,5,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,2,20,39,52,21,37,24,0,0,3054.530029,0,1400.0,0,1250.0,0,0.0,0,1654.530029,0,0.0,2,40,22,8,2017,1977,2,0,4,0,0,0,0,0,1,0.0,1000.0,0.0,0.0,654.530029,2904.530029,1,7,1,1,0,9,5,8,0,0,1,5,5,1,2,2,2,1,2,1,622866606,1,0,0,0,0,0,0,0,0,-8,0,1,27,62.04,41.06,0,0,0,0,0,0.0,0.0,0.0,0.0,10,22.0,8.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,0.0,1.0,0.0,0.0,3054.530029,0.0,1400.0,0.0,2.0,2904.530029,1250.0,0.0,1000.0,0.0,0.0,654.530029,0.0,0.0,3054.530029,4.0,0.0,1.0,2.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,1.0,0.0,0.0011,1.0,17.0,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,1,10,8,619024416,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
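The descriptions in the variable list at the top match the Stata variable labels, which `pyreadstat.read_dta` returns in `meta` alongside the data (`meta.column_names` and `meta.column_labels` are parallel lists). A small sketch for looking labels up programmatically instead of by hand; `label_table` is a hypothetical helper:

```python
import pandas as pd


def label_table(column_names, column_labels, variables) -> pd.DataFrame:
    """Map each requested variable to its Stata label (missing if the variable is absent)."""
    variables = list(variables)
    labels = dict(zip(column_names, column_labels))
    return pd.DataFrame({
        "variable": variables,
        "label": [labels.get(v) for v in variables],
    })

# In the notebook, after In [2]:
# label_table(meta.column_names, meta.column_labels, ["scghq1_dv", "nbrsnci_dv"])
```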


To speed things up, we'll only use observations from wave c (the largest wave, with 40,509 observations).
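Whether wave c is the largest can be checked by counting rows per value of `wave`. A minimal sketch; `wave_sizes` is a hypothetical helper:

```python
import pandas as pd


def wave_sizes(df: pd.DataFrame, col: str = "wave") -> pd.Series:
    """Number of observations per wave; value_counts sorts by count, largest first."""
    return df[col].value_counts()

# In the notebook: wave_sizes(df_full) lists every wave; wave_sizes(df_full).idxmax() names the largest
```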

In [3]:
# Keep only wave c observations; .copy() gives an independent DataFrame,
# so later edits to df don't raise SettingWithCopyWarning
df = df_full[df_full['wave'] == 'c'].copy()
print("Shape after dropping non-C wave observations:", df.shape)
df.head(20)

Shape after dropping non-C wave observations: (40509, 343)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidhh,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,fiyrdia,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,ndepchl_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhigh_dv,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_pno,h_ivfio,h_ivfho,h_month,i_hidp,i_pno,i_ivfio,i_ivfho,i_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
22,1558565,c,3,3.5,5,828349484,1,3,3,222,67,1,12,1,11,2,18,1993,20,12,2011,2,1,7,1,1,-8,2,2,0,0,0,0,0,0,1,0,1,3,2,4,1,0,0,0,0,0,0,0,0,1,1,3,3,1,2,0,0,-9,2,2,0,0,0,0,0,0,0,0,0,1,2,12,16,54,13,12,0,0,0,702.0,0,0.0,0,0.0,0,0.0,0,702.0,0,0.0,2,18,20,12,2011,1993,2.0,0.0,6,0,0,0,0,0,1,130.0,0.0,0.0,0.0,572.0,702.0,2,10,1,1,0,4,2,3,0,0,1,1,3,1,2,0,-8,2,2,2,1558565.0,1.0,0,0,0,0,0,0,0,0,13,1,4,0,50.14,48.43,1,0,0,0,0,0.0,0.0,0.0,0.0,10,20,12,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,4,1,1,0,0,0,2,1,-8,1,1,1,1,1,0,0,1,0,1,0,2,50,0,10,0,0,0,702.0,0,0.0,0,1,702.0,0.0,130.0,0.0,0.0,0.0,572.0,130.0,130.0,702.0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1.0,3,0,0.0,1,64.0,110,1,0,2,3,2,3,2,852386802,3,1,10,12,828349484,1,1,10,12,827560006,1,50,61,12,827560006,-9,82,93,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,0,0,3,16,1,1,6,2,2,2,1,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
27,1833965,c,3,3.4,8,757615204,3,3,3,46,12,3,11,1,11,1,46,1965,7,12,2011,2,2,2,1,1,2,2,2,0,0,0,0,0,0,1,-1,3,1,2,2,1,0,0,0,0,0,0,0,0,3,2,2,3,1,1,1,2,-9,2,2,0,0,0,0,0,1,2,1,2,3,1,20,26,46,21,31,13,0,0,1666.72998,0,1660.0,0,1325.660034,0,80.760002,0,0.0,0,0.0,1,46,7,12,2011,1965,2.0,2.0,6,2,2,0,0,0,1,0.0,0.0,6.73,0.0,0.0,1332.390015,1,8,2,2,0,10,5,9,0,0,1,1,19,3,2,0,0,2,2,2,748184965.0,1.0,0,0,2,2,1,1,0,0,-8,0,5,20,45.33,37.91,1,0,0,0,1,0.0,0.0,0.0,0.0,11,16,11,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,255,-1,0,1,0,0,3313.25,0,1660.0,0,1,2978.909912,1325.660034,0.0,0.0,726.039978,166.710007,760.5,0.0,0.0,3313.25,4,1,0,0,0,0,0,0,2,1,2,1,0,3,2.0,1,0,0.2123,1,124.0,210,3,0,2,11,2,9,2,782088402,3,1,11,11,757615204,3,1,11,11,756846806,3,2,11,11,755412008,2,1,10,11,754766010,2,1,10,10,754494012,1,1,10,10,754310414,1,1,10,10,753508016,1,1,10,10,0,0,3,16,1,1,5,2,2,1,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
35,2670365,c,3,2.1,11,692348804,3,5,5,403,101,1,10,1,11,2,17,1993,19,10,2011,1,2,2,1,2,2,2,2,0,0,0,0,0,0,1,0,2,1,2,4,1,0,0,0,0,0,0,0,0,3,2,7,5,1,1,0,1,-9,2,2,0,0,0,0,0,1,0,1,2,5,1,19,42,20,20,24,1,0,0,346.670013,0,346.670013,0,346.670013,0,0.0,0,0.0,0,0.0,2,17,19,10,2011,1993,2.0,0.0,6,1,2,0,0,2,1,0.0,0.0,0.0,0.0,0.0,346.670013,2,10,1,1,0,4,2,2,0,0,1,1,20,3,2,0,0,2,2,2,717013162.0,2.0,0,0,1,1,0,2,0,0,13,1,3,18,58.06,35.4,1,0,0,0,0,0.0,0.0,0.0,0.0,10,19,10,2011,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,2,1,1,0,0,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,300,10,10,3,0,1,4453.830078,0,4253.830078,0,1,3676.669922,3476.669922,0.0,0.0,0.0,0.0,200.0,445.0,222.229996,4453.830078,3,1,0,1,0,0,0,1,0,4,0,4,1,4,2.8,2,0,0.0,1,256.0,110,3,0,2,3,2,3,2,717013082,4,1,10,10,692348804,3,1,10,10,691363486,1,53,50,10,689703608,1,53,50,10,688683610,1,83,50,-9,688296012,1,83,96,12,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,0,0,3,16,1,1,6,2,2,2,1,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999994,0.999991,0.999987,0.999983,0.999974,0.99997,0.999965,0.999958,0.99995,0.999941,0.999932,0.999923,0.999917,0.999903,0.999892,0.999878,0.999864,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
36,2853965,c,3,2.8,14,692805764,1,4,4,503,131,1,10,1,11,2,29,1982,21,12,2011,1,1,2,1,2,2,2,2,0,0,0,0,0,0,1,0,4,1,2,2,1,0,0,0,0,0,0,0,0,6,3,5,6,1,1,0,0,-9,2,2,0,0,0,0,0,0,0,0,0,2,1,19,38,46,20,30,47,0,0,2083.330078,0,2083.330078,0,1600.0,0,0.0,0,0.0,0,0.0,2,29,21,12,2011,1982,2.0,0.0,6,0,0,0,0,0,1,0.0,0.0,0.0,0.0,0.0,1600.0,1,7,1,3,0,6,3,5,0,0,1,1,16,1,2,0,0,2,2,2,2853965.0,1.0,0,0,0,0,0,0,0,0,-8,0,2,30,59.72,46.63,1,0,0,0,0,0.0,0.0,0.0,0.0,12,21,12,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,-1,1,1,0,0,0,-1,1,2,1,1,1,1,1,0,0,1,1,1,0,1,500,100,200,0,0,0,3373.02002,0,3243.02002,0,2,2715.449951,2585.449951,130.0,0.0,0.0,0.0,0.0,0.0,0.0,3373.02002,7,0,0,0,0,0,0,0,0,2,0,2,0,2,1.5,-9,0,0.3824,2,65.0,210,4,0,2,7,2,7,2,717413602,4,1,10,10,692805764,1,1,12,10,691804806,1,1,11,10,690118408,1,1,11,10,689030410,1,1,11,11,688642812,1,1,12,11,688411614,1,53,51,11,687507216,1,53,51,11,0,0,3,-8,3,1,5,2,2,2,3,0.0,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999908,0.999871,0.999809,0.999753,0.999623,0.99956,0.999476,0.999376,0.999267,0.999128,0.999002,0.998856,0.99877,0.998566,0.998401,0.9982,0.997995,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
38,2888645,c,3,2.8,7,80688804,3,4,4,516,135,1,1,1,11,2,22,1988,24,2,2011,2,1,2,1,2,2,2,2,0,0,0,0,0,0,1,0,2,1,2,1,1,0,0,0,0,0,0,0,0,2,2,6,6,1,1,0,0,-9,2,2,0,0,0,0,0,0,0,0,0,3,1,12,5,31,12,57,34,0,0,1416.670044,0,1416.670044,0,1200.0,0,0.0,0,0.0,0,0.0,2,22,24,2,2011,1988,2.0,0.0,6,0,0,0,1,1,1,0.0,0.0,0.0,0.0,0.0,1200.0,3,11,1,1,0,5,3,4,0,0,1,1,19,3,2,0,0,2,2,2,89303729.0,2.0,0,0,0,0,0,0,0,0,-8,0,3,21,57.57,51.49,1,0,0,0,0,0.0,0.0,0.0,0.0,10,16,2,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,2,1,1,0,0,0,2,1,2,1,1,1,1,1,0,1,1,1,1,0,1,450,250,0,2,0,0,5708.330078,0,5708.330078,0,1,4656.669922,4656.669922,0.0,0.0,0.0,0.0,0.0,803.0,437.230011,5708.330078,3,1,0,0,0,0,0,0,0,3,0,3,0,3,2.0,2,0,0.0,1,203.0,110,6,0,2,14,3,14,3,-9,-9,-9,-9,-9,80688804,3,1,10,1,79839486,3,1,12,1,77676408,1,50,61,1,76969210,-9,53,59,1,-9,-9,-9,-9,-9,79553214,1,1,10,2,76479616,1,1,10,3,0,0,3,-8,3,1,5,1,1,2,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999984,0.999977,0.999966,0.999956,0.999933,0.999922,0.999907,0.999889,0.99987,0.999846,0.999823,0.999797,0.999782,0.999746,0.999717,0.999681,0.999645,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
47,3705325,c,3,3.9,8,150484004,2,6,6,1777,701,1,2,1,11,2,57,1953,23,2,2011,1,2,97,1,2,2,1,2,0,0,0,0,0,0,1,0,2,3,1,1,1,0,0,0,0,0,0,0,0,6,6,6,6,1,1,0,0,-9,2,2,0,0,0,0,0,0,0,0,0,2,2,17,10,21,17,59,57,0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,0,0.0,2,57,23,2,2011,1953,2.0,0.0,6,0,0,0,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,4,12,1,1,0,12,6,11,0,0,1,1,16,2,2,0,0,2,2,2,164178525.0,1.0,0,0,0,0,0,0,0,0,-8,0,1,0,59.11,54.1,1,0,0,0,0,0.0,0.0,0.0,0.0,10,23,2,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,5,1,1,1,0,0,2,1,-8,1,1,1,0,1,1,0,1,1,1,0,1,150,30,0,2,0,0,2420.72998,0,420.730011,0,-8,2420.72998,420.730011,0.0,0.0,0.0,2000.0,0.0,0.0,0.0,2420.72998,-8,0,0,0,0,0,0,0,0,2,0,2,0,2,1.5,5,0,0.1738,1,790.0,110,0,0,2,14,3,14,3,-9,-9,-9,-9,-9,150484004,2,1,10,2,149443606,2,1,10,2,147274408,2,1,10,2,146342810,2,1,10,2,145914412,2,1,10,2,145635614,2,1,10,2,144867216,1,1,10,2,0,0,3,17,1,2,1,2,2,2,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999517,0.999187,0.998761,0.998287,0.997469,0.996469,0.995917,0.994254,0.993639,0.992678,0.991646,0.990708,0.98933,0.98725,0.985731,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
51,3915445,c,3,3.9,5,557436804,2,4,4,538,141,3,8,1,11,2,20,1990,31,8,2011,1,1,2,1,2,1,2,2,0,0,0,0,0,0,1,0,2,2,2,4,1,0,0,0,0,0,0,0,0,7,5,6,7,1,1,0,0,-9,2,2,0,0,0,0,0,0,0,0,0,2,1,16,32,56,22,18,19,0,0,250.0,0,250.0,0,240.0,0,0.0,0,0.0,0,0.0,2,20,31,8,2011,1990,2.0,1.0,6,0,0,1,0,0,1,0.0,0.0,0.0,0.0,0.0,240.0,3,11,1,1,0,5,3,4,0,0,1,1,17,2,2,0,0,2,2,2,564924285.0,1.0,0,0,0,0,0,0,0,1,2,1,1,18,56.71,62.39,1,0,0,0,0,0.0,0.0,0.0,0.0,10,31,8,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,1,1,1,0,1,0,2,1,2,1,1,1,1,1,1,1,1,1,1,0,1,160,130,60,0,0,0,2524.030029,0,250.0,0,1,2514.030029,240.0,0.0,0.0,456.029999,668.0,1150.0,0.0,0.0,2524.030029,5,0,0,0,0,0,0,0,1,1,1,1,0,2,1.5,1,0,0.18,1,749.0,110,4,0,2,4,2,4,2,581617602,2,1,10,8,557436804,2,1,10,8,556015606,2,1,10,8,554248288,1,80,91,8,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,0,0,3,17,1,1,6,1,2,2,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
52,4091565,c,3,2.4,8,487947604,3,5,5,475,124,3,7,1,11,1,33,1978,10,8,2011,2,1,3,1,2,2,2,2,0,0,0,0,0,0,1,0,3,1,1,1,1,0,0,0,0,0,0,0,0,2,6,2,5,1,2,1,2,-9,2,2,0,0,0,0,0,1,2,1,2,3,2,13,57,53,14,46,10,0,0,290.329987,0,0.0,0,0.0,0,0.0,0,290.329987,0,0.0,1,33,10,8,2011,1978,2.0,2.0,6,2,2,0,0,0,1,0.0,0.0,0.0,0.0,290.329987,290.329987,2,10,1,1,0,7,4,6,0,0,1,1,19,3,2,0,0,2,2,2,489864529.0,2.0,0,0,2,2,1,1,0,0,1,1,1,0,58.57,51.77,1,0,0,0,0,0.0,0.0,0.0,0.0,10,10,8,2011,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,1,1,1,0,0,0,1,1,2,1,1,1,1,1,1,0,1,1,1,0,1,550,120,0,1,0,0,1948.339966,0,0.0,0,1,1948.339966,0.0,0.0,0.0,41.669998,286.0,1620.670044,0.0,0.0,1948.339966,5,1,0,0,0,0,0,0,2,0,3,1,0,3,2.0,1,0,0.0,1,245.0,110,2,0,2,7,2,7,2,512577202,3,2,11,7,487947604,3,1,10,7,486920806,3,1,10,7,485411208,3,1,11,7,484650290,1,53,50,8,484425212,3,1,10,8,484045094,1,80,91,8,-9,-9,-9,-9,-9,0,0,3,16,1,1,5,2,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
53,4192205,c,3,2.1,5,554247604,3,3,3,184,52,3,8,1,11,2,27,1984,31,8,2011,2,1,2,1,2,2,2,2,0,0,0,0,0,0,1,0,2,2,1,4,1,0,0,0,0,0,0,0,0,6,6,6,6,1,1,2,0,-9,2,2,0,0,0,0,0,2,0,2,0,3,1,18,7,16,18,46,54,0,0,1816.0,0,1816.0,0,1356.0,0,0.0,0,0.0,0,0.0,2,27,31,8,2011,1984,2.0,0.0,6,1,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,1356.0,1,3,1,1,0,6,3,5,0,0,1,1,19,3,2,0,0,2,2,2,544838445.0,1.0,0,0,0,0,2,2,0,0,-8,0,3,33,59.11,54.1,1,0,0,0,0,0.0,0.0,0.0,0.0,10,31,8,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2,1,1,0,0,0,2,1,2,1,1,1,1,1,0,0,1,1,1,0,1,450,100,25,2,0,0,5516.0,0,5516.0,0,1,4306.0,4306.0,0.0,0.0,0.0,0.0,0.0,211.0,63.43,5516.0,1,1,0,0,0,0,0,0,0,3,0,3,0,3,2.0,2,0,0.0,1,153.0,110,3,0,2,4,2,4,2,578564402,3,1,10,8,554247604,3,1,10,8,553370406,3,1,10,8,551779888,1,80,91,8,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,0,0,3,16,1,1,5,1,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
54,4197647,c,3,2.6,11,146533204,3,3,3,221,67,3,2,1,11,1,49,1961,15,4,2011,2,2,2,1,2,2,2,2,0,0,0,0,0,0,1,0,3,1,2,4,1,0,0,0,2,0,0,0,0,6,1,5,4,5,1,0,0,-9,2,2,0,0,0,0,0,0,0,0,0,3,1,20,38,5,14,19,12,0,0,1200.0,0,1200.0,0,930.0,0,0.0,0,0.0,0,0.0,1,49,15,4,2011,1961,,,4,0,0,0,0,0,1,0.0,0.0,0.0,0.0,0.0,930.0,2,10,1,1,0,10,5,9,0,0,1,5,22,3,2,0,0,2,2,2,,,0,0,0,0,0,0,0,0,14,1,3,3,31.49,63.89,1,0,0,0,0,0.0,0.0,0.0,0.0,10,15,2,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,4,1,1,0,0,0,2,1,2,1,1,1,1,1,1,1,1,1,1,0,1,100,-1,40,0,0,0,3282.159912,0,2662.5,0,1,2827.98999,2208.330078,619.659973,0.0,0.0,0.0,0.0,300.0,300.0,3282.159912,6,0,0,0,0,0,0,0,0,3,0,3,0,3,2.0,7,0,0.1888,2,124.0,110,7,0,2,4,3,4,3,-9,-9,-9,-9,-9,146533204,3,1,10,2,145819206,3,1,10,2,144004288,1,80,91,2,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,0,0,3,15,1,1,6,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999849,0.999651,0.999463,0.999259,0.999015,0.998742,0.998464,0.998239,0.997908,0.997642,0.997376,0.997089,0.996337,0.995917,0.995253,0.994804,0.994313,0.993411,0.992944,0.992342,0.991714,0.990958,0.990239,0.989518,0.988399,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


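Great! Now we need to deal with missing and negative values. In Understanding Society (UKHLS), negative values are mostly missing-value codes rather than real measurements (for example -1 don't know, -2 refused, -8 inapplicable, -9 missing), so they are as much a gap in the data as an empty cell.

The plan:

- First, see which columns have the highest share of missing and negative values.
- Then, drop those columns.
- Finally, drop any remaining rows with missing or negative values.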
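One option before counting is to turn the negative codes into real `NaN`s, so that all missingness is measured in one place and tools such as `dropna` treat both kinds of gap the same way. A sketch under two assumptions: the code list below covers only the common UKHLS codes (extend it from the codebook), and only those exact values are recoded, so a genuinely negative amount, should any variable contain one, is left alone rather than discarded. `codes_to_nan` is a hypothetical helper:

```python
import pandas as pd

# Assumed list of common UKHLS missing-value codes:
# -1 don't know, -2 refused, -7 proxy respondent, -8 inapplicable, -9 missing
UKHLS_MISSING_CODES = [-1, -2, -7, -8, -9]


def codes_to_nan(df: pd.DataFrame, codes=UKHLS_MISSING_CODES) -> pd.DataFrame:
    """Return a copy of df where the listed codes in numeric columns are replaced by NaN."""
    out = df.copy()
    for col in out.select_dtypes(include="number").columns:
        # where() keeps values for which the condition holds and puts NaN elsewhere
        out[col] = out[col].where(~out[col].isin(codes))
    return out
```

With this in place, `df = codes_to_nan(df)` followed later by `df = df.dropna()` would carry out the final step of the plan.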

In [4]:
# Calculate % missing and % negative for each column, both on a 0-100 scale
# (without * 100 on the first line, the two shares could not be added meaningfully)
missing_pct = df.isnull().mean() * 100
negative_pct = (df.select_dtypes(include=['number']) < 0).mean() * 100

# Combine into a summary DataFrame; non-numeric columns have no negative share, so fill with 0
summary = pd.DataFrame({
    '% Missing': missing_pct,
    '% Negative': negative_pct
}).fillna(0)

# Sort by the sum of % missing and % negative (descending)
summary['% Total'] = summary['% Missing'] + summary['% Negative']
summary = summary.sort_values(by='% Total', ascending=False)

summary.head(50)

| Variable | % Missing | % Negative | % Total |
|---|---|---|---|
| hgpart | 0.0 | 48.408996 | 48.408996 |
| i_pno | 0.0 | 28.694858 | 28.694858 |
| i_month | 0.0 | 28.225826 | 28.225826 |
| i_ivfio | 0.0 | 26.260831 | 26.260831 |
| i_ivfho | 0.0 | 26.260831 | 26.260831 |
| i_hidp | 0.0 | 26.260831 | 26.260831 |
| h_pno | 0.0 | 23.873707 | 23.873707 |
| h_month | 0.0 | 23.333086 | 23.333086 |
| h_ivfio | 0.0 | 20.81019 | 20.81019 |
| h_ivfho | 0.0 | 20.81019 | 20.81019 |
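The share of negatives alone does not say why a value is missing. A large block of `-8` (inapplicable) usually reflects questionnaire routing, for example a question asked only of some respondents, whereas `-9` or `-2` reflect non-response, and the two may deserve different treatment. `hgpart`, the top entry above, is an obvious candidate for this check. A sketch that tabulates the negative codes in one column; `negative_code_counts` is a hypothetical helper:

```python
import pandas as pd


def negative_code_counts(df: pd.DataFrame, col: str) -> pd.Series:
    """Count each negative value in df[col], most frequent first."""
    s = df[col]
    return s[s < 0].value_counts()

# In the notebook: negative_code_counts(df, "hgpart")
```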


In [5]:
# Delete the top 25 columns with the highest % missing + % negative
cols_to_drop = summary.head(25).index
df = df.drop(columns=cols_to_drop)

print("Shape after dropping top 25 columns with highest % missing + % negative:", df.shape)
df.head()

Shape after dropping top 25 columns with highest % missing + % negative: (40509, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
22,1558565,c,3,3.5,5,828349484,1,3,3,222,67,1,12,1,11,2,18,1993,20,12,2011,2,1,7,1,1,2,2,0,0,0,0,0,0,1,1,3,2,4,1,0,0,0,0,0,0,0,0,1,1,3,3,1,2,0,0,2,2,0,0,0,0,0,0,0,0,0,1,2,12,16,54,13,12,0,0,0,702.0,0,0.0,0,0.0,0,0.0,0,702.0,0,0.0,2,18,20,12,2011,1993,2,0,6,0,0,0,0,0,1,130.0,0.0,0.0,0.0,572.0,702.0,2,10,1,1,0,4,2,3,0,0,1,1,3,1,2,0,2,2,2,1558565,1,0,0,0,0,0,0,0,0,1,4,0,50.14,48.43,1,0,0,0,0,0.0,0.0,0.0,0.0,10,20,12,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,4,1,1,0,0,0,2,1,-8,1,1,1,1,1,0,0,1,0,1,0,2,50,0,10,0,0,0,702.0,0,0.0,0,1,702.0,0.0,130.0,0.0,0.0,0.0,572.0,130.0,130.0,702.0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1.0,3,0,0.0,1,64.0,110,1,0,2,3,2,3,2,852386802,3,1,10,12,828349484,1,1,10,12,827560006,1,50,61,12,827560006,-9,82,93,-9,0,0,3,16,1,1,6,2,2,2,1,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
27,1833965,c,3,3.4,8,757615204,3,3,3,46,12,3,11,1,11,1,46,1965,7,12,2011,2,2,2,1,1,2,2,0,0,0,0,0,0,1,3,1,2,2,1,0,0,0,0,0,0,0,0,3,2,2,3,1,1,1,2,2,2,0,0,0,0,0,1,2,1,2,3,1,20,26,46,21,31,13,0,0,1666.72998,0,1660.0,0,1325.660034,0,80.760002,0,0.0,0,0.0,1,46,7,12,2011,1965,2,2,6,2,2,0,0,0,1,0.0,0.0,6.73,0.0,0.0,1332.390015,1,8,2,2,0,10,5,9,0,0,1,1,19,3,2,0,2,2,2,748184965,1,0,0,2,2,1,1,0,0,0,5,20,45.33,37.91,1,0,0,0,1,0.0,0.0,0.0,0.0,11,16,11,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,255,-1,0,1,0,0,3313.25,0,1660.0,0,1,2978.909912,1325.660034,0.0,0.0,726.039978,166.710007,760.5,0.0,0.0,3313.25,4,1,0,0,0,0,0,0,2,1,2,1,0,3,2.0,1,0,0.2123,1,124.0,210,3,0,2,11,2,9,2,782088402,3,1,11,11,757615204,3,1,11,11,756846806,3,2,11,11,755412008,2,1,10,11,0,0,3,16,1,1,5,2,2,1,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
35,2670365,c,3,2.1,11,692348804,3,5,5,403,101,1,10,1,11,2,17,1993,19,10,2011,1,2,2,1,2,2,2,0,0,0,0,0,0,1,2,1,2,4,1,0,0,0,0,0,0,0,0,3,2,7,5,1,1,0,1,2,2,0,0,0,0,0,1,0,1,2,5,1,19,42,20,20,24,1,0,0,346.670013,0,346.670013,0,346.670013,0,0.0,0,0.0,0,0.0,2,17,19,10,2011,1993,2,0,6,1,2,0,0,2,1,0.0,0.0,0.0,0.0,0.0,346.670013,2,10,1,1,0,4,2,2,0,0,1,1,20,3,2,0,2,2,2,717013162,2,0,0,1,1,0,2,0,0,1,3,18,58.06,35.4,1,0,0,0,0,0.0,0.0,0.0,0.0,10,19,10,2011,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,2,1,1,0,0,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,300,10,10,3,0,1,4453.830078,0,4253.830078,0,1,3676.669922,3476.669922,0.0,0.0,0.0,0.0,200.0,445.0,222.229996,4453.830078,3,1,0,1,0,0,0,1,0,4,0,4,1,4,2.8,2,0,0.0,1,256.0,110,3,0,2,3,2,3,2,717013082,4,1,10,10,692348804,3,1,10,10,691363486,1,53,50,10,689703608,1,53,50,10,0,0,3,16,1,1,6,2,2,2,1,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999994,0.999991,0.999987,0.999983,0.999974,0.99997,0.999965,0.999958,0.99995,0.999941,0.999932,0.999923,0.999917,0.999903,0.999892,0.999878,0.999864,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
36,2853965,c,3,2.8,14,692805764,1,4,4,503,131,1,10,1,11,2,29,1982,21,12,2011,1,1,2,1,2,2,2,0,0,0,0,0,0,1,4,1,2,2,1,0,0,0,0,0,0,0,0,6,3,5,6,1,1,0,0,2,2,0,0,0,0,0,0,0,0,0,2,1,19,38,46,20,30,47,0,0,2083.330078,0,2083.330078,0,1600.0,0,0.0,0,0.0,0,0.0,2,29,21,12,2011,1982,2,0,6,0,0,0,0,0,1,0.0,0.0,0.0,0.0,0.0,1600.0,1,7,1,3,0,6,3,5,0,0,1,1,16,1,2,0,2,2,2,2853965,1,0,0,0,0,0,0,0,0,0,2,30,59.72,46.63,1,0,0,0,0,0.0,0.0,0.0,0.0,12,21,12,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,-1,1,1,0,0,0,-1,1,2,1,1,1,1,1,0,0,1,1,1,0,1,500,100,200,0,0,0,3373.02002,0,3243.02002,0,2,2715.449951,2585.449951,130.0,0.0,0.0,0.0,0.0,0.0,0.0,3373.02002,7,0,0,0,0,0,0,0,0,2,0,2,0,2,1.5,-9,0,0.3824,2,65.0,210,4,0,2,7,2,7,2,717413602,4,1,10,10,692805764,1,1,12,10,691804806,1,1,11,10,690118408,1,1,11,10,0,0,3,-8,3,1,5,2,2,2,3,0.0,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999908,0.999871,0.999809,0.999753,0.999623,0.99956,0.999476,0.999376,0.999267,0.999128,0.999002,0.998856,0.99877,0.998566,0.998401,0.9982,0.997995,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
38,2888645,c,3,2.8,7,80688804,3,4,4,516,135,1,1,1,11,2,22,1988,24,2,2011,2,1,2,1,2,2,2,0,0,0,0,0,0,1,2,1,2,1,1,0,0,0,0,0,0,0,0,2,2,6,6,1,1,0,0,2,2,0,0,0,0,0,0,0,0,0,3,1,12,5,31,12,57,34,0,0,1416.670044,0,1416.670044,0,1200.0,0,0.0,0,0.0,0,0.0,2,22,24,2,2011,1988,2,0,6,0,0,0,1,1,1,0.0,0.0,0.0,0.0,0.0,1200.0,3,11,1,1,0,5,3,4,0,0,1,1,19,3,2,0,2,2,2,89303729,2,0,0,0,0,0,0,0,0,0,3,21,57.57,51.49,1,0,0,0,0,0.0,0.0,0.0,0.0,10,16,2,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,2,1,1,0,0,0,2,1,2,1,1,1,1,1,0,1,1,1,1,0,1,450,250,0,2,0,0,5708.330078,0,5708.330078,0,1,4656.669922,4656.669922,0.0,0.0,0.0,0.0,0.0,803.0,437.230011,5708.330078,3,1,0,0,0,0,0,0,0,3,0,3,0,3,2.0,2,0,0.0,1,203.0,110,6,0,2,14,3,14,3,-9,-9,-9,-9,-9,80688804,3,1,10,1,79839486,3,1,12,1,77676408,1,50,61,1,0,0,3,-8,3,1,5,1,1,2,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999984,0.999977,0.999966,0.999956,0.999933,0.999922,0.999907,0.999889,0.99987,0.999846,0.999823,0.999797,0.999782,0.999746,0.999717,0.999681,0.999645,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
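The `df.dropna()` step below keeps only complete cases. With hundreds of columns, one sparsely answered question can remove most of the sample, so it helps to see which columns cost the most rows before filtering. This is an illustrative sketch, not part of the original pipeline. `missing_flags`, `missingness_profile` and `complete_case_count` are hypothetical helper names. Treating negative numeric codes as missing is an assumption, and it mirrors the later negative-value filter.

```python
import numpy as np
import pandas as pd


def missing_flags(df: pd.DataFrame, negatives_as_missing: bool = True) -> pd.DataFrame:
    """Boolean frame: True where a cell is NaN (or, optionally, a negative numeric code)."""
    flags = df.isna()
    if negatives_as_missing:
        num = df.select_dtypes(include="number")
        # Assumption: negative numeric values are missing-value codes, not real answers.
        # NaN < 0 evaluates to False, so existing NaN flags are kept by the OR.
        flags[num.columns] = flags[num.columns] | (num < 0)
    return flags


def missingness_profile(df: pd.DataFrame, top: int = 10, negatives_as_missing: bool = True) -> pd.Series:
    """Share of flagged cells per column, highest first."""
    share = missing_flags(df, negatives_as_missing).mean()
    return share.sort_values(ascending=False).head(top)


def complete_case_count(df: pd.DataFrame, negatives_as_missing: bool = True) -> int:
    """Number of rows with no flagged cell in any column."""
    return int((~missing_flags(df, negatives_as_missing).any(axis=1)).sum())


# Toy frame: column "a" has one NaN and one -8 code, "b" has two NaNs, "c" is text
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, -8.0],
    "b": [np.nan, np.nan, 1.0, 2.0],
    "c": ["x", "y", "z", "w"],
})
profile = missingness_profile(toy)
n_complete = complete_case_count(toy)
print(profile)
print("complete rows:", n_complete)

# In the notebook (hypothetical usage): missingness_profile(df), complete_case_count(df)
```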


In [6]:
# Drop rows with any missing values
df = df.dropna()  # Remove rows with any NaN values

print("Shape after dropping rows with missing values:", df.shape)

df.head()

Shape after dropping rows with missing values: (40508, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
22,1558565,c,3,3.5,5,828349484,1,3,3,222,67,1,12,1,11,2,18,1993,20,12,2011,2,1,7,1,1,2,2,0,0,0,0,0,0,1,1,3,2,4,1,0,0,0,0,0,0,0,0,1,1,3,3,1,2,0,0,2,2,0,0,0,0,0,0,0,0,0,1,2,12,16,54,13,12,0,0,0,702.0,0,0.0,0,0.0,0,0.0,0,702.0,0,0.0,2,18,20,12,2011,1993,2,0,6,0,0,0,0,0,1,130.0,0.0,0.0,0.0,572.0,702.0,2,10,1,1,0,4,2,3,0,0,1,1,3,1,2,0,2,2,2,1558565,1,0,0,0,0,0,0,0,0,1,4,0,50.14,48.43,1,0,0,0,0,0.0,0.0,0.0,0.0,10,20,12,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,4,1,1,0,0,0,2,1,-8,1,1,1,1,1,0,0,1,0,1,0,2,50,0,10,0,0,0,702.0,0,0.0,0,1,702.0,0.0,130.0,0.0,0.0,0.0,572.0,130.0,130.0,702.0,1,0,0,0,0,0,0,0,0,0,1,1,0,1,1.0,3,0,0.0,1,64.0,110,1,0,2,3,2,3,2,852386802,3,1,10,12,828349484,1,1,10,12,827560006,1,50,61,12,827560006,-9,82,93,-9,0,0,3,16,1,1,6,2,2,2,1,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
27,1833965,c,3,3.4,8,757615204,3,3,3,46,12,3,11,1,11,1,46,1965,7,12,2011,2,2,2,1,1,2,2,0,0,0,0,0,0,1,3,1,2,2,1,0,0,0,0,0,0,0,0,3,2,2,3,1,1,1,2,2,2,0,0,0,0,0,1,2,1,2,3,1,20,26,46,21,31,13,0,0,1666.72998,0,1660.0,0,1325.660034,0,80.760002,0,0.0,0,0.0,1,46,7,12,2011,1965,2,2,6,2,2,0,0,0,1,0.0,0.0,6.73,0.0,0.0,1332.390015,1,8,2,2,0,10,5,9,0,0,1,1,19,3,2,0,2,2,2,748184965,1,0,0,2,2,1,1,0,0,0,5,20,45.33,37.91,1,0,0,0,1,0.0,0.0,0.0,0.0,11,16,11,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,255,-1,0,1,0,0,3313.25,0,1660.0,0,1,2978.909912,1325.660034,0.0,0.0,726.039978,166.710007,760.5,0.0,0.0,3313.25,4,1,0,0,0,0,0,0,2,1,2,1,0,3,2.0,1,0,0.2123,1,124.0,210,3,0,2,11,2,9,2,782088402,3,1,11,11,757615204,3,1,11,11,756846806,3,2,11,11,755412008,2,1,10,11,0,0,3,16,1,1,5,2,2,1,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
35,2670365,c,3,2.1,11,692348804,3,5,5,403,101,1,10,1,11,2,17,1993,19,10,2011,1,2,2,1,2,2,2,0,0,0,0,0,0,1,2,1,2,4,1,0,0,0,0,0,0,0,0,3,2,7,5,1,1,0,1,2,2,0,0,0,0,0,1,0,1,2,5,1,19,42,20,20,24,1,0,0,346.670013,0,346.670013,0,346.670013,0,0.0,0,0.0,0,0.0,2,17,19,10,2011,1993,2,0,6,1,2,0,0,2,1,0.0,0.0,0.0,0.0,0.0,346.670013,2,10,1,1,0,4,2,2,0,0,1,1,20,3,2,0,2,2,2,717013162,2,0,0,1,1,0,2,0,0,1,3,18,58.06,35.4,1,0,0,0,0,0.0,0.0,0.0,0.0,10,19,10,2011,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,2,1,1,0,0,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,300,10,10,3,0,1,4453.830078,0,4253.830078,0,1,3676.669922,3476.669922,0.0,0.0,0.0,0.0,200.0,445.0,222.229996,4453.830078,3,1,0,1,0,0,0,1,0,4,0,4,1,4,2.8,2,0,0.0,1,256.0,110,3,0,2,3,2,3,2,717013082,4,1,10,10,692348804,3,1,10,10,691363486,1,53,50,10,689703608,1,53,50,10,0,0,3,16,1,1,6,2,2,2,1,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999994,0.999991,0.999987,0.999983,0.999974,0.99997,0.999965,0.999958,0.99995,0.999941,0.999932,0.999923,0.999917,0.999903,0.999892,0.999878,0.999864,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
36,2853965,c,3,2.8,14,692805764,1,4,4,503,131,1,10,1,11,2,29,1982,21,12,2011,1,1,2,1,2,2,2,0,0,0,0,0,0,1,4,1,2,2,1,0,0,0,0,0,0,0,0,6,3,5,6,1,1,0,0,2,2,0,0,0,0,0,0,0,0,0,2,1,19,38,46,20,30,47,0,0,2083.330078,0,2083.330078,0,1600.0,0,0.0,0,0.0,0,0.0,2,29,21,12,2011,1982,2,0,6,0,0,0,0,0,1,0.0,0.0,0.0,0.0,0.0,1600.0,1,7,1,3,0,6,3,5,0,0,1,1,16,1,2,0,2,2,2,2853965,1,0,0,0,0,0,0,0,0,0,2,30,59.72,46.63,1,0,0,0,0,0.0,0.0,0.0,0.0,12,21,12,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,-1,1,1,0,0,0,-1,1,2,1,1,1,1,1,0,0,1,1,1,0,1,500,100,200,0,0,0,3373.02002,0,3243.02002,0,2,2715.449951,2585.449951,130.0,0.0,0.0,0.0,0.0,0.0,0.0,3373.02002,7,0,0,0,0,0,0,0,0,2,0,2,0,2,1.5,-9,0,0.3824,2,65.0,210,4,0,2,7,2,7,2,717413602,4,1,10,10,692805764,1,1,12,10,691804806,1,1,11,10,690118408,1,1,11,10,0,0,3,-8,3,1,5,2,2,2,3,0.0,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999908,0.999871,0.999809,0.999753,0.999623,0.99956,0.999476,0.999376,0.999267,0.999128,0.999002,0.998856,0.99877,0.998566,0.998401,0.9982,0.997995,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
38,2888645,c,3,2.8,7,80688804,3,4,4,516,135,1,1,1,11,2,22,1988,24,2,2011,2,1,2,1,2,2,2,0,0,0,0,0,0,1,2,1,2,1,1,0,0,0,0,0,0,0,0,2,2,6,6,1,1,0,0,2,2,0,0,0,0,0,0,0,0,0,3,1,12,5,31,12,57,34,0,0,1416.670044,0,1416.670044,0,1200.0,0,0.0,0,0.0,0,0.0,2,22,24,2,2011,1988,2,0,6,0,0,0,1,1,1,0.0,0.0,0.0,0.0,0.0,1200.0,3,11,1,1,0,5,3,4,0,0,1,1,19,3,2,0,2,2,2,89303729,2,0,0,0,0,0,0,0,0,0,3,21,57.57,51.49,1,0,0,0,0,0.0,0.0,0.0,0.0,10,16,2,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,1,2,1,1,0,0,0,2,1,2,1,1,1,1,1,0,1,1,1,1,0,1,450,250,0,2,0,0,5708.330078,0,5708.330078,0,1,4656.669922,4656.669922,0.0,0.0,0.0,0.0,0.0,803.0,437.230011,5708.330078,3,1,0,0,0,0,0,0,0,3,0,3,0,3,2.0,2,0,0.0,1,203.0,110,6,0,2,14,3,14,3,-9,-9,-9,-9,-9,80688804,3,1,10,1,79839486,3,1,12,1,77676408,1,50,61,1,0,0,3,-8,3,1,5,1,1,2,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999984,0.999977,0.999966,0.999956,0.999933,0.999922,0.999907,0.999889,0.99987,0.999846,0.999823,0.999797,0.999782,0.999746,0.999717,0.999681,0.999645,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
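The column list printed above no longer contains `qfhigh_dv`, one of the 15 hand-selected variables from the introduction. The related `qfhighfl_dv` and `hiqual_dv` are still present. So `qfhigh_dv` was removed somewhere between loading the file and this cell. A small check like the one below makes such losses visible. The list is copied from the introduction, and `missing_selected` is a hypothetical helper name.

```python
import pandas as pd

# The 15 hand-selected variables listed in the introduction
HAND_SELECTED = (
    "gor_dv", "urban_dv", "nbrsnci_dv", "scghq1_dv", "sex_dv",
    "age_dv", "ethn_dv", "marstat_dv", "jbstat", "qfhigh_dv",
    "jbiindb_dv", "fimnnet_dv", "fihhmnnet1_dv", "houscost1_dv", "health",
)


def missing_selected(df: pd.DataFrame, wanted=HAND_SELECTED) -> list:
    """Return the wanted column names that are absent from df, in their original order."""
    present = set(df.columns)
    return [c for c in wanted if c not in present]


# Toy frame that has every hand-selected column except qfhigh_dv
toy = pd.DataFrame(columns=[c for c in HAND_SELECTED if c != "qfhigh_dv"])
absent = missing_selected(toy)
print(absent)

# In the notebook (hypothetical usage): missing_selected(df)
```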


In [7]:
# Drop rows with any negative values
numeric_cols = df.select_dtypes(include='number').columns
df = df[(df[numeric_cols] >= 0).all(axis=1)]
print("Shape after dropping rows with negative values:", df.shape)
df.head()

Shape after dropping rows with negative values: (31497, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
27,1833965,c,3,3.4,8,757615204,3,3,3,46,12,3,11,1,11,1,46,1965,7,12,2011,2,2,2,1,1,2,2,0,0,0,0,0,0,1,3,1,2,2,1,0,0,0,0,0,0,0,0,3,2,2,3,1,1,1,2,2,2,0,0,0,0,0,1,2,1,2,3,1,20,26,46,21,31,13,0,0,1666.72998,0,1660.0,0,1325.660034,0,80.760002,0,0.0,0,0.0,1,46,7,12,2011,1965,2,2,6,2,2,0,0,0,1,0.0,0.0,6.73,0.0,0.0,1332.390015,1,8,2,2,0,10,5,9,0,0,1,1,19,3,2,0,2,2,2,748184965,1,0,0,2,2,1,1,0,0,0,5,20,45.33,37.91,1,0,0,0,1,0.0,0.0,0.0,0.0,11,16,11,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,255,-1,0,1,0,0,3313.25,0,1660.0,0,1,2978.909912,1325.660034,0.0,0.0,726.039978,166.710007,760.5,0.0,0.0,3313.25,4,1,0,0,0,0,0,0,2,1,2,1,0,3,2.0,1,0,0.2123,1,124.0,210,3,0,2,11,2,9,2,782088402,3,1,11,11,757615204,3,1,11,11,756846806,3,2,11,11,755412008,2,1,10,11,0,0,3,16,1,1,5,2,2,1,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
35,2670365,c,3,2.1,11,692348804,3,5,5,403,101,1,10,1,11,2,17,1993,19,10,2011,1,2,2,1,2,2,2,0,0,0,0,0,0,1,2,1,2,4,1,0,0,0,0,0,0,0,0,3,2,7,5,1,1,0,1,2,2,0,0,0,0,0,1,0,1,2,5,1,19,42,20,20,24,1,0,0,346.670013,0,346.670013,0,346.670013,0,0.0,0,0.0,0,0.0,2,17,19,10,2011,1993,2,0,6,1,2,0,0,2,1,0.0,0.0,0.0,0.0,0.0,346.670013,2,10,1,1,0,4,2,2,0,0,1,1,20,3,2,0,2,2,2,717013162,2,0,0,1,1,0,2,0,0,1,3,18,58.06,35.4,1,0,0,0,0,0.0,0.0,0.0,0.0,10,19,10,2011,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,2,1,1,0,0,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,300,10,10,3,0,1,4453.830078,0,4253.830078,0,1,3676.669922,3476.669922,0.0,0.0,0.0,0.0,200.0,445.0,222.229996,4453.830078,3,1,0,1,0,0,0,1,0,4,0,4,1,4,2.8,2,0,0.0,1,256.0,110,3,0,2,3,2,3,2,717013082,4,1,10,10,692348804,3,1,10,10,691363486,1,53,50,10,689703608,1,53,50,10,0,0,3,16,1,1,6,2,2,2,1,0.0,0.000342,0.0,0.001349,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.999994,0.999991,0.999987,0.999983,0.999974,0.99997,0.999965,0.999958,0.99995,0.999941,0.999932,0.999923,0.999917,0.999903,0.999892,0.999878,0.999864,1.0,1.0,1.0,1.0,1.0,1.0,1.0,591.265015
51,3915445,c,3,3.9,5,557436804,2,4,4,538,141,3,8,1,11,2,20,1990,31,8,2011,1,1,2,1,2,2,2,0,0,0,0,0,0,1,2,2,2,4,1,0,0,0,0,0,0,0,0,7,5,6,7,1,1,0,0,2,2,0,0,0,0,0,0,0,0,0,2,1,16,32,56,22,18,19,0,0,250.0,0,250.0,0,240.0,0,0.0,0,0.0,0,0.0,2,20,31,8,2011,1990,2,1,6,0,0,1,0,0,1,0.0,0.0,0.0,0.0,0.0,240.0,3,11,1,1,0,5,3,4,0,0,1,1,17,2,2,0,2,2,2,564924285,1,0,0,0,0,0,0,0,1,1,1,18,56.71,62.39,1,0,0,0,0,0.0,0.0,0.0,0.0,10,31,8,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,1,1,1,0,1,0,2,1,2,1,1,1,1,1,1,1,1,1,1,0,1,160,130,60,0,0,0,2524.030029,0,250.0,0,1,2514.030029,240.0,0.0,0.0,456.029999,668.0,1150.0,0.0,0.0,2524.030029,5,0,0,0,0,0,0,0,1,1,1,1,0,2,1.5,1,0,0.18,1,749.0,110,4,0,2,4,2,4,2,581617602,2,1,10,8,557436804,2,1,10,8,556015606,2,1,10,8,554248288,1,80,91,8,0,0,3,17,1,1,6,1,2,2,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
52,4091565,c,3,2.4,8,487947604,3,5,5,475,124,3,7,1,11,1,33,1978,10,8,2011,2,1,3,1,2,2,2,0,0,0,0,0,0,1,3,1,1,1,1,0,0,0,0,0,0,0,0,2,6,2,5,1,2,1,2,2,2,0,0,0,0,0,1,2,1,2,3,2,13,57,53,14,46,10,0,0,290.329987,0,0.0,0,0.0,0,0.0,0,290.329987,0,0.0,1,33,10,8,2011,1978,2,2,6,2,2,0,0,0,1,0.0,0.0,0.0,0.0,290.329987,290.329987,2,10,1,1,0,7,4,6,0,0,1,1,19,3,2,0,2,2,2,489864529,2,0,0,2,2,1,1,0,0,1,1,0,58.57,51.77,1,0,0,0,0,0.0,0.0,0.0,0.0,10,10,8,2011,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,1,1,1,0,0,0,1,1,2,1,1,1,1,1,1,0,1,1,1,0,1,550,120,0,1,0,0,1948.339966,0,0.0,0,1,1948.339966,0.0,0.0,0.0,41.669998,286.0,1620.670044,0.0,0.0,1948.339966,5,1,0,0,0,0,0,0,2,0,3,1,0,3,2.0,1,0,0.0,1,245.0,110,2,0,2,7,2,7,2,512577202,3,2,11,7,487947604,3,1,10,7,486920806,3,1,10,7,485411208,3,1,11,7,0,0,3,16,1,1,5,2,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
53,4192205,c,3,2.1,5,554247604,3,3,3,184,52,3,8,1,11,2,27,1984,31,8,2011,2,1,2,1,2,2,2,0,0,0,0,0,0,1,2,2,1,4,1,0,0,0,0,0,0,0,0,6,6,6,6,1,1,2,0,2,2,0,0,0,0,0,2,0,2,0,3,1,18,7,16,18,46,54,0,0,1816.0,0,1816.0,0,1356.0,0,0.0,0,0.0,0,0.0,2,27,31,8,2011,1984,2,0,6,1,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,1356.0,1,3,1,1,0,6,3,5,0,0,1,1,19,3,2,0,2,2,2,544838445,1,0,0,0,0,2,2,0,0,0,3,33,59.11,54.1,1,0,0,0,0,0.0,0.0,0.0,0.0,10,31,8,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2,1,1,0,0,0,2,1,2,1,1,1,1,1,0,0,1,1,1,0,1,450,100,25,2,0,0,5516.0,0,5516.0,0,1,4306.0,4306.0,0.0,0.0,0.0,0.0,0.0,211.0,63.43,5516.0,1,1,0,0,0,0,0,0,0,3,0,3,0,3,2.0,2,0,0.0,1,153.0,110,3,0,2,4,2,4,2,578564402,3,1,10,8,554247604,3,1,10,8,553370406,3,1,10,8,551779888,1,80,91,8,0,0,3,16,1,1,5,1,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
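The filter above treats every negative number as a non-substantive code. This assumes the usual Understanding Society convention, in which negative values such as -1 (don't know), -2 (refused), -8 (inapplicable) and -9 (missing) mark missing answers. The head printed above still shows negative entries, for example `-8` and `-1` in the row labelled 27. It is therefore worth confirming what the filter actually left behind. The hedged sketch below counts the remaining negatives per numeric column and lists the non-numeric columns that `select_dtypes(include='number')` skips. The helper names are hypothetical.

```python
import pandas as pd


def negative_cell_counts(df: pd.DataFrame) -> pd.Series:
    """Count negative values per numeric column, keeping only columns that have any."""
    num = df.select_dtypes(include="number")
    counts = (num < 0).sum()
    return counts[counts > 0].sort_values(ascending=False)


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Columns a numeric-only filter cannot check (e.g. object or category dtype)."""
    return df.select_dtypes(exclude="number").columns.tolist()


toy = pd.DataFrame({"a": [1, -8, 3], "b": [0.5, 1.0, -1.0], "c": ["x", "-9", "z"]})
neg = negative_cell_counts(toy)
skipped = non_numeric_columns(toy)
print(neg)
print(skipped)

# In the notebook (hypothetical usage): after the filter, negative_cell_counts(df)
# should be empty; non_numeric_columns(df) lists columns the filter never inspected.
```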


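Great! Now we need to separate out scghq1_dv (the outcome variable) and remove identifiers and survey-design variables from the predictors: pidp, wave, wave_num, hidp, psu, strata, and the others listed in the code below.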

In [8]:
# Separate outcome (scghq1_dv) and predictors (X)

Y = df['scghq1_dv'].astype(float)  # Ensure Y is a numeric pandas Series
X = df.drop(columns=['scghq1_dv'])

# Drop identifiers (pidp, wave, wave_num, hidp, pno, hhorig, memorig, psu, strata, sampst, month, ivfio, ioutcome, birthy, hrpid)
identifiers = ['pidp', 'wave', 'wave_num', 'hidp', 'pno', 'hhorig', 'memorig', 'psu', 'strata', 'sampst', 'month', 'ivfio', 'ioutcome', 'birthy', 'hrpid']
X = X.drop(columns=identifiers)
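The drop list covers the main identifiers, but `X` still holds some ID-like columns. These include the wave-prefixed household identifiers `b_hidp`, `c_hidp`, `d_hidp` and `e_hidp`, and person-number pointers such as `ppno`, `hrpno` or `b_pno`, all of which appear in the column list of `X` after scaling. Arbitrary ID numbers have no substantive meaning as predictors. A name-based heuristic can flag candidates for review. The regular expression below is an assumption about the naming pattern, not a UKHLS rule. Some pointer columns may still carry information (for example, whether a partner is present), so review the flagged list rather than dropping it blindly.

```python
import re

# Hypothetical naming heuristic: household IDs ("hidp"), person IDs ("pidp"),
# household reference person ID ("hrpid") and person-number pointers ending in "pno".
ID_PATTERN = re.compile(r"(hidp|pidp|hrpid|pno)$")


def identifier_like_columns(columns) -> list:
    """Return column names matching ID_PATTERN, preserving their order."""
    return [c for c in columns if ID_PATTERN.search(c)]


cols = ["b_hidp", "ppno", "age_dv", "fimnnet_dv", "pn1pno", "sex_dv", "c_hidp"]
flagged = identifier_like_columns(cols)
print(flagged)

# In the notebook (hypothetical usage): review identifier_like_columns(X.columns)
```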

In [9]:
# Strict rule: discrete only if dtype is object/category or integer dtype, and num_unique <= MAX_UNIQUE_DISCRETE
# Anything with more than MAX_UNIQUE_DISCRETE unique values is automatically continuous.

MAX_UNIQUE_DISCRETE = 30

label_map = dict(zip(meta.column_names, meta.column_labels)) if hasattr(meta, 'column_names') else {}
rows = []

for col in X.columns:
    dtype = X[col].dtype
    col_vals = X[col].dropna()
    num_unique = int(col_vals.nunique())

    # Automatic continuous if too many uniques
    if num_unique > MAX_UNIQUE_DISCRETE:
        will_discrete = False
        reason = f'{num_unique} unique > {MAX_UNIQUE_DISCRETE} -> continuous'
    else:
        is_string = dtype == 'object' or str(dtype).startswith('category')
        is_integer = pd.api.types.is_integer_dtype(X[col].dtype)

        will_discrete = bool(is_string or is_integer)
        if is_string:
            reason = 'string/category dtype -> discrete'
        elif is_integer:
            reason = 'integer dtype -> discrete'
        else:
            reason = 'float/numeric dtype -> continuous'

    suggested_action = 'one-hot encode (discrete)' if will_discrete else 'treat as continuous'
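    # First few distinct values, sorted (for ad-hoc inspection; not added to the summary table).
    # .iloc makes the slice explicitly positional.
    sample_vals = pd.Series(col_vals.unique()).sort_values().iloc[:6].tolist() if num_unique > 0 else []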

    rows.append({
        'Column': col,
        'Label': label_map.get(col, ''),
        'DataType': str(dtype),
        'NumUnique': num_unique,
        'IsString': dtype == 'object' or str(dtype).startswith('category'),
        'IsInteger': pd.api.types.is_integer_dtype(X[col].dtype),
        'WillBeDiscrete': will_discrete,
        'DecisionReason': reason,
        'SuggestedAction': suggested_action
    })

variable_summary = pd.DataFrame(rows)
# Order discrete first for visibility
variable_summary = variable_summary.sort_values(by=['WillBeDiscrete', 'NumUnique'], ascending=[False, True]).reset_index(drop=True)

print(f"Total variables: {len(variable_summary)}")
print(f"Discrete by rule: {int(variable_summary['WillBeDiscrete'].sum())}")
print(f"MAX_UNIQUE_DISCRETE = {MAX_UNIQUE_DISCRETE}")

pd.set_option('display.max_rows', None)
display(variable_summary)


Total variables: 302
Discrete by rule: 206
MAX_UNIQUE_DISCRETE = 30


Unnamed: 0,Column,Label,DataType,NumUnique,IsString,IsInteger,WillBeDiscrete,DecisionReason,SuggestedAction
0,intdatd_if,"Interview date: Day, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
1,intdatm_if,"Interview date: Month, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
2,intdaty_if,"Interview date: Year, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
3,doby_if,DOB Year imputation flag,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
4,age_if,Imputation flag for age_dv,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
5,c_ivfio,individual interview outcome,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
6,school_dv,Never went to/still at school,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
7,sex,Sex,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
8,lkmove,Prefers to move house,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
9,xpmove,Expects to move in next year,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
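The first rows of the summary show columns with a single distinct value: `intdatd_if`, `intdatm_if`, `intdaty_if`, `doby_if`, `age_if`, `c_ivfio` and `school_dv`. After the filtering above these columns are constant, so they cannot help predict anything. One-hot encoding them would only add uninformative columns. Below is a sketch of acting on the `SuggestedAction` column: drop constant columns, then one-hot encode the discrete ones with `pandas.get_dummies`. The helper names are hypothetical. `drop_first=True` leaves out one reference level per variable, which avoids perfectly collinear dummies. The trade-off is that the selected levels then depend on which category is the reference. Keeping every level is a common alternative under a lasso penalty.

```python
import pandas as pd


def drop_constant_columns(X: pd.DataFrame) -> pd.DataFrame:
    """Drop columns with at most one distinct non-missing value."""
    return X.loc[:, X.nunique(dropna=True) > 1]


def one_hot_discrete(X: pd.DataFrame, discrete_cols, drop_first: bool = True) -> pd.DataFrame:
    """One-hot encode the listed columns that are still present; other columns pass through."""
    cols = [c for c in discrete_cols if c in X.columns]
    return pd.get_dummies(X, columns=cols, drop_first=drop_first, dtype=float)


toy = pd.DataFrame({
    "age_if": [0, 0, 0, 0],                        # constant -> dropped
    "gor_dv": [1, 2, 3, 2],                        # discrete -> dummy columns
    "fimnnet_dv": [900.0, 1200.5, 300.0, 2500.0],  # continuous -> passed through
})
encoded = one_hot_discrete(drop_constant_columns(toy), ["age_if", "gor_dv"])
print(encoded)

# In the notebook (hypothetical usage):
# discrete_cols = variable_summary.loc[variable_summary["WillBeDiscrete"], "Column"].tolist()
# X_encoded = one_hot_discrete(drop_constant_columns(X), discrete_cols)
```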


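Great! Now we need to standardize the continuous variables (mean 0, standard deviation 1). The lasso penalty applies the same weight to every coefficient, so predictors must be on a comparable scale.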
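A small synthetic illustration (not UKHLS data) shows why scaling matters for a lasso. The same linear signal is fitted with scikit-learn's `Lasso` twice: once on a feature with unit scale, and once on the same feature divided by 1000, as if it were recorded in different units. With a fixed penalty `alpha`, the rescaled feature would need a coefficient 1000 times larger, so the penalty removes it entirely. Standardizing first restores it. The values of `alpha`, the sample size and the noise level are arbitrary choices for the demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
x = rng.normal(0.0, 1.0, size=(n, 1))              # feature on a unit scale
y = 2.0 * x[:, 0] + rng.normal(0.0, 0.5, size=n)  # true slope 2

alpha = 0.1  # same penalty strength in every fit
coef_unit = Lasso(alpha=alpha).fit(x, y).coef_[0]

x_small = x / 1000.0                               # same information, tiny units
coef_small = Lasso(alpha=alpha).fit(x_small, y).coef_[0]

x_std = StandardScaler().fit_transform(x_small)    # back to mean 0, std 1
coef_std = Lasso(alpha=alpha).fit(x_std, y).coef_[0]

print(f"unit scale: {coef_unit:.3f}, tiny units: {coef_small:.3f}, standardized: {coef_std:.3f}")
```

The tiny-unit feature is zeroed out even though it carries exactly the same signal. This is the distortion that standardizing the continuous columns is meant to prevent.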

In [10]:
# Scale continuous variables to mean 0, std 1
from sklearn.preprocessing import StandardScaler
import numpy as np

# Identify continuous columns using the variable_summary decisions
continuous_cols = variable_summary.loc[~variable_summary['WillBeDiscrete'], 'Column'].tolist()
# Only keep columns that still exist in X (defensive)
continuous_cols = [c for c in continuous_cols if c in X.columns]

print(f'Found {len(continuous_cols)} continuous column(s) to scale')

# Keep a copy of the unscaled X in case we need it later
X_unscaled = X.copy()

scaler = None
if len(continuous_cols) > 0:
    scaler = StandardScaler()
    # Work on a copy; cast to float so StandardScaler receives a numeric array
    X_scaled = X.copy()
    try:
        X_scaled[continuous_cols] = scaler.fit_transform(X_scaled[continuous_cols].astype(float))
    except Exception as e:
        # Fall back to scaling each column separately if there are issues with mixed dtypes.
        # Use a fresh scaler per column (kept in a dict) so one column's fit does not overwrite another's.
        print('Warning: bulk scaling failed, falling back to per-column scaling. Error:', e)
        scaler = {}
        for col in continuous_cols:
            try:
                col_scaler = StandardScaler()
                vals = X_scaled[col].astype(float).values.reshape(-1, 1)
                X_scaled[col] = col_scaler.fit_transform(vals).ravel()
                scaler[col] = col_scaler
            except Exception as e2:
                print(f'  Could not scale column {col}:', e2)
    # Replace X with scaled version
    X = X_scaled
else:
    print('No continuous columns to scale; X left unchanged')

print('Shape of X after scaling:', X.shape)
# Quick sanity checks
if scaler is not None and len(continuous_cols) > 0:
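    # Show means and stds for a few columns. pandas .std() uses ddof=1 while StandardScaler
    # uses ddof=0, so scaled stds print as about sqrt(n/(n-1)), e.g. 1.000016 for n = 31497.
    # Constant (zero-variance) columns are mapped to all zeros, so their std prints as 0.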
    sample_check = continuous_cols[:6]
    means = X[sample_check].mean().round(6)
    stds = X[sample_check].std().round(6)
    print('Sample scaled means (should be near 0):')
    print(means.to_dict())
    print('Sample scaled stds (should be near 1):')
    print(stds.to_dict())

# Display head for quick verification
pd.set_option('display.max_columns', None)
display(X.head())

Found 96 continuous column(s) to scale
Shape of X after scaling: (31497, 302)
Sample scaled means (should be near 0):
{'b_mortus_tw': 0.0, 'c_mortus_tw': 0.0, 'prob01ni': 0.0, 'prob91e': 0.0, 'prob91w': 0.0, 'prob91s': -0.0}
Sample scaled stds (should be near 1):
{'b_mortus_tw': 0.0, 'c_mortus_tw': 0.0, 'prob01ni': 1.000016, 'prob91e': 1.000016, 'prob91w': 1.000016, 'prob91s': 1.000016}


Unnamed: 0,nbrsnci_dv,sex,dvage,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
27,-0.28371,1,-0.115876,-1.095393,12,2011,2,2,2,1,1,2,2,0,0,0,0,0,0,1,3,1,2,2,1,0,0,0,0,0,0,0,0,3,2,2,3,1,1,1,2,2,2,0,0,0,0,0,1,2,1,2,3,1,20,-0.06573,0.958887,21,0.10705,-0.950787,0,0,-0.017847,0,0.337452,0,0.446605,0,-0.103971,0,-0.648642,-0.102372,-0.110536,1,-0.116101,-1.095393,12,2011,0.11848,2,2,6,2,2,0,0,0,1,-0.109752,-0.055754,-0.046693,-0.251228,-0.713671,-0.041775,1,8,2,2,0,10,5,9,0,0,1,1,19,3,2,0,2,2,2,1,0,0,2,2,1,1,0,0,0,5,0.483555,-0.412933,-1.195721,1,0,0,0,1,-0.310811,-0.24992,-0.389147,-0.116499,11,0.01771,11,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,3,1,1,0,0,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,-0.342745,-0.73437,-0.627075,1,0,0,-0.109458,0,-0.339211,0,1,-0.011242,-0.306992,-0.135861,-0.07604,0.247667,-0.10215,0.379483,-0.393626,-0.355857,-0.103269,4,1,0,0,0,0,0,0,2,1,2,1,0,3,0.300307,1,0,0.07043,1,-0.184237,210,3,0,2,11,2,9,2,0.011441,3,1,11,11,-0.031456,3,1,11,11,-0.032236,3,2,11,11,-0.033598,2,1,10,11,0,0,3,16,1,1,5,2,2,1,4,-1.374061,-0.2606,-0.302591,-0.263092,-0.301276,-0.242768,-0.176924,-1.322376,-0.196803,-0.24081,0.157095,0.15788,0.158612,0.159385,0.160282,0.161261,0.162223,0.162984,0.184261,0.188936,0.199234,0.20491,0.211685,0.215199,0.219569,0.224109,0.226733,0.231534,0.233569,0.236054,0.238249,0.240541,0.243034,0.246003,0.248661,0.0,0.0,0.031977,0.046482,0.06008,0.066837,0.069134,-0.79084
35,-2.169433,2,-1.802825,0.392242,10,2011,1,2,2,1,2,2,2,0,0,0,0,0,0,1,2,1,2,4,1,0,0,0,0,0,0,0,0,3,2,7,5,1,1,0,1,2,2,0,0,0,0,0,1,0,1,2,5,1,19,0.878459,-0.543315,20,-0.291053,-1.645235,0,0,-0.878042,0,-0.532203,0,-0.493183,0,-0.131257,0,-0.648642,-0.102372,-0.110536,2,-1.803121,0.392242,10,2011,1.744135,2,0,6,1,2,0,0,2,1,-0.109752,-0.055754,-0.049984,-0.251228,-0.713671,-0.449438,2,10,1,1,0,4,2,2,0,0,1,1,20,3,2,0,2,2,2,2,0,0,1,1,0,2,0,0,1,3,0.328207,0.756356,-1.454086,1,0,0,0,0,-0.310811,-0.24992,-0.389147,-0.116499,10,0.390159,10,2011,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,1,2,1,1,0,0,0,1,1,2,1,1,1,1,1,1,1,1,1,1,0,1,-0.105946,-0.640895,-0.499209,3,0,1,0.319382,0,0.612307,0,1,0.220158,0.696363,-0.135861,-0.07604,-0.091388,-0.323408,-0.564526,0.068229,-0.044475,0.186635,3,1,0,1,0,0,0,1,0,4,0,4,1,4,1.620534,2,0,-0.626279,1,0.648571,110,3,0,2,3,2,3,2,-0.130077,4,1,10,10,-0.17238,3,1,10,10,-0.1736,1,53,50,10,-0.175351,1,53,50,10,0,0,3,16,1,1,6,2,2,2,1,-1.374061,3.837305,-0.302591,3.800957,-0.301276,-0.242768,-0.176924,-1.322376,-0.196803,-0.24081,0.157095,0.15788,0.158612,0.159385,0.160282,0.161261,0.162223,0.162984,0.182733,0.187048,0.196738,0.202016,0.208184,0.211512,0.215737,0.219962,0.222289,0.226896,0.228626,0.230817,0.23296,0.234871,0.23713,0.2398,0.242258,0.0,0.0,0.031977,0.046482,0.06008,0.066837,0.069134,0.08562
51,0.441568,2,-1.628313,1.879876,8,2011,1,1,2,1,2,2,2,0,0,0,0,0,0,1,2,2,2,4,1,0,0,0,0,0,0,0,0,7,5,6,7,1,1,0,0,2,2,0,0,0,0,0,0,0,0,0,2,1,16,0.28834,1.536658,22,-0.632284,-0.603563,0,0,-0.941035,0,-0.596216,0,-0.595581,0,-0.131257,0,-0.648642,-0.102372,-0.110536,2,-1.628602,1.879876,8,2011,1.569958,2,1,6,0,0,1,0,0,1,-0.109752,-0.055754,-0.049984,-0.251228,-0.713671,-0.493553,3,11,1,1,0,5,3,4,0,0,1,1,17,2,2,0,2,2,2,1,0,0,0,0,0,0,0,1,1,1,0.328207,0.632355,1.324108,1,0,0,0,0,-0.310811,-0.24992,-0.389147,-0.116499,10,1.879954,8,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,2,1,1,1,0,1,0,2,1,2,1,1,1,1,1,1,1,1,1,1,0,1,-0.842653,0.378833,0.140122,0,0,0,-0.406193,0,-0.856455,0,1,-0.165412,-0.813406,-0.135861,-0.07604,0.121574,0.563163,1.03549,-0.393626,-0.355857,-0.303868,5,0,0,0,0,0,0,0,1,1,1,1,0,2,-0.524835,1,0,-0.03557,1,3.758982,110,4,0,2,4,2,4,2,-0.424517,2,1,10,8,-0.463684,2,1,10,8,-0.465787,2,1,10,8,-0.46757,1,80,91,8,0,0,3,17,1,1,6,1,2,2,1,-1.374061,-0.2606,-0.302591,-0.263092,-0.301276,-0.242768,-0.176924,-1.322376,-0.196803,-0.24081,0.157095,0.15788,0.158612,0.159385,0.160282,0.161261,0.162223,0.162984,0.184261,0.188936,0.199234,0.20491,0.211685,0.215199,0.219569,0.224109,0.226733,0.231534,0.233569,0.236054,0.238249,0.240541,0.243034,0.246003,0.248661,0.0,0.0,0.031977,0.046482,0.06008,0.066837,0.069134,-0.79084
52,-1.734266,1,-0.872094,-0.723484,8,2011,2,1,3,1,2,2,2,0,0,0,0,0,0,1,3,1,1,1,1,0,0,0,0,0,0,0,0,2,6,2,5,1,2,1,2,2,2,0,0,0,0,0,1,2,1,2,3,2,13,1.763636,1.363326,14,0.960128,-1.124399,0,0,-0.914755,0,-0.761759,0,-0.825971,0,-0.131257,0,-0.280365,-0.102372,-0.110536,1,-0.872351,-0.723484,8,2011,0.873248,2,2,6,2,2,0,0,0,1,-0.109752,-0.055754,-0.049984,-0.251228,-0.010486,-0.472739,2,10,1,1,0,7,4,6,0,0,1,1,19,3,2,0,2,2,2,2,0,0,2,2,1,1,0,0,1,1,-1.069922,0.803201,0.230947,1,0,0,0,0,-0.310811,-0.24992,-0.389147,-0.116499,10,-0.727188,8,2011,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,1,1,1,0,0,0,1,1,2,1,1,1,1,1,1,0,1,1,1,0,1,1.209603,0.293856,-0.627075,1,0,0,-0.622643,0,-0.948164,0,1,-0.353013,-0.925356,-0.135861,-0.07604,-0.071929,0.056172,1.828206,-0.393626,-0.355857,-0.450192,5,1,0,0,0,0,0,0,2,0,3,1,0,3,0.300307,1,0,-0.626279,1,0.57917,110,2,0,2,7,2,7,2,-0.574657,3,2,11,7,-0.613726,3,1,10,7,-0.614948,3,1,10,7,-0.616073,3,1,11,7,0,0,3,16,1,1,5,2,2,2,3,-1.374061,-0.2606,-0.302591,-0.263092,-0.301276,-0.242768,-0.176924,-1.322376,-0.196803,-0.24081,0.157095,0.15788,0.158612,0.159385,0.160282,0.161261,0.162223,0.162984,0.184261,0.188936,0.199234,0.20491,0.211685,0.215199,0.219569,0.224109,0.226733,0.231534,0.233569,0.236054,0.238249,0.240541,0.243034,0.246003,0.248661,0.0,0.0,0.031977,0.046482,0.06008,0.066837,0.069134,-0.79084
53,-2.169433,2,-1.221118,1.879876,8,2011,2,1,2,1,2,2,2,0,0,0,0,0,0,1,2,2,1,4,1,0,0,0,0,0,0,0,0,6,6,6,6,1,1,2,0,2,2,0,0,0,0,0,2,0,2,0,3,1,18,-1.186955,-0.774423,18,0.960128,1.421912,0,0,0.079422,0,0.440751,0,0.47573,0,-0.131257,0,-0.648642,-0.102372,-0.110536,2,-1.22139,1.879876,8,2011,1.221603,2,0,6,1,1,0,0,0,1,-0.109752,-0.055754,-0.049984,-0.251228,-0.713671,-0.032011,1,3,1,1,0,6,3,5,0,0,1,1,19,3,2,0,2,2,2,1,0,0,0,0,2,2,0,0,0,3,1.493315,0.852802,0.470783,1,0,0,0,0,-0.310811,-0.24992,-0.389147,-0.116499,10,1.879954,8,2011,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,2,1,1,0,0,0,2,1,2,1,1,1,1,1,0,0,1,1,1,0,1,0.683383,0.123901,-0.30741,2,0,0,0.718741,0,1.075321,0,1,0.428864,1.08321,-0.135861,-0.07604,-0.091388,-0.323408,-0.901372,-0.174634,-0.266981,0.45661,1,1,0,0,0,0,0,0,0,3,0,3,0,3,0.300307,2,0,-0.626279,1,-0.001272,110,3,0,2,4,2,4,2,-0.431157,3,1,10,8,-0.47057,3,1,10,8,-0.471498,3,1,10,8,-0.472895,1,80,91,8,0,0,3,16,1,1,5,1,2,2,3,-1.374061,-0.2606,-0.302591,-0.263092,-0.301276,-0.242768,-0.176924,-1.322376,-0.196803,-0.24081,0.157095,0.15788,0.158612,0.159385,0.160282,0.161261,0.162223,0.162984,0.184261,0.188936,0.199234,0.20491,0.211685,0.215199,0.219569,0.224109,0.226733,0.231534,0.233569,0.236054,0.238249,0.240541,0.243034,0.246003,0.248661,0.0,0.0,0.031977,0.046482,0.06008,0.066837,0.069134,-0.79084


Great, now we need to encode the categorical variables using one-hot encoding, since the lasso needs numeric inputs.
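Here is a minimal sketch of what `drop_first=True` does, using a small made-up DataFrame. `toy` and its columns `income`, `region` and `urban` are hypothetical and are not UKHLS variables. A categorical variable with K observed categories becomes K-1 indicator columns. The encoded width should therefore equal the number of non-categorical columns plus the sum of (K-1) over the categorical ones. `expected_width` is a hypothetical helper for that sanity check and is not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data standing in for X: one numeric column and two
# categorical codes stored as strings, as in the encoding cell below.
toy = pd.DataFrame({
    "income": [1200.0, 850.5, 2300.0, 990.0],
    "region": ["1", "2", "10", "2"],
    "urban": ["1", "2", "1", "1"],
})
cat_cols = ["region", "urban"]

# drop_first=True drops one reference level per variable. String categories
# sort lexicographically ('1' < '10' < '2'), so '1' is the dropped level.
dummies = pd.get_dummies(toy[cat_cols], prefix=cat_cols, prefix_sep="_", drop_first=True)
encoded = pd.concat([toy.drop(columns=cat_cols), dummies], axis=1)
print(list(encoded.columns))  # ['income', 'region_10', 'region_2', 'urban_2']


def expected_width(df: pd.DataFrame, cat_cols: list) -> int:
    """Hypothetical helper: non-categorical columns plus (K - 1) per categorical column."""
    n_other = df.shape[1] - len(cat_cols)
    # nunique() ignores NaN by default, matching dummy_na=False when missing values stay NaN
    return n_other + sum(max(df[c].nunique() - 1, 0) for c in cat_cols)


print(expected_width(toy, cat_cols), encoded.shape[1])  # 4 4
```

Because the codes are cast to strings, the dropped reference level is the lexicographically smallest code, and a code like '10' sorts before '2'. The same ordering appears in the encoded column names (e.g. `gor_dv_10` before `gor_dv_2`). To get a number to compare with the printed shape, run `expected_width` on `X` and `cat_cols` before the encoding cell overwrites `X`.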

In [11]:
from IPython.display import display

# Encode categorical variables using one-hot encoding, then show shape and head
# Identify categorical (discrete) columns from variable_summary
cat_cols = variable_summary.loc[variable_summary['WillBeDiscrete'], 'Column'].tolist()
# Defensive: keep only those present in X
cat_cols = [c for c in cat_cols if c in X.columns]
print(f'Found {len(cat_cols)} categorical column(s) to encode')

# If there are categorical columns, create dummies and merge with the rest of X
if len(cat_cols) > 0:
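    # Convert to string for stable dummy names (spaces become underscores)
    cat_df = X[cat_cols].astype(str).apply(lambda s: s.str.replace(' ', '_'))
    # astype(str) turns missing values into the string 'nan'; put them back as NaN
    # so that dummy_na=False leaves them as all-False rows instead of a spurious level
    cat_df = cat_df.where(X[cat_cols].notna())
    # Create dummies; drop_first drops one reference level per variable (K categories
    # -> K-1 indicators), which avoids perfectly collinear dummy columns
    dummies = pd.get_dummies(cat_df, prefix=cat_cols, prefix_sep='_', drop_first=True, dummy_na=False)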
    # Build new X: drop original categorical columns, concat dummies
    X_encoded = X.drop(columns=cat_cols).copy()
    # Ensure no column name collisions (rename if necessary)
    overlap = set(X_encoded.columns).intersection(dummies.columns)
    if overlap:
        # Rare: if any dummy name collides with an existing column, prefix all dummy names with 'dum_'
        dummies = dummies.rename(columns={c: f'dum_{c}' for c in dummies.columns})
    X_encoded = pd.concat([X_encoded, dummies], axis=1)
else:
    print('No categorical columns selected for encoding')
    X_encoded = X.copy()

# Replace X with encoded version for downstream modeling
X = X_encoded

print('Shape of X after encoding:', X.shape)
pd.set_option('display.max_columns', None)
display(X.head())

Found 206 categorical column(s) to encode
Shape of X after encoding: (31497, 1205)


Unnamed: 0,nbrsnci_dv,dvage,istrtdatd,istrtdatmm,istrtdatss,ienddatmm,ienddatss,fimngrs_dv,fimnlabgrs_dv,fimnlabnet_dv,fiyrinvinc_dv,fibenothr_dv,j2pay_dv,j2paynet_dv,age_dv,intdatd_dv,doby_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,intdated,xpfood1_g3,xpfdout_g3,xpaltob_g3,fihhmngrs_dv,fihhmnlabgrs_dv,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ieqmoecd_dv,fihhmngrs_if,aintlen,b_hidp,c_hidp,d_hidp,e_hidp,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd,sex_2,lkmove_2,xpmove_2,health_2,aidxhh_2,j2has_2,bensta2_1,bensta3_1,bensta4_1,bensta5_1,bensta6_1,bensta7_1,bensta96_1,vote1_2,mobuse_2,employ_2,respf16_2,respm16_2,jbhas_2,j2pay_if_1,fimngrs_tc_1,fimnlabgrs_tc_1,fimnlabnet_tc_1,fiyrinvinc_tc_1,fibenothr_tc_1,pensioner_dv_2,urban_dv_2,xtra5min_dv_1,livesp_dv_1,cohab_dv_1,single_dv_1,depchl_dv_2,respm16_dv_2,respf16_dv_2,rach16_dv_2,qfhighfl_dv_1,scflag_dv_1,paygu_if_1,paynu_if_1,seearngrs_if_1,fiyrinvinc_if_1,ivh10_0,ivh11_0,ivh12_0,ivh13_0,ivh14_0,ivh15_0,ivh16_0,fihhmngrs_tc_1,fihhmnlabgrs_tc_1,fihhnegsei_if_1,outcome_210,w6osmflag_1,dcsedfl_dv_2,genetics_1,epigenetics_1,xwdat_dv_3,bornuk_dv_2,evercoh_dv_2,evermar_dv_2,anychild_dv_2,istrtdaty_2012,istrtdaty_2013,finfut_2,finfut_3,sex_dv_1,sex_dv_2,intdaty_dv_2012,intdaty_dv_2013,npn_dv_1,npn_dv_2,npns_dv_1,npns_dv_2,ngrp_dv_1,ngrp_dv_2,hhresp_dv_2,hhresp_dv_3,ivfho_11,iv
fho_12,ivh1_0,ivh1_1,ivh2_0,ivh2_1,ivh3_0,ivh3_1,ivh4_0,ivh4_1,ivh5_0,ivh5_1,ivh6_0,ivh6_1,ivh7_0,ivh7_1,ivh8_0,ivh8_1,ivh9_0,ivh9_1,nonepar_dv_1,nonepar_dv_2,fwenum_dv_2,fwenum_dv_3,fwintvd_dv_2,fwintvd_dv_3,c_ivfho_11,c_ivfho_12,vote6_2,vote6_3,vote6_4,country_2,country_3,country_4,intdatey_2011,intdatey_2012,intdatey_2013,fuelhave1_-2,fuelhave1_0,fuelhave1_1,fuelhave2_-2,fuelhave2_0,fuelhave2_1,fuelhave3_-2,fuelhave3_0,fuelhave3_1,fuelhave4_-2,fuelhave4_0,fuelhave4_1,fuelhave96_-2,fuelhave96_0,fuelhave96_1,heatch_-2,heatch_1,heatch_2,cduse1_-2,cduse1_0,cduse1_1,cduse2_-2,cduse2_0,cduse2_1,cduse5_-2,cduse5_0,cduse5_1,cduse6_-2,cduse6_0,cduse6_1,cduse7_-2,cduse7_0,cduse7_1,cduse8_-2,cduse8_0,cduse8_1,cduse9_-2,cduse9_0,cduse9_1,cduse12_-2,cduse12_0,cduse12_1,cduse13_-2,cduse13_0,cduse13_1,cduse96_-2,cduse96_0,cduse96_1,ctband_if_1,ctband_if_2,ctband_if_3,npens_dv_1,npens_dv_2,npens_dv_3,issue_num_2,issue_num_3,issue_num_4,ethn_dv_source_2,ethn_dv_source_3,ethn_dv_source_4,finnow_2,finnow_3,finnow_4,finnow_5,npensioner_dv_0,npensioner_dv_1,npensioner_dv_2,npensioner_dv_3,fuelduel_-2,fuelduel_-8,fuelduel_1,fuelduel_2,xphsdct_-2,xphsdct_-8,xphsdct_1,xphsdct_2,xphsdba_-2,xphsdba_1,xphsdba_2,xphsdba_3,pcnet_-2,pcnet_-8,pcnet_1,pcnet_2,ncouple_dv_1,ncouple_dv_2,ncouple_dv_3,ncouple_dv_4,nch34_dv_0,nch34_dv_1,nch34_dv_2,nch34_dv_3,nch1215_dv_0,nch1215_dv_1,nch1215_dv_2,nch1215_dv_3,nadoptch_1,nadoptch_2,nadoptch_3,nadoptch_4,nadoptch_5,nch10to15_1,nch10to15_2,nch10to15_3,nch10to15_4,nch10to15_5,marstat_dv_2,marstat_dv_3,marstat_dv_4,marstat_dv_5,marstat_dv_6,hiqual_dv_2,hiqual_dv_3,hiqual_dv_4,hiqual_dv_5,hiqual_dv_9,hhintlang_0,hhintlang_2,hhintlang_6,hhintlang_8,hhintlang_9,n10to15_1,n10to15_2,n10to15_3,n10to15_4,n10to15_5,nch02_dv_0,nch02_dv_1,nch02_dv_2,nch02_dv_3,nch02_dv_5,generation_2,generation_3,generation_4,generation_5,generation_6,sclfsat1_2,sclfsat1_3,sclfsat1_4,sclfsat1_5,sclfsat1_6,sclfsat1_7,sclfsat2_2,sclfsat2_3,sclfsat2_4,sclfsat2_5,sclfsat2_6,sclfsat2_7
,sclfsat7_2,sclfsat7_3,sclfsat7_4,sclfsat7_5,sclfsat7_6,sclfsat7_7,sclfsato_2,sclfsato_3,sclfsato_4,sclfsato_5,sclfsato_6,sclfsato_7,agegr10_dv_3,agegr10_dv_4,agegr10_dv_5,agegr10_dv_6,agegr10_dv_7,agegr10_dv_8,nch511_dv_0,nch511_dv_1,nch511_dv_2,nch511_dv_3,nch511_dv_4,nch511_dv_6,nch14resp_1,nch14resp_2,nch14resp_3,nch14resp_4,nch14resp_5,nch14resp_6,nch14resp_7,nch5to15_1,nch5to15_2,nch5to15_3,nch5to15_4,nch5to15_5,nch5to15_6,nch5to15_7,pn2pno_2,pn2pno_3,pn2pno_4,pn2pno_5,pn2pno_6,pn2pno_7,pn2pno_8,pns2pno_2,pns2pno_3,pns2pno_4,pns2pno_5,pns2pno_6,pns2pno_7,pns2pno_8,grfpno_1,grfpno_2,grfpno_3,grfpno_4,grfpno_5,grfpno_6,grfpno_8,nemp_dv_1,nemp_dv_2,nemp_dv_3,nemp_dv_4,nemp_dv_5,nemp_dv_6,nemp_dv_7,nchoecd_dv_1,nchoecd_dv_2,nchoecd_dv_3,nchoecd_dv_4,nchoecd_dv_5,nchoecd_dv_6,nchoecd_dv_7,nch415resp_1,nch415resp_2,nch415resp_3,nch415resp_4,nch415resp_5,nch415resp_6,nch415resp_7,nch415resp_8,nchresp_1,nchresp_2,nchresp_3,nchresp_4,nchresp_5,nchresp_6,nchresp_7,nchresp_8,nchunder16_1,nchunder16_2,nchunder16_3,nchunder16_4,nchunder16_5,nchunder16_6,nchunder16_7,nchunder16_8,marstat_2,marstat_3,marstat_4,marstat_5,marstat_6,marstat_7,marstat_8,marstat_9,hgbiom_1,hgbiom_13,hgbiom_2,hgbiom_3,hgbiom_4,hgbiom_5,hgbiom_6,hgbiom_7,buno_dv_2,buno_dv_3,buno_dv_4,buno_dv_5,buno_dv_6,buno_dv_7,buno_dv_8,buno_dv_9,nchild_dv_1,nchild_dv_2,nchild_dv_3,nchild_dv_4,nchild_dv_5,nchild_dv_6,nchild_dv_7,nchild_dv_8,mnpno_1,mnpno_13,mnpno_2,mnpno_3,mnpno_4,mnpno_5,mnpno_6,mnpno_7,mnspno_1,mnspno_13,mnspno_2,mnspno_3,mnspno_4,mnspno_5,mnspno_6,mnspno_7,grmpno_1,grmpno_13,grmpno_2,grmpno_3,grmpno_4,grmpno_5,grmpno_6,grmpno_7,nkids_dv_1,nkids_dv_2,nkids_dv_3,nkids_dv_4,nkids_dv_5,nkids_dv_6,nkids_dv_7,nkids_dv_8,nue_dv_1,nue_dv_10,nue_dv_2,nue_dv_3,nue_dv_4,nue_dv_5,nue_dv_6,nue_dv_7,tenure_dv_1,tenure_dv_2,tenure_dv_3,tenure_dv_4,tenure_dv_5,tenure_dv_6,tenure_dv_7,tenure_dv_8,c_pno_2,c_pno_3,c_pno_4,c_pno_5,c_pno_6,c_pno_7,c_pno_8,c_pno_9,d_pno_2,d_pno_3,d_pno_4,d_pno_5,d_pno_6,d_pno_7,d_
pno_8,d_pno_9,e_pno_2,e_pno_3,e_pno_4,e_pno_5,e_pno_6,e_pno_7,e_pno_8,e_pno_9,nnatch_1,nnatch_10,nnatch_2,nnatch_3,nnatch_4,nnatch_5,nnatch_6,nnatch_7,nnatch_8,hgbiof_1,hgbiof_10,hgbiof_2,hgbiof_3,hgbiof_4,hgbiof_5,hgbiof_6,hgbiof_7,hgbiof_8,pn1pno_1,pn1pno_10,pn1pno_13,pn1pno_2,pn1pno_3,pn1pno_4,pn1pno_5,pn1pno_6,pn1pno_7,pns1pno_1,pns1pno_10,pns1pno_13,pns1pno_2,pns1pno_3,pns1pno_4,pns1pno_5,pns1pno_6,pns1pno_7,nnsib_dv_1,nnsib_dv_10,nnsib_dv_2,nnsib_dv_3,nnsib_dv_4,nnsib_dv_5,nnsib_dv_6,nnsib_dv_7,nnsib_dv_9,nnssib_dv_1,nnssib_dv_10,nnssib_dv_2,nnssib_dv_3,nnssib_dv_4,nnssib_dv_5,nnssib_dv_6,nnssib_dv_7,nnssib_dv_9,mastat_dv_10,mastat_dv_2,mastat_dv_3,mastat_dv_4,mastat_dv_5,mastat_dv_6,mastat_dv_7,mastat_dv_8,mastat_dv_9,hrpno_10,hrpno_2,hrpno_3,hrpno_4,hrpno_5,hrpno_6,hrpno_7,hrpno_8,hrpno_9,ppno_1,ppno_2,ppno_3,ppno_4,ppno_5,ppno_6,ppno_7,ppno_8,ppno_9,sppno_1,sppno_2,sppno_3,sppno_4,sppno_5,sppno_6,sppno_7,sppno_8,sppno_9,fnpno_1,fnpno_10,fnpno_2,fnpno_3,fnpno_4,fnpno_5,fnpno_6,fnpno_7,fnpno_8,fnspno_1,fnspno_10,fnspno_2,fnspno_3,fnspno_4,fnspno_5,fnspno_6,fnspno_7,fnspno_8,hsownd_-2,hsownd_-8,hsownd_-9,hsownd_1,hsownd_2,hsownd_3,hsownd_4,hsownd_5,hsownd_97,ctband_dv_1,ctband_dv_10,ctband_dv_2,ctband_dv_3,ctband_dv_4,ctband_dv_5,ctband_dv_6,ctband_dv_7,ctband_dv_8,ctband_dv_9,nwage_dv_1,nwage_dv_10,nwage_dv_2,nwage_dv_3,nwage_dv_4,nwage_dv_5,nwage_dv_6,nwage_dv_7,nwage_dv_8,nwage_dv_9,nadoecd_dv_10,nadoecd_dv_11,nadoecd_dv_2,nadoecd_dv_3,nadoecd_dv_4,nadoecd_dv_5,nadoecd_dv_6,nadoecd_dv_7,nadoecd_dv_8,nadoecd_dv_9,b_pno_10,b_pno_11,b_pno_2,b_pno_3,b_pno_4,b_pno_5,b_pno_6,b_pno_7,b_pno_8,b_pno_9,istrtdatm_10,istrtdatm_11,istrtdatm_12,istrtdatm_2,istrtdatm_3,istrtdatm_4,istrtdatm_5,istrtdatm_6,istrtdatm_7,istrtdatm_8,istrtdatm_9,jbstat_10,jbstat_11,jbstat_2,jbstat_3,jbstat_4,jbstat_5,jbstat_6,jbstat_7,jbstat_8,jbstat_9,jbstat_97,intdatm_dv_10,intdatm_dv_11,intdatm_dv_12,intdatm_dv_2,intdatm_dv_3,intdatm_dv_4,intdatm_dv_5,intdatm_dv_6,intdatm_dv_7,intdatm_dv_8,i
ntdatm_dv_9,gor_dv_10,gor_dv_11,gor_dv_12,gor_dv_2,gor_dv_3,gor_dv_4,gor_dv_5,gor_dv_6,gor_dv_7,gor_dv_8,gor_dv_9,agegr5_dv_11,agegr5_dv_12,agegr5_dv_13,agegr5_dv_14,agegr5_dv_15,agegr5_dv_4,agegr5_dv_5,agegr5_dv_6,agegr5_dv_7,agegr5_dv_8,agegr5_dv_9,agegr13_dv_11,agegr13_dv_12,agegr13_dv_13,agegr13_dv_2,agegr13_dv_3,agegr13_dv_4,agegr13_dv_5,agegr13_dv_6,agegr13_dv_7,agegr13_dv_8,agegr13_dv_9,intdatem_10,intdatem_11,intdatem_12,intdatem_2,intdatem_3,intdatem_4,intdatem_5,intdatem_6,intdatem_7,intdatem_8,intdatem_9,lwenum_dv_11,lwenum_dv_12,lwenum_dv_13,lwenum_dv_14,lwenum_dv_3,lwenum_dv_4,lwenum_dv_5,lwenum_dv_6,lwenum_dv_7,lwenum_dv_8,lwenum_dv_9,lwintvd_dv_11,lwintvd_dv_12,lwintvd_dv_13,lwintvd_dv_14,lwintvd_dv_3,lwintvd_dv_4,lwintvd_dv_5,lwintvd_dv_6,lwintvd_dv_7,lwintvd_dv_8,lwintvd_dv_9,hhsize_10,hhsize_11,hhsize_12,hhsize_13,hhsize_14,hhsize_15,hhsize_16,hhsize_2,hhsize_3,hhsize_4,hhsize_5,hhsize_6,hhsize_7,hhsize_8,hhsize_9,hsbeds_-2,hsbeds_-8,hsbeds_0,hsbeds_1,hsbeds_11,hsbeds_12,hsbeds_15,hsbeds_2,hsbeds_3,hsbeds_4,hsbeds_5,hsbeds_6,hsbeds_7,hsbeds_8,hsbeds_9,ncars_-2,ncars_0,ncars_1,ncars_12,ncars_15,ncars_2,ncars_3,ncars_30,ncars_34,ncars_4,ncars_5,ncars_6,ncars_7,ncars_8,ncars_9,scend_dv_10,scend_dv_11,scend_dv_12,scend_dv_13,scend_dv_14,scend_dv_15,scend_dv_16,scend_dv_17,scend_dv_18,scend_dv_19,scend_dv_20,scend_dv_21,scend_dv_22,scend_dv_24,scend_dv_7,racel_dv_10,racel_dv_11,racel_dv_12,racel_dv_13,racel_dv_14,racel_dv_15,racel_dv_16,racel_dv_17,racel_dv_2,racel_dv_4,racel_dv_5,racel_dv_6,racel_dv_7,racel_dv_8,racel_dv_9,racel_dv_97,ethn_dv_10,ethn_dv_11,ethn_dv_12,ethn_dv_13,ethn_dv_14,ethn_dv_15,ethn_dv_16,ethn_dv_17,ethn_dv_2,ethn_dv_4,ethn_dv_5,ethn_dv_6,ethn_dv_7,ethn_dv_8,ethn_dv_9,ethn_dv_97,hsrooms_-2,hsrooms_-8,hsrooms_1,hsrooms_10,hsrooms_12,hsrooms_15,hsrooms_2,hsrooms_20,hsrooms_3,hsrooms_4,hsrooms_5,hsrooms_6,hsrooms_60,hsrooms_7,hsrooms_8,hsrooms_9,istrtdathh_10,istrtdathh_11,istrtdathh_12,istrtdathh_13,istrtdathh_14,istrtdathh_15,istrt
dathh_16,istrtdathh_17,istrtdathh_18,istrtdathh_19,istrtdathh_20,istrtdathh_21,istrtdathh_22,istrtdathh_23,istrtdathh_7,istrtdathh_8,istrtdathh_9,hhtype_dv_10,hhtype_dv_11,hhtype_dv_12,hhtype_dv_16,hhtype_dv_17,hhtype_dv_18,hhtype_dv_19,hhtype_dv_2,hhtype_dv_20,hhtype_dv_21,hhtype_dv_22,hhtype_dv_23,hhtype_dv_3,hhtype_dv_4,hhtype_dv_5,hhtype_dv_6,hhtype_dv_8,b_ivfho_11,b_ivfho_12,b_ivfho_13,b_ivfho_39,b_ivfho_50,b_ivfho_51,b_ivfho_52,b_ivfho_53,b_ivfho_54,b_ivfho_56,b_ivfho_59,b_ivfho_60,b_ivfho_61,b_ivfho_62,b_ivfho_91,b_ivfho_96,b_ivfho_97,d_ivfio_10,d_ivfio_11,d_ivfio_14,d_ivfio_15,d_ivfio_16,d_ivfio_18,d_ivfio_2,d_ivfio_50,d_ivfio_51,d_ivfio_52,d_ivfio_53,d_ivfio_54,d_ivfio_55,d_ivfio_57,d_ivfio_80,d_ivfio_81,d_ivfio_9,b_ivfio_10,b_ivfio_11,b_ivfio_14,b_ivfio_15,b_ivfio_16,b_ivfio_18,b_ivfio_2,b_ivfio_21,b_ivfio_22,b_ivfio_25,b_ivfio_50,b_ivfio_51,b_ivfio_53,b_ivfio_60,b_ivfio_63,b_ivfio_80,b_ivfio_83,b_ivfio_9,e_ivfio_10,e_ivfio_11,e_ivfio_14,e_ivfio_15,e_ivfio_16,e_ivfio_18,e_ivfio_2,e_ivfio_50,e_ivfio_51,e_ivfio_52,e_ivfio_53,e_ivfio_54,e_ivfio_55,e_ivfio_57,e_ivfio_80,e_ivfio_83,e_ivfio_9,e_ivfio_99,d_ivfho_11,d_ivfho_12,d_ivfho_13,d_ivfho_39,d_ivfho_50,d_ivfho_51,d_ivfho_53,d_ivfho_55,d_ivfho_56,d_ivfho_59,d_ivfho_60,d_ivfho_61,d_ivfho_62,d_ivfho_63,d_ivfho_65,d_ivfho_80,d_ivfho_81,d_ivfho_91,d_ivfho_92,e_ivfho_11,e_ivfho_12,e_ivfho_13,e_ivfho_50,e_ivfho_51,e_ivfho_53,e_ivfho_55,e_ivfho_56,e_ivfho_59,e_ivfho_60,e_ivfho_61,e_ivfho_62,e_ivfho_63,e_ivfho_65,e_ivfho_80,e_ivfho_81,e_ivfho_90,e_ivfho_91,e_ivfho_96,e_ivfho_97,ienddathh_1,ienddathh_10,ienddathh_11,ienddathh_12,ienddathh_13,ienddathh_14,ienddathh_15,ienddathh_16,ienddathh_17,ienddathh_18,ienddathh_19,ienddathh_2,ienddathh_20,ienddathh_21,ienddathh_22,ienddathh_23,ienddathh_3,ienddathh_4,ienddathh_5,ienddathh_6,ienddathh_7,ienddathh_8,ienddathh_9,b_month_10,b_month_11,b_month_12,b_month_13,b_month_14,b_month_15,b_month_16,b_month_17,b_month_18,b_month_19,b_month_2,b_month_20,b_month_21,b_month_22,b_m
onth_23,b_month_24,b_month_3,b_month_4,b_month_5,b_month_6,b_month_7,b_month_8,b_month_9,c_month_10,c_month_11,c_month_12,c_month_13,c_month_14,c_month_15,c_month_16,c_month_17,c_month_18,c_month_19,c_month_2,c_month_20,c_month_21,c_month_22,c_month_23,c_month_24,c_month_3,c_month_4,c_month_5,c_month_6,c_month_7,c_month_8,c_month_9,d_month_10,d_month_11,d_month_12,d_month_13,d_month_14,d_month_15,d_month_16,d_month_17,d_month_18,d_month_19,d_month_2,d_month_20,d_month_21,d_month_22,d_month_23,d_month_24,d_month_3,d_month_4,d_month_5,d_month_6,d_month_7,d_month_8,d_month_9,e_month_10,e_month_11,e_month_12,e_month_13,e_month_14,e_month_15,e_month_16,e_month_17,e_month_18,e_month_19,e_month_2,e_month_20,e_month_21,e_month_22,e_month_23,e_month_24,e_month_3,e_month_4,e_month_5,e_month_6,e_month_7,e_month_8,e_month_9,ivtnc_1,ivtnc_10,ivtnc_11,ivtnc_12,ivtnc_13,ivtnc_14,ivtnc_15,ivtnc_16,ivtnc_17,ivtnc_18,ivtnc_19,ivtnc_2,ivtnc_20,ivtnc_21,ivtnc_22,ivtnc_23,ivtnc_24,ivtnc_3,ivtnc_4,ivtnc_5,ivtnc_6,ivtnc_7,ivtnc_8,ivtnc_9
27,-0.28371,-0.115876,-1.095393,-0.06573,0.958887,0.10705,-0.950787,-0.017847,0.337452,0.446605,-0.103971,-0.648642,-0.102372,-0.110536,-0.116101,-1.095393,0.11848,-0.109752,-0.055754,-0.046693,-0.251228,-0.713671,-0.041775,0.483555,-0.412933,-1.195721,-0.310811,-0.24992,-0.389147,-0.116499,0.01771,-0.342745,-0.73437,-0.627075,-0.109458,-0.339211,-0.011242,-0.306992,-0.135861,-0.07604,0.247667,-0.10215,0.379483,-0.393626,-0.355857,-0.103269,0.300307,0.07043,-0.184237,0.011441,-0.031456,-0.032236,-0.033598,-1.374061,-0.2606,-0.302591,-0.263092,-0.301276,-0.242768,-0.176924,-1.322376,-0.196803,-0.24081,0.157095,0.15788,0.158612,0.159385,0.160282,0.161261,0.162223,0.162984,0.184261,0.188936,0.199234,0.20491,0.211685,0.215199,0.219569,0.224109,0.226733,0.231534,0.233569,0.236054,0.238249,0.240541,0.243034,0.246003,0.248661,0.0,0.0,0.031977,0.046482,0.06008,0.066837,0.069134,-0.79084,False,True,True,False,True,True,False,False,False,False,False,False,True,True,False,False,True,True,False,False,False,False,False,False,False,True,True,False,False,False,True,True,True,True,True,False,True,False,False,False,True,True,True,True,True,True,True,True,False,False,False,True,False,True,False,False,True,False,True,True,False,False,False,False,False,True,False,False,False,False,True,False,True,False,False,True,False,True,False,False,True,True,False,True,False,True,False,True,False,True,False,True,False,True,False,True,False,False,False,True,False,True,False,True,False,True,False,False,False,False,False,True,False,False,False,False,True,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,True,False,True,False,False,False,True,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,True,False,False,False,False,False,True,False,True,False,False,False,False,True
,False,True,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,
False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,F
alse,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False
35,-2.169433,-1.802825,0.392242,0.878459,-0.543315,-0.291053,-1.645235,-0.878042,-0.532203,-0.493183,-0.131257,-0.648642,-0.102372,-0.110536,-1.803121,0.392242,1.744135,-0.109752,-0.055754,-0.049984,-0.251228,-0.713671,-0.449438,0.328207,0.756356,-1.454086,-0.310811,-0.24992,-0.389147,-0.116499,0.390159,-0.105946,-0.640895,-0.499209,0.319382,0.612307,0.220158,0.696363,-0.135861,-0.07604,-0.091388,-0.323408,-0.564526,0.068229,-0.044475,0.186635,1.620534,-0.626279,0.648571,-0.130077,-0.17238,-0.1736,-0.175351,-1.374061,3.837305,-0.302591,3.800957,-0.301276,-0.242768,-0.176924,-1.322376,-0.196803,-0.24081,0.157095,0.15788,0.158612,0.159385,0.160282,0.161261,0.162223,0.162984,0.182733,0.187048,0.196738,0.202016,0.208184,0.211512,0.215737,0.219962,0.222289,0.226896,0.228626,0.230817,0.23296,0.234871,0.23713,0.2398,0.242258,0.0,0.0,0.031977,0.046482,0.06008,0.066837,0.069134,0.08562,True,False,True,True,True,True,False,False,False,False,False,False,True,True,False,False,True,True,False,False,False,False,False,False,False,True,False,False,False,False,True,True,True,True,True,True,True,False,False,False,False,True,True,True,True,True,True,True,False,False,False,False,False,True,False,False,True,False,True,True,True,False,False,False,False,False,True,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,True,True,False,True,False,True,False,True,False,True,False,True,False,True,False,False,False,True,False,True,False,False,False,False,False,True,True,False,False,True,False,False,False,False,True,False,False,True,False,True,False,False,True,False,False,True,False,False,True,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,True,False,True,False,False,False,False,Tr
ue,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,Fa


Let's move on to fitting the lasso model!

In [14]:
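# Build and run Lasso, then map nonzero coefficients back to variable labels (removed BaselineCategory)
from sklearn.linear_model import LassoCV
from IPython.display import display  # built into Jupyter; imported explicitly so the cell also runs elsewhere
import numpy as np
import pandas as pd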

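# Fit Lasso with 5-fold cross-validation over the regularisation strength (alpha).
# random_state only has an effect when selection='random'; with the default 'cyclic' it is unused.
lasso = LassoCV(cv=5, random_state=0, max_iter=10000)
lasso.fit(X, Y)

# Keep the nonzero coefficients, ordered by absolute size (largest first)
coefs = pd.Series(lasso.coef_, index=X.columns)
nonzero_coefs = coefs[coefs != 0]
nonzero_coefs = nonzero_coefs.reindex(nonzero_coefs.abs().sort_values(ascending=False).index)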

# Create label mapping from Stata metadata
label_map = dict(zip(meta.column_names, meta.column_labels)) if hasattr(meta, 'column_labels') else {}
original_vars = set(meta.column_names) if hasattr(meta, 'column_names') else set()

def parse_variable(varname):
    """Parse variable to get base variable, category, and labels (no baseline category)."""
    # Find base variable and category
    if varname in original_vars:
        base_var, category = varname, ''
    elif '_' in varname:
        base, cat = varname.rsplit('_', 1)
        if base in original_vars:
            base_var, category = base, cat
        else:
            # Try progressively shorter prefixes for variables with underscores
            parts = varname.split('_')
            base_var, category = varname, ''
            for i in range(len(parts) - 1, 0, -1):
                potential_base = '_'.join(parts[:i])
                if potential_base in original_vars:
                    base_var, category = potential_base, '_'.join(parts[i:])
                    break
    else:
        base_var, category = varname, ''

    # Get variable label
    variable_label = label_map.get(base_var, base_var)

    # Get category label from value labels
    category_label = ''
    if category and hasattr(meta, 'variable_value_labels') and base_var in meta.variable_value_labels:
        value_dict = meta.variable_value_labels[base_var]
        for cat_type in [int, float, str]:
            try:
                cat_key = cat_type(category)
                if cat_key in value_dict:
                    category_label = value_dict[cat_key]
                    break
            except (ValueError, TypeError):
                continue

    return base_var, variable_label, category, category_label

# Build results table (no BaselineCategory)
results_data = []
for var in nonzero_coefs.index:
    base_var, variable_label, category, category_label = parse_variable(var)
    results_data.append({
        'Variable': var,
        'Label': variable_label,
        'Category': category,
        'CategoryLabel': category_label,
        'Coefficient': nonzero_coefs[var]
    })

# Display results with formatted CategoryLabel and wider Coefficient display
result_df = pd.DataFrame(results_data)
pd.set_option('display.max_rows', None)
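# "significant" here just means nonzero: lasso selects variables but does not test statistical significance
print(f"Lasso found {len(result_df)} significant variables (nonzero coefficients):")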

# Prepare display copy: truncate long category labels and format coefficients for readability
max_cat_len = 40  # max characters to show for category labels
result_df['CategoryLabel'] = result_df['CategoryLabel'].astype(str).apply(
    lambda s: s if len(s) <= max_cat_len else s[:max_cat_len-3] + '...'
)

# Keep numeric coefficient for downstream use, but create a formatted display version
result_df['Coefficient'] = result_df['Coefficient'].astype(float)
result_df_display = result_df.copy()
result_df_display['Coefficient'] = result_df_display['Coefficient'].map(lambda v: f"{v: .6f}")

# Show a compact, clearly formatted table
display_cols = ['Variable', 'Label', 'Category', 'CategoryLabel', 'Coefficient']
pd.set_option('display.max_colwidth', 50)
display(result_df_display[display_cols])

Lasso found 175 significant variables (nonzero coefficients):


|   | Variable | Label | Category | CategoryLabel | Coefficient |
|---|---|---|---|---|---|
| 0 | sf12mcs_dv | SF-12 Mental Component Summary (MCS) | | | -3.555244 |
| 1 | sclfsato_7 | Satisfaction with life overall | 7.0 | completely satisfied | -1.727926 |
| 2 | finnow_5 | Subjective financial situation - current | 5.0 | Finding it very difficult | 1.723305 |
| 3 | sclfsato_6 | Satisfaction with life overall | 6.0 | mostly satisfied | -1.239135 |
| 4 | sf12pcs_dv | SF-12 Physical Component Summary (PCS) | | | -1.020242 |
| 5 | finnow_4 | Subjective financial situation - current | 4.0 | Finding it quite difficult | 0.88861 |
| 6 | sclfsato_3 | Satisfaction with life overall | 3.0 | somewhat dissatisfied | 0.82065 |
| 7 | sclfsato_5 | Satisfaction with life overall | 5.0 | somewhat satisfied | -0.637365 |
| 8 | sex_dv_1 | Sex, derived | 1.0 | Male | -0.413807 |
| 9 | finfut_2 | Subjective financial situation - future | 2.0 | Worse of than now | 0.371181 |
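
One thing worth checking about the fit above: `LassoCV(cv=5)` uses plain, unshuffled K-fold splits, and this is panel data in which the same person (`pidp`) appears in several waves. If one person's rows can land in both the training and the validation folds, the CV error may look optimistic and the chosen penalty may be too small. Below is a minimal sketch of grouping the folds by person with scikit-learn's `GroupKFold`, passed to `LassoCV` as a precomputed list of (train, test) splits. It runs on synthetic data: `n_people`, `n_waves`, the generated `groups` array and the three "true" coefficients are illustrative assumptions standing in for `X`, `Y` and the `pidp` column, so this is a sketch rather than a re-run of the analysis.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.model_selection import GroupKFold

# Synthetic panel standing in for X / Y: each hypothetical person appears in several waves
rng = np.random.default_rng(0)
n_people, n_waves, n_features = 200, 5, 20
groups = np.repeat(np.arange(n_people), n_waves)  # plays the role of pidp
X_demo = rng.normal(size=(n_people * n_waves, n_features))
person_effect = np.repeat(rng.normal(size=n_people), n_waves)  # shared by all of a person's rows
true_coef = np.zeros(n_features)
true_coef[:3] = [2.0, -1.0, 0.5]  # only the first 3 features are informative
y_demo = X_demo @ true_coef + person_effect + rng.normal(scale=0.5, size=len(groups))

# Build folds so that all rows belonging to one person fall in the same fold
splits = list(GroupKFold(n_splits=5).split(X_demo, y_demo, groups=groups))

# LassoCV accepts an iterable of (train, test) index arrays as its cv argument
lasso_grouped = LassoCV(cv=splits, max_iter=10000)
lasso_grouped.fit(X_demo, y_demo)

selected = np.flatnonzero(lasso_grouped.coef_)
print("chosen alpha:", lasso_grouped.alpha_)
print("nonzero features:", selected)
```

In the notebook the same idea would use `groups=df_full.loc[X.index, "pidp"]` (assuming `X` keeps the row index of `df_full`), and comparing the retained variables with the ungrouped fit would show whether the fold design matters here.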


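**Very interesting outcome! Here are my key takeaways:**

A note on reading the signs: scghq1_dv runs from 0 (least distressed) to 36 (most distressed), so negative coefficients (e.g. sf12mcs_dv, sclfsato_7) are associated with lower distress, i.e. better wellbeing, and positive ones (e.g. finnow_5) with higher distress.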

Already included in my refined variable list:

- Neighbourhood social cohesion (nbrsnci_dv) keeps a nonzero coefficient, so it carries predictive information about subjective wellbeing (scghq1_dv), as expected
- Demographics are retained (e.g. age, sex)
- Education is retained (highest qualification, qfhigh_dv)
- Current economic activity (jbstat) is retained
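
Because the lasso works on dummy columns, a categorical variable counts as retained if any of its dummies has a nonzero coefficient. One way to check the list above systematically is to collapse the dummy-level coefficients back to their original variable (summing absolute coefficients, a rough importance measure rather than a formal one) and list which hand-selected variables are absent. `retained_base_vars` is a hypothetical helper, not part of pandas, and it assumes dummy columns are named `<variable>_<category>`, which matches names like `finnow_5` and `sex_dv_1` in the table above. It picks the longest matching base name so that `sex_dv_1` maps to `sex_dv` rather than `sex`. The demo uses only the ten coefficients printed above, so most hand-selected variables show up as missing; in the notebook you would pass `nonzero_coefs` and `meta.column_names` instead.

```python
import pandas as pd


def retained_base_vars(coefs, base_vars):
    """Collapse dummy-level lasso coefficients to their original variable.

    coefs: Series of nonzero coefficients indexed by column name (e.g. 'finnow_5').
    base_vars: names of the original (pre-dummy) variables.
    Returns the summed |coefficient| and the number of dummies per base variable.
    """
    def base_of(col):
        # Longest base name that equals the column or prefixes it followed by '_'
        matches = [b for b in base_vars if col == b or col.startswith(b + "_")]
        return max(matches, key=len) if matches else col

    bases = coefs.index.map(base_of)
    return (
        coefs.abs()
        .groupby(bases)
        .agg(total_abs_coef="sum", n_dummies="count")
        .sort_values("total_abs_coef", ascending=False)
    )


# The ten coefficients printed in the table above (the full fit kept 175)
top10 = pd.Series({
    "sf12mcs_dv": -3.555244, "sclfsato_7": -1.727926, "finnow_5": 1.723305,
    "sclfsato_6": -1.239135, "sf12pcs_dv": -1.020242, "finnow_4": 0.88861,
    "sclfsato_3": 0.82065, "sclfsato_5": -0.637365, "sex_dv_1": -0.413807,
    "finfut_2": 0.371181,
})
# In the notebook this would be meta.column_names; here a small illustrative subset
base_vars = ["sf12mcs_dv", "sf12pcs_dv", "sclfsato", "finnow", "finfut", "sex", "sex_dv", "nbrsnci_dv"]
summary = retained_base_vars(top10, base_vars)
print(summary)

# Hand-selected predictors from the introduction (outcome scghq1_dv left out)
hand_selected = ["gor_dv", "urban_dv", "nbrsnci_dv", "sex_dv", "age_dv", "ethn_dv",
                 "marstat_dv", "jbstat", "qfhigh_dv", "jbiindb_dv", "fimnnet_dv",
                 "fihhmnnet1_dv", "houscost1_dv", "health"]
not_in_top10 = [v for v in hand_selected if v not in summary.index]
print("hand-selected variables not among these coefficients:", not_in_top10)
```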

Variables to consider adding:
- Subjective financial situation (finnow, finfut): subjective financial strain is clearly a key predictor of wellbeing (finnow_5 and finnow_4 are among the largest coefficients)

Variables I could consider deleting from my refined variable list:
- Job industry (jbiindb_dv): it does not show up in these results, and its many categories would add a lot of dummy columns

Noteworthy variables:
- The sclfsat* variables (e.g. satisfaction with life, health, income) are strong predictors of wellbeing, but they are effectively alternative wellbeing outcomes. Including them risks circularity (predicting wellbeing with other measures of wellbeing), so I will exclude them.
- sf12mcs_dv (SF-12 mental health score) and sf12pcs_dv (SF-12 physical health score) are also strong predictors of wellbeing. They are arguably alternative measures of wellbeing too, so including them risks the same circularity (over-control).
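
If the sclfsat* variables (and possibly the SF-12 scores) are excluded, the refit could start from a reduced design matrix. A minimal sketch, assuming the dummy columns keep their original variable name as a prefix: `drop_outcome_proxies` is a hypothetical helper and the toy column names are illustrative. Prefix matching is crude, so it is worth printing the dropped list to make sure nothing unintended is removed.

```python
import pandas as pd


def drop_outcome_proxies(X, prefixes=("sclfsat", "sf12")):
    """Return (X without columns starting with any prefix, list of dropped column names)."""
    to_drop = [c for c in X.columns if str(c).startswith(tuple(prefixes))]
    return X.drop(columns=to_drop), to_drop


# Toy design matrix with illustrative column names in the <variable>_<category> style
X_toy = pd.DataFrame(
    [[0.0] * 7],
    columns=["nbrsnci_dv", "sclfsato_7", "sclfsat1_3", "sf12mcs_dv",
             "sf12pcs_dv", "finnow_5", "sex_dv_1"],
)

# Drop both groups of outcome proxies
X_reduced, dropped = drop_outcome_proxies(X_toy)
print("dropped:", dropped)
print("kept:", list(X_reduced.columns))

# Drop only the life-satisfaction items and keep the SF-12 scores
X_keep_sf12, dropped_sclfsat_only = drop_outcome_proxies(X_toy, prefixes=("sclfsat",))
print("dropped (sclfsat only):", dropped_sclfsat_only)
```

In the notebook, `X_no_proxies, dropped = drop_outcome_proxies(X)` followed by `lasso.fit(X_no_proxies, Y)` would refit without these variables; passing `prefixes=("sclfsat",)` keeps the SF-12 scores in while that decision is still open.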




