# Introduction

This notebook explores the large processed dataset (343 variables, 175,335 observations) and runs a lasso regression to see which variables are most predictive of subjective wellbeing (the GHQ Likert score, scghq1_dv).

Before starting, these are the 15 variables I hand-selected:

- gor_dv : Government Office Region
- urban_dv : Urban or rural area, derived
- nbrsnci_dv : Buckner's Neighbourhood Cohesion Instrument, short (α = .88)
- scghq1_dv : Subjective wellbeing (GHQ): Likert
- sex_dv : Sex, derived
- age_dv : Age, derived from dob_dv and intdat_dv
- ethn_dv : Ethnic group (derived from multiple sources)
- marstat_dv : Harmonised de facto marital status
- jbstat : Current economic activity
- qfhigh_dv : Highest educational qualification ever reported
- jbiindb_dv : Current job: Industrial classification (CNEF), two digits
- fimnnet_dv : Total net personal income - no deductions
- fihhmnnet1_dv : Total household net income - no deductions
- houscost1_dv : Monthly housing cost including mortgage principal payments
- health : Long-standing illness or disability
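
To make the lasso step mentioned in the introduction concrete, here is a minimal sketch of how an L1 penalty pushes the coefficients of unhelpful predictors to exactly zero. It runs on synthetic data, not the UKHLS file, and the helper names (soft_threshold, lasso_coordinate_descent) and the penalty alpha=0.5 are illustrative assumptions rather than anything used later in this notebook.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    coef = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back in
            partial_resid = y - X @ coef + X[:, j] * coef[j]
            rho = X[:, j] @ partial_resid / n
            coef[j] = soft_threshold(rho, alpha) / (X[:, j] @ X[:, j] / n)
    return coef

# Synthetic data (assumption): only the first two of five features matter
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Standardise features and centre the target so one penalty suits every column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
y_centred = y - y.mean()

coef = lasso_coordinate_descent(X_std, y_centred, alpha=0.5)
selected = [j for j in range(5) if coef[j] != 0.0]
print("Coefficients:", coef.round(3))
print("Selected feature indices:", selected)
```

The features that survive with non-zero coefficients are the ones the lasso treats as predictive. On the real data a library implementation with a cross-validated penalty would be the more usual choice.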

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import pyreadstat

%matplotlib inline

# Show every row when displaying a DataFrame
pd.set_option('display.max_rows', None)

# Make 'head' scroll across the width of the screen
pd.set_option('display.max_columns', None)

In [2]:
# Import file

path = "/Users/arikatz/VSCode Projects/ukhls-informal-institutions-project/data/droppedvaraibleswithlotsofnegatives.dta"
df_full, meta = pyreadstat.read_dta(path)

print("Shape:", df_full.shape)
df_full.head()

Shape: (175335, 343)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,xpmove,jbstat,racel_dv,health,aidhh,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,fiyrdia,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,ndepchl_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhigh_dv,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmn
grs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,b_hidp,b_pno,b_ivfio,b_ivfho,b_month,c_hidp,c_pno,c_ivfio,c_ivfho,c_month,d_hidp,d_pno,d_ivfio,d_ivfho,d_month,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_pno,h_ivfio,h_ivfho,h_month,i_hidp,i_pno,i_ivfio,i_ivfho,i_month,genetics,epigenetics,xwdat_dv,scend_dv,school_dv,bornuk_dv,generation,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
0,22445,f,6,3.4,25,278664010,3,3,3,4,2,1,6,1,11,2,29,1984,26,6,2014,2,1,2,1,2,2,2,1,0,0,0,0,0,0,1,0,2,2,1,2,1,0,0,0,0,0,0,0,0,2,5,2,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,2,1,18,16,57,19,7,12,0,0,2572.590088,0,2572.590088,0,2012.0,0,0.0,0,0.0,90,72.0,2,29,26,6,2014,1984,2,1,6,1,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,2012.0,1,7,1,1,0,6,3,5,0,0,1,1,17,3,2,0,0,2,2,2,272012925,1,0,0,0,0,1,1,0,0,-8,0,3,31,62.12,32.59,1,0,0,0,0,0.0,0.0,0.0,0.0,14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
1,22445,i,9,3.3,11,277344816,1,3,3,4,2,1,6,1,11,2,33,1984,23,10,2017,2,2,2,1,2,-8,2,2,0,0,0,0,0,0,1,0,2,3,1,1,1,0,0,0,0,0,0,0,0,4,4,4,5,2,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,18,50,38,19,28,40,0,0,2423.030029,0,2333.330078,0,1200.0,0,0.0,0,89.699997,0,0.0,2,33,23,10,2017,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,89.699997,1289.699951,1,7,1,1,0,7,4,6,0,0,1,2,3,1,2,0,-8,2,2,2,22445,1,0,0,0,0,0,0,0,0,-8,0,3,31,57.2,46.08,1,0,0,0,0,0.0,0.0,0.0,0.0,10,23.0,10.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,150.0,30.0,1.0,0.0,0.0,2423.030029,0.0,2333.330078,0.0,1.0,1289.699951,1200.0,0.0,0.0,0.0,0.0,89.699997,1300.0,736.869995,2423.030029,4.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,1.0,0.0,1.0,1.0,2.0,0.0,0.0,1.0,10.0,110.0,9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
2,22445,l,12,1.6,32,276637622,1,3,3,4,2,1,4,1,11,2,35,1984,2,4,2020,2,1,6,1,2,2,2,2,0,0,0,0,0,0,1,0,4,1,1,2,1,2,1,2,2,0,2,0,0,5,3,3,5,2,2,0,0,2,0,1,0,0,0,0,0,0,0,0,0,4,2,21,1,46,21,17,32,0,0,145.169998,0,0.0,0,0.0,0,0.0,0,145.169998,0,0.0,2,35,2,4,2020,1984,2,0,1,0,0,0,0,0,1,0.0,0.0,0.0,0.0,145.169998,145.169998,1,7,1,1,0,8,4,7,1,0,0,2,11,1,2,2,2,1,2,1,276841780,1,2,2,0,0,0,0,0,0,1,0,1,0,67.18,19.42,0,0,0,0,0,1.0,0.0,1.0,0.0,10,2.0,4.0,2020.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,3.0,2.0,1.0,1.0,1.0,0.0,0.0,1.0,2.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,100.0,0.0,0.0,0.0,5656.390137,0.0,5070.0,0.0,1.0,4146.390137,3560.0,350.0,0.0,0.0,0.0,236.389999,1350.0,705.679993,5656.390137,4.0,1.0,0.0,2.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,2.0,2.0,2.1,2.0,0.0,0.0257,1.0,9.35,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,280942006,5,1,10,4,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,1,10,6,277344816,1,1,10,6,0,0,3,-8,3,1,5,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,1,2,2,2,0,0,1,0,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,10,48,3,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,-8,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29.0,9.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,350.0,30.0,0.0,1.0,0.0,0.0,2175.620117,0.0,13.82,0.0,3.0,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2.0,0.0,1.0,2.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,7.0,0.0,0.0451,1.0,10.0,110.0,3.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,1,10,8,619024416,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
4,29925,i,9,3.5,9,619024416,1,3,3,6,2,1,8,1,11,2,40,1977,22,8,2017,1,2,2,1,1,2,2,2,0,0,1,0,0,0,0,0,4,3,2,1,1,2,2,2,2,0,2,2,0,6,3,6,5,5,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,2,20,39,52,21,37,24,0,0,3054.530029,0,1400.0,0,1250.0,0,0.0,0,1654.530029,0,0.0,2,40,22,8,2017,1977,2,0,4,0,0,0,0,0,1,0.0,1000.0,0.0,0.0,654.530029,2904.530029,1,7,1,1,0,9,5,8,0,0,1,5,5,1,2,2,2,1,2,1,622866606,1,0,0,0,0,0,0,0,0,-8,0,1,27,62.04,41.06,0,0,0,0,0,0.0,0.0,0.0,0.0,10,22.0,8.0,2017.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,300.0,100.0,0.0,1.0,0.0,0.0,3054.530029,0.0,1400.0,0.0,2.0,2904.530029,1250.0,0.0,1000.0,0.0,0.0,654.530029,0.0,0.0,3054.530029,4.0,0.0,1.0,2.0,0.0,0.0,2.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,1.0,0.0,0.0011,1.0,17.0,110.0,-9.0,0,2,14,4,14,4,-9,-9,-9,-9,-9,-9,-9,-9,-9,-9,622866406,3,2,11,9,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,1,10,8,619024416,1,1,10,8,0,0,3,-8,3,1,5,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


To speed things up, we'll only use observations from a single wave: wave f (35,472 obs).
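
Before committing to a single wave, it can help to confirm how many rows each wave contributes. The sketch below does this on a small made-up frame (the toy DataFrame and its values are assumptions; only the wave column name and its letter coding come from the data above).

```python
import pandas as pd

# Toy stand-in for df_full (assumption): 'wave' holds one letter per survey wave
toy = pd.DataFrame({
    "pidp": [1, 1, 2, 2, 3, 3, 3],
    "wave": ["f", "i", "f", "i", "c", "f", "l"],
    "scghq1_dv": [25, 11, 11, 9, 7, 12, 30],
})

# Count rows per wave, then pick the wave with the most rows
wave_sizes = toy["wave"].value_counts()
largest_wave = wave_sizes.idxmax()

# .copy() gives an independent frame, so later edits don't touch the original
one_wave = toy[toy["wave"] == largest_wave].copy()

print(wave_sizes)
print("Largest wave:", largest_wave, "| rows kept:", len(one_wave))
```

Running the same value_counts call on df_full['wave'] would show the actual wave sizes for this dataset.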

In [None]:
# Keep only wave f observations; .copy() avoids chained-assignment warnings later
df = df_full[df_full['wave'] == 'f'].copy()
print("Shape after dropping non-F wave observations:", df.shape)
df.head(20)

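Now we need to deal with missing and negative values. In UKHLS, negative values are missing-data codes (for example, -9 for missing and -8 for inapplicable), so we treat them like NaN.

First, let's see which columns have the most missing and negative values.

Then, we can drop the worst of those columns.

Finally, we'll drop any remaining rows with missing or negative values.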
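
The three steps above can be sketched end to end on a tiny made-up frame, which makes it easier to check the logic by eye before running it on 300+ columns. The column names are borrowed from the dataset, but the values and the choice to drop only the single worst column are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): negative codes and NaN both mean "no usable value"
toy = pd.DataFrame({
    "wave": ["f", "f", "f", "f"],
    "age_dv": [29, 37, 24, 26],
    "heatch": [1.0, -8.0, 2.0, np.nan],
    "qfhigh_dv": [-8, -8, 1, -9],
})

# Step 1: share of unusable cells per numeric column
numeric = toy.select_dtypes(include="number")
unusable = numeric.isna() | (numeric < 0)  # NaN < 0 is False, hence the explicit isna()
pct_unusable = unusable.mean().mul(100).sort_values(ascending=False)

# Step 2: drop the worst column (the notebook drops the top 25)
worst = pct_unusable.head(1).index
trimmed = toy.drop(columns=worst)

# Step 3: drop rows that still contain a missing or negative value
keep = ~unusable.drop(columns=worst).any(axis=1)
clean = trimmed[keep]

print(pct_unusable)
print(clean)
```

Computing the missing and negative shares from one boolean mask keeps both on the same 0 to 100 scale, so they can be compared or summed directly.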

In [3]:
# Calculate % missing and % negative for each column
missing_pct = df.isnull().mean() * 100
negative_pct = (df.select_dtypes(include=['number']) < 0).mean() * 100

# Combine into a summary DataFrame and sort by the sum of % missing and % negative (descending)
summary = pd.DataFrame({
    '% Missing': missing_pct,
    '% Negative': negative_pct
}).fillna(0)

summary['% Total'] = summary['% Missing'] + summary['% Negative']
summary = summary.sort_values(by='% Total', ascending=False)

summary.head(50)


In [None]:
# Delete the top 25 columns with the highest % missing + % negative
cols_to_drop = summary.head(25).index
df = df.drop(columns=cols_to_drop)

print("Shape after dropping top 25 columns with highest % missing + % negative:", df.shape)
df.head()

Shape after dropping top 25 columns with highest % missing + % negative: (35472, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nk
ids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,d_hidp,d_pno,d_ivfio,d_ivfho,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_ivfio,h_ivfho,genetics,epigenetics,xwdat_dv,school_dv,bornuk_dv,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
0,22445,f,6,3.4,25,278664010,3,3,3,4,2,1,6,1,11,2,29,1984,26,6,2014,2,2,1,2,2,1,0,0,0,0,0,0,1,2,2,1,2,1,0,0,0,0,0,0,0,0,2,5,2,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,2,1,18,16,57,19,7,12,0,0,2572.590088,0,2572.590088,0,2012.0,0,0.0,0,0.0,90,72.0,2,29,26,6,2014,1984,2,1,6,1,1,0,0,0,1,0.0,0.0,0.0,0.0,0.0,2012.0,1,7,1,1,0,6,3,5,0,0,1,1,17,3,2,0,2,2,2,272012925,1,0,0,0,0,1,1,0,0,0,3,31,62.12,32.59,1,0,0,0,0,0.0,0.0,0.0,0.0,14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,2,14,4,14,4,280942006,5,1,10,279255608,5,1,10,4,278664010,3,1,14,6,278447092,1,1,10,6,278092814,1,10,0,0,3,3,1,1,1,1,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999978,0.999948,0.99992,0.99989,0.999854,0.999813,0.999772,0.999738,0.999689,0.999649,0.999609,0.999566,0.999452,0.999389,0.999288,0.999219,0.999144,0.999005,0.998933,0.99884,0.998742,0.998624,0.998511,0.998397,0.998219,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,2,2,0,0,1,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,10,48,3,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29.0,9.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,4.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,1.0,1.0,1.0,0.0,1.0,350.0,30.0,0.0,1.0,0.0,0.0,2175.620117,0.0,13.82,0.0,3.0,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2.0,0.0,1.0,2.0,0.0,2.0,0.0,0.0,0.0,1.0,0.0,1.0,2.0,1.0,1.6,7.0,0.0,0.0451,1.0,10.0,110.0,3.0,0,2,14,4,14,4,622866406,3,2,11,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,10,0,0,3,3,1,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
10,333205,f,6,3.4,5,416683610,3,3,3,80,17,1,5,1,11,2,24,1990,2,6,2014,2,2,1,2,1,2,0,0,0,0,0,0,1,1,2,2,3,1,0,0,0,0,0,0,0,0,6,1,5,1,1,1,2,1,0,0,0,0,0,0,0,0,1,2,1,2,3,1,19,2,37,19,49,44,0,0,2129.169922,0,2125.0,0,1600.0,0,50.040001,0,0.0,0,0.0,2,24,2,6,2014,1990,2,0,6,2,2,0,0,0,1,0.0,0.0,4.17,0.0,0.0,1604.170044,1,5,2,1,0,5,3,4,0,0,1,1,19,3,2,0,2,2,2,411584289,2,0,0,1,1,2,2,0,0,0,4,33,57.49,54.2,1,0,0,0,0,0.0,0.0,0.0,0.0,10,28.0,5.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,3.0,2.0,1.0,1.0,0.0,0.0,0.0,2.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,400.0,50.0,25.0,4.0,0.0,0.0,8560.679688,0.0,5125.0,0.0,1.0,6885.680176,3450.0,0.0,0.0,135.679993,3300.0,0.0,1116.0,56.700001,8560.679688,7.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,3.0,0.0,3.0,2.0,2.0,0.0,0.0154,1.0,14.0,110.0,5.0,0,2,11,6,11,6,-9,-9,-9,-9,-9,-9,-9,-9,-9,416683610,3,1,10,5,416466012,3,1,10,5,415935614,1,10,0,0,3,3,1,1,1,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999982,0.999958,0.999936,0.999911,0.999882,0.99985,0.999816,0.999789,0.999749,0.999717,0.999685,0.99965,0.999559,0.999507,0.999426,0.99937,0.99931,0.999198,0.99914,0.999064,0.998985,0.99889,0.998798,0.998707,0.998563,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
12,387605,f,6,2.8,22,347486810,3,3,3,92,20,1,5,1,11,2,26,1988,30,5,2014,2,3,1,2,2,2,0,0,0,0,0,0,1,4,1,2,2,1,0,0,0,0,0,0,0,0,3,6,3,2,1,2,2,1,0,0,0,0,0,0,0,0,1,2,1,2,3,2,17,32,33,18,9,39,0,0,309.829987,0,0.0,0,0.0,0,0.0,0,309.829987,0,0.0,2,26,30,5,2014,1988,2,0,6,2,2,0,0,0,1,0.0,0.0,0.0,0.0,309.829987,309.829987,1,9,2,1,0,6,3,5,0,0,1,1,19,3,2,0,2,2,2,341490565,1,0,0,1,1,2,2,0,0,0,3,0,65.49,19.88,1,0,0,0,0,0.0,0.0,0.0,0.0,10,30.0,5.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0,2.0,2.0,1.0,0.0,1.0,0.0,0.0,-8.0,1.0,2.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,320.0,100.0,80.0,2.0,0.0,0.0,5860.490234,0.0,5509.0,0.0,1.0,4514.490234,4163.0,0.0,0.0,41.66,0.0,309.829987,560.0,37.799999,5860.490234,4.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,1.0,3.0,0.0,3.0,2.0,2.0,0.0,0.0,1.0,11.0,110.0,5.0,0,2,8,4,7,4,349690006,3,1,12,348044408,3,1,10,5,347486810,3,1,10,5,347194412,3,1,10,5,346698014,11,12,0,0,3,3,1,1,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.99998,0.999953,0.999928,0.9999,0.999867,0.99983,0.999793,0.999762,0.999717,0.999681,0.999645,0.999606,0.999502,0.999445,0.999353,0.999291,0.999222,0.999096,0.999031,0.998946,0.998857,0.998749,0.998647,0.998543,0.998381,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
15,541285,f,6,1.9,18,143384810,2,3,3,120,30,1,1,1,11,1,28,1985,14,1,2014,2,1,1,2,2,2,0,0,0,0,0,0,1,1,1,2,2,1,0,0,0,0,0,0,0,0,3,6,4,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,3,2,15,58,31,16,49,49,0,0,1300.0,0,1300.0,0,1126.800049,0,0.0,0,0.0,0,0.0,1,28,14,1,2014,1985,2,0,6,1,1,0,1,1,1,0.0,0.0,0.0,0.0,0.0,1126.800049,1,4,1,3,0,6,3,5,0,0,1,1,22,2,2,0,2,2,2,137280449,1,0,0,0,0,1,1,0,0,1,1,20,61.21,24.76,1,0,0,0,0,0.0,0.0,0.0,0.0,12,24.0,1.0,2014.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,5.0,4.0,1.0,1.0,1.0,0.0,0.0,0.0,1.0,1.0,-9.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,1.0,400.0,45.0,120.0,4.0,0.0,0.0,2226.040039,0.0,2224.790039,0.0,1.0,2045.619995,2044.369995,0.0,0.0,1.25,0.0,0.0,0.0,0.0,2226.040039,6.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,3.0,0.0,3.0,0.0,3.0,2.0,1.0,0.0,0.4156,1.0,18.0,210.0,4.0,0,2,12,3,6,3,145989206,4,2,11,144146408,3,2,11,2,143384810,2,1,12,1,143092412,2,11,14,1,142881614,11,12,0,0,3,1,1,2,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999979,0.999951,0.999925,0.999896,0.999862,0.999824,0.999785,0.999753,0.999706,0.999669,0.999631,0.99959,0.999483,0.999423,0.999328,0.999263,0.999193,0.999062,0.998994,0.998906,0.998814,0.998702,0.998596,0.998489,0.998321,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


In [None]:
# Drop rows with any missing values
df = df.dropna()  # Remove rows with any NaN values

print("Shape after dropping rows with missing values:", df.shape)

df.head()

Shape after dropping rows with missing values: (35131, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nk
ids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,d_hidp,d_pno,d_ivfio,d_ivfho,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_ivfio,h_ivfho,genetics,epigenetics,xwdat_dv,school_dv,bornuk_dv,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,2,2,0,0,1,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,10,48,3,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29,9,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,4,1,1,0,0,0,1,1,2,1,1,1,1,1,0,1,1,1,1,0,1,350,30,0,1,0,0,2175.620117,0,13.82,0,3,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2,0,1,2,0,2,0,0,0,1,0,1,2,1,1.6,7,0,0.0451,1,10.0,110,3,0,2,14,4,14,4,622866406,3,2,11,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,10,0,0,3,3,1,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
10,333205,f,6,3.4,5,416683610,3,3,3,80,17,1,5,1,11,2,24,1990,2,6,2014,2,2,1,2,1,2,0,0,0,0,0,0,1,1,2,2,3,1,0,0,0,0,0,0,0,0,6,1,5,1,1,1,2,1,0,0,0,0,0,0,0,0,1,2,1,2,3,1,19,2,37,19,49,44,0,0,2129.169922,0,2125.0,0,1600.0,0,50.040001,0,0.0,0,0.0,2,24,2,6,2014,1990,2,0,6,2,2,0,0,0,1,0.0,0.0,4.17,0.0,0.0,1604.170044,1,5,2,1,0,5,3,4,0,0,1,1,19,3,2,0,2,2,2,411584289,2,0,0,1,1,2,2,0,0,0,4,33,57.49,54.2,1,0,0,0,0,0.0,0.0,0.0,0.0,10,28,5,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,3,2,1,1,0,0,0,2,1,2,1,1,1,1,1,1,1,1,1,1,0,1,400,50,25,4,0,0,8560.679688,0,5125.0,0,1,6885.680176,3450.0,0.0,0.0,135.679993,3300.0,0.0,1116.0,56.700001,8560.679688,7,1,0,0,0,0,0,0,0,2,1,3,0,3,2.0,2,0,0.0154,1,14.0,110,5,0,2,11,6,11,6,-9,-9,-9,-9,-9,-9,-9,-9,-9,416683610,3,1,10,5,416466012,3,1,10,5,415935614,1,10,0,0,3,3,1,1,1,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999982,0.999958,0.999936,0.999911,0.999882,0.99985,0.999816,0.999789,0.999749,0.999717,0.999685,0.99965,0.999559,0.999507,0.999426,0.99937,0.99931,0.999198,0.99914,0.999064,0.998985,0.99889,0.998798,0.998707,0.998563,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
12,387605,f,6,2.8,22,347486810,3,3,3,92,20,1,5,1,11,2,26,1988,30,5,2014,2,3,1,2,2,2,0,0,0,0,0,0,1,4,1,2,2,1,0,0,0,0,0,0,0,0,3,6,3,2,1,2,2,1,0,0,0,0,0,0,0,0,1,2,1,2,3,2,17,32,33,18,9,39,0,0,309.829987,0,0.0,0,0.0,0,0.0,0,309.829987,0,0.0,2,26,30,5,2014,1988,2,0,6,2,2,0,0,0,1,0.0,0.0,0.0,0.0,309.829987,309.829987,1,9,2,1,0,6,3,5,0,0,1,1,19,3,2,0,2,2,2,341490565,1,0,0,1,1,2,2,0,0,0,3,0,65.49,19.88,1,0,0,0,0,0.0,0.0,0.0,0.0,10,30,5,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,2,1,0,1,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,320,100,80,2,0,0,5860.490234,0,5509.0,0,1,4514.490234,4163.0,0.0,0.0,41.66,0.0,309.829987,560.0,37.799999,5860.490234,4,1,0,0,0,0,0,0,0,2,1,3,0,3,2.0,2,0,0.0,1,11.0,110,5,0,2,8,4,7,4,349690006,3,1,12,348044408,3,1,10,5,347486810,3,1,10,5,347194412,3,1,10,5,346698014,11,12,0,0,3,3,1,1,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.99998,0.999953,0.999928,0.9999,0.999867,0.99983,0.999793,0.999762,0.999717,0.999681,0.999645,0.999606,0.999502,0.999445,0.999353,0.999291,0.999222,0.999096,0.999031,0.998946,0.998857,0.998749,0.998647,0.998543,0.998381,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
15,541285,f,6,1.9,18,143384810,2,3,3,120,30,1,1,1,11,1,28,1985,14,1,2014,2,1,1,2,2,2,0,0,0,0,0,0,1,1,1,2,2,1,0,0,0,0,0,0,0,0,3,6,4,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,3,2,15,58,31,16,49,49,0,0,1300.0,0,1300.0,0,1126.800049,0,0.0,0,0.0,0,0.0,1,28,14,1,2014,1985,2,0,6,1,1,0,1,1,1,0.0,0.0,0.0,0.0,0.0,1126.800049,1,4,1,3,0,6,3,5,0,0,1,1,22,2,2,0,2,2,2,137280449,1,0,0,0,0,1,1,0,0,1,1,20,61.21,24.76,1,0,0,0,0,0.0,0.0,0.0,0.0,12,24,1,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,4,1,1,1,0,0,0,1,1,-9,1,1,1,1,1,1,1,1,1,1,0,1,400,45,120,4,0,0,2226.040039,0,2224.790039,0,1,2045.619995,2044.369995,0.0,0.0,1.25,0.0,0.0,0.0,0.0,2226.040039,6,0,0,0,0,0,0,0,0,3,0,3,0,3,2.0,1,0,0.4156,1,18.0,210,4,0,2,12,3,6,3,145989206,4,2,11,144146408,3,2,11,2,143384810,2,1,12,1,143092412,2,11,14,1,142881614,11,12,0,0,3,1,1,2,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999979,0.999951,0.999925,0.999896,0.999862,0.999824,0.999785,0.999753,0.999706,0.999669,0.999631,0.99959,0.999483,0.999423,0.999328,0.999263,0.999193,0.999062,0.998994,0.998906,0.998814,0.998702,0.998596,0.998489,0.998321,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
18,665045,f,6,3.3,10,212588410,4,3,3,144,38,1,1,1,11,1,32,1981,18,2,2014,2,2,1,2,2,2,0,0,0,0,0,0,1,4,1,1,1,1,0,0,0,0,0,0,0,0,5,3,5,4,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,2,4,1,15,26,7,16,15,3,0,0,460.0,0,460.0,0,460.0,0,0.0,0,0.0,0,0.0,1,32,18,2,2014,1981,2,0,6,1,2,0,0,1,1,0.0,0.0,0.0,0.0,0.0,460.0,1,5,2,3,0,7,4,6,0,0,1,1,19,4,2,0,2,2,2,205598685,1,0,0,0,2,1,1,0,0,0,3,24,53.97,50.14,0,0,0,0,0,0.0,0.0,0.0,0.0,12,25,2,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,1,0,0,0,1,1,2,1,-8,-8,-8,-8,-8,-8,-8,-8,-8,-9,1,350,100,0,2,0,0,11270.959961,0,9879.849609,0,1,7817.240234,6426.129883,0.0,0.0,1148.439941,0.0,242.669998,0.0,0.0,11270.959961,3,1,0,0,0,0,0,0,0,3,1,4,0,4,2.5,1,0,0.862,1,17.0,210,20,0,2,14,2,12,3,214812006,4,1,12,213071208,4,1,12,3,212588410,4,1,12,1,212262012,4,10,12,1,212064814,1,12,0,0,3,1,1,2,1,2,2,0.000342,0.0,0.0,0.0,0.0,0.0,0.0,0.001829,0.0,0.0,0.999946,0.999874,0.999806,0.999732,0.999644,0.999545,0.999445,0.999363,0.999242,0.999145,0.999048,0.998943,0.998667,0.998512,0.998267,0.9981,0.997918,0.997581,0.997406,0.99718,0.996943,0.996656,0.996382,0.996107,0.995676,1.0,1.0,1.0,1.0,1.0,1.0,1.0,460.659668


In [None]:
# Drop rows with any negative values
numeric_cols = df.select_dtypes(include='number').columns
df = df[(df[numeric_cols] >= 0).all(axis=1)]
print("Shape after dropping rows with negative values:", df.shape)
df.head()

Shape after dropping rows with negative values: (30393, 318)


Unnamed: 0,pidp,wave,wave_num,nbrsnci_dv,scghq1_dv,hidp,pno,hhorig,memorig,psu,strata,sampst,month,ivfio,ioutcome,sex,dvage,birthy,istrtdatd,istrtdatm,istrtdaty,lkmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpid,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nk
ids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqmoecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,d_hidp,d_pno,d_ivfio,d_ivfho,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_ivfio,h_ivfho,genetics,epigenetics,xwdat_dv,school_dv,bornuk_dv,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
3,29925,f,6,4.1,11,620547610,1,3,3,6,2,1,8,1,11,2,37,1977,29,9,2014,1,1,1,1,2,2,0,0,1,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,0,47,10,48,3,0,0,2175.620117,0,13.82,0,13.82,0,0.0,0,2161.800049,0,0.0,2,37,29,9,2014,1977,2,0,5,0,0,0,0,0,1,0.0,320.0,0.0,0.0,1841.800049,2175.620117,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,1,2,1,29925,1,0,0,0,0,0,0,0,0,0,1,30,56.59,35.67,1,0,0,1,0,0.04,1.0,0.05,0.0,10,29,9,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,4,1,1,0,0,0,1,1,2,1,1,1,1,1,0,1,1,1,1,0,1,350,30,0,1,0,0,2175.620117,0,13.82,0,3,2175.620117,13.82,0.0,320.0,0.0,0.0,1841.800049,1451.0,1451.0,2175.620117,2,0,1,2,0,2,0,0,0,1,0,1,2,1,1.6,7,0,0.0451,1,10.0,110,3,0,2,14,4,14,4,622866406,3,2,11,621384688,1,50,61,9,620547610,1,1,10,8,620316412,1,1,10,8,619935614,1,10,0,0,3,3,1,1,1,1,2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999955,0.999895,0.999839,0.999777,0.999704,0.999622,0.999538,0.99947,0.999369,0.999289,0.999208,0.99912,0.99889,0.998761,0.998557,0.998418,0.998266,0.997985,0.997838,0.997649,0.997451,0.997212,0.996983,0.996752,0.996392,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
12,387605,f,6,2.8,22,347486810,3,3,3,92,20,1,5,1,11,2,26,1988,30,5,2014,2,3,1,2,2,2,0,0,0,0,0,0,1,4,1,2,2,1,0,0,0,0,0,0,0,0,3,6,3,2,1,2,2,1,0,0,0,0,0,0,0,0,1,2,1,2,3,2,17,32,33,18,9,39,0,0,309.829987,0,0.0,0,0.0,0,0.0,0,309.829987,0,0.0,2,26,30,5,2014,1988,2,0,6,2,2,0,0,0,1,0.0,0.0,0.0,0.0,309.829987,309.829987,1,9,2,1,0,6,3,5,0,0,1,1,19,3,2,0,2,2,2,341490565,1,0,0,1,1,2,2,0,0,0,3,0,65.49,19.88,1,0,0,0,0,0.0,0.0,0.0,0.0,10,30,5,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,2,1,0,1,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,320,100,80,2,0,0,5860.490234,0,5509.0,0,1,4514.490234,4163.0,0.0,0.0,41.66,0.0,309.829987,560.0,37.799999,5860.490234,4,1,0,0,0,0,0,0,0,2,1,3,0,3,2.0,2,0,0.0,1,11.0,110,5,0,2,8,4,7,4,349690006,3,1,12,348044408,3,1,10,5,347486810,3,1,10,5,347194412,3,1,10,5,346698014,11,12,0,0,3,3,1,1,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.99998,0.999953,0.999928,0.9999,0.999867,0.99983,0.999793,0.999762,0.999717,0.999681,0.999645,0.999606,0.999502,0.999445,0.999353,0.999291,0.999222,0.999096,0.999031,0.998946,0.998857,0.998749,0.998647,0.998543,0.998381,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
15,541285,f,6,1.9,18,143384810,2,3,3,120,30,1,1,1,11,1,28,1985,14,1,2014,2,1,1,2,2,2,0,0,0,0,0,0,1,1,1,2,2,1,0,0,0,0,0,0,0,0,3,6,4,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,3,2,15,58,31,16,49,49,0,0,1300.0,0,1300.0,0,1126.800049,0,0.0,0,0.0,0,0.0,1,28,14,1,2014,1985,2,0,6,1,1,0,1,1,1,0.0,0.0,0.0,0.0,0.0,1126.800049,1,4,1,3,0,6,3,5,0,0,1,1,22,2,2,0,2,2,2,137280449,1,0,0,0,0,1,1,0,0,1,1,20,61.21,24.76,1,0,0,0,0,0.0,0.0,0.0,0.0,12,24,1,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,4,1,1,1,0,0,0,1,1,-9,1,1,1,1,1,1,1,1,1,1,0,1,400,45,120,4,0,0,2226.040039,0,2224.790039,0,1,2045.619995,2044.369995,0.0,0.0,1.25,0.0,0.0,0.0,0.0,2226.040039,6,0,0,0,0,0,0,0,0,3,0,3,0,3,2.0,1,0,0.4156,1,18.0,210,4,0,2,12,3,6,3,145989206,4,2,11,144146408,3,2,11,2,143384810,2,1,12,1,143092412,2,11,14,1,142881614,11,12,0,0,3,1,1,2,2,2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.999979,0.999951,0.999925,0.999896,0.999862,0.999824,0.999785,0.999753,0.999706,0.999669,0.999631,0.99959,0.999483,0.999423,0.999328,0.999263,0.999193,0.999062,0.998994,0.998906,0.998814,0.998702,0.998596,0.998489,0.998321,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0
18,665045,f,6,3.3,10,212588410,4,3,3,144,38,1,1,1,11,1,32,1981,18,2,2014,2,2,1,2,2,2,0,0,0,0,0,0,1,4,1,1,1,1,0,0,0,0,0,0,0,0,5,3,5,4,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,2,4,1,15,26,7,16,15,3,0,0,460.0,0,460.0,0,460.0,0,0.0,0,0.0,0,0.0,1,32,18,2,2014,1981,2,0,6,1,2,0,0,1,1,0.0,0.0,0.0,0.0,0.0,460.0,1,5,2,3,0,7,4,6,0,0,1,1,19,4,2,0,2,2,2,205598685,1,0,0,0,2,1,1,0,0,0,3,24,53.97,50.14,0,0,0,0,0,0.0,0.0,0.0,0.0,12,25,2,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,1,0,0,0,1,1,2,1,-8,-8,-8,-8,-8,-8,-8,-8,-8,-9,1,350,100,0,2,0,0,11270.959961,0,9879.849609,0,1,7817.240234,6426.129883,0.0,0.0,1148.439941,0.0,242.669998,0.0,0.0,11270.959961,3,1,0,0,0,0,0,0,0,3,1,4,0,4,2.5,1,0,0.862,1,17.0,210,20,0,2,14,2,12,3,214812006,4,1,12,213071208,4,1,12,3,212588410,4,1,12,1,212262012,4,10,12,1,212064814,1,12,0,0,3,1,1,2,1,2,2,0.000342,0.0,0.0,0.0,0.0,0.0,0.0,0.001829,0.0,0.0,0.999946,0.999874,0.999806,0.999732,0.999644,0.999545,0.999445,0.999363,0.999242,0.999145,0.999048,0.998943,0.998667,0.998512,0.998267,0.9981,0.997918,0.997581,0.997406,0.99718,0.996943,0.996656,0.996382,0.996107,0.995676,1.0,1.0,1.0,1.0,1.0,1.0,1.0,460.659668
28,1833965,f,6,2.8,20,754766010,2,3,3,46,12,3,10,1,11,1,49,1965,14,10,2014,2,1,1,2,2,2,0,0,0,0,0,0,1,2,1,2,3,1,0,0,0,0,0,0,0,0,3,4,6,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,2,1,15,21,10,16,23,3,0,0,1301.670044,0,1300.0,0,970.669983,0,20.040001,0,0.0,0,0.0,1,49,14,10,2014,1965,2,1,6,1,1,0,0,0,1,0.0,0.0,1.67,0.0,0.0,972.340027,1,8,2,1,0,10,5,9,0,0,1,1,17,2,2,0,2,2,2,748184965,1,0,0,0,0,1,1,0,0,0,5,30,49.61,25.16,1,0,0,0,0,0.0,0.0,0.0,0.0,10,14,10,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,0,0,0,0,-8,2,1,1,1,1,1,1,1,1,1,1,1,0,1,200,10,0,1,0,0,2805.590088,0,1300.0,0,1,2476.26001,970.669983,0.0,0.0,881.590027,0.0,624.0,0.0,0.0,2805.590088,4,0,0,0,0,0,0,0,1,1,1,1,0,2,1.5,1,0,0.0,1,16.0,110,2,0,2,11,2,9,2,756846806,3,2,11,755412008,2,1,10,11,754766010,2,1,10,10,754494012,1,1,10,10,754310414,1,10,0,0,3,1,1,2,2,1,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0


Great! Now we need to separate out scghq1_dv (the outcome variable) and drop the identifier columns (pidp, wave, wave_num, hidp, and the others listed in the cell below).

In [None]:
# Separate outcome (scghq1_dv) and predictors (X)

Y = df['scghq1_dv'].astype(float)  # Ensure Y is a numeric pandas Series
X = df.drop(columns=['scghq1_dv'])

# Drop identifiers (pidp, wave, wave_num, hidp, pno, hhorig, memorig, psu, strata, sampst, month, ivfio, ioutcome, birthy, hrpid)
identifiers = ['pidp', 'wave', 'wave_num', 'hidp', 'pno', 'hhorig', 'memorig', 'psu', 'strata', 'sampst', 'month', 'ivfio', 'ioutcome', 'birthy', 'hrpid']
X = X.drop(columns=identifiers)
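
The identifier list above is typed by hand, and wave-prefixed columns such as d_hidp or d_pno (visible in the column header above) are not in it. Below is a minimal sketch of how such columns could be listed for review before deciding whether to drop them. It assumes the wave-prefix naming pattern (one letter, an underscore, then hidp or pno) holds for this file; the `columns` list is a small hand-picked sample, not the full frame.

```python
import re

# A small sample of column names copied from the header shown above
columns = ["pidp", "hidp", "d_hidp", "d_pno", "e_hidp", "f_ivfio", "age_dv", "gor_dv"]

# Assumed pattern: a single wave letter, an underscore, then hidp or pno
wave_id_pattern = re.compile(r"^[a-z]_(hidp|pno)$")

# Collect the candidates for manual review rather than dropping them blindly
wave_id_cols = [c for c in columns if wave_id_pattern.match(c)]
print(wave_id_cols)
```

On the real data the same comprehension could be run over `X.columns` and the result inspected before extending `identifiers`.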

In [None]:
# Strict rule: discrete only if dtype is object/category or integer dtype, and num_unique <= MAX_UNIQUE_DISCRETE
# Anything with more than MAX_UNIQUE_DISCRETE unique values is automatically continuous.

MAX_UNIQUE_DISCRETE = 30

label_map = dict(zip(meta.column_names, meta.column_labels)) if hasattr(meta, 'column_names') else {}
rows = []

for col in X.columns:
    dtype = X[col].dtype
    col_vals = X[col].dropna()
    num_unique = int(col_vals.nunique())

    # Automatic continuous if too many uniques
    if num_unique > MAX_UNIQUE_DISCRETE:
        will_discrete = False
        reason = f'{num_unique} unique > {MAX_UNIQUE_DISCRETE} -> continuous'
    else:
        is_string = dtype == 'object' or str(dtype).startswith('category')
        is_integer = pd.api.types.is_integer_dtype(X[col].dtype)

        will_discrete = bool(is_string or is_integer)
        if is_string:
            reason = 'string/category dtype -> discrete'
        elif is_integer:
            reason = 'integer dtype -> discrete'
        else:
            reason = 'float/numeric dtype -> continuous'

    suggested_action = 'one-hot encode (discrete)' if will_discrete else 'treat as continuous'

    rows.append({
        'Column': col,
        'Label': label_map.get(col, ''),
        'DataType': str(dtype),
        'NumUnique': num_unique,
        'IsString': dtype == 'object' or str(dtype).startswith('category'),
        'IsInteger': pd.api.types.is_integer_dtype(X[col].dtype),
        'WillBeDiscrete': will_discrete,
        'DecisionReason': reason,
        'SuggestedAction': suggested_action
    })

variable_summary = pd.DataFrame(rows)
# Order discrete first for visibility
variable_summary = variable_summary.sort_values(by=['WillBeDiscrete', 'NumUnique'], ascending=[False, True]).reset_index(drop=True)

print(f"Total variables: {len(variable_summary)}")
print(f"Discrete by rule: {int(variable_summary['WillBeDiscrete'].sum())}")
print(f"MAX_UNIQUE_DISCRETE = {MAX_UNIQUE_DISCRETE}")

pd.set_option('display.max_rows', None)
display(variable_summary)


Total variables: 302
Discrete by rule: 204
MAX_UNIQUE_DISCRETE = 30


Unnamed: 0,Column,Label,DataType,NumUnique,IsString,IsInteger,WillBeDiscrete,DecisionReason,SuggestedAction
0,intdatd_if,"Interview date: Day, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
1,intdatm_if,"Interview date: Month, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
2,intdaty_if,"Interview date: Year, imputation flag",int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
3,doby_if,DOB Year imputation flag,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
4,age_if,Imputation flag for age_dv,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
5,f_ivfio,individual interview outcome,int64,1,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
6,sex,Sex,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
7,lkmove,Prefers to move house,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
8,health,Long-standing illness or disability,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
9,aidxhh,Non-residents cared for,int64,2,False,True,True,integer dtype -> discrete,one-hot encode (discrete)
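
The first rows of the summary have NumUnique equal to 1. A column with a single distinct value is constant in this sample, so it cannot help predict the outcome, and one-hot encoding it with drop_first=True produces no dummy columns at all. Below is a minimal sketch of how such columns could be found and removed. The toy frame borrows column names from the summary above, but its values are made up for illustration.

```python
import pandas as pd

# Toy frame standing in for X; the values are hypothetical
toy = pd.DataFrame({
    "intdatd_if": [0, 0, 0, 0],   # constant column
    "sex": [1, 2, 2, 1],
    "health": [1, 2, 2, 2],
})

# Columns with at most one distinct non-missing value
constant_cols = [c for c in toy.columns if toy[c].nunique(dropna=True) <= 1]

# Drop them before scaling and encoding
toy_reduced = toy.drop(columns=constant_cols)

print(constant_cols)
print(toy_reduced.shape)
```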


Great! Now we need to scale the continuous variables.

In [None]:
# Scale continuous variables to mean 0, std 1
from sklearn.preprocessing import StandardScaler
import numpy as np

# Identify continuous columns using the variable_summary decisions
continuous_cols = variable_summary.loc[~variable_summary['WillBeDiscrete'], 'Column'].tolist()

# Only keep columns that still exist in X (defensive)
continuous_cols = [c for c in continuous_cols if c in X.columns]

print(f'Found {len(continuous_cols)} continuous column(s) to scale')

# Keep a copy of the unscaled X in case we need it later
X_unscaled = X.copy()

scaler = None
if len(continuous_cols) > 0:
    scaler = StandardScaler()
    # Convert to float (safe) and scale in-place on a copy
    X_scaled = X.copy()
    try:
        X_scaled[continuous_cols] = scaler.fit_transform(X_scaled[continuous_cols].astype(float))
    except Exception as e:
        # Fall back to scaling each column separately if there are issues with mixed dtypes
        print('Warning: bulk scaling failed, falling back to per-column scaling. Error:', e)
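        for col in continuous_cols:
            try:
                vals = X_scaled[col].astype(float).values.reshape(-1, 1)
                # Use a fresh scaler per column so the shared `scaler` is not left fitted to the last column only
                X_scaled[col] = StandardScaler().fit_transform(vals).ravel()
            except Exception as e2:
                print(f'  Could not scale column {col}:', e2)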
    
    # Replace X with scaled version
    X = X_scaled
else:
    print('No continuous columns to scale; X left unchanged')

print('Shape of X after scaling:', X.shape)

# Quick sanity checks
if scaler is not None and len(continuous_cols) > 0:
    # show means and stds (approx) for a few columns
    sample_check = continuous_cols[:6]
    means = X[sample_check].mean().round(6)
    stds = X[sample_check].std().round(6)
    print('Sample scaled means (should be near 0):')
    print(means.to_dict())
    print('Sample scaled stds (should be near 1):')
    print(stds.to_dict())

# Display head for quick verification
pd.set_option('display.max_columns', None)
display(X.head())

Found 98 continuous column(s) to scale
Shape of X after scaling: (30393, 302)
Sample scaled means (should be near 0):
{'b_mortus_tw': 0.0, 'prob01ni': 0.0, 'c_mortus_tw': 0.0, 'prob91e': 0.0, 'prob91w': 0.0, 'prob91s': 0.0}
Sample scaled stds (should be near 1):
{'b_mortus_tw': 0.0, 'prob01ni': 1.000016, 'c_mortus_tw': 1.000016, 'prob91e': 1.000016, 'prob91w': 1.000016, 'prob91s': 1.000016}


Unnamed: 0,nbrsnci_dv,sex,dvage,istrtdatd,istrtdatm,istrtdaty,lkmove,jbstat,racel_dv,health,aidxhh,j2has,bensta2,bensta3,bensta4,bensta5,bensta6,bensta7,bensta96,finnow,finfut,vote1,vote6,mobuse,nch14resp,nch415resp,nchresp,nnatch,nadoptch,nchunder16,nch5to15,nch10to15,sclfsat1,sclfsat2,sclfsat7,sclfsato,marstat,employ,hgbiom,hgbiof,hgpart,respf16,respm16,intdatd_if,intdatm_if,intdaty_if,doby_if,age_if,pn1pno,pn2pno,pns1pno,pns2pno,hhsize,jbhas,istrtdathh,istrtdatmm,istrtdatss,ienddathh,ienddatmm,ienddatss,j2pay_if,fimngrs_tc,fimngrs_dv,fimnlabgrs_tc,fimnlabgrs_dv,fimnlabnet_tc,fimnlabnet_dv,fiyrinvinc_tc,fiyrinvinc_dv,fibenothr_tc,fibenothr_dv,j2pay_dv,j2paynet_dv,sex_dv,age_dv,intdatd_dv,intdatm_dv,intdaty_dv,doby_dv,pensioner_dv,npensioner_dv,marstat_dv,npn_dv,npns_dv,ngrp_dv,nnsib_dv,nnssib_dv,ethn_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,country,gor_dv,urban_dv,hhresp_dv,xtra5min_dv,agegr5_dv,agegr10_dv,agegr13_dv,livesp_dv,cohab_dv,single_dv,mastat_dv,hhtype_dv,buno_dv,depchl_dv,nchild_dv,respm16_dv,respf16_dv,rach16_dv,hrpno,ppno,sppno,fnpno,fnspno,mnpno,mnspno,grfpno,grmpno,qfhighfl_dv,hiqual_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,scflag_dv,paygu_if,paynu_if,seearngrs_if,fiyrinvinc_if,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,ivfho,intdated,intdatem,intdatey,ivh1,ivh2,ivh3,ivh4,ivh5,ivh6,ivh7,ivh8,ivh9,ivh10,ivh11,ivh12,ivh13,ivh14,ivh15,ivh16,hsbeds,hsrooms,hsownd,fuelhave1,fuelhave2,fuelhave3,fuelhave4,fuelhave96,fuelduel,heatch,xphsdct,xphsdba,cduse1,cduse2,cduse5,cduse6,cduse7,cduse8,cduse9,cduse12,cduse13,cduse96,pcnet,xpfood1_g3,xpfdout_g3,xpaltob_g3,ncars,hhintlang,n10to15,fihhmngrs_dv,fihhmngrs_tc,fihhmnlabgrs_dv,fihhmnlabgrs_tc,ctband_if,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ctband_dv,ncouple_dv,nonepar_dv,nkids_dv,nch02_dv,nch34_dv,nch511_dv,nch1215_dv,npens_dv,nemp_dv,nue_dv,nwage_dv,nchoecd_dv,nadoecd_dv,ieqm
oecd_dv,tenure_dv,fihhnegsei_if,fihhmngrs_if,issue_num,aintlen,outcome,ivtnc,w6osmflag,dcsedfl_dv,lwenum_dv,fwenum_dv,lwintvd_dv,fwintvd_dv,d_hidp,d_pno,d_ivfio,d_ivfho,e_hidp,e_pno,e_ivfio,e_ivfho,e_month,f_hidp,f_pno,f_ivfio,f_ivfho,f_month,g_hidp,g_pno,g_ivfio,g_ivfho,g_month,h_hidp,h_ivfio,h_ivfho,genetics,epigenetics,xwdat_dv,school_dv,bornuk_dv,evercoh_dv,evermar_dv,anychild_dv,ethn_dv_source,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd
3,0.639878,2,-0.672515,1.692497,9,2014,1,1,1,1,2,2,0,0,1,0,0,0,0,4,1,2,2,1,2,1,2,2,0,2,0,0,3,2,5,4,4,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,3,1,10,-1.669109,1.010297,10,1.060204,-1.516673,0,0,0.211878,0,-0.74178,0,-0.808718,0,-0.145866,0,1.954234,-0.1074,-0.115447,2,-0.672223,1.692497,9,2014,0.674773,2,0,5,0,0,0,0,0,1,-0.112401,3.941976,-0.202415,-0.297459,3.418842,0.127134,1,7,1,1,0,8,4,7,0,0,1,4,5,1,2,2,1,2,1,1,0,0,0,0,0,0,0,0,0,1,1.260867,0.628624,-1.485992,1,0,0,1,0,-0.154913,3.520283,-0.256349,-0.11085,10,1.675516,9,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,1,4,1,1,0,0,0,1,1,2,1,1,1,1,1,0,1,1,1,1,0,1,0.143739,-0.5418,-0.66429,1,0,0,-0.640248,0,-0.95828,0,3,-0.187982,-0.329248,-0.17949,2.567482,-0.282252,-0.366117,1.888175,2.19613,3.374203,-0.170821,2,0,1,2,0,2,0,0,0,1,0,1,2,1,-0.367597,7,0,-0.513697,1,-0.255135,110,-0.441872,0,2,14,4,14,4,-0.272483,3,2,11,-0.273698,1,50,61,9,-0.274659,1,1,10,8,-0.274879,1,1,10,8,-0.275217,1,10,0,0,3,3,1,1,1,1,2,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.007908,0.00719,0.006539,0.005842,0.004955,0.00397,0.002964,0.002137,0.022051,0.026136,0.038135,0.044763,0.051329,0.054146,0.056311,0.062043,0.064654,0.067146,0.069479,0.071597,0.071033,0.072547,0.073635,0.076429,0.074465,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167
12,-1.155361,2,-1.282752,1.815247,5,2014,2,3,1,2,2,2,0,0,0,0,0,0,1,4,1,2,2,1,0,0,0,0,0,0,0,0,3,6,3,2,1,2,2,1,0,0,0,0,0,0,0,0,1,2,1,2,3,2,17,0.21938,0.202793,18,-1.164646,0.550271,0,0,-0.924523,0,-0.750298,0,-0.821108,0,-0.145866,0,-0.312903,-0.1074,-0.115447,2,-1.282468,1.815247,5,2014,1.284052,2,0,6,2,2,0,0,0,1,-0.112401,-0.084674,-0.202415,-0.297459,-0.00565,-0.295682,1,9,2,1,0,6,3,5,0,0,1,1,19,3,2,0,2,2,2,1,0,0,1,1,2,2,0,0,0,3,-1.065674,1.42169,-3.124727,1,0,0,0,0,-0.32367,-0.288237,-0.425239,-0.11085,10,1.797526,5,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,2,2,1,0,1,0,0,-8,1,2,1,1,1,1,1,1,1,1,1,1,0,1,-0.022917,0.052966,0.394765,2,0,0,0.653301,0,0.913276,0,1,0.136963,0.252237,-0.17949,-0.097751,-0.230377,-0.366117,-0.409101,0.377342,-0.590312,0.108284,4,1,0,0,0,0,0,0,0,2,1,3,0,3,0.265623,2,0,-0.659033,1,-0.172987,110,0.145006,0,2,8,4,7,4,-0.867782,3,1,12,-0.868917,3,1,10,5,-0.869201,3,1,10,5,-0.869521,3,1,10,5,-0.870122,11,12,0,0,3,3,1,1,2,2,3,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.092773,0.092952,0.093073,0.093144,0.093264,0.093334,0.093397,0.093441,0.113316,0.117582,0.128823,0.135021,0.14169,0.144998,0.148522,0.153802,0.156446,0.160377,0.162557,0.164917,0.165903,0.167832,0.169732,0.172706,0.17326,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167
15,-2.398218,1,-1.1718,-0.148751,1,2014,2,1,1,2,2,2,0,0,0,0,0,0,1,1,1,2,2,1,0,0,0,0,0,0,0,0,3,6,4,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,3,2,15,1.753776,0.087436,16,1.117252,1.124423,0,0,-0.321438,0,0.050986,0,0.189146,0,-0.145866,0,-0.692189,-0.1074,-0.115447,1,-1.171515,-0.148751,1,2014,1.117885,2,0,6,1,1,0,1,1,1,-0.112401,-0.084674,-0.202415,-0.297459,-0.698229,-0.110544,1,4,1,3,0,6,3,5,0,0,1,1,22,2,2,0,2,2,2,1,0,0,0,0,1,1,0,0,1,1,0.485354,1.040305,-2.618266,1,0,0,0,0,-0.32367,-0.288237,-0.425239,-0.11085,12,1.065469,1,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,4,1,1,1,0,0,0,1,1,-9,1,1,1,1,1,1,1,1,1,1,0,1,0.4215,-0.41435,0.924293,4,0,0,-0.622549,0,-0.205265,0,1,-0.206043,-0.044677,-0.17949,-0.097751,-0.280696,-0.366117,-0.873708,-0.765779,-0.696354,-0.167002,6,0,0,0,0,0,0,0,0,3,0,3,0,3,0.265623,1,0,0.680253,1,0.402053,210,-0.148433,0,2,12,3,6,3,-1.311681,4,2,11,-1.31292,3,2,11,2,-1.313597,2,1,12,1,-1.313893,2,11,14,1,-1.313881,11,12,0,0,3,1,1,2,2,2,3,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.090133,0.090206,0.090291,0.090389,0.09046,0.090502,0.090543,0.090558,0.110425,0.114693,0.125955,0.132173,0.138849,0.142145,0.145636,0.150945,0.153592,0.157493,0.159686,0.162043,0.162991,0.164917,0.166806,0.169784,0.170282,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167
18,-0.464884,1,-0.949895,0.342248,2,2014,2,2,1,2,2,2,0,0,0,0,0,0,1,4,1,1,1,1,0,0,0,0,0,0,0,0,5,3,5,4,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,2,4,1,15,-0.134712,-1.296856,16,-0.822361,-1.516673,0,0,-0.833059,0,-0.466767,0,-0.408686,0,-0.145866,0,-0.692189,-0.1074,-0.115447,1,-0.949607,0.342248,2,2014,0.896329,2,0,6,1,2,0,0,1,1,-0.112401,-0.084674,-0.202415,-0.297459,-0.698229,-0.261651,1,5,2,3,0,7,4,6,0,0,1,1,19,4,2,0,2,2,2,1,0,0,0,2,1,1,0,0,0,3,0.795559,0.395159,0.01575,0,0,0,0,0,-0.32367,-0.288237,-0.425239,-0.11085,12,1.187478,2,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,1,0,0,0,1,1,2,1,-8,-8,-8,-8,-8,-8,-8,-8,-8,-9,1,0.143739,0.052966,-0.66429,2,0,0,2.552612,0,2.401906,0,1,0.595822,0.569403,-0.17949,-0.097751,1.147798,-0.366117,-0.509811,-0.765779,-0.696354,0.518093,3,1,0,0,0,0,0,0,0,3,1,4,0,4,1.057148,1,0,2.118795,1,0.319905,210,4.546584,0,2,14,2,12,3,-1.161704,4,1,12,-1.162831,4,1,12,3,-1.162918,4,1,12,1,-1.163297,4,10,12,1,-1.163252,1,12,0,0,3,1,1,2,1,2,2,0.734638,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,0.80201,-0.183014,-0.228409,-0.023155,-0.024173,-0.025107,-0.026112,-0.027339,-0.028708,-0.030076,-0.031205,-0.011251,-0.00723,0.00505,0.011852,0.018417,0.021075,0.022778,0.028695,0.031321,0.033334,0.035751,0.037807,0.036716,0.038121,0.038949,0.041721,0.038909,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.151515
28,-1.155361,1,-0.006803,-0.148751,10,2014,2,1,1,2,2,2,0,0,0,0,0,0,1,2,1,2,3,1,0,0,0,0,0,0,0,0,3,4,6,3,1,1,1,0,0,0,0,0,0,0,0,0,1,0,1,0,2,1,15,-0.429788,-1.12382,16,-0.365982,-1.516673,0,0,-0.320421,0,0.050986,0,0.049165,0,-0.140457,0,-0.692189,-0.1074,-0.115447,1,-0.0065,-0.148751,10,2014,0.010105,2,1,6,1,1,0,0,0,1,-0.112401,-0.084674,-0.199366,-0.297459,-0.698229,-0.145547,1,8,2,1,0,10,5,9,0,0,1,1,17,2,2,0,2,2,2,1,0,0,0,0,1,1,0,0,0,5,1.260867,0.006646,-2.576752,1,0,0,0,0,-0.32367,-0.288237,-0.425239,-0.11085,10,-0.154626,10,2014,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,3,2,1,1,0,0,0,0,-8,2,1,1,1,1,1,1,1,1,1,1,1,0,1,-0.689543,-0.711733,-0.66429,1,0,0,-0.419101,0,-0.520231,0,1,-0.146214,-0.195151,-0.17949,-0.097751,0.815514,-0.366117,0.062015,-0.765779,-0.696354,-0.123105,4,0,0,0,0,0,0,0,1,1,1,1,0,2,-0.525902,1,0,-0.659033,1,0.237756,110,-0.73531,0,2,11,2,9,2,0.019483,3,2,11,0.018157,2,1,10,11,0.017577,2,1,10,10,0.017253,1,1,10,10,0.01735,1,10,0,0,3,1,1,2,2,1,4,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.161802,0.16259,0.16332,0.164087,0.164974,0.165936,0.166877,0.167618,0.187453,0.191866,0.202499,0.208347,0.215112,0.218824,0.223462,0.228386,0.23106,0.236173,0.238236,0.240794,0.24305,0.245328,0.2479,0.251028,0.253652,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167
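
The standard deviations printed above come out as 1.000016 rather than exactly 1. The likely reason is that StandardScaler divides by the population standard deviation (ddof=0), while pandas' .std() defaults to the sample version (ddof=1), so the two differ by a factor of sqrt(n / (n - 1)). Below is a minimal sketch that reproduces the figure without scikit-learn. The data are synthetic; only n = 30393 is taken from the output above.

```python
import numpy as np
import pandas as pd

n = 30393  # number of rows reported in the output above
rng = np.random.default_rng(0)
x = pd.Series(rng.normal(loc=50.0, scale=10.0, size=n))  # synthetic values

# Standardise with the population standard deviation (ddof=0)
z = (x - x.mean()) / x.std(ddof=0)

# pandas' default .std() uses ddof=1, so the result is slightly above 1
sample_std = z.std()
expected = np.sqrt(n / (n - 1))

print(round(sample_std, 6))
print(round(expected, 6))
```

A standard deviation of exactly 0.0, as reported for b_mortus_tw, points to a column that is constant in this sample. It cannot contribute to the lasso and could be dropped.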


Great, now we need to encode categorical variables using one-hot encoding
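
Before running the encoding on the full data, it may help to see what casting to string and using drop_first=True does on a small case. Below is a minimal sketch on a hypothetical toy column (the values are made up). String categories sort lexicographically, so '10' comes before '2', and the dropped reference level is whichever string sorts first, here the negative code.

```python
import pandas as pd

# Hypothetical toy column: integer codes, including one negative code
toy = pd.DataFrame({"hsrooms": [2, 10, -9, 3, 2]})

# Cast to string, as the encoding cell does, then one-hot encode
as_str = toy.astype(str)
dummies = pd.get_dummies(as_str, prefix=["hsrooms"], prefix_sep="_", drop_first=True)

# The level '-9' sorts first as a string, so it becomes the omitted baseline
print(list(dummies.columns))
```

In Understanding Society data, negative values conventionally denote missing-value codes. Assuming that applies here, a negative code can end up either as the omitted baseline or as its own dummy column, which is worth keeping in mind when reading the lasso coefficients.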

In [None]:
# Encode categorical variables using one-hot encoding, then show shape and head

# Identify categorical (discrete) columns from variable_summary
cat_cols = variable_summary.loc[variable_summary['WillBeDiscrete'], 'Column'].tolist()

# Defensive: keep only those present in X
cat_cols = [c for c in cat_cols if c in X.columns]
print(f'Found {len(cat_cols)} categorical column(s) to encode')

# If there are categorical columns, create dummies and merge with the rest of X
if len(cat_cols) > 0:
    # Convert to string to ensure stable dummy names (preserve distinct categories)
    cat_df = X[cat_cols].astype(str).apply(lambda s: s.str.replace(' ', '_'))
    # Create dummies; drop_first omits one level per variable so the dummies are not perfectly collinear
    dummies = pd.get_dummies(cat_df, prefix=cat_cols, prefix_sep='_', drop_first=True, dummy_na=False)
    # Build new X: drop original categorical columns, concat dummies
    X_encoded = X.drop(columns=cat_cols).copy()
    # Ensure no column name collisions (rename if necessary)
    overlap = set(X_encoded.columns).intersection(dummies.columns)
    if overlap:
        # Rare: if a dummy name collides with existing column, prefix dummy names with 'dum_'
        dummies = dummies.rename(columns={c: f'dum_{c}' for c in dummies.columns})
    X_encoded = pd.concat([X_encoded, dummies], axis=1)
else:
    print('No categorical columns selected for encoding')
    X_encoded = X.copy()

# Replace X with encoded version for downstream modeling
X = X_encoded

print('Shape of X after encoding:', X.shape)
pd.set_option('display.max_columns', None)
display(X.head())

Found 204 categorical column(s) to encode
Shape of X after encoding: (30393, 1255)


Unnamed: 0,nbrsnci_dv,dvage,istrtdatd,istrtdatmm,istrtdatss,ienddatmm,ienddatss,fimngrs_dv,fimnlabgrs_dv,fimnlabnet_dv,fiyrinvinc_dv,fibenothr_dv,j2pay_dv,j2paynet_dv,age_dv,intdatd_dv,doby_dv,fimnmisc_dv,fimnprben_dv,fimninvnet_dv,fimnpen_dv,fimnsben_dv,fimnnet_dv,jbiindb_dv,sf12pcs_dv,sf12mcs_dv,fibenothr_if,fimnlabgrs_if,fimngrs_if,ind5mus_xw,intdated,xpfood1_g3,xpfdout_g3,xpaltob_g3,fihhmngrs_dv,fihhmnlabgrs_dv,fihhmnnet1_dv,fihhmnlabnet_dv,fihhmnmisc_dv,fihhmnprben_dv,fihhmninv_dv,fihhmnpen_dv,fihhmnsben_dv,houscost1_dv,houscost2_dv,fihhmngrs1_dv,ieqmoecd_dv,fihhmngrs_if,aintlen,ivtnc,d_hidp,e_hidp,f_hidp,g_hidp,h_hidp,prob91e,prob91w,prob91s,prob99w,prob99s,prob01ni,prob09ni,prob09e,prob09w,prob09s,bb_mortbh_tw,bc_mortbh_tw,bd_mortbh_tw,be_mortbh_tw,bf_mortbh_tw,bg_mortbh_tw,bh_mortbh_tw,bi_mortbh_tw,bj_mortbh_tw,bk_mortbh_tw,bl_mortbh_tw,bm_mortbh_tw,bn_mortbh_tw,bo_mortbh_tw,bp_mortbh_tw,bq_mortbh_tw,br_mortbh_tw,b_mortbh_tw,c_mortbh_tw,d_mortbh_tw,e_mortbh_tw,f_mortbh_tw,g_mortbh_tw,h_mortbh_tw,i_mortbh_tw,b_mortus_tw,c_mortus_tw,d_mortus_tw,e_mortus_tw,f_mortus_tw,g_mortus_tw,h_mortus_tw,psnenub_xd,sex_2,lkmove_2,health_2,aidxhh_2,j2has_2,bensta2_1,bensta3_1,bensta4_1,bensta5_1,bensta6_1,bensta7_1,bensta96_1,vote1_2,mobuse_2,employ_2,respf16_1,respm16_1,jbhas_2,j2pay_if_1,fimngrs_tc_1,fimnlabgrs_tc_1,fimnlabnet_tc_1,fiyrinvinc_tc_1,fibenothr_tc_1,pensioner_dv_2,urban_dv_2,xtra5min_dv_1,livesp_dv_1,cohab_dv_1,single_dv_1,depchl_dv_2,respm16_dv_2,respf16_dv_2,rach16_dv_2,qfhighfl_dv_1,scflag_dv_1,paygu_if_1,paynu_if_1,seearngrs_if_1,fiyrinvinc_if_1,fihhmngrs_tc_1,fihhmnlabgrs_tc_1,fihhnegsei_if_1,outcome_210,w6osmflag_1,dcsedfl_dv_2,genetics_1,epigenetics_1,xwdat_dv_3,bornuk_dv_2,evercoh_dv_2,evermar_dv_2,anychild_dv_2,istrtdaty_2015,istrtdaty_2016,finfut_2,finfut_3,sex_dv_1,sex_dv_2,intdaty_dv_2015,intdaty_dv_2016,npn_dv_1,npn_dv_2,npns_dv_1,npns_dv_2,ngrp_dv_1,ngrp_dv_2,hhresp_dv_2,hhresp_dv_3,ivfho_11,ivfho_12,intdatey_2015,intdatey_2016,nonepar_dv_1,none
par_dv_2,f_ivfho_11,f_ivfho_12,school_dv_2,school_dv_3,vote6_2,vote6_3,vote6_4,npensioner_dv_1,npensioner_dv_2,npensioner_dv_3,country_2,country_3,country_4,ivh10_-2,ivh10_-9,ivh10_0,ivh11_-2,ivh11_-9,ivh11_0,ivh12_-2,ivh12_-9,ivh12_0,ivh13_-2,ivh13_-9,ivh13_0,ivh14_-2,ivh14_-9,ivh14_0,ivh15_-2,ivh15_-9,ivh15_0,ivh16_-2,ivh16_-9,ivh16_0,heatch_-2,heatch_1,heatch_2,pcnet_-2,pcnet_1,pcnet_2,ctband_if_1,ctband_if_2,ctband_if_3,nch02_dv_1,nch02_dv_2,nch02_dv_3,nch34_dv_1,nch34_dv_2,nch34_dv_3,npens_dv_1,npens_dv_2,npens_dv_3,ethn_dv_source_2,ethn_dv_source_3,ethn_dv_source_4,finnow_2,finnow_3,finnow_4,finnow_5,nadoptch_1,nadoptch_2,nadoptch_3,nadoptch_4,ivh1_-2,ivh1_-9,ivh1_0,ivh1_1,ivh2_-2,ivh2_-9,ivh2_0,ivh2_1,ivh3_-2,ivh3_-9,ivh3_0,ivh3_1,ivh4_-2,ivh4_-9,ivh4_0,ivh4_1,ivh5_-2,ivh5_-9,ivh5_0,ivh5_1,ivh6_-2,ivh6_-9,ivh6_0,ivh6_1,ivh7_-2,ivh7_-9,ivh7_0,ivh7_1,ivh8_-2,ivh8_-9,ivh8_0,ivh8_1,ivh9_-2,ivh9_-9,ivh9_0,ivh9_1,fuelhave1_-2,fuelhave1_-9,fuelhave1_0,fuelhave1_1,fuelhave2_-2,fuelhave2_-9,fuelhave2_0,fuelhave2_1,fuelhave3_-2,fuelhave3_-9,fuelhave3_0,fuelhave3_1,fuelhave4_-2,fuelhave4_-9,fuelhave4_0,fuelhave4_1,fuelhave96_-2,fuelhave96_-9,fuelhave96_0,fuelhave96_1,fuelduel_-2,fuelduel_-8,fuelduel_1,fuelduel_2,xphsdba_-2,xphsdba_1,xphsdba_2,xphsdba_3,cduse96_-2,cduse96_-9,cduse96_0,cduse96_1,ncouple_dv_1,ncouple_dv_2,ncouple_dv_3,ncouple_dv_4,nch1215_dv_1,nch1215_dv_2,nch1215_dv_3,nch1215_dv_4,issue_num_2,issue_num_3,issue_num_4,issue_num_5,nch10to15_1,nch10to15_2,nch10to15_3,nch10to15_4,nch10to15_5,marstat_dv_2,marstat_dv_3,marstat_dv_4,marstat_dv_5,marstat_dv_6,hiqual_dv_2,hiqual_dv_3,hiqual_dv_4,hiqual_dv_5,hiqual_dv_9,xphsdct_-2,xphsdct_-8,xphsdct_-9,xphsdct_1,xphsdct_2,cduse1_-2,cduse1_-8,cduse1_-9,cduse1_0,cduse1_1,cduse2_-2,cduse2_-8,cduse2_-9,cduse2_0,cduse2_1,cduse5_-2,cduse5_-8,cduse5_-9,cduse5_0,cduse5_1,cduse6_-2,cduse6_-8,cduse6_-9,cduse6_0,cduse6_1,cduse7_-2,cduse7_-8,cduse7_-9,cduse7_0,cduse7_1,cduse8_-2,cduse8_-8,cduse8_-9,cduse8_0,cduse8_1,cduse9_-2,c
duse9_-8,cduse9_-9,cduse9_0,cduse9_1,cduse12_-2,cduse12_-8,cduse12_-9,cduse12_0,cduse12_1,cduse13_-2,cduse13_-8,cduse13_-9,cduse13_0,cduse13_1,n10to15_1,n10to15_2,n10to15_3,n10to15_4,n10to15_5,nch511_dv_1,nch511_dv_2,nch511_dv_3,nch511_dv_4,nch511_dv_5,fwenum_dv_2,fwenum_dv_3,fwenum_dv_4,fwenum_dv_5,fwenum_dv_6,fwintvd_dv_2,fwintvd_dv_3,fwintvd_dv_4,fwintvd_dv_5,fwintvd_dv_6,nch415resp_1,nch415resp_2,nch415resp_3,nch415resp_4,nch415resp_5,nch415resp_6,nch5to15_1,nch5to15_2,nch5to15_3,nch5to15_4,nch5to15_5,nch5to15_6,sclfsat1_2,sclfsat1_3,sclfsat1_4,sclfsat1_5,sclfsat1_6,sclfsat1_7,sclfsat2_2,sclfsat2_3,sclfsat2_4,sclfsat2_5,sclfsat2_6,sclfsat2_7,sclfsat7_2,sclfsat7_3,sclfsat7_4,sclfsat7_5,sclfsat7_6,sclfsat7_7,sclfsato_2,sclfsato_3,sclfsato_4,sclfsato_5,sclfsato_6,sclfsato_7,agegr10_dv_3,agegr10_dv_4,agegr10_dv_5,agegr10_dv_6,agegr10_dv_7,agegr10_dv_8,grfpno_1,grfpno_2,grfpno_3,grfpno_4,grfpno_5,grfpno_7,nch14resp_1,nch14resp_2,nch14resp_3,nch14resp_4,nch14resp_5,nch14resp_6,nch14resp_7,nchresp_1,nchresp_2,nchresp_3,nchresp_4,nchresp_5,nchresp_6,nchresp_7,nchunder16_1,nchunder16_2,nchunder16_3,nchunder16_4,nchunder16_5,nchunder16_6,nchunder16_7,nchild_dv_1,nchild_dv_2,nchild_dv_3,nchild_dv_4,nchild_dv_5,nchild_dv_6,nchild_dv_7,hrpno_10,hrpno_2,hrpno_3,hrpno_4,hrpno_5,hrpno_6,hrpno_7,hhintlang_0,hhintlang_2,hhintlang_4,hhintlang_5,hhintlang_6,hhintlang_8,hhintlang_9,nemp_dv_1,nemp_dv_2,nemp_dv_3,nemp_dv_4,nemp_dv_5,nemp_dv_6,nemp_dv_7,nchoecd_dv_1,nchoecd_dv_2,nchoecd_dv_3,nchoecd_dv_4,nchoecd_dv_5,nchoecd_dv_6,nchoecd_dv_8,marstat_2,marstat_3,marstat_4,marstat_5,marstat_6,marstat_7,marstat_8,marstat_9,hgbiom_1,hgbiom_2,hgbiom_3,hgbiom_4,hgbiom_5,hgbiom_6,hgbiom_7,hgbiom_8,pn1pno_1,pn1pno_10,pn1pno_2,pn1pno_3,pn1pno_4,pn1pno_5,pn1pno_6,pn1pno_7,pn2pno_2,pn2pno_3,pn2pno_4,pn2pno_5,pn2pno_6,pn2pno_7,pn2pno_8,pn2pno_9,pns1pno_1,pns1pno_10,pns1pno_2,pns1pno_3,pns1pno_4,pns1pno_5,pns1pno_6,pns1pno_7,pns2pno_2,pns2pno_3,pns2pno_4,pns2pno_5,pns2pno_6,pns2pno_7,pns2pno_8,pns
2pno_9,mnpno_1,mnpno_2,mnpno_3,mnpno_4,mnpno_5,mnpno_6,mnpno_7,mnpno_8,mnspno_1,mnspno_2,mnspno_3,mnspno_4,mnspno_5,mnspno_6,mnspno_7,mnspno_8,hsownd_-8,hsownd_-9,hsownd_1,hsownd_2,hsownd_3,hsownd_4,hsownd_5,hsownd_97,nkids_dv_1,nkids_dv_2,nkids_dv_3,nkids_dv_4,nkids_dv_5,nkids_dv_6,nkids_dv_7,nkids_dv_8,nue_dv_1,nue_dv_2,nue_dv_3,nue_dv_4,nue_dv_5,nue_dv_6,nue_dv_7,nue_dv_8,tenure_dv_1,tenure_dv_2,tenure_dv_3,tenure_dv_4,tenure_dv_5,tenure_dv_6,tenure_dv_7,tenure_dv_8,lwenum_dv_11,lwenum_dv_12,lwenum_dv_13,lwenum_dv_14,lwenum_dv_6,lwenum_dv_7,lwenum_dv_8,lwenum_dv_9,lwintvd_dv_11,lwintvd_dv_12,lwintvd_dv_13,lwintvd_dv_14,lwintvd_dv_6,lwintvd_dv_7,lwintvd_dv_8,lwintvd_dv_9,nnsib_dv_1,nnsib_dv_2,nnsib_dv_3,nnsib_dv_4,nnsib_dv_5,nnsib_dv_6,nnsib_dv_7,nnsib_dv_8,nnsib_dv_9,nnssib_dv_1,nnssib_dv_2,nnssib_dv_3,nnssib_dv_4,nnssib_dv_5,nnssib_dv_6,nnssib_dv_7,nnssib_dv_8,nnssib_dv_9,mastat_dv_10,mastat_dv_2,mastat_dv_3,mastat_dv_4,mastat_dv_5,mastat_dv_6,mastat_dv_7,mastat_dv_8,mastat_dv_9,grmpno_1,grmpno_13,grmpno_2,grmpno_3,grmpno_4,grmpno_5,grmpno_6,grmpno_7,grmpno_8,g_pno_10,g_pno_2,g_pno_3,g_pno_4,g_pno_5,g_pno_6,g_pno_7,g_pno_8,g_pno_9,nnatch_1,nnatch_10,nnatch_2,nnatch_3,nnatch_4,nnatch_5,nnatch_6,nnatch_7,nnatch_8,nnatch_9,hgbiof_1,hgbiof_10,hgbiof_2,hgbiof_3,hgbiof_4,hgbiof_5,hgbiof_6,hgbiof_7,hgbiof_8,hgbiof_9,fnpno_1,fnpno_10,fnpno_2,fnpno_3,fnpno_4,fnpno_5,fnpno_6,fnpno_7,fnpno_8,fnpno_9,fnspno_1,fnspno_10,fnspno_2,fnspno_3,fnspno_4,fnspno_5,fnspno_6,fnspno_7,fnspno_8,fnspno_9,ctband_dv_1,ctband_dv_10,ctband_dv_2,ctband_dv_3,ctband_dv_4,ctband_dv_5,ctband_dv_6,ctband_dv_7,ctband_dv_8,ctband_dv_9,d_pno_10,d_pno_11,d_pno_2,d_pno_3,d_pno_4,d_pno_5,d_pno_6,d_pno_7,d_pno_8,d_pno_9,e_pno_10,e_pno_11,e_pno_2,e_pno_3,e_pno_4,e_pno_5,e_pno_6,e_pno_7,e_pno_8,e_pno_9,istrtdatm_10,istrtdatm_11,istrtdatm_12,istrtdatm_2,istrtdatm_3,istrtdatm_4,istrtdatm_5,istrtdatm_6,istrtdatm_7,istrtdatm_8,istrtdatm_9,jbstat_10,jbstat_11,jbstat_2,jbstat_3,jbstat_4,jbstat_5,jbstat_6,jbstat_7
,jbstat_8,jbstat_9,jbstat_97,hgpart_1,hgpart_10,hgpart_11,hgpart_2,hgpart_3,hgpart_4,hgpart_5,hgpart_6,hgpart_7,hgpart_8,hgpart_9,intdatm_dv_10,intdatm_dv_11,intdatm_dv_12,intdatm_dv_2,intdatm_dv_3,intdatm_dv_4,intdatm_dv_5,intdatm_dv_6,intdatm_dv_7,intdatm_dv_8,intdatm_dv_9,gor_dv_10,gor_dv_11,gor_dv_12,gor_dv_2,gor_dv_3,gor_dv_4,gor_dv_5,gor_dv_6,gor_dv_7,gor_dv_8,gor_dv_9,agegr5_dv_11,agegr5_dv_12,agegr5_dv_13,agegr5_dv_14,agegr5_dv_15,agegr5_dv_4,agegr5_dv_5,agegr5_dv_6,agegr5_dv_7,agegr5_dv_8,agegr5_dv_9,agegr13_dv_11,agegr13_dv_12,agegr13_dv_13,agegr13_dv_2,agegr13_dv_3,agegr13_dv_4,agegr13_dv_5,agegr13_dv_6,agegr13_dv_7,agegr13_dv_8,agegr13_dv_9,buno_dv_10,buno_dv_11,buno_dv_13,buno_dv_2,buno_dv_3,buno_dv_4,buno_dv_5,buno_dv_6,buno_dv_7,buno_dv_8,buno_dv_9,ppno_1,ppno_10,ppno_11,ppno_2,ppno_3,ppno_4,ppno_5,ppno_6,ppno_7,ppno_8,ppno_9,sppno_1,sppno_10,sppno_11,sppno_2,sppno_3,sppno_4,sppno_5,sppno_6,sppno_7,sppno_8,sppno_9,intdatem_10,intdatem_11,intdatem_12,intdatem_2,intdatem_3,intdatem_4,intdatem_5,intdatem_6,intdatem_7,intdatem_8,intdatem_9,nwage_dv_1,nwage_dv_10,nwage_dv_11,nwage_dv_2,nwage_dv_3,nwage_dv_4,nwage_dv_5,nwage_dv_6,nwage_dv_7,nwage_dv_8,nwage_dv_9,nadoecd_dv_10,nadoecd_dv_11,nadoecd_dv_12,nadoecd_dv_2,nadoecd_dv_3,nadoecd_dv_4,nadoecd_dv_5,nadoecd_dv_6,nadoecd_dv_7,nadoecd_dv_8,nadoecd_dv_9,f_pno_10,f_pno_11,f_pno_13,f_pno_2,f_pno_3,f_pno_4,f_pno_5,f_pno_6,f_pno_7,f_pno_8,f_pno_9,hhsize_10,hhsize_11,hhsize_12,hhsize_13,hhsize_14,hhsize_2,hhsize_3,hhsize_4,hhsize_5,hhsize_6,hhsize_7,hhsize_8,hhsize_9,hsrooms_-2,hsrooms_-9,hsrooms_1,hsrooms_10,hsrooms_12,hsrooms_15,hsrooms_2,hsrooms_3,hsrooms_4,hsrooms_5,hsrooms_6,hsrooms_7,hsrooms_8,hsrooms_9,hsbeds_-2,hsbeds_-9,hsbeds_0,hsbeds_1,hsbeds_10,hsbeds_11,hsbeds_12,hsbeds_2,hsbeds_3,hsbeds_4,hsbeds_5,hsbeds_6,hsbeds_7,hsbeds_8,hsbeds_9,ncars_-2,ncars_0,ncars_1,ncars_10,ncars_11,ncars_12,ncars_2,ncars_3,ncars_30,ncars_4,ncars_5,ncars_6,ncars_7,ncars_8,ncars_9,racel_dv_10,racel_dv_11,racel_dv_12,racel
_dv_13,racel_dv_14,racel_dv_15,racel_dv_16,racel_dv_17,racel_dv_2,racel_dv_4,racel_dv_5,racel_dv_6,racel_dv_7,racel_dv_8,racel_dv_9,racel_dv_97,ethn_dv_10,ethn_dv_11,ethn_dv_12,ethn_dv_13,ethn_dv_14,ethn_dv_15,ethn_dv_16,ethn_dv_17,ethn_dv_2,ethn_dv_4,ethn_dv_5,ethn_dv_6,ethn_dv_7,ethn_dv_8,ethn_dv_9,ethn_dv_97,d_ivfho_11,d_ivfho_12,d_ivfho_13,d_ivfho_39,d_ivfho_50,d_ivfho_51,d_ivfho_53,d_ivfho_55,d_ivfho_59,d_ivfho_60,d_ivfho_61,d_ivfho_62,d_ivfho_63,d_ivfho_65,d_ivfho_91,d_ivfho_92,g_ivfio_10,g_ivfio_11,g_ivfio_14,g_ivfio_15,g_ivfio_2,g_ivfio_50,g_ivfio_51,g_ivfio_52,g_ivfio_53,g_ivfio_54,g_ivfio_57,g_ivfio_80,g_ivfio_81,g_ivfio_83,g_ivfio_9,g_ivfio_99,h_ivfio_10,h_ivfio_11,h_ivfio_14,h_ivfio_15,h_ivfio_2,h_ivfio_50,h_ivfio_52,h_ivfio_53,h_ivfio_54,h_ivfio_57,h_ivfio_80,h_ivfio_81,h_ivfio_83,h_ivfio_9,h_ivfio_98,h_ivfio_99,hhtype_dv_10,hhtype_dv_11,hhtype_dv_12,hhtype_dv_16,hhtype_dv_17,hhtype_dv_18,hhtype_dv_19,hhtype_dv_2,hhtype_dv_20,hhtype_dv_21,hhtype_dv_22,hhtype_dv_23,hhtype_dv_3,hhtype_dv_4,hhtype_dv_5,hhtype_dv_6,hhtype_dv_8,e_ivfho_11,e_ivfho_12,e_ivfho_13,e_ivfho_50,e_ivfho_51,e_ivfho_53,e_ivfho_55,e_ivfho_56,e_ivfho_59,e_ivfho_60,e_ivfho_61,e_ivfho_62,e_ivfho_63,e_ivfho_65,e_ivfho_81,e_ivfho_91,e_ivfho_96,e_ivfho_97,d_ivfio_10,d_ivfio_11,d_ivfio_14,d_ivfio_15,d_ivfio_16,d_ivfio_18,d_ivfio_2,d_ivfio_21,d_ivfio_25,d_ivfio_50,d_ivfio_51,d_ivfio_52,d_ivfio_53,d_ivfio_57,d_ivfio_60,d_ivfio_63,d_ivfio_80,d_ivfio_81,d_ivfio_9,ienddathh_1,ienddathh_10,ienddathh_11,ienddathh_12,ienddathh_13,ienddathh_14,ienddathh_15,ienddathh_16,ienddathh_17,ienddathh_18,ienddathh_19,ienddathh_20,ienddathh_21,ienddathh_22,ienddathh_23,ienddathh_3,ienddathh_5,ienddathh_6,ienddathh_8,ienddathh_9,e_ivfio_10,e_ivfio_11,e_ivfio_14,e_ivfio_15,e_ivfio_16,e_ivfio_18,e_ivfio_2,e_ivfio_21,e_ivfio_25,e_ivfio_50,e_ivfio_51,e_ivfio_52,e_ivfio_53,e_ivfio_54,e_ivfio_57,e_ivfio_60,e_ivfio_63,e_ivfio_80,e_ivfio_83,e_ivfio_9,istrtdathh_10,istrtdathh_11,istrtdathh_12,istrtdathh_13,istrtdathh_14,i
strtdathh_15,istrtdathh_16,istrtdathh_17,istrtdathh_18,istrtdathh_19,istrtdathh_2,istrtdathh_20,istrtdathh_21,istrtdathh_22,istrtdathh_23,istrtdathh_4,istrtdathh_5,istrtdathh_6,istrtdathh_7,istrtdathh_8,istrtdathh_9,e_month_10,e_month_11,e_month_12,e_month_13,e_month_14,e_month_15,e_month_16,e_month_17,e_month_18,e_month_19,e_month_2,e_month_20,e_month_21,e_month_22,e_month_23,e_month_24,e_month_3,e_month_4,e_month_5,e_month_6,e_month_7,e_month_8,e_month_9,f_month_10,f_month_11,f_month_12,f_month_13,f_month_14,f_month_15,f_month_16,f_month_17,f_month_18,f_month_19,f_month_2,f_month_20,f_month_21,f_month_22,f_month_23,f_month_24,f_month_3,f_month_4,f_month_5,f_month_6,f_month_7,f_month_8,f_month_9,g_month_10,g_month_11,g_month_12,g_month_13,g_month_14,g_month_15,g_month_16,g_month_17,g_month_18,g_month_19,g_month_2,g_month_20,g_month_21,g_month_22,g_month_23,g_month_24,g_month_3,g_month_4,g_month_5,g_month_6,g_month_7,g_month_8,g_month_9,g_ivfho_11,g_ivfho_12,g_ivfho_13,g_ivfho_14,g_ivfho_15,g_ivfho_16,g_ivfho_39,g_ivfho_50,g_ivfho_51,g_ivfho_52,g_ivfho_53,g_ivfho_54,g_ivfho_55,g_ivfho_59,g_ivfho_60,g_ivfho_61,g_ivfho_62,g_ivfho_63,g_ivfho_65,g_ivfho_66,g_ivfho_81,g_ivfho_91,g_ivfho_96,g_ivfho_97,h_ivfho_11,h_ivfho_12,h_ivfho_13,h_ivfho_14,h_ivfho_15,h_ivfho_16,h_ivfho_39,h_ivfho_50,h_ivfho_51,h_ivfho_53,h_ivfho_55,h_ivfho_59,h_ivfho_60,h_ivfho_61,h_ivfho_63,h_ivfho_65,h_ivfho_66,h_ivfho_81,h_ivfho_90,h_ivfho_91,h_ivfho_92,h_ivfho_95,h_ivfho_96,h_ivfho_97
3,0.639878,-0.672515,1.692497,-1.669109,1.010297,1.060204,-1.516673,0.211878,-0.74178,-0.808718,-0.145866,1.954234,-0.1074,-0.115447,-0.672223,1.692497,0.674773,-0.112401,3.941976,-0.202415,-0.297459,3.418842,0.127134,1.260867,0.628624,-1.485992,-0.154913,3.520283,-0.256349,-0.11085,1.675516,0.143739,-0.5418,-0.66429,-0.640248,-0.95828,-0.187982,-0.329248,-0.17949,2.567482,-0.282252,-0.366117,1.888175,2.19613,3.374203,-0.170821,-0.367597,-0.513697,-0.255135,-0.441872,-0.272483,-0.273698,-0.274659,-0.274879,-0.275217,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.007908,0.00719,0.006539,0.005842,0.004955,0.00397,0.002964,0.002137,0.022051,0.026136,0.038135,0.044763,0.051329,0.054146,0.056311,0.062043,0.064654,0.067146,0.069479,0.071597,0.071033,0.072547,0.073635,0.076429,0.074465,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167,True,False,False,True,True,False,False,True,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,True,False,True,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,True,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False,True
,False,False,False,True,False,False,False,True,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,F
alse,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,
False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
12,-1.155361,-1.282752,1.815247,0.21938,0.202793,-1.164646,0.550271,-0.924523,-0.750298,-0.821108,-0.145866,-0.312903,-0.1074,-0.115447,-1.282468,1.815247,1.284052,-0.112401,-0.084674,-0.202415,-0.297459,-0.00565,-0.295682,-1.065674,1.42169,-3.124727,-0.32367,-0.288237,-0.425239,-0.11085,1.797526,-0.022917,0.052966,0.394765,0.653301,0.913276,0.136963,0.252237,-0.17949,-0.097751,-0.230377,-0.366117,-0.409101,0.377342,-0.590312,0.108284,0.265623,-0.659033,-0.172987,0.145006,-0.867782,-0.868917,-0.869201,-0.869521,-0.870122,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.092773,0.092952,0.093073,0.093144,0.093264,0.093334,0.093397,0.093441,0.113316,0.117582,0.128823,0.135021,0.14169,0.144998,0.148522,0.153802,0.156446,0.160377,0.162557,0.164917,0.165903,0.167832,0.169732,0.172706,0.17326,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167,True,True,True,True,True,False,False,False,False,False,False,True,True,False,True,False,False,True,False,False,False,False,False,False,True,True,False,False,False,True,True,True,True,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,True,True,False,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,False,True,False,False,True,F
alse,False,False,True,False,False,True,False,False,False,True,False,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False
,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
15,-2.398218,-1.1718,-0.148751,1.753776,0.087436,1.117252,1.124423,-0.321438,0.050986,0.189146,-0.145866,-0.692189,-0.1074,-0.115447,-1.171515,-0.148751,1.117885,-0.112401,-0.084674,-0.202415,-0.297459,-0.698229,-0.110544,0.485354,1.040305,-2.618266,-0.32367,-0.288237,-0.425239,-0.11085,1.065469,0.4215,-0.41435,0.924293,-0.622549,-0.205265,-0.206043,-0.044677,-0.17949,-0.097751,-0.280696,-0.366117,-0.873708,-0.765779,-0.696354,-0.167002,0.265623,0.680253,0.402053,-0.148433,-1.311681,-1.31292,-1.313597,-1.313893,-1.313881,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.090133,0.090206,0.090291,0.090389,0.09046,0.090502,0.090543,0.090558,0.110425,0.114693,0.125955,0.132173,0.138849,0.142145,0.145636,0.150945,0.153592,0.157493,0.159686,0.162043,0.162991,0.164917,0.166806,0.169784,0.170282,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167,False,True,True,True,True,False,False,False,False,False,False,True,True,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,True,True,True,True,True,True,True,False,False,False,False,False,False,False,True,False,True,False,False,True,False,True,True,True,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,True,False,True,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False,True,F
alse,False,False,True,False,False,False,True,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,T
rue,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,F
alse,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
18,-0.464884,-0.949895,0.342248,-0.134712,-1.296856,-0.822361,-1.516673,-0.833059,-0.466767,-0.408686,-0.145866,-0.692189,-0.1074,-0.115447,-0.949607,0.342248,0.896329,-0.112401,-0.084674,-0.202415,-0.297459,-0.698229,-0.261651,0.795559,0.395159,0.01575,-0.32367,-0.288237,-0.425239,-0.11085,1.187478,0.143739,0.052966,-0.66429,2.552612,2.401906,0.595822,0.569403,-0.17949,-0.097751,1.147798,-0.366117,-0.509811,-0.765779,-0.696354,0.518093,1.057148,2.118795,0.319905,4.546584,-1.161704,-1.162831,-1.162918,-1.163297,-1.163252,0.734638,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,0.80201,-0.183014,-0.228409,-0.023155,-0.024173,-0.025107,-0.026112,-0.027339,-0.028708,-0.030076,-0.031205,-0.011251,-0.00723,0.00505,0.011852,0.018417,0.021075,0.022778,0.028695,0.031321,0.033334,0.035751,0.037807,0.036716,0.038121,0.038949,0.041721,0.038909,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.151515,False,True,True,True,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,True,True,True,True,False,False,False,False,False,False,False,False,False,True,False,True,False,False,True,False,True,False,True,False,False,False,False,True,False,False,False,True,False,False,True,False,False,False,True,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,True,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,True,False,False,True,False,False,Fa
lse,True,False,False,False,True,False,False,False,True,False,False,True,False,False,False,True,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,True,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,
False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False
28,-1.155361,-0.006803,-0.148751,-0.429788,-1.12382,-0.365982,-1.516673,-0.320421,0.050986,0.049165,-0.140457,-0.692189,-0.1074,-0.115447,-0.0065,-0.148751,0.010105,-0.112401,-0.084674,-0.199366,-0.297459,-0.698229,-0.145547,1.260867,0.006646,-2.576752,-0.32367,-0.288237,-0.425239,-0.11085,-0.154626,-0.689543,-0.711733,-0.66429,-0.419101,-0.520231,-0.146214,-0.195151,-0.17949,-0.097751,0.815514,-0.366117,0.062015,-0.765779,-0.696354,-0.123105,-0.525902,-0.659033,0.237756,-0.73531,0.019483,0.018157,0.017577,0.017253,0.01735,-1.361249,-0.255976,-0.295974,-0.258665,-0.294789,-0.25275,-0.17737,-1.248358,-0.183014,-0.228409,0.161802,0.16259,0.16332,0.164087,0.164974,0.165936,0.166877,0.167618,0.187453,0.191866,0.202499,0.208347,0.215112,0.218824,0.223462,0.228386,0.23106,0.236173,0.238236,0.240794,0.24305,0.245328,0.2479,0.251028,0.253652,0.0,0.005736,0.009787,0.011343,0.011343,0.023095,0.027758,-0.773167,False,True,True,True,True,False,False,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,False,True,True,False,False,False,True,True,True,True,True,False,True,False,False,False,False,False,False,False,False,False,True,False,False,True,False,True,True,False,False,False,False,False,True,False,False,False,True,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,True,False,False,False,False,False,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,False,True,False,True,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,True,True,False,False,False,False,False,False,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,True,False,False,False,True,False,False,False,True,F
alse,False,False,True,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,True,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,Fa
lse,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,Fa
lse,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,True,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False,False


Let's move on to fitting the lasso model!

In [None]:
# Build and run Lasso, then map nonzero coefficients back to variable labels (removed BaselineCategory)
from sklearn.linear_model import LassoCV
import numpy as np
import pandas as pd

# Fit Lasso with cross-validation
lasso = LassoCV(cv=5, random_state=0, max_iter=10000)
lasso.fit(X, Y)

# Get nonzero coefficients sorted by absolute value
coefs = pd.Series(lasso.coef_, index=X.columns)
nonzero_coefs = coefs[coefs != 0].sort_values(key=abs, ascending=False)

# Create label mapping from Stata metadata
label_map = dict(zip(meta.column_names, meta.column_labels)) if hasattr(meta, 'column_labels') else {}
original_vars = set(meta.column_names) if hasattr(meta, 'column_names') else set()

def parse_variable(varname):
    """Parse variable to get base variable, category, and labels (no baseline category)."""
    # Find base variable and category
    if varname in original_vars:
        base_var, category = varname, ''
    elif '_' in varname:
        base, cat = varname.rsplit('_', 1)
        if base in original_vars:
            base_var, category = base, cat
        else:
            # Try progressively shorter prefixes for variables with underscores
            parts = varname.split('_')
            base_var, category = varname, ''
            for i in range(len(parts) - 1, 0, -1):
                potential_base = '_'.join(parts[:i])
                if potential_base in original_vars:
                    base_var, category = potential_base, '_'.join(parts[i:])
                    break
    else:
        base_var, category = varname, ''

    # Get variable label
    variable_label = label_map.get(base_var, base_var)

    # Get category label from value labels
    category_label = ''
    if category and hasattr(meta, 'variable_value_labels') and base_var in meta.variable_value_labels:
        value_dict = meta.variable_value_labels[base_var]
        for cat_type in [int, float, str]:
            try:
                cat_key = cat_type(category)
                if cat_key in value_dict:
                    category_label = value_dict[cat_key]
                    break
            except (ValueError, TypeError):
                continue

    return base_var, variable_label, category, category_label

# Build results table (no BaselineCategory)
results_data = []
for var in nonzero_coefs.index:
    base_var, variable_label, category, category_label = parse_variable(var)
    results_data.append({
        'Variable': var,
        'Label': variable_label,
        'Category': category,
        'CategoryLabel': category_label,
        'Coefficient': nonzero_coefs[var]
    })

# Display results with formatted CategoryLabel and wider Coefficient display
result_df = pd.DataFrame(results_data)
pd.set_option('display.max_rows', None)
print(f"Lasso found {len(result_df)} significant variables (nonzero coefficients):")

# Prepare display copy: truncate long category labels and format coefficients for readability
max_cat_len = 40  # max characters to show for category labels
result_df['CategoryLabel'] = result_df['CategoryLabel'].astype(str).apply(
    lambda s: s if len(s) <= max_cat_len else s[:max_cat_len-3] + '...'
)

# Keep numeric coefficient for downstream use, but create a formatted display version
result_df['Coefficient'] = result_df['Coefficient'].astype(float)
result_df_display = result_df.copy()
result_df_display['Coefficient'] = result_df_display['Coefficient'].map(lambda v: f"{v: .6f}")

# Show a compact, clearly formatted table
display_cols = ['Variable', 'Label', 'Category', 'CategoryLabel', 'Coefficient']
pd.set_option('display.max_colwidth', 50)
display(result_df_display[display_cols])

Lasso found 160 significant variables (nonzero coefficients):


Unnamed: 0,Variable,Label,Category,CategoryLabel,Coefficient
0,sf12mcs_dv,SF-12 Mental Component Summary (MCS),,,-3.498345
1,finnow_5,Subjective financial situation - current,5.0,Finding it very difficult,1.884265
2,sclfsato_7,Satisfaction with life overall,7.0,completely satisfied,-1.620461
3,sclfsato_6,Satisfaction with life overall,6.0,mostly satisfied,-1.125316
4,sf12pcs_dv,SF-12 Physical Component Summary (PCS),,,-1.009862
5,finnow_4,Subjective financial situation - current,4.0,Finding it quite difficult,0.881343
6,sclfsato_3,Satisfaction with life overall,3.0,somewhat dissatisfied,0.826083
7,sclfsato_5,Satisfaction with life overall,5.0,somewhat satisfied,-0.570174
8,finfut_2,Subjective financial situation - future,2.0,Worse of than now,0.39324
9,sex_2,Sex,2.0,female,0.356643
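
How many coefficients stay nonzero depends on the penalty strength that cross-validation selects, so it can help to read that value alongside the table. The following is a minimal sketch on synthetic data: `make_regression` stands in for the real `X` and `Y`, and the `*_demo` names are hypothetical. It uses the same `LassoCV` settings as the cell above and reads the selected penalty from the fitted `alpha_` attribute.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

# Synthetic stand-in for the notebook's X and Y (assumption: not the UKHLS data)
X_demo, y_demo = make_regression(
    n_samples=200, n_features=30, n_informative=5, noise=5.0, random_state=0
)

# Same estimator settings as the lasso cell above
lasso_demo = LassoCV(cv=5, random_state=0, max_iter=10000)
lasso_demo.fit(X_demo, y_demo)

# Penalty chosen by cross-validation and the number of surviving coefficients
n_nonzero = int(np.sum(lasso_demo.coef_ != 0))
print("Selected alpha:", lasso_demo.alpha_)
print("Nonzero coefficients:", n_nonzero, "of", X_demo.shape[1])
```

A larger `alpha_` shrinks more coefficients to exactly zero, so the number of retained variables is a property of the chosen penalty rather than a significance test.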


**Very interesting outcome! Here are my key takeaways:**

Already included in my refined variable list:

- Neighbourhood social cohesion (nbrsnci_dv) is retained by the lasso as a predictor of subjective well-being (scghq1_dv), as expected
- Demographics are important (e.g. age, gender)
- Education is important (highest qualification)
- Current economic activity is important

Variables to consider adding:
- Subjective financial situation (finnow, finfut): subjective financial strain is clearly a key predictor of well-being


Variables I could consider deleting from my refined variable list:
- Job industry (jbiindb_dv): not present among the nonzero coefficients here, and awkward to include because it expands into lots of dummies
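
Checking whether a hand-selected variable survived can be done programmatically instead of scanning the table. The sketch below is illustrative only: `coefs_demo` holds made-up coefficient values standing in for the `coefs` Series from the lasso cell, and `retained` is a hypothetical helper. It treats a base variable as retained if the column itself, or any dummy named `<base>_<category>`, has a nonzero coefficient.

```python
import pandas as pd

# Hypothetical coefficients standing in for `coefs` from the lasso cell
# (values are made up for illustration, not taken from the real fit)
coefs_demo = pd.Series({
    "nbrsnci_dv": -0.21,
    "sex_2": 0.36,
    "jbiindb_dv_3": 0.0,
    "jbiindb_dv_7": 0.0,
    "finnow_5": 1.88,
})

# Base variable names to check (a subset of the hand-selected list)
hand_selected = ["nbrsnci_dv", "jbiindb_dv", "age_dv"]

def retained(base, coefs):
    """Return True if the base column or any of its dummies has a nonzero coefficient."""
    mask = (coefs.index == base) | coefs.index.str.startswith(base + "_")
    return bool((coefs[mask] != 0).any())

status = {base: retained(base, coefs_demo) for base in hand_selected}
print(status)
```

Prefix matching is a simplification: a base name such as `sex` would also match `sex_dv`, so on the real data it may be safer to reuse `parse_variable` to recover each column's base variable.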

Noteworthy variables:
- sclfsat* variables (e.g. satisfaction with life, health, income) are strong predictors of wellbeing, but they are effectively alternative wellbeing outcomes. If we include them, we risk circularity (i.e. we are predicting wellbeing with other measures of wellbeing), so I will exclude them.
- sf12mcs_dv (SF-12 mental health score) and sf12pcs_dv (SF-12 physical health score) are strong predictors of wellbeing. They are also arguably alternative measures of wellbeing, so if we include them we risk circularity (over-control).
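
If these variables are excluded, one way to do it is to drop the matching columns from the design matrix before refitting. This is a minimal sketch under assumptions: `X_demo` is a small hypothetical frame standing in for `X`, and the prefix list assumes that every column derived from these variables starts with `sclfsat`, `sf12mcs` or `sf12pcs`.

```python
import pandas as pd

# Hypothetical design matrix standing in for X (values are made up)
X_demo = pd.DataFrame({
    "sclfsato_7": [1, 0, 0],
    "sclfsat1_3": [0, 1, 0],
    "sf12mcs_dv": [0.4, -1.2, 0.3],
    "sf12pcs_dv": [-0.1, 0.8, 0.2],
    "finnow_5": [0, 0, 1],
    "nbrsnci_dv": [0.5, -0.3, 1.1],
})

# Prefixes of the predictors treated as alternative wellbeing measures
circular_prefixes = ("sclfsat", "sf12mcs", "sf12pcs")

# Keep only columns that do not start with any of the prefixes
keep = [c for c in X_demo.columns if not c.startswith(circular_prefixes)]
X_reduced = X_demo[keep]
print(list(X_reduced.columns))
```

The reduced frame could then be passed to `LassoCV` in place of `X` to see which predictors remain once the circular ones are gone.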




