# Income Prediction AI Hackathon
- `Algorithm`, `Tabular`, `Regression`, `Society`, `RMSE`
- Period
    - 2024.03.11 ~ 2024.04.08 09:59
- https://dacon.io/competitions/official/236230/data

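## Background

> Hello!
>
> Welcome to the 41st hackathon, the Income Prediction AI Hackathon.
>
> This Dacon hackathon aims to predict income levels using data built on a variety of personal characteristics.
>
> By developing algorithms on the provided personal-characteristic data and predicting individual income levels, participants will have an opportunity to contribute to solving social problems.
>
> The goal of this competition also goes beyond simply predicting individual income levels: it is to build the ability to analyze and understand diverse personal-characteristic data, and through that to take your AI skills and experience one step further.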




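## Topic
- Develop an AI model that predicts an individual's income level from personal-characteristic data

## Description
- Develop an AI model that can predict an individual's income level using a variety of data related to that individual's characteristics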
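
The competition is tagged `RMSE`, which suggests (the text here does not state it explicitly) that predictions are scored with root mean squared error. As a minimal sketch under that assumption, the metric can be computed as below; the `rmse` helper and the toy values are illustrative and not part of the competition code.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy values for illustration only (not taken from the competition data)
actual = [0, 500, 875]
predicted = [0, 400, 875]
score = rmse(actual, predicted)
print(score)
```

Because the errors are squared before averaging, a few large misses on high-income rows can dominate the score.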

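### Column Descriptions
- `train.csv`
    - Various social and economic information about one person
    - `ID` : unique ID of the training record
    - `Age` : age
    - `Gender` : gender
    - `Education_Status` : education level
    - `Employment_Status` : employment type
    - `Working_Week (Yearly)` : number of weeks worked per year
    - `Industry_Status` : industry of employment
    - `Occupation_Status` : occupation type
    - `Race` : race
    - `Hispanic_Origin` : Hispanic origin
    - `Martial_Status` : marital status (the column name is spelled this way in the data)
    - `Household_Status` : household member status
    - `Household_Summary` : summary of household member status
    - `Citizenship` : citizenship
    - `Birth_Country` : country of birth
    - `Birth_Country (Father)` : father's country of birth
    - `Birth_Country (Mother)` : mother's country of birth
    - `Tax_Status` : tax filing status
    - `Gains` : amount of gains (income)
    - `Losses` : amount of losses
    - `Dividends` : amount of dividends
    - `Income_Status` : final income status
    - `Income` : prediction target, income per hour (0 means no earned income)
- `test.csv`
    - Various social and economic information about one person
    - `ID` : unique ID of the test record
    - `Income` is not included
- `sample_submission.csv`
    - `ID` : unique ID of the test record
    - `Income` : the predicted `Income` for that ID, to be submitted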
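
A submission in the `sample_submission.csv` layout described above could be assembled roughly as follows. This is a sketch under stated assumptions: the `TEST_000x` IDs and the prediction values are made up, and clipping negative predictions to 0 is a suggestion based on `Income` being non-negative in the training data, not a documented competition rule.

```python
import pandas as pd

# Toy stand-in for test.csv; the IDs below are hypothetical
test_df = pd.DataFrame({"ID": ["TEST_0000", "TEST_0001", "TEST_0002"]})

# Hypothetical model output; in practice this comes from a fitted model
predictions = [512.3, -4.0, 0.0]

submission = pd.DataFrame({"ID": test_df["ID"], "Income": predictions})

# Hourly income cannot be negative, so clip predictions at 0 (an assumption)
submission["Income"] = submission["Income"].clip(lower=0)

# Uncomment to write the file once real predictions are available
# submission.to_csv("submission.csv", index=False)
print(submission)
```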

# Dataset Preprocessing

In [61]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

pd.set_option('display.max_columns', None)  # display every column

from datetime import datetime
from datetime import timedelta
import missingno as msno

In [62]:
train_path = "C:/Users/aryij/Documents/DataStudy/income-prediction-dacon/data/train.csv"
train_df = pd.read_csv(train_path)
train_df.head()

Unnamed: 0,ID,Age,Gender,Education_Status,Employment_Status,Working_Week (Yearly),Industry_Status,Occupation_Status,Race,Hispanic_Origin,Martial_Status,Household_Status,Household_Summary,Citizenship,Birth_Country,Birth_Country (Father),Birth_Country (Mother),Tax_Status,Gains,Losses,Dividends,Income_Status,Income
0,TRAIN_00000,63,M,Middle (7-8),Full-Time,4,Social Services,Services,White,All other,Married,Householder,Householder,Native,US,US,US,Nonfiler,0,0,0,Unknown,425
1,TRAIN_00001,37,M,Associates degree (Vocational),Full-Time,52,Entertainment,Services,White,All other,Separated,Nonfamily householder,Householder,Native,US,US,US,Single,0,0,0,Under Median,0
2,TRAIN_00002,58,F,High graduate,Full-Time,52,Manufacturing (Non-durable),Admin Support (include Clerical),Black,All other,Married,Householder,Householder,Native,US,US,US,Married Filling Jointly both under 65 (MFJ),3411,0,0,Under Median,860
3,TRAIN_00003,44,M,High graduate,Full-Time,52,Retail,Technicians & Support,White,All other,Divorced,Nonfamily householder,Householder,Native,US,US,US,Single,0,0,0,Under Median,850
4,TRAIN_00004,37,F,High graduate,Full-Time,52,Retail,Sales,White,All other,Divorced,Householder,Householder,Native,US,US,US,Head of Household (HOH),0,0,0,Unknown,570


In [63]:
train_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20000 entries, 0 to 19999
Data columns (total 23 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   ID                      20000 non-null  object
 1   Age                     20000 non-null  int64 
 2   Gender                  20000 non-null  object
 3   Education_Status        20000 non-null  object
 4   Employment_Status       20000 non-null  object
 5   Working_Week (Yearly)   20000 non-null  int64 
 6   Industry_Status         20000 non-null  object
 7   Occupation_Status       20000 non-null  object
 8   Race                    20000 non-null  object
 9   Hispanic_Origin         20000 non-null  object
 10  Martial_Status          20000 non-null  object
 11  Household_Status        20000 non-null  object
 12  Household_Summary       20000 non-null  object
 13  Citizenship             20000 non-null  object
 14  Birth_Country           20000 non-null  object
 15  Bi

In [64]:
# Check for missing values -> there are none
train_df.isnull().sum()

ID                        0
Age                       0
Gender                    0
Education_Status          0
Employment_Status         0
Working_Week (Yearly)     0
Industry_Status           0
Occupation_Status         0
Race                      0
Hispanic_Origin           0
Martial_Status            0
Household_Status          0
Household_Summary         0
Citizenship               0
Birth_Country             0
Birth_Country (Father)    0
Birth_Country (Mother)    0
Tax_Status                0
Gains                     0
Losses                    0
Dividends                 0
Income_Status             0
Income                    0
dtype: int64
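
`isnull()` only counts true `NaN` values. The sample rows shown earlier contain labels such as `Unknown`, which act as placeholders but are not counted as missing. A minimal sketch for counting such labels is given below; the toy frame and the `placeholders` list are assumptions to be adjusted after inspecting the real categories.

```python
import pandas as pd

# Toy frame mimicking a few columns of train.csv
toy = pd.DataFrame({
    "Occupation_Status": ["Unknown", "Sales", "Unknown", "Services"],
    "Income_Status": ["Unknown", "Under Median", "Under Median", "Unknown"],
    "Age": [10, 37, 67, 44],
})

# Assumed placeholder labels; extend this list after checking value_counts()
placeholders = ["Unknown", "Not in universe or children"]

# Look only at non-numeric columns and count placeholder labels per column
categorical = toy.select_dtypes(exclude="number")
placeholder_counts = categorical.isin(placeholders).sum()
print(placeholder_counts)
```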

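## `describe()` Results

### Numeric Variables
- `Working_Week (Yearly)` : from the median (50%) up to the maximum, every value is 52 weeks
    - → at least half of the people worked all 52 weeks of the year
- `Gains` : 0 up to the 75th percentile, yet the mean is about 383, the maximum 99999, and the standard deviation about 4144
    - → a small number of people record very large gains
- `Losses` : 0 up to the 75th percentile, yet the mean is about 40, the maximum 4356, and the standard deviation about 279
    - → a very small number of people recorded large losses
- `Dividends` : 0 up to the 75th percentile, yet the mean is about 123, the maximum 45000, and the standard deviation about 1207
    - → a very small number of people recorded large dividends
- `Income` : the median is 500 and the mean about 555, so the mean sits close to the median
    - → typical hourly income is a little above 500, although the 25th percentile is 0, so at least a quarter of the rows have no earned income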
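
The pattern noted for `Gains`, `Losses`, and `Dividends` (mostly zeros with a few very large values) can be quantified with the share of zeros and the skewness. The sketch below uses a small made-up frame rather than the real data; on the real `train_df` the same expressions could be applied to the numeric columns.

```python
import pandas as pd

# Made-up values that imitate zero-inflated, right-skewed columns
toy = pd.DataFrame({
    "Gains": [0, 0, 0, 0, 0, 0, 0, 3411, 0, 99999],
    "Losses": [0, 0, 0, 0, 0, 0, 0, 0, 0, 4356],
})

summary = pd.DataFrame({
    "zero_share": (toy == 0).mean(),  # fraction of rows equal to 0
    "mean": toy.mean(),
    "median": toy.median(),
    "skew": toy.skew(),               # positive means a long right tail
})
print(summary)
```

A mean far above a median of 0, together with a large positive skew, is the signature of a few extreme values driving the average.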

In [65]:
train_df.describe()

Unnamed: 0,Age,Working_Week (Yearly),Gains,Losses,Dividends,Income
count,20000.0,20000.0,20000.0,20000.0,20000.0,20000.0
mean,35.6325,34.94305,383.1295,40.20215,123.45145,554.56525
std,17.994414,22.254592,4144.247487,279.182677,1206.949429,701.553155
min,0.0,0.0,0.0,0.0,0.0,0.0
25%,23.0,7.0,0.0,0.0,0.0,0.0
50%,34.0,52.0,0.0,0.0,0.0,500.0
75%,47.0,52.0,0.0,0.0,0.0,875.0
max,90.0,52.0,99999.0,4356.0,45000.0,9999.0


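### Categorical Variables
- `Gender` : 10472 women and 9528 men (a ratio of about 52:48)
- `Education_Status` : High graduate (high school graduate) is the most frequent
- `Employment_Status` : Children or Armed Forces is the most frequent
- `Industry_Status` : Not in universe or children (outside the scope of the question, or a child) is the most frequent
- `Occupation_Status` : Unknown is the most frequent
- `Race` : White is the most frequent
- `Hispanic_Origin` : All other is the most frequent
- `Martial_Status` : Married is the most frequent
- `Household_Status` : Householder is the most frequent
- `Household_Summary` : Householder is the most frequent
- `Citizenship` : Native is the most frequent
- `Birth_Country` : US is the most frequent
- `Birth_Country (Father)` : US is the most frequent
- `Birth_Country (Mother)` : US is the most frequent
- `Tax_Status` : Married Filing Jointly is the most frequent (spelled `Married Filling Jointly both under 65 (MFJ)` in the data)
    - A married couple combines their incomes and files a single tax return
    - This status generally comes with the lowest tax rates and many tax benefits
- `Income_Status` : Under Median is the most frequent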
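
`describe(include="object")` reports the top category and its raw count. To compare columns more easily, the count can be turned into a share. The following is a small sketch on made-up data; `top_summary` is an illustrative name, not something defined elsewhere in this notebook.

```python
import pandas as pd

# Made-up categorical data for illustration
toy = pd.DataFrame({
    "Gender": ["F", "F", "M", "F", "M"],
    "Race": ["White", "White", "Black", "White", "White"],
})

rows = []
for col in toy.columns:
    # value_counts sorts by frequency, so the first entry is the top category
    shares = toy[col].value_counts(normalize=True)
    rows.append({
        "column": col,
        "top": shares.index[0],
        "share": shares.iloc[0],
        "n_unique": toy[col].nunique(),
    })

top_summary = pd.DataFrame(rows).set_index("column")
print(top_summary)
```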

In [66]:
train_df.describe(include="object")

Unnamed: 0,ID,Gender,Education_Status,Employment_Status,Industry_Status,Occupation_Status,Race,Hispanic_Origin,Martial_Status,Household_Status,Household_Summary,Citizenship,Birth_Country,Birth_Country (Father),Birth_Country (Mother),Tax_Status,Income_Status
count,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000,20000
unique,20000,2,17,8,24,15,5,10,7,31,8,5,43,43,43,6,3
top,TRAIN_00000,F,High graduate,Children or Armed Forces,Not in universe or children,Unknown,White,All other,Married,Householder,Householder,Native,US,US,US,Married Filling Jointly both under 65 (MFJ),Under Median
freq,1,10472,6494,11142,4688,4688,16845,17769,9554,6087,8552,17825,17825,16563,16594,8588,13237


## Renaming Columns

In [68]:
# Function that converts a DataFrame's column names to lowercase (modifies df in place)
def lower_column(df):
    # Iterate over the columns of the df argument (not the global train_df),
    # lowercase each name with .lower(), and assign the result back to df
    df.columns = [column.lower() for column in df.columns]
    return df.columns

In [69]:
# Apply the lowercase function
lower_column(train_df)

Index(['id', 'age', 'gender', 'education_status', 'employment_status',
       'working_week (yearly)', 'industry_status', 'occupation_status', 'race',
       'hispanic_origin', 'martial_status', 'household_status',
       'household_summary', 'citizenship', 'birth_country',
       'birth_country (father)', 'birth_country (mother)', 'tax_status',
       'gains', 'losses', 'dividends', 'income_status', 'income'],
      dtype='object')
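
Lowercasing alone leaves names such as `working_week (yearly)` with spaces and parentheses, so those columns cannot be reached with attribute access (`train_df.column_name`) and need bracket indexing instead. As an optional sketch, a hypothetical helper `snake_case_columns` (not part of this notebook) could also replace those characters with underscores.

```python
import re
import pandas as pd

def snake_case_columns(df):
    """Return a copy of df with lowercase, underscore-only column names."""
    def clean(name):
        name = name.lower()
        # Collapse every run of non-alphanumeric characters into one underscore
        name = re.sub(r"[^0-9a-z]+", "_", name)
        return name.strip("_")
    return df.rename(columns=clean)

# Empty toy frame that only carries a few of the original column names
toy = pd.DataFrame(columns=["ID", "Working_Week (Yearly)", "Birth_Country (Father)"])
cleaned = snake_case_columns(toy)
print(list(cleaned.columns))
```

Unlike `lower_column`, this sketch returns a new DataFrame and leaves the input unchanged. Names that are already plain, such as `household_status`, come out the same either way.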

In [70]:
# Check the renamed columns
train_df.head()

Unnamed: 0,id,age,gender,education_status,employment_status,working_week (yearly),industry_status,occupation_status,race,hispanic_origin,martial_status,household_status,household_summary,citizenship,birth_country,birth_country (father),birth_country (mother),tax_status,gains,losses,dividends,income_status,income
0,TRAIN_00000,63,M,Middle (7-8),Full-Time,4,Social Services,Services,White,All other,Married,Householder,Householder,Native,US,US,US,Nonfiler,0,0,0,Unknown,425
1,TRAIN_00001,37,M,Associates degree (Vocational),Full-Time,52,Entertainment,Services,White,All other,Separated,Nonfamily householder,Householder,Native,US,US,US,Single,0,0,0,Under Median,0
2,TRAIN_00002,58,F,High graduate,Full-Time,52,Manufacturing (Non-durable),Admin Support (include Clerical),Black,All other,Married,Householder,Householder,Native,US,US,US,Married Filling Jointly both under 65 (MFJ),3411,0,0,Under Median,860
3,TRAIN_00003,44,M,High graduate,Full-Time,52,Retail,Technicians & Support,White,All other,Divorced,Nonfamily householder,Householder,Native,US,US,US,Single,0,0,0,Under Median,850
4,TRAIN_00004,37,F,High graduate,Full-Time,52,Retail,Sales,White,All other,Divorced,Householder,Householder,Native,US,US,US,Head of Household (HOH),0,0,0,Unknown,570


In [71]:
train_df.household_status.value_counts()

household_status
Householder                                                               6087
Spouse of householder                                                     4794
Child <18 never marr not in subfamily                                     2670
Nonfamily householder                                                     2465
Child 18+ never marr Not in a subfamily                                   1860
Secondary individual                                                       845
Other Rel 18+ never marr not in subfamily                                  195
Other Rel 18+ ever marr not in subfamily                                   154
Child 18+ ever marr Not in a subfamily                                     118
Child 18+ ever married Responsible Person of subfamily                     101
Child 18+ never married Responsible Person of subfamily                     96
Grandchild <18 never married child of subfamily Responsible Person          88
Responsible Person of unrelated sub

In [72]:
train_df.household_summary.value_counts()

household_summary
Householder                             8552
Spouse of householder                   4794
Child under 18 never married            2679
Child 18 or older                       2192
Nonrelative of householder               974
Other relative of householder            781
Group Quarters- Secondary individual      24
Child under 18 ever married                4
Name: count, dtype: int64

In [73]:
train_df[train_df.household_status=="Child <18 ever married Responsible Person of subfamily"].iloc[:, :15]

Unnamed: 0,id,age,gender,education_status,employment_status,working_week (yearly),industry_status,occupation_status,race,hispanic_origin,martial_status,household_status,household_summary,citizenship,birth_country
17855,TRAIN_17855,17,F,High Sophomore,Full-Time,52,Retail,Sales,White,All other,Married,Child <18 ever married Responsible Person of s...,Child under 18 ever married,Native,US


In [74]:
train_df[train_df.household_summary=="Child under 18 ever married"].iloc[:, :15]

Unnamed: 0,id,age,gender,education_status,employment_status,working_week (yearly),industry_status,occupation_status,race,hispanic_origin,martial_status,household_status,household_summary,citizenship,birth_country
5957,TRAIN_05957,16,F,High Freshman,Children or Armed Forces,0,Retail,Services,White,All other,Married (Spouse Absent),Child <18 ever marr not in subfamily,Child under 18 ever married,Native,US
8817,TRAIN_08817,16,M,Middle (7-8),Children or Armed Forces,0,Not in universe or children,Unknown,White,Mexican (Mexicano),Divorced,Child <18 ever marr not in subfamily,Child under 18 ever married,Native,US
9960,TRAIN_09960,15,M,High Freshman,Not Working,0,Not in universe or children,Unknown,White,All other,Divorced,Child <18 ever marr not in subfamily,Child under 18 ever married,Native,US
17855,TRAIN_17855,17,F,High Sophomore,Full-Time,52,Retail,Sales,White,All other,Married,Child <18 ever married Responsible Person of s...,Child under 18 ever married,Native,US


In [75]:
train_df[train_df.household_summary=="Child under 18 ever married"].iloc[:, 15:]


Unnamed: 0,birth_country (father),birth_country (mother),tax_status,gains,losses,dividends,income_status,income
5957,US,US,Nonfiler,0,0,0,Unknown,400
8817,US,US,Nonfiler,0,0,0,Under Median,0
9960,US,US,Nonfiler,0,0,0,Under Median,0
17855,US,US,Married Filling Jointly both under 65 (MFJ),0,0,0,Unknown,500
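
The cells above compare `household_status` with `household_summary` one category at a time. A cross-tabulation shows the whole mapping at once and makes it possible to check whether the summary column is a pure roll-up of the detailed one. The sketch below runs on a few made-up rows that reuse labels seen in the outputs; whether the roll-up holds on the full data is something to verify, not a result claimed here.

```python
import pandas as pd

# A few made-up rows reusing labels that appear in the outputs above
toy = pd.DataFrame({
    "household_status": [
        "Householder",
        "Nonfamily householder",
        "Spouse of householder",
        "Householder",
        "Child <18 ever marr not in subfamily",
    ],
    "household_summary": [
        "Householder",
        "Householder",
        "Spouse of householder",
        "Householder",
        "Child under 18 ever married",
    ],
})

# Rows: detailed status, columns: summary label, cells: row counts
table = pd.crosstab(toy["household_status"], toy["household_summary"])

# If every detailed status maps to exactly one summary label,
# the summary column carries no information beyond the detailed one
summaries_per_status = toy.groupby("household_status")["household_summary"].nunique()
is_pure_rollup = bool((summaries_per_status == 1).all())
print(table)
print(is_pure_rollup)
```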


In [76]:
train_df.occupation_status.value_counts()

occupation_status
Unknown                             4688
Admin Support (include Clerical)    2709
Services                            2313
Craft & Repair                      1869
Sales                               1692
Professional                        1488
Machine Operators & Inspectors      1383
Management                          1111
Handlers/Cleaners                    837
Transportation                       690
Technicians & Support                558
Farming & Forestry & Fishing         296
Protective Services                  260
Private Household Services           105
Armed Forces                           1
Name: count, dtype: int64

In [77]:
train_df[train_df.occupation_status=="Unknown"]

Unnamed: 0,id,age,gender,education_status,employment_status,working_week (yearly),industry_status,occupation_status,race,hispanic_origin,martial_status,household_status,household_summary,citizenship,birth_country,birth_country (father),birth_country (mother),tax_status,gains,losses,dividends,income_status,income
6,TRAIN_00006,67,M,Middle (7-8),Children or Armed Forces,0,Not in universe or children,Unknown,White,All other,Divorced,Nonfamily householder,Householder,Native,US,US,US,Nonfiler,0,0,0,Unknown,0
7,TRAIN_00007,64,M,Masters degree,Not Working,5,Not in universe or children,Unknown,White,All other,Married,Householder,Householder,Native,US,US,US,Married Filling Jointly both under 65 (MFJ),0,0,2052,Under Median,0
11,TRAIN_00011,75,F,High Freshman,Children or Armed Forces,0,Not in universe or children,Unknown,White,Cuban,Married,Other Relative 18+ ever married Responsible Pe...,Other relative of householder,Foreign-born (Non-US Citizen),Cuba,Cuba,Cuba,Nonfiler,0,0,0,Unknown,0
16,TRAIN_00016,10,M,Children,Children or Armed Forces,0,Not in universe or children,Unknown,White,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Native,US,US,US,Nonfiler,0,0,0,Unknown,0
21,TRAIN_00021,16,F,High Freshman,Not Working,14,Not in universe or children,Unknown,White,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Native,US,US,US,Single,0,0,0,Under Median,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19986,TRAIN_19986,31,M,Middle (7-8),Not Working,0,Not in universe or children,Unknown,White,All other,Divorced,Householder,Householder,Native,US,US,US,Nonfiler,0,0,0,Unknown,0
19989,TRAIN_19989,14,M,Children,Children or Armed Forces,0,Not in universe or children,Unknown,White,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Native,US,US,US,Nonfiler,0,0,0,Unknown,0
19990,TRAIN_19990,8,M,Children,Children or Armed Forces,0,Not in universe or children,Unknown,Asian/Pacific,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Foreign-born (Non-US Citizen),Japan,Japan,Japan,Nonfiler,0,0,0,Unknown,0
19994,TRAIN_19994,17,F,High Sophomore,Not Working,0,Not in universe or children,Unknown,White,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Native,US,US,US,Nonfiler,0,0,0,Under Median,0


In [78]:
train_df[train_df.occupation_status=="Unknown"]["industry_status"].value_counts()

industry_status
Not in universe or children    4688
Name: count, dtype: int64

In [83]:
train_df[train_df.occupation_status=="Unknown"]["income"].value_counts()

income
0    4688
Name: count, dtype: int64
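
The outputs above show that every row with `occupation_status == "Unknown"` also has `industry_status == "Not in universe or children"` and an `income` of 0. The same check can be run for all categories at once. This is a sketch on made-up rows; it assumes `income` is never negative, so a group maximum of 0 means every value in the group is 0.

```python
import pandas as pd

# Made-up rows for illustration
toy = pd.DataFrame({
    "occupation_status": ["Unknown", "Unknown", "Sales", "Services", "Sales", "Services"],
    "income": [0, 0, 570, 0, 850, 425],
})

# Row count and maximum income per category
per_group = toy.groupby("occupation_status")["income"].agg(n="count", max_income="max")

# Categories whose income is always 0 (valid because income is non-negative)
always_zero = per_group.index[per_group["max_income"] == 0].tolist()
print(per_group)
print(always_zero)
```

Categories found this way are candidates for a simple rule (predict 0) instead of relying on the model, though that choice should be validated on held-out data.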

In [79]:
train_df.occupation_status.value_counts()

occupation_status
Unknown                             4688
Admin Support (include Clerical)    2709
Services                            2313
Craft & Repair                      1869
Sales                               1692
Professional                        1488
Machine Operators & Inspectors      1383
Management                          1111
Handlers/Cleaners                    837
Transportation                       690
Technicians & Support                558
Farming & Forestry & Fishing         296
Protective Services                  260
Private Household Services           105
Armed Forces                           1
Name: count, dtype: int64

In [80]:
train_df[train_df.occupation_status=="Armed Forces"]

Unnamed: 0,id,age,gender,education_status,employment_status,working_week (yearly),industry_status,occupation_status,race,hispanic_origin,martial_status,household_status,household_summary,citizenship,birth_country,birth_country (father),birth_country (mother),tax_status,gains,losses,dividends,income_status,income
10342,TRAIN_10342,31,F,College,Seeking Full-Time,20,Armed Forces,Armed Forces,White,Puerto Rican,Married,Spouse of householder,Spouse of householder,Native,US,Puerto-Rico,Puerto-Rico,Married Filling Jointly both under 65 (MFJ),0,0,0,Under Median,0


In [82]:
train_df.industry_status.value_counts()

industry_status
Not in universe or children                     4688
Retail                                          3149
Manufacturing (Durable)                         1575
Manufacturing (Non-durable)                     1223
Education                                       1041
Business & Repair                                847
Medical (except Hospitals)                       838
Construction                                     832
Hospitals                                        821
Finance Insurance & Real Estate                  727
Transportation                                   693
Public Administration                            641
Other professional services                      477
Wholesale                                        450
Personal Services (except Private Household)     429
Social Services                                  367
Entertainment                                    278
Agriculture                                      268
Utilities & Sanitary          

In [81]:
train_df[train_df.industry_status=="Armed Forces"]

Unnamed: 0,id,age,gender,education_status,employment_status,working_week (yearly),industry_status,occupation_status,race,hispanic_origin,martial_status,household_status,household_summary,citizenship,birth_country,birth_country (father),birth_country (mother),tax_status,gains,losses,dividends,income_status,income
10342,TRAIN_10342,31,F,College,Seeking Full-Time,20,Armed Forces,Armed Forces,White,Puerto Rican,Married,Spouse of householder,Spouse of householder,Native,US,Puerto-Rico,Puerto-Rico,Married Filling Jointly both under 65 (MFJ),0,0,0,Under Median,0


In [84]:
train_df[train_df.income==0]

Unnamed: 0,id,age,gender,education_status,employment_status,working_week (yearly),industry_status,occupation_status,race,hispanic_origin,martial_status,household_status,household_summary,citizenship,birth_country,birth_country (father),birth_country (mother),tax_status,gains,losses,dividends,income_status,income
1,TRAIN_00001,37,M,Associates degree (Vocational),Full-Time,52,Entertainment,Services,White,All other,Separated,Nonfamily householder,Householder,Native,US,US,US,Single,0,0,0,Under Median,0
6,TRAIN_00006,67,M,Middle (7-8),Children or Armed Forces,0,Not in universe or children,Unknown,White,All other,Divorced,Nonfamily householder,Householder,Native,US,US,US,Nonfiler,0,0,0,Unknown,0
7,TRAIN_00007,64,M,Masters degree,Not Working,5,Not in universe or children,Unknown,White,All other,Married,Householder,Householder,Native,US,US,US,Married Filling Jointly both under 65 (MFJ),0,0,2052,Under Median,0
8,TRAIN_00008,24,F,Bachelors degree,Children or Armed Forces,52,Retail,Sales,White,All other,Single,Child 18+ never marr Not in a subfamily,Child 18 or older,Native,US,US,US,Single,0,0,0,Under Median,0
9,TRAIN_00009,53,M,High graduate,Seeking Full-Time,30,Construction,Machine Operators & Inspectors,White,All other,Married,Householder,Householder,Native,US,US,US,Married Filling Jointly both under 65 (MFJ),0,0,0,Under Median,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
19986,TRAIN_19986,31,M,Middle (7-8),Not Working,0,Not in universe or children,Unknown,White,All other,Divorced,Householder,Householder,Native,US,US,US,Nonfiler,0,0,0,Unknown,0
19989,TRAIN_19989,14,M,Children,Children or Armed Forces,0,Not in universe or children,Unknown,White,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Native,US,US,US,Nonfiler,0,0,0,Unknown,0
19990,TRAIN_19990,8,M,Children,Children or Armed Forces,0,Not in universe or children,Unknown,Asian/Pacific,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Foreign-born (Non-US Citizen),Japan,Japan,Japan,Nonfiler,0,0,0,Unknown,0
19994,TRAIN_19994,17,F,High Sophomore,Not Working,0,Not in universe or children,Unknown,White,All other,Single,Child <18 never marr not in subfamily,Child under 18 never married,Native,US,US,US,Nonfiler,0,0,0,Under Median,0
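
Scanning the zero-income rows by eye is hard, so one option is to compute the share of zero-income rows within each category of a column such as `employment_status`. The sketch below uses made-up rows; the resulting shares say nothing about the real data.

```python
import pandas as pd

# Made-up rows for illustration
toy = pd.DataFrame({
    "employment_status": [
        "Full-Time", "Full-Time", "Not Working",
        "Children or Armed Forces", "Full-Time", "Not Working",
    ],
    "income": [425, 0, 0, 0, 860, 0],
})

# Share of rows with income == 0 in each employment category
zero_share = (
    toy.assign(is_zero=toy["income"].eq(0))
       .groupby("employment_status")["is_zero"]
       .mean()
       .sort_values(ascending=False)
)
print(zero_share)
```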
