ADS Project 5:
Term: Spring 2017
Projec title: The Fragile Families Challenge
- team member 1 (
- team member 2 (
- team member 3 (
- team member 4 (
- team member 5 (
Please be advised: due to protection of privacy of fragile families, we are not allowed to upload any relevant dataset by the organizer at Princeton University.
- team member 1 (
Project summary:In this project, we use data based on the Fragile Families and Child Wellbeing Study, which has followed thousands of American families for more than 15 years. During this time, the Fragile Families study collected information about the children, their parents, their schools, and their larger environments.
Given all the background data from birth to year 9 and some training data from year 15,we infer six key outcomes (gpa, grit, material_hardship, eviction, layoff, jobtraining) in the year 15 test data. In the data cleaning process, we deal with categorical variable and continuous variables seperately, for continuous variable, we replace the NA with the median value of that variable, and create a new categrocial variable to indicate the NAs (where the NAs may contain information to some degree) ,attach the new indicating categorical variable to the oringinal categorical features.For categorical features, we replace the NAs with a number that doesn't exist in original dataset and transform every categorical to a dummy matrix, for every dummy matrix whose elements are either 0 or 1, we choose the 2nd to last column to avoid collinearity.
Given the cleaned data, our team work on different directions, one team work on different features and one team work on different machine learning tools, since we got to know the xgboost apparently outperforms other methods, we together work on features selected from various angles. For case 1: data obtained when children are at age 9 and only consider the continuous variables.Case 2: data obtained when children are at age 9, use categorical variables. Then we bag them by using the weighted average. We use the same strategy to other continuous outcomes( grit, material hardship).
Contribution statement: (default) All team members contributed equally in all stages of this project. All team members approve our work presented in this GitHub repository including this contributions statement.
proj/ ├── lib/ ├── data/ ├── doc/ ├── figs/ └── output/
Please see each subfolder for a README file.