aflaxman/space-time-predictive-validity
This project is an effort to validate and compare space-time smoothing methods.

Plan for now: a Python script to generate noisy simulation data from 45q15 male estimates. The output format is a csv file with columns labeled::

    iso3, country, region, super-region, year, sex, age, y, se, x_1, x_2, x_3, ...

The file should have rows for each IHME standard country, for each year 1970 to 2010 (for sex=male, and age='' for now).

Big question: what is y? (and what is se?)

* sometimes y = '' [1]
* otherwise y = rnormal(logit(45q15), se**-2.) [2]

[1] The decision for making y == '' could be:

* for 20% of rows, chosen at random
* for 20% of countries, chosen at random
* forecasting and backcasting (drop the earliest/latest 20% of years)
* compositional bias, e.g. drop with probability proportional to 45q15
* missingness patterns from child, maternal, and ihd data

[2] se could be:

* constant
* f(x_i) for some x_i
* f(45q15), "heteroskedastic"
* two constants, large and small, selected randomly with small and large probability, "heavy-tail error"

1. We need a script (in Python or otherwise) that messes up data in this format, starting from a file with complete data in column 'y'::

    $ python simulate_noisy_data.py --missing=compositionalbias --error=heavytail < gold_45q15m.csv > noisy_45q15m.csv

2. We need competing smoothing approaches that take the results of this and create estimates in a compatible format::

    $ python nest_gp.py < noisy_45q15m.csv > estimate_45q15m.csv
    $ R --script='my_loess.r' < noisy_45q15m.csv > estimate_45q15m.csv

3. We need a script that compares the results of the estimate to the gold standard::

    $ python compare_estimate_to_gold.py estimate_45q15m.csv gold_45q15m.csv

4. We need a master script to run simulate_noisy_data, a given model, and compare_estimate_to_gold multiple times, with different patterns of missingness::

    $ python master.py gold_45q15m.csv

Additional Notes:

Eureqa: possibly a good model for the GPR machine? Eureqa takes data and looks for a functional relationship between the columns.
You specify which variable you would like as a function of which others, you can tell it a lot about what you want the function to look like, and it finds functions within your specifications that are minimal w.r.t. the error metric you choose. Like DisMod, we would like to take fairly complicated algorithms and put them in a framework where they are easy to use. SPK is another good example.

Testing platform for the Age-Space-Time smoother: use adult mortality estimates from the Lancet paper as the gold standard. Noise them up, and see how well the smoother gets them back to where they were. How we noise them up is an open question.

Like Eureqa, we may want to design our smoother so that users can choose which components (GPR, spline, etc.) go into it.

We should make sure we're not duplicating something that has already been done. We haven't yet found something that does exactly what we would like.

How problem-specific is our goal? This would be widely useful around IHME. If we want it to be useful beyond IHME, good documentation and well-worked examples are essential.

Common inputs: a csv file with columns labeled::

    subregion, year, sex, age, y, se, x_1, x_2, x_3, ...

where y and se may both be '' (empty string). Additionally, a csv representing the distance metric on subregions::

    subregion_i, subregion_j, distance between subregion_i and subregion_j

Perhaps something more general, i.e. you can specify a hierarchy using any variables; at least a hierarchy, in terms of groupings, for the spatial component.

Options: include sex in the model?

Another goal: make using the cluster easy and transparent.
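The simulate_noisy_data step could start from something like the following minimal sketch, which implements the "20% of rows at random" missingness pattern and the two-constant "heavy-tail" error model from the notes above. The function names, the 'gold_45q15' column name, and all numeric defaults are illustrative assumptions, not a fixed interface.

```python
# Hypothetical core of simulate_noisy_data.py: mask y at random, then
# draw the remaining y values with a two-component ("heavy-tail") se.
import math
import random

def logit(p):
    return math.log(p / (1 - p))

def simulate_noisy_rows(rows, p_missing=0.2, se_small=0.05, se_large=0.5,
                        p_large=0.1, seed=0):
    """rows: list of dicts, each with a 'gold_45q15' value in (0, 1).

    Returns copies with 'y' and 'se' filled in ('' where y is masked).
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        row = dict(row)
        if rng.random() < p_missing:
            # missing=random: mask y for roughly 20% of rows
            row['y'], row['se'] = '', ''
        else:
            # error=heavytail: mostly small se, occasionally large;
            # note gauss() takes a standard deviation, unlike the
            # precision parameterization rnormal(mu, se**-2.) above
            se = se_large if rng.random() < p_large else se_small
            row['se'] = se
            row['y'] = rng.gauss(logit(row['gold_45q15']), se)
        out.append(row)
    return out

gold = [{'iso3': 'USA', 'year': y, 'gold_45q15': 0.2}
        for y in range(1970, 2011)]
noisy = simulate_noisy_rows(gold)
```

The other missingness patterns (by country, forecasting/backcasting, compositional bias) would slot in as alternative tests in place of the `rng.random() < p_missing` check.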
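For compare_estimate_to_gold, a first cut could simply report error summaries on the logit scale, since that is the scale the noise is added on. The function name and the choice of RMSE and MAE as metrics are assumptions for illustration; coverage of uncertainty intervals would be a natural addition.

```python
# Hypothetical core of compare_estimate_to_gold.py: summarize the
# discrepancy between estimates and gold-standard 45q15 values.
import math

def logit(p):
    return math.log(p / (1 - p))

def compare(estimates, gold):
    """estimates, gold: equal-length sequences of 45q15 values in (0, 1).

    Errors are computed on the logit scale, where the noise was added.
    """
    errs = [logit(e) - logit(g) for e, g in zip(estimates, gold)]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return {'rmse': rmse, 'mae': mae}

scores = compare([0.21, 0.19], [0.20, 0.20])
```

The master script would then call this once per simulation replicate and aggregate the scores across missingness patterns.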
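One hypothetical way to produce the subregion distance csv is to derive distances from the regional hierarchy itself: 0 to self, 1 within a region, 2 within a super-region, 3 otherwise. The hierarchy entries and the integer encoding below are made up for illustration; any metric that respects the groupings would fit the same file format.

```python
# Sketch: write the pairwise subregion distance csv from a
# subregion -> (region, super_region) mapping (illustrative values).
import csv
import io

hierarchy = {
    'KEN': ('eastern_africa', 'ssa'),
    'TZA': ('eastern_africa', 'ssa'),
    'NGA': ('western_africa', 'ssa'),
    'DEU': ('western_europe', 'high_income'),
}

def distance(a, b):
    """Hierarchy-based distance: 0 self, 1 region, 2 super-region, 3 else."""
    if a == b:
        return 0
    region_a, super_a = hierarchy[a]
    region_b, super_b = hierarchy[b]
    if region_a == region_b:
        return 1
    if super_a == super_b:
        return 2
    return 3

buf = io.StringIO()  # stands in for the distance csv file
writer = csv.writer(buf)
writer.writerow(['subregion_i', 'subregion_j', 'distance'])
for a in hierarchy:
    for b in hierarchy:
        writer.writerow([a, b, distance(a, b)])
```

This keeps the distance file decoupled from the smoothers, so a model can consume any metric without knowing how it was built.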
About: A framework for evaluating spatial-temporal imputation methods