Describe the bug
I'm running the code in the "SCAG-DT" folder, using CA census-based statistical areas to define "city", so the only cities in my dataset that are comparable to the UDP outputs are Ventura and Imperial. After running through 4_typology.py, my Ventura tracts matched the UDP Ventura typology file well (off by only 1 census tract), but for Imperial 11 of 27 tracts did not match, mostly in the displacement and gentrification categories. My dataset has 31 tracts in Imperial County, but there are only 27 in the UDP imperial_typology_output.csv and scag.csv. I suspect the medians differ because they are computed over a different number of tracts. That would shift the categorical variables derived from them, and those differences accumulate into different typology categories.
To Reproduce
I think the discrepancy starts around line 384 of 2_data_curation.py, in the median calculations, for example rm_hinc_18 = np.nanmedian(census['hinc_18'])
At this point there should still be 31 tracts in Imperial County (based on the code from the beginning of 2_data_curation.py up to line 384). If I filter out the tracts that are not in Imperial_database_2018.csv, the medians match.
For example, median(mydataset_31tracts["hinc_18"])=41767, median(Imperial_database_2018.csv["hinc_18"])=43651, and median(mydataset_27tracts["hinc_18"])=43651.
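To illustrate the effect I suspect, here is a minimal sketch with made-up numbers (not the real Imperial data). The FIPS column name, the kept set, and the low_inc_* flags are assumptions for illustration; only hinc_18 and the np.nanmedian call come from 2_data_curation.py.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the tract table; all values are made up.
census = pd.DataFrame({
    "FIPS": [1, 2, 3, 4, 5, 6, 7],  # hypothetical tract IDs
    "hinc_18": [30000.0, 35000.0, 40000.0, 45000.0, 50000.0, 55000.0, np.nan],
})

# Hypothetical subset of tracts retained in the published output.
kept = {3, 4, 5, 6, 7}

# Regional median over all tracts vs. over the retained subset only.
rm_all = np.nanmedian(census["hinc_18"])
rm_kept = np.nanmedian(census.loc[census["FIPS"].isin(kept), "hinc_18"])

# A categorical flag derived from each median (illustrative threshold rule).
census["low_inc_all"] = census["hinc_18"] < rm_all
census["low_inc_kept"] = census["hinc_18"] < rm_kept

# Tracts whose category depends on which median was used.
flipped = census.loc[census["low_inc_all"] != census["low_inc_kept"], "FIPS"].tolist()
print(rm_all, rm_kept, flipped)
```

In this toy case the tract whose income falls between the two medians changes category, which is the kind of shift that could propagate into the typology.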
I think the same thing happens when the PUMS and Zillow data are used to create categorical variables.
The four missing tracts are: 6025010102, 6025010900, 6025012302, and 6025940000. These tracts are present in Imperialcensus_summ_2018.csv, the input at the beginning of 2_data_curation.py. Do you know why they are not included in the outputs? Where in the code should I be excluding them, and based on what criteria? Thanks!
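For reference, this is roughly how the missing tracts can be identified, sketched with small in-memory tables rather than the real CSV files. The FIPS column name is an assumption, and the two IDs ending in 000001 and 000002 are made-up placeholders; the other four IDs are the ones listed above.

```python
import pandas as pd

# Stand-in for Imperialcensus_summ_2018.csv (the input); "FIPS" is an assumed column name.
census_summ = pd.DataFrame({
    "FIPS": [
        6025000001,  # placeholder ID, not a real tract
        6025010102,
        6025010900,
        6025000002,  # placeholder ID, not a real tract
        6025012302,
        6025940000,
    ]
})

# Stand-in for Imperial_database_2018.csv (the output), holding only the placeholders.
database = pd.DataFrame({"FIPS": [6025000001, 6025000002]})

# Left merge with indicator=True marks rows that exist only in the input.
merged = census_summ.merge(database, on="FIPS", how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "FIPS"].tolist()
print(missing)
```

Running the same comparison on the intermediate tables after each step of 2_data_curation.py should show where the tracts drop out.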