Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix acs 2010 2014 #611

Merged
merged 4 commits into from Jan 16, 2019
Merged

Fix acs 2010 2014 #611

merged 4 commits into from Jan 16, 2019

Conversation

javitonino
Copy link

Point to 2015 geographies to keep retrocompatibility and save disk space, even if it is wrong to do so.

('geoidsl', input_['tiger'][self.geography + '_{}'.format(self.year) + GEOID_SUMLEVEL_COLUMN]),
('geoidsc', input_['tiger'][self.geography + '_{}'.format(self.year) + GEOID_SHORELINECLIPPED_COLUMN]),
('geoidsl', input_['tiger'][self.geography + '_{}'.format(self.tiger_year()) + GEOID_SUMLEVEL_COLUMN]),
('geoidsc', input_['tiger'][self.geography + '_{}'.format(self.tiger_year()) + GEOID_SHORELINECLIPPED_COLUMN]),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (123 > 120 characters)

@juanignaciosl juanignaciosl self-assigned this Jan 15, 2019
Copy link
Contributor

@juanignaciosl juanignaciosl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Point to 2015 geographies to keep retrocompatibility and save disk space, even if it is wrong to do so.

Ok, so this change always bound ACS data to the last Tiger geometries, right? I guess that you considered bounding them to the "right" ones. What made you dismiss that option?

@@ -35,6 +35,7 @@
CBSA, PLACE]
YEARS = ['2010', '2014', '2015', '2016']
SAMPLES = [SAMPLE_5YR, SAMPLE_1YR]
MINIMUM_TIGER_YEAR = 2015
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really minor: maybe there might be a YEARS array at tiger.py and this could get the first one. that way it'd would also document the available years.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a nice idea. The years belong to tiger indeed.

@javitonino
Copy link
Author

What made you dismiss that option?

In the current dump (April 2018), we have the 2010 and 2014 data bounded to 2015 geometries. This patch keeps this behaviour even if its incorrect. Main reason is that the 2014 and 2010 tiger is not at the source we use currently for tiger 2015/2016 (some S3 bucket), so including the geometries means developer effort as well as loading time and bigger dumps. Since it's something that has been broken forever, I don't think it's worth it to fix it now.

@juanignaciosl juanignaciosl removed their assignment Jan 16, 2019
@juanignaciosl
Copy link
Contributor

juanignaciosl commented Jan 16, 2019

In the current dump (April 2018), we have the 2010 and 2014 data bounded to 2015 geometries. This patch keeps this behaviour even if its incorrect

Fair enough 👍

🚀

@juanignaciosl
Copy link
Contributor

Tests are failing, and it seems to be related:

tasks.meta: INFO: rollback tasks.us.zillow.Zillow_Zip_1_2011_7d584ba935: (raised as a result of Query-invoked autoflush; consider using a session.no_autoflush block if this flush is occurring prematurely) (psycopg2.IntegrityError) duplicate key value violates unique constraint "obs_column_table_pkey"
DETAIL:  Key (column_id, table_id)=(us.census.tiger.zcta5_2016_geoidsl, us.zillow.zillow_zip_1_2011_7d584ba935) already exists.
 [SQL: 'INSERT INTO observatory.obs_column_table (column_id, table_id, colname) VALUES (%(column_id)s, %(table_id)s, %(colname)s)'] [parameters: ({'colname': 'region_name_sl', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.census.tiger.zcta5_2016_geoidsl'}, {'colname': 'region_name_sc', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.census.tiger.zcta5_2016_geoidsl'}, {'colname': 'allhomesplusmultifamily_zri', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.zillow.AllHomesPlusMultifamily_Zri'}, {'colname': 'singlefamilyresidencerental_zri', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.zillow.SingleFamilyResidenceRental_Zri'}, {'colname': 'singlefamilyresidence_zhvi', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.zillow.SingleFamilyResidence_Zhvi'}, {'colname': 'allhomes_zhvi', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.zillow.AllHomes_Zhvi'}, {'colname': 'allhomes_medianvaluepersqft', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.zillow.AllHomes_MedianValuePerSqft'}, {'colname': 'allhomes_medianrentalpricepersqft', 'table_id': 'us.zillow.zillow_zip_1_2011_7d584ba935', 'column_id': 'us.zillow.AllHomes_MedianRentalPricePerSqft'})]

@javitonino
Copy link
Author

Those tests are also broken in master and I've got no idea why. I'll try to see what's happening before merging this.

('region_name_sl', input_['geoids']['{}_{}{}'.format(tiger_geo,TIGER_YEAR,GEOID_SUMLEVEL_COLUMN)]),
('region_name_sc', input_['geoids']['{}_{}{}'.format(tiger_geo,TIGER_YEAR,GEOID_SUMLEVEL_COLUMN)]),
('region_name_sl', input_['geoids']['{}_{}{}'.format(tiger_geo, TIGER_YEAR, GEOID_SUMLEVEL_COLUMN)]),
('region_name_sc', input_['geoids']['{}_{}{}'.format(tiger_geo, TIGER_YEAR, GEOID_SHORELINECLIPPED_COLUMN)]),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line too long (121 > 120 characters)

@javitonino javitonino merged commit fa116a0 into master Jan 16, 2019
@javitonino javitonino deleted the fix-acs-2010-2014 branch January 16, 2019 18:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants