Pluto 24v3 #976
Comments
waiting for PR #977 for a PLUTO zoning fix and for COLP to pass QA |
Zoning fix has been merged. COLP is still in QA. We decided to proceed with PLUTO build based on the previous COLP version (Dec '23) to avoid release delays |
Preliminary DE QA review based on the QAQC page ("last of same version type" comparison type):
Build:
Aggregate Changes
Expected Value Comparison
Outlier Analysis
New and Vanished BBLs
|
Couple random comments
So seems like two big things to flag for gis
|
The sanitation districts are odd - looks like this only comes from |
With GIS for QA review. |
QA'ed PLUTO today. Everything seems to be in order, just going to have Jack take a look at it tomorrow for final confirmation. Two quick flags:
FYI @NYCPlanning/data-engineering @jackrosacker @croswell81 |
Copying my note on zd3/4 from last time - only 215 records have zd3, only 13 have zd4. And we actually have a single zd3 change this time, which works out to a higher percentage of zd3 lots that had zd3 change than normal lots had zd1 change. So I don't think that's something we need to worry about. |
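For reference, the arithmetic behind the zd3 note can be sketched as below. This is a minimal illustration, not part of the QAQC app: the helper name `pct_changed` is hypothetical, and only the counts quoted above (215 lots with zd3, 1 change) come from this thread.

```python
def pct_changed(changed: int, populated: int) -> float:
    """Percent of populated records whose value changed between versions."""
    if populated == 0:
        return 0.0  # avoid dividing by zero for fields with no populated records
    return 100.0 * changed / populated


# Counts taken from the comment above: 215 lots have zd3, 1 of them changed.
zd3_rate = pct_changed(changed=1, populated=215)
print(f"zd3 change rate: {zd3_rate:.2f}%")  # zd3 change rate: 0.47%
```

Because so few lots populate zd3, a single change already produces a rate that can look high next to a densely populated field like zd1.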
Sweet thanks. I'll note this in our QA doc so it doesn't come up again next time. |
FYI @sf-dcp GIS is signing off--with the caveat that DE should verify the lot area (mistakenly labeled floor area, according to finn) increase of BBL 3000160017. Barring any concern on DE's end, ready for promotion and subsequent publication. |
Hi @caseysmithpgh , @fvankrieken and I checked the lot and the result is... interesting.
Is the lot area in PLUTO supposed to include land area only or its total area? |
Hmm--interesting case. I don't know the answer, happy to take a look at the data dictionary to see if there is something in there about it. |
The Data dictionary says the following:
Nothing is explicitly stated about land vs non-land area, but since that's the case, I imagine the entirety of the lot is included in the LotArea |
Hi @caseysmithpgh, thank you for checking the data dictionary. Since the area aligns with the unclipped lot size and there is nothing in data dictionary indicating land-only area, we are moving forward with publishing PLUTO. |
@sf-dcp Great, thanks! Just let me know once it's been promoted and I will get started with our distribution process. |
Done! |
Hello @NYCPlanning/data-engineering! Found some weirdness while prepping QA data for this version of PLUTO today. 9 tax lots are showing a change in the BCT2020 field (census tract), when comparing 24v2 and 24v3. We're going to hold off on publishing until we get a chance to chat with Matt on Monday, and wanted to note the lots to you all as well.
These lots fall into three categories: (1) long linear lots that span multiple census tracts, (2) lots that just brush the edge of a census tract, and (3) small, regular lots that are fully contained within a census tract but have still changed in value. My query that found these variable values selected BCT2020 arbitrarily, and it's possible that similar changes exist for other fields as well. Also note that there are some odd lot boundary changes in the vicinity of |
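The kind of version-to-version comparison described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the query that was actually run: the function name `changed_values` is hypothetical, and the BBLs and tract values are made-up placeholders, not real PLUTO records.

```python
def changed_values(old: dict, new: dict) -> dict:
    """Return {bbl: (old_value, new_value)} for BBLs in both versions whose value differs."""
    return {
        bbl: (old[bbl], new[bbl])
        for bbl in sorted(old.keys() & new.keys())  # new and vanished BBLs are skipped
        if old[bbl] != new[bbl]
    }


# Placeholder data only: BBL -> BCT2020 for two versions.
bct2020_24v2 = {"0000000001": "1000100", "0000000002": "1000200", "0000000003": "1000300"}
bct2020_24v3 = {"0000000001": "1000100", "0000000002": "1000201", "0000000004": "1000400"}

changes = changed_values(bct2020_24v2, bct2020_24v3)
for bbl, (before, after) in changes.items():
    print(bbl, before, "->", after)
```

Running the same function per field, instead of on one arbitrarily chosen field, would show whether other fields have similar changes.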
Hmm. These tracts come from geocoding PTS, not via spatial joins, so it's odd that we'd see inconsistent behavior. So it would seem that this is due to either
|
Looking into the issue more, we get census tract field either through geocoding BBLs with Geosupport or via spatial join with census tract dataset. And these BBLs identified by Jack have one of the two situations:
We will be meeting with Amanda to research the issue and see if/how we can solve the changing census tract values long-term. We can move forward with publishing the current PLUTO version. |
@sf-dcp thanks for the summary and sounds good. We also had a chance to check in with @croswell81 re the tax lot errors that we found in the Battery Park City area, and it looks like we will need to re-build PLUTO 24v3 to incorporate the DTM that DOF fixed at end of day last Friday. We're in the process of republishing today's DTM to Digital Ocean and will update when the build is ready to go. Worth discussing at some point how to add some topology rules to our QA process to catch things like significant tax lot overlaps - might be a good topic for when we sit down for the next PLUTO QA together. |
Sounds good! Will let you know when it's ready for the 2nd round of QA |
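@caseysmithpgh @jackrosacker @croswell81 Not creating a GH issue for QA since we started this data update before the new process. PLUTO 24v3 has been re-built and promoted to the draft folder under draft/24v3/2-update-dtm-and-correct-units/. Notes:
* New DOF DTM data was used to fix the previous tax lot errors
* Other input datasets, including COLP, were also updated. You will notice a slightly bigger change to the ownername field compared to the previous draft (result of newer COLP data).
* We edited unitsres and unitstotal values for 13 condo lots as a result of manual research (https://github.com/NYCPlanning/data-engineering/pull/1089/files).
|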
[like] Matthew Croswell (DCP) reacted to your message:
@caseysmithpgh @jackrosacker @croswell81
Not creating a GH issue for QA since we've started this data update before the new process.
PLUTO 24v3 has been re-built and promoted to draft folder under draft/24v3/2-update-dtm-and-correct-units/.
Notes:
* New DOF DTM data was used to fix the previous tax lot errors
* Other input datasets, including COLP, were also updated. You will notice a slightly bigger change to ownername field compared to previous draft (result of newer COLP data).
* We edited unitsres and unitstotal values for 13 condo lots as a result of manual research (https://github.com/NYCPlanning/data-engineering/pull/1089/files).
|
Hey DE, noting that we're still seeing issues with the DTM that was used for this latest draft build of PLUTO. We've flagged this to DOF for troubleshooting, and to Amanda to help determine best next steps for publication, so just keeping you all up to date. The issue is either on the DOF or GIS Team side, so not DE build-related. |
Hey DE, Matt was able to get a clean copy of the DTM from DOF and/or their consultant. I'll plan to process it first thing tomorrow so y'all can get started with a re-build. Can you confirm what zoning data was used to build 24v3 draft 2? If it was June, would it be reasonable for draft 3 to be built with July zoning (latest)? |
@caseysmithpgh the version I'm seeing for zoning data used in 24v3 draft 2 is so draft 3 will also have the latest |
@damonmcc sounds good. I can confirm |
Per discussion with @AmandaDoyle, we will manually correct the 2020 census tract value for the JFK airport BBL (4142600001) as part of this PLUTO version (@jackrosacker, this BBL is from your list above). We will continue researching the other BBLs with GRU. We will let you know when the new draft is ready for your review. |
Hi GIS team, new draft In this draft, we used updated DTM data. We also corrected 2020 census tract value for the JFK lot. |
Draft 3 QA Failed
Hard Fail:
Flags:
@sf-dcp |
Hi @caseysmithpgh, q about edes: should we use the version 20240712 instead of the one we used in the latest draft? |
Morning @sf-dcp, sorry, I could have been more clear. According to the QA page, the 20240712 version was used in the latest build. The version that should be used is GIS v 20240806, which was uploaded to Digital Ocean on 8/16/24 and can be found in the usual staging folder. Jack and I confirmed that the same issue does not persist in the latest version. |
All data relevant to the new pluto build has been staged in DO. Please ensure you've pulled the latest versions of the following into recipes prior to building.
fyi @croswell81 @damonmcc |
Hi @caseysmithpgh, PLUTO has been rebuilt and can be found under
|
Thanks for the rundown @sf-dcp, jack and I will schedule some time to QA the latest draft on Monday or Tuesday. |
@fvankrieken I can confirm the sanitsub & sanitdistrict spikes are expected. The mid-cycle geosupport release is specifically for the new organics changes to those two fields. Reminder to all - we need to update pluto data dictionary, readme, and maybe change file with the zoning and special district values before sending to webteam. |
Draft 4 QA Failed
Hard Fail:
Flags:
@sf-dcp |
FYI: Re: Notes field Field Name: NOTES (Notes)
|
Re: EDesignation changes, edited to correct record counts and explanation at the end
TLDR
For lots with multiple records in So if a lot has a new record which ends up "first" in that sorted list and has a different EDes number than the previous "first", it'll change for that lot in PLUTO.
details
That count of ~300 values changing seems to be a combination of "lots which gained or lost an EDes number" and "lots where the EDes number changed" (relevant sql here). It seems like 21 lots have had their EDes number changed. This is how we choose the EDes number (
Below is an example of a lot (
Without having seen the previous version of |
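To make the "first in the sorted list" behavior concrete, here is a minimal sketch. It is not the PLUTO build logic: the function name `pick_edes`, the record layout, and the sort key (ordering by the E-number string) are all assumptions for illustration, and the BBL and E-numbers are placeholders.

```python
from collections import defaultdict


def pick_edes(records, sort_key):
    """For each BBL, sort its records and keep the E-number of the first one."""
    by_bbl = defaultdict(list)
    for rec in records:
        by_bbl[rec["bbl"]].append(rec)
    return {bbl: sorted(recs, key=sort_key)[0]["enumber"] for bbl, recs in by_bbl.items()}


# Assumed ordering for this sketch only; the real sort columns are not shown here.
by_enumber = lambda rec: rec["enumber"]

# Placeholder records: one lot with two E-designations in the previous source version.
previous = [
    {"bbl": "0000000001", "enumber": "E-200"},
    {"bbl": "0000000001", "enumber": "E-300"},
]
# The latest source version adds a record that sorts ahead of the old "first".
latest = previous + [{"bbl": "0000000001", "enumber": "E-100"}]

before = pick_edes(previous, by_enumber)
after = pick_edes(latest, by_enumber)
print(before["0000000001"], "->", after["0000000001"])  # E-200 -> E-100
```

The lot's existing records did not change, yet its selected EDes number did, which is the effect described in the TLDR.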
Hi GIS, I re-built PLUTO with the corrected zoning column |
Hey @sf-dcp, it looks like, yes, we should wait to sign off on this for right this moment. We'll stop by and check with GR today so that we have a prompt turnaround on this. |
@jackrosacker & @caseysmithpgh I saw Casey's email and I promoted new PLUTO build to draft here. |
Draft 5 QA - Passed
Flags:
Sanitation districts and sub-districts: a non-trivial number of values changed. Confirmed with Matt that this is not a result of the organics changes, as those are not yet in Geosupport. Not a major blocker, but we would like to understand why the values changed.
Community Gardens: the record count decreased by 5. This is valid, since a housing development merged 5 former CG lots into one development lot.
QA report notes:
Full report available on SharePoint here @sf-dcp |
Nice! We will modify the qaqc page. Here are the requested source versions in the meantime:
PLUTO Readme: the only logic change in PLUTO is increasing the field size from 9 to 12 characters for
PS: PLUTO has been promoted to the |
24v3 seems to be on Bytes, so we can distribute to Open Data now |
@damonmcc Yes, I tagged @alexrichey in our Monthly open data issue 1074. We can tag all of Data Engineering in the future if that is helpful. |
@croswell81 no worries on who to tag. the need to wait for Bytes is temporary, so improving any "it's on Bytes!" notifications probably isn't worth it |
PLUTO has been distributed to Bytes & Open Data. |
Main tasks
Data loading
Manual Updates
Updated 2x a year typically in June and December
Automated Updates
Open data automated pull
DOF Automated Pull and Number of Buildings
Updated with Quarterly updates (check here)
Updated with Zoning Taxlots
(check here for latest run).
These are all produced by GIS, who typically update them sometime in the first week of each month.
Check in with them before archiving with data library
Never Updated (Safe to ignore)