Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

image_width and image_height incorrect in database #2485

Closed
misaugstad opened this issue Mar 10, 2021 · 16 comments · Fixed by #2814, #3173 or #3199
Closed

image_width and image_height incorrect in database #2485

misaugstad opened this issue Mar 10, 2021 · 16 comments · Fixed by #2814, #3173 or #3199

Comments

@misaugstad
Copy link
Member

As pointed out in this comment on the sidewalk-panorama-tools repo, the sv_image_width and sv_image_height values are not always correct in our database. I investigated a bit further. It turns out that a fixed width and height were coded in from the beginning:

GSVData(panorama.gsvPanoramaId, 13312, 6656, 512, 512, panorama.imageDate, 1, "", false, Some(timestamp))

And that is the resolution of (at least most of) the imagery in DC, but that is not always the resolution of GSV imagery. In Seattle, there are often higher resolution images (16384x8192).

I did some digging, confirmed that we can get the pano's full width and height from the JavaScript API when we're looking at it on the audit page. And the tile size should actually remain the same (the 512's above), but we should be filling that programmatically as well! We can access all of that in the callback to the getPanorama function on the front-end using:

svl.streetViewService.getPanorama({ pano: panoId },
    function (data, panoStatus) {
        console.log(data.tiles.worldSize.width);
        console.log(data.tiles.worldSize.height);
        console.log(data.tiles.tileSize.width);
        console.log(data.tiles.tileSize.height);
    }
});

This is all great moving forward! Unfortunately, I don't think that there is a way to figure this out programmatically for images that are no longer being hosted by Google. For any where we still have access to the imagery, we should try to backfill the tables to have the correct values though.

@misaugstad
Copy link
Member Author

We also have these lines in public/javascripts/SVLabel/src/SVLabel/Main.js:

svl.canvasWidth = 720;
svl.canvasHeight = 480;
svl.svImageHeight = 6656;
svl.svImageWidth = 13312;
svl.alpha_x = 4.6;
svl.alpha_y = -4.65;

Not sure if any other values need to change based on changing the image width. Honestly don't know what the alpha values are used for.

@misaugstad
Copy link
Member Author

Well, I may have found a problem with our sv_image_x coordinates. Looking at these lines in Point.js:

if (svImageCoord.x < 0) {
    svImageCoord.x = svImageCoord.x + svImageWidth;
}

So if the coordinate is negative, we've been adding the image width to it. And that image width was not always correct :)

Just did a test on a label and it had an x value of -757. Then 13312 was added to it, so it's set to 12555. But I think we should have added 16384 to it, bringing it to 15627.

And even calculateImageCoordinateFromPointPov(), which calculates sv_image_x, uses the image width:

imageX = (svImageWidth * (heading / 360) + ((svImageWidth / 360) / 2)) / zoomFactor;
imageY = ((svImageHeight / 2) * (pitch / 90)) / zoomFactor;

It might take some work to figure out how incorrect our data are and how much we will be able to fix.

@jonfroehlich
Copy link
Member

Ugh, that's no good. So two things here:

  • We have to fix the bug
  • We have to figure out how many previous data points were affected and whether we can address that via a post hoc "database update" script or something

@misaugstad misaugstad changed the title image_width and image_height incorrect in database image_width and image_height incorrect in database Mar 10, 2021
@robertyoung2
Copy link

Hi @misaugstad,

I've put together a sample list of label_ids for dropped curbs where I felt the crops were not making much sense (no dropped curb in the vicinity of the crop). As you mentioned in Depth data endpoint removed, need to get metadata elsewhere #8 it looks like all these odd label positions only seem to occur in the larger gsv image size of (16384, 8192). I tested and observed all these labels with and without a scaling factor for sv_x_image and sv_y_image (multiplied by 1.23). This did not improve the position for any crops bar label_id: 116193, which could just be luck.

I selected 25 labels as a sample from the roughly ~2,500 crops I processed using the metadata csv you provided me with. There may be human error involved as the label_ids I selected were the ones I deemed as 'incorrect'. This issue seems to occur less often than Labels not placed directly on obstacles #2478 based on the sample crops.

incorrect_label_scaling.csv

Example:

41822
Scaled Label ID: 41822

41822
Non-Scaled Label ID: 41822

As always, let me know if this is useful, or if you require further information/samples!

@misaugstad
Copy link
Member Author

This is very helpful, thank you! And these are some nice examples we can use to see if we can fix this issue 😉

@misaugstad
Copy link
Member Author

Another thing we are seeing is that the photographer_heading and photographer_pitch will sometimes change by a small margin, typically around 1 degree, rarely more than 3 or 4 degrees (not really sure why). I did not think that this was a value that could change, but it looks like we did actually understand this, judging by the comment and following code here.

As of right now, we aren't re-rendering labels that were placed in the past on a pano, but that code would be useful once we merge our open PR that does that ;) And we should try adding that kind of code to the label viewers in LabelMap, Gallery, and the Admin page to see if that improves label placement.

Given that the photographer_heading and photographer_pitch values are used to compute sv_image_x and sv_image_y, it would make sense that our cropper would perform better when using the heading/pitch recorded at the time that the label was placed. And after @ThatOneGoat and I did some preliminary testing, that seems to be the case.

Cross-posting with #2741

@misaugstad
Copy link
Member Author

Oops, there's a little more to do here! Will update soon.

@misaugstad misaugstad reopened this Mar 17, 2023
@jonfroehlich
Copy link
Member

@shokiami just sent us a CSV with a list of pano ids along with the error in pixels of the GSV to 2D translation. I wasn't sure where to add this? Maybe in sidewalk-panorama-tools? I know @crescendochu is interested in this as well.

errors.csv

@misaugstad
Copy link
Member Author

Below are some updates on things that were talked about in this thread. These are the less important updates, so I kind of want to just separate them out into their own comment. More coming soon on errors in the actual sv_image_x values :)

  1. Not sure if any other values need to change based on changing the image width. Honestly don't know what the alpha values are used for.

    When it comes to alpha_x and alpha_y, we just don't use those. I think that they might have been used for an old way that we computed label location, but it hasn't been used since I started working on the project in 2017. In PR Remove old Tohme code, cleaning Explore page code, and more! #3173 I removed them from the database. I just have them kept in constants called SVL.ALPHA_X and svl.ALPHA_Y in case they are ever relevant again.

  2. It turns out that a fixed width and height were coded in from the beginning

    We had previously not been recording the correct image_width and image_height in the database, and images were therefore being downloaded incorrectly by our scraper. TLDR; We are recording the correct data and downloading the correct image sizes going forward, but a tiny fraction could still potentially be fixed (I'm not sure it's worth fixing).

    • In PR Record more panorama metadata #2814 we marked image_width/height entries in our databases as NULL, since we didn't know if old values were accurate. All data since then (March 2022) is correct.
    • We then re-downloaded any images with the pano scraper so that they had the correct width/height. If the panos were old enough to have XML data (which was only available with the depth data), then we used the width/height in that XML as a backup.
    • I don't fully remember if we backfilled the image_width/height columns in the database based on the XML, though circumstantial evidence points to us having done that :)
    • The image_width/height columns are NULL in our database for only 2k panos (1% of the ~200k that we have). I've verified that at least some of these panos are still available from Google, so we could certainly write a script that checks for the data from Google and updates the database. It would have to be in JavaScript, since we can only get this data from the StreetViewService JS object. I also doubt that we'd be able to get more than half of the data that we're missing due to missing imagery and user-contributed panos. We should discuss if this is worth fixing.

@misaugstad
Copy link
Member Author

Now the juicy stuff: The sv_image_x values are way off in some cases, which has HUGE implications for CV.

As noted above, one big reason for the incorrect sv_image_x values was that the image_width was hardcoded, even though there are panos of different sizes! As of #3173, this has been fixed, though it is not yet on production. This was causing different problems in different situations:

  1. In most cases, we were post-hoc scaling the sv_image_x/y values based on image size, as mentioned in this comment. So it was at least fixable in this case!
  2. There were cases when sv_image_x was negative mid-computation (just based on the heading angles we were using), we were adding the image_width to it to make it positive... But with an incorrect hardcoded width, scaling no longer based on image size no longer worked!

I have also just found cases where the way we calculate the angle is just straight up wrong. I'm not totally sure what the old math is trying to do, but it seems to only be accurate when the photographerHeading is near 180 degrees! I'll start with how I'm changing it, then show an example, and finally talk about how we can fix it (teaser: we have the data to fix it for old labels!).

Our old way of calculating the sv_image_x was the following:

let oldImageX = function(svImageWidth, heading) {
    var imageX = svImageWidth * (heading / 360) + ((svImageWidth / 360) / 2);
    if (imageX < 0) imageX += svImageWidth;
    return imageX;
}

Here's the new way that I'm doing it (note that the photographerHeading matters):

let newImageX = function(svImageWidth, heading, photographerHeading) {
    // The photographerHeading represents the center of the image.
    // Subtract 180 to find the heading where sv_image_x = 0.
    let headingPixelZero = photographerHeading - 180;

    // Both headings are between -180 and 180, convert to 0 to 360 to make math easier.
    heading = (heading + 360) % 360;
    headingPixelZero = (headingPixelZero + 360) % 360;

    // We then find the difference between label's heading and the heading where sv_image_x = 0.
    // Divide by 360, multiply by the image width, and that's your sv_image_x!
    let imageX = svImageWidth * (heading - headingPixelZero) / 360;
    if (imageX < 0) imageX += svImageWidth;
    return imageX;
}

An example! Let's look at label 19237 in Seattle on pano 66a-Se8TKoUjN_MEFtWeqg. Here are some variables that we know:

let svImageWidth = 16384;
let photographerHeading = 153;
let heading = 91;

Here's what we get with our two different methods:

oldImageX(svImageWidth, heading);
// result: 4164
newImageX(svImageWidth, heading, photographerHeading);
// result: 5370

Here is what the label looks like:
Screenshot from 2023-03-23 15-25-33

And here are the two locations that we get:
Screenshot from 2023-03-23 15-47-25

Can we fix the old data? Yes! Is it just a simple SQL query? I don't think so! Thankfully, we have all the data that we need. But there is a lot of math that we have to do to get from what we have to the new sv_image_x/y values. In our JavaScript, we would need to use the svl.util.panomarker.calculatePointPov() function (a 48 line func) to get the heading value that we need to input to our newImageX() function above. I'm honestly not sure whether it's easier to do that in a messy SQL query (or series of queries where I save intermediate computations in temporary tables) or in a script in JS/Python.

Either way, I think that the next step should be to write that script/query, and then compare that to the data that we have now. I want to see how much of our data would change by a significant number of pixels. How drastic of a change would it be? I'd love to see how that directly impacts CV performance, but I'm not sure that that's worth my time unless someone else already has something set up that would make it easy to test this.

Then we should make the update assuming that all looks good! I believe that we have multiple CV projects with partners ongoing rn, so fixing this is probably a pretty high priority...

Always interested to hear you thoughts @jonfroehlich!

Some other notes as I wrap up for the day:

  • I haven't looked into the sv_image_y calculation much, but it looks right if we indeed get a full 180 degree field of view in the Y-direction. Will look into it soon.
  • I don't know how this hasn't been a glaring issue during CV work. I just checked 10 or so labels, some new and some old, where the photographerHeading is far from 180, and I'm seeing this issue on all of them. Of course this is just anectodal so far, and we'll want to wait on the analysis of how much data would change.
  • The new way for calculating sv_image_x isn't even on the test servers yet. I just figured out and tested this math today!

@jonfroehlich
Copy link
Member

Wow, I could hug you @misaugstad. Such great work. Woohoo!

I agree that this is high priority. Are the test cases that @shokiami sent useful? @crescendochu said she also might have some good test cases (where labels seemed off after 2D translation).

Also, would love to fix this retrospectively for all data if possible... given again that most projects will use our old data too.

@misaugstad
Copy link
Member Author

Are the test cases that @shokiami sent useful?

I don't think that they've been helpful thus far. I think that the values in there include the adjustments that they had made to the location of label through the cropper. So the x/y values in there aren't the same ones that we have in the database. And all of the errors listed in that CSV are on the order of like 10-200 pixels, whereas I'm dealing above with errors that are more in the realm of many 100s of pixels.

That being said, this may be more helpful when I look at sv_image_y, since those errors are of a similar magnitude. If I make adjustments to how we calculate it, it will be useful to compare the new calculation to what they got after their adjustments to see which works better!

Also, would love to fix this retrospectively for all data if possible... given again that most projects will use our old data too.

Totally agree. I'm totally fine with leaving DC as our pilot, and I consider the data after that to be important, and we have a lot of it!

@jonfroehlich
Copy link
Member

Gotcha. Thanks for the update.

Now that your brain is "dug in" on this, I think it would be most efficient to continue the deep dive and continue the explorations of sv_image_y, the relevant problems + fixes you've identified, and then how to fix this retrospectively in the database.

Exciting to have this update!

@misaugstad
Copy link
Member Author

In #3192 I just fixed the sv_image_x calculation going forward. We still need to update it retroactively.

I just took a look at sv_image_y, and the calculation looks to be correct.

The issue with sv_image_y is that we aren't actually giving them the y pixel... We basically give them the y-pixel offset from the center of the image for some reason. So when someone uses our API (including us for CV projects), we need to take the height of the image and subtract the sv_image_y value from Project Sidewalk to get the actual sv_image_y.

Super easy to fix both retroactively and for the future (we just need to update it downstream in CV pipelines). For the future, we need to change this line

var imageY = (svImageHeight / 2) * (pov.pitch / 90);

to this

var imageY = svImageHeight / 2 - (svImageHeight / 2) * (pov.pitch / 90);

Retroactively would look something like

UPDATE label_point
SET sv_image_y = gsv_data.image_height / 2 - sv_image_y
FROM label
INNER JOIN gsv_data ON label.gsv_panorama_id = gsv_data.gsv_panorama_id
WHERE label_point.label_id = label.label_id;

Then there is still the issue where we know that the sv_image_y values can be off somewhat. This probably has to do with the math in povToPixel3DOffset() instead of calculateImageCoordinateFromPov(). I've been editing the latter, but the former should be investigated as well.

@misaugstad misaugstad mentioned this issue Mar 29, 2023
@jonfroehlich
Copy link
Member

Before we do the retrospective fixes, let's make sure we create backups of all city databases. I think you'd do this anyway but just emphasizing that :)

@misaugstad
Copy link
Member Author

Great point. Now seems like a decent time to take a snapshot of each database, so I'll get started on that today!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment