Want to ignore data larger than a label but smaller than a user #2508
One idea that has been brought up is to start users out auditing with a smaller goal, like only looking for curb ramps. One issue with that is that we don't have anything set up in our database to incorporate those partial audits without just marking a user as low quality. Something like what I proposed above would help with that. We could ignore their early audits, and once they "graduate" to doing full auditing, we can start actually incorporating their work. Ideally we'd include their partial work as well! We have part of that set up now, where we include labels that have been validated as correct, even if they are from low quality users.
Some extra ideas:
I think that using label-specific ratings would have to be an augmentation to the binary rating, or maybe something we switch to after there is enough data from the user. We wouldn't have an accurate sense of a user's quality for specific label types until they've collected quite a bit of data. The benefit of our current method is that, in addition to validations (which look at the false positive rate), we also check how many labels they've placed in total; if the number of labels is too low, we expect them to be contributing low quality data (lots of false negatives). If they go 3 blocks without placing any labels, it's highly unlikely that they are doing it right. But if they go 3 blocks without placing any given label type, that could be totally normal and okay. I do agree that users can perform better for different label types. I just don't think that going that granular can be used as our primary metric for data quality.
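The heuristic described above (too few labels overall suggests false negatives, failed validations catch false positives) can be sketched roughly like this. This is an illustration only, not Project Sidewalk's actual code; the function name and thresholds are invented:

```python
# Rough sketch of the data-quality heuristic discussed above.
# Illustrative only: names and thresholds are hypothetical.

def is_likely_low_quality(labels_placed, blocks_audited, false_positive_rate,
                          min_labels_per_block=1.0, max_false_positive_rate=0.5):
    """Flag a user as likely low quality if they place too few labels
    overall (lots of false negatives) or too many incorrect ones."""
    if blocks_audited == 0:
        return False  # no data yet, nothing to judge
    labels_per_block = labels_placed / blocks_audited
    # Going several blocks with almost no labels suggests missed problems.
    if labels_per_block < min_labels_per_block:
        return True
    # Validations catch labels that were placed but are wrong.
    return false_positive_rate > max_false_positive_rate

# A user who audited 3 blocks without placing any labels looks low quality:
assert is_likely_low_quality(0, 3, 0.0)
# A user with plenty of labels and mostly-correct validations does not:
assert not is_likely_low_quality(30, 3, 0.1)
```

Note that this total-label check only works across all label types, which is exactly the point made above: a per-label-type version of the same check would flag perfectly normal behavior.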
Oh, and this also doesn't address the issue where users improve over time. As their scores for the different label types improve, how do we take that into account on the previous routes they audited when their score was low?
So, the goal of this PR is generally to move the "low quality" flag away from being a blanket flag on an entire user's set of data, and to instead apply it individually to streets? And to split the "low quality" flag into more specific flags, such as a "Stale Data" flag, to be treated differently in terms of re-audit status, clustering, validation, and dashboard/leaderboard inclusion? If I'm getting that right, then how about flags like these? Types of flags:
Also, what metrics do we/could we use to determine whether a street meets these conditions (i.e., false positives, incomplete/false negatives)? And how do we currently set a user as low quality and remove that flag (i.e., automatically from validations, or manually)?
Thanks for this, very helpful for getting the ball rolling! Some comments below. Let me know your thoughts after this!
We don't care about the date of the audit so much as the date of the imagery that they used as part of their audit. To really make use of this, we would want to add tools to check the age of current imagery on a street and compare it to the imagery that was labeled by users on that street. This feels like it's outside the scope of this issue, so let's tackle it in a future one! Though it could be helpful to keep in mind that this is something that we'll want to do while you're working on this ticket!
Hmm, I don't think that we want to apply this to all users (I think that's what you're saying?). I'm imagining that we just add a… I copied over your chart and made some edits; comments on the specific boxes below.
If we're still interested in the stale and early flags, I'd like to add them in now while implementing the new flags, even if they aren't used until a later PR applies them. If I add them in, how does this logic seem?
Also, should automatically applying the Incomplete and Low Quality flags to old or new street audits be a part of this PR, or should this PR just focus on adding the flags? If we do apply them during this PR, what metrics should be used to sort streets into these flag types?
I think that we're interested in the "stale" flag, but I think that the "low quality" flag could be used in place of the "early" flag. And I agree that it could make sense to add the "stale" flag at the same time that you're adding the others. The only change I would make to your table for the "stale" flag is that we probably don't need to validate those labels, since they're old and shouldn't be included in clustering.
I don't have any plans for this to be automatically applied. I'd love for this PR to have some tools for admins to automatically mark old data for a user as low quality or incomplete, though!
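Pulling together the behavior discussed so far for each flag, one way to encode the table could look like this. This is only my reading of the thread (flagged audits raise re-audit priority, are excluded from clustering, and stale labels skip validation); the names and structure are assumptions, not the project's code:

```python
# One possible encoding of the per-flag behavior discussed above.
# Hypothetical names; the behaviors are my reading of this thread.

FLAG_BEHAVIOR = {
    # flag:        (raise_reaudit_priority, include_in_clustering, send_to_validation)
    "low_quality": (True, False, True),   # wrong labels; validation can salvage some
    "incomplete":  (True, False, True),
    "stale":       (True, False, False),  # old imagery: not worth validating
}

def should_validate(flag):
    return FLAG_BEHAVIOR[flag][2]

assert should_validate("low_quality")
assert not should_validate("stale")
```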
Here's a plan for tackling this issue (and a few questions) to look over before digging into the implementation, divided by each function (the columns in the above tables): re-auditing, clustering, validation, dashboard, and leaderboard. In terms of modifying the database, I was thinking we add new columns for each flag in the street edge table. Any thoughts on this set of changes?
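The proposed schema change (one boolean column per flag) might look something like the sketch below, using an in-memory SQLite database for illustration. The table and column names here are placeholders, not the real Project Sidewalk schema:

```python
import sqlite3

# Sketch of adding per-flag boolean columns to a street-level table.
# Table/column names are placeholders, not the project's real schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE street_edge (street_edge_id INTEGER PRIMARY KEY)")
for flag in ("low_quality", "incomplete", "stale"):
    conn.execute(
        f"ALTER TABLE street_edge ADD COLUMN {flag} BOOLEAN NOT NULL DEFAULT 0"
    )

conn.execute("INSERT INTO street_edge (street_edge_id) VALUES (1)")
# Mark street 1's audit data as incomplete:
conn.execute("UPDATE street_edge SET incomplete = 1 WHERE street_edge_id = 1")
row = conn.execute(
    "SELECT low_quality, incomplete, stale FROM street_edge"
).fetchone()
assert row == (0, 1, 0)
```

Defaulting every flag to false keeps existing rows unchanged when the migration runs, so only streets an admin (or a later automated check) touches get flagged.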
For auditing, I believe that updating the street "priority" should hopefully be the only thing that needs to change. Right now we essentially have two types of audits, either from high quality users or from low quality users; these two types influence street priority differently, with high quality user audits lowering the priority more. To try and keep things simple, let's start by just treating "low quality", "incomplete", and "stale" audits the same way as we currently treat audits from users marked as low quality.
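One way to picture the priority mechanism described above: good audits lower a street's priority a lot, while audits from low-quality users, or audits flagged "low_quality", "incomplete", or "stale", lower it only slightly, so the street stays near the top of the audit queue. A minimal sketch with invented weights (the real computation lives elsewhere in the codebase and certainly differs):

```python
def street_priority(audits):
    """Each audit is a dict with a 'flags' set and a 'high_quality_user' bool.
    Flagged or low-quality-user audits count only a little toward lowering
    priority; trusted audits count fully. Weights are invented for illustration."""
    score = 0.0
    for audit in audits:
        flagged = bool(audit["flags"] & {"low_quality", "incomplete", "stale"})
        if flagged or not audit["high_quality_user"]:
            score += 0.25  # invented weight: counts a little
        else:
            score += 1.0   # invented weight: counts fully
    return 1.0 / (1.0 + score)  # an unaudited street gets max priority 1.0

good = {"flags": set(), "high_quality_user": True}
flagged = {"flags": {"stale"}, "high_quality_user": True}
# A street with only a flagged audit stays higher priority than one
# with a good audit:
assert street_priority([flagged]) > street_priority([good])
```

The point of routing everything through priority is that the three new flags can reuse the code path that already exists for low-quality users, which matches the "keep things simple" suggestion above.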
Given that we should be doing everything through "priority", I don't think you'll need to make any changes to this code.
The goal of our clustering is to create a dataset, provided through our APIs, that doesn't have duplicates. Since the same user can label the same thing from multiple angles, and multiple users can also label the same thing if they go down the same street, there are a lot of duplicated labels in our dataset. We employ clustering to group all the nearby labels into a single label "cluster". We then output those clusters through our APIs so that someone can get a list of all the curb ramps in a city without duplicates, for example. And so your job is to add some filters so that we exclude labels that are likely to be incorrect from this dataset!
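The pre-clustering filter being asked for could look roughly like this. It combines two ideas from this thread: stale audits are dropped outright (they shouldn't be clustered or validated), while labels from other flagged audits are kept only if they were individually validated as correct. The field names are invented for illustration:

```python
def labels_for_clustering(labels):
    """Filter labels before clustering (illustrative sketch; field names
    are invented). Stale audits are dropped outright; labels from other
    flagged audits are kept only if validated as correct."""
    keep = []
    for label in labels:
        flags = label["street_flags"]
        if "stale" in flags:
            continue  # old imagery: not validated, not clustered
        if flags & {"low_quality", "incomplete"}:
            if label.get("validated_correct"):
                keep.append(label)  # salvaged by a correct validation
            continue
        keep.append(label)
    return keep

labels = [
    {"id": 1, "street_flags": set(), "validated_correct": None},
    {"id": 2, "street_flags": {"low_quality"}, "validated_correct": True},
    {"id": 3, "street_flags": {"low_quality"}, "validated_correct": False},
    {"id": 4, "street_flags": {"stale"}, "validated_correct": True},
]
assert [l["id"] for l in labels_for_clustering(labels)] == [1, 2]
```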
Sounds good!
Just looked at this code. Despite being called from …, the wording is also a bit confusing in the names of variables and in the comments. It isn't actually filtering out low quality data right now; it's returning all the streets, but if audits are low quality then we are saying that the street is "unaudited". This is done with the line that says …
I think you're actually looking for the
You don't want to edit the
For validation of incomplete streets, do any modifications need to be made to the query in LabelTable.scala, in the function retrieveLabelListForValidation (line 545)? From my understanding, the label will still be included with or without the incomplete flag, so it doesn't need to explicitly be included. Should the priority for labels on incomplete streets be adjusted at all, or just left as is?
I think we can leave it as-is for now, and yes, they'll be included unless you make a change there!
I've been thinking about how to implement the admin controls for setting the "low_quality" and "incomplete" flags; how does something like this seem? After clicking on a user's username from the Admin page, you're brought to a page with the user's stats. Scrolling down to the "Audited Streets" section, each street would have a dropdown to mark that street for the user as "incomplete" or not. Below the table, a text input or calendar dropdown of some kind could be used to select a date, with all streets audited before that date marked "low_quality". Does this seem like a good direction for implementing this in the admin page? Also, for the low quality flag that's currently applied to users: is it automatically calculated and applied, or can admins also set this flag for users?
Oooo, actually there is the issue of streets where the user didn't place any labels at all! So it would be better for this to be an option that you are given when you click on the actual street on the map! I'm actually refactoring all the code for the maps as we speak... so if you want to leave that part for later, that might ease the issues of merge conflicts between our branches. But if that's where you're at, feel free to implement. It won't be that bad to merge!
I think that it's okay to make all of their tasks for a street the same. In general, users won't be completing more than one task on the same street; it should be very rare. And if you are looking at their labels and think that their work on the street is lacking, then generally it's a combination of all of their audits of that street.
It shows a popup giving you that option right now. We made that for LabelMap and just brought it over to this map since we are reusing the code. We could either replace it with the option to mark the street as low quality / incomplete on this map, or we could just add it on so that you have the option to do both!
Oops, I definitely meant to say something on this earlier! I think that right above the "Labels" section is probably good. Everything below that summary stats table is almost never looked at. At some point I want to just combine this code with the User Dashboard code so that we can just see what the user would normally see, plus some extra stuff #3070. Out of scope for this PR, but just letting you know what's in the plans!
I like it! Remind me why we wouldn't also have such an option for marking them as incomplete?
In general, we think of a user auditing a street as a single task. If they have multiple tasks for the street, give them all the same marking. No need to over-complicate things by making admins keep track of multiple tasks for a single street. So just the two buttons will suffice, no need to even use the word "all" in the language.
Yep!
Yep! And this should mark all of a user's tasks on the street as such.
Brief description of problem/feature
I would love to be able to mark a user's audit of a specific street as something to ignore in the database. Sometimes a user will input junk data, and we can manually mark them as "low quality" and that's all well and good. But what I'm increasingly coming up against are users that begin by contributing pretty poor quality data, I look at the bad data and give them feedback, and then they improve going forward.
Now we either have to mark the user as "low quality" in our database and ignore all of their contributions, or we have to make users validate each of their labels individually. And even with the latter, if an issue they had when auditing was that they were missing a lot of problems, we can't take care of that during validation. So if the user gets better and gets marked as "high quality", the first 10 streets that they audited poorly will be considered "high quality" by us.
So I'd like to be able to mark specific streets audited by a specific person as either "low quality" or even give it an "ignore" flag. This way, we will make sure that someone else audits that street again.
There are other times that we have wanted this feature, too. For example, we might look at Label Map and see that there is a street with very few labels on it, and we check in GSV and see that data is missing for that street. We've wanted to make sure that that street gets re-audited, and this would be a way to accomplish that.
Now there are a ton of decisions to make when implementing this. Which flags do we want, and what exactly should their meaning and function be? We might want a "low quality" flag and a "stale data" flag (see #1165). If we mark data as "low quality", what all should we remove that data from?