Want to ignore data larger than a label but smaller than a user #2508

Closed
misaugstad opened this issue Mar 19, 2021 · 17 comments · Fixed by #3507

Comments

@misaugstad
Member

misaugstad commented Mar 19, 2021

Brief description of problem/feature

I would love to be able to mark a user's audit of a specific street as something to ignore in the database. Sometimes a user will input junk data, and we can manually mark them as "low quality" and that's all well and good. But what I'm increasingly coming up against are users that begin by contributing pretty poor quality data, I look at the bad data and give them feedback, and then they improve going forward.

Now we either have to mark the user as "low quality" in our database and ignore all of their contributions, or we have to make users validate each of their labels individually. And even with the latter, if an issue they had when auditing was that they were missing a lot of problems, we can't take care of that during validation. So if the user gets better and gets marked as "high quality", the first 10 streets that they audited poorly will be considered "high quality" by us.

So I'd like to be able to mark specific streets audited by a specific person as either "low quality" or even give it an "ignore" flag. This way, we will make sure that someone else audits that street again.

There are other times that we have wanted this feature, too. For example, we might look at Label Map and see that there is a street with very few labels on it, and we check in GSV and see that data is missing for that street. We've wanted to make sure that that street gets re-audited, and this would be a way to accomplish that.

Now there are a ton of decisions to make when implementing this. Which flags do we want and what exactly should their meaning and function be? We might want a "low quality" flag and a "stale data" flag (see #1165). If we mark data as "low quality", what all should we remove that data from?

  • Of course we want to mark the street to be re-audited.
  • I assume that we would want to remove the data from clustering in most cases. But what if the data is good but just incomplete? Then we might want to mark it for re-audit but still include that data in clustering.
  • Should we be validating those labels? I think this has the same argument as the previous bullet. It implies that we might want to split the "low quality" flag into multiple, more granular, flags.
  • Should the data be included in evaluating whether the user contributes high quality data or not? If we just see a one-off street that wasn't labeled well, I don't think that we want to do much more than just mark it for re-audit. But what if this is a user that started out with low quality data, but has since received some training from myself or @gari01234? Ideally, I would like to ignore these streets when it comes to computing performance metrics for the user. Sometimes I'm tempted to just tell someone to create a new account after I've given them feedback so that I can just mark all their old contributions as "low quality", and with their new training they will have all that data on a new account. We might be able to accomplish something like this on the back-end by completely ignoring old contributions.
  • Assuming that we are almost completely ignoring a set of audits for a user when computing these metrics... How far does that go? I assume that we want to leave those contributions on their user dashboard? What about the leaderboard?
@misaugstad
Member Author

One idea that has been brought up is to start users out auditing with a smaller goal, like only looking for curb ramps. One issue with that is that we don't have anything set up in our database to incorporate those partial audits without just marking a user as low quality.

Something like what I proposed above would help with that. We could ignore their early audits, and once they "graduate" to doing full auditing, we can start actually incorporating their work. Ideally we'd include their partial work as well! We have part of that set up now, where we include labels that have been validated as correct, even if they are from low quality users.

@davphan
Collaborator

davphan commented Oct 26, 2023

Some extra ideas:

  1. Could users have a score associated with each label type? Users could start with a low score for each label type, and as their labels get validated, their "label score" increases or decreases. Low scores could also give users a warning pop-up on login, like "Hey! People keep saying your 'no sidewalk' labels suck... Here are some tips!" Labels of a type where the user has a low score could also be prioritized for validation over labels of a type where the user has a high score (e.g. user A has a low score for "no sidewalk" labels, so all their "no sidewalk" labels get prioritized for validation; user A has a high score for "missing curb ramp" labels, so those labels from user A are deprioritized for validation).
  2. Rather than a low-quality vs high-quality binary model on users, could we give each street a priority score that starts at a max (high priority) when they have no labels and drops depending on factors like:
  • Label scores for each user that audited the street (for each user that audited the street, the more high label scores they have, the more the priority drops)

    This could also solve the issue mentioned in "Increase audit priority for streets with no labels" #2075, where missing labels didn't have a metric to prioritize streets. With this system, if a user's score for a label type is low, the street will have a higher priority, allowing other users to audit it and catch missed labels (if another user catches a missed label, we could also make it lower the previous user's label score).

  3. Regarding the 5th bullet point, we could have these label scores shown in the user dashboard so they know relatively how much their labels are prioritized
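
To make the two ideas above concrete, here is a minimal sketch. Everything in it is hypothetical: the names (`ValidationCounts`, `streetPriority`) and both formulas are illustrative assumptions, not anything that exists in the codebase. A label-type score is taken to be the share of validations that agreed, and each auditor of a street scales the street's priority down by their average score.

```scala
object LabelScoreSketch {
  // Hypothetical per-user, per-label-type tally of validation results.
  final case class ValidationCounts(agree: Int, disagree: Int) {
    // Assumed scoring rule: share of validations that agreed.
    // Users with no validations yet start at the lowest score, 0.0.
    def score: Double =
      if (agree + disagree == 0) 0.0
      else agree.toDouble / (agree + disagree)
  }

  // Assumed priority rule: a street starts at the maximum priority 1.0 and
  // each auditor multiplies it by (1 - their mean label score), so audits by
  // high-scoring users lower the priority more than audits by low-scoring users.
  // Each map goes from label type name to that auditor's score for the type.
  def streetPriority(auditorScores: Seq[Map[String, Double]]): Double =
    auditorScores.foldLeft(1.0) { (priority, scores) =>
      val mean = if (scores.isEmpty) 0.0 else scores.values.sum / scores.size
      priority * (1.0 - mean)
    }
}
```

Under these assumptions a street audited only by a user with low scores stays near the top of the queue, which is the behavior the second idea is after.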

@misaugstad
Member Author

Rather than a low-quality vs high-quality binary model on users

I think that using label-specific ratings would have to be an augmentation to the binary rating, or maybe it would be switched to after there is enough data from the user. We wouldn't have an accurate sense of a user's quality for specific label types until they've collected quite a bit of data. The benefit of our current method is that, in addition to validations (which look at the false positive rate), we also check how many labels they've placed in total; if the number of labels is too low, we expect them to be contributing low quality data (lots of false negatives). If they go 3 blocks without placing any labels, it's highly unlikely that they are doing it right. But if they go 3 blocks without placing any given label type, that could be totally normal and okay.

I do agree that users can perform better for different label types. I just don't think that going that granular can be used as our primary metric for data quality.

@misaugstad
Member Author

Oh and this also doesn't address the issue where users improve over time. As their scores in the different label types improve, how do we take that into account on the previous routes that they audited when their score was low?

@davphan
Collaborator

davphan commented Nov 28, 2023

So, the goal of this PR is generally to move the "low quality" flag away from being a blanket flag on a user's entire set of data, and instead apply it individually to streets? And to split the "low quality" flag up into more specific flags, such as a "Stale Data" flag, that are treated differently in terms of re-audit status, clustering, validation, and dashboard/leaderboard inclusion? If I'm getting that right, then how about flags like these:

Types of flags:

Flag Name Re-audit? Include in Clustering Validate Include in Dashboard Include in Leaderboard Notes
Low Quality Street is most likely complete but with bad label types/locations (false positives)
Incomplete Street is likely incomplete (false negatives)
Stale Old data, if audit date is before a designated date, automatically apply stale flag
Early For early user data, apply a date to each user where labels before that date are considered bad quality from new users, if street is audited before that date from user then apply flag

Also, what metrics do we/could we use to determine whether a street meets these conditions (i.e. false positives, incomplete/false negatives)? How do we currently set a user as low quality and remove that flag (e.g. automatically from validations, or manually)?

@misaugstad
Member Author

Thanks for this, very helpful for getting the ball rolling! Some comments below. Let me know your thoughts after this!

Old data, if audit date is before a designated date, automatically apply stale flag

We don't care about the date of the audit so much as the date of the imagery that they used as part of their audit. To really make use of this, we would want to add tools to check the age of current imagery on a street and compare it to the imagery that was labeled by users on that street. This feels like it's outside the scope of this issue, so let's tackle it in a future one! Though it could be helpful to keep in mind that this is something that we'll want to do while you're working on this ticket!

For early user data, apply a date to each user where labels before that date are considered bad quality

Hmm, I don't think that we want to apply this to all users (I think that's what you're saying?). I'm imagining that we just add a low_quality flag to the audit_task table, and then we add an admin tool that can automatically mark low_quality as true for all streets audited by a given user before a given date.
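
As a rough illustration of that admin tool, here is a minimal in-memory sketch. The `AuditTask` case class is a hypothetical, simplified stand-in for a row of the audit_task table (only `low_quality` comes from the discussion; the other field names are assumptions), and a real implementation would presumably be a database update rather than a pass over a collection.

```scala
import java.time.Instant

object EarlyAuditFlagging {
  // Hypothetical, simplified stand-in for a row of the audit_task table.
  final case class AuditTask(
    auditTaskId: Int,
    userId: String,
    streetEdgeId: Int,
    taskEnd: Instant,
    lowQuality: Boolean = false
  )

  // Marks every task by `userId` that ended strictly before `cutoff` as low quality.
  // Tasks by other users, and this user's later tasks, are returned unchanged.
  def markLowQualityBefore(
    tasks: Seq[AuditTask],
    userId: String,
    cutoff: Instant
  ): Seq[AuditTask] =
    tasks.map { t =>
      if (t.userId == userId && t.taskEnd.isBefore(cutoff)) t.copy(lowQuality = true)
      else t
    }
}
```

Calling `markLowQualityBefore(tasks, "u1", cutoff)` leaves every other user's audits untouched, which is the difference from the current user-wide flag.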


I copied over your chart and made some edits, comments on the specific boxes below.

Flag Name Re-audit? Include in Clustering Validate Include in Dashboard Include in Leaderboard Notes
Incomplete The labels placed on the street were good, but a lot of labels were missed (false negatives)
Low Quality Street was otherwise audited poorly
  • Incomplete, include in clustering: In general, we want to include high quality labels in clustering! So if we think that the data is high quality, just incomplete, then we can include the labels in clustering!
  • Low Quality, validate: I think that if we expect that the user's labels are low quality, then we should deprioritize them in the validation pipeline. If we think the data is bad, we don't want to waste other users' time looking through those bad labels.
  • Incomplete, validate: Similar argument to above; if we think the data that was provided is good, then we can forward that along to the validators.
  • Include in dashboard: Maybe we should just include all of this data in the users' dashboards. It might be confusing to them if they look back at their dashboard and see data they had added has just disappeared. I'm not 100% convinced, but I think that it will minimize the number of times that we get emails asking why their data is missing!

@davphan
Collaborator

davphan commented Nov 29, 2023

If we're still interested in the stale and early flags, I'd like to add them in now while implementing the new flags even if they aren't used until a later PR applies them. If I add them in, how does this logic seem?

Flag Name Re-audit? Include in Clustering Validate Include in Dashboard Include in Leaderboard Notes
Stale Old data (method tbd for future PR), automatically apply stale flag.
Early Data For early user data, flag as a date, manually applied by admins. If date is set and street audit date is before declared date, then apply flag. Otherwise (or if no date set, default), don't apply flag.

Also, should automatically applying the Incomplete and Low Quality flags to old or new street audits be a part of this PR or just focus on adding the flags in this PR? If we add them during this PR, what metrics should be used to sort streets into these flag types?

@misaugstad
Member Author

If we're still interested in the stale and early flags

I think that we're interested in the "stale" flag, but I think that the "low quality" flag could be used in place of the "early" flag. And I agree that it could make sense to add the "stale" flag at the same time that you're adding the others.

The only change I would make to your table for the "stale" flag is that we probably don't need to validate those labels, since they're old and shouldn't be included in clustering.
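
Pulling together what has been said so far, the handling of each flag could be written down roughly as below. This is only one reading of the thread, with hypothetical names throughout: re-audit is assumed for every flag, dashboard inclusion follows the tentative "include everything" suggestion, and the leaderboard column is left out because it has not been settled here.

```scala
object FlagPolicySketch {
  sealed trait AuditFlag
  case object Incomplete extends AuditFlag
  case object LowQuality extends AuditFlag
  case object Stale extends AuditFlag

  // How a flagged audit is treated by each part of the system (hypothetical type).
  final case class Handling(
    reAudit: Boolean,
    includeInClustering: Boolean,
    validate: Boolean,
    includeInDashboard: Boolean
  )

  def handling(flag: AuditFlag): Handling = flag match {
    // Labels are believed to be good, there are just too few of them.
    case Incomplete => Handling(reAudit = true, includeInClustering = true, validate = true, includeInDashboard = true)
    // Labels are believed to be bad, so validators' time is not spent on them.
    case LowQuality => Handling(reAudit = true, includeInClustering = false, validate = false, includeInDashboard = true)
    // Labels are old: not clustered, so not worth validating either.
    case Stale => Handling(reAudit = true, includeInClustering = false, validate = false, includeInDashboard = true)
  }
}
```

Keeping the policy in one function like this is a design choice for the sketch only; in the application the same decisions would more likely end up as filters in the individual queries.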

should automatically applying the Incomplete and Low Quality flags to old or new street audits be a part of this PR or just focus on adding the flags in this PR? If we add them during this PR, what metrics should be used to sort streets into these flag types?

I don't have any plans for this to be automatically applied. I'd love for this PR to have some tools for admins to automatically mark old data for a user as low quality or incomplete though!

@davphan
Collaborator

davphan commented Dec 29, 2023

Here's a plan for tackling this issue (and a few questions) to look over before digging into the implementation, divided by each function (columns in the above tables):

Re-Auditing
Auditing looks to be done through tasks, which are ranked by priority. First, in AuditTaskTable.scala in the selectANewTaskInARegion function (line 411), modify the "possibletasks" query to include streets with an Incomplete, Low Quality, or Stale flag.
In terms of priority, are there any modifications to the street edge priority that need to be made when any of these flags are set? If so, is that something that can be handled in this issue or something to be added as a separate issue?

Clustering
I'm kind of confused on what clustering does and what it is used for, could I get a short explanation on that to understand how to modify it?

Validate
In LabelTable.scala, at the function retrieveLabelListForValidation (line 545), adjust the query to include labels in streets with the Incomplete flag, and NOT include those in streets with the Stale or Low Quality flag.

Dashboard
In AuditTaskTable.scala in function selectStreetsWithAuditStatus (line 311), include streets with Incomplete, Stale, and Low Quality status.

Leaderboard
In UserStatTable.scala, in the updateUserStatTable function (line 154), whenever a flag is applied, decrement the stats for the flagged streets.

In terms of modifying the database, I was thinking we add new columns for each flag in the street edge table. Any thoughts on this set of changes?

@misaugstad
Member Author

In terms of priority, are there any modifications to the street edge priority that need to be made when any of these flags are set? If so, is that something that can be handled in this issue or something to be added as a separate issue?

For auditing, I believe that updating the street "priority" should hopefully be the only thing that needs to change. Right now we essentially have two types of audits, either from high quality users or from low quality users; these two types influence street priority differently, with high quality user audits lowering the priority more.

To try and keep things simple, let's start by just treating "low quality", "incomplete", and "stale" audits the same way as we currently treat audits from users marked as low quality.
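
A sketch of what that could mean in practice. The priority formula and the two weights below are placeholders invented for illustration (the real priority calculation is not shown in this thread); the only point is that an audit carrying any of the three flags is counted the same way as an audit from a low quality user.

```scala
object PrioritySketch {
  // Hypothetical view of one audit of a street.
  final case class Audit(
    userHighQuality: Boolean,
    lowQuality: Boolean = false,
    incomplete: Boolean = false,
    stale: Boolean = false
  ) {
    // A flagged audit is treated like an audit from a low quality user.
    def countsAsHighQuality: Boolean =
      userHighQuality && !lowQuality && !incomplete && !stale
  }

  // Placeholder weights, not the project's actual values.
  val HighQualityWeight = 1.0
  val LowQualityWeight = 0.25

  // Placeholder formula: priority falls from 1.0 as weighted audits accumulate.
  def priority(audits: Seq[Audit]): Double = {
    val total = audits.map { a =>
      if (a.countsAsHighQuality) HighQualityWeight else LowQualityWeight
    }.sum
    1.0 / (1.0 + total)
  }
}
```

With this shape, flagging an audit raises the street's priority without touching the task selection query.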

modify the "possibletasks" query to include streets with an Incomplete, Low Quality, or Stale flag

Given that we should be doing everything through "priority", I don't think you'll need to make any changes to this code.

I'm kind of confused on what clustering does and what it is used for, could I get a short explanation on that to understand how to modify it?

The goal of our clustering is to create a dataset without duplicates that we can provide through our APIs. Since the same user can label the same thing from multiple angles, and multiple users can also label the same thing if they go down the same street, there are a lot of duplicated labels in our dataset. We employ clustering to group all the nearby labels into a single label "cluster". We then output those clusters through our APIs so that someone can get a list of all the curb ramps in a city without duplicates, for example.

And so your job is to add some filters so that we exclude labels that are likely to be incorrect from this dataset!
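
To illustrate where such a filter sits, here is a toy sketch. It is not the project's clustering algorithm: the greedy distance-threshold grouping, the planar x/y coordinates, and all names are assumptions made for the example. The part that matters is that labels from low quality or stale audits are dropped before any grouping happens, while labels from incomplete audits are kept.

```scala
object ClusteringFilterSketch {
  // Hypothetical label with planar coordinates and the flags of its audit task.
  final case class Label(
    labelId: Int,
    labelType: String,
    x: Double,
    y: Double,
    taskLowQuality: Boolean = false,
    taskStale: Boolean = false
  )

  // Labels from low quality or stale audits are excluded from clustering.
  def eligible(l: Label): Boolean = !l.taskLowQuality && !l.taskStale

  // Toy greedy grouping: a label joins the first existing cluster of the same
  // type whose first member lies within `radius`; otherwise it starts a new cluster.
  def cluster(labels: Seq[Label], radius: Double): Seq[Seq[Label]] =
    labels.filter(eligible).foldLeft(Vector.empty[Vector[Label]]) { (clusters, l) =>
      val idx = clusters.indexWhere { c =>
        val first = c.head
        first.labelType == l.labelType && math.hypot(first.x - l.x, first.y - l.y) <= radius
      }
      if (idx >= 0) clusters.updated(idx, clusters(idx) :+ l)
      else clusters :+ Vector(l)
    }
}
```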

Validate
In LabelTable.scala, at the function retrieveLabelListForValidation (line 545), adjust the query to include labels in streets with the Incomplete flag, and NOT include those in streets with the Stale or Low Quality flag.

Sounds good!

Dashboard
In AuditTaskTable.scala in function selectStreetsWithAuditStatus (line 311), include streets with Incomplete, Stale, and Low Quality status.

Just looked at this code. Despite being called from UserProfileController.scala, it looks like this is used to get the streets for LabelMap/RouteBuilder/Admin page, and not for the user dashboard!

The wording is also a bit confusing in the names of variables and in the comments. It isn't actually filtering out low quality data right now; it's returning all the streets, but if audits are low quality then we are saying that the street is "unaudited".

This is done with the line that says if _ut.highQuality. Here, I would treat "low quality" and "incomplete" audits the same way, but we can ignore "stale" ones for now. So it would probably then look something like
if _ut.highQuality || !_ct.lowQuality || !_ct.incomplete
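
One thing to double check when writing the real condition: taken literally, joining the three terms with `||` would count a street as audited whenever either flag is unset. Assuming the intent is that a "low quality" or "incomplete" audit leaves the street looking unaudited, a conjunction expresses that. The sketch below uses plain case classes as hypothetical stand-ins for the joined user stat and audit task rows.

```scala
object AuditedGuardSketch {
  // Hypothetical stand-ins for the joined rows; the field names mirror the flags discussed here.
  final case class UserStat(highQuality: Boolean)
  final case class AuditTask(lowQuality: Boolean, incomplete: Boolean)

  // A street counts as audited only if the user is high quality AND this
  // particular audit carries neither the low quality nor the incomplete flag.
  def countsAsAudited(ut: UserStat, at: AuditTask): Boolean =
    ut.highQuality && !at.lowQuality && !at.incomplete
}
```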

Leaderboard
In UserStatTable.scala, in the updateUserStatTable function (line 154), whenever a flag is applied, decrement the stats for the flagged streets.

I think you're actually looking for the getLeaderboardStats function at line 346.

In terms of modifying the database, I was thinking we add new columns for each flag in the street edge table.

You don't want to edit the street_edge table, since that isn't user-specific. You want to edit the audit_task table, which represents a user's audit of a street!

@davphan
Collaborator

davphan commented Jan 7, 2024

For validation of incomplete streets, do any modifications need to be made to the query in LabelTable.scala, at the function retrieveLabelListForValidation (line 545)? From my understanding, the label will still be included with or without the incomplete flag, so it doesn't need to be explicitly included. Should the priority for labels on incomplete streets be adjusted at all, or should we just leave it as is?

@misaugstad
Member Author

I think we can leave as-is for now, and yes they'll be included unless you make a change there!

@davphan
Collaborator

davphan commented Jan 16, 2024

I've been thinking about how to implement the admin controls for setting the "low_quality" and "incomplete" flags, how does something like this seem?

image

After clicking on a user's username from the Admin page, you're brought to a page on the user's stats. Scrolling down to the "Audited Streets" section, each street would have a dropdown to mark that street for the user as "incomplete" or not. Below the table, a text input or calendar dropdown of some kind could be used to select a date for all streets before that date to be marked "low_quality."

Does this seem like a good direction for how to implement this in the admin page?

Also, for the current low quality flag that's currently applied to users, is this automatically calculated and applied or can admins also set this flag for users?

@misaugstad
Member Author

After clicking on a user's username from the Admin page, you're brought to a page on the user's stats. Scrolling down to the "Audited Streets" section, each street would have a dropdown to mark that street for the user as "incomplete" or not.

I think that the main issue with this is that it's hard to tie the entries in that table to anything we actually use to evaluate users' work; honestly, that table is pretty useless right now :) We actually evaluate users' work by looking at their labels on the map. So I think that at the bottom of the popup that shows up for a label, the line about the task ID could allow you to mark that task as incomplete or low quality.
Screenshot from 2024-01-16 13-48-46

Below the table, a text input or calendar dropdown of some kind could be used to select a date for all streets before that date to be marked "low_quality."

I like this! I'm assuming that it will be easy enough to put a calendar entry there, and we can default to the current time and date. Well, actually ideally I'd like to show the current status, and then if you try to edit it, the current date/time is easiest to update it to.

And by "current status" I mean that if their first 5 tasks are marked as "low quality" (but they've done more tasks after those 5), then we show the date/time of the 5th task! Let me know if that doesn't make sense 😁
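
In case it helps, that "current status" could be computed roughly like this. The `AuditTask` shape and the `taskEnd` field are hypothetical; the logic is just to order the user's tasks by time, take the leading run of low quality ones, and show the timestamp of the last task in that run.

```scala
import java.time.Instant

object LowQualityCutoffSketch {
  // Hypothetical minimal view of one of a user's audit tasks.
  final case class AuditTask(taskEnd: Instant, lowQuality: Boolean)

  // Returns the end time of the last task in the leading run of low quality
  // tasks, or None if the user's earliest task is not marked low quality.
  def currentCutoff(tasks: Seq[AuditTask]): Option[Instant] =
    tasks
      .sortWith((a, b) => a.taskEnd.isBefore(b.taskEnd))
      .takeWhile(_.lowQuality)
      .lastOption
      .map(_.taskEnd)
}
```

A `None` result would correspond to showing no date at all, with the calendar defaulting to the current date and time once someone edits it.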

for the current low quality flag that's currently applied to users, is this automatically calculated and applied or can admins also set this flag for users?

The high_quality flag is set automatically. But there is also a high_quality_manual flag that, as of now, only I can set by updating that column in the database. Once per day the high_quality column is updated automatically, and if high_quality_manual is set, it overrides whatever would normally be used there. I'd be open to this being available for admins to do more broadly, but it isn't a requirement.
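
The override boils down to something like the following sketch. It assumes high_quality_manual is nullable and models it as an `Option`; the function name is made up for the example.

```scala
object HighQualityOverrideSketch {
  // `computed` is the value the daily job would assign on its own.
  // `manual` models high_quality_manual: None when unset, Some(value) when an admin has set it.
  def effectiveHighQuality(computed: Boolean, manual: Option[Boolean]): Boolean =
    manual.getOrElse(computed)
}
```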

@misaugstad
Member Author

I think that at the bottom of the popup that shows up for a label, the line about the task ID could allow you to mark that task as incomplete or low quality.

Oooo, actually there is the issue of streets where the user didn't place any labels at all! So it would be better for this to be an option that you are given when you click on the actual street on the map! I'm actually refactoring all the code for the maps as we speak... So if you want to leave that part for later, that might ease the issues of merge conflicts between our branches. But if that's where you're at, feel free to implement. It won't be that bad to merge!

@davphan
Collaborator

davphan commented Jan 17, 2024

So I think that at the bottom of the popup that shows up for a label, the line about the task ID could allow you to mark that task as incomplete or low quality.

I like this! This idea would allow you to mark just that user's task on that street as incomplete/low quality.

So it would be better for this to be an option that you are given when you click on the actual street on the map!

However, with this option it would mark all users' audits on that street as low quality/incomplete. Is the goal to make all users' audits on that street low quality/incomplete, or are we aiming to mark only specific user tasks on a street? Also, I believe when you click on the actual street, it redirects you to audit that street.

I think that the main issue with this is that it's hard to tie the entries in that table to anything we actually use to evaluate users' work; honestly, that table is pretty useless right now :)

Since no one uses that table, where would be a better place to put the calendar entry?

We could also implement a mixture of everything to give more options for users to apply flags? Something like:

  1. A calendar entry for "low_quality" flags on a user
  2. A "Mark Task as Incomplete" and "Mark Task as Low Quality" button on the task ID line when clicking a label
    image
  3. A popup/modal when clicking on a street, which contains a list of all users' tasks on that street and "Mark Task as Incomplete" and "Mark Task as Low Quality" buttons. Also, an "Audit This Street" button (to preserve the current functionality), and a "Mark All Tasks as Incomplete" and "Mark All Tasks as Low Quality" button to mark all users on that street with a flag.
    image

How does this look?

@misaugstad
Member Author

However, with this option it would mark all users' audits on that street as low quality/incomplete. Is the goal to make all users' audits on that street low quality/incomplete, or are we aiming to mark only specific user tasks on a street?

I think that it's okay to make all of their tasks for a street the same. In general, users won't be completing more than one task on the same street. It should be very rare. And if you are looking at their labels and think that their work on the street is lacking, then generally it's a combination of all of their audits of that street.

Also, I believe when you click on the actual street, it redirects you to audit that street.

It shows a popup giving you that option right now. We made that for LabelMap and just brought it over to this map since we are reusing the code. We could either replace it with the option to mark the street as low quality / incomplete on this map, or we could just add it on so that you have the option to do both!

Since no one uses that table, where would be a better place to put the calendar entry?

Oops, I definitely meant to say something on this earlier! I think that right above the "Labels" section is probably good. Everything below that summary stats table is almost never looked at. At some point I want to just combine this code with the User Dashboard code so that we can just see what the user would normally see, plus some extra stuff (#3070). Out of scope for this PR, but just letting you know what's in the plans!

We could also implement a mixture of everything to give more options for users to apply flags?
A calendar entry for "low_quality" flags on a user

I like it! Remind me why we wouldn't also have such an option for marking them as incomplete?

A popup/modal when clicking on a street, which contains a list of all users' tasks on that street and "Mark Task as Incomplete" and "Mark Task as Low Quality" buttons. ... a "Mark All Tasks as Incomplete" and "Mark All Tasks as Low Quality" button to mark all users on that street with a flag.

In general, we think of a user auditing a street as a single task. If they have multiple tasks for the street, give them all the same marking. No need to over-complicate things by making admins keep track of multiple tasks for a single street. So just the two buttons will suffice, no need to even use the word "all" in the language.

an "Audit This Street" button (to preserve the current functionality)

Yep!

A "Mark Task as Incomplete" and "Mark Task as Low Quality" button on the task ID line when clicking a label

Yep! And this should mark all of a user's tasks on the street as such.
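
A minimal sketch of that behavior, with hypothetical names and an in-memory collection standing in for the audit_task table: starting from the task behind the clicked label, find its user and street, then give every task that user has on that street the same flag.

```scala
object StreetFlaggingSketch {
  sealed trait Flag
  case object LowQuality extends Flag
  case object Incomplete extends Flag

  // Hypothetical, simplified stand-in for a row of the audit_task table.
  final case class AuditTask(
    auditTaskId: Int,
    userId: String,
    streetEdgeId: Int,
    lowQuality: Boolean = false,
    incomplete: Boolean = false
  )

  // Applies `flag` to every task this user has on this street.
  def flagStreetForUser(tasks: Seq[AuditTask], userId: String, streetEdgeId: Int, flag: Flag): Seq[AuditTask] =
    tasks.map { t =>
      if (t.userId != userId || t.streetEdgeId != streetEdgeId) t
      else flag match {
        case LowQuality => t.copy(lowQuality = true)
        case Incomplete => t.copy(incomplete = true)
      }
    }

  // Entry point for the label popup: looks up the clicked task, then flags
  // all of that user's tasks on the same street. Unknown task ids change nothing.
  def flagFromTask(tasks: Seq[AuditTask], auditTaskId: Int, flag: Flag): Seq[AuditTask] =
    tasks.find(_.auditTaskId == auditTaskId) match {
      case Some(t) => flagStreetForUser(tasks, t.userId, t.streetEdgeId, flag)
      case None    => tasks
    }
}
```

The street popup could call `flagStreetForUser` directly, since there the user and street are already known.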
