Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add in total auditing time to admin dashboard #2903

Closed
jonfroehlich opened this issue May 19, 2022 · 5 comments · Fixed by #3563
Closed

Add in total auditing time to admin dashboard #2903

jonfroehlich opened this issue May 19, 2022 · 5 comments · Fixed by #3563
Assignees

Comments

@jonfroehlich
Copy link
Member

Total auditing time:
image

It's been increasingly important to have stats about total auditing time. Here are some timing stats that would be useful:

  • Overall labeling time across users
  • Overall validation time across users
  • Avg auditing time / 100m (for this, compute avg auditing time / 100m for every user then take the average of this list)
@misaugstad
Copy link
Member

Related to #985

I think that this would be too heavy of a lift, given how slow it is to query the audit_task_interaction table. In fact, we used to have charts that included auditing time for users, but we removed them for this reason.

I added working time into the admin dashboard for individual users because the query generally won't take more than one minute for a single user.

Validation time would be doable though. That query doesn't take too long, since we can just look at the timestamps for when validations are added because the time between validations is so small. We could use this method for auditing time as well, but I would want to think of that only as a rough estimate since there are generally longer pauses between adding labels.

@misaugstad
Copy link
Member

Another idea is to compute auditing time for users nightly, and only update it for users who had audited since the last time it was computed. That way we have all those values stored and could just sum them up when loading the admin page.

@misaugstad
Copy link
Member

I recently moved all the interactions that we care about to a smaller table that is a copy of the audit_task_interaction table. This means that our queries that were taking 40+ seconds to run now take less than a second. I think that we should be able to compute this data for the entire city now by using this smaller table!

@JeremyFreiburger you'll see an example of how to do this for one user through the /timeCheck page. We would just need to do it for all users. I'm also including a query that I wrote recently to get something similar, in case this code is helpful to you at all!

SELECT COUNT(*) AS n, avg(min_per_mi) AS "minutes_per_mi (mean)", percentile_cont(0.5) WITHIN GROUP (ORDER BY min_per_mi) AS "minutes_per_mi (median)",
       MIN(min_per_mi) AS "minutes_per_mi (min)", MAX(min_per_mi) AS "minutes_per_mi (max)", stddev(min_per_mi) AS "minutes_per_mi (sd)"
FROM (
    SELECT user_stat.user_id, username, minutes_audited / (meters_audited / 1609.34) AS min_per_mi, minutes_audited, meters_audited,
           meters_audited / 1609.34 AS miles, labels_per_meter * meters_audited AS label_count, high_quality, role_id
    FROM user_stat
    INNER JOIN sidewalk_user ON user_stat.user_id = sidewalk_user.user_id
    INNER JOIN user_role ON sidewalk_user.user_id = user_role.user_id
    INNER JOIN ( 
        SELECT user_id,
               CAST(extract( second from SUM(diff) ) / 60 +
                    extract( minute from SUM(diff) ) +
                    extract( hour from SUM(diff) ) * 60 AS decimal(10,2)) AS minutes_audited
        FROM (
            SELECT user_id, (timestamp - LAG(timestamp, 1) OVER(PARTITION BY user_id ORDER BY timestamp)) AS diff
            FROM (
                SELECT mission.user_id, timestamp
                FROM audit_task_interaction_small
                INNER JOIN mission ON audit_task_interaction_small.mission_id = mission.mission_id
                WHERE mission_type_id <> 1 -- exclude tutorials
                    AND action IN ('ViewControl_MouseDown', 'LabelingCanvas_MouseDown', 'NextSlideButton_Click', 'PreviousSlideButton_Click')
                UNION
                SELECT webpage_activity.user_id, timestamp
                FROM webpage_activity
                WHERE (
                        activity LIKE 'Visit_Labeling_Guide%'
                        OR activity = 'Visit_UserDashboard'
                        OR activity = 'Visit_Help'
                    )
            )"timestamps"
        ) "time_diffs"
        WHERE diff < '00:00:30.000' AND diff > '00:00:00.000'
        GROUP BY user_id
    ) "audit_times" ON user_stat.user_id = audit_times.user_id
    WHERE meters_audited > 100
        AND high_quality = TRUE
        AND role_id = 1
        AND minutes_audited > 30
);

@misaugstad
Copy link
Member

  • Overall labeling time across users
  • Overall validation time across users
  • Avg auditing time / 100m (for this, compute avg auditing time / 100m for every user then take the average of this list)

And I agree with @jonfroehlich that we should include all of these! When doing the 3rd bullet, let's filter for users that have at least 30 minutes of auditing and have audited at least 100 meters (similar to the query above). And then the median is probably the metric we care about here.

We don't really have to do this filtering when computing total auditing/validating time though.

@JeremyFreiburger
Copy link
Collaborator

Also adding Time for past week from screenshot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants