Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify report views #1872

Merged
merged 2 commits into from Apr 28, 2023
Merged

Simplify report views #1872

merged 2 commits into from Apr 28, 2023

Conversation

krysal
Copy link
Member

@krysal krysal commented Apr 21, 2023

Fixes

Fixes #1037 by @obulat

Description

Pass the media identifier directly instead of using the method serializer. This update saves two db statements per report. One query and one update were both redundant.

Testing Instructions

  1. (optional) On main branch run just api/recreate
  2. Set DJANGO_DB_LOGGING=True on your .env file
  3. Restart the service to apply the change and see the logs
just down && just api/up && just logs web
  1. Report on any image, e.g.:
curl --request POST \
  --url http://localhost:50280/v1/images/<uuid>/report/ \
  --header 'Content-Type: application/json' \
  --data '{
	"reason": "mature",
	"description": "Image contains sensitive content"
}'
  1. Notice there are similar SELECT queries and an UPDATE statement right after the INSERT for the new report
openverse-web-1  | [2023-04-27 17:54:56,395 - django.db.backends - 131][DEBUG] [dcca485c44bf4916a23512d241c24379] (0.009) SELECT "image"."id", "image"."created_on", "image"."updated_on", "image"."identifier", "image"."foreign_identifier", "image"."title", "image"."foreign_landing_url", "image"."creator", "image"."creator_url", "image"."thumbnail", "image"."provider", "image"."url", "image"."filesize", "image"."filetype", "image"."watermarked", "image"."license", "image"."license_version", "image"."source", "image"."last_synced_with_source", "image"."removed_from_source", "image"."view_count", "image"."tags", "image"."category", "image"."meta_data", "image"."width", "image"."height", "api_matureimage"."created_on", "api_matureimage"."identifier" FROM "image" LEFT OUTER JOIN "api_matureimage" ON ("image"."identifier" = "api_matureimage"."identifier") WHERE "image"."identifier" = '3d6be27a03b64f138d7c1cdad8f0d9e9'::uuid LIMIT 21; args=(UUID('3d6be27a-03b6-4f13-8d7c-1cdad8f0d9e9'),); alias=default
openverse-web-1  | [2023-04-27 17:54:56,397 - django.db.backends - 131][DEBUG] [dcca485c44bf4916a23512d241c24379] (0.001) SELECT "image"."id", "image"."created_on", "image"."updated_on", "image"."identifier", "image"."foreign_identifier", "image"."title", "image"."foreign_landing_url", "image"."creator", "image"."creator_url", "image"."thumbnail", "image"."provider", "image"."url", "image"."filesize", "image"."filetype", "image"."watermarked", "image"."license", "image"."license_version", "image"."source", "image"."last_synced_with_source", "image"."removed_from_source", "image"."view_count", "image"."tags", "image"."category", "image"."meta_data", "image"."width", "image"."height" FROM "image" WHERE "image"."identifier" = '3d6be27a03b64f138d7c1cdad8f0d9e9'::uuid LIMIT 21; args=(UUID('3d6be27a-03b6-4f13-8d7c-1cdad8f0d9e9'),); alias=default
openverse-web-1  | [2023-04-27 17:54:56,399 - django.db.backends - 131][DEBUG] [dcca485c44bf4916a23512d241c24379] (0.000) SELECT 1 AS "a" FROM "image" WHERE "image"."identifier" = '3d6be27a03b64f138d7c1cdad8f0d9e9'::uuid LIMIT 1; args=(Int4(1), UUID('3d6be27a-03b6-4f13-8d7c-1cdad8f0d9e9')); alias=default
openverse-web-1  | [2023-04-27 17:54:56,403 - django.db.backends - 131][DEBUG] [dcca485c44bf4916a23512d241c24379] (0.004) INSERT INTO "nsfw_reports" ("created_at", "reason", "description", "status", "identifier") VALUES ('2023-04-27 17:54:56.399288+00:00'::timestamptz, 'mature', 'Image contains sensitive content', 'pending_review', '3d6be27a03b64f138d7c1cdad8f0d9e9'::uuid) RETURNING "nsfw_reports"."id"; args=(datetime.datetime(2023, 4, 27, 17, 54, 56, 399288, tzinfo=datetime.timezone.utc), 'mature', 'Image contains sensitive content', 'pending_review', UUID('3d6be27a-03b6-4f13-8d7c-1cdad8f0d9e9')); alias=default
openverse-web-1  | [2023-04-27 17:54:56,405 - django.db.backends - 131][DEBUG] [dcca485c44bf4916a23512d241c24379] (0.002) UPDATE "nsfw_reports" SET "status" = 'pending_review' WHERE ("nsfw_reports"."identifier" = '3d6be27a03b64f138d7c1cdad8f0d9e9'::uuid AND "nsfw_reports"."status" = 'pending_review' AND "nsfw_reports"."reason" = 'mature'); args=('pending_review', UUID('3d6be27a-03b6-4f13-8d7c-1cdad8f0d9e9'), 'pending_review', 'mature'); alias=default
openverse-web-1  | [2023-04-27 17:54:56,406 - log_request_id.middleware -  55][INFO] [dcca485c44bf4916a23512d241c24379] method=POST path=/v1/images/3d6be27a-03b6-4f13-8d7c-1cdad8f0d9e9/report/ status=201
openverse-web-1  | [27/Apr/2023 17:54:56] "POST /v1/images/3d6be27a-03b6-4f13-8d7c-1cdad8f0d9e9/report/ HTTP/1.1" 201 120
  1. Now check this branch and create another report
  2. Look at the logs and observe the DB statements are reduced.
  3. Verify on the Django admin that reports are created http://localhost:50280/admin/api/imagereport/

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@openverse-bot openverse-bot added this to In progress in Openverse PRs Apr 21, 2023
@github-actions github-actions bot added the 🧱 stack: api Related to the Django API label Apr 21, 2023
@zackkrida
Copy link
Member

@krysal limit 21 is a default for queries made by Django, I came across this while debugging that perf issue last month!

@krysal
Copy link
Member Author

krysal commented Apr 21, 2023

It seemed like that. Thank you, Zack! Found a reference here:

https://github.com/django/django/blob/92537e83c1322c40dd39a8f0f9c78018307f2102/django/db/models/query.py#L39

@krysal krysal added 🟨 priority: medium Not blocking but should be addressed soon 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users labels Apr 21, 2023
@krysal krysal marked this pull request as ready for review April 27, 2023 18:03
@krysal krysal requested a review from a team as a code owner April 27, 2023 18:03
@openverse-bot openverse-bot moved this from In progress to Needs review in Openverse PRs Apr 27, 2023
Copy link
Contributor

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tests great and the changes look good to me!

Thanks for the excellent testing instructions. I tried making a report on main and saw three SELECTs, an INSERT, and an UPDATE. On the branch I saw that one of the SELECTs and the redundant UPDATE were both eliminated. LGTM!


serializer = self.get_serializer(report)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what was going on in the original code here 🤔 But the changes look good to me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be the original author assumed that the serializer's data wouldn't be updated if the underlying object changed during save? The only reason I could think of for this is if one of the fields on the response was something that could be changed by saving, but none are:

fields = ["identifier", "reason", "description"]

Anyway, that's just a guess. I'm not sure what Django's behaviour is when saving the serializer changes the underlying data (due to the model's save method making changes), but in this case it definitely does not appear to matter.

Copy link
Contributor

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

Comment on lines +244 to +247

# Prevent redundant update statement when creating the report
if self.status != PENDING:
same_reports.update(status=self.status)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this would be clearer if the status check was before the query on line 238 and caused an early return? Something like:

if self.status == PENDING:
    return

Only because then it is quite clear that the intention is that pending reports should have no effect on other reports.


serializer = self.get_serializer(report)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be the original author assumed that the serializer's data wouldn't be updated if the underlying object changed during save? The only reason I could think of for this is if one of the fields on the response was something that could be changed by saving, but none are:

fields = ["identifier", "reason", "description"]

Anyway, that's just a guess. I'm not sure what Django's behaviour is when saving the serializer changes the underlying data (due to the model's save method making changes), but in this case it definitely does not appear to matter.

Openverse PRs automation moved this from Needs review to Reviewer approved Apr 27, 2023
@sarayourfriend
Copy link
Contributor

sarayourfriend commented Apr 28, 2023

@krysal I'm merging this to test the DAG sync script fix. I'll test this in staging to confirm things are working as expected after it deploys 👍

@sarayourfriend sarayourfriend merged commit 59a4411 into main Apr 28, 2023
42 checks passed
@sarayourfriend sarayourfriend deleted the improv/report_endpoints branch April 28, 2023 01:38
Openverse PRs automation moved this from Reviewer approved to Merged! Apr 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API
Projects
No open projects
Openverse PRs
  
Merged!
Development

Successfully merging this pull request may close these issues.

Report view makes unnecessary db queries
4 participants