Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add API schema tests using schemathesis #3444

Merged
merged 18 commits into from Feb 29, 2024
Merged

Add API schema tests using schemathesis #3444

merged 18 commits into from Feb 29, 2024

Conversation

obulat
Copy link
Contributor

@obulat obulat commented Dec 3, 2023

Fixes

Fixes #3446 by @obulat

Description

This PR adds the schemathesis step to the API testing in the CI.
The checks run for 2 flags: response_headers_conformance and not_a_server_error. I changed the related endpoint to fix the not_a_server_error problem.

The Schemathesis docs say that running the tests as part of the Pytest test suite is possible, but I couldn't get it to run, so I added the CLI. I would be happy if someone could find a way to run the tests as part of the test suite.
This PR adds Schemathesis tests running as part of the pytest suite (thanks to @sarayourfriend ). It checkes that all of the response codes are documented in the openapi schema correctly.

The code is also updated to make the tests pass.

Testing Instructions

Check out this PR and run just api/test. The tests should run the schema tests and should pass.

If you revert the changes to the related function and run just api/test-schema, you should see the following output:
This is for the first implementation of this as part of the just script in CI.

Schemathesis output ``` ============================================================ Schemathesis test session starts =========================================================== Schema location: file:///Users/obulat/Documents/openverse/openverse/api/openapi.json Base URL: http://localhost:50280 Specification version: Open API 3.0.3 Workers: 1 Collected API operations: 24

GET /v1/audio/ . [ 4%]
GET /v1/audio/{identifier}/ . [ 8%]
GET /v1/audio/{identifier}/related/ . [ 12%]
POST /v1/audio/{identifier}/report/ . [ 16%]
GET /v1/audio/{identifier}/thumb/ . [ 20%]
GET /v1/audio/{identifier}/waveform/ . [ 25%]
GET /v1/audio/source/{source}/ . [ 29%]
GET /v1/audio/source/{source}/creator/{creator}/ . [ 33%]
GET /v1/audio/stats/ . [ 37%]
GET /v1/audio/tag/{tag}/ . [ 41%]
POST /v1/auth_tokens/register/ . [ 45%]
POST /v1/auth_tokens/token/ . [ 50%]
GET /v1/images/ . [ 54%]
GET /v1/images/{identifier}/ . [ 58%]
GET /v1/images/{identifier}/related/ . [ 62%]
POST /v1/images/{identifier}/report/ . [ 66%]
GET /v1/images/{identifier}/thumb/ . [ 70%]
GET /v1/images/{identifier}/watermark/ . [ 75%]
GET /v1/images/oembed/ . [ 79%]
GET /v1/images/source/{source}/ . [ 83%]
GET /v1/images/source/{source}/creator/{creator}/ . [ 87%]
GET /v1/images/stats/ . [ 91%]
GET /v1/images/tag/{tag}/ . [ 95%]
GET /v1/rate_limit/ . [100%]

======================================================================== SUMMARY ========================================================================

Performed checks:
response_headers_conformance 2103 / 2103 passed PASSED

WARNINGS:

  • Most of the responses from POST /v1/auth_tokens/token/ have a 401 status code. Did you specify proper API credentials?
  • Most of the responses from GET /v1/rate_limit/ have a 403 status code. Did you specify proper API credentials?

Tip: Use the --report CLI option to visualize test results via Schemathesis.io.
We run additional conformance checks on reports from public repos.

================================================================== 24 passed in 29.73s ==================================================================
============================================================ Schemathesis test session starts ===========================================================
Schema location: file:///Users/obulat/Documents/openverse/openverse/api/openapi.json
Base URL: http://localhost:50280
Specification version: Open API 3.0.3
Workers: 1
Collected API operations: 24

GET /v1/audio/ . [ 4%]
GET /v1/audio/{identifier}/ . [ 8%]
GET /v1/audio/{identifier}/related/ F [ 12%]
POST /v1/audio/{identifier}/report/ . [ 16%]
GET /v1/audio/{identifier}/thumb/ . [ 20%]
GET /v1/audio/{identifier}/waveform/ . [ 25%]
GET /v1/audio/source/{source}/ . [ 29%]
GET /v1/audio/source/{source}/creator/{creator}/ . [ 33%]
GET /v1/audio/stats/ . [ 37%]
GET /v1/audio/tag/{tag}/ . [ 41%]
POST /v1/auth_tokens/register/ . [ 45%]
POST /v1/auth_tokens/token/ . [ 50%]
GET /v1/images/ . [ 54%]
GET /v1/images/{identifier}/ . [ 58%]
GET /v1/images/{identifier}/related/ F [ 62%]
POST /v1/images/{identifier}/report/ . [ 66%]
GET /v1/images/{identifier}/thumb/ . [ 70%]
GET /v1/images/{identifier}/watermark/ . [ 75%]
GET /v1/images/oembed/ . [ 79%]
GET /v1/images/source/{source}/ . [ 83%]
GET /v1/images/source/{source}/creator/{creator}/ . [ 87%]
GET /v1/images/stats/ . [ 91%]
GET /v1/images/tag/{tag}/ . [ 95%]
GET /v1/rate_limit/ . [100%]

======================================================================== FAILURES =======================================================================
__________________________________________________________ GET /v1/audio/{identifier}/related/ __________________________________________________________

  1. Test Case ID: 1IqpOZ
  • Server error

[500] Internal Server Error:

`{"detail":"Could not find items."}`

Reproduce with:

curl -X GET http://localhost:50280/v1/audio/00000000-0000-4000-8000-000000000000/related/

Or add this option to your command line parameters: --hypothesis-seed=110085890912020084465361027240064100643
__________________________________________________________ GET /v1/images/{identifier}/related/ _________________________________________________________

  1. Test Case ID: sLiRsm
  • Server error

[500] Internal Server Error:

`{"detail":"Could not find items."}`

Reproduce with:

curl -X GET http://localhost:50280/v1/images/00000000-0000-4000-8000-000000000000/related/

Or add this option to your command line parameters: --hypothesis-seed=121590553949486563256538290782721953302
======================================================================== SUMMARY ========================================================================

Performed checks:
not_a_server_error 1903 / 1909 passed FAILED

WARNINGS:

  • Most of the responses from POST /v1/auth_tokens/token/ have a 401 status code. Did you specify proper API credentials?
  • Most of the responses from GET /v1/rate_limit/ have a 403 status code. Did you specify proper API credentials?
</details>

## Checklist
<!-- Replace  the [ ] with [x] to check the boxes. -->

- [ ] My pull request has a descriptive title (not a vague title like`Update index.md`).
- [ ] My pull request targets the _default_ branch of the repository (`main`) or a parent feature branch.
- [ ] My commit messages follow [best practices][best_practices].
- [ ] My code follows the established code style of the repository.
- [ ] I added or updated tests for the changes I made (if applicable).
- [ ] I added or updated documentation (if applicable).
- [ ] I tried running the project locally and verified that there are no visible errors.
- [ ] I ran the DAG documentation generator (if applicable).

[best_practices]:
  https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines

## Developer Certificate of Origin
<!-- You must read and understand the following attestation. -->

<details>
<summary>Developer Certificate of Origin</summary>

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
have the right to submit it under the open source license
indicated in the file; or

(b) The contribution is based upon previous work that, to the best
of my knowledge, is covered under an appropriate open source
license and I have the right under that license to submit that
work with modifications, whether created in whole or in part
by me, under the same open source license (unless I am
permitted to submit under a different license), as indicated
in the file; or

(c) The contribution was provided directly to me by some other
person who certified (a), (b) or (c) and I have not modified
it.

(d) I understand and agree that this project and the contribution
are public and that a record of the contribution (including all
personal information I submit with it, including my sign-off) is
maintained indefinitely and may be redistributed consistent with
this project or the open source license(s) involved.


</details>

@github-actions github-actions bot added 🧱 stack: api Related to the Django API 🧱 stack: mgmt Related to repo management and automations labels Dec 3, 2023
@openverse-bot openverse-bot added the 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work label Dec 3, 2023
@obulat obulat changed the title Add schemathesis tests Add API schema tests using schemathesis Dec 3, 2023
@obulat obulat added 🌟 goal: addition Addition of new feature 💻 aspect: code Concerns the software code in the repository 🟨 priority: medium Not blocking but should be addressed soon and removed 🚦 status: awaiting triage Has not been triaged & therefore, not ready for work labels Dec 3, 2023
@obulat obulat force-pushed the add/schemathesis branch 3 times, most recently from 4914336 to d8df0bb Compare December 3, 2023 18:01
@obulat obulat marked this pull request as ready for review December 3, 2023 18:10
@obulat obulat requested review from a team as code owners December 3, 2023 18:10
@obulat
Copy link
Contributor Author

obulat commented Dec 3, 2023

It seems that the tests change between the runs because oembed 500 error occurred only in a CI run

@obulat obulat force-pushed the add/schemathesis branch 2 times, most recently from 5242a61 to 2788980 Compare December 7, 2023 04:35
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes to the API code LGTM! But I have some notes about improvements to the execution of the test and provisioning of schemathesis.

api/justfile Outdated
Comment on lines 125 to 132
test-schema: wait-up
just doc-test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doc-test can be changed to a pre-requisite recipe here because we depend on the output of that recipe openapi.json to run this step.

Suggested change
test-schema: wait-up
just doc-test
test-schema: wait-up doc-test

api/justfile Outdated
# Run the API schema tests using schemathesis
test-schema: wait-up
just doc-test
python -m pip install schemathesis && st run --checks response_headers_conformance openapi.json --base-url http://localhost:50280 && st run --checks not_a_server_error openapi.json --base-url http://localhost:50280
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This step seems installs the schemathesis package on the host Python environment, which is not ideal. It's also not guaranteed to work on macOS for example, where there is no unversioned python (but python3 is present across all platforms afaik).

Screenshot 2023-12-11 at 7 43 24 AM

The Getting Started guide lists a bunch of ways to run this, including a Docker image. That seems like a more cross-platform way to run the check that does not affect the host.

Comment on lines 407 to 437
- name: Run API schema tests
run: just api/test-schema
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would fit better as a matrix entry in the django-checks job below, as those are all checks based on just recipes. Additionally the recipe this depends on i.e. doc-test is also a job in that matrix.

@AetherUnbound
Copy link
Contributor

Drafting this until feedback can be addressed

@AetherUnbound AetherUnbound marked this pull request as draft December 15, 2023 01:15
@sarayourfriend sarayourfriend force-pushed the add/schemathesis branch 2 times, most recently from cb53010 to affc501 Compare January 11, 2024 23:09
@sarayourfriend
Copy link
Contributor

@obulat I'm playing with this PR locally right now. I pushed a change to run the tests in Docker, but we can actually just integrate it into the API's pytest suite using their Python API. Trying that out now.

@sarayourfriend
Copy link
Contributor

Okay! All tests except the auth token ones are passing.

You can run the schemathesis tests with just api/test -k test_schema, because they're just part of the normal API test suite.

I was able to do some cleanup here of the error serializers, which aren't necessary, because drf-spectuacular understands DRF's exception classes. I found this because the validation error serializer did not match the schema of 400 responses, which schemathesis didn't like.

Right now I've had repeated failures with the creator path, because schemathesis keeps trying to pass the newline character in percent encoding to the creator path parameter. I'm not sure why it's doing that, or how to prevent. It also seems like it might be flaky.

@obulat
Copy link
Contributor Author

obulat commented Feb 12, 2024

@sarayourfriend, I really like having the tests run by pytest. However, having seen the amount of warnings of deprecations coming from them, I wonder if the pytest part of Schemathesis is not maintained as well as the main repository. Hopefully, it won't fall behind more, and not make us regret moving this test to pytest runner.

@sarayourfriend
Copy link
Contributor

I didn't notice the deprecation warnings 🤔

@AetherUnbound
Copy link
Contributor

Actually, it seems like when I run j api/test the test fails, but I can run j api/test -k test_media_integration and all tests pass 🤔 is anyone else seeing this?

@openverse-bot
Copy link
Collaborator

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@dhruvkb
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 8 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@obulat
Copy link
Contributor Author

obulat commented Feb 28, 2024

@AetherUnbound, some tests rely on the sample data, and if your local sample data was changed in another PR (for instance, to add some special characters to the creator/tag values), the tests might fail until you update the indices.

Regardless, I was baffled that the testing code was so small 😮 🚀

Now I see why there's so much talk about property-based testing :)

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉 for the most part but I have some questions and some concerns. Since the additional checks from schemathesis are good to have, I'm not blocking the PR.

@@ -11,3 +11,4 @@ docs/_build/
# OpenAPI
openapi.yaml
openapi.json
.hypothesis
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be in a separate Schemathesis block. Currently it looks like something from OpenAPI.

api/Pipfile Outdated
@@ -6,16 +6,17 @@ verify_ssl = true
[dev-packages]
factory-boy = "~=3.2"
fakeredis = "==2.19.0"
freezegun = "~=1.2.2"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sorting this list. We usually pin dependencies only up to the minor version (same for schemathesis below).

@@ -114,6 +114,41 @@ class Meta:
AudioHyperlinksSerializer = get_hyperlinks_serializer("audio")


class AudioAltFileSerializer(serializers.Serializer):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be defined as a subclass of AudioSerializer with different Meta.fields. I think it's worth exploring that possibility to reduce the code as the fields will not have to be defined manually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't it just be a model serializer, with no new code needed at all? Otherwise we'd create a circular dependency on AudioSerializer and the serializer that subclasses, because AudioSerializer needs to reference it in the alt_files definition.

class AltAudioFileSerializer(serializers.ModelSerializer):
    class Meta:
        model = models.AltAudioFile

It might need fields, depending on needs, but it certainly doesn't need any more than this, I hope!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried fixing this in this PR, but got confused because making this a subclass of AudioSerializer would make it a recursive serializer. I'll open a new issue for this.

from rest_framework.serializers import ValidationError
from rest_framework.exceptions import ValidationError
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please correct me if I'm mistaken but I thought the practice was to raise ValidationError from rest_framework.serializers as per these docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serializers.ValidationError is an alias for the exceptions.ValidationError. The only good reason to use serializers.ValidationError, as in the docs, is because the convention in serializer definition modules is to from rest_framework import serializers, and so it's convenient and consistent to address it from the alias in that module. There's no reason not to use the exceptions module, though, and I think it makes particular sense in a file that otherwise has nothing to do with serialization (it's named exceptions.py).

At least, that's my interpretation of why the docs are written that way, and why I would choose to write code one way or the other, but I also don't think there's a reason for/against always using one or the other, and there's nothing in the docs you linked that explicitly implies you should use one or the other, aside from the examples using the alias (I suspect for the reason above).

obulat and others added 17 commits February 29, 2024 08:13
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
Signed-off-by: Olga Bulat <obulat@gmail.com>
@obulat obulat merged commit 2747615 into main Feb 29, 2024
41 checks passed
@obulat obulat deleted the add/schemathesis branch February 29, 2024 06:28
Copy link
Contributor

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@obulat I had some comments staged on this PR, including a tip to fix the issue with the circular serializer. Can you create an issue to follow up on that, to create a model serializer instead of manually defining the videos?

Comment on lines +170 to +173
try:
res = super().post(request._request)
except DataError:
raise InvalidCredentials()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, on the other hand, would be unnecessary after #3663

@@ -114,6 +114,41 @@ class Meta:
AudioHyperlinksSerializer = get_hyperlinks_serializer("audio")


class AudioAltFileSerializer(serializers.Serializer):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't it just be a model serializer, with no new code needed at all? Otherwise we'd create a circular dependency on AudioSerializer and the serializer that subclasses, because AudioSerializer needs to reference it in the alt_files definition.

class AltAudioFileSerializer(serializers.ModelSerializer):
    class Meta:
        model = models.AltAudioFile

It might need fields, depending on needs, but it certainly doesn't need any more than this, I hope!

@obulat
Copy link
Contributor Author

obulat commented Mar 19, 2024

@sarayourfriend, sorry I only now read your comments here.

I think we cannot use a model serializer because AltAudioFile is a child of AbstractAltFile class which is not a Django Model:

class AbstractAltFile:
"""
This is not a Django model.
This Python class serves as the schema for an alternative file. An alt file
provides alternative qualities, formats and resolutions that are available
from the provider that are not canonical.
The schema of the class must correspond to that of the
:py:class:`api.models.mixins.FileMixin` class.
"""

@dhruvkb, is there a reason why the AbstractAltFile is not based on the FileMixin and thus a subclass of Model?

@dhruvkb
Copy link
Member

dhruvkb commented Mar 19, 2024

If any class inherits from Model, Django will try to create a table (and migrations) for it, which we do not need for this class. That seems like the only explanation for this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: api Related to the Django API 🧱 stack: mgmt Related to repo management and automations
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Use property-based testing for the API
5 participants