Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem: Storage service File API endpoint response has changed in qa/1.x (returns "string type" for size not integer #1094

Closed
5 tasks
ross-spencer opened this issue Feb 10, 2020 · 9 comments
Labels
API Issues relating to the Archivematica APIs, bugs/new endpoints, etc. Status: review The issue's code has been merged and is ready for testing/review. Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result. Type: regression A bug that is a regression of supported functionality from previous releases
Milestone

Comments

@ross-spencer
Copy link
Contributor

Expected behaviour

API contracts are not broken between releases in Archivematica. The size field for information about an AIP should return an integer type.

Current behaviour

The size field returns a string-type.

This was introduced via artefactual/archivematica-storage-service@d54716c which on the face of it, does not seem to be a controversial change. In hindsight we should have considered it in greater detail, and was signaled with the change in Archivematica here: artefactual/archivematica@473bb9d#diff-2ffb0ea23203dd32f7be41aa919e77a7.

Steps to reproduce

Construct an API call that is being used by Archivematica (this uses HTTPie):

http -v --pretty=format GET "http://127.0.0.1:62081/api/v2/file/7aa7bb5e-ed3c-4049-83e9-94c445ffaa58/" \
    Authorization:"ApiKey test:test"

Observe the response:

HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: keep-alive
Content-Language: en
Content-Type: application/json
Date: Mon, 10 Feb 2020 14:40:01 GMT
Server: nginx/1.16.1
Transfer-Encoding: chunked
Vary: Accept, Accept-Language, Cookie
X-Frame-Options: SAMEORIGIN

{
    "current_full_path": "/var/archivematica/sharedDirectory/www/AIPsStore/7aa7/bb5e/ed3c/4049/83e9/94c4/45ff/aa58/2-7aa7bb5e-ed3c-4049-83e9-94c445ffaa58.7z",
    "current_location": "/api/v2/location/dd63e694-8625-40ca-8c2e-6c3ddab547a8/",
    "current_path": "7aa7/bb5e/ed3c/4049/83e9/94c4/45ff/aa58/2-7aa7bb5e-ed3c-4049-83e9-94c445ffaa58.7z",
    "encrypted": false,
    "misc_attributes": {},
    "origin_pipeline": "/api/v2/pipeline/0e07d9f7-0d06-4b67-8dae-afc51eea4737/",
    "package_type": "AIP",
    "related_packages": [],
    "replicas": [],
    "replicated_package": null,
    "resource_uri": "/api/v2/file/7aa7bb5e-ed3c-4049-83e9-94c445ffaa58/",

    "size": "156730", <---- NB. This is our string response...
     ^^^^
    "status": "UPLOADED",
    "uuid": "7aa7bb5e-ed3c-4049-83e9-94c445ffaa58"
}

For the size field other consumers expect:

    "size": 156730, <---- NB. This is our previous integer response...
     ^^^^
    "status": "UPLOADED",
    "uuid": "7aa7bb5e-ed3c-4049-83e9-94c445ffaa58"
}

Which means currently this is broken for anyone who handles this response expecting an integer (as we previously did).

Your environment (version of Archivematica, operating system, other relevant details)

Docker compose running various commits of AM before and after the one indeitifed.

Additional context

  • The BigInteger field has a size of 9223372036854775807, via Django.
  • The integer field can represent values up to 2147483647. via Django.
  • I haven't found confirmation that integers in JSON have a concrete maximum/minimum size.
  • I haven't found confirmation about whether adopting the string type preserves the integrity of this field across 32-bit and 64-bit implementations of Python, which on the face of it seems likely, but it might not be.

In terms of fixing this, then:

  • there maybe some way to override the return. NB. does this affect integrity on other platforms.
  • we might otherwise need to roll the commit back.

Attaching the discussion label. cc. @@jorikvankemenade as well for opinions.


For Artefactual use:

Before you close this issue, you must check off the following:

  • All pull requests related to this issue are properly linked
  • All pull requests related to this issue have been merged
  • A testing plan for this issue has been implemented and passed (testing plan information should be included in the issue body or comments)
  • Documentation regarding this issue has been written and merged (if applicable)
  • Details about this issue have been added to the release notes (if applicable)
@ross-spencer ross-spencer added Type: regression A bug that is a regression of supported functionality from previous releases API Issues relating to the Archivematica APIs, bugs/new endpoints, etc. Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result. triage-release-1.11 Request: discussion The path towards resolving the issue is unclear and opinion is sought from other community members. labels Feb 10, 2020
@sallain sallain assigned sallain and jraddaoui and unassigned sallain Feb 10, 2020
@sallain sallain added this to the 1.11.0 milestone Feb 10, 2020
@sallain sallain added Status: ready The issue is sufficiently described/scoped to be picked up by a developer. and removed triage-release-1.11 labels Feb 10, 2020
@ross-spencer
Copy link
Contributor Author

Original work was related to #981

@jorikvankemenade
Copy link

I remember this one, sorry for not making my considerations more clear in an issue/the pr.

API contracts are not broken between releases in Archivematica. The size field for information about an AIP should return an integer type.

I partially agree. You should not break a contract unless you have a very good reason. And I would like to argue that we have a good reason here. Correctly handling and displaying the size of an AIP for me is stronger than an "undocumented expectation". Of course, this does not mean that we shouldn't try to make the API as intuitive and stable as possible.

As @replaceafill points out in #981, switching from MySQL 5.6 to 5.7 changes the default SQL mode which causes an error when inserting a value that is too large. However, in earlier versions of MySQL, the problem is there, you are just not notified during the insert. From my understanding, the Storage Service will move to a MySQL based database rather sooner than later. This would mean that rolling back this change, would result in ingest problems for reasonable transfers.

I must admit that I have been somewhat lazy. I have tried to see if I could find a way to force the type of the JSON response. I could not find an easy way to do it, and since I only broke one internal call I decided that it was simpler to fix rather than spending a lot of time on it.

So to summarise: yes we break the contract. However, given that we fix an actual bug over not breaking keeping an undocumented expectation I would like to argue that this is worth it. But maybe someone more experienced with Django and the API implementation in Archivematica could solve the new problem?

@jorikvankemenade
Copy link

I just checked #1507 and here I mentioned that the problem is caused by how the requests library the parsing of the resulting JSON. IIRC the JSON was unchanged, the problem is how the requests library decides to parse the JSON. Which I think is outside our control. But I might have missed something.

@ross-spencer
Copy link
Contributor Author

Perfect! Thank you @jorikvankemenade! All valid, and welcome points. Radda is going to take a look at this one and I'm sure the pragmatic path will be found. 🙂

@jraddaoui
Copy link

Thanks @ross-spencer and @jorikvankemenade, great report and comments!

I agree with Jorik that we should not revert the initial change but I'm having conflicts about keeping it as a string or force it as a number. I think we need to consider who is the bigger consumer of the affected endpoints and that's hard to know, at least for me.

As Ross said, it doesn't seem to be any limitation on the JSON number specification but it looks like there may be some issues on the Javascript side, for example. I also wonder if the value is being casted to an integer by the receiver, if that may cause an overflow error in some languages.

Considering that changes may be needed in the consumer either way (number or string) and that we'll need to "hack" django-tastypie's behavior to make it a number (something that they even have a PR from 2011 and never addressed it), I'd leave it as a string.


Some interesting Q&A:

@sevein
Copy link
Contributor

sevein commented Feb 18, 2020

I have tried to see if I could find a way to force the type of the JSON response.

This would do the trick:

diff --git a/storage_service/locations/api/resources.py b/storage_service/locations/api/resources.py
index 1a2da9a..cf91930 100644
--- a/storage_service/locations/api/resources.py
+++ b/storage_service/locations/api/resources.py
@@ -909,6 +909,12 @@ class PackageResource(ModelResource):
         bundle.data["encrypted"] = encrypted
         return bundle
 
+    def dehydrate_size(self, bundle):
+        try:
+            return int(bundle.data["size"])
+        except (ValueError, TypeError):
+            return -1  # Or pass if we don't want to populate.
+
     def hydrate_current_location(self, bundle):
         """Customize unserialization of current_location.

It's pretty easy and a common approach in Tastypie. Thoughts?

@jorikvankemenade
Copy link

jorikvankemenade commented Feb 18, 2020

@sevein that looks promising. I was unaware of this feature of Django/Tastypie, thanks for sharing :). For anyone interested in what the hydrate/dehydrate concepts are, check out the Tastypie documentation.

@jraddaoui jraddaoui assigned sevein and unassigned jraddaoui Feb 18, 2020
@sromkey sromkey added Status: in progress Issue that is currently being worked on. and removed Status: ready The issue is sufficiently described/scoped to be picked up by a developer. labels Feb 24, 2020
sevein added a commit to artefactual/archivematica-storage-service that referenced this issue Feb 26, 2020
Manually dehydrate `BigIntegerField` as an integer since Tastypie can't do
that for us (django-tastypie/django-tastypie#299).

Connects to archivematica/Issues#1094.
sevein added a commit to artefactual/archivematica-storage-service that referenced this issue Feb 26, 2020
Manually dehydrate `BigIntegerField` as an integer since Tastypie can't do
that for us (django-tastypie/django-tastypie#299).

Connects to archivematica/Issues#1094.
sevein added a commit to artefactual/archivematica-storage-service that referenced this issue Feb 26, 2020
Manually dehydrate `BigIntegerField` as an integer since Tastypie can't do
that for us (django-tastypie/django-tastypie#299).

Connects to archivematica/Issues#1094.
@sevein sevein added Status: review The issue's code has been merged and is ready for testing/review. and removed Status: in progress Issue that is currently being worked on. labels Feb 26, 2020
@sevein
Copy link
Contributor

sevein commented Feb 26, 2020

Ready for QA. Follow the same steps that Ross shared to reproduce!

@replaceafill
Copy link
Member

Verified! size attribute is an integer again.

$ http -v --pretty=format GET "http://localhost:62081/api/v2/file/6149c4df-fda1-420b-956b-3a75abbe0ffe/" Authorization:"ApiKey test:test"
GET /api/v2/file/6149c4df-fda1-420b-956b-3a75abbe0ffe/ HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Authorization: ApiKey test:test
Connection: keep-alive
Host: localhost:62081
User-Agent: HTTPie/0.9.8



HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: keep-alive
Content-Language: en
Content-Type: application/json
Date: Thu, 27 Feb 2020 22:13:15 GMT
Server: nginx/1.14.2
Transfer-Encoding: chunked
Vary: Accept, Accept-Language, Cookie
X-Frame-Options: SAMEORIGIN

{
    "current_full_path": "/var/archivematica/sharedDirectory/www/AIPsStore/6149/c4df/fda1/420b/956b/3a75/abbe/0ffe/pictures-6149c4df-fda1-420b-956b-3a75abbe0ffe.7z",
    "current_location": "/api/v2/location/c2dd8f5b-a095-4ba9-bf09-3aff9a3bb435/",
    "current_path": "6149/c4df/fda1/420b/956b/3a75/abbe/0ffe/pictures-6149c4df-fda1-420b-956b-3a75abbe0ffe.7z",
    "encrypted": false,
    "misc_attributes": {},
    "origin_pipeline": "/api/v2/pipeline/fa3f8d23-2fea-4434-b231-dd5289b162f0/",
    "package_type": "AIP",
    "related_packages": [
        "/api/v2/file/544e5a38-304a-440f-bd6d-83de3e609d4e/"
    ],
    "replicas": [],
    "replicated_package": null,
    "resource_uri": "/api/v2/file/6149c4df-fda1-420b-956b-3a75abbe0ffe/",
    "size": 5059319,
    "status": "UPLOADED",
    "uuid": "6149c4df-fda1-420b-956b-3a75abbe0ffe"
}

@sallain sallain removed the Request: discussion The path towards resolving the issue is unclear and opinion is sought from other community members. label Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Issues relating to the Archivematica APIs, bugs/new endpoints, etc. Status: review The issue's code has been merged and is ready for testing/review. Type: bug A flaw in the code that causes the software to produce an incorrect or unexpected result. Type: regression A bug that is a regression of supported functionality from previous releases
Projects
None yet
Development

No branches or pull requests

7 participants