
@krowvin (Collaborator) commented Mar 12, 2025

ISSUE

When testing BLOB storage and retrieval with cwms-python 0.6.0, the API was unable to retrieve the BLOB and failed with the following error:

ERROR:root:Error decoding CDA response as JSON: Expecting value: line 1 column 3 (char 2) on line 1

This happened because the blobs endpoint returns a Content-Type of application/octet-stream for a GET request, not JSON.

Seen here:

$ curl -X 'GET' 'https://cwms-data.usace.army.mil/cwms-data/blobs/GATECHANGES.XML?office=SWT'  -H 'accept: */*' -I
HTTP/1.1 200 
Strict-Transport-Security: max-age=31536000;includeSubDomains
X-Frame-Options: SAMEORIGIN
X-Content-Type-Options: nosniff
X-XSS-Protection: 1; mode=block
Cache-Control: max-age=300
ETag: 3579315270
Content-Type: application/octet-stream
Content-Length: 863
Date: Wed, 12 Mar 2025 03:58:43 GMT
Server: webthing

The get method as it existed before always tried to force the response into JSON, even when the response was not JSON.
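A minimal standard-library reproduction of that failure mode: the hypothetical naive_parse helper below stands in for the old behaviour of always decoding the body as JSON, and the XML payload is made up for illustration (it is not the real GATECHANGES.XML content).

```python
import json

# Made-up XML body standing in for a BLOB such as GATECHANGES.XML.
xml_body = b"<?xml version='1.0'?><gatechanges/>"


def naive_parse(body: bytes):
    """Hypothetical stand-in for the old get(): always parse the body as JSON."""
    return json.loads(body.decode("utf-8"))


try:
    naive_parse(xml_body)
except json.JSONDecodeError as err:
    # Same class of error as the "Error decoding CDA response as JSON" log above.
    print(f"Error decoding response as JSON: {err}")
```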

FIX

To fix this, the get method in api.py now checks the Content-Type header of the response and decides dynamically how to decode the response data.
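A rough sketch of content-type dispatch, assuming the decoder only needs the Content-Type header string and the raw body bytes. parse_response_body is a hypothetical name, not the function in api.py; with requests you would pass response.headers.get("Content-Type", "") and response.content.

```python
import base64
import json
from typing import Any


def parse_response_body(content_type: str, body: bytes) -> Any:
    """Hypothetical helper: pick a decoder based on the Content-Type header."""
    # Strip parameters such as "; charset=utf-8" or ";version=2" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "application/json" or mime.endswith("+json"):
        # JSON bodies become Python dicts/lists, as before.
        return json.loads(body)
    if mime.startswith("image/"):
        # Binary images are handed back as a base64 string.
        return base64.b64encode(body).decode("utf-8")
    # text/*, XML, application/octet-stream, etc.: return the raw text.
    return body.decode("utf-8")


print(parse_response_body("application/json;version=2", b'{"office": "SWT"}'))  # {'office': 'SWT'}
print(parse_response_body("application/octet-stream", b"<gate/>"))  # <gate/>
```

Note that the final branch assumes octet-stream payloads are UTF-8 text; genuinely binary data would raise UnicodeDecodeError there.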

MISSING CDA TYPES

Currently CDA only lets you store and retrieve application/octet-stream. I opened an issue for this on the CDA repo here:

SUMMARY OF CHANGES

I also took the opportunity to:

  1. Make sure store_blob casts the value to a base64-encoded string if it is not already encoded. The API requires this to store the value; otherwise you get a serialization error in the logs and a 500 status.
  2. Change the status-code check to response.ok, since requests already follows 3xx redirects and, in my opinion, anything below 400 is reasonable to accept (response.ok is True for any status below 400). I'm not dead set on this; we could spell it out as status_code < 400 if you prefer the explicitness, but not the < 300 check we had.
  3. Add notes to the BLOB store/get docstrings so users know the ID is uppercased on store and must be uppercase on retrieval.
  4. Switch from calling response.close() to a context manager, which ensures no resources or connections leak.
  5. Use response.text, response.content, or response.json() depending on the returned Content-Type, and update get_xml to use the new get method for backwards compatibility.
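A sketch of the "encode unless already encoded" step from item 1, using only the base64 module. The helper names are hypothetical and the detection is a heuristic of my own, not necessarily what store_blob does: a string counts as already encoded if it decodes as strict base64 and re-encodes to itself.

```python
import base64
from typing import Any, Dict


def looks_like_base64(value: str) -> bool:
    """Heuristic: True if value is strict base64 that re-encodes to itself."""
    try:
        decoded = base64.b64decode(value, validate=True)
    except ValueError:  # binascii.Error subclasses ValueError; non-ASCII str raises ValueError too
        return False
    return base64.b64encode(decoded).decode("ascii") == value


def prepare_blob_payload(data: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of data with "value" base64-encoded."""
    payload = dict(data)  # do not mutate the caller's dict
    value = payload["value"]
    if isinstance(value, bytes):
        payload["value"] = base64.b64encode(value).decode("ascii")
    elif not looks_like_base64(value):
        payload["value"] = base64.b64encode(value.encode("utf-8")).decode("ascii")
    return payload
```

The heuristic is inherently ambiguous: short plain strings such as "TEST" are also valid base64 and would be sent unchanged, so an explicit flag may be safer in practice.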

TESTS

I wrote a quick script, along with some mock tests for store_blob and several other endpoints, to confirm that the payload is built correctly (base64-encoded value) and that I did not introduce any other breaking changes.

Here is that script for reference:

import sys

import cwms

cwms.init_session()

data = {
    "office-id": "SWT",
    "id": "MYFILE_OR_BLOB_ID.TXT",
    "description": "Your description here",
    "media-type-id": "application/octet-stream",
    "value": "STRING of content or BASE64_ENCODED_STRING",
}
# cwms.store_blobs(data, fail_if_exists=False)

# sys.exit()
changes = cwms.get_blob("GATECHANGES.XML", "SWT")
print(changes)

xml_catalog = cwms.get_blobs(office_id="SWT", blob_id_like="*.XML")
print(xml_catalog.json)

timeseries = cwms.get_timeseries(
    office_id="SWT", ts_id="KEYS.Elev.Inst.1Hour.0.Ccp-Rev"
)
print(timeseries.df)

location = cwms.get_location("KEYS", "SWT")
print(location.df)


outlet = cwms.get_outlet("SWT", "KEYS")
print(outlet.df)
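The mock tests themselves are not included above. A minimal sketch of the idea, using unittest.mock to stand in for the HTTP session, could look like this; store_blob_via and the "blobs" endpoint string are hypothetical and do not reflect cwms-python's actual internals. For brevity it always encodes plain strings.

```python
import base64
import unittest
from unittest.mock import Mock


def store_blob_via(session, data: dict) -> None:
    """Hypothetical store function: base64-encode str values, then POST the payload."""
    payload = dict(data)
    if isinstance(payload["value"], str):
        # Simplification for this sketch: always encode plain strings.
        payload["value"] = base64.b64encode(payload["value"].encode("utf-8")).decode("ascii")
    session.post("blobs", json=payload)


class StoreBlobPayloadTest(unittest.TestCase):
    def test_value_is_base64_encoded(self):
        session = Mock()  # records calls instead of hitting CDA
        store_blob_via(session, {"office-id": "SWT", "id": "TEST.TXT", "value": "hello"})
        _, kwargs = session.post.call_args  # (args, kwargs) of the last call
        self.assertEqual(kwargs["json"]["value"], "aGVsbG8=")
        self.assertEqual(kwargs["json"]["id"], "TEST.TXT")
```

Saved as a test module, this runs with python -m unittest without any network access.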

@krowvin changed the title from "Fix BLOB store, Update API.py GET" to "Fix BLOB store/get, Update API.py GET" Mar 12, 2025
@krowvin added the blocking label (A district needs this to move forward with cloud migration) Mar 12, 2025
@krowvin (Collaborator, Author) commented Mar 12, 2025

I'm unsure whether it makes more sense for get_clob to return an object or to pass the string through.

I tried annotating the get handler's return type as a Union of JSON and str, but mypy raised a few errors in downstream functions that expect only JSON.

In the future, as more types are added, we could do things like parse the BLOB as XML with ElementTree whenever the Content-Type header indicates XML.
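A sketch of that future XML handling, assuming CDA eventually returns an XML Content-Type (today it returns application/octet-stream, so this branch would not fire yet). parse_blob is a hypothetical name; its Any return type mirrors how the mypy Union complaint can be sidestepped.

```python
import xml.etree.ElementTree as ET
from typing import Any


def parse_blob(content_type: str, text: str) -> Any:
    """Hypothetical: return an Element for XML content types, else the raw string."""
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime in ("application/xml", "text/xml") or mime.endswith("+xml"):
        # Parse into an ElementTree Element so callers can query it directly.
        return ET.fromstring(text)
    return text


root = parse_blob("application/xml", "<changes><gate id='1'/></changes>")
print(root.tag)  # changes
```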

Also, sorry for the spam. I was refreshing my memory of the contributing doc as I went, including how much I could run and test locally before committing.

@Enovotny (Collaborator) left a comment


I like the changes; I just have a couple of comments. Thanks for making the improvements to the API calls.

@sonarqubecloud

@krowvin (Collaborator, Author) commented Mar 13, 2025

Your requests:

  1. Removed the extra location groups file
  2. Removed get_xml
  3. Changed types to Any where the type checker required it (for XML, etc.)
  • Updated methods to return a raw string instead of a dict where needed

A few more things:

  1. Ran the spellchecker on the entire codebase
  2. Converted the 102/2 API version handling to determine the MIME type dynamically from the format you provide (CDA has a format parameter but also honors the mimetype/Accept header; use one or the other, NOT both)
  3. Unlocked the versions because:
  • We control the version with the wrapper itself
  • CDA can/should handle it if a user (or we) try to specify a higher version
  • It adds room for API growth
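The comment doesn't show how a format maps to a MIME type. A rough sketch of the idea, assuming a small lookup table and CDA's "application/json;version=2" style of versioned Accept values; the table contents and the accept_header name are illustrative, not the code from this PR.

```python
from typing import Dict, Optional

# Illustrative format -> base MIME type table (assumed, not taken from this PR).
FORMAT_MIME: Dict[str, str] = {
    "json": "application/json",
    "xml": "application/xml",
    "csv": "text/csv",
}


def accept_header(fmt: str, version: Optional[int] = None) -> Dict[str, str]:
    """Hypothetical: build an Accept header from a format name and optional version."""
    try:
        mime = FORMAT_MIME[fmt.lower()]
    except KeyError:
        raise ValueError(f"Unsupported format: {fmt!r}") from None
    if version is not None:
        # Versioned representations are selected via a media-type parameter.
        mime = f"{mime};version={version}"
    # Only the Accept header is returned; no format query parameter is added,
    # since CDA uses one or the other, not both.
    return {"Accept": mime}


print(accept_header("json", 2))  # {'Accept': 'application/json;version=2'}
```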

I also ran my Python script above to make sure these changes work against the national CDA instance, not just the mocks.

@Enovotny (Collaborator)

I like some of the changes, but we should keep the get_...xml calls. Something I learned from Jordan when first setting this up is that each function should do one thing, i.e. provide a single data format. This is for testing purposes. That is why we created the cwms data type to provide the JSON/DataFrame, which is the backbone of this package. Anything that provides a different data format in the get calls should be a separate function: get_ratings_xml, etc. I'm fine with the change to the api.py calls; I think it makes things more readable and maintainable long term. But we should not have format as a parameter in any of the get_... functions. I originally had that parameter as well, to get JSON or a DataFrame, and Jordan steered me toward the current implementation.
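A sketch of the one-function-per-format pattern described above. All names here (get_rating_json, get_rating_xml, the fetch callable, the ratings/... path) are hypothetical; fetch stands in for whatever performs the HTTP request, so each public function has a single, fixed return type that is easy to test and type-check.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: takes (endpoint, accept) and returns the decoded body.
Fetch = Callable[[str, str], Any]


def _get(fetch: Fetch, endpoint: str, accept: str) -> Any:
    """Shared private helper: the only place that deals with Accept headers."""
    return fetch(endpoint, accept)


def get_rating_json(fetch: Fetch, rating_id: str) -> Dict[str, Any]:
    """Public call with exactly one output format: JSON."""
    return _get(fetch, f"ratings/{rating_id}", "application/json")


def get_rating_xml(fetch: Fetch, rating_id: str) -> str:
    """Separate public call for XML, instead of a format parameter."""
    return _get(fetch, f"ratings/{rating_id}", "application/xml")
```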

@krowvin (Collaborator, Author) commented May 29, 2025

I tested these changes with the cwms-python code below against our CDA instance to confirm they work as intended.

import os
import cwms

cwms.init_session(api_key="apikey " + os.getenv("CDA_API_KEY", ""), api_root=os.getenv("CDA_HOST", "") + "/")


cwms.store_blobs(
    data={
        "office-id": "SWT",
        "id": "TEST.TXT",
        "description": "Your description here",
        "media-type-id": "text/plain",
        "value": "A test of cwms-python blob store",
    }, 
    fail_if_exists=False
)

print(cwms.get_blob(blob_id="DATACHECK.HYDROPOWER.JSON", office_id="SWT"))
print(cwms.get_blob(blob_id="EUFA.PLOT.PNG", office_id="SWT"))
print(cwms.get_blob(blob_id="TEST.TXT", office_id="SWT"))
print(cwms.get_blobs(office_id="SWT", blob_id_like="TEST").json)

print("Stored!")

Saved to:
https://cwms-data.usace.army.mil/cwms-data/blobs/TEST.TXT?office=SWT

Output here:

{'startTime': 't-7d', 'updated': '2025-03-27 07:28:12.067961-05:00', 'DATA': 1, 'NAME': 0, 'endTime': 't+6h', 'groups': [['Group Name 2', {'Pool Elevation': ['FGIB.Elev.Inst.30Minutes.0.Decodes-Raw']}]]}

iVBORw0KGgoAA...really-long-base64-string...AABJRU5ErkJggg==

A test of cwms-python blob store

{'blobs': [{'office-id': 'SWT', 'id': 'TEST-TEXT', 'description': 'a test text response', 'media-type-id': 'text/plain'}, {'office-id': 'SWT', 'id': 'TEST.TXT', 'description': 'Your description here', 'media-type-id': 'text/plain'}], 'page': 'fHwwfHwxMDA=', 'page-size': 100, 'total': 0}

Stored!

I reverted the format and get_xml/etc changes as requested.

Let me know if I missed any changes you would like to see.

Side note: get normally returns JSON, but with the blob endpoint the response could be any format, depending on the MIME type.

We could also create a get_any if you want get to keep returning only JSON.
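A sketch of that split, assuming both functions receive the Content-Type string and raw body bytes. get and get_any here are hypothetical stand-ins, not the api.py signatures: get stays JSON-only and fails loudly on anything else, while get_any accepts whatever the endpoint returns.

```python
import json
from typing import Any, Dict


def _is_json(content_type: str) -> bool:
    """True for application/json (with or without parameters) and +json types."""
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime == "application/json" or mime.endswith("+json")


def get(content_type: str, body: bytes) -> Dict[str, Any]:
    """Hypothetical JSON-only get: raise instead of guessing on other types."""
    if not _is_json(content_type):
        raise TypeError(f"Expected a JSON response, got {content_type!r}; use get_any()")
    return json.loads(body)


def get_any(content_type: str, body: bytes) -> Any:
    """Hypothetical catch-all: JSON when declared, otherwise UTF-8 text."""
    if _is_json(content_type):
        return json.loads(body)
    return body.decode("utf-8")
```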

@krowvin force-pushed the bug/get-blob-response branch from 505c192 to 73bdcad May 29, 2025 21:30
@krowvin (Collaborator, Author) commented May 29, 2025

I realized we could handle image MIME types too. Those are stored in the BLOB as base64 strings.

cwms-python/cwms/api.py

Lines 232 to 233 in 505c192

if content_type.startswith("image/"):
    return base64.b64encode(response.content).decode("utf-8")

Depending on the MIME type, Python may fail to decode the raw bytes as UTF-8. I'll leave any types I missed (XML, XLSX, etc.) for the next PR.

return response.content.decode("utf-8")
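A small illustration of why the image branch is needed, using only the standard library: the 8-byte PNG signature cannot be decoded as UTF-8 (0x89 is not a valid start byte), whereas base64 turns it into ASCII text that round-trips losslessly. The sample bytes are just the PNG signature, not a real BLOB.

```python
import base64

# The 8-byte PNG file signature; 0x89 is not a valid UTF-8 start byte.
png_bytes = b"\x89PNG\r\n\x1a\n"

try:
    png_bytes.decode("utf-8")  # what the fallback branch would attempt
except UnicodeDecodeError as err:
    print(f"UTF-8 decode failed: {err.reason}")

# Base64 keeps the payload as ASCII text that can be reversed exactly.
encoded = base64.b64encode(png_bytes).decode("utf-8")
assert base64.b64decode(encoded) == png_bytes
print(encoded)  # iVBORw0KGgo=
```

This is also why the PNG output earlier in the thread starts with iVBORw0KGgo.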

@sonarqubecloud

@krowvin merged commit cd91824 into main Jun 3, 2025
8 checks passed
@krowvin deleted the bug/get-blob-response branch June 3, 2025 17:02

Labels

blocking A district needs this to move forward with cloud migration


3 participants