XStore C++ tests are failing because Azurite returns wrong md5 #2409

Closed
mikamins opened this issue Jun 7, 2024 · 16 comments
Labels: blob-storage, question (Further information is requested)

@mikamins

mikamins commented Jun 7, 2024

Which service(blob, file, queue, table) does this issue concern?

blob

Which version of the Azurite was used?

3.30.0

Where do you get Azurite? (npm, DockerHub, NuGet, Visual Studio Code Extension)

npm

What's the Node.js version?

v20.10.0

What problem was encountered?

My team in XStore is trying to migrate tests from Storage Emulator to Azurite. We are blocked by failures that occur only when using Azurite.

The following code is a simplified example of the problem. It works against a real storage account and Storage Emulator.
With Azurite, it throws a storage_exception with the message "Calculated MD5 does not match existing property."

Steps to reproduce the issue?

void Sample(
    const std::string& accountName,
    const std::string& accountKey,
    const std::string& blobUrl)
{
    storage_credentials credentials(
        to_string_t(accountName),
        to_string_t(accountKey));
    auto blob = cloud_block_blob(
        storage_uri(web::http::uri(to_string_t(blobUrl))),
        credentials);
    const auto& client = blob.service_client();

    auto context = operation_context();
    auto options = blob_request_options();
    options.set_retry_policy(no_retry_policy());
    auto conditions = access_condition::generate_empty_condition();

    auto container = client.get_container_reference(blob.container().name());
    container.create_if_not_exists(
        blob_container_public_access_type::off,
        options,
        context);

    // Create an empty blob
    std::vector<BYTE> emptyContent;
    blob.upload_from_stream(
        Concurrency::streams::rawptr_stream<BYTE>::open_istream(
            emptyContent.data(),
            emptyContent.size()),
        access_condition::generate_if_not_exists_condition(),
        options,
        context);

    // Write some data to the blob
    utility::string_t blockId = L"Zm9v";
    blob.upload_block(
        blockId,
        Concurrency::streams::rawptr_stream<BYTE>::open_istream(
            (const BYTE*)"abc",
            3),
        checksum_none,
        conditions,
        options,
        context);
    blob.upload_block_list(
        {blockId},
        conditions,
        options,
        context);

    // Read the data back
    BYTE buffer[10]{};
    blob.download_range_to_stream(
        Concurrency::streams::rawptr_stream<BYTE>::open_ostream(buffer, 10),
        0,
        6,
        conditions,
        options,
        context);
    VERIFY_ARE_EQUAL('a', buffer[0]);
    VERIFY_ARE_EQUAL('b', buffer[1]);
    VERIFY_ARE_EQUAL('c', buffer[2]);
}

If possible, please provide the debug log

azurite.log

Have you found a mitigation/solution?

No

@blueww self-assigned this Jun 11, 2024
@blueww added the question (Further information is requested) and blob-storage labels Jun 11, 2024
@blueww
Member

blueww commented Jun 11, 2024

@mikamins

The Azurite debug log shows no error, so the error you hit is most likely coming from the C++ SDK.

However, from the code above, it looks like you are using the old C++ SDK, which is already deprecated. (see link)

Could you please check whether you can repro this issue with the latest C++ SDK?
If so, we will continue investigating it.
Here's a migration guide from the old deprecated C++ SDK to the latest C++ SDK: https://github.com/Azure/azure-sdk-for-cpp/blob/main/sdk/storage/MigrationGuide.md

BTW, the code first uploads a blob that is 3 bytes long, then downloads 6 bytes from it.
Currently Azurite just returns the 3 bytes in the blob, which matches the server behavior per my test, so I'm not sure why the error happens. Please try to repro the issue with the latest C++ SDK. If it repros, we can ask the SDK team to look into why the error happens.
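The range behavior described above (6 bytes requested from a 3-byte blob, 3 bytes returned) can be sketched as follows. This is only an illustration, not Azurite's code: `ByteRange`, `clamp_range`, and `content_range` are hypothetical names, and the sketch assumes an inclusive `bytes=first-last` request is clamped to the last byte of the blob.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper type for this sketch; not part of Azurite or the SDK.
struct ByteRange {
    std::uint64_t first;  // inclusive start offset
    std::uint64_t last;   // inclusive end offset
};

// Clamp an inclusive requested range to a blob of `blob_size` bytes.
// Returns false when no byte of the blob falls inside the requested range.
bool clamp_range(const ByteRange& requested, std::uint64_t blob_size, ByteRange& out) {
    if (blob_size == 0 || requested.first >= blob_size || requested.first > requested.last) {
        return false;
    }
    out.first = requested.first;
    out.last = requested.last < blob_size - 1 ? requested.last : blob_size - 1;
    return true;
}

// Format the value of the Content-Range response header.
std::string content_range(const ByteRange& served, std::uint64_t blob_size) {
    return "bytes " + std::to_string(served.first) + "-" +
           std::to_string(served.last) + "/" + std::to_string(blob_size);
}
```

With the request from this issue (`bytes=0-5` on a 3-byte blob), `clamp_range` yields the range 0-2 and `content_range` formats it as `bytes 0-2/3`.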

@mikamins
Author

mikamins commented Jun 12, 2024

Azurite is returning the correct status code 206 and returning the partial content as expected. The issue is in the response headers.

The latest C++ SDK performs all operations synchronously, so it can never be adopted by our team within XStore. Could you please investigate why Azurite does not work with Microsoft.Azure.Storage.CPP.v140 v7.5.0?

The MD5 and version headers are the only major differences I see between the responses from Azurite and Azure/Storage Emulator. One of them is causing issues with the SDK. Considering the exception message says "Calculated MD5 does not match existing property.", I suspect the MD5 header.

@blueww
Member

blueww commented Jun 12, 2024

@mikamins

I compared the server response headers (captured with Fiddler) and the Azurite response headers (taken from the Azurite debug log) for a GetBlob request with "x-ms-range: bytes=0-5" on a blob whose length is 3 bytes.
They are very similar.

Besides the same status code and the same content, they also have the same Content-Range, Content-Length, and x-ms-blob-content-md5 headers. Azurite has one additional header, content-md5, whose value is also correct.

So I'm not sure why the deprecated C++ SDK reports this error.
I can't repro this issue with other SDKs such as .NET.

It will need SDK team support to look into the deprecated SDK code and find the issue.
If you would like to continue the investigation, could you please file a GitHub issue against the C++ SDK and ask why the error is reported?
Once we know why the error happens, we will know how to fix it in Azurite.
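The header comparison described in this comment can also be done mechanically. The sketch below is illustrative only: `to_lower` and `only_in` are hypothetical helpers, and the two name lists are copied by hand from the Azure Server and Azurite responses quoted in this comment. Header names are compared case-insensitively.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>
#include <vector>

// HTTP header names are case-insensitive, so normalize before comparing.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the header names present in `a` but missing from `b` (lowercase, sorted).
std::vector<std::string> only_in(const std::vector<std::string>& a,
                                 const std::vector<std::string>& b) {
    std::set<std::string> other;
    for (const auto& name : b) other.insert(to_lower(name));
    std::set<std::string> result;
    for (const auto& name : a) {
        const std::string lower = to_lower(name);
        if (other.count(lower) == 0) result.insert(lower);
    }
    return std::vector<std::string>(result.begin(), result.end());
}

// Header names from the Azure Server response in this comment.
const std::vector<std::string> kAzureHeaders = {
    "Content-Length", "Content-Type", "Content-Range", "Last-Modified",
    "Accept-Ranges", "ETag", "Server", "x-ms-request-id",
    "x-ms-client-request-id", "x-ms-version", "x-ms-creation-time",
    "x-ms-blob-content-md5", "x-ms-lease-status", "x-ms-lease-state",
    "x-ms-blob-type", "x-ms-server-encrypted", "Date"};

// Header names from the Azurite response in this comment.
const std::vector<std::string> kAzuriteHeaders = {
    "server", "last-modified", "x-ms-creation-time", "content-length",
    "content-type", "content-range", "etag", "content-md5", "x-ms-blob-type",
    "x-ms-lease-state", "x-ms-lease-status", "x-ms-client-request-id",
    "x-ms-request-id", "x-ms-version", "accept-ranges", "date",
    "x-ms-server-encrypted", "x-ms-blob-content-md5"};
```

Calling `only_in(kAzuriteHeaders, kAzureHeaders)` returns just `content-md5`, and the reverse call returns nothing, which matches the one additional header noted above.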

Azure Server

HTTP/1.1 206 Partial Content
Content-Length: 3
Content-Type: application/octet-stream
Content-Range: bytes 0-2/3
Last-Modified: Wed, 12 Jun 2024 02:52:43 GMT
Accept-Ranges: bytes
ETag: "0x8DC8A8ABB866A3A"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: cc35a03b-301e-0057-6173-bced59000000
x-ms-client-request-id: a2141546-1af3-4e05-b763-e4f220100d5a
x-ms-version: 2019-07-07
x-ms-creation-time: Wed, 12 Jun 2024 02:52:43 GMT
x-ms-blob-content-md5: kAFQmDzST7DWlj99KOF/cg==
x-ms-lease-status: unlocked
x-ms-lease-state: available
x-ms-blob-type: BlockBlob
x-ms-server-encrypted: true
Date: Wed, 12 Jun 2024 02:52:47 GMT

Azurite

Headers={
"server":"Azurite-Blob/3.30.0",
"last-modified":"Wed, 12 Jun 2024 02:50:58 GMT",
"x-ms-creation-time":"Wed, 12 Jun 2024 02:50:58 GMT",
"content-length":"3",
"content-type":"application/octet-stream",
"content-range":"bytes 0-2/3",
"etag":"\"0x1F4502392A75EB0\"",
"content-md5":"kAFQmDzST7DWlj99KOF/cg==",
"x-ms-blob-type":"BlockBlob",
"x-ms-lease-state":"available",
"x-ms-lease-status":"unlocked",
"x-ms-client-request-id":"4ac1e003-92a8-4722-b5db-819b53abf9fe",
"x-ms-request-id":"579187d7-dd0d-4c3f-9ab5-9197d75ff924",
"x-ms-version":"2024-05-04",
"accept-ranges":"bytes",
"date":"Wed, 12 Jun 2024 02:51:32 GMT",
"x-ms-server-encrypted":"true",
"x-ms-blob-content-md5":"kAFQmDzST7DWlj99KOF/cg=="}

@Jinming-Hu
Member

Jinming-Hu commented Jun 13, 2024

Hi @mikamins, we were not able to reproduce this issue with the latest versions of Azurite and the C++ SDK. Was the attached log generated with your sample code?

We found

2024-06-07T15:04:19.988Z 60120bfb-be88-425d-8db6-5d7a9eda1537 info: BlobStorageContextMiddleware: RequestMethod=PUT RequestURL=http://127.0.0.1/devstoreaccount1/unittest/43221676-0E2C-4EF8-AEDD-7FB73B1E18CA?comp=blocklist RequestHeaders:{"connection":"Keep-Alive","content-type":"","authorization":"SharedKey devstoreaccount1:4vOY+2gpZE7ww3MmewE83WaD4yiRuAgPdpEscZLzq2Y=","user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)","x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==","x-ms-client-request-id":"00eaef02-b1b5-48f1-b585-647d4ec2975f","x-ms-date":"Fri, 07 Jun 2024 15:04:19 GMT","x-ms-version":"2019-12-12","content-length":"90","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1

in your log. "x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==" indicates that you set the blob content MD5 yourself, but your sample code doesn't.

@blueww
Member

blueww commented Jun 13, 2024

Hi @mikamins,

If you really need the new C++ SDK to support async calls before you can upgrade to it, you can open an issue at https://github.com/Azure/azure-sdk-for-cpp/issues to raise that requirement.

@mikamins
Author

mikamins commented Jun 17, 2024

> Hi @mikamins, we were not able to reproduce this issue with the latest versions of Azurite and the C++ SDK. Was the attached log generated with your sample code?
>
> We found
>
> 2024-06-07T15:04:19.988Z 60120bfb-be88-425d-8db6-5d7a9eda1537 info: BlobStorageContextMiddleware: RequestMethod=PUT RequestURL=http://127.0.0.1/devstoreaccount1/unittest/43221676-0E2C-4EF8-AEDD-7FB73B1E18CA?comp=blocklist RequestHeaders:{"connection":"Keep-Alive","content-type":"","authorization":"SharedKey devstoreaccount1:4vOY+2gpZE7ww3MmewE83WaD4yiRuAgPdpEscZLzq2Y=","user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)","x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==","x-ms-client-request-id":"00eaef02-b1b5-48f1-b585-647d4ec2975f","x-ms-date":"Fri, 07 Jun 2024 15:04:19 GMT","x-ms-version":"2019-12-12","content-length":"90","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1
>
> in your log. "x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==" indicates that you set the blob content MD5 yourself, but your sample code doesn't.

Yes, the attached log was created with the sample code, using Azurite 3.30.0 and azure-storage-cpp 7.5.0. If you are unable to reproduce, could you provide the exact versions that you used and attach the log?

@mikamins
Author

mikamins commented Jun 18, 2024

I stepped through the sample code in more detail, and the SDK is doing the right thing. Azurite is returning the incorrect md5 when downloading the blob.

Blob setup:

  • An empty blob is created
    • Note: md5("") = d41d8cd98f00b204e9800998ecf8427e, in base64 1B2M2Y8AsgTpgAmY7PhCfg==
  • A block with content "abc" is committed
    • Note: md5("abc") = 900150983cd24fb0d6963f7d28e17f72, in base64 kAFQmDzST7DWlj99KOF/cg==
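The hex and base64 forms in the two notes above are the same 16 digest bytes in different encodings. The following sketch converts one into the other so the values can be cross-checked; it does not compute MD5 itself, and `hex_to_bytes`, `base64_encode`, and `hex_digest_to_base64` are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode a hex string such as "900150..." into raw bytes.
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex) {
    if (hex.size() % 2 != 0) throw std::invalid_argument("odd-length hex string");
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        throw std::invalid_argument("not a hex digit");
    };
    std::vector<std::uint8_t> bytes;
    bytes.reserve(hex.size() / 2);
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        bytes.push_back(static_cast<std::uint8_t>(nibble(hex[i]) * 16 + nibble(hex[i + 1])));
    }
    return bytes;
}

// Standard base64 (RFC 4648) with '=' padding.
std::string base64_encode(const std::vector<std::uint8_t>& data) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < data.size(); i += 3) {
        const std::size_t remaining = data.size() - i;
        // Pack up to three bytes into a 24-bit chunk, zero-filling the tail.
        std::uint32_t chunk = static_cast<std::uint32_t>(data[i]) << 16;
        if (remaining > 1) chunk |= static_cast<std::uint32_t>(data[i + 1]) << 8;
        if (remaining > 2) chunk |= data[i + 2];
        out.push_back(kAlphabet[(chunk >> 18) & 0x3F]);
        out.push_back(kAlphabet[(chunk >> 12) & 0x3F]);
        out.push_back(remaining > 1 ? kAlphabet[(chunk >> 6) & 0x3F] : '=');
        out.push_back(remaining > 2 ? kAlphabet[chunk & 0x3F] : '=');
    }
    return out;
}

// Convert a hex digest to the base64 form used in Content-MD5 headers.
std::string hex_digest_to_base64(const std::string& hex) {
    return base64_encode(hex_to_bytes(hex));
}
```

For example, `hex_digest_to_base64("d41d8cd98f00b204e9800998ecf8427e")` gives `1B2M2Y8AsgTpgAmY7PhCfg==`, and `hex_digest_to_base64("900150983cd24fb0d6963f7d28e17f72")` gives `kAFQmDzST7DWlj99KOF/cg==`.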

Failing download:

  • When downloading the byte range 0-5, Azurite is correctly returning 206 Partial Content with content "abc". However, the response header content-md5: 1B2M2Y8AsgTpgAmY7PhCfg== is not correct
    • As noted above 1B2M2Y8AsgTpgAmY7PhCfg== is the md5 for an empty string, but the file content is "abc" at this point
    • Since the response has a content-md5 header, the SDK verifies it with the expected md5 (kAFQmDzST7DWlj99KOF/cg==), and throws on the mismatch
    • Note: Neither Azure nor Storage Emulator includes a content-md5 header in this response
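The client-side check described in the bullets above can be modeled roughly as below. This is a sketch of the described behavior, not the SDK's actual code: `Headers`, `verify_download`, and `download_succeeds` are hypothetical names, header names are assumed to be lowercased already, and the MD5 of the received bytes is passed in rather than computed.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Response headers keyed by lowercase name (a simplification for this sketch).
using Headers = std::map<std::string, std::string>;

// Hypothetical stand-in for the client-side check; not the SDK's actual code.
// `computed_md5` is the base64 MD5 the client calculated over the bytes it received.
void verify_download(const Headers& response, const std::string& computed_md5) {
    const auto it = response.find("content-md5");
    if (it == response.end()) {
        return;  // No Content-MD5 header in the response: nothing to verify.
    }
    if (it->second != computed_md5) {
        throw std::runtime_error("Calculated MD5 does not match existing property.");
    }
}

// Convenience wrapper: true when the check passes, false when it throws.
bool download_succeeds(const Headers& response, const std::string& computed_md5) {
    try {
        verify_download(response, computed_md5);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

Under this model, a response that carries `content-md5: 1B2M2Y8AsgTpgAmY7PhCfg==` with body "abc" fails the check, while a response with only `x-ms-blob-content-md5` passes because there is no Content-MD5 header to compare against.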

Log is attached:
azurite-2024-06-17.log

Download HTTP Request:

GET http://127.0.0.1:10000/devstoreaccount1/unittest/E2642A3C-58CF-4CA4-A7C5-2CE4C7A29B91 HTTP/1.1
Connection: Keep-Alive
Accept-Encoding: peerdist
Authorization: SharedKey devstoreaccount1:JhenggacHCvhOxTnO7qcK8+OaibtuQcSzPTkZ8zu6zw=
User-Agent: Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)
x-ms-client-request-id: 31a5ac86-f1a3-458a-ba22-cc4400be02a9
x-ms-date: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-range: bytes=0-5
x-ms-version: 2019-12-12
X-P2P-PeerDist: Version=1.1
X-P2P-PeerDistEx: MinContentInformation=1.0, MaxContentInformation=2.0
Host: 127.0.0.1:10000

Response with bad content-md5:

HTTP/1.1 206 Partial Content
Server: Azurite-Blob/3.30.0
last-modified: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-creation-time: Tue, 18 Jun 2024 00:17:02 GMT
content-length: 3
content-type: application/octet-stream
content-range: bytes 0-2/3
etag: "0x22AF68371ECD940"
content-md5: 1B2M2Y8AsgTpgAmY7PhCfg==
x-ms-blob-type: BlockBlob
x-ms-lease-state: available
x-ms-lease-status: unlocked
x-ms-client-request-id: 31a5ac86-f1a3-458a-ba22-cc4400be02a9
x-ms-request-id: bff087cc-05f0-4c98-996f-0a39ccd4838e
x-ms-version: 2024-05-04
accept-ranges: bytes
date: Tue, 18 Jun 2024 00:17:02 GMT
x-ms-server-encrypted: true
x-ms-blob-content-md5: 1B2M2Y8AsgTpgAmY7PhCfg==
Connection: keep-alive
Keep-Alive: timeout=5

abc

@Jinming-Hu
Member

@blueww This seems to be a bug in Azurite. Azurite doesn't clear all blob properties when it's overwritten.

@blueww
Member

blueww commented Jun 18, 2024

@mikamins , @Jinming-Hu

Thanks for the investigation!
I will look into it and update later.

@mikamins changed the title from "XStore C++ tests are failing only when using Azurite" to "XStore C++ tests are failing because Azurite returns wrong md5" on Jun 18, 2024
@blueww
Member

blueww commented Jun 19, 2024

@mikamins

I can't repro this with Azurite.
Azurite returns the correct content MD5 "kAFQmDzST7DWlj99KOF/cg==" after committing a block list with a block that contains "abc".

After looking into the debug log you shared in the comment above, I see that the header "x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==" was set when committing the block list, so the wrong content MD5 was sent from the client side.
If the client sets the content MD5, Azurite respects it; otherwise Azurite should have the correct MD5.

2024-06-18T00:17:02.633Z 63c975e5-7b5d-4755-ac9a-a65e617053c7 info: BlobStorageContextMiddleware: RequestMethod=PUT RequestURL=http://127.0.0.1/devstoreaccount1/unittest/E2642A3C-58CF-4CA4-A7C5-2CE4C7A29B91?comp=blocklist RequestHeaders:{"connection":"Keep-Alive","content-type":"","authorization":"SharedKey devstoreaccount1:bprNXXG2v3W9YXS4l8Z9KS6A4MYYoiOhsMWQWqSoKd0=","user-agent":"Azure-Storage/7.5.0 (Native; Windows; MSC_VER 1900)","x-ms-blob-content-md5":"1B2M2Y8AsgTpgAmY7PhCfg==","x-ms-client-request-id":"31a5ac86-f1a3-458a-ba22-cc4400be02a9","x-ms-date":"Tue, 18 Jun 2024 00:17:02 GMT","x-ms-version":"2019-12-12","content-length":"90","host":"127.0.0.1:10000"} ClientIP=127.0.0.1 Protocol=http HTTPVersion=1.1

@Jinming-Hu
Member

Jinming-Hu commented Jun 19, 2024

@blueww The Track1 SDK keeps the state of a blob on the client side (the state includes blob properties). Is it possible that when we get the properties of the old blob (empty content), the local state is populated, and then that md5 is sent out over the wire when calling CommitBlocks?

This can't be reproduced against public Azure because the public Azure service doesn't return blob-md5 for a partial read. Hmm, that explains everything.

@blueww
Member

blueww commented Jun 20, 2024

Thanks @Jinming-Hu for the investigation!

Per the REST API doc, Put Blob should return Content-MD5, and Azurite is aligned with the REST API doc.
Besides, Azurite is returning the correct MD5. (If the user set it, Azurite returns the user-set value.)

@mikamins
The suggested long-term fix for this issue is to upgrade to the latest C++ SDK.
Otherwise, a workaround is to clear the contentMD5 property of the blob object before you run blob.upload_block_list(). Could you please try it and see whether it works in your scenario?
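To make the workaround concrete, here is a toy model of the client-side state explanation discussed in this thread. It uses no SDK or Azurite code: `ToyService` and `ToyBlob` are hypothetical types, and the service behavior (store a client-supplied MD5 as-is, otherwise use the MD5 of the committed content) is an assumption taken from the description of Azurite given earlier in the thread.

```cpp
#include <string>

// Toy stand-in for the blob service. All names here are hypothetical.
struct ToyService {
    std::string blob_md5;  // MD5 property stored with the blob

    // Put Blob: records the MD5 of the uploaded content and echoes it back,
    // like the Content-MD5 response header.
    std::string put_blob(const std::string& md5_of_content) {
        blob_md5 = md5_of_content;
        return blob_md5;
    }

    // Put Block List: a client-supplied x-ms-blob-content-md5 is stored as-is;
    // otherwise the toy service records the MD5 of the committed content.
    void put_block_list(const std::string& client_md5, const std::string& md5_of_content) {
        blob_md5 = client_md5.empty() ? md5_of_content : client_md5;
    }
};

// Toy stand-in for a client object that caches blob properties locally.
struct ToyBlob {
    std::string content_md5;  // cached property, sent with the next commit

    void upload(ToyService& service, const std::string& md5_of_content) {
        // The Put Blob response populates the client-side cache.
        content_md5 = service.put_blob(md5_of_content);
    }

    void upload_block_list(ToyService& service, const std::string& md5_of_content) {
        // The cached value goes out as x-ms-blob-content-md5.
        service.put_block_list(content_md5, md5_of_content);
    }
};

const std::string kMd5Empty = "1B2M2Y8AsgTpgAmY7PhCfg==";  // md5("")
const std::string kMd5Abc = "kAFQmDzST7DWlj99KOF/cg==";    // md5("abc")
```

In this model, `upload` with `kMd5Empty` followed by `upload_block_list` with `kMd5Abc` leaves the service holding the stale `kMd5Empty`. Clearing `content_md5` on the `ToyBlob` between the two calls leaves it holding `kMd5Abc`, which is the effect the workaround aims for.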

@Jinming-Hu
Member

@blueww

Per the REST API doc:

If the request is to read a specified range and the x-ms-range-get-content-md5 is set to true, the request returns an MD5 hash for the range, as long as the range size is less than or equal to 4 MiB.
If neither of these sets of conditions is true, no value is returned for the Content-MD5 header.

Azurite should fix its wrong behavior.
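The rule quoted above can be written as a small decision function. This is a sketch, not Azurite's implementation: `GetBlobRequest`, `ContentMd5`, and `content_md5_for` are hypothetical names. The ranged-read branch follows the quoted text; the full-blob branch (return the stored MD5 when the whole blob is read and the blob has one) is not part of the quote and is an assumption about the other "set of conditions" it refers to.

```cpp
#include <cstdint>

// Hypothetical description of a Get Blob request for this sketch.
struct GetBlobRequest {
    bool has_range;              // true when the request asks for a byte range
    std::uint64_t range_length;  // size of the requested range in bytes
    bool range_get_content_md5;  // value of x-ms-range-get-content-md5
};

// Largest range for which a range hash is returned, per the quoted doc text.
const std::uint64_t kMaxRangeForMd5 = 4ull * 1024 * 1024;  // 4 MiB

enum class ContentMd5 {
    None,       // no Content-MD5 header in the response
    WholeBlob,  // the blob's stored MD5 (assumed full-blob case)
    RangeHash   // MD5 computed over the returned range
};

// Decide what, if anything, goes into the Content-MD5 response header.
ContentMd5 content_md5_for(const GetBlobRequest& request, bool blob_has_md5) {
    if (!request.has_range) {
        // Assumption: full-blob reads return the stored MD5 when there is one.
        return blob_has_md5 ? ContentMd5::WholeBlob : ContentMd5::None;
    }
    if (request.range_get_content_md5 && request.range_length <= kMaxRangeForMd5) {
        return ContentMd5::RangeHash;
    }
    return ContentMd5::None;
}
```

For the download in this issue (a ranged read of 6 bytes without `x-ms-range-get-content-md5`), the function returns `ContentMd5::None`, meaning no Content-MD5 header would be sent.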

@blueww
Member

blueww commented Jun 20, 2024

@Jinming-Hu

The REST API doc you shared is for the Get Blob API.
But the API from which the blob object gets the content MD5 is Put Blob (per the C++ code and the Azurite debug log in this issue, the blob object gets the Content MD5 "1B2M2Y8AsgTpgAmY7PhCfg==" from Put Blob with size 0).
The Put Blob API doc is the one I shared: REST API doc.
And ContentMD5 should be returned per that API doc.

@Jinming-Hu
Member

@blueww I don't think we're on the same page. Anyway, the workaround you proposed does sound good to me.

@blueww
Member

blueww commented Jul 2, 2024

Closing, as the fix PR (#2417 Download blob range only return ContentMD5 when request has header x-ms-range-get-content-md5) has been merged. It will be in the next Azurite release.

@blueww blueww closed this as completed Jul 2, 2024