[BUG] Multiple blobs in same container with seemingly same file name causes strange behaviour #42008

Open
jenspettersson opened this issue Feb 15, 2024 · 1 comment
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files)

Comments

@jenspettersson

Library name and version

Azure.Storage.Blobs 12.19.1

Describe the bug

When users on different devices upload documents to a "shared" storage account and container, file names can sometimes appear identical while actually being different. This can happen when file names contain characters like Å, Ä and Ö (probably others as well, but these were the ones causing us problems).

See the following example, where I try to reproduce the issue:

User 1 with file:
image

User 2 with file that looks to have the same file name:
image

User 1 uploads the file first, with the content "OLD REVISION NUMBER 1". Some time later, User 2 uploads a new file that appears to have the same name, with the content "NEW REVISION NUMBER 2", intending to replace any existing file.

This will result in the following two blobs in the blob container:
image
The file with earliest Modified time is the file from User 1.

We then use the Azure Portal to get the public URL to the first file (User 1), and we get the following URL:
https://[STORAGE-ACCOUNT].blob.core.windows.net/public/lab/ImportantDocumentÅÄÖ.txt

If I navigate to that file in a browser or using Postman:
image

If I then copy the URL of the second file (User 2) I get:
https://[STORAGE-ACCOUNT].blob.core.windows.net/public/lab/ImportantDocumentÅÄÖ.txt (seemingly the same)

Let's open that URL in Postman:
image

It shows the correct content of "NEW REVISION NUMBER 2".

However, when we copy the second URL into Safari on an iOS device, we expect to get the file with the content "NEW REVISION NUMBER 2", but we don't:

image image

The user with an iOS device using Safari gets the first file, but the same user on the same device with the same URL using Chrome gets the second file:
image
This is a bit confusing for the user.

We, as developers, now know that one of the filenames is actually ImportantDocumentA\u030aA\u0308O\u0308.txt (base letters followed by combining marks), while the other is ImportantDocumentÅÄÖ.txt (precomposed characters).

We also know that we should normalize the filenames before uploading them, but we wanted to report this because it is very likely to happen to someone else who forgets to normalize filenames, causing a very confusing experience for the end user.
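For anyone hitting the same problem, here is a minimal sketch of the caller-side normalization mentioned above. `NormalizeBlobName` is a hypothetical helper, not part of Azure.Storage.Blobs, and choosing NFC (rather than NFD) as the canonical form is an assumption; what matters is that every uploader uses the same form.

```csharp
using System;
using System.Text;

// Hypothetical helper (not an SDK API): bring a blob name to one canonical
// Unicode form before it is used as a blob name. NFC is an assumed choice.
static string NormalizeBlobName(string name) => name.Normalize(NormalizationForm.FormC);

// The two names from this issue, written with escapes so the difference is visible.
string precomposed = "ImportantDocument\u00C5\u00C4\u00D6.txt";
string decomposed = "ImportantDocumentA\u030AA\u0308O\u0308.txt";

// The == operator compares strings ordinally, so the raw names are different.
bool rawEqual = precomposed == decomposed;

// After normalization both names are the same sequence of code points.
bool normalizedEqual = NormalizeBlobName(precomposed) == NormalizeBlobName(decomposed);

Console.WriteLine($"Raw names equal: {rawEqual}");
Console.WriteLine($"Normalized names equal: {normalizedEqual}");
```

Applying such a helper to the name passed to the upload call would make both users write to the same blob.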

Lastly, this is a highly simplified version of an actual scenario we've experienced. In our case we didn't use public blob containers but instead created read-only SAS tokens. For the files with non-normalized filenames, our iOS/Safari users couldn't download the file because Azure Blob Storage said the signature didn't match. We're guessing that this simplified example highlights the same problem.
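One possible explanation, offered as a guess rather than a confirmed diagnosis: the two names percent-encode to different URL paths, so a client that normalizes the name before sending the request ends up asking for a different blob than the one the SAS token was issued for. The sketch below only shows the encoding difference; it does not call Azure.

```csharp
using System;
using System.Text;

string precomposed = "ImportantDocument\u00C5\u00C4\u00D6.txt";
string decomposed = "ImportantDocumentA\u030AA\u0308O\u0308.txt";

// Percent-encode each name the way it would appear in a URL path segment.
string encodedPrecomposed = Uri.EscapeDataString(precomposed);
string encodedDecomposed = Uri.EscapeDataString(decomposed);

Console.WriteLine(encodedPrecomposed); // ImportantDocument%C3%85%C3%84%C3%96.txt
Console.WriteLine(encodedDecomposed);  // ImportantDocumentA%CC%8AA%CC%88O%CC%88.txt

// Simulates a client that normalizes to NFC before requesting (an assumption
// about client behaviour): the request now targets the precomposed name.
string requestedByNormalizingClient =
    Uri.EscapeDataString(decomposed.Normalize(NormalizationForm.FormC));
Console.WriteLine(requestedByNormalizingClient == encodedPrecomposed);
```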

Expected behavior

The SDK should normalize filenames when uploading a blob, preventing uploads of files that look like they have exactly the same filename.

Actual behavior

You can upload files that appear to have the same file name, causing issues on some devices/browsers.
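To check whether a container already holds such look-alike blobs, something along these lines could be run over the blob names. This is a sketch: `FindLookalikeNames` is a hypothetical helper, the names are hard-coded here instead of being listed from a container, and it only catches names that differ by Unicode normalization form.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: group names by their NFC form and keep only the groups
// where more than one distinct raw name maps to the same normalized name.
static List<List<string>> FindLookalikeNames(IEnumerable<string> blobNames) =>
    blobNames
        .GroupBy(n => n.Normalize(NormalizationForm.FormC), StringComparer.Ordinal)
        .Where(g => g.Count() > 1)
        .Select(g => g.ToList())
        .ToList();

// Example input; in practice these would come from listing the container.
var names = new[]
{
    "lab/ImportantDocument\u00C5\u00C4\u00D6.txt",
    "lab/ImportantDocumentA\u030AA\u0308O\u0308.txt",
    "lab/Other.txt",
};

var collisions = FindLookalikeNames(names);
foreach (var group in collisions)
{
    Console.WriteLine($"{group.Count} blobs look identical after normalization");
}
```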

Reproduction Steps

Super simple C# console application that creates two files in the same container that look like they have the same filename:

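using Azure.Storage.Blobs;

// Assumes connectionString holds the storage account connection string.
var blobContainerClient = new BlobContainerClient(connectionString, "public");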

var files = new[]
{
    "/Users/LAB/User1/ImportantDocumentÅÄÖ.txt",
    "/Users/LAB/User2/ImportantDocumentA\u030aA\u0308O\u0308.txt"
};

Console.WriteLine("Simulating two users uploading files with seemingly identical names but different content.");

foreach (var file in files)
{
    var fileName = file.Split("/").Last();
    await using var fileStream = File.Open(file, FileMode.Open);
    await blobContainerClient.UploadBlobAsync("lab/" + fileName, fileStream);
    Console.WriteLine($"Uploaded {fileName}");
    // Pause so the two blobs get clearly different Modified times in the Azure Portal.
    await Task.Delay(2000);
}

Environment

No response

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-triage This issue needs the team to triage. question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files) labels Feb 15, 2024
@jsquire jsquire added Service Attention This issue is responsible by Azure service team. needs-team-attention This issue needs attention from Azure service team or SDK team and removed needs-team-triage This issue needs the team to triage. labels Feb 15, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @xgithubtriage.
