[BUG] Multiple createPathFile internal operations causing uploadFIle api to convert finite size file to zero bytes file in ADLS gen2 #40235
Comments
Thank you for your feedback. Tagging and routing to the team member best able to assist. |
Hi @fivetran-arunsuri, thanks for reporting this issue. Taking a quick look at the sample provided:
DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient(adlsFolderPath);
RETRIER.get(() -> {
    DataLakeFileClient fileClient = directoryClient.createFile(fileToUpload.getName(), true);
    fileClient.uploadFromFile(fileToUpload.getPath(), true);
    return fileClient.getFileName();
}, MAX_ATTEMPTS);
Calling both
will result in a Still looking into why there is a |
Hi @fivetran-arunsuri. Thank you for opening this issue and giving us the opportunity to assist. To help our team better understand your issue and the details of your scenario please provide a response to the question asked above or the information requested above. This will help us more accurately address your issue. |
Hey alzimmermsft, I understand your suggested fix of removing one of the two createPathFile operations, but I would still like to know why there is a CreatePathFile REST operation after the upload and why the LastModified value is Monday, 01-Jan-01 00:00:00 GMT. No, we are not making any upload-related calls after this. |
alzimmermsft any update on this? |
Still haven't found a root cause on this as the And does this happen every time or randomly during the application run? |
It does not happen every time. It is an intermittent issue, but it keeps recurring from time to time.
If possible, could you produce HTTP logs when the scenario is hit?
DataLakeServiceClient serviceClient = new DataLakeServiceClientBuilder()
    // add credential information here
    // endpoint here
    .httpLogOptions(DataLakeServiceClientBuilder.getDefaultHttpLogOptions().setLogLevel(HttpLogDetailLevel.HEADERS))
    .buildClient();
DataLakeDirectoryClient directoryClient = serviceClient.getFileSystemClient(fileSystemName).getDirectoryClient(adlsFolderPath);
// Retrier code here
This will help determine whether the empty file creation correlates with a retry, which would be a bug in this case. I still can't reproduce the Last-Modified time issue: I've tried creating empty files, files created with a single append, and files created with multiple appends before flushing, and haven't hit this case yet. FYI @seanmcc-msft, in case you know anything about this edge case. |
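To check whether an empty file coincides with a retry, it can help to have the retry wrapper record every attempt alongside the HTTP logs. The sketch below is one possible shape for that, using only the Java standard library. `LoggingRetrier` and its `attemptLog` field are hypothetical names for illustration; the RETRIER used in the report is not shown in this thread and may behave differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical stand-in for the reporter's RETRIER; records the outcome of every attempt. */
final class LoggingRetrier {
    /** One entry per attempt, in order, so attempts can be matched against service-side logs. */
    final List<String> attemptLog = new ArrayList<>();

    /** Runs the action up to maxAttempts times and rethrows the last failure if all attempts fail. */
    <T> T get(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                T result = action.get();
                attemptLog.add("attempt " + attempt + " succeeded");
                return result;
            } catch (RuntimeException e) {
                lastFailure = e;
                attemptLog.add("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        throw lastFailure;
    }
}
```

If the attempt log shows a single successful attempt for a file that still ended up empty, the extra create operation did not come from the retry wrapper, which narrows the search to the client library or the service.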
alzimmermsft, where will these logs be produced in this case? I mean, if we create the service client like this, will they be part of the client logs? |
Describe the bug
The following code is used to upload files to an Azure Data Lake Storage container (the retrier retries the upload in case of failure). Internally it involves the CreatePathFile, AppendFile, and FlushFile REST API operations. We have enabled diagnostic logs on the container, and in some cases we saw the following operations performed in the following order
There are multiple createPathFile operations being performed here for the same file. Ideally this should not happen, because if createPathFile is triggered again once the file already has data, it turns the file into a zero-byte file.
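The concern above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `ToyFileSystem` and `ZeroByteDemo` are hypothetical classes written for this illustration, they do not call the Azure SDK, and they assume that a create with overwrite enabled replaces existing content with an empty file.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of create/append/flush semantics; an assumption-based sketch, not the real service. */
final class ToyFileSystem {
    private final Map<String, byte[]> committed = new HashMap<>();
    private final Map<String, ByteArrayOutputStream> staged = new HashMap<>();
    final List<String> operationLog = new ArrayList<>();

    /** Creates the path; with overwrite=true an existing file is replaced by an empty one. */
    void createPathFile(String path, boolean overwrite) {
        if (committed.containsKey(path) && !overwrite) {
            throw new IllegalStateException("Path already exists: " + path);
        }
        committed.put(path, new byte[0]);
        staged.put(path, new ByteArrayOutputStream());
        operationLog.add("CreatePathFile");
    }

    /** Stages data; nothing is visible in the committed file until flushFile is called. */
    void appendFile(String path, byte[] data) {
        ByteArrayOutputStream buffer = staged.get(path);
        if (buffer == null) {
            throw new IllegalStateException("Path was never created: " + path);
        }
        buffer.write(data, 0, data.length);
        operationLog.add("AppendFile");
    }

    /** Commits everything staged so far as the file's content. */
    void flushFile(String path) {
        committed.put(path, staged.get(path).toByteArray());
        operationLog.add("FlushFile");
    }

    int size(String path) {
        return committed.get(path).length;
    }
}

final class ZeroByteDemo {
    private static final byte[] PAYLOAD = "hello".getBytes(StandardCharsets.UTF_8);

    /** Create, append, flush: the file keeps its 5 bytes. */
    static int sizeAfterNormalUpload() {
        ToyFileSystem fs = new ToyFileSystem();
        fs.createPathFile("data.csv", true);
        fs.appendFile("data.csv", PAYLOAD);
        fs.flushFile("data.csv");
        return fs.size("data.csv");
    }

    /** Same sequence followed by one more create with overwrite: the file is empty again. */
    static int sizeAfterTrailingCreate() {
        ToyFileSystem fs = new ToyFileSystem();
        fs.createPathFile("data.csv", true);
        fs.appendFile("data.csv", PAYLOAD);
        fs.flushFile("data.csv");
        fs.createPathFile("data.csv", true);
        return fs.size("data.csv");
    }

    public static void main(String[] args) {
        System.out.println("normal upload size: " + sizeAfterNormalUpload());
        System.out.println("size after trailing create: " + sizeAfterTrailingCreate());
    }
}
```

In this model the order of operations is what matters: an extra create before the append only costs a redundant call, while a create that lands after the flush leaves a zero-byte file.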
Also, we noticed that the LastModified time in the logs for the append file operation always shows Monday, 01-Jan-01 00:00:00 GMT, which looks incorrect to me. Can you please help in this case?
Exception or Stack Trace
To Reproduce
Code Snippet
DataLakeDirectoryClient directoryClient = fileSystemClient.getDirectoryClient(adlsFolderPath);
RETRIER.get(
    () -> {
        DataLakeFileClient fileClient = directoryClient.createFile(fileToUpload.getName(), true);
        fileClient.uploadFromFile(fileToUpload.getPath(), true);
        return fileClient.getFileName();
    },
    MAX_ATTEMPTS);
Expected behavior
It should simply upload the file with its actual, nonzero size.
Setup (please complete the following information):
"com.azure:azure-storage-file-datalake:12.18.1",
"com.azure:azure-storage-common:12.24.1",