Convenience layer for uploading file to ADLS #3231

Closed
fantyz opened this issue Nov 5, 2018 · 10 comments
Labels
  • Client: This issue points to a problem in the data-plane of the library.
  • customer-reported: Issues that are reported by GitHub users external to the Azure organization.
  • feature-request: This issue requires a new behavior in the product in order to be resolved.

Comments

fantyz commented Nov 5, 2018

Hi,

We're having issues uploading files to ADLS using the SDK.

We're using:

  • github.com/Azure/azure-sdk-for-go/services/datalake/store/2016-11-01/filesystem (v21.1.0, 6d20bdbae88c06c36d72eb512295417693bfdf4e)
  • github.com/Azure/go-autorest (v10.15.5, 9bc4033dd347c7f416fca46b2f42a043dc1fbdf6)

When calling client.Create, we consistently get the following error:

filesystem.Client#Create: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: error response cannot be parsed: "" error: EOF

It only happens with larger files (I do not know what the cut-off point is, but a 500 KB file uploads fine and a 37 MB file consistently fails).

It would also be great if you could return informative errors instead of this; the message above is about as useful as a plain "Some error happened. Too bad."


fantyz commented Nov 5, 2018

A quick test of creating an empty file with Create and then writing the data with Append resulted in the same error:

filesystem.Client#Append: Failure responding to request: StatusCode=404 -- Original Error: autorest/azure: error response cannot be parsed: "" error: EOF

jhendrixMSFT (Member) commented

I don't think this is specific to the Go SDK, as I can repro the same behavior using PowerShell. @joshgav can you please loop in somebody from the Data Lake Store team to help?

jhendrixMSFT added the Data Lake Store and Service Attention labels on Nov 5, 2018

fantyz commented Nov 8, 2018

It would be great if someone could take a look at this.
We have a release blocked by this issue, which makes it somewhat urgent for us.

jhendrixMSFT (Member) commented

@fantyz I'm trying to find somebody from the service team to take a look.

jhendrixMSFT (Member) commented

OK, the issue here is that the service only supports uploads in 4 MB chunks. This is documented here, though it is easy to overlook; see the buffersize parameter. I have asked the service team to update the service to return a meaningful error message.
As for code changes to make it work, I will investigate now that I understand the root cause.

jhendrixMSFT (Member) commented

Here's a working example.

package main

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"io/ioutil"
	"os"

	"github.com/Azure/azure-sdk-for-go/services/datalake/store/2016-11-01/filesystem"
	"github.com/Azure/go-autorest/autorest/azure/auth"
)

const (
	accountName = "<account name>"
	destination = "<destination>"
	inputFile   = `<big file for upload>`
	bufferSize  = 4 * 1024 * 1024
)

func main() {
	a, err := auth.NewAuthorizerFromEnvironment()
	if err != nil {
		panic(err)
	}
	client := filesystem.NewClient()
	client.Authorizer = a

	f, err := os.Open(inputFile)
	if err != nil {
		panic(err)
	}
	defer f.Close()

	resp, err := client.Create(context.Background(), accountName, destination, nil, nil, filesystem.DATA, nil, nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Status)

	buffer := make([]byte, bufferSize, bufferSize)
	for {
		n, err := f.Read(buffer)
		if err == io.EOF {
			break
		}
		flag := filesystem.DATA
		if n < bufferSize {
			// last chunk
			flag = filesystem.CLOSE
		}
		chunk := ioutil.NopCloser(bytes.NewReader(buffer))
		fmt.Printf("uploading chunk %d...", n)
		resp, err = client.Append(context.Background(), accountName, destination, chunk, nil, flag, nil, nil)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%d\n", resp.StatusCode)
	}
	fmt.Println("done!")
}

jhendrixMSFT removed the Data Lake Store and Service Attention labels on Nov 14, 2018
jhendrixMSFT (Member) commented

Opened Azure/azure-rest-api-specs#4464 to update the SDK docs.

jhendrixMSFT self-assigned this on Nov 14, 2018

fantyz commented Nov 15, 2018

Thanks - we'll use the suggested workaround.

I was rather hoping that low-level code such as calls to Read and juggling byte buffers would be handled for me. Do you not see this logic as something that should be part of the SDK rather than left to the end user?

I would avoid setting flag = filesystem.CLOSE as part of the loop, though, and handle it afterwards (or in the io.EOF case). Read does not guarantee that n == len(buffer) (in fact quite the opposite), so filesystem.CLOSE might be set prematurely.

From https://golang.org/pkg/io/#Reader

Read reads up to len(p) bytes into p. It returns the number of bytes read (0 <= n <= len(p)) and any error encountered. Even if Read returns n < len(p), it may use all of p as scratch space during the call. If some data is available but not len(p) bytes, Read conventionally returns what is available instead of waiting for more.

You also have the case where the last call to Read before io.EOF returns a full buffer, in which case filesystem.CLOSE is never sent.
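
For illustration, a loop that honors the io.Reader contract might look like the sketch below. It reuses the names from the earlier example and assumes the input is a regular file, so the total size from Stat can decide when to send filesystem.CLOSE; this is a sketch under those assumptions, not an official SDK pattern.

fi, err := f.Stat()
if err != nil {
	panic(err)
}
total, sent := fi.Size(), int64(0)
buffer := make([]byte, bufferSize)
for {
	n, err := f.Read(buffer)
	if n > 0 {
		sent += int64(n)
		flag := filesystem.DATA
		if sent == total {
			// only the final chunk closes the stream
			flag = filesystem.CLOSE
		}
		// send only the n bytes actually read, not the whole buffer
		chunk := ioutil.NopCloser(bytes.NewReader(buffer[:n]))
		if _, err := client.Append(context.Background(), accountName, destination, chunk, nil, flag, nil, nil); err != nil {
			panic(err)
		}
	}
	if err == io.EOF {
		break
	}
	if err != nil {
		panic(err)
	}
}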

@jhendrixMSFT
Copy link
Member

Sorry, I should have prefaced my sample as non-production code.
Agreed that this functionality belongs in the SDK. For some context, the packages we have are generated from our OpenAPI specs, which is what we refer to as the "protocol layer". Functionality like this would go into a hand-written "convenience layer" built on top of the protocol layer, as there's no way we could generate it.
We have discussed internally how to package convenience-layer code; I will use this issue to track it for DLS, but given other priorities at present (e.g. modules) I suspect a solution is a ways out.
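
To make the idea concrete, a convenience-layer entry point could look something like the sketch below. The package name and the UploadFile function are purely hypothetical (no such API exists in the SDK); it simply wraps the chunked Create/Append pattern from this thread behind a single call, with the Create and Append invocations mirroring the earlier example.

package adlsconvenience // hypothetical package name

import (
	"bytes"
	"context"
	"fmt"
	"io"
	"io/ioutil"

	"github.com/Azure/azure-sdk-for-go/services/datalake/store/2016-11-01/filesystem"
)

// chunkSize matches the service's 4 MB upload limit.
const chunkSize = 4 * 1024 * 1024

// UploadFile is a hypothetical helper that creates dest in the given account
// and streams size bytes from r to it in 4 MB appends, sending the CLOSE
// flag with the final chunk. A zero-length upload is left as a bare Create.
func UploadFile(ctx context.Context, client filesystem.Client, account, dest string, r io.Reader, size int64) error {
	if _, err := client.Create(ctx, account, dest, nil, nil, filesystem.DATA, nil, nil); err != nil {
		return err
	}
	buf := make([]byte, chunkSize)
	var sent int64
	for sent < size {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			sent += int64(n)
			flag := filesystem.DATA
			if sent == size {
				flag = filesystem.CLOSE
			}
			chunk := ioutil.NopCloser(bytes.NewReader(buf[:n]))
			if _, err := client.Append(ctx, account, dest, chunk, nil, flag, nil, nil); err != nil {
				return err
			}
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			if sent < size {
				return fmt.Errorf("input ended after %d of %d bytes", sent, size)
			}
			break
		}
		if err != nil {
			return err
		}
	}
	return nil
}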

jhendrixMSFT changed the title from "Failing uploading file to ADLS" to "Convenience layer for uploading file to ADLS" on Nov 15, 2018
dyindude pushed a commit to dyindude/terraform-provider-azurerm that referenced this issue on Jan 9, 2019
ArcturusZhang added the feature-request label and removed the Feature Request label on Apr 27, 2020
azure-sdk added the customer-reported label on Sep 24, 2020
tzhanl added the Client label on Nov 18, 2020
RickWinter added this to the Backlog milestone on Jul 12, 2021
RickWinter (Member) commented

Track1 will not provide a GA library with datalake support.

github-actions bot locked and limited conversation to collaborators on Apr 11, 2023