Skip to content

Latest commit

 

History

History
108 lines (69 loc) · 4.71 KB

data-ingestion-curl.md

File metadata and controls

108 lines (69 loc) · 4.71 KB
title titleSuffix description author ms.author ms.reviewer ms.date ms.service ms.subservice ms.topic
Use curl to load data into HDFS
SQL Server Big Data Clusters
Use curl to load data into HDFS on SQL Server 2019 big data cluster.
WilliamDAssafMSFT
wiassaf
wiassaf
10/05/2021
sql
big-data-cluster
conceptual

Use curl to load data into HDFS on [!INCLUDEbig-data-clusters-2019]

[!INCLUDESQL Server 2019]

This article explains how to use curl to load data into HDFS on [!INCLUDEbig-data-clusters-2019].

[!INCLUDEbig-data-clusters-banner-retirement]

Prerequisites

Obtain the service external IP

WebHDFS is started when deployment is completed, and its access goes through Knox. The Knox endpoint is exposed through a Kubernetes service called gateway-svc-external. To create the necessary WebHDFS URL to upload/download files, you need the gateway-svc-external service external IP address and the name of your big data cluster. You can get the gateway-svc-external service external IP address by running the following command:

kubectl get service gateway-svc-external -n <big data cluster name> -o json | jq -r .status.loadBalancer.ingress[0].ip

Note

The <big data cluster name> here is the name of the cluster that you specified in the deployment configuration file. The default name is mssql-cluster.

Construct the URL to access WebHDFS

Now, you can construct the URL to access the WebHDFS as follows:

https://<gateway-svc-external service external IP address>:30443/gateway/default/webhdfs/v1/

For example:

https://13.66.190.205:30443/gateway/default/webhdfs/v1/

Authentication with Active Directory

For deployments with Active Directory, use the authentication parameter with curl with Negotiate authentication.

To use curl with Active Directory authentication, run this command:

kinit <username>

The command generates a Kerberos token for curl to use. The commands demonstrated in the next sections specify the --anyauth parameter for curl. For URLs that require Negotiate authentication, curl automatically detects and uses the generated Kerberos token instead of username and password to authenticate to the URLs.

List a file

To list file under hdfs:///product_review_data, use the following curl command:

curl -i -k --anyauth -u root:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'

[!INCLUDE big-data-cluster-root-user]

For endpoints that do not use root, use the following curl command:

curl -i -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'

Put a local file into HDFS

To put a new file test.csv from local directory to product_review_data directory, use the following curl command (the Content-Type parameter is required):

curl -i -L -k --anyauth -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'

[!INCLUDE big-data-cluster-root-user]

For endpoints that do not use root, use the following curl command:

curl -i -L -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'

Create a directory

To create a directory test under hdfs:///, use the following command:

curl -i -L -k --anyauth -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'

[!INCLUDE big-data-cluster-root-user] For endpoints that do not use root, use the following curl command:

curl -i -L -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'

Next steps

For more information, see [Introducing [!INCLUDEbig-data-clusters-nover]](big-data-cluster-overview.md).