---
title: Use curl to load data into HDFS
titleSuffix: SQL Server Big Data Clusters
description: Use curl to load data into HDFS on SQL Server 2019 big data cluster.
author: WilliamDAssafMSFT
ms.author: wiassaf
ms.reviewer: wiassaf
ms.date: 10/05/2021
ms.service: sql
ms.subservice: big-data-cluster
ms.topic: conceptual
---
# Use curl to load data into HDFS on [!INCLUDEbig-data-clusters-2019]

[!INCLUDESQL Server 2019]

This article explains how to use `curl` to load data into HDFS on [!INCLUDEbig-data-clusters-2019].

[!INCLUDEbig-data-clusters-banner-retirement]
WebHDFS is started when deployment is completed, and its access goes through Knox. The Knox endpoint is exposed through a Kubernetes service called **gateway-svc-external**. To create the necessary WebHDFS URL to upload or download files, you need the **gateway-svc-external** service external IP address and the name of your big data cluster. You can get the **gateway-svc-external** service external IP address by running the following command:

```bash
kubectl get service gateway-svc-external -n <big data cluster name> -o json | jq -r .status.loadBalancer.ingress[0].ip
```
> [!NOTE]
> The `<big data cluster name>` here is the name of the cluster that you specified in the deployment configuration file. The default name is `mssql-cluster`.
Now, you can construct the URL to access WebHDFS as follows:

```
https://<gateway-svc-external service external IP address>:30443/gateway/default/webhdfs/v1/
```

For example:

```
https://13.66.190.205:30443/gateway/default/webhdfs/v1/
```
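If you run several of the commands below, it can help to build this URL once in a shell variable. A minimal sketch, using the example IP address above; in practice you would capture the address with the `kubectl` command shown earlier:

```bash
# Illustrative value; in practice, capture it with, for example:
#   EXTERNAL_IP=$(kubectl get service gateway-svc-external -n mssql-cluster -o json | jq -r '.status.loadBalancer.ingress[0].ip')
EXTERNAL_IP="13.66.190.205"

# Base URL for all WebHDFS requests through the Knox gateway
WEBHDFS_BASE="https://${EXTERNAL_IP}:30443/gateway/default/webhdfs/v1"
echo "$WEBHDFS_BASE"
```

Subsequent `curl` commands can then reference `"$WEBHDFS_BASE"` instead of repeating the full URL.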
For deployments with Active Directory, use Negotiate authentication with `curl`. To use `curl` with Active Directory authentication, first run this command:

```bash
kinit <username>
```

The command generates a Kerberos token for `curl` to use. The commands demonstrated in the next sections specify the `--anyauth` parameter for `curl`. For URLs that require Negotiate authentication, `curl` automatically detects and uses the generated Kerberos token instead of a username and password to authenticate.
To list files under `hdfs:///product_review_data`, use the following `curl` command:

```bash
curl -i -k --anyauth -u root:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'
```
[!INCLUDE big-data-cluster-root-user]
For endpoints that do not use root, use the following `curl` command:

```bash
curl -i -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X GET 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/?op=liststatus'
```
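The `liststatus` operation returns a JSON body that you can post-process with `jq`, the same tool this article uses to read the service IP. A small sketch, using a made-up response whose shape follows the WebHDFS REST API (`FileStatuses.FileStatus`):

```bash
# Illustrative LISTSTATUS response body; the two entries are invented for
# this example, but the JSON shape is defined by the WebHDFS REST API.
RESPONSE='{"FileStatuses":{"FileStatus":[{"pathSuffix":"part-0001.csv","type":"FILE"},{"pathSuffix":"archive","type":"DIRECTORY"}]}}'

# Extract just the entry names from the response
echo "$RESPONSE" | jq -r '.FileStatuses.FileStatus[].pathSuffix'
```

In a script, you would pipe the output of the `curl` command above (without `-i`, so that only the body is printed) into the same `jq` filter.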
To put a new file `test.csv` from the local directory into the `product_review_data` directory, use the following `curl` command (the `Content-Type` header is required):

```bash
curl -i -L -k --anyauth -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'
```
[!INCLUDE big-data-cluster-root-user]
For endpoints that do not use root, use the following `curl` command:

```bash
curl -i -L -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/product_review_data/test.csv?op=create' -H 'Content-Type: application/octet-stream' -T 'test.csv'
```
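When uploading many files, assembling the `op=create` URL from variables avoids editing the long URL by hand. A minimal sketch; the IP address, directory, and file name are illustrative values:

```bash
# Illustrative values for this sketch
EXTERNAL_IP="13.66.190.205"
HDFS_DIR="product_review_data"
LOCAL_FILE="test.csv"

# Assemble the WebHDFS create (upload) URL
CREATE_URL="https://${EXTERNAL_IP}:30443/gateway/default/webhdfs/v1/${HDFS_DIR}/${LOCAL_FILE}?op=create"
echo "$CREATE_URL"

# Then run the upload against a live cluster, for example:
#   curl -i -L -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT "$CREATE_URL" -H 'Content-Type: application/octet-stream' -T "$LOCAL_FILE"
```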
To create a directory `test` under `hdfs:///`, use the following command:

```bash
curl -i -L -k --anyauth -u root:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'
```
[!INCLUDE big-data-cluster-root-user]

For endpoints that do not use root, use the following `curl` command:

```bash
curl -i -L -k --anyauth -u <AZDATA_USERNAME>:<AZDATA_PASSWORD> -X PUT 'https://<gateway-svc-external IP external address>:30443/gateway/default/webhdfs/v1/test?op=MKDIRS'
```
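The `MKDIRS` operation responds with a small JSON body indicating success, which a script can check rather than inspecting the output by hand. A sketch, assuming the `{"boolean": ...}` response shape defined by the WebHDFS REST API:

```bash
# Illustrative MKDIRS response body; the shape follows the WebHDFS REST API
RESPONSE='{"boolean":true}'

# Check the result field with jq before continuing in a script
if [ "$(echo "$RESPONSE" | jq -r '.boolean')" = "true" ]; then
  echo "directory created"
fi
```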
For more information, see [Introducing [!INCLUDEbig-data-clusters-nover]](big-data-cluster-overview.md).