# HDFS Tiering
Mount distributed storage other than the Big Data Cluster's own HDFS.  
[Configure HDFS tiering on SQL Server big data clusters](https://docs.microsoft.com/en-us/sql/big-data-cluster/hdfs-tiering?view=sqlallproducts-allversions)
>HDFS tiering for SQL Server 2019 big data clusters (preview). At this time, we support connecting to Azure Data Lake Storage Gen2, and Amazon S3.


## 1. Preparing the BDC

**1. Create the credential files**
- [Azure Data Lake Storage (ADLS) Gen2](https://docs.microsoft.com/ja-jp/sql/big-data-cluster/hdfs-tiering-mount-adlsgen2?view=sqlallproducts-allversions#credentials-for-mounting)
- [AWS S3](https://docs.microsoft.com/ja-jp/sql/big-data-cluster/hdfs-tiering-mount-s3?view=sqlallproducts-allversions#access-keys)
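
As a rough sketch of what these credential files contain, the snippet below writes both files from a bash-compatible shell (on Windows, the same two files can be created with any text editor). The property names follow the Hadoop ABFS and S3A connector settings described on the linked pages; the file names mirror the ones passed to `--credential-file` later in this section, and every `<...>` value is a placeholder, not a real account or key. Treat the exact layout as an assumption and check the linked pages for the format your BDC build expects.

```shell
# Sketch only: writes placeholder credential files for the two mounts.
# Every <...> value is a placeholder to replace with your own settings.

# ADL Gen2: storage account name and shared access key
cat > files.creds <<'EOF'
fs.azure.abfs.account.name=<storage-account-name>.dfs.core.windows.net
fs.azure.account.key.<storage-account-name>.dfs.core.windows.net=<storage-account-access-key>
EOF

# AWS S3: access key ID and secret access key
cat > s3files.creds <<'EOF'
fs.s3a.access.key=<access-key-id>
fs.s3a.secret.key=<secret-access-key>
EOF
```

Because these files hold secrets in plain text, keep them out of source control and delete them once the mounts have been created.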

**2. Log in to the BDC**
```
kubectl get svc mgmtproxy-svc-external -n mssql-cluster
mssqlctl login -e $("https://{0}" -f $ENV:MGMTPROXY_ENDPOINT) -u $ENV:MGMTPROXY_LOGIN -p $ENV:MGMTPROXY_LOGIN_PASSWORD
```

**3. Mount external storage in HDFS**
```
# Mount ADL Gen2
mssqlctl storage mount create --remote-uri "abfs://azureblob@$($ENV:ADLGen2_ACCOUNT).dfs.core.windows.net/" --mount-path /mounts/azureblob --credential-file "C:\Users\decodeadmin\Desktop\Demo\00.Setup\01.環境構築\files.creds"

# Adjust permissions (the hdfs commands run inside the hadoop container)
kubectl exec -n mssql-cluster -it master-0 -c hadoop -- /bin/bash
hdfs dfs -ls -R /mounts/azureblob
hdfs dfs -chmod -R o+rx /mounts/azureblob
exit

# Mount AWS S3
mssqlctl storage mount create --remote-uri "s3a://$($ENV:S3_ACCOUNT)/" --mount-path /mounts/aws --credential-file "C:\Users\decodeadmin\Desktop\Demo\00.Setup\01.環境構築\s3files.creds"

# Check the mount status
mssqlctl storage mount status

# Delete the mounts
# mssqlctl storage mount delete --mount-path /mounts/azureblob
# mssqlctl storage mount delete --mount-path /mounts/aws
```
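
The permission adjustment above opens an interactive shell in the container. A minimal sketch of a non-interactive variant, for use from a bash-compatible shell, is shown here. The `run` and `adjust_mount_permissions` helpers are hypothetical names introduced for illustration; the namespace, pod, and container names are taken from the commands above and may differ in your cluster.

```shell
# Hypothetical helper: prints the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: lists a mount recursively, then grants read and
# execute permission to "other" users on everything under it.
# Assumes the hadoop container lives in pod master-0, as in the steps above.
adjust_mount_permissions() {
  mount_path="$1"
  run kubectl exec -n mssql-cluster master-0 -c hadoop -- hdfs dfs -ls -R "$mount_path"
  run kubectl exec -n mssql-cluster master-0 -c hadoop -- hdfs dfs -chmod -R o+rx "$mount_path"
}
```

Setting `DRY_RUN=1` before calling `adjust_mount_permissions /mounts/azureblob` prints the two `kubectl exec` commands without touching the cluster, which is a cheap way to review them first.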

![ADL Gen2](https://raw.githubusercontent.com/MasayukiOzawa/decode-2019-demo/master/Images/03.Data%20Store/01.Storage%20Pool/ADL%20Gen2.png)
![AWS S3](https://github.com/MasayukiOzawa/decode-2019-demo/raw/master/Images/03.Data%20Store/01.Storage%20Pool/AWS%20S3.png)  
![HDFS Mount](https://raw.githubusercontent.com/MasayukiOzawa/decode-2019-demo/master/Images/03.Data%20Store/01.Storage%20Pool/HDFS%20Mount.png)

## 2. Accessing tiered data (ADL Gen2)

In [8]:
USE [StoragePool];

-- Reset: drop the external table if it already exists
IF EXISTS (SELECT * FROM sys.external_tables WHERE name = 'StoragePoolADLTBL')
BEGIN
    DROP EXTERNAL TABLE [StoragePoolADLTBL];
END;
GO

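**1. Create the external table**  
The table is created the same way as for data stored directly in HDFS; only `LOCATION` points at the mount path. The `SqlStoragePool` data source and the `csv_file` file format are assumed to already exist in the database.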

In [9]:
USE [StoragePool];

-- Create an external table over the contents of ADL Gen2
CREATE EXTERNAL TABLE [StoragePoolADLTBL]
(
    wcs_click_date_sk BIGINT,
    wcs_click_time_sk BIGINT,
    wcs_sales_sk BIGINT,
    wcs_item_sk BIGINT,
    wcs_web_page_sk BIGINT,
    wcs_user_sk BIGINT
)
WITH
(
    DATA_SOURCE = SqlStoragePool,
    LOCATION = '/mounts/azureblob',
    FILE_FORMAT = csv_file
);
GO

**2. Query the ADL Gen2 data**

In [10]:
USE [StoragePool];

SELECT COUNT(*) FROM [StoragePoolADLTBL]
SELECT TOP 25 * FROM [StoragePoolADLTBL]

(No column name)
998


wcs_click_date_sk,wcs_click_time_sk,wcs_sales_sk,wcs_item_sk,wcs_web_page_sk,wcs_user_sk
38569,4250,,7840,18,
38569,85106,,11130,18,
38569,52655,,3716,18,
38569,70934,,13243,18,
38569,40166,,5389,18,
38570,73271,,3331,18,
38570,24651,,10049,18,
38570,23805,,921,18,
38570,66458,,4407,18,
38570,65912,,11494,18,


![ADL Gen2 Query Plan](https://github.com/MasayukiOzawa/decode-2019-demo/raw/master/Images/03.Data%20Store/01.Storage%20Pool/ADL%20Query%20Plan.png)

## 3. Accessing tiered data (AWS S3)

In [11]:
USE [StoragePool];

-- Reset: drop the external table if it already exists
IF EXISTS (SELECT * FROM sys.external_tables WHERE name = 'StoragePoolS3TBL')
BEGIN
    DROP EXTERNAL TABLE [StoragePoolS3TBL];
END;
GO

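**1. Create the external table**  
The table is created the same way as for data stored directly in HDFS; only `LOCATION` points at the mount path. The `SqlStoragePool` data source and the `csv_file` file format are assumed to already exist in the database.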

In [12]:
USE [StoragePool];

-- Create an external table over the contents of AWS S3
CREATE EXTERNAL TABLE [StoragePoolS3TBL]
(
    wcs_click_date_sk BIGINT,
    wcs_click_time_sk BIGINT,
    wcs_sales_sk BIGINT,
    wcs_item_sk BIGINT,
    wcs_web_page_sk BIGINT,
    wcs_user_sk BIGINT
)
WITH
(
    DATA_SOURCE = SqlStoragePool,
    LOCATION = '/mounts/aws/s3',
    FILE_FORMAT = csv_file
);
GO

**2. Query the AWS S3 data**

In [13]:
USE [StoragePool];

SELECT COUNT(*) FROM [StoragePoolS3TBL]
SELECT TOP 25 * FROM [StoragePoolS3TBL]

(No column name)
1497


wcs_click_date_sk,wcs_click_time_sk,wcs_sales_sk,wcs_item_sk,wcs_web_page_sk,wcs_user_sk
38569,4250,,7840,18,
38569,85106,,11130,18,
38569,52655,,3716,18,
38569,70934,,13243,18,
38569,40166,,5389,18,
38570,73271,,3331,18,
38570,24651,,10049,18,
38570,23805,,921,18,
38570,66458,,4407,18,
38570,65912,,11494,18,
