![Microsoft](https://raw.githubusercontent.com/microsoft/azuredatastudio/master/src/sql/media/microsoft-small-logo.png)
 
## Deploy SQL Server 2019 Big Data Cluster on an existing Azure Kubernetes Service (AKS) cluster
 
This notebook walks through the process of deploying a <a href="https://docs.microsoft.com/sql/big-data-cluster/big-data-cluster-overview?view=sqlallproducts-allversions">SQL Server 2019 Big Data Cluster</a> on an existing AKS cluster.
 
* Follow the instructions in the **Prerequisites** cell to install the tools if not already installed.
* Make sure you have the target cluster set as the current context in your kubectl config file.
        In a default installation, the config file is typically under C:\Users\(userid)\.kube on Windows and under ~/.kube/ on macOS and Linux.
        In the kubectl config file, look for "current-context" and ensure it is set to the AKS cluster that the SQL Server 2019 Big Data Cluster will be deployed to.
* The **Required information** cell will prompt you for the password that will be used to access the cluster controller, SQL Server, and Knox.
* The values in the **Default settings** cell can be changed as appropriate.
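
As a rough illustration of the context check described above, the sketch below reads the `current-context` key from kubeconfig text using only the Python standard library. The helper names `read_current_context` and `default_kubeconfig_path` and the sample cluster name `my-aks-cluster` are hypothetical and not part of this notebook. The line-based scan assumes the key sits at the top level of the file, which is where kubectl normally writes it. Running `kubectl config current-context` reports the same value.

```python
import os
import re


def read_current_context(kubeconfig_text):
    """Return the value of the top-level current-context key, or None if it is absent or empty."""
    for line in kubeconfig_text.splitlines():
        # Only match an unindented key, so nested keys are ignored.
        match = re.match(r'^current-context:\s*(.*)$', line)
        if match:
            value = match.group(1).strip().strip('"\'')
            return value or None
    return None


def default_kubeconfig_path():
    """Hypothetical helper: the default kubeconfig location under the user's home directory."""
    return os.path.join(os.path.expanduser('~'), '.kube', 'config')


# Made-up kubeconfig content, used only to show the helper in action.
sample = """apiVersion: v1
clusters: []
current-context: my-aks-cluster
kind: Config
"""
print(read_current_context(sample))
```

To check a real file, pass the text of the file at `default_kubeconfig_path()` (or at the path in the `KUBECONFIG` environment variable, if it is set) to `read_current_context`.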

<span style="color:red"><font size="3">Please press the "Run Cells" button to run the notebook</font></span>

### **Prerequisites** 
Ensure the following tools are installed and added to PATH before proceeding.
 
|Tools|Description|Installation|
|---|---|---|
|kubectl | Command-line tool for monitoring the underlying Kubernetes cluster | [Installation](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-binary-using-native-package-management) |
|azdata | Command-line tool for installing and managing a Big Data Cluster |[Installation](https://docs.microsoft.com/en-us/sql/big-data-cluster/deploy-install-azdata?view=sqlallproducts-allversions) |

### **Check dependencies**

In [1]:
import pandas,sys,os,json,html,getpass,time
pandas_version = pandas.__version__.split('.')
pandas_major = int(pandas_version[0])
pandas_minor = int(pandas_version[1])
pandas_patch = int(pandas_version[2])
if not (pandas_major > 0 or (pandas_major == 0 and pandas_minor > 24) or (pandas_major == 0 and pandas_minor == 24 and pandas_patch >= 2)):
    sys.exit('Please upgrade the Notebook dependencies before proceeding. To do so, run the "Reinstall Notebook dependencies" command in the Command Palette (View menu -> Command Palette…).')
def run_command(command):
    print("Executing: " + command)
    !{command}
    if _exit_code != 0:
        sys.exit(f'Command execution failed with exit code: {str(_exit_code)}.\n\t{command}\n')
    print(f'Successfully executed: {command}')

run_command('kubectl version --client=true')
run_command('azdata --version')
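
The minimum-version check in the cell above can also be written as a tuple comparison. This is a minimal sketch rather than the notebook's own code: `version_tuple` and `meets_minimum` are hypothetical helper names, and keeping only the leading digits of each component (so that `1.0.0rc1` is read as `1.0.0`) is an assumption about how pre-release suffixes should be treated.

```python
import re


def version_tuple(version, width=3):
    """Convert a dotted version string into a tuple of `width` integers."""
    parts = []
    for piece in version.split('.')[:width]:
        # Keep only the leading digits, so a component such as '0rc1' becomes 0.
        digits = re.match(r'\d+', piece)
        parts.append(int(digits.group()) if digits else 0)
    # Pad short versions such as '1.0' with zeros.
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def meets_minimum(version, minimum=(0, 24, 2)):
    """Return True when `version` is at least `minimum` (0.24.2, as in the check above)."""
    return version_tuple(version) >= minimum


for candidate in ('0.24.1', '0.24.2', '1.0.0rc1'):
    print(candidate, meets_minimum(candidate))
```

Python compares tuples element by element, so the three-way condition on major, minor, and patch collapses into a single `>=`.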

### **Required information**

In [3]:
invoked_by_wizard = "AZDATA_NB_VAR_BDC_ADMIN_PASSWORD" in os.environ
if not invoked_by_wizard:
    mssql_password = getpass.getpass(prompt = 'SQL Server 2019 Big Data Cluster controller password')
    if mssql_password == "":
        sys.exit('Password is required.')
    confirm_password = getpass.getpass(prompt = 'Confirm password')
    if mssql_password != confirm_password:
        sys.exit('Passwords do not match.')

print('You can also use the controller password to access Knox and SQL Server.')

### **Default settings**

In [4]:
if invoked_by_wizard:
    mssql_cluster_name = os.environ["AZDATA_NB_VAR_BDC_CLUSTER_NAME"]
    mssql_controller_username = os.environ["AZDATA_NB_VAR_BDC_CONTROLLER_USERNAME"]
    mssql_password = os.environ["AZDATA_NB_VAR_BDC_ADMIN_PASSWORD"]
    mssql_source_profile = os.environ["AZDATA_NB_VAR_BDC_DEPLOYMENT_PROFILE"]
    mssql_kube_config_path = os.environ["AZDATA_NB_VAR_BDC_KUBECONFIG_PATH"]
    mssql_cluster_context = os.environ["AZDATA_NB_VAR_BDC_CLUSTER_CONTEXT"]
    mssql_sqlserver_scale = int(os.environ["AZDATA_NB_VAR_BDC_SQLSERVER_SCALE"])
    mssql_compute_scale = int(os.environ["AZDATA_NB_VAR_BDC_COMPUTEPOOL_SCALE"])
    mssql_data_scale = int(os.environ["AZDATA_NB_VAR_BDC_DATAPOOL_SCALE"])
    mssql_hdfs_scale = int(os.environ["AZDATA_NB_VAR_BDC_HDFSPOOL_SCALE"])
    mssql_spark_scale = int(os.environ["AZDATA_NB_VAR_BDC_SPARKPOOL_SCALE"])
    mssql_name_node_scale = int(os.environ["AZDATA_NB_VAR_BDC_NAMENODE_SCALE"])
    mssql_include_spark = os.environ["AZDATA_NB_VAR_BDC_INCLUDESPARK"] == "true"
    mssql_controller_port = int(os.environ["AZDATA_NB_VAR_BDC_CONTROLLER_PORT"])
    mssql_sqlserver_port = int(os.environ["AZDATA_NB_VAR_BDC_SQL_PORT"])
    mssql_gateway_port = int(os.environ["AZDATA_NB_VAR_BDC_GATEWAY_PORT"])
    mssql_readable_secondary_port = os.environ["AZDATA_NB_VAR_BDC_READABLE_SECONDARY_PORT"]
    mssql_controller_data_storage_class = os.environ["AZDATA_NB_VAR_BDC_CONTROLLER_DATA_STORAGE_CLASS"]
    mssql_controller_data_size = int(os.environ["AZDATA_NB_VAR_BDC_CONTROLLER_DATA_STORAGE_SIZE"])
    mssql_controller_logs_storage_class = os.environ["AZDATA_NB_VAR_BDC_CONTROLLER_LOGS_STORAGE_CLASS"]
    mssql_controller_logs_size = int(os.environ["AZDATA_NB_VAR_BDC_CONTROLLER_LOGS_STORAGE_SIZE"])
    mssql_datapool_data_storage_class = os.environ["AZDATA_NB_VAR_BDC_DATA_DATA_STORAGE_CLASS"]
    mssql_datapool_data_size = int(os.environ["AZDATA_NB_VAR_BDC_DATA_DATA_STORAGE_SIZE"])
    mssql_datapool_logs_storage_class = os.environ["AZDATA_NB_VAR_BDC_DATA_LOGS_STORAGE_CLASS"]
    mssql_datapool_logs_size = int(os.environ["AZDATA_NB_VAR_BDC_DATA_LOGS_STORAGE_SIZE"])
    mssql_hdfs_data_storage_class = os.environ["AZDATA_NB_VAR_BDC_HDFS_DATA_STORAGE_CLASS"]
    mssql_hdfs_data_size = int(os.environ["AZDATA_NB_VAR_BDC_HDFS_DATA_STORAGE_SIZE"])
    mssql_hdfs_logs_storage_class = os.environ["AZDATA_NB_VAR_BDC_HDFS_LOGS_STORAGE_CLASS"]
    mssql_hdfs_logs_size = int(os.environ["AZDATA_NB_VAR_BDC_HDFS_LOGS_STORAGE_SIZE"])
    mssql_sql_data_storage_class = os.environ["AZDATA_NB_VAR_BDC_SQL_DATA_STORAGE_CLASS"]
    mssql_sql_data_size = int(os.environ["AZDATA_NB_VAR_BDC_SQL_DATA_STORAGE_SIZE"])
    mssql_sql_logs_storage_class = os.environ["AZDATA_NB_VAR_BDC_SQL_LOGS_STORAGE_CLASS"]
    mssql_sql_logs_size = int(os.environ["AZDATA_NB_VAR_BDC_SQL_LOGS_STORAGE_SIZE"])
    mssql_hadr_enabled = os.environ["AZDATA_NB_VAR_BDC_ENABLE_HADR"] == "true"
    os.environ["KUBECONFIG"] = mssql_kube_config_path
else:
    mssql_source_profile = 'aks-dev-test'
    mssql_cluster_name = 'mssql-cluster'
    mssql_controller_username = 'admin'

mssql_target_profile = 'ads-bdc-custom-profile'
print(f'SQL Server Big Data Cluster name: {mssql_cluster_name}')
print(f'SQL Server Big Data Cluster controller username: {mssql_controller_username}')
print(f'Deployment source profile: {mssql_source_profile}')
print(f'Deployment profile: {mssql_target_profile}')
if invoked_by_wizard:
    print(f'kube config path: {mssql_kube_config_path}')
    print(f'Cluster context: {mssql_cluster_context}')
    print(f'SQL Server Master scale: {mssql_sqlserver_scale}')
    print(f'Enable Availability Groups: {mssql_hadr_enabled}')
    print(f'Compute pool scale: {mssql_compute_scale}')
    print(f'HDFS pool scale: {mssql_hdfs_scale}')
    print(f'Include Spark in HDFS pool: {mssql_include_spark}')
    print(f'Data pool scale: {mssql_data_scale}')
    print(f'Spark pool scale: {mssql_spark_scale}')
    print(f'HDFS name node scale: {mssql_name_node_scale}')
    print(f'Controller port: {mssql_controller_port}')
    print(f'SQL Server port: {mssql_sqlserver_port}')
    print(f'Gateway port: {mssql_gateway_port}')
    if mssql_readable_secondary_port != '':
        print(f'Readable secondary port: {mssql_readable_secondary_port}')
    print(f'Controller data storage class name: {mssql_controller_data_storage_class}')
    print(f'Controller logs storage class name: {mssql_controller_logs_storage_class}')
    print(f'Controller data storage size(GB): {mssql_controller_data_size}')
    print(f'Controller logs storage size(GB): {mssql_controller_logs_size}')
    print(f'Data pool data storage class name: {mssql_datapool_data_storage_class}')
    print(f'Data pool logs storage class name: {mssql_datapool_logs_storage_class}')
    print(f'Data pool data storage size(GB): {mssql_datapool_data_size}')
    print(f'Data pool logs storage size(GB): {mssql_datapool_logs_size}')
    print(f'HDFS data storage class name: {mssql_hdfs_data_storage_class}')
    print(f'HDFS logs storage class name: {mssql_hdfs_logs_storage_class}')
    print(f'HDFS data storage size(GB): {mssql_hdfs_data_size}')
    print(f'HDFS logs storage size(GB): {mssql_hdfs_logs_size}')
    print(f'SQL Server Master data storage class name: {mssql_sql_data_storage_class}')
    print(f'SQL Server Master logs storage class name: {mssql_sql_logs_storage_class}')
    print(f'SQL Server Master data storage size(GB): {mssql_sql_data_size}')
    print(f'SQL Server Master logs storage size(GB): {mssql_sql_logs_size}')

### **Set and show current context**

In [0]:
if invoked_by_wizard:
    run_command(f'kubectl config use-context {mssql_cluster_context}')
run_command('kubectl config current-context')

### **Create a deployment configuration file**

In [6]:
os.environ["ACCEPT_EULA"] = 'yes'
run_command(f'azdata bdc config init --source {mssql_source_profile} --target {mssql_target_profile} --force')
if invoked_by_wizard:
    run_command(f'azdata bdc config replace -c {mssql_target_profile}/bdc.json -j "$.spec.resources.gateway.spec.endpoints[?(@.name==""Knox"")].port={mssql_gateway_port}"')
    run_command(f'azdata bdc config replace -c {mssql_target_profile}/control.json -j "$.spec.endpoints[?(@.name==""Controller"")].port={mssql_controller_port}"')
    run_command(f'azdata bdc config replace -c {mssql_target_profile}/control.json -j $.spec.storage.data.className={mssql_controller_data_storage_class}')
    run_command(f'azdata bdc config replace -c {mssql_target_profile}/control.json -j $.spec.storage.data.size={mssql_controller_data_size}Gi')
    run_command(f'azdata bdc config replace -c {mssql_target_profile}/control.json -j $.spec.storage.logs.className={mssql_controller_logs_storage_class}')
    run_command(f'azdata bdc config replace -c {mssql_target_profile}/control.json -j $.spec.storage.logs.size={mssql_controller_logs_size}Gi')
    bdcPatch = {
        "patch":[
            {
                "op": "replace",
                "path": "spec.resources.master.spec",
                "value": {
                    "type": "Master",
                    "replicas": mssql_sqlserver_scale,
                    "endpoints": [
                        {
                            "name": "Master",
                            "serviceType": "LoadBalancer",
                            "port": mssql_sqlserver_port
                        }
                    ],
                    "settings": {
                        "sql": {
                            "hadr.enabled": mssql_hadr_enabled
                            }
                        },
                    "storage": {
                    "data": {
                        "size": f'{mssql_sql_data_size}Gi',
                        "className": mssql_sql_data_storage_class,
                        "accessMode": "ReadWriteOnce"
                        },
                    "logs": {
                        "size": f'{mssql_sql_logs_size}Gi',
                        "className": mssql_sql_logs_storage_class,
                        "accessMode": "ReadWriteOnce"
                        }
                    }
                }
            }, {
                "op": "replace",
                "path": "metadata.name",
                "value": mssql_cluster_name
            }, {
                "op": "replace",
                "path": "spec.resources.sparkhead.spec",
                "value": {
                    "replicas": mssql_spark_scale
                }
            }, {
                "op": "replace",
                "path": "spec.resources.nmnode-0.spec",
                "value": {
                    "replicas": mssql_name_node_scale
                }
            }, {
                "op": "replace",
                "path": "spec.resources.compute-0.spec",
                "value": {
                    "type": "Compute",
                    "replicas": mssql_compute_scale
                }
            }, {
                "op": "replace",
                "path": "spec.resources.storage-0.spec",
                "value": {
                    "type": "Storage",
                    "replicas": mssql_hdfs_scale,
                    "settings": {
                        "spark": {
                            "includeSpark": mssql_include_spark
                        }
                    },
                    "storage": {
                        "data": {
                            "size": f'{mssql_hdfs_data_size}Gi',
                            "className": mssql_hdfs_data_storage_class,
                            "accessMode": "ReadWriteOnce"
                        },
                        "logs": {
                            "size": f'{mssql_hdfs_logs_size}Gi',
                            "className": mssql_hdfs_logs_storage_class,
                            "accessMode": "ReadWriteOnce"
                        }
                    }
                }
            },{
                "op": "replace",
                "path": "spec.resources.data-0.spec",
                "value": {
                    "type": "Data",
                    "replicas": mssql_data_scale,
                    "storage": {
                        "data": {
                            "size": f'{mssql_datapool_data_size}Gi',
                            "className": mssql_datapool_data_storage_class,
                            "accessMode": "ReadWriteOnce"
                        },
                        "logs": {
                            "size": f'{mssql_datapool_logs_size}Gi',
                            "className": mssql_datapool_logs_storage_class,
                            "accessMode": "ReadWriteOnce"
                        }
                    }
                }
            }
        ]
    }
    if mssql_spark_scale > 0:
        bdcPatch['patch'].append({
            "op": "add",
            "path": "spec.resources.spark-0",
            "value": {
                "metadata": {
                    "kind": "Pool",
                    "name": "default"
                    },
                "spec": {
                    "type": "Spark",
                    "replicas": mssql_spark_scale
                    }
                }
            })
        bdcPatch['patch'].append({
            "op": "add",
            "path": "spec.services.spark.resources/-",
            "value": "spark-0"
            })
        bdcPatch['patch'].append({
            "op": "add",
            "path": "spec.services.hdfs.resources/-",
            "value": "spark-0"
            })    
    if mssql_hadr_enabled:
        bdcPatch['patch'][0]['value']['endpoints'].append({
            "name": "MasterSecondary",
            "dnsName": "",
            "serviceType": "NodePort",
            "port": int(mssql_readable_secondary_port)})
    with open(f'{mssql_target_profile}/patch.json', "w") as write_file:
        json.dump(bdcPatch, write_file)
    run_command(f'azdata bdc config patch -c {mssql_target_profile}/bdc.json --patch-file {mssql_target_profile}/patch.json')
else:
    run_command(f'azdata bdc config replace -c {mssql_target_profile}/bdc.json -j metadata.name={mssql_cluster_name}')
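
Before handing `patch.json` to `azdata bdc config patch`, it can help to list which path each operation touches and which replica count it sets. The sketch below is one way to do that under stated assumptions: `summarize_patch` is a hypothetical helper, and the small `example_patch` dictionary with its replica counts is made up to mirror the shape of `bdcPatch` above.

```python
def summarize_patch(patch_document):
    """Return one (op, path, replicas) tuple per operation in a patch document."""
    rows = []
    for entry in patch_document.get('patch', []):
        value = entry.get('value')
        replicas = None
        if isinstance(value, dict):
            # "replace" operations on a pool spec carry replicas directly;
            # the "add" operation for a new pool nests it under "spec".
            replicas = value.get('replicas', value.get('spec', {}).get('replicas'))
        rows.append((entry.get('op'), entry.get('path'), replicas))
    return rows


# Made-up patch with the same structure as bdcPatch, for illustration only.
example_patch = {
    "patch": [
        {"op": "replace", "path": "metadata.name", "value": "mssql-cluster"},
        {"op": "replace", "path": "spec.resources.compute-0.spec",
         "value": {"type": "Compute", "replicas": 1}},
        {"op": "add", "path": "spec.resources.spark-0",
         "value": {"metadata": {"kind": "Pool", "name": "default"},
                   "spec": {"type": "Spark", "replicas": 2}}},
    ]
}

for op, path, replicas in summarize_patch(example_patch):
    print(f'{op:8} {path:32} replicas={replicas}')
```

Reading the paths next to the replica counts makes it easier to spot a scale value that has been attached to the wrong pool.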

### **Create SQL Server 2019 Big Data Cluster**

In [7]:
print(f'Creating SQL Server 2019 Big Data Cluster: {mssql_cluster_name} using configuration {mssql_target_profile}')
os.environ["CONTROLLER_USERNAME"] = mssql_controller_username
os.environ["CONTROLLER_PASSWORD"] = mssql_password
os.environ["MSSQL_SA_PASSWORD"] = mssql_password
os.environ["KNOX_PASSWORD"] = mssql_password
run_command(f'azdata bdc create -c {mssql_target_profile}')

### **Log in to SQL Server 2019 Big Data Cluster**

In [8]:
run_command(f'azdata login --cluster-name {mssql_cluster_name}')

### **Show SQL Server 2019 Big Data Cluster endpoints**

In [9]:
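from IPython.display import display, HTML
# -1 means "no limit" before pandas 1.0; pandas 1.0 and later expect None instead
pandas.set_option('display.max_colwidth', None if pandas_major >= 1 else -1)
cmd = 'azdata bdc endpoint list'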
cmdOutput = !{cmd}
endpoints = json.loads(''.join(cmdOutput))
endpointsDataFrame = pandas.DataFrame(endpoints)
endpointsDataFrame.columns = [' '.join(word[0].upper() + word[1:] for word in columnName.split()) for columnName in endpoints[0].keys()]
display(HTML(endpointsDataFrame.to_html(index=False, render_links=True)))

### **Connect to SQL Server Master instance in Azure Data Studio**
Click the link below to connect to the SQL Server Master instance of the SQL Server 2019 Big Data Cluster.

In [10]:
sqlEndpoints = [x for x in endpoints if x['name'] == 'sql-server-master']
if sqlEndpoints and len(sqlEndpoints) == 1:
    connectionParameter = '{"serverName":"' + sqlEndpoints[0]['endpoint'] + '","providerName":"MSSQL","authenticationType":"SqlLogin","userName":"sa","password":' + json.dumps(mssql_password) + '}'
    display(HTML('<br/><a href="command:azdata.connect?' + html.escape(connectionParameter)+'"><font size="3">Click here to connect to SQL Server Master instance</font></a><br/>'))
else:
    sys.exit('Could not find the SQL Server Master instance endpoint')
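
The connection link above is assembled by string concatenation, with only the password passed through `json.dumps`. A possible alternative, sketched here under stated assumptions, serializes the whole parameter object at once so that every field is JSON-escaped the same way. `build_connect_link` is a hypothetical helper, and the server address and password in the example are made-up values.

```python
import html
import json


def build_connect_link(server_name, password, user_name='sa'):
    """Build the azdata.connect command link from a parameter dictionary."""
    params = {
        "serverName": server_name,
        "providerName": "MSSQL",
        "authenticationType": "SqlLogin",
        "userName": user_name,
        "password": password,
    }
    # json.dumps escapes quotes and backslashes in every value;
    # html.escape then makes the JSON safe inside the href attribute.
    payload = html.escape(json.dumps(params))
    return ('<a href="command:azdata.connect?' + payload +
            '">Click here to connect to SQL Server Master instance</a>')


# Made-up endpoint and password, used only to show the escaping.
print(build_connect_link('10.0.0.4,31433', 'p"ss\\word'))
```

In the notebook, the returned string would be wrapped in `HTML(...)` and passed to `display`, as in the cell above.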