<p align="center">
<img src ="https://raw.githubusercontent.com/microsoft/azuredatastudio/master/src/sql/media/microsoft_logo_gray.svg?sanitize=true" width="250" align="center">
</p>
 
## Model Management with MLFlow on SQL Server 2019 Big Data Cluster in kubeadm
 
This notebook walks through deploying MLFlow in a SQL Server 2019 Big Data Cluster running on kubeadm. Once it completes, you can connect to the MLFlow container and use it to track models.

<span style="color:red"><font size="3">Please press the "Run Cells" button to run the notebook</font></span>

### **Prerequisites**
Ensure the following tools are installed and added to PATH before proceeding.

|Tools|Description|Installation|
|---|---|---|
|kubectl | Command-line tool for monitoring the underlying Kubernetes cluster | [Installation](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl-binary-using-native-package-management) |

### **Check dependencies**

In [2]:
import pandas,sys,os,json,time,subprocess
pandas_version = pandas.__version__.split('.')
pandas_major = int(pandas_version[0])
pandas_minor = int(pandas_version[1])
pandas_patch = int(pandas_version[2])
if not (pandas_major > 0 or (pandas_major == 0 and pandas_minor > 24) or (pandas_major == 0 and pandas_minor == 24 and pandas_patch >= 2)):
    sys.exit('Please upgrade the Notebook dependency before you can proceed, you can do it by running the "Reinstall Notebook dependencies" command in command palette (View menu -> Command Palette…).')
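def run_command(command):
    # Run a shell command, echo its combined stdout/stderr, and return the output.
    print("Executing: " + command)
    try:
        stdout = subprocess.check_output(
            command,
            stderr=subprocess.STDOUT,
            shell=True).decode("utf-8")
    except subprocess.CalledProcessError as error:
        # check_output raises on a non-zero exit code; show the output and stop.
        print(error.output.decode("utf-8"))
        sys.exit(f'Command execution failed with exit code: {error.returncode}.\n\t{command}\n')
    print(stdout)
    print(f'Successfully executed: {command}')
    return stdout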
    
run_command('kubectl version --client=true')
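
The version gate above compares the major, minor, and patch numbers by hand. As a minimal sketch of the same check, the comparison can be written with tuples, which Python orders element by element. The helper names `parse_version` and `meets_minimum` are hypothetical and not part of this notebook; the minimum of 0.24.2 is taken from the condition above.

```python
import re

def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break  # stop at the first component with no leading digits, e.g. "dev0"
        parts.append(int(match.group()))
    return tuple(parts)

def meets_minimum(version, minimum=(0, 24, 2)):
    """Return True when the parsed version is at least the minimum."""
    return parse_version(version) >= minimum
```

Unlike `int(pandas_version[2])`, this variant tolerates suffixes such as `rc0` in the patch component, because only the leading digits of each component are read.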

### **Setup cluster context**
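Enter the path to your kube config file and the name of your Big Data Cluster. The cluster name is used as the Kubernetes namespace for every command that follows.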

In [3]:
# os.path.expanduser("~") resolves the home directory on Linux, macOS, and Windows,
# so home_dir can never be None on an unrecognized platform.
home_dir = os.path.expanduser("~")
default_config_path = os.path.join(home_dir, ".kube", "config")
kube_config = os.environ.get("KUBECONFIG")
if kube_config:
    default_config_path = kube_config
kube_config = input("Enter kube config. Default: %s" % default_config_path) or default_config_path
os.environ["KUBECONFIG"] = kube_config
print("Cluster Config Location: %s" % kube_config)
namespace = input("Enter Big data cluster name")
run_command('kubectl config set-context --current --namespace=%s' % namespace)
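
The cell above picks the kube config path in a fixed order: the value typed at the prompt, then the `KUBECONFIG` environment variable, then `.kube/config` under the home directory. A minimal sketch of that precedence as a standalone function follows; `resolve_kube_config` and its `environ` parameter are hypothetical names introduced here so the logic can be exercised without a prompt.

```python
import os

def resolve_kube_config(user_input="", environ=None):
    """Pick the kube config path: typed value, then KUBECONFIG, then ~/.kube/config."""
    env = os.environ if environ is None else environ
    # Fall back to the conventional location when KUBECONFIG is unset or empty.
    default = env.get("KUBECONFIG") or os.path.join(
        os.path.expanduser("~"), ".kube", "config")
    # An empty answer at the prompt means "accept the default".
    return user_input or default
```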

In [4]:
from pathlib import Path
def find_file(file_name):
    root = os.getcwd()
    file_path = None
    for filename in Path(root).rglob(file_name):
        file_path = os.path.join(root, filename)
        break
    return file_path
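
`find_file` returns `None` when nothing matches, and the later cells would then pass the literal string `None` to `kubectl`. A stricter variant, sketched below under the assumption that failing early is preferable, raises an error instead. The name `find_file_strict` and the optional `root` parameter are hypothetical additions.

```python
from pathlib import Path

def find_file_strict(file_name, root=None):
    """Return the first path under root matching file_name, or raise FileNotFoundError."""
    base = Path(root) if root is not None else Path.cwd()
    # rglob searches base and all of its subdirectories.
    for candidate in base.rglob(file_name):
        return str(candidate)
    raise FileNotFoundError("Could not find %s under %s" % (file_name, base))
```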

### **Deploy MLFlow Container**
Apply the `mlflow-kubeadm.yaml` manifest to deploy an MLFlow container in your Big Data Cluster's namespace.

In [6]:
config_file = find_file('mlflow-kubeadm.yaml')
run_command('kubectl apply -f "%s" -n %s' % (config_file, namespace))

### **Get MLFlow container endpoint**

In [17]:

# Poll until the first container of the MLFlow pod reports a running state.
for i in range(1, 50):
    pod_status = run_command('kubectl get pod --selector=app=mlflow -n %s -o=jsonpath="{.items[0].status.containerStatuses[0].state}"' % namespace)
    if pod_status and "running" in str(pod_status):
        break
    else:
        time.sleep(50)
        



You can access your MLFlow container at this URL.
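
The waiting loop in this section retries a status check up to 49 times with a 50-second pause between attempts. The same pattern can be sketched as a reusable helper that also reports whether the wait succeeded. `wait_until` and its parameters are hypothetical names; the defaults mirror the loop's values.

```python
import time

def wait_until(check, attempts=49, delay_seconds=50, sleep=time.sleep):
    """Call check() until it returns a truthy value or the attempts run out.

    Returns True on success and False if every attempt failed.
    """
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final attempt; there is nothing left to wait for.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

In this notebook, `check` could be a small function that calls `run_command` with the `kubectl get pod` command and tests whether `"running"` appears in the output. A `False` return value makes a timeout visible instead of letting the notebook continue silently.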

### **Install the MLFlow client inside BDC**

Get the number of replicas in the `storage-0` stateful set.

In [18]:
replicas = run_command('kubectl get sts storage-0 -n %s -o=jsonpath="{.status.replicas}"' % namespace)

Copy the MLFlow wheel to each storage pod and install it with `pip3`.

In [19]:
package_file = find_file('mlflow-1.1.1.dev0-py3-none-any.whl')
for i in range(0, int(replicas)):
    pod_name = "storage-0-%s" % str(i)
    print('Installing MLFlow in %s' % pod_name)
    run_command('kubectl cp "%s" -c hadoop %s:/var/mlflow-1.1.1.dev0-py3-none-any.whl -n %s' % (package_file, pod_name, namespace))
    run_command('kubectl exec -ti %s -c hadoop -n %s -- pip3 install /var/mlflow-1.1.1.dev0-py3-none-any.whl' % (pod_name, namespace))
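
The loop above relies on the Kubernetes convention that pods in a stateful set are named `<set name>-<ordinal>`, with ordinals starting at 0. A minimal sketch of that naming step on its own is shown below; `storage_pod_names` is a hypothetical helper, and it strips whitespace because the replica count arrives as command output text.

```python
def storage_pod_names(replicas, stateful_set="storage-0"):
    """Build the pod names of a stateful set from its replica count."""
    # The count may be a string such as "2" or "2\n" taken from kubectl output.
    count = int(str(replicas).strip())
    return ["%s-%d" % (stateful_set, ordinal) for ordinal in range(count)]

# storage_pod_names("2") returns ['storage-0-0', 'storage-0-1']
```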

### **Install the MLFlow client inside Azure Data Studio**

In [0]:
pip install mlflow-1.1.1.dev0-py3-none-any.whl

You can now use MLFlow from Azure Data Studio.
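
As a rough illustration of what tracking looks like once the client is installed, the sketch below logs one parameter and one metric through the MLFlow Python API. The function name `log_example_run`, the experiment name, and the logged values are placeholders chosen for illustration. The tracking URI must be the endpoint of your own MLFlow container, which this notebook does not specify.

```python
def log_example_run(tracking_uri, experiment_name="bdc-example"):
    """Log one illustrative parameter and metric to an MLFlow tracking server."""
    # Imported inside the function so this cell can be defined before the package is installed.
    import mlflow

    # Point the client at the tracking server and select (or create) an experiment.
    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run():
        mlflow.log_param("alpha", 0.5)   # placeholder hyperparameter
        mlflow.log_metric("rmse", 0.78)  # placeholder metric value
```

Call it with the address of your MLFlow container, for example `log_example_run("http://<your-mlflow-endpoint>")`, where the address in angle brackets is a stand-in for the real endpoint.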