This notebook was generated from the following AutoML run:

https://ml.azure.com/runs/model_loan_validationset_21?wsid=/subscriptions/84677e42-3672-4e5b-98f7-81f8a44649a9/resourcegroups/ML_Project/workspaces/ML_ProjectWS1

# (A) Automated ML Method - Voting Ensemble Model 

# Train using Azure Machine Learning Compute

* Connect to an Azure Machine Learning Workspace
* Use existing compute target or create new
* Configure & Run command


## Prerequisites
Please ensure Azure Machine Learning Python SDK v2 is installed on the machine running Jupyter.
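
A quick way to confirm the prerequisite is to look up the installed package versions before running the rest of the notebook. The sketch below assumes the SDK v2 is distributed as the `azure-ai-ml` package and that `azure-identity` provides `DefaultAzureCredential`; the helper name `installed_version` is introduced here for illustration only.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Packages assumed to be needed by the cells below
for package in ("azure-ai-ml", "azure-identity"):
    version = installed_version(package)
    if version is None:
        print("{} is not installed; try: pip install {}".format(package, package))
    else:
        print("{} {}".format(package, version))
```

If either package is reported as missing, install it in the same environment that runs the Jupyter kernel.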

## Connect to a Workspace

Initialize a workspace object from the previous experiment. 

In [None]:
# Import the required libraries
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

# The workspace information from the previous experiment has been pre-filled for you.
subscription_id = "84677e42-3672-4e5b-98f7-81f8a44649a9"
resource_group = "ML_Project"
workspace_name = "ML_ProjectWS1"

credential = DefaultAzureCredential()
ml_client = MLClient(credential, subscription_id, resource_group, workspace_name)
workspace = ml_client.workspaces.get(name=ml_client.workspace_name)
print(ml_client.workspace_name, workspace.resource_group, workspace.location, ml_client.subscription_id, sep='\n')

### Create project directory

Create a directory that will contain the training script that you will need access to on the remote resource.

In [None]:
import os
import shutil

project_folder = os.path.join(".", 'code_folder')
os.makedirs(project_folder, exist_ok=True)
shutil.copy('script.py', project_folder)

### Use existing compute target or create new (Basic)

Azure Machine Learning Compute is managed compute infrastructure that lets you create single-node or multi-node compute of the appropriate VM family. It is created **within your workspace region** and can be used by other users in your workspace. When a job is submitted, it autoscales by default up to `max_instances`, and it runs the job in a containerized environment that packages the dependencies you specify.

Since it is managed compute, job scheduling and cluster management are handled internally by the Azure Machine Learning service.

A compute cluster can be created using the `AmlCompute` class. Some of the key parameters of this class are:

* `size` - The VM size to use for the cluster. For more information, see [Supported VM series and sizes](https://docs.microsoft.com/en-us/azure/machine-learning/concept-compute-target#supported-vm-series-and-sizes).
* `max_instances` - The maximum number of nodes to use on the cluster. Default is 1.

In [None]:
from azure.ai.ml.entities import AmlCompute

# Choose a name for your CPU cluster
cluster_name = "cpu-cluster"

# Reuse the cluster if it already exists; otherwise create it
try:
    cluster = ml_client.compute.get(cluster_name)
    print('Found existing cluster, use it.')
except Exception:
    compute = AmlCompute(name=cluster_name, size='STANDARD_D2_V2',
                         max_instances=4)
    # begin_create_or_update returns a poller; result() waits until the cluster is provisioned
    cluster = ml_client.compute.begin_create_or_update(compute).result()


### Configure & Run
The environment and compute have been pre-filled from the original training job. More information can be found here:

`command`: https://docs.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml?view=azure-python-preview#azure-ai-ml-command

`environment`: https://docs.microsoft.com/en-us/azure/machine-learning/resource-curated-environments#automated-ml-automl

`compute`: https://docs.microsoft.com/en-us/python/api/azure-ai-ml/azure.ai.ml.entities.amlcompute?view=azure-python-preview



In [None]:
# To test the script with an environment defined by a custom YAML file, uncomment the following lines and replace the `conda_file` value with the path to the YAML file.
# Then set the value of `environment` in the `command` job below to `env`.

# from azure.ai.ml.entities import Environment
# env = Environment(
#    name="automl-tabular-env",
#    description="environment for automl inference",
#    image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:20210727.v1",
#    conda_file="conda.yaml",
# )

In [None]:
from azure.ai.ml import command, Input

# To test with new training / validation datasets, replace the default dataset id(s)/uri(s) taken from parent run below
command_str = 'python script.py --training_dataset_uri azureml://locations/eastus/workspaces/7171f9b6-3d02-48f4-ac3a-50cd60c22aea/data/Loan_dataset/versions/1'
command_job = command(
    code=project_folder,
    command=command_str,
    tags=dict(automl_child_run_id='model_loan_validationset_21'),
    environment='AzureML-AutoML:172',
    compute='cpu-cluster',
    experiment_name='Model_Loan_Validationset')
 
returned_job = ml_client.create_or_update(command_job)
returned_job.studio_url
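
To test with a different training dataset, only the URI at the end of the command string needs to change. The following is a minimal sketch of building that string with a helper; `build_command` and `SCRIPT_NAME` are names introduced here for illustration, and only the `--training_dataset_uri` flag shown in the cell above is assumed to exist in `script.py`.

```python
import shlex

SCRIPT_NAME = "script.py"  # the entry script copied into the project folder


def build_command(training_dataset_uri):
    """Return the command line that runs the script on the given dataset URI."""
    if not training_dataset_uri:
        raise ValueError("training_dataset_uri must be a non-empty string")
    # shlex.quote leaves plain URIs untouched and quotes anything shell-unsafe
    return "python {} --training_dataset_uri {}".format(
        SCRIPT_NAME, shlex.quote(training_dataset_uri)
    )


# Default dataset URI taken from the parent run
default_uri = (
    "azureml://locations/eastus/workspaces/"
    "7171f9b6-3d02-48f4-ac3a-50cd60c22aea/data/Loan_dataset/versions/1"
)
example_command = build_command(default_uri)
print(example_command)
```

The resulting string can be passed as the `command` argument of the `command(...)` job in place of the hard-coded `command_str`.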

### Initialize MLflow Client
The metrics and artifacts for the run can be accessed via the MLflow interface.
Initialize the MLflow client here and set Azure ML as the tracking backend.

*IMPORTANT*: you need to have the latest MLflow packages installed:

    pip install azureml-mlflow

    pip install mlflow

In [None]:
# %pip install azureml-mlflow
# %pip install mlflow

In [None]:
import mlflow

# Obtain the tracking URL from MLClient
MLFLOW_TRACKING_URI = ml_client.workspaces.get(
    name=ml_client.workspace_name
).mlflow_tracking_uri

# Set the MLFLOW TRACKING URI

mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

# Retrieve the metrics logged to the run.
from mlflow.tracking.client import MlflowClient

# Initialize MLFlow client
mlflow_client = MlflowClient()
mlflow_run = mlflow_client.get_run(returned_job.name)
mlflow_run.data.metrics
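
The metrics are returned as a plain dictionary that maps metric names to numeric values, so they can be formatted with ordinary Python. The helper below is a small sketch; `format_metrics` is a name introduced here, and the metric names and values in the example dictionary are placeholders rather than results from this run.

```python
def format_metrics(metrics, digits=4):
    """Return 'name: value' lines sorted by metric name."""
    return [
        "{}: {:.{}f}".format(name, value, digits)
        for name, value in sorted(metrics.items())
    ]


# Placeholder values for illustration; in the notebook, pass mlflow_run.data.metrics
example_metrics = {"accuracy": 0.5, "AUC_weighted": 0.75}
for line in format_metrics(example_metrics):
    print(line)
```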


### Download Fitted Model
Download the resulting fitted model to the local folder in `local_dir`.

In [None]:
# import os

# Create local folder
# local_dir = "./artifact_downloads"
# if not os.path.exists(local_dir):
#     os.mkdir(local_dir)
# Download run's artifacts/outputs
# local_path = mlflow_client.download_artifacts(
#     mlflow_run.info.run_id, "outputs", local_dir
# )
# print("Artifacts downloaded in: {}".format(local_path))
# print("Artifacts: {}".format(os.listdir(local_path)))


## Test Code after Deployment 

In [None]:
import urllib.request
import json
import os
import ssl

def allowSelfSignedHttps(allowed):
    # bypass the server certificate verification on client side
    if allowed and not os.environ.get('PYTHONHTTPSVERIFY', '') and getattr(ssl, '_create_unverified_context', None):
        ssl._create_default_https_context = ssl._create_unverified_context

allowSelfSignedHttps(True) # this line is needed if you use self-signed certificate in your scoring service.

# Request data goes here
# The example below assumes JSON formatting which may be updated
# depending on the format your endpoint expects.
# More information can be found here:
# https://docs.microsoft.com/azure/machine-learning/how-to-deploy-advanced-entry-script
data =  {
  "input_data": {
    "columns": [
      "no_of_dependents",
      "education",
      "self_employed",
      "income_annum",
      "loan_amount",
      "loan_term",
      "cibil_score",
      "residential_assets_value",
      "commercial_assets_value",
      "luxury_assets_value",
      "bank_asset_value"
    ],
    "index": [],
    "data": []
  }
}

body = str.encode(json.dumps(data))

url = 'https://ml-projectws1-group1-loan.eastus.inference.ml.azure.com/score'
# Replace this with the primary/secondary key, AMLToken, or Microsoft Entra ID token for the endpoint
api_key = ''
if not api_key:
    raise Exception("A key should be provided to invoke the endpoint")

# The azureml-model-deployment header will force the request to go to a specific deployment.
# Remove this header to have the request observe the endpoint traffic rules
headers = {'Content-Type':'application/json', 'Authorization':('Bearer '+ api_key), 'azureml-model-deployment': 'ml-projectws1-group1-loan' }

req = urllib.request.Request(url, body, headers)

try:
    response = urllib.request.urlopen(req)

    result = response.read()
    print(result)
except urllib.error.HTTPError as error:
    print("The request failed with status code: " + str(error.code))

    # Print the headers - they include the request ID and the timestamp, which are useful for debugging the failure
    print(error.info())
    print(error.read().decode("utf8", 'ignore'))
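
The request body above leaves `index` and `data` empty, so the endpoint has nothing to score. The sketch below shows one way to fill them from a list of row dictionaries while keeping the column order fixed. `build_payload` is a name introduced here, and the values in `example_row` (including the string encodings used for `education` and `self_employed`) are invented placeholders; the notebook does not state which encodings the deployed model expects.

```python
import json

# Column order copied from the request template above
COLUMNS = [
    "no_of_dependents", "education", "self_employed", "income_annum",
    "loan_amount", "loan_term", "cibil_score", "residential_assets_value",
    "commercial_assets_value", "luxury_assets_value", "bank_asset_value",
]


def build_payload(rows):
    """Build the 'input_data' request body from a list of dicts keyed by column name."""
    data = []
    for position, row in enumerate(rows):
        missing = [column for column in COLUMNS if column not in row]
        if missing:
            raise KeyError("row {} is missing columns: {}".format(position, missing))
        data.append([row[column] for column in COLUMNS])
    return {
        "input_data": {
            "columns": COLUMNS,
            "index": list(range(len(rows))),
            "data": data,
        }
    }


# Placeholder row for illustration only
example_row = {
    "no_of_dependents": 2, "education": "Graduate", "self_employed": "No",
    "income_annum": 5000000, "loan_amount": 12000000, "loan_term": 10,
    "cibil_score": 700, "residential_assets_value": 3000000,
    "commercial_assets_value": 1000000, "luxury_assets_value": 8000000,
    "bank_asset_value": 2000000,
}
payload = build_payload([example_row])
example_body = str.encode(json.dumps(payload))
```

`example_body` can then be sent in place of `body` in the request above.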

# (B) Pipeline Method - Boosted Decision Tree Model

In [None]:
import json
import os
import traceback

from azureml.core.workspace import Workspace

try:
    from azureml.train.automl._remote_script import model_test_wrapper
    print("SDK supports model testing.")
except Exception:
    print("SDK does not support model testing.")
    raise


import sys

try:
    from azureml.train.automl._remote_script import setup_wrapper
    from azureml.train.automl._remote_script import driver_wrapper
except Exception as e:
    print("v2 driver import failed with exception: {}. Falling back to v1 driver.".format(e))
    traceback.print_exc()
    import importlib.util
    import inspect
    import logging
    import os
    import sys
    import time

    from automl.client.core.common import utilities
    from azureml.core.experiment import Experiment
    from azureml.core.run import Run
    from azureml.train.automl import automl
    from azureml.train.automl import fit_pipeline
    from azureml.train.automl._automl_settings import _AutoMLSettings

    try:
        from azureml.train.automl._cachestore import _CacheStore
        from azureml.train.automl._preprocessorcontexts import (RawDataContext,
                                                            TransformedDataContext)
        from azureml.train.automl._transform_data import _transform_data
        sdk_has_cache_capability = True
    except ImportError:
        sdk_has_cache_capability = False

    try:
        from azureml.train.automl.utilities import _validate_data_splits
        sdk_has_validate_data_splits = True
    except ImportError:
        sdk_has_validate_data_splits = False

    try:
        # Works only for azureml-train-automl>1.0.10
        from automl.client.core.common.training_utilities import validate_training_data, check_x_y
    except ImportError:
        from azureml.train.automl.utilities import _validate_training_data as validate_training_data, _check_x_y as check_x_y

    try:
        from automl.client.core.common.training_utilities import validate_training_data_dict
        sdk_has_validate_data_dict = True
    except ImportError:
        sdk_has_validate_data_dict = False


    try:
        from automl.client.core.common.logging_utilities import log_traceback
    except ImportError:
        def log_traceback(exception, logger, **kwargs):
            """Do nothing if not imported."""
            pass

    # Keep these strings here to identify exceptions raised by this script
    try:
        from automl.client.core.common.exceptions import ErrorTypes
    except ImportError:
        class ErrorTypes:
            """Possible types of errors."""

            User = 'User'
            Service = 'Service'
            Client = 'Client'
            Unclassified = 'Unclassified'
            All = {User, Service, Client, Unclassified}

    def _get_auto_cv(X, y, X_valid, y_valid, cv_splits_indices, automl_settings_obj, logger):
        if hasattr(automl_settings_obj, "rule_based_validation"):
            return automl_settings_obj.rule_based_validation(
                X, y, X_valid, y_valid, cv_splits_indices,
                logger=logger)
        else:
            logger.info("SDK has no auto cv capability.")
            return X, y, X_valid, y_valid

    def _get_auto_cv_dict(input_dict, automl_settings_obj, logger):
        input_dict['X'], input_dict['y'], input_dict['X_valid'], input_dict['y_valid'] = _get_auto_cv(
            input_dict.get('X'),
            input_dict.get('y'),
            input_dict.get('X_valid'),
            input_dict.get('y_valid'),
            input_dict.get('cv_splits_indices'),
            automl_settings_obj,
            logger=logger)
        return input_dict

    def _get_cv_from_transformed_data_context(transformed_data_context, logger):
        n_cv = None
        if transformed_data_context._on_demand_pickle_keys is None:
            n_cv = None
        else:
            n_cv = sum([1 if "cv" in key else 0 for key in transformed_data_context._on_demand_pickle_keys])
        logger.info("The cv got from transformed_data_context is {}.".format(n_cv))
        return n_cv

    def _get_data_from_dataprep(dataprep_json, automl_settings_obj, logger):
        current_run = Run.get_submitted_run()
        parent_run_id = _get_parent_run_id(current_run._run_id)
        print("[ParentRunId:{}]: Start getting data using dataprep.".format(parent_run_id))
        logger.info("[ParentRunId:{}]: Start getting data using dataprep.".format(parent_run_id))
        try:
            import azureml.train.automl._dataprep_utilities as dataprep_utilities
        except Exception as e:
            e.error_type = ErrorTypes.Unclassified
            log_traceback(e, logger)
            logger.error(e)
            raise e

        fit_iteration_parameters_dict = dict()

        class RetrieveNumpyArrayError(Exception):
            def __init__(self):
                super().__init__()

        try:
            print("Resolving Dataflows...")
            logger.info("Resolving Dataflows...")
            dataprep_json_obj = json.loads(dataprep_json)
            if 'activities' in dataprep_json_obj: # json is serialized dataflows
                dataflow_dict = dataprep_utilities.load_dataflows_from_json(
                    dataprep_json)
                for k in ['X', 'X_valid', 'sample_weight', 'sample_weight_valid']:
                    fit_iteration_parameters_dict[k] = dataprep_utilities.try_retrieve_pandas_dataframe(dataflow_dict.get(k))
                for k in ['y', 'y_valid']:
                    try:
                        fit_iteration_parameters_dict[k] = dataprep_utilities.try_retrieve_numpy_array(dataflow_dict.get(k))
                    except IndexError:
                        raise RetrieveNumpyArrayError()

                cv_splits_dataflows = []
                i = 0
                while 'cv_splits_indices_{0}'.format(i) in dataflow_dict:
                    cv_splits_dataflows.append(
                        dataflow_dict['cv_splits_indices_{0}'.format(i)])
                    i = i + 1
                fit_iteration_parameters_dict['cv_splits_indices'] = None if len(cv_splits_dataflows) == 0 \
                    else dataprep_utilities.try_resolve_cv_splits_indices(cv_splits_dataflows)
            else: # json is dataprep options
                print('Creating Dataflow from options...\r\nOptions:')
                logger.info('Creating Dataflow from options...')
                print(dataprep_json_obj)
                datastore_name = dataprep_json_obj['datastoreName'] # mandatory
                data_path = dataprep_json_obj['dataPath'] # mandatory
                label_column = dataprep_json_obj['label'] # mandatory
                separator = dataprep_json_obj.get('columnSeparator', ',')
                header = dataprep_json_obj.get('promoteHeader', True)
                encoding = dataprep_json_obj.get('encoding', None)
                quoting = dataprep_json_obj.get('ignoreNewlineInQuotes', False)
                skip_rows = dataprep_json_obj.get('skipRows', 0)
                feature_columns = dataprep_json_obj.get('features', [])

                from azureml.core import Datastore
                import azureml.dataprep as dprep
                if header:
                    header = dprep.PromoteHeadersMode.CONSTANTGROUPED
                else:
                    header = dprep.PromoteHeadersMode.NONE
                try:
                    encoding = dprep.FileEncoding[encoding]
                except Exception:
                    encoding = dprep.FileEncoding.UTF8

                ws = Run.get_context().experiment.workspace
                datastore = Datastore(ws, datastore_name)
                dflow = dprep.read_csv(path=datastore.path(data_path),
                                        separator=separator,
                                        header=header,
                                        encoding=encoding,
                                        quoting=quoting,
                                        skip_rows=skip_rows)

                if len(feature_columns) == 0:
                    X = dflow.drop_columns(label_column)
                else:
                    X = dflow.keep_columns(feature_columns)

                print('Inferring types for feature columns...')
                logger.info('Inferring types for feature columns...')
                sct = X.builders.set_column_types()
                sct.learn()
                sct.ambiguous_date_conversions_drop()
                X = sct.to_dataflow()

                y = dflow.keep_columns(label_column)
                if automl_settings_obj.task_type.lower() == 'regression':
                    y = y.to_number(label_column)

                print('X:')
                print(X)
                logger.info('X:')
                logger.info(X)

                print('y:')
                print(y)
                logger.info('y:')
                logger.info(y)

                try:
                    from azureml.train.automl._dataprep_utilities import try_retrieve_pandas_dataframe_adb
                    _X = try_retrieve_pandas_dataframe_adb(X)
                    fit_iteration_parameters_dict['X'] = _X.values
                    fit_iteration_parameters_dict['x_raw_column_names'] = _X.columns.values
                except ImportError:
                    logger.info("SDK version does not support column names extraction, fallback to old path")
                    fit_iteration_parameters_dict['X'] = dataprep_utilities.try_retrieve_pandas_dataframe(X)

                try:
                    fit_iteration_parameters_dict['y'] = dataprep_utilities.try_retrieve_numpy_array(y)
                except IndexError:
                    raise RetrieveNumpyArrayError()

            logger.info("Finish getting data using dataprep.")
            return fit_iteration_parameters_dict
        except Exception as e:
            print("[ParentRunId:{0}]: Error from resolving Dataflows: {1} {2}".format(parent_run_id, e.__class__, e))
            logger.error("[ParentRunId:{0}]: Error from resolving Dataflows: {1} {2}".format(parent_run_id, e.__class__, e))
            if isinstance(e, RetrieveNumpyArrayError):
                logger.debug("Label column (y) does not exist in user's data.")
                e.error_type = ErrorTypes.User
            elif "The provided path is not valid." in str(e):
                logger.debug("User's data is not accessible from remote run.")
                e.error_type = ErrorTypes.User
            elif "Required secrets are missing. Please call use_secrets to register the missing secrets." in str(e):
                logger.debug("User should use Datastore to access data that requires secrets.")
                e.error_type = ErrorTypes.User
            else:
                e.error_type = ErrorTypes.Client
            log_traceback(e, logger)
            raise RuntimeError("Error during extracting Dataflows") from e

    def _init_logger(automl_settings_obj=None):
        sdk_has_custom_dimension_logger = False
        try:
            from azureml.telemetry import set_diagnostics_collection
            if automl_settings_obj is not None:
                set_diagnostics_collection(send_diagnostics=automl_settings_obj.send_telemetry,
                                           verbosity=automl_settings_obj.telemetry_verbosity)
        except Exception:
            print("set_diagnostics_collection failed.")

        try:
            from azureml.train.automl._logging import get_logger
            if "automl_settings" in inspect.getcallargs(get_logger, log_file_name="AutoML_remote.log"):
                logger = get_logger(log_file_name="AutoML_remote.log", automl_settings=automl_settings_obj)
                sdk_has_custom_dimension_logger = True
            else:
                logger = get_logger(log_file_name="AutoML_remote.log")
                sdk_has_custom_dimension_logger = False
            logger.info("sdk_has_custom_dimension_logger {}.".format(sdk_has_custom_dimension_logger))
        except ImportError:
            logger = logging.getLogger(__name__)
            logger.addHandler(logging.NullHandler())
        logger.info("Init logger successfully with automl_settings {}.".format(automl_settings_obj))
        try:
            from automl.client.core.common.utilities import get_sdk_dependencies
            logger.info(get_sdk_dependencies())
        except Exception as e:
            pass
        return logger, sdk_has_custom_dimension_logger

    def _init_directory(directory, logger):
        logger.info("Start init directory.")
        if directory is None:
            directory = os.path.dirname(__file__)

        logger.info("Adding directory to system path.")
        sys.path.append(directory)

        # create the outputs folder
        logger.info("Creating output folder.")
        os.makedirs('./outputs', exist_ok=True)
        print("create output folder")
        logger.info("Finished init directory.")
        return directory

    def _get_parent_run_id(run_id):
        """Derive the parent run ID by dropping the last '_'-separated part.

        IDs with at most two '_'-separated parts are returned unchanged.
        """
        split = run_id.split("_")
        if len(split) <= 2:
            return run_id
        split.pop()
        return '_'.join(split)

    def _load_data_from_user_script(script_directory, entry_point, logger):
        #  Load user script to get access to GetData function
        logger.info("Loading data using user script.")
        try:
            from azureml.train.automl import extract_user_data
        except Exception as e:
            logger.warning(e)

        module_name = None
        if entry_point.endswith('.py'):
            module_name = entry_point[:-3]

        spec = importlib.util.spec_from_file_location(
            module_name, os.path.join(script_directory, entry_point))
        module_obj = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module_obj)
        # print("Extracting user Data from {0}".format(module_name))

        fit_iteration_parameters_dict = dict()
        try:
            output_dict = extract_user_data(module_obj)
            for k, v in output_dict.items():
                fit_iteration_parameters_dict[k] = v
        except Exception as e:
            logger.warning("Meeting exceptions using user script {}.".format(e))
            fit_iteration_parameters_dict['X'], fit_iteration_parameters_dict['y'] = module_obj.get_data()

        return fit_iteration_parameters_dict

    def _prepare_data(dataprep_json, automl_settings_obj, script_directory, entry_point, logger):
        if dataprep_json:
            return _get_data_from_dataprep(dataprep_json, automl_settings_obj, logger)
        else:
            return _load_data_from_user_script(script_directory, entry_point, logger)

    def _get_transformed_data_context(X, y, X_valid, y_valid,
                                     sample_weight, sample_weight_valid,
                                     x_raw_column_names, cv_splits_indices,
                                     data_store, run_target,
                                     automl_settings_obj, parent_run_id, logger,
                                     raw_data_context=None):
        logger.info("Getting transformed data context.")
        if raw_data_context is None:
            logger.info("raw_data_context is None, creating a new one.")
            raw_data_context = RawDataContext(task_type=automl_settings_obj.task_type,
                                              X=X,
                                              y=y,
                                              X_valid=X_valid,
                                              y_valid=y_valid,
                                              sample_weight=sample_weight,
                                              sample_weight_valid=sample_weight_valid,
                                              x_raw_column_names=x_raw_column_names,
                                              lag_length=automl_settings_obj.lag_length,
                                              cv_splits_indices=cv_splits_indices,
                                              automl_settings_obj=automl_settings_obj,
                                              enable_cache=automl_settings_obj.enable_cache,
                                              data_store=data_store,
                                              run_target='remote',
                                              timeseries=automl_settings_obj.is_timeseries,
                                              timeseries_param_dict=utilities._get_ts_params_dict(automl_settings_obj)
                                              )

        transformed_data_context = _transform_data(raw_data_context=raw_data_context,
                                                   preprocess=automl_settings_obj.preprocess,
                                                   logger=logger,
                                                   run_id=parent_run_id)
        logger.info("Finished getting transformed data context.")

        return transformed_data_context

    def _set_problem_info_for_setup(fit_iteration_parameters_dict,
                                   automl_settings_obj, task_type, preprocess,
                                   enable_subsampling, num_iterations,
                                   logger):
        current_run = Run.get_submitted_run()
        logger.info("Start to set problem info for the setup for run id {}.".format(current_run._run_id))
        logger.info("Setup experiment.")
        try:
            experiment = current_run.experiment
            parent_run_id = _get_parent_run_id(current_run._run_id)
            data_store = experiment.workspace.get_default_datastore()
            found_data_store = True
            logger.info("Using data store.")
        except Exception as e:
            logger.warning("Getting data store, fallback to default {}".format(e))
            found_data_store = False

        logger.info("Caching supported {}.".format(sdk_has_cache_capability and found_data_store))
        print("caching supported {}".format(sdk_has_cache_capability and found_data_store))
        if sdk_has_validate_data_dict:
            # The newest version of validate_training_data_dict should contain check_x_y
            logger.info("Using validate_training_data_dict now.")
            validate_training_data_dict(data_dict=fit_iteration_parameters_dict, automl_settings=automl_settings_obj)
        else:
            logger.info("Using validate_training_data now.")
            validate_training_data(X=fit_iteration_parameters_dict.get('X'),
                                      y=fit_iteration_parameters_dict.get('y'),
                                      X_valid=fit_iteration_parameters_dict.get('X_valid'),
                                      y_valid=fit_iteration_parameters_dict.get('y_valid'),
                                      sample_weight=fit_iteration_parameters_dict.get('sample_weight'),
                                      sample_weight_valid=fit_iteration_parameters_dict.get('sample_weight_valid'),
                                      cv_splits_indices=fit_iteration_parameters_dict.get('cv_splits_indices'),
                                      automl_settings=automl_settings_obj)
            check_x_y(fit_iteration_parameters_dict.get('X'), fit_iteration_parameters_dict.get('y'), automl_settings_obj)
        if sdk_has_cache_capability and found_data_store:
            data_splits_validated = True
            try:
                start = time.time()
                transformed_data_context = _get_transformed_data_context(
                    X=fit_iteration_parameters_dict.get('X'),
                    y=fit_iteration_parameters_dict.get('y'),
                    X_valid=fit_iteration_parameters_dict.get('X_valid'),
                    y_valid=fit_iteration_parameters_dict.get('y_valid'),
                    sample_weight=fit_iteration_parameters_dict.get('sample_weight'),
                    sample_weight_valid=fit_iteration_parameters_dict.get('sample_weight_valid'),
                    x_raw_column_names=fit_iteration_parameters_dict.get('x_raw_column_names'),
                    cv_splits_indices=fit_iteration_parameters_dict.get('cv_splits_indices'),
                    automl_settings_obj=automl_settings_obj,
                    data_store=data_store,
                    run_target='remote',
                    parent_run_id=parent_run_id,
                    logger=logger
                )
                end = time.time()
                print("time taken for transform {}".format(end-start))
                logger.info("time taken for transform {}".format(end-start))
                if sdk_has_validate_data_splits:
                    try:
                        logger.info("Validating data splits now.")
                        _validate_data_splits(X=transformed_data_context.X,
                                              y=transformed_data_context.y,
                                              X_valid=transformed_data_context.X_valid,
                                              y_valid=transformed_data_context.y_valid,
                                              cv_splits=transformed_data_context.cv_splits,
                                              automl_settings=automl_settings_obj)
                        data_splits_validated = True
                    except Exception as data_split_exception:
                        data_splits_validated = False
                        logger.error("Meeting validation errors {}.".format(data_split_exception))
                        log_traceback(data_split_exception, logger)
                        raise data_split_exception
                logger.info("Start setting problem info.")
                automl.set_problem_info(transformed_data_context.X, transformed_data_context.y,
                                        automl_settings_obj.task_type,
                                        current_run=current_run,
                                        preprocess=automl_settings_obj.preprocess,
                                        lag_length=automl_settings_obj.lag_length,
                                        transformed_data_context=transformed_data_context,
                                        enable_cache=automl_settings_obj.enable_cache,
                                        subsampling=enable_subsampling)
            except Exception as e:
                if sdk_has_validate_data_splits and not data_splits_validated:
                    logger.error("sdk_has_validate_data_splits is True and data_splits_validated is False {}.".format(e))
                    log_traceback(e, logger)
                    raise e
                else:
                    logger.warning("Setup failed, fall back to old model {}".format(e))
                    print("Setup failed, fall back to old model {}".format(e))
                    automl.set_problem_info(
                        X=fit_iteration_parameters_dict.get('X'),
                        y=fit_iteration_parameters_dict.get('y'),
                        task_type=task_type, current_run=current_run,
                        preprocess=preprocess, subsampling=enable_subsampling
                    )
        else:
            logger.info("Start setting problem info using old model.")
            if sdk_has_validate_data_splits:
                _validate_data_splits(X=fit_iteration_parameters_dict.get('X'),
                                      y=fit_iteration_parameters_dict.get('y'),
                                      X_valid=fit_iteration_parameters_dict.get('X_valid'),
                                      y_valid=fit_iteration_parameters_dict.get('y_valid'),
                                      cv_splits=fit_iteration_parameters_dict.get('cv_splits_indices'),
                                      automl_settings=automl_settings_obj)
            automl.set_problem_info(
                X=fit_iteration_parameters_dict.get('X'),
                y=fit_iteration_parameters_dict.get('y'),
                task_type=task_type, current_run=current_run,
                preprocess=preprocess, subsampling=enable_subsampling
            )

    def _post_setup(logger):
        logger.info("Setup run completed successfully!")
        print("Setup run completed successfully!")

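    def _get_automl_settings(automl_settings, logger):
        """Parse the AutoML settings and look up the workspace's default datastore.

        Returns a tuple (automl_settings_obj, found_data_store, data_store). If an
        exception is raised, found_data_store is False and any value not yet
        assigned stays None.
        """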
        automl_settings_obj = None
        current_run = Run.get_submitted_run()
        found_data_store = False
        data_store = None

        start = time.time()

        try:
            experiment = current_run.experiment

            parent_run_id = _get_parent_run_id(current_run._run_id)
            print("parent run id {}".format(parent_run_id))

            automl_settings_obj = _AutoMLSettings.from_string_or_dict(automl_settings)
            data_store = experiment.workspace.get_default_datastore()
            found_data_store = True
        except Exception as e:
            logger.warning("Failed to get default data store, continuing without cache: {}".format(e))
            print("Failed to get default data store: {}".format(e))
            found_data_store = False

        end = time.time()
        print("Caching supported {}, time taken for get default DS {}".format(
            sdk_has_cache_capability and found_data_store, (end - start)))

        return automl_settings_obj, found_data_store, data_store

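    def _load_transformed_data_context_from_cache(automl_settings_obj, parent_run_id,
                                                 found_data_store, data_store,
                                                 logger):
        """Return the cached TransformedDataContext for the parent run.

        Returns None when caching is unsupported or disabled, when no datastore
        was found, or when loading from the cache fails.
        """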
        logger.info("Loading the data from datastore.")
        transformed_data_context = None
        if sdk_has_cache_capability and automl_settings_obj is not None and automl_settings_obj.enable_cache and \
                automl_settings_obj.preprocess and found_data_store:

            try:
                start = time.time()
                transformed_data_context = TransformedDataContext(X={},
                                                                  run_id=parent_run_id,
                                                                  run_targets='remote',
                                                                  logger=logger,
                                                                  enable_cache=True,
                                                                  data_store=data_store)
                transformed_data_context._load_from_cache()
                end = time.time()
                logger.info("Time taken for loading from cache {}.".format(end-start))
                print("Time taken for loading from cache {}.".format(end-start))

            except Exception as e:
                logger.warning("Error while loading from cache, defaulting to redo {}".format(e))
                transformed_data_context = None
        return transformed_data_context

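    def _start_run(automl_settings_obj, run_id, training_percent, iteration,
                  pipeline_spec, pipeline_id,
                  dataprep_json, script_directory,
                  entry_point,
                  logger,
                  transformed_data_context=None):
        """Fit a single pipeline iteration and return the fit_pipeline result.

        When transformed_data_context is None, the data is loaded with
        _prepare_data first; otherwise the cached context is passed to fit_pipeline.
        """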
        logger.info("Starting the run.")
        if transformed_data_context is None:
            logger.info("transformed_data_context is None, loading data now.")
            fit_iteration_parameters_dict = _prepare_data(
                dataprep_json=dataprep_json,
                automl_settings_obj=automl_settings_obj,
                script_directory=script_directory,
                entry_point=entry_point,
                logger=logger
            )

            fit_iteration_parameters_dict = _get_auto_cv_dict(fit_iteration_parameters_dict, automl_settings_obj, logger)

            result = fit_pipeline(
                pipeline_script=pipeline_spec,
                automl_settings=automl_settings_obj,
                run_id=run_id,
                fit_iteration_parameters_dict=fit_iteration_parameters_dict,
                train_frac=training_percent/100,
                iteration=iteration,
                pipeline_id=pipeline_id,
                remote=True,
                child_run_metrics=Run.get_context(_batch_upload_metrics=False),
                logger=logger)
        else:
            if automl_settings_obj.n_cross_validations is None and transformed_data_context.X_valid is None:
                automl_settings_obj.n_cross_validations = _get_cv_from_transformed_data_context(
                    transformed_data_context, logger)
            result = fit_pipeline(
                pipeline_script=pipeline_spec,
                automl_settings=automl_settings_obj,
                run_id=run_id,
                train_frac=training_percent/100,
                iteration=iteration,
                pipeline_id=pipeline_id,
                remote=True,
                child_run_metrics=Run.get_context(_batch_upload_metrics=False),
                logger=logger,
                transformed_data_context=transformed_data_context)
        logger.info("Run finished.")
        return result

    def _post_run(result, run_id, automl_settings, logger):
        """Raise if the fit result reports errors; otherwise print the score and duration."""
        print("for Run Id : ", run_id)
        print("result : ", result)
        if len(result['errors']) > 0:
            err_type = next(iter(result['errors']))
            inner_ex = result['errors'][err_type]['exception']
            inner_ex.error_type = ErrorTypes.Client
            log_traceback(inner_ex, logger)
            raise RuntimeError(inner_ex) from inner_ex

        score = result[automl_settings['primary_metric']]
        duration = result['fit_time']
        print("Score : ", score)
        print("Duration : ", duration)
        print("Childrun completed successfully!")
        logger.info("Childrun completed successfully!")

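    def driver_wrapper(
        script_directory, automl_settings, run_id, training_percent,
        iteration, pipeline_spec, pipeline_id, dataprep_json, entry_point,
        **kwargs
    ):
        """Run one remote AutoML child iteration and return the fit result.

        Loads the settings, tries the transformed-data cache, fits the pipeline,
        and checks the result for errors.
        """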
        automl_settings_obj = _AutoMLSettings.from_string_or_dict(automl_settings)
        logger, sdk_has_custom_dimension_logger = _init_logger(automl_settings_obj)
        if sdk_has_custom_dimension_logger:
            logger.update_default_properties({
                "parent_run_id": _get_parent_run_id(run_id),
                "child_run_id": run_id
            })
        logger.info("[RunId:{}]: remote automl driver begins.".format(run_id))

        try:
            script_directory = _init_directory(directory=script_directory, logger=logger)

            automl_settings_obj, found_data_store, data_store = _get_automl_settings(
                automl_settings=automl_settings, logger=logger)

            transformed_data_context = _load_transformed_data_context_from_cache(
                automl_settings_obj=automl_settings_obj,
                parent_run_id=_get_parent_run_id(run_id),
                found_data_store=found_data_store,
                data_store=data_store,
                logger=logger
            )
            result = _start_run(automl_settings_obj=automl_settings_obj,
                                run_id=run_id,
                                training_percent=training_percent,
                                iteration=iteration,
                                pipeline_spec=pipeline_spec,
                                pipeline_id=pipeline_id,
                                dataprep_json=dataprep_json,
                                script_directory=script_directory,
                                entry_point=entry_point,
                                logger=logger,
                                transformed_data_context=transformed_data_context)
            _post_run(result=result, run_id=run_id, automl_settings=automl_settings, logger=logger)
        except Exception as e:
            logger.error("driver_wrapper encountered an exception: {}".format(e))
            log_traceback(e, logger)
            # Re-raise the original exception so its type and traceback are preserved.
            raise

        logger.info("[RunId:{}]: remote automl driver finishes.".format(run_id))
        return result

    def setup_wrapper(
        script_directory, dataprep_json, entry_point, automl_settings, task_type,
        preprocess, enable_subsampling, num_iterations,
        **kwargs
    ):
        """Run the remote setup step: prepare the data and set the problem info."""
        automl_settings_obj = _AutoMLSettings.from_string_or_dict(automl_settings)

        logger, sdk_has_custom_dimension_logger = _init_logger(automl_settings_obj)
        try:
            child_run_id = Run.get_submitted_run()._run_id
            parent_run_id = _get_parent_run_id(child_run_id)
            if sdk_has_custom_dimension_logger:
                logger.update_default_properties({
                    "parent_run_id": parent_run_id,
                    "child_run_id": child_run_id
                })
            logger.info("[ParentRunId:{}]: remote setup script begins.".format(parent_run_id))
            script_directory = _init_directory(directory=script_directory, logger=logger)

            logger.info("Preparing data for set problem info now.")

            fit_iteration_parameters_dict = _prepare_data(
                dataprep_json=dataprep_json,
                automl_settings_obj=automl_settings_obj,
                script_directory=script_directory,
                entry_point=entry_point,
                logger=logger
            )
            fit_iteration_parameters_dict = _get_auto_cv_dict(fit_iteration_parameters_dict, automl_settings_obj, logger)

            print("Setting Problem Info now.")
            _set_problem_info_for_setup(
                fit_iteration_parameters_dict=fit_iteration_parameters_dict,
                automl_settings_obj=automl_settings_obj,
                task_type=task_type,
                preprocess=preprocess,
                enable_subsampling=enable_subsampling,
                num_iterations=num_iterations,
                logger=logger)
        except Exception as e:
            logger.error("setup_wrapper encountered an exception: {}".format(e))
            log_traceback(e, logger)
            # Re-raise the original exception so its type and traceback are preserved.
            raise

        _post_setup(logger=logger)
        logger.info("[ParentRunId:{}]: remote setup script finishes.".format(parent_run_id))
        return # PLACEHOLDER for RemoteScript helper functions

args = sys.argv
aml_token = None
script_directory = None
print("Starting the model test run....")

preprocess = "True"  # PLACEHOLDER
run_id = "bc5d68bc-d4d6-419c-895a-b251662cf29f"  # PLACEHOLDER
training_run_id = "model_loan_validationset_21" # PLACEHOLDER
automl_settings = {'is_subgraph_orchestration':False,'is_automode':True,'path':'./sample_projects/','subscription_id':'84677e42-3672-4e5b-98f7-81f8a44649a9','resource_group':'ML_Project','workspace_name':'ML_ProjectWS1','compute_target':'cpu-cluster','iterations':1000,'primary_metric':'AUC_weighted','task_type':'classification','IsImageTask':False,'IsTextDNNTask':False,'validation_size':0.2,'test_size':0.2,'n_cross_validations':None,'preprocess':True,'is_timeseries':False,'time_column_name':None,'grain_column_names':None,'max_cores_per_iteration':-1,'max_concurrent_iterations':2,'max_nodes':1,'iteration_timeout_minutes':15,'enforce_time_on_windows':False,'experiment_timeout_minutes':15,'exit_score':'NaN','experiment_exit_score':'NaN','whitelist_models':None,'blacklist_models':['LogisticRegression','SGD','MultinomialNaiveBayes','BernoulliNaiveBayes','SVM','KNN','DecisionTree','ExtremeRandomTrees','RandomForest'],'blacklist_algos':['LogisticRegression','SGD','MultinomialNaiveBayes','BernoulliNaiveBayes','SVM','KNN','DecisionTree','ExtremeRandomTrees','RandomForest','TensorFlowLinearClassifier','TensorFlowDNN'],'auto_blacklist':False,'blacklist_samples_reached':False,'exclude_nan_labels':False,'verbosity':20,'model_explainability':True,'enable_onnx_compatible_models':False,'enable_feature_sweeping':False,'send_telemetry':True,'enable_early_stopping':True,'early_stopping_n_iters':20,'distributed_dnn_max_node_check':False,'enable_distributed_featurization':True,'enable_distributed_dnn_training':True,'enable_distributed_dnn_training_ort_ds':False,'ensemble_iterations':15,'enable_tf':False,'enable_cache':False,'enable_subsampling':False,'metric_operation':'maximize','enable_streaming':False,'use_incremental_learning_override':False,'force_streaming':False,'enable_dnn':False,'is_gpu_tmp':False,'enable_run_restructure':False,'featurization':'auto','vm_priority':'dedicated','label_column_name':'loan_status','weight_column_name':None,'miro_flight':'default','many_models':False
,'many_models_process_count_per_node':0,'automl_many_models_scenario':None,'enable_batch_run':True,'save_mlflow':True,'track_child_runs':True,'test_include_predictions_only':False,'enable_mltable_quick_profile':'True','has_multiple_series':False,'_enable_future_regressors':False,'enable_ensembling':True,'enable_stack_ensembling':False,'ensemble_download_models_timeout_sec':300.0,'stack_meta_learner_train_percentage':0.2,'vm_type':'STANDARD_D2_V2','process_count_per_instance':0,'gpu_memory_gb':0.0} # PLACEHOLDER for AutoMLSettings
dataprep_json = None  # PLACEHOLDER
mltable_data_json = "{\"Type\":\"MLTable\",\"TrainData\":{\"Uri\":\"azureml://locations/eastus/workspaces/7171f9b6-3d02-48f4-ac3a-50cd60c22aea/data/Loan_dataset/versions/1\",\"ResolvedUri\":\"azureml://locations/eastus/workspaces/7171f9b6-3d02-48f4-ac3a-50cd60c22aea/data/Loan_dataset/versions/1\",\"AssetId\":\"azureml://locations/eastus/workspaces/7171f9b6-3d02-48f4-ac3a-50cd60c22aea/data/Loan_dataset/versions/1\"},\"TestData\":null,\"ValidData\":null}" # PLACEHOLDER
entry_point = "get_data.py" # PLACEHOLDER

print("run_id in the real script: ", run_id)
project_dir = "/tmp/azureml_runs/" + run_id

enable_streaming = False # PLACEHOLDER

if enable_streaming is not None:
    print("Set enable_streaming flag to", enable_streaming)
    automl_settings['enable_streaming'] = enable_streaming

test_include_predictions_only = False # PLACEHOLDER

if test_include_predictions_only is not None:
    print("Set test_include_predictions_only flag to", test_include_predictions_only)
    automl_settings['test_include_predictions_only'] = test_include_predictions_only

def model_test_run():
    global script_directory

    model_test_wrapper(
        script_directory=script_directory,
        automl_settings=automl_settings,
        run_id=run_id,
        training_run_id=training_run_id,
        dataprep_json=dataprep_json,
        mltable_data_json=mltable_data_json,
        entry_point=entry_point)


if __name__ == '__main__':
    model_test_run()
