# Scikit-Learn Linear Regression
This notebook trains a scikit-learn linear regression model on SALES_VIEW from DWC (SAP Data Warehouse Cloud). The view has 6,291,450 records.

## Install fedml_azure package

In [1]:
pip install fedml_azure-1.0.0-py3-none-any.whl --force-reinstall

Processing ./fedml_azure-1.0.0-py3-none-any.whl
Collecting hdbcli
  Using cached hdbcli-2.10.13-cp34-abi3-manylinux1_x86_64.whl (11.7 MB)
Installing collected packages: hdbcli, fedml-azure
  Attempting uninstall: hdbcli
    Found existing installation: hdbcli 2.10.13
    Uninstalling hdbcli-2.10.13:
      Successfully uninstalled hdbcli-2.10.13
  Attempting uninstall: fedml-azure
    Found existing installation: fedml-azure 1.0.0
    Uninstalling fedml-azure-1.0.0:
      Successfully uninstalled fedml-azure-1.0.0
Successfully installed fedml-azure-1.0.0 hdbcli-2.10.13
Note: you may need to restart the kernel to use updated packages.


## Import the libraries needed in this notebook

In [2]:
from fedml_azure import create_workspace
from fedml_azure import create_compute
from fedml_azure import create_environment
from fedml_azure import DwcAzureTrain

## Set up
### Creating a workspace. This takes a dictionary as input for parameter workspace_args.

Before running the cell below, make sure you have an Azure Machine Learning workspace, and replace subscription_id, resource_group, and workspace_name with your own values.

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?tabs=python


In [3]:
#creation of workspace
workspace=create_workspace(workspace_args=
                                    {'subscription_id':"cb97564e-cea8-45a4-9c5c-a3357e8f7ee4",
                                        "resource_group": "AI_Strategy_AzureML_Resource",
                                        "workspace_name": "AIStrategy_AzureML_Worskpace"
                                    }
                        )



Getting existing Workspace
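
To keep the subscription details out of the notebook, workspace_args can be built from environment variables. The following is a minimal sketch, not part of fedml_azure: the helper name `workspace_args_from_env` and the three environment variable names are hypothetical and chosen only for illustration.

```python
import os


def workspace_args_from_env(environ=os.environ):
    """Build the workspace_args dictionary from environment variables.

    The variable names below are hypothetical; use whatever names
    your own setup defines.
    """
    required = {
        "subscription_id": "AZURE_SUBSCRIPTION_ID",
        "resource_group": "AZURE_RESOURCE_GROUP",
        "workspace_name": "AZURE_ML_WORKSPACE_NAME",
    }
    # Fail early with a clear message if any value is missing or empty.
    missing = [var for var in required.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in required.items()}
```

The returned dictionary has the same three keys as the one in the cell above, so it could be passed as `create_workspace(workspace_args=workspace_args_from_env())`.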


### Creating a Compute Cluster. This takes the workspace, a compute_type, and compute_args.
https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python

https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.compute.amlcompute.amlcompute?view=azure-ml-py

In [4]:
#creation of compute target
compute=create_compute(workspace=workspace,
                   compute_type='AmlComputeCluster',
                   compute_args={'vm_size':'Standard_D12_v2',
                                'vm_priority':'lowpriority',
                                'compute_name':'cpu-clu-linear',
                                'min_nodes':0,
                                'max_nodes':4,
                                'idle_seconds_before_scaledown':1700
                                }
                )



Creating Compute_target
Found compute target. just use it. cpu-clu-linear
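
Because a misconfigured cluster only fails once Azure tries to provision it, it can help to check compute_args locally first. The sketch below is an illustration under stated assumptions: `check_compute_args` is a hypothetical helper, and it assumes the keys are forwarded to the AmlCompute provisioning configuration, where vm_priority accepts 'dedicated' or 'lowpriority'.

```python
def check_compute_args(compute_args):
    """Return a list of problems found in a compute_args dictionary.

    Hypothetical helper; assumes the keys follow the AmlCompute
    provisioning parameters used in the cell above.
    """
    problems = []
    min_nodes = compute_args.get("min_nodes", 0)
    max_nodes = compute_args.get("max_nodes")
    if min_nodes < 0:
        problems.append("min_nodes must not be negative")
    if max_nodes is not None and min_nodes > max_nodes:
        problems.append("min_nodes must not exceed max_nodes")
    if compute_args.get("vm_priority", "dedicated") not in ("dedicated", "lowpriority"):
        problems.append("vm_priority should be 'dedicated' or 'lowpriority'")
    if compute_args.get("idle_seconds_before_scaledown", 0) < 0:
        problems.append("idle_seconds_before_scaledown must not be negative")
    return problems
```

An empty list means none of these checks failed. With min_nodes set to 0, as in this notebook, the cluster can scale down to zero nodes when idle.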


### Creating an Environment. This takes the workspace, environment_type, and environment_args.

The whl file for the fedml_azure library must be passed to the pip_wheel_files key in environment_args. To use scikit-learn, you must also pass its name to conda_packages.

https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.environment(class)?view=azure-ml-py

https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-environments

In [5]:
#creation of environment
environment=create_environment(workspace=workspace,
                           environment_type='CondaPackageEnvironment',
                           environment_args={'name':'test-env-linear',
                                             'conda_packages':['scikit-learn'],
                                             'pip_wheel_files':['fedml_azure-1.0.0-py3-none-any.whl']
                                            }
                )

Creating Environment
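
Since the wheel file is referenced by a relative path, a quick local check can confirm that it is where the notebook expects it before the environment is created. This is a minimal sketch; `missing_wheel_files` is a hypothetical helper and not part of fedml_azure.

```python
from pathlib import Path


def missing_wheel_files(environment_args, base_dir="."):
    """Return the pip_wheel_files entries that do not exist under base_dir."""
    base = Path(base_dir)
    return [
        name
        for name in environment_args.get("pip_wheel_files", [])
        if not (base / name).is_file()
    ]
```

An empty result means every listed wheel was found relative to base_dir, which defaults to the current working directory.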


## Now, let's train the model
### First, we need to instantiate the training class - this will assign the resources.

In [6]:
train=DwcAzureTrain(workspace=workspace,
                    environment=environment,
                    experiment_args={'name':'test-2'},
                    compute=compute)

Assigning Workspace
Creating Experiment
Assigning compute
Assigning Environment


### Then, we need to generate the run config. This is needed to package the configuration specified so we can submit a job for training. 

Before running the following cell, you should have a config.json file containing the values needed to access DWC. Provide the path to this file as config_file_path in the cell below.

You should also have the view SALES_VIEW created in your DWC. To obtain the data, refer to https://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/

Note that the 2M-record dataset was downloaded and duplicated 3 times to simulate a large dataset in DWC.
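
Before generating the run config, it can be useful to confirm that config.json parses as a JSON object. The sketch below makes no assumption about which keys fedml_azure requires, since this notebook does not list them. It only reports the key names that are present, never the values, so credentials are not printed. Both function names are hypothetical.

```python
import json


def config_keys_from_text(text):
    """Parse JSON text and return its top-level key names, sorted."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config.json must contain a JSON object")
    return sorted(config)


def config_keys_from_file(config_file_path):
    """Read a config file and return its top-level key names, sorted."""
    with open(config_file_path, encoding="utf-8") as config_file:
        return config_keys_from_text(config_file.read())
```

For example, `config_keys_from_file('dwc_configs/config.json')` would list the keys of the file used in the next cell.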

https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.scriptrunconfig?view=azure-ml-py

In [7]:
#generating the run config
src=train.generate_run_config(config_file_path='dwc_configs/config.json',
                          config_args={
                                          'source_directory':'Scikit-Learn-Linear-Regression',
                                          'script':'train_script.py',
                                          'arguments':[
                                                        '--model_file_name','regression.pkl',
                                                        '--table_name', 'SALES_VIEW'
                                                      ]
                                          }
                            )

Generating script run config
Config file already exists in the script_directory Scikit-Learn-Linear-Regression
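
The values in 'arguments' are passed to train_script.py on the command line. The actual training script is not shown in this notebook, so the following is only a sketch of how such a script might read the two arguments and where it might save the model. The function names are hypothetical, and the code that reads SALES_VIEW from DWC and fits the model is deliberately left out.

```python
import argparse
import os
import pickle


def parse_args(argv=None):
    """Parse the two arguments passed through the run config."""
    parser = argparse.ArgumentParser(description="Training script sketch")
    parser.add_argument("--model_file_name", required=True,
                        help="File name for the saved model, e.g. regression.pkl")
    parser.add_argument("--table_name", required=True,
                        help="Name of the DWC view to train on, e.g. SALES_VIEW")
    return parser.parse_args(argv)


def save_model(model, model_file_name, output_dir="outputs"):
    """Pickle the model into output_dir and return the file path."""
    os.makedirs(output_dir, exist_ok=True)
    path = os.path.join(output_dir, model_file_name)
    with open(path, "wb") as model_file:
        pickle.dump(model, model_file)
    return path
```

Azure Machine Learning uploads files written to the run's outputs directory to the run record, which is why a model saved as regression.pkl under outputs can later be registered with the path outputs/regression.pkl.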


### Submitting the job for training

In [8]:
#submitting the training run
run=train.submit_run(src)

Submitting training run
RunId: test-2_1633630701_bbb9c42b
Web View: https://ml.azure.com/runs/test-2_1633630701_bbb9c42b?wsid=/subscriptions/cb97564e-cea8-45a4-9c5c-a3357e8f7ee4/resourcegroups/AI_Strategy_AzureML_Resource/workspaces/AIStrategy_AzureML_Worskpace&tid=42f7676c-f455-423c-82f6-dc2d99791af7

Streaming azureml-logs/20_image_build_log.txt

2021/10/07 18:18:28 Downloading source code...
2021/10/07 18:18:29 Finished downloading source code
2021/10/07 18:18:29 Creating Docker network: acb_default_network, driver: 'bridge'
2021/10/07 18:18:29 Successfully set up Docker network: acb_default_network
2021/10/07 18:18:29 Setting up Docker configuration...
2021/10/07 18:18:30 Successfully set up Docker configuration
2021/10/07 18:18:30 Logging in to registry: d531eb48a65445d3b0ed1b6002c65026.azurecr.io
2021/10/07 18:18:31 Successfully logged into d531eb48a65445d3b0ed1b6002c65026.azurecr.io
2021/10/07 18:18:31 Executing step ID: acb_step_0. Timeout(sec): 5400, Working directory: '', Net

## Register the model for deployment

In [9]:
model=train.register_model(run=run,
                           model_args={'model_name':'sklearn_linReg_model',
                                       'model_path':'outputs/regression.pkl'},
                            resource_config_args={'cpu':1, 'memory_in_gb':0.5},
                            is_sklearn_model=True
                           )

print('Name:', model.name)
print('Version:', model.version)

Registering the model
Configuring parameters for sklearn model
Name: sklearn_linReg_model
Version: 10
