# Scikit-Learn Linear Regression
Using SALES_VIEW from SAP Datasphere. This view has 6,291,450 records.

## Install fedml_azure package

In [None]:
pip install fedml_azure --force-reinstall

## Import the libraries needed in this notebook

In [None]:
from fedml_azure import create_workspace
from fedml_azure import DbConnection
from fedml_azure import create_compute
from fedml_azure import create_environment
from fedml_azure import DwcAzureTrain
from fedml_azure import deploy
from fedml_azure import predict
from fedml_azure import register_model

## Set up

### Initialize the workspace

The create_workspace method takes a dictionary as input for parameter workspace_args.

Before running the cell below, make sure you have an Azure Machine Learning workspace, and replace subscription_id, resource_group, and workspace_name with your own values. For details on managing workspaces, see https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace?tabs=python

Refer to the documentation on the ‘create_workspace’ method and parameters (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#create_workspace).


In [None]:
workspace=create_workspace(workspace_args={
                                            "subscription_id": '<subscription-id>',
                                            "resource_group": '<resource-group>',
                                            "workspace_name": '<workspace_name>'
                                            }
)
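
To avoid hard-coding identifiers in the notebook, the dictionary passed to workspace_args can be assembled from environment variables. The sketch below is one possible way to do that. The environment variable names (AZURE_SUBSCRIPTION_ID, AZURE_RESOURCE_GROUP, AZURE_WORKSPACE_NAME) and the helper `build_workspace_args` are assumptions made for illustration and are not part of fedml_azure.

```python
import os

# Hypothetical environment variable names; use whatever fits your setup.
ENV_KEYS = {
    "subscription_id": "AZURE_SUBSCRIPTION_ID",
    "resource_group": "AZURE_RESOURCE_GROUP",
    "workspace_name": "AZURE_WORKSPACE_NAME",
}


def build_workspace_args(environ=None):
    """Build the workspace_args dictionary from environment variables.

    Raises ValueError if a value is missing or still looks like a
    '<placeholder>' copied from the notebook.
    """
    environ = os.environ if environ is None else environ
    args = {}
    for key, env_name in ENV_KEYS.items():
        value = environ.get(env_name, "").strip()
        if not value or (value.startswith("<") and value.endswith(">")):
            raise ValueError(f"{env_name} is not set to a real value")
        args[key] = value
    return args
```

The returned dictionary has the same three keys as the cell above, so it could be passed as `create_workspace(workspace_args=build_workspace_args())`.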

### Create a Compute target

The create_compute method takes the workspace, a compute_type, and compute_args as parameters. The following code creates a Compute Cluster with the name 'cpu-cluster' for training.

Refer to the documentation on the ‘create_compute’ method and parameters (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#create_compute).


In [None]:
compute=create_compute(workspace=workspace,
                   compute_type='AmlComputeCluster',
                   compute_args={'vm_size':'Standard_D12_v2',
                                'vm_priority':'lowpriority',
                                'compute_name':'cpu-cluster',
                                'min_nodes':0,
                                'max_nodes':4,
                                'idle_seconds_before_scaledown':1700
                                }
                )

### Create an Environment

The create_environment method takes the workspace, environment_type, and environment_args as parameters.

Pass 'fedml_azure' in pip_packages. To use scikit-learn, also add 'scikit-learn' to conda_packages.

Refer to the documentation on the ‘create_environment’ method and parameters (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#create_environment).

In [None]:
environment=create_environment(workspace=workspace,
                           environment_type='CondaPackageEnvironment',
                           environment_args={'name':'regression-sklearn',
                                             'pip_packages':['joblib','fedml_azure'],
                                             'conda_packages':['scikit-learn']})


## Now, let's train the model

### Creating a Training object and setting the workspace, compute target, and environment.

The training object reuses the workspace, compute target, and environment created in the previous cells, so make sure those cells have run successfully before running the cell below.

Refer to the documentation on the ‘DwcAzureTrain’ class (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#dwcazuretrain-class).


In [None]:
train=DwcAzureTrain(workspace=workspace,
                    environment=environment,
                    experiment_args={'name':'regression-experiment'},
                    compute=compute)

### Then, we need to generate the run config. This is needed to package the configuration specified so we can submit a job for training. 

Before running the following cell, you should have a config.json file with the connection details needed to access SAP Datasphere. Provide the path to this file as config_file_path in the cell below.
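
Before submitting a job, it can help to confirm that the configuration file parses as JSON and contains the keys the notebook relies on. The sketch below checks only for `schema`, which a later cell reads via `config['schema']`. The helper name `load_datasphere_config` is hypothetical, and the full set of connection keys expected by fedml_azure should be taken from its documentation rather than from this example.

```python
import json
from pathlib import Path


def load_datasphere_config(path, required_keys=("schema",)):
    """Load a JSON config file and verify that the required keys are present."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    with config_path.open("r", encoding="utf-8") as f:
        config = json.load(f)
    # Only 'schema' is checked by default; other keys are not known here.
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(f"config is missing keys: {', '.join(missing)}")
    return config
```

Calling `load_datasphere_config('dwc_configs/config.json')` before generating the run config would surface a missing file or key early, rather than inside the remote training job.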

You should also have the following view, SALES_VIEW, created in your SAP Datasphere. To obtain the data, refer to https://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/

Please note the 2M records data was downloaded and duplicated 3 times to represent a large dataset in SAP Datasphere.

Refer to the documentation on the ‘generate_run_config’ method and parameters (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#generate_run_config).

In [None]:
src=train.generate_run_config(config_file_path='dwc_configs/config.json',
                          config_args={
                                          'source_directory':'Scikit-Learn-Linear-Regression',
                                          'script':'train_script.py',
                                          'arguments':[
                                                        '--model_file_name','regression.pkl',
                                                        '--table_name', 'SALES_VIEW'
                                                      ]
                                          }
                            )
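
The `arguments` list is forwarded to `train_script.py` as command-line arguments. The training script itself is not shown in this notebook, so the following is only a sketch of how such a script might read the two flags with `argparse`. The function name `parse_training_args` is an assumption.

```python
import argparse


def parse_training_args(argv=None):
    """Parse the flags that the run config passes to the training script."""
    parser = argparse.ArgumentParser(description="Train a linear regression model")
    parser.add_argument("--model_file_name", required=True,
                        help="File name for the serialized model, e.g. regression.pkl")
    parser.add_argument("--table_name", required=True,
                        help="SAP Datasphere view to read the training data from")
    return parser.parse_args(argv)


# The same argument list that is passed to generate_run_config in this notebook.
args = parse_training_args(["--model_file_name", "regression.pkl",
                            "--table_name", "SALES_VIEW"])
print(args.model_file_name, args.table_name)
```

Inside a real script, `parse_training_args()` would be called without arguments so that it reads `sys.argv`.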

### Submit the training job with the option to download the model outputs

Refer to the documentation on the ‘submit_run’ method and parameters (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#submit_run).

In [None]:
run=train.submit_run(src)

### Register the model

Pass ‘outputs/model_file_name.pkl’ to the 'model_path' key of model_args, where ‘model_file_name’ is the name of the .pkl model file specified in the previous step.

Provide the desired model name to the ‘model_name’ key of model_args in the cell below. The 'is_sklearn_model' flag specifies whether a scikit-learn model is being registered.

Refer to the documentation on the ‘register_model’ method and parameters (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#register_model).

In [None]:
model=train.register_model(run=run,
                           model_args={'model_name':'sklearn_linReg_model',
                                       'model_path':'outputs/regression.pkl'},
                            resource_config_args={'cpu':1, 'memory_in_gb':0.5},
                            is_sklearn_model=True
                           )

### Register the model from the model file, without the training run. This is helpful if you need to deploy a model when only the model file is available. This step is optional.

#### For this sample use case, we download the model generated from the training run. This step is optional.

In [None]:
train.download_files(run)

We now use the model file to register the model. This step is optional.

Refer to the documentation on 'register_model' for more details (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#register_model_for_deploy).

In [None]:
model=register_model(model_args={'workspace':workspace,
                                'model_name':'regression_model',
                                'model_path':'outputs/regression-experiment/regression-experiment_1648852509_6fa36175/outputs/regression.pkl'},
                                resource_config_args={'cpu':1, 'memory_in_gb':0.5},
                                is_sklearn_model=True
                                )
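
The `model_path` above contains a run ID that is specific to one training run. Rather than copying the path by hand, the downloaded file can be located programmatically. The sketch below assumes that `download_files` places the model somewhere under a local `outputs` directory, as the path above suggests. The helper `find_latest_model` is hypothetical and not part of fedml_azure.

```python
from pathlib import Path


def find_latest_model(root="outputs", file_name="regression.pkl"):
    """Return the most recently modified file called file_name under root."""
    # Search all subdirectories, then order the matches by modification time.
    candidates = sorted(Path(root).rglob(file_name),
                        key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no {file_name} found under {root}")
    return candidates[-1].as_posix()
```

The returned string could then be used as the 'model_path' value in model_args.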

### Get the test data from SAP Datasphere

In [None]:
from sklearn.model_selection import train_test_split
from fedml_azure import DbConnection
import pandas as pd
import numpy as np
import json

# Path to the SAP Datasphere connection settings
config_path = 'Scikit-Learn-Linear-Regression/config.json'

with open(config_path, 'r') as f:
    config = json.load(f)

label_column = 'totalprofit'

def get_data(table_name):
    # Read the first 100 rows of the view and keep only the model columns
    db = DbConnection(url=config_path)
    schema = config['schema']
    query = f'SELECT TOP 100 * FROM "{schema}"."{table_name}"'
    data = db.execute_query(query)
    data = pd.DataFrame(data[0], columns=data[1])
    data = data[['unitssold', 'unitprice', 'unitcost', 'totalrevenue', 'totalcost', 'totalprofit']]
    return data

# Get the data from SAP Datasphere and drop incomplete rows
data = get_data('SALES_VIEW')
data = data.dropna()
y = data[label_column]

# Keep only the feature columns in data
data.drop(label_column, axis=1, inplace=True)

print(np.shape(data), np.shape(y))

# Hold out 30% of the rows as test data
X_train, X_test, y_train, y_test = train_test_split(data, y, test_size=0.3)

In [None]:
test_data = json.dumps({
    'data': X_test.values.tolist()
})
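
The request body is a JSON object with a single `data` key holding one list of feature values per row. The sketch below builds the same structure from plain Python lists so the shape is easy to inspect. The sample numbers are made up for illustration, and the helper name `build_payload` is an assumption.

```python
import json

# Feature order after the label column has been dropped.
FEATURE_COLUMNS = ["unitssold", "unitprice", "unitcost", "totalrevenue", "totalcost"]


def build_payload(rows):
    """Serialize rows of feature values into the {'data': [...]} JSON body."""
    for row in rows:
        if len(row) != len(FEATURE_COLUMNS):
            raise ValueError(
                f"expected {len(FEATURE_COLUMNS)} values per row, got {len(row)}")
    return json.dumps({"data": [list(row) for row in rows]})


# Made-up sample row: units sold, unit price, unit cost, total revenue, total cost.
payload = build_payload([[10, 9.5, 6.0, 95.0, 60.0]])
print(payload)
```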

### Deploy the model as a Webservice to Azure Container Instances (ACI)

Refer to the documentation on 'deploy' for more details (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#deploy).

In [None]:
aci_endpoint,_,aci_service=deploy(compute_type='ACI',
                                 inference_config_args={'entry_script':'Scikit-Learn-Linear-Regression/score.py', 'environment':environment},
                                 deploy_config_args={'cpu_cores':1, 'memory_gb':0.5},
                                 deploy_args={'workspace':workspace,'name':'aciwebservice','models':[model]}
                                )
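
The `entry_script` is not shown in this notebook. Azure Machine Learning entry scripts conventionally define an `init()` function that loads the model once and a `run(raw_data)` function that handles each request. The sketch below mirrors that shape with a stand-in model so it runs anywhere. `StubModel`, its formula, and the `{'result': ...}` response key are assumptions (the key is inferred from `result['result']`, which this notebook reads later), and a real script would load the registered .pkl file instead.

```python
import json

model = None


class StubModel:
    """Stand-in for the trained regressor: profit = revenue - cost."""

    def predict(self, rows):
        # Feature order: unitssold, unitprice, unitcost, totalrevenue, totalcost.
        return [row[3] - row[4] for row in rows]


def init():
    """Load the model once when the service starts."""
    global model
    model = StubModel()  # a real script would deserialize the registered model here


def run(raw_data):
    """Score one request whose body looks like {"data": [[...], ...]}."""
    try:
        rows = json.loads(raw_data)["data"]
        return {"result": list(model.predict(rows))}
    except Exception as exc:  # report the error to the caller instead of crashing
        return {"error": str(exc)}


init()
print(run(json.dumps({"data": [[10, 9.5, 6.0, 95.0, 60.0]]})))
```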

### Run inference on the ACI endpoint using the test data

Refer to the documentation on 'predict' for more details (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#predict).

#### Run inference on the ACI endpoint using the webservice object

In [None]:
result=predict(service=aci_service,data=test_data)
result

#### Run inference on the ACI endpoint using the endpoint_url

In [None]:
result=predict(endpoint_url=aci_endpoint,compute_type='ACI',data=test_data)
result

### Deploy the model as a Webservice Locally

Refer to the documentation on 'deploy' for more details (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#deploy).

In [None]:
local_endpoint,_,local_service=deploy(compute_type='local',
                                     inference_config_args={'entry_script':'Scikit-Learn-Linear-Regression/score.py', 'environment':environment},
                                    deploy_config_args={'port':8500},
                                    deploy_args={'workspace':workspace,'name':'localregservice','models':[model]}
                                    )

### Run inference on the Local endpoint using the test data

Refer to the documentation on 'predict' for more details (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#predict).

#### Run inference on the Local endpoint using the webservice object

In [None]:
result=predict(service=local_service,data=test_data)
result

#### Run inference on the Local endpoint using the endpoint_url

In [None]:
result=predict(endpoint_url=local_endpoint,data=test_data,compute_type='Local')
result

### Deploy the model as a Webservice to Azure Kubernetes Service

#### Create Azure Kubernetes Service Compute

Refer to the documentation on the ‘create_compute’ method and parameters (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#create_compute).

In [None]:
aks=create_compute(workspace=workspace,
                   compute_type='AKS',
                   compute_args={'compute_name':'aks1'})

#### Deploy the model as a Webservice to Azure Kubernetes Service

Refer to the documentation on 'deploy' for more details (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#deploy).

In [None]:
aks_endpoint,api_key,aks_service=deploy(compute_type='AKS',
                                        inference_config_args={'entry_script':'Scikit-Learn-Linear-Regression/score.py', 'environment':environment},
                                        deploy_args={'workspace':workspace,'name':'akswebservice','models':[model],'deployment_target':aks}
                                        )

### Run inference on the AKS endpoint using the test data

Refer to the documentation on 'predict' for more details (https://github.com/SAP-samples/data-warehouse-cloud-fedml/blob/main/Azure/docs/fedml_azure.md#predict).

#### Run inference on the AKS endpoint using the webservice object

In [None]:
result=predict(service=aks_service,data=test_data)
result

#### Run inference on the AKS endpoint using the endpoint_url

In [None]:
result=predict(endpoint_url=aks_endpoint,data=test_data,compute_type='aks',api_key=api_key)
result
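
For clients that cannot install fedml_azure, a deployed endpoint can also be called over plain HTTP. The sketch below only constructs the request with the standard library and does not send it. It assumes the AKS service uses key-based authentication, where the key is sent as a bearer token, and the URL and key shown are placeholders. This is an alternative to `predict`, not a description of how `predict` works internally.

```python
import json
import urllib.request


def build_scoring_request(endpoint_url, payload, api_key=None):
    """Create a POST request for a scoring endpoint without sending it."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: key-based auth expects the key as a bearer token.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(endpoint_url,
                                  data=payload.encode("utf-8"),
                                  headers=headers,
                                  method="POST")


# Placeholder URL and key; substitute the values returned by deploy.
request = build_scoring_request("http://example.invalid/score",
                                json.dumps({"data": [[10, 9.5, 6.0, 95.0, 60.0]]}),
                                api_key="<api-key>")
print(request.get_method(), request.full_url)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen` and decoding the JSON response body.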

### Write the result back to SAP Datasphere

#### Create table in SAP Datasphere

In [None]:
from fedml_azure import DbConnection
db = DbConnection(url='Scikit-Learn-Linear-Regression/config.json')

In [None]:
db.create_table("CREATE TABLE LINEAR_SALES_VIEW (unitssold FLOAT,unitprice FLOAT,unitcost FLOAT,totalrevenue FLOAT,totalcost FLOAT,result FLOAT)")
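
The DDL string can also be assembled from a list of column names, which keeps the table definition in sync with the feature columns. A minimal sketch, with `build_create_table` as a hypothetical helper: it assumes every column is FLOAT, as in the statement above, and only accepts simple identifiers.

```python
import re

# Letters, digits, and underscores only, not starting with a digit.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_create_table(table_name, columns, column_type="FLOAT"):
    """Build a CREATE TABLE statement with one column_type column per name."""
    for name in [table_name, *columns]:
        if not IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    column_sql = ",".join(f"{name} {column_type}" for name in columns)
    return f"CREATE TABLE {table_name} ({column_sql})"


ddl = build_create_table(
    "LINEAR_SALES_VIEW",
    ["unitssold", "unitprice", "unitcost", "totalrevenue", "totalcost", "result"],
)
print(ddl)
```

The resulting string could be passed to `db.create_table(ddl)` in place of the literal statement.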

#### Storing the result in the dataframe

In [None]:
import pandas as pd
result_df=pd.DataFrame(result['result'])
result_df.rename( columns={0:'result'}, inplace=True )

In [None]:
X_test['result']=result_df['result'].values
X_test
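
Assigning `.values` pairs predictions with rows by position, so the two must have the same length and order. The following pandas-free sketch illustrates that pairing and the length check it relies on. `attach_results` is a hypothetical helper and the numbers are made up.

```python
def attach_results(rows, predictions):
    """Append each prediction to its row, pairing them by position."""
    if len(rows) != len(predictions):
        raise ValueError(
            f"{len(rows)} rows but {len(predictions)} predictions")
    return [list(row) + [prediction]
            for row, prediction in zip(rows, predictions)]


# Two made-up feature rows and one prediction for each.
rows = [[10, 9.5, 6.0, 95.0, 60.0], [2, 4.0, 3.0, 8.0, 6.0]]
print(attach_results(rows, [35.0, 2.0]))
```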

#### Inserting the data into the table

In [None]:
db.insert_into_table('LINEAR_SALES_VIEW',X_test)