# Verify Python libraries are installed
Your Synapse Spark pool includes all the libraries required to run this notebook. They were added during pool creation from the requirements.txt file.

The libraries installed are:
```python
    numpy==1.17.1
    pandas==0.24.2
    idna==2.5
    scipy==1.3.1
    azureml-sdk==1.3.0
    azureml-automl-core==1.3.0
    azureml-automl-runtime==1.2.0
```

The Synapse Spark pool already has the libraries required to connect to the Cosmos DB operational and analytical stores.

In [3]:
import azureml.core
from azureml.core import Workspace
from azureml.core.experiment import Experiment
from azureml.core.model import Model
from azureml.core.run import Run

import scipy

# Verify versions of key libraries
# view version history at https://pypi.org/project/azureml-sdk/#history 
print("Azure ML SDK Version:", azureml.core.VERSION)
print("SciPy Version: ", scipy.__version__)

Azure ML SDK Version: 1.3.0
SciPy Version:  1.1.0
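
The printed versions can differ from the pinned list above (here SciPy reports 1.1.0 while the pin is 1.3.1). The following is a minimal sketch of one way to compare pins against what is installed, assuming Python 3.8 or later for `importlib.metadata`. The helpers `installed_versions` and `find_mismatches` are hypothetical and not part of the lab.

```python
from importlib import metadata  # available from Python 3.8


def installed_versions(names):
    """Return {package: installed version, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def find_mismatches(pins, installed):
    """Return {package: (pinned, installed)} for every package whose versions differ."""
    return {
        name: (pinned, installed.get(name))
        for name, pinned in pins.items()
        if installed.get(name) != pinned
    }


# Pins copied from the list above (subset shown for brevity).
pins = {"numpy": "1.17.1", "scipy": "1.3.1", "azureml-sdk": "1.3.0"}
print(find_mismatches(pins, installed_versions(pins)))
```

An empty dictionary means every pinned package is installed at the pinned version.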

# Batch Scoring data
In this notebook, you will apply the forecasting model you created previously to determine whether a battery will need replacement within the next 30 days.

## Configure access to the Azure Machine Learning resources
To begin, you will need to provide the following information about your Azure Subscription.

**If you are using your own Azure subscription, please provide names for subscription_id, resource_group, workspace_name and workspace_region to use.** You should already have the Azure Machine Learning service workspace in your lab resource group. If not, the values you enter will be used to create a new one (note that the workspace needs to be of type [Machine Learning Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace)).

In the following cell, be sure to set the values for `subscription_id`, `resource_group`, `workspace_name` and `workspace_region` as directed by the comments (*these values can be acquired from the Azure Portal*).

To get these values, do the following:  
1. Navigate to the Azure Portal and log in with the credentials provided.  
2. From the left hand menu, under Favorites, select `Resource Groups`.  
3. In the list, select the resource group used for the lab.  
4. Open your Azure Machine Learning service workspace.

  - **If the workspace does not exist yet**, retrieve the `subscription_id`, `resource_group`, and `workspace_region` values from the resource group's Overview blade and choose your own `workspace_name`, such as "iot-aml-ws-YOUR_INITIALS". **If the workspace already exists, use its name.**   


5. The requested values should be in the Overview blade.

In [4]:
#Provide the Subscription ID of your existing Azure subscription
subscription_id = "220fc532-6091-423c-8ba0-66c2397d591b"

#Provide values for the existing Resource Group 
resource_group = "iot-lab-2020"

#Provide the Workspace Name and Azure Region of the Azure Machine Learning Workspace
workspace_name = "Cosmos-DB-IoT-ML-2klrbk7bxl3tk"
workspace_region = "East US"

print("Finished setting Azure Machine Learning service variables.")

Finished setting Azure Machine Learning service variables.

## Connect to the Azure Machine Learning workspace

Run the following cells to connect to your **Azure Machine Learning Workspace**. Make sure the workspace already exists and that your service principal has been configured to use it by following the instructions in [Setup Service Principal Authentication](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-setup-authentication?view=azure-ml-py#set-up-service-principal-authentication).

In [5]:
from azureml.core.authentication import ServicePrincipalAuthentication

# How do I get this from KeyVault?
sp = ServicePrincipalAuthentication(
        tenant_id="72f988bf-86f1-41af-91ab-2d7cd011db47", # service principal tenantID
        service_principal_id="299abf99-3a3e-4ede-95e6-5c76944a5c4f", # service principal clientId
        service_principal_password="1d71ab00-4f54-47d5-8f43-2d093517e017") # service principal clientSecret 
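
The cell above embeds the service principal secret directly in the notebook. The following is a minimal sketch of one alternative, assuming the three values are exposed as environment variables. The variable names `AML_TENANT_ID`, `AML_CLIENT_ID`, and `AML_CLIENT_SECRET` and the helper `load_sp_settings` are hypothetical. It does not answer the Key Vault question in the comment above; it only keeps the secret out of the notebook source.

```python
import os


def load_sp_settings(env=None):
    """Collect service principal settings from an environment-style mapping.

    Raises KeyError naming every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    # Keys match the keyword arguments used in the cell above;
    # the environment variable names are assumptions for illustration.
    names = {
        "tenant_id": "AML_TENANT_ID",
        "service_principal_id": "AML_CLIENT_ID",
        "service_principal_password": "AML_CLIENT_SECRET",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in names.items()}


# Usage in the notebook (not executed here):
# sp = ServicePrincipalAuthentication(**load_sp_settings())
```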

In [6]:
# Workspace.get returns a reference to an existing workspace; it does not create one
from azureml.core import Workspace

ws = Workspace.get(
        name=workspace_name, 
        auth=sp,
        subscription_id=subscription_id)
ws.get_details()

{'id': '/subscriptions/220fc532-6091-423c-8ba0-66c2397d591b/resourceGroups/iot-lab-2020/providers/Microsoft.MachineLearningServices/workspaces/Cosmos-DB-IoT-ML-2klrbk7bxl3tk', 'name': 'Cosmos-DB-IoT-ML-2klrbk7bxl3tk', 'location': 'eastus', 'type': 'Microsoft.MachineLearningServices/workspaces', 'sku': 'Enterprise', 'workspaceid': '13d4628c-5afc-40c3-9a7a-ef09ad68a159', 'description': '', 'friendlyName': '', 'creationTime': '2020-04-13T17:31:53.6662789+00:00', 'containerRegistry': '/subscriptions/220fc532-6091-423c-8ba0-66c2397d591b/resourceGroups/iot-lab-2020/providers/Microsoft.ContainerRegistry/registries/cosmosdbiotm2bc27104', 'keyVault': '/subscriptions/220fc532-6091-423c-8ba0-66c2397d591b/resourcegroups/iot-lab-2020/providers/microsoft.keyvault/vaults/iot-vault-2klrbk7bxl3tk', 'applicationInsights': '/subscriptions/220fc532-6091-423c-8ba0-66c2397d591b/resourcegroups/iot-lab-2020/providers/microsoft.insights/components/cosmos-db-iot-insights-2klrbk7bxl3tk', 'identityPrincipalId': '

## Retrieve the pre-trained model
A pre-trained model has been made available in a public Azure Storage account. Run the following cell to download the model and then register it as a model within your Azure Machine Learning workspace.

In [17]:
import os
import urllib.request

print("Downloading the pre-trained model...")
os.makedirs("models", exist_ok=True)
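# TODO: update this code to download the file from the new repo.
# The download below is disabled, so models/modelv3.pkl must already exist locally.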
#urllib.request.urlretrieve('https://github.com/AzureCosmosDB/scenario-based-labs/tree/master/IoT/deploy/modelv3.pkl', 'models/modelv3.pkl')
print("Download complete.")

print("Uploading and registering model...")
registered_model = Model.register(model_path="models/modelv3.pkl", 
                                  model_name="batt-cycles-7", 
                                  workspace=ws)

Downloading the pre-trained model...
Download complete.
Uploading and registering model...
Registering model batt-cycles-7

Run the following to retrieve the model from your Azure Machine Learning workspace, and inspect some of its properties.

In [18]:
from azureml.core.model import Model
from sklearn.externals import joblib
from azureml.train import automl

model_path = Model.get_model_path(model_name = 'batt-cycles-7', _workspace=ws)
print("Model saved to ", model_path)
model = joblib.load(model_path)
print("Model loaded.")

Model saved to  azureml-models/batt-cycles-7/3/modelv3.pkl
Model loaded.

## Load the data from Cosmos DB to batch score it
Run the following cells to query Cosmos DB Analytical store, prepare the data using SQL queries and then surface the data as temporary views.

### Registering Helper Function
This function makes it easier to create a DataFrame from a container in the Analytical store.

In [7]:
import pyspark

def _cosmos_olap(self, collection):
    cdb_analytical_config = {
        "spark.cosmos.synapse.linkedServiceName": "CosmosDbIoTLab",
        "spark.cosmos.region": "eastus2",
        "spark.cosmos.databaseName": "ContosoAuto",
        "spark.cosmos.containerName": collection,
    }
    return self.format('com.microsoft.azure.cosmos.analytics.spark.connector.CosmosSource')\
        .options(**cdb_analytical_config)\
        .load()
    
setattr(pyspark.sql.readwriter.DataFrameReader, 'cosmos_olap', _cosmos_olap)
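
The `setattr` call above attaches a plain function to an existing class so that it can be called as a method on every instance. The following is a minimal, self-contained illustration of that pattern. `Reader` and `with_container` are hypothetical stand-ins and not part of PySpark.

```python
class Reader:
    """Hypothetical stand-in for pyspark.sql.readwriter.DataFrameReader."""

    def __init__(self):
        self.options = {}

    def option(self, key, value):
        self.options[key] = value
        return self  # return self so calls can be chained


def _with_container(self, container):
    # `self` is the Reader instance the method is called on.
    return self.option("containerName", container)


# Attach the function to the class; every instance now exposes it as a method.
setattr(Reader, "with_container", _with_container)

reader = Reader().with_container("metadata")
print(reader.options)  # {'containerName': 'metadata'}
```

Patching a library class this way affects every reader created in the session, which is convenient in a notebook but worth avoiding in shared library code.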

### Register Temp View
Now register the temporary view needed to build the dataset that will be used to make the predictions. Notice that you can now join data from multiple Cosmos DB containers.


In [8]:
# createOrReplaceTempView returns None, so there is nothing to assign
spark.read.cosmos_olap('metadata').createOrReplaceTempView("metadata")

### Generate Scoring dataset
Now we are ready to use the previously created view to generate the final dataset.

In [9]:
trips_clean = spark.sql("""
    SELECT  vin, 
            to_utc_timestamp(tripEnded, \"yyyy-MM-dd'T'HH:mm:ss.SSSX'Z'\") as tripEnded, 
            to_utc_timestamp(tripStarted, \"yyyy-MM-dd'T'HH:mm:ss.SSSX'Z'\") as tripStarted, 
            ((unix_timestamp(to_utc_timestamp(tripEnded, \"yyyy-MM-dd'T'HH:mm:ss.SSSX'Z'\")) - 
                unix_timestamp(to_utc_timestamp(tripStarted, \"yyyy-MM-dd'T'HH:mm:ss.SSSX'Z'\")))/60.0) as tripDurationMinutes
    FROM metadata
    WHERE   entityType = 'Trip' 
            AND (tripStarted is not null AND tripStarted <> '0' AND tripStarted <> '') 
            AND (tripEnded is not null AND tripEnded <> '0' AND tripEnded <> '')
    """)

trips_clean.createOrReplaceTempView("trips_clean")
trips_clean.printSchema()

root
 |-- vin: string (nullable = true)
 |-- tripEnded: timestamp (nullable = true)
 |-- tripStarted: timestamp (nullable = true)
 |-- tripDurationMinutes: decimal(27,6) (nullable = true)
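
The query above keeps only trips whose start and end timestamps are usable and derives the duration in minutes. The following is a rough plain-Python sketch of the same calculation, assuming the timestamps are ISO 8601 strings with a trailing `Z`; that format is an assumption, and the helper names are hypothetical. Unlike `unix_timestamp`, which works in whole seconds, this version keeps fractional seconds.

```python
from datetime import datetime


def parse_utc(text):
    """Parse an ISO 8601 timestamp such as '2020-01-01T10:00:00.000Z'."""
    # Older versions of datetime.fromisoformat reject a trailing 'Z',
    # so replace it with an explicit UTC offset.
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def is_usable(value):
    """Mirror the WHERE clause: reject null, '0', and empty strings."""
    return value is not None and value not in ("0", "")


def trip_duration_minutes(trip_started, trip_ended):
    """Return the trip length in minutes, or None when a timestamp is unusable."""
    if not (is_usable(trip_started) and is_usable(trip_ended)):
        return None
    delta = parse_utc(trip_ended) - parse_utc(trip_started)
    return delta.total_seconds() / 60.0


print(trip_duration_minutes("2020-01-01T10:00:00.000Z", "2020-01-01T10:45:30.000Z"))  # 45.5
```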

In [10]:
vehicles_raw = spark.sql("""
    SELECT vin, batteryAgeDays, batteryRatedCycles, lifetimeBatteryCyclesUsed 
    FROM metadata 
    WHERE entityType ='Vehicle'
    """)

vehicles_raw.createOrReplaceTempView("vehicles_raw")
vehicles_raw.printSchema()

root
 |-- vin: string (nullable = true)
 |-- batteryAgeDays: long (nullable = true)
 |-- batteryRatedCycles: long (nullable = true)
 |-- lifetimeBatteryCyclesUsed: double (nullable = true)

In [11]:
vehicles_batch = spark.sql("""
    SELECT  v.vin as vin, 
            to_date(t.tripEnded, 'yyyy-MM-dd') as tripEnded, 
            t.tripDurationMinutes, 
            v.batteryAgeDays, 
            v.batteryRatedCycles, 
            v.lifetimeBatteryCyclesUsed 
    FROM    vehicles_raw v 
    INNER JOIN trips_clean t 
        ON v.vin = t.vin
    """)

vehicles_batch.createOrReplaceTempView("vehicles_batch")
vehicles_batch.printSchema()

root
 |-- vin: string (nullable = true)
 |-- tripEnded: date (nullable = true)
 |-- tripDurationMinutes: decimal(27,6) (nullable = true)
 |-- batteryAgeDays: long (nullable = true)
 |-- batteryRatedCycles: long (nullable = true)
 |-- lifetimeBatteryCyclesUsed: double (nullable = true)
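
Because the join above is an inner join on `vin`, the result contains one row per trip, and vehicles without trips (or trips without a vehicle) are dropped. The following plain-Python sketch illustrates that behavior with made-up rows; the VINs and values are invented for illustration only.

```python
# Made-up sample rows, not taken from the lab data.
vehicles = [
    {"vin": "VIN-A", "batteryAgeDays": 100},
    {"vin": "VIN-B", "batteryAgeDays": 250},
]
trips = [
    {"vin": "VIN-A", "tripDurationMinutes": 30.0},
    {"vin": "VIN-A", "tripDurationMinutes": 12.5},
    {"vin": "VIN-C", "tripDurationMinutes": 8.0},
]


def inner_join_on_vin(vehicles, trips):
    """Pair each trip with its vehicle; rows without a match on either side are dropped."""
    by_vin = {}
    for vehicle in vehicles:
        by_vin.setdefault(vehicle["vin"], []).append(vehicle)
    return [{**vehicle, **trip} for trip in trips for vehicle in by_vin.get(trip["vin"], [])]


rows = inner_join_on_vin(vehicles, trips)
print(len(rows))  # 2: VIN-A appears once per trip; VIN-B and VIN-C are dropped
```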

Run the following cells to convert the Spark DataFrame to a Pandas DataFrame for use with the pre-created model.

In [35]:
import pandas as pd

spark_df = spark.sql("SELECT cast(tripEnded as string) as date, batteryAgeDays as battery_Age_Days, tripDurationMinutes as daily_Trip_Duration, lifetimeBatteryCyclesUsed, batteryRatedCycles, vin from vehicles_batch v")
pd_df = spark_df.toPandas()
pd_df['date'] = pd.to_datetime(pd_df['date']) # Added to address Spark Date to Pandas date conversion

## Define the scoring logic
The following cell will apply the model and return a prediction for whether or not maintenance is required.

Run the following cell to define the helper method.

In [40]:
def predict_maintenance(row):
    # from azureml.train import automl
    from sklearn.linear_model import LinearRegression
    import pandas as pd
    import numpy as np
    from datetime import datetime
    predict_needs_service = 0
    
    startday = row["battery_Age_Days"]
    dailytripduration = row["daily_Trip_Duration"]
    current_cycles = row["lifetimeBatteryCyclesUsed"]
    rated_lifetime_cycles = row["batteryRatedCycles"]

    dayslist = range(startday, startday+30)
    pds_df = pd.DataFrame({'battery_Age_Days': dayslist, 'daily_Trip_Duration': dailytripduration})

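    # `reg` must be a fitted regressor already in scope (for example, the model loaded earlier as `model`)
    y_Pred = reg.predict(np.array(pds_df))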
    total_cycles_next_30_days = y_Pred[[29,]][0][0]

    if current_cycles + total_cycles_next_30_days > rated_lifetime_cycles:
        predict_needs_service = 1

    return predict_needs_service
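
The decision rule in the helper above can be read in isolation: build a 30-day window of inputs, take the prediction for the last day, and flag the vehicle when used cycles plus that prediction exceed the rated cycles. The following is a minimal sketch of that rule, assuming a model whose `predict` returns one single-element row per input. `StubRegressor` and its 0.1 cycles-per-day behavior are hypothetical and stand in for the real fitted model.

```python
class StubRegressor:
    """Hypothetical stand-in for the fitted model: 0.1 cycles per day of battery age."""

    def predict(self, rows):
        # One prediction per input row, shaped like a 2-D result: [[value], ...].
        return [[0.1 * age_days] for age_days, _duration in rows]


def needs_service(model, battery_age_days, trip_minutes, cycles_used, rated_cycles, horizon=30):
    """Return 1 when used cycles plus the final predicted value exceed the rated cycles."""
    rows = [(day, trip_minutes) for day in range(battery_age_days, battery_age_days + horizon)]
    predicted = model.predict(rows)[-1][0]  # prediction for the last day of the window
    return 1 if cycles_used + predicted > rated_cycles else 0


stub = StubRegressor()
print(needs_service(stub, 100, 45.0, 195.0, 200))  # 1
print(needs_service(stub, 100, 45.0, 150.0, 200))  # 0
```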

Calculate the predictions by running the following cell.

In [41]:
predictions = pd_df.apply(predict_maintenance, axis=1)

Now, run the following cell to examine the predictions by `VIN`.

In [42]:
import pandas as pd
batch_predictions_pdf = pd.DataFrame({"vin": pd_df["vin"], "serviceRequired":predictions})
batch_predictions_pdf

vin  serviceRequired
0    QRD1S64DT4QIBM7AT                1
1    IZGBOUOH0QDT7KX44                0
2    CMJT8KNLDLWSSYAY8                1
3    R3L4HK7T3NX7QVKEN                1
4    5KIJ2LBT9NBXRYNJ1                0
5    XF93A5HNUMQF3W7XN                1
6    BTJWLTKYWTYE5QNYD                1
7    H9B2RLTFU2H2I7ZZA                0
8    PTZYTOTXBWVWBAJVG                1
9    5Y7C4X1AW9OD6YJ5H                1
10   OXPVYJT8F5CV0QETY                0
11   TBMYQQ5TJDC85HHHD                0
12   K9QF880KTD5NDKG0M                0
13   8R5D8PMU2LJP25G25                1
14   DZ0JN3HME3OKBBFYU                1
15   PB2GAMT1UBQC0N2BW                0
16   SQSP37SUBMYRE39P0                1
17   HWNOTJPA7R5PBDWLZ                0
18   DRZQCEQTOITFGYE1F                1
19   GS7OYBBL6M7ENHIK5                1
20   1UGMYO6ZDQ3KC72HT                0
21   8RRMNKA6SN8JH8R6M                1
22   VRJXVUQUWLMYNIC8X                1
23   6WBGQ85RXT2YDTPVP                0
24   C3TJEP9O4OHLFO

## Write the predictions back to Cosmos DB
Now you will save the previously created predictions DataFrame back to the `maintenance` collection in Cosmos DB.

Run the following cells to do so.

In [43]:
# Retrieve the connection string and key from the linked service
import sys
import re

from pyspark.sql import SparkSession
sc = SparkSession.builder.getOrCreate()
token_library = sc._jvm.com.microsoft.azure.synapse.tokenlibrary.TokenLibrary

connection_string = token_library.getConnectionString('CosmosDbIoTLab')
matchObj = re.match( r'AccountEndpoint=(.*);Database=(.*);AccountKey="(.*)";', connection_string, re.M|re.I)
endpoint = matchObj.group(1)
masterkey = matchObj.group(3)
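
If the connection string does not match the pattern, `re.match` returns `None` and the `group` calls above raise an `AttributeError`. The following is a minimal sketch of a more defensive parse, assuming the same `AccountEndpoint=...;Database=...;AccountKey="...";` layout used above. The helper name and the sample string are hypothetical, and the sample key is not a real key.

```python
import re

# Same layout as the pattern above, with tighter character classes.
_PATTERN = re.compile(r'AccountEndpoint=([^;]*);Database=([^;]*);AccountKey="([^"]*)";')


def parse_connection_string(connection_string):
    """Return (endpoint, database, key); raise ValueError if the format is unexpected."""
    match = _PATTERN.match(connection_string)
    if match is None:
        # Do not echo the connection string in the error: it contains the account key.
        raise ValueError("Connection string is not in the expected format")
    return match.group(1), match.group(2), match.group(3)


sample = 'AccountEndpoint=https://example.documents.azure.com:443/;Database=ContosoAuto;AccountKey="not-a-real-key==";'
sample_endpoint, sample_database, sample_key = parse_connection_string(sample)
print(sample_endpoint)  # https://example.documents.azure.com:443/
```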

In [44]:
maintReadConfig = {
    "Endpoint" : endpoint,
    "Masterkey" : masterkey,
    "Database" : "ContosoAuto",
    "Collection" : "maintenance"
    }

maint = spark.read.format("com.microsoft.azure.cosmosdb.spark").options(**maintReadConfig).load()
maint.createOrReplaceTempView("maintenance")

writeConfig = {
    "Endpoint" : endpoint,
    "Masterkey" : masterkey,
    "Database" : "ContosoAuto",
    "Collection" : "maintenance",
    "Upsert" : "false"
    }

# Schema used by the maintenance collection
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, BooleanType
maintSchema = StructType([
  StructField("vin",StringType(),True),
  StructField("serviceRequired",IntegerType(),True),
  StructField("id",StringType(),True),
  StructField("_attachments",StringType(),True),
  StructField("_etag",StringType(),True),
  StructField("_rid",StringType(),True),
  StructField("_self",StringType(),True),
  StructField("_ts",IntegerType(),True),
])

In [45]:
# delete any existing maintenance predictions
from azure.cosmos import CosmosClient, PartitionKey, exceptions

client = CosmosClient(endpoint, credential=masterkey)
database = client.get_database_client("ContosoAuto")
container = database.get_container_client("maintenance")

for item in container.query_items(query='SELECT * FROM c',
                                  enable_cross_partition_query=True):
    print('Deleting Document Id: {0}'.format(item['id']))
    container.delete_item(item, partition_key=item['vin'])

Deleting Document Id: 1668a5b9-c1e8-4db8-aa7e-83fc89486d2b
Deleting Document Id: 9be8fea7-5dd3-4b6e-9e4d-4bd60ce35a10
Deleting Document Id: 78c16daa-9675-4e70-b760-3610e2fda4ee
Deleting Document Id: db237a88-82b4-4135-b95e-991b0b6e0fd4
Deleting Document Id: d4f73293-dfbb-4eee-ac68-31f787baf752
Deleting Document Id: f104b09f-4646-485c-96e3-6751d44f3fe9
Deleting Document Id: 8cecfe88-5ab2-4168-8f05-3c0d20c9577c
Deleting Document Id: 83085f3d-3758-401d-8116-137bc182c1a3
Deleting Document Id: cd33616b-ed44-42f0-829a-bb5dc7f07552
Deleting Document Id: f11f7007-c9ee-429f-9f51-f72c645aede0
Deleting Document Id: eb632ebc-ee0a-4fb0-8bb5-e2ec3c667e5d
Deleting Document Id: 818c361d-9edd-4341-b995-854568fb9e56
Deleting Document Id: 4926fd0a-c042-4143-a4ab-8ba47d0ff5e8
Deleting Document Id: d4ff61c5-d676-4296-8dd6-b0a5eb997c23
Deleting Document Id: 070d2f90-2748-4900-811a-dc568c9f2c39
Deleting Document Id: f84f6f79-c85e-49dd-b5b1-14e3383a544b
Deleting Document Id: b74746de-60aa-40e0-b9c5-fa8a538b8e

In [46]:
# write the new prediction out to Cosmos DB
batch_predictions = spark.createDataFrame(batch_predictions_pdf)
batch_predictions.write.format("com.microsoft.azure.cosmosdb.spark").mode("overwrite").options(**writeConfig).save()

  'JavaPackage' object is not callable
Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true.