![](inference.png)

# Inference Layer
## Contents
- <a href='#1'>1. Load Python libraries and import the data</a>
- <a href='#2'>2. Feature Store Client initialization</a>
- <a href='#3'>3. Feature Registry</a>
    - <a href='#3.1'>3.1 Scenario 1: Batch Prediction</a>
    - <a href='#3.2'>3.2 Scenario 2: Online Redis Data for Prediction</a>

## 1. Load Python libraries and import the data

In [63]:
import os
import json
import random
import time
from datetime import datetime
from urllib.parse import urlparse

import numpy as np
import pandas as pd
import gcsfs
from pyarrow.parquet import ParquetDataset

from feast import Client, Feature, Entity, ValueType, FeatureTable
from feast.data_source import FileSource, KafkaSource
from feast.data_format import ParquetFormat, AvroFormat
from feast.pyspark.abc import RetrievalJobParameters, SparkJobStatus, SparkJob
import feast.staging.entities as entities
from feast.config import Config

## 2. Feature Store Client initialization

> Connect to the Kubernetes cluster and run the following command to list its services:

```bash
kubectl get svc
```
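- Copy the Redis and Kafka IPs into the `<REDIS_IP>` and `<KAFKA_IP>` placeholders below.
- Copy the GCP project ID into `<PROJECT>`.
- Copy the Dataproc cluster name and its GCS staging location (`FEAST_DATAPROC_CLUSTER_NAME`, `FEAST_SPARK_STAGING_LOCATION`).
- Copy the GCS bucket used for Feast staging (`staging_bucket`).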
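If you prefer not to copy the IPs by hand, the lookup can be scripted. The sketch below is an illustration under assumptions: it expects the JSON printed by `kubectl get svc -o json`, assumes the relevant services contain `redis` or `kafka` in their names, and uses a hypothetical helper `find_service_ips`. Your Helm release may name services differently (a headless service, for example, reports `None` as its cluster IP), and Kafka brokers usually also need a port.

```python
import json


def find_service_ips(kubectl_json: str, keywords=("redis", "kafka")) -> dict:
    """Map each keyword to the clusterIP of the first Service whose name contains it.

    `kubectl_json` is the text printed by `kubectl get svc -o json` (assumed input).
    """
    services = json.loads(kubectl_json)["items"]
    found = {}
    for keyword in keywords:
        for svc in services:
            if keyword in svc["metadata"]["name"]:
                found[keyword] = svc["spec"].get("clusterIP")
                break
    return found


# Hypothetical output, trimmed to the fields the helper reads.
sample = json.dumps({"items": [
    {"metadata": {"name": "feast-release-redis-master"}, "spec": {"clusterIP": "10.0.0.12"}},
    {"metadata": {"name": "feast-release-kafka"}, "spec": {"clusterIP": "10.0.0.34"}},
]})
print(find_service_ips(sample))  # {'redis': '10.0.0.12', 'kafka': '10.0.0.34'}
```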

In [12]:

class feature_store_client:
    """Export the Feast/Dataproc settings as environment variables before creating feast.Client."""

    def __init__(self, env, bucket):
        self.env = env
        # GCS staging bucket; expected to end with '/' so the "historical" path below joins cleanly.
        self.staging_bucket = bucket

    def feature_store_settings(self):
        if self.env.lower() != "dataproc":
            raise ValueError(f"Unsupported environment: {self.env}")

        # feast.Client() picks up the FEAST_* variables;
        # STAGING_BUCKET and DEMO_KAFKA_BROKERS are read by this notebook.
        environment = {
            'FEAST_CORE_URL': 'feast-release-feast-core.default:6565',
            'FEAST_DATAPROC_CLUSTER_NAME': 'dataprocfeast',
            'FEAST_DATAPROC_PROJECT': '<PROJECT>',
            'FEAST_DATAPROC_REGION': 'us-east1',
            'FEAST_STAGING_LOCATION': self.staging_bucket,
            'FEAST_HISTORICAL_FEATURE_OUTPUT_FORMAT': 'parquet',
            'FEAST_HISTORICAL_FEATURE_OUTPUT_LOCATION': f"{self.staging_bucket}historical",
            'FEAST_HISTORICAL_SERVING_URL': 'feast-release-feast-online-serving.default:6566',
            'FEAST_REDIS_HOST': '<REDIS_IP>',
            'FEAST_REDIS_PORT': '6379',
            'FEAST_SERVING_URL': 'feast-release-feast-online-serving.default:6566',
            'FEAST_SPARK_HOME': '/usr/local/spark',
            'FEAST_SPARK_LAUNCHER': 'dataproc',
            'FEAST_SPARK_STAGING_LOCATION': 'gs://dataproc-staging-us-east1-996861042416-4w01soni/artifacts/',
            'FEAST_SPARK_STANDALONE_MASTER': 'local[*]',
            'STAGING_BUCKET': self.staging_bucket,
            'DEMO_KAFKA_BROKERS': '<KAFKA_IP>',
        }

        for key, value in environment.items():
            os.environ[key] = value

In [13]:
staging_bucket='gs://feast-staging-bucket-9919526/'
set_env=feature_store_client('Dataproc',staging_bucket)
set_env.feature_store_settings()
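Because `feature_store_settings` exports whatever is in the dictionary, an unreplaced placeholder such as `<REDIS_IP>` only surfaces later as a connection error. A small check can catch this right after the settings are applied and before `Client()` is created. The helper `find_unfilled_settings` below is hypothetical, and the list of keys to check is your choice.

```python
import os
import re

# Matches an unreplaced placeholder such as '<PROJECT>' or '<REDIS_IP>'.
_PLACEHOLDER = re.compile(r"^<[A-Z_]+>$")


def find_unfilled_settings(keys, environ=None):
    """Return the keys that are missing, empty, or still hold a <PLACEHOLDER> value."""
    environ = os.environ if environ is None else environ
    return [k for k in keys if not environ.get(k) or _PLACEHOLDER.match(environ[k])]


# Example with a plain dict instead of os.environ.
example = {"FEAST_REDIS_HOST": "<REDIS_IP>", "FEAST_REDIS_PORT": "6379"}
print(find_unfilled_settings(["FEAST_REDIS_HOST", "FEAST_REDIS_PORT", "DEMO_KAFKA_BROKERS"], example))
# ['FEAST_REDIS_HOST', 'DEMO_KAFKA_BROKERS']
```

In the notebook you would call it without the second argument, for example `find_unfilled_settings(["FEAST_DATAPROC_PROJECT", "FEAST_REDIS_HOST", "DEMO_KAFKA_BROKERS"])`, and stop if the result is non-empty.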

In [14]:

client = Client()
               

## 3. Scenario 1: Batch Prediction
### Retrieving historical features
Feast provides a historical retrieval interface that exports point-in-time feature data, typically to train machine learning models; in this scenario the same interface supplies features for batch prediction. Users enrich an entity dataframe with features from one or more feature tables.

1. Define feature references

   Feature references specify exactly which features are retrieved from Feast, in the form `<feature_table>:<feature>`. They can come from several feature tables, provided those tables share the same entity (or composite entity).

2. Define an entity dataframe

   Feast joins feature values onto specific entities at specific points in time, so `get_historical_features()` needs an entity dataframe with one row per entity key and `event_timestamp`. In the cells below, this dataframe is read from a CSV file on GCS and staged to the staging bucket as the entity source.

3. Launch the historical retrieval job

   Once the feature references and the entity source are defined, `get_historical_features()` launches a job that extracts features from the batch sources of the referenced feature tables, joins them onto the entity source, and returns a job handle from which the location of the resulting dataset can be read.
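Conceptually, step 3 is a point-in-time join: for each entity row, take the most recent feature value whose timestamp is at or before that row's `event_timestamp`, so no future information leaks into the features. The sketch below reproduces the idea locally with `pandas.merge_asof` on made-up data. It illustrates the semantics only; it is not how Feast's Spark job is implemented, and as far as we understand Feast additionally bounds the look-back window with each feature table's `max_age`.

```python
import pandas as pd

# Made-up entity dataframe: which driver, and "as of when" we want features.
entity_df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2020-10-18 12:00", "2020-10-19 12:00", "2020-10-18 12:00"]),
})

# Made-up feature rows, as they might sit in a feature table's batch source.
feature_df = pd.DataFrame({
    "driver_id": [1001, 1001, 1002],
    "event_timestamp": pd.to_datetime(["2020-10-18 09:00", "2020-10-19 09:00", "2020-10-19 09:00"]),
    "trip_statistics__distance_travelled": [0.01, 0.05, 0.08],
})

# merge_asof requires both inputs to be sorted on the "on" key.
joined = pd.merge_asof(
    entity_df.sort_values("event_timestamp", kind="mergesort"),
    feature_df.sort_values("event_timestamp", kind="mergesort"),
    on="event_timestamp",
    by="driver_id",
    direction="backward",  # only use feature values at or before the entity timestamp
)
print(joined)
```

Driver 1002 gets no value at 2020-10-18 12:00 because its only feature row is timestamped later, which is exactly the leakage the point-in-time join prevents.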

In [15]:
def read_parquet(uri):
    """Read the Parquet part files under `uri` (file, gs, s3 or wasbs) into one pandas DataFrame."""
    parsed_uri = urlparse(uri)
    if parsed_uri.scheme == "file":
        return pd.read_parquet(parsed_uri.path)
    elif parsed_uri.scheme == "gs":
        fs = gcsfs.GCSFileSystem()
        # Spark writes one part-* file per partition; glob returns paths without the scheme.
        files = ["gs://" + path for path in fs.glob(uri + '/part-*')]
        ds = ParquetDataset(files, filesystem=fs)
        return ds.read().to_pandas()
    elif parsed_uri.scheme == 's3':
        import s3fs
        fs = s3fs.S3FileSystem()
        files = ["s3://" + path for path in fs.glob(uri + '/part-*')]
        ds = ParquetDataset(files, filesystem=fs)
        return ds.read().to_pandas()
    elif parsed_uri.scheme == 'wasbs':
        import adlfs
        fs = adlfs.AzureBlobFileSystem(
            account_name=os.getenv('FEAST_AZURE_BLOB_ACCOUNT_NAME'),
            account_key=os.getenv('FEAST_AZURE_BLOB_ACCOUNT_ACCESS_KEY'),
        )
        # wasbs://<container>@<account>.blob.core.windows.net/<path> -> "<container>/<path>"
        uripath = parsed_uri.username + parsed_uri.path
        files = fs.glob(uripath + '/part-*')
        ds = ParquetDataset(files, filesystem=fs)
        return ds.read().to_pandas()
    else:
        raise ValueError(f"Unsupported URL scheme {uri}")
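The `wasbs` branch is the least obvious one: an Azure Blob URI has the form `wasbs://<container>@<account>.blob.core.windows.net/<path>`, and `urlparse` reports the container as the URI's `username`. The sketch below isolates just the glob-pattern logic of `read_parquet` in a hypothetical helper, `part_file_pattern`, so it can be checked without any cloud credentials; the bucket, container and account names are made up.

```python
from urllib.parse import urlparse


def part_file_pattern(uri: str) -> str:
    """Return the glob pattern read_parquet uses to find Spark part files under `uri`."""
    parsed = urlparse(uri)
    if parsed.scheme == "wasbs":
        # The container sits in the userinfo part of the netloc, so urlparse exposes it as `username`.
        return parsed.username + parsed.path + "/part-*"
    return uri + "/part-*"


print(part_file_pattern("gs://my-bucket/historical/run-1"))
# gs://my-bucket/historical/run-1/part-*
print(part_file_pattern("wasbs://feast@myaccount.blob.core.windows.net/historical/run-1"))
# feast/historical/run-1/part-*
```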

In [16]:
with open("features.json") as f:
    features = json.load(f)
features

['fare_statistics:passenger_count',
 'fare_statistics:fare_amount',
 'fare_statistics:target',
 'trip_statistics:pickup_longitude',
 'trip_statistics:pickup_latitude',
 'trip_statistics:dropoff_longitude',
 'trip_statistics:dropoff_latitude',
 'trip_statistics:longitude_distance',
 'trip_statistics:latitude_distance',
 'trip_statistics:distance_travelled']
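Each entry is a feature reference of the form `<feature_table>:<feature>`. Two tables contribute here, `fare_statistics` and `trip_statistics`, and the historical retrieval output shown later names the matching columns `<feature_table>__<feature>`. The hypothetical helpers below make both points explicit: they group the references by table and derive the expected output column names.

```python
from collections import defaultdict


def ref_to_column(ref: str) -> str:
    """'table:feature' -> 'table__feature', the column naming seen in the historical output."""
    table, feature = ref.split(":", 1)
    return f"{table}__{feature}"


def group_by_table(refs):
    """Group feature references by feature table name, preserving order."""
    grouped = defaultdict(list)
    for ref in refs:
        table, feature = ref.split(":", 1)
        grouped[table].append(feature)
    return dict(grouped)


refs = ["fare_statistics:fare_amount", "trip_statistics:pickup_latitude", "trip_statistics:distance_travelled"]
print([ref_to_column(r) for r in refs])
# ['fare_statistics__fare_amount', 'trip_statistics__pickup_latitude', 'trip_statistics__distance_travelled']
print(group_by_table(refs))
# {'fare_statistics': ['fare_amount'], 'trip_statistics': ['pickup_latitude', 'distance_travelled']}
```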

In [17]:
def change_datetime(df,col):
    df[col]=pd.to_datetime(df[col])
    return df


In [18]:
entities_with_timestamp=pd.read_csv('gs://feastproject/driver_id.csv')
entities_with_timestamp=change_datetime(entities_with_timestamp,'event_timestamp')
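The entity dataframe only needs the entity key (`driver_id`) and an `event_timestamp` column of datetime type; reading it from `gs://feastproject/driver_id.csv` is one option. For a quick experiment it can also be built in memory, as in this sketch, which reuses driver IDs and timestamps that appear in the sample retrieval output of this notebook. The name `entity_df` is chosen so it does not overwrite `entities_with_timestamp`; whether Feast needs timezone-aware timestamps for your setup is not covered here.

```python
import pandas as pd

# Driver IDs and the points in time at which features are wanted.
entity_df = pd.DataFrame({
    "driver_id": [969360, 235304, 879445],
    "event_timestamp": pd.to_datetime([
        "2020-10-18 13:52:18",
        "2020-10-18 11:09:46",
        "2020-10-19 19:47:45",
    ]),
})
print(entity_df.dtypes)
```

This dataframe could then be passed to `entities.stage_entities_to_fs` in place of `entities_with_timestamp`.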

In [19]:
# get_historical_features returns as soon as the Spark job has been submitted successfully.
job = client.get_historical_features(
    feature_refs=features,
    entity_source=entities.stage_entities_to_fs(
        entity_source=entities_with_timestamp,
        staging_location=os.getenv("STAGING_BUCKET"),
        config=Config(),  # a Config instance, built from the FEAST_* environment variables
    ),
)

In [20]:
# get_output_file_uri will block until the Spark job is completed.
output_file_uri = job.get_output_file_uri()

In [21]:
Master_featured_data=read_parquet(output_file_uri)
Master_featured_data.head()

| | driver_id | event_timestamp | fare_statistics__passenger_count | fare_statistics__fare_amount | fare_statistics__target | trip_statistics__pickup_latitude | trip_statistics__dropoff_longitude | trip_statistics__pickup_longitude | trip_statistics__distance_travelled | trip_statistics__dropoff_latitude | trip_statistics__latitude_distance | trip_statistics__longitude_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 969360 | 2020-10-18 13:52:18 | 1.0 | -0.09439 | 1.0 | 40.775044 | -73.967976 | -73.976809 | 0.015092 | 40.762807 | 0.012237 | 0.008833 |
| 1 | 235304 | 2020-10-18 11:09:46 | 2.0 | 1.775493 | 1.0 | 40.737 | -73.864842 | -73.978438 | 0.118402 | 40.770393 | 0.033393 | 0.113596 |
| 2 | 879445 | 2020-10-19 19:47:45 | 4.0 | 1.442517 | 1.0 | 40.779072 | -74.005989 | -73.962341 | 0.082182 | 40.709438 | 0.069633 | 0.043648 |
| 3 | 432934 | 2020-10-19 16:14:06 | 1.0 | -0.409891 | 0.0 | 40.789004 | -73.959358 | -73.966692 | 0.021315 | 40.809018 | 0.020014 | 0.007334 |
| 4 | 464887 | 2020-10-18 22:50:28 | 1.0 | 0.702562 | 1.0 | 40.753488 | -74.006575 | -73.97274 | 0.035912 | 40.741452 | 0.012036 | 0.033835 |


#### Spark Job Status

In [22]:
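def wait_for_job_status(
    job: SparkJob,
    expected_status: SparkJobStatus,
    max_retry: int = 4,
    retry_interval: int = 5,
):
    """Poll job.get_status() up to max_retry times, sleeping retry_interval seconds between checks."""
    for _ in range(max_retry):
        if job.get_status() == expected_status:
            # Prints e.g. "The Spark Job is Completed" for SparkJobStatus.COMPLETED.
            print(f"The Spark Job is {expected_status.name.capitalize()}")
            return
        time.sleep(retry_interval)
    raise ValueError(f"Timeout waiting for job status to become {expected_status.name}")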
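Because `wait_for_job_status` only relies on `job.get_status()`, its polling logic can be exercised without a Dataproc cluster. The sketch below uses hypothetical stand-ins, `FakeStatus` and `FakeJob`, in place of Feast's `SparkJobStatus` and `SparkJob`, and a variant `wait_for_status` that returns the attempt on which the status matched and raises `TimeoutError` instead of `ValueError`.

```python
import enum
import time


class FakeStatus(enum.Enum):
    """Hypothetical stand-in for SparkJobStatus."""
    IN_PROGRESS = 1
    COMPLETED = 2


class FakeJob:
    """Hypothetical job that reports a scripted sequence of statuses."""

    def __init__(self, statuses):
        self._statuses = list(statuses)

    def get_status(self):
        # Return the next scripted status, repeating the last one once the script is exhausted.
        return self._statuses.pop(0) if len(self._statuses) > 1 else self._statuses[0]


def wait_for_status(job, expected, max_retry=4, retry_interval=0.0):
    """Poll job.get_status() up to max_retry times; return the 0-based attempt that matched."""
    for attempt in range(max_retry):
        if job.get_status() == expected:
            return attempt
        time.sleep(retry_interval)
    raise TimeoutError(f"Timeout waiting for job status to become {expected.name}")


job_script = [FakeStatus.IN_PROGRESS, FakeStatus.IN_PROGRESS, FakeStatus.COMPLETED]
print(wait_for_status(FakeJob(job_script), FakeStatus.COMPLETED))  # 2
```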

In [23]:
wait_for_job_status(job,SparkJobStatus.COMPLETED)

The Spark Job is Completed


In [24]:
pred_dict={'instances':Master_featured_data.drop(columns=['driver_id','event_timestamp','fare_statistics__target']).iloc[7:8,:].values.tolist()[0]}

In [28]:
pred_dict

{'instances': [1.0,
  0.19824585210095336,
  40.85737991333008,
  -73.90807342529298,
  -73.90799713134766,
  7.864199879342028e-05,
  40.85736083984375,
  1.9073486328125e-05,
  7.629394532671085e-05]}
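The payload depends on column order: the model receives a bare list of numbers, so the values must arrive in the order the model was trained on. Here that order is whatever `Master_featured_data` contains after `driver_id`, `event_timestamp` and the target are dropped. A more defensive option, sketched below with a hypothetical `build_payload` helper, builds the list from a named row and an explicit feature order (copied from the batch output columns in this notebook) and fails loudly if a feature is missing. The flat `instances` list mirrors the payload this notebook sends; other serving runtimes may expect a list of rows instead.

```python
# Feature order of the batch data after dropping driver_id, event_timestamp and the target.
FEATURE_ORDER = [
    "fare_statistics__passenger_count",
    "fare_statistics__fare_amount",
    "trip_statistics__pickup_latitude",
    "trip_statistics__dropoff_longitude",
    "trip_statistics__pickup_longitude",
    "trip_statistics__distance_travelled",
    "trip_statistics__dropoff_latitude",
    "trip_statistics__latitude_distance",
    "trip_statistics__longitude_distance",
]


def build_payload(row: dict, feature_order=FEATURE_ORDER) -> dict:
    """Build {'instances': [...]} with values in `feature_order`; raise if any feature is missing."""
    missing = [name for name in feature_order if name not in row]
    if missing:
        raise KeyError(f"Row is missing features: {missing}")
    return {"instances": [float(row[name]) for name in feature_order]}


# Row 0 of the online-store sample in this notebook, keys deliberately in a different order.
row = {
    "driver_id": 610685,
    "fare_statistics__passenger_count": 1,
    "trip_statistics__distance_travelled": 0.009436,
    "trip_statistics__pickup_latitude": 40.721319,
    "trip_statistics__dropoff_longitude": -73.84161,
    "trip_statistics__longitude_distance": 0.002701,
    "trip_statistics__dropoff_latitude": 40.712278,
    "trip_statistics__pickup_longitude": -73.844311,
    "fare_statistics__fare_amount": -1.354113,
    "trip_statistics__latitude_distance": 0.009041,
}
print(build_payload(row))
```

With pandas, a suitable `row` is `df.iloc[i].to_dict()`; extra keys such as `driver_id` are simply ignored.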

In [25]:
import requests

def predict(data, host, session, url):
    """POST `data` as JSON to the model's predict URL, using a Host header and an optional session cookie."""
    headers = {"Host": host}
    cookies = None
    if session != "":
        cookies = {'authservice_session': session}
    print(url, headers, cookies)
    res = requests.post(url, json=data, headers=headers, cookies=cookies, verify=False)
    if res.status_code == 200:
        return res.json()
    else:
        return "Status code : {0}".format(res.status_code)


### Serving Environment Variables

Here we use the internal DNS name of the Kubernetes service. To list the deployed inference services in your namespace:

```bash
kubectl get inferenceservice -n "$namespace"
```
**Format**:

**HOST**: `<SERVING_MODEL_NAME>.<NAMESPACE>.svc.cluster.local`

**URL**:  `http://<SERVING_MODEL_NAME>.<NAMESPACE>.svc.cluster.local/v1/models/<SERVING_MODEL_NAME>:predict`

- Copy the host into `host` below.
- Copy the predict URL endpoint into `url` below.
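Since both values are derived from the same two names, they can also be generated instead of pasted. A minimal sketch with a hypothetical helper, `serving_endpoints`, that follows the format above and reproduces the host and URL used in the next cell:

```python
def serving_endpoints(model_name: str, namespace: str) -> tuple:
    """Return (host, predict_url) following the in-cluster DNS format above."""
    host = f"{model_name}.{namespace}.svc.cluster.local"
    url = f"http://{host}/v1/models/{model_name}:predict"
    return host, url


example_host, example_url = serving_endpoints("seldon-serving", "aniruddha-choudhury")
print(example_host)
print(example_url)
```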

In [39]:
session = ""  # authservice session cookie value; empty if the endpoint needs none
host = 'seldon-serving.aniruddha-choudhury.svc.cluster.local'
headers = {"Host": host, "Cookie": "authservice_session={}".format(session)}
url = 'http://seldon-serving.aniruddha-choudhury.svc.cluster.local/v1/models/seldon-serving:predict'


In [41]:
response = requests.post(url, json=pred_dict, headers=headers)
response.json()

{'predictions': 'Profit of Driver'}

### Scenario 2: Online Redis Data for Prediction

In [42]:
fare_details=pd.read_csv("faredetails.csv")
entities_sample = [{"driver_id": e} for e in fare_details['driver_id'].values.tolist()]
entities_sample=entities_sample[:20]
entities_sample

[{'driver_id': 610685},
 {'driver_id': 825735},
 {'driver_id': 428317},
 {'driver_id': 356886},
 {'driver_id': 603801},
 {'driver_id': 183971},
 {'driver_id': 600461},
 {'driver_id': 596197},
 {'driver_id': 382017},
 {'driver_id': 599864},
 {'driver_id': 486440},
 {'driver_id': 60412},
 {'driver_id': 318925},
 {'driver_id': 942474},
 {'driver_id': 111143},
 {'driver_id': 991464},
 {'driver_id': 239703},
 {'driver_id': 580070},
 {'driver_id': 640316},
 {'driver_id': 365485}]

In [43]:
features_online_data = client.get_online_features(
    feature_refs=features,
    entity_rows=entities_sample).to_dict()
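`get_online_features(...).to_dict()` returns a column-oriented dictionary: each key (the entity key `driver_id` or a `table:feature` reference) maps to a list with one value per entity row, which is why it can be passed straight to `pd.DataFrame`. As far as we can tell, drivers with nothing stored in Redis come back with `None` values, which pandas turns into NaN and which `dropna` removes later. The hypothetical helper below shows the same filtering on the raw dictionary; the second driver ID in the sample is made up.

```python
def complete_rows(columns: dict) -> list:
    """Turn a column-oriented dict into row dicts, skipping rows that contain any None value."""
    names = list(columns)
    n_rows = len(columns[names[0]]) if names else 0
    rows = []
    for i in range(n_rows):
        row = {name: columns[name][i] for name in names}
        if all(value is not None for value in row.values()):
            rows.append(row)
    return rows


# Made-up response shape: the second (hypothetical) driver has no values stored online.
sample = {
    "driver_id": [610685, 123456],
    "fare_statistics:fare_amount": [-1.354113, None],
    "trip_statistics:distance_travelled": [0.009436, None],
}
print(complete_rows(sample))
# [{'driver_id': 610685, 'fare_statistics:fare_amount': -1.354113, 'trip_statistics:distance_travelled': 0.009436}]
```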


In [44]:
features_online_data=pd.DataFrame(features_online_data)
features_online_data.head()

| | driver_id | fare_statistics:passenger_count | trip_statistics:distance_travelled | trip_statistics:pickup_latitude | trip_statistics:dropoff_longitude | trip_statistics:longitude_distance | trip_statistics:dropoff_latitude | trip_statistics:pickup_longitude | fare_statistics:target | fare_statistics:fare_amount | trip_statistics:latitude_distance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 610685 | 1 | 0.009436 | 40.721319 | -73.84161 | 0.002701 | 40.712278 | -73.844311 | 0 | -1.354113 | 0.009041 |
| 1 | 825735 | 1 | 0.079696 | 40.711303 | -73.979268 | 0.03678 | 40.782004 | -74.016048 | 1 | 1.088648 | 0.070701 |
| 2 | 428317 | 2 | 0.013674 | 40.76127 | -73.991242 | 0.008504 | 40.750562 | -73.982738 | 0 | -0.813646 | 0.010708 |
| 3 | 356886 | 1 | 0.02534 | 40.733143 | -73.991567 | 0.004437 | 40.758092 | -73.98713 | 0 | -0.191734 | 0.024949 |
| 4 | 603801 | 1 | 0.01947 | 40.768008 | -73.956655 | 0.01144 | 40.783762 | -73.968095 | 0 | -0.975267 | 0.015754 |


#### Rearrange the columns to match the feature order used by the trained model

In [45]:
# Online keys use "table:feature"; the batch (training) output uses "table__feature".
columns_regex = [i.replace(':', '__') for i in features_online_data.columns if i != 'driver_id']
features_online_data = features_online_data.rename(columns=lambda i: i.replace(':', '__'))
columns_regex

['fare_statistics__passenger_count',
 'trip_statistics__distance_travelled',
 'trip_statistics__pickup_latitude',
 'trip_statistics__dropoff_longitude',
 'trip_statistics__longitude_distance',
 'trip_statistics__dropoff_latitude',
 'trip_statistics__pickup_longitude',
 'fare_statistics__target',
 'fare_statistics__fare_amount',
 'trip_statistics__latitude_distance']

In [46]:
# Select the online features in the same column order as the batch (training) data.
Master_Columns = list(Master_featured_data.columns)
Prediction_DataFrame = features_online_data[[i for i in Master_Columns if i in columns_regex]].copy()
# Drop drivers with missing online values, then the label column.
Prediction_DataFrame.dropna(inplace=True)
Prediction_DataFrame.drop(columns=['fare_statistics__target'], inplace=True)
Prediction_DataFrame.head()

| | fare_statistics__passenger_count | fare_statistics__fare_amount | trip_statistics__pickup_latitude | trip_statistics__dropoff_longitude | trip_statistics__pickup_longitude | trip_statistics__distance_travelled | trip_statistics__dropoff_latitude | trip_statistics__latitude_distance | trip_statistics__longitude_distance |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | -1.354113 | 40.721319 | -73.84161 | -73.844311 | 0.009436 | 40.712278 | 0.009041 | 0.002701 |
| 1 | 1 | 1.088648 | 40.711303 | -73.979268 | -74.016048 | 0.079696 | 40.782004 | 0.070701 | 0.03678 |
| 2 | 2 | -0.813646 | 40.76127 | -73.991242 | -73.982738 | 0.013674 | 40.750562 | 0.010708 | 0.008504 |
| 3 | 1 | -0.191734 | 40.733143 | -73.991567 | -73.98713 | 0.02534 | 40.758092 | 0.024949 | 0.004437 |
| 4 | 1 | -0.975267 | 40.768008 | -73.956655 | -73.968095 | 0.01947 | 40.783762 | 0.015754 | 0.01144 |


In [58]:
pred_dict={'instances':Prediction_DataFrame.iloc[7:8,:].values.tolist()[0]}

In [61]:
response = requests.post(url, json=pred_dict, headers=headers)
response.json()

{'predictions': 'Profit of Driver'}