In [1]:
# !pip install gevent

## Simple Example - using gevent package

In [2]:
import gevent # loading the gevent package
from time import time # For response time comparisons
class Prediction:
    def __init__(self):
        pass
    
    def first_model_prediction(self, data=None, arg=None):
        print ("In first model")
        # Code for preprocessing, feature engineering if required
        # Assuming that this takes ~ 500ms
        # Let us sleep for 500ms
        gevent.sleep(0.5)
        print ("Out first model")
        
        return "prediction1"
    
    def second_model_prediction(self, data=None, arg=None):
        print ("In second model")
        # Code for preprocessing, feature engineering if required
        # Assuming that this takes ~ 500ms
        # Let us sleep for 500ms
        gevent.sleep(0.5)
        print ("Out second model")
        
        return "prediction2"
    
    # final prediction 
    def predict(self, data=None):
        # Shared preprocessing / feature engineering goes here if both models use the same features
        arg = None # Dummy argument
        
        # Spawning a separate greenlet (a lightweight cooperative task, not an OS thread) for each model
        g1 = gevent.spawn(self.first_model_prediction, data, arg)
        g2 = gevent.spawn(self.second_model_prediction, data, arg)
        
        # Waiting until both greenlets have finished
        gevent.joinall([g1, g2])
        
        # getting the first model results
        first_model_result = g1.value

        # getting the second model results
        second_model_result = g2.value

        # Performing some calculations with the results of the two models
        final_result = first_model_result + second_model_result
            
        return final_result

- Note: a class is used here for readers who prefer an object-oriented style. Plain functions work just as well.
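
For readers who prefer plain functions, the same pattern could be sketched roughly as follows. The name `predict_with_functions` is made up for this illustration and does not appear elsewhere in the notebook, and the `gevent.sleep(0.5)` calls are the same stand-ins for model work used in the class above.

```python
import gevent

def first_model_prediction(data=None):
    gevent.sleep(0.5)  # stand-in for ~500 ms of work that waits
    return "prediction1"

def second_model_prediction(data=None):
    gevent.sleep(0.5)  # stand-in for ~500 ms of work that waits
    return "prediction2"

def predict_with_functions(data=None):  # hypothetical name, for illustration only
    # Spawn one greenlet per model, then wait for both to finish
    g1 = gevent.spawn(first_model_prediction, data)
    g2 = gevent.spawn(second_model_prediction, data)
    gevent.joinall([g1, g2])
    # Combine the two results, as the class-based predict method does
    return g1.value + g2.value

function_result = predict_with_functions()
print(function_result)
```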

In [3]:
predict = Prediction()

tb = time()
# Prediction using the first model
first_result = predict.first_model_prediction()
# Prediction using the second model
second_result = predict.second_model_prediction()
ta = time()
print ("total time taken without using gevent is {}".format(ta-tb))

In first model
Out first model
In second model
Out second model
total time taken without using gevent is 1.0057871341705322


- Because the two functions run sequentially, calling them directly gives a response time of close to 1 second.
- The output order confirms this: the first model goes In and Out, and only then does the second model go In and Out.

- Let us now try the predict method, which uses gevent to run the two models concurrently.

In [4]:
tb = time()
result = predict.predict(None)
ta = time()
print ("total time taken using gevent is {}".format(ta-tb))

In first model
In second model
Out first model
Out second model
total time taken using gevent is 0.5052769184112549


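- As shown above, the two model predictions ran concurrently: both models went In before either came Out, so their waiting times overlapped, and the response time is now close to 500 ms. Note that gevent greenlets take turns cooperatively inside a single thread, so this is overlapping of waits rather than true multi-core parallelism.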

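- The total time taken is roughly the maximum of the times taken by the two functions, rather than their sum. This holds when the time is spent waiting in a gevent-friendly way, as with gevent.sleep here.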
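
To make the "maximum, not sum" point easier to see, here is a small sketch that gives the two tasks different durations. The `fake_model` helper and the 0.3 s / 0.6 s durations are assumptions chosen for this illustration, not part of the notebook's code.

```python
import gevent
from time import time

def fake_model(name, seconds):  # hypothetical helper, for illustration only
    # gevent.sleep hands control to other greenlets while this one waits
    gevent.sleep(seconds)
    return name

start = time()
fast = gevent.spawn(fake_model, "fast", 0.3)
slow = gevent.spawn(fake_model, "slow", 0.6)
gevent.joinall([fast, slow])  # returns once the slower greenlet is done
elapsed = time() - start

print("results: {}, {}".format(fast.value, slow.value))
print("elapsed: {:.2f}s (the sum of the two waits would be 0.9s)".format(elapsed))
```

The elapsed time should land near 0.6 seconds, the longer of the two waits. This only works because `gevent.sleep` yields control. A call that blocks without yielding, such as a plain `time.sleep` without gevent's monkey patching or a CPU-bound loop, would make the greenlets run one after the other.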