Persistent private instances #1749

Closed · da-source opened this issue Dec 31, 2020 · 41 comments
Labels: question (Further information is requested)

Comments

da-source commented Dec 31, 2020

I would like to use Cortex to create an application where each user can request and communicate with an AWS instance for a period of time. In this scenario, each user's data would be processed and stored on one whole AWS instance. From the documentation, I understand that each API call will use an instance that is not busy at the moment. It wouldn't be ideal if, by making an API call, a user received sensitive data stored by another user on the same instance. Would it be possible to somehow mark the instance to which an API call is being made? That way the data of individual users wouldn't be accessible to everyone, but only to the users who request/use that instance.

da-source added the question label on Dec 31, 2020
da-source changed the title from "Persistent instances" to "Persistent private instances" on Dec 31, 2020
da-source (Author) commented Dec 31, 2020


For example, when I make an API request, Cortex will spin up an instance; the next time I make a call, will I access the same instance or some other one (considering that I will specify the compute requirement to be the maximum for the instance type I'll use, so that one instance is used per API)?

deliahu (Member) commented Jan 1, 2021

@da-source that is an interesting question. I have a few follow-up questions:

  1. How many users do you expect to serve?
  2. Do you want there to be 1 user per instance, or is it ok if multiple users (at random) access the same instance, as long as a user's requests are all routed to the same instance each time?

Generally, it would be better to not do it this way if possible, since there is some behavior that could be undefined. For example, what if the user doesn't make a request for a bit, the instance spins down, and then the user makes a new request: will they have lost their data, or is it ok to spin up a new instance?

Also, locking users to specific instances could affect autoscaling's ability to distribute the load, since instances created during autoscaling could only be used for new users (all existing users would have to stay on their initial instance).

da-source (Author) commented:


Somewhere around 150-300 users. It would be best if each user only had access to one instance. I will set up a web interface with a timer set to the number of minutes an instance will be alive once it spins up (which must be defined in cortex.yaml, as I understand). After the timer runs out, there will be an option to request a new instance. Since the time window during which the instance is alive will be predefined and known to the user, the user will have that window (around 30-50 minutes) to interact with the instance, after which a new instance can be requested if needed.

The ability to distribute the load that you've mentioned wouldn't be an issue, I think: if all existing users have to stay on their instances until the timer runs out and the instance spins down, that would actually be preferable.

miguelvr (Collaborator) commented Jan 1, 2021


Hello!

Engineering-wise, that sounds like a very bad idea for an application architecture. Can you tell us what it is that you are trying to achieve? There must be other ways to achieve it while keeping the application stateless.

da-source (Author) commented Jan 1, 2021

@miguelvr I have a few quite large models and datasets that I would like to be available at any given time, and I would like to make this application as cost-effective as possible. I considered creating a separate API for each model/dataset, but doing so would require running many idle instances on the cluster when the APIs are not being used. I also considered a single-API implementation - downloading all the models/datasets to each instance and then loading them when needed using load_model - but doing so would: a) unnecessarily increase AWS data-transfer costs, since most of the models would go unused; and b) possibly not fit, since the instance's usual disk capacity of 50GB might not be enough to store all the datasets.

If I could instead create a single API that downloads the model+dataset combination onto an instance upon a user's request (using the payload passed to predictor.py), runs the instance for a set amount of time, and then spins it down, I would avoid running many unused APIs and downloading many files that won't be used on an instance. Keeping only one instance per user could also help avoid OOM errors if a user decides to download a few datasets or load more than one model (the average compute requirement per model is around 6GB, so running only one model per instance is plausible). Also, if instances are handed out to users at random, a user might download/use one model on one instance and then have to re-download the model if they get access to another instance without their model; avoiding that is an additional motivation for keeping one instance per user.

miguelvr (Collaborator) commented Jan 1, 2021

Sounds like what you want might be solved with multi-model caching, which Cortex supports. You can check the documentation here: https://docs.cortex.dev/workloads/multi-model

da-source (Author) commented:

@miguelvr With multi-model caching, will all the models be downloaded onto an instance when it spins up, or will the model of choice be downloaded after each API call that specifies which model to get?

miguelvr (Collaborator) commented Jan 1, 2021

You can configure how many models are cached on disk
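For reference, those caching limits live in the API spec. A minimal sketch, expressed as the Python api_spec dict used later in this thread (the models.dir, cache_size, and disk_cache_size field names are taken from the multi-model docs linked above, and the bucket path is a placeholder, so treat this as illustrative rather than a definitive spec):

api_spec = {
    "name": "multi-model-api",
    "kind": "RealtimeAPI",
    "predictor": {
        "type": "python",
        "path": "predictor.py",
        "models": {
            "dir": "s3://my-bucket/models/",  # placeholder bucket/prefix
            "cache_size": 2,        # max models kept in memory at once
            "disk_cache_size": 5,   # max models kept on disk at once
        },
    },
    "compute": {"cpu": "1", "mem": "6G"},
}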

da-source (Author) commented Jan 1, 2021


But will the models be downloaded on instance startup or on each API call? I don't want to download models/data unnecessarily, since they are quite large and AWS charges $0.15/GB.

da-source (Author) commented Jan 2, 2021

@deliahu The multi-model documentation states that you can store many models in the cache and then load the model that you need. But will all the models be downloaded onto an instance when it is requested?

Or will only the model specified in query_params be downloaded? If so, and it is downloaded on one instance, but an API call randomly redirects the user to another instance without that model, will it have to be downloaded again?

I'm trying to avoid those AWS data-transfer charges, because my files are so large (all of them combined currently weigh around 95-100GB).

RobertLucian (Member) commented Jan 2, 2021

@da-source the models are downloaded only when they are requested. This means that when multi-model caching is enabled, the models won't be downloaded when the API starts, but when requests start coming in. The cache then gets populated until its threshold (the maximum number of models kept in the cache) is hit, after which models are evicted based on the LRU policy.

The models that can be downloaded are those specified in the API spec (the predictor.models field), not arbitrary ones passed in via query_params. Generally, query_params is used to specify which of those models should be used when running the inference.

To reiterate, the API starts with no models; they only get downloaded when requests come in. Subsequent requests for the same model will be faster because the model will already be present on disk / in memory.

The opposite is when only live reloading is enabled, in which case the models are downloaded by the API from the very beginning - and they are always available to the user.
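To make the query_params flow concrete, here is a rough sketch of a predictor that picks the model named in the query string. The constructor's python_client argument and its get_model() call follow the multi-model caching docs, so treat the exact names and signatures as assumptions:

class PythonPredictor:
    def __init__(self, config, python_client):
        # the client handles lazy download, caching, and LRU eviction
        self.client = python_client

    def predict(self, payload, query_params):
        model_name = query_params["model"]         # e.g. ?model=xlnet-a
        model = self.client.get_model(model_name)  # downloaded on first use
        return model.predict(payload)              # placeholder inference call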

Does that make sense to you? And if not, could you tell us where we could improve our documentation (the parts that are confusing if any) with regards to live-reloading/multi-model caching? Your feedback is gonna help us improve our documentation for future users :)


Out of pure curiosity, what kind of models are you deploying? 95-100GB each is quite a lot.

da-source (Author) commented Jan 2, 2021


Thanks for clearing that up! One small detail: with multi-model caching, when a request for a certain model comes in, will it affect only one instance, or will the requested model be downloaded on all of the running instances?
And should query_params be used to define which model to request (with multi-model caching)?

The multi-model part of the documentation seemed a little unclear to me, and you could update it with the information you've given me to make it a bit more detailed. Also, it's not clear from the documentation how to use query_params when making a request to an API with curl.

I'm using a number of pretrained XLNet models. What I meant to say was that all of them combined weigh around 100GB (not each one of them).

RobertLucian (Member) commented:

@da-source with multi-model caching, when a request for a certain model lands, it only affects a single API replica (not an instance, and not all instances). An API replica can fit once or multiple times on a single instance, depending on how many compute resources the API demands.

I suppose the thing you are requesting is described by ticket #1288. Is this something that you would definitely need?

And should query_params be used to define which model to request (with multi-model caching)?

Anything that's passed into the predict method can be used to determine the model to use. It's just convenient to use query_params because the payload is generally reserved for the model input.
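As an illustration of how query_params reach predict from the caller's side (the endpoint URL, parameter name, and payload below are placeholders; a roughly equivalent curl command is shown as a comment):

import requests

# placeholder endpoint; use the URL printed by `cortex get <api-name>`
endpoint = "https://<load-balancer>.elb.us-east-1.amazonaws.com/my-api"

# the query string becomes query_params in predict(); the JSON body is the payload
# roughly equivalent curl: curl -X POST "<endpoint>?model=xlnet-a" -d '{"text": "hello"}'
response = requests.post(endpoint, params={"model": "xlnet-a"}, json={"text": "hello"})
print(response.text)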

The multi-model part of the documentation seemed a little unclear to me, and you could update it with the information you've given me to make it a bit more detailed. Also, it's not clear from the documentation how to use query_params when making a request to an API with curl.

I see! We'll see what we can do about it! Thank you!

da-source (Author) commented Jan 4, 2021

@RobertLucian I see. If I specify the API compute requirement so that it can fit only on one instance, make a request for a certain model (which will download the model onto one instance if it's not already present), and then after some time make a request for the same model, will Cortex access the instance on which the model was previously downloaded, or will an instance be chosen at random, and the model will have to be downloaded again if the randomly accessed instance doesn't have the previously downloaded model?

deliahu (Member) commented Jan 5, 2021

will an instance be chosen at random, and the model will have to be downloaded again if the randomly accessed instance doesn't have the previously downloaded model?

Yes, that is correct.

da-source (Author) commented Jan 5, 2021


Is there a way to modify this behaviour, or at least increase the chances that the instance with the needed model at hand is used instead of an instance that doesn't have the previously downloaded model? That way, the models that aren't chosen often wouldn't need to be downloaded on each API call. Ideally, each user would repeatedly access only one instance (where the needed model and data would be stored) before the instance spins down.

deliahu (Member) commented Jan 6, 2021

@da-source something like this could be possible for us to implement. Since it would be based on consistent hashing of a request header (e.g. user ID) it would rely on a few things being true to run as smoothly as possible, e.g. do your users each generate a similar number of requests, or is there a wide variety?

There is another option worth considering, which is to create a separate API for each user. This would ensure that each replica is fully owned by a single user: all requests would be routed to the same replica, however multiple users would not be able to share a replica (I wrote "replica" since you can have multiple replicas per instance). Does this approach seem promising?
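To illustrate the hashing idea mentioned above, here is only a sketch of mapping a user-ID header to a stable target; real consistent hashing also handles the replica count changing, which a plain modulo does not:

import hashlib

def pick_target(user_id: str, num_replicas: int) -> int:
    # map a user ID to a stable replica index, assuming a fixed replica count
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return int(digest, 16) % num_replicas

# the same user always lands on the same index while num_replicas stays constant
print(pick_target("user-42", 4))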

da-source (Author) commented:


In my case, I think the deviation in the number of user requests would usually be below 100. Could something like this be done with the current architecture?

How would autoscaling work in the scenario that you've suggested? Is there a way to auto-create/delete an API upon each new user's request, or will I need to launch an enormous number of APIs on the cluster at launch?

deliahu (Member) commented Jan 7, 2021

In my case, I think the deviation in the number of user requests would usually be below 100. Could something like this be done with the current architecture?

The current architecture supports it, but it would need to be implemented (this functionality is not currently implemented or exposed to the user).

How would autoscaling work in the scenario that you've suggested? Is there a way to auto-create/delete an API upon each new user's request, or will I need to launch an enormous number of APIs on the cluster at launch?

Each API would have its own autoscaling, so it would be based on the traffic generated by each user. Each user would have a dedicated API with at least one replica.

Is there a way to auto-create/delete an API upon each new user's request, or will I need to launch an enormous number of APIs on the cluster at launch?

You can create/delete APIs programmatically. The user would have to make two different types of requests: one to create the API, and one to call it once it's live. For the API that creates Cortex APIs, for separation of concerns, it might be best to run it outside of the Cortex cluster on a separate backend (e.g. App Engine, Elastic Beanstalk, Lambda, Heroku, etc.). But I don't see why it couldn't run in the Cortex cluster if that's your preference. You would use Cortex's Python client to create/delete the Cortex APIs.
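A rough sketch of what that orchestration could look like with the Python client; the create_api call mirrors the examples later in this thread, while the environment name, the API naming scheme, and the commented-out delete_api call are assumptions:

import cortex

# assumes an environment named "test" already points at the cluster
client = cortex.client("test")

class PythonPredictor:
    def __init__(self, config):
        pass  # download the user's model/dataset here

    def predict(self):
        return "hello"

user_id = "user-42"  # hypothetical identifier sent by your web interface

# one API per user, created on demand when the user requests an instance
client.create_api(
    api_spec={
        "name": "api-" + user_id,
        "kind": "RealtimeAPI",
        "predictor": {"type": "python"},
        "compute": {"cpu": "1", "mem": "6G"},
    },
    predictor=PythonPredictor,
)

# later, when the user's time window expires (method name is an assumption):
# client.delete_api("api-" + user_id)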

da-source (Author) commented Jan 7, 2021

@deliahu Deploying an API that creates/deletes Cortex APIs seems to be the best solution in my case!

I'm getting TypeError: string indices must be integers when trying to create Python cluster_client, using:

#Cortex client
import cortex
cortex.cluster_client('name','aws','myoperator','key','skey')

What could be causing this?

deliahu (Member) commented Jan 8, 2021

@da-source Do you also see a stack trace? If so, do you mind sending that? Also, what version of Cortex are you using?

Also, just to make sure, when you are creating the client, are you replacing "myoperator" with the actual operator endpoint (e.g. https://a3d8ff1b76d304a399ef0c3cd97adcd4-170b826c42af4167.elb.us-west-2.amazonaws.com), and "key"/"skey" with your actual access key and secret key?

da-source (Author) commented Jan 8, 2021

TypeError                                 Traceback (most recent call last)
<ipython-input-95-88a0fb64dd75> in <module>()
      2 import cortex
      3 cortex.cluster_client('a','aws','operator',
----> 4                       'key','skey')

1 frames
/usr/local/lib/python3.6/dist-packages/cortex/__init__.py in cluster_client(name, provider, operator_endpoint, aws_access_key_id, aws_secret_access_key)
     86     run_cli(cli_args, hide_output=True)
     87 
---> 88     return Client(name)
     89 
     90 

/usr/local/lib/python3.6/dist-packages/cortex/client.py in __init__(self, env)
     43         """
     44         self.env = env
---> 45         self.env_name = env["name"]
     46 
     47     # CORTEX_VERSION_MINOR

TypeError: string indices must be integers

Yes, of course, I replace the arguments with the actual keys and operator

da-source (Author) commented:

@deliahu I’m using Cortex 0.25

miguelvr (Collaborator) commented Jan 8, 2021

@da-source it seems to me that you are not using the correct cortex client.

please try using this:

import cortex as cx

client = cx.client(your_env_name)

da-source (Author) commented:

@miguelvr I don't think that is the issue, I'm getting:

client = cx.client(test)
NotFound: can't find environment test, create one by calling `cortex.cluster_client()`

miguelvr (Collaborator) commented Jan 8, 2021

and what is the name of the environment in which your cluster is?

please run cortex env list in your terminal

vishalbollu (Contributor) commented:

@da-source Thanks for bringing this to our attention. There is indeed a bug in the code that prevents the creation and updating of environments in python. This PR will fix the bug #1772.

As a temporary workaround, please use the cortex env configure CLI command in the terminal to create and update environments. Once the environment has been created/updated using the CLI, creating a client from that environment e.g. client = cx.client("test") should work as expected.

da-source (Author) commented Jan 8, 2021


Alright! But it still hasn't resolved the problem that @deliahu and I started addressing:

cortex.cluster_client('a','aws','op',
                      'key','skey')
TypeError: string indices must be integer

vishalbollu (Contributor) commented Jan 8, 2021

There is a bug in cortex.cluster_client() so it won't work. The bug will be fixed in the next release.

As a workaround, you can mimic the behaviour of cortex.cluster_client() by using the Cortex CLI in a terminal to create an environment for your cluster using the command cortex env configure. Once the environment is created you can directly access the client using client = cortex.client("test").

If running Cortex CLI commands isn't feasible in your use case, you can run the python code below which will mimic the behaviour of cortex.cluster_client().

import cortex
from cortex.binary import run_cli

env_name = "test"
provider = "aws"
operator_endpoint = "your operator endpoint"
aws_access_key_id = "your aws access key id"
aws_secret_access_key = "your aws secret key"

cli_args = ["env", "configure", env_name, "--provider", provider, "--operator-endpoint", operator_endpoint, "--aws-access-key-id", aws_access_key_id, "--aws-secret-access-key", aws_secret_access_key]

run_cli(cli_args, hide_output=True)

client = cortex.client(env_name)

Let me know if this works for you.

da-source (Author) commented Jan 9, 2021

@vishalbollu I was able to create a client, but I'm getting an error when trying to create_api():

class PythonPredictor:
    def __init__(self):
        s3 = boto3.client('s3', aws_access_key_id='key',
        aws_secret_access_key='skey')
        s3.download_file('bucket', 'key', "filename")

    def predict(self):
        return 'hello'

client.create_api(api_spec= {'name': 'testapi',
  'kind':'RealtimeAPI',
    'type': 'python',
    'path': 'predictor.py',
    'cpu': '1',
    'mem': '1G',
    'api_gateway': 'none'}, predictor=PythonPredictor)

---------------------------------------------------------------------------
CortexBinaryException                     Traceback (most recent call last)
<ipython-input-14-fd690c2c9ee5> in <module>()
      6     'cpu': '1',
      7     'mem': '1G',
----> 8     'api_gateway': 'none'}, predictor=PythonPredictor)

2 frames
/usr/local/lib/python3.6/dist-packages/cortex/binary/__init__.py in run_cli(args, hide_output, mixed_output)
    108         raise CortexBinaryException(result + "\n" + process.stderr.read())
    109 
--> 110     raise CortexBinaryException(process.stderr.read())
    111 
    112 

CortexBinaryException: error: Post "https://aaaca5bfce6a64cd4aefb45335fb109f-8c4bfefdef26a3e8.elb.us-east-1.amazonaws.com/deploy: dial tcp: lookup aaaca5bfce6a64cd4aefb45335fb109f-8c4bfefdef26a3e8.elb.us-east-1.amazonaws.com on 127.0.0.11:53: no such host

unable to connect to your cluster in the test environment (operator endpoint: https://aaaca5bfce6a64cd4aefb45335fb109f-8c4bfefdef26a3e8.elb.us-east-1.amazonaws.com)

if you don't have a cluster running:
    → if you'd like to create a cluster, run `cortex cluster up --configure-env test`
    → otherwise you can ignore this message, and prevent it in the future with `cortex env delete test`

if you have a cluster running:
    → run `cortex cluster info --configure-env test` to update your environment (include `--config <cluster.yaml>` if you have a cluster configuration file)
    → if you set `operator_load_balancer_scheme: internal` in your cluster configuration file, your CLI must run from within a VPC that has access to your cluster's VPC (see https://docs.cortex.dev/v/0.26/)

I tried running cortex cluster info --configure-env test in the console and got:

your cluster has 0 API replicas running across 1 instance

instance type   lifecycle   replicas   CPU (requested / total allocatable)   memory (requested / total allocatable)
t3.medium       on-demand   0          0m / 1290m                            0 / 2851464Ki

an environment named "test" has been configured to point to this cluster (and was set as the default environment)

But after re-running the create_api() code, I got the same error. (I'm testing the code in a Colab environment, but my cluster is running on AWS, so I'm not sure if that makes a difference.)

deliahu (Member) commented Jan 10, 2021

@da-source This error is happening because for some reason the Python client is not able to connect to the cluster.

When you run cortex cluster info --configure-env test, does it print out the operator endpoint, and does it match what you are using in Python (https://aaaca5bfce6a64cd4aefb45335fb109f-8c4bfefdef26a3e8.elb.us-east-1.amazonaws.com)? Also, if you run cortex get --env test on the CLI does it work, or give you the same error?

da-source (Author) commented Jan 11, 2021

cortex cluster info --configure-env test

The operator endpoint I get by calling cortex cluster info --configure-env test matches the one I'm using in Python, and running cortex get --env test in the CLI returns no apis are deployed.

Also, I retried running this snippet, which mimics the behaviour of cortex.cluster_client():

import cortex
from cortex.binary import run_cli

env_name = "test"
provider = "aws"
operator_endpoint = "@"
aws_access_key_id = "@"
aws_secret_access_key = "@"

cli_args = ["env", "configure", 'env_name',  "--provider", provider, "--operator-endpoint", operator_endpoint, "--aws-access-key-id", aws_access_key_id, "--aws-secret-access-key", aws_secret_access_key]

run_cli(cli_args, hide_output=True)

client = cortex.client(env_name)

And got this error:

NotFound: can't find environment test, create one by calling cortex.cluster_client()

Which is weird because cortex.cluster_client() isn't working

vishalbollu (Contributor) commented:

Is it possible that there is a copy-paste error? Based on the script you've specified above, 'env_name' is in single quotes in the cli_args = [...] line. Therefore, it is possible that rather than creating an environment named test, an environment named env_name was created. Please try again after removing the single quotes around 'env_name' and let us know how it goes.

da-source (Author) commented:

@vishalbollu Yes, that was a simple copy-paste error :)

  1. When creating an API using the Python client, what should I do about the predictor path argument? When using the CLI, I usually set it to predictor.py, which is located in my project folder, but should I omit it with the Python client since I'm passing predictor=PythonPredictor?
client.create_api(api_spec= {'name': 'testapi3',
  'kind':'RealtimeAPI',
  'predictor':
    {'type': 'python',
    'path': 'predictor.py'},
    'compute':
    {'cpu': '1',
    'mem': '5G'}}, predictor=PythonPredictor)
  2. I would like to use AWS Lambda to run the Python client, which will create/return/delete Cortex APIs. What would be the best way to specify which kind of predictor the Cortex API to be deployed will use (assuming that all the APIs will have the same specs, but different predictors)? Should I just go with simple if statements:
#define predictor
class PythonPredictor:
    def __init__(self, model_name):
      s3 = boto3.client('s3', aws_access_key_id='@', aws_secret_access_key='@')
      if model_name == 'a':
        s3.download_file('a', 'a', "./a")
      if model_name == 'b':
        s3.download_file('b', 'b', "./b")

    def predict(self):
        #do something

#create api
client.create_api(api_spec= {'name': 'modelname',
  'kind':'RealtimeAPI',
  'predictor':
    {'type': 'python'},
    'compute':
    {'cpu': '1',
    'mem': '5G'}}, predictor=PythonPredictor('modelname'))

or would you recommend some other way?

RobertLucian (Member) commented Jan 13, 2021

@da-source so let's answer your questions.

  1. When creating an API using the Python client, what should I do about the predictor path argument? When using the CLI, I usually set it to predictor.py, which is located in my project folder, but should I omit it with the Python client since I'm passing predictor=PythonPredictor?

When using the Python client, if the predictor parameter is specified, then setting the predictor.path field is not required. If the predictor parameter is not passed, then specifying predictor.path is required.

  2. I would like to use AWS Lambda to run the Python client, which will create/return/delete Cortex APIs. What would be the best way to specify which kind of predictor the Cortex API to be deployed will use (assuming that all the APIs will have the same specs, but different predictors)? Should I just go with simple if statements:

The best way to configure the API at deploy time is to populate the predictor.config field (a dictionary). The config parameter is then passed into your predictor class constructor, and you use it there to configure the API's behavior.

One thing I notice is that the PythonPredictor implementation in your example has an incorrect constructor signature - at the very least, the config parameter must be present in the constructor's signature. Check out the docs here.

The second thing I notice is that in

client.create_api(api_spec= {'name': 'modelname',
  'kind':'RealtimeAPI',
  'predictor':
    {'type': 'python'},
    'compute':
    {'cpu': '1',
    'mem': '5G'}}, predictor=PythonPredictor('modelname'))

you're instantiating the PythonPredictor class, when in reality you shouldn't. The class gets instantiated when the API is deployed on the Cortex cluster. The predictor parameter only takes in a class, not an instance - just like when a cortex.yaml API is deployed from a directory: the class definition is there, but it's not instantiated at deploy time.

All in all, your example would look like this:

# define predictor
class PythonPredictor:
  def __init__(self, config):
    self.config = config
    if self.config["condition"] == "value-1":
        pass  # initialization of type 1
    elif self.config["condition"] == "value-2":
        pass  # initialization of type 2
    else:
        pass  # whatever other kind of initialization

  def predict(self):
    pass  # do something

# create api
client.create_api(api_spec={
  'name': 'modelname',
  'kind': 'RealtimeAPI',
  'predictor': {
      'type': 'python',
      'config': {'condition': 'value-1'},
  },
  'compute': {
    'cpu': '1',
    'mem': '5G'
  }}, predictor=PythonPredictor)

Let us know if this clears up things for you!

da-source (Author) commented Jan 13, 2021

@RobertLucian Thanks! And what should I do about importing libraries? I tried doing the import in __init__, but that didn't seem to work:

class PythonPredictor:
  def __init__(self, config):
    import boto3

  def predict(self):
    return boto3 # Outputs 'boto3' not defined

RobertLucian (Member) commented:

@da-source an import's scope is limited to its enclosing module/class/function (or method). In your case, the boto3 module is only available inside the constructor. One way to make this work is to assign the module to an attribute such as self.boto3:

class PythonPredictor:
  def __init__(self, config):
    self.boto3 = __import__("boto3")
    # or import boto3 and then assign boto3 to boto3 attribute

  def predict(self):
    return self.boto3

Generally, packages should not be made available this way (by assigning them to attributes): the imports should be done in the constructor, where you initialize whatever you need, and then in the predict method (or others) you use whatever was initialized in the constructor.
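For instance, a minimal sketch of that recommended pattern, importing boto3 in the constructor, initializing the client there, and only using it in predict (the predict body is a placeholder):

class PythonPredictor:
    def __init__(self, config):
        import boto3  # import where it is needed
        self.s3 = boto3.client("s3")  # initialize here, use it later

    def predict(self):
        # use what was initialized in the constructor
        return len(self.s3.list_buckets()["Buckets"])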

I wouldn't recommend doing the imports at the class level like this either:

class PythonPredictor:
  boto3 = __import__("boto3")

And that's because the imports will be done when the class is defined (and not when an object of this class is instantiated).


There may be some work we have to do to improve the UX with regards to importing modules though.

da-source (Author) commented:

@RobertLucian So for now, this:

class PythonPredictor:
  def __init__(self, config):
    self.boto3 = __import__("boto3")
    # or import boto3 and then assign boto3 to boto3 attribute

  def predict(self):
    return self.boto3

is the best option, am I correct?

RobertLucian (Member) commented Jan 14, 2021

@da-source as long as you're required to use boto3 in your predict method, yes, for now.

That being said, I would still point out that importing the modules wherever they are needed is the better alternative - in your case, I would strive to import them in the constructor and use them to initialize stuff just there.

deliahu (Member) commented Jan 15, 2021

@da-source @RobertLucian I just wanted to be clear, and to double-check for others reading this thread, that this is the recommended approach:

class PythonPredictor:
  def __init__(self, config):
    import boto3
    # boto3 can be used here


  def predict(self):
    import boto3
    # boto3 can be used here

deliahu (Member) commented Jan 20, 2021

@da-source I'll go ahead and close this issue, let us know if you have additional questions

deliahu closed this as completed on Jan 20, 2021