End-to-end tests for Pre-packaged model servers hang if name doesn't match exactly #820

axsaucedo · 2019-08-26T17:05:02Z

When running the e2e tests, the rollout checks are done against the exact, automatically generated deployment name, i.e.:

seldon-core/testing/scripts/test_prepackaged_servers.py

Line 35 in 60c9fd2

wait_for_rollout("iris-default-8bb3ef6")

If the model is created under a different name, the rollout check never matches it and the tests hang. A fix could be to monitor the deployment through its labels as opposed to the generated name.

ukclivecox · 2019-08-26T18:53:26Z

I agree that a better, less brittle way to wait for the rollout is needed.

RafalSkolasinski · 2020-01-06T12:37:33Z

Do we have cases where the deployment failed? How could I reproduce the failure?

Do I understand correctly that the scope of the fix would be to modify the wait_for_rollout function

seldon-core/testing/scripts/test_prepackaged_servers.py

Line 21 in 60c9fd2

def wait_for_rollout(deploymentName):

to monitor rollout status using labels, and to add appropriate labels to the objects' yaml definitions, e.g. here?

RafalSkolasinski · 2020-01-06T15:52:42Z

It seems that kubectl rollout status expects a deployment name, not labels.
I see two options:

1. Add labels to the deployments and, knowing the label and part of the name (in the example above: iris-default), find the full name of the deployment using kubectl get ...
2. Write a Python function that processes the yaml object definition and generates the full name.

I tried option 2. The following function
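Option 1 could be sketched roughly as follows. This is only an illustration: the `app=iris` selector is a hypothetical label (the thread does not name one), the `seldon` namespace is assumed, and the `kubectl` call is passed in as `run_cmd` so the name filtering can be tried without a cluster.

```python
import subprocess


def kubectl_lines(cmd):
    """Run a kubectl command and return its non-empty output lines."""
    ret = subprocess.run(cmd, shell=True, stdout=subprocess.PIPE, check=True)
    return [line for line in ret.stdout.decode().splitlines() if line]


def find_deployments(prefix, selector, namespace="seldon", run_cmd=kubectl_lines):
    """Return full deployment names matching a label selector and a name prefix.

    `selector` (e.g. "app=iris") is a hypothetical label; use whatever label
    the yaml definitions actually set.
    """
    # `-o name` prints one resource per line, e.g. "deployment.apps/iris-default-4903e3c".
    cmd = f"kubectl get deployments -n {namespace} -l {selector} -o name"
    names = [line.split("/", 1)[-1] for line in run_cmd(cmd)]
    # Keep only names of the form "<prefix>-<hash>".
    return [name for name in names if name.startswith(prefix + "-")]
```

The returned names could then be handed to `kubectl rollout status` one at a time.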

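import hashlib

import yaml


def deployment_name(fname):
    """Predict the generated deployment name from a SeldonDeployment yaml file."""
    with open(fname, 'r') as f:
        data = yaml.safe_load(f)

    sdep_name = data['metadata']['name']
    # Only the first predictor and its first componentSpec are considered.
    predictor_spec = data['spec']['predictors'][0]
    pod_spec = predictor_spec['componentSpecs'][0]['spec']

    # The hash input is "name:image" for each container, joined with ":" and ended with ";".
    parts = []
    for container in pod_spec['containers']:
        parts.append(container['name'])
        parts.append(container['image'])

    hash_input = ":".join(parts) + ";"
    pod_hash = hashlib.md5(hash_input.encode()).hexdigest()[:7]

    # Full name: <SeldonDeployment name>-<graph name>-<first 7 hex chars of the md5>.
    return "-".join([sdep_name, predictor_spec['graph']['name'], pod_hash])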

seems to work properly on yamls that define containers, e.g. this one, but does not work on ones that do not define containers, e.g. iris.yml.

RafalSkolasinski · 2020-01-06T16:00:11Z

In the case of iris.yml, the name comes from the pod having a container named classifier that uses the seldonio/sklearnserver_rest:0.2 image:

>>> import hashlib
>>> name = "classifier"
>>> image = "seldonio/sklearnserver_rest:0.2"
>>> s = f"{name}:{image};"
>>> hashlib.md5(s.encode()).hexdigest()[:7]
'4903e3c'

As the yaml file does not contain information about which image will be used, it may indeed be better to add labels and filter by them manually, i.e. option 1 in the previous comment.

RafalSkolasinski · 2020-01-06T16:27:40Z

I think I may have found another option. I believe that SeldonDeployment names must be unique within a namespace. If that is true, I can get the names of the deployments with, for example,

>>> import yaml
>>> from subprocess import run
>>> ret = run('kubectl get -n seldon seldondeployment sklearn -o yaml', shell=True, capture_output=True)
>>> data = yaml.safe_load(ret.stdout.decode())
>>> list(data['status']['deploymentStatus'])
['iris-default-4903e3c']

@axsaucedo @adriangonz What do you think? It seems like the simplest and shortest solution.
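Wrapped into a helper, the lookup above might look like the sketch below. It swaps `-o yaml` for `-o json` so only the standard library is needed; the function names and the default `seldon` namespace are assumptions made for illustration, not existing test utilities.

```python
import json
import subprocess


def parse_deployment_names(sdep_json):
    """Extract deployment names from a SeldonDeployment rendered as JSON."""
    data = json.loads(sdep_json)
    # status.deploymentStatus is keyed by the generated deployment names.
    # Treat a missing or empty status as "no deployments yet".
    status = data.get("status") or {}
    return list(status.get("deploymentStatus") or {})


def get_deployment_names(sdep_name, namespace="seldon"):
    """Ask kubectl for the SeldonDeployment and return its deployment names."""
    ret = subprocess.run(
        ["kubectl", "get", "-n", namespace, "seldondeployment", sdep_name, "-o", "json"],
        stdout=subprocess.PIPE,
        check=True,
    )
    return parse_deployment_names(ret.stdout.decode())
```

Keeping the parsing separate from the `kubectl` call means the former can be checked against a saved JSON document without a running cluster.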

RafalSkolasinski · 2020-01-06T17:42:48Z

I pushed a proof-of-concept fix. See #1315.

The new approach is based on getting deployment names directly from SeldonDeployment objects. This avoids hard-coded hashes in the test scripts.
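A wait helper built on this idea might look like the sketch below. It is not the code from #1315: `get_names` is a hypothetical callable that returns the deployment names for a SeldonDeployment, and the retry count, delay and `seldon` namespace are assumed values.

```python
import subprocess
import time


def wait_for_rollout(sdep_name, get_names, namespace="seldon",
                     attempts=20, delay=3, run_cmd=subprocess.run):
    """Wait for every deployment belonging to a SeldonDeployment to roll out.

    `get_names` is a hypothetical callable returning the deployment names for
    `sdep_name`, for instance read from the SeldonDeployment status.
    """
    names = []
    for _ in range(attempts):
        names = get_names(sdep_name)
        if names:
            break
        time.sleep(delay)  # the names may not be available yet
    if not names:
        # Fail loudly instead of hanging when nothing is found.
        raise RuntimeError(f"no deployments found for {sdep_name}")

    for name in names:
        # `kubectl rollout status` blocks until the rollout finishes or fails.
        run_cmd(
            ["kubectl", "rollout", "status", f"deployment/{name}", "-n", namespace],
            check=True,
        )
    return names
```

Bounding the number of lookups means a wrong SeldonDeployment name produces an error rather than a test that never finishes.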


ukclivecox added this to To do in 1.0 via automation Aug 26, 2019

ukclivecox added this to the 1.0.x milestone Aug 26, 2019

ukclivecox closed this as completed Nov 7, 2019

1.0 automation moved this from To do to Done Nov 7, 2019

ukclivecox reopened this Nov 7, 2019

1.0 automation moved this from Done to In progress Nov 7, 2019

ukclivecox added this to To do in 1.1 via automation Nov 7, 2019

ukclivecox removed this from In progress in 1.0 Nov 7, 2019

ukclivecox modified the milestones: 1.0, 1.1 Nov 7, 2019

RafalSkolasinski mentioned this issue Jan 6, 2020

fix issue 820 #1315

Merged

RafalSkolasinski moved this from To do to In progress in 1.1 Jan 8, 2020

seldondev closed this as completed in fd1beaf Jan 9, 2020

1.1 automation moved this from In progress to Done Jan 9, 2020
