
Add Forest Inference Operator #118

Merged: 15 commits, Jun 14, 2022

Conversation

@oliverholworthy (Member) commented Jun 13, 2022

Add PredictForest Inference Operator. This wraps the FIL operator added in #110, and uses the Python backend and Business Logic Scripting to make requests to the FIL model.

Exporting the Forest operator results in two Triton models:

  • One FIL backend model
  • One Python backend model that wraps the FIL backend model. When a request is made to the wrapper, the Python runner uses Business Logic Scripting to issue a request to the FIL backend model, which runs inference on the forest model.
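To illustrate the two-model layout described above, here is a minimal sketch of the delegation pattern in plain Python. This is not the actual Triton API: `fil_backend_model` and `python_backend_model` are hypothetical stand-ins for the two exported Triton models, and the direct function call stands in for the `pb_utils.InferenceRequest` that Business Logic Scripting would build and execute.

```python
# Illustrative sketch (not the actual Triton API): the exported ensemble
# contains two models, and the Python backend wrapper delegates each request
# to the FIL backend model, mimicking Business Logic Scripting (BLS).

def fil_backend_model(features):
    """Stand-in for the Triton FIL backend: runs forest inference.

    A real FIL model returns per-row predictions; here we fake a score
    by averaging each row's feature values.
    """
    return [sum(row) / len(row) for row in features]

def python_backend_model(request):
    """Stand-in for the Python backend wrapper generated by PredictForest.

    In Triton, this step would build a pb_utils.InferenceRequest targeting
    the FIL model and call .exec(); here we call the stand-in directly.
    """
    features = request["features"]
    predictions = fil_backend_model(features)
    return {"output__0": predictions}

response = python_backend_model({"features": [[1.0, 2.0], [3.0, 5.0]]})
```

The indirection is what lets the Python wrapper apply any pre/post-processing around the forest model while keeping the FIL backend itself unchanged.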

Example

# export the model data to a model repository directory
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.fil import PredictForest

model = ...  # XGBoost, LightGBM, or cuml/sklearn random forest
input_schema = ...  # merlin Schema representing input features to the model
feature_names = ...  # selector of the input feature columns
export_path = ...  # directory to export the Triton model repository to
ops = feature_names >> PredictForest(model, input_schema)
ensemble = Ensemble(ops, input_schema)
ensemble.export(export_path)


# make a request to the ensemble
from merlin.systems.triton.utils import send_triton_request

request = ...  # DataFrame of input features to predict on
outputs_list = ensemble.graph.output_schema.column_names
response = send_triton_request(
    request, outputs_list, endpoint="triton:8001", triton_model="ensemble_model"
)

Note on GPU usage

I encountered an issue when using the GPU with the FIL op plus the PredictForest wrapper op, receiving the following error:

failed to open cuda ipc handle

A work-around I found was to set the instance group to CPU:

PredictForest(model, input_schema, instance_group="CPU")
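For reference, the CPU work-around corresponds to the `instance_group` setting in the generated Triton `config.pbtxt`. A minimal sketch of the relevant fragment follows (field names per Triton's model configuration schema; the count value is illustrative):

```protobuf
# Fragment of the model's config.pbtxt when the instance group is set to CPU.
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
```

Pinning the instance to CPU avoids the CUDA IPC handle exchange between the Python wrapper and the FIL model that triggered the error above.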

@nvidia-merlin-bot

CI Results
GitHub pull request #118 of commit e0d9a9d078fa5d8d3c6dd1ba115ecdc5ed6c096a, no merge conflicts.
Running as SYSTEM
Setting status of e0d9a9d078fa5d8d3c6dd1ba115ecdc5ed6c096a to PENDING with url https://10.20.13.93:8080/job/merlin_systems/77/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse e0d9a9d078fa5d8d3c6dd1ba115ecdc5ed6c096a^{commit} # timeout=10
Checking out Revision e0d9a9d078fa5d8d3c6dd1ba115ecdc5ed6c096a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f e0d9a9d078fa5d8d3c6dd1ba115ecdc5ed6c096a # timeout=10
Commit message: "Add Forest class for running inference with forest models"
 > git rev-list --no-walk 1e1226f81fe02bbdf1ede45ff8401ffaedbb01e0 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins15805946083825117376.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py .... [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=============================== warnings summary ===============================
../../../../../usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18
/usr/local/lib/python3.8/dist-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============ 18 passed, 2 skipped, 18 warnings in 76.86s (0:01:16) =============
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins3958063561638480839.sh

@github-actions
Documentation preview

https://nvidia-merlin.github.io/systems/review/pr-118

@oliverholworthy oliverholworthy added the enhancement New feature or request label Jun 13, 2022
@oliverholworthy oliverholworthy marked this pull request as ready for review June 13, 2022 14:34
@nvidia-merlin-bot

CI Results
GitHub pull request #118 of commit f5234219b6ec6bb083d8420775a96e35a764ed3e, no merge conflicts.
Running as SYSTEM
Setting status of f5234219b6ec6bb083d8420775a96e35a764ed3e to PENDING with url https://10.20.13.93:8080/job/merlin_systems/78/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse f5234219b6ec6bb083d8420775a96e35a764ed3e^{commit} # timeout=10
Checking out Revision f5234219b6ec6bb083d8420775a96e35a764ed3e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f5234219b6ec6bb083d8420775a96e35a764ed3e # timeout=10
Commit message: "Add test for Ensemble with NVTabular Workflow and Forest Operator"
 > git rev-list --no-walk e0d9a9d078fa5d8d3c6dd1ba115ecdc5ed6c096a # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins17896677633240590267.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7f2b5c1fcb50>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0613 21:35:24.957590 30865 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0613 21:35:24.957708 30865 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0613 21:35:24.957716 30865 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0613 21:35:24.957722 30865 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0613 21:35:25.156383 30865 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f51de000000' with size 268435456
I0613 21:35:25.157113 30865 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0613 21:35:25.162023 30865 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0613 21:35:25.262379 30865 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0613 21:35:25.264870 30865 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0613 21:35:25.264899 30865 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0613 21:35:25.264912 30865 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0613 21:35:25.265089 30865 backend.cc:76] Loaded libpython successfully
I0613 21:35:25.362644 30865 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0613 21:35:25.451733 30865 backend.cc:89] Python interpreter is initialized
I0613 21:35:25.452707 30865 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0613 21:35:25.453202 30865 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0613 21:35:25.462978 30865 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0613 21:35:27.330284 30865 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0613 21:35:27.330410 30865 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-13 21:35:28.378766: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-13 21:35:28.380621: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-13 21:35:28.380646: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-13 21:35:28.380752: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-13 21:35:28.384806: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 7965 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-13 21:35:28.425490: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-13 21:35:28.483287: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-13 21:35:28.496362: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 117613 microseconds.
I0613 21:35:28.496571 30865 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0613 21:35:28.500081 30865 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0613 21:35:30.458274 31014 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

I0613 21:35:30.458505 30865 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0613 21:35:30.468488 30865 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0613 21:35:30.469794 30865 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

E0613 21:35:30.471102 30865 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0613 21:35:30.471257 30865 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0613 21:35:30.471843 30865 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0613 21:35:30.471918 30865 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/__init__.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__ |
| | | /tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+

I0613 21:35:30.513300 30865 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0613 21:35:30.514973 30865 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-21/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0613 21:35:30.515001 30865 server.cc:252] Waiting for in-flight requests to complete.
I0613 21:35:30.515009 30865 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0613 21:35:30.515059 30865 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0613 21:35:30.515115 30865 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0613 21:35:30.515294 30865 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance stateI0613 21:35:30.515315 30865 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0613 21:35:30.515305 30865 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests

------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f2b4434bd90> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f2b4434bd90> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 2 skipped, 18 warnings in 71.58s (0:01:11) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins3064280779929266647.sh

@nvidia-merlin-bot

CI Results
GitHub pull request #118 of commit 36a7cdf5b8766f0007d62fedc6fee03097190227, no merge conflicts.
Running as SYSTEM
Setting status of 36a7cdf5b8766f0007d62fedc6fee03097190227 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/79/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse 36a7cdf5b8766f0007d62fedc6fee03097190227^{commit} # timeout=10
Checking out Revision 36a7cdf5b8766f0007d62fedc6fee03097190227 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 36a7cdf5b8766f0007d62fedc6fee03097190227 # timeout=10
Commit message: "Add test for Ensemble with NVTabular Workflow and Forest Operator"
 > git rev-list --no-walk f5234219b6ec6bb083d8420775a96e35a764ed3e # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins2740923852274120150.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7f79ec3062b0>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0613 21:45:35.405584 5427 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0613 21:45:35.405697 5427 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0613 21:45:35.405705 5427 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0613 21:45:35.405710 5427 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0613 21:45:35.604318 5427 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f6224000000' with size 268435456
I0613 21:45:35.605082 5427 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0613 21:45:35.610233 5427 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0613 21:45:35.710452 5427 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0613 21:45:35.713646 5427 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0613 21:45:35.713684 5427 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0613 21:45:35.713700 5427 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0613 21:45:35.713927 5427 backend.cc:76] Loaded libpython successfully
I0613 21:45:35.810696 5427 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0613 21:45:35.902555 5427 backend.cc:89] Python interpreter is initialized
I0613 21:45:35.903461 5427 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0613 21:45:35.903971 5427 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0613 21:45:35.911944 5427 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0613 21:45:37.802591 5427 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0613 21:45:37.802701 5427 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-13 21:45:38.854896: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-13 21:45:38.856397: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-13 21:45:38.856421: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-13 21:45:38.856525: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-13 21:45:38.860569: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10318 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-13 21:45:38.896760: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-13 21:45:38.954117: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-13 21:45:38.974395: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 119514 microseconds.
I0613 21:45:38.974564 5427 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0613 21:45:38.979517 5427 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
0613 21:45:40.946091 5600 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

I0613 21:45:40.946301 5427 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0613 21:45:40.954976 5427 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0613 21:45:40.956217 5427 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

E0613 21:45:40.957520 5427 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0613 21:45:40.957645 5427 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0613 21:45:40.958676 5427 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0613 21:45:40.958861 5427 server.cc:594]
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | (973): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | (219): _call_with_frames_removed |
| | | (961): _find_and_load_unlocked |
| | | (991): _find_and_load |
| | | (1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/init.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): init |
| | | /tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+-----------------------------------------------------------------------------------------------------------+

I0613 21:45:41.005340 5427 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0613 21:45:41.006940 5427 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-22/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0613 21:45:41.006967 5427 server.cc:252] Waiting for in-flight requests to complete.
I0613 21:45:41.006976 5427 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0613 21:45:41.007022 5427 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0613 21:45:41.007073 5427 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0613 21:45:41.007202 5427 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0613 21:45:41.007241 5427 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0613 21:45:41.007257 5427 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0613 21:45:41.007286 5427 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f79ec306c10> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f79ec306c10> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 2 skipped, 18 warnings in 72.29s (0:01:12) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins5813164175143600594.sh

Contributor

@karlhigley karlhigley left a comment

This looks really good and solves a long-standing problem we've had re: back-ends with weird input formats vs accurate schemas that pass validation ❤️

Looks like there are some test failures but I think they're minor "typo in the assertion" kinds of things, so I'm optimistically approving it and let's get it in As Soon As Reasonable Without Overly Exerting Ourselves (ASARWOEO.) 😺

pb_utils = None


class Forest(PipelineableInferenceOperator):
Contributor

This is less an issue with this PR and more an issue for the library generally, which I don't think should block this PR but is probably a worthwhile point of discussion on which I'm open to suggestions:

We should probably come up with a consistent naming scheme for the operators that make actual model predictions in separate back-ends (like FIL and Tensorflow.) Originally I was thinking to make the convention FrameworkNamePredict, which I thought read pretty nicely in the operator graph definition. On second thought, I realized it might be nice for the prediction operators to be conceptually and alphabetically grouped together, so we changed the name to PredictTensorflow, which would imply this operator might be called PredictForest or PredictFIL. I'm not entirely happy with either convention though, so open to suggestions for how we might do this in a consistent way.

Contributor

(Also, as a point of order for interpreting my PR comments: I mean this as literally as you can possibly take it—and I usually do. This is not a polite way of saying something else; I am not making a veiled suggestion for changing the code herein. I honestly don't know what to do here, and could be persuaded to go in any direction someone makes a halfway compelling case for. I just think we should probably, at some point, not necessarily now or in this PR, figure out how to name these operators consistently but I can't myself make a halfway compelling case for any particular convention, so...I don't really have a suggestion here, just an issue to raise for your consideration and future pondering. Happy to address this someday, when we think of something smarter than I have so far.)

Member Author

For consistency with the current operators, I've updated to use the name PredictForest. There seems to be a similar pattern of <Type of Operator><Type of Data or Model> (QueryFaiss, QueryFeast, PredictTensorflow, FilterCandidates, UnrollFeatures, TransformWorkflow)

tests/unit/systems/fil/test_forest.py (review thread outdated, resolved)

config_path = tmpdir / "1_fil" / "config.pbtxt"
parsed_config = read_config(config_path)
assert parsed_config.name == "0_fil"
Contributor

This should maybe also be 1_fil, but I'm honestly not 100% sure how (or if) we should account for this op in the ensemble operator id numbering. I think it could be problematic to have two operators with the same id integer, but I don't think this code actually affects the mapping of ids to operators, so maybe aligning the prefix integer with the id of the operator it belongs to makes sense here? Assuming this doesn't break anything, I have no objections to this convention, I'm just thinking out loud here.

Member Author

Yes, I think having the FIL model prefix id match the PredictForest id as it currently does (aside from this test typo) makes sense, because they're part of the same operator conceptually from the perspective of the user of the library. And it should avoid collisions if there is more than one in the ensemble.
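For reference, the convention under discussion shows up in the exported model repository on disk. The layout below is a hypothetical sketch (directory and file names are illustrative, not taken from this PR) assuming one TransformWorkflow and one PredictForest in the ensemble; the point is that the fil backend model shares the integer prefix of the python backend wrapper that owns it, while every other operator gets its own id:

```
model_repository/
├── 0_transformworkflow/       # TransformWorkflow op (id 0)
│   └── config.pbtxt           # name: "0_transformworkflow"
├── 1_predictforest/           # python backend wrapper (id 1)
│   ├── 1/model.py
│   └── config.pbtxt           # name: "1_predictforest"
├── 1_fil/                     # fil backend model; shares id 1 with its wrapper
│   ├── 1/                     # serialized forest model lives here
│   └── config.pbtxt           # name: "1_fil"
└── ensemble_model/
    └── config.pbtxt           # platform: "ensemble"
```

Under this scheme a second PredictForest in the same ensemble would export as, e.g., `4_predictforest` plus `4_fil`, so the paired directories stay unambiguous.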

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #118 of commit 46e82485690f300ba3371f926be021b04f678ff0, no merge conflicts.
Running as SYSTEM
Setting status of 46e82485690f300ba3371f926be021b04f678ff0 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/80/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse 46e82485690f300ba3371f926be021b04f678ff0^{commit} # timeout=10
Checking out Revision 46e82485690f300ba3371f926be021b04f678ff0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 46e82485690f300ba3371f926be021b04f678ff0 # timeout=10
Commit message: "Correct config name in forest test"
 > git rev-list --no-walk 36a7cdf5b8766f0007d62fedc6fee03097190227 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins8461255176555227623.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7f3994192e50>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0614 11:04:26.568579 1768 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0614 11:04:26.568695 1768 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0614 11:04:26.568702 1768 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0614 11:04:26.568708 1768 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0614 11:04:26.759275 1768 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f7666000000' with size 268435456
I0614 11:04:26.760000 1768 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0614 11:04:26.765024 1768 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0614 11:04:26.865340 1768 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0614 11:04:26.868557 1768 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0614 11:04:26.868595 1768 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0614 11:04:26.868612 1768 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0614 11:04:26.868837 1768 backend.cc:76] Loaded libpython successfully
I0614 11:04:26.965625 1768 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0614 11:04:27.041988 1768 backend.cc:89] Python interpreter is initialized
I0614 11:04:27.042937 1768 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0614 11:04:27.043456 1768 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 11:04:27.065945 1768 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0614 11:04:28.959747 1768 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0614 11:04:28.960478 1768 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-14 11:04:29.998800: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:04:30.000712: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-14 11:04:30.000736: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:04:30.000841: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-14 11:04:30.006699: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-14 11:04:30.039734: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-14 11:04:30.100774: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:04:30.113225: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 114443 microseconds.
I0614 11:04:30.113417 1768 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0614 11:04:30.116845 1768 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
0614 11:04:32.084367 1861 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

I0614 11:04:32.084559 1768 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 11:04:32.091672 1768 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0614 11:04:32.092697 1768 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
(973): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
(219): _call_with_frames_removed
(961): _find_and_load_unlocked
(991): _find_and_load
(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

E0614 11:04:32.093725 1768 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0614 11:04:32.093829 1768 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0614 11:04:32.094668 1768 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0614 11:04:32.094805 1768 server.cc:594]
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | <frozen importlib._bootstrap>(973): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/__init__.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__ |
| | | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+

I0614 11:04:32.138948 1768 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0614 11:04:32.140669 1768 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-0/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0614 11:04:32.140704 1768 server.cc:252] Waiting for in-flight requests to complete.
I0614 11:04:32.140714 1768 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0614 11:04:32.140765 1768 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0614 11:04:32.140822 1768 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0614 11:04:32.140967 1768 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0614 11:04:32.141022 1768 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0614 11:04:32.141038 1768 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0614 11:04:32.141033 1768 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f395854a730> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f395854a730> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 2 skipped, 18 warnings in 73.46s (0:01:13) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins985048274841165495.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #118 of commit 604b232a994670f96e6948a684a3d22612f17909, no merge conflicts.
Running as SYSTEM
Setting status of 604b232a994670f96e6948a684a3d22612f17909 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/81/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse 604b232a994670f96e6948a684a3d22612f17909^{commit} # timeout=10
Checking out Revision 604b232a994670f96e6948a684a3d22612f17909 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 604b232a994670f96e6948a684a3d22612f17909 # timeout=10
Commit message: "Rename Forest PredictForest and update docstring."
 > git rev-list --no-walk 46e82485690f300ba3371f926be021b04f678ff0 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins12417168055295635629.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7f17a84b7b20>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0614 11:29:55.058785 2805 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0614 11:29:55.058900 2805 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0614 11:29:55.058908 2805 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0614 11:29:55.058913 2805 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0614 11:29:55.247653 2805 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f4b7e000000' with size 268435456
I0614 11:29:55.248393 2805 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0614 11:29:55.253378 2805 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0614 11:29:55.353715 2805 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0614 11:29:55.356916 2805 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0614 11:29:55.356954 2805 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0614 11:29:55.356971 2805 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0614 11:29:55.357197 2805 backend.cc:76] Loaded libpython successfully
I0614 11:29:55.453997 2805 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0614 11:29:55.533406 2805 backend.cc:89] Python interpreter is initialized
I0614 11:29:55.534356 2805 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0614 11:29:55.534855 2805 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 11:29:55.554282 2805 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0614 11:29:57.477015 2805 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0614 11:29:57.477108 2805 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-14 11:29:58.529588: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:29:58.531515: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-14 11:29:58.531539: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:29:58.531641: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-14 11:29:58.535844: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-14 11:29:58.565356: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-14 11:29:58.620707: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:29:58.632677: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 103108 microseconds.
I0614 11:29:58.632848 2805 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0614 11:29:58.636760 2805 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0614 11:30:00.614068 2898 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

I0614 11:30:00.614298 2805 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 11:30:00.623928 2805 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0614 11:30:00.625268 2805 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

E0614 11:30:00.626507 2805 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0614 11:30:00.626617 2805 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0614 11:30:00.627609 2805 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0614 11:30:00.627685 2805 server.cc:594]
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | <frozen importlib._bootstrap>(973): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/__init__.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__ |
| | | /tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+

I0614 11:30:00.669944 2805 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0614 11:30:00.671569 2805 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-1/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0614 11:30:00.671604 2805 server.cc:252] Waiting for in-flight requests to complete.
I0614 11:30:00.671613 2805 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0614 11:30:00.671664 2805 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0614 11:30:00.671717 2805 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0614 11:30:00.671857 2805 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0614 11:30:00.671912 2805 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0614 11:30:00.671924 2805 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f17a80e04c0> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f17a80e04c0> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 2 skipped, 18 warnings in 71.25s (0:01:11) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins6715904996099375955.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #118 of commit 5738256b1766f6aea3d2e9a2d48cf040512d73b7, no merge conflicts.
Running as SYSTEM
Setting status of 5738256b1766f6aea3d2e9a2d48cf040512d73b7 to PENDING with url https://10.20.13.93:8080/job/merlin_systems/82/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse 5738256b1766f6aea3d2e9a2d48cf040512d73b7^{commit} # timeout=10
Checking out Revision 5738256b1766f6aea3d2e9a2d48cf040512d73b7 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 5738256b1766f6aea3d2e9a2d48cf040512d73b7 # timeout=10
Commit message: "Correct name of directory in tests"
 > git rev-list --no-walk 604b232a994670f96e6948a684a3d22612f17909 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins9073330330584653846.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7f0249c94b50>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
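Note: the traceback above shows only the lines of the readiness loop that pytest executed. The polling pattern that `run_triton_server` implements can be sketched as follows (function and parameter names here are illustrative, not the actual `merlin.systems.triton.utils` API):

```python
import time

def wait_until_ready(is_ready, is_dead, attempts=60, delay=1.0):
    """Poll until is_ready() returns True.

    Raises RuntimeError if is_dead() reports the server process has
    already exited, or if the attempt budget is exhausted.
    """
    for _ in range(attempts):
        if is_dead():
            # The server crashed before it ever answered a readiness probe.
            raise RuntimeError("Tritonserver failed to start")
        if is_ready():
            return True
        time.sleep(delay)
    raise RuntimeError("Tritonserver did not become ready in time")
```

In the failing test, `is_dead` corresponds to `process.poll() is not None`, and the server exits with `ret=-11` (a SIGSEGV), so the loop raises on an early iteration instead of ever reaching a ready client.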
----------------------------- Captured stderr call -----------------------------
I0614 11:35:16.042740 3747 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0614 11:35:16.042859 3747 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0614 11:35:16.042867 3747 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0614 11:35:16.042873 3747 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0614 11:35:16.233575 3747 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f7cf4000000' with size 268435456
I0614 11:35:16.234329 3747 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0614 11:35:16.239320 3747 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0614 11:35:16.339666 3747 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0614 11:35:16.342808 3747 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0614 11:35:16.342848 3747 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0614 11:35:16.342866 3747 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0614 11:35:16.343091 3747 backend.cc:76] Loaded libpython successfully
I0614 11:35:16.439943 3747 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0614 11:35:16.517644 3747 backend.cc:89] Python interpreter is initialized
I0614 11:35:16.518593 3747 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0614 11:35:16.519102 3747 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 11:35:16.540246 3747 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0614 11:35:18.443806 3747 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0614 11:35:18.443914 3747 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-14 11:35:19.497477: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:35:19.498985: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-14 11:35:19.499009: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:35:19.499114: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-14 11:35:19.506675: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-14 11:35:19.541223: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-14 11:35:19.597645: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 11:35:19.610096: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 112635 microseconds.
I0614 11:35:19.610287 3747 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0614 11:35:19.615537 3747 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0614 11:35:21.544412 3840 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

I0614 11:35:21.544654 3747 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 11:35:21.554739 3747 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0614 11:35:21.556085 3747 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

E0614 11:35:21.556673 3747 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0614 11:35:21.556730 3747 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0614 11:35:21.557259 3747 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0614 11:35:21.557332 3747 server.cc:594]
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | <frozen importlib._bootstrap>(973): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/__init__.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__ |
| | | /tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+

I0614 11:35:21.597915 3747 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0614 11:35:21.599481 3747 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-2/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0614 11:35:21.599516 3747 server.cc:252] Waiting for in-flight requests to complete.
I0614 11:35:21.599525 3747 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0614 11:35:21.599572 3747 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0614 11:35:21.599630 3747 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0614 11:35:21.599718 3747 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0614 11:35:21.599726 3747 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0614 11:35:21.599745 3747 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
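The root cause surfaced in the stderr above is the `ModuleNotFoundError: No module named 'tests.unit.systems'` from the python-backend stub for `2_plustwoop`: `op_runner.py` resolves each operator by importing its recorded module path, and the test package is not importable inside the Triton server process. A minimal sketch of that import step (the `load_operator` helper and its `repo_root` handling are hypothetical, for illustration only):

```python
import importlib
import sys

def load_operator(module_name, repo_root=None):
    """Import an operator's module the way the python-backend runner does.

    The module path recorded at export time (e.g. 'tests.unit.systems')
    must resolve inside the Triton process; prepending the repository
    root to sys.path is one way to make it importable there.
    """
    if repo_root is not None and repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    # This is the call that raises ModuleNotFoundError in the stub.
    return importlib.import_module(module_name)
```

This matches the fix hinted at by the later commit ("Correct name of directory in tests"): the module path baked into the exported model has to resolve in the server environment, not just in the pytest session.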
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f024a172910> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7f024a172910> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 2 skipped, 18 warnings in 71.25s (0:01:11) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins12657324563057445998.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #118 of commit 30c6878d37785647b338bcd79bb78ace64f8f84a, no merge conflicts.
Running as SYSTEM
Setting status of 30c6878d37785647b338bcd79bb78ace64f8f84a to PENDING with url https://10.20.13.93:8080/job/merlin_systems/83/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse 30c6878d37785647b338bcd79bb78ace64f8f84a^{commit} # timeout=10
Checking out Revision 30c6878d37785647b338bcd79bb78ace64f8f84a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 30c6878d37785647b338bcd79bb78ace64f8f84a # timeout=10
Commit message: "Correct name of directory in tests"
 > git rev-list --no-walk 5738256b1766f6aea3d2e9a2d48cf040512d73b7 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins16655649153270021424.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 18 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ...F [ 27%]
tests/unit/systems/test_ensemble_ops.py .. [ 38%]
tests/unit/systems/test_export.py . [ 44%]
tests/unit/systems/test_graph.py . [ 50%]
tests/unit/systems/test_inference_ops.py .. [ 61%]
tests/unit/systems/test_op_runner.py .... [ 83%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=================================== FAILURES ===================================
______________ test_workflow_tf_e2e_multi_op_plus_2_run[parquet] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1')
dataset = <merlin.io.dataset.Dataset object at 0x7ff9b80218b0>
engine = 'parquet'

@pytest.mark.skipif(not TRITON_SERVER_PATH, reason="triton server not found")
@pytest.mark.parametrize("engine", ["parquet"])
def test_workflow_tf_e2e_multi_op_plus_2_run(tmpdir, dataset, engine):
    # Create a Workflow
    schema = dataset.schema
    for name in ["x", "y", "id"]:
        dataset.schema.column_schemas[name] = dataset.schema.column_schemas[name].with_tags(
            [Tags.USER]
        )

    workflow_ops = ["name-cat"] >> wf_ops.Categorify(cat_cache="host")
    workflow = Workflow(workflow_ops)
    workflow.fit(dataset)

    embedding_shapes_1 = wf_ops.get_embedding_sizes(workflow)

    cats = ["name-string"] >> wf_ops.Categorify(cat_cache="host")
    workflow_2 = Workflow(cats)
    workflow_2.fit(dataset)

    embedding_shapes = wf_ops.get_embedding_sizes(workflow_2)
    embedding_shapes_1.update(embedding_shapes)
    embedding_shapes_1["name-string_plus_2"] = embedding_shapes_1["name-string"]

    # Create Tensorflow Model
    model = create_tf_model(["name-cat", "name-string_plus_2"], [], embedding_shapes_1)

    # Creating Triton Ensemble
    triton_chain_1 = ["name-cat"] >> TransformWorkflow(workflow)
    triton_chain_2 = ["name-string"] >> TransformWorkflow(workflow_2) >> PlusTwoOp()
    triton_chain = (triton_chain_1 + triton_chain_2) >> PredictTensorflow(model)

    triton_ens = Ensemble(triton_chain, schema)

    # Creating Triton Ensemble Config
    ensemble_config, nodes_config = triton_ens.export(str(tmpdir))
    config_path = tmpdir / "ensemble_model" / "config.pbtxt"

    # Checking Triton Ensemble Config
    with open(config_path, "rb") as f:
        config = model_config.ModelConfig()
        raw_config = f.read()
        parsed = text_format.Parse(raw_config, config)

        # The config file contents are correct
        assert parsed.name == "ensemble_model"
        assert parsed.platform == "ensemble"
        assert hasattr(parsed, "ensemble_scheduling")

    df = dataset.to_ddf().compute()[["name-string", "name-cat"]].iloc[:3]
  response = _run_ensemble_on_tritonserver(str(tmpdir), ["output"], df, triton_ens.name)

tests/unit/systems/test_ensemble.py:233:


tests/unit/systems/utils/triton.py:39: in _run_ensemble_on_tritonserver
with run_triton_server(tmpdir) as client:
/usr/lib/python3.8/contextlib.py:113: in __enter__
return next(self.gen)


modelpath = '/tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1'

@contextlib.contextmanager
def run_triton_server(modelpath):
    """This function starts up a Triton server instance and returns a client to it.

    Parameters
    ----------
    modelpath : string
        The path to the model to load.

    Yields
    ------
    client: tritonclient.InferenceServerClient
        The client connected to the Triton server.

    """
    cmdline = [
        TRITON_SERVER_PATH,
        "--model-repository",
        modelpath,
        "--backend-config=tensorflow,version=2",
    ]
    env = os.environ.copy()
    env["CUDA_VISIBLE_DEVICES"] = "0"
    with subprocess.Popen(cmdline, env=env) as process:
        try:
            with grpcclient.InferenceServerClient("localhost:8001") as client:
                # wait until server is ready
                for _ in range(60):
                    if process.poll() is not None:
                        retcode = process.returncode
                      raise RuntimeError(f"Tritonserver failed to start (ret={retcode})")

E RuntimeError: Tritonserver failed to start (ret=-11)

merlin/systems/triton/utils.py:46: RuntimeError
----------------------------- Captured stderr call -----------------------------
I0614 13:42:34.409028 5262 tensorflow.cc:2176] TRITONBACKEND_Initialize: tensorflow
I0614 13:42:34.409145 5262 tensorflow.cc:2186] Triton TRITONBACKEND API version: 1.8
I0614 13:42:34.409153 5262 tensorflow.cc:2192] 'tensorflow' TRITONBACKEND API version: 1.8
I0614 13:42:34.409159 5262 tensorflow.cc:2216] backend configuration:
{"cmdline":{"version":"2"}}
I0614 13:42:34.600060 5262 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f3626000000' with size 268435456
I0614 13:42:34.600813 5262 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0614 13:42:34.605777 5262 model_repository_manager.cc:997] loading: 0_transformworkflow:1
I0614 13:42:34.706014 5262 model_repository_manager.cc:997] loading: 3_predicttensorflow:1
I0614 13:42:34.709272 5262 backend.cc:46] TRITONBACKEND_Initialize: nvtabular
I0614 13:42:34.709312 5262 backend.cc:53] Triton TRITONBACKEND API version: 1.8
I0614 13:42:34.709329 5262 backend.cc:56] 'nvtabular' TRITONBACKEND API version: 1.8
I0614 13:42:34.709558 5262 backend.cc:76] Loaded libpython successfully
I0614 13:42:34.806255 5262 model_repository_manager.cc:997] loading: 2_plustwoop:1
I0614 13:42:34.881720 5262 backend.cc:89] Python interpreter is initialized
I0614 13:42:34.882646 5262 tensorflow.cc:2276] TRITONBACKEND_ModelInitialize: 3_predicttensorflow (version 1)
I0614 13:42:34.883133 5262 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 13:42:34.906526 5262 model_repository_manager.cc:997] loading: 1_transformworkflow:1
I0614 13:42:36.787807 5262 tensorflow.cc:2325] TRITONBACKEND_ModelInstanceInitialize: 3_predicttensorflow (GPU device 0)
I0614 13:42:36.787922 5262 model_repository_manager.cc:1152] successfully loaded '0_transformworkflow' version 1
2022-06-14 13:42:37.829214: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 13:42:37.831081: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-06-14 13:42:37.831106: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 13:42:37.831223: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-06-14 13:42:37.836777: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 12669 MB memory: -> device: 0, name: Tesla P100-DGXS-16GB, pci bus id: 0000:07:00.0, compute capability: 6.0
2022-06-14 13:42:37.877313: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-06-14 13:42:37.932082: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1/3_predicttensorflow/1/model.savedmodel
2022-06-14 13:42:37.948934: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 119736 microseconds.
I0614 13:42:37.949100 5262 model_repository_manager.cc:1152] successfully loaded '3_predicttensorflow' version 1
I0614 13:42:37.954000 5262 python.cc:1903] TRITONBACKEND_ModelInstanceInitialize: 2_plustwoop (GPU device 0)
E0614 13:42:39.844126 5355 pb_stub.cc:369] Failed to initialize Python stub: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

I0614 13:42:39.844329 5262 model_inst_state.hpp:58] Loading TritonPythonModel from module 'merlin.systems.triton.workflow_model'
I0614 13:42:39.853253 5262 model_repository_manager.cc:1152] successfully loaded '1_transformworkflow' version 1
E0614 13:42:39.854278 5262 model_repository_manager.cc:1155] failed to load '2_plustwoop' version 1: Internal: ModuleNotFoundError: No module named 'tests.unit.systems'

At:
<frozen importlib._bootstrap>(973): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
<frozen importlib._bootstrap>(219): _call_with_frames_removed
<frozen importlib._bootstrap>(961): _find_and_load_unlocked
<frozen importlib._bootstrap>(991): _find_and_load
<frozen importlib._bootstrap>(1014): _gcd_import
/usr/lib/python3.8/importlib/__init__.py(127): import_module
/usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__
/tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize

E0614 13:42:39.855506 5262 model_repository_manager.cc:1341] Invalid argument: ensemble 'ensemble_model' depends on '2_plustwoop' which has no loaded version
I0614 13:42:39.855622 5262 server.cc:524]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0614 13:42:39.856589 5262 server.cc:551]
+------------+-----------------------------------------------------------------+-----------------------------+
| Backend | Path | Config |
+------------+-----------------------------------------------------------------+-----------------------------+
| tensorflow | /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so | {"cmdline":{"version":"2"}} |
| nvtabular | /opt/tritonserver/backends/nvtabular/libtriton_nvtabular.so | {} |
+------------+-----------------------------------------------------------------+-----------------------------+

I0614 13:42:39.856760 5262 server.cc:594]
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+
| 0_transformworkflow | 1 | READY |
| 1_transformworkflow | 1 | READY |
| 2_plustwoop | 1 | UNAVAILABLE: Internal: ModuleNotFoundError: No module named 'tests.unit.systems' |
| | | |
| | | At: |
| | | <frozen importlib._bootstrap>(973): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | <frozen importlib._bootstrap>(219): _call_with_frames_removed |
| | | <frozen importlib._bootstrap>(961): _find_and_load_unlocked |
| | | <frozen importlib._bootstrap>(991): _find_and_load |
| | | <frozen importlib._bootstrap>(1014): _gcd_import |
| | | /usr/lib/python3.8/importlib/__init__.py(127): import_module |
| | | /usr/local/lib/python3.8/dist-packages/merlin/systems/dag/op_runner.py(30): __init__ |
| | | /tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1/2_plustwoop/1/model.py(66): initialize |
| 3_predicttensorflow | 1 | READY |
+---------------------+---------+----------------------------------------------------------------------------------------------------------+

I0614 13:42:39.904090 5262 metrics.cc:651] Collecting metrics for GPU 0: Tesla P100-DGXS-16GB
I0614 13:42:39.905750 5262 tritonserver.cc:1962]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.20.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0] | /tmp/pytest-of-jenkins/pytest-3/test_workflow_tf_e2e_multi_op_1 |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| response_cache_byte_size | 0 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0614 13:42:39.905785 5262 server.cc:252] Waiting for in-flight requests to complete.
I0614 13:42:39.905794 5262 model_repository_manager.cc:1029] unloading: 3_predicttensorflow:1
I0614 13:42:39.905846 5262 model_repository_manager.cc:1029] unloading: 1_transformworkflow:1
I0614 13:42:39.905901 5262 model_repository_manager.cc:1029] unloading: 0_transformworkflow:1
I0614 13:42:39.906027 5262 server.cc:267] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0614 13:42:39.906053 5262 tensorflow.cc:2363] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0614 13:42:39.906055 5262 backend.cc:160] TRITONBACKEND_ModelInstanceFinalize: delete instance state
------------------------------ Captured log call -------------------------------
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7ff9b05ef520> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
WARNING absl:signature_serialization.py:146 Function _wrapped_model contains input name(s) name-cat, name-string_plus_2 with unsupported characters which will be renamed to name_cat, name_string_plus_2 in the SavedModel.
WARNING absl:save.py:133 <nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures object at 0x7ff9b05ef520> has the same name 'DenseFeatures' as a built-in Keras object. Consider renaming <class 'nvtabular.framework_utils.tensorflow.layers.embedding.DenseFeatures'> to avoid naming conflicts when loading with tf.keras.models.load_model. If renaming is not possible, pass the object in the custom_objects parameter of the load function.
WARNING tensorflow:load.py:167 No training configuration found in save file, so the model was not compiled. Compile it manually.
=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 7 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/systems/test_ensemble.py::test_workflow_tf_e2e_multi_op_plus_2_run[parquet]
======= 1 failed, 17 passed, 2 skipped, 18 warnings in 71.03s (0:01:11) ========
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins1877984556786968701.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #118 of commit 816abb74d0231d0a8f9ced62f48a2b3017ee5ded, no merge conflicts.
Running as SYSTEM
Setting status of 816abb74d0231d0a8f9ced62f48a2b3017ee5ded to PENDING with url https://10.20.13.93:8080/job/merlin_systems/87/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse 816abb74d0231d0a8f9ced62f48a2b3017ee5ded^{commit} # timeout=10
Checking out Revision 816abb74d0231d0a8f9ced62f48a2b3017ee5ded (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 816abb74d0231d0a8f9ced62f48a2b3017ee5ded # timeout=10
Commit message: "Merge branch 'main' into ops-fil-wrapper"
 > git rev-list --no-walk 93dfe203bf03de2d69ce42c488562e7e3107a367 # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins10462710633549038945.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 17 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ... [ 23%]
tests/unit/systems/test_ensemble_ops.py .. [ 35%]
tests/unit/systems/test_export.py . [ 41%]
tests/unit/systems/test_graph.py . [ 47%]
tests/unit/systems/test_inference_ops.py .. [ 58%]
tests/unit/systems/test_op_runner.py .... [ 82%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 4 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============ 17 passed, 2 skipped, 15 warnings in 62.05s (0:01:02) =============
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins10579017086969993451.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #118 of commit eaf7e3253cdb74f57a417c14ff7642a272ba182a, no merge conflicts.
Running as SYSTEM
Setting status of eaf7e3253cdb74f57a417c14ff7642a272ba182a to PENDING with url https://10.20.13.93:8080/job/merlin_systems/88/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse eaf7e3253cdb74f57a417c14ff7642a272ba182a^{commit} # timeout=10
Checking out Revision eaf7e3253cdb74f57a417c14ff7642a272ba182a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f eaf7e3253cdb74f57a417c14ff7642a272ba182a # timeout=10
Commit message: "Merge branch 'main' into ops-fil-wrapper"
 > git rev-list --no-walk 816abb74d0231d0a8f9ced62f48a2b3017ee5ded # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins18123301131867031504.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 17 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ... [ 23%]
tests/unit/systems/test_ensemble_ops.py .. [ 35%]
tests/unit/systems/test_export.py . [ 41%]
tests/unit/systems/test_graph.py . [ 47%]
tests/unit/systems/test_inference_ops.py .. [ 58%]
tests/unit/systems/test_op_runner.py .... [ 82%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 4 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============ 17 passed, 2 skipped, 15 warnings in 68.31s (0:01:08) =============
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins5243527176623995237.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #118 of commit 96e6d80c37f296f7c8ef59118fcc679d9dc3ec8a, no merge conflicts.
Running as SYSTEM
Setting status of 96e6d80c37f296f7c8ef59118fcc679d9dc3ec8a to PENDING with url https://10.20.13.93:8080/job/merlin_systems/89/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_systems
using credential fce1c729-5d7c-48e8-90cb-b0c314b1076e
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url https://github.com/NVIDIA-Merlin/systems # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/systems
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems user + githubtoken
 > git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/systems +refs/pull/118/*:refs/remotes/origin/pr/118/* # timeout=10
 > git rev-parse 96e6d80c37f296f7c8ef59118fcc679d9dc3ec8a^{commit} # timeout=10
Checking out Revision 96e6d80c37f296f7c8ef59118fcc679d9dc3ec8a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 96e6d80c37f296f7c8ef59118fcc679d9dc3ec8a # timeout=10
Commit message: "Correct backend assertion for transform op in forest ensemble test"
 > git rev-list --no-walk eaf7e3253cdb74f57a417c14ff7642a272ba182a # timeout=10
[merlin_systems] $ /bin/bash /tmp/jenkins16938884123612280413.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_systems/systems, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 17 items / 2 skipped

tests/unit/test_version.py . [ 5%]
tests/unit/systems/test_ensemble.py ... [ 23%]
tests/unit/systems/test_ensemble_ops.py .. [ 35%]
tests/unit/systems/test_export.py . [ 41%]
tests/unit/systems/test_graph.py . [ 47%]
tests/unit/systems/test_inference_ops.py .. [ 58%]
tests/unit/systems/test_op_runner.py .... [ 82%]
tests/unit/systems/test_tensorflow_inf_op.py ... [100%]

=============================== warnings summary ===============================
../../../.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18
/var/jenkins_home/.local/lib/python3.8/site-packages/nvtabular/framework_utils/__init__.py:18: DeprecationWarning: The nvtabular.framework_utils module is being replaced by the Merlin Models library. Support for importing from nvtabular.framework_utils is deprecated, and will be removed in a future version. Please consider using the models and layers from Merlin Models instead.
warnings.warn(

tests/unit/systems/test_ensemble.py: 4 warnings
tests/unit/systems/test_export.py: 1 warning
tests/unit/systems/test_inference_ops.py: 2 warnings
tests/unit/systems/test_op_runner.py: 4 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/dataframe.py:1292: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column x is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column y is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

tests/unit/systems/test_export.py::test_export_run_ensemble_triton[tensorflow-parquet]
/var/jenkins_home/workspace/merlin_systems/systems/merlin/systems/triton/export.py:304: UserWarning: Column id is being generated by NVTabular workflow but is unused in test_name_tf model
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============ 17 passed, 2 skipped, 15 warnings in 67.46s (0:01:07) =============
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/systems/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_systems] $ /bin/bash /tmp/jenkins1429528439706231767.sh

@bschifferer
Contributor

@oliverholworthy what is feature_names in the example above?

ops = feature_names >> PredictForest(model, input_schema)

We have both input_schema and feature_names. Are they redundant, and what is the difference?

@radekosmulski
Contributor

I would like to second what @bschifferer raised: passing the schema into PredictForest is confusing, especially as this is not what we do in the case of PredictTensorflow. I realize this is just a single argument that differs, but it would be good to aim for consistency.

@oliverholworthy I am working on #136 and ran into some issues. To figure out what is going on, I grabbed the code for test_ensemble from tests/unit/systems/fil/test_forest.py and ran it on my machine to export the ensemble as follows:

import numpy as np
import pandas as pd
import sklearn.datasets
import xgboost

from merlin.dag import ColumnSelector
from merlin.io import Dataset
from merlin.schema import ColumnSchema, Schema
from merlin.systems.dag.ensemble import Ensemble
from merlin.systems.dag.ops.fil import PredictForest
from merlin.systems.dag.ops.workflow import TransformWorkflow
from nvtabular import Workflow
from nvtabular import ops as wf_ops

rows = 200
num_features = 16
X, y = sklearn.datasets.make_regression(
    n_samples=rows,
    n_features=num_features,
    n_informative=num_features // 3,
    random_state=0,
)
feature_names = [str(i) for i in range(num_features)]
df = pd.DataFrame(X, columns=feature_names)
dataset = Dataset(df)

# Fit GBDT Model
model = xgboost.XGBRegressor()
model.fit(X, y)

input_schema = Schema([ColumnSchema(col, dtype=np.float32) for col in feature_names])
selector = ColumnSelector(feature_names)

workflow_ops = ["0", "1", "2"] >> wf_ops.LogOp()
workflow = Workflow(workflow_ops)
workflow.fit(dataset)

triton_chain = selector >> TransformWorkflow(workflow) >> PredictForest(model, input_schema)

triton_ens = Ensemble(triton_chain, input_schema)

triton_ens.export('sklearn_model')

I then attempt to start the Triton inference server with tritonserver --model-repository=sklearn_model, and it fails to start with the following output:

I0718 03:38:56.091352 2215 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f9b52000000' with size 268435456
I0718 03:38:56.091664 2215 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0718 03:38:56.093551 2215 model_repository_manager.cc:1191] loading: 0_transformworkflow:1
I0718 03:38:56.193687 2215 model_repository_manager.cc:1191] loading: 1_predictforest:1
I0718 03:38:56.197167 2215 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)
I0718 03:38:56.293836 2215 model_repository_manager.cc:1191] loading: 1_fil:1
I0718 03:38:57.550458 2215 model_repository_manager.cc:1345] successfully loaded '0_transformworkflow' version 1
I0718 03:38:57.557949 2215 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I0718 03:38:57.557963 2215 backend.hpp:47] Triton TRITONBACKEND API version: 1.9
I0718 03:38:57.557966 2215 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.9
I0718 03:38:57.558221 2215 python.cc:2388] TRITONBACKEND_ModelInstanceInitialize: 1_predictforest (GPU device 0)
I0718 03:38:59.057032 2215 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: 1_fil (version 1)
I0718 03:38:59.057153 2215 model_repository_manager.cc:1345] successfully loaded '1_predictforest' version 1
I0718 03:38:59.058240 2215 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: 1_fil_0 (GPU device 0)
I0718 03:38:59.064649 2215 model_repository_manager.cc:1345] successfully loaded '1_fil' version 1
E0718 03:38:59.064744 2215 model_repository_manager.cc:1551] Invalid argument: in ensemble ensemble_model, ensemble tensor 10 is mapping to non-existing input 10 in model 0_transformworkflow
I0718 03:38:59.064782 2215 server.cc:556] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0718 03:38:59.064852 2215 server.cc:583] 
+---------+---------------------------------+---------------------------------+
| Backend | Path                            | Config                          |
+---------+---------------------------------+---------------------------------+
| python  | /opt/tritonserver/backends/pyth | {"cmdline":{"auto-complete-conf |
|         | on/libtriton_python.so          | ig":"false","min-compute-capabi |
|         |                                 | lity":"6.000000","backend-direc |
|         |                                 | tory":"/opt/tritonserver/backen |
|         |                                 | ds","default-max-batch-size":"4 |
|         |                                 | "}}                             |
|         |                                 |                                 |
| fil     | /opt/tritonserver/backends/fil/ | {"cmdline":{"auto-complete-conf |
|         | libtriton_fil.so                | ig":"false","min-compute-capabi |
|         |                                 | lity":"6.000000","backend-direc |
|         |                                 | tory":"/opt/tritonserver/backen |
|         |                                 | ds","default-max-batch-size":"4 |
|         |                                 | "}}                             |
|         |                                 |                                 |
+---------+---------------------------------+---------------------------------+

I0718 03:38:59.064896 2215 server.cc:626] 
+---------------------+---------+--------+
| Model               | Version | Status |
+---------------------+---------+--------+
| 0_transformworkflow | 1       | READY  |
| 1_fil               | 1       | READY  |
| 1_predictforest     | 1       | READY  |
+---------------------+---------+--------+

I0718 03:38:59.088547 2215 metrics.cc:650] Collecting metrics for GPU 0: Quadro RTX 8000
I0718 03:38:59.088832 2215 tritonserver.cc:2138] 
+----------------------------------+------------------------------------------+
| Option                           | Value                                    |
+----------------------------------+------------------------------------------+
| server_id                        | triton                                   |
| server_version                   | 2.22.0                                   |
| server_extensions                | classification sequence model_repository |
|                                  |  model_repository(unload_dependents) sch |
|                                  | edule_policy model_configuration system_ |
|                                  | shared_memory cuda_shared_memory binary_ |
|                                  | tensor_data statistics trace             |
| model_repository_path[0]         | sklearn_model                            |
| model_control_mode               | MODE_NONE                                |
| strict_model_config              | 1                                        |
| rate_limit                       | OFF                                      |
| pinned_memory_pool_byte_size     | 268435456                                |
| cuda_memory_pool_byte_size{0}    | 67108864                                 |
| response_cache_byte_size         | 0                                        |
| min_supported_compute_capability | 6.0                                      |
| strict_readiness                 | 1                                        |
| exit_timeout                     | 30                                       |
+----------------------------------+------------------------------------------+

I0718 03:38:59.088857 2215 server.cc:257] Waiting for in-flight requests to complete.
I0718 03:38:59.088866 2215 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0718 03:38:59.088875 2215 model_repository_manager.cc:1223] unloading: 1_fil:1
I0718 03:38:59.088911 2215 model_repository_manager.cc:1223] unloading: 1_predictforest:1
I0718 03:38:59.088953 2215 model_repository_manager.cc:1223] unloading: 0_transformworkflow:1
I0718 03:38:59.088987 2215 instance_finalize.hpp:36] TRITONBACKEND_ModelInstanceFinalize: delete instance state
I0718 03:38:59.089046 2215 server.cc:288] All models are stopped, unloading models
I0718 03:38:59.089058 2215 server.cc:295] Timeout 30: Found 3 live models and 0 in-flight non-inference requests
I0718 03:38:59.089112 2215 model_finalize.hpp:36] TRITONBACKEND_ModelFinalize: delete model state
I0718 03:38:59.089135 2215 model_repository_manager.cc:1328] successfully unloaded '1_fil' version 1
I0718 03:39:00.089134 2215 server.cc:295] Timeout 29: Found 2 live models and 0 in-flight non-inference requests
I0718 03:39:00.339623 2215 model_repository_manager.cc:1328] successfully unloaded '0_transformworkflow' version 1
I0718 03:39:00.548048 2215 model_repository_manager.cc:1328] successfully unloaded '1_predictforest' version 1
I0718 03:39:01.089222 2215 server.cc:295] Timeout 28: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

Could I please ask you for help? Do you know what might be going on there?

@radekosmulski
Contributor

I was able to figure out what was going on; no further assistance on this is needed.
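For anyone who hits the same error: a plausible reading (not confirmed in this thread) of Triton's complaint "ensemble tensor 10 is mapping to non-existing input 10 in model 0_transformworkflow" is a schema mismatch between the selector and the workflow. The selector routes all 16 columns into the workflow, but the workflow above only declares columns "0", "1", "2" as inputs. A stdlib-only sketch of the check Triton is effectively performing:

```python
# Hypothetical reconstruction of the validation that fails: every ensemble
# tensor routed into a model must exist in that model's declared inputs.
selected_columns = [str(i) for i in range(16)]   # what the selector sends
workflow_inputs = ["0", "1", "2"]                # what the workflow declares

# Columns with no matching workflow input; Triton rejects the first mismatch.
missing = [c for c in selected_columns if c not in workflow_inputs]

assert "10" in missing
assert missing == [str(i) for i in range(3, 16)]
```

Under this reading, one fix would be to select only the workflow's columns, or to route the remaining columns through the workflow unchanged; the thread does not state which resolution was used.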

@oliverholworthy
Member Author

passing the schema into PredictForest is confusing especially as this is not what we are doing in the case of PredictTensorflow. I realize that this is just a single argument that is different but would be good to aim for consistency.

This may have been discussed in Slack. The input schema is required as an argument to the operator because the model objects don't contain enough information to determine the schema needed to set up the operator correctly. TensorFlow models, by contrast, provide a way to inspect the signature, including the names and dtypes of the expected inputs.

There was some discussion about coming up with a serialization format that would allow us to figure this out from any model built with Merlin Models: NVIDIA-Merlin/Merlin#489
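To make the distinction above concrete, here is a minimal plain-Python sketch (not the merlin API itself) of what each argument carries: feature_names only names the columns to route into the operator, while input_schema additionally records a dtype per column, which a plain XGBoost or LightGBM model object cannot supply on its own.

```python
# Minimal sketch, stdlib only: the two arguments carry different
# information. `feature_names` is just an ordered list of column names
# used for selection; `input_schema` pairs each name with a dtype, which
# PredictForest needs in order to write the Triton model configuration.

num_features = 16
feature_names = [str(i) for i in range(num_features)]  # selection only

# Stand-in for merlin.schema.Schema: each column gets a name and a dtype.
input_schema = {name: "float32" for name in feature_names}

# The names overlap, but only the schema knows the dtypes.
assert list(input_schema) == feature_names
assert input_schema["0"] == "float32"
```

This is why the two are not redundant for forest models: dropping input_schema would leave the exporter without dtype information, whereas PredictTensorflow can recover the same information from the SavedModel signature.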

Labels
enhancement New feature or request

5 participants