
[FEA] Model inference locally #359

Closed
Ahanmr opened this issue Jan 13, 2022 · 18 comments


Ahanmr commented Jan 13, 2022

🚀 Feature request

I wanted to check whether it is currently possible to perform model inference locally instead of setting up Triton Server. If yes, where is the documentation for this currently available?

Motivation

  • Currently, whenever I build a new model, I have to deploy it on Triton Server. But just to check the model's performance on a batch of 10-20 sessions for testing, I wanted to know whether there is a way to see results by running inference locally.
  • Secondly, to set up Triton Server, one needs to download and set up the nvcr.io/nvidia/merlin/merlin-inference:21.09 Docker image. Is there a workaround to run inference locally instead?

rnyak (Contributor) commented Jan 18, 2022

@Ahanmr By model performance, do you mean model accuracy? If yes, you need to use trainer.evaluate(metric_key_prefix='eval') for a PyTorch model to evaluate your model's prediction power on the validation set first (see this example). If you want to do final prediction on a test set, you can simply do the following after model training:

model.eval()
model(input_batch_test_set)

and for a TensorFlow model you can do model.predict(input_batch_test_set).

You don't need the inference server for that, if that's your question.
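(For reference, a minimal sketch of that local prediction flow for the PyTorch model; it assumes input_batch_test_set is a dictionary of feature tensors coming from the dataloader and that the model returns the output dictionary with a "predictions" key shown later in this thread:)

import torch

model.eval()                      # switch the model to evaluation mode
with torch.no_grad():             # no gradients needed for inference
    output = model(input_batch_test_set)
scores = output["predictions"]    # one row of item scores per session in the batch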

Ahanmr (Author) commented Jan 27, 2022

So, the following line of code helped me get the metrics for the test_set, but I wanted to confirm once that I'm doing this correctly, so that I can go ahead and deploy this.

eval_metrics = trainer.evaluate(eval_dataset=eval_data_paths, metric_key_prefix='eval')
for key in sorted(eval_metrics.keys()):
    print("  %s = %s" % (key, str(eval_metrics[key])))

After this, to get the predictions on a batch of product_id sessions, I do the following:

eval_dataload = get_dataloader(eval_data_paths, train_args.per_device_eval_batch_size)
batch = next(iter(eval_dataload))

and batch looks like this:

{'product_id-list_seq': tensor([[   642,    642,    642,      0,      0,      0,      0,      0,      0,
               0,      0,      0,      0,      0,      0,      0,      0,      0,
               0,      0],
         [  2962,   2962,   2962,      0,      0,      0,      0,      0,      0,
               0,      0,      0,      0,      0,      0,      0,      0,      0,
               0,      0],
         [ 12252,  10221,  18489,  20159,  31245,  10221,  18885,  12252,  18885,
           12252,  18885,  12252,   2886,  20159,   2886,  20159,   2886,  31245,
           78571,  10221],
         [  5005,   1839,   3112,  15067,   3112,    455,   3503,      0,      0,
               0,      0,      0,      0,      0,      0,      0,      0,      0,
               0,      0],
         ...
         [  6589,   1881,    185,      0,      0,      0,      0,      0,      0,
               0,      0,      0,      0,      0,      0,      0,      0,      0,
               0,      0]], device='cuda:0')}

Then finally, on running model(batch), I got the following output dictionary. I just wanted to clarify: what are the labels here? Are they the targets or the predicted output?

output = model(batch)

{'loss': tensor(12.9607, device='cuda:0', grad_fn=<NllLossBackward0>),
 'labels': tensor([   642,    642,   2962,   2962,  10221,  18489,  20159,  31245,  10221,
          18885,  12252,  18885,  12252,  18885,  12252,   2886,  20159,   2886,
          20159,   2886,  31245,  78571,  10221,   1839,   3112,  15067,   3112,
            455,   3503,  27648,  33940, 371869,   1693,   3809,   1693,   1152,
          39043,  20117, 162305,  42138,   1976,  11139,   1920,    746,    746,
            322,   3780,   3121, 185678,   3121,   3780,   3121,   3780,   3121,
           3121,  73064,  73064,  73064, 419261, 147073,  95499, 175334,  59296,
           1497,   2602,   2602, 462300,  50554,   7634,  50554,  74959,  50554,
         225943,   3525,   1264,    501,   1264,     79,   6027,   6027,  31012,
             21,      6,    306,   4363,  14283,  30137,   4363,  30137, 224529,
         143820, 224522,    925,    576,   1902,  17982,  25200, 162768, 162772,
         203166, 162772, 162768,  85980,  62426,   1881,    185],
        device='cuda:0'),
 'predictions': tensor([[ -40.8026,  -41.2528,  -13.0302,  ...,  -41.4139,  -41.3386,
           -41.0184],
         [ -40.8026,  -41.2528,  -13.0302,  ...,  -41.4139,  -41.3386,
           -41.0184],
         [ -57.8077,  -57.5196,  -21.6697,  ...,  -58.0113,  -55.8364,
           -57.8259],
         ...,
         [ -16.0957,  -16.1184,  -10.7048,  ...,  -16.0890,  -16.1076,
           -16.0958],
         [-103.6853, -102.6834,  -13.4621,  ..., -100.6037, -104.4472,
          -104.3079],
         [ -70.6631,  -70.4196,   -5.5336,  ...,  -71.1709,  -70.4627,
           -70.9110]], device='cuda:0', grad_fn=<LogSoftmaxBackward0>),
 'pred_metadata': {},
 'model_outputs': []}

So, I extracted output['predictions'], but this is a tensor of logits, one per item in the original catalog. Is there a direct function to reverse-map the top-n maximum values of this tensor to product_ids? Am I correct that the top-n maximum values correspond to the products with the highest scores? I also wanted to check whether there is a way to convert this tensor directly into a readable output with the original product_ids from the dataframe, like the one shown in the Triton server notebook.
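(As a minimal sketch, assuming output['predictions'] has shape (batch_size, item_cardinality), the top-n highest-scoring items per session could be pulled out with torch.topk; note these are still the encoded/categorified ids, not the original product_ids:)

import torch

top_n = 10
# values and indices of the n highest-scoring items per session (indices = encoded item ids)
scores, encoded_top_n = torch.topk(output['predictions'], k=top_n, dim=-1)
print(encoded_top_n)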

Ahanmr closed this as completed Feb 7, 2022
@SaadAhmed433

I am also confused about this. Is there a way to get sorted predictions for each session with their labels (predicted output)?

rnyak (Contributor) commented Mar 17, 2022

@SaadAhmed433 The model generates predicted logit values and you need to convert them to the recommended item_ids. See this example for how we do it with a util function (cell 23).

Hope that helps.

@SaadAhmed433

Yes, I have been looking at this notebook for some time now, and after a bit of experimentation I was able to make it work for my use case. Hopefully the implementation is correct @rnyak

test_path = ['./data/sessions_by_day_mind/6/test.parquet']
trainer.eval_dataloader = get_dataloader(test_path, train_args.per_device_eval_batch_size)

#batch 
batch = next(iter(trainer.get_eval_dataloader()))

The batch contains 32 sessions.

model.eval()

model(batch, training=False)

response =  model(batch, training=False)["predictions"]

This stores the prediction logits in the response variable above.

Now I load the data that I have pre-processed

df = cudf.read_parquet("./data/sessions_by_day_mind/6/test.parquet")
df = df.iloc[0:32, :]  # the same 32 sessions selected in the dataframe, to correspond to the ones selected in the batch

[screenshot of the first 32 sessions of the test dataframe]

Made a very slight change to the util function because I am not using the tritonclient inference server

import numpy as np

def visualize_response(batch, response, top_k, session_col="session_id"):
    """
    Util function to extract top-k encoded item-ids from logits

    Parameters
    ----------
    batch : cudf.DataFrame
        the batch of raw data (read locally here instead of being sent to Triton server).
    response : torch.Tensor
        prediction logits as a tensor.
    top_k : int
        the `top_k` top items to retrieve from predictions.
    """
    sessions = batch[session_col].drop_duplicates().values
    predictions = response.cpu().detach().numpy()  # changed from response.as_numpy() to .cpu().detach().numpy()
    top_preds = np.argpartition(predictions, -top_k, axis=1)[:, -top_k:]
    for session, next_items in zip(sessions, top_preds):
        print(
            "- Top-%s predictions for session `%s`: %s\n"
            % (top_k, session, " || ".join([str(e) for e in next_items]))
        )

Finally

visualize_response(df, response , top_k=5, session_col='session_id')

Output:

[screenshot of the top-5 predictions printed for each session]

@kontrabas380

@SaadAhmed433 how did you define the method "get_dataloader"?

@SaadAhmed433

@kontrabas380 I followed the tutorial example given in this notebook

[screenshot of the get_dataloader definition from the tutorial notebook]

@cchudzian

Hi All,

First of all, thank you for sharing this awesome code with us!
Hope you don't mind me adding my two cents to the very interesting thread.

I'm afraid that there's an issue with the local inference as explained above, i.e.

model.eval()
model(input_batch_test_set)

If the masking layer is retained in a trained model and not disabled (I'm not sure there's an easy way to turn it off or remove it), then at inference time it will replace the embedding of the last element of every sequence with a learnable masking embedding. To my understanding this effectively makes the model ignore the last, most recently consumed item.

@lightsailpro

@SaadAhmed433: About the local prediction: in your code, "batch = next(iter(trainer.get_eval_dataloader()))" only returns 32 sessions. Do you know how to loop through the whole file to get all sessions? Does trainer.eval_dataloader remember the last batch read position, so that we just keep calling next until the batch is null, or does "next(iter(" just randomly return 32 records? Thanks

test_path = ['./data/sessions_by_day_mind/6/test.parquet']
trainer.eval_dataloader = get_dataloader(test_path, train_args.per_device_eval_batch_size)

#batch
batch = next(iter(trainer.get_eval_dataloader()))

@SaadAhmed433

Yes, you can keep calling next and you will get the next batch in sequence; it remembers where it ended and will start from the new position.

Here is a small snippet of code

test_path = ['./data/test.parquet']
test_loader = get_dataloader(test_path)
#create iterable batch 
iterable_batch = iter(test_loader)

Now you loop over the length of the test_loader and apply the next method

for j in range(len(test_loader)):
    batch = next(iterable_batch)
    
    with torch.no_grad():
        response = model_p(batch, training=False)["predictions"]
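(If you want to keep the results, a small extension of the loop above that collects the predictions of every batch into one tensor; model_p and test_loader are the objects defined above:)

import torch

all_preds = []
iterable_batch = iter(test_loader)
for _ in range(len(test_loader)):
    batch = next(iterable_batch)
    with torch.no_grad():
        preds = model_p(batch, training=False)["predictions"]
    all_preds.append(preds.cpu())               # move each batch to CPU to free GPU memory

all_predictions = torch.cat(all_preds, dim=0)   # shape: (n_sessions, item_cardinality)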

@hosseinkalbasi

@SaadAhmed433 thanks for the detailed explanation. Regarding the last section, the visualize_response function: what are those numbers in front of the session_id in the result? For example, for session 17 you have a recommendation of 230. Is that a session ID, an item ID, or an index? Could you finish this last part of your example?

@hosseinkalbasi

@rnyak Thanks for the explanation. Are you sure the printed predictions of this example are item IDs? Or are they indices of items? I searched the initial dataset and cannot find some of these numbers as item IDs. I wish the example would go one extra step and print exactly the recommendation from the initial dataset.

rnyak (Contributor) commented Aug 8, 2022

@hosseinkalbasi These are categorified (encoded) item ids. You need to convert/map them back to the original item ids.

In order to do that, you need to find the categories folder where you can see the parquet file unique.item_id.parquet containing the mapping encoded item_id --> original item_id. Just read in this parquet file, and you will see that the index column is the encoded item_id and the item_id column is the original id. From there you can do the mapping yourself with simple custom code.

hosseinkalbasi commented Aug 8, 2022

Thank you @rnyak for the detailed response, interesting! I understand now!
Unfortunately, I still cannot find the connection between the item_id and the encoded item id. I unpacked unique.item_id.parquet as you specified. The encoded item id is not there, and the magnitude of the item ids differs from what we see as recommendations (encoded item ids). I got the same result after unpacking ./workflow_etl/categories/unique.session_id.parquet. Did I proceed as you instructed? If I could somehow decode the encoded item id, that would be great!

[screenshot of the unpacked unique.item_id.parquet dataframe]

rnyak (Contributor) commented Aug 8, 2022

The index column, from 0 to 52739, holds your encoded item ids, which are what you get back from Triton after you run the visualize_response function. Aren't the values you get as recommendations (encoded item ids) within 0-52739 in your case?

Basically, if you do this:

category_item_id =  cudf.read_parquet('/workspace/kdd/categories/unique.item_id.parquet').reset_index().rename(columns={"index": "encoded_item_id"})

you will get your encoded item id column and original ids in the same cudf dataframe.
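(From there, one way to do the mapping, as a rough sketch: it assumes top_preds is the (n_sessions, top_k) numpy array of encoded ids produced inside the visualize_response function above, and category_item_id is the dataframe built as shown; exact column handling may differ in your setup:)

# lookup table: encoded item id (index) -> original item id (value)
lookup = category_item_id.to_pandas().set_index("encoded_item_id")["item_id"]

# map every encoded prediction back to its original item id, keeping the (n_sessions, top_k) shape
original_preds = lookup.loc[top_preds.ravel()].to_numpy().reshape(top_preds.shape)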

@hosseinkalbasi

Thank you @rnyak! This is so much clearer for me now. I can now resolve the recommendations.
No, the length of each prediction row I get is a bit larger (by 4 items) than the total number of item ids! Not sure if this is a BUG or how it could be explained!

Detailed comparisons:

Let's see how many item ids we have in the first place:

cudf.read_parquet('/workspace/data/interactions_merged_df.parquet')['item_id'].nunique()
# output: 52739

How many encoded item ids do we have after the NVTabular workflow has been applied?

category_item_id = cudf.read_parquet('./workflow_etl/categories/unique.item_id.parquet').reset_index().dropna(axis=0, subset=['item_id']).rename(columns={"index": "encoded_item_id"})
category_item_id

# number of rows: 52739

[screenshot of the category_item_id dataframe]

All good so far! However, if we take a closer look at the prediction (Triton Inference Server response), we have 52743 logit values for each predicted session!

After running this code:

import nvtabular.inference.triton as nvt_triton
import tritonclient.grpc as grpcclient

inputs = nvt_triton.convert_df_to_triton_input(filtered_batch.columns, filtered_batch, grpcclient.InferInput)
output_names = ["output_1"]
outputs = []
for col in output_names:
    outputs.append(grpcclient.InferRequestedOutput(col))
    print(outputs)
MODEL_NAME_NVT = "t4r_tf"
with grpcclient.InferenceServerClient("<inference-server-ip>:8001") as client:
    response = client.infer(MODEL_NAME_NVT, inputs)
    response_as_numpy = response.as_numpy(col)
    print(col, ':\n', response_as_numpy)
    print("*"*8, response_as_numpy.shape)

Output:

[<tritonclient.grpc.InferRequestedOutput object at 0x7f73388d8850>]
output_1 :
 [[-15.010758  -15.9208555  -9.183845  ... -15.026     -17.51056
  -14.421532 ]
 [-14.498426  -15.743032  -10.884762  ... -15.942841  -17.284363
  -15.883841 ]
 [-19.52346   -16.099092  -11.823889  ... -16.027607  -18.37647
  -13.709213 ]
 ...
 [-15.586963  -16.59345    -9.404068  ... -21.083416  -17.071983
  -16.640312 ]
 [-16.728306  -14.76777   -10.235783  ... -18.325325  -15.863041
  -13.49     ]
 [-19.494577  -21.577778   -9.008173  ... -23.189869  -18.583168
  -19.197771 ]]
******** (15, 52743)

Is this expected, @rnyak? I expected the length of each prediction row to be 52739 as well.

rnyak (Contributor) commented Aug 9, 2022

@hosseinkalbasi What is your filtered_batch? Did you apply any filtering? What is the code before the code block you shared above? Btw, I assume you are running the end-to-end example notebook?

Note that in the NVT pipeline we apply a minimum session length filter, meaning sessions shorter than 2 are not considered.
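(For illustration, the kind of filter meant here looks roughly like this in the NVTabular workflow; groupby_features and the item_id-count column name are assumptions based on the example notebook, so adapt them to your pipeline:)

import nvtabular as nvt

# hypothetical: keep only sessions with at least 2 interactions after the session-level groupby
filtered_sessions = groupby_features >> nvt.ops.Filter(f=lambda df: df["item_id-count"] >= 2)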

korchi commented Aug 2, 2023

Hi everyone. May I ask you for help?
I would like to understand how the trained model is going to behave in production. To that end, I want to recompute RecallAt on my own by running inference with the model and compare it with the recall returned by trainer.evaluate(eval_dataset=eval_path). But I keep getting different metric results.

Specifically, I thought that if eval_on_last_item_seq_only=True, then the trainer.evaluate() method would compute metrics by counting how many times the prediction hits the last item_id in the sequence/session (given the input sequence [0:N-1], where the Nth item_id is the label).

I tried to replicate it. I truncated my valid.parquet dataset by splitting each sequence into a new sequence [1:N-1] (as input) and the last item [N] (as label), and computed RecallAt on my own by simply running inference and counting the number of hits.
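(Roughly, that truncation could look like this, as a sketch; the item_id-list_seq column name is an assumption, and the split keeps everything but the last item as input and the last item as label:)

import pandas as pd

df = pd.read_parquet("valid.parquet")
df["input_seq"] = df["item_id-list_seq"].apply(lambda seq: seq[:-1])  # sequence without its last item
df["label"] = df["item_id-list_seq"].apply(lambda seq: seq[-1])       # last item used as the target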

predictions = model(sequence_truncated, testing=False)
correct = int(label in predictions.argsort()[-20:])  # top-20 = last 20 indices of an ascending argsort

However, I am getting a smaller RecallAt compared to running trainer.evaluate(eval_dataset=eval_path).

I debugged the code and found that the predictions inside trainer.evaluate() differ from my predictions during inference mainly due to testing=True, which significantly influences mask_targets and masking_scheme. Could someone explain in detail how the testing variable influences the inputs and outputs, and why the model predictions are so different with testing=True (in evaluate()) and testing=False (during inference)? And how can I reproduce what the evaluate method does by running inference (testing=False)?

Thanks a lot.
