# Inference

---

After training a model, you need a way to interact with it in order to make predictions on new data.

Inference is a breeze with AIQC because it persists all of the information that we need to preprocess our new samples and reconstruct our model.

Normally, the challenge with inference is preprocessing your new samples the same way that you processed your training samples. Additionally, if you provide labels with your new data for the purpose of evaluation, then PyTorch requires you to reconstruct parts of your model, like your optimizer, in order to calculate loss.
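To make the preprocessing challenge concrete, here is a minimal sketch in plain Python of persisting fitted preprocessing state and reusing it at inference time. The helpers `fit_scaler` and `apply_scaler` are hypothetical names invented for this illustration; they are not part of AIQC, and AIQC's own persistence mechanism may look quite different.

```python
import json
import statistics


def fit_scaler(values):
    """Learn scaling parameters from TRAINING data only (hypothetical helper)."""
    return {"mean": statistics.fmean(values), "stdev": statistics.pstdev(values)}


def apply_scaler(params, values):
    """Standardize values using previously fitted parameters (hypothetical helper)."""
    return [(v - params["mean"]) / params["stdev"] for v in values]


# Training time: fit on the training samples, then persist the fitted state.
train_sepal_length = [5.1, 4.7, 6.9, 6.7, 5.1]
params = fit_scaler(train_sepal_length)
saved = json.dumps(params)

# Inference time: restore the fitted state and reuse it. Do not refit on new data.
restored = json.loads(saved)
new_scaled = apply_scaler(restored, [5.0, 6.0])
```

The key design point is that the new samples are transformed with the statistics learned during training, so they land in the same numeric space that the model was trained on.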

---

In [3]:
from aiqc import mlops, datum, tests

Below we're using AIQC's `tests.py` to quickly make a trained model so that we have examples to work with for making inference-based predictions.

In [4]:
%%capture
queue_multiclass = tests.tf_multi_tab
queue_multiclass.run_jobs()

## Predictor

Let's say that we have a trained model in the form of a `Predictor`,

In [5]:
predictor = queue_multiclass.jobs[0].predictors[0]

and that we have samples that we want to generate predictions for.

## New Splitset

In [6]:
df = datum.to_pandas('iris.tsv').sample(10)

In [7]:
df.tail()

| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 139 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
| 86 | 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
| 21 | 5.1 | 3.7 | 1.5 | 0.4 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 44 | 5.1 | 3.8 | 1.9 | 0.4 | setosa |


We'll fashion a new `Splitset` of the samples that we want to predict using the high-level API.

- Leave the `label_column` blank if you are conducting pure inference where you don't know the real Label/target. Otherwise, `splitset.label` will be used to generate metrics for your new predictions.
- Ultimately, any splits that you make will be ignored when calling `infer()` below as all samples from the `Dataset` will be utilized.

In [9]:
splitset = mlops.Pipeline.Tabular(
    df_or_path=df,
    label_column='species',
)
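The effect of supplying or omitting a label column can be pictured with a small stand-alone sketch. The function `summarize_inference` is a hypothetical helper written for this illustration, not an AIQC call: it always returns predictions, and only computes a metric when true labels are available.

```python
def summarize_inference(predictions, labels=None):
    """Return predictions, plus accuracy only when true labels are supplied."""
    summary = {"predictions": list(predictions)}
    if labels is not None:
        if len(labels) != len(predictions):
            raise ValueError("labels and predictions must have the same length")
        correct = sum(p == y for p, y in zip(predictions, labels))
        summary["accuracy"] = correct / len(labels)
    return summary


# Pure inference: no labels, so there is nothing to score against.
pure = summarize_inference(["setosa", "virginica"])

# Evaluation: labels are known, so a metric can be computed as well.
scored = summarize_inference(
    ["setosa", "virginica"], labels=["setosa", "versicolor"]
)
```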

## Run Inference

Then pass that `Splitset` to `Predictor.infer()`.

During `infer()`, AIQC validates that the schema of your new Splitset's `Feature` (the latest `Window`, if provided) and `Label` (if provided) matches the schema of the original training Splitset.

- `Dataset.Tabular` schema includes column ordering and dtype.
- `Dataset.Image` schema includes Pillow size (height/width) and mode (color dimensions).

In [10]:
prediction = predictor.infer(splitset_id=splitset.id)
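The tabular schema check described above (column ordering and dtype) can be sketched roughly as follows. This is an illustration under assumptions, not AIQC's actual validation code: `validate_schema` is a hypothetical function, and schemas are represented here as ordered lists of `(column, dtype)` pairs.

```python
def validate_schema(train_schema, new_schema):
    """Compare ordered (column, dtype) pairs and raise on the first mismatch."""
    if len(train_schema) != len(new_schema):
        raise ValueError(
            f"expected {len(train_schema)} columns, got {len(new_schema)}"
        )
    for position, (expected, actual) in enumerate(zip(train_schema, new_schema)):
        if expected != actual:
            raise ValueError(f"column {position}: expected {expected}, got {actual}")


train_schema = [
    ("sepal_length", "float64"),
    ("sepal_width", "float64"),
    ("petal_length", "float64"),
    ("petal_width", "float64"),
]

# An identical schema passes silently.
validate_schema(train_schema, list(train_schema))

# Swapping the first two columns changes the ordering, so validation fails.
swapped = [train_schema[1], train_schema[0]] + train_schema[2:]
try:
    validate_schema(train_schema, swapped)
    order_error = None
except ValueError as error:
    order_error = str(error)
```

Because the comparison is positional, two datasets with the same column names in a different order are treated as a mismatch, which is what a model expecting a fixed feature layout needs.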

If you encoded your Labels or generated unsupervised encoded data, don't worry, the output will be `inverse_transform`'ed as seen below.

In [11]:
prediction.predictions

{'infer': array(['virginica', 'virginica', 'setosa', 'versicolor', 'setosa',
        'virginica', 'versicolor', 'setosa', 'setosa', 'setosa'],
       dtype=object)}
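Conceptually, turning raw model output back into readable labels like those above involves reversing the label encoding. The sketch below shows the idea in plain Python; the mappings and the `decode` helper are hand-rolled stand-ins assumed for illustration, not the encoder that AIQC stores, and the probabilities are made-up values.

```python
# Encoding used at training time: class name -> integer index.
classes = sorted({"setosa", "versicolor", "virginica"})
to_index = {name: i for i, name in enumerate(classes)}

# Inverse mapping used at inference time: integer index -> class name.
to_name = {i: name for name, i in to_index.items()}


def decode(probabilities):
    """Pick the most probable class index per row and map it back to its name."""
    decoded = []
    for row in probabilities:
        best = max(range(len(row)), key=row.__getitem__)
        decoded.append(to_name[best])
    return decoded


# Hypothetical per-class probabilities for two samples.
model_output = [[0.1, 0.2, 0.7], [0.8, 0.1, 0.1]]
decoded_labels = decode(model_output)
```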

For more information on the `Prediction` object, reference the [Low-Level API](api_low_level.html) documentation.