# Reduce Data Movement
Larger pieces of data can take a non-trivial amount of time to move between host (CPU) and device (GPU) memory. For instance, a large image in a raw pixel format:

In [None]:
import torch

fake_image_data = torch.rand(1, 3, 800, 1280)

In [None]:
%%timeit
# host -> GPU -> host round trip (calling .to('cpu') on the original CPU tensor alone would be a no-op)
fake_image_data.to('cuda').to('cpu')
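To put that timing in perspective, the tensor holds 1 × 3 × 800 × 1280 float32 values at 4 bytes each, so every one-way transfer moves roughly 12.3 MB. A quick check of the arithmetic, using NumPy only for the byte counts (in PyTorch, `tensor.element_size() * tensor.nelement()` gives the same numbers):

```python
import numpy as np

shape = (1, 3, 800, 1280)  # same shape as fake_image_data above

# float32: 4 bytes per value
float_bytes = np.zeros(shape, dtype=np.float32).nbytes
# the same image as raw 8-bit pixels: 1 byte per value
uint8_bytes = np.zeros(shape, dtype=np.uint8).nbytes

print(f"float32: {float_bytes / 1e6:.1f} MB, uint8: {uint8_bytes / 1e6:.1f} MB")
# prints: float32: 12.3 MB, uint8: 3.1 MB
```

The round trip above moves the float32 copy twice. Keeping the data as raw uint8 pixels until it reaches the place that consumes it is one way to cut each transfer by 4×.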

Hosting the pre- and post-processing within your inference server, next to the model, is an easy way to avoid unnecessary transfers like this one.
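A minimal sketch of that idea, assuming a server that receives raw uint8 BGR pixels. `handle_request` and `run_model` are hypothetical names (the latter stands in for your real model), and NumPy stands in for whatever array library your server actually uses. The client sends 1 byte per value, the float32/NCHW conversion happens next to the model, and only a class index travels back.

```python
import numpy as np


def run_model(batch: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the real model: one score per channel."""
    return batch.mean(axis=(2, 3))  # shape (1, 3)


def handle_request(raw_bgr: np.ndarray) -> int:
    """Hypothetical server-side handler: pre- and post-processing stay in the server."""
    # pre-processing, done inside the server instead of by the client
    rgb = raw_bgr[..., ::-1].astype(np.float32) / 255.0  # BGR -> RGB, scale to [0, 1]
    batch = np.transpose(rgb, (2, 0, 1))[np.newaxis]      # HWC -> NCHW
    scores = run_model(batch)
    # post-processing, also inside the server: return only what the client needs
    return int(np.argmax(scores))


# usage: a pure-blue image in BGR order, sent as 1 byte per value
request = np.zeros((800, 1280, 3), dtype=np.uint8)
request[..., 0] = 255
print(request.nbytes, handle_request(request))  # prints: 3072000 2
```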

# Compile pre/post processing
Pre- and post-processing often take up a significant share of end-to-end latency, so they are worth accelerating too.

In [None]:
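import numba
import cv2
import numpy as np

# create data input
image_file_path = "sample_images/group-photo.jpg"

target_input_height = 800
target_input_width = 1280

# cv2.imread returns a uint8 HWC array in BGR order, or None if the file cannot be read
original_image = cv2.imread(image_file_path)
if original_image is None:
    raise FileNotFoundError(f"Could not read image: {image_file_path}")

# cv2.resize takes the target size as (width, height)
resized_image = cv2.resize(original_image, (target_input_width,
                           target_input_height))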

In [None]:
# pre-processing: BGR uint8 HWC image -> RGB float32 NCHW batch in [0, 1]
def pre_process_image(input_image):
    image_rgb = input_image[...,::-1] # BGR to RGB
    image = image_rgb.astype(np.float32)

    image = np.divide(image, 255)  # scale pixel values to [0, 1]
    image = np.transpose(image, (2, 0, 1))  # HWC to CHW

    image = np.expand_dims(image, axis=0) # add batch dimension
    
    return image

In [None]:
%%timeit 
image = pre_process_image(resized_image)

### Numba
See the Numba 5-minute guide for more info: https://numba.pydata.org/numba-doc/latest/user/5minguide.html

In [None]:
@numba.jit(nopython=True)
def fast_pre_process_image(input_image):
    image_rgb = input_image[...,::-1] # BGR to RGB
    image = image_rgb.astype(np.float32)

    image = np.divide(image, 255)
    image = np.transpose(image, (2, 0, 1))  # HWC to CHW

    image = np.expand_dims(image, axis=0) # add batch dimension
    
    return image

# note: Numba compiles the function the first time it is called, so call it once before timing
image = fast_pre_process_image(resized_image)

In [None]:
%%timeit 
image = fast_pre_process_image(resized_image)
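Compiled with `nopython=True`, the function above still allocates a new array at each step (`astype`, `np.divide`), just like the NumPy version, so the gain may be modest. Numba also compiles explicit loops, which allows the whole conversion to happen in a single pass with no intermediate arrays. The sketch below is one such variant: `fused_pre_process_image` is a hypothetical name, `parallel=True` with `prange` spreads the rows across CPU cores, and the import fallback only exists so the sketch runs (slowly) without Numba installed. Whether it actually beats the version above depends on the machine, so time it with `%%timeit` like the other cells.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Fallback so the sketch still runs without Numba (plain Python, much slower).
    prange = range

    def njit(*args, **kwargs):
        def decorator(func):
            return func
        return decorator


@njit(parallel=True)
def fused_pre_process_image(input_image):
    """BGR uint8 HWC image -> RGB float32 NCHW batch in [0, 1], in one pass."""
    height, width, channels = input_image.shape
    out = np.empty((1, channels, height, width), dtype=np.float32)
    for y in prange(height):  # rows are split across threads when Numba is available
        for x in range(width):
            for c in range(channels):
                # reading channel (channels - 1 - c) reverses BGR -> RGB
                out[0, c, y, x] = input_image[y, x, channels - 1 - c] / 255.0
    return out


# usage: compare against the step-by-step NumPy version on a small random image
rng = np.random.default_rng(0)
small = rng.integers(0, 256, size=(8, 10, 3), dtype=np.uint8)
result = fused_pre_process_image(small)
expected = np.expand_dims(
    np.transpose(small[..., ::-1].astype(np.float32) / 255, (2, 0, 1)), axis=0
)
print(result.shape, np.allclose(result, expected))  # prints: (1, 3, 8, 10) True
```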

You should consider a tool like NVIDIA's [DALI](https://github.com/NVIDIA/DALI) for building pre/post-processing pipelines. DALI pipelines can even be integrated into Triton through its [DALI backend](https://github.com/triton-inference-server/dali_backend) for a complete end-to-end inference pipeline.