Deployable AI aims to enable quick inference serving in a local environment, in various styles of your choice.
To install this package, the easiest way is to run `pip install dpai`. If you prefer to install directly from this repo's code, clone it and run `make`.
- Save your model in `.joblib` format. Example:

  ```python
  from joblib import dump

  your_model_artifact = {
      "model": your_model,
      # other metadata
      "tokenizer": ...,
      "quantization": ...,
      # ...
  }
  dump(your_model_artifact, "MODEL_ARTIFACT_PATH.joblib")
  ```
- Create an inference script `inference.py` with two functions, `input_fn` and `predict_fn` (similar to how SageMaker inference works). Usually you'll create one inference file for each model you register (a concrete end-to-end example follows this list). Example:

  ```python
  def input_fn(data):
      processed_data_for_model_input = ...  # some transformation logic
      return processed_data_for_model_input

  def predict_fn(input, model):
      result = model(input)
      return result
  ```
- Register your model: run `deployaible register --name=your_model_name --model_path=your_model_path --inference_path=your_inference_path`
- Serve your model: run `deployaible serve --port=your_port`. You will get a backend running on `your_port` (default is 9000). A sample endpoint will be `localhost:9000/your_model_name/predict`.
- Format your data input in JSON style: `{"data": your_input_data}`. Make sure it aligns with the `input_fn` in your inference script.
- Test the endpoint: example request (a Python version of this request is sketched after this list):
  `curl -X POST -H "Content-Type: application/json" -d '{"data": ["val"]}' http://localhost:9100/GPT4/predict`
- You can also access the APIs via the Swagger UI at `http://localhost:your_port/docs`.
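
For concreteness, here is a minimal end-to-end sketch of the steps above using a scikit-learn classifier. The model name `iris_clf`, the file names, and the training data are illustrative assumptions, not anything this package requires. First, build and save the model artifact:

```python
# save_model.py -- illustrative only; any picklable model works
from joblib import dump
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# The artifact layout (a dict with a "model" key plus optional metadata)
# follows the .joblib example earlier in this README.
dump({"model": model}, "iris_clf.joblib")
```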
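A matching `inference.py` might look like the sketch below. Note the assumption that `model` is the loaded artifact (the dict saved above); if your setup passes the bare estimator instead, drop the dict lookup.

```python
# inference.py -- illustrative input_fn/predict_fn pair for the artifact above
import numpy as np

def input_fn(data):
    # `data` is the value of the "data" field from the JSON request body,
    # e.g. [[5.1, 3.5, 1.4, 0.2]]; convert it to a 2-D feature array.
    return np.asarray(data, dtype=float)

def predict_fn(input, model):
    # Assumption: `model` is the dict saved in save_model.py; fall back to
    # treating it as the estimator itself if it is not a dict.
    estimator = model["model"] if isinstance(model, dict) else model
    return estimator.predict(input).tolist()
```

You would then register and serve it with the commands above, e.g. `deployaible register --name=iris_clf --model_path=iris_clf.joblib --inference_path=inference.py` followed by `deployaible serve --port=9000` (again, the name and port are just examples).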
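If you prefer Python over `curl` for testing, an equivalent request could look like this sketch (the port, model name, and payload match the illustrative example above, not any package defaults):

```python
# test_request.py -- send a prediction request to the locally served model
import requests

url = "http://localhost:9000/iris_clf/predict"  # adjust to your --name / --port
payload = {"data": [[5.1, 3.5, 1.4, 0.2]]}      # must match what input_fn expects

response = requests.post(url, json=payload)     # sends Content-Type: application/json
response.raise_for_status()
print(response.json())  # response shape depends on your predict_fn
```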
- Supports multiple types of model serving
- Sample UI
- Works on Linux/MacOS/Windows
- Currently, the only supported request type is `application/json`.
See the doc here