A typical Neuron developer flow consists of a compilation phase followed by deployment (inference) on one or more Inf1 instances.
To quickly start developing with Neuron:
- Set up your environment to compile and deploy on Inf1 instances:
ec2-then-ec2-setenv
- Run a tutorial from one of the leading machine learning frameworks supported by Neuron:
pytorch-tutorials
tensorflow-tutorials
mxnet-tutorials
- Explore more flows to develop with Neuron:
neuron-devflows
Customers can train their models anywhere, migrate their ML applications to Neuron, and run high-performance production predictions on Inferentia. Once a model is trained to the required accuracy, the model is compiled to an optimized binary form, referred to as a Neuron Executable File Format (NEFF), which the Neuron runtime driver loads to execute inference requests on the Inferentia chips. Developers can either train their models in fp16 or keep training in 32-bit floating point for best accuracy, in which case Neuron will auto-cast the 32-bit trained model to bfloat16 so that it runs at 16-bit speed.
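To make the precision trade-off of casting 32-bit values to bfloat16 concrete, the following is a minimal sketch of the number format only. It is not Neuron's casting implementation: the helper names (`to_bfloat16_bits`, `from_bfloat16_bits`, `round_trip`) are hypothetical, and round-to-nearest-even is an assumed rounding mode chosen for illustration. bfloat16 keeps the sign bit and all 8 exponent bits of a 32-bit float but only 7 of its 23 mantissa bits, so the value range is preserved while precision drops to roughly 2 to 3 decimal digits.

```python
import struct


def float32_bits(x: float) -> int:
    """Return the IEEE 754 binary32 bit pattern of x as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def to_bfloat16_bits(x: float) -> int:
    """Convert x to a 16-bit bfloat16 pattern (hypothetical helper).

    Assumption: round-to-nearest-even on the 16 discarded mantissa bits.
    """
    if x != x:  # NaN: return a canonical quiet NaN pattern
        return 0x7FC0
    bits = float32_bits(x)
    # Adding 0x7FFF plus the lowest kept bit implements round-to-nearest-even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def from_bfloat16_bits(b: int) -> float:
    """Expand a bfloat16 pattern back to a Python float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def round_trip(x: float) -> float:
    """Value of x after being stored as bfloat16 and read back."""
    return from_bfloat16_bits(to_bfloat16_bits(x))


if __name__ == "__main__":
    for value in (1.0, 3.14159265, 1e-3, 65504.0 * 4):
        approx = round_trip(value)
        rel_err = abs(approx - value) / abs(value)
        print(f"{value!r:>14} -> {approx!r:<14} relative error {rel_err:.2e}")
```

Values that are exactly representable, such as 1.0, survive unchanged, while a value like 3.14159265 becomes 3.140625. The last sample is larger than the fp16 maximum of 65504 yet still converts to a finite bfloat16 value, which shows why bfloat16 suits models trained in 32-bit floating point.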