# Exporting and saving machine learning models

Saving and exporting are key steps in model development: they preserve a model's state after training and make it possible to deploy the model in other environments. This guide covers the difference between the two, the most common formats, how to save a PyTorch model and export it to ONNX, and what to consider before going to production.

### Understanding Saving vs. Exporting

**Saving**
- Preserves the model's architecture, trained weights, and often associated configuration (such as hyperparameters or a vocabulary), so you don't need to retrain the model each time you use it.
- Primarily intended for later use within the same framework in which the model was trained, or a closely related environment.

**Exporting**
- Converts the model into a representation suitable for deployment in production environments or for use across different frameworks.
- Often involves optimizations or format changes for better inference speed and compatibility.


### Common Formats for Saving Models
**Framework Specific Formats**
- PyTorch (`.pth` or `.pt`): Saves either the entire model or just the state dictionary, which holds the learned parameters (weights and biases) and buffers but not the architecture.

- TensorFlow/Keras (`.h5` or `SavedModel`): TensorFlow offers multiple ways to save models: as a single HDF5 file containing the architecture, weights, and training configuration, or as a SavedModel directory, a more comprehensive format that also stores the serialized computation graph.

**Framework-Agnostic Formats**
- ONNX (Open Neural Network Exchange): A cross-platform format supported by many deep learning frameworks, which allows for model exchange between different tools.

### Saving Models in PyTorch
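To save a model in PyTorch, call `torch.save` on either the whole model object or only its state dictionary (`model.state_dict()`). Saving the state dictionary is usually the more portable choice, because a pickled whole model can only be loaded where the same class definitions are importable.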

In [2]:
import torch
from transformers import BertTokenizer, BertForQuestionAnswering

In [3]:
# Load a pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')

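In [4]:
# Save the entire model (pickles the full object, including its class reference)
torch.save(model, 'qa_model.pth')

In [5]:
# Save only the state dictionary, to a separate file so the
# full-model file written above is not overwritten
torch.save(model.state_dict(), 'qa_model_state.pth')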

### Exporting Models to ONNX
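Exporting a model to ONNX with `torch.onnx.export` requires a sample input, which is used to trace the computation graph. Put the model in evaluation mode first so that layers such as dropout behave as they do at inference time.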

In [6]:
import torch.onnx

# Set the model to evaluation mode
model.eval()

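# Create a dummy input; its shapes and dtypes matter more than its values
inputs = tokenizer("What is AI?", "AI is Artificial Intelligence", return_tensors="pt")

# Mark both the batch and sequence dimensions as dynamic; otherwise the
# exported graph only accepts the sequence length of the dummy input
dynamic = {0: 'batch_size', 1: 'sequence_length'}

# Export the model; token_type_ids is included because it separates
# the question from the context in a BERT question-answering input
torch.onnx.export(model,
                  args=(inputs['input_ids'], inputs['attention_mask'], inputs['token_type_ids']),
                  f="qa_model.onnx",
                  input_names=['input_ids', 'attention_mask', 'token_type_ids'],
                  output_names=['start_logits', 'end_logits'],
                  dynamic_axes={'input_ids': dynamic,
                                'attention_mask': dynamic,
                                'token_type_ids': dynamic,
                                'start_logits': dynamic,
                                'end_logits': dynamic})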


### Loading and Using Saved Models

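**Loading Models in PyTorch**

In [7]:
# Load the entire pickled model (the file written by torch.save(model, ...)).
# PyTorch 2.6 and later default to weights_only=True, so loading a full model
# needs weights_only=False; only do this for files you trust.
model = torch.load('qa_model.pth', weights_only=False)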
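
Loading a state dictionary works differently from loading a whole model: the architecture has to be re-created in code first, and the saved parameters are then copied into it with `load_state_dict`. The following is a minimal sketch of that round trip. `TinyNet` and the file name `tiny_state.pth` are hypothetical stand-ins chosen so the example runs quickly without downloading BERT; the same pattern should apply to the question-answering model above, assuming you re-create it with the same configuration.

```python
import os
import tempfile

import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a real model such as the BERT QA model."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        return self.linear(x)


trained = TinyNet()
trained.eval()

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "tiny_state.pth")

    # Save only the parameters and buffers, not the class itself
    torch.save(trained.state_dict(), path)

    # Re-create the architecture, then copy the saved parameters into it
    restored = TinyNet()
    state_dict = torch.load(path, map_location="cpu", weights_only=True)
    restored.load_state_dict(state_dict)

restored.eval()

# Both models should now produce identical outputs for the same input
sample = torch.ones(1, 4)
with torch.no_grad():
    outputs_match = torch.equal(trained(sample), restored(sample))
```

Passing `weights_only=True` restricts unpickling to tensors and other basic types, which is a reasonable default for files that only contain a state dictionary.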


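**Running an ONNX Model**

In [8]:
import onnxruntime as ort

session = ort.InferenceSession('qa_model.onnx', providers=['CPUExecutionProvider'])
# Tokenize to NumPy arrays and feed only the inputs the exported graph declares
encoded = tokenizer("What is AI?", "AI is Artificial Intelligence", return_tensors="np")
start_logits, end_logits = session.run(None, {i.name: encoded[i.name] for i in session.get_inputs()})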

### Considerations for Production
- Version Control: Version every saved or exported model, ideally alongside the code and configuration that produced it, so that updates can be tracked and rolled back.
- Testing: Before deployment, rigorously test the model's performance and stability in an environment similar to production.
- Optimization: Consider model optimization techniques for better performance, especially in resource-constrained environments (e.g., model quantization, pruning).
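
One lightweight way to approach the versioning point above is to store a small manifest next to each model file, recording a version string and a checksum that can be verified before deployment. The sketch below uses only the Python standard library. The helper names `file_sha256` and `write_manifest`, the manifest fields, and the version string `1.0.0` are all assumptions made for illustration, not part of any framework, and the demo writes a placeholder file rather than a real model.

```python
import hashlib
import json
import os
import tempfile
from datetime import datetime, timezone


def file_sha256(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(model_path, version, manifest_path):
    """Write a JSON manifest describing one model artifact (hypothetical layout)."""
    manifest = {
        "file": os.path.basename(model_path),
        "version": version,
        "sha256": file_sha256(model_path),
        "size_bytes": os.path.getsize(model_path),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    with open(manifest_path, "w", encoding="utf-8") as handle:
        json.dump(manifest, handle, indent=2)
    return manifest


# Demo with a placeholder file standing in for a real model artifact
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "qa_model.onnx")
    with open(model_path, "wb") as handle:
        handle.write(b"placeholder model bytes")

    manifest = write_manifest(model_path, "1.0.0", os.path.join(tmp_dir, "manifest.json"))

    # Before deploying, recompute the hash and compare it with the manifest
    checksum_ok = file_sha256(model_path) == manifest["sha256"]
```

A dedicated model registry or artifact store would normally take over this role in a larger setup; the manifest only illustrates the minimum information worth recording.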