# API Model Integration

In this notebook we integrate the fine-tuned BERT question-answering model into a real-world application. FastAPI is a convenient way to deploy and use the model in a production environment. The setup involves creating an API that receives a question and a context as input and returns the predicted answer. We will also create a Docker container for the FastAPI app.

### Saving the Model and Tokenizer
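After training the model in 4.1.1, you can save it along with its tokenizer to a directory. This is commonly done using the `save_pretrained()` method provided by the Hugging Face Transformers library. Note that `save_pretrained()` writes a directory, so `qa_model.pth` in the cell below is a directory name despite its file-like suffix.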

In [None]:
from transformers import BertTokenizer, BertForQuestionAnswering, AdamW
import torch

# Assume the rest of your model training setup is here

model_path = "qa_model.pth"

# Training loop here
# After training:
model.save_pretrained(model_path)
tokenizer.save_pretrained(model_path)
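
Before wiring the saved directory into the API, it can help to confirm that the save step produced the files `from_pretrained()` will look for. The sketch below uses a hypothetical helper, `missing_artifacts` (not part of Transformers), that checks for `config.json`, `vocab.txt`, and a weights file. The file list is an assumption: recent Transformers versions typically write `model.safetensors`, older ones `pytorch_model.bin`, and other versions may differ.

```python
from pathlib import Path

# Files a BERT model and tokenizer saved with save_pretrained() typically contain.
# Assumption: exact names can vary between Transformers versions.
REQUIRED_FILES = ("config.json", "vocab.txt")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")  # any one is enough


def missing_artifacts(model_dir):
    """Return the expected files that are absent from model_dir."""
    root = Path(model_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

Calling `missing_artifacts("qa_model.pth")` right after the save step should return an empty list if both `save_pretrained()` calls succeeded.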

### Integrating the Saved Model into FastAPI
Now that your model and tokenizer are saved, you can load them from the saved directory in your FastAPI application. This allows your API to use the fine-tuned model to answer questions. The example code below is stored in `app.py` in the `qa_app` folder.

```python
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import BertTokenizer, BertForQuestionAnswering
import torch
import os

app = FastAPI()

# Root directory that holds the saved model, taken from the environment
model_directory = os.getenv('MODEL_PATH', '/app/models')
# Name of the directory written by save_pretrained() (a directory, despite the suffix)
model_dirname = 'qa_model.pth'

# Full path to the saved model directory
model_path = os.path.join(model_directory, model_dirname)

# Load the model and tokenizer once, at startup
model = BertForQuestionAnswering.from_pretrained(model_path)
tokenizer = BertTokenizer.from_pretrained(model_path)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

class QAInput(BaseModel):
    question: str
    context: str

@app.post("/predict")
async def predict(data: QAInput):
    # Encode the question/context pair; truncate the context so the
    # input never exceeds BERT's 512-token limit
    inputs = tokenizer.encode_plus(
        data.question,
        data.context,
        return_tensors="pt",
        truncation="only_second",
        max_length=512,
    )
    input_ids = inputs["input_ids"].to(device)
    attention_mask = inputs["attention_mask"].to(device)

    # Make prediction
    with torch.no_grad():
        outputs = model(input_ids=input_ids, attention_mask=attention_mask)
        answer_start_scores, answer_end_scores = outputs.start_logits, outputs.end_logits

    # Find the tokens with the highest `start` and `end` scores.
    # The end index is exclusive, hence the + 1.
    answer_start = int(torch.argmax(answer_start_scores))
    answer_end = int(torch.argmax(answer_end_scores)) + 1

    # Convert tokens to answer string (empty if the end precedes the start)
    answer = tokenizer.convert_tokens_to_string(tokenizer.convert_ids_to_tokens(input_ids[0][answer_start:answer_end]))

    return {"question": data.question, "context": data.context, "answer": answer}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
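
Taking the `argmax` of the start and end scores independently can produce an end position that comes before the start, which yields an empty answer. A common remedy is to score start/end pairs jointly and keep only valid spans. The sketch below illustrates this on plain Python lists; `best_span` and `max_answer_len` are hypothetical names, not part of the app above or of Transformers.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (start, end) token indices, end inclusive, that maximize
    start_logits[start] + end_logits[end] subject to
    start <= end < start + max_answer_len."""
    best_score = float("-inf")
    best = (0, 0)
    for start, start_score in enumerate(start_logits):
        last = min(start + max_answer_len, len(end_logits))
        for end in range(start, last):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score, best = score, (start, end)
    return best


# Independent argmax would pick start=1 and end=0 here, an empty span.
start_logits = [0.1, 5.0, 0.2]
end_logits = [4.0, 0.1, 3.0]
print(best_span(start_logits, end_logits))  # (1, 2)
```

In the endpoint, the inputs would be `outputs.start_logits[0].tolist()` and `outputs.end_logits[0].tolist()`, and the answer tokens would be `input_ids[0][start:end + 1]`. This sketch does not restrict the span to context tokens, so a fuller version would also mask out the question and special tokens.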


### Create a Dockerfile
Create a `Dockerfile` in the same directory as your FastAPI app (`app.py`). This file will define the Docker image that includes your app and all its dependencies.

```dockerfile
# Use an official Python runtime as a parent image
FROM python:3.8-slim

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy the local directory contents into the container
COPY . .

# Install any needed packages specified in requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

# Make port 8000 available to the world outside this container
EXPOSE 8000

# Define environment variable
ENV MODEL_PATH=/usr/src/app/model

# Run app.py when the container launches
CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "8000"]
```
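
The Dockerfile's `MODEL_PATH` and the `os.path.join` call in `app.py` together determine where the container looks for the model: `/usr/src/app/model/qa_model.pth`. Because `COPY . .` copies the project directory as-is, this implies the saved model should sit under `model/qa_model.pth/` in the project before building. The sketch below mirrors that path logic and adds a fail-fast check; `resolve_model_dir` and `require_model_dir` are hypothetical helpers, not part of the original app.

```python
import os
import posixpath


def resolve_model_dir(environ, default_root="/app/models", name="qa_model.pth"):
    """Mirror app.py: join the MODEL_PATH root with the model directory name."""
    root = environ.get("MODEL_PATH", default_root)
    # posixpath keeps the result identical on any host OS; the container is Linux.
    return posixpath.join(root, name)


def require_model_dir(path):
    """Raise a descriptive error if the model directory is missing."""
    if not os.path.isdir(path):
        raise FileNotFoundError(
            f"Model directory {path!r} not found; check MODEL_PATH and the COPY step."
        )
    return path


print(resolve_model_dir({"MODEL_PATH": "/usr/src/app/model"}))
# /usr/src/app/model/qa_model.pth
print(resolve_model_dir({}))
# /app/models/qa_model.pth
```

Calling `require_model_dir(resolve_model_dir(os.environ))` before `from_pretrained()` would turn a misplaced model into a clear startup error rather than a less obvious loading failure.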


### Create a Requirements File
Create a `requirements.txt` file that lists the packages that your app depends on. Make sure to include fastapi, uvicorn, torch, transformers, and any other required libraries.

```txt
fastapi
uvicorn
torch
transformers
```


### Build the Docker Image
From your project directory (where your `Dockerfile` and `app.py` are located), run the following command to build the Docker image:

```bash
docker build -t qa_app .
```

<img src="./imgs/docker_build.png" alt="drawing" width="650"/>

### Run the Docker Container
```bash
docker run -p 8000:8000 qa_app
```

Docker runs the container and maps port 8000 of the container to port 8000 on your host, allowing us to access the FastAPI application from a browser, the `requests` library, or Postman.

<img src="./imgs/docker_run.png" alt="drawing" width="800"/>
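
Loading the model can take a while after the container starts, so requests sent immediately may be refused. The sketch below polls a URL until it answers with HTTP 200, using only the standard library. `wait_until_ready` is a hypothetical helper, and polling `/docs` relies on FastAPI serving its interactive documentation at that path by default, which holds unless the app disables it.

```python
import time
import urllib.request


def wait_until_ready(url, timeout=60.0, interval=1.0):
    """Poll url until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Covers refused connections, resets, timeouts, and HTTP errors
            # (urllib.error.URLError is a subclass of OSError).
            pass
        time.sleep(interval)
    return False


# Example (requires the container to be running):
# wait_until_ready("http://localhost:8000/docs")
```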

### Testing `qa_app` using `requests`

In [2]:
import requests

# The endpoint URL
url = 'http://localhost:8000/predict'

# Example question and context
data = {
    "question": "When did the Titanic sink?",
    "context": "The RMS Titanic sank in the early morning hours of April 15, 1912, after colliding with an iceberg during its maiden voyage from Southampton to New York City."
}

# Sending a POST request
response = requests.post(url, json=data)

# Print the response from the server
print("Status Code:", response.status_code)
print("Response:", response.json())


Status Code: 200
Response: {'question': 'When did the Titanic sink?', 'context': 'The RMS Titanic sank in the early morning hours of April 15, 1912, after colliding with an iceberg during its maiden voyage from Southampton to New York City.', 'answer': 'april 15 , 1912'}
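
The answer comes back as `'april 15 , 1912'` rather than `'April 15, 1912'` because it is rebuilt from tokens, presumably lowercased WordPiece tokens from an uncased BERT vocabulary. One way to return the original wording is to map the predicted token span back to character positions in the context. The sketch below assumes a fast tokenizer (for example `BertTokenizerFast`) called with `return_offsets_mapping=True`, which the app above does not use. The offsets and sequence ids here are written by hand to stand in for that output, and `span_to_text` is a hypothetical helper.

```python
def span_to_text(context, offsets, sequence_ids, start_tok, end_tok):
    """Map an inclusive token span back to the original context substring.

    offsets[i] is the (char_start, char_end) of token i within its own
    source string; sequence_ids[i] is None for special tokens, 0 for the
    question, and 1 for the context.
    """
    if start_tok > end_tok:
        return ""
    # Reject spans that touch the question or special tokens.
    if any(sequence_ids[i] != 1 for i in range(start_tok, end_tok + 1)):
        return ""
    return context[offsets[start_tok][0]:offsets[end_tok][1]]


context = "The Titanic sank on April 15, 1912."
# Hand-written stand-in for tokenizer output on ("When?", context):
# [CLS] when ? [SEP] the titanic sank on april 15 , 1912 . [SEP]
offsets = [(0, 0), (0, 4), (4, 5), (0, 0),
           (0, 3), (4, 11), (12, 16), (17, 19), (20, 25),
           (26, 28), (28, 29), (30, 34), (34, 35), (0, 0)]
sequence_ids = [None, 0, 0, None] + [1] * 9 + [None]

print(span_to_text(context, offsets, sequence_ids, 8, 11))  # April 15, 1912
```

With a real fast tokenizer, the offsets would come from the encoding's `offset_mapping` and the sequence ids from its `sequence_ids()` method.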


This result shows that the model can answer the question successfully. This notebook provides only a basic foundation for setting up a BERT question-answering model and deploying it via FastAPI. A production system would need more extensive training on diverse and complex datasets so that the model generalizes well across different types of queries, along with continuous monitoring and updating of the model to adapt to new data and user feedback.