In [None]:
%pip install flask requests

To add a model to your local Ollama setup and call it from Python, follow these steps:

1. Install Ollama
Ensure you have the Ollama CLI installed on your system. If you haven't done this yet, download and install it from the [Ollama website](https://ollama.com/).

2. Add a Model to Ollama
Ollama has a built-in command to download and add models. To add a model, open your terminal and use the following command:

```bash
ollama pull llama3.1:8b
```
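
To confirm from Python that the pull succeeded, one option is to ask the local Ollama server which models it has. The sketch below assumes the server is already running at Ollama's default address (`http://localhost:11434`) and uses its `/api/tags` endpoint, which lists local models. The helper names `model_is_available` and `list_local_models` are illustrative and not part of any library.

```python
import json
from urllib.request import urlopen

OLLAMA_BASE_URL = "http://localhost:11434"  # Assumption: Ollama's default address


def model_is_available(tags_payload, model_name):
    """Return True if model_name appears in a parsed /api/tags response."""
    models = tags_payload.get("models", [])
    return any(entry.get("name") == model_name for entry in models)


def list_local_models(base_url=OLLAMA_BASE_URL, timeout=5):
    """Fetch and parse the list of locally available models from Ollama."""
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


# Offline demonstration with a hand-written payload shaped like /api/tags output
sample = {"models": [{"name": "llama3.1:8b"}]}
print(model_is_available(sample, "llama3.1:8b"))
print(model_is_available(sample, "mistral:7b"))
```

With a running server, `model_is_available(list_local_models(), "llama3.1:8b")` performs the same check against the real model list.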

3. Ensure Ollama is Running
Make sure the Ollama server is running locally; by default it listens on http://localhost:11434. If it is not already running, start it with:

```bash
ollama serve
```
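
A quick way to check from a notebook cell that the server is reachable is to send it a plain HTTP request. This is a minimal sketch that assumes the default address `http://localhost:11434`; the function name `ollama_is_running` is hypothetical, and it uses only the standard library.

```python
from urllib.error import URLError
from urllib.request import urlopen


def ollama_is_running(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers with status 200 at base_url."""
    try:
        with urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status
        return False


print("Ollama reachable:", ollama_is_running())
```

The check only confirms that something answers on that port; it does not verify that a particular model has been pulled.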

# Instructions for Running Flask API in Jupyter Notebook

This guide walks you through setting up a local Flask API that mimics OpenAI's `/v1/completions` endpoint and forwards each request to a model served by Ollama (or another local language model).

## Prerequisites

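Ensure the following are installed:
- **Flask**: Python web framework used to serve the API.
- **Requests**: HTTP client the Flask app uses to forward prompts to Ollama.
- **Ollama** (or your preferred local LLM server): This guide assumes you're using Ollama, running locally as described above.

You can install the Python packages by running the following command in a cell:

```bash
%pip install flask requests
```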

## Steps to Set Up and Run the Flask App

### Write the Flask App

Copy the Flask app code provided below into a code cell and execute it.

### Run the Flask Server

You can run the Flask server directly from within a Jupyter Notebook. The server will be accessible at http://localhost:5000.

### Send Requests to the API

After the server is running, you can make POST requests to http://localhost:5000/v1/completions using tools like curl, Postman, or Python code within the notebook.

### Example Request

To test the API, you can use the following code inside the notebook:
```python
import requests

url = "http://localhost:5000/v1/completions"
data = {
    "prompt": "What is the capital of France?",
    "max_tokens": 50,
    "temperature": 0.7,
    "model": "llama3.1:8b"
}

response = requests.post(url, json=data)
print(response.json())
```

Or you can send the same request from a terminal with curl:

```bash
curl -X POST http://localhost:5000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
      "prompt": "What is the capital of France?",
      "max_tokens": 50,
      "temperature": 0.7,
      "model": "llama3.1:8b"
    }'
```
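
Both requests return JSON in the OpenAI completions shape, with the generated text under `choices[0].text`, or an `error` field when something goes wrong (for example, a missing prompt). The helper below is a small sketch for pulling the text out of such a response; the name `extract_completion_text` is illustrative, and the sample payload is hand-written rather than real model output.

```python
def extract_completion_text(payload):
    """Return the first choice's text, or raise ValueError if the API reported an error."""
    if "error" in payload:
        raise ValueError(f"API returned an error: {payload['error']}")
    choices = payload.get("choices") or []
    if not choices:
        raise ValueError("Response contains no choices")
    return choices[0]["text"]


# Hand-written payload shaped like the Flask app's successful response
ok = {"choices": [{"text": "Paris.", "index": 0, "finish_reason": "stop"}]}
print(extract_completion_text(ok))
```

In the Python example, this would be called as `extract_completion_text(response.json())`.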

### Stop the Server

The code below runs the Flask server in a background thread, so interrupting the kernel does not stop it. When you're done, restart the kernel to shut the server down and free port 5000.
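
If you would rather stop the server programmatically, one option is to keep a handle on a server object that has a `shutdown()` method. The sketch below uses the standard library's `wsgiref` reference server with a tiny stand-in app (`demo_app` is hypothetical); since a Flask `app` object is itself a WSGI callable, it could be passed to `make_server` in its place. This is a single-threaded server suited only to local experiments, and it is an alternative to the `app.run` approach used in this guide, not a drop-in copy of it.

```python
import threading
from urllib.request import urlopen
from wsgiref.simple_server import make_server


def demo_app(environ, start_response):
    """Tiny WSGI app standing in for the Flask `app` object."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


# Port 0 asks the OS for any free port; the guide itself uses port 5000
server = make_server("127.0.0.1", 0, demo_app)
server_thread = threading.Thread(target=server.serve_forever, daemon=True)
server_thread.start()

# Confirm the server answers while it is running
with urlopen(f"http://127.0.0.1:{server.server_port}/") as resp:
    body = resp.read()
print(body)

# Stop the server cleanly, without restarting the kernel
server.shutdown()
server_thread.join(timeout=5)
server.server_close()
print("Server thread alive:", server_thread.is_alive())
```

Keeping `server` in a notebook variable means a later cell can call `server.shutdown()` whenever you are done.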

In [1]:
from flask import Flask, request, jsonify
import requests  # For making HTTP requests to the Ollama API
import threading
import time

app = Flask(__name__)

# Ollama's OpenAI-compatible completions endpoint
OLLAMA_API_URL = "http://localhost:11434/v1/completions"  # Update if using a different port

@app.route('/v1/completions', methods=['POST'])
def completions():
    # Parse the incoming JSON request; use an empty dict if the body is
    # missing or is not valid JSON, so the prompt check below returns a 400
    data = request.get_json(silent=True) or {}

    # Extract necessary information from the request (mimicking OpenAI API)
    prompt = data.get("prompt", "")
    max_tokens = data.get("max_tokens", 100)
    temperature = data.get("temperature", 0.7)
    top_p = data.get("top_p", 1)
    model = data.get("model", "llama3.1:8b")  # Default to a model you have pulled

    # Error handling
    if not prompt:
        return jsonify({"error": "Prompt is required"}), 400

    try:
        # Forward the request to the Ollama API
        upstream = requests.post(
            OLLAMA_API_URL,
            json={
                "model": model,
                "prompt": prompt,
                "max_tokens": max_tokens,
                "temperature": temperature,
                "top_p": top_p
            },
            timeout=300  # Assumed upper bound; local generation can be slow
        )

        # Check if the request was successful
        if upstream.status_code != 200:
            return jsonify({"error": f"Failed to communicate with Ollama server: {upstream.text}"}), upstream.status_code

        # Extract the completion text from the Ollama server response
        completion_response = upstream.json()
        choice = completion_response['choices'][0]
        completion_text = choice['text']

        # Prefer the usage and finish reason reported by Ollama; fall back to
        # whitespace word counts, which only roughly approximate token counts
        prompt_words = len(prompt.split())
        completion_words = len(completion_text.split())
        usage = completion_response.get("usage") or {
            "prompt_tokens": prompt_words,
            "completion_tokens": completion_words,
            "total_tokens": prompt_words + completion_words
        }
        finish_reason = choice.get("finish_reason") or (
            "length" if completion_words >= max_tokens else "stop"
        )

        # Format the response like the OpenAI API
        result = {
            "id": completion_response.get("id", model),
            "object": "text_completion",
            "created": completion_response.get("created", int(time.time())),
            "model": model,
            "choices": [
                {
                    "text": completion_text,
                    "index": 0,
                    "logprobs": None,
                    "finish_reason": finish_reason
                }
            ],
            "usage": usage
        }

        return jsonify(result)

    except Exception as e:
        # Handle any errors and return a 500 status code with error message
        return jsonify({"error": str(e)}), 500

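# Function to run the Flask app in a separate thread.
# use_reloader=False is needed because the reloader only works in the main thread.
# Note: host='0.0.0.0' exposes the server (and, with debug=True, the interactive
# debugger) to other machines on your network; use host='127.0.0.1' to keep it local.
def run_app():
    app.run(debug=True, use_reloader=False, host='0.0.0.0', port=5000)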

# Start the Flask server in a background thread
flask_thread = threading.Thread(target=run_app)
flask_thread.start()


 * Serving Flask app '__main__'
 * Debug mode: on


 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5000
 * Running on http://192.168.1.128:5000
Press CTRL+C to quit
127.0.0.1 - - [11/Sep/2024 18:30:18] "POST /v1/completions HTTP/1.1" 200 -
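
The `completions` handler above only checks that a prompt is present, so a request with, say, a string for `max_tokens` is passed straight through to Ollama. A stricter check could look like the sketch below. The function name `validate_completion_request` and the specific rules (positive integer `max_tokens`, non-negative `temperature`) are assumptions chosen for illustration; the defaults mirror those used in the handler.

```python
def validate_completion_request(data):
    """Return (params, None) on success or (None, error_message) on failure."""
    if not isinstance(data, dict):
        return None, "Request body must be a JSON object"

    prompt = data.get("prompt", "")
    if not isinstance(prompt, str) or not prompt.strip():
        return None, "Prompt is required"

    max_tokens = data.get("max_tokens", 100)
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(max_tokens, bool) or not isinstance(max_tokens, int) or max_tokens < 1:
        return None, "max_tokens must be a positive integer"

    temperature = data.get("temperature", 0.7)
    if (isinstance(temperature, bool)
            or not isinstance(temperature, (int, float))
            or temperature < 0):
        return None, "temperature must be a non-negative number"

    params = {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": data.get("top_p", 1),
        "model": data.get("model", "llama3.1:8b"),
    }
    return params, None


params, error = validate_completion_request({"prompt": "Hi", "max_tokens": "50"})
print(params, error)
```

Inside the handler, a non-`None` error message would be returned with a 400 status, in the same way as the existing missing-prompt check.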
