# Run Ollama in a Single Colab Session

To enable a GPU in this notebook, select Runtime -> Change runtime type from the menu bar, then choose a GPU under Hardware accelerator.

Then, scroll to the Configuration [cell](#scrollTo=8WIVDY-V-kVw&line=1&uniqifier=1) and update it with your ngrok authentication token.

To run the notebook, select Runtime -> Run all. Then go to this [cell](#scrollTo=GZiFjH5QZFrd&uniqifier=1) and follow the instructions for updating your `.env` file.

# Check GPU

In [None]:
!nvidia-smi
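
If you'd rather confirm the GPU from Python, for example to fail fast before the install steps, here is a minimal sketch. It assumes `nvidia-smi` is on the `PATH` (as it is on Colab GPU runtimes) and uses its `--query-gpu=name --format=csv,noheader` query; `list_gpus` is a hypothetical helper name, not part of the original notebook.

```python
import shutil
import subprocess


def list_gpus() -> list[str]:
    """Return the names of visible NVIDIA GPUs, or an empty list if none are found."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools installed: most likely a CPU-only runtime.
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # nvidia-smi exists but could not talk to a GPU (e.g. no device attached).
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
print(gpus if gpus else "No GPU found: check Runtime -> Change runtime type.")
```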

# Install Dependencies

In [None]:
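!apt update -y -qq
!apt install -y -qq curl lshw libcairo2-dev pkg-config python3-dev
!curl -fsSL https://ollama.com/install.sh | sh  # official Ollama install script

!pip install -q flask flask-cors pyngrok requests

import subprocess
import time

# Start the Ollama server in the background. Its output goes to a log file
# instead of an unread pipe, which could fill up and block the server.
ollama_log = open("ollama.log", "w")
sub = subprocess.Popen(["ollama", "serve"], stdout=ollama_log, stderr=subprocess.STDOUT)
time.sleep(5)  # give the server a moment to start before pulling a model

!ollama pull zephyr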
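
`ollama serve` starts in the background, so the server may not be ready yet when `ollama pull` runs. One way to guard against that is to poll the server until it answers. The sketch below assumes the Ollama server replies with HTTP 200 to a plain `GET` on its root URL (by default `http://127.0.0.1:11434`) once it is up; `wait_for_server` and its timeout values are hypothetical choices, not part of Ollama.

```python
import time
import urllib.request


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError, connection refused, and socket timeouts are all OSError
            # subclasses: the server is not ready yet, so try again.
            pass
        time.sleep(interval)
    return False


# Usage (assumes the default Ollama address):
# if not wait_for_server("http://127.0.0.1:11434"):
#     raise RuntimeError("Ollama server did not start; check ollama.log")
```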

# Configuration

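Set your ngrok auth token below so the notebook can open a public tunnel. `OLLAMA_URL` is the address of the local Ollama server and can normally be left at its default.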

In [None]:
NGROK_AUTH_TOKEN = '' #@param {type:'string'}
OLLAMA_URL = 'http://127.0.0.1:11434' #@param {type:'string'}
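
Because `NGROK_AUTH_TOKEN` defaults to an empty string, a forgotten token only surfaces later as an ngrok error. A small check like the one below can catch that early. It is a sketch: `check_config` is a hypothetical helper, and it only verifies that the token is non-empty and that `OLLAMA_URL` looks like an `http(s)` URL with a host, not that either actually works.

```python
from urllib.parse import urlparse


def check_config(ngrok_auth_token: str, ollama_url: str) -> None:
    """Raise ValueError if the notebook configuration is obviously incomplete."""
    if not ngrok_auth_token.strip():
        raise ValueError(
            "NGROK_AUTH_TOKEN is empty; paste your ngrok auth token into the Configuration cell."
        )
    parsed = urlparse(ollama_url)
    # Require an explicit http/https scheme and a host[:port] part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"OLLAMA_URL {ollama_url!r} is not a valid http(s) URL.")


# Usage, with the variables defined in the Configuration cell:
# check_config(NGROK_AUTH_TOKEN, OLLAMA_URL)
```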

# Main Code

In [None]:
from flask import Flask, request
import json
import subprocess
import requests
from pyngrok import ngrok
from flask_cors import CORS

# Update .env file Instructions

After the cell below starts running, copy the public URL printed by ngrok and set `OLLAMA_BASE_URL` in your `.env` file to that value. The output will look similar to this:

```
NgrokTunnel: "http://f9a8-34-73-238-198.ngrok.io" -> "http://localhost:5000"
 * Serving Flask app '__main__'
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
```

Do not copy this URL; use the one printed when you actually run the cell below. For the example output above, you would set `OLLAMA_BASE_URL` to:

```
OLLAMA_BASE_URL=http://f9a8-34-73-238-198.ngrok.io
```

The URL changes every time you rerun this cell, so update your `.env` file whenever that happens.
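
If you restart the tunnel often, you could script the `.env` update instead of editing the file by hand. The helper below is a hypothetical sketch: it assumes a simple `KEY=value` `.env` format with one variable per line, replaces an existing `OLLAMA_BASE_URL` line or appends one, and leaves every other line untouched. Run it on the machine where your `.env` file lives, not in Colab.

```python
from pathlib import Path


def set_env_var(env_path: str, key: str, value: str) -> None:
    """Set `key=value` in a KEY=value style .env file, creating the file if needed."""
    path = Path(env_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        # Compare the part before the first "=" (whitespace-trimmed) with the key.
        if line.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            break
    else:
        lines.append(new_line)  # key not present yet: add it at the end
    path.write_text("\n".join(lines) + "\n")


# Usage with the example URL above (replace it with your own tunnel URL):
# set_env_var(".env", "OLLAMA_BASE_URL", "http://f9a8-34-73-238-198.ngrok.io")
```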

In [None]:
app = Flask(__name__)
ngrok.set_auth_token(NGROK_AUTH_TOKEN)
CORS(app)  # allow browser clients on other origins to call this API

@app.route('/api/generate', methods=['POST'])  # proxy completion requests to Ollama
def generate_completion():
  # Read fields from the form-encoded request body (not from URL query parameters)
  model = request.form.get('model')
  prompt = request.form.get('prompt')
  persona = request.form.get('persona')
  temperature = request.form.get('temperature')
  json_data = {  # JSON body for Ollama's /api/generate endpoint
    "model": model or 'zephyr',
    "prompt": prompt,
    "system": persona or "You are 2B from NieR Automata. Answer as 2B, the assistant, only.",
    # Form values arrive as strings, so convert temperature to a number
    "options": { "temperature": float(temperature) if temperature else 0.8 },
    "stream": False
  }
  # requests sets the Content-Type: application/json header when json= is used
  response = requests.post(f'{OLLAMA_URL}/api/generate', json=json_data)
  return response.json(), response.status_code  # forward Ollama's JSON and status code

@app.route('/api/pull', methods=['POST'])  # ask Ollama to download a model
def pull_model():
  model_name = request.form.get('name')
  json_data = { "name": model_name, "stream": False }
  response = requests.post(f'{OLLAMA_URL}/api/pull', json=json_data)
  return response.json(), response.status_code

def main():
  # The Ollama server was already started in the install cell above; starting
  # a second `ollama serve` here would fail because its port is already in use.
  http_tunnel = ngrok.connect(5000)  # open a public tunnel to the Flask port
  print(http_tunnel)
  app.run(port=5000)  # blocks while the Flask server runs

if __name__ == '__main__': main()
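
Once the tunnel is up, other programs call the proxy with form-encoded `POST` requests, since both routes read their inputs from `request.form`. Below is a minimal client sketch using only the standard library. `build_generate_request` and `generate` are hypothetical helpers, the base URL is whatever ngrok printed for you, and reading the `response` field assumes the proxy passes through Ollama's non-streaming `/api/generate` reply. Fields you omit (`model`, `persona`, `temperature`) fall back to the server-side defaults.

```python
import json
import urllib.parse
import urllib.request


def build_generate_request(base_url: str, prompt: str, **fields: str) -> urllib.request.Request:
    """Build a form-encoded POST to the notebook's /api/generate route."""
    form = {"prompt": prompt, **fields}  # e.g. model="zephyr", temperature="0.7"
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/generate",
        data=data,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def generate(base_url: str, prompt: str, **fields: str) -> str:
    """Send the request and return the generated text from the `response` field."""
    req = build_generate_request(base_url, prompt, **fields)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


# Usage (substitute the URL printed by your own ngrok tunnel):
# print(generate("http://f9a8-34-73-238-198.ngrok.io", "Who are you?", temperature="0.7"))
```

Building the request separately from sending it keeps the form encoding easy to inspect without a running server.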

## Leave the above cell running and the tab open

This is to ensure the runtime does not disconnect and shut down the server.

When you're done, remember to disconnect the runtime.