<a href="https://colab.research.google.com/github/celinaLind/pentagram/blob/main/Image_Generation_Walkthrough_Notes.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [4]:
%pip install modal


# Modal

Modal is a serverless container platform, meaning it spins servers up and down depending on demand.

By default, Modal containers spin down after 60 seconds of inactivity. Ways to change this:
- set the idle timeout directly: `@app.function(container_idle_timeout=<time in seconds>)`
- or always keep a few containers running: `@app.function(keep_warm=<number of containers to keep active>)`
  - this could cause issues with cost
- or use a cron job / scheduled keep-warm function that runs after a set amount of time
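
As a rough way to reason about the scheduled keep-warm option, the sketch below treats a container as staying warm only when the next ping arrives before the idle timeout runs out. This is a simplified model, not Modal's actual scheduler, and the function name `stays_warm` and the `margin_s` parameter are illustrative assumptions.

```python
def stays_warm(ping_interval_s: float, idle_timeout_s: float, margin_s: float = 0.0) -> bool:
    """Simplified model (assumption): the container survives between pings only
    if the next ping arrives before the idle timeout, minus a safety margin."""
    if ping_interval_s <= 0 or idle_timeout_s <= 0:
        raise ValueError("interval and timeout must be positive")
    return ping_interval_s < idle_timeout_s - margin_s


# With the default 60-second timeout, a ping every 5 minutes is too slow.
print(stays_warm(ping_interval_s=300, idle_timeout_s=60))   # False
# A timeout longer than the ping interval keeps the container alive in this model.
print(stays_warm(ping_interval_s=300, idle_timeout_s=360))  # True
```

In this model the ping interval should sit comfortably below the idle timeout, which is why a safety margin is included as an optional parameter.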

In [None]:
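%%writefile main.py
# note: avoid naming this file modal.py; `import modal` below would then import this file instead of the Modal package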
import modal
import io
from fastapi import Response, HTTPException, Query, Request
from datetime import datetime, timezone
import os

# Modal web endpoints are FastAPI apps under the hood
# os gives access to environment variables

# download the model (run while the image is built, see run_function below)
def download_model():
  # the imports are done within the function because these packages are
  # installed in the Modal image, not necessarily on your local machine
  from diffusers import AutoPipelineForText2Image
  import torch
  # load the model and set up the pipeline
  pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo",
                                                   torch_dtype=torch.float16,
                                                   # lower precision saves memory but can reduce quality
                                                   variant="fp16")

# set up the image (i.e. the Docker container)
image = (modal.Image.debian_slim()
         # the packages below are quite large, so to save local disk space
         # we install them into the Modal container image
         .pip_install("fastapi[standard]", "transformers", "accelerate", "diffusers", "requests")
         .run_function(download_model))

app = modal.App("sd-demo", image=image)

# app.cls registers a class whose methods run inside a Modal container
@app.cls(
    image=image,
    gpu="A10G",  # the load can be distributed across multiple GPUs if you receive a lot of traffic
    # pick the GPU based on your use case; the A10G is the cheapest good-quality GPU from Modal
    container_idle_timeout=300,  # seconds
    secrets=[modal.Secret.from_name("API_KEY_NAME")]
)
class Model:
  @modal.build()  # decorator for methods that should execute at build time
  @modal.enter()  # decorator for methods that need to be called when a new container starts
  # so this method runs both on app build and on container start
  def load_weights(self):
    from diffusers import AutoPipelineForText2Image
    import torch

    self.pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/sdxl-turbo",
                                          torch_dtype=torch.float16,
                                          variant="fp16")
    self.pipe.to("cuda")  # CUDA is NVIDIA's GPU computing platform
    # ^ this moves the pipeline onto the GPU so inference runs there

    self.API_KEY_NAME = os.environ["API_KEY_NAME"]

  # create a Modal endpoint (API call)
  @modal.web_endpoint()
  # a query parameter is part of the URL, like the video id in a YouTube link
  def generate(self, request: Request, prompt: str = Query(..., description="The prompt for image generation")):
    # get the API key from the request headers
    api_key = request.headers.get("X-API-KEY")  # the header 'X-API-KEY' might have a different name in your setup
    if api_key != self.API_KEY_NAME:
      # don't allow use of the API; respond with an error code
      raise HTTPException(status_code=401, detail="Unauthorized")

    # if you increase the inference steps: quality gets better, latency gets worse
    # test with different step counts (e.g. 10 or 20) <== will take a lot longer to deploy/generate
    # guidance_scale: how closely the model conforms to the prompt
    # the pipeline output's .images is a list of PIL images; [0] takes the first one
    image = self.pipe(prompt, num_inference_steps=1, guidance_scale=0.0).images[0]

    # create a buffer or in memory file to save image
    buffer = io.BytesIO()

    # save the image to buffer
    image.save(buffer, format="JPEG")

    # return image from buffer
    # media_type: lets the browser know what it will be getting
    return Response(content=buffer.getvalue(), media_type="image/jpeg")

    # TODO: add validation to confirm the correct image type is being returned
  # on the initial deployment/run the model has to be downloaded, so it could take 5-10 minutes or more
  # the end result of the deployment is a URL generated by Modal

  # after we get the URL we need to SECURE it so random people can't access the API
  # this can be done with an API key (add it as a secret in Modal; Python's secrets library can generate a secure token)

  @modal.web_endpoint()
  def health(self):  # not secured, but it does no computation so not a big deal
    """Lightweight endpoint for keeping the container warm"""
    return {"status": "healthy", "timestamp": datetime.now(timezone.utc).isoformat()}

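# warm-keeping function that runs every 5 minutes
# defined at module level (outside the Model class) since it is a plain scheduled function
@app.function(
    schedule=modal.Cron("*/5 * * * *"),
    secrets=[modal.Secret.from_name("API_KEY_NAME")]
)
def keep_warm():
  import requests

  # the function name is part of each endpoint's URL
  # placeholders: replace with the full https:// URLs of your deployed endpoints
  health_url = "vercelURL-functionName.modal.run"
  generate_url = "vercelurl-functionName.modal.run"

  # first check the health endpoint (no API key needed)
  # assumption: a plain GET request is enough for this check
  requests.get(health_url)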

In [None]:
// in another repo, to call the API from the backend
// in the route.ts file

// keep in mind: NEVER call an external API from the frontend; always call it from the backend

// first get the prompt from the user ==> read it from the request body (`text` below)

// save the Modal URL as an environment variable
const url = new URL("construct new url using deployed modal url")

url.searchParams.set("prompt", text)

// after the URL is created with its parameters,
// create the fetch request
const response = await fetch(url.toString(), {
    method: "GET",
    headers: {
        "X-API-KEY": process.env.API_KEY_NAME,
        Accept: "image/jpeg"
    }
})

// error handling


// add blob storage for generated images
// create a database in Vercel > Blob > it will provide a blob read/write token to use
// install and import @vercel/blob and its put function
// import crypto

// read the data received from the API
const imageBuffer = await response.arrayBuffer();

// use the crypto library to create a unique filename for each generated image
const filename = `${crypto.randomUUID()}.jpg`

// store the file path you get from Vercel in the database

const blob = await put(filename, imageBuffer,
                       { access: "public",
                         contentType: "image/jpeg" })
// when you add it to Vercel, Vercel returns a public URL for use
// then return the blob image
return NextResponse.json({
    ...
})

In [None]:
// handle the frontend response

const [imageUrl, setImageUrl] = useState...

// if the response data has no imageUrl, return an error
// if successful, update the state variable on load and create a new image
// then show the generated image on the frontend

# Securing App

## Public API endpoint
-- need to secure the generate-image endpoint

**Things to do** (refactor code):
1. send the request through the server
2. use server actions

Separate the client-side code from what runs on the server by creating a separate server action within the actions folder.

"use server" < put at the top of the file
-- since the code runs on the server, you can use the secret there: the client never sees the request headers, so the key is not exposed

### Make a component for client-side code (if possible)
--> instead of making the fetch request in a "use client" file, you call a function within a "use server" file that handles the API request and response


* will need to verify the API secret in the route file
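
The secret check described above can be sketched in Python, mirroring what the Modal endpoint does with the `X-API-KEY` header. This is a minimal sketch, not the walkthrough's actual route code: the helper names `generate_api_key` and `is_authorized` are hypothetical, and the constant-time comparison via `hmac.compare_digest` is a hardening choice that the notes do not mention.

```python
import hmac
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Create a random URL-safe token to use as the shared secret."""
    return secrets.token_urlsafe(num_bytes)


def is_authorized(headers: dict, expected_key: str, header_name: str = "X-API-KEY") -> bool:
    """Return True only if the request headers carry the expected key."""
    provided = headers.get(header_name)
    if not provided or not expected_key:
        return False
    # compare_digest takes the same time whether or not the prefix matches,
    # which avoids leaking information about the key through timing.
    return hmac.compare_digest(provided.encode(), expected_key.encode())


key = generate_api_key()
print(is_authorized({"X-API-KEY": key}, key))      # True
print(is_authorized({"X-API-KEY": "wrong"}, key))  # False
print(is_authorized({}, key))                      # False
```

The same idea applies in the route file: read the header, compare it against the secret held in an environment variable, and return a 401 when it does not match.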

## Other securing features:
- encryption
- keep the Modal URL in the env file
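
To illustrate keeping the Modal URL out of the source code, here is a small Python sketch that reads the base URL from the environment and attaches the prompt as a query parameter. The variable name `MODAL_URL`, the helper names, and the example URL are assumptions made for illustration only.

```python
import os
from urllib.parse import urlencode, urlsplit, urlunsplit


def load_modal_url(env=os.environ, var_name: str = "MODAL_URL") -> str:
    """Read the endpoint URL from the environment (variable name is hypothetical)."""
    url = env.get(var_name)
    if not url:
        raise RuntimeError(f"{var_name} is not set")
    return url


def build_generate_url(base_url: str, prompt: str) -> str:
    """Attach the prompt as a URL-encoded query parameter."""
    parts = urlsplit(base_url)
    query = urlencode({"prompt": prompt})
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


base = load_modal_url({"MODAL_URL": "https://example.modal.run"})
print(build_generate_url(base, "a red fox"))
# https://example.modal.run?prompt=a+red+fox
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.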

# Setup Postgres
Can be done through Vercel.
- store user data
  - image URL
  - user prompt
  - creation time
  - latency for generation (record a start time, e.g. `const start = Date.now()`, and subtract it once the image comes back)
    - can be used for analytics to log trends, cross-reference with the Modal dashboard, etc.
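
A rough sketch of the record described above, using Python's built-in `sqlite3` as a stand-in for Postgres so the example runs anywhere. The table name, column names, and the `record_generation` helper are assumptions; a real Postgres schema would more likely use types such as `TIMESTAMPTZ` for the creation time.

```python
import sqlite3
import time
from datetime import datetime, timezone

# Hypothetical schema covering the fields listed in the notes.
SCHEMA = """
CREATE TABLE IF NOT EXISTS generations (
    id INTEGER PRIMARY KEY,
    image_url TEXT NOT NULL,
    prompt TEXT NOT NULL,
    created_at TEXT NOT NULL,
    latency_ms REAL NOT NULL
)
"""


def record_generation(conn, prompt, generate):
    """Time generate(prompt), store one row, and return the image URL."""
    start = time.perf_counter()
    image_url = generate(prompt)
    latency_ms = (time.perf_counter() - start) * 1000
    conn.execute(
        "INSERT INTO generations (image_url, prompt, created_at, latency_ms) "
        "VALUES (?, ?, ?, ?)",
        (image_url, prompt, datetime.now(timezone.utc).isoformat(), latency_ms),
    )
    conn.commit()
    return image_url


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# The lambda stands in for the real call to the image generation API.
record_generation(conn, "a red fox", lambda p: "https://example.com/fox.jpg")
print(conn.execute("SELECT prompt, image_url FROM generations").fetchall())
# [('a red fox', 'https://example.com/fox.jpg')]
```

Storing latency per request makes it possible to chart trends over time and compare them with what the hosting dashboard reports.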
  
--> could technically provide multiple images, since the pipeline returns them as a list

--> create a system to deter inappropriate or malicious prompts for image generation
  - can be done through an LLM directly or within prompts
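
To show where such a check would sit in the request flow, here is a deliberately naive keyword blocklist in Python. It is a much weaker stand-in for the LLM-based moderation the notes suggest, and the blocked terms and the `is_prompt_allowed` helper are placeholders chosen for illustration.

```python
import re

# Placeholder terms; a real deployment would need a far more careful policy.
BLOCKED_TERMS = frozenset({"gore", "nudity"})


def is_prompt_allowed(prompt: str, blocked_terms=BLOCKED_TERMS) -> bool:
    """Reject a prompt if any whole word matches a blocked term (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9']+", prompt.lower()))
    return words.isdisjoint(blocked_terms)


print(is_prompt_allowed("a red fox in the snow"))  # True
print(is_prompt_allowed("Extreme GORE scene"))     # False
```

The check would run before the prompt is sent to the image model, so a rejected prompt never reaches the GPU endpoint.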