<a href="https://colab.research.google.com/github/FrescoDev/stabe-diffusion-2.1-gpt-experiment/blob/main/stabe_diffusion_2_1_gpt_experiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Stable Diffusion & GPT-3 to Generate Animations on Replicate**

Simple Colab setup that uses lyrics (or any general input) as a base prompt, generates prompt variations from it, feeds those into SD v2.1, and generates latent-space traversal GIFs/MP4s using Replicate's API.
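
A latent-space traversal moves gradually between two points in a model's latent (or prompt-embedding) space and renders a frame at each step. The sketch below illustrates the idea with spherical linear interpolation (slerp) on plain Python lists. It is an illustration only: `slerp` is a hypothetical helper, the 2-D vectors are toy values, and the Replicate model used later may interpolate differently.

```python
import math

def slerp(v0, v1, t):
    """Spherically interpolate between vectors v0 and v1 for t in [0, 1]."""
    norm0 = math.sqrt(sum(a * a for a in v0))
    norm1 = math.sqrt(sum(b * b for b in v1))
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))  # clamp rounding error before acos
    omega = math.acos(dot)  # angle between the two vectors
    if math.sin(omega) < 1e-6:
        # Nearly (anti)parallel vectors: fall back to plain linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    s0 = math.sin((1 - t) * omega) / math.sin(omega)
    s1 = math.sin(t * omega) / math.sin(omega)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Five evenly spaced points between two toy 2-D "latents"
frames = [slerp([1.0, 0.0], [0.0, 1.0], i / 4) for i in range(5)]
print(frames)
```

Each entry of `frames` would correspond to one animation frame; more steps give a smoother transition.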

In [None]:
#@title Install dependencies
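# The prompt-generation cell uses the legacy openai.Completion interface, which was removed in openai 1.0
!pip install "openai<1.0"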
!pip install -U git+https://github.com/huggingface/diffusers.git
!pip install transformers
!pip install accelerate
!pip install replicate

In [None]:
#@title Enter your API Keys
#@markdown 
OPEN_AI_KEY = "" #@param {type:"string"}
REPLICATE_API_KEY = "" #@param {type:"string"}

In [None]:
#@title Helper functions
from PIL import Image

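# Define image grid to display multiple generations nicely
def image_grid(imgs, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid, filling row by row."""
    assert len(imgs) == rows * cols, "expected exactly rows * cols images"

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # Column index is i % cols, row index is i // cols
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid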
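
The paste offsets in `image_grid` come from integer arithmetic: image `i` lands in column `i % cols` and row `i // cols`. A small pure-Python sketch of that arithmetic follows; `grid_boxes` is a hypothetical helper that only mirrors the `box=` expression above and does not need PIL.

```python
def grid_boxes(n_images, cols, w, h):
    """Return the top-left (x, y) paste offset of each image, filling row by row."""
    return [(i % cols * w, i // cols * h) for i in range(n_images)]

# Four 560x560 images laid out in two columns
boxes = grid_boxes(4, cols=2, w=560, h=560)
print(boxes)  # [(0, 0), (560, 0), (0, 560), (560, 560)]
```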

In [None]:
#@title Download SD v2.1
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base").to("cuda")

In [None]:
#@title Theme / Inspiration / Lyrical segment
#@markdown
# Example below is a line from Tupac's "Me Against the World"
lyrics = "The message I stress: to make it stop, study your lessons Don't settle for less, even the genius asks his questions" #@param {type:"string"}


In [None]:
#@title Generate prompts using GPT-3 (davinci 3)
import openai
openai.api_key = OPEN_AI_KEY

# Quite a verbose prompt that produced nice HD shoe images; it could probably be tweaked for different purposes.
# It is built from adjacent string literals because a single-quoted string cannot span several lines.
gpt_prompt = (
  'Produce variations on this comma separated list of words which describes a beautiful artistic image. These words and word combinations will map to points in latent space and convey some semantic meaning which will then be used to generate images using novel diffusion algorithms. Use this information to explore different prompt variations.\n\n'
  'Here are some good prompt examples: \n\n'
  'hi-res, professional HD 8k, side profile wide shot of a minimal Basquiat themed futuristic art piece on a wide plain background, centered, ultrarealistic, cinematic lighting, fashion, stunning design, breathtakingly detailed photography|'
  'hi-res, professional HD 8k, side profile panning shot of a detailed Basquiat abstract art piece on a grey mottled background, artfully framed, beautifully lit, fashionable, bold design, breathtakingly vivid photography|'
  'HD super clear 4k video, front-on zooming shot of a modern Basquiat painting on a bright white background, centred, surreal lighting, edgy, visually captivating design, detailed photography |'
  'afro-futurist styled hi-res wide angle shot of a graffiti-style Basquiat mural on a desolate landscape background, off-centre, delicate lighting, stylish, colourful design, artfully composed photography|'
  'abstract art canvas, front-on still shot of a vibrant abstract Basquiat artwork on a ghostly blurred background, balanced, romantic lighting, creative, intricate design, captivatingly crafted photography|'
  'HQ digital art, side profile panorama of a cuboid Basquiat painting on a textured beige background, off-centre, low-key lighting, funky, stunning design, artistically composed photography\n\n'
  'Now, come up with 3 interesting prompts that are relevant to the following lyrics. Try not to make these generic, be creative and try and think outside of the box. Do not overuse common tropes like love but capture the nuance and cultural context of the artist baring their emotions on the track. \n'
  'Try exploring a diverse set of themes, focusing on the other words in the lyrics.\n\n'
  f'Lyrics:\n\n{lyrics}\n\n'
  '|Lo-fi iphone footage, eerie side on shot of an anonymous activist making a silent stand before a dilapidated, old city skyline at night, creatively highlighted and framed, questioning yet powerful design, thought-provoking, in the style of basquiat and warhol|'
)

response = openai.Completion.create(
  model="text-davinci-003",
  prompt=gpt_prompt,
  temperature=0.96,
  max_tokens=777,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)

raw_gen = response.choices[0].text
# Strip whitespace and drop empty pieces left by leading or trailing '|' separators
prompts = [p.strip() for p in raw_gen.split('|') if p.strip()]

print(prompts)
print(len(prompts))
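
The prompt above follows a few-shot pattern: an instruction, several `|`-separated example prompts, the lyrics, and a seed prompt ending in `|` so that the model continues the list. A minimal sketch of assembling that structure from parts, which makes it easier to swap examples in and out; `build_prompt` and the sample strings are hypothetical stand-ins, not part of the notebook.

```python
def build_prompt(instruction, examples, lyrics, seed):
    """Assemble instruction, '|'-separated examples, lyrics and a seed prompt."""
    return (
        f"{instruction}\n\n"
        + "|".join(examples)
        + f"\n\nLyrics:\n\n{lyrics}\n\n|{seed}|"
    )

demo = build_prompt(
    "Produce variations on these prompts.",
    ["hi-res wide shot of a mural", "HQ digital art of a painting"],
    "study your lessons",
    "Lo-fi footage of a city skyline",
)
print(demo)
```

Ending on the seed prompt plus a trailing `|` nudges the completion to keep the same separator, which is what the later `split('|')` relies on.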

In [None]:
#@title Edit negative prompts (optional)
#@markdown
neg_prompts = "words, writing, blurry face, ugly, bad, plain, boot, fake, CGI, cropped, blurry, rough, low quality, low res, grainy" #@param {type:"string"}

In [None]:
#@title Generate images from prompts using SD v2.1
# Plenty of things here can be adjusted, e.g. to keep memory down, pass fewer prompts (slice `prompts`) or lower height/width (both must be divisible by 8).
neg_prompt_list = [neg_prompts] * len(prompts)
images = pipe(prompts, height=560, width=560, negative_prompt=neg_prompt_list).images
grid = image_grid(images, rows=1, cols=len(prompts))
grid
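
If the GPU runs out of memory when all prompts go through `pipe` at once, one option is to process them in small batches. The sketch below shows only the chunking step; `batched` is a hypothetical helper, and the commented lines assume the `pipe`, `prompts` and `neg_prompts` objects from the cells above.

```python
def batched(items, batch_size):
    """Yield consecutive slices of items with at most batch_size elements each."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

demo_prompts = ["prompt a", "prompt b", "prompt c"]
batches = list(batched(demo_prompts, 2))
print(batches)  # [['prompt a', 'prompt b'], ['prompt c']]

# With the notebook's objects this could look like:
# images = []
# for batch in batched(prompts, 2):
#     images += pipe(batch, height=560, width=560,
#                    negative_prompt=[neg_prompts] * len(batch)).images
```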

In [None]:
#@title Save prompts to file (if they look cool)
# Open the file in append mode so that earlier saves are kept
with open('prompts.txt', 'a') as f:
  for prompt in prompts:
    # Check if the element is a string
    if isinstance(prompt, str):
      # Ask the user whether to keep this prompt
      user_input = input(f'Add "{prompt}" to the file? (y/n) ')
      if user_input.strip().lower() == 'y':
        # Write the prompt to the file, one per line
        f.write(prompt.strip() + '\n')
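
Because `prompts.txt` is opened with mode `'a'`, re-running the cell can store the same prompt twice. A possible non-interactive variant appends only prompts that are not already in the file; `append_new_prompts` is a hypothetical helper, and the temporary file is used purely for demonstration.

```python
import os
import tempfile

def append_new_prompts(path, prompts):
    """Append prompts that are not yet in the file; return the ones added."""
    existing = set()
    if os.path.exists(path):
        with open(path) as f:
            existing = {line.strip() for line in f}
    added = []
    with open(path, "a") as f:
        for prompt in prompts:
            cleaned = prompt.strip()
            if cleaned and cleaned not in existing:
                f.write(cleaned + "\n")
                existing.add(cleaned)
                added.append(cleaned)
    return added

demo_path = os.path.join(tempfile.mkdtemp(), "prompts.txt")
first = append_new_prompts(demo_path, ["a bold mural", "a quiet skyline"])
second = append_new_prompts(demo_path, ["a quiet skyline", "a neon street"])
print(first, second)
```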


In [None]:
#@title Use Replicate to generate animations from prompts
import os
import replicate


os.environ['REPLICATE_API_TOKEN'] = REPLICATE_API_KEY

model = replicate.models.get("andreasjansson/stable-diffusion-animation")
version = model.versions.get("ca1f5e306e5721e19c473e0d094e6603f0456fe759c10715fcd6c1b79242d4a5")
output = version.predict(gif_frames_per_second=30, num_animation_frames=25, num_inference_steps=50, prompt_strength=0.9, num_interpolation_steps=5, prompt_start=prompts[0], prompt_end=prompts[1])
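
The call above animates only from `prompts[0]` to `prompts[1]`. To cover every generated prompt, one option is to animate each consecutive pair and join the clips afterwards. The sketch below shows only the pairing step; `consecutive_pairs` is a hypothetical helper, and each pair would feed one `version.predict(...)` call as `prompt_start` and `prompt_end`.

```python
def consecutive_pairs(items):
    """Return [(items[0], items[1]), (items[1], items[2]), ...]."""
    return list(zip(items, items[1:]))

pairs = consecutive_pairs(["prompt a", "prompt b", "prompt c"])
print(pairs)  # [('prompt a', 'prompt b'), ('prompt b', 'prompt c')]
```

A list with fewer than two prompts yields no pairs, which also guards against indexing `prompts[1]` when GPT-3 returns a single prompt.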
