# Computer Vision 🤗

In the final part of the project, which converts a book summary into an image, I found a text-to-image model on Hugging Face 🤗 called [`stable-diffusion`](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0?text=The+book+follows+the+journey+of+an+Anabaptist+radical+across+Europe+in+the+first+half+of+the+16th+century+as+he+joins+in+various+movements+and+uprisings+that+come+as+a+result+of+the+Protestant+reformation.+The+book+spans+30+years+as+he+is+pursued+by+%5C%27Q%5C%27+%28short+for+%22Qo%C3%A8let%22%29%2C+a+spy+for+the+Roman+Catholic+Church+cardinal+Giovanni+Pietro+Carafa.+The+main+character%2C+who+changes+his+name+many+times+during+the+story%2C+first+fights+in+the+German+Peasants%5C%27+War+beside+Thomas+M%C3%BCntzer%2C+then+is+in+M%C3%BCnster%5C%27s+siege%2C+during+the+M%C3%BCnster+Rebellion%2C+and+some+years+later%2C+in+Venice.). Stable Diffusion is a deep-learning text-to-image model based on diffusion techniques, released by [`Stability AI`](https://stability.ai/) in 2022. It is considered part of the ongoing artificial intelligence boom. However, because I did not have enough memory to run the model locally, I used the [`Clipdrop`](https://clipdrop.co/apis/docs/text-to-image) API instead, which allowed me to make 100 requests for free. Using together.ai and the Llama language model, I condensed the book summary and then sent the condensed summary to Stable Diffusion through Clipdrop to generate an image from it. Here is the first generated image :)
The book was 'The New Prophecy Warriors: Sunset'.

<center><img src="firstGeneratedImage.png" alt="First generated image" style="width:50%;"/></center>

In [2]:
import pandas as pd
import requests
from together import Together
import os
import cv2
import numpy as np
from dotenv import load_dotenv
from diffusers import DiffusionPipeline
import torch

In [3]:
df = pd.read_excel('bookInfo.xlsx')

### Stable Diffusion

I downloaded Stable Diffusion directly from Hugging Face; the download was about 7 gigabytes. However, due to memory constraints, I couldn't run the model locally.
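A rough back-of-the-envelope check of why the download is about 7 gigabytes, and why that is a lot for a small machine. The parameter count below (roughly 3.5 billion for the SDXL base pipeline) is an assumption taken from the published model description, not something measured in this notebook, and the helper name `weights_size_gb` is made up for this sketch.

```python
def weights_size_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in gigabytes (10**9 bytes)."""
    return n_params * bytes_per_param / 1e9


# Assumption: ~3.5 billion parameters for the SDXL base pipeline.
ASSUMED_SDXL_BASE_PARAMS = 3.5e9

fp16_gb = weights_size_gb(ASSUMED_SDXL_BASE_PARAMS, 2)  # float16 uses 2 bytes per value
fp32_gb = weights_size_gb(ASSUMED_SDXL_BASE_PARAMS, 4)  # float32 uses 4 bytes per value
print(f"fp16 weights: ~{fp16_gb:.1f} GB, fp32 weights: ~{fp32_gb:.1f} GB")
```

Under that assumption, the half-precision weights alone have to fit in RAM or VRAM before a single image is generated, and the refiner brings additional weights of its own.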

In [None]:
# load both base & refiner
base = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
# base.to("cuda")
refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
# refiner.to("cuda")

In [None]:
# Define how many steps and what % of steps to run on each expert (80/20) here
n_steps = 40
high_noise_frac = 0.8

# Use the first book's summary text; df[0] would look up a column named 0 and raise a KeyError
prompt = f"Depict a picture based on the summary of a book which will come in the following: {df['summary'].iloc[0]}"

# run both experts
image = base(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_end=high_noise_frac,
    output_type="latent",
).images
image = refiner(
    prompt=prompt,
    num_inference_steps=n_steps,
    denoising_start=high_noise_frac,
    image=image,
).images[0]
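To make the 80/20 split concrete, here is a small sketch of the arithmetic behind `n_steps` and `high_noise_frac`. The helper `split_steps` is hypothetical and only illustrates the idea; the exact rounding that `diffusers` applies internally may differ.

```python
def split_steps(n_steps: int, high_noise_frac: float) -> tuple[int, int]:
    """Return (base_steps, refiner_steps) for one shared denoising schedule."""
    if not 0.0 <= high_noise_frac <= 1.0:
        raise ValueError("high_noise_frac must be between 0 and 1")
    # The base model handles the first, high-noise part of the schedule.
    base_steps = round(n_steps * high_noise_frac)
    # The refiner picks up where the base model stopped.
    return base_steps, n_steps - base_steps


base_steps, refiner_steps = split_steps(40, 0.8)
print(base_steps, refiner_steps)
```

With the values used in this notebook, the base model would cover roughly the first 32 steps and the refiner the remaining 8.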


### Stable Diffusion via Clipdrop

In [41]:
random_book = df.sample()

In [42]:
random_book

| Unnamed: 0 | bookName | author | publishDate | genres | summary |
|---|---|---|---|---|---|
| 930 | The First Circle | Aleksandr Solzhenitsyn | 1968 | ['Fiction', 'Novel'] | Innokentii Volodin, a diplomat, makes a telep... |


In [43]:
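# Take the summary string itself; formatting the one-row Series into the prompt
# would insert its truncated repr ("... makes a telep...") instead of the full text
random_summary = random_book['summary'].iloc[0]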

In [44]:
# Load environment variables from .env file
load_dotenv()

# Create client
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))


response = client.chat.completions.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    messages=[{"role": "user", "content": f"Condense the provided book summary into a concise 900-word limit without adding any extra information like 'Here is the condensed summary within the 900-word limit'.: {random_summary}"}]
)
condensed_summary  = response.choices[0].message.content



In [45]:
condensed_summary

"Innokentii Volodin, a diplomat, makes a telepathic connection with a mysterious woman named Katya, who is hiding a dark secret. As their connection deepens, Volodin becomes entangled in a web of intrigue and deception, threatening to destroy his career and personal life.\n\nVolodin's life is turned upside down when he meets Katya, a woman with a troubled past and a mysterious connection to the Russian government. Despite the danger, Volodin finds himself drawn to Katya, and their telepathic bond grows stronger with each passing day.\n\nAs Volodin delves deeper into Katya's past, he uncovers a complex web of lies and deceit that threatens to destroy his reputation and his career. He must navigate the treacherous world of international diplomacy, where allegiances are tested and loyalties are questioned.\n\nMeanwhile, Katya's past begins to unravel, revealing a dark history of espionage and betrayal. Volodin is torn between his love for Katya and his duty to uncover the truth, even as h

In [46]:
# Ask Clipdrop to depict a picture based on the condensed summary
r = requests.post('https://clipdrop-api.co/text-to-image/v1',
  files = {
      'prompt': (None, f'Depict a picture based on the following summary : {condensed_summary}', 'text/plain')
  },
  headers = { 'x-api-key': os.environ.get("CLIPDROP_API_KEY")}
)
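One thing worth guarding against is an over-long prompt. The sketch below trims the summary at a word boundary before it is sent. The 1000-character limit is an assumption based on my reading of the Clipdrop text-to-image docs (note that it is counted in characters, not words), and `build_prompt` is a hypothetical helper, not part of any library.

```python
PROMPT_PREFIX = "Depict a picture based on the following summary : "


def build_prompt(summary: str, max_chars: int = 1000) -> str:
    """Prefix the summary and trim it so the whole prompt fits in max_chars."""
    budget = max_chars - len(PROMPT_PREFIX)
    if budget <= 0:
        raise ValueError("max_chars is too small for the prompt prefix")
    # Collapse newlines and repeated spaces into single spaces.
    text = " ".join(summary.split())
    if len(text) > budget:
        cut = text[:budget]
        # If the cut landed inside a word, drop the partial word.
        if text[budget] != " " and " " in cut:
            cut = cut[:cut.rfind(" ")]
        text = cut.rstrip()
    return PROMPT_PREFIX + text


print(len(build_prompt("word " * 500)))
```

The returned string could then be passed as the `prompt` field of the request in place of the inline f-string.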

In [47]:
byte_array = np.frombuffer(r.content, dtype=np.uint8)

In [48]:
image = cv2.imdecode(byte_array, cv2.IMREAD_COLOR)
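`cv2.imdecode` returns `None` when the bytes are not a decodable image, which can happen if the API answers with an error message instead of a picture. A minimal sketch of a check that could run before decoding, assuming a successful response is a PNG (that is an assumption about the Clipdrop API, and `looks_like_png` is a hypothetical helper):

```python
# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def looks_like_png(data: bytes) -> bool:
    """Return True if the payload starts with the PNG file signature."""
    return data[:8] == PNG_SIGNATURE


# Example payloads: a fake PNG header and a JSON-style error body.
print(looks_like_png(PNG_SIGNATURE + b"...image data..."))
print(looks_like_png(b'{"error": "something went wrong"}'))
```

In the notebook this would be called as `looks_like_png(r.content)`; if it returns `False`, printing `r.status_code` and `r.text` is usually more helpful than trying to display the image.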

In [50]:
cv2.imshow('Image', image)
while True:
    if cv2.waitKey(1) & 0xFF == ord('q'):  # Wait for 'q' key to be pressed
        break
cv2.destroyAllWindows()