# 0. Goal
We aim to finetune the Stable Diffusion v1.4 base model so that it can generate logo images for different brands from a textual prompt.

# 1. Dataset description

We take the list of the top 100 global brands from *Forbes* and crawl the web to collect logo images for those brands. The brands are organized into 18 categories. The collected dataset is summarized below.

|         Property        |      Value      |
|:-----------------------:|:---------------:|
|    Number of Samples    |       424       |
|        Categories       |        18       |
|          Brands         |       100       |
|  Most Frequent Category | Technology (88) |
| Least Frequent Category |   Tobacco (5)   |
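
The figures in the table can be recomputed from the image folders. The sketch below is a minimal example, assuming the `<root>/<category>/<image>.png` layout that the crawler script produces; the function names `summarize_dataset` and `describe` are hypothetical helpers and not part of the original notebook.

```python
import os
from collections import Counter


def summarize_dataset(root_dir):
    """Count PNG files per category sub-directory under root_dir."""
    counts = Counter()
    for category in sorted(os.listdir(root_dir)):
        category_dir = os.path.join(root_dir, category)
        if not os.path.isdir(category_dir):
            continue  # Ignore stray files next to the category folders
        counts[category] = sum(
            1 for name in os.listdir(category_dir) if name.lower().endswith(".png")
        )
    return counts


def describe(counts):
    """Return the summary fields used in the table (assumes at least one category)."""
    most = max(counts.items(), key=lambda kv: kv[1])
    least = min(counts.items(), key=lambda kv: kv[1])
    return {
        "samples": sum(counts.values()),
        "categories": len(counts),
        "most_frequent": most,
        "least_frequent": least,
    }
```

Calling `describe(summarize_dataset(root))` on the image root returns the sample count, the number of categories, and the largest and smallest category as `(name, count)` pairs.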

### Link to all the raw data
https://drive.google.com/drive/folders/1SpTF5V46GuLdP9FIecpsOKHWMBYozVaL?usp=drive_link

The list of all the brands and their respective categories is stored in the **brands.txt** file in this drive.
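
Judging from the crawler script, which splits every line on `=`, each line of **brands.txt** has the form `Brand=Category`. The sketch below groups brands by category under that assumption; `parse_brands` is a hypothetical helper, and the sample lines in the usage note are illustrative rather than copied from the file.

```python
from collections import defaultdict


def parse_brands(lines):
    """Group brand names by category from 'Brand=Category' lines."""
    by_category = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Ignore blank lines
        brand, _, category = line.partition("=")
        if not category:
            raise ValueError(f"Missing '=' or category in line: {line!r}")
        by_category[category.strip()].append(brand.strip())
    return dict(by_category)
```

For example, `parse_brands(["UPS=Transportation", "FedEx=Transportation"])` returns a dictionary with a single `Transportation` key holding both brands, which makes it easy to check that the file really contains 100 brands in 18 categories.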

### Crawl the web and download 5 PNG images for each brand

In [None]:
from google.colab import drive
drive.mount('/content/drive')
file_path = '/content/drive/My Drive/FFgenAI/brands.txt'
save_dir = '/content/drive/My Drive/FFgenAI/images'

Note that the following crawler script was run on a local machine to avoid Drive file system issues, and the downloaded images were then uploaded manually to a public Drive folder. The full raw dataset is available at https://drive.google.com/drive/folders/1SpTF5V46GuLdP9FIecpsOKHWMBYozVaL?usp=drive_link

In [None]:
from icrawler.builtin import GoogleImageCrawler
import os


def crawl_images(brands, num_images=5):
    for brand_info in brands:
        brand, category = brand_info.strip().split("=")

        # Create folder for each category
        storage_dir = f'./images_2/{category}'
        os.makedirs(storage_dir, exist_ok=True)

        # Remember the files already present so that only the new downloads are
        # renamed; otherwise files of earlier brands in the same category would be
        # renamed again and end up labeled with the wrong brand
        existing_files = set(os.listdir(storage_dir))

        google_crawler = GoogleImageCrawler(storage={'root_dir': storage_dir})

        prompt = brand + ' brand logo png'  # Include search terms
        google_crawler.crawl(keyword=prompt, max_num=num_images)

        # Rename the newly downloaded files to <brand>_<NN>.png
        # (the extension is forced to .png, as before; files are not converted)
        new_files = sorted(set(os.listdir(storage_dir)) - existing_files)
        i = 1
        for filename in new_files:
            new_filepath = os.path.join(storage_dir, f'{brand}_{i:02}.png')
            # Skip indices that are already taken to avoid overwriting a file
            while os.path.exists(new_filepath):
                i += 1
                new_filepath = os.path.join(storage_dir, f'{brand}_{i:02}.png')
            os.rename(os.path.join(storage_dir, filename), new_filepath)
            i += 1


if __name__ == '__main__':
    with open('brands.txt', 'r') as file:
        data = file.read()
        search_keywords = data.splitlines()
        crawl_images(search_keywords, num_images=5)

### Preprocessing and generating metadata file

In the preprocessing step we resize all the images and rename them in a generalized sequence. We also create the *metadata.csv* file, which contains each image name and the corresponding text description. We use ***BLIP*** to generate captions automatically for the logo images. The images then need to be filtered manually for bad samples, and the captions checked for incorrect or irrelevant descriptions. Finally, we push everything to the Hugging Face Hub as a dataset.
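
Part of the manual caption check can be narrowed down automatically. The following is a minimal sketch, assuming *metadata.csv* has the `file_name` and `text` columns written by the captioning cell; the helper `flag_captions` and its `min_words` threshold are hypothetical and only meant to shortlist rows for a human to inspect.

```python
import csv


def flag_captions(metadata_path, min_words=8):
    """Return (file_name, reason) pairs for captions that deserve a manual look."""
    flagged = []
    seen = {}  # caption text -> first file name that used it
    with open(metadata_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name, text = row["file_name"], row["text"].strip()
            if len(text.split()) < min_words:
                flagged.append((name, "too short"))
            elif text in seen:
                flagged.append((name, f"duplicate of {seen[text]}"))
            else:
                seen[text] = name
    return flagged
```

A duplicate caption is not necessarily wrong, since two images of the same logo may legitimately share a description, so the output is a review list and not a filter.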


In [None]:
img_dir = '/content/drive/My Drive/FFgenAI/images'

In [None]:
# Clone the BLIP repository
!git clone https://github.com/salesforce/BLIP.git
# Change the directory to BLIP
import os
os.chdir("BLIP")
!pip install -q timm
!pip install -q fairscale

In [None]:
# Import the libraries
import tqdm
from PIL import Image
import pandas as pd
import requests
import torch
from torchvision import transforms
from torchvision.transforms.functional import InterpolationMode
import matplotlib.pyplot as plt
from models.blip import blip_decoder

In [None]:
# Use cuda if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

In [None]:
# Set the image size
image_size = 128

In [None]:
# URL BLIP model
model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth'

# Load the model
model = blip_decoder(pretrained=model_url, image_size=image_size, vit='base')
model.eval()
model = model.to(device)

reshape position embedding from 196 to 64
load checkpoint from https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth


In [None]:
import os
import pandas as pd
from PIL import Image
import torch
from torchvision import transforms
from torchvision.transforms import InterpolationMode

def traverse_png_images(root_dir):
  """Traverses all subdirectories of the given root directory and yields the paths to all PNG images found.

  Args:
    root_dir: The root directory to start the traversal from.

  Yields:
    A tuple of (subdirectory_name, filename, absolute_path) for each PNG image found.
  """

  for root, _, files in os.walk(root_dir):
    subdirectory_name = os.path.relpath(root, root_dir)
    for file in files:
      if file.endswith(".png"):
        yield subdirectory_name, file, os.path.join(root, file)

# Assuming 'images' is the root directory containing subdirectories with images
image_dir = "/content/drive/My Drive/FFgenAI/images/"

# Build the preprocessing transform once instead of on every iteration
transform = transforms.Compose([
    transforms.Resize((image_size, image_size), interpolation=InterpolationMode.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize((0.48145466, 0.4578275, 0.40821073), (0.26862954, 0.26130258, 0.27577711))
])

rows = []  # One {'file_name', 'text'} record per image

index = 1
for subdir, original_filename, image_path in traverse_png_images(image_dir):
  # Load and preprocess the image
  img = Image.open(image_path).convert('RGB')
  image = transform(img).unsqueeze(0).to(device)

  # Generate the caption
  # Note: model.generate returns a list of strings (one per image), so the f-string
  # below embeds the list's repr, brackets included; use caption[0] for the bare string
  with torch.no_grad():
      caption = model.generate(image, sample=False, num_beams=3, max_length=100, min_length=10)  # beam search
      # caption = model.generate(image, sample=True, top_p=0.9, max_length=100, min_length=10)  # nucleus sampling

  # The brand name is the part of the crawler's file name before the first underscore
  brand = original_filename.split('_')[0]
  caption = f"A logo of {brand} company. Category {subdir}. {caption}"

  # Generate new filename
  new_filename = f"img_{index}.png"  # naming convention
  index += 1  # Increment index for the next image

  # Rename the image
  new_path = os.path.join(image_dir, subdir, new_filename)
  os.rename(image_path, new_path)

  rows.append({'file_name': new_filename, 'text': caption})

# Collect all records into a single DataFrame
results = pd.DataFrame(rows, columns=['file_name', 'text'])

# Save the results to a csv file
results.to_csv('/content/drive/My Drive/FFgenAI/metadata.csv', index=False)

# Display the results
results.head()


|   | file_name | text |
|:-:|:---------:|:-----|
| 0 | img_1.png | A logo of UPS company. Category Transportation... |
| 1 | img_2.png | A logo of UPS company. Category Transportation... |
| 2 | img_3.png | A logo of UPS company. Category Transportation... |
| 3 | img_4.png | A logo of UPS company. Category Transportation... |
| 4 | img_5.png | A logo of FedEx company. Category Transportati... |




### Resize and upload to huggingface

In [None]:
# Resize the images and store them in a separate directory
import os
from PIL import Image

def resize_and_save(input_dir, output_dir, new_size=(256, 256)):
  """
  Resizes all PNG images in a directory and its subdirectories to a specified size and saves them in a new directory structure.

  Args:
    input_dir: The directory containing the PNG images.
    output_dir: The directory to save the resized images.
    new_size: The desired size of the resized images (width, height).
  """

  for root, _, files in os.walk(input_dir):
    # Create the corresponding output subdirectory
    output_subdir = os.path.join(output_dir, os.path.relpath(root, input_dir))
    os.makedirs(output_subdir, exist_ok=True)

    for file in files:
      if file.endswith(".png"):
        input_path = os.path.join(root, file)
        output_path = os.path.join(output_subdir, file)

        try:
          with Image.open(input_path) as img:
            # Image.ANTIALIAS was removed in Pillow 10; Image.LANCZOS is the same filter
            img = img.resize(new_size, Image.LANCZOS)
            img.save(output_path, "PNG")
            print(f"Resized and saved: {input_path} -> {output_path}")
        except OSError as e:
          print(f"Error processing image: {input_path} - {e}")

if __name__ == "__main__":
  input_directory = "/content/drive/My Drive/FFgenAI/images"
  output_directory = "/content/drive/My Drive/FFgenAI/images_resized"
  resize_and_save(input_directory, output_directory)


In [None]:
!huggingface-cli login



    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
Your token has been saved to /root/.cache/huggingface/token
Login successful


In [None]:
!pip install -q datasets

### Before pushing to hub the *metadata.csv* file must be moved to the ***images_resized*** directory
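
The `imagefolder` loader reads a *metadata.csv* with a `file_name` column from the data directory, so the file has to sit next to the images. The sketch below copies it there and reports any referenced image that is missing; `stage_metadata` is a hypothetical helper, and the check is only meaningful once the images have been flattened into the same directory.

```python
import csv
import os
import shutil


def stage_metadata(metadata_path, image_dir):
    """Copy metadata.csv into image_dir and list referenced files that are missing."""
    target = os.path.join(image_dir, "metadata.csv")
    # shutil.copy raises SameFileError if source and target are the same file
    if os.path.abspath(metadata_path) != os.path.abspath(target):
        shutil.copy(metadata_path, target)
    with open(target, newline="", encoding="utf-8") as f:
        names = [row["file_name"] for row in csv.DictReader(f)]
    return [n for n in names if not os.path.exists(os.path.join(image_dir, n))]
```

An empty return value means every row of the metadata file points at an existing image; anything else should be resolved before pushing the dataset.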

In [None]:
# Move all images into the parent directory, discarding the subdirectory structure, so the file names match metadata.csv
import os
import shutil

for root, _, files in os.walk("/content/drive/My Drive/FFgenAI/images_resized"):
  for file in files:
    if file.endswith('.png'):
      source = os.path.join(root, file)
      destination = os.path.join("/content/drive/My Drive/FFgenAI/images_resized", file)
      try:
        shutil.move(source, destination)
        print(f"Moved {source} to {destination}")
      except shutil.Error as e:
        print(f"Error moving file: {e}")

In [None]:
from datasets import load_dataset

dataset = load_dataset("imagefolder", data_dir="/content/drive/My Drive/FFgenAI/images_resized")
dataset.push_to_hub("mahim05078/logos-blips")

Resolving data files:   0%|          | 0/425 [00:00<?, ?it/s]

Downloading data:   0%|          | 0/425 [00:00<?, ?files/s]

Generating train split: 0 examples [00:00, ? examples/s]

Uploading the dataset shards:   0%|          | 0/1 [00:00<?, ?it/s]

Map:   0%|          | 0/424 [00:00<?, ? examples/s]

Creating parquet from Arrow format:   0%|          | 0/5 [00:00<?, ?ba/s]

CommitInfo(commit_url='https://huggingface.co/datasets/mahim05078/logos-blips/commit/9731f6c77fa85abdf4cca290d299e92d6d6da0b8', commit_message='Upload dataset', commit_description='', oid='9731f6c77fa85abdf4cca290d299e92d6d6da0b8', pr_url=None, pr_revision=None, pr_num=None)


**Link to the dataset on Hugging Face Hub:** [mahim05078/logos-blips](https://huggingface.co/datasets/mahim05078/logos-blips)

# 2. Finetune a Foundation Model

Now that you have collected a dataset, it's time to pick a base model to finetune.


* Go to the [Hugging Face Hub](https://huggingface.co/models) and pick a foundation model to fine-tune. (For example, if you are interested in generating images, you could pick [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) or [Stable Diffusion 3](https://huggingface.co/stabilityai/stable-diffusion-3-medium) as your base model.) Make sure to pick a model that can be loaded in the free tier of the Colab Notebook.
* Then finetune your model on the dataset that you collected in Step 1. There are different ways to finetune a model: [from LoRA to a full finetune](https://huggingface.co/docs/diffusers/v0.13.0/en/training/lora). Pick one of these methods, and explain your reasoning below. We suggest that you use the `transformers` or `diffusers` library to finetune a foundation model.
* Generate some samples from the base model and from the final finetuned model. How do they compare?
* [Upload the model to the Hugging Face Hub](https://huggingface.co/docs/hub/adding-a-model), and add a link to your model below.


In [None]:
%env MODEL_NAME="CompVis/stable-diffusion-v1-4"
%env DATASET_NAME="mahim05078/logos-blips"


In [None]:
!wget https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py

--2024-08-01 08:04:42--  https://raw.githubusercontent.com/huggingface/diffusers/main/examples/text_to_image/train_text_to_image_lora.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 40697 (40K) [text/plain]
Saving to: ‘train_text_to_image_lora.py’


2024-08-01 08:04:43 (38.3 MB/s) - ‘train_text_to_image_lora.py’ saved [40697/40697]



In [None]:
!pip install -q datasets
!pip install -q --upgrade peft
!pip install -q --upgrade transformers
!pip install -U -qq git+https://github.com/huggingface/diffusers.git
!pip install -q wandb

In [None]:
!git clone https://github.com/huggingface/diffusers
!git clone https://github.com/justinpinkney/stable-diffusion

Cloning into 'diffusers'...
remote: Enumerating objects: 66619, done.
remote: Counting objects: 100% (951/951), done.
remote: Compressing objects: 100% (506/506), done.
remote: Total 66619 (delta 627), reused 617 (delta 363), pack-reused 65668
Receiving objects: 100% (66619/66619), 47.51 MiB | 14.11 MiB/s, done.
Resolving deltas: 100% (49061/49061), done.
Cloning into 'stable-diffusion'...
remote: Enumerating objects: 1755, done.
remote: Counting objects: 100% (8/8), done.
remote: Compressing objects: 100% (5/5), done.
remote: Total 1755 (delta 3), reused 5 (delta 3), pack-reused 1747
Receiving objects: 100% (1755/1755), 73.93 MiB | 14.01 MiB/s, done.
Resolving deltas: 100% (1082/1082), done.


## Train model with bash script

In [None]:
!accelerate launch --mixed_precision="bf16" diffusers/examples/text_to_image/train_text_to_image_lora.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --dataset_name="mahim05078/logos-blips" --caption_column="text" \
  --resolution=512 --random_flip \
  --train_batch_size=1 \
  --num_train_epochs=50 --checkpointing_steps=5000 \
  --learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
  --seed=1337 \
  --output_dir="/content/drive/My Drive/FFgenAI/sd-finetuned-logos-lora" \
  --validation_prompt="A logo Gentle Men company. Category Apparel. [A black suite on red background with the words GM] " --report_to="wandb"

The following values were not passed to `accelerate launch` and had defaults used instead:
	`--num_processes` was set to a value of `1`
	`--num_machines` was set to a value of `1`
	`--dynamo_backend` was set to a value of `'no'`
2024-07-31 19:34:38.031252: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2024-07-31 19:34:38.048962: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-31 19:34:38.070464: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-31 19:34:38.077019: E external/local_xla/xla/str
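
The flags in the training command determine how long the run is and how many intermediate checkpoints it writes. The arithmetic can be sketched as follows, assuming a single process, a gradient accumulation of 1 (neither is set in the command, so this relies on defaults), and the 424 training examples reported when the dataset was pushed. `training_schedule` is a hypothetical helper, not part of the training script.

```python
import math


def training_schedule(num_examples, batch_size, epochs, checkpointing_steps,
                      gradient_accumulation_steps=1):
    """Return (total optimizer steps, steps at which a checkpoint is written)."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    total_steps = steps_per_epoch * epochs
    checkpoints = list(range(checkpointing_steps, total_steps + 1, checkpointing_steps))
    return total_steps, checkpoints


# Values taken from the command: batch size 1, 50 epochs, checkpoint every 5000 steps
total, ckpts = training_schedule(num_examples=424, batch_size=1, epochs=50,
                                 checkpointing_steps=5000)
print(total, ckpts)  # 21200 [5000, 10000, 15000, 20000]
```

Under these assumptions the run takes 21,200 optimizer steps and saves four intermediate checkpoints in addition to the final LoRA weights.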

### Upload model to huggingface hub

In [None]:
from huggingface_hub import notebook_login

notebook_login()


In [None]:
from huggingface_hub import HfApi
api = HfApi()
api.upload_folder(
    folder_path="/content/drive/My Drive/FFgenAI/sd-finetuned-logos-lora",
    path_in_repo="",
    repo_id="mahim05078/sd-finetuned-logos-lora",
    repo_type="model",
)


CommitInfo(commit_url='https://huggingface.co/mahim05078/sd-finetuned-logos-lora/commit/6ae26c860513faa3930e55c98c8cff0361ba1546', commit_message='Upload folder using huggingface_hub', commit_description='', oid='6ae26c860513faa3930e55c98c8cff0361ba1546', pr_url=None, pr_revision=None, pr_num=None)

## Inference comparison
### First, prompting the pretrained Stable Diffusion 1.4

In [None]:
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe.to("cuda")

# prompt = "A logo for Bkash company. Category Financial Services. a pink paper bird on white background"

from PIL import Image

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid


num_images = 3
prompt = ["A logo for amazon company. Category technology. a blue arrow with the word amazon on it."] * num_images

images = pipe(prompt).images

# One row with as many columns as generated images
grid = image_grid(images, rows=1, cols=num_images)

# Save the grid
grid.save("B-logo-pretrained.png")
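
The training captions were written as `A logo of {brand} company. Category {subdir}. {caption}`, while the prompt here starts with "A logo for". A small helper can keep inference prompts in the training format. This is only a sketch: `build_prompt` is a hypothetical function, and whether matching the wording changes the generated images has not been tested here.

```python
def build_prompt(brand, category, description):
    """Format a prompt like the captions written to metadata.csv."""
    return f"A logo of {brand} company. Category {category}. {description}"


prompt_text = build_prompt(
    "amazon", "technology", "a blue arrow with the word amazon on it."
)
print(prompt_text)
```

The resulting string can be repeated `num_images` times in a list and passed to the pipeline in the same way as the hand-written prompt.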


### Now prompting our finetuned stable diffusion 1.4

In [None]:
from diffusers import StableDiffusionPipeline
import torch

model_path = "mahim05078/sd-finetuned-logos-lora"
pipe_ft = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16)
pipe_ft.unet.load_attn_procs(model_path)
pipe_ft.to("cuda")
images = pipe_ft(prompt).images

grid = image_grid(images, rows=1, cols=3)

# you can save the grid with
grid.save(f"B-logo-finetuned.png")



**Write up**:
* Explain what finetuning strategy you used and why

    * We used Stable Diffusion 1.4 as our base model and finetuned it with LoRA text-to-image training. LoRA support is available for two finetuning strategies: text-to-image and DreamBooth. DreamBooth introduces a new concept to the model from 3-5 related sample images. We chose text-to-image finetuning because we want the model to learn the relationship between a logo image and its textual description. We also added the brand name and category to the captions to see whether the model can learn associations from them.

* Share some samples from the base model and from the final finetuned model. How do they compare?

    * Here are some sample images generated by the base model and the finetuned model for the same prompts.
      
      * Prompt 1: "*A logo for Banhi company. Category luxury. a blue triangle with the letter B inside*".
      >Base model
      > ![](https://drive.google.com/uc?id=1MVe3SmjYIACSbtHUfxLGA1hKNiCUWeFw)
      >Finetuned model
      >![](https://drive.google.com/uc?id=1131VKVC2zuWaDHPpodDfyCMs37GAQ9M5)
      * Prompt 2: "*A logo for Greek company. Category leisure. a white crown on red background*".
      >Base model
      > ![](https://drive.google.com/uc?id=1V_NJL1br7HEM69hWWJdEOw3FWp7O4dq7)
      >Finetuned model
      >![](https://drive.google.com/uc?id=10TqN7500zEP_Ztw4v2uxJrR5fpZQCUmQ)
      
    * Though both models are inadequate for real-life use, the finetuned model noticeably follows object-level descriptions better and stays closer to the subject matter. The color profiles are far off in both cases and, more importantly, other trials reveal that the finetuned model is weak on complex, previously unseen objects.
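
To illustrate why LoRA is cheap to train: it keeps each pretrained weight matrix frozen and learns a low-rank update made of two small factors. The sketch below compares parameter counts for a single matrix. The layer shape and rank are hypothetical values chosen for illustration, not numbers read from this training run.

```python
def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out, d_in):
    """Parameters in the dense weight matrix that LoRA leaves frozen."""
    return d_out * d_in


# Hypothetical layer shape and rank, for illustration only
d_out, d_in, rank = 320, 768, 4
lora = lora_trainable_params(d_out, d_in, rank)
full = full_params(d_out, d_in)
print(lora, full, f"{lora / full:.2%}")
```

For this example shape, the low-rank factors hold under 2% of the parameters of the dense matrix, which is the reason the saved LoRA weights are small compared with the base model.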


**Link to the model on Hugging Face Hub:** [mahim05078/sd-finetuned-logos-lora](https://huggingface.co/mahim05078/sd-finetuned-logos-lora/tree/main)