# Data preparation for Flux.1 Fine-tuning

Code authored by: Shaw Talebi <br>
[Video link](https://youtu.be/bZr2vhoXSy8) | [Blog link](https://medium.com/@shawhin/i-trained-flux-1-on-my-face-and-how-you-can-too-bbf0cb3824b0)

### imports

In [1]:
from PIL import Image
import os
import shutil

import replicate
from dotenv import load_dotenv

In [2]:
# load vars from .env
load_dotenv()

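# create a Replicate client with the API key from the environment
# (note: this rebinds the name `replicate` from the module to the client instance)
replicate = replicate.Client(api_token=os.getenv("REPLICATE_API_KEY"))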
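
`os.getenv` returns `None` when the variable is missing, so a misnamed or absent key only surfaces later as an authentication error. A minimal sketch of a fail-fast check follows; the helper name `require_env` is hypothetical and not part of the original notebook.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`; raise if unset or empty.

    Hypothetical helper, not part of the original notebook.
    """
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment"
        )
    return value
```

Under that assumption, `require_env("REPLICATE_API_KEY")` could be passed as `api_token` in place of the bare `os.getenv` call.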

### crop images to be square

In [3]:
# Set input and output directories
input_dir = "raw/"  # Change this to your folder
output_dir = "data/"

# Ensure output directory exists
os.makedirs(output_dir, exist_ok=True)

# image counter
i = 0

# Process each image (sorted so the img-{i} numbering is reproducible)
for filename in sorted(os.listdir(input_dir)):
    if filename.lower().endswith(('.png', '.jpg', '.jpeg')):  # Check for PNG and JPEG
        img_path = os.path.join(input_dir, filename)
        img = Image.open(img_path)

        # Get original dimensions
        width, height = img.size
        new_size = min(width, height)  # Smallest dimension for square crop

        # Calculate cropping box (centered)
        left = (width - new_size) // 2
        top = (height - new_size) // 2
        right = left + new_size
        bottom = top + new_size

        # Crop the image
        img_cropped = img.crop((left, top, right, bottom))

        # Resize to 1024x1024 using high-quality resampling
        img_resized = img_cropped.resize((1024, 1024), Image.LANCZOS)

        # Save the resized image as PNG
        output_path = os.path.join(output_dir, f"img-{i}.png")
        img_resized.save(output_path, format="PNG")
        i += 1
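
To make the centered-crop arithmetic above concrete, here is the same calculation pulled out into a standalone function with two worked examples. The function name `center_crop_box` is hypothetical; the formula is the one used in the cell.

```python
def center_crop_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) box of the largest centered square."""
    new_size = min(width, height)      # side length of the square
    left = (width - new_size) // 2     # trim equally from left and right
    top = (height - new_size) // 2     # trim equally from top and bottom
    return (left, top, left + new_size, top + new_size)


# Landscape 1600x1200: 200 px trimmed from each side
print(center_crop_box(1600, 1200))  # (200, 0, 1400, 1200)

# Portrait 1080x1920: 420 px trimmed from top and bottom
print(center_crop_box(1080, 1920))  # (0, 420, 1080, 1500)
```

Because the crop is centered, a subject near the edge of a very wide or very tall photo can be cut off, so it may be worth checking the files in `data/` by eye before captioning.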

### write captions with llava

In [4]:
%%time
for filename in os.listdir(output_dir):
    if filename.lower().endswith('.png'):
        img_path = os.path.join(output_dir, filename)
        # generate caption (the with-block closes the file once the output is read)
        with open(img_path, "rb") as image:
            output = replicate.run(
                "yorickvp/llava-13b:a0fdc44e4f2e1f20f2bb4e27846899953ac8e66c5886c5878fa1d6b73ce009e5",
                input={
                    "image": image,
                    "prompt": "Please write a caption for this image."
                }
            )
            # extract text from output
            response = " ".join(list(output))

        # add trigger token "shw-tlb"
        caption = f"A photo of shw-tlb. {response}"

        # save caption to a text file next to the image (same name, .txt extension)
        caption_filename = os.path.splitext(img_path)[0] + '.txt'
        with open(caption_filename, "w", encoding="utf-8") as file:
            print(caption)
            file.write(caption)

A photo of shw-tlb. A  man is  sitting at  a table  in a  restaurant,  working on  his laptop.  The table  is surrounded  by chairs,  and there  are several  other tables  in the  restaurant.  The man  is focused  on his  work,  and the  laptop is  open in  front of  him. 
A photo of shw-tlb. A  man wearing  a gray  shirt  is smiling  and looking  at the  camera. 
A photo of shw-tlb. A  man with  a bald  head and  a beard  is wearing  a black  shirt  and sitting  against a  white brick  wall. 
A photo of shw-tlb. A  man wearing  a suit  and tie  is speaking  into a  microphone. 
A photo of shw-tlb. A  man wearing  a white  shirt  and a  black hat  is sitting  in a  chair with  his arms  outstretched,  possibly making  a gesture  of surprise  or excitement.  The room  has a  kitchen area  with a  sink,  a dining  table,  and a  few chairs.  There are  also some  decorative  elements,  such as  a potted  plant and  a vase,  and a  wine glass  can be  seen on  a surface. 
A photo of shw-t
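
The doubled spaces in the captions above appear to come from joining the streamed chunks with `" "` when each chunk already ends in whitespace. A minimal sketch of one way to clean this up follows. It assumes the chunks carry their own spacing, as the output suggests, and the helper name `normalize_caption` is hypothetical.

```python
def normalize_caption(chunks):
    """Concatenate streamed text chunks and collapse runs of whitespace.

    Assumes each chunk already includes its own spacing, so the chunks are
    joined with an empty string rather than a space.
    """
    text = "".join(chunks)
    # split() with no arguments drops leading/trailing whitespace and
    # treats any run of whitespace as a single separator
    return " ".join(text.split())


chunks = ["A ", "man wearing ", "a gray ", "shirt ", "is smiling. "]
print(normalize_caption(chunks))  # A man wearing a gray shirt is smiling.
```

In the loop above, this would stand in for the `" ".join(list(output))` line, for example `response = normalize_caption(output)`.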

### compress data as .zip

In [5]:
# inputs: name of .zip file, compression format, directory to compress
shutil.make_archive("data", 'zip', "data/")

'/Users/shawhin/Documents/_code/_stv/sandbox/flux-finetuning/data.zip'
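
Before uploading the archive, it can help to confirm that every image has a matching caption file. The following is a sketch using only the standard library; the function names `images_missing_captions` and `check_archive` are hypothetical, and the pairing rule (same file name, `.txt` extension) follows the captioning cell above.

```python
import os
import zipfile

IMAGE_EXTS = (".png", ".jpg", ".jpeg")


def images_missing_captions(names):
    """Return the image file names that have no same-named .txt caption."""
    captioned = {
        os.path.splitext(n)[0] for n in names if n.lower().endswith(".txt")
    }
    return sorted(
        n
        for n in names
        if n.lower().endswith(IMAGE_EXTS)
        and os.path.splitext(n)[0] not in captioned
    )


def check_archive(zip_path):
    """List images inside the zip archive that lack a caption file."""
    with zipfile.ZipFile(zip_path) as archive:
        return images_missing_captions(archive.namelist())
```

Calling `check_archive("data.zip")` should return an empty list when every image is paired with a caption.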