<a href="https://colab.research.google.com/github/Matthieu6/IndividualProject/blob/main/Semantic_Image_Compression.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Final Semantic Image Compression Scheme

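The following code implements the final image compression scheme. It combines open-source image captioning and image generation models into an encoder-decoder architecture: the encoder reduces the input image to a text caption, and the decoder generates an image from that caption.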

*If the notebook fails because of limited RAM, run the Encoder and Decoder sections separately.*
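When the two sections run in separate sessions, the caption has to be carried from one to the other. The sketch below shows one possible way to do that through a small text file. The file name `caption.txt` and the helpers `save_caption` and `load_caption` are hypothetical and not part of the original notebook, and the caption string is a placeholder rather than real model output.

```python
from pathlib import Path

# Hypothetical file used to hand the caption from the encoder to the decoder.
CAPTION_PATH = Path("caption.txt")


def save_caption(caption: str, path: Path = CAPTION_PATH) -> None:
    """Write the caption produced by the encoder to disk as UTF-8 text."""
    path.write_text(caption, encoding="utf-8")


def load_caption(path: Path = CAPTION_PATH) -> str:
    """Read the caption back, for example in a fresh decoder session."""
    return path.read_text(encoding="utf-8").strip()


# Round trip with a placeholder caption (not real model output).
save_caption("a placeholder caption standing in for the encoder output")
restored = load_caption()
```

In the encoder session, `save_caption` would be called on the generated caption. After restarting the runtime, the decoder session would call `load_caption()` and pass the result to the image generation pipeline.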

## Encoder (image captioning)

Install the required packages and initialize the image captioning model (BLIP-2).

In [None]:
!pip3 install salesforce-lavis

import torch
from PIL import Image
import requests
from lavis.models import load_model_and_preprocess

# Select the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model, vis_processors, _ = load_model_and_preprocess(name="blip2_t5", model_type="pretrain_flant5xxl", is_eval=True, device=device)

vis_processors.keys()

Run the following cell to create a caption from the input image. The caption is the compressed representation: it is the only data transferred from one location to another.

In [None]:
# Change the file name to the image you want to compress.
raw_image = Image.open("input.jpg").convert('RGB')

image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
# Generate a caption for the image.
gen_output = model.generate({"image": image}, min_length=16, max_length=64)
# generate() returns a list of caption strings (one per input image); keep the first.
generated_text = gen_output[0]
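Because the caption is the only data that is transmitted, its size in bytes sets the compressed size. As a rough illustration, the sketch below measures a caption's UTF-8 payload and the resulting compression ratio. The helper names, the example caption, and the 250,000-byte image size are all hypothetical placeholders, not measurements from this notebook.

```python
def caption_payload_bytes(caption: str) -> int:
    """Size of the caption when sent as UTF-8 text."""
    return len(caption.encode("utf-8"))


def compression_ratio(original_bytes: int, caption: str) -> float:
    """Ratio of the original image size to the caption payload size."""
    return original_bytes / caption_payload_bytes(caption)


# Placeholder values for illustration only.
example_caption = "a dog running across a grassy field"
example_image_bytes = 250_000  # hypothetical size of the input image file

payload = caption_payload_bytes(example_caption)
ratio = compression_ratio(example_image_bytes, example_caption)
print(f"payload: {payload} bytes, compression ratio: {ratio:.0f}:1")
```

In the notebook itself, the image size could be taken from `os.path.getsize("input.jpg")` and the caption from the encoder output.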



## Decoder (image generation)

Install the required packages and initialize the image generation model (Stable Diffusion XL with the SDXL-Lightning UNet).

In [None]:
!pip install diffusers
import torch
from diffusers import StableDiffusionXLPipeline, UNet2DConditionModel, EulerDiscreteScheduler
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

base = "stabilityai/stable-diffusion-xl-base-1.0"
repo = "ByteDance/SDXL-Lightning"
ckpt = "sdxl_lightning_8step_unet.safetensors" # Use the correct ckpt for your step setting!

# Load model.
unet = UNet2DConditionModel.from_config(base, subfolder="unet").to("cuda", torch.float16)
unet.load_state_dict(load_file(hf_hub_download(repo, ckpt), device="cuda"))
pipe = StableDiffusionXLPipeline.from_pretrained(base, unet=unet, torch_dtype=torch.float16, variant="fp16").to("cuda")

# Ensure sampler uses "trailing" timesteps.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config, timestep_spacing="trailing")
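To illustrate what the "trailing" setting changes, the following sketch computes the timesteps selected by trailing and leading spacing for 8 inference steps. It is a simplified reimplementation for illustration, assuming 1000 training timesteps and ignoring the scheduler's `steps_offset` setting; the function names are hypothetical and this is not the library's own code.

```python
def trailing_timesteps(num_inference_steps: int, num_train_timesteps: int = 1000) -> list[int]:
    """Timesteps spaced backwards from the last training timestep."""
    step_ratio = num_train_timesteps / num_inference_steps
    return [round(num_train_timesteps - i * step_ratio) - 1
            for i in range(num_inference_steps)]


def leading_timesteps(num_inference_steps: int, num_train_timesteps: int = 1000) -> list[int]:
    """Timesteps spaced forwards from timestep 0 (offset omitted)."""
    step_ratio = num_train_timesteps // num_inference_steps
    return [i * step_ratio for i in reversed(range(num_inference_steps))]


print(trailing_timesteps(8))  # [999, 874, 749, 624, 499, 374, 249, 124]
print(leading_timesteps(8))   # [875, 750, 625, 500, 375, 250, 125, 0]
```

With trailing spacing the sampler starts at the final, noisiest training timestep (999), whereas leading spacing starts lower. With only 8 steps that starting point is a noticeable difference, which is presumably why the setting is applied here.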

Generate an image from the caption produced in the Encoder section.

In [None]:
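# Use the caption text as the prompt for image generation.
# num_inference_steps must match the 8-step checkpoint loaded above.
# The SDXL-Lightning model card example uses guidance_scale=0; consider lowering guidance_scale if results look poor.
pipe(generated_text, num_inference_steps=8, guidance_scale=7).images[0].save("output.png")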

The output will be a .png file, representing the input image passed through the semantic image compression scheme.