# TST-CycleGAN with CLIP integration for Multimodal Machine Translation

This notebook clones the TST-CycleGAN repo, sets up the environment, and runs the training pipeline with the following phases:
1. Caption Training (Image to Text)
2. Translation Training (Supervised Translation)
3. Cycle Training (CycleGAN with multimodal inputs)
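The three phases form a chain. Each later phase warm-starts from the `final/` checkpoint of the previous one via `--from_pretrained` (sections 8 and 9). Below is a minimal sketch of that dependency. The `Phase` dataclass and the `phase_plan` helper are hypothetical and are not part of the repository.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phase:
    name: str                 # value passed to --training_phase
    save_dir: str             # value passed to --save_base_folder
    init_from: Optional[str]  # value passed to --from_pretrained (None = no warm start)


def phase_plan(caption_dir: str, translate_dir: str, cycle_dir: str) -> List[Phase]:
    """Return the caption -> translate -> cycle chain; each phase loads the previous 'final/' folder."""
    return [
        Phase("caption", caption_dir, None),
        Phase("translate", translate_dir, f"{caption_dir}/final/"),
        Phase("cycle", cycle_dir, f"{translate_dir}/final/"),
    ]


for p in phase_plan("models/caption_p1", "models/translate_p2", "models/cycle_p3"):
    print(p.name, "<-", p.init_from)
```

This chain means a phase cannot start until the previous phase has written its `final/` folder.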

## 1. Mount Google Drive

In [3]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## 2. Clone Repository and Checkout Branch

In [1]:
# Clone the repository
!git clone https://github.com/developer-sidani/TST-CycleGAN.git

# Change to the repository directory
%cd TST-CycleGAN

# Checkout the mmt branch
!git checkout mmt

# Show current branch to confirm
!git branch

Cloning into 'TST-CycleGAN'...
remote: Enumerating objects: 642, done.
remote: Counting objects: 100% (100/100), done.
remote: Compressing objects: 100% (73/73), done.
remote: Total 642 (delta 45), reused 73 (delta 26), pack-reused 542 (from 1)
Receiving objects: 100% (642/642), 23.02 MiB | 10.69 MiB/s, done.
Resolving deltas: 100% (252/252), done.
/content/TST-CycleGAN
Branch 'mmt' set up to track remote branch 'mmt' from 'origin'.
Switched to a new branch 'mmt'
  main
* mmt


## 3. Set Up Conda Environment with Condacolab

In [6]:
# Install condacolab
!pip install -q condacolab
import condacolab
condacolab.install()

⏬ Downloading https://github.com/jaimergp/miniforge/releases/download/24.11.2-1_colab/Miniforge3-colab-24.11.2-1_colab-Linux-x86_64.sh...
📦 Installing...
📌 Adjusting configuration...
🩹 Patching environment...
⏲ Done in 0:00:11
🔁 Restarting kernel...


In [4]:
from google.colab import userdata
import os

os.environ["COMET_API_KEY"] = userdata.get('COMET_API_KEY')
os.environ["COMET_PROJECT_NAME"] = 'cyclegan-mmt'
os.environ["COMET_WORKSPACE"] = userdata.get('COMET_WORKSPACE')
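Before a long training run, it can help to confirm that the Comet variables are actually set. That way a missing secret fails fast instead of surfacing only after the models have loaded. Below is a minimal sketch. `missing_env_vars` is a hypothetical helper, and the variable names mirror the cell above.

```python
import os
from typing import Iterable, List, Mapping, Optional

REQUIRED_COMET_VARS = ("COMET_API_KEY", "COMET_PROJECT_NAME", "COMET_WORKSPACE")


def missing_env_vars(names: Iterable[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names that are unset or empty in `env` (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_COMET_VARS)
if missing:
    print("Missing Comet settings:", ", ".join(missing))
else:
    print("Comet logging is configured.")
```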



In [15]:
!cd /content/drive/MyDrive/env && ls

In [20]:
!conda info --envs


# conda environments:
#
base                   /usr/local



In [26]:
!conda env list


# conda environments:
#
base                   /usr/local
cyclegan               /usr/local/envs/cyclegan
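Phase 1 calls the `cyclegan` environment's interpreter by absolute path, so it is worth confirming that the environment appears in this listing. Below is a minimal sketch that parses the text printed by `conda env list` into a name-to-prefix mapping. `parse_conda_env_list` is a hypothetical helper. It assumes the two-column layout shown above and no spaces in prefixes, and it skips the `*` marker that conda prints next to the active environment.

```python
from typing import Dict


def parse_conda_env_list(text: str) -> Dict[str, str]:
    """Map environment name -> prefix from `conda env list` output.

    Unnamed environments (a bare path) are keyed by their path.
    """
    envs: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the "# conda environments:" header
        parts = [p for p in line.split() if p != "*"]  # drop the active-env marker
        envs[parts[0]] = parts[-1]
    return envs


sample = """# conda environments:
#
base                   /usr/local
cyclegan               /usr/local/envs/cyclegan
"""
envs = parse_conda_env_list(sample)
print(envs["cyclegan"])  # /usr/local/envs/cyclegan
```

In the notebook, the text could come from `subprocess.run(["conda", "env", "list"], capture_output=True, text=True, check=True).stdout`.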



## 4. Data Setup

Check and prepare the Multi30k dataset

In [24]:
# Unzip the image archive from Google Drive into the repository's data/images folder

!unzip /content/drive/MyDrive/thesis/data.zip -d /content/TST-CycleGAN/data/images


Streaming output truncated to the last 5000 lines.
  inflating: /content/TST-CycleGAN/data/images/data/train/2289751916.jpg  
  inflating: /content/TST-CycleGAN/data/images/__MACOSX/data/train/._2289751916.jpg  
  inflating: /content/TST-CycleGAN/data/images/data/train/3273892996.jpg  
  inflating: /content/TST-CycleGAN/data/images/__MACOSX/data/train/._3273892996.jpg  
  inflating: /content/TST-CycleGAN/data/images/data/train/1783147941.jpg  
  inflating: /content/TST-CycleGAN/data/images/__MACOSX/data/train/._1783147941.jpg  
  inflating: /content/TST-CycleGAN/data/images/data/train/2114355355.jpg  
  inflating: /content/TST-CycleGAN/data/images/__MACOSX/data/train/._2114355355.jpg  
  inflating: /content/TST-CycleGAN/data/images/data/train/7813154662.jpg  
  inflating: /content/TST-CycleGAN/data/images/__MACOSX/data/train/._7813154662.jpg  
  inflating: /content/TST-CycleGAN/data/images/data/train/3416460533.jpg  
  inflating: /content/TST-CycleGAN/data/images/__MACOSX
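The listing above shows that each image is paired with an AppleDouble `._*.jpg` entry under `__MACOSX/`. These entries are macOS metadata, not images. Below is a minimal sketch that uses the standard-library `zipfile` module to extract only the real entries. `extract_without_macosx` is a hypothetical helper.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_without_macosx(zip_path: str, dest: str) -> List[str]:
    """Extract `zip_path` into `dest`, skipping macOS metadata; return the extracted member names."""
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            path = Path(member)
            # Skip the __MACOSX tree and AppleDouble "._" resource-fork files.
            if "__MACOSX" in path.parts or path.name.startswith("._"):
                continue
            zf.extract(member, dest)
            extracted.append(member)
    return extracted
```

Calling `extract_without_macosx("/content/drive/MyDrive/thesis/data.zip", "/content/TST-CycleGAN/data/images")` keeps the archive's own layout. The images therefore still land in `data/images/data/train`, where `IMAGES_DIR` expects them.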

In [33]:
# Define paths
IMAGES_DIR = "/content/TST-CycleGAN/data/images/data/train"
IMAGES_TEST_DIR = "/content/TST-CycleGAN/data/images/data/test_2016_flickr"
TEMP_DIR = "/content/TST-CycleGAN/data/multi30k/data/task1/raw"

# Create directories if they don't exist
!mkdir -p $IMAGES_DIR
!mkdir -p $IMAGES_TEST_DIR
!mkdir -p $TEMP_DIR

# Check whether the images are already in place
import os
if not os.listdir(IMAGES_DIR):
    print("No images found in IMAGES_DIR; download the Multi30k images before continuing.")
    # Placeholder: add the Multi30k download commands here, for example:
    # !wget -P /tmp/ https://github.com/multi30k/dataset/raw/master/data/task1/image_splits/train.txt
    # (this fetches only the training split list; the images need their own download commands)
else:
    print("Multi30k data already exists.")

Multi30k data already exists.
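The check above only confirms that `IMAGES_DIR` is non-empty. A stricter check verifies that every image named in a split file is present. Below is a minimal sketch. It assumes the split file lists one image filename per line, as the Multi30k `image_splits` files do. `missing_images` is a hypothetical helper.

```python
import os
from typing import List


def missing_images(split_file: str, image_dir: str) -> List[str]:
    """Return filenames listed in `split_file` (one per line) that are not present in `image_dir`."""
    with open(split_file, encoding="utf-8") as f:
        names = [line.strip() for line in f if line.strip()]
    present = set(os.listdir(image_dir))
    return [name for name in names if name not in present]
```

Once section 5 has defined `TRAIN_SPLITS`, you can run `missing_images(TRAIN_SPLITS, IMAGES_DIR)`. An empty list means every training image listed in the split is in place.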


## 5. Generate TSV Files for Training

Create TSV files from raw data and image splits

In [41]:
# Set parameters
SRC_LANG = "en"  # Source language
TGT_LANG = "de"  # Target language
TEST_YEAR = "2016"  # Test set year
TEST_SET = "flickr"  # Test set name

# Define paths
TRAIN_SPLITS = f"/content/TST-CycleGAN/data/multi30k/data/task1/image_splits/train.txt"
TEST_SPLITS = f"/content/TST-CycleGAN/data/multi30k/data/task1/image_splits/test_{TEST_YEAR}_{TEST_SET}.txt"

CAPTION_FILE_SRC = f"{TEMP_DIR}/train.{SRC_LANG}"
CAPTION_FILE_TGT = f"{TEMP_DIR}/train.{TGT_LANG}"
CAPTION_FILE_SRC_EVAL = f"{TEMP_DIR}/test_{TEST_YEAR}_{TEST_SET}.{SRC_LANG}"
CAPTION_FILE_TGT_EVAL = f"{TEMP_DIR}/test_{TEST_YEAR}_{TEST_SET}.{TGT_LANG}"

# Generate TSV files if they don't exist
import os

# Placeholder paths: replace "/path/to/..." with the actual Multi30k text files
# Generate the files only if the source-side file is missing
if not os.path.exists(CAPTION_FILE_SRC):
    print("TSV files need to be generated. Please ensure the raw data is downloaded.")
    # To generate them, pair each image filename with its caption, line by line
    # (IPython expands {SRC_LANG}; shell-style ${SRC_LANG} would not expand as intended):
    # !paste -d '\t' $TRAIN_SPLITS "/path/to/train.{SRC_LANG}" > $CAPTION_FILE_SRC
    # !paste -d '\t' $TRAIN_SPLITS "/path/to/train.{TGT_LANG}" > $CAPTION_FILE_TGT
else:
    print("TSV files already exist.")

TSV files already exist.
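The commented-out `paste` commands pair line *i* of the image split file with line *i* of a caption file. `paste` silently pads the shorter file with empty fields. Below is a minimal Python equivalent that refuses mismatched inputs instead. `paste_tsv` is a hypothetical helper. This notebook does not confirm whether `train.py` expects exactly this `image<TAB>caption` layout.

```python
def paste_tsv(image_split: str, captions: str, out_path: str) -> int:
    """Write one "image<TAB>caption" row per line pair and return the row count."""
    with open(image_split, encoding="utf-8") as f:
        images = [line.rstrip("\n") for line in f]
    with open(captions, encoding="utf-8") as f:
        texts = [line.rstrip("\n") for line in f]
    # Unlike `paste`, which pads the shorter file with empty fields, reject misaligned inputs.
    if len(images) != len(texts):
        raise ValueError(f"{image_split}: {len(images)} lines, {captions}: {len(texts)} lines")
    with open(out_path, "w", encoding="utf-8") as out:
        for image, text in zip(images, texts):
            out.write(f"{image}\t{text}\n")
    return len(images)
```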


## 6. Define Model Settings and Common Parameters

In [42]:
# Define model directories
CAPTION_MODEL_NAME = f"multi30k_{SRC_LANG}_{TGT_LANG}_caption_p1"
TRANSLATE_MODEL_NAME = f"multi30k_{SRC_LANG}_{TGT_LANG}_translate_p2"
CYCLE_MODEL_NAME = f"multi30k_{SRC_LANG}_{TGT_LANG}_cycle_p3"

CAPTION_SAVE_DIR = f"/content/drive/MyDrive/TST-CycleGAN-data/models/{CAPTION_MODEL_NAME}"
TRANSLATE_SAVE_DIR = f"/content/drive/MyDrive/TST-CycleGAN-data/models/{TRANSLATE_MODEL_NAME}"
CYCLE_SAVE_DIR = f"/content/drive/MyDrive/TST-CycleGAN-data/models/{CYCLE_MODEL_NAME}"

# Create directories if they don't exist
!mkdir -p $CAPTION_SAVE_DIR
!mkdir -p $TRANSLATE_SAVE_DIR
!mkdir -p $CYCLE_SAVE_DIR

# Common parameters
CLIP_MODEL = "openai/clip-vit-base-patch32"
MBART_MODEL = "facebook/mbart-large-50"
PREFIX_LENGTH = 10
MAPPING_NETWORK = "mlp"
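`PREFIX_LENGTH` and `MAPPING_NETWORK = "mlp"` suggest a ClipCap-style design. In that design, an MLP maps one CLIP image embedding to `PREFIX_LENGTH` pseudo-token embeddings, which are prepended to the text model's input. This is an assumption about the repository; the notebook does not confirm it. The sketch below only works through the shape and parameter arithmetic for a two-layer MLP, under these further assumptions:

* The hidden width follows ClipCap's `(d_model * prefix_length) // 2`.
* The CLIP ViT-B/32 projection size is 512.
* The hidden size of mBART-large-50 is 1024.

```python
from typing import Dict, List, Tuple


def mlp_mapper_shapes(clip_dim: int, d_model: int, prefix_length: int) -> Dict[str, object]:
    """Layer shapes and parameter count of a 2-layer MLP: clip_dim -> hidden -> prefix_length * d_model."""
    out_dim = d_model * prefix_length
    hidden = out_dim // 2  # ClipCap's hidden width (assumption for this repo)
    layers: List[Tuple[int, int]] = [(clip_dim, hidden), (hidden, out_dim)]
    params = sum(fan_in * fan_out + fan_out for fan_in, fan_out in layers)  # weights + biases
    return {
        "layers": layers,
        "params": params,
        "prefix_shape": (prefix_length, d_model),  # output reshaped into pseudo-tokens
    }


info = mlp_mapper_shapes(clip_dim=512, d_model=1024, prefix_length=10)
print(info["layers"])         # [(512, 5120), (5120, 10240)]
print(f"{info['params']:,}")  # 55,065,600
print(info["prefix_shape"])   # (10, 1024)
```

Under these assumptions, the second layer dominates the parameter count. Both its input and output widths scale with `PREFIX_LENGTH`, so its size grows roughly with the square of the prefix length.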

## 7. PHASE 1: Caption Training

Train the model to generate captions from images

In [43]:
!cd /usr/local/envs/cyclegan/ && ls

bin		 etc	  mkspecs      plugins	  ssl
cmake		 include  mysql        qml	  translations
compiler_compat  lib	  mysql-test   resources  var
conda-meta	 libexec  opt	       sbin	  x86_64-conda_cos7-linux-gnu
doc		 man	  phrasebooks  share	  x86_64-conda-linux-gnu


In [44]:
print("Starting PHASE 1 - CAPTION TRAINING")

python_path = '/usr/local/envs/cyclegan/bin/python3'  # interpreter of the cyclegan conda env
# Run the training script with appropriate parameters
!$python_path train.py \
    --style_a $SRC_LANG \
    --style_b $TGT_LANG \
    --training_phase caption \
    --use_clip \
    --clip_model_name $CLIP_MODEL \
    --prefix_length $PREFIX_LENGTH \
    --mapping_network $MAPPING_NETWORK \
    --image_dir $IMAGES_DIR \
    --caption_file_a $CAPTION_FILE_SRC \
    --caption_file_b $CAPTION_FILE_TGT \
    --caption_file_a_eval $CAPTION_FILE_SRC_EVAL \
    --caption_file_b_eval $CAPTION_FILE_TGT_EVAL \
    --lang $SRC_LANG \
    --batch_size 16 \
    --epochs 5 \
    --save_base_folder $CAPTION_SAVE_DIR \
    --save_steps 1 \
    --learning_rate 2e-5 \
    --lr_scheduler_type "linear" \
    --warmup \
    --use_cuda_if_available

print("Completed PHASE 1 - CAPTION TRAINING")

Starting PHASE 1 - CAPTION TRAINING
Arguments summary: 
 
	style_a:		en
	style_b:		de
	lang:		en
	max_samples_train:		None
	max_samples_eval:		None
	nonparal_same_size:		False
	path_mono_A:		None
	path_mono_B:		None
	path_mono_A_eval:		None
	path_mono_B_eval:		None
	path_paral_A_eval:		None
	path_paral_B_eval:		None
	path_paral_eval_ref:		None
	n_references:		None
	lowercase_ref:		False
	bertscore:		True
	max_sequence_length:		64
	batch_size:		16
	shuffle:		False
	num_workers:		4
	pin_memory:		False
	use_cuda_if_available:		True
	learning_rate:		2e-05
	epochs:		5
	lr_scheduler_type:		linear
	warmup:		True
	lambdas:		1|1|1|1|1|1
	generator_model_tag:		None
	discriminator_model_tag:		None
	pretrained_classifier_model:		None
	pretrained_classifier_eval:		None
	save_base_folder:		/content/drive/MyDrive/TST-CycleGAN-data/models/multi30k_en_de_caption_p1
	from_pretrained:		None
	save_steps:		1
	eval_strategy:		None
	eval_steps:		None
	additional_eval:		0
	control_file:		None
	lambda_file:		N
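The three training cells repeat most of their flags. In addition, IPython expands `$name` and `{expression}` in `!` commands before the shell sees them, which makes shell-style `${VAR}` easy to get wrong. One alternative is to build the argument list in Python and launch it with `subprocess.run`. This bypasses both the shell and IPython's expansion. Below is a minimal sketch. `train_command` is a hypothetical helper, and the flag names are copied from the Phase 1 cell.

```python
import shlex
from typing import Dict, List, Union

FlagValue = Union[str, int, float, bool, None]


def train_command(python: str, script: str, flags: Dict[str, FlagValue]) -> List[str]:
    """Turn {"batch_size": 16, "warmup": True} into ["--batch_size", "16", "--warmup"]."""
    cmd = [python, script]
    for name, value in flags.items():
        if value is None or value is False:
            continue  # omit unset flags
        cmd.append(f"--{name}")
        if value is not True:  # True means a bare switch such as --warmup
            cmd.append(str(value))  # note: str(2e-5) == "2e-05"
    return cmd


phase1 = train_command(
    "/usr/local/envs/cyclegan/bin/python3",
    "train.py",
    {"style_a": "en", "style_b": "de", "training_phase": "caption", "use_clip": True,
     "batch_size": 16, "epochs": 5, "learning_rate": 2e-5, "warmup": True},
)
print(shlex.join(phase1))  # inspect the exact command before running it
```

To launch the command from the repository root, call `subprocess.run(phase1, check=True)`.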

## 8. PHASE 2: Translation Training

Train the model for supervised translation using images as context

In [None]:
print("Starting PHASE 2 - TRANSLATION TRAINING")

# Run the training script with the cyclegan env interpreter (python_path is set in Phase 1)
!$python_path train.py \
    --style_a $SRC_LANG \
    --style_b $TGT_LANG \
    --training_phase translate \
    --use_clip \
    --clip_model_name $CLIP_MODEL \
    --prefix_length $PREFIX_LENGTH \
    --mapping_network $MAPPING_NETWORK \
    --image_dir $IMAGES_DIR \
    --caption_file_a $CAPTION_FILE_SRC \
    --caption_file_b $CAPTION_FILE_TGT \
    --caption_file_a_eval $CAPTION_FILE_SRC_EVAL \
    --caption_file_b_eval $CAPTION_FILE_TGT_EVAL \
    --lang $SRC_LANG \
    --generator_model_tag $MBART_MODEL \
    --discriminator_model_tag "distilbert-base-multilingual-cased" \
    --batch_size 16 \
    --epochs 10 \
    --from_pretrained "{CAPTION_SAVE_DIR}/final/" \
    --save_base_folder $TRANSLATE_SAVE_DIR \
    --save_steps 1 \
    --learning_rate 2e-5 \
    --lr_scheduler_type "linear" \
    --warmup \
    --use_cuda_if_available

print("Completed PHASE 2 - TRANSLATION TRAINING")
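Phase 2 loads `final/` from the caption run, and Phase 3 loads it from the translation run. If the previous phase stopped early, that folder may be missing or empty. Below is a minimal pre-flight check. It assumes only that a usable checkpoint is a non-empty directory; this notebook does not show what `train.py` writes there. `checkpoint_ready` is a hypothetical helper.

```python
import os


def checkpoint_ready(path: str) -> bool:
    """True if `path` is a directory containing at least one entry."""
    if not os.path.isdir(path):
        return False
    with os.scandir(path) as entries:
        return any(True for _ in entries)
```

Before Phase 2, `assert checkpoint_ready(f"{CAPTION_SAVE_DIR}/final/"), "run Phase 1 first"` stops the cell early. Use the same check with `TRANSLATE_SAVE_DIR` before Phase 3.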

## 9. PHASE 3: Cycle Training

Train the full CycleGAN model with cycle consistency

In [None]:
print("Starting PHASE 3 - CYCLE TRAINING")

# Run the training script with the cyclegan env interpreter (python_path is set in Phase 1)
!$python_path train.py \
    --style_a $SRC_LANG \
    --style_b $TGT_LANG \
    --training_phase cycle \
    --use_clip \
    --clip_model_name $CLIP_MODEL \
    --prefix_length $PREFIX_LENGTH \
    --mapping_network $MAPPING_NETWORK \
    --image_dir $IMAGES_DIR \
    --caption_file_a $CAPTION_FILE_SRC \
    --caption_file_b $CAPTION_FILE_TGT \
    --caption_file_a_eval $CAPTION_FILE_SRC_EVAL \
    --caption_file_b_eval $CAPTION_FILE_TGT_EVAL \
    --lang $SRC_LANG \
    --generator_model_tag $MBART_MODEL \
    --discriminator_model_tag "distilbert-base-multilingual-cased" \
    --batch_size 8 \
    --epochs 10 \
    --from_pretrained "{TRANSLATE_SAVE_DIR}/final/" \
    --save_base_folder $CYCLE_SAVE_DIR \
    --save_steps 1 \
    --learning_rate 1e-5 \
    --lr_scheduler_type "linear" \
    --warmup \
    --lambdas "1|1|0.5|0.5|10" \
    --use_cuda_if_available

print("Completed PHASE 3 - CYCLE TRAINING")
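`--lambdas` packs the loss weights into one `|`-separated string. Phase 3 passes five values, while the default printed in the Phase 1 log has six. Below is a minimal sketch that parses the string and, assuming the weights are applied as a plain weighted sum, combines per-term losses as $\mathcal{L} = \sum_i \lambda_i \mathcal{L}_i$. This notebook does not document which loss term each position weights. `parse_lambdas` and `weighted_loss` are hypothetical helpers, and the count check is only a guard.

```python
from typing import List, Optional, Sequence


def parse_lambdas(spec: str, expected: Optional[int] = None) -> List[float]:
    """Parse "1|1|0.5|0.5|10" into floats, optionally checking how many there are."""
    weights = [float(x) for x in spec.split("|")]
    if expected is not None and len(weights) != expected:
        raise ValueError(f"expected {expected} weights, got {len(weights)}: {spec!r}")
    return weights


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of per-term losses (assumes a plain linear combination)."""
    if len(losses) != len(weights):
        raise ValueError("one weight per loss term is required")
    return sum(w * loss for w, loss in zip(weights, losses))


default_count = len(parse_lambdas("1|1|1|1|1|1"))  # the default shown in the Phase 1 log
phase3 = parse_lambdas("1|1|0.5|0.5|10")
print(len(phase3), default_count)  # 5 6
```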

## 10. Testing and Evaluation

In [None]:
print("Running evaluation on test set")

# Example test command; adjust the flags to match test.py's actual arguments
!$python_path test.py \
    --style_a $SRC_LANG \
    --style_b $TGT_LANG \
    --direction AB \
    --input_mode text \
    --input_file $CAPTION_FILE_SRC_EVAL \
    --output_file "{CYCLE_SAVE_DIR}/test_output.txt" \
    --generator_model_tag $MBART_MODEL \
    --model_path "{CYCLE_SAVE_DIR}/final/" \
    --use_clip \
    --clip_model_name $CLIP_MODEL \
    --prefix_length $PREFIX_LENGTH \
    --mapping_network $MAPPING_NETWORK \
    --use_cuda_if_available

print("Evaluation complete!")
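To score `test_output.txt`, you need a reference file with the same number of lines. Here that is presumably `CAPTION_FILE_TGT_EVAL`, the German side of the 2016 Flickr test set. Below is a minimal sketch that loads both files and checks that they are aligned. It assumes `test.py` writes exactly one translation per input line, which this notebook does not confirm. `load_aligned` is a hypothetical helper.

```python
from typing import List, Tuple


def load_aligned(hyp_path: str, ref_path: str) -> Tuple[List[str], List[str]]:
    """Read hypothesis and reference files (one sentence per line) and check they align."""
    with open(hyp_path, encoding="utf-8") as f:
        hyps = [line.rstrip("\n") for line in f]
    with open(ref_path, encoding="utf-8") as f:
        refs = [line.rstrip("\n") for line in f]
    if len(hyps) != len(refs):
        raise ValueError(f"{len(hyps)} hypotheses vs {len(refs)} references")
    return hyps, refs
```

If `sacrebleu` is installed, `sacrebleu.corpus_bleu(hyps, [refs]).score` gives corpus-level BLEU for the aligned lists.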

## 11. Visualize Results (Optional)

In [None]:
# Plot training losses, if the run saved them as a pickle
# (assumes loss.pickle holds a dict whose list values are loss series; adapt to the actual format)
import os
import pickle

import matplotlib.pyplot as plt

loss_file = f"{CYCLE_SAVE_DIR}/loss.pickle"
if not os.path.exists(loss_file):
    print(f"No loss file found at {loss_file}")
else:
    try:
        with open(loss_file, "rb") as f:
            loss_data = pickle.load(f)

        # Plot every list-valued entry as its own curve
        plt.figure(figsize=(12, 6))
        for key, values in loss_data.items():
            if isinstance(values, list):
                plt.plot(values, label=key)
        plt.legend()
        plt.title("Training Losses")
        plt.show()
    except Exception as e:
        print(f"Error visualizing results: {e}")
        print("Visualization code needs to be adapted to your specific output format.")
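Per-step GAN losses tend to be noisy, so a smoothed curve can be easier to read than the raw series. Below is a minimal sketch of a trailing moving average in plain Python. `moving_average` and any window size you pass are illustrative choices; the training script does not produce them.

```python
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing mean over the last `window` points (fewer points at the start of the series)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the point that left the window
        out.append(running / min(i + 1, window))
    return out


print(moving_average([4.0, 2.0, 6.0, 0.0], window=2))  # [4.0, 3.0, 4.0, 3.0]
```

Inside the plotting loop, `plt.plot(moving_average(values, 50), label=f"{key} (smoothed)")` would draw the smoothed series next to, or instead of, the raw one.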