# Text-to-Gloss Generation using Flan-T5 (ISL_CLSRT)

This notebook demonstrates **generative text-to-gloss mapping** with **Flan-T5** zero-shot prompting. We give the model natural-language sentences together with a short instruction and ask it to produce **gloss-style outputs** (uppercase content words, as in the dataset's `cleaned_gloss` column) without any supervised fine-tuning. This **instruction prompting** approach is a core technique in **Generative AI**.


In [None]:
# !pip install transformers pandas


In [1]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

import torch


In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [3]:
df = pd.read_csv('/content/drive/MyDrive/IETGenAI-SLT/Chapter 4/isl_train_meta_cleaned.csv')
df_sample = df.sample(10, random_state=42).copy()
df_sample[['Sentences', 'cleaned_gloss']]

| index | Sentences | cleaned_gloss |
|---|---|---|
| 361 | you are good | GOOD |
| 73 | it was nice chatting with you | NICE CHATTING |
| 374 | i got hurt | GOT HURT |
| 155 | you can do it | |
| 104 | he came by train | CAME TRAIN |
| 394 | you need a medicine, take this one | NEED MEDICINE TAKE ONE |
| 377 | speak softly | SPEAK SOFTLY |
| 124 | he came by train | CAME TRAIN |
| 68 | we are all with you | |
| 450 | he came by train | CAME TRAIN |
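Two of the ten sampled rows (155 and 68) have an empty `cleaned_gloss`. With `pd.read_csv` defaults an empty field is loaded as `NaN`, though a whitespace-only string would also look blank in this output, so the sketch below treats both as missing. Such rows can never yield a meaningful exact match, so it can help to flag them before evaluation. This is a minimal, self-contained sketch that rebuilds only the rows shown above; `has_reference` and `evaluable` are hypothetical names, not columns of the dataset.

```python
import numpy as np
import pandas as pd

# Rows copied from the sample output above; np.nan stands in for the blank glosses
sample = pd.DataFrame(
    {
        "Sentences": [
            "you are good", "it was nice chatting with you", "i got hurt",
            "you can do it", "he came by train",
            "you need a medicine, take this one", "speak softly",
            "he came by train", "we are all with you", "he came by train",
        ],
        "cleaned_gloss": [
            "GOOD", "NICE CHATTING", "GOT HURT", np.nan, "CAME TRAIN",
            "NEED MEDICINE TAKE ONE", "SPEAK SOFTLY", "CAME TRAIN",
            np.nan, "CAME TRAIN",
        ],
    },
    index=[361, 73, 374, 155, 104, 394, 377, 124, 68, 450],
)

# Treat both NaN and whitespace-only strings as "no reference gloss"
has_reference = sample["cleaned_gloss"].fillna("").str.strip() != ""
print(f"{(~has_reference).sum()} of {len(sample)} rows lack a reference gloss")
print(sample.loc[~has_reference, "Sentences"].tolist())

# Keep only rows that actually have a reference for evaluation
evaluable = sample[has_reference]
print(sample["Sentences"].nunique(), "distinct sentences in the sample")
```

The same sample also draws `he came by train` three times (rows 104, 124 and 450), so the ten rows contain only eight distinct sentences, and any score computed on it weights that sentence three times.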


In [None]:
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)


In [5]:
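def generate_gloss(sentence):
    # Zero-shot instruction prompt: no example glosses are shown to the model
    prompt = f"Convert the sentence to sign language gloss: {sentence}"
    inputs = tokenizer(prompt, return_tensors="pt")
    # max_length caps the generated output sequence at 30 tokens
    output = model.generate(**inputs, max_length=30)
    # Drop padding/end-of-sequence tokens when turning ids back into text
    gloss = tokenizer.decode(output[0], skip_special_tokens=True)
    return gloss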
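`generate_gloss` sends a single instruction with no examples, which is what makes this setup zero-shot. A common variation is to prepend a few input/gloss demonstrations to the instruction (few-shot prompting). The helper below is a hypothetical sketch, not part of the original notebook: with no examples it reproduces the exact zero-shot prompt used above, and with examples it builds a few-shot prompt. The `Sentence:`/`Gloss:` layout is an assumption, not a format Flan-T5 is documented to require.

```python
def build_gloss_prompt(sentence, examples=()):
    """Build the instruction prompt for one sentence.

    Hypothetical helper: with no examples it returns the zero-shot prompt used
    in generate_gloss; with (sentence, gloss) pairs it prepends them as
    few-shot demonstrations in an assumed "Sentence: ... / Gloss: ..." layout.
    """
    if not examples:
        # Identical to the zero-shot prompt in generate_gloss
        return f"Convert the sentence to sign language gloss: {sentence}"

    blocks = ["Convert each sentence to sign language gloss."]
    for src, gloss in examples:
        blocks.append(f"Sentence: {src}\nGloss: {gloss}")
    # Leave the final gloss empty so the model completes it
    blocks.append(f"Sentence: {sentence}\nGloss:")
    return "\n\n".join(blocks)


# Demonstrations taken from the reference glosses in the sample above
demos = [("he came by train", "CAME TRAIN"), ("speak softly", "SPEAK SOFTLY")]
print(build_gloss_prompt("i got hurt"))
print(build_gloss_prompt("i got hurt", demos))
```

To try it, the `prompt = ...` line in `generate_gloss` could call `build_gloss_prompt(sentence, demos)` instead; whether demonstrations change Flan-T5-small's output on this data is not shown in this notebook.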


In [6]:
df_sample['generated_gloss'] = df_sample['Sentences'].apply(generate_gloss)
df_sample[['Sentences', 'cleaned_gloss', 'generated_gloss']]


| index | Sentences | cleaned_gloss | generated_gloss |
|---|---|---|---|
| 361 | you are good | GOOD | you are good |
| 73 | it was nice chatting with you | NICE CHATTING | it was nice chatting with you |
| 374 | i got hurt | GOT HURT | i got hurt |
| 155 | you can do it | | you can do it |
| 104 | he came by train | CAME TRAIN | he came by train |
| 394 | you need a medicine, take this one | NEED MEDICINE TAKE ONE | you need a medicine, take this one |
| 377 | speak softly | SPEAK SOFTLY | speak softly |
| 124 | he came by train | CAME TRAIN | he came by train |
| 68 | we are all with you | | we are all with you |
| 450 | he came by train | CAME TRAIN | he came by train |
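Every `generated_gloss` in this table is the input sentence copied back unchanged, so with this prompt Flan-T5-small performed no gloss conversion on the sample. One quick way to quantify this kind of echoing is to compare each output with its input after normalising case and whitespace. The sketch below uses the rows of the table as plain tuples; `normalise` and `echo_rate` are hypothetical helper names.

```python
# (sentence, generated_gloss) pairs copied from the table above
pairs = [
    ("you are good", "you are good"),
    ("it was nice chatting with you", "it was nice chatting with you"),
    ("i got hurt", "i got hurt"),
    ("you can do it", "you can do it"),
    ("he came by train", "he came by train"),
    ("you need a medicine, take this one", "you need a medicine, take this one"),
    ("speak softly", "speak softly"),
    ("he came by train", "he came by train"),
    ("we are all with you", "we are all with you"),
    ("he came by train", "he came by train"),
]


def normalise(text):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def echo_rate(rows):
    """Fraction of (input, output) rows whose output equals the input."""
    if not rows:
        return 0.0
    echoed = sum(normalise(src) == normalise(out) for src, out in rows)
    return echoed / len(rows)


print(f"echo rate: {echo_rate(pairs):.0%}")  # → echo rate: 100%
```

Because of this echoing, any exact-match agreement on the sample (such as row 377, whose reference `SPEAK SOFTLY` is the sentence itself) reflects a coincidence in the reference rather than an actual conversion.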


In [7]:
df_sample.to_csv('isl_generated_gloss_sample.csv', index=False)
print("Generated gloss saved to isl_generated_gloss_sample.csv")


Generated gloss saved to isl_generated_gloss_sample.csv


### Summary

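In this notebook, we prompted **Flan-T5** (`google/flan-t5-small`) with a zero-shot **instruction prompt** to turn raw sentences into **gloss sequences**, as an example of a **Generative AI pipeline** that needs no fine-tuning. On the 10-sentence sample, however, the model returned every input sentence unchanged instead of a gloss, so with this prompt and model size the pipeline does not yet perform gloss conversion. The next cell reruns the pipeline and measures this with an exact-match agreement check.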


In [None]:
# Consolidated cell: reruns the pipeline above end to end, then adds an
# exact-match agreement check and a bar chart of the result
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch
from google.colab import drive
from IPython.display import display
import matplotlib.pyplot as plt
import seaborn as sns

# Mount Google Drive
drive.mount('/content/drive')

# Load and sample the data (same seed as above, so the same 10 rows)
df = pd.read_csv('/content/drive/MyDrive/IETGenAI-SLT/Chapter 4/isl_train_meta_cleaned.csv')
df_sample = df.sample(10, random_state=42).copy()

# Load the model and tokenizer
model_name = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Define the gloss generation function (zero-shot instruction prompt)
def generate_gloss(sentence):
    prompt = f"Convert the sentence to sign language gloss: {sentence}"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_length=30)
    gloss = tokenizer.decode(output[0], skip_special_tokens=True)
    return gloss

# Generate glosses for the sample data
df_sample['generated_gloss'] = df_sample['Sentences'].apply(generate_gloss)

# Exact-match agreement, ignoring case and surrounding whitespace.
# Missing reference glosses become '' and so cannot match a non-empty output.
reference = df_sample['cleaned_gloss'].fillna('').str.strip().str.lower()
generated = df_sample['generated_gloss'].str.strip().str.lower()
df_sample['agreement'] = reference == generated

# Display the sample data with generated gloss and agreement
display(df_sample[['Sentences', 'cleaned_gloss', 'generated_gloss', 'agreement']])

# Plot agreement counts. Mapping to labels with a fixed order keeps the
# tick labels correct even when only one outcome occurs in the sample.
agreement_label = df_sample['agreement'].map({False: 'Disagree', True: 'Agree'})
plt.figure(figsize=(6, 4))
sns.countplot(x=agreement_label, order=['Disagree', 'Agree'])
plt.title('Agreement between Cleaned and Generated Gloss (Sample)')
plt.xlabel('Exact Match Agreement')
plt.ylabel('Count')
plt.show()

# Save the sample data with generated gloss
df_sample.to_csv('isl_generated_gloss_sample.csv', index=False)
print("Generated gloss saved to isl_generated_gloss_sample.csv")
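Exact match is a strict criterion: a prediction such as `CAME BY TRAIN` counts as a disagreement with `CAME TRAIN` even though it contains both reference words. A softer, commonly used alternative is a token-level F1 score over the bag of gloss tokens (the style of overlap scoring used in SQuAD evaluation). The function below is a minimal sketch under that assumption; it is not part of the original notebook, `token_f1` is a hypothetical name, and missing references (`None` or pandas `NaN`) score 0.

```python
from collections import Counter


def _tokens(text):
    """Upper-case whitespace tokens; non-strings (None, float NaN) count as empty."""
    return text.upper().split() if isinstance(text, str) else []


def token_f1(reference, prediction):
    """Case-insensitive bag-of-tokens F1 between a reference gloss and a prediction."""
    ref_tokens = _tokens(reference)
    pred_tokens = _tokens(prediction)
    if not ref_tokens or not pred_tokens:
        return 0.0
    # Multiset intersection: each shared token counts at most min(count) times
    overlap = sum((Counter(ref_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(round(token_f1("CAME TRAIN", "he came by train"), 3))  # → 0.667
print(token_f1("GOOD", "GOOD"))  # → 1.0
```

On the notebook's DataFrame this could be applied as `df_sample.apply(lambda r: token_f1(r['cleaned_gloss'], r['generated_gloss']), axis=1)`. Echoed sentences still earn partial credit whenever they contain the reference words (for example `you are good` against `GOOD` scores 0.5), so this score is best read alongside the exact-match result rather than instead of it.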