<a href="https://colab.research.google.com/github/SattamAltwaim/SaSOKE/blob/main/notebooks/5_text_to_sign_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Text-to-Sign Language Inference
Generate sign language motion (SMPL-X pose sequences) from custom text input using the SOKE model.


## 1. Setup Environment


In [1]:
# Clone repo if not present
import os
if not os.path.exists('/content/SaSOKE'):
    !git clone https://github.com/SattamAltwaim/SaSOKE.git
%cd /content/SaSOKE

# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

drive_data = '/content/drive/MyDrive/GraduationProject/CodeFiles/SaSOKE'
print("✓ Code:", os.getcwd())
print("✓ Data:", drive_data)


/content/SaSOKE
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
✓ Code: /content/SaSOKE
✓ Data: /content/drive/MyDrive/GraduationProject/CodeFiles/SaSOKE


In [2]:
# Install dependencies (if needed)
%pip install -q pytorch_lightning torchmetrics omegaconf shortuuid transformers diffusers einops wandb rich matplotlib
%pip install -q smplx h5py scikit-image spacy ftfy more-itertools natsort tensorboard sentencepiece
%pip install -q gdown pandas



## 2. Verify GPU


In [3]:
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ No GPU detected! Go to Runtime → Change runtime type → GPU")


CUDA available: True
GPU: NVIDIA A100-SXM4-40GB
GPU Memory: 42.47 GB


## 3. Enter Your Custom Text


In [4]:
# Enter your text here - you can modify this!
custom_texts = [
    "Hello, how are you today?",
    "Thank you for your help.",
    "I am learning sign language."
]

# Choose sign language (default: American Sign Language)
# Options: 'how2sign' (ASL), 'csl' (Chinese SL), 'phoenix' (German SL)
sign_language = 'how2sign'  # Change this to generate different sign languages

print(f"Target sign language: {sign_language}")
print("\nInput texts:")
for i, text in enumerate(custom_texts, 1):
    print(f"{i}. {text}")


Target sign language: how2sign

Input texts:
1. Hello, how are you today?
2. Thank you for your help.
3. I am learning sign language.


### About the Three Sign Languages

The SOKE model supports three sign languages:

1. **`how2sign`** → American Sign Language (ASL)
   - Uses language token `en_ASL`
   - Input text: English
   
2. **`csl`** → Chinese Sign Language (CSL)
   - Uses language token `zh_CSL`
   - Input text: Chinese
   
3. **`phoenix`** → German Sign Language (DGS)
   - Uses language token `de_DGS`
   - Input text: German

The `sign_language` value you choose selects the language token, so write the input text in the matching spoken language (for example, English for `how2sign`).
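
The mapping listed above can be written down as a small lookup, which also makes it easy to reject a misspelled dataset key before any model code runs. This is a minimal sketch: `LANGUAGE_TOKENS` and `resolve_language_token` are hypothetical names introduced here for illustration and are not part of the SaSOKE codebase; only the keys and tokens come from the list above.

```python
# Hypothetical helper for illustration; not part of the SaSOKE repository.
LANGUAGE_TOKENS = {
    "how2sign": "en_ASL",  # American Sign Language, English input
    "csl": "zh_CSL",       # Chinese Sign Language, Chinese input
    "phoenix": "de_DGS",   # German Sign Language, German input
}


def resolve_language_token(sign_language: str) -> str:
    """Return the language token for a dataset key, or raise on an unknown key."""
    try:
        return LANGUAGE_TOKENS[sign_language]
    except KeyError:
        valid = ", ".join(sorted(LANGUAGE_TOKENS))
        raise ValueError(
            f"Unknown sign language {sign_language!r}; expected one of: {valid}"
        ) from None


print(resolve_language_token("how2sign"))
```

Calling the helper on `sign_language` right after it is set in section 3 would surface a typo immediately instead of at generation time.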


## 4. Run Inference on Your Text


In [None]:
import os

# Google Drive folder that holds the data and dependencies
drive_data = '/content/drive/MyDrive/GraduationProject/CodeFiles/SaSOKE'  # redefined so this cell can run on its own

# Create symbolic links for deps directories
# Note: We don't symlink deps/t2m to avoid circular reference issues
deps_links = {
    'deps/smpl_models': f'{drive_data}/deps/smpl_models',
    'deps/mbart-h2s-csl-phoenix': f'{drive_data}/deps/mbart-h2s-csl-phoenix',
}

for expected_path, actual_path in deps_links.items():
    if not os.path.exists(expected_path):
        print(f"Creating symbolic link from '{expected_path}' to '{actual_path}'")
        # Ensure the parent directory for the symlink exists
        os.makedirs(os.path.dirname(expected_path), exist_ok=True)
        os.symlink(actual_path, expected_path)
        print(f"  ✓ {expected_path} linked")
    else:
        print(f"  ✓ {expected_path} already exists")

print("\n✓ All symbolic links created!")

  ✓ deps/smpl_models already exists
  ✓ deps/mbart-h2s-csl-phoenix already exists

✓ All symbolic links created!


In [None]:
# Configuration and Argument Parsing (Run this cell ONLY ONCE per runtime session)
import sys
import yaml
from omegaconf import OmegaConf
from mGPT.config import parse_args

# Configure paths
with open('configs/soke.yaml', 'r') as f:
    config = yaml.safe_load(f)

config['ACCELERATOR'] = 'gpu'
config['DEVICE'] = [0]
config['DATASET']['H2S']['ROOT'] = f'{drive_data}/data/How2Sign'
config['DATASET']['H2S']['MEAN_PATH'] = f'{drive_data}/smpl-x/mean.pt'
config['DATASET']['H2S']['STD_PATH'] = f'{drive_data}/smpl-x/std.pt'
config['TRAIN']['PRETRAINED_VAE'] = f'{drive_data}/checkpoints/vae/tokenizer.ckpt'

with open('configs/text_inference.yaml', 'w') as f:
    yaml.dump(config, f)

# Update assets with the correct path from Google Drive
with open('configs/assets.yaml', 'r') as f:
    assets = yaml.safe_load(f)

# Use mix of symlinks and full paths to avoid circular reference
assets['RENDER']['SMPL_MODEL_PATH'] = 'deps/smpl_models/smpl'
assets['RENDER']['MODEL_PATH'] = 'deps/smpl_models'
# Use full Google Drive path for t2m to avoid circular symlink issues
assets['METRIC']['TM2T']['t2m_path'] = f'{drive_data}/deps/t2m/t2m/'

with open('configs/assets_inference.yaml', 'w') as f:
    yaml.dump(assets, f)

# Parse config (the parse_args function handles resolver registration)
sys.argv = ['', '--cfg', 'configs/text_inference.yaml', '--cfg_assets', 'configs/assets_inference.yaml']

# parse_args will register the 'eval' resolver automatically
cfg = parse_args(phase="test")
cfg.FOLDER = cfg.TEST.FOLDER

print("✓ Configuration and arguments parsed!")

Force no debugging when testing
✓ Configuration and arguments parsed!


In [7]:
# Model Setup and Loading (You can rerun this cell as needed)
import torch
import pytorch_lightning as pl
from mGPT.models.build_model import build_model
from mGPT.data.build_data import build_data
from mGPT.utils.load_checkpoint import load_pretrained_vae, load_pretrained
from mGPT.utils.logger import create_logger
import os

# Assuming 'cfg' and 'drive_data' are defined from the previous cell

# Seed
pl.seed_everything(cfg.SEED_VALUE)

# Update the word vectorizer path in cfg (word vectorizer is in glove/ subdirectory)
# Use full Google Drive path
cfg.DATASET.WORD_VERTILIZER_PATH = f'{drive_data}/deps/t2m/t2m/glove/'

# Build data and model
print("Loading model...")
print(f"Word vectorizer path being used: {cfg.DATASET.WORD_VERTILIZER_PATH}")
datamodule = build_data(cfg)
model = build_model(cfg, datamodule)

# Load checkpoints
logger = create_logger(cfg, phase="test")
if cfg.TRAIN.PRETRAINED_VAE:
    load_pretrained_vae(cfg, model, logger)

# Check for trained checkpoint
ckpt_path = f'{drive_data}/experiments/mgpt/SOKE/checkpoints/last.ckpt'
if os.path.exists(ckpt_path):
    print(f"Loading trained checkpoint from {ckpt_path}")
    cfg.TEST.CHECKPOINTS = ckpt_path
    load_pretrained(cfg, model, logger, phase="test")
else:
    print("Using pretrained mBART (no fine-tuned checkpoint found)")

model = model.cuda()
model.eval()

print("✓ Model ready!")

INFO:lightning_fabric.utilities.seed:Seed set to 1234


Loading model...
Word vectorizer path being used: /content/drive/MyDrive/GraduationProject/CodeFiles/SaSOKE/deps/t2m/t2m/glove/
mean path /content/drive/MyDrive/GraduationProject/CodeFiles/SaSOKE/smpl-x/mean.pt std_path:  /content/drive/MyDrive/GraduationProject/CodeFiles/SaSOKE/smpl-x/std.pt


OSError: [Errno 40] Too many levels of symbolic links: 'deps/t2m/t2m/t2m/text_mot_match/model/finest.tar'
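
The `Errno 40` above is `ELOOP`: resolving the path ran into a symbolic link that eventually points back to itself. A minimal sketch for locating such a link is shown below. `find_symlink_loops` is a hypothetical helper written for this note, and the candidate paths are the `deps` entries used in this notebook, so adjust them to your own layout.

```python
import errno
import os


def find_symlink_loops(paths):
    """Return the paths whose resolution fails with ELOOP (a symlink cycle)."""
    looping = []
    for path in paths:
        try:
            os.stat(path)  # os.stat follows symlinks, so a cycle raises OSError
        except OSError as exc:
            if exc.errno == errno.ELOOP:
                looping.append(path)
            # Other errors (for example a missing path) are not loops; skip them.
    return looping


# Paths taken from this notebook; they are assumptions about your layout.
candidates = ["deps/t2m", "deps/smpl_models", "deps/mbart-h2s-csl-phoenix"]
for path in find_symlink_loops(candidates):
    print(f"Circular symlink detected at: {path}")
```

If a loop is reported, one way to break it is to remove the offending link and point the configuration at the full Drive path instead, as the configuration cell already does for `t2m_path`.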

In [None]:
import pickle

# Helper function to convert features to SMPL-X parameters
def feats_to_smplx(features, mean, std):
    """Convert 133-dim compressed features to SMPL-X parameters."""
    # Denormalize features
    features = features * std + mean

    # Prepend 36 zero dims (root pose + the first 33 body-pose dims) to get 169 dims total
    T = features.shape[0]
    zero_pose = torch.zeros(T, 36).to(features)
    features_full = torch.cat([zero_pose, features], dim=-1)  # (T, 169)

    # Extract SMPL-X parameters
    smplx_params = {
        'root_pose': features_full[:, 0:3].cpu().numpy(),
        'body_pose': features_full[:, 3:66].cpu().numpy(),
        'lhand_pose': features_full[:, 66:111].cpu().numpy(),
        'rhand_pose': features_full[:, 111:156].cpu().numpy(),
        'jaw_pose': features_full[:, 156:159].cpu().numpy(),
        'expression': features_full[:, 159:169].cpu().numpy(),
    }
    return smplx_params

# Generate sign language poses
output_dir = '/content/text_sign_results'
os.makedirs(output_dir, exist_ok=True)
print(f"\nGenerating sign language for {len(custom_texts)} text(s)....\n")

# Get mean and std for denormalization
mean = datamodule.hparams.mean.cuda()
std = datamodule.hparams.std.cuda()

with torch.no_grad():
    for idx, text in enumerate(custom_texts):
        print(f"[{idx+1}/{len(custom_texts)}] Processing: '{text}'")

        # Prepare input (model expects text, length, and src fields)
        batch = {
            'text': [text],
            'length': [0],  # length not used during generation
            'src': [sign_language]  # Target sign language: 'how2sign', 'csl', or 'phoenix'
        }

        try:
            # Generate FULL SEQUENCE using forward method
            output = model.forward(batch, task="t2m")

            # Extract features
            feats = output['feats'][0] if 'feats' in output else None

            if feats is None:
                print(f"  ✗ No features generated")
                continue

            # Convert to SMPL-X parameters (full sequence)
            smplx_params = feats_to_smplx(feats, mean, std)

            # Save result (NO TOKENS, only SMPL-X params)
            filename = f"text_{idx+1}.pkl"
            filepath = os.path.join(output_dir, filename)

            result = {
                'text': text,
                'smplx_params': smplx_params,  # Full sequence of SMPL-X poses
                'num_frames': smplx_params['body_pose'].shape[0]
            }

            with open(filepath, 'wb') as f:
                pickle.dump(result, f)

            print(f"  ✓ Saved: {filepath}")
            print(f"    - Frames: {result['num_frames']}")
            print(f"    - SMPL-X parameters saved (no tokens)")

        except Exception as e:
            print(f"  ✗ Error: {e}")
            import traceback
            traceback.print_exc()
            continue

print(f"\nComplete! Predictions saved in '{output_dir}'")
print(f"\nTo play the animations, download results and use:")
print(f"  python3 generate_animation_html.py text_sign_results/text_1.pkl")

## 5. View Results


In [None]:
# List generated files
print("Generated predictions:")
!ls -lh {output_dir}

# Load and display results
pkl_files = sorted([f for f in os.listdir(output_dir) if f.endswith('.pkl')])

for pkl_file in pkl_files:
    filepath = os.path.join(output_dir, pkl_file)

    with open(filepath, 'rb') as f:
        result = pickle.load(f)

    print(f"\n{pkl_file}:")
    print(f"  Text: {result['text']}")
    print(f"  Frames: {result['num_frames']}")

    # Display SMPL-X parameters info
    if result.get('smplx_params') is not None:
        smplx = result['smplx_params']
        print("  SMPL-X Parameters:")
        print(f"    - root_pose: {smplx['root_pose'].shape} (global orientation)")
        print(f"    - body_pose: {smplx['body_pose'].shape} (21 body joints × 3)")
        print(f"    - lhand_pose: {smplx['lhand_pose'].shape} (15 left hand joints × 3)")
        print(f"    - rhand_pose: {smplx['rhand_pose'].shape} (15 right hand joints × 3)")
        print(f"    - jaw_pose: {smplx['jaw_pose'].shape} (jaw rotation)")
        print(f"    - expression: {smplx['expression'].shape} (facial expression)")

## 6. Download Results


In [None]:
# Zip results for easy download
!zip -r text_sign_results.zip {output_dir}/

# Download
from google.colab import files
files.download('text_sign_results.zip')

print("✓ Results packaged and ready to download")


## Notes

- **GPU Required**: Make sure you're using a GPU runtime (Runtime → Change runtime type → GPU → T4/V100/A100)
- **First Time**: Run notebook 1 first to download all dependencies to your Google Drive
- **Custom Text**: Modify the `custom_texts` list in section 3 ("Enter Your Custom Text") with your own text
- **Output**: Each text generates a `.pkl` file containing the input text, the frame count, and the predicted SMPL-X parameters (joint rotations and expression coefficients, not 3D joint coordinates)
- **Format**: Poses are in SMPL-X format and can be visualized using 3D animation tools
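
The parameter layout that `feats_to_smplx` uses in section 4 can be sanity-checked with plain arithmetic. The sketch below copies the slice boundaries from that function and confirms that they cover all 169 dimensions; `slices_are_contiguous` and the constant names are hypothetical and exist only for this illustration.

```python
# Slice boundaries copied from feats_to_smplx in section 4.
SMPLX_SLICES = {
    "root_pose": (0, 3),
    "body_pose": (3, 66),
    "lhand_pose": (66, 111),
    "rhand_pose": (111, 156),
    "jaw_pose": (156, 159),
    "expression": (159, 169),
}
FULL_DIM = 169    # feature width after the zero padding is prepended
PADDED_DIMS = 36  # zero dims prepended to the model output


def slices_are_contiguous(slices, full_dim):
    """True if the slices cover [0, full_dim) in order with no gap or overlap."""
    cursor = 0
    for start, stop in slices.values():
        if start != cursor or stop <= start:
            return False
        cursor = stop
    return cursor == full_dim


assert slices_are_contiguous(SMPLX_SLICES, FULL_DIM)
print("model feature width:", FULL_DIM - PADDED_DIMS)
for name, (start, stop) in SMPLX_SLICES.items():
    print(f"{name}: {stop - start} dims")
```

Because the zero padding spans indices 0 to 35, `root_pose` and the first 33 values of `body_pose` are always zero in the saved files.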

### Troubleshooting
- **OOM Error**: Shorten the input text; the notebook already processes one text at a time
- **Missing files**: Make sure notebook 1 was run successfully to download models
- **Slow generation**: Normal on T4 GPU, faster on V100/A100
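
For the "Missing files" case, a quick pre-flight check can show which Drive paths are absent before the model is built. This is a rough sketch: `missing_paths` is a hypothetical helper, and the relative paths are the ones the cells in section 4 read from `drive_data`, so the list may need adjusting if your Drive layout differs.

```python
import os


def missing_paths(root, relative_paths):
    """Return the relative paths that do not exist under root."""
    return [
        rel for rel in relative_paths
        if not os.path.exists(os.path.join(root, rel))
    ]


# Same Drive folder as in section 1; change it to match your own Drive.
drive_data = "/content/drive/MyDrive/GraduationProject/CodeFiles/SaSOKE"

# Relative paths that the cells in section 4 read from Drive.
required = [
    "data/How2Sign",
    "smpl-x/mean.pt",
    "smpl-x/std.pt",
    "checkpoints/vae/tokenizer.ckpt",
    "deps/smpl_models",
    "deps/mbart-h2s-csl-phoenix",
    "deps/t2m/t2m/glove",
]

missing = missing_paths(drive_data, required)
if missing:
    print("Missing under", drive_data)
    for rel in missing:
        print("  -", rel)
else:
    print("All required paths found")
```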
