Dual-Decoder Transformer for Mechanism Synthesis Using Coupler Curve Images

Overview

Designing mechanisms capable of following specific trajectories, known as coupler curves, is a core challenge in mechanical engineering and robotics. Mechanisms are essential in applications ranging from robotic arms to automated manufacturing processes. However, traditional methods for mechanism synthesis rely heavily on analytical techniques, which are:

- Time-Consuming: Solving complex equations for mechanism synthesis is computationally expensive, especially for higher-order mechanisms.
- Single-Solution Oriented: These methods typically provide only one mechanism design for a given coupler curve.
- Limited in Complexity: Analytical approaches struggle with mechanisms that have a large number of joints or require intricate trajectories.

Motivation for a Machine Learning Approach

Machine learning offers a data-driven alternative to analytical methods, producing candidate designs faster and in greater variety. With a trained model, engineers can explore a wider range of mechanism designs and automate the synthesis process. This project introduces a Dual-Decoder Transformer model designed specifically for mechanism synthesis. Key innovations include:

- Coupler Curves as Images: Input trajectories are represented as grayscale images, enabling the use of convolutional layers for spatial feature extraction.
- Mechanism Type Embeddings: Each mechanism type is encoded as a unique feature vector to condition the model's output.
- Joint Coordinates as Output: Mechanisms are represented as Cartesian coordinates of their joints, split into two independent parts for simplified learning.

Key Contributions

Mechanism Type Conditioning:
- Introduces a dedicated embedding layer for mechanism types, enabling the model to generate designs specific to the input type.
Dual-Decoder Architecture:
- Employs two independent decoders:
  - The first decoder predicts the first set of joint coordinates.
  - The second decoder predicts the second set of joint coordinates.
- This modular design improves performance for complex mechanisms.
Advanced Loss Masking:
- Implements a masked Mean Squared Error (MSE) loss to handle variable-length sequences and padding tokens.
Efficiency and Scalability:
- Processes coupler curve images efficiently through patch embeddings and scaled positional encodings.
- Designed to handle a wide range of mechanism types and complexities.

Iterative Development Process

The development process involved several iterations to refine the architecture and improve performance:

Initial Attempts

The project began with a single-decoder Transformer model inspired by natural language processing. However, early experiments revealed significant limitations:

- Poor Performance on Complex Mechanisms: The single decoder struggled with mechanisms having more than six joints.
- Limited Scalability: Increasing the model size improved results slightly but introduced overfitting and longer training times.

Integration of LLaMA Features

To address these challenges, features from the LLaMA architecture were integrated:

RMS Normalization:
- Improved training stability and model convergence.
Scaled Embeddings:
- Enhanced input and positional embeddings to capture spatial relationships effectively.
Dynamic Causal Masking:
- Restricted each decoder position to attend only to earlier positions, so that predictions are generated step by step during both training and inference.
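
To make the RMS normalization step concrete, here is a minimal framework-free sketch of the arithmetic. The function name `rms_norm`, the `eps` value, and the unit gain are illustrative assumptions, not the repository's code, which would operate on PyTorch tensors. Unlike layer normalization, RMS normalization does not subtract the mean; it only rescales by the root mean square.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Scale vector x by the reciprocal of its root mean square, then by a per-feature gain."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)  # eps guards against division by zero
    return [w * v * inv_rms for v, w in zip(x, weight)]

features = [3.0, 4.0]
normalized = rms_norm(features, weight=[1.0, 1.0])
# With unit gain, the mean of the squared outputs is approximately 1.
```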

Introduction of Dual Decoders

A major breakthrough came with the introduction of two independent decoders. This design allowed the model to handle mechanisms of varying complexity by splitting the task into two smaller, more manageable subtasks.

Methodology

Input Representation

Coupler Curves as Images:
- Each trajectory is represented as a 2D grayscale image, divided into patches of fixed size.
- A convolutional layer extracts features from these patches, which are embedded into a fixed-dimensional vector.
Mechanism Type Embeddings:
- Each mechanism type is represented as a unique vector using a learnable embedding layer.
- The embedding is added to the input sequence to condition the model on the desired mechanism type.
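
The patch step described above can be sketched without any framework. This is an illustration only: the helper name `split_into_patches`, the 4x4 image, and the patch size of 2 are assumptions, since the README does not state the actual image or patch dimensions. In PyTorch, a convolution whose kernel size and stride both equal the patch size performs this split and a learned projection in a single layer.

```python
def split_into_patches(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches

# A 4x4 image with pixel values 0..15, split into four 2x2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, patch_size=2)
```

Each flattened patch becomes one element of the encoder's input sequence once it has been projected to the model dimension.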

Model Architecture

Transformer Encoder:
- Processes the embedded input sequence (coupler curve patches + mechanism type embedding).
- Captures spatial relationships and encodes them into a latent representation.
Dual Decoders:
- Each decoder independently predicts one part of the mechanism (first and second sets of joint coordinates).
- Cross-attention layers allow the decoders to leverage information from the encoder's latent representation.
Projection Layers:
- Map the decoder outputs back to Cartesian coordinates.
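
As a rough illustration of how one mechanism could be turned into two decoder targets, the sketch below splits a joint list in half, appends an end marker to each half, and pads both to a fixed length. The split point and the helper name `split_and_pad` are assumptions; the README does not say how the repository divides the joints. The marker values reuse the defaults that appear in the loss and inference code (0.5 for padding, (1.0, 1.0) for EOS).

```python
EOS = (1.0, 1.0)   # end-of-sequence marker, taken from the inference code's default
PAD = (0.5, 0.5)   # padding value, taken from the loss function's mask_value default

def split_and_pad(joints, max_len):
    """Split joint coordinates into two halves, append EOS to each, and pad to max_len // 2."""
    half_len = max_len // 2
    middle = (len(joints) + 1) // 2  # assumed split point: first half gets the extra joint
    halves = [joints[:middle], joints[middle:]]
    targets = []
    for half in halves:
        sequence = list(half) + [EOS]
        if len(sequence) > half_len:
            raise ValueError("mechanism has too many joints for max_len")
        sequence += [PAD] * (half_len - len(sequence))
        targets.append(sequence)
    return targets

joints = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.7), (0.8, 0.9), (0.2, 0.9)]
first_target, second_target = split_and_pad(joints, max_len=10)
```

One consequence of marker values that lie inside the coordinate range is that a real joint located exactly at (0.5, 0.5) or (1.0, 1.0) would be indistinguishable from padding or EOS.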

Training Process

Masked MSE Loss:
- Handles variable-length sequences by excluding padding positions from the loss. A position counts as padding when all of its target coordinates equal mask_value (0.5 by default).

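import torch.nn.functional as F

def mse_loss(predictions, targets, mask_value=0.5):
    # A position is padding when every coordinate of its target equals mask_value.
    mask = ~(targets == mask_value).all(dim=-1)
    # Broadcast the per-position mask over the coordinate dimension.
    mask = mask.unsqueeze(-1).expand_as(predictions)
    # Average the squared error over the non-padding coordinates only.
    return F.mse_loss(predictions[mask], targets[mask], reduction="mean")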
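
A small worked example may help show what the mask does to the average. The sketch below repeats the same computation in plain Python on a three-position sequence; the name `masked_mse` and the choice to return 0.0 for an all-padding input are assumptions made for illustration.

```python
def masked_mse(predictions, targets, mask_value=0.5):
    """Mean squared error over coordinates of positions whose target is not padding."""
    squared_errors = []
    for predicted, target in zip(predictions, targets):
        if all(value == mask_value for value in target):
            continue  # padding position: contributes nothing to the loss
        squared_errors.extend((p - t) ** 2 for p, t in zip(predicted, target))
    if not squared_errors:
        return 0.0  # assumed convention for an all-padding sequence
    return sum(squared_errors) / len(squared_errors)

targets = [(0.0, 1.0), (0.5, 0.5), (0.5, 0.5)]       # one real joint, two padding positions
predictions = [(0.0, 0.0), (0.9, 0.9), (0.1, 0.1)]   # predictions at padded positions are ignored
loss = masked_mse(predictions, targets)
# Only the first position counts: (0.0**2 + 1.0**2) / 2 = 0.5
```

Without the mask, the errors at the two padded positions would enter the average and the loss would no longer reflect only the real joints.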

Optimization:
- The model is trained using the Adam optimizer with a learning rate scheduler.
Dynamic Causal Masking:
- Applied during decoding to ensure stepwise predictions.
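
For reference, a causal mask is a lower-triangular pattern over positions. The sketch below builds one for a given length in plain Python; reading "dynamic" as "rebuilt for the current sequence length" is an assumption, and the helper name `causal_mask` is illustrative.

```python
def causal_mask(length):
    """Return a length x length matrix; entry [i][j] is True when position i may attend to j."""
    return [[j <= i for j in range(length)] for i in range(length)]

mask = causal_mask(3)
# mask == [[True, False, False],
#          [True, True,  False],
#          [True, True,  True]]
```

Row i marks which earlier outputs the decoder may use when predicting position i, which is what keeps training consistent with step-by-step generation at inference time.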

Inference Process

During inference, the model generates mechanism designs using a conditional greedy decoding approach:

Encoding:
- The coupler curve image is encoded along with the mechanism type embedding.
Decoding:
- Each decoder independently predicts its part of the mechanism, conditioned on the encoder's latent representation.
Stopping Condition:
- Decoding halts when an End-of-Sequence (EOS) token is detected.

Code Highlights for Inference

import torch

def greedy_decode_conditional(model, source, mech_type, max_len, eos_token=torch.tensor([1.0, 1.0])):
    device = source.device  # assumed: decode on the same device as the input image
    encoder_output = model.encode(source, None, mech_type)
    # Each decoder starts from a single all-zero coordinate.
    decoder_input_first = torch.zeros(1, 1, 2, device=device)
    decoder_input_second = torch.zeros(1, 1, 2, device=device)

    # Each decoder generates at most half of max_len positions.
    while decoder_input_first.size(1) < max_len // 2:
        ...
    while decoder_input_second.size(1) < max_len // 2:
        ...
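
The general shape of a greedy loop with an EOS stopping condition can be sketched independently of the model. Everything here is hypothetical: `step_fn` stands in for one decoder forward pass, `toy_step` is a scripted replacement for a trained model, and the tolerance-based EOS check is an assumption, since the README does not say how the repository compares continuous outputs with the EOS token.

```python
def greedy_decode(step_fn, max_steps, eos=(1.0, 1.0), tolerance=0.05):
    """Generate coordinates one at a time until an EOS-like point or max_steps is reached."""
    sequence = [(0.0, 0.0)]  # all-zero start point, mirroring the zero-initialized decoder input
    while len(sequence) < max_steps:
        next_point = step_fn(sequence)
        # Continuous outputs rarely equal EOS exactly, so compare within a tolerance (assumed).
        if all(abs(a - b) <= tolerance for a, b in zip(next_point, eos)):
            break
        sequence.append(next_point)
    return sequence[1:]  # drop the start point

# Hypothetical stand-in for one decoder: emits two joints, then a near-EOS point.
scripted = [(0.2, 0.3), (0.4, 0.1), (0.98, 1.01), (0.7, 0.7)]

def toy_step(sequence):
    return scripted[len(sequence) - 1]

joints = greedy_decode(toy_step, max_steps=8)
```

In the dual-decoder setting, a loop of this kind would run once per decoder, each with a budget of `max_len // 2` positions.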

Applications

Robotics:
- Generates diverse designs for robotic mechanisms, such as arms and grippers.
Industrial Design:
- Facilitates rapid prototyping of mechanisms for manufacturing.
Education:
- Provides a framework for teaching mechanism synthesis concepts using advanced machine learning techniques.

Future Directions

Intra-Type Diversity:
- Extend the model to generate multiple mechanisms within the same type.
Scalability:
- Adapt the architecture to handle mechanisms with more joints and higher complexities.
Optimization Frameworks:
- Integrate the model with optimization algorithms for real-time design applications.
Explainability:
- Develop visualizations to interpret the model’s attention mechanisms and latent space.
