This project implements a simple, fully custom image-to-code architecture to generate CadQuery scripts from mechanical CAD images using a CNN + LSTM model. It was developed for a technical challenge, with the objective of being simple, efficient, and explainable.
Generate CadQuery Python code that recreates the geometry shown in a CAD image.
The solution follows an encoder-decoder design:

- Encoder (`CnnEncoder`):
  - A lightweight convolutional neural network that reduces a 128x128 image to a 512-dimensional feature vector.
  - Acts as a visual feature extractor for the geometry.
- Decoder (`CodeLSTMDecoder`):
  - A 2-layer LSTM conditioned on the image features.
  - Trained to generate valid Python code token by token using GPT-2 tokenization.
```
  Image                  Tokens
  (RGB)                  (Code)
    |                      |
[ CnnEncoder ]      [ LSTM Decoder ]
    |                      |
 Features    -->    "import cadquery..."
```
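The two components above can be sketched as follows. This is a minimal PyTorch sketch: the layer counts, channel sizes, and the feature-conditioning scheme are illustrative assumptions, not the exact code from this repository.

```python
import torch
import torch.nn as nn

class CnnEncoder(nn.Module):
    """Reduce a 128x128 RGB image to a 512-d feature vector (illustrative sizes)."""
    def __init__(self, feat_dim: int = 512):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),    # -> 64x64
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # -> 32x32
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # -> 16x16
            nn.AdaptiveAvgPool2d(1),                                # -> 128x1x1
        )
        self.fc = nn.Linear(128, feat_dim)

    def forward(self, img):                        # img: (B, 3, 128, 128)
        return self.fc(self.conv(img).flatten(1))  # (B, feat_dim)

class CodeLSTMDecoder(nn.Module):
    """2-layer LSTM over code tokens, conditioned on image features via the initial hidden state."""
    def __init__(self, vocab_size: int, feat_dim: int = 512, hidden: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=2, batch_first=True)
        self.init_h = nn.Linear(feat_dim, 2 * hidden)  # project features to both layers' h0
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens, feats):              # tokens: (B, T), feats: (B, feat_dim)
        # Reshape the projected features into (num_layers, B, hidden) for the LSTM state.
        h0 = self.init_h(feats).view(feats.size(0), 2, -1).permute(1, 0, 2).contiguous()
        c0 = torch.zeros_like(h0)
        out, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.head(out)                      # (B, T, vocab_size)
```

At training time the decoder is fed the ground-truth code tokens (teacher forcing) and the cross-entropy loss is computed over the predicted token logits.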
Tokenizer: the GPT-2 tokenizer is used for code, as it handles indentation and Python-specific tokens well. `pad_token` is explicitly set to `eos_token` for compatibility.
- Training was done from scratch on the full CADCODER/GenCAD-Code dataset (~147k samples).
- The model was trained for only 10 epochs on an AWS EC2 instance equipped with an NVIDIA L4 GPU.
- Device: NVIDIA L4 (16 GB VRAM)
- CUDA Version: 12.8
- Training time: ~7h
- Memory usage: ~16.7 GB
- Initial loss: 5.77
- Final loss after 10 epochs: ~0.15

The loss decreased consistently, even with a simple model.
You can download the pretrained model weights here:
Download from Google Drive
We evaluate two metrics:

- Valid Syntax Rate (VSR): using `evaluate_syntax_rate_simple`. Checks whether the generated code runs without error.
- IOU (Shape Similarity): using `get_iou_best`. Compares the generated shape with the ground truth.
Due to limited compute resources, I was only able to evaluate a subset of the test set. On a representative example (`sampleIdx = 840`), the results were:
- Valid Syntax Rate: 1.00
- IOU (shape similarity): 0.64
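For intuition about the IOU metric, a voxel-based IoU between two solids can be sketched as follows. This is an illustrative NumPy version, not the actual `get_iou_best` implementation used by the evaluation script:

```python
import numpy as np

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """IoU between two boolean occupancy grids of identical shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter / union) if union else 1.0

# Two overlapping boxes on an 8x8x8 grid:
a = np.zeros((8, 8, 8), dtype=bool); a[0:4, 0:4, 0:4] = True  # 64 voxels
b = np.zeros((8, 8, 8), dtype=bool); b[2:6, 0:4, 0:4] = True  # 64 voxels
# intersection = 2*4*4 = 32, union = 64 + 64 - 32 = 96, so IoU = 1/3
```

An IOU of 0.64, as measured above, therefore means roughly two thirds of the combined volume of the two shapes overlaps.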
While syntax validity is good, the overall IOU remains relatively low. This indicates that the model often generates code that runs without error, but the resulting shape is still far from accurate. These results are not yet satisfying, and further improvements are needed, especially in structural consistency and geometry alignment. Additionally, the model frequently struggles to correctly generate the initial `import cadquery as cq` statement. To ensure the generated code is syntactically valid and executable, this import line is automatically corrected in the evaluation script when missing or malformed.
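The import correction mentioned above amounts to a simple string check. A minimal sketch (the actual logic in the evaluation script may differ):

```python
def fix_cadquery_import(code: str) -> str:
    """Ensure generated code contains a valid `import cadquery as cq` line."""
    lines = code.splitlines()
    # Drop a malformed first import line, e.g. "mport cadquery as cq".
    if lines and "cadquery" in lines[0] and lines[0].strip() != "import cadquery as cq":
        lines = lines[1:]
    # Prepend the import if it is missing entirely.
    if not any(l.strip() == "import cadquery as cq" for l in lines):
        lines.insert(0, "import cadquery as cq")
    return "\n".join(lines)
```

This keeps already-valid scripts untouched while repairing the two observed failure modes (missing or garbled import).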
python train.py
This will:
- Load the dataset
- Initialize the model from scratch
- Train for 30 epochs (or fewer)
- Save the weights in the `weights/` folder
No virtualenv configuration is provided, but using one is recommended for reproducibility.
python eval.py
This loads one sample, runs inference, and prints:
- Ground truth code
- Generated code
- Syntax validity rate
- IOU (3D shape comparison)
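Inference follows the standard token-by-token greedy loop. A sketch with a stubbed next-token function (in the real model, the next token is the argmax over the LSTM's output logits at each step):

```python
def greedy_decode(next_token_fn, bos_id: int, eos_id: int, max_len: int = 64) -> list[int]:
    """Generate token ids one at a time until EOS or max_len is reached."""
    seq = [bos_id]
    for _ in range(max_len):
        nxt = next_token_fn(seq)  # real model: argmax over decoder logits given seq + image features
        seq.append(nxt)
        if nxt == eos_id:
            break
    return seq

# Stub that emits tokens 1, 2, 3, then EOS (id 0):
script = iter([1, 2, 3, 0])
out = greedy_decode(lambda seq: next(script), bos_id=99, eos_id=0)
# out == [99, 1, 2, 3, 0]
```

The generated id sequence is then decoded back to text with the GPT-2 tokenizer before being executed.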
| Component | Reason |
|---|---|
| GPT-2 Tokenizer | Handles Python-like syntax well; avoids whitespace errors |
| CNN Encoder | Simple and lightweight; avoids heavy ViTs for faster prototyping |
| LSTM Decoder | Easy to train on a limited GPU; interpretable code output |
| No pretrained weights | Full training from scratch to control all layers and demonstrate skill |
- Use a pretrained image encoder (e.g., ResNet18 or ViT) to improve feature extraction
- Replace LSTM with Transformer decoder (e.g., GPT2)
- Improve the training workflow with callbacks, logging, etc.
- Add support for exporting the model and tokenizer to ONNX for deployment and inference optimization
- Test different image sizes, max token lengths, and tokenization strategies
This project shows a functional baseline for CAD-to-code generation, using simple components and no external training tricks. The model is small, explainable, and performs reasonably well after only 10 epochs.
With more time and compute, this approach could be scaled to achieve higher performance.
Author: R.Choukri Date: June 2025