This repository contains the data processing pipelines, fine-tuning scripts, and evaluation environments for generating PLC Structured Text (IEC 61131-3) code using state-of-the-art open-weight models: Gemma 3 4B (and Phi-3).
Warning
Important Pathing Notice: The directory structure of this repository was heavily reorganized after the initial scripts were written. As a result, hardcoded file paths inside many of the Python scripts (e.g., paths for loading datasets, saving checkpoints, or reading configuration files) might still point to the old root directory. Please review and update the file paths inside any script before you run it to prevent "File Not Found" errors.
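One way to make a script independent of the directory it is launched from is to resolve paths against the repository root rather than hardcoding them. The sketch below is only an illustration and is not part of the repository: `find_repo_root` and `repo_path` are hypothetical helper names, and the sketch assumes the root can be recognized by the `.gitignore` and `README.md` files shown in the directory listing.

```python
from pathlib import Path
from typing import Optional, Sequence


def find_repo_root(start: Path, markers: Sequence[str] = (".gitignore", "README.md")) -> Path:
    """Walk upward from `start` until a directory containing every marker file is found.

    The marker names are an assumption based on the directory listing in this README.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if all((candidate / name).exists() for name in markers):
            return candidate
    raise FileNotFoundError(f"No directory containing {tuple(markers)} found at or above {start}")


def repo_path(*parts: str, start: Optional[Path] = None) -> Path:
    """Build a path relative to the repository root instead of the current working directory."""
    # Default to the location of the calling script's file when no start is given.
    origin = start if start is not None else Path(__file__).parent
    return find_repo_root(origin).joinpath(*parts)
```

Inside a script such as one under `src/data/`, a hardcoded dataset directory could then be replaced with something like `repo_path("data", "processed")`, which resolves to the same location regardless of where the script is run from.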
Project Directory Structure
plc-code-generation/
├── docs/                       # Documentation
│   └── days1-3-complete-guide.md
├── src/                        # Main source code module
│   ├── data/                   # Data parsing, filtering, and formatting scripts
│   │   ├── create_holdout.py
│   │   ├── filter_oscat.py
│   │   ├── filter_oscat_relaxed.py
│   │   ├── format_dataset.py
│   │   ├── format_phi.py
│   │   ├── generate_dataset.py
│   │   ├── generate_pass3.py
│   │   ├── merge_datasets.py
│   │   ├── parse_oscat.py
│   │   └── parse_oscat_2.py
│   ├── train/                  # Training and fine-tuning scripts
│   │   ├── train.py
│   │   └── train_phi.py
│   ├── eval/                   # Evaluation and benchmarking scripts
│   │   ├── benchmark_edge.py
│   │   ├── eval.py
│   │   ├── evaluate.py
│   │   ├── evaluate_all.py
│   │   └── verif_eval.py
│   ├── model/                  # Model saving, exporting, and modifying scripts
│   │   ├── export_gguf.py
│   │   ├── hack_gemma3.py
│   │   ├── save_model.py
│   │   └── save_phi.py
│   └── utils/                  # Debugging and miscellaneous utility scripts
│       ├── debug_data.py
│       ├── debug_gen.py
│       ├── diagnose_v2.py
│       ├── test_gen.py
│       └── test_load.py
├── data/                       # Datasets (Ignored in git)
│   ├── raw/                    # Raw dataset files
│   ├── processed/              # Formatted and verified datasets
│   └── meta/                   # Prompts and seeds
├── tests/                      # PLC Code test files and C-exports
│   ├── st_files/               # .st Structured Text files
│   ├── c_exports/              # C code generated from PLC
│   └── plc_test/               # Evaluation test workspace
├── results/                    # Evaluation outputs (Ignored in git)
├── logs/                       # Log files (Ignored in git)
├── third_party/                # External libraries and submodules
│   ├── matiec/
│   ├── oscat_codesys/
│   └── oscat_plclang/
├── .gitignore                  # Git ignore file
└── README.md                   # Project overview
Model checkpoints and weights are tracked separately and hosted via Git LFS on Hugging Face:
Gemma3-PLC-4B
Phi-PLC-Lora
The link to the weights and checkpoints can be found here: Weights.