Skip to content

MI-Materials-Intelligence/Recipe-Language-Model

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

172 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Recipe Language Model

Table of Contents

  1. Introduction
  2. Model Summary
  3. Model Downloads
  4. License
  5. Citation
  6. Contact

1. Introduction

Materials innovation have been undergoing rapid development with the vast combinatorial exploration of recipes; however, the related research suffers from time-consuming trial-and-error synthesis and labour-intensive fabrication. As a promising alternative, robotics enables high-throughput experimentation and data collection; however, the resulting numerical datasets are often insufficiently analysed and fail to provide effective feedback for semantic recipe optimisation.

Here, we present a domain-specific recipe language model (RLM) developed for an emerging scientific tool of robotic boxes (perovskite solar cell research as a demonstration). For iterative fine-tuning of the RLM, seven artificial intelligence (AI) layers, including learning, generating, RecipeQA, fine-tuning, reasoning, evaluation, and optimisation, have been designed with a language agent. During the loops of seven AI layers, both numerical and semantic recipes were continuously learned and optimised for the RLM. Guided by this RLM, eleven robotic boxes executed the controllable synthesis, fabrication and characterisation of 50,764 samples. Simultaneously, more than 578 million tokens were generated and augmented to improve the ability to recommend a recipe and mechanistic reasoning, reaching a level comparable to that of an experienced researcher.

Therefore, the integration of the RLM with robotic boxes enables an AI and robotics discovery process in which specialised language modelling and modularised robotic hardware continuously improve one another, suggesting an evolution of physical AI for the Materials Intelligence.


2. Model Summary

Fine-tuning of the RLM with robotics

To train this domain-specific RLM, the workflow starts from encoded formulas and parameters as recipe inputs, proceeds through seven AI layers with the language agent, executes synthesis and fabrication within eleven interconnected robotic boxes and produces in situ characterisation and device performance assessment as mechanistic outputs. As a result, the fine-tuned RLM incorporates the encoded recipes, robotics, and characterised results to form a closed recommendation–synthesis–fabrication–characterisation–mechanism loop for exploring the large space of the recipes and their underlying mechanisms. The language agent then encoded these machine-readable recipes into structured formulas and parameters sequences, which were translated into tokens for subsequent fine-tuning of the RLM and execution by the robotic boxes.


Seven AI layers Architecture for RLM Training

In the learning layer, the formulas and parameters are encoded and then tokenised into recipes as inputs. Through atomic skills of data extraction, cleaning, and matching, these data are organised into standardised datasets, providing the basis for RLM training and iterative recipe refinement.

In the generating layer, the tokenised recipes are comprised into the recipe report with fabrication details, mechanistic descriptions, an optimisation summary, and supporting information. Through atomic skills of edge reporting (generation from single experimental data), single-variable reporting (generation from matched data with single variable), and characterization reporting (generation from matched data with in situ characterization), these processed data are converted into robotic recipe reports.

In the RecipeQA layer, the recipe reports are further converted into semantically structured question–answer pairs (RecipeQA). The primary objective of this layer is to construct high-quality, domain-specific training corpora through key atomic skills of Report to QA (convert recipe reports into semantically structured RecipeQA) and Distillation (knowledge distillation for RecipeQA).

In the fine-tuning layer, the base model (Qwen3-32B) together with the RecipeQA corpora are taken as the input of this layer. Through low-rank adaptation (LoRA), the model is efficiently adapted to domain-specific recipe knowledge and transformed into a domain-specific RLM as output.

In the reasoning layer, the fine-tuned domain-specific RLM to generate mechanistic interpretations, performance explanations, and recipe optimization suggestions from experimental records. These reasoning results serve as an important bridge between trained model capability and practical scientific decision-making, and also provide candidate knowledge and reasoning evidence for the downstream Evaluation Layer and Optimization Layer.

In the evaluation layer, the recipe recommendations and mechanistic reasoning are evaluated, in order to measure their effectiveness, reliability, and scientific validity. Through key atomic skills of recipe recommendation and mechanistic reasoning, the aspects of recipe integrity, formula rationality, parameter rationality, experimental validation, domain knowledge, mechanism integrity, interpretation, comprehensiveness and coherence are systematically assessed.

In the optimization layer, the RLM to be optimised and preference pairs of positive and negative samples are taken as the input of this layer. Through atomic skill of Direct Preference Optimisation (DPO), an optimised RLM is obtained as output. This layer further aligns the model towards preference-consistent and high-performance recipe recommendation.


3. Model Downloads

To simplify usage, we release merged models where LoRA weights are already integrated into the base model.

This allows users to:

  • Run inference directly
  • Avoid manual LoRA merging
  • Ensure consistent behavior across environments

Model List

Model Base Model Type Download
RLM-v1 Qwen3-32B LoRA (merged) 🤗 Hugging Face

4. License

This project is released under the MIT License.


5. Citation

If you find this work useful, please cite:

@article{chen2026rlm,
  title={Agentic Robotic Boxes for Perovskite Solar Cell Fabrication with Recipe Language Model},
  author={Chen, Zijian and Yu, Wenjin and Wu, Chuang and Chen, Feibei and Wang, Zixuan and Zhou, Chao and You, Yimeng and Li, Shaojie and Zhu, Qiyuan and Ma, Ning and Sun, Yao and Li, Donghui and Fanady, Billy and Jiang, Shengchou and Yan, Zhongliang and Zhou, Shumin and Li, Liang and Hsieh, Chang-Yu and Bai, Yang and Xiao, Lixin and Chung, Chi-yung and Chan, Ching-chuen and Cui, Zhanfeng and Gr{\"a}tzel, Michael and Zhao, Haitao},
  journal={Engineering},
  year={2026},
  doi={10.1016/j.eng.2026.04.002}
}

6. Contact

For questions or collaboration,please contact us at Material_Intelligence@outlook.com


About

Recipe Language Model (RLM): We introduce a domain-specific RLM developed through seven AI layers and interconnected robotic boxes to drive the evolution of physical AI for defining a new paradigm - Materials Intelligence.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages