- `framework/`: end-to-end C-to-Rust translation pipeline (`batch_test_staged.sh`).
- `data/ohos/source_projects/`: 5 minimal OHOS projects for the `ohos` (test5) suite, each with a relocatable `compile_commands.json`.
- `data/ohos/ohos_root_min/`: a minimal OpenHarmony header tree used for include resolution / bindgen.
- `framework/workspace/rag/`: the base RAG knowledge base (`knowledge_base.json` + `bm25_index.pkl`) and precomputed reranked results.
- `scripts/` + `data/rq{1,2,3,4}/`: paper analysis scripts and minimal inputs (RQ1–RQ4).
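"Relocatable" here means each entry's `directory` field points inside the checkout itself rather than at an absolute host-specific build root, so clang tooling resolves relative `-I` paths wherever the repo is cloned. A minimal sketch (the project name, source file, and flags below are invented for illustration):

```shell
# Write a one-entry compile database whose "directory" lives inside a
# checkout-like folder, so ./include resolves relative to it.
# demo_project, demo.c, and the flags are hypothetical.
proj="$(mktemp -d)/demo_project"
mkdir -p "$proj/src" "$proj/include"
cat > "$proj/compile_commands.json" <<EOF
[
  {
    "directory": "$proj",
    "command": "clang -I./include -c src/demo.c -o demo.o",
    "file": "src/demo.c"
  }
]
EOF
echo "compile DB written under: $proj"
```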
The commands below are tested on:
- OS: Ubuntu Linux
- Python (framework): 3.11.x (via conda env `c2r_frame`)
- Rust: nightly toolchain
- Clang + libclang: 14.x (`clang`, `libclang-dev`)
System tools (required):
- Python 3.8+ (paper analysis). Python 3.10+ recommended for the framework.
- Rust toolchain: `rustc`, `cargo`, `clippy` (installing via `rustup` is recommended). A nightly toolchain is required for the pipeline.
- C/C++ toolchain: `clang` (or `gcc`) for preprocessing and bindgen-related steps; the `libclang` runtime is required when using `--use-libclang`.
- Conda (the `conda` command in PATH) to use the provided one-click environment setup.
Network downloads (only needed for some modes):
- Conda packages + pip wheels (when creating the framework env).
- NLTK corpora: `stopwords`, `wordnet`, `omw-1.4` (downloaded into `framework/data/nltk_data`).
- External LLM API access (when `USE_VLLM=false`).
- HuggingFace model weights (when `--run-rag true` and the Jina reranker is enabled).
Install system packages:

```bash
sudo apt-get update
sudo apt-get install -y \
  build-essential \
  clang \
  libclang-dev \
  pkg-config \
  cmake \
  python3-venv
```

Install Rust via rustup (downloads the toolchain). Use nightly:
```bash
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
source "$HOME/.cargo/env"
rustup toolchain install nightly
rustup default nightly
rustup component add clippy
```

Install Miniconda/Anaconda so `conda` is available, then follow the conda setup below.
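Before going further, it can help to confirm the prerequisites above are actually on `PATH`. A hedged sketch (`check_tools` is a local helper defined here, not something the repository ships):

```shell
# Report which of the required tools are reachable on PATH.
# check_tools is a local helper, not part of the repository.
check_tools() {
  local missing=0
  for t in "$@"; do
    if command -v "$t" >/dev/null 2>&1; then
      echo "found: $t"
    else
      echo "missing: $t"
      missing=1
    fi
  done
  return $missing
}

# On a fully provisioned machine, all of these report "found".
check_tools rustc cargo clang conda || echo "install the missing tools first"
```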
- Paper analysis (RQ1–RQ4): standard library only.
- Framework: use the provided conda environment (recommended).
```bash
# Create/update the conda env from framework/environment.yml (downloads packages).
bash framework/setup_conda_env.sh

# Activate the env.
#
# If conda crashes with PermissionError / CUDA-probing issues, this workaround
# helps (it disables conda's CUDA virtual-package probing, which may spawn
# processes and fail in restricted environments):
export CONDA_OVERRIDE_CUDA=
export CONDA_NO_PLUGINS=true
conda activate c2r_frame

# Ensure the NLTK corpora are found (the setup script downloads them into this folder).
export NLTK_DATA="$(pwd)/framework/data/nltk_data"
```

If you run with `--run-rag true`, the reranker step uses torch + transformers and may download model weights.
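Whichever setup path you take, you can sanity-check that the three corpora actually landed under the repo-local folder. A hedged sketch (`check_corpora` is a local helper; NLTK's standard `corpora/` layout, extracted directories or `.zip` archives, is assumed):

```shell
# Verify the expected NLTK corpora exist under a given NLTK_DATA root.
# check_corpora is a local helper, not part of the repository.
check_corpora() {
  local root="$1" status=0
  for c in stopwords wordnet omw-1.4; do
    if [ -e "$root/corpora/$c" ] || [ -e "$root/corpora/$c.zip" ]; then
      echo "ok: $c"
    else
      echo "missing: $c (rerun the download step)"
      status=1
    fi
  done
  return $status
}

check_corpora "${NLTK_DATA:-framework/data/nltk_data}" || true
```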
If you only want to run the framework in external-API mode (the recommended smoke run uses `--run-rag false` and `--skip-learned-kb`), you can skip conda and install a minimal set of Python deps:
```bash
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -r requirements.txt

# Download the NLTK corpora into the repo-local folder.
python - <<'PY'
import nltk, os
dst = os.path.abspath("framework/data/nltk_data")
os.makedirs(dst, exist_ok=True)
for p in ["stopwords", "wordnet", "omw-1.4"]:
    nltk.download(p, download_dir=dst, quiet=True)
print("NLTK_DATA =", dst)
PY

export NLTK_DATA="$(pwd)/framework/data/nltk_data"
```

This repository keeps a single paper-reproduction entry point:
```bash
python3 scripts/export_current_plot_metrics.py
```

What this command does:
- reruns the shipped RQ1–RQ4 analysis scripts on the local run directories under `data/rq{1,2,3,4}/`;
- regenerates `data/paper_metric_exports/generated_structured_json/*.json`;
- compares the reproduced His2Trans rows against the repository-local paper reference tables under `data/paper_metric_exports/reference_tables/`;
- writes:
  - `data/paper_metric_exports/current_plot_metrics_alignment.json`
  - `data/paper_metric_exports/current_plot_metrics_alignment.md`
Notes:
- `data/test_module_rust_tests/` is the frozen RQ2 unified test bundle used by the current paper.
- `data/source_rq2_tests/` keeps the paper-aligned reference Rust source layout used by the RQ2 incremental-compilation checker; it is required for reproducing the current `ICompRate` numbers.
- The reference CSVs under `data/paper_metric_exports/reference_tables/` are shipped with the repository, so reproduction does not depend on external absolute paths.
- Baseline rows are kept in the shipped reference tables; the one-click script reruns the His2Trans rows from the local run directories and checks them against the paper-aligned reference values.
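The one-click script already performs this comparison, but if you want to spot-check regenerated tables against the shipped references by hand, a sketch (the `compare_tables` helper and the regenerated-output directory are hypothetical):

```shell
# Diff every CSV in a reference directory against a regenerated directory.
# compare_tables is a local helper sketch, not part of the repository.
compare_tables() {
  local ref="$1" out="$2" status=0
  for f in "$ref"/*.csv; do
    [ -f "$f" ] || continue
    local b
    b="$(basename "$f")"
    if [ -f "$out/$b" ] && diff -q "$f" "$out/$b" >/dev/null 2>&1; then
      echo "match: $b"
    else
      echo "drift: $b"
      status=1
    fi
  done
  return $status
}
```

For example, `compare_tables data/paper_metric_exports/reference_tables /tmp/my_regenerated_tables` after rerunning the export.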
This is the end-to-end pipeline. It generates output under `framework/translation_outputs/<run-dir>/`.
```bash
# Recommended: force Rust nightly without changing the global toolchain.
export RUSTUP_TOOLCHAIN=nightly

# Use an external LLM API instead of local vLLM.
export USE_VLLM=false
export EXTERNAL_API_BASE_URL="https://api.deepseek.com/beta"
export EXTERNAL_API_MODEL="deepseek-coder"
export EXTERNAL_API_KEY="YOUR_KEY"

# Optional: avoid accidentally using any host-local OpenHarmony compile DB.
export USE_PREPROCESSING=false

# Optional (recommended): keep the HuggingFace cache inside this folder (used by the reranker).
export HF_HOME="$(pwd)/framework/data/my-huggingface"
export TRANSFORMERS_CACHE="$HF_HOME"
export HF_HUB_CACHE="$HF_HOME/hub"

cd framework
bash batch_test_staged.sh \
  --layered --incremental --max-repair 5 \
  --max-parallel 20 \
  --run-rag true --jina-parallel --use-libclang \
  --bindgen-debug-keep-files \
  --vllm-global-limit 120 \
  --suite ohos \
  --run-dir deepseek-coder-ohos10
```

Outputs are written under `framework/translation_outputs/<run-dir>/`.
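Once a run finishes, a coarse look at what landed in the run directory can be sketched like this. The layout under `translation_outputs/<run-dir>/` is not specified here, so counting emitted Rust sources and Cargo manifests is only a rough signal; `run_summary` is a local helper, not part of the repository:

```shell
# Count generated Rust files and Cargo manifests under a run directory.
# run_summary is a local helper sketch; adjust to the actual output layout.
run_summary() {
  local dir="$1"
  echo "rust_files=$(find "$dir" -name '*.rs' 2>/dev/null | wc -l | tr -d ' ')"
  echo "cargo_manifests=$(find "$dir" -name 'Cargo.toml' 2>/dev/null | wc -l | tr -d ' ')"
}

run_summary "translation_outputs/deepseek-coder-ohos10"
```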
If you just want to validate the pipeline end-to-end on a single shipped project:
```bash
cd framework
bash batch_test_staged.sh \
  --layered --incremental --max-repair 1 \
  --max-parallel 1 --max-parallel-workers 1 \
  --run-rag false --skip-learned-kb --use-libclang \
  --suite ohos \
  --only osal__0bc4f21396ad \
  --run-dir smoke_api
```

- Full OpenHarmony source tree + full `compile_commands.json` (only needed if you want preprocessing/type recovery to use the full build context):
  - Set `USE_PREPROCESSING=true`.
  - Provide `OHOS_ROOT=/path/to/OpenHarmony` and `OHOS_COMPILE_COMMANDS=/path/to/compile_commands.json`.
- Jina reranker weights (only needed when `--run-rag true`):
  - Model id: `jinaai/jina-reranker-v3`.
  - The framework caches downloads under `framework/data/my-huggingface/` (via `HF_HOME`).
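For the full-build-context case above, the environment wiring looks like this (both paths are the placeholders from the list; replace them with your own checkout and compile DB):

```shell
# Only needed when preprocessing/type recovery should see the full build context.
# Both paths are placeholders.
export USE_PREPROCESSING=true
export OHOS_ROOT=/path/to/OpenHarmony
export OHOS_COMPILE_COMMANDS=/path/to/compile_commands.json
```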
Paper reproduction:
```bash
python3 scripts/export_current_plot_metrics.py
```

Framework (single-project smoke run, external API):
```bash
# 1) Set up the env (one-time).
bash framework/setup_conda_env.sh
export CONDA_OVERRIDE_CUDA=
export CONDA_NO_PLUGINS=true
conda activate c2r_frame
export NLTK_DATA="$(pwd)/framework/data/nltk_data"

# 2) Set API env vars.
export RUSTUP_TOOLCHAIN=nightly
export USE_VLLM=false
export EXTERNAL_API_BASE_URL="https://api.deepseek.com/beta"
export EXTERNAL_API_MODEL="deepseek-coder"
export EXTERNAL_API_KEY="YOUR_KEY"

# 3) Run.
cd framework
bash batch_test_staged.sh \
  --layered --incremental --max-repair 1 \
  --max-parallel 1 --max-parallel-workers 1 \
  --run-rag false --skip-learned-kb --use-libclang \
  --suite ohos \
  --only osal__0bc4f21396ad \
  --run-dir smoke_api
```