DeepThinkingFlow-AI

Runtime-Steering & SFT-Seed Stack for Structured Reasoning
Bilingual (Vietnamese/English) | LoRA/QLoRA Fine-Tuning | Behavior Bundles | Heuristic Eval

A self-built, end-to-end AI reasoning pipeline on top of a GPT-OSS-20B compatibility layer.
Focused on structured reasoning, bilingual behavior steering, and adapter-based fine-tuning.

Overview

DeepThinkingFlow is a comprehensive AI system consisting of:

Component	Description
Runtime Steering	Controls model behavior through behavior bundles (system prompt + profile) without modifying weights
SFT Seed Data	Bilingual Vietnamese/English training dataset in "harmony" format for supervised fine-tuning
LoRA/QLoRA Training	Complete adapter training pipeline with fixed train/eval splits, early stopping, gradient checkpointing
Multi-turn Chat	Interactive terminal chat with conversation history and dynamic reasoning effort switching
Heuristic Evaluation	Scores outputs against a trait checklist and rubric rules
Unified CLI	Single entry point for all scripts via `deepthinkingflow_cli.py`

Key Features

Bilingual (Vietnamese/English) -- defaults to Vietnamese when the user writes in Vietnamese
Behavior Bundles -- cleanly separates system prompt, profile, SFT data, and eval cases
3 Reasoning Levels -- low, medium, high -- switchable mid-session
Structured Output -- Goal, Assumptions, Analysis, Answer, Examples, Checks
No hidden chain-of-thought claims -- only visible analysis when opted in
7/7 smoke tests passing -- covers CLI, runtime helpers, chat flow, prompt rendering, one-shot generation

Architecture

+---------------------------------------------------------------------+
|                        DeepThinkingFlow Stack                        |
+---------------------------------------------------------------------+
|                                                                      |
|  +---------------+     +--------------------+     +-----------------+|
|  |   User CLI    |---->|  Unified CLI       |---->|   Sub-scripts   ||
|  |  (terminal)   |     |  deepthinkingflow  |     |   (12 scripts)  ||
|  +---------------+     |  _cli.py           |     +--------+--------+|
|                         +--------------------+              |         |
|                                                             v         |
|  +----------------------------------------------------------------+  |
|  |                    deepthinkingflow_runtime.py                  |  |
|  |  +--------------+ +----------------+ +------------------------+|  |
|  |  | Model Loader | | Prompt Render  | | Response Extractor     ||  |
|  |  | + Memory     | | + Chat Templ   | | (analysis / final)     ||  |
|  |  |   Check      | |                | |                        ||  |
|  |  +-------+------+ +-------+--------+ +-----------+------------+|  |
|  +----------+----------------+---------------------------+---------+  |
|             |                |                           |            |
|             v                v                           v            |
|  +----------------------------------------------------------------+  |
|  |              HuggingFace Transformers Runtime                   |  |
|  |  AutoModelForCausalLM + AutoTokenizer + chat_template           |  |
|  +---------------------------------+------------------------------+  |
|                                    |                                  |
|            +-----------------------+-----------------------+          |
|            v                       v                       v          |
|  +--------------+     +-------------------+     +------------------+ |
|  |  Behavior    |     |  Model Weights    |     |  LoRA Adapters   | |
|  |  Bundle      |     |  (safetensors)    |     |  (PEFT)          | |
|  +--------------+     +-------------------+     +------------------+ |
|                                                                      |
+----------------------------------------------------------------------+

Inference Flow

User Input --> Behavior Bundle (system_prompt.txt) --> Chat Template Rendering
                                                             |
                                                             v
                                                    Tokenizer.apply_chat_template()
                                                             |
                                                             v
                                                    Model.generate()
                                                             |
                                                             v
                                                    Decode Completion
                                                             |
                                              +--------------+--------------+
                                              v                             v
                                    extract_analysis_text()       extract_final_text()
                                     (hidden by default)          (shown to user)

Training Flow

harmony_sft_vi.jsonl --> prepare_harmony_sft_dataset.py --> train.jsonl + eval.jsonl
                                                                   |
                                                                   v
config.example.json --> train_transformers_deepthinkingflow_lora.py
                              |
                              +-- Validate config + dataset
                              +-- Load tokenizer + encode examples
                              +-- Load base model (bf16 / 4-bit)
                              +-- Apply LoraConfig (PEFT)
                              +-- Train with HF Trainer
                              +-- EarlyStoppingCallback
                              +-- Save adapter to out/
                              +-- (Optional) Merge adapter back

Project Structure

deepthinkingflow/
|
|-- README.md                             <- You are reading this file
|-- LICENSE                               <- GNU General Public License v3
|-- .gitignore                            <- Ignores model weights & training outputs
|-- requirements-transformers.txt         <- Dependencies for inference
|-- requirements-train-gpt-oss.txt        <- Dependencies for training (torch, peft, etc.)
|
|-- behavior/                             <- Behavior bundles (steering data)
|   +-- DeepThinkingFlow/
|       |-- profile.json                  <- Bundle metadata, quality gates, file map
|       |-- system_prompt.txt             <- System prompt with hard_rules, depth_policy, etc.
|       |-- evals/
|       |   +-- reasoning_following.jsonl <- 20+ eval cases with trait + rubric rules
|       +-- training/
|           |-- sft_reasoning_vi.jsonl        <- 6+ original SFT examples (vi)
|           |-- harmony_sft_vi.jsonl          <- 49 harmony-format examples (vi)
|           |-- harmony_sft_vi.train.jsonl    <- 39 train split (fixed, seed=42)
|           +-- harmony_sft_vi.eval.jsonl     <- 10 eval split  (fixed, seed=42)
|
|-- original/                             <- Original upstream model snapshot
|   |-- config.json                       <- Architecture config (MoE, 24 layers, etc.)
|   |-- dtypes.json                       <- Weight dtype metadata (FP4, UE8, BF16)
|   +-- model.safetensors                 <- ~12.82 GiB weights (git-ignored)
|
|-- runtime/                              <- Transformers-ready model directory
|   +-- transformers/
|       +-- DeepThinkingFlow/
|           |-- bootstrap-manifest.json   <- Bootstrapped file manifest
|           |-- config.json               <- Transformers model config
|           |-- generation_config.json    <- Generation defaults
|           |-- chat_template.jinja       <- Chat template with channel routing
|           |-- tokenizer.json            <- ~26.6 MB tokenizer (201,088 vocab)
|           |-- tokenizer_config.json     <- Tokenizer settings
|           |-- special_tokens_map.json   <- Special token mapping
|           |-- dtypes.json               <- Symlink to original/dtypes.json
|           +-- model.safetensors         <- Symlink to original/model.safetensors
|
|-- scripts/                              <- All Python scripts (12 files)
|   |-- deepthinkingflow_cli.py               <- Unified CLI launcher
|   |-- deepthinkingflow_runtime.py           <- Shared runtime helpers
|   |-- chat_deepthinkingflow.py              <- Multi-turn terminal chat
|   |-- run_transformers_deepthinkingflow.py  <- One-shot generation (JSON output)
|   |-- render_transformers_deepthinkingflow_prompt.py  <- Prompt preview
|   |-- bootstrap_transformers_deepthinkingflow.py      <- Bootstrap from HF
|   |-- assemble_local_transformers_model_dir.py        <- Symlink weights overlay
|   |-- compose_behavior_request.py           <- Compose messages from bundle
|   |-- validate_behavior_bundle.py           <- Bundle health checker
|   |-- prepare_harmony_sft_dataset.py        <- Dataset dedupe + split
|   |-- train_transformers_deepthinkingflow_lora.py     <- LoRA/QLoRA trainer
|   +-- evaluate_reasoning_outputs.py         <- Heuristic eval scorer
|
|-- training/                             <- Training config templates
|   +-- DeepThinkingFlow-lora/
|       |-- config.example.json           <- LoRA config (bf16, r=8, alpha=16)
|       +-- config.qlora.example.json     <- QLoRA config (4-bit, paged_adamw_8bit)
|
|-- out/                                  <- Training outputs (git-ignored)
|   |-- DeepThinkingFlow-lora-reasoning-vi/
|   |   +-- run-manifest.json             <- Training run metadata
|   +-- DeepThinkingFlow-qlora-reasoning-vi/
|       +-- run-manifest.json             <- Training run metadata
|
|-- skills/                               <- Codex skill definitions
|   +-- DeepThinkingFlow/
|       |-- SKILL.md                      <- Skill instructions for AI assistants
|       |-- agents/
|       |   +-- openai.yaml              <- Agent interface config
|       +-- references/
|           |-- model-profile.md          <- MoE architecture facts
|           |-- reasoning-patterns.md     <- Reasoning behavior patterns
|           |-- prompt-templates.md       <- Reusable prompt scaffolds
|           |-- response-examples.md      <- Example answer templates
|           +-- runtime-and-training.md   <- Runtime & training guide
|
+-- tests/                                <- Unit tests
    +-- test_deepthinkingflow_smoke.py    <- 7 smoke tests (all passing)

Prerequisites

System Requirements

Item	Minimum	Recommended
Python	3.10+	3.11+
RAM	16 GiB	32 GiB+
GPU VRAM	16 GiB (QLoRA 4-bit)	24 GiB+ (LoRA bf16)
Disk	15 GiB (weights)	30 GiB (weights + outputs)

Install Dependencies

For inference (running the model):

pip install -r requirements-transformers.txt

For training (LoRA/QLoRA fine-tuning):

pip install -r requirements-train-gpt-oss.txt

# If using QLoRA (4-bit quantization):
pip install bitsandbytes>=0.46.0

Dependency details

Inference:

Package	Version
transformers	>=4.57.0, <5.0.0
tokenizers	>=0.21.0, <1.0.0
huggingface_hub	>=0.35.0, <1.0.0
safetensors	>=0.6.0, <1.0.0
jinja2	>=3.1.0, <4.0.0

Training (additional):

Package	Version
torch	>=2.7.0, <3.0.0
accelerate	>=1.10.0, <2.0.0
datasets	>=4.0.0, <5.0.0
peft	>=0.17.0, <1.0.0

Quick Start

1. Bootstrap the model directory from HuggingFace

# Download metadata (tokenizer, config, chat template) -- does NOT include weights
python scripts/deepthinkingflow_cli.py bootstrap

# Or include weights (~12.8 GiB):
python scripts/deepthinkingflow_cli.py bootstrap --include-weights

2. (Optional) Link local weights

If you already have model.safetensors in the original/ directory:

python scripts/deepthinkingflow_cli.py assemble-model-dir

3. Interactive chat

python scripts/deepthinkingflow_cli.py chat

4. One-shot generation

python scripts/deepthinkingflow_cli.py run --user "Explain MoE architecture"

5. Validate the behavior bundle

python scripts/deepthinkingflow_cli.py validate-bundle behavior/DeepThinkingFlow

CLI Reference

All scripts are accessed through the unified CLI launcher:

python scripts/deepthinkingflow_cli.py <command> [args]

Command	Script	Description
`chat`	`chat_deepthinkingflow.py`	Interactive multi-turn chat with conversation history
`run`	`run_transformers_deepthinkingflow.py`	One-shot generation returning JSON
`render-prompt`	`render_transformers_deepthinkingflow_prompt.py`	Render the injected chat-template prompt
`compose-request`	`compose_behavior_request.py`	Compose messages from the behavior bundle
`validate-bundle`	`validate_behavior_bundle.py`	Validate bundle health
`bootstrap`	`bootstrap_transformers_deepthinkingflow.py`	Bootstrap model directory from HF
`assemble-model-dir`	`assemble_local_transformers_model_dir.py`	Symlink local weights into model dir
`prepare-sft`	`prepare_harmony_sft_dataset.py`	Deduplicate + split SFT dataset
`train-lora`	`train_transformers_deepthinkingflow_lora.py`	Train LoRA/QLoRA adapter
`eval`	`evaluate_reasoning_outputs.py`	Score outputs against trait + rubric

Chat Commands (inside a chat session)

/help                Show available commands
/status              Show current runtime settings
/clear               Clear history, keep system prompt
/history             Print the retained conversation
/analysis on|off     Toggle visible analysis output
/reasoning <level>   Switch reasoning effort: low, medium, high
/quit                Exit the chat session

Workflows

1. Inference Workflow

Use an existing model to generate answers.

                    +-------------------+
                    |   User starts     |
                    +---------+---------+
                              |
                              v
              +-------------------------------+
              |  1. Bootstrap Model Directory  |
              |  bootstrap --include-weights   |
              |  OR assemble-model-dir         |
              +---------------+---------------+
                              |
                              v
              +-------------------------------+
              |  2. Validate Behavior Bundle   |
              |  validate-bundle behavior/...  |
              +---------------+---------------+
                              |
                              v
                    +---------+---------+
                    |  Choose mode?     |
                    +---------+---------+
                    +---------+---------+
                    v                   v
          +--------------+    +------------------+
          |  One-shot     |    |  Multi-turn Chat  |
          |  run --user   |    |  chat              |
          |  "prompt"     |    |  (interactive)     |
          +------+-------+    +--------+----------+
                 |                     |
                 v                     v
          +--------------+    +------------------+
          |  JSON Output  |    |  Stream output    |
          |  {final_text} |    |  DeepThinkingFlow>|
          +--------------+    +------------------+

Detailed steps:

# Step 1: Prepare model
python scripts/deepthinkingflow_cli.py bootstrap
python scripts/deepthinkingflow_cli.py assemble-model-dir

# Step 2: Validate bundle
python scripts/deepthinkingflow_cli.py validate-bundle behavior/DeepThinkingFlow

# Step 3a: One-shot
python scripts/deepthinkingflow_cli.py run \
  --user "Analyze this prompt" \
  --reasoning-effort high \
  --include-analysis

# Step 3b: Chat
python scripts/deepthinkingflow_cli.py chat \
  --reasoning-effort high \
  --show-analysis \
  --max-history-turns 6

2. Training Workflow

Train a LoRA/QLoRA adapter to improve model behavior.

 +-----------------------------------------------------------------+
 |                   TRAINING PIPELINE                              |
 +-----------------------------------------------------------------+
 |                                                                  |
 |  (1) Prepare Dataset                                             |
 |  +------------------+     +-----------------------------+        |
 |  | harmony_sft_vi   |---->| prepare_harmony_sft_dataset |        |
 |  | .jsonl (49 ex)   |     | --eval-ratio 0.2 --seed 42  |        |
 |  +------------------+     +----------+------------------+        |
 |                                      |                           |
 |                           +----------+----------+                |
 |                           v                     v                |
 |                    train.jsonl (39)       eval.jsonl (10)         |
 |                                                                  |
 |  (2) Dry Run (Validate)                                          |
 |  +------------------------------------------------------+       |
 |  | train-lora --config config.example.json --dry-run     |       |
 |  | -> Validates config, loads tokenizer, preprocesses    |       |
 |  | -> Outputs summary JSON + run-manifest.json           |       |
 |  +--------------------------------------+---------------+       |
 |                                         |                        |
 |  (3) Train                              v                        |
 |  +------------------------------------------------------+       |
 |  | train-lora --config config.example.json               |       |
 |  | -> Loads base model (bf16 or 4-bit)                   |       |
 |  | -> Applies LoraConfig (r=8, alpha=16, dropout=0.05)   |       |
 |  | -> HF Trainer with EarlyStopping                      |       |
 |  | -> Saves adapter to out/ directory                    |       |
 |  +--------------------------------------+---------------+       |
 |                                         |                        |
 |  (4) Evaluate                           v                        |
 |  +------------------------------------------------------+       |
 |  | eval --eval-cases evals/reasoning_following.jsonl     |       |
 |  |      --predictions predictions.jsonl                  |       |
 |  | -> Scores: trait_pass_rate + rubric_pass_rate          |       |
 |  +--------------------------------------+---------------+       |
 |                                         |                        |
 |  (5) (Optional) Merge                   v                        |
 |  +------------------------------------------------------+       |
 |  | Set "merge_after_train": true in config               |       |
 |  | -> PeftModel.merge_and_unload()                       |       |
 |  | -> Saves merged model to out/*-merged/                |       |
 |  +------------------------------------------------------+       |
 |                                                                  |
 +-----------------------------------------------------------------+

Detailed steps:

# Step 1: Prepare dataset (if fixed splits do not exist yet)
python scripts/deepthinkingflow_cli.py prepare-sft \
  --input behavior/DeepThinkingFlow/training/harmony_sft_vi.jsonl \
  --train-out behavior/DeepThinkingFlow/training/harmony_sft_vi.train.jsonl \
  --eval-out behavior/DeepThinkingFlow/training/harmony_sft_vi.eval.jsonl \
  --eval-ratio 0.2 --seed 42

# Step 2: Dry run
python scripts/deepthinkingflow_cli.py train-lora \
  --config training/DeepThinkingFlow-lora/config.example.json \
  --dry-run

# Step 3: Train (LoRA)
python scripts/deepthinkingflow_cli.py train-lora \
  --config training/DeepThinkingFlow-lora/config.example.json

# Or Train (QLoRA -- saves VRAM)
python scripts/deepthinkingflow_cli.py train-lora \
  --config training/DeepThinkingFlow-lora/config.qlora.example.json

# Step 4: Evaluate
python scripts/deepthinkingflow_cli.py eval \
  --eval-cases behavior/DeepThinkingFlow/evals/reasoning_following.jsonl \
  --predictions your_predictions.jsonl

3. Evaluation Workflow

Score output quality along two dimensions: traits and rubrics.

                  +--------------------+
                  |  eval_cases.jsonl  |  <- Each case has:
                  |                    |     id, user, expected_traits,
                  |                    |     required_keywords, rubric rules
                  +----------+---------+
                             |
                             v
              +--------------------------+
              |  predictions.jsonl       |  <- Each row has:
              |  (from model run)        |     id, final_text, analysis_text
              +-------------+------------+
                            |
                            v
              +--------------------------+
              |  evaluate_reasoning_     |
              |  outputs.py              |
              |                          |
              |  Scoring:                |
              |  +-- Trait scoring       |  <- 18 trait types
              |  |   (keyword match,     |     (simple_definition,
              |  |    length check,      |      concise_reasoning,
              |  |    structure check)   |      phased_plan, ...)
              |  +-- Rubric scoring      |  <- Rule-based checks
              |      (required_keywords, |     (keyword groups,
              |       forbidden_keywords,|      max_chars,
              |       must_start_with,   |      min_numbered_steps)
              |       max_chars, ...)    |
              +-------------+------------+
                            |
                            v
              +--------------------------+
              |  Output Summary JSON     |
              |  {                       |
              |    trait_pass_rate: 0.85, |
              |    rubric_pass_rate: 0.9, |
              |    results: [...]        |
              |  }                       |
              +--------------------------+

Supported Traits:

Trait	Check
`simple_definition`	First line under 180 characters
`short_analysis`	Analysis under 400 characters
`one_concrete_example`	Contains "example" or Vietnamese equivalent
`concise_reasoning`	Full output under 1,400 characters
`likely_causes_first`	Lists probable causes before fixes
`ordered_checks`	Contains numbered check steps
`probable_fix`	Contains a concrete fix or solution
`findings_first`	First line leads with findings
`security_risk_called_out`	Mentions security or risk
`recommendation_first`	First line leads with a recommendation
`3_to_5_criteria`	At least 3 comparison criteria present
`one_tradeoff`	Mentions at least one tradeoff
`phased_plan`	Contains "phase 1/2" or equivalent
`validation_step`	Includes a validation or benchmark step
`rollback_step`	Includes a rollback or fallback plan
`main_risk`	Identifies the main risk
`brief_summary`	Full output under 1,600 characters
`scenario_example`	Contains a scenario-based example

4. Full Pipeline (End-to-End)

+---------+   +------------+   +-----------+   +----------+   +-----------+   +------------+
|  Write  |-->|  Validate  |-->|  Prepare  |-->|  Train   |-->|  Generate |-->|  Evaluate  |
|  Data   |   |  Bundle    |   |  Dataset  |   |  LoRA    |   |  Predict  |   |  & Compare |
+---------+   +------------+   +-----------+   +----------+   +-----------+   +------------+
     |              |               |               |               |               |
 harmony_      validate-       prepare-sft      train-lora       run --user       eval
 sft_vi         bundle                                            "..."        --eval-cases
 .jsonl

Behavior Bundle System

A behavior bundle is the central mechanism for steering model behavior without modifying weights.

Bundle Structure

behavior/DeepThinkingFlow/
|-- profile.json          <- Metadata + quality gates + file references
|-- system_prompt.txt     <- System prompt (injected into every request)
|-- evals/
|   +-- reasoning_following.jsonl  <- Eval cases
+-- training/
    |-- sft_reasoning_vi.jsonl          <- Raw SFT seed data
    |-- harmony_sft_vi.jsonl            <- Full harmony dataset
    |-- harmony_sft_vi.train.jsonl      <- Fixed train split
    +-- harmony_sft_vi.eval.jsonl       <- Fixed eval split

Profile Guarantees

{
  "guarantees": {
    "does_not_modify_weights": true,
    "does_not_claim_model_retraining": true,
    "requires_runtime_integration": true
  }
}

System Prompt Structure

The system prompt uses tagged blocks:

Block	Purpose
`<identity>`	Assistant identity declaration
`<hard_rules>`	Mandatory rules (language, transparency, verification)
`<task_classifier>`	Classifies tasks: explain, debug, review, compare, plan, estimate
`<depth_policy>`	Three levels: Quick, Standard, Deep
`<output_policy>`	Output format per task type
`<local_model_guidance>`	Optimization guidance for local models
`<quality_bar>`	Quality standards

Quality Gates

The bundle is automatically validated via validate-bundle:

Gate	Value
`min_sft_examples`	>= 6
`min_harmony_sft_examples`	>= 45
`min_eval_cases`	>= 20
`require_unique_eval_ids`	`true`
`require_unique_harmony_examples`	`true`

Model Profile

Property	Value
Upstream	`openai/gpt-oss-20b`
Architecture	Transformer + Mixture-of-Experts (MoE)
Layers	24
Hidden size	2,880
Vocab size	201,088
Attention	64 query heads, 8 KV heads, dim=64
Experts	32 per layer, 4 active per token
Context	4,096 tokens initial, sliding window 128
Total params (est.)	~21.5B (when expanding packed FP4)
Active params/token (est.)	~4.19B (4/32 experts)
Weight format	BF16 (attention/embedding) + Packed FP4 + UE8 scales (MoE)
File size	~12.82 GiB

Special Tokens and Channel System

The model uses a channel system to separate reasoning from output:

<|start|>assistant<|channel|>analysis<|message|>...<|end|>
<|start|>assistant<|channel|>final<|message|>...<|return|>

analysis -- Visible reasoning (hidden by default; enable via --show-analysis or /analysis on)
final -- The final answer shown to the user

Training Configuration

LoRA Config (`config.example.json`)

Parameter	Value	Description
`lora_r`	8	Rank of LoRA matrices
`lora_alpha`	16	Scaling factor
`lora_dropout`	0.05	Dropout rate
`target_modules`	`[q_proj, k_proj, v_proj, o_proj]`	Attention projection layers
`bf16`	`true`	BFloat16 precision
`learning_rate`	0.0002	Peak learning rate
`lr_scheduler_type`	`cosine`	Cosine decay scheduler
`gradient_checkpointing`	`true`	Saves VRAM
`gradient_accumulation_steps`	8	Effective batch = 1 x 8 = 8
`max_seq_length`	4,096	Maximum sequence length
`early_stopping_patience`	3	Stop if eval_loss does not improve for 3 consecutive evals
`optim`	`adamw_torch`	Optimizer

QLoRA Config (`config.qlora.example.json`)

Same as LoRA, with these additions:

Parameter	Value	Description
`use_qlora`	`true`	Enables QLoRA mode
`load_in_4bit`	`true`	Loads model in 4-bit (NF4)
`optim`	`paged_adamw_8bit`	Memory-efficient optimizer

Note: QLoRA requires the bitsandbytes package.

Testing

Smoke Tests (7/7)

python -m pytest tests/test_deepthinkingflow_smoke.py -v

Test	Description
`RuntimeHelpersTest::test_extracts_analysis_and_final_text`	Verifies channel token extraction
`CliSmokeTest::test_help_dispatches_to_subcommand_help`	CLI `help` routing
`CliSmokeTest::test_unknown_command_returns_error`	CLI unknown command returns exit code 2
`CliSmokeTest::test_dispatch_builds_expected_subprocess_call`	CLI subprocess argument construction
`RenderPromptSmokeTest::test_render_prompt_main_with_fake_tokenizer`	Prompt rendering pipeline
`RunSmokeTest::test_run_main_returns_expected_json_without_loading_real_model`	One-shot generation flow
`ChatSmokeTest::test_chat_main_handles_commands_and_response_flow`	Full chat lifecycle

Tests use mocks and run without a GPU or real model weights.

Codex Skill Integration

The skills/DeepThinkingFlow/ directory provides guidance for AI coding assistants (Codex, etc.):

skills/DeepThinkingFlow/
|-- SKILL.md                        <- Main instructions
|-- agents/openai.yaml              <- Agent interface config
+-- references/
    |-- model-profile.md            <- Architecture & prompting implications
    |-- reasoning-patterns.md       <- Reasoning behavior patterns
    |-- prompt-templates.md         <- Reusable prompt scaffolds
    |-- response-examples.md        <- Answer templates
    +-- runtime-and-training.md     <- Runtime & training integration guide

Skill Workflow

1. Classify task       -> explain | debug | review | compare | plan | estimate
2. Extract constraints -> language, depth, format, risk, evidence
3. Choose depth        -> Quick | Standard | Deep
4. Select prompt scaffold (if rewriting)
5. Select answer pattern
6. Final check for missing caveats & unsupported claims

Output Contract

Goal:        <one-sentence restatement>
Assumptions: <only if needed>
Analysis:    <short visible reasoning>
Answer:      <direct answer or recommendation>
Examples:    <1-3 concrete examples>
Checks:      <verification, caveat, or next step>

Dataset Statistics

Dataset	Count	Description
`sft_reasoning_vi.jsonl`	6+ examples	Original SFT seed (Vietnamese)
`harmony_sft_vi.jsonl`	49 examples	Full harmony-format dataset
`harmony_sft_vi.train.jsonl`	39 examples	Fixed train split (seed=42)
`harmony_sft_vi.eval.jsonl`	10 examples	Fixed eval split (seed=42)
`reasoning_following.jsonl`	20+ cases	Eval cases with traits + rubric

Design Principles

Transparency -- No claims of hidden chain-of-thought or secret reasoning
Separation of Concerns -- Behavior bundle is decoupled from model weights
Reproducibility -- Fixed train/eval splits, deterministic seeds
Safety -- Low-memory warnings, config validation, dry-run mode
Bilingual -- Vietnamese-first, English-compatible
Modularity -- Each script does one thing; the CLI orchestrates everything

License

This project is released under the GNU General Public License v3.0.

DeepThinkingFlow-AI -- by Dang Gia Minh
_{Runtime steering | Bilingual reasoning | Adapter-based fine-tuning | Open source}

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
behavior/DeepThinkingFlow		behavior/DeepThinkingFlow
original		original
runtime/transformers/DeepThinkingFlow		runtime/transformers/DeepThinkingFlow
scripts		scripts
skills/DeepThinkingFlow		skills/DeepThinkingFlow
tests		tests
training/DeepThinkingFlow-lora		training/DeepThinkingFlow-lora
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements-train-gpt-oss.txt		requirements-train-gpt-oss.txt
requirements-transformers.txt		requirements-transformers.txt

Folders and files

Latest commit

History

Repository files navigation

DeepThinkingFlow-AI

Table of Contents

Overview

Key Features

Architecture

Inference Flow

Training Flow

Project Structure

Prerequisites

System Requirements

Install Dependencies

Quick Start

1. Bootstrap the model directory from HuggingFace

2. (Optional) Link local weights

3. Interactive chat

4. One-shot generation

5. Validate the behavior bundle

CLI Reference

Chat Commands (inside a chat session)

Workflows

1. Inference Workflow

2. Training Workflow

3. Evaluation Workflow

4. Full Pipeline (End-to-End)

Behavior Bundle System

Bundle Structure

Profile Guarantees

System Prompt Structure

Quality Gates

Model Profile

Special Tokens and Channel System

Training Configuration

LoRA Config (config.example.json)

QLoRA Config (config.qlora.example.json)

Testing

Smoke Tests (7/7)

Codex Skill Integration

Skill Workflow

Output Contract

Dataset Statistics

Design Principles

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

LoRA Config (`config.example.json`)

QLoRA Config (`config.qlora.example.json`)

Packages