# Language Compiler — End-to-End Demo (Colab)

**Important Runtime Note**

This notebook uses lightweight local language models (Phi-3.5-mini / Qwen2.5-0.5B).  
While the system can run on CPU, **GPU runtime is strongly recommended** in Google Colab for reasonable execution time.

**Before running any cells:**
- Go to **Runtime → Change runtime type**
- Set **Hardware accelerator** to **GPU (T4)**
- Click **Save**

The notebook will still function on CPU, but model loading and inference may be significantly slower.
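
To confirm which device the runtime actually exposes before loading a model, a small check like the one below can help. This is a minimal sketch, not part of the project: `detect_device` is a hypothetical helper name, and it assumes PyTorch is the backend (it falls back to `"cpu"` when PyTorch is not installed).

```python
import importlib.util


def detect_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a GPU, else "cpu"."""
    # PyTorch may be absent outside Colab, so probe for it before importing.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Detected device: {detect_device()}")
```

If this prints `cpu` in Colab, the runtime type was probably not switched to GPU.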


## Project Overview

This notebook demonstrates a **Natural Language Logic Compiler** that converts human instructions into:

1. A structured logic plan (intermediate representation)
2. Readable pseudocode
3. Optional executable Python code

Unlike direct natural-language-to-code systems, this project explicitly exposes the reasoning layer before code generation, improving interpretability, safety, and debuggability.
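
To make the intermediate representation concrete, the sketch below models a logic plan with plain dataclasses. The field names mirror the `LogicUnit` and `LogicPlan` objects printed later in this notebook; the classes themselves are an illustrative assumption and not the project's actual schema definitions (the project installs `pydantic`, so its real models likely differ).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LogicUnit:
    id: str
    role: str  # "condition" or "action" in the outputs shown in this notebook
    text: str
    depends_on: List[str] = field(default_factory=list)
    operator: Optional[str] = None
    value: Optional[str] = None
    negated: bool = False
    clarification_needed: bool = False
    clarification_field: Optional[str] = None


@dataclass
class LogicPlan:
    steps: List[LogicUnit] = field(default_factory=list)


# A hand-written plan for "if the temperature is above 30, set the AC to 20".
plan = LogicPlan(steps=[
    LogicUnit(id="S1", role="condition", text="the temperature is above 30 degrees"),
    LogicUnit(id="S2", role="action", text="turn the AC to 20 degrees",
              depends_on=["S1"], value="20"),
])

for step in plan.steps:
    print(step.id, step.role, step.depends_on)
```

Keeping the plan as data, separate from the generated code, is what allows each step to be inspected or corrected before any code is produced.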


## Environment Setup

This section installs the required dependencies; the project repository is cloned in the first cell of the next section.
The system is fully local and does not rely on paid APIs.


In [1]:
!pip install transformers accelerate sentencepiece streamlit pydantic pytest

Collecting streamlit
  Downloading streamlit-1.52.1-py3-none-any.whl.metadata (9.8 kB)
Collecting pydeck<1,>=0.8.0b4 (from streamlit)
  Downloading pydeck-0.9.1-py2.py3-none-any.whl.metadata (4.1 kB)
Downloading streamlit-1.52.1-py3-none-any.whl (9.0 MB)
Downloading pydeck-0.9.1-py2.py3-none-any.whl (6.9 MB)
Installing collected packages: pydeck, streamlit
Successfully installed pydeck-0.9.1 streamlit-1.52.1


## Initialising the Compiler

After cloning the project repository and changing into it, we initialise the `LanguageCompiler` using a lightweight local model.
In this demo, we use **Phi-3.5-mini**, which balances reasoning ability and efficiency.


In [6]:
!git clone https://github.com/drheaa/nlp-language-compiler.git
%cd nlp-language-compiler

Cloning into 'nlp-language-compiler'...
remote: Enumerating objects: 129, done.
remote: Counting objects: 100% (129/129), done.
remote: Compressing objects: 100% (74/74), done.
remote: Total 129 (delta 65), reused 100 (delta 36), pack-reused 0 (from 0)
Receiving objects: 100% (129/129), 29.07 KiB | 5.81 MiB/s, done.
Resolving deltas: 100% (65/65), done.
/content/nlp-language-compiler/nlp-language-compiler


In [7]:
from src.language_compiler.pipeline import LanguageCompiler
# Use phi-mini, not qwen-mini
compiler = LanguageCompiler(model='phi-mini')


[LMProvider] Loading local model: microsoft/Phi-3.5-mini-instruct
[LMProvider] CUDA detected → using GPU acceleration.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Device set to use cuda


## Example: Unambiguous Instruction

This example contains explicit numeric values.
The system should:
- Extract conditions and actions correctly
- Generate pseudocode without TODO placeholders
- Produce executable Python code


In [3]:
out = compiler.compile(
    "If the temperature is above 30 degrees, turn the AC to 20 degrees",
    to_code=True,
    interactive=True
)
out

CompilerOutput(reasoning=LogicPlan(steps=[LogicUnit(id='S1', role='condition', text='the temperature is above 30 degrees', depends_on=[], operator=None, value=None, negated=False, clarification_needed=False, clarification_field='temperature_threshold'), LogicUnit(id='S2', role='action', text='turn the AC to 20 degrees', depends_on=['S1'], operator=None, value='20', negated=False, clarification_needed=False, clarification_field=None)]), pseudocode=PseudocodeBlock(language='pseudocode', code='Solution:\n\nif temperature > 30:\n    set AC_temperature to 20\n\nPseudocode:\n\nif temperature > 30:\n    AC_temperature = 20', missing_clarifications=[]), code=CodeBlock(language='python', code='def set_ac_temperature(temperature):\n    if temperature > 30:\n        AC_temperature = 20\nset_ac_temperature(temperature)\ndef TURN_ON(x):\n    print("TURN_ON", x)\nTURN_ON("AC")'), clarifications_needed=[])

### Interpretation

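- The **LogicPlan** shows a clear condition-action dependency: the action `S2` depends on the condition `S1`.
- `clarifications_needed` is empty because all values are explicit. Note that `S1` still carries `clarification_field='temperature_threshold'` (with `clarification_needed=False`), and its `operator` and `value` are `None` even though the text states "above 30".
- The pseudocode preserves the original intent (`if temperature > 30`, set the AC to 20). The generated Python keeps the same condition but is not runnable as-is: it calls `set_ac_temperature(temperature)` with an undefined `temperature` and appends a `TURN_ON("AC")` stub call.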
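
One way to check whether generated Python is actually runnable is to compile it and execute it in an isolated namespace. The sketch below does this for the `code` string from the output above, copied verbatim; `try_run` is a hypothetical helper and not part of the project. Executing generated code with `exec` is only appropriate for trusted or sandboxed input.

```python
import contextlib
import io

# The generated code from the CompilerOutput above, copied verbatim.
generated = (
    "def set_ac_temperature(temperature):\n"
    "    if temperature > 30:\n"
    "        AC_temperature = 20\n"
    "set_ac_temperature(temperature)\n"
    "def TURN_ON(x):\n"
    '    print("TURN_ON", x)\n'
    'TURN_ON("AC")'
)


def try_run(source: str, **inputs) -> str:
    """Compile and exec `source` in a fresh namespace; return stdout or the error."""
    code = compile(source, "<generated>", "exec")  # raises SyntaxError if invalid
    namespace = dict(inputs)
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


print(try_run(generated))                  # no inputs supplied
print(try_run(generated, temperature=35))  # caller supplies the reading
```

The first call reports a `NameError` because `temperature` is never defined at module level; the second runs to completion once the caller supplies a value. A check of this kind separates "syntactically valid" from "executable", which matters when judging the `to_code=True` output.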


## Batch Evaluation on Diverse Instructions

To evaluate robustness beyond simple rule-based examples, we run the compiler on a set of less explicit, human-like instructions.
These examples include ambiguity, negation, conditional phrasing, and implicit thresholds.

The goal is **qualitative evaluation**:
- Does the system extract the correct structure?
- Does it avoid hallucinating missing values?
- Does it complete the pipeline consistently?
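
The second question can be partly automated. The sketch below flags numeric literals that appear in the pseudocode but not in the instruction, which is a rough proxy for a hallucinated threshold. `invented_numbers` is a hypothetical helper, not a project function, and the second pseudocode string is made up for illustration.

```python
import re

# Matches standalone numbers; skips digits inside identifiers such as "S1".
_NUMBER = re.compile(r"(?<![\w.])\d+(?:\.\d+)?")


def invented_numbers(instruction: str, pseudocode: str) -> set:
    """Return numeric literals in `pseudocode` that never occur in `instruction`."""
    given = set(_NUMBER.findall(instruction))
    produced = set(_NUMBER.findall(pseudocode))
    return produced - given


explicit = invented_numbers(
    "If the temperature is above 30 degrees, turn the AC to 20 degrees",
    "if temperature > 30:\n    AC_temperature = 20",
)
vague = invented_numbers(
    "Notify the manager when sales drop significantly compared to yesterday.",
    "IF sales_today < sales_yesterday * 0.8",  # hypothetical hallucinated output
)
print(explicit, vague)
```

The check is deliberately narrow: it catches invented numbers, not invented units or entities, so it complements rather than replaces manual review.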


In [13]:
batch_instructions = [
    "Turn off the lights unless someone is still inside the room.",

    "Notify the manager when sales drop significantly compared to yesterday.",

    "If the temperature keeps rising, reduce the AC setting gradually.",

    "Apply a discount for expensive items, but only during peak hours.",

    "If it rains heavily, cancel outdoor activities.",

    "Only allow access after verification has been completed."
]


In [15]:
for i, instr in enumerate(batch_instructions, 1):
    print("\n" + "="*70)
    print(f"Example {i}")
    print(f"Instruction: {instr}")

    try:
        out = compiler.compile(
            instr,
            to_code=False,
            interactive=True
        )

        print("\nLogic Plan:")
        for step in out.reasoning.steps:
            deps = f" → depends on {step.depends_on}" if step.depends_on else ""
            print(f"  [{step.role}] {step.text}{deps}")

        print("\nPseudocode:")
        print(out.pseudocode.code)

        if out.clarifications_needed:
            print("\nMissing Clarifications:")
            for c in out.clarifications_needed:
                print(f"  - {c}")
        else:
            print("\nMissing Clarifications: None")

    except Exception as e:
        print("\nCompilation failed for this instruction.")
        print("Reason:", str(e))



Example 1
Instruction: Turn off the lights unless someone is still inside the room.

Logic Plan:
  [condition] NOT (someone is still inside the room.)
  [action] TURN OFF the lights → depends on ['S1']

Pseudocode:
Solution:

IF NOT (someone is still inside the room)
    TURN OFF the lights
END IF

TODO(someone)

Missing Clarifications: None

Example 2
Instruction: Notify the manager when sales drop significantly compared to yesterday.

Logic Plan:
  [action] NOTIFY the manager
  [condition] when sales drop significantly compared to yesterday → depends on ['S1']

Pseudocode:
Solution:

```
IF sales_drop_threshold IS NOT NULL
    TODO(sales_drop_threshold)
    IF sales_today < sales_yesterday * sales_drop_threshold
        NOTIFY_MANAGER
    ENDIF
ENDIF
```

Missing Clarifications:
  - sales_drop_threshold

Example 3
Instruction: If the temperature keeps rising, reduce the AC setting gradually.

Logic Plan:
  [condition] the temperature keeps rising
  [action] reduce the AC setting gra

## Example of Rejected Output

The following instruction demonstrates a case where the system rejects malformed model output
instead of guessing structure.


In [17]:
instruction = "When the room feels uncomfortable, switch on the cooling system."

print("Instruction:")
print(instruction)

print("\nAttempting compilation...\n")

try:
    out = compiler.compile(
        instruction,
        to_code=False,
        interactive=True
    )

    print("Logic Plan:")
    for step in out.reasoning.steps:
        deps = f" → depends on {step.depends_on}" if step.depends_on else ""
        print(f"  [{step.role}] {step.text}{deps}")

    print("\nPseudocode:")
    print(out.pseudocode.code)

    if out.clarifications_needed:
        print("\nMissing Clarifications:")
        for c in out.clarifications_needed:
            print(f"  - {c}")
    else:
        print("\nMissing Clarifications: None")

except Exception as e:
    print("Compilation rejected.")
    print("\nReason:")
    print(str(e))


Instruction:
When the room feels uncomfortable, switch on the cooling system.

Attempting compilation...

Compilation rejected.

Reason:
Failed to parse JSON from model output (Check LLM output):
Return:
<<<JSON_START>>>
{"steps":[
  {"id":"S1","role":"condition","text":"the room feels uncomfortable","depends_on":[],
   "operator": null, "value": null, "negated": false,
   "clarification_needed": false, "clarification_field": null}
]}
<<<JSON_END>>>


Instruction:
If the queue length exceeds 50 people, start a new queue.

Return JSON in this exact pattern:
<<<JSON_START>>>
{"steps":[
  {"id":"S1","role":"condition","text":"...", "depends_on":[],
    "operator": null, "value": null, "negated": false,
    "clarification_needed": false, "clarification_field": null}
]}


Seed (rough heuristic; refine strictly to schema):
{
  "steps": []
}

Return:
<<<JSON_START>>>
{"steps":[
  {"id":"S1","role


### Why This Rejection Is Correct

This instruction contains subjective language ("feels uncomfortable") without measurable
criteria. For such inputs, the lightweight local model can produce verbose or malformed
output; in the run above it echoed parts of its prompt, and the captured text ends in the
middle of a JSON object, so it cannot be parsed as a single plan.

Rather than silently guessing thresholds or accepting partially structured data, the system
rejects the output and surfaces the failure explicitly. This design prevents unsafe or
misleading automation and preserves transparency.
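
The reject-rather-than-guess behaviour can be sketched as a strict extractor that accepts only one well-formed block between the `<<<JSON_START>>>` and `<<<JSON_END>>>` markers seen in the error message. This is an illustrative assumption about how such a parser could work, not the project's implementation; `extract_plan` is a hypothetical name.

```python
import json

START, END = "<<<JSON_START>>>", "<<<JSON_END>>>"


def extract_plan(raw: str) -> dict:
    """Parse the single marked JSON block in `raw`, or raise ValueError."""
    # Repeated or missing markers usually mean the model echoed its prompt.
    if raw.count(START) != 1 or raw.count(END) != 1:
        raise ValueError("expected exactly one JSON_START/JSON_END pair")
    body = raw.split(START, 1)[1].split(END, 1)[0]
    try:
        plan = json.loads(body)
    except json.JSONDecodeError as exc:
        raise ValueError(f"malformed JSON: {exc}") from exc
    if not isinstance(plan, dict) or not isinstance(plan.get("steps"), list):
        raise ValueError('missing "steps" list')
    return plan


print(extract_plan(START + '{"steps": []}' + END))

try:
    extract_plan(START + '{"steps":[{"id":"S1","role')  # truncated output
except ValueError as err:
    print("Rejected:", err)
```

No attempt is made to repair truncated JSON or to pick one block out of several; any ambiguity is surfaced to the caller as an error.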


## Test Suite Execution

The following tests validate:
- schema correctness
- pipeline execution
- utility robustness

Because LLM-based generation is probabilistic, the tests focus on structural validity rather than exact textual output.
Some legacy tests still expect deterministic strings and can fail; the run below reports failures (`F`) in `tests/test_demo_examples.py` and `tests/test_pipeline.py`.
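
A structural check of the kind described here might look like the sketch below. It inspects a plan given as a plain dictionary and reports problems without comparing any generated text. `plan_problems` and `ALLOWED_ROLES` are hypothetical names, and the role set is inferred from the outputs in this notebook rather than taken from the project's schema.

```python
ALLOWED_ROLES = {"condition", "action"}  # roles seen in this notebook's outputs


def plan_problems(plan: dict) -> list:
    """Return a list of structural problems; an empty list means the plan is valid."""
    problems = []
    steps = plan.get("steps", [])
    ids = [s.get("id") for s in steps]
    if len(ids) != len(set(ids)):
        problems.append("duplicate step ids")
    for s in steps:
        if s.get("role") not in ALLOWED_ROLES:
            problems.append(f"{s.get('id')}: unknown role {s.get('role')!r}")
        for dep in s.get("depends_on", []):
            if dep not in ids:
                problems.append(f"{s.get('id')}: depends on unknown step {dep!r}")
    return problems


good = {"steps": [
    {"id": "S1", "role": "condition", "text": "it rains heavily", "depends_on": []},
    {"id": "S2", "role": "action", "text": "cancel outdoor activities", "depends_on": ["S1"]},
]}
bad = {"steps": [
    {"id": "S1", "role": "action", "text": "notify the manager", "depends_on": ["S9"]},
]}
print(plan_problems(good), plan_problems(bad))
```

Assertions of this form stay stable across runs because they depend only on the shape of the plan, not on the wording the model happens to produce.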


In [8]:
!pytest tests/

platform linux -- Python 3.12.12, pytest-8.4.2, pluggy-1.6.0
rootdir: /content/nlp-language-compiler/nlp-language-compiler
configfile: pyproject.toml
plugins: typeguard-4.4.4, anyio-4.12.0, langsmith-0.4.56
collected 12 items

tests/test_code_generator.py .                                           [  8%]
tests/test_demo_examples.py F                                            [ 16%]
tests/test_intent_parser.py .                                            [ 25%]
tests/test_pipeline.py F                                                 [ 33%]
tests/test_pseudocode.py .                                               [ 41%]
tests/test_schemas.py .....                                              [ 83%]
tests/test_utils.py ..

## Limitations and Notes

- LLM outputs are probabilistic; exact string matching is avoided.
- The system prioritises transparency and safety over aggressive automation.
- Generated Python code uses stubs and is intended for demonstration rather than direct deployment.
