<a href="https://colab.research.google.com/github/AshameTheDestroyer/DevScribe/blob/main/DevScribe.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip -q install langchain langchain-google-genai langchain_community pydantic

# Initialization

In [None]:
import os
import json
from google.colab import userdata
from IPython.display import display, Markdown
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda, RunnableBranch, RunnablePassthrough
from typing import List
from pydantic import BaseModel, Field

os.environ["LANGSMITH_API_KEY"] = userdata.get("LANGSMITH_API_KEY")
os.environ["LANGSMITH_TRACING"] = "true"

os.environ["LANGSMITH_PROJECT"] = "DevScribe"

os.environ["GOOGLE_API_KEY"] = userdata.get("GOOGLE_API_KEY")
model = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.4)
creative_model = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.7)
deterministic_model = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0.0)

# Pydantic Schema Classes

In [None]:
class CodeAnalysis(BaseModel):
  language: str = Field(description="Detected Programming Language")
  complexity_score: int = Field(description="Complexity on a Scale from 1 to 10", ge=1, le=10)
  key_concepts: List[str] = Field(description="List of Detected Code Smells")
  extracted_code: str = Field(description="Extracted Code From Prompt")

class RefactorSuggestion(BaseModel):
  critique: str = Field(description="The Architectural Review")
  refactored_code: str = Field(description="The Actual Clean Code")
  changes_made: List[str] = Field(description="Summary of Changes")

# Chain A: Metadata Extractor

In [None]:
code_analysis_prompt = ChatPromptTemplate.from_template(
    """
    You're a code expert. Analyze the user's prompt:
    {user_prompt}

    1. Extract the largest code block. If none, use empty string.
    2. Detect programming language.
    3. Assign complexity (1–10) using this rubric:
       - 1–2: trivial, single function
       - 3–4: simple script, basic logic
       - 5–6: multiple funcs/classes, clear structure
       - 7–8: complex logic, dependencies, patterns
       - 9–10: large system, advanced algorithms
    4. List code smells (e.g., 'duplicate code', 'long method', 'primitive obsession').
    Output MUST conform strictly to the CodeAnalysis schema.
    """
).with_config(run_name="Chain A: Metadata Extractor") \
  | deterministic_model.with_structured_output(CodeAnalysis) \
  | RunnableLambda(lambda x: x.model_dump())
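
The complexity rubric in the prompt above can also be mirrored in plain Python, for example to label the `complexity_score` when post-processing results. The following is a minimal sketch; `complexity_label` is a hypothetical helper and is not part of the original chain.

```python
def complexity_label(score: int) -> str:
    """Map a 1-10 complexity score to the rubric band used in the prompt.

    Hypothetical helper: the band texts are copied from the Chain A prompt.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"complexity score must be between 1 and 10, got {score}")

    # Each entry is (inclusive upper bound of the band, rubric description).
    bands = [
        (2, "trivial, single function"),
        (4, "simple script, basic logic"),
        (6, "multiple funcs/classes, clear structure"),
        (8, "complex logic, dependencies, patterns"),
        (10, "large system, advanced algorithms"),
    ]
    for upper_bound, label in bands:
        if score <= upper_bound:
            return label
    raise AssertionError("unreachable: score was validated above")
```

With the sample run shown later in this notebook (`complexity_score` of 3), this would return `"simple script, basic logic"`.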

# Chain B: Few-Shot Docstring Generator

In [None]:
few_shot_examples = [
    {
        "bad_code":
          """
            def calc(a, b):
              return a * b
          """,
        "good_docstring":
          """
            \"\"\"
            Multiply two numbers.
            Args:
              a (float): First operand.
              b (float): Second operand.
            Returns:
              float: Product of a and b.
            \"\"\"
          """
    },
    {
        "bad_code":
          """
            function getUser(id) {
              return users.find(u => u.id === id);
            }
          """,
        "good_docstring":
          """
            /**
              * Retrieves a user by their unique ID.
              *
              * @param {number} id - The unique identifier of the user.
              * @returns {Object|null} The user object if found, otherwise null.
              */
          """
    },
    {
        "bad_code":
          """
            def p():
              print('Hello')
          """,
        "good_docstring":
          """
            \"\"\"
            Print a greeting message to standard output.
            \"\"\"
          """
    },
]

example_prompt = ChatPromptTemplate.from_messages([
    (
        "human",
        """
          Code:
          {bad_code}

          Generate only the docstring.
        """
      ),
    ("ai", "{good_docstring}"),
])

few_shot_prompt = FewShotChatMessagePromptTemplate(
    examples=few_shot_examples,
    example_prompt=example_prompt,
)

python_docstring_prompt = ChatPromptTemplate.from_messages([
    (
        "system",
        """
          You are a Python documentation expert. Write PEP 257-compliant docstrings.
          Do NOT change the code's logic.
        """
    ),
    few_shot_prompt,
    (
        "human",
        """
          Code:
          {extracted_code}

          Return only the code with the docstring added. Your response must be valid
          Python source code. Do not wrap it in triple backticks or add any extra text.
        """
    ),
])

generic_docstring_prompt = ChatPromptTemplate.from_messages([
    (
      "system",
      """
        You are a documentation expert. Generate appropriate language-specific
        documentation comments (e.g., JSDoc, Doxygen). Do NOT rewrite code.
      """
    ),
    (
      "human",
      """
        Language: {language}
        Code:
        {extracted_code}

        Return only the code with the documentation comments added. Your response must be
        valid source code in {language}. Do not wrap it in triple backticks or add any extra text.
      """
    ),
])

docstring_branch = RunnableBranch(
    (lambda x: x["language"].lower() == "python", python_docstring_prompt),
    generic_docstring_prompt
).with_config(run_name="Chain B: Few-Shot Docstring Generator") \
  | model \
  | StrOutputParser()
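
The branch condition above matches only when the detected language is exactly `"python"` after lowercasing. Because the label comes from a model, variants such as `"Python 3"` or `" python"` would fall through to the generic prompt. A minimal sketch of a more tolerant check follows; `is_python` is a hypothetical helper, and the set of accepted labels is an assumption.

```python
def is_python(language: str) -> bool:
    """Return True when a free-form language label refers to Python.

    Hypothetical helper: accepts labels such as "Python", " python ",
    "Python 3", "python3" and "py" (assumed variants, not an exhaustive list).
    """
    normalized = language.strip().lower()
    return normalized == "py" or normalized.startswith("python")
```

If adopted, the first branch could read `(lambda x: is_python(x["language"]), python_docstring_prompt)`.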

# Chain C: CoT Auditor

In [None]:
auditor_prompt = ChatPromptTemplate.from_messages([
    (
        "system",
        """
          You are a grumpy but brilliant Senior Software Architect.
          You hate bad code and security holes.
        """
    ),
    (
        "human",
        """
          Think step-by-step about the following code.
          Programming Language: {language}
          Code Smells Detected: {key_concepts}
          Code:
          {extracted_code}

          First, analyze:
          1. Logic Flaws: Incorrect assumptions, inefficient algorithms, poor structure.
          2. Security Risks: Injection, data leaks, unsafe defaults, error exposure, etc.
          Then, provide a concise but brutal critique for a junior engineer.
        """
    ),
]).with_config(run_name="Chain C: CoT Auditor") \
  | model \
  | StrOutputParser()


# Chain D: Refactorer

In [None]:
refactor_prompt = ChatPromptTemplate.from_messages([
    (
        "system",
        """
          You are a meticulous, ruthless, and pedantic Senior Architect.
          Refactor code based on your prior critique.
        """
    ),
    (
        "human",
        """
          Original Code:
          {extracted_code}

          Architect's Critique:
          {logic}

          Refactor this code with:
          - Improved readability and structure.
          - Fixed security flaws.
          - Better performance if obvious.
          - Language best practices.

          Output must conform to the RefactorSuggestion schema, and the refactored code
          must be valid source code in {language}. Do not wrap it in triple backticks
          or add any extra text.
        """
    ),
]).with_config(run_name="Chain D: Refactorer") \
  | creative_model.with_structured_output(RefactorSuggestion) \
  | RunnableLambda(lambda x: x.model_dump())

# Bonus: Meta-Prompting for Unit Tests

In [None]:
test_generator_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an expert test engineer."),
    (
        "human",
        """
          Given this refactored code in {language}:
          {refactored_code}

          Generate a precise prompt that instructs an LLM to write comprehensive
          unit tests for it.
          The prompt should specify:
          - Testing framework (e.g., pytest, Jest).
          - Key behaviors to test.
          - Edge cases.
          - Mocking strategy if needed.
          - This literal text, inserted verbatim at the start of the prompt's list: "OUTPUT MUST
            ONLY BE CODE, NOTHING ELSE, the response must be valid source code in {language},
            it shouldn't be wrapped in triple backticks or have any extra text...".

          Return ONLY the prompt string, NOTHING ELSE.
        """
    ),
]) | model | StrOutputParser()

test_executor_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a meticulous test engineer. Follow instructions exactly."),
    ("human", "{generated_test_prompt}")
]) | model | StrOutputParser()
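
Several prompts in this notebook ask the model not to wrap its answer in triple backticks, but a model may still do so occasionally. A defensive post-processing step is one option. The sketch below uses only the standard library; `strip_code_fence` is a hypothetical helper and is not wired into the chains above.

```python
import re

# Matches a response that is entirely one fenced block: an opening fence with
# an optional language tag, the body, and a closing fence at the very end.
_FENCE_PATTERN = re.compile(r"^```[^\n]*\n(.*?)\n?```\s*$", re.DOTALL)


def strip_code_fence(text: str) -> str:
    """Remove a single surrounding Markdown code fence, if one is present.

    Text that is not fully wrapped in a fence is returned stripped of
    leading and trailing whitespace, but otherwise unchanged.
    """
    stripped = text.strip()
    match = _FENCE_PATTERN.match(stripped)
    return match.group(1) if match else stripped
```

One way to use it, as an assumption about how the chains could be extended, is to append `| RunnableLambda(strip_code_fence)` after `StrOutputParser()`.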

# Main Chain

In [None]:
main_chain = (
    code_analysis_prompt
    | RunnablePassthrough.assign(
        documentation=(
            lambda x: docstring_branch.invoke({
                "language": x["language"],
                "extracted_code": x["extracted_code"]
            }) if x["extracted_code"].strip() else ""
        )
    ).with_config(run_name="1. Generate Documentation")
    | RunnablePassthrough.assign(
        audit_critique=(
            lambda x: auditor_prompt.invoke({
                "language": x["language"],
                "key_concepts": x["key_concepts"],
                "extracted_code": x["extracted_code"],
            }) if x["extracted_code"].strip() else "No code to audit."
        )
    ).with_config(run_name="2. Perform Security & Logic Audit")
    | RunnablePassthrough.assign(
        refactor=(
            lambda x: refactor_prompt.invoke({
                "language": x["language"],
                "extracted_code": x["extracted_code"],
                "logic": x["audit_critique"],
            }) if x["extracted_code"].strip() else {
                "critique": "No code provided.",
                "refactored_code": "",
                "changes_made": [],
            }
        )
    ).with_config(run_name="3. Refactor Code")
    | RunnablePassthrough.assign(
        unit_test_prompt=(
            lambda x: test_generator_prompt.invoke({
                "language": x["language"],
                "refactored_code": x["refactor"]["refactored_code"]
            }) if x["refactor"]["refactored_code"].strip() else ""
        )
    ).with_config(run_name="4. Generate Unit Test Prompt")
    | RunnablePassthrough.assign(
        unit_tests=(
            lambda x: test_executor_prompt.invoke({
                "generated_test_prompt": x["unit_test_prompt"]
            }) if x["unit_test_prompt"] else ""
        )
    ).with_config(run_name="5. Generate Unit Tests")
    | {
        "analysis": lambda x: {
            "language": x["language"],
            "complexity_score": x["complexity_score"],
            "key_concepts": x["key_concepts"],
            "extracted_code": x["extracted_code"],
        },
        "documentation": lambda x: x["documentation"],
        "refactor": lambda x: x["refactor"],
        "unit_tests": lambda x: x["unit_tests"]
    }
).with_config(run_name="Code Analysis & Refactor")
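
Each `RunnablePassthrough.assign(...)` step in the main chain receives the dictionary built so far and returns it with the new keys added, which is why later steps can read `x["audit_critique"]` or `x["refactor"]`. The data flow can be sketched in plain Python without LangChain. Here `assign` is a hypothetical stand-in, and the lambdas replace the real model calls with fixed strings.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def assign(state: State, **steps: Callable[[State], Any]) -> State:
    """Return a copy of `state` extended with one new key per step.

    Every step sees the same input state, mirroring how the assigned
    callables in the main chain all read from the incoming dictionary.
    """
    additions = {key: step(state) for key, step in steps.items()}
    return {**state, **additions}


# Stand-in for the output of Chain A (values are illustrative only).
state: State = {"language": "Python", "extracted_code": "def f(): pass"}

# Each call adds one key, just like the numbered steps of the main chain.
state = assign(state, documentation=lambda s: f"docs for {s['language']} code")
state = assign(state, audit_critique=lambda s: "critique of " + s["extracted_code"])
state = assign(
    state,
    refactor=lambda s: {
        "refactored_code": s["extracted_code"],
        "critique": s["audit_critique"],
    },
)

print(list(state))
```

The printed key order shows the dictionary growing one step at a time, with earlier keys preserved.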

# Execution

In [None]:
with open('code.txt', 'r', encoding='utf-8') as file:
    raw_input = file.read()

result = main_chain.invoke({ "user_prompt": raw_input })
print(json.dumps(result, indent=2))

{
  "analysis": {
    "language": "Python",
    "complexity_score": 3,
    "key_concepts": [],
    "extracted_code": "def proc_data(d):\n# stuff happens\nx = []\nfor i in d:\nif i > 10:\nx.append(i * 2)\nelse:\nx.append(i)\nreturn x"
  },
  "documentation": "def proc_data(d):\n  \"\"\"Process a list of numbers, doubling values greater than 10.\n\n  Args:\n    d (list): The input list of numbers.\n\n  Returns:\n    list: A new list with values greater than 10 doubled, others unchanged.\n  \"\"\"\n  # stuff happens\n  x = []\n  for i in d:\n    if i > 10:\n      x.append(i * 2)\n    else:\n      x.append(i)\n  return x",
  "refactor": {
    "critique": "Alright, let's dissect this \"code.\"\n\n### Analysis:\n\n1.  **Logic Flaws:**\n    *   **Input Type Assumption:** The most glaring flaw is the complete lack of input validation. This function *assumes* `d` is an iterable containing only numeric types (or types that correctly implement `__gt__` and `__mul__` with integers). If `d` contain

# Extracted Code

In [None]:
display(Markdown(f"```{result['analysis']['language']}\n{result['analysis']['extracted_code']}\n```"))

```Python
def proc_data(d):
# stuff happens
x = []
for i in d:
if i > 10:
x.append(i * 2)
else:
x.append(i)
return x
```

# Docstrings

In [None]:
display(Markdown(f"```{result['analysis']['language']}\n{result['documentation']}\n```"))

```Python
def proc_data(d):
  """Process a list of numbers, doubling values greater than 10.

  Args:
    d (list): The input list of numbers.

  Returns:
    list: A new list with values greater than 10 doubled, others unchanged.
  """
  # stuff happens
  x = []
  for i in d:
    if i > 10:
      x.append(i * 2)
    else:
      x.append(i)
  return x
```

# Refactored Code

In [None]:
display(Markdown(f"```{result['analysis']['language']}\n{result['refactor']['refactored_code']}\n```"))

```Python
def process_numeric_list(data_list):
    """
    Processes a list of numbers, doubling values greater than 10.

    This function ensures input robustness by validating that the provided
    'data_list' is an iterable containing only numeric types (integers or floats).
    It transforms the list by doubling any number greater than 10, otherwise
    keeping the number as is.

    Args:
        data_list: An iterable (e.g., list, tuple) of numeric values to be processed.

    Returns:
        A new list containing the processed numeric values.

    Raises:
        TypeError: If 'data_list' is not an iterable, or if any of its elements
                   are not numeric (int or float).
    """
    # Validate that the input is an iterable
    if not isinstance(data_list, (list, tuple, set)):
        raise TypeError("Input 'data_list' must be an iterable (e.g., list, tuple, set).")

    # Validate that all elements in the iterable are numeric
    for item in data_list:
        if not isinstance(item, (int, float)):
            raise TypeError(
                f"All elements in 'data_list' must be numeric (int or float). "
                f"Found type {type(item).__name__} for value '{item}'."
            )

    # Use a list comprehension for a more Pythonic, concise, and often efficient
    # way to create the new list based on the transformation logic.
    processed_data = [item * 2 if item > 10 else item for item in data_list]
    return processed_data
```

# Unit Tests

In [None]:
display(Markdown(f"```{result['analysis']['language']}\n{result['unit_tests']}\n```"))

```Python
import pytest

def process_numeric_list(data_list):
    """
    Processes a list of numbers, doubling values greater than 10.

    This function ensures input robustness by validating that the provided
    'data_list' is an iterable containing only numeric types (integers or floats).
    It transforms the list by doubling any number greater than 10, otherwise
    keeping the number as is.

    Args:
        data_list: An iterable (e.g., list, tuple) of numeric values to be processed.

    Returns:
        A new list containing the processed numeric values.

    Raises:
        TypeError: If 'data_list' is not an iterable, or if any of its elements
                   are not numeric (int or float).
    """
    # Validate that the input is an iterable
    if not isinstance(data_list, (list, tuple, set)):
        raise TypeError("Input 'data_list' must be an iterable (e.g., list, tuple, set).")

    # Validate that all elements in the iterable are numeric
    for item in data_list:
        if not isinstance(item, (int, float)):
            raise TypeError(
                f"All elements in 'data_list' must be numeric (int or float). "
                f"Found type {type(item).__name__} for value '{item}'."
            )

    # Use a list comprehension for a more Pythonic, concise, and often efficient
    # way to create the new list based on the transformation logic.
    processed_data = [item * 2 if item > 10 else item for item in data_list]
    return processed_data


@pytest.mark.parametrize("input_list, expected_output", [
    # Numbers greater than 10
    ([11], [22]),
    ([20], [40]),
    ([11, 12, 13], [22, 24, 26]),
    # Numbers less than or equal to 10
    ([1], [1]),
    ([5], [5]),
    ([10], [10]),
    ([1, 5, 10], [1, 5, 10]),
    # Mixed lists
    ([5, 12, 8, 20], [5, 24, 8, 40]),
    ([10, 11, 9, 12, 10.0], [10, 22, 9, 24, 10.0]),
    # Empty list
    ([], []),
    # List with only numbers > 10
    ([15, 100, 10.1], [30, 200, 20.2]),
    # List with only numbers <= 10
    ([3, 7, 0, -5], [3, 7, 0, -5]),
    # Zero
    ([0], [0]),
    ([0, 11, 5], [0, 22, 5]),
    # Negative numbers
    ([-5], [-5]),
    ([-10], [-10]),
    ([-11], [-11]),
    ([-0.5, -10.0, -15.5], [-0.5, -10.0, -15.5]),
    # Floating-point numbers
    ([10.1], [20.2]),
    ([9.9], [9.9]),
    ([10.0], [10.0]),
    ([1.23, 15.7, 10.0], [1.23, 31.4, 10.0]),
    ([10.00000000000001], [20.00000000000002]),
    ([9.99999999999999], [9.99999999999999]),
    # Large numbers
    ([1000000], [2000000]),
    ([1000000000000.5], [2000000000001.0]),
    ([5000000000000.0], [10000000000000.0]),
    # Boolean values (True=1, False=0)
    ([True], [True]),
    ([False], [False]),
    ([True, False, 11, 10], [True, False, 22, 10]),
])
def test_process_numeric_list_valid_inputs(input_list, expected_output):
    """Tests core logic with various valid list inputs."""
    assert process_numeric_list(input_list) == expected_output


@pytest.mark.parametrize("input_iterable, expected_output", [
    # Tuple inputs
    ((1, 12, 5), [1, 24, 5]),
    ((10.0, 10.1, 9.9), [10.0, 20.2, 9.9]),
    ((), []),
])
def test_process_numeric_list_tuple_input(input_iterable, expected_output):
    """Tests with tuple inputs, ensuring correct processing and list return type."""
    result = process_numeric_list(input_iterable)
    assert result == expected_output
    assert isinstance(result, list)


def test_process_numeric_list_set_input():
    """Tests with set inputs, verifying elements are processed correctly (order not guaranteed)."""
    input_set = {1, 12, 5, 10.5, 0, -3.5}
    expected_output_elements = {1, 24, 5, 21.0, 0, -3.5}
    result = process_numeric_list(input_set)
    assert isinstance(result, list)
    assert set(result) == expected_output_elements


def test_process_numeric_list_set_input_empty():
    """Tests with an empty set input."""
    input_set = set()
    result = process_numeric_list(input_set)
    assert isinstance(result, list)
    assert result == []


@pytest.mark.parametrize("invalid_input", [
    None,
    123,
    "hello",
    {'a': 1},
    True, # bool is a subclass of int, but not an iterable (list, tuple, set)
    1.23,
    object(),
    range(5), # A generator/range object is an iterable, but not list, tuple, or set
])
def test_process_numeric_list_non_iterable_input_raises_type_error(invalid_input):
    """Tests that non-list/tuple/set inputs raise a TypeError."""
    with pytest.raises(TypeError, match="Input 'data_list' must be an iterable"):
        process_numeric_list(invalid_input)


@pytest.mark.parametrize("invalid_iterable_with_non_numeric", [
    [1, "two", 3],
    [10, None, 20],
    [1, [2, 3], 4],
    [1, {'a': 1}, 3],
    [1, object(), 3],
    ["string_only"],
    [None],
    [[1,2]],
    [{'key': 'value'}],
    (1, "two", 3), # Tuple with non-numeric
    {1, "two", 3}, # Set with non-numeric
])
def test_process_numeric_list_iterable_with_non_numeric_elements_raises_type_error(invalid_iterable_with_non_numeric):
    """Tests that iterables containing non-numeric elements raise a TypeError."""
    with pytest.raises(TypeError, match="All elements in 'data_list' must be numeric"):
        process_numeric_list(invalid_iterable_with_non_numeric)


def test_process_numeric_list_non_numeric_element_error_message_content():
    """Tests the specific content of the TypeError message for non-numeric elements."""
    with pytest.raises(TypeError) as excinfo:
        process_numeric_list([1, "hello", 3])
    assert "All elements in 'data_list' must be numeric (int or float)." in str(excinfo.value)
    assert "Found type str for value 'hello'." in str(excinfo.value)

    with pytest.raises(TypeError) as excinfo:
        process_numeric_list([1, None, 3])
    assert "All elements in 'data_list' must be numeric (int or float)." in str(excinfo.value)
    assert "Found type NoneType for value 'None'." in str(excinfo.value)

    with pytest.raises(TypeError) as excinfo:
        process_numeric_list([1, [2], 3])
    assert "All elements in 'data_list' must be numeric (int or float)." in str(excinfo.value)
    assert "Found type list for value '[2]'." in str(excinfo.value)
```

# Audit Critique

In [None]:
display(Markdown(f"```{result['analysis']['language']}\n{result['refactor']['critique']}\n```"))

```Python
Alright, let's dissect this "code."

### Analysis:

1.  **Logic Flaws:**
    *   **Input Type Assumption:** The most glaring flaw is the complete lack of input validation. This function *assumes* `d` is an iterable containing only numeric types (or types that correctly implement `__gt__` and `__mul__` with integers). If `d` contains strings, `None`, or any other non-comparable/non-multipliable type, the function will throw a `TypeError` or `AttributeError` at runtime. This makes the function incredibly brittle and prone to crashing.
    *   **Inefficiency/Readability (Minor):** While functional, the explicit `for` loop with `if/else` and `append` is verbose for such a simple transformation. A list comprehension would be significantly more concise, Pythonic, and often marginally more performant.

2.  **Security Risks:**
    *   **No Direct Security Risks:** Within the isolated scope of *this specific function*, there are no direct security vulnerabilities like injection, data leaks, or unsafe defaults. It performs a simple numerical transformation.
    *   **Indirect Risk (Input Validation):** The *absence* of input validation means this function is a potential point of failure if `d` originates from untrusted user input *upstream*. If malicious or malformed data (e.g., a list of non-numeric types) is passed to it, it will crash the application, potentially leading to a denial-of-service scenario. While the function itself doesn't *create* the security vulnerability, it certainly doesn't help mitigate it and instead acts as an unhandled exception waiting to happen.

### Brutal Critique for a Junior Engineer:

"Alright, let's look at this 'code.'

**Logic Flaws:** You've built a house of cards. This function *assumes* `d` is a list of numbers. What happens if I pass it `['a', 'b', 'c']`? Or `[1, None, 3]`? It crashes. Hard. You've introduced a runtime bomb waiting to explode the moment someone feeds it anything other than your imagined perfect input. This isn't robust; it's fragile. And for a simple transformation, that `for` loop is verbose. Ever heard of a list comprehension? It's Python 101 for this kind of thing.

**Security Risks:** As for 'security risks,' this particular piece of garbage isn't directly creating any *new* vulnerabilities. It's too simple and too broken to even get that far. However, its utter lack of input validation means it's a perfect *enabler* for security issues further up the chain. If `d` comes from untrusted input, and you don't validate it *before* it hits this function, you're just asking for a `TypeError` to bring down your service. Not a direct security hole, but a gaping maw for instability that can be exploited for denial of service.

**Bottom Line:** This isn't production-ready. It's a brittle, unvalidated piece of code that will crash if you look at it funny. Learn about input validation and list comprehensions. Now fix it."
```