# Supervised Machine Learning & OpenAI: End-to-End Workflow

## 1. Dataset Preparation for Fine-Tuning (Code Generation Task)

In [1]:
!pip install pandas

import pandas as pd

# Example dataset for fine-tuning code generation (input: prompt, output: code)
data = [
    {
        "prompt": "Write a Python function to add two numbers.",
        "completion": "def add(a, b):\n    return a + b"
    },
    {
        "prompt": "Write a Python function to check if a number is even.",
        "completion": "def is_even(n):\n    return n % 2 == 0"
    },
    {
        "prompt": "Write a Python function to find factorial.",
        "completion": "def factorial(n):\n    return 1 if n == 0 else n * factorial(n-1)"
    },
]

df = pd.DataFrame(data)
df.to_csv("code_generation_dataset.csv", index=False)
df.head()




| | prompt | completion |
|---|---|---|
| 0 | Write a Python function to add two numbers. | def add(a, b):\n return a + b |
| 1 | Write a Python function to check if a number i... | def is_even(n):\n return n % 2 == 0 |
| 2 | Write a Python function to find factorial. | def factorial(n):\n return 1 if n == 0 else... |
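Before a dataset like this goes into fine-tuning, it can help to confirm that every completion is at least syntactically valid Python. The following is a minimal sketch using the standard-library `ast` module; the helper name `is_valid_python` and the deliberately broken second example are illustrative assumptions, not part of the original notebook.

```python
import ast

# Hypothetical helper (not in the original notebook): True when `source`
# parses as Python, False when the parser raises SyntaxError.
def is_valid_python(source: str) -> bool:
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

# Two illustrative records: the first matches the dataset above, the second
# is intentionally broken (missing colon) to show what gets filtered out.
examples = [
    {"prompt": "Write a Python function to add two numbers.",
     "completion": "def add(a, b):\n    return a + b"},
    {"prompt": "Broken example (missing colon).",
     "completion": "def add(a, b)\n    return a + b"},
]

valid = [ex for ex in examples if is_valid_python(ex["completion"])]
print(f"{len(valid)} of {len(examples)} completions parse cleanly")
```

A syntax check says nothing about whether the code is correct; it only catches records that could never run.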


## 2. Supervised Model: Train-Test Split and Fine-Tuning Prep
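OpenAI fine-tuning for chat models expects training data as JSONL: one JSON object per line, each holding a `messages` list. The cell below splits the dataset 80/20 and writes both parts in that format. With only three rows, the split yields two training examples and one test example, so it illustrates the mechanics only.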

In [2]:
import json

# Train-test split (80-20 for demo)
train_df = df.sample(frac=0.8, random_state=42)
test_df = df.drop(train_df.index)

# Save as JSONL for OpenAI fine-tuning
def save_jsonl(df, filename):
    with open(filename, "w") as f:
        for _, row in df.iterrows():
            entry = {
                "messages": [
                    {"role": "user", "content": row['prompt']},
                    {"role": "assistant", "content": row['completion']}
                ]
            }
            f.write(json.dumps(entry) + "\n")

save_jsonl(train_df, "train.jsonl")
save_jsonl(test_df, "test.jsonl")
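A quick structural check on the written files can catch malformed lines before an upload. The sketch below assumes only the record shape produced by `save_jsonl` above (a `messages` list whose items carry `role` and `content`); it does not reproduce OpenAI's own validation rules, and `validate_chat_jsonl` is a hypothetical helper name.

```python
import json
import os
import tempfile

def validate_chat_jsonl(path):
    """Return a list of (line_number, problem) tuples; an empty list means no problems were found."""
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((lineno, f"invalid JSON: {exc.msg}"))
                continue
            messages = entry.get("messages") if isinstance(entry, dict) else None
            if not isinstance(messages, list) or not messages:
                problems.append((lineno, "missing or empty 'messages' list"))
                continue
            for msg in messages:
                if not isinstance(msg, dict) or not {"role", "content"} <= msg.keys():
                    problems.append((lineno, "message lacks 'role' or 'content'"))
                    break
    return problems

# Demo on a throwaway file: one well-formed record and one malformed line.
good = {"messages": [{"role": "user", "content": "hi"},
                     {"role": "assistant", "content": "hello"}]}
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(json.dumps(good) + "\n")
    tmp.write("{not json}\n")
    demo_path = tmp.name

demo_problems = validate_chat_jsonl(demo_path)
os.remove(demo_path)
print(demo_problems)
```

In the notebook itself, the same function could be pointed at `train.jsonl` and `test.jsonl`.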


## 3. Model Setup and Prompt Engineering (Gemini API)

In [4]:
import os
import google.generativeai as genai

# ✅ Read the API key from an environment variable; never hardcode keys in a notebook
genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# ✅ Load the Gemini model
model = genai.GenerativeModel("gemini-2.0-flash")  # or "gemini-1.5-flash" if you prefer

# ✅ Generate a response
response = model.generate_content("Write a motivational quote.")
print(response.text)


**"The only limit to your impact is your imagination and commitment."**
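One way to keep the API key out of the notebook source is to read it from an environment variable and fail with a clear message when it is missing. This is a minimal sketch; the helper name `load_api_key` and the variable name `GOOGLE_API_KEY` are assumptions chosen for illustration.

```python
import os

def load_api_key(var_name="GOOGLE_API_KEY", environ=None):
    """Return the API key stored in `var_name`, or raise if it is unset or blank."""
    # `environ` can be overridden with a plain dict, which makes the helper easy to test.
    environ = os.environ if environ is None else environ
    key = environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running this notebook."
        )
    return key

# Demo with a stand-in mapping so the example runs without a real key.
demo_key = load_api_key(environ={"GOOGLE_API_KEY": "  demo-key  "})
print(demo_key)

# In the notebook, the call would look like:
# genai.configure(api_key=load_api_key())
```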



### A. Prompt Engineering for Code Generation

In [5]:
def generate_code(prompt):
    response = model.generate_content(
        f"You are a helpful Python code assistant. {prompt}"
    )
    return response.text

# Test prompt
prompt = "Write a Python function to reverse a string."
print("Prompt:", prompt)
print("Generated Code:\n", generate_code(prompt))

Prompt: Write a Python function to reverse a string.
Generated Code:
 ```python
def reverse_string(s):
  """Reverses a string.

  Args:
    s: The string to reverse.

  Returns:
    The reversed string.
  """
  return s[::-1]

# Example usage
string = "hello"
reversed_string = reverse_string(string)
print(f"Original string: {string}")
print(f"Reversed string: {reversed_string}")
```

**Explanation:**

1. **`def reverse_string(s):`**:  This defines a function named `reverse_string` that takes a single argument `s`, which is the string we want to reverse.

2. **`return s[::-1]`**: This is the core of the function.  It uses slicing to create a reversed copy of the string.
   - `[::-1]` is a slice that starts from the end of the string, goes to the beginning, and steps backwards by 1 character at a time.  This effectively creates a reversed version of the string without modifying the original.

3. **Example Usage**: The code includes an example to demonstrate how to use the function and pr
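The prompt used in `generate_code` is a single role line followed by the task. A slightly more structured variant adds a few worked examples (few-shot prompting) and asks for code only, which tends to reduce the explanatory text seen in the output above. The sketch below only builds the prompt string; `build_code_prompt` and its `Task:`/`Code:` layout are illustrative assumptions, not a prescribed format.

```python
def build_code_prompt(task, examples=(), language="Python"):
    """Assemble a prompt from a role line, optional (task, code) examples, and the new task."""
    parts = [f"You are a helpful {language} code assistant. Return only code."]
    for ex_task, ex_code in examples:
        parts.append(f"Task: {ex_task}\nCode:\n{ex_code}")
    # End with an open "Code:" slot for the model to fill in.
    parts.append(f"Task: {task}\nCode:")
    return "\n\n".join(parts)

# One example pair taken from the dataset in section 1.
few_shot = [
    ("Write a Python function to add two numbers.",
     "def add(a, b):\n    return a + b"),
]
prompt_text = build_code_prompt(
    "Write a Python function to reverse a string.", few_shot
)
print(prompt_text)
```

The resulting string can be passed to `model.generate_content` in place of the f-string used in `generate_code`.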

### B. Code Optimization Using Gemini

In [6]:
raw_code = """
def add_numbers(a, b):
    result = 0
    result = a + b
    return result
"""

opt_prompt = f"Optimize the following Python code for simplicity and efficiency:\n\n{raw_code}"

optimized_code = generate_code(opt_prompt)
print("Optimized Code:\n", optimized_code)


Optimized Code:
 ```python
def add_numbers(a, b):
  """Adds two numbers and returns the sum."""
  return a + b
```

**Explanation of Changes and Why They Improve the Code:**

* **Direct Return:** The original code first initialized `result` to 0, then assigned the sum `a + b` to it, and *then* returned the value. This is unnecessarily verbose.  We can directly return the result of the addition `a + b`. This eliminates the need for an intermediate variable, making the code shorter and more readable.
* **Docstring:** Added a docstring to briefly explain what the function does.  Good documentation is crucial for maintainability and understanding.
* **Efficiency:** While the original code's performance difference would be negligible in most practical scenarios, directly returning the result avoids an unnecessary assignment operation, potentially improving performance slightly in very high-frequency usage.  More importantly, it *demonstrates* a principle of avoiding extra steps when the res
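An "optimized" rewrite is only useful if it behaves like the original. A simple safeguard is to run both versions on the same inputs and compare results. The sketch below does this for the two `add_numbers` variants shown above; the function names and the choice of test inputs are assumptions for illustration, and agreement on a handful of cases is evidence, not proof, of equivalence.

```python
def original_add_numbers(a, b):
    result = 0
    result = a + b
    return result

def optimized_add_numbers(a, b):
    """Adds two numbers and returns the sum."""
    return a + b

def find_mismatches(f, g, cases):
    """Return the argument tuples on which f and g produce different results."""
    return [args for args in cases if f(*args) != g(*args)]

# Integers, floats, and strings all support `+`, so they exercise the same code path.
cases = [(0, 0), (1, 2), (-5, 5), (2.5, 0.5), ("a", "b")]
mismatches = find_mismatches(original_add_numbers, optimized_add_numbers, cases)
print("Mismatches:", mismatches)
```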

### C. Prompt Engineering for Debugging

In [7]:
buggy_code = """
def divide(a, b):
    return a / b
print(divide(10, 0))
"""

debug_prompt = f"Find and fix the bug in the following Python code:\n\n{buggy_code}"

fixed_code = generate_code(debug_prompt)
print("Fixed Code:\n", fixed_code)


Fixed Code:
 ```python
def divide(a, b):
    if b == 0:
        return "Cannot divide by zero"
    return a / b
print(divide(10, 0))
```

**Reasoning:**

The original code attempts to divide `a` by `b` without checking if `b` is zero.  In mathematics and in Python, division by zero is undefined and results in a `ZeroDivisionError`.

The corrected code adds a check to see if `b` is equal to zero. If it is, the function returns the string "Cannot divide by zero". Otherwise, it performs the division and returns the result.  This prevents the `ZeroDivisionError` and provides a more informative output to the user.
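Debugging prompts are often more effective when they include the actual error, not just the code. The sketch below runs the buggy snippet, captures the traceback, and appends it to the prompt. It uses `exec`, so it is only appropriate for code you trust; the helper name `run_and_capture_error` is hypothetical.

```python
import traceback

buggy_code = """
def divide(a, b):
    return a / b
print(divide(10, 0))
"""

def run_and_capture_error(source):
    """Execute `source` in a fresh namespace; return the traceback text, or None if it ran cleanly."""
    try:
        exec(source, {})
    except Exception:
        return traceback.format_exc()
    return None

error_text = run_and_capture_error(buggy_code)
debug_prompt = (
    "Find and fix the bug in the following Python code:\n\n"
    f"{buggy_code}\n"
    f"Running it produced this traceback:\n{error_text}"
)
print(debug_prompt)
```

With the traceback included, the model sees the `ZeroDivisionError` directly instead of having to infer it from the call `divide(10, 0)`.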



## 4. [Optional] Evaluate Generated Code with Automated Testing
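You can run generated code with `exec`, but `exec` executes whatever the model returns with the notebook's full privileges. Treat model output as untrusted and run it only in an isolated environment. For demonstration: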

In [8]:
import re
def generate_code(prompt):
    response = model.generate_content(f"Write only Python code.\n{prompt}")
    return response.text

# 🌟 Step 4: Extract code from markdown/codeblock
def extract_code_block(text):
    """
    Extracts the first python code block from LLM output.
    If no code block, returns raw text.
    """
    code_blocks = re.findall(r"```(?:python)?(.*?)```", text, re.DOTALL | re.IGNORECASE)
    if code_blocks:
        return code_blocks[0].strip()
    return text.strip()

# 🌟 Step 5: Test the generated code
def test_reverse_string():
    prompt = "Write a Python function named `reverse_string` that takes a string and returns it reversed."
    code_response = generate_code(prompt)
    
    print("💬 LLM Response:\n", code_response)
    
    code = extract_code_block(code_response)
    print("\n✅ Extracted Code:\n", code)

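    try:
        namespace = {}
        exec(code, namespace)  # Defines reverse_string() in an isolated namespace
        reverse_string = namespace["reverse_string"]

        # Assertions to test the function
        assert reverse_string("OpenAI") == "IAnepO"
        assert reverse_string("") == ""
        assert reverse_string("a") == "a"
        print("\n🎉 Test passed!")
    except Exception as e:
        # repr() keeps the exception type visible; a bare AssertionError has an empty message
        print("\n❌ Test failed:", repr(e))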

# 🌟 Step 6: Run it
test_reverse_string()

💬 LLM Response:
 ```python
def reverse_string(s):
  """
  Reverses a string.

  Args:
    s: The string to reverse.

  Returns:
    The reversed string.
  """
  return s[::-1]
```

✅ Extracted Code:
 def reverse_string(s):
  """
  Reverses a string.

  Args:
    s: The string to reverse.

  Returns:
    The reversed string.
  """
  return s[::-1]

🎉 Test passed!
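Running generated code in a separate interpreter process keeps it out of the notebook's namespace and allows a timeout. The sketch below is one way to do that with the standard library; `run_in_subprocess` is a hypothetical helper, and a subprocess is not a security sandbox, since the code still runs with your user's permissions.

```python
import os
import subprocess
import sys
import tempfile

def run_in_subprocess(code, test_snippet, timeout=5):
    """Write code plus tests to a temp file and run it in a separate Python process.

    Returns (passed, stderr_text).
    """
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write(code + "\n\n" + test_snippet + "\n")
        path = tmp.name
    try:
        result = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.returncode == 0, result.stderr
    except subprocess.TimeoutExpired:
        return False, f"timed out after {timeout}s"
    finally:
        os.remove(path)

# Stand-in for extracted model output, plus the same assertions used above.
generated = "def reverse_string(s):\n    return s[::-1]"
tests = (
    'assert reverse_string("OpenAI") == "IAnepO"\n'
    'assert reverse_string("") == ""\n'
    'assert reverse_string("a") == "a"'
)
passed, stderr_text = run_in_subprocess(generated, tests)
print("passed:", passed)
```

A failing assertion or an exception in the generated code shows up as a non-zero exit code, with the traceback available in the returned stderr text.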


## Summary
*Dataset Preparation:* Created and exported a dataset for code generation tasks.

*Modeling & Fine-Tuning:* Demonstrated how to prepare data for OpenAI fine-tuning.

*Prompt Engineering:* Showed prompts for code generation, code optimization, and debugging.

*LLM API Usage:* Used the Gemini API (`google.generativeai`) for the code generation, optimization, and debugging examples.