# GPT Chat Completion Lab

Welcome! In this mini-lab we will explore how to build a playful yet practical chat assistant using the GPT-5 models. The goal is to make the workflow clear enough for beginners while giving you a template you can adapt for your own use cases.

Objectives:
- Build a basic GPT-powered chat assistant  
- Adjust assistant behavior using system prompts  
- Build a simple Gradio UI

## Game Plan
- **Context:** We are using Google Colab, so everything happens in the cloud.
- **Model:** `gpt-5-nano` keeps responses smart while staying cost-efficient.
- **Secret management:** We read the API key from the Colab secret named `OpenAI_API_Key`.
- **Flow:** install the SDK → load the key securely → define a helper function → experiment with prompts.
- **Stretch idea:** tweak the conversation style and system prompt with your own ideas.


In [14]:
from google.colab import userdata
import os
from openai import OpenAI
import gradio as gr
from IPython.display import Markdown, display

MODEL = "gpt-5-nano"  # small, cost-efficient GPT-5 variant used throughout the lab

## Load Secrets (No Hard-Coding!)
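Colab lets us keep keys in its Secrets vault, which we read through `google.colab.userdata`. Make sure your workspace already stores `OpenAI_API_Key`. If it does not, add the secret once through the key icon (**Secrets**) in the left sidebar and turn on notebook access for it. Never share the value.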
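You may also want the same notebook to run outside Colab, for example locally with an environment variable. One option is a small fallback. The sketch below uses a hypothetical helper named `load_api_key`. It tries the Colab secret first and otherwise reads `OPENAI_API_KEY` from the environment.

```python
import os


def load_api_key(secret_name="OpenAI_API_Key", env_var="OPENAI_API_KEY"):
    """Return the OpenAI key from Colab secrets, falling back to an environment variable."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        # Raises if the secret is missing or not shared with this notebook
        key = userdata.get(secret_name)
    else:
        key = os.environ.get(env_var)

    if not key:
        raise RuntimeError(f"No API key found: add the {secret_name!r} Colab secret or set {env_var}.")

    os.environ[env_var] = key  # OpenAI() reads OPENAI_API_KEY by default
    return key
```

Inside Colab this reduces to the one-line cell that follows. Elsewhere, set `OPENAI_API_KEY` in your shell before starting Python.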


In [15]:
# Expose the key as OPENAI_API_KEY, the environment variable the OpenAI SDK reads by default
os.environ['OPENAI_API_KEY'] = userdata.get('OpenAI_API_Key')

## Wrap the GPT Client
We use the official `openai` package. The cell below:
1. Initializes a single `OpenAI` client, which picks up `OPENAI_API_KEY` from the environment.
2. Sends one prompt to the Responses API with `client.responses.create`.
3. Returns a `Response` object that holds the model reply plus token usage, so we can discuss cost control.


In [16]:
client = OpenAI()

response = client.responses.create(
    model=MODEL,
    input="Write a one-sentence bedtime story about a unicorn."
)

response

Response(id='resp_062912d2fbb3b52600691ccff3749881a3a520887bad494c70', created_at=1763495923.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_062912d2fbb3b52600691ccff3a68c81a39acb595b40bf7165', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_062912d2fbb3b52600691ccff5615c81a398c6773692f8e312', content=[ResponseOutputText(annotations=[], text='Under the silver moon, a gentle unicorn trotted through a lullaby-soft forest and whispered sweet dreams to the sleepy stars.', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Reasoning(effort='medi
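The cell above calls the API inline. You may want to reuse that call with a developer message and several user turns while keeping the token usage for cost tracking. The sketch below does this with a hypothetical `ask` helper. The client is passed in as a parameter, an assumption that lets the function be tested without network access.

```python
def ask(client, user_turns, instructions=None, model="gpt-5-nano"):
    """Send an optional developer message plus user turns; return (reply_text, usage)."""
    messages = []
    if instructions:
        # The developer role carries the high-level (system-style) guidance
        messages.append({"role": "developer", "content": instructions})
    messages.extend({"role": "user", "content": turn} for turn in user_turns)

    response = client.responses.create(model=model, input=messages)
    # output_text joins the reply text; usage holds the input/output token counts
    return response.output_text, response.usage
```

Usage: `reply, usage = ask(client, ["Write a one-sentence bedtime story about a unicorn."])`, then inspect `usage.output_tokens`.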

In [17]:
response.usage.output_tokens

351
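Why 351 output tokens for a single sentence? `gpt-5-nano` is a reasoning model. The `ResponseReasoningItem` in the raw response printed earlier stands for hidden reasoning tokens, which are counted and billed as output tokens. `response.usage.output_tokens_details.reasoning_tokens` reports how many there were.

To turn the counts into money, the sketch below uses a hypothetical `estimate_cost` function. It takes per-million-token prices as arguments because prices change. Look them up on OpenAI's pricing page rather than trusting any hard-coded number.

```python
def estimate_cost(input_tokens, output_tokens, input_price_per_1m, output_price_per_1m):
    """Estimate the cost of one call from token counts and per-1M-token prices.

    Reasoning tokens are already included in output_tokens, so they are priced as output.
    """
    return (input_tokens * input_price_per_1m + output_tokens * output_price_per_1m) / 1_000_000


# Example with the response above (INPUT_PRICE and OUTPUT_PRICE are placeholders to fill in):
# usage = response.usage
# print(usage.output_tokens_details.reasoning_tokens)
# cost = estimate_cost(usage.input_tokens, usage.output_tokens, INPUT_PRICE, OUTPUT_PRICE)
```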

Let's extract the reply part only:

In [18]:
print(response.output_text)

Under the silver moon, a gentle unicorn trotted through a lullaby-soft forest and whispered sweet dreams to the sleepy stars.
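`output_text` is a convenience property of the SDK's `Response` object. Under the hood, `response.output` is a list; here it holds a reasoning item followed by an assistant message.

The sketch below roughly mirrors that property by hand; the name `extract_text` is hypothetical. It walks the list and joins the `output_text` parts of each message.

```python
def extract_text(response):
    """Collect the text parts of every assistant message in response.output."""
    parts = []
    for item in response.output:
        if item.type != "message":  # skip reasoning items, tool calls, etc.
            continue
        for content in item.content:
            if content.type == "output_text":
                parts.append(content.text)
    return "".join(parts)
```

For the unicorn response, `extract_text(response)` should return the same sentence as `response.output_text`.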


## System Instructions
The `instructions` parameter takes the place of what used to be called the system (or developer) prompt. It sets high-level guidance for how the model should behave (its tone, goals, and style), while message roles give more specific, task-level directions.
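`instructions` is not the only way to pass this guidance. The same text can travel as the first message in `input` with the `developer` role, which is what the chat-history code later in this lab does. The sketch below uses a hypothetical helper, `build_request`, to build the keyword arguments for `client.responses.create` in both styles.

```python
def build_request(model, instructions, user_text, as_parameter=True):
    """Return kwargs for client.responses.create with guidance in one of two places."""
    if as_parameter:
        # Guidance travels in the dedicated `instructions` parameter
        return {"model": model, "instructions": instructions, "input": user_text}
    # Guidance is the first message in the input list, with the developer role
    return {
        "model": model,
        "input": [
            {"role": "developer", "content": instructions},
            {"role": "user", "content": user_text},
        ],
    }
```

Either dictionary can be unpacked into the call, e.g. `client.responses.create(**build_request(MODEL, instructions, input))`.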


<img src="https://raw.githubusercontent.com/soltaniehha/Business-Analytics-Toolbox/master/docs/images/Prof-Owl-1.png"
     width="300">


In [19]:
instructions = "You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal."
input = "why do data analysts prefer Python or SQL instead of Excel for big datasets?"

response = client.responses.create(
    model=MODEL,
    instructions=instructions,   # Plays the role of the system prompt
    input=input,                 # User prompt
    text={"verbosity": "low"}    # low: short, concise outputs; high: detailed explanations or big refactors
)

Markdown(response.output_text)

Great question. Here’s the simple reality: Excel is great for small, quick analyses, but Python and SQL are built for bigger, repeatable work. Here’s why:

- Size and speed
  - Excel has a hard row limit (about 1 million rows per sheet) and can get very slow with large formulas or many sheets.
  - SQL databases and Python data tools are designed to handle large datasets efficiently, using indexing, streaming, and parallel processing.

- Memory and resources
  - Excel loads data into memory, which means big files can crash or become sluggish.
  - SQL runs on a database server; Python can process data in chunks or with out-of-core libraries (e.g., Dask) to avoid loading everything at once.

- Reproducibility and automation
  - Excel often involves manual steps, copy-paste, and ad-hoc formulas, which are easy to break and hard to reproduce.
  - SQL and Python scripts can be version-controlled, tested, and rerun automatically to produce the same results every time.

- Data integrity and governance
  - With Excel, multiple copies of the same data can exist in different files, leading to inconsistencies.
  - A centralized database plus scripted workflows keep a single source of truth and clear provenance.

- Manipulation capabilities
  - SQL shines at joining large tables, filtering, aggregating, and doing set-based operations efficiently.
  - Python (with pandas, pyarrow, etc.) is very flexible for cleaning, feature engineering, and complex transformations, especially when logic is iterative.

- Collaboration and sharing
  - Databases and code-based workflows are easier to share, review, and run in teams.
  - Excel files are easier for individuals but harder to manage at scale and in production.

When to use Excel still:
- Quick exploration on small datasets
- Final tweaks or simple dashboards
- One-off analyses that don’t need repetition

Bottom line: for big datasets and scalable, repeatable analysis, SQL and Python are usually the better tools. Use Excel for quick, light tasks, not as the primary tool for large-scale data work.

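## Chat History

Each `responses.create` call so far was independent, so the model had no memory of earlier turns. To hold a conversation, we keep our own `history` list, starting with the developer message, and resend the whole list on every call.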

In [20]:
# Keep history
history = [{"role": "developer", "content": instructions}]

def chat(message):
    history.append({"role": "user", "content": message})  # Add the new user message to history

    # Send entire history to the model
    response = client.responses.create(
        model=MODEL,
        input=history,
        text={ "verbosity": "low" }
    )

    # Add model response to history
    history.append({"role": "assistant", "content": response.output_text})

    return response.output_text
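Because `chat` resends the entire `history` on every call, input tokens (and cost) grow with each turn. One common mitigation is to keep the developer message and only the most recent messages. The sketch below implements this as a hypothetical `trim_history` helper. Older context is simply dropped, so the model forgets it.

```python
def trim_history(history, max_messages=10):
    """Keep a leading developer message plus the most recent `max_messages` messages."""
    # Preserve the developer (system-style) message if the history starts with one
    head = [m for m in history[:1] if m["role"] == "developer"]
    rest = history[len(head):]
    # Guard against max_messages == 0, since rest[-0:] would return the whole list
    recent = rest[-max_messages:] if max_messages > 0 else []
    return head + recent
```

Usage inside `chat`: pass `input=trim_history(history, max_messages=6)` instead of `input=history`. The full `history` list stays intact for inspection.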

In [21]:
Markdown(chat(input))

Great question. In short: for big datasets, Python or SQL usually wins because they’re designed for scale, automation, and reproducibility. Here’s why:

- Size and speed
  - Excel hits a row limit (about 1 million rows) and can bog down or crash with big data.
  - SQL databases and Python (with proper tooling) handle much larger data efficiently (indexes, querying, streaming, chunking).

- Power of data operations
  - Excel is great for light, ad-hoc calculations.
  - SQL can join large tables, filter, group, and use indexes. Python (pandas, Dask) can do complex transformations, analytics, and even ML.

- Reproducibility and automation
  - Excel is manual and error-prone; changes aren’t easy to track.
  - Code and queries can be version-controlled, automated in pipelines, and re-run with consistent results.

- Data integrity and governance
  - Databases enforce schemas, types, constraints, and access controls.
  - Excel workbooks can become inconsistent, with multiple copies and hidden changes.

- Ecosystem and collaboration
  - Python/SQL fit into data pipelines, dashboards, ML, and scalable storage; workflows are easier to share and audit.

- When Excel is okay
  - Small datasets, quick exploration, or when you need a simple, visual summary.

If you’re dealing with big data, start with SQL for querying and Python for deeper analysis or modeling.

In [22]:
chat("Please highlight the most important point")

'Key point: For big datasets, SQL and Python are preferred because they scale, automate, and give reproducible results, while Excel struggles with size and consistency.'

In [23]:
history

[{'role': 'developer',
  'content': 'You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal.'},
 {'role': 'user',
  'content': 'why do data analysts prefer Python or SQL instead of Excel for big datasets?'},
 {'role': 'assistant',
  'content': 'Great question. In short: for big datasets, Python or SQL usually wins because they’re designed for scale, automation, and reproducibility. Here’s why:\n\n- Size and speed\n  - Excel hits a row limit (about 1 million rows) and can bog down or crash with big data.\n  - SQL databases and Python (with proper tooling) handle much larger data efficiently (indexes, querying, streaming, chunking).\n\n- Power of data operations\n  - Excel is great for light, ad-hoc calculations.\n  - SQL can join large tables, filter, group, and use indexes. Python (pandas, Dask) can do complex transformations, analytics, and even ML.\n\n- Reproducibility and automation\n  - Excel is 
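Keeping `history` ourselves is explicit and easy to inspect. The Responses API also offers server-side chaining: pass the previous response's `id` as `previous_response_id` and send only the new message. The sketch below assumes responses are stored (the API's default `store=True`); `ChainedChat` is a hypothetical name.

There are two caveats:
- `instructions` are not carried over between chained responses, so the sketch resends them on every call.
- Earlier turns are still billed as input tokens, so this is a convenience rather than a cost saving.

```python
class ChainedChat:
    """Chat that lets the API keep the transcript via previous_response_id (a sketch)."""

    def __init__(self, client, instructions, model="gpt-5-nano"):
        self.client = client
        self.instructions = instructions
        self.model = model
        self.last_response_id = None  # no earlier turn yet

    def send(self, message):
        kwargs = {
            "model": self.model,
            "instructions": self.instructions,  # not inherited from earlier responses, so resend
            "input": message,                   # only the new user turn
        }
        if self.last_response_id is not None:
            kwargs["previous_response_id"] = self.last_response_id  # link to the stored prior turn
        response = self.client.responses.create(**kwargs)
        self.last_response_id = response.id
        return response.output_text
```

Usage: `owl = ChainedChat(client, instructions)`, then `owl.send(input)` followed by `owl.send("Please highlight the most important point")`.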

## Chatbot
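We use Gradio's `ChatInterface` to build a chatbot UI while keeping full control of the workflow: our own `respond` function decides exactly what gets sent to the model.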

In [24]:
instructions = "You are Professor Owl, a wise but friendly teacher of Business Analytics. Explain concepts clearly and simply, using gentle guidance."

def respond(message, history):
    # Gradio passes the new message plus prior turns as {"role": ..., "content": ...} dicts
    messages = [{"role": "developer", "content": instructions}]
    # Copy only role/content; Gradio may attach extra keys (e.g. metadata) to each turn
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})

    response = client.responses.create(
        model=MODEL,
        input=messages,
        text={"verbosity": "low"}
    )
    return response.output_text

demo = gr.ChatInterface(
    respond,
    type="messages",  # history arrives as role/content dicts
    title="🦉 Professor Owl – Business Analytics Helper",
    description="Ask Professor Owl anything about data analytics!"
)

demo.launch(share=True)  # Add debug=True to debug, if needed

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f9dc499acb65c87bc3.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
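`respond` waits for the full reply before Gradio shows anything. Two features make streaming possible:
- `gr.ChatInterface` also accepts a generator function and updates the chat bubble on every `yield`.
- `client.responses.create(..., stream=True)` returns events; those of type `response.output_text.delta` carry new text in `event.delta`.

The sketch below combines the two. `make_streaming_respond` is a hypothetical factory that returns a genuine generator function with the `(message, history)` signature, so Gradio can recognize it as one.

```python
def make_streaming_respond(client, instructions, model="gpt-5-nano"):
    """Return a generator function with the (message, history) signature gr.ChatInterface expects."""

    def respond(message, history):
        # Same message assembly as the non-streaming respond()
        messages = [{"role": "developer", "content": instructions}]
        messages.extend({"role": m["role"], "content": m["content"]} for m in history)
        messages.append({"role": "user", "content": message})

        stream = client.responses.create(
            model=model,
            input=messages,
            text={"verbosity": "low"},
            stream=True,  # yields server-sent events instead of one final Response
        )
        partial = ""
        for event in stream:
            # Only text-delta events carry new reply text; skip lifecycle events
            if event.type == "response.output_text.delta":
                partial += event.delta
                yield partial  # Gradio re-renders the bubble with the text so far

    return respond
```

Usage: `demo = gr.ChatInterface(make_streaming_respond(client, instructions), type="messages")`, then `demo.launch(share=True)` as before.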




## Your Turn
Plug in your own scenario: rephrase the instructions to shift the assistant's tone and guidelines.



In [None]:
# Your code goes here