# GPT Chat Completion Lab

Welcome! In this mini-lab we will build a playful yet practical chat assistant using the GPT-5 models. The goal is to make the workflow clear enough for beginners while giving you a template you can adapt to your own use cases.

Objectives:
- Build a basic GPT-powered chat assistant  
- Adjust assistant behavior using system prompts  
- Build a simple Gradio UI

## Game Plan
- **Context:** We are using Google Colab, so everything happens in the cloud.
- **Model:** `gpt-5-nano` keeps responses smart while staying cost-efficient.
- **Secret management:** We read the API key from the Colab secret named `OpenAI_API_Key`.
- **Flow:** install the SDK → load the key securely → define a helper function → experiment with prompts.
- **Stretch idea:** tweak the conversation style and system prompt with your own ideas.


In [1]:
from google.colab import userdata
import os
from openai import OpenAI
import gradio as gr
from IPython.display import Markdown, display

MODEL = "gpt-5-nano"

## Load Secrets (No Hard-Coding!)
Colab keeps keys in its Secrets vault, which the `userdata` module reads. Make sure your workspace already stores `OpenAI_API_Key`; otherwise add it once through the Secrets panel (the key icon in the left sidebar) and grant this notebook access. Never share the value.


In [2]:
os.environ['OPENAI_API_KEY'] = userdata.get('OpenAI_API_Key')
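
Before making any API calls, it can help to confirm that the key actually landed in the environment, without ever printing it in full. The following is a minimal sketch; `require_env` and `mask` are hypothetical helper names (not part of Colab or the OpenAI SDK), and the demo value is fake.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your Colab secrets first.")
    return value


def mask(secret: str, visible: int = 4) -> str:
    """Show only the last few characters so the key never appears in output."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Demo with a made-up value (an assumption for illustration, not a real key).
os.environ.setdefault("DEMO_API_KEY", "sk-demo-1234567890")
print(mask(require_env("DEMO_API_KEY")))
```

In the notebook you would call `require_env("OPENAI_API_KEY")` right after the cell above, so a missing secret fails early with a readable message.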

## Wrap the GPT Client
We use the official `openai` package. The cell below:
1. Initializes a single `OpenAI` client, which reads `OPENAI_API_KEY` from the environment.
2. Sends one prompt through the Responses API.
3. Displays the raw response object, which holds the model reply plus token usage so we can discuss cost control.


In [10]:
client = OpenAI()

response = client.responses.create(
    model=MODEL,
    input="Write a one-sentence bedtime story about a unicorn."
)

response

Response(id='resp_038efadbbb25fe8600691cd1dcee20819db8f5a0913d6c04f0', created_at=1763496412.0, error=None, incomplete_details=None, instructions=None, metadata={}, model='gpt-5-nano-2025-08-07', object='response', output=[ResponseReasoningItem(id='rs_038efadbbb25fe8600691cd1dd2a30819d8382e67521b033a1', summary=[], type='reasoning', content=None, encrypted_content=None, status=None), ResponseOutputMessage(id='msg_038efadbbb25fe8600691cd1dfd360819d9ad1e3510fd05d5d', content=[ResponseOutputText(annotations=[], text='Under a sleepy moon, a gentle unicorn trotted through a starlit meadow and curled up in the whispering grass, finally drifting off to dreamland.', type='output_text', logprobs=[])], role='assistant', status='completed', type='message')], parallel_tool_calls=True, temperature=1.0, tool_choice='auto', tools=[], top_p=1.0, background=False, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=Rea

In [12]:
response.usage.output_tokens

486
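
Token counts are what drive cost. On reasoning models such as this one, the output count generally includes the model's hidden reasoning tokens, which is likely why a one-sentence story reports 486. The sketch below turns a usage object into a rough dollar estimate. `estimate_cost` is a hypothetical helper, the `input_tokens` value is made up, and the prices are placeholders, so check the current pricing page before relying on any number.

```python
from types import SimpleNamespace


def estimate_cost(usage, usd_per_million_input: float, usd_per_million_output: float) -> float:
    """Estimate the dollar cost of one call from its token counts."""
    input_cost = usage.input_tokens * usd_per_million_input / 1_000_000
    output_cost = usage.output_tokens * usd_per_million_output / 1_000_000
    return input_cost + output_cost


# Stand-in for response.usage: 486 comes from the cell above,
# input_tokens=20 and both prices are placeholder assumptions.
fake_usage = SimpleNamespace(input_tokens=20, output_tokens=486)
cost = estimate_cost(fake_usage, usd_per_million_input=1.0, usd_per_million_output=2.0)
print(f"Estimated cost: ${cost:.6f}")
```

In the notebook, pass `response.usage` instead of `fake_usage`.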

Let's extract only the reply text:

In [13]:
print(response.output_text)

Under a sleepy moon, a gentle unicorn trotted through a starlit meadow and curled up in the whispering grass, finally drifting off to dreamland.
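
The calls above can be wrapped into a small reusable helper that takes a system message and a list of user turns and returns the reply together with the token usage. This is one possible sketch, not the notebook's own implementation: `ask` is a hypothetical name, and `FakeClient` is an offline stand-in that mimics only the attributes the helper touches, so the example runs without an API key.

```python
from types import SimpleNamespace


def ask(client, model: str, instructions: str, user_turns: list[str]):
    """Send a system message plus user turns; return (reply_text, usage)."""
    messages = [{"role": "user", "content": turn} for turn in user_turns]
    response = client.responses.create(
        model=model,
        instructions=instructions,
        input=messages,
    )
    return response.output_text, response.usage


class FakeClient:
    """Offline stand-in that mimics only what ask() uses."""

    def __init__(self):
        self.responses = self  # so client.responses.create(...) resolves here
        self.last_request = None

    def create(self, **kwargs):
        self.last_request = kwargs  # remember what would have been sent
        return SimpleNamespace(
            output_text="(stubbed reply)",
            usage=SimpleNamespace(input_tokens=0, output_tokens=0),
        )


reply, usage = ask(FakeClient(), "gpt-5-nano", "Be brief.", ["Tell me a story."])
print(reply, usage.output_tokens)
```

In the notebook you would pass the real `client` and `MODEL` instead of the stub.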


## System Instructions
The `instructions` parameter was formerly known as the system (or developer) prompt. It sets high-level guidance for how the model should behave (its tone, goals, and style), while message roles give more specific, task-level directions.


<img src="https://raw.githubusercontent.com/soltaniehha/Business-Analytics-Toolbox/master/docs/images/Prof-Owl-1.png"
     width="300">


In [16]:
instructions = "You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal."
# Note: naming this variable `input` shadows Python's built-in input() for the rest of the notebook
input = "why do data analysts prefer Python or SQL instead of Excel for big datasets?"

response = client.responses.create(
    model=MODEL,
    instructions=instructions,   # Formerly known as system prompt
    input=input,                 # User prompt
    text={"verbosity": "low"}    # "low": short, concise outputs; "high": detailed explanations or big refactors
)

Markdown(response.output_text)

Great question. For big datasets, Python or SQL beat Excel for several practical reasons:

- Scale and speed
  - Databases (SQL) are built for huge data and fast queries with indexing, partitioning, and optimized storage.
  - Python (with pandas, or tools like Dask/PySpark) can handle larger-than-Excel workloads and can be distributed or chunked.

- Reproducibility and automation
  - Code-based workflows can be saved, version-controlled, and rerun with new data.
  - Excel files are manual and easy to break with small changes; formulas and steps aren’t as transparent or auditable.

- Data integrity and governance
  - SQL enforces data types and constraints; centralized, single source of truth is easier to maintain.
  - Excel copies data across files, increasing drift and mistakes.

- Complex data operations
  - SQL shines at joins, aggregations, and filtering across many tables.
  - Python offers advanced analytics, machine learning, and complex transformations with clear code.

- Collaboration
  - Databases and code-based workflows are better for teams (shared access, versioning, reviews).
  - Excel files can collide when multiple people edit them.

- Ecosystem and integration
  - Python has rich libraries (pandas, numpy, scikit-learn) and dashboards/tools that integrate with data stores.
  - SQL integrates with data warehouses and BI tools natively.

When Excel is fine: for small, quick, ad-hoc analyses, or when you need a simple stakeholder-friendly sheet.

If you’re starting: learn SQL basics, then Python for more advanced analysis and automation.

## Chat History

In [17]:
# Keep history
history = [{"role": "developer", "content": instructions}]

def chat(message):
    history.append({"role": "user", "content": message})  # Add the new user message to history

    # Send entire history to the model
    response = client.responses.create(
        model=MODEL,
        input=history,
        text={ "verbosity": "low" }
    )

    # Add model response to history
    history.append({"role": "assistant", "content": response.output_text})

    return response.output_text

In [18]:
Markdown(chat(input))

Short answer: Excel isn’t built to handle big data the way Python or SQL are. Here’s why:

- Scale and speed
  - Excel has row/memory limits and can slow to a crawl with large files.
  - SQL databases and Python (with libraries like pandas, Dask) can process terabytes of data, using indexing, parallelism, and optimized engines.

- Data manipulation power
  - SQL shines at joins, aggregates, window functions, and set-based operations on large tables.
  - Python lets you do complex cleaning, modeling, and custom logic, and you can chain steps into repeatable scripts.

- Reproducibility and automation
  - SQL and Python scripts can be version-controlled, tested, and run in automated pipelines.
  - Excel files are hard to track changes in, prone to manual errors, and harder to reproduce exactly.

- Collaboration and governance
  - Databases provide centralized data, access controls, and audit trails.
  - Excel files can get out of sync when shared.

- Ecosystem and integration
  - SQL connects directly to data warehouses and BI tools; Python can orchestrate workflows, pull data from APIs, and build models.
  - Excel is great for quick ad-hoc checks, but not as a backbone for big data workflows.

When Excel still makes sense: for small datasets, quick ad-hoc analysis, or a simple scratchpad.

Tip: For big datasets, start with SQL to fetch and summarize data, then use Python for deeper analysis or modeling.

In [19]:
chat("Please highlight the most important point")

'Key point: Excel isn’t built for big data. For large datasets, use SQL (for scalable querying/aggregation) and Python (for deeper analysis), because they handle volume, speed, and reproducibility much better.'

In [20]:
history

[{'role': 'developer',
  'content': 'You are Professor Owl, a wise but approachable teacher. Give clear, simple explanations and gently guide students without sounding formal.'},
 {'role': 'user',
  'content': 'why do data analysts prefer Python or SQL instead of Excel for big datasets?'},
 {'role': 'assistant',
  'content': 'Short answer: Excel isn’t built to handle big data the way Python or SQL are. Here’s why:\n\n- Scale and speed\n  - Excel has row/memory limits and can slow to a crawl with large files.\n  - SQL databases and Python (with libraries like pandas, Dask) can process terabytes of data, using indexing, parallelism, and optimized engines.\n\n- Data manipulation power\n  - SQL shines at joins, aggregates, window functions, and set-based operations on large tables.\n  - Python lets you do complex cleaning, modeling, and custom logic, and you can chain steps into repeatable scripts.\n\n- Reproducibility and automation\n  - SQL and Python scripts can be version-controlle
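
Because the whole history is resent on every call, input tokens (and cost) grow with each turn. One common mitigation is to keep the developer message and only the most recent messages. The sketch below shows that idea; `trim_history` is a hypothetical helper that the notebook itself does not use, and the demo history is made up.

```python
def trim_history(history: list[dict], max_messages: int = 6) -> list[dict]:
    """Keep developer/system messages plus the most recent conversation messages."""
    pinned = [m for m in history if m["role"] in ("developer", "system")]
    rest = [m for m in history if m["role"] not in ("developer", "system")]
    if max_messages <= 0:
        return pinned
    return pinned + rest[-max_messages:]


# Made-up conversation for illustration.
demo_history = [
    {"role": "developer", "content": "You are Professor Owl."},
    {"role": "user", "content": "question 1"},
    {"role": "assistant", "content": "answer 1"},
    {"role": "user", "content": "question 2"},
    {"role": "assistant", "content": "answer 2"},
]

trimmed = trim_history(demo_history, max_messages=2)
print([m["content"] for m in trimmed])
```

Using an even `max_messages` keeps user/assistant pairs together. The trade-off is that the model forgets anything said in the dropped turns.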

In [21]:
chat('hi')

'Hi there! What would you like to do today? Want a quick SQL or Python example, or tips for setting up a big-data workflow? Tell me your dataset size and goal, and I’ll tailor a tiny demo.'

In [22]:
chat('comparing python sql excel r')

'Nice quick comparison, friend. Here’s a simple compass for Python, SQL, Excel, and R.\n\n- Excel\n  - Best for: small datasets, quick checks, ad-hoc analysis, dashboards.\n  - Pros: easy UI, fast for tiny tasks, widely familiar.\n  - Cons: memory/row limits, not scalable, hard to reproduce/collaborate.\n\n- SQL\n  - Best for: querying and shaping data stored in databases/data warehouses.\n  - Pros: handles large volumes, fast, reliable, good for reproducible pipelines and governance.\n  - Cons: not ideal for advanced statistics or complex modeling out of the box.\n\n- Python\n  - Best for: end-to-end analysis, data cleaning, modeling, automation, production pipelines.\n  - Pros: huge library ecosystem (pandas, scikit-learn, etc.), versatile, easy to connect to many data sources.\n  - Cons: learning curve, performance depends on approach and hardware.\n\n- R\n  - Best for: statistics, specialized analytics, and rich visuals.\n  - Pros: powerful stats packages, tidyverse/dplyr, excell

## Chatbot
We use `Gradio` to build a chatbot whose workflow we control.

In [23]:
instructions = "You are Professor Owl, a wise but friendly teacher of Business Analytics. Explain concepts clearly and simply, using gentle guidance."

def respond(message, history):
    messages = [{"role": "developer", "content": instructions}]
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})


    response = client.responses.create(
        model=MODEL,
        input=messages,
        text={"verbosity": "low"}
    )
    return response.output_text

demo = gr.ChatInterface(
    respond,
    type="messages",
    title="🦉 Professor Owl – Business Analytics Helper",
    description="Ask Professor Owl anything about data analytics!"
)

demo.launch(share=True)  # Add debug=True to debug, if needed

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://ea757ea1ef3651199b.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
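
The message-building step inside `respond` can be tried offline, without launching the UI or calling the API. The sketch below isolates it in a hypothetical `build_messages` function. The sample history is made up, including its `metadata` key, which stands in for any extra fields a chat UI might attach; copying only `role` and `content` keeps such fields out of the request.

```python
def build_messages(instructions: str, history: list[dict], message: str) -> list[dict]:
    """Assemble the model input: developer message, prior turns, new user message."""
    messages = [{"role": "developer", "content": instructions}]
    # Copy only role and content so any extra keys in the history are dropped.
    messages.extend({"role": m["role"], "content": m["content"]} for m in history)
    messages.append({"role": "user", "content": message})
    return messages


# Made-up history for illustration; the extra key is an assumption.
sample_history = [
    {"role": "user", "content": "hi", "metadata": None},
    {"role": "assistant", "content": "Hello!", "metadata": None},
]

built = build_messages("You are Professor Owl.", sample_history, "What is SQL?")
for m in built:
    print(m["role"], "->", m["content"])
```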




## Your Turn
Plug in your own scenario: rephrase the instructions to shift the assistant's tone or guidelines.
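
As a starting point, here is one possible way to switch tones without retyping the whole prompt. Everything in it is illustrative: `BASE`, `TONES`, and `make_instructions` are hypothetical names, and the tone descriptions are only suggestions to replace with your own.

```python
BASE = "You are Professor Owl, a wise but friendly teacher of Business Analytics."

# Illustrative tone variants; edit or extend these freely.
TONES = {
    "gentle": "Explain concepts clearly and simply, using gentle guidance.",
    "socratic": "Answer mostly with guiding questions so students reach the idea themselves.",
    "concise": "Answer in at most three short bullet points.",
}


def make_instructions(tone: str) -> str:
    """Combine the base persona with one tone guideline."""
    if tone not in TONES:
        raise ValueError(f"Unknown tone {tone!r}; choose from {sorted(TONES)}")
    return f"{BASE} {TONES[tone]}"


instructions = make_instructions("socratic")
print(instructions)
```

Because `respond` reads the global `instructions` each time it is called, reassigning it like this changes the chatbot's behavior on the next message.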



In [None]:
# Your code goes here