# Chapter 1 — Environment Setup and Notebooks

**IMPORTANT: This chapter uses the book-wide shared environment setup. Please follow the README.md in the root directory.**

## Prerequisites

Before running this notebook, complete the book-wide setup from the repository root:

**macOS/Linux:**
```bash
bash setup/setup_mac.sh
```

**Windows (PowerShell):**
```powershell
powershell -ExecutionPolicy Bypass -File setup/setup_windows.ps1
```

This creates:
- Shared environment: `data_strategy_env/` (Python 3.12)
- Jupyter kernel: **"Python (Data Strategy Book)"**
- API keys: Automatically configured during setup

#### Using This Notebook

1. **Select the correct kernel**: **"Python (Data Strategy Book)"**

The setup script registers the environment as a Jupyter kernel named **"Python (Data Strategy Book)"**.
- Open the Command Palette (Mac: Cmd+Shift+P; Windows: Ctrl+Shift+P).
- Run `Developer: Reload Window`. Alternatively, press Cmd+P (Mac) or Ctrl+P (Windows) and type `>Developer: Reload Window`.

![reload_window](../images/reload_window.png)

- After reload, click Select Kernel (top-right)

![select_kernel](../images/select_kernel.png)

- Choose Jupyter Kernel

![jupyter_kernel](../images/jupyter_kernel.png)

- Choose `Python (Data Strategy Book)`

![select_python_data](../images/select_python_data.png)

- Run ALL cells:

![run_all_cells](../images/run_all.png)

- If you did not add the API key to the `.env` file or enter it during setup, a pop-up will ask for your OpenAI API key

![openai_api_key](../images/api_key.png)

We already explained how to get an OpenAI API key in the root README.


2. **If kernel not visible**: Command Palette → "Developer: Reload Window"
   - Mac: Cmd+Shift+P (Windows: Ctrl+Shift+P)
   - Type: "Developer: Reload Window"
3. **Restart kernel** if you just completed setup

The setup script handles all dependencies and API key configuration automatically.
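
As a quick sanity check after selecting the kernel, a cell like the one below can confirm which interpreter the notebook is actually running on. This is a minimal sketch: the function name `describe_interpreter` is illustrative, and it assumes the interpreter path contains the `data_strategy_env` directory name listed above.

```python
import sys

def describe_interpreter(env_name="data_strategy_env"):
    """Report the running interpreter and whether it appears to live in the shared environment."""
    executable = sys.executable or ""
    return {
        "executable": executable,
        "version": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Assumption: the environment directory name appears in the interpreter path.
        "in_shared_env": env_name in executable,
    }

info = describe_interpreter()
print(info)
```

If `in_shared_env` is `False`, the notebook is probably attached to a different kernel, and re-selecting **"Python (Data Strategy Book)"** is the first thing to try.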

## OpenAI API Setup


For this book, I'm using OpenAI's API as our primary LLM provider. There are other excellent options, such as Anthropic's Claude, Google's Gemini, or local models with Ollama, but OpenAI's API is reliable, well documented, and widely used in the industry. I chose it for its predictable service quality, broad model selection, and because it matches what you are likely to encounter in production environments. You can adapt the code to work with any other API of your choice. LLM calls are necessary for the examples but are not the focus of this book; the focus is the data we feed to the LLM.

Here's the step-by-step setup process:

**Step 1: Create Your OpenAI Account**

When you go to https://platform.openai.com, you will see the following screen, where you can sign in or sign up. If you already have an account, sign in. If you don't have an account, go to https://auth.openai.com/create-account and sign up. You'll need to provide a phone number for verification.

![OpenAI Platform Homepage](../images/OpenAI_Platform_Home_Page.png)
**Figure 1.5: OpenAI Platform homepage - the industry standard for LLM APIs**

**Step 2: Complete Account Verification**

You can sign up with Google, Microsoft, or email. OpenAI requires phone verification for security. I recommend using your primary development account for consistency.

![OpenAI Sign Up](../images/OpenAI_Signup_page.png)

**Figure 1.6: OpenAI registration - phone verification required for account security**

**Step 3: Add Billing Information**

Unlike free-tier services, OpenAI requires a payment method, but you only pay for what you use. The pricing is reasonable: typically $0.002 per 1K tokens for GPT-4.1. For this book's examples, expect to spend less than $5 in total.

**Important**: **You must add money to your credit balance to run the examples in this book. If you do not add credit, you will receive an error when you call the APIs.** Add credit at:
https://platform.openai.com/settings/organization/billing/overview

![OpenAI Billing](../images/OpenAI_Billing.png)

**Figure 1.7: Billing setup - pay-per-use model with transparent pricing**
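
To make the pay-per-use arithmetic concrete, here is a rough estimate based on the $0.002 per 1K tokens figure quoted above. It is a simplified sketch: the helper names are illustrative, and a single flat rate is an assumption, since real pricing differs between input and output tokens and changes over time.

```python
def estimate_cost_usd(total_tokens, price_per_1k_tokens=0.002):
    """Estimate spend for a token count at a flat per-1K-token price (a simplification)."""
    return total_tokens / 1000 * price_per_1k_tokens

def tokens_for_budget(budget_usd, price_per_1k_tokens=0.002):
    """Estimate how many tokens a budget buys at the same flat price."""
    return budget_usd / price_per_1k_tokens * 1000

print(f"500,000 tokens cost about ${estimate_cost_usd(500_000):.2f}")
print(f"$5 buys about {tokens_for_budget(5.0):,.0f} tokens")
```

At this rate, the $5 budget mentioned above covers roughly 2.5 million tokens.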

**Step 4: Navigate to API Keys**

Once your account is set up, go to https://platform.openai.com/api-keys to manage your API keys.

![OpenAI API Keys](../images/OpenAI_API_Keys.png)

**Figure 1.8: API Keys section in your OpenAI dashboard**

**Step 5: Create Your API Key**

Click "Create new secret key" and give it a descriptive name like "Book Examples" or "Development Testing". 

![Create API Key](../images/create_api_key.png)


**Figure 1.9: Creating a new API key**

**Step 6: Copy Your API Key**

Your API key will start with "sk-". Copy the entire string and paste it into the pop-up window in Colab.

**Important**: You can only view this key once, so save it immediately and store it securely.
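
One habit that helps keep the key secure is to avoid printing it in full when checking that it is set. The helper below is a small sketch of that idea; `mask_key` is a hypothetical name, and it assumes the key is stored in the `OPENAI_API_KEY` environment variable, as the notebook's setup cells do.

```python
import os

def mask_key(key, visible=4):
    """Return a display-safe version of a secret, keeping only the prefix and last few characters."""
    if not key:
        return "<not set>"
    # Very short values are hidden completely so the prefix and suffix cannot overlap.
    if len(key) <= visible + 3:
        return "*" * len(key)
    return f"{key[:3]}...{key[-visible:]}"

print(mask_key(os.getenv("OPENAI_API_KEY")))
```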

## Option 1: Google Colab (Recommended for Beginners)


If you're new to Python or want to start immediately without setup hassles, Google Colab is perfect. It requires zero installation, provides a fresh environment every time, and lets you focus on learning AI concepts rather than wrestling with environment configuration.

**Getting Started with Colab:**

1. **Google Account**: You need a Google account to access Google Colab. If you don't have one, you can create it for free at https://accounts.google.com/signup.

2. **Accessing Google Colab**: Open a web browser and go to https://colab.research.google.com/. You'll be prompted to sign in with your Google account.

![Colab Login](../images/colab_sign_in.png)  
**Figure 1.1: Google Colab Sign-in Page** 

3. **Create a New Notebook**: After signing in, click on the "New Notebook" button to create a new Colab notebook.

![Colab New Notebook](../images/colab_new_notebook.png)

**Figure 1.2: Google Colab New Notebook** 

**Note:** If you are new to Colab, you can read the "Welcome to Colab" guide to get started.

You will have a screen similar to the one below:

![Google Colab Interface](../images/colab_interface.png)

**Figure 1.3: Google Colab interface showing a new notebook**

The GitHub repository contains a Jupyter notebook named `Chapter_1_Setup_Advanced.ipynb` with the code used in this chapter.

1. Download the notebook from the GitHub repository (or clone the repository).

2. Upload the notebook to your Colab environment and run it to follow along with the code examples in this chapter. This is the easiest way to get started if you have no previous experience or do not want to set up a local environment.

![Colab Upload Notebook](../images/Colab_Upload.png)

**Figure 1.4: Google Colab Upload Notebook**

- Run ALL cells:

![run_all_cells](../images/run_all.png)

- You will receive a pop-up to enter your OpenAI API key

![openai_api_key](../images/api_key.png)

We already explained how to get an OpenAI API key in the first cell of the notebook.






## Option 2: Automated Local Setup (Recommended for advanced users)


Follow these steps before running any cells:

- macOS/Linux
  1) Open Terminal
  2) cd to this repository root (Data-Strategy-for-LLMs)
  3) Run: `bash setup/setup_mac.sh`

- Windows (PowerShell)
  1) Open PowerShell (Run as Administrator if first-time installs)
  2) cd to this repository root (Data-Strategy-for-LLMs)
  3) Run: `powershell -ExecutionPolicy Bypass -File setup/setup_windows.ps1`

- Google Colab
  1) Just run the first code cell; it will handle basics for Colab if needed
  2) You can mount Drive and set paths as you prefer
  3) No virtual environment is required in Colab; dependencies install via pip cells as needed

Environment selection:
- Open the Command Palette (Mac: Cmd+Shift+P; Windows: Ctrl+Shift+P).
- Run `Developer: Reload Window`. Alternatively, press Cmd+P (Mac) or Ctrl+P (Windows) and type `>Developer: Reload Window`.

![reload_window](../images/reload_window.png)

- After reload, click Select Kernel (top-right)

![select_kernel](../images/select_kernel.png)

- Choose Jupyter Kernel

![jupyter_kernel](../images/jupyter_kernel.png)

- Choose `Python (Data Strategy Book)`

![chapter_1_env](../images/chapter_1_env.png)

- Run ALL cells:

![run_all_cells](../images/run_all.png)

- You will receive a pop-up to enter your OpenAI API key

![openai_api_key](../images/api_key.png)

We already explained how to get an OpenAI API key in the first cell of the notebook.




### Jupyter Kernel Setup Fix

**If you're seeing an error like "Running cells with 'Python X.X.X' requires the ipykernel package", this cell will fix it!**

This is a common issue, especially on:
- Fresh Python installations
- Homebrew-managed Python environments on macOS
- Systems with multiple Python versions

**Run the cell below to automatically detect your Python environment and install the correct kernel.**

In [1]:
import sys
import subprocess
import os

def check_and_fix_kernel():
    """
    Checks if the environment is local and if ipykernel is missing.
    If both conditions are true, it attempts to install the kernel.
    """
    # Step 1: Detect if running in Google Colab
    if 'google.colab' in sys.modules:
        print(" Running in Google Colab. No kernel fix needed.")
        return

    # Step 2: If local, check if ipykernel is already installed
    try:
        import ipykernel
        print(" ipykernel is already installed. No fix needed.")
        return
    except ImportError:
        print(" ipykernel not found. Attempting installation...")

    # Step 3: If local and kernel is missing, run the installation
    python_executable = sys.executable
    python_version = f"{sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}"
    
    print(f"DETECTED Python: {python_executable}")
    print(f"PYTHON VERSION: {python_version}")
    
    # Method 1: Try standard installation
    try:
        subprocess.run(
            [python_executable, '-m', 'pip', 'install', 'ipykernel', '-U', '--user', '--force-reinstall'],
            capture_output=True, text=True, check=True
        )
        print("SUCCESS: Successfully installed ipykernel (Method 1)")
        method_used = 1
    except subprocess.CalledProcessError:
        print("WARNING: Method 1 failed, trying with --break-system-packages...")
        # Method 2: Try with --break-system-packages
        try:
            subprocess.run(
                [python_executable, '-m', 'pip', 'install', 'ipykernel', '-U', '--user', '--force-reinstall', '--break-system-packages'],
                capture_output=True, text=True, check=True
            )
            print("SUCCESS: Successfully installed ipykernel (Method 2 - with system override)")
            method_used = 2
        except subprocess.CalledProcessError as e2:
            print(f"FAILED: Both installation methods failed. Error: {e2.stderr}")
            print("\nConsider creating a virtual environment manually.")
            return

    # Install kernel spec for the current Python
    try:
        kernel_name = f"python{sys.version_info.major}{sys.version_info.minor}"
        display_name = f"Python {python_version}"
        
        subprocess.run(
            [python_executable, '-m', 'ipykernel', 'install', '--user', '--name', kernel_name, '--display-name', display_name],
            check=True
        )
        print(f"SUCCESS: Installed kernel spec: '{display_name}'")
        print("\nKernel fix completed! Please RESTART your Jupyter server and select the new kernel.")
    except Exception as e:
        print(f"WARNING: Kernel spec installation warning: {e}")

# Run the check and fix function
check_and_fix_kernel()

 ipykernel is already installed. No fix needed.


#### What This Fix Does

The cell above automatically handles the most common kernel installation scenarios:

**Method 1 - Standard Installation:**
- Tries the standard `pip install ipykernel` approach
- Works for most regular Python installations

**Method 2 - System Override (Homebrew/Externally Managed):**
- Uses `--break-system-packages` flag for Homebrew Python
- Handles "externally-managed-environment" errors
- Essential for macOS Homebrew Python environments

**If both methods fail:**
- The cell prints the error reported by pip and suggests creating a virtual environment manually
- It does not create a virtual environment or register an "AI Notebook Python" kernel on its own

**After running the fix:**
- Your Jupyter interface should show available kernels
- Select the one that matches your Python version
- All notebook cells should run without kernel errors

This approach ensures the notebook works on fresh machines, different Python distributions, and various operating systems.
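
A virtual-environment fallback boils down to three commands: create the environment, install `ipykernel` into it, and register it as a kernel. The sketch below only builds those commands so you can inspect them before running anything. The function name, the `ai-notebook` kernel name, and the `.ai_notebook_env` directory are illustrative assumptions; the "AI Notebook Python" display name is taken from the description above.

```python
import sys
from pathlib import Path

def fallback_commands(env_dir, kernel_name="ai-notebook", display_name="AI Notebook Python"):
    """Build (but do not run) the commands for a virtual-environment kernel fallback."""
    env_dir = Path(env_dir)
    on_windows = sys.platform == "win32"
    # venv places the interpreter in Scripts/ on Windows and bin/ elsewhere.
    env_python = env_dir / ("Scripts" if on_windows else "bin") / ("python.exe" if on_windows else "python")
    return [
        [sys.executable, "-m", "venv", str(env_dir)],
        [str(env_python), "-m", "pip", "install", "ipykernel"],
        [str(env_python), "-m", "ipykernel", "install", "--user",
         "--name", kernel_name, "--display-name", display_name],
    ]

commands = fallback_commands(Path.home() / ".ai_notebook_env")
for cmd in commands:
    print(" ".join(cmd))
```

To execute them, each list can be passed to `subprocess.run(cmd, check=True)` in order; the install step needs network access.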

## Complete Future-Proof OpenAI Setup
### Comprehensive Error Handling & API Evolution Adaptation

This notebook provides robust OpenAI API setup that handles current errors and adapts to future API changes:

- **Error Handling:** Billing, authentication, model deprecation, rate limits, network issues
- **Future-Proofing:** SDK version compatibility, adaptive response parsing, flexible error patterns
- **Cross-Platform:** Local Jupyter, Google Colab, Python 3.8+

#### API Key Setup

Before we dive into the architecture, let's set up our environment to work with OpenAI. For this book, I'm using OpenAI as our primary LLM provider. It's not the only option: you could use Anthropic's Claude or local models with Ollama. I chose OpenAI for its ease of use and its range of models behind a single API. It is not free; as described in Step 3 above, API calls are billed against prepaid credit.

In [2]:
# Smart Environment Setup
import sys, os, subprocess, importlib.util

IN_COLAB = 'google.colab' in sys.modules
print(f"Environment: {'Google Colab' if IN_COLAB else 'Local Jupyter'}")

def smart_install(package, min_version=None):
    """Install packages with multiple fallback strategies"""
    package_spec = f"{package}>={min_version}" if min_version else package
    strategies = [
        [sys.executable, '-m', 'pip', 'install', package_spec, '--quiet'],
        [sys.executable, '-m', 'pip', 'install', package_spec, '--user', '--quiet'],
        [sys.executable, '-m', 'pip', 'install', package_spec, '--break-system-packages', '--quiet']
    ]
    
    for cmd in strategies:
        try:
            subprocess.run(cmd, capture_output=True, check=True)
            print(f"SUCCESS: {package}")
            return True
        except subprocess.CalledProcessError:
            continue
    print(f"FAILED: {package}")
    return False

# Install required packages
packages = {'openai': '1.0.0', 'python-dotenv': None, 'packaging': None}
for pkg, ver in packages.items():
    smart_install(pkg, ver)

Environment: Local Jupyter
SUCCESS: openai
SUCCESS: python-dotenv
SUCCESS: packaging
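
The install cell imports `importlib.util` but always calls pip, even when a package is already present. If you want to skip unnecessary installs, a check like the following can run first. This is a sketch under one assumption worth noting: the pip package name and the importable module name can differ (for example `python-dotenv` installs the module `dotenv`), so the mapping below is written out by hand.

```python
import importlib.util

def is_importable(module_name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None

# pip package name -> importable module name
modules = {"openai": "openai", "python-dotenv": "dotenv", "packaging": "packaging"}
status = {pip_name: is_importable(module) for pip_name, module in modules.items()}
print(status)
```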


In [3]:
# Import modules with graceful fallbacks
import os, re, time, json, getpass
from typing import Optional, List, Dict, Tuple

try:
    from dotenv import load_dotenv
    DOTENV_AVAILABLE = True
except ImportError:
    DOTENV_AVAILABLE = False
    def load_dotenv(*args, **kwargs):
        """No-op stand-in that accepts the arguments later cells pass."""
        return False

try:
    from packaging import version
    VERSION_CHECK = True
except ImportError:
    VERSION_CHECK = False

print("Modules imported successfully!")

Modules imported successfully!


In [4]:
# Future-Proof API Key Validator
class APIKeyValidator:
    def __init__(self):
        self.patterns = [
            r'^sk-[A-Za-z0-9]{20,}$',
            r'^sk-proj-[A-Za-z0-9\-_]{20,}$',
            r'^sk-[A-Za-z0-9\-_]{40,}$'
        ]
        self.invalid_keys = {
            'your_api_key_here', 'sk-your-key-here', 'sk-...', 'sk-xxxxxxxx',
            'sk-placeholder', 'sk-example', 'sk-demo', 'sk-test'
        }
    
    def validate(self, key: str) -> Tuple[bool, str]:
        if not key or not isinstance(key, str):
            return False, "API key is empty"
        
        key = key.strip()
        
        if key.lower() in [k.lower() for k in self.invalid_keys]:
            return False, "API key appears to be a placeholder"
        
        if not key.startswith('sk-'):
            return False, "API keys should start with 'sk-'"
        
        if len(key) < 30:
            return False, "API key is too short"
        
        for pattern in self.patterns:
            if re.match(pattern, key):
                return True, "Valid API key format"
        
        # Heuristic check for unknown formats
        if self._heuristic_check(key):
            return True, "Format not recognized but appears valid"
        
        return False, "Invalid format"
    
    def _heuristic_check(self, key: str) -> bool:
        remaining = key[3:]  # Remove 'sk-'
        alphanumeric = sum(1 for c in remaining if c.isalnum())
        unique_chars = len(set(remaining.lower()))
        return alphanumeric >= len(remaining) * 0.8 and unique_chars >= 8

validator = APIKeyValidator()
print("API key validator ready")

API key validator ready


In [5]:
# Enhanced OpenAI API key setup with Google Drive support for Colab
import os
import sys
from pathlib import Path
from getpass import getpass
from dotenv import load_dotenv

def is_valid_openai_key(key: str) -> bool:
    if not key or not isinstance(key, str):
        return False
    key = key.strip()
    placeholders = {'your_api_key_here','sk-your-key-here','sk-...','sk-xxxxxxxx'}
    if key.lower() in placeholders:
        return False
    if not key.startswith('sk-'):
        return False
    return len(key) >= 40

def mount_google_drive():
    """Mount Google Drive in Colab if not already mounted"""
    try:
        from google.colab import drive
        drive_path = Path('/content/drive')
        if not drive_path.exists():
            print("Mounting Google Drive...")
            drive.mount('/content/drive')
            print("Google Drive mounted successfully!")
        return True
    except ImportError:
        return False
    except Exception as e:
        print(f"Failed to mount Google Drive: {e}")
        return False

def get_drive_env_path():
    """Get the Google Drive path for storing .env file"""
    drive_root = Path('/content/drive/MyDrive')
    colab_folder = drive_root / 'Colab_Notebooks' / 'Data_Strategy_Book'
    colab_folder.mkdir(parents=True, exist_ok=True)
    return colab_folder / '.env'

def prompt_drive_save():
    """Ask user if they want to save to Google Drive"""
    print("\nYour API key will be lost when this Colab session ends.")
    print("Would you like to save it to Google Drive for future sessions?")
    
    while True:
        choice = input("Save to Google Drive? (y/n): ").strip().lower()
        if choice in ['y', 'yes']:
            return True
        elif choice in ['n', 'no']:
            return False
        else:
            print("Please enter 'y' for yes or 'n' for no.")

def find_chapter_root(start: Path = None) -> Path:
    p = (start or Path.cwd()).resolve()
    
    # Check if we're in Google Colab
    if 'google.colab' in sys.modules:
        # In Colab, try to find chapter directory or create one
        colab_chapter_dir = p / 'chapter_01'
        if not colab_chapter_dir.exists():
            colab_chapter_dir.mkdir(exist_ok=True)
        return colab_chapter_dir
    
    # Original logic for local environments
    for parent in [p] + list(p.parents):
        if parent.name.startswith('chapter_'):
            return parent
    return p

def save_api_key_to_file(api_key: str, file_path: Path):
    """Save API key to .env file"""
    existing = []
    if file_path.exists():
        existing = file_path.read_text(encoding='utf-8').splitlines()

    wrote = False
    updated = []
    for line in existing:
        if line.strip().startswith('OPENAI_API_KEY='):
            updated.append(f'OPENAI_API_KEY={api_key}')
            wrote = True
        else:
            updated.append(line)
    if not wrote:
        updated.append(f'OPENAI_API_KEY={api_key}')

    file_path.write_text('\n'.join(updated) + '\n', encoding='utf-8')

def load_api_key_from_drive():
    """Try to load API key from Google Drive"""
    if 'google.colab' not in sys.modules:
        return None
    
    try:
        drive_env_path = get_drive_env_path()
        if drive_env_path.exists():
            load_dotenv(dotenv_path=drive_env_path, override=True)
            api_key = os.getenv('OPENAI_API_KEY')
            if is_valid_openai_key(api_key):
                print(f"API key loaded from Google Drive: {drive_env_path}")
                return api_key
    except Exception as e:
        print(f"Could not load from Google Drive: {e}")
    
    return None

# Main execution
try:
    IN_COLAB = 'google.colab' in sys.modules
    chapter_root = find_chapter_root()
    ENV_PATH = chapter_root / '.env'
    
    # First, try to load from local .env
    load_dotenv(dotenv_path=ENV_PATH, override=False)
    api_key = os.getenv('OPENAI_API_KEY')
    
    # If in Colab and no local key, try Google Drive
    if IN_COLAB and not is_valid_openai_key(api_key):
        if mount_google_drive():
            drive_key = load_api_key_from_drive()
            if drive_key:
                api_key = drive_key
                os.environ['OPENAI_API_KEY'] = api_key

    # If still no valid key, prompt user
    if not is_valid_openai_key(api_key):
        print('OpenAI API key not found or invalid. Please enter it securely:')
        entered = getpass('Enter your OpenAI API key (starts with sk-): ').strip()
        if not is_valid_openai_key(entered):
            raise ValueError('Invalid API key format or empty input.')

        # Save to local .env
        save_api_key_to_file(entered, ENV_PATH)
        
        # If in Colab, offer to save to Google Drive
        if IN_COLAB:
            if prompt_drive_save():
                if mount_google_drive():
                    try:
                        drive_env_path = get_drive_env_path()
                        save_api_key_to_file(entered, drive_env_path)
                        print(f"API key saved to Google Drive: {drive_env_path}")
                        print("This will persist across Colab sessions!")
                    except Exception as e:
                        print(f"Failed to save to Google Drive: {e}")
                        print("API key saved locally (will be lost when session ends)")
            else:
                print("API key saved locally (will be lost when session ends)")

        # Set for current session
        load_dotenv(dotenv_path=ENV_PATH, override=True)
        os.environ['OPENAI_API_KEY'] = entered
        print('API key loaded for this session')
    else:
        source = "Google Drive" if IN_COLAB and 'drive_key' in locals() else "environment"
        print(f'OpenAI API key loaded from {source}')

except Exception as e:
    print("API key setup required:")
    print(str(e))
    print("\nQuick setup:")
    if 'google.colab' in sys.modules:
        print("1. Run this cell and enter your API key when prompted")
        print("2. Choose 'y' to save to Google Drive for persistence")
        print("3. Get your key from: https://platform.openai.com/api-keys")
    else:
        print("1. Copy .env.example to .env: cp .env.example .env")
        print("2. Edit .env and add your OpenAI API key")
        print("3. Get your key from: https://platform.openai.com/api-keys")
        print("4. Restart this notebook kernel")

OpenAI API key loaded from environment
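
To check what the setup cell wrote without opening the file or printing the key, a small reader like the one below can help. It is a simplified sketch, not a replacement for `python-dotenv`: the name `read_env_value` is illustrative, and it only handles plain `KEY=VALUE` lines with optional quotes. The demo uses a temporary file with a made-up value.

```python
import tempfile
from pathlib import Path

def read_env_value(path, name):
    """Return the value assigned to `name` in a simple KEY=VALUE file, or None if absent."""
    path = Path(path)
    if not path.exists():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and lines without an assignment
        key, _, value = line.partition("=")
        if key.strip() == name:
            return value.strip().strip("'\"")
    return None

with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# demo file\nOPENAI_API_KEY=sk-demo-value\n", encoding="utf-8")
    found = read_env_value(env_file, "OPENAI_API_KEY")
    missing = read_env_value(env_file, "OTHER_KEY")

print("key present:", found is not None, "| other key:", missing)
```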


#### Connecting with OpenAI API

In [6]:
# Connection Test: OpenAI embeddings API
try:
    import os
    import openai
    key = os.getenv('OPENAI_API_KEY')  # environment variable names are case-sensitive on macOS/Linux
    if hasattr(openai, 'OpenAI'):
        client = openai.OpenAI(api_key=key)
    else:
        client = openai
        client.api_key = key
    _ = client.embeddings.create(model='text-embedding-3-small', input='ping')
    print('Connection test OK')
except Exception as e:
    print(f'Connection test failed: {e}')


Connection test OK


In [7]:
import os
api_key = os.getenv("OPENAI_API_KEY")  # pull from env into Python variable
if not api_key or not api_key.strip():
    raise ValueError("OPENAI_API_KEY is not set. Run the setup cell above first.")

### OpenAI Assistant ask_ai()

In the following code, we define a future-proof OpenAI assistant that initializes an API client, discovers and prioritizes modern models, selects a working model via a quick smoke test, and exposes a single `ask_ai()` method with retries and error classification. The snippet sets two global variables (`OpenAI_API_Key`, `Model`) and, if `api_key` is available, instantiates `FutureProofAssistant` at global scope so later cells can simply call `assistant.ask_ai(...)`.

Key components and flow:
- Globals: `OpenAI_API_Key`, `Model` are declared for easy access across cells.
- Class `FutureProofAssistant`:
  - `__init__(api_key)`: saves the key, sets defaults, and calls `_initialize()`.
  - `_initialize()`: builds the client, discovers models, selects a working one, then updates global `Model`.
  - `_setup_client()`: supports both modern SDK (`openai.OpenAI(api_key=...)`) and legacy (`openai.api_key = ...`).
  - `_discover_models()`: calls `client.models.list()`, filters to modern families (e.g., `o4`, `gpt-4.1`, `gpt-4o`), and prioritizes them.
  - `_select_model()`: tries top candidates with `_test_model()` by making a tiny chat completion; picks the first that works.
  - `ask_ai(content)`: validates input, performs up to 3 attempts with backoff on rate limits, and routes errors via `_classify_error()` to user‑friendly messages (`_billing_error_message()`, `_auth_error_message()`, `_model_error_message()`).
  - `_extract_content(response)`: returns the assistant text from either `choices[0].message.content` (modern) or `choices[0].text` (legacy).
- Global initialization: if `api_key` is set (by earlier setup cells), `assistant = FutureProofAssistant(api_key)` runs at the top level, which makes `assistant` available in `globals()` for later cells.
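
The discovery and prioritization steps listed above can be tried without any API call. This standalone sketch mirrors the filtering and ordering logic on a hand-written list; the function name `prioritize` and the sample model IDs are illustrative, and the models your account actually lists will differ.

```python
INCLUDE_PATTERNS = ["o4", "gpt-4.1", "gpt-4o"]
PRIORITY = ["o4-mini", "o4", "gpt-4.1-mini", "gpt-4.1", "gpt-4o"]

def prioritize(all_models, include_patterns=INCLUDE_PATTERNS, priority=PRIORITY):
    """Keep models matching a family pattern, preferred names first, the rest alphabetically."""
    candidates = [m for m in all_models if any(p in m.lower() for p in include_patterns)]
    preferred = [m for m in priority if m in candidates]
    rest = [m for m in sorted(candidates) if m not in preferred]
    return preferred + rest

sample = ["gpt-3.5-turbo", "gpt-4o", "text-embedding-3-small", "gpt-4.1", "o4-mini", "gpt-4.1-nano"]
ordered = prioritize(sample)
print(ordered)
# → ['o4-mini', 'gpt-4.1', 'gpt-4o', 'gpt-4.1-nano']
```

Note that the match is a plain substring test, so any model ID containing one of the patterns is kept, whether or not it supports chat completions. That is why the notebook also smoke-tests the top candidates before choosing one.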

Why this matters and practical notes:
- Resilience: The assistant adapts to SDK differences and changing model names by discovering models dynamically and testing them before use.
- Simplicity for downstream cells: Placing `assistant` in the global namespace avoids re‑wiring; later code can do `assistant.ask_ai("...")` without reconfiguration.
- Error handling: Billing, auth, model, and rate‑limit issues are detected and surfaced with clear guidance, while other errors retry briefly before failing cleanly.
- Extensibility: You can tweak `include_patterns` (model families), `priority` (preferred order), or `max_retries` without touching the rest of the notebook.
- Initialization dependency: This block assumes an earlier cell loaded a valid `api_key` (for example from `.env`), otherwise the class raises a clear “No API key provided” error and the bottom‑cell guard prints “Cannot initialize assistant without API key”.
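
The retry behavior described above (exponential waits on rate limits, a short fixed wait otherwise) can be seen in isolation with a fake function in place of the API call. This is a sketch, not the notebook's implementation: `call_with_backoff` is a hypothetical helper that re-raises on the final failure instead of returning an error string, and the `sleep` parameter exists only so the demo can record waits instead of pausing.

```python
import time

def call_with_backoff(fn, max_retries=3, sleep=time.sleep):
    """Call fn(), retrying on failure: rate-limit errors wait 1s, 2s, 4s...; others wait 1s."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            is_rate_limit = "rate" in str(exc).lower()
            sleep(2 ** attempt if is_rate_limit else 1)

# Demo: a function that is "rate limited" twice, then succeeds.
attempts = {"count": 0}

def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("Rate limit exceeded")
    return "ok"

waits = []
result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)
# → ok [1, 2]
```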

In [8]:
# Future-Proof OpenAI Assistant (updated models and discovery)
import time

# Global variables to be used later
OpenAI_API_Key = None
Model = None

class FutureProofAssistant:
    def __init__(self, api_key=None):
        global OpenAI_API_Key, Model
        
        self.api_key = api_key or globals().get('api_key')  # fall back to the api_key set in a previous cell
        self.client = None
        # Prefer modern families; keep a reasonable fallback
        self.models = ['o4-mini', 'o4', 'gpt-4.1-mini', 'gpt-4.1', 'gpt-4o']
        self.selected_model = None
        self.max_retries = 3
        
        if not self.api_key:
            raise ValueError("No API key provided")
        
        # Set global variables
        OpenAI_API_Key = self.api_key
        
        self._initialize()
    
    def _initialize(self):
        global Model
        
        print("Initializing Future-Proof Assistant...")
        self._setup_client()
        self._discover_models()
        self._select_model()
        
        # Set global Model variable
        Model = self.selected_model
        
        print(f"Ready! Using model: {self.selected_model}")
        print(f"Global variables set: OpenAI_API_Key and Model = '{Model}'")
    
    def _setup_client(self):
        try:
            import openai
            if hasattr(openai, 'OpenAI'):
                self.client = openai.OpenAI(api_key=self.api_key)
                print("Client initialized (modern API)")
            else:
                openai.api_key = self.api_key
                self.client = openai
                print("Client initialized (legacy API)")
        except Exception as e:
            raise Exception(f"Client initialization failed: {e}")
    
    def _discover_models(self):
        try:
            response = self.client.models.list()
            all_models = [m.id for m in response.data]
            # Prefer modern families; exclude legacy 3.5.
            # Future-proof: include patterns for potential future names (may not exist yet).
            include_patterns = ['o4', 'gpt-4.1', 'gpt-4o', 'gpt-5', 'gpt-4.5', 'gpt-6']
            chat_models = [
                m for m in all_models
                if any(p in m.lower() for p in include_patterns)
            ]
            self.models = self._prioritize_models(chat_models) or self.models
            print(f"Found {len(self.models)} models")
        except Exception as e:
            print(f"Model discovery failed: {e} - using defaults")
    
    def _prioritize_models(self, models):
        priority = ['o4-mini', 'o4', 'gpt-4.1-mini', 'gpt-4.1', 'gpt-4o']
        result = [m for m in priority if m in models]
        result.extend([m for m in sorted(models) if m not in result])
        return result
    
    def _select_model(self):
        for model in self.models[:3]:
            if self._test_model(model):
                self.selected_model = model
                return
        self.selected_model = self.models[0]
    
    def _test_model(self, model):
        try:
            self.client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": "Hi"}],
                max_tokens=5
            )
            return True
        except Exception:  # any API error means this model is not usable
            return False
    
    def ask_ai(self, content: str) -> str:
        if not content or not content.strip():
            return "Error: Please provide a valid question."
        
        for attempt in range(self.max_retries):
            try:
                response = self.client.chat.completions.create(
                    model=self.selected_model,
                    messages=[{"role": "user", "content": content.strip()}],
                    max_tokens=1000,
                    temperature=0.7
                )
                return self._extract_content(response)
            
            except Exception as e:
                error_type = self._classify_error(e)
                
                if error_type == 'billing':
                    return self._billing_error_message()
                elif error_type == 'auth':
                    return self._auth_error_message()
                elif error_type == 'model':
                    return self._model_error_message()
                elif error_type == 'rate' and attempt < self.max_retries - 1:
                    wait_time = 2 ** attempt
                    print(f"Rate limited. Waiting {wait_time}s...")
                    time.sleep(wait_time)
                    continue
                elif attempt < self.max_retries - 1:
                    print(f"Attempt {attempt + 1} failed: {str(e)[:50]}...")
                    time.sleep(1)
                    continue
                else:
                    return f"Error after {self.max_retries} attempts: {str(e)[:100]}..."
    
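    def _extract_content(self, response):
        # Prefer the chat-completions shape, then the legacy completions shape.
        try:
            return response.choices[0].message.content
        except (AttributeError, IndexError, TypeError):
            try:
                return response.choices[0].text
            except (AttributeError, IndexError, TypeError):
                return str(response)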
    
    def _classify_error(self, error):
        error_str = str(error).lower()
        if any(word in error_str for word in ['quota', 'billing', 'credit']):
            return 'billing'
        elif any(word in error_str for word in ['auth', 'key', 'unauthorized']):
            return 'auth'
        elif any(word in error_str for word in ['model', 'not_found']):
            return 'model'
        elif any(word in error_str for word in ['rate', 'limit', 'too_many']):
            return 'rate'
        return 'unknown'
    
    def _billing_error_message(self):
        return """BILLING ERROR: Insufficient credits.
        
To fix this:
1. Visit: https://platform.openai.com/settings/organization/billing/overview
2. Add a payment method
3. Purchase credits (minimum $5)
4. Wait a few minutes for credits to appear

Note: OpenAI requires prepaid credits for API usage."""
    
    def _auth_error_message(self):
        return """AUTHENTICATION ERROR: Invalid API key.
        
To fix this:
1. Check your API key at: https://platform.openai.com/api-keys
2. Create a new key if needed
3. Re-run the API key setup cell above

Make sure your key starts with 'sk-' and is complete."""
    
    def _model_error_message(self):
        return f"""MODEL ERROR: {self.selected_model} not available.
        
This usually means:
1. Model has been deprecated
2. Your account doesn't have access
3. Temporary service issue

Re-run the initialization cell above to select a different model."""

# Initialize assistant and set global variables
if api_key:
    assistant = FutureProofAssistant(api_key)
    print(f"\nGlobal variables available:")
    print(f"OpenAI_API_Key: {'***' + OpenAI_API_Key[-10:] if OpenAI_API_Key else 'None'}")
    print(f"Model: {Model}")
else:
    print("Cannot initialize assistant without API key")

Initializing Future-Proof Assistant...
Client initialized (modern API)
Found 43 models
Ready! Using model: gpt-4.1-mini
Global variables set: OpenAI_API_Key and Model = 'gpt-4.1-mini'

Global variables available:
OpenAI_API_Key: ***e6nekmhQkA
Model: gpt-4.1-mini
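
The retry behavior inside `ask_ai` can be easier to follow in isolation. The following is a minimal sketch, not the class's actual code: `classify_error` mirrors the keyword checks of `_classify_error`, while `call_with_retries`, its `sleep` parameter, and the `flaky` function are hypothetical names introduced here so the example runs without network access. It assumes the same policy as above: billing, auth, and model errors stop immediately, rate limits back off exponentially, and other errors wait one second.

```python
import time

def classify_error(error: Exception) -> str:
    """Keyword-based classification, checked in the same order as the class."""
    text = str(error).lower()
    rules = [
        ("billing", ("quota", "billing", "credit")),
        ("auth", ("auth", "key", "unauthorized")),
        ("model", ("model", "not_found")),
        ("rate", ("rate", "limit", "too_many")),
    ]
    for label, keywords in rules:
        if any(word in text for word in keywords):
            return label
    return "unknown"

def call_with_retries(func, max_retries=3, sleep=time.sleep):
    """Call func(); retry transient failures, otherwise return an error string."""
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            kind = classify_error(exc)
            last_attempt = attempt == max_retries - 1
            # Non-retryable errors and exhausted retries end the loop.
            if kind in ("billing", "auth", "model") or last_attempt:
                return f"Error ({kind}): {str(exc)[:100]}"
            # Rate limits wait 1s, 2s, 4s, ...; any other error waits 1s.
            sleep(2 ** attempt if kind == "rate" else 1)
    return "Error: no attempts were made."

# Hypothetical stand-in for an API call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limit exceeded")
    return "ok"

waits = []  # record the requested delays instead of actually sleeping
print(call_with_retries(flaky, sleep=waits.append), waits)  # → ok [1, 2]
```

Passing `sleep` as a parameter is a design choice made for this sketch only; it lets the backoff schedule be inspected without waiting.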


#### Test the Assistant

The next cell defines a thin wrapper, `ask_ai(content)`, that forwards calls to the globally initialized `assistant` if one exists. It then runs a quick smoke test, executed only when `api_key` is available, that checks a basic response, checks empty-input handling, and prints the selected and available models.

What it does:
- `ask_ai(content)`: Checks `globals()` for `assistant`; if found, calls `assistant.ask_ai(content)`. Otherwise returns a helpful message prompting you to run setup cells.
- Test harness (guarded by `if api_key:`):
  - Prints a header.
  - Runs a basic test: `ask_ai("Say 'Hello, I am working!' in exactly those words.")` to confirm the end-to-end path.
  - Runs an empty-input test to verify validation in `assistant.ask_ai("")`.
  - Prints `assistant.selected_model` and the first three entries of `assistant.models` to confirm model discovery and selection.
- If `api_key` is missing, the cell instead prints “Please complete API key setup first.”


  


In [9]:
# Test the Assistant
def ask_ai(content: str) -> str:
    """Simple interface to the future-proof assistant"""
    if 'assistant' in globals():
        return assistant.ask_ai(content)
    else:
        return "Assistant not initialized. Please run the setup cells above."

# Test with various scenarios
if api_key:
    print("Testing assistant functionality...\n")
    
    # Basic test
    response = ask_ai("Say 'Hello, I am working!' in exactly those words.")
    print(f"Basic Test: {response}\n")
    
    # Empty input test
    response = ask_ai("")
    print(f"Empty Input Test: {response}\n")
    
    # Model info
    print(f"Selected Model: {assistant.selected_model}")
    print(f"Available Models: {assistant.models[:3]}...")
    
    print("\nAssistant is ready for use!")
else:
    print("Please complete API key setup first.")

Testing assistant functionality...

Basic Test: Hello, I am working!

Empty Input Test: Error: Please provide a valid question.

Selected Model: gpt-4.1-mini
Available Models: ['o4-mini', 'gpt-4.1-mini', 'gpt-4.1']...

Assistant is ready for use!


Results Explanation:
- You should see a literal response “Hello, I am working!” for the basic test if the model and key are configured correctly.
- The empty-input test should return the error string implemented inside `assistant.ask_ai` (e.g., “Error: Please provide a valid question.”).

Context:
- `assistant` is created earlier at notebook-global scope (e.g., `assistant = FutureProofAssistant(api_key)`), so this helper simply routes calls without reconfiguring the client.
- The `if api_key:` guard avoids running tests when the environment is not ready.

Next Steps:
- If you see the “Assistant not initialized” message, run the setup cells that define `api_key` and instantiate `assistant`.
- Replace the basic prompt with your real question. To experiment with `temperature` or `max_tokens`, edit the values passed in `FutureProofAssistant.ask_ai` (0.7 and 1000 in the cell above); to change the model, set `assistant.selected_model`.
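
One way to iterate on these settings without editing the class is a small helper that takes them as arguments. This is a sketch under stated assumptions: `ask_with_options` is a hypothetical name, not part of the notebook, and the fake client below exists only so the example runs offline. The request itself uses the same `client.chat.completions.create(...)` call as `ask_ai`.

```python
from types import SimpleNamespace

def ask_with_options(client, model, content, temperature=0.7, max_tokens=1000):
    """Hypothetical helper: the same request as ask_ai, with tunable settings."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": content.strip()}],
        max_tokens=max_tokens,
        temperature=temperature,
    )
    return response.choices[0].message.content

def _fake_create(**kwargs):
    # Echo the settings back so the example runs without network access.
    text = (
        f"model={kwargs['model']} "
        f"temperature={kwargs['temperature']} "
        f"max_tokens={kwargs['max_tokens']}"
    )
    message = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

# Stand-in object exposing the same attribute path as the real client.
fake_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=_fake_create))
)

print(ask_with_options(fake_client, "gpt-4.1-mini", "Summarize this table",
                       temperature=0.2, max_tokens=200))
# → model=gpt-4.1-mini temperature=0.2 max_tokens=200
```

With the real assistant, the equivalent call would presumably be `ask_with_options(assistant.client, assistant.selected_model, "your question", temperature=0.2)`.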

#### Usage Examples

Now you can use the `ask_ai()` function for any queries:

```python
# Simple question
response = ask_ai("What is machine learning?")
print(response)

# Complex analysis
response = ask_ai("Explain the benefits of using LLMs for data analysis")
print(response)
```

## Future-Proof Features

This setup is designed to tolerate:
- **API Changes**: Aims to keep working across OpenAI SDK versions
- **Model Updates**: Discovers available models and checks them with a short test request
- **Error Evolution**: Classifies errors by keyword rather than by exact exception type
- **Response Formats**: Falls back through several content extraction methods

These measures reduce, but cannot eliminate, the risk of breakage when OpenAI updates its API.
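
Two of these ideas, model selection and response extraction, can be sketched in a few lines. The code below is an illustration under assumptions, not the notebook's implementation: `pick_model` assumes a "first candidate that passes a test, otherwise the first candidate" policy, and the `is_usable` callback is a hypothetical stand-in for `_test_model`. `extract_content` follows the same fallback order as `_extract_content`.

```python
from types import SimpleNamespace

def pick_model(candidates, is_usable):
    """Return the first candidate that passes is_usable, else the first candidate."""
    for name in candidates:
        if is_usable(name):
            return name
    return candidates[0] if candidates else None

def extract_content(response):
    """Try the chat shape, then the legacy completion shape, then str()."""
    getters = (
        lambda r: r.choices[0].message.content,
        lambda r: r.choices[0].text,
    )
    for getter in getters:
        try:
            return getter(response)
        except (AttributeError, IndexError, TypeError):
            continue
    return str(response)

# Stand-in response objects; real responses come from the OpenAI client.
chat_style = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="hi"))]
)
legacy_style = SimpleNamespace(choices=[SimpleNamespace(text="hello")])

print(extract_content(chat_style))    # → hi
print(extract_content(legacy_style))  # → hello

# The lambda is a placeholder for a real availability check.
print(pick_model(["o4-mini", "gpt-4.1-mini"], lambda m: m.startswith("gpt-")))
# → gpt-4.1-mini
```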

In [10]:
ask_ai("tell me a joke")

"Sure! Here's one for you:\n\nWhy don’t scientists trust atoms?  \nBecause they make up everything!"

# Start Here

This is a direct, minimal call using the modern OpenAI client: we build a client with `openai.OpenAI(api_key=OpenAI_API_Key)`, request a chat completion with the globally selected `Model`, and print the assistant’s text via `response.choices[0].message.content`. I use this pattern when I want to confirm the key and model work end‑to‑end before moving back to the higher‑level `assistant.ask_ai(...)` wrapper.

In [11]:
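import openai

# Build a client with the key set earlier in the notebook
OpenAI_client = openai.OpenAI(api_key=OpenAI_API_Key)

# Send a single user message to the globally selected model
response = OpenAI_client.chat.completions.create(
    model=Model,
    messages=[{"role": "user", "content": "Tell me a joke"}],
)

print(response.choices[0].message.content)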


Sure! Here's a joke for you:

Why don’t scientists trust atoms?

Because they make up everything!


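The same call can be wrapped in a single function for quick experiments. Note that this version creates a new client on every call.

In [12]:
import openai

def ask_ai_simple(content, api_key=OpenAI_API_Key):
    """Send one user message and return the reply text."""
    client = openai.OpenAI(api_key=api_key)
    response = client.chat.completions.create(
        model=Model,
        messages=[{"role": "user", "content": content}],
    )
    return response.choices[0].message.content

# Usage:
print(ask_ai_simple("Tell me a joke"))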

Sure! Here's a joke for you:

Why don't scientists trust atoms?

Because they make up everything!
