# 🔧 Fix GPT-5 Configuration

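**Issue**: GPT-5 models (gpt-5-nano, gpt-5-mini) accept different Chat Completions parameters than GPT-4:
- ❌ `max_tokens` is not supported; use `max_completion_tokens` instead. For these reasoning models the limit covers reasoning tokens as well as the visible answer, so it needs far more headroom than the old `max_tokens=100`.
- ❌ Do NOT pass `temperature`; only the default value (1) is supported.
- ✅ Include a clear system message and explicit output-format instructions (recommended for consistent names, not an API requirement).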
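To keep these rules in one place, a notebook could build the request arguments with a small helper. The following is a minimal sketch, not part of the OpenAI SDK: `build_completion_kwargs` is a hypothetical name, and treating every model whose name starts with `gpt-5` as a GPT-5 model is an assumption to adjust for the models you actually use.

```python
from typing import Any, Dict, List, Optional


def build_completion_kwargs(
    model: str,
    messages: List[Dict[str, str]],
    max_output_tokens: int,
    temperature: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble kwargs for chat.completions.create.

    Assumption: any model name starting with "gpt-5" follows the GPT-5 rules
    listed above (max_completion_tokens, no custom temperature).
    """
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    if model.startswith("gpt-5"):
        # GPT-5: token cap goes in max_completion_tokens; temperature is left out
        kwargs["max_completion_tokens"] = max_output_tokens
    else:
        # GPT-4-style models: classic max_tokens plus an optional temperature
        kwargs["max_tokens"] = max_output_tokens
        if temperature is not None:
            kwargs["temperature"] = temperature
    return kwargs


msgs = [{"role": "user", "content": "Name this topic."}]
print(build_completion_kwargs("gpt-5-mini", msgs, 1000, temperature=0.2))
print(build_completion_kwargs("gpt-4o", msgs, 100, temperature=0.2))
```

In the topic-naming cell this would be used as `await async_client.chat.completions.create(**build_completion_kwargs(GPT_MODEL, messages, 1000))`, so switching `GPT_MODEL` between GPT-4 and GPT-5 no longer requires editing the call.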

This cell contains the corrected code from `insights.ipynb`.

In [None]:
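# CORRECTED CODE - Copy this to replace the faulty cell in your main notebook
# Assumes clustered_df, GPT_MODEL, async_client and error_logger are defined earlier in the notebook.

import asyncio
from typing import List

topic_names = {}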

if clustered_df is not None and len(clustered_df) > 0:
    print(f"\n🤖 Generating topic names with {GPT_MODEL}...")
    
    async def generate_topic_name(questions: List[str], keywords: str = "") -> str:
        """Generate a topic name using GPT-5 for a cluster of questions"""
        
        # Limit the prompt to the first 10 questions for context (as in insights.ipynb)
        sample_questions = questions[:10]
        questions_text = "\n".join([f"- {q}" for q in sample_questions])

        prompt = f"""
Based on the following student questions and keywords, generate a concise, descriptive topic name.

QUESTIONS:
{questions_text}

KEYWORDS: {keywords}

Instructions:
- Your answer must be ONLY the topic name (2–8 words), no extra text.
- It should clearly describe the shared theme of the questions.
- Avoid generic labels like "General Questions" or "Miscellaneous."
- Do not include "Topic name:" or quotation marks.
- Use simple, natural English that sounds clear to a student or teacher.

Example:
Questions:
- When does registration open?
- What are the fall 2025 enrollment deadlines?
Keywords: registration, deadlines

Topic name: Fall 2025 Registration Deadlines

Now generate the topic name for the questions above:
"""

        try:
            messages = [
                {"role": "system", "content": "You are an expert at creating clear, descriptive topic names for student question categories."},
                {"role": "user", "content": prompt}
            ]
            
            # GPT-5 specific configuration (NO temperature parameter!)
            response = await async_client.chat.completions.create(
                model=GPT_MODEL,
                messages=messages,
                # Use max_completion_tokens for GPT-5, not max_tokens. The budget also
                # covers hidden reasoning tokens, so keep it well above the name length.
                max_completion_tokens=1000
            )
            
            # message.content can be None (e.g. no visible text), so default to ""
            topic_name = (response.choices[0].message.content or "").strip()
            
            # Clean up the response
            topic_name = topic_name.replace("Topic name:", "").strip()
            topic_name = topic_name.strip('\"\'')
            
            if not topic_name:
                topic_name = f"Topic: {keywords[:50]}" if keywords else f"Question Group {hash(str(questions[:3])) % 1000}"
            
            return topic_name
            
        except Exception as e:
            error_logger.log_error("TopicNaming", f"GPT failed: {str(e)}", e)
            # Fallback to keyword-based name
            fallback_name = f"Topic: {keywords[:50]}" if keywords else f"Question Group {hash(str(questions[:3])) % 1000}"
            return fallback_name
    
    async def process_all_clusters():
        tasks = []
        cluster_ids = []
        
        for cluster_id, group in clustered_df.groupby('cluster_id'):
            questions = group['question'].tolist()
            # Extract keywords from BERTopic if available
            keywords = group['topic_keywords'].iloc[0] if 'topic_keywords' in group.columns else ""
            
            tasks.append(generate_topic_name(questions, keywords))
            cluster_ids.append(cluster_id)
        
        names = await asyncio.gather(*tasks)
        return dict(zip(cluster_ids, names))
    
    topic_names = await process_all_clusters()
    clustered_df['topic_name'] = clustered_df['cluster_id'].map(topic_names)
    
    print(f"✅ Generated {len(topic_names)} topic names")
    for cid, name in list(topic_names.items())[:5]:
        count = len(clustered_df[clustered_df['cluster_id'] == cid])
        print(f"   {name} ({count} questions)")

## 📋 Key Changes from Original

### ❌ What was wrong:
```python
response = await async_client.chat.completions.create(
    model=GPT_MODEL,
    messages=[{"role": "user", "content": prompt}],
    max_tokens=100  # ❌ Wrong parameter for GPT-5
)
```

### ✅ What's correct:
```python
response = await async_client.chat.completions.create(
    model=GPT_MODEL,
    messages=[
        {"role": "system", "content": "..."},  # ✅ System message
        {"role": "user", "content": prompt}
    ],
    max_completion_tokens=1000  # ✅ Correct parameter for GPT-5
    # ✅ NO temperature parameter - GPT-5 always uses 1.0
)
```

## 🔍 Other Improvements:

1. **Better prompt** - More detailed instructions plus a worked example
2. **System message** - Establishes the model's role as a topic-naming expert
3. **Keywords integration** - Uses BERTopic keywords if available
4. **Better fallback** - Falls back to a keyword-based name if the API call fails or returns an empty name
5. **Response cleaning** - Removes the "Topic name:" prefix and surrounding quotes
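The cleaning and fallback steps (items 4 and 5) can be pulled out into a plain function and tested without calling the API. This is a sketch that mirrors the logic of the corrected cell; `clean_topic_name` is a hypothetical name, and it assumes keywords arrive as a string. It uses a fixed default label instead of the `hash()`-based `Question Group` name, because Python randomizes string hashes per process, so hash-based labels change between runs.

```python
from typing import Optional


def clean_topic_name(raw: Optional[str], keywords: str = "", default: str = "Question Group") -> str:
    """Strip the 'Topic name:' prefix and surrounding quotes; fall back if empty.

    `raw` may be None, since message.content is optional in the OpenAI SDK.
    """
    name = (raw or "").strip()
    name = name.replace("Topic name:", "").strip()
    name = name.strip("\"'").strip()
    if name:
        return name
    # Keyword-based fallback, mirroring the notebook cell
    return f"Topic: {keywords[:50]}" if keywords else default


print(clean_topic_name('Topic name: "Fall 2025 Registration Deadlines"'))
print(clean_topic_name(None, keywords="registration, deadlines"))
```

Passing the cluster ID into the default (for example `default=f"Question Group {cluster_id}"`) would give reproducible fallback labels.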

## 🚀 How to Use:

1. Copy the code from the cell above
2. In your main notebook, find the "Generate Topic Names with GPT" section
3. Replace the entire cell with this corrected version
4. Run the notebook again

The GPT-5 errors should disappear! ✨

## 🔧 Fix S3 Upload Failures

**Issue**: Output files are failing to upload even with retry logic.

This is likely due to one of these issues:
1. Missing `public=True` parameter in upload call
2. Files don't exist in current directory
3. Permission issue with `PutObjectAcl` (public-read ACL)
4. Network/region configuration issue

Let's add debugging and fix the upload code.
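To make the debugging output point at one of these causes, it helps to print the S3 error code rather than only the message. A sketch, assuming `upload_to_s3` uses boto3 and lets a botocore `ClientError` propagate (the notebook's helper may instead catch it and just return `False`); the code is then available as `e.response["Error"]["Code"]`. The mapping is a hypothetical helper covering a few common codes, not an exhaustive list.

```python
# Hypothetical mapping from common S3 error codes to the causes listed above.
LIKELY_CAUSES = {
    "AccessDenied": "Missing IAM permission (e.g. s3:PutObjectAcl for public-read) "
                    "or Block Public Access rejecting a public ACL",
    "AccessControlListNotSupported": "Bucket has ACLs disabled (Object Ownership: "
                                     "Bucket owner enforced); upload without an ACL",
    "NoSuchBucket": "Bucket name is wrong or the bucket does not exist",
    "AuthorizationHeaderMalformed": "Client region does not match the bucket region",
    "PermanentRedirect": "Client region/endpoint does not match the bucket region",
    "InvalidAccessKeyId": "Access key is invalid or credentials are not loaded",
    "SignatureDoesNotMatch": "Secret key is wrong or the request was signed incorrectly",
    "RequestTimeout": "Upload stalled; check the network or large-file handling",
}


def explain_s3_error(code: str) -> str:
    """Return a human-readable hint for an S3 error code."""
    return LIKELY_CAUSES.get(code, f"Unrecognized error code: {code}")


# Possible use inside the upload loop (requires botocore):
# from botocore.exceptions import ClientError
# except ClientError as e:
#     code = e.response["Error"]["Code"]
#     print(f"   ❌ {code}: {explain_s3_error(code)}")
```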

In [None]:
# OPTION 1: Fix with better error handling and file verification
import os

print(f"\n☁️  Uploading to S3...")

# Delete old files (optional - comment out if you want to keep history)
try:
    delete_s3_folder(S3_OUTPUT_PREFIX)
except Exception as e:
    print(f"⚠️  Could not delete old files: {e}")

# Upload new files with verification
uploaded = []
failed = []

for filepath in output_files:
    # Check if file exists
    if not os.path.exists(filepath):
        print(f"❌ File not found: {filepath}")
        failed.append(filepath)
        continue
    
    # Get file size for debugging
    file_size = os.path.getsize(filepath)
    print(f"📤 Uploading {filepath} ({file_size:,} bytes)...")
    
    # Use only the file name in the key so paths like /tmp/x.json don't end up in it
    s3_key = f"{S3_OUTPUT_PREFIX}/{os.path.basename(filepath)}"
    
    try:
        # Explicitly pass public=True for output files
        if upload_to_s3(filepath, s3_key, public=True):
            url = f"https://{S3_BUCKET}.s3.amazonaws.com/{s3_key}"
            uploaded.append(url)
            print(f"   ✅ Success: {url}")
        else:
            failed.append(filepath)
            print(f"   ❌ Failed: {filepath}")
    except Exception as e:
        print(f"   ❌ Exception: {str(e)}")
        failed.append(filepath)

print(f"\n📊 UPLOAD SUMMARY:")
print(f"   ✅ Successful: {len(uploaded)}/{len(output_files)}")
print(f"   ❌ Failed: {len(failed)}/{len(output_files)}")

if uploaded:
    print(f"\n✅ Uploaded files:")
    for url in uploaded:
        print(f"   {url}")

if failed:
    print(f"\n❌ Failed files:")
    for f in failed:
        print(f"   {f}")

In [None]:
# OPTION 2: Skip ACL if you don't have PutObjectAcl permission
# Use this if your diagnostic showed you can upload without ACL but not with public-read
import os

print(f"\n☁️  Uploading to S3 (without public ACL)...")

# Delete old files
try:
    delete_s3_folder(S3_OUTPUT_PREFIX)
except Exception as e:
    print(f"⚠️  Could not delete old files: {e}")

# Upload new files WITHOUT public-read ACL
uploaded = []
failed = []

for filepath in output_files:
    if not os.path.exists(filepath):
        print(f"❌ File not found: {filepath}")
        failed.append(filepath)
        continue
    
    file_size = os.path.getsize(filepath)
    print(f"📤 Uploading {filepath} ({file_size:,} bytes)...")
    
    # Use only the file name in the key so paths like /tmp/x.json don't end up in it
    s3_key = f"{S3_OUTPUT_PREFIX}/{os.path.basename(filepath)}"
    
    try:
        # Use public=False to skip ACL (if you don't have PutObjectAcl permission)
        if upload_to_s3(filepath, s3_key, public=False):
            # Record an s3:// URI: without a public ACL there is no public HTTPS URL
            url = f"s3://{S3_BUCKET}/{s3_key}"
            uploaded.append(url)
            print(f"   ✅ Success: {url}")
        else:
            failed.append(filepath)
            print(f"   ❌ Failed: {filepath}")
    except Exception as e:
        print(f"   ❌ Exception: {str(e)}")
        failed.append(filepath)

print(f"\n📊 UPLOAD SUMMARY:")
print(f"   ✅ Successful: {len(uploaded)}/{len(output_files)}")
print(f"   ❌ Failed: {len(failed)}/{len(output_files)}")

if uploaded:
    print(f"\n✅ Uploaded files (private - not publicly accessible):")
    for url in uploaded:
        print(f"   {url}")
    print(f"\n💡 TIP: Files are uploaded but not public. Your Streamlit app can access them with AWS credentials.")

if failed:
    print(f"\n❌ Failed files:")
    for f in failed:
        print(f"   {f}")

## 🔍 Root Cause Analysis

Based on your diagnostic test results:

### ✅ What Works:
- Test 3 & 4: Small file uploads to S3 succeed
- Network connectivity is fine
- Credentials are valid

### ❌ What's Failing:
- **ALL output files** fail to upload (not just cache files)
- Retry logic exhausts all 5 attempts
- Different from cache upload failures (which were silent)

### 🎯 Most Likely Causes:

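**1. Missing `s3:PutObjectAcl` Permission** (Most Likely)
- Your IAM user can upload files (`s3:PutObject` works)
- But cannot set ACL to `public-read` (`s3:PutObjectAcl` missing)
- Related cause: if the bucket has ACLs disabled (Object Ownership set to "Bucket owner enforced", the default for new buckets since April 2023) or Block Public Access blocks public ACLs, `public-read` uploads fail even with this permission
- Solution: Use `public=False` or ask your admin to add the permission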

**2. File Path Issues**
- Files might be in `/tmp/` but code looks in current directory
- Check with `!ls -la *.parquet *.json`
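If the outputs may have been written somewhere other than the working directory, each file name can be resolved against a list of candidate directories before uploading. A sketch; `resolve_output_files` and the default search order (current directory, then `/tmp`) are assumptions to adapt to where the notebook actually writes its files.

```python
import os
from typing import Dict, Iterable, Optional


def resolve_output_files(
    names: Iterable[str], search_dirs: Iterable[str] = (".", "/tmp")
) -> Dict[str, Optional[str]]:
    """Map each file name to the first existing path in search_dirs, or None."""
    dirs = list(search_dirs)
    resolved: Dict[str, Optional[str]] = {}
    for name in names:
        resolved[name] = None
        for d in dirs:
            candidate = os.path.join(d, name)
            if os.path.isfile(candidate):
                resolved[name] = candidate
                break
    return resolved
```

Calling `resolve_output_files(output_files)` before the upload loop shows which files are missing (value `None`) and which were found outside the current directory.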

**3. File Size Issues**
- Small test files (110-152 bytes) worked
- Large parquet files might time out
- Less likely, since retries cover transient timeouts (though not an upload that times out on every attempt)

### 💡 Recommended Solutions:

**Option A** (Quick Fix): Use `public=False`
- Files upload successfully but aren't publicly accessible
- Streamlit app can still read them with AWS credentials
- No IAM permission changes needed
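For Option A, the reading side needs only standard boto3 calls. A sketch, assuming the Streamlit app uses a boto3 S3 client (`boto3.client("s3")`) with credentials configured; `load_private_json` and the `topics.json` key are hypothetical, and the client is passed in so it can be replaced with a stub in tests.

```python
import json
from typing import Any


def load_private_json(s3_client: Any, bucket: str, key: str) -> Any:
    """Download a private S3 object with the caller's credentials and parse it as JSON."""
    # get_object returns a dict whose "Body" is a streaming body with .read()
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())


# In the Streamlit app (assumes boto3 is installed and credentials are configured):
# import boto3
# s3 = boto3.client("s3")
# data = load_private_json(s3, S3_BUCKET, f"{S3_OUTPUT_PREFIX}/topics.json")
```

For parquet outputs, the same downloaded bytes can be wrapped in `io.BytesIO` and passed to `pandas.read_parquet`.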

**Option B** (Proper Fix): Request IAM Permission
- Ask your AWS admin to add `s3:PutObjectAcl` permission
- Allows public-read ACL for sharing files
- Enables public URLs for Streamlit
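For Option B, the request to the admin can include a concrete identity policy. A sketch built as a Python dict so it can be printed as JSON; `upload_policy`, the bucket name and the prefix are placeholders, it covers only the upload step (not listing or deleting old files), and your organization's policy conventions may differ. The extra permission only helps if the bucket still accepts ACLs and Block Public Access permits public ACLs.

```python
import json


def upload_policy(bucket: str, prefix: str, allow_public_acl: bool = True) -> dict:
    """IAM identity policy allowing uploads under s3://bucket/prefix/.

    Adds s3:PutObjectAcl when public-read ACLs are needed (Option B).
    """
    actions = ["s3:PutObject"]
    if allow_public_acl:
        actions.append("s3:PutObjectAcl")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": actions,
                "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
            }
        ],
    }


# Placeholder bucket/prefix; substitute the notebook's S3_BUCKET and S3_OUTPUT_PREFIX.
print(json.dumps(upload_policy("my-bucket", "outputs"), indent=2))
```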

**Option C** (Workaround): Bucket Policy
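- Set a bucket policy that makes all files (or just the output prefix) publicly readable by default
- No ACL needed per-file
- Requires bucket admin access, and the bucket's Block Public Access settings must allow public bucket policies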
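For Option C, the usual pattern is a bucket policy that grants anonymous `s3:GetObject` on the objects to share. A sketch with placeholder names (`public_read_policy`, `my-bucket`, `outputs`) that scopes public access to the output prefix rather than the whole bucket.

```python
import json


def public_read_policy(bucket: str, prefix: str) -> dict:
    """Bucket policy granting anonymous read (GET) on objects under prefix/."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadOutputs",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix.strip('/')}/*",
            }
        ],
    }


policy_json = json.dumps(public_read_policy("my-bucket", "outputs"))
# An admin could apply it with boto3 (requires s3:PutBucketPolicy):
# boto3.client("s3").put_bucket_policy(Bucket="my-bucket", Policy=policy_json)
```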

## 📋 Quick Summary

### Two Issues Fixed in This Notebook:

#### 1️⃣ **GPT-5 Topic Naming** ❌→✅
- **Problem**: Used wrong API parameters (`max_tokens` vs `max_completion_tokens`)
- **Fix**: Use corrected code from first cell (matches insights.ipynb)
- **Status**: Ready to apply

#### 2️⃣ **S3 Upload Failures** ❌→⚠️
- **Problem**: Most likely a missing `s3:PutObjectAcl` IAM permission
- **Options**:
  - **Quick**: Use `public=False` (Option 2 code)
  - **Proper**: Request IAM permission from AWS admin
- **Status**: Choose your approach and apply code

### 🚀 Next Steps:

1. **Fix GPT-5**: Replace topic naming cell in main notebook with corrected version
2. **Fix Uploads**: Choose Option 1 or Option 2 based on your needs
3. **Test**: Run the full notebook end-to-end
4. **Verify**: Check S3 bucket for uploaded files

Your notebook should run end-to-end after these changes! 🎉