# 📝 Google Forms Creator for SME Evaluation

**Simple Google Colab notebook to create forms for expert evaluation of AI models**

This notebook creates Google Forms based on your sample data for Subject Matter Expert (SME) evaluation.

## 🚀 Quick Start:
1. Upload your JSON data file to Colab
2. Update `PROJECT_ID`, `BATCH_ID`, and `JSON_FILE_PATH` in the configuration cell below
3. Run all cells
4. Get your form URLs!
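
The data file from step 1 is read as a JSON list with one object per sample. The sketch below writes a one-sample template using the field names that the later cells look up with `sample.get(...)`. All values are placeholders, and the output filename is illustrative (in Colab it should match `JSON_FILE_PATH`).

```python
import json

# Field names match the keys read later in this notebook; values are placeholders.
template_sample = {
    "domain": "Example domain",
    "factors": "Example factors",
    "article_content": "Full text of the news article goes here.",
    "reference_score": 3,
    "reference_reasoning": "Reference reasoning text.",
    "model_a_score": 2,
    "model_a_reasoning": "Model A reasoning text.",
    "model_b_score": 4,
    "model_b_reasoning": "Model B reasoning text.",
}

# The notebook iterates over a list of samples, so wrap the template in a list.
template_path = "sample_data_template_1.json"  # illustrative; match JSON_FILE_PATH in Colab
with open(template_path, "w", encoding="utf-8") as f:
    json.dump([template_sample], f, indent=2, ensure_ascii=False)
```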


## 📦 Setup and Authentication


In [None]:
# Install required packages
%pip install -qqq google-auth google-auth-oauthlib google-auth-httplib2 google-api-python-client


In [None]:
import json
import os
from typing import List, Dict, Any, Optional
from google.colab import auth
from googleapiclient.discovery import build
from google.auth import default


In [None]:
# 🔧 CONFIGURATION - UPDATE THESE VALUES
PROJECT_ID = "black-heuristic-xxx-f6"  # Your Google Cloud Project ID
BATCH_ID = 1
JSON_FILE_PATH = f"/content/sample_data_template_{BATCH_ID}.json"

FORM_TITLE = "BEACON LLM Model Evaluation for the Severity Assessment"

print(f"🔧 Project ID: {PROJECT_ID}")
print(f"📂 Data file: {JSON_FILE_PATH}")
print(f"📝 Form title: {FORM_TITLE}")


In [None]:

# Set environment variables
os.environ['GOOGLE_CLOUD_PROJECT'] = PROJECT_ID
os.environ['GCLOUD_PROJECT'] = PROJECT_ID

# Authenticate
auth.authenticate_user()
creds, _ = default()

if hasattr(creds, 'with_quota_project'):
    creds = creds.with_quota_project(PROJECT_ID)

# Build Forms service
forms_service = build('forms', 'v1', credentials=creds)
print("Google Forms API connected successfully!")


## 📊 Load and Prepare Data
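
The form-building cells read every sample field with `sample.get(...)` and a placeholder default, so a missing field shows up in the form as text such as `X/5` rather than as an error. A minimal sketch of a check to run once `sample_data` is loaded, assuming all nine field names used in this notebook are required; `find_missing_fields` is a hypothetical helper, not part of the original notebook.

```python
from typing import Any, Dict, List

# Field names read by the form-building cells in this notebook.
REQUIRED_FIELDS = [
    "domain", "factors", "article_content",
    "reference_score", "reference_reasoning",
    "model_a_score", "model_a_reasoning",
    "model_b_score", "model_b_reasoning",
]

def find_missing_fields(samples: List[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Return {sample_number: [missing field names]} for incomplete samples (1-based)."""
    problems = {}
    for number, sample in enumerate(samples, start=1):
        missing = [name for name in REQUIRED_FIELDS if name not in sample]
        if missing:
            problems[number] = missing
    return problems

# Example with one complete and one incomplete sample (made-up data).
complete = {name: "placeholder" for name in REQUIRED_FIELDS}
incomplete = {"domain": "Example domain"}
issues = find_missing_fields([complete, incomplete])
print(issues)
```

In the notebook this would be called as `find_missing_fields(sample_data)`; an empty result means every sample has all fields.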


In [None]:
# 📂 Load your JSON data
print(f"📂 Loading data from: {JSON_FILE_PATH}")

try:
    with open(JSON_FILE_PATH, 'r', encoding='utf-8') as f:
        sample_data = json.load(f)
    print(f"✅ Loaded {len(sample_data)} samples!")
    
    # Show first sample structure (long values are truncated to 100 characters)
    if sample_data:
        print("\n📋 Sample data structure:")
        first_sample = sample_data[0]
        for key, raw_value in first_sample.items():
            text = str(raw_value)
            value = text[:100] + "..." if len(text) > 100 else text
            print(f"  • {key}: {value}")
            
except FileNotFoundError:
    # Stop here: the remaining cells need sample_data to exist
    print(f"❌ File not found: {JSON_FILE_PATH}")
    raise
    
print(f"\n🎯 Ready to create form with {len(sample_data)} samples")


## 🚀 Create the Google Form


In [None]:
# Add instructions

# Improved format optimized for Google Forms plain text display
instructions = """
OVERVIEW
You are participating in a research study to evaluate AI-generated outbreak severity assessments for news articles about health threats.



WHAT YOU'LL REVIEW

Article: News article from HealthMap's archived reports of historical disease outbreaks

Reference Assessment: Current "gold standard" baseline created by GPT-o1 (advanced AI model)
• Includes severity score and detailed reasoning
• Used as our training reference point

Model A & Model B Assessments: Two randomized AI model outputs for comparison
• One fine-tuned model (BEACON LLM)
• One base model (Llama-3.1-8B)
• Order is randomized to prevent bias


KEY DEFINITIONS

Severity Score Scale (1-5):
1 = Very Low Risk 
2 = Low Risk 
3 = Moderate Risk
4 = High Risk
5 = Very High Risk

Severity Reasoning:
Detailed justification explaining the assigned score based on relevant epidemiological, clinical, and contextual factors for the specific risk domain.


YOUR EVALUATION TASKS

Task 1: Reference Validation
Purpose: Validate our "gold standard" baseline assessment
Your Role: Evaluate whether GPT-o1's risk assessment and reasoning are accurate and appropriate
Importance: Ensures our training data quality

Task 2: Model Comparison
Purpose: Compare two different AI approaches
Your Role: Determine which model provides better risk assessment and reasoning
Focus: Overall quality, not just agreement with reference

Task 3: Expert Assessment (Optional)
Purpose: Provide independent expert judgment
When to Use: When you believe all AI models significantly over- or underestimate the severity
Your Role: Provide your own expert severity score with justification


EVALUATION CRITERIA

Please evaluate based on:
• Scientific Accuracy: Correctness of epidemiological and medical information
• Risk Appropriateness: How well the severity score matches the described threat level
• Reasoning Quality: Completeness and quality of the justification


SUPPORT
For technical issues or questions about this evaluation, please contact: jmyang@bu.edu

Thank you for contributing your expertise to improve BEACON LLM."""

In [None]:
# Step 1: Create basic form
form_request = {"info": {"title": f"{FORM_TITLE} - Batch {BATCH_ID}"}}

form = forms_service.forms().create(body=form_request).execute()
form_id = form['formId']

# Step 2: Add description
try:
    forms_service.forms().batchUpdate(
        formId=form_id,
        body={
            "requests": [{
                "updateFormInfo": {
                    "info": {
                        "title": f"{FORM_TITLE} - Batch {BATCH_ID}",
                        "description": instructions
                    },
                    # Only the fields named in updateMask are changed
                    "updateMask": "description"
                }
            }]
        }
    ).execute()
except Exception as e:
    print(f"Description error: {e}")


In [None]:
# 📝 Add Form Content

all_requests = []
current_index = 0

# Add SME Email field
all_requests.append({
    "createItem": {
        "item": {
            "title": "Your Email (Required)",
            "description": "Please enter your email address",
            "questionItem": {
                "question": {
                    "required": True,
                    "textQuestion": {"paragraph": False}
                }
            }
        },
        "location": {"index": current_index}
    }
})
current_index += 1

# Add questions for each sample - with page breaks and separate article questions
for i, sample in enumerate(sample_data):
    sample_num = i + 1
    
    ### Add page break before each new sample (except the first)
    if i > 0:
        all_requests.append({
            "createItem": {
                "item": {
                    "pageBreakItem": {}
                },
                "location": {"index": current_index}
            }
        })
        current_index += 1
    
    ### Add the sample header with domain and factors only
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Sample {sample_num}",
                "description": f"Domain: {sample.get('domain', 'Unknown Domain')}\n\nFactors: {sample.get('factors', 'Unknown Factors')}",
                "textItem": {}
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1
    
    ### Add article content as a separate text item on the same page
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Article Content - Sample {sample_num}",
                "description": sample.get('article_content', 'Article content'),
                # A text item takes no fields, so it is left empty
                "textItem": {}
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1
    # Add Reference assessment
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Task 1. Reference Assessment (Generated by OpenAI GPT-o1) - Sample {sample_num}",
                "description": f"""
Severity Score: {sample.get('reference_score', 'X')}/5
Severity Reasoning: {sample.get('reference_reasoning', 'Reference reasoning')}""",
                "textItem": {}
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1

    # Reference risk score question
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Q1. Is the reference assessment's risk score appropriate for this scenario? - Sample {sample_num} (Required)",
                "description": f"Severity Score: {sample.get('reference_score', 'X')}/5",
                "questionItem": {
                    "question": {
                        "required": True,
                        "choiceQuestion": {
                            "type": "RADIO",
                            "options": [
                                {"value": "Too low (should be higher)"},
                                {"value": "Appropriate (correct level)"},
                                {"value": "Too high (should be lower)"}
                            ]
                        }
                    }
                }
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1
    
    # Reference reasoning question
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Q2. Rate the accuracy and quality of the Reference reasoning - Sample {sample_num} (Required)",
                "description": f"Severity Reasoning: {sample.get('reference_reasoning', 'Reference reasoning')}",
                "questionItem": {
                    "question": {
                        "required": True,
                        "choiceQuestion": {
                            "type": "RADIO",
                            "options": [
                                {"value": "Poor (Major errors)"},
                                {"value": "Average (Acceptable)"},
                                {"value": "Good (Accurate)"}
                            ]
                        }
                    }
                }
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1


        
    # Model comparison display
    
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Task 2. Model Comparison - Sample {sample_num}",
                "description": f"""
Model A Severity Score: {sample.get('model_a_score', 'Y')}/5\n\n
Model A Severity Reasoning: {sample.get('model_a_reasoning', 'Model A reasoning')}\n\n
Model B Severity Score: {sample.get('model_b_score', 'Z')}/5  \n\n
Model B Severity Reasoning: {sample.get('model_b_reasoning', 'Model B reasoning')}""",
                "textItem": {}
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1
    
    # Model comparison question
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Q3. Which model severity score assessment is more accurate? - Sample {sample_num} (Required)",
                "description": f"Model A Severity Score: {sample.get('model_a_score', 'Y')}/5\n\nModel B Severity Score: {sample.get('model_b_score', 'Z')}/5  \n\n",
                "questionItem": {
                    "question": {
                        "required": True,
                        "choiceQuestion": {
                            "type": "RADIO",
                            "options": [
                                {"value": "Model A is clearly better"},
                                {"value": "Model A is slightly better"},
                                {"value": "Both are equivalent"},
                                {"value": "Model B is slightly better"},
                                {"value": "Model B is clearly better"}
                            ]
                        }
                    }
                }
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1

    # Model reasoning comparison question
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Q4. Which model provides BETTER REASONING and justification? - Sample {sample_num} (Required)",
                "description": f"Model A Severity Reasoning: {sample.get('model_a_reasoning', 'Model A reasoning')}\n\nModel B Severity Reasoning: {sample.get('model_b_reasoning', 'Model B reasoning')}",
                "questionItem": {
                    "question": {
                        "required": True,
                        "choiceQuestion": {
                            "type": "RADIO",
                            "options": [
                                {"value": "Model A reasoning is much better"},
                                {"value": "Model A reasoning is slightly better"},
                                {"value": "Both are equivalent"},
                                {"value": "Model B reasoning is slightly better"},
                                {"value": "Model B reasoning is much better"}
                            ]
                        }
                    }
                }
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1
    
    # Expert score (optional)
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Q5. What risk score would you assign (1-5)? - Sample {sample_num} (Optional)",
                "description": "",
                "questionItem": {
                    "question": {
                        "required": False,
                        "choiceQuestion": {
                            "type": "RADIO",
                            "options": [
                                {"value": "1 - Very low risk"},
                                {"value": "2 - Low risk"},
                                {"value": "3 - Moderate risk"},
                                {"value": "4 - High risk"},
                                {"value": "5 - Very high risk"}
                            ]
                        }
                    }
                }
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1
    
    # Comments
    all_requests.append({
        "createItem": {
            "item": {
                "title": f"Q6. Any critical factors missed by all assessments or any comments? - Sample {sample_num} (Optional)",
                "description": "",
                "questionItem": {
                    "question": {
                        "required": False,
                        "textQuestion": {"paragraph": True}
                    }
                }
            },
            "location": {"index": current_index}
        }
    })
    current_index += 1


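The multiple-choice requests built above all share the same `createItem` shape and differ only in title, description, options, and position. A small factory function can cut that repetition; the sketch below is one possible refactoring, and `make_choice_item` is a hypothetical helper name rather than part of the original notebook.

```python
from typing import Any, Dict, List

def make_choice_item(title: str, description: str, options: List[str],
                     index: int, required: bool = True) -> Dict[str, Any]:
    """Build a createItem request for a single-select (RADIO) question."""
    return {
        "createItem": {
            "item": {
                "title": title,
                "description": description,
                "questionItem": {
                    "question": {
                        "required": required,
                        "choiceQuestion": {
                            "type": "RADIO",
                            "options": [{"value": option} for option in options],
                        },
                    }
                },
            },
            "location": {"index": index},
        }
    }

# Example: the Q1 request for sample 1, with a made-up reference score of 3.
# Index 4 follows the email field, sample header, article, and reference display.
request = make_choice_item(
    title="Q1. Is the reference assessment's risk score appropriate for this scenario? - Sample 1 (Required)",
    description="Severity Score: 3/5",
    options=[
        "Too low (should be higher)",
        "Appropriate (correct level)",
        "Too high (should be lower)",
    ],
    index=4,
)
```

Inside the loop, each question would then become `all_requests.append(make_choice_item(...))` followed by `current_index += 1`.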

In [None]:

batch_size = 5  # Small batches to avoid API limits
success_count = 0

for i in range(0, len(all_requests), batch_size):
    batch = all_requests[i:i+batch_size]
    
    try:
        forms_service.forms().batchUpdate(
            formId=form_id,
            body={"requests": batch}
        ).execute()
        success_count += len(batch)
        print(f"✅ Added {len(batch)} items (total: {success_count}/{len(all_requests)})")
        
    except Exception as e:
        print(f"❌ Batch error: {e}")
        # Try individually
        for j, request in enumerate(batch):
            try:
                forms_service.forms().batchUpdate(
                    formId=form_id,
                    body={"requests": [request]}
                ).execute()
                success_count += 1
                print(f"✅ Added item individually ({success_count}/{len(all_requests)})")
            except Exception as eq:
                print(f"❌ Individual error {i+j+1}: {eq}")

print(f"📊 Successfully added {success_count}/{len(all_requests)} items")

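Each request sent above carries an absolute `location.index` that was computed while the list was built. If one item is rejected, the indices of the items after it may point past the end of the form and be rejected as well. One possible mitigation, offered here as an assumption rather than as the notebook's method, is to assign indices at send time from the number of items already added; `reindex_requests` is a hypothetical helper.

```python
import copy
from typing import Any, Dict, List

def reindex_requests(requests: List[Dict[str, Any]], start_index: int) -> List[Dict[str, Any]]:
    """Return copies of createItem requests with consecutive indices from start_index."""
    reindexed = []
    for offset, request in enumerate(requests):
        updated = copy.deepcopy(request)  # leave the original request untouched
        updated["createItem"]["location"] = {"index": start_index + offset}
        reindexed.append(updated)
    return reindexed

# Example: two pending requests were built assuming 7 items already exist,
# but only 6 were actually added to the form.
pending = [
    {"createItem": {"item": {"title": "A", "textItem": {}}, "location": {"index": 7}}},
    {"createItem": {"item": {"title": "B", "textItem": {}}, "location": {"index": 8}}},
]
fixed = reindex_requests(pending, start_index=6)
print([r["createItem"]["location"]["index"] for r in fixed])
```

In the batching loop, `start_index` could be `success_count`, since the form starts empty and every successful request adds exactly one item.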



## 📋 Form Information and Results
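
Beyond the URLs saved in this section, submitted answers can also be read programmatically. The Forms API returns each submission with an `answers` map keyed by question ID, and the sketch below flattens one such response into a plain dictionary. The sample payload is hand-written to follow that shape with made-up IDs and values, and `flatten_response` is a hypothetical helper.

```python
from typing import Any, Dict

def flatten_response(response: Dict[str, Any]) -> Dict[str, str]:
    """Map each question ID to its text answer(s), joined with '; ' if several."""
    flat = {}
    for question_id, answer in response.get("answers", {}).items():
        text_answers = answer.get("textAnswers", {}).get("answers", [])
        flat[question_id] = "; ".join(a.get("value", "") for a in text_answers)
    return flat

# Hand-written example payload; IDs and values are made up.
example_response = {
    "responseId": "example-response",
    "answers": {
        "q_email": {
            "questionId": "q_email",
            "textAnswers": {"answers": [{"value": "sme@example.org"}]},
        },
        "q_score": {
            "questionId": "q_score",
            "textAnswers": {"answers": [{"value": "Appropriate (correct level)"}]},
        },
    },
}
flat = flatten_response(example_response)
print(flat)
```

With the authenticated `forms_service` from this notebook, the submissions themselves would typically come from `forms_service.forms().responses().list(formId=form_id).execute()`, provided the credentials are allowed to read responses.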


In [None]:
# 💾 Save Form Information - Get correct URLs from API

# Default URLs derived from the form ID
edit_url = f"https://docs.google.com/forms/d/{form_id}/edit"
view_url = f"https://docs.google.com/forms/d/{form_id}/viewform"

# Responses URL - this goes to the responses tab of the edit page
responses_url = f"https://docs.google.com/forms/d/{form_id}/edit#responses"

try:
    # The responderUri from the API is the correct public URL when available
    form_data = forms_service.forms().get(formId=form_id).execute()
    view_url = form_data.get('responderUri', view_url)
except Exception as e:
    # Keep the fallback URLs, but report why the lookup failed
    print(f"⚠️ Could not fetch form data, using fallback URLs: {e}")

form_info = {
    'form_id': form_id,
    'edit_url': edit_url,
    'view_url': view_url,
    'responses_url': responses_url,
    'total_samples': len(sample_data),
    'json_file_used': JSON_FILE_PATH,
    'project_id': PROJECT_ID,
    'form_title': FORM_TITLE
}

# Save to file
with open(f'/content/form_result_{BATCH_ID}.json', 'w') as f:
    json.dump(form_info, f, indent=2)


print(f"✏️  Edit Form: {edit_url}")
print(f"👀 View Form: {view_url}")
print(f"📈 Responses: {responses_url}")

