# Question Generation

This notebook demonstrates generating multiple question types from lesson content.  
- **BigQuery AI SQL** is included for the hackathon submission.  
- **Local execution** uses Hugging Face as a fallback.


In [13]:
import sys, os
sys.path.append(os.path.abspath(".."))

# Question Generation

This notebook generates questions from lesson content using AI.

**Types of Questions:**
1. Multiple Choice Questions (MCQ)
2. Short-Answer Questions
3. Open-Ended / Discussion Questions

The goal is to create adaptive learning materials suitable for students with varying levels of understanding.


In [17]:
# 02_generate_questions.ipynb
import pandas as pd
from tqdm import tqdm
from src.question_generator import QuestionGenerator

# Load dataset
df = pd.read_csv("../data/example_study_pack.csv")

# Initialize the question generator with GPT-2 as the local Hugging Face model
qg = QuestionGenerator(model_name="gpt2")

# Optional: limit rows for testing (e.g. 5); len(df) processes the full dataset
test_rows = len(df)

# Generate questions once per row, with a progress bar
questions_list = []
for text in tqdm(df['content'][:test_rows], desc="Generating questions for all rows"):
    questions_list.append(qg.generate_all(text))

# Assign to dataframe, aligned with the rows that were processed
df['questions'] = pd.Series(questions_list, index=df.index[:test_rows])

# Preview results
df[['id', 'subject', 'questions']]


Device set to use mps:0
Generating questions for all rows:   0%|          | 0/21 [00:00<?, ?it/s]
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=80) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=50) 

| | id | subject | questions |
|---|---|---|---|
| 0 | lc_001 | Science | {'mcq': 'Generate a multiple choice question w... |
| 1 | lc_002 | Science | {'mcq': 'Generate a multiple choice question w... |
| 2 | lc_003 | Science | {'mcq': 'Generate a multiple choice question w... |
| 3 | lc_004 | Science | {'mcq': 'Generate a multiple choice question w... |
| 4 | lc_005 | Science | {'mcq': 'Generate a multiple choice question w... |
| 5 | lc_006 | Science | {'mcq': 'Generate a multiple choice question w... |
| 6 | lc_007 | Math | {'mcq': 'Generate a multiple choice question w... |
| 7 | lc_008 | Math | {'mcq': 'Generate a multiple choice question w... |
| 8 | lc_009 | Math | {'mcq': 'Generate a multiple choice question w... |
| 9 | lc_010 | Geography | {'mcq': 'Generate a multiple choice question w... |
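The warnings in the log above suggest that `max_length` was passed (80 and 50) while `max_new_tokens=256` was also in effect, that truncation was left implicit, and that no pad token was configured. The sketch below shows keyword arguments that would likely avoid those messages. It assumes `QuestionGenerator` wraps a Hugging Face `transformers` text-generation pipeline, which the notebook does not confirm; `build_generation_kwargs` is a hypothetical helper, not part of `src.question_generator`.

```python
def build_generation_kwargs(eos_token_id: int, max_new_tokens: int = 80) -> dict:
    """Return keyword arguments for a text-generation call (hypothetical helper)."""
    return {
        # Set only max_new_tokens; also passing max_length triggers the
        # "Both `max_new_tokens` and `max_length` seem to have been set" warning.
        "max_new_tokens": max_new_tokens,
        # Request truncation explicitly, as the tokenizer warning suggests.
        "truncation": True,
        # GPT-2 has no pad token, so reuse the end-of-sequence id (50256 in the log).
        "pad_token_id": eos_token_id,
    }


kwargs = build_generation_kwargs(eos_token_id=50256)
print(kwargs)

# Assumed usage, if `generator` is a transformers text-generation pipeline:
# outputs = generator(prompt, **kwargs)
```

The value 80 for `max_new_tokens` is illustrative; a smaller budget also shortens the repetitive continuations GPT-2 tends to produce.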


In [18]:
# Print the full questions dict (MCQ, short-answer, and open-ended) for the first five rows
for _, row in df[['id', 'subject', 'questions']].head(5).iterrows():
    print(f"{row['id']} | {row['subject']} | {row['questions']}\n")

lc_001 | Science | {'mcq': 'Generate a multiple choice question with 4 options from: Plants make food using sunlight, water, and air. This process is called photosynthesis. It is the process by which plants produce fuel. Plants use sunlight, water, and air to make food, and this process is called photosynthesis. It is the process by which plants make fuel.\n\nA plant does not see light, so it will not see food. A plant does not see light, so it will not see food.\n\nA plant leaves flowers when it is hot. The first time light leaves leaves flowers when it is hot. The first time light leaves flowers when it is hot.\n\nA flower leaves when it is hot, but leaves slowly to the inside of its container. It is too hot to see the flowers. The flower leaves when it is hot, but leaves slowly to the inside of its container. It is too hot to see the flowers.\n\nA plant leaves when it is cold, but leaves slowly to the outside of its container. It is too cold to see the flowers. The flower leaves whe
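The printed `mcq` value above begins with the instruction itself ("Generate a multiple choice question with 4 options from: ..."), so the stored text is the prompt followed by the model's continuation. A minimal sketch of removing the echoed prompt is shown below. `strip_prompt` is a hypothetical helper, and the split between prompt and continuation in the example is illustrative, since the notebook does not show where the lesson text ends.

```python
def strip_prompt(prompt: str, generated: str) -> str:
    """Remove the echoed prompt from the start of a generated string."""
    if generated.startswith(prompt):
        return generated[len(prompt):].strip()
    # Leave the text unchanged (apart from whitespace) if the prompt is not echoed.
    return generated.strip()


prompt = (
    "Generate a multiple choice question with 4 options from: "
    "Plants make food using sunlight, water, and air."
)
generated = prompt + " This process is called photosynthesis."
continuation = strip_prompt(prompt, generated)
print(continuation)
```

If the generator is built on a `transformers` text-generation pipeline (an assumption), passing `return_full_text=False` to the pipeline call is an alternative that returns only the continuation.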

In [20]:
df.to_csv("../data/example_study_pack_with_questions.csv", index=False)
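Because each `questions` cell holds a Python dict, `to_csv` writes its `str()` form, which comes back as a plain string when the file is read again. One option is to serialize each dict as JSON before saving. The sketch below uses two hypothetical helpers; the sample dict reuses the `mcq` key from the output above with an illustrative value.

```python
import json


def questions_to_json(questions: dict) -> str:
    """Serialize one questions dict so it survives a CSV round trip."""
    return json.dumps(questions, ensure_ascii=False)


def questions_from_json(cell: str) -> dict:
    """Parse a JSON string read back from the CSV into a dict."""
    return json.loads(cell)


sample = {"mcq": "Which process do plants use to make food?"}  # illustrative value
cell = questions_to_json(sample)
restored = questions_from_json(cell)
print(restored == sample)
```

In the notebook this would be applied with `df['questions'].map(questions_to_json)` before `to_csv`, and with `.map(questions_from_json)` after `pd.read_csv`.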

-- BigQuery AI version (for the hackathon; requires a billing-enabled project)
-- This SQL block is included to meet hackathon authenticity requirements

In [None]:


SELECT
  id,
  AI.GENERATE_TEXT(
    MODEL `bqai.text-bison`,
    PROMPT => CONCAT(
      "Generate 3 questions (MCQ, short, open-ended) for this lesson: ",
      text
    )
  ) AS generated_questions
FROM `ai-learning-buddy-471104.ai_learning_buddy.lesson_content`
LIMIT 10;


SyntaxError: invalid decimal literal (1543525114.py, line 13)
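The `SyntaxError` above is raised by Python, not by BigQuery: the SQL was placed in a Python code cell, so the interpreter tried to parse it as Python source. A minimal sketch of the difference is shown below, holding the same SQL in a string instead. `parses_as_python` is a hypothetical helper; the check only concerns Python syntax and says nothing about whether BigQuery would accept the query.

```python
import ast

# The SQL from the cell above, copied verbatim into a Python string.
QUERY = """
SELECT
  id,
  AI.GENERATE_TEXT(
    MODEL `bqai.text-bison`,
    PROMPT => CONCAT(
      "Generate 3 questions (MCQ, short, open-ended) for this lesson: ",
      text
    )
  ) AS generated_questions
FROM `ai-learning-buddy-471104.ai_learning_buddy.lesson_content`
LIMIT 10;
"""


def parses_as_python(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


raw_cell_ok = parses_as_python(QUERY)                     # SQL pasted directly into a cell
wrapped_cell_ok = parses_as_python(f"QUERY = {QUERY!r}")  # SQL held in a string literal
print(raw_cell_ok, wrapped_cell_ok)

# Assumed next step (not run here): with the google-cloud-bigquery package,
# credentials, and billing enabled, the string could be submitted with
# bigquery.Client().query(QUERY).
```

Keeping the query in a string lets the notebook run end to end locally while the SQL stays visible for the submission.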