# Japanese Vocabulary Quiz with Google Gemini

This notebook uses Google Gemini to extract vocabulary from a Japanese vocabulary PDF and turn it into interactive quizzes (multiple choice, typing, and Gemini-generated), plus a simple spaced repetition review to support retention.

## 1. Setup and Installation

First, let's install the necessary packages for our application.

In [1]:
%pip install --upgrade --user --quiet google-genai python-dotenv pandas gdown

Note: you may need to restart the kernel to use updated packages.


### Restart Kernel

Let's restart the kernel to ensure the newly installed packages are available in this Jupyter session.

In [None]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

<div class="alert alert-block alert-warning">
<b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️</b>
</div>

## 2. Authentication and Configuration

Set up Google Cloud project information and create a Gemini client.

In [1]:
import os
import sys
import json
import random
import pandas as pd
from IPython.display import Markdown, display
from google import genai
from google.genai.types import GenerateContentConfig, Part
from pydantic import BaseModel, Field
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Check if running in Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth
    auth.authenticate_user()

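# Set up project information
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
if not PROJECT_ID or PROJECT_ID == "[your-project-id]":
    # Fail early with a clear message instead of passing the string "None" to the client
    raise ValueError("Set GOOGLE_CLOUD_PROJECT in your environment or .env file.")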

LOCATION = os.environ.get("GOOGLE_CLOUD_REGION", "us-central1")

# Create Gemini client
client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

# Define constants
MODEL_ID = "gemini-2.0-flash"  # Efficient model for most tasks
PDF_MIME_TYPE = "application/pdf"
JSON_MIME_TYPE = "application/json"

## 3. Download Vocabulary PDF from Google Drive

This section downloads your vocabulary PDF from Google Drive with `gdown`, reading the file ID from the `FILE_ID` environment variable.

In [41]:
import gdown
from dotenv import load_dotenv
import os

def download_from_google_drive(file_id, output_file="japanese_vocab.pdf"):
    """Download a file from Google Drive using its file ID"""
    try:
        url = f"https://drive.google.com/uc?id={file_id}"
        gdown.download(url, output_file, quiet=False)
        print(f"File downloaded successfully to {output_file}")
        return output_file
    except Exception as e:
        print(f"Error downloading file: {e}")
        return None

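# Example usage
# Set FILE_ID in your .env file to the Google Drive file ID of your PDF
load_dotenv()
file_id = os.getenv("FILE_ID")
# Skip the download when FILE_ID is missing so we never request "uc?id=None"
pdf_file = download_from_google_drive(file_id) if file_id else None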

Downloading...
From: https://drive.google.com/uc?id=1B38H78uxRucrBcL1zPbfyW41Y4XHEfsH
To: /home/justin/AThinkWsl/programming/google_ai/japanese_vocab.pdf
100%|██████████| 207k/207k [00:00<00:00, 2.47MB/s]

File downloaded successfully to japanese_vocab.pdf
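
The download function expects a bare file ID, while Drive usually hands out a full share link. The sketch below shows one way to recover the ID, assuming the two common link shapes (`/file/d/<id>/...` and `?id=<id>`). `extract_drive_file_id` is a hypothetical helper, not part of `gdown`, and the ID in the usage note is a made-up placeholder.

```python
import re
from typing import Optional

# Hypothetical helper (not part of gdown): covers two common share-link shapes.
_DRIVE_ID_PATTERNS = [
    re.compile(r"/file/d/([A-Za-z0-9_-]+)"),  # .../file/d/<id>/view
    re.compile(r"[?&]id=([A-Za-z0-9_-]+)"),   # ...uc?id=<id> or open?id=<id>
]

def extract_drive_file_id(link_or_id: str) -> Optional[str]:
    """Return the Drive file ID from a share link, or the input if it already looks like an ID."""
    text = link_or_id.strip()
    for pattern in _DRIVE_ID_PATTERNS:
        match = pattern.search(text)
        if match:
            return match.group(1)
    # A bare ID contains only URL-safe characters (assumed minimum length: 10).
    if re.fullmatch(r"[A-Za-z0-9_-]{10,}", text):
        return text
    return None
```

For example, `extract_drive_file_id("https://drive.google.com/file/d/abc123DEF456_-xyz/view")` yields the ID portion, which can then be stored as `FILE_ID` in the `.env` file.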





## 4. Extract Vocabulary from PDF

Use Gemini to extract vocabulary pairs from the PDF document.

In [3]:
# Define vocabulary schema for structured extraction
class VocabularyItem(BaseModel):
    japanese: str = Field(..., description="The Japanese word/phrase")
    reading: str | None = Field(None, description="The hiragana/katakana reading of the word/phrase")
    english: str = Field(..., description="The English translation/meaning")

class VocabularyList(BaseModel):
    vocab_items: list[VocabularyItem] = Field(..., description="List of vocabulary items")

In [12]:
def extract_vocabulary_from_pdf(pdf_file_path):
    """Extract vocabulary pairs from a PDF using Gemini"""
    
    # System instruction for vocabulary extraction
    system_instruction = """
    You are a Japanese language specialist. Your task is to extract Japanese vocabulary 
    items from this document. For each item, provide:
    1. The Japanese word/phrase
    2. The reading in hiragana/katakana if available
    3. The English translation
    
    Extract as many vocabulary items as you can find in the document.
    If the reading is not provided in the document, leave it null.
    """
    
    # Read the file
    try:
        with open(pdf_file_path, "rb") as f:
            file_bytes = f.read()
        
        print("Extracting vocabulary from PDF...")
        # Send to Gemini API
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=[
                "Extract all Japanese vocabulary with English translations from this document.",
                Part.from_bytes(data=file_bytes, mime_type=PDF_MIME_TYPE),
            ],
            config=GenerateContentConfig(
                system_instruction=system_instruction,
                temperature=0,
                response_schema=VocabularyList,
                response_mime_type=JSON_MIME_TYPE,
            )
        )
        
        # Return the parsed vocabulary list
        return response.parsed
        
    except Exception as e:
        print(f"Error extracting vocabulary: {e}")
        return None

In [13]:
# Extract vocabulary from the downloaded PDF
if pdf_file:
    vocabulary_list = extract_vocabulary_from_pdf(pdf_file)
    if vocabulary_list:
        print(f"\n✅ Successfully extracted {len(vocabulary_list.vocab_items)} vocabulary items")
        
        # Display the first few items
        display(Markdown("### Sample Extracted Vocabulary"))
        
        df = pd.DataFrame([{
            "Japanese": item.japanese,
            "Reading": item.reading if item.reading else "",
            "English": item.english
        } for item in vocabulary_list.vocab_items[:10]])  # Show first 10 items
        
        display(df)
    else:
        print("Failed to extract vocabulary from the PDF.")
else:
    print("Please provide a valid PDF file.")

Extracting vocabulary from PDF...

✅ Successfully extracted 52 vocabulary items


### Sample Extracted Vocabulary

| | Japanese | Reading | English |
|---|---|---|---|
| 0 | あう | | encounter |
| 1 | ふこう | | unhappiness |
| 2 | ちょきんする | 貯金する | save money |
| 3 | まいつき | | every month |
| 4 | きゅうりょう | | salary |
| 5 | そしき | 組織 | organization |
| 6 | すぎる | 過ぎる | pass[time] |
| 7 | なれる | 慣れる | get used to |
| 8 | くさる | 腐る | [food] rot |
| 9 | このごろ | | these days,‘最近’ |
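
In the sample output above, several rows carry the kana form in the Japanese column and the kanji form in the Reading column, which is the reverse of what the schema descriptions ask for. The following is a minimal sketch of a post-processing check, assuming that a kanji-bearing `reading` paired with a kana-only `japanese` means the two fields were swapped. `contains_kanji` and `normalize_pair` are hypothetical helpers, and whether swapping is appropriate depends on how your PDF is laid out.

```python
from typing import Optional, Tuple

def contains_kanji(text: str) -> bool:
    """True if any character is in the CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in text)

def normalize_pair(japanese: str, reading: Optional[str]) -> Tuple[str, Optional[str]]:
    """Swap the fields when the reading holds kanji and the word itself does not."""
    if reading and contains_kanji(reading) and not contains_kanji(japanese):
        return reading, japanese
    return japanese, reading

# Example: the row "ちょきんする / 貯金する" becomes word "貯金する" with reading "ちょきんする".
word, reading = normalize_pair("ちょきんする", "貯金する")
```

Rows without a reading, such as `あう`, pass through unchanged.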


## Save markdown file

In [35]:
def save_markdown(content, filename="output.md"):
    """
    Save the given content to a Markdown file.

    Args:
        content (str): The content to save in Markdown format.
        filename (str): The name of the file to save the content to (default: "output.md").
    """
    try:
        # Write as UTF-8 so Japanese text is saved correctly on every platform
        with open(filename, "w", encoding="utf-8") as f:
            f.write(content)
        print(f"✅ Content successfully saved to {filename}")
    except Exception as e:
        print(f"❌ Failed to save content: {e}")

## 5. Generate Quiz Questions

Create different types of quizzes from the extracted vocabulary.

In [14]:
def create_multiple_choice_quiz(vocab_items, num_questions=10, direction="j2e"):
    """Create a multiple-choice quiz from vocabulary items
    
    Args:
        vocab_items: List of vocabulary items
        num_questions: Number of questions to generate
        direction: 'j2e' for Japanese to English, 'e2j' for English to Japanese
    """
    if len(vocab_items) < num_questions:
        num_questions = len(vocab_items)
    
    # Sample vocabulary for questions
    selected_items = random.sample(vocab_items, num_questions)
    
    quiz_questions = []
    for i, item in enumerate(selected_items):
        # Create a question
        if direction == "j2e":
            question = f"Q{i+1}: What is the meaning of '{item.japanese}'?"
            correct_answer = item.english
            # Sample other items for wrong options (deduplicated so no option appears twice)
            other_options = list(dict.fromkeys(v.english for v in vocab_items if v.english != correct_answer))
            if len(other_options) < 3:
                other_options = [f"Wrong option {j}" for j in range(3)]
        else:  # e2j
            question = f"Q{i+1}: What is the Japanese word for '{item.english}'?"
            correct_answer = item.japanese
            # Sample other items for wrong options (deduplicated so no option appears twice)
            other_options = list(dict.fromkeys(v.japanese for v in vocab_items if v.japanese != correct_answer))
            if len(other_options) < 3:
                other_options = [f"Wrong option {j}" for j in range(3)]
        
        # Create options (1 correct + 3 wrong)
        options = random.sample(other_options, 3) + [correct_answer]
        random.shuffle(options)
        
        correct_index = options.index(correct_answer)
        option_letters = ['A', 'B', 'C', 'D']
        
        # Format options
        formatted_options = {}
        for j, opt in enumerate(options):
            formatted_options[option_letters[j]] = opt
        
        quiz_questions.append({
            "question": question,
            "options": formatted_options,
            "correct_answer": option_letters[correct_index]
        })
    
    return quiz_questions

def create_typing_quiz(vocab_items, num_questions=10, direction="j2e"):
    """Create a typing quiz from vocabulary items"""
    if len(vocab_items) < num_questions:
        num_questions = len(vocab_items)
    
    # Sample vocabulary for questions
    selected_items = random.sample(vocab_items, num_questions)
    
    quiz_questions = []
    for i, item in enumerate(selected_items):
        if direction == "j2e":
            question = f"Q{i+1}: Translate '{item.japanese}' to English"
            answer = item.english
        else:  # e2j
            question = f"Q{i+1}: Translate '{item.english}' to Japanese"
            answer = item.japanese
        
        quiz_questions.append({
            "question": question,
            "answer": answer
        })
    
    return quiz_questions

def create_context_quiz(vocab_items, num_questions=5):
    """Create a context-based quiz using Gemini to generate example sentences"""
    if len(vocab_items) < num_questions:
        num_questions = len(vocab_items)
    
    # Sample vocabulary for questions
    selected_items = random.sample(vocab_items, num_questions)
    
    context_questions = []
    for item in selected_items:
        # Use Gemini to generate a context sentence
        system_instruction = """
        You are a Japanese language teacher. Create a simple example sentence in Japanese 
        using the given vocabulary word. Provide:
        1. The Japanese sentence
        2. The romaji reading
        3. The English translation
        4. A fill-in-the-blank version where the target word is replaced with ___
        """
        
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=f"Create an example sentence using the Japanese word '{item.japanese}' which means '{item.english}'.",
            config=GenerateContentConfig(system_instruction=system_instruction, temperature=0.2)
        )
        
        context_questions.append({
            "vocabulary": item,
            "context": response.text
        })
    
    return context_questions
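
The wrong options in the multiple-choice quiz are drawn from the other vocabulary entries. A standalone sketch of that selection step follows, assuming you want duplicates removed and placeholders used only to pad a short pool (the quiz function above instead replaces the whole pool with placeholders when fewer than three candidates exist). `pick_distractors` is a hypothetical helper name.

```python
import random
from typing import List, Optional

def pick_distractors(correct: str, pool: List[str], k: int = 3,
                     rng: Optional[random.Random] = None) -> List[str]:
    """Pick k distinct wrong options from pool, never including the correct answer."""
    chooser = rng or random
    # dict.fromkeys drops duplicates while keeping first-seen order
    candidates = [c for c in dict.fromkeys(pool) if c != correct]
    # Pad with labelled placeholders only when the pool is too small
    while len(candidates) < k:
        candidates.append(f"(no option {len(candidates) + 1})")
    return chooser.sample(candidates, k)
```

Passing a seeded `random.Random` makes a quiz reproducible, which helps when comparing results across sessions.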

In [15]:
def generate_gemini_quiz(vocab_items, quiz_type="multiple_choice", num_questions=10, language_direction="mixed"):
    """Generate a complete quiz using Gemini's creative capabilities"""
    
    # Prepare vocabulary data for the prompt
    vocab_sample = []
    selected_items = random.sample(vocab_items, min(20, len(vocab_items)))
    for item in selected_items:
        if item.reading:
            vocab_sample.append(f"{item.japanese} ({item.reading}) - {item.english}")
        else:
            vocab_sample.append(f"{item.japanese} - {item.english}")
    
    vocab_text = "\n".join(vocab_sample)
    
    system_instruction = """
    You are an expert Japanese language teacher creating interactive quizzes for students.
    Your quiz should be engaging, educational, and tailored to help students remember vocabulary effectively.
    Include clear instructions, well-formatted questions, and an answer key at the end.
    """
    
    prompt = f"""
    Create a {quiz_type} quiz with {num_questions} questions using these Japanese vocabulary items:
    
    {vocab_text}
    
    The quiz should include {language_direction} direction questions (Japanese to English and/or English to Japanese).
    For multiple choice questions, include 4 options per question with only one correct answer.
    For fill-in-the-blank questions, provide context sentences in both Japanese and English.
    Format the quiz neatly with clear sections for questions and include a separate answer key at the end.
    """
    
    response = client.models.generate_content(
        model=MODEL_ID,
        contents=prompt,
        config=GenerateContentConfig(system_instruction=system_instruction, temperature=0.2)
    )
    
    return response.text
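
Because the prompt asks for a separate answer key at the end, the generated text can be split so the key is hidden until the learner has finished. The sketch below assumes the key starts at a line consisting only of the words "Answer Key" plus punctuation or markup, such as `**Answer Key:**`. Gemini's formatting is not guaranteed, so `split_answer_key` (a hypothetical helper) returns an empty key when no such line is found.

```python
import re
from typing import Tuple

# Matches a line such as "**Answer Key:**" or "## Answer Key" (assumed format).
_ANSWER_KEY_HEADING = re.compile(
    r"^[^\w\n]*answer key[^\w\n]*$", re.IGNORECASE | re.MULTILINE
)

def split_answer_key(quiz_text: str) -> Tuple[str, str]:
    """Return (questions, answer_key); answer_key is "" when no heading is found."""
    match = _ANSWER_KEY_HEADING.search(quiz_text)
    if not match:
        return quiz_text.strip(), ""
    return quiz_text[:match.start()].strip(), quiz_text[match.end():].strip()
```

One possible use is to display only the first element, then reveal the second in a later cell.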

## 6. Interactive Quiz Interface

Create an interactive interface for taking the quiz.

In [16]:
def run_multiple_choice_quiz(vocab_items, num_questions=5, direction="j2e"):
    """Run an interactive multiple-choice quiz in the notebook"""
    questions = create_multiple_choice_quiz(vocab_items, num_questions, direction)
    correct_count = 0
    
    print(f"{'='*40}")
    print(f"JAPANESE VOCABULARY QUIZ - MULTIPLE CHOICE")
    print(f"{'='*40}\n")
    
    for q in questions:
        print(q["question"])
        for letter, option in q["options"].items():
            print(f"{letter}. {option}")
        
        # Get user input
        user_answer = input("\nYour answer (A/B/C/D): ").strip().upper()
        
        # Check answer
        if user_answer == q["correct_answer"]:
            print("✓ Correct!\n")
            correct_count += 1
        else:
            print(f"✗ Incorrect. The correct answer is {q['correct_answer']}.\n")
    
    # Show final score
    print(f"{'='*40}")
    print(f"Quiz completed! Your score: {correct_count}/{len(questions)} ({correct_count/len(questions)*100:.1f}%)")
    print(f"{'='*40}")
    
    return correct_count / len(questions)

In [9]:
def run_typing_quiz(vocab_items, num_questions=5, direction="j2e"):
    """Run an interactive typing quiz in the notebook"""
    questions = create_typing_quiz(vocab_items, num_questions, direction)
    correct_count = 0
    
    print(f"{'='*40}")
    print(f"JAPANESE VOCABULARY QUIZ - TYPING")
    print(f"{'='*40}\n")
    
    for q in questions:
        print(q["question"])
        
        # Get user input
        user_answer = input("Your answer: ").strip()
        
        # Use Gemini to evaluate the answer with some flexibility
        system_instruction = """
        You are a Japanese language evaluation assistant. Your task is to determine if 
        the user's answer is close enough to be considered correct, even if there are 
        minor spelling variations or synonyms used. For Japanese answers, ignore small 
        differences in kanji usage or hiragana/katakana variations if the meaning is preserved.
        Respond with only 'CORRECT' or 'INCORRECT' followed by a brief explanation.
        """
        
        evaluation_prompt = f"""
        Question: {q['question']}
        Expected answer: {q['answer']}
        User's answer: {user_answer}
        
        Is the user's answer correct or close enough to be considered correct?
        """
        
        response = client.models.generate_content(
            model=MODEL_ID,
            contents=evaluation_prompt,
            config=GenerateContentConfig(system_instruction=system_instruction, temperature=0)
        )
        
        # Process evaluation result (response.text can be None when no text is returned)
        if (response.text or "").strip().upper().startswith("CORRECT"):
            print("✓ Correct!")
            print(f"Full answer: {q['answer']}\n")
            correct_count += 1
        else:
            print("✗ Incorrect.")
            print(f"Correct answer: {q['answer']}\n")
    
    # Show final score
    print(f"{'='*40}")
    print(f"Quiz completed! Your score: {correct_count}/{len(questions)} ({correct_count/len(questions)*100:.1f}%)")
    print(f"{'='*40}")
    
    return correct_count / len(questions)
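
The typing quiz sends every answer to Gemini for grading. A cheap local check can accept exact matches first and reserve the API call for answers that need judgment. This is a sketch under stated assumptions: bracketed notes such as `[food]` are treated as optional, and commas, semicolons, and slashes are treated as separators between acceptable variants. `answer_variants` and `is_exact_match` are hypothetical helpers.

```python
import re
from typing import Set

def answer_variants(expected: str) -> Set[str]:
    """Split an expected answer into normalized acceptable variants."""
    # Drop bracketed notes such as "[food]" or "[time]"
    without_notes = re.sub(r"\[[^\]]*\]", " ", expected)
    parts = re.split(r"[,;/]", without_notes)
    # Collapse whitespace and compare case-insensitively
    return {" ".join(p.split()).casefold() for p in parts if p.strip()}

def is_exact_match(user_answer: str, expected: str) -> bool:
    """True when the normalized user answer equals one of the expected variants."""
    normalized = " ".join(user_answer.split()).casefold()
    return bool(normalized) and normalized in answer_variants(expected)
```

In the quiz loop, `is_exact_match(user_answer, q["answer"])` could be tried before the Gemini call, so only near misses and synonyms are sent for evaluation.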

In [36]:
def run_gemini_quiz(vocab_items):
    """Run a Gemini-generated quiz"""
    # Set quiz parameters
    quiz_type = input("Quiz type (multiple_choice, fill_in_blank, mixed): ").strip()
    num_questions = int(input("Number of questions: ").strip())
    direction = input("Direction (j2e, e2j, mixed): ").strip()
    
    print("\nGenerating quiz...\n")
    quiz = generate_gemini_quiz(vocab_items, quiz_type, num_questions, direction)
    
    # Display the quiz
    display(Markdown("## Japanese Vocabulary Quiz"))
    display(Markdown(quiz))
    
    # Save the quiz to a Markdown file
    save_markdown(quiz, filename="gemini_quiz.md")
    
    print("\nTake the quiz above and check your answers against the provided answer key.")
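
`run_gemini_quiz` converts the typed question count with `int(...)`, which raises `ValueError` on empty or non-numeric input. A minimal sketch of a more forgiving parser follows. `parse_question_count` is a hypothetical helper, and the default of 5 and cap of 20 are assumptions (20 mirrors the number of vocabulary items sampled into the prompt).

```python
def parse_question_count(raw: str, default: int = 5, maximum: int = 20) -> int:
    """Parse a question count, falling back to default and clamping to 1..maximum."""
    try:
        value = int(raw.strip())
    except ValueError:
        # Empty or non-numeric input: use the default instead of crashing
        return default
    return max(1, min(value, maximum))
```

It could replace the direct conversion as `num_questions = parse_question_count(input("Number of questions: "))`.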

## 7. Run a Quiz

Let's run a quiz with the extracted vocabulary. Choose the type of quiz you want to take.

In [38]:
if 'vocabulary_list' in locals() and vocabulary_list and hasattr(vocabulary_list, 'vocab_items'):
    print("Choose a quiz type:")
    print("1. Multiple Choice Quiz (Japanese to English)")
    print("2. Multiple Choice Quiz (English to Japanese)")
    print("3. Typing Quiz (Japanese to English)")
    print("4. Typing Quiz (English to Japanese)")
    print("5. Gemini-Generated Custom Quiz")
    print("6. Generate and Export Quiz to PDF")
    
    choice = input("\nEnter your choice (1-6): ").strip()
    
    if choice == "1":
        run_multiple_choice_quiz(vocabulary_list.vocab_items, num_questions=5, direction="j2e")
    elif choice == "2":
        run_multiple_choice_quiz(vocabulary_list.vocab_items, num_questions=5, direction="e2j")
    elif choice == "3":
        run_typing_quiz(vocabulary_list.vocab_items, num_questions=5, direction="j2e")
    elif choice == "4":
        run_typing_quiz(vocabulary_list.vocab_items, num_questions=5, direction="e2j")
    elif choice == "5":
        run_gemini_quiz(vocabulary_list.vocab_items)
    elif choice == "6":
        run_export_quiz_menu()
    else:
        print("Invalid choice. Please run the cell again and choose a number between 1 and 6.")
else:
    print("No vocabulary list available. Please run the vocabulary extraction section first.")

Choose a quiz type:
1. Multiple Choice Quiz (Japanese to English)
2. Multiple Choice Quiz (English to Japanese)
3. Typing Quiz (Japanese to English)
4. Typing Quiz (English to Japanese)
5. Gemini-Generated Custom Quiz
6. Generate and Export Quiz to PDF

Generating quiz...



## Japanese Vocabulary Quiz

Okay, here is a fill-in-the-blank quiz designed to help students practice the provided Japanese vocabulary.

**Instructions:**

*   Read each sentence carefully.
*   Fill in the blank with the most appropriate word from the vocabulary list provided below.
*   Write your answers in Japanese.
*   Some questions may require you to provide the correct form of the word.
*   Good luck!

**Vocabulary List:**

*   なれる (慣れる)
*   チャレンジする
*   はこぶ (運ぶ)
*   かならず (必ず)
*   やっと
*   いっしょうけんめい (一生懸命)
*   ふこう (不幸)
*   やせる (痩せる)
*   あう (会う)
*   おとず (訪)
*   とおく (遠く)
*   のぞ (望)
*   きもち (気持ち)
*   けがにん (けが人)
*   かなり
*   かんこうきゃく (観光客)
*   じゅうたく (住宅)
*   きんがん (近眼)
*   ねむ (眠)
*   ひていけい (否定形)

**Quiz:**

**Part 1: Fill in the Blank**

1.  Japanese: 日本での生活に＿＿＿ 必要があります。
    English: You need to \_\_\_\_\_\_ living in Japan.
    Answer: \_\_\_\_\_\_

2.  Japanese: 彼は＿＿＿ 勉強して、試験に合格しました。
    English: He studied \_\_\_\_\_\_ and passed the exam.
    Answer: \_\_\_\_\_\_

3.  Japanese: ＿＿＿ 彼女に手紙を書きます。
    English: I will write a letter to her \_\_\_\_\_\_.
    Answer: \_\_\_\_\_\_

4.  Japanese: 新しい仕事に＿＿＿ ことにしました。
    English: I decided to \_\_\_\_\_\_ a new job.
    Answer: \_\_\_\_\_\_

5.  Japanese: ＿＿＿ 目的地に着きました。
    English: I \_\_\_\_\_\_ arrived at my destination.
    Answer: \_\_\_\_\_\_

**Answer Key:**

1.  慣れる (なれる)
2.  一生懸命 (いっしょうけんめい)
3.  必ず (かならず)
4.  チャレンジする
5.  やっと

I hope this quiz is helpful for your students! Let me know if you'd like any modifications or additional questions.


✅ Content successfully saved to gemini_quiz.md

Take the quiz above and check your answers against the provided answer key.


## 8. Save Vocabulary for Later Use

Save the extracted vocabulary to a CSV file for future use.

In [28]:
def save_vocabulary_to_csv(vocab_items, filename="japanese_vocabulary.csv"):
    """Save vocabulary to a CSV file"""
    try:
        df = pd.DataFrame([{
            "japanese": item.japanese,
            "reading": item.reading if item.reading else "",
            "english": item.english
        } for item in vocab_items])
        
        df.to_csv(filename, index=False)
        print(f"Vocabulary saved to {filename}")
        return True
    except Exception as e:
        print(f"Error saving vocabulary: {e}")
        return False

# Save vocabulary if available
if 'vocabulary_list' in locals() and vocabulary_list and hasattr(vocabulary_list, 'vocab_items'):
    save_vocabulary_to_csv(vocabulary_list.vocab_items)
else:
    print("No vocabulary list available to save.")

Vocabulary saved to japanese_vocabulary.csv
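
To reuse the saved vocabulary in a later session without calling Gemini again, the CSV can be read back into objects with the same `japanese`, `reading`, and `english` attributes the quiz functions access. The sketch below uses only the standard library. `VocabRow` and `load_vocabulary_from_csv` are hypothetical names, and the file is assumed to be UTF-8 with the three column headers written above.

```python
import csv
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class VocabRow:
    japanese: str
    reading: Optional[str]
    english: str

def load_vocabulary_from_csv(filename: str = "japanese_vocabulary.csv") -> List[VocabRow]:
    """Read the saved CSV back into a list of vocabulary rows."""
    rows: List[VocabRow] = []
    with open(filename, newline="", encoding="utf-8") as f:
        for record in csv.DictReader(f):
            rows.append(VocabRow(
                japanese=record["japanese"],
                reading=record["reading"] or None,  # empty cell -> None
                english=record["english"],
            ))
    return rows
```

The resulting list can be passed wherever `vocabulary_list.vocab_items` is used, for example `run_multiple_choice_quiz(load_vocabulary_from_csv())`.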


## 9. Additional Features

### 9.1 Spaced Repetition System

The following cell implements a simple spaced repetition system that tracks which words you struggle with and quizzes you more on those.

In [29]:
class SpacedRepetitionSystem:
    """A simple spaced repetition system for vocabulary learning"""
    
    def __init__(self, vocab_items):
        self.vocab_items = vocab_items
        # Initialize word strengths (higher = better known)
        self.word_strengths = {i: 1.0 for i in range(len(vocab_items))}
        self.history = []
    
    def select_words_for_review(self, num_words=10):
        """Select words for review, prioritizing less known words"""
        # Sort indices by strength (ascending)
        sorted_indices = sorted(self.word_strengths.keys(), 
                               key=lambda x: self.word_strengths[x])
        
        # Select primarily from weaker words, with some randomness
        review_indices = sorted_indices[:int(num_words * 0.7)]  # 70% weakest words
        remaining = sorted_indices[int(num_words * 0.7):]
        if len(review_indices) < num_words and remaining:
            review_indices += random.sample(remaining, 
                                          min(num_words - len(review_indices), len(remaining)))
        
        # Return actual vocabulary items
        return [(i, self.vocab_items[i]) for i in review_indices[:num_words]]
    
    def update_strength(self, index, correct):
        """Update word strength based on quiz result"""
        current_strength = self.word_strengths[index]
        
        if correct:
            # Increase strength if correct, max at 10
            new_strength = min(current_strength + 0.5, 10.0)
        else:
            # Decrease strength if incorrect, min at 0.5
            new_strength = max(current_strength - 1.0, 0.5)
        
        self.word_strengths[index] = new_strength
        self.history.append({
            "index": index,
            "word": self.vocab_items[index].japanese,
            "correct": correct,
            "old_strength": current_strength,
            "new_strength": new_strength
        })
    
    def run_review_session(self, num_words=10, direction="j2e"):
        """Run an interactive review session"""
        words_to_review = self.select_words_for_review(num_words)
        
        print(f"{'='*40}")
        print("SPACED REPETITION REVIEW SESSION")
        print(f"{'='*40}\n")
        
        for index, item in words_to_review:
            if direction == "j2e":
                print(f"Translate: {item.japanese}")
                expected = item.english
            else:  # e2j
                print(f"Translate: {item.english}")
                expected = item.japanese
            
            user_answer = input("Your answer: ").strip()
            
            # Evaluate using Gemini
            system_instruction = """
            You are a Japanese language evaluation assistant. Determine if 
            the user's answer is close enough to be considered correct. Respond 
            with only 'CORRECT' or 'INCORRECT' followed by a brief explanation.
            """
            
            evaluation_prompt = f"""
            Expected answer: {expected}
            User's answer: {user_answer}
            
            Is the user's answer correct or close enough to be considered correct?
            """
            
            response = client.models.generate_content(
                model=MODEL_ID,
                contents=evaluation_prompt,
                config=GenerateContentConfig(system_instruction=system_instruction, temperature=0)
            )
            
            # response.text can be None when no text is returned, so fall back to ""
            is_correct = (response.text or "").strip().upper().startswith("CORRECT")
            
            if is_correct:
                print("✓ Correct!")
            else:
                print(f"✗ Incorrect. The correct answer is: {expected}")
            
            # Update word strength
            self.update_strength(index, is_correct)
            
            # Add reading if available
            if item.reading and direction == "j2e":
                print(f"Reading: {item.reading}")
            
            print("\n" + "-"*30 + "\n")
        
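        # Show summary, based on the words actually reviewed (may be fewer than num_words)
        reviewed = len(words_to_review)
        correct_count = sum(1 for h in self.history[-reviewed:] if h["correct"]) if reviewed else 0
        percent = correct_count / reviewed * 100 if reviewed else 0.0
        print(f"Review completed! Score: {correct_count}/{reviewed} ({percent:.1f}%)")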
        print("Words have been updated in the spaced repetition system.")
        print(f"{'='*40}")
        
    def get_word_status_report(self):
        """Generate a report on current word strengths"""
        categories = {
            "Needs Work (0-3)": [],
            "Learning (3-7)": [],
            "Well Known (7-10)": []
        }
        
        for index, strength in self.word_strengths.items():
            word = self.vocab_items[index].japanese
            meaning = self.vocab_items[index].english
            
            if strength < 3:
                categories["Needs Work (0-3)"].append((word, meaning, strength))
            elif strength < 7:
                categories["Learning (3-7)"].append((word, meaning, strength))
            else:
                categories["Well Known (7-10)"].append((word, meaning, strength))
        
        return categories
    
    def display_report(self):
        """Display a formatted report of word statuses"""
        report = self.get_word_status_report()
        
        print(f"{'='*60}")
        print("VOCABULARY PROGRESS REPORT")
        print(f"{'='*60}\n")
        
        for category, words in report.items():
            print(f"\n{category}: {len(words)} words")
            print("-" * 40)
            
            if words:
                # Sort by strength
                sorted_words = sorted(words, key=lambda x: x[2])
                
                # Display up to 5 examples
                for word, meaning, strength in sorted_words[:5]:
                    print(f"{word} - {meaning}: {strength:.1f}")
                
                if len(sorted_words) > 5:
                    print(f"...and {len(sorted_words) - 5} more words")
            else:
                print("No words in this category yet.")
        
        print(f"\n{'='*60}")
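
The word strengths live only in memory, so they are lost when the kernel restarts. One way to keep progress between sessions is to write the `word_strengths` dictionary to a JSON file. This is a sketch, not part of the class above: `save_strengths`, `load_strengths`, and the file name `srs_strengths.json` are hypothetical, and because strengths are keyed by list index they only line up if the vocabulary order is unchanged.

```python
import json
from typing import Dict

def save_strengths(word_strengths: Dict[int, float], path: str = "srs_strengths.json") -> None:
    """Write index -> strength pairs to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_strengths, f)

def load_strengths(path: str = "srs_strengths.json") -> Dict[int, float]:
    """Read strengths back, converting JSON's string keys to integer indices."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    return {int(index): float(strength) for index, strength in raw.items()}
```

After creating an instance, `srs.word_strengths.update(load_strengths())` would restore earlier progress, and `save_strengths(srs.word_strengths)` would store it after a review session.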

In [34]:
# Initialize and run the spaced repetition system if vocabulary is available
if 'vocabulary_list' in locals() and vocabulary_list and hasattr(vocabulary_list, 'vocab_items'):
    # Create SRS instance
    srs = SpacedRepetitionSystem(vocabulary_list.vocab_items)
    
    # Run an initial review session
    num_words = int(input("How many words would you like to review? ").strip())
    direction = input("Direction (j2e or e2j): ").strip()
    
    srs.run_review_session(num_words=num_words, direction=direction)
    
    # Show initial report
    srs.display_report()
else:
    print("No vocabulary list available. Please run the vocabulary extraction section first.")

SPACED REPETITION REVIEW SESSION

Translate: encounter
✓ Correct!

------------------------------

Translate: unhappiness
✗ Incorrect. The correct answer is: ふこう

------------------------------

Translate: save money
✗ Incorrect. The correct answer is: ちょきんする

------------------------------

Translate: manager
✗ Incorrect. The correct answer is: かんりにん

------------------------------

Translate: duty
✗ Incorrect. The correct answer is: ぎむ

------------------------------

Review completed! Score: 1/5 (20.0%)
Words have been updated in the spaced repetition system.
VOCABULARY PROGRESS REPORT


Needs Work (0-3): 52 words
---------------------------
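
Every word still sits in "Needs Work" after one session because of the update arithmetic: each word starts at 1.0, a correct answer adds 0.5 (capped at 10.0), and a wrong answer subtracts 1.0 (floored at 0.5). The short simulation below replays that rule outside the class. `simulate_strength` is a hypothetical helper written only for illustration.

```python
from typing import Iterable, List

def simulate_strength(results: Iterable[bool], start: float = 1.0) -> List[float]:
    """Replay the update rule: +0.5 when correct (max 10.0), -1.0 when wrong (min 0.5)."""
    strengths = [start]
    for correct in results:
        current = strengths[-1]
        if correct:
            strengths.append(min(current + 0.5, 10.0))
        else:
            strengths.append(max(current - 1.0, 0.5))
    return strengths

# Four correct answers in a row move a new word from 1.0 up to 3.0,
# the lower bound of the "Learning" category.
path_to_learning = simulate_strength([True, True, True, True])
```

A single mistake costs as much as two correct answers gain, so words you miss fall back toward the front of the review queue quickly.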

### 9.2 Custom PDF Uploader (Google Colab)

If you're using Google Colab, you can use this cell to upload your own vocabulary PDF.

In [30]:
def upload_custom_pdf():
    """Upload a custom PDF file (for Colab environments)"""
    if "google.colab" in sys.modules:
        from google.colab import files
        print("Please upload your Japanese vocabulary PDF file...")
        uploaded = files.upload()
        
        if uploaded:
            filename = list(uploaded.keys())[0]
            print(f"\nUploaded: {filename}")
            return filename
        else:
            print("No file was uploaded.")
            return None
    else:
        print("This function is only available in Google Colab.")
        print("Please use the file path approach for local environments.")
        return None

In [19]:
# Upload and process a custom PDF
custom_pdf = upload_custom_pdf()
if custom_pdf:
    custom_vocab_list = extract_vocabulary_from_pdf(custom_pdf)
    if custom_vocab_list:
        print(f"\n✅ Successfully extracted {len(custom_vocab_list.vocab_items)} vocabulary items from your custom PDF")
        
        # Replace the main vocab list with the custom one
        vocabulary_list = custom_vocab_list
        
        # Display the first few items
        display(Markdown("### Sample Extracted Vocabulary from Custom PDF"))
        
        df = pd.DataFrame([{
            "Japanese": item.japanese,
            "Reading": item.reading if item.reading else "",
            "English": item.english
        } for item in vocabulary_list.vocab_items[:10]])  # Show first 10 items
        
        display(df)

This function is only available in Google Colab.
Please use the file path approach for local environments.
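
Outside Colab, the message above points to using a file path instead of the upload widget. A minimal sketch of that route follows, assuming the PDF already exists on the local disk. `resolve_local_pdf` is a hypothetical helper that only checks the path before it is handed to the extraction function.

```python
from pathlib import Path
from typing import Optional

def resolve_local_pdf(path_text: str) -> Optional[str]:
    """Return the path as a string if it points to an existing .pdf file, else None."""
    path = Path(path_text).expanduser()
    if not path.is_file():
        print(f"File not found: {path}")
        return None
    if path.suffix.lower() != ".pdf":
        print(f"Not a PDF file: {path}")
        return None
    return str(path)
```

For example, `custom_pdf = resolve_local_pdf("my_vocab.pdf")` (a placeholder file name) can stand in for `upload_custom_pdf()` in the next cell, since both return a file name or `None`.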


## 10. PDF Export Functionality

Let's add functionality to export quizzes to PDF files for offline use or printing.

In [None]:
# Install fpdf2 for PDF generation
%pip install --quiet fpdf2 matplotlib


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3[0m[39;49m -> [0m[32;49m25.0.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [31]:
import matplotlib.pyplot as plt
from fpdf import FPDF
import tempfile
import webbrowser
from datetime import datetime

def export_quiz_to_pdf(quiz_content, filename=None):
    """Export a quiz to a PDF file
    
    Args:
        quiz_content: The quiz content (can be text or markdown)
        filename: Optional filename for the PDF (default: auto-generated)
    """
    try:
        # Create timestamp for filename if not provided
        if not filename:
            timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
            filename = f"japanese_quiz_{timestamp}.pdf"
        elif not filename.endswith('.pdf'):
            filename += '.pdf'
        
        # Create PDF
        pdf = FPDF()
        pdf.add_page()
        pdf.set_auto_page_break(auto=True, margin=15)
        
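        # Add title
        # Note: fpdf2 maps "Arial" to its built-in Helvetica, which only covers Latin-1.
        # Japanese text needs a Unicode TTF font registered with pdf.add_font() first.
        pdf.set_font("Arial", "B", 16)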
        pdf.cell(0, 10, "Japanese Vocabulary Quiz", ln=True, align="C")
        pdf.ln(10)
        
        # Add date
        pdf.set_font("Arial", "I", 10)
        pdf.cell(0, 10, f"Generated on: {datetime.now().strftime('%Y-%m-%d %H:%M')}", ln=True)
        pdf.ln(5)
        
        # Add quiz content
        pdf.set_font("Arial", "", 12)
        
        # Process quiz content line by line
        lines = quiz_content.split('\n')
        for line in lines:
            # Handle basic formatting
            if line.startswith('# '):
                pdf.set_font("Arial", "B", 14)
                pdf.cell(0, 10, line[2:], ln=True)
                pdf.set_font("Arial", "", 12)
            elif line.startswith('## '):
                pdf.set_font("Arial", "B", 13)
                pdf.cell(0, 10, line[3:], ln=True)
                pdf.set_font("Arial", "", 12)
            elif line.startswith('### '):
                pdf.set_font("Arial", "B", 12)
                pdf.cell(0, 10, line[4:], ln=True)
                pdf.set_font("Arial", "", 12)
            elif line.startswith('**') and line.endswith('**'):
                pdf.set_font("Arial", "B", 12)
                # Slice off the markers; str.strip('**') would strip every leading/trailing '*'
                pdf.cell(0, 10, line[2:-2], ln=True)
                pdf.set_font("Arial", "", 12)
            elif line.startswith('Q') and ':' in line:
                pdf.set_font("Arial", "B", 12)
                pdf.cell(0, 10, line, ln=True)
                pdf.set_font("Arial", "", 12)
            elif line.startswith(('A.', 'B.', 'C.', 'D.')):
                pdf.cell(0, 8, line, ln=True)
            elif line.startswith('Answer'):
                pdf.set_font("Arial", "B", 12)
                pdf.cell(0, 10, line, ln=True)
                pdf.set_font("Arial", "", 12)
            elif line.strip() == '':
                pdf.ln(5)
            else:
                pdf.multi_cell(0, 8, line)
        
        # Save PDF
        pdf.output(filename)
        print(f"Quiz exported successfully to {filename}")
        
        # Open the PDF
        webbrowser.open(filename)
        
        return filename
    
    except Exception as e:
        print(f"Error exporting quiz to PDF: {e}")
        return None

def generate_and_export_quiz(vocab_items, quiz_type="multiple_choice", num_questions=10, 
                           direction="mixed", filename=None):
    """Generate and export a quiz to PDF in one step"""
    
    # Generate the quiz
    print("Generating quiz...")
    quiz_content = generate_gemini_quiz(vocab_items, quiz_type, num_questions, direction)
    
    # Export to PDF
    print("Exporting to PDF...")
    export_quiz_to_pdf(quiz_content, filename)
    
    # Also display in notebook
    display(Markdown("## Generated Quiz"))
    display(Markdown(quiz_content))
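
One caveat for Japanese content: the built-in fonts in fpdf2 (which "Arial" maps to) only cover Latin-1, so a quiz containing kana or kanji needs a Unicode TrueType font registered with `add_font` before it can be written. The sketch below shows one way to do that, assuming a Japanese-capable `.ttf` file is available locally. The candidate paths, the family name `QuizJP`, and both helper names are hypothetical.

```python
import os
from typing import Iterable, Optional

# Hypothetical locations; adjust to wherever a Japanese-capable TTF lives on your system.
CANDIDATE_FONTS = [
    "fonts/NotoSansJP-Regular.ttf",
    "/usr/share/fonts/truetype/fonts-japanese-gothic.ttf",
]


def find_font(candidates: Iterable[str] = CANDIDATE_FONTS) -> Optional[str]:
    """Return the first candidate font path that exists, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def make_japanese_pdf(font_path: str):
    """Create an FPDF document whose active font can render Japanese text."""
    from fpdf import FPDF  # imported here so find_font() works without fpdf2

    pdf = FPDF()
    pdf.add_page()
    pdf.set_auto_page_break(auto=True, margin=15)
    # Register the TTF under an arbitrary family name, then select it.
    pdf.add_font("QuizJP", "", font_path)
    pdf.set_font("QuizJP", "", 12)
    return pdf
```

A document created this way can take Japanese strings in `cell` and `multi_cell`. Bold headings would need a bold variant of the font registered the same way with style `"B"`, since only the regular style is registered here.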

In [32]:
def run_export_quiz_menu():
    """Run a menu to export quizzes to PDF"""
    # Check globals(): inside a function, locals() never contains the notebook-level variable
    if 'vocabulary_list' in globals() and vocabulary_list and hasattr(vocabulary_list, 'vocab_items'):
        print("\n" + "="*50)
        print("EXPORT QUIZ TO PDF")
        print("="*50)
        
        # Get quiz parameters
        print("\nChoose quiz type:")
        print("1. Multiple Choice Quiz")
        print("2. Fill-in-the-blank Quiz")
        print("3. Mixed Format Quiz")
        
        type_choice = input("\nEnter your choice (1-3): ").strip()
        
        quiz_types = {
            "1": "multiple_choice",
            "2": "fill_in_blank", 
            "3": "mixed"
        }
        
        if type_choice not in quiz_types:
            print("Invalid choice. Using multiple choice as default.")
            quiz_type = "multiple_choice"
        else:
            quiz_type = quiz_types[type_choice]
        
        # Get number of questions
        try:
            num_questions = int(input("\nNumber of questions (5-20): ").strip())
            # Clamp to the supported range
            num_questions = max(5, min(20, num_questions))
        except ValueError:
            print("Invalid number. Using 10 questions as default.")
            num_questions = 10
        
        # Get direction
        print("\nChoose quiz direction:")
        print("1. Japanese to English")
        print("2. English to Japanese")
        print("3. Mixed (both directions)")
        
        dir_choice = input("\nEnter your choice (1-3): ").strip()
        
        directions = {
            "1": "j2e",
            "2": "e2j",
            "3": "mixed"
        }
        
        if dir_choice not in directions:
            print("Invalid choice. Using mixed direction as default.")
            direction = "mixed"
        else:
            direction = directions[dir_choice]
        
        # Get filename (optional)
        filename = input("\nEnter filename for PDF (or press Enter for auto-generated name): ").strip()
        
        # Generate and export
        generate_and_export_quiz(
            vocabulary_list.vocab_items, 
            quiz_type, 
            num_questions, 
            direction,
            filename
        )
    else:
        print("No vocabulary list available. Please run the vocabulary extraction section first.")
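
The menu above repeats the same pattern three times: read input, validate it, fall back to a default. A possible refactoring, sketched here with hypothetical helpers `ask_choice` and `ask_int` (neither is in the original notebook), takes the reader function as a parameter so the logic can be exercised without a live prompt:

```python
from typing import Callable, Dict


def ask_choice(prompt: str, options: Dict[str, str], default: str,
               reader: Callable[[str], str] = input) -> str:
    """Map a menu key such as "1" to its value, or return the default."""
    answer = reader(prompt).strip()
    if answer not in options:
        print(f"Invalid choice. Using {default} as default.")
        return default
    return options[answer]


def ask_int(prompt: str, low: int, high: int, default: int,
            reader: Callable[[str], str] = input) -> int:
    """Read an integer and clamp it to [low, high]; use default on bad input."""
    try:
        value = int(reader(prompt).strip())
    except ValueError:
        print(f"Invalid number. Using {default} as default.")
        return default
    return max(low, min(high, value))
```

With these in place, the quiz type could be read as `ask_choice("\nEnter your choice (1-3): ", quiz_types, "multiple_choice")` and the question count as `ask_int("\nNumber of questions (5-20): ", 5, 20, 10)`.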

## 11. Updated Quiz Menu

This updated quiz menu adds the option to export quizzes to PDF.

In [33]:
if 'vocabulary_list' in locals() and vocabulary_list and hasattr(vocabulary_list, 'vocab_items'):
    print("Choose a quiz type:")
    print("1. Multiple Choice Quiz (Japanese to English)")
    print("2. Multiple Choice Quiz (English to Japanese)")
    print("3. Typing Quiz (Japanese to English)")
    print("4. Typing Quiz (English to Japanese)")
    print("5. Gemini-Generated Custom Quiz")
    print("6. Generate and Export Quiz to PDF")
    
    choice = input("\nEnter your choice (1-6): ").strip()
    
    if choice == "1":
        run_multiple_choice_quiz(vocabulary_list.vocab_items, num_questions=5, direction="j2e")
    elif choice == "2":
        run_multiple_choice_quiz(vocabulary_list.vocab_items, num_questions=5, direction="e2j")
    elif choice == "3":
        run_typing_quiz(vocabulary_list.vocab_items, num_questions=5, direction="j2e")
    elif choice == "4":
        run_typing_quiz(vocabulary_list.vocab_items, num_questions=5, direction="e2j")
    elif choice == "5":
        run_gemini_quiz(vocabulary_list.vocab_items)
    elif choice == "6":
        run_export_quiz_menu()
    else:
        print("Invalid choice. Please run the cell again and choose a number between 1 and 6.")
else:
    print("No vocabulary list available. Please run the vocabulary extraction section first.")

Choose a quiz type:
1. Multiple Choice Quiz (Japanese to English)
2. Multiple Choice Quiz (English to Japanese)
3. Typing Quiz (Japanese to English)
4. Typing Quiz (English to Japanese)
5. Gemini-Generated Custom Quiz
6. Generate and Export Quiz to PDF
No vocabulary list available. Please run the vocabulary extraction section first.


## Conclusion

In this notebook, we've created a Japanese vocabulary quiz application that leverages Google Gemini to:

1. Extract vocabulary from PDF documents
2. Generate various quiz formats (multiple choice, typing, etc.)
3. Provide intelligent answer evaluation
4. Create a spaced repetition system for more effective learning
5. Export quizzes to PDF for offline use

You can extend this application with features such as:

- Audio pronunciation using text-to-speech
- Grammar integration for sentence construction practice
- Additional quiz formats (matching, flashcards, etc.)
- Progress tracking and statistics over time
- Integration with other language learning resources