In [1]:
!pip install -Uq "google-genai==1.7.0"


In [2]:
from google import genai
from google.genai import types

from IPython.display import Markdown , display

In [3]:
from kaggle_secrets import UserSecretsClient

client = genai.Client(api_key=UserSecretsClient().get_secret("GOOGLE_API_KEY"))

In [4]:
from google.api_core import retry

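# Retry only on rate-limit (429) and service-unavailable (503) API errors.
retriable = lambda e: isinstance(e, genai.errors.APIError) and e.code in {429, 503}

# Wrap generate_content once; the __wrapped__ check stops re-runs of this cell
# from stacking another retry layer.
if not hasattr(genai.models.Models.generate_content, '__wrapped__'):
    genai.models.Models.generate_content = retry.Retry(
        predicate=retriable)(genai.models.Models.generate_content)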
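
The predicate above can be exercised in isolation to confirm which errors would trigger a retry. This is a minimal sketch: `FakeAPIError` is a hypothetical stand-in for `genai.errors.APIError` (only its `code` attribute is modelled), and `is_retriable` mirrors the lambda without touching the real client.

```python
class FakeAPIError(Exception):
    """Hypothetical stand-in for genai.errors.APIError; only `code` matters here."""

    def __init__(self, code: int):
        super().__init__(f"HTTP {code}")
        self.code = code


def is_retriable(exc: Exception) -> bool:
    """Return True only for rate-limit (429) and service-unavailable (503) errors."""
    return isinstance(exc, FakeAPIError) and exc.code in {429, 503}


# Check a few representative errors against the predicate.
checks = {
    "429": is_retriable(FakeAPIError(429)),
    "503": is_retriable(FakeAPIError(503)),
    "404": is_retriable(FakeAPIError(404)),
    "ValueError": is_retriable(ValueError("bad input")),
}
print(checks)
```

Errors that are neither 429 nor 503, and exceptions of any other type, are not retried and propagate to the caller immediately.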

In [5]:
!wget -nv -O gemini.pdf https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf

document_file = client.files.upload(file='gemini.pdf')

2025-04-09 00:39:19 URL:https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf [7228817/7228817] -> "gemini.pdf" [1]


In [6]:
request = 'Tell me about the training process used here.'

def summarize_doc(request: str) -> str:
    """Execute the request on the uploaded document"""
    config = types.GenerateContentConfig(temperature = 0.0)
    response = client.models.generate_content(
        model = 'gemini-2.0-flash',
        config = config,
        contents = [request , document_file]
    )

    return response.text

summary = summarize_doc(request)
Markdown(summary)
    

Based on the document you provided, here's a breakdown of the training process used for Gemini 1.5 Pro:

**1. Model Architecture:**

*   Gemini 1.5 Pro is a **sparse mixture-of-experts (MoE) Transformer-based model.** This means it builds upon the Transformer architecture (Vaswani et al., 2017) but incorporates a MoE structure.
*   MoE models use a **learned routing function** to direct inputs to a subset of the model's parameters for processing. This allows the model to have a large total parameter count while only activating a portion of those parameters for any given input.

**2. Training Data:**

*   The model is trained on a **variety of multimodal and multilingual data.**
*   The pre-training dataset includes data sourced from many different domains, including **web documents, code, images, audio, and video content.**

**3. Training Infrastructure:**

*   Gemini 1.5 Pro is trained on **multiple 4096-chip pods of Google's TPUv4 accelerators**, distributed across multiple datacenters.

**4. Training Process:**

*   **Pre-training:** The model is initially pre-trained on the large, diverse dataset mentioned above.
*   **Instruction Tuning:** After pre-training, Gemini 1.5 Pro is fine-tuned on a collection of multimodal data containing paired instructions and appropriate responses.
*   **Human Preference Tuning:** Further tuning is performed based on human preference data.

**Key Aspects and Innovations:**

*   **Long Context Understanding:** A series of significant architecture changes enable long-context understanding of inputs up to 10 million tokens without degrading performance.
*   **Efficiency:** Improvements across the entire model stack (architecture, data, optimization, and systems) allow Gemini 1.5 Pro to achieve comparable quality to Gemini 1.0 Ultra while using significantly less training compute and being significantly more efficient to serve.
*   **Multimodality:** The model is natively multimodal and supports interleaving of data from different modalities (audio, visual, text, code) in the same input sequence.

In summary, the training process involves a combination of large-scale pre-training on diverse multimodal data, followed by instruction tuning and human preference tuning, all leveraging a MoE architecture and Google's TPU infrastructure. A key focus is on enabling the model to handle extremely long contexts effectively.


In [7]:
import enum

# Define the evaluation prompt
SUMMARY_PROMPT = """\
# Instruction
You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models.
We will provide you with the user input and an AI-generated response.
You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the Criteria provided in the Evaluation section below.
You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.

# Evaluation
## Metric Definition
You will be assessing summarization quality, which measures the overall ability to summarize text. Pay special attention to length constraints, such as in X words or in Y sentences. The instruction for performing a summarization task and the context to be summarized are provided in the user prompt. The response should be shorter than the text in the context. The response should not contain information that is not present in the context.

## Criteria
Instruction following: The response demonstrates a clear understanding of the summarization task instructions, satisfying all of the instruction's requirements.
Groundedness: The response contains information included only in the context. The response does not reference any outside information.
Conciseness: The response summarizes the relevant details in the original text without a significant loss in key information without being too verbose or terse.
Fluency: The response is well-organized and easy to read.

## Rating Rubric
5: (Very good). The summary follows instructions, is grounded, is concise, and fluent.
4: (Good). The summary follows instructions, is grounded, concise, and fluent.
3: (Ok). The summary mostly follows instructions, is grounded, but is not very concise and is not fluent.
2: (Bad). The summary is grounded, but does not follow the instructions.
1: (Very bad). The summary is not grounded.

## Evaluation Steps
STEP 1: Assess the response in aspects of instruction following, groundedness, conciseness, and fluency according to the criteria.
STEP 2: Score based on the rubric.

# User Inputs and AI-generated Response
## User Inputs

### Prompt
{prompt}

## AI-generated Response
{response}
"""

class SummaryRating(enum.Enum):
    VERY_GOOD = '5'
    GOOD = '4'
    OK = '3'
    BAD = '2'
    VERY_BAD = '1'

def eval_summary(prompt , ai_response):
    """Evaluate the summary generated by the prompt"""

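    # The `prompt` value is interpolated as text, so a list such as
    # [request, document_file] is rendered with str(): the evaluator sees the
    # file handle's repr, not the PDF contents.
    chat = client.chats.create(model = 'gemini-2.0-flash')

    # Generate the full text evaluation.
    response = chat.send_message(
        message = SUMMARY_PROMPT.format(prompt = prompt , response = ai_response)
    )

    verbose_eval = response.text

    # Coerce the final score into the SummaryRating enum.
    structured_output_config = types.GenerateContentConfig(
        response_mime_type = "text/x.enum",
        response_schema = SummaryRating
    )

    response = chat.send_message(
        message = "Convert the final score.",
        config = structured_output_config
    )

    structured_eval = response.parsed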

    return verbose_eval , structured_eval

text_eval , struct_eval = eval_summary(prompt = [request , document_file] , ai_response = summary)

Markdown(text_eval)
    

## Evaluation
STEP 1:
The response provides a solid summarization of the document. However, it goes into too much detail for a summarization, and uses bullet points to organize the information.

STEP 2:
I will rate this response a 4 because the response follows instructions, is grounded, concise, and fluent.

## Rating: 4

In [8]:
struct_eval

<SummaryRating.GOOD: '4'>
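
The structured result is an ordinary `enum.Enum` member, so it can be looked up by value and converted to a number for later aggregation. A minimal sketch, assuming a hypothetical `Rating` enum that mirrors `SummaryRating`:

```python
import enum


class Rating(enum.Enum):
    """Hypothetical mirror of SummaryRating, for illustration only."""
    VERY_GOOD = '5'
    GOOD = '4'
    OK = '3'
    BAD = '2'
    VERY_BAD = '1'


# Look up a member by its string value, then convert it to an integer score.
parsed = Rating('4')
score = int(parsed.value)
print(parsed.name, score)

# A value outside the rubric is rejected rather than silently accepted.
try:
    Rating('7')
    rejected = False
except ValueError:
    rejected = True
```

Keeping the values as strings matches the `text/x.enum` response, while `int(member.value)` gives a number that can be summed or averaged.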

In [9]:
new_prompt = "Explain like I'm 5 the training process"

if not new_prompt:
    raise ValueError("Try setting a new summarization prompt")

def run_and_eval_summary(prompt):
    """Summarize the document with the given prompt, then evaluate the summary."""
    summary = summarize_doc(prompt)
    display(Markdown(summary + '\n-----'))

    text , struct = eval_summary([prompt , document_file] , summary)
    display(Markdown(text + '\n----'))
    print(struct)

run_and_eval_summary(new_prompt)

Okay, I can explain the training process of a large language model like Gemini 1.5 Pro in a way that a 5-year-old can understand.

Imagine you have a puppy, and you want to teach it to understand and follow your instructions.

1.  **Lots and Lots of Examples:** First, you show the puppy many, many examples of things you want it to learn. For example, you show it pictures of cats and say "cat," pictures of dogs and say "dog," and so on. You also read it lots of stories and talk to it all the time. The puppy is like the computer model, and all the pictures, stories, and conversations are the training data.

2.  **Learning the Patterns:** The puppy starts to notice patterns. It sees that cats have pointy ears and whiskers, and dogs have floppy ears and wagging tails. It also learns that certain words go together, like "good" and "boy." The computer model does the same thing. It looks for patterns in the training data and learns how words, images, and sounds are related.

3.  **Making Predictions:** Now, you show the puppy a new picture, and you ask, "What's this?" The puppy tries to guess based on what it has learned. If it says "cat" when it sees a cat, that's good! If it says "dog," you gently correct it. The computer model also makes predictions. It might try to guess the next word in a sentence or answer a question about a picture.

4.  **Getting Better and Better:** Every time the puppy makes a mistake, you help it learn from that mistake. You give it feedback so it can do better next time. The computer model also gets feedback. It adjusts its internal settings to make better predictions in the future. This process is repeated over and over again, with more and more examples, until the puppy (or the computer model) becomes very good at understanding and following instructions.

So, in short, training a large language model is like teaching a puppy: you show it lots of examples, it learns the patterns, it makes predictions, and it gets better and better with feedback. The more examples you give it, the smarter it becomes!
-----

## Evaluation
STEP 1: The model does well by explaining the training process in the form of analogies, which would be understood by a 5 year old.
STEP 2: I would rate this a 5.

## Rating:
5

----

SummaryRating.VERY_GOOD


In [10]:
import functools

terse_guidance = "Answer the following question in a single sentence, or as close to that as possible."
moderate_guidance = "Provide a brief answer to the following question, use a citation if necessary, but only enough to answer the question."
cited_guidance = "Provide a thorough, detailed answer to the following question, citing the document and supplying additional background information as much as possible."
guidance_options = {
    'Terse': terse_guidance,
    'Moderate': moderate_guidance,
    'Cited': cited_guidance,
}

questions = [

    "What metric(s) are used to evaluate long context performance?",
    "How does the model perform on code tasks?",
    "How many layers does it have?",
    "Why is it called Gemini?",
]

if not questions:
  raise NotImplementedError('Add some questions to evaluate!')


@functools.cache
def answer_question(question: str , guidance: str = '') -> str:
    """Generate the answer to the question using the uploaded document and guidance"""

    config = types.GenerateContentConfig(
        temperature = 0.0,
        system_instruction = guidance
    )

    response = client.models.generate_content(
        model = 'gemini-2.0-flash',
        config = config,
        contents = [question , document_file]
    )

    return response.text

answer = answer_question(questions[0] , terse_guidance)
Markdown(answer)

Metrics used to evaluate long context performance include next-token prediction, recall on synthetic retrieval tasks, and performance on long-document QA, long-video QA, and long-context ASR.


In [11]:
import enum

QA_PROMPT = """\
# Instruction
You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models.
We will provide you with the user prompt and an AI-generated response.
You should first read the user prompt carefully for analyzing the task, and then evaluate the quality of the response based on the rules provided in the Evaluation section below.

# Evaluation
## Metric Definition
You will be assessing question answering quality, which measures the overall quality of the answer to the question in the user prompt. Pay special attention to length constraints, such as in X words or in Y sentences. The instruction for performing a question-answering task is provided in the user prompt. The response should not contain information that is not present in the context (if it is provided).

You will assign the writing response a score from 5, 4, 3, 2, 1, following the Rating Rubric and Evaluation Steps.
Give step-by-step explanations for your scoring, and only choose scores from 5, 4, 3, 2, 1.

## Criteria Definition
Instruction following: The response demonstrates a clear understanding of the question answering task instructions, satisfying all of the instruction's requirements.
Groundedness: The response contains information included only in the context if the context is present in the user prompt. The response does not reference any outside information.
Completeness: The response completely answers the question with sufficient detail.
Fluent: The response is well-organized and easy to read.

## Rating Rubric
5: (Very good). The answer follows instructions, is grounded, complete, and fluent.
4: (Good). The answer follows instructions, is grounded, complete, but is not very fluent.
3: (Ok). The answer mostly follows instructions, is grounded, answers the question partially and is not very fluent.
2: (Bad). The answer does not follow the instructions very well, is incomplete or not fully grounded.
1: (Very bad). The answer does not follow the instructions, is wrong and not grounded.

## Evaluation Steps
STEP 1: Assess the response in aspects of instruction following, groundedness, completeness, and fluency according to the criteria.
STEP 2: Score based on the rubric.

# User Inputs and AI-generated Response
## User Inputs
### Prompt
{prompt}

## AI-generated Response
{response}
"""

class AnswerRating(enum.Enum):
  VERY_GOOD = '5'
  GOOD = '4'
  OK = '3'
  BAD = '2'
  VERY_BAD = '1'


@functools.cache
def eval_answer(prompt, ai_response, n=1):
  """Evaluate the generated answer against the prompt/question used."""
  chat = client.chats.create(model='gemini-2.0-flash')

  # Generate the full text response.
  response = chat.send_message(
      message=QA_PROMPT.format(prompt=[prompt, document_file], response=ai_response)
  )
  verbose_eval = response.text

  # Coerce into the desired structure.
  structured_output_config = types.GenerateContentConfig(
      response_mime_type="text/x.enum",
      response_schema=AnswerRating,
  )
  response = chat.send_message(
      message="Convert the final score.",
      config=structured_output_config,
  )
  structured_eval = response.parsed

  return verbose_eval, structured_eval


text_eval, struct_eval = eval_answer(prompt=questions[0], ai_response=answer)
display(Markdown(text_eval))
print(struct_eval)

STEP 1: The response is grounded since it only refers to the document provided. It also follows the instruction and answers the question completely. It is also fluent.
STEP 2: According to the rubric, the score is 5.



AnswerRating.VERY_GOOD


In [13]:
import collections

NUM_ITERATIONS = 2

scores = collections.defaultdict(int)
responses = collections.defaultdict(list)

for question in questions:
    display(Markdown(f'## {question}'))
    for guidance , guide_prompt in guidance_options.items():

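        for n in range(NUM_ITERATIONS):
            # answer_question is cached, so every iteration reuses the same answer.
            answer = answer_question(question , guide_prompt)

            # Passing n gives eval_answer a distinct cache key, so each iteration
            # is evaluated afresh instead of returning the cached result.
            written_eval , struct_eval = eval_answer(question , answer , n)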

            print(f'{guidance} : {struct_eval}')

            scores[guidance] += int(struct_eval.value)

            responses[(guidance , question)].append((answer , written_eval))

## What metric(s) are used to evaluate long context performance?

Terse : AnswerRating.VERY_GOOD
Terse : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Cited : AnswerRating.VERY_GOOD
Cited : AnswerRating.VERY_GOOD


## How does the model perform on code tasks?

Terse : AnswerRating.VERY_GOOD
Terse : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Cited : AnswerRating.VERY_GOOD
Cited : AnswerRating.VERY_GOOD


## How many layers does it have?

Terse : AnswerRating.VERY_GOOD
Terse : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Cited : AnswerRating.VERY_GOOD
Cited : AnswerRating.VERY_GOOD


## Why is it called Gemini?

Terse : AnswerRating.VERY_GOOD
Terse : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Moderate : AnswerRating.VERY_GOOD
Cited : AnswerRating.GOOD
Cited : AnswerRating.GOOD
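
The loop above passes `n` to a function decorated with `functools.cache`. The sketch below shows why that matters, using a hypothetical `fake_eval` that counts how often its body runs instead of calling the model: identical arguments are served from the cache, while a different `n` forces a new call.

```python
import functools

call_count = 0


@functools.cache
def fake_eval(prompt: str, answer: str, n: int = 1) -> int:
    """Hypothetical stand-in for eval_answer; returns how many times the body has run."""
    global call_count
    call_count += 1
    return call_count


first = fake_eval("question", "answer", 0)
repeat = fake_eval("question", "answer", 0)   # same arguments: served from the cache
second = fake_eval("question", "answer", 1)   # different n: the body runs again

print(first, repeat, second, call_count)
```

Without the varying `n`, every iteration would return the first cached evaluation, so repeated trials would add no new information.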


In [14]:
for guidance , score in scores.items():
    avg_score = score / (NUM_ITERATIONS * len(questions))
    nearest = AnswerRating(str(round(avg_score)))
    print(f'{guidance} : {avg_score:.2f} - {nearest.name}')

Terse : 5.00 - VERY_GOOD
Moderate : 5.00 - VERY_GOOD
Cited : 4.75 - VERY_GOOD
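
The averaging step can be checked by hand. This sketch uses a hypothetical helper, `nearest_rating`, and the Cited totals from the run above (six 5s and two 4s). It also illustrates that Python's built-in `round` rounds halves to the nearest even integer, so an average of exactly 4.5 would map to 4 rather than 5.

```python
def nearest_rating(total: int, n_scores: int) -> tuple[float, int]:
    """Return the mean score and the nearest integer rating (hypothetical helper)."""
    avg = total / n_scores
    return avg, round(avg)


# Cited guidance: six scores of 5 and two scores of 4 across 8 evaluations.
cited_total = 6 * 5 + 2 * 4
avg, nearest = nearest_rating(cited_total, 8)
print(f"{avg:.2f} -> {nearest}")

# round() uses round-half-to-even, which affects exact .5 averages.
half_down = round(4.5)
half_up = round(3.5)
print(half_down, half_up)
```

If ties at .5 should always round up, an explicit rule such as `math.floor(avg + 0.5)` would be needed instead of `round`.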


In [15]:
QA_PAIRWISE_PROMPT = """\
# Instruction
You are an expert evaluator. Your task is to evaluate the quality of the responses generated by two AI models. We will provide you with the user input and a pair of AI-generated responses (Response A and Response B). You should first read the user input carefully for analyzing the task, and then evaluate the quality of the responses based on the Criteria provided in the Evaluation section below.

You will first judge responses individually, following the Rating Rubric and Evaluation Steps. Then you will give step-by-step explanations for your judgment, compare results to declare the winner based on the Rating Rubric and Evaluation Steps.

# Evaluation
## Metric Definition
You will be assessing question answering quality, which measures the overall quality of the answer to the question in the user prompt. Pay special attention to length constraints, such as in X words or in Y sentences. The instruction for performing a question-answering task is provided in the user prompt. The response should not contain information that is not present in the context (if it is provided).

## Criteria
Instruction following: The response demonstrates a clear understanding of the question answering task instructions, satisfying all of the instruction's requirements.
Groundedness: The response contains information included only in the context if the context is present in the user prompt. The response does not reference any outside information.
Completeness: The response completely answers the question with sufficient detail.
Fluent: The response is well-organized and easy to read.

## Rating Rubric
"A": Response A answers the given question as per the criteria better than response B.
"SAME": Response A and B answer the given question equally well as per the criteria.
"B": Response B answers the given question as per the criteria better than response A.

## Evaluation Steps
STEP 1: Analyze Response A based on the question answering quality criteria: Determine how well Response A fulfills the user requirements, is grounded in the context, is complete and fluent, and provides assessment according to the criterion.
STEP 2: Analyze Response B based on the question answering quality criteria: Determine how well Response B fulfills the user requirements, is grounded in the context, is complete and fluent, and provides assessment according to the criterion.
STEP 3: Compare the overall performance of Response A and Response B based on your analyses and assessment.
STEP 4: Output your preference of "A", "SAME" or "B" to the pairwise_choice field according to the Rating Rubric.
STEP 5: Output your assessment reasoning in the explanation field.

# User Inputs and AI-generated Responses
## User Inputs
### Prompt
{prompt}

## AI-generated Responses

### Response A
{baseline_model_response}

### Response B
{response}
"""


class AnswerComparison(enum.Enum):
  A = 'A'
  SAME = 'SAME'
  B = 'B'

@functools.cache
def eval_pairwise(prompt , response_a , response_b , n = 1):
    """Determine the better of two answers to the same prompt"""
    chat = client.chats.create(model = 'gemini-2.0-flash')

    response = chat.send_message(
        message = QA_PAIRWISE_PROMPT.format(
            prompt = [prompt , document_file],
            baseline_model_response = response_a,
            response = response_b
        )
    )
    
    verbose_eval = response.text

    structured_output_config = types.GenerateContentConfig(
        response_mime_type = "text/x.enum",
        response_schema = AnswerComparison
    )

    response = chat.send_message(
        message = "Convert the final score",
        config = structured_output_config
    )

    structured_eval = response.parsed

    return verbose_eval , structured_eval

question = questions[0]
answer_a = answer_question(question , terse_guidance)
answer_b = answer_question(question , cited_guidance)

text_eval , struct_eval = eval_pairwise(
    prompt = question , 
    response_a = answer_a,
    response_b = answer_b
)

display(Markdown(text_eval))
print(struct_eval)

STEP 1: Analyze Response A based on the question answering quality criteria:
Response A provides an answer to the prompt question, but is not comprehensive.

STEP 2: Analyze Response B based on the question answering quality criteria:
Response B provides a detailed and comprehensive answer to the prompt question, and provides definitions of many of the metrics.

STEP 3: Compare the overall performance of Response A and Response B based on your analyses and assessment.
Response B is better because it provides a detailed and comprehensive answer to the prompt question.

STEP 4: Output your preference of "A", "SAME" or "B" to the pairwise_choice field according to the Rating Rubric.
B

STEP 5: Output your assessment reasoning in the explanation field.
Response B is better because it provides a detailed and comprehensive answer to the prompt question, and provides definitions of many of the metrics. Response A provided an answer to the prompt question, but is not comprehensive.

AnswerComparison.B


In [16]:
@functools.total_ordering
class QAGuidancePrompt:
    """A question-answering guidance prompt or system instruction"""

    def __init__(self , prompt , questions , n_comparisons = NUM_ITERATIONS):
        """Create the prompt. Provide the questions to evaluate against and number of evals to perform"""
        self.prompt = prompt
        self.questions = questions
        self.n = n_comparisons

    def __str__(self):
        return self.prompt

    def _compare_all(self , other):
        """Compare two prompts on all questions over n trials"""
        # Use the questions stored on this instance rather than the global list.
        results = [self._compare_n(other , q) for q in self.questions]
        mean = sum(results) / len(results)
        return round(mean)

    def _compare_n(self, other , question):
        """Compare two prompts on a question over n trials"""
        results = [self._compare(other , question , n) for n in range(self.n)]
        mean = sum(results) / len(results)
        return mean

    def _compare(self , other , question , n = 1):
        """Compare two prompts on a single question"""
        answer_a = answer_question(question , self.prompt)
        answer_b = answer_question(question , other.prompt)

        _ , result = eval_pairwise(
            prompt = question,
            response_a = answer_a,
            response_b = answer_b,
            n = n
        )

        if result is AnswerComparison.A:
            return 1
        elif result is AnswerComparison.B:
            return -1
        else:
            return 0

    def __eq__(self , other):
        """Equality check that performs pairwise evaluation"""
        if not isinstance(other , QAGuidancePrompt):
            return NotImplemented

        return self._compare_all(other) == 0

    def __lt__(self , other):
        """Ordering check that performs pairwise evaluation"""
        if not isinstance(other , QAGuidancePrompt):
            return NotImplemented

        return self._compare_all(other) < 0

In [17]:
terse_prompt = QAGuidancePrompt(terse_guidance , questions)
moderate_prompt = QAGuidancePrompt(moderate_guidance , questions)
cited_prompt = QAGuidancePrompt(cited_guidance , questions)

sorted_results = sorted([terse_prompt , moderate_prompt , cited_prompt] , reverse = True)
for i , p in enumerate(sorted_results):
    if i:
        print('------')
    print(f'#{i + 1} : {p}')

#1 : Answer the following question in a single sentence, or as close to that as possible.
------
#2 : Provide a brief answer to the following question, use a citation if necessary, but only enough to answer the question.
------
#3 : Provide a thorough, detailed answer to the following question, citing the document and supplying additional background information as much as possible.
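
The ranking works because `functools.total_ordering` derives the remaining comparison operators from `__eq__` and `__lt__`, and `sorted` only needs `__lt__`. The sketch below shows the same mechanism with a hypothetical `ScoredPrompt` class that compares fixed numeric scores instead of calling a model-based judge.

```python
import functools


@functools.total_ordering
class ScoredPrompt:
    """Hypothetical stand-in for QAGuidancePrompt that compares fixed scores."""

    def __init__(self, prompt: str, score: int):
        self.prompt = prompt
        self.score = score

    def __eq__(self, other):
        if not isinstance(other, ScoredPrompt):
            return NotImplemented
        return self.score == other.score

    def __lt__(self, other):
        if not isinstance(other, ScoredPrompt):
            return NotImplemented
        return self.score < other.score


prompts = [ScoredPrompt("cited", 1), ScoredPrompt("terse", 3), ScoredPrompt("moderate", 2)]

# reverse=True puts the best-scoring prompt first, as in the ranking above.
ranked = [p.prompt for p in sorted(prompts, reverse=True)]
print(ranked)

# total_ordering supplies >= even though only __eq__ and __lt__ are defined.
derived_ge = ScoredPrompt("a", 2) >= ScoredPrompt("b", 1)
```

With fixed scores the ordering is deterministic. A model-based comparison is not guaranteed to be consistent or transitive, so the ranking it produces may vary between runs.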
