# **The Impact of Demonstration Quality**

In this second exercise, we will test how the quality of the demonstrations affects our results, using machine translation as our task.

## Setup

Once again, let's start by installing and importing the necessary libraries.

In [None]:
!pip install openai
!pip freeze

Collecting openai
  Downloading openai-1.33.0-py3-none-any.whl (325 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.27.0-py3-none-any.whl (75 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.5-py3-none-any.whl (77 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.14.0-py3-none-any.whl (58 kB)
Installing collected packages: h11, httpcore, httpx, openai
Successfully installed h11-0.14.0 httpcore-1.0.5 ht

In [None]:
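import os
from openai import AzureOpenAI

# Read the API key from the environment rather than hardcoding a secret in the notebook.
# Set AZURE_OPENAI_API_KEY before running this cell.
client = AzureOpenAI(
  api_key = os.getenv("AZURE_OPENAI_API_KEY"),
  api_version = "2023-07-01-preview",
  azure_endpoint = "https://openai-resource-for-multilingual.openai.azure.com/"
)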

Once again, we will use the `get_completion` function to make the runs more convenient.

In [None]:
def get_completion(prompt, model="gpt-35-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0, # this is the degree of randomness of the model's output
    )
    return response.choices[0].message.content

## Demonstration Quality

As we said in the last notebook, there are several attributes of the prompt we can adjust to try to improve the model's performance on a particular task. In this notebook, we will focus on the quality of the demonstrations, using machine translation as our task. As you might know, LLMs already perform fairly well on this task, so these examples are merely illustrative. We will use English-to-Spanish translation, but **feel free to use any other pair of languages you know** to evaluate these techniques.

### Zero-shot

In this first example, we will not use any ICL demonstrations. We simply ask the model to perform the desired task directly, without providing any examples.

In [None]:
prompt = f"""
Translate the following English text to Spanish:
Source: The smallest field has only two elements: 0 and 1. Target:
"""
response = get_completion(prompt)
print(response)

El campo más pequeño tiene solo dos elementos: 0 y 1.


### First example

Now, we will add a series of demonstrations to our previous prompt. We will start with low-quality demonstrations and improve them step by step. The lowest-quality version pairs each source sentence with a random, unrelated target sentence.

In [None]:
prompt = f"""
Translate the following English text to Spanish:
Source: Spain is the host country for the Olympics in 1992. Target: Me caso la próxima semana.
Source: It's not as easy as people think. Target: Los zapatos de Tom le están demasiado grandes.
Source: Please don't walk so fast. Target: ¿Qué tren vas a coger?
Source: The smallest field has only two elements: 0 and 1. Target:
"""
response = get_completion(prompt)
print(response)

El campo más pequeño tiene solo dos elementos: 0 y 1.
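
The random pairing used above can also be produced programmatically. The following is a minimal sketch, not part of the original exercise: `build_prompt` and `mismatch` are hypothetical helper names, and the pairs are the correct translations used later in this notebook. Instead of unrelated sentences, it rotates the targets by one position, so every source ends up with the translation of a different sentence.

```python
def build_prompt(pairs, query, instruction="Translate the following English text to Spanish:"):
    """Format (source, target) demonstrations and the final query in this notebook's layout."""
    lines = [instruction]
    lines += [f"Source: {source} Target: {target}" for source, target in pairs]
    lines.append(f"Source: {query} Target:")
    return "\n" + "\n".join(lines) + "\n"


def mismatch(pairs):
    """Rotate the targets by one position so that no source keeps its own target.

    Assumes at least two pairs with distinct targets.
    """
    sources = [source for source, _ in pairs]
    targets = [target for _, target in pairs]
    rotated = targets[1:] + targets[:1]
    return list(zip(sources, rotated))


correct_pairs = [
    ("Spain is the host country for the Olympics in 1992.",
     "España fue la sede de las Olimpiadas de 1992."),
    ("It's not as easy as people think.",
     "No es tan fácil como la gente piensa."),
    ("Please don't walk so fast.",
     "Por favor, no camines tan rápido."),
]
query = "The smallest field has only two elements: 0 and 1."

mismatched_pairs = mismatch(correct_pairs)
prompt = build_prompt(mismatched_pairs, query)
print(prompt)
```

The resulting `prompt` string can be passed to `get_completion` in the same way as the handwritten prompts.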


### Second example

In this second example, we will pass nearly correct translations. Each demonstration has been tweaked slightly by removing a word, either from the source or from the target sentence.

In [None]:
prompt = f"""
Translate the following English text to Spanish:
Source: Spain is the host country for the Olympics in 1992. Target: España la sede de las Olimpiadas de 1992.
Source: It's not as easy as people think. Target: Es tan fácil como la gente piensa.
Source: Don't walk so fast. Target: Por favor, no camines tan rápido.
Source: The smallest field has only two elements: 0 and 1. Target:
"""
response = get_completion(prompt)
print(response)
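
The word-dropping perturbation can be automated if you want to try it on more sentences. This is a rough sketch under stated assumptions: `drop_random_word` and `perturb_pairs` are hypothetical helpers, the word to drop is chosen at random (the demonstrations above were edited by hand), and words are split naively on whitespace.

```python
import random


def drop_random_word(sentence, rng):
    """Remove one whitespace-delimited word, chosen at random, from the sentence."""
    words = sentence.split()
    if len(words) < 2:
        return sentence  # nothing sensible to drop
    index = rng.randrange(len(words))
    return " ".join(words[:index] + words[index + 1:])


def perturb_pairs(pairs, seed=0):
    """Drop one word from either the source or the target of each demonstration."""
    rng = random.Random(seed)  # fixed seed so the perturbation is reproducible
    perturbed = []
    for source, target in pairs:
        if rng.random() < 0.5:
            source = drop_random_word(source, rng)
        else:
            target = drop_random_word(target, rng)
        perturbed.append((source, target))
    return perturbed


pairs = [
    ("Spain is the host country for the Olympics in 1992.",
     "España fue la sede de las Olimpiadas de 1992."),
    ("It's not as easy as people think.",
     "No es tan fácil como la gente piensa."),
    ("Please don't walk so fast.",
     "Por favor, no camines tan rápido."),
]

perturbed = perturb_pairs(pairs, seed=0)
for source, target in perturbed:
    print(f"Source: {source} Target: {target}")
```

Changing `seed` gives a different set of perturbed demonstrations, which is a cheap way to check whether the model's output is sensitive to which word goes missing.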

### Third example

For the next step, we will pass the correct translations for the same sentences we used in the last two examples.

In [None]:
prompt = f"""
Translate the following English text to Spanish:
Source: Spain is the host country for the Olympics in 1992. Target: España fue la sede de las Olimpiadas de 1992.
Source: It's not as easy as people think. Target: No es tan fácil como la gente piensa.
Source: Please don't walk so fast. Target: Por favor, no camines tan rápido.
Source: The smallest field has only two elements: 0 and 1. Target:
"""
response = get_completion(prompt)
print(response)

### Fourth example

Lastly, we selected a series of sentences from the same domain as our target sentence to see whether this influences the performance.

In [None]:
prompt = f"""
Translate the following English text to Spanish:
Source: Associativity of addition and multiplication. Target: Asociatividad de la adición y la multiplicación.
Source: A field is a fundamental algebraic structure. Target: Un cuerpo es una estructura algebraica fundamental.
Source: Rational numbers are numbers that can be written as fractions. Target: Los números racionales son números que pueden ser escritos como fracciones.
Source: The smallest field has only two elements: 0 and 1. Target:
"""
response = get_completion(prompt)
print(response)

El campo más pequeño tiene solo dos elementos: 0 y 1.
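
Since several of the prompts in this notebook produce the same translation, it can help to run the variants side by side and group them by output. The sketch below is an illustration rather than part of the exercise: `group_by_output` is a hypothetical helper that takes any completion function, and `echo_last_line` is a stand-in that makes no API call, so the cell runs offline.

```python
from collections import defaultdict


def group_by_output(prompts, complete):
    """Run each named prompt through `complete` and group prompt names by the output text."""
    groups = defaultdict(list)
    for name, prompt in prompts.items():
        groups[complete(prompt).strip()].append(name)
    return dict(groups)


def echo_last_line(prompt):
    """Stand-in for a model call: returns the last line of the prompt unchanged."""
    return prompt.strip().splitlines()[-1]


query_line = "Source: The smallest field has only two elements: 0 and 1. Target:"
prompts = {
    "zero_shot": (
        "Translate the following English text to Spanish:\n"
        f"{query_line}\n"
    ),
    "one_demo": (
        "Translate the following English text to Spanish:\n"
        "Source: Please don't walk so fast. Target: Por favor, no camines tan rápido.\n"
        f"{query_line}\n"
    ),
}

groups = group_by_output(prompts, echo_last_line)
for output, names in groups.items():
    print(names, "->", output)
```

In the notebook, passing `get_completion` instead of `echo_last_line` would send each prompt to the model and show which demonstration sets lead to identical translations.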


## Exercise 2

Try to recreate the examples above using a different task, a different language, or both. For example, you could use a sentiment analysis task in a language other than English.