# Language proficiency level

In [1]:
import sys
sys.path.append("/home/mihaelabaksic/proj/2023-languagelearning")
sys.path.append("/home/mihaelabaksic/proj/2023-languagelearning/src") 
sys.path.append("/home/mihaelabaksic/proj/2023-languagelearning/src/templates") 

%load_ext autoreload
%autoreload 2

In [2]:
from single_run_thread import SingleRunThread
from prompt_builder import PromptBuilder
from utils import load_template
import keys

import openai

In [3]:
openai.api_key = keys.OPENAI_API_KEY

Q1: How should the language level be encoded in the prompt?
- CEFR level label only
- CEFR level label together with its description
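
The two encodings listed above can be sketched as one helper that optionally appends the level description. `build_level_prompt` and the shortened description are hypothetical and exist only for illustration; the prompt wording mirrors the prompts used later in this notebook.

```python
from typing import Optional

# Hypothetical, shortened description used only for this illustration.
EXAMPLE_DESCRIPTIONS = {
    "A1": "Can understand and use familiar everyday expressions and very basic phrases.",
}


def build_level_prompt(level: str, topic: str, description: Optional[str] = None) -> str:
    """Encode the target level as a bare CEFR label, or as label plus description."""
    prompt = (
        f"Generate 3 sentences about {topic} using only {level} level of English, "
        "according to CEFR framework."
    )
    if description is not None:
        # Second encoding: spell out what a speaker at this level can do.
        prompt += f" User speaking English at {level} level can: {description}"
    return prompt


label_only = build_level_prompt("A1", "football")
with_description = build_level_prompt("A1", "football", EXAMPLE_DESCRIPTIONS["A1"])
print(label_only)
print(with_description)
```

Keeping both encodings behind one function makes it easier to run the same topic and level through each variant and compare the outputs side by side.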

In [8]:
cefr_decriptions = {
    'A1': 'Can understand and use familiar everyday expressions and very basic phrases aimed at the satisfaction of needs of a concrete type. Can introduce him/herself and others and can ask and answer questions about personal details such as where he/she lives, people he/she knows and things he/she has. Can interact in a simple way provided the other person talks slowly and clearly and is prepared to help.',
    'A2': 'Can understand sentences and frequently used expressions related to areas of most immediate relevance (e.g. very basic personal and family information, shopping, local geography, employment). Can communicate in simple and routine tasks requiring a simple and direct exchange of information on familiar and routine matters.  Can describe in simple terms aspects of his/her background, immediate environment and matters in areas of immediate need.',
    'B1': 'Can understand the main points of clear standard input on familiar matters regularly encountered in work, school, leisure, etc. Can deal with most situations likely to arise whilst travelling in an area where the language is spoken.  Can produce simple connected text on topics which are familiar or of personal interest. Can describe experiences and events, dreams, hopes & ambitions and briefly give reasons and explanations for opinions and plans.',
    'B2': 'Can understand the main ideas of complex text on both concrete and abstract topics, including technical discussions in his/her field of specialisation. Can interact with a degree of fluency and spontaneity that makes regular interaction with native speakers quite possible without strain for either party. Can produce clear, detailed text on a wide range of subjects and explain a viewpoint on a topical issue giving the advantages and disadvantages of various options.',
    'C1': 'Can understand a wide range of demanding, longer texts, and recognise implicit meaning. Can express him/herself fluently and spontaneously without much obvious searching for expressions. Can use language flexibly and effectively for social, academic and professional purposes. Can produce clear, well-structured, detailed text on complex subjects, showing controlled use of organisational patterns, connectors and cohesive devices.',
    'C2': 'Can understand with ease virtually everything heard or read. Can summarise information from different spoken and written sources, reconstructing arguments and accounts in a coherent presentation. Can express him/herself spontaneously, very fluently and precisely, differentiating finer shades of meaning even in more complex situations.'
}

## Generation with respect to CEFR

### No CEFR descriptions

In [13]:
cefr_levels = ['A1', 'A2', 'B1', 'B2', 'C1', 'C2']

# One prompt per CEFR level; only the level label changes.
cefr_prompts = {
    level: f'Generate 3 sentences about football using only {level} level of English, according to CEFR framework.'
    for level in cefr_levels
}

In [10]:
t = SingleRunThread()

# Send each level-specific prompt and collect the raw model response per level.
cefr_generated_sentences = {}

for level, prompt in cefr_prompts.items():
    cefr_generated_sentences[level] = t.send(prompt)

In [11]:
cefr_generated_sentences

{'A1': "1. Football is a popular sport played by two teams, each trying to score goals by kicking a ball into the opponent's net.\n2. I enjoy watching football matches on TV because they are exciting and full of action.\n3. My favorite football player is Ronaldo because he is very talented and scores a lot of goals.",
 'A2': "1. Football is a popular sport played by two teams with eleven players each, who try to score goals by kicking a ball into the opposing team's net.\n2. Players in football need to have good coordination, stamina, and teamwork skills to succeed in the game.\n3. Watching a football match can be exciting, as fans cheer for their favorite teams and celebrate when a goal is scored.",
 'B1': "1. Football is a popular sport played by two teams, each consisting of eleven players, who aim to score goals by kicking a ball into the opposing team's net.\n2. The game is fast-paced and requires good teamwork, communication, and skill. Players must pass, dribble, and shoot the b

### CEFR descriptions

In [16]:
# Same prompts as above, with the CEFR description of each level appended.
cefr_with_descriptions_prompts = {
    level: (
        f'Generate 3 sentences about football using only {level} level of English, according to CEFR framework. '
        f'User speaking English at {level} level can: {description}'
    )
    for level, description in cefr_decriptions.items()
}

In [17]:
t = SingleRunThread()

# Send each description-augmented prompt and collect the raw model response per level.
cefr_with_descriptions_generated_sentences = {}

for level, prompt in cefr_with_descriptions_prompts.items():
    cefr_with_descriptions_generated_sentences[level] = t.send(prompt)

In [18]:
cefr_with_descriptions_generated_sentences

{'A1': '1. I like football. I watch it on TV and play with my friends.\n2. My favorite football team is Manchester United. They are very good.\n3. I go to the stadium to watch football matches. It is very exciting.',
 'A2': '1. Football is a popular sport played by teams on a large field with a round ball.\n2. I enjoy watching football matches on TV with my friends and cheering for my favorite team.\n3. Last weekend, I played football with my classmates in the park and had a lot of fun.',
 'B1': "1. I enjoy watching football matches with my friends on weekends and discussing the players' performance and strategies.\n2. Last summer, I had the opportunity to attend a live football game at a stadium, which was an incredible experience for me.\n3. In my opinion, football is not just a sport but also a great way to bring people from different cultures and backgrounds together.",
 'B2': "1. Football is a popular sport played by teams of eleven players on a rectangular field. It involves a lo

## Refinement with respect to CEFR

### No level description available

In [22]:
refinement_template = 'The text was written by GPT using {0} CEFR level of proficiency. Refine the text to be on a {0} level: {1}'
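
As a small illustration of how the positional placeholders are filled, the sketch below uses a hypothetical, shorter template with the same placeholder layout (`{0}` twice, then `{1}`). It is not the notebook's template, only a stand-in to show the substitution.

```python
# Hypothetical template with the same placeholder layout as the one above:
# {0} appears twice and {1} once.
example_template = "Target level: {0}. Rewrite the text at {0} level: {1}"

filled = example_template.format("C2", "I like football.")
print(filled)
# Target level: C2. Rewrite the text at C2 level: I like football.
```

Both occurrences of `{0}` receive the same argument, which in the cells below is the target level. The prompt therefore never states the level at which the input text was originally written.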

A1 -> C2

In [26]:
a1_sentence = 'I like football. I watch it on TV and play with my friends.'

c2_sentence_refined = t.send(refinement_template.format('C2', a1_sentence))
c2_sentence_refined

'I have a great fondness for football, which I both enjoy watching on television and actively engage in by playing with my friends.'

C2 -> A1

In [27]:
c2_sentence = 'Football, known as soccer in some countries, is a globally popular sport characterized by its dynamic gameplay and passionate fanbase, with matches drawing millions of viewers from around the world.'

a1_sentence_refined = t.send(refinement_template.format('A1', c2_sentence))
a1_sentence_refined

'Football, also called soccer in some places, is a very popular sport worldwide. It has exciting gameplay and fans who love it a lot. Many people watch football matches from all over the world.'

B2 -> C1

In [29]:
b2_sentence = 'Football is a popular sport played by teams of eleven players on a rectangular field. It involves a lot of physical activity and strategy, with the goal of scoring goals by kicking a ball into the opponent\'s net.'

c1_sentence_refined = t.send(refinement_template.format('C1', b2_sentence ))
c1_sentence_refined

"Football is a widely favored team sport that consists of eleven players competing on a rectangular field. It encompasses a considerable amount of physical exertion and strategic thinking, aiming to score goals by propelling a ball into the adversary's net."

C1 -> B2

In [31]:
c1_sentence = 'The intensity and competitiveness of football matches can create an electrifying atmosphere in stadiums, with passionate fans cheering on their favorite teams.'

b2_sentence_refined = t.send(refinement_template.format('B2', c1_sentence))
b2_sentence_refined

'Football matches can generate an exhilarating atmosphere in stadiums, where passionate fans enthusiastically support their preferred teams, resulting in a thrilling experience.'

### Level description available

In [32]:
refinement_template_with_description = 'The text was written by GPT using {0} CEFR level of proficiency. The {0} level is defined as: {1}. Refine the text to be on a {0} level: {2}'

A1 -> C2

In [33]:
a1_sentence = 'I like football. I watch it on TV and play with my friends.'

c2_sentence_refined = t.send(refinement_template_with_description.format('C2', cefr_decriptions['C2'],  a1_sentence))
c2_sentence_refined

'Football is a passion of mine that I thoroughly enjoy. I avidly follow the sport by watching matches on television and also actively participate in friendly games with my companions.'

C2 -> A1

In [36]:
c2_sentence = 'Football, known as soccer in some countries, is a globally popular sport characterized by its dynamic gameplay and passionate fanbase, with matches drawing millions of viewers from around the world.'

a1_sentence_refined = t.send(refinement_template_with_description.format('A1', cefr_decriptions['A1'],  c2_sentence))
print(a1_sentence_refined)

a1_sentence_refined_refined = t.send(refinement_template_with_description.format('A1', cefr_decriptions['A1'],  a1_sentence_refined))
print(a1_sentence_refined_refined)

Football, also known as soccer in some places, is a very popular sport worldwide. It is played with a lot of energy and has many fans who are very passionate. The matches are watched by millions of people from different countries.
Football, also known as soccer in some places, is a very popular sport all over the world. It is played with a lot of energy and has many fans who are very passionate. The matches are watched by millions of people from different countries.
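
The second refinement pass above changes only a few words, which suggests the refinement settles quickly. A minimal sketch of repeating the refinement until the text stops changing is shown below. `refine_until_stable` and `fake_refine` are hypothetical names; `fake_refine` is an offline stand-in for the model call, so its behavior says nothing about what the model actually returns.

```python
from typing import Callable, List


def refine_until_stable(text: str, refine: Callable[[str], str], max_rounds: int = 5) -> List[str]:
    """Apply `refine` repeatedly and stop once the output no longer changes.

    Returns every version produced, starting with the input text.
    """
    versions = [text]
    for _ in range(max_rounds):
        refined = refine(versions[-1])
        if refined == versions[-1]:
            break  # fixed point reached: another pass would change nothing
        versions.append(refined)
    return versions


# Offline stand-in for the model call (assumption). In the notebook this would
# wrap t.send(...) with the refinement template filled in for the target level.
def fake_refine(s: str) -> str:
    return s.replace("worldwide", "all over the world")


history = refine_until_stable("Football is a very popular sport worldwide.", fake_refine)
print(len(history))
print(history[-1])
```

With a real model the output may keep drifting between passes, so `max_rounds` bounds the number of calls regardless of whether a fixed point is reached.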


B2 -> C1

In [38]:
b2_sentence = 'Football is a popular sport played by teams of eleven players on a rectangular field. It involves a lot of physical activity and strategy, with the goal of scoring goals by kicking a ball into the opponent\'s net.'

c1_sentence_refined = t.send(refinement_template_with_description.format('C1', cefr_decriptions['C1'], b2_sentence ))
c1_sentence_refined

"Football, a widely enjoyed sport, is played by teams consisting of eleven players on a rectangular field. It requires both physical prowess and strategic thinking, aiming to score goals by skillfully maneuvering a ball into the opposing team's net."

C1 -> B2

In [39]:
c1_sentence = 'The intensity and competitiveness of football matches can create an electrifying atmosphere in stadiums, with passionate fans cheering on their favorite teams.'

b2_sentence_refined = t.send(refinement_template_with_description.format('B2', cefr_decriptions['B2'], c1_sentence))
b2_sentence_refined

'Football matches can generate an exhilarating atmosphere in stadiums due to their high intensity and competitiveness. Devoted fans enthusiastically support their preferred teams, adding to the excitement.'

### Experimenting with higher temperatures for text simplification purposes
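
One possible way to set up such an experiment is to send the same simplification prompt at several temperatures and keep the responses keyed by temperature. The sketch below is an assumption about how this could be organized, not the notebook's actual code: `sweep_temperatures` is a hypothetical helper, and `echo_send` is an offline stand-in for a model call that accepts a temperature.

```python
from typing import Callable, Dict, Sequence


def sweep_temperatures(
    prompt: str,
    send: Callable[[str, float], str],
    temperatures: Sequence[float] = (0.0, 0.7, 1.0, 1.3),
) -> Dict[float, str]:
    """Send the same prompt once per temperature and collect the responses."""
    return {temperature: send(prompt, temperature) for temperature in temperatures}


# Offline stand-in (assumption): a real sender would forward `temperature`
# to the model API instead of echoing it back.
def echo_send(prompt: str, temperature: float) -> str:
    return f"[temperature={temperature}] {prompt}"


results = sweep_temperatures("Refine the text to be on a A1 level: I like football.", echo_send, (0.7, 1.2))
for temperature, response in results.items():
    print(temperature, response)
```

Injecting the sender as a parameter keeps the sweep logic independent of how the model is called, so the same loop works with any wrapper that exposes a temperature argument.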

# Conclusions