In [1]:
!pip install --upgrade openai 
!pip install python-dotenv
!pip install pydantic

Collecting openai
  Downloading openai-1.55.1-py3-none-any.whl.metadata (24 kB)
Downloading openai-1.55.1-py3-none-any.whl (389 kB)
Installing collected packages: openai
  Attempting uninstall: openai
    Found existing installation: openai 1.55.0
    Uninstalling openai-1.55.0:
      Successfully uninstalled openai-1.55.0
Successfully installed openai-1.55.1


Import Dependencies

In [2]:
import openai
import os
import json
import logging
import pandas as pd
from dotenv import load_dotenv
from resources.OpenAIAzureEngine import OpenAIAzureEngine
from resources.OpenAIService import OpenAIService
from resources.CambridgeAssessmentCriteria import CambridgeAssessmentCriteria
from resources.WritingCriteriaScore import WritingCriteriaScore 
from resources.WritingResult import WritingResult

In [3]:
load_dotenv()

True

In [4]:
azure_api_key = os.getenv("OPEN_AI_API_KEY")
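`os.getenv` returns `None` when the variable is missing, and the failure only surfaces later, at the first API call. A minimal sketch of a stricter lookup follows; `require_env` is a hypothetical helper, not part of the notebook's `resources` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file."
        )
    return value


# Possible use in this notebook (after load_dotenv()):
# azure_api_key = require_env("OPEN_AI_API_KEY")
```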

In [5]:
engine_text = OpenAIAzureEngine(
    api_version="2024-08-01-preview",
    deployment="gpt-4o-2",
    api_key=azure_api_key,
)

service = OpenAIService(engine_text=engine_text)


In [6]:
def convert_openai_output_to_df(openai_output):
    # Extracting keys and values from the input dictionary
    criteria = list(openai_output.keys())
    scores = [openai_output[crit]['score'] for crit in criteria]
    feedback = [openai_output[crit]['feedback'] for crit in criteria]
    
    # Creating a DataFrame
    data = {
        "Criteria": criteria,
        "Score": scores,
        "Feedback": feedback
    }
    
    # Returning the DataFrame
    return pd.DataFrame(data)
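The function above expects a mapping shaped like the ExampleResult JSON used later in the prompt: one key per criterion, each holding a `score` and a `feedback`. The sketch below reproduces only the reshaping step with plain lists so it runs without pandas; `to_columns` and the sample values are hypothetical, for illustration only.

```python
from typing import Any, Dict, List


def to_columns(openai_output: Dict[str, Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Reshape {criterion: {"score", "feedback"}} into column lists."""
    criteria = list(openai_output)
    return {
        "Criteria": criteria,
        "Score": [openai_output[c]["score"] for c in criteria],
        "Feedback": [openai_output[c]["feedback"] for c in criteria],
    }


# Hypothetical sample shaped like the ExampleResult JSON in the prompt.
sample = {
    "Content": {"score": 4, "feedback": "Most points addressed."},
    "Language": {"score": 2, "feedback": "Frequent preposition errors."},
}
columns = to_columns(sample)
print(columns["Criteria"])  # ['Content', 'Language']
print(columns["Score"])     # [4, 2]
```

Passing a dictionary of this shape to `pd.DataFrame` gives one row per criterion.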

In [7]:
def get_assessment_criteria() -> str:
    """Return the four assessment criteria as a formatted block for a prompt."""
    criteria = {
        CambridgeAssessmentCriteria.Content.value: "Does the answer fulfill the task requirements? Is the response relevant and complete? Ensure the TaskAnswer is not just a repetition of the TaskAssessment. If it is, deduct points and provide feedback.",
        CambridgeAssessmentCriteria.CommunicativeAchievement.value: "Is the language appropriate for the task? Does the answer address the context? Check for originality and appropriate extensions of the TaskAssessment.",
        CambridgeAssessmentCriteria.Organization.value: "Is the response coherent and well-organized? Are ideas logically connected? If the answer repeats the TaskAssessment, penalize for lack of coherence.",
        CambridgeAssessmentCriteria.Language.value: "Is the grammar accurate and vocabulary appropriate for the level? Ensure that new vocabulary or grammar structures are used in the answer compared to the task.",
    }

    # One '--name: "description"' entry per line, separated by newline + two tabs.
    return '\n\t\t'.join(f"--{key}: \"{value}\"" for key, value in criteria.items())
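To see the string this join produces, here is a small self-contained sketch of the same formatting step. `format_criteria` and the two-entry mapping are hypothetical; the real keys come from `CambridgeAssessmentCriteria`.

```python
def format_criteria(criteria: dict) -> str:
    """Join entries as '--name: "description"', separated by newline + two tabs."""
    return "\n\t\t".join(f'--{key}: "{value}"' for key, value in criteria.items())


# Hypothetical two-entry mapping, for illustration only.
demo = {
    "Content": "Is the task fulfilled?",
    "Language": "Is the grammar accurate?",
}
formatted = format_criteria(demo)
print(formatted)
```

The first entry has no leading tabs, so the caller decides how the block is indented where it is inserted.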

In [8]:
question="""Rubric: Read this email from your English teacher, Mrs Rose, and the notes you have made.
To:
From: Mrs Rose
Subject: New film club
Body: I’d like to start an after-school Film Club.
We can meet either on Monday or Friday afternoon. Which one would you prefer?
Which types of film would you like to see?
I want to provide some food and drink during the films. Is this a good idea?
Please let me know what you think.
Many thanks!
Notes:

Note 1:
Note: That’s great
Email Part: I’d like to start an after-school Film Club.

Note 2:
Note: Explain
Email Part: We can meet either on Monday or Friday afternoon. Which one would you prefer?

Note 3:
Note: Explain which
Email Part: Which types of film would you like to see?

Note 4:
Note: Give your opinion
Email Part: I want to provide some food and drink during the films. Is this a good idea?"""

In [9]:
student_answer="""
Good afternoon Mrs Rose,

I just got your email and I think it’s really great idea because I think lots of people like watching films.

I personally would prefer Mondays because on Fridays I often have other plans with my family.

I would really love to see some detective or some action films but I relly don’t mind watching something different.

I think it’s good idea to have some food or drink during the film because lots of people are used to it because in cinemas they always eat something so i think it would be great.

Have a nice day.
"""
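The task asks for approximately 100 words, so a rough length check before calling the model can flag answers that are far too short or too long. This is a minimal sketch: `count_words`, `length_note`, and the 30% tolerance are assumptions for illustration, not a Cambridge rule.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated tokens; a rough proxy for an exam word count."""
    return len(text.split())


def length_note(text: str, target: int = 100, tolerance: float = 0.3) -> str:
    """Classify a text as 'short', 'ok', or 'long' relative to the target length."""
    n = count_words(text)
    if n < target * (1 - tolerance):
        return "short"
    if n > target * (1 + tolerance):
        return "long"
    return "ok"


print(count_words("Have a nice day."))  # 4
```

In the notebook this could be called as `length_note(student_answer)`.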

In [10]:
def get_user_prompt(question: str = '', answer: str = '') -> str:
    return f"""
The input provides candidates with a situation requiring them to respond to an email in approximately 100 words. The input consists of a rubric and an email containing four prompts presented as notes linked to relevant parts of the email text.

Candidates are expected to include all four content points in their response while demonstrating the ability to handle language functions typical of the B1 level, such as agreeing, disagreeing, giving opinions, offering, and explaining. The task assesses the candidate’s ability to communicate effectively in a structured, coherent, and clear manner.

Responses are evaluated using four subscales: Content, Communicative Achievement, Organisation, and Language. Candidates should ensure their response is relevant, adequately organised, and uses appropriate vocabulary and grammar for the context.

**Context:**

- The response must be evaluated based on Cambridge's B1 assessment scales. The scales consist of four subscales: **Content**, **Communicative Achievement**, **Organisation**, and **Language**.
- You are expected to provide a **score (0-5)** for each subscale and brief constructive **feedback** (no more than 5 sentences).
- The **task instructions** and a **one-shot example** of evaluations have been provided to guide your assessment.
- Use the **Cambridge Assessment B1 Preliminary handbook** and the **Common European Framework of Reference (CEFR)** to support your evaluation.
- **Identify all specific task points from the task instructions and assess whether the student's response addresses them fully and explicitly.**
- **PAY SPECIAL ATTENTION TO THE CONTENT SUBSCALE, ENSURING THAT ALL TASK POINTS ARE ADDRESSED.**
- **USE A CHECKLIST TO CONFIRM THAT ALL REQUIRED CONTENT POINTS ARE ADDRESSED.**
- **CONSIDER THE COMPLETENESS OF COMMUNICATIVE PURPOSES IN THE COMMUNICATIVE ACHIEVEMENT SUBSCALE.**
- **FOCUS ON THE CRITERIA SPECIFIC TO EACH SUBSCALE. DO NOT PENALIZE ISSUES THAT BELONG TO OTHER SUBSCALES.**
- **ASSESS LANGUAGE ERRORS UNDER THE LANGUAGE SUBSCALE UNLESS THEY SIGNIFICANTLY IMPEDE COMMUNICATION IN OTHER SUBSCALES.**
- **DO NOT LET LANGUAGE ERRORS OR CLARITY ISSUES DUE TO LANGUAGE ERRORS AFFECT THE CONTENT SCORE.**
- **UNDER THE CONTENT SUBSCALE, IF THE STUDENT ADDRESSES ALL REQUIRED CONTENT POINTS, YOU MUST ASSIGN A SCORE OF 5 IN CONTENT, REGARDLESS OF ANY OTHER ISSUES, INCLUDING LANGUAGE ERRORS, IRRELEVANCES, OR MISINTERPRETATIONS DUE TO LANGUAGE ERRORS. DO NOT DEDUCT POINTS IN THE CONTENT SUBSCALE FOR ANY REASON OTHER THAN MISSING OR INADEQUATELY ADDRESSING REQUIRED CONTENT POINTS.**
- Only reply with the assessment criteria; don't add anything else.
- The feedback is given by you.

**Content Subscale:**

- **Band 5 (Score 5):**
  - **All content is relevant to the task.**
  - **All content points from the task are addressed appropriately.**
  - **The target reader is fully informed.**
  - **IMPORTANT: IF ALL REQUIRED CONTENT POINTS ARE MENTIONED, YOU MUST ASSIGN A CONTENT SCORE OF 5, REGARDLESS OF LANGUAGE ERRORS OR CLARITY ISSUES. DO NOT DEDUCT POINTS FOR ANY OTHER REASONS IN THE CONTENT SUBSCALE.**

- **Band 4 (Score 4):**
  - Performance shares features of Bands 3 and 5.

- **Band 3 (Score 3):**
  - **Minor irrelevances and/or omissions may be present.**
  - **Most content points are addressed.**
  - **The target reader is on the whole informed.**

- **Band 2 (Score 2):**
  - Performance shares features of Bands 1 and 3.

- **Band 1 (Score 1):**
  - **Irrelevances and misinterpretation of task may be present.**
  - **Some content points are addressed.**
  - **The target reader is minimally informed.**

**Communicative Achievement Subscale:**

- **Band 5 (Score 5):**
  - **Uses the conventions of the communicative task to hold the target reader’s attention and communicate straightforward ideas.**
  - **Maintains an appropriate tone and register for the task.**

- **Band 4 (Score 4):**
  - Performance shares features of Bands 3 and 5.

- **Band 3 (Score 3):**
  - **Uses the conventions of the communicative task in generally appropriate ways to communicate straightforward ideas.**
  - **Some awareness of tone and register, which is generally appropriate.**

- **Band 2 (Score 2):**
  - Performance shares features of Bands 1 and 3.

- **Band 1 (Score 1):**
  - **Produces text that communicates simple ideas in simple ways.**
  - **Limited or inconsistent awareness of tone and register.**

**Organisation Subscale:**

- **Band 5 (Score 5):**
  - **Text is generally well-organised and coherent.**
  - **Uses a variety of linking words and cohesive devices.**

- **Band 4 (Score 4):**
  - Performance shares features of Bands 3 and 5.

- **Band 3 (Score 3):**
  - **Text is connected and coherent.**
  - **Uses basic linking words and a limited number of cohesive devices.**

- **Band 2 (Score 2):**
  - Performance shares features of Bands 1 and 3.

- **Band 1 (Score 1):**
  - **Text is connected using basic, high-frequency linking words.**
  - **Limited use of cohesive devices; ideas may be disjointed.**

**Language Subscale:**

- **Band 5 (Score 5):**
  - **Uses a range of everyday vocabulary appropriately, with occasional inappropriate use of less common lexis.**
  - **Uses a range of simple and some complex grammatical forms with a good degree of control.**
  - **Errors do not impede communication.**

- **Band 4 (Score 4):**
  - Performance shares features of Bands 3 and 5.

- **Band 3 (Score 3):**
  - **Uses everyday vocabulary generally appropriately, while occasionally overusing certain lexis.**
  - **Uses simple grammatical forms with a good degree of control.**
  - **While errors are noticeable, meaning can still be determined.**

- **Band 2 (Score 2):**
  - Performance shares features of Bands 1 and 3.

- **Band 1 (Score 1):**
  - **Uses basic vocabulary reasonably appropriately.**
  - **Uses simple grammatical forms with some degree of control.**
  - **Errors may impede meaning at times.**

**Additional Guidance:**

- **Identify the content points required by the task.**
- **Ensure that each content point is addressed in the student's response.**
- **Use a checklist to confirm that all required content points are addressed.**
- **Do NOT penalise for language errors in the Content subscale.**
- **Assess the student's ability to communicate their ideas effectively in the Communicative Achievement subscale.**
- **Evaluate the coherence and cohesion of the text in the Organisation subscale.**
- **Assess the range and accuracy of vocabulary and grammar in the Language subscale.**

    **<EXAMPLE>**
    **<TaskAssessment>**
    **Rubric:**
    Read this email from your English-speaking friend Sandy and the notes you have made.
    From: Sandy
    Subject: Your visit!
    Body: Hi,
I’m so excited that you’re coming to stay with me for a week!

On your first evening here, there’s a rock concert in our town. Would you like to go to the concert or would you prefer us to relax at home?

Also, shall we go climbing in the mountains while you’re here?

Let me know if you have any questions.

See you soon

Sandy
Notes: 

Note 1:
Note: Me too!
Email Part: I’m so excited that you’re coming to stay with me for a week!

Note 2:
Note: Say which I prefer
Email Part: Would you like to go to the concert or would you prefer us to relax at home?

Note 3:
Note: No, because …
Email Part: Also, shall we go climbing in the mountains while you’re here?

Note 4:
Note: Ask Sandy …
Email Part: Let me know if you have any questions.
    **</TaskAssessment>**
    **<TaskAnswer>**
Hi my dearest friend,

I’m excited about that as you.

Of course I can go. I don’t prefer to listen only one genre of music for too long, so, if concert will annoy me, I will go, I suppose you don’t mind. You know, I don’t depend on kinds of music, I listen what I like to, but still, I don’t mind about rock.

We can go for climbing for sure, but If you’re meaning climbing by ropes and pickaxes, I’m out. I’m afraid of heights. If you have something on your mind please offer me.

I’m looking forward for next week.

Alex
    **</TaskAnswer>**
    **<ExampleResult>**
    {{
        "Content": {{
            "score": 4,
            "feedback": "The content is relevant to the task, but the final point, which requires the candidate to ask for some information, has not been fully addressed. There is an attempt at this point, If you have something on your mind please offer me, but it is not a question that the reader needs to respond to, and it is not clearly expressed. As such, the task has not been completely fulfilled, but the target reader is on the whole informed."
        }},
        "CommunicativeAchievement": {{
            "score": 3,
            "feedback": "The conventions of an informal email to a friend, such as a friendly tone, a consistent register and appropriate opening and closing salutations, Hi my dearest friend; Of course; You know, I don’t; for sure; I’m out; I’m looking forward for next week, are used in generally appropriate ways to communicate straightforward ideas. There are some negative comments, so, if concert will annoy me, I will go, which could be expressed in a more polite way, but the general ideas are communicated."
        }},
        "Organisation": {{
            "score": 2,
            "feedback": "The text is connected and generally coherent. The four notes are addressed and paragraphs are used to generally good effect. The text is connected using high-frequency linking words, so; but; If and there is some good use of punctuation, for too long, so, if concert will annoy me, I will go."
        }},
        "Language": {{
            "score": 2,
            "feedback": "Everyday vocabulary is used generally appropriately, genre; annoy; don’t mind; pickaxes; afraid of heights, although there is some lifting from the input text, excited about that; I don’t prefer. Simple grammatical forms are used with some degree of control, including attempts at forming conditional clauses, if concert will annoy me, I will go; If you’re meaning … I’m out; If you have something on your mind, please offer me. There are a number of errors with prepositions and word choices, I don’t depend on kinds of music; for climbing; please offer me, which leads to a lack of clarity at times. For example, I will go is ambiguous as it could mean I will leave the concert or I will go to the concert, so meaning is unclear at times."
        }}
    }}
    **</ExampleResult>**
    **</EXAMPLE>**

    **Task to Evaluate:**
    **<TaskAssessment>**
    {question}
    **</TaskAssessment>**
    **<TaskAnswer>**
    {answer}
    **</TaskAnswer>**

    **Extra Information:**
    - Maintain the camel case and spacing format of the criteria names in the output.
    - Do not take into consideration or transcribe struck-through words.
    - Candidates are asked to write approximately 100 words. Writing approximately the right number of words is an integral part of task achievement. If significantly fewer words are written, this is likely to mean that the task has not been successfully completed. If significantly more words are written, there is likely to be irrelevance, repetition of ideas, or poor organisation.
    """
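The ExampleResult block in the prompt uses doubled braces because the prompt is an f-string: `{{` and `}}` render as literal `{` and `}`, while `{question}` and `{answer}` are substituted. A small sketch of that behaviour, using a hypothetical `build_example` function:

```python
import json


def build_example(answer: str) -> str:
    # {{ and }} become literal braces; {answer} is replaced by the argument.
    return f"""{{
    "Content": {{"score": 5}}
}}
Answer: {answer}"""


rendered = build_example("Hello Mrs Rose")
print(rendered)

# The rendered JSON part is valid JSON once the braces are un-doubled.
parsed = json.loads(rendered.split("\nAnswer:")[0])
print(parsed)  # {'Content': {'score': 5}}
```

Substituted values are not re-interpreted, so braces inside a student's answer are passed through unchanged.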

In [11]:
prompt = [
    {
        "role": "system",
        "content": """
        You are a world-class B1 Preliminary Cambridge writing examiner with over 10 years of experience.
        As a certified Cambridge English examiner, you have in-depth knowledge of the CEFR standards, especially for the B1 level.
        You provide detailed, constructive feedback that highlights strengths and areas for improvement, using clear and supportive language. Your goal is to help learners progress by offering actionable advice while encouraging their efforts.
        You are asked to evaluate students' email or letter writing in response to a task and provide feedback.
        """,
    },
    {
        "role": "user",
        "content": get_user_prompt(question, student_answer),
    },
]

From here on, ALWAYS run these cells!

In [12]:
print(prompt[0]['content'])
print(prompt[1]['content'])


        You are a world-class B1 Preliminary Cambridge writing examiner with over 10 years of experience.
        As a certified Cambridge English examiner, you have in-depth knowledge of the CEFR standards, especially for the B1 level.
        You provide detailed, constructive feedback that highlights strengths and areas for improvement, using clear and supportive language. Your goal is to help learners progress by offering actionable advice while encouraging their efforts.
        You are asked to evaluate students' email or letter writing in response to a task and provide feedback.
        

The input provides candidates with a situation requiring them to respond to an email in approximately 100 words. The input consists of a rubric and an email containing four prompts presented as notes linked to relevant parts of the email text.

Candidates are expected to include all four content points in their response while demonstrating the ability to handle language functions typical of th

In [13]:
service.temperature = 0.7

In [14]:
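# Parse the model's reply into a WritingResult; the second value is the token count returned by the service.
ai_response, spent_tokens = service.parse(WritingResult, prompt=prompt)

# Print each criterion with its score and feedback.
for criterion_score in ai_response.criteria:
    print(f" {criterion_score.criteria}, Score: {criterion_score.score},   feedback: {criterion_score.feedback} \n")

#df = convert_openai_output_to_df(json.loads(ai_response))
#df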

 Content, Score: 5,   feedback: All content points are addressed appropriately, and the target reader is fully informed. The student expresses enthusiasm about the Film Club, states a preferred day for meetings, suggests film genres, and provides an opinion on offering food and drinks. 

 Communicative Achievement, Score: 4,   feedback: The response maintains an appropriate tone for a communication with a teacher and successfully uses the conventions of an email. The ideas are communicated clearly, but there is room for slightly more varied expression to fully engage the reader. 

 Organisation, Score: 4,   feedback: The text is organized and coherent, with clear paragraphs that correspond to the content points. Basic linking words such as 'because' and 'so' are used effectively, though more varied cohesive devices could enhance the flow. 

 Language, Score: 3,   feedback: Everyday vocabulary is used appropriately, and there is control over simple grammatical forms. However, there are 
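Each subscale is scored from 0 to 5, so a quick sanity check on the parsed result can catch out-of-range values before the scores are stored or totalled. The sketch below uses a dataclass as a hypothetical stand-in for `WritingCriteriaScore`, with only the three attributes used in the loop above; `summarize` is likewise an illustrative helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CriterionScore:  # hypothetical stand-in for WritingCriteriaScore
    criteria: str
    score: int
    feedback: str


def summarize(scores: List[CriterionScore], max_per_criterion: int = 5) -> dict:
    """Check each score lies in 0..max_per_criterion and return the total."""
    for item in scores:
        if not 0 <= item.score <= max_per_criterion:
            raise ValueError(f"{item.criteria}: score {item.score} out of range")
    return {
        "total": sum(item.score for item in scores),
        "maximum": max_per_criterion * len(scores),
    }


# Scores taken from the output above; feedback text omitted for brevity.
results = [
    CriterionScore("Content", 5, "(omitted)"),
    CriterionScore("Communicative Achievement", 4, "(omitted)"),
    CriterionScore("Organisation", 4, "(omitted)"),
    CriterionScore("Language", 3, "(omitted)"),
]
print(summarize(results))  # {'total': 16, 'maximum': 20}
```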