<a href="https://colab.research.google.com/github/iam-Dylan/automated-essay-scoring/blob/meo/llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

### Install the Python SDK

The Python SDK for the Gemini API is contained in the [`google-generativeai`](https://pypi.org/project/google-generativeai/) package. Install the dependency using pip:

In [None]:
!pip install -q -U google-generativeai

### Import packages

Import the necessary packages.

In [None]:
import pathlib
import textwrap

import google.generativeai as genai

from IPython.display import display
from IPython.display import Markdown

import re
import numpy as np
import pandas as pd

from google.colab import userdata


### Set up the API key

Before you can use the Gemini API, you must first obtain an API key. If you don't already have one, create a key with one click in Google AI Studio.

<a class="button button-primary" href="https://makersuite.google.com/app/apikey" target="_blank" rel="noopener noreferrer">Get an API key</a>

In [None]:
# Read the key from Colab secrets ('GOOGLE_API_KEY' is an assumed secret name) instead of hardcoding it
genai.configure(api_key=userdata.get('GOOGLE_API_KEY'))

## Choose a model

Now you're ready to call the Gemini API. Use `list_models` to see the available Gemini models:

* `gemini-1.5-pro`: optimized for high intelligence tasks, the most powerful Gemini model
* `gemini-1.5-flash`: optimized for multi-modal use-cases where speed and cost are important

In [None]:
for m in genai.list_models():
  if 'generateContent' in m.supported_generation_methods:
    print(m.name)

models/gemini-1.0-pro
models/gemini-1.0-pro-001
models/gemini-1.0-pro-latest
models/gemini-1.0-pro-vision-latest
models/gemini-1.5-flash
models/gemini-1.5-flash-001
models/gemini-1.5-flash-latest
models/gemini-1.5-pro
models/gemini-1.5-pro-001
models/gemini-1.5-pro-latest
models/gemini-pro
models/gemini-pro-vision


Note: For detailed information about the available models, including their capabilities and rate limits, see [Gemini models](https://ai.google.dev/models/gemini). There are options for requesting [rate limit increases](https://ai.google.dev/docs/increase_quota). The rate limit for Gemini-Pro models is 60 requests per minute (RPM).

The `genai` package also supports the PaLM family of models, but only the Gemini models support the generic, multimodal capabilities of the `generateContent` method.

In [None]:
model = genai.GenerativeModel('gemini-1.5-flash')

We select the `gemini-1.5-flash` model.

The `generate_content` method can handle a wide variety of use cases, including multi-turn chat and multimodal input, depending on what the underlying model supports. The available models only support text and images as input, and text as output.

In the simplest case, you can pass a prompt string to the <a href="https://ai.google.dev/api/python/google/generativeai/GenerativeModel#generate_content"><code>GenerativeModel.generate_content</code></a> method:

In [None]:
def to_markdown(text):
  text = text.replace('•', '  *')
  return Markdown(textwrap.indent(text, '> ', predicate=lambda _: True))

Read the data file

In [None]:
FILEID = '1hUhF4f-gGTixo_-b-ytez01_swNBslIG'
url = f"https://drive.google.com/uc?export=download&id={FILEID}"
# Read the CSV file from the URL
try:
    data = pd.read_csv(url)
except Exception as e:
    print(f"An error occurred: {e}")
    raise  # re-raise: the cells below cannot run without `data`
train = data.copy()
train.head()

Unnamed: 0,essay_id,full_text,score
0,000d118,Many people have car where they live. The thin...,3
1,000fe60,I am a scientist at NASA that is discussing th...,3
2,001ab80,People always wish they had the same technolog...,4
3,001bdc0,"We all heard about Venus, the planet without a...",4
4,002ba53,"Dear, State Senator\n\nThis is a letter to arg...",3


#### Build the basic prompts

---



First, we assign a role to the model.

In [None]:
# Conversation history: a list of 'user' and 'model' turns
messages = [
    {'role':'user',
     'parts': ["You are a teacher in high school. you will score this essay below. Are you ready"]}
]
response = model.generate_content(messages)
messages.append({'role':'model',
                 'parts':[response.text]})

to_markdown(response.text)

> Please provide me with the essay you want me to score. I'm ready to evaluate it!  😊 
> 
> To help me give you the most accurate feedback, please also tell me:
> 
> * **What is the essay prompt?** 
> * **What grade level is this for?** 
> * **What are the specific criteria you want me to focus on?** (e.g., grammar, organization, analysis, etc.) 
> 
> I look forward to helping you! 
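
The conversation history used throughout this notebook is a plain list of dicts, each with a `role` (`user` or `model`) and a `parts` list. A minimal sketch of a helper that appends one turn in that shape follows. `add_turn` is a hypothetical name, not part of the `google-generativeai` SDK, and the check that turns alternate is an assumption of this sketch rather than something the notebook enforces.

```python
def add_turn(messages, role, text):
    """Append one turn to a chat history in the {'role', 'parts'} format used above."""
    if role not in ('user', 'model'):
        raise ValueError(f"unknown role: {role!r}")
    # Assumption of this sketch: turns alternate, so the same role never appears twice in a row.
    if messages and messages[-1]['role'] == role:
        raise ValueError(f"two consecutive {role!r} turns")
    messages.append({'role': role, 'parts': [text]})
    return messages


history = []
add_turn(history, 'user', "You are a teacher in high school.")
add_turn(history, 'model', "I'm ready to evaluate the essay.")
print([turn['role'] for turn in history])
```

Keeping the append logic in one place makes it harder to record the same model reply twice by accident.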


**Answering the question** 'What is the essay topic?': \\
Because the essays are not limited to a single topic, we write a prompt that asks the model to work out each essay's topic itself.

In [None]:
messages.append({'role':'user',
     'parts': [f"""- The topic is multidisciplinary. You have to find out topic in each essay.
                    """]})
response = model.generate_content(messages)
messages.append({'role':'model',
                 'parts':[response.text]})

to_markdown(response.text)

> Please provide me with the essay you want me to analyze. I need the text of the essay to identify the topic and its multidisciplinary aspects.  
> 
> Once you provide the essay, I will do my best to:
> 
> * **Identify the main topic:** I will look for the central theme, idea, or argument that the essay explores.
> * **Analyze its multidisciplinary nature:** I will examine how the essay draws on concepts, theories, and perspectives from different academic disciplines. 
> * **Explain how these disciplines connect:** I will show how the different disciplines contribute to a deeper understanding of the topic.
> 
> I'm ready to help you understand the essay's multidisciplinary approach! 


Before giving the scoring criteria and the desired output format, we pass in one essay and see what response comes back.

In [None]:
messages.append({'role':'user',
                'parts':[f"""Essay: {train['full_text'][3]}."""]})
response = model.generate_content(messages,
                                generation_config=genai.types.GenerationConfig(
                                max_output_tokens=20,
                                temperature=0.7))

In [None]:
print(response.text)
messages.append({'role':'model',
                 'parts':[response.text]})

This essay explores the topic of **Venus as a potential target for future scientific exploration**, despite its hostile


The result above shows that the model did not return a score as we expected, because we have not yet described the expected output or given the details of the scoring criteria.

To get the desired output, we apply the zero-shot technique.


## Techniques used

### Zero shot

Zero-shot prompting gives the model only the task instructions (here, the scoring rubric) and no worked examples.

In [None]:

criteria = '''After reading each essay and completing the analytical rating form, assign a holistic score based on the rubric
below. For the following evaluations you will need to use a grading scale between 1 (minimum) and 6
(maximum). As with the analytical rating form, the distance between each grade (e.g., 1-2, 3-4, 4-5) should be
considered equal.
SCORE OF 6: An essay in this category demonstrates clear and consistent mastery, although it may have a
few minor errors. A typical essay effectively and insightfully develops a point of view on the issue and
demonstrates outstanding critical thinking; the essay uses clearly appropriate examples, reasons, and other
evidence taken from the source text(s) to support its position; the essay is well organized and clearly focused,
demonstrating clear coherence and smooth progression of ideas; the essay exhibits skillful use of language,
using a varied, accurate, and apt vocabulary and demonstrates meaningful variety in sentence structure; the
essay is free of most errors in grammar, usage, and mechanics.
SCORE OF 5: An essay in this category demonstrates reasonably consistent mastery, although it will have
occasional errors or lapses in quality. A typical essay effectively develops a point of view on the issue and
demonstrates strong critical thinking; the essay generally using appropriate examples, reasons, and other
evidence taken from the source text(s) to support its position; the essay is well organized and focused,
demonstrating coherence and progression of ideas; the essay exhibits facility in the use of language, using
appropriate vocabulary demonstrates variety in sentence structure; the essay is generally free of most errors in
grammar, usage, and mechanics.
SCORE OF 4: An essay in this category demonstrates adequate mastery, although it will have lapses in
quality. A typical essay develops a point of view on the issue and demonstrates competent critical thinking; the
essay using adequate examples, reasons, and other evidence taken from the source text(s) to support its
position; the essay is generally organized and focused, demonstrating some coherence and progression of ideas
exhibits adequate; the essay may demonstrate inconsistent facility in the use of language, using generally
appropriate vocabulary demonstrates some variety in sentence structure; the essay may have some errors in
grammar, usage, and mechanics.
SCORE OF 3: An essay in this category demonstrates developing mastery, and is marked by ONE OR
MORE of the following weaknesses: develops a point of view on the issue, demonstrating some critical
thinking, but may do so inconsistently or use inadequate examples, reasons, or other evidence taken from the
source texts to support its position; the essay is limited in its organization or focus, or may demonstrate some
lapses in coherence or progression of ideas displays; the essay may demonstrate facility in the use of language,
but sometimes uses weak vocabulary or inappropriate word choice and/or lacks variety or demonstrates
problems in sentence structure; the essay may contain an accumulation of errors in grammar, usage, and
mechanics.
SCORE OF 2: An essay in this category demonstrates little mastery, and is flawed by ONE OR MORE of
the following weaknesses: develops a point of view on the issue that is vague or seriously limited, and
demonstrates weak critical thinking; the essay provides inappropriate or insufficient examples, reasons, or
other evidence taken from the source text to support its position; the essay is poorly organized and/or focused,
or demonstrates serious problems with coherence or progression of ideas; the essay displays very little facility
in the use of language, using very limited vocabulary or incorrect word choice and/or demonstrates frequent
problems in sentence structure; the essay contains errors in grammar, usage, and mechanics so serious that
meaning is somewhat obscured.
SCORE OF 1: An essay in this category demonstrates very little or no mastery, and is severely flawed by
ONE OR MORE of the following weaknesses: develops no viable point of view on the issue, or provides little
or no evidence to support its position; the essay is disorganized or unfocused, resulting in a disjointed or
incoherent essay; the essay displays fundamental errors in vocabulary and/or demonstrates severe flaws in
sentence structure; the essay contains pervasive errors in grammar, usage, or mechanics that persistently
interfere with meaning'''

In [None]:
messages.append({'role':'user',
     'parts': [f"Follow this criteria {criteria} to score essay. "]})
response = model.generate_content(messages)
messages.append({'role':'model',
                 'parts':[response.text]})

to_markdown(response.text)

> Okay, I can help you score the essay based on your rubric. Here's a breakdown:
> 
> **Strengths:**
> 
> * **Clear Focus:** The essay clearly focuses on the potential for studying Venus, even with its challenging environment.
> * **Evidence from Source:** The essay uses specific examples from the source text to support its argument (e.g., the description of Venus's atmosphere, NASA's proposed blimp-like vehicle).
> * **Concession:** The essay acknowledges the challenges of studying Venus, demonstrating a more nuanced understanding of the topic. 
> * **Logical Progression:**  The essay follows a logical progression of ideas, moving from the difficulties of Venus to the potential solutions and the importance of scientific exploration.
> 
> **Weaknesses:**
> 
> * **Lack of Depth:** While the essay mentions multidisciplinary aspects, it doesn't delve deeply into how different disciplines would contribute to studying Venus. It would benefit from specific examples of disciplines and their relevant contributions. 
> * **Repetitive Language:** Some phrases are repeated (e.g., "the author") which could be replaced with more varied language. 
> * **Minor Grammar Errors:**  There are some minor grammatical errors (e.g., "despite of" instead of "despite").
> 
> **Overall Score:**
> 
> Considering the strengths and weaknesses, I would assign this essay a **score of 4**.  It demonstrates adequate mastery of the topic, but it lacks the depth and linguistic sophistication to reach a higher score. 
> 
> **Recommendations for Improvement:**
> 
> * **Expand on Multidisciplinary Aspects:**  Explore how disciplines like astronomy, geology, atmospheric science, engineering, and even biology might contribute to studying Venus. 
> * **Strengthen Analysis:**  Connect the evidence to a stronger argument about why studying Venus is important, even in the face of challenges. 
> * **Refine Language:**  Use varied and precise language to express ideas more effectively and avoid repetition.
> * **Proofread Carefully:**  Pay attention to grammar and mechanics before submitting the essay. 


Now we get a score as desired. \\
However, in the prompt above we only asked for a score, yet the output also contains comments, because the model does not yet know what the expected output looks like. I also deliberately left the number of returned tokens unlimited in that prompt, so the output is fairly long.

### One shot, Few shot

For what one-shot and few-shot prompting are, and their advantages and disadvantages, see:

https://www.e2enetworks.com/blog/a-guide-to-prompt-engineering-from-zero-shot-to-chain-of-thought

https://viblo.asia/p/tat-tan-tat-nhung-ki-thuat-prompt-engineering-huu-ich-nhat-cho-chatgpt-bXP4WzmqV7G#_1-zero-shot-learning-9

In [None]:
examples = [
            {'Essay':train['full_text'][0],
             'Response': """
             - Topic: Car-free communities and their benefits
             - Comment: The essay expresses a clear opinion, but lacks strong organization and uses informal language.  It needs more evidence and a stronger argument.
             - Score: 3"""},

            {'Essay':train['full_text'][1],
             'Response': """
            - Topic: The "Face on Mars" is a landform
            - Comment: The essay argues the point but lacks strong evidence and organization. It uses informal language and repetitive statements.
            - Score: 3"""},

              {'Essay':train['full_text'][2],
             'Response': """
           - Topic: The risks of driverless cars
           - Comment: The essay presents a clear argument against driverless cars, but lacks strong evidence and could benefit from more concrete examples.
           - Score: 4""" }
            ]


Use the sample `examples` in the prompt to get the desired output format.

In [None]:
messages.append({'role':'user',
                'parts':[f"""Out put must be 3 lines like this. Sticky to above format. Do you understand ?
                             {examples[0]['Response']}"""]})
response = model.generate_content(messages,
                                generation_config=genai.types.GenerationConfig(
                                max_output_tokens=10,
                                temperature=0.7))

print(response.text)

You got it! I understand. I will provide


In [None]:
messages.append({'role':'model',
                 'parts':[response.text]})

In [None]:
messages.append({'role':'user',
                'parts':[f"""Score this essay: {train['full_text'][3]}.
                            Response
                            - Topic: topic of essay (max 15 tokens)
                            - Explanation: your comment (max 40 tokens)
                            - Score: score you grade (from 1 to 6)

                            Example:
                            {examples[0]['Essay']}
                            Response: {examples[0]['Response']}
                            {examples[1]['Essay']}
                            Response: {examples[1]['Response']}
                            {examples[2]['Essay']}
                            Response: {examples[2]['Response']}"""]})
response = model.generate_content(messages,
                                generation_config=genai.types.GenerationConfig(
                                max_output_tokens=55,
                                temperature=0.7))

print(response.text)

Response:
- Topic: Studying Venus
- Comment:  The essay is well-organized and focuses on the importance of studying Venus. It uses relevant examples but could benefit from more in-depth analysis and stronger language.
- Score: 4 
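
The few-shot prompt above is rebuilt by hand in several cells. A minimal sketch of a helper that assembles the same pieces (target essay, field list, then the worked examples) is shown below. `build_scoring_prompt` is a hypothetical name, and the sketch labels the second field `Comment` so that the instruction matches the examples; that is a choice made here, not the notebook's exact prompt text.

```python
def build_scoring_prompt(essay, examples):
    """Assemble a few-shot scoring prompt: target essay, field list, then worked examples."""
    lines = [
        f"Score this essay: {essay}",
        "Response",
        "- Topic: topic of essay (max 15 tokens)",
        "- Comment: your comment (max 40 tokens)",
        "- Score: score you grade (from 1 to 6)",
        "",
        "Example:",
    ]
    # Each example contributes the essay text followed by its reference response.
    for ex in examples:
        lines.append(ex['Essay'])
        lines.append(f"Response: {ex['Response'].strip()}")
    return "\n".join(lines)


# Placeholder example data for illustration only (not taken from the dataset).
demo_examples = [
    {'Essay': "Cars are bad.", 'Response': "\n- Topic: Cars\n- Comment: Too short.\n- Score: 1"},
]
print(build_scoring_prompt("Venus is worth studying.", demo_examples))
```

With the notebook's data, the call would look like `build_scoring_prompt(train['full_text'][3], examples)`.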



Once the model returns the correct output format, we ask it to keep that format for the following essays, because the model can still slip occasionally even though we have constrained the format tightly with illustrative examples (few-shot).

In [None]:
messages.append({'role':'model',
                 'parts':[response.text]})
messages.append({'role':'user',
                'parts':[f"This format of output is exactly what i want. Keep it!!!"]
                })


In [None]:
response = model.generate_content(messages,
                                generation_config=genai.types.GenerationConfig(
                                max_output_tokens=10,
                                temperature=0.7))

print(response.text)

You got it! I will stick to this format


When asked to keep the format above, the model acknowledged the request.

We now test on a few essays to verify.

In [None]:
messages.append({'role':'model',
                 'parts':[response.text]})

# Try on essay[5]
messages.append({'role':'user',
                'parts':[f"""Score this essay{train['full_text'][5]}.
                            Response
                            - Topic: topic of essay (max 15 tokens)
                            - Explanation: your comment (max 40 tokens)
                            - Score: score you grade (from 1 to 6)
                            Example:
                            {examples[0]['Essay']}
                            Response: {examples[0]['Response']}
                            {examples[1]['Essay']}
                            Response: {examples[1]['Response']}
                            {examples[2]['Essay']}
                            Response: {examples[2]['Response']}"""]})
response = model.generate_content(messages,
                                generation_config=genai.types.GenerationConfig(
                                max_output_tokens=55,
                                temperature=0.7))

print(response.text)

Response:
- Topic: Abolishing the Electoral College
- Comment: The essay presents a clear argument for abolishing the Electoral College but lacks strong supporting evidence and analysis.
- Score: 3 



In [None]:
# Test on essays[6:11]
messages.append({'role':'model',
                 'parts':[response.text]})
scores = []
for e in train['full_text'][6:11]:
    messages.append({'role':'user',
                 'parts':[f"""Score this essay{e}.
                             Response
                            - Topic: topic of essay (max 15 tokens)
                            - Explanation: your comment (max 40 tokens)
                            - Score: score you grade (from 1 to 6)
                            Example:
                            {examples[0]['Essay']}
                            Response: {examples[0]['Response']}
                            {examples[1]['Essay']}
                            Response: {examples[1]['Response']}
                            {examples[2]['Essay']}
                            Response: {examples[2]['Response']}"""]})
    response = model.generate_content(messages,
                                  generation_config=genai.types.GenerationConfig(
                                  max_output_tokens=55,
                                  temperature=0.7))

    print(response.text)
    score_line = response.text.split('\n')[3]
    s = re.findall(r'\d+', score_line)[0]
    scores.append(s)
    messages.append({'role':'model',
                 'parts':[response.text]})

Response:
- Topic: Face-recognizing computers in education
- Explanation: The essay lacks a clear argument and evidence. The writing is informal and contains errors. 
- Score: 2 

Response:
- Topic: The Seagoing Cowboys Program
- Explanation: The essay lacks a clear thesis statement and organization. It is repetitive and uses informal language. 
- Score: 2 

Response:
- Topic: Exploring Venus
- Explanation: The essay lacks a clear thesis and organization. The writing is informal and contains errors. 
- Score: 2 



Response:
- Topic: Benefits of the Seagoing Cowboys Program
- Explanation: The essay lacks a clear thesis and organization. It is repetitive and uses informal language. 
- Score: 2 

Response:
- Topic: Dangers of Driverless Cars
- Explanation: The essay presents a basic argument against driverless cars, but lacks strong evidence and organization. It uses informal language. 
- Score: 3 
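
The loop above reads the score from a fixed line index (`split('\n')[3]`), which only works when the reply has exactly that layout. A reply cut short by `max_output_tokens`, or one with an extra leading blank line, would shift or remove that line. A minimal sketch of a more tolerant parser that looks for the `Score:` label instead follows; `parse_score` is a hypothetical helper, not code from the notebook.

```python
import re


def parse_score(text, low=1, high=6):
    """Return the integer after 'Score:' in a model reply, or None if absent or out of range."""
    match = re.search(r'Score:\s*(\d+)', text)
    if match is None:
        return None
    score = int(match.group(1))
    return score if low <= score <= high else None


reply = "Response:\n- Topic: Exploring Venus\n- Explanation: Lacks a clear thesis.\n- Score: 2 \n"
print(parse_score(reply))  # 2
print(parse_score("- Topic: Keeping the Electoral College\n- Comment: The essay"))  # None
```

Returning `None` for an unparseable reply lets the caller decide whether to skip the essay or retry, and keeps track of which essays actually received a score.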



## Test model

Now we run the real test. \\
Because we are using the free tier of the model, token usage is limited. To reduce the chance of the program crashing mid-run, we test on only the first 50 essays.

#### Test on 50 essays

In [None]:
messages.append({'role':'model',
                 'parts':[response.text]})
scores = []
for e in train['full_text'][:50]:
    messages.append({'role':'user',
                 'parts':[f"""Score this essay{e}.
                             Response
                            - Topic: topic of essay (max 15 tokens)
                            - Explanation: your comment (max 40 tokens)
                            - Score: score you grade (from 1 to 6)"""]})
    response = model.generate_content(messages,
                                  generation_config=genai.types.GenerationConfig(
                                  max_output_tokens=60,
                                  temperature=0.7))

    print(response.text)
    score_line = response.text.split('\n')[3]
    s = re.findall(r'\d+', score_line)[0]
    scores.append(s)
    messages.append({'role':'model',
                 'parts':[response.text]})

Response:
- Topic: Car-free communities
- Comment: The essay expresses a clear opinion but lacks strong organization and uses informal language. It needs more evidence and a stronger argument. 
- Score: 3 

Response:
- Topic: The "Face on Mars" is a landform
- Comment: The essay argues the point but lacks strong evidence and organization. It uses informal language and repetitive statements. 
- Score: 3 

Response:
- Topic: The risks of driverless cars
- Comment: The essay presents a clear argument against driverless cars, but lacks strong evidence and could benefit from more concrete examples.
- Score: 4 

Response:
- Topic: Studying Venus
- Comment: The essay is well-organized and focuses on the importance of studying Venus. It uses relevant examples but could benefit from more in-depth analysis and stronger language.
- Score: 4 

Response:
- Topic: Keeping the Electoral College
- Comment: The essay presents arguments in favor of the Electoral College but lacks clear organization and 



TooManyRequests: 429 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash:generateContent?%24alt=json%3Benum-encoding%3Dint: Resource has been exhausted (e.g. check quota).
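
The run above stops at the first HTTP 429 (quota exhausted) response. One common way to cope with rate limits is to retry with exponential backoff. The sketch below is generic and runs without any network access: `call_with_backoff` is a hypothetical helper, and the delays and retry count are illustrative assumptions, not values recommended by the Gemini documentation.

```python
import time


def call_with_backoff(fn, retry_on=(Exception,), max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait base_delay * 2**attempt seconds and try again."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake call that fails twice before succeeding (no API involved).
calls = {'n': 0}

def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError("429 Resource has been exhausted")
    return "ok"

waits = []
print(call_with_backoff(flaky, retry_on=(RuntimeError,), sleep=waits.append))  # ok
print(waits)  # [2.0, 4.0]
```

In the notebook, `fn` could be a lambda wrapping `model.generate_content(messages, generation_config=...)`, with `retry_on` set to the `TooManyRequests` exception class named in the traceback (assumed here to be importable from `google.api_core.exceptions`). Backoff helps with per-minute limits only; it will not recover a daily quota that is already used up.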

### Evaluate the model

In [None]:
score_50 = np.array(scores, dtype=int)
score_50.shape

(33,)

In [None]:
score_33 = np.array(scores, dtype=int)
test_33 = train['score'][:33]
from sklearn.metrics import cohen_kappa_score

def quadratic_weighted_kappa(y_true, y_pred):
  return cohen_kappa_score(y_true, y_pred, weights='quadratic')

kappa = quadratic_weighted_kappa(test_33, score_33)
kappa


0.5562632696390657
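
To make the metric less of a black box, here is a minimal from-scratch sketch of quadratic weighted kappa: one minus the ratio of weighted observed disagreement to the weighted disagreement expected by chance, with weights growing as the squared distance between ratings. The function name and the fixed 1 to 6 rating range are assumptions of this sketch. Its result may differ from `cohen_kappa_score` when some rating between the lowest and highest observed values never occurs, because scikit-learn builds its weights from the labels that are present unless `labels` is passed.

```python
def quadratic_weighted_kappa_manual(y_true, y_pred, min_rating=1, max_rating=6):
    """Quadratic weighted kappa over integer ratings in [min_rating, max_rating]."""
    n = max_rating - min_rating + 1

    # observed[i][j]: number of essays with true rating i and predicted rating j
    observed = [[0.0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating][p - min_rating] += 1

    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # 0 on the diagonal, 1 at maximum distance
            weighted_observed += weight * observed[i][j]
            # Expected count if true and predicted ratings were independent
            weighted_expected += weight * row_sums[i] * col_sums[j] / total

    # Note: weighted_expected is 0 when every rating is identical, which this sketch does not handle.
    return 1.0 - weighted_observed / weighted_expected


print(quadratic_weighted_kappa_manual([1, 2, 3], [1, 2, 3]))  # 1.0
```

A value of 1 means perfect agreement, 0 means agreement no better than chance, and negative values mean systematic disagreement.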