In [2]:
topic = "friends"
texts = []
for level in ["A1","A2","B1","B2","C1","C2"]:
    response = model.predict(
        f"Write a text about {topic} using grammar and vocabulary according to CEFR level {level}.",
        temperature=0.5,
        max_output_tokens=256,
        top_k=40,
        top_p=0.8,
    )
    texts.append({"level": level, "topic": topic, "text": response.text})
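In a Python f-string the placeholders are written `{topic}` and `{level}`; a `$` placed before the brace is not part of the syntax and ends up in the prompt as a literal character. Below is a minimal sketch that builds the per-level prompts without calling the model, so they can be inspected before spending API calls. `build_level_prompts` and `CEFR_LEVELS` are hypothetical names introduced for illustration.

```python
# The six CEFR levels used in the loop above.
CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def build_level_prompts(topic, levels=CEFR_LEVELS):
    """Return one generation prompt per CEFR level (hypothetical helper)."""
    return {
        level: f"Write a text about {topic} using grammar and vocabulary according to CEFR level {level}."
        for level in levels
    }


prompts = build_level_prompts("friends")
print(prompts["B1"])
```

The loop can then iterate over `prompts.items()` and pass each prompt to the model unchanged.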

In [3]:
text_df = pd.DataFrame(texts)

In [4]:
text_df.head()

   level  topic    text
0  A1     friends  Friends are people who you like and spend time...
1  A2     friends  Friends are people who we like to spend time w...
2  B1     friends  Friends are people who we have a close relatio...
3  B2     friends  Friends are people who are close to us and who...
4  C1     friends  Friends are a vital part of our lives. They pr...


In [21]:
text_df.to_csv("../dat/texts.csv", index=False)

Instead of a bare topic, let's build better prompts from the openings of existing CEFR-labelled stories.

In [5]:
cefr_texts = pd.read_csv("../dat/cefr_leveled_texts.csv")
cefr_texts.head()

   text                                               label
0  Hi!\nI've been meaning to write for ages and f...  B2
1  It was not so much how hard people found the ...   B2
2  Keith recently came back from a trip to Chicag...  B2
3  The Griffith Observatory is a planetarium, and...  B2
4  -LRB- The Hollywood Reporter -RRB- It's offici...  B2


In [6]:
print(cefr_texts.text[20][:100])

Who says adult parties have to be boring. More and more adults are reliving their childhoods or crea


We define a CEFR reading descriptor for each level and build the story prompts from the first 100 characters of each story.

In [24]:
description = {
    "C2": "Can understand and interpret critically virtually all forms of the written language including abstract, structurally complex, or highly colloquial literary and non-literary writings. Can understand a wide range of long and complex texts, appreciating subtle distinctions of style and implicit as well as explicit meaning.",
    "C1": "Can understand in detail lengthy, complex texts, whether or not they relate to his/her own area of speciality, provided he/she can reread difficult sections.",
    "B2": "Can read with a large degree of independence, adapting style and speed of reading to different texts and purposes, and using appropriate reference sources selectively. Has a broad active reading vocabulary, but may experience some difficulty with low-frequency idioms.",
    "B1": "Can read straightforward factual texts on subjects related to his/her field and interest with a satisfactory level of comprehension.",
    "A2": "Can understand short, simple texts on familiar matters of a concrete type which consist of high frequency everyday or job-related language. Can understand short, simple texts containing the highest frequency vocabulary, including a proportion of shared international vocabulary items.",
    "A1": "Can understand very short, simple texts a single phrase at a time, picking up familiar names, words and basic phrases and rereading as required."
}

storyPrompts = [f"{text[:100]}..." for text in cefr_texts.text]
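The prompt construction above can be sketched as a standalone helper. Some source texts start with an invisible byte-order mark (U+FEFF), which the plain slice carries into the prompt; stripping it, and surrounding whitespace, before truncating is an assumption added here rather than part of the original pipeline. `make_story_prompt` is a hypothetical name.

```python
def make_story_prompt(text, n=100):
    """Keep the first n characters of a story and mark the cut with '...'.

    Assumption: a leading byte-order mark (U+FEFF) and surrounding whitespace
    are removed first. str.strip() alone does not remove U+FEFF, because it
    is not a whitespace character in Python.
    """
    cleaned = text.lstrip("\ufeff").strip()
    return f"{cleaned[:n]}..."


print(make_story_prompt("\ufeffIt was not so much how hard people found the work.", n=10))
# It was not...
```

Used in place of the list comprehension: `storyPrompts = [make_story_prompt(t) for t in cefr_texts.text]`.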

In [35]:
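import os
import random
import time

def generate(level, storyPrompt):
    # Gemini Pro via the Vertex AI SDK; GenerativeModel, HarmCategory and
    # HarmBlockThreshold are assumed to be imported in an earlier cell.
    model = GenerativeModel("gemini-pro")
    print(level)
    print(storyPrompt)

    prompt = f"Write a story using the following prompt on CEFR level {level} (Description: {description[level]})\n\n{storyPrompt}"

    # Stream the response; only the sexually-explicit filter is relaxed,
    # the other harm categories keep their default thresholds.
    responses = model.generate_content(
        prompt,
        safety_settings={
            HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
        },
        generation_config={
            "max_output_tokens": 1024,
            "temperature": 1,
            "top_p": 0.9,
        },
        stream=True,
    )

    # Concatenate the streamed chunks. A chunk without text (e.g. one stopped
    # by a safety filter) may raise here; log it and keep the partial text.
    text = ""
    for response in responses:
        try:
            text += response.candidates[0].content.parts[0].text
        except Exception as e:
            print(response.candidates)
            print(e)
            #return generate(level, storyPrompt)
    return text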

num_stories = 50
random.shuffle(storyPrompts)

file_path = "../dat/generated_texts.csv"
if os.path.exists(file_path):
    existing_df = pd.read_csv(file_path)
    existing_stories = list(existing_df.story.unique())
    # Reuse the prompts from earlier runs and top up with fresh ones, so every
    # level is generated from the same set of num_stories prompts.
    fresh = [s for s in storyPrompts if s not in existing_stories]
    storyPrompts = existing_stories + fresh[:max(0, num_stories - len(existing_stories))]
else:
    existing_df = pd.DataFrame(columns=["label", "story", "text"])

for level in description.keys():
    # Skip prompts this level already has, so a resumed run adds no duplicates.
    done = set(existing_df.loc[existing_df.label == level, "story"])
    pending = [s for s in storyPrompts if s not in done][:max(0, num_stories - len(done))]

    for story in pending:
        time.sleep(20)  # pause between calls, presumably to respect the API quota
        text = generate(level, story)
        new_row = {"label": level, "story": story, "text": text}
        # Append each row immediately so an interrupted run keeps its progress.
        pd.DataFrame([new_row]).to_csv(file_path, mode='a', index=False, header=not os.path.exists(file_path))

A1
I heard that your brother is in the hospital.
Yeah. He's been there since last week.
Oh, no. What ha...
A1
A bridal shower is a party when a woman who is about to get married is "showered" with gifts and wel...
A1
It was late at night. The plane flew through the air. It flew through the air very fast. It flew thr...
A1
Hey, James. You want to play?
Not right now, Elizabeth.
Come on. It'll be fun.
What do you want to p...
A1
Mother's Day is a holiday that celebrates and honors mothers in the United States. It is celebrated ...
A1
When Larry Pizzi, a veteran bicycle industry executive, first heard about electric bikes nearly 20 ...
A1
Did you hear about Joseph and Michelle?
No, what happened? Did they have a divorce?
Oh, no. They are...
A1
Julie was tired of living with her parents. She was 22, and just finished college. She started a new...
A1
Dora wanted to buy a card for her mother. Her mother's birthday was next week. Dora loved her mom. S...
A1
In less than a week, Turkey will h
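The commented-out `#return generate(level, storyPrompt)` in `generate` hints at retrying when a chunk fails, but an unbounded recursive retry can loop forever on a prompt that is always blocked. Below is a minimal sketch of a bounded retry, assuming an empty string counts as a failed generation; `with_retries`, `attempts` and `delay` are hypothetical names, and the 20-second default only mirrors the `time.sleep(20)` pause above.

```python
import time


def with_retries(fn, *args, attempts=3, delay=20.0, sleep=time.sleep):
    """Call fn(*args); on an exception or an empty result, wait and try again.

    Gives up after `attempts` calls and returns the last result, which may be
    an empty string. Treating "" as failure is an assumption.
    """
    result = ""
    for attempt in range(attempts):
        try:
            result = fn(*args)
            if result:
                return result
        except Exception as e:
            print(f"attempt {attempt + 1} failed: {e}")
        if attempt < attempts - 1:
            sleep(delay)  # back off before the next call
    return result


# Hypothetical usage inside the generation loop:
# text = with_retries(generate, level, story)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.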

In [14]:
text_df = pd.DataFrame(texts)
print(text_df.text[5])

George had perpetually exuded an aura of hilarity, captivating those around him with his infectious humor. Our paths crossed serendipitously at the local cinema, where I had eagerly anticipated the release of the latest Spider-Man installment. As fate would have it, George occupied the adjacent seat, and our conversation ignited with an effortless camaraderie that defied the constraints of time.

During the ensuing months, George became an integral part of my life. His presence radiated an infectious energy that transformed the mundane into the extraordinary. Whether we embarked on spontaneous road trips, engaged in intellectually stimulating debates, or simply reveled in shared moments of laughter, George possessed an uncanny ability to elevate every experience.

One particularly memorable evening, as we strolled through the picturesque streets of our quaint town, George recounted tales of his eccentric family, each anecdote punctuated with his signature wit and charm. His grandmother