# Generating Datasets with the Mistral 7B Model

In [1]:
EXAMPLE_ID = 0
MODEL_FILE = '../models/mistral-7b-instruct-v0.2.Q5_K_M.gguf'
N_CONTEXT = 32768
MODEL = 'mistral-7b-instruct-v0.2.Q5_K_M'
START = 1000
OUT_COUNT = 500
OUT_FILE_PREFIX = 'mistral_essays'

### Source dataset written by humans

In [2]:
import pandas as pd

df = pd.read_csv("../datasets/human_essays.csv", encoding="utf-8")
df.head(10)

Unnamed: 0,title,text
0,12 Years a Slave: An Analysis of the Film Essay,The 2013 film 12 Years a Slave proved that sla...
1,20+ Social Media Post Ideas to Radically Simpl...,Social Media Examiner’s (2021) video on social...
2,533 U.S. 27 (2001) Kyllo v. United States: The...,Table of Contents\n 1. Facts\n 2. Issue\n 3. H...
3,A Charles Schwab Corporation Case Essay,Charles Schwab is a for-profit Corporation who...
4,A Clinical Office Assistant’s Attire Research ...,The work of a clinical or medical office worke...
5,A Comic Science Fiction Film “Back to the Futu...,Table of Contents\n 1. Introduction\n 2. Movie...
6,A Community Yard Sale as a Memorable Event Essay,Autumn is not generally viewed as an appropria...
7,A Complex in the “Every Secret Thing” Film by ...,Carl Jung identifies the mother complex with a...
8,A Customer Told to Wear a Mask Threw Snow at a...,The article describes a recent Chicago hot dog...
9,A Disconnect Between Public Transportation and...,One realistic way to commute is using private ...


#### Example:

In [3]:
print(f"{df.iloc[EXAMPLE_ID,0]}\n\n{df.iloc[EXAMPLE_ID,1]}")

12 Years a Slave: An Analysis of the Film Essay

The 2013 film 12 Years a Slave proved that slavery is a worldwide issue. Indeed, the film made $150 million outside the United States and $57 million in the U.S., with a production budget of $20 million (Sharf, 2020). The movie was based on the memoir Twelve Years a Slave by Solomon Northup (Ntim, 2020). It tells the story of a free African American man who was kidnapped and sold into slavery. Solomon spent twelve years away from his family, being traded from one master to another. Fortunately, the protagonist met a person who helped him deliver a message to his family and friends, who came and rescued him. This movie accurately illustrates discriminatory relationships between white slaveholders and black slaves that stemmed from the dysfunctional system in the country and prejudices in people’s mindsets at that time.

The two main ethnic groups presented in this film are White and African Americans, and the three social groups are afflu

### Utilities

In [4]:
import math


def round_word_count(c):
    return math.floor(c / 100) * 100

def print_example(data, example_id):
    row = data.iloc[example_id]
    rule = '=' * 100

    # Single-prompt datasets (variants 1-3)
    if 'prompt' in row.index:
        print(f"Prompt {'=' * 93}\n{row['prompt']}")
        print(rule)
        print(f"{row['title']}\n\n{row['text']}")
        print(rule)

    # Two-prompt datasets (variants 4-5): show the intermediate response as well
    if 'prompt_2' in row.index:
        print(f"Prompt 1 {'=' * 91}\n{row['prompt_1']}")
        print(rule)
        print(f"{row['response']}")
        print(f"Prompt 2 {'=' * 91}\n{row['prompt_2']}")
        print(rule)
        print(f"{row['title']}\n\n{row['text']}")
        print(rule)

### Model setup

In [5]:
from llama_cpp import Llama
import re
import os

In [6]:
def load_model(model_path, n_ctx, n_gpu_layers=-1):
    llm = Llama(
        model_path=model_path,
        n_ctx=n_ctx,
        n_gpu_layers=n_gpu_layers,
        n_batch=1024,
        n_threads=12,
        use_mlock=True,
        use_mmap=True,
        metal=True,
        verbose=False
    )
    
    return llm

def generate_response(llm, prompt, max_tokens=4096, temperature=0.7, top_p=0.9, top_k=40, repeat_penalty=1.1):
    output = llm(
        prompt,
        max_tokens=max_tokens,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repeat_penalty=repeat_penalty,
    )
    
    return output['choices'][0]['text']

def word_count(text):
    words = re.findall(r'\b\w+\b', text)
    return round_word_count(len(words))
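
The rounding above floors the count to a multiple of 100, so an essay shorter than 100 words yields a target of 0 and the prompt would ask for "about 0 words". A minimal sketch of the same computation with a lower bound follows; `rounded_word_count` and its `minimum` parameter are hypothetical names, not part of the notebook.

```python
import math
import re


def rounded_word_count(text, minimum=100):
    """Count words, floor to a multiple of 100, and clamp to `minimum`.

    `minimum` is an assumption added for illustration; the notebook's
    `word_count` has no lower bound.
    """
    words = re.findall(r"\b\w+\b", text)
    floored = math.floor(len(words) / 100) * 100
    return max(floored, minimum)


print(rounded_word_count("word " * 648))  # 648 words floor to 600
print(rounded_word_count("too short"))    # 2 words floor to 0, clamped to 100
```

Whether a clamp is wanted depends on how short the shortest source essays are; passing `minimum=0` reproduces the unclamped behavior.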

In [7]:
llm = load_model(MODEL_FILE, n_ctx=N_CONTEXT)

### Variant 1. Prompt-based generation

```
[INST]
Write an essay of about {} words on the topic '{}'.
Start directly with the first sentence of the essay. Absolutely no title or heading of any kind. 
Output only the essay text.
[/INST]
```
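
The two `{}` placeholders are filled positionally with `str.format`: first the target length, then the topic. The sketch below illustrates this under stated assumptions: `build_prompt` and the sample title are hypothetical, and the template is a copy of the one above.

```python
# Template copied from the block above; placeholders are filled in order.
PROMPT_TEMPLATE = (
    "[INST]\n"
    "Write an essay of about {} words on the topic '{}'.\n"
    "Start directly with the first sentence of the essay. "
    "Absolutely no title or heading of any kind.\n"
    "Output only the essay text.\n"
    "[/INST]"
)


def build_prompt(length, title):
    """Fill the template: first the target length, then the topic."""
    return PROMPT_TEMPLATE.format(length, title)


# Hypothetical title containing braces: only the template is parsed for
# placeholders, so braces in the arguments are inserted verbatim.
example = build_prompt(600, "Sets {a, b} in Logic")
print(example)
```

Because the title is passed as an argument rather than concatenated into the template, titles with curly braces do not raise a formatting error.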

In [8]:
def generate(model, model_name, df, prompt_raw, start=0, n=5000):
    rows = []
    step = max(1, n // 10)  # report progress roughly every 10%
    for i in range(start, start + n):
        title = df.loc[i, "title"]
        length = word_count(df.loc[i, "text"])

        prompt_formatted = prompt_raw.format(length, title)

        try:
            response = generate_response(
                model,
                prompt_formatted,
                max_tokens=4096,
                temperature=0.7,
            )
        except Exception as e:
            # Skip rows whose generation fails, but do not swallow KeyboardInterrupt
            print(f"Row {i} skipped: {e}")
            continue

        response = response.lstrip()

        rows.append({"title": title, "prompt": prompt_formatted, "text": response, "model": model_name})

        # Progress is measured relative to `start`, not the absolute row index
        done = i - start
        if done % step == 0:
            print(f"{100 * done / n}% done")

    return pd.DataFrame(rows)

In [9]:
prompt = """[INST] 
Write an essay of about {} words on the topic '{}'.
Start directly with the first sentence of the essay. Absolutely no title or heading of any kind. 
Output only the essay text. 
[/INST]"""

df1 = generate(llm, MODEL, df, prompt, n=OUT_COUNT)
df1.head()

0.0% done
10.0% done
20.0% done
30.0% done
40.0% done
50.0% done
60.0% done
70.0% done
80.0% done
90.0% done


Unnamed: 0,title,prompt,text,model
0,12 Years a Slave: An Analysis of the Film Essay,[INST] \nWrite an essay of about 600 words on ...,"In the annals of cinematic history, few films ...",mistral-7b-instruct-v0.2.Q5_K_M
1,20+ Social Media Post Ideas to Radically Simpl...,[INST] \nWrite an essay of about 300 words on ...,"In today's digital world, social media has bec...",mistral-7b-instruct-v0.2.Q5_K_M
2,533 U.S. 27 (2001) Kyllo v. United States: The...,[INST] \nWrite an essay of about 300 words on ...,"In the landmark case 533 U.S. 27 (2001), Kyllo...",mistral-7b-instruct-v0.2.Q5_K_M
3,A Charles Schwab Corporation Case Essay,[INST] \nWrite an essay of about 300 words on ...,In the dynamic world of finance and investment...,mistral-7b-instruct-v0.2.Q5_K_M
4,A Clinical Office Assistant’s Attire Research ...,[INST] \nWrite an essay of about 400 words on ...,In the intricately woven fabric of a healthcar...,mistral-7b-instruct-v0.2.Q5_K_M


#### Example:

In [10]:
print_example(df1, EXAMPLE_ID)

[INST] 
Write an essay of about 600 words on the topic '12 Years a Slave: An Analysis of the Film Essay'.
Start directly with the first sentence of the essay. Absolutely no title or heading of any kind. 
Output only the essay text. 
[/INST]
12 Years a Slave: An Analysis of the Film Essay

In the annals of cinematic history, few films have managed to elicit such a profound emotional response from audiences as Steve McQueen's "12 Years a Slave." Released in 2013, this powerful and unflinching drama recounts the true story of Solomon Northup, a free black man from New York who was kidnapped and sold into slavery in the pre-Civil War South. The film's haunting portrayal of human bondage serves as an unsettling reminder of a dark chapter in American history, inviting viewers to confront the harsh realities of slavery and its enduring impact on individuals and society.

At the heart of "12 Years a Slave" is Chiwetel Ejiofor's riveting performance as Solomon Northup, a talented musician and e

In [11]:
file_path = f"../datasets/{OUT_FILE_PREFIX}_1.csv"

df1.to_csv(
    file_path, 
    mode="a",
    header=not os.path.exists(file_path),
    index=False,
    encoding="utf-8",
)
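
The call above appends to the CSV and writes the header only when the file does not exist yet, so re-running the cell adds rows instead of overwriting the file (and writing the same batch twice duplicates its rows). A minimal standard-library sketch of the same pattern follows; `append_rows` and the demo rows are hypothetical and only mirror what the `to_csv` call does.

```python
import csv
import os
import tempfile


def append_rows(path, rows, fieldnames):
    """Append dict rows to a CSV, writing the header only for a new file."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerows(rows)


# Demonstration in a temporary directory with made-up rows.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    fields = ["title", "text"]
    append_rows(demo_path, [{"title": "A", "text": "first batch"}], fields)
    append_rows(demo_path, [{"title": "B", "text": "second batch"}], fields)
    with open(demo_path, newline="", encoding="utf-8") as f:
        lines = list(csv.reader(f))

print(lines)
```

The header appears once even though the file was written in two batches.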

### Variant 2. Prompt-based Style Transfer

```
[INST]
Write an essay of about {} words on the topic '{}', following the stylistic and structural conventions of IvyPanda essays.

STYLE REQUIREMENTS (IvyPanda-like):
1. The tone must be academic, clear, neutral, and objective.
2. The essay should include background context, a clear analytical structure, and well-developed paragraphs.
3. Arguments must be supported with reasoning, concise explanations, and relevant examples.
4. The writing must be coherent, logically progressive, and easy to read.
5. Avoid emotional language or overly dramatic phrasing.

STRUCTURE REQUIREMENTS:
1. Divide the essay into several sections, each starting with a plain-text subheading.
2. Subheadings must contain only letters and spaces — no punctuation, no symbols, and absolutely no Markdown formatting.
   - This means: no asterisks, no bold, no italics, no underscores, no brackets.
   - Subheadings must appear as plain text lines, e.g. Background Context
3. Each subheading must be on its own line with exactly one blank line before and after it.
4. No overall essay title — begin directly with the first subheading.
5. Output must be plain text only (no Markdown, no asterisks, no bold, no italics, no hashtags, no special formatting).

OPTIONAL ELEMENTS:
You may include a plain-text references section at the end if appropriate to the topic, but this is optional.

Begin with the first subheading, then continue with the first paragraph.
Output only the essay text.
[/INST]
```
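
A model does not always follow formatting instructions, so it can be useful to check generated essays against the structure requirements. The sketch below covers two of them: subheadings made only of letters and spaces, and no asterisks or hashtags in the output. `is_plain_subheading` and `has_markup` are hypothetical helpers, and restricting subheadings to ASCII letters is an assumption.

```python
import re

# Characters that rule 5 explicitly forbids in the output.
MARKUP_PATTERN = re.compile(r"[*#]")
# A subheading may contain only (ASCII) letters separated by single spaces.
SUBHEADING_PATTERN = re.compile(r"[A-Za-z]+(?: [A-Za-z]+)*")


def is_plain_subheading(line):
    """Return True if the line consists only of letters and single spaces."""
    return SUBHEADING_PATTERN.fullmatch(line.strip()) is not None


def has_markup(text):
    """Return True if the text contains an asterisk or a hashtag."""
    return MARKUP_PATTERN.search(text) is not None


print(is_plain_subheading("Background Context"))      # True
print(is_plain_subheading("**Background Context**"))  # False
print(has_markup("Plain paragraph text."))            # False
```

Rows that fail such checks could be regenerated or filtered out before the dataset is saved.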

In [12]:
def generate(model, model_name, df, prompt_raw, start=0, n=5000):
    rows = []
    step = max(1, n // 10)  # report progress roughly every 10%
    for i in range(start, start + n):
        title = df.loc[i, "title"]
        length = word_count(df.loc[i, "text"])

        prompt_formatted = prompt_raw.format(length, title)

        try:
            response = generate_response(
                model,
                prompt_formatted,
                max_tokens=4096,
                temperature=0.7,
            )
        except Exception as e:
            # Skip rows whose generation fails, but do not swallow KeyboardInterrupt
            print(f"Row {i} skipped: {e}")
            continue

        response = response.lstrip()

        rows.append({"title": title, "prompt": prompt_formatted, "text": response, "model": model_name})

        # Progress is measured relative to `start`, not the absolute row index
        done = i - start
        if done % step == 0:
            print(f"{100 * done / n}% done")

    return pd.DataFrame(rows)

In [13]:
prompt = """[INST]
Write an essay of about {} words on the topic '{}', following the stylistic and structural conventions of IvyPanda essays.

STYLE REQUIREMENTS (IvyPanda-like):
1. The tone must be academic, clear, neutral, and objective.
2. The essay should include background context, a clear analytical structure, and well-developed paragraphs.
3. Arguments must be supported with reasoning, concise explanations, and relevant examples.
4. The writing must be coherent, logically progressive, and easy to read.
5. Avoid emotional language or overly dramatic phrasing.

STRUCTURE REQUIREMENTS:
1. Divide the essay into several sections, each starting with a plain-text subheading.
2. Subheadings must contain only letters and spaces — no punctuation, no symbols, and absolutely no Markdown formatting.
   - This means: no asterisks, no bold, no italics, no underscores, no brackets.
   - Subheadings must appear as plain text lines, e.g. Background Context
3. Each subheading must be on its own line with exactly one blank line before and after it.
4. No overall essay title — begin directly with the first subheading.
5. Output must be plain text only (no Markdown, no asterisks, no bold, no italics, no hashtags, no special formatting).

OPTIONAL ELEMENTS:
You may include a plain-text references section at the end if appropriate to the topic, but this is optional.

Begin with the first subheading, then continue with the first paragraph.
Output only the essay text.
[/INST]"""

df2 = generate(llm, MODEL, df, prompt, n=OUT_COUNT)
df2.head()

0.0% done
10.0% done
20.0% done
30.0% done
40.0% done
50.0% done
60.0% done
70.0% done
80.0% done
90.0% done


Unnamed: 0,title,prompt,text,model
0,12 Years a Slave: An Analysis of the Film Essay,[INST]\nWrite an essay of about 600 words on t...,"Background Context\n\n""12 Years a Slave,"" dire...",mistral-7b-instruct-v0.2.Q5_K_M
1,20+ Social Media Post Ideas to Radically Simpl...,[INST]\nWrite an essay of about 300 words on t...,Background Context\n\nSocial media marketing h...,mistral-7b-instruct-v0.2.Q5_K_M
2,533 U.S. 27 (2001) Kyllo v. United States: The...,[INST]\nWrite an essay of about 300 words on t...,Background Context\n\nThe Fourth Amendment of ...,mistral-7b-instruct-v0.2.Q5_K_M
3,A Charles Schwab Corporation Case Essay,[INST]\nWrite an essay of about 300 words on t...,Background Context\n\nCharles Schwab Corporati...,mistral-7b-instruct-v0.2.Q5_K_M
4,A Clinical Office Assistant’s Attire Research ...,[INST]\nWrite an essay of about 400 words on t...,Background Context\n\nA Clinical Office Assist...,mistral-7b-instruct-v0.2.Q5_K_M


#### Example:

In [14]:
print_example(df2, EXAMPLE_ID)

[INST]
Write an essay of about 600 words on the topic '12 Years a Slave: An Analysis of the Film Essay', following the stylistic and structural conventions of IvyPanda essays.

STYLE REQUIREMENTS (IvyPanda-like):
1. The tone must be academic, clear, neutral, and objective.
2. The essay should include background context, a clear analytical structure, and well-developed paragraphs.
3. Arguments must be supported with reasoning, concise explanations, and relevant examples.
4. The writing must be coherent, logically progressive, and easy to read.
5. Avoid emotional language or overly dramatic phrasing.

STRUCTURE REQUIREMENTS:
1. Divide the essay into several sections, each starting with a plain-text subheading.
2. Subheadings must contain only letters and spaces — no punctuation, no symbols, and absolutely no Markdown formatting.
   - This means: no asterisks, no bold, no italics, no underscores, no brackets.
   - Subheadings must appear as plain text lines, e.g. Background Context
3. Eac

In [15]:
file_path = f"../datasets/{OUT_FILE_PREFIX}_2.csv"

df2.to_csv(
    file_path, 
    mode="a",
    header=not os.path.exists(file_path),
    index=False,
    encoding="utf-8",
)

### Variant 3. Few-shot prompting

```
[INST]
You are a helpful writing assistant. Your task is to generate essays of a given length and topic.
Start directly with the first sentence. No titles or headings. Output only the essay text.

Here is an example:

Example topic: "{}"
Example length: {} words

Example output:
{}

Now generate a new essay.

Topic: "{}"
Length: {} words

Write the essay now.
[/INST]
```

The example is the essay from the preceding row of the same dataset, so it covers a different topic.
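
The pairing can be sketched as follows: the essay at row `i - 1` is the in-context example for the essay generated at row `i`. `example_target_pairs` is a hypothetical helper, the titles are shortened versions of those in the source table, and the end-of-data guard is an addition for this sketch.

```python
def example_target_pairs(titles, start=0, n=3):
    """Yield (example_index, target_index) pairs for few-shot prompting.

    The target runs over start + 1 .. start + n, and the example is always
    the row immediately before it.
    """
    for i in range(start + 1, start + n + 1):
        if i >= len(titles):
            break  # guard added for this sketch; stops at the end of the data
        yield i - 1, i


titles = [
    "12 Years a Slave",
    "20+ Social Media Post Ideas",
    "Kyllo v. United States",
    "A Charles Schwab Corporation Case",
]
pairs = list(example_target_pairs(titles, start=0, n=3))
for example_idx, target_idx in pairs:
    print(f"example: {titles[example_idx]!r} -> target: {titles[target_idx]!r}")
```

With this scheme the row at `start` is only ever used as an example, never as a target, which is why the resulting dataset begins with the second essay.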

In [16]:
def generate_few_shot(model, model_name, df, prompt_raw, start=0, n=5000):
    rows = []
    step = max(1, n // 10)  # report progress roughly every 10%
    for i in range(start + 1, start + n + 1):
        # The previous row supplies the in-context example
        prev_title = df.loc[i - 1, "title"]
        prev_length = word_count(df.loc[i - 1, "text"])
        prev_text = df.loc[i - 1, "text"]

        title = df.loc[i, "title"]
        length = word_count(df.loc[i, "text"])

        prompt_formatted = prompt_raw.format(prev_title, prev_length, prev_text, title, length)

        try:
            response = generate_response(
                model,
                prompt_formatted,
                max_tokens=4096,
                temperature=0.7,
            )
        except Exception as e:
            # Skip rows whose generation fails, but do not swallow KeyboardInterrupt
            print(f"Row {i} skipped: {e}")
            continue

        response = response.lstrip()

        rows.append({"title": title, "prompt": prompt_formatted, "text": response, "model": model_name})

        # Progress is measured relative to `start`, not the absolute row index
        done = i - start
        if done % step == 0:
            print(f"{100 * done / n}% done")

    return pd.DataFrame(rows)

In [17]:
prompt = """[INST]
You are a helpful writing assistant. Your task is to generate essays of a given length and topic.
Start directly with the first sentence. No titles or headings. Output only the essay text.

Here is an example:

Example topic: "{}"
Example length: {} words

Example output:
{}

Now generate a new essay.

Topic: "{}"
Length: {} words

Write the essay now.
[/INST]"""

df3 = generate_few_shot(llm, MODEL, df, prompt, n=OUT_COUNT)
df3.head()

10.0% done
20.0% done
30.0% done
40.0% done
50.0% done
60.0% done
70.0% done
80.0% done
90.0% done
100.0% done


Unnamed: 0,title,prompt,text,model
0,20+ Social Media Post Ideas to Radically Simpl...,[INST]\nYou are a helpful writing assistant. Y...,"In the digital age, social media has become an...",mistral-7b-instruct-v0.2.Q5_K_M
1,533 U.S. 27 (2001) Kyllo v. United States: The...,[INST]\nYou are a helpful writing assistant. Y...,In the landmark case of Kyllo v. United States...,mistral-7b-instruct-v0.2.Q5_K_M
2,A Charles Schwab Corporation Case Essay,[INST]\nYou are a helpful writing assistant. Y...,"Facts\nThe Charles Schwab Corporation, a leadi...",mistral-7b-instruct-v0.2.Q5_K_M
3,A Clinical Office Assistant’s Attire Research ...,[INST]\nYou are a helpful writing assistant. Y...,A clinical office assistant plays a crucial ro...,mistral-7b-instruct-v0.2.Q5_K_M
4,A Comic Science Fiction Film “Back to the Futu...,[INST]\nYou are a helpful writing assistant. Y...,"In the realm of popular culture, few films hav...",mistral-7b-instruct-v0.2.Q5_K_M


#### Example:

In [18]:
print_example(df3, EXAMPLE_ID)

[INST]
You are a helpful writing assistant. Your task is to generate essays of a given length and topic.
Start directly with the first sentence. No titles or headings. Output only the essay text.

Here is an example:

Example topic: "12 Years a Slave: An Analysis of the Film Essay"
Example length: 600 words

Example output:
The 2013 film 12 Years a Slave proved that slavery is a worldwide issue. Indeed, the film made $150 million outside the United States and $57 million in the U.S., with a production budget of $20 million (Sharf, 2020). The movie was based on the memoir Twelve Years a Slave by Solomon Northup (Ntim, 2020). It tells the story of a free African American man who was kidnapped and sold into slavery. Solomon spent twelve years away from his family, being traded from one master to another. Fortunately, the protagonist met a person who helped him deliver a message to his family and friends, who came and rescued him. This movie accurately illustrates discriminatory relationsh

In [19]:
file_path = f"../datasets/{OUT_FILE_PREFIX}_3.csv"

df3.to_csv(
    file_path, 
    mode="a",
    header=not os.path.exists(file_path),
    index=False,
    encoding="utf-8",
)

### Variant 4. Content Abstraction (Regeneration Learning)

First prompt:
```
[INST]
Identify the writing style and format of the text, excluding all semantic content and themes.

Text:
{}
[/INST]
```

Second prompt:
```
[INST]
Write an essay of about {} words on the topic '{}'.

You are given the following style-and-format description:
'{}'

Start directly with the first sentence of the essay. Absolutely no title or heading of any kind.
Output only the essay text.
[/INST]
```
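
The two prompts are chained: the response to the first prompt (a style description) is substituted into the second one. The sketch below shows that data flow with the model call abstracted as any callable taking a prompt and returning text. `two_stage` and `fake_model` are hypothetical; `fake_model` is a stand-in that returns canned strings, not real model output.

```python
STYLE_PROMPT = (
    "[INST]\n"
    "Identify the writing style and format of the text, "
    "excluding all semantic content and themes.\n\n"
    "Text:\n{}\n"
    "[/INST]"
)
ESSAY_PROMPT = (
    "[INST]\n"
    "Write an essay of about {} words on the topic '{}'.\n\n"
    "You are given the following style-and-format description:\n'{}'\n\n"
    "Start directly with the first sentence of the essay. "
    "Absolutely no title or heading of any kind.\n"
    "Output only the essay text.\n"
    "[/INST]"
)


def two_stage(generate_fn, source_text, title, length):
    """Run style extraction, then essay generation conditioned on the style."""
    style = generate_fn(STYLE_PROMPT.format(source_text)).lstrip()
    essay = generate_fn(ESSAY_PROMPT.format(length, title, style)).lstrip()
    return style, essay


def fake_model(prompt):
    """Stand-in for the LLM call; returns canned text based on the prompt."""
    if prompt.startswith("[INST]\nIdentify"):
        return " Formal, analytical, with in-text citations."
    return " Essay text would appear here."


style, essay = two_stage(fake_model, "Some human-written essay.", "A Topic", 300)
print(style)
print(essay)
```

Keeping the model behind a plain callable makes the chaining logic testable without loading the GGUF file.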

In [20]:
def generate_with_content(model, model_name, df, prompt_raw_1, prompt_raw_2, start=0, n=5000):
    rows = []
    step = max(1, n // 10)  # report progress roughly every 10%
    for i in range(start, start + n):
        title = df.loc[i, "title"]
        length = word_count(df.loc[i, "text"])
        text = df.loc[i, "text"]

        prompt_1_formatted = prompt_raw_1.format(text)

        try:
            # Stage 1: describe the style and format of the human-written essay
            response_1 = generate_response(
                model,
                prompt_1_formatted,
                max_tokens=4096,
                temperature=0.7,
            )
            response_1 = response_1.lstrip()

            # Stage 2: write a new essay that follows the style description
            prompt_2_formatted = prompt_raw_2.format(length, title, response_1)
            response_2 = generate_response(
                model,
                prompt_2_formatted,
                max_tokens=4096,
                temperature=0.7,
            )
        except Exception as e:
            # Skip rows whose generation fails, but do not swallow KeyboardInterrupt
            print(f"Row {i} skipped: {e}")
            continue

        response = response_2.lstrip()

        rows.append({
            "title": title,
            "prompt_1": prompt_1_formatted,
            "response": response_1,
            "prompt_2": prompt_2_formatted,
            "text": response,
            "model": model_name
        })

        # Progress is measured relative to `start`, not the absolute row index
        done = i - start
        if done % step == 0:
            print(f"{100 * done / n}% done")

    return pd.DataFrame(rows)

In [21]:
prompt_1 = """[INST]
Identify the writing style and format of the text, excluding all semantic content and themes.

Text:
{}
[/INST]"""

prompt_2 = """[INST]
Write an essay of about {} words on the topic '{}'.

You are given the following style-and-format description:
'{}'

Start directly with the first sentence of the essay. Absolutely no title or heading of any kind.
Output only the essay text.
[/INST]"""

In [22]:
df4 = generate_with_content(llm, MODEL, df, prompt_1, prompt_2, n=OUT_COUNT)
df4.head()

0.0% done
10.0% done
20.0% done
30.0% done
40.0% done
50.0% done
60.0% done
70.0% done
80.0% done
90.0% done


Unnamed: 0,title,prompt_1,response,prompt_2,text,model
0,12 Years a Slave: An Analysis of the Film Essay,[INST]\nIdentify the writing style and format ...,The text is a descriptive and analytical essay...,[INST]\nWrite an essay of about 600 words on t...,"In the annals of American cinema, there are fi...",mistral-7b-instruct-v0.2.Q5_K_M
1,20+ Social Media Post Ideas to Radically Simpl...,[INST]\nIdentify the writing style and format ...,The text is written in an academic style with ...,[INST]\nWrite an essay of about 300 words on t...,Intro:\nSocial media marketing has become an i...,mistral-7b-instruct-v0.2.Q5_K_M
2,533 U.S. 27 (2001) Kyllo v. United States: The...,[INST]\nIdentify the writing style and format ...,The text is written in a formal and academic s...,[INST]\nWrite an essay of about 300 words on t...,I. Introduction\n\nThis essay explores the lan...,mistral-7b-instruct-v0.2.Q5_K_M
3,A Charles Schwab Corporation Case Essay,[INST]\nIdentify the writing style and format ...,The writing style of the text is formal and ac...,[INST]\nWrite an essay of about 300 words on t...,This paper aims to analyze the form of busines...,mistral-7b-instruct-v0.2.Q5_K_M
4,A Clinical Office Assistant’s Attire Research ...,[INST]\nIdentify the writing style and format ...,"The text is written in a formal, business-like...",[INST]\nWrite an essay of about 400 words on t...,"As a clinical office assistant, the appearance...",mistral-7b-instruct-v0.2.Q5_K_M


#### Example:

In [23]:
print_example(df4, EXAMPLE_ID)

[INST]
Identify the writing style and format of the text, excluding all semantic content and themes.

Text:
The 2013 film 12 Years a Slave proved that slavery is a worldwide issue. Indeed, the film made $150 million outside the United States and $57 million in the U.S., with a production budget of $20 million (Sharf, 2020). The movie was based on the memoir Twelve Years a Slave by Solomon Northup (Ntim, 2020). It tells the story of a free African American man who was kidnapped and sold into slavery. Solomon spent twelve years away from his family, being traded from one master to another. Fortunately, the protagonist met a person who helped him deliver a message to his family and friends, who came and rescued him. This movie accurately illustrates discriminatory relationships between white slaveholders and black slaves that stemmed from the dysfunctional system in the country and prejudices in people’s mindsets at that time.

The two main ethnic groups presented in this film are White a

In [24]:
file_path = f"../datasets/{OUT_FILE_PREFIX}_4.csv"

df4.to_csv(
    file_path, 
    mode="a",
    header=not os.path.exists(file_path),
    index=False,
    encoding="utf-8",
)

### Variant 5. Outline-guided Text Generation

First prompt:
```
[INST]
You are an academic writing instructor.
Create a well-structured outline for an academic essay on the following topic: {}.

Requirements:

The outline must include the following sections:
- Introduction
- Main Body (2–4 thematic sections)
- Counterarguments / Alternative Perspectives
- Conclusion

Each section should contain 2–3 concise bullet points describing the key ideas or arguments.
The outline should reflect a clear thesis-driven structure.
Use formal academic language.

Do not write the essay itself
[/INST]
```

Second prompt:
```
[INST]
You are an academic essay writer.
Write a coherent academic essay of about {} words on the topic '{}' strictly following the outline below.

Outline:
{}

Requirements:
- Follow the outline in the given order without adding or removing sections.
- Develop each bullet point into a full, well-argued paragraph.
- Maintain a formal academic tone and clear argumentative logic.
- Use appropriate academic transitions and signposting.
- Do not introduce new arguments or themes not present in the outline.
- Avoid informal language, personal anecdotes, and unsupported claims.
[/INST]
```

In [25]:
def generate_with_plan(model, model_name, df, prompt_raw_1, prompt_raw_2, start=0, n=5000):
    rows = []
    step = max(1, n // 10)  # report progress roughly every 10%
    for i in range(start, start + n):
        title = df.loc[i, "title"]
        length = word_count(df.loc[i, "text"])

        prompt_1_formatted = prompt_raw_1.format(title)

        try:
            # Stage 1: produce an outline for the topic
            response_1 = generate_response(
                model,
                prompt_1_formatted,
                max_tokens=4096,
                temperature=0.7,
            )
            response_1 = response_1.lstrip()

            # Stage 2: expand the outline into a full essay
            prompt_2_formatted = prompt_raw_2.format(length, title, response_1)
            response_2 = generate_response(
                model,
                prompt_2_formatted,
                max_tokens=4096,
                temperature=0.7,
            )
        except Exception as e:
            # Skip rows whose generation fails, but do not swallow KeyboardInterrupt
            print(f"Row {i} skipped: {e}")
            continue

        response = response_2.lstrip()

        rows.append({
            "title": title,
            "prompt_1": prompt_1_formatted,
            "response": response_1,
            "prompt_2": prompt_2_formatted,
            "text": response,
            "model": model_name
        })

        # Progress is measured relative to `start`, not the absolute row index
        done = i - start
        if done % step == 0:
            print(f"{100 * done / n}% done")

    return pd.DataFrame(rows)

In [26]:
prompt_1 = """[INST]
You are an academic writing instructor.
Create a well-structured outline for an academic essay on the following topic: {}.

Requirements:

The outline must include the following sections:
- Introduction
- Main Body (2–4 thematic sections)
- Counterarguments / Alternative Perspectives
- Conclusion

Each section should contain 2–3 concise bullet points describing the key ideas or arguments.
The outline should reflect a clear thesis-driven structure.
Use formal academic language.

Do not write the essay itself
[/INST]"""

prompt_2 = """[INST]
You are an academic essay writer.
Write a coherent academic essay of about {} words on the topic '{}' strictly following the outline below.

Outline:
{}

Requirements:
- Follow the outline in the given order without adding or removing sections.
- Develop each bullet point into a full, well-argued paragraph.
- Maintain a formal academic tone and clear argumentative logic.
- Use appropriate academic transitions and signposting.
- Do not introduce new arguments or themes not present in the outline.
- Avoid informal language, personal anecdotes, and unsupported claims.
[/INST]"""

In [27]:
df5 = generate_with_plan(llm, MODEL, df, prompt_1, prompt_2, n=OUT_COUNT)
df5.head()

0.0% done
10.0% done
20.0% done
30.0% done
40.0% done
50.0% done
60.0% done
70.0% done
80.0% done
90.0% done


Unnamed: 0,title,prompt_1,response,prompt_2,text,model
0,12 Years a Slave: An Analysis of the Film Essay,[INST]\nYou are an academic writing instructor...,"I. Introduction\n- Brief background of ""12 Yea...",[INST]\nYou are an academic essay writer.\nWri...,"Title: ""12 Years a Slave"": An In-depth Analysi...",mistral-7b-instruct-v0.2.Q5_K_M
1,20+ Social Media Post Ideas to Radically Simpl...,[INST]\nYou are an academic writing instructor...,I. Introduction\n- Brief explanation of the im...,[INST]\nYou are an academic essay writer.\nWri...,Title: Simplifying Marketing Efforts with Soci...,mistral-7b-instruct-v0.2.Q5_K_M
2,533 U.S. 27 (2001) Kyllo v. United States: The...,[INST]\nYou are an academic writing instructor...,I. Introduction\n- Brief description of Kyllo ...,[INST]\nYou are an academic essay writer.\nWri...,Title: Kyllo v. United States (2001): A New Fr...,mistral-7b-instruct-v0.2.Q5_K_M
3,A Charles Schwab Corporation Case Essay,[INST]\nYou are an academic writing instructor...,I. Introduction\n- Brief background of Charles...,[INST]\nYou are an academic essay writer.\nWri...,Title: An Analysis of Charles Schwab Corporati...,mistral-7b-instruct-v0.2.Q5_K_M
4,A Clinical Office Assistant’s Attire Research ...,[INST]\nYou are an academic writing instructor...,I. Introduction\n- Brief overview of the role ...,[INST]\nYou are an academic essay writer.\nWri...,Title: A Clinical Office Assistant’s Attire: B...,mistral-7b-instruct-v0.2.Q5_K_M
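
Every `text` value in the preview above starts with a `Title:` line, which the second prompt does not forbid. If these essays need to match the title-free format of the other variants, a post-processing step could drop that line. A minimal sketch, assuming the title always sits on the first line; `strip_leading_title` is hypothetical and not part of the notebook, and the sample string is made up.

```python
import re

# Matches a first line of the form "Title: ..." plus the newlines after it.
TITLE_LINE = re.compile(r"\ATitle:[^\n]*\n*")


def strip_leading_title(text):
    """Remove a leading 'Title: ...' line if present; otherwise return the text unchanged."""
    return TITLE_LINE.sub("", text, count=1)


sample = "Title: A Made-up Heading\n\nFirst paragraph of the essay."
print(strip_leading_title(sample))
print(strip_leading_title("No title here."))
```

An alternative would be to add a "no title" instruction to the second prompt, as the prompts of variants 1 and 4 already do.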


#### Example:

In [28]:
print_example(df5, EXAMPLE_ID)

[INST]
You are an academic writing instructor.
Create a well-structured outline for an academic essay on the following topic: 12 Years a Slave: An Analysis of the Film Essay.

Requirements:

The outline must include the following sections:
- Introduction
- Main Body (2–4 thematic sections)
- Counterarguments / Alternative Perspectives
- Conclusion

Each section should contain 2–3 concise bullet points describing the key ideas or arguments.
The outline should reflect a clear thesis-driven structure.
Use formal academic language.

Do not write the essay itself
[/INST]
I. Introduction
- Brief background of "12 Years a Slave" as a historical account and film adaptation
- Thesis statement: An in-depth analysis of "12 Years a Slave" reveals its impactful portrayal of slavery, human resilience, and the complexities of race relations.

II. Portrayal of Slavery in 12 Years a Slave
- Depiction of the harsh realities of plantation life
    - Forced labor, physical abuse, and psychological trauma


In [29]:
file_path = f"../datasets/{OUT_FILE_PREFIX}_5.csv"

df5.to_csv(
    file_path, 
    mode="a",
    header=not os.path.exists(file_path),
    index=False,
    encoding="utf-8",
)