<a href="https://colab.research.google.com/github/andjoer/llm_poetry_generation/blob/main/colabs/Poetry_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating metrically correct and rhyming poetry with large language models

- Select a Colab runtime with high RAM.
- Loading everything may take a while.
- New users: click the 'Run all' button and scroll to whichever cell currently shows activity.


<img src = 'https://github.com/andjoer/llm_poetry_generation/blob/main/graphics/colab_en.jpg?raw=true'>

German



<img src = 'https://github.com/andjoer/llm_poetry_generation/blob/main/graphics/colab.jpg?raw=true'>


### Cloning the GitHub repository and changing directory

In [None]:
! git clone https://github.com/andjoer/llm_poetry_generation.git

%cd llm_poetry_generation


### Check if connected to a GPU

In [None]:
!nvidia-smi

### Installing the required packages

Some pip errors can be ignored.

In [None]:
! pip install -r requirements.txt

In [None]:
# Google Colab is not an empty environment, so pin compatible versions
!pip3 install torchvision==0.11.2 torchaudio==0.10.1 torchtext==0.11.1

!mkdir logs

### Download spaCy model

In [None]:
!python -m spacy download "de_core_news_lg"

## If GPT-3 should be used
Example syntax:

%env OPENAI_API_KEY=jr-mdhcvu9kd

In [6]:
%env OPENAI_API_KEY=
%env OPENAI_API_ID=

env: OPENAI_API_KEY=
env: OPENAI_API_ID=


## If the spectrogram of words created with Amazon Polly should be used to detect rhymes

Amazon AWS credentials for using Polly


In [7]:
%env POLLY_API_KEY=
%env POLLY_API_ID=

env: POLLY_API_KEY=
env: POLLY_API_ID=
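Both sets of credentials are optional, so it can help to check which ones are actually set before starting a run. The following is a minimal sketch: the variable names are the ones used in the cells above, while the helper `missing_credentials` and the `CREDENTIALS` table are hypothetical and not part of the repository.

```python
import os

# Environment variable names as used in the cells above.
# The grouping and the helper below are illustrative, not repository code.
CREDENTIALS = {
    "GPT-3": ("OPENAI_API_KEY", "OPENAI_API_ID"),
    "Amazon Polly": ("POLLY_API_KEY", "POLLY_API_ID"),
}


def missing_credentials(env=None):
    """Return {service: [unset variable names]} for incompletely configured services."""
    env = os.environ if env is None else env
    result = {}
    for service, names in CREDENTIALS.items():
        # An empty string (as produced by `%env NAME=`) counts as unset.
        unset = [name for name in names if not env.get(name)]
        if unset:
            result[service] = unset
    return result


for service, names in missing_credentials().items():
    print(f"{service}: not configured, missing {', '.join(names)}")
```

A service that is reported as not configured can simply be left out, for example by passing `--use_tts False` as in the generation cells further down.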


## Input parameters

In [2]:
%run poetry_generator.py --h


usage: poetry_generator.py [-h] [--prompt PROMPT] [--title TITLE]
                           [--generated_lines GENERATED_LINES]
                           [--verse_versions VERSE_VERSIONS]
                           [--check_end CHECK_END]
                           [--invalid_verse_ends INVALID_VERSE_ENDS]
                           [--repetition_penalty REPETITION_PENALTY]
                           [--LLM LLM] [--LLM_sampling LLM_SAMPLING]
                           [--LLM_random_first LLM_RANDOM_FIRST]
                           [--LLM_random_all LLM_RANDOM_ALL]
                           [--LLM_temperature LLM_TEMPERATURE]
                           [--trunkate_after TRUNKATE_AFTER]
                           [--LLM_top_p LLM_TOP_P]
                           [--syllable_count_toll SYLLABLE_COUNT_TOLL]
                           [--dividable_rest DIVIDABLE_REST]
                           [--verse_stop_tokens VERSE_STOP_TOKENS]
                           [--verse_alpha_only_after

# Start generating
The result will be saved in llm_poetry_generation/logs.
Use the cell below if you don't have access to a high-RAM Colab runtime.


In [None]:
%run poetry_generator.py --rhyme_scheme aabb --title 'Die Regierung' --LLM Anjoe/german-poetry-gpt2 --use_tts False --generated_poems 1 --generated_lines 4

If you have paid access to a Colab runtime with high RAM, you may uncomment and run this:

In [None]:
#%run poetry_generator.py --prompt 'Nur durch das Morgentor des Schönen\nDrangst du in der Erkenntnis Land.\nAn höhern Glanz sich zu gewöhnen,\nÜbt sich am Reize der Verstand.' --rhyme_scheme aabb --LLM Anjoe/german-poetry-gpt2-large --use_tts False --generated_poems 1

# Finding rhymes for existing poetry

In [None]:
from rythm_utils import verse_cl
import argparse


In [21]:
poem = '''Da geht er hinaus in den Wald, 
von Gedanken gezogen und erfüllt,
das Licht gar unerreichbar scheint,
doch nun ist es soweit er sieht'''

rhyming_lines = [2,4]

In [22]:
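rhyming_lines.sort()

# One verse object per line of the poem
text_lst = poem.split('\n')
verse_lst = [verse_cl(text) for text in text_lst]

# rhyming_lines is 1-based; convert to 0-based list indices
idx_1 = rhyming_lines[0]-1
idx_2 = rhyming_lines[1]-1

# Each of the two verses gets all preceding lines as context
verse_lst[idx_1].context = ' '.join(text_lst[:idx_1])
verse_lst[idx_2].context = ' '.join(text_lst[:idx_2])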

rhyme_parameters = argparse.Namespace()  # create an instance instead of mutating the class
rhyme_parameters.max_rhyme_dist = 0.5
rhyme_parameters.use_colone_phonetics = False
rhyme_parameters.use_tts = False
rhyme_parameters.LLM_2 = None
rhyme_parameters.target_rythm = []
rhyme_parameters.top_p_dict_rhyme = {0:0.65,3:0.5}
rhyme_parameters.top_p_rhyme = 0.5
rhyme_parameters.rhyme_stop_tokens = ['\n','.']
rhyme_parameters.rhyme_temperature = 1
rhyme_parameters.allow_pos_match = True
rhyme_parameters.LLM = 'Anjoe/german-poetry-gpt2-large'
rhyme_parameters.invalid_verse_ends = []
rhyme_parameters.LLM_sampling = 'systematic'
rhyme_parameters.LLM_rhyme = None  # makes it same as LLM
rhyme_parameters.LLM_rhyme_sampling = None  # makes it same as LLM
rhyme_parameters.repetition_penalty = 1.2
rhyme_parameters.size_tts_sample = 10


If you have access to a high-RAM runtime, skip the cell below.

In [9]:
rhyme_parameters.LLM = 'Anjoe/german-poetry-gpt2'

In [10]:
from rhyme import find_rhyme
from poetry_generator import initialize_llms

LLM, LLM_perplexity, LLM_rhyme, _ = initialize_llms(rhyme_parameters)

### Start the search

`find_rhyme` first creates alternatives for the verse endings with GPT and BERT. It returns the alternatives for both verses as lists (`first` and `second`). It also checks with several methods whether any of the alternatives rhyme.

In [None]:
_, first, second = find_rhyme(rhyme_parameters,
                            verse_lst,
                            idx_1,
                            idx_2,
                            LLM_perplexity,
                            last_stress = -2, 
                            LLM=LLM,
                            LLM2 = None,
                            return_alternatives=True) # needs to be set True, otherwise the function would only return one value


In [None]:
first

In [None]:
second

# The inner workings of the rhyme detection mechanism

Method 3, comparing the MFCC features of words, is explained in another Colab notebook.

## Method 1: Cologne phonetics (Kölner Phonetik)
https://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik

Each word is converted into a numeric code, and the distance between two codes reflects how different the words sound.
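To make the encoding step concrete, here is a minimal sketch of the standard Cologne phonetics (Kölner Phonetik) letter table as described in the linked Wikipedia article. It is an independent illustration: the function name `cologne_code` is hypothetical, and the repository's own `colone_phonetics` function may differ in details such as its return type.

```python
VOWELS = frozenset("AEIJOUY")
C_ONSET_HARD = frozenset("AHKLOQRUX")  # C at word start before these -> 4
C_INNER_HARD = frozenset("AHKOQUX")    # C elsewhere before these -> 4
UMLAUTS = str.maketrans({"Ä": "A", "Ö": "O", "Ü": "U"})


def cologne_code(word):
    """Return the Cologne phonetics code of a word as a digit string."""
    # Normalise: ß -> s, upper-case, umlauts -> base vowel, letters A-Z only.
    text = word.replace("ß", "s").upper().translate(UMLAUTS)
    letters = [c for c in text if "A" <= c <= "Z"]

    digits = []
    for i, c in enumerate(letters):
        prev = letters[i - 1] if i > 0 else ""
        nxt = letters[i + 1] if i + 1 < len(letters) else ""
        if c in VOWELS:
            code = "0"
        elif c == "H":
            code = ""  # H is ignored
        elif c == "B":
            code = "1"
        elif c == "P":
            code = "3" if nxt == "H" else "1"
        elif c in ("D", "T"):
            code = "8" if nxt in ("C", "S", "Z") else "2"
        elif c in ("F", "V", "W"):
            code = "3"
        elif c in ("G", "K", "Q"):
            code = "4"
        elif c == "C":
            if i == 0:
                code = "4" if nxt in C_ONSET_HARD else "8"
            elif prev in ("S", "Z"):
                code = "8"
            else:
                code = "4" if nxt in C_INNER_HARD else "8"
        elif c == "X":
            code = "8" if prev in ("C", "K", "Q") else "48"
        elif c == "L":
            code = "5"
        elif c in ("M", "N"):
            code = "6"
        elif c == "R":
            code = "7"
        else:  # S, Z
            code = "8"
        digits.append(code)

    raw = "".join(digits)
    # Step 2: collapse runs of identical digits.
    collapsed = "".join(d for j, d in enumerate(raw) if j == 0 or d != raw[j - 1])
    # Step 3: drop every 0 except a leading one.
    return collapsed[:1] + collapsed[1:].replace("0", "")


print(cologne_code("Wikipedia"))            # 3412
print(cologne_code("Müller-Lüdenscheidt"))  # 65752682
```

Because vowels are dropped and similar consonants share a digit, spelling variants such as "Meier" and "Mayer" receive the same code.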




In [8]:
from rhyme_detection.colone_phonetics import colone_phonetics

In [None]:
colone_phonetics('Gedicht')

## Method 2: Siamese recurrent network
The network maps words into a vector space, where rhyming words lie closer to each other than non-rhyming pairs.

In [10]:
from sia_rhyme.siamese_rhyme import siamese_rhyme
rhyme_model = siamese_rhyme()

In [None]:
vector_1 = rhyme_model.get_word_vec('gehen')
vector_1

In [12]:
vector_2 = rhyme_model.get_word_vec('sägen')

In [13]:
rhyme_model.vector_distance(vector_1,vector_2)

array([0.20934641], dtype=float32)
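One way to turn such a distance into a yes/no rhyme decision is to compare it against a threshold. The sketch below assumes a Euclidean distance and reuses the value 0.5 from `rhyme_parameters.max_rhyme_dist`; both choices are assumptions, since the notebook does not state which metric `vector_distance` uses or which threshold the repository applies to it. The toy vectors are invented for illustration and are not model outputs.

```python
import math

MAX_RHYME_DIST = 0.5  # assumption: same value as rhyme_parameters.max_rhyme_dist


def euclidean_distance(vec_a, vec_b):
    """Euclidean distance between two equally long vectors."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(vec_a, vec_b)))


def is_rhyme(vec_a, vec_b, max_dist=MAX_RHYME_DIST):
    """Treat two words as rhyming when their vectors are close enough."""
    return euclidean_distance(vec_a, vec_b) <= max_dist


# Toy 3-dimensional vectors, made up for this example.
close_a = [0.10, 0.80, 0.30]
close_b = [0.20, 0.70, 0.40]
far_c = [0.90, 0.10, 0.60]

print(is_rhyme(close_a, close_b))  # True
print(is_rhyme(close_a, far_c))    # False
```

Under this assumption, the distance of about 0.21 shown above for 'gehen' and 'sägen' would fall below the threshold.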

# Finding the rhythm of words

Each word is transcribed into the International Phonetic Alphabet (IPA), whose notation includes marks for primary and secondary word stress. In the output, an unstressed syllable becomes 0, a secondary stress 0.5, and a primary stress 1. Words without word stress get a 0 or a 1 depending on the word type.
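The mapping from stress marks to numbers can be sketched as follows. This is an illustration only, not the implementation behind `get_rythm`: the function name `ipa_to_stress` is hypothetical, and it assumes that syllable boundaries are marked with "." or implied by a stress mark, which real transcriptions do not always guarantee. The transcription of "Bibliothek" is written by hand for the example.

```python
import re

PRIMARY = "\u02c8"    # IPA primary stress mark
SECONDARY = "\u02cc"  # IPA secondary stress mark


def ipa_to_stress(ipa):
    """Map an IPA string to one stress value per syllable (0, 0.5 or 1)."""
    # Split at "." and directly in front of every stress mark,
    # then drop the empty pieces that the split can produce.
    pieces = re.split(rf"\.|(?=[{PRIMARY}{SECONDARY}])", ipa)
    syllables = [piece for piece in pieces if piece]

    pattern = []
    for syllable in syllables:
        if syllable.startswith(PRIMARY):
            pattern.append(1.0)
        elif syllable.startswith(SECONDARY):
            pattern.append(0.5)
        else:
            pattern.append(0.0)
    return pattern


# Hand-written transcription of "Bibliothek" with stress on the last syllable.
print(ipa_to_stress(f"bi.bli.o{PRIMARY}teːk"))  # [0.0, 0.0, 0.0, 1.0]
```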

In [15]:
from rythm_utils import get_rythm

In [None]:
get_rythm('Bibliothek')

In [20]:
from annotate_meter.ortho_to_ipa import ortho_to_ipa
import os

dirname = ''
m_path = os.path.join(dirname, 'ortho_to_ipa/model')

otoi = ortho_to_ipa(load = True)

In [None]:
otoi.translate('seyn')

In [None]:
otoi.translate('eynerley')

# Fixing the meter of a verse
The rhythm of a verse is fixed with spaCy and BERT. If a verse is too long, spaCy detects which combinations of words could be deleted to shorten it, and the algorithm chooses the best of these options according to meter and perplexity. If the meter is incorrect, BERT tries to replace the problematic word or to insert another word in front of it. If the verse is too short, BERT proposes different ways of inserting words, and the option with the lowest perplexity is chosen. The process is iterative.

Below are two examples of adjusting the length of a verse. Because the sentences are given without context, the result of the more extreme change is not perfect. In the poetry generation algorithm, gpt_poet only creates verses that are not too long, so BERT does not gain too much influence on the final output. However, this rhythm algorithm could also be used on its own for other purposes, for example to transform all sentences in a corpus to match a certain meter.
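The meter check that drives this process can be sketched in a few lines: repeat the target foot (for example `[0, 1]` for an iamb) over the desired number of syllables and count the positions where the verse's stress pattern disagrees. The helper names are hypothetical and this is not the code inside `fix_rythm`; treating a secondary stress (0.5) as compatible with either position is an assumption made for the example.

```python
from itertools import cycle, islice


def expand_meter(foot, n_syllables):
    """Repeat a metrical foot until it covers n_syllables positions."""
    return list(islice(cycle(foot), n_syllables))


def meter_mismatches(stresses, foot):
    """Count syllables whose stress contradicts the repeated target foot."""
    target = expand_meter(foot, len(stresses))
    # Assumption: a secondary stress (0.5) may fill either position.
    return sum(1 for s, t in zip(stresses, target) if s != 0.5 and s != t)


print(expand_meter([0, 1], 9))                          # [0, 1, 0, 1, 0, 1, 0, 1, 0]
print(meter_mismatches([0, 1, 0, 1, 0.5, 1], [0, 1]))   # 0
print(meter_mismatches([1, 0, 0, 1], [0, 1]))           # 2
```

Among several candidate verses, a score like this could be used to prefer the one with the fewest mismatches, with perplexity deciding between equally good candidates.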

In [None]:
from rythm import fix_rythm
from rythm_utils import verse_cl
from gpt2 import LLM_class
LLM_perplexity = LLM_class('Anjoe/german-poetry-gpt2',device='cpu')

In [3]:

test_verse_1 = verse_cl('Wer hatte ihm den weißen Stock gegeben')
test_verse_2 = verse_cl('Der Begriff Poesie umfasste in der Antike und frühen Neuzeit die Werke in gebundener Sprache, während im Mittelalter nur die quantitierende Dichtung in antiker Tradition als poesis bezeichnet wurde') #wikipedia DE Poesie

In [None]:
new_verse = fix_rythm(test_verse_1,[0,1],9,LLM_perplexity)  # verse, target rythm, number of syllables
print(' '.join(new_verse.text))

In [None]:
new_verse = fix_rythm(test_verse_2,[0,1],10,LLM_perplexity)  # verse, target rythm, number of syllables
print(' '.join(new_verse.text))