<a href="https://colab.research.google.com/github/andjoer/llm_poetry_generation/blob/main/colabs/Poetry_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Creating metrically correct and rhyming poetry with large language models

- Select a Colab runtime with high RAM.
- Loading everything may take a while.
- Click the 'Run all' button.


<img src = 'https://github.com/andjoer/llm_poetry_generation/blob/main/graphics/colab_en.jpg?raw=true'>

German



<img src = 'https://github.com/andjoer/llm_poetry_generation/blob/main/graphics/colab.jpg?raw=true'>


### Cloning the GitHub repository and changing directory

In [None]:
! git clone https://github.com/andjoer/llm_poetry_generation.git

%cd llm_poetry_generation


### Check if connected to a GPU

In [None]:
!nvidia-smi

Wed Jun 22 17:16:07 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla P100-PCIE...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   43C    P0    30W / 250W |      0MiB / 16280MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

### Installing the required packages

Some pip errors can be ignored.

In [None]:
! pip install -r requirements.txt

In [None]:
# Google Colab is not an empty environment, so these versions are pinned explicitly
!pip3 install torchvision==0.11.2 torchaudio==0.10.1 torchtext==0.11.1

!mkdir output

### Download the spaCy model

In [None]:
!python -m spacy download "de_core_news_lg"

## If GPT-3 should be used
Example of the syntax:

%env OPENAI_API_KEY=jr-mdhcvu9kd

In [None]:
%env OPENAI_API_KEY=
%env OPENAI_API_ID=

## If the spectrogram of words created with Amazon Polly should be used to detect rhymes

Amazon AWS credentials for using Polly


In [None]:
%env POLLY_API_KEY=
%env POLLY_API_ID=

## Importing the main function
You need to be connected to a GPU-accelerated environment.

In [None]:
from poetry_generator import start_poetry_generation

## Define the inputs

In [None]:
jambus = [0,1]
trochee = [1,0]

prompt = '''Die Philosophie ist ein schlechtes Metier.
Wahrhaftig, ich begreife nie,
Warum man treibt Philosophie.
Sie ist langweilig und bringt nichts ein,
Und gottlos ist sie obendrein;'''

num_syll_list = [9, 11]                             # number of syllables per verse; if a list is given,
                                                    # the verse lengths cycle through it

rhyme_scheme = 'abab'                               # if '', the program does not look for rhymes (much faster)

rythm = jambus 
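
The comments above say that the verse lengths iterate through `num_syll_list` and that an empty `rhyme_scheme` disables rhyming. The following is a minimal sketch of one plausible reading of those inputs, not the repository's actual implementation. The helper `plan_verses` and its `num_verses` parameter are hypothetical names introduced only for illustration.

```python
from itertools import cycle, islice


def plan_verses(num_syll_list, rhyme_scheme, num_verses=None):
    """Pair each verse with a target syllable count and a rhyme label.

    Assumption for illustration: both inputs are cycled verse by verse.
    """
    if num_verses is None:
        # Default to one stanza: as long as the scheme, or the syllable list
        # when no scheme is given.
        num_verses = len(rhyme_scheme) or len(num_syll_list)
    syllables = islice(cycle(num_syll_list), num_verses)
    # An empty scheme means "no rhyme constraint", represented here as None.
    labels = islice(cycle(rhyme_scheme or [None]), num_verses)
    return list(zip(syllables, labels))


plan = plan_verses([9, 11], "abab")
for number, (syllables, label) in enumerate(plan, start=1):
    print(f"verse {number}: {syllables} syllables, rhyme label {label}")
```

Under this reading, verses 1 and 3 share label `a` and have 9 syllables, while verses 2 and 4 share label `b` and have 11.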

# Start generating
The result will be saved in llm_poetry_generation/output.

In [None]:
start_poetry_generation(prompt,
                        rythm,
                        num_syll_list,
                        rhyme_scheme,
                        shots = 1,                           # number of times each verse is generated; the candidate with the lowest perplexity is selected
                        loops = 1,                           # number of poems to generate
                        LLM='GPT2-large',
                        LLM_rhyme='GPT2-large',
                        use_tts = False)                     # set to True only if the AWS credentials were entered above
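
The `shots` comment mentions selecting the candidate with the lowest perplexity. As a rough sketch of that idea (not the repository's code), perplexity can be computed as the exponential of the negative mean token log-probability. The candidate format, a list of `(text, token_logprobs)` pairs, and the numbers below are assumptions made up for illustration.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_best(candidates):
    """Return the text of the candidate with the lowest perplexity.

    candidates: list of (text, token_logprobs) pairs (hypothetical format).
    """
    return min(candidates, key=lambda item: perplexity(item[1]))[0]


# Made-up log-probabilities for two candidate verses.
candidates = [
    ("verse A", [-2.0, -1.5, -3.0]),
    ("verse B", [-0.5, -1.0, -0.7]),
]
best = pick_best(candidates)
print(best)
```

A lower perplexity means the language model found the verse less surprising, so a higher `shots` value trades run time for more candidates to choose from.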

# Finding rhymes for existing poetry

In [None]:
from rythm_utils import verse_cl


In [None]:
poem = '''Da geht er hinaus in den Wald, 
von Gedanken gezogen und erfüllt,
das Licht gar unerreichbar scheint,
doch nun ist es soweit er sieht'''

rhyming_lines = [1,3]

In [None]:
rhyming_lines.sort()

text_lst = poem.split('\n')
verse_lst = [verse_cl(text) for text in text_lst]

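# rhyming_lines uses 1-based line numbers; convert them to 0-based indices
idx_1 = rhyming_lines[0]-1
idx_2 = rhyming_lines[1]-1


# each verse gets all preceding lines as its context
verse_lst[idx_1].context = ' '.join(text_lst[:idx_1])    
verse_lst[idx_2].context = ' '.join(text_lst[:idx_2])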


In [None]:
from rhyme import find_rhyme

### Start the search

The function imported above first generates alternatives for the verse endings with GPT and BERT. It returns the alternatives for both verses as lists (`first` and `second`). It also checks, using different methods, whether any of the alternatives rhyme.

In [None]:
_, first, second = find_rhyme(verse_lst,
                              idx_1,
                              idx_2,
                              [],
                              last_stress = -2, 
                              detection_method ='neural',
                              LLM='GPT2-large',
                              use_tts = False,
                              return_alternatives=True)

In [None]:
first

In [None]:
second

# The inner workings of the rhyme detection mechanism

Method 3, comparing the MFCC features of words, is explained in another Colab notebook.

## Method 1: Cologne phonetics
https://de.wikipedia.org/wiki/K%C3%B6lner_Phonetik

Each word is converted into a sequence of digits. The distance between two such sequences reflects how different the words sound.




In [None]:
from rhyme_detection.colone_phonetics import colone_phonetics

In [None]:
colone_phonetics('Gedicht')

[4, 0, 2, 0, 4, 2]
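
To illustrate how two such digit sequences could be compared for rhyming, here is a small sketch that counts how many trailing digits two codes share. This is an assumption for illustration only: the repository may measure the distance differently, `shared_ending` is a hypothetical helper, and the second code below is written by hand rather than produced by `colone_phonetics`.

```python
def shared_ending(code_a, code_b):
    """Count how many digits two phonetic codes share, starting from the end."""
    count = 0
    for a, b in zip(reversed(code_a), reversed(code_b)):
        if a != b:
            break
        count += 1
    return count


code_1 = [4, 0, 2, 0, 4, 2]   # output shown above for 'Gedicht'
code_2 = [4, 0, 7, 0, 4, 2]   # hand-written code differing in one digit
overlap = shared_ending(code_1, code_2)
print(overlap)
```

The longer the shared ending, the more similar the word endings sound under this encoding.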

## Method 2: siamese recurrent network
The network maps words into a vector space in which rhyming words lie closer to each other than non-rhyming pairs.

In [None]:
from sia_rhyme.siamese_rhyme import siamese_rhyme
rhyme_model = siamese_rhyme()

In [None]:
vector_1 = rhyme_model.get_word_vec('gehen')
vector_1

tensor([[-0.5856,  0.7474,  0.1178, -0.0544, -0.0271,  0.3719, -0.2101,  0.4195,
          0.3779,  0.0342, -0.4246, -0.3210,  0.7679, -0.0753,  0.0470,  0.0611,
          0.0807, -0.0652, -0.3461,  0.4887, -0.1806,  0.0458, -0.2565,  0.4855,
          0.1426, -0.0724,  0.4419,  0.3535, -0.2769,  0.0636, -0.1953,  0.3511,
          0.7090,  0.8421,  0.6162,  0.5989,  0.6668, -0.3263, -0.7324,  0.3969,
         -0.1411, -0.0125,  0.0048,  0.0571, -0.1915, -0.1740,  0.4161, -0.3785,
         -0.3235,  0.8206,  0.4755,  0.2791,  0.0385, -0.1753,  0.1664,  0.4155,
         -0.4856, -0.1417,  0.0729, -0.6723, -0.2852, -0.3127,  0.2243, -0.0176,
          0.2686,  0.0457, -0.0650,  0.3836, -0.3010, -0.2898,  0.6069,  0.3057,
         -0.6498, -0.6421,  0.0050,  0.4717,  0.0602, -0.5515,  0.3163,  0.4609,
         -0.4097, -0.8044,  0.3902,  0.1557,  0.6025, -0.2303,  0.0567, -0.7864,
         -0.0780, -0.1755, -0.3927, -0.8881,  0.4535, -0.4321,  0.1614, -0.0957,
         -0.3500,  0.3394,  

In [None]:
vector_2 = rhyme_model.get_word_vec('sägen')

In [None]:
rhyme_model.vector_distance(vector_1,vector_2)

array([0.16701883], dtype=float32)
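
A distance like the one above can be turned into a rhyme decision by comparing it with a threshold. The sketch below uses cosine distance on toy vectors. Which metric `vector_distance` actually uses is not stated here, so the metric, the threshold of 0.3, and the helper names are all assumptions.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def is_rhyme(u, v, threshold=0.3):
    """Treat two word vectors as rhyming if they are closer than the threshold."""
    return cosine_distance(u, v) < threshold


# Toy 2-dimensional vectors standing in for real word embeddings.
close_pair = is_rhyme([1.0, 0.0], [0.9, 0.1])
far_pair = is_rhyme([1.0, 0.0], [0.0, 1.0])
print(close_pair, far_pair)
```

In practice the threshold would have to be tuned on known rhyming and non-rhyming pairs.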

# Finding the rhythm of words

The words are transcribed into the International Phonetic Alphabet (IPA), which has marks for primary and secondary word stress. In the output, an unstressed syllable is mapped to 0, a secondary stress to 0.5, and a primary stress to 1. Words without word stress get a 0 or 1 depending on the type of word.

In [None]:
from rythm_utils import get_rythm

In [None]:
get_rythm('Bibliothek')

array([0., 0., 1.])
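
The mapping from stress marks to numbers can be sketched as follows. This is not the implementation of `get_rythm`: it assumes the IPA transcription has already been split into syllables, and the transcriptions below are written by hand for illustration. IPA marks primary stress with `ˈ` and secondary stress with `ˌ`.

```python
PRIMARY = "\u02c8"    # IPA primary stress mark ˈ
SECONDARY = "\u02cc"  # IPA secondary stress mark ˌ


def stress_pattern(ipa_syllables):
    """Map each IPA syllable to 1.0 (primary), 0.5 (secondary) or 0.0 (none)."""
    pattern = []
    for syllable in ipa_syllables:
        if PRIMARY in syllable:
            pattern.append(1.0)
        elif SECONDARY in syllable:
            pattern.append(0.5)
        else:
            pattern.append(0.0)
    return pattern


# Hand-written syllable lists (assumed transcriptions, not model output).
pattern_1 = stress_pattern(["bi", "blio", PRIMARY + "teːk"])
pattern_2 = stress_pattern([PRIMARY + "haʊs", SECONDARY + "tyːɐ"])
print(pattern_1, pattern_2)
```

Splitting a transcription into syllables is the harder part and is left out of this sketch.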

In [None]:
from ortho_to_ipa.ortho_to_ipa import ortho_to_ipa
import os

dirname = ''
m_path = os.path.join(dirname, 'ortho_to_ipa/model')

# Load the saved orthography-to-IPA model files
otoi = ortho_to_ipa(load=True,
                    fname_ortho=m_path + '/ortho.pt',
                    fname_ipa=m_path + '/ipa.pt',
                    fname_model=m_path + '/ortho_to_ipa.pt')

In [None]:
otoi.translate('seyn')

'zɛɪ̯n'

In [None]:
otoi.translate('eynerley')

'ˈaɪ̯nɐllɪ̯'