Oksana Dereza

## Poetry & Python

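A simple character-based n-gram model for text generation: it estimates the probability of the next character given the previous `order` characters, then samples from that distribution one character at a time.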


In [2]:
import re
from collections import Counter, defaultdict

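class LanguageModel:
    """Character-level n-gram model: P(next char | previous `order` chars)."""

    def __init__(self, data, order=4):
        self.order = order
        # Pad the start so the first real characters also have a full history.
        pad = '~' * order
        self.data = pad + data.lower()
        self.ngrams = self.make_ngrams()
        self.alphabet = set(self.data)
        self.matrix = self.make_matrix()
        # Turn the raw counts into one probability distribution per history.
        self.lm = {history: self.normalize(chars) for history, chars in self.matrix.items()}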
        
    def make_ngrams(self):
        n = self.order + 1
        self.ngrams = [self.data[i:i+n] for i in range(len(self.data)-n+1)]
        return self.ngrams
    
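    def make_matrix(self):
        # Count every (history, next char) pair in a single pass. Using the
        # n-gram as a regex pattern would misread characters such as '.',
        # miss overlapping occurrences, and rescan the text for each n-gram.
        self.matrix = defaultdict(lambda: defaultdict(int))
        for ngram, count in Counter(self.ngrams).items():
            self.matrix[ngram[:-1]][ngram[-1]] = count
        return self.matrix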
    
    def normalize(self, chars):
        total = sum(chars.values())
        normalized = {}
        for k, v in chars.items():
            normalized[k] = v/total
        return normalized
    
    def __getitem__(self, history):
        return self.lm[history]
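
The counting and normalizing steps of the class above can be checked by hand on a very short string. The following is only a minimal sketch, not part of the notebook: `transition_probs` is a hypothetical helper name, and it uses `order=1` so that the table stays small enough to read.

```python
from collections import Counter, defaultdict

def transition_probs(text, order=1, pad="~"):
    """Hypothetical helper: {history: {next_char: probability}} for `text`."""
    data = pad * order + text.lower()
    counts = defaultdict(Counter)
    # Slide a window of order + 1 characters: the first `order` characters
    # are the history, the last one is the character that followed it.
    for i in range(len(data) - order):
        counts[data[i:i + order]][data[i + order]] += 1
    # Divide each row of counts by its total to get probabilities.
    return {
        history: {ch: n / sum(row.values()) for ch, n in row.items()}
        for history, row in counts.items()
    }

probs = transition_probs("abracadabra", order=1)
print(probs["a"])  # {'b': 0.5, 'c': 0.25, 'd': 0.25}
```

In "abracadabra" the letter "a" is followed by "b" twice and by "c" and "d" once each, which gives the 0.5 / 0.25 / 0.25 split; the final "a" has no successor and contributes nothing.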

In [4]:
text = "Poetry has a long history, dating back to the Sumerian Epic of Gilgamesh. Early poems evolved from folk songs such as the Chinese Shijing, or from a need to retell oral epics, as with the Sanskrit Vedas, Zoroastrian Gathas, and the Homeric epics, the Iliad and the Odyssey. Ancient attempts to define poetry, such as Aristotle's Poetics, focused on the uses of speech in rhetoric, drama, song and comedy. Later attempts concentrated on features such as repetition, verse form and rhyme, and emphasized the aesthetics which distinguish poetry from more objectively informative, prosaic forms of writing. From the mid-20th century, poetry has sometimes been more generally regarded as a fundamental creative act employing language. \
Poetry uses forms and conventions to suggest differential interpretation to words, or to evoke emotive responses. Devices such as assonance, alliteration, onomatopoeia and rhythm are sometimes used to achieve musical or incantatory effects. The use of ambiguity, symbolism, irony and other stylistic elements of poetic diction often leaves a poem open to multiple interpretations. Similarly figures of speech such as metaphor, simile and metonymy create a resonance between otherwise disparate images—a layering of meanings, forming connections previously not perceived. Kindred forms of resonance may exist, between individual verses, in their patterns of rhyme or rhythm. \
Some poetry types are specific to particular cultures and genres and respond to characteristics of the language in which the poet writes. Readers accustomed to identifying poetry with Dante, Goethe, Mickiewicz and Rumi may think of it as written in lines based on rhyme and regular meter; there are, however, traditions, such as Biblical poetry, that use other means to create rhythm and euphony. Much modern poetry reflects a critique of poetic tradition, playing with and testing, among other things, the principle of euphony itself, sometimes altogether forgoing rhyme or set rhythm. In today's increasingly globalized world, poets often adapt forms, styles and techniques from diverse cultures and languages. \
Python is a widely used high-level, general-purpose, interpreted, dynamic programming language. Its design philosophy emphasizes code readability, and its syntax allows programmers to express concepts in fewer lines of code than possible in languages such as Java. The language provides constructs intended to enable writing clear programs on both a small and large scale. \
Python supports multiple programming paradigms, including object-oriented, imperative and functional programming or procedural styles. It features a dynamic type system and automatic memory management and has a large and comprehensive standard library. \
Python interpreters are available for many operating systems, allowing Python code to run on a wide variety of systems. CPython, the reference implementation of Python, is open source software and has a community-based development model, as do nearly all of its variant implementations. CPython is managed by the non-profit Python Software Foundation."

lm = LanguageModel(text)

In [5]:
lm.ngrams[:10]

['~~~~p',
 '~~~po',
 '~~poe',
 '~poet',
 'poetr',
 'oetry',
 'etry ',
 'try h',
 'ry ha',
 'y has']

In [14]:
list(lm['poet'].keys())

['r', 's', ' ', 'i']

In [21]:
# functions for character and text generation

import numpy as np

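def generate_letter(lm, history):
    # The model conditions only on the last `order` characters.
    history = history[-lm.order:]
    dist = lm[history]
    # Sample the next character from P(char | history); this is a random
    # draw weighted by probability, not the single most likely character.
    return np.random.choice(list(dist.keys()), p=list(dist.values()))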

def generate_text(lm, n_letters=1000):
    # Start from the padding; the returned string still begins with it.
    history = '~' * lm.order
    for _ in range(n_letters):
        history += generate_letter(lm, history)
    return history
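
The generation loop above looks up every history in the model, so a history that was only seen at the very end of the training text (and therefore has no recorded continuation) would raise a `KeyError`. The sketch below shows one possible way to handle that case and to make runs repeatable. It is an assumption-laden illustration, not the notebook's method: `generate_safe` is a hypothetical name, it takes the model as a plain dict, it restarts from the padding on a dead end, and it swaps `np.random.choice` for the standard library's `random.choices` so that a seed can be passed in.

```python
import random

def generate_safe(lm, order, n_letters, seed=None, pad="~"):
    """Hypothetical variant of generate_text for a plain {history: dist} dict."""
    rng = random.Random(seed)  # a fixed seed makes the output repeatable
    start = pad * order
    history, out = start, []
    for _ in range(n_letters):
        dist = lm.get(history)
        if dist is None:
            # Dead end: this history never had a continuation in the
            # training text, so restart from the padding (an assumption).
            history = start
            dist = lm[start]
        letter = rng.choices(list(dist), weights=list(dist.values()))[0]
        out.append(letter)
        history = (history + letter)[-order:]
    return "".join(out)

# Toy table: what an order-2 model would hold for the text "abc".
toy_lm = {"~~": {"a": 1.0}, "~a": {"b": 1.0}, "ab": {"c": 1.0}}
print(generate_safe(toy_lm, order=2, n_letters=6, seed=0))  # abcabc
```

After "abc" the history is "bc", which the toy table does not contain, so the sketch restarts and produces "abc" again. Unlike `generate_text`, it collects only the generated letters, so the result does not begin with the padding.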

In [22]:
generate_letter(lm, 'poet')

'r'

In [23]:
generate_text(lm, n_letters=500)

"~~~~poetry types and resonance may exist, between other meter; their pattempts to run on both and fundamentation today's in the used devices such as assonance, all and concepts constructs a critiques forms, interpretations. simile and has some poetry types and comprehensive readers accustomed the sanskrit vedas, zoroastriant implements of speech in to evolved from more general-purpose, intended as the aesthetic dictional poet written leaves and automatopoeia and large and language. poetry has a long"
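
Stretches of the sample above, such as "poetry has a long", are copied verbatim from the training text. One possible way to see why is to measure how many histories have exactly one observed continuation, since the model has no choice at those points. This is a rough sketch rather than part of the notebook; `deterministic_share` is a hypothetical helper, and it assumes a non-empty text.

```python
from collections import defaultdict

def deterministic_share(text, order, pad="~"):
    """Hypothetical helper: fraction of histories with a single continuation."""
    data = pad * order + text.lower()
    continuations = defaultdict(set)
    # Record the set of distinct characters seen after each history.
    for i in range(len(data) - order):
        continuations[data[i:i + order]].add(data[i + order])
    single = sum(1 for chars in continuations.values() if len(chars) == 1)
    return single / len(continuations)

print(deterministic_share("abracadabra", order=1))
print(deterministic_share("abracadabra", order=2))  # 1.0
```

On this toy string, five of the six order-1 histories are deterministic, and at order 2 every history is. Running the same function on `text` with different values of `order` would show how quickly a short corpus forces the model to reproduce its source.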