# Worksheet 2.2.0: Text generation

<div class="alert alert-block alert-info">
    This worksheet uses to-do markers wherever work needs to be completed. In some cases, this means that you'll need to copy a line or two and make meaningful changes.
</div>

## I'm nobody, who are you

...and did you know that computers can be poets, too? They can and, historically, have been. Take the following work called _House of Dust_ by Alison Knowles and James Tenney (translated from a language called Fortran into Python by the inestimable [Allison Parrish](http://www.decontextualize.com)):

In [4]:
import random

materials = [
    'shiny steel',
    'aluminum',
    'laser-cut mdf',
    'blinking leds'
]

locations = [
    'among high mountains',
    'among other houses',
    'among small hills',
    'by a river',
    'by an abandoned lake',
    'by the sea',
    'in a cold, windy climate',
    'in a deserted airport',
    'in a deserted church',
    'in a deserted factory',
    'in a green, mossy terrain',
    'in a hot climate',
    'in a metropolis',
    'in a place with both heavy rain and bright sun',
    'in an overpopulated area',
    'in dense woods',
    'in heavy jungle undergrowth',
    'in japan',
    'in michigan',
    'in southern france',
    'inside a mountain',
    'on an island',
    'on the sea',
    'underwater'
]

lights = [
    'all available lighting',
    'candles',
    'electricity',
    'natural light'
]

inhabitants = [
    'all races of men represented wearing predominantly red clothing',
    'children and old people',
    'collectors of all types',
    'fishermen and families',
    'french and german speaking people',
    'friends',
    'friends and enemies',
    'horses and birds',
    'little boys',
    'lovers',
    'people from many walks of life',
    'people speaking many languages wearing little or no clothing',
    'people who eat a great deal',
    'people who enjoy eating together',
    'people who love to read',
    'people who sleep almost all the time',
    'people who sleep very little',
    'various birds and fish',
    'vegetarians',
    'very tall people'
]

stanza_count = 7
for i in range(stanza_count):
    print()
    print("A house of " + random.choice(materials))
    print("     " + random.choice(locations))
    print("          using " + random.choice(lights))
    print("                inhabited by " + random.choice(inhabitants))


A house of laser-cut mdf
     in japan
          using electricity
                inhabited by vegetarians

A house of blinking leds
     in heavy jungle undergrowth
          using candles
                inhabited by french and german speaking people

A house of shiny steel
     in a green, mossy terrain
          using candles
                inhabited by friends and enemies

A house of blinking leds
     in a metropolis
          using all available lighting
                inhabited by little boys

A house of shiny steel
     in a deserted church
          using candles
                inhabited by people who sleep almost all the time

A house of laser-cut mdf
     in a green, mossy terrain
          using electricity
                inhabited by lovers

A house of aluminum
     in a metropolis
          using natural light
                inhabited by french and german speaking people


"But, of course," you say, "it looks a lot like a madlib, so it _would_ come out readable." 

True, but for its time (and even now) it helps us understand _how_ computers can assist us in creating interesting and "renewable" works that have both form and literal function. Writing computational poetry differs little, in terms of mental process, from its analogue counterpart: it's all about discovering and exploiting patterns in language. Even (at this point _literally_) 100 years ago, the Dada art movement had a similar hot take on how to create poems:

```
Take a newspaper.
Take some scissors.
Choose from this paper an article of the length you want to make your poem.
Cut out the article.
Next carefully cut out each of the words that makes up this article and put them all in a bag.
Shake gently.
Next take out each cutting one after the other.
Copy conscientiously in the order in which they left the bag.
The poem will resemble you.
And there you are – an infinitely original author of charming sensibility, even though unappreciated by the vulgar herd.
```

["Dada Manifesto on Feeble Love and Bitter Love"](https://391.org/manifestos/1920-dada-manifesto-feeble-love-bitter-love-tristan-tzara/#.WnPkJYJOndd), Tristan Tzara (~1920)

Okay. If that's true, let's do it.

## Writing your own Dadaist poem

In [1]:
from textwrap3 import wrap         # to make it pretty, because if it's not pretty,
                                   # it's not worth having
    
# TODO: Follow the example given by the professor to
#       generate our own!
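
If you get stuck on the TODO above, here is one rough way to translate Tzara's recipe into code. It is a sketch, not the professor's example: the function name `dada_poem`, its `seed` and `width` parameters, and the sample `article` string are all invented for illustration, and it uses the standard library's `textwrap` rather than `textwrap3` so that it runs without extra installs.

```python
import random
from textwrap import fill  # standard-library stand-in for textwrap3

def dada_poem(article, seed=None, width=40):
    """Cut an article into words, shake the bag, and copy the words out."""
    rng = random.Random(seed)      # a seed makes the "shake" repeatable
    words = article.split()        # "cut out each of the words"
    rng.shuffle(words)             # "shake gently"
    # "copy conscientiously", wrapped so the result looks like a poem
    return fill(" ".join(words), width=width)

# Placeholder "newspaper article" (made up for this sketch)
article = (
    "the city council voted on tuesday to repaint every bench "
    "in the park a slightly different shade of green"
)

poem = dada_poem(article, seed=1920)
print(poem)
```

Leave `seed` as `None` to get a different poem every time you run the cell.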

Oh, that looks _good_. Let's save it. This, of course, is _writing_ a file, which is not dissimilar from reading one:

```python
poem = open("poem.txt", "a")  # <-- here "a" is for "append"
poem.write("TEXT TO WRITE")   # <-- swap in the string you want to save
poem.close()                  # <-- tell the file we're done
```
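
To see what "append" actually means, here is a small experiment you can run. It is only a sketch: the file name `scratch.txt` and the temporary folder are placeholders chosen so that it doesn't touch your real files, and it uses `with`, which closes the file for us instead of calling `close()` by hand.

```python
import os
import tempfile

# A throwaway folder and a hypothetical file name for the experiment
folder = tempfile.mkdtemp()
path = os.path.join(folder, "scratch.txt")

# "a" appends: write twice and the file holds both copies
for _ in range(2):
    with open(path, "a") as poem:
        poem.write("a line of verse\n")

with open(path, "r") as poem:
    appended = poem.read()

# "w" overwrites: whatever was in the file before is replaced
with open(path, "w") as poem:
    poem.write("a line of verse\n")

with open(path, "r") as poem:
    overwritten = poem.read()

# Count the lines in each version of the file
print(appended.count("\n"), overwritten.count("\n"))  # prints: 2 1
```

So if you rerun a cell that opens a file with `"a"`, expect the poem to pile up in the file; use `"w"` if you want a fresh start each time.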

In [17]:
# TODO: Write the results to a file called "dada.txt"

Congratulations -- we/you have a poem. Yer a poet. Does it make sense? Maybe a strange kind, but do poems _have_ to make sense?

### Another `random` strategy

Another poetic form that leverages randomness is the _cento_ (pronounced "chen-toe" by folks who want to look smart, but "sen-toe" by most of the rest of us). Typically, the form takes _multiple_ poems and samples full lines from each of them. I've given us four:

* Elizabeth Bishop, "One Art"
  * `poems/bishop_one_art.txt`
* Robert Hayden, "Those Winter Sundays"
  * `poems/hayden_those_winter.txt`
* Marianne Moore, "Poetry"
  * `poems/moore_poetry.txt`
* William Carlos Williams, "Landscape With the Fall of Icarus"
  * `poems/williams_landscape_icarus.txt`

We'll create one of these _cento_ businesses from them.

In [51]:
def read_file(filename):
    # `with` closes the file for us once the text has been read
    with open(f"poems/{filename}", "r") as file:
        return file.read()
    
files = [
    "bishop_one_art.txt",
    "hayden_those_winter.txt",
    "moore_poetry.txt",
    "williams_landscape_icarus.txt"
]

# TODO: Follow along with the professor's example to create
#       and write a file called "cento.txt"
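
For a sense of the shape this might take, here is a rough sketch of line sampling. It is not the professor's example: `make_cento`, `split_lines`, and the `seed` parameter are invented names, and the `placeholder_poems` are made-up stand-ins so the block runs without the `poems/` folder. In the worksheet you would pass in the text returned by `read_file` instead.

```python
import random

# Made-up stand-ins for the four poem files
placeholder_poems = [
    "first line of poem one\nsecond line of poem one\n\nthird line of poem one",
    "first line of poem two\nsecond line of poem two",
    "first line of poem three\nsecond line of poem three\nthird line of poem three",
    "first line of poem four\nsecond line of poem four",
]

def split_lines(poem_text):
    """Return the poem's non-blank lines with surrounding whitespace trimmed."""
    return [line.strip() for line in poem_text.splitlines() if line.strip()]

def make_cento(poems, line_count=8, seed=None):
    """Take turns drawing one random full line from each source poem."""
    rng = random.Random(seed)
    sources = [split_lines(text) for text in poems]
    cento = []
    for i in range(line_count):
        source = sources[i % len(sources)]  # cycle through the poems in order
        cento.append(rng.choice(source))
    return cento

cento = make_cento(placeholder_poems, line_count=6, seed=4)
print("\n".join(cento))
```

Cycling through the poems in order guarantees every source shows up; picking a random poem for each line would be just as legitimate a cento, only less evenly mixed.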

## Some vocabulary

When we generate text, we usually refer to the source (or the body of text from which we're generating) as a _corpus_, and any kind of generation or sampling we do as coming from a _model_. In the above examples our corpi (corpuses? porpoises?) are relatively simple and the models the same (in fact, in the above, `corpus == model`, essentially).

However, that's not always the case.

### A philosophical detour into statistics

The problem of "generating text _like_ something" is not an altogether unworthy task. Doing it requires warping how we think about a given author's or style's work a bit. Instead of the product of artistic genius, what if we were to think about a writer's style as nothing more than a set of likelihoods? For example: if we are _in fact_ reading Shakespeare -- like, why -- and a word begins with "f" _and_ is 3 letters long, it's more than likely "fie" -- as in "o, fie on't".

Curiously enough, we have a kind of statistical body right here, right now: the corpus of our poem texts. Can we write _new_ lines from it?

Yes, yes we can -- using something called a "Markov chain," which is just a fancy statistical term for a kind of frequency analysis: count which words tend to follow which, then pick each next word according to those counts. We're not going to dwell on the _why_ or _what_. Let's focus on the _how_. That's more fun anyway.

Here, our _model_ and our texts will be a bit different than our examples above.

We're going to use a library called [markovify](https://github.com/jsvine/markovify) in order to save ourselves the hassle of making our own chain generator.
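
Before handing the job to the library, it may help to see the idea in miniature. The block below is a toy, hand-rolled chain that only looks one word back. It is a sketch for intuition, not how `markovify` is implemented: `build_chain`, `generate`, the `START`/`END` markers, and the sample `lines` are all invented for this illustration.

```python
import random
from collections import defaultdict

START, END = "__start__", "__end__"  # made-up markers for line boundaries

def build_chain(lines):
    """Map each word to the list of words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        if not words:
            continue
        chain[START].append(words[0])           # which words open a line
        for current, following in zip(words, words[1:]):
            chain[current].append(following)    # repeats act as frequencies
        chain[words[-1]].append(END)            # which words close a line
    return chain

def generate(chain, rng, max_words=12):
    """Walk the chain from START, picking each next word at random."""
    word = rng.choice(chain[START])
    words = []
    while word != END and len(words) < max_words:
        words.append(word)
        word = rng.choice(chain[word])
    return " ".join(words)

# Placeholder corpus (made up for this sketch)
lines = [
    "the river runs to the sea",
    "the sea keeps the salt",
    "salt keeps the river honest",
]

chain = build_chain(lines)
sentence = generate(chain, random.Random(7))
print(sentence)
```

Because a word that follows another more often appears more times in that word's list, `rng.choice` picks it more often. That list of followers is the "set of likelihoods" in action.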

In [88]:
# A handy Markov chain generator, so we don't
# have to do the work.
import markovify

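# Build the corpus: all four poems, one poem line per line of text.
# (Assumption: `text` comes from read_file and files in the cento cell above.)
text = "\n".join(read_file(name) for name in files)

# Create a model based on the text
model = markovify.NewlineText(text, state_size=1)

# Try to generate 10 lines of at most 60 characters each;
# make_short_sentence returns None when it can't come up with one
for _ in range(10):
    print(model.make_short_sentence(60))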

sweating in the meantime, if it valid
According to him,
that there was spring
to become so many things are
insolence and put his field
useful; when Icarus fell
next-to-last, of something to master
with a tireless
so many things that melted
I miss them, but because


None of these lines actually exist in the source texts, though the sources aren't too hard to identify. That's because our sample size is actually a bit _small_. Let's increase it with, say... [145 poems](f1_week-2-worksheet-sonnets.md).