# "Chimère" project

This project is inspired by the constrained-writing techniques of OuLiPo. More precisely, it is a "quick and dirty" version of the "Chimère" game (http://oulipo.net/fr/contraintes/chimere). The game takes a "mold", i.e. a text whose nouns, adjectives and verbs have been removed while their part-of-speech (POS) tags are kept in place, and "fills" it with nouns, adjectives and verbs from other texts, in the order in which they appear.
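
As a toy illustration of the principle, the sketch below fills a hand-tagged French sentence with words taken from small filler lists. The sentence, its tags, the filler words and the helper `toy_chimera` are all made up for this example, and no tagger is involved.

```python
# Toy illustration with hand-written (word, tag) pairs; no tagger is involved.
# Tags are illustrative: 'NC' noun, 'ADJ' adjective, 'V' verb, 'DET' determiner.
mold_text = [("Le", "DET"), ("bison", "NC"), ("mange", "V"),
             ("une", "DET"), ("herbe", "NC"), ("verte", "ADJ")]

# Hypothetical filler words, listed in the order they would appear in a source text.
nouns = ["nuage", "rivière"]
verbs = ["traverse"]
adjectives = ["pourpre"]


def toy_chimera(tagged, fillers_by_tag):
    """Replace each tagged slot with the next unused filler for that tag."""
    iterators = {tag: iter(words) for tag, words in fillers_by_tag.items()}
    result = []
    for word, tag in tagged:
        if tag in iterators:
            # When the fillers run out, the slot keeps its POS tag.
            result.append(next(iterators[tag], tag))
        else:
            result.append(word)
    return " ".join(result)


chimera = toy_chimera(mold_text, {"NC": nouns, "V": verbs, "ADJ": adjectives})
print(chimera)  # Le nuage traverse une rivière pourpre
```

The grammatical skeleton of the mold survives, while its content words come from elsewhere.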

## Fetching data

To make the result as strange as possible, we chose texts that differ sharply from one another. The mold is the Wikipedia article on "Bison", which is serious and objective in tone. The other source texts come from French literature: poetry (Mallarmé), romanticism (Hugo) and realism (Maupassant).

In [135]:
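import wikipedia

# Mold text: the beginning of the French Wikipedia article on "Bison".
wikipedia.set_lang('fr')
T = wikipedia.page('Bison').content[:983]

# Filler texts: one excerpt per local file. The context managers close
# each file once it has been read.
with open('mallarme.txt') as f:
    S = f.read()[3375:5600]    # source of nouns

with open('hugo.txt') as f:
    A = f.read()[13850:16984]  # source of adjectives

with open('maupassant.txt') as f:
    V = f.read()[8010:11944]   # source of verbs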

## POS tagging

Here we tag the texts to locate the parts of speech of interest: nouns, adjectives and verbs. In the mold, we replace those words with their POS tags, then fill the resulting slots with the corresponding "filler words" from the other texts.

In [139]:
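import os

from nltk.tag import StanfordPOSTagger

# Paths to the Stanford POS tagger jar and its French model.
jar = 'stanford-postagger-full-2018-02-27/stanford-postagger-3.9.1.jar'
model = 'stanford-postagger-full-2018-02-27/models/french.tagger'

# NLTK looks up the Java binary through the JAVAHOME environment variable.
java_path = "/usr/java/jdk1.8.0_172/bin/java.exe"
os.environ['JAVAHOME'] = java_path

pos_tagger = StanfordPOSTagger(model, jar, encoding='utf8')
# Punctuation marks that str.split() may leave attached to the end of a word.
punctuation = ['.', '?', '!', ';', ':']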

def create_mold(T):
    """Replace every noun, adjective and verb of T with its POS tag."""
    T_parsed = pos_tagger.tag(T.split())
    pos_to_remove = ['ADJ', 'N', 'NC', 'V']
    mold = []
    for word, pos in T_parsed:
        if pos in pos_to_remove:
            mold.append(pos)
            # Keep the punctuation mark that was attached to the removed word.
            if word[-1] in punctuation:
                mold.append(word[-1])
        else:
            mold.append(word)
    return mold

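def extract_fillers(S, pos_to_extract):
    """Return the words of S tagged with one of pos_to_extract, reversed."""
    S_parsed = pos_tagger.tag(S.split())
    fillers = []
    for word, pos in S_parsed:
        if pos in pos_to_extract:
            # Strip a punctuation mark attached to the end of the word.
            if word[-1] in punctuation:
                word = word[:-1]
            fillers.append(word)
    # Reversed so that fill_mold can pop() the words in source order.
    fillers.reverse()
    return fillers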

def fill_mold(mold, fillers, pos_to_fill):
    """Fill, in place and in order, the slots tagged with one of pos_to_fill."""
    # Scan the whole mold, including its first and last positions.
    for i, elem in enumerate(mold):
        if not fillers:
            break  # no words left: the remaining slots keep their POS tag
        if elem in pos_to_fill:
            # fillers is reversed, so pop() yields the words in source order.
            mold[i] = fillers.pop()
    return mold

## Testing

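The algorithm works, but it does not handle grammatical agreement between the inserted words and their context (subject-verb, determiner-noun or noun-adjective agreement). The output contains mismatches such as "Les ciel" and "le lambeaux". A "harmonisation" step should be added for the result to read well.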

In [138]:
mold = create_mold(T)

fillers_S = extract_fillers(S, ['N', 'NC'])
fillers_A = extract_fillers(A, ['ADJ'])
fillers_V = extract_fillers(V, ['V'])

mold = fill_mold(mold, fillers_A, ['ADJ'])
mold = fill_mold(mold, fillers_S, ['N', 'NC'])
mold = fill_mold(mold, fillers_V, ['V'])

print(' '.join(mold))

Les ciel (Bison Smith, crus versait un monde de toutes décrépitude, grande dont il pris deux nuages triple : le lambeaux d'Europe (Bison bonasus) et le pourpre 6 du Nord (Bison bison) qui tournai elle-même divisée en deux couchants : le rivière des l'horizon (Bison rayons athabascae) et le d'eau des arbres notable (Bison feuillage d'écoliers . Le poussière des temps, mettaient essentiellement dans les chemins), monsieur tandis que le maison des toile et le choses d'Europe épais, des réverbère belle . Les crépuscule entre les fréquente visages vivant actuellement ne saisit pas totalement quinzième . Elles hâtai sans foule, très proches, maladie d'être des chamarrée péché . Il sembla que la siècles, des dernière soit la flamands que celle des complices non grand . Il marchait fruits des terre bonne vivant en silence dans le Caucase cohue depuis les yeux soleil . Il toucher donc considérer Bison l'eau, et Bison bonasus comme deux désespoir et non comme deux cri, flamands, .
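
As a possible first step toward the "harmonisation" mentioned above, the sketch below only detects determiner/noun number mismatches; it does not repair them. It rests on a deliberately naive assumption, namely that a French word is plural when it ends in "s", "x" or "z", which is wrong for words such as "temps" or "bras". The determiner lists and the helpers `looks_plural` and `find_number_mismatches` are hypothetical names introduced for this example.

```python
# Naive heuristic, for illustration only: many French words break these rules.
PLURAL_DETERMINERS = {"les", "des", "ces", "mes", "tes", "ses"}
SINGULAR_DETERMINERS = {"le", "la", "un", "une", "ce", "cette"}


def looks_plural(word):
    """Guess plurality from the final letter (assumption: s, x or z means plural)."""
    return word.lower().endswith(("s", "x", "z"))


def find_number_mismatches(tokens):
    """Return (determiner, next_word) pairs whose apparent number differs."""
    mismatches = []
    for det, word in zip(tokens, tokens[1:]):
        d = det.lower()
        if d in PLURAL_DETERMINERS and not looks_plural(word):
            mismatches.append((det, word))
        elif d in SINGULAR_DETERMINERS and looks_plural(word):
            mismatches.append((det, word))
    return mismatches


# A fragment assembled from two mismatches visible in the output above.
sample = "Les ciel et le lambeaux d'Europe".split()
mismatches = find_number_mismatches(sample)
print(mismatches)  # [('Les', 'ciel'), ('le', 'lambeaux')]
```

A real harmonisation step would probably need morphological information (gender, number, conjugation) rather than suffix guessing, and would have to choose between adjusting the filler word and adjusting the surrounding mold.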
