
Randomly generate a Monty Python script, using statistics of the original script


brouberol/Generate-Monty-Pyhon-Dialog


Monty Python Text Generator

This project was part of my Data Compression class. The main idea is to use Markov chains of different orders to analyse a text, estimate its symbol probabilities, and then generate a similar (but quite different!) text.

What does it do?

The first part consists of analysing the "Monty Python and the Holy Grail" script using Markov chains of order 10. Briefly, a Markov chain of order k is a mathematical model in which the probability of an event depends only on the k previous events, which makes it well suited to text modelling (and I believe that was Markov's original goal). For instance, the character 'e' is very likely when the past is 'th'. The whole question is how long this past should be.

The scripts py/probabilities.py and py/probabilities_multi.py analyse the text (for each Markov order in the list k_list) and build matrices of empirical transition probabilities between a k-tuple (the past) and a symbol.
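As an illustration of what such a transition table holds, here is a minimal sketch for a single order k. It is not the code from py/probabilities.py: the function name `transition_probabilities`, the sliding-window counting and the toy text are assumptions made for this example only.

```python
from collections import Counter, defaultdict


def transition_probabilities(text, k):
    """Map each k-character past to {next_symbol: empirical probability}.

    Hypothetical helper written for this example, not taken from the repo.
    """
    counts = defaultdict(Counter)
    # Slide a window of k characters over the text and record what follows it.
    for i in range(len(text) - k):
        past = text[i:i + k]
        symbol = text[i + k]
        counts[past][symbol] += 1
    # Normalise each row of counts so that its probabilities sum to 1.
    table = {}
    for past, followers in counts.items():
        total = sum(followers.values())
        table[past] = {s: n / total for s, n in followers.items()}
    return table


table = transition_probabilities("the then them", 2)
print(table["th"])  # in this toy text, 'th' is always followed by 'e'
print(table["he"])  # 'he' is followed by ' ', 'n' or 'm', once each
```

A larger k makes each row sparser: most pasts of length 10 occur only a handful of times, which is why the generated text starts to reproduce longer stretches of the original.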

These two scripts perform EXACTLY the same operations and give exactly the same results. However, py/probabilities_multi.py uses the Python multiprocessing library to decrease execution time. If you have a multi-core machine, that's the script you want to use.

The script py/random_texts.py will use these matrices to generate symbols based on the previous ones.
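The generation step can be sketched as follows. This is a simplified stand-in rather than the repository's script: the names `follower_counts` and `generate`, the choice to seed the output with the first k characters of the source, and the `seed` parameter are all assumptions for illustration.

```python
import random
from collections import Counter, defaultdict


def follower_counts(text, k):
    """Count which symbols follow each k-character past (hypothetical helper)."""
    counts = defaultdict(Counter)
    for i in range(len(text) - k):
        counts[text[i:i + k]][text[i + k]] += 1
    return counts


def generate(text, k, output_size, seed=None):
    """Generate up to output_size characters from an order-k model of text."""
    rng = random.Random(seed)
    counts = follower_counts(text, k)
    out = text[:k]  # assumption: start from the first k characters of the source
    while len(out) < output_size:
        followers = counts.get(out[-k:])
        if not followers:
            # This past was only seen at the very end of the text: stop early.
            break
        symbols = list(followers)
        weights = list(followers.values())
        # Draw the next symbol in proportion to how often it followed this past.
        out += rng.choices(symbols, weights=weights)[0]
    return out[:output_size]


# In this toy text every past has a single possible follower,
# so the output is deterministic.
print(generate("abcabcabc", 2, 8))
```

Sampling directly from the raw counts is equivalent to sampling from the normalised probabilities, so the sketch skips the normalisation step.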

How to use them?

$ python probabilities.py ../data/data.txt
$ python probabilities_multi.py ../data/data.txt
$ python random_text.py ../data/data.txt [output_size]

Example

For k = 5:

KING ARTHUR: Yes!
VILLAGER #3: A bit.
VILLAGER #1: You saw saw saw it, did you could
separate, and master that!
ARTHUR: Will you on Thursday.
CUSTOMER: What do you can you think kill your every
good people. It’s one.)
OTHER FRENCH GUARDS: [whisperin

For k = 10:

KING ARTHUR: Will you ask your master that Arthur from the behind you looked–
DENNIS: Oh, what a give-away. Did you hear that, eh?
By exploiting the workers! By ’anging on to outdated imperialist dogma which perpetuates the economic and social differences

When the context is large enough (k=10), the sentences begin to make sense (within the limits of randomness, and of the fact that THIS IS MONTY FREAKING PYTHON!)

Why is the first script so slow?

~95% of the execution time is spent in the built-in str.count() method (cf. the entropia.prof file). Since this method is already written in C and optimized, techniques such as compiling with Cython or Shedskin do not help.

However, a parallel/multiprocessing approach applies very easily (script py/probabilities_multi.py). It led to a nice x2.8 speedup on my quad-core laptop.
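One way such a parallelisation could look is to hand each Markov order to a separate worker process. The sketch below assumes that split (one task per entry of k_list). The README does not say how py/probabilities_multi.py actually divides the work, and the function names here are hypothetical.

```python
from collections import Counter
from multiprocessing import Pool


def count_contexts(args):
    """Count every (k+1)-character window of the text for one order k.

    Takes a single (text, k) tuple so it can be passed to Pool.map.
    """
    text, k = args
    return k, Counter(text[i:i + k + 1] for i in range(len(text) - k))


def count_all_orders(text, k_list, processes=None):
    """Run one counting task per Markov order, spread over worker processes.

    Assumption: orders are independent, so each one can be a separate task.
    """
    with Pool(processes) as pool:
        return dict(pool.map(count_contexts, [(text, k) for k in k_list]))
```

Calling `count_all_orders(text, [1, 2, 5, 10])` returns a dictionary keyed by order. On platforms that spawn rather than fork worker processes, the call should sit under an `if __name__ == "__main__":` guard. With one task per order, the speedup is bounded by the slowest order and by the number of cores, which is consistent with a gain somewhat below x4 on four cores.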
