GitHub - Jakubus/Random-sentences: Generates pseudorandom sentences based on a sample of text.

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.gitattributes		.gitattributes
Random sentence.cpp		Random sentence.cpp
Random sentence.exe		Random sentence.exe
markov_dict.cpp		markov_dict.cpp
markov_dict.h		markov_dict.h
readme.txt		readme.txt
sample1_miserables.txt		sample1_miserables.txt
sample2_potop-t1.txt		sample2_potop-t1.txt
stdafx.cpp		stdafx.cpp
stdafx.h		stdafx.h
targetver.h		targetver.h
word_extended.cpp		word_extended.cpp
word_extended.h		word_extended.h

Repository files navigation

The point of the project is to generate pseudorandom sentences using Markov chains. The program first loads a sample file (e.g. a book) and generates a form of a dictionary. Then it generates the sentence using that dictionary. 

The idea is simple: for given two consecutive words, it choose a random successor, with probability proportional to the number of corresponding triplets in the source. 
E.g. if we have one "I am stupid" and three "I am hungry" and no other "I am" in a source file, then in a generated sentence after the words "I am" there is:
Pr(stupid)=1/4, 
Pr(hungry)=3/4.

In general, Pr(C|(A,B))  = #(A,B,C) / #(A,B).

As an example, I add two sample files, from which one can generate a dictionary: 
- V.Hugo, Les Misérables (from gutenberg.org) - in English,
- H.Sienkiewicz, Potop (en. The Deluge), Vol.I (from wolnelektury.pl) - in Polish.
One can also upload any (long enough) text file, with UTF-8 encoding. Regard that current speed of loading a file is about 80 kB/s. If someone has any idea to improve this result, please let mi know.