# Task 2

## Original exercise number

Exercise 13-8

## Description

### Markov analysis:

1. Write a program to read a text from a file and perform Markov analysis. The result should be a dictionary that maps from prefixes to a collection of possible suffixes. The collection might be an array, tuple, or dictionary; it is up to you to make an appropriate choice. You can test your program with prefix length two, but you should write the program in a way that makes it easy to try other lengths.

2. Add a function to the previous program to generate random text based on the Markov analysis. Here is an example from Emma with prefix length 2:

“He was very clever, be it sweetness or be angry, ashamed or only amused, at such a stroke. She had never thought of Hannah till you were never meant for me?" "I cannot make speeches, Emma:" he soon cut it all himself.”

For this example, I left the punctuation attached to the words. The result is almost syntactically correct, but not quite. Semantically, it almost makes sense, but not quite.

What happens if you increase the prefix length? Does the random text make more sense?

3. Once your program is working, you might want to try a mash-up: if you combine text from two or more books, the random text you generate will blend the vocabulary and phrases from the sources in interesting ways.

Credit: This case study is based on an example from Kernighan and Pike, The Practice of Programming, Addison-Wesley, 1999.

### TIP

You should attempt this exercise before you go on.

## My Notes

I think I will skip point 3 (the mash-up).

## Solution

NO GUARANTEE THAT THE SOLUTION WILL WORK OR WORKS CORRECTLY! USE IT AT
YOUR OWN RISK!

### Functions

In [1]:
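function getWords(file_path::String)::Vector{String}
    words::Vector{String} = []
    open(file_path) do file
        for line in eachline(file)
            for word in split(line)
                # lowercase the word and strip common punctuation
                cleaned = replace(lowercase(word), r"[.,;!?-]" => "")
                # skip tokens that consisted only of punctuation, e.g. "--"
                isempty(cleaned) || push!(words, cleaned)
            end
        end
    end
    return words
end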

getWords (generic function with 1 method)

In [2]:
function getPrefixSuffixesDict(words::Vector{String}, prefixLength::Int = 2)::Dict{String, Set{String}}
    prefix::Vector{String} = [] # sliding window of the last prefixLength words
    pref::String = "" # elements of prefix joined to 1 string
    prefixSuffixes::Dict{String, Set{String}} = Dict()
    for word in words
        if length(prefix) < prefixLength
            push!(prefix, word) # still filling the first prefix
        else
            pref = join(prefix, " ")
            # get! inserts an empty Set the first time a prefix is seen
            push!(get!(prefixSuffixes, pref, Set{String}()), word)
            popfirst!(prefix) # slide the window by one word
            push!(prefix, word)
        end
    end
    return prefixSuffixes
end

getPrefixSuffixesDict (generic function with 2 methods)
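
The function above stores the suffixes in a `Set`, so every distinct suffix of a prefix is equally likely to be drawn, no matter how often it followed that prefix in the text. As a minimal sketch of an alternative, the hypothetical helper `buildChainWithCounts` below (not part of the original solution) keeps duplicates in a `Vector`, so that `rand` picks frequent suffixes more often.

```julia
# Hypothetical variant, for illustration only: keep duplicate suffixes in a
# Vector so that frequent suffixes are drawn more often by rand().
function buildChainWithCounts(words::Vector{String}, prefixLength::Int = 2)::Dict{String, Vector{String}}
    chain = Dict{String, Vector{String}}()
    for i in 1:(length(words) - prefixLength)
        prefix = join(words[i:(i + prefixLength - 1)], " ")
        # get! creates the empty vector the first time a prefix is seen
        push!(get!(chain, prefix, String[]), words[i + prefixLength])
    end
    return chain
end

toy = String.(split("a b c a b d a b c"))
chain = buildChainWithCounts(toy, 2)
println(chain["a b"])  # "c" is stored twice, "d" once
```

Whether the weighting matters depends on the goal: the `Set` version is smaller in memory, while the `Vector` version follows the word frequencies of the source more closely.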

Given this mapping, you can generate a random text by starting with any prefix and choosing at random from the possible suffixes. Next, you can combine the end of the prefix and the new suffix to form the next prefix, and repeat.
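
The prefix shift described above can be traced on a tiny hand-written mapping. This is only a sketch: the helper `walkChain` and the toy dictionary are made up for illustration, and every prefix has exactly one suffix so that the walk is deterministic and ends when a prefix has no entry.

```julia
# Hypothetical helper, for illustration only: follow the chain until the
# current prefix has no recorded suffix. The toy mapping below has no cycles,
# so the loop is guaranteed to end.
function walkChain(chain::Dict{String, Vector{String}}, start::Vector{String})::Vector{String}
    prefix = copy(start)
    output = copy(start)
    while haskey(chain, join(prefix, " "))
        suffix = rand(chain[join(prefix, " ")])  # pick one possible suffix
        push!(output, suffix)
        prefix = [prefix[2:end]; suffix]         # drop first word, append suffix
    end
    return output
end

# Hand-written mapping with prefix length 2.
chain = Dict(
    "the cat" => ["sat"],
    "cat sat" => ["on"],
    "sat on"  => ["the"],
    "on the"  => ["mat"],
)
println(join(walkChain(chain, ["the", "cat"]), " "))  # the cat sat on the mat
```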

In [3]:
function getRandWords(prefixSuffixes::Dict{String, Set{String}}, howManyPrefixes::Int)::Vector{String}
    @assert (howManyPrefixes >= 1) "howManyPrefixes must be an integer greater than 0"
    @assert !isempty(prefixSuffixes) "prefixSuffixes must not be empty"
    ks::Vector{String} = collect(keys(prefixSuffixes))
    prefixLength::Int = length(split(ks[1], " ")) # one or more words split by space
    prefix::String = rand(ks) # random starting prefix
    result::Vector{String} = split(prefix, " ")
    for _ in 1:howManyPrefixes
        # the final prefix of the text may have no recorded suffix
        haskey(prefixSuffixes, prefix) || break
        push!(result, rand(prefixSuffixes[prefix]))
        prefix = join(last(result, prefixLength), " ")
    end
    return result
end

getRandWords (generic function with 1 method)

## Testing

In [4]:
words = getWords("./emma.txt")
prefixSuffixes = getPrefixSuffixesDict(words, 2);
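
The exercise asks what happens when the prefix length increases. One rough way to look at it, sketched here on a made-up toy sentence rather than on `emma.txt`, is to count how many distinct suffixes each prefix has on average. The helper `suffixSets` is hypothetical and only mirrors the idea of the dictionary built above. The fewer choices each prefix has, the more the generated text copies the source.

```julia
# Hypothetical helper: map each prefix (words joined by spaces) to its set of suffixes.
function suffixSets(words::Vector{String}, n::Int)::Dict{String, Set{String}}
    d = Dict{String, Set{String}}()
    for i in 1:(length(words) - n)
        push!(get!(d, join(words[i:(i + n - 1)], " "), Set{String}()), words[i + n])
    end
    return d
end

toy = String.(split("the cat sat on the mat and the cat ran to the door"))
for n in 1:3
    d = suffixSets(toy, n)
    # average number of distinct suffixes per prefix
    branching = sum(length, values(d)) / length(d)
    println("prefix length $n: $(length(d)) prefixes, $(round(branching, digits = 2)) suffixes per prefix")
end
```

On this toy input the average drops to exactly one suffix per prefix at length 3, which means the "random" text can only reproduce the original sentence.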

In [5]:
join(getRandWords(prefixSuffixes, 10), " ")

"yet ever seen mrs goddard emma could only take him wholly by"