# Inspirational Randomness: Creating Quotes with Markov Chains
#### Ann Arbor Scientific and Technical Computing, October 29th 2020
#### Presented by David Perner
<img src="img/agByQ.png"></img>

# What are Markov chains?

Markov chains are a way of modeling probabilistic (stochastic) processes. They do this by enumerating the possible states of the system and then defining the transitions between them, each with a certain probability of occurring. For instance, below is a Markov chain for what the weather might be like today.
<img src="img/XP9in.jpg"></img>
So if we check once a day and it's rainy now, there's about a 60% chance it'll be rainy tomorrow too, and a 20% chance it'll be sunny. And if it is sunny tomorrow, there's a 10% chance it'll be rainy the day after that, and so on.
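
As a rough sketch, the weather chain can be written in Julia as a transition matrix that is sampled one day at a time. Only the three probabilities quoted above (rainy to rainy 0.6, rainy to sunny 0.2, sunny to rainy 0.1) come from the diagram. The `cloudy` state, all remaining matrix entries, and the names `states`, `P`, `next_state`, and `simulate` are assumptions made up for illustration.

```julia
using Random

# States of the chain. "cloudy" is an assumed third state.
states = ["sunny", "cloudy", "rainy"]

# Row i holds the probabilities of moving from states[i] to each state.
# Only rainy->rainy (0.6), rainy->sunny (0.2) and sunny->rainy (0.1) are
# taken from the text; every other entry is a placeholder.
P = [0.7 0.2 0.1;   # from sunny
     0.3 0.4 0.3;   # from cloudy
     0.2 0.2 0.6]   # from rainy

# Pick the next state index by walking the cumulative probabilities of row i.
function next_state(rng, P, i)
    r = rand(rng)
    acc = 0.0
    for j in axes(P, 2)
        acc += P[i, j]
        r < acc && return j
    end
    return size(P, 2)  # guard against floating-point round-off
end

# Simulate `days` transitions starting from state index `start`.
function simulate(rng, P, start, days)
    path = [start]
    for _ in 1:days
        push!(path, next_state(rng, P, path[end]))
    end
    return path
end

rng = MersenneTwister(2020)
path = simulate(rng, P, 3, 7)   # start on a rainy day, look a week ahead
println(join(states[path], " -> "))
```

Each row of `P` has to sum to 1, since the system must end up in some state on the next day.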

It's also worth noting that Markov chains can be continuous as well as discrete, but that's beyond what we need to know for the moment.

# Where are they useful?

There are a number of circumstances where Markov chains find applications. Some of the questions they can help to answer include:

- The evolution of a chemical reaction, as different chemical species convert into one another over time
- The odds of a family becoming rich or impoverished
- Whether a cell tower will be overloaded, based on the odds of people entering or leaving the area

<img src="img/MarkovChain1.png" style="height:300px"></img>

While it's not a focus here, Markov chains can also be run thousands of times (a Monte Carlo simulation) to estimate the overall odds of complex outcomes, rather than just single state transitions.
<img src="img/ChutesAndLadders-sim.gif"></img>
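
A minimal sketch of the Monte Carlo idea, using a three-state weather chain as the example: run the chain many times and count how often it ends in a given state. The matrix entries are placeholders (only the 0.6, 0.2, and 0.1 values mentioned for the weather diagram are from the talk), and `advance` and `estimate` are hypothetical helper names.

```julia
using Random
using LinearAlgebra  # for the exact matrix-power comparison

# Assumed transition matrix, state order: sunny, cloudy, rainy.
P = [0.7 0.2 0.1;
     0.3 0.4 0.3;
     0.2 0.2 0.6]

# Sample the next state index from row i of P.
function advance(rng, P, i)
    r = rand(rng)
    acc = 0.0
    for j in axes(P, 2)
        acc += P[i, j]
        r < acc && return j
    end
    return size(P, 2)
end

# Fraction of `runs` simulations that are in `target` after `days` steps.
function estimate(rng, P, start, target, days, runs)
    hits = 0
    for _ in 1:runs
        s = start
        for _ in 1:days
            s = advance(rng, P, s)
        end
        hits += (s == target)
    end
    return hits / runs
end

rng = MersenneTwister(29)
est = estimate(rng, P, 3, 3, 3, 100_000)  # rainy now, rainy in three days?
exact = (P^3)[3, 3]                        # same quantity from the matrix power
println("simulated: ", est, "   exact: ", exact)
```

For a chain this small the matrix power gives the answer directly. Simulation pays off when the outcome of interest is too complicated to read from a single matrix entry, as in the board game animation above.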

# How does this translate into generating text?

We can think about a sentence as a series of state transitions, from a previous word to the next. If we have enough text, we can quantify how likely certain words are to follow others and construct a Markov chain from that.

<img src="img/text-gen.png"></img>
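
To make the idea concrete before moving to the real dataset, here is a small sketch that counts word-to-word transitions in a made-up three-sentence corpus and converts the counts into probabilities. The corpus, the `"<start>"` and `"<stop>"` marker strings, and the helper `transition_probs` are all illustrative assumptions.

```julia
# A tiny, made-up corpus.
corpus = ["the cat sat", "the cat ran", "the dog sat"]

# counts[w][nw] = number of times `nw` directly followed `w`.
counts = Dict{String, Dict{String, Int}}()
for sentence in corpus
    # Add explicit markers so sentence beginnings and endings are states too.
    words = ["<start>"; String.(split(sentence)); "<stop>"]
    for (w, nw) in zip(words[1:end-1], words[2:end])
        followers = get!(counts, w, Dict{String, Int}())
        followers[nw] = get(followers, nw, 0) + 1
    end
end

# Turn the raw counts for one word into transition probabilities.
function transition_probs(counts, w)
    total = sum(values(counts[w]))
    return Dict(nw => c / total for (nw, c) in counts[w])
end

println(transition_probs(counts, "the"))
println(transition_probs(counts, "cat"))
```

Here "the" is followed by "cat" twice and "dog" once, so a generated sentence that reaches "the" would continue with "cat" about two times out of three.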

For this presentation, we'll be using this [quote dataset](https://www.kaggle.com/manann/quotes-500k) from Kaggle. It's not perfect, in that there are occasional typos and not all quotes are in English, but it's more than enough to play around with.

In [1]:
using CSV
using StatsBase

In [2]:
# If the file isn't found, you will have to download it yourself. Details are in the README in the resources folder
quotes_csv = CSV.File("resources/quotes.csv")

499709-element CSV.File{false}:
 CSV.Row: (quote = "I'm selfish, impatient and a little insecure. I make mistakes, I am out of control and at times hard to handle. But if you can't handle me at my worst, then you sure as hell don't deserve me at my best.", author = "Marilyn Monroe", category = "attributed-no-source, best, life, love, mistakes, out-of-control, truth, worst")
 CSV.Row: (quote = "You've gotta dance like there's nobody watching,Love like you'll never be hurt,Sing like there's nobody listening,And live like it's heaven on earth.", author = "William W. Purkey", category = "dance, heaven, hurt, inspirational, life, love, sing")
 CSV.Row: (quote = "You know you're in love when you can't fall asleep because reality is finally better than your dreams.", author = "Dr. Seuss", category = "attributed-no-source, dreams, love, reality, sleep")
 CSV.Row: (quote = "A friend is someone who knows all about you and still loves you.", author = "Elbert Hubbard", category = "friend, friendsh

In [3]:
key_types = Union{Symbol, AbstractString}

next_word = Dict{key_types, Dict{key_types, Int}}()

# Record one occurrence of `nw` following `w`, creating entries as needed
function add_word!(d, w, nw)
    followers = get!(() -> Dict{key_types, Int}(), d, w)
    followers[nw] = get(followers, nw, 0) + 1
    return d
end

for r in quotes_csv
    ismissing(r.quote) && continue
    txt = split(r.quote)
    isempty(txt) && continue  # skip blank quotes; first(txt) would throw on them
    for i = 0:length(txt)
        if i==0
            word = first(txt)
            add_word!(next_word, :start, word)
        elseif i==length(txt)
            word = last(txt)
            add_word!(next_word, word, :stop)
        else
            word = txt[i]
            add_word!(next_word, word, txt[i+1])
        end
    end
end

In [4]:
next_word[:start]

Dict{Union{AbstractString, Symbol},Int64} with 28879 entries:
  "Further,"       => 6
  "Embrace,"       => 1
  "Ruri:"          => 1
  "P33-"           => 2
  "Black(people)"  => 1
  "ilet"           => 1
  "Surprises,"     => 1
  "Silken"         => 1
  "Weekends"       => 5
  "NAFTA"          => 1
  "Roads"          => 6
  "TAKE"           => 2
  "STAY"           => 1
  "Secure"         => 4
  "...glass"       => 1
  "Foresight"      => 1
  "Impersonating"  => 1
  "Horrle"         => 1
  "Kavaata"        => 1
  "Martyrdom:"     => 1
  "Dissimulation," => 1
  "Audacious"      => 1
  "Erik,"          => 1
  "Elisha,'"       => 1
  "Baby,"          => 17
  ⋮                => ⋮

In [5]:
import StatsBase.sample

function sample(wd::Dict{key_types, Dict{key_types, Int}}, w)
    fw = FrequencyWeights(collect(values(wd[w])))  # follower counts become sampling weights
    nws = collect(keys(wd[w]))
    sample(nws, fw)
end

function create_quote(nw)
    nq = []
    push!(nq, sample(nw, :start))
    while last(nq) != :stop
        push!(nq, sample(nw, last(nq)))
    end
    join(nq[1:end-1], " ")
end

create_quote (generic function with 1 method)

In [6]:
for x in 1:10
    println(string(x) * ". " * create_quote(next_word))
end

1. Someone of vocabularies, and forget it was work of time ago.
2. He laughed up early there is hard he cannot be to live the kind that they were beyond the civilized history. It's just our life has done Theology aside, or the mob, between us what's left. “Um.” I was in life here, you want to impress her, she’d received one, because you create, you want such an irrelevance of your tutu-wearing, ballet-dancing, strut-walking pal se miró las cuales no one is out the circumstances a separate lives within ourselves and homeless and interesting.
3. Patrice had a farm, though their violence for our fears swirling clouds across it would pass truths that you’re doing this fire it is all that time... But in the Outer circumstances blindfold me make one.
4. The truth is also be silent. For us to save the end, but a tradition."She laughed. “Berretta doesn’t mean that we continue to be gathered together at him, his or she passed at a foundation of the logical one', the ring to his demand fidelity 

In [7]:
word_after_next = Dict{key_types, Dict{key_types, Int}}()

for r in quotes_csv
    ismissing(r.quote) && continue
    txt = split(r.quote)
    length(txt)<=1 && continue
    for i = 0:length(txt)-1
        if i==0
            word = txt[2]
            add_word!(word_after_next, :start, word)
        elseif i==length(txt)-1
            word = txt[end-1]
            add_word!(word_after_next, word, :stop)
        else
            word = txt[i]
            add_word!(word_after_next, word, txt[i+2])
        end
    end
end

In [8]:
word_after_next[:start]

Dict{Union{AbstractString, Symbol},Int64} with 34819 entries:
  "confined"        => 1
  "oblique"         => 1
  "phone."          => 1
  "illusionist"     => 1
  "Proves"          => 1
  "libertad"        => 1
  "\"spiritual\""   => 1
  "believe,'"       => 1
  "Sundays,"        => 2
  "dumber"          => 1
  "PEOPLES"         => 1
  "Secure"          => 1
  "dying?\""        => 1
  "Duckett"         => 1
  "Honestlyis"      => 1
  "however―because" => 1
  "Everdeen,"       => 1
  "cuss"            => 1
  "Victrola,"       => 1
  "rises"           => 14
  "outsideness,"    => 1
  "Baby,"           => 2
  "unwholesome"     => 1
  "contentious,"    => 1
  "um,"             => 2
  ⋮                 => ⋮

In [9]:
function sample(nw::Dict{key_types, Dict{key_types, Int}}, wan::Dict{key_types, Dict{key_types, Int}}, w, bw)    
    common_keys = intersect(keys(nw[w]), keys(wan[bw]))
    ck_array = collect(common_keys)
    
    nw_hist = map(x->nw[w][x], ck_array)
    wan_hist = map(x->wan[bw][x], ck_array)
    
    # Invert the word-after-next counts so the least likely transitions get the
    # most weight, which creates more interesting quotes
    wan_dist = sum(wan_hist) ./ wan_hist
    nw_dist = nw_hist ./ sum(nw_hist)

    comb = wan_dist .* nw_dist
    comb_dist = comb ./ sum(comb)
    sample(ck_array, ProbabilityWeights(comb_dist))
end

function create_quote(nw, wan)
    nq = []
    push!(nq, sample(nw, :start))
    push!(nq, sample(nw, wan, last(nq), :start))
    while last(nq) != :stop
        push!(nq, sample(nw, wan, last(nq), nq[end-1]))
    end
    join(nq[1:end-1], " ")
end

create_quote (generic function with 2 methods)
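
The weighting inside the two-dictionary `sample` method is easier to follow with small numbers. The sketch below repeats the same arithmetic on hypothetical counts for three candidate words. The words and counts are invented for illustration and do not come from the dataset.

```julia
# Hypothetical candidates that appear in both dictionaries.
candidates = ["love", "life", "time"]
nw_hist  = [6, 3, 1]   # times each candidate directly followed the last word
wan_hist = [4, 1, 1]   # times each candidate came two words after the word before that

nw_dist  = nw_hist ./ sum(nw_hist)     # frequent next words are favored
wan_dist = sum(wan_hist) ./ wan_hist   # rare two-step transitions are favored

# Multiply the two weights and normalize so they sum to 1.
comb = wan_dist .* nw_dist
comb_dist = comb ./ sum(comb)

for (c, p) in zip(candidates, comb_dist)
    println(rpad(c, 6), round(p, digits=3))
end
```

With these numbers "life" ends up as the most likely choice even though "love" is the more common next word, because "love" is penalized for being the usual word two steps after the earlier one.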

In [10]:
create_quote(next_word, word_after_next)

"I can't. Gwen.\" His hands over when does most after both locked into these elements known love, even though we've created for granted shames and what outrage which he let yourself entirely free way without reproducing madly still have answered back."

In [11]:
inspirational_nw = Dict{key_types, Dict{key_types, Int}}()

for r in quotes_csv
    ismissing(r.quote) && continue
    (ismissing(r.category) || !occursin("inspiration", r.category)) && continue
    txt = split(r.quote)
    isempty(txt) && continue  # skip blank quotes; first(txt) would throw on them
    for i = 0:length(txt)
        if i==0
            word = first(txt)
            add_word!(inspirational_nw, :start, word)
        elseif i==length(txt)
            word = last(txt)
            add_word!(inspirational_nw, word, :stop)
        else
            word = txt[i]
            add_word!(inspirational_nw, word, txt[i+1])
        end
    end
end

In [12]:
for x in 1:10
    println(string(x) * ". " * create_quote(inspirational_nw))
end

1. The problem oriented. 4. Always remembering all with dedication to look to keep our Milky Way. There should be treated you escaped. But when it no other than a funny how successful leaders.
2. You must be quiet homes—not elsewhere—I believe in. We must learn if you were encouraging me? Was it leads to deny the drive our minds are birthed a beautiful universe has to kill yourself? You are often excruciating…but the attraction is the blessings right instead of your thoughts of his tenacity of revelation. You can live our fault if they come out. Always show up in God's love you get into our defense is within the potential is our way we spent as progress is neither misery they bring you ache inside and what your second, and joy when he has an old soldier of distinction.
3. With love, life is serious questions and war, but it's not seeking help? Our lives and those conditions. As citizen must study of all the garden, her intoxicating that often. If there may be conscious, and hate. There

In [13]:
inspirational_wan = Dict{key_types, Dict{key_types, Int}}()

for r in quotes_csv
    ismissing(r.quote) && continue
    (ismissing(r.category) || !occursin("inspiration", r.category)) && continue
    txt = split(r.quote)
    length(txt)<=1 && continue
    for i = 0:length(txt)-1
        if i==0
            word = txt[2]
            add_word!(inspirational_wan, :start, word)
        elseif i==length(txt)-1
            word = txt[end-1]
            add_word!(inspirational_wan, word, :stop)
        else
            word = txt[i]
            add_word!(inspirational_wan, word, txt[i+2])
        end
    end
end

In [14]:
for x in 1:10
    println(string(x) * ". " * create_quote(inspirational_nw, inspirational_wan))
end

1. Just listen with it. Something that's the beauty today. I do something, something for a positive thoughts are the moment becomes real. Your body to your values. On his own thoughts. It's a part ways to destroy you; that you looking straight back with them remember everything might become the tools forge a path leads to nowhere. And when it's really dumb for your part we really very people will find myself back from any good. One day, work of faith. Put the whole Regiment rising and our brain. Your heart that away again. It is GOOD people make sense because that stand up, Stand on the pattern finally materialize in all elements from the weak, bright and that brave So a living things, it was kind enough to give anyone who say 'yes' and there because of having power, my life backwards as the Life doesn´t change you need is, there is what miracle if we let alone on my book, "Diggin' Elroy," because I could battle is the stars Somehow she knew them, feel or where reality that has any kin