# Info
This notebook works pretty differently from `txt_from_web.ipynb`. That notebook is meant to get a bunch of text files for you to analyze: in theory, you go through it, get what you want, and come back to it as needed for more text.

By contrast, `markov_modeling.ipynb` is more interactive, and you're intended to fill certain blocks with your own code to make use of the main object: marky the Markov Model Manager! The functions you can make use of are described in the code blocks, but I'll also have a block with examples of the functions being used, and how you might make use of a Markov Model Manager object.

The main idea is that marky will take the folder you set as `root` in the File Manager section, create any missing folders, and let you handle file management and markov model building all in one place! You can:

* import text files and turn them into markov models
* import markov models from json files
* create new models and then export them as json files
* combine models and export the result as a json file
* create formatted sentence outputs for any model so you can test models and fiddle with settings
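marky handles the folder creation for you, but if you're curious, the idea is roughly this. A minimal sketch, where `ensure_folders` is a hypothetical helper (not part of `_managers.py`) and the `txts/` and `jsons/` folder names come from the setup sections of this notebook:

```python
import os

def ensure_folders(root, subfolders=("txts/", "jsons/")):
    """Create root and each subfolder if they're missing, and return the full paths.
    exist_ok=True makes this safe to run again and again."""
    paths = []
    for sub in subfolders:
        path = os.path.join(root, sub)
        os.makedirs(path, exist_ok=True)  # also creates root itself if needed
        paths.append(path)
    return paths
```

Running it a second time changes nothing, which is why it's fine for a manager to call it every time it starts up.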

(Note: JSONs are used here because you can convert a markovify model to JSON, save it, and convert the JSON back into a model later. Making a model from JSON is way faster than remaking it from a string. Get more info on markovify here: https://github.com/jsvine/markovify)

There is another model-managing object, nern the Stage Manager, for managing text files that represent scripts. Right now, nern is separate from marky: if you have scripts, you can use nern exclusively (it has all of marky's functionality); otherwise, you can just use marky.

---
Note that nern requires scripts in the `txts/scripts/` folder to be formatted a certain way. Each line should be a line of dialogue, formatted as:

`Character: Dialogue.`

You can then use nern-specific functions to make strings/markovify models based on certain characters, groups of characters, or the script's dialogue with the character tags stripped out. Basically, nern leverages the formatting of a script to let you do more detailed, character-based model generation.
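To make the format concrete, here's a minimal sketch of how `Character: Dialogue.` lines can be split into a per-character dialogue map and into tag-free dialogue. `build_dialogue_map` and `dialogue_only` are hypothetical helpers, not nern's actual functions, and the sample script is made up:

```python
from collections import defaultdict

def build_dialogue_map(script_text):
    """Map each character name to a list of their lines, in script order.
    Lines without a 'Name: ' prefix (blank lines, stage directions) are skipped."""
    dialogue_map = defaultdict(list)
    for line in script_text.splitlines():
        # split on the first ": " only, so colons inside the dialogue survive
        character, sep, dialogue = line.partition(": ")
        if sep and dialogue.strip():
            dialogue_map[character.strip()].append(dialogue.strip())
    return dict(dialogue_map)

def dialogue_only(script_text):
    """Return the script's dialogue with the 'Name: ' tags removed, one line per line."""
    lines = []
    for line in script_text.splitlines():
        _, sep, dialogue = line.partition(": ")
        if sep and dialogue.strip():
            lines.append(dialogue.strip())
    return "\n".join(lines)

# made-up sample script
script = "Haruspex: The town is sick.\nBachelor: Then we need a cure.\n\nHaruspex: We do."
print(build_dialogue_map(script))
```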

# File Manager
Define and manage file/folder structure for importing and exporting text and json files. You'll want to set this up at the start and avoid making changes once you've built your marky and/or nern model.

In [3]:
# note the / at the end!! no matter how long the folder path, it needs to end in / or things break
root = "data/"
if not root.endswith("/"):  # safety net in case the trailing / gets forgotten
    root += "/"

# 0. Universal Imports & Definitions
Run the next block before using any of the following tools. It imports the two main classes, `markovManager` and `stageManager`, from `_managers.py` (which needs to sit next to this notebook, not inside the `root` data folder, so Python can find it), and it defines `POSifiedText`, a markovify text class with part-of-speech tagging. You're welcome to read `_managers.py`, but you don't need to! You'll mostly be making an instance of one or both managers, and this notebook explains how to call the various functions and what they do.

In [15]:
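# imports and markovify-level class definitions
import markovify
import spacy
from _managers import markovManager, stageManager

# POSifiedText needs a spaCy pipeline named `nlp`; "en_core_web_sm" is an assumption,
# use whichever English model you've installed (python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# defines class that extends markovify model by incorporating part-of-speech tagging
# note that using POSifiedText(string) over markovify.Text(string) will take a lot longer
class POSifiedText(markovify.Text):
    def word_split(self, sentence):
        # tag each token as "word::POS" so the chain tracks grammar as well as words
        return ["::".join((word.orth_, word.pos_)) for word in nlp(sentence)]

    def word_join(self, words):
        # strip the "::POS" tags when turning generated words back into a sentence
        sentence = " ".join(word.split("::")[0] for word in words)
        return sentence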

# 1. marky the Markov Model Manager
Manage text files, json files, and markovify models all in one place.

### 1.0 marky Setup
Run the next block first to set everything up.

Here you're creating an instance of `markovManager` given the root folder you defined in File Manager at the top of the notebook.

You CAN change the name, but just so you know the canonical name is `marky`.

If you want to import text files or JSONs right away, add them to the `txts/` and `jsons/` folders inside the folder you defined as root before running this block (marky creates either folder if it's missing). You can add stuff later, though.

In [16]:
marky = markovManager(root)

### 1.a marky Examples

What follows are examples and explainers of how to use marky.

#### Things marky Has
The main "things" marky has that you'll be using are three dictionaries. If you're unsure how to make use of dictionaries, I'd read through this resource: https://www.w3schools.com/python/python_dictionaries.asp

The dictionaries are:
1. `marky.txts`: maps text names (filenames without the extension) to their contents as strings
2. `marky.models`: maps model names to markovify model objects
3. `marky.jsons`: maps JSON names to the JSON versions of markovify models

In [17]:
print("txts in marky: ", marky.txts.keys())
print("models in marky: ", marky.models.keys())
print("jsons in marky: ", marky.jsons.keys())

print("\ntxt contents\n-------")
for key in marky.txts.keys():
    print(key+": ", marky.txts[key][0:100])

print("\nmodels\n-------")
for key in marky.models.keys():
    print(key+": ", marky.models[key])

txts in marky:  dict_keys(['mystery', 'scifi'])
models in marky:  dict_keys(['emma-scifi', 'emma', 'mystery', 'scifi'])
jsons in marky:  dict_keys(['emma-scifi', 'emma', 'mystery', 'scifi'])

txt contents
-------
mystery:  There were thirty-eight patients on the bus the morning I left for Hanover, most of them disturbed a
scifi:  Now that he knew himself to be self he was free to grok ever closer to his brothers, merge without l

models
-------
emma-scifi:  <markovify.text.Text object at 0x000001DE0AE22BE0>
emma:  <markovify.text.Text object at 0x000001DE0A950A60>
mystery:  <markovify.text.Text object at 0x000001DE0C603430>
scifi:  <markovify.text.Text object at 0x000001DE0B49D0D0>


In [18]:
for key in marky.models.keys():
    print(key+": ", marky.output(key))

emma-scifi:  Gradually, he realized that the ritual learning dance of the thin atmosphere contrasts were sharp between light and gay and clear-headed, ready for her to marry again, not knowing that the carriage behind.
emma:  I remember her telling us, of course, the same success and satisfaction, took place that might impede them.
mystery:  Whoever his companion was going to see a police car parked in the wastebasket.
scifi:  His small black eyes darted inside the ship, went into the frontal sinuses, she could have had time to develop a sense of justice.


#### Things marky Does
The main functions you'll be using are:
1. `marky.update()`
2. `marky.make_model(new_name, string, pos=False, state_size=2)`
3. `marky.combine(new_name, model_names, weights=False)`
4. `marky.export(model_name)`
5. `marky.output(model_name)`

If you want to learn about what a function does or how to use it, simply type `help(marky.NAME)`, where `NAME` is the function you want to learn more about. See example below.

In [19]:
help(marky.update)

Help on method update in module _managers:

update() method of _managers.markovManager instance
    Checks the txts and jsons folders for new files to import.
    If the filename (w/o extension) is already in the self.txts or self.jsons dictionary,
        then that file will be skipped.
    Any new jsons will be automatically converted to a markov model and added to self.models.
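The docstring describes a simple "skip what's already loaded" rule. Here's a minimal sketch of that rule, assuming files are keyed by filename without the extension (as the docstring says); `new_files` is a hypothetical helper, not part of `_managers.py`:

```python
import os

def new_files(filenames, known, extension):
    """Return {name_without_extension: filename} for files ending in `extension`
    whose name isn't already a key in `known` (e.g. marky.txts or marky.jsons)."""
    found = {}
    for filename in sorted(filenames):
        name, ext = os.path.splitext(filename)
        if ext == extension and name not in known:
            found[name] = filename
    return found
```

For example, `new_files(os.listdir(root + "txts/"), marky.txts, ".txt")` would list only the text files marky hasn't imported yet (and it's one more reason `root` needs its trailing `/`).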



#### marky Demo
Below is an example of how I might use marky, keeping in mind that I have `txts/mystery.txt`, `txts/scifi.txt`, and `jsons/emma.json` locally.

In [127]:
# take all current txts in marky.txts, turn them into models, and then export the models to the jsons folder
for key in marky.txts.keys():
    marky.make_model(key, marky.txts[key])
    marky.export(key)

In [128]:
# combine two models and generate a test sentence
marky.combine("emma-scifi", ["emma", "scifi"], weights=[1, 10])
marky.output("emma-scifi")

'They now walked after her father; but she could not understand how the principle could ever be codified.'

In [129]:
# once you've made a new model with make_model or combine, you can just call output on it
# the for loop prints 5 different outputs
for i in range(5):
    print(marky.output("emma-scifi")+"\n")

Consequently no one knew what was good for him.

In a camp, instead of the time of day when, from some already established premise.

The dromozoa kept him alive, but he did not come out of a parent, it suits my ideas of comfort better than sweetbread and some censure I acknowledge myself liable to.-- My behaviour to Miss Bates; she is so calculated to affect one!-- Miss Woodhouse, is he like to have him sitting just opposite each other.

She was bewildered amidst the confusion of the inwardness of pain.

Exactly so, indeed -- I have seen through such thick - headed nonsense?



In [130]:
# if I like my new model, I can export it
marky.export("emma-scifi")

### 1.b marky Playground
Explore marky's functions!

In [None]:
# have funnn


# 2. nern the Stage Manager
Manage text files, json files, and markovify models all in one place, including formatted scripts. (See `Info` block for more details.) **I would tinker around with marky before using nern**, personally, as nern is kind of an extended version of marky that lets you do stuff with texts that are scripts formatted in a certain way.

### 2.0 nern Setup

The next block is for defining any duplicate names in your script (e.g. if the same character appears as both "Mysterious Man" and "Theodore Logan" at different points in the script), and setting up groupings of characters. These can be passed to nern when you create a dialogue map for a script, where you'll be able to create strings/models given a character name or group name for sets of characters defined below. If you only want to make models based on individual characters or the full script without character tags, you don't need to worry about dupes or groups very much.

Any names in `dupes` will be fixed before making the dialogue map that nern leverages for making strings based on the script. In `dupes`, you should add the name that you want to fix using the format `"Correct Name": "Name to Replace"`. See an example `dupes` dictionary below, where "Artemy Burakh" is replaced by "Haruspex", "Clara" is replaced by "Changeling", etc. So, when making the dialogue map, any instance of "Artemy Burakh: " would be replaced by "Haruspex: ". If you think your names are consistent, you can just set the relevant `dupes` dictionary to `None` (no quotes).
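Here's a minimal sketch of that replacement, using a hypothetical `apply_dupes` helper (nern's real implementation may differ). The dictionary direction matches the format above: `"Correct Name": "Name to Replace"`.

```python
def apply_dupes(script_text, dupes):
    """Rewrite 'Name to Replace: ' at the start of a line as 'Correct Name: '.
    dupes maps "Correct Name" -> "Name to Replace"; None means no changes."""
    if dupes is None:
        return script_text
    fixed_lines = []
    for line in script_text.splitlines():
        for correct, alias in dupes.items():
            prefix = alias + ": "
            if line.startswith(prefix):
                line = correct + ": " + line[len(prefix):]
                break
        fixed_lines.append(line)
    return "\n".join(fixed_lines)

# made-up sample lines, using two entries from patho_dupes
sample = "Artemy Burakh: I'm back.\nClara: Clara is my name.\nBachelor: Hello."
print(apply_dupes(sample, {"Haruspex": "Artemy Burakh", "Changeling": "Clara"}))
```

Matching only at the start of a line keeps names that appear inside the dialogue itself (like "Clara is my name.") untouched.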

`groups` is a dictionary, mapping a group name to a list of character names. Assuming all the character names exist in the script, you can then use a group name to generate a string of all the dialogue for characters in that group. Then, you can turn this into a markovify model that will generate sentences based on all the characters represented in that group. See the blocks in section 2.a on specific functions for nern to learn more about how to leverage this tool.
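Here's a sketch of how a group name can be turned into one training string, given a dialogue map of character names to lists of lines. `group_text` is hypothetical, and unlike the assumption above it simply skips group members who never speak in the script instead of failing:

```python
def group_text(dialogue_map, groups, group_name):
    """Join every line spoken by the characters listed in groups[group_name].
    Members who never speak in the script are skipped instead of raising KeyError."""
    lines = []
    for character in groups[group_name]:
        lines.extend(dialogue_map.get(character, []))
    return "\n".join(lines)

# made-up dialogue map and a trimmed-down version of the "termites" group
dialogue_map = {
    "Haruspex": ["The town is sick.", "We do."],
    "Notkin": ["Follow me."],
    "Bachelor": ["I need a cure."],
}
groups = {"termites": ["Haruspex", "Notkin", "Capella"]}
print(group_text(dialogue_map, groups, "termites"))
```

The resulting string could then go to `markovify.Text`, or to `markovify.NewlineText`, which treats each line as one sentence.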

Note that you have to define dupes and groups, run the block, and then pass the `script_associations` list as a parameter when creating nern. Eventually I'll make it possible to add these later; right now, if you add a new script (with new dupes and groups) to your scripts folder, you have to start over instead of seamlessly updating everything. I'm just one woman!!

In [131]:
# dupes dictionaries go here
patho_dupes = {
    "Haruspex": "Artemy Burakh",
    "Changeling": "Clara",
    "Inquisitor": "Aglaya Lilich",
    "Commander": "Alexander Block",
    "Young Vlad": "Vlad the Younger",
    "Big Vlad": "Vlad Olgimsky",
    "Fellow Traveller": "Fellow Traveler",
    "Stanislav Rubin": "Stakh Rubin",
    "The Foreman of the Abattoir": "Foreman Oyun",
    "Hunchback": "Var"
}

new_dupes = {
    "Correct Name 1": "Name to Replace 1",
    "Correct Name 2": "Name to Replace 2"
}

# groups dictionaries go here
patho_groups = {
"utopians": ["Bachelor", "Andrey Stamatin", "Eva Yan", "Georgiy Kain", "Mark Immortell", "Maria Kaina", "Peter Stamatin", "Victor Kain", "Young Vlad"],
"termites": ["Haruspex", "Capella", "Grace", "Khan", "Murky", "Notkin", "Sticky", "Taya Tycheek"],
"humbles": ["Changeling", "Alexander Saburov", "Anna Angel", "Aspity", "Bad Grief", "Katerina Saburova", "Lara Ravel", "The Foreman of the Abattoir", "Stanislav Rubin", "Yulia Lyuricheva"],
"major": ["Inquisitor", "Commander", "Big Vlad"],
"theater": ["Executor", "Tragedian", "Rat Prophet", "Player", "Fellow Traveller"],
"kin": ["Worm", "Butcher", "Albino", "Ayan", "Dancer Girl", "Herb Bride", "Kin One", "Pack", "Nara", "Tumbler Human", "Wonder Bull"],
"minor": ["Hunchback", "Willow", "Barley", "Isidor Burakh"],
"generic_kid": ["Doghead", "Tot", "Boy", "Girl", "Teensy", "Troublemaker"],
"generic_adult": ["Factory Worker", "Woman", "Patrolman", "Young Man", "Young Lady", "Carouser", "Man", "Officer", "Sickly Man", "Mugger", "Corpsman", "Salesman", "Soldier", "Survivor", "Gravedigger"]
}

new_groups = {
    "group1": ["Name A", "Name B"],
    "group2": ["Name C", "Name D"]
}

# one tuple per script in your scripts folder:
# (filename without extension, dupes dictionary, groups dictionary)
script_associations = [
    ("pathologic", patho_dupes, patho_groups),
    ("new", new_dupes, new_groups)
]

Run the next block to create an instance of `stageManager` given the root folder you defined in File Manager at the top of the notebook and the `script_associations` list, which associates each script's filename with its dupes and groups dictionaries. (See markdown block above for more info.)

You CAN change the name, but just so you know the canonical name is `nern`. 

If you want to import text files, JSONs, or scripts, set up `txts/`, `jsons/`, and `txts/scripts/` folders in the folder you defined as root, and then add files to them before running this block. You can add stuff later, though.

In [132]:
nern = stageManager(root, script_associations)

### 2.a nern Examples
What follows are examples and explainers of how to use nern.

#### Things nern Has
The main "things" nern has are the same dictionaries as marky (so you can call `nern.txts`, `nern.models`, or `nern.jsons`), plus one more: `nern.scripts`. It's a more detailed version of `nern.txts`, which maps names of texts to their contents. By contrast, `nern.scripts` maps script names to a tuple of the form:

`(script_contents, dialogue_map, groups)`

The dialogue map is a dictionary, mapping character names to a list of every line of dialogue they have in the script. (Note that `dupes` helps ensure there aren't duplicates in this dictionary.) The groups, taken from those defined in section 2.0, allow you to generate strings/markovify models for entire groups of characters. You have to predefine these because the computer doesn't know what the character names mean, or their relationships. The groupings are significant to you, not the computer.

For what it's worth, you'll pretty much never access this tuple directly unless you're poking around (which, y'know, feel free), so don't worry too much about the structure of `nern.scripts`. You'll mostly be using a few functions that build models from the scripts in your `txts/scripts/` folder, and otherwise using nern very much like marky.

In [133]:
print("txts in nern: ", nern.txts.keys())
print("scripts in nern: ", nern.scripts.keys())

txts in nern:  dict_keys(['mystery', 'scifi'])
scripts in nern:  dict_keys(['pathologic'])


#### Things nern Does
The main functions you'll be using are:
1. `nern.character_model(model_name, script, character, pos=False, state_size=2)`
2. `nern.group_model(model_name, script, group, pos=False, state_size=2)`
3. `nern.dialogue_model(model_name, script)`
4. `nern.export(model_name)`
5. `nern.output(model_name)`

If you want to learn about what a function does or how to use it, simply type `help(nern.NAME)`, where `NAME` is the function you want to learn more about. See example below.

In [134]:
help(nern.character_model)

Help on method character_model in module __main__:

character_model(model_name, script, character, pos=False, state_size=2) method of __main__.stageManager instance
    Receives a name for a new model to be added to self.models, the name of the script to create a model from,
        and the name of a character to build a model from.
    Note that you can pass it the same optional arguments as you can when creating markovify models.



#### nern Demo
Below is an example of how I might use nern, keeping in mind that I have `txts/scripts/pathologic.txt` saved locally.

In [135]:
# creates models called "utopians" and "kiddos" in nern.models for the pre-defined groups
# "utopians" and "generic_kid" from the pathologic script
nern.group_model("utopians", "pathologic", "utopians")
nern.group_model("kiddos", "pathologic", "generic_kid")

# creates a model called "grief" in nern.models for the character "Bad Grief" from the "pathologic" script
nern.character_model("grief", "pathologic", "Bad Grief")

# creates a model called "patho_dia" in nern.models for dialogue in the "pathologic" script
nern.dialogue_model("patho_dia", "pathologic")

In [136]:
# printing output for the models made above
print("Utopians: ", nern.output("utopians"))
print("Bad Grief: ", nern.output("grief"))
print("Pathologic Dialogue: ", nern.output("patho_dia"))
print("Science Fiction: ", nern.output("scifi"))
print("Kid Characters: ", nern.output("kiddos"))

Utopians:  They barely know how to take care of it.
Bad Grief:  The designers of the Warehouses, just you get my name, anyway?
Pathologic Dialogue:  And what about the... shmowders?
Science Fiction:  Yet, if he did not change.
Kid Characters:  They actually could give us your blessings and kisses...


### 2.b nern Playground

Explore nern's functions!

In [None]:
# have funnn
