# Madlibs

## Madlibs style word substitution
Given a (multiline) format string (a `str.format` template, not an f-string literal), match the substitutes to the targets. If a target has no substitute, fill it with a default value (an empty string).

In [7]:
# Attempt 1: plain str.format
def g(inp:str, **subs) -> str:
    return inp.format(**subs)

In [8]:
replacements = {"x":70}
# replacements = {"x":70, "w":80, "z": 90}
# replacements = {"x":70, "w":80, }

sample = "The number is {x} {w} {z}"
test = g(sample, **replacements)
print(test)

KeyError: 'w'

Problem: if the dict doesn't contain a key that the template uses, `format` raises a `KeyError`.

In [9]:
# Attempt 2: use a custom formatter (PEP 3101)
from string import Formatter
from typing import Dict
class MadLibber(Formatter):
    def __init__(self, default="") -> None:
        super().__init__()
        self.default=default

    def get_value(self, key, args, kwds:Dict):
        if isinstance(key, str):
            return kwds.get(key, self.default)
        else:
            return super().get_value(key, args, kwds)


In [10]:
mL = MadLibber()
print(mL.format(sample, **replacements))

The number is 70  


Hopefully we can improve on its inelegance. Also, I don't fully understand the `Formatter` class.
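
To make the `Formatter` machinery a little less opaque, here is a small sketch of the two pieces that matter for this problem: `parse`, which splits a template into literal text and field names, and `get_value`, which is called once per field with either an `int` (positional fields such as `{0}`) or a `str` (named fields such as `{x}`). `TracingFormatter` is a hypothetical helper written only for this illustration; it is not part of the standard library.

```python
from string import Formatter

template = "The number is {x} {w} {z}"

# Formatter.parse yields (literal_text, field_name, format_spec, conversion)
# tuples, one per chunk of the template.
fields = [name for _, name, _, _ in Formatter().parse(template) if name]
print(fields)  # ['x', 'w', 'z']


class TracingFormatter(Formatter):
    """Hypothetical helper: records every key that get_value is asked for."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def get_value(self, key, args, kwargs):
        self.seen.append(key)
        if isinstance(key, str):
            # Named field: fall back to an empty string when it is missing.
            return kwargs.get(key, "")
        # Positional field: let the base class index into args.
        return super().get_value(key, args, kwargs)


tracer = TracingFormatter()
result = tracer.format("{0} and {x} and {w}", "first", x=70)
print(repr(result))  # 'first and 70 and '
print(tracer.seen)   # [0, 'x', 'w']
```

This is why `MadLibber.get_value` checks `isinstance(key, str)`: only named fields should receive the default, while positional fields keep the normal lookup in `args`.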

In [17]:
# Attempt 3: format_map
class Default(dict):
    def __missing__(self, key):
        return '{'+key+'}'
class Default2(dict):
    def __missing__(self, key):
        return ""

In [18]:
print(sample.format_map(Default(replacements)))
print(sample.format_map(Default2(replacements)))

The number is 70 {w} {z}
The number is 70  
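
A short sketch of why `format_map` is needed here rather than `format(**mapping)`: `format_map` uses the mapping object directly, so `__missing__` is consulted, whereas `**` unpacking copies the existing keys into ordinary keyword arguments and the subclass behaviour is lost. The class name `Blank` is made up for this example and behaves like `Default2`.

```python
class Blank(dict):
    """Hypothetical dict subclass: missing keys become empty strings."""

    def __missing__(self, key):
        return ""


template = "The number is {x} {w} {z}"
values = Blank({"x": 70})

# format_map looks keys up on the mapping itself, so __missing__ is used.
filled = template.format_map(values)
print(repr(filled))  # 'The number is 70  '

# ** unpacking passes only the keys that exist; 'w' is then missing.
missing = None
try:
    template.format(**values)
except KeyError as err:
    missing = err.args[0]
print("KeyError:", missing)  # KeyError: w
```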


In [19]:
multiline ="""
Lorem Ipsum is {adj} dummy text of the printing 
and typesetting industry. Lorem Ipsum has been the 
industry's standard dummy text ever since the 1500s, 
when an unknown printer took a {noun} of type and 
scrambled it to make a type specimen book. It has 
survived not only five centuries, but also the 
leap {preposition} electronic typesetting, remaining 
essentially unchanged. It was {verb} in the 
1960s with the release of Letraset sheets containing 
Lorem Ipsum passages, and more recently with 
desktop publishing software like Aldus PageMaker 
including versions of Lorem Ipsum.
"""
subs = {"adj":"interestingly",
        "noun": "human",
        "preposition": "at"}
print(multiline.format_map(Default(subs)))
print(multiline.format_map(Default2(subs)))


Lorem Ipsum is interestingly dummy text of the printing 
and typesetting industry. Lorem Ipsum has been the 
industry's standard dummy text ever since the 1500s, 
when an unknown printer took a human of type and 
scrambled it to make a type specimen book. It has 
survived not only five centuries, but also the 
leap at electronic typesetting, remaining 
essentially unchanged. It was {verb} in the 
1960s with the release of Letraset sheets containing 
Lorem Ipsum passages, and more recently with 
desktop publishing software like Aldus PageMaker 
including versions of Lorem Ipsum.


Lorem Ipsum is interestingly dummy text of the printing 
and typesetting industry. Lorem Ipsum has been the 
industry's standard dummy text ever since the 1500s, 
when an unknown printer took a human of type and 
scrambled it to make a type specimen book. It has 
survived not only five centuries, but also the 
leap at electronic typesetting, remaining 
essentially unchanged. It was  in the 
1960s with the rel

## Getting a corpus of passages
Let's actually make a game! We want lots of different passages of similar lengths so we can make them

In [33]:
import pandas as pd

In [34]:
# extract articles from https://www.kaggle.com/datasets/gowrishankarp/newspaper-text-summarization-cnn-dailymail?resource=download
df = pd.read_csv("archive/cnn_dailymail/test.csv")
articles_df = df['article'].str.replace('. ','.\n', regex=False)  # start a new line after each '. '

In [35]:
# check the first article in a single file
with open("passage.txt", "w", encoding="utf-8") as f:  # articles contain non-ASCII characters such as \xa0
    text = articles_df.iloc[0]
    print(text)
    f.write(text)
    

Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk.
They say that the shrinking space on aeroplanes is not only uncomfortable - it's putting our health and safety in danger.
More than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.
'In a world where animals have more rights to space and food than humans,' said Charlie Leocha, consumer representative on the committee. 'It is time that the DOT and FAA take a stand for humane treatment of passengers.' But could crowding on planes lead to more serious issues than fighting for space 

In [36]:
# export to CSV
articles_df.to_csv("articles.csv")


## NLP
We want to use a part-of-speech tagger to tag each word with one of the 8 parts of speech: nouns, pronouns, verbs, adjectives, adverbs, prepositions, conjunctions (connectives) and interjections.
This allows us to choose certain numbers of each word type to blank out.
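
As a rough sketch of the blanking step, assume some tagger has already produced a list of `(word, tag)` pairs. The function below picks a given number of words per tag and swaps them for `{tag_N}` placeholders, giving a template that works with `format_map` as shown earlier. The function name `make_template`, the tag labels (`"noun"`, `"adj"`, ...) and the `quota` parameter are all assumptions made for this illustration, and joining tokens with single spaces is a simplification that ignores the original punctuation spacing.

```python
import random
from collections import Counter


def make_template(tagged, quota, seed=0):
    """Replace up to quota[tag] words of each tag with {tag_N} placeholders.

    tagged: list of (word, tag) pairs from any POS tagger (assumed input).
    quota:  dict mapping tag -> how many words of that tag to blank out.
    Returns (template, answers); answers maps placeholder name -> original word.
    """
    rng = random.Random(seed)
    chosen = set()
    for tag, n in quota.items():
        idxs = [i for i, (_, t) in enumerate(tagged) if t == tag]
        chosen.update(rng.sample(idxs, min(n, len(idxs))))

    counts = Counter()
    pieces, answers = [], {}
    for i, (word, tag) in enumerate(tagged):
        if i in chosen:
            counts[tag] += 1
            name = f"{tag}_{counts[tag]}"
            answers[name] = word
            pieces.append("{" + name + "}")
        else:
            # Escape literal braces so str.format leaves them alone.
            pieces.append(word.replace("{", "{{").replace("}", "}}"))
    return " ".join(pieces), answers


tagged = [("plane", "noun"), ("seats", "noun"),
          ("appear", "verb"), ("smaller", "adj")]
template, answers = make_template(tagged, {"noun": 1, "adj": 1})
print(template)
print(template.format_map(answers))  # plane seats appear smaller
```

Filling the template with `answers` reproduces the original words, while filling it with a player's own words produces the madlib.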

"Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk.\nThey say that the shrinking space on aeroplanes is not only uncomfortable - it's putting our health and safety in danger.\nMore than squabbling over the arm rest, shrinking space on planes putting our health and safety in danger? This week, a U.S consumer advisory group set up by the Department of Transportation said at a public hearing that while the government is happy to set standards for animals flying on planes, it doesn't stipulate a minimum amount of space for humans.\n'In a world where animals have more rights to space and food than humans,' said Charlie Leocha, consumer representative on the committee.\xa0'It is time that the DOT and FAA take a stand for humane treatment of passengers.' But could crowding on planes lead to more serious issues than fighting for