# Recipe generator

In this notebook we use [TextBlob](https://textblob.readthedocs.io/en/dev/) to extract nouns, verbs, and sentences from the OCRd text of a 19th-century cookery book. We try to clean things up a bit, using regular expressions to discard likely OCR errors. Then we recombine the various parts in random combinations to create delicious recipes for all occasions. Enjoy!

Inspired by [*Australian Plain Cookery by a Practical Cook*](https://nla.gov.au/nla.obj-579917051), 1882.

In [24]:
import requests
from textblob import TextBlob
import re
import random
import pandas as pd
from IPython.display import display, HTML

In [25]:
# The Cloudstor URL links to the repository of OCRd text from Trove digitised books
CLOUDSTOR_URL = 'https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL'
# File name of the cookery book
text_file = 'australian-plain-cookery-by-a-practical-cook-nla.obj-579917051.txt'

First we procure a recipe book.

In [26]:
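# Download the text of the book
response = requests.get(f'{CLOUDSTOR_URL}/download?files={text_file}')
# Stop here if the download failed, rather than building recipes from an error page
response.raise_for_status()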

Then we slice and dice the words to create a new TextBlob.

In [27]:
# Create a TextBlob using the text
blob = TextBlob(response.text)

Carefully we remove the nouns and the verbs, discarding any that are spoiled.

In [28]:
# Get the verbs, filtering out short words (3 letters or fewer) and those that include non-alphabetic characters.
# 'VBD' is the part of speech (POS) tag for a past tense verb
verbs = [w.title() for w, t in blob.tags if t == 'VBD' and len(w) > 3 and w.isalpha()]

In [29]:
# Get the nouns, filtering out short words (3 letters or fewer) and those that include non-alphabetic characters.
# Tags starting with 'NNP' mark proper nouns ('NNP' is singular, 'NNPS' is plural)
nouns = [w.title() for w, t in blob.tags if t.startswith('NNP') and len(w) > 3 and w.isalpha()]

Now it is necessary to prepare the sentences. First extract them from the blob. Discard any that seem ill-formed.

In [30]:
# Get the sentences from the blob
# Uses a regexp to exclude those that include anything other than standard letters, numbers, and punctuation.
sentences = [str(s).replace('\n', ' ') for s in blob.sentences if re.match(r'^[a-zA-Z\s\-,\.;0-9\'&\(\):]*$', str(s))]
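
To see what this filter keeps and discards, here is a small self-contained sketch that applies the same pattern to a few strings. The sample sentences are invented for illustration, not taken from the book, and the name `CLEAN_SENTENCE` is just a label used here.

```python
import re

# Same pattern as the cell above: letters, digits, whitespace, and basic punctuation only
CLEAN_SENTENCE = re.compile(r'^[a-zA-Z\s\-,\.;0-9\'&\(\):]*$')

# Invented examples, not taken from the cookery book
samples = [
    'Boil the rice for 20 minutes.',
    'Stir in the sugar, butter & eggs.',
    'Bake in a qu|ck oven~',   # stray OCR characters
    'Add ½ lb. of flour.',     # the non-ASCII fraction is rejected too
]

kept = [s for s in samples if CLEAN_SENTENCE.match(s)]
discarded = [s for s in samples if not CLEAN_SENTENCE.match(s)]
print(kept)
print(discarded)
```

The pattern also rejects sentences containing characters such as `?`, `!`, or double quotes, so some well-formed sentences will be discarded along with the OCR errors.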

The sentences now need to be divided, to separate out the titles, which are recognised by their case.

In [31]:
# Titles in this cookbook are in uppercase, so we can separate them out from the rest of the sentences.
titles = [s for s in sentences if s.strip('.').isupper()]
sentences = [s for s in sentences if not s.strip('.').isupper()]
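
The split relies on `str.isupper()`, which is true only when a string contains at least one cased character and none of them are lowercase. A quick sketch with invented strings shows the effect; `method_sentences` is a name used only in this example.

```python
# Invented examples; the real titles come from the OCRd text
sample_sentences = [
    'APPLE DUMPLINGS.',
    'Peel and core six apples.',
    'TO BOIL RICE.',
    '1882.',   # no letters at all, so isupper() is False
]

titles = [s for s in sample_sentences if s.strip('.').isupper()]
method_sentences = [s for s in sample_sentences if not s.strip('.').isupper()]
print(titles)
print(method_sentences)
```

Lines made up only of digits and punctuation stay with the ordinary sentences rather than being treated as titles.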

Now we are ready to start cooking!

In [32]:
def recipe_maker(num=5):
    # Get a random title
    title = random.choice(titles)
    html = f'<h4>{title}</h4>'
    html += '<h5>Ingredients:</h5>'
    html += '<ol>'
    for _ in range(num):
        # Make a random selection from the nouns & verbs
        html += f'<li>{random.choice(verbs)} {random.choice(nouns)}</li>'
    html += '</ol>'
    html += '<h5>Method:</h5>'
    # Get random sentences and combine (capped, as random.sample fails if asked for more than exist)
    html += f'<p>{" ".join(random.sample(sentences, min(num, len(sentences))))}</p>'
    display(HTML(html))

In [41]:
recipe_maker(6)
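
If you want repeatable results, or plain text instead of HTML, one option is a variant that uses its own random number generator. This is a sketch rather than part of the original notebook: `recipe_text`, its `seed` parameter, and the sample word lists are all hypothetical. It caps the number of sentences at the number available, since `random.sample` raises a `ValueError` when asked for more items than the list holds.

```python
import random

def recipe_text(titles, verbs, nouns, sentences, num=5, seed=None):
    """Build a plain-text recipe; hypothetical helper, not from the notebook."""
    rng = random.Random(seed)  # private generator, so the seed does not affect other code
    lines = [rng.choice(titles), 'Ingredients:']
    for i in range(1, num + 1):
        lines.append(f'{i}. {rng.choice(verbs)} {rng.choice(nouns)}')
    lines.append('Method:')
    # Never ask for more sentences than exist
    lines.append(' '.join(rng.sample(sentences, min(num, len(sentences)))))
    return '\n'.join(lines)

# Invented sample data standing in for the lists built above
sample_recipe = recipe_text(
    titles=['PLUM PUDDING.'],
    verbs=['Boiled', 'Mixed'],
    nouns=['Sugar', 'Flour'],
    sentences=['Stir well.', 'Bake for one hour.'],
    num=3,
    seed=1,
)
print(sample_recipe)
```

Calling the function again with the same `seed` should produce the same recipe, which is handy if you find one you want to share.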

## What's next?

There's a [full list of the POS (Part of Speech) tags](https://www.geeksforgeeks.org/python-part-of-speech-tagging-using-textblob/) here if you'd like to play with different combinations.
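
To try other combinations, the two list comprehensions used earlier for verbs and nouns could be folded into one helper that takes a tag prefix. The sketch below is one possible arrangement, not the notebook's own code: `words_with_tag` and `min_length` are hypothetical names. It works on any list of `(word, tag)` pairs such as `blob.tags`; the pairs here are typed in by hand so the example runs without TextBlob.

```python
def words_with_tag(tags, prefix, min_length=4):
    """Return title-cased alphabetic words whose POS tag starts with `prefix`.

    Hypothetical helper; `tags` is a list of (word, tag) pairs like blob.tags.
    """
    return [
        word.title()
        for word, tag in tags
        if tag.startswith(prefix) and len(word) >= min_length and word.isalpha()
    ]

# Hand-written pairs for illustration only
sample_tags = [
    ('boiled', 'VBD'), ('simmer', 'VB'), ('Sydney', 'NNP'),
    ('onions', 'NNS'), ('salt', 'NN'), ('fresh', 'JJ'), ('t0ast', 'NN'),
]

all_verbs = words_with_tag(sample_tags, 'VB')    # every verb form
all_nouns = words_with_tag(sample_tags, 'NN')    # common and proper nouns
adjectives = words_with_tag(sample_tags, 'JJ')   # adjectives
print(all_verbs, all_nouns, adjectives)
```

In the notebook you could call something like `words_with_tag(blob.tags, 'JJ')` to add adjectives to the ingredients list.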

Perhaps we could add some more cookbooks? Let's load details of all the digitised books in Trove, then filter for those that include the word 'cookery' in the title.

In [11]:
df = pd.read_csv('https://raw.githubusercontent.com/GLAM-Workbench/trove-books/master/trove_digitised_books_with_ocr.csv')

In [12]:
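# Books with 'cookery' in the title (case-sensitive) whose OCRd text has been downloaded
# na=False treats any missing titles as non-matches
df.loc[(df['title'].str.contains('cookery', na=False)) & (df['text_downloaded'] == True)]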

Unnamed: 0,title,url,contributors,date,fulltext_url,trove_id,language,rights,pages,form,volume,parent,children,text_downloaded,text_file
1888,The Kingswood cookery book / by H. F. Wicken,https://trove.nla.gov.au/work/12721516,"Wicken, H",1885-1950,https://nla.gov.au/nla.obj-43987239,nla.obj-43987239,English,Out of Copyright|http://rightsstatements.org/v...,278,Book,,,,True,the-kingswood-cookery-book-by-h-f-wicken-nla.o...
2582,Electric cookery book : being an indispensable...,https://trove.nla.gov.au/work/16383834,State Electricity Commission of Victoria,1940-1949,http://nla.gov.au/nla.obj-52836472,nla.obj-52836472,English,No known copyright restrictions|http://rightss...,73,Book,,,,True,electric-cookery-book-being-an-indispensable-h...
2654,The English and Australian cookery book : cook...,https://trove.nla.gov.au/work/16551115,"Abbott, Edward, 1801-1869",1864-2014,https://nla.gov.au/nla.obj-9562000,nla.obj-9562000,English,Out of Copyright|http://rightsstatements.org/v...,356,Book,,,,True,the-english-and-australian-cookery-book-cooker...
4431,Australian plain cookery / by a Practical Cook...,https://trove.nla.gov.au/work/18493439,Old housekeeper,1882-1897,http://nla.gov.au/nla.obj-579917051,nla.obj-579917051,,,148,Book,,,,True,australian-plain-cookery-by-a-practical-cook-r...
7688,The Armidale Red Cross cookery book of tested ...,https://trove.nla.gov.au/work/20631441,Australian Red Cross Society. Armidale Branch,1920,https://nla.gov.au/nla.obj-52792201,nla.obj-52792201,English,Out of Copyright|http://rightsstatements.org/v...,82,Book,,,,True,the-armidale-red-cross-cookery-book-of-tested-...
8173,The Kandy Koola cookery book and housewife's c...,https://trove.nla.gov.au/work/21067450,Kandy Koola Tea,1898,https://nla.gov.au/nla.obj-2409723409,nla.obj-2409723409,English,Out of Copyright|http://rightsstatements.org/v...,76,Book,,,,True,the-kandy-koola-cookery-book-and-housewife-s-c...
8491,"The Hawkesbury and Shoalhaven calendar, cultur...",https://trove.nla.gov.au/work/21309432,Woodhill & Co,1905,http://nla.gov.au/nla.obj-28658844,nla.obj-28658844,English,Out of Copyright|http://rightsstatements.org/v...,200,Book,,,,True,the-hawkesbury-and-shoalhaven-calendar-cultura...
9457,Hebrew cookery / by an Australian,https://trove.nla.gov.au/work/22242397,Australian,1867,http://nla.gov.au/nla.obj-52864954,nla.obj-52864954,English,No known copyright restrictions|http://rightss...,25,Book,,,,True,hebrew-cookery-by-an-australian-nla.obj-528649...
9472,"Recipes given by Mrs. Wicken at cookery class,...",https://trove.nla.gov.au/work/22249810,"Wicken, H",1888,http://nla.gov.au/nla.obj-533356312,nla.obj-533356312,English,Out of Copyright|http://rightsstatements.org/v...,16,Book,,,,True,recipes-given-by-mrs-wicken-at-cookery-class-w...
13145,"Southland Red Cross cookery book, 1916",https://trove.nla.gov.au/work/237279068,,1916,https://nla.gov.au/nla.obj-49498371,nla.obj-49498371,English,Out of Copyright|http://rightsstatements.org/v...,187,Book,,,,True,southland-red-cross-cookery-book-1916-nla.obj-...


To use a different book as the source for our recipe generator, copy the index value of its row and use it to look up the name of the `text_file`, like this:

In [13]:
df.loc[8173, 'text_file']

'the-kandy-koola-cookery-book-and-housewife-s-compa-nla.obj-2409723409.txt'

Copy and paste the file name into the `text_file` value at the top of this notebook, and then re-run the cells.

How might we combine ingredients from **all** of these cookbooks?
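
One possible approach is to download each book's text in turn and join the results into a single string before creating the TextBlob. The sketch below is only an outline under some assumptions: `combine_books` and `download_url` are hypothetical names, and the `fetch` argument is any function that takes a URL and returns text, so in the notebook you could pass something built on `requests.get`. Here a stand-in function is used so the example runs offline.

```python
CLOUDSTOR_URL = 'https://cloudstor.aarnet.edu.au/plus/s/ugiw3gdijSKaoTL'

def download_url(text_file):
    """Build the download URL in the same way as the download cell near the top."""
    return f'{CLOUDSTOR_URL}/download?files={text_file}'

def combine_books(text_files, fetch):
    """Fetch every file and join the texts; hypothetical helper."""
    return '\n'.join(fetch(download_url(text_file)) for text_file in text_files)

# Stand-in for a real download, e.g. lambda url: requests.get(url).text
def fake_fetch(url):
    return f'Text of {url.split("=")[-1]}'

combined = combine_books(['book-one.txt', 'book-two.txt'], fake_fetch)
print(combined)
```

In the notebook, the file names could come from the `text_file` column of the filtered dataframe, and the combined string could then be passed to `TextBlob` as before.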

----

Created by [Tim Sherratt](https://timsherratt.org) for the [GLAM Workbench](https://glam-workbench.github.io/).