# Introduction

Right now, this notebook is not connected to the code defining the manuscript object. We access it with the following import statement. 

In [3]:
from digital_manuscript import BnF

`BnF` is the name of a <b>class</b>: a blueprint for creating a specific collection of data. Each collection created from that blueprint is called an <b>instance</b> or <b>object</b>.

In [11]:
manuscript = BnF() # create an instance of the BnF() class named 'manuscript'

The manuscript object is a container, meaning it stores and catalogues other Python objects. In this case, the objects within the container represent individual entries separated by div tags. The entries are identified, or keyed, by their div id. We can use the div id to retrieve the entry object from the manuscript.

In [5]:
sword_varnish = manuscript.entry('004v_1')

The entry objects have encoded features, or <b>attributes</b>, of their own. Given any entry object, we can, for example, recover its div id.

In [4]:
sword_varnish.identity

'004v_1'
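
The container idea can be sketched with a plain dictionary. The `ToyEntry` and `ToyManuscript` classes below are hypothetical stand-ins written for illustration. They are not the actual `digital_manuscript` implementation and only mirror the `.entry()` and `.identity` names used above.

```python
class ToyEntry:
    """Hypothetical stand-in for an entry object."""

    def __init__(self, identity, title):
        self.identity = identity  # the div id that keys this entry
        self.title = title


class ToyManuscript:
    """Hypothetical stand-in for a container of entries keyed by div id."""

    def __init__(self, entries):
        # Store the entries in a dict so each one can be looked up by its id.
        self.entries = {e.identity: e for e in entries}

    def entry(self, identity):
        return self.entries[identity]


# Create an instance of the toy class, then retrieve one entry by its id.
toy = ToyManuscript([
    ToyEntry('004v_1', 'Black varnish for sword guard'),
    ToyEntry('010r_3', 'Roses'),
])
found = toy.entry('004v_1')
print(found.identity, '|', found.title)
```

Keying the entries by id keeps lookups direct: asking for an id that is not in the container raises a `KeyError` rather than silently returning nothing.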

## Entry Attributes

In [5]:
sword_varnish.get_title('tl')

'Black varnish for sword guard, bands for trunks, &amp;c'

The entry class has a built-in `.text()` function to retrieve the text. This retrieval takes two <b>parameters</b>: the version of the manuscript (tc, tcn, or tl), and whether or not to include the XML tags. The manuscript version must be specified, but if the `xml` parameter is omitted, it defaults to removing the XML.

In [6]:
sword_varnish.text('tl')

'Black varnish for sword guard, bands for trunks,\n\n&amp;c\n\nTake linseed oil or more cheaply, walnut\n\noil, and rid it of grease with garlic &amp;\n\nonions +hog’s\n\nfennel, some also add bread crusts, which you will\n\nboil in it for a good quarter of an hour. Next, put\n\nin one lb of the oil thus boiled the size of a walnut of black\n\npitch &amp; a double handful of grains of wheat, without\n\nremoving the garlic &amp; onions, and let\n\nit boil together for a good quarter of an hour. And\n\nwhen the pitch is well melted &amp; when the oil has body,\n\nyou can remove it from the fire. Then, to varnish, place your\n\niron over a low charcoal fire &amp; apply your\n\nvarnish with a feather or a brush.\n\nAnd when you see that it no longer smokes, it is done and your\n\nvarnish is dry.\n\nFor excellent black varnish, add two or three paternoster\n\nbeads of jet among the rest.\n\nSome consider walnut oil better.\n\nIf there is a lot of varnish, it needs to boil for at least\n\nha

In [7]:
sword_varnish.text('tc', xml=True)

'<div id="p004v_1" categories="varnish;arms and armor">\n\n<head><m>Vernis noir</m> pour garde<lb/>\n\ndespee bandes de bahus &amp;</head>\n\n<ab>Prens <m>huile de <pa>lin</pa></m> ou pour meilleur marche de\nl<m>huile<lb/>\n\nde <pa>noix</pa></m> Et la fais desgraisser avecq des\n<m><pa>aulx</pa></m> &amp; <del><m><pa>oigno<exp>n</exp>s</pa></m></del>\n<add><mark>+</mark><m><pa>queues de pourrceaulx</pa></m><lb/>\n\naulcuns adjoustent des <m>crostes de\npain</m></add><lb/>\n\nque tu feras bouillir dedans <ms><tmp>un bon quart\ndheure</tmp></ms><lb/>\n\nApres mects dans une <ms>lb</ms> d<m>huile</m> ainsy bouilly <ms>la\ngrosseur<lb/>\n\ndune <pa>noix</pa></ms> de <m>poix noire</m> &amp; une <ms><bp>joinctee</bp></ms>\nde <m><pa>bled de froment</pa></m><lb/>\n\nsans oster les <m><pa>aulx</pa></m> &amp; <m><pa>oignons</pa></m> Et\nlaisser bouillir ensemble<lb/>\n\n<ms><tmp>un bon quart dheure</tmp></ms> Et quand la <m>poix</m> est bien\nfondue<lb/>\n\n&amp; que l<m>huile</m> a corps Tu 
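
One way the default tag removal could work is to delete anything that looks like a tag. The sketch below assumes a simple regular expression is enough for this markup. The helper name `strip_tags` is illustrative, the sample string is shortened from the output above, and the library may well do this differently.

```python
import re


def strip_tags(xml_text):
    """Remove anything of the form <...>, leaving only the text between tags."""
    return re.sub(r'<[^>]+>', '', xml_text)


sample = '<head><m>Vernis noir</m> pour garde<lb/>\n\ndespee bandes de bahus &amp;</head>'
plain = strip_tags(sample)
print(plain)
```

Note that this only removes tags. Character entities such as `&amp;` are left untouched, which is consistent with the `&amp;` visible in the tag-free output above.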

Properties are XML-tagged data added manually to the manuscript. The `.get_prop()` function retrieves this data and takes two parameters: the property type and the version of the manuscript. The full list of properties and the syntax for retrieving them are shown below.

In [9]:
sword_varnish_materials = sword_varnish.get_prop('material', 'tl')
sword_varnish_materials

['clofe',
 'garlic',
 'galipot',
 'bread crust',
 'iron',
 'grain',
 'hog’s fennel',
 'smoke',
 'oil',
 'jet',
 'charcoal',
 'pitch',
 'onion',
 'varnish',
 'sand']

In [10]:
sword_varnish_plants = sword_varnish.get_prop('plant', 'tcn')
sword_varnish_plants

['oignons',
 'aulx',
 'ail',
 'lin',
 'queues de pourrceaulx',
 'noix',
 'bled de froment']
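
To make the link between tags and properties concrete, here is a minimal sketch that pulls tagged terms out of an XML fragment with the standard library. It assumes that `<m>` marks materials and `<pa>` marks plants, which is read off the `tc` output above rather than confirmed by the library. The function name `get_prop_sketch` and the `TAGS` mapping are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from property type to XML tag, inferred from the sample output.
TAGS = {'material': 'm', 'plant': 'pa'}


def get_prop_sketch(xml_text, prop):
    """Return the text of every element tagged as `prop`, in document order."""
    root = ET.fromstring(xml_text)
    found = []
    for element in root.iter(TAGS[prop]):
        # itertext() also collects text inside nested tags,
        # e.g. <m>huile de <pa>lin</pa></m> gives 'huile de lin'.
        raw = ''.join(element.itertext())
        found.append(' '.join(raw.split()))  # collapse line breaks and extra spaces
    return found


# Fragment shortened from the 'tc' output above and wrapped in a single <ab> element.
fragment = ('<ab>Prens <m>huile de <pa>lin</pa></m> ou pour meilleur marche de\n'
            'l<m>huile<lb/>\n\nde <pa>noix</pa></m></ab>')
print(get_prop_sketch(fragment, 'material'))
print(get_prop_sketch(fragment, 'plant'))
```

Unlike this sketch, the real `.get_prop()` output above does not appear to follow document order, so it likely applies further processing that is not reproduced here.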

## Manuscript Attributes

The properties of the entries can be used to group entries with similar contents. The `.search()` function of the `BnF` class serves this purpose with varying degrees of specificity. First, we can simply specify whether or not a property type appears in an entry. We can pass as many properties as we care to specify to the search, and if none are specified, the entire manuscript will be returned.

In [12]:
has_materials = manuscript.search(material=True)
has_materials

['001r_1',
 '001v_1',
 '002r_2',
 '002v_2',
 '002v_3',
 '003r_1',
 '003r_2',
 '003r_3',
 '003v_1',
 '004r_1',
 '004r_2',
 '004v_1',
 '004v_2',
 '004v_3',
 '005r_1',
 '005r_2',
 '005v_1',
 '006r_1',
 '006v_1',
 '006v_2',
 '007r_1',
 '007r_2',
 '007r_3',
 '007r_4',
 '007v_1',
 '007v_2',
 '007v_3',
 '007v_4',
 '008r_1',
 '008r_2',
 '008r_3',
 '008r_4',
 '008v_1',
 '008v_2',
 '008v_3',
 '008v_4',
 '008v_5',
 '009r_1',
 '009v_1',
 '009v_3',
 '010r_1',
 '010r_2',
 '010r_3',
 '010r_4',
 '010r_5',
 '010v_1',
 '010v_2',
 '010v_3',
 '010v_4',
 '011r_1',
 '011v_1',
 '011v_2',
 '011v_3',
 '011v_4',
 '011v_5',
 '012r_1',
 '012r_2',
 '012r_3',
 '012r_4',
 '012v_1',
 '012v_2',
 '012v_3',
 '013r_1',
 '013r_2',
 '013r_3',
 '013r_4',
 '013v_1',
 '013v_2',
 '013v_3',
 '013v_5',
 '014r_1',
 '015r_1',
 '015r_2',
 '015r_3',
 '015v_1',
 '015v_2',
 '015v_3',
 '015v_4',
 '015v_5',
 '015v_6',
 '016r_1',
 '016v_1',
 '016v_2',
 '016v_3',
 '016v_4',
 '016v_5',
 '017r_1',
 '019v_1',
 '019v_2',
 '019v_3',
 '020r_1',

In [13]:
has_materials_plants = manuscript.search(material=True, plant=True)
has_materials_plants

['001r_1',
 '001v_1',
 '002r_2',
 '003r_1',
 '004v_1',
 '004v_2',
 '004v_3',
 '006r_1',
 '006v_1',
 '007r_2',
 '007r_3',
 '007v_2',
 '008r_1',
 '009r_1',
 '010r_1',
 '010r_3',
 '012r_3',
 '014r_1',
 '015r_1',
 '015v_6',
 '016r_1',
 '016v_1',
 '016v_2',
 '016v_4',
 '016v_5',
 '020r_1',
 '020r_2',
 '020v_1',
 '020v_2',
 '020v_4',
 '024v_1',
 '029r_1',
 '029v_1',
 '029v_6',
 '031r_1',
 '032v_1',
 '033r_1',
 '033v_2',
 '035v_1',
 '035v_2',
 '036r_1',
 '036v_3',
 '037r_2',
 '037v_1',
 '038r_2',
 '038r_4',
 '038v_2',
 '038v_5',
 '038v_6',
 '039r_2',
 '039v_3',
 '040r_1',
 '041r_04',
 '041v_3',
 '042v_1',
 '042v_5',
 '042v_6',
 '043v_1',
 '044r_5',
 '044v_2',
 '044v_3',
 '046r_1',
 '046r_2',
 '046r_3',
 '046r_4',
 '047r_1',
 '047r_3',
 '047r_5',
 '048r_3',
 '049r_2',
 '049r_3',
 '050r_1',
 '050v_1',
 '051r_1',
 '051r_2',
 '052r_1',
 '053v_1',
 '055r_3',
 '055v_2',
 '056r_1',
 '056r_2',
 '056v_1',
 '057v_1',
 '057v_2',
 '058v_1',
 '058v_4',
 '058v_5',
 '059r_2',
 '060r_3',
 '060v_3',
 '063r_1'

We can also search based on specific properties. The terms to be searched for, even if there is only one, must be written in a list.

In [14]:
rose_entries = manuscript.search(plant=['rose'])
rose_entries

['010r_1',
 '010r_3',
 '116r_4',
 '120v_6',
 '129r_4',
 '154v_4',
 '155r_1',
 '155v_1',
 '159v_2',
 '169r_1']

In [15]:
iron_water_entries = manuscript.search(material=['iron', 'water'])
iron_water_entries

['001v_1',
 '003r_1',
 '003r_3',
 '003v_1',
 '004r_2',
 '004v_1',
 '004v_3',
 '005r_1',
 '005v_1',
 '006r_1',
 '006v_2',
 '007r_1',
 '007v_3',
 '008v_2',
 '008v_5',
 '010r_5',
 '010v_1',
 '010v_3',
 '010v_4',
 '011r_1',
 '011v_1',
 '012r_1',
 '012r_4',
 '012v_3',
 '013v_2',
 '015v_4',
 '016r_1',
 '016v_1',
 '017r_1',
 '020v_1',
 '021r_2',
 '022v_1',
 '023r_1',
 '023v_1',
 '024v_1',
 '028v_1',
 '029r_1',
 '029v_2',
 '029v_6',
 '030r_2',
 '031r_2',
 '031r_3',
 '032r_3',
 '032v_3',
 '035r_2',
 '035r_3',
 '036v_3',
 '037r_3',
 '037r_4',
 '038r_4',
 '039r_2',
 '039v_3',
 '040r_2',
 '040r_3',
 '041v_3',
 '042r_2',
 '042v_1',
 '042v_2',
 '042v_3',
 '043v_1',
 '044v_2',
 '046r_1',
 '047r_1',
 '047v_1',
 '048v_1',
 '048v_2',
 '049r_1',
 '049v_2',
 '050r_1',
 '050v_2',
 '051r_1',
 '051r_2',
 '052r_1',
 '052v_2',
 '053r_2',
 '053r_4',
 '053v_1',
 '055v_1',
 '055v_2',
 '055v_3',
 '055v_4',
 '056r_1',
 '056v_1',
 '057v_1',
 '057v_2',
 '058v_4',
 '058v_5',
 '059r_2',
 '060r_3',
 '060v_2',
 '061v_1',

In [16]:
iron_water_rose = manuscript.search(material=['iron', 'water'], plant=['rose'], tool=True)
iron_water_rose

['155r_1', '155v_1', '169r_1']
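
The search behavior described above can be sketched over a toy dictionary. Everything here is hypothetical: the entry ids, the data, and the function name `search_sketch`. The sketch assumes that separate keyword arguments must all be satisfied, and that a list of terms matches when an entry contains at least one of them. The second assumption in particular is a guess, since the text above does not say whether all listed terms are required.

```python
# Hypothetical toy data: entry id -> {property type -> list of tagged terms}.
toy_entries = {
    'a_1': {'material': ['iron', 'oil'], 'plant': ['rose'], 'tool': ['brush']},
    'a_2': {'material': ['water'], 'plant': [], 'tool': []},
    'a_3': {'material': ['sand'], 'plant': ['rose'], 'tool': ['mold']},
    'a_4': {'material': [], 'plant': ['rose'], 'tool': []},
}


def search_sketch(entries, **criteria):
    """Return the ids of entries that satisfy every keyword criterion.

    True -> the entry has at least one term of that property type.
    list -> the entry has at least one of the listed terms (an assumption).
    """
    results = []
    for identity, props in entries.items():
        keep = True
        for prop, wanted in criteria.items():
            terms = props.get(prop, [])
            if wanted is True:
                matched = len(terms) > 0
            else:
                matched = any(term in terms for term in wanted)
            if not matched:
                keep = False
                break
        if keep:
            results.append(identity)
    return results


print(search_sketch(toy_entries))                 # no criteria: every id
print(search_sketch(toy_entries, material=True))  # any material at all
print(search_sketch(toy_entries, material=['iron', 'water']))
print(search_sketch(toy_entries, material=['iron', 'water'], plant=['rose'], tool=True))
```

Requiring every keyword to match is at least consistent with the outputs above, where adding `plant=['rose']` and `tool=True` narrows the result to three entries that also appear in `rose_entries`.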

This list of identities can be converted into a manuscript object containing only the listed entries by passing the list as an argument when creating a new `BnF` object.

In [17]:
rose_manuscript = BnF(rose_entries)

In [18]:
for identity, entry in rose_manuscript.entries.items():
    print(identity, entry.title['tl'])

010r_1 Counterfeit jasper
010r_3 Roses
116r_4 A way to grind enamel gold very delicate gold rose leaves and others
120v_6 Keeping dry flowers in the same state all year
129r_4 Molded roses
154v_4 Reinforcing flowers and delicate things
155r_1 Molding a rose
155v_1 Rose
159v_2 Carnation
169r_1 [List of processes]


## Custom Data

Let's take this reduced manuscript and generate a data table representing its contents. One advantage of using Jupyter Notebook is that it allows us to import open source packages in a browser without downloading them to a machine. The pandas package allows us to easily generate, manipulate, and save spreadsheets, which it calls DataFrames.

In [22]:
df = rose_manuscript.tablefy()
df

Unnamed: 0,entry,folio,folio_display,div_id,categories,heading_tc,heading_tcn,heading_tl,animal,body_part,...,measurement,music,plant,place,personal_name,profession,sensory,tool,time,weapon
0,<recipe.Recipe object at 0x10be26a10>,010r,10r,010r_1,stones;decorative,Jaspe contrefaict,Jaspe contrefaict,Counterfeit jasper,,,...,,,roses; aspic,,,,,limer; rabot,,
1,<recipe.Recipe object at 0x10be1a410>,010r,10r,010r_3,decorative,Roses,Roses,Roses,,,...,,,roses,,,,,,,
2,<recipe.Recipe object at 0x10c7bdf90>,116r,116r,116r_4,metal process,Moyen desmouailler desmailler des foeilles dor...,Moyen desmouailler d’esmailler des foeilles do...,A way to grind enamel gold very delicate gold ...,,,...,,,roses; rose,,,,,,,
3,<recipe.Recipe object at 0x10cd898d0>,120v,120v,120v_6,preserving,Garder fleurs seiches en mesme estat toute la...,Garder fleurs seiches en mesme estat toute l’...,Keeping dry flowers in the same state all year,,poin; doibs; main,...,quand leur saison passe; plusieurs jours; horo...,,roses; soulcis; amaranthe; boufains; fleurs ja...,,,orfevres; verriers,,poin; pinceau; horologe de sable; grands vases...,plusieurs jours; saison; toute lannee; hors de...,
4,<recipe.Recipe object at 0x10c4df490>,129r,129r,129r_4,casting,Roses moulees,Roses moulées,Molded roses,mouches,,...,,,roses; caprier; froment; pensees,,,,,,,
5,<recipe.Recipe object at 0x10c8021d0>,154v,154v,154v_4,casting,Renforcer les fleurs et choses delicates,Renforcer les fleurs et choses delicates,Reinforcing flowers and delicate things,mouche,,...,,,roses; pensee; froment,,,,,petit pinceau,,
6,<recipe.Recipe object at 0x10c802d50>,155r,155r,155r_1,casting,Mouler une rose,Mouler une rose,Molding a rose,,alheine,...,,,rose; fraisiers; rosiers; rosier,,,,,moules; moule; filets; cercle dardille,long temps,
7,<recipe.Recipe object at 0x10c7f4f50>,155v,155v,155v_1,casting,Rose,Rose,Rose,poisson; mouche,,...,,,rose,,,,,petite poincte de fer chaulde; tige de fil de ...,,
8,<recipe.Recipe object at 0x10c80a550>,159v,159v,159v_2,casting,Oeillet,Oeillet,Carnation,,,...,,,rose; soulcy; oeillet,,,,,moule,,
9,<recipe.Recipe object at 0x10c133bd0>,169r,169r,169r_1,lists,[Liste de procédés],[Liste de procédés],[List of processes],tortues; mousches; chancre; poissons; oiseaulx...,,...,,,rose; fraise; vigne; capilli veneris; oeillets...,,,,,chassis; filets,,


New, custom columns can be added by writing a function that takes an entry object as input and returns the value for that column.

In [23]:
def foo(entry):
    return 'foo ' + entry.identity

In [24]:
def first_word_tl(entry):
    return entry.text('tl', xml=False).strip().split(' ')[0]

In [25]:
# Apply each function to every entry object; the results become new columns.
df['foo'] = df['entry'].apply(foo)
df['first_word'] = df['entry'].apply(first_word_tl)
df

Unnamed: 0,entry,folio,folio_display,div_id,categories,heading_tc,heading_tcn,heading_tl,animal,body_part,...,plant,place,personal_name,profession,sensory,tool,time,weapon,foo,first_word
0,<recipe.Recipe object at 0x10be26a10>,010r,10r,010r_1,stones;decorative,Jaspe contrefaict,Jaspe contrefaict,Counterfeit jasper,,,...,roses; aspic,,,,,limer; rabot,,,foo 010r_1,Counterfeit
1,<recipe.Recipe object at 0x10be1a410>,010r,10r,010r_3,decorative,Roses,Roses,Roses,,,...,roses,,,,,,,,foo 010r_3,Roses\n\nThese
2,<recipe.Recipe object at 0x10c7bdf90>,116r,116r,116r_4,metal process,Moyen desmouailler desmailler des foeilles dor...,Moyen desmouailler d’esmailler des foeilles do...,A way to grind enamel gold very delicate gold ...,,,...,roses; rose,,,,,,,,foo 116r_4,A
3,<recipe.Recipe object at 0x10cd898d0>,120v,120v,120v_6,preserving,Garder fleurs seiches en mesme estat toute la...,Garder fleurs seiches en mesme estat toute l’...,Keeping dry flowers in the same state all year,,poin; doibs; main,...,roses; soulcis; amaranthe; boufains; fleurs ja...,,,orfevres; verriers,,poin; pinceau; horologe de sable; grands vases...,plusieurs jours; saison; toute lannee; hors de...,,foo 120v_6,Keeping
4,<recipe.Recipe object at 0x10c4df490>,129r,129r,129r_4,casting,Roses moulees,Roses moulées,Molded roses,mouches,,...,roses; caprier; froment; pensees,,,,,,,,foo 129r_4,Molded
5,<recipe.Recipe object at 0x10c8021d0>,154v,154v,154v_4,casting,Renforcer les fleurs et choses delicates,Renforcer les fleurs et choses delicates,Reinforcing flowers and delicate things,mouche,,...,roses; pensee; froment,,,,,petit pinceau,,,foo 154v_4,Reinforcing
6,<recipe.Recipe object at 0x10c802d50>,155r,155r,155r_1,casting,Mouler une rose,Mouler une rose,Molding a rose,,alheine,...,rose; fraisiers; rosiers; rosier,,,,,moules; moule; filets; cercle dardille,long temps,,foo 155r_1,Molding
7,<recipe.Recipe object at 0x10c7f4f50>,155v,155v,155v_1,casting,Rose,Rose,Rose,poisson; mouche,,...,rose,,,,,petite poincte de fer chaulde; tige de fil de ...,,,foo 155v_1,Rose\n\nBecause
8,<recipe.Recipe object at 0x10c80a550>,159v,159v,159v_2,casting,Oeillet,Oeillet,Carnation,,,...,rose; soulcy; oeillet,,,,,moule,,,foo 159v_2,Carnation\n\nIt
9,<recipe.Recipe object at 0x10c133bd0>,169r,169r,169r_1,lists,[Liste de procédés],[Liste de procédés],[List of processes],tortues; mousches; chancre; poissons; oiseaulx...,,...,rose; fraise; vigne; capilli veneris; oeillets...,,,,,chassis; filets,,,foo 169r_1,List
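
The `first_word` column above contains values such as `Roses\n\nThese`, because `split(' ')` breaks only on the space character and leaves newlines in place. If the intent is the first word proper, a variant that splits on any whitespace may be preferable. The function name `first_word` and the sample string below are made up for illustration.

```python
def first_word(text):
    """Return the first whitespace-delimited word of text, or '' if there is none."""
    words = text.split()  # no argument: split on any run of spaces, tabs, or newlines
    return words[0] if words else ''


# Made-up sample with the same shape as the entries above: heading, blank line, body.
sample = 'Roses\n\nThese words follow the heading'
print(repr(sample.strip().split(' ')[0]))  # splitting on ' ' only
print(repr(first_word(sample)))            # splitting on any whitespace
```

The `words[0] if words else ''` guard also avoids an `IndexError` on text that is empty or contains only whitespace.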


Pandas can handle much of the difficult file processing required to save this data. With the two lines below, we can drop the column that holds the recipe objects (displayed as memory addresses) and convert the DataFrame to .csv format. Called without a file path, as here, `df.to_csv()` returns the CSV text as a string. Called with a relative file path in a script run from the command line, it saves the file in the current working directory.

In [21]:
df = df.drop(columns=['entry'])
df.to_csv(index=False)

'folio,folio_display,div_id,categories,heading_tc,heading_tcn,heading_tl,margins,del_tags,figures,animal,body_part,currency,definition,environment,material,medical,measurement,music,plant,place,personal_name,profession,sensory,tool,time,weapon,foo,first_word\n010r,10r,010r_1,stones;decorative,Jaspe contrefaict,Jaspe contrefaict,Counterfeit jasper,3,horn,unknown,,,,,,jaspe; gomme armoniac; jaspe contrefaict; roses; colle forte; vinaigre; or &amp; argent en foeille; verre; huile daspic; corne; limaille de talc ou despingle; vernis daspic; pierres; jaspe grumeleux; laines a gros poil; huiler; argent; foeille destaing; tourmentine claire; cornalines; ciment; tourmentine,,,,roses; aspic,,,,,rabot; limer,,,foo 010r_1,Counterfeit\n010r,10r,010r_3,decorative,Roses,Roses,Roses,3,<ill/>,unknown,,,,,,corne de lanternes; parchemin,,,,roses,,,,,,,,foo 010r_3,"Roses\n\nThese"\n116r,116r,116r_4,metal process,Moyen desmouailler desmailler des foeilles dore roses dor fort subtilles et aultres,Moyen des