# RSA DDL 2022:  Using Jupyter Notebooks and Pandas to Work with Data

Richard Freedman (Haverford College)<br>
rfreedma@haverford.edu  <br>
https://orcid.org/0000-0001-5550-3674 <br>

Daniel Russo-Batterham (The University of Melbourne) <br>
daniel.russo@unimelb.edu.au <br>
https://orcid.org/0000-0001-7550-528X <br>


## A. Introduction


* **Jupyter Notebooks** allow anyone to run **Python** code in any browser, without the need to use the terminal or command line

* **Jupyter Notebooks** are organized as 'cells', which can be **commentary** (like this one, which is static) or **code** (those below, which produce dynamic output in the form of charts or tabular data frames).

* To run an individual cell, use the **`arrow/run`** command at the top of the Notebook, or just press **`Shift + Enter`** on your keyboard.


### A.1 Import Libraries

Before we can do anything, we must import the Python tools we need:

* **pandas** is the main library
* **matplotlib** will help us render graphs and charts
* **folium** will help us with GIS and maps
* there are dozens of other standard libraries; if they work with Python, they will work with Jupyter and Pandas

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import networkx as nx  # needed for nx.Graph() in section H
import folium
import numpy as np
from datetime import datetime  # needed by parse_date() in section G

### A.2  URLs of our Data Sets

In [2]:
beatles = 'data_files/TheBeatlesCleaned.csv'
emblematica_books = "data_files/bookObjects.csv"
emblematica_emblems = "data_files/emblemObjects.csv"
decima_data = "data_files/Decima_data.csv"
bartoli_letters = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRhB4rw1W7DMgO8EEn8jhvRVuIGe-NvcTVMSwCXKxZQ3_B8LUB8OmLhMKAbFap8K3VqcfEQxG8ZKYvq/pub?output=csv'

du_chemin_poems = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRrNA-LgA5pyIIB-FPDuIBU5eZy6dnL5WHHIkRDJIhi-G9fZPemaa6s8h_bNKg5mSx6yi7gFYdIk64w/pub?output=csv'

compagnie_data = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vRSKiDNpqPfMPLcLz-W6amuDsVZabZuFWqcrpiXNVsZecxRKlFCkoFlCGYkm5YVJaKllH9HWV3FvO-V/pub?output=csv'

### A.3  Make a DataFrame

* The basic unit of work in Pandas is the DataFrame: a table representing the data as columns (with the headings from our CSV files) and rows.
* The left-hand column is called the `index`.  Here it's simply a number, but we can change that.  **NB**:  index starts at "0"!

**Create a DataFrame** by importing one of the CSV files.  We'll also give the frame a name (`books`):

In [3]:
books = pd.read_csv(beatles)
books.dropna()

Unnamed: 0,id,year,album,song,danceability,energy,speechiness,acousticness,liveness,valence,duration_ms
0,1,1963,Please Please Me,I Saw Her Standing There,0.491,0.801,0.0361,0.2700,0.0665,0.971,173947
1,2,1963,Please Please Me,Misery,0.591,0.605,0.0413,0.7070,0.3090,0.882,108547
2,3,1963,Please Please Me,Anna,0.608,0.565,0.0336,0.6350,0.0601,0.835,177133
3,4,1963,Please Please Me,Chains,0.654,0.561,0.0304,0.6080,0.1290,0.929,145080
4,5,1963,Please Please Me,Boys,0.402,0.86,0.0504,0.6070,0.7360,0.822,146440
...,...,...,...,...,...,...,...,...,...,...,...
188,189,1970,Let It Be,ive got a feeling,0.440,0.609,0.0358,0.0715,0.5820,0.364,217560
189,190,1970,Let It Be,one after 909,0.554,0.828,0.0739,0.0307,0.9070,0.888,173960
190,191,1970,Let It Be,the long and winding road,0.299,0.329,0.0279,0.7560,0.0559,0.392,218187
191,192,1970,Let It Be,for you blue,0.880,0.556,0.0855,0.2400,0.2400,0.955,152213
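
Note that `dropna()` returns a new frame rather than changing `books` itself. A minimal sketch, using a small invented table in place of the workshop files (the `toy` frame and its values are assumptions for illustration only):

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file; the workshop data is not used here.
csv_text = "song,year\nMisery,1963\nAnna,\nChains,1963\n"
toy = pd.read_csv(io.StringIO(csv_text))

cleaned = toy.dropna()          # new frame without the incomplete row
print(toy.shape)                # the original frame is unchanged
print(cleaned.shape)
print(list(cleaned.index))      # index labels are kept, so there is a gap
```

To keep the cleaned version, assign it to a name, as in `cleaned = toy.dropna()`.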


### A.4 Inspect the DataFrame

Now we can look at the data in various ways to see what's here:

* `books.head()` shows us just the first five entries (for brevity).  
* We could also see `books.tail()`, or `books.sample(5)` to see a random sample of five.
* `books.shape` will tell us the size of our frame:  how many rows and columns
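
The inspection tools listed above can be tried on any frame. Here is a short sketch on an invented ten-row table (the `toy` frame is an assumption, not workshop data):

```python
import pandas as pd

# Invented toy frame with ten rows, for illustration only.
toy = pd.DataFrame({"n": range(10), "square": [i * i for i in range(10)]})

first = toy.head(3)                     # first three rows
last = toy.tail(2)                      # last two rows
picked = toy.sample(4, random_state=0)  # random_state makes the sample repeatable
rows, cols = toy.shape                  # shape is an attribute, not a method call
print(rows, cols)
```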



In [None]:
books.tail()

Overall size of our dataframe:

In [None]:
books.shape

#### We found some duplicate books, so let's drop one of those

In [None]:
books = books.drop_duplicates()
books.shape
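
Before dropping duplicates it can help to count them, and to decide whether a row is a duplicate only when every column matches or when a single identifier repeats. A rough sketch with invented records (the values are made up, and treating `Book_ID` as the identifier is an assumption):

```python
import pandas as pd

# Invented records for illustration only.
toy = pd.DataFrame({
    "Book_ID": ["E001", "E002", "E002", "E003"],
    "Title": ["Emblemata", "Symbola", "Symbola", "Devises"],
})

print(toy.duplicated().sum())             # number of fully repeated rows
deduped = toy.drop_duplicates()           # keeps the first copy of each row
by_id = toy.drop_duplicates(subset="Book_ID", keep="first")
print(deduped.shape, by_id.shape)
```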

## B. Working with Column Data

### B.1 Get Columns and Values

* We now start to look more closely at the columns
* `books.columns` will give us a list of the column names
* `books.columns.sort_values()` will put those names in alphabetical order
* Select one:  `books["Place_of_Publication"]`
* Count the number of unique values: `books["Place_of_Publication"].nunique()`
* Count the number of entries for each value:  `books["Place_of_Publication"].value_counts()`

#### We can list out the columns like this:

In [None]:
books.columns

#### And put them in alphabetical order

In [None]:
books.columns.sort_values()

#### An individual column is represented as a "Series"

In [None]:
books["Place_of_Publication"]

In [None]:
books["Place_of_Publication"].nunique()

#### We get a count of the places of publication as follows. Note, however, the subtle orthographical differences:

In [None]:
books["Place_of_Publication"].value_counts()
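
One possible way to deal with such spelling differences is to tidy the strings and then map known variants to a preferred form. The sketch below uses invented place names, and the `preferred` mapping is purely an assumption; a real mapping would have to be built by inspecting the actual values:

```python
import pandas as pd

# Invented place names with the kind of variation a catalogue can contain.
places = pd.Series(["Antwerp", "antwerp ", "Antwerpen", "Lyon", "Lyons", "Lyon"])

# Step 1: trim whitespace and unify capitalisation.
tidy = places.str.strip().str.title()

# Step 2: map known variants to one preferred form (this mapping is an assumption).
preferred = {"Antwerpen": "Antwerp", "Lyons": "Lyon"}
normalized = tidy.replace(preferred)

print(places.nunique(), normalized.nunique())
print(normalized.value_counts())
```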

___

### B.2  What about all those Author columns?

* Notice that in the *Emblematica* data an individual book can have more than one author.  Each is in a different column.


In [None]:
books.head()

#### Using a Python list comprehension, we can find just these columns:

In [None]:
author_columns = [c for c in books.columns if c.startswith("Author")]
author_columns

In [None]:
books[author_columns].head()

#### And now 'stack' the values for all these columns into one **Series**

In [None]:
books[author_columns].stack()

#### And count the items in the stacked set

In [None]:
books[author_columns].stack().value_counts()

#### All the authors, sorted alphabetically as a list

In [None]:
sorted(books[author_columns].stack().unique())
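
The stacking steps above can be tried on a small invented table. The column names `Author_1` and `Author_2` are assumptions here, and the explicit `dropna()` is added because newer pandas versions may keep empty cells when stacking:

```python
import pandas as pd

# Invented rows; the "Author_1"/"Author_2" column names are assumptions.
toy = pd.DataFrame({
    "Title": ["Book A", "Book B", "Book C"],
    "Author_1": ["Alciato", "Camerarius", "Alciato"],
    "Author_2": [None, "Alciato", None],
})
author_columns = [c for c in toy.columns if c.startswith("Author")]

# Stack all author columns into one Series, then discard any empty cells.
stacked = toy[author_columns].stack().dropna()
counts = stacked.value_counts()
print(counts)
print(sorted(stacked.unique()))
```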

### B.3 See if you can get a list of contributors 

It is very similar to the process above:

In [None]:
books.columns

___

## C.  Charts and Graphs

* Through libraries like **matplotlib**, Pandas can quickly produce histograms, charts, and graphs of various kinds (these can even be saved as PNG files for publications)

In [None]:
books["Publication_Date"].sort_values()

#### Here we can create a histogram of the distribution of the years of publication, and sort them into however many 'bins' we like

In [None]:
books["Publication_Date"].hist(figsize=(20, 10), bins=40)
plt.xlabel("Year")
plt.ylabel("Book count")
plt.show()
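
To see roughly what a histogram counts, the same binning can be done as a table with `pd.cut`. This is only a sketch with invented years and hand-picked bin edges; `hist(bins=40)` chooses equal-width bins automatically and treats the bin boundaries slightly differently:

```python
import pandas as pd

# Invented publication years.
years = pd.Series([1531, 1534, 1550, 1581, 1600, 1621, 1695])

# Four bins of fifty years each; pd.cut closes each interval on the right by default.
edges = [1500, 1550, 1600, 1650, 1700]
binned = pd.cut(years, bins=edges)

# Count the years in each bin and list the bins in chronological order.
counts = binned.value_counts().sort_index()
print(counts)
```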

#### Various built-in math functions allow us to run basic statistics.  Libraries like `numpy` permit many more!

In [None]:
books.Publication_Date.mean()
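
Statistics such as `mean()` only make sense on a numeric column. If a date column were read as text (an assumption; it depends on the file), one option is to convert it first with `pd.to_numeric`. A minimal sketch with invented values:

```python
import pandas as pd

# Invented values, including entries that are not plain numbers.
raw = pd.Series(["1531", "1550", "c. 1600", None])

years = pd.to_numeric(raw, errors="coerce")  # unparseable entries become NaN
print(years.mean())        # NaN values are skipped by default
print(years.median())
print(years.isna().sum())  # how many entries could not be converted
```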

___

## D.  More Work with Columns

### D.1 Load the Emblem CSV

In [None]:
emblems = pd.read_csv(emblematica_emblems)

In [None]:
emblems.sample(5)

### D.2  Dropping Columns

* The *From Collection* column is completely empty (see the `NaN` values), so how about we drop it from our DataFrame. 
* We will also drop the **URL** columns to reduce the size of the DataFrame.



In [None]:
emblems = emblems.drop(columns=["From Collection", "URL_for_Emblem_Details", "URL_for_Emblem_Thumbnail", "URL_for_Pictura"])
emblems.head()
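
Rather than naming an empty column by hand, we could also ask Pandas to find columns that hold no data at all. A sketch on an invented frame (the values and URLs are made up):

```python
import pandas as pd

# Invented frame with one column that holds no data at all.
toy = pd.DataFrame({
    "Motto": ["Festina lente", "Concordia"],
    "From Collection": [None, None],
    "URL_for_Pictura": ["http://example.org/1", "http://example.org/2"],
})

print(toy.isna().all())                             # True marks a fully empty column
no_empty = toy.dropna(axis="columns", how="all")    # drop every all-empty column
slim = no_empty.drop(columns=["URL_for_Pictura"])   # drop a column by name
print(list(slim.columns))
```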

#### We found a duplicate book entry (1695) so let's drop it here

### D.3 Exploring the IconClass Data:  Combining the Tags

* Emblematica Online uses the Iconclass vocabularies to classify the images.
* Each emblem can have more than one classification number, and these are stored in separate columns
* Let's: 
 - **find those Iconclass columns** and then 
 - **remove the NaN values** in any cells and then
 - put all the values for each icon in a single cell **'iconclasses'**

 #### More about [Iconclass](http://www.iconclass.org/help/outline)

In [None]:
icon_columns = [c for c in emblems.columns if c.startswith("Iconclass")]
icon_columns

In [None]:
emblems["iconclasses"] = (
    emblems[icon_columns]
        .fillna("")
        .apply(lambda x: [el for el in x if el], axis="columns")
)
emblems["iconclasses"]

In [None]:
emblems.head()

#### The `iloc` indexer lets us see a particular location, in this case the **combined cell of the first row** `[0]`

In [None]:
emblems.iconclasses.iloc[0]
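
The combining step can be checked on a tiny invented table. This sketch is a slight variation on the cell above: it tests each cell with `pd.notna` instead of filling the gaps with empty strings first. The codes and column names are assumptions for illustration:

```python
import pandas as pd

# Invented Iconclass-style codes; column names are assumptions.
toy = pd.DataFrame({
    "Iconclass_1": ["25F23", "24A6", "31A"],
    "Iconclass_2": ["25F3", None, None],
})
icon_columns = [c for c in toy.columns if c.startswith("Iconclass")]

# For each row, keep only the cells that actually hold a value.
toy["iconclasses"] = toy[icon_columns].apply(
    lambda row: [tag for tag in row if pd.notna(tag)], axis="columns"
)
print(toy["iconclasses"].iloc[0])
```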

### D.4 Exploring the IconClass Data:  Searching within Tag Data

* Iconclass data (http://www.iconclass.org/help/outline) are organized hierarchically.  The first digit represents a broad category ("2" = nature), and subsequent digits and letters represent sub-types.  "24" is the heavens; "25" is the earth.  
* "25F" is the category 'animal'.  So we can make a new **Boolean column** (which will be either True or False) for any row in which that tag string appears. 


In [None]:
emblems["is_animal"] = emblems.iconclasses.apply(lambda tag_list: any([tag.startswith("25F") for tag in tag_list]))
emblems["is_animal"]

#### The dataframe can now be 'filtered' to show only those rows where "is_animal" is True (or False!)

* Scroll to the far right in the following dataframe to see the Boolean column

In [None]:
results = emblems[emblems.is_animal]
results
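
Here is the same Boolean-column idea on an invented frame, including the inverted filter and a quick count (all names and values are illustrative assumptions):

```python
import pandas as pd

# Invented rows with a ready-made list column.
toy = pd.DataFrame({
    "Motto": ["A", "B", "C"],
    "iconclasses": [["25F23", "31A"], ["24A6"], []],
})

# any() stops at the first matching tag; an empty list gives False.
toy["is_animal"] = toy["iconclasses"].apply(
    lambda tags: any(tag.startswith("25F") for tag in tags)
)

animals = toy[toy["is_animal"]]        # rows where the flag is True
others = toy[~toy["is_animal"]]        # ~ inverts the Boolean column
print(toy["is_animal"].sum())          # True counts as 1
```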

In [None]:
results.to_csv("my_animals.csv")
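
One thing to keep in mind when saving: a column that holds Python lists is written to CSV as plain text. A small sketch, using an in-memory buffer instead of a real file (the frame is invented):

```python
import io
import pandas as pd

# Invented one-row frame with a list column.
toy = pd.DataFrame({"Motto": ["A"], "iconclasses": [["25F23", "31A"]]})

buffer = io.StringIO()              # in-memory stand-in for a file on disk
toy.to_csv(buffer, index=False)     # index=False leaves out the row numbers
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.columns))
print(type(reloaded["iconclasses"].iloc[0]))  # the list comes back as a string
```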

___

## E.  Combining, Joining, and Merging DataFrames

### E.1 We can merge the emblem and book dataframes on the basis of some shared data

* In this case the **Published_In** column in the emblems list corresponds to the **Book_Id**
* In Pandas, the two frames to be joined are called "left" and "right"
* The "suffixes" argument tells Pandas how to handle fields that are otherwise named identically in the source files

In [None]:
emblems_combined = pd.merge(left=emblems, 
         right=books, 
         left_on="Published_In", 
         right_on="Book_ID", 
         how="left", 
         suffixes=["_emblem", "_book"])

emblems_combined.tail()
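
To see what a left merge does with unmatched rows, it can help to try it on two tiny invented frames. Only the key column names follow the cell above; the `indicator=True` option is an addition for checking which rows found a partner:

```python
import pandas as pd

# Invented frames for illustration only.
emblems_toy = pd.DataFrame({
    "Title": ["Emblem 1", "Emblem 2", "Emblem 3"],
    "Published_In": ["B1", "B1", "B9"],
})
books_toy = pd.DataFrame({
    "Book_ID": ["B1", "B2"],
    "Title": ["Emblemata", "Symbola"],
})

merged = pd.merge(
    left=emblems_toy, right=books_toy,
    left_on="Published_In", right_on="Book_ID",
    how="left",                       # keep every emblem, matched or not
    suffixes=["_emblem", "_book"],    # both frames have a "Title" column
    indicator=True,                   # adds a "_merge" column for checking matches
)
print(merged[["Title_emblem", "Title_book", "_merge"]])
```

Rows marked `left_only` are emblems whose book could not be found in the right-hand frame.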

In [None]:
emblems_combined.columns


In [None]:
emblems_combined.shape


#### Let's see where emblems with animals were published:

In [None]:
emblems_combined[emblems_combined.is_animal]["Place_of_Publication"].sort_values().unique()

___

## F.  Decima Data

* Let's load the Decima data

In [3]:
decima = pd.read_csv(decima_data)
decima1 = decima

#### Should we add something about renaming columns?

In [4]:
decima1.loc[0]['OBJECTID'] = 'https://docs.google.com/document/d/1alBeVyXkABBwWlTiCMGBLB3hoO5f4sMm4wMnTVzDM4Y/edit#'

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  decima1.loc[0]['OBJECTID'] = 'https://docs.google.com/document/d/1alBeVyXkABBwWlTiCMGBLB3hoO5f4sMm4wMnTVzDM4Y/edit#'
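
The message above is Pandas warning about *chained indexing*: `decima1.loc[0]['OBJECTID']` first pulls out row 0 and then assigns into that temporary object, so the frame itself may not change. Note also that `decima1 = decima` gives the same frame a second name rather than making a copy. A minimal sketch on an invented frame (names and values are assumptions) of how a single `.loc[row, column]` assignment on an explicit copy behaves:

```python
import pandas as pd

# Invented frame standing in for the Decima data.
original = pd.DataFrame({"OBJECTID": [2073, 3219], "Street": ["Via A", "Via B"]})

alias = original            # same object under a second name
working = original.copy()   # independent copy

# One .loc call with row label and column name sets the value in place.
working.loc[0, "OBJECTID"] = 9999

print(alias is original)                 # True: no copy was made
print(working.loc[0, "OBJECTID"])        # changed in the copy
print(original.loc[0, "OBJECTID"])       # original untouched
```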


In [5]:
decima1

Unnamed: 0,OBJECTID,Y,X,Folio,Entry Number,Entry Number.1,Owner in Source,Standardized Alternative,Source,"Age of Confraternity (if estimate, *)",...,Building Description,Quartiere,Popolo,Street,Location Information,OBJECTID.1,Y.1,X.1,Folio.1,Entry Number.2
0,2073,43.770558,11.270464,179v,2784,2784,"Cappella, o vero, compagnia degl'Azzurri di S....",S. Maria della Neve,Henderson P&C #114,1445*,...,casa,s. Giovanni,S. Piero,Borgo della Porta alla Croce,contigua alla suddetta et a una delle Monache ...,2073,43.770558,11.270464,179v,2784
1,3219,43.770336,11.264497,195r,3033,3033,Compagnia che sotto S. Piero Maggiore,S Pier Maggiore,Henderson P&C #137,,...,casa,s. Giovanni,S. Ambrogio,Via dell'Agnolo,contigua alla suddetta et a una di Francesco d...,3219,43.770336,11.264497,195r,3033
2,2203,43.771118,11.269190,157,2421,2421,Compagnia che sotto S. Piero Maggiore,S Pier Maggiore,Henderson P&C #137,,...,casa in cappella,s. Giovanni,S. Piero Maggiore,Borgo della Porta alla Croce,contigua alla portaccia di Borgo alla croce et...,2203,43.771118,11.269190,157,2421
3,147,43.775931,11.252323,44r,681,681,Compagnia de' Cieci,S Maria del Giglio detto de'ciechi [S Ma. de P...,Henderson P&C 104,1324,...,Casa,s. Giovanni,S. Lorenzo,Via S. Jacopo in Campo Corbolino - Cella di Ci...,contigua a la soprascritta suoltando al lato d...,147,43.775931,11.252323,44r,681
4,7343,43.765810,11.247119,62v,1035,1035,Compagnia degl'Innocenti,SS Innocenti,Henderson P&C #78,,...,una casa,s. Spirito,S. Felice in piazza,Via Mazzetta,in via mazzetta a 1o Rede di Piero Dei a 2o An...,7343,43.765810,11.247119,62v,1035
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
128,2963,43.776060,11.260334,133r,2059,2059,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,Casa,s. Giovanni,S. Michele Visdomini,Via de Servi - Piazza de Servi,Posta nella testa della via de servi sul canto...,2963,43.776060,11.260334,133r,2059
129,3296,43.771350,11.262617,203r,3163,3163,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,casa con bottega,s. Giovanni,S. Piero Maggiore,Canto alle Rondine,contigua alla suddetta et a Luca et Giovanni d...,3296,43.771350,11.262617,203r,3163
130,3455,43.770749,11.262444,201v,3143,3143,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,una casa,s. Giovanni,S. Piero Maggiore,Via Santucce,contigua alla suddetta et a Giovanni di Raffae...,3455,43.770749,11.262444,201v,3143
131,4120,43.769050,11.260390,64v,812,812,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,una casa con una bottega sotto di ciabattino,s. Croce,,Piazza S. Croce,sulla piazza di s. Croce a 1o il macello del F...,4120,43.769050,11.260390,64v,812


### F.1 Slicing DataFrames

#### Another way to slice data:  selected rows

In [6]:
decima.iloc[:10]

Unnamed: 0,OBJECTID,Y,X,Folio,Entry Number,Entry Number.1,Owner in Source,Standardized Alternative,Source,"Age of Confraternity (if estimate, *)",...,Building Description,Quartiere,Popolo,Street,Location Information,OBJECTID.1,Y.1,X.1,Folio.1,Entry Number.2
0,2073,43.770558,11.270464,179v,2784,2784,"Cappella, o vero, compagnia degl'Azzurri di S....",S. Maria della Neve,Henderson P&C #114,1445*,...,casa,s. Giovanni,S. Piero,Borgo della Porta alla Croce,contigua alla suddetta et a una delle Monache ...,2073,43.770558,11.270464,179v,2784
1,3219,43.770336,11.264497,195r,3033,3033,Compagnia che sotto S. Piero Maggiore,S Pier Maggiore,Henderson P&C #137,,...,casa,s. Giovanni,S. Ambrogio,Via dell'Agnolo,contigua alla suddetta et a una di Francesco d...,3219,43.770336,11.264497,195r,3033
2,2203,43.771118,11.26919,157,2421,2421,Compagnia che sotto S. Piero Maggiore,S Pier Maggiore,Henderson P&C #137,,...,casa in cappella,s. Giovanni,S. Piero Maggiore,Borgo della Porta alla Croce,contigua alla portaccia di Borgo alla croce et...,2203,43.771118,11.26919,157,2421
3,147,43.775931,11.252323,44r,681,681,Compagnia de' Cieci,S Maria del Giglio detto de'ciechi [S Ma. de P...,Henderson P&C 104,1324,...,Casa,s. Giovanni,S. Lorenzo,Via S. Jacopo in Campo Corbolino - Cella di Ci...,contigua a la soprascritta suoltando al lato d...,147,43.775931,11.252323,44r,681
4,7343,43.76581,11.247119,62v,1035,1035,Compagnia degl'Innocenti,SS Innocenti,Henderson P&C #78,,...,una casa,s. Spirito,S. Felice in piazza,Via Mazzetta,in via mazzetta a 1o Rede di Piero Dei a 2o An...,7343,43.76581,11.247119,62v,1035
5,8476,43.767438,11.244969,120v,1874,1874,Compagnia degl'Innocenti di s. Maria Novella,SS Innocenti,Henderson P&C #92,,...,una casa,s. Spirito,S. Friano,Via d' Ardiglione,a primo Piero di Raffaello a secondo ser Alama...,8476,43.767438,11.244969,120v,1874
6,5514,43.772942,11.247446,37v,642,642,Compagnia degl'Innocenti di s. Maria Novella,SS Innocenti,Henderson P&C #92,,...,Casa,s. Maria Novella,s. Paolo,Via nuova da s. Paulo,contigua a la suddetta et a una del capitolo d...,5514,43.772942,11.247446,37v,642
7,8102,43.768004,11.240896,116v,1814,1814,Compagnia degli Genovesi nel Carmine,S Sebastiano de'Genovesi,Henderson P&C #147,1474,...,una casa,s. Spirito,S. Friano,Via s. Salvadore,nella via di s. Salvatore a 1o Bartolomeo di B...,8102,43.768004,11.240896,116v,1814
8,2030,43.770728,11.270409,180r,2786,2786,Compagnia dei Bianchi di S. Ambrogio,S Maria delle Laude e di S Ambrogio,Henderson P&C #105 [or 121],1466,...,casa con un poco di bottega,s. Giovanni,S. Piero,Borgo della Porta alla Croce,"Contigua alla suddetta et al chiasso della, os...",2030,43.770728,11.270409,180r,2786
9,400,43.777147,11.254293,35v,534,534,Compagnia dei Concordi,S Concordia [property on S Orsola = close to c...,Henderson P&C #42,<1429,...,Casa,s. Giovanni,S. Lorenzo,Via S. Orsola,contigua a la suddetta et a gioGualberto di Ra...,400,43.777147,11.254293,35v,534
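
`iloc` selects by position, while its sibling `loc` selects by index label. With the default numeric index the two look alike, so this sketch uses an invented frame with a different index to make the contrast visible:

```python
import pandas as pd

# Invented frame with a non-default index.
toy = pd.DataFrame(
    {"Street": ["Via A", "Via B", "Via C", "Via D"]},
    index=[10, 20, 30, 40],
)

by_position = toy.iloc[:2]        # first two rows, end position excluded
by_label = toy.loc[10:30]         # label slice, end label included
one_cell = toy.iloc[0, 0]         # row position 0, column position 0
print(len(by_position), len(by_label), one_cell)
```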


In [7]:
decima.shape

(133, 44)

#### Check the column names

In [8]:
decima.columns

Index(['OBJECTID', 'Y', 'X', 'Folio', 'Entry Number', 'Entry Number.1',
       'Owner in Source', 'Standardized Alternative', 'Source',
       'Age of Confraternity (if estimate, *)', 'Type', 'Property Type',
       'Rent - Lire', 'Rent - Scudi', 'Value', 'Male Tenants',
       'Female Tenants', 'Total Residents', 'Contract Description', 'Contract',
       'Owner Type', 'Other Owner Name', 'Tenant', 'Tenant Gender',
       'Tenant Occupation', 'Resident 1 Name', 'Resident 1 Gender',
       'Resident 1 Occupation', 'Resident 2 Name', 'Resident 2 Gender',
       'Resident 2 Occupation', 'Resident 3 Name', 'Resident 3 Gender',
       'Resident 3 Occupation', 'Building Description', 'Quartiere', 'Popolo',
       'Street', 'Location Information', 'OBJECTID.1', 'Y.1', 'X.1', 'Folio.1',
       'Entry Number.2'],
      dtype='object')

### F.2 Groupby Functions

* With **groupby** Pandas will let us find all the **neighborhoods** in which members of particular **trades** lived

In [9]:
decima.groupby("Tenant Occupation").Quartiere.unique()

Tenant Occupation
barbiere                                                      [s. Spirito]
battilano                              [s. Giovanni, s. Croce, s. Spirito]
beccaio                                            [s. Croce, s. Giovanni]
calzolaio                                                    [s. Giovanni]
cappellaio                                                   [s. Giovanni]
cartolaio                                                       [s. Croce]
ciabattino                                         [s. Giovanni, s. Croce]
coltellinaio                                                    [s. Croce]
contadino                                                       [s. Croce]
curandaio                                                    [s. Giovanni]
divettino                                                    [s. Giovanni]
donzello                                                     [s. Giovanni]
esattore                                                     [s. Giovanni]
fattori

In [10]:
decima.groupby("Quartiere")["Tenant Occupation"].value_counts()

Quartiere         Tenant Occupation    
s. Croce          battilano                2
                  beccaio                  1
                  cartolaio                1
                  ciabattino               1
                  coltellinaio             1
                  contadino                1
                  legnaiuolo               1
                  muratore                 1
                  occhialaio               1
                  portatore                1
                  prete e cappellano       1
                  servo                    1
                  spedaliero               1
                  tessitore                1
                  tessitore di drappi      1
                  votapozzi                1
s. Giovanni       tessitore                7
                  legnaiuolo               2
                  rigattiere               2
                  battilano                1
                  beccaio                  1
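
Both groupby patterns above can be tried on a handful of invented rows. The column names follow the Decima frame; the rows themselves are made up:

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "Quartiere": ["s. Croce", "s. Croce", "s. Giovanni", "s. Giovanni", "s. Giovanni"],
    "Tenant Occupation": ["battilano", "beccaio", "tessitore", "tessitore", "battilano"],
})

# In which quarters does each trade appear?
where = toy.groupby("Tenant Occupation")["Quartiere"].unique()

# How many tenants of each trade live in each quarter?
per_quarter = toy.groupby("Quartiere")["Tenant Occupation"].value_counts()

print(where["battilano"])
print(per_quarter.loc[("s. Giovanni", "tessitore")])
```

The second result has a two-level index, so an individual count is looked up with a `(quarter, trade)` pair.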
               

### F.3  Mapping with folium

* Decima includes geographical data in the form of longitude and latitude

#### Since the X and Y columns represent longitude and latitude, we can use the `mean` method to find the center!

In [11]:
X, Y = decima["X"].mean(), decima["Y"].mean()
print(X, Y)

11.255879475338345 43.772051565563906
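
Note that the cell above prints longitude (X) first, while folium expects `[latitude, longitude]`, which is why the map cell below passes `[Y, X]`. As a sketch with invented coordinates, the same columns can also give the corners of a bounding box, which could be handed to a method such as folium's `fit_bounds`:

```python
import pandas as pd

# Invented coordinates, roughly in the range of the values printed above.
toy = pd.DataFrame({"Y": [43.7706, 43.7759, 43.7658], "X": [11.2705, 11.2523, 11.2471]})

center = [toy["Y"].mean(), toy["X"].mean()]      # latitude first, then longitude
south_west = [toy["Y"].min(), toy["X"].min()]    # lower-left corner of the box
north_east = [toy["Y"].max(), toy["X"].max()]    # upper-right corner of the box
print(center, south_west, north_east)
```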


#### The folium library will let us create maps directly in Jupyter NBs

- Notice that the map is zoomable and interactive (linked back to the data in our frame)

In [16]:
our_map = folium.Map(location=[Y, X], zoom_start=15)
our_map

In [17]:
decima["popup_column"] = decima.apply(lambda row: f"""
        <ul>
            <li><strong>ID: </strong>{row["OBJECTID"]}</li>
            <li><strong>Source: </strong>{row["Source"]}</li>
            <li><strong>Name: </strong>{row["Tenant Occupation"]}</li>
        </ul>
    """
, axis="columns")
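
Some values in the Decima frame contain characters such as `<` (for example `<1429` in the age column), which have a special meaning in HTML. One cautious option is to escape values before placing them in a popup. A sketch on invented rows, where the short column name `Age` and the helper `make_popup` are assumptions:

```python
import html
import pandas as pd

# Invented rows; "<1429" contains a character with meaning in HTML.
toy = pd.DataFrame({
    "OBJECTID": [400, 147],
    "Age": ["<1429", "1324"],
})

def make_popup(row):
    # html.escape turns <, > and & into entities so browsers show them as text.
    age = html.escape(str(row["Age"]))
    return f"<ul><li><strong>ID: </strong>{row['OBJECTID']}</li><li><strong>Age: </strong>{age}</li></ul>"

toy["popup_column"] = toy.apply(make_popup, axis="columns")
print(toy["popup_column"].iloc[0])
```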

In [20]:
decima

Unnamed: 0,OBJECTID,Y,X,Folio,Entry Number,Entry Number.1,Owner in Source,Standardized Alternative,Source,"Age of Confraternity (if estimate, *)",...,Quartiere,Popolo,Street,Location Information,OBJECTID.1,Y.1,X.1,Folio.1,Entry Number.2,popup_column
0,2073,43.770558,11.270464,179v,2784,2784,"Cappella, o vero, compagnia degl'Azzurri di S....",S. Maria della Neve,Henderson P&C #114,1445*,...,s. Giovanni,S. Piero,Borgo della Porta alla Croce,contigua alla suddetta et a una delle Monache ...,2073,43.770558,11.270464,179v,2784,\n <ul>\n <li><strong>ID: </...
1,3219,43.770336,11.264497,195r,3033,3033,Compagnia che sotto S. Piero Maggiore,S Pier Maggiore,Henderson P&C #137,,...,s. Giovanni,S. Ambrogio,Via dell'Agnolo,contigua alla suddetta et a una di Francesco d...,3219,43.770336,11.264497,195r,3033,\n <ul>\n <li><strong>ID: </...
2,2203,43.771118,11.269190,157,2421,2421,Compagnia che sotto S. Piero Maggiore,S Pier Maggiore,Henderson P&C #137,,...,s. Giovanni,S. Piero Maggiore,Borgo della Porta alla Croce,contigua alla portaccia di Borgo alla croce et...,2203,43.771118,11.269190,157,2421,\n <ul>\n <li><strong>ID: </...
3,147,43.775931,11.252323,44r,681,681,Compagnia de' Cieci,S Maria del Giglio detto de'ciechi [S Ma. de P...,Henderson P&C 104,1324,...,s. Giovanni,S. Lorenzo,Via S. Jacopo in Campo Corbolino - Cella di Ci...,contigua a la soprascritta suoltando al lato d...,147,43.775931,11.252323,44r,681,\n <ul>\n <li><strong>ID: </...
4,7343,43.765810,11.247119,62v,1035,1035,Compagnia degl'Innocenti,SS Innocenti,Henderson P&C #78,,...,s. Spirito,S. Felice in piazza,Via Mazzetta,in via mazzetta a 1o Rede di Piero Dei a 2o An...,7343,43.765810,11.247119,62v,1035,\n <ul>\n <li><strong>ID: </...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
128,2963,43.776060,11.260334,133r,2059,2059,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,s. Giovanni,S. Michele Visdomini,Via de Servi - Piazza de Servi,Posta nella testa della via de servi sul canto...,2963,43.776060,11.260334,133r,2059,\n <ul>\n <li><strong>ID: </...
129,3296,43.771350,11.262617,203r,3163,3163,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,s. Giovanni,S. Piero Maggiore,Canto alle Rondine,contigua alla suddetta et a Luca et Giovanni d...,3296,43.771350,11.262617,203r,3163,\n <ul>\n <li><strong>ID: </...
130,3455,43.770749,11.262444,201v,3143,3143,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,s. Giovanni,S. Piero Maggiore,Via Santucce,contigua alla suddetta et a Giovanni di Raffae...,3455,43.770749,11.262444,201v,3143,\n <ul>\n <li><strong>ID: </...
131,4120,43.769050,11.260390,64v,812,812,Orsanmichele,Madonna di Orsanmichele,Henderson P&C #92,1291,...,s. Croce,,Piazza S. Croce,sulla piazza di s. Croce a 1o il macello del F...,4120,43.769050,11.260390,64v,812,\n <ul>\n <li><strong>ID: </...


In [18]:
decima.apply(lambda row: folium.Marker(
                                    location=(row["Y"], row["X"]),
                                    popup=row["popup_column"]
                                    ).add_to(our_map), axis=1);

In [19]:
our_map

## G. Bartoli Letters

In [None]:
bartoli = pd.read_csv(bartoli_letters)

In [None]:
bartoli.head()

In [None]:
def parse_date(date_string):
    try:
        return datetime.strptime(date_string, "%d/%m/%Y").date()
    except ValueError as e:
        print(e)
        return np.nan

In [None]:
bartoli["parsed_date"] = bartoli["Modern Date"].dropna().apply(parse_date)
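
Parsing with Python's own `datetime`, as above, has one practical advantage for historical material: in many Pandas versions the default nanosecond timestamps cannot represent dates before late 1677, whereas Python `date` objects can. A self-contained sketch of the same approach on invented dates (one of them impossible):

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Invented dates in day/month/year form, one of them malformed.
raw = pd.Series(["03/07/1550", "31/02/1551", "15/11/1562", None])

def parse_date(date_string):
    try:
        return datetime.strptime(date_string, "%d/%m/%Y").date()
    except ValueError:
        return np.nan   # impossible or malformed dates become missing values

parsed = raw.dropna().apply(parse_date)           # skip empty cells, parse the rest
years = parsed.dropna().apply(lambda d: d.year)   # pull the year out of each date
print(years.tolist())
```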


In [None]:
bartoli.sort_values("parsed_date").head(20)

In [None]:
_filter_Bartoli = bartoli["Sender"].str.contains("Bartoli")


In [None]:
bartoli.parsed_date.dropna().apply(lambda x: x.year)

In [None]:
bartoli.parsed_date.dropna().apply(lambda x: x.year).value_counts()

#### Find Documents of Type 'Avviso'

In [None]:
bartoli["Sender"].str.contains("Avviso")

In [None]:
_is_Avviso = bartoli["Sender"].str.contains("Avviso")
_is_Avviso

####  Now reverse the Boolean series (so that Avviso documents are false)

In [None]:
~_is_Avviso

In [None]:
bartoli[~_is_Avviso]
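
If a text column contains empty cells, `str.contains` returns a missing value for them, and the result is no longer purely True/False. Passing `na=False` is one way around this. A sketch with invented senders:

```python
import pandas as pd

# Invented senders, including one missing value.
sender = pd.Series(["Avviso da Roma", "Cosimo Bartoli", None, "Avviso da Venezia"])

# na=False treats missing senders as "no match", so the result is purely Boolean.
is_avviso = sender.str.contains("Avviso", na=False)

print(is_avviso.tolist())
print(sender[~is_avviso].tolist())   # everything that is not an Avviso
```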

In [None]:
bartoli[_filter_Bartoli].tail(20)


In [None]:
pd.to_datetime(bartoli['Modern Date'], format='%d/%m/%Y')

In [None]:
bartoli['Recipient'].unique()

In [None]:
bartoli['Recipient'].value_counts()

In [None]:
bartoli['clean_recipients'] = bartoli['Recipient'].str.strip('[]')

In [None]:
bartoli['clean_recipients'].value_counts()
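
The effect of the cleaning step is easiest to see on a few invented names. Here `str.strip()` is also applied first to remove stray spaces, which is an extra step not in the cell above:

```python
import pandas as pd

# Invented recipient names; brackets mimic editorial additions.
recipient = pd.Series(["[Cosimo I]", "Cosimo I", " [Giorgio Vasari] ", "Giorgio Vasari"])

# Remove outer whitespace first, then any leading or trailing square brackets.
clean = recipient.str.strip().str.strip("[]")

print(recipient.nunique(), clean.nunique())
print(clean.value_counts())
```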

## H.  Itineraries Data and Network

In [None]:
itineraries = pd.read_csv('https://raw.githubusercontent.com/rmidura/emdigit/main/Edge_Table.tsv', sep='\t')

In [None]:
itineraries['Edge_Type_Last_Date'].head()

### Use Tab for autocomplete

In [None]:
edges = list(zip(itineraries['Source'],itineraries['Target']))

In [None]:
G = nx.Graph()

In [None]:
G.add_edges_from(edges)
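
Before drawing the graph, a rough sense of which places are best connected can be had from the edge table alone. This sketch uses an invented edge table; it counts every row, so repeated journeys between the same two places are counted each time, which may differ from the degree reported by an undirected graph object:

```python
import pandas as pd

# Invented edge table; "Source" and "Target" follow the column names used above.
toy = pd.DataFrame({
    "Source": ["Venice", "Vienna", "Venice"],
    "Target": ["Vienna", "Prague", "Rome"],
})

edges = list(zip(toy["Source"], toy["Target"]))   # list of (source, target) pairs

# Each appearance of a place as either endpoint counts once toward its degree.
degree = pd.concat([toy["Source"], toy["Target"]]).value_counts()
print(edges)
print(degree)
```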

In [None]:
import pyvis.network  # load the submodule that provides Network

In [None]:
pyvis_graph = pyvis.network.Network(notebook=True, height='900px', width='900px')

In [None]:
pyvis_graph.from_nx(G)

In [None]:
pyvis_graph.show('our_graph.html')

## I. Compagnie Data From Laura Blom

In [None]:
compagnie = pd.read_csv(compagnie_data)


In [None]:
compagnie.shape