Preliminary installation: the googletrans library is needed for the last study, which analyses the titles of the texts. 
Uncomment the following cell to install the library.

In [1]:
#pip install googletrans==3.1.0a0

Preliminary imports: pandas and numpy for managing the dataset, nltk and re for manipulating the texts in the last section, and SPARQLWrapper and ssl for querying Wikidata.

In [2]:
import pandas as pd
import numpy as np
import nltk
import re
from SPARQLWrapper import SPARQLWrapper, JSON
import ssl

#preparing the connection to the Wikidata endpoint
#note: this disables HTTPS certificate verification for the whole session
ssl._create_default_https_context = ssl._create_unverified_context
# get the endpoint API
wikidata_endpoint = "https://query.wikidata.org/bigdata/namespace/wdq/sparql"


# Giving voice to madrigals

Our research starts from the British Library dataset on the History of Printed Music, which is downloadable as a csv file.

We first downloaded the [dataset from the British Library](https://www.bl.uk/bibliographic/downloads/HistoryOfMusicResearcherFormat_202210_csv.zip) and then loaded it with the pandas library in order to examine the data.

We then create a DataFrame of madrigals, keeping only the rows in which at least one of the relevant columns (subject/genre terms, title, other titles, notes) contains the string "madrigal", regardless of case.

In [3]:
#insert here the path to the British Library data dump, after downloading it.
df = pd.read_csv("C:/Users/const/Downloads/MusicResearcherFormat_201505_csv/detailedrecords.csv",dtype=str)
df_madri = df.loc[df["Subject/genre terms"].str.contains('madrigal', case=False, na=False) | df["Title"].str.contains('madrigal', case=False, na=False) | df["Other titles"].str.contains('madrigal', case=False, na=False) | df["Notes"].str.contains('madrigal', case=False, na=False)]
df_madri

Unnamed: 0,BL record ID,Composer,Composer life dates,Title,Standardised title,Other titles,Other names,Publication date (standardised),Publication date (not standardised),Country of publication,...,Contents,Referenced in,Subject/genre terms,Physical description,Series title,Number within series,ISBN,ISMN,Publisher number,BL shelfmark
3722,004166081,"Adlam, Frank",,Winter stern hath loosed his Grip. <Madrigal f...,,,,1913,1913,England,...,,,,"3 pages, 8°",Choruses for equal Voices,no. 1387 [Choruses for equal Voices],,,,mDON1781 ; E.861./1387
3804,004166165,"Adler, Samuel",1928-,Three Madrigals. For four-part chorus of mixed...,,,,1958,1958,United States,...,,,,"11 pages, 8°",,,,,,F.1744.v.(1.)
3932,004166293,"Adriani, Francesco",,Il Primo Libro de Madrigali a Cinque Voci ... ...,,,,1570,1570,Italy,...,,,,4°,,,,,,D.148
3933,004166294,"Adriani, Francesco",,Il Secondo Libro de Madrigali a Cinque Voci .....,,,,1570,1570,Italy,...,,,,4°,,,,,,D.148.a
3934,004166295,"Adriansen, Emanuel",approximately 1554-1604,"Luitmuziek ... Een keuze van fantasieën, danse...",,,"Spiessons, Godelieve",1966,1966,Belgium,...,,,,"xvi, 97 pages, facsimiles, folio",Monumenta musicæ belgicæ,no. 10 [Monumenta musicæ belgicæ],,,,H.15
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1044411,016837423,"Bennet, John",active 1599-1614,All creatures now are merry-minded : (SSATB) ;...,,Triumphes of Oriana,"Wilkes, Roger, (Musician) [editor]",2014,©2014,England,...,,,"Choruses, Secular (Mixed voices)--Scores ; Mad...","1 score (12 pages) + 9 parts, 30 cm",Editions of renaissance music,,,9790570570317 ; 9790570570324,TR004,G.1535.l.(1.)
1044413,016837433,"Jones, Robert",active 1597-1615,"Fair Oriana, seeming to wink : (SSAATB) : from...",,Triumphes of Oriana,"Wilkes, Roger, (Musician) [editor]",2014,©2014,England,...,,,"Choruses, Secular (Mixed voices)--Scores ; Mad...","1 score (12 pages) + 14 parts, 30 cm",Editions of renaissance music,,,9790570570300,TR021,G.1535.l.(4.)
1044421,016840548,"Jackson, William",1730-1803,"Ten duets : for sopranos, tenors, or soprano &...",Vocal music. Selections,Duets ; Canzonets ; Pastorals ; Madrigals,"Jackson, William, op. 9 ; Jackson, William, op...",2014,©2013,England,...,"From Twelve canzonets, op. 9. Time has not thi...",,Vocal duets with continuo,"2 scores (44, 39 pages), 30 cm",,,,9790708105701,Jac 1,G.804.ll.(11.)
1044497,016874797,"Tomkins, Thomas",1572-1656,Dear Lord of life,,,"Burke, James [editor]",2014,©2014,England,...,,,"Anthems ; Choruses, Sacred (Mixed voices, 6 pa...","1 score (15 pages), 30 cm",Church Music Society reprints,no. 132 [Church Music Society reprints],9780193954014,,,E.1617./132
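
The multi-column filter used above can also be written as a loop over the columns, which is easier to extend. This is only a minimal sketch: the helper name `rows_mentioning` and the three-row sample frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def rows_mentioning(frame, term, columns):
    """Return the rows in which any of `columns` contains `term` (case-insensitive)."""
    mask = pd.Series(False, index=frame.index)
    for column in columns:
        # na=False treats missing cells as "no match" instead of propagating NaN
        mask |= frame[column].str.contains(term, case=False, na=False, regex=False)
    return frame.loc[mask]

# Tiny illustrative sample (assumed values, not the real catalogue)
sample = pd.DataFrame({
    "Title": ["Three Madrigals", "Dear Lord of life", "Ten duets"],
    "Notes": [None, None, "Collection of madrigals and chansons"],
})
matches = rows_mentioning(sample, "madrigal", ["Title", "Notes"])
print(matches)
```

Applied to the real data, the call would take `df` and the four column names used in the cell above.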


This dataset contains all rows containing the word "madrigal", but for our study we only want works by composers from our period of interest (1530 - 1650), so we decided to clean the "Composer life dates" column, manually adding the dates that are missing. 

We first extract only the columns "Composer" and "Composer life dates", with the aim of selecting composers born after 1450 and dead before 1730 (to be sure to include all composers of interest).
We manually added and corrected the dates, because most of these composers are not present on Wikidata and had to be searched for on specialized websites and in music dictionaries.
The last line of the cell exports the list as a csv file for manual cleaning.

In [4]:
composers_to_search=df_madri[["Composer", "Composer life dates"]]
composers_to_search =composers_to_search.sort_values(["Composer"]).drop_duplicates()
composers_to_search.to_csv("data/composers_to_clean.csv")

After exporting our csv and correcting it manually, we import it back into our project. 

In [5]:
composers = pd.read_csv("data/composers_dates_cleaned.csv")

#dropping lines where we don't have information about life and death dates. 
#(we cannot include them in our final selection of Composers belonging to our period of time)
composers = composers.dropna()
composers.drop('Unnamed: 0', axis=1, inplace=True)
composers

Unnamed: 0,Composer,Composer life dates
1,A. L. (Amelia Lehmann),1838-1903
2,"Adler, Samuel",1928-
3,"Adriaensen, Emanuel",1550-1604
4,"Adriani, Francesco",1539-1575
5,"Adriansen, Emanuel",1554-1604
...,...,...
937,"Zanotti, Camillo",1545-1591
938,"Zarlino, Gioseffo",1517-1590
939,"Zoilo, Annibale",1537-1592
940,"Zoilo, Cesare",1584-1622


In [6]:
dict_composers = composers.set_index("Composer").to_dict()

Then we select only composers from the right period: born no earlier than the first composer of madrigals, Philippe Verdelot (1480), and dead no later than the last madrigal composer of our period of interest, Scarlatti (1725). To include composers at the extremes of this time frame, we set the limits at 1450 and 1730.

In [7]:
comp_madrigals = {}
for composer in dict_composers['Composer life dates']:
    dates = dict_composers['Composer life dates'][composer]
    dates = dates.strip("- ")
    dates_list = dates.split("-")
    if len(dates_list)>1:
        if int(dates_list[0])>=1450 and int(dates_list[1])<=1730:
            comp_madrigals[composer]=dates
    else:
        if int(dates_list[0])>=1450 and int(dates_list[0])<=1730:
            comp_madrigals[composer]=dates
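
The selection rule in the loop above can be isolated into a small predicate, which makes the edge cases (open-ended ranges such as "1928-", or a single known year) easy to check. This is a sketch under the same assumptions as the cell above; the function name `in_period` is hypothetical.

```python
def in_period(dates, start=1450, end=1730):
    """Mirror the notebook's rule for a 'birth-death' string or a single year."""
    parts = dates.strip("- ").split("-")
    if len(parts) > 1:
        # two dates: born on/after `start` and dead on/before `end`
        return int(parts[0]) >= start and int(parts[1]) <= end
    # one date only: it must itself fall inside the window
    return start <= int(parts[0]) <= end

print(in_period("1539-1575"))  # Francesco Adriani
print(in_period("1928-"))      # Samuel Adler, open-ended range
print(in_period("1596"))       # a single known year
```

Note that, like the original loop, this raises a `ValueError` on any entry that is not made of plain years, so it relies on the manual cleaning step having been done.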

# 1. Working on composers of interest: 
We want to merge our dictionary with information extracted from Wikidata.
First we need to change the names from the "surname, name" format to "name surname", which is the one adopted for labels on Wikidata.

In [8]:
dict_comp_madrigals = {}

#this dictionary will be used later on when working with the publishers and records dataframes
inverted_names = {}

for el in comp_madrigals.keys():
    
    #exchanging "surname, name" for "name surname"
    if len(el.split(", "))>1:
        splitted = el.split(", ")
        name = splitted[1].strip("'")
        surname = splitted[0].strip("'")
        dict_comp_madrigals[name+" "+surname] = comp_madrigals[el]
        inverted_names[el] = name+" "+surname
        
    #some entities, like "Henry VIII", are not in "surname, name" format, so they should be added as is.
    else:
        dict_comp_madrigals[el] = comp_madrigals[el]
        
#printing the dictionary        
print(len(dict_comp_madrigals))
dict_comp_madrigals


369


{'Emanuel Adriaensen': '1550-1604',
 'Francesco Adriani': '1539-1575',
 'Emanuel Adriansen': '1554-1604',
 'Agostino Agazzari': '1578-1640',
 'Lodovico Agostini': '1534-1590',
 'Gregor Aichinger': '1564-1628',
 'Vittoria Aleotti': '1573-1620',
 'Richard Alison': '1565-1610',
 'Felice Anerio': '1560-1614',
 'Giovanni Francesco Anerio': '1567-1630',
 'Giovanni Animuccia': '1500-1571',
 'Padovano Annibale': '1527-1575',
 'Jacob Arcadelt': '1505-1568',
 'Antonio Archilei': '1500-1612',
 'Giovanni Matteo Asola': '1532-1609',
 'Filippo Azzaiolo': '1530-1569',
 'Ippolito Baccusi': '1550-1609',
 'Simone Balsamino': '1596',
 'Adriano Banchieri': '1568-1634',
 'Bartolomeo Barbarino': '1617',
 'Melchiorre de Barberiis': '1500-1549',
 'Giovanni de Bardi': '1534-1612',
 'Giovanni Battista Bassani': '1650-1716',
 'Giovanni Bassano': '1558-1617',
 'Thomas Bateson': '1630',
 'Luca Bati': '1546-1608',
 'Henricus Beauvarlet': '1575-1623',
 'Antonio di Becchi': '1522-1568',
 'Girolamo Belli': '1552-1620'
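
As a quick check of the name inversion, the rule from the loop above can be written as a standalone function. The name `invert_name` is a hypothetical helper introduced here for illustration only.

```python
def invert_name(catalogue_name):
    """Turn 'surname, name' into 'name surname'; leave other forms untouched."""
    parts = catalogue_name.split(", ")
    if len(parts) > 1:
        # same rule as the notebook: second element first, then the surname
        return parts[1].strip("'") + " " + parts[0].strip("'")
    return catalogue_name

print(invert_name("Adriani, Francesco"))
print(invert_name("Henry VIII"))
```

Only the first two comma-separated elements are used, so any further qualifier after a second comma would be dropped, exactly as in the original loop.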

### Querying the Wikidata SPARQL endpoint API
When sending a query to the Wikidata SPARQL endpoint, the full list was too big to handle, so we split it into groups of 100 composers.

We use regular expressions to search for persons whose occupation is "composer" and whose label matches one in the list. 
For each group of 100, the names are concatenated into one long string with the OR operator ("|"), and each name is anchored so that the label must start ("^") and end ("$") exactly with those characters.

Without this condition, the first query returned other composers whose names merely contain one from our list, for example "Friedrich Nicolaus Bruhns" instead of only "Nicolaus Bruhns".
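
The effect of the anchors can be illustrated locally with Python's `re` module, used here only as a stand-in for the SPARQL `regex` function (an assumption: the two engines agree on `^`, `$` and `|` for plain names like these). The helper names `build_label_pattern` and `chunk` are hypothetical.

```python
import re

def build_label_pattern(names):
    """Join names into one alternation, each name anchored at both ends."""
    return "^" + "$|^".join(names) + "$"

def chunk(items, size=100):
    """Split a list into consecutive sub-lists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

names = ["Nicolaus Bruhns", "Arcangelo Corelli"]
anchored = build_label_pattern(names)
unanchored = "|".join(names)

near_homonym = "Friedrich Nicolaus Bruhns"
print(bool(re.search(unanchored, near_homonym)))  # matches: the name is contained
print(bool(re.search(anchored, near_homonym)))    # no match: extra leading text
print([len(group) for group in chunk(list(range(369)))])
```

Names containing regex metacharacters (a full stop, parentheses) would need escaping before being joined; the notebook does not do this, so treat it as a possible refinement rather than part of the original method.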

In [9]:
#one query with all the composers is too long, so we split the list in sub_lists of 100 composers.
list_keys = list(dict_comp_madrigals.keys())
chunked_list_to_be_queried = [list_keys[i:i+100] for i in range(0, len(dict_comp_madrigals), 100)]
#instantiating the final dictionary
dict_composers_wd = {}

In [10]:
#sending a query for groups of 100 composers
for hundred_composers in chunked_list_to_be_queried:
    to_str = "$|^".join(hundred_composers)

    query_composers = """
    SELECT DISTINCT *
    WHERE {
            ?composer wdt:P106 wd:Q36834; #has for occupation: composer
            rdfs:label ?label.
            FILTER regex(?label, \"^"""+to_str+"""$\" )
            FILTER (langMatches(lang(?label), "EN"))
            }
    """
    # set the endpoint 
    sparql_wd = SPARQLWrapper(wikidata_endpoint)
    # set the query
    sparql_wd.setQuery(query_composers)
    # set the returned format
    sparql_wd.setReturnFormat(JSON)
    # get the results
    results = sparql_wd.query().convert()

    # manipulate the result
    for result in results["results"]["bindings"]:
        #the dictionary contains as key the name of the composers, 
        #and as value another dictionary containing the future columns of the dataframe, "wikidata" and "dates"
        dict_composers_wd[result["label"]["value"]] = {"wikidata":result["composer"]["value"], 
                                                       "dates":dict_comp_madrigals[result["label"]["value"]]}

#printing the length of the dictionary
print(len(dict_composers_wd))

244


We get a dataset of 244 composers out of the 369 in the British Library catalogue, so approximately 2/3 of them.

We now send another query to Wikidata to see if we can find other composers of madrigals to add to our dataset.

In [11]:
query_new_composers = """
SELECT ?composition ?compositionLabel ?composer ?composerLabel ?birthdate ?deathdate WHERE {
        ?composition wdt:P86 ?composer;         #a piece composed by someone, 
                     wdt:P7937 wd:Q193217.      #and is a form of creative work of a madrigal.
        ?composer wdt:P569 ?birthdate;
                  wdt:P570 ?deathdate.
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }  # labels in English
}

"""

# set the endpoint 
sparql_wd = SPARQLWrapper(wikidata_endpoint)
# set the query
sparql_wd.setQuery(query_new_composers)
# set the returned format
sparql_wd.setReturnFormat(JSON)
# get the results
results = sparql_wd.query().convert()

# manipulate the result
for result in results["results"]["bindings"]:
    #Adding the composer only if it doesn't already exist in the dataset
    if result["composerLabel"]["value"] not in dict_composers_wd:
        birthdate = result["birthdate"]["value"].split("-")
        deathdate = result["deathdate"]["value"].split("-")
        dict_composers_wd[result["composerLabel"]["value"]] = {"wikidata":result["composer"]["value"],
                                                              "dates":birthdate[0]+"-"+deathdate[0]}

        
#printing the length of the dictionary and the dictionary itself
print(len(dict_composers_wd))
print(dict_composers_wd)

245
{'Nicolaus Bruhns': {'wikidata': 'http://www.wikidata.org/entity/Q57369', 'dates': '1665-1697'}, 'Marc-Antoine Charpentier': {'wikidata': 'http://www.wikidata.org/entity/Q55524', 'dates': '1643-1704'}, 'Arcangelo Corelli': {'wikidata': 'http://www.wikidata.org/entity/Q164475', 'dates': '1653-1713'}, 'John Bennet': {'wikidata': 'http://www.wikidata.org/entity/Q374718', 'dates': '1575-1614'}, 'Adriano Banchieri': {'wikidata': 'http://www.wikidata.org/entity/Q347804', 'dates': '1568-1634'}, 'Agostino Agazzari': {'wikidata': 'http://www.wikidata.org/entity/Q395563', 'dates': '1578-1640'}, 'Maddalena Casulana': {'wikidata': 'http://www.wikidata.org/entity/Q269690', 'dates': '1544-1590'}, 'Girolamo Dalla Casa': {'wikidata': 'http://www.wikidata.org/entity/Q354023', 'dates': '1543-1601'}, 'Thomas Campion': {'wikidata': 'http://www.wikidata.org/entity/Q455618', 'dates': '1567-1620'}, "Sigismondo D'India": {'wikidata': 'http://www.wikidata.org/entity/Q457145', 'dates': '1582-1629'}, 'Scipio

This allowed us to add only one composer to our collection: Wikidata has very few madrigals registered as such.

We tried other queries to find more composers, for example setting madrigals as the movement (P135) or the genre (P136) to which the composer belonged, but none returned conclusive results.

So we keep our dataset of 245 individuals.
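
The merging step in the loop above (add a composer only if the label is not already a key, and reduce the two timestamps to a "birth-death" year string) can be sketched as follows. The function names are hypothetical, and the new entry uses placeholder values, not real Wikidata results.

```python
def life_dates(birth_iso, death_iso):
    """Keep only the year of two timestamps shaped like '1567-05-15T00:00:00Z'."""
    return birth_iso.split("-")[0] + "-" + death_iso.split("-")[0]

def add_if_missing(collection, label, uri, birth_iso, death_iso):
    """Add a composer only when the label is not already a key of `collection`."""
    if label not in collection:
        collection[label] = {"wikidata": uri, "dates": life_dates(birth_iso, death_iso)}
    return collection

known = {"Adriano Banchieri": {"wikidata": "http://www.wikidata.org/entity/Q347804",
                               "dates": "1568-1634"}}
# Placeholder label, URI and timestamps, for illustration only
add_if_missing(known, "Example Composer", "http://www.wikidata.org/entity/Q0",
               "1600-01-01T00:00:00Z", "1650-01-01T00:00:00Z")
# An existing label is left untouched, whatever dates the query returns
add_if_missing(known, "Adriano Banchieri", "http://www.wikidata.org/entity/Q347804",
               "1500-01-01T00:00:00Z", "1501-01-01T00:00:00Z")
print(known)
```

Because the comparison is on the English label, a composer whose Wikidata label differs slightly from the catalogue form would be added as a new entry.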

### Adding metadata for our composers
Now let's add some information: citizenship, gender, birthplace and place of death (each with its country and coordinates), languages spoken and instruments played.

In [12]:
#creating a dictionary for mapping the output of wikidata with the future columns of the dataframe
labels = {"genderLabel":"gender", "citizenshipLabel":"citizenship", "birthplaceLabel":"birthplace", "birthcountryLabel":"birth country", "geoBirthplace":"geographical coordinates birthplace", "deathplaceLabel":"place of death", "deathcountryLabel":"country of death", "geoDeathplace":"geographical coordinates place of death", "languageLabel":"language", "instrumentLabel":"instrument"}

In [14]:
for composer in dict_composers_wd:
    #collecting all information available on wikidata about each composer
    query_composer_info = """
    SELECT ?genderLabel ?citizenshipLabel ?birthplaceLabel ?birthcountryLabel ?geoBirthplace ?deathplaceLabel ?deathcountryLabel ?geoDeathplace ?languageLabel ?instrumentLabel WHERE {
           OPTIONAL {<""" +dict_composers_wd[composer]['wikidata']+"""> wdt:P21 ?gender } .
           OPTIONAL {<""" +dict_composers_wd[composer]['wikidata']+"""> wdt:P27 ?citizenship } .
           OPTIONAL {<""" +dict_composers_wd[composer]['wikidata']+"""> wdt:P19 ?birthplace .
                       ?birthplace wdt:P17 ?birthcountry;
                                   wdt:P625 ?geoBirthplace} .
           OPTIONAL {<""" +dict_composers_wd[composer]['wikidata']+"""> wdt:P20 ?deathplace .
                       ?deathplace wdt:P17 ?deathcountry;
                                   wdt:P625 ?geoDeathplace} .
           OPTIONAL {<""" +dict_composers_wd[composer]['wikidata']+"""> wdt:P1412 ?language } .
           OPTIONAL {<""" +dict_composers_wd[composer]['wikidata']+"""> wdt:P1303 ?instrument } .
              SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }  # labels in English
    }

    """
    # set the endpoint 
    sparql_wd = SPARQLWrapper(wikidata_endpoint)
    # set the query
    sparql_wd.setQuery(query_composer_info)
    # set the returned format
    sparql_wd.setReturnFormat(JSON)
    # get the results
    results = sparql_wd.query().convert()

    # manipulate the result
    for result in results["results"]["bindings"]:
        for label in labels:
            if label in result:
                #a composer might have more than one citizenship, language or instrument, and in that case we create a list.
                if labels[label] in ["citizenship", "language", "instrument"]:
                    if labels[label] in dict_composers_wd[composer] and result[label]["value"] not in dict_composers_wd[composer][labels[label]]:
                        if type(dict_composers_wd[composer][labels[label]]) != list:
                            dict_composers_wd[composer][labels[label]] = list([dict_composers_wd[composer][labels[label]]])
                        dict_composers_wd[composer][labels[label]].append(result[label]["value"])
                    elif labels[label] not in dict_composers_wd[composer]:
                        dict_composers_wd[composer][labels[label]] = result[label]["value"]
                        
                #otherwise, just adding the result to the dictionary.
                else:
                    dict_composers_wd[composer][labels[label]] = result[label]["value"]
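
The accumulation of multi-valued properties can be sketched as a small helper. One caveat about the loop above: while the stored value is still a plain string, `value not in stored` is a substring test, so "France" would be skipped if "Kingdom of France" is already stored. The variant below compares whole values instead; the name `add_value` is hypothetical and this is a possible refinement, not the notebook's code.

```python
MULTI_VALUED = ("citizenship", "language", "instrument")

def add_value(record, key, value):
    """Store `value` under `key`, growing a list for multi-valued properties."""
    if key not in MULTI_VALUED:
        record[key] = value            # single-valued: last result wins
        return record
    current = record.get(key)
    if current is None:
        record[key] = value            # first value is kept as a scalar
    elif isinstance(current, list):
        if value not in current:       # list membership, not substring search
            current.append(value)
    elif current != value:
        record[key] = [current, value]
    return record

composer = {}
for citizenship in ["Kingdom of France", "France", "France"]:
    add_value(composer, "citizenship", citizenship)
add_value(composer, "gender", "male")
print(composer)
```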

In [15]:
df_composers = pd.DataFrame.from_dict(dict_composers_wd, orient="index")
df_composers.reset_index(inplace=True)
df_composers = df_composers.rename(columns = {'index':'name'})
df_composers

Unnamed: 0,name,wikidata,dates,gender,citizenship,birthplace,birth country,geographical coordinates birthplace,place of death,country of death,geographical coordinates place of death,language,instrument
0,Nicolaus Bruhns,http://www.wikidata.org/entity/Q57369,1665-1697,male,"[Denmark, Duchy of Schleswig]",Schwabstedt,Germany,Point(9.187222222 54.395833333),Husum,Germany,Point(9.051111111 54.476944444),German,organ
1,Marc-Antoine Charpentier,http://www.wikidata.org/entity/Q55524,1643-1704,male,Kingdom of France,Paris,France,Point(2.351388888 48.856944444),Paris,France,Point(2.351388888 48.856944444),French,"[organ, voice]"
2,Arcangelo Corelli,http://www.wikidata.org/entity/Q164475,1653-1713,male,Papal States,Fusignano,Italy,Point(11.95636 44.46656),Rome,Italy,Point(12.482777777 41.893055555),"[Latin, Italian]",violin
3,John Bennet,http://www.wikidata.org/entity/Q374718,1575-1614,male,England,Lancashire,United Kingdom,Point(-2.6 53.8),,,,English,
4,Adriano Banchieri,http://www.wikidata.org/entity/Q347804,1568-1634,male,,Bologna,Italy,Point(11.342777777 44.493888888),Bologna,Italy,Point(11.342777777 44.493888888),Italian,organ
...,...,...,...,...,...,...,...,...,...,...,...,...,...
240,Camillo Zanotti,http://www.wikidata.org/entity/Q84562722,1545-1591,male,,,,,,,,,
241,Agostino Soderini,http://www.wikidata.org/entity/Q77535913,1575-1607,,,,,,,,,,
242,Flaminio Tresti,http://www.wikidata.org/entity/Q110222853,1560-1613,,,,,,,,,,
243,Cesare Zoilo,http://www.wikidata.org/entity/Q109265169,1584-1622,male,Papal States,,,,Rome,Italy,Point(12.482777777 41.893055555),,


We save the final dataset in json and csv in order to use it for visualizations later on.

In [16]:
df_composers.to_json("data/final datasets/composers.json")
df_composers.to_csv("data/final datasets/composers.csv")

# 2. Cleaning the British Library dataset
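We transform the dictionary of composers into a dataframe and merge it with the one containing all the records, keeping only the columns of interest to us. The left merge fills "Composer life dates" with the cleaned dates for the composers of interest and leaves it empty for all the others. 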

In [17]:
df_comp_madri = pd.DataFrame.from_dict(comp_madrigals, orient="index")
df_comp_madri = df_comp_madri.reset_index()
df_comp_madri.columns=["Composer", "Composer life dates"]
df_madri=df_madri.drop(["Composer life dates"], axis=1)
df_madri = df_madri.merge(df_comp_madri, how="left", on="Composer")
df_madri = df_madri[["BL record ID", "Title", "Standardised title", "Other titles", "Composer", "Composer life dates", "Other names", "Publication date (standardised)", "Country of publication", "Place of publication", "Publisher", "Notes", "Contents","Subject/genre terms"]]

## 2.1 Extracting madrigals published in the period of interest
We also filter the data on publication date, to clean it and extract only relevant madrigals.

By examining our dataset, we noticed that there were a few more entries for madrigals until the end of the 1600s (until 1678), followed by a jump to 1762, by which time the genre had changed. So we decided to include the records published up to 1678 in the final dataset.

In [18]:
df_madri_pub_bef_1678 = df_madri.copy(deep=True)
df_madri_pub_bef_1678 = df_madri_pub_bef_1678.dropna(subset=['Publication date (standardised)'])
df_madri_pub_bef_1678['Publication date (standardised)'] = df_madri_pub_bef_1678['Publication date (standardised)'].astype("int")
df_madri_pub_bef_1678 = df_madri_pub_bef_1678.loc[df_madri_pub_bef_1678['Publication date (standardised)'] <= 1678]
df_madri_pub_bef_1678

Unnamed: 0,BL record ID,Title,Standardised title,Other titles,Composer,Composer life dates,Other names,Publication date (standardised),Country of publication,Place of publication,Publisher,Notes,Contents,Subject/genre terms
2,004166293,Il Primo Libro de Madrigali a Cinque Voci ... ...,,,"Adriani, Francesco",1539-1575,,1570,Italy,Vinegia,Appresso Girolamo Scotto,,,
3,004166294,Il Secondo Libro de Madrigali a Cinque Voci .....,,,"Adriani, Francesco",1539-1575,,1570,Italy,Vinegia,Appresso Girolamo Scotto,,,
5,004166296,Novvm pratvm mvsicvm longo amoenissimvm : cviv...,Novum pratum musicum,Novum pratum musicum longo amoenissimum,"Adriaensen, Emanuel",1550-1604,"Phalèse, Pierre [printer] ; Bellère, Pierre [p...",1592,Belgium,Antverpiæ ; Antwerp,Excudebat Petrus Phalesius sibi & Ioanni Bellero,Method in lute tabulature followed by solo lut...,Methodvs ad omnes omnivm tonorvm cantiones in ...,Lute--Methods ; Lute music ; Madrigals ; Part ...
6,004166297,Pratum Musicum ... cuius ambitu ... comprehend...,,,"Adriansen, Emanuel",1554-1604,,1600,Belgium,Antuerpiæ,Ex Typographia Musica Petri Phalesij,The contents of this edition are different fro...,,
7,004166441,Di Agostino Agazzari ... Il Primo Libro de Mad...,,,"Agazzari, Agostino",1578-1640,,1600,Italy,Venetia,Appresso Angelo Gardano,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2390,004762922,Il Primo Libro de' Madrigali a Cinque Voci ......,,,"Zoilo, Cesare",1584-1622,,1627,Italy,Napoli,Appresso Ambrosio Magnetta,,,
2395,004763937,Lucae Marentii ... madrigalia quinque vocum : ...,Madrigals. voices (5). Parts,,"Marenzio, Luca",1553-1599,,1601,Germany,Noribergae,in officina typographica Pauli Kauffmanni,,,
2902,014954746,Il secondo libro de Madrigali a 5 voci … con i...,"Madrigals a 5, Bk.II",,"Rossi, Salamone",1570-1630,,1601,Italy,Venice,Amadino,"Listed in MGG, but no entries in BUCEM or RISM...",,
3368,015834926,Il secondo libro intabolatura di liuto : ove s...,Intabolatura di liuto. libro 2,,"Neusidler, Melchior",1531-1590,"Crecquillon, Thomas [composer] ; Gardane, Anto...",1566,Italy,In Venetia,Appresso di Antonio Gardano,"Collection of madrigals, chansons, dances and ...",Deus canticum novum (secunda pars: Quia delect...,Intabulations (Lute) ; Lute music
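
The date filter above relies on every non-missing publication date being a plain integer string, since `astype("int")` raises on anything else. A more tolerant variant, sketched here with `pd.to_numeric(..., errors="coerce")`, turns unparsable dates into NaN so that they simply fail the comparison. The helper name and the sample rows are illustrative assumptions.

```python
import pandas as pd

def published_up_to(frame, year, column="Publication date (standardised)"):
    """Keep rows whose publication year parses as a number and is <= `year`."""
    years = pd.to_numeric(frame[column], errors="coerce")
    # comparisons with NaN are False, so missing or malformed dates are dropped
    return frame.loc[years <= year]

# Illustrative sample; the last two rows are invented edge cases
sample = pd.DataFrame({
    "Title": ["Il Primo Libro de Madrigali", "Three Madrigals",
              "Undated record", "Uncertain date"],
    "Publication date (standardised)": ["1570", "1958", None, "16--"],
})
early = published_up_to(sample, 1678)
print(early)
```

Unlike the cell above, this leaves the column as strings in the returned frame, so no conversion back to `str` is needed before merging.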


We also add these to the general dataset of madrigals of interest, which will then contain madrigals by composers who lived between 1450 and 1730 and/or published in or before 1678.

In [19]:
df_madri_pub_bef_1678['Publication date (standardised)'] = df_madri_pub_bef_1678['Publication date (standardised)'].astype("str")

#We want to keep only rows where we registered dates for a composer -> those we know are of the right period of time.
df_madri_composers_dates=df_madri.dropna(subset=['Composer life dates'])
df_madri = df_madri_composers_dates.merge(df_madri_pub_bef_1678, how="outer")
df_madri

Unnamed: 0,BL record ID,Title,Standardised title,Other titles,Composer,Composer life dates,Other names,Publication date (standardised),Country of publication,Place of publication,Publisher,Notes,Contents,Subject/genre terms
0,004166293,Il Primo Libro de Madrigali a Cinque Voci ... ...,,,"Adriani, Francesco",1539-1575,,1570,Italy,Vinegia,Appresso Girolamo Scotto,,,
1,004166294,Il Secondo Libro de Madrigali a Cinque Voci .....,,,"Adriani, Francesco",1539-1575,,1570,Italy,Vinegia,Appresso Girolamo Scotto,,,
2,004166295,"Luitmuziek ... Een keuze van fantasieën, danse...",,,"Adriansen, Emanuel",1554-1604,"Spiessons, Godelieve",1966,Belgium,Antwerpen,Vereniging voor musiekgeschiedenis te Antwerpen,,,
3,004166296,Novvm pratvm mvsicvm longo amoenissimvm : cviv...,Novum pratum musicum,Novum pratum musicum longo amoenissimum,"Adriaensen, Emanuel",1550-1604,"Phalèse, Pierre [printer] ; Bellère, Pierre [p...",1592,Belgium,Antverpiæ ; Antwerp,Excudebat Petrus Phalesius sibi & Ioanni Bellero,Method in lute tabulature followed by solo lut...,Methodvs ad omnes omnivm tonorvm cantiones in ...,Lute--Methods ; Lute music ; Madrigals ; Part ...
4,004166297,Pratum Musicum ... cuius ambitu ... comprehend...,,,"Adriansen, Emanuel",1554-1604,,1600,Belgium,Antuerpiæ,Ex Typographia Musica Petri Phalesij,The contents of this edition are different fro...,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1958,004723438,Symphonia angelica,,,,,"Angelini, Horatio [composer] ; Animuccia, Paol...",1594,Belgium,In Anversa ; Antwerp,appresso Pietro Phalesio & Giovanni Bellero,"A collection of four-, five-, and six-part Ita...",Nova leggiadra stella / Dominico Lauro -- Laur...,"Madrigals, Italian"
1959,004746189,"Il terzo Libro delle Muse, a tre voci",,"Sur le joly jonc ; Allons, allons gay ; Or sui...","Muse a 3 voci, libro 3",,"Willaert, Adrian ; Mouton, Jean ; Reuez, N. ; ...",1562,Italy,In Vinegia ; Venice,Appresso Girolamo Scotto,Collection of French chansons in three parts (...,Svr le ioly ioly jonc ma doulc’ amye / Adriano...,"Madrigals, Italian--16th century"
1960,004759627,Musica Transalpina. Cantus,,,,,"Bertani, Lelio [composer] ; Byrd, William [com...",1588,England,London,Published by N Yonge,Collection of Italian madrigals translated int...,These that be certaine signes of my tormenting...,Madrigals
1961,004759632,Musica transalpina : the second booke of madri...,,MUSICA TRANSALPINA. / CANTVS. / THE SECOND BOO...,"Musica transalpina, Book 2",,"Bicci, Antonio [composer] ; Croce, Giovanni [c...",1597,England,At London,Printed by Thomas Este,Collection of 24 Italian madrigals given Engli...,The white delightfull swanne / Horatio Vecchi ...,"Madrigals, Italian"


In [20]:
df_madri.to_csv("data/madrigals_records_to_clean.csv")

By hand, we extracted the number of voices for which each record was composed, and added a new column with the title of the individual madrigal, where applicable.

In [21]:
records_cleaned = pd.read_csv("madrigals_records_cleaned.csv", sep=";", encoding="utf-8")
records_cleaned = records_cleaned.replace({"Composer":inverted_names})
records_cleaned

Unnamed: 0,BL record ID,Title,Standardised title,Other titles,Composer,Composer life dates,Other names,Publication date (standardised),Country of publication,Place of publication,Publisher,Notes,Contents,Subject/genre terms,Voices,Titles madrigals
0,4166293.0,Il Primo Libro de Madrigali a Cinque Voci ... ...,,,Francesco Adriani,1539-1575,,1570.0,Italy,Vinegia,Appresso Girolamo Scotto,,,,5,
1,4166294.0,Il Secondo Libro de Madrigali a Cinque Voci .....,,,Francesco Adriani,1539-1575,,1570.0,Italy,Vinegia,Appresso Girolamo Scotto,,,,5,
2,4166295.0,"Luitmuziek ... Een keuze van fantasieën, danse...",,,Emanuel Adriansen,1554-1604,"Spiessons, Godelieve",1966.0,Belgium,Antwerpen,Vereniging voor musiekgeschiedenis te Antwerpen,,,,4; 5; 6,
3,4166296.0,Novvm pratvm mvsicvm longo amoenissimvm : cviv...,Novum pratum musicum,Novum pratum musicum longo amoenissimum,Emanuel Adriaensen,1550-1604,"Phalèse, Pierre [printer] ; Bellère, Pierre [p...",1592.0,Belgium,Antverpiæ ; Antwerp,Excudebat Petrus Phalesius sibi & Ioanni Bellero,Method in lute tabulature followed by solo lut...,Methodvs ad omnes omnivm tonorvm cantiones in ...,Lute--Methods ; Lute music ; Madrigals ; Part ...,4; 5; 6,
4,4166297.0,Pratum Musicum ... cuius ambitu ... comprehend...,,,Emanuel Adriansen,1554-1604,,1600.0,Belgium,Antuerpiæ,Ex Typographia Musica Petri Phalesij,The contents of this edition are different fro...,,,4; 5; 6,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1956,4227906.0,"Madrigali Concertati a due, tre e quattro Voci...",,,"Bonaffino, Filippo",,,1623.0,Italy,Messina,Appresso Pietro Brea,,,,2; 3; 4,
1957,4530372.0,Filomenici Concenti di Madrigali Concertati a ...,,,"Modiana, Horatio",,,1625.0,Italy,Venetia,Appresso Alessandro Vincenti,,,,2; 3; 4; 5,
1958,4334854.0,Di Giouanni Ferrari ... il Primo Libro de Madr...,,,"Ferrari, Giovanni",,,1628.0,Italy,Venetia,Stampa del Gardano Appresso Bartolameo Magni,,,,2; 3; 4,
1959,4506403.0,"Madrigali Concertati a Due, e Tre Voci ... Ope...",,,"Marastoni, Antonio",,,1628.0,Italy,Venetia,Stampa del Gardano Appresso B Magni,,,,2; 3,


# 3. Working on publishers
In order to study and visualize the publishers, we need to standardize the data. 

## Cleaning publication place and country

For each value in the column "Place of publication", we find the standard name in English and replace the original with it. 

Because of the different spellings of the names, as well as obsolete names that are not present on Wikidata, it was simpler to write by hand a dictionary mapping the old names to the new ones, searching for them on Google, than to write a SPARQL query to extract this information. 

In [22]:
places = df_madri_pub_bef_1678[["Place of publication"]]
set_places = set(places["Place of publication"].dropna())


dict_places = {}
set_places_to_query = set()
for place in set_places:
    splitted_place = place.split(" ; ")
    for idx in range(len(splitted_place)):
        name = splitted_place[idx]
        if " " in name:
            name = name.split(" ")
            name = name[1]
        splitted_place[idx] = name
        set_places_to_query.add(name)
    dict_places[place] = {"names":splitted_place}
    
print(sorted(set_places))

['A Paris', 'Antuerpiæ', 'Antverpiæ ; Antwerp', 'Anuers', 'Anuersa', 'Anuersa ; Venice', 'Anversa', 'At London', 'Augspurg', 'Augusta', 'Bologna', 'Coloniae Agrippinae ; Cologne', 'Copenhave', 'Copenhaven', 'Copenhavē', 'Dresdae ; Dresden', 'En Anvers ; Antwerp', 'Erffurt', 'Excudebat Venetiis', 'Excudebat Venetiis ; Venice', 'Ferrara', 'Firenze', 'Firenze ; Florence', 'Francfort', 'Francoforti', 'Freybergk in Meissen', 'Genova', 'Getruckt zu Strassburg ; Strasbourg', 'Getruckt zů Strassburg ; Strasbourg', 'Gotha', 'Heidelberg', 'In Anuersa', 'In Anuersa ; Antwerp', 'In Anversa ; Antwerp', 'In Ferrara', 'In Roma ; Rome', 'In Roma ; Venice', 'In Venetia', 'In Venetia ; Venice', 'In Vineggia ; Venice', 'In Vinegia ; Venice', 'In Vinetia ; Venice', 'Jehna', 'Leipzig', 'London', 'Londra', 'Lyon', 'Lyone', 'Messina', 'Milano', 'Milano ; Venice', 'Monachii ; Munich', 'Monachij', 'Napoli', 'Noribergae', 'Noribergae ; Nuremberg', 'Noribergæ', 'Norimbergæ', 'Nürmberg', 'Nürnberg', 'Oruieto', 'O

In [23]:
dict_places = {'Veneggia': 'Venice', 'Excudebat Venetiis': 'Venice', 'Venetia ; Venice':'Venice', 'Noribergæ':'Nuremberg', 'Anuersa ; Venice':'Antwerp', 'Londra':'London', 
              'Gotha':'Gotha', 'Milano':'Milan', 'Augspurg':'Augsburg', 'Copenhavē':'Copenhagen', 'In Vinegia ; Venice':'Venice', 'In Roma ; Rome':'Rome', 'Anversa':'Antwerp',
              'Padua':'Padua', 'Noribergae':'Nuremberg', 'Anuersa':'Antwerp', 'Excudebat Venetiis ; Venice':'Venice', 'Dresdae ; Dresden': 'Dresden', 'Copenhave':'Copenhagen',
              'Oruieto':'Orvieto', 'Venetijs ; Venice':'Venice', 'In Venetia ; Venice':'Venice', 'In Vinetia ; Venice':'Venice', 'Bologna':'Bologna', 'Nürmberg':'Nuremberg',
              'Ventia':'Venice', 'Vinegia':'Venice', 'Firenze ; Florence':'Florence', 'Venetia':'Venice', 'Freybergk in Meissen':'Freiberg', 'Venegia':'Venice',
              'Getruckt zů Strassburg ; Strasbourg':'Strasbourg', 'Vinetia':'Venice', 'Leipzig':'Leipzig', 'Venetijs':'Venice', 'Ferrara':'Ferrara', 'Palermo':'Palermo',
              'Venetiis':'Venice', 'In Venetia':'Venice', 'In Anuersa':'Antwerp', 'Venezia':'Venice', 'Nürnberg':'Nuremberg', 'Francoforti':'Frankfurt am Main', 'Erffurt':'Erfurt',
              'Parma':'Parma', 'Anuers':'Antwerp', 'Venetiis ; Venice':'Venice', 'Oxford':'Oxford', 'Jehna':'Jena', 'Antverpiæ ; Antwerp':'Antwerp', 'Vineggia':'Venice',
              'Firenze':'Florence', 'Antuerpiæ':'Antwerp', 'in Anversa':'Antwerp', 'In Vineggia ; Venice':'Venice', 'London':'London', 'Napoli':'Naples', 'Augusta':'Augsburg',
              'Heidelberg':'Heidelberg', 'Wolferbyti':'Wolfenbüttel', 'Genova':'Genoa', 'Copenhaven':'Copenhagen', 'En Anvers ; Antwerp':'Antwerp', 'Roma':'Rome', 'Rotterodamo':'Rotterdam',
              'Venice':'Venice', 'Monachij':'Munich','In Roma ; Venice':'Rome', 'Stampato in Ferrara, et ristampato in Napoli, Per Constantino Vitale Ad istanza di Stefano Colacurcio':'Ferrara',
              'A Paris':'Paris', 'At London':'London', 'Coloniae Agrippinae ; Cologne':'Cologne', 'Francfort':'Frankfurt am Main', 'Getruckt zu Strassburg ; Strasbourg':'Strasbourg',
              'In Anuersa ; Antwerp':'Antwerp', 'In Anversa ; Antwerp':'Antwerp', 'In Ferrara':'Ferrara', 'Lyon':'Lyon', 'Lyone':'Lyon', 'Messina':'Messina','Milano ; Venice':'Milan',
              'Monachii ; Munich':'Munich', 'Noribergae ; Nuremberg':'Nuremberg', 'Norimbergæ':'Nuremberg','Pataviæ':'Padua', 'Rotenburg ob der Tauber':'Rothenburg ob der Tauber',
              'Vineggia ; Venice':'Venice'}

In [24]:
df_madri_pub_bef_1678= df_madri_pub_bef_1678.replace({"Place of publication":dict_places})

#The source data is wrong here: the country of publication for Augsburg is recorded as "United States" instead of Germany.
#The value is written as "Germany (East)" at this stage; the " (East)" suffix is stripped in a later cell.
df_madri_pub_bef_1678 = df_madri_pub_bef_1678.replace({'Country of publication':{'United States':'Germany (East)'}})

## Cleaning publisher column
Because the spelling of names was not standardized at that time, the name of the same publisher could be written in several ways. We did a first pass of cleaning with regular expressions, followed by a more thorough cleaning by hand, so that all records issued by the same publisher could be identified.

In [25]:
df_madri_pub_bef_1678= df_madri_pub_bef_1678.replace({"Composer":inverted_names})
df_madri_pub_bef_1678['Publisher'] = df_madri_pub_bef_1678['Publisher'].replace(to_replace ='[aA](p)?presso |[aA]pud |[pP]resso ', value = '', regex = True)
df_madri_pub_bef_1678.to_csv("data/publishers_to_clean.csv")
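
As a minimal illustration of what the regular expression above removes, the sketch below applies the same pattern with the standard `re` module to a few publisher strings. The sample strings are hypothetical and are not taken from the dataset.

```python
import re

# Same pattern as in the cell above: "appresso"/"presso" and "apud" prefixes
PREFIX_PATTERN = re.compile(r"[aA](p)?presso |[aA]pud |[pP]resso ")

# Hypothetical examples, only meant to show the effect of the pattern
samples = [
    "Appresso Angelo Gardano",
    "apud Petrum Phalesium",
    "Presso Girolamo Scotto",
    "Angelo Gardano",
]

stripped = [PREFIX_PATTERN.sub("", s) for s in samples]
print(stripped)
```

Because the pattern is not anchored to the start of the string, it also removes these words when they occur in the middle of a publisher statement.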

We continued the cleaning by hand to standardize the names, and then merged the resulting dataframe with the voices and madrigal titles that had been manually added to the records_cleaned dataframe.

In [26]:
publishers_cleaned = pd.read_csv("data/publishers_cleaned.csv", index_col="Column1")
publishers_cleaned = publishers_cleaned.merge(records_cleaned[['BL record ID', 'Voices', 'Titles madrigals']], how="left", on=["BL record ID"])
publishers_cleaned['Country of publication'] = publishers_cleaned['Country of publication'].replace(to_replace =r' \(East\)', value = '', regex = True)

To build geographical visualizations for the publishers, we extract the coordinates of each place of publication from Wikidata. The first step is to create a dictionary that maps each city in the dataframe to its country.

In [27]:
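#drop rows where either value is missing before pairing them, so that cities and countries stay aligned
places = publishers_cleaned[['Place of publication', 'Country of publication']].dropna()
cities = dict(zip(places['Place of publication'], places['Country of publication']))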

#it was necessary to change 'England' to 'United Kingdom' for the Wikidata query 
#which wouldn't have extracted Oxford and London otherwise
for city in cities: 
    if cities[city] == "England":
        cities[city] = "United Kingdom"

In [28]:
cities_locations={}
for city in cities:
    #query the endpoint based on the labels of the cities and countries.
    query_geolocations = """
    SELECT DISTINCT ?geolocation WHERE {
            {?city wdt:P31/wdt:P279* wd:Q515.} #instance or subclass of a city
            UNION
            {?city wdt:P31 wd:Q747074.} #instance of a comune of Italy (to include Orvieto which is not a city)
            ?city wdt:P17 ?country.
            ?country rdfs:label \""""+cities[city]+"""\"@en.
            ?city wdt:P625 ?geolocation.
            ?city rdfs:label \""""+city+"""\"@en.
            SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
        }

    """

    # set the endpoint 
    sparql_wd = SPARQLWrapper(wikidata_endpoint)
    # set the query
    sparql_wd.setQuery(query_geolocations)
    # set the returned format
    sparql_wd.setReturnFormat(JSON)
    # get the results
    results = sparql_wd.query().convert()

    # manipulate the result
    for result in results["results"]["bindings"]:
        cities_locations[city] = result['geolocation']['value']
        
#as an example, print the geolocation for Venice
print(cities_locations['Venice'])

Point(12.331944444 45.439722222)
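
The coordinates come back as a WKT string of the form `Point(longitude latitude)`, so they need to be split into numbers before they can be plotted. The following is a minimal sketch of such a conversion; `parse_point` is a hypothetical helper and is not part of the notebook.

```python
import re

# Wikidata returns coordinates as a WKT literal "Point(longitude latitude)"
POINT_PATTERN = re.compile(r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)")

def parse_point(wkt):
    """Return (latitude, longitude) as floats, or None if the string is not a WKT point."""
    match = POINT_PATTERN.fullmatch(wkt.strip())
    if match is None:
        return None
    longitude, latitude = float(match.group(1)), float(match.group(2))
    return latitude, longitude

print(parse_point("Point(12.331944444 45.439722222)"))
```

Returning `None` for anything that is not a point makes it easy to spot places for which the query found no coordinates.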


In [29]:
new_column = publishers_cleaned['Place of publication'].apply(lambda x: (cities_locations[x] if x in cities_locations else x))
publishers_cleaned.insert(loc = 10,
          column = 'Coordinates place of publication',
          value = new_column)
publishers_cleaned.to_csv('data/final datasets/final_publishers_cleaned.csv')
publishers_cleaned

Unnamed: 0,BL record ID,Title,Standardised title,Other titles,Composer,Composer life dates,Other names,Publication date (standardised),Country of publication,Place of publication,Coordinates place of publication,Publisher,Notes,Contents,Subject/genre terms,Voices,Titles madrigals
0,4166293,Il Primo Libro de Madrigali a Cinque Voci ... ...,,,Francesco Adriani,1539-1575,,1570,Italy,Venice,Point(12.331944444 45.439722222),Girolamo Scotto,,,,5,
1,4166294,Il Secondo Libro de Madrigali a Cinque Voci .....,,,Francesco Adriani,1539-1575,,1570,Italy,Venice,Point(12.331944444 45.439722222),Girolamo Scotto,,,,5,
2,4166296,Novvm pratvm mvsicvm longo amoenissimvm : cviv...,Novum pratum musicum,Novum pratum musicum longo amoenissimum,Emanuel Adriaensen,1550-1604,"Phalèse, Pierre [printer] ; Bellère, Pierre [p...",1592,Belgium,Antwerp,Point(4.399722222 51.221111111),Pietro Phalesio & Giouanni Bellero,Method in lute tabulature followed by solo lut...,Methodvs ad omnes omnivm tonorvm cantiones in ...,Lute--Methods ; Lute music ; Madrigals ; Part ...,4; 5; 6,
3,4166297,Pratum Musicum ... cuius ambitu ... comprehend...,,,Emanuel Adriansen,1554-1604,,1600,Belgium,Antwerp,Point(4.399722222 51.221111111),Pietro Phalesio,The contents of this edition are different fro...,,,4; 5; 6,
4,4166441,Di Agostino Agazzari ... Il Primo Libro de Mad...,,,Agostino Agazzari,1578-1640,,1600,Italy,Venice,Point(12.331944444 45.439722222),Angelo Gardano,,,,5; 6; 8,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
660,4762922,Il Primo Libro de' Madrigali a Cinque Voci ......,,,Cesare Zoilo,1584-1622,,1627,Italy,Naples,Point(14.25 40.833333333),Ambrosio Magnetta,,,,5,
661,4763937,Lucae Marentii ... madrigalia quinque vocum : ...,Madrigals. voices (5). Parts,,Luca Marenzio,1553-1599,,1601,Germany,Nuremberg,Point(11.0775 49.453888888),Paul Kauffmanns,,,,5,
662,14954746,Il secondo libro de Madrigali a 5 voci … con i...,"Madrigals a 5, Bk.II",,Salamone Rossi,1570-1630,,1601,Italy,Venice,Point(12.331944444 45.439722222),Amadino,"Listed in MGG, but no entries in BUCEM or RISM...",,,5,
663,15834926,Il secondo libro intabolatura di liuto : ove s...,Intabolatura di liuto. libro 2,,Melchior Neusidler,1531-1590,"Crecquillon, Thomas [composer] ; Gardane, Anto...",1566,Italy,Venice,Point(12.331944444 45.439722222),Antonio Gardano,"Collection of madrigals, chansons, dances and ...",Deus canticum novum (secunda pars: Quia delect...,Intabulations (Lute) ; Lute music,,


# 4. Study on madrigal texts

Creating a new dataframe with single titles of madrigals (when they are present in the dataset).

To include as many madrigal titles as possible, we also kept records published after 1678, as long as the composer belonged to the right period: these are later editions that still carry the music and lyrics of a 16th-17th century madrigal.

In [30]:
#preliminary imports
from nltk.corpus import stopwords
en_stops = set(stopwords.words('english'))
fr_stops = set(stopwords.words('french'))
it_stops = set(stopwords.words('italian'))
ge_stops = set(stopwords.words('german'))
all_stops = en_stops.union(fr_stops, it_stops, ge_stops)

from googletrans import Translator, constants
translator = Translator()

In [31]:
#selecting records which have a single madrigal in the title.
madrigals_cleaned = records_cleaned.loc[records_cleaned['Titles madrigals'].notnull()]
madrigals_cleaned = madrigals_cleaned.drop_duplicates(['Titles madrigals','Composer'])

#adding collections which have the list of madrigals in the column 'Contents'
#(pd.concat is used because DataFrame.append was removed in pandas 2.0)
madrigals_cleaned = pd.concat([madrigals_cleaned, records_cleaned.loc[records_cleaned['Contents'].notnull()]])
print(len(madrigals_cleaned))

795


In [32]:
dictionary_madrigals={}
idx = 0
for idx_row,row in madrigals_cleaned.iterrows():
    if not pd.isna(row['Titles madrigals']):
        madrigals = row['Titles madrigals'].split(';')
        for madrigal in madrigals: 
            translations = madrigal.split('. ')
            for translation in translations:
                words = re.split(r"[ :\-!,.]+|'s", translation)
                list_keywords = []
                for word in words: 
                    if word.lower() not in all_stops and word !="":
                        list_keywords.append(word.lower().strip(" '"))
                dictionary_madrigals[idx]={'Madrigal':translation, 'Composer single madrigal':row['Composer'], 'Voices':row['Voices'],'Keywords':list_keywords}
                idx+=1
                #only the first segment of the title is kept, the following ones are skipped
                break
    #when the title of the madrigal is in the Contents column
    else:
        list_of_madrigals = re.split(r" -+ |;\s*", row['Contents'])
        for madrigal in list_of_madrigals:
            composer = np.nan
            madri = madrigal
            if "/" in madrigal:
                temp = re.split(r"\s*/\s*", madrigal)
                madri = temp[0]
                composer = temp[1].strip("[] ")
            words = re.split(r"[ :\-!,.]+|'s", madri)
            list_keywords = []
            for word in words: 
                if word.lower() not in all_stops and word !="":
                    list_keywords.append(word.lower())
            dictionary_madrigals[idx]={'Madrigal':madri, 'Composer single madrigal':composer, 'Composer collection':row['Composer'], 
                                        'Voices':row['Voices'], 'Keywords': list_keywords}
            idx+=1
    
new_df_by_madrigal = pd.DataFrame.from_dict(dictionary_madrigals, orient="index")  
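
The keyword extraction is written out twice in the cell above, with a small difference: only the first branch strips quotes and spaces from the keywords. A minimal sketch of how it could be factored into one function is shown below. `extract_keywords` and `demo_stops` are hypothetical names, and the tiny stop-word set only stands in for the nltk lists used in the notebook.

```python
import re

# Same separators as in the cell above: spaces, some punctuation and the English possessive 's
SEPARATORS = re.compile(r"[ :\-!,.]+|'s")

def extract_keywords(title, stop_words):
    """Split a title into lowercase keywords, dropping stop words and empty tokens."""
    keywords = []
    for word in SEPARATORS.split(title):
        if word != "" and word.lower() not in stop_words:
            keywords.append(word.lower().strip(" '"))
    return keywords

# Tiny hypothetical stop-word list; the notebook uses the nltk lists instead
demo_stops = {"a", "is", "my", "in", "there", "her"}
print(extract_keywords("A garden is my lady's face.", demo_stops))
```

Using one function for both branches would apply the same stripping to titles taken from the `Contents` column.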

**Warning: the next cell takes several minutes to run**

It sends a large number of requests to the Google Translate API.

In [33]:
#Adding keywords in english
new_df_by_madrigal['Keywords_en'] = new_df_by_madrigal['Madrigal'].apply(lambda x: [word.lower().strip(" '") for word in re.split(r"[ :\-!,.]+|'s", translator.translate(x).text) if word.lower() not in all_stops and word !=""])
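
If the same title occurs more than once, one way to reduce the number of requests is to translate each distinct title only once and reuse the result. The sketch below shows the idea with a stand-in translation function, so it runs without network access. `translate_unique` and `fake_translate` are hypothetical names; in the notebook the function passed in would wrap `translator.translate(x).text`.

```python
def translate_unique(titles, translate_fn):
    """Translate each distinct title once and reuse the result for repeats."""
    cache = {}
    translated = []
    for title in titles:
        if title not in cache:
            cache[title] = translate_fn(title)
        translated.append(cache[title])
    return translated

# Stand-in for the real translation call, used here only to count requests
calls = []
def fake_translate(title):
    calls.append(title)
    return title.upper()

result = translate_unique(["Amarilli", "Dolce", "Amarilli"], fake_translate)
print(result, len(calls))
```

The output keeps the order and length of the input, so it could be assigned back to a dataframe column directly.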

In [34]:
new_df_by_madrigal

Unnamed: 0,Madrigal,Composer single madrigal,Voices,Keywords,Composer collection,Keywords_en
0,Shall I abide this jesting,Richard Alison,5,"[shall, abide, jesting]",,"[shall, abide, jesting]"
1,A garden is my lady's face.,Richard Alison,5,"[garden, lady, face]",,"[garden, lady, face]"
2,There is a Garden in her Face,Richard Alison,5,"[garden, face]",,"[garden, face]"
3,Ah me! Where is my true Love,Felice Anerio,4,"[ah, true, love]",,"[ah, true, love]"
4,"When lo, by Break of Morning",Felice Anerio,4,"[break, morning]",,"[break, morning]"
...,...,...,...,...,...,...
6150,The Nightingale that sweetly doth complayne (s...,Peter Phillips,5,"[nightingale, sweetly, doth, complayne, (secon...",,"[nightingale, sweetly, doth, complayne, (secon..."
6151,As Mopsus went his silly flock foorth leading,Stefano Venturi,5,"[mopsus, went, silly, flock, foorth, leading]",,"[mopsus, went, silly, flock, foorth, leading]"
6152,Flora faire Nimph whilst silly Lambs are feeding,Giovanni Feretti,5,"[flora, faire, nimph, whilst, silly, lambs, fe...",,"[flora, fairy, nymph, silly, lambs, feeding]"
6153,"My sweet Layis, Lady mistres",Giovanni di Macque,5,"[sweet, layis, lady, mistres]",,"[sweet, layis, lady, mistress]"


In [36]:
#saving the final dataset
new_df_by_madrigal.to_csv("data/final datasets/madrigals_by_title.csv")

In [37]:
#reading back the file saved in the previous cell
test_word_cloud = pd.read_csv("data/final datasets/madrigals_by_title.csv")

In [40]:
big_string=""
for idx_row,row in new_df_by_madrigal.iterrows():
    big_string+=" "+" ".join(row['Keywords'])
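#remove the words that only number the parts of a madrigal; \b restricts the match to whole words,
#so that e.g. "part" is not cut out of a longer word
part_words = ['part', 'first', 'second', 'third', 'fourth', 'fifth', 'sixth','prima', 'seconda', 'parte', 'terza', 'quarta']
big_string = re.sub(r"\b(?:" + "|".join(part_words) + r")\b", "", big_string)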
print(big_string)

 shall abide jesting garden lady face garden face ah true love break morning deh dimm amor dicoche fra spring glory bianco dolce cigno gentle silver swan alte sfere madrigale dolce usignolo mai provasti donna hark hear heav'nly harmony oriana farewell nightingale fly love sister awake sweet delightful lillies oriana walk'd take air cupid bed roses cytherea smiling said hills corina trips found her? heard noise sù sù sù dormir creatures merry minded lure falconers lure shepherds follow cruel unkind languish complain grief sleep fond fancy sing ye nymphs thyrsis sleepest thou flow tears weep mine eyes glance look'd look'd ch'ami vita amatemi ben baci amorosi cari turn amarillis melissomelos bee madrigall let us sing merry glee sweet merry month may cast doubtful care bright sun hail thou merry month may sweet merry month amarilli bella fere selvagge tune viol love love truth calm air gentle swains every bush new springing ev'ry bush new springing every bush new springing faustina hath fa