# Notebook 1: Creating Items for Building Complexes

This notebook implements the first step of the Klosterdatenbank-to-FactGrid workflow: creating items for the building complexes. It contains descriptions of the underlying data model and of the workflow in general, as well as specific instructions for running the notebook. Markdown cells containing descriptions are marked as `#description`; instructional sections are marked as `#instruction`.

Strictly speaking, the monastery database does not contain dedicated information on building complexes. Information on where a religious community had its place of operation is stored in the `gs_monastery_location` table. Each row of this table assigns a religious community (`gsn_id`) to a location (`place_id`) and, if known, to specific coordinates within this location (`longitude`, `latitude`). Such an assignment implies that the community lived or worked at this location at a certain point in time. We therefore make the central assumption that a building complex of some kind, consisting of at least one building, must have existed there. Accordingly, each building complex created in this step represents two things: a row from the `gs_monastery_location` table, and thus the assignment of a monastery to a specific location, and the physical buildings in which the religious community worked. These buildings may have existed before or after this use and may have served other purposes.

The notebook requires the following libraries to run. If an error occurs, make sure the libraries are installed on your system.

In [1]:
import pandas as pd
import numpy as np
import os
import csv
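
A quick way to see which libraries are missing before running the notebook is sketched below. It assumes that `openpyxl` is needed in addition to the imports above, since pandas uses it as the engine for reading and writing `.xlsx` files. The helper name `find_missing_modules` is hypothetical.

```python
from importlib.util import find_spec

def find_missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]

# openpyxl is listed as an assumption: pandas relies on it for .xlsx files
required = ["pandas", "numpy", "openpyxl"]
missing = find_missing_modules(required)
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required libraries are available.")
```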

First, the export files are loaded into [Dataframes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). The dataframes are stored in a dictionary with the keys being the filenames, for easier access.

In [2]:
# Load Access exports
export_files = {}
for export_file in os.listdir("data/exports_monasteryDB"):
    if export_file.endswith("xlsx"):
        export_files[export_file.split(".")[0]] = f"data/exports_monasteryDB/{export_file}"

# Create dataframes for each table
dataframes = {key: pd.read_excel(value) for key, value in export_files.items()}

# Add dataframes for building complexes and monasteries already in FactGrid (stored in a different directory)
dataframes["building_complexes_in_factgrid"] = pd.read_csv("data/factgrid_data/building_complexes_in_factgrid.csv")
dataframes["monasteries_in_factgrid"] = pd.read_csv("data/factgrid_data/monasteries_in_factgrid.csv")

Since `gs_monastery_location` does not contain the names of the monasteries, the table is joined to `gs_monastery` to add the missing information. The resulting dataframe is filtered so that it only contains religious communities with the status "Online", meaning that they are no longer being worked on, and is then reduced to the relevant columns. Finally, to make sure that no duplicate building complexes are created, the table is filtered against the building complexes that already exist in FactGrid.

In [3]:
# Merge gs_monastery_location and gs_monastery
merged_df = pd.merge(dataframes["gs_monastery_location"], dataframes["gs_monastery"], left_on='gsn_id', right_on='id_gsn', how='left')
# Filter for status 'online'
online_df = merged_df[merged_df["status"] == "Online"]
# Define columns to drop
drop_columns = [
    "relocated", 
    "comment", 
    "main_location", 
    "diocese_id", 
    "id_monastery", 
    "date_created", 
    "created_by_user", 
    "note", 
    "patrocinium",
    "selection", 
    "processing_status", 
    "gs_persons", 
    "selection_criteria", 
    "last_change", 
    "changed_by_user", 
    "founder",
    "Unnamed: 0_x",
    "Unnamed: 0_y"
]
# Prepare dataframe by dropping unnecessary columns
prepared_df = online_df.drop(drop_columns, axis="columns")
prepared_df = prepared_df[~prepared_df["id_monastery_location"].isin(dataframes["building_complexes_in_factgrid"]["GSVocabTerm"])]
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name
0,16673,46484371,10880,1773,,,1869,,,14.06718,49.961381,Beraun,10880,Online,"Piaristenkolleg Beraun (Beroun), Tschechien"
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,50.828889,,3593,Online,Zisterzienserkloster Georgenthal
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,53.426389,Stettin,3468,Online,"Kollegiatstift St. Otto, Stettin (Szczecin), P..."
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,,Luditz,11549,Online,"Dominikanerinnenkloster Luditz (Žlutice), Tsch..."
4,16923,46479178,11613,1285,1380.0,wahrscheinlich 1285,1824,,,5.232053,50.52405,Huy,11613,Online,"Magdalenerinnenkloster Saint-Quirin, Huy, Belgien"
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,50.925171,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln"
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,51.901389,,302,Online,Prämonstratenserstift Halberstadt
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,53.37613,,8354,Online,Prämonstratenserinnenstift (Nijenklooster) Klo...
8,765,4845,367,1427,,,1484,,,,,Hemeringen,367,Online,Augustinerchorfrauenstift Egestorf
9,367,39870,367,1293,1303.0,ca. 1298,1427,,,,,Egestorf,367,Online,Augustinerchorfrauenstift Egestorf
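
The filtering against existing FactGrid items can be illustrated with a small, self-contained sketch. The data is invented for illustration; only the column names `id_monastery_location` and `GSVocabTerm` are taken from the cell above. The sketch assumes that `GSVocabTerm` holds plain numeric IDs. If the export instead contains prefixed terms such as `GSMonasteryLocation16673`, the prefix would have to be stripped before the comparison.

```python
import pandas as pd

# Toy stand-ins for the real tables (values invented for illustration)
locations = pd.DataFrame({
    "id_monastery_location": [16673, 6072, 1865],
    "monastery_name": ["A", "B", "C"],
})
in_factgrid = pd.DataFrame({"GSVocabTerm": [6072]})

# Anti-join: keep only rows whose ID is not yet present in FactGrid
already_there = locations["id_monastery_location"].isin(in_factgrid["GSVocabTerm"])
new_locations = locations[~already_there]
print(new_locations["id_monastery_location"].tolist())
```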


To double-check for potential duplicates, the following cell finds building complexes that are connected to monasteries which already exist in FactGrid. If the resulting DataFrame is empty, all building complexes will be linked to newly created monastery items.

In [4]:
existing_monasteries = prepared_df[prepared_df["gsn_id"].isin(dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"])]
existing_monasteries

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name
17,8445,20353,3823,1320,,,1531,,,10.266809,50.718199,,3823,Online,Zisterzienserkloster Georgenzell


It is expected that items in FactGrid have a label in at least one language. While the FactGrid ID (also referred to as the "Q-Number") uniquely identifies the item, the label captures the name of the item in everyday language. The label is also indexed for text-based search. The items created in this project are named according to the following rules:
- For the religious communities, the name from the monastery database is used as the label, for example "Zisterzienserkloster Georgenzell".
- For the building complexes, the labels are constructed according to the following schema: `Gebäudekomplex <monastery_name> [(<location_name>)]`. Here, `monastery_name` is again the name of the religious community from the `gs_monastery` table. `location_name` is a column of the `gs_monastery_location` table. In this column, if available, the specific name given to this location is stored. 

For example, the "Benediktinerinnenkloster Mielen, Sint-Truiden, Belgien" (GSN [11665](https://klosterdatenbank.adw-goe.de/gsn/11665)) has two locations in the Belgian town of Sint-Truiden, namely the location "Sint Truiden" and the location "Metsteren" (see Figure 1). The constructed labels are then "Gebäudekomplex Benediktinerinnenkloster Mielen, Sint-Truiden, Belgien (Sint-Truiden)" and "Gebäudekomplex Benediktinerinnenkloster Mielen, Sint-Truiden, Belgien (Metsteren)". However, location names are not available in all cases, which can lead to duplicate labels. These duplicates are displayed in the workflow so that location names can be added to distinguish the items better.

<img src="documentation-images/Standorte GSN11665.png" alt="Monastery Locations of GSN 11665" width="500">

*Figure 1: Building complexes of the Benedictine nuns' monastery Mielen in Sint-Truiden, Belgium (GSN 11665). Base layer: OpenStreetMap.*

The following cell constructs the labels and saves them in a new column called "Lde" (see [Quickstatements specification](https://www.wikidata.org/wiki/Help:QuickStatements#Adding_labels,_aliases,_descriptions_and_sitelinks)).

In [5]:
# 1. Create new column with labels
prepared_df['Lde'] = "Gebäudekomplex " + prepared_df["monastery_name"].str.cat(prepared_df["location_name"].fillna(''), sep=" (") +")"
# 2. If necessary, delete empty brackets at end of labels
prepared_df['Lde'] = prepared_df["Lde"].str.replace(r'\(\)', '', regex=True).apply(lambda x: f'\"{x.strip()}\"')
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name,Lde
0,16673,46484371,10880,1773,,,1869,,,14.06718,49.961381,Beraun,10880,Online,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun..."
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,50.828889,,3593,Online,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal"""
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,53.426389,Stettin,3468,Online,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett..."
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,,Luditz,11549,Online,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz..."
4,16923,46479178,11613,1285,1380.0,wahrscheinlich 1285,1824,,,5.232053,50.52405,Huy,11613,Online,"Magdalenerinnenkloster Saint-Quirin, Huy, Belgien","""Gebäudekomplex Magdalenerinnenkloster Saint-Q..."
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,50.925171,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B..."
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,51.901389,,302,Online,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst..."
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,53.37613,,8354,Online,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni..."
8,765,4845,367,1427,,,1484,,,,,Hemeringen,367,Online,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges..."
9,367,39870,367,1293,1303.0,ca. 1298,1427,,,,,Egestorf,367,Online,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges..."
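
The labelling schema can also be expressed as a small helper that handles the optional location name explicitly, so that no empty brackets have to be removed afterwards. This is a minimal sketch rather than the notebook's implementation; the function name `build_label` is hypothetical, and the surrounding double quotes mirror the quoting used in the cell above.

```python
import math

def build_label(monastery_name, location_name=None):
    """Build 'Gebäudekomplex <monastery_name> [(<location_name>)]' wrapped in double quotes."""
    label = f"Gebäudekomplex {monastery_name.strip()}"
    # pandas represents missing values as float NaN, so both None and NaN count as missing
    is_missing = location_name is None or (
        isinstance(location_name, float) and math.isnan(location_name)
    )
    if not is_missing and str(location_name).strip():
        label += f" ({str(location_name).strip()})"
    return f'"{label}"'

print(build_label("Zisterzienserkloster Georgenthal"))
print(build_label("Augustinerchorfrauenstift Egestorf", "Hemeringen"))
```

With pandas, such a helper could be applied row by row, for example via `prepared_df.apply(lambda row: build_label(row["monastery_name"], row["location_name"]), axis=1)`.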


As mentioned above, there might be duplicate labels in cases where locations don't have an explicit name. Since these items can still be distinguished from one another by their identifier and coordinates, this is not necessarily a problem. However, the following cell creates a list of all duplicate labels so that they can be examined.

**In order to resolve the duplicates**

1. Open and inspect the table located at `data/intermediate_results/duplicate_building_complex_labels.xlsx`
2. Add location names in the monastery database
3. Create new exports from the monastery database and replace `data/exports_monasteryDB/gs_monastery.xlsx` and `data/exports_monasteryDB/gs_monastery_location.xlsx` with the new files
4. Re-run the notebook. The cell below now should no longer contain the duplicates you resolved. 

In [6]:
duplicated_building_complex_labels = prepared_df[prepared_df.duplicated(subset="Lde", keep=False)]
duplicated_building_complex_labels.to_excel('data/intermediate_results/duplicate_building_complex_labels.xlsx')
duplicated_building_complex_labels

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name,Lde
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,50.925171,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B..."
9,367,39870,367,1293,1303.0,ca. 1298,1427,,,,,Egestorf,367,Online,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges..."
14,7909,39870,367,1484,,,1559,,,,,Egestorf,367,Online,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges..."
15,7150,6305,50228,1367,1293.0,Ende des 13. Jahrhunderts,1478,1479.0,1478/1479,6.944923,50.938611,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B..."
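
The role of `keep=False` can be seen in a small example: without it, pandas flags only the second and later occurrences, so the first row of each duplicate group would be missing from the review table. The data below is invented for illustration.

```python
import pandas as pd

labels = pd.DataFrame({
    "id_monastery_location": [1, 2, 3, 4],
    "Lde": ['"Gebäudekomplex A"', '"Gebäudekomplex B"', '"Gebäudekomplex A"', '"Gebäudekomplex C"'],
})

# keep=False marks every member of a duplicate group, not just the repeats
all_duplicates = labels[labels.duplicated(subset="Lde", keep=False)]
# The default (keep="first") leaves the first occurrence unmarked
repeats_only = labels[labels.duplicated(subset="Lde")]

print(all_duplicates["id_monastery_location"].tolist())
print(repeats_only["id_monastery_location"].tolist())
```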


FactGrid is a multilingual platform. Therefore, the labels for the monasteries and building complexes should be created not only in German, but also in English. Due to the heterogeneity of the monastery names in the database, a rule-based translation is difficult to implement. Instead, a large language model was used. The model, the prompting, and the details of the translation are described in the notebook "1a - Translation". We are using the [GWDG/KISSKI API](https://docs.hpc.gwdg.de/services/chat-ai/index.html), so a [SAIA API key](https://docs.hpc.gwdg.de/services/saia/index.html) is needed to execute that notebook. Since the translation process can take some time, it has been moved to a separate notebook.

In [7]:
to_translate = prepared_df[["monastery_name", 'Lde']].copy()
to_translate["note"] = "x"
to_translate["Dde"] = "x"
to_translate = to_translate.rename(columns={"Lde": "building_Lde", "Dde": "building_Dde", "monastery_name" : "monastery_Lde", "note": "monastery_Dde"})
to_translate.to_csv("data/translation/to_translate.csv")
to_translate

Unnamed: 0,monastery_Lde,building_Lde,monastery_Dde,building_Dde
0,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun...",x,x
1,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal""",x,x
2,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett...",x,x
3,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz...",x,x
4,"Magdalenerinnenkloster Saint-Quirin, Huy, Belgien","""Gebäudekomplex Magdalenerinnenkloster Saint-Q...",x,x
5,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B...",x,x
6,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst...",x,x
7,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...",x,x
8,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges...",x,x
9,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges...",x,x


After executing the above cell, the table `to_translate.csv` is generated in `data/translation`. It contains all terms that should be translated. Execute Notebook 1a. Once its execution is completed, the `data/translation` folder should contain a file named `translated.csv` with the translations. Once the file exists, you can run the next cell to load the translated labels.

In [8]:
translated = pd.read_csv("data/translation/translated.csv")
prepared_df["Len"] = translated["building_Len"].apply(lambda x:f'\"{x}\"')
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name,Lde,Len
0,16673,46484371,10880,1773,,,1869,,,14.06718,49.961381,Beraun,10880,Online,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun...","""Building complex Piarist college Beraun (Bero..."
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,50.828889,,3593,Online,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal""","""Building complex Cistercian monastery Georgen..."
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,53.426389,Stettin,3468,Online,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett...","""Building complex Collegiate Church of St. Ott..."
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,,Luditz,11549,Online,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz...","""Building complex Dominican Nuns' monastery Lu..."
4,16923,46479178,11613,1285,1380.0,wahrscheinlich 1285,1824,,,5.232053,50.52405,Huy,11613,Online,"Magdalenerinnenkloster Saint-Quirin, Huy, Belgien","""Gebäudekomplex Magdalenerinnenkloster Saint-Q...","""Building complex Monastery of St. Mary Magdal..."
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,50.925171,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B...","""Building complex Franciscans St. Boniface, Co..."
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,51.901389,,302,Online,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst...","""Building complex Premonstratensians of Halber..."
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,53.37613,,8354,Online,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...","""Building complex Premonstratensian nuns (Nije..."
8,765,4845,367,1427,,,1484,,,,,Hemeringen,367,Online,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges...","""Building complex Canonesses Regular of St Aug..."
9,367,39870,367,1293,1303.0,ca. 1298,1427,,,,,Egestorf,367,Online,Augustinerchorfrauenstift Egestorf,"""Gebäudekomplex Augustinerchorfrauenstift Eges...","""Building complex Canonesses Regular of St Aug..."
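
When a column from another table is assigned to a DataFrame, pandas aligns the values by index label, not by position. The sketch below illustrates why this matters if the index has gaps after filtering. It assumes that the translated file still contains the index column written by `to_csv`; whether `translated.csv` does so depends on Notebook 1a. The data is invented for illustration.

```python
import io
import pandas as pd

# Toy frame with a non-contiguous index, as left behind by filtering
prepared = pd.DataFrame({"Lde": ["a", "b", "c"]}, index=[0, 5, 9])

# Stand-in for translated.csv; the first (unnamed) column is the saved index
csv_text = ",building_Len\n0,A\n5,B\n9,C\n"

# Reading without index_col creates a fresh 0..n-1 index, so labels 5 and 9 find no match
by_position = pd.read_csv(io.StringIO(csv_text))
misaligned = prepared.assign(Len=by_position["building_Len"])

# Reading the first column as the index restores the original labels
by_label = pd.read_csv(io.StringIO(csv_text), index_col=0)
aligned = prepared.assign(Len=by_label["building_Len"])

print(misaligned["Len"].isna().sum(), "values lost without index_col")
print(aligned["Len"].tolist())
```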


Our data model separates religious communities from the building complexes in which they lived and worked. In this model, the geocoordinates of a community's location are properties of the building complex. The monastery database records locations at two levels of accuracy: the coordinates represent either the exact point where the building stood, or the central point of the place (e.g. a village) in which it stood. Note that centroid-based coordinates are only an approximation of the centroid of the modern place. Where the exact location of a building complex is unknown, the item is not linked to any coordinates; instead, the coordinates of the place in which it is located should be queried. In all other cases, the coordinates are linked directly to the building complex, using the values from the `latitude` and `longitude` columns as [P48](https://database.factgrid.de/wiki/Property:P48).

In [9]:
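# Build the P48 coordinate string only for rows that have both latitude and longitude
for index, row in prepared_df.iterrows():
    if (not pd.isna(row["latitude"])) and (not pd.isna(row["longitude"])):
        prepared_df.loc[index, "P48"] = f'@{row["latitude"]}/{row["longitude"]}'
# Drop rows that share the same coordinates; note that missing values count as equal,
# so only the first row without coordinates is kept
prepared_df.drop_duplicates(subset="P48", inplace=True)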
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name,Lde,Len,P48
0,16673,46484371,10880,1773,,,1869,,,14.06718,49.961381,Beraun,10880,Online,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun...","""Building complex Piarist college Beraun (Bero...",@49.96138086190997/14.067180236807877
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,50.828889,,3593,Online,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal""","""Building complex Cistercian monastery Georgen...",@50.828889/10.659167
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,53.426389,Stettin,3468,Online,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett...","""Building complex Collegiate Church of St. Ott...",@53.426389/14.56
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,,Luditz,11549,Online,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz...","""Building complex Dominican Nuns' monastery Lu...",
4,16923,46479178,11613,1285,1380.0,wahrscheinlich 1285,1824,,,5.232053,50.52405,Huy,11613,Online,"Magdalenerinnenkloster Saint-Quirin, Huy, Belgien","""Gebäudekomplex Magdalenerinnenkloster Saint-Q...","""Building complex Monastery of St. Mary Magdal...",@50.52404981028194/5.232052788226349
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,50.925171,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B...","""Building complex Franciscans St. Boniface, Co...",@50.92517069926024/6.95823197738584
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,51.901389,,302,Online,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst...","""Building complex Premonstratensians of Halber...",@51.901389/11.042222
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,53.37613,,8354,Online,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...","""Building complex Premonstratensian nuns (Nije...",@53.3761296985323/6.42447009029927
10,9788,46483304,5178,806,823.0,zwischen 806/807 und 823,1811,,,9.045081,47.159873,,5178,Online,"Augustinerchorfrauen Schänis, Schweiz","""Gebäudekomplex Augustinerchorfrauen Schänis, ...","""Building complex Canonesses Regular of St Aug...",@47.159873/9.045081
11,1259,46477066,2013,1136,,,1803,,,12.88855,47.73154,Reichenhall,2013,Online,"Augustinerchorherrenstift St. Zeno, Reichenhall","""Gebäudekomplex Augustinerchorherrenstift St. ...","""Building complex Canons Regular of St Augusti...",@47.73154/12.88855
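
Because `drop_duplicates` treats missing values as equal to each other, deduplicating on `P48` keeps only the first row without coordinates. If all building complexes without coordinates should be retained, one option is to deduplicate only among rows that actually have a value. The following is a sketch under that assumption, using invented data; whether this behaviour is wanted depends on the project's modelling decisions.

```python
import pandas as pd

df = pd.DataFrame({
    "latitude": [50.8, 50.8, None, None],
    "longitude": [10.6, 10.6, None, None],
})

# Build the coordinate string, then blank it out where a coordinate is missing
has_coords = df["latitude"].notna() & df["longitude"].notna()
coords = "@" + df["latitude"].astype(str) + "/" + df["longitude"].astype(str)
df["P48"] = coords.where(has_coords)

# Plain drop_duplicates: a missing value counts as a duplicate of another missing value
plain = df.drop_duplicates(subset="P48")

# Alternative: drop a row only if it repeats an existing, non-missing P48 value
is_repeat = df["P48"].notna() & df.duplicated(subset="P48")
keep_missing = df[~is_repeat]

print(len(plain), len(keep_missing))
```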


Connecting the building complexes with the places in which they were located requires that items for these places exist in FactGrid. The open source service [GeoNames](https://www.geonames.org/) was the central tool for collecting place data in the monastery database, so the database holds a GeoNames ID for each place. In FactGrid, there is also a qualifier (P418) for the GeoNames ID. This ID can be used to match the place data of both sources and subsequently to fill in missing places. The notebook "1b - Place Matching" describes this process.

To match all required places, a mapping between FactGrid and the place data from the monastery database is needed. All information that is already available should be placed in a file called `places_reconciled.xlsx` in the `data/reconciliation` folder. Make sure that the table has at least one column called `place_id` and one called `factgrid_id`, which hold the ID of the place in the table `gs_places` and in FactGrid, respectively. The following cell loads the reconciled places and merges them with the data. Rows whose place has no FactGrid ID are saved in a separate table called `monasteries_without_factGrid.xlsx` in the `data/results` folder. Find or create the missing items in FactGrid and add the information to `places_reconciled.xlsx`. Afterwards, re-run the workflow.

In [10]:
# 1. Load the reconciled places
places_reconciled = pd.read_excel("data/reconciliation/places_reconciled.xlsx")[["place_id", "factgrid_id"]]
# 2. Merge them to the table with prepared monasteries
prepared_df = pd.merge(prepared_df, places_reconciled, how="left", on="place_id")
prepared_df = prepared_df.rename(columns={"factgrid_id": "P83"})
# 3. Filter out rows without a FactGrid place and store them in a separate table
missing_factgrid_ids = prepared_df[prepared_df["P83"].isna()]
missing_factgrid_ids.to_excel("data/results/monasteries_without_factGrid.xlsx")
prepared_df = prepared_df.dropna(subset=["P83"])
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name,Lde,Len,P48,P83
0,16673,46484371,10880,1773,,,1869,,,14.06718,49.961381,Beraun,10880,Online,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun...","""Building complex Piarist college Beraun (Bero...",@49.96138086190997/14.067180236807877,Q629276
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,50.828889,,3593,Online,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal""","""Building complex Cistercian monastery Georgen...",@50.828889/10.659167,Q23292
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,53.426389,Stettin,3468,Online,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett...","""Building complex Collegiate Church of St. Ott...",@53.426389/14.56,Q21782
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,,Luditz,11549,Online,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz...","""Building complex Dominican Nuns' monastery Lu...",,Q623515
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,50.925171,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B...","""Building complex Franciscans St. Boniface, Co...",@50.92517069926024/6.95823197738584,Q10400
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,51.901389,,302,Online,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst...","""Building complex Premonstratensians of Halber...",@51.901389/11.042222,Q10374
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,53.37613,,8354,Online,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...","""Building complex Premonstratensian nuns (Nije...",@53.3761296985323/6.42447009029927,Q1348301
8,9788,46483304,5178,806,823.0,zwischen 806/807 und 823,1811,,,9.045081,47.159873,,5178,Online,"Augustinerchorfrauen Schänis, Schweiz","""Gebäudekomplex Augustinerchorfrauen Schänis, ...","""Building complex Canonesses Regular of St Aug...",@47.159873/9.045081,Q880959
9,1259,46477066,2013,1136,,,1803,,,12.88855,47.73154,Reichenhall,2013,Online,"Augustinerchorherrenstift St. Zeno, Reichenhall","""Gebäudekomplex Augustinerchorherrenstift St. ...","""Building complex Canons Regular of St Augusti...",@47.73154/12.88855,Q82598
10,7119,6305,50197,1263,1313.0,vor 1313,1802,,,6.956109,50.936995,,50197,Online,"Augustinerinnen-, später Benediktinerinnenklos...","""Gebäudekomplex Augustinerinnen-, später Bened...","""Building complex Augustinian nuns, later Bene...",@50.93699504169764/6.956108698612635,Q10400
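
The split into reconciled and unreconciled places can also be sketched with the `indicator` option of `pd.merge`, which records for every row whether a match was found. The data and the placeholder Q-numbers below are invented for illustration.

```python
import pandas as pd

complexes = pd.DataFrame({
    "id_monastery_location": [10, 11, 12],
    "place_id": [100, 200, 300],
})
places_reconciled = pd.DataFrame({
    "place_id": [100, 300],
    "factgrid_id": ["Q1", "Q3"],  # placeholder Q-numbers, not real FactGrid items
})

merged = complexes.merge(places_reconciled, how="left", on="place_id", indicator=True)

# Rows marked 'left_only' have no reconciled FactGrid place yet
missing = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
matched = (
    merged[merged["_merge"] == "both"]
    .drop(columns="_merge")
    .rename(columns={"factgrid_id": "P83"})
)

print(missing["place_id"].tolist())
print(matched["P83"].tolist())
```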


To state that these items are building complexes, the item [Q635758](https://database.factgrid.de/wiki/Item:Q635758) (building complex) is connected to all entries using [P2](https://database.factgrid.de/wiki/Property:P2) (instance of).

In [11]:
prepared_df["P2"] = "Q635758"
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,monastery_name,Lde,Len,P48,P83,P2
0,16673,46484371,10880,1773,,,1869,,,14.06718,49.961381,Beraun,10880,Online,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun...","""Building complex Piarist college Beraun (Bero...",@49.96138086190997/14.067180236807877,Q629276,Q635758
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,50.828889,,3593,Online,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal""","""Building complex Cistercian monastery Georgen...",@50.828889/10.659167,Q23292,Q635758
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,53.426389,Stettin,3468,Online,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett...","""Building complex Collegiate Church of St. Ott...",@53.426389/14.56,Q21782,Q635758
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,,Luditz,11549,Online,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz...","""Building complex Dominican Nuns' monastery Lu...",,Q623515,Q635758
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,50.925171,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B...","""Building complex Franciscans St. Boniface, Co...",@50.92517069926024/6.95823197738584,Q10400,Q635758
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,51.901389,,302,Online,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst...","""Building complex Premonstratensians of Halber...",@51.901389/11.042222,Q10374,Q635758
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,53.37613,,8354,Online,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...","""Building complex Premonstratensian nuns (Nije...",@53.3761296985323/6.42447009029927,Q1348301,Q635758
8,9788,46483304,5178,806,823.0,zwischen 806/807 und 823,1811,,,9.045081,47.159873,,5178,Online,"Augustinerchorfrauen Schänis, Schweiz","""Gebäudekomplex Augustinerchorfrauen Schänis, ...","""Building complex Canonesses Regular of St Aug...",@47.159873/9.045081,Q880959,Q635758
9,1259,46477066,2013,1136,,,1803,,,12.88855,47.73154,Reichenhall,2013,Online,"Augustinerchorherrenstift St. Zeno, Reichenhall","""Gebäudekomplex Augustinerchorherrenstift St. ...","""Building complex Canons Regular of St Augusti...",@47.73154/12.88855,Q82598,Q635758
10,7119,6305,50197,1263,1313.0,vor 1313,1802,,,6.956109,50.936995,,50197,Online,"Augustinerinnen-, später Benediktinerinnenklos...","""Gebäudekomplex Augustinerinnen-, später Bened...","""Building complex Augustinian nuns, later Bene...",@50.93699504169764/6.956108698612635,Q10400,Q635758


To keep a mapping between the monastery database and FactGrid, every item receives a distinct vocabulary term that is constructed from the `id_monastery_location` of the `gs_monastery_location` table. The FactGrid property to use is [P1301](https://database.factgrid.de/wiki/Property:P1301) (GS vocabulary term). The term follows the pattern `GSMonasteryLocation<id_monastery_location>`.

In [12]:
prepared_df['P1301'] = prepared_df['id_monastery_location'].apply(lambda x: f'\"GSMonasteryLocation{x}\"')
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,location_name,id_gsn,status,monastery_name,Lde,Len,P48,P83,P2,P1301
0,16673,46484371,10880,1773,,,1869,,,14.06718,...,Beraun,10880,Online,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun...","""Building complex Piarist college Beraun (Bero...",@49.96138086190997/14.067180236807877,Q629276,Q635758,"""GSMonasteryLocation16673"""
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,...,,3593,Online,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal""","""Building complex Cistercian monastery Georgen...",@50.828889/10.659167,Q23292,Q635758,"""GSMonasteryLocation6072"""
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,...,Stettin,3468,Online,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett...","""Building complex Collegiate Church of St. Ott...",@53.426389/14.56,Q21782,Q635758,"""GSMonasteryLocation1865"""
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,...,Luditz,11549,Online,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz...","""Building complex Dominican Nuns' monastery Lu...",,Q623515,Q635758,"""GSMonasteryLocation16824"""
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,...,,50228,Online,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B...","""Building complex Franciscans St. Boniface, Co...",@50.92517069926024/6.95823197738584,Q10400,Q635758,"""GSMonasteryLocation16936"""
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,...,,302,Online,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst...","""Building complex Premonstratensians of Halber...",@51.901389/11.042222,Q10374,Q635758,"""GSMonasteryLocation10564"""
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,...,,8354,Online,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...","""Building complex Premonstratensian nuns (Nije...",@53.3761296985323/6.42447009029927,Q1348301,Q635758,"""GSMonasteryLocation13544"""
8,9788,46483304,5178,806,823.0,zwischen 806/807 und 823,1811,,,9.045081,...,,5178,Online,"Augustinerchorfrauen Schänis, Schweiz","""Gebäudekomplex Augustinerchorfrauen Schänis, ...","""Building complex Canonesses Regular of St Aug...",@47.159873/9.045081,Q880959,Q635758,"""GSMonasteryLocation9788"""
9,1259,46477066,2013,1136,,,1803,,,12.88855,...,Reichenhall,2013,Online,"Augustinerchorherrenstift St. Zeno, Reichenhall","""Gebäudekomplex Augustinerchorherrenstift St. ...","""Building complex Canons Regular of St Augusti...",@47.73154/12.88855,Q82598,Q635758,"""GSMonasteryLocation1259"""
10,7119,6305,50197,1263,1313.0,vor 1313,1802,,,6.956109,...,,50197,Online,"Augustinerinnen-, später Benediktinerinnenklos...","""Gebäudekomplex Augustinerinnen-, später Bened...","""Building complex Augustinian nuns, later Bene...",@50.93699504169764/6.956108698612635,Q10400,Q635758,"""GSMonasteryLocation7119"""
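
Because the vocabulary term serves as the link back to the monastery database, it is worth checking that the terms are unique and that the original id can be recovered from them. The following is a minimal sketch on a toy dataframe; `toy_df`, `TERM_PATTERN` and `term_to_id` are hypothetical names introduced for illustration and are not part of the notebook.

```python
import re

import pandas as pd

# Toy stand-in for prepared_df; the ids are taken from the sample rows above.
toy_df = pd.DataFrame({"id_monastery_location": [16673, 6072, 1865]})
toy_df["P1301"] = toy_df["id_monastery_location"].apply(
    lambda x: f'"GSMonasteryLocation{x}"'
)

# Hypothetical helper: recover the numeric id from a quoted vocabulary term.
TERM_PATTERN = re.compile(r'^"GSMonasteryLocation(\d+)"$')


def term_to_id(term: str) -> int:
    match = TERM_PATTERN.match(term)
    if match is None:
        raise ValueError(f"Not a GSMonasteryLocation term: {term!r}")
    return int(match.group(1))


# The terms only work as a mapping key if they are unique and reversible.
recovered_ids = toy_df["P1301"].apply(term_to_id)
terms_are_unique = bool(toy_df["P1301"].is_unique)
round_trip_ok = bool((recovered_ids == toy_df["id_monastery_location"]).all())
```

The same check could be run against `prepared_df` before export, assuming `id_monastery_location` contains no missing values.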


Linking the building complexes to modern municipalities shows in which territorial structures the (former) building complexes are located today. The monastery database additionally records the historical diocese in which a building complex was located. This information is stored in the column `diocese_id` of the table `gs_places`, so the diocese is assigned to the place in which a monastery location lies. In FactGrid, we connect the diocese information directly to the building complexes. A building complex has a property [P1003](https://database.factgrid.de/wiki/Item:Q21662) (Diocese), which links to a diocese item, for example the Archdiocese of Mainz ([Q153230](https://database.factgrid.de/wiki/Item:Q153230)). The historical affiliation of a place to a diocese is complex. On the one hand, it changed over time, especially in border areas. On the other hand, an area that we regard today as a single contiguous place may not have been contiguous around 1500 and may have belonged only partially to a given diocese. We therefore separate the modern territorial localization (statements about the current location of the address) from the historical localization (statements about the affiliation to a diocese).

Every statement in FactGrid should be supported by a source or reference. To achieve this, a source column `S471` is inserted directly after each relevant property column (here `P48` and `P83`). It links the statement to the corresponding monastery database entry via the property [P471](https://database.factgrid.de/wiki/Property:P471), using the quoted `gsn_id` as its value.

In [13]:
final_table = prepared_df.copy()
for colname in ["P48", "P83"]:
    final_table.insert(final_table.columns.get_loc(colname)+1, "S471", final_table["gsn_id"].apply(lambda x:f'\"{x}\"'), allow_duplicates=True)
final_table["P131"] = "Q153178"
final_table

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,monastery_name,Lde,Len,P48,S471,P83,S471.1,P2,P1301,P131
0,16673,46484371,10880,1773,,,1869,,,14.06718,...,"Piaristenkolleg Beraun (Beroun), Tschechien","""Gebäudekomplex Piaristenkolleg Beraun (Beroun...","""Building complex Piarist college Beraun (Bero...",@49.96138086190997/14.067180236807877,"""10880""",Q629276,"""10880""",Q635758,"""GSMonasteryLocation16673""",Q153178
1,6072,19993,3593,1135,1145.0,um 1140,1525,,,10.659167,...,Zisterzienserkloster Georgenthal,"""Gebäudekomplex Zisterzienserkloster Georgenthal""","""Building complex Cistercian monastery Georgen...",@50.828889/10.659167,"""3593""",Q23292,"""3593""",Q635758,"""GSMonasteryLocation6072""",Q153178
2,1865,46479281,3468,1346,,1346,1541,,1541,14.56,...,"Kollegiatstift St. Otto, Stettin (Szczecin), P...","""Gebäudekomplex Kollegiatstift St. Otto, Stett...","""Building complex Collegiate Church of St. Ott...",@53.426389/14.56,"""3468""",Q21782,"""3468""",Q635758,"""GSMonasteryLocation1865""",Q153178
3,16824,46484486,11549,1295,1305.0,um 1300,1300,1325.0,frühes 14. Jahrhundert,,...,"Dominikanerinnenkloster Luditz (Žlutice), Tsch...","""Gebäudekomplex Dominikanerinnenkloster Luditz...","""Building complex Dominican Nuns' monastery Lu...",,"""11549""",Q623515,"""11549""",Q635758,"""GSMonasteryLocation16824""",Q153178
5,16936,6305,50228,1478,1478.0,1478/1479,1802,,,6.958232,...,"Franziskanerterziarinnen St. Bonifatius, Köln","""Gebäudekomplex Franziskanerterziarinnen St. B...","""Building complex Franciscans St. Boniface, Co...",@50.92517069926024/6.95823197738584,"""50228""",Q10400,"""50228""",Q635758,"""GSMonasteryLocation16936""",Q153178
6,10564,16731,302,1186,,1186/1191,1192,,,11.042222,...,Prämonstratenserstift Halberstadt,"""Gebäudekomplex Prämonstratenserstift Halberst...","""Building complex Premonstratensians of Halber...",@51.901389/11.042222,"""302""",Q10374,"""302""",Q635758,"""GSMonasteryLocation10564""",Q153178
7,13544,46481605,8354,1204,,,1584,1594.0,1584/1594,6.42447,...,Prämonstratenserinnenstift (Nijenklooster) Klo...,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...","""Building complex Premonstratensian nuns (Nije...",@53.3761296985323/6.42447009029927,"""8354""",Q1348301,"""8354""",Q635758,"""GSMonasteryLocation13544""",Q153178
8,9788,46483304,5178,806,823.0,zwischen 806/807 und 823,1811,,,9.045081,...,"Augustinerchorfrauen Schänis, Schweiz","""Gebäudekomplex Augustinerchorfrauen Schänis, ...","""Building complex Canonesses Regular of St Aug...",@47.159873/9.045081,"""5178""",Q880959,"""5178""",Q635758,"""GSMonasteryLocation9788""",Q153178
9,1259,46477066,2013,1136,,,1803,,,12.88855,...,"Augustinerchorherrenstift St. Zeno, Reichenhall","""Gebäudekomplex Augustinerchorherrenstift St. ...","""Building complex Canons Regular of St Augusti...",@47.73154/12.88855,"""2013""",Q82598,"""2013""",Q635758,"""GSMonasteryLocation1259""",Q153178
10,7119,6305,50197,1263,1313.0,vor 1313,1802,,,6.956109,...,"Augustinerinnen-, später Benediktinerinnenklos...","""Gebäudekomplex Augustinerinnen-, später Bened...","""Building complex Augustinian nuns, later Bene...",@50.93699504169764/6.956108698612635,"""50197""",Q10400,"""50197""",Q635758,"""GSMonasteryLocation7119""",Q153178
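
The column insertion above relies on `DataFrame.insert` with `allow_duplicates=True`, since the same source column name is used more than once. The following is a minimal sketch of that pattern on a toy dataframe; `toy_df` and `add_source_columns` are hypothetical names introduced for illustration, and the values are copied from the first two sample rows.

```python
import pandas as pd

# Toy stand-in for prepared_df with two property columns.
toy_df = pd.DataFrame({
    "gsn_id": [10880, 3593],
    "P48": ["@49.96138086190997/14.067180236807877", "@50.828889/10.659167"],
    "P83": ["Q629276", "Q23292"],
})


def add_source_columns(df, property_columns, source_name="S471"):
    """Insert a quoted gsn_id source column right after each property column."""
    result = df.copy()
    quoted_ids = result["gsn_id"].apply(lambda x: f'"{x}"')
    for colname in property_columns:
        # Look up the position on every pass, because earlier inserts shift columns.
        position = result.columns.get_loc(colname) + 1
        result.insert(position, source_name, quoted_ids, allow_duplicates=True)
    return result


with_sources = add_source_columns(toy_df, ["P48", "P83"])
column_order = list(with_sources.columns)
```

Note that `with_sources["S471"]` now returns a dataframe with two columns rather than a single series, so later steps should address these columns by position.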


Finally, the table is cleaned up and exported in several formats. Most importantly, the V1 statements for creating the new building complex items are written to `data/results/building_complexes/import_building_complexes.tsv`.

In [14]:
from helper_functions import df_to_qs_v1

# Drop the working columns that are not part of the FactGrid import
final_table = final_table.drop(columns=["id_monastery_location", "place_id", "gsn_id", "location_begin_tpq", "location_begin_taq", "location_begin_note", "location_end_tpq", "location_end_taq", "location_end_note", "longitude", "latitude", "location_name", "id_gsn", "status", "monastery_name"])
# Empty qid column: the items do not exist in FactGrid yet
final_table.insert(0, "qid", np.nan)
final_table.to_excel("data/results/building_complexes/import_building_complexes.xlsx", index=False)
# Hack to save in a QuickStatements-applicable format: write without quoting, using "§" as a throwaway escape character ...
final_table.to_csv("data/results/building_complexes/import_building_complexes.csv", index=False, doublequote=False, quoting=csv.QUOTE_NONE, escapechar="§")
# ... then strip the escape character again. The encoding is set explicitly so that umlauts survive on every platform.
with open("data/results/building_complexes/import_building_complexes.csv", "r", encoding="utf-8") as file:
    s = file.read()
with open("data/results/building_complexes/import_building_complexes.csv", "w", encoding="utf-8") as file:
    file.write(s.replace("§", ""))
# Write the V1 statements
with open("data/results/building_complexes/import_building_complexes.tsv", "w", encoding="utf-8") as file:
    file.write(df_to_qs_v1(final_table))
final_table

Unnamed: 0,qid,Lde,Len,P48,S471,P83,S471.1,P2,P1301,P131
0,,"""Gebäudekomplex Piaristenkolleg Beraun (Beroun...","""Building complex Piarist college Beraun (Bero...",@49.96138086190997/14.067180236807877,"""10880""",Q629276,"""10880""",Q635758,"""GSMonasteryLocation16673""",Q153178
1,,"""Gebäudekomplex Zisterzienserkloster Georgenthal""","""Building complex Cistercian monastery Georgen...",@50.828889/10.659167,"""3593""",Q23292,"""3593""",Q635758,"""GSMonasteryLocation6072""",Q153178
2,,"""Gebäudekomplex Kollegiatstift St. Otto, Stett...","""Building complex Collegiate Church of St. Ott...",@53.426389/14.56,"""3468""",Q21782,"""3468""",Q635758,"""GSMonasteryLocation1865""",Q153178
3,,"""Gebäudekomplex Dominikanerinnenkloster Luditz...","""Building complex Dominican Nuns' monastery Lu...",,"""11549""",Q623515,"""11549""",Q635758,"""GSMonasteryLocation16824""",Q153178
5,,"""Gebäudekomplex Franziskanerterziarinnen St. B...","""Building complex Franciscans St. Boniface, Co...",@50.92517069926024/6.95823197738584,"""50228""",Q10400,"""50228""",Q635758,"""GSMonasteryLocation16936""",Q153178
6,,"""Gebäudekomplex Prämonstratenserstift Halberst...","""Building complex Premonstratensians of Halber...",@51.901389/11.042222,"""302""",Q10374,"""302""",Q635758,"""GSMonasteryLocation10564""",Q153178
7,,"""Gebäudekomplex Prämonstratenserinnenstift (Ni...","""Building complex Premonstratensian nuns (Nije...",@53.3761296985323/6.42447009029927,"""8354""",Q1348301,"""8354""",Q635758,"""GSMonasteryLocation13544""",Q153178
8,,"""Gebäudekomplex Augustinerchorfrauen Schänis, ...","""Building complex Canonesses Regular of St Aug...",@47.159873/9.045081,"""5178""",Q880959,"""5178""",Q635758,"""GSMonasteryLocation9788""",Q153178
9,,"""Gebäudekomplex Augustinerchorherrenstift St. ...","""Building complex Canons Regular of St Augusti...",@47.73154/12.88855,"""2013""",Q82598,"""2013""",Q635758,"""GSMonasteryLocation1259""",Q153178
10,,"""Gebäudekomplex Augustinerinnen-, später Bened...","""Building complex Augustinian nuns, later Bene...",@50.93699504169764/6.956108698612635,"""50197""",Q10400,"""50197""",Q635758,"""GSMonasteryLocation7119""",Q153178
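
To give an idea of what V1 statements for such a table can look like, the following is a minimal sketch of a converter. It is not the implementation of `df_to_qs_v1`, which is not shown here. The function `rows_to_v1_sketch`, the toy table and the placeholder label are assumptions made for illustration. The sketch assumes the common QuickStatements V1 layout: a `CREATE` line per new item, followed by tab-separated `LAST` lines, with source columns (`S…`) appended to the statement that precedes them.

```python
import re

import pandas as pd

SOURCE_COLUMN = re.compile(r"^S\d+$")


def rows_to_v1_sketch(df: pd.DataFrame) -> str:
    """Turn each row into a CREATE block of tab-separated V1 lines (illustrative only)."""
    lines = []
    for row in df.itertuples(index=False, name=None):
        lines.append("CREATE")
        statements = []
        current = None  # statement that a following source column attaches to
        for colname, value in zip(df.columns, row):
            if colname == "qid":
                continue  # new items have no QID yet
            if SOURCE_COLUMN.match(colname):
                # Attach the source only if the preceding statement exists.
                if current is not None and not pd.isna(value):
                    current.extend([colname, str(value)])
                continue
            if pd.isna(value):
                current = None  # empty cell: no statement, so no source either
                continue
            current = ["LAST", colname, str(value)]
            statements.append(current)
        lines.extend("\t".join(parts) for parts in statements)
    return "\n".join(lines)


# Toy table: the first row uses values from the sample output above,
# the second row has a placeholder label and no coordinates.
toy_table = pd.DataFrame(
    [
        [float("nan"), '"Gebäudekomplex Zisterzienserkloster Georgenthal"',
         "@50.828889/10.659167", '"3593"', "Q635758"],
        [float("nan"), '"Gebäudekomplex (placeholder label)"',
         float("nan"), '"11549"', "Q635758"],
    ],
    columns=["qid", "Lde", "P48", "S471", "P2"],
)

v1_text = rows_to_v1_sketch(toy_table)
```

In this sketch, a row without coordinates yields no `P48` line, and its source value is dropped along with it, so no dangling reference is emitted.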


## Next steps
As a next step, run notebook 2 - Monasteries to create the religious community items that belong to the building complexes. Afterwards, copy the V1 statements from both `data/results/building_complexes/import_building_complexes.csv` and `data/results/monasteries/import_monasteries.csv` into QuickStatements and upload them.