# Notebook 1: Creating Items for Building Complexes

This notebook implements the first step of the Klosterdatenbank-to-FactGrid workflow: creating items for the building complexes. It contains descriptive sections about the underlying data model and the workflow in general, as well as specific instructions for running the notebook. Markdown cells with descriptive content are marked as `#description`; instructional sections are marked as `#instruction`.

Strictly speaking, the monastery database does not contain dedicated information on building complexes. Information on where a religious community had its place of operation is stored in the `gs_monastery_location` table. Each row of this table assigns a religious community (`gsn_id`) to a location (`place_id`) and, if known, to specific coordinates within this location (`longitude`, `latitude`). Such an assignment implies that the community lived or worked at this location at a certain point in time. We therefore make the central assumption that a building complex of some kind, consisting of at least one building, must have existed there. Accordingly, each building complex created in this step represents two things: a row from the `gs_monastery_location` table, that is, the assignment of a monastery to a specific location, and the physical buildings in which the religious community worked. These buildings may have existed before, or continued to exist after, their use by the community and may have served other purposes.

## Preparations

The notebook requires the following libraries to run. If an error occurs, make sure the libraries are installed on your system.

In [469]:
import pandas as pd
import numpy as np
import os
import csv

First, the export files are loaded into [Dataframes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). The dataframes are stored in a dictionary with the keys being the filenames, for easier access.

In [470]:
# Load Access exports
from helper_functions import load_files_from_folder, query_factgrid

export_files = load_files_from_folder("data/exports_monasteryDB", "xlsx")

# Create dataframes for each table
dataframes = {key: pd.read_excel(value) for key, value in export_files.items()}

# Add dataframes for building complexes and monasteries already in FactGrid (stored in a different directory)
dataframes["building_complexes_in_factgrid"] = query_factgrid("building_complexes")
dataframes["monasteries_in_factgrid"] = query_factgrid("monasteries")
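The helper `load_files_from_folder` is imported from `helper_functions` and is not shown in this notebook. As a rough sketch of the behaviour the cell above relies on (a dictionary keyed by file name without extension, so that `dataframes["gs_monastery_location"]` works), it could look like the following. The function name `list_export_files` and its implementation are assumptions made for illustration, not the project's actual code.

```python
from pathlib import Path


def list_export_files(folder: str, extension: str) -> dict[str, Path]:
    """Hypothetical stand-in for load_files_from_folder: map file stem -> path."""
    return {
        path.stem: path
        for path in sorted(Path(folder).glob(f"*.{extension}"))
        if path.is_file()
    }
```

With such a dictionary, a comprehension like the one in the cell above turns every export file into one DataFrame under a predictable key.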

Since `gs_monastery_location` does not contain the names of the monasteries, the table is joined to `gs_monastery` to add the missing information. The merged table is then filtered to contain only religious communities with the status "Online", meaning that they are no longer being worked on, and reduced to the relevant columns. Finally, to make sure that no duplicate building complexes are created, the table is filtered against the building complexes that already exist in FactGrid.

In [471]:
# Merge gs_monastery_location and gs_monastery
merged_df = pd.merge(dataframes["gs_monastery_location"], dataframes["gs_monastery"], left_on='gsn_id', right_on='id_gsn', how='left')
# Filter for status 'online'
online_df = merged_df[merged_df["status"] == "Online"]
# Define columns to drop
drop_columns = [
    "relocated", 
    "comment", 
    "main_location", 
    "diocese_id", 
    "id_monastery", 
    "date_created", 
    "created_by_user", 
    "patrocinium",
    "selection", 
    "processing_status", 
    "gs_persons", 
    "selection_criteria", 
    "last_change", 
    "changed_by_user", 
    "founder"
]
# Prepare dataframe by dropping unnecessary columns
prepared_df = online_df.drop(drop_columns, axis="columns")
# Extract the numeric location ID from terms such as "GSMonasteryLocation16512"
bc_in_fg = dataframes["building_complexes_in_factgrid"]["GSVocabTerm"].str.split("Location").str[1].astype(int)
# Filter out building complexes that already exist in FactGrid
already_in_fg = prepared_df["id_monastery_location"].isin(bc_in_fg)
print(f"{already_in_fg.sum()} building complexes already exist in FactGrid and are filtered out.")
prepared_df = prepared_df[~already_in_fg].copy()
prepared_df

2495 building complexes already exist in FactGrid and are filtered out.


Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,note,monastery_name
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,50.14597836228394,,40356,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem
3,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,49.939573,,40359,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K..."
4,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,49.988671,Weisenau,40360,Online,,"Allerheiligenkloster Mainz, Weisenau"
6,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,49.759165,,40362,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier"
7,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,49.7524770967829,,40363,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8171,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,51.180071,,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien"
8181,17895,22625.0,12201,1240.0,,,1240.0,,,,,,12201,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach
8183,17898,46484884.0,12204,1631.0,,,1780.0,,,21.14756,54.04918,,12204,Online,,"Jesuitenkolleg Rößel, Polen"
8188,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,47.663389,,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko..."
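The filter against existing FactGrid items used in the cell above can be tried out on a small toy example. The sketch below uses made-up stand-ins for the FactGrid query result and for the prepared table; only the column names `GSVocabTerm` and `id_monastery_location` are taken from the notebook.

```python
import pandas as pd

# Toy stand-ins for the FactGrid query result and the prepared table.
factgrid = pd.DataFrame(
    {"GSVocabTerm": ["GSMonasteryLocation16512", "GSMonasteryLocation575"]}
)
candidates = pd.DataFrame({"id_monastery_location": [575, 6051, 16512, 6054]})

# "GSMonasteryLocation16512" -> 16512: keep the part after "Location" as an integer.
existing_ids = factgrid["GSVocabTerm"].str.split("Location").str[1].astype(int)

# Boolean mask of rows whose ID is already present in FactGrid.
already_present = candidates["id_monastery_location"].isin(existing_ids)

# Keep only the rows that are not yet in FactGrid.
remaining = candidates[~already_present]

print(f"{already_present.sum()} already exist; {len(remaining)} remain.")
```

Storing the mask in a variable means it can be reused for both the count and the filter.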


To double-check for potential duplicates, the following cell finds building complexes that are connected to monasteries which already exist in FactGrid. If the resulting DataFrame is empty, all building complexes will be linked to newly created monastery items.

In [472]:
existing_monasteries = prepared_df[prepared_df["gsn_id"].isin(dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"].astype(int))]
existing_monasteries

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,note,monastery_name
13,7944,548.0,60471,1071.0,,,1802.0,,,10.89715,48.372959,,60471,Online,,"Kollegiatstift St. Gertrud, Augsburg"
35,1895,6305.0,3503,866.0,,866 erstmals erwähnt,1802.0,,,6.95823,50.94123,,3503,Online,,Domstift Köln
37,1897,11120.0,3489,858.0,1020.0,zwischen 858 und 1020,1801.0,1802.0,1801/1802,8.442941,49.3172355,,3489,Online,Das Bistum Speyer selbst wurde 614 gegründet. ...,Domstift Speyer
38,1899,13202.0,3490,900.0,999.0,vor 1000,1802.0,,,8.359962,49.630013,,3490,Online,Domklerus als Kommunität erstmals 814 oder 897...,Domstift Worms
39,1901,763.0,3492,1007.0,1012.0,kurz vor 1012,1802.0,1803.0,1802/1803,10.882520,49.890836,,3492,Online,,Domstift Bamberg
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8103,17811,20288.0,12139,1318.0,,,1526.0,,,,,,12139,Online,Eine Urkunde des Jahres 1318 bestätigt die Übe...,Niederlassung des Augustiner-Chorherrenstifts ...
8111,17822,46484923.0,12147,1589.0,,,1835.0,,,2.17505,41.38749,,12147,Online,Seit 1510 unternahm der Orden der Paulaner meh...,Paulanerkloster Sant Francesc de Paula de Barc...
8114,17825,46484923.0,12150,1578.0,,,1835.0,,,2.120173,41.393722,,12150,Online,1578 kamen einige Kapuziner aus Italien nach B...,"Kapuzinerkloster von Sarrià, Barcelona, Spanien"
8134,17847,1779.0,20216,1355.0,,,1489.0,,,,,Buchen,20216,Online,Die erste Erwähnung einer Frauengemeinschaft i...,Franziskanerinnenkloster Buchen


In addition to building complexes that were created in previous batches, there is one more case to consider: a building complex may have exactly the same coordinates as a building complex that already exists in FactGrid. In this case, only one FactGrid item should exist, so these rows are written to `data/intermediate_results/duplicate_coords_factgrid.xlsx` and dismissed for now.

In [473]:
# Check for duplicate coordinates with already existing building complexes in FactGrid.
# Work on a copy so that the dataframe stored in `dataframes` is not modified.
duplicate_coordinates = dataframes["building_complexes_in_factgrid"].copy()
# `coords` has the form "Point(<longitude> <latitude>)"
duplicate_coordinates["longitude"] = duplicate_coordinates["coords"].str.split(" ").str[0].str.strip("Point(").astype(float)
duplicate_coordinates["latitude"] = duplicate_coordinates["coords"].str.split(" ").str[1].str.strip(")").astype(float)
duplicate_coordinates = duplicate_coordinates.drop(columns="coords")
# Copy the filtered rows so that the assignments below do not trigger a SettingWithCopyWarning
prepared_df_without_nan_coordinates = prepared_df.dropna(subset=["longitude", "latitude"]).copy()
prepared_df_without_nan_coordinates["longitude"] = prepared_df_without_nan_coordinates["longitude"].astype(float)
prepared_df_without_nan_coordinates["latitude"] = prepared_df_without_nan_coordinates["latitude"].astype(float)
# An inner merge keeps only rows whose coordinates match an existing FactGrid item exactly
duplicate_coordinates = pd.merge(prepared_df_without_nan_coordinates, duplicate_coordinates, on=["longitude", "latitude"])
duplicate_coordinates.to_excel("data/intermediate_results/duplicate_coords_factgrid.xlsx")
# Dismiss the rows with duplicate coordinates for now
prepared_df = prepared_df[~prepared_df["id_monastery_location"].isin(duplicate_coordinates["id_monastery_location"])]


In [474]:
duplicate_coordinates

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,note,monastery_name,item,GSVocabTerm
0,16662,46479184.0,11430,1315.0,1330.0,nach 1315,1420.0,,,14.404674,50.085097,"Prag, Kleinseite",11430,Online,Möglicherweise wurde das Magdalenerinnenkloste...,"Magdalenerinnenkloster Prag, Kleinseite (Praha...",https://database.factgrid.de/entity/Q1763910,GSMonasteryLocation16512
1,16666,46479184.0,11434,1655.0,,,1782.0,,,14.406281,50.088353,"Prag, Kleinseite",11434,Online,Die Karmelitinnen wurden 1655 von Ferdinand II...,"Barfüßer-Karmelitinnenkloster Prag, Kleinseite...",https://database.factgrid.de/entity/Q1774579,GSMonasteryLocation16667
2,16669,46479184.0,10695,1625.0,,,1950.0,,,14.402500,50.091208,Prager Burgstadt,10695,Online,Das Dominikanerkloster St. Clemens wurde wahrs...,"Dominikanerkloster St. Clemens, Prag (Praha), ...",https://database.factgrid.de/entity/Q1772514,GSMonasteryLocation1744
3,495,2097.0,495,1795.0,,,1825.0,,,7.250419,52.046579,,495,Online,"Ab 1800 Doppelkloster, ab 1804 Nonnenkloster. ...",Trappistenkloster Darfeld,https://database.factgrid.de/entity/Q1763025,GSMonasteryLocation575
4,13589,46481217.0,8388,1452.0,1464.0,ungefähr zwischen 1457 und 1459,1498.0,,,5.799167,53.208889,Leeuwarden,8388,Online,Franziskanerobservanten; Niederlassung zunächs...,"Franziskanerkloster Leeuwarden, Niederlande",https://database.factgrid.de/entity/Q1763615,GSMonasteryLocation13588
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
86,17498,46480802.0,11946,1448.0,,,1578.0,,,3.566510,51.181330,,11946,Online,Graue Schwestern aus Sint-Omaars gründeten 144...,"Graue Schwestern Onze-Lievre-Vrouw Ten Doorn, ...",https://database.factgrid.de/entity/Q1758286,GSMonasteryLocation17489
87,17513,46484718.0,11953,1599.0,,,1796.0,,,3.719720,51.059170,Gent,11953,Online,Das Kloster wurde 1296 gegründet. Die Gemeinsc...,"Augustinerkloster St. Stephan, Gent, Belgien",https://database.factgrid.de/entity/Q1763122,GSMonasteryLocation17512
88,17515,46484709.0,11954,1599.0,,,1796.0,,,3.877280,50.775060,Gerhardsbergen,11954,Online,"Zunächst 1796 verkauft, die Karmeliter konnten...","Karmelitenkloster Gerhardsbergen, Belgien",https://database.factgrid.de/entity/Q1772783,GSMonasteryLocation17514
89,17526,46484819.0,11961,1559.0,,,1797.0,,,4.839510,51.175190,,11961,Online,Nach der Vertreibung der Norbertinerinnen im J...,"Prämonstratenserinnenkloster Herentals, Belgien",https://database.factgrid.de/entity/Q1764007,GSMonasteryLocation17525
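As an alternative to the split-and-strip approach, the coordinate strings could also be parsed with a single regular expression. The following is a minimal sketch on toy data, assuming that the `coords` column always has the form `Point(<longitude> <latitude>)`; the item IDs are placeholders, not real FactGrid items.

```python
import pandas as pd

# Toy data in the "Point(<longitude> <latitude>)" form of the `coords` column.
factgrid = pd.DataFrame({
    "item": ["Q1", "Q2"],  # placeholder IDs for illustration only
    "coords": ["Point(14.404674 50.085097)", "Point(7.250419 52.046579)"],
})

# Capture both numbers with named groups; the group names become column names.
pattern = (
    r"Point\(\s*(?P<longitude>-?\d+(?:\.\d+)?)"
    r"\s+(?P<latitude>-?\d+(?:\.\d+)?)\s*\)"
)
parsed = factgrid["coords"].str.extract(pattern).astype(float)
factgrid = pd.concat([factgrid.drop(columns="coords"), parsed], axis=1)

candidates = pd.DataFrame({
    "id_monastery_location": [16662, 495, 6051],
    "longitude": [14.404674, 7.250419, 7.166008],
    "latitude": [50.085097, 52.046579, 50.145978],
})

# An inner merge on both coordinate columns keeps only exact matches.
matches = candidates.merge(factgrid, on=["longitude", "latitude"])
print(matches[["id_monastery_location", "item"]])
```

Note that merging on floating-point columns only finds coordinates that are exactly equal. Locations whose coordinates differ in the last decimal places are not detected as duplicates by this approach.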


In [475]:
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,note,monastery_name
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,50.14597836228394,,40356,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem
3,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,49.939573,,40359,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K..."
4,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,49.988671,Weisenau,40360,Online,,"Allerheiligenkloster Mainz, Weisenau"
6,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,49.759165,,40362,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier"
7,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,49.7524770967829,,40363,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8171,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,51.180071,,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien"
8181,17895,22625.0,12201,1240.0,,,1240.0,,,,,,12201,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach
8183,17898,46484884.0,12204,1631.0,,,1780.0,,,21.14756,54.04918,,12204,Online,,"Jesuitenkolleg Rößel, Polen"
8188,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,47.663389,,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko..."


## Labels

It is expected that items in FactGrid have a label in at least one language. While the FactGrid ID (also referred to as the "Q-Number") uniquely identifies the item, the label captures the name of the item in everyday language and is indexed for text-based search. The items created in this project are named according to the following rules:
- For the religious communities, the name from the monastery database is used as the label, for example "Zisterzienserkloster Georgenzell".
- For the building complexes, the labels are constructed according to the following schema: `Gebäudekomplex <monastery_name> [(<location_name>)]`. Here, `monastery_name` is again the name of the religious community from the `gs_monastery` table. `location_name` is a column of the `gs_monastery_location` table. In this column, if available, the specific name given to this location is stored. 

For example, the "Benediktinerinnenkloster Mielen, Sint-Truiden, Belgien" (GSN [11665](https://klosterdatenbank.adw-goe.de/gsn/11665)) has two locations in the Belgian town of Sint-Truiden, namely the location "Sint Truiden" and the location "Metsteren" (see Figure 1). The constructed labels are then "Gebäudekomplex Benediktinerinnenkloster Mielen, Sint-Truiden, Belgien (Sint-Truiden)" and "Gebäudekomplex Benediktinerinnenkloster Mielen, Sint-Truiden, Belgien (Metsteren)". However, location names are not available in all cases, which can lead to duplicate labels. These duplicates are listed later in the workflow so that location names can be added to distinguish them.

<img src="documentation-images/Standorte GSN11665.png" alt="Monastery Locations of GSN 11665" width="500">

*Figure 1: Building complexes of the Benedictine nuns' monastery Mielen in Sint-Truiden, Belgium (GSN 11665). Base layer: OpenStreetMap.*
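The label schema described above can be sketched for a single record as follows. The function `build_label` is a hypothetical helper written for illustration; the notebook itself builds the labels column-wise with pandas.

```python
from typing import Optional


def build_label(monastery_name: str, location_name: Optional[str] = None) -> str:
    """Return 'Gebäudekomplex <monastery_name> [(<location_name>)]'."""
    label = f"Gebäudekomplex {monastery_name}"
    # Append the bracketed location only if a non-empty name is given.
    # The isinstance check also skips missing values such as None or NaN.
    if isinstance(location_name, str) and location_name.strip():
        label += f" ({location_name.strip()})"
    return label


name = "Benediktinerinnenkloster Mielen, Sint-Truiden, Belgien"
print(build_label(name, "Metsteren"))
print(build_label("Franziskanerkloster Halle"))
```

Skipping the bracket when no location name exists avoids having to remove empty brackets afterwards.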

The following cell constructs the labels and saves them in a new column called "Lde"; the German descriptions are saved in a column called "Dde" (see [Quickstatements specification](https://www.wikidata.org/wiki/Help:QuickStatements#Adding_labels,_aliases,_descriptions_and_sitelinks)).

In [476]:
from helper_functions import construct_description
# Work on an explicit copy to avoid pandas' SettingWithCopyWarning
prepared_df = prepared_df.copy()
# 1. Create new columns with labels (Lde) and descriptions (Dde)
prepared_df["Lde"] = "Gebäudekomplex " + prepared_df["monastery_name"].str.cat(prepared_df["location_name"].fillna(""), sep=" (") + ")"
for index, row in prepared_df.iterrows():
    prepared_df.loc[index, "Dde"] = construct_description(row["location_name"], row["monastery_name"], row["location_begin_taq"], row["location_begin_tpq"], row["location_end_taq"], row["location_end_tpq"])
# 2. If necessary, delete empty brackets at the end of labels, then wrap labels and descriptions in double quotes
prepared_df["Lde"] = prepared_df["Lde"].str.replace(r"\(\)", "", regex=True).apply(lambda x: f'"{x.strip()}"')
prepared_df["Dde"] = prepared_df["Dde"].apply(lambda x: f'"{x}"')
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,note,monastery_name,Lde,Dde
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,50.14597836228394,,40356,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Gebäudekomplex des Augustinerinnenklosters (M..."
3,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,49.939573,,40359,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Gebäudekomplex des Franziskanertertiarinnenkl..."
4,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,49.988671,Weisenau,40360,Online,,"Allerheiligenkloster Mainz, Weisenau","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Gebäudekomplex Weisenau des Allerheiligenklos..."
6,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,49.759165,,40362,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex des Dominikanerinnenklosters S..."
7,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,49.7524770967829,,40363,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Gebäudekomplex des Frauenklosters St. Martin ..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8171,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,51.180071,,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü..."
8181,17895,22625.0,12201,1240.0,,,1240.0,,,,,,12201,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach"""
8183,17898,46484884.0,12204,1631.0,,,1780.0,,,21.14756,54.04918,,12204,Online,,"Jesuitenkolleg Rößel, Polen","""Gebäudekomplex Jesuitenkolleg Rößel, Polen""","""Gebäudekomplex des Jesuitenkollegs Rößel, Polen"""
8188,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,47.663389,,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män..."


As mentioned above, there might be duplicate labels in cases where locations do not have an explicit name. Since the items can still be distinguished from one another by their identifier and coordinates, this is not necessarily a problem. However, the following cell creates a list of all duplicate labels so that they can be examined.

**In order to resolve the duplicates**

1. Open and inspect the table located at `data/intermediate_results/duplicate_building_complex_labels.xlsx`
2. Add location names in the monastery database
3. Create new exports from the monastery database and replace `data/exports_monasteryDB/gs_monastery.xlsx` and `data/exports_monasteryDB/gs_monastery_location.xlsx` with the new files
4. Re-run the notebook. The cell below now should no longer contain the duplicates you resolved. 

In [477]:
duplicated_building_complex_labels = prepared_df[prepared_df.duplicated(subset="Lde", keep=False)]
duplicated_building_complex_labels.to_excel('data/intermediate_results/duplicate_building_complex_labels.xlsx')
duplicated_building_complex_labels

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,note,monastery_name,Lde,Dde
67,1535,31028.0,3147,722.0,,,1290.0,,,,,,3147,Online,Ursprünglich mit irischen Mönchen besiedeltes ...,"Benediktinerkloster, später Kollegiatstift St....","""Gebäudekomplex Benediktinerkloster, später Ko...","""Gebäudekomplex des Benediktinerkloster, späte..."
93,16670,46479184.0,10695,1556.0,,,1625.0,,,14.4239,50.09237,Prager Altstadt (Staré Město),10695,Online,Das Dominikanerkloster St. Clemens wurde wahrs...,"Dominikanerkloster St. Clemens, Prag (Praha), ...","""Gebäudekomplex Dominikanerkloster St. Clemens...","""Gebäudekomplex Prager Altstadt (Staré Město) ..."
103,16681,46480666.0,11438,1761.0,,,1769.0,,,14.474259,48.977356,Budweis,11438,Online,Zunächst im ehemaligen St.-Wenzel Spital unter...,"Piaristenkolleg Budweis (České Budějovice), Ts...","""Gebäudekomplex Piaristenkolleg Budweis (České...","""Gebäudekomplex Budweis des Piaristenkollegs B..."
130,6402,17712.0,90301,1235.0,1245.0,um 1240,1564.0,,,11.96905,51.486079,,90301,Online,Die letzten acht Ordensbrüder gingen 1564 nach...,Franziskanerkloster Halle,"""Gebäudekomplex Franziskanerkloster Halle""","""Gebäudekomplex des Franziskanerklosters Halle"""
300,13243,46483830.0,8129,1569.0,,,1579.0,,,6.15699661768455,52.254353296813,,8129,Online,Die Franziskanerobservanten nahmen den Platz d...,"Franziskanerkloster Deventer, Niederlande","""Gebäudekomplex Franziskanerkloster Deventer, ...","""Gebäudekomplex des Franziskanerklosters Deven..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
8159,17872,3425.0,20251,1419.0,,,1644.0,,,7.844976,47.997426,Freiburg,20251,Online,Die Frühzeit des Regelhauses am Graben kaum zu...,Regelhaus am Graben/Dominikanerinnenkloster St...,"""Gebäudekomplex Regelhaus am Graben/Dominikane...","""Gebäudekomplex Freiburg des Regelhaus am Grab..."
8160,17873,3425.0,20590,1651.0,,,1677.0,,,7.84997808,47.99877003,Freiburg,20590,Online,Vereinigung mit den Dominikanerinnen des Klost...,Dominikanerinnenkloster St. Katharina in der W...,"""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex Freiburg des Dominikanerinnenk..."
8171,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,51.180071,,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü..."
8188,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,47.663389,,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män..."
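The cell above relies on `keep=False` in `DataFrame.duplicated`, which flags every member of a duplicate group instead of only the later occurrences. A minimal sketch on toy data (the IDs are invented for illustration):

```python
import pandas as pd

# Toy table: two rows share a label, one label is unique.
labels = pd.DataFrame({
    "id_monastery_location": [1, 2, 3],  # invented IDs
    "Lde": [
        "Gebäudekomplex Franziskanerkloster Halle",
        "Gebäudekomplex Jesuitenkolleg Rößel, Polen",
        "Gebäudekomplex Franziskanerkloster Halle",
    ],
})

# keep=False marks all rows of a duplicate group.
all_members = labels[labels.duplicated(subset="Lde", keep=False)]

# The default keep="first" would leave out the first occurrence.
later_only = labels[labels.duplicated(subset="Lde")]

print(all_members["id_monastery_location"].tolist())
print(later_only["id_monastery_location"].tolist())
```

For manual review, listing all members of a group is the more useful variant, because each of the affected locations may need a name added.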


### Translation of Labels

FactGrid is a multilingual platform. Therefore, the labels for the monasteries and building complexes should be created not only in German but also in English. Due to the heterogeneity of the monastery names in the database, a rule-based translation is difficult to implement. Instead, a large language model is used. The model, the prompting, and further details of the translation are described in the notebook "1a - Translation". Because the translation can take some time, it has been moved to that separate notebook. It uses the [GWDG/KISSKI API](https://docs.hpc.gwdg.de/services/chat-ai/index.html), so a [SAIA API key](https://docs.hpc.gwdg.de/services/saia/index.html) is needed to execute it.

In [478]:
to_translate = prepared_df[["monastery_name", 'Lde', 'Dde', "note"]].copy()
to_translate = to_translate.rename(columns={"Lde": "building_Lde", "Dde": "building_Dde", "monastery_name" : "monastery_Lde", "note": "monastery_Dde"})
to_translate.to_csv("data/translation/to_translate.csv")
to_translate

Unnamed: 0,monastery_Lde,building_Lde,building_Dde,monastery_Dde
0,Augustinerinnenkloster (Martinsklause) Cochem,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Gebäudekomplex des Augustinerinnenklosters (M...",1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...
3,"Franziskanertertiarinnenkloster St. Andreas, K...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Gebäudekomplex des Franziskanertertiarinnenkl...",1543 Umsiedlung des Konvents in das Reuerinnen...
4,"Allerheiligenkloster Mainz, Weisenau","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Gebäudekomplex Weisenau des Allerheiligenklos...",
6,"Dominikanerinnenkloster St. Katharina, Trier","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex des Dominikanerinnenklosters S...",Das Kloster geht auf eine kurz nach 1200 gegrü...
7,"Frauenkloster St. Martin auf dem Berge, Trier","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Gebäudekomplex des Frauenklosters St. Martin ...",Das Kloster wurde dem Dominikanerinnenkloster ...
...,...,...,...,...
8171,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü...",Gründung einer Kartause vor den Mauern der Sta...
8181,Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach""","Einzig 1240 erwähnt, als der Bischof von Straß..."
8183,"Jesuitenkolleg Rößel, Polen","""Gebäudekomplex Jesuitenkolleg Rößel, Polen""","""Gebäudekomplex des Jesuitenkollegs Rößel, Polen""",
8188,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män...",Der Name der Gemeinschaft geht die Konstanzer ...


After executing the above cell, the file `to_translate.csv` in `data/translation` contains all terms to be translated. Execute Notebook 1a. Once it has finished, the `data/translation` folder should contain a file named `translated.csv` with the translations. You can then run the next cell to load the translated labels.

In [479]:
translated = pd.read_csv("data/translation/translated.csv")
# Normalise the quoting of the German labels so that they match the "Lde" column
translated["building_Lde"] = translated["building_Lde"].str.strip().str.strip('"').apply(lambda x: f'"{x}"' if not pd.isna(x) else np.nan)
# Attach the English labels via the German labels; keep one row per location
prepared_df = pd.merge(prepared_df, translated[["building_Lde", "building_Len"]], how="left", left_on="Lde", right_on="building_Lde").drop_duplicates(subset="id_monastery_location")
prepared_df.rename(columns={"building_Len": "Len"}, inplace=True)
# Wrap the English labels in double quotes; missing translations stay NaN
prepared_df["Len"] = prepared_df["Len"].apply(lambda x: f'"{x}"' if not pd.isna(x) else np.nan)
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,latitude,location_name,id_gsn,status,note,monastery_name,Lde,Dde,building_Lde,Len
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,50.14597836228394,,40356,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Gebäudekomplex des Augustinerinnenklosters (M...","""Gebäudekomplex Augustinerinnenkloster (Martin...","""Building complex of the Augustinian nuns' mon..."
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,49.939573,,40359,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Gebäudekomplex des Franziskanertertiarinnenkl...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Building complex of the Franciscans St. Andre..."
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,49.988671,Weisenau,40360,Online,,"Allerheiligenkloster Mainz, Weisenau","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Gebäudekomplex Weisenau des Allerheiligenklos...","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Building complex of the All Saints Monastery ..."
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,49.759165,,40362,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex des Dominikanerinnenklosters S...","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas..."
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,49.7524770967829,,40363,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Gebäudekomplex des Frauenklosters St. Martin ...","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Building complex of the Women's convent St. M..."
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5062,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,51.180071,,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü...","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Building complex of the Carthusian Nuns' mona..."
5064,17895,22625.0,12201,1240.0,,,1240.0,,,,,,12201,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach""",,
5065,17898,46484884.0,12204,1631.0,,,1780.0,,,21.14756,54.04918,,12204,Online,,"Jesuitenkolleg Rößel, Polen","""Gebäudekomplex Jesuitenkolleg Rößel, Polen""","""Gebäudekomplex des Jesuitenkollegs Rößel, Polen""",,
5066,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,47.663389,,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Building complex Women's convent ""in the Mänt..."


### OPTIONAL: If working with a pre-translated file
If you are uploading a lot of monasteries at once, it may be useful to translate all of them in batches before you run this notebook. If you do so, the following cell will double-check for missing translations so that they can be added retrospectively. A new file `to_translate.csv` will be created. Execute Notebook 1a to translate the missing labels, then copy the resulting CSV to the end of `translated.csv`.

In [480]:
missing_label_translations = prepared_df[prepared_df["Len"].isna()]
to_translate = missing_label_translations[["monastery_name", 'Lde', 'Dde', "note"]].copy()
to_translate = to_translate.rename(columns={"Lde": "building_Lde", "Dde": "building_Dde", "monastery_name" : "monastery_Lde", "note": "monastery_Dde"})
to_translate.to_csv("data/translation/to_translate.csv")
to_translate

Unnamed: 0,monastery_Lde,building_Lde,building_Dde,monastery_Dde
621,"Schwesternsammlungen ""Frauen von Nordheim/Ihri...","""Gebäudekomplex Schwesternsammlungen ""Frauen v...","""Gebäudekomplex der Schwesternsammlungen ""Frau...",Aus den Jahren 1327 und 1341 sind zwei Zeugnis...
4763,"Zisterzienserinnenabtei Oosteeklo, später Gent...","""Gebäudekomplex Zisterzienserinnenabtei Oostee...","""Gebäudekomplex Oosteeklo der Zisterzienserinn...",1164 privilegierte Philipp von Elsass den Zist...
4764,"Zisterzienserinnenkloster Orienten, Rummen, Be...","""Gebäudekomplex Zisterzienserinnenkloster Orie...","""Gebäudekomplex des Zisterzienserinnenklosters...",Lokalisierung nach Ortsmittelpunkt.
4781,"Augustinereremitenkloster Unter Rotschow, Tsch...","""Gebäudekomplex Augustinereremitenkloster Unte...","""Gebäudekomplex Unter Rotschow des Augustinere...",
4782,"Jesuitenniederlassung Illuxt (Ilūkste), Lettland","""Gebäudekomplex Jesuitenniederlassung Illuxt (...","""Gebäudekomplex Illuxt der Jesuitenniederlassu...",Die Jesuitenniederlassung Illuxt war die größt...
...,...,...,...,...
5052,Benediktinerpropstei Ebringen,"""Gebäudekomplex Benediktinerpropstei Ebringen""","""Gebäudekomplex der Benediktinerpropstei Ebrin...",Propstei der Benediktinerabtei St. Gallen (GSN...
5053,Benediktinerpropstei Ebringen,"""Gebäudekomplex Benediktinerpropstei Ebringen""","""Gebäudekomplex der Benediktinerpropstei Ebrin...",Propstei der Benediktinerabtei St. Gallen (GSN...
5054,Schwesternsammlung Klause zu Eichstetten,"""Gebäudekomplex Schwesternsammlung Klause zu E...","""Gebäudekomplex der Schwesternsammlung Klause ...",1326 sind erstmals zwei Schwestern urkundlich ...
5064,Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach""","Einzig 1240 erwähnt, als der Bischof von Straß..."


Once the missing translations have been added, rerun the notebook.

## Geocoordinates

Our data model separates religious communities from the building complexes in which they lived and worked. In this model, the geocoordinates of a community's location are properties of the building complex. The monastery database localizes a monastery location at one of two levels of accuracy: the coordinates represent either the exact point where the building stood or the central point of the place, e.g. a village, in which it stood. Note that a centroid-based location is only ever an approximation of the centroid of the modern place. In cases where the exact location of the building complex is unknown, the respective item will not be linked to any coordinates. Instead, the coordinates of the place where it is located should be queried. In all other cases, the coordinates are linked directly to the building complexes, using the values from the `latitude` and `longitude` columns as [P48](https://database.factgrid.de/wiki/Property:P48).

In [481]:
for index, row in prepared_df.iterrows():
    if (not pd.isna(row["latitude"])) and (not pd.isna(row["longitude"])):
        prepared_df.loc[index, "P48"] = f'@{row["latitude"]}/{row["longitude"]}'
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,location_name,id_gsn,status,note,monastery_name,Lde,Dde,building_Lde,Len,P48
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,,40356,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Gebäudekomplex des Augustinerinnenklosters (M...","""Gebäudekomplex Augustinerinnenkloster (Martin...","""Building complex of the Augustinian nuns' mon...",@50.14597836228394/7.1660075206704485
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,,40359,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Gebäudekomplex des Franziskanertertiarinnenkl...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Building complex of the Franciscans St. Andre...",@49.939573/8.211983
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,Weisenau,40360,Online,,"Allerheiligenkloster Mainz, Weisenau","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Gebäudekomplex Weisenau des Allerheiligenklos...","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Building complex of the All Saints Monastery ...",@49.988671/8.294172
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,,40362,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex des Dominikanerinnenklosters S...","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas...",@49.759165/6.632027
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,,40363,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Gebäudekomplex des Frauenklosters St. Martin ...","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Building complex of the Women's convent St. M...",@49.7524770967829/6.65873701917281
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5062,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü...","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Building complex of the Carthusian Nuns' mona...",@51.180071/3.174419
5064,17895,22625.0,12201,1240.0,,,1240.0,,,,...,,12201,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach""",,,
5065,17898,46484884.0,12204,1631.0,,,1780.0,,,21.14756,...,,12204,Online,,"Jesuitenkolleg Rößel, Polen","""Gebäudekomplex Jesuitenkolleg Rößel, Polen""","""Gebäudekomplex des Jesuitenkollegs Rößel, Polen""",,,@54.04918/21.14756
5066,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Building complex Women's convent ""in the Mänt...",@47.663389/9.174574
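
The row-wise loop in the cell above can also be written without `iterrows`. The following is a minimal sketch on a toy table, assuming the same `latitude` and `longitude` column names and the `@<latitude>/<longitude>` string format used above. The names `toy_df` and `build_p48` are illustrative and not part of the notebook.

```python
import numpy as np
import pandas as pd


def build_p48(df: pd.DataFrame) -> pd.Series:
    """Return '@<latitude>/<longitude>' strings, NaN where either value is missing."""
    has_coords = df["latitude"].notna() & df["longitude"].notna()
    # Start with an all-missing column and fill only rows with complete coordinates
    p48 = pd.Series(np.nan, index=df.index, dtype="object")
    p48.loc[has_coords] = (
        "@" + df.loc[has_coords, "latitude"].astype(str)
        + "/" + df.loc[has_coords, "longitude"].astype(str)
    )
    return p48


# Toy data: the second row has no coordinates and therefore gets no P48 value
toy_df = pd.DataFrame({
    "latitude": [49.939573, np.nan, 54.04918],
    "longitude": [8.211983, np.nan, 21.14756],
})
toy_df["P48"] = build_p48(toy_df)
```

Rows without coordinates keep a missing `P48` value, which matches the behaviour of the loop.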


### Duplicates of Coordinates

In special cases, the database may contain duplicate coordinates. Generally speaking, if the coordinates of two building complexes are exactly the same, we consider the building complexes to be identical, so only one item should be created. This can happen in two situations:

1. If a religious community returns to a previously inhabited building complex.
2. If the diocese in which the building complex was located changes. In this case, the change of diocese is represented as a new monastery location with identical coordinates in the monastery database.

In both cases, only one item should be created. In case 1, this item needs to be linked to the respective religious community two or more times, which is handled by Notebook 3. In case 2, it should be linked only once, but with two dioceses to reflect the change in diocese. This is handled in the "Dioceses" section of this notebook. The next cell creates a list of all coordinate duplicates for future use.

In [482]:
# Find occurrences of identical coordinates (rows without coordinates are ignored)
coord_duplicates = prepared_df[prepared_df.duplicated(subset="P48", keep=False)].dropna(subset="P48").drop_duplicates(subset="id_monastery_location", keep=False)
# Keep only the first row per coordinate; rows without coordinates are all kept
prepared_df = prepared_df[(~prepared_df.duplicated(subset="P48")) | (prepared_df['P48'].isnull())]
coord_duplicates

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,location_name,id_gsn,status,note,monastery_name,Lde,Dde,building_Lde,Len,P48
47,16670,46479184.0,10695,1556.0,,,1625.0,,,14.4239,...,Prager Altstadt (Staré Město),10695,Online,Das Dominikanerkloster St. Clemens wurde wahrs...,"Dominikanerkloster St. Clemens, Prag (Praha), ...","""Gebäudekomplex Dominikanerkloster St. Clemens...","""Gebäudekomplex Prager Altstadt (Staré Město) ...","""Gebäudekomplex Dominikanerkloster St. Clemens...","""Building complex of the Dominican monastery S...",@50.09237/14.4239
81,492,2060.0,492,1264.0,,12./frühes 13. Jahrhundert,1369.0,,ca. 1369,8.8413888888889,...,,492,Online,1264 ist das Kloster erstmals bezeugt.,Augustinerinnenkloster Dalheim,"""Gebäudekomplex Augustinerinnenkloster Dalheim""","""Gebäudekomplex des Augustinerinnenklosters Da...","""Gebäudekomplex Augustinerinnenkloster Dalheim""","""Building complex of the Augustinian nuns' mon...",@51.565277777778/8.8413888888889
82,493,2060.0,493,1429.0,,,1803.0,,,8.8413888888889,...,,493,Online,1429 war das Stift dem Stift Böddeken inkorpor...,Augustinerchorherrenstift Dalheim,"""Gebäudekomplex Augustinerchorherrenstift Dalh...","""Gebäudekomplex des Augustinerchorherrenstifts...","""Gebäudekomplex Augustinerchorherrenstift Dalh...","""Building complex of the Canons Regular of St ...",@51.565277777778/8.8413888888889
126,11154,46480179.0,70045,1703.0,,,,,heute,11.071043,...,,70045,Online,,"Franziskanerkloster Telfs, Österreich","""Gebäudekomplex Franziskanerkloster Telfs, Öst...","""Gebäudekomplex des Franziskanerklosters Telfs...","""Gebäudekomplex Franziskanerkloster Telfs, Öst...","""Building complex of the Franciscans Telfs, Au...",@47.307616/11.071043
127,11155,46480156.0,70046,1720.0,,,1782.0,,,11.071043,...,,70046,Online,,"Klarissenkloster Hall in Tirol, Österreich","""Gebäudekomplex Klarissenkloster Hall in Tirol...","""Gebäudekomplex des Klarissenklosters Hall in ...","""Gebäudekomplex Klarissenkloster Hall in Tirol...","""Building complex of the Clarissine nunnery Ha...",@47.307616/11.071043
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5027,17766,46484884.0,12101,1631.0,,,1780.0,,,21.14756,...,,12101,Online,,"Augustinereremitenkloster Rößel, Polen","""Gebäudekomplex Augustinereremitenkloster Röße...","""Gebäudekomplex des Augustinereremitenklosters...",,,@54.04918/21.14756
5035,17809,20077.0,12137,1478.0,,,1531.0,,,12.381729,...,,12137,Online,Im Jahr 1478 trat das Augustiner-Chorherrensti...,Kartäuserkloster Crimmitschau,"""Gebäudekomplex Kartäuserkloster Crimmitschau""","""Gebäudekomplex des Kartäuserklosters Crimmits...","""Gebäudekomplex Kartäuserkloster Crimmitschau""","""Building complex of the Carthusians Crimmitsc...",@50.799839/12.381729
5044,17852,3425.0,20261,1784.0,,,1810.0,,,7.85035692,...,Freiburg,20261,Online,Die erste urkundliche Erwähnung im Jahr 1278 l...,Augustinerkloster Freiburg,"""Gebäudekomplex Augustinerkloster Freiburg (Fr...","""Gebäudekomplex Freiburg des Augustinerkloster...","""Gebäudekomplex Augustinerkloster Freiburg (Fr...","""Building complex of the Augustinian monastery...",@47.99619261/7.85035692
5059,17873,3425.0,20590,1651.0,,,1677.0,,,7.84997808,...,Freiburg,20590,Online,Vereinigung mit den Dominikanerinnen des Klost...,Dominikanerinnenkloster St. Katharina in der W...,"""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex Freiburg des Dominikanerinnenk...","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas...",@47.99877003/7.84997808
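
To make the two uses of `duplicated` in the cell above easier to follow, here is a minimal sketch on a toy table. The coordinates and ids are made up for illustration. It shows the difference between `keep=False` (flag every member of a duplicate group) and the default `keep="first"` (flag only the repeats), and why rows without coordinates have to be kept explicitly.

```python
import pandas as pd

# Toy data: ids 1 and 2 share coordinates, ids 4 and 5 have no coordinates
toy_df = pd.DataFrame({
    "id_monastery_location": [1, 2, 3, 4, 5],
    "P48": ["@51.5/8.8", "@51.5/8.8", "@47.3/11.0", None, None],
})

# keep=False flags all members of a duplicate group; missing values also count
# as duplicates of each other, so they are removed again with dropna
all_members = toy_df[toy_df.duplicated(subset="P48", keep=False)].dropna(subset=["P48"])

# The default keep="first" flags only the repeats. Rows without coordinates
# are kept explicitly, otherwise all but the first of them would be dropped.
one_per_coordinate = toy_df[(~toy_df.duplicated(subset="P48")) | (toy_df["P48"].isna())]
```

In this toy table, `all_members` holds ids 1 and 2, while `one_per_coordinate` keeps ids 1, 3, 4 and 5.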


In [483]:
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,location_name,id_gsn,status,note,monastery_name,Lde,Dde,building_Lde,Len,P48
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,,40356,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Gebäudekomplex des Augustinerinnenklosters (M...","""Gebäudekomplex Augustinerinnenkloster (Martin...","""Building complex of the Augustinian nuns' mon...",@50.14597836228394/7.1660075206704485
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,,40359,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Gebäudekomplex des Franziskanertertiarinnenkl...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Building complex of the Franciscans St. Andre...",@49.939573/8.211983
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,Weisenau,40360,Online,,"Allerheiligenkloster Mainz, Weisenau","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Gebäudekomplex Weisenau des Allerheiligenklos...","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Building complex of the All Saints Monastery ...",@49.988671/8.294172
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,,40362,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex des Dominikanerinnenklosters S...","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas...",@49.759165/6.632027
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,,40363,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Gebäudekomplex des Frauenklosters St. Martin ...","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Building complex of the Women's convent St. M...",@49.7524770967829/6.65873701917281
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5061,17876,3425.0,20090,1272.0,1279.0,1272/1279,1782.0,,,7.845306,...,Freiburg,20090,Online,Die Unklarheit bezüglich des Gründungsjahres b...,Klarissenkloster Freiburg,"""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Gebäudekomplex Freiburg des Klarissenklosters...","""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Building complex of the Clarissine nunnery Fr...",@47.996836/7.845306
5062,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü...","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Building complex of the Carthusian Nuns' mona...",@51.180071/3.174419
5064,17895,22625.0,12201,1240.0,,,1240.0,,,,...,,12201,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach""",,,
5066,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Building complex Women's convent ""in the Mänt...",@47.663389/9.174574


## Connection to places

The prerequisite for connecting all building complexes with the places in which they were located is that items for these places exist in FactGrid. The locality data in the monastery database was collected mainly with the open source service [geonames](https://www.geonames.org/), so the monastery database holds a GeoNames ID for each place. FactGrid also has a qualifier (P418) for the GeoNames ID. This ID can be used to match the two sets of place data and subsequently to fill in missing places. Notebook 1b (Place Matching) describes this process.

To connect all required places, the place data from the monastery database must be matched with FactGrid. All matches that are already known should be placed in a file called `places_reconciled.xlsx` in the `reconciliation` folder. Make sure that the table has at least a column called `place_id` and one called `factgrid_id`, which hold the id of the place in the table `gs_places` and in FactGrid respectively. The following cell loads the reconciled places and merges them into the data. Any places that remain without a FactGrid id are saved in a new table called `places_without_factGrid.xlsx` in the `reconciliation` folder, and the corresponding rows are dropped from the data. Find or create the missing items in FactGrid and add the information to the `places_reconciled.xlsx` table. Afterwards, re-run the workflow.

In [484]:
# 1. Load the reconciled places
places_reconciled = pd.read_excel("data/reconciliation/places_reconciled.xlsx")[["place_id", "factgrid_id"]]
# 2. Merge them into the table of prepared monasteries
prepared_df = pd.merge(prepared_df, places_reconciled, how="left", on="place_id")
prepared_df = prepared_df.rename(columns={"factgrid_id": "P83"})
# 3. Store rows without a FactGrid item in a separate table, then drop them
missing_factgrid_ids = prepared_df[prepared_df["P83"].isna()]
missing_factgrid_ids.to_excel("data/reconciliation/places_without_factGrid.xlsx")
prepared_df = prepared_df.dropna(subset=["P83"])
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,id_gsn,status,note,monastery_name,Lde,Dde,building_Lde,Len,P48,P83
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,40356,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Gebäudekomplex des Augustinerinnenklosters (M...","""Gebäudekomplex Augustinerinnenkloster (Martin...","""Building complex of the Augustinian nuns' mon...",@50.14597836228394/7.1660075206704485,Q83856
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,40359,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Gebäudekomplex des Franziskanertertiarinnenkl...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Building complex of the Franciscans St. Andre...",@49.939573/8.211983,Q87364
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,40360,Online,,"Allerheiligenkloster Mainz, Weisenau","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Gebäudekomplex Weisenau des Allerheiligenklos...","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Building complex of the All Saints Monastery ...",@49.988671/8.294172,Q10417
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,40362,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex des Dominikanerinnenklosters S...","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas...",@49.759165/6.632027,Q10483
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,40363,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Gebäudekomplex des Frauenklosters St. Martin ...","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Building complex of the Women's convent St. M...",@49.7524770967829/6.65873701917281,Q10483
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4345,17876,3425.0,20090,1272.0,1279.0,1272/1279,1782.0,,,7.845306,...,20090,Online,Die Unklarheit bezüglich des Gründungsjahres b...,Klarissenkloster Freiburg,"""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Gebäudekomplex Freiburg des Klarissenklosters...","""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Building complex of the Clarissine nunnery Fr...",@47.996836/7.845306,Q10354
4346,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,11928,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü...","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Building complex of the Carthusian Nuns' mona...",@51.180071/3.174419,Q140903
4347,17895,22625.0,12201,1240.0,,,1240.0,,,,...,12201,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach""",,,,Q297848
4348,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,20734,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Building complex Women's convent ""in the Mänt...",@47.663389/9.174574,Q22566
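
A left merge silently multiplies rows if a `place_id` occurs more than once in the reconciliation table. The following sketch shows one possible safeguard on toy data, using the `validate` and `indicator` parameters of `pd.merge`. The first two place ids and FactGrid ids are taken from the output above; the third row is hypothetical and stands for a place that has not been reconciled yet.

```python
import pandas as pd

toy_locations = pd.DataFrame({
    "id_monastery_location": [6051, 6054, 1],   # the last id is hypothetical
    "place_id": [2001, 6179, 99999],            # 99999 is a hypothetical, unmatched place
})
toy_reconciled = pd.DataFrame({
    "place_id": [2001, 6179],
    "factgrid_id": ["Q83856", "Q87364"],
})

merged = pd.merge(
    toy_locations,
    toy_reconciled,
    how="left",
    on="place_id",
    validate="many_to_one",  # raises MergeError if a place_id is duplicated on the right
    indicator=True,          # adds a "_merge" column with "both" or "left_only"
)

# Rows without a reconciled place, and the rows that can be processed further
unmatched = merged[merged["_merge"] == "left_only"]
matched = merged[merged["_merge"] == "both"].drop(columns="_merge")
```

The `unmatched` table corresponds to what the notebook writes to `places_without_factGrid.xlsx`.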


## Instance of statement

To state that these items are building complexes, the item [Q635758](https://database.factgrid.de/wiki/Item:Q635758) (building complex) is connected to all entries using [P2](https://database.factgrid.de/wiki/Property:P2) (instance of).

In [485]:
prepared_df["P2"] = "Q635758"
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,status,note,monastery_name,Lde,Dde,building_Lde,Len,P48,P83,P2
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,Online,1463 dem Kollegiatstift Pfalzel (GSN 1031) ink...,Augustinerinnenkloster (Martinsklause) Cochem,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Gebäudekomplex des Augustinerinnenklosters (M...","""Gebäudekomplex Augustinerinnenkloster (Martin...","""Building complex of the Augustinian nuns' mon...",@50.14597836228394/7.1660075206704485,Q83856,Q635758
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,Online,1543 Umsiedlung des Konvents in das Reuerinnen...,"Franziskanertertiarinnenkloster St. Andreas, K...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Gebäudekomplex des Franziskanertertiarinnenkl...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Building complex of the Franciscans St. Andre...",@49.939573/8.211983,Q87364,Q635758
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,Online,,"Allerheiligenkloster Mainz, Weisenau","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Gebäudekomplex Weisenau des Allerheiligenklos...","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Building complex of the All Saints Monastery ...",@49.988671/8.294172,Q10417,Q635758
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,Online,Das Kloster geht auf eine kurz nach 1200 gegrü...,"Dominikanerinnenkloster St. Katharina, Trier","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Gebäudekomplex des Dominikanerinnenklosters S...","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas...",@49.759165/6.632027,Q10483,Q635758
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,Online,Das Kloster wurde dem Dominikanerinnenkloster ...,"Frauenkloster St. Martin auf dem Berge, Trier","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Gebäudekomplex des Frauenklosters St. Martin ...","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Building complex of the Women's convent St. M...",@49.7524770967829/6.65873701917281,Q10483,Q635758
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4345,17876,3425.0,20090,1272.0,1279.0,1272/1279,1782.0,,,7.845306,...,Online,Die Unklarheit bezüglich des Gründungsjahres b...,Klarissenkloster Freiburg,"""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Gebäudekomplex Freiburg des Klarissenklosters...","""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Building complex of the Clarissine nunnery Fr...",@47.996836/7.845306,Q10354,Q635758
4346,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,Online,Gründung einer Kartause vor den Mauern der Sta...,"Kartäuserinnenkloster Brügge, Belgien","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Gebäudekomplex des Kartäuserinnenklosters Brü...","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Building complex of the Carthusian Nuns' mona...",@51.180071/3.174419,Q140903,Q635758
4347,17895,22625.0,12201,1240.0,,,1240.0,,,,...,Online,"Einzig 1240 erwähnt, als der Bischof von Straß...",Schwesternsammlung Haslach,"""Gebäudekomplex Schwesternsammlung Haslach""","""Gebäudekomplex der Schwesternsammlung Haslach""",,,,Q297848,Q635758
4348,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,Online,Der Name der Gemeinschaft geht die Konstanzer ...,"Schwesternsammlung ""im Mäntellerinnenhaus"", Ko...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Gebäudekomplex des Schwesternsammlung ""im Män...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Building complex Women's convent ""in the Mänt...",@47.663389/9.174574,Q22566,Q635758


## Vocabulary Terms

To keep a mapping between the monastery database and FactGrid, every item receives a distinct vocabulary term constructed from the `id_monastery_location` of the `gs_monastery_location` table. The FactGrid property to use is [P1301](https://database.factgrid.de/wiki/Property:P1301) (GS vocabulary term). The term follows the pattern `GSMonasteryLocation<id_monastery_location>`.

In [486]:
prepared_df['P1301'] = prepared_df['id_monastery_location'].apply(lambda x: f'\"GSMonasteryLocation{x}\"')
# Handle Vocabulary Terms for duplicated coords within new imports
for index, row in prepared_df.iterrows():
    if row["id_monastery_location"] in coord_duplicates["id_monastery_location"].values:
        x = 0
        for i, r in coord_duplicates[coord_duplicates["gsn_id"] == row["gsn_id"]].iterrows():
            if x == 0:
                x += 1
                continue
            else:
                prepared_df.loc[prepared_df["P48"] == r["P48"], f"P1301.{x}" ] = f'\"GSMonasteryLocation{r["id_monastery_location"]}\"'
                x += 1
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,Dde,building_Lde,Len,P48,P83,P2,P1301,P1301.1,P1301.2,P1301.3
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,"""Gebäudekomplex des Augustinerinnenklosters (M...","""Gebäudekomplex Augustinerinnenkloster (Martin...","""Building complex of the Augustinian nuns' mon...",@50.14597836228394/7.1660075206704485,Q83856,Q635758,"""GSMonasteryLocation6051""",,,
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,"""Gebäudekomplex des Franziskanertertiarinnenkl...","""Gebäudekomplex Franziskanertertiarinnenkloste...","""Building complex of the Franciscans St. Andre...",@49.939573/8.211983,Q87364,Q635758,"""GSMonasteryLocation6054""",,,
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,"""Gebäudekomplex Weisenau des Allerheiligenklos...","""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Building complex of the All Saints Monastery ...",@49.988671/8.294172,Q10417,Q635758,"""GSMonasteryLocation6055""",,,
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,"""Gebäudekomplex des Dominikanerinnenklosters S...","""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas...",@49.759165/6.632027,Q10483,Q635758,"""GSMonasteryLocation6057""",,,
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,"""Gebäudekomplex des Frauenklosters St. Martin ...","""Gebäudekomplex Frauenkloster St. Martin auf d...","""Building complex of the Women's convent St. M...",@49.7524770967829/6.65873701917281,Q10483,Q635758,"""GSMonasteryLocation6058""",,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4345,17876,3425.0,20090,1272.0,1279.0,1272/1279,1782.0,,,7.845306,...,"""Gebäudekomplex Freiburg des Klarissenklosters...","""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Building complex of the Clarissine nunnery Fr...",@47.996836/7.845306,Q10354,Q635758,"""GSMonasteryLocation17876""",,,
4346,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,"""Gebäudekomplex des Kartäuserinnenklosters Brü...","""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Building complex of the Carthusian Nuns' mona...",@51.180071/3.174419,Q140903,Q635758,"""GSMonasteryLocation17885""",,,
4347,17895,22625.0,12201,1240.0,,,1240.0,,,,...,"""Gebäudekomplex der Schwesternsammlung Haslach""",,,,Q297848,Q635758,"""GSMonasteryLocation17895""",,,
4348,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,"""Gebäudekomplex des Schwesternsammlung ""im Män...","""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Building complex Women's convent ""in the Mänt...",@47.663389/9.174574,Q22566,Q635758,"""GSMonasteryLocation17906""",,,
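
Because the vocabulary term is the link back to the monastery database, it helps to be able to convert in both directions. The following is a minimal sketch of the pattern described above; `build_term` and `parse_term` are illustrative helper names, not functions from this notebook. The surrounding double quotes are kept as in the cell above.

```python
PREFIX = "GSMonasteryLocation"


def build_term(id_monastery_location: int) -> str:
    """Build the quoted vocabulary term for a monastery location id."""
    return f'"{PREFIX}{id_monastery_location}"'


def parse_term(term: str) -> int:
    """Recover the monastery location id from a (quoted) vocabulary term."""
    unquoted = term.strip('"')
    if not unquoted.startswith(PREFIX):
        raise ValueError(f"Not a GS monastery location term: {term!r}")
    return int(unquoted[len(PREFIX):])


term = build_term(6051)
location_id = parse_term(term)
```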


In [487]:
# building_complexes_with_coordinates = prepared_df[["gsn_id", "P1301", "P83", "P48", "location_begin_tpq", "location_begin_taq", "location_end_tpq", "location_end_taq", "place_id"]].rename(columns={"P1301":"id_monastery_location", "P83":"place_factgrid", "P48":"coordinates"})
# building_complexes_with_coordinates["id_monastery_location"] = building_complexes_with_coordinates["id_monastery_location"].str.strip("\"").str.split("Location").str[-1].astype(int)
# building_complexes_with_coordinates["latitude"] = building_complexes_with_coordinates["coordinates"].str.split("/").str[0].str[1:].astype(float)
# building_complexes_with_coordinates["longitude"] = building_complexes_with_coordinates["coordinates"].str.split("/").str[1].astype(float)
# building_complexes_with_coordinates.drop(columns=["coordinates"])
# building_complexes_with_coordinates["place_id"] = building_complexes_with_coordinates["place_id"].astype(int)
# building_complexes_with_coordinates.to_csv("data/intermediate_results/building_complexes_coordinates.csv")
# building_complexes_with_coordinates

## Dioceses

Linking building complexes to modern municipalities shows in which territorial structures the (former) building complexes are located today. The monastery database also records the historical diocese in which a building complex was located. This information is stored in the `diocese_id` column of the `gs_places` table, so the diocese is assigned to the place in which a monastery location lies, not to the monastery location itself. In FactGrid, we attach the diocese directly to the building complex: a building complex has a property [P1003](https://database.factgrid.de/wiki/Item:Q21662) (Diocese) that points to a diocese item, for example the Archdiocese of Mainz ([Q153230](https://database.factgrid.de/wiki/Item:Q153230)). The historical affiliation of a place to a diocese is a complex phenomenon. On the one hand, affiliations changed over time, especially in border areas. On the other hand, an area that we understand today as one contiguous place may not have been contiguous around 1500 and may have belonged only partially to a given diocese. We therefore separate the modern territorial localization (statements about the current location of the address) from the historical localization (statements about the affiliation to a diocese).
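
The lookup described above amounts to two chained left joins: building complex to place, then place to diocese item. The following is a minimal sketch on invented toy data, not the notebook's actual tables. The `validate="many_to_one"` guard is an assumption added for illustration; it makes pandas raise an error if a lookup table contains several rows per key, which would otherwise silently multiply rows.

```python
import pandas as pd

# Toy stand-ins for the real tables (all values invented for illustration)
complexes = pd.DataFrame({"id_monastery_location": [1, 2, 3], "place_id": [10, 10, 30]})
places = pd.DataFrame({"id_places": [10, 20, 30], "diocese_id": [100, 100, 300]})
diocese_items = pd.DataFrame({"diocese_id": [100], "url_value": ["Q153230"]})

# Step 1: place -> diocese_id
step1 = complexes.merge(
    places, how="left", left_on="place_id", right_on="id_places",
    validate="many_to_one",  # fails loudly if id_places is not unique
).drop(columns="id_places")

# Step 2: diocese_id -> diocese item, stored in the P1003 column
result = (
    step1.merge(diocese_items, how="left", on="diocese_id", validate="many_to_one")
    .drop(columns="diocese_id")
    .rename(columns={"url_value": "P1003"})
)
print(result)
```

Because both joins are left joins, a building complex whose place has no matching diocese item keeps its row and simply gets an empty `P1003` value.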

In [488]:
# Merge the diocese of each place (gs_places["diocese_id"]) into the existing table
places_selection = dataframes["gs_places"][["id_places", "diocese_id"]]
# Keep only external URLs with url_type_id 42; their url_value becomes the P1003 value below
diocese_urls_selection = dataframes["gs_id_external_urls_diocese"][dataframes["gs_id_external_urls_diocese"]["url_type_id"]==42][["diocese_id", "url_value"]]
prepared_df = pd.merge(prepared_df, places_selection, how="left", left_on="place_id", right_on="id_places").drop(columns="id_places")
prepared_df = pd.merge(prepared_df, diocese_urls_selection, how="left", left_on="diocese_id", right_on="diocese_id").drop(columns="diocese_id").rename(columns={"url_value":"P1003"})

# Handle dioceses for coordinate duplicates
coord_duplicates = pd.merge(coord_duplicates, places_selection, how="left", left_on="place_id", right_on="id_places").drop(columns="id_places")
coord_duplicates = pd.merge(coord_duplicates, diocese_urls_selection, how="left", left_on="diocese_id", right_on="diocese_id").drop(columns="diocese_id").rename(columns={"url_value":"P1003"})
for index, row in prepared_df.iterrows():
    if row["id_monastery_location"] in coord_duplicates["id_monastery_location"].values:
        x = 0
        for i, r in coord_duplicates[coord_duplicates["gsn_id"] == row["gsn_id"]].iterrows():
            if x == 0:
                x += 1
                continue
            else:
                if not r["P1003"] == row["P1003"]:
                    prepared_df.loc[prepared_df["P48"] == r["P48"], f"P1003.{x}"] = r["P1003"]
                    x += 1
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,P48,P83,P2,P1301,P1301.1,P1301.2,P1301.3,P1003,P1003.1,P1003.2
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,@50.14597836228394/7.1660075206704485,Q83856,Q635758,"""GSMonasteryLocation6051""",,,,Q153244,,
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,@49.939573/8.211983,Q87364,Q635758,"""GSMonasteryLocation6054""",,,,Q153230,,
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,@49.988671/8.294172,Q10417,Q635758,"""GSMonasteryLocation6055""",,,,Q153230,,
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,@49.759165/6.632027,Q10483,Q635758,"""GSMonasteryLocation6057""",,,,Q153244,,
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,@49.7524770967829/6.65873701917281,Q10483,Q635758,"""GSMonasteryLocation6058""",,,,Q153244,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4394,17876,3425.0,20090,1272.0,1279.0,1272/1279,1782.0,,,7.845306,...,@47.996836/7.845306,Q10354,Q635758,"""GSMonasteryLocation17876""",,,,Q153226,,
4395,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,@51.180071/3.174419,Q140903,Q635758,"""GSMonasteryLocation17885""",,,,,,
4396,17895,22625.0,12201,1240.0,,,1240.0,,,,...,,Q297848,Q635758,"""GSMonasteryLocation17895""",,,,Q153264,,
4397,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,@47.663389/9.174574,Q22566,Q635758,"""GSMonasteryLocation17906""",,,,Q153226,,


In [489]:
prepared_df[prepared_df["P1003"].isna()]

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,P48,P83,P2,P1301,P1301.1,P1301.2,P1301.3,P1003,P1003.1,P1003.2
152,10318,46481623.0,4708,1276.0,,,1580.0,1581.0,1580/1581,6.515278,...,@53.331111/6.515278,Q1347812,Q635758,"""GSMonasteryLocation10318""",,,,,,
653,15028,46483904.0,7200,1000.0,1100.0,11. Jahrhundert,,,heute,17.59293,...,@52.53656/17.59293,Q80779,Q635758,"""GSMonasteryLocation15028""",,,,,,
897,16039,46484510.0,11079,1135.0,,,1798.0,,,9.236915,...,@45.416027/9.236915,Q1381367,Q635758,"""GSMonasteryLocation16039""",,,,,,
899,16108,46484511.0,11073,1134.0,,,1799.0,,,8.954879,...,@45.351911/8.954879,Q1381368,Q635758,"""GSMonasteryLocation16108""",,,,,,
1647,14623,46483687.0,10151,1101.0,1200.0,12. Jahrhundert,1301.0,1400.0,14. Jahrhundert,8.874287956554907,...,@46.40052215454597/8.874287956554907,Q879450,Q635758,"""GSMonasteryLocation14623""",,,,,,
1648,14625,46483688.0,10153,1607.0,,,,,heute,8.80290845578666,...,@46.47761569893648/8.80290845578666,Q879547,Q635758,"""GSMonasteryLocation14625""",,,,,,
1649,14626,46483686.0,10154,1683.0,,,1841.0,,,8.563285833290928,...,@46.55778380393062/8.563285833290928,Q879630,Q635758,"""GSMonasteryLocation14626""",,,,,,
2208,14960,46483920.0,77925,1392.0,1493.0,zwischen 1392 und 1493,1835.0,1836.0,1835/1836,17.191978973135942,...,@52.80451413090155/17.191978973135942,Q93056,Q635758,"""GSMonasteryLocation14960""",,,,,,
2524,6574,46483312.0,90808,1120.0,1121.0,1120/1121,1789.0,1795.0,um 1790,3.4105555555556,...,@49.547222222222/3.4105555555556,Q128864,Q635758,"""GSMonasteryLocation6574""",,,,,,
2608,11121,46481727.0,30,1152.0,1162.0,um 1157,1188.0,1198.0,vor 1198,16.322931207190326,...,@47.0568658028105/16.322931207190326,Q387125,Q635758,"""GSMonasteryLocation11121""",,,,,,


## External Identifiers
In some instances, the monastery database lists Wikipedia articles that are written specifically about the building complex of a monastery. Where these exist, they should be linked to the building complex item.

In [490]:
gs_external_url_type_with_factgrid = dataframes["gs_external_url_type_with_factgrid"].dropna(subset="factgrid_property")
url_factgrid = pd.merge(dataframes["gs_external_urls_monastery"], gs_external_url_type_with_factgrid, how="left", left_on="url_type_id", right_on="id_url_type")[["gsn_id", "url_value", "factgrid_property", "url_name_formatter"]].dropna(subset="factgrid_property")
for index, row in url_factgrid.iterrows():
    if row["gsn_id"] in prepared_df["id_gsn"].values and "Wikipedia-Artikel zum Baudenkmal" in row["url_name_formatter"]:
        prepared_df.loc[prepared_df["id_gsn"] == row["gsn_id"], row["factgrid_property"]] = f'\"{row["url_value"]}\"'
prepared_df

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,P1003,P1003.1,P1003.2,Sdewiki,Snlwiki,Sitwiki,Sfrwiki,Splwiki,Slvwiki,Scswiki
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,Q153244,,,,,,,,,
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,Q153230,,,,,,,,,
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,Q153230,,,,,,,,,
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,Q153244,,,,,,,,,
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,Q153244,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4394,17876,3425.0,20090,1272.0,1279.0,1272/1279,1782.0,,,7.845306,...,Q153226,,,,,,,,,
4395,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,,,,,,,,,,
4396,17895,22625.0,12201,1240.0,,,1240.0,,,,...,Q153264,,,,,,,,,
4397,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,Q153226,,,,,,,,,


## Sources / References

Every statement in FactGrid should be supported by a source or reference. To achieve this, a source column `S471` is inserted directly after each relevant property column. It links to the corresponding monastery database entry using the property [P471](https://database.factgrid.de/wiki/Property:P471).

In [491]:
final_table = prepared_df.copy()
for colname in ["P48", "P83", "P1003"] + [c for c in final_table.columns.tolist() if c.startswith("P1003.")]:
    final_table.insert(final_table.columns.get_loc(colname)+1, "S471", final_table["gsn_id"].apply(lambda x:f'\"{x}\"'), allow_duplicates=True)
final_table["P131"] = "Q153178"
final_table

Unnamed: 0,id_monastery_location,place_id,gsn_id,location_begin_tpq,location_begin_taq,location_begin_note,location_end_tpq,location_end_taq,location_end_note,longitude,...,P1003.2,S471,Sdewiki,Snlwiki,Sitwiki,Sfrwiki,Splwiki,Slvwiki,Scswiki,P131
0,6051,2001.0,40356,1297.0,,1297 erste Erwähnung,1463.0,,,7.1660075206704485,...,,"""40356""",,,,,,,,Q153178
1,6054,6179.0,40359,1434.0,1466.0,Mitte 15. Jahrhundert,1543.0,,,8.211983,...,,"""40359""",,,,,,,,Q153178
2,6055,7242.0,40360,1493.0,,,1802.0,,,8.294172,...,,"""40360""",,,,,,,,Q153178
3,6057,11776.0,40362,1235.0,1238.0,1235/1238,1802.0,,,6.632027,...,,"""40362""",,,,,,,,Q153178
4,6058,11776.0,40363,551.0,600.0,zweite Hälfte 6. Jahrhundert,1288.0,,,6.65873701917281,...,,"""40363""",,,,,,,,Q153178
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4394,17876,3425.0,20090,1272.0,1279.0,1272/1279,1782.0,,,7.845306,...,,"""20090""",,,,,,,,Q153178
4395,17885,46484699.0,11928,1348.0,,,1566.0,,,3.174419,...,,"""11928""",,,,,,,,Q153178
4396,17895,22625.0,12201,1240.0,,,1240.0,,,,...,,"""12201""",,,,,,,,Q153178
4397,17906,6344.0,20734,1423.0,1424.0,1423/1424,1436.0,,vor 1436,9.174574,...,,"""20734""",,,,,,,,Q153178


## Finalizing

To finalize, the table is cleaned up and exported in several formats. Most importantly, the V1 statements that create the new building complex items are written to `data/results/building_complexes/import_building_complexes.tsv`.
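
For orientation, QuickStatements V1 commands are tab-separated lines: `CREATE` opens a new item, each following `LAST` line adds a label, sitelink or statement to it, and a source is appended to the statement it supports as an extra `S…` column and value. The sketch below shows how one table row could be turned into such commands. `row_to_v1` is a hypothetical function written for this illustration; it is not the implementation of `df_to_qs_v1`, and the example label is invented.

```python
def row_to_v1(cells):
    """Turn the ordered (column, value) pairs of one row into V1 commands.

    Hypothetical illustration only, not the notebook's df_to_qs_v1 helper.
    """
    lines = ["CREATE"]
    statement_open = False  # True while the preceding property cell was emitted
    for column, value in cells:
        is_source = column.startswith("S") and column[1:].isdigit()
        if is_source:
            # A source belongs to the statement directly before it
            if statement_open and value is not None:
                lines[-1] += f"\t{column}\t{value}"
            continue
        statement_open = value is not None
        if statement_open:
            lines.append(f"LAST\t{column}\t{value}")
    return "\n".join(lines)


cells = [
    ("Lde", '"Gebäudekomplex Beispiel"'),  # invented label
    ("P48", "@50.1/7.1"),
    ("S471", '"40356"'),
    ("P83", None),                          # empty cell: no statement, source skipped
    ("S471", '"40356"'),
    ("P2", "Q635758"),
]
v1 = row_to_v1(cells)
print(v1)
```

Passing the row as ordered pairs rather than as a dictionary keeps repeated column names such as `S471` intact.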

In [492]:
from helper_functions import df_to_qs_v1

final_table["id_monastery_location"].to_csv("data/intermediate_results/new_building_complex_locations_ids.csv")

final_table = final_table.drop(columns=["Dde", "note", "building_Lde", "id_monastery_location", "place_id", "gsn_id", "location_begin_tpq", "location_begin_taq", "location_begin_note", "location_end_tpq", "location_end_taq", "location_end_note", "longitude", "latitude", "location_name", "id_gsn", "status", "monastery_name"])
final_table.insert(0, "qid", np.nan)
dup_drop = final_table[final_table.duplicated(subset="P48", keep=False)]
final_table = final_table.drop_duplicates(subset="P48").drop_duplicates(subset="P1301")
final_table.to_excel("data/results/building_complexes/import_building_complexes.xlsx", index=False)
final_table.to_csv("data/results/building_complexes/import_building_complexes.csv", index=False, doublequote=False, quoting=csv.QUOTE_NONE, escapechar="§") #hack to save in Quickstatements-applicable format
with open("data/results/building_complexes/import_building_complexes.csv", "r") as file:
    s = file.read()
with open("data/results/building_complexes/import_building_complexes.csv", "w") as file:
    file.write(s.replace("§", ""))
with open("data/results/building_complexes/import_building_complexes.tsv", "w") as file:
    file.write(df_to_qs_v1(final_table))

final_table

Unnamed: 0,qid,Lde,Len,P48,S471,P83,S471.1,P2,P1301,P1301.1,...,P1003.2,S471.4,Sdewiki,Snlwiki,Sitwiki,Sfrwiki,Splwiki,Slvwiki,Scswiki,P131
0,,"""Gebäudekomplex Augustinerinnenkloster (Martin...","""Building complex of the Augustinian nuns' mon...",@50.14597836228394/7.1660075206704485,"""40356""",Q83856,"""40356""",Q635758,"""GSMonasteryLocation6051""",,...,,"""40356""",,,,,,,,Q153178
1,,"""Gebäudekomplex Franziskanertertiarinnenkloste...","""Building complex of the Franciscans St. Andre...",@49.939573/8.211983,"""40359""",Q87364,"""40359""",Q635758,"""GSMonasteryLocation6054""",,...,,"""40359""",,,,,,,,Q153178
2,,"""Gebäudekomplex Allerheiligenkloster Mainz, We...","""Building complex of the All Saints Monastery ...",@49.988671/8.294172,"""40360""",Q10417,"""40360""",Q635758,"""GSMonasteryLocation6055""",,...,,"""40360""",,,,,,,,Q153178
3,,"""Gebäudekomplex Dominikanerinnenkloster St. Ka...","""Building complex of the Dominican Nuns' monas...",@49.759165/6.632027,"""40362""",Q10483,"""40362""",Q635758,"""GSMonasteryLocation6057""",,...,,"""40362""",,,,,,,,Q153178
4,,"""Gebäudekomplex Frauenkloster St. Martin auf d...","""Building complex of the Women's convent St. M...",@49.7524770967829/6.65873701917281,"""40363""",Q10483,"""40363""",Q635758,"""GSMonasteryLocation6058""",,...,,"""40363""",,,,,,,,Q153178
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4393,,"""Gebäudekomplex Regelhaus am Graben/Dominikane...","""Building complex of the Regelhaus am Graben/D...",@47.997426/7.844976,"""20251""",Q10354,"""20251""",Q635758,"""GSMonasteryLocation17872""",,...,,"""20251""",,,,,,,,Q153178
4394,,"""Gebäudekomplex Klarissenkloster Freiburg (Fre...","""Building complex of the Clarissine nunnery Fr...",@47.996836/7.845306,"""20090""",Q10354,"""20090""",Q635758,"""GSMonasteryLocation17876""",,...,,"""20090""",,,,,,,,Q153178
4395,,"""Gebäudekomplex Kartäuserinnenkloster Brügge, ...","""Building complex of the Carthusian Nuns' mona...",@51.180071/3.174419,"""11928""",Q140903,"""11928""",Q635758,"""GSMonasteryLocation17885""",,...,,"""11928""",,,,,,,,Q153178
4397,,"""Gebäudekomplex Schwesternsammlung ""im Mäntell...","""Building complex Women's convent ""in the Mänt...",@47.663389/9.174574,"""20734""",Q22566,"""20734""",Q635758,"""GSMonasteryLocation17906""",,...,,"""20734""",,,,,,,,Q153178
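
One behaviour of the deduplication step is worth keeping in mind: pandas `drop_duplicates` treats missing values as equal to each other, so `drop_duplicates(subset="P48")` keeps only the first row that has no coordinates. This appears consistent with the output above, where the row with index 4396 (no `P48` value) is no longer present. Whether such rows should be kept is a modelling decision. If they should, a variant along the following lines deduplicates only rows that actually have coordinates. The sketch uses invented toy data.

```python
import pandas as pd

# Toy data: two rows share coordinates, two rows have none (values invented)
table = pd.DataFrame({
    "P1301": ["A", "B", "C", "D"],
    "P48": ["@50.0/7.0", None, "@50.0/7.0", None],
})

# Default behaviour: missing values count as duplicates of each other
naive = table.drop_duplicates(subset="P48")

# Variant: rows without coordinates are always kept,
# rows with coordinates are deduplicated as before
has_coords = table["P48"].notna()
keep = ~has_coords | ~table.duplicated(subset="P48", keep="first")
careful = table[keep]

print(naive["P1301"].tolist())
print(careful["P1301"].tolist())
```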


## Next steps
As a next step, run notebook 2 (Monasteries) to create the religious community items that go together with the building complexes. Afterwards, you can copy the V1 statements from both `data/results/building_complexes/import_building_complexes.csv` and `data/results/monasteries/import_monasteries.csv` into QuickStatements and upload them.