In [102]:
import pandas as pd
import numpy as np
import csv
from helper_functions import df_to_qs_v1, query_factgrid, parse_date, process_date_parsing_results, DateType

# Step 4 - Connecting monasteries to their orders and assigning the instance of Statement

This notebook implements the last step of the Klosterdatenbank-to-FactGrid-Workflow which is to connect information on religious order to the newly created items for religious communities.

In the original database, the information about which order a religious community belongs to is stored in the table `gs_monastery_order`. This is a 1:n-relationship, meaning that a religious community can belong to different orders over the course of its existence. As of 2025, there are 108 order designations in the monastery database, although it is not claimed to be comprehensive. The designations were previously published as normed vocabularies as part of earlier research projects and can be accessed through [Dante](https://dante.gbv.de/search?p=407) or [FactGrid](https://tinyurl.com/27slckal). The table `gs_monastery_order` contains not only these standardized order designations, but also other designations that describe the order affiliation of the community, but are not standardized. Examples of this include collective terms such as "Other lay convents/Semi-religious communities (m)", which groups together male lay convents without a more specific designation. Another example is the "evangelical monastery/stift (w)". Evangelical monasteries are not within the scope of interest of the monastery database and are therefore not described in more detail. However, if the information is available that a monastery became an evangelical monastery, this is still recorded, and it is then reflected in the order designation "evangelical monastery/stift".

The designations outside of the normative vocabulary must be caught separately in the workflow. In some cases, the designation can be mapped to an existing FactGrid item. The term "Kanonissen" from the database corresponds to the "Community of Canonesses" in FactGrid [Q1480654](https://database.factgrid.de/wiki/Item:Q1480654). In the case of the canonesses and canons, it is also necessary to express that the community is a collegiate or a women's chapter. Therefore, the value associated with [P2](https://database.factgrid.de/wiki/Property:P2) must also be modified.

## Preparation

The following cell loads all required tables into a dictionary of DataFrames and queries FactGrid to retreive existent Q-Numbers for the religious communities. The table `gs_monastery_oder` contains the mappting between religious communities and orders. `gs_orders` contains information about the orders themselves, including their Q-numbers in FactGrid.

In [103]:
# Load required data into dataframes
dataframes = {}
dataframes["gs_monastery_order"] = pd.read_excel("data/exports_monasteryDB/gs_monastery_order.xlsx")
dataframes["monasteries_in_factgrid"] = query_factgrid("monasteries")
dataframes["monasteries_in_factgrid"]
dataframes["monasteries_in_factgrid"]["item"] = dataframes["monasteries_in_factgrid"]["item"].str.split("/").str[-1]
dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"] = dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"].astype(int)
dataframes["gs_orders"] = pd.read_excel("data/exports_monasteryDB/gs_orders.xlsx")

dataframes["new_monasteries"] = pd.read_csv("data/intermediate_results/new_monasteries_ids.csv")
dataframes["gs_monastery_order"] = dataframes["gs_monastery_order"][dataframes["gs_monastery_order"]["gsn_id"].isin(dataframes["new_monasteries"]["id_gsn"])]

# Merge information on orders, monasteries and FactGrid IDs of newly created items
df = pd.merge(dataframes["gs_monastery_order"], dataframes["monasteries_in_factgrid"], how="left", left_on="gsn_id", right_on="KlosterdatenbankID").dropna(subset="item")
df = pd.merge(df, dataframes["gs_orders"], how="left", left_on="order_id", right_on="id_order")
df

Unnamed: 0,Unnamed: 0_x,id_monastery_order,gsn_id,monastery_status,order_id,order_begin_tpq,order_begin_taq,order_end_tpq,order_end_taq,order_begin_note,...,order_name,order_abbreviation,Symbol,gender,imagefile,comment_order,ID_GSReligiousOrder,lthk,RG_Abkuerzung,FactGridID
0,2,17294,11650,Kloster,28,1215,,1796.0,,,...,Zisterzienserinnen,OCist (w),z,Frauenkloster,Zisterzienserinnen,,28.0,,o. Cist.,Q640839
1,6,17298,11223,Kloster,28,1237,,1797.0,,,...,Zisterzienserinnen,OCist (w),z,Frauenkloster,Zisterzienserinnen,,28.0,,o. Cist.,Q640839
2,15,17308,10810,Kloster,13,1667,1674.0,1785.0,,,...,Dominikaner,OP,Q,Männerkloster,Dominikaner,,13.0,,o. pred.,Q164200
3,32,17327,11674,Stift,1,1381,1432.0,1789.0,1798.0,zwischen 1381 und 1432,...,Augustinerchorherren,CanA,A,Männerkloster,Augustiner_Chorherren,,1.0,,o. s. Aug.,Q172927
4,54,17349,11690,Kloster,16,752,802.0,1034.0,1066.0,vor 802,...,Benediktiner,OSB,B,Männerkloster,Benediktiner,,16.0,,o. s. Ben.,Q164266
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
512,8846,17836,4000,Konvent,12,1774,,1808.0,,,...,Johanniter,OMel,M,Männerkloster,Malteser,,12.0,,s. Johannis Jerus.,Q174766
513,8865,17857,12079,Kommende,12,1784,,1808.0,,,...,Johanniter,OMel,M,Männerkloster,Malteser,,12.0,,s. Johannis Jerus.,Q174766
514,8872,17864,12083,Kommende,12,1784,,1808.0,,,...,Johanniter,OMel,M,Männerkloster,Malteser,,12.0,,s. Johannis Jerus.,Q174766
515,8876,17868,12086,Konvent,23,1596,,1773.0,,,...,Jesuiten,SJ,I,Männerkloster,Jesuiten,,23.0,,,Q160284


### Special Cases in order and instance assignment
This cell contains the aforementioned mapping of special cases in the relation to orders. The dictionary `orden_mapping` assigns Q-numbers to orders in the case where a direct mapping is possible. The `status_mapping` maps the status of a monastery in a certain order relation to a Q-Number that will be linked in a P2-Statemtent. This will be explained below. In `orden_p2_special_cases`, instances where a specific order induces a specific P2-statement are collected.

In [104]:
orden_mapping = {
    35:"Q1480654", # Kanonissen
    34:"Q1480653", # Kanoniker
    7:"Q164255", # Franziskaner (Minoriten/Konventualen/Observanten/Rekollekten)
    112:"Q640846", # sonstige Laienkonvente / Semireligiöse Gemeinschaften (w)
    79: "Q640842",  # sonstige Laienkonvente / Semireligiöse Gemeinschaften (m)
    60: "Q400468", # unbekannt (m)
    113: "Q400468", # unbekannt (w)
    47: "Q174766" # Johanniter-Doppelkommende
}
status_mapping = {
    "Stift":"Q523002",
    "Kloster":"Q141472",
    "Bruderhaus":"Q640846",
    "Domstift":"Q164867",
    "Kommende":"Q395378",
    "Konvent":"Q399022",
    "Schwesternhaus":"Q640846",
    "evangelisches Kloster/Stift":np.nan
}
orden_p2_special_cases = {
    34:"Q160437", # Kanonissen
    35:"Q898116" # Kanoniker
}
double_monasteries = {
    48:["Q164266", "Q1195146"], # Benediktiner-Doppelkloster
    56: ["Q1195144", "Q172927"], # Augustiner-Doppelstift
    59: ["Q1480653", "Q1480654"], # Doppelstift
    117: ["Q1352536", "Q1352539"], # Humiliaten-Doppelkloster
    58: ["Q143840", "Q1195163"], # Prämonstratenser-Doppelstift
    57: ["Q142148", "Q640839"] # Zisterzienser-Doppelkloster

}

## Orders

In this step, the religious order is added to monasteries using the property "Religious order" [P746](https://database.factgrid.de/wiki/Property:P746). The data is filled with the corresponding FactGrid Q-Number. In cases, where no number is given, the cell is filled using the mapping from above.

In [105]:
# Handle double monasteries
for index, row in df.iterrows():
    if df.loc[index,"order_id"] in double_monasteries:
        order_name = row["order_id"]
        df.loc[index, "FactGridID"] = double_monasteries[order_name][0]
        df.loc[len(df)] = df.loc[index].copy()
        df.loc[len(df)-1, "FactGridID"] = double_monasteries[order_name][1]

# Create new DataFrame for export data
export_df = pd.DataFrame()
export_df["qid"] = df["item"]

# Fill column P746 (religious order) with data from Factgrid. Fill remaining empty cells using the mapping from the cell above
export_df["P746"] = df["FactGridID"]
export_df["P746"] = export_df["P746"].fillna(df["order_id"].apply(lambda x: orden_mapping[x] if x in orden_mapping else np.nan))

export_df

Unnamed: 0,qid,P746
0,Q1763280,Q640839
1,Q1763396,Q640839
2,Q1763211,Q164200
3,Q1763521,Q172927
4,Q1763325,Q164266
...,...,...
519,Q1763302,Q1195163
520,Q1763538,Q1195163
521,Q1763283,Q1195146
522,Q1763324,Q1195163


In [106]:
export_df[export_df.duplicated(keep=False)]

Unnamed: 0,qid,P746
2,Q1763211,Q164200
21,Q1763185,Q172927
28,Q1763405,Q172927
29,Q1763405,Q1195144
30,Q1763405,Q1195144
62,Q1763185,Q172927
69,Q1763155,Q1480653
70,Q1763155,Q1480653
97,Q1763345,Q164266
100,Q1763538,Q1195163


If there are still empty cells in column `P746`, they will be shown here together with some further information about the case. This way it can be controlled if there is a FactGrid-Mapping missing (e.g. if new orders are added to the database). There is one special case in which it is expected to have an empty value, which is the case of evangelical monasteries. For those monasteries, no further action is required.

#TODO needs more information

In [107]:
na_orders = export_df[export_df["P746"].isna()]
na_orders = pd.merge(na_orders, df[["item","order_name"]], how="left", left_on="qid", right_on="item")
na_orders

Unnamed: 0,qid,P746,item,order_name
0,Q1763291,,Q1763291,Kanonissen
1,Q1763291,,Q1763291,evangelisches Kloster/Stift (w)
2,Q1763284,,Q1763284,Kanonissen
3,Q1763284,,Q1763284,Benediktinerinnen
4,Q1763284,,Q1763284,evangelisches Kloster/Stift (w)
5,Q1763185,,Q1763185,Augustinerchorherren
6,Q1763185,,Q1763185,Kanoniker
7,Q1763185,,Q1763185,evangelisches Kloster/Stift (m)
8,Q1763185,,Q1763185,Augustinerchorherren
9,Q1763201,,Q1763201,Kanonissen


### Dates

Just like the building complexes, the relationship between religious communities and orders also has a temporal component. The information can be found in `gs_monastery_orders`, in the columns `order_begin_tpq`, `order_begin_taq`, `order_end_tpq`, `order_end_taq`, `order_begin_note` and `order_end_note`. The processing is done the same way as for the building complexes. You can find a detailed explaination in Notebook 3.

In [108]:

# Get relevant data for date parsing
export_df["order_begin_tpq"] = df["order_begin_tpq"]
export_df["order_end_tpq"] = df["order_end_tpq"]
export_df["order_begin_note"] = df["order_begin_note"]
export_df["order_end_note"] = df["order_end_note"]

# Date Parsing
export_df["begin_date_parse_result"] = df["order_begin_note"].apply(lambda x: parse_date(str(x), DateType.BEGIN_DATE))
export_df["end_date_parse_result"] = df["order_end_note"].apply(lambda x: parse_date(str(x), DateType.END_DATE))
export_df['qal787'] = df["order_begin_note"].apply(lambda x: f'\"{x}\"' if not pd.isna(x) else np.nan)
export_df['qal788'] = df["order_end_note"].apply(lambda x: f'\"{x}\"' if not pd.isna(x) else np.nan).apply(lambda x: x if x != "heute" else np.nan)
process_date_parsing_results(export_df, "order")

# Cleanup
export_df = export_df.drop(columns=["order_begin_tpq", "order_end_tpq", "begin_date_parse_result", "end_date_parse_result"])
export_df["S471"] = df["gsn_id"].apply(lambda x: f'\"{x}\"')
export_df.drop(columns={"order_begin_note", "order_end_note"}, inplace=True)
export_df

Unnamed: 0,qid,P746,qal787,qal788,qal49,qal50,qal1124,qal785,qal786,qal1125,qal1123,qal1126,S471
0,Q1763280,Q640839,,,+1215-01-01T00:00:00Z/9/J,+1796-01-01T00:00:00Z/9,,,,,,,"""11650"""
1,Q1763396,Q640839,,,+1237-01-01T00:00:00Z/9/J,+1797-01-01T00:00:00Z/9,,,,,,,"""11223"""
2,Q1763211,Q164200,,,+1667-01-01T00:00:00Z/9,+1785-01-01T00:00:00Z/9,,,,,,,"""10810"""
3,Q1763521,Q172927,"""zwischen 1381 und 1432""","""zwischen 1789 und 1798""",+1381-01-01T00:00:00Z/9/J,+1789-01-01T00:00:00Z/9,,,,,,,"""11674"""
4,Q1763325,Q164266,"""vor 802""","""Mitte 11. Jahrhundert (?)""",,+1050-00-00T00:00:00Z/7/J,+0802-00-00T00:00:00Z/9/J,,,,,,"""11690"""
...,...,...,...,...,...,...,...,...,...,...,...,...,...
519,Q1763302,Q1195163,"""vor 1144/1145""","""um 1243""",,+1243-00-00T00:00:00Z/9/J,+1144-00-00T00:00:00Z/9/J,,Q10,,,,"""30143"""
520,Q1763538,Q1195163,"""nach 1150""","""nach 1247""",,,,,,+1247-00-00T00:00:00Z/9/J,,+1150-00-00T00:00:00Z/9/J,"""3543"""
521,Q1763283,Q1195146,"""vor 1209""","""um 1240""",,+1240-00-00T00:00:00Z/9/J,+1209-00-00T00:00:00Z/9/J,,Q10,,,,"""40141"""
522,Q1763324,Q1195163,"""vor 1200""",,,+1580-01-01T00:00:00Z/9/J,+1200-00-00T00:00:00Z/9/J,,,,,,"""5422"""


The following cell lists shows duplicates of monasteries and orders. These can occur through a variety of cases:
1. A religious community was living under the rule of a certain order more than once during its existence.
2. A monastery turned into a double monastery of the same order. In this case, whichever gender was there alone will be duplicated as it is also present during the time in which the community was a double monasterey.

At the moment, these duplicates are simply dropped from the list to avoid confusion as the would create the "qualifier mish-mash"-Problem in FactGrid. They are stored in a separate file under `data/intermediate_results/double_orders.xlsx` and it is recommended to enter the missing values to FactGrid manually or by using the FactGrid API.

In [109]:
export_df[export_df.duplicated(subset=["qid", "P746"], keep=False)]

Unnamed: 0,qid,P746,qal787,qal788,qal49,qal50,qal1124,qal785,qal786,qal1125,qal1123,qal1126,S471
2,Q1763211,Q164200,,,+1667-01-01T00:00:00Z/9,+1785-01-01T00:00:00Z/9,,,,,,,"""10810"""
21,Q1763185,Q172927,,,+1126-01-01T00:00:00Z/9/J,+1568-01-01T00:00:00Z/9/J,,,,,,,"""73"""
28,Q1763405,Q172927,,"""1145/46""",+1132-01-01T00:00:00Z/9/J,+1145-00-00T00:00:00Z/9/J,,,,,,,"""380"""
29,Q1763405,Q1195144,"""vor 1269""",,,+1542-01-01T00:00:00Z/9/J,+1269-00-00T00:00:00Z/9/J,,,,,,"""380"""
30,Q1763405,Q1195144,"""1145/46""","""vor 1269""",+1145-00-00T00:00:00Z/9/J,,,,,,+1269-00-00T00:00:00Z/9/J,,"""380"""
62,Q1763185,Q172927,,,+1629-01-01T00:00:00Z/9,+1803-01-01T00:00:00Z/9,,,,,,,"""73"""
69,Q1763155,Q1480653,"""1220/1234""","""Ende 16. Jahrhundert""",+1220-00-00T00:00:00Z/9/J,+1583-00-00T00:00:00Z/7,,,,,,,"""3045"""
70,Q1763155,Q1480653,"""18. Jahrhundert""","""während der französischen Revolution""",+1800-00-00T00:00:00Z/7,+1789-01-01T00:00:00Z/9,,,,,,,"""3045"""
97,Q1763345,Q164266,"""spätestens 1245""",,,+1495-01-01T00:00:00Z/9/J,+1245-00-00T00:00:00Z/9/J,,,,,,"""3467"""
100,Q1763538,Q1195163,"""nach 1247""",,,+1548-01-01T00:00:00Z/9/J,,,,,,+1247-00-00T00:00:00Z/9/J,"""3543"""


In [110]:
export_df[export_df.duplicated(subset=["qid", "P746"], keep=False)].to_excel("data/intermediate_results/double_orders.xlsx")
export_df.drop_duplicates(subset=["qid", "P746"], keep=False, inplace=True)

## Instance Of Statement

One of the most important properties in FactGrid is Property [P2](https://database.factgrid.de/wiki/Property:P2) "instance of". This property links an item to a class that serves as a category for the current item. Strictly speaking, there are only items in FactGrid, and no ontologically defined classes. The class structure is modeled through the linking of items with the properties "instance of" and "subproperty of". In the case of the monastery database, the information for this class affiliation is partially represented in the field `monastery_status` in the table `gs_monastery_order`. The assignment of a religious community to one of the classes depends on its order affiliation. This connection is not fully represented in the monastery database. Only a distinction is made between brotherhouse and sisterhouse, convent, abbey, monastery, commandery, convent, and evangelical convent. We use this distinction to integrate the data sets into the classification system of FactGrid, while being aware that we are omitting historical complexity. In cases where no classification is possible, the data sets are classified as "Religious Community", the most general class.

In [111]:
# Assign P2 statement according to dictionary
export_df["P2"] = df["order_name"].apply(lambda x: orden_p2_special_cases[x] if x in orden_p2_special_cases else np.nan)
export_df["P2"].fillna(df["monastery_status"].apply(lambda x: status_mapping[x] if x in status_mapping else "Q704192"), inplace=True)

# Prevent duplicate P2 assignments
p2_duplicates = export_df.duplicated(subset=["qid", "P2"])
export_df.loc[p2_duplicates, "P2"] = np.nan

export_df

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  export_df["P2"].fillna(df["monastery_status"].apply(lambda x: status_mapping[x] if x in status_mapping else "Q704192"), inplace=True)
 'Q523002' 'Q523002' 'Q141472' 'Q523002' 'Q523002' 'Q141472' 'Q523002'
 'Q141472' 'Q523002' 'Q523002' 'Q141472' 'Q141472' 'Q523002' 'Q523002'
 'Q523002' 'Q141472' 'Q141472' 'Q141472' 'Q523002' 'Q523002' 'Q141472'
 'Q523002' 'Q141472' 'Q523002' 'Q141472' 'Q141472' 'Q399022' 'Q523002'
 'Q640846' 'Q141472' 'Q523002' 'Q141472' 'Q141472' 'Q141472' 'Q141472'
 'Q523002' 'Q141472' 'Q399022' 'Q523002' 'Q523002' 'Q141472' 'Q395378'
 'Q141472' 'Q141472' 'Q141472' 'Q523002' nan nan 'Q14147

Unnamed: 0,qid,P746,qal787,qal788,qal49,qal50,qal1124,qal785,qal786,qal1125,qal1123,qal1126,S471,P2
0,Q1763280,Q640839,,,+1215-01-01T00:00:00Z/9/J,+1796-01-01T00:00:00Z/9,,,,,,,"""11650""",Q141472
1,Q1763396,Q640839,,,+1237-01-01T00:00:00Z/9/J,+1797-01-01T00:00:00Z/9,,,,,,,"""11223""",Q141472
3,Q1763521,Q172927,"""zwischen 1381 und 1432""","""zwischen 1789 und 1798""",+1381-01-01T00:00:00Z/9/J,+1789-01-01T00:00:00Z/9,,,,,,,"""11674""",Q523002
4,Q1763325,Q164266,"""vor 802""","""Mitte 11. Jahrhundert (?)""",,+1050-00-00T00:00:00Z/7/J,+0802-00-00T00:00:00Z/9/J,,,,,,"""11690""",Q141472
5,Q1763284,Q1480654,"""ca.846""",,+841-01-01T00:00:00Z/9/J,+1207-01-01T00:00:00Z/9/J,,,,,,,"""327""",Q523002
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
516,Q1763549,Q174766,,,+1781-01-01T00:00:00Z/9,+1808-01-01T00:00:00Z/9,,,,,,,"""12086""",Q395378
518,Q1763366,Q1195146,"""vor 1220""","""nach 1559""",,,+1220-00-00T00:00:00Z/9/J,,,+1559-00-00T00:00:00Z/9/J,,,"""851""",
521,Q1763283,Q1195146,"""vor 1209""","""um 1240""",,+1240-00-00T00:00:00Z/9/J,+1209-00-00T00:00:00Z/9/J,,Q10,,,,"""40141""",
522,Q1763324,Q1195163,"""vor 1200""",,,+1580-01-01T00:00:00Z/9/J,+1200-00-00T00:00:00Z/9/J,,,,,,"""5422""",


Finally, the data is again exported into the various export formats

In [112]:
export_df.to_excel("data/results/monastery_order_connection/monastery_to_order.xlsx", index=False)
export_df.to_csv("data/results/monastery_order_connection/monastery_to_order.csv", index=False, doublequote=False, quoting=csv.QUOTE_NONE, escapechar="§")
with open("data/results/monastery_order_connection/monastery_to_order.tsv", "w") as file:
    file.write(df_to_qs_v1(export_df))

#TODO Last Step

In [113]:
from helper_functions import query_factgrid

# Get FactGrid Data
monasteries_in_factgrid = query_factgrid("monasteries")
building_complexes_in_factgrid = query_factgrid("building_complexes")

# Cleanup data
monasteries_in_factgrid["item"] = monasteries_in_factgrid["item"].str.split("/").str[-1]
monasteries_in_factgrid.rename(columns={"item":"url_value", "KlosterdatenbankID":"gsn_id"}, inplace=True)
monasteries_in_factgrid["url_type_id"] = 42

building_complexes_in_factgrid
building_complexes_in_factgrid["item"] = building_complexes_in_factgrid["item"].str.split("/").str[-1]
building_complexes_in_factgrid["GSVocabTerm"] = building_complexes_in_factgrid["GSVocabTerm"].str.split("Location").str[-1]
building_complexes_in_factgrid.rename(columns={"item":"factgrid_id", "GSVocabTerm":"id_monastery_location"}, inplace=True)

# Save data
monasteries_in_factgrid.to_csv("data/factgrid_data/monasteries_in_factgrid.csv")
monasteries_in_factgrid.to_excel("data/factgrid_data/monasteries_in_factgrid.xlsx")
building_complexes_in_factgrid.to_csv("data/factgrid_data/building_complexes_in_factgrid.csv")
building_complexes_in_factgrid.to_excel("data/factgrid_data/building_complexes_in_factgrid.xlsx")