# Notebook 2: Creating Items for Religious Communities
This notebook implements the second step of the Klosterdatenbank-to-FactGrid-Workflow, which is to create Items for religious communities.

As the name suggests, the "Monastery" is the central unit of the monastery database. In the table `gs_monastery`, each row represents a religious community. This is also reflected in the query options of the web interface of the monastery database. Applying the various filter functions always results in a list of religious communities. In the detail view, all relevant information from linked tables is then displayed. Here, the religious community is always at the center. All further information is displayed in connection with the religious community. By integrating with FactGrid, the query options are expanded. For example, it is now possible to query only building complexes. At the same time, the structure of the data model must be taken into account when querying information, such as the geographical location of a religious community at a specific point in time.

To import the religious communities into FactGrid, the following workflow creates labels based on the monastery name and its translation from Notebook 1a. Unlike the monastery locations/building complexes, the religious communities are connected to a series of external identifiers, which will partly be transferred to FactGrid.

## Preparation
The notebook requires the following libraries to run. If an error occurs, make sure the libraries are installed on your system.

In [9]:
import pandas as pd
import numpy as np

First, the export files are loaded into [Dataframes](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). The dataframes are stored in a dictionary with the keys as the filenames, for easier access.

In [10]:
# Load Access exports
from helper_functions import load_files_from_folder

export_files = load_files_from_folder("data/exports_monasteryDB", "xlsx")

# Create dataframes for each table
dataframes = {key: pd.read_excel(value) for key, value in export_files.items()}

# Add dataframe for monasteries in factGrid (stored in a different directory)
dataframes["building_complexes_in_factgrid"] = pd.read_csv("data/factgrid_data/building_complexes_in_factgrid.csv")
dataframes["monasteries_in_factgrid"] = pd.read_csv("data/factgrid_data/monasteries_in_factgrid.csv")
# Add translation data
dataframes["translated"] = pd.read_csv("data/translation/translated.csv")
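The helper `load_files_from_folder` is not shown in this notebook. As a rough illustration of the dictionary described above, the following sketch maps filenames (without extension) to file paths. The function name `list_files_by_stem` is hypothetical and the real helper may work differently.

```python
from pathlib import Path


def list_files_by_stem(folder, extension):
    """Hypothetical stand-in for the project's helper (assumption, not its real code).

    Returns a dictionary that maps each filename without its extension
    to the path of the file, for all files with the given extension.
    """
    paths = sorted(Path(folder).glob(f"*.{extension}"))
    return {path.stem: path for path in paths}
```

With such a dictionary, `pd.read_excel` can be applied to every value, and a table is then available under a key such as `"gs_monastery"`.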

The next cell prepares the datasets for the workflow. First, the table `gs_monastery` is filtered for those religious communities that have the status "Online". This means that the datasets are considered finished and are no longer being actively worked on. Afterwards, only the columns `id_gsn` and `monastery_name` are selected. Finally, the column `id_gsn` is filtered against the list of monasteries that already have a monastery database identifier in FactGrid (`factgrid_data/monasteries_in_factgrid.csv`) to make sure that no duplicates are produced.

In [11]:
# Filter for monasteries online
monasteries_online = dataframes["gs_monastery"][dataframes["gs_monastery"]["status"] == "Online"]
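# Keep only the identifier and the name
prepared_df = monasteries_online[["id_gsn", "monastery_name"]]
# Exclude monasteries that already exist in FactGrid; copy so that later column assignments do not act on a slice
prepared_df = prepared_df[~prepared_df["id_gsn"].isin(dataframes["monasteries_in_factgrid"]["KlosterdatenbankID"])].copy()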
prepared_df

Unnamed: 0,id_gsn,monastery_name
1,40329,Kapuzinerkloster Wellmich
2,8502,"Kartause Sionsberg, Noordgouwe, Niederlande"
3,60372,Franziskanerreformatenkloster Schrobenhausen
4,972,Kapuzinerkloster Rüthen
5,3790,Zisterzienserinnenkloster Roßleben
6,814,"Kollegiatstift St. Andreas, Verden"
7,8609,"Augustinerchorherrenstift Tirns, Niederlande"
8,3768,Dominikanerinnenkloster Wiederstedt
9,93,"Benediktinerabtei St. Michael, Hildesheim"
11,8478,"Chorfrauen vom Heiligen Grab, Nieuwstadt, Nied..."
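Before continuing, it can be useful to confirm that the filter did what was intended. The following is a minimal sketch on toy data. The names and the status value "Draft" are invented for illustration, while the column names follow the tables used above.

```python
import pandas as pd

# Toy stand-ins for the real tables (names and the "Draft" status are invented)
gs_monastery = pd.DataFrame({
    "id_gsn": [93, 814, 972, 3768],
    "monastery_name": ["A", "B", "C", "D"],
    "status": ["Online", "Online", "Draft", "Online"],
})
monasteries_in_factgrid = pd.DataFrame({"KlosterdatenbankID": [814]})

# Same logic as in the cell above: status filter, then anti-join on the ID
online = gs_monastery[gs_monastery["status"] == "Online"]
already_imported = online["id_gsn"].isin(monasteries_in_factgrid["KlosterdatenbankID"])
prepared = online.loc[~already_imported, ["id_gsn", "monastery_name"]].copy()

# Each community appears once and none of them exists in FactGrid yet
assert prepared["id_gsn"].is_unique
assert not prepared["id_gsn"].isin(monasteries_in_factgrid["KlosterdatenbankID"]).any()
```

The same two assertions can be run against `prepared_df` and `dataframes["monasteries_in_factgrid"]` in this notebook.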


## Labels

The German label in FactGrid (`Lde`) is taken from `monastery_name`.

In [12]:
prepared_df["Lde"] = prepared_df["monastery_name"]
prepared_df

Unnamed: 0,id_gsn,monastery_name,Lde
1,40329,Kapuzinerkloster Wellmich,Kapuzinerkloster Wellmich
2,8502,"Kartause Sionsberg, Noordgouwe, Niederlande","Kartause Sionsberg, Noordgouwe, Niederlande"
3,60372,Franziskanerreformatenkloster Schrobenhausen,Franziskanerreformatenkloster Schrobenhausen
4,972,Kapuzinerkloster Rüthen,Kapuzinerkloster Rüthen
5,3790,Zisterzienserinnenkloster Roßleben,Zisterzienserinnenkloster Roßleben
6,814,"Kollegiatstift St. Andreas, Verden","Kollegiatstift St. Andreas, Verden"
7,8609,"Augustinerchorherrenstift Tirns, Niederlande","Augustinerchorherrenstift Tirns, Niederlande"
8,3768,Dominikanerinnenkloster Wiederstedt,Dominikanerinnenkloster Wiederstedt
9,93,"Benediktinerabtei St. Michael, Hildesheim","Benediktinerabtei St. Michael, Hildesheim"
11,8478,"Chorfrauen vom Heiligen Grab, Nieuwstadt, Nied...","Chorfrauen vom Heiligen Grab, Nieuwstadt, Nied..."


As the religious communities should also have an English label, the monastery name is automatically translated. The file `translated.csv` that was created by Notebook 1a - Translation can be reused for this. This means that if you have already run Notebooks 1 - Building Complexes and 1a - Translation completely, you don't have to do anything except run this notebook to create the monasteries.

In [13]:
# Attach the English label from Notebook 1a to each German label
merged = pd.merge(prepared_df, dataframes["translated"], how="left", left_on="Lde", right_on="monastery_Lde")
prepared_df = merged[["id_gsn", "monastery_name", "Lde", "monastery_Len"]].drop_duplicates()
prepared_df = prepared_df.rename(columns={"monastery_Len": "Len"})
prepared_df["Lde"] = prepared_df["Lde"].apply(lambda x: f'\"{x}\"')
prepared_df["Len"] = prepared_df["Len"].apply(lambda x: f'\"{x}\"')
prepared_df.drop_duplicates(subset="id_gsn", inplace=True)
prepared_df

Unnamed: 0,id_gsn,monastery_name,Lde,Len
0,40329,Kapuzinerkloster Wellmich,"""Kapuzinerkloster Wellmich""","""Capuchin friary of Wellmich"""
1,8502,"Kartause Sionsberg, Noordgouwe, Niederlande","""Kartause Sionsberg, Noordgouwe, Niederlande""","""Carthusian monastery of Sionsberg, Noordgouwe..."
2,60372,Franziskanerreformatenkloster Schrobenhausen,"""Franziskanerreformatenkloster Schrobenhausen""","""Franciscans of Schrobenhausen"""
3,972,Kapuzinerkloster Rüthen,"""Kapuzinerkloster Rüthen""","""Capuchin friary of Rüthen"""
4,3790,Zisterzienserinnenkloster Roßleben,"""Zisterzienserinnenkloster Roßleben""","""Cistercian nunnery of Roßleben"""
5,814,"Kollegiatstift St. Andreas, Verden","""Kollegiatstift St. Andreas, Verden""","""Collegiate Church St. Andreas, Verden"""
6,8609,"Augustinerchorherrenstift Tirns, Niederlande","""Augustinerchorherrenstift Tirns, Niederlande""","""Canons Regular of St Augustine of Tirns, Neth..."
7,3768,Dominikanerinnenkloster Wiederstedt,"""Dominikanerinnenkloster Wiederstedt""","""Dominican Nuns' monastery of Wiederstedt"""
8,93,"Benediktinerabtei St. Michael, Hildesheim","""Benediktinerabtei St. Michael, Hildesheim""","""Benedictine abbey of St. Michael, Hildesheim"""
9,8478,"Chorfrauen vom Heiligen Grab, Nieuwstadt, Nied...","""Chorfrauen vom Heiligen Grab, Nieuwstadt, Nie...","""Canonesses of the Holy Sepulchre, Nieuwstadt,..."
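Because the translations are attached with a left merge, a monastery name that is missing from `translated.csv` ends up with an empty `Len`, and formatting that empty value with an f-string produces the literal label `"nan"`. The following sketch on toy data shows one way to list such rows and to quote only existing values. The labels and the helper name `quote` are invented for illustration.

```python
import pandas as pd

# Toy data: "Kloster B" has no translation (all labels are invented)
labels = pd.DataFrame({"id_gsn": [1, 2], "Lde": ["Kloster A", "Kloster B"]})
translated = pd.DataFrame({"monastery_Lde": ["Kloster A"], "monastery_Len": ["Monastery A"]})

merged = labels.merge(translated, how="left", left_on="Lde", right_on="monastery_Lde")

# Rows that would otherwise receive the label "nan"
missing = merged[merged["monastery_Len"].isna()]


def quote(value):
    """Wrap a value in double quotes, but leave missing values untouched."""
    return f'"{value}"' if pd.notna(value) else value


merged["Len"] = merged["monastery_Len"].apply(quote)
```

If `missing` is not empty, the affected names can be translated in Notebook 1a before the import file is created.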


## Link to Germania Sacra and monastery database

Every religious community is linked to the corresponding ID from the monastery database using the existing Property [P471](https://database.factgrid.de/wiki/Property:P471). Also, to state which research project contributed to the dataset, the project-item "Germania Sacra in FactGrid" ([Q153178](https://database.factgrid.de/wiki/Item:Q153178)) is linked using property [P131](https://database.factgrid.de/wiki/Property:P131).

In [14]:
prepared_df["P471"] = prepared_df["id_gsn"].apply(lambda x: f'\"{x}\"')
prepared_df["P131"] = "Q153178"
prepared_df

Unnamed: 0,id_gsn,monastery_name,Lde,Len,P471,P131
0,40329,Kapuzinerkloster Wellmich,"""Kapuzinerkloster Wellmich""","""Capuchin friary of Wellmich""","""40329""",Q153178
1,8502,"Kartause Sionsberg, Noordgouwe, Niederlande","""Kartause Sionsberg, Noordgouwe, Niederlande""","""Carthusian monastery of Sionsberg, Noordgouwe...","""8502""",Q153178
2,60372,Franziskanerreformatenkloster Schrobenhausen,"""Franziskanerreformatenkloster Schrobenhausen""","""Franciscans of Schrobenhausen""","""60372""",Q153178
3,972,Kapuzinerkloster Rüthen,"""Kapuzinerkloster Rüthen""","""Capuchin friary of Rüthen""","""972""",Q153178
4,3790,Zisterzienserinnenkloster Roßleben,"""Zisterzienserinnenkloster Roßleben""","""Cistercian nunnery of Roßleben""","""3790""",Q153178
5,814,"Kollegiatstift St. Andreas, Verden","""Kollegiatstift St. Andreas, Verden""","""Collegiate Church St. Andreas, Verden""","""814""",Q153178
6,8609,"Augustinerchorherrenstift Tirns, Niederlande","""Augustinerchorherrenstift Tirns, Niederlande""","""Canons Regular of St Augustine of Tirns, Neth...","""8609""",Q153178
7,3768,Dominikanerinnenkloster Wiederstedt,"""Dominikanerinnenkloster Wiederstedt""","""Dominican Nuns' monastery of Wiederstedt""","""3768""",Q153178
8,93,"Benediktinerabtei St. Michael, Hildesheim","""Benediktinerabtei St. Michael, Hildesheim""","""Benedictine abbey of St. Michael, Hildesheim""","""93""",Q153178
9,8478,"Chorfrauen vom Heiligen Grab, Nieuwstadt, Nied...","""Chorfrauen vom Heiligen Grab, Nieuwstadt, Nie...","""Canonesses of the Holy Sepulchre, Nieuwstadt,...","""8478""",Q153178


## External Identifiers

Unlike the building complexes, the religious communities come with a range of external URLs in the monastery database that connect them to existing databases and online resources, such as Wikipedia or Wikidata. Some of these identifiers should also be included in FactGrid. The table `gs_external_urls_monastery` maps religious communities to types of external identifiers and to the specific identifiers that can be used to find the resource in the corresponding system. The table `gs_external_url_type_with_factgrid` contains the base URLs needed to resolve the identifiers, as well as a mapping between URL types and existing FactGrid identifiers.

There are three cases to consider. First, a URL type can be linked in FactGrid using a property that was introduced by the community, such as the GND-ID ([P76](https://database.factgrid.de/wiki/Property:P76)). Second, links to other projects within the Wiki infrastructure are handled using sitelinks. In QuickStatements, they are referenced by the letter "S" followed by the short name of the wiki project. For example, to state the Q-number of an item in Wikidata, the command is `Swikidatawiki`, with `S` standing for "sitelink" and `wikidatawiki` being the short name for Wikidata. Third, there are cases in which an identifier has no corresponding property in FactGrid. In these cases, the information is omitted. However, it can always be looked up through the monastery database's original interface, which is linked in the references of each statement and via the corresponding identifier property in FactGrid.
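To make the naming scheme concrete, the following sketch sorts column names into labels, properties and sitelinks by their prefix. The function `classify_column` and its regular expressions are assumptions made for illustration, based only on the column names that occur in this notebook.

```python
import re


def classify_column(name):
    """Classify a column name by its prefix (illustrative assumption)."""
    if re.fullmatch(r"P\d+", name):
        return "property"   # e.g. P76 for the GND-ID
    if re.fullmatch(r"S[a-z_]+", name):
        return "sitelink"   # e.g. Swikidatawiki, Sdewiki
    if re.fullmatch(r"L[a-z]+", name):
        return "label"      # e.g. Lde, Len
    return "other"          # anything that is not a command column


kinds = {name: classify_column(name) for name in ["Lde", "P76", "Swikidatawiki", "id_gsn"]}
```

URL types of the third case never show up as columns, because they have no entry in `factgrid_property`.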

The following cell processes the information on external URLs and adds it to the table. Please note that for the sitelinks to be imported correctly, the final table has to be processed with the function `df_to_qs_v1` from `helper_functions.py`, as done in the cells below.

In [15]:
# Keep only URL types that have a FactGrid property or sitelink assigned
gs_external_url_type_with_factgrid = dataframes["gs_external_url_type_with_factgrid"].dropna(subset="factgrid_property")
# Join each external URL with its FactGrid property and drop unmapped URL types
url_factgrid = pd.merge(dataframes["gs_external_urls_monastery"], gs_external_url_type_with_factgrid, how="left", left_on="url_type_id", right_on="id_url_type")[["gsn_id", "url_value", "factgrid_property"]].dropna(subset="factgrid_property")
# Write each quoted identifier into the column named after its property or sitelink
for index, row in url_factgrid.iterrows():
    if row["gsn_id"] in prepared_df["id_gsn"].values:
        prepared_df.loc[prepared_df["id_gsn"] == row["gsn_id"], row["factgrid_property"]] = f'\"{row["url_value"]}\"'
prepared_df

Unnamed: 0,id_gsn,monastery_name,Lde,Len,P471,P131,P76,Swikidatawiki,Sdewiki,P378,Snlwiki
0,40329,Kapuzinerkloster Wellmich,"""Kapuzinerkloster Wellmich""","""Capuchin friary of Wellmich""","""40329""",Q153178,,"""Q107762382""",,,
1,8502,"Kartause Sionsberg, Noordgouwe, Niederlande","""Kartause Sionsberg, Noordgouwe, Niederlande""","""Carthusian monastery of Sionsberg, Noordgouwe...","""8502""",Q153178,,"""Q90352106""",,,
2,60372,Franziskanerreformatenkloster Schrobenhausen,"""Franziskanerreformatenkloster Schrobenhausen""","""Franciscans of Schrobenhausen""","""60372""",Q153178,"""4797225-7""","""Q1776002""","""Kloster_Schrobenhausen""",,
3,972,Kapuzinerkloster Rüthen,"""Kapuzinerkloster Rüthen""","""Capuchin friary of Rüthen""","""972""",Q153178,"""4353568-9""","""Q1728822""","""Kapuzinerkloster_Rüthen""",,
4,3790,Zisterzienserinnenkloster Roßleben,"""Zisterzienserinnenkloster Roßleben""","""Cistercian nunnery of Roßleben""","""3790""",Q153178,"""4612129-8""","""Q28977873""",,,
5,814,"Kollegiatstift St. Andreas, Verden","""Kollegiatstift St. Andreas, Verden""","""Collegiate Church St. Andreas, Verden""","""814""",Q153178,,"""Q28979546""",,,
6,8609,"Augustinerchorherrenstift Tirns, Niederlande","""Augustinerchorherrenstift Tirns, Niederlande""","""Canons Regular of St Augustine of Tirns, Neth...","""8609""",Q153178,,"""Q2081188""",,,"""Thabor_(klooster)"""
7,3768,Dominikanerinnenkloster Wiederstedt,"""Dominikanerinnenkloster Wiederstedt""","""Dominican Nuns' monastery of Wiederstedt""","""3768""",Q153178,"""4622528-6""","""Q107760965""",,,
8,93,"Benediktinerabtei St. Michael, Hildesheim","""Benediktinerabtei St. Michael, Hildesheim""","""Benedictine abbey of St. Michael, Hildesheim""","""93""",Q153178,"""4374853-3""","""Q97727118""",,,
9,8478,"Chorfrauen vom Heiligen Grab, Nieuwstadt, Nied...","""Chorfrauen vom Heiligen Grab, Nieuwstadt, Nie...","""Canonesses of the Holy Sepulchre, Nieuwstadt,...","""8478""",Q153178,,,,,
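For larger tables, the row-by-row loop can become slow. As a possible alternative, the long table of identifiers can be reshaped into one column per property and then merged. This is a sketch on a small excerpt of the values shown above, not the notebook's own method. If a community has several values for the same property, the last one is kept, which matches the overwriting behaviour of the loop.

```python
import pandas as pd

# Small excerpt: three communities to import, identifiers for three communities
prepared = pd.DataFrame({"id_gsn": [972, 8609, 8478]})
url_factgrid = pd.DataFrame({
    "gsn_id": [972, 972, 8609, 93],
    "url_value": ["4353568-9", "Q1728822", "Q2081188", "Q97727118"],
    "factgrid_property": ["P76", "Swikidatawiki", "Swikidatawiki", "Swikidatawiki"],
})

# Quote the values, then turn each property into its own column
quoted = url_factgrid.assign(url_value='"' + url_factgrid["url_value"].astype(str) + '"')
wide = quoted.groupby(["gsn_id", "factgrid_property"])["url_value"].last().unstack()

# The left merge keeps all prepared communities and ignores IDs that are not in the list
result = prepared.merge(wide, how="left", left_on="id_gsn", right_index=True)
```

Communities without any mapped identifier, such as 8478 in this excerpt, keep empty cells in all property columns.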


## Finalizing
Before the monasteries can be linked to their orders and building complexes, they need to have a Q-number. Load the table that is created below into QuickStatements and save the created monasteries as specified.



In [16]:
from helper_functions import df_to_qs_v1
monastery_upload = prepared_df.drop(columns=["id_gsn", "monastery_name"])
monastery_upload.insert(0, "qid", np.nan)
monastery_upload.to_excel("data/results/monasteries/import_monasteries.xlsx")
monastery_upload.to_csv("data/results/monasteries/import_monasteries.csv")
with open("data/results/monasteries/import_monasteries.tsv", "w", encoding="utf-8") as file:
    file.write(df_to_qs_v1(monastery_upload))
monastery_upload

Lde "Kapuzinerkloster Wellmich"
Len "Capuchin friary of Wellmich"
P471 "40329"
P131 Q153178
Swikidatawiki "Q107762382"
Lde "Kartause Sionsberg, Noordgouwe, Niederlande"
Len "Carthusian monastery of Sionsberg, Noordgouwe, Netherlands"
P471 "8502"
P131 Q153178
Swikidatawiki "Q90352106"
Lde "Franziskanerreformatenkloster Schrobenhausen"
Len "Franciscans of Schrobenhausen"
P471 "60372"
P131 Q153178
P76 "4797225-7"
Swikidatawiki "Q1776002"
Sdewiki "Kloster_Schrobenhausen"
Lde "Kapuzinerkloster Rüthen"
Len "Capuchin friary of Rüthen"
P471 "972"
P131 Q153178
P76 "4353568-9"
Swikidatawiki "Q1728822"
Sdewiki "Kapuzinerkloster_Rüthen"
Lde "Zisterzienserinnenkloster Roßleben"
Len "Cistercian nunnery of Roßleben"
P471 "3790"
P131 Q153178
P76 "4612129-8"
Swikidatawiki "Q28977873"
Lde "Kollegiatstift St. Andreas, Verden"
Len "Collegiate Church St. Andreas, Verden"
P471 "814"
P131 Q153178
Swikidatawiki "Q28979546"
Lde "Augustinerchorherrenstift Tirns, Niederlande"
Len "Canons Regular of St Augustine 

Unnamed: 0,qid,Lde,Len,P471,P131,P76,Swikidatawiki,Sdewiki,P378,Snlwiki
0,,"""Kapuzinerkloster Wellmich""","""Capuchin friary of Wellmich""","""40329""",Q153178,,"""Q107762382""",,,
1,,"""Kartause Sionsberg, Noordgouwe, Niederlande""","""Carthusian monastery of Sionsberg, Noordgouwe...","""8502""",Q153178,,"""Q90352106""",,,
2,,"""Franziskanerreformatenkloster Schrobenhausen""","""Franciscans of Schrobenhausen""","""60372""",Q153178,"""4797225-7""","""Q1776002""","""Kloster_Schrobenhausen""",,
3,,"""Kapuzinerkloster Rüthen""","""Capuchin friary of Rüthen""","""972""",Q153178,"""4353568-9""","""Q1728822""","""Kapuzinerkloster_Rüthen""",,
4,,"""Zisterzienserinnenkloster Roßleben""","""Cistercian nunnery of Roßleben""","""3790""",Q153178,"""4612129-8""","""Q28977873""",,,
5,,"""Kollegiatstift St. Andreas, Verden""","""Collegiate Church St. Andreas, Verden""","""814""",Q153178,,"""Q28979546""",,,
6,,"""Augustinerchorherrenstift Tirns, Niederlande""","""Canons Regular of St Augustine of Tirns, Neth...","""8609""",Q153178,,"""Q2081188""",,,"""Thabor_(klooster)"""
7,,"""Dominikanerinnenkloster Wiederstedt""","""Dominican Nuns' monastery of Wiederstedt""","""3768""",Q153178,"""4622528-6""","""Q107760965""",,,
8,,"""Benediktinerabtei St. Michael, Hildesheim""","""Benedictine abbey of St. Michael, Hildesheim""","""93""",Q153178,"""4374853-3""","""Q97727118""",,,
9,,"""Chorfrauen vom Heiligen Grab, Nieuwstadt, Nie...","""Canonesses of the Holy Sepulchre, Nieuwstadt,...","""8478""",Q153178,,,,,
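The helper `df_to_qs_v1` is not shown in this notebook. To illustrate how a table like the one above relates to QuickStatements V1 commands, the following sketch emits one `CREATE` line per row, followed by one tab-separated `LAST` line per filled cell. The function `rows_to_qs_v1` is a hypothetical stand-in, and the real helper may format its output differently.

```python
import numpy as np
import pandas as pd


def rows_to_qs_v1(df):
    """Hypothetical sketch: turn each row into QuickStatements V1 commands."""
    lines = []
    for _, row in df.iterrows():
        lines.append("CREATE")  # start a new item
        for column, value in row.items():
            # Skip the empty qid placeholder and all empty cells
            if column == "qid" or pd.isna(value):
                continue
            lines.append(f"LAST\t{column}\t{value}")
    return "\n".join(lines)


demo = pd.DataFrame({
    "qid": [np.nan],
    "Lde": ['"Kapuzinerkloster Wellmich"'],
    "P131": ["Q153178"],
    "P76": [np.nan],
})
qs_text = rows_to_qs_v1(demo)
```

Skipping empty cells is what keeps a community without a GND-ID from receiving an empty `P76` statement.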
