<a href="https://colab.research.google.com/github/Cath-Strategic-Tech/adpdx_etl/blob/main/ADPDX_ClergyDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Introduction

The following notebook orchestrates the migration of ADPDX Accounts into Salesforce.


# Order of Loading

1. Vicariates
1. Organizations [MANUAL]
1. Religious Parents
1. Religious Communities
1. Religious Superiors
1. Contacts
1. Contact > Register Entries
1. Contact > Education Affiliations [MANUAL]
1. Contact > Ecclesial Affiliations [MANUAL]
1. Affiliations [MANUAL]


# Order of Operations

- Setup Enviro

  - [DONE] UDFs
  - [DONE] Load SF xref data

- ACCOUNTS

  - Extract Source Data
    - [DONE] Load 6 tables into separate dataframes
    - [DONE] Merge into single accounts table
    - [DONE]: Fix the ExternalID so that it references the original table, not the AccRecordType
  - Transform
    - Strip phone numbers
    - Validate email addresses
    - TODO: handle churches that aren't parishes (missions, non-diocesan parishes, etc.)
  - Load
    - [DONE]Vicariates
    - [DONE] Organizations (Parishes, Schools, Newman Centres, Offices)
    - Religious
      - [DONE] Religious Parent accounts
      - [DONE] Religious Communities
      - [DONE] Religious Superiors (Contacts, set AccountID to Rel. Parent)
        - [DONE]: Handle invalid email addresses
        - TODO: Handle duplicate entries
      - TODO: Update Religious Communities with lookup to Rel. Superior
  - TODO: Unit Tests
    - Num of Accounts, by type
    - Spot checking 3-5 account records & field values

- CONTACTS

  - Extract

    - [DONE] Import Contact records
    - TODO: Get Photo directory @soames

  - Analysis

    - [DONE] Check columns & row count (3016)
    - [DONE] Identify unique languages

  - Transform

    - Complete ETL of fields that are more complex (search for TODO)
    - [DONE] Create new df_contact_staging, renaming columns to SF APIs
    - [DONE] Drop columns that don't map to Contact
    - Migrate Languages field (waiting on next package version) @soames
      - TODO: transform `,` to `;` so imports to multi-select list correctly
    - TODO: Concat Mailing Street Address lines into one
    - TODO: Handle Private Addresses: decide if will code changes or NOT use a custom Private Address field.
    - [DONE] Update boolean fields to True/False
    - [DONE] Set Contact Record Type (UDF)
    - [DONE] Validate, drop invalid emails
    - [DONE] Generate ExternalID > 'Archdpdx_External_Id\_\_c'
    - TODO: Preferred Email/Phone > where blank, set a default. Currently, all are getting set to 'Personal' and 'Mobile.'
    - TODO: Ecclesial Status (not mapping correctly)
    - [DONE] DROP columns that haven't been mapped yet

  - Load
    - [DONE] Set JobID to curr_job_id
    - [DONE] Handle character encoding that is geting messed up

- CONTACTS > SPOUSES

- CONTACTS > PHOTOS

- CONTACTS > REGISTER ENTRIES

  - Parse columns into types of Sacraments or Notations
  - For lookups to Celebrants, query SF for contacts, create missing records
  - Generate External ID, apply to df
  - Clean up (remove extra columns, NaNs)
  - Upsert records

- CONTACTS > AFFILIATIONS

  - Map the various Contact fields that are actually Affiliations (start with manual migration)
    - Education/Degrees
    - Minor Orders
    - Religious Vows
    - Candidacy records (should this be another object?)
    - In/Excardination
    - Faculties

- AFFILIATIONS TABLE

  - Extract

    - [DONE] Turn the 'Org Table Name' & 'org Table Link' columns into External ID
    - Map in the Account IDs from SF

  - Transform

    - Parse RecordTypeId
    - Parse Category
    - Map columns to SF field APIs

  - Load


# Setup Enviro


In [1]:
# !conda install -y simple-salesforce
# !conda install -y email_validator
# !conda install -y python-dotenv
# !conda install import-ipynb


In [2]:
# enviro setup

import pandas as pd
import numpy as np

from datetime import datetime
now = datetime.now()

from simple_salesforce import Salesforce

In [3]:
# import environment variables (SF login credentials)
from dotenv import load_dotenv
import os

load_dotenv()

True

In [4]:
# @title Global Variables { run: "auto", vertical-output: true, display-mode: "both" }

target_enviro = "adpdx_devpro" # @param {type:"string"}

# @markdown The `run_upserts` variable controls whether or not upserts to Salesforce are executed when the notebook is run.
run_upserts = "True" # @param ["True", "False"]

In [5]:
# ADPDX dev_pro credentials
adpdx_user = os.getenv('ADPDX_UAT_USER')
print(adpdx_user)
adpdx_pass = os.getenv('ADPDX_UAT_PASS')
print( adpdx_pass)
adpdx_token = os.getenv('ADPDX_UAT_TOKEN')
print(adpdx_token)

# instantiate a SF session object
sf = Salesforce(domain='test', username=adpdx_user, password=adpdx_pass, security_token=adpdx_token)

matt+adpdx@meribahflow.com.uat
CSN?QM3e
97zjin7eBEEHmN5uBLIYWX0J


## UDFs


In [6]:
# General notebook UDFs

import json
import csv
from datetime import datetime
from simple_salesforce import Salesforce

# Job ID Incrementer
def update_job_id(file_name):
    # Open the file in read mode and get the current job ID
    with open(file_name, 'r') as file:
        current_job_id = int(file.readline())

    # Increment the job ID
    new_job_id = current_job_id + 1

    # Open the file in write mode and update the job ID
    with open(file_name, 'w') as file:
        file.write(str(new_job_id))

    # Return the new job ID
    return new_job_id


def concat_columns(df, columns, new_column, separator='_'):
    """
    Concatenates the values from specified columns into a single string
    with the specified separator and populates a new column in the DataFrame.

    Args:
    - df: pandas DataFrame
    - columns: list of column names to concatenate
    - new_column: name of the new column to be created
    - separator: separator to use between concatenated values (default is '_')

    Returns:
    - Updated pandas DataFrame with the new column
    """
    df[new_column] = df[columns].astype(str).apply(lambda x: separator.join(x), axis=1)
    return df


def convert_non_serializables(data):
    """Convert non-serializable objects to serializable formats."""
    for key, value in data.items():
        try:
            if isinstance(value, (datetime, date)):
                data[key] = value.isoformat()
            elif isinstance(value, float) and np.isnan(value):
                data[key] = None
            elif pd.isna(value):
                data[key] = None
            elif isinstance(value, (int, bool, str)):
                data[key] = value
            else:
                data[key] = str(value)  # Convert other types to string
        except Exception as e:
            print(f"Error processing key: {key}, value: {value}, error: {e}")
    return data


In [7]:
# Query, merge data with SF data  

import pandas as pd
from simple_salesforce import Salesforce
from simple_salesforce.exceptions import SalesforceMalformedRequest, SalesforceError

def find_salesforce_record_id(sf, df, column_to_search, sf_object_name, sf_field_name, new_column_name, match_behavior='first'):
    """
    Find Salesforce record IDs for a DataFrame column and add a new column with the Salesforce record IDs.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    df (pd.DataFrame): The pandas DataFrame containing data.
    column_to_search (str): The column name in the DataFrame to search against Salesforce.
    sf_object_name (str): The Salesforce object name (e.g., 'Contact').
    sf_field_name (str): The field name in Salesforce to match.
    new_column_name (str): The name for the new DataFrame column to hold Salesforce record IDs.
    match_behavior (str): Behavior when multiple matches found ('first' or 'alert').

    Returns:
    pd.DataFrame: The original DataFrame with the new column containing Salesforce record IDs.

    Example usage:
    df_contact_staging = find_salesforce_record_id(sf, df_contact_staging, 'Link_to_Religious_Community', 'Contact', 'Archdpdx_Migration_Id__c', 'New_Column_Name', match_behavior='alert')

    """
    if column_to_search not in df.columns:
        raise ValueError(f"Column '{column_to_search}' not found in DataFrame.")

    df[new_column_name] = None
    multiple_matches_found = False

    unique_values = df[column_to_search].dropna().unique()
    chunk_size = 1000  # Adjust chunk size as needed

    for start in range(0, len(unique_values), chunk_size):
        chunk_values = unique_values[start:start + chunk_size]
        chunk_values_str = ", ".join([f"'{val}'" for val in chunk_values])

        soql_query = f"SELECT Id, {sf_field_name} FROM {sf_object_name} WHERE {sf_field_name} IN ({chunk_values_str})"
        
        try:
            query_result = sf.query_all(soql_query)
        except SalesforceMalformedRequest as e:
            raise ValueError(f"Malformed request error: {e.content}")
        except SalesforceError as e:
            raise ValueError(f"Salesforce error: {e.content}")

        id_mapping = {}
        for record in query_result['records']:
            key = record[sf_field_name]
            if key in id_mapping:
                multiple_matches_found = True
                if match_behavior == 'first':
                    continue  # Skip subsequent matches if 'first' behavior is selected
            id_mapping[key] = record['Id']

        df[new_column_name] = df[column_to_search].map(id_mapping)

    if multiple_matches_found and match_behavior == 'alert':
        print("Alert: Multiple matches found for some records.")

    return df


def get_recordtype_id(df_recordTypes, developer_name, sobject_type, namespace):
    """
    Retrieves the Record Type ID for a specific Developer Name, SObject Type, and Namespace.

    Parameters:
    df_recordTypes (pd.DataFrame): The DataFrame containing Salesforce Record Types.
    developer_name (str): The DeveloperName to filter by.
    sobject_type (str): The SObjectType to filter by.
    namespace (str): The Namespace to filter by.

    Returns:
    str: The Record Type ID if a match is found, otherwise raises an error.

    Example: 
    religious_recordtype_id = get_recordtype_id(df_sf_recordTypes, 'Religious', 'Account', 'mbfc')
    """
    try:
        recordtype_id = df_recordTypes.loc[
            (df_recordTypes['DeveloperName'] == developer_name) & 
            (df_recordTypes['SobjectType'] == sobject_type) &
            (df_recordTypes['NamespacePrefix'] == namespace),
            'Id'
        ].iloc[0]  # Retrieve the first match
        
        return recordtype_id
    except IndexError:
        raise ValueError(f"No record type found for DeveloperName '{developer_name}', SObjectType '{sobject_type}', and Namespace '{namespace}'")


# Add a Salesforce record ID column to a DataFrame based on matching external ID field values
def add_salesforce_record_ids(sf, dataframe, df_column_name, sf_object_name, sf_external_id_field, new_column_name, chunk_size=1000):
    """
    Add a Salesforce record ID column to a DataFrame based on matching external ID field values.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to match.
    df_column_name (str): The column name in the DataFrame to match with Salesforce.
    sf_object_name (str): The Salesforce object name (e.g., 'Contact').
    sf_external_id_field (str): The external ID field in Salesforce to match.
    new_column_name (str): The name for the new DataFrame column to hold Salesforce record IDs.
    chunk_size (int): The number of records to include in each chunk for querying Salesforce.

    Returns:
    pd.DataFrame: The original DataFrame with the new column containing Salesforce record IDs.
    """
    # Ensure the dataframe column name exists in the dataframe
    if df_column_name not in dataframe.columns:
        raise ValueError(f"Column '{df_column_name}' not found in DataFrame.")
    
    # Create a set of unique values from the specified DataFrame column
    unique_values = dataframe[df_column_name].dropna().unique()
    
    id_mapping = {}
    
    # Process the unique values in chunks
    for start in range(0, len(unique_values), chunk_size):
        chunk_values = unique_values[start:start + chunk_size]
        chunk_values_str = ", ".join([f"'{val}'" for val in chunk_values])
        
        soql_query = f"SELECT Id, {sf_external_id_field} FROM {sf_object_name} WHERE {sf_external_id_field} IN ({chunk_values_str})"
        
        try:
            query_result = sf.query_all(soql_query)
        except SalesforceMalformedRequest as e:
            raise ValueError(f"Malformed request error: {e.content}")
        except SalesforceError as e:
            raise ValueError(f"Salesforce error: {e.content}")
        
        # Update the id_mapping with results from the current chunk
        id_mapping.update({record[sf_external_id_field]: record['Id'] for record in query_result['records']})
    
    # Map the Salesforce record IDs to the DataFrame
    dataframe[new_column_name] = dataframe[df_column_name].map(id_mapping)
    
    return dataframe

In [8]:
# Upsert to SF

import pandas as pd
import numpy as np
from simple_salesforce import Salesforce, SalesforceMalformedRequest, SalesforceError
from datetime import datetime, date

# Gets or creates a Diocesan account based on the Account Name
def get_or_create_diocesan_account(sf, account_name):
    """
    Searches for an account by name, returns the ID if found,
    otherwise creates the account with RecordType 'Church' and 'mbfc__Church_Type__c' set to 'Diocese',
    and then returns the new ID.

    Parameters:
    sf (Salesforce): Salesforce connection object
    account_name (str): The name of the account to search for or create

    Returns:
    str: The ID of the found or created account
    """

    # Query for the Record Type ID using the Developer Name 'Church'
    record_type_query = "SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = 'Church' LIMIT 1"
    record_type_result = sf.query(record_type_query)
    if record_type_result['records']:
        record_type_id = record_type_result['records'][0]['Id']
    else:
        raise ValueError("No RecordType found with DeveloperName 'Church'")

    # Search for the Account by name
    account_query = f"SELECT Id FROM Account WHERE Name = '{account_name}' LIMIT 1"
    account_result = sf.query(account_query)
    
    if account_result['records']:
        # Account found, return the ID
        return account_result['records'][0]['Id']
    else:
        # Account not found, create a new Account
        account_data = {
            'Name': account_name,
            'RecordTypeId': record_type_id,
            'mbfc__Church_Type__c': 'Diocese'
        }
        new_account = sf.Account.create(account_data)
        return new_account['id']
    
    from simple_salesforce import Salesforce

# improved version of the get_or_create_diocesan_account function
def get_or_create_account(sf, account_name, record_type_dev_name, church_type):
    """
    Searches for an account by name, returns the ID if found,
    otherwise creates the account with the specified Record Type and Church Type,
    and then returns the new ID.

    Parameters:
    sf (Salesforce): Salesforce connection object
    account_name (str): The name of the account to search for or create
    record_type_dev_name (str): The developer name of the Record Type to use for creating the account
    church_type (str): The Church Type to set for the new account

    Returns:
    str: The ID of the found or created account

    Example usage: 
    sf = Salesforce(username='your_username', password='your_password', security_token='your_security_token')
    account_id = get_or_create_account(sf, 'Diocese of Calgary', 'Church', 'Diocese')
    print(f"Account ID: {account_id}")
    """

    # Query for the Record Type ID using the provided developer name
    record_type_query = f"SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = '{record_type_dev_name}' LIMIT 1"
    record_type_result = sf.query(record_type_query)
    if record_type_result['records']:
        record_type_id = record_type_result['records'][0]['Id']
    else:
        raise ValueError(f"No RecordType found with DeveloperName '{record_type_dev_name}'")

    # Search for the Account by name
    account_query = f"SELECT Id FROM Account WHERE Name = '{account_name}' LIMIT 1"
    account_result = sf.query(account_query)
    
    if account_result['records']:
        # Account found, return the ID
        return account_result['records'][0]['Id']
    else:
        # Account not found, create a new Account
        account_data = {
            'Name': account_name,
            'RecordTypeId': record_type_id,
            'mbfc__Church_Type__c': church_type
        }
        new_account = sf.Account.create(account_data)
        return new_account['id']


# def upsert_to_salesforce(sf, dataframe, object_name, external_id_field):
#     """
#     Upsert records to Salesforce from a pandas DataFrame.

#     Parameters:
#     sf (Salesforce): The Salesforce connection instance.
#     dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
#     object_name (str): The Salesforce object name (e.g., 'Contact').
#     external_id_field (str): The external ID field used for upserts.
#     """
#     successful_upserts = 0
#     failed_upserts = 0

#     # Replace placeholder values with None in the DataFrame
#     dataframe.replace({None: pd.NA, ' ': None, '': None}, inplace=True)

#     # Convert DataFrame to a list of dictionaries
#     data_to_upsert = dataframe.to_dict(orient='records')

#     for data in data_to_upsert:
#         try:
#             data = convert_non_serializables(data)
#             external_id = data.pop(external_id_field)

#             # Perform upsert using only the External ID
#             response = getattr(sf, object_name).upsert(f'{external_id_field}/{external_id}', data)
#             successful_upserts += 1
#             print(f"Successfully upserted {object_name} with External ID: {external_id}")
#         except SalesforceMalformedRequest as e:
#             failed_upserts += 1
#             print(f"Malformed request error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
#         except SalesforceError as e:
#             failed_upserts += 1
#             print(f"Salesforce error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
#         except Exception as e:
#             failed_upserts += 1
#             print(f"Failed to upsert {object_name} with External ID: {external_id}. Error: {e}")

#     print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")


def upsert_to_salesforce_bulk(sf, dataframe, object_name, external_id_field, results_log_file, batch_size=100):
    """
    Upsert records to Salesforce from a pandas DataFrame using the Bulk API.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
    object_name (str): The Salesforce object name (e.g., 'Contact').
    external_id_field (str): The external ID field used for upserts.
    results_log_file (str): The file name where the full upsert results will be logged.
    batch_size (int): The number of records to include in each batch.
    """
    successful_upserts = 0
    failed_upserts = 0
    batch_number = 0

    # Replace placeholder values with None in the DataFrame
    dataframe.replace({pd.NA: None, ' ': None, '': None}, inplace=True)

    # Convert DataFrame to a list of dictionaries
    data_to_upsert = dataframe.to_dict(orient='records')

    # Open the results log file in 'write' mode to truncate/overwrite existing data
    with open(results_log_file, 'w') as results_log:
        writer = csv.writer(results_log)
        writer.writerow(['Batch Number', 'Record', 'Success', 'Error'])  # Write the headers

        # Process data in batches
        for i in range(0, len(data_to_upsert), batch_size):
            batch_number += 1
            batch_data = data_to_upsert[i:i + batch_size]
            batch_data = [convert_non_serializables(record) for record in batch_data]

            try:
                # Perform bulk upsert
                response = sf.bulk.__getattr__(object_name).upsert(batch_data, external_id_field=external_id_field)

                for index, res in enumerate(response):
                    if res['success']:
                        successful_upserts += 1
                        writer.writerow([batch_number, json.dumps(batch_data[index]), 'True', ''])
                    else:
                        failed_upserts += 1
                        writer.writerow([batch_number, json.dumps(batch_data[index]), 'False', json.dumps(res['errors'])])

            except SalesforceMalformedRequest as e:
                failed_upserts += len(batch_data)
                writer.writerow([batch_number, '', 'False', f"Malformed request: {e.content}"])
                for record in batch_data:
                    writer.writerow([batch_number, json.dumps(record), 'False', f"Failed record due to malformed request"])

            except SalesforceError as e:
                failed_upserts += len(batch_data)
                writer.writerow([batch_number, '', 'False', f"Salesforce error: {e.content}"])
                for record in batch_data:
                    writer.writerow([batch_number, json.dumps(record), 'False', f"Failed record due to Salesforce error"])

            except Exception as e:
                failed_upserts += len(batch_data)
                writer.writerow([batch_number, '', 'False', f"Unexpected error: {str(e)}"])
                for record in batch_data:
                    writer.writerow([batch_number, json.dumps(record), 'False', f"Failed record due to unexpected error"])

            # Progress monitoring
            print(f"Batch {batch_number} processed: {successful_upserts} successful, {failed_upserts} failed.")

    # Final summary message
    total_records = len(data_to_upsert)
    total_batches = batch_number
    print(f"Upsert completed. Total records processed: {total_records}, Batches: {total_batches}, "
          f"Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")

## Extract Salesforce xref data

The following cells downloads all records from the target Salesforce enviro for the following objects:

- RecordTypes
- Users
- Accounts
- Contacts


In [9]:
# Get or create the Diocesan Account and get its ID

# calls old function
# diocesan_account_id = get_or_create_diocesan_account(sf, 'Archdiocese of Portland in Oregon')

# calls new function
diocesan_account_id = get_or_create_account(sf, 'Archdiocese of Portland in Oregon', 'Church', 'Diocese')

print(f"Account ID: {diocesan_account_id}")

Account ID: 001Dx00001HwDsgIAF


In [10]:
# get all ACTIVE SF users

sf_users = sf.query('Select Alias, FirstName, LastName, Username, id from User WHERE IsActive = True')
df_sf_users = pd.DataFrame(sf_users['records'])
df_sf_users = df_sf_users.drop(columns = 'attributes')
df_sf_users.shape

(21, 5)

In [11]:
# get all SF Record Types
get_all_recordTypes = 'Select Id, Name, DeveloperName, sObjecttype, namespaceprefix from RecordType'

# get list of records, add to dataframe
sf_recordTypes = sf.query(get_all_recordTypes)
df_sf_recordTypes = pd.DataFrame(sf_recordTypes['records'])
df_sf_recordTypes = df_sf_recordTypes.drop(columns = 'attributes')

# Create a dictionary mapping 'DeveloperName' to 'Id' for faster lookup
record_types_mapping = df_sf_recordTypes.set_index('DeveloperName')['Id'].to_dict()

df_sf_recordTypes

Unnamed: 0,Id,Name,DeveloperName,SobjectType,NamespacePrefix
0,012Dx0000003p4xIAA,Church,Church,Account,mbfc
1,012Dx0000003p4yIAA,Deanery,Deanery,Account,mbfc
2,012Dx0000003p4zIAA,Group,Group,Account,mbfc
3,012Dx0000003p50IAA,Organization,Organization,Account,mbfc
4,012Dx0000003p51IAA,Property,Property,Account,mbfc
5,012Dx0000003p52IAA,Religious,Religious,Account,mbfc
6,012Dx0000003p53IAA,z) All Types,All_Types,mbfc__Affiliation__c,mbfc
7,012Dx0000003p54IAA,Any,Any,mbfc__Affiliation__c,mbfc
8,012Dx0000003p55IAA,Pastoral Assignments,Assignments_Clergy,mbfc__Affiliation__c,mbfc
9,012Dx0000003p56IAA,Chancery Users,Chancery_Users,mbfc__Affiliation__c,mbfc


In [12]:
# get SF Account
get_all_accounts = 'Select id, Name, RecordTypeId, Type, mbfc__Parish_Code__c, Job_Id__c, Archdpdx_Migration_Id__c from Account'

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')
df_sf_accounts.shape

(2000, 7)

In [13]:
# get SF Contacts
get_all_contacts = 'Select id, Name, npe01__Type_of_Account__c, RecordTypeId, Archdpdx_Migration_Id__c, CreatedById from Contact'

# get list of records, add to dataframe
sf_contacts = sf.query(get_all_contacts)
df_sf_contacts = pd.DataFrame(sf_contacts['records'])
df_sf_contacts = df_sf_contacts.drop(columns = 'attributes')
df_sf_contacts.shape

(2000, 6)

# ACCOUNTS


## Extract


### Load ArchdPDX csvs as DataFrames

ADPDX data for organizations is held in 6 tables, all of which will be migrated into Salesforce's Accounts object.


In [14]:
df_offices = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Offices.csv', skiprows= lambda x: x in [1])
df_offices["src_table"] = 'Offices'
df_offices["AccountRecordType"] = 'Organization'
df_offices.rename({
    "Common Name": "Name",
    "Name": "Formal_Name__c"
    }, axis="columns", inplace=True)


In [15]:
df_parishes = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Parishes (3).csv', dtype={'Vicariate': 'object', 'Established': 'str', 'Mission Of': 'str'}, skiprows= lambda x: x in [1])
df_parishes["src_table"] = 'Parishes'
df_parishes["AccountRecordType"] = 'Church'
# df_parishes.rename({"Parish Formal Name": "Account Name"}, axis="columns", inplace=True)
df_parishes.rename({
                    "Parish Formal Name": "Formal_Name__c",
                    "Common Name": "Name",
                    'Mission Of': 'Parent_Parish'
                }, axis="columns", inplace=True)


In [16]:
df_religious = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/RelCommunities.csv', skiprows= lambda x: x in [1])
df_religious["src_table"] = 'RelCommunities'
df_religious["AccountRecordType"] = 'Religious'
df_religious.rename({
                    "Community Name": "Formal_Name__c",
                    "Common Name": "Name"
                     }, axis="columns", inplace=True)


In [17]:
df_schools = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Schools.csv', skiprows= lambda x: x in [1])
df_schools["src_table"] = 'Schools'
df_schools["AccountRecordType"] = 'Organization'
df_schools.rename({
                    "School Name": "Formal_Name__c",
                    "Common Name": "Name",
                    'Parish Link': 'Parent_Parish'
                    
                    }, axis="columns", inplace=True)

In [18]:
df_schools

Unnamed: 0,Record Number,Name,Formal_Name__c,School City,Archdiocese Assigns Clergy,Parent_Parish,Vicariate Link,Archdiocesan School Code,Grades Provided,Established,...,Mailing Address Zip,Mailing Address Country,Phone,Fax,Email,Web Site,Lat/Long Coordinates Decimal,Google Small Embed URL,src_table,AccountRecordType
0,1,"St. Rose School, Portland",St. Rose School,Portland,Yes,109,0,12-PDXARCS,PK-8,1913.0,...,97213,,503-281-1912,503-281-0554,mrandazzo@strosepdx.org,https://strosepdxschool.org/,"45.5421456,-122.6082186",https://www.google.com/maps/embed?pb=!1m18!1m1...,Schools,Organization
1,2,"St. Cecilia School, Beaverton",St. Cecilia School,Beaverton,Yes,11,0,12-BEACECS,PS-8,,...,97005,,503-644-2619,503-646-4217,,http://www.stceciliaschool.us/,"45.48253044013768,-122.80538544843398",https://www.google.com/maps/embed?pb=!1m18!1m1...,Schools,Organization
2,3,"Cathedral School, Portland",Cathedral School,Portland,Yes,101,0,12-PDXCATS,PK-8,1896.0,...,97209,,503-275-9370,503-275-9378,info@cathedral-or.org,http://www.cathedral-or.org/,"45.52425670989584,-122.6882098411886",https://www.google.com/maps/embed?pb=!1m18!1m1...,Schools,Organization
3,5,"St. Francis of Assisi School, Banks (Roy)",St. Francis of Assisi School,Banks (Roy),Yes,121,0,12-ROYFRAS,PS-8,1912.0,...,97106,,503-324-2182,503-324-7032,sfa.royschool@gmail.com,https://www.sfa-roy.org/,"45.59540773259601,-123.08112804819821",https://www.google.com/maps/embed?pb=!1m18!1m1...,Schools,Organization
4,6,"Holy Trinity School, Beaverton",Holy Trinity School,Beaverton,Yes,9,0,12-BEAHOLS,PK-8,1963.0,...,97005,,503-644-5748,503-643-4475,sdummer@htsch.org,https://www.htsch.org/,"45.5064461553052,-122.8167416472551",https://www.google.com/maps/embed?pb=!1m18!1m1...,Schools,Organization
5,8,"Valley Catholic Elementary School, Beaverton",Valley Catholic Elementary School,Beaverton,No,0,0,12-BEAVALE,K-5,,...,97078,,503-718-6500,503-718-6520,yayesiga@valleycatholic.org,https://www.valleycatholic.org/,"45.48728744411335,-122.83094091801642",https://www.google.com/maps/embed?pb=!1m14!1m8...,Schools,Organization
6,9,"Valley Catholic Middle School, Beaverton",Valley Catholic Middle School,Beaverton,No,0,0,12-BEAVALJ,6-8,,...,97078,,503-718-6500,503-718-6520,jgfroerer@valleycatholic.org,https://www.valleycatholic.org/,"45.48728744411335,-122.83094091801642",https://www.google.com/maps/embed?pb=!1m14!1m8...,Schools,Organization
7,10,"Valley Catholic High School, Beaverton",Valley Catholic High School,Beaverton,No,0,0,12-BEAVALH,9-12,,...,97078,,503-644-3745,503-646-4054,dierardi@valleycatholic.org,https://www.valleycatholic.org/,"45.48728744411335,-122.83094091801642",https://www.google.com/maps/embed?pb=!1m14!1m8...,Schools,Organization
8,11,"Blanchet Catholic School, Salem",Blanchet Catholic School,Salem,Yes,0,0,12-SALBLAS,6-12,,...,97301,,503-391-2639,503-399-1259,info@blanchetcatholicschool.com,http://www.blanchetcatholicschool.com/,"44.95192007539606,-122.97951704820905",https://www.google.com/maps/embed?pb=!1m18!1m1...,Schools,Organization
9,12,"Central Catholic High School, Portland",Central Catholic High School,Portland,Yes,0,0,12-PDXCENS,9-12,1939.0,...,97214,,503-235-3138,503-233-0073,info@centralcatholichigh.org,https://www.centralcatholichigh.org/,"45.52003173763356,-122.64242704819948",https://www.google.com/maps/embed?pb=!1m18!1m1...,Schools,Organization


In [19]:
df_vicariates = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Vicariates.csv', skiprows= lambda x: x in [1])
df_vicariates["src_table"] = 'Vicariates'
df_vicariates["AccountRecordType"] = 'Deanery'
# As we want to designate the Common Name as what will be the Account Name in Salesforce, we are renaming these columns in a different pattern than prior CSVs.
df_vicariates.rename({"Common Name": "Name"}, axis="columns", inplace=True)
df_vicariates

Unnamed: 0,Record Number,Name,Vicariate Name,Archdiocese Assigns Clergy,src_table,AccountRecordType
0,1,Albany-Corvallis Vicariate,Albany-Corvallis,Yes,Vicariates,Deanery
1,2,"Beaverton, Suburban Vicariate","Beaverton, Suburban",Yes,Vicariates,Deanery
2,3,Columbia County Vicariate,Columbia County,Yes,Vicariates,Deanery
3,4,Downtown Portland Vicariate,Downtown Portland,Yes,Vicariates,Deanery
4,5,"East Portland, Suburban Vicariate","East Portland, Suburban",Yes,Vicariates,Deanery
5,6,Marion County Vicariate,Marion County,Yes,Vicariates,Deanery
6,7,Metropolitan Eugene Vicariate,Metropolitan Eugene,Yes,Vicariates,Deanery
7,8,Metropolitan Salem Vicariate,Metropolitan Salem,Yes,Vicariates,Deanery
8,9,North Coast Vicariate,North Coast,Yes,Vicariates,Deanery
9,10,Northeast Portland Vicariate,Northeast Portland,Yes,Vicariates,Deanery


In [20]:
df_newman = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/NewmanCenters.csv', skiprows= lambda x: x in [1])
df_newman["src_table"] = 'NewmanCenters'
df_newman["AccountRecordType"] = 'Organization'
df_newman.rename({
                    "Newman Center Name": "Formal_Name__c",
                    "Common Name": "Name",
                    "Newman Center City": "Mailing Address City2"
                  }, axis="columns", inplace=True)


Each of the 6 tables has an overlapping but distinct set of columns, making it challenging to conform these tables into a single staging table.

In addition, columns that correspond to the same field in salesforce are named differently in each table (eg. 'Parish City' vs. 'Religious City' vs. 'Newman Center City')


In [21]:
print('TABLE: (ROWS, COLUMNS)\n')

print(f'Offices:    {df_offices.shape}')
print(f'Parishes:   {df_parishes.shape}')
print(f'Religious:  {df_religious.shape}')
print(f'Schools:    {df_schools.shape}')
print(f'Vicariates: {df_vicariates.shape}')
print(f'Newman Ctr: {df_newman.shape}')

TABLE: (ROWS, COLUMNS)

Offices:    (35, 18)
Parishes:   (151, 45)
Religious:  (70, 34)
Schools:    (56, 26)
Vicariates: (18, 6)
Newman Ctr: (4, 37)


### Merge DFs into a single Accounts DF

This step takes 6 different tables and combines them into a single Accounts table for cleaning and staging.


In [22]:
# init list of DataFrames
src_accounts = [df_offices, df_parishes, df_religious, df_schools, df_vicariates, df_newman]

# concats the various Account dataframes into one large table
accounts = pd.concat(src_accounts, ignore_index=True)

In [23]:
accounts.columns

Index(['Record Number', 'Name', 'Formal_Name__c', 'Archdiocese Assigns Clergy',
       'Locator Description', 'Mailing Address', 'Mailing Address 2',
       'Mailing Address City', 'Mailing Address State',
       'Mailing Address Province', 'Mailing Address Postal Code',
       'Mailing Address Country', 'Phone', 'Fax', 'Email', 'Web Site',
       'src_table', 'AccountRecordType', 'Sort Name', 'Parish Name',
       'Parish City', 'Parent_Parish', 'Established', 'Vicariate', 'Non-Latin',
       'County', 'Disabled Access', 'Sanctuary Capacity',
       'Lat/Long Coordinates Decimal', 'Google Small Embed URL',
       'Miles to Pastoral Center', 'Schedule 1 Head', 'Schedule 1 Text',
       'Schedule 2 Head', 'Schedule 2 Text', 'Schedule 3 Head',
       'Schedule 3 Text', 'Schedule 4 Head', 'Schedule 4 Text',
       'Schedule 5 Head', 'Schedule 5 Text', 'Schedule 6 Head',
       'Schedule 6 Text', 'Schedule 7 Head', 'Schedule 7 Text',
       'Community City', 'Order Full Name', 'Order Commo

In [24]:
# Check to see that 'Parish Link' from Schools table made into Parent_Parish field
accounts[accounts.src_table == 'Schools'].Parent_Parish

256    109
257     11
258    101
259    121
260      9
261      0
262      0
263      0
264      0
265      0
266      0
267      0
268      0
269      0
270      0
271      0
272      0
273      0
274     33
275    149
276     38
277     42
278     46
279     54
280     58
281     59
282     61
283     62
284     73
285     75
286     77
287    154
288    145
289    135
290    126
291    125
292    124
293    123
294    102
295    114
296    113
297    107
298     78
299     79
300     87
301      0
302     93
303     96
304     98
305      0
306      0
307      0
308      0
309      0
310      0
311    147
Name: Parent_Parish, dtype: object

## Transform


Time to do some table column renaming and re-organizing!


In [25]:
# renames columns headers to consolidate account names into SF-conformed data model
accounts.rename({"Common Name": "Name, City"}, axis="columns", inplace=True)

accounts.rename(
    columns={
        # 'Account Name': 'Name',
        'Mailing Address': 'BillingStreet1',
        'Mailing Address 2': 'BillingStreet2',
        'Mailing Address City': 'BillingCity',
        'Mailing Address State': 'BillingState',
        'Mailing Address Postal Code': 'BillingPostalCode',
        'Mailing Address Country': 'BillingCountry',
        'Email': 'mbfc__Email__c',
        'Web Site': 'Website',
        'Order Common Name': 'mbfc__Abbreviation__c',
        'Order Letters': 'mbfc__Religious_Suffix__c',
        'Men or Women': 'mbfc__Type_Members__c',
        'Archdiocese Assigns Clergy': 'Archdiocese_Assigns_Clergy__c',
        'Locator Description': 'Locator_Description__c',
        'Established': 'mbfc__Date_Established__c',
        'County': 'County__c',
        'Disabled Access': 'Disabled_Access__c',
        'Sanctuary Capacity': 'Sanctuary_Capacity__c',
        'Miles to Pastoral Centre': 'Miles_to_Pastoral_Centre__c',
        'Archdiocesan School Code': 'Archdiocesan_School_Code__c',
        'Grades Provided': 'Grades_Provided__c'

    },
    inplace=True
)


# reorder column order
col = accounts.pop('Name')
accounts.insert(2, col.name, col)

col = accounts.pop('Parish Name')
accounts.insert(3, col.name, col)

col = accounts.pop('AccountRecordType')
accounts.insert(1, col.name, col)



In [26]:
accounts.columns

Index(['Record Number', 'AccountRecordType', 'Formal_Name__c', 'Name',
       'Parish Name', 'Archdiocese_Assigns_Clergy__c',
       'Locator_Description__c', 'BillingStreet1', 'BillingStreet2',
       'BillingCity', 'BillingState', 'Mailing Address Province',
       'BillingPostalCode', 'BillingCountry', 'Phone', 'Fax', 'mbfc__Email__c',
       'Website', 'src_table', 'Sort Name', 'Parish City', 'Parent_Parish',
       'mbfc__Date_Established__c', 'Vicariate', 'Non-Latin', 'County__c',
       'Disabled_Access__c', 'Sanctuary_Capacity__c',
       'Lat/Long Coordinates Decimal', 'Google Small Embed URL',
       'Miles to Pastoral Center', 'Schedule 1 Head', 'Schedule 1 Text',
       'Schedule 2 Head', 'Schedule 2 Text', 'Schedule 3 Head',
       'Schedule 3 Text', 'Schedule 4 Head', 'Schedule 4 Text',
       'Schedule 5 Head', 'Schedule 5 Text', 'Schedule 6 Head',
       'Schedule 6 Text', 'Schedule 7 Head', 'Schedule 7 Text',
       'Community City', 'Order Full Name', 'mbfc__Abbreviat

In [27]:
accounts[accounts.BillingStreet2.isna() == False]

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingStreet1,BillingStreet2,BillingCity,...,Major Superior Phone,Major Superior Email,School City,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2
14,32,Organization,Diaconate Office,Diaconate Office,,Yes,,Pastoral Center,2838 E Burnside St,Portland,...,,,,,,,,,,
32,58,Organization,Office of Marketing and Communications,Office of Marketing and Communications,,Yes,,Pastoral Center,2838 E Burnside St,Portland,...,,,,,,,,,,
35,1,Church,"Our Lady of Perpetual Help, St Mary’s","Our Lady of Perpetual Help, St Mary’s, Albany",,Yes,SW Ellsworth St between 8th and 9th Streets,"Our Lady of Perpetual Help, St Mary’s Parish",815 Broadalbin St SW,Albany,...,,,,,,,,,,
36,2,Church,St. Andrew Dũng-Lạc,"St. Andrew Dũng-Lạc Mission, Aloha",,No,SW Grabhorn Rd/209th Ave and Farmington Rd,St. Andrew Dũng-Lạc Mission,7390 SW Grabhorn Rd,Aloha,...,,,,,,,,,,
37,3,Church,St. Elizabeth Ann Seton,"St. Elizabeth Ann Seton, Aloha",,Yes,,St. Elizabeth Ann Seton Parish,3145 SW 192nd Ave,Aloha,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
236,62,Religious,Work of Jesus the High Priest,"Work of Jesus the High Priest, Gresham (OJSS)",,No,,OJSS Community,451 NW 1st St,Gresham,...,,,,,,,,,,
238,64,Religious,Heralds of the Good News,"Heralds of the Good News, Portland (HGN)",,No,,c/o Chancellor,2838 E Burnside St,Portland,...,+91 80 74 51 02 67,rkappumkal@gmail.com,,,,,,,,
239,65,Religious,Missionary Oblates of Mary Immaculate,"Missionary Oblates of Mary Immaculate, Rome, I...",,No,,Missionary Oblates of Mary Immaculate,Via Aurelia 290,Roma,...,,gensec@omigen.org,,,,,,,,
247,73,Religious,Brothers of Saint John,"Brothers of Saint John, Laredo, TX (CSJ)",,No,,St. John Priory,505 Century Dr S,Laredo,...,,,,,,,,,,


In [28]:
# merge two Non-Latin columns into one 
accounts['Non_Latin__c'] = accounts['Non-Latin'].combine_first(accounts['Non-Latin Rite']) 

# Rename the 'Non_Latin__c' field to 'mbfc__Non_Latin__c'
accounts.rename(columns={'Non_Latin__c': 'mbfc__Non_Latin__c'}, inplace=True)


In [29]:
# export merged tables DESCRIPTION to CSV for mapping
accounts.describe(include='all').transpose().to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/working/accounts.csv')
accounts.describe(include='all').transpose()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Record Number,334.0,,,,54.5,41.389801,1.0,21.25,45.0,76.75,173.0
AccountRecordType,334,4,Church,151,,,,,,,
Formal_Name__c,316,273,St. Mary,5,,,,,,,
Name,334,334,Pastoral Center,1,,,,,,,
Parish Name,5,5,St. Anne,1,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
Mailing Address 1,56,55,4420 SW St Marys Dr,2,,,,,,,
Mailing Address Zip,56.0,,,,97222.446429,124.9586,97005.0,97134.75,97217.5,97301.0,97526.0
Vicariate Name,18,18,Albany-Corvallis,1,,,,,,,
Mailing Address City2,4,4,Corvallis,1,,,,,,,


In [30]:
# Create a single BillingAddress field

# Concatenate the two columns with CHAR(10) as separator
accounts['BillingStreet'] = accounts[['BillingStreet1', 'BillingStreet2']].apply(lambda x: '\n'.join(x.dropna()), axis=1)

# Drop the original columns
accounts.drop(columns=['BillingStreet1', 'BillingStreet2'], inplace=True)

In [31]:
# Handle boolean fields

boolean_columns_to_convert = [
    'Archdiocese_Assigns_Clergy__c', 
    'mbfc__Non_Latin__c', 
    'Disabled_Access__c', 
    ]

# Convert 'Yes'/'No' to True/False
accounts[boolean_columns_to_convert] = accounts[boolean_columns_to_convert].replace({'Yes': True, 'No': False, None: False})



In [32]:
accounts[boolean_columns_to_convert].sample(10)

Unnamed: 0,Archdiocese_Assigns_Clergy__c,mbfc__Non_Latin__c,Disabled_Access__c
180,True,False,True
233,False,False,False
6,False,False,False
84,True,False,True
55,True,False,True
307,False,False,False
190,False,False,False
223,False,False,False
229,False,False,False
100,True,False,True


In [33]:
# Religious Order fields > conform to new data model

# Apply logic to create new columns
accounts['Religious_Secular_Order__c'] = accounts.apply(
    lambda x: 'Religious Order' if x['Religious Order'] == 'Yes' else ('Secular Order' if x['Secular Order'] == 'Yes' else None), axis=1
)

accounts['Pontifical_or_Diocesan_Order__c'] = accounts.apply(
    lambda x: 'Diocesan Order' if x['Diocesan Order'] == 'Yes' else ('Pontifical Order' if x['Pontifical Order'] == 'Yes' else None), axis=1
)

accounts.drop(columns=['Religious Order', 'Secular Order', 'Diocesan Order', 'Pontifical Order'], inplace=True)

In [34]:
print(accounts['mbfc__Date_Established__c'].dtype)

object


In [35]:
# Handle Date fields that are only YYYY

# Ensure all values in 'mbfc__Date_Established__c' are strings
accounts['mbfc__Date_Established__c'] = accounts['mbfc__Date_Established__c'].astype(str)

# Define a function to transform valid year values
def transform_year(year):
    if pd.notna(year) and year.replace('.', '', 1).isdigit() and len(year.split('.')[0]) == 4:
        return pd.to_datetime(year.split('.')[0] + '-01-01')
    else:
        return pd.NaT

# Apply the function to the 'mbfc__Date_Established__c' column
accounts['mbfc__Date_Established__c'] = accounts['mbfc__Date_Established__c'].apply(transform_year)


In [36]:
accounts['mbfc__Date_Established__c'].sample(10)

56           NaT
296          NaT
196          NaT
164   1921-01-01
256   1913-01-01
124   1876-01-01
65    1955-01-01
61    1969-01-01
136   1954-01-01
333          NaT
Name: mbfc__Date_Established__c, dtype: datetime64[ns]

In [37]:
accounts[accounts.src_table == 'Schools'].Parent_Parish

256    109
257     11
258    101
259    121
260      9
261      0
262      0
263      0
264      0
265      0
266      0
267      0
268      0
269      0
270      0
271      0
272      0
273      0
274     33
275    149
276     38
277     42
278     46
279     54
280     58
281     59
282     61
283     62
284     73
285     75
286     77
287    154
288    145
289    135
290    126
291    125
292    124
293    123
294    102
295    114
296    113
297    107
298     78
299     79
300     87
301      0
302     93
303     96
304     98
305      0
306      0
307      0
308      0
309      0
310      0
311    147
Name: Parent_Parish, dtype: object

In [38]:
# Format Parent_Parish field

# Remove instances of '0'
accounts.Parent_Parish = accounts.Parent_Parish.apply(lambda x: '' if x == 0 else x)


In [39]:
# Append prefix
accounts['Parent_Parish'] = accounts['Parent_Parish'].apply(lambda x: 'Parishes_' + str(x) if pd.notna(x) and x != '' else x)

In [40]:
# Check final results, in particular the 'Schools' records
accounts.Parent_Parish[(accounts.Parent_Parish.isna() == False) & (accounts["src_table"] == "Schools")].sample(10)

268                
301                
264                
299     Parishes_79
256    Parishes_109
258    Parishes_101
300     Parishes_87
273                
266                
277     Parishes_42
Name: Parent_Parish, dtype: object

In [41]:
# Replace Parent_Parish unique ids with SF records
add_salesforce_record_ids(sf, accounts, "Parent_Parish", "Account", "Archdpdx_Migration_Id__c", "Parent_Parish__c", 10 )

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,...,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Parent_Parish__c
0,1,Organization,Pastoral Center,Pastoral Center,,True,,Portland,OR,,...,,,,,,False,2838 E Burnside St,,,
1,3,Organization,Catholic Sentinel,Catholic Sentinel,,False,,Portland,OR,,...,,,,,,False,2838 E Burnside St,,,
2,4,Organization,Catholic Cemeteries,Catholic Cemeteries,,False,,Portland,OR,,...,,,,,,False,333 SW Skyline Blvd,,,
3,6,Organization,Griffin Center,Griffin Center,,False,,Milwaukie,OR,,...,,,,,,False,11957 SE Fuller Rd,,,
4,11,Organization,Providence Portland Medical Center,Providence Portland Medical Center,,True,,Portland,OR,,...,,,,,,False,4805 NE Glisan St,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
329,18,Deanery,,Yamhill County Vicariate,,True,,,,,...,,,,Yamhill County,,False,,,,
330,1,Organization,OSU Newman Center,"OSU Newman Center, Corvallis",,False,,Corvallis,OR,,...,,,,,Corvallis,False,2127 NW Monroe Ave,,,
331,2,Organization,St. Thomas More (UO) Newman Center,"St. Thomas More (UO) Newman Center, Eugene",,False,,Eugene,OR,,...,,,,,Eugene,False,1850 Emerald St,,,
332,3,Organization,Walsh Memorial (SOU) Newman Center at Our Lady...,Walsh Memorial (SOU) Newman Center at Our Lady...,,True,,Ashland,OR,,...,,,,,Ashland,False,987 Hillview Dr,,,


In [42]:
# ParentID field

accounts['ParentId'] = accounts['Parent_Parish__c']

# Verify results
accounts[accounts.Parent_Parish__c.isna() == False]


Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,...,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Parent_Parish__c,ParentId
36,2,Church,St. Andrew Dũng-Lạc,"St. Andrew Dũng-Lạc Mission, Aloha",,False,SW Grabhorn Rd/209th Ave and Farmington Rd,Aloha,OR,,...,,,,,False,St. Andrew Dũng-Lạc Mission\n7390 SW Grabhorn Rd,,,001Dx00001HwDzoIAF,001Dx00001HwDzoIAF
38,4,Church,St. Peter the Fisherman,"St. Peter the Fisherman Mission, Arch Cape",,True,79441 Hwy 101 S,Seaside,OR,,...,,,,,False,St. Peter the Fisherman Mission\nPO Box 29,,,001Dx00001HwE0XIAV,001Dx00001HwE0XIAV
45,13,Church,Holy Trinity,"Holy Trinity Mission, Brownsville",,True,W Blakely Ave and French St,Brownsville,OR,,...,,,,,False,Holy Trinity Mission\n104 W Blakely Ave,,,001Dx00001HwE0jIAF,001Dx00001HwE0jIAF
47,15,Church,St. Patrick of the Forest,"St. Patrick of the Forest Mission, Cave Junction",,True,407 W River St,Cave Junction,OR,,...,,,,,False,St. Patrick of the Forest Mission\n407 W River St,,,001Dx00001HwDzHIAV,001Dx00001HwDzHIAV
49,17,Church,St. John the Baptist,"St. John the Baptist Mission, Clatskanie",,True,100 SW High St,Rainier,OR,,...,,,,,False,St. John the Baptist Mission\nPO Box 340,,,001Dx00001HwE0JIAV,001Dx00001HwE0JIAV
50,18,Church,St. Joseph,"St. Joseph Mission, Cloverdale",,True,34560 Parkway Dr,Cloverdale,OR,,...,,,,,False,St. Joseph Mission\nPO Box 9,,,001Dx00001HwE0lIAF,001Dx00001HwE0lIAF
56,25,Church,St. Philip Benizi,"St. Philip Benizi Mission, Creswell",,True,552 Holbrook Ln,Cottage Grove,OR,,...,,,,,False,St. Philip Benizi Mission\n1025 N 19th St,,,001Dx00001HwDz5IAF,001Dx00001HwDz5IAF
58,27,Church,St. Martin de Porres,"St. Martin de Porres Mission, Dayton",,True,407 Ferry St,Yamhill,OR,,...,,,,,False,St. Martin de Porres Mission\nPO Box 580,,,001Dx00001HwE0tIAF,001Dx00001HwE0tIAF
59,28,Church,St. Henry,"St. Henry Mission, Dexter",,True,38925 Dexter Rd,Dexter,OR,,...,,,,,False,St. Henry Mission\nPO Box 65,,,001Dx00001HwDzeIAF,001Dx00001HwDzeIAF
69,39,Church,Holy Family,"Holy Family Mission, Glendale",,True,243 Marshall St,Myrtle Creek,OR,,...,,,,,False,Holy Family Mission\nPO Box 810,,,001Dx00001HwDzZIAV,001Dx00001HwDzZIAV


### AccountRecordType & ChurchType


In [43]:
#Sets all rows where AccountRecordType is Church as a Parish.
accounts.loc[accounts['AccountRecordType'] == 'Church', 'mbfc__Church_Type__c'] = 'Parish'
accounts[accounts['AccountRecordType'] == 'Church'].head(5)


Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,...,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Parent_Parish__c,ParentId,mbfc__Church_Type__c
35,1,Church,"Our Lady of Perpetual Help, St Mary’s","Our Lady of Perpetual Help, St Mary’s, Albany",,True,SW Ellsworth St between 8th and 9th Streets,Albany,OR,,...,,,,False,"Our Lady of Perpetual Help, St Mary’s Parish\n...",,,,,Parish
36,2,Church,St. Andrew Dũng-Lạc,"St. Andrew Dũng-Lạc Mission, Aloha",,False,SW Grabhorn Rd/209th Ave and Farmington Rd,Aloha,OR,,...,,,,False,St. Andrew Dũng-Lạc Mission\n7390 SW Grabhorn Rd,,,001Dx00001HwDzoIAF,001Dx00001HwDzoIAF,Parish
37,3,Church,St. Elizabeth Ann Seton,"St. Elizabeth Ann Seton, Aloha",,True,,Aloha,OR,,...,,,,False,St. Elizabeth Ann Seton Parish\n3145 SW 192nd Ave,,,,,Parish
38,4,Church,St. Peter the Fisherman,"St. Peter the Fisherman Mission, Arch Cape",,True,79441 Hwy 101 S,Seaside,OR,,...,,,,False,St. Peter the Fisherman Mission\nPO Box 29,,,001Dx00001HwE0XIAV,001Dx00001HwE0XIAV,Parish
39,5,Church,Our Lady of the Mountain,"Our Lady of the Mountain, Ashland",,True,,Ashland,OR,,...,,,,False,Our Lady of the Mountain Parish\n987 Hillview Dr,,,,,Parish


### Generate ExternalId


In [44]:
# Generate an External ID
columns_to_concate = ['src_table', 'Record Number']
accounts = concat_columns(accounts, columns_to_concate, 'Archdpdx_Migration_Id__c', separator='_')

In [45]:
# set Deanery RecordTypeId to the Church RecordTypeId
# map in RecordTypeIds
accounts['RecordTypeId'] = accounts['AccountRecordType'].map(record_types_mapping)
record_types_mapping

{'Church': '012Dx0000003p4xIAA',
 'Deanery': '012Dx0000003p4yIAA',
 'Group': '012Dx0000003p4zIAA',
 'Organization': '012Hu000001pkqEIAQ',
 'Property': '012Dx0000003p51IAA',
 'Religious': '012Dx0000003p5KIAQ',
 'All_Types': '012Dx0000003p53IAA',
 'Any': '012Dx0000003p54IAA',
 'Assignments_Clergy': '012Dx0000003p55IAA',
 'Chancery_Users': '012Dx0000003p56IAA',
 'Clergy_Religious_Residence': '012Dx0000003p57IAA',
 'Diocean_Users': '012Dx0000003p58IAA',
 'Diocesan_Appointment': '012Dx0000003p59IAA',
 'Ecclesial_Affiliation': '012Dx0000003p5AIAQ',
 'Education': '012Dx0000003p5BIAQ',
 'Lay_Person': '012Dx0000003p5HIAQ',
 'Ministerial_Status': '012Dx0000003p5DIAQ',
 'Parish_Affiliations': '012Dx0000003p5EIAQ',
 'Staff': '012Dx0000003p5FIAQ',
 'Consecrated': '012Dx0000003p5GIAQ',
 'Permanent_Deacon': '012Dx0000003p5IIAQ',
 'Priest': '012Dx0000003p5JIAQ',
 'MajorGift': '012Hu000001pkqBIAQ',
 'Grant': '012Hu000001pkqCIAQ',
 'HH_Account': '012Hu000001pkqDIAQ',
 'Donation': '012Hu000001pkqFIAQ',
 

## Load


### Generate a new Job ID


In [46]:
# increment to the job_id
file_name = '/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/jobs/job_id'
curr_job_id = update_job_id(file_name)
print(f"New job ID: {curr_job_id}")

# add/update account DF with job_id
accounts["Job_Id__c"] = curr_job_id


New job ID: 124


### A) Vicariates


In [47]:
# Get Account Group RecordTypeID
deanery_recordTypeId = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Deanery') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


# Insert Vicariates holding account
vicariate_account = sf.Account.upsert('Archdpdx_Migration_Id__c/Vicariates_Holding_Acc',
    {
    "Name": "Vicariates",
    "ParentId": diocesan_account_id,
    "mbfc__Diocese__c": diocesan_account_id,
    "RecordTypeId": deanery_recordTypeId,
    # "mbfc__Group_Type__c": 'Office',
    "Job_Id__c": curr_job_id
    }
)

# Get Vicariate Holding Acc's SF ID (as an upsert doesn't return the actual record ID)
vicariate_account = sf.Account.get_by_custom_id('Archdpdx_Migration_Id__c', 'Vicariates_Holding_Acc')
vicariate_account_id = vicariate_account['Id']

vicariate_account_id

'001Dx00001HwDuDIAV'

In [48]:
# Prepare Vicariates staging DF

vicariates = accounts[accounts['AccountRecordType'] == 'Deanery']


vicariates = vicariates[[
    'Record Number',
    'Name',
    # 'AccountRecordType',
    'Job_Id__c',
    'Archdpdx_Migration_Id__c',
    'RecordTypeId'
    ]]

# add parentid
vicariates["mbfc__Diocese__c"] = diocesan_account_id
vicariates['ParentId'] = vicariate_account_id
# vicariates['mbfc__Church_Type__c'] = 'Deanery'
vicariates['RecordTypeId'] = deanery_recordTypeId

vicariates.rename(columns={
        # 'Name, City': 'Name',
        'External_Id': 'Archdpdx_Migration_Id__c'
    }, inplace=True)

vicariates.reset_index()
vicariates.set_index('Record Number', inplace=True)

vicariates

Unnamed: 0_level_0,Name,Job_Id__c,Archdpdx_Migration_Id__c,RecordTypeId,mbfc__Diocese__c,ParentId
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,Albany-Corvallis Vicariate,124,Vicariates_1,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
2,"Beaverton, Suburban Vicariate",124,Vicariates_2,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
3,Columbia County Vicariate,124,Vicariates_3,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
4,Downtown Portland Vicariate,124,Vicariates_4,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
5,"East Portland, Suburban Vicariate",124,Vicariates_5,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
6,Marion County Vicariate,124,Vicariates_6,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
7,Metropolitan Eugene Vicariate,124,Vicariates_7,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
8,Metropolitan Salem Vicariate,124,Vicariates_8,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
9,North Coast Vicariate,124,Vicariates_9,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
10,Northeast Portland Vicariate,124,Vicariates_10,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV


#### Export Vicariates to CSV


In [49]:
# export to CSV
vicariates.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/vicariates_staging.csv')


#### Upsert Vicariates


In [50]:
bulk_data = []
for row in vicariates.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    vicariate_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    upserts = pd.DataFrame(vicariate_upsert)

    print(upserts)
    

    success  created                  id errors
0      True    False  001Dx00001HwDwnIAF     []
1      True    False  001Dx00001HwDwoIAF     []
2      True    False  001Dx00001HwDwpIAF     []
3      True    False  001Dx00001HwDwqIAF     []
4      True    False  001Dx00001HwDwrIAF     []
5      True    False  001Dx00001HwDwsIAF     []
6      True    False  001Dx00001HwDwtIAF     []
7      True    False  001Dx00001HwDwuIAF     []
8      True    False  001Dx00001HwDwvIAF     []
9      True    False  001Dx00001HwDwwIAF     []
10     True    False  001Dx00001HwDwxIAF     []
11     True    False  001Dx00001HwDwyIAF     []
12     True    False  001Dx00001HwDwzIAF     []
13     True    False  001Dx00001HwDx0IAF     []
14     True    False  001Dx00001HwDx1IAF     []
15     True    False  001Dx00001HwDx2IAF     []
16     True    False  001Dx00001HwDx3IAF     []
17     True    False  001Dx00001HwDx4IAF     []


In [51]:
# Generate an Errors log
import csv

keys = vicariate_upsert[0].keys()

with open('results_files/vicariate_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(vicariate_upsert)

In [52]:
# Get Vicariate records from SF

sf_deaneries = sf.query("SELECT Archdpdx_Migration_Id__c, Id FROM Account WHERE RecordType.DeveloperName = 'Deanery'")

df_sf_deaneries = pd.DataFrame(sf_deaneries['records'])
df_sf_deaneries = df_sf_deaneries.drop(columns = 'attributes')

df_sf_deaneries

# Creates a dict of Vicariate unique ids to the new Salesforce record IDs, so can populate on latter Account records
vicariate_sf_recordids = df_sf_deaneries.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
vicariate_sf_recordids

{'Vicariates_Holding_Acc': '001Dx00001HwDuDIAV',
 'Vicariates_1': '001Dx00001HwDwnIAF',
 'Vicariates_2': '001Dx00001HwDwoIAF',
 'Vicariates_3': '001Dx00001HwDwpIAF',
 'Vicariates_4': '001Dx00001HwDwqIAF',
 'Vicariates_5': '001Dx00001HwDwrIAF',
 'Vicariates_6': '001Dx00001HwDwsIAF',
 'Vicariates_7': '001Dx00001HwDwtIAF',
 'Vicariates_8': '001Dx00001HwDwuIAF',
 'Vicariates_9': '001Dx00001HwDwvIAF',
 'Vicariates_10': '001Dx00001HwDwwIAF',
 'Vicariates_11': '001Dx00001HwDwxIAF',
 'Vicariates_12': '001Dx00001HwDwyIAF',
 'Vicariates_13': '001Dx00001HwDwzIAF',
 'Vicariates_14': '001Dx00001HwDx0IAF',
 'Vicariates_15': '001Dx00001HwDx1IAF',
 'Vicariates_16': '001Dx00001HwDx2IAF',
 'Vicariates_17': '001Dx00001HwDx3IAF',
 'Vicariates_18': '001Dx00001HwDx4IAF'}

### B) Parishes, Schools, Organizations


In [53]:
# Create acc_main (accounts excluding Deaneries (already handled) and Religious (to be handled differently, after))
acc_main = accounts[accounts['AccountRecordType'] != 'Deanery']
acc_main = acc_main[acc_main['AccountRecordType'] != 'Religious']

acc_main.loc[acc_main['AccountRecordType'] == 'Church', 'Vicariate_Ext_Id'] = 'Vicariates_' + acc_main['Vicariate']

In [54]:
acc_main.sample(5)

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,...,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Parent_Parish__c,ParentId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,Vicariate_Ext_Id
61,30,Church,St. Jude,"St. Jude, Eugene",,True,,Eugene,OR,,...,St. Jude Parish\n4330 Willamette St,,,,,Parish,Parishes_30,012Dx0000003p4xIAA,124,Vicariates_7
127,98,Church,St. John Fisher,"St. John Fisher, Portland",,True,SW 45th Ave and Nevada St,Portland,OR,,...,St. John Fisher Parish\n4567 SW Nevada St,,,,,Parish,Parishes_98,012Dx0000003p4xIAA,124,Vicariates_2
57,26,Church,St. Philip,"St. Philip, Dallas",,True,,Dallas,OR,,...,St. Philip Parish\n825 SW Mill St,,,,,Parish,Parishes_26,012Dx0000003p4xIAA,124,Vicariates_8
295,42,Organization,St. Thomas More Catholic School,"St. Thomas More Catholic School, Portland",,True,,Portland,OR,,...,,,,001Dx00001HwE0IIAV,001Dx00001HwE0IIAV,,Schools_42,012Hu000001pkqEIAQ,124,
136,107,Church,St. Pius X,"St. Pius X, Portland",,True,,Portland,OR,,...,St. Pius X Parish\n1280 NW Saltzman Rd,,,,,Parish,Parishes_107,012Dx0000003p4xIAA,124,Vicariates_2


In [55]:
# map in Deaneries
acc_main['mbfc__Deanery__c'] = acc_main.Vicariate_Ext_Id.map(vicariate_sf_recordids)

acc_main[acc_main['AccountRecordType'] == 'Church']['mbfc__Deanery__c']

35     001Dx00001HwDwnIAF
36     001Dx00001HwDwzIAF
37     001Dx00001HwDx2IAF
38     001Dx00001HwDwvIAF
39     001Dx00001HwDx1IAF
              ...        
181    001Dx00001HwDwrIAF
182    001Dx00001HwDx3IAF
183    001Dx00001HwDwsIAF
184    001Dx00001HwDx4IAF
185    001Dx00001HwDwtIAF
Name: mbfc__Deanery__c, Length: 151, dtype: object

In [56]:
# Clean up NaN values

acc_main.fillna('', inplace=True)

In [57]:
# Generate Schedule text from all Schedule columns

def create_account_schedule(row):
    account_schedule = []
    for i in range(1, 8):
        head_col = f'Schedule {i} Head'
        text_col = f'Schedule {i} Text'
        
        head = row[head_col]
        text = row[text_col]
        
        if pd.notnull(head) or pd.notnull(text):
            if pd.notnull(head):
                account_schedule.append(f"<p><strong>{head}</strong></p>")
            if pd.notnull(text):
                account_schedule.append(f"<p>{text}</p>")
            account_schedule.append("<p><br></p>")
    
    # Join all parts into a single string
    return "".join(account_schedule).strip()

acc_main['mbfc__Mass_Times__c'] = acc_main.apply(create_account_schedule, axis=1)



In [58]:
acc_main['mbfc__Mass_Times__c'].sample(15)

332    <p><strong>Sunday Mass</strong></p><p>5:00 pm ...
101    <p><strong>Weekend Mass</strong></p><p>Sat 5:3...
333    <p><strong>Weekend Mass</strong></p><p>Sunday ...
117    <p><strong>Weekend Mass</strong></p><p>Saturda...
257    <p><strong></strong></p><p></p><p><br></p><p><...
29     <p><strong></strong></p><p></p><p><br></p><p><...
287    <p><strong></strong></p><p></p><p><br></p><p><...
261    <p><strong></strong></p><p></p><p><br></p><p><...
57     <p><strong>Weekend Mass</strong></p><p>Saturda...
298    <p><strong></strong></p><p></p><p><br></p><p><...
138    <p><strong>Weekend Mass</strong></p><p>Saturda...
307    <p><strong></strong></p><p></p><p><br></p><p><...
63     <p><strong>Weekend Mass</strong></p><p>Saturda...
89     <p><strong>Weekend Mass</strong></p><p>Saturda...
96     <p><strong>Weekend Mass</strong></p><p>Sat 4:0...
Name: mbfc__Mass_Times__c, dtype: object

In [59]:
acc_main

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,...,Pontifical_or_Diocesan_Order__c,Parent_Parish__c,ParentId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,Vicariate_Ext_Id,mbfc__Deanery__c,mbfc__Mass_Times__c
0,1,Organization,Pastoral Center,Pastoral Center,,True,,Portland,OR,,...,,,,,Offices_1,012Hu000001pkqEIAQ,124,,,<p><strong></strong></p><p></p><p><br></p><p><...
1,3,Organization,Catholic Sentinel,Catholic Sentinel,,False,,Portland,OR,,...,,,,,Offices_3,012Hu000001pkqEIAQ,124,,,<p><strong></strong></p><p></p><p><br></p><p><...
2,4,Organization,Catholic Cemeteries,Catholic Cemeteries,,False,,Portland,OR,,...,,,,,Offices_4,012Hu000001pkqEIAQ,124,,,<p><strong></strong></p><p></p><p><br></p><p><...
3,6,Organization,Griffin Center,Griffin Center,,False,,Milwaukie,OR,,...,,,,,Offices_6,012Hu000001pkqEIAQ,124,,,<p><strong></strong></p><p></p><p><br></p><p><...
4,11,Organization,Providence Portland Medical Center,Providence Portland Medical Center,,True,,Portland,OR,,...,,,,,Offices_11,012Hu000001pkqEIAQ,124,,,<p><strong></strong></p><p></p><p><br></p><p><...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,58,Organization,Resurrection Catholic Parish School,"Resurrection Catholic Parish School, Tualatin",,True,,Tualatin,OR,,...,,001Dx00001HwE0mIAF,001Dx00001HwE0mIAF,,Schools_58,012Hu000001pkqEIAQ,124,,,<p><strong></strong></p><p></p><p><br></p><p><...
330,1,Organization,OSU Newman Center,"OSU Newman Center, Corvallis",,False,,Corvallis,OR,,...,,,,,NewmanCenters_1,012Hu000001pkqEIAQ,124,,,<p><strong>Mass (During Academic Year)</strong...
331,2,Organization,St. Thomas More (UO) Newman Center,"St. Thomas More (UO) Newman Center, Eugene",,False,,Eugene,OR,,...,,,,,NewmanCenters_2,012Hu000001pkqEIAQ,124,,,<p><strong>Weekend Mass</strong></p><p>Saturda...
332,3,Organization,Walsh Memorial (SOU) Newman Center at Our Lady...,Walsh Memorial (SOU) Newman Center at Our Lady...,,True,,Ashland,OR,,...,,,,,NewmanCenters_3,012Hu000001pkqEIAQ,124,,,<p><strong>Sunday Mass</strong></p><p>5:00 pm ...


In [60]:
# Create 'account_staging' df (drop extraneous columns)

accounts_staging = acc_main[[
    'Name',
    'Formal_Name__c',
    'RecordTypeId',
    'mbfc__Church_Type__c',
    'mbfc__Deanery__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'mbfc__Mass_Times__c',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Archdiocese_Assigns_Clergy__c', # Boolean fields
    'mbfc__Non_Latin__c', 
    'Disabled_Access__c', 
    'Locator_Description__c',
    'Parent_Parish__c',
    'mbfc__Date_Established__c',
    'County__c',
    'Sanctuary_Capacity__c',
    # 'Miles_to_Pastoral_Centre__c',
    'Religious_Secular_Order__c',
    'Pontifical_or_Diocesan_Order__c',
    'Archdiocesan_School_Code__c',
    'Grades_Provided__c',
    'Job_Id__c',
    'Archdpdx_Migration_Id__c',
    'ParentId'  # Later, check whether or not can upsert using external ID using this field


    ]]

In [61]:
accounts_staging

Unnamed: 0,Name,Formal_Name__c,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,...,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
0,Pastoral Center,Pastoral Center,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,...,NaT,,,,,,,124,Offices_1,
1,Catholic Sentinel,Catholic Sentinel,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,...,NaT,,,,,,,124,Offices_3,
2,Catholic Cemeteries,Catholic Cemeteries,012Hu000001pkqEIAQ,,,333 SW Skyline Blvd,Portland,OR,97221,,...,NaT,,,,,,,124,Offices_4,
3,Griffin Center,Griffin Center,012Hu000001pkqEIAQ,,,11957 SE Fuller Rd,Milwaukie,OR,97222,,...,NaT,,,,,,,124,Offices_6,
4,Providence Portland Medical Center,Providence Portland Medical Center,012Hu000001pkqEIAQ,,,4805 NE Glisan St,Portland,OR,97213,,...,NaT,,,,,,,124,Offices_11,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,"Resurrection Catholic Parish School, Tualatin",Resurrection Catholic Parish School,012Hu000001pkqEIAQ,,,,Tualatin,OR,,,...,NaT,,,,,12-WEESRES,PK-5,124,Schools_58,001Dx00001HwE0mIAF
330,"OSU Newman Center, Corvallis",OSU Newman Center,012Hu000001pkqEIAQ,,,2127 NW Monroe Ave,Corvallis,OR,97330,,...,NaT,,,,,,,124,NewmanCenters_1,
331,"St. Thomas More (UO) Newman Center, Eugene",St. Thomas More (UO) Newman Center,012Hu000001pkqEIAQ,,,1850 Emerald St,Eugene,OR,97403,,...,1915-01-01,,,,,,,124,NewmanCenters_2,
332,Walsh Memorial (SOU) Newman Center at Our Lady...,Walsh Memorial (SOU) Newman Center at Our Lady...,012Hu000001pkqEIAQ,,,987 Hillview Dr,Ashland,OR,97520,,...,NaT,,,,,,,124,NewmanCenters_3,


#### Create Parishes Holding Acc for acc heirarchy


In [62]:
# Upsert a Parishes holding account

# Get Account Group RecordTypeID
group_recordTypeId = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Group') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


# Insert Vicariates holding account
parish_holding_account = sf.Account.upsert('Archdpdx_Migration_Id__c/Parishes_Holding_Acc',
    {
    "Name": "Parishes",
    "ParentId": diocesan_account_id,
    "RecordTypeId": group_recordTypeId,
    "Job_Id__c": curr_job_id,
    "mbfc__Group_Type__c": "Office"
    }
)

# Get Vicariate Holding Acc's SF ID (as an upsert doesn't return the actual record ID)

parish_holding_account = sf.Account.get_by_custom_id('Archdpdx_Migration_Id__c', 'Parishes_Holding_Acc')

parishes_holding_account_id = parish_holding_account['Id']

parishes_holding_account_id

'001Dx00001HwDxKIAV'

In [63]:
# Set the ParentId for all Parish records

# accounts_staging['ParentId'] = None # Commented this out as (a) the field already exists and it was blanking out pre-existing values.

accounts_staging['ParentId']= accounts_staging.apply(
    lambda row: parishes_holding_account_id if row['mbfc__Church_Type__c'] == 'Parish' else row['ParentId'], axis=1
)

accounts_staging.sample(10)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  accounts_staging['ParentId']= accounts_staging.apply(


Unnamed: 0,Name,Formal_Name__c,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,...,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
179,"St. Mary, Vernonia",St. Mary,012Dx0000003p4xIAA,Parish,001Dx00001HwDwpIAF,St. Mary Parish\nPO Box 312,Vernonia,OR,97064.0,,...,1923-01-01,Columbia,150.0,,,,,124,Parishes_150,001Dx00001HwDxKIAV
40,"St. Mary Star of the Sea, Astoria",St. Mary Star of the Sea,012Dx0000003p4xIAA,Parish,001Dx00001HwDwvIAF,St. Mary Star of the Sea Parish\n1465 Grand Ave,Astoria,OR,97103.0,,...,1874-01-01,Clatsop,480.0,,,,,124,Parishes_7,001Dx00001HwDxKIAV
300,"St. Agatha Catholic School, Portland",St. Agatha Catholic School,012Hu000001pkqEIAQ,,,,Portland,OR,,,...,1911-01-01,,,,,12-PDXAGAS,PS-8,124,Schools_47,001Dx00001HwDzrIAF
264,"Blanchet Catholic School, Salem",Blanchet Catholic School,012Hu000001pkqEIAQ,,,,Salem,OR,,,...,NaT,,,,,12-SALBLAS,6-12,124,Schools_11,
95,"St. Mary, Mount Angel",St. Mary,012Dx0000003p4xIAA,Parish,001Dx00001HwDwsIAF,St. Mary Parish\n575 E College St,Mount Angel,OR,97362.0,,...,1881-01-01,Marion,500.0,,,,,124,Parishes_65,001Dx00001HwDxKIAV
135,"St. Philip Neri, Portland",St. Philip Neri,012Dx0000003p4xIAA,Parish,001Dx00001HwDx0IAF,St. Philip Neri Parish\n2408 SE 16th Ave,Portland,OR,97214.0,,...,1912-01-01,Multnomah,400.0,,,,,124,Parishes_106,001Dx00001HwDxKIAV
157,"St. Wenceslaus, Scappoose",St. Wenceslaus,012Dx0000003p4xIAA,Parish,001Dx00001HwDwpIAF,St. Wenceslaus Parish\n51555 Old Portland Rd,Scappoose,OR,97056.0,,...,1909-01-01,Columbia,204.0,,,,,124,Parishes_128,001Dx00001HwDxKIAV
62,"St. Mark, Eugene",St. Mark,012Dx0000003p4xIAA,Parish,001Dx00001HwDwtIAF,St. Mark Parish\n1760 Echo Hollow Road,Eugene,OR,97402.0,,...,1961-01-01,Lane,270.0,,,,,124,Parishes_31,001Dx00001HwDxKIAV
92,"St. John the Baptist, Milwaukie",St. John the Baptist,012Dx0000003p4xIAA,Parish,001Dx00001HwDwzIAF,St. John the Baptist Parish\n10955 SE 25th Ave,Milwaukie,OR,97222.0,,...,1864-01-01,Clackamas,900.0,,,,,124,Parishes_62,001Dx00001HwDxKIAV
43,"St. Cecilia, Beaverton",St. Cecilia,012Dx0000003p4xIAA,Parish,001Dx00001HwDwoIAF,St. Cecilia Parish\n5105 SW Franklin Ave,Beaverton,OR,97005.0,,...,1913-01-01,Washington,500.0,,,,,124,Parishes_11,001Dx00001HwDxKIAV


#### Upsert Accounts (TBD )


In [64]:
# send accounts_staging to csv
accounts_staging.to_csv('staging_files/accounts_staging.csv', encoding='utf-8-sig')

In [65]:
# # FIXME: Format ExternalID lookups into dictionary to match SF's api so can upsert using simple-salesforce

# # Rename columns apis
# accounts_staging = accounts_staging.rename(columns={'Parent_Parish__c': 'Parent_Parish__r'})  # Later on, attempt to include 'ParentId' (which, as a standard SF field, might not work)

# # Reformat values to match what SF api requires
# accounts_staging['Parent_Parish__r'] = accounts_staging.apply(lambda x: "{'Archdpdx_Migration_Id__c': '" + x['Parent_Parish__r'] + "'}" if pd.notna(x['Parent_Parish__r']) and x['Parent_Parish__r'] != 'None' and x['Parent_Parish__r'] != '' else None, axis=1)




In [66]:
accounts_staging

Unnamed: 0,Name,Formal_Name__c,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,...,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
0,Pastoral Center,Pastoral Center,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,...,NaT,,,,,,,124,Offices_1,
1,Catholic Sentinel,Catholic Sentinel,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,...,NaT,,,,,,,124,Offices_3,
2,Catholic Cemeteries,Catholic Cemeteries,012Hu000001pkqEIAQ,,,333 SW Skyline Blvd,Portland,OR,97221,,...,NaT,,,,,,,124,Offices_4,
3,Griffin Center,Griffin Center,012Hu000001pkqEIAQ,,,11957 SE Fuller Rd,Milwaukie,OR,97222,,...,NaT,,,,,,,124,Offices_6,
4,Providence Portland Medical Center,Providence Portland Medical Center,012Hu000001pkqEIAQ,,,4805 NE Glisan St,Portland,OR,97213,,...,NaT,,,,,,,124,Offices_11,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,"Resurrection Catholic Parish School, Tualatin",Resurrection Catholic Parish School,012Hu000001pkqEIAQ,,,,Tualatin,OR,,,...,NaT,,,,,12-WEESRES,PK-5,124,Schools_58,001Dx00001HwE0mIAF
330,"OSU Newman Center, Corvallis",OSU Newman Center,012Hu000001pkqEIAQ,,,2127 NW Monroe Ave,Corvallis,OR,97330,,...,NaT,,,,,,,124,NewmanCenters_1,
331,"St. Thomas More (UO) Newman Center, Eugene",St. Thomas More (UO) Newman Center,012Hu000001pkqEIAQ,,,1850 Emerald St,Eugene,OR,97403,,...,1915-01-01,,,,,,,124,NewmanCenters_2,
332,Walsh Memorial (SOU) Newman Center at Our Lady...,Walsh Memorial (SOU) Newman Center at Our Lady...,012Hu000001pkqEIAQ,,,987 Hillview Dr,Ashland,OR,97520,,...,NaT,,,,,,,124,NewmanCenters_3,


In [67]:
# accounts_staging[accounts_staging.Parent_Parish__r.isnull() == False]["Parent_Parish__r"]

In [68]:
print(accounts_staging['mbfc__Date_Established__c'].dtype)

datetime64[ns]


In [69]:

# Convert datetime to string in the desired format
accounts_staging['mbfc__Date_Established__c'] = accounts_staging['mbfc__Date_Established__c'].dt.strftime('%Y-%m-%d')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  accounts_staging['mbfc__Date_Established__c'] = accounts_staging['mbfc__Date_Established__c'].dt.strftime('%Y-%m-%d')


In [70]:
# Upsert using new function
# FIXME: This upsert isn't working but appears to have worked previously (according to the 'accounts_results' file)... it was because of the 'mbfc__Date_Established__c' field formatted incorrectly!

accounts_upsert2 = upsert_to_salesforce_bulk(sf, accounts_staging, 'Account', 'Archdpdx_Migration_Id__c', 'results_files/accounts_failed', batch_size=100)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dataframe.replace({pd.NA: None, ' ': None, '': None}, inplace=True)


Batch 1 processed: 100 successful, 0 failed.
Batch 2 processed: 200 successful, 0 failed.
Batch 3 processed: 246 successful, 0 failed.
Upsert completed. Total records processed: 246, Batches: 3, Successful upserts: 246, Failed upserts: 0


In [71]:
# Extract SF Account records

sf_accounts = sf.query('Select id, Name, RecordTypeId, mbfc__Church_Type__c, Archdpdx_Migration_Id__c, Job_Id__c from Account WHERE Job_Id__c != null')
sf_accounts = pd.DataFrame(sf_accounts['records'])
sf_accounts = sf_accounts.drop(columns = 'attributes')
sf_accounts

Unnamed: 0,Id,Name,RecordTypeId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,Job_Id__c
0,001Dx00001HwDuDIAV,Vicariates,012Dx0000003p4yIAA,,Vicariates_Holding_Acc,124
1,001Dx00001HwDwnIAF,Albany-Corvallis Vicariate,012Dx0000003p4yIAA,,Vicariates_1,124
2,001Dx00001HwDwoIAF,"Beaverton, Suburban Vicariate",012Dx0000003p4yIAA,,Vicariates_2,124
3,001Dx00001HwDwpIAF,Columbia County Vicariate,012Dx0000003p4yIAA,,Vicariates_3,124
4,001Dx00001HwDwqIAF,Downtown Portland Vicariate,012Dx0000003p4yIAA,,Vicariates_4,124
...,...,...,...,...,...,...
330,001Dx00001HwE5fIAF,"Society of the Divine Word, Techny, IL (SVD)",012Dx0000003p52IAA,,RelCommunities_77,123
331,001Dx00001HwE5gIAF,"Society of the Divine Saviour, Rome, Italy (SDS)",012Dx0000003p52IAA,,RelCommunities_78,123
332,001Dx00001HwE5hIAF,"Society of Our Lady of the Most Holy Trinity, ...",012Dx0000003p52IAA,,RelCommunities_79,123
333,001Dx00001HwE5iIAF,"Community of St. Thomas More, Eugene (OP)",012Dx0000003p52IAA,,RelCommunities_80,123


### C) Religious Institutes (Parents)


This section prepares and upserts parent Religious Congregation accounts for each row in RelCommunities table.

Dataframes >>
- acc_religious
- acc_religious_2
- acc_religious_parents

In [72]:
"""
- 'acc_religious' DF: create unique_id of religious parents
- create 'acc_religious_orders' DF , upsert into SF
- extract accounts from Salesforce, create dict (external_ID : account_ID)
- map parent ids onto religious child accounts DF in main DF
- 'acc_religious' > staging DF ('acc_religious')
    - drop unnecessary columns
    - upsert create DF of religious children, upsert into SF with
"""

# Create a new DF of all Religious accounts
acc_religious = accounts[accounts['AccountRecordType'] == 'Religious']

# Create a simplified external ID field for Parent Accounts
acc_religious['Archdpdx_Migration_Id__c'] = acc_religious['Order Full Name'].apply(
    lambda x: x.lower().replace(' ', '')[:40]
)

acc_religious_2 = acc_religious

# Create a DF for only parent religious order accounts
acc_religious_parents = acc_religious_2[[
    'Order Full Name', 
    # 'Name', 
    'mbfc__Abbreviation__c', 
    'mbfc__Religious_Suffix__c', 
    'mbfc__Type_Members__c', 
    'Archdpdx_Migration_Id__c',
    'Pontifical_or_Diocesan_Order__c',
    'Religious_Secular_Order__c',
    ]]

# Drop duplicate rows of the same parent Religious Order (becuase there are more than 1 local community of a particular order)
acc_religious_parents.drop_duplicates('Order Full Name', inplace=True)

# Manipulate the 'Name' field to remove any comma and subsequent text
# acc_religious_parents['Name'] = acc_religious_parents['Name'].str.split(',').str[0]

# How many remaining rows after dropping duplicates?
print(acc_religious_parents.shape)

# Rename columns
acc_religious_parents = acc_religious_parents.rename(columns={
    # 'Order Full Name': 'Description',
    'Order Full Name': 'Name'
    })

# Drop NA
acc_religious_parents.fillna('', inplace=True)

acc_religious_parents


(62, 7)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  acc_religious['Archdpdx_Migration_Id__c'] = acc_religious['Order Full Name'].apply(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  acc_religious_parents.drop_duplicates('Order Full Name', inplace=True)


Unnamed: 0,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c,Pontifical_or_Diocesan_Order__c,Religious_Secular_Order__c
186,Societas Iesu,Jesuits,SJ,Men,societasiesu,,Religious Order
187,Ordo Cisterciensis Strictioris Observantiae,Trappists,OCSO,Men,ordocisterciensisstrictiorisobservantiae,Pontifical Order,Religious Order
189,Ordo Sancti Benedicti,Benedictines,OSB,Men,ordosanctibenedicti,,Religious Order
190,Misioneros del Espíritu Santo,"Missionaries of the Holy Spirit, Christ the Pr...",MSpS,Men,misionerosdelespíritusanto,,
191,Apostles of Jesus,Apostles of Jesus,AJ,Men,apostlesofjesus,Diocesan Order,Religious Order
...,...,...,...,...,...,...,...
249,Fraternità san Carlo Borromeo,Fraternity of St. Charles,FSCB,Men,fraternitàsancarloborromeo,,
250,"Sons of Mary, Mother of Mercy","Sons of Mary, Mother of Mercy",SMMM,Men,"sonsofmary,motherofmercy",,
251,Society of the Divine Word,Society of the Divine Word,SVD,Men,societyofthedivineword,,
252,Society of the Divine Saviour,Society of the Divine Saviour,SDS,Men,societyofthedivinesaviour,,


In [73]:
acc_religious_parents['mbfc__Religious_Type__c'] = 'Congregation'

In [74]:
# Get Religious RecordTypeId
religious_recordtype_id = get_recordtype_id(df_sf_recordTypes, 'Religious', 'Account', 'mbfc')

religious_recordtype_id

'012Dx0000003p52IAA'

In [75]:
# Set recordType to 'Religious'

religious_recordtype_id = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Religious') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match

print(religious_recordtype_id)

acc_religious_parents['RecordTypeId'] = religious_recordtype_id

acc_religious_parents.sample(10)

012Dx0000003p52IAA


Unnamed: 0,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c,Pontifical_or_Diocesan_Order__c,Religious_Secular_Order__c,mbfc__Religious_Type__c,RecordTypeId
241,"Order of Friars Minor, Conventual","Franciscans, Conventual",OFM Conv,Men,"orderoffriarsminor,conventual",,,Congregation,012Dx0000003p52IAA
194,Augustinians,Augustinians,OSA,Men,augustinians,,,Congregation,012Dx0000003p52IAA
210,Sisters of St. Dominic of Caldwell,Sisters of St. Dominic,OP,Women,sistersofst.dominicofcaldwell,,Religious Order,Congregation,012Dx0000003p52IAA
235,Towarzystwo Chrystusowe,Society of Christ,SCH,Men,towarzystwochrystusowe,Pontifical Order,Religious Order,Congregation,012Dx0000003p52IAA
248,Sub Mariae Nomine,Marists,SM,Men,submariaenomine,,,Congregation,012Dx0000003p52IAA
247,Brothers of Saint John,Brothers of Saint John,CSJ,Men,brothersofsaintjohn,,,Congregation,012Dx0000003p52IAA
200,Domus Dei Clerical Society of Apostolic Life,Domus Dei,SDD,Men,domusdeiclericalsocietyofapostoliclife,,Religious Order,Congregation,012Dx0000003p52IAA
187,Ordo Cisterciensis Strictioris Observantiae,Trappists,OCSO,Men,ordocisterciensisstrictiorisobservantiae,Pontifical Order,Religious Order,Congregation,012Dx0000003p52IAA
229,Sisters of St. Mary of Oregon,Sisters of St. Mary of Oregon,SSMO,Women,sistersofst.maryoforegon,,Religious Order,Congregation,012Dx0000003p52IAA
209,Congregación de Oblatas de Santa Marta,Servites,OSM,Women,congregacióndeoblatasdesantamarta,,,Congregation,012Dx0000003p52IAA


In [76]:
# Send to CSV
acc_religious_parents.to_csv('staging_files/religious_order_staging.csv', encoding='utf-8-sig')

In [77]:
# Upsert to Salesforce
bulk_data = []
for row in acc_religious_parents.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    religious_order_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_order_upsert = pd.DataFrame(religious_order_upsert)

df_rel_order_upsert

Unnamed: 0,success,created,id,errors
0,True,False,001Dx00001HwE3TIAV,[]
1,True,False,001Dx00001HwE3UIAV,[]
2,True,False,001Dx00001HwE3VIAV,[]
3,True,False,001Dx00001HwE3WIAV,[]
4,True,False,001Dx00001HwE3XIAV,[]
...,...,...,...,...
57,True,False,001Dx00001HwE4OIAV,[]
58,True,False,001Dx00001HwE4PIAV,[]
59,True,False,001Dx00001HwE4QIAV,[]
60,True,False,001Dx00001HwE4RIAV,[]


In [78]:
# Generate an Errors log
import csv

keys = religious_order_upsert[0].keys()

with open('results_files/religious_order_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(religious_order_upsert)

In [79]:
# @title get SF Accounts
get_all_rel_accounts = f"Select id, Name, RecordTypeId, Type, Archdpdx_Migration_Id__c from Account where RecordTypeID = '{religious_recordtype_id}'"

print(religious_recordtype_id)

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_rel_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')

df_sf_accounts.sample(10)

012Dx0000003p52IAA


Unnamed: 0,Id,Name,RecordTypeId,Type,Archdpdx_Migration_Id__c
127,001Dx00001HwE5fIAF,"Society of the Divine Word, Techny, IL (SVD)",012Dx0000003p52IAA,,RelCommunities_77
145,001Dx00001HwFHDIA3,Order of St. Benedict (OSB),012Dx0000003p52IAA,,
78,001Dx00001HwE4sIAF,"Franciscan Friars, Oakland (OFM)",012Dx0000003p52IAA,,RelCommunities_22
97,001Dx00001HwE5BIAV,"Sisters of Jesus the Saviour, Gold Beach (SJS)",012Dx0000003p52IAA,,RelCommunities_45
52,001Dx00001HwE4IIAV,Congregatio Sanctissimi Redemptoris,012Dx0000003p52IAA,,congregatiosanctissimiredemptoris
124,001Dx00001HwE5cIAF,"Society of Mary, Washington, DC (SM)",012Dx0000003p52IAA,,RelCommunities_74
64,001Dx00001HwE4eIAF,"Abbey of Our Lady of Guadalupe, Carlton (OCSO)",012Dx0000003p52IAA,,RelCommunities_2
106,001Dx00001HwE5KIAV,"Society of Mary, Corvallis (SdM)",012Dx0000003p52IAA,,RelCommunities_54
73,001Dx00001HwE4nIAF,"Félix Rougier House of Studies, Mount Angel (M...",012Dx0000003p52IAA,,RelCommunities_14
94,001Dx00001HwE58IAF,"Sisters of Charity of the Blessed Virgin Mary,...",012Dx0000003p52IAA,,RelCommunities_42


In [80]:
religious_order_mapping = df_sf_accounts.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
# religious_order_mapping

### D) Religious Communities


This section stages the 'relCommunities' tables as Religious Account records. 

Dataframes:
- acc_religious_staging
- acc_religious_staging_2 

In [81]:
acc_religious_staging = (acc_religious
                         .rename(columns={'Archdpdx_Migration_Id__c' : 'Parent_Archdpdx_Migration_Id__c'})
)

acc_religious_staging['ParentId'] = acc_religious_staging['Parent_Archdpdx_Migration_Id__c'].map(religious_order_mapping)

In [82]:
pd.set_option('display.max_columns', None)

In [83]:
# Enrich the data

acc_religious_staging['mbfc__Religious_Type__c'] = 'Local Community'
acc_religious_staging['Archdpdx_Migration_Id__c'] = 'RelCommunities_' + acc_religious_staging['Record Number'].astype('str')
acc_religious_staging['RecordTypeId'] = religious_recordtype_id
# acc_religious_staging.drop(columns='Name', inplace=True)
# acc_religious_staging.rename(columns={
#     'Name, City': 'Name'
# }, inplace=True)

In [84]:
acc_religious_staging.sample(5)

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,src_table,Sort Name,Parish City,Parent_Parish,mbfc__Date_Established__c,Vicariate,Non-Latin,County__c,Disabled_Access__c,Sanctuary_Capacity__c,Lat/Long Coordinates Decimal,Google Small Embed URL,Miles to Pastoral Center,Schedule 1 Head,Schedule 1 Text,Schedule 2 Head,Schedule 2 Text,Schedule 3 Head,Schedule 3 Text,Schedule 4 Head,Schedule 4 Text,Schedule 5 Head,Schedule 5 Text,Schedule 6 Head,Schedule 6 Text,Schedule 7 Head,Schedule 7 Text,Community City,Order Full Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Non-Latin Rite,Show Order in Name,Description,Local Superior,Major Superior Name,Major Superior Phone,Major Superior Email,School City,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Parent_Parish__c,ParentId,mbfc__Church_Type__c,Parent_Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,mbfc__Religious_Type__c,Archdpdx_Migration_Id__c
241,67,Religious,"Order of Friars Minor, Conventual","Order of Friars Minor, Conventual, Portland (O...",,False,,,,,,,,,,,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Portland,"Order of Friars Minor, Conventual","Franciscans, Conventual",OFM Conv,Men,No,Yes,,0.0,,,,,,,,,,,,False,,,,,001Dx00001HwE4GIAV,,"orderoffriarsminor,conventual",012Dx0000003p52IAA,124,Local Community,RelCommunities_67
240,66,Religious,"Order of Friars Minor, Capuchins","Order of Friars Minor, Capuchins, Portland (OF...",,False,,,,,,,,,,,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Portland,"Order of Friars Minor, Capuchins",Capuchins,OFM Cap,Men,No,Yes,,0.0,,,,,,,,,,,,False,,,,,001Dx00001HwE4FIAV,,"orderoffriarsminor,capuchins",012Dx0000003p52IAA,124,Local Community,RelCommunities_66
232,56,Religious,Sacred Heart Maronite Monastery,"Sacred Heart Maronite Monastery, Beaverton (MM...",,False,"3880 NW 171st Pl, Beaverton",Portland,OR,,97213.0,,503-690-4425,,maronitemonks@gmail.com,https://www.maronitemonastery.com/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Beaverton,Maronite Monks Of Jesus Mary and Joseph,Maronite Monks,MMJMJ,Men,Yes,Yes,A contemplative order of Maronite Eastern Cath...,684.0,"Bishop A. Eilas Zaidan, DD MLM, Eparchy of Our...",818-626-9193,info@eparchy.org,,,,,,,,,True,PO Box 13723,,,,001Dx00001HwE48IAF,,maronitemonksofjesusmaryandjoseph,012Dx0000003p52IAA,124,Local Community,RelCommunities_56
197,16,Religious,Brotherhood of the People of Praise,"Brotherhood of the People of Praise, Portland",,False,,Portland,OR,,97217.0,,503-230-9999,,,,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Portland,Brotherhood of the People of Praise,Brotherhood of the People of Praise,,Men,No,No,"A religious community of celibate men, lay and...",3009.0,,,,,,,,,,,,False,7709 N Denver Ave,,Diocesan Order,,001Dx00001HwE3bIAF,,brotherhoodofthepeopleofpraise,012Dx0000003p52IAA,124,Local Community,RelCommunities_16
211,35,Religious,Sisters of Our Lady of Sorrows,"Sisters of Our Lady of Sorrows, Beaverton (OSF)",,False,,Beaverton,OR,,97003.0,,503-649-7127,503-259-9507,sisters@olpretreat.org,https://www.olpretreat.org/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Beaverton,Franciscan Missionary Sisters of Our Lady of S...,Franciscans,OSF,Women,No,Yes,,2608.0,"Sr. Anne Marie Warren, OSF",,,,,,,,,,,False,3600 SW 170th Ave,Religious Order,Diocesan Order,,001Dx00001HwE3nIAF,,franciscanmissionarysistersofourladyofso,012Dx0000003p52IAA,124,Local Community,RelCommunities_35


In [85]:
acc_religious_staging_2 = acc_religious_staging[[
    'Name',
    'RecordTypeId',
    'mbfc__Religious_Type__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Job_Id__c',
    'ParentId',
    'Archdpdx_Migration_Id__c'
    ]]

acc_religious_staging_2.sample(5)

Unnamed: 0,Name,RecordTypeId,mbfc__Religious_Type__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Description,Job_Id__c,ParentId,Archdpdx_Migration_Id__c
238,"Heralds of the Good News, Portland (HGN)",012Dx0000003p52IAA,Local Community,c/o Chancellor\n2838 E Burnside St,Portland,OR,97214,,503-233-8322,,vschueler@archdpdx.org,,Heralds of the Good News,HGN,Men,,124,001Dx00001HwE4DIAV,RelCommunities_64
230,"Society of Mary, Corvallis (SdM)",012Dx0000003p52IAA,Local Community,540 NW 9th St,Corvallis,OR,97330,,541-754-1505,,sister.teresa@socmaria.org,https://www.socmaria.org/home,Society of Mary,SdM,Women,An Institute of consecrated missionary sisters...,124,001Dx00001HwE46IAF,RelCommunities_54
236,"Work of Jesus the High Priest, Gresham (OJSS)",012Dx0000003p52IAA,Local Community,OJSS Community\n451 NW 1st St,Gresham,OR,97030,,,,,https://www.familiemariens.info/html/en/index....,Work of Jesus the High Priest,OJSS,Men,Missionary brothers and priests associated wit...,124,001Dx00001HwE4BIAV,RelCommunities_62
233,"Priory of Our Lady of Consolation, Amity (OSsS)",012Dx0000003p52IAA,Local Community,Priory of Our Lady of Consolation\n23300 SW Wa...,Amity,OR,97101,,503-835-8080,503-835-9662,monks@brigittine.org,http://www.brigittine.com/,Brigittines,OSsS,Men,Canonical status of a Priory “Sui Juris”. Brot...,124,001Dx00001HwE49IAF,RelCommunities_57
207,"Benedictine Sisters of Mount Angel, Mount Ange...",012Dx0000003p52IAA,Local Community,Queen of Angels Monastery\n840 S Main St,Mount Angel,OR,97362,,503-845-6141,503-845-6585,info@benedictine-srs.org,https://www.benedictine-srs.org/,Benedictine Sisters of Mount Angel,OSB,Women,"Serving St. Joseph Shelter/Mission Benedict, M...",124,001Dx00001HwE3VIAV,RelCommunities_31


In [86]:
# Final Cleanup

acc_religious_staging_2 = acc_religious_staging_2.fillna('')

In [87]:
# @title Send to CSV
acc_religious_staging_2.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/religious_community_staging.csv', encoding='utf-8-sig')

In [88]:
upsert_to_salesforce_bulk(sf, acc_religious_staging_2, 'Account', 'Archdpdx_Migration_Id__c', 'results_files/religious_comm_results.csv', 100)


Batch 1 processed: 69 successful, 1 failed.
Upsert completed. Total records processed: 70, Batches: 1, Successful upserts: 69, Failed upserts: 1


### E) Religious Superiors


In [89]:
acc_rel_superiors = acc_religious_2[[
    'Name',
    'Major Superior Name',
    'Major Superior Phone',
    'Major Superior Email',
    'Archdpdx_Migration_Id__c']].copy()


acc_rel_superiors['AccountId'] = acc_rel_superiors.Archdpdx_Migration_Id__c.map(religious_order_mapping)

acc_rel_superiors.sample(5)

Unnamed: 0,Name,Major Superior Name,Major Superior Phone,Major Superior Email,Archdpdx_Migration_Id__c,AccountId
247,"Brothers of Saint John, Laredo, TX (CSJ)",,,,brothersofsaintjohn,001Dx00001HwE4MIAV
205,"Adorers of the Holy Cross, Portland (MTG)",,,,adorersoftheholycross,001Dx00001HwE3jIAF
198,"Carmelite House of Studies, Mount Angel (OCD)","Fr. Matthew Williams, O.C.D.",909-793-0424,,ordocarmelitarumdiscalceatorum,001Dx00001HwE3cIAF
233,"Priory of Our Lady of Consolation, Amity (OSsS)",,,,"brigittinemonks,orderofthemostholysavior",001Dx00001HwE49IAF
190,Missionaries of the Holy Spirit Provincial Hou...,,,,misionerosdelespíritusanto,001Dx00001HwE3WIAV


In [90]:
def parse_names(df, column_name):
    # Convert all non-string entries to strings (handling NaN and other data types)
    df[column_name] = df[column_name].fillna('').apply(str)

    # Create a new DataFrame to store the name parts
    name_parts = pd.DataFrame()

    # Parse each name in the column
    name_parts['First Name'] = df[column_name].apply(lambda x: HumanName(x).first if x.strip() != '' else '')
    name_parts['Last Name'] = df[column_name].apply(lambda x: HumanName(x).last if x.strip() != '' else '')
    name_parts['Middle Name'] = df[column_name].apply(lambda x: HumanName(x).middle if x.strip() != '' else '')
    name_parts['Title'] = df[column_name].apply(lambda x: HumanName(x).title if x.strip() != '' else '')
    name_parts['Suffix'] = df[column_name].apply(lambda x: HumanName(x).suffix if x.strip() != '' else '')
    name_parts['Nickname'] = df[column_name].apply(lambda x: HumanName(x).nickname if x.strip() != '' else '')

    # Combine the original DataFrame with the name parts DataFrame
    result_df = pd.concat([df, name_parts], axis=1)
    return result_df



In [93]:
!pip install nameparser
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Rev.', 'Very Rev.', 'Very Rev', 'Sr.', 'Sr. ', 'Very', 'Bishop')
CONSTANTS.suffix_acronyms.add('FRS', 'OMI', 'OSA', 'OCD', 'OFM', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'OP', 'O.S.M', 'OSM' 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President', 'SJ')





SetManager({'faia', 'issp-csp', 'mai', 'mct', 'gri', 'si', 'cp-c', 'dma', 'fnss', 'cwna', 'ndtr', 'cva[22]', 'els', 'chpln', 'agsf', 'clsd', 'chmm', 'kp', 'cna', 'rrt-accs', 'cmas', 'nmd', 'dep', 'qam', 'cic', 'csbe', 'sfp', 'cciso', 'cm', 'qgm', 'cms', 'dnp', 'ncidq', 'psp', 'cgsp', 'hrs', 'rfp', 'cmc', 'apss', 'ccp', 'jp', 'mirm', 'rdcs', 'cnm', 'aca', 'lpc', 'bpe', 'gbe', 'cebs', 'mcm', 'crtt', 'od', 'vrd', 'ams', 'cpcu', 'dvm', 'cgma', 'ms', 'cds', 'ipep', 'cvrs', 'facd', 'facep', 'mp', 'ocd', 'ae', 'bpt', 'cpa', 'capm', 'gmb', 'lpa', 'ph', 'casp', 'rpl', 'hmrf', 'apr', 'cbp', 'chfc', 'ccmt', 'drb', 'fcas', 'iccm-d', 'ccie', 'nbcch', 'nd', 'cfcc', 'grp', 'litl', 'fp&a', 'chrm', 'dso', 'gcsi', 'cwp', 'faspen', 'ei', 'crp', 'chse', 'usar', 'ceh', 'pls', 'crt', 'ed', 'cbrte', 'enp', 'rrt-sds', 'cpacc', 'dbe', 'mlt', 'cbsp', 'pa-c', 'shrm-cp', 'rci', 'sp', 'cbnt', 'mpa', 'mches', 'alp', 'csjp', 'aas', 'ceds', 'dcmg', 'cwsp', 'faap', 'lmsw', 'cea', 'erd', 'ot', 'cipm', 'dpt', 'lvn', 'cf

In [96]:
# Parse Complex Names
acc_rel_superiors_parsed = parse_names(acc_rel_superiors, 'Major Superior Name')

In [97]:
acc_rel_superiors_staging = acc_rel_superiors_parsed.fillna('')

acc_rel_superiors_staging['Archdpdx_Migration_Id__c'] = acc_rel_superiors_staging['Major Superior Name'].apply(lambda x: x.replace(' ','').lower())

# Rename columns
acc_rel_superiors_staging = acc_rel_superiors_staging.rename(columns={
    'Major Superior Phone': 'Phone',
    'Major Superior Email': 'Email',
    'Title': 'Salutation',
    'First Name': 'FirstName',
    'Middle Name': 'MiddleName',
    'Last Name': 'LastName'
})

# Add job id
acc_rel_superiors_staging['Archdpdx_Job_Id__c'] = curr_job_id

# Drop columns
acc_rel_superiors_staging = acc_rel_superiors_staging.drop(columns=['Name', 'Major Superior Name', 'Nickname'])

# Drop empty rows
acc_rel_superiors_staging = acc_rel_superiors_staging[acc_rel_superiors_staging['LastName'].str.strip() != '']

acc_rel_superiors_staging.sample(10)

Unnamed: 0,Phone,Email,Archdpdx_Migration_Id__c,AccountId,FirstName,LastName,MiddleName,Salutation,Suffix,Archdpdx_Job_Id__c
211,,,"sr.annemariewarren,osf",001Dx00001HwE3nIAF,Anne,Warren,Marie,Sr.,OSF,124
218,563-588-2351,,"ladonnamanternach,bvm–president",001Dx00001HwE3uIAF,BVM,LaDonna Manternach,– President,,,124
215,011 52 55 58 72 20 0,hmrf@misionerasdefatima.org,"candelarianavarroalvarado,hmrf",001Dx00001HwE3rIAF,Candelaria,Alvarado,Navarro,,HMRF,124
236,,,fr.gebhardpaulm.sigl,001Dx00001HwE4BIAV,Gebhard,Sigl,Paul M.,Fr.,,124
219,314-397-9436,tponder@gspmna.org,toniponder,001Dx00001HwE3vIAF,Toni,Ponder,,,,124
207,,,"sr.janehibbard,snjmmonasteryadministrator",001Dx00001HwE3VIAV,SNJM,Sr. Jane Hibbard,Monastery Administrator,,,124
210,973-403-3331,dominicans@caldwellop.org,"sr.luellaramm,op",001Dx00001HwE3mIAF,Luella,Ramm,,Sr.,OP,124
192,203-238-2243,,"mothermiriamseiferman,fse",001Dx00001HwE3YIAV,Miriam,Seiferman,,Mother,FSE,124
186,,,"rev.seancarroll,sj",001Dx00001HwE3TIAV,Sean,Carroll,,Rev.,SJ,124
245,617-536-4141,office@omvusa.org,"fr.jimwalther,omv",001Dx00001HwE4KIAV,Jim,Walther,,Fr.,OMV,124


In [98]:
acc_rel_superiors_staging.sample(10)

Unnamed: 0,Phone,Email,Archdpdx_Migration_Id__c,AccountId,FirstName,LastName,MiddleName,Salutation,Suffix,Archdpdx_Job_Id__c
239,,gensec@omigen.org,"fr.luisignacioroisalonso,omi",001Dx00001HwE4EIAV,Luis,Alonso,Ignacio Rois,Fr.,OMI,124
235,630-424-0401,schprov@ aol.com,fr.jacekwalkiewicz,001Dx00001HwE4AIAV,Jacek,Walkiewicz,,Fr.,,124
194,,osa-west@calprovince.org,"rev.garysanders,osa",001Dx00001HwE3aIAF,Gary,Sanders,,Rev.,OSA,124
243,,,rogériogomes,001Dx00001HwE4IIAV,Rogério,Gomes,,,,124
227,610-459-4125,tfirenze@osfphila.org,"sr.theresamariefirenze,osf",001Dx00001HwE43IAF,Theresa,Firenze,Marie,Sr.,OSF,124
230,,hermana.veronica@socmaria.org,motherveronicalopez,001Dx00001HwE46IAF,Veronica,Lopez,,Mother,,124
254,510-658-8722,provincial@opwest.org,"veryrev.christopherfadok,op,provincial",001Dx00001HwE3dIAF,Christopher,Fadok,,Very Rev.,"OP, Provincial",124
210,973-403-3331,dominicans@caldwellop.org,"sr.luellaramm,op",001Dx00001HwE3mIAF,Luella,Ramm,,Sr.,OP,124
192,203-238-2243,,"mothermiriamseiferman,fse",001Dx00001HwE3YIAV,Miriam,Seiferman,,Mother,FSE,124
219,314-397-9436,tponder@gspmna.org,toniponder,001Dx00001HwE3vIAF,Toni,Ponder,,,,124


In [99]:
# Send to CSV
acc_rel_superiors_staging.to_csv('staging_files/religious_superiors_staging.csv', encoding='utf-8-sig')

In [100]:
# Upsert to Salesforce

def find_existing_contact(sf, first_name, last_name):
    query = f"SELECT Id, Archdpdx_Migration_Id__c FROM Contact WHERE FirstName = '{first_name}' AND LastName = '{last_name}'"
    result = sf.query(query)
    return result['records']



bulk_data = []
for row in acc_rel_superiors_staging.itertuples(index=False):
    d = row._asdict()
    existing_contacts = find_existing_contact(sf, d['FirstName'], d['LastName'])
    if existing_contacts:
        # Update existing contact with external ID
        d['Id'] = existing_contacts[0]['Id']
        bulk_data.append(d)
    else:
        bulk_data.append(d)


if run_upserts == 'True':
    religious_superior_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_superior_upsert = pd.DataFrame(religious_superior_upsert)

df_rel_superior_upsert

Unnamed: 0,success,created,id,errors
0,False,False,,"[{'statusCode': 'DUPLICATE_VALUE', 'message': ..."
1,True,False,003Dx00000nKikgIAC,[]
2,True,False,003Dx00000nKikhIAC,[]
3,True,False,003Dx00000nKikiIAC,[]
4,False,True,,"[{'statusCode': 'INVALID_EMAIL_ADDRESS', 'mess..."
5,True,False,003Dx00000nKikjIAC,[]
6,True,False,003Dx00000nKikkIAC,[]
7,False,False,,"[{'statusCode': 'DUPLICATE_VALUE', 'message': ..."
8,True,False,003Dx00000nKiklIAC,[]
9,True,False,003Dx00000nKikmIAC,[]


In [101]:
# Update Religious Communities with Rel. Superior

# TODO: It would take much less time to simply do this post-migration manually.

# CONTACTS


## Extract


In [102]:
import pandas as pd
df_contacts = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .set_index('Record Number', verify_integrity=True)
               .drop(index='recNum') # Drops the extra row that replicates the labels
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
)

df_contacts.sample(10)


Unnamed: 0_level_0,Common_Name,Sort_Name,Type(s),Clergy_Status,Religious_Status,Login_ID,Password,Password_Must_be_Changed,Access_Permission,Spouse,Title,Salutation,Christian_Name,Nickname,Middle_Name(s),Surname,Suffix,Mailing_Address,Mailing_Address_2,Mailing_Address_City,Mailing_Address_State,Mailing_Address_Province,Mailing_Address_Postal_Code,Mailing_Address_Country,Private_Address,Private_Address_2,Private_Address_City,Private_Address_State,Private_Address_Province,Private_Address_Postal_Code,Private_Address_Country,Preferred_Address,Work_Phone,Home_Phone,Cell_Phone,Preferred_Phone,Work_Email,Archdiocesan_Email,Home_Email,Preferred_Email,Directory_Include,Directory_Include_Middle_Name,Directory_Include_Suffix,Suppress_From_Reports,Seminarian_Student_Debt,Seminarian_Medical_Benefits,Send_Group_Mail_and_Email,Birth_Date,Place_of_Birth,Foreign_Born,Father_Full_Name,Mother_Full_Maiden_Name,Foreign_Citizenship,Immigration_Status,Passport/Visa_Expiration_Date,Social_Security_Account_Number,Baptism_Date,Place_of_Baptism,Confirmation_Date,Place_of_Confirmation,Received_Date,Parish_of_Record,Marriage_Date,Place_of_Marriage,Date_of_First_Vows,Date_of_Final_Vows,Accepted_to_Formation_Date,Reader_Date,Acolyte_Date,Candidacy_Date,Formation_Withdrawn_Date,Formation_Deferred_Date,Formation_Terminated_Date,Terminate_or_Defer_Note,Bachelor_Degree_Year,Bachelor_Degree_Type,Bachelor_Degree_Institution,Graduate_1_Degree_Year,Graduate_1_Degree_Type,Graduate_1_Degree_Institution,Graduate_2_Degree_Year,Graduate_2_Degree_Type,Graduate_2_Degree_Institution,Graduate_3_Degree_Year,Graduate_3_Degree_Type,Graduate_3_Degree_Institution,Graduate_4_Degree_Year,Graduate_4_Degree_Type,Graduate_4_Degree_Institution,CARA_Highest_Ed_Level,Diaconal_Ordination_Date,Diaconal_Ordination_Place,Diaconal_Ordination_Prelate,Presbyteral_Ordination_Date,Presbyteral_Ordination_Place,Presbyteral_Ordination_Prelate,Episcopal_Ordination_Date,Episcopal_Ordination_Place,Episcopal_Ordination_Prelate,Ordination_Diocese,Incardinated_From_Diocese,Incardinated_From_Date,Incardinated_Now,Serving_Now,Excardinated_To_Diocese,Excardinated_To_Date,Letter_of_Good_Standing_Date,Religious_In_Archdiocese_Date,Faculties,Faculties_Granted_Date,Faculties_Restricted_Date,Faculties_Withdrawn_Date,Last_Retreat_Date,Last_Educ_Requirement_Date,Policy_Manual_Acknowledgement_Date,Harassment_Prevention_Course_Date,Standards_of_Conduct_Date,Last_Background_Check_Date,Last_Child_Protection_Training_Date,Out_of_Diocese_Date,Senior_Status_Date,Laicized_Date,Deceased_Date,Languages,Coverage_Availability,Advanced_Directive_Date,End_of_Life_Plan_Date,Will_Date,Will_Note,CIC_489_File,Registered_Parish,CARA_Ethnicity,Seminarian_Status,Other_Diaconal_Ministry,Spiritual_Director_Authorized,Link_to_Religious_Community,Place_of_Work,Volunteer_Place,Type_of_Work,Work_Load,Work_Title
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1,Unnamed: 138_level_1,Unnamed: 139_level_1,Unnamed: 140_level_1,Unnamed: 141_level_1
442,Mr. Bob Nelson,nelson bob james,Archive,,,,,,,0,Mr.,Mr.,Robert,Bob,James,Nelson,,PO Box 417,,Gervais,OR,,97026.0,,675 NE 3rd St,,Gervais,OR,,97026.0,,,503-682-1951,,503-537-8990,,robert.nelson@penske.com,,janiceandbob@hotmail.com,,,No,No,,0,,,1973-03-18,,,Ronald Ray Nelson,Diane Marie Straub,,,,,1973-10-07,"Ascension Catholic Church, Portland, OR",1991-03-30,,,,1999-07-10,,,,,,,,,2017-05-23,,,,,,,,,,,,,,,,,,Some college/Associate degree,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,Caucasian/white,,,,0,,,,,
3017,Mr. Noah Maffett,maffett noah john,Seminarian,,,,,,,0,Mr.,Mr.,Noah,,John,Maffett,,Bishop White Seminary,429 E Sharp Ave,Spokane,WA,,99202.0,,2620 S Silver Lake Rd,,Castle Rock,WA,,98611.0,,Mailing,,360-274-3297,360-261-8475,Do Not Include,,nmaffett@archdpdx.org,adventusmaria@gmail.com,Archdiocese,Yes,,,,0,No,Yes,2002-04-12,"Castle Rock, WA",No,Joseph D. Maffett,Jean Maffett,,,,uAPfum+L+3YINCcQukEUKQ==,2002-04-14,"St. Birgitta Parish, Portland, OR",2015-06-30,"St. Birgitta Parish, Portland, OR",,,,,,,2023-06-15,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2023-04-06,2023-04-06,,,,,,,,,,,,,0,,,,,0,,,,,
2858,Mr. Will Christensen,christensen will,Staff,,,,,,,0,Mr.,Mr.,William,Will,,Christensen,,St. Thomas More Newman Center Parish,1850 Emerald St,Eugene,OR,,97403.0,,,,,,,,,,541-343-7021,,,,peerminister@uonewman.org,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
2972,Ms. Kelly Littleton,littleton kelly,Staff,,,,,,,0,Ms.,Ms.,Kelly,,,Littleton,J.C.L.,Pastoral Center,2838 E Burnside St,Portland,OR,,97214.0,,,,,,,,,,503-234-8383,,,,klittleton@archdpdx.org,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
805,Rev. Paulinus Mangesho,mangesho paulinus amas,"Priest,Religious",Active,Active,pmangesho,327d12b77d310ae6ac08c619db86affed804b7cc4655bc...,No,,0,Rev.,Fr.,Paulinus,,Amas,Mangesho,,Immaculate Heart of Mary Parish,2926 N Williams Ave,Portland,OR,,97227.0,,,,,,,,,,503-287-3724,,503-409-2813,,,pmangesho@archdpdx.org,pamasho5@yahoo.com,,Yes,No,No,No,0,,Yes,1958-04-26,Tanzania,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1991-06-30,"Diocese of Moshi, Tanzania",,,,,Apostolic Life Community of Priests in the Opu...,,,Apostolic Life Community of Priests in the Opu...,Archdiocese of Portland in Oregon,,,2017-01-01,2014-07-01,General,2017-07-01,,,,,,2022-04-13,2016-03-18,2021-05-10,2023-07-12,,,,,,,,,,,,0,,,,,11,"Immaculate Heart Parish, Portland",,Pastor,Full Time,Pastor
3081,Mr. Robert Fraley,fraley robert,Staff,,,,,,,0,Mr.,Mr.,Robert,,,Fraley,,Marist Catholic High School,1900 Kingsley Rd,Eugene,OR,,97401.0,,,,,,,,,,541-686-2234,,,,rfraley@marisths.org,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
976,Rev. Juan Chiarinoti,chiarinoti juan carlos,Priest,Transferred Out,,,,,,0,Rev.,Fr.,Juan,,Carlos,Chiarinoti,,,,,,,,,,,,,,,,,,,,,,,,,No,No,No,No,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,0,,,,,0,,,,,
2485,"Sr. Mary Josephine Thai Phuong Doan, MTG",doan mary josephine thai phuong,Religious,,Active,,,,,0,Sr.,Sr.,Mary Josephine Thai Phuong,,,Doan,,Adorers of the Holy Cross,7408 SE Alder St,Portland,OR,,97215.0,,,,,,,,,,503-254-3284,503-249-5892,,Work,,,,,Yes,,,,0,,Yes,,,,,,,,,,,,,,,,,,1995-07-16,2002-08-10,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,29,"Our Lady of Lavang Parish, Portland",,Administrative,Full Time,Secretary
2165,Mr. Ed Kurtz,kurtz ed,Staff,,,,,,,0,Mr.,Mr.,Ed,,,Kurtz,,St. Henry Parish,346 NW 1st St,Gresham,OR,,97030.0,,,,,,,,,,503-665-9129,,,,,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
2548,Ms. Virginia Calcagno,calcagno virginia,Staff,,,,,,,0,Ms.,Ms.,Virginia,,,Calcagno,,St. Thomas More Parish,3525 SW Patton Rd,Portland,OR,,97221.0,,,,,,,,,,503-222-2055,,,,,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,


#### Get Photos


In [103]:
import os
import pandas as pd

# def list_jpeg_files(directory):
#     data = []
#     for filename in os.listdir(directory):
#         if filename.endswith(".jpeg") or filename.endswith(".jpg"):  # Checking for jpeg files
#             full_path = os.path.join(directory, filename)
#             data.append({'Filename': filename, 'Full Path': full_path})
#     return pd.DataFrame(data)

# # Specify your directory
# directory = '/content/drive/Shareddrives/Clients/ADPDX (Portland)/Data/Clergy DB/sql_backup/archdpdx.info backups/public_html/people/graphics/portraits/large'
# jpeg_files_df = list_jpeg_files(directory)


In [104]:
# # Query for the Library
# library_query = "SELECT Id, Name FROM ContentWorkspace WHERE Name = 'ADPDX Person Profile Photos'"
# library_result = sf.query(library_query)

# # Check if the library exists and get its ID
# if library_result['records']:
#     library_id = library_result['records'][0]['Id']
#     print(f"Library ID: {library_id}")

#     # Query for the Folder within the Library
#     folder_query = f"SELECT Id, Name FROM ContentFolder WHERE ParentContentFolderId = '{library_id}'"
#     folder_result = sf.query(folder_query)

#     # Check if the folder exists and get its ID
#     if folder_result['records']:
#         folder_id = folder_result['records'][0]['Id']
#         print(f"Folder ID: {folder_id}")
#     else:
#         print("Folder 'Large JPEGs' not found in the library.")
# else:
#     print("Library 'ADPDX Person Profile Photos' not found.")

## Analysis

Here we check the various columns and their types, count where values exist, count of unique values, sample data, etc.

DF shape:

- 142 columns
- 3017 rows


In [105]:
# Check the original shape of the imported CSV
print(f"Shape of original data set: {df_contacts.shape}")

# export to csv a list of the contact fields with count, unique, top, freq
contacts_describe = df_contacts.describe(include='all').transpose()
contacts_describe.to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/analysis/contacts_describe.csv')

df_contacts.describe(include='all').transpose()  #initial analysis of the Contacts table

Shape of original data set: (3016, 141)


Unnamed: 0,count,unique,top,freq
Common_Name,3016,3011,Ms. Leslie Jones,2
Sort_Name,3016,3009,nguyen anthony,3
Type(s),3016,29,Staff,1139
Clergy_Status,1138,8,Transferred Out,462
Religious_Status,902,4,Active,456
...,...,...,...,...
Place_of_Work,269,133,Mount Angel Abbey,37
Volunteer_Place,54,47,Mary’s Woods,4
Type_of_Work,276,117,Pastoral Ministry,30
Work_Load,262,2,Full Time,230


In [106]:
unique_languages = df_contacts['Languages'].unique()
unique_languages

array([nan, 'English,Spanish', 'Igbo', 'English, Spanish',
       'Spanish, Mayaqeqchi', 'Spanish (Mass only)',
       'Latin Mass and written translation. Read French, Italian, Spanish.',
       'Spanish', 'Hindi, Konkani, Tamil',
       'French (fluent), Spanish (beginner), Latin (beginner)',
       'German, Spanish, Italian, French', 'Kiswahili, Kichagga',
       'Spanish (English is second language)',
       'German, Spanish, Italian, Latin Mass',
       'English, Spanish, Italian', 'Spanish, Italian', 'English',
       'Bicolango, Tagalog, Spanish', 'Spanish, Italian, Latin Mass',
       'Italian', 'Tagalog, English, Spanish',
       'French, Italian, Aramaic (modern), Spanish', 'Vietnamese',
       'German, Spanish', 'English,Spanish,Italian',
       'Conversant in Italian and Spanish, some facility with Latin and German',
       'English, Spanish, Latin Mass', 'Italian, Spanish',
       'Konkani, Hindi, Marathi, Spanish',
       'Tagalog, Bicol, Spanish (Mass only)', 'Spanish, E

In [107]:
# import re
# import numpy as np


# def deduplicate_languages(list_languages):
#     # Define a regular expression pattern to match periods and punctuation
#     punctuation_pattern = r'[.,!?;:"]'

#     # Flatten the array and filter out NaN values
#     flattened_languages = [re.sub(punctuation_pattern, '', lang) for sublist in list_languages if pd.notna(sublist) for lang in sublist.split(',')]

#     # Deduplicate the list of languages
#     unique_languages = list(set(flattened_languages))

#     return unique_languages


# # Example usage:
# unique_languages = deduplicate_languages(unique_languages)
# print(unique_languages)


## Transform


In [108]:
# list of columns NOT to be migrated as Contact attributes
misc_columns_to_drop = [
    'Password',
    'Password_Must_be_Changed',
    'Sort_Name'
]

affiliation_columns = [
    'Baptism_Date',
    'Place_of_Baptism',
    'Confirmation_Date',
    'Place_of_Confirmation',
    'Received_Date',
    'Parish_of_Record',
    'Marriage_Date',
    'Place_of_Marriage',
    'Date_of_First_Vows',
    'Date_of_Final_Vows',
    'Reader_Date',
    'Acolyte_Date',
    'Bachelor_Degree_Year',
    'Bachelor_Degree_Type',
    'Bachelor_Degree_Institution',
    'Graduate_1_Degree_Institution',
    'Graduate_1_Degree_Type',
    'Graduate_1_Degree_Year',
    'Graduate_2_Degree_Institution',
    'Graduate_2_Degree_Type',
    'Graduate_2_Degree_Year',
    'Graduate_3_Degree_Institution',
    'Graduate_3_Degree_Type',
    'Graduate_3_Degree_Year',
    'Graduate_4_Degree_Institution',
    'Graduate_4_Degree_Type',
    'Graduate_4_Degree_Year',
    'Diaconal_Ordination_Date',
    'Diaconal_Ordination_Place',
    'Diaconal_Ordination_Prelate',
    'Presbyteral_Ordination_Date',
    'Presbyteral_Ordination_Place',
    'Presbyteral_Ordination_Prelate',
    'Episcopal_Ordination_Date',
    'Episcopal_Ordination_Place',
    'Episcopal_Ordination_Prelate',
    'Incardinated_From_Date',
    'Incardinated_From_Diocese',
    'Excardinated_To_Diocese',
    'Excardinated_To_Date',
    'Faculties',
    'Faculties_Granted_Date',
    'Faculties_Restricted_Date',
    'Faculties_Withdrawn_Date',
]

# These fields need to be KEPT but while building the SF upsert flow these are dropped temporarily until mapping logic is included.
# TODO

fields_not_yet_mapped = [
    'Common_Name',
    'Spouse',
    'Father_Full_Name',
    'Mother_Full_Maiden_Name',
    'Mailing_Address_Province',
    'Private_Address_Province',
    # 'Preferred_Address',
    # 'Private_Address__Street__s',
    # 'Private_Address_2',
    # 'Private_Address__City__s',
    # 'Private_Address__StateCode__s',
    # 'Private_Address__PostalCode__s',
    # 'Private_Address__CountryCode__s',
    'Preferred_Email',
    'Preferred_Phone',
    'Social_Security_Account_Number__c',  # The data is encrypted
    'Serving_Now',
    'Ordination_Diocese',
    'Registered_Parish'

]

In [109]:
# UDF to combine multiple Mailing Street Address lines into one
def combine_addresses(row, *columns):
    address_parts = []
    for col in columns:
        value = row[col]
        if pd.notnull(value):  # Check for non-null values
            address_parts.append(str(value))  # Convert to string
    return '\n'.join(address_parts)  # '\n' for line break

In [110]:
df_contact_staging = (df_contacts
                      .drop(columns='Salutation')
                      .rename(columns={
                          'Clergy_Status' : 'ADPDX_Clergy_Status__c',
                          'Religious_Status' : 'ADPDX_Religious_Status__c',
                          'Login_ID' : 'ADPDX_Login_ID__c',
                          'Access_Permission': 'ADPDX_Access_Permission__c',
                          'Title': 'Salutation',
                          'Christian_Name': 'FirstName',
                          'Middle_Name(s)': 'MiddleName',
                          'Surname': 'LastName',
                          'Suffix': 'Suffix',
                          'Preferred_Address': 'Preferred_Address__c',
                          'Mailing_Address_City': 'MailingCity',
                          'Mailing_Address_State': 'MailingState',
                          'Mailing_Address_Postal_Code': 'MailingPostalCode',
                          'Mailing_Address_Country': 'MailingCountry',
                          'Private_Address_City': 'OtherCity',
                          'Private_Address_State': 'OtherState',
                          'Private_Address_Postal_Code': 'OtherPostalCode',
                          'Private_Address_Country': 'OtherCountry',
                          'Work_Phone': 'npe01__WorkPhone__c',
                          'Home_Phone': 'HomePhone',
                          'Cell_Phone': 'MobilePhone',
                        #   'Preferred_Phone': 'npe01__PreferredPhone__c',
                          # IF Preferred phone contains, 'do not publish'
                          'Work_Email' : 'npe01__WorkEmail__c',
                          'Archdiocesan_Email': 'npe01__AlternateEmail__c',
                          'Home_Email': 'npe01__HomeEmail__c',
                        #   'Preferred_Email': 'npe01__Preferred_Email__c',
                          # IF Preferred email contains 'do not publish''
                          'Directory_Include': 'Directory_Include__c',
                          'Directory_Include_Middle_Name': 'Directory_Include_Middle_Name__c',
                          'Directory_Include_Suffix': 'Directory_Include_Suffix__c',
                          'Suppress_From_Reports': 'Suppress_From_Reports__c',
                          'Send_Group_Mail_and_Email': 'Send_Group_Mail_and_Email__c',
                          'Birth_Date': 'Birthdate',
                          'Place_of_Birth': 'mbfc__Place_of_Birth__c',
                          'Foreign_Born': 'Foreign_Born__c',
                          'Foreign_Citizenship': 'Foreign_Citizenship__c',
                          'Immigration_Status': 'Immigration_Status__c',
                          'Passport/Visa_Expiration_Date': 'Passport_Visa_Expiration_Date__c',
                          'Social_Security_Account_Number': 'Social_Security_Account_Number__c',
                          'Deceased_Date': 'mbfc__Date_of_Death__c',
                          'Out_of_Diocese_Date': 'mbfc__Date_Left_Diocese__c', 
                          'CARA_Ethnicity': 'adpdx_CARA_Ethnicity__c',
                          'Seminarian_Status': 'adpdx_Seminarian_Status__c',
                          'Other_Diaconal_Ministry': 'adpdx_Other_Diaconal_Ministry__c',
                          'Spiritual_Director_Authorized': 'adpdx_Spiritual_Director_Authorized__c',
                          'Place_of_Work': 'adpdx_Place_of_Work__c',
                          'Volunteer_Place': 'adpdx_Volunteer_Place__c',
                          'Type_of_Work': 'adpdx_Type_of_Work__c',
                          'Work_Load': 'adpdx_Work_Load__c',
                          'Work_Title': 'adpdx_Work_Title__c',
                          'Coverage_Availability': 'adpdx_Coverage_Availability__c', 
                          'Advanced_Directive_Date': 'adpdx_Advanced_Directive_Date__c',
                          'End_of_Life_Plan_Date': 'adpdx_End_of_Life_Plan_Date__c',
                          'Will_Date': 'adpdx_Will_Date__c',
                          'Will_Note': 'adpdx_Will_Note__c',
                          'CIC_489_File': 'adpdx_CIC_489_File__c',
                          'Senior_Status_Date': 'adpdx_Senior_Status_Date__c', 
                          'Laicized_Date': 'adpdx_Laicized_Date__c',
                          'Seminarian_Student_Debt': 'adpdx_Seminarian_Student_Debt__c',
                          'Seminarian_Medical_Benefits': 'adpdx_Seminarian_Medical_Benefits__c',
                          'Candidacy_Date': 'adpdx_Candidacy_Date__c',
                          'Accepted_to_Formation_Date': 'adpdx_Accepted_to_Formation_Date__c',
                          'Formation_Withdrawn_Date': 'adpdx_Formation_Withdrawn_Date__c',
                          'Formation_Deferred_Date': 'adpdx_Formation_Deferred_Date__c',
                          'Formation_Terminated_Date': 'adpdx_Formation_Terminated_Date__c',
                          'Terminate_or_Defer_Note': 'adpdx_Terminate_or_Defer_Note__c',
                          'CARA_Highest_Ed_Level': 'adpdx_CARA_Highest_Ed_Level__c',
                          'Letter_of_Good_Standing_Date': 'adpdx_Letter_of_Good_Standing__c',
                          'Religious_In_Archdiocese_Date': 'mbfc__Date_of_Arrival_in_Diocese__c',
                          'Last_Retreat_Date': 'adpdx_Last_Retreat_Date__c',
                          'Last_Educ_Requirement_Date': 'adpdx_Last_Educ_Requirement_Date__c',
                          'Policy_Manual_Acknowledgement_Date': 'adpdx_Policy_Manual_Acknowledgement_Date__c',
                          'Harassment_Prevention_Course_Date': 'adpdx_Harassment_Prevention_Course_Date__c',
                          'Standards_of_Conduct_Date': 'adpdx_Standards_of_Conduct_Date__c',
                          'Last_Background_Check_Date': 'adpdx_Last_Background_Check_Date__c',
                          'Last_Child_Protection_Training_Date': 'adpdx_Last_Child_Protection_Training__c',
                          'Languages': 'Languages__c',
                          'Nickname': 'adpdx_Preferred_Name__c'

                          })
                      .assign(Bi_Ritual__c=lambda x: x['Type(s)'].str.contains('Biritual'))
                      .assign(Non_Latin_Rite__c=lambda x: x['Type(s)'].str.contains('Non-Latin Rite'))
                      .assign(adpdx_Discerner_Aspirant_for_Diaconate__c=lambda x: x['Type(s)'].str.contains('Diaconate'))
                      .assign(adpdx_Is_Seminarian__c=lambda x: x['Type(s)'].str.contains('Seminar'))
                      
                      .assign(Archdpdx_Migration_Id__c=lambda x: x.index)
                      .assign(MailingStreet=lambda x: x.apply(lambda row: combine_addresses(row, 'Mailing_Address', 'Mailing_Address_2'), axis=1))
                      .drop(columns=['Mailing_Address', 'Mailing_Address_2'])  # Optional: Drop original columns if not needed
                      .assign(OtherStreet=lambda x: x.apply(lambda row: combine_addresses(row, 'Private_Address', 'Private_Address_2'), axis=1))
                      .drop(columns=['Private_Address', 'Private_Address_2'])  # Optional: Drop original columns if not needed
                      .drop(columns=misc_columns_to_drop)
                      .drop(columns=affiliation_columns)
                      .drop(columns=fields_not_yet_mapped)

        )


In [111]:
df_contact_staging.columns

Index(['Type(s)', 'ADPDX_Clergy_Status__c', 'ADPDX_Religious_Status__c',
       'ADPDX_Login_ID__c', 'ADPDX_Access_Permission__c', 'Salutation',
       'FirstName', 'adpdx_Preferred_Name__c', 'MiddleName', 'LastName',
       'Suffix', 'MailingCity', 'MailingState', 'MailingPostalCode',
       'MailingCountry', 'OtherCity', 'OtherState', 'OtherPostalCode',
       'OtherCountry', 'Preferred_Address__c', 'npe01__WorkPhone__c',
       'HomePhone', 'MobilePhone', 'npe01__WorkEmail__c',
       'npe01__AlternateEmail__c', 'npe01__HomeEmail__c',
       'Directory_Include__c', 'Directory_Include_Middle_Name__c',
       'Directory_Include_Suffix__c', 'Suppress_From_Reports__c',
       'adpdx_Seminarian_Student_Debt__c',
       'adpdx_Seminarian_Medical_Benefits__c', 'Send_Group_Mail_and_Email__c',
       'Birthdate', 'mbfc__Place_of_Birth__c', 'Foreign_Born__c',
       'Foreign_Citizenship__c', 'Immigration_Status__c',
       'Passport_Visa_Expiration_Date__c',
       'adpdx_Accepted_to_Formatio

In [112]:
df_contact_staging.MailingStreet.sample(10)

Record Number
1132                                                     
1389                                                     
3303        All Saints School\n601 NE Cesar E Chavez Blvd
2267            The Madeleine School\n3240 NE 23rd Avenue
3272         St. Philip Benizi Parish\n18211 S Henrici Rd
428                                         717 N 3rd Ave
1543       University of Portland\n5000 N Willamette Blvd
1002                                                     
2658    Benedictine Sisters of Mount Angel\n840 S Main St
1644               St. Francis Parish\n15651 SW Oregon St
Name: MailingStreet, dtype: object

### Languages


In [113]:
# # Define a function to clean the 'languages' column

# import re
# def clean_languages(text):
#     if pd.isna(text):
#         return text
#     # Remove text inside parentheses
#     text = re.sub(r'\(.*?\)', '', text)
#     # Replace ' & ' or ' and ' with ';'
#     text = re.sub(r' & | and ', ';', text)
#     # Replace commas with semicolons
#     text = text.replace(',', ';')
#     # Remove spaces before and after semicolons
#     text = re.sub(r'\s*;\s*', ';', text)
#     return text.strip(';')

# # Apply the cleaning function to the 'languages' column
# df_contact_staging['Languages__c'] = df_contact_staging['Languages__c'].apply(clean_languages)


### Private Address Handling


In [114]:
# If 'OtherStreet' is not null, then set Secondary Address Type to 'Private'.  This is because the 'OtherAddress' fields all come from the 'Private' address fields in source system. 
df_contact_staging['npe01__Secondary_Address_Type__c'] = df_contact_staging['OtherStreet'].apply(lambda x: 'Private' if pd.notnull(x) else None)


### Handle Boolean Fields


In [115]:
boolean_columns_to_convert = ['Foreign_Born__c', 'Directory_Include__c', 'Directory_Include_Middle_Name__c', 'Directory_Include_Suffix__c',
       'Suppress_From_Reports__c', 'Send_Group_Mail_and_Email__c', ]

df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].replace({'Yes': True, 'No': False})


In [116]:
df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].fillna(False)

df_contact_staging[boolean_columns_to_convert].sample(5)

Unnamed: 0_level_0,Foreign_Born__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,Send_Group_Mail_and_Email__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1530,False,False,False,False,False,False
1981,False,False,False,False,False,True
2185,False,False,False,False,False,True
916,False,False,False,False,False,True
2740,True,False,False,False,False,True


### Set Contact Record Type


In [117]:
# Set Record Type

# Go down row by row and check the 'Type(s)' columns, check for certain words that are keys in a dictionary, and
# the that row's 'Type(s)' field contains a string that is in the a key in a dictionary the update another columns
# called 'ContactRecordType' with the paired value.

contact_type_map = {
    'Bishop': 'Priest',
    'Priest': 'Priest',
    'Transitional Deacon': 'Permanent_Deacon',
    'Permanent Deacon': 'Permanent_Deacon',
    'Seminarian': 'Lay_Person',
    'Diaconate Formation': 'Lay_Person',
    'Seminary Applicant': 'Lay_Person',
    'Diaconate Inquirer': 'Lay_Person',
    'Wife': 'Lay_Person',
    'Religious': 'Religious',
    'Staff': 'Lay_Person',
    'Seminary Applicant': 'Lay_Person',
    'Archive': 'Lay_Person'
}

def update_contact_record_type(row):
    for key, value in contact_type_map.items():
        if key in row['Type(s)']:
            return value
    return None

df_contact_staging['ContactRecordType'] = df_contact_staging.apply(update_contact_record_type, axis=1)

In [118]:
# Map in the RecordTypeIDs
df_contact_staging['RecordTypeID'] = df_contact_staging['ContactRecordType'].map(record_types_mapping)

### Ecclesial Status & Ministerial Status


In [119]:
df_contact_staging

Unnamed: 0_level_0,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,adpdx_Preferred_Name__c,MiddleName,LastName,Suffix,MailingCity,MailingState,MailingPostalCode,MailingCountry,OtherCity,OtherState,OtherPostalCode,OtherCountry,Preferred_Address__c,npe01__WorkPhone__c,HomePhone,MobilePhone,npe01__WorkEmail__c,npe01__AlternateEmail__c,npe01__HomeEmail__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,adpdx_Seminarian_Student_Debt__c,adpdx_Seminarian_Medical_Benefits__c,Send_Group_Mail_and_Email__c,Birthdate,mbfc__Place_of_Birth__c,Foreign_Born__c,Foreign_Citizenship__c,Immigration_Status__c,Passport_Visa_Expiration_Date__c,adpdx_Accepted_to_Formation_Date__c,adpdx_Candidacy_Date__c,adpdx_Formation_Withdrawn_Date__c,adpdx_Formation_Deferred_Date__c,adpdx_Formation_Terminated_Date__c,adpdx_Terminate_or_Defer_Note__c,adpdx_CARA_Highest_Ed_Level__c,Incardinated_Now,adpdx_Letter_of_Good_Standing__c,mbfc__Date_of_Arrival_in_Diocese__c,adpdx_Last_Retreat_Date__c,adpdx_Last_Educ_Requirement_Date__c,adpdx_Policy_Manual_Acknowledgement_Date__c,adpdx_Harassment_Prevention_Course_Date__c,adpdx_Standards_of_Conduct_Date__c,adpdx_Last_Background_Check_Date__c,adpdx_Last_Child_Protection_Training__c,mbfc__Date_Left_Diocese__c,adpdx_Senior_Status_Date__c,adpdx_Laicized_Date__c,mbfc__Date_of_Death__c,Languages__c,adpdx_Coverage_Availability__c,adpdx_Advanced_Directive_Date__c,adpdx_End_of_Life_Plan_Date__c,adpdx_Will_Date__c,adpdx_Will_Note__c,adpdx_CIC_489_File__c,adpdx_CARA_Ethnicity__c,adpdx_Seminarian_Status__c,adpdx_Other_Diaconal_Ministry__c,adpdx_Spiritual_Director_Authorized__c,Link_to_Religious_Community,adpdx_Place_of_Work__c,adpdx_Volunteer_Place__c,adpdx_Type_of_Work__c,adpdx_Work_Load__c,adpdx_Work_Title__c,Bi_Ritual__c,Non_Latin_Rite__c,adpdx_Discerner_Aspirant_for_Diaconate__c,adpdx_Is_Seminarian__c,Archdpdx_Migration_Id__c,MailingStreet,OtherStreet,npe01__Secondary_Address_Type__c,ContactRecordType,RecordTypeID
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1
2766,Priest,Transferred Out,,sabukaka,,Rev.,Stephen,,Ozovehe,Abaukaka,,Tualatin,OR,97062,,Portland,OR,97202,,Mailing,503-430-7699,,773-733-3772,,,abstoz@yahoo.com,True,False,False,False,0,,True,1967-06-07,,False,,,,,,,,,,,"Diocese of Lokoja, Nigeria",,,,,,2022-05-30,2021-11-03,2021-11-04,2022-11-24,2023-01-16,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2766,Brighton Hospice Office\n8050 SW Warm Springs ...,5802 SW Milwaukie Ave Apt 4,Private,Priest,012Dx0000003p5JIAQ
2337,Staff,,,,,Mr.,Rogelio,,,Acevedo,,Portland,OR,97229,,,,,,,503-644-5264,,,facilities@stpius.org,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2337,St. Pius X Parish\n1280 NW Saltzman Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
3244,Staff,,,,,Mr.,Sean,,,Ackroyd,,Corvallis,OR,97330,,,,,,,541-757-1988,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,3244,St. Mary Parish\n501 NW 25th St,,Private,Lay_Person,012Dx0000003p5HIAQ
3295,Staff,,,,,Ms.,Sherril,,,Acton,,Eugene,OR,97401,,,,,,,541-686-2234 x1524,,,sacton@marisths.org,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,3295,Marist Catholic High School\n1900 Kingsley Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
2164,Staff,,,,,Ms.,Barbara,,,Adams,,Gresham,OR,97030,,,,,,,503-665-9129,,,adamsby@eou.edu,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2164,St. Henry Parish\n346 NW 1st St,,Private,Lay_Person,012Dx0000003p5HIAQ
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1670,Staff,,,,,Ms.,Jenny,,,Zomerdyk,,Central Point,OR,97502,,,,,,,541-664-1050,,,churchoffice@shepherdcatholic.com,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,1670,Shepherd of the Valley Parish\n600 Beebe Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
2755,Religious,,Active,dzorrilla,,Br.,Daniel,,,Zorrilla,,Saint Benedict,OR,97373,,,,,,,503-845-1181,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,2021-08-01,,,,,,2019-06-28,2021-10-10,,,,,,,,,,,,,,,,14,,,,,,False,False,False,False,2755,Félix Rougier House of Studies\nPO Box 499,,Private,Religious,012Dx0000003p5KIAQ
1962,Staff,,,,,Ms.,Kim,,,Zuber,,Sublimity,OR,97385,,,,,,,503-769-5664,,,boniface@wvi.com,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,1962,St. Boniface Parish\n375 SE Church St,,Private,Lay_Person,012Dx0000003p5HIAQ
2202,Staff,,,,,Ms.,Agnes,,,Zueger,,Lake Oswego,OR,97034,,,,,,,503-636-7687,,,agnesz@ollparish.com,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2202,Our Lady of the Lake Parish\n650 A Ave,,Private,Lay_Person,012Dx0000003p5HIAQ


In [120]:
def determine_ecclesial_status(df):
    def ecclesial_status(row):
        if pd.notna(row['ADPDX_Clergy_Status__c']) and 'Laicized' in row['ADPDX_Clergy_Status__c']:
            return 'Laicized'
        # elif pd.notna(row['ADPDX_Clergy_Status__c']) and 'Faculties Withdrawn' in row['ADPDX_Clergy_Status__c']:
        #     return 'Faculties Withdrawn'
        elif pd.notna(row['Type(s)']) and 'Bishop' in row['Type(s)']:
            return 'Bishop/Archbishop'
        elif pd.notna(row['Type(s)']) and 'Priest,Religious' in row['Type(s)']:
            return 'Priest - Religious'
        elif pd.notna(row['Type(s)']) and 'Priest' in row['Type(s)'] and (not pd.isna(row['Foreign_Citizenship__c']) or row['Incardinated_Now'] != 'Archdiocese of Portland in Oregon'):
            return 'Priest - Temporary Sojourn (Foreign)'
        elif pd.notna(row['Type(s)']) and 'Priest' in row['Type(s)'] and (pd.isna(row['Foreign_Citizenship__c']) and row['Incardinated_Now'] == 'Archdiocese of Portland in Oregon'):
            return 'Priest - Diocesan'
        elif pd.notna(row['Type(s)']) and row['Type(s)'] == 'Permanent Deacon':
            return 'Permanent Deacon'
        else:
            return None

    df['mbfc__Ecclesial_Status__c'] = df.apply(ecclesial_status, axis=1)
    return df


df_contact_staging = determine_ecclesial_status(df_contact_staging)

In [121]:
# This function is no longer used due to ADPDX's custom enhancement in which a Flow automatically updates this status. 

def determine_ministerial_status(df):
    def ministerial_status(row):
        if row['ADPDX_Clergy_Status__c'] == 'Deceased':
            return 'Deceased'
        elif row['ADPDX_Clergy_Status__c'] == 'Active':
            return 'Active in Ministry'
        elif row['ADPDX_Clergy_Status__c'] == 'Inactive':
            return 'Inactive'
        elif row['ADPDX_Clergy_Status__c'] == 'Senior Status':
            return 'Senior Status'
        elif row['ADPDX_Clergy_Status__c'] == 'Faculties Withdrawn':
            return 'Faculties Withdrawn'
        elif row['ADPDX_Clergy_Status__c'] == 'Transferred Out':
            return 'Left Diocese'
        elif row['ADPDX_Clergy_Status__c'] == 'Unassigned':
            return 'Unassigned'
        elif row['ADPDX_Clergy_Status__c'] == 'Laicized':
            return 'Laicized'
        else:
            return 'Unknown'
        
    df['mbfc__Ministerial_Status__c'] = df.apply(ministerial_status, axis=1)
    return df

# df_contact_staging = determine_ministerial_status(df_contact_staging)

### Religious Congregation

In this section, for those Contacts who have a value in the `Link to Religious Community` source field we need to populate the `mbfc__Religious_Order__c` target field in Salesforce with the correct Religious Community's parent account - the Religious Congregation.

NOTE: In the source data, there is no differentiation between a child Religious Community and a parent Religious Order, there is only one record for the Religious Comnmunity. In MF360 we represent these Accounts separately so we need to first (a) get the Religious Community record using the `Link to Religious Community` value but transforming it (adding 'RelCommunities\_' in front of the value) so it matches the Archdpdx_Migration_Id\_\_c in Salesforce.

Once acquired, (b) we need to get the value of the `ParentID` field on the Religious Community which is the ID of the Religious Congregation record. That ID is the value we then want to populate in the `mbfc__Religious_Order__c` field.


In [122]:
# get SF Account
get_all_accounts = 'Select Id, Name, RecordTypeId, Type, mbfc__Parish_Code__c, Job_Id__c, Archdpdx_Migration_Id__c, ParentID from Account WHERE Archdpdx_Migration_Id__c != null'

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')

# create a dict in order to apply later
accounts_id_map = df_sf_accounts.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()

In [123]:
df_sf_accounts[df_sf_accounts['Archdpdx_Migration_Id__c'].str.contains('RelCommunities', na=False)]

Unnamed: 0,Id,Name,RecordTypeId,Type,mbfc__Parish_Code__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
228,001Dx00001HwE4dIAF,"Colombiere Jesuit Community, Portland (SJ)",012Dx0000003p52IAA,,,124,RelCommunities_1,001Dx00001HwE3TIAV
229,001Dx00001HwE4jIAF,"Franciscan Sisters of the Eucharist, Bridal Ve...",012Dx0000003p52IAA,,,124,RelCommunities_10,001Dx00001HwE3YIAV
230,001Dx00001HwE4kIAF,"Apostolic Life Community, Portland (ALCP)",012Dx0000003p52IAA,,,124,RelCommunities_11,001Dx00001HwE3ZIAV
231,001Dx00001HwE4lIAF,"Blessed Stephen Bellesini Community, c/o Chanc...",012Dx0000003p52IAA,,,124,RelCommunities_12,001Dx00001HwE3aIAF
232,001Dx00001HwE4mIAF,Canisius Jesuit Community at Jesuit High Schoo...,012Dx0000003p52IAA,,,124,RelCommunities_13,001Dx00001HwE3TIAV
...,...,...,...,...,...,...,...,...
292,001Dx00001HwE5hIAF,"Society of Our Lady of the Most Holy Trinity, ...",012Dx0000003p52IAA,,,124,RelCommunities_79,001Dx00001HwE4SIAV
293,001Dx00001HwE4hIAF,Missionaries of the Holy Spirit Provincial Hou...,012Dx0000003p52IAA,,,124,RelCommunities_8,001Dx00001HwE3WIAV
294,001Dx00001HwE5iIAF,"Community of St. Thomas More, Eugene (OP)",012Dx0000003p52IAA,,,124,RelCommunities_80,001Dx00001HwE3dIAF
295,001Dx00001HwE5jIAF,"Saint Benedict Lodge, McKenzie Bridge (OP)",012Dx0000003p52IAA,,,124,RelCommunities_81,001Dx00001HwE3dIAF


In [124]:
# applies a lambda function to each element in the ‘Link_to_Religious_Community’ column, prefixing the value with 'RelCommunities_'
def transform_religious_community_link(df):
    df['Link_to_Religious_Community'] = df['Link_to_Religious_Community'].apply(
        lambda x: None if x == '0' else f'RelCommunities_{x}'
    )
    return df

# This function searches for a record in the sf_accounts DataFrame where the ‘Archdpdx_Migration_Id__c’ column matches the given archdpdx_migration_id
def get_parent_id_from_salesforce(sf_accounts, archdpdx_migration_id):
    print(f"Searching for: {archdpdx_migration_id}")  # Debug print
    matching_record = sf_accounts[sf_accounts['Archdpdx_Migration_Id__c'] == archdpdx_migration_id]
    if not matching_record.empty:
        print(f"Found: {matching_record['ParentId'].values[0]}")  # Debug print
        return matching_record['ParentId'].values[0]
    print("Not found")  # Debug print
    return None

# uses the get_parent_id_from_salesforce function to find the ‘ParentId’ from the sf_accounts DataFrame
def update_religious_order(df, sf_accounts):
    df['mbfc__Religious_Order__c'] = df.apply(
        lambda row: get_parent_id_from_salesforce(sf_accounts, row['Link_to_Religious_Community']) 
        if row['Link_to_Religious_Community'] is not None else None, axis=1
    )
    return df


# run the transform_religious_community_link and update_religious_order functions
df_contact_staging = transform_religious_community_link(df_contact_staging)

df_contact_staging = update_religious_order(df_contact_staging, df_sf_accounts)

Searching for: RelCommunities_60
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_53
Found: 001Dx00001HwE45IAF
Searching for: RelCommunities_9
Found: 001Dx00001HwE3XIAV
Searching for: RelCommunities_4
Found: 001Dx00001HwE3VIAV
Searching for: RelCommunities_8
Found: 001Dx00001HwE3WIAV
Searching for: RelCommunities_35
Found: 001Dx00001HwE3nIAF
Searching for: RelCommunities_1
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_56
Found: 001Dx00001HwE48IAF
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_53
Found: 001Dx00001HwE45IAF
Searching for: RelCommunities_60
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_1
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_27
Found: 001Dx00001HwE3iIAF
Searching for: RelCommunities_44
Found: 001Dx00001HwE3wIAF
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_44
Found: 001Dx00001HwE3wIAF
Searching for: RelCommunities_60
Found: 001Dx00001

In [125]:
df_contact_staging[df_contact_staging.mbfc__Religious_Order__c.isna() == False]

Unnamed: 0_level_0,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,adpdx_Preferred_Name__c,MiddleName,LastName,Suffix,MailingCity,MailingState,MailingPostalCode,MailingCountry,OtherCity,OtherState,OtherPostalCode,OtherCountry,Preferred_Address__c,npe01__WorkPhone__c,HomePhone,MobilePhone,npe01__WorkEmail__c,npe01__AlternateEmail__c,npe01__HomeEmail__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,adpdx_Seminarian_Student_Debt__c,adpdx_Seminarian_Medical_Benefits__c,Send_Group_Mail_and_Email__c,Birthdate,mbfc__Place_of_Birth__c,Foreign_Born__c,Foreign_Citizenship__c,Immigration_Status__c,Passport_Visa_Expiration_Date__c,adpdx_Accepted_to_Formation_Date__c,adpdx_Candidacy_Date__c,adpdx_Formation_Withdrawn_Date__c,adpdx_Formation_Deferred_Date__c,adpdx_Formation_Terminated_Date__c,adpdx_Terminate_or_Defer_Note__c,adpdx_CARA_Highest_Ed_Level__c,Incardinated_Now,adpdx_Letter_of_Good_Standing__c,mbfc__Date_of_Arrival_in_Diocese__c,adpdx_Last_Retreat_Date__c,adpdx_Last_Educ_Requirement_Date__c,adpdx_Policy_Manual_Acknowledgement_Date__c,adpdx_Harassment_Prevention_Course_Date__c,adpdx_Standards_of_Conduct_Date__c,adpdx_Last_Background_Check_Date__c,adpdx_Last_Child_Protection_Training__c,mbfc__Date_Left_Diocese__c,adpdx_Senior_Status_Date__c,adpdx_Laicized_Date__c,mbfc__Date_of_Death__c,Languages__c,adpdx_Coverage_Availability__c,adpdx_Advanced_Directive_Date__c,adpdx_End_of_Life_Plan_Date__c,adpdx_Will_Date__c,adpdx_Will_Note__c,adpdx_CIC_489_File__c,adpdx_CARA_Ethnicity__c,adpdx_Seminarian_Status__c,adpdx_Other_Diaconal_Ministry__c,adpdx_Spiritual_Director_Authorized__c,Link_to_Religious_Community,adpdx_Place_of_Work__c,adpdx_Volunteer_Place__c,adpdx_Type_of_Work__c,adpdx_Work_Load__c,adpdx_Work_Title__c,Bi_Ritual__c,Non_Latin_Rite__c,adpdx_Discerner_Aspirant_for_Diaconate__c,adpdx_Is_Seminarian__c,Archdpdx_Migration_Id__c,MailingStreet,OtherStreet,npe01__Secondary_Address_Type__c,ContactRecordType,RecordTypeID,mbfc__Ecclesial_Status__c,mbfc__Religious_Order__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1
671,"Priest,Religious",Transferred Out,Transferred Out,jadams,,Rev.,J.,J.K.,K.,Adams,III,,,,,,,,,,,503-975-4744,,jadams@jesuits.org,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,2010-06-30,,,,,,,,,,,,,,,RelCommunities_60,,,,,,False,False,False,False,671,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3TIAV
2430,Religious,,Active,,,Sr.,Delores,,,Adelman,,Beaverton,OR,97078,,Beaverton,OR,97078,,,503-644-9181,503-718-0411,,,,srdeloresa@ssmo.org,True,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,RelCommunities_53,,,,,,False,False,False,False,2430,Sisters of St. Mary of Oregon\n4440 SW 148th Ave,4595 SW 148th Ave,Private,Religious,012Dx0000003p5KIAQ,,001Dx00001HwE45IAF
1584,"Priest,Religious",Active,Active,makuti,,Rev.,Macdonald,,,Akuti,,Rockaway,OR,97136,,,,,,,503-355-2661,,424-410-0097,padreakuti@gmail.com,makuti@archdpdx.org,,True,False,False,False,0,,True,1977-08-18,"Vura Bilinyo, Uganda",True,Uganda,R1 (Religious Visa),2022-02-14,,,,,,,,"Apostles of Jesus, Kenya",2019-04-25,,,,2019-05-24,2022-04-21,2020-01-10,2022-04-28,2022-11-23,,,,,,,,,,,,,,,,RelCommunities_9,"St. Mary’s by the Sea Parish, Rockaway",,Parish Ministry,Full Time,Administrator,False,False,False,False,1584,St. Mary by the Sea Parish\nPO Box 390,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3XIAV
912,"Priest,Religious",Transferred Out,Transferred Out,,,Rt. Rev.,James,,,Albers,,,,,,,,,,,,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,,,,,RelCommunities_4,,,,,,False,False,False,False,912,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3VIAV
913,"Priest,Religious",Transferred Out,Transferred Out,,,Rev.,Jose,,,Alberto,,,,,,,,,,,,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,,,,,RelCommunities_8,,,,,,False,False,False,False,913,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3WIAV
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2884,"Priest,Religious",Transferred Out,Transferred Out,pyoun,,Rev.,Pius,,,Youn,,Eugene,OR,97403,,,,,,,541-343-7021,520-222-8844,907-313-9028,,,pius.youn@gmail.com,True,False,False,False,0,,True,1987-08-31,,False,,,,,,,,,,,,,2022-06-01,,,2022-05-22,2022-10-18,2022-10-10,2022-10-10,2023-01-26,2023-06-05,,,,"Korean, Spanish, Italian, Latin",,,,,,,,,,,RelCommunities_18,,,,,,False,False,False,False,2884,St. Thomas More Newman Center Parish\n1850 Eme...,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3dIAF
1434,"Priest,Religious",Deceased,Deceased,,,Rev.,Jerome,,,Young,,,,,,,,,,,,,,,,,False,False,False,False,0,,False,,,False,,,,,,,,,,,,,,,,,,,,,,,,2012-12-08,,,,,,,,,,,,RelCommunities_4,,,,,,False,False,False,False,1434,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3VIAV
1435,"Priest,Religious",Transferred Out,Transferred Out,,,Rev.,Robert,,,Young,,,,,,,,,,,,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,,,,,RelCommunities_22,,,,,,False,False,False,False,1435,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3fIAF
787,"Priest,Religious",Senior Status,Retired,nzodrow,,Rt. Rev.,Nathan,,,Zodrow,,Saint Benedict,OR,97373,,,,,,,503-845-3030,503-236-4747,,,,nathan.zodrow@mtangel.edu,True,False,False,False,0,,False,1952-03-02,USA,False,,,,,,,,,,,Benedectines (OSB),,1974-09-08,,,,,,,,,2010-06-20,,,Spanish,,,,,,,,,,,RelCommunities_4,Mount Angel Abbey,,Curator of Art Collection / Archivist,Full Time,Curator Archivist,False,False,False,False,787,Mount Angel Abbey\n1 Abbey Dr,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3VIAV


### Registered Parish

In this section we populate the 'Home Parish' target field for Contacts who have a 'Registered Parish' in the source system.

TODO: Check to see if the Registered Parish data is worth importing. Currently, 'Registered Parish' is only populated on 51 rows, and 32 of those rows in the 'Types' field are listed as 'Archive'. In other words, **only 19 of the 51 rows have a 'Registered Parish' value that might be meaningful.**


### Diocese of Incardination


In [126]:
df_contact_staging['Incardinated_Now'].sample(10)

Record Number
2679                                   NaN
1752                                   NaN
2084                                   NaN
2733                                   NaN
1391                                   NaN
2173                                   NaN
618     Dominican Friars, Western Province
2156                                   NaN
1191                                   NaN
748      Archdiocese of Portland in Oregon
Name: Incardinated_Now, dtype: object

In [127]:
# Need to look for, then create a new Account that corresponds to a given 'Diocese of Incardination', then populate with record Id. 

def update_incardinated_accounts(sf, df, column_name, record_type_dev_name, church_type, new_column_name):
    """
    Update the DataFrame by getting or creating Salesforce accounts for the values in the specified column.

    Parameters:
    sf (Salesforce): Salesforce connection object
    df (pd.DataFrame): The DataFrame to update
    column_name (str): The name of the column to search for account names
    record_type_dev_name (str): The developer name of the Record Type to use for creating the account
    church_type (str): The Church Type to set for the new account
    new_column_name (str): The name of the new column to store the Salesforce account IDs

    Returns:
    pd.DataFrame: The updated DataFrame with the new column containing Salesforce account IDs
    """
    df[new_column_name] = None

    for index, row in df.iterrows():
        account_name = row[column_name]
        if pd.notna(account_name):
            account_id = get_or_create_account(sf, account_name, record_type_dev_name, church_type)
            df.at[index, new_column_name] = account_id
    
    return df

# Example usage
# sf = Salesforce(username='your_username', password='your_password', security_token='your_security_token')
df_contact_staging = update_incardinated_accounts(sf, df_contact_staging, 'Incardinated_Now', 'Church', 'Diocese', 'mbfc__Diocese_of_Incardination__c')

# This cell takes >3m to run

In [128]:
df_contact_staging[['mbfc__Diocese_of_Incardination__c', 'Incardinated_Now']].sample(20)

Unnamed: 0_level_0,mbfc__Diocese_of_Incardination__c,Incardinated_Now
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1
1875,,
127,,
143,,
420,001Dx00001HwzptIAB,Diocese of Helena
272,,
3246,,
536,001Dx00001HwDsgIAF,Archdiocese of Portland in Oregon
1211,,
2601,,
1501,,


In [129]:
# Drop the 'Incardinated Now' column 
del df_contact_staging['Incardinated_Now']


### Deceased & Date of Death

ADPDX does not have a 'Deceased' boolean other than whether or not the Date of Death column has been populated. The target application functions based on both a 'Deceased' boolean and, optionally, a 'Date of Death.'


In [130]:
# Create a new column 'npsp__Deceased__c' and set its value to True when there is a value in 'mbfc__Date_of_Death__c'
df_contact_staging['npsp__Deceased__c'] = df_contact_staging['mbfc__Date_of_Death__c'].notna()


### Final Dataframe Cleanup


In [131]:
# drop columns that are no longer needed
# del df_contact_staging['Type(s)']  # Commented this out as we want to KEEP the field and migrated to 'ADPDX Contact Type'
del df_contact_staging['ContactRecordType']
del df_contact_staging['Link_to_Religious_Community']

In [132]:
df_contact_staging = df_contact_staging.rename(columns={'Type(s)': 'ADPDX_Contact_Type__c'})

In [133]:
# convert '' to NaN
df_contact_staging.replace("", np.nan, inplace=True)

# convert NaN to None
df_contact_staging = df_contact_staging.where(df_contact_staging.notnull(), None)


In [134]:
df_contact_staging['Languages__c'].sample(20)

Record Number
1002                                              None
2146                                              None
873                                               None
2851                                              None
624                                               None
2553                                              None
1730                                              None
2123                                              None
1692                                              None
2769                                              None
3309                                              None
1308                                              None
1871                                              None
1556                                              None
2586                                              None
2678                                              None
706     French, German, Latin, Greek, Hebrew (reading)
81                                                N

In [135]:
# df_contact_staging_2 = df_contact_staging.where(df_contact_staging.notnull(), None)

## Load


In [136]:
df_contact_staging['Archdpdx_Job_Id__c'] = curr_job_id

In [137]:
# generate CSV for manual loading
df_contact_staging.to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/df_contacts_staging.csv', encoding='utf-8-sig')
df_contact_staging.to_csv('staging_files/contacts_staging.csv', encoding='utf-8-sig')


In [138]:
# upsert Contact records into SF using Bulk api

from simple_salesforce.exceptions import SalesforceMalformedRequest

bulk_data = []
for row in df_contact_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

try:
    # Attempt to upsert Contact records into SF using Bulk API
    contact_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=500, use_serial=False)
    contact_upsert_results = pd.DataFrame(contact_upsert)
except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")

In [139]:
# Print upsert results to local file

keys = contact_upsert[0].keys()
with open('results_files/contact_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(contact_upsert)


# CONTACT > SPOUSES

#TODO: Contact Spouses migration


# CONTACTS > PHOTOS

#TODO: Contact Photos


# CONTACT > REGISTER ENTRIES


In [140]:
import pandas as pd

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)

df

Unnamed: 0,Record_Number,Common_Name,Sort_Name,Type(s),Clergy_Status,Religious_Status,Login_ID,Password,Password_Must_be_Changed,Access_Permission,Spouse,Title,Salutation,Christian_Name,Nickname,Middle_Name(s),Surname,Suffix,Mailing_Address,Mailing_Address_2,Mailing_Address_City,Mailing_Address_State,Mailing_Address_Province,Mailing_Address_Postal_Code,Mailing_Address_Country,Private_Address,Private_Address_2,Private_Address_City,Private_Address_State,Private_Address_Province,Private_Address_Postal_Code,Private_Address_Country,Preferred_Address,Work_Phone,Home_Phone,Cell_Phone,Preferred_Phone,Work_Email,Archdiocesan_Email,Home_Email,Preferred_Email,Directory_Include,Directory_Include_Middle_Name,Directory_Include_Suffix,Suppress_From_Reports,Seminarian_Student_Debt,Seminarian_Medical_Benefits,Send_Group_Mail_and_Email,Birth_Date,Place_of_Birth,Foreign_Born,Father_Full_Name,Mother_Full_Maiden_Name,Foreign_Citizenship,Immigration_Status,Passport/Visa_Expiration_Date,Social_Security_Account_Number,Baptism_Date,Place_of_Baptism,Confirmation_Date,Place_of_Confirmation,Received_Date,Parish_of_Record,Marriage_Date,Place_of_Marriage,Date_of_First_Vows,Date_of_Final_Vows,Accepted_to_Formation_Date,Reader_Date,Acolyte_Date,Candidacy_Date,Formation_Withdrawn_Date,Formation_Deferred_Date,Formation_Terminated_Date,Terminate_or_Defer_Note,Bachelor_Degree_Year,Bachelor_Degree_Type,Bachelor_Degree_Institution,Graduate_1_Degree_Year,Graduate_1_Degree_Type,Graduate_1_Degree_Institution,Graduate_2_Degree_Year,Graduate_2_Degree_Type,Graduate_2_Degree_Institution,Graduate_3_Degree_Year,Graduate_3_Degree_Type,Graduate_3_Degree_Institution,Graduate_4_Degree_Year,Graduate_4_Degree_Type,Graduate_4_Degree_Institution,CARA_Highest_Ed_Level,Diaconal_Ordination_Date,Diaconal_Ordination_Place,Diaconal_Ordination_Prelate,Presbyteral_Ordination_Date,Presbyteral_Ordination_Place,Presbyteral_Ordination_Prelate,Episcopal_Ordination_Date,Episcopal_Ordination_Place,Episcopal_Ordination_Prelate,Ordination_Diocese,Incardinated_From_Diocese,Incardinated_From_Date,Incardinated_Now,Serving_Now,Excardinated_To_Diocese,Excardinated_To_Date,Letter_of_Good_Standing_Date,Religious_In_Archdiocese_Date,Faculties,Faculties_Granted_Date,Faculties_Restricted_Date,Faculties_Withdrawn_Date,Last_Retreat_Date,Last_Educ_Requirement_Date,Policy_Manual_Acknowledgement_Date,Harassment_Prevention_Course_Date,Standards_of_Conduct_Date,Last_Background_Check_Date,Last_Child_Protection_Training_Date,Out_of_Diocese_Date,Senior_Status_Date,Laicized_Date,Deceased_Date,Languages,Coverage_Availability,Advanced_Directive_Date,End_of_Life_Plan_Date,Will_Date,Will_Note,CIC_489_File,Registered_Parish,CARA_Ethnicity,Seminarian_Status,Other_Diaconal_Ministry,Spiritual_Director_Authorized,Link_to_Religious_Community,Place_of_Work,Volunteer_Place,Type_of_Work,Work_Load,Work_Title
1,2766,Rev. Stephen Abaukaka,abaukaka stephen ozovehe,Priest,Transferred Out,,sabukaka,def2a990be60a7998b1ed7c820101f3bd02d33b8992518...,Yes,,0,Rev.,Fr.,Stephen,,Ozovehe,Abaukaka,,Brighton Hospice Office,8050 SW Warm Springs St Ste 205,Tualatin,OR,,97062,,5802 SW Milwaukie Ave Apt 4,,Portland,OR,,97202,,Mailing,503-430-7699,,773-733-3772,Work,,,abstoz@yahoo.com,,Yes,,,No,0,,Yes,1967-06-07,,,,,,,,,,,,,,,,,,,,,,,,,,,1996,Theology,,2013,MA Pastoral Studies,Chicago Theological Union,,,,,,,,,,,,,,1997-05-03,,"Diocese of Lokoja, Nigeria",,,,"Diocese of Lokoja, Nigeria",,,"Diocese of Lokoja, Nigeria","Diocese of Lokoja, Nigeria",,,,,Confessional,2021-11-02,,,,,,2022-05-30,2021-11-03,2021-11-04,2022-11-24,2023-01-16,,,,,,,,,,,0,,,,,0,,,,,
2,2337,Mr. Rogelio Acevedo,acevedo rogelio,Staff,,,,,,,0,Mr.,Mr.,Rogelio,,,Acevedo,,St. Pius X Parish,1280 NW Saltzman Rd,Portland,OR,,97229,,,,,,,,,,503-644-5264,,,,facilities@stpius.org,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
3,3244,Mr. Sean Ackroyd,ackroyd sean,Staff,,,,,,,0,Mr.,Mr.,Sean,,,Ackroyd,,St. Mary Parish,501 NW 25th St,Corvallis,OR,,97330,,,,,,,,,,541-757-1988,,,,,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
4,3295,Ms. Sherril Acton,acton sherril,Staff,,,,,,,0,Ms.,Ms.,Sherril,,,Acton,,Marist Catholic High School,1900 Kingsley Rd,Eugene,OR,,97401,,,,,,,,,,541-686-2234 x1524,,,,sacton@marisths.org,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
5,2164,Ms. Barbara Adams,adams barbara,Staff,,,,,,,0,Ms.,Ms.,Barbara,,,Adams,,St. Henry Parish,346 NW 1st St,Gresham,OR,,97030,,,,,,,,,,503-665-9129,,,,adamsby@eou.edu,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3012,1670,Ms. Jenny Zomerdyk,zomerdyk jenny,Staff,,,,,,,0,Ms.,Ms.,Jenny,,,Zomerdyk,,Shepherd of the Valley Parish,600 Beebe Rd,Central Point,OR,,97502,,,,,,,,,,541-664-1050,,,,churchoffice@shepherdcatholic.com,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
3013,2755,"Br. Daniel Zorrilla, MSpS",zorrilla daniel,Religious,,Active,dzorrilla,391eedf7c936f63d3d0a7d9ea7e506a84709662fd31ba9...,Yes,,0,Br.,Br.,Daniel,,,Zorrilla,,Félix Rougier House of Studies,PO Box 499,Saint Benedict,OR,,97373,,,,,,,,,,503-845-1181,,,,,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2021-08-01,,,,,,,,,,2019-06-28,2021-10-10,,,,,,,,,,,,0,,,,,14,,,,,
3014,1962,Ms. Kim Zuber,zuber kim,Staff,,,,,,,0,Ms.,Ms.,Kim,,,Zuber,,St. Boniface Parish,375 SE Church St,Sublimity,OR,,97385,,,,,,,,,,503-769-5664,,,,boniface@wvi.com,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
3015,2202,Ms. Agnes Zueger,zueger agnes,Staff,,,,,,,0,Ms.,Ms.,Agnes,,,Zueger,,Our Lady of the Lake Parish,650 A Ave,Lake Oswego,OR,,97034,,,,,,,,,,503-636-7687,,,,agnesz@ollparish.com,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,


In [141]:
# Import all Contact fields that actually map to Register Entry records

import pandas as pd

# Define the structure of your column sets with correct attribute names
column_sets = [
    {'date': 'Baptism_Date', 'place': 'Place_of_Baptism', 'notation_type': 'Proof of Baptism'},
    {'date': 'Confirmation_Date', 'place': 'Place_of_Confirmation', 'notation_type': 'Notice of Confirmation'},
    {'date': 'Received_Date', 'place': 'Parish_of_Record', 'notation_type': 'Notice of Profession of Faith'},
    {'date': 'Marriage_Date', 'place': 'Place_of_Marriage', 'notation_type': 'Notice of Matrimony'},
    {'date': 'Diaconal_Ordination_Date', 'place': 'Diaconal_Ordination_Place', 'prelate': 'Diaconate_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Diaconate'},
    {'date': 'Presbyteral_Ordination_Date', 'place': 'Presbyteral_Ordination_Place', 'prelate': 'Presbyteral_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Presbyteral'},
    {'date': 'Episcopal_Ordination_Date', 'place': 'Episcopal_Ordination_Place', 'prelate': 'Episcopal_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Episcopal'}
]

# New DataFrame for entries
register_entries = pd.DataFrame(columns=['RecordNumber', 'mbfc__Register_Entry_Type__c', 'mbfc__Type__c', 'mbfc__Notation_Type__c', 'mbfc__Ordination_Type__c', 'Date', 'Place', 'Prelate'])
new_entries = []  # List to store entries before final concatenation

# Processing rows
for row in df.itertuples():
    for column_set in column_sets:
        date_value = getattr(row, column_set['date'], None)
        if pd.notna(date_value):  # Check if date field is not NaN
            entry = {
                'RecordNumber': getattr(row, 'Record_Number', None),
                'Date': date_value,
                'Place': getattr(row, column_set['place'], None)
            }
            # Add Prelate if applicable
            if 'prelate' in column_set:
                entry['Prelate'] = getattr(row, column_set['prelate'], None)

            # Set 'mbfc__Register_Entry_Type__c', and conditionally add 'mbfc__Type__c' or 'mbfc__Notation_Type__c'
            if 'sacrament_type' in column_set:
                entry['mbfc__Type__c'] = column_set['sacrament_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Sacrament'
            if 'notation_type' in column_set:
                entry['mbfc__Notation_Type__c'] = column_set['notation_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Notation'

            # Handle ordination type specific updates
            if 'ordination_type' in column_set:
                entry['mbfc__Ordination_Type__c'] = column_set['ordination_type']

            new_entries.append(entry)
    
    # Add entries for 'Reader Date'
    # reader_date = getattr(row, 'Reader_Date', None)
    # if pd.notna(reader_date):
    #     entry = {
    #         'RecordNumber': getattr(row, 'Record_Number', None),
    #         'Date': reader_date,
    #         'mbfc__Notation_Type__c': 'Notice of Holy Orders',
    #         'mbfc__Ordination_Type__c': 'Minor Order: Reader',
    #         'mbfc__Register_Entry_Type__c': 'Notation'
    #     }
    #     new_entries.append(entry)
    
    # # Add entries for 'Acolyte Date'
    # acolyte_date = getattr(row, 'Acolyte_Date', None)
    # if pd.notna(acolyte_date):
    #     entry = {
    #         'RecordNumber': getattr(row, 'Record_Number', None),
    #         'Date': acolyte_date,
    #         'mbfc__Notation_Type__c': 'Notice of Holy Orders',
    #         'mbfc__Ordination_Type__c': 'Minor Order: Acolyte',
    #         'mbfc__Register_Entry_Type__c': 'Notation'
    #     }
    #     new_entries.append(entry)

# Concatenate all new entries to the DataFrame at once
if new_entries:
    register_entries = pd.concat([register_entries, pd.DataFrame(new_entries)], ignore_index=True)

print(f"Total records added: {len(register_entries)}")

# Optionally, save the new DataFrame to a CSV
register_entries.to_csv('Register_Entries.csv', index=False)

# Display the DataFrame
register_entries.sample(10)


Total records added: 1534


Unnamed: 0,RecordNumber,mbfc__Register_Entry_Type__c,mbfc__Type__c,mbfc__Notation_Type__c,mbfc__Ordination_Type__c,Date,Place,Prelate
1247,326,Notation,,Notice of Holy Orders,Episcopal,2006-01-25,"St. Peter Cathedral, Marquette, MI",Adam Cardinal Maida
1028,643,Notation,,Notice of Holy Orders,Presbyteral,1998-12-12,,
1403,2758,Notation,,Notice of Holy Orders,Presbyteral,2015-06-27,"Holy Spirit Church, Agraharam","Most Rev. Singaroyan, Bishop of Salem"
468,119,Notation,,Notice of Profession of Faith,,1987-04-04,"St. Patrick, Canby OR",
1297,2066,Notation,,Notice of Holy Orders,Presbyteral,2006-12-29,"Cathedral of Namcheon, Busan, South Korea",Most Rev. Augustine Cheong
144,678,Notation,,Notice of Holy Orders,Presbyteral,1956-05-19,"Cathedral of the Immaculate Conception, Portla...",Most Rev. Edward D. Howard
1014,746,Notation,,Notice of Holy Orders,Presbyteral,2008-06-07,"Cathedral of the Immaculate Conception, Portla...",Most Rev. John G. Vlazny
885,120,Notation,,Notice of Matrimony,,1985-01-01,,
212,1930,Notation,,Notice of Confirmation,,1991-03-30,"Centerville Chapel, Augsburg, Germany",
1370,1369,Notation,,Notice of Holy Orders,Presbyteral,2012-06-29,,


### Populate Lookup for Prelate


In [142]:
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Very', 'Rev.', 'Very Rev.', 'Sr.', 'Most Rev.')
CONSTANTS.suffix_acronyms.add('FRS', 'J.C.L.', 'J.C.L., D.D.', 'D.D.', 'OMI', 'OSA', 'OCD', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'OP', 'O.S.M', 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President', 'SJ', 'SL', 'IX', 'SSJ', 'J.C.L.', 'J.C.L', 'OFM', 'MSpS', 'Fco.' )


def parse_name(name):
    if pd.isna(name):  # Checks if the name is NaN or None
        return {
            'Salutation': '',
            'FirstName': '',
            'MiddleName': '',
            'LastName': '',
            'Suffix': ''
        }
    else:
        name = HumanName(name)
        return {
            'Salutation': name.title,
            'FirstName': name.first,
            'MiddleName': name.middle,
            'LastName': name.last,
            'Suffix': name.suffix
        }

# Apply the parsing function only where 'Prelate' exists and is not NaN
for entry in new_entries:
    if 'Prelate' in entry and pd.notna(entry['Prelate']):
        parsed_name = parse_name(entry['Prelate'])
        entry.update(parsed_name)

# Ensure the DataFrame creation from new_entries includes checks for existence of keys:
register_entries = pd.DataFrame(new_entries)
if 'Prelate' in register_entries.columns:
    register_entries['Salutation'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['Salutation'] if pd.notna(x) else '')
    register_entries['FirstName'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['FirstName'] if pd.notna(x) else '')
    register_entries['MiddleName'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['MiddleName'] if pd.notna(x) else '')
    register_entries['LastName'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['LastName'] if pd.notna(x) else '')
    register_entries['Suffix'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['Suffix'] if pd.notna(x) else '')


# Display the DataFrame
print(f"Total records added: {len(register_entries)}")
register_entries.sample(10)



Total records added: 1534


Unnamed: 0,RecordNumber,Date,Place,Prelate,mbfc__Notation_Type__c,mbfc__Register_Entry_Type__c,mbfc__Ordination_Type__c,Salutation,FirstName,MiddleName,LastName,Suffix
555,736,1961-06-09,"St. Mary’s Cathedral, San Francisco, CA",Most Rev. Hugh Donahue,Notice of Holy Orders,Notation,Presbyteral,Most Rev.,Hugh,,Donahue,
246,52,1985-09-07,,,Notice of Matrimony,Notation,,,,,,
888,2749,1981-10-10,,,Proof of Baptism,Notation,,,,,,
1354,1568,1996-08-10,"Sacred Heart Cathedral, Lubaga",,Notice of Holy Orders,Notation,Presbyteral,,,,,
1158,555,1954-05-12,,,Notice of Confirmation,Notation,,,,,,
786,1185,1989-04-01,,,Notice of Holy Orders,Notation,Presbyteral,,,,,
1446,217,1951-09-30,"St. Mary, Mount Angel, OR",,Proof of Baptism,Notation,,,,,,
1503,576,1984-04-28,,,Notice of Holy Orders,Notation,Presbyteral,,,,,
1381,50,1966-11-20,"Sacred Heart, Medford OR",,Proof of Baptism,Notation,,,,,,
3,1592,2014-05-09,,,Notice of Confirmation,Notation,,,,,,


In [143]:
# Query Salesforce for existing contacts and create a dictionary for mapping

from simple_salesforce import Salesforce

query = """
SELECT Id, Archdpdx_Migration_Id__c
FROM Contact
"""
result = sf.query_all(query)
contact_map = {rec['Archdpdx_Migration_Id__c']: rec['Id'] for rec in result['records']}


In [144]:
# Get RecordTypeId for Contact.Priest
priest_contact_recordtype_id = get_recordtype_id(df_sf_recordTypes, 'Priest', 'Contact', 'mbfc')

priest_contact_recordtype_id

'012Dx0000003p5JIAQ'

In [145]:
# Get RecordID for Prelates by querying for Contacts by FirstName and LastName and, if not found, Create New Contacts

from simple_salesforce import SFType, SalesforceResourceNotFound

contact = SFType('Contact', sf.session_id, sf.sf_instance)
for index, row in register_entries.iterrows():
    first_name, last_name = row.get('FirstName'), row.get('LastName')

    if pd.isna(first_name) or pd.isna(last_name) or first_name.strip() == '' or last_name.strip() == '':
        # If either first name or last name is missing or empty, skip this row or handle as needed
        print(f"Skipping row {index} due to missing name information.")
        continue

    try:
        # Search for contact by First and Last Name
        query = f"SELECT Id FROM Contact WHERE FirstName = '{first_name}' AND LastName = '{last_name}'"
        result = sf.query(query)
        if result['totalSize'] > 0:
            contact_id = result['records'][0]['Id']
        else:
            # Create a new contact if no match found
            new_contact = {
                'FirstName': first_name,
                'LastName': last_name,
                'Archdpdx_Job_Id__c': curr_job_id,
                'RecordTypeId': priest_contact_recordtype_id
            }
            create_result = contact.create(new_contact)
            contact_id = create_result['id']

        # Update DataFrame with the Salesforce Contact ID
        register_entries.at[index, 'mbfc__Celebrant__c'] = contact_id

    except SalesforceException as e:
        print(f"Error processing row {index}: {e}")



Skipping row 2 due to missing name information.
Skipping row 3 due to missing name information.
Skipping row 4 due to missing name information.
Skipping row 5 due to missing name information.
Skipping row 6 due to missing name information.
Skipping row 8 due to missing name information.
Skipping row 9 due to missing name information.
Skipping row 10 due to missing name information.
Skipping row 11 due to missing name information.
Skipping row 12 due to missing name information.
Skipping row 13 due to missing name information.
Skipping row 16 due to missing name information.
Skipping row 17 due to missing name information.
Skipping row 18 due to missing name information.
Skipping row 19 due to missing name information.
Skipping row 20 due to missing name information.
Skipping row 21 due to missing name information.
Skipping row 22 due to missing name information.
Skipping row 23 due to missing name information.
Skipping row 24 due to missing name information.
Skipping row 26 due to miss

### Prepare to Upsert


In [146]:
# Map Contact IDs to Register Entries

register_entries_2 = register_entries

register_entries_2['mbfc__Contact__c'] = register_entries['RecordNumber'].map(contact_map)


In [147]:
# Append Job_Id__c
register_entries_2['Archdpdx_Job_Id__c'] = curr_job_id

In [148]:
# Generate an External ID
def create_external_id(row):
    record_number = str(row['RecordNumber']).replace(' ', '').replace('-', '')
    entry_type = str(row['mbfc__Register_Entry_Type__c']).replace(' ', '').replace('-', '')

    # Check whether to use Type or Notation Type based on what's available
    if 'mbfc__Type__c' in row and not pd.isna(row['mbfc__Type__c']):
        type_field = str(row['mbfc__Type__c']).replace(' ', '').replace('-', '')
    elif 'mbfc__Notation_Type__c' in row and not pd.isna(row['mbfc__Notation_Type__c']):
        type_field = str(row['mbfc__Notation_Type__c']).replace(' ', '').replace('-', '') + str(row['mbfc__Ordination_Type__c']).replace(' ', '').replace('-', '')
    else:
        type_field = 'Unknown'

    return f"{record_number}_{entry_type}_{type_field}"

In [149]:
# Assuming your DataFrame is named `register_entries`
register_entries_2['Archdpdx_Migration_Id__c'] = register_entries.apply(create_external_id, axis=1)

if register_entries['Archdpdx_Migration_Id__c'].duplicated().any():
    print("Warning: There are duplicate external IDs.")
    # Optionally, show the duplicates
    duplicates = register_entries[register_entries['external_id'].duplicated(keep=False)]
    print(duplicates)
else:
    print("All external IDs are unique.")


All external IDs are unique.


In [150]:
# Drop unnecessary columns:
register_entries_2.drop(['RecordNumber', 'Prelate', 'Salutation', 'FirstName', 'MiddleName', 'LastName', 'Suffix'], axis=1, inplace=True)

In [151]:
register_entries_staging = register_entries_2

In [152]:
# Remove all NaN values:
register_entries_staging.fillna('', inplace=True)

# Rename columns
register_entries_staging = register_entries_staging.rename(columns={
    'Place': 'mbfc__Location_text__c',
    'Date': 'mbfc__Event_Date__c'
})


In [153]:
# What is this checking for?... Why did I include this?
register_entries_staging[register_entries_staging.mbfc__Contact__c == '003Dx00000m0OtXIAU']


Unnamed: 0,mbfc__Event_Date__c,mbfc__Location_text__c,mbfc__Notation_Type__c,mbfc__Register_Entry_Type__c,mbfc__Ordination_Type__c,mbfc__Celebrant__c,mbfc__Contact__c,Archdpdx_Job_Id__c,Archdpdx_Migration_Id__c


In [154]:
# generate CSV for manual loading
register_entries_staging.to_csv('staging_files/reg_entry_staging.csv', encoding='utf-8-sig')


In [155]:
# Upsert Register Entry Records

bulk_data = []
for row in register_entries_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

# Keep the batch <100 as I've been getting an exceptionCode: 'InvalidBatch', 'exceptionMessage': 'Records not processed'
reg_entry_upsert = sf.bulk.mbfc__Register_Entry__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
reg_entry_upsert_results = pd.DataFrame(reg_entry_upsert)

In [156]:
# Print upsert results to local file

keys = reg_entry_upsert[0].keys()

with open('results_files/register_entry_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(reg_entry_upsert)

# CONTACT > AFFILIATIONS


In [157]:
# Function to create a unique ID based on Person's Name + completion date or start date + affiliation type
def create_unique_id(row):
    # Get values, handling NaNs
    person_id = str(row.get('mbfc__Person__c', '')).strip()
    
    # Check for completion date, and if it's blank, use the start date
    completion_date = row.get('mbfc__Completion_Date__c', '')
    if pd.isna(completion_date) or completion_date == '':
        completion_date = row.get('mbfc__Start_Date__c', '')
    
    completion_date = str(completion_date).strip()
    affiliation = str(row.get('mbfc__Affiliation__c', '')).strip()
    
    # Concatenate the three fields
    combined = f"{person_id}{completion_date}{affiliation}"
    
    # Remove unwanted characters and convert to lowercase
    clean_id = ''.join(combined.split()).replace('-', '').replace('.', '').lower()
    
    # Limit the string to 50 characters
    return clean_id[:50]

## Education Affiliations

This section takes multiple sets of columns (all related to a person's education) from the Contacts table, and combines them into a single set of columns in a new dataframe for insertion into Salesforce as Affiliation records.


In [158]:
# Parse and stage Education Affiliation records
import pandas as pd
from functools import lru_cache

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)


# Define the structure of your column sets with correct attribute names
degree_sets = [
    {'year': 'Bachelor_Degree_Year', 'type': 'Bachelor_Degree_Type', 'institution': 'Bachelor_Degree_Institution'},
    {'year': 'Graduate_1_Degree_Year', 'type': 'Graduate_1_Degree_Type', 'institution': 'Graduate_1_Degree_Institution'},
    {'year': 'Graduate_2_Degree_Year', 'type': 'Graduate_2_Degree_Type', 'institution': 'Graduate_2_Degree_Institution'},
    {'year': 'Graduate_3_Degree_Year', 'type': 'Graduate_3_Degree_Type', 'institution': 'Graduate_3_Degree_Institution'},
    {'year': 'Graduate_4_Degree_Year', 'type': 'Graduate_4_Degree_Type', 'institution': 'Graduate_4_Degree_Institution'}
]

# Query for the Record Type ID for 'Organization'
record_type_result = sf.query("SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = 'Organization' AND NamespacePrefix = 'mbfc'")
organization_record_type_id = record_type_result['records'][0]['Id'] if record_type_result['records'] else None

# Initialize the DataFrame for the staging table
education_staging = pd.DataFrame()

# Function to check and create institution account
@lru_cache(maxsize=None)
def get_or_create_institution_account(institution_name):
    if pd.isna(institution_name):
        return None  # Return None or handle as appropriate if institution name is NaN

    # Query Salesforce to find the institution
    query = f"SELECT Id, Name FROM Account WHERE Name = '{institution_name}' LIMIT 1"
    results = sf.query(query)
    
    # If exists, return the ID
    if results['records']:
        return results['records'][0]['Id']
    else:
        # Ensure no NaN values are sent to Salesforce
        account_data = {
            'Name': institution_name if pd.notna(institution_name) else "Default Name",  # Provide a default if NaN
            'RecordTypeId': organization_record_type_id,
            'mbfc__Organization_Type__c': 'School'
        }
        # Remove keys with None values to avoid JSON serialization issues
        account_data = {k: v for k, v in account_data.items() if v is not None}
        
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Get Contact record ID from Salesforce
@lru_cache(maxsize=None)
def get_contact_id_by_record_number(record_number):
    if pd.isna(record_number):
        return None
    query = f"SELECT Id FROM Contact WHERE Archdpdx_Migration_Id__c = '{record_number}'"
    results = sf.query(query)
    if results['records']:
        return results['records'][0]['Id']
    return None


# Initialize an empty list to collect DataFrames or dictionaries
new_entries = []

# Process each row and each degree set
for index, row in df.iterrows():
    for degree_set in degree_sets:
        year = row[degree_set['year']]
        if pd.notna(year):  # Only proceed if the year column is not NaN
            formatted_year = f"{int(year)}-01-01"  # Convert year to YYYY-MM-DD format
            institution_name = row[degree_set['institution']]
            account_id = get_or_create_institution_account(institution_name)
            contact_id = get_contact_id_by_record_number(row['Record_Number'])
            
            # Create a record for the staging table
            affiliation_record = {
                'mbfc__Person__c': contact_id,
                'mbfc__Completion_Date__c': formatted_year,
                'mbfc__Context__c': account_id,
                'mbfc__Category__c': 'Education/Studies',
                'mbfc__Affiliation__c': row[degree_set['type']]
            }
            new_entries.append(affiliation_record)

# Convert all collected records to a DataFrame in one go
education_staging = pd.DataFrame(new_entries)


#FIXME: There are 4 rows where no INSTITUTION is listed. This makes it impossible to import an Affiliation record. Need to figure out how to handle this with Client. 
#FIXME: There are about 15 rows where no DEGREE is listed. This makes it impossible to import an Affiliation record. Need to figure out how to handle this with Client. 

SalesforceMalformedRequest: Malformed request https://adpdx--uat.sandbox.my.salesforce.com/services/data/v57.0/sobjects/Account/. Response content: [{'message': 'bad value for restricted picklist field: School', 'errorCode': 'INVALID_OR_NULL_FOR_RESTRICTED_PICKLIST', 'fields': ['mbfc__Organization_Type__c']}, {'message': 'bad value for restricted picklist field: School', 'errorCode': 'INVALID_OR_NULL_FOR_RESTRICTED_PICKLIST', 'fields': ['mbfc__Organization_Type__c']}]

In [None]:
# Apply the function to each row and create a new column with the unique ID
education_staging['Archdpdx_Migration_Id__c'] = education_staging.apply(create_unique_id, axis=1)

# Check the first few rows to verify the new column
education_staging.head()

In [None]:
# Fill any NaN values
education_staging = education_staging.fillna('')

In [None]:
# Save the staging table to CSV
education_staging.to_csv('staging_files/education_staging.csv', index=False)


In [528]:
import pandas as pd
import numpy as np
from simple_salesforce import Salesforce, SalesforceMalformedRequest, SalesforceError
from datetime import datetime, date



# def upsert_to_salesforce(sf, dataframe, object_name, external_id_field):
#     """
#     Upsert records to Salesforce from a pandas DataFrame.

#     Parameters:
#     sf (Salesforce): The Salesforce connection instance.
#     dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
#     object_name (str): The Salesforce object name (e.g., 'Contact').
#     external_id_field (str): The external ID field used for upserts.
#     """
#     successful_upserts = 0
#     failed_upserts = 0

#     # Replace placeholder values with None in the DataFrame
#     dataframe.replace({None: pd.NA, ' ': None, '': None}, inplace=True)

#     # Convert DataFrame to a list of dictionaries
#     data_to_upsert = dataframe.to_dict(orient='records')

#     for data in data_to_upsert:
#         try:
#             data = convert_non_serializables(data)
#             external_id = data.pop(external_id_field)

#             # Perform upsert using only the External ID
#             response = getattr(sf, object_name).upsert(f'{external_id_field}/{external_id}', data)
#             successful_upserts += 1
#             print(f"Successfully upserted {object_name} with External ID: {external_id}")
#         except SalesforceMalformedRequest as e:
#             failed_upserts += 1
#             print(f"Malformed request error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
#         except SalesforceError as e:
#             failed_upserts += 1
#             print(f"Salesforce error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
#         except Exception as e:
#             failed_upserts += 1
#             print(f"Failed to upsert {object_name} with External ID: {external_id}. Error: {e}")

#     print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")

def convert_non_serializables(data):
    """Convert non-serializable objects to serializable formats."""
    for key, value in data.items():
        try:
            if isinstance(value, (datetime, date)):
                data[key] = value.isoformat()
            elif isinstance(value, float) and np.isnan(value):
                data[key] = None
            elif pd.isna(value):
                data[key] = None
            elif isinstance(value, (int, bool, str)):
                data[key] = value
            else:
                data[key] = str(value)  # Convert other types to string
        except Exception as e:
            print(f"Error processing key: {key}, value: {value}, error: {e}")
    return data

def upsert_to_salesforce_bulk(sf, dataframe, object_name, external_id_field, failed_log_file, batch_size=10000):
    """
    Upsert records to Salesforce from a pandas DataFrame using the Bulk API.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
    object_name (str): The Salesforce object name (e.g., 'Contact').
    external_id_field (str): The external ID field used for upserts.
    failed_log_file (str): The file name where failed upsert records will be logged.
    batch_size (int): The number of records to include in each batch.
    """
    successful_upserts = 0
    failed_upserts = 0

    # Replace placeholder values with None in the DataFrame
    dataframe.replace({None: pd.NA, ' ': None, '': None}, inplace=True)

    # Convert DataFrame to a list of dictionaries
    data_to_upsert = dataframe.to_dict(orient='records')

    with open(failed_log_file, 'a') as log_file:
        # Process data in batches
        for i in range(0, len(data_to_upsert), batch_size):
            batch_data = data_to_upsert[i:i + batch_size]
            batch_data = [convert_non_serializables(record) for record in batch_data]

            try:
                # Perform bulk upsert
                response = sf.bulk.__getattr__(object_name).upsert(batch_data, external_id_field=external_id_field)

                for res in response:
                    if res['success']:
                        successful_upserts += 1
                    else:
                        failed_upserts += 1
                        log_file.write(f"Failed to upsert record: {res}\n")

            except SalesforceMalformedRequest as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Malformed request error when upserting batch. Error: {e.content}\n")
            except SalesforceError as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Salesforce error when upserting batch. Error: {e.content}\n")
            except Exception as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Failed to upsert batch. Error: {e}\n")

    print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")


In [None]:
# Upsert Education Affiliation records

# upsert_to_salesforce(sf, education_staging, 'mbfc__Affiliation__c', 'Archdpdx_Migration_Id__c')
upsert_to_salesforce_bulk(sf, education_staging, 'mbfc__Affiliation__c', 'Archdpdx_Migration_Id__c', 'results_files/education_affil', batch_size=1000)


In [None]:

#FIXME: A number of Education Affiliation records are missing either an Affiliation title or a Context

In [None]:
# Upsert Education Affiliation records [DEP]

# bulk_data = []
# for row in education_staging.itertuples(index=False):
#     d = row._asdict()
#     # del d['Index']
#     bulk_data.append(d)

# try:
#     # Attempt to upsert Education Affiliation records into SF using Bulk API
    # education_affil_upsert = sf.bulk.mbfc__Affiliation__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=500, use_serial=False)
    

# except SalesforceMalformedRequest as e:
#     # If a SalesforceMalformedRequest error occurs, print the error message and response content
#     print(f"SalesforceMalformedRequest error: {e}")
#     print(f"Response content: {e.content}")

# Send results to CSV
# education_affil_upsert_results = pd.DataFrame(education_affil_upsert)
# education_affil_upsert_results.to_csv('results_files/education_affil_upsert_results')

## Ecclesial Affiliations

This section handles individual Contact source table FIELDS that map to Affiliation RECORDS in the target system.

As the source data model and target data model are substantially different, this section groups together source columns into what will become individual records in the new system and populates missing information based on or required by the target system.

Example: each affiliation record in the target system requires a Context. In certain cases this data does not exist in the source or it is found in another column:

| Affiliation            | Context                   | Completion Date           |
| ---------------------- | ------------------------- | ------------------------- |
| First Vows             | Religious Order           | Date of First Vows        |
| Final Vows             | Religious Order           | Date of Final Vows        |
| Incardination          | Incardinated from Diocese | Incardinated From Date    |
| Faculties (Type)       | Local Diocese             | Faculties Granted Date    |
| Faculties (Restricted) | Local Diocese             | Faculties Restricted Date |
| Faculties (Withdrawn)  | Local Diocese             | Faculties Withdrawn Date  |
| Excardinated           | Excardinated To Diocese   | Excardinated To Date      |

Other examples of columns that need to be populated:

- RecordTypeId
- Category
- Start Date
- Completion Date

Depending on which column is being migrated, the date value might be considered to be a Start Date or a Completion Date in the target system, and needs to be staged accordingly.


In [192]:
# Generate a staging DF of Ecclesial Affiliations out of a handful of fields in the source data, each of which is to be converted into a new row in the staging DF.

# FIXME: There are a number of rows where a Faculties Granted is missing a date, and conversely, where there is a Faculties Granted Date but no description of the Faculties granted. This is a problem, because the application requires a date for when Faculties were granted.


import pandas as pd
from functools import lru_cache
from simple_salesforce import Salesforce

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)

# Define the structure of your column sets with correct attribute names
column_sets = [
    {'year': 'Incardinated_From_Date', 'context': 'Incardinated_From_Diocese'},
    {'year': 'Excardinated_To_Date', 'context': 'Excardinated_To_Diocese'},
    {'year': 'Faculties_Granted_Date', 'affiliation': 'Faculties'},
    {'year': 'Faculties_Restricted_Date'},
    {'year': 'Faculties_Withdrawn_Date'},
    {'year': 'Reader_Date'},  # Add Reader Date
    {'year': 'Acolyte_Date'},  # Add Acolyte Date
    {'year': 'Candidacy_Date'}  # Add Candidate Date
]



In [193]:

# Query for the Record Type IDs of Church, Religious    
record_type_query = "SELECT Id, DeveloperName FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName IN ('Church', 'Religious')"
record_type_result = sf.query(record_type_query)
record_type_ids = {record['DeveloperName']: record['Id'] for record in record_type_result['records']}

church_record_type_id = record_type_ids.get('Church')
religious_record_type_id = record_type_ids.get('Religious')

# Query for the Record Type IDs for 'Ecclesial_Affiliation' and 'Ministerial_Status' for mbfc__Affiliation__c object
record_type_query = "SELECT Id, DeveloperName FROM RecordType WHERE SobjectType = 'mbfc__Affiliation__c' AND DeveloperName IN ('Ecclesial_Affiliation', 'Ministerial_Status')"
record_type_result = sf.query(record_type_query)
record_type_ids = {record['DeveloperName']: record['Id'] for record in record_type_result['records']}

ecclesial_affiliation_record_type_id = record_type_ids.get('Ecclesial_Affiliation')
ministerial_status_record_type_id = record_type_ids.get('Ministerial_Status')

# Check if any of the required Record Types are missing
if not ecclesial_affiliation_record_type_id:
    raise ValueError("No RecordType found for Ecclesial Affiliation on mbfc__Affiliation__c object.")
if not ministerial_status_record_type_id:
    raise ValueError("No RecordType found for Ministerial Status on mbfc__Affiliation__c object.")

In [194]:

# Initialize the DataFrame for the staging table
ecclesial_affiliation_staging = pd.DataFrame()

# Function to check and create institution account
@lru_cache(maxsize=None)
def get_or_create_church_account(context):
    if pd.isna(context):
        return None  # Return None or handle as appropriate if institution name is NaN

    # Query Salesforce to find the institution
    query = f"SELECT Id, Name FROM Account WHERE Name = '{context}' LIMIT 1"
    results = sf.query(query)
    
    # If exists, return the ID
    if results['records']:
        return results['records'][0]['Id']
    else:
        # Ensure no NaN values are sent to Salesforce
        if 'Diocese' in context or 'Archdiocese' in context:
            account_data = {
                'Name': context if pd.notna(context) else "Church Name Missing",  # Provide a default if NaN
                'RecordTypeId': church_record_type_id,
                'mbfc__Church_Type__c': 'Diocese'
            }
        else:
            account_data = {
                'Name': context if pd.notna(context) else "Religious Name Missing",  # Provide a default if NaN
                'RecordTypeId': religious_record_type_id
            }

        # Remove keys with None values to avoid JSON serialization issues
        account_data = {k: v for k, v in account_data.items() if v is not None}
        
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Get Contact record ID from Salesforce
@lru_cache(maxsize=None)
def get_contact_id_by_record_number(record_number):
    if pd.isna(record_number):
        return None
    query = f"SELECT Id FROM Contact WHERE Archdpdx_Migration_Id__c = '{record_number}'"
    results = sf.query(query)
    if results['records']:
        return results['records'][0]['Id']
    return None

# Initialize an empty list to collect DataFrames or dictionaries
new_entries = []

# Process each row and each degree set
for index, row in df.iterrows():
    for col_set in column_sets:
        date = row[col_set['year']]
        if pd.notna(date):  # Only proceed if the year column is not NaN
            context = row.get(col_set.get('context'), None)
            account_id = get_or_create_church_account(context)
            contact_id = get_contact_id_by_record_number(row['Record_Number'])
            
            # Initialize all necessary variables with None
            start_date = None
            completion_date = None
            affiliation = None
            record_type_id = None
            category = None

            # Determine the mbfc__Affiliation__c value
            if 'Incardinated_From_Date' in col_set['year']:
                affiliation = 'Incardinated'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            elif 'Excardinated_To_Date' in col_set['year']:
                affiliation = 'Excardinated'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            elif 'Faculties_Granted_Date' in col_set['year']:
                faculties_value = row.get(col_set.get('affiliation', ''))
                if pd.isna(faculties_value):
                    affiliation = 'Faculties'
                else:
                    affiliation = f"Faculties ({faculties_value})"
                account_id = diocesan_account_id  # Override account ID for faculties
                start_date = date
                record_type_id = ministerial_status_record_type_id
                category = 'Faculties'
            elif 'Faculties_Restricted_Date' in col_set['year']:
                affiliation = 'Faculties (Restricted)'
                account_id = diocesan_account_id  # Override account ID for faculties
                completion_date = date
                record_type_id = ministerial_status_record_type_id
                category = 'Faculties'
            elif 'Faculties_Withdrawn_Date' in col_set['year']:
                affiliation = 'Faculties (Withdrawn)'
                account_id = diocesan_account_id  # Override account ID for faculties
                completion_date = date
                record_type_id = ministerial_status_record_type_id
                category = 'Faculties'
            elif 'Date_of_First_Vows' in col_set['year']:
                affiliation = 'First Vows'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            elif 'Date_of_Final_Vows' in col_set['year']:
                affiliation = 'Final Vows'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            elif 'Reader_Date' in col_set['year']:
                affiliation = 'Reader Installation'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Installations'
                account_id = diocesan_account_id
            elif 'Acolyte_Date' in col_set['year']:
                affiliation = 'Acolyte Installation'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Installations'
                account_id = diocesan_account_id
            elif 'Candidacy_Date' in col_set['year']:
                affiliation = 'Candidate Installation'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Installations'
                account_id = diocesan_account_id

            else:
                affiliation = row.get(col_set.get('affiliation', ''), None)
            
            # Create a record for the staging table
            affiliation_record = {
                'RecordTypeId': record_type_id,
                'mbfc__Person__c': contact_id,
                'mbfc__Completion_Date__c': completion_date,
                'mbfc__Start_Date__c': start_date,
                'mbfc__Context__c': account_id,
                'mbfc__Category__c': category,
                'mbfc__Affiliation__c': affiliation
            }
            new_entries.append(affiliation_record)

# Convert all collected records to a DataFrame in one go
ecclesial_affiliations_staging = pd.DataFrame(new_entries)

# Takes approx. 1.5 minutes to run

In [199]:
ecclesial_affiliations_staging.sample(20)

Unnamed: 0,RecordTypeId,mbfc__Person__c,mbfc__Completion_Date__c,mbfc__Start_Date__c,mbfc__Context__c,mbfc__Category__c,mbfc__Affiliation__c
362,012Dx0000003p5DIAQ,003Dx00000nKitVIAS,,2017-12-01,001Dx00001HwDsgIAF,Faculties,Faculties (General)
395,012Dx0000003p5DIAQ,003Dx00000nKiveIAC,,1970-10-18,001Dx00001HwDsgIAF,Faculties,Faculties (General)
933,012Dx0000003p5AIAQ,003Dx00000nKjOHIA0,2019-11-09,,001Dx00001HwDsgIAF,Installations,Reader Installation
28,012Dx0000003p5AIAQ,003Dx00000nKir3IAC,2020-07-01,,001Dx00001HwFF2IAN,Ecclesial Affiliations,Excardinated
852,012Dx0000003p5DIAQ,003Dx00000nKj3gIAC,,2022-07-01,001Dx00001HwDsgIAF,Faculties,Faculties (General)
663,012Dx0000003p5DIAQ,003Dx00000nKjTqIAK,,1992-09-05,001Dx00001HwDsgIAF,Faculties,Faculties (Restricted)
655,012Dx0000003p5AIAQ,003Dx00000nKjTIIA0,1975-04-26,,001Dx00001HwDsgIAF,Installations,Reader Installation
374,012Dx0000003p5DIAQ,003Dx00000nKiuFIAS,,2023-06-12,001Dx00001HwDsgIAF,Faculties,Faculties (Confessional)
629,012Dx0000003p5DIAQ,003Dx00000nKjI2IAK,,2020-07-01,001Dx00001HwDsgIAF,Faculties,Faculties (General)
10,012Dx0000003p5AIAQ,003Dx00000nKiqVIAS,2007-03-12,,001Dx00001HwDsgIAF,Installations,Reader Installation


In [201]:
# Apply the function to each row and create a new column with the unique ID
ecclesial_affiliations_staging['Archdpdx_Migration_Id__c'] = ecclesial_affiliations_staging.apply(create_unique_id, axis=1)

# Check for duplicates
ecclesial_affiliations_staging['Archdpdx_Migration_Id__c'].duplicated().value_counts()

False    984
True       1
Name: Archdpdx_Migration_Id__c, dtype: int64

In [202]:
# Send the new DataFrame to a CSV
ecclesial_affiliations_staging.to_csv('staging_files/Ecclesial_Affiliations_Staging.csv', index=False, encoding='utf-8-sig')

In [203]:
# NEW Upsert function to upsert Ecclesial Affiliation records
upsert_to_salesforce_bulk(sf, ecclesial_affiliations_staging, 'mbfc__Affiliation__c', 'Archdpdx_Migration_Id__c', 'results_files/ecclesial_affil_upsert_results')

#FIXME: ... the upsert_to_salesforce function is declared in a few places in this workbook (what a mess!!)  one of them works (the latter one), an earlier version of it does not. 

#FIXME: There are a number of rows where a Faculties Granted is missing a date, and conversely, where there is a Faculties Granted Date but no description of the Faculties granted. This is a problem, because the application requires a date for when Faculties were granted.

# Takes approx 1.5 minutes to run

Upsert completed. Successful upserts: 954, Failed upserts: 31


In [167]:
#FIXME: Handful of Ecclesial Affil records with error: [{'statusCode': 'FIELD_CUSTOM_VALIDATION_EXCEPTION', 'message': 'Context is required', 'fields': []}]"


# AFFILIATIONS


In [168]:
# Import Assignments.csv

import pandas as pd


df_affiliations = (
    pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Assignments (1).csv')
    .set_index('Record Number', verify_integrity=True)
    .drop(index='recNum', errors='ignore')  # Added errors='ignore' to prevent errors if 'recNum' does not exist
    .drop(columns=['Historic Name'], errors='ignore')  # Added errors='ignore' for the same reason
    .rename(columns=lambda x: x.replace(' ', '_'))  # Remove whitespace in column names
    .assign(Account_Ext_Id=lambda df: df['Organization_Table_Name'] + '_' + df['Organization_Table_Link'])
    # .assign(mbfc__Person__r=lambda df: df['Assigned_Person'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    # .assign(mbfc__Context__r=lambda df: df['Account_Ext_Id'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    # .assign(mbfc__Use_Custom_Title__c= True)
    .assign(mbfc__Category__c= 'Any All')
    # .assign(Archdpdx_Migration_Id__c= df_affiliations.index)
    .drop(columns=[
        # 'Assigned_Person'
        'Organization_Table_Name'
        ,'Organization_Table_Link'
        ,'Projected_Term_End_Date'
        ,'Term_Number'
        ,'Leave_Type' # Leave out 'Leave_Type' until mapped properly
        ])
    .rename(columns={
        'Duty_Load': 'mbfc__Duty_Load__c',
        'Start_Date': 'mbfc__Start_Date__c',
        'End_Date': 'mbfc__Completion_Date__c',
        'Assignment_Title': 'mbfc__Affiliation__c',
        'Archdiocesan_Assignment': 'adpdx_Archdiocesan_Assignment__c',
    })
    .replace({'ADPDX_Archdiocesan_Assignment__c': {'Yes': True, 'No': False, None: False}})
    .fillna('')
)

# Display a sample of the DataFrame to check the new structure
df_affiliations.sample(10)



Unnamed: 0_level_0,Assigned_Person,mbfc__Affiliation__c,adpdx_Archdiocesan_Assignment__c,mbfc__Duty_Load__c,mbfc__Start_Date__c,mbfc__Completion_Date__c,Account_Ext_Id,mbfc__Category__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
2753,2879,Project/Facility Coordinator,,,,,Parishes_53,Any All
43,310,Deacon,Yes,,2011-10-29,,Parishes_115,Any All
784,1585,Pastor,No,,2019-04-01,2019-07-15,Parishes_97,Any All
3262,3116,Novice Master,Yes,Full Time,2023-11-06,,RelCommunities_17,Any All
3481,3299,Office Manager,,,2023-01-01,,Schools_19,Any All
129,101,Deacon,Yes,,2003-09-01,2017-08-22,Parishes_100,Any All
719,681,Director of Continuing Education,Yes,,2019-07-01,,Offices_21,Any All
3126,710,Pastor,Yes,Full Time,2023-07-01,,Parishes_125,Any All
1836,2328,Administrative Assistant/Registrar,,,,,Schools_41,Any All
3327,3173,Office Assistant (Spanish),,,2023-01-01,,Parishes_16,Any All


In [169]:
# Get SF Record Ids from External Ids

# Get Context Account Ids
add_salesforce_record_ids(sf, df_affiliations, 'Account_Ext_Id', 'Account', 'Archdpdx_Migration_Id__c', 'mbfc__Context__c')

Unnamed: 0_level_0,Assigned_Person,mbfc__Affiliation__c,adpdx_Archdiocesan_Assignment__c,mbfc__Duty_Load__c,mbfc__Start_Date__c,mbfc__Completion_Date__c,Account_Ext_Id,mbfc__Category__c,mbfc__Context__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
3515,780,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Vicariates_16,Any All,001Dx00001HwDx2IAF
3514,762,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Vicariates_3,Any All,001Dx00001HwDwpIAF
3512,3321,Deacon,Yes,Full Time,2024-02-19,,Parishes_83,Any All,001Dx00001HwDzoIAF
3511,803,Special Assignment,Yes,Full Time,2024-02-10,,Offices_21,Any All,001Dx00001HwDyQIAV
3510,3317,Development Operations Associate,,,2024-01-16,,Offices_1,Any All,001Dx00001HwDyIIAV
...,...,...,...,...,...,...,...,...,...
5,511,Vicar Forane,Yes,,2016-10-01,2023-09-30,Vicariates_10,Any All,001Dx00001HwDwwIAF
4,511,Administrator,Yes,,2013-07-01,2017-06-30,Parishes_109,Any All,001Dx00001HwE0DIAV
3,511,Pastor,Yes,Full Time,2017-07-01,,Parishes_109,Any All,001Dx00001HwE0DIAV
2,318,Deacon,Yes,,2002-12-23,2016-06-30,Parishes_114,Any All,001Dx00001HwE0IIAV


In [170]:
# Get Person Contact Ids
add_salesforce_record_ids(sf, df_affiliations, 'Assigned_Person', 'Contact', 'Archdpdx_Migration_Id__c', 'mbfc__Person__c')

Unnamed: 0_level_0,Assigned_Person,mbfc__Affiliation__c,adpdx_Archdiocesan_Assignment__c,mbfc__Duty_Load__c,mbfc__Start_Date__c,mbfc__Completion_Date__c,Account_Ext_Id,mbfc__Category__c,mbfc__Context__c,mbfc__Person__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
3515,780,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Vicariates_16,Any All,001Dx00001HwDx2IAF,003Dx00000nKjOcIAK
3514,762,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Vicariates_3,Any All,001Dx00001HwDwpIAF,003Dx00000nKjUcIAK
3512,3321,Deacon,Yes,Full Time,2024-02-19,,Parishes_83,Any All,001Dx00001HwDzoIAF,003Dx00000nKjSuIAK
3511,803,Special Assignment,Yes,Full Time,2024-02-10,,Offices_21,Any All,001Dx00001HwDyQIAV,003Dx00000nKiw1IAC
3510,3317,Development Operations Associate,,,2024-01-16,,Offices_1,Any All,001Dx00001HwDyIIAV,003Dx00000nKjDGIA0
...,...,...,...,...,...,...,...,...,...,...
5,511,Vicar Forane,Yes,,2016-10-01,2023-09-30,Vicariates_10,Any All,001Dx00001HwDwwIAF,003Dx00000nKjQrIAK
4,511,Administrator,Yes,,2013-07-01,2017-06-30,Parishes_109,Any All,001Dx00001HwE0DIAV,003Dx00000nKjQrIAK
3,511,Pastor,Yes,Full Time,2017-07-01,,Parishes_109,Any All,001Dx00001HwE0DIAV,003Dx00000nKjQrIAK
2,318,Deacon,Yes,,2002-12-23,2016-06-30,Parishes_114,Any All,001Dx00001HwE0IIAV,003Dx00000nKipTIAS


In [171]:
# Set Archdpdx_Migration_Id__c External ID
df_affiliations['Archdpdx_Migration_Id__c'] = df_affiliations.index

# Create Job ID
df_affiliations['Archdpdx_Job_Id__c'] = curr_job_id

df_affiliations


Unnamed: 0_level_0,Assigned_Person,mbfc__Affiliation__c,adpdx_Archdiocesan_Assignment__c,mbfc__Duty_Load__c,mbfc__Start_Date__c,mbfc__Completion_Date__c,Account_Ext_Id,mbfc__Category__c,mbfc__Context__c,mbfc__Person__c,Archdpdx_Migration_Id__c,Archdpdx_Job_Id__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
3515,780,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Vicariates_16,Any All,001Dx00001HwDx2IAF,003Dx00000nKjOcIAK,3515,124
3514,762,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Vicariates_3,Any All,001Dx00001HwDwpIAF,003Dx00000nKjUcIAK,3514,124
3512,3321,Deacon,Yes,Full Time,2024-02-19,,Parishes_83,Any All,001Dx00001HwDzoIAF,003Dx00000nKjSuIAK,3512,124
3511,803,Special Assignment,Yes,Full Time,2024-02-10,,Offices_21,Any All,001Dx00001HwDyQIAV,003Dx00000nKiw1IAC,3511,124
3510,3317,Development Operations Associate,,,2024-01-16,,Offices_1,Any All,001Dx00001HwDyIIAV,003Dx00000nKjDGIA0,3510,124
...,...,...,...,...,...,...,...,...,...,...,...,...
5,511,Vicar Forane,Yes,,2016-10-01,2023-09-30,Vicariates_10,Any All,001Dx00001HwDwwIAF,003Dx00000nKjQrIAK,5,124
4,511,Administrator,Yes,,2013-07-01,2017-06-30,Parishes_109,Any All,001Dx00001HwE0DIAV,003Dx00000nKjQrIAK,4,124
3,511,Pastor,Yes,Full Time,2017-07-01,,Parishes_109,Any All,001Dx00001HwE0DIAV,003Dx00000nKjQrIAK,3,124
2,318,Deacon,Yes,,2002-12-23,2016-06-30,Parishes_114,Any All,001Dx00001HwE0IIAV,003Dx00000nKipTIAS,2,124


In [172]:
# Final cleanup
df_affiliations.drop(columns=[
    'Account_Ext_Id',
    'Assigned_Person', 
    ], 
    inplace=True)

df_affiliations

#FIXME: INVALID_FIELD: Foreign key external ID: relcommunities_23 not found for field Archdpdx_Migration_Id__c
#FIXME: INVALID_FIELD: Foreign key external ID: offices_0 not found for field Archdpdx_Migration_Id__c
#FIXME: Record #115 > FIELD_INTEGRITY_EXCEPTION: Start Date: invalid date: Tue Aug 01 00:00:00 GMT 1021 [mbfc__Start_Date__c

Unnamed: 0_level_0,mbfc__Affiliation__c,adpdx_Archdiocesan_Assignment__c,mbfc__Duty_Load__c,mbfc__Start_Date__c,mbfc__Completion_Date__c,mbfc__Category__c,mbfc__Context__c,mbfc__Person__c,Archdpdx_Migration_Id__c,Archdpdx_Job_Id__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
3515,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Any All,001Dx00001HwDx2IAF,003Dx00000nKjOcIAK,3515,124
3514,Presbyteral Council Rep,Yes,Part Time,2024-01-01,,Any All,001Dx00001HwDwpIAF,003Dx00000nKjUcIAK,3514,124
3512,Deacon,Yes,Full Time,2024-02-19,,Any All,001Dx00001HwDzoIAF,003Dx00000nKjSuIAK,3512,124
3511,Special Assignment,Yes,Full Time,2024-02-10,,Any All,001Dx00001HwDyQIAV,003Dx00000nKiw1IAC,3511,124
3510,Development Operations Associate,,,2024-01-16,,Any All,001Dx00001HwDyIIAV,003Dx00000nKjDGIA0,3510,124
...,...,...,...,...,...,...,...,...,...,...
5,Vicar Forane,Yes,,2016-10-01,2023-09-30,Any All,001Dx00001HwDwwIAF,003Dx00000nKjQrIAK,5,124
4,Administrator,Yes,,2013-07-01,2017-06-30,Any All,001Dx00001HwE0DIAV,003Dx00000nKjQrIAK,4,124
3,Pastor,Yes,Full Time,2017-07-01,,Any All,001Dx00001HwE0DIAV,003Dx00000nKjQrIAK,3,124
2,Deacon,Yes,,2002-12-23,2016-06-30,Any All,001Dx00001HwE0IIAV,003Dx00000nKipTIAS,2,124


In [173]:
df_affiliations.to_csv('staging_files/affiliations_staging.csv', encoding='utf-8', index=False)

In [174]:
import pandas as pd
from simple_salesforce import Salesforce
from simple_salesforce.exceptions import SalesforceMalformedRequest, SalesforceError

def convert_non_serializables(record):
    """Convert non-serializable values to strings or handle them appropriately."""
    for key, value in record.items():
        if pd.isna(value):
            record[key] = None
        elif isinstance(value, pd.Timestamp):
            record[key] = value.isoformat()
        elif isinstance(value, (pd.Timedelta, pd.Period)):
            record[key] = str(value)
    return record

def upsert_to_salesforce_bulk(sf, dataframe, object_name, external_id_field, failed_log_file, batch_size=10000):
    """
    Upsert records to Salesforce from a pandas DataFrame using the Bulk API.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
    object_name (str): The Salesforce object name (e.g., 'Contact').
    external_id_field (str): The external ID field used for upserts.
    failed_log_file (str): The file name where failed upsert records will be logged.
    batch_size (int): The number of records to include in each batch.
    """
    successful_upserts = 0
    failed_upserts = 0

    # Replace placeholder values with None in the DataFrame
    dataframe.replace({pd.NA: None, ' ': None, '': None}, inplace=True)

    # Convert DataFrame to a list of dictionaries
    data_to_upsert = dataframe.to_dict(orient='records')

    with open(failed_log_file, 'a') as log_file:
        # Process data in batches
        for i in range(0, len(data_to_upsert), batch_size):
            batch_data = data_to_upsert[i:i + batch_size]
            batch_data = [convert_non_serializables(record) for record in batch_data]

            try:
                # Perform bulk upsert
                response = sf.bulk.__getattr__(object_name).upsert(batch_data, external_id_field=external_id_field)

                for res in response:
                    if res['success']:
                        successful_upserts += 1
                    else:
                        failed_upserts += 1
                        log_file.write(f"Failed to upsert record: {res}\n")

            except SalesforceMalformedRequest as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Malformed request error when upserting batch. Error: {e.content}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")
            except SalesforceError as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Salesforce error when upserting batch. Error: {e.content}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")
            except Exception as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Failed to upsert batch. Error: {e}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")

    print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")

In [175]:
upsert_to_salesforce_bulk(sf, 'mbfc__Affiliation__c', df_affiliations, 'Archdpdx_Migration_Id__c', 'results_files/affiliation_upsert_results')

TypeError: str.replace() takes no keyword arguments

In [None]:
# @ Upsert Register Entry Records

bulk_data = []
for row in df_affiliations.itertuples(index=False):
    d = row._asdict()
    bulk_data.append(d)

In [None]:
# Upsert Salesforce records
# FIXME: Encoding is getting messed up and I'm unsure how to pass in a parameter that will fix this. 

try:
    # Attempt to upsert Affiliation records into SF using Bulk API
    affiliation_upsert = sf.bulk.mbfc__Affiliation__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=1000, use_serial=False)
    affiliation_upsert_results = pd.DataFrame(affiliation_upsert)
    affiliation_upsert_results.to_csv('results_files/affiliation_upsert_results')

except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")


# Post-Migration Manual Updates

1. Convert 'Offices' that are ADPDX Pastoral Centre offices into record type: 'Groups', and set their parentID to the Diocese (there are just 6 of these accounts).
1. Update the Religous Order records 'Religious Superior' lookup.
1. Set 'organization type' field value for each account in the 'organization' load: Offices, Newman Centres, Schools, Organizations
1. Consolidate education degree titles in 'Affiliation.Affiliation' picklist into the standard value
