<a href="https://colab.research.google.com/github/Cath-Strategic-Tech/adpdx_etl/blob/main/ADPDX_ClergyDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Introduction

This notebook orchestrates the migration of ADPDX clergy database records into Salesforce: Accounts first, then Contacts and their related records.


# Order of Loading

1. Vicariates
1. Organizations [MANUAL]
1. Religious Parents
1. Religious Communities
1. Religious Superiors
1. Contacts
1. Contact > Register Entries
1. Contact > Education Affiliations [MANUAL]
1. Contact > Ecclesial Affiliations [MANUAL]
1. Affiliations [MANUAL]


# Order of Operations

- Setup Enviro

  - [DONE] UDFs
  - [DONE] Load SF xref data

- ACCOUNTS

  - Extract Source Data
    - [DONE] Load 6 tables into separate dataframes
    - [DONE] Merge into single accounts table
    - [DONE] Fix the ExternalID so that it references the original table, not the AccountRecordType
  - Transform
    - Strip phone numbers
    - Validate email addresses
    - TODO: handle churches that aren't parishes (missions, non-diocesan parishes, etc.)
  - Load
    - [DONE] Vicariates
    - [DONE] Organizations (Parishes, Schools, Newman Centers, Offices)
    - Religious
      - [DONE] Religious Parent accounts
      - [DONE] Religious Communities
      - [DONE] Religious Superiors (Contacts, set AccountID to Rel. Parent)
        - [DONE] Handle invalid email addresses
        - TODO: Handle duplicate entries
      - TODO: Update Religious Communities with lookup to Rel. Superior
  - TODO: Unit Tests
    - Num of Accounts, by type
    - Spot checking 3-5 account records & field values

- CONTACTS

  - Extract

    - [DONE] Import Contact records
    - TODO: Get Photo directory @soames

  - Analysis

    - [DONE] Check columns & row count (3016)
    - [DONE] Identify unique languages

  - Transform

    - Complete ETL of fields that are more complex (search for TODO)
    - [DONE] Create new df_contact_staging, renaming columns to SF APIs
    - [DONE] Drop columns that don't map to Contact
    - Migrate Languages field (waiting on next package version) @soames
      - TODO: transform `,` to `;` so the values import into the multi-select picklist correctly
    - TODO: Concat Mailing Street Address lines into one
    - TODO: Handle Private Addresses: decide whether to make code changes or NOT use a custom Private Address field.
    - [DONE] Update boolean fields to True/False
    - [DONE] Set Contact Record Type (UDF)
    - [DONE] Validate, drop invalid emails
    - [DONE] Generate ExternalID > 'Archdpdx_External_Id\_\_c'
    - TODO: Preferred Email/Phone > where blank, set a default. Currently, all are getting set to 'Personal' and 'Mobile.'
    - TODO: Ecclesial Status (not mapping correctly)
    - [DONE] DROP columns that haven't been mapped yet

  - Load
    - [DONE] Set JobID to curr_job_id
    - [DONE] Handle character encoding that is getting garbled

- CONTACTS > SPOUSES

- CONTACTS > PHOTOS

- CONTACTS > REGISTER ENTRIES

  - Parse columns into types of Sacraments or Notations
  - For lookups to Celebrants, query SF for contacts, create missing records
  - Generate External ID, apply to df
  - Clean up (remove extra columns, NaNs)
  - Upsert records

- CONTACTS > AFFILIATIONS

  - Map the various Contact fields that are actually Affiliations (start with manual migration)
    - Education/Degrees
    - Minor Orders
    - Religious Vows
    - Candidacy records (should this be another object?)
    - In/Excardination
    - Faculties

- AFFILIATIONS TABLE

  - Extract

    - [DONE] Turn the 'Org Table Name' & 'org Table Link' columns into External ID
    - Map in the Account IDs from SF

  - Transform

    - Parse RecordTypeId
    - Parse Category
    - Map columns to SF field APIs

  - Load
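
Three of the open Transform items above (stripping phone numbers, converting the Languages separator, and concatenating street address lines) could be sketched as plain functions. This is only a sketch under assumptions: the function names are hypothetical, "strip" is taken to mean "keep digits only", and the street lines are joined with a newline because Salesforce street fields accept multi-line values.

```python
import re


def strip_phone(raw):
    """Keep only the digits of a phone number; return None when nothing is left."""
    if not isinstance(raw, str):
        return None  # NaN / None / numeric junk is treated as missing
    digits = re.sub(r"\D", "", raw)
    return digits or None


def languages_to_multiselect(raw):
    """Turn 'English, Spanish' into 'English;Spanish' for a multi-select picklist."""
    if not isinstance(raw, str):
        return None
    parts = [part.strip() for part in raw.split(",")]
    return ";".join(part for part in parts if part) or None


def join_street_lines(*lines):
    """Join the non-empty street lines with a newline; return None if all are empty."""
    kept = [line.strip() for line in lines if isinstance(line, str) and line.strip()]
    return "\n".join(kept) or None


print(strip_phone("503-233-8337"))
print(languages_to_multiselect("English, Spanish"))
print(repr(join_street_lines("Pastoral Center", None, "2838 E Burnside St")))
```

Each function takes a single value, so it can be applied to a DataFrame column with `Series.map`, or row-wise for the street lines.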


# Setup Enviro


In [763]:
# !conda install -y simple-salesforce
# !conda install -y email_validator
# !conda install -y python-dotenv
# !conda install import-ipynb


In [764]:
# enviro setup

import pandas as pd
import numpy as np

from datetime import datetime
now = datetime.now()

from simple_salesforce import Salesforce

In [765]:
# import environment variables (SF login credentials)
from dotenv import load_dotenv
import os

load_dotenv()

True

In [766]:
# @title Global Variables { run: "auto", vertical-output: true, display-mode: "both" }

target_enviro = "adpdx_devpro" # @param {type:"string"}

# @markdown The `run_upserts` variable controls whether or not upserts to Salesforce are executed when the notebook is run.
run_upserts = "True" # @param ["True", "False"]
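
Because the form dropdown stores `run_upserts` as the string `"True"` or `"False"`, and any non-empty string is truthy in Python, a guard written as `if run_upserts:` would also run when `"False"` is selected. A minimal sketch of converting the value to a real boolean first; `as_bool` and `upserts_enabled` are hypothetical names, not part of the notebook.

```python
def as_bool(value):
    """Interpret a form value such as "True" / "False" as a real boolean."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"true", "1", "yes"}


run_upserts = "False"  # the value as it would arrive from the form dropdown
upserts_enabled = as_bool(run_upserts)

if upserts_enabled:
    print("upserts will run")
else:
    print("upserts skipped")
```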

In [767]:
# ADPDX credentials (sandbox login, loaded from environment variables)
adpdx_user = os.getenv('ADPDX_UAT_USER')
print(adpdx_user)
adpdx_pass = os.getenv('ADPDX_UAT_PASS')
adpdx_token = os.getenv('ADPDX_UAT_TOKEN')
# never print the password or security token: cell output is saved with the notebook

# instantiate a SF session object
sf = Salesforce(domain='test', username=adpdx_user, password=adpdx_pass, security_token=adpdx_token)

matt+adpdx@meribahflow.com.uat


## UDFs


In [768]:
from simple_salesforce import Salesforce

# Job ID Incrementer
def update_job_id(file_name):
    # Open the file in read mode and get the current job ID
    with open(file_name, 'r') as file:
        current_job_id = int(file.readline())

    # Increment the job ID
    new_job_id = current_job_id + 1

    # Open the file in write mode and update the job ID
    with open(file_name, 'w') as file:
        file.write(str(new_job_id))

    # Return the new job ID
    return new_job_id


# Concatenates DataFrame columns into an External ID
def concat_columns(df, columns, new_column, separator='_'):
    """
    Concatenates the values from specified columns into a single string
    with the specified separator and populates a new column in the DataFrame.

    Args:
    - df: pandas DataFrame
    - columns: list of column names to concatenate
    - new_column: name of the new column to be created
    - separator: separator to use between concatenated values (default is '_')

    Returns:
    - Updated pandas DataFrame with the new column
    """
    df[new_column] = df[columns].astype(str).apply(lambda x: separator.join(x), axis=1)
    return df


# Gets or creates a Diocesan account based on the Account Name
def get_or_create_diocesan_account(sf, account_name):
    """
    Searches for an account by name, returns the ID if found,
    otherwise creates the account with RecordType 'Church' and 'mbfc__Church_Type__c' set to 'Diocese',
    and then returns the new ID.

    Parameters:
    sf (Salesforce): Salesforce connection object
    account_name (str): The name of the account to search for or create

    Returns:
    str: The ID of the found or created account
    """

    # Query for the Record Type ID using the Developer Name 'Church'
    record_type_query = "SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = 'Church' LIMIT 1"
    record_type_result = sf.query(record_type_query)
    if record_type_result['records']:
        record_type_id = record_type_result['records'][0]['Id']
    else:
        raise ValueError("No RecordType found with DeveloperName 'Church'")

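    # Search for the Account by name
    # (escape backslashes and single quotes so names containing an apostrophe don't break the SOQL)
    safe_name = account_name.replace("\\", "\\\\").replace("'", "\\'")
    account_query = f"SELECT Id FROM Account WHERE Name = '{safe_name}' LIMIT 1"
    account_result = sf.query(account_query)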
    
    if account_result['records']:
        # Account found, return the ID
        return account_result['records'][0]['Id']
    else:
        # Account not found, create a new Account
        account_data = {
            'Name': account_name,
            'RecordTypeId': record_type_id,
            'mbfc__Church_Type__c': 'Diocese'
        }
        new_account = sf.Account.create(account_data)
        return new_account['id']
    

# improved version of the get_or_create_diocesan_account function
def get_or_create_account(sf, account_name, record_type_dev_name, church_type):
    """
    Searches for an account by name, returns the ID if found,
    otherwise creates the account with the specified Record Type and Church Type,
    and then returns the new ID.

    Parameters:
    sf (Salesforce): Salesforce connection object
    account_name (str): The name of the account to search for or create
    record_type_dev_name (str): The developer name of the Record Type to use for creating the account
    church_type (str): The Church Type to set for the new account

    Returns:
    str: The ID of the found or created account
    """

    # Query for the Record Type ID using the provided developer name
    record_type_query = f"SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = '{record_type_dev_name}' LIMIT 1"
    record_type_result = sf.query(record_type_query)
    if record_type_result['records']:
        record_type_id = record_type_result['records'][0]['Id']
    else:
        raise ValueError(f"No RecordType found with DeveloperName '{record_type_dev_name}'")

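    # Search for the Account by name
    # (escape backslashes and single quotes so names containing an apostrophe don't break the SOQL)
    safe_name = account_name.replace("\\", "\\\\").replace("'", "\\'")
    account_query = f"SELECT Id FROM Account WHERE Name = '{safe_name}' LIMIT 1"
    account_result = sf.query(account_query)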
    
    if account_result['records']:
        # Account found, return the ID
        return account_result['records'][0]['Id']
    else:
        # Account not found, create a new Account
        account_data = {
            'Name': account_name,
            'RecordTypeId': record_type_id,
            'mbfc__Church_Type__c': church_type
        }
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Example usage
# sf = Salesforce(username='your_username', password='your_password', security_token='your_security_token')
# account_id = get_or_create_account(sf, 'Diocese of Calgary', 'Church', 'Diocese')
# print(f"Account ID: {account_id}")

In [769]:
# Add a Salesforce record ID column to a DataFrame based on matching external ID field values

import pandas as pd
from simple_salesforce import Salesforce
from simple_salesforce.exceptions import SalesforceMalformedRequest, SalesforceError

def add_salesforce_record_ids(sf, dataframe, df_column_name, sf_object_name, sf_external_id_field, new_column_name, chunk_size=1000):
    """
    Add a Salesforce record ID column to a DataFrame based on matching external ID field values.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to match.
    df_column_name (str): The column name in the DataFrame to match with Salesforce.
    sf_object_name (str): The Salesforce object name (e.g., 'Contact').
    sf_external_id_field (str): The external ID field in Salesforce to match.
    new_column_name (str): The name for the new DataFrame column to hold Salesforce record IDs.
    chunk_size (int): The number of records to include in each chunk for querying Salesforce.

    Returns:
    pd.DataFrame: The original DataFrame with the new column containing Salesforce record IDs.
    """
    # Ensure the dataframe column name exists in the dataframe
    if df_column_name not in dataframe.columns:
        raise ValueError(f"Column '{df_column_name}' not found in DataFrame.")
    
    # Create a set of unique values from the specified DataFrame column
    unique_values = dataframe[df_column_name].dropna().unique()
    
    id_mapping = {}
    
    # Process the unique values in chunks
    for start in range(0, len(unique_values), chunk_size):
        chunk_values = unique_values[start:start + chunk_size]
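        # escape backslashes and single quotes so the values are valid SOQL string literals
        escaped_values = [str(val).replace("\\", "\\\\").replace("'", "\\'") for val in chunk_values]
        chunk_values_str = ", ".join(f"'{val}'" for val in escaped_values)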
        
        soql_query = f"SELECT Id, {sf_external_id_field} FROM {sf_object_name} WHERE {sf_external_id_field} IN ({chunk_values_str})"
        
        try:
            query_result = sf.query_all(soql_query)
        except SalesforceMalformedRequest as e:
            raise ValueError(f"Malformed request error: {e.content}")
        except SalesforceError as e:
            raise ValueError(f"Salesforce error: {e.content}")
        
        # Update the id_mapping with results from the current chunk
        id_mapping.update({record[sf_external_id_field]: record['Id'] for record in query_result['records']})
    
    # Map the Salesforce record IDs to the DataFrame
    dataframe[new_column_name] = dataframe[df_column_name].map(id_mapping)
    
    return dataframe

In [770]:
import pandas as pd
from simple_salesforce import Salesforce
from simple_salesforce.exceptions import SalesforceMalformedRequest, SalesforceError

def find_salesforce_record_id(sf, df, column_to_search, sf_object_name, sf_field_name, new_column_name, match_behavior='first'):
    """
    Find Salesforce record IDs for a DataFrame column and add a new column with the Salesforce record IDs.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    df (pd.DataFrame): The pandas DataFrame containing data.
    column_to_search (str): The column name in the DataFrame to search against Salesforce.
    sf_object_name (str): The Salesforce object name (e.g., 'Contact').
    sf_field_name (str): The field name in Salesforce to match.
    new_column_name (str): The name for the new DataFrame column to hold Salesforce record IDs.
    match_behavior (str): Behavior when multiple matches found ('first' or 'alert').

    Returns:
    pd.DataFrame: The original DataFrame with the new column containing Salesforce record IDs.
    """
    if column_to_search not in df.columns:
        raise ValueError(f"Column '{column_to_search}' not found in DataFrame.")

    id_mapping = {}  # accumulated across all chunks
    multiple_matches_found = False

    unique_values = df[column_to_search].dropna().unique()
    chunk_size = 1000  # Adjust chunk size as needed

    for start in range(0, len(unique_values), chunk_size):
        chunk_values = unique_values[start:start + chunk_size]
        # escape backslashes and single quotes so the values are valid SOQL string literals
        escaped_values = [str(val).replace("\\", "\\\\").replace("'", "\\'") for val in chunk_values]
        chunk_values_str = ", ".join(f"'{val}'" for val in escaped_values)

        soql_query = f"SELECT Id, {sf_field_name} FROM {sf_object_name} WHERE {sf_field_name} IN ({chunk_values_str})"
        
        try:
            query_result = sf.query_all(soql_query)
        except SalesforceMalformedRequest as e:
            raise ValueError(f"Malformed request error: {e.content}")
        except SalesforceError as e:
            raise ValueError(f"Salesforce error: {e.content}")

        for record in query_result['records']:
            key = record[sf_field_name]
            if key in id_mapping:
                multiple_matches_found = True
                if match_behavior == 'first':
                    continue  # Skip subsequent matches if 'first' behavior is selected
            id_mapping[key] = record['Id']

    # Map once, after every chunk has been queried, so earlier chunks are not overwritten
    df[new_column_name] = df[column_to_search].map(id_mapping)

    if multiple_matches_found and match_behavior == 'alert':
        print("Alert: Multiple matches found for some records.")

    return df

# Example usage
# df_contact_staging = find_salesforce_record_id(sf, df_contact_staging, 'Link_to_Religious_Community', 'Contact', 'Archdpdx_Migration_Id__c', 'New_Column_Name', match_behavior='alert')


In [771]:
# Upsert functions

import pandas as pd
import numpy as np
from simple_salesforce import Salesforce, SalesforceMalformedRequest, SalesforceError
from datetime import datetime, date



def upsert_to_salesforce(sf, dataframe, object_name, external_id_field):
    """
    Upsert records to Salesforce from a pandas DataFrame.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
    object_name (str): The Salesforce object name (e.g., 'Contact').
    external_id_field (str): The external ID field used for upserts.
    """
    successful_upserts = 0
    failed_upserts = 0

    # Replace placeholder values with None in the DataFrame
    dataframe.replace({None: pd.NA, ' ': None, '': None}, inplace=True)

    # Convert DataFrame to a list of dictionaries
    data_to_upsert = dataframe.to_dict(orient='records')

    for data in data_to_upsert:
        try:
            data = convert_non_serializables(data)
            external_id = data.pop(external_id_field)

            # Perform upsert using only the External ID
            response = getattr(sf, object_name).upsert(f'{external_id_field}/{external_id}', data)
            successful_upserts += 1
            print(f"Successfully upserted {object_name} with External ID: {external_id}")
        except SalesforceMalformedRequest as e:
            failed_upserts += 1
            print(f"Malformed request error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
        except SalesforceError as e:
            failed_upserts += 1
            print(f"Salesforce error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
        except Exception as e:
            failed_upserts += 1
            print(f"Failed to upsert {object_name} with External ID: {external_id}. Error: {e}")

    print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")


def convert_non_serializables(data):
    """Convert non-serializable objects to serializable formats."""
    for key, value in data.items():
        try:
            if isinstance(value, (datetime, date)):
                data[key] = value.isoformat()
            elif isinstance(value, float) and np.isnan(value):
                data[key] = None
            elif pd.isna(value):
                data[key] = None
            elif isinstance(value, (int, bool, str)):
                data[key] = value
            else:
                data[key] = str(value)  # Convert other types to string
        except Exception as e:
            print(f"Error processing key: {key}, value: {value}, error: {e}")
    return data

def upsert_to_salesforce_bulk(sf, dataframe, object_name, external_id_field, failed_log_file, batch_size=100):
    """
    Upsert records to Salesforce from a pandas DataFrame using the Bulk API.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
    object_name (str): The Salesforce object name (e.g., 'Contact').
    external_id_field (str): The external ID field used for upserts.
    failed_log_file (str): The file name where failed upsert records will be logged.
    batch_size (int): The number of records to include in each batch.
    """
    successful_upserts = 0
    failed_upserts = 0

    # Replace placeholder values with None in the DataFrame
    dataframe.replace({pd.NA: None, ' ': None, '': None}, inplace=True)

    # Convert DataFrame to a list of dictionaries
    data_to_upsert = dataframe.to_dict(orient='records')

    with open(failed_log_file, 'a') as log_file:
        # Process data in batches
        for i in range(0, len(data_to_upsert), batch_size):
            batch_data = data_to_upsert[i:i + batch_size]
            batch_data = [convert_non_serializables(record) for record in batch_data]

            try:
                # Perform bulk upsert
                response = getattr(sf.bulk, object_name).upsert(batch_data, external_id_field=external_id_field)

                for res in response:
                    if res['success']:
                        successful_upserts += 1
                    else:
                        failed_upserts += 1
                        log_file.write(f"Failed to upsert record: {res}\n")

            except SalesforceMalformedRequest as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Malformed request error when upserting batch. Error: {e.content}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")
            except SalesforceError as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Salesforce error when upserting batch. Error: {e.content}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")
            except Exception as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Failed to upsert batch. Error: {e}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")

    print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")


## Extract Salesforce xref data

The cells below download records from the target Salesforce environment for the following objects:

- RecordTypes
- Users
- Accounts
- Contacts


In [772]:
# Get or create the Diocesan Account and get its ID

# calls old function
# diocesan_account_id = get_or_create_diocesan_account(sf, 'Archdiocese of Portland in Oregon')

# calls new function
diocesan_account_id = get_or_create_account(sf, 'Archdiocese of Portland in Oregon', 'Church', 'Diocese')

print(f"Account ID: {diocesan_account_id}")

Account ID: 001Dx00001HwDsgIAF


In [773]:
# get all ACTIVE SF users

sf_users = sf.query('Select Alias, FirstName, LastName, Username, id from User WHERE IsActive = True')
df_sf_users = pd.DataFrame(sf_users['records'])
df_sf_users = df_sf_users.drop(columns = 'attributes')
df_sf_users.shape

(20, 5)

In [774]:
# get all SF Record Types
get_all_recordTypes = 'Select Id, Name, DeveloperName, sObjecttype, namespaceprefix from RecordType'

# get list of records, add to dataframe
sf_recordTypes = sf.query(get_all_recordTypes)
df_sf_recordTypes = pd.DataFrame(sf_recordTypes['records'])
df_sf_recordTypes = df_sf_recordTypes.drop(columns = 'attributes')

# Create a dictionary mapping 'DeveloperName' to 'Id' for faster lookup
record_types_mapping = df_sf_recordTypes.set_index('DeveloperName')['Id'].to_dict()

df_sf_recordTypes

Unnamed: 0,Id,Name,DeveloperName,SobjectType,NamespacePrefix
0,012Dx0000003p4xIAA,Church,Church,Account,mbfc
1,012Dx0000003p4yIAA,Deanery,Deanery,Account,mbfc
2,012Dx0000003p4zIAA,Group,Group,Account,mbfc
3,012Dx0000003p50IAA,Organization,Organization,Account,mbfc
4,012Dx0000003p51IAA,Property,Property,Account,mbfc
5,012Dx0000003p52IAA,Religious,Religious,Account,mbfc
6,012Dx0000003p53IAA,z) All Types,All_Types,mbfc__Affiliation__c,mbfc
7,012Dx0000003p54IAA,Any,Any,mbfc__Affiliation__c,mbfc
8,012Dx0000003p55IAA,Pastoral Assignments,Assignments_Clergy,mbfc__Affiliation__c,mbfc
9,012Dx0000003p56IAA,Chancery Users,Chancery_Users,mbfc__Affiliation__c,mbfc


In [775]:
# get SF Account
get_all_accounts = 'Select id, Name, RecordTypeId, Type, mbfc__Parish_Code__c, Job_Id__c, Archdpdx_Migration_Id__c from Account'

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')
df_sf_accounts.shape

(2000, 7)

In [776]:
# get SF Contacts
get_all_contacts = 'Select id, Name, npe01__Type_of_Account__c, RecordTypeId, Archdpdx_Migration_Id__c, CreatedById from Contact'

# get list of records, add to dataframe
sf_contacts = sf.query(get_all_contacts)
df_sf_contacts = pd.DataFrame(sf_contacts['records'])
df_sf_contacts = df_sf_contacts.drop(columns = 'attributes')
df_sf_contacts.shape

(2000, 6)

# ACCOUNTS


## Extract


### Load ArchdPDX csvs as DataFrames

ADPDX data for organizations is held in 6 tables, all of which will be migrated into Salesforce's Accounts object.


In [777]:
df_offices = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Offices.csv', skiprows= lambda x: x in [1])
df_offices["src_table"] = 'Offices'
df_offices["AccountRecordType"] = 'Organization'
df_offices.rename({
    "Common Name": "Name",
    "Name": "Formal_Name__c"
    }, axis="columns", inplace=True)


In [778]:
df_parishes = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Parishes (3).csv', dtype={'Vicariate': 'object', 'Established': 'str', 'Mission Of': 'str'}, skiprows= lambda x: x in [1])
df_parishes["src_table"] = 'Parishes'
df_parishes["AccountRecordType"] = 'Church'
# df_parishes.rename({"Parish Formal Name": "Account Name"}, axis="columns", inplace=True)
df_parishes.rename({
                    "Parish Formal Name": "Formal_Name__c",
                    "Common Name": "Name"
                }, axis="columns", inplace=True)


In [779]:
df_religious = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/RelCommunities.csv', skiprows= lambda x: x in [1])
df_religious["src_table"] = 'RelCommunities'
df_religious["AccountRecordType"] = 'Religious'
df_religious.rename({
                    "Community Name": "Formal_Name__c",
                    "Common Name": "Name"
                     }, axis="columns", inplace=True)


In [780]:
df_schools = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Schools.csv', skiprows= lambda x: x in [1])
df_schools["src_table"] = 'Schools'
df_schools["AccountRecordType"] = 'Organization'
df_schools.rename({
                    "School Name": "Formal_Name__c",
                    "Common Name": "Name"
                    
                    }, axis="columns", inplace=True)

In [781]:
df_vicariates = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Vicariates.csv', skiprows= lambda x: x in [1])
df_vicariates["src_table"] = 'Vicariates'
df_vicariates["AccountRecordType"] = 'Deanery'
# As we want to designate the Common Name as what will be the Account Name in Salesforce, we are renaming these columns in a different pattern than prior CSVs.
df_vicariates.rename({"Common Name": "Name"}, axis="columns", inplace=True)
df_vicariates

Unnamed: 0,Record Number,Name,Vicariate Name,Archdiocese Assigns Clergy,src_table,AccountRecordType
0,1,Albany-Corvallis Vicariate,Albany-Corvallis,Yes,Vicariates,Deanery
1,2,"Beaverton, Suburban Vicariate","Beaverton, Suburban",Yes,Vicariates,Deanery
2,3,Columbia County Vicariate,Columbia County,Yes,Vicariates,Deanery
3,4,Downtown Portland Vicariate,Downtown Portland,Yes,Vicariates,Deanery
4,5,"East Portland, Suburban Vicariate","East Portland, Suburban",Yes,Vicariates,Deanery
5,6,Marion County Vicariate,Marion County,Yes,Vicariates,Deanery
6,7,Metropolitan Eugene Vicariate,Metropolitan Eugene,Yes,Vicariates,Deanery
7,8,Metropolitan Salem Vicariate,Metropolitan Salem,Yes,Vicariates,Deanery
8,9,North Coast Vicariate,North Coast,Yes,Vicariates,Deanery
9,10,Northeast Portland Vicariate,Northeast Portland,Yes,Vicariates,Deanery


In [782]:
df_newman = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/NewmanCenters.csv', skiprows= lambda x: x in [1])
df_newman["src_table"] = 'NewmanCenters'
df_newman["AccountRecordType"] = 'Organization'
df_newman.rename({
                    "Newman Center Name": "Formal_Name__c",
                    "Common Name": "Name",
                    "Newman Center City": "Mailing Address City2"
                  }, axis="columns", inplace=True)


Each of the 6 tables has an overlapping but distinct set of columns, making it challenging to conform these tables into a single staging table.

In addition, columns that correspond to the same field in Salesforce are named differently in each table (e.g. 'Parish City' vs. 'Community City' vs. 'Newman Center City').
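
One way to reconcile such differently named columns is to take, for each row, the first non-missing value among the per-table variants. The sketch below assumes the four city columns that appear in the merged table and treats `None`, `NaN` and blank strings as missing; `first_present` and `CITY_COLUMNS` are hypothetical names, the sample row is made up for illustration, and the target Salesforce field is left open.

```python
def first_present(row, candidates):
    """Return the first non-missing value among the `candidates` keys of a dict-like row."""
    for name in candidates:
        value = row.get(name)
        # None and NaN (NaN is the only value not equal to itself) count as missing
        if value is None or value != value:
            continue
        # so do blank strings
        if isinstance(value, str) and not value.strip():
            continue
        return value
    return None


# per-table variants of the "city" column, as named after the renames above
CITY_COLUMNS = ["Parish City", "Community City", "School City", "Mailing Address City2"]

# illustrative row: only the religious-community variant is filled in
row = {"Parish City": float("nan"), "Community City": "Albany", "School City": None}
print(first_present(row, CITY_COLUMNS))
```

Since a pandas row also supports `.get`, the same function could be applied with `accounts.apply(lambda r: first_present(r, CITY_COLUMNS), axis=1)` once the tables are merged.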


In [783]:
print('TABLE: (ROWS, COLUMNS)\n')

print(f'Offices:    {df_offices.shape}')
print(f'Parishes:   {df_parishes.shape}')
print(f'Religious:  {df_religious.shape}')
print(f'Schools:    {df_schools.shape}')
print(f'Vicariates: {df_vicariates.shape}')
print(f'Newman Ctr: {df_newman.shape}')

TABLE: (ROWS, COLUMNS)

Offices:    (35, 18)
Parishes:   (151, 45)
Religious:  (70, 34)
Schools:    (56, 26)
Vicariates: (18, 6)
Newman Ctr: (4, 37)


### Merge DFs into a single Accounts DF

This step takes 6 different tables and combines them into a single Accounts table for cleaning and staging.


In [784]:
# init list of DataFrames
src_accounts = [df_offices, df_parishes, df_religious, df_schools, df_vicariates, df_newman]

# concats the various Account dataframes into one large table
accounts = pd.concat(src_accounts, ignore_index=True)

In [785]:
accounts.columns

Index(['Record Number', 'Name', 'Formal_Name__c', 'Archdiocese Assigns Clergy',
       'Locator Description', 'Mailing Address', 'Mailing Address 2',
       'Mailing Address City', 'Mailing Address State',
       'Mailing Address Province', 'Mailing Address Postal Code',
       'Mailing Address Country', 'Phone', 'Fax', 'Email', 'Web Site',
       'src_table', 'AccountRecordType', 'Sort Name', 'Parish Name',
       'Parish City', 'Mission Of', 'Established', 'Vicariate', 'Non-Latin',
       'County', 'Disabled Access', 'Sanctuary Capacity',
       'Lat/Long Coordinates Decimal', 'Google Small Embed URL',
       'Miles to Pastoral Center', 'Schedule 1 Head', 'Schedule 1 Text',
       'Schedule 2 Head', 'Schedule 2 Text', 'Schedule 3 Head',
       'Schedule 3 Text', 'Schedule 4 Head', 'Schedule 4 Text',
       'Schedule 5 Head', 'Schedule 5 Text', 'Schedule 6 Head',
       'Schedule 6 Text', 'Schedule 7 Head', 'Schedule 7 Text',
       'Community City', 'Order Full Name', 'Order Common N

## Transform


Time to do some table column renaming and re-organizing!


In [786]:
# rename column headers to consolidate account fields into the SF-conformed data model
accounts.rename({"Common Name": "Name, City"}, axis="columns", inplace=True)

accounts.rename(
    columns={
        # 'Account Name': 'Name',
        'Mailing Address': 'BillingStreet1',
        'Mailing Address 2': 'BillingStreet2',
        'Mailing Address City': 'BillingCity',
        'Mailing Address State': 'BillingState',
        'Mailing Address Postal Code': 'BillingPostalCode',
        'Mailing Address Country': 'BillingCountry',
        'Email': 'mbfc__Email__c',
        'Web Site': 'Website',
        'Order Common Name': 'mbfc__Abbreviation__c',
        'Order Letters': 'mbfc__Religious_Suffix__c',
        'Men or Women': 'mbfc__Type_Members__c',
        'Archdiocese Assigns Clergy': 'Archdiocese_Assigns_Clergy__c',
        'Locator Description': 'Locator_Description__c',
        'Mission Of': 'Parent_Parish__c',
        'Established': 'mbfc__Date_Established__c',
        'County': 'County__c',
        'Disabled Access': 'Disabled_Access__c',
        'Sanctuary Capacity': 'Sanctuary_Capacity__c',
        'Miles to Pastoral Center': 'Miles_to_Pastoral_Centre__c',
        'Archdiocesan School Code': 'Archdiocesan_School_Code__c',
        'Grades Provided': 'Grades_Provided__c'

    },
    inplace=True
)


# reorder columns
col = accounts.pop('Name')
accounts.insert(2, col.name, col)

col = accounts.pop('Parish Name')
accounts.insert(3, col.name, col)

col = accounts.pop('AccountRecordType')
accounts.insert(1, col.name, col)
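
By default `DataFrame.rename` ignores mapping keys that match no existing column (`errors='ignore'`), so a misspelled source column name passes silently and the column keeps its old name. A small check such as the sketch below can flag those keys before the rename runs; `unmatched_rename_keys` is a hypothetical helper and the two-entry mapping is only an illustration.

```python
def unmatched_rename_keys(columns, mapping):
    """List the keys of a rename mapping that match no existing column name."""
    existing = set(columns)
    return [key for key in mapping if key not in existing]


# illustrative data: the second key differs from the column by one spelling variant
columns = ["Mailing Address", "Miles to Pastoral Center"]
mapping = {
    "Mailing Address": "BillingStreet1",
    "Miles to Pastoral Centre": "Miles_to_Pastoral_Centre__c",
}
print(unmatched_rename_keys(columns, mapping))
```

In the notebook this would be called as `unmatched_rename_keys(accounts.columns, mapping)`. Passing `errors='raise'` to `rename` is an alternative when every key is expected to match.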



In [787]:
accounts[accounts.BillingStreet2.notna()]

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingStreet1,BillingStreet2,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,src_table,Sort Name,Parish City,Parent_Parish__c,mbfc__Date_Established__c,Vicariate,Non-Latin,County__c,Disabled_Access__c,Sanctuary_Capacity__c,Lat/Long Coordinates Decimal,Google Small Embed URL,Miles to Pastoral Center,Schedule 1 Head,Schedule 1 Text,Schedule 2 Head,Schedule 2 Text,Schedule 3 Head,Schedule 3 Text,Schedule 4 Head,Schedule 4 Text,Schedule 5 Head,Schedule 5 Text,Schedule 6 Head,Schedule 6 Text,Schedule 7 Head,Schedule 7 Text,Community City,Order Full Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Non-Latin Rite,Show Order in Name,Description,Religious Order,Secular Order,Diocesan Order,Pontifical Order,Local Superior,Major Superior Name,Major Superior Phone,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2
14,32,Organization,Diaconate Office,Diaconate Office,,Yes,,Pastoral Center,2838 E Burnside St,Portland,OR,,97214,,503-233-8337,,bdiehm@archdpdx.org,https://deacons.archdpdx.org/,Offices,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
32,58,Organization,Office of Marketing and Communications,Office of Marketing and Communications,,Yes,,Pastoral Center,2838 E Burnside St,Portland,OR,,97214,,503-233-8332,,news@archdpdx.org,https://archdpdx.org/communications,Offices,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
35,1,Church,"Our Lady of Perpetual Help, St Mary’s","Our Lady of Perpetual Help, St Mary’s, Albany",,Yes,SW Ellsworth St between 8th and 9th Streets,"Our Lady of Perpetual Help, St Mary’s Parish",815 Broadalbin St SW,Albany,OR,,97321,,541-926-1449,541-926-2191,olphoffice@stmarysalbany.com,https://stmarysalbany.com/,Parishes,our lady of perpetual help st marys albany,Albany,0,1885,1,No,Linn,Yes,600.0,"44.6313042,-123.1059622",https://www.google.com/maps/embed?pb=!1m14!1m8...,72.0,Weekend Mass,"Saturday Vigil 5:00 pm, 7:00 pm (Español)<br>S...",Weekday Mass,"Tuesday Noon<br>Wednesday 8:30 am, 7:00 pm (Es...",Reconciliation (Confession),Saturday 3:00 – 4:30 pm,Adoration,Wednesday 6:00 pm – 7:00 pm in the Church<br>D...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
36,2,Church,St. Andrew Dũng-Lạc,"St. Andrew Dũng-Lạc Mission, Aloha",,No,SW Grabhorn Rd/209th Ave and Farmington Rd,St. Andrew Dũng-Lạc Mission,7390 SW Grabhorn Rd,Aloha,OR,,97007,,503-591-5302,,,http://www.anredl.org/,Parishes,st andrew dunglac aloha,Aloha,83,0,13,No,Washington,No,0.0,"45.4667627,-122.893276",https://www.google.com/maps/embed?pb=!1m14!1m8...,18.0,Weekend Mass,"Sunday 9:00 am, 11:00 am (Youth)",Weekday Mass,Tuesday 6:30 pm,Reconciliation (Confession),Sunday 10:15–11:15 am<br>Tuesday 6:00 pm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
37,3,Church,St. Elizabeth Ann Seton,"St. Elizabeth Ann Seton, Aloha",,Yes,,St. Elizabeth Ann Seton Parish,3145 SW 192nd Ave,Aloha,OR,,97003,,503-649-9044,503-848-2915,admin@seas-aloha.org,http://www.seas-aloha.org/,Parishes,st elizabeth ann seton aloha,Aloha,0,1982,16,No,Washington,Yes,675.0,"45.4965071,-122.8780289",https://www.google.com/maps/embed?pb=!1m14!1m8...,17.0,Weekend Mass,"Saturday 5:30 pm (Español)<br>Sunday 8:00 am, ...",Weekday Mass,Tuesday–Friday 8:00 am<br>Tuesday 7:00 pm (Esp...,Reconciliation (Confession),Tuesday 8:00–9:00 pm<br>Friday 9:00–10:00 am<b...,Adoration/Adoración,Sunday 11:00 am–3:00 pm (chapel)<br>Monday–Wed...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
236,62,Religious,Work of Jesus the High Priest,"Work of Jesus the High Priest, Gresham (OJSS)",,No,,OJSS Community,451 NW 1st St,Gresham,OR,,97030,,,,,https://www.familiemariens.info/html/en/index....,RelCommunities,,,,,,,,,,,,,,,,,,,,,,,,,,,Gresham,Opera di Gesù Sommo Sacerdote,Work of Jesus the High Priest,OJSS,Men,No,Yes,Missionary brothers and priests associated wit...,Yes,No,,,2977.0,Fr. Gebhard Paul M. Sigl,,,,,,,,,,,
238,64,Religious,Heralds of the Good News,"Heralds of the Good News, Portland (HGN)",,No,,c/o Chancellor,2838 E Burnside St,Portland,OR,,97214,,503-233-8322,,vschueler@archdpdx.org,,RelCommunities,,,,,,,,,,,,,,,,,,,,,,,,,,,Portland,Heralds of the Good News,Heralds of the Good News,HGN,Men,No,Yes,,,,,,0.0,Fr. Kappumkal Thomas,+91 80 74 51 02 67,rkappumkal@gmail.com,,,,,,,,,
239,65,Religious,Missionary Oblates of Mary Immaculate,"Missionary Oblates of Mary Immaculate, Rome, I...",,No,,Missionary Oblates of Mary Immaculate,Via Aurelia 290,Roma,,,00165,ITALY,,,,,RelCommunities,,,,,,,,,,,,,,,,,,,,,,,,,,,"Rome, ITALY",Missionary Oblates of Mary Immaculate,Oblates of Mary Immaculate,OMI,Men,No,Yes,,,,,,0.0,"Fr. Luis Ignacio Rois Alonso, OMI",,gensec@omigen.org,,,,,,,,,
247,73,Religious,Brothers of Saint John,"Brothers of Saint John, Laredo, TX (CSJ)",,No,,St. John Priory,505 Century Dr S,Laredo,TX,,78046,,956-285-3784,,br.jmp@stjean.com,https://csjohn.org/,RelCommunities,,,,,,,,,,,,,,,,,,,,,,,,,,,"Laredo, TX",Brothers of Saint John,Brothers of Saint John,CSJ,Men,No,Yes,,,,,,0.0,,,,,,,,,,,,


In [788]:
# Merge the two Non-Latin source columns into the single mbfc__Non_Latin__c field
accounts['mbfc__Non_Latin__c'] = accounts['Non-Latin'].combine_first(accounts['Non-Latin Rite'])


In [789]:
# Export a description of the merged table to CSV for field mapping
accounts_summary = accounts.describe(include='all').transpose()
accounts_summary.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/working/accounts.csv')
accounts_summary

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Record Number,334.0,,,,54.5,41.389801,1.0,21.25,45.0,76.75,173.0
AccountRecordType,334,4,Church,151,,,,,,,
Formal_Name__c,316,273,St. Mary,5,,,,,,,
Name,334,334,Pastoral Center,1,,,,,,,
Parish Name,5,5,St. Anne,1,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
Mailing Address 1,56,55,4420 SW St Marys Dr,2,,,,,,,
Mailing Address Zip,56.0,,,,97222.446429,124.9586,97005.0,97134.75,97217.5,97301.0,97526.0
Vicariate Name,18,18,Albany-Corvallis,1,,,,,,,
Mailing Address City2,4,4,Corvallis,1,,,,,,,


In [790]:
# Create a single BillingStreet field

# Join the two street lines with a newline (CHAR(10)), skipping missing values
accounts['BillingStreet'] = accounts[['BillingStreet1', 'BillingStreet2']].apply(lambda x: '\n'.join(x.dropna()), axis=1)

# Drop the original columns
accounts.drop(columns=['BillingStreet1', 'BillingStreet2'], inplace=True)
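
To see what the street concatenation produces for each combination of missing values, here is a small sketch on made-up rows (the `demo` frame is an assumption, not source data):

```python
import pandas as pd

demo = pd.DataFrame({
    'BillingStreet1': ['Pastoral Center', '333 SW Skyline Blvd', None],
    'BillingStreet2': ['2838 E Burnside St', None, None],
})

# Same pattern as the cell above: drop missing parts, join the rest with a newline
demo['BillingStreet'] = demo[['BillingStreet1', 'BillingStreet2']].apply(
    lambda row: '\n'.join(row.dropna()), axis=1
)

street_values = demo['BillingStreet'].tolist()
print(street_values)
```

Rows with no street lines at all end up as an empty string rather than NaN, which is worth keeping in mind for later null checks.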

In [791]:
# Handle boolean fields

boolean_columns_to_convert = [
    'Archdiocese_Assigns_Clergy__c', 
    'mbfc__Non_Latin__c', 
    'Disabled_Access__c', 
    ]

# Convert 'Yes'/'No' to True/False
accounts[boolean_columns_to_convert] = accounts[boolean_columns_to_convert].replace({'Yes': True, 'No': False, None: False})



In [792]:
accounts[boolean_columns_to_convert].sample(10)

Unnamed: 0,Archdiocese_Assigns_Clergy__c,mbfc__Non_Latin__c,Disabled_Access__c
235,False,False,False
302,True,False,False
191,False,False,False
244,False,False,False
99,True,False,True
328,True,False,False
164,True,False,True
227,False,False,False
76,True,False,True
49,True,False,True
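
An alternative way to express the Yes/No conversion is a direct comparison, which yields a strict boolean column. This is a sketch on made-up values, and it assumes that anything other than the literal `'Yes'` (including missing values) should become `False`:

```python
import pandas as pd

demo_flags = pd.Series(['Yes', 'No', None, 'Yes'])

# eq() compares element-wise; missing values compare as False
as_bool = demo_flags.eq('Yes')
print(as_bool.tolist())
```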


In [793]:
# Religious Order fields > conform to new data model

# Apply logic to create new columns
accounts['Religious_Secular_Order__c'] = accounts.apply(
    lambda x: 'Religious Order' if x['Religious Order'] == 'Yes' else ('Secular Order' if x['Secular Order'] == 'Yes' else None), axis=1
)

accounts['Pontifical_or_Diocesan_Order__c'] = accounts.apply(
    lambda x: 'Diocesan Order' if x['Diocesan Order'] == 'Yes' else ('Pontifical Order' if x['Pontifical Order'] == 'Yes' else None), axis=1
)

accounts.drop(columns=['Religious Order', 'Secular Order', 'Diocesan Order', 'Pontifical Order'], inplace=True)
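
The row-wise `apply` above can also be written with boolean masks. The sketch below uses made-up rows and assumes, as the lambda does, that 'Religious Order' wins when both flags are 'Yes':

```python
import pandas as pd

demo = pd.DataFrame({
    'Religious Order': ['Yes', 'No', None, 'Yes'],
    'Secular Order':   ['No', 'Yes', None, 'Yes'],
})

# Start with all-missing values, then fill in from lowest to highest precedence
order_type = pd.Series([None] * len(demo), index=demo.index, dtype=object)
order_type[demo['Secular Order'].eq('Yes')] = 'Secular Order'
order_type[demo['Religious Order'].eq('Yes')] = 'Religious Order'  # takes precedence

demo['Religious_Secular_Order__c'] = order_type
print(demo['Religious_Secular_Order__c'].tolist())
```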

In [794]:
print(accounts['mbfc__Date_Established__c'].dtype)

object


In [795]:
# Handle Date fields that are only YYYY

# Cast every value to a string so the checks below work uniformly (missing values become the string 'nan')
accounts['mbfc__Date_Established__c'] = accounts['mbfc__Date_Established__c'].astype(str)

# Turn a 4-digit year (optionally followed by a decimal part such as '.0') into January 1 of that year; anything else becomes NaT
def transform_year(year):
    whole = year.split('.')[0]
    if year.replace('.', '', 1).isdigit() and len(whole) == 4:
        return pd.to_datetime(whole + '-01-01')
    return pd.NaT

# Apply the function to the 'mbfc__Date_Established__c' column
accounts['mbfc__Date_Established__c'] = accounts['mbfc__Date_Established__c'].apply(transform_year)


In [796]:
accounts['mbfc__Date_Established__c'].sample(10)

290   1925-01-01
46    1884-01-01
79    1858-01-01
276          NaT
45           NaT
330          NaT
320          NaT
230          NaT
23           NaT
197          NaT
Name: mbfc__Date_Established__c, dtype: datetime64[ns]
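
For comparison, the same year-to-date idea can be sketched without a row-wise function. This is an alternative under stated assumptions, not the notebook's method: `year_to_date` is a hypothetical helper, and the 1000 to 9999 range check stands in for the "four digits" rule.

```python
import pandas as pd

def year_to_date(values: pd.Series) -> pd.Series:
    """Convert year-like values to January 1 of that year; everything else becomes NaT."""
    years = pd.to_numeric(values, errors='coerce')
    years = years.where(years.between(1000, 9999))  # keep 4-digit years only
    # Missing years become the text '<NA>', which fails to parse and is coerced to NaT
    as_text = years.floordiv(1).astype('Int64').astype(str) + '-01-01'
    return pd.to_datetime(as_text, format='%Y-%m-%d', errors='coerce')

demo_years = pd.Series(['1885', '1982.0', '0', None, 'unknown'])
converted = year_to_date(demo_years)
print(converted)
```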

In [797]:
# Format Parent_Parish__c field

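# Blank out the '0' placeholder (no parent parish); match the whole value so IDs such as '10' or '102' keep their zeros
accounts['Parent_Parish__c'] = accounts['Parent_Parish__c'].str.replace(r'^0$', '', regex=True)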



In [798]:
# Prefix non-empty parent IDs so they match the parish external-ID format (e.g. 'Parishes_83')
accounts['Parent_Parish__c'] = accounts['Parent_Parish__c'].apply(lambda x: 'Parishes_' + x if pd.notna(x) and x != '' else x)


In [799]:
# Check final results (blank values are empty strings, not NaN, so they still appear here)
accounts.Parent_Parish__c[accounts.Parent_Parish__c.notna()].sample(10)

86     
60     
136    
137    
84     
42     
152    
73     
120    
141    
Name: Parent_Parish__c, dtype: object
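
When blanking out a `'0'` placeholder in an ID column, it matters whether the match is on the whole value or on any character. A short sketch on made-up IDs shows the difference:

```python
import pandas as pd

ids = pd.Series(['0', '83', '10', '102', None])

# Substring replacement removes every '0' character, altering real IDs
stripped = ids.str.replace('0', '', regex=False)

# Whole-value replacement only touches cells that are exactly '0'
whole = ids.replace('0', '')

print(stripped.tolist())
print(whole.tolist())
```

Only the whole-value form leaves `'10'` and `'102'` intact.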

In [800]:
# ParentID field

accounts['ParentId'] = accounts['Parent_Parish__c']


### AccountRecordType & ChurchType


In [801]:
# Set every Church account's type to Parish (missions and non-diocesan parishes are not yet distinguished; see the TODO above)
accounts.loc[accounts['AccountRecordType'] == 'Church', 'mbfc__Church_Type__c'] = 'Parish'
accounts[accounts['AccountRecordType'] == 'Church'].head(5)


Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,src_table,Sort Name,Parish City,Parent_Parish__c,mbfc__Date_Established__c,Vicariate,Non-Latin,County__c,Disabled_Access__c,Sanctuary_Capacity__c,Lat/Long Coordinates Decimal,Google Small Embed URL,Miles to Pastoral Center,Schedule 1 Head,Schedule 1 Text,Schedule 2 Head,Schedule 2 Text,Schedule 3 Head,Schedule 3 Text,Schedule 4 Head,Schedule 4 Text,Schedule 5 Head,Schedule 5 Text,Schedule 6 Head,Schedule 6 Text,Schedule 7 Head,Schedule 7 Text,Community City,Order Full Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Non-Latin Rite,Show Order in Name,Description,Local Superior,Major Superior Name,Major Superior Phone,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c
35,1,Church,"Our Lady of Perpetual Help, St Mary’s","Our Lady of Perpetual Help, St Mary’s, Albany",,True,SW Ellsworth St between 8th and 9th Streets,Albany,OR,,97321,,541-926-1449,541-926-2191,olphoffice@stmarysalbany.com,https://stmarysalbany.com/,Parishes,our lady of perpetual help st marys albany,Albany,,1885-01-01,1,No,Linn,True,600.0,"44.6313042,-123.1059622",https://www.google.com/maps/embed?pb=!1m14!1m8...,72.0,Weekend Mass,"Saturday Vigil 5:00 pm, 7:00 pm (Español)<br>S...",Weekday Mass,"Tuesday Noon<br>Wednesday 8:30 am, 7:00 pm (Es...",Reconciliation (Confession),Saturday 3:00 – 4:30 pm,Adoration,Wednesday 6:00 pm – 7:00 pm in the Church<br>D...,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,"Our Lady of Perpetual Help, St Mary’s Parish\n...",,,,Parish
36,2,Church,St. Andrew Dũng-Lạc,"St. Andrew Dũng-Lạc Mission, Aloha",,False,SW Grabhorn Rd/209th Ave and Farmington Rd,Aloha,OR,,97007,,503-591-5302,,,http://www.anredl.org/,Parishes,st andrew dunglac aloha,Aloha,Parishes_83,NaT,13,No,Washington,False,0.0,"45.4667627,-122.893276",https://www.google.com/maps/embed?pb=!1m14!1m8...,18.0,Weekend Mass,"Sunday 9:00 am, 11:00 am (Youth)",Weekday Mass,Tuesday 6:30 pm,Reconciliation (Confession),Sunday 10:15–11:15 am<br>Tuesday 6:00 pm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,St. Andrew Dũng-Lạc Mission\n7390 SW Grabhorn Rd,,,Parishes_83,Parish
37,3,Church,St. Elizabeth Ann Seton,"St. Elizabeth Ann Seton, Aloha",,True,,Aloha,OR,,97003,,503-649-9044,503-848-2915,admin@seas-aloha.org,http://www.seas-aloha.org/,Parishes,st elizabeth ann seton aloha,Aloha,,1982-01-01,16,No,Washington,True,675.0,"45.4965071,-122.8780289",https://www.google.com/maps/embed?pb=!1m14!1m8...,17.0,Weekend Mass,"Saturday 5:30 pm (Español)<br>Sunday 8:00 am, ...",Weekday Mass,Tuesday–Friday 8:00 am<br>Tuesday 7:00 pm (Esp...,Reconciliation (Confession),Tuesday 8:00–9:00 pm<br>Friday 9:00–10:00 am<b...,Adoration/Adoración,Sunday 11:00 am–3:00 pm (chapel)<br>Monday–Wed...,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,St. Elizabeth Ann Seton Parish\n3145 SW 192nd Ave,,,,Parish
38,4,Church,St. Peter the Fisherman,"St. Peter the Fisherman Mission, Arch Cape",,True,79441 Hwy 101 S,Seaside,OR,,97138,,503-436-2876,,olvoffice@archdpdx.org,http://ourladyofvictoryseaside.org/,Parishes,st peter the fisherman arch cape,Arch Cape,Parishes_131,NaT,9,No,Clatsop,True,0.0,"45.8115472,-123.962712",https://www.google.com/maps/embed?pb=!1m14!1m8...,90.0,Weekend Mass,Sunday 9:00 am,Reconciliation (Confession),Friday 10:00 am<br>or by appointment,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,St. Peter the Fisherman Mission\nPO Box 29,,,Parishes_131,Parish
39,5,Church,Our Lady of the Mountain,"Our Lady of the Mountain, Ashland",,True,,Ashland,OR,,97520,,541-482-1146,541-488-5174,olmop@mind.net,https://ourladymt.org/,Parishes,our lady of the mountain ashland,Ashland,,1887-01-01,15,No,Jackson,True,0.0,"42.1774285,-122.6864436",https://www.google.com/maps/embed?pb=!1m14!1m8...,291.0,Weekend Mass,"Saturday 5:00 pm<br>Sunday 9:30 am, Noon (Espa...",Weekday Mass,"Wednesday, Friday 9:30 am<br>Tuesday, Thursday...",Reconciliation (Confession),"First Saturday 8:15–8:45 am, 9:45–11 am<br>All...",Adoration,First Friday 9:00 am–6:00 pm<br>or 24 hours pe...,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,Our Lady of the Mountain Parish\n987 Hillview Dr,,,,Parish


### Generate ExternalId


In [802]:
# Generate an External ID of the form <src_table>_<Record Number>
columns_to_concat = ['src_table', 'Record Number']
accounts = concat_columns(accounts, columns_to_concat, 'Archdpdx_Migration_Id__c', separator='_')
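
`concat_columns` is one of the notebook's UDFs and its definition is not shown in this section. Judging only by the call above and the resulting IDs (such as `Parishes_57`), a helper with that behaviour might look like the following. This is a guess at its shape, named `concat_columns_sketch` to keep it distinct from the real UDF:

```python
import pandas as pd

def concat_columns_sketch(df, columns, new_column, separator='_'):
    """Return a copy of df with new_column holding the joined string values of columns."""
    out = df.copy()
    out[new_column] = out[columns].astype(str).agg(separator.join, axis=1)
    return out

demo = pd.DataFrame({
    'src_table': ['Parishes', 'Offices'],
    'Record Number': [57, 1],
})
demo = concat_columns_sketch(demo, ['src_table', 'Record Number'], 'Archdpdx_Migration_Id__c')
print(demo['Archdpdx_Migration_Id__c'].tolist())
```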

In [803]:
# Map each AccountRecordType to its Salesforce RecordTypeId
accounts['RecordTypeId'] = accounts['AccountRecordType'].map(record_types_mapping)
record_types_mapping

{'Church': '012Dx0000003p4xIAA',
 'Deanery': '012Dx0000003p4yIAA',
 'Group': '012Dx0000003p4zIAA',
 'Organization': '012Hu000001pkqEIAQ',
 'Property': '012Dx0000003p51IAA',
 'Religious': '012Dx0000003p5KIAQ',
 'All_Types': '012Dx0000003p53IAA',
 'Any': '012Dx0000003p54IAA',
 'Assignments_Clergy': '012Dx0000003p55IAA',
 'Chancery_Users': '012Dx0000003p56IAA',
 'Clergy_Religious_Residence': '012Dx0000003p57IAA',
 'Diocean_Users': '012Dx0000003p58IAA',
 'Diocesan_Appointment': '012Dx0000003p59IAA',
 'Ecclesial_Affiliation': '012Dx0000003p5AIAQ',
 'Education': '012Dx0000003p5BIAQ',
 'Lay_Person': '012Dx0000003p5HIAQ',
 'Ministerial_Status': '012Dx0000003p5DIAQ',
 'Parish_Affiliations': '012Dx0000003p5EIAQ',
 'Staff': '012Dx0000003p5FIAQ',
 'Consecrated': '012Dx0000003p5GIAQ',
 'Permanent_Deacon': '012Dx0000003p5IIAQ',
 'Priest': '012Dx0000003p5JIAQ',
 'MajorGift': '012Hu000001pkqBIAQ',
 'Grant': '012Hu000001pkqCIAQ',
 'HH_Account': '012Hu000001pkqDIAQ',
 'Donation': '012Hu000001pkqFIAQ',
 

## Load


### Generate a new Job ID


In [804]:
# Increment the stored job ID
file_name = '/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/jobs/job_id'
curr_job_id = update_job_id(file_name)
print(f"New job ID: {curr_job_id}")

# add/update account DF with job_id
accounts["Job_Id__c"] = curr_job_id


New job ID: 116
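
`update_job_id` is another UDF defined outside this section. A file-backed counter with the same call shape could be sketched as below; the file format (a single integer as text) and the name `update_job_id_sketch` are assumptions.

```python
import os
import tempfile
from pathlib import Path

def update_job_id_sketch(path):
    """Read the integer stored at path (0 if the file is absent), add 1, write it back, return it."""
    p = Path(path)
    current = int(p.read_text().strip()) if p.exists() else 0
    new_id = current + 1
    p.write_text(str(new_id))
    return new_id

# Demonstrate with a throwaway file rather than the shared-drive path
with tempfile.TemporaryDirectory() as tmp:
    counter = os.path.join(tmp, 'job_id')
    first = update_job_id_sketch(counter)
    second = update_job_id_sketch(counter)

print(first, second)
```

Stamping every record with the job ID makes it possible to query Salesforce for exactly the rows touched by one run.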


### A) Vicariates


In [805]:
# Get the Deanery RecordTypeId for the Account object
deanery_recordTypeId = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Deanery') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


# Upsert the Vicariates holding account
vicariate_account = sf.Account.upsert('Archdpdx_Migration_Id__c/Vicariates_Holding_Acc',
    {
    "Name": "Vicariates",
    "ParentId": diocesan_account_id,
    "mbfc__Diocese__c": diocesan_account_id,
    "RecordTypeId": deanery_recordTypeId,
    # "mbfc__Group_Type__c": 'Office',
    "Job_Id__c": curr_job_id
    }
)

# Get Vicariate Holding Acc's SF ID (as an upsert doesn't return the actual record ID)
vicariate_account = sf.Account.get_by_custom_id('Archdpdx_Migration_Id__c', 'Vicariates_Holding_Acc')
vicariate_account_id = vicariate_account['Id']

vicariate_account_id

'001Dx00001HwDuDIAV'

In [806]:
# Prepare Vicariates staging DF

# Select the Deanery rows and staging columns; .copy() avoids writing to a slice of accounts
vicariates = accounts.loc[
    accounts['AccountRecordType'] == 'Deanery',
    [
        'Record Number',
        'Name',
        # 'AccountRecordType',
        'Job_Id__c',
        'Archdpdx_Migration_Id__c',
        'RecordTypeId'
    ]
].copy()

# Add the diocese and parent (holding account) lookups
vicariates["mbfc__Diocese__c"] = diocesan_account_id
vicariates['ParentId'] = vicariate_account_id
# vicariates['mbfc__Church_Type__c'] = 'Deanery'
vicariates['RecordTypeId'] = deanery_recordTypeId

vicariates.set_index('Record Number', inplace=True)

vicariates

Record Number,Name,Job_Id__c,Archdpdx_Migration_Id__c,RecordTypeId,mbfc__Diocese__c,ParentId
1,Albany-Corvallis Vicariate,116,Vicariates_1,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
2,"Beaverton, Suburban Vicariate",116,Vicariates_2,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
3,Columbia County Vicariate,116,Vicariates_3,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
4,Downtown Portland Vicariate,116,Vicariates_4,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
5,"East Portland, Suburban Vicariate",116,Vicariates_5,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
6,Marion County Vicariate,116,Vicariates_6,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
7,Metropolitan Eugene Vicariate,116,Vicariates_7,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
8,Metropolitan Salem Vicariate,116,Vicariates_8,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
9,North Coast Vicariate,116,Vicariates_9,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
10,Northeast Portland Vicariate,116,Vicariates_10,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV


#### Export Vicariates to CSV


In [807]:
# export to CSV
vicariates.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/vicariates_staging.csv')


#### Upsert Vicariates


In [808]:
# Convert each staging row to a dict for the bulk API (Record Number is the index, so it is left out)
bulk_data = vicariates.to_dict(orient='records')

if run_upserts == 'True':
    vicariate_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    upserts = pd.DataFrame(vicariate_upsert)

    print(upserts)
    

    success  created                  id errors
0      True    False  001Dx00001HwDwnIAF     []
1      True    False  001Dx00001HwDwoIAF     []
2      True    False  001Dx00001HwDwpIAF     []
3      True    False  001Dx00001HwDwqIAF     []
4      True    False  001Dx00001HwDwrIAF     []
5      True    False  001Dx00001HwDwsIAF     []
6      True    False  001Dx00001HwDwtIAF     []
7      True    False  001Dx00001HwDwuIAF     []
8      True    False  001Dx00001HwDwvIAF     []
9      True    False  001Dx00001HwDwwIAF     []
10     True    False  001Dx00001HwDwxIAF     []
11     True    False  001Dx00001HwDwyIAF     []
12     True    False  001Dx00001HwDwzIAF     []
13     True    False  001Dx00001HwDx0IAF     []
14     True    False  001Dx00001HwDx1IAF     []
15     True    False  001Dx00001HwDx2IAF     []
16     True    False  001Dx00001HwDx3IAF     []
17     True    False  001Dx00001HwDx4IAF     []


In [809]:
# Write the upsert results (successes and errors) to a CSV log
import csv

keys = vicariate_upsert[0].keys()

with open('results_files/vicariate_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(vicariate_upsert)

In [810]:
# Get Vicariate records from SF

sf_deaneries = sf.query("SELECT Archdpdx_Migration_Id__c, Id FROM Account WHERE RecordType.DeveloperName = 'Deanery'")

df_sf_deaneries = pd.DataFrame(sf_deaneries['records'])
df_sf_deaneries = df_sf_deaneries.drop(columns = 'attributes')

# Map each Vicariate's migration ID to its new Salesforce record ID, for populating lookups on later Account records
vicariate_sf_recordids = df_sf_deaneries.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
vicariate_sf_recordids

{'Vicariates_Holding_Acc': '001Dx00001HwDuDIAV',
 'Vicariates_1': '001Dx00001HwDwnIAF',
 'Vicariates_2': '001Dx00001HwDwoIAF',
 'Vicariates_3': '001Dx00001HwDwpIAF',
 'Vicariates_4': '001Dx00001HwDwqIAF',
 'Vicariates_5': '001Dx00001HwDwrIAF',
 'Vicariates_6': '001Dx00001HwDwsIAF',
 'Vicariates_7': '001Dx00001HwDwtIAF',
 'Vicariates_8': '001Dx00001HwDwuIAF',
 'Vicariates_9': '001Dx00001HwDwvIAF',
 'Vicariates_10': '001Dx00001HwDwwIAF',
 'Vicariates_11': '001Dx00001HwDwxIAF',
 'Vicariates_12': '001Dx00001HwDwyIAF',
 'Vicariates_13': '001Dx00001HwDwzIAF',
 'Vicariates_14': '001Dx00001HwDx0IAF',
 'Vicariates_15': '001Dx00001HwDx1IAF',
 'Vicariates_16': '001Dx00001HwDx2IAF',
 'Vicariates_17': '001Dx00001HwDx3IAF',
 'Vicariates_18': '001Dx00001HwDx4IAF'}

### B) Parishes, Schools, Organizations


In [811]:
# Create acc_main: accounts excluding Deaneries (already loaded) and Religious (handled separately, later)
acc_main = accounts[~accounts['AccountRecordType'].isin(['Deanery', 'Religious'])].copy()

# Build each Church's Vicariate external ID so it can be mapped to a Salesforce record ID
acc_main.loc[acc_main['AccountRecordType'] == 'Church', 'Vicariate_Ext_Id'] = 'Vicariates_' + acc_main['Vicariate']

In [812]:
acc_main.sample(5)

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,src_table,Sort Name,Parish City,Parent_Parish__c,mbfc__Date_Established__c,Vicariate,Non-Latin,County__c,Disabled_Access__c,Sanctuary_Capacity__c,Lat/Long Coordinates Decimal,Google Small Embed URL,Miles to Pastoral Center,Schedule 1 Head,Schedule 1 Text,Schedule 2 Head,Schedule 2 Text,Schedule 3 Head,Schedule 3 Text,Schedule 4 Head,Schedule 4 Text,Schedule 5 Head,Schedule 5 Text,Schedule 6 Head,Schedule 6 Text,Schedule 7 Head,Schedule 7 Text,Community City,Order Full Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Non-Latin Rite,Show Order in Name,Description,Local Superior,Major Superior Name,Major Superior Phone,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,Vicariate_Ext_Id
5,12,Organization,Providence St. Vincent Medical Center,Providence St. Vincent Medical Center,,True,,Portland,OR,,97213,,503-216-2261,,,,Offices,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,9205 SW Barnes Rd,,,,,Offices_12,012Hu000001pkqEIAQ,116,
87,57,Church,St. Patrick,"St. Patrick Mission, Lyons",,True,7th St and Ash St,Scio,OR,,97374,,503-394-2437,503-394-7045,pastor@immacstayton.org,https://www.immacstayton.org/lyons-st-patrick-...,Parishes,st patrick lyons,Lyons,Parishes_51,NaT,11.0,No,Linn,True,0.0,"44.775785,-122.613664",https://www.google.com/maps/embed?pb=!1m18!1m1...,74.0,Weekend Mass,Second and Fourth Saturday 5:00 pm,Weekday Mass,Tuesday 8:00 am,Reconciliation (Confession),By appointment,Adoration,Chaplet of Divine Mercy and Rosary Tuesday 7:3...,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,St. Patrick Mission\n39043 Jordan Road,,,Parishes_51,Parish,Parishes_57,012Dx0000003p4xIAA,116,Vicariates_11
68,38,Church,Sacred Heart,"Sacred Heart, Gervais",,True,605 7th St,Gervais,OR,,97026,,503-792-4231,,secretary@shstl.org,https://www.shstl.org/,Parishes,sacred heart gervais,Gervais,,1847-01-01,6.0,No,Marion,True,400.0,"45.10957009443997,-122.9008325603607",https://www.google.com/maps/embed?pb=!1m18!1m1...,35.0,Weekend Mass,"Saturday 4:00 pm, 6:00 pm (Español)<br>Sunday ...",Weekday Mass,Monday 8:00 (only during Advent)<br>Tuesday 8:...,Reconciliation (Confession),Wednesday 5:00 pm<br>Saturday 3:00 pm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,Sacred Heart Parish\nPO Box 236,,,,Parish,Parishes_38,012Dx0000003p4xIAA,116,Vicariates_6
131,102,Church,St. Mary Magdalene,"The Madeleine, Portland",The Madeleine,True,NE 24th Ave and Siskiyou St,Portland,OR,,97212,,503-281-5777,503-281-0673,jreilly@themadeleine.edu,https://themadeleine.edu/site/church/,Parishes,the madeleine portland,Portland,,1911-01-01,10.0,No,Multnomah,True,500.0,"45.54588743590623,-122.64383074819736",https://www.google.com/maps/embed?pb=!1m18!1m1...,2.0,Weekend Mass,"Saturday 5:00 pm<br>Sunday 8:00 am, 10:00 am",Weekday Mass,"Tuesday, Wednesday, Friday 8:00 am<br>Thursday...",Reconciliation (Confession),by appointment,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,The Madeleine Parish\n3123 NE 24th Ave,,,,Parish,Parishes_102,012Dx0000003p4xIAA,116,Vicariates_10
128,99,Church,St. Joseph the Worker,"St. Joseph the Worker, Portland",,True,SE 148th Ave near Division St,Portland,OR,,97233,,503-761-8710,503-761-8545,sjtwadmin@comcast.net,https://www.stjosephtheworkerpdx.org/,Parishes,st joseph the worker portland,Portland,,1885-01-01,5.0,No,Multnomah,True,370.0,"45.50542453871301,-122.5115450845345",https://www.google.com/maps/embed?pb=!1m18!1m1...,9.0,Weekend Mass,"Saturday 5:00 pm<br>Sunday 8:30 am, 11:00 am, ...",Weekday Mass,"Tuesday, Thursday 12:15 pm (adoration 11:45 am...",Reconciliation (Confession),Wednesday 6:00–7:00 pm,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,St. Joseph the Worker Parish\n2310 SE 148th Ave,,,,Parish,Parishes_99,012Dx0000003p4xIAA,116,Vicariates_5


In [813]:
# map in Deaneries
acc_main['mbfc__Deanery__c'] = acc_main.Vicariate_Ext_Id.map(vicariate_sf_recordids)

acc_main[acc_main['AccountRecordType'] == 'Church']['mbfc__Deanery__c']

35     001Dx00001HwDwnIAF
36     001Dx00001HwDwzIAF
37     001Dx00001HwDx2IAF
38     001Dx00001HwDwvIAF
39     001Dx00001HwDx1IAF
              ...        
181    001Dx00001HwDwrIAF
182    001Dx00001HwDx3IAF
183    001Dx00001HwDwsIAF
184    001Dx00001HwDx4IAF
185    001Dx00001HwDwtIAF
Name: mbfc__Deanery__c, Length: 151, dtype: object
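
`Series.map` with a dict returns NaN for any key it cannot find, so a mistyped or missing Vicariate ID would load with an empty lookup. A quick check for that case, sketched on made-up rows and IDs:

```python
import pandas as pd

demo_ids = {'Vicariates_1': '001xx0000000001AAA'}  # made-up Salesforce ID
demo = pd.DataFrame({
    'Name': ['Parish A', 'Parish B', 'Office C'],
    'Vicariate_Ext_Id': ['Vicariates_1', 'Vicariates_99', None],
})

demo['mbfc__Deanery__c'] = demo['Vicariate_Ext_Id'].map(demo_ids)

# Rows that had an external ID but received no Salesforce ID
unmapped = demo[demo['Vicariate_Ext_Id'].notna() & demo['mbfc__Deanery__c'].isna()]
print(unmapped['Name'].tolist())
```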

In [814]:
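# Replace NaN values with empty strings
# (note: empty strings count as non-null, so later pd.notnull checks treat these cells as present)

acc_main.fillna('', inplace=True)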

In [815]:
# Generate Schedule text from all Schedule columns

def create_account_schedule(row):
    account_schedule = []
    for i in range(1, 8):
        head_col = f'Schedule {i} Head'
        text_col = f'Schedule {i} Text'
        
        head = row[head_col]
        text = row[text_col]
        
        if pd.notnull(head) or pd.notnull(text):
            if pd.notnull(head):
                account_schedule.append(f"<p><strong>{head}</strong></p>")
            if pd.notnull(text):
                account_schedule.append(f"<p>{text}</p>")
            account_schedule.append("<p><br></p>")
    
    # Join all parts into a single string
    return "".join(account_schedule).strip()

acc_main['mbfc__Mass_Times__c'] = acc_main.apply(create_account_schedule, axis=1)



In [816]:
acc_main['mbfc__Mass_Times__c'].sample(15)

59     <p><strong>Weekend Mass</strong></p><p>Sunday ...
333    <p><strong>Weekend Mass</strong></p><p>Sunday ...
104    <p><strong>Weekend Mass</strong></p><p>Sunday ...
120    <p><strong>Weekend Mass</strong></p><p>Saturda...
73     <p><strong>Weekend Mass</strong></p><p>Saturda...
261    <p><strong></strong></p><p></p><p><br></p><p><...
90     <p><strong>Weekend Mass</strong></p><p>Saturda...
18     <p><strong></strong></p><p></p><p><br></p><p><...
49     <p><strong>Mass</strong></p><p>Sunday 11:15 am...
274    <p><strong></strong></p><p></p><p><br></p><p><...
272    <p><strong></strong></p><p></p><p><br></p><p><...
299    <p><strong></strong></p><p></p><p><br></p><p><...
163    <p><strong>Weekend Mass</strong></p><p>Sunday ...
33     <p><strong></strong></p><p></p><p><br></p><p><...
264    <p><strong></strong></p><p></p><p><br></p><p><...
Name: mbfc__Mass_Times__c, dtype: object
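
Several rows in the sample consist only of empty tags (`<p><strong></strong></p><p></p>...`). That is consistent with the schedule cells holding empty strings, which `pd.notnull` treats as present. A variant that also skips blank strings could look like this; `is_blank` and `build_schedule_html` are hypothetical names, and the sketch is not what produced the output above.

```python
import pandas as pd

def is_blank(value):
    """True for None/NaN and for strings that are empty or whitespace-only."""
    return pd.isna(value) or str(value).strip() == ''

def build_schedule_html(row, slots=range(1, 8)):
    parts = []
    for i in slots:
        head = row.get(f'Schedule {i} Head')
        text = row.get(f'Schedule {i} Text')
        if is_blank(head) and is_blank(text):
            continue  # nothing recorded for this slot
        if not is_blank(head):
            parts.append(f"<p><strong>{head}</strong></p>")
        if not is_blank(text):
            parts.append(f"<p>{text}</p>")
        parts.append("<p><br></p>")
    return "".join(parts)

filled = pd.Series({'Schedule 1 Head': 'Weekend Mass', 'Schedule 1 Text': 'Sunday 9:00 am',
                    'Schedule 2 Head': '', 'Schedule 2 Text': ''})
empty = pd.Series({'Schedule 1 Head': '', 'Schedule 1 Text': ''})

filled_html = build_schedule_html(filled)
empty_html = build_schedule_html(empty)
print(filled_html)
```

With this variant, accounts without any schedule data get an empty string instead of a block of empty paragraphs.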

In [817]:
acc_main

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,src_table,Sort Name,Parish City,Parent_Parish__c,mbfc__Date_Established__c,Vicariate,Non-Latin,County__c,Disabled_Access__c,Sanctuary_Capacity__c,Lat/Long Coordinates Decimal,Google Small Embed URL,Miles to Pastoral Center,Schedule 1 Head,Schedule 1 Text,Schedule 2 Head,Schedule 2 Text,Schedule 3 Head,Schedule 3 Text,Schedule 4 Head,Schedule 4 Text,Schedule 5 Head,Schedule 5 Text,Schedule 6 Head,Schedule 6 Text,Schedule 7 Head,Schedule 7 Text,Community City,Order Full Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Non-Latin Rite,Show Order in Name,Description,Local Superior,Major Superior Name,Major Superior Phone,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,Vicariate_Ext_Id,mbfc__Deanery__c,mbfc__Mass_Times__c
0,1,Organization,Pastoral Center,Pastoral Center,,True,,Portland,OR,,97214,,503-234-5334,503-234-2545,commdir@archdpdx.org,http://www.archdpdx.org/,Offices,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2838 E Burnside St,,,,,Offices_1,012Hu000001pkqEIAQ,116,,,<p><strong></strong></p><p></p><p><br></p><p><...
1,3,Organization,Catholic Sentinel,Catholic Sentinel,,False,,Portland,OR,,97214,,503-281-1191,,sentinel@catholicsentinel.org,http://www.sentinel.org/,Offices,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2838 E Burnside St,,,,,Offices_3,012Hu000001pkqEIAQ,116,,,<p><strong></strong></p><p></p><p><br></p><p><...
2,4,Organization,Catholic Cemeteries,Catholic Cemeteries,,False,,Portland,OR,,97221,,503-292-6621,,,http://www.ccpdxor.com/,Offices,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,333 SW Skyline Blvd,,,,,Offices_4,012Hu000001pkqEIAQ,116,,,<p><strong></strong></p><p></p><p><br></p><p><...
3,6,Organization,Griffin Center,Griffin Center,,False,,Milwaukie,OR,,97222,,503-652-7476,,hwycoff@archdpdx.org,http://www.griffincenterportland.org/,Offices,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,11957 SE Fuller Rd,,,,,Offices_6,012Hu000001pkqEIAQ,116,,,<p><strong></strong></p><p></p><p><br></p><p><...
4,11,Organization,Providence Portland Medical Center,Providence Portland Medical Center,,True,,Portland,OR,,97213,,503-215-6833,,,,Offices,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,4805 NE Glisan St,,,,,Offices_11,012Hu000001pkqEIAQ,116,,,<p><strong></strong></p><p></p><p><br></p><p><...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,58,Organization,Resurrection Catholic Parish School,"Resurrection Catholic Parish School, Tualatin",,True,,Tualatin,OR,,,,503-638-8869,,schooloffice@rcparish.org,https://www.resurrectioncatholicprimary.com/,Schools,,,,NaT,,,,False,,"45.367489206497204,-122.70846888751714",https://www.google.com/maps/embed?pb=!1m18!1m1...,,,,,,,,,,,,,,,,,,,,,,,,,,,,Tualatin,147.0,0.0,12-WEESRES,PK-5,21060 SW Stafford Rd,97062.0,,,False,,,,,,Schools_58,012Hu000001pkqEIAQ,116,,,<p><strong></strong></p><p></p><p><br></p><p><...
330,1,Organization,OSU Newman Center,"OSU Newman Center, Corvallis",,False,,Corvallis,OR,,97330,,541-752-6818,,info@osunewman.org,http://www.osunewman.org/,NewmanCenters,,,,NaT,,,,False,,"44.5684145,-123.2789302",https://www.google.com/maps/embed?pb=!1m14!1m8...,89.0,Mass (During Academic Year),Tuesday 5:00 pm<br>Thursday 8:00 pm<br>Sunday ...,Reconciliation (Confession),Wednesday 4:00–6:00 pm (St. Mary)<br>Thursday ...,Adoration,Thursday 9:00 am–7:30 pm during academic year<...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Corvallis,False,2127 NW Monroe Ave,,,,,NewmanCenters_1,012Hu000001pkqEIAQ,116,,,<p><strong>Mass (During Academic Year)</strong...
331,2,Organization,St. Thomas More (UO) Newman Center,"St. Thomas More (UO) Newman Center, Eugene",,False,,Eugene,OR,,97403,,541-343-7021,541-686-8028,secretary@uonewman.org,http://www.uonewman.org/,NewmanCenters,,,,1915-01-01,,,,False,,"44.03944857900737,-123.07404404885466",https://www.google.com/maps/embed?pb=!1m18!1m1...,116.0,Weekend Mass,"Saturday 5:00 pm<br>Sunday 9:00 am, 11:00 am, ...",Weekday Mass,Tuesday-Friday 5:15 pm,Reconciliation (Confession),Saturday 4:00–4:45 pm<br>or by appointment,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Eugene,False,1850 Emerald St,,,,,NewmanCenters_2,012Hu000001pkqEIAQ,116,,,<p><strong>Weekend Mass</strong></p><p>Saturda...
332,3,Organization,Walsh Memorial (SOU) Newman Center at Our Lady...,Walsh Memorial (SOU) Newman Center at Our Lady...,,True,,Ashland,OR,,97520,,541-708-8503,,emillenheft@archdpdx.org,https://ourladymt.org/the-newman-center,NewmanCenters,,,,NaT,,,,False,,"42.17747270484193,-122.68546235847474",https://www.google.com/maps/embed?pb=!1m14!1m8...,291.0,Sunday Mass,5:00 pm <br>9:30 am <br>12:00 pm Spanish<br>5:...,Weekday Mass,Tues: Noon<br>Wed: 8:30 am<br>Thurs: Noon<br>F...,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Ashland,False,987 Hillview Dr,,,,,NewmanCenters_3,012Hu000001pkqEIAQ,116,,,<p><strong>Sunday Mass</strong></p><p>5:00 pm ...


In [818]:
# Create the 'accounts_staging' df (keep only the columns that map to Account fields)

accounts_staging = acc_main[[
    'Name',
    'Formal_Name__c',
    'RecordTypeId',
    'mbfc__Church_Type__c',
    'mbfc__Deanery__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'mbfc__Mass_Times__c',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Archdiocese_Assigns_Clergy__c', # Boolean fields
    'mbfc__Non_Latin__c', 
    'Disabled_Access__c', 
    'Locator_Description__c',
    'Parent_Parish__c',
    'mbfc__Date_Established__c',
    'County__c',
    'Sanctuary_Capacity__c',
    # 'Miles_to_Pastoral_Centre__c',
    'Religious_Secular_Order__c',
    'Pontifical_or_Diocesan_Order__c',
    'Archdiocesan_School_Code__c',
    'Grades_Provided__c',
    'Job_Id__c',
    'Archdpdx_Migration_Id__c',
    # 'ParentId'  # Later, check whether or not can upsert using external ID using this field

    ]]

In [819]:
accounts_staging

Unnamed: 0,Name,Formal_Name__c,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,mbfc__Mass_Times__c,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Description,Archdiocese_Assigns_Clergy__c,mbfc__Non_Latin__c,Disabled_Access__c,Locator_Description__c,Parent_Parish__c,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c
0,Pastoral Center,Pastoral Center,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,503-234-5334,503-234-2545,commdir@archdpdx.org,http://www.archdpdx.org/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,NaT,,,,,,,116,Offices_1
1,Catholic Sentinel,Catholic Sentinel,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,503-281-1191,,sentinel@catholicsentinel.org,http://www.sentinel.org/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,False,False,False,,,NaT,,,,,,,116,Offices_3
2,Catholic Cemeteries,Catholic Cemeteries,012Hu000001pkqEIAQ,,,333 SW Skyline Blvd,Portland,OR,97221,,503-292-6621,,,http://www.ccpdxor.com/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,False,False,False,,,NaT,,,,,,,116,Offices_4
3,Griffin Center,Griffin Center,012Hu000001pkqEIAQ,,,11957 SE Fuller Rd,Milwaukie,OR,97222,,503-652-7476,,hwycoff@archdpdx.org,http://www.griffincenterportland.org/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,False,False,False,,,NaT,,,,,,,116,Offices_6
4,Providence Portland Medical Center,Providence Portland Medical Center,012Hu000001pkqEIAQ,,,4805 NE Glisan St,Portland,OR,97213,,503-215-6833,,,,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,NaT,,,,,,,116,Offices_11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,"Resurrection Catholic Parish School, Tualatin",Resurrection Catholic Parish School,012Hu000001pkqEIAQ,,,,Tualatin,OR,,,503-638-8869,,schooloffice@rcparish.org,https://www.resurrectioncatholicprimary.com/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,NaT,,,,,12-WEESRES,PK-5,116,Schools_58
330,"OSU Newman Center, Corvallis",OSU Newman Center,012Hu000001pkqEIAQ,,,2127 NW Monroe Ave,Corvallis,OR,97330,,541-752-6818,,info@osunewman.org,http://www.osunewman.org/,<p><strong>Mass (During Academic Year)</strong...,,,,,False,False,False,,,NaT,,,,,,,116,NewmanCenters_1
331,"St. Thomas More (UO) Newman Center, Eugene",St. Thomas More (UO) Newman Center,012Hu000001pkqEIAQ,,,1850 Emerald St,Eugene,OR,97403,,541-343-7021,541-686-8028,secretary@uonewman.org,http://www.uonewman.org/,<p><strong>Weekend Mass</strong></p><p>Saturda...,,,,,False,False,False,,,1915-01-01,,,,,,,116,NewmanCenters_2
332,Walsh Memorial (SOU) Newman Center at Our Lady...,Walsh Memorial (SOU) Newman Center at Our Lady...,012Hu000001pkqEIAQ,,,987 Hillview Dr,Ashland,OR,97520,,541-708-8503,,emillenheft@archdpdx.org,https://ourladymt.org/the-newman-center,<p><strong>Sunday Mass</strong></p><p>5:00 pm ...,,,,,True,False,False,,,NaT,,,,,,,116,NewmanCenters_3


#### Create Parishes Holding Account for the account hierarchy

In [820]:
# Upsert a Parishes holding account

# Get Account Group RecordTypeID
group_recordTypeId = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Group') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


# Upsert the Parishes holding account (keyed on its external ID)
parish_holding_account = sf.Account.upsert('Archdpdx_Migration_Id__c/Parishes_Holding_Acc',
    {
    "Name": "Parishes",
    "ParentId": diocesan_account_id,
    "RecordTypeId": group_recordTypeId,
    "Job_Id__c": curr_job_id,
    "mbfc__Group_Type__c": "Office"
    }
)

# Get the Parishes holding account's SF ID (the upsert call doesn't return the actual record ID)

parish_holding_account = sf.Account.get_by_custom_id('Archdpdx_Migration_Id__c', 'Parishes_Holding_Acc')

parishes_holding_account_id = parish_holding_account['Id']

parishes_holding_account_id

'001Dx00001HwDxKIAV'
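
The upsert-then-fetch pattern in the cell above could be wrapped in a small helper so that each holding account is created and resolved in one call. This is a minimal sketch, not one of the notebook's UDFs: the name `upsert_and_get_id` is hypothetical, and `sobject` is assumed to be a simple-salesforce object handle such as `sf.Account`.

```python
def upsert_and_get_id(sobject, ext_id_field, ext_id_value, payload):
    """Upsert one record by external ID, then look up its Salesforce Id.

    `sobject` is assumed to expose `upsert(path, data)` and
    `get_by_custom_id(field, value)`, as `sf.Account` does in the cell above.
    """
    # The upsert path has the form '<ExternalIdField>/<value>'
    sobject.upsert(f"{ext_id_field}/{ext_id_value}", payload)
    # Fetch the record back, since the upsert call does not hand back the record
    record = sobject.get_by_custom_id(ext_id_field, ext_id_value)
    return record["Id"]
```

With such a helper, the cell above would reduce to a single call, for example `upsert_and_get_id(sf.Account, 'Archdpdx_Migration_Id__c', 'Parishes_Holding_Acc', {...})`.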

In [821]:
# Set the ParentId for all Parish records

accounts_staging['ParentId'] = None

accounts_staging['ParentId']= accounts_staging.apply(
    lambda row: parishes_holding_account_id if row['mbfc__Church_Type__c'] == 'Parish' else row['ParentId'], axis=1
)

accounts_staging.sample(10)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  accounts_staging['ParentId'] = None
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  accounts_staging['ParentId']= accounts_staging.apply(


Unnamed: 0,Name,Formal_Name__c,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,mbfc__Mass_Times__c,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Description,Archdiocese_Assigns_Clergy__c,mbfc__Non_Latin__c,Disabled_Access__c,Locator_Description__c,Parent_Parish__c,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
85,"St. Edward, Lebanon",St. Edward,012Dx0000003p4xIAA,Parish,001Dx00001HwDwnIAF,St. Edward Parish\n100 S Main St,Lebanon,OR,97355.0,,541-258-5333,541-258-2511,stedwardslebanon@comcast.net,https://stedwardlebanon.org/,<p><strong>Weekend Mass</strong></p><p>Saturda...,,,,,True,False,True,,,1903-01-01,Linn,389.0,,,,,116,Parishes_55,001Dx00001HwDxKIAV
140,"St. Stanislaus, Portland",St. Stanislaus,012Dx0000003p4xIAA,Parish,001Dx00001HwDwwIAF,St. Stanislaus Parish\n3916 N Interstate Ave,Portland,OR,97227.0,,503-281-7532,503-281-7532,parish@ststanislausparish.com,http://www.ststanislausparish.com/,<p><strong>Weekend Mass</strong></p><p>Sunday ...,,,,,True,False,False,,,1907-01-01,Multnomah,400.0,,,,,116,Parishes_111,001Dx00001HwDxKIAV
311,"Resurrection Catholic Parish School, Tualatin",Resurrection Catholic Parish School,012Hu000001pkqEIAQ,,,,Tualatin,OR,,,503-638-8869,,schooloffice@rcparish.org,https://www.resurrectioncatholicprimary.com/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,NaT,,,,,12-WEESRES,PK-5,116,Schools_58,
134,"St. Peter, Portland",St. Peter,012Dx0000003p4xIAA,Parish,001Dx00001HwDwrIAF,St. Peter Parish\n8623 SE Woodstock Blvd,Portland,OR,97266.0,,503-777-3321,503-777-3351,office@stpeterpdx.org,https://stpeterpdx.org/,<p><strong>Weekend Mass</strong></p><p>Sunday ...,,,,,True,False,True,"5905 SE 87th Ave, between Foster Rd and Woodst...",,1910-01-01,Multnomah,800.0,,,,,116,Parishes_105,001Dx00001HwDxKIAV
72,"St. Anne, Grants Pass",Ste. Anne de Beaupré,012Dx0000003p4xIAA,Parish,001Dx00001HwDx1IAF,St. Anne Parish\n1131 NE 10th St,Grants Pass,OR,97526.0,,541-476-2240,541-476-2194,office@stannegp.com,https://www.stannegp.com/,<p><strong>Weekend Mass</strong></p><p>Saturda...,,,,,True,False,True,,,1896-01-01,Josephine,0.0,,,,,116,Parishes_42,001Dx00001HwDxKIAV
117,"St. Andre Bessette, Portland",St. André Bessette,012Dx0000003p4xIAA,Parish,001Dx00001HwDwqIAF,St. Andre Bessette Parish\n601 W Burnside St,Portland,OR,97209.0,,503-228-0746,503-972-1063,info@saintandrechurch.org,https://www.saintandrebessettepdx.org/,<p><strong>Weekend Mass</strong></p><p>Saturda...,,,,,True,False,True,W Burnside St and NW 6th Ave,,1919-01-01,Multnomah,200.0,,,,,116,Parishes_88,001Dx00001HwDxKIAV
284,"St. John the Apostle Catholic School, Oregon City",St. John the Apostle Catholic School,012Hu000001pkqEIAQ,,,,Oregon City,OR,,,503-742-8230,,office@sja-eagles.com,http://sja-eagles.com/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,1844-01-01,,,,,12-OREJOHS,PK-8,116,Schools_31,
91,"Christ the King, Milwaukie",Christ the King,012Dx0000003p4xIAA,Parish,001Dx00001HwDwzIAF,Christ the King Parish\n7414 SE Michael Dr,Milwaukie,OR,97222.0,,503-659-1475,503-659-6138,office@ctk.cc,https://www.ctk.cc/,<p><strong>Weekend Mass</strong></p><p>Sat 5:3...,,,,,True,False,True,11709 SE Fuller Rd,,1959-01-01,Clackamas,750.0,,,,,116,Parishes_61,001Dx00001HwDxKIAV
52,"Holy Name, Coquille",Holy Name,012Dx0000003p4xIAA,Parish,001Dx00001HwDwyIAF,Holy Name Parish\nPO Box 368,Coquille,OR,97423.0,,541-396-3849,,parishoffice@holynamecq.org,https://holynamecq.org/,<p><strong>Weekend Mass</strong></p><p>Saturda...,,,,,True,False,True,50 S Dean St,,1913-01-01,Coos,250.0,,,,,116,Parishes_20,001Dx00001HwDxKIAV
45,"Holy Trinity Mission, Brownsville",Holy Trinity,012Dx0000003p4xIAA,Parish,001Dx00001HwDwnIAF,Holy Trinity Mission\n104 W Blakely Ave,Brownsville,OR,97327.0,,541-367-2530,,holytrinitybrownsvilleor@gmail.com,https://sweethomecatholicchurch.com/,<p><strong>Mass</strong></p><p>Sunday 11:30 am...,,,,,True,False,True,W Blakely Ave and French St,Parishes_144,NaT,Linn,0.0,,,,,116,Parishes_13,001Dx00001HwDxKIAV
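
The two warnings above are pandas' `SettingWithCopyWarning`: `accounts_staging` was built by selecting columns from `acc_main`, so pandas cannot tell whether the assignment targets a copy or the original. One common way to avoid it is to take an explicit `.copy()` when creating the staging frame and to assign with a boolean mask through `.loc`. The sketch below uses a toy frame and a placeholder ID; `acc_main_demo`, `staging_demo`, and `holding_id` are illustrative names only.

```python
import pandas as pd

# Toy stand-in for acc_main; only the columns needed for the illustration
acc_main_demo = pd.DataFrame({
    "Name": ["St. Edward, Lebanon", "OSU Newman Center, Corvallis"],
    "mbfc__Church_Type__c": ["Parish", None],
    "Extra": [1, 2],
})
holding_id = "001EXAMPLE"  # placeholder, not a real record ID

# .copy() makes the staging frame independent of the source frame
staging_demo = acc_main_demo[["Name", "mbfc__Church_Type__c"]].copy()

# Assign ParentId only on Parish rows; all other rows stay None
staging_demo["ParentId"] = None
is_parish = staging_demo["mbfc__Church_Type__c"] == "Parish"
staging_demo.loc[is_parish, "ParentId"] = holding_id
```

The mask-based assignment also replaces the row-wise `apply`, which is slower on larger frames.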


#### Upsert Accounts (TBD)


In [822]:
# send accounts_staging to csv
accounts_staging.to_csv('staging_files/accounts_staging.csv', encoding='utf-8-sig')

In [823]:
# FIXME: Format ExternalID lookups into dictionary to match SF's api so can upsert using simple-salesforce

# Rename lookup columns to their SF relationship API names (__c -> __r)
accounts_staging = accounts_staging.rename(columns={'Parent_Parish__c': 'Parent_Parish__r'})  # Later on, attempt to include 'ParentId' (which, as a standard SF field, might not work)

# Reformat values to match what SF api requires
accounts_staging['Parent_Parish__r'] = accounts_staging.apply(lambda x: "{'Archdpdx_Migration_Id__c': '" + x['Parent_Parish__r'] + "'}" if pd.notna(x['Parent_Parish__r']) and x['Parent_Parish__r'] != 'None' and x['Parent_Parish__r'] != '' else None, axis=1)
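
Note that the cell above builds a *string* that looks like a dictionary. When records are sent as JSON, Salesforce expects a lookup by external ID to be a nested object (`{"Parent_Parish__r": {"Archdpdx_Migration_Id__c": "..."}}`), so a real Python `dict` is likely what is needed here. The sketch below shows that variant on a toy frame; `to_ext_id_ref` is a hypothetical helper, and whether `upsert_to_salesforce_bulk` passes nested dicts through unchanged is an assumption that would need checking.

```python
import pandas as pd

def to_ext_id_ref(value, ext_id_field="Archdpdx_Migration_Id__c"):
    """Return {ext_id_field: value} for a usable value, else None."""
    # Treat NaN/None, empty strings, and the literal 'None' as "no parent"
    if pd.isna(value) or value in ("", "None"):
        return None
    return {ext_id_field: value}

# Toy data: one real parent reference, one missing, one empty
demo = pd.DataFrame({"Parent_Parish__r": ["Parishes_144", None, ""]})
demo["Parent_Parish__r"] = demo["Parent_Parish__r"].map(to_ext_id_ref)
```

Keep in mind that a column of dicts does not round-trip through `to_csv`, so the staging CSV should be written before this conversion, as the notebook already does.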




In [824]:
accounts_staging

Unnamed: 0,Name,Formal_Name__c,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,mbfc__Mass_Times__c,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Description,Archdiocese_Assigns_Clergy__c,mbfc__Non_Latin__c,Disabled_Access__c,Locator_Description__c,Parent_Parish__r,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
0,Pastoral Center,Pastoral Center,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,503-234-5334,503-234-2545,commdir@archdpdx.org,http://www.archdpdx.org/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,NaT,,,,,,,116,Offices_1,
1,Catholic Sentinel,Catholic Sentinel,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,503-281-1191,,sentinel@catholicsentinel.org,http://www.sentinel.org/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,False,False,False,,,NaT,,,,,,,116,Offices_3,
2,Catholic Cemeteries,Catholic Cemeteries,012Hu000001pkqEIAQ,,,333 SW Skyline Blvd,Portland,OR,97221,,503-292-6621,,,http://www.ccpdxor.com/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,False,False,False,,,NaT,,,,,,,116,Offices_4,
3,Griffin Center,Griffin Center,012Hu000001pkqEIAQ,,,11957 SE Fuller Rd,Milwaukie,OR,97222,,503-652-7476,,hwycoff@archdpdx.org,http://www.griffincenterportland.org/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,False,False,False,,,NaT,,,,,,,116,Offices_6,
4,Providence Portland Medical Center,Providence Portland Medical Center,012Hu000001pkqEIAQ,,,4805 NE Glisan St,Portland,OR,97213,,503-215-6833,,,,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,NaT,,,,,,,116,Offices_11,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,"Resurrection Catholic Parish School, Tualatin",Resurrection Catholic Parish School,012Hu000001pkqEIAQ,,,,Tualatin,OR,,,503-638-8869,,schooloffice@rcparish.org,https://www.resurrectioncatholicprimary.com/,<p><strong></strong></p><p></p><p><br></p><p><...,,,,,True,False,False,,,NaT,,,,,12-WEESRES,PK-5,116,Schools_58,
330,"OSU Newman Center, Corvallis",OSU Newman Center,012Hu000001pkqEIAQ,,,2127 NW Monroe Ave,Corvallis,OR,97330,,541-752-6818,,info@osunewman.org,http://www.osunewman.org/,<p><strong>Mass (During Academic Year)</strong...,,,,,False,False,False,,,NaT,,,,,,,116,NewmanCenters_1,
331,"St. Thomas More (UO) Newman Center, Eugene",St. Thomas More (UO) Newman Center,012Hu000001pkqEIAQ,,,1850 Emerald St,Eugene,OR,97403,,541-343-7021,541-686-8028,secretary@uonewman.org,http://www.uonewman.org/,<p><strong>Weekend Mass</strong></p><p>Saturda...,,,,,False,False,False,,,1915-01-01,,,,,,,116,NewmanCenters_2,
332,Walsh Memorial (SOU) Newman Center at Our Lady...,Walsh Memorial (SOU) Newman Center at Our Lady...,012Hu000001pkqEIAQ,,,987 Hillview Dr,Ashland,OR,97520,,541-708-8503,,emillenheft@archdpdx.org,https://ourladymt.org/the-newman-center,<p><strong>Sunday Mass</strong></p><p>5:00 pm ...,,,,,True,False,False,,,NaT,,,,,,,116,NewmanCenters_3,


In [825]:
print(accounts_staging['mbfc__Date_Established__c'].dtype)

datetime64[ns]


In [826]:

# Convert datetime to string in the desired format
accounts_staging['mbfc__Date_Established__c'] = accounts_staging['mbfc__Date_Established__c'].dt.strftime('%Y-%m-%d')
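
One caveat with the conversion above: `.dt.strftime` turns missing dates (`NaT`) into `NaN`, and most rows in this dataset have no established date. A `NaN` may not serialize cleanly when the records are sent to Salesforce, so it can help to map missing dates to `None` explicitly. This is a minimal sketch on a toy series; it is not confirmed to be the cause of the failed upserts below.

```python
import pandas as pd

# Toy series: one real date and one missing date (NaT)
dates = pd.Series(pd.to_datetime(["1915-01-01", None]))

# Format real dates as 'YYYY-MM-DD'; map NaT to None so it can serialize as null
date_strings = pd.Series(
    [d.strftime("%Y-%m-%d") if pd.notna(d) else None for d in dates],
    index=dates.index,
    dtype=object,
)
```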

In [829]:
# Attempt to upsert using new function
# FIXME: This upsert isn't working, but it appears to have worked previously (according to the 'accounts_results' file). The cause was the incorrectly formatted 'mbfc__Date_Established__c' field!

accounts_upsert2 = upsert_to_salesforce_bulk(sf, accounts_staging, 'Account', 'Archdpdx_Migration_Id__c', 'results_files/accounts_failed', batch_size=100)

Upsert completed. Successful upserts: 0, Failed upserts: 246


In [375]:
# Extract SF Account records

sf_accounts = sf.query('Select id, Name, RecordTypeId, mbfc__Church_Type__c, Archdpdx_Migration_Id__c, Job_Id__c from Account WHERE Job_Id__c != null')
sf_accounts = pd.DataFrame(sf_accounts['records'])
sf_accounts = sf_accounts.drop(columns = 'attributes')
sf_accounts

Unnamed: 0,Id,Name,RecordTypeId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,Job_Id__c
0,001Dx00001HwDyIIAV,Pastoral Center,012Hu000001pkqEIAQ,,Offices_1,108
1,001Dx00001HwDyJIAV,Catholic Sentinel,012Hu000001pkqEIAQ,,Offices_3,108
2,001Dx00001HwDyKIAV,Catholic Cemeteries,012Hu000001pkqEIAQ,,Offices_4,108
3,001Dx00001HwDyLIAV,Griffin Center,012Hu000001pkqEIAQ,,Offices_6,108
4,001Dx00001HwDyMIAV,Providence Portland Medical Center,012Hu000001pkqEIAQ,,Offices_11,108
...,...,...,...,...,...,...
330,001Dx00001HwDx1IAF,Southern Oregon Vicariate,012Dx0000003p4yIAA,,Vicariates_15,111
331,001Dx00001HwDx2IAF,Tualatin Valley Vicariate,012Dx0000003p4yIAA,,Vicariates_16,111
332,001Dx00001HwDx3IAF,"West Portland, Suburban Vicariate",012Dx0000003p4yIAA,,Vicariates_17,111
333,001Dx00001HwDx4IAF,Yamhill County Vicariate,012Dx0000003p4yIAA,,Vicariates_18,111


### C) Religious Institutes (Parents)


In [376]:
"""
- 'acc_religious' DF: create unique_id of religious parents
- create 'acc_religious_orders' DF , upsert into SF
- extract accounts from Salesforce, create dict (external_ID : account_ID)
- map parent ids onto religious child accounts DF in main DF
- 'acc_religious' > staging DF ('acc_religious')
    - drop unnecessary columns
    - upsert create DF of religious children, upsert into SF with
"""

# Create a new DF of all Religious accounts
acc_religious = accounts[accounts['AccountRecordType'] == 'Religious']

# Create a simplified external ID field
acc_religious['Archdpdx_Migration_Id__c'] = acc_religious['Order Full Name'].apply(
    lambda x: x.lower().replace(' ', '')[:40]
)

acc_religious_2 = acc_religious

# Create a DF for only parent religious order accounts
acc_religious_parents = acc_religious_2[[
    'Order Full Name', 
    # 'Name', 
    'mbfc__Abbreviation__c', 
    'mbfc__Religious_Suffix__c', 
    'mbfc__Type_Members__c', 
    'Archdpdx_Migration_Id__c',
    'Pontifical_or_Diocesan_Order__c',
    'Religious_Secular_Order__c',
    ]]

# Drop duplicate rows of the same parent Religious Order (because an order can have more than one local community)
acc_religious_parents.drop_duplicates('Order Full Name', inplace=True)

# Manipulate the 'Name' field to remove any comma and subsequent text
# acc_religious_parents['Name'] = acc_religious_parents['Name'].str.split(',').str[0]

# How many remaining rows after dropping duplicates?
print(acc_religious_parents.shape)

# Rename columns
acc_religious_parents = acc_religious_parents.rename(columns={
    # 'Order Full Name': 'Description',
    'Order Full Name': 'Name'
    })

# Replace NaN values with empty strings
acc_religious_parents.fillna('', inplace=True)

acc_religious_parents


(62, 7)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  acc_religious['Archdpdx_Migration_Id__c'] = acc_religious['Order Full Name'].apply(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  acc_religious_parents.drop_duplicates('Order Full Name', inplace=True)


Unnamed: 0,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c,Pontifical_or_Diocesan_Order__c,Religious_Secular_Order__c
186,Societas Iesu,Jesuits,SJ,Men,societasiesu,,Religious Order
187,Ordo Cisterciensis Strictioris Observantiae,Trappists,OCSO,Men,ordocisterciensisstrictiorisobservantiae,Pontifical Order,Religious Order
189,Ordo Sancti Benedicti,Benedictines,OSB,Men,ordosanctibenedicti,,Religious Order
190,Misioneros del Espíritu Santo,"Missionaries of the Holy Spirit, Christ the Pr...",MSpS,Men,misionerosdelespíritusanto,,
191,Apostles of Jesus,Apostles of Jesus,AJ,Men,apostlesofjesus,Diocesan Order,Religious Order
...,...,...,...,...,...,...,...
249,Fraternità san Carlo Borromeo,Fraternity of St. Charles,FSCB,Men,fraternitàsancarloborromeo,,
250,"Sons of Mary, Mother of Mercy","Sons of Mary, Mother of Mercy",SMMM,Men,"sonsofmary,motherofmercy",,
251,Society of the Divine Word,Society of the Divine Word,SVD,Men,societyofthedivineword,,
252,Society of the Divine Saviour,Society of the Divine Saviour,SDS,Men,societyofthedivinesaviour,,
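
The simplified external ID above keeps accents and punctuation (for example `fraternitàsancarloborromeo` and `sonsofmary,motherofmercy`) and is cut at 40 characters. If a stricter key were wanted, one option is to strip accents and keep only ASCII letters and digits. The function below is a hypothetical alternative, not the scheme used in this notebook; switching schemes after records have been loaded would break matching against the external IDs already in Salesforce.

```python
import re
import unicodedata

def make_ext_id(name, max_len=40):
    """Lowercase, strip accents, and keep only a-z and 0-9."""
    # Decompose accented characters, then drop the combining marks
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Remove spaces, commas, slashes, and anything else non-alphanumeric
    return re.sub(r"[^a-z0-9]", "", ascii_name.lower())[:max_len]
```

Because truncation can make two long names collide, it is worth checking the resulting column with `Series.duplicated()` before using it as an upsert key.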


In [377]:
acc_religious_parents['mbfc__Religious_Type__c'] = 'Congregation'

In [378]:
# Set recordType to 'Religious'

religious_recordtype_id = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Religious') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match

print(religious_recordtype_id)

acc_religious_parents['RecordTypeId'] = religious_recordtype_id

acc_religious_parents.sample(10)

012Dx0000003p52IAA


Unnamed: 0,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c,Pontifical_or_Diocesan_Order__c,Religious_Secular_Order__c,mbfc__Religious_Type__c,RecordTypeId
220,Sisters of the Holy Names of Jesus and Mary U....,Holy Names Sisters,SNJM,Women,sistersoftheholynamesofjesusandmaryu.s.-,,Religious Order,Congregation,012Dx0000003p52IAA
223,Sisters of Mercy of the Americas West/Midwest ...,Sisters of Mercy,RSM,Women,sistersofmercyoftheamericaswest/midwestr,,,Congregation,012Dx0000003p52IAA
189,Ordo Sancti Benedicti,Benedictines,OSB,Men,ordosanctibenedicti,,Religious Order,Congregation,012Dx0000003p52IAA
249,Fraternità san Carlo Borromeo,Fraternity of St. Charles,FSCB,Men,fraternitàsancarloborromeo,,,Congregation,012Dx0000003p52IAA
193,Apostolic Life Community of Priests in the Opu...,Holy Spirit Fathers,ALCP,Men,apostoliclifecommunityofpriestsintheopus,,,Congregation,012Dx0000003p52IAA
239,Missionary Oblates of Mary Immaculate,Oblates of Mary Immaculate,OMI,Men,missionaryoblatesofmaryimmaculate,,,Congregation,012Dx0000003p52IAA
191,Apostles of Jesus,Apostles of Jesus,AJ,Men,apostlesofjesus,Diocesan Order,Religious Order,Congregation,012Dx0000003p52IAA
238,Heralds of the Good News,Heralds of the Good News,HGN,Men,heraldsofthegoodnews,,,Congregation,012Dx0000003p52IAA
205,Adorers of the Holy Cross,Adorers of the Holy Cross,MTG,Women,adorersoftheholycross,,,Congregation,012Dx0000003p52IAA
253,Society of Our Lady of the Most Holy Trinity,Society of Our Lady of the Most Holy Trinity,SOLT,Men,societyofourladyofthemostholytrinity,,,Congregation,012Dx0000003p52IAA


In [379]:
# Send to CSV
acc_religious_parents.to_csv('staging_files/religious_order_staging.csv', encoding='utf-8-sig')

In [380]:
# Upsert to Salesforce
# Convert each row into a dict keyed by SF field API name
bulk_data = acc_religious_parents.to_dict('records')

if run_upserts == 'True':
    religious_order_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_order_upsert = pd.DataFrame(religious_order_upsert)

df_rel_order_upsert

Unnamed: 0,success,created,id,errors
0,True,False,001Dx00001HwE3TIAV,[]
1,True,False,001Dx00001HwE3UIAV,[]
2,True,False,001Dx00001HwE3VIAV,[]
3,True,False,001Dx00001HwE3WIAV,[]
4,True,False,001Dx00001HwE3XIAV,[]
...,...,...,...,...
57,True,False,001Dx00001HwE4OIAV,[]
58,True,False,001Dx00001HwE4PIAV,[]
59,True,False,001Dx00001HwE4QIAV,[]
60,True,False,001Dx00001HwE4RIAV,[]
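
Rather than scanning the results table by eye, the list returned by the bulk upsert can be summarized. The sketch below assumes each result is a dict with the `success`, `created`, `id`, and `errors` keys shown in the output above; `summarize_bulk_results` is a hypothetical helper name.

```python
def summarize_bulk_results(results):
    """Count successes, creations, and failures in a list of bulk result dicts."""
    failed = [r for r in results if not r.get("success")]
    return {
        "total": len(results),
        "succeeded": len(results) - len(failed),
        "created": sum(1 for r in results if r.get("created")),
        "failed": len(failed),
        # Keep the raw error payloads for the failed rows only
        "errors": [r.get("errors") for r in failed],
    }
```

Calling `summarize_bulk_results(religious_order_upsert)` after the upsert gives a quick pass/fail count before the full results are written to file.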


In [381]:
# Write the upsert results (successes and errors) to a results log
import csv

keys = religious_order_upsert[0].keys()

with open('results_files/religious_order_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(religious_order_upsert)

In [382]:
# @title get SF Accounts
get_all_rel_accounts = f"Select id, Name, RecordTypeId, Type, Archdpdx_Migration_Id__c from Account where RecordTypeID = '{religious_recordtype_id}'"

print(religious_recordtype_id)

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_rel_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')

df_sf_accounts.sample(10)

012Dx0000003p52IAA


Unnamed: 0,Id,Name,RecordTypeId,Type,Archdpdx_Migration_Id__c
32,001Dx00001HwE3lIAF,Congregación de Oblatas de Santa Marta,012Dx0000003p52IAA,,congregacióndeoblatasdesantamarta
26,001Dx00001HwE3fIAF,Ordo Fratrum Minorum Province of Saint Barbara,012Dx0000003p52IAA,,ordofratrumminorumprovinceofsaintbarbara
7,001Dx00001HwFFCIA3,Josephite Fathers,012Dx0000003p52IAA,,
113,001Dx00001HwE5DIAV,"Sisters of Mercy of the Americas, Portland (RSM)",012Dx0000003p52IAA,,RelCommunities_47
11,001Dx00001HwFFwIAN,Conventual Franciscan,012Dx0000003p52IAA,,
31,001Dx00001HwE3kIAF,Dominican Sisters of Adrian,012Dx0000003p52IAA,,dominicansistersofadrian
44,001Dx00001HwE3xIAF,Sisters of Jesus the Saviour,012Dx0000003p52IAA,,sistersofjesusthesaviour
131,001Dx00001HwE5VIAV,"Order of Friars Minor, Conventual, Portland (O...",012Dx0000003p52IAA,,RelCommunities_67
54,001Dx00001HwDxoIAF,Colombiere Jesuit Community,012Dx0000003p52IAA,,
134,001Dx00001HwE5YIAV,"Franciscan Friars of the Renewal, New York, NY...",012Dx0000003p52IAA,,RelCommunities_70


In [383]:
religious_order_mapping = df_sf_accounts.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
# religious_order_mapping
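
The sample above shows that some Religious accounts have no `Archdpdx_Migration_Id__c` (for example "Josephite Fathers"), so the dictionary built above will contain a missing-value key, and any repeated external ID would silently overwrite an earlier entry. A slightly more defensive version drops blank keys and checks uniqueness first. This is a sketch on toy data; `df_demo` and `mapping_demo` are illustrative names.

```python
import pandas as pd

# Toy stand-in for df_sf_accounts
df_demo = pd.DataFrame({
    "Id": ["001A", "001B", "001C"],
    "Archdpdx_Migration_Id__c": ["societasiesu", None, "ordosanctibenedicti"],
})

# Keep only rows that carry an external ID, and fail loudly on duplicate keys
keyed = df_demo.dropna(subset=["Archdpdx_Migration_Id__c"])
assert keyed["Archdpdx_Migration_Id__c"].is_unique, "duplicate external IDs"

# external ID -> Salesforce record Id
mapping_demo = keyed.set_index("Archdpdx_Migration_Id__c")["Id"].to_dict()
```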

### D) Religious Communities


In [384]:
acc_religious_staging = (acc_religious
                         .rename(columns={'Archdpdx_Migration_Id__c' : 'Parent_Archdpdx_Migration_Id__c'})
)

acc_religious_staging['ParentId'] = acc_religious_staging['Parent_Archdpdx_Migration_Id__c'].map(religious_order_mapping)
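
`Series.map` leaves `NaN` wherever a key is missing from the dictionary, so it is worth checking for communities whose parent order did not resolve before loading them. The sketch below uses toy data; the second row is a made-up community added only to show an unmapped parent.

```python
import pandas as pd

mapping_demo = {"societasiesu": "001A"}  # external ID -> SF Id (placeholder values)
children_demo = pd.DataFrame({
    "Name": ["Colombiere Jesuit Community", "Example Community"],  # second row is hypothetical
    "Parent_Archdpdx_Migration_Id__c": ["societasiesu", "unknownorder"],
})

children_demo["ParentId"] = children_demo["Parent_Archdpdx_Migration_Id__c"].map(mapping_demo)

# Rows whose parent key was not found in the mapping
unmapped = children_demo[children_demo["ParentId"].isna()]
```

An empty `unmapped` frame means every local community found its parent order.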

In [385]:
pd.set_option('display.max_columns', None)

In [386]:
# Enrich the data

acc_religious_staging['mbfc__Religious_Type__c'] = 'Local Community'
acc_religious_staging['Archdpdx_Migration_Id__c'] = 'RelCommunities_' + acc_religious_staging['Record Number'].astype('str')
acc_religious_staging['RecordTypeId'] = religious_recordtype_id
# acc_religious_staging.drop(columns='Name', inplace=True)
# acc_religious_staging.rename(columns={
#     'Name, City': 'Name'
# }, inplace=True)

acc_religious_staging.sample(5)

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,src_table,Sort Name,Parish City,Parent_Parish__c,mbfc__Date_Established__c,Vicariate,Non-Latin,County__c,Disabled_Access__c,Sanctuary_Capacity__c,Lat/Long Coordinates Decimal,Google Small Embed URL,Miles to Pastoral Center,Schedule 1 Head,Schedule 1 Text,Schedule 2 Head,Schedule 2 Text,Schedule 3 Head,Schedule 3 Text,Schedule 4 Head,Schedule 4 Text,Schedule 5 Head,Schedule 5 Text,Schedule 6 Head,Schedule 6 Text,Schedule 7 Head,Schedule 7 Text,Community City,Order Full Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Non-Latin Rite,Show Order in Name,Description,Local Superior,Major Superior Name,Major Superior Phone,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c,Parent_Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,mbfc__Religious_Type__c,Archdpdx_Migration_Id__c
190,8,Religious,Missionaries of the Holy Spirit Provincial House,Missionaries of the Holy Spirit Provincial Hou...,,False,2512 SE Monroe St,Milwaukie,OR,,97269.0,,503-324-2492,503-324-2493,,www.mspscpp.org,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Milwaukie,Misioneros del Espíritu Santo,"Missionaries of the Holy Spirit, Christ the Pr...",MSpS,Men,No,Yes,,0.0,,,,,,,,,,,,,False,PO Box 22387,,,001Dx00001HwE3WIAV,,misionerosdelespíritusanto,012Dx0000003p52IAA,111,Local Community,RelCommunities_8
210,34,Religious,Sisters of St. Dominic of Caldwell,"Sisters of St. Dominic of Caldwell, Caldwell, ...",,False,,Caldwell,NJ,,7006.0,,973-403-3331,973-228-9611,dempsey@up.edu,https://caldwellop.org/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,"Caldwell, NJ",Sisters of St. Dominic of Caldwell,Sisters of St. Dominic,OP,Women,No,Yes,Serving the University of Portland,0.0,"Sr. Luella Ramm, OP",973-403-3331,dominicans@caldwellop.org,,,,,,,,,,False,1 Ryerson Avenue,Religious Order,,001Dx00001HwE3mIAF,,sistersofst.dominicofcaldwell,012Dx0000003p52IAA,111,Local Community,RelCommunities_34
186,1,Religious,Colombiere Jesuit Community,"Colombiere Jesuit Community, Portland (SJ)",,False,,Portland,OR,,97206.0,,503-595-1941,,,https://www.jesuitswest.org/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Portland,Societas Iesu,Jesuits,SJ,Men,No,Yes,"Manager: Fr. Paul Cochran, SJ",1525.0,"Rev. Sean Carroll, SJ",,,,,,,,,,,,False,3220 SE 43rd Ave,Religious Order,,001Dx00001HwE3TIAV,,societasiesu,012Dx0000003p52IAA,111,Local Community,RelCommunities_1
239,65,Religious,Missionary Oblates of Mary Immaculate,"Missionary Oblates of Mary Immaculate, Rome, I...",,False,,Roma,,,165.0,ITALY,,,,,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,"Rome, ITALY",Missionary Oblates of Mary Immaculate,Oblates of Mary Immaculate,OMI,Men,No,Yes,,0.0,"Fr. Luis Ignacio Rois Alonso, OMI",,gensec@omigen.org,,,,,,,,,,False,Missionary Oblates of Mary Immaculate\nVia Aur...,,,001Dx00001HwE4EIAV,,missionaryoblatesofmaryimmaculate,012Dx0000003p52IAA,111,Local Community,RelCommunities_65
244,70,Religious,Franciscan Friars of the Renewal,"Franciscan Friars of the Renewal, New York, NY...",,False,,,,,,,,,,https://www.franciscanfriars.com/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,"New York, NY",Fratres Franciscani a Renovatione,Franciscan Friars of the Renewal,CFR,Men,No,No,,0.0,"John Paul Ouellette, CFR",,,,,,,,,,,,False,,,,001Dx00001HwE4JIAV,,fratresfranciscaniarenovatione,012Dx0000003p52IAA,111,Local Community,RelCommunities_70


In [387]:
acc_religious_staging.sample(5)

Unnamed: 0,Record Number,AccountRecordType,Formal_Name__c,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,src_table,Sort Name,Parish City,Parent_Parish__c,mbfc__Date_Established__c,Vicariate,Non-Latin,County__c,Disabled_Access__c,Sanctuary_Capacity__c,Lat/Long Coordinates Decimal,Google Small Embed URL,Miles to Pastoral Center,Schedule 1 Head,Schedule 1 Text,Schedule 2 Head,Schedule 2 Text,Schedule 3 Head,Schedule 3 Text,Schedule 4 Head,Schedule 4 Text,Schedule 5 Head,Schedule 5 Text,Schedule 6 Head,Schedule 6 Text,Schedule 7 Head,Schedule 7 Text,Community City,Order Full Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Non-Latin Rite,Show Order in Name,Description,Local Superior,Major Superior Name,Major Superior Phone,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c,Parent_Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,mbfc__Religious_Type__c,Archdpdx_Migration_Id__c
253,79,Religious,Society of Our Lady of the Most Holy Trinity,"Society of Our Lady of the Most Holy Trinity, ...",,False,,Corpus Christi,TX,,78469,,,,,https://solt.net/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,"Corpus Christi, TX",Society of Our Lady of the Most Holy Trinity,Society of Our Lady of the Most Holy Trinity,SOLT,Men,No,Yes,,0.0,,,,,,,,,,,,,False,PO Box 4116,,,001Dx00001HwE4SIAV,,societyofourladyofthemostholytrinity,012Dx0000003p52IAA,111,Local Community,RelCommunities_79
227,51,Religious,Sisters of St. Francis of Philadelphia,"Sisters of St. Francis of Philadelphia, Portla...",,False,,Aston,PA,,19014,,610-459-4125,,communications@osfphila.org,https://osfphila.org/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Portland,Sisters of St. Francis of Philadelphia,Sisters of St. Francis of Philadelphia,OSF,Women,No,Yes,"Serving Ascension Parish, Cathedral of the Imm...",0.0,"Sr. Theresa Marie Firenze, OSF",610-459-4125,tfirenze@osfphila.org,,,,,,,,,,False,609 S Convent Rd,,,001Dx00001HwE43IAF,,sistersofst.francisofphiladelphia,012Dx0000003p52IAA,111,Local Community,RelCommunities_51
226,50,Religious,Sisters of St. Francis,"Sisters of St. Francis, Lake Oswego (OSF)",,False,,Clinton,IA,,52732,,503-657-0109,,,http://www.clintonfranciscans.com/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Lake Oswego,"Sisters of St. Francis, Clinton, Iowa",Sisters of St. Francis,OSF,Women,No,Yes,"Serving Our Lady of the Lake Parish, Lake Oswego",0.0,,,,,,,,,,,,,False,843 13th Ave N,,,001Dx00001HwE42IAF,,"sistersofst.francis,clinton,iowa",012Dx0000003p52IAA,111,Local Community,RelCommunities_50
233,57,Religious,Priory of Our Lady of Consolation,"Priory of Our Lady of Consolation, Amity (OSsS)",,False,,Amity,OR,,97101,,503-835-8080,503-835-9662,monks@brigittine.org,http://www.brigittine.com/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Amity,"Brigittine Monks, Order of the Most Holy Savior",Brigittines,OSsS,Men,No,Yes,Canonical status of a Priory “Sui Juris”. Brot...,2425.0,,,,,,,,,,,,,False,Priory of Our Lady of Consolation\n23300 SW Wa...,,,001Dx00001HwE49IAF,,"brigittinemonks,orderofthemostholysavior",012Dx0000003p52IAA,111,Local Community,RelCommunities_57
234,60,Religious,Jesuits West Provincial Office,"Jesuits West Provincial Office, Portland (SJ)",,False,3215 SE 45th Ave,Portland,OR,,97286,,503-226-6977,503-228-6741,uweprovince@jesuits.org,https://www.jesuitswest.org/,RelCommunities,,,,NaT,,,,False,,,,,,,,,,,,,,,,,,,Portland,Societas Iesu,Jesuits,SJ,Men,No,Yes,,0.0,"Rev. Sean Carroll, SJ",,,,,,,,,,,,False,PO Box 86010,Religious Order,,001Dx00001HwE3TIAV,,societasiesu,012Dx0000003p52IAA,111,Local Community,RelCommunities_60


In [388]:
acc_religious_staging_2 = acc_religious_staging[[
    'Name',
    'RecordTypeId',
    'mbfc__Religious_Type__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Job_Id__c',
    'ParentId',
    'Archdpdx_Migration_Id__c'
    ]]

acc_religious_staging_2.sample(5)

Unnamed: 0,Name,RecordTypeId,mbfc__Religious_Type__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Description,Job_Id__c,ParentId,Archdpdx_Migration_Id__c
245,"Oblates of the Virgin Mary, Boston, MA (OMV)",012Dx0000003p52IAA,Local Community,2 Ipswich Street,Boston,MA,2215,,617-536-4141,,office@omvusa.org,https://www.omvusa.org/,Oblates of the Virgin Mary,OMV,Men,,111,001Dx00001HwE4KIAV,RelCommunities_71
206,"Adrian Dominican Sisters, Adrian, MI (OP)",012Dx0000003p52IAA,Local Community,1257 East Siena Heights Drive,Adrian,MI,49221,,517-266-3400,,jfinfera@adriandominicans.org,http://adriandominicans.org/,Dominicans,OP,Women,,111,001Dx00001HwE3kIAF,RelCommunities_30
200,"Society of Domus Dei Holy House Monasteries, W...",012Dx0000003p52IAA,Local Community,462 Hudson Rd,Washougal,WA,98671,,360-835-5358,,,http://nhachua.net/,Domus Dei,SDD,Men,Serving Our Lady of Lavang Parish; Southeast A...,111,001Dx00001HwE3eIAF,RelCommunities_21
226,"Sisters of St. Francis, Lake Oswego (OSF)",012Dx0000003p52IAA,Local Community,843 13th Ave N,Clinton,IA,52732,,503-657-0109,,,http://www.clintonfranciscans.com/,Sisters of St. Francis,OSF,Women,"Serving Our Lady of the Lake Parish, Lake Oswego",111,001Dx00001HwE42IAF,RelCommunities_50
190,Missionaries of the Holy Spirit Provincial Hou...,012Dx0000003p52IAA,Local Community,PO Box 22387,Milwaukie,OR,97269,,503-324-2492,503-324-2493,,www.mspscpp.org,"Missionaries of the Holy Spirit, Christ the Pr...",MSpS,Men,,111,001Dx00001HwE3WIAV,RelCommunities_8


In [389]:
# Final Cleanup

acc_religious_staging_2 = acc_religious_staging_2.fillna('')
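
The sample rows above show US ZIP codes that have lost their leading zero (for example `2215` for Boston, MA and `7006.0` for Caldwell, NJ), which typically happens when a postal-code column is read as a number. Below is a minimal sketch of a repair that could run before the CSV export. It assumes that a blank country means a US address. `normalize_us_zip` is a hypothetical helper and is not part of the original notebook.

```python
def normalize_us_zip(value, country=''):
    """Restore leading zeros on 5-digit US ZIP codes that were read as numbers."""
    text = '' if value is None else str(value).strip()
    if text.lower() in ('', 'nan'):
        return ''
    # Drop the '.0' left behind by float conversion
    if text.endswith('.0'):
        text = text[:-2]
    # Assumption: a blank country or 'US'/'USA' marks a US address
    if country and str(country).strip().upper() not in ('US', 'USA'):
        return text
    # Only pad plain numeric codes; ZIP+4 and other formats pass through unchanged
    if text.isdigit() and len(text) < 5:
        text = text.zfill(5)
    return text


print(normalize_us_zip(7006.0))          # 07006
print(normalize_us_zip('2215'))          # 02215
print(normalize_us_zip('97269'))         # 97269
print(normalize_us_zip(165.0, 'ITALY'))  # 165
```

Non-US codes are returned without padding because their formats vary; those rows would still need a manual check.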

In [390]:
# @title Send to CSV
acc_religious_staging_2.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/religious_community_staging.csv', encoding='utf-8-sig')

In [391]:
# @title Upsert to Salesforce
# Convert each staging row into a dict keyed by Salesforce field API name
bulk_data = [row._asdict() for row in acc_religious_staging_2.itertuples(index=False)]

# run_upserts is compared as a string, so the flag must be the text 'True' for the upsert to run
if run_upserts == 'True':
    religious_community_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_community_upsert = pd.DataFrame(religious_community_upsert)

# Show the results of the most recent upsert (only defined once an upsert has run)
df_rel_community_upsert

Unnamed: 0,success,created,id,errors
0,True,False,001Dx00001HwE4dIAF,[]
1,True,False,001Dx00001HwE4eIAF,[]
2,True,False,001Dx00001HwE4fIAF,[]
3,True,False,001Dx00001HwE4gIAF,[]
4,True,False,001Dx00001HwE4hIAF,[]
...,...,...,...,...
65,True,False,001Dx00001HwE5fIAF,[]
66,True,False,001Dx00001HwE5gIAF,[]
67,True,False,001Dx00001HwE5hIAF,[]
68,True,False,001Dx00001HwE5iIAF,[]
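
The result table above has one row per record with `success`, `created`, `id`, and `errors`. Below is a minimal sketch of a check that could run after each upsert. It assumes the results arrive as a list of dicts with those four keys, as the table suggests. `summarize_upsert` is a hypothetical helper name, not part of simple_salesforce.

```python
def summarize_upsert(results):
    """Count created, updated, and failed rows in a list of bulk upsert results.

    Assumes each result is a dict with 'success', 'created', 'id', and 'errors' keys.
    """
    summary = {'created': 0, 'updated': 0, 'failed': 0, 'failures': []}
    for position, result in enumerate(results):
        if not result.get('success'):
            summary['failed'] += 1
            # Keep the row position so the failure can be traced back to bulk_data
            summary['failures'].append((position, result.get('errors')))
        elif result.get('created'):
            summary['created'] += 1
        else:
            summary['updated'] += 1
    return summary


# Illustrative results only; ids and the failing row are placeholders
sample_results = [
    {'success': True, 'created': False, 'id': 'id-1', 'errors': []},
    {'success': True, 'created': True, 'id': 'id-2', 'errors': []},
    {'success': False, 'created': False, 'id': None, 'errors': [{'message': 'example error'}]},
]
print(summarize_upsert(sample_results))
```

Because the position in the result list matches the position in `bulk_data`, a failed row can be looked up in the staging data by that index.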


### E) Religious Superiors


In [None]:
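acc_rel_superiors = acc_religious_2[[
    'Name',
    'Major Superior Name',
    'Major Superior Phone',
    'Major Superior Email',
    'Archdpdx_Migration_Id__c']].copy()  # copy so the new column is not written to a slice of acc_religious_2


# Set AccountId by looking up each migration Id in religious_order_mapping
acc_rel_superiors['AccountId'] = acc_rel_superiors['Archdpdx_Migration_Id__c'].map(religious_order_mapping)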

# acc_rel_superiors.sample(5)

In [None]:
# @title Parse Complex Names
def parse_names(df, column_name):
    """Return a copy of df with name-part columns parsed from df[column_name].

    Uses HumanName from nameparser, which is imported in the next cell;
    run that cell before calling this function.
    """
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()

    # Convert all non-string entries to strings (handling NaN and other data types)
    df[column_name] = df[column_name].fillna('').astype(str)

    # Output column -> HumanName attribute
    parts = {
        'First Name': 'first',
        'Last Name': 'last',
        'Middle Name': 'middle',
        'Title': 'title',
        'Suffix': 'suffix',
        'Nickname': 'nickname',
    }

    def split_name(value):
        # Parse each name once; blank names yield empty parts
        if value.strip() == '':
            return {col: '' for col in parts}
        name = HumanName(value)
        return {col: getattr(name, attr) for col, attr in parts.items()}

    name_parts = pd.DataFrame(df[column_name].map(split_name).tolist(), index=df.index, columns=list(parts))

    # Combine the original DataFrame with the name parts DataFrame
    return pd.concat([df, name_parts], axis=1)



In [None]:
!pip install nameparser
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Very', 'Rev.', 'Very Rev.', 'Sr.')
CONSTANTS.suffix_acronyms.add('FRS', 'OMI', 'OSA', 'OCD', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'O.S.M', 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President')


In [None]:
# Parse Complex Names
acc_rel_superiors_parsed = parse_names(acc_rel_superiors, 'Major Superior Name')

In [None]:
# @title Final cleanup

acc_rel_superiors_staging = acc_rel_superiors_parsed.fillna('')

acc_rel_superiors_staging['Archdpdx_Migration_Id__c'] = acc_rel_superiors_staging['Major Superior Name'].apply(lambda x: x.replace(' ','').lower())

# Rename columns
acc_rel_superiors_staging = acc_rel_superiors_staging.rename(columns={
    'Major Superior Phone': 'Phone',
    'Major Superior Email': 'Email',
    'Title': 'Salutation',
    'First Name': 'FirstName',
    'Middle Name': 'MiddleName',
    'Last Name': 'LastName'
})

# Add job id
acc_rel_superiors_staging['Archdpdx_Job_Id__c'] = curr_job_id

# Drop columns
acc_rel_superiors_staging = acc_rel_superiors_staging.drop(columns=['Name', 'Major Superior Name', 'Nickname'])

# Drop empty rows
acc_rel_superiors_staging = acc_rel_superiors_staging[acc_rel_superiors_staging['LastName'].str.strip() != '']

acc_rel_superiors_staging.sample(10)

In [None]:
# @title Send to CSV
acc_rel_superiors_staging.to_csv('staging_files/religious_superiors_staging.csv', encoding='utf-8-sig')

In [None]:
# Upsert to Salesforce

def escape_soql(value):
    # Escape backslashes and single quotes so names like O'Brien do not break the query
    return str(value).replace('\\', '\\\\').replace("'", "\\'")


def find_existing_contact(sf, first_name, last_name):
    # Look up Contacts by exact first and last name
    query = (
        "SELECT Id, Archdpdx_Migration_Id__c FROM Contact "
        f"WHERE FirstName = '{escape_soql(first_name)}' AND LastName = '{escape_soql(last_name)}'"
    )
    result = sf.query(query)
    return result['records']


bulk_data = []
for row in acc_rel_superiors_staging.itertuples(index=False):
    d = row._asdict()
    existing_contacts = find_existing_contact(sf, d['FirstName'], d['LastName'])
    if existing_contacts:
        # Record the Id of the first matching Contact on the payload
        d['Id'] = existing_contacts[0]['Id']
    bulk_data.append(d)


if run_upserts == 'True':
    religious_superior_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_superior_upsert = pd.DataFrame(religious_superior_upsert)

# Show the results of the most recent upsert (only defined once an upsert has run)
df_rel_superior_upsert
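
The order of operations lists duplicate superiors as an open TODO. The migration Id here is derived from the superior's name, so one person who leads several communities produces several rows with the same `Archdpdx_Migration_Id__c`. The same major superior is listed on both Jesuit communities in the samples above, for instance. A bulk upsert may reject repeated external Ids within one batch. Below is a minimal sketch of one way to collapse them beforehand, assuming the first row per Id should win. `dedupe_by_external_id` is a hypothetical helper.

```python
def dedupe_by_external_id(records, key='Archdpdx_Migration_Id__c'):
    """Return (kept, dropped): one record per external Id, keeping the first seen.

    The dropped rows are returned as well so they can be reviewed manually.
    """
    seen = set()
    kept, dropped = [], []
    for record in records:
        external_id = record.get(key, '')
        if external_id in seen:
            dropped.append(record)
        else:
            seen.add(external_id)
            kept.append(record)
    return kept, dropped


# Illustrative rows only; the Ids and names are placeholders
rows = [
    {'Archdpdx_Migration_Id__c': 'rev.janedoe,xx', 'AccountId': 'A1'},
    {'Archdpdx_Migration_Id__c': 'rev.janedoe,xx', 'AccountId': 'A1'},
    {'Archdpdx_Migration_Id__c': 'sr.maryroe,yy', 'AccountId': 'A2'},
]
kept, dropped = dedupe_by_external_id(rows)
print(len(kept), len(dropped))
```

Keeping the first row is only safe when the duplicates share the same parent account; rows that differ in `AccountId` would need a manual decision.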

In [None]:
# Update Religious Communities with Rel. Superior

# TODO: It would take much less time to simply do this post-migration manually.

# CONTACTS


## Extract


In [351]:
import pandas as pd
df_contacts = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .set_index('Record Number', verify_integrity=True)
               .drop(index='recNum') # Drops the extra row that replicates the labels
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
)

df_contacts.sample(10)


Record Number,Common_Name,Sort_Name,Type(s),Clergy_Status,Religious_Status,Login_ID,Password,Password_Must_be_Changed,Access_Permission,Spouse,Title,Salutation,Christian_Name,Nickname,Middle_Name(s),Surname,Suffix,Mailing_Address,Mailing_Address_2,Mailing_Address_City,Mailing_Address_State,Mailing_Address_Province,Mailing_Address_Postal_Code,Mailing_Address_Country,Private_Address,Private_Address_2,Private_Address_City,Private_Address_State,Private_Address_Province,Private_Address_Postal_Code,Private_Address_Country,Preferred_Address,Work_Phone,Home_Phone,Cell_Phone,Preferred_Phone,Work_Email,Archdiocesan_Email,Home_Email,Preferred_Email,Directory_Include,Directory_Include_Middle_Name,Directory_Include_Suffix,Suppress_From_Reports,Seminarian_Student_Debt,Seminarian_Medical_Benefits,Send_Group_Mail_and_Email,Birth_Date,Place_of_Birth,Foreign_Born,Father_Full_Name,Mother_Full_Maiden_Name,Foreign_Citizenship,Immigration_Status,Passport/Visa_Expiration_Date,Social_Security_Account_Number,Baptism_Date,Place_of_Baptism,Confirmation_Date,Place_of_Confirmation,Received_Date,Parish_of_Record,Marriage_Date,Place_of_Marriage,Date_of_First_Vows,Date_of_Final_Vows,Accepted_to_Formation_Date,Reader_Date,Acolyte_Date,Candidacy_Date,Formation_Withdrawn_Date,Formation_Deferred_Date,Formation_Terminated_Date,Terminate_or_Defer_Note,Bachelor_Degree_Year,Bachelor_Degree_Type,Bachelor_Degree_Institution,Graduate_1_Degree_Year,Graduate_1_Degree_Type,Graduate_1_Degree_Institution,Graduate_2_Degree_Year,Graduate_2_Degree_Type,Graduate_2_Degree_Institution,Graduate_3_Degree_Year,Graduate_3_Degree_Type,Graduate_3_Degree_Institution,Graduate_4_Degree_Year,Graduate_4_Degree_Type,Graduate_4_Degree_Institution,CARA_Highest_Ed_Level,Diaconal_Ordination_Date,Diaconal_Ordination_Place,Diaconal_Ordination_Prelate,Presbyteral_Ordination_Date,Presbyteral_Ordination_Place,Presbyteral_Ordination_Prelate,Episcopal_Ordination_Date,Episcopal_Ordination_Place,Episcopal_Ordination_Prelate,Ordination_Diocese,Incardinated_From_Diocese,Incardinated_From_Date,Incardinated_Now,Serving_Now,Excardinated_To_Diocese,Excardinated_To_Date,Letter_of_Good_Standing_Date,Religious_In_Archdiocese_Date,Faculties,Faculties_Granted_Date,Faculties_Restricted_Date,Faculties_Withdrawn_Date,Last_Retreat_Date,Last_Educ_Requirement_Date,Policy_Manual_Acknowledgement_Date,Harassment_Prevention_Course_Date,Standards_of_Conduct_Date,Last_Background_Check_Date,Last_Child_Protection_Training_Date,Out_of_Diocese_Date,Senior_Status_Date,Laicized_Date,Deceased_Date,Languages,Coverage_Availability,Advanced_Directive_Date,End_of_Life_Plan_Date,Will_Date,Will_Note,CIC_489_File,Registered_Parish,CARA_Ethnicity,Seminarian_Status,Other_Diaconal_Ministry,Spiritual_Director_Authorized,Link_to_Religious_Community,Place_of_Work,Volunteer_Place,Type_of_Work,Work_Load,Work_Title
472,Deacon Tien Nguyễn,nguyen tien nam,Permanent Deacon,Active,,tnguyen,4a6b6c15fa69d4704f292690f0071beaf680e7e97ba36b...,Yes,,473,Deacon,Deacon,Tien,,Nam,Nguyễn,,Our Lady of Lavang Parish,11731 SE Stevens Rd,Happy Valley,OR,,97086.0,,14481 SE 155th Dr,,Clackamas,OR,,97015.0,,,,916-242-8458,916-792-6488,,,,tienthinh77@yahoo.com,,Yes,No,No,No,0,,Yes,1955-03-17,"Saigon, Viet Nam",Yes,John Nguyên Van Khan,Mary Nguyên Thi Nhai,,,,,1955-03-17,"Bac Ha, Cu Chi, Viet Nam",,,,,1977-11-19,,,,,2001-09-23,2003-09-29,,,,,,,,,,,,,,,,,,,,,Bachelor’s degree,2004-06-05,"Holy Trinity, El Dorado Hills, CA",Most Rev. William K. Weigand,,,,,,,Diocese of Sacramento,,,Diocese of Sacramento,Archdiocese of Portland in Oregon,,,,,Diaconal,2016-09-01,,,2023-05-14,,,2023-01-08,2022-09-15,2022-09-15,2023-10-06,,,,,Vietnamese,,,,,,,0,Asian/Pacific Islander,,"Parish Baptism Prep, Home Visitation",No,0,,,,,
794,"Rev. Peter Do, OP",do peter,"Priest,Religious",Active,Active,pdo,3174c6a0e2f5cc156fc59bf7f5f3031049d5e42296b6d9...,No,,0,Rev.,Fr.,Peter,,,Do,,Holy Rosary Church and Priory,375 NE Clackamas St,Portland,OR,,97232.0,,,,,,,,,,458-215-4212,,541-735-5782,,,pdo@archdpdx.org,peterdo95@gmail.com,Home,Yes,No,No,No,0,,Yes,1977-02-27,Vietnam (Bien Hoa),Yes,,,,,,,,,,,,,,,2002-09-06,2009-05-30,,,,,,,,,2006.0,BS Biochemistry,"Portland State University, Portland, OR",2001.0,MS Chemistry,"University of Utah, Salt Lake City, UT",2009.0,M. Div,"Dominican School of Philosophy and Theology, B...",,,,,,,,2008-01-12,Basilica of the National Shrine of the Immacul...,,2009-05-30,"St. Dominic Church, San Francisco, CA","Most Rev. Thomas C. Kelly, OP",,,,Western Dominican Province,,,Western Dominican Province,Western Dominican Province,,,2022-08-24,,General,2022-09-01,,,,,2022-09-13,2022-12-15,2015-09-16,2023-12-18,2023-07-07,,,,,,,,,,,,0,,,,,18,Holy Rosary Priory and Parish,,Pastoral Ministry,Full Time,"Prior, Pastor"
3221,Ms. Sarah Montague,montague sarah,Staff,,,,,,,0,Ms.,Ms.,Sarah,,,Montague,,St. Rose of Lima Parish,2727 NE 54th Ave,Portland,OR,,97213.0,,,,,,,,,,503-281-5318 x205,,,,smontague@strosepdx.org,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
894,Rev. Maximo Stöck,stock maximo,"Priest,Religious",Transferred Out,Transferred Out,mstock,a288810085ae4fd49e7f69ef0825fb5a7f5831b57c8e5c...,Yes,,0,Rev.,Fr.,Maximo,,,Stöck,,OSU Newman Center,2127 NW Monroe Ave,Corvallis,OR,,97330.0,,,,,,,,,,541-752-6818,541-757-1988,,,director@osunewman.org,,mstock@socsj.org,,Yes,No,No,No,0,,Yes,1984-06-25,Argentina,Yes,,,Argentina,Permanent Resident,,,,,,,,,,,2012-07-14,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2012-07-14,"Cordoba, Argentina",Bishop Olivera,,,,Society of Saint John,,,Saint John Society,Saint John Society,,,2020-05-28,,Confessional,2017-10-16,,,,,,2022-05-10,2018-01-09,2021-01-12,2023-09-01,2024-01-01,,,,,,,,,,,0,,,,,26,"OSU Newman Center, Corvallis",,Pastoral Ministry,Full Time,Director
1285,Rev. Angel Perez,perez angel,Priest,Faculties Withdrawn,,,,,,0,Rev.,Fr.,Angel,,,Perez,,Agustin Yanez #71,Cocula Centro,"Cocula, Jalisco",OR,,48500.0,MEXICO,,,,,,,,,,,,,,,,,No,No,No,Yes,0,,No,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1975-08-09,San Luis Potosi,Most Rev. Ezequiel Perea Sánchez,,,,Archdiocese of San Luis Potosi,Archdiocese of San Luis Potosi,2004-05-02,Archdiocese of Portland in Oregon,,,,,,,,,2015-04-01,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
182,Deacon Francis Potts,potts francis douglas francis,Permanent Deacon,Active,,fpotts,3f2898500d0c3aaaeea7d056ba7333d86357feca458949...,Yes,,222,Deacon,Deacon,Ernest,Francis,Douglas Francis,Potts,,,,,,,,,3402 SW Knollbrook Ave,,Corvallis,OR,,97333.0,,,,541-754-8374,541-602-2476,,,,fpottola3402@gmail.com,,Yes,No,No,No,0,,Yes,1951-02-14,Chicago IL,No,"Ernest D. Potts, Sr.",Barbara Jean Ryan,,,,,1951-03-11,"St. Edmund, Oak Park IL",,,,,1982-08-07,,,,,1977-10-12,1977-10-12,1993-04-17,,,,,,,,1981.0,Master of Theological Studies,"Dominican School of Philosophy and Theology, B...",,,,,,,,,,"Graduate degree in religious studies, theology...",1993-11-05,"Cathedral of the Immaculate Conception, Portla...",Most Rev. William J. Levada,,,,,,,Archdiocese of Portland in Oregon,,,Archdiocese of Portland in Oregon,Archdiocese of Portland in Oregon,,,,,Diaconal,1993-11-05,,,2023-11-22,,2023-08-01,2023-08-02,2023-08-01,2021-03-24,2023-08-01,,,,,,,,,,,,0,Caucasian/white,,,No,0,,,,,
1971,Ms. Ann Barba,barba ann,Staff,,,,,,,0,Ms.,Ms.,Ann,,,Barba,,Our Lady of the Lake Parish,650 A Ave,Lake Oswego,OR,,97034.0,,,,,,,,,,503-636-7687,,,,annb@ollparish.com,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
2235,Ms. Naomi Arreguin,arreguin naomi,Staff,,,,,,,0,Ms.,Ms.,Naomi,,,Arreguin,,St. Joseph Parish,721 Chemeketa St NE,Salem,OR,,97301.0,,,,,,,,,,503-581-1623 x108,,,,naomi@stjosephchurch.com,,,,,,,,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
2807,Mrs. Erin Springer,springer erin marie,Wife,,,espringer,30ae2c39481de44f92282009ae034806123a42c1de9759...,No,,2806,Mrs.,Mrs.,Erin,,Marie,Springer,,,,,,,,,32266 Cedar Valley Rd,,Gold Beach,OR,,97444.0,,,,541-247-4798,541-373-3850,,,,emspringer@gmail.com,,,,,,0,,Yes,,,,,,,,,,,,,,,,1997-02-15,"St. Francis de Sales, Las Vegas, NV",,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,0,,,,,
1118,"Rev. J. Patrick Hurley, SJ",hurley j patrick,"Priest,Religious",Transferred Out,Transferred Out,,,,,0,Rev.,Fr.,J. Patrick,,,Hurley,,,,,,,,,,,,,,,,,,,,,,,,,No,No,No,No,0,,Yes,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,0,,,,,60,,,,,


#### Get Photos


In [352]:
import os
import pandas as pd

# def list_jpeg_files(directory):
#     data = []
#     for filename in os.listdir(directory):
#         if filename.endswith(".jpeg") or filename.endswith(".jpg"):  # Checking for jpeg files
#             full_path = os.path.join(directory, filename)
#             data.append({'Filename': filename, 'Full Path': full_path})
#     return pd.DataFrame(data)

# # Specify your directory
# directory = '/content/drive/Shareddrives/Clients/ADPDX (Portland)/Data/Clergy DB/sql_backup/archdpdx.info backups/public_html/people/graphics/portraits/large'
# jpeg_files_df = list_jpeg_files(directory)


In [353]:
# # Query for the Library
# library_query = "SELECT Id, Name FROM ContentWorkspace WHERE Name = 'ADPDX Person Profile Photos'"
# library_result = sf.query(library_query)

# # Check if the library exists and get its ID
# if library_result['records']:
#     library_id = library_result['records'][0]['Id']
#     print(f"Library ID: {library_id}")

#     # Query for the Folder within the Library
#     folder_query = f"SELECT Id, Name FROM ContentFolder WHERE ParentContentFolderId = '{library_id}'"
#     folder_result = sf.query(folder_query)

#     # Check if the folder exists and get its ID
#     if folder_result['records']:
#         folder_id = folder_result['records'][0]['Id']
#         print(f"Folder ID: {folder_id}")
#     else:
#         print("Folder 'Large JPEGs' not found in the library.")
# else:
#     print("Library 'ADPDX Person Profile Photos' not found.")

## Analysis

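Here we check the columns and their types, the count of non-null values, the number of unique values, and a sample of the data.

DF shape after import, as reported by the cell below:

- 141 columns (with `Record Number` set as the index)
- 3016 rows (after dropping the extra row that repeats the labels)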


In [354]:
# Check the original shape of the imported CSV
print(f"Shape of original data set: {df_contacts.shape}")

# Export a per-field summary of the contact fields (count, unique, top, freq) to CSV
contacts_describe = df_contacts.describe(include='all').transpose()
contacts_describe.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/analysis/contacts_describe.csv')

contacts_describe  # initial analysis of the Contacts table

Shape of original data set: (3016, 141)


Unnamed: 0,count,unique,top,freq
Common_Name,3016,3011,Ms. Leslie Jones,2
Sort_Name,3016,3009,nguyen anthony,3
Type(s),3016,29,Staff,1139
Clergy_Status,1138,8,Transferred Out,462
Religious_Status,902,4,Active,456
...,...,...,...,...
Place_of_Work,269,133,Mount Angel Abbey,37
Volunteer_Place,54,47,Mary’s Woods,4
Type_of_Work,276,117,Pastoral Ministry,30
Work_Load,262,2,Full Time,230


In [355]:
unique_languages = df_contacts['Languages'].unique()
unique_languages

array([nan, 'English,Spanish', 'Igbo', 'English, Spanish',
       'Spanish, Mayaqeqchi', 'Spanish (Mass only)',
       'Latin Mass and written translation. Read French, Italian, Spanish.',
       'Spanish', 'Hindi, Konkani, Tamil',
       'French (fluent), Spanish (beginner), Latin (beginner)',
       'German, Spanish, Italian, French', 'Kiswahili, Kichagga',
       'Spanish (English is second language)',
       'German, Spanish, Italian, Latin Mass',
       'English, Spanish, Italian', 'Spanish, Italian', 'English',
       'Bicolango, Tagalog, Spanish', 'Spanish, Italian, Latin Mass',
       'Italian', 'Tagalog, English, Spanish',
       'French, Italian, Aramaic (modern), Spanish', 'Vietnamese',
       'German, Spanish', 'English,Spanish,Italian',
       'Conversant in Italian and Spanish, some facility with Latin and German',
       'English, Spanish, Latin Mass', 'Italian, Spanish',
       'Konkani, Hindi, Marathi, Spanish',
       'Tagalog, Bicol, Spanish (Mass only)', 'Spanish, E

In [356]:
# import re
# import numpy as np


# def deduplicate_languages(list_languages):
#     # Define a regular expression pattern to match periods and punctuation
#     punctuation_pattern = r'[.,!?;:"]'

#     # Flatten the array and filter out NaN values
#     flattened_languages = [re.sub(punctuation_pattern, '', lang) for sublist in list_languages if pd.notna(sublist) for lang in sublist.split(',')]

#     # Deduplicate the list of languages
#     unique_languages = list(set(flattened_languages))

#     return unique_languages


# # Example usage:
# unique_languages = deduplicate_languages(unique_languages)
# print(unique_languages)
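
The commented-out helper above flattens the raw `Languages` values into a unique list. Below is a minimal sketch of the same idea in plain Python. It assumes comma-separated entries and that parenthetical qualifiers such as "(Mass only)" can be discarded. `split_languages` is a hypothetical name, and free-text entries (full sentences) would still need manual review.

```python
import re


def split_languages(values):
    """Return a sorted list of distinct language tokens from raw Languages values."""
    tokens = set()
    for value in values:
        # Skip missing values (None, or float NaN, which is not equal to itself)
        if value is None or value != value:
            continue
        for part in str(value).split(','):
            # Remove parenthetical qualifiers, e.g. 'Spanish (Mass only)' -> 'Spanish'
            cleaned = re.sub(r'\([^)]*\)', '', part).strip(' .')
            if cleaned:
                tokens.add(cleaned)
    return sorted(tokens)


# A few values taken from the unique_languages output above
raw = [
    float('nan'),
    'English,Spanish',
    'English, Spanish',
    'Spanish (Mass only)',
    'French (fluent), Spanish (beginner), Latin (beginner)',
]
print(split_languages(raw))
```

Stripping whitespace after the split is what merges `'English,Spanish'` and `'English, Spanish'` into the same two tokens.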


## Transform


In [357]:
# list of columns NOT to be migrated as Contact attributes
misc_columns_to_drop = [
    'Password',
    'Password_Must_be_Changed',
    'Sort_Name'
]

affiliation_columns = [
    'Baptism_Date',
    'Place_of_Baptism',
    'Confirmation_Date',
    'Place_of_Confirmation',
    'Received_Date',
    'Parish_of_Record',
    'Marriage_Date',
    'Place_of_Marriage',
    'Date_of_First_Vows',
    'Date_of_Final_Vows',
    'Reader_Date',
    'Acolyte_Date',
    'Bachelor_Degree_Year',
    'Bachelor_Degree_Type',
    'Bachelor_Degree_Institution',
    'Graduate_1_Degree_Institution',
    'Graduate_1_Degree_Type',
    'Graduate_1_Degree_Year',
    'Graduate_2_Degree_Institution',
    'Graduate_2_Degree_Type',
    'Graduate_2_Degree_Year',
    'Graduate_3_Degree_Institution',
    'Graduate_3_Degree_Type',
    'Graduate_3_Degree_Year',
    'Graduate_4_Degree_Institution',
    'Graduate_4_Degree_Type',
    'Graduate_4_Degree_Year',
    'Diaconal_Ordination_Date',
    'Diaconal_Ordination_Place',
    'Diaconal_Ordination_Prelate',
    'Presbyteral_Ordination_Date',
    'Presbyteral_Ordination_Place',
    'Presbyteral_Ordination_Prelate',
    'Episcopal_Ordination_Date',
    'Episcopal_Ordination_Place',
    'Episcopal_Ordination_Prelate',
    'Incardinated_From_Date',
    'Incardinated_From_Diocese',
    'Excardinated_To_Diocese',
    'Excardinated_To_Date',
    'Faculties',
    'Faculties_Granted_Date',
    'Faculties_Restricted_Date',
    'Faculties_Withdrawn_Date',
]

# These fields must eventually be migrated, but they are dropped temporarily while the SF upsert flow is being built.
# TODO: add mapping logic for each field, then remove it from this list.

fields_not_yet_mapped = [
    'Common_Name',
    'Spouse',
    'Father_Full_Name',
    'Mother_Full_Maiden_Name',
    'Mailing_Address_Province',
    'Private_Address_Province',
    # 'Preferred_Address',
    # 'Private_Address__Street__s',
    # 'Private_Address_2',
    # 'Private_Address__City__s',
    # 'Private_Address__StateCode__s',
    # 'Private_Address__PostalCode__s',
    # 'Private_Address__CountryCode__s',
    'Preferred_Email',
    'Preferred_Phone',
    'Social_Security_Account_Number__c',  # The data is encrypted
    'Serving_Now',
    'Ordination_Diocese',
    'Registered_Parish'

]
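
The three lists above mix source column names with at least one post-rename name (`Social_Security_Account_Number__c`), so whether every entry exists depends on where in the pipeline the drop happens. Below is a minimal sketch of a guard that reports listed names that are absent from a set of columns before dropping. `missing_columns` is a hypothetical helper, and the column names in the example are only a small illustrative subset.

```python
def missing_columns(available, *column_lists):
    """Return names from the given lists that are not in `available`, in first-seen order."""
    available = set(available)
    missing = []
    for columns in column_lists:
        for name in columns:
            if name not in available and name not in missing:
                missing.append(name)
    return missing


# Illustrative subset; in the notebook `available` would be the staging DataFrame's columns
available = ['Common_Name', 'Spouse', 'Password', 'Social_Security_Account_Number__c']
to_drop = ['Password', 'Sort_Name']
not_mapped = ['Common_Name', 'Spouse', 'Registered_Parish']
print(missing_columns(available, to_drop, not_mapped))
```

pandas `drop(columns=..., errors='ignore')` would silently skip such names, so an explicit report makes a typo or an ordering problem visible.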

In [358]:
# UDF to combine multiple Mailing Street Address lines into one
def combine_addresses(row, *columns):
    address_parts = []
    for col in columns:
        value = row[col]
        if pd.notnull(value):  # Check for non-null values
            address_parts.append(str(value))  # Convert to string
    return '\n'.join(address_parts)  # '\n' for line break

In [359]:
df_contact_staging = (df_contacts
                      .drop(columns='Salutation')
                      .rename(columns={
                          'Clergy_Status' : 'ADPDX_Clergy_Status__c',
                          'Religious_Status' : 'ADPDX_Religious_Status__c',
                          'Login_ID' : 'ADPDX_Login_ID__c',
                          'Access_Permission': 'ADPDX_Access_Permission__c',
                          'Title': 'Salutation',
                          'Christian_Name': 'FirstName',
                          'Middle_Name(s)': 'MiddleName',
                          'Surname': 'LastName',
                          'Suffix': 'Suffix',
                          'Preferred_Address': 'Preferred_Address__c',
                          'Mailing_Address_City': 'MailingCity',
                          'Mailing_Address_State': 'MailingState',
                          'Mailing_Address_Postal_Code': 'MailingPostalCode',
                          'Mailing_Address_Country': 'MailingCountry',
                          'Private_Address_City': 'OtherCity',
                          'Private_Address_State': 'OtherState',
                          'Private_Address_Postal_Code': 'OtherPostalCode',
                          'Private_Address_Country': 'OtherCountry',
                          'Work_Phone': 'npe01__WorkPhone__c',
                          'Home_Phone': 'HomePhone',
                          'Cell_Phone': 'MobilePhone',
                        #   'Preferred_Phone': 'npe01__PreferredPhone__c',
                          # IF Preferred phone contains 'do not publish'
                          'Work_Email' : 'npe01__WorkEmail__c',
                          'Archdiocesan_Email': 'npe01__AlternateEmail__c',
                          'Home_Email': 'npe01__HomeEmail__c',
                        #   'Preferred_Email': 'npe01__Preferred_Email__c',
                          # IF Preferred email contains 'do not publish'
                          'Directory_Include': 'Directory_Include__c',
                          'Directory_Include_Middle_Name': 'Directory_Include_Middle_Name__c',
                          'Directory_Include_Suffix': 'Directory_Include_Suffix__c',
                          'Suppress_From_Reports': 'Suppress_From_Reports__c',
                          'Send_Group_Mail_and_Email': 'Send_Group_Mail_and_Email__c',
                          'Birth_Date': 'Birthdate',
                          'Place_of_Birth': 'mbfc__Place_of_Birth__c',
                          'Foreign_Born': 'Foreign_Born__c',
                          'Foreign_Citizenship': 'Foreign_Citizenship__c',
                          'Immigration_Status': 'Immigration_Status__c',
                          'Passport/Visa_Expiration_Date': 'Passport_Visa_Expiration_Date__c',
                          'Social_Security_Account_Number': 'Social_Security_Account_Number__c',
                          'Deceased_Date': 'mbfc__Date_of_Death__c',
                          'Out_of_Diocese_Date': 'mbfc__Date_Left_Diocese__c', 
                          'CARA_Ethnicity': 'adpdx_CARA_Ethnicity__c',
                          'Seminarian_Status': 'adpdx_Seminarian_Status__c',
                          'Other_Diaconal_Ministry': 'adpdx_Other_Diaconal_Ministry__c',
                          'Spiritual_Director_Authorized': 'adpdx_Spiritual_Director_Authorized__c',
                          'Place_of_Work': 'adpdx_Place_of_Work__c',
                          'Volunteer_Place': 'adpdx_Volunteer_Place__c',
                          'Type_of_Work': 'adpdx_Type_of_Work__c',
                          'Work_Load': 'adpdx_Work_Load__c',
                          'Work_Title': 'adpdx_Work_Title__c',
                          'Coverage_Availability': 'adpdx_Coverage_Availability__c', 
                          'Advanced_Directive_Date': 'adpdx_Advanced_Directive_Date__c',
                          'End_of_Life_Plan_Date': 'adpdx_End_of_Life_Plan_Date__c',
                          'Will_Date': 'adpdx_Will_Date__c',
                          'Will_Note': 'adpdx_Will_Note__c',
                          'CIC_489_File': 'adpdx_CIC_489_File__c',
                          'Senior_Status_Date': 'adpdx_Senior_Status_Date__c', 
                          'Laicized_Date': 'adpdx_Laicized_Date__c',
                          'Seminarian_Student_Debt': 'adpdx_Seminarian_Student_Debt__c',
                          'Seminarian_Medical_Benefits': 'adpdx_Seminarian_Medical_Benefits__c',
                          'Candidacy_Date': 'adpdx_Candidacy_Date__c',
                          'Accepted_to_Formation_Date': 'adpdx_Accepted_to_Formation_Date__c',
                          'Formation_Withdrawn_Date': 'adpdx_Formation_Withdrawn_Date__c',
                          'Formation_Deferred_Date': 'adpdx_Formation_Deferred_Date__c',
                          'Formation_Terminated_Date': 'adpdx_Formation_Terminated_Date__c',
                          'Terminate_or_Defer_Note': 'adpdx_Terminate_or_Defer_Note__c',
                          'CARA_Highest_Ed_Level': 'adpdx_CARA_Highest_Ed_Level__c',
                          'Letter_of_Good_Standing_Date': 'adpdx_Letter_of_Good_Standing__c',
                          'Religious_In_Archdiocese_Date': 'mbfc__Date_of_Arrival_in_Diocese__c',
                          'Last_Retreat_Date': 'adpdx_Last_Retreat_Date__c',
                          'Last_Educ_Requirement_Date': 'adpdx_Last_Educ_Requirement_Date__c',
                          'Policy_Manual_Acknowledgement_Date': 'adpdx_Policy_Manual_Acknowledgement_Date__c',
                          'Harassment_Prevention_Course_Date': 'adpdx_Harassment_Prevention_Course_Date__c',
                          'Standards_of_Conduct_Date': 'adpdx_Standards_of_Conduct_Date__c',
                          'Last_Background_Check_Date': 'adpdx_Last_Background_Check_Date__c',
                          'Last_Child_Protection_Training_Date': 'adpdx_Last_Child_Protection_Training__c',
                          'Languages': 'Languages__c',
                          'Nickname': 'adpdx_Preferred_Name__c'

                          })
                      .assign(Bi_Ritual__c=lambda x: x['Type(s)'].str.contains('Biritual'))
                      .assign(Non_Latin_Rite__c=lambda x: x['Type(s)'].str.contains('Non-Latin Rite'))
                      .assign(adpdx_Discerner_Aspirant_for_Diaconate__c=lambda x: x['Type(s)'].str.contains('Diaconate'))
                      .assign(adpdx_Is_Seminarian__c=lambda x: x['Type(s)'].str.contains('Seminar'))
                      
                      .assign(Archdpdx_Migration_Id__c=lambda x: x.index)
                      .assign(MailingStreet=lambda x: x.apply(lambda row: combine_addresses(row, 'Mailing_Address', 'Mailing_Address_2'), axis=1))
                      .drop(columns=['Mailing_Address', 'Mailing_Address_2'])  # Optional: Drop original columns if not needed
                      .assign(OtherStreet=lambda x: x.apply(lambda row: combine_addresses(row, 'Private_Address', 'Private_Address_2'), axis=1))
                      .drop(columns=['Private_Address', 'Private_Address_2'])  # Optional: Drop original columns if not needed
                      .drop(columns=misc_columns_to_drop)
                      .drop(columns=affiliation_columns)
                      .drop(columns=fields_not_yet_mapped)

        )


In [360]:
df_contact_staging.columns

Index(['Type(s)', 'ADPDX_Clergy_Status__c', 'ADPDX_Religious_Status__c',
       'ADPDX_Login_ID__c', 'ADPDX_Access_Permission__c', 'Salutation',
       'FirstName', 'adpdx_Preferred_Name__c', 'MiddleName', 'LastName',
       'Suffix', 'MailingCity', 'MailingState', 'MailingPostalCode',
       'MailingCountry', 'OtherCity', 'OtherState', 'OtherPostalCode',
       'OtherCountry', 'Preferred_Address__c', 'npe01__WorkPhone__c',
       'HomePhone', 'MobilePhone', 'npe01__WorkEmail__c',
       'npe01__AlternateEmail__c', 'npe01__HomeEmail__c',
       'Directory_Include__c', 'Directory_Include_Middle_Name__c',
       'Directory_Include_Suffix__c', 'Suppress_From_Reports__c',
       'adpdx_Seminarian_Student_Debt__c',
       'adpdx_Seminarian_Medical_Benefits__c', 'Send_Group_Mail_and_Email__c',
       'Birthdate', 'mbfc__Place_of_Birth__c', 'Foreign_Born__c',
       'Foreign_Citizenship__c', 'Immigration_Status__c',
       'Passport_Visa_Expiration_Date__c',
       'adpdx_Accepted_to_Formatio

In [361]:
df_contact_staging.MailingStreet.sample(10)

Record Number
805     Immaculate Heart of Mary Parish\n2926 N Willia...
1028                                                     
1594        St. Mary’s Cathedral Parish\n1716 NW Davis St
1979            St. Catherine of Siena Parish\nPO Box 277
1863                   St. Clare Parish\n8535 SW 19th Ave
2548            St. Thomas More Parish\n3525 SW Patton Rd
2235               St. Joseph Parish\n721 Chemeketa St NE
2666                                                     
1746                   St. Edward Parish\n5303 River Rd N
679     Pastoral Center, Clergy Office\n2838 E Burnsid...
Name: MailingStreet, dtype: object

### Languages

In [362]:
# # Define a function to clean the 'languages' column

# import re
# def clean_languages(text):
#     if pd.isna(text):
#         return text
#     # Remove text inside parentheses
#     text = re.sub(r'\(.*?\)', '', text)
#     # Replace ' & ' or ' and ' with ';'
#     text = re.sub(r' & | and ', ';', text)
#     # Replace commas with semicolons
#     text = text.replace(',', ';')
#     # Remove spaces before and after semicolons
#     text = re.sub(r'\s*;\s*', ';', text)
#     return text.strip(';')

# # Apply the cleaning function to the 'languages' column
# df_contact_staging['Languages__c'] = df_contact_staging['Languages__c'].apply(clean_languages)


### Private Address Handling


In [363]:
# If 'OtherStreet' has a value, set Secondary Address Type to 'Private', because the 'Other' address
# fields all come from the 'Private' address fields in the source system.
# combine_addresses() returns '' (not null) when there are no address lines, so also check for a blank string.
df_contact_staging['npe01__Secondary_Address_Type__c'] = df_contact_staging['OtherStreet'].apply(
    lambda x: 'Private' if pd.notnull(x) and str(x).strip() else None
)
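
Because `combine_addresses` builds `OtherStreet` with `'\n'.join(...)`, a contact without private address lines ends up with an empty string, and `pd.notnull('')` is `True`. The following is a minimal sketch, on hypothetical values, of one way to treat blank strings and nulls alike when deciding whether a private address exists.

```python
import pandas as pd

# Hypothetical OtherStreet values: a real address, a blank string, and a null
other_street = pd.Series(['5802 SW Example Ave', '', None])

# pd.notnull alone treats the blank string as present
print(pd.notnull(other_street).tolist())

# Treat nulls and blank strings the same way
has_private_address = other_street.fillna('').str.strip().ne('')

# 'Private' where an address exists, NaN elsewhere
secondary_address_type = pd.Series('Private', index=other_street.index).where(has_private_address)
print(secondary_address_type.tolist())
```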


### Handle Boolean Fields


In [364]:
boolean_columns_to_convert = ['Foreign_Born__c', 'Directory_Include__c', 'Directory_Include_Middle_Name__c', 'Directory_Include_Suffix__c',
       'Suppress_From_Reports__c', 'Send_Group_Mail_and_Email__c', ]

df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].replace({'Yes': True, 'No': False})


In [365]:
df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].fillna(False)

df_contact_staging[boolean_columns_to_convert].sample(5)

Record Number,Foreign_Born__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,Send_Group_Mail_and_Email__c
2287,False,False,False,False,False,True
1559,False,False,False,False,False,False
1245,False,False,False,False,False,True
482,False,False,False,False,False,False
606,False,False,False,False,False,False
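
The two steps above (replace `Yes`/`No`, then fill the gaps with `False`) can also be done in one pass. This is only a sketch on hypothetical data; `yes_no` and `flags` are illustrative names, and it assumes that any value other than `Yes` or `No` should count as `False`.

```python
import pandas as pd

# Hypothetical Yes/No columns with gaps
flags = pd.DataFrame({
    'Directory_Include__c': ['Yes', 'No', None],
    'Foreign_Born__c': ['No', None, 'Yes'],
})

yes_no = {'Yes': True, 'No': False}

for col in flags.columns:
    # Anything that is not 'Yes' or 'No' (including missing values) becomes False
    flags[col] = flags[col].map(lambda value: yes_no.get(value, False))

print(flags.dtypes)
print(flags)
```

Doing the lookup per value keeps the column a plain boolean column from the start, so no later `fillna` is needed.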


### Set Contact Record Type


In [366]:
# Set Record Type

# For each row, check whether the 'Type(s)' value contains any key from the dictionary below.
# The first key found (in dictionary order) determines the value written to the
# 'ContactRecordType' column; rows with no matching key get None.

contact_type_map = {
    'Bishop': 'Priest',
    'Priest': 'Priest',
    'Transitional Deacon': 'Permanent_Deacon',
    'Permanent Deacon': 'Permanent_Deacon',
    'Seminarian': 'Lay_Person',
    'Diaconate Formation': 'Lay_Person',
    'Seminary Applicant': 'Lay_Person',
    'Diaconate Inquirer': 'Lay_Person',
    'Wife': 'Lay_Person',
    'Religious': 'Religious',
    'Staff': 'Lay_Person',
    'Archive': 'Lay_Person'
}

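def update_contact_record_type(row):
    types = row['Type(s)']
    if not isinstance(types, str):  # Skip rows with a missing 'Type(s)' value
        return None
    # Keys are checked in insertion order, so the first matching key wins
    for key, value in contact_type_map.items():
        if key in types:
            return value
    return None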

df_contact_staging['ContactRecordType'] = df_contact_staging.apply(update_contact_record_type, axis=1)
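
Since the lookup is a substring match over the dictionary in insertion order, a contact with several types takes the record type of whichever key appears first in the dictionary. The sketch below uses a shortened, hypothetical copy of the map and a standalone helper (`first_matching_record_type` is an illustrative name) to show that behavior.

```python
# Shortened, illustrative copy of the type map
sample_type_map = {
    'Priest': 'Priest',
    'Permanent Deacon': 'Permanent_Deacon',
    'Religious': 'Religious',
    'Staff': 'Lay_Person',
}


def first_matching_record_type(types, type_map):
    """Return the value of the first key (in insertion order) contained in `types`."""
    if not isinstance(types, str):
        return None
    for key, value in type_map.items():
        if key in types:
            return value
    return None


# 'Priest' precedes 'Religious' in the map, so a combined type resolves to 'Priest'
print(first_matching_record_type('Priest,Religious', sample_type_map))
print(first_matching_record_type('Religious', sample_type_map))
print(first_matching_record_type(float('nan'), sample_type_map))
```

If a different precedence is wanted, reordering the dictionary keys is enough.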

In [367]:
# Map in the RecordTypeIDs
df_contact_staging['RecordTypeID'] = df_contact_staging['ContactRecordType'].map(record_types_mapping)

### Ecclesial Status & Ministerial Status


In [368]:
df_contact_staging

Record Number,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,adpdx_Preferred_Name__c,MiddleName,LastName,Suffix,MailingCity,MailingState,MailingPostalCode,MailingCountry,OtherCity,OtherState,OtherPostalCode,OtherCountry,Preferred_Address__c,npe01__WorkPhone__c,HomePhone,MobilePhone,npe01__WorkEmail__c,npe01__AlternateEmail__c,npe01__HomeEmail__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,adpdx_Seminarian_Student_Debt__c,adpdx_Seminarian_Medical_Benefits__c,Send_Group_Mail_and_Email__c,Birthdate,mbfc__Place_of_Birth__c,Foreign_Born__c,Foreign_Citizenship__c,Immigration_Status__c,Passport_Visa_Expiration_Date__c,adpdx_Accepted_to_Formation_Date__c,adpdx_Candidacy_Date__c,adpdx_Formation_Withdrawn_Date__c,adpdx_Formation_Deferred_Date__c,adpdx_Formation_Terminated_Date__c,adpdx_Terminate_or_Defer_Note__c,adpdx_CARA_Highest_Ed_Level__c,Incardinated_Now,adpdx_Letter_of_Good_Standing__c,mbfc__Date_of_Arrival_in_Diocese__c,adpdx_Last_Retreat_Date__c,adpdx_Last_Educ_Requirement_Date__c,adpdx_Policy_Manual_Acknowledgement_Date__c,adpdx_Harassment_Prevention_Course_Date__c,adpdx_Standards_of_Conduct_Date__c,adpdx_Last_Background_Check_Date__c,adpdx_Last_Child_Protection_Training__c,mbfc__Date_Left_Diocese__c,adpdx_Senior_Status_Date__c,adpdx_Laicized_Date__c,mbfc__Date_of_Death__c,Languages__c,adpdx_Coverage_Availability__c,adpdx_Advanced_Directive_Date__c,adpdx_End_of_Life_Plan_Date__c,adpdx_Will_Date__c,adpdx_Will_Note__c,adpdx_CIC_489_File__c,adpdx_CARA_Ethnicity__c,adpdx_Seminarian_Status__c,adpdx_Other_Diaconal_Ministry__c,adpdx_Spiritual_Director_Authorized__c,Link_to_Religious_Community,adpdx_Place_of_Work__c,adpdx_Volunteer_Place__c,adpdx_Type_of_Work__c,adpdx_Work_Load__c,adpdx_Work_Title__c,Bi_Ritual__c,Non_Latin_Rite__c,adpdx_Discerner_Aspirant_for_Diaconate__c,adpdx_Is_Seminarian__c,Archdpdx_Migration_Id__c,MailingStreet,OtherStreet,npe01__Secondary_Address_Type__c,ContactRecordType,RecordTypeID
2766,Priest,Transferred Out,,sabukaka,,Rev.,Stephen,,Ozovehe,Abaukaka,,Tualatin,OR,97062,,Portland,OR,97202,,Mailing,503-430-7699,,773-733-3772,,,abstoz@yahoo.com,True,False,False,False,0,,True,1967-06-07,,False,,,,,,,,,,,"Diocese of Lokoja, Nigeria",,,,,,2022-05-30,2021-11-03,2021-11-04,2022-11-24,2023-01-16,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2766,Brighton Hospice Office\n8050 SW Warm Springs ...,5802 SW Milwaukie Ave Apt 4,Private,Priest,012Dx0000003p5JIAQ
2337,Staff,,,,,Mr.,Rogelio,,,Acevedo,,Portland,OR,97229,,,,,,,503-644-5264,,,facilities@stpius.org,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2337,St. Pius X Parish\n1280 NW Saltzman Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
3244,Staff,,,,,Mr.,Sean,,,Ackroyd,,Corvallis,OR,97330,,,,,,,541-757-1988,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,3244,St. Mary Parish\n501 NW 25th St,,Private,Lay_Person,012Dx0000003p5HIAQ
3295,Staff,,,,,Ms.,Sherril,,,Acton,,Eugene,OR,97401,,,,,,,541-686-2234 x1524,,,sacton@marisths.org,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,3295,Marist Catholic High School\n1900 Kingsley Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
2164,Staff,,,,,Ms.,Barbara,,,Adams,,Gresham,OR,97030,,,,,,,503-665-9129,,,adamsby@eou.edu,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2164,St. Henry Parish\n346 NW 1st St,,Private,Lay_Person,012Dx0000003p5HIAQ
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1670,Staff,,,,,Ms.,Jenny,,,Zomerdyk,,Central Point,OR,97502,,,,,,,541-664-1050,,,churchoffice@shepherdcatholic.com,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,1670,Shepherd of the Valley Parish\n600 Beebe Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
2755,Religious,,Active,dzorrilla,,Br.,Daniel,,,Zorrilla,,Saint Benedict,OR,97373,,,,,,,503-845-1181,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,2021-08-01,,,,,,2019-06-28,2021-10-10,,,,,,,,,,,,,,,,14,,,,,,False,False,False,False,2755,Félix Rougier House of Studies\nPO Box 499,,Private,Religious,012Dx0000003p5KIAQ
1962,Staff,,,,,Ms.,Kim,,,Zuber,,Sublimity,OR,97385,,,,,,,503-769-5664,,,boniface@wvi.com,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,1962,St. Boniface Parish\n375 SE Church St,,Private,Lay_Person,012Dx0000003p5HIAQ
2202,Staff,,,,,Ms.,Agnes,,,Zueger,,Lake Oswego,OR,97034,,,,,,,503-636-7687,,,agnesz@ollparish.com,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0,,,,,,False,False,False,False,2202,Our Lady of the Lake Parish\n650 A Ave,,Private,Lay_Person,012Dx0000003p5HIAQ


In [369]:
def determine_ecclesial_status(df):
    def ecclesial_status(row):
        if pd.notna(row['ADPDX_Clergy_Status__c']) and 'Laicized' in row['ADPDX_Clergy_Status__c']:
            return 'Laicized'
        # elif pd.notna(row['ADPDX_Clergy_Status__c']) and 'Faculties Withdrawn' in row['ADPDX_Clergy_Status__c']:
        #     return 'Faculties Withdrawn'
        elif pd.notna(row['Type(s)']) and 'Bishop' in row['Type(s)']:
            return 'Bishop/Archbishop'
        elif pd.notna(row['Type(s)']) and 'Priest,Religious' in row['Type(s)']:
            return 'Priest - Religious'
        elif pd.notna(row['Type(s)']) and 'Priest' in row['Type(s)'] and (not pd.isna(row['Foreign_Citizenship__c']) or row['Incardinated_Now'] != 'Archdiocese of Portland in Oregon'):
            return 'Priest - Temporary Sojourn (Foreign)'
        elif pd.notna(row['Type(s)']) and 'Priest' in row['Type(s)'] and (pd.isna(row['Foreign_Citizenship__c']) and row['Incardinated_Now'] == 'Archdiocese of Portland in Oregon'):
            return 'Priest - Diocesan'
        elif pd.notna(row['Type(s)']) and row['Type(s)'] == 'Permanent Deacon':
            return 'Permanent Deacon'
        else:
            return None

    df['mbfc__Ecclesial_Status__c'] = df.apply(ecclesial_status, axis=1)
    return df


df_contact_staging = determine_ecclesial_status(df_contact_staging)

In [370]:
# This function is no longer used due to ADPDX's custom enhancement in which a Flow automatically updates this status. 

def determine_ministerial_status(df):
    def ministerial_status(row):
        if row['ADPDX_Clergy_Status__c'] == 'Deceased':
            return 'Deceased'
        elif row['ADPDX_Clergy_Status__c'] == 'Active':
            return 'Active in Ministry'
        elif row['ADPDX_Clergy_Status__c'] == 'Inactive':
            return 'Inactive'
        elif row['ADPDX_Clergy_Status__c'] == 'Senior Status':
            return 'Senior Status'
        elif row['ADPDX_Clergy_Status__c'] == 'Faculties Withdrawn':
            return 'Faculties Withdrawn'
        elif row['ADPDX_Clergy_Status__c'] == 'Transferred Out':
            return 'Left Diocese'
        elif row['ADPDX_Clergy_Status__c'] == 'Unassigned':
            return 'Unassigned'
        elif row['ADPDX_Clergy_Status__c'] == 'Laicized':
            return 'Laicized'
        else:
            return 'Unknown'
        
    df['mbfc__Ministerial_Status__c'] = df.apply(ministerial_status, axis=1)
    return df

# df_contact_staging = determine_ministerial_status(df_contact_staging)

### Religious Congregation
In this section, for each Contact that has a value in the `Link to Religious Community` source field, we populate the `mbfc__Religious_Order__c` target field in Salesforce with the parent account of that Contact's Religious Community, i.e. the Religious Congregation.

NOTE: The source data does not differentiate between a child Religious Community and a parent Religious Order; there is only one record, for the Religious Community. In MF360 these are represented as separate Accounts, so we first (a) get the Religious Community record by prefixing the `Link to Religious Community` value with 'RelCommunities_' so that it matches the `Archdpdx_Migration_Id__c` in Salesforce.

Once we have that record, (b) we read its `ParentId` field, which holds the ID of the Religious Congregation record. That ID is the value we populate in the `mbfc__Religious_Order__c` field.
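
Steps (a) and (b) can be sketched on toy data as a prefix followed by a dictionary lookup. Everything below is hypothetical: the IDs are made up, `accounts` stands in for the Salesforce query result, and a link value of `'0'` is assumed to mean "no community".

```python
import pandas as pd

# Hypothetical stand-in for the Religious Community accounts in Salesforce
accounts = pd.DataFrame({
    'Id': ['001COMMUNITY_A', '001COMMUNITY_B'],
    'Archdpdx_Migration_Id__c': ['RelCommunities_1', 'RelCommunities_2'],
    'ParentId': ['001CONGREGATION_X', '001CONGREGATION_Y'],
})

# Hypothetical contacts; '99' has no matching community
contacts = pd.DataFrame({'Link_to_Religious_Community': ['1', '0', '2', '99']})

# (a) Build the migration key; rows with '0' get no key
link = contacts['Link_to_Religious_Community']
migration_key = ('RelCommunities_' + link).where(link != '0')

# (b) Look up the community's ParentId (the congregation) by migration key
parent_by_key = accounts.set_index('Archdpdx_Migration_Id__c')['ParentId'].to_dict()
contacts['mbfc__Religious_Order__c'] = migration_key.map(parent_by_key)

print(contacts)
```

Keys with no matching account simply come back as missing values, which makes them easy to report afterwards.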

In [371]:
# get SF Account
get_all_accounts = 'Select Id, Name, RecordTypeId, Type, mbfc__Parish_Code__c, Job_Id__c, Archdpdx_Migration_Id__c, ParentID from Account WHERE Archdpdx_Migration_Id__c != null'

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')

# create a dict in order to apply later
accounts_id_map = df_sf_accounts.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
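
The conversion from query result to lookup dictionary can be tried without a Salesforce connection. The sketch below uses a hypothetical `mock_result` shaped like the response used above (a `records` list whose entries carry an `attributes` key); `records_to_frame` is an illustrative helper name. If `sf` is a `simple_salesforce` client, `query_all` may be worth considering for larger result sets, since it follows pagination.

```python
import pandas as pd

# Hypothetical response shaped like a query result
mock_result = {
    'totalSize': 2,
    'done': True,
    'records': [
        {'attributes': {'type': 'Account'}, 'Id': '001A',
         'Archdpdx_Migration_Id__c': 'RelCommunities_1', 'ParentId': '001P'},
        {'attributes': {'type': 'Account'}, 'Id': '001B',
         'Archdpdx_Migration_Id__c': 'RelCommunities_2', 'ParentId': '001Q'},
    ],
}


def records_to_frame(result):
    """Build a DataFrame from the 'records' list, dropping the metadata column."""
    frame = pd.DataFrame(result['records'])
    # errors='ignore' keeps this safe when the query returns no rows
    return frame.drop(columns='attributes', errors='ignore')


df_accounts_sample = records_to_frame(mock_result)
id_map_sample = df_accounts_sample.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
print(id_map_sample)
```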

In [372]:
df_sf_accounts[df_sf_accounts['Archdpdx_Migration_Id__c'].str.contains('RelCommunities', na=False)]

,Id,Name,RecordTypeId,Type,mbfc__Parish_Code__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
328,001Dx00001HwE4dIAF,"Colombiere Jesuit Community, Portland (SJ)",012Dx0000003p52IAA,,,102,RelCommunities_1,001Dx00001HwE3TIAV
329,001Dx00001HwE4eIAF,"Abbey of Our Lady of Guadalupe, Carlton (OCSO)",012Dx0000003p52IAA,,,102,RelCommunities_2,001Dx00001HwE3UIAV
330,001Dx00001HwE4fIAF,"JCCU Jesuit Tertianship, Portland (SJ)",012Dx0000003p52IAA,,,102,RelCommunities_3,001Dx00001HwE3TIAV
331,001Dx00001HwE4gIAF,"Benedictine Monks of Mount Angel Abbey, Saint ...",012Dx0000003p52IAA,,,102,RelCommunities_4,001Dx00001HwE3VIAV
332,001Dx00001HwE4hIAF,Missionaries of the Holy Spirit Provincial Hou...,012Dx0000003p52IAA,,,102,RelCommunities_8,001Dx00001HwE3WIAV
...,...,...,...,...,...,...,...,...
392,001Dx00001HwE5fIAF,"Society of the Divine Word, Techny, IL (SVD)",012Dx0000003p52IAA,,,102,RelCommunities_77,001Dx00001HwE4QIAV
393,001Dx00001HwE5gIAF,"Society of the Divine Saviour, Rome, Italy (SDS)",012Dx0000003p52IAA,,,102,RelCommunities_78,001Dx00001HwE4RIAV
394,001Dx00001HwE5hIAF,"Society of Our Lady of the Most Holy Trinity, ...",012Dx0000003p52IAA,,,102,RelCommunities_79,001Dx00001HwE4SIAV
395,001Dx00001HwE5iIAF,"Community of St. Thomas More, Eugene (OP)",012Dx0000003p52IAA,,,102,RelCommunities_80,001Dx00001HwE3dIAF


In [373]:
# Prefix each 'Link_to_Religious_Community' value with 'RelCommunities_'; a value of '0' is mapped to None
def transform_religious_community_link(df):
    df['Link_to_Religious_Community'] = df['Link_to_Religious_Community'].apply(
        lambda x: None if x == '0' else f'RelCommunities_{x}'
    )
    return df

# This function searches for a record in the sf_accounts DataFrame where the ‘Archdpdx_Migration_Id__c’ column matches the given archdpdx_migration_id
def get_parent_id_from_salesforce(sf_accounts, archdpdx_migration_id):
    print(f"Searching for: {archdpdx_migration_id}")  # Debug print
    matching_record = sf_accounts[sf_accounts['Archdpdx_Migration_Id__c'] == archdpdx_migration_id]
    if not matching_record.empty:
        print(f"Found: {matching_record['ParentId'].values[0]}")  # Debug print
        return matching_record['ParentId'].values[0]
    print("Not found")  # Debug print
    return None

# uses the get_parent_id_from_salesforce function to find the ‘ParentId’ from the sf_accounts DataFrame
def update_religious_order(df, sf_accounts):
    df['mbfc__Religious_Order__c'] = df.apply(
        lambda row: get_parent_id_from_salesforce(sf_accounts, row['Link_to_Religious_Community']) 
        if row['Link_to_Religious_Community'] is not None else None, axis=1
    )
    return df


# run the transform_religious_community_link and update_religious_order functions
df_contact_staging = transform_religious_community_link(df_contact_staging)

df_contact_staging = update_religious_order(df_contact_staging, df_sf_accounts)

Searching for: RelCommunities_60
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_53
Found: 001Dx00001HwE45IAF
Searching for: RelCommunities_9
Found: 001Dx00001HwE3XIAV
Searching for: RelCommunities_4
Found: 001Dx00001HwE3VIAV
Searching for: RelCommunities_8
Found: 001Dx00001HwE3WIAV
Searching for: RelCommunities_35
Found: 001Dx00001HwE3nIAF
Searching for: RelCommunities_1
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_56
Found: 001Dx00001HwE48IAF
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_53
Found: 001Dx00001HwE45IAF
Searching for: RelCommunities_60
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_1
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_27
Found: 001Dx00001HwE3iIAF
Searching for: RelCommunities_44
Found: 001Dx00001HwE3wIAF
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_44
Found: 001Dx00001HwE3wIAF
Searching for: RelCommunities_60
Found: 001Dx00001
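
The debug output shows that some keys (for example `RelCommunities_23`) are not found. Rather than scanning the printed log, the unmatched keys can be summarized directly. This is a minimal sketch on hypothetical values; `known_keys` stands in for the `Archdpdx_Migration_Id__c` column of the accounts DataFrame and `requested` for the transformed link column.

```python
import pandas as pd

# Hypothetical migration IDs present in Salesforce
known_keys = pd.Series(['RelCommunities_1', 'RelCommunities_4'])

# Hypothetical keys requested by contacts (None = no community link)
requested = pd.Series(['RelCommunities_1', 'RelCommunities_23', None, 'RelCommunities_23'])

# Keep only real keys that have no matching account
unmatched = requested.dropna()
unmatched = unmatched[~unmatched.isin(known_keys)]

# Count how many contacts reference each missing community
print(unmatched.value_counts())
```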

In [374]:
df_contact_staging[df_contact_staging['mbfc__Religious_Order__c'].notna()]

Record Number,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,adpdx_Preferred_Name__c,MiddleName,LastName,Suffix,MailingCity,MailingState,MailingPostalCode,MailingCountry,OtherCity,OtherState,OtherPostalCode,OtherCountry,Preferred_Address__c,npe01__WorkPhone__c,HomePhone,MobilePhone,npe01__WorkEmail__c,npe01__AlternateEmail__c,npe01__HomeEmail__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,adpdx_Seminarian_Student_Debt__c,adpdx_Seminarian_Medical_Benefits__c,Send_Group_Mail_and_Email__c,Birthdate,mbfc__Place_of_Birth__c,Foreign_Born__c,Foreign_Citizenship__c,Immigration_Status__c,Passport_Visa_Expiration_Date__c,adpdx_Accepted_to_Formation_Date__c,adpdx_Candidacy_Date__c,adpdx_Formation_Withdrawn_Date__c,adpdx_Formation_Deferred_Date__c,adpdx_Formation_Terminated_Date__c,adpdx_Terminate_or_Defer_Note__c,adpdx_CARA_Highest_Ed_Level__c,Incardinated_Now,adpdx_Letter_of_Good_Standing__c,mbfc__Date_of_Arrival_in_Diocese__c,adpdx_Last_Retreat_Date__c,adpdx_Last_Educ_Requirement_Date__c,adpdx_Policy_Manual_Acknowledgement_Date__c,adpdx_Harassment_Prevention_Course_Date__c,adpdx_Standards_of_Conduct_Date__c,adpdx_Last_Background_Check_Date__c,adpdx_Last_Child_Protection_Training__c,mbfc__Date_Left_Diocese__c,adpdx_Senior_Status_Date__c,adpdx_Laicized_Date__c,mbfc__Date_of_Death__c,Languages__c,adpdx_Coverage_Availability__c,adpdx_Advanced_Directive_Date__c,adpdx_End_of_Life_Plan_Date__c,adpdx_Will_Date__c,adpdx_Will_Note__c,adpdx_CIC_489_File__c,adpdx_CARA_Ethnicity__c,adpdx_Seminarian_Status__c,adpdx_Other_Diaconal_Ministry__c,adpdx_Spiritual_Director_Authorized__c,Link_to_Religious_Community,adpdx_Place_of_Work__c,adpdx_Volunteer_Place__c,adpdx_Type_of_Work__c,adpdx_Work_Load__c,adpdx_Work_Title__c,Bi_Ritual__c,Non_Latin_Rite__c,adpdx_Discerner_Aspirant_for_Diaconate__c,adpdx_Is_Seminarian__c,Archdpdx_Migration_Id__c,MailingStreet,OtherStreet,npe01__Secondary_Address_Type__c,ContactRecordType,RecordTypeID,mbfc__Ecclesial_Status__c,mbfc__Religious_Order__c
671,"Priest,Religious",Transferred Out,Transferred Out,jadams,,Rev.,J.,J.K.,K.,Adams,III,,,,,,,,,,,503-975-4744,,jadams@jesuits.org,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,2010-06-30,,,,,,,,,,,,,,,RelCommunities_60,,,,,,False,False,False,False,671,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3TIAV
2430,Religious,,Active,,,Sr.,Delores,,,Adelman,,Beaverton,OR,97078,,Beaverton,OR,97078,,,503-644-9181,503-718-0411,,,,srdeloresa@ssmo.org,True,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,RelCommunities_53,,,,,,False,False,False,False,2430,Sisters of St. Mary of Oregon\n4440 SW 148th Ave,4595 SW 148th Ave,Private,Religious,012Dx0000003p5KIAQ,,001Dx00001HwE45IAF
1584,"Priest,Religious",Active,Active,makuti,,Rev.,Macdonald,,,Akuti,,Rockaway,OR,97136,,,,,,,503-355-2661,,424-410-0097,padreakuti@gmail.com,makuti@archdpdx.org,,True,False,False,False,0,,True,1977-08-18,"Vura Bilinyo, Uganda",True,Uganda,R1 (Religious Visa),2022-02-14,,,,,,,,"Apostles of Jesus, Kenya",2019-04-25,,,,2019-05-24,2022-04-21,2020-01-10,2022-04-28,2022-11-23,,,,,,,,,,,,,,,,RelCommunities_9,"St. Mary’s by the Sea Parish, Rockaway",,Parish Ministry,Full Time,Administrator,False,False,False,False,1584,St. Mary by the Sea Parish\nPO Box 390,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3XIAV
912,"Priest,Religious",Transferred Out,Transferred Out,,,Rt. Rev.,James,,,Albers,,,,,,,,,,,,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,,,,,RelCommunities_4,,,,,,False,False,False,False,912,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3VIAV
913,"Priest,Religious",Transferred Out,Transferred Out,,,Rev.,Jose,,,Alberto,,,,,,,,,,,,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,,,,,RelCommunities_8,,,,,,False,False,False,False,913,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3WIAV
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2884,"Priest,Religious",Transferred Out,Transferred Out,pyoun,,Rev.,Pius,,,Youn,,Eugene,OR,97403,,,,,,,541-343-7021,520-222-8844,907-313-9028,,,pius.youn@gmail.com,True,False,False,False,0,,True,1987-08-31,,False,,,,,,,,,,,,,2022-06-01,,,2022-05-22,2022-10-18,2022-10-10,2022-10-10,2023-01-26,2023-06-05,,,,"Korean, Spanish, Italian, Latin",,,,,,,,,,,RelCommunities_18,,,,,,False,False,False,False,2884,St. Thomas More Newman Center Parish\n1850 Eme...,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3dIAF
1434,"Priest,Religious",Deceased,Deceased,,,Rev.,Jerome,,,Young,,,,,,,,,,,,,,,,,False,False,False,False,0,,False,,,False,,,,,,,,,,,,,,,,,,,,,,,,2012-12-08,,,,,,,,,,,,RelCommunities_4,,,,,,False,False,False,False,1434,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3VIAV
1435,"Priest,Religious",Transferred Out,Transferred Out,,,Rev.,Robert,,,Young,,,,,,,,,,,,,,,,,False,False,False,False,0,,True,,,False,,,,,,,,,,,,,,,,,,,,,1900-01-01,,,,,,,,,,,,,,,RelCommunities_22,,,,,,False,False,False,False,1435,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3fIAF
787,"Priest,Religious",Senior Status,Retired,nzodrow,,Rt. Rev.,Nathan,,,Zodrow,,Saint Benedict,OR,97373,,,,,,,503-845-3030,503-236-4747,,,,nathan.zodrow@mtangel.edu,True,False,False,False,0,,False,1952-03-02,USA,False,,,,,,,,,,,Benedectines (OSB),,1974-09-08,,,,,,,,,2010-06-20,,,Spanish,,,,,,,,,,,RelCommunities_4,Mount Angel Abbey,,Curator of Art Collection / Archivist,Full Time,Curator Archivist,False,False,False,False,787,Mount Angel Abbey\n1 Abbey Dr,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,001Dx00001HwE3VIAV


### Registered Parish

In this section we populate the 'Home Parish' target field for Contacts that have a 'Registered Parish' in the source system.

TODO: Check whether the Registered Parish data is worth importing. Currently, 'Registered Parish' is populated on only 51 rows, and 32 of those rows are listed as 'Archive' in the 'Types' field. In other words, **only 19 of the 51 rows have a 'Registered Parish' value that might be meaningful.**
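
The counts in the TODO above could be re-checked with a few lines of pandas. This is a minimal sketch on toy data, not the notebook's actual check: the column names `Registered_Parish` and `Type(s)` and the sample values are assumptions and should be replaced with the real source columns.

```python
import pandas as pd

# Toy stand-in for the contact table; column names and values are assumptions.
df_example = pd.DataFrame({
    "Registered_Parish": ["St. Mary", None, "St. Rose", "Holy Cross", None],
    "Type(s)": ["Archive", "Priest", "Deacon", "Archive", "Archive"],
})

# Rows that have a Registered Parish at all
has_parish = df_example["Registered_Parish"].notna()

# Rows whose type list includes 'Archive'
is_archive = df_example["Type(s)"].str.contains("Archive", na=False)

populated_count = int(has_parish.sum())
archived_count = int((has_parish & is_archive).sum())
meaningful_count = populated_count - archived_count

print(f"populated={populated_count}, archived={archived_count}, meaningful={meaningful_count}")
```

Running the same three counts against the real staging table would confirm whether the 51 / 32 / 19 split still holds before deciding to import the field.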

### Diocese of Incardination

In [375]:
df_contact_staging['Incardinated_Now'].sample(10)

Record Number
2906                                    NaN
2952                                    NaN
836       Archdiocese of Portland in Oregon
2009    USA West Province, Society of Jesus
2398                                    NaN
201       Archdiocese of Portland in Oregon
369       Archdiocese of Portland in Oregon
1682                                    NaN
1370                                    NaN
2247             Discalced Carmelite Friars
Name: Incardinated_Now, dtype: object

In [376]:
# For each 'Diocese of Incardination' value, look up the matching Account (creating it if it does not exist), then populate the lookup column with its record Id.

def update_incardinated_accounts(sf, df, column_name, record_type_dev_name, church_type, new_column_name):
    """
    Update the DataFrame by getting or creating Salesforce accounts for the values in the specified column.

    Parameters:
    sf (Salesforce): Salesforce connection object
    df (pd.DataFrame): The DataFrame to update
    column_name (str): The name of the column to search for account names
    record_type_dev_name (str): The developer name of the Record Type to use for creating the account
    church_type (str): The Church Type to set for the new account
    new_column_name (str): The name of the new column to store the Salesforce account IDs

    Returns:
    pd.DataFrame: The updated DataFrame with the new column containing Salesforce account IDs
    """
    df[new_column_name] = None

    for index, row in df.iterrows():
        account_name = row[column_name]
        if pd.notna(account_name):
            account_id = get_or_create_account(sf, account_name, record_type_dev_name, church_type)
            df.at[index, new_column_name] = account_id
    
    return df

# Example usage
# sf = Salesforce(username='your_username', password='your_password', security_token='your_security_token')
df_contact_staging = update_incardinated_accounts(sf, df_contact_staging, 'Incardinated_Now', 'Church', 'Diocese', 'mbfc__Diocese_of_Incardination__c')

# This cell takes >3m to run

In [377]:
df_contact_staging[['mbfc__Diocese_of_Incardination__c', 'Incardinated_Now']].sample(20)

Unnamed: 0_level_0,mbfc__Diocese_of_Incardination__c,Incardinated_Now
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1
2994,,
1862,,
2841,,
2009,001Dx00001HwzmFIAR,"USA West Province, Society of Jesus"
2016,,
918,,
3054,001Dx00001HwzokIAB,"Congregation of Holy Cross, US Province"
2152,,
81,,
1732,,


In [378]:
# Drop the 'Incardinated Now' column 
del df_contact_staging['Incardinated_Now']


### Deceased & Date of Death
ADPDX has no 'Deceased' boolean; a person is recorded as deceased only by implication, when the Date of Death column is populated. The target application relies on a 'Deceased' boolean and, optionally, a 'Date of Death'.

In [379]:
# Create a new column 'npsp__Deceased__c' and set its value to True when there is a value in 'mbfc__Date_of_Death__c'
df_contact_staging['npsp__Deceased__c'] = df_contact_staging['mbfc__Date_of_Death__c'].notna()


### Final Dataframe Cleanup


In [380]:
# drop columns that are no longer needed
# del df_contact_staging['Type(s)']  # Commented out: we want to KEEP this field and migrate it to 'ADPDX Contact Type'
del df_contact_staging['ContactRecordType']
del df_contact_staging['Link_to_Religious_Community']

In [381]:
df_contact_staging = df_contact_staging.rename(columns={'Type(s)': 'ADPDX_Contact_Type__c'})

In [382]:
# convert '' to NaN
df_contact_staging.replace("", np.nan, inplace=True)

# convert NaN to None (cast to object first; otherwise numeric columns may coerce None back to NaN)
df_contact_staging = df_contact_staging.astype(object).where(df_contact_staging.notnull(), None)


In [383]:
df_contact_staging['Languages__c'].sample(20)

Record Number
2159    None
2576    None
3269    None
1008    None
2895    None
2284    None
159     None
3158    None
253     None
868     None
3276    None
933     None
274     None
2505    None
323     None
578     None
2002    None
1989    None
980     None
3322    None
Name: Languages__c, dtype: object

In [384]:
# df_contact_staging_2 = df_contact_staging.where(df_contact_staging.notnull(), None)

## Load


In [385]:
df_contact_staging['Archdpdx_Job_Id__c'] = curr_job_id

In [386]:
# generate CSV for manual loading
df_contact_staging.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/df_contacts_staging.csv', encoding='utf-8-sig')
df_contact_staging.to_csv('staging_files/contacts_staging.csv', encoding='utf-8-sig')


In [389]:
# upsert Contact records into SF using Bulk api

from simple_salesforce.exceptions import SalesforceMalformedRequest

bulk_data = []
for row in df_contact_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

try:
    # Attempt to upsert Contact records into SF using Bulk API
    contact_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=500, use_serial=False)
    contact_upsert_results = pd.DataFrame(contact_upsert)
except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")

In [391]:
# Write upsert results to a local file
import csv

keys = contact_upsert[0].keys()
with open('results_files/contact_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(contact_upsert)


# CONTACT > SPOUSES

#TODO: Contact Spouses migration


# CONTACT > PHOTOS

#TODO: Contact Photos


# CONTACT > REGISTER ENTRIES


In [None]:
import pandas as pd

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)

df

In [None]:
# Import all Contact fields that actually map to Register Entry records

import pandas as pd

# Define the structure of your column sets with correct attribute names
column_sets = [
    {'date': 'Baptism_Date', 'place': 'Place_of_Baptism', 'notation_type': 'Proof of Baptism'},
    {'date': 'Confirmation_Date', 'place': 'Place_of_Confirmation', 'notation_type': 'Notice of Confirmation'},
    {'date': 'Received_Date', 'place': 'Parish_of_Record', 'notation_type': 'Notice of Profession of Faith'},
    {'date': 'Marriage_Date', 'place': 'Place_of_Marriage', 'notation_type': 'Notice of Matrimony'},
    {'date': 'Diaconal_Ordination_Date', 'place': 'Diaconal_Ordination_Place', 'prelate': 'Diaconate_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Diaconate'},
    {'date': 'Presbyteral_Ordination_Date', 'place': 'Presbyteral_Ordination_Place', 'prelate': 'Presbyteral_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Presbyteral'},
    {'date': 'Episcopal_Ordination_Date', 'place': 'Episcopal_Ordination_Place', 'prelate': 'Episcopal_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Episcopal'}
]

# New DataFrame for entries
register_entries = pd.DataFrame(columns=['RecordNumber', 'mbfc__Register_Entry_Type__c', 'mbfc__Type__c', 'mbfc__Notation_Type__c', 'mbfc__Ordination_Type__c', 'Date', 'Place', 'Prelate'])
new_entries = []  # List to store entries before final concatenation

# Processing rows
for row in df.itertuples():
    for column_set in column_sets:
        date_value = getattr(row, column_set['date'], None)
        if pd.notna(date_value):  # Check if date field is not NaN
            entry = {
                'RecordNumber': getattr(row, 'Record_Number', None),
                'Date': date_value,
                'Place': getattr(row, column_set['place'], None)
            }
            # Add Prelate if applicable
            if 'prelate' in column_set:
                entry['Prelate'] = getattr(row, column_set['prelate'], None)

            # Set 'mbfc__Register_Entry_Type__c', and conditionally add 'mbfc__Type__c' or 'mbfc__Notation_Type__c'
            if 'sacrament_type' in column_set:
                entry['mbfc__Type__c'] = column_set['sacrament_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Sacrament'
            if 'notation_type' in column_set:
                entry['mbfc__Notation_Type__c'] = column_set['notation_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Notation'

            # Handle ordination type specific updates
            if 'ordination_type' in column_set:
                entry['mbfc__Ordination_Type__c'] = column_set['ordination_type']

            new_entries.append(entry)
    
    # Add entries for 'Reader Date'
    reader_date = getattr(row, 'Reader_Date', None)
    if pd.notna(reader_date):
        entry = {
            'RecordNumber': getattr(row, 'Record_Number', None),
            'Date': reader_date,
            'mbfc__Notation_Type__c': 'Notice of Holy Orders',
            'mbfc__Ordination_Type__c': 'Minor Order: Reader',
            'mbfc__Register_Entry_Type__c': 'Notation'
        }
        new_entries.append(entry)
    
    # Add entries for 'Acolyte Date'
    acolyte_date = getattr(row, 'Acolyte_Date', None)
    if pd.notna(acolyte_date):
        entry = {
            'RecordNumber': getattr(row, 'Record_Number', None),
            'Date': acolyte_date,
            'mbfc__Notation_Type__c': 'Notice of Holy Orders',
            'mbfc__Ordination_Type__c': 'Minor Order: Acolyte',
            'mbfc__Register_Entry_Type__c': 'Notation'
        }
        new_entries.append(entry)

# Concatenate all new entries to the DataFrame at once
if new_entries:
    register_entries = pd.concat([register_entries, pd.DataFrame(new_entries)], ignore_index=True)

print(f"Total records added: {len(register_entries)}")

# Optionally, save the new DataFrame to a CSV
register_entries.to_csv('Register_Entries.csv', index=False)

# Display the DataFrame
register_entries.sample(10)


### Populate Lookup for Prelate 

In [None]:
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Very', 'Rev.', 'Very Rev.', 'Sr.', 'Most Rev.')
CONSTANTS.suffix_acronyms.add('FRS', 'J.C.L.', 'J.C.L., D.D.', 'D.D.', 'OMI', 'OSA', 'OCD', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'OP', 'O.S.M', 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President', 'SJ', 'SL', 'IX', 'SSJ', 'J.C.L.', 'J.C.L', 'OFM', 'MSpS', 'Fco.' )


def parse_name(name):
    if pd.isna(name):  # Checks if the name is NaN or None
        return {
            'Salutation': '',
            'FirstName': '',
            'MiddleName': '',
            'LastName': '',
            'Suffix': ''
        }
    else:
        name = HumanName(name)
        return {
            'Salutation': name.title,
            'FirstName': name.first,
            'MiddleName': name.middle,
            'LastName': name.last,
            'Suffix': name.suffix
        }

# Build the DataFrame, then parse each 'Prelate' once and expand the result into name columns.
# parse_name() returns empty strings for missing values, so rows without a Prelate get ''.
name_columns = ['Salutation', 'FirstName', 'MiddleName', 'LastName', 'Suffix']
register_entries = pd.DataFrame(new_entries)
if 'Prelate' in register_entries.columns:
    parsed_names = pd.DataFrame(
        register_entries['Prelate'].apply(parse_name).tolist(),
        index=register_entries.index
    )
    for col in name_columns:
        register_entries[col] = parsed_names[col]


# Display the DataFrame
print(f"Total records added: {len(register_entries)}")
register_entries.sample(10)



In [None]:
# Query Salesforce for existing contacts and create a dictionary for mapping

from simple_salesforce import Salesforce

query = """
SELECT Id, Archdpdx_Migration_Id__c
FROM Contact
"""
result = sf.query_all(query)
contact_map = {rec['Archdpdx_Migration_Id__c']: rec['Id'] for rec in result['records']}


In [None]:
# Get RecordTypeId for Contact.Priest

priest_contact_recordtype_id = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Priest') & (df_sf_recordTypes['SobjectType'] == 'Contact'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


In [None]:
# Get RecordID for Prelates by querying for Contacts by FirstName and LastName and, if not found, Create New Contacts

from simple_salesforce import SFType, SalesforceError

contact = SFType('Contact', sf.session_id, sf.sf_instance)
for index, row in register_entries.iterrows():
    first_name, last_name = row.get('FirstName'), row.get('LastName')

    if pd.isna(first_name) or pd.isna(last_name) or first_name.strip() == '' or last_name.strip() == '':
        # If either first name or last name is missing or empty, skip this row or handle as needed
        print(f"Skipping row {index} due to missing name information.")
        continue

    try:
        # Search for contact by First and Last Name.
        # Escape backslashes and single quotes so names like O'Brien don't break the SOQL string.
        safe_first = first_name.replace("\\", "\\\\").replace("'", "\\'")
        safe_last = last_name.replace("\\", "\\\\").replace("'", "\\'")
        query = f"SELECT Id FROM Contact WHERE FirstName = '{safe_first}' AND LastName = '{safe_last}'"
        result = sf.query(query)
        if result['totalSize'] > 0:
            contact_id = result['records'][0]['Id']
        else:
            # Create a new contact if no match found
            new_contact = {
                'FirstName': first_name,
                'LastName': last_name,
                'Archdpdx_Job_Id__c': curr_job_id,
                'RecordTypeId': priest_contact_recordtype_id
            }
            create_result = contact.create(new_contact)
            contact_id = create_result['id']

        # Update DataFrame with the Salesforce Contact ID
        register_entries.at[index, 'mbfc__Celebrant__c'] = contact_id

    # SalesforceError is the base class for simple_salesforce API errors
    except SalesforceError as e:
        print(f"Error processing row {index}: {e}")



### Prepare to Upsert   

In [None]:
# Map Contact IDs to Register Entries

register_entries_2 = register_entries

register_entries_2['mbfc__Contact__c'] = register_entries['RecordNumber'].map(contact_map)
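
`Series.map` leaves NaN wherever a key is missing from the dictionary, and a type mismatch between the record numbers and the migration IDs (for example integers vs. strings) looks exactly the same as a genuinely missing Contact. The following is a minimal sketch of a sanity check, using made-up record numbers and IDs; `contact_map_example` is a hypothetical stand-in for `contact_map`.

```python
import pandas as pd

# Toy data: all record numbers and Salesforce IDs below are invented.
entries = pd.DataFrame({"RecordNumber": ["671", "2430", 1584]})
contact_map_example = {"671": "003AAA", "2430": "003BBB", "1584": "003CCC"}

# Mapping the raw values: the integer 1584 does not match the string key "1584"
raw_lookup = entries["RecordNumber"].map(contact_map_example)
raw_unmatched = int(raw_lookup.isna().sum())

# Normalizing keys to strings first makes the lookup type-safe
entries["mbfc__Contact__c"] = entries["RecordNumber"].astype(str).map(contact_map_example)
unmatched = entries[entries["mbfc__Contact__c"].isna()]

print(f"Unmatched before normalizing: {raw_unmatched}, after: {len(unmatched)}")
```

Any rows that remain unmatched after normalizing would be loaded without a Contact lookup, so it is worth inspecting them before the upsert.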


In [None]:
# Append Job_Id__c
register_entries_2['Archdpdx_Job_Id__c'] = curr_job_id

In [None]:
# Generate an External ID
def create_external_id(row):
    record_number = str(row['RecordNumber']).replace(' ', '').replace('-', '')
    entry_type = str(row['mbfc__Register_Entry_Type__c']).replace(' ', '').replace('-', '')

    # Check whether to use Type or Notation Type based on what's available
    if 'mbfc__Type__c' in row and not pd.isna(row['mbfc__Type__c']):
        type_field = str(row['mbfc__Type__c']).replace(' ', '').replace('-', '')
    elif 'mbfc__Notation_Type__c' in row and not pd.isna(row['mbfc__Notation_Type__c']):
        type_field = str(row['mbfc__Notation_Type__c']).replace(' ', '').replace('-', '') + str(row['mbfc__Ordination_Type__c']).replace(' ', '').replace('-', '')
    else:
        type_field = 'Unknown'

    return f"{record_number}_{entry_type}_{type_field}"

In [None]:
# Build the external ID for each row (register_entries_2 refers to the same DataFrame as register_entries)
register_entries_2['Archdpdx_Migration_Id__c'] = register_entries_2.apply(create_external_id, axis=1)

if register_entries_2['Archdpdx_Migration_Id__c'].duplicated().any():
    print("Warning: There are duplicate external IDs.")
    # Optionally, show the duplicates
    duplicates = register_entries_2[register_entries_2['Archdpdx_Migration_Id__c'].duplicated(keep=False)]
    print(duplicates)
else:
    print("All external IDs are unique.")


In [None]:
# Drop unnecessary columns:
register_entries_2.drop(['RecordNumber', 'Prelate', 'Salutation', 'FirstName', 'MiddleName', 'LastName', 'Suffix'], axis=1, inplace=True)

In [None]:
register_entries_staging = register_entries_2

In [None]:
# Remove all NaN values:
register_entries_staging.fillna('', inplace=True)

# Rename columns
register_entries_staging = register_entries_staging.rename(columns={
    'Place': 'mbfc__Location_text__c',
    'Date': 'mbfc__Event_Date__c'
})


In [None]:
# What is this checking for?... Why did I include this?
register_entries_staging[register_entries_staging.mbfc__Contact__c == '003Dx00000m0OtXIAU']


In [None]:
# generate CSV for manual loading
register_entries_staging.to_csv('staging_files/reg_entry_staging.csv', encoding='utf-8-sig')


In [None]:
# Upsert Register Entry Records

bulk_data = []
for row in register_entries_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

# Keep the batch size at 100 or below; larger batches have been failing with exceptionCode: 'InvalidBatch', exceptionMessage: 'Records not processed'
reg_entry_upsert = sf.bulk.mbfc__Register_Entry__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
reg_entry_upsert_results = pd.DataFrame(reg_entry_upsert)

In [None]:
# Write upsert results to a local file
import csv

keys = reg_entry_upsert[0].keys()

with open('results_files/register_entry_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(reg_entry_upsert)

# CONTACT > AFFILIATIONS


In [292]:
# Function to create a unique ID based on the Person Id + completion date (or start date, if blank) + affiliation type
def create_unique_id(row):
    # Get values, handling NaNs
    person_id = str(row.get('mbfc__Person__c', '')).strip()
    
    # Check for completion date, and if it's blank, use the start date
    completion_date = row.get('mbfc__Completion_Date__c', '')
    if pd.isna(completion_date) or completion_date == '':
        completion_date = row.get('mbfc__Start_Date__c', '')
    
    completion_date = str(completion_date).strip()
    affiliation = str(row.get('mbfc__Affiliation__c', '')).strip()
    
    # Concatenate the three fields
    combined = f"{person_id}{completion_date}{affiliation}"
    
    # Remove unwanted characters and convert to lowercase
    clean_id = ''.join(combined.split()).replace('-', '').replace('.', '').lower()
    
    # Limit the string to 50 characters
    return clean_id[:50]

## Education Affiliations

This section takes multiple sets of columns (all related to a person's education) from the Contacts table, and combines them into a single set of columns in a new dataframe for insertion into Salesforce as Affiliation records.
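
The reshaping described above is a wide-to-long transformation: each group of degree columns becomes its own row. A minimal sketch of the idea on toy data follows; the values are invented, only two of the degree column groups are shown, and the output column names (`Year`, `Degree`, `Institution`) are placeholders rather than the Salesforce field names used in the actual staging step.

```python
import pandas as pd

# Toy source table (values are invented for illustration)
people = pd.DataFrame({
    "Record_Number": ["101", "102"],
    "Bachelor_Degree_Year": [1990, None],
    "Bachelor_Degree_Type": ["BA", None],
    "Bachelor_Degree_Institution": ["Univ. A", None],
    "Graduate_1_Degree_Year": [1994, 2001],
    "Graduate_1_Degree_Type": ["M.Div.", "STL"],
    "Graduate_1_Degree_Institution": ["Seminary B", "Univ. C"],
})

prefixes = ["Bachelor", "Graduate_1"]

# Pull out each column group and give it the same generic column names
frames = []
for prefix in prefixes:
    subset = people[[
        "Record_Number",
        f"{prefix}_Degree_Year",
        f"{prefix}_Degree_Type",
        f"{prefix}_Degree_Institution",
    ]].copy()
    subset.columns = ["Record_Number", "Year", "Degree", "Institution"]
    frames.append(subset)

# Stack the groups, keep only rows that actually have a year, and tidy the dtype
degrees_long = (
    pd.concat(frames, ignore_index=True)
    .dropna(subset=["Year"])
    .assign(Year=lambda d: d["Year"].astype(int))
    .reset_index(drop=True)
)

print(degrees_long)
```

The cell below takes a row-by-row approach instead, because it also needs to look up or create a Salesforce Account for each institution along the way.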


In [None]:
# Parse and stage Education Affiliation records
import pandas as pd
from functools import lru_cache

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)


# Define the structure of your column sets with correct attribute names
degree_sets = [
    {'year': 'Bachelor_Degree_Year', 'type': 'Bachelor_Degree_Type', 'institution': 'Bachelor_Degree_Institution'},
    {'year': 'Graduate_1_Degree_Year', 'type': 'Graduate_1_Degree_Type', 'institution': 'Graduate_1_Degree_Institution'},
    {'year': 'Graduate_2_Degree_Year', 'type': 'Graduate_2_Degree_Type', 'institution': 'Graduate_2_Degree_Institution'},
    {'year': 'Graduate_3_Degree_Year', 'type': 'Graduate_3_Degree_Type', 'institution': 'Graduate_3_Degree_Institution'},
    {'year': 'Graduate_4_Degree_Year', 'type': 'Graduate_4_Degree_Type', 'institution': 'Graduate_4_Degree_Institution'}
]

# Query for the Record Type ID for 'Organization'
record_type_result = sf.query("SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = 'Organization' AND NamespacePrefix = 'mbfc'")
organization_record_type_id = record_type_result['records'][0]['Id'] if record_type_result['records'] else None

# Initialize the DataFrame for the staging table
education_staging = pd.DataFrame()

# Function to check and create institution account
@lru_cache(maxsize=None)
def get_or_create_institution_account(institution_name):
    if pd.isna(institution_name):
        return None  # Return None or handle as appropriate if institution name is NaN

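    # Query Salesforce to find the institution.
    # Escape backslashes and single quotes so names like "St. Mary's" don't break the SOQL string.
    safe_name = str(institution_name).replace("\\", "\\\\").replace("'", "\\'")
    query = f"SELECT Id, Name FROM Account WHERE Name = '{safe_name}' LIMIT 1"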
    results = sf.query(query)
    
    # If exists, return the ID
    if results['records']:
        return results['records'][0]['Id']
    else:
        # Ensure no NaN values are sent to Salesforce
        account_data = {
            'Name': institution_name if pd.notna(institution_name) else "Default Name",  # Provide a default if NaN
            'RecordTypeId': organization_record_type_id,
            'mbfc__Organization_Type__c': 'School'
        }
        # Remove keys with None values to avoid JSON serialization issues
        account_data = {k: v for k, v in account_data.items() if v is not None}
        
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Get Contact record ID from Salesforce
@lru_cache(maxsize=None)
def get_contact_id_by_record_number(record_number):
    if pd.isna(record_number):
        return None
    query = f"SELECT Id FROM Contact WHERE Archdpdx_Migration_Id__c = '{record_number}'"
    results = sf.query(query)
    if results['records']:
        return results['records'][0]['Id']
    return None


# Initialize an empty list to collect DataFrames or dictionaries
new_entries = []

# Process each row and each degree set
for index, row in df.iterrows():
    for degree_set in degree_sets:
        year = row[degree_set['year']]
        if pd.notna(year):  # Only proceed if the year column is not NaN
            formatted_year = f"{int(year)}-01-01"  # Convert year to YYYY-MM-DD format
            institution_name = row[degree_set['institution']]
            account_id = get_or_create_institution_account(institution_name)
            contact_id = get_contact_id_by_record_number(row['Record_Number'])
            
            # Create a record for the staging table
            affiliation_record = {
                'mbfc__Person__c': contact_id,
                'mbfc__Completion_Date__c': formatted_year,
                'mbfc__Context__c': account_id,
                'mbfc__Category__c': 'Education/Studies',
                'mbfc__Affiliation__c': row[degree_set['type']]
            }
            new_entries.append(affiliation_record)

# Convert all collected records to a DataFrame in one go
education_staging = pd.DataFrame(new_entries)


#FIXME: There are 4 rows where no INSTITUTION is listed. This makes it impossible to import an Affiliation record. Need to figure out how to handle this with Client. 
#FIXME: There are about 15 rows where no DEGREE is listed. This makes it impossible to import an Affiliation record. Need to figure out how to handle this with Client. 

In [None]:
# Apply the function to each row and create a new column with the unique ID
education_staging['Archdpdx_Migration_Id__c'] = education_staging.apply(create_unique_id, axis=1)

# Check the first few rows to verify the new column
education_staging.head()
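
Because `create_unique_id` concatenates three fields and truncates the result to 50 characters, two different rows can end up with the same ID (for example, the same person with two degrees of the same type in the same year). Duplicate external IDs can cause upsert problems, so a check along these lines may be worth running before the load. This is a sketch on invented IDs, not output from the real staging table.

```python
import pandas as pd

# Made-up staging rows; the first and third share the same external ID
staging_example = pd.DataFrame({
    "Archdpdx_Migration_Id__c": ["003aaa19900101ba", "003aaa19940101mdiv", "003aaa19900101ba"],
    "mbfc__Affiliation__c": ["BA", "M.Div.", "BA"],
})

# keep=False marks every member of a duplicate group, not just the later ones
dupe_mask = staging_example["Archdpdx_Migration_Id__c"].duplicated(keep=False)
duplicate_rows = staging_example[dupe_mask]

if duplicate_rows.empty:
    print("All external IDs are unique.")
else:
    print(f"Warning: {len(duplicate_rows)} rows share an external ID.")
    print(duplicate_rows)
```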

In [None]:
# Fill any NaN values
education_staging = education_staging.fillna('')

In [None]:
# Save the staging table to CSV
education_staging.to_csv('staging_files/education_staging.csv', index=False)


In [528]:
import pandas as pd
import numpy as np
from simple_salesforce import Salesforce, SalesforceMalformedRequest, SalesforceError
from datetime import datetime, date



# def upsert_to_salesforce(sf, dataframe, object_name, external_id_field):
#     """
#     Upsert records to Salesforce from a pandas DataFrame.

#     Parameters:
#     sf (Salesforce): The Salesforce connection instance.
#     dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
#     object_name (str): The Salesforce object name (e.g., 'Contact').
#     external_id_field (str): The external ID field used for upserts.
#     """
#     successful_upserts = 0
#     failed_upserts = 0

#     # Replace placeholder values with None in the DataFrame
#     dataframe.replace({None: pd.NA, ' ': None, '': None}, inplace=True)

#     # Convert DataFrame to a list of dictionaries
#     data_to_upsert = dataframe.to_dict(orient='records')

#     for data in data_to_upsert:
#         try:
#             data = convert_non_serializables(data)
#             external_id = data.pop(external_id_field)

#             # Perform upsert using only the External ID
#             response = getattr(sf, object_name).upsert(f'{external_id_field}/{external_id}', data)
#             successful_upserts += 1
#             print(f"Successfully upserted {object_name} with External ID: {external_id}")
#         except SalesforceMalformedRequest as e:
#             failed_upserts += 1
#             print(f"Malformed request error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
#         except SalesforceError as e:
#             failed_upserts += 1
#             print(f"Salesforce error when upserting {object_name} with External ID: {external_id}. Error: {e.content}")
#         except Exception as e:
#             failed_upserts += 1
#             print(f"Failed to upsert {object_name} with External ID: {external_id}. Error: {e}")

#     print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")

def convert_non_serializables(data):
    """Convert non-serializable objects to serializable formats."""
    for key, value in data.items():
        try:
            if isinstance(value, (datetime, date)):
                data[key] = value.isoformat()
            elif isinstance(value, float) and np.isnan(value):
                data[key] = None
            elif pd.isna(value):
                data[key] = None
            elif isinstance(value, (int, bool, str)):
                data[key] = value
            else:
                data[key] = str(value)  # Convert other types to string
        except Exception as e:
            print(f"Error processing key: {key}, value: {value}, error: {e}")
    return data

def upsert_to_salesforce_bulk(sf, dataframe, object_name, external_id_field, failed_log_file, batch_size=10000):
    """
    Upsert records to Salesforce from a pandas DataFrame using the Bulk API.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
    object_name (str): The Salesforce object name (e.g., 'Contact').
    external_id_field (str): The external ID field used for upserts.
    failed_log_file (str): The file name where failed upsert records will be logged.
    batch_size (int): The number of records to include in each batch.
    """
    successful_upserts = 0
    failed_upserts = 0

    # Replace placeholder values with None in the DataFrame
    dataframe.replace({None: pd.NA, ' ': None, '': None}, inplace=True)

    # Convert DataFrame to a list of dictionaries
    data_to_upsert = dataframe.to_dict(orient='records')

    with open(failed_log_file, 'a') as log_file:
        # Process data in batches
        for i in range(0, len(data_to_upsert), batch_size):
            batch_data = data_to_upsert[i:i + batch_size]
            batch_data = [convert_non_serializables(record) for record in batch_data]

            try:
                # Perform bulk upsert
                response = getattr(sf.bulk, object_name).upsert(batch_data, external_id_field=external_id_field)

                for res in response:
                    if res['success']:
                        successful_upserts += 1
                    else:
                        failed_upserts += 1
                        log_file.write(f"Failed to upsert record: {res}\n")

            except SalesforceMalformedRequest as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Malformed request error when upserting batch. Error: {e.content}\n")
            except SalesforceError as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Salesforce error when upserting batch. Error: {e.content}\n")
            except Exception as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Failed to upsert batch. Error: {e}\n")

    print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")


In [None]:
# Upsert Education Affiliation records

# upsert_to_salesforce(sf, education_staging, 'mbfc__Affiliation__c', 'Archdpdx_Migration_Id__c')
upsert_to_salesforce_bulk(sf, education_staging, 'mbfc__Affiliation__c', 'Archdpdx_Migration_Id__c', 'results_files/education_affil', batch_size=1000)


In [None]:

#FIXME: A number of Education Affiliation records are missing either an Affiliation title or a Context

In [None]:
# Upsert Education Affiliation records [DEP]

# bulk_data = []
# for row in education_staging.itertuples(index=False):
#     d = row._asdict()
#     # del d['Index']
#     bulk_data.append(d)

# try:
#     # Attempt to upsert Education Affiliation records into SF using Bulk API
    # education_affil_upsert = sf.bulk.mbfc__Affiliation__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=500, use_serial=False)
    

# except SalesforceMalformedRequest as e:
#     # If a SalesforceMalformedRequest error occurs, print the error message and response content
#     print(f"SalesforceMalformedRequest error: {e}")
#     print(f"Response content: {e.content}")

# Send results to CSV
# education_affil_upsert_results = pd.DataFrame(education_affil_upsert)
# education_affil_upsert_results.to_csv('results_files/education_affil_upsert_results')

## Ecclesial Affiliations

This section handles individual FIELDS on the source Contact table that map to Affiliation RECORDS in the target system.

Because the source and target data models differ substantially, this section groups source columns into what will become individual records in the new system, and populates any values that the target system requires but the source lacks.

Example: each affiliation record in the target system requires a Context. In some cases this value does not exist in the source, or it is found in another column:

| Affiliation            | Context                   | Completion Date           |
| ---------------------- | ------------------------- | ------------------------- |
| First Vows             | Religious Order           | Date of First Vows        |
| Final Vows             | Religious Order           | Date of Final Vows        |
| Incardination          | Incardinated from Diocese | Incardinated From Date    |
| Faculties (Type)       | Local Diocese             | Faculties Granted Date    |
| Faculties (Restricted) | Local Diocese             | Faculties Restricted Date |
| Faculties (Withdrawn)  | Local Diocese             | Faculties Withdrawn Date  |
| Excardinated           | Excardinated To Diocese   | Excardinated To Date      |

Other examples of columns that need to be populated:
- RecordTypeId
- Category
- Start Date
- Completion Date

Depending on the source column, the date is staged as either a Start Date or a Completion Date in the target system.
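
The staging rule described here can be sketched on plain dictionaries. This is a minimal illustration, not the notebook's implementation: the `DATE_ROLE` mapping, the `stage_dates` helper, and the sample rows are all hypothetical, and the role assigned to each date column is an assumption.

```python
# Hypothetical mapping: source date column -> (affiliation label, target date field).
DATE_ROLE = {
    "Incardinated_From_Date": ("Incardinated", "mbfc__Completion_Date__c"),
    "Faculties_Granted_Date": ("Faculties", "mbfc__Start_Date__c"),
    "Faculties_Withdrawn_Date": ("Faculties (Withdrawn)", "mbfc__Completion_Date__c"),
}


def stage_dates(source_rows):
    """Turn each populated date column into its own staging record."""
    records = []
    for row in source_rows:
        for column, (affiliation, target_field) in DATE_ROLE.items():
            value = row.get(column)
            if not value:
                continue  # no date in the source means no affiliation record
            record = {
                "Record_Number": row["Record_Number"],
                "mbfc__Affiliation__c": affiliation,
                "mbfc__Start_Date__c": None,
                "mbfc__Completion_Date__c": None,
            }
            record[target_field] = value  # fill exactly one of the two date fields
            records.append(record)
    return records


# Made-up sample rows, for illustration only.
sample = [
    {"Record_Number": "1", "Incardinated_From_Date": "1978-01-05",
     "Faculties_Granted_Date": "2016-01-13", "Faculties_Withdrawn_Date": None},
    {"Record_Number": "2", "Incardinated_From_Date": None,
     "Faculties_Granted_Date": "2021-08-15", "Faculties_Withdrawn_Date": None},
]
staged = stage_dates(sample)
for record in staged:
    print(record)
```

Keeping the column-to-field mapping in one dictionary makes it easy to review which dates land in Start Date and which in Completion Date.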

In [266]:
# Generate a staging DF of Ecclesial Affiliations out of a handful of fields in the source data, each of which is to be converted into a new row in the staging DF.

# FIXME: There are a number of rows where a Faculties Granted is missing a date, and conversely, where there is a Faculties Granted Date but no description of the Faculties granted. This is a problem, because the application requires a date for when Faculties were granted.


import pandas as pd
from functools import lru_cache
from simple_salesforce import Salesforce

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Replace spaces in column names with underscores
               .drop(index=0) # Drops the extra row that replicates the labels
)

# Define the column sets: 'year' names the source date column; 'context' and 'affiliation' name optional companion columns
column_sets = [
    {'year': 'Incardinated_From_Date', 'context': 'Incardinated_From_Diocese'},
    {'year': 'Excardinated_To_Date', 'context': 'Excardinated_To_Diocese'},
    {'year': 'Faculties_Granted_Date', 'affiliation': 'Faculties'},
    {'year': 'Faculties_Restricted_Date'},
    {'year': 'Faculties_Withdrawn_Date'},
]


In [272]:

# Query for the Record Type IDs of Church, Religious    
record_type_query = "SELECT Id, DeveloperName FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName IN ('Church', 'Religious')"
record_type_result = sf.query(record_type_query)
record_type_ids = {record['DeveloperName']: record['Id'] for record in record_type_result['records']}

church_record_type_id = record_type_ids.get('Church')
religious_record_type_id = record_type_ids.get('Religious')

# Query for the Record Type IDs for 'Ecclesial_Affiliation' and 'Ministerial_Status' for mbfc__Affiliation__c object
record_type_query = "SELECT Id, DeveloperName FROM RecordType WHERE SobjectType = 'mbfc__Affiliation__c' AND DeveloperName IN ('Ecclesial_Affiliation', 'Ministerial_Status')"
record_type_result = sf.query(record_type_query)
record_type_ids = {record['DeveloperName']: record['Id'] for record in record_type_result['records']}

ecclesial_affiliation_record_type_id = record_type_ids.get('Ecclesial_Affiliation')
ministerial_status_record_type_id = record_type_ids.get('Ministerial_Status')

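# Check if any of the required Record Types are missing
if not church_record_type_id:
    raise ValueError("No RecordType found for Church on Account object.")
if not religious_record_type_id:
    raise ValueError("No RecordType found for Religious on Account object.")
if not ecclesial_affiliation_record_type_id:
    raise ValueError("No RecordType found for Ecclesial Affiliation on mbfc__Affiliation__c object.")
if not ministerial_status_record_type_id:
    raise ValueError("No RecordType found for Ministerial Status on mbfc__Affiliation__c object.")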

In [301]:

# Initialize the staging DataFrame (it is rebuilt from new_entries at the end of this cell)
ecclesial_affiliations_staging = pd.DataFrame()

# Function to check and create institution account
@lru_cache(maxsize=None)
def get_or_create_church_account(context):
    if pd.isna(context):
        return None  # Return None or handle as appropriate if institution name is NaN

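    # Query Salesforce to find the institution.
    # Escape backslashes and single quotes so names containing an apostrophe do not break the SOQL string literal.
    safe_name = str(context).replace('\\', '\\\\').replace("'", "\\'")
    query = f"SELECT Id, Name FROM Account WHERE Name = '{safe_name}' LIMIT 1"
    results = sf.query(query)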
    
    # If exists, return the ID
    if results['records']:
        return results['records'][0]['Id']
    else:
        # Ensure no NaN values are sent to Salesforce
        if 'Diocese' in context or 'Archdiocese' in context:
            account_data = {
                'Name': context if pd.notna(context) else "Church Name Missing",  # Provide a default if NaN
                'RecordTypeId': church_record_type_id,
                'mbfc__Church_Type__c': 'Diocese'
            }
        else:
            account_data = {
                'Name': context if pd.notna(context) else "Religious Name Missing",  # Provide a default if NaN
                'RecordTypeId': religious_record_type_id
            }

        # Remove keys with None values to avoid JSON serialization issues
        account_data = {k: v for k, v in account_data.items() if v is not None}
        
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Get Contact record ID from Salesforce
@lru_cache(maxsize=None)
def get_contact_id_by_record_number(record_number):
    if pd.isna(record_number):
        return None
    query = f"SELECT Id FROM Contact WHERE Archdpdx_Migration_Id__c = '{record_number}'"
    results = sf.query(query)
    if results['records']:
        return results['records'][0]['Id']
    return None

# Initialize an empty list to collect DataFrames or dictionaries
new_entries = []

# Process each source row against each column set
for index, row in df.iterrows():
    for col_set in column_sets:
        date = row[col_set['year']]
        if pd.notna(date):  # Only proceed if the date column is populated
            context = row.get(col_set.get('context'), None)
            account_id = get_or_create_church_account(context)
            contact_id = get_contact_id_by_record_number(row['Record_Number'])
            
            # Initialize all necessary variables with None
            start_date = None
            completion_date = None
            affiliation = None
            record_type_id = None
            category = None

            # Determine the mbfc__Affiliation__c value
            if 'Incardinated_From_Date' in col_set['year']:
                affiliation = 'Incardinated'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            elif 'Excardinated_To_Date' in col_set['year']:
                affiliation = 'Excardinated'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            elif 'Faculties_Granted_Date' in col_set['year']:
                faculties_value = row.get(col_set.get('affiliation', ''))
                if pd.isna(faculties_value):
                    affiliation = 'Faculties'
                else:
                    affiliation = f"Faculties ({faculties_value})"
                account_id = diocesan_account_id  # Override account ID for faculties
                start_date = date
                record_type_id = ministerial_status_record_type_id
                category = 'Faculties'
            elif 'Faculties_Restricted_Date' in col_set['year']:
                affiliation = 'Faculties (Restricted)'
                account_id = diocesan_account_id  # Override account ID for faculties
                completion_date = date
                record_type_id = ministerial_status_record_type_id
                category = 'Faculties'
            elif 'Faculties_Withdrawn_Date' in col_set['year']:
                affiliation = 'Faculties (Withdrawn)'
                account_id = diocesan_account_id  # Override account ID for faculties
                completion_date = date
                record_type_id = ministerial_status_record_type_id
                category = 'Faculties'
            elif 'Date_of_First_Vows' in col_set['year']:
                affiliation = 'First Vows'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            elif 'Date_of_Final_Vows' in col_set['year']:
                affiliation = 'Final Vows'
                completion_date = date
                record_type_id = ecclesial_affiliation_record_type_id
                category = 'Ecclesial Affiliations'
            else:
                affiliation = row.get(col_set.get('affiliation', ''), None)
            
            # Create a record for the staging table
            affiliation_record = {
                'RecordTypeId': record_type_id,
                'mbfc__Person__c': contact_id,
                'mbfc__Completion_Date__c': completion_date,
                'mbfc__Start_Date__c': start_date,
                'mbfc__Context__c': account_id,
                'mbfc__Category__c': category,
                'mbfc__Affiliation__c': affiliation
            }
            new_entries.append(affiliation_record)

# Convert all collected records to a DataFrame in one go
ecclesial_affiliations_staging = pd.DataFrame(new_entries)
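
The loop in this cell issues one SOQL query per distinct record number and per distinct context name. If that becomes slow, the lookups could be batched with `IN` clauses. The sketch below covers only the query-building half, under stated assumptions: `escape_soql` and `build_in_queries` are hypothetical helpers (not part of simple_salesforce), the chunk size is arbitrary, and the sample values are made up.

```python
def escape_soql(value):
    """Escape backslashes and single quotes for use inside a SOQL string literal."""
    return str(value).replace("\\", "\\\\").replace("'", "\\'")


def build_in_queries(object_name, match_field, values, chunk_size=200):
    """Build one SELECT per chunk of distinct, non-null lookup values."""
    unique_values = sorted({str(v) for v in values if v is not None})
    queries = []
    for start in range(0, len(unique_values), chunk_size):
        chunk = unique_values[start:start + chunk_size]
        quoted = ", ".join(f"'{escape_soql(v)}'" for v in chunk)
        queries.append(
            f"SELECT Id, {match_field} FROM {object_name} "
            f"WHERE {match_field} IN ({quoted})"
        )
    return queries


# Made-up record numbers, chunked two at a time for illustration.
queries = build_in_queries(
    "Contact", "Archdpdx_Migration_Id__c", ["1", "2", "2", None, "3"], chunk_size=2
)
for query in queries:
    print(query)
```

Each query's results could then be folded into a dictionary keyed on the match field, replacing the per-row calls with dictionary lookups.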


In [302]:
ecclesial_affiliations_staging.sample(20)

|     | RecordTypeId       | mbfc__Person__c    | mbfc__Completion_Date__c | mbfc__Start_Date__c | mbfc__Context__c   | mbfc__Category__c      | mbfc__Affiliation__c     |
| --- | ------------------ | ------------------ | ------------------------ | ------------------- | ------------------ | ---------------------- | ------------------------ |
| 220 | 012Dx0000003p5DIAQ | 003Dx00000nKj6bIAC |                          | 2016-01-13          | 001Dx00001HwDsgIAF | Faculties              | Faculties (General)      |
| 344 | 012Dx0000003p5DIAQ | 003Dx00000nKjThIAK |                          | 2005-06-11          | 001Dx00001HwDsgIAF | Faculties              | Faculties (General)      |
| 254 | 012Dx0000003p5DIAQ | 003Dx00000nKjPkIAK |                          | 2021-08-15          | 001Dx00001HwDsgIAF | Faculties              | Faculties (General)      |
| 455 | 012Dx0000003p5DIAQ | 003Dx00000nKj46IAC |                          | 2001-06-09          | 001Dx00001HwDsgIAF | Faculties              | Faculties (General)      |
| 89  | 012Dx0000003p5AIAQ | 003Dx00000nKjScIAK | 2013-06-23               |                     |                    | Ecclesial Affiliations | Incardinated             |
| 395 | 012Dx0000003p5DIAQ | 003Dx00000nKjKBIA0 |                          | 2021-09-11          | 001Dx00001HwDsgIAF | Faculties              | Faculties (Diaconal)     |
| 188 | 012Dx0000003p5DIAQ | 003Dx00000nKiuFIAS |                          | 2023-06-12          | 001Dx00001HwDsgIAF | Faculties              | Faculties (Confessional) |
| 114 | 012Dx0000003p5DIAQ | 003Dx00000nKipHIAS |                          | 2005-11-05          | 001Dx00001HwDsgIAF | Faculties              | Faculties (Diaconal)     |
| 72  | 012Dx0000003p5DIAQ | 003Dx00000nKjCVIA0 |                          | 2021-06-15          | 001Dx00001HwDsgIAF | Faculties              | Faculties (Confessional) |
| 77  | 012Dx0000003p5AIAQ | 003Dx00000nKjS0IAK | 1978-01-05               |                     | 001Dx00001HwFFlIAN | Ecclesial Affiliations | Incardinated             |


In [303]:
# Apply the function to each row and create a new column with the unique ID
ecclesial_affiliations_staging['Archdpdx_Migration_Id__c'] = ecclesial_affiliations_staging.apply(create_unique_id, axis=1)

# Check for duplicates
ecclesial_affiliations_staging['Archdpdx_Migration_Id__c'].duplicated().value_counts()

False    529
Name: Archdpdx_Migration_Id__c, dtype: int64

In [304]:
# Send the new DataFrame to a CSV
ecclesial_affiliations_staging.to_csv('staging_files/Ecclesial_Affiliations_Staging.csv', index=False, encoding='utf-8-sig')

In [305]:
# NEW Upsert function to upsert Ecclesial Affiliation records
upsert_to_salesforce_bulk(sf, ecclesial_affiliations_staging, 'mbfc__Affiliation__c', 'Archdpdx_Migration_Id__c', 'results_files/ecclesial_affil_upsert_results')

# FIXME: upsert_to_salesforce is declared in several places in this workbook (what a mess!!). Only the latest definition works; an earlier version does not.

# FIXME: There are a number of rows where a Faculties Granted is missing a date, and conversely, where there is a Faculties Granted Date but no description of the Faculties granted. This is a problem, because the application requires a date for when Faculties were granted.


Upsert completed. Successful upserts: 517, Failed upserts: 12


In [None]:
# FIXME: A handful of Ecclesial Affiliation records fail with: [{'statusCode': 'FIELD_CUSTOM_VALIDATION_EXCEPTION', 'message': 'Context is required', 'fields': []}]
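
One way to keep these validation failures out of the Bulk API response is to hold back incomplete rows before the upsert. The sketch below assumes the records are plain dictionaries, such as those produced by `to_dict(orient='records')`. The `is_blank` and `split_by_required` helpers are hypothetical, and the sample IDs are placeholders rather than real Salesforce IDs.

```python
import math


def is_blank(value):
    """True for None, NaN, and empty or whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def split_by_required(records, required_fields):
    """Separate records that have every required field from those that do not."""
    ready, held_back = [], []
    for record in records:
        missing = [field for field in required_fields if is_blank(record.get(field))]
        if missing:
            held_back.append({"record": record, "missing": missing})
        else:
            ready.append(record)
    return ready, held_back


# Placeholder records, for illustration only.
records = [
    {"mbfc__Person__c": "003A", "mbfc__Context__c": "001A"},
    {"mbfc__Person__c": "003B", "mbfc__Context__c": None},
    {"mbfc__Person__c": "003C", "mbfc__Context__c": float("nan")},
    {"mbfc__Person__c": "", "mbfc__Context__c": "001D"},
]
ready, held_back = split_by_required(records, ["mbfc__Person__c", "mbfc__Context__c"])
print(f"Ready: {len(ready)}, held back: {len(held_back)}")
```

The held-back list can be written to a CSV for manual review, so the source data is corrected instead of the rows being silently dropped.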


# AFFILIATIONS


In [None]:
# Import Assignments.csv

import pandas as pd


df_affiliations = (
    pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Assignments (1).csv')
    .set_index('Record Number', verify_integrity=True)
    .drop(index='recNum', errors='ignore')  # Added errors='ignore' to prevent errors if 'recNum' does not exist
    .drop(columns=['Historic Name'], errors='ignore')  # Added errors='ignore' for the same reason
    .rename(columns=lambda x: x.replace(' ', '_'))  # Replace spaces in column names with underscores
    .assign(Account_Ext_Id=lambda df: df['Organization_Table_Name'] + '_' + df['Organization_Table_Link'])
    # .assign(mbfc__Person__r=lambda df: df['Assigned_Person'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    # .assign(mbfc__Context__r=lambda df: df['Account_Ext_Id'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    # .assign(mbfc__Use_Custom_Title__c= True)
    .assign(mbfc__Category__c= 'Any All')
    # .assign(Archdpdx_Migration_Id__c= df_affiliations.index)
    .drop(columns=[
        # 'Assigned_Person'
        'Organization_Table_Name'
        ,'Organization_Table_Link'
        ,'Projected_Term_End_Date'
        ,'Term_Number'
        ,'Leave_Type' # Leave out 'Leave_Type' until mapped properly
        ])
    .rename(columns={
        'Duty_Load': 'mbfc__Duty_Load__c',
        'Start_Date': 'mbfc__Start_Date__c',
        'End_Date': 'mbfc__Completion_Date__c',
        'Assignment_Title': 'mbfc__Affiliation__c',
        'Archdiocesan_Assignment': 'adpdx_Archdiocesan_Assignment__c',
    })
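    # Map 'Yes' to True and everything else ('No', blank) to False; the key must match the renamed column's casing exactly
    .assign(adpdx_Archdiocesan_Assignment__c=lambda df: df['adpdx_Archdiocesan_Assignment__c'].eq('Yes'))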
    .fillna('')
)

# Display a sample of the DataFrame to check the new structure
df_affiliations.sample(10)



In [None]:
# Get SF Record Ids from External Ids

# Get Context Account Ids
add_salesforce_record_ids(sf, df_affiliations, 'Account_Ext_Id', 'Account', 'Archdpdx_Migration_Id__c', 'mbfc__Context__c')

In [None]:
# Get Person Contact Ids
add_salesforce_record_ids(sf, df_affiliations, 'Assigned_Person', 'Contact', 'Archdpdx_Migration_Id__c', 'mbfc__Person__c')
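
Lookups like these leave gaps when a staged external ID has no matching record in Salesforce, which is what the "Foreign key external ID ... not found" errors recorded in this notebook point to. A pre-load check could list the unresolved IDs up front. This is a minimal sketch: `find_unresolved` is a hypothetical helper, `parishes_1` is a made-up ID, and the set of known IDs is assumed to have been queried from Salesforce beforehand.

```python
def find_unresolved(staged_ids, known_ids):
    """Return the distinct staged external IDs that are absent from known_ids."""
    known = set(known_ids)
    return sorted({sid for sid in staged_ids if sid and sid not in known})


# Illustrative values: two of the staged IDs are missing from the known set.
staged_ids = ["parishes_1", "relcommunities_23", "offices_0", "parishes_1", None]
known_ids = ["parishes_1"]
unresolved = find_unresolved(staged_ids, known_ids)
print(unresolved)
```

Running a check like this before the upsert turns a batch of row-level API errors into a short list of IDs to investigate in the source data.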

In [None]:
# Set Archdpdx_Migration_Id__c External ID
df_affiliations['Archdpdx_Migration_Id__c'] = df_affiliations.index

# Create Job ID
df_affiliations['Archdpdx_Job_Id__c'] = curr_job_id

df_affiliations


In [None]:
# Final cleanup
df_affiliations.drop(columns=[
    'Account_Ext_Id',
    'Assigned_Person', 
    ], 
    inplace=True)

df_affiliations

#FIXME: INVALID_FIELD: Foreign key external ID: relcommunities_23 not found for field Archdpdx_Migration_Id__c
#FIXME: INVALID_FIELD: Foreign key external ID: offices_0 not found for field Archdpdx_Migration_Id__c
#FIXME: Record #115 > FIELD_INTEGRITY_EXCEPTION: Start Date: invalid date: Tue Aug 01 00:00:00 GMT 1021 [mbfc__Start_Date__c
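
The `FIELD_INTEGRITY_EXCEPTION` noted here comes from a Start Date whose year is 1021, which looks like a data-entry error in the source. Such values could be flagged before loading. The sketch below assumes dates are staged as ISO `YYYY-MM-DD` strings. The `find_implausible_dates` helper, the year bounds, and the sample rows are hypothetical and should be adjusted to the actual data.

```python
from datetime import date


def find_implausible_dates(records, field, min_year=1900, max_year=2100):
    """Flag records whose date field is unparseable or outside the year bounds."""
    flagged = []
    for position, record in enumerate(records):
        raw = record.get(field)
        if not raw:
            continue  # blank dates are handled elsewhere
        try:
            parsed = date.fromisoformat(raw)
        except ValueError:
            flagged.append((position, raw, "unparseable"))
            continue
        if not min_year <= parsed.year <= max_year:
            flagged.append((position, raw, "year out of range"))
    return flagged


# Made-up rows: one valid, one with a mistyped year, one blank, one in the wrong format.
rows = [
    {"mbfc__Start_Date__c": "2021-08-01"},
    {"mbfc__Start_Date__c": "1021-08-01"},
    {"mbfc__Start_Date__c": ""},
    {"mbfc__Start_Date__c": "08/01/2021"},
]
flagged = find_implausible_dates(rows, "mbfc__Start_Date__c")
for item in flagged:
    print(item)
```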

In [None]:
df_affiliations.to_csv('staging_files/affiliations_staging.csv', encoding='utf-8', index=False)

In [531]:
import pandas as pd
from simple_salesforce import Salesforce
from simple_salesforce.exceptions import SalesforceMalformedRequest, SalesforceError

def convert_non_serializables(record):
    """Convert non-serializable values to strings or handle them appropriately."""
    for key, value in record.items():
        if pd.isna(value):
            record[key] = None
        elif isinstance(value, pd.Timestamp):
            record[key] = value.isoformat()
        elif isinstance(value, (pd.Timedelta, pd.Period)):
            record[key] = str(value)
    return record

def upsert_to_salesforce_bulk(sf, dataframe, object_name, external_id_field, failed_log_file, batch_size=10000):
    """
    Upsert records to Salesforce from a pandas DataFrame using the Bulk API.

    Parameters:
    sf (Salesforce): The Salesforce connection instance.
    dataframe (pd.DataFrame): The pandas DataFrame containing data to upsert.
    object_name (str): The Salesforce object name (e.g., 'Contact').
    external_id_field (str): The external ID field used for upserts.
    failed_log_file (str): The file name where failed upsert records will be logged.
    batch_size (int): The number of records to include in each batch.
    """
    successful_upserts = 0
    failed_upserts = 0

    # Replace placeholder values with None in the DataFrame
    dataframe.replace({pd.NA: None, ' ': None, '': None}, inplace=True)

    # Convert DataFrame to a list of dictionaries
    data_to_upsert = dataframe.to_dict(orient='records')

    with open(failed_log_file, 'a') as log_file:
        # Process data in batches
        for i in range(0, len(data_to_upsert), batch_size):
            batch_data = data_to_upsert[i:i + batch_size]
            batch_data = [convert_non_serializables(record) for record in batch_data]

            try:
                # Perform bulk upsert
                response = getattr(sf.bulk, object_name).upsert(batch_data, external_id_field=external_id_field)

                for res in response:
                    if res['success']:
                        successful_upserts += 1
                    else:
                        failed_upserts += 1
                        log_file.write(f"Failed to upsert record: {res}\n")

            except SalesforceMalformedRequest as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Malformed request error when upserting batch. Error: {e.content}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")
            except SalesforceError as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Salesforce error when upserting batch. Error: {e.content}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")
            except Exception as e:
                failed_upserts += len(batch_data)
                log_file.write(f"Failed to upsert batch. Error: {e}\n")
                for record in batch_data:
                    log_file.write(f"Failed record: {record}\n")

    print(f"Upsert completed. Successful upserts: {successful_upserts}, Failed upserts: {failed_upserts}")
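
The function above logs each failed result as a raw line. When there are many failures, a tally by status code can show which problem to fix first. The sketch below assumes each result is a dictionary with a `success` flag and an `errors` list shaped like the `FIELD_CUSTOM_VALIDATION_EXCEPTION` entry quoted earlier in this notebook. The `summarize_failures` helper and the sample results are hypothetical.

```python
from collections import Counter


def summarize_failures(results):
    """Count failed results by Salesforce status code."""
    counts = Counter()
    for result in results:
        if result.get("success"):
            continue
        # Fall back to a placeholder code when a failure carries no error details.
        errors = result.get("errors") or [{"statusCode": "UNKNOWN"}]
        for error in errors:
            counts[error.get("statusCode", "UNKNOWN")] += 1
    return counts


# Made-up results in the assumed shape.
sample_results = [
    {"success": True, "errors": []},
    {"success": False, "errors": [{"statusCode": "FIELD_CUSTOM_VALIDATION_EXCEPTION",
                                   "message": "Context is required", "fields": []}]},
    {"success": False, "errors": [{"statusCode": "FIELD_CUSTOM_VALIDATION_EXCEPTION",
                                   "message": "Context is required", "fields": []}]},
    {"success": False, "errors": []},
]
summary = summarize_failures(sample_results)
for status_code, count in summary.most_common():
    print(status_code, count)
```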

In [None]:
upsert_to_salesforce_bulk(sf, df_affiliations, 'mbfc__Affiliation__c', 'Archdpdx_Migration_Id__c', 'results_files/affiliation_upsert_results')

In [None]:
# Build the list of Affiliation records for the Bulk API upsert

bulk_data = []
for row in df_affiliations.itertuples(index=False):
    d = row._asdict()
    bulk_data.append(d)

In [None]:
# Upsert Salesforce records
# FIXME: Encoding is getting messed up and I'm unsure how to pass in a parameter that will fix this. 

try:
    # Attempt to upsert Affiliation records into SF using Bulk API
    affiliation_upsert = sf.bulk.mbfc__Affiliation__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=1000, use_serial=False)
    affiliation_upsert_results = pd.DataFrame(affiliation_upsert)
    affiliation_upsert_results.to_csv('results_files/affiliation_upsert_results')

except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")


# Post-Migration Manual Updates

1. Convert 'Offices' that are ADPDX Pastoral Centre offices into record type 'Groups', and set their parentID to the Diocese (there are just 6 of these accounts).
1. Update the 'Religious Superior' lookup on the Religious Order records.
1. Set the 'organization type' field value for each account in the 'organization' load: Offices, Newman Centres, Schools, Organizations.
1. Consolidate education degree titles in the 'Affiliation.Affiliation' picklist into the standard values.
