<a href="https://colab.research.google.com/github/Cath-Strategic-Tech/adpdx_etl/blob/main/ADPDX_ClergyDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Introduction

The following notebook orchestrates the migration of ADPDX Accounts into Salesforce.


# Order of Loading

1. Vicariates
1. Organizations [MANUAL]
1. Religious Parents
1. Religious Communities
1. Religious Superiors
1. Contacts
1. Contact > Register Entries
1. Contact > Education Affiliations [MANUAL]
1. Contact > Ecclesial Affiliations [MANUAL]
1. Affiliations [MANUAL]


# Order of Operations

- Setup Enviro

  - [DONE] UDFs
  - [DONE] Load SF xref data

- ACCOUNTS

  - Extract Source Data
    - [DONE] Load 6 tables into separate dataframes
    - [DONE] Merge into single accounts table
    - [DONE]: Fix the ExternalID so that it references the original table, not the AccRecordType
  - Transform
    - Strip phone numbers
    - Validate email addresses
    - TODO: handle churches that aren't parishes (missions, non-diocesan parishes, etc.)
  - Load
    - [DONE]Vicariates
    - [DONE] Organizations (Parishes, Schools, Newman Centres, Offices)
    - Religious
      - [DONE] Religious Parent accounts
      - [DONE] Religious Communities
      - [DONE] Religious Superiors (Contacts, set AccountID to Rel. Parent)
        - [DONE]: Handle invalid email addresses
        - TODO: Handle duplicate entries
      - TODO: Update Religious Communities with lookup to Rel. Superior
  - TODO: Unit Tests
    - Num of Accounts, by type
    - Spot checking 3-5 account records & field values

- CONTACTS

  - Extract

    - [DONE] Import Contact records
    - TODO: Get Photo directory @soames

  - Analysis

    - [DONE] Check columns & row count (3016)
    - [DONE] Identify unique languages

  - Transform

    - Complete ETL of fields that are more complex (search for TODO)
    - [DONE] Create new df_contact_staging, renaming columns to SF APIs
    - [DONE] Drop columns that don't map to Contact
    - Migrate Languages field (waiting on next package version) @soames
      - TODO: transform `,` to `;` so imports to multi-select list correctly
    - TODO: Concat Mailing Street Address lines into one
    - TODO: Handle Private Addresses: decide if will code changes or NOT use a custom Private Address field.
    - [DONE] Update boolean fields to True/False
    - [DONE] Set Contact Record Type (UDF)
    - [DONE] Validate, drop invalid emails
    - [DONE] Generate ExternalID > 'Archdpdx_External_Id\_\_c'
    - TODO: Preferred Email/Phone > where blank, set a default. Currently, all are getting set to 'Personal' and 'Mobile.'
    - TODO: Ecclesial Status (not mapping correctly)
    - [DONE] DROP columns that haven't been mapped yet

  - Load
    - [DONE] Set JobID to curr_job_id
    - [DONE] Handle character encoding that is geting messed up

- CONTACTS > SPOUSES

- CONTACTS > PHOTOS

- CONTACTS > REGISTER ENTRIES

  - Parse columns into types of Sacraments or Notations
  - For lookups to Celebrants, query SF for contacts, create missing records
  - Generate External ID, apply to df
  - Clean up (remove extra columns, NaNs)
  - Upsert records

- CONTACTS > AFFILIATIONS

  - Map the various Contact fields that are actually Affiliations (start with manual migration)
    - Education/Degrees
    - Minor Orders
    - Religious Vows
    - Candidacy records (should this be another object?)
    - In/Excardination
    - Faculties

- AFFILIATIONS TABLE

  - Extract

    - [DONE] Turn the 'Org Table Name' & 'org Table Link' columns into External ID
    - Map in the Account IDs from SF

  - Transform

    - Parse RecordTypeId
    - Parse Category
    - Map columns to SF field APIs

  - Load


# Setup Enviro


In [164]:
# !conda install -y simple-salesforce
# !conda install -y email_validator
# !conda install -y python-dotenv
# !conda install import-ipynb


In [165]:
# enviro setup

import pandas as pd
import numpy as np

from datetime import datetime
now = datetime.now()

from simple_salesforce import Salesforce

In [166]:
# import environment variables (SF login credentials)
from dotenv import load_dotenv
import os

load_dotenv()

True

In [167]:
# @title Global Variables { run: "auto", vertical-output: true, display-mode: "both" }

target_enviro = "adpdx_devpro" # @param {type:"string"}

# @markdown The `run_upserts` variable controls whether or not upserts to Salesforce are executed when the notebook is run.
run_upserts = "True" # @param ["True", "False"]

In [168]:
# ADPDX dev_pro credentials
adpdx_user = os.getenv('ADPDX_UAT_USER')
print(adpdx_user)
adpdx_pass = os.getenv('ADPDX_UAT_PASS')
print( adpdx_pass)
adpdx_token = os.getenv('ADPDX_UAT_TOKEN')
print(adpdx_token)

# instantiate a SF session object
sf = Salesforce(domain='test', username=adpdx_user, password=adpdx_pass, security_token=adpdx_token)

matt+adpdx@meribahflow.com.uat
8n&ycaQJ
aKRgyLyAX5V0YPeJJRX5bDdi


## UDFs


In [169]:
# Job ID Incrementer

def update_job_id(file_name):
    # Open the file in read mode and get the current job ID
    with open(file_name, 'r') as file:
        current_job_id = int(file.readline())

    # Increment the job ID
    new_job_id = current_job_id + 1

    # Open the file in write mode and update the job ID
    with open(file_name, 'w') as file:
        file.write(str(new_job_id))

    # Return the new job ID
    return new_job_id


# Concates two DF columns for an External ID

def concat_columns(df, columns, new_column, separator='_'):
    """
    Concatenates the values from specified columns into a single string
    with the specified separator and populates a new column in the DataFrame.

    Args:
    - df: pandas DataFrame
    - columns: list of column names to concatenate
    - new_column: name of the new column to be created
    - separator: separator to use between concatenated values (default is '_')

    Returns:
    - Updated pandas DataFrame with the new column
    """
    df[new_column] = df[columns].astype(str).apply(lambda x: separator.join(x), axis=1)
    return df



from simple_salesforce import Salesforce

def get_or_create_diocesan_account(sf, account_name):
    """
    Searches for an account by name, returns the ID if found,
    otherwise creates the account with RecordType 'Church' and 'mbfc__Church_Type__c' set to 'Diocese',
    and then returns the new ID.

    Parameters:
    sf (Salesforce): Salesforce connection object
    account_name (str): The name of the account to search for or create

    Returns:
    str: The ID of the found or created account
    """

    # Query for the Record Type ID using the Developer Name 'Church'
    record_type_query = "SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = 'Church' LIMIT 1"
    record_type_result = sf.query(record_type_query)
    if record_type_result['records']:
        record_type_id = record_type_result['records'][0]['Id']
    else:
        raise ValueError("No RecordType found with DeveloperName 'Church'")

    # Search for the Account by name
    account_query = f"SELECT Id FROM Account WHERE Name = '{account_name}' LIMIT 1"
    account_result = sf.query(account_query)
    
    if account_result['records']:
        # Account found, return the ID
        return account_result['records'][0]['Id']
    else:
        # Account not found, create a new Account
        account_data = {
            'Name': account_name,
            'RecordTypeId': record_type_id,
            'mbfc__Church_Type__c': 'Diocese'
        }
        new_account = sf.Account.create(account_data)
        return new_account['id']




## Extract Salesforce xref data

The following cells downloads all records from the target Salesforce enviro for the following objects:

- RecordTypes
- Users
- Accounts
- Contacts


In [170]:
# Get or create the Diocesan Account and get its ID
diocesan_account_id = get_or_create_diocesan_account(sf, 'Roman Catholic Archdiocese of Portland')
print(f"Account ID: {diocesan_account_id}")

Account ID: 001Dx00001HwDsgIAF


In [171]:
# get all ACTIVE SF users

sf_users = sf.query('Select Alias, FirstName, LastName, Username, id from User WHERE IsActive = True')
df_sf_users = pd.DataFrame(sf_users['records'])
df_sf_users = df_sf_users.drop(columns = 'attributes')
df_sf_users.shape

(9, 5)

In [172]:
# get all SF Record Types
get_all_recordTypes = 'Select Id, Name, DeveloperName, sObjecttype, namespaceprefix from RecordType'

# get list of records, add to dataframe
sf_recordTypes = sf.query(get_all_recordTypes)
df_sf_recordTypes = pd.DataFrame(sf_recordTypes['records'])
df_sf_recordTypes = df_sf_recordTypes.drop(columns = 'attributes')

# Create a dictionary mapping 'DeveloperName' to 'Id' for faster lookup
record_types_mapping = df_sf_recordTypes.set_index('DeveloperName')['Id'].to_dict()

df_sf_recordTypes

Unnamed: 0,Id,Name,DeveloperName,SobjectType,NamespacePrefix
0,012Dx0000003p4xIAA,Church,Church,Account,mbfc
1,012Dx0000003p4yIAA,Deanery,Deanery,Account,mbfc
2,012Dx0000003p4zIAA,Group,Group,Account,mbfc
3,012Dx0000003p50IAA,Organization,Organization,Account,mbfc
4,012Dx0000003p51IAA,Property,Property,Account,mbfc
5,012Dx0000003p52IAA,Religious,Religious,Account,mbfc
6,012Dx0000003p53IAA,z) All Types,All_Types,mbfc__Affiliation__c,mbfc
7,012Dx0000003p54IAA,Any,Any,mbfc__Affiliation__c,mbfc
8,012Dx0000003p55IAA,Pastoral Assignments,Assignments_Clergy,mbfc__Affiliation__c,mbfc
9,012Dx0000003p56IAA,Chancery Users,Chancery_Users,mbfc__Affiliation__c,mbfc


In [173]:
# get SF Account
get_all_accounts = 'Select id, Name, RecordTypeId, Type, mbfc__Parish_Code__c, Job_Id__c, Archdpdx_Migration_Id__c from Account'

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')
df_sf_accounts.shape

(217, 7)

In [174]:
# get SF Contacts
get_all_contacts = 'Select id, Name, npe01__Type_of_Account__c, RecordTypeId, Archdpdx_Migration_Id__c, CreatedById from Contact'

# get list of records, add to dataframe
sf_contacts = sf.query(get_all_contacts)
df_sf_contacts = pd.DataFrame(sf_contacts['records'])
df_sf_contacts = df_sf_contacts.drop(columns = 'attributes')
df_sf_contacts.shape

(3, 6)

# ACCOUNTS


## Extract


### Load ArchdPDX csvs as DataFrames

ADPDX data for organizations is held in 6 tables, all of which will be migrated into Salesforce's Accounts object.


In [175]:
df_offices = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Offices.csv', skiprows= lambda x: x in [1])
df_offices["src_table"] = 'Offices'
df_offices["AccountRecordType"] = 'Organization'
df_offices.rename({"Name": "Account Name"}, axis="columns", inplace=True)


In [176]:
df_parishes = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Parishes (3).csv', dtype={'Vicariate': 'object', 'Established': 'str', 'Mission Of': 'str'}, skiprows= lambda x: x in [1])
df_parishes["src_table"] = 'Parishes'
df_parishes["AccountRecordType"] = 'Church'
df_parishes.rename({"Parish Formal Name": "Account Name"}, axis="columns", inplace=True)


In [177]:
df_religious = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/RelCommunities.csv', skiprows= lambda x: x in [1])
df_religious["src_table"] = 'RelCommunities'
df_religious["AccountRecordType"] = 'Religious'
df_religious.rename({"Community Name": "Account Name"}, axis="columns", inplace=True)


In [178]:
df_schools = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Schools.csv', skiprows= lambda x: x in [1])
df_schools["src_table"] = 'Schools'
df_schools["AccountRecordType"] = 'Organization'
df_schools.rename({"School Name": "Account Name"}, axis="columns", inplace=True)

In [179]:
df_vicariates = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Vicariates.csv', skiprows= lambda x: x in [1])
df_vicariates["src_table"] = 'Vicariates'
df_vicariates["AccountRecordType"] = 'Deanery'
# As we want to designate the Common Name as what will be the Account Name in Salesforce, we are renaming these columns in a different pattern than prior CSVs.
df_vicariates.rename({"Common Name": "Account Name"}, axis="columns", inplace=True)


In [180]:
df_newman = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/NewmanCenters.csv', skiprows= lambda x: x in [1])
df_newman["src_table"] = 'NewmanCenters'
df_newman["AccountRecordType"] = 'Organization'
df_newman.rename({"Newman Center Name": "Account Name", "Newman Center City": "Mailing Address City2"}, axis="columns", inplace=True)


Each of the 6 tables has an overlapping but distinct set of columns, making it challenging to conform these tables into a single staging table.

In addition, columns that correspond to the same field in salesforce are named differently in each table (eg. 'Parish City' vs. 'Religious City' vs. 'Newman Center City')


In [181]:
print('TABLE: (ROWS, COLUMNS)\n')

print(f'Offices:    {df_offices.shape}')
print(f'Parishes:   {df_parishes.shape}')
print(f'Religious:  {df_religious.shape}')
print(f'Schools:    {df_schools.shape}')
print(f'Vicariates: {df_vicariates.shape}')
print(f'Newman Ctr: {df_newman.shape}')

TABLE: (ROWS, COLUMNS)

Offices:    (35, 18)
Parishes:   (151, 45)
Religious:  (70, 34)
Schools:    (56, 26)
Vicariates: (18, 6)
Newman Ctr: (4, 37)


### Merge DFs into a single Accounts DF

This step takes 6 different tables and combines them into a single Accounts table for cleaning and staging.


In [182]:
# init list of DataFrames
src_accounts = [df_offices, df_parishes, df_religious, df_schools, df_vicariates, df_newman]

# concats the various Account dataframes into one large table
accounts = pd.concat(src_accounts, ignore_index=True)

## Transform


Time to do some table column renaming and re-organizing!


In [183]:
# renames columns headers to consolidate account names into SF-conformed data model
accounts.rename({"Common Name": "Name, City"}, axis="columns", inplace=True)

accounts.rename(
    columns={
        'Account Name': 'Name',
        'Mailing Address': 'BillingStreet1',
        'Mailing Address 2': 'BillingStreet2',
        'Mailing Address City': 'BillingCity',
        'Mailing Address State': 'BillingState',
        'Mailing Address Postal Code': 'BillingPostalCode',
        'Mailing Address Country': 'BillingCountry',
        'Email': 'mbfc__Email__c',
        'Web Site': 'Website',
        'Order Common Name': 'mbfc__Abbreviation__c',
        'Order Letters': 'mbfc__Religious_Suffix__c',
        'Men or Women': 'mbfc__Type_Members__c',
        'Archdiocese Assigns Clergy': 'Archdiocese_Assigns_Clergy__c',
        'Locator Description': 'Locator_Description__c',
        'Mission Of': 'Parent_Parish__c',
        'Established': 'mbfc__Date_Established__c',
        'County': 'County__c',
        'Disabled Access': 'Disabled_Access__c',
        'Sanctuary Capacity': 'Sanctuary_Capacity__c',
        'Miles to Pastoral Centre': 'Miles_to_Pastoral_Centre__c',
        'Archdiocesan School Code': 'Archdiocesan_School_Code__c',
        'Grades Provided': 'Grades_Provided__c',
    },
    inplace=True
)


# reorder column order
col = accounts.pop('Name')
accounts.insert(2, col.name, col)

col = accounts.pop('Parish Name')
accounts.insert(3, col.name, col)

col = accounts.pop('AccountRecordType')
accounts.insert(1, col.name, col)



In [184]:
accounts[accounts.BillingStreet2.isna() == False]

Unnamed: 0,Record Number,AccountRecordType,"Name, City",Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingStreet1,BillingStreet2,BillingCity,...,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan_School_Code__c,Grades_Provided__c,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2
14,32,Organization,Diaconate Office,Diaconate Office,,Yes,,Pastoral Center,2838 E Burnside St,Portland,...,,,,,,,,,,
32,58,Organization,Office of Marketing and Communications,Office of Marketing and Communications,,Yes,,Pastoral Center,2838 E Burnside St,Portland,...,,,,,,,,,,
35,1,Church,"Our Lady of Perpetual Help, St Mary’s, Albany","Our Lady of Perpetual Help, St Mary’s",,Yes,SW Ellsworth St between 8th and 9th Streets,"Our Lady of Perpetual Help, St Mary’s Parish",815 Broadalbin St SW,Albany,...,,,,,,,,,,
36,2,Church,"St. Andrew Dũng-Lạc Mission, Aloha",St. Andrew Dũng-Lạc,,No,SW Grabhorn Rd/209th Ave and Farmington Rd,St. Andrew Dũng-Lạc Mission,7390 SW Grabhorn Rd,Aloha,...,,,,,,,,,,
37,3,Church,"St. Elizabeth Ann Seton, Aloha",St. Elizabeth Ann Seton,,Yes,,St. Elizabeth Ann Seton Parish,3145 SW 192nd Ave,Aloha,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
236,62,Religious,"Work of Jesus the High Priest, Gresham (OJSS)",Work of Jesus the High Priest,,No,,OJSS Community,451 NW 1st St,Gresham,...,,,,,,,,,,
238,64,Religious,"Heralds of the Good News, Portland (HGN)",Heralds of the Good News,,No,,c/o Chancellor,2838 E Burnside St,Portland,...,rkappumkal@gmail.com,,,,,,,,,
239,65,Religious,"Missionary Oblates of Mary Immaculate, Rome, I...",Missionary Oblates of Mary Immaculate,,No,,Missionary Oblates of Mary Immaculate,Via Aurelia 290,Roma,...,gensec@omigen.org,,,,,,,,,
247,73,Religious,"Brothers of Saint John, Laredo, TX (CSJ)",Brothers of Saint John,,No,,St. John Priory,505 Century Dr S,Laredo,...,,,,,,,,,,


In [185]:
# merge two Non-Latin columns into one 
accounts['Non_Latin__c'] = accounts['Non-Latin'].combine_first(accounts['Non-Latin Rite']) 

In [186]:
# export merged tables DESCRIPTION to CSV for mapping
accounts.describe(include='all').transpose().to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/working/accounts.csv')
accounts.describe(include='all').transpose()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Record Number,334.0,,,,54.5,41.389801,1.0,21.25,45.0,76.75,173.0
AccountRecordType,334,4,Church,151,,,,,,,
"Name, City",316,316,Pastoral Center,1,,,,,,,
Name,334,291,St. Mary,5,,,,,,,
Parish Name,5,5,St. Anne,1,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
Mailing Address 1,56,55,4420 SW St Marys Dr,2,,,,,,,
Mailing Address Zip,56.0,,,,97222.446429,124.9586,97005.0,97134.75,97217.5,97301.0,97526.0
Vicariate Name,18,18,Albany-Corvallis,1,,,,,,,
Mailing Address City2,4,4,Corvallis,1,,,,,,,


In [187]:
# Create a single BillingAddress field

# Concatenate the two columns with CHAR(10) as separator
accounts['BillingStreet'] = accounts[['BillingStreet1', 'BillingStreet2']].apply(lambda x: '\n'.join(x.dropna()), axis=1)

# Drop the original columns
accounts.drop(columns=['BillingStreet1', 'BillingStreet2'], inplace=True)

In [188]:
# Handle boolean fields

boolean_columns_to_convert = [
    'Archdiocese_Assigns_Clergy__c', 
    'Non_Latin__c', 
    'Disabled_Access__c', 
    ]

# Convert 'Yes'/'No' to True/False
accounts[boolean_columns_to_convert] = accounts[boolean_columns_to_convert].replace({'Yes': True, 'No': False, None: False})



In [189]:
accounts[boolean_columns_to_convert].sample(10)

Unnamed: 0,Archdiocese_Assigns_Clergy__c,Non_Latin__c,Disabled_Access__c
100,True,False,True
276,True,False,False
229,False,False,False
263,False,False,False
169,True,False,True
26,True,False,False
76,True,False,True
204,False,False,False
195,False,False,False
138,True,False,True


In [190]:
accounts['Religious Order']

0      NaN
1      NaN
2      NaN
3      NaN
4      NaN
      ... 
329    NaN
330    NaN
331    NaN
332    NaN
333    NaN
Name: Religious Order, Length: 334, dtype: object

In [191]:
# Religious Order fields > conform to new data model

# Apply logic to create new columns
accounts['Religious_Secular_Order__c'] = accounts.apply(
    lambda x: 'Religious Order' if x['Religious Order'] == 'Yes' else ('Secular Order' if x['Secular Order'] == 'Yes' else None), axis=1
)

accounts['Pontifical_or_Diocesan_Order__c'] = accounts.apply(
    lambda x: 'Diocesan Order' if x['Diocesan Order'] == 'Yes' else ('Pontifical Order' if x['Pontifical Order'] == 'Yes' else None), axis=1
)

accounts.drop(columns=['Religious Order', 'Secular Order', 'Diocesan Order', 'Pontifical Order'], inplace=True)

In [192]:
print(accounts['mbfc__Date_Established__c'].dtype)

object


In [193]:
# Handle Date fields that are only YYYY

# Ensure all values in 'mbfc__Date_Established__c' are strings
accounts['mbfc__Date_Established__c'] = accounts['mbfc__Date_Established__c'].astype(str)

# Define a function to transform valid year values
def transform_year(year):
    if pd.notna(year) and year.replace('.', '', 1).isdigit() and len(year.split('.')[0]) == 4:
        return pd.to_datetime(year.split('.')[0] + '-01-01')
    else:
        return pd.NaT

# Apply the function to the 'mbfc__Date_Established__c' column
accounts['mbfc__Date_Established__c'] = accounts['mbfc__Date_Established__c'].apply(transform_year)


In [194]:
# Format Parent_Parish__c field

# Remove instances of '0'
accounts.Parent_Parish__c = accounts.Parent_Parish__c.str.replace('0', '')



In [195]:
# Append prefix
accounts['Parent_Parish__c'] = accounts['Parent_Parish__c'].apply(lambda x: 'Parishes_' + x if pd.notna(x) and x is not None and x != '' else x)


In [196]:
# Check final results
accounts.Parent_Parish__c[accounts.Parent_Parish__c.isna() == False].sample(10)

183                
85                 
51                 
168                
114                
137                
80     Parishes_129
40                 
171                
163                
Name: Parent_Parish__c, dtype: object

In [197]:
# ParentID field

accounts['ParentId'] = accounts['Parent_Parish__c']


### AccountRecordType & ChurchType


In [198]:
#Sets all rows where AccountRecordType is Church as a Parish.
accounts.loc[accounts['AccountRecordType'] == 'Church', 'mbfc__Church_Type__c'] = 'Parish'
accounts[accounts['AccountRecordType'] == 'Church'].head(5)


Unnamed: 0,Record Number,AccountRecordType,"Name, City",Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,...,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c
35,1,Church,"Our Lady of Perpetual Help, St Mary’s, Albany","Our Lady of Perpetual Help, St Mary’s",,True,SW Ellsworth St between 8th and 9th Streets,Albany,OR,,...,,,,,False,"Our Lady of Perpetual Help, St Mary’s Parish\n...",,,,Parish
36,2,Church,"St. Andrew Dũng-Lạc Mission, Aloha",St. Andrew Dũng-Lạc,,False,SW Grabhorn Rd/209th Ave and Farmington Rd,Aloha,OR,,...,,,,,False,St. Andrew Dũng-Lạc Mission\n7390 SW Grabhorn Rd,,,Parishes_83,Parish
37,3,Church,"St. Elizabeth Ann Seton, Aloha",St. Elizabeth Ann Seton,,True,,Aloha,OR,,...,,,,,False,St. Elizabeth Ann Seton Parish\n3145 SW 192nd Ave,,,,Parish
38,4,Church,"St. Peter the Fisherman Mission, Arch Cape",St. Peter the Fisherman,,True,79441 Hwy 101 S,Seaside,OR,,...,,,,,False,St. Peter the Fisherman Mission\nPO Box 29,,,Parishes_131,Parish
39,5,Church,"Our Lady of the Mountain, Ashland",Our Lady of the Mountain,,True,,Ashland,OR,,...,,,,,False,Our Lady of the Mountain Parish\n987 Hillview Dr,,,,Parish


### Generate ExternalId


In [199]:
# Generate an External ID
columns_to_concate = ['src_table', 'Record Number']
accounts = concat_columns(accounts, columns_to_concate, 'Archdpdx_Migration_Id__c', separator='_')

In [200]:
# set Deanery RecordTypeId to the Church RecordTypeId
# map in RecordTypeIds
accounts['RecordTypeId'] = accounts['AccountRecordType'].map(record_types_mapping)
record_types_mapping

{'Church': '012Dx0000003p4xIAA',
 'Deanery': '012Dx0000003p4yIAA',
 'Group': '012Dx0000003p4zIAA',
 'Organization': '012Hu000001pkqEIAQ',
 'Property': '012Dx0000003p51IAA',
 'Religious': '012Dx0000003p5KIAQ',
 'All_Types': '012Dx0000003p53IAA',
 'Any': '012Dx0000003p54IAA',
 'Assignments_Clergy': '012Dx0000003p55IAA',
 'Chancery_Users': '012Dx0000003p56IAA',
 'Clergy_Religious_Residence': '012Dx0000003p57IAA',
 'Diocean_Users': '012Dx0000003p58IAA',
 'Diocesan_Appointment': '012Dx0000003p59IAA',
 'Ecclesial_Affiliation': '012Dx0000003p5AIAQ',
 'Education': '012Dx0000003p5BIAQ',
 'Lay_Person': '012Dx0000003p5HIAQ',
 'Ministerial_Status': '012Dx0000003p5DIAQ',
 'Parish_Affiliations': '012Dx0000003p5EIAQ',
 'Staff': '012Dx0000003p5FIAQ',
 'Consecrated': '012Dx0000003p5GIAQ',
 'Permanent_Deacon': '012Dx0000003p5IIAQ',
 'Priest': '012Dx0000003p5JIAQ',
 'MajorGift': '012Hu000001pkqBIAQ',
 'Grant': '012Hu000001pkqCIAQ',
 'HH_Account': '012Hu000001pkqDIAQ',
 'Donation': '012Hu000001pkqFIAQ',
 

### Send to CSV for examination


## Load


### Generate a new Job ID


In [201]:
# increment to the job_id
file_name = '/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/jobs/job_id'
curr_job_id = update_job_id(file_name)
print(f"New job ID: {curr_job_id}")

# add/update account DF with job_id
accounts["Job_Id__c"] = curr_job_id


New job ID: 98


### A) Vicariates


In [202]:
# Get Account Group RecordTypeID
deanery_recordTypeId = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Deanery') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


# Insert Vicariates holding account
vicariate_account = sf.Account.upsert('Archdpdx_Migration_Id__c/Vicariates_Holding_Acc',
    {
    "Name": "Vicariates",
    "ParentId": diocesan_account_id,
    "mbfc__Diocese__c": diocesan_account_id,
    "RecordTypeId": deanery_recordTypeId,
    # "mbfc__Group_Type__c": 'Office',
    "Job_Id__c": curr_job_id
    }
)

# Get Vicariate Holding Acc's SF ID (as an upsert doesn't return the actual record ID)
vicariate_account = sf.Account.get_by_custom_id('Archdpdx_Migration_Id__c', 'Vicariates_Holding_Acc')
vicariate_account_id = vicariate_account['Id']

vicariate_account_id

'001Dx00001HwDuDIAV'

In [203]:
# Prepare Vicariates staging DF

vicariates = accounts[accounts['AccountRecordType'] == 'Deanery']


vicariates = vicariates[[
    'Record Number',
    'Name',
    # 'AccountRecordType',
    'Job_Id__c',
    'Archdpdx_Migration_Id__c',
    'RecordTypeId'
    ]]

# add parentid
vicariates["mbfc__Diocese__c"] = diocesan_account_id
vicariates['ParentId'] = vicariate_account_id
# vicariates['mbfc__Church_Type__c'] = 'Deanery'
vicariates['RecordTypeId'] = deanery_recordTypeId

vicariates.rename(columns={
        # 'Name, City': 'Name',
        'External_Id': 'Archdpdx_Migration_Id__c'
    }, inplace=True)

vicariates.reset_index()
vicariates.set_index('Record Number', inplace=True)

vicariates

Unnamed: 0_level_0,Name,Job_Id__c,Archdpdx_Migration_Id__c,RecordTypeId,mbfc__Diocese__c,ParentId
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,Albany-Corvallis Vicariate,98,Vicariates_1,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
2,"Beaverton, Suburban Vicariate",98,Vicariates_2,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
3,Columbia County Vicariate,98,Vicariates_3,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
4,Downtown Portland Vicariate,98,Vicariates_4,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
5,"East Portland, Suburban Vicariate",98,Vicariates_5,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
6,Marion County Vicariate,98,Vicariates_6,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
7,Metropolitan Eugene Vicariate,98,Vicariates_7,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
8,Metropolitan Salem Vicariate,98,Vicariates_8,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
9,North Coast Vicariate,98,Vicariates_9,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV
10,Northeast Portland Vicariate,98,Vicariates_10,012Dx0000003p4yIAA,001Dx00001HwDsgIAF,001Dx00001HwDuDIAV


#### Export Vicariates to CSV


In [204]:
# export to CSV
vicariates.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/vicariates_staging.csv')


#### Upsert Vicariates


In [205]:
bulk_data = []
for row in vicariates.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    vicariate_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    upserts = pd.DataFrame(vicariate_upsert)

    print(upserts)
    

    success  created                  id errors
0      True    False  001Dx00001HwDwnIAF     []
1      True    False  001Dx00001HwDwoIAF     []
2      True    False  001Dx00001HwDwpIAF     []
3      True    False  001Dx00001HwDwqIAF     []
4      True    False  001Dx00001HwDwrIAF     []
5      True    False  001Dx00001HwDwsIAF     []
6      True    False  001Dx00001HwDwtIAF     []
7      True    False  001Dx00001HwDwuIAF     []
8      True    False  001Dx00001HwDwvIAF     []
9      True    False  001Dx00001HwDwwIAF     []
10     True    False  001Dx00001HwDwxIAF     []
11     True    False  001Dx00001HwDwyIAF     []
12     True    False  001Dx00001HwDwzIAF     []
13     True    False  001Dx00001HwDx0IAF     []
14     True    False  001Dx00001HwDx1IAF     []
15     True    False  001Dx00001HwDx2IAF     []
16     True    False  001Dx00001HwDx3IAF     []
17     True    False  001Dx00001HwDx4IAF     []


In [206]:
# Generate an Errors log
import csv

keys = vicariate_upsert[0].keys()

with open('results_files/vicariate_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(vicariate_upsert)

In [207]:
# Get Vicariate records from SF

sf_deaneries = sf.query("SELECT Archdpdx_Migration_Id__c, Id FROM Account WHERE RecordType.DeveloperName = 'Deanery'")

df_sf_deaneries = pd.DataFrame(sf_deaneries['records'])
df_sf_deaneries = df_sf_deaneries.drop(columns = 'attributes')

df_sf_deaneries

# Creates a dict of Vicariate unique ids to the new Salesforce record IDs, so can populate on latter Account records
vicariate_sf_recordids = df_sf_deaneries.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
vicariate_sf_recordids

{'Vicariates_Holding_Acc': '001Dx00001HwDuDIAV',
 'Vicariates_1': '001Dx00001HwDwnIAF',
 'Vicariates_2': '001Dx00001HwDwoIAF',
 'Vicariates_3': '001Dx00001HwDwpIAF',
 'Vicariates_4': '001Dx00001HwDwqIAF',
 'Vicariates_5': '001Dx00001HwDwrIAF',
 'Vicariates_6': '001Dx00001HwDwsIAF',
 'Vicariates_7': '001Dx00001HwDwtIAF',
 'Vicariates_8': '001Dx00001HwDwuIAF',
 'Vicariates_9': '001Dx00001HwDwvIAF',
 'Vicariates_10': '001Dx00001HwDwwIAF',
 'Vicariates_11': '001Dx00001HwDwxIAF',
 'Vicariates_12': '001Dx00001HwDwyIAF',
 'Vicariates_13': '001Dx00001HwDwzIAF',
 'Vicariates_14': '001Dx00001HwDx0IAF',
 'Vicariates_15': '001Dx00001HwDx1IAF',
 'Vicariates_16': '001Dx00001HwDx2IAF',
 'Vicariates_17': '001Dx00001HwDx3IAF',
 'Vicariates_18': '001Dx00001HwDx4IAF'}

### B) Parishes, Schools, Organizations


In [208]:
# Create acc_main (accounts excluding Deaneries (already handled) and Religious (to be handled differently, after))
acc_main = accounts[accounts['AccountRecordType'] != 'Deanery']
acc_main = acc_main[acc_main['AccountRecordType'] != 'Religious']

acc_main.loc[acc_main['AccountRecordType'] == 'Church', 'Vicariate_Ext_Id'] = 'Vicariates_' + acc_main['Vicariate']

In [209]:
acc_main.sample(5)

Unnamed: 0,Record Number,AccountRecordType,"Name, City",Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,...,Non_Latin__c,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,Vicariate_Ext_Id
47,15,Church,"St. Patrick of the Forest Mission, Cave Junction",St. Patrick of the Forest,,True,407 W River St,Cave Junction,OR,,...,False,St. Patrick of the Forest Mission\n407 W River St,,,Parishes_42,Parish,Parishes_15,012Dx0000003p4xIAA,98,Vicariates_15
55,24,Church,"Our Lady of Perpetual Help, Cottage Grove",Our Lady of Perpetual Help,,True,,Cottage Grove,OR,,...,False,Our Lady of Perpetual Help Parish\n1025 N 19th St,,,,Parish,Parishes_24,012Dx0000003p4xIAA,98,Vicariates_7
294,41,Organization,"The Madeleine School, Portland",The Madeleine School,,True,,Portland,OR,,...,False,,,,,,Schools_41,012Hu000001pkqEIAQ,98,
161,132,Church,"Our Lady of Fatima, Shady Cove",Our Lady of Fatima,,True,56 Williams Lane,Shady Cove,OR,,...,False,Our Lady of Fatima Parish\nPO Box 116,,,,Parish,Parishes_132,012Dx0000003p4xIAA,98,Vicariates_15
158,129,Church,"St. Bernard, Scio",St. Bernard,,True,38810 NW Cherry St,Scio,OR,,...,False,St. Bernard Parish\nPO Box 45,,,,Parish,Parishes_129,012Dx0000003p4xIAA,98,Vicariates_1


In [210]:
# map in Deaneries
acc_main['mbfc__Deanery__c'] = acc_main.Vicariate_Ext_Id.map(vicariate_sf_recordids)

acc_main[acc_main['AccountRecordType'] == 'Church']['mbfc__Deanery__c']

35     001Dx00001HwDwnIAF
36     001Dx00001HwDwzIAF
37     001Dx00001HwDx2IAF
38     001Dx00001HwDwvIAF
39     001Dx00001HwDx1IAF
              ...        
181    001Dx00001HwDwrIAF
182    001Dx00001HwDx3IAF
183    001Dx00001HwDwsIAF
184    001Dx00001HwDx4IAF
185    001Dx00001HwDwtIAF
Name: mbfc__Deanery__c, Length: 151, dtype: object

In [211]:
# Clean up NaN values

acc_main.fillna('', inplace=True)

In [212]:
# Generate Schedule text from all Schedule columns

def create_account_schedule(row):
    account_schedule = []
    for i in range(1, 8):
        head_col = f'Schedule {i} Head'
        text_col = f'Schedule {i} Text'
        
        head = row[head_col]
        text = row[text_col]
        
        if pd.notnull(head) or pd.notnull(text):
            if pd.notnull(head):
                account_schedule.append(f"<p><strong>{head}</strong></p>")
            if pd.notnull(text):
                account_schedule.append(f"<p>{text}</p>")
            account_schedule.append("<p><br></p>")
    
    # Join all parts into a single string
    return "".join(account_schedule).strip()

acc_main['Account_Schedule__c'] = acc_main.apply(create_account_schedule, axis=1)



In [213]:
acc_main['Account_Schedule__c'].sample(15)

143    <p><strong>Weekend Mass</strong></p><p>Saturda...
263    <p><strong></strong></p><p></p><p><br></p><p><...
260    <p><strong></strong></p><p></p><p><br></p><p><...
16     <p><strong></strong></p><p></p><p><br></p><p><...
109    <p><strong>Weekend Mass</strong></p><p>Sat 5:0...
173    <p><strong>Weekend Mass</strong></p><p>Saturda...
107    <p><strong>Weekend Mass</strong></p><p>Saturda...
154    <p><strong>Weekend Mass</strong></p><p>Saturda...
79     <p><strong>Weekend Mass</strong></p><p>No Mass...
64     <p><strong>Weekend Mass</strong></p><p>Saturda...
33     <p><strong></strong></p><p></p><p><br></p><p><...
286    <p><strong></strong></p><p></p><p><br></p><p><...
292    <p><strong></strong></p><p></p><p><br></p><p><...
84     <p><strong>Weekend Mass</strong></p><p>Saturda...
81     <p><strong>Weekend Mass</strong></p><p>Sunday ...
Name: Account_Schedule__c, dtype: object

In [214]:
# Create 'account_staging' df (drop extraneous columns)

accounts_staging = acc_main[[
    'Name',
    'RecordTypeId',
    'mbfc__Church_Type__c',
    'mbfc__Deanery__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'Account_Schedule__c',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Archdiocese_Assigns_Clergy__c', # Boolean fields
    'Non_Latin__c', 
    'Disabled_Access__c', 
    'Locator_Description__c',
    'Parent_Parish__c',
    'mbfc__Date_Established__c',
    'County__c',
    'Sanctuary_Capacity__c',
    # 'Miles_to_Pastoral_Centre__c',
    'Religious_Secular_Order__c',
    'Pontifical_or_Diocesan_Order__c',
    'Archdiocesan_School_Code__c',
    'Grades_Provided__c',
    'Job_Id__c',
    'Archdpdx_Migration_Id__c',
    # 'ParentId'  # Later, check whether or not can upsert using external ID using this field

    ]]

In [215]:
accounts_staging

Unnamed: 0,Name,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,...,Parent_Parish__c,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c
0,Pastoral Center,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,503-234-5334,...,,NaT,,,,,,,98,Offices_1
1,Catholic Sentinel,012Hu000001pkqEIAQ,,,2838 E Burnside St,Portland,OR,97214,,503-281-1191,...,,NaT,,,,,,,98,Offices_3
2,Catholic Cemeteries,012Hu000001pkqEIAQ,,,333 SW Skyline Blvd,Portland,OR,97221,,503-292-6621,...,,NaT,,,,,,,98,Offices_4
3,Griffin Center,012Hu000001pkqEIAQ,,,11957 SE Fuller Rd,Milwaukie,OR,97222,,503-652-7476,...,,NaT,,,,,,,98,Offices_6
4,Providence Portland Medical Center,012Hu000001pkqEIAQ,,,4805 NE Glisan St,Portland,OR,97213,,503-215-6833,...,,NaT,,,,,,,98,Offices_11
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
311,Resurrection Catholic Parish School,012Hu000001pkqEIAQ,,,,Tualatin,OR,,,503-638-8869,...,,NaT,,,,,12-WEESRES,PK-5,98,Schools_58
330,OSU Newman Center,012Hu000001pkqEIAQ,,,2127 NW Monroe Ave,Corvallis,OR,97330,,541-752-6818,...,,NaT,,,,,,,98,NewmanCenters_1
331,St. Thomas More (UO) Newman Center,012Hu000001pkqEIAQ,,,1850 Emerald St,Eugene,OR,97403,,541-343-7021,...,,1915-01-01,,,,,,,98,NewmanCenters_2
332,Walsh Memorial (SOU) Newman Center at Our Lady...,012Hu000001pkqEIAQ,,,987 Hillview Dr,Ashland,OR,97520,,541-708-8503,...,,NaT,,,,,,,98,NewmanCenters_3


#### Create Parishes Holding Acc for acc heirarchy

In [216]:
# Upsert a Parishes holding account

# Get Account Group RecordTypeID
group_recordTypeId = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Group') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


# Insert Vicariates holding account
parish_holding_account = sf.Account.upsert('Archdpdx_Migration_Id__c/Parishes_Holding_Acc',
    {
    "Name": "Parishes",
    "ParentId": diocesan_account_id,
    "RecordTypeId": group_recordTypeId,
    "Job_Id__c": curr_job_id,
    "mbfc__Group_Type__c": "Office"
    }
)

# Get Vicariate Holding Acc's SF ID (as an upsert doesn't return the actual record ID)

parish_holding_account = sf.Account.get_by_custom_id('Archdpdx_Migration_Id__c', 'Parishes_Holding_Acc')

parishes_holding_account_id = parish_holding_account['Id']

parishes_holding_account_id

'001Dx00001HwDxKIAV'

In [217]:
# Set the ParentId for all Parish records

accounts_staging['ParentId'] = None

accounts_staging['ParentId']= accounts_staging.apply(
    lambda row: parishes_holding_account_id if row['mbfc__Church_Type__c'] == 'Parish' else row['ParentId'], axis=1
)

accounts_staging.sample(10)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  accounts_staging['ParentId'] = None
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  accounts_staging['ParentId']= accounts_staging.apply(


Unnamed: 0,Name,RecordTypeId,mbfc__Church_Type__c,mbfc__Deanery__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,...,mbfc__Date_Established__c,County__c,Sanctuary_Capacity__c,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,Archdiocesan_School_Code__c,Grades_Provided__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
275,Visitation Catholic School,012Hu000001pkqEIAQ,,,,Forest Grove,OR,,,503-357-6990,...,1883-01-01,,,,,12-VBTVISS,PS-8,98,Schools_22,
171,St. Boniface,012Dx0000003p4xIAA,Parish,001Dx00001HwDwxIAF,St. Boniface Parish\n375 SE Church St,Sublimity,OR,97385.0,,503-769-5664,...,1879-01-01,Marion,250.0,,,,,98,Parishes_142,001Dx00001HwDxKIAV
126,St. Irene the Virgin and Great Martyr,012Dx0000003p4xIAA,Parish,,St. Irene the Virgin and Great Martyr Parish\n...,Portland,OR,97217.0,,503-281-6744,...,2000-01-01,Multnomah,0.0,,,,,98,Parishes_97,001Dx00001HwDxKIAV
73,St. Anne,012Dx0000003p4xIAA,Parish,001Dx00001HwDwrIAF,St. Anne Parish\n1015 SE 182nd Ave,Portland,OR,97233.0,,503-665-4935,...,1957-01-01,Multnomah,425.0,,,,,98,Parishes_43,001Dx00001HwDxKIAV
19,U.S. Veterans’ Administration Hospital,012Hu000001pkqEIAQ,,,913 NW Garden Valley Blvd,Roseburg,OR,97470.0,,541-440-1000,...,NaT,,,,,,,98,Offices_37,
168,St. Alice,012Dx0000003p4xIAA,Parish,001Dx00001HwDwtIAF,St. Alice Parish\n1520 E St,Springfield,OR,97477.0,,541-747-7041,...,1921-01-01,Lane,450.0,,,,,98,Parishes_139,001Dx00001HwDxKIAV
284,St. John the Apostle Catholic School,012Hu000001pkqEIAQ,,,,Oregon City,OR,,,503-742-8230,...,1844-01-01,,,,,12-OREJOHS,PK-8,98,Schools_31,
133,St. Patrick,012Dx0000003p4xIAA,Parish,001Dx00001HwDwqIAF,St. Patrick Parish\n1623 NW 19th Ave,Portland,OR,97209.0,,503-222-4086,...,1889-01-01,Multnomah,275.0,,,,,98,Parishes_104,001Dx00001HwDxKIAV
142,St. Thérèse of the Child Jesus and the Holy Face,012Dx0000003p4xIAA,Parish,001Dx00001HwDwrIAF,St. Therese Parish\n1260 NE 132nd Ave,Portland,OR,97230.0,,503-256-5850,...,1955-01-01,Multnomah,550.0,,,,,98,Parishes_113,001Dx00001HwDxKIAV
160,Our Lady of Victory,012Dx0000003p4xIAA,Parish,001Dx00001HwDwvIAF,Our Lady of Victory Parish\nPO Box 29,Seaside,OR,97138.0,,503-738-6161,...,1900-01-01,Clatsop,300.0,,,,,98,Parishes_131,001Dx00001HwDxKIAV


#### Upsert Accounts (TBD )


In [218]:
# send accounts_staging to csv
accounts_staging.to_csv('staging_files/accounts_staging.csv', encoding='utf-8-sig')

In [219]:
# FIXME: Format ExternalID lookups into dictionary to match SF's api so can upsert using simple-salesforce

# Rename columns apis
accounts_staging = accounts_staging.rename(columns={'Parent_Parish__c': 'Parent_Parish__r'})  # Later on, attempt to include 'ParentId' (which, as a standard SF field, might not work)

# Reformat values to match what SF api requires
accounts_staging['Parent_Parish__r'] = accounts_staging.apply(lambda x: "{'Archdpdx_Migration_Id__c': '" + x['Parent_Parish__r'] + "'}" if pd.notna(x['Parent_Parish__r']) and x['Parent_Parish__r'] != 'None' and x['Parent_Parish__r'] != '' else None, axis=1)




In [220]:
# Remove all NaT values
accounts_staging.replace({pd.NaT: None}, inplace=True)

# convert '' to NaN
accounts_staging.replace("", np.nan, inplace=True)

# convert NaN to None
accounts_staging = accounts_staging.where(accounts_staging.notnull(), None)

In [221]:
bulk_data = []
for row in accounts_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

In [222]:
# Ensure all values are JSON serializable
import json

def make_json_serializable(records):
    for record in records:
        for key, value in record.items():
            if isinstance(value, pd.Timestamp):
                record[key] = value.isoformat() if not pd.isna(value) else None
            elif pd.isna(value):
                record[key] = None
    return records

bulk_data = make_json_serializable(bulk_data)

# Ensure that records are JSON serializable
json.dumps(bulk_data)  # This line will raise an error if there's still any non-serializable value



'[{"Name": "Pastoral Center", "RecordTypeId": "012Hu000001pkqEIAQ", "mbfc__Church_Type__c": null, "mbfc__Deanery__c": null, "BillingStreet": "2838 E Burnside St", "BillingCity": "Portland", "BillingState": "OR", "BillingPostalCode": 97214.0, "BillingCountry": null, "Phone": "503-234-5334", "Fax": "503-234-2545", "mbfc__Email__c": "commdir@archdpdx.org", "Website": "http://www.archdpdx.org/", "Account_Schedule__c": "<p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p>", "mbfc__Abbreviation__c": null, "mbfc__Religious_Suffix__c": null, "mbfc__Type_Members__c": null, "Description": null, "Archdiocese_Assigns_Clergy__c": true, "Non_Latin__c": false, "Disabled_Access__c": false, "Locator_Description__c": null, "Parent_Parish__r": null, "mbfc__Date_Estab

In [223]:
bulk_data

[{'Name': 'Pastoral Center',
  'RecordTypeId': '012Hu000001pkqEIAQ',
  'mbfc__Church_Type__c': None,
  'mbfc__Deanery__c': None,
  'BillingStreet': '2838 E Burnside St',
  'BillingCity': 'Portland',
  'BillingState': 'OR',
  'BillingPostalCode': 97214.0,
  'BillingCountry': None,
  'Phone': '503-234-5334',
  'Fax': '503-234-2545',
  'mbfc__Email__c': 'commdir@archdpdx.org',
  'Website': 'http://www.archdpdx.org/',
  'Account_Schedule__c': '<p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p><p><strong></strong></p><p></p><p><br></p>',
  'mbfc__Abbreviation__c': None,
  'mbfc__Religious_Suffix__c': None,
  'mbfc__Type_Members__c': None,
  'Description': None,
  'Archdiocese_Assigns_Clergy__c': True,
  'Non_Latin__c': False,
  'Disabled_Access__c': False,
  'Locator_Description__c': None,
  

In [224]:
#FIXME: account_staging isn't upserting via simple-salesforce (but it is via the Salesforce API)
from simple_salesforce.exceptions import SalesforceMalformedRequest

if run_upserts == 'True':

    try:
        account_staging_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
        account_upserts = pd.DataFrame(account_staging_upsert)
    except SalesforceMalformedRequest as e:
        # If a SalesforceMalformedRequest error occurs, print the error message and response content
        print(f"SalesforceMalformedRequest error: {e}")
        print(f"Response content: {e.content}")

SalesforceMalformedRequest error: Malformed request https://adpdx--uat.sandbox.my.salesforce.com/services/async/57.0/job/750Dx0000088uqMIAQ/batch/751Dx000008pjbBIAQ/result. Response content: {'exceptionCode': 'InvalidBatch', 'exceptionMessage': 'Records not processed'}
Response content: {'exceptionCode': 'InvalidBatch', 'exceptionMessage': 'Records not processed'}


In [225]:
# Generate an Errors log
# import csv

# keys = account_staging_upsert[0].keys()

# with open('results_files/accounts_results', 'w', newline='') as csv_file:
#     writer = csv.DictWriter(csv_file, keys)
#     writer.writeheader()
#     writer.writerows(account_staging_upsert)

In [226]:
# Extract SF Account records

sf_accounts = sf.query('Select id, Name, RecordTypeId, mbfc__Church_Type__c, Archdpdx_Migration_Id__c, Job_Id__c from Account WHERE Job_Id__c != null')
sf_accounts = pd.DataFrame(sf_accounts['records'])
sf_accounts = sf_accounts.drop(columns = 'attributes')
sf_accounts

Unnamed: 0,Id,Name,RecordTypeId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,Job_Id__c
0,001Dx00001HwDyRIAV,Vocations,012Hu000001pkqEIAQ,,Offices_22,96
1,001Dx00001HwDySIAV,Our Lady of Peace Retreat,012Hu000001pkqEIAQ,,Offices_23,96
2,001Dx00001HwDyTIAV,St John Vianney Residence,012Hu000001pkqEIAQ,,Offices_26,96
3,001Dx00001HwDyUIAV,Father Bernard Youth Center,012Hu000001pkqEIAQ,,Offices_27,96
4,001Dx00001HwDyVIAV,PeaceHealth Sacred Heart Medical Center,012Hu000001pkqEIAQ,,Offices_28,96
...,...,...,...,...,...,...
205,001Dx00001HwDx0IAF,Southeast Portland Vicariate,012Dx0000003p4yIAA,,Vicariates_14,98
206,001Dx00001HwDx1IAF,Southern Oregon Vicariate,012Dx0000003p4yIAA,,Vicariates_15,98
207,001Dx00001HwDx2IAF,Tualatin Valley Vicariate,012Dx0000003p4yIAA,,Vicariates_16,98
208,001Dx00001HwDx3IAF,"West Portland, Suburban Vicariate",012Dx0000003p4yIAA,,Vicariates_17,98


### C) Religious Institutes (Parents)


In [238]:
"""
- 'acc_religious' DF: create unique_id of religious parents
- create 'acc_religious_orders' DF , upsert into SF
- extract accounts from Salesforce, create dict (external_ID : account_ID)
- map parent ids onto religious child accounts DF in main DF
- 'acc_religious' > staging DF ('acc_religious')
    - drop unnecessary columns
    - upsert create DF of religious children, upsert into SF with
"""

# Create a new DF of all Religious accounts
acc_religious = accounts[accounts['AccountRecordType'] == 'Religious']

# Create a simplified external ID field
acc_religious['Archdpdx_Migration_Id__c'] = acc_religious['Order Full Name'].apply(
    lambda x: x.lower().replace(' ', '')[:40]
)

acc_religious_2 = acc_religious

# Create a DF for only parent religious order accounts
acc_religious_parents = acc_religious_2[[
    'Order Full Name', 
    'Name', 
    'mbfc__Abbreviation__c', 
    'mbfc__Religious_Suffix__c', 
    'mbfc__Type_Members__c', 
    'Archdpdx_Migration_Id__c',
    'Pontifical_or_Diocesan_Order__c',
    'Religious_Secular_Order__c',
    ]]

# Drop duplicate rows of the same parent Religious Order (becuase there are more than 1 local community of a particular order)
acc_religious_parents.drop_duplicates('Order Full Name', inplace=True)

# How many remaining rows after dropping duplicates?
print(acc_religious_parents.shape)

# Rename columns
acc_religious_parents = acc_religious_parents.rename(columns={
    'Order Full Name': 'Description'
    })

# Drop NA
acc_religious_parents.fillna('', inplace=True)

acc_religious_parents


(62, 8)


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  acc_religious['Archdpdx_Migration_Id__c'] = acc_religious['Order Full Name'].apply(
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  acc_religious_parents.drop_duplicates('Order Full Name', inplace=True)


Unnamed: 0,Description,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c,Pontifical_or_Diocesan_Order__c,Religious_Secular_Order__c
186,Societas Iesu,Colombiere Jesuit Community,Jesuits,SJ,Men,societasiesu,,Religious Order
187,Ordo Cisterciensis Strictioris Observantiae,Abbey of Our Lady of Guadalupe,Trappists,OCSO,Men,ordocisterciensisstrictiorisobservantiae,Pontifical Order,Religious Order
189,Ordo Sancti Benedicti,Benedictine Monks of Mount Angel Abbey,Benedictines,OSB,Men,ordosanctibenedicti,,Religious Order
190,Misioneros del Espíritu Santo,Missionaries of the Holy Spirit Provincial House,"Missionaries of the Holy Spirit, Christ the Pr...",MSpS,Men,misionerosdelespíritusanto,,
191,Apostles of Jesus,Apostles of Jesus,Apostles of Jesus,AJ,Men,apostlesofjesus,Diocesan Order,Religious Order
...,...,...,...,...,...,...,...,...
249,Fraternità san Carlo Borromeo,Priestly Fraternity of the Missionaries of St....,Fraternity of St. Charles,FSCB,Men,fraternitàsancarloborromeo,,
250,"Sons of Mary, Mother of Mercy","Sons of Mary, Mother of Mercy","Sons of Mary, Mother of Mercy",SMMM,Men,"sonsofmary,motherofmercy",,
251,Society of the Divine Word,Society of the Divine Word,Society of the Divine Word,SVD,Men,societyofthedivineword,,
252,Society of the Divine Saviour,Society of the Divine Saviour,Society of the Divine Saviour,SDS,Men,societyofthedivinesaviour,,


In [239]:
acc_religious_parents['mbfc__Religious_Type__c'] = 'Congregation'

In [240]:
# Set recordType to 'Religious'

religious_recordtype_id = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Religious') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match

print(religious_recordtype_id)

acc_religious_parents['RecordTypeId'] = religious_recordtype_id

acc_religious_parents.sample(10)

012Dx0000003p52IAA


Unnamed: 0,Description,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c,Pontifical_or_Diocesan_Order__c,Religious_Secular_Order__c,mbfc__Religious_Type__c,RecordTypeId
231,Society of the Holy Child Jesus,Society of the Holy Child Jesus,Society of the Holy Child Jesus,SHCJ,Women,societyoftheholychildjesus,Pontifical Order,Religious Order,Congregation,012Dx0000003p52IAA
220,Sisters of the Holy Names of Jesus and Mary U....,Sisters of the Holy Names of Jesus and Mary,Holy Names Sisters,SNJM,Women,sistersoftheholynamesofjesusandmaryu.s.-,,Religious Order,Congregation,012Dx0000003p52IAA
213,Lovers of Thuthiem Holy Cross Sisters,Thu Thiem Sisters,Thu Thiem Sisters,LHC,Women,loversofthuthiemholycrosssisters,,,Congregation,012Dx0000003p52IAA
236,Opera di Gesù Sommo Sacerdote,Work of Jesus the High Priest,Work of Jesus the High Priest,OJSS,Men,operadigesùsommosacerdote,,Religious Order,Congregation,012Dx0000003p52IAA
219,Congregation of Our Lady of Charity of the Goo...,Good Shepherd Sisters,Good Shepherd Sisters,RGS,Women,congregationofourladyofcharityofthegoods,,Religious Order,Congregation,012Dx0000003p52IAA
243,Congregatio Sanctissimi Redemptoris,Redemptorists,Redemptorists,CSsR,Men,congregatiosanctissimiredemptoris,,,Congregation,012Dx0000003p52IAA
252,Society of the Divine Saviour,Society of the Divine Saviour,Society of the Divine Saviour,SDS,Men,societyofthedivinesaviour,,,Congregation,012Dx0000003p52IAA
197,Brotherhood of the People of Praise,Brotherhood of the People of Praise,Brotherhood of the People of Praise,,Men,brotherhoodofthepeopleofpraise,Diocesan Order,,Congregation,012Dx0000003p52IAA
245,Oblates of the Virgin Mary,Oblates of the Virgin Mary,Oblates of the Virgin Mary,OMV,Men,oblatesofthevirginmary,,,Congregation,012Dx0000003p52IAA
232,Maronite Monks Of Jesus Mary and Joseph,Sacred Heart Maronite Monastery,Maronite Monks,MMJMJ,Men,maronitemonksofjesusmaryandjoseph,,,Congregation,012Dx0000003p52IAA


In [241]:
# Send to CSV
acc_religious_parents.to_csv('staging_files/religious_order_staging.csv', encoding='utf-8-sig')

In [242]:
# Upsert to Salesforce
bulk_data = []
for row in acc_religious_parents.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    religious_order_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_order_upsert = pd.DataFrame(religious_order_upsert)

df_rel_order_upsert

Unnamed: 0,success,created,id,errors
0,True,True,001Dx00001HwE3TIAV,[]
1,True,True,001Dx00001HwE3UIAV,[]
2,True,True,001Dx00001HwE3VIAV,[]
3,True,True,001Dx00001HwE3WIAV,[]
4,True,True,001Dx00001HwE3XIAV,[]
...,...,...,...,...
57,True,True,001Dx00001HwE4OIAV,[]
58,True,True,001Dx00001HwE4PIAV,[]
59,True,True,001Dx00001HwE4QIAV,[]
60,True,True,001Dx00001HwE4RIAV,[]


In [243]:
# Generate an Errors log
import csv

keys = religious_order_upsert[0].keys()

with open('results_files/religious_order_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(religious_order_upsert)

In [244]:
# @title get SF Accounts
get_all_rel_accounts = f"Select id, Name, RecordTypeId, Type, Archdpdx_Migration_Id__c from Account where RecordTypeID = '{religious_recordtype_id}'"

print(religious_recordtype_id)

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_rel_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')

df_sf_accounts.sample(10)

012Dx0000003p52IAA


Unnamed: 0,Id,Name,RecordTypeId,Type,Archdpdx_Migration_Id__c
3,001Dx00001HwE3WIAV,Missionaries of the Holy Spirit Provincial House,012Dx0000003p52IAA,,misionerosdelespíritusanto
59,001Dx00001HwE4PIAV,"Sons of Mary, Mother of Mercy",012Dx0000003p52IAA,,"sonsofmary,motherofmercy"
39,001Dx00001HwE46IAF,Society of Mary,012Dx0000003p52IAA,,sociedaddemaría
27,001Dx00001HwE3uIAF,Sisters of Charity of the Blessed Virgin Mary,012Dx0000003p52IAA,,sistersofcharityoftheblessedvirginmary
15,001Dx00001HwE3iIAF,Servite Friars,012Dx0000003p52IAA,,ordoservorumbeataemariaevirginis
20,001Dx00001HwE3nIAF,Sisters of Our Lady of Sorrows,012Dx0000003p52IAA,,franciscanmissionarysistersofourladyofso
25,001Dx00001HwE3sIAF,Servants of Mary,012Dx0000003p52IAA,,orderofservantsofmary
52,001Dx00001HwE4IIAV,Redemptorists,012Dx0000003p52IAA,,congregatiosanctissimiredemptoris
18,001Dx00001HwE3lIAF,Oblates of St. Martha,012Dx0000003p52IAA,,congregacióndeoblatasdesantamarta
56,001Dx00001HwE4MIAV,Brothers of Saint John,012Dx0000003p52IAA,,brothersofsaintjohn


In [245]:
religious_order_mapping = df_sf_accounts.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
# religious_order_mapping

### D) Religious Communities


In [249]:
acc_religious_staging = (acc_religious
                         .rename(columns={'Archdpdx_Migration_Id__c' : 'Parent_Archdpdx_Migration_Id__c'})
)

acc_religious_staging['ParentId'] = acc_religious_staging['Parent_Archdpdx_Migration_Id__c'].map(religious_order_mapping)

In [250]:
# Enrich the data

acc_religious_staging['mbfc__Religious_Type__c'] = 'Local Community'
acc_religious_staging['Archdpdx_Migration_Id__c'] = 'RelCommunities_' + acc_religious_staging['Record Number'].astype('str')
acc_religious_staging['RecordTypeId'] = religious_recordtype_id
acc_religious_staging.drop(columns='Name', inplace=True)
acc_religious_staging.rename(columns={
    'Name, City': 'Name'
}, inplace=True)

acc_religious_staging.sample(5)

Unnamed: 0,Record Number,AccountRecordType,Name,Parish Name,Archdiocese_Assigns_Clergy__c,Locator_Description__c,BillingCity,BillingState,Mailing Address Province,BillingPostalCode,...,BillingStreet,Religious_Secular_Order__c,Pontifical_or_Diocesan_Order__c,ParentId,mbfc__Church_Type__c,Parent_Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,mbfc__Religious_Type__c,Archdpdx_Migration_Id__c
225,49,Religious,"Sisters of Reparation, Portland (SR)",,False,,Portland,OR,,97214.0,...,2120 SE 24th Ave,Religious Order,Diocesan Order,001Dx00001HwE41IAF,,sistersofreparationofthesacredwoundsofje,012Dx0000003p52IAA,98,Local Community,RelCommunities_49
223,47,Religious,"Sisters of Mercy of the Americas, Portland (RSM)",,False,,Portland,OR,,97212.0,...,2010 NE 19th Ave,,,001Dx00001HwE3zIAF,,sistersofmercyoftheamericaswest/midwestr,012Dx0000003p52IAA,98,Local Community,RelCommunities_47
188,3,Religious,"JCCU Jesuit Tertianship, Portland (SJ)",,False,,Portland,OR,,97206.0,...,3301 SE 45th Ave,,,001Dx00001HwE3TIAV,,societasiesu,012Dx0000003p52IAA,98,Local Community,RelCommunities_3
220,44,Religious,"Sisters of the Holy Names of Jesus and Mary, M...",,False,,Marylhurst,OR,,97306.0,...,PO Box 398,Religious Order,,001Dx00001HwE3wIAF,,sistersoftheholynamesofjesusandmaryu.s.-,012Dx0000003p52IAA,98,Local Community,RelCommunities_44
237,63,Religious,Society of the Missionaries of St. Francis Xav...,,False,,,,,,...,,,,001Dx00001HwE4CIAV,,societyofthemissionariesofst.francisxavi,012Dx0000003p52IAA,98,Local Community,RelCommunities_63


In [251]:
acc_religious_staging_2 = acc_religious_staging[[
    'Name',
    'RecordTypeId',
    'mbfc__Religious_Type__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Job_Id__c',
    'ParentId',
    'Archdpdx_Migration_Id__c'
    ]]

acc_religious_staging_2.sample(5)

Unnamed: 0,Name,RecordTypeId,mbfc__Religious_Type__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Description,Job_Id__c,ParentId,Archdpdx_Migration_Id__c
202,"Congregation of the Holy Cross, Portland (CSC)",012Dx0000003p52IAA,Local Community,5410 N Strong St,Portland,OR,97203,,503-943-8024,503-943-7313,holycrossoffice@up.edu; ministry@up.edu,https://www.holycrossusa.org/,Holy Cross,CSC,Men,Serving the University of Portland; Holy Redee...,98,001Dx00001HwE3gIAF,RelCommunities_23
186,"Colombiere Jesuit Community, Portland (SJ)",012Dx0000003p52IAA,Local Community,3220 SE 43rd Ave,Portland,OR,97206,,503-595-1941,,,https://www.jesuitswest.org/,Jesuits,SJ,Men,"Manager: Fr. Paul Cochran, SJ",98,001Dx00001HwE3TIAV,RelCommunities_1
236,"Work of Jesus the High Priest, Gresham (OJSS)",012Dx0000003p52IAA,Local Community,OJSS Community\n451 NW 1st St,Gresham,OR,97030,,,,,https://www.familiemariens.info/html/en/index....,Work of Jesus the High Priest,OJSS,Men,Missionary brothers and priests associated wit...,98,001Dx00001HwE4BIAV,RelCommunities_62
225,"Sisters of Reparation, Portland (SR)",012Dx0000003p52IAA,Local Community,2120 SE 24th Ave,Portland,OR,97214,,503-236-4207,503-236-3400,repsrs@comcast.net,http://reparationsisters.org/,Sisters of Reparation,SR,Women,Serving Rose Hall Reparation and Prayer Center...,98,001Dx00001HwE41IAF,RelCommunities_49
230,"Society of Mary, Corvallis (SdM)",012Dx0000003p52IAA,Local Community,540 NW 9th St,Corvallis,OR,97330,,541-754-1505,,sister.teresa@socmaria.org,https://www.socmaria.org/home,Society of Mary,SdM,Women,An Institute of consecrated missionary sisters...,98,001Dx00001HwE46IAF,RelCommunities_54


In [252]:
# Final Cleanup

acc_religious_staging_2 = acc_religious_staging_2.fillna('')

In [253]:
# @title Send to CSV
acc_religious_staging_2.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/religious_community_staging.csv', encoding='utf-8-sig')

In [254]:
# @title Upsert to Salesforce
bulk_data = []
for row in acc_religious_staging_2.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    religious_community_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_community_upsert = pd.DataFrame(religious_community_upsert)

df_rel_community_upsert

Unnamed: 0,success,created,id,errors
0,True,True,001Dx00001HwE4dIAF,[]
1,True,True,001Dx00001HwE4eIAF,[]
2,True,True,001Dx00001HwE4fIAF,[]
3,True,True,001Dx00001HwE4gIAF,[]
4,True,True,001Dx00001HwE4hIAF,[]
...,...,...,...,...
65,True,True,001Dx00001HwE5fIAF,[]
66,True,True,001Dx00001HwE5gIAF,[]
67,True,True,001Dx00001HwE5hIAF,[]
68,True,True,001Dx00001HwE5iIAF,[]


### E) Religious Superiors


In [263]:
acc_rel_superiors = acc_religious_2[[
    'Name',
    'Major Superior Name',
    'Major Superior Phone',
    'Major Superior Email',
    'Archdpdx_Migration_Id__c']]


acc_rel_superiors['AccountId'] = acc_rel_superiors.Archdpdx_Migration_Id__c.map(religious_order_mapping)

# acc_rel_superiors.sample(5)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  acc_rel_superiors['AccountId'] = acc_rel_superiors.Archdpdx_Migration_Id__c.map(religious_order_mapping)


In [264]:
# @title Parse Complex Names
def parse_names(df, column_name):
    # Convert all non-string entries to strings (handling NaN and other data types)
    df[column_name] = df[column_name].fillna('').apply(str)

    # Create a new DataFrame to store the name parts
    name_parts = pd.DataFrame()

    # Parse each name in the column
    name_parts['First Name'] = df[column_name].apply(lambda x: HumanName(x).first if x.strip() != '' else '')
    name_parts['Last Name'] = df[column_name].apply(lambda x: HumanName(x).last if x.strip() != '' else '')
    name_parts['Middle Name'] = df[column_name].apply(lambda x: HumanName(x).middle if x.strip() != '' else '')
    name_parts['Title'] = df[column_name].apply(lambda x: HumanName(x).title if x.strip() != '' else '')
    name_parts['Suffix'] = df[column_name].apply(lambda x: HumanName(x).suffix if x.strip() != '' else '')
    name_parts['Nickname'] = df[column_name].apply(lambda x: HumanName(x).nickname if x.strip() != '' else '')

    # Combine the original DataFrame with the name parts DataFrame
    result_df = pd.concat([df, name_parts], axis=1)
    return result_df



In [265]:
!pip install nameparser
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Very', 'Rev.', 'Very Rev.', 'Sr.')
CONSTANTS.suffix_acronyms.add('FRS', 'OMI', 'OSA', 'OCD', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'OP', 'O.S.M', 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President' )




SetManager({'dnp', 'cspo', 'cto', 'litk', 'drb', 'abc', 'cgb', 'kchs/dchs', 'cdt', 'psp', 'cp', 'frs', '(vet)', 'mpse', 'ncps', 'ceh', 'ra', 'caha', 'qsd', 'bpt', 'cfm', 'ae', 'cpfa', 'rba', 'ipep', 'psm ii', 'oscp', 'dabfm', 'gcmg', 'stmieee', 'cgp', 'psm', 'rrc', 'nicet iv', 'ifgict', 'crp', 'cbnt', 'rdms', 'cacts', 'afm', 'chss', 'fashp', 'lp', 'crme', 'fasla', 'dso', 'cst', 'cpsi', 'lvo', 'lmt', 'cprp', 'rcp', 'emt-i/99', 'pmp', 'pp', 'ccm', 'cp-c', 'qpm', 'cmfo', 'cmp', 'iso', 'ndtr', 'afasma', 'cisa', 'rid', 'bvm', 'cwap', 'cic', 'ms', 'cams', 'aem', 'omi', 'dsc', 'sscp', 'fp-c', 'ccc', 'usn', 'usmc', 'cscp', 'gchs', 'capa', 'chpln', 'mcse', 'cgap', 'nmd', 'cip', 'ncto', 'maaa', 'chrm', 'bt', 'gcvo', 'erd', 'shrm-cp', 'si', 'chpse', 'cyds', 'emt-i/85', 'psyd', 'rdh', 'o.s.m', 'gcb', 'crt', 'iccm-d', 'nbcfch-ps', 'ctfa', 'cfce', 'gc', 'mcdba', 'acp', 'crisc', 'nremt', 'msm', 'obe', 'sdb', 'pps', 'cnp', 'do', 'lcmt', 'cciso', 'fala', 'awb', 'fmva', 'iaee', 'facha', 'lsit', 'ccna', 

In [266]:
# Parse Complex Names
acc_rel_superiors_parsed = parse_names(acc_rel_superiors, 'Major Superior Name')

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[column_name] = df[column_name].fillna('').apply(str)


In [267]:
# @title Final cleanup

acc_rel_superiors_staging = acc_rel_superiors_parsed.fillna('')

acc_rel_superiors_staging['Archdpdx_Migration_Id__c'] = acc_rel_superiors_staging['Major Superior Name'].apply(lambda x: x.replace(' ','').lower())

# Rename columns
acc_rel_superiors_staging = acc_rel_superiors_staging.rename(columns={
    'Major Superior Phone': 'Phone',
    'Major Superior Email': 'Email',
    'Title': 'Salutation',
    'First Name': 'FirstName',
    'Middle Name': 'MiddleName',
    'Last Name': 'LastName'
})

# Add job id
acc_rel_superiors_staging['Archdpdx_Job_Id__c'] = curr_job_id

# Drop columns
acc_rel_superiors_staging = acc_rel_superiors_staging.drop(columns=['Name', 'Major Superior Name', 'Nickname'])

# Drop empty rows
acc_rel_superiors_staging = acc_rel_superiors_staging[acc_rel_superiors_staging['LastName'].str.strip() != '']

acc_rel_superiors_staging.sample(10)

Unnamed: 0,Phone,Email,Archdpdx_Migration_Id__c,AccountId,FirstName,LastName,MiddleName,Salutation,Suffix,Archdpdx_Job_Id__c
238,+91 80 74 51 02 67,rkappumkal@gmail.com,fr.kappumkalthomas,001Dx00001HwE4DIAV,Kappumkal,Thomas,,Fr.,,98
239,,gensec@omigen.org,"fr.luisignacioroisalonso,omi",001Dx00001HwE4EIAV,Luis,Alonso,Ignacio Rois,Fr.,OMI,98
244,,,"johnpaulouellette,cfr",001Dx00001HwE4JIAV,John,Ouellette,Paul,,CFR,98
215,011 52 55 58 72 20 0,hmrf@misionerasdefatima.org,"candelarianavarroalvarado,hmrf",001Dx00001HwE3rIAF,Candelaria,Alvarado,Navarro,,HMRF,98
219,314-397-9436,tponder@gspmna.org,toniponder,001Dx00001HwE3vIAF,Toni,Ponder,,,,98
198,909-793-0424,,"fr.matthewwilliams,o.c.d.",001Dx00001HwE3cIAF,Matthew,Williams,,Fr.,O.C.D.,98
243,,,rogériogomes,001Dx00001HwE4IIAV,Rogério,Gomes,,,,98
207,,,"sr.janehibbard,snjmmonasteryadministrator",001Dx00001HwE3VIAV,SNJM,Sr. Jane Hibbard,Monastery Administrator,,,98
228,,,"sisterandreanenzel,csjp",001Dx00001HwE44IAF,Andrea,Nenzel,,Sister,CSJP,98
227,610-459-4125,tfirenze@osfphila.org,"sr.theresamariefirenze,osf",001Dx00001HwE43IAF,Theresa,Firenze,Marie,Sr.,OSF,98


In [268]:
# @title Send to CSV
acc_rel_superiors_staging.to_csv('staging_files/religious_superiors_staging.csv', encoding='utf-8-sig')

In [269]:
# Upsert to Salesforce

def find_existing_contact(sf, first_name, last_name):
    query = f"SELECT Id, Archdpdx_Migration_Id__c FROM Contact WHERE FirstName = '{first_name}' AND LastName = '{last_name}'"
    result = sf.query(query)
    return result['records']



bulk_data = []
for row in acc_rel_superiors_staging.itertuples(index=False):
    d = row._asdict()
    existing_contacts = find_existing_contact(sf, d['FirstName'], d['LastName'])
    if existing_contacts:
        # Update existing contact with external ID
        d['Id'] = existing_contacts[0]['Id']
        bulk_data.append(d)
    else:
        bulk_data.append(d)


if run_upserts == 'True':
    religious_superior_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_superior_upsert = pd.DataFrame(religious_superior_upsert)

df_rel_superior_upsert

Unnamed: 0,success,created,id,errors
0,False,False,,"[{'statusCode': 'DUPLICATE_VALUE', 'message': ..."
1,True,True,003Dx00000nKikgIAC,[]
2,True,True,003Dx00000nKikhIAC,[]
3,True,True,003Dx00000nKikiIAC,[]
4,False,True,,"[{'statusCode': 'INVALID_EMAIL_ADDRESS', 'mess..."
5,True,True,003Dx00000nKikjIAC,[]
6,True,True,003Dx00000nKikkIAC,[]
7,False,False,,"[{'statusCode': 'DUPLICATE_VALUE', 'message': ..."
8,True,True,003Dx00000nKiklIAC,[]
9,True,True,003Dx00000nKikmIAC,[]


In [270]:
# Update Religious Communities with Rel. Superior

# TODO: It would take much less time to simply do this post-migration manually.

# CONTACTS


## Extract


In [271]:
import pandas as pd
df_contacts = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .set_index('Record Number', verify_integrity=True)
               .drop(index='recNum') # Drops the extra row that replicates the labels
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
)

df_contacts.sample(10)


Unnamed: 0_level_0,Common_Name,Sort_Name,Type(s),Clergy_Status,Religious_Status,Login_ID,Password,Password_Must_be_Changed,Access_Permission,Spouse,...,CARA_Ethnicity,Seminarian_Status,Other_Diaconal_Ministry,Spiritual_Director_Authorized,Link_to_Religious_Community,Place_of_Work,Volunteer_Place,Type_of_Work,Work_Load,Work_Title
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2328,Ms. Angela Rosebrook,rosebrook angela,Staff,,,,,,,0,...,,,,,0,,,,,
2774,Ms. Heather Brower,brower heather,Staff,,,hbrower,5a516b7659bf2441259050e96fa85353ad34b5b03396ac...,Yes,,0,...,,,,,0,,,,,
1062,"Rev. Kevin Fitzpatrick, OSM",fitzpatrick kevin,"Priest,Religious",Deceased,Deceased,,,,,0,...,,,,,27,,,,,
2331,Ms. Christina Self,self christina,Staff,,,,,,,0,...,,,,,0,,,,,
1319,Rev. Louis Rodakowski,rodakowski louis,Priest,Deceased,,,,,,0,...,,,,,0,,,,,
1824,Ms. Sue Unger,unger sue,Staff,,,,,,,0,...,,,,,0,,,,,
1838,Ms. Katie Edson,edson katie,Staff,,,,,,,0,...,,,,,0,,,,,
2796,Very Reverend Michael Mandelas,mandelas michael j,"Priest,Non-Latin Rite",Transferred Out,,mmandelas,378b3ff2a8d635938056240288014278b94ff0b69bdaf2...,Yes,,0,...,,,,,0,,,,,
3214,Mr. Kurt Johnson,johnson kurt,Staff,,,,,,,0,...,,,,,0,,,,,
2933,Mr. Nicolai Bajanov,bajanov nicolai,Staff,,,,,,,0,...,,,,,0,,,,,


#### Get Photos


In [272]:
import os
import pandas as pd

# def list_jpeg_files(directory):
#     data = []
#     for filename in os.listdir(directory):
#         if filename.endswith(".jpeg") or filename.endswith(".jpg"):  # Checking for jpeg files
#             full_path = os.path.join(directory, filename)
#             data.append({'Filename': filename, 'Full Path': full_path})
#     return pd.DataFrame(data)

# # Specify your directory
# directory = '/content/drive/Shareddrives/Clients/ADPDX (Portland)/Data/Clergy DB/sql_backup/archdpdx.info backups/public_html/people/graphics/portraits/large'
# jpeg_files_df = list_jpeg_files(directory)


In [273]:
# # Query for the Library
# library_query = "SELECT Id, Name FROM ContentWorkspace WHERE Name = 'ADPDX Person Profile Photos'"
# library_result = sf.query(library_query)

# # Check if the library exists and get its ID
# if library_result['records']:
#     library_id = library_result['records'][0]['Id']
#     print(f"Library ID: {library_id}")

#     # Query for the Folder within the Library
#     folder_query = f"SELECT Id, Name FROM ContentFolder WHERE ParentContentFolderId = '{library_id}'"
#     folder_result = sf.query(folder_query)

#     # Check if the folder exists and get its ID
#     if folder_result['records']:
#         folder_id = folder_result['records'][0]['Id']
#         print(f"Folder ID: {folder_id}")
#     else:
#         print("Folder 'Large JPEGs' not found in the library.")
# else:
#     print("Library 'ADPDX Person Profile Photos' not found.")

## Analysis

Here we check the various columns and their types, count where values exist, count of unique values, sample data, etc.

DF shape:

- 142 columns
- 3017 rows


In [274]:
# Check the original shape of the imported CSV
print(f"Shape of original data set: {df_contacts.shape}")

# export to csv a list of the contact fields with count, unique, top, freq
contacts_describe = df_contacts.describe(include='all').transpose()
contacts_describe.to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/analysis/contacts_describe.csv')

df_contacts.describe(include='all').transpose()  #initial analysis of the Contacts table

Shape of original data set: (3016, 141)


Unnamed: 0,count,unique,top,freq
Common_Name,3016,3011,Ms. Leslie Jones,2
Sort_Name,3016,3009,nguyen anthony,3
Type(s),3016,29,Staff,1139
Clergy_Status,1138,8,Transferred Out,462
Religious_Status,902,4,Active,456
...,...,...,...,...
Place_of_Work,269,133,Mount Angel Abbey,37
Volunteer_Place,54,47,Mary’s Woods,4
Type_of_Work,276,117,Pastoral Ministry,30
Work_Load,262,2,Full Time,230


In [275]:
unique_languages = df_contacts['Languages'].unique()
unique_languages

array([nan, 'English,Spanish', 'Igbo', 'English, Spanish',
       'Spanish, Mayaqeqchi', 'Spanish (Mass only)',
       'Latin Mass and written translation. Read French, Italian, Spanish.',
       'Spanish', 'Hindi, Konkani, Tamil',
       'French (fluent), Spanish (beginner), Latin (beginner)',
       'German, Spanish, Italian, French', 'Kiswahili, Kichagga',
       'Spanish (English is second language)',
       'German, Spanish, Italian, Latin Mass',
       'English, Spanish, Italian', 'Spanish, Italian', 'English',
       'Bicolango, Tagalog, Spanish', 'Spanish, Italian, Latin Mass',
       'Italian', 'Tagalog, English, Spanish',
       'French, Italian, Aramaic (modern), Spanish', 'Vietnamese',
       'German, Spanish', 'English,Spanish,Italian',
       'Conversant in Italian and Spanish, some facility with Latin and German',
       'English, Spanish, Latin Mass', 'Italian, Spanish',
       'Konkani, Hindi, Marathi, Spanish',
       'Tagalog, Bicol, Spanish (Mass only)', 'Spanish, E

In [276]:
# import re
# import numpy as np


# def deduplicate_languages(list_languages):
#     # Define a regular expression pattern to match periods and punctuation
#     punctuation_pattern = r'[.,!?;:"]'

#     # Flatten the array and filter out NaN values
#     flattened_languages = [re.sub(punctuation_pattern, '', lang) for sublist in list_languages if pd.notna(sublist) for lang in sublist.split(',')]

#     # Deduplicate the list of languages
#     unique_languages = list(set(flattened_languages))

#     return unique_languages


# # Example usage:
# unique_languages = deduplicate_languages(unique_languages)
# print(unique_languages)


## Transform


In [277]:
# list of columns NOT to be migrated as Contact attributes
misc_columns_to_drop = [
    'Password',
    'Password_Must_be_Changed',
    'Sort_Name'
]

affiliation_columns = [
    'Baptism_Date',
    'Place_of_Baptism',
    'Confirmation_Date',
    'Place_of_Confirmation',
    'Received_Date',
    'Parish_of_Record',
    'Marriage_Date',
    'Place_of_Marriage',
    'Date_of_First_Vows',
    'Date_of_Final_Vows',
    'Reader_Date',
    'Acolyte_Date',
    'Bachelor_Degree_Year',
    'Bachelor_Degree_Type',
    'Bachelor_Degree_Institution',
    'Graduate_1_Degree_Institution',
    'Graduate_1_Degree_Type',
    'Graduate_1_Degree_Year',
    'Graduate_2_Degree_Institution',
    'Graduate_2_Degree_Type',
    'Graduate_2_Degree_Year',
    'Graduate_3_Degree_Institution',
    'Graduate_3_Degree_Type',
    'Graduate_3_Degree_Year',
    'Graduate_4_Degree_Institution',
    'Graduate_4_Degree_Type',
    'Graduate_4_Degree_Year',
    'Diaconal_Ordination_Date',
    'Diaconal_Ordination_Place',
    'Diaconal_Ordination_Prelate',
    'Presbyteral_Ordination_Date',
    'Presbyteral_Ordination_Place',
    'Presbyteral_Ordination_Prelate',
    'Episcopal_Ordination_Date',
    'Episcopal_Ordination_Place',
    'Episcopal_Ordination_Prelate',
    'Incardinated_From_Date',
    'Incardinated_From_Diocese',
    'Excardinated_To_Diocese',
    'Excardinated_To_Date',
    'Faculties',
    'Faculties_Granted_Date',
    'Faculties_Restricted_Date',
    'Faculties_Withdrawn_Date',
]

# These fields need to be KEPT but while building the SF upsert flow these are dropped temporarily until mapping logic is included.
# TODO

fields_not_yet_mapped = [
    'Common_Name',
    'Spouse',
    'Father_Full_Name',
    'Mother_Full_Maiden_Name',
    'Mailing_Address_Province',
    'Private_Address_Province',
    # 'Preferred_Address',
    # 'Private_Address__Street__s',
    # 'Private_Address_2',
    # 'Private_Address__City__s',
    # 'Private_Address__StateCode__s',
    # 'Private_Address__PostalCode__s',
    # 'Private_Address__CountryCode__s',
    'Preferred_Email',
    'Preferred_Phone',
    'Social_Security_Account_Number__c',  # The data is encrypted
    'Serving_Now',
    'Ordination_Diocese',
    'Registered_Parish'

]

In [278]:
# UDF to combine multiple Mailing Street Address lines into one
def combine_addresses(row, *columns):
    address_parts = []
    for col in columns:
        value = row[col]
        if pd.notnull(value):  # Check for non-null values
            address_parts.append(str(value))  # Convert to string
    return '\n'.join(address_parts)  # '\n' for line break

In [279]:
df_contact_staging = (df_contacts
                      .drop(columns='Salutation')
                      .rename(columns={
                          'Clergy_Status' : 'ADPDX_Clergy_Status__c',
                          'Religious_Status' : 'ADPDX_Religious_Status__c',
                          'Login_ID' : 'ADPDX_Login_ID__c',
                          'Access_Permission': 'ADPDX_Access_Permission__c',
                          'Title': 'Salutation',
                          'Christian_Name': 'FirstName',
                          'Middle_Name(s)': 'MiddleName',
                          'Surname': 'LastName',
                          'Suffix': 'Suffix',
                          'Preferred_Address': 'Preferred_Address__c',
                          'Mailing_Address_City': 'MailingCity',
                          'Mailing_Address_State': 'MailingState',
                          'Mailing_Address_Postal_Code': 'MailingPostalCode',
                          'Mailing_Address_Country': 'MailingCountry',
                          'Private_Address_City': 'OtherCity',
                          'Private_Address_State': 'OtherState',
                          'Private_Address_Postal_Code': 'OtherPostalCode',
                          'Private_Address_Country': 'OtherCountry',
                          'Work_Phone': 'npe01__WorkPhone__c',
                          'Home_Phone': 'HomePhone',
                          'Cell_Phone': 'MobilePhone',
                        #   'Preferred_Phone': 'npe01__PreferredPhone__c',
                          # IF Preferred phone contains, 'do not publish'
                          'Work_Email' : 'npe01__WorkEmail__c',
                          'Archdiocesan_Email': 'npe01__AlternateEmail__c',
                          'Home_Email': 'npe01__HomeEmail__c',
                        #   'Preferred_Email': 'npe01__Preferred_Email__c',
                          # IF Preferred email contains 'do not publish''
                          'Directory_Include': 'Directory_Include__c',
                          'Directory_Include_Middle_Name': 'Directory_Include_Middle_Name__c',
                          'Directory_Include_Suffix': 'Directory_Include_Suffix__c',
                          'Suppress_From_Reports': 'Suppress_From_Reports__c',
                          'Send_Group_Mail_and_Email': 'Send_Group_Mail_and_Email__c',
                          'Birth_Date': 'Birthdate',
                          'Place_of_Birth': 'mbfc__Place_of_Birth__c',
                          'Foreign_Born': 'Foreign_Born__c',
                          'Foreign_Citizenship': 'Foreign_Citizenship__c',
                          'Immigration_Status': 'Immigration_Status__c',
                          'Passport/Visa_Expiration_Date': 'Passport_Visa_Expiration_Date__c',
                          'Social_Security_Account_Number': 'Social_Security_Account_Number__c',
                          'Deceased_Date': 'mbfc__Date_of_Death__c',
                          'Out_of_Diocese_Date': 'mbfc__Date_Left_Diocese__c', 
                          'CARA_Ethnicity': 'adpdx_CARA_Ethnicity__c',
                          'Seminarian_Status': 'adpdx_Seminarian_Status__c',
                          'Other_Diaconal_Ministry': 'adpdx_Other_Diaconal_Ministry__c',
                          'Spiritual_Director_Authorized': 'adpdx_Spiritual_Director_Authorized__c',
                          'Place_of_Work': 'adpdx_Place_of_Work__c',
                          'Volunteer_Place': 'adpdx_Volunteer_Place__c',
                          'Type_of_Work': 'adpdx_Type_of_Work__c',
                          'Work_Load': 'adpdx_Work_Load__c',
                          'Work_Title': 'adpdx_Work_Title__c',
                          'Coverage_Availability': 'adpdx_Coverage_Availability__c', 
                          'Advanced_Directive_Date': 'adpdx_Advanced_Directive_Date__c',
                          'End_of_Life_Plan_Date': 'adpdx_End_of_Life_Plan_Date__c',
                          'Will_Date': 'adpdx_Will_Date__c',
                          'Will_Note': 'adpdx_Will_Note__c',
                          'CIC_489_File': 'adpdx_CIC_489_File__c',
                          'Senior_Status_Date': 'adpdx_Senior_Status_Date__c', 
                          'Laicized_Date': 'adpdx_Laicized_Date__c',
                          'Seminarian_Student_Debt': 'adpdx_Seminarian_Student_Debt__c',
                          'Seminarian_Medical_Benefits': 'adpdx_Seminarian_Medical_Benefits__c',
                          'Candidacy_Date': 'adpdx_Candidacy_Date__c',
                          'Accepted_to_Formation_Date': 'adpdx_Accepted_to_Formation_Date__c',
                          'Formation_Withdrawn_Date': 'adpdx_Formation_Withdrawn_Date__c',
                          'Formation_Deferred_Date': 'adpdx_Formation_Deferred_Date__c',
                          'Formation_Terminated_Date': 'adpdx_Formation_Terminated_Date__c',
                          'Terminate_or_Defer_Note': 'adpdx_Terminate_or_Defer_Note__c',
                          'CARA_Highest_Ed_Level': 'adpdx_CARA_Highest_Ed_Level__c',
                          'Letter_of_Good_Standing_Date': 'adpdx_Letter_of_Good_Standing__c',
                          'Religious_In_Archdiocese_Date': 'mbfc__Date_of_Arrival_in_Diocese__c',
                          'Last_Retreat_Date': 'adpdx_Last_Retreat_Date__c',
                          'Last_Educ_Requirement_Date': 'adpdx_Last_Educ_Requirement_Date__c',
                          'Policy_Manual_Acknowledgement_Date': 'adpdx_Policy_Manual_Acknowledgement_Date__c',
                          'Harassment_Prevention_Course_Date': 'adpdx_Harassment_Prevention_Course_Date__c',
                          'Standards_of_Conduct_Date': 'adpdx_Standards_of_Conduct_Date__c',
                          'Last_Background_Check_Date': 'adpdx_Last_Background_Check_Date__c',
                          'Last_Child_Protection_Training_Date': 'adpdx_Last_Child_Protection_Training__c',
                          'Languages': 'Languages__c',
                          'Nickname': 'adpdx_Preferred_Name__c'

                          })
                      .assign(Bi_Ritual__c=lambda x: x['Type(s)'].str.contains('Biritual'))
                      .assign(Non_Latin_Rite__c=lambda x: x['Type(s)'].str.contains('Non-Latin Rite'))
                      .assign(adpdx_Discerner_Aspirant_for_Diaconate__c=lambda x: x['Type(s)'].str.contains('Diaconate'))
                      .assign(adpdx_Is_Seminarian__c=lambda x: x['Type(s)'].str.contains('Seminar'))
                      
                      .assign(Archdpdx_Migration_Id__c=lambda x: x.index)
                      .assign(MailingStreet=lambda x: x.apply(lambda row: combine_addresses(row, 'Mailing_Address', 'Mailing_Address_2'), axis=1))
                      .drop(columns=['Mailing_Address', 'Mailing_Address_2'])  # Optional: Drop original columns if not needed
                      .assign(OtherStreet=lambda x: x.apply(lambda row: combine_addresses(row, 'Private_Address', 'Private_Address_2'), axis=1))
                      .drop(columns=['Private_Address', 'Private_Address_2'])  # Optional: Drop original columns if not needed
                      .drop(columns=misc_columns_to_drop)
                      .drop(columns=affiliation_columns)
                      .drop(columns=fields_not_yet_mapped)

        )


In [280]:
df_contact_staging.columns

Index(['Type(s)', 'ADPDX_Clergy_Status__c', 'ADPDX_Religious_Status__c',
       'ADPDX_Login_ID__c', 'ADPDX_Access_Permission__c', 'Salutation',
       'FirstName', 'adpdx_Preferred_Name__c', 'MiddleName', 'LastName',
       'Suffix', 'MailingCity', 'MailingState', 'MailingPostalCode',
       'MailingCountry', 'OtherCity', 'OtherState', 'OtherPostalCode',
       'OtherCountry', 'Preferred_Address__c', 'npe01__WorkPhone__c',
       'HomePhone', 'MobilePhone', 'npe01__WorkEmail__c',
       'npe01__AlternateEmail__c', 'npe01__HomeEmail__c',
       'Directory_Include__c', 'Directory_Include_Middle_Name__c',
       'Directory_Include_Suffix__c', 'Suppress_From_Reports__c',
       'adpdx_Seminarian_Student_Debt__c',
       'adpdx_Seminarian_Medical_Benefits__c', 'Send_Group_Mail_and_Email__c',
       'Birthdate', 'mbfc__Place_of_Birth__c', 'Foreign_Born__c',
       'Foreign_Citizenship__c', 'Immigration_Status__c',
       'Passport_Visa_Expiration_Date__c',
       'adpdx_Accepted_to_Formatio

In [281]:
df_contact_staging.MailingStreet.sample(10)

Record Number
2660    Benedictine Sisters of Mount Angel\n840 S Main St
604                                                      
2155                     St. Anne Parish\n1131 NE 10th St
3264                       St. Patrick Parish\nPO Box 730
606                                                      
300                                     13664 SW Aerie Dr
1032                                                     
297                                                      
1934           Queen of Peace Parish\n4227 Lone Oak Rd SE
2693                                                     
Name: MailingStreet, dtype: object

### Languages

In [282]:
# # Define a function to clean the 'languages' column

# import re
# def clean_languages(text):
#     if pd.isna(text):
#         return text
#     # Remove text inside parentheses
#     text = re.sub(r'\(.*?\)', '', text)
#     # Replace ' & ' or ' and ' with ';'
#     text = re.sub(r' & | and ', ';', text)
#     # Replace commas with semicolons
#     text = text.replace(',', ';')
#     # Remove spaces before and after semicolons
#     text = re.sub(r'\s*;\s*', ';', text)
#     return text.strip(';')

# # Apply the cleaning function to the 'languages' column
# df_contact_staging['Languages__c'] = df_contact_staging['Languages__c'].apply(clean_languages)


### Private Address Handling


In [283]:
# If 'OtherStreet' is not null, then set Secondary Address Type to 'Private'.  This is because the 'OtherAddress' fields all come from the 'Private' address fields in source system. 
df_contact_staging['npe01__Secondary_Address_Type__c'] = df_contact_staging['OtherStreet'].apply(lambda x: 'Private' if pd.notnull(x) else None)


### Handle Boolean Fields


In [284]:
boolean_columns_to_convert = ['Foreign_Born__c', 'Directory_Include__c', 'Directory_Include_Middle_Name__c', 'Directory_Include_Suffix__c',
       'Suppress_From_Reports__c', 'Send_Group_Mail_and_Email__c', ]

df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].replace({'Yes': True, 'No': False})


In [285]:
df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].fillna(False)

df_contact_staging[boolean_columns_to_convert].sample(5)

Unnamed: 0_level_0,Foreign_Born__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,Send_Group_Mail_and_Email__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
2677,False,True,False,False,False,True
1064,False,False,False,False,False,True
300,False,False,False,False,False,False
1155,False,False,False,False,False,True
3237,False,False,False,False,False,True


### Set Contact Record Type


In [286]:
# Set Record Type

# Go down row by row and check the 'Type(s)' columns, check for certain words that are keys in a dictionary, and
# the that row's 'Type(s)' field contains a string that is in the a key in a dictionary the update another columns
# called 'ContactRecordType' with the paired value.

contact_type_map = {
    'Bishop': 'Priest',
    'Priest': 'Priest',
    'Transitional Deacon': 'Permanent_Deacon',
    'Permanent Deacon': 'Permanent_Deacon',
    'Seminarian': 'Lay_Person',
    'Diaconate Formation': 'Lay_Person',
    'Seminary Applicant': 'Lay_Person',
    'Diaconate Inquirer': 'Lay_Person',
    'Wife': 'Lay_Person',
    'Religious': 'Religious',
    'Staff': 'Lay_Person',
    'Seminary Applicant': 'Lay_Person',
    'Archive': 'Lay_Person'
}

def update_contact_record_type(row):
    for key, value in contact_type_map.items():
        if key in row['Type(s)']:
            return value
    return None

df_contact_staging['ContactRecordType'] = df_contact_staging.apply(update_contact_record_type, axis=1)

In [287]:
# Map in the RecordTypeIDs
df_contact_staging['RecordTypeID'] = df_contact_staging['ContactRecordType'].map(record_types_mapping)

### Ecclesial Status & Ministerial Status


In [288]:
df_contact_staging

Unnamed: 0_level_0,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,adpdx_Preferred_Name__c,MiddleName,LastName,...,Bi_Ritual__c,Non_Latin_Rite__c,adpdx_Discerner_Aspirant_for_Diaconate__c,adpdx_Is_Seminarian__c,Archdpdx_Migration_Id__c,MailingStreet,OtherStreet,npe01__Secondary_Address_Type__c,ContactRecordType,RecordTypeID
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2766,Priest,Transferred Out,,sabukaka,,Rev.,Stephen,,Ozovehe,Abaukaka,...,False,False,False,False,2766,Brighton Hospice Office\n8050 SW Warm Springs ...,5802 SW Milwaukie Ave Apt 4,Private,Priest,012Dx0000003p5JIAQ
2337,Staff,,,,,Mr.,Rogelio,,,Acevedo,...,False,False,False,False,2337,St. Pius X Parish\n1280 NW Saltzman Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
3244,Staff,,,,,Mr.,Sean,,,Ackroyd,...,False,False,False,False,3244,St. Mary Parish\n501 NW 25th St,,Private,Lay_Person,012Dx0000003p5HIAQ
3295,Staff,,,,,Ms.,Sherril,,,Acton,...,False,False,False,False,3295,Marist Catholic High School\n1900 Kingsley Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
2164,Staff,,,,,Ms.,Barbara,,,Adams,...,False,False,False,False,2164,St. Henry Parish\n346 NW 1st St,,Private,Lay_Person,012Dx0000003p5HIAQ
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1670,Staff,,,,,Ms.,Jenny,,,Zomerdyk,...,False,False,False,False,1670,Shepherd of the Valley Parish\n600 Beebe Rd,,Private,Lay_Person,012Dx0000003p5HIAQ
2755,Religious,,Active,dzorrilla,,Br.,Daniel,,,Zorrilla,...,False,False,False,False,2755,Félix Rougier House of Studies\nPO Box 499,,Private,Religious,012Dx0000003p5KIAQ
1962,Staff,,,,,Ms.,Kim,,,Zuber,...,False,False,False,False,1962,St. Boniface Parish\n375 SE Church St,,Private,Lay_Person,012Dx0000003p5HIAQ
2202,Staff,,,,,Ms.,Agnes,,,Zueger,...,False,False,False,False,2202,Our Lady of the Lake Parish\n650 A Ave,,Private,Lay_Person,012Dx0000003p5HIAQ


In [289]:
def determine_ecclesial_status(df):
    def ecclesial_status(row):
        if pd.notna(row['ADPDX_Clergy_Status__c']) and 'Laicized' in row['ADPDX_Clergy_Status__c']:
            return 'Laicized'
        # elif pd.notna(row['ADPDX_Clergy_Status__c']) and 'Faculties Withdrawn' in row['ADPDX_Clergy_Status__c']:
        #     return 'Faculties Withdrawn'
        elif pd.notna(row['Type(s)']) and 'Bishop' in row['Type(s)']:
            return 'Bishop/Archbishop'
        elif pd.notna(row['Type(s)']) and 'Priest,Religious' in row['Type(s)']:
            return 'Priest - Religious'
        elif pd.notna(row['Type(s)']) and 'Priest' in row['Type(s)'] and (not pd.isna(row['Foreign_Citizenship__c']) or row['Incardinated_Now'] != 'Archdiocese of Portland in Oregon'):
            return 'Priest - Temporary Sojourn (Foreign)'
        elif pd.notna(row['Type(s)']) and 'Priest' in row['Type(s)'] and (pd.isna(row['Foreign_Citizenship__c']) and row['Incardinated_Now'] == 'Archdiocese of Portland in Oregon'):
            return 'Priest - Diocesan'
        elif pd.notna(row['Type(s)']) and row['Type(s)'] == 'Permanent Deacon':
            return 'Permanent Deacon'
        else:
            return None

    df['mbfc__Ecclesial_Status__c'] = df.apply(ecclesial_status, axis=1)
    return df


df_contact_staging = determine_ecclesial_status(df_contact_staging)

In [290]:
def determine_ministerial_status(df):
    def ministerial_status(row):
        if row['ADPDX_Clergy_Status__c'] == 'Deceased':
            return 'Deceased'
        elif row['ADPDX_Clergy_Status__c'] == 'Active':
            return 'Active in Ministry'
        elif row['ADPDX_Clergy_Status__c'] == 'Inactive':
            return 'Inactive'
        elif row['ADPDX_Clergy_Status__c'] == 'Senior Status':
            return 'Senior Status'
        elif row['ADPDX_Clergy_Status__c'] == 'Faculties Withdrawn':
            return 'Faculties Withdrawn'
        elif row['ADPDX_Clergy_Status__c'] == 'Transferred Out':
            return 'Left Diocese'
        elif row['ADPDX_Clergy_Status__c'] == 'Unassigned':
            return 'Unassigned'
        elif row['ADPDX_Clergy_Status__c'] == 'Laicized':
            return 'Laicized'
        else:
            return 'Unknown'
        
    df['mbfc__Ministerial_Status__c'] = df.apply(ministerial_status, axis=1)
    return df

df_contact_staging = determine_ministerial_status(df_contact_staging)

### Religious Congregation
In this section, for those Contacts who have a value in the `Link to Religious Community` source field we need to populate the `mbfc__Religious_Order__c` target field in Salesforce with the correct Religious Community's parent account - the Religious Congregation.

NOTE: In the source data, there is no differentiation between a child Religious Community and a parent Religious Order, there is only one record for the Religious Comnmunity.  In MF360 we represent these Accounts separately so we need to first (a) get the Religious Community record using the `Link to Religious Community` value but transforming it (adding 'RelCommunities_' in front of the value) so it matches the Archdpdx_Migration_Id__c in Salesforce.  

Once acquired, (b) we need to get the value of the `ParentID` field on the Religious Community which is the ID of the Religious Congregation record.  That ID is the value we then want to populate in the `mbfc__Religious_Order__c` field. 

In [291]:
# get SF Account
get_all_accounts = 'Select Id, Name, RecordTypeId, Type, mbfc__Parish_Code__c, Job_Id__c, Archdpdx_Migration_Id__c, ParentID from Account WHERE Archdpdx_Migration_Id__c != null'

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')

# create a dict in order to apply later
accounts_id_map = df_sf_accounts.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()

In [292]:
df_sf_accounts[df_sf_accounts['Archdpdx_Migration_Id__c'].str.contains('RelCommunities', na=False)]

Unnamed: 0,Id,Name,RecordTypeId,Type,mbfc__Parish_Code__c,Job_Id__c,Archdpdx_Migration_Id__c,ParentId
183,001Dx00001HwE4dIAF,"Colombiere Jesuit Community, Portland (SJ)",012Dx0000003p52IAA,,,98,RelCommunities_1,001Dx00001HwE3TIAV
215,001Dx00001HwE4vIAF,"Adorers of the Holy Cross, Portland (MTG)",012Dx0000003p52IAA,,,98,RelCommunities_29,001Dx00001HwE3jIAF
216,001Dx00001HwE4wIAF,"Adrian Dominican Sisters, Adrian, MI (OP)",012Dx0000003p52IAA,,,98,RelCommunities_30,001Dx00001HwE3kIAF
217,001Dx00001HwE4xIAF,"Benedictine Sisters of Mount Angel, Mount Ange...",012Dx0000003p52IAA,,,98,RelCommunities_31,001Dx00001HwE3VIAV
218,001Dx00001HwE4yIAF,"Carmelite Sisters, Discalced, Eugene (OCD)",012Dx0000003p52IAA,,,98,RelCommunities_32,001Dx00001HwE3cIAF
...,...,...,...,...,...,...,...,...
316,001Dx00001HwE5dIAF,Priestly Fraternity of the Missionaries of St....,012Dx0000003p52IAA,,,98,RelCommunities_75,001Dx00001HwE4OIAV
317,001Dx00001HwE5eIAF,"Sons of Mary, Mother of Mercy, Umuahia, Nigeri...",012Dx0000003p52IAA,,,98,RelCommunities_76,001Dx00001HwE4PIAV
318,001Dx00001HwE5fIAF,"Society of the Divine Word, Techny, IL (SVD)",012Dx0000003p52IAA,,,98,RelCommunities_77,001Dx00001HwE4QIAV
319,001Dx00001HwE5gIAF,"Society of the Divine Saviour, Rome, Italy (SDS)",012Dx0000003p52IAA,,,98,RelCommunities_78,001Dx00001HwE4RIAV


In [293]:

def transform_religious_community_link(df):
    df['Link_to_Religious_Community'] = df['Link_to_Religious_Community'].apply(
        lambda x: None if x == '0' else f'RelCommunities_{x}'
    )
    return df

def get_parent_id_from_salesforce(sf_accounts, archdpdx_migration_id):
    print(f"Searching for: {archdpdx_migration_id}")  # Debug print
    matching_record = sf_accounts[sf_accounts['Archdpdx_Migration_Id__c'] == archdpdx_migration_id]
    if not matching_record.empty:
        print(f"Found: {matching_record['ParentId'].values[0]}")  # Debug print
        return matching_record['ParentId'].values[0]
    print("Not found")  # Debug print
    return None

def update_religious_order(df, sf_accounts):
    df['mbfc__Religious_Order__c'] = df.apply(
        lambda row: get_parent_id_from_salesforce(sf_accounts, row['Link_to_Religious_Community']) 
        if row['Link_to_Religious_Community'] is not None else None, axis=1
    )
    return df


# run the transform_religious_community_link and update_religious_order functions
df_contact_staging = transform_religious_community_link(df_contact_staging)

df_contact_staging = update_religious_order(df_contact_staging, df_sf_accounts)

Searching for: RelCommunities_60
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_53
Found: 001Dx00001HwE45IAF
Searching for: RelCommunities_9
Found: 001Dx00001HwE3XIAV
Searching for: RelCommunities_4
Found: 001Dx00001HwE3VIAV
Searching for: RelCommunities_8
Found: 001Dx00001HwE3WIAV
Searching for: RelCommunities_35
Found: 001Dx00001HwE3nIAF
Searching for: RelCommunities_1
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_56
Found: 001Dx00001HwE48IAF
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_53
Found: 001Dx00001HwE45IAF
Searching for: RelCommunities_60
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_1
Found: 001Dx00001HwE3TIAV
Searching for: RelCommunities_27
Found: 001Dx00001HwE3iIAF
Searching for: RelCommunities_44
Found: 001Dx00001HwE3wIAF
Searching for: RelCommunities_23
Not found
Searching for: RelCommunities_44
Found: 001Dx00001HwE3wIAF
Searching for: RelCommunities_60
Found: 001Dx00001

In [294]:
df_contact_staging[df_contact_staging.mbfc__Religious_Order__c.isna() == False]

Unnamed: 0_level_0,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,adpdx_Preferred_Name__c,MiddleName,LastName,...,adpdx_Is_Seminarian__c,Archdpdx_Migration_Id__c,MailingStreet,OtherStreet,npe01__Secondary_Address_Type__c,ContactRecordType,RecordTypeID,mbfc__Ecclesial_Status__c,mbfc__Ministerial_Status__c,mbfc__Religious_Order__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
671,"Priest,Religious",Transferred Out,Transferred Out,jadams,,Rev.,J.,J.K.,K.,Adams,...,False,671,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Left Diocese,001Dx00001HwE3TIAV
2430,Religious,,Active,,,Sr.,Delores,,,Adelman,...,False,2430,Sisters of St. Mary of Oregon\n4440 SW 148th Ave,4595 SW 148th Ave,Private,Religious,012Dx0000003p5KIAQ,,Unknown,001Dx00001HwE45IAF
1584,"Priest,Religious",Active,Active,makuti,,Rev.,Macdonald,,,Akuti,...,False,1584,St. Mary by the Sea Parish\nPO Box 390,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Active in Ministry,001Dx00001HwE3XIAV
912,"Priest,Religious",Transferred Out,Transferred Out,,,Rt. Rev.,James,,,Albers,...,False,912,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Left Diocese,001Dx00001HwE3VIAV
913,"Priest,Religious",Transferred Out,Transferred Out,,,Rev.,Jose,,,Alberto,...,False,913,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Left Diocese,001Dx00001HwE3WIAV
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2884,"Priest,Religious",Transferred Out,Transferred Out,pyoun,,Rev.,Pius,,,Youn,...,False,2884,St. Thomas More Newman Center Parish\n1850 Eme...,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Left Diocese,001Dx00001HwE3dIAF
1434,"Priest,Religious",Deceased,Deceased,,,Rev.,Jerome,,,Young,...,False,1434,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Deceased,001Dx00001HwE3VIAV
1435,"Priest,Religious",Transferred Out,Transferred Out,,,Rev.,Robert,,,Young,...,False,1435,,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Left Diocese,001Dx00001HwE3fIAF
787,"Priest,Religious",Senior Status,Retired,nzodrow,,Rt. Rev.,Nathan,,,Zodrow,...,False,787,Mount Angel Abbey\n1 Abbey Dr,,Private,Priest,012Dx0000003p5JIAQ,Priest - Religious,Senior Status,001Dx00001HwE3VIAV


### Registered Parish

In this section we populate the 'Home Parish' target field for Contacts who have a 'Registered Parish' in the source system. 

TODO: Check to see if the Registered Parish data is worth importing. Currently, 'Registered Parish' is only populated on 51 rows, and 32 of those rows in the 'Types' field are listed as 'Archive'. In other words, **only 19 of the 51 rows have a 'Registered Parish' value that might be meaningful.** 

### Final Dataframe Cleanup


In [295]:
# drop columns that are no longer needed
# del df_contact_staging['Type(s)']  # Commented this out as we want to KEEP the field and migrated to 'ADPDX Contact Type'
del df_contact_staging['ContactRecordType']
del df_contact_staging['Incardinated_Now']
del df_contact_staging['Link_to_Religious_Community']

In [296]:
df_contact_staging = df_contact_staging.rename(columns={'Type(s)': 'ADPDX_Contact_Type__c'})

In [297]:
# convert '' to NaN
df_contact_staging.replace("", np.nan, inplace=True)

# convert NaN to None
df_contact_staging = df_contact_staging.where(df_contact_staging.notnull(), None)


In [298]:
df_contact_staging['Languages__c'].sample(20)

Record Number
1282                       None
1273                       None
234                        None
2580                       None
1183                       None
2586                       None
2293                       None
358     Italian, Latin, Spanish
2269                       None
641                        None
959                        None
2462                       None
2976                       None
1130                       None
1521                       None
1493                       None
1581                       None
1926                       None
951                        None
1232                       None
Name: Languages__c, dtype: object

In [299]:
# df_contact_staging_2 = df_contact_staging.where(df_contact_staging.notnull(), None)

## Load


In [300]:
df_contact_staging['Archdpdx_Job_Id__c'] = curr_job_id

In [301]:
# generate CSV for manual loading
df_contact_staging.to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/df_contacts_staging.csv', encoding='utf-8-sig')
df_contact_staging.to_csv('staging_files/contacts_staging.csv', encoding='utf-8-sig')


In [305]:
# upsert Contact records into SF using Bulk api

from simple_salesforce.exceptions import SalesforceMalformedRequest

bulk_data = []
for row in df_contact_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

try:
    # Attempt to upsert Contact records into SF using Bulk API
    contact_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=500, use_serial=False)
    contact_upsert_results = pd.DataFrame(contact_upsert)
except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")



In [306]:
# Print upsert results to local file

keys = contact_upsert[0].keys()
with open('results_files/contact_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(contact_upsert)


# CONTACT > SPOUSES

#TODO: Contact Spouses migration


# CONTACTS > PHOTOS

#TODO: Contact Photos


# CONTACT > REGISTER ENTRIES


In [307]:
import pandas as pd

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)

df

Unnamed: 0,Record_Number,Common_Name,Sort_Name,Type(s),Clergy_Status,Religious_Status,Login_ID,Password,Password_Must_be_Changed,Access_Permission,...,CARA_Ethnicity,Seminarian_Status,Other_Diaconal_Ministry,Spiritual_Director_Authorized,Link_to_Religious_Community,Place_of_Work,Volunteer_Place,Type_of_Work,Work_Load,Work_Title
1,2766,Rev. Stephen Abaukaka,abaukaka stephen ozovehe,Priest,Transferred Out,,sabukaka,def2a990be60a7998b1ed7c820101f3bd02d33b8992518...,Yes,,...,,,,,0,,,,,
2,2337,Mr. Rogelio Acevedo,acevedo rogelio,Staff,,,,,,,...,,,,,0,,,,,
3,3244,Mr. Sean Ackroyd,ackroyd sean,Staff,,,,,,,...,,,,,0,,,,,
4,3295,Ms. Sherril Acton,acton sherril,Staff,,,,,,,...,,,,,0,,,,,
5,2164,Ms. Barbara Adams,adams barbara,Staff,,,,,,,...,,,,,0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3012,1670,Ms. Jenny Zomerdyk,zomerdyk jenny,Staff,,,,,,,...,,,,,0,,,,,
3013,2755,"Br. Daniel Zorrilla, MSpS",zorrilla daniel,Religious,,Active,dzorrilla,391eedf7c936f63d3d0a7d9ea7e506a84709662fd31ba9...,Yes,,...,,,,,14,,,,,
3014,1962,Ms. Kim Zuber,zuber kim,Staff,,,,,,,...,,,,,0,,,,,
3015,2202,Ms. Agnes Zueger,zueger agnes,Staff,,,,,,,...,,,,,0,,,,,


In [308]:
# Import all Contact fields that actually map to Register Entry records

import pandas as pd

# Define the structure of your column sets with correct attribute names
column_sets = [
    {'date': 'Baptism_Date', 'place': 'Place_of_Baptism', 'notation_type': 'Proof of Baptism'},
    {'date': 'Confirmation_Date', 'place': 'Place_of_Confirmation', 'notation_type': 'Notice of Confirmation'},
    {'date': 'Received_Date', 'place': 'Parish_of_Record', 'notation_type': 'Notice of Profession of Faith'},
    {'date': 'Marriage_Date', 'place': 'Place_of_Marriage', 'notation_type': 'Notice of Matrimony'},
    {'date': 'Diaconal_Ordination_Date', 'place': 'Diaconal_Ordination_Place', 'prelate': 'Diaconate_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Diaconate'},
    {'date': 'Presbyteral_Ordination_Date', 'place': 'Presbyteral_Ordination_Place', 'prelate': 'Presbyteral_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Presbyteral'},
    {'date': 'Episcopal_Ordination_Date', 'place': 'Episcopal_Ordination_Place', 'prelate': 'Episcopal_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Episcopal'}
]

# New DataFrame for entries
register_entries = pd.DataFrame(columns=['RecordNumber', 'mbfc__Register_Entry_Type__c', 'mbfc__Type__c', 'mbfc__Notation_Type__c', 'mbfc__Ordination_Type__c', 'Date', 'Place', 'Prelate'])
new_entries = []  # List to store entries before final concatenation

# Processing rows
for row in df.itertuples():
    for column_set in column_sets:
        date_value = getattr(row, column_set['date'], None)
        if pd.notna(date_value):  # Check if date field is not NaN
            entry = {
                'RecordNumber': getattr(row, 'Record_Number', None),
                'Date': date_value,
                'Place': getattr(row, column_set['place'], None)
            }
            # Add Prelate if applicable
            if 'prelate' in column_set:
                entry['Prelate'] = getattr(row, column_set['prelate'], None)

            # Set 'mbfc__Register_Entry_Type__c', and conditionally add 'mbfc__Type__c' or 'mbfc__Notation_Type__c'
            if 'sacrament_type' in column_set:
                entry['mbfc__Type__c'] = column_set['sacrament_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Sacrament'
            if 'notation_type' in column_set:
                entry['mbfc__Notation_Type__c'] = column_set['notation_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Notation'

            # Handle ordination type specific updates
            if 'ordination_type' in column_set:
                entry['mbfc__Ordination_Type__c'] = column_set['ordination_type']

            new_entries.append(entry)
    
    # Add entries for 'Reader Date'
    reader_date = getattr(row, 'Reader_Date', None)
    if pd.notna(reader_date):
        entry = {
            'RecordNumber': getattr(row, 'Record_Number', None),
            'Date': reader_date,
            'mbfc__Notation_Type__c': 'Notice of Holy Orders',
            'mbfc__Ordination_Type__c': 'Minor Order: Reader',
            'mbfc__Register_Entry_Type__c': 'Notation'
        }
        new_entries.append(entry)
    
    # Add entries for 'Acolyte Date'
    acolyte_date = getattr(row, 'Acolyte_Date', None)
    if pd.notna(acolyte_date):
        entry = {
            'RecordNumber': getattr(row, 'Record_Number', None),
            'Date': acolyte_date,
            'mbfc__Notation_Type__c': 'Notice of Holy Orders',
            'mbfc__Ordination_Type__c': 'Minor Order: Acolyte',
            'mbfc__Register_Entry_Type__c': 'Notation'
        }
        new_entries.append(entry)

# Concatenate all new entries to the DataFrame at once
if new_entries:
    register_entries = pd.concat([register_entries, pd.DataFrame(new_entries)], ignore_index=True)

print(f"Total records added: {len(register_entries)}")

# Optionally, save the new DataFrame to a CSV
register_entries.to_csv('Register_Entries.csv', index=False)

# Display the DataFrame
register_entries.sample(10)


Total records added: 1872


Unnamed: 0,RecordNumber,mbfc__Register_Entry_Type__c,mbfc__Type__c,mbfc__Notation_Type__c,mbfc__Ordination_Type__c,Date,Place,Prelate
1358,161,Notation,,Notice of Matrimony,,1983-02-12,,
1836,666,Notation,,Notice of Holy Orders,Presbyteral,1967-06-13,"Teutopolis, IL","Most Rev. Jude Prost, OFM"
40,1525,Notation,,Notice of Holy Orders,Presbyteral,1981-11-07,"Cathedral of Chihuahua, Mexico",Most Rev. Adalberto Almeida
566,1075,Notation,,Notice of Holy Orders,Presbyteral,1999-06-03,,
930,1515,Notation,,Notice of Holy Orders,Presbyteral,2015-06-19,Mt. Angel Abbey,
328,627,Notation,,Notice of Holy Orders,Presbyteral,1986-11-07,"Cathedral of the Immaculate Conception, Portla...",Most Rev. William J. Levada
1854,3075,Notation,,Notice of Holy Orders,Minor Order: Reader,2007-11-25,,
1068,641,Notation,,Notice of Holy Orders,Minor Order: Acolyte,1956-05-02,,
524,189,Notation,,Notice of Holy Orders,Minor Order: Reader,1999-10-23,,
811,260,Notation,,Notice of Holy Orders,Diaconate,2011-10-29,"Cathedral of the Immaculate Conception, Portla...",


### Populate Lookup for Prelate 

In [309]:
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Very', 'Rev.', 'Very Rev.', 'Sr.', 'Most Rev.')
CONSTANTS.suffix_acronyms.add('FRS', 'J.C.L.', 'J.C.L., D.D.', 'D.D.', 'OMI', 'OSA', 'OCD', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'OP', 'O.S.M', 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President', 'SJ', 'SL', 'IX', 'SSJ', 'J.C.L.', 'J.C.L', 'OFM', 'MSpS', 'Fco.' )


def parse_name(name):
    if pd.isna(name):  # Checks if the name is NaN or None
        return {
            'Salutation': '',
            'FirstName': '',
            'MiddleName': '',
            'LastName': '',
            'Suffix': ''
        }
    else:
        name = HumanName(name)
        return {
            'Salutation': name.title,
            'FirstName': name.first,
            'MiddleName': name.middle,
            'LastName': name.last,
            'Suffix': name.suffix
        }

# Apply the parsing function only where 'Prelate' exists and is not NaN
for entry in new_entries:
    if 'Prelate' in entry and pd.notna(entry['Prelate']):
        parsed_name = parse_name(entry['Prelate'])
        entry.update(parsed_name)

# Ensure the DataFrame creation from new_entries includes checks for existence of keys:
register_entries = pd.DataFrame(new_entries)
if 'Prelate' in register_entries.columns:
    register_entries['Salutation'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['Salutation'] if pd.notna(x) else '')
    register_entries['FirstName'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['FirstName'] if pd.notna(x) else '')
    register_entries['MiddleName'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['MiddleName'] if pd.notna(x) else '')
    register_entries['LastName'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['LastName'] if pd.notna(x) else '')
    register_entries['Suffix'] = register_entries['Prelate'].apply(lambda x: parse_name(x)['Suffix'] if pd.notna(x) else '')


# Display the DataFrame
print(f"Total records added: {len(register_entries)}")
register_entries.sample(10)



Total records added: 1872


Unnamed: 0,RecordNumber,Date,Place,Prelate,mbfc__Notation_Type__c,mbfc__Register_Entry_Type__c,mbfc__Ordination_Type__c,Salutation,FirstName,MiddleName,LastName,Suffix
1832,285,2010-11-06,,,Notice of Holy Orders,Notation,Minor Order: Acolyte,,,,,
6,557,2015-05-23,"Cathedral of the Immaculate Conception, Portla...",,Notice of Holy Orders,Notation,Diaconate,,,,,
1575,573,2014-06-14,"Abbey Church, Mount Angel Abbey, Saint Benedic...","Most Rev. Alexander K. Sample, J.C.L., D.D.",Notice of Holy Orders,Notation,Presbyteral,,J.C.L.,,Most Rev. Alexander K. Sample,D.D.
706,695,1984-06-30,"Our Lady Queen of Angels Detroit, Michigan",Most Rev. Patrick R. Cooney,Notice of Holy Orders,Notation,Presbyteral,Most Rev.,Patrick,R.,Cooney,
714,126,1992-12-19,,,Notice of Matrimony,Notation,,,,,,
106,931,1989-04-01,"Notre Dame, IN",,Notice of Holy Orders,Notation,Presbyteral,,,,,
1697,779,2010-11-01,,,Notice of Holy Orders,Notation,Minor Order: Reader,,,,,
685,1102,1974-11-30,"Toronto, ON",,Notice of Holy Orders,Notation,Diaconate,,,,,
1336,424,1970-07-16,"St. Paschal Baylon, Thousand Oaks, CA",,Proof of Baptism,Notation,,,,,,
893,292,2002-08-05,,,Notice of Matrimony,Notation,,,,,,


In [310]:
# Query Salesforce for existing contacts and create a dictionary for mapping

from simple_salesforce import Salesforce

query = """
SELECT Id, Archdpdx_Migration_Id__c
FROM Contact
"""
result = sf.query_all(query)
contact_map = {rec['Archdpdx_Migration_Id__c']: rec['Id'] for rec in result['records']}


In [311]:
# Get RecordTypeId for Contact.Priest

priest_contact_recordtype_id = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Priest') & (df_sf_recordTypes['SobjectType'] == 'Contact'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


In [312]:
# Get RecordID for Prelates by querying for Contacts by FirstName and LastName and, if not found, Create New Contacts

from simple_salesforce import SFType, SalesforceResourceNotFound

contact = SFType('Contact', sf.session_id, sf.sf_instance)
for index, row in register_entries.iterrows():
    first_name, last_name = row.get('FirstName'), row.get('LastName')

    if pd.isna(first_name) or pd.isna(last_name) or first_name.strip() == '' or last_name.strip() == '':
        # If either first name or last name is missing or empty, skip this row or handle as needed
        print(f"Skipping row {index} due to missing name information.")
        continue

    try:
        # Search for contact by First and Last Name
        query = f"SELECT Id FROM Contact WHERE FirstName = '{first_name}' AND LastName = '{last_name}'"
        result = sf.query(query)
        if result['totalSize'] > 0:
            contact_id = result['records'][0]['Id']
        else:
            # Create a new contact if no match found
            new_contact = {
                'FirstName': first_name,
                'LastName': last_name,
                'Archdpdx_Job_Id__c': curr_job_id,
                'RecordTypeId': priest_contact_recordtype_id
            }
            create_result = contact.create(new_contact)
            contact_id = create_result['id']

        # Update DataFrame with the Salesforce Contact ID
        register_entries.at[index, 'mbfc__Celebrant__c'] = contact_id

    except SalesforceException as e:
        print(f"Error processing row {index}: {e}")



Skipping row 2 due to missing name information.
Skipping row 3 due to missing name information.
Skipping row 4 due to missing name information.
Skipping row 5 due to missing name information.
Skipping row 6 due to missing name information.
Skipping row 8 due to missing name information.
Skipping row 9 due to missing name information.
Skipping row 10 due to missing name information.
Skipping row 11 due to missing name information.
Skipping row 12 due to missing name information.
Skipping row 13 due to missing name information.
Skipping row 14 due to missing name information.
Skipping row 15 due to missing name information.
Skipping row 16 due to missing name information.
Skipping row 17 due to missing name information.
Skipping row 19 due to missing name information.
Skipping row 20 due to missing name information.
Skipping row 21 due to missing name information.
Skipping row 22 due to missing name information.
Skipping row 24 due to missing name information.
Skipping row 25 due to miss

### Prepare to Upsert   

In [None]:
# Map Contact IDs to Register Entries

register_entries_2 = register_entries

register_entries_2['mbfc__Contact__c'] = register_entries['RecordNumber'].map(contact_map)


In [None]:
# Append Job_Id__c
register_entries_2['Archdpdx_Job_Id__c'] = curr_job_id

In [None]:
# Generate an External ID
def create_external_id(row):
    record_number = str(row['RecordNumber']).replace(' ', '').replace('-', '')
    entry_type = str(row['mbfc__Register_Entry_Type__c']).replace(' ', '').replace('-', '')

    # Check whether to use Type or Notation Type based on what's available
    if 'mbfc__Type__c' in row and not pd.isna(row['mbfc__Type__c']):
        type_field = str(row['mbfc__Type__c']).replace(' ', '').replace('-', '')
    elif 'mbfc__Notation_Type__c' in row and not pd.isna(row['mbfc__Notation_Type__c']):
        type_field = str(row['mbfc__Notation_Type__c']).replace(' ', '').replace('-', '') + str(row['mbfc__Ordination_Type__c']).replace(' ', '').replace('-', '')
    else:
        type_field = 'Unknown'

    return f"{record_number}_{entry_type}_{type_field}"

In [None]:
# Assuming your DataFrame is named `register_entries`
register_entries_2['Archdpdx_Migration_Id__c'] = register_entries.apply(create_external_id, axis=1)

if register_entries['Archdpdx_Migration_Id__c'].duplicated().any():
    print("Warning: There are duplicate external IDs.")
    # Optionally, show the duplicates
    duplicates = register_entries[register_entries['external_id'].duplicated(keep=False)]
    print(duplicates)
else:
    print("All external IDs are unique.")


In [None]:
# Drop unnecessary columns:
register_entries_2.drop(['RecordNumber', 'Prelate', 'Salutation', 'FirstName', 'MiddleName', 'LastName', 'Suffix'], axis=1, inplace=True)

In [None]:
register_entries_staging = register_entries_2

In [None]:
# Remove all NaN values:
register_entries_staging.fillna('', inplace=True)

# Rename columns
register_entries_staging = register_entries_staging.rename(columns={
    'Place': 'Location_text__c',
    'Date': 'mbfc__Event_Date__c'
})


In [None]:
register_entries_staging[register_entries_staging.mbfc__Contact__c == '003Dx00000m0OtXIAU']


In [None]:
# Upsert Register Entry Records

bulk_data = []
for row in register_entries_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

# Keep the batch <100 as I've been getting an exceptionCode: 'InvalidBatch', 'exceptionMessage': 'Records not processed'
reg_entry_upsert = sf.bulk.mbfc__Sacrament__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
reg_entry_upsert_results = pd.DataFrame(reg_entry_upsert)

In [None]:
# Print upsert results to local file

keys = reg_entry_upsert[0].keys()

with open('results_files/register_entry_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(reg_entry_upsert)

# CONTACT > AFFILIATIONS


In [None]:
# Function to create a unique ID based on Person's Name + completion date + affiliation type
def create_unique_id(row):
    # Concatenate the three fields with mbfc__Person__c at the front
    combined = f"{row['mbfc__Person__c']}{row['mbfc__Completion_Date__c']}{row['mbfc__Affiliation__c']}"
    # Remove unwanted characters and convert to lowercase
    clean_id = ''.join(combined.split()).replace('-', '').replace('.', '').lower()
    # Limit the string to 50 characters
    return clean_id[:50]

## Education Affiliations

This section takes multiple sets of columns (all related to a person's education) from the Contacts table, and combines them into a single set of columns in a new dataframe for insertion into Salesforce as Affiliation records.


In [None]:
# Parse and stage Education Affiliation records
import pandas as pd
from functools import lru_cache

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)


# Define the structure of your column sets with correct attribute names
degree_sets = [
    {'year': 'Bachelor_Degree_Year', 'type': 'Bachelor_Degree_Type', 'institution': 'Bachelor_Degree_Institution'},
    {'year': 'Graduate_1_Degree_Year', 'type': 'Graduate_1_Degree_Type', 'institution': 'Graduate_1_Degree_Institution'},
    {'year': 'Graduate_2_Degree_Year', 'type': 'Graduate_2_Degree_Type', 'institution': 'Graduate_2_Degree_Institution'},
    {'year': 'Graduate_3_Degree_Year', 'type': 'Graduate_3_Degree_Type', 'institution': 'Graduate_3_Degree_Institution'},
    {'year': 'Graduate_4_Degree_Year', 'type': 'Graduate_4_Degree_Type', 'institution': 'Graduate_4_Degree_Institution'}
]

# Query for the Record Type ID for 'Organization'
record_type_result = sf.query("SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = 'Organization'")
organization_record_type_id = record_type_result['records'][0]['Id'] if record_type_result['records'] else None

# Initialize the DataFrame for the staging table
education_staging = pd.DataFrame()

# Function to check and create institution account
@lru_cache(maxsize=None)
def get_or_create_institution_account(institution_name):
    if pd.isna(institution_name):
        return None  # Return None or handle as appropriate if institution name is NaN

    # Query Salesforce to find the institution
    query = f"SELECT Id, Name FROM Account WHERE Name = '{institution_name}' LIMIT 1"
    results = sf.query(query)
    
    # If exists, return the ID
    if results['records']:
        return results['records'][0]['Id']
    else:
        # Ensure no NaN values are sent to Salesforce
        account_data = {
            'Name': institution_name if pd.notna(institution_name) else "Default Name",  # Provide a default if NaN
            'RecordTypeId': organization_record_type_id,
            'Organization_Type__c': 'School'
        }
        # Remove keys with None values to avoid JSON serialization issues
        account_data = {k: v for k, v in account_data.items() if v is not None}
        
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Get Contact record ID from Salesforce
@lru_cache(maxsize=None)
def get_contact_id_by_record_number(record_number):
    if pd.isna(record_number):
        return None
    query = f"SELECT Id FROM Contact WHERE Archdpdx_Migration_Id__c = '{record_number}'"
    results = sf.query(query)
    if results['records']:
        return results['records'][0]['Id']
    return None


# Initialize an empty list to collect DataFrames or dictionaries
new_entries = []

# Process each row and each degree set
for index, row in df.iterrows():
    for degree_set in degree_sets:
        year = row[degree_set['year']]
        if pd.notna(year):  # Only proceed if the year column is not NaN
            formatted_year = f"{int(year)}-01-01"  # Convert year to YYYY-MM-DD format
            institution_name = row[degree_set['institution']]
            account_id = get_or_create_institution_account(institution_name)
            contact_id = get_contact_id_by_record_number(row['Record_Number'])
            
            # Create a record for the staging table
            affiliation_record = {
                'mbfc__Person__c': contact_id,
                'mbfc__Completion_Date__c': formatted_year,
                'mbfc__Context__c': account_id,
                'mbfc__Category__c': 'Education (non-degree)',
                'mbfc__Affiliation__c': row[degree_set['type']]
                # 'Institution_Name': institution_name
            }
            new_entries.append(affiliation_record)

# Convert all collected records to a DataFrame in one go
education_staging = pd.DataFrame(new_entries)


#FIXME: There are 4 rows where no INSTITUTION is listed. This makes it impossible to import an Affiliation record. Need to figure out how to handle this with Client. 
#FIXME: There are about 15 rows where no DEGREE is listed. This makes it impossible to import an Affiliation record. Need to figure out how to handle this with Client. 

In [None]:
# Apply the function to each row and create a new column with the unique ID
education_staging['Archdpdx_Migration_Id__c'] = education_staging.apply(create_unique_id, axis=1)

# Check the first few rows to verify the new column
education_staging.head()

In [None]:
# Fill any NaN values
education_staging = education_staging.fillna('')

In [None]:
# Save the staging table to CSV
education_staging.to_csv('staging_files/education_staging.csv', index=False)


In [None]:
# Upsert Education Affiliation records

bulk_data = []
for row in education_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

try:
    # Attempt to upsert Education Affiliation records into SF using Bulk API
    education_affil_upsert = sf.bulk.mbfc__Placement__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    education_affil_upsert_results = pd.DataFrame(education_affil_upsert)
    education_affil_upsert_results.to_csv('results_files/education_affil_upsert_results')

except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")

In [None]:

#FIXME: A number of Education Affiliation records are missing either an Affiliation title or a Context

## Ecclesial Affiliations

This section handles Contact table fields that map to Affiliation records with record type 'Ecclesial Affiliation'.

These Ecclesial Affiliations can be subcategorized by the 'contexts' to which the Affiliation records is related:

| Affiliation            | Context                   | Completion Date           |
| ---------------------- | ------------------------- | ------------------------- |
| First Vows             | Religious Order           | Date of First Vows        |
| Final Vows             | Religious Order           | Date of Final Vows        |
| Incardination          | Incardinated from Diocese | Incardinated From Date    |
| Faculties (Type)       | Local Diocese             | Faculties Granted Date    |
| Faculties (Restricted) | Local Diocese             | Faculties Restricted Date |
| Faculties (Withdrawn)  | Local Diocese             | Faculties Withdrawn Date  |
| Excardinated           | Excardinated To Diocese   | Excardinated To Date      |

FIXME: There are a number of rows where a Faculties Granted is missing a date, and conversely, where there is a Faculties Granted Date but no description of the Faculties granted. This is a problem, because the application requires a date for when Faculties were granted.


In [None]:
import pandas as pd
from functools import lru_cache
from simple_salesforce import Salesforce

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
               .drop(index=0) # Drops the extra row that replicates the labels
)

# Define the structure of your column sets with correct attribute names
column_sets = [
    {'year': 'Incardinated_From_Date', 'context': 'Incardinated_From_Diocese'},
    {'year': 'Excardinated_To_Date', 'context': 'Excardinated_To_Diocese'},
    {'year': 'Faculties_Granted_Date', 'affiliation': 'Faculties'},
    {'year': 'Faculties_Restricted_Date'},
    {'year': 'Faculties_Withdrawn_Date'},
]

# Query for the Record Type IDs
record_type_query = "SELECT Id, DeveloperName FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName IN ('Church', 'Religious')"
record_type_result = sf.query(record_type_query)
record_type_ids = {record['DeveloperName']: record['Id'] for record in record_type_result['records']}

church_record_type_id = record_type_ids.get('Church')
religious_record_type_id = record_type_ids.get('Religious')

# Query for the Record Type ID for 'Ecclesial Affiliation' for mbfc__Placement__c object
record_type_query = "SELECT Id FROM RecordType WHERE SobjectType = 'mbfc__Placement__c' AND DeveloperName = 'Ecclesial_Affiliation' LIMIT 1"
record_type_result = sf.query(record_type_query)
ecclesial_affiliation_record_type_id = record_type_result['records'][0]['Id'] if record_type_result['records'] else None

# Initialize the DataFrame for the staging table
ecclesial_affiliation_staging = pd.DataFrame()

# Function to check and create institution account
@lru_cache(maxsize=None)
def get_or_create_church_account(context):
    if pd.isna(context):
        return None  # Return None or handle as appropriate if institution name is NaN

    # Query Salesforce to find the institution
    query = f"SELECT Id, Name FROM Account WHERE Name = '{context}' LIMIT 1"
    results = sf.query(query)
    
    # If exists, return the ID
    if results['records']:
        return results['records'][0]['Id']
    else:
        # Ensure no NaN values are sent to Salesforce
        if 'Diocese' in context or 'Archdiocese' in context:
            account_data = {
                'Name': context if pd.notna(context) else "Church Name Missing",  # Provide a default if NaN
                'RecordTypeId': church_record_type_id,
                'mbfc__Church_Type__c': 'Diocese'
            }
        else:
            account_data = {
                'Name': context if pd.notna(context) else "Religious Name Missing",  # Provide a default if NaN
                'RecordTypeId': religious_record_type_id
            }

        # Remove keys with None values to avoid JSON serialization issues
        account_data = {k: v for k, v in account_data.items() if v is not None}
        
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Get Contact record ID from Salesforce
@lru_cache(maxsize=None)
def get_contact_id_by_record_number(record_number):
    if pd.isna(record_number):
        return None
    query = f"SELECT Id FROM Contact WHERE Archdpdx_Migration_Id__c = '{record_number}'"
    results = sf.query(query)
    if results['records']:
        return results['records'][0]['Id']
    return None

# Initialize an empty list to collect DataFrames or dictionaries
new_entries = []

# Process each row and each degree set
for index, row in df.iterrows():
    for col_set in column_sets:
        date = row[col_set['year']]
        if pd.notna(date):  # Only proceed if the year column is not NaN
            context = row.get(col_set.get('context'), None)
            account_id = get_or_create_church_account(context)
            contact_id = get_contact_id_by_record_number(row['Record_Number'])
            
            # Determine the mbfc__Affiliation__c value
            if 'Incardinated_From_Date' in col_set['year']:
                affiliation = 'Incardinated'
            elif 'Excardinated_To_Date' in col_set['year']:
                affiliation = 'Excardinated'
            elif 'Faculties_Granted_Date' in col_set['year']:
                faculties_value = row.get(col_set.get('affiliation', ''))
                if pd.isna(faculties_value):
                    affiliation = 'Faculties'
                else:
                    affiliation = f"Faculties ({faculties_value})"
                account_id = diocesan_account_id  # Override account ID for faculties
            elif 'Faculties_Restricted_Date' in col_set['year']:
                affiliation = 'Faculties (Restricted)'
                account_id = diocesan_account_id  # Override account ID for faculties
            elif 'Faculties_Withdrawn_Date' in col_set['year']:
                affiliation = 'Faculties (Withdrawn)'
                account_id = diocesan_account_id  # Override account ID for faculties
            elif 'Date_of_First_Vows' in col_set['year']:
                affiliation = 'First Vows'
            elif 'Date_of_Final_Vows' in col_set['year']:
                affiliation = 'Final Vows'
            else:
                affiliation = row.get(col_set.get('affiliation', ''), None)
            
            # Create a record for the staging table
            affiliation_record = {
                'RecordTypeId': ecclesial_affiliation_record_type_id,
                'mbfc__Person__c': contact_id,
                'mbfc__Completion_Date__c': date,
                'mbfc__Context__c': account_id,
                'mbfc__Category__c': 'Ecclesial Affiliations',
                'mbfc__Affiliation__c': affiliation
            }
            new_entries.append(affiliation_record)

# Convert all collected records to a DataFrame in one go
ecclesial_affiliations_staging = pd.DataFrame(new_entries)



In [None]:
# Apply the function to each row and create a new column with the unique ID
ecclesial_affiliations_staging['Archdpdx_Migration_Id__c'] = ecclesial_affiliations_staging.apply(create_unique_id, axis=1)

# Check for duplicates
ecclesial_affiliations_staging['Archdpdx_Migration_Id__c'].duplicated().value_counts()

In [None]:

# Save the new DataFrame to a CSV
ecclesial_affiliations_staging.to_csv('staging_files/Ecclesial_Affiliations_Staging.csv', index=False, encoding='utf-8-sig')

# Display the DataFrame
ecclesial_affiliations_staging.sample(10)

In [None]:
# Upsert Ecclesial Affiliation records

bulk_data = []
for row in ecclesial_affiliations_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

try:
    # Attempt to upsert Ecclesial Affiliation records into SF using Bulk API
    ecclesial_affil_upsert = sf.bulk.mbfc__Placement__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    ecclesial_affil_upsert_results = pd.DataFrame(ecclesial_affil_upsert)
    ecclesial_affil_upsert_results.to_csv('results_files/ecclesial_affil_upsert_results')

except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")

#FIXME: Handful of Ecclesial Affil records with error: [{'statusCode': 'FIELD_CUSTOM_VALIDATION_EXCEPTION', 'message': 'Context is required', 'fields': []}]"


# AFFILIATIONS


In [None]:
# @title Import Assignments.csv

import pandas as pd

# No longer needed...
# Organization_mapping = {
#     'Offices': 'Organization',
#     'Parishes': 'Church',
#     'RelCommunities': 'Religious',
#     'Schools': 'School',
#     'Vicariates': 'Deanery',
#     'NewmanCenters': 'Organization'
# }

df_affiliations = (
    pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Assignments (1).csv')
    .set_index('Record Number', verify_integrity=True)
    .drop(index='recNum', errors='ignore')  # Added errors='ignore' to prevent errors if 'recNum' does not exist
    .drop(columns=['Historic Name'], errors='ignore')  # Added errors='ignore' for the same reason
    .rename(columns=lambda x: x.replace(' ', '_'))  # Remove whitespace in column names
    .assign(Account_Ext_Id=lambda df: df['Organization_Table_Name'] + '_' + df['Organization_Table_Link'])
    .assign(mbfc__Person__r=lambda df: df['Assigned_Person'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    .assign(mbfc__Context__r=lambda df: df['Account_Ext_Id'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    # .assign(mbfc__Use_Custom_Title__c= True)
    .assign(mbfc__Category__c= 'Any All')
    # .assign(Archdpdx_Migration_Id__c= df_affiliations.index)
    .drop(columns=[
        'Assigned_Person'
        ,'Organization_Table_Name'
        ,'Organization_Table_Link'
        ,'Projected_Term_End_Date'
        ,'Term_Number'
        ,'Leave_Type' # Leave out 'Leave_Type' until mapped properly
        ])
    .rename(columns={
        'Duty_Load': 'Duty_Load__c',
        'Start_Date': 'mbfc__Start_Date__c',
        'End_Date': 'mbfc__Completion_Date__c',
        'Assignment_Title': 'mbfc__Affiliation__c',
        'Archdiocesan_Assignment': 'ADPDX_Archdiocesan_Assignment__c',
    })
    .replace({'ADPDX_Archdiocesan_Assignment__c': {'Yes': True, 'No': False, None: False}})
    .fillna('')
)

# Display a sample of the DataFrame to check the new structure
df_affiliations.sample(10)



In [None]:
#TODO: Required fields are missing: [mbfc__Category__c, mbfc__Affiliation__c] 
#TODO: INVALID_TYPE_ON_FIELD_IN_RECORD: Archdiocesan Assignment: value not of required type:  [ADPDX_Archdiocesan_Assignment__c]


In [None]:
# Set Archdpdx_Migration_Id__c External ID
df_affiliations['Archdpdx_Migration_Id__c'] = df_affiliations.index

# Create Job ID
df_affiliations['Archdpdx_Job_Id__c'] = curr_job_id



In [None]:
# Final cleanup
df_affiliations.drop(columns=['Account_Ext_Id'], inplace=True)

#FIXME: INVALID_FIELD: Foreign key external ID: relcommunities_23 not found for field Archdpdx_Migration_Id__c
#FIXME: INVALID_FIELD: Foreign key external ID: offices_0 not found for field Archdpdx_Migration_Id__c
#FIXME: Record #115 > FIELD_INTEGRITY_EXCEPTION: Start Date: invalid date: Tue Aug 01 00:00:00 GMT 1021 [mbfc__Start_Date__c

In [None]:
df_affiliations.to_csv('staging_files/affiliations_staging.csv', encoding='utf-8', index=False)

In [None]:
# @title Upsert Register Entry Records

bulk_data = []
for row in df_affiliations.itertuples(index=False):
    d = row._asdict()
    bulk_data.append(d)

In [None]:
# Attempt to use s-s's bulk 2.0 api
# with open('staging_files/affiliations_staging.csv', 'r', encoding='utf-8') as file:
#     csv_data = file.read()


# affiliation_upsert = sf.bulk2.mbfc__Placement__c.upsert('staging_files/affiliations_staging.csv', external_id_field='Archdpdx_Migration_Id__c', encode='utf-8')

In [None]:
# Upsert Salesforce records
# FIXME: Encoding is getting messed up and I'm unsure how to pass in a parameter that will fix this. 




try:
    # Attempt to upsert Affiliation records into SF using Bulk API
    affiliation_upsert = sf.bulk.mbfc__Placement__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=1000, use_serial=False)
    affiliation_upsert_results = pd.DataFrame(affiliation_upsert)
    affiliation_upsert_results.to_csv('results_files/affiliation_upsert_results')

except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")


# Post-Migration Manual Updates

1. Convert 'Offices' that are ADPDX Pastoral Centre offices into record type: 'Groups', and set their parentID to the Diocese (there are just 6 of these accounts).
1. Update the Religous Order records 'Religious Superior' lookup.
1. Set 'organization type' field value for each account in the 'organization' load: Offices, Newman Centres, Schools, Organizations
1. Consolidate education degree titles in 'Affiliation.Affiliation' picklist into the standard value
