<a href="https://colab.research.google.com/github/Cath-Strategic-Tech/adpdx_etl/blob/main/ADPDX_ClergyDB.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Introduction

The following notebook orchestrates the migration of ADPDX Accounts into Salesforce.


# Order of Operations

- Setup Enviro

  - [DONE] UDFs
  - [DONE] Load SF xref data

- ACCOUNTS

  - Extract Source Data
    - [DONE] Load 6 tables into separate dataframes
    - [DONE] Merge into single accounts table
    - [DONE] Fix the ExternalID so that it references the original source table, not the AccountRecordType
  - Transform
    - TODO: Strip phone numbers
    - TODO: validate email addresses
    - TODO: handle churches that aren't parishes (missions, non-diocesan parishes, etc.)
  - Load
    - [DONE] Vicariates
    - [DONE] Organizations (Parishes, Schools, Newman Centres, Offices)
    - Religious
      - [DONE] Religious Parent accounts
      - [DONE] Religious Communities
      - [DONE] Religious Superiors (Contacts, set AccountID to Rel. Parent)
        - TODO: Handle invalid email addresses
        - TODO: Handle duplicate entries
      - TODO: Update Religious Communities with lookup to Rel. Superior
  - TODO: Unit Tests
    - Num of Accounts, by type
    - Spot checking 3-5 account records & field values

- CONTACTS

  - Extract

    - [DONE] Import Contact records
    - TODO: Get Photo directory @soames

  - Analysis

    - [DONE] Check columns & row count (3016)
    - [DONE] Identify unique languages

  - Transform

    - Complete ETL of fields that are more complex (search for TODO)
    - [DONE] Create new df_contact_staging, renaming columns to SF APIs
    - [DONE] Drop columns that don't map to Contact
    - Migrate Languages field (waiting on next package version) @soames
      - TODO: transform `,` to `;` so imports to multi-select list correctly
    - TODO: Concat Mailing Street Address lines into one
    - TODO: Handle Private Addresses: decide whether to make code changes or to NOT use a custom Private Address field.
    - [DONE] Update boolean fields to True/False
    - [DONE] Set Contact Record Type (UDF)
    - [DONE] Validate, drop invalid emails
    - [DONE] Generate ExternalID > 'Archdpdx_External_Id\_\_c'
    - TODO: Preferred Email/Phone > where blank, set a default. Currently, all are getting set to 'Personal' and 'Mobile.'
    - TODO: Ecclesial Status (not mapping correctly)
    - [DONE] DROP columns that haven't been mapped yet

  - Load
    - [DONE] Set JobID to curr_job_id
    - [DONE] Handle character encoding that is getting messed up
    - FIXME: Fix why the simple-salesforce insert isn't working

- CONTACTS > SPOUSES

  - TODO: Contact's Spouses

- CONTACTS > PHOTOS

  - [DONE] Investigate how to migrate photos into a RTF (rich text field)
  - TODO: Contact's photos @slum-mfc

- CONTACTS > REGISTER ENTRIES

  - Parse columns into types of Sacraments or Notations
  - For lookups to Celebrants, query SF for contacts, create missing records
  - Generate External ID, apply to df
  - Clean up (remove extra columns, NaNs)
  - Upsert records

- CONTACTS > AFFILIATIONS

  - TODO: Map the various Contact fields that are actually Affiliations (start with manual migration)
    - Education/Degrees
    - Minor Orders
    - Religious Vows
    - Candidacy records (should this be another object?)
    - In/Excardination
    - Faculties

- AFFILIATIONS TABLE

  - Extract

    - [DONE] Turn the 'Org Table Name' & 'org Table Link' columns into External ID
    - Map in the Account IDs from SF

  - Transform

    - Parse RecordTypeId
    - Parse Category
    - Map columns to SF field APIs

  - Load


# Setup Enviro


In [9]:
# !conda install -y simple-salesforce
# !conda install -y email_validator
# !conda install -y python-dotenv

# !conda install import-ipynb


In [10]:
# enviro setup

import pandas as pd

from datetime import datetime
now = datetime.now()

from simple_salesforce import Salesforce

In [11]:
# import environment variables (SF login credentials)
from dotenv import load_dotenv
import os

load_dotenv()

True

In [12]:
# @title Global Variables { run: "auto", vertical-output: true, display-mode: "both" }

target_enviro = "adpdx_devpro" # @param {type:"string"}

diocesan_account_id_devpro = "001Dx00001CwMTQIA3" # @param {type:"string"}

# @markdown The `run_upserts` variable controls whether or not upserts to Salesforce are executed when the notebook is run.
run_upserts = "True" # @param ["True", "False"]

In [13]:
# ADPDX dev_pro credentials
adpdx_devpro_user = os.getenv('ADPDX_DEVPRO_USER')
adpdx_devpro_pass = os.getenv('ADPDX_DEVPRO_PASS')
adpdx_devpro_token = os.getenv('ADPDX_DEVPRO_TOKEN')

# instantiate a SF session object
sf = Salesforce(domain='test', username=adpdx_devpro_user, password=adpdx_devpro_pass, security_token=adpdx_devpro_token)

## UDFs


In [14]:
# Job ID Incrementer

def update_job_id(file_name):
    # Open the file in read mode and get the current job ID
    with open(file_name, 'r') as file:
        current_job_id = int(file.readline())

    # Increment the job ID
    new_job_id = current_job_id + 1

    # Open the file in write mode and update the job ID
    with open(file_name, 'w') as file:
        file.write(str(new_job_id))

    # Return the new job ID
    return new_job_id


# Concatenates DataFrame columns into an External ID

def concat_columns(df, columns, new_column, separator='_'):
    """
    Concatenates the values from specified columns into a single string
    with the specified separator and populates a new column in the DataFrame.

    Args:
    - df: pandas DataFrame
    - columns: list of column names to concatenate
    - new_column: name of the new column to be created
    - separator: separator to use between concatenated values (default is '_')

    Returns:
    - Updated pandas DataFrame with the new column
    """
    df[new_column] = df[columns].astype(str).apply(lambda x: separator.join(x), axis=1)
    return df
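
`update_job_id` assumes the counter file already exists and holds an integer on its first line. A hedged variant that also copes with a missing or empty file might look like the sketch below; `next_job_id` is a hypothetical name, and the temporary directory stands in for the real jobs folder.

```python
import os
import tempfile

def next_job_id(path):
    """Read the integer stored in `path`, add 1, write it back.

    Starts from 0 when the file is missing or empty, so the first job ID is 1.
    """
    current = 0
    if os.path.exists(path):
        with open(path, 'r') as f:
            text = f.read().strip()
        if text:
            current = int(text)
    new_id = current + 1
    with open(path, 'w') as f:
        f.write(str(new_id))
    return new_id

# Demonstration against a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    counter = os.path.join(tmp, 'job_id')
    first = next_job_id(counter)
    second = next_job_id(counter)

print(first, second)
```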



## Extract Salesforce xref data

The following cells download all records from the target Salesforce enviro for the following objects:

- RecordTypes
- Users
- Accounts
- Contacts
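
One caveat on "all records": `sf.query` returns a single batch of results, while simple-salesforce also offers `query_all`, which follows the pagination until the result set is complete. The `(2000, ...)` shapes reported for Accounts and Contacts below are consistent with exactly one full batch, so those extracts may be truncated. The sketch below shows the pattern offline; `fetch_all`, `records_without_attributes`, and `FakeClient` are hypothetical names, and the fake client only imitates the shape of a query result.

```python
def records_without_attributes(result):
    """Return the result's records with the per-record 'attributes' key removed."""
    return [
        {key: value for key, value in record.items() if key != 'attributes'}
        for record in result['records']
    ]

def fetch_all(client, soql):
    """Run a SOQL query through query_all so that every batch is collected."""
    return records_without_attributes(client.query_all(soql))

class FakeClient:
    """Stand-in for a Salesforce session so the sketch runs without credentials."""
    def query_all(self, soql):
        return {
            'totalSize': 2,
            'done': True,
            'records': [
                {'attributes': {'type': 'Account'}, 'Id': '001A', 'Name': 'Example One'},
                {'attributes': {'type': 'Account'}, 'Id': '001B', 'Name': 'Example Two'},
            ],
        }

rows = fetch_all(FakeClient(), 'SELECT Id, Name FROM Account')
print(rows)
```

With a real session, `pd.DataFrame(fetch_all(sf, soql))` would give the same kind of frame the cells below build by hand.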


In [15]:
# get all ACTIVE SF users

sf_users = sf.query('Select Alias, FirstName, LastName, Username, id from User WHERE IsActive = True')
df_sf_users = pd.DataFrame(sf_users['records'])
df_sf_users = df_sf_users.drop(columns = 'attributes')
df_sf_users.shape

(8, 5)

In [16]:
# get all SF Record Types
get_all_recordTypes = 'Select Id, Name, DeveloperName, sObjecttype, namespaceprefix from RecordType'

# get list of records, add to dataframe
sf_recordTypes = sf.query(get_all_recordTypes)
df_sf_recordTypes = pd.DataFrame(sf_recordTypes['records'])
df_sf_recordTypes = df_sf_recordTypes.drop(columns = 'attributes')

# Create a dictionary mapping 'DeveloperName' to 'Id' for faster lookup
record_types_mapping = df_sf_recordTypes.set_index('DeveloperName')['Id'].to_dict()

df_sf_recordTypes

Unnamed: 0,Id,Name,DeveloperName,SobjectType,NamespacePrefix
0,012Dx0000007yCpIAI,Property,Property,Account,
1,012Dx0000007yIOIAY,Ecclesial Affiliation,Ecclesial_Affiliation,mbfc__Placement__c,
2,012Dx0000007yITIAY,Pastoral Assignments,Assignments_Clergy,mbfc__Placement__c,
3,012Dx0000007yIYIAY,Staff,Staff,mbfc__Placement__c,
4,012Dx0000007yIdIAI,Lay Person,Lay_Person,mbfc__Placement__c,
5,012Dx0000007yOCIAY,Diocesan Appointment,Diocesan_Appointment,mbfc__Placement__c,
6,012Dx0000007yOHIAY,Clergy/Religious Residence,Clergy_Religious_Residence,mbfc__Placement__c,
7,012Dx0000007yOMIAY,Education,Education,mbfc__Placement__c,
8,012Dx0000007yORIAY,Ministerial Status,Ministerial_Status,mbfc__Placement__c,
9,012Dx0000007yTgIAI,z) All Types,All_Types,mbfc__Placement__c,
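
The lookup dict above is keyed on `DeveloperName` alone. Record types on different objects can share a `DeveloperName`, and in that case later rows silently overwrite earlier ones when the dict is built. The dump under In [33] shows `Education` and `Lay_Person` with IDs that differ from the table above, which suggests this is happening here. A minimal sketch of a safer key, assuming made-up IDs and a hypothetical second `Education` record type on Contact:

```python
def build_record_type_map(record_types):
    """Key each record type by (SobjectType, DeveloperName) so names cannot collide."""
    return {
        (rt['SobjectType'], rt['DeveloperName']): rt['Id']
        for rt in record_types
    }

def shared_developer_names(record_types):
    """Report every DeveloperName that appears on more than one object."""
    seen = {}
    for rt in record_types:
        seen.setdefault(rt['DeveloperName'], set()).add(rt['SobjectType'])
    return {name: sorted(objs) for name, objs in seen.items() if len(objs) > 1}

# Illustrative rows only; the Ids are placeholders, not real Salesforce Ids
sample = [
    {'Id': 'RT_1', 'DeveloperName': 'Education', 'SobjectType': 'mbfc__Placement__c'},
    {'Id': 'RT_2', 'DeveloperName': 'Education', 'SobjectType': 'Contact'},
    {'Id': 'RT_3', 'DeveloperName': 'Church', 'SobjectType': 'Account'},
]

rt_map = build_record_type_map(sample)
print(rt_map[('Account', 'Church')])
print(shared_developer_names(sample))
```

An alternative with the same effect would be to filter the record types to `SobjectType == 'Account'` before building the Accounts mapping.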


In [17]:
# get SF Account
get_all_accounts = 'Select id, Name, RecordTypeId, Type, mbfc__Parish_Code__c, Job_Id__c, Archdpdx_Migration_Id__c from Account'

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')
df_sf_accounts.shape

(2000, 7)

In [18]:
# get SF Contacts
get_all_contacts = 'Select id, Name, npe01__Type_of_Account__c, RecordTypeId, Archdpdx_Migration_Id__c, CreatedById from Contact'

# get list of records, add to dataframe
sf_contacts = sf.query(get_all_contacts)
df_sf_contacts = pd.DataFrame(sf_contacts['records'])
df_sf_contacts = df_sf_contacts.drop(columns = 'attributes')
df_sf_contacts.shape

(2000, 6)

# ACCOUNTS


## Extract


### Load ArchdPDX csvs as DataFrames

ADPDX data for organizations is held in 6 tables, all of which will be migrated into Salesforce's Accounts object.


In [19]:
df_offices = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Offices.csv', skiprows= lambda x: x in [1])
df_offices["src_table"] = 'Offices'
df_offices["AccountRecordType"] = 'Organization'
df_offices.rename({"Name": "Account Name"}, axis="columns", inplace=True)
df_offices.columns

Index(['Record Number', 'Common Name', 'Account Name',
       'Archdiocese Assigns Clergy', 'Locator Description', 'Mailing Address',
       'Mailing Address 2', 'Mailing Address City', 'Mailing Address State',
       'Mailing Address Province', 'Mailing Address Postal Code',
       'Mailing Address Country', 'Phone', 'Fax', 'Email', 'Web Site',
       'src_table', 'AccountRecordType'],
      dtype='object')

In [20]:
df_parishes = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Parishes (3).csv', dtype={'Vicariate': 'object'}, skiprows= lambda x: x in [1])
df_parishes["src_table"] = 'Parishes'
df_parishes["AccountRecordType"] = 'Church'
df_parishes.rename({"Parish Formal Name": "Account Name"}, axis="columns", inplace=True)
df_parishes.columns

Index(['Record Number', 'Common Name', 'Sort Name', 'Parish Name',
       'Account Name', 'Parish City', 'Archdiocese Assigns Clergy',
       'Mission Of', 'Established', 'Vicariate', 'Non-Latin',
       'Locator Description', 'Mailing Address', 'Mailing Address 2',
       'Mailing Address City', 'Mailing Address State',
       'Mailing Address Province', 'Mailing Address Postal Code',
       'Mailing Address Country', 'County', 'Phone', 'Fax', 'Email',
       'Web Site', 'Disabled Access', 'Sanctuary Capacity',
       'Lat/Long Coordinates Decimal', 'Google Small Embed URL',
       'Miles to Pastoral Center', 'Schedule 1 Head', 'Schedule 1 Text',
       'Schedule 2 Head', 'Schedule 2 Text', 'Schedule 3 Head',
       'Schedule 3 Text', 'Schedule 4 Head', 'Schedule 4 Text',
       'Schedule 5 Head', 'Schedule 5 Text', 'Schedule 6 Head',
       'Schedule 6 Text', 'Schedule 7 Head', 'Schedule 7 Text', 'src_table',
       'AccountRecordType'],
      dtype='object')

In [21]:
df_religious = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/RelCommunities.csv', skiprows= lambda x: x in [1])
df_religious["src_table"] = 'RelCommunities'
df_religious["AccountRecordType"] = 'Religious'
df_religious.rename({"Community Name": "Account Name"}, axis="columns", inplace=True)
df_religious.columns

Index(['Record Number', 'Common Name', 'Account Name', 'Community City',
       'Archdiocese Assigns Clergy', 'Order Full Name', 'Order Common Name',
       'Order Letters', 'Men or Women', 'Non-Latin Rite', 'Show Order in Name',
       'Description', 'Locator Description', 'Mailing Address',
       'Mailing Address 2', 'Mailing Address City', 'Mailing Address State',
       'Mailing Address Province', 'Mailing Address Postal Code',
       'Mailing Address Country', 'Phone', 'Fax', 'Email', 'Web Site',
       'Religious Order', 'Secular Order', 'Diocesan Order',
       'Pontifical Order', 'Local Superior', 'Major Superior Name',
       'Major Superior Phone', 'Major Superior Email', 'src_table',
       'AccountRecordType'],
      dtype='object')

In [22]:
df_schools = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Schools.csv', skiprows= lambda x: x in [1])
df_schools["src_table"] = 'Schools'
df_schools["AccountRecordType"] = 'School'
df_schools.rename({"School Name": "Account Name"}, axis="columns", inplace=True)
df_schools.columns

Index(['Record Number', 'Common Name', 'Account Name', 'School City',
       'Archdiocese Assigns Clergy', 'Parish Link', 'Vicariate Link',
       'Archdiocesan School Code', 'Grades Provided', 'Established',
       'Locator Description', 'Mailing Address 1', 'Mailing Address 2',
       'Mailing Address City', 'Mailing Address State',
       'Mailing Address Province', 'Mailing Address Zip',
       'Mailing Address Country', 'Phone', 'Fax', 'Email', 'Web Site',
       'Lat/Long Coordinates Decimal', 'Google Small Embed URL', 'src_table',
       'AccountRecordType'],
      dtype='object')

In [23]:
df_vicariates = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Vicariates.csv', skiprows= lambda x: x in [1])
df_vicariates["src_table"] = 'Vicariates'
df_vicariates["AccountRecordType"] = 'Deanery'
# As we want to designate the Common Name as what will be the Account Name in Salesforce, we are renaming these columns in a different pattern than prior CSVs.
df_vicariates.rename({"Common Name": "Account Name"}, axis="columns", inplace=True)

df_vicariates.columns

Index(['Record Number', 'Account Name', 'Vicariate Name',
       'Archdiocese Assigns Clergy', 'src_table', 'AccountRecordType'],
      dtype='object')

In [24]:
df_newman = pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/NewmanCenters.csv', skiprows= lambda x: x in [1])
df_newman["src_table"] = 'NewmanCenters'
df_newman["AccountRecordType"] = 'Organization'
df_newman.rename({"Newman Center Name": "Account Name", "Newman Center City": "Mailing Address City2"}, axis="columns", inplace=True)
df_newman.columns

Index(['Record Number', 'Common Name', 'Account Name', 'Mailing Address City2',
       'Archdiocese Assigns Clergy', 'Established', 'Locator Description',
       'Mailing Address', 'Mailing Address 2', 'Mailing Address City',
       'Mailing Address State', 'Mailing Address Province',
       'Mailing Address Postal Code', 'Mailing Address Country', 'Phone',
       'Fax', 'Email', 'Web Site', 'Lat/Long Coordinates Decimal',
       'Google Small Embed URL', 'Miles to Pastoral Center', 'Schedule 1 Head',
       'Schedule 1 Text', 'Schedule 2 Head', 'Schedule 2 Text',
       'Schedule 3 Head', 'Schedule 3 Text', 'Schedule 4 Head',
       'Schedule 4 Text', 'Schedule 5 Head', 'Schedule 5 Text',
       'Schedule 6 Head', 'Schedule 6 Text', 'Schedule 7 Head',
       'Schedule 7 Text', 'src_table', 'AccountRecordType'],
      dtype='object')

Each of the 6 tables has an overlapping but distinct set of columns, making it challenging to conform these tables into a single staging table.

In addition, columns that correspond to the same field in Salesforce are named differently in each table (e.g. 'Parish City' vs. 'Community City' vs. 'School City' vs. 'Newman Center City').
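
One way to tame the naming differences is to rename each table's variant to a shared column name before concatenating. The sketch below is only an illustration of that idea: the canonical name `City`, the `CITY_COLUMNS` map, and the `conform` helper are hypothetical, and the two tiny frames stand in for the real CSVs.

```python
import pandas as pd

# Per-table name of the column that holds the organization's city
CITY_COLUMNS = {
    'Parishes': 'Parish City',
    'RelCommunities': 'Community City',
    'Schools': 'School City',
    'NewmanCenters': 'Newman Center City',
}

def conform(df, src_table, canonical='City'):
    """Return a copy with the table-specific city column renamed and src_table tagged."""
    out = df.copy()
    city_col = CITY_COLUMNS.get(src_table)
    if city_col in out.columns:
        out = out.rename(columns={city_col: canonical})
    out['src_table'] = src_table
    return out

# Stand-in frames with one row each
parishes = pd.DataFrame({'Record Number': [1], 'Parish City': ['Albany']})
schools = pd.DataFrame({'Record Number': [24], 'School City': ['Grants Pass']})

combined = pd.concat(
    [conform(parishes, 'Parishes'), conform(schools, 'Schools')],
    ignore_index=True,
)
print(combined)
```

Renaming first keeps the merged table from growing one sparse column per source table for what is logically the same field.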


In [25]:
print('TABLE: (ROWS, COLUMNS)\n')

print(f'Offices:    {df_offices.shape}')
print(f'Parishes:   {df_parishes.shape}')
print(f'Religious:  {df_religious.shape}')
print(f'Schools:    {df_schools.shape}')
print(f'Vicariates: {df_vicariates.shape}')
print(f'Newman Ctr: {df_newman.shape}')

TABLE: (ROWS, COLUMNS)

Offices:    (35, 18)
Parishes:   (151, 45)
Religious:  (70, 34)
Schools:    (56, 26)
Vicariates: (18, 6)
Newman Ctr: (4, 37)


### Merge DFs into a single Accounts DF

This step takes 6 different tables and combines them into a single Accounts table for cleaning and staging.


In [26]:
# init list of DataFrames
src_accounts = [df_offices, df_parishes, df_religious, df_schools, df_vicariates, df_newman]

# concats the various Account dataframes into one large table
accounts = pd.concat(src_accounts, ignore_index=True)

accounts.head(5)

Unnamed: 0,Record Number,Common Name,Account Name,Archdiocese Assigns Clergy,Locator Description,Mailing Address,Mailing Address 2,Mailing Address City,Mailing Address State,Mailing Address Province,...,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan School Code,Grades Provided,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2
0,1,Pastoral Center,Pastoral Center,Yes,,2838 E Burnside St,,Portland,OR,,...,,,,,,,,,,
1,3,Catholic Sentinel,Catholic Sentinel,No,,2838 E Burnside St,,Portland,OR,,...,,,,,,,,,,
2,4,Catholic Cemeteries,Catholic Cemeteries,No,,333 SW Skyline Blvd,,Portland,OR,,...,,,,,,,,,,
3,6,Griffin Center,Griffin Center,No,,11957 SE Fuller Rd,,Milwaukie,OR,,...,,,,,,,,,,
4,11,Providence Portland Medical Center,Providence Portland Medical Center,Yes,,4805 NE Glisan St,,Portland,OR,,...,,,,,,,,,,


Time to do some table column renaming and re-organizing!


In [27]:
# rename column headers to conform account names to the SF data model
accounts.rename({"Common Name": "Name, City"}, axis="columns", inplace=True)

accounts.rename(
    columns={
        'Account Name': 'Name',
        'Mailing Address': 'BillingStreet',
        'Mailing Address 2': 'BillingStreet2',
        'Mailing Address City': 'BillingCity',
        'Mailing Address State': 'BillingState',
        'Mailing Address Postal Code': 'BillingPostalCode',
        'Mailing Address Country': 'BillingCountry',
        'Email': 'mbfc__Email__c',
        'Web Site': 'Website',
        'Order Common Name': 'mbfc__Abbreviation__c',
        'Order Letters': 'mbfc__Religious_Suffix__c',
        'Men or Women': 'mbfc__Type_Members__c'
    },
    inplace=True
)

# reorder columns
col = accounts.pop('Name')
accounts.insert(2, col.name, col)

col = accounts.pop('Parish Name')
accounts.insert(3, col.name, col)

col = accounts.pop('AccountRecordType')
accounts.insert(1, col.name, col)

accounts[accounts['BillingStreet2'].notna()]

Unnamed: 0,Record Number,AccountRecordType,"Name, City",Name,Parish Name,Archdiocese Assigns Clergy,Locator Description,BillingStreet,BillingStreet2,BillingCity,...,Major Superior Email,School City,Parish Link,Vicariate Link,Archdiocesan School Code,Grades Provided,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2
14,32,Organization,Diaconate Office,Diaconate Office,,Yes,,Pastoral Center,2838 E Burnside St,Portland,...,,,,,,,,,,
32,58,Organization,Office of Marketing and Communications,Office of Marketing and Communications,,Yes,,Pastoral Center,2838 E Burnside St,Portland,...,,,,,,,,,,
35,1,Church,"Our Lady of Perpetual Help, St Mary’s, Albany","Our Lady of Perpetual Help, St Mary’s",,Yes,SW Ellsworth St between 8th and 9th Streets,"Our Lady of Perpetual Help, St Mary’s Parish",815 Broadalbin St SW,Albany,...,,,,,,,,,,
36,2,Church,"St. Andrew Dũng-Lạc Mission, Aloha",St. Andrew Dũng-Lạc,,No,SW Grabhorn Rd/209th Ave and Farmington Rd,St. Andrew Dũng-Lạc Mission,7390 SW Grabhorn Rd,Aloha,...,,,,,,,,,,
37,3,Church,"St. Elizabeth Ann Seton, Aloha",St. Elizabeth Ann Seton,,Yes,,St. Elizabeth Ann Seton Parish,3145 SW 192nd Ave,Aloha,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
236,62,Religious,"Work of Jesus the High Priest, Gresham (OJSS)",Work of Jesus the High Priest,,No,,OJSS Community,451 NW 1st St,Gresham,...,,,,,,,,,,
238,64,Religious,"Heralds of the Good News, Portland (HGN)",Heralds of the Good News,,No,,c/o Chancellor,2838 E Burnside St,Portland,...,rkappumkal@gmail.com,,,,,,,,,
239,65,Religious,"Missionary Oblates of Mary Immaculate, Rome, I...",Missionary Oblates of Mary Immaculate,,No,,Missionary Oblates of Mary Immaculate,Via Aurelia 290,Roma,...,gensec@omigen.org,,,,,,,,,
247,73,Religious,"Brothers of Saint John, Laredo, TX (CSJ)",Brothers of Saint John,,No,,St. John Priory,505 Century Dr S,Laredo,...,,,,,,,,,,


In [28]:
# export merged tables DESCRIPTION to CSV for mapping
accounts.describe(include='all').transpose().to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/working/accounts.csv')
accounts.describe(include='all').transpose()

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Record Number,334.0,,,,54.5,41.389801,1.0,21.25,45.0,76.75,173.0
AccountRecordType,334,5,Church,151,,,,,,,
"Name, City",316,316,Pastoral Center,1,,,,,,,
Name,334,291,St. Mary,5,,,,,,,
Parish Name,5,5,St. Anne,1,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...
Grades Provided,52,12,PS-8,20,,,,,,,
Mailing Address 1,56,55,4420 SW St Marys Dr,2,,,,,,,
Mailing Address Zip,56.0,,,,97222.446429,124.9586,97005.0,97134.75,97217.5,97301.0,97526.0
Vicariate Name,18,18,Albany-Corvallis,1,,,,,,,


## Transform


In [29]:
# Create a single BillingAddress field
# billingstreet = str(f"Mailing Addresss 1 /n Mailing Address 2")
# accounts['BillingStreet'] = accounts['Mailing Address 1'].astype(str) + accounts['Mailing Address 2'].astype(str)
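
The commented-out attempt above concatenates the two columns with `astype(str)`, which would turn missing values into the literal text `nan` and join the lines without a separator. A minimal sketch of a safer row-level helper follows; `join_street_lines` is a hypothetical name, and joining with a newline is an assumption about how the street field should be laid out.

```python
import math

def join_street_lines(*lines):
    """Join non-empty address lines with a newline, skipping None, NaN and blanks."""
    kept = []
    for line in lines:
        if line is None:
            continue
        if isinstance(line, float) and math.isnan(line):
            continue
        text = str(line).strip()
        if text:
            kept.append(text)
    return '\n'.join(kept)

print(join_street_lines('Pastoral Center', '2838 E Burnside St'))
print(join_street_lines('2838 E Burnside St', float('nan')))
```

Applied to the merged table, this could be called once per row with the street columns in the order they should appear.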

### AccountRecordType & ChurchType


In [30]:
# Sets every row whose AccountRecordType is Church to Church Type 'Parish'. This may need nuancing (missions, non-diocesan parishes, etc.)
accounts.loc[accounts['AccountRecordType'] == 'Church', 'mbfc__Church_Type__c'] = 'Parish'
accounts[accounts['AccountRecordType'] == 'Church'].head(5)


Unnamed: 0,Record Number,AccountRecordType,"Name, City",Name,Parish Name,Archdiocese Assigns Clergy,Locator Description,BillingStreet,BillingStreet2,BillingCity,...,School City,Parish Link,Vicariate Link,Archdiocesan School Code,Grades Provided,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Church_Type__c
35,1,Church,"Our Lady of Perpetual Help, St Mary’s, Albany","Our Lady of Perpetual Help, St Mary’s",,Yes,SW Ellsworth St between 8th and 9th Streets,"Our Lady of Perpetual Help, St Mary’s Parish",815 Broadalbin St SW,Albany,...,,,,,,,,,,Parish
36,2,Church,"St. Andrew Dũng-Lạc Mission, Aloha",St. Andrew Dũng-Lạc,,No,SW Grabhorn Rd/209th Ave and Farmington Rd,St. Andrew Dũng-Lạc Mission,7390 SW Grabhorn Rd,Aloha,...,,,,,,,,,,Parish
37,3,Church,"St. Elizabeth Ann Seton, Aloha",St. Elizabeth Ann Seton,,Yes,,St. Elizabeth Ann Seton Parish,3145 SW 192nd Ave,Aloha,...,,,,,,,,,,Parish
38,4,Church,"St. Peter the Fisherman Mission, Arch Cape",St. Peter the Fisherman,,Yes,79441 Hwy 101 S,St. Peter the Fisherman Mission,PO Box 29,Seaside,...,,,,,,,,,,Parish
39,5,Church,"Our Lady of the Mountain, Ashland",Our Lady of the Mountain,,Yes,,Our Lady of the Mountain Parish,987 Hillview Dr,Ashland,...,,,,,,,,,,Parish


### Generate ExternalId


In [31]:
# Generate an External ID
# columns_to_concate = ['AccountRecordType', 'Record Number']
# accounts = concat_columns(accounts, columns_to_concate, 'Archdpdx_Migration_Id__c', separator='_')

In [32]:
# NEW Generate an External ID
columns_to_concate = ['src_table', 'Record Number']
accounts = concat_columns(accounts, columns_to_concate, 'Archdpdx_Migration_Id__c', separator='_')
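
The switch from `AccountRecordType` to `src_table` matters because several source tables share a record type (Offices and NewmanCenters are both loaded as `Organization`) while each table numbers its own records. The sketch below illustrates the collision with made-up rows; `make_external_id` and `duplicates` are hypothetical helpers, and the record numbers are for illustration only.

```python
from collections import Counter

def make_external_id(row, key):
    """Build '<prefix>_<record number>' using the given column as the prefix."""
    return f"{row[key]}_{row['Record Number']}"

def duplicates(ids):
    """Return the IDs that occur more than once."""
    return sorted(value for value, count in Counter(ids).items() if count > 1)

# Illustrative rows: three tables that each have a record numbered 1
rows = [
    {'src_table': 'Offices', 'AccountRecordType': 'Organization', 'Record Number': 1},
    {'src_table': 'NewmanCenters', 'AccountRecordType': 'Organization', 'Record Number': 1},
    {'src_table': 'Parishes', 'AccountRecordType': 'Church', 'Record Number': 1},
]

by_type = [make_external_id(row, 'AccountRecordType') for row in rows]
by_table = [make_external_id(row, 'src_table') for row in rows]

print(duplicates(by_type))
print(duplicates(by_table))
```

A check like `duplicates(...)` on the real `Archdpdx_Migration_Id__c` column before the load would catch any remaining collisions, since an upsert on a non-unique external ID would merge distinct records.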

In [33]:
# map each AccountRecordType (a RecordType DeveloperName) to its Salesforce RecordTypeId
accounts['RecordTypeId'] = accounts['AccountRecordType'].map(record_types_mapping)
record_types_mapping

{'Property': '012Dx0000007yCpIAI',
 'Ecclesial_Affiliation': '012Dx0000007yIOIAY',
 'Assignments_Clergy': '012Dx0000007yITIAY',
 'Staff': '012Dx0000007yIYIAY',
 'Lay_Person': '012Dx0000009TK3IAM',
 'Diocesan_Appointment': '012Dx0000007yOCIAY',
 'Clergy_Religious_Residence': '012Dx0000007yOHIAY',
 'Education': '012Dx0000009TKBIA2',
 'Ministerial_Status': '012Dx0000007yORIAY',
 'All_Types': '012Dx0000007yTgIAI',
 'Religious': '012Dx0000009TK6IAM',
 'Church': '012Dx0000009TJxIAM',
 'Deanery': '012Dx0000009TJyIAM',
 'Group': '012Dx0000009TJzIAM',
 'School': '012Dx0000009TK1IAM',
 'Consecrated': '012Dx0000009TK2IAM',
 'Permanent_Deacon': '012Dx0000009TK4IAM',
 'Priest': '012Dx0000009TK5IAM',
 'All': '012Dx0000009TK7IAM',
 'Chancery_Users': '012Dx0000009TK8IAM',
 'Diocean_Users': '012Dx0000009TK9IAM',
 'Parish_Users': '012Dx0000009TKAIA2',
 'Employment': '012Dx0000009TKCIA2',
 'Ministry_Volunteer': '012Dx0000009TKDIA2',
 'Parishioner': '012Dx0000009TKEIA2',
 'Organization': '012Hu000001pkqEI

In [34]:
#TODO: Set 'organization type' field value for each account in the 'organization' load: Offices, Newman Centres, Schools, Organizations
# Might be best if it is set manually at the end of the migration.

## Load


### Generate a new Job ID


In [35]:
# increment the job_id
file_name = '/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/jobs/job_id'
curr_job_id = update_job_id(file_name)
print(f"New job ID: {curr_job_id}")

# add/update account DF with job_id
accounts["Job_Id__c"] = curr_job_id


New job ID: 50


### A) Vicariates


In [36]:
vicariates = accounts[accounts['AccountRecordType'] == 'Deanery']

vicariates = vicariates[[
    'Record Number',
    'Name',
    # 'AccountRecordType',
    'Job_Id__c',
    'Archdpdx_Migration_Id__c',
    'RecordTypeId'
    ]]

# add parentid
vicariates["mbfc__Diocese__c"] = diocesan_account_id_devpro
vicariates['ParentId'] = diocesan_account_id_devpro
vicariates['mbfc__Church_Type__c'] = 'Deanery'

vicariates.rename(columns={
        # 'Name, City': 'Name',
        'External_Id': 'Archdpdx_Migration_Id__c'
    }, inplace=True)

# index by the source Record Number (the earlier reset_index() call discarded its result, so it is dropped)
vicariates.set_index('Record Number', inplace=True)

vicariates

Unnamed: 0_level_0,Name,Job_Id__c,Archdpdx_Migration_Id__c,RecordTypeId,mbfc__Diocese__c,ParentId,mbfc__Church_Type__c
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
1,Albany-Corvallis Vicariate,50,Vicariates_1,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
2,"Beaverton, Suburban Vicariate",50,Vicariates_2,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
3,Columbia County Vicariate,50,Vicariates_3,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
4,Downtown Portland Vicariate,50,Vicariates_4,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
5,"East Portland, Suburban Vicariate",50,Vicariates_5,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
6,Marion County Vicariate,50,Vicariates_6,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
7,Metropolitan Eugene Vicariate,50,Vicariates_7,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
8,Metropolitan Salem Vicariate,50,Vicariates_8,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
9,North Coast Vicariate,50,Vicariates_9,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery
10,Northeast Portland Vicariate,50,Vicariates_10,012Dx0000009TJyIAM,001Dx00001CwMTQIA3,001Dx00001CwMTQIA3,Deanery


#### Export Vicariates to CSV


In [37]:
# export to CSV
vicariates.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/vicariates_staging.csv')


#### Upsert Vicariates


In [38]:
bulk_data = []
for row in vicariates.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    vicariate_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    upserts = pd.DataFrame(vicariate_upsert)

    print(upserts)
    

    success  created                  id errors
0      True    False  001Dx00001FaTb0IAF     []
1      True    False  001Dx00001FaTb1IAF     []
2      True    False  001Dx00001FaTb2IAF     []
3      True    False  001Dx00001FaTb3IAF     []
4      True    False  001Dx00001FaTb4IAF     []
5      True    False  001Dx00001FaTb5IAF     []
6      True    False  001Dx00001FaTb6IAF     []
7      True    False  001Dx00001FaTb7IAF     []
8      True    False  001Dx00001FaTb8IAF     []
9      True    False  001Dx00001FaTb9IAF     []
10     True    False  001Dx00001FaTbAIAV     []
11     True    False  001Dx00001FaTbBIAV     []
12     True    False  001Dx00001FaTbCIAV     []
13     True    False  001Dx00001FaTbDIAV     []
14     True    False  001Dx00001FaTbEIAV     []
15     True    False  001Dx00001FaTbFIAV     []
16     True    False  001Dx00001FaTbGIAV     []
17     True    False  001Dx00001FaTbHIAV     []


In [39]:
# create a map of Vicariate lookup ids to the unique ids generated on the Vicariate records
# vicariates_externalid_map = vicariates.set_index('Record Number')['Archdpdx_Migration_Id__c'].to_dict()

# vicariates_externalid_map

In [40]:
# Generate an errors log (only when the upsert above actually ran,
# otherwise vicariate_upsert is undefined)
import csv

if run_upserts == 'True':
    keys = vicariate_upsert[0].keys()

    with open('vicariate_results', 'w', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, keys)
        writer.writeheader()
        writer.writerows(vicariate_upsert)
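
The results file lists one row per submitted record but does not say which record each row belongs to. A minimal sketch for pairing failures with their external IDs is shown below. It assumes the results come back in the same order as the submitted rows. `failed_rows` is a hypothetical helper, and the error entry in the sample is invented for illustration.

```python
def failed_rows(submitted, results):
    """Pair each submitted row with its result and return only the failures."""
    if len(submitted) != len(results):
        raise ValueError('submitted rows and results differ in length')
    failures = []
    for row, result in zip(submitted, results):
        if not result['success']:
            failures.append({
                'external_id': row['Archdpdx_Migration_Id__c'],
                'errors': result['errors'],
            })
    return failures

# Sample data shaped like the upsert output above; the error entry is made up
submitted = [
    {'Archdpdx_Migration_Id__c': 'Vicariates_1'},
    {'Archdpdx_Migration_Id__c': 'Vicariates_2'},
]
results = [
    {'success': True, 'created': False, 'id': '001Dx00001FaTb0IAF', 'errors': []},
    {'success': False, 'created': False, 'id': None,
     'errors': [{'statusCode': 'EXAMPLE_ERROR', 'message': 'made-up error'}]},
]

failures = failed_rows(submitted, results)
print(failures)
```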

In [41]:
# @title Get Vicariate records from SF

sf_deaneries = sf.query("SELECT Archdpdx_Migration_Id__c, Id FROM Account WHERE RecordType.DeveloperName = 'Deanery'")

df_sf_deaneries = pd.DataFrame(sf_deaneries['records'])
df_sf_deaneries = df_sf_deaneries.drop(columns = 'attributes')

df_sf_deaneries

# Creates a dict of Vicariate unique ids to the new Salesforce record IDs, so they can be populated on later Account records
vicariate_sf_recordids = df_sf_deaneries.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
vicariate_sf_recordids

{'Vicariates_1': '001Dx00001FaTb0IAF',
 'Vicariates_2': '001Dx00001FaTb1IAF',
 'Vicariates_3': '001Dx00001FaTb2IAF',
 'Vicariates_4': '001Dx00001FaTb3IAF',
 'Vicariates_5': '001Dx00001FaTb4IAF',
 'Vicariates_6': '001Dx00001FaTb5IAF',
 'Vicariates_7': '001Dx00001FaTb6IAF',
 'Vicariates_8': '001Dx00001FaTb7IAF',
 'Vicariates_9': '001Dx00001FaTb8IAF',
 'Vicariates_10': '001Dx00001FaTb9IAF',
 'Vicariates_11': '001Dx00001FaTbAIAV',
 'Vicariates_12': '001Dx00001FaTbBIAV',
 'Vicariates_13': '001Dx00001FaTbCIAV',
 'Vicariates_14': '001Dx00001FaTbDIAV',
 'Vicariates_15': '001Dx00001FaTbEIAV',
 'Vicariates_16': '001Dx00001FaTbFIAV',
 'Vicariates_17': '001Dx00001FaTbGIAV',
 'Vicariates_18': '001Dx00001FaTbHIAV',
 'Deanery_1': '001Dx00001CwdOmIAJ',
 'Deanery_2': '001Dx00001CwdOnIAJ'}

### B) Parishes, Schools, Organizations


In [42]:
# NEW Create a new DF with Account records - excluding Deaneries (already handled) and Religious (to be handled differently, after)
acc_main = accounts[accounts['AccountRecordType'] != 'Deanery']
acc_main = acc_main[acc_main['AccountRecordType'] != 'Religious']

acc_main.loc[acc_main['AccountRecordType'] == 'Church', 'Vicariate_Ext_Id'] = 'Vicariates_' + acc_main['Vicariate']

acc_main.sample(5)

Unnamed: 0,Record Number,AccountRecordType,"Name, City",Name,Parish Name,Archdiocese Assigns Clergy,Locator Description,BillingStreet,BillingStreet2,BillingCity,...,Grades Provided,Mailing Address 1,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,Vicariate_Ext_Id
130,101,Church,"St. Mary’s Cathedral, Portland",St. Mary’s Cathedral of the Immaculate Conception,St. Mary’s Cathedral,Yes,NW 18th Ave and Couch St,St. Mary’s Cathedral Parish,1716 NW Davis St,Portland,...,,,,,,Parish,Parishes_101,012Dx0000009TJxIAM,50,Vicariates_4
9,22,Organization,Vocations,Vocations,,Yes,,2838 E Burnside St,,Portland,...,,,,,,,Offices_22,012Hu000001pkqEIAQ,50,
300,47,School,"St. Agatha Catholic School, Portland",St. Agatha Catholic School,,Yes,,,,Portland,...,PS-8,7960 SE 15th Ave,97202.0,,,,Schools_47,012Dx0000009TK1IAM,50,
57,26,Church,"St. Philip, Dallas",St. Philip,,Yes,,St. Philip Parish,825 SW Mill St,Dallas,...,,,,,,Parish,Parishes_26,012Dx0000009TJxIAM,50,Vicariates_8
277,24,School,"St. Anne Catholic School, Grants Pass",St. Anne Catholic School,,Yes,,,,Grants Pass,...,PK-5,1131 NE 10th St,97526.0,,,,Schools_24,012Dx0000009TK1IAM,50,


In [43]:
# OLD > Create a new DF with Account records - excluding Deaneries (already handled) and Religious (to be handled differently, after)
# acc_main = accounts[accounts['AccountRecordType'] != 'Deanery']
# acc_main = acc_main[acc_main['AccountRecordType'] != 'Religious']

# acc_main.loc[acc_main['AccountRecordType'] == 'Church', 'Vicariate_Ext_Id'] = 'Deanery_' + acc_main['Vicariate']

# acc_main.sample(5)

In [44]:
# TODO: This is now obsolete (I think)
# acc_main['VicariateUniqueId'] = acc_main['Vicariate'].map(vicariates_externalid_map)

acc_main['mbfc__Deanery__c'] = acc_main.Vicariate_Ext_Id.map(vicariate_sf_recordids)

acc_main[acc_main['AccountRecordType'] == 'Church']['mbfc__Deanery__c']

35     001Dx00001FaTb0IAF
36     001Dx00001FaTbCIAV
37     001Dx00001FaTbFIAV
38     001Dx00001FaTb8IAF
39     001Dx00001FaTbEIAV
              ...        
181    001Dx00001FaTb4IAF
182    001Dx00001FaTbGIAV
183    001Dx00001FaTb5IAF
184    001Dx00001FaTbHIAV
185    001Dx00001FaTb6IAF
Name: mbfc__Deanery__c, Length: 151, dtype: object
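
A key that has no entry in `vicariate_sf_recordids` comes out of `.map()` as NaN without raising an error, so it can be worth listing unmatched keys before the load. The following is a minimal sketch, not part of the original notebook: `find_unmapped` is a hypothetical helper, the two IDs are copied from the vicariate mapping above, and `Vicariates_99` is an invented key.

```python
def find_unmapped(keys, mapping):
    """Return the distinct keys that are missing from `mapping`, in first-seen order.

    None and NaN are skipped, because they represent rows with no key at all.
    """
    missing = []
    for key in keys:
        if key is None or key != key:  # NaN is the only value not equal to itself
            continue
        if key not in mapping and key not in missing:
            missing.append(key)
    return missing


# Two entries copied from the vicariate mapping; 'Vicariates_99' is an invented key
sample_mapping = {
    'Vicariates_4': '001Dx00001FaTb3IAF',
    'Vicariates_8': '001Dx00001FaTb7IAF',
}
sample_keys = ['Vicariates_4', 'Vicariates_8', 'Vicariates_99', None, float('nan')]

print(find_unmapped(sample_keys, sample_mapping))  # ['Vicariates_99']
```

In the notebook this could be called as `find_unmapped(acc_main.loc[acc_main['AccountRecordType'] == 'Church', 'Vicariate_Ext_Id'], vicariate_sf_recordids)`; an empty list would mean every Church received a Deanery lookup.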

In [45]:
# Clean up NaN values

acc_main.fillna('', inplace=True)

In [46]:
# @title Export to CSV
# Export to CSV for manual loading

accounts_staging = acc_main[[
    'Name',
    'RecordTypeId',
    'mbfc__Church_Type__c',
    'mbfc__Deanery__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Job_Id__c',
    'Archdpdx_Migration_Id__c'

    ]]

In [47]:
accounts_staging.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/accounts_staging.csv', encoding='utf-8-sig')

#### Upsert Accounts (TBD)


In [48]:
bulk_data = []
for row in accounts_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)
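
`itertuples()` yields namedtuples, and pandas documents that column names which are not valid Python identifiers, or which start with an underscore, are replaced by positional names. The columns in `accounts_staging` are all valid identifiers, so the loop above keeps its keys, but a column such as `Name, City` would not survive. The sketch below reproduces that renaming with the standard library's `namedtuple(..., rename=True)` purely to illustrate the effect; the `Row` type and the `_private` field are hypothetical.

```python
from collections import namedtuple

# 'Name, City' is not a valid identifier and '_private' starts with an underscore,
# so rename=True replaces both with positional names.
Row = namedtuple('Row', ['Name', 'Name, City', '_private'], rename=True)

row = Row('St. Philip', 'St. Philip, Dallas', 'x')
print(row._fields)  # ('Name', '_1', '_2')
```

If that ever becomes a concern, `accounts_staging.to_dict(orient='records')` builds the same list of dicts while keeping the column names unchanged.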

In [49]:
if run_upserts == 'True':
    account_staging_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    account_upserts = pd.DataFrame(account_staging_upsert)

account_upserts

Unnamed: 0,success,created,id,errors
0,True,False,001Dx00001FaTcBIAV,[]
1,True,False,001Dx00001FaTcCIAV,[]
2,True,False,001Dx00001FaTcDIAV,[]
3,True,False,001Dx00001FaTcEIAV,[]
4,True,False,001Dx00001FaTcFIAV,[]
...,...,...,...,...
241,True,False,001Dx00001FaTg4IAF,[]
242,True,False,001Dx00001FaTg5IAF,[]
243,True,False,001Dx00001FaTg6IAF,[]
244,True,False,001Dx00001FaTg7IAF,[]


In [50]:
# Generate an Errors log
import csv

keys = account_staging_upsert[0].keys()

with open('accounts_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(account_staging_upsert)

# TODO: Convert this into a UDF
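
One possible shape for the UDF mentioned in the TODO is sketched below. It assumes the bulk upsert returns a list of dicts with the `success`, `created`, `id` and `errors` keys shown in the output above. The name `write_upsert_log`, the `.csv` file name and the failed sample row are hypothetical.

```python
import csv
import os
import tempfile


def write_upsert_log(results, path):
    """Write bulk upsert results (a list of dicts) to a CSV file.

    Returns the number of rows whose 'success' flag is falsy.
    """
    if not results:
        return 0
    fieldnames = list(results[0].keys())
    with open(path, 'w', newline='', encoding='utf-8') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    return sum(1 for row in results if not row.get('success'))


# The first row mirrors the upsert output above; the failed row is invented for illustration
sample_results = [
    {'success': True, 'created': False, 'id': '001Dx00001FaTcBIAV', 'errors': []},
    {'success': False, 'created': False, 'id': None,
     'errors': [{'statusCode': 'HYPOTHETICAL_ERROR', 'message': 'invented example'}]},
]

log_path = os.path.join(tempfile.mkdtemp(), 'accounts_results.csv')
print(write_upsert_log(sample_results, log_path))  # 1
```

The same helper could then replace the repeated `csv.DictWriter` cells for the religious order and religious community upserts.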

In [51]:
# @title Extract SF Account records

sf_accounts = sf.query('Select id, Name, RecordTypeId, mbfc__Church_Type__c, Archdpdx_Migration_Id__c, Job_Id__c from Account WHERE Job_Id__c != null')
sf_accounts = pd.DataFrame(sf_accounts['records'])
sf_accounts = sf_accounts.drop(columns = 'attributes')
sf_accounts

Unnamed: 0,Id,Name,RecordTypeId,mbfc__Church_Type__c,Archdpdx_Migration_Id__c,Job_Id__c
0,001Dx00001DOFt5IAH,"Our Lady of Perpetual Help Church, St Mary's, ...",012Dx0000009TJxIAM,Church Building,Church_1_cloned_by_sl,12
1,001Dx00001FZm6kIAD,St. Francis of Assisi School,012Hu000001pkqEIAQ,,Organization_5,36
2,001Dx00001FZm6lIAD,Valley Catholic Elementary School,012Hu000001pkqEIAQ,,Organization_8,36
3,001Dx00001FZm6mIAD,Valley Catholic Middle School,012Hu000001pkqEIAQ,,Organization_9,36
4,001Dx00001FZm6nIAD,Valley Catholic High School,012Hu000001pkqEIAQ,,Organization_10,36
...,...,...,...,...,...,...
373,001Dx00001FaTg4IAF,Resurrection Catholic Parish School,012Dx0000009TK1IAM,,Schools_58,50
374,001Dx00001FaTg5IAF,OSU Newman Center,012Hu000001pkqEIAQ,,NewmanCenters_1,50
375,001Dx00001FaTg6IAF,St. Thomas More (UO) Newman Center,012Hu000001pkqEIAQ,,NewmanCenters_2,50
376,001Dx00001FaTg7IAF,Walsh Memorial (SOU) Newman Center at Our Lady...,012Hu000001pkqEIAQ,,NewmanCenters_3,50


### C) Religious Institutes (Parents)


In [52]:
"""
- 'acc_religious' DF: create a unique external ID for each parent religious order
- create 'acc_religious_parents' DF, upsert into SF
- extract accounts from Salesforce, create dict (external_ID : account_ID)
- map parent IDs onto the religious child accounts DF
- 'acc_religious' > staging DF ('acc_religious_staging')
    - drop unnecessary columns
    - create DF of religious children, upsert into SF
"""

# Create a new DF of all Religious accounts
# (.copy() so the column assignment below does not write into a slice of 'accounts')
acc_religious = accounts[accounts['AccountRecordType'] == 'Religious'].copy()

# Create a simplified external ID field: lowercase, spaces removed, truncated to 40 characters
acc_religious['Archdpdx_Migration_Id__c'] = acc_religious['Order Full Name'].apply(
    lambda x: x.lower().replace(' ', '')[:40]
)

# Alias (not a copy) that is reused later for the Religious Superiors
acc_religious_2 = acc_religious

# Create a DF for only parent religious order accounts
acc_religious_parents = acc_religious_2[['Order Full Name', 'Name', 'mbfc__Abbreviation__c', 'mbfc__Religious_Suffix__c', 'mbfc__Type_Members__c', 'Archdpdx_Migration_Id__c']]

# Drop duplicate rows of the same parent Religious Order (because an order can have more than one local community)
acc_religious_parents = acc_religious_parents.drop_duplicates('Order Full Name')

# How many rows remain after dropping duplicates?
print(acc_religious_parents.shape)

# Rename columns
acc_religious_parents = acc_religious_parents.rename(columns={
    'Order Full Name': 'Description'
    })

# Replace NaN values with empty strings
acc_religious_parents = acc_religious_parents.fillna('')

acc_religious_parents


(62, 6)


Unnamed: 0,Description,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c
186,Societas Iesu,Colombiere Jesuit Community,Jesuits,SJ,Men,societasiesu
187,Ordo Cisterciensis Strictioris Observantiae,Abbey of Our Lady of Guadalupe,Trappists,OCSO,Men,ordocisterciensisstrictiorisobservantiae
189,Ordo Sancti Benedicti,Benedictine Monks of Mount Angel Abbey,Benedictines,OSB,Men,ordosanctibenedicti
190,Misioneros del Espíritu Santo,Missionaries of the Holy Spirit Provincial House,"Missionaries of the Holy Spirit, Christ the Pr...",MSpS,Men,misionerosdelespíritusanto
191,Apostles of Jesus,Apostles of Jesus,Apostles of Jesus,AJ,Men,apostlesofjesus
...,...,...,...,...,...,...
249,Fraternità san Carlo Borromeo,Priestly Fraternity of the Missionaries of St....,Fraternity of St. Charles,FSCB,Men,fraternitàsancarloborromeo
250,"Sons of Mary, Mother of Mercy","Sons of Mary, Mother of Mercy","Sons of Mary, Mother of Mercy",SMMM,Men,"sonsofmary,motherofmercy"
251,Society of the Divine Word,Society of the Divine Word,Society of the Divine Word,SVD,Men,societyofthedivineword
252,Society of the Divine Saviour,Society of the Divine Saviour,Society of the Divine Saviour,SDS,Men,societyofthedivinesaviour


In [53]:
acc_religious_parents['Religious_Type__c'] = 'Congregation'

In [54]:
# TODO: Distinguish the other types ('Religious Order', 'Secular Order', 'Diocesan Order', 'Pontifical Order') instead of setting every parent to 'Congregation'

In [55]:
# @title Set recordType to 'Religious'

# Look up the Id of the Account record type whose DeveloperName is 'Religious'
religious_recordtype_id = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Religious') & (df_sf_recordTypes['SobjectType'] == 'Account'),
    'Id'
    ].iloc[0]  # .iloc[0] takes the first match; exactly one match is expected

print(religious_recordtype_id)

acc_religious_parents['RecordTypeId'] = religious_recordtype_id

acc_religious_parents.sample(10)

012Dx0000009TK0IAM


Unnamed: 0,Description,Name,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Archdpdx_Migration_Id__c,Religious_Type__c,RecordTypeId
245,Oblates of the Virgin Mary,Oblates of the Virgin Mary,Oblates of the Virgin Mary,OMV,Men,oblatesofthevirginmary,Congregation,012Dx0000009TK0IAM
247,Brothers of Saint John,Brothers of Saint John,Brothers of Saint John,CSJ,Men,brothersofsaintjohn,Congregation,012Dx0000009TK0IAM
215,Misioneros del Rosario de Fátima,Missionary Sisters of the Rosary of Fatima,Missionaries of the Rosary of Fatima,HMRF,Women,misionerosdelrosariodefátima,Congregation,012Dx0000009TK0IAM
201,Ordo Fratrum Minorum Province of Saint Barbara,Franciscan Friars,Franciscans,OFM,Men,ordofratrumminorumprovinceofsaintbarbara,Congregation,012Dx0000009TK0IAM
186,Societas Iesu,Colombiere Jesuit Community,Jesuits,SJ,Men,societasiesu,Congregation,012Dx0000009TK0IAM
192,Franciscan Sisters of the Eucharist,Franciscan Sisters of the Eucharist,Franciscan Sisters of the Eucharist,FSE,Women,franciscansistersoftheeucharist,Congregation,012Dx0000009TK0IAM
202,Congregatio a Sancta Cruce,Congregation of the Holy Cross,Holy Cross,CSC,Men,congregatioasanctacruce,Congregation,012Dx0000009TK0IAM
203,Sociedad San Juan,Saint John Society,Saint John Society,SSJ,Men,sociedadsanjuan,Congregation,012Dx0000009TK0IAM
238,Heralds of the Good News,Heralds of the Good News,Heralds of the Good News,HGN,Men,heraldsofthegoodnews,Congregation,012Dx0000009TK0IAM
200,Domus Dei Clerical Society of Apostolic Life,Society of Domus Dei Holy House Monasteries,Domus Dei,SDD,Men,domusdeiclericalsocietyofapostoliclife,Congregation,012Dx0000009TK0IAM


In [56]:
# @title Send to CSV
acc_religious_parents.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/religious_order_staging.csv', encoding='utf-8-sig')

In [57]:
# @title Upsert to Salesforce
bulk_data = []
for row in acc_religious_parents.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    religious_order_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_order_upsert = pd.DataFrame(religious_order_upsert)

df_rel_order_upsert

Unnamed: 0,success,created,id,errors
0,True,False,001Dx00001FZmZjIAL,[]
1,True,False,001Dx00001FZmZkIAL,[]
2,True,False,001Dx00001FZmZlIAL,[]
3,True,False,001Dx00001FZmb9IAD,[]
4,True,False,001Dx00001FZmZmIAL,[]
...,...,...,...,...
57,True,False,001Dx00001FZmabIAD,[]
58,True,False,001Dx00001FZmacIAD,[]
59,True,False,001Dx00001FZmadIAD,[]
60,True,False,001Dx00001FZmaeIAD,[]


In [58]:
# Generate an Errors log
import csv

keys = religious_order_upsert[0].keys()

with open('religious_order_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(religious_order_upsert)

# TODO: Convert this into a UDF

In [59]:
# @title get SF Accounts
get_all_rel_accounts = f"Select id, Name, RecordTypeId, Type, Archdpdx_Migration_Id__c from Account where RecordTypeID = '{religious_recordtype_id}'"

print(religious_recordtype_id)

# get list of records, add to dataframe
sf_accounts = sf.query(get_all_rel_accounts)
df_sf_accounts = pd.DataFrame(sf_accounts['records'])
df_sf_accounts = df_sf_accounts.drop(columns = 'attributes')

df_sf_accounts.sample(10)

012Dx0000009TK0IAM


Unnamed: 0,Id,Name,RecordTypeId,Type,Archdpdx_Migration_Id__c
52,001Dx00001FZma9IAD,Good Shepherd Sisters,012Dx0000009TK0IAM,,congregationofourladyofcharityofthegoods
8,001Dx00001FaTjGIAV,"Sons of Mary, Mother of Mercy, Umuahia, Nigeri...",012Dx0000009TK0IAM,,RelCommunities_76
120,001Dx00001FaTiYIAV,"Adrian Dominican Sisters, Adrian, MI (OP)",012Dx0000009TK0IAM,,RelCommunities_30
135,001Dx00001FaTinIAF,"Sisters of Jesus the Saviour, Gold Beach (SJS)",012Dx0000009TK0IAM,,RelCommunities_45
21,001Dx00001DLgQKIA1,"Apostles of Jesus, Beaverton",012Dx0000009TK0IAM,,
47,001Dx00001FZma4IAD,Thu Thiem Sisters,012Dx0000009TK0IAM,,loversofthuthiemholycrosssisters
109,001Dx00001FaTj3IAF,Society of the Missionaries of St. Francis Xav...,012Dx0000009TK0IAM,,RelCommunities_63
84,001Dx00001FaTisIAF,"Sisters of St. Francis, Lake Oswego (OSF)",012Dx0000009TK0IAM,,RelCommunities_50
28,001Dx00001FZmZkIAL,Abbey of Our Lady of Guadalupe,012Dx0000009TK0IAM,,ordocisterciensisstrictiorisobservantiae
108,001Dx00001FaTj2IAF,"Work of Jesus the High Priest, Gresham (OJSS)",012Dx0000009TK0IAM,,RelCommunities_62


In [60]:
religious_order_mapping = df_sf_accounts.set_index('Archdpdx_Migration_Id__c')['Id'].to_dict()
# religious_order_mapping
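
`set_index(...)['Id'].to_dict()` keeps a blank external ID as a dictionary key (the sample above includes an account with no `Archdpdx_Migration_Id__c`) and lets a later duplicate silently overwrite an earlier one. The sketch below builds the same kind of lookup from the raw query records, skipping blank keys and reporting duplicates. `build_id_mapping` is a hypothetical helper, and the sample records reuse three rows from the output above.

```python
def build_id_mapping(records, key_field='Archdpdx_Migration_Id__c', id_field='Id'):
    """Return ({external_id: salesforce_id}, [duplicate external ids]).

    Records with an empty or missing external ID are skipped.
    For duplicates the last record wins, matching dict semantics.
    """
    mapping = {}
    duplicates = []
    for record in records:
        key = record.get(key_field)
        if not key:
            continue
        if key in mapping:
            duplicates.append(key)
        mapping[key] = record[id_field]
    return mapping, duplicates


# Three rows taken from the query output above (the second has no external ID)
sample_records = [
    {'Id': '001Dx00001FZma9IAD', 'Archdpdx_Migration_Id__c': 'congregationofourladyofcharityofthegoods'},
    {'Id': '001Dx00001DLgQKIA1', 'Archdpdx_Migration_Id__c': None},
    {'Id': '001Dx00001FZmZkIAL', 'Archdpdx_Migration_Id__c': 'ordocisterciensisstrictiorisobservantiae'},
]

mapping, duplicates = build_id_mapping(sample_records)
print(len(mapping), duplicates)  # 2 []
```

With the notebook's variables this would be `build_id_mapping(sf_accounts['records'])`.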

### D) Religious Communities


In [61]:
acc_religious_staging = (acc_religious
                         .rename(columns={'Archdpdx_Migration_Id__c' : 'Parent_Archdpdx_Migration_Id__c'})
)

acc_religious_staging['ParentId'] = acc_religious_staging['Parent_Archdpdx_Migration_Id__c'].map(religious_order_mapping)

In [62]:
# Enrich the data

acc_religious_staging['Religious_Type__c'] = 'Local Community'
acc_religious_staging['Archdpdx_Migration_Id__c'] = 'RelCommunities_' + acc_religious_staging['Record Number'].astype('str')
acc_religious_staging['RecordTypeId'] = religious_recordtype_id
acc_religious_staging.drop(columns='Name', inplace=True)
acc_religious_staging.rename(columns={
    'Name, City': 'Name'
}, inplace=True)

acc_religious_staging.sample(5)

Unnamed: 0,Record Number,AccountRecordType,Name,Parish Name,Archdiocese Assigns Clergy,Locator Description,BillingStreet,BillingStreet2,BillingCity,BillingState,...,Mailing Address Zip,Vicariate Name,Mailing Address City2,mbfc__Church_Type__c,Parent_Archdpdx_Migration_Id__c,RecordTypeId,Job_Id__c,ParentId,Religious_Type__c,Archdpdx_Migration_Id__c
247,73,Religious,"Brothers of Saint John, Laredo, TX (CSJ)",,No,,St. John Priory,505 Century Dr S,Laredo,TX,...,,,,,brothersofsaintjohn,012Dx0000009TK0IAM,50,001Dx00001FZmaZIAT,Local Community,RelCommunities_73
252,78,Religious,"Society of the Divine Saviour, Rome, Italy (SDS)",,No,,"Via della Conciliazione, 51",,Roma,,...,,,,,societyofthedivinesaviour,012Dx0000009TK0IAM,50,001Dx00001FZmaeIAD,Local Community,RelCommunities_78
210,34,Religious,"Sisters of St. Dominic of Caldwell, Caldwell, ...",,No,,1 Ryerson Avenue,,Caldwell,NJ,...,,,,,sistersofst.dominicofcaldwell,012Dx0000009TK0IAM,50,001Dx00001FZma1IAD,Local Community,RelCommunities_34
209,33,Religious,"Oblates of St. Martha, Portland (OSM)",,No,,National Sanctuary of Our Sorrowful Mother (Th...,PO Box 20008,Portland,OR,...,,,,,congregacióndeoblatasdesantamarta,012Dx0000009TK0IAM,50,001Dx00001FZma0IAD,Local Community,RelCommunities_33
226,50,Religious,"Sisters of St. Francis, Lake Oswego (OSF)",,No,,843 13th Ave N,,Clinton,IA,...,,,,,"sistersofst.francis,clinton,iowa",012Dx0000009TK0IAM,50,001Dx00001FZmaGIAT,Local Community,RelCommunities_50


In [63]:
acc_religious_staging_2 = acc_religious_staging[[
    'Name',
    'RecordTypeId',
    'Religious_Type__c',
    'BillingStreet',
    'BillingCity',
    'BillingState',
    'BillingPostalCode',
    'BillingCountry',
    'Phone',
    'Fax',
    'mbfc__Email__c',
    'Website',
    'mbfc__Abbreviation__c',
    'mbfc__Religious_Suffix__c',
    'mbfc__Type_Members__c',
    'Description',
    'Job_Id__c',
    'ParentId',
    'Archdpdx_Migration_Id__c'
    ]]

acc_religious_staging_2.sample(5)

Unnamed: 0,Name,RecordTypeId,Religious_Type__c,BillingStreet,BillingCity,BillingState,BillingPostalCode,BillingCountry,Phone,Fax,mbfc__Email__c,Website,mbfc__Abbreviation__c,mbfc__Religious_Suffix__c,mbfc__Type_Members__c,Description,Job_Id__c,ParentId,Archdpdx_Migration_Id__c
210,"Sisters of St. Dominic of Caldwell, Caldwell, ...",012Dx0000009TK0IAM,Local Community,1 Ryerson Avenue,Caldwell,NJ,7006,,973-403-3331,973-228-9611,dempsey@up.edu,https://caldwellop.org/,Sisters of St. Dominic,OP,Women,Serving the University of Portland,50,001Dx00001FZma1IAD,RelCommunities_34
188,"JCCU Jesuit Tertianship, Portland (SJ)",012Dx0000009TK0IAM,Local Community,3301 SE 45th Ave,Portland,OR,97206,,,,jctertianship@jesuits.org,,Jesuits,SJ,Men,,50,001Dx00001FZmZjIAL,RelCommunities_3
228,"Sisters of St. Joseph of Peace, Eugene (CSJP)",012Dx0000009TK0IAM,Local Community,CSJP Western Region Office,Bellevue,WA,98009,,425-467-5400,425-462-9760,,https://csjp.org/,Sisters of St. Joseph,CSJP,Women,"Serving Sacred Heart RiverBend Medical Center,...",50,001Dx00001FZmaIIAT,RelCommunities_52
205,"Adorers of the Holy Cross, Portland (MTG)",012Dx0000009TK0IAM,Local Community,7408 SE Alder St,Portland,OR,97215,,503-254-3284,503-255-3097,mtgdlhn@yahoo.com,http://menthanhgiadalat.net/,Adorers of the Holy Cross,MTG,Women,"Serving Our Lady of Lavang Parish, Portland; A...",50,001Dx00001FZmZyIAL,RelCommunities_29
235,Society of Christ Fathers Province in the Unit...,012Dx0000009TK0IAM,Local Community,Society of Christ,Lombard,IL,60148,POLAND,630-424-0401,,,https://tchr.us/en,Society of Christ,SCH,Men,Missionary priests and brothers serving Polish...,50,001Dx00001FZmaOIAT,RelCommunities_61


In [64]:
# Final Cleanup

acc_religious_staging_2 = acc_religious_staging_2.fillna('')

In [65]:
# @title Send to CSV
acc_religious_staging_2.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/religious_community_staging.csv', encoding='utf-8-sig')

In [66]:
# @title Upsert to Salesforce
bulk_data = []
for row in acc_religious_staging_2.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

if run_upserts == 'True':
    religious_community_upsert = sf.bulk.Account.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_community_upsert = pd.DataFrame(religious_community_upsert)

df_rel_community_upsert

Unnamed: 0,success,created,id,errors
0,True,False,001Dx00001FaTiFIAV,[]
1,True,False,001Dx00001FaTiGIAV,[]
2,True,False,001Dx00001FaTiHIAV,[]
3,True,False,001Dx00001FaTiIIAV,[]
4,True,False,001Dx00001FaTiJIAV,[]
...,...,...,...,...
65,True,False,001Dx00001FaTjHIAV,[]
66,True,False,001Dx00001FaTjIIAV,[]
67,True,False,001Dx00001FaTjJIAV,[]
68,True,False,001Dx00001FaTjKIAV,[]


### E) Religious Superiors


In [67]:
acc_rel_superiors = acc_religious_2[[
    'Name',
    'Major Superior Name',
    'Major Superior Phone',
    'Major Superior Email',
    'Archdpdx_Migration_Id__c']].copy()  # .copy() avoids pandas' SettingWithCopyWarning on the assignment below

# Link each superior to the parent religious order account
acc_rel_superiors['AccountId'] = acc_rel_superiors.Archdpdx_Migration_Id__c.map(religious_order_mapping)

# acc_rel_superiors.sample(5)


In [68]:
# @title Parse Complex Names
def parse_names(df, column_name):
    # Convert all non-string entries to strings (handling NaN and other data types)
    df[column_name] = df[column_name].fillna('').apply(str)

    # Parse each name once; blank names are represented by None
    parsed = [HumanName(x) if x.strip() != '' else None for x in df[column_name]]

    # Map each output column to the HumanName attribute that fills it
    parts = {
        'First Name': 'first',
        'Last Name': 'last',
        'Middle Name': 'middle',
        'Title': 'title',
        'Suffix': 'suffix',
        'Nickname': 'nickname',
    }

    # Create a new DataFrame to store the name parts, aligned to the original index
    name_parts = pd.DataFrame(
        {label: [getattr(name, attr) if name is not None else '' for name in parsed]
         for label, attr in parts.items()},
        index=df.index,
    )

    # Combine the original DataFrame with the name parts DataFrame
    result_df = pd.concat([df, name_parts], axis=1)
    return result_df



In [69]:
!pip install nameparser
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Very', 'Rev.', 'Very Rev.', 'Sr.')
# The trailing semicolon stops the notebook from echoing the full SetManager that add() returns
CONSTANTS.suffix_acronyms.add('FRS', 'OMI', 'OSA', 'OCD', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'O.S.M', 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President');

In [70]:
# Parse Complex Names (pass a copy so parse_names does not write into a slice of another DataFrame)
acc_rel_superiors_parsed = parse_names(acc_rel_superiors.copy(), 'Major Superior Name')


In [71]:
# @title Final cleanup

acc_rel_superiors_staging = acc_rel_superiors_parsed.fillna('')

acc_rel_superiors_staging['Archdpdx_Migration_Id__c'] = acc_rel_superiors_staging['Major Superior Name'].apply(lambda x: x.replace(' ','').lower())

# Rename columns
acc_rel_superiors_staging = acc_rel_superiors_staging.rename(columns={
    'Major Superior Phone': 'Phone',
    'Major Superior Email': 'Email',
    'Title': 'Salutation',
    'First Name': 'FirstName',
    'Middle Name': 'MiddleName',
    'Last Name': 'LastName'
})

# Add job id
acc_rel_superiors_staging['Archdpdx_Job_Id__c'] = curr_job_id

# Drop columns
acc_rel_superiors_staging = acc_rel_superiors_staging.drop(columns=['Name', 'Major Superior Name', 'Nickname'])

# Drop empty rows
acc_rel_superiors_staging = acc_rel_superiors_staging[acc_rel_superiors_staging['LastName'].str.strip() != '']

acc_rel_superiors_staging.sample(10)

Unnamed: 0,Phone,Email,Archdpdx_Migration_Id__c,AccountId,FirstName,LastName,MiddleName,Salutation,Suffix,Archdpdx_Job_Id__c
249,+39 06 61571401,pr@sancarlo.org,fr.paolosottopietra,001Dx00001FZmabIAD,Paolo,Sottopietra,,Fr.,,50
227,610-459-4125,tfirenze@osfphila.org,"sr.theresamariefirenze,osf",001Dx00001FZmaHIAT,Theresa,Firenze,Marie,Sr.,OSF,50
254,510-658-8722,provincial@opwest.org,"veryrev.christopherfadok,op,provincial",001Dx00001FZmZsIAL,Christopher,Fadok,,Very Rev.,"OP, Provincial",50
219,314-397-9436,tponder@gspmna.org,toniponder,001Dx00001FZma9IAD,Toni,Ponder,,,,50
194,,osa-west@calprovince.org,"rev.garysanders,osa",001Dx00001FZmZpIAL,Gary,Sanders,,Rev.,OSA,50
218,563-588-2351,,"ladonnamanternach,bvm–president",001Dx00001FZmbAIAT,BVM,LaDonna Manternach,– President,,,50
193,,"P.O. Box 8816 Moshi, Tanzania",rev.charleslyimo,001Dx00001FZmZoIAL,Charles,Lyimo,,Rev.,,50
222,,,sr.josephine,001Dx00001FZmaCIAT,,Josephine,,Sr.,,50
255,510-658-8722,provincial@opwest.org,"veryrev.christopherfadok,o.p.",001Dx00001FZmZsIAL,Christopher,Fadok,,Very Rev.,O.P.,50
202,574-631-6196,info@holycrossusa.org,"rev.petera.jarret,c.s.c.",001Dx00001FZmZvIAL,Peter,Jarret,A.,Rev.,C.S.C.,50
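
The sample above includes a row whose `Email` holds a postal address ("P.O. Box 8816 Moshi, Tanzania"), which is the kind of value the "Handle invalid email addresses" TODO refers to. One way to blank out such values before the upsert is sketched below. The pattern is a deliberately loose shape check and an assumption of this example, not Salesforce's own validation rule; `clean_email` is a hypothetical helper.

```python
import re

# Loose shape check: something@something.something, with no spaces and a single '@'
EMAIL_PATTERN = re.compile(r'[^@\s]+@[^@\s]+\.[^@\s]+')


def clean_email(value):
    """Return the trimmed address if it looks like an email, otherwise ''."""
    if not isinstance(value, str):
        return ''
    candidate = value.strip()
    return candidate if EMAIL_PATTERN.fullmatch(candidate) else ''


print(repr(clean_email('pr@sancarlo.org')))                # 'pr@sancarlo.org'
print(repr(clean_email('P.O. Box 8816 Moshi, Tanzania')))  # ''
```

Applied to the staging frame this would read `acc_rel_superiors_staging['Email'] = acc_rel_superiors_staging['Email'].map(clean_email)`. Rejected values could also be logged so the address is not lost.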


In [72]:
# @title Send to CSV
acc_rel_superiors_staging.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/religious_superiors_staging.csv', encoding='utf-8-sig')

In [73]:
# @title Upsert to Salesforce

def escape_soql(value):
    # Escape backslashes and single quotes so names such as O'Brien do not break the SOQL string literal
    return str(value).replace('\\', '\\\\').replace("'", "\\'")


def find_existing_contact(sf, first_name, last_name):
    query = (
        "SELECT Id, Archdpdx_Migration_Id__c FROM Contact "
        f"WHERE FirstName = '{escape_soql(first_name)}' AND LastName = '{escape_soql(last_name)}'"
    )
    result = sf.query(query)
    return result['records']



bulk_data = []
for row in acc_rel_superiors_staging.itertuples(index=False):
    d = row._asdict()
    existing_contacts = find_existing_contact(sf, d['FirstName'], d['LastName'])
    if existing_contacts:
        # Attach the Id of the first matching Contact so the existing record is updated
        d['Id'] = existing_contacts[0]['Id']
    bulk_data.append(d)


if run_upserts == 'True':
    religious_superior_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
    df_rel_superior_upsert = pd.DataFrame(religious_superior_upsert)

df_rel_superior_upsert

Unnamed: 0,success,created,id,errors
0,False,False,,"[{'statusCode': 'DUPLICATE_VALUE', 'message': ..."
1,True,False,003Dx00000m0jAWIAY,[]
2,True,False,003Dx00000m0jAXIAY,[]
3,True,False,003Dx00000m0jAYIAY,[]
4,False,True,,"[{'statusCode': 'INVALID_EMAIL_ADDRESS', 'mess..."
5,True,False,003Dx00000m0jAZIAY,[]
6,True,False,003Dx00000m0jAaIAI,[]
7,False,False,,"[{'statusCode': 'DUPLICATE_VALUE', 'message': ..."
8,True,False,003Dx00000m0jAbIAI,[]
9,True,False,003Dx00000m0jAcIAI,[]
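
The results above mix two failure types (`DUPLICATE_VALUE` and `INVALID_EMAIL_ADDRESS`), so a count per status code gives a quicker overview than reading the rows one by one. The following is a minimal sketch. `summarize_errors` is a hypothetical helper, and the sample rows mirror four rows of the output above with the truncated messages left as `'...'`.

```python
from collections import Counter


def summarize_errors(results):
    """Count the error status codes across all failed rows of a bulk upsert result."""
    counts = Counter()
    for row in results:
        if row.get('success'):
            continue
        for error in row.get('errors') or []:
            counts[error.get('statusCode', 'UNKNOWN')] += 1
    return dict(counts)


# Rows 0, 1, 4 and 7 of the output above; messages are elided in the source
sample_results = [
    {'success': False, 'created': False, 'id': None,
     'errors': [{'statusCode': 'DUPLICATE_VALUE', 'message': '...'}]},
    {'success': True, 'created': False, 'id': '003Dx00000m0jAWIAY', 'errors': []},
    {'success': False, 'created': True, 'id': None,
     'errors': [{'statusCode': 'INVALID_EMAIL_ADDRESS', 'message': '...'}]},
    {'success': False, 'created': False, 'id': None,
     'errors': [{'statusCode': 'DUPLICATE_VALUE', 'message': '...'}]},
]

print(summarize_errors(sample_results))
# {'DUPLICATE_VALUE': 2, 'INVALID_EMAIL_ADDRESS': 1}
```

In the notebook this would be called as `summarize_errors(religious_superior_upsert)`.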


In [74]:
# @title Update Religious Communities with Rel. Superior

# TBD: It would take much less time to simply do this post-migration manually.

# CONTACTS


## Extract


In [75]:
df_contacts = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .set_index('Record Number', verify_integrity=True)
               .drop(index='recNum') # Drops the extra row that replicates the labels
               .rename(columns=lambda x: x.replace(' ', '_')) # Remove whitespace in column names
)

df_contacts.sample(10)


Unnamed: 0_level_0,Common_Name,Sort_Name,Type(s),Clergy_Status,Religious_Status,Login_ID,Password,Password_Must_be_Changed,Access_Permission,Spouse,...,CARA_Ethnicity,Seminarian_Status,Other_Diaconal_Ministry,Spiritual_Director_Authorized,Link_to_Religious_Community,Place_of_Work,Volunteer_Place,Type_of_Work,Work_Load,Work_Title
Record Number,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
2297,Ms. Gail Baird,baird gail,Staff,,,,,,,0,...,,,,,0,,,,,
1829,Ms. Amanda Wade,wade amanda,Staff,,,,,,,0,...,,,,,0,,,,,
728,Rev. James Stange,stange james,Priest,Deceased,,,,,,0,...,,,,,0,,,,,
1211,Rev. Gerald McCray,mccray gerald,Priest,Faculties Withdrawn,,,,,,0,...,,,,,0,,,,,
1283,"Rev. Bede Partridge, OSB",partridge bede,"Priest,Religious",Deceased,Deceased,,,,,0,...,,,,,4,,,,,
3076,Deacon Mac Chester,chester mac albert,Permanent Deacon,Active,,mchester,6bda1cb7769284d816e980fd85cabd5573e3fb604508cf...,No,,3077,...,Caucasian/white,,,No,0,,,,,
98,Mr. Jerry Roussell,roussell jerry jeroid oneil,Archive,,,,,,,0,...,,,,,0,,,,,
1310,"Rev. Christopher Renz, OP",renz christopher,"Priest,Religious",Transferred Out,Transferred Out,,,,,0,...,,,,,18,,,,,
2432,"Sr. Adele Marie Altenhofen, SSMO",altenhofen adele marie,Religious,,Active,,,,,0,...,,,,,53,,,,,
3292,Ms. Katie Bernards,bernards katie,Staff,,,,,,,0,...,,,,,0,,,,,


#### Get Photos


In [76]:
import os
import pandas as pd

# def list_jpeg_files(directory):
#     data = []
#     for filename in os.listdir(directory):
#         if filename.endswith(".jpeg") or filename.endswith(".jpg"):  # Checking for jpeg files
#             full_path = os.path.join(directory, filename)
#             data.append({'Filename': filename, 'Full Path': full_path})
#     return pd.DataFrame(data)

# # Specify your directory
# directory = '/content/drive/Shareddrives/Clients/ADPDX (Portland)/Data/Clergy DB/sql_backup/archdpdx.info backups/public_html/people/graphics/portraits/large'
# jpeg_files_df = list_jpeg_files(directory)


In [77]:
# # Query for the Library
# library_query = "SELECT Id, Name FROM ContentWorkspace WHERE Name = 'ADPDX Person Profile Photos'"
# library_result = sf.query(library_query)

# # Check if the library exists and get its ID
# if library_result['records']:
#     library_id = library_result['records'][0]['Id']
#     print(f"Library ID: {library_id}")

#     # Query for the Folder within the Library
#     folder_query = f"SELECT Id, Name FROM ContentFolder WHERE ParentContentFolderId = '{library_id}'"
#     folder_result = sf.query(folder_query)

#     # Check if the folder exists and get its ID
#     if folder_result['records']:
#         folder_id = folder_result['records'][0]['Id']
#         print(f"Folder ID: {folder_id}")
#     else:
#         print("Folder 'Large JPEGs' not found in the library.")
# else:
#     print("Library 'ADPDX Person Profile Photos' not found.")

## Analysis

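Here we check the columns and their types, how many values each column holds, how many of those are unique, and some sample data.

DF shape:

- 141 columns (142 in the raw CSV, before `Record Number` became the index)
- 3016 rows (3017 in the raw CSV, before the extra label row was dropped)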


In [78]:
# Check the shape of the imported CSV (after setting the index and dropping the label row)
print(f"Shape of original data set: {df_contacts.shape}")

# Export to CSV a list of the contact fields with count, unique, top, freq
contacts_describe = df_contacts.describe(include='all').transpose()
contacts_describe.to_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/analysis/contacts_describe.csv')

contacts_describe  # initial analysis of the Contacts table

Shape of original data set: (3016, 141)


Unnamed: 0,count,unique,top,freq
Common_Name,3016,3011,Ms. Leslie Jones,2
Sort_Name,3016,3009,nguyen anthony,3
Type(s),3016,29,Staff,1139
Clergy_Status,1138,8,Transferred Out,462
Religious_Status,902,4,Active,456
...,...,...,...,...
Place_of_Work,269,133,Mount Angel Abbey,37
Volunteer_Place,54,47,Mary’s Woods,4
Type_of_Work,276,117,Pastoral Ministry,30
Work_Load,262,2,Full Time,230


In [79]:
unique_languages = df_contacts['Languages'].unique()
unique_languages

array([nan, 'English,Spanish', 'Igbo', 'English, Spanish',
       'Spanish, Mayaqeqchi', 'Spanish (Mass only)',
       'Latin Mass and written translation. Read French, Italian, Spanish.',
       'Spanish', 'Hindi, Konkani, Tamil',
       'French (fluent), Spanish (beginner), Latin (beginner)',
       'German, Spanish, Italian, French', 'Kiswahili, Kichagga',
       'Spanish (English is second language)',
       'German, Spanish, Italian, Latin Mass',
       'English, Spanish, Italian', 'Spanish, Italian', 'English',
       'Bicolango, Tagalog, Spanish', 'Spanish, Italian, Latin Mass',
       'Italian', 'Tagalog, English, Spanish',
       'French, Italian, Aramaic (modern), Spanish', 'Vietnamese',
       'German, Spanish', 'English,Spanish,Italian',
       'Conversant in Italian and Spanish, some facility with Latin and German',
       'English, Spanish, Latin Mass', 'Italian, Spanish',
       'Konkani, Hindi, Marathi, Spanish',
       'Tagalog, Bicol, Spanish (Mass only)', 'Spanish, E

In [80]:
import re
import numpy as np


def deduplicate_languages(list_languages):
    # Define a regular expression pattern to match periods and punctuation
    punctuation_pattern = r'[.,!?;:"]'

    # Flatten the array and filter out NaN values
    flattened_languages = [re.sub(punctuation_pattern, '', lang) for sublist in list_languages if pd.notna(sublist) for lang in sublist.split(',')]

    # Deduplicate the list of languages
    unique_languages = list(set(flattened_languages))

    return unique_languages


# Example usage:
unique_languages = deduplicate_languages(unique_languages)
print(unique_languages)


['', 'Latin Mass and written translation Read French', 'Portuguese', ' Bicol', 'Crijolle', ' Italian', ' Vietnamese (semi-fluent)', ' Tagalog', 'English', 'Italian', ' Spanish (small bits)', 'Tamil', 'Vietnamese (Mass only)', ' Ukrainian', ' Tamil', ' Greek', ' Croatian', ' Little Spanish', ' Mayaqeqchi', ' Portuguese', ' Swahili Mass', ' Maya Q’eqchi’', ' Telugu', ' some facility with Latin and German', ' German', 'Hindi', ' Latin Mass', 'Spanish', ' Englsih', ' Marathi', 'German', ' Arabic', 'French (small bits)', ' Russian', ' Spanish (Mass only)', ' Hindi', ' but can do rituals)', ' Hebrew (reading)', ' English', 'Tagalog', ' Chamorro', ' Vietnamese', ' Latin', ' French', 'French (fluent)', 'Swahili', 'Vietnamese', 'Latin', ' Konkawin', ' Latin (beginner)', ' Biblical Hebrew & Aramaic', 'Konkani', 'Polish', 'Kannada', 'Chuukese', 'French', ' Aramaic (modern)', 'Bicolango', ' a little Spanish(not conversational', ' Hebrew', ' German (small bits)', 'Chagga', 'Spanish (Mass only)', 'S
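The output above still contains an empty string and near-duplicates such as `' Italian'` and `'Italian'`, because the tokens are not trimmed after splitting on commas. A minimal sketch of one way to normalize them, and to rewrite comma-separated values in the semicolon-delimited form the multi-select import needs, is shown below. The helper names `normalize_languages` and `to_multiselect` are hypothetical and not part of this notebook, and free-text entries such as `'Spanish (Mass only)'` would still need manual mapping to picklist values.

```python
import re


def normalize_languages(raw_values):
    """Return a sorted list of distinct, trimmed language tokens (sketch)."""
    tokens = set()
    for value in raw_values:
        if not isinstance(value, str):  # skips NaN and None
            continue
        for token in value.split(','):
            # Drop stray punctuation, then trim surrounding whitespace
            token = re.sub(r'[.!?;:"]', '', token).strip()
            if token:  # ignore empty tokens
                tokens.add(token)
    return sorted(tokens)


def to_multiselect(value):
    """Rewrite 'English, Spanish' as 'English;Spanish' (hypothetical helper)."""
    if not isinstance(value, str):
        return ''
    parts = [part.strip() for part in value.split(',') if part.strip()]
    return ';'.join(parts)


sample = [float('nan'), 'English,Spanish', 'English, Spanish', 'Igbo']
print(normalize_languages(sample))
print(to_multiselect('English, Spanish'))
```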

## Transform


In [81]:
# init list of columns NOT to be loaded as Contact attributes
misc_columns_to_drop = [
    'Password',
    'Password_Must_be_Changed',
    'Common_Name',
    'Sort_Name',
    'Private_Address_Province'
]

affiliation_columns = [
    'Seminarian_Student_Debt',
    'Seminarian_Medical_Benefits',
    'Baptism_Date',
    'Place_of_Baptism',
    'Confirmation_Date',
    'Place_of_Confirmation',
    'Received_Date',
    'Parish_of_Record',
    'Marriage_Date',
    'Place_of_Marriage',
    'Date_of_First_Vows',
    'Date_of_Final_Vows',
    'Accepted_to_Formation_Date',
    'Reader_Date',
    'Acolyte_Date',
    'Candidacy_Date',
    'Formation_Withdrawn_Date',
    'Formation_Deferred_Date',
    'Formation_Terminated_Date',
    'Terminate_or_Defer_Note',
    'Bachelor_Degree_Year',
    'Bachelor_Degree_Type',
    'Bachelor_Degree_Institution',
    'Graduate_1_Degree_Institution',
    'Graduate_1_Degree_Type',
    'Graduate_1_Degree_Year',
    'Graduate_2_Degree_Institution',
    'Graduate_2_Degree_Type',
    'Graduate_2_Degree_Year',
    'Graduate_3_Degree_Institution',
    'Graduate_3_Degree_Type',
    'Graduate_3_Degree_Year',
    'Graduate_4_Degree_Institution',
    'Graduate_4_Degree_Type',
    'Graduate_4_Degree_Year',
    'CARA_Highest_Ed_Level',
    'Diaconal_Ordination_Date',
    'Diaconal_Ordination_Place',
    'Diaconal_Ordination_Prelate',
    'Presbyteral_Ordination_Date',
    'Presbyteral_Ordination_Place',
    'Presbyteral_Ordination_Prelate',
    'Episcopal_Ordination_Date',
    'Episcopal_Ordination_Place',
    'Episcopal_Ordination_Prelate',
    'Ordination_Diocese',
    'Incardinated_From_Date',
    'Incardinated_From_Diocese',
    'Incardinated_Now',
    'Serving_Now',
    'Excardinated_To_Diocese',
    'Excardinated_To_Date',
    'Letter_of_Good_Standing_Date',
    'Religious_In_Archdiocese_Date',
    'Faculties',
    'Faculties_Granted_Date',
    'Faculties_Restricted_Date',
    'Faculties_Withdrawn_Date',
    'Last_Retreat_Date',
    'Last_Educ_Requirement_Date',
    'Policy_Manual_Acknowledgement_Date',
    'Harassment_Prevention_Course_Date',
    'Standards_of_Conduct_Date',
    'Last_Background_Check_Date',
    'Last_Child_Protection_Training_Date',
    'Out_of_Diocese_Date',
    'Senior_Status_Date',
    'Laicized_Date',
    'Coverage_Availability',
    'Advanced_Directive_Date',
    'End_of_Life_Plan_Date',
    'Will_Date',
    'Will_Note',
    'CIC_489_File',
    'Registered_Parish',
    'CARA_Ethnicity',
    'Seminarian_Status',
    'Other_Diaconal_Ministry',
    'Spiritual_Director_Authorized',
    'Link_to_Religious_Community',
    'Place_of_Work',
    'Volunteer_Place',
    'Type_of_Work',
    'Work_Load',
    'Work_Title'
]

In [82]:
# These fields must eventually be migrated, but they are dropped temporarily while the SF upsert flow
# is being built, until their mapping logic is in place.
# TODO: add mapping logic for these fields

fields_not_yet_mapped = [
    'Spouse',
    'Father_Full_Name',
    'Mother_Full_Maiden_Name',
    'Mailing_Address_2',
    'Mailing_Address_Province',
    'Private_Address_2',
    'Nickname',
    'Preferred_Address',
    'Private_Address__Street__s',
    'Private_Address__City__s',
    'Private_Address__StateCode__s',
    'Private_Address__PostalCode__s',
    'Private_Address__CountryCode__s',
    'Social_Security_Account_Number__c',  # The data is encrypted
    'Languages',  # Picklist is restricted, in MFC package. Needs unrestricting before I can migrate data.
    'Preferred_Email',
    'Preferred_Phone'

]

In [83]:
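# UDF to combine multiple Mailing Street Address lines into one

def combine_addresses(row, *columns):
    # Collect the address lines that have a value, skipping NaN entries
    address_parts = [str(row[col]) for col in columns if pd.notna(row[col])]
    # Join with a newline character (the equivalent of CHAR(10)), not the literal text 'CHAR(10)'
    return '\n'.join(address_parts)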


In [84]:
df_contact_staging = (df_contacts
                      .drop(columns='Salutation')
                      .rename(columns={
                          'Clergy_Status' : 'ADPDX_Clergy_Status__c',
                          'Religious_Status' : 'ADPDX_Religious_Status__c',
                          'Login_ID' : 'ADPDX_Login_ID__c',
                          'Access_Permission': 'ADPDX_Access_Permission__c',
                          'Title': 'Salutation',
                          'Christian_Name': 'FirstName',
                          'Middle_Name(s)': 'MiddleName',
                          'Surname': 'LastName',
                          'Suffix': 'Suffix',
                          #Mailing_Address & Mailing_Address_2
                          'Mailing_Address' : 'MailingStreet',
                          'Mailing_Address_City': 'MailingCity',
                          'Mailing_Address_State': 'MailingState',
                          #'Mailing_Address_Province': 'MailingProvince'
                          'Mailing_Address_Postal_Code': 'MailingPostalCode',
                          'Mailing_Address_Country': 'MailingCountry',
                          'Private_Address': 'Private_Address__Street__s',
                          # 'Private_Address_2' is not renamed here: it is dropped via fields_not_yet_mapped until it can be combined with 'Private_Address'
                          'Private_Address_City': 'Private_Address__City__s',
                          'Private_Address_State': 'Private_Address__StateCode__s',
                          'Private_Address_Postal_Code': 'Private_Address__PostalCode__s',
                          'Private_Address_Country': 'Private_Address__CountryCode__s',
                          # 'Preferred_Address'
                          'Work_Phone': 'npe01__WorkPhone__c',
                          'Home_Phone': 'HomePhone',
                          'Cell_Phone': 'MobilePhone',
                        #   'Preferred_Phone': 'npe01__PreferredPhone__c',
                          # IF Preferred phone contains, 'do not publish'
                          'Work_Email' : 'npe01__WorkEmail__c',
                          'Archdiocesan_Email': 'npe01__AlternateEmail__c',
                          'Home_Email': 'npe01__HomeEmail__c',
                        #   'Preferred_Email': 'npe01__Preferred_Email__c',
                          # IF Preferred email contains 'do not publish''
                          'Directory_Include': 'Directory_Include__c',
                          'Directory_Include_Middle_Name': 'Directory_Include_Middle_Name__c',
                          'Directory_Include_Suffix': 'Directory_Include_Suffix__c',
                          'Suppress_From_Reports': 'Suppress_From_Reports__c',
                          'Send_Group_Mail_and_Email': 'Send_Group_Mail_and_Email__c',
                          'Birth_Date': 'Birthdate',
                          'Place_of_Birth': 'mbfc__Place_of_Birth__c',
                          'Foreign_Born': 'Foreign_Born__c',
                          'Foreign_Citizenship': 'Foreign_Citizenship__c',
                          'Immigration_Status': 'Immigration_Status__c',
                          'Passport/Visa_Expiration_Date': 'Passport_Visa_Expiration_Date__c',
                          'Social_Security_Account_Number': 'Social_Security_Account_Number__c',
                          'Deceased_Date': 'mbfc__Date_of_Death__c',
                        #   'Languages': 'Languages__c'
                          })
                      .assign(Bi_Ritual__c=lambda x: x['Type(s)'].str.contains('Biritual'))
                      .assign(Non_Latin_Rite__c=lambda x: x['Type(s)'].str.contains('Non-Latin Rite'))
                      .assign(Archdpdx_Migration_Id__c=lambda x: x.index)
                    #   .assign(Mailing_Address=lambda row: combine_addresses(row, 'Mailing_Address', 'Mailing_Address_2'), axis=1)
                      .drop(columns=misc_columns_to_drop)
                      .drop(columns=affiliation_columns)
                      .drop(columns=fields_not_yet_mapped)

        )


df_contact_staging.sample(10)

Record Number,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,MiddleName,LastName,Suffix,...,Birthdate,mbfc__Place_of_Birth__c,Foreign_Born__c,Foreign_Citizenship__c,Immigration_Status__c,Passport_Visa_Expiration_Date__c,mbfc__Date_of_Death__c,Bi_Ritual__c,Non_Latin_Rite__c,Archdpdx_Migration_Id__c
716,"Priest,Religious",Deceased,Deceased,,,Rev.,José,,Ortega,,...,1937-11-11,Mexico (Aguascalientes),Yes,,,,2018-08-23,False,False,716
3249,Staff,,,,,Ms.,Melissa,,Piña,,...,,,,,,,,False,False,3249
1615,Staff,,,,,Mrs.,Ruth,,Hayes-Barba,,...,,,,,,,,False,False,1615
3291,Staff,,,,,Mr.,Guy,,Allen,,...,,,,,,,,False,False,3291
2620,Religious,,Active,,,Sr.,Maria Ngoan,,Nguyen,,...,,,,,,,,False,False,2620
2640,Religious,,Active,,,Sr.,Linda,,Patrick,,...,,,,,,,,False,False,2640
1099,Priest,Deceased,,,,Rev.,Emmet,,Harrington,,...,,,,,,,1900-01-01,False,False,1099
1138,Priest,Deceased,,,,Rev.,Francis,,Kennard,,...,,,,,,,1900-01-01,False,False,1138
86,Permanent Deacon,Transferred Out,,camsberry,,Deacon,Charles,Melvin,Amsberry,,...,1951-07-23,York PA,No,,,,,False,False,86
3080,Staff,,,,,Ms.,Alanna,,O’Brien,,...,,,,,,,,False,False,3080


### Private Address Handling


In [85]:
# df_contact_staging.loc[:,'Private_Address__Street__s':'Private_Address__CountryCode__s'][~df_contact_staging['Private_Address__StateCode__s'].isna()]

In [86]:
# df_contact_staging['Private_Address__CountryCode__s'] = df_contact_staging.apply(lambda row: 'United States' if pd.notnull(row['Private_Address__StateCode__s']) and pd.isnull(row['Private_Address__CountryCode__s']) else row['Private_Address__CountryCode__s'], axis=1)

### Handle Boolean Fields


In [87]:
boolean_columns_to_convert = ['Foreign_Born__c', 'Directory_Include__c', 'Directory_Include_Middle_Name__c', 'Directory_Include_Suffix__c',
       'Suppress_From_Reports__c', 'Send_Group_Mail_and_Email__c', ]

df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].replace({'Yes': True, 'No': False})


In [88]:
df_contact_staging[boolean_columns_to_convert] = df_contact_staging[boolean_columns_to_convert].fillna(False)

df_contact_staging[boolean_columns_to_convert].sample(5)

Record Number,Foreign_Born__c,Directory_Include__c,Directory_Include_Middle_Name__c,Directory_Include_Suffix__c,Suppress_From_Reports__c,Send_Group_Mail_and_Email__c
1758,False,False,False,False,False,True
69,False,False,False,False,False,True
934,False,False,False,False,False,False
805,True,True,False,False,False,True
997,False,True,False,False,False,True


### Set Contact Record Type


In [89]:
# Set Record Type

# For each row, scan the 'Type(s)' value for the keys of contact_type_map (substring match).
# The first key found, in dictionary order, determines the value written to a new column
# called 'ContactRecordType'; rows that match no key get None.

contact_type_map = {
    'Bishop': 'Priest',
    'Diaconate': 'Lay_Person',
    'Permanent Deacon': 'Permanent_Deacon',
    'Priest': 'Priest',
    'Staff': 'Lay_Person',
    'Seminarian': 'Lay_Person',
    'Wife': 'Lay_Person',
    'Religious': 'Religious',
    'Seminary Applicant': 'Lay_Person',
    'Transitional Deacon': 'Priest',
    'Archive': 'Lay_Person'
}

def update_contact_record_type(row):
    for key, value in contact_type_map.items():
        if key in row['Type(s)']:
            return value
    return None

df_contact_staging['ContactRecordType'] = df_contact_staging.apply(update_contact_record_type, axis=1)
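Because `update_contact_record_type` returns on the first key that appears as a substring, the result for multi-valued rows such as `'Priest,Religious'` depends on dictionary order. The sketch below makes that priority explicit and matches whole comma-separated tokens instead of substrings. `CONTACT_TYPE_PRIORITY` and `resolve_record_type` are illustrative names, the list is abbreviated, and its ordering is an assumption that would need to be confirmed against the intended business rules.

```python
# Ordered from highest to lowest priority (assumed ordering, abbreviated list)
CONTACT_TYPE_PRIORITY = [
    ('Bishop', 'Priest'),
    ('Priest', 'Priest'),
    ('Transitional Deacon', 'Priest'),
    ('Permanent Deacon', 'Permanent_Deacon'),
    ('Religious', 'Religious'),
    ('Staff', 'Lay_Person'),
]


def resolve_record_type(type_string):
    """Return the record type for the highest-priority token, or None."""
    if not isinstance(type_string, str):  # NaN or None
        return None
    # Split the multi-valued field into exact tokens, e.g. {'Priest', 'Religious'}
    tokens = {token.strip() for token in type_string.split(',')}
    for keyword, record_type in CONTACT_TYPE_PRIORITY:
        if keyword in tokens:
            return record_type
    return None


print(resolve_record_type('Priest,Religious'))
print(resolve_record_type('Permanent Deacon'))
```

Matching exact tokens avoids accidental hits when one type name happens to be contained in another.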

In [90]:
# Map in the RecordTypeIDs
df_contact_staging['RecordTypeID'] = df_contact_staging['ContactRecordType'].map(record_types_mapping)

In [91]:
# Check for any Contacts who are missing a RecordTypeId
df_contact_staging[df_contact_staging['RecordTypeID'].isna()]

Record Number,Type(s),ADPDX_Clergy_Status__c,ADPDX_Religious_Status__c,ADPDX_Login_ID__c,ADPDX_Access_Permission__c,Salutation,FirstName,MiddleName,LastName,Suffix,...,Foreign_Born__c,Foreign_Citizenship__c,Immigration_Status__c,Passport_Visa_Expiration_Date__c,mbfc__Date_of_Death__c,Bi_Ritual__c,Non_Latin_Rite__c,Archdpdx_Migration_Id__c,ContactRecordType,RecordTypeID


### Email Validation


In [92]:
# !pip install email_validator
from email_validator import validate_email, EmailNotValidError

# Validate an email address; return the validated address, or None if the value is missing (NaN) or invalid
def validate_email_address(email):
    if isinstance(email, float):
        return None
    try:
        v = validate_email(email)
        return v.email
    except EmailNotValidError:
        return None


In [93]:
# init a list of Email columns
# email_columns = ['npe01__HomeEmail__c', 'npe01__WorkEmail__c', 'npe01__AlternateEmail__c']

# df_contact_staging[email_columns] = df_contact_staging[email_columns].applymap(validate_email_address)
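Returning `None` for every invalid address discards the original value without a trace, so it may help to keep a list of rejected values for manual review. The sketch below is one way to do that over plain dictionaries. `split_valid_invalid` is a hypothetical helper, and `simple_validator` is only a loose stand-in check so the example runs without `email_validator`; in this notebook, `validate_email_address` could be passed as the validator instead.

```python
import re

# Stand-in validator (assumption): deliberately loose, not RFC-compliant
_SIMPLE_EMAIL = re.compile(r'^[^@\s]+@[^@\s]+\.[^@\s]+$')


def simple_validator(email):
    """Return the trimmed address if it looks like an email, else None."""
    if not isinstance(email, str):
        return None
    email = email.strip()
    return email if _SIMPLE_EMAIL.match(email) else None


def split_valid_invalid(records, email_fields, validator):
    """Clean the email fields in place and return the rejected values."""
    rejected = []
    for record in records:
        for field in email_fields:
            original = record.get(field)
            cleaned = validator(original)
            # Record non-empty values that failed validation for later review
            if cleaned is None and isinstance(original, str) and original.strip():
                rejected.append((record.get('Archdpdx_Migration_Id__c'), field, original))
            record[field] = cleaned
    return rejected


contacts = [
    {'Archdpdx_Migration_Id__c': 1, 'npe01__HomeEmail__c': 'jane@example.org'},
    {'Archdpdx_Migration_Id__c': 2, 'npe01__HomeEmail__c': 'not an email'},
]
rejected = split_valid_invalid(contacts, ['npe01__HomeEmail__c'], simple_validator)
print(rejected)
```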

### Final Dataframe Cleanup


In [94]:
# drop columns that are no longer needed
del df_contact_staging['Type(s)']
del df_contact_staging['ContactRecordType']

In [95]:
# clean up all NaN values

df_contact_staging = df_contact_staging.fillna('')

## Load


In [96]:
df_contact_staging['Archdpdx_Job_Id__c'] = curr_job_id

In [97]:
# generate CSV for manual loading
df_contact_staging.to_csv(f'/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/staging/df_contacts_staging.csv', encoding='utf-8-sig')
df_contact_staging.to_csv('contacts_staging.csv', encoding='utf-8-sig')


In [98]:
# upsert Contact records into SF using Bulk api

from simple_salesforce.exceptions import SalesforceMalformedRequest

bulk_data = []
for row in df_contact_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

try:
    # Attempt to upsert Contact records into SF using Bulk API
    contact_upsert = sf.bulk.Contact.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=True)
    contact_upsert_results = pd.DataFrame(contact_upsert)
except SalesforceMalformedRequest as e:
    # If a SalesforceMalformedRequest error occurs, print the error message and response content
    print(f"SalesforceMalformedRequest error: {e}")
    print(f"Response content: {e.content}")

SalesforceMalformedRequest error: Malformed request https://adpdx--devpro.sandbox.my.salesforce.com/services/async/57.0/job/750Dx000007zh2yIAA/batch/751Dx000009HDSXIA4/result. Response content: {'exceptionCode': 'InvalidBatch', 'exceptionMessage': 'Records not processed'}
Response content: {'exceptionCode': 'InvalidBatch', 'exceptionMessage': 'Records not processed'}
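The error above reports only that a batch was not processed. When a bulk call does return, a quick summary of the per-record outcomes can help locate problem rows. The sketch below assumes the result is a list of dictionaries with `success` and `errors` keys, one per submitted record and in the same order; that shape is an assumption about the response, and `summarize_bulk_results` is a hypothetical helper. The sample data is made up for illustration.

```python
def summarize_bulk_results(records, results):
    """Pair each submitted record with its result and collect the failures."""
    failures = []
    for record, result in zip(records, results):
        if not result.get('success'):
            failures.append({
                'external_id': record.get('Archdpdx_Migration_Id__c'),
                'errors': result.get('errors', []),
            })
    return {'total': len(results), 'failed': len(failures), 'failures': failures}


# Made-up sample data for illustration only
sample_records = [{'Archdpdx_Migration_Id__c': 716}, {'Archdpdx_Migration_Id__c': 3249}]
sample_results = [
    {'success': True, 'created': True, 'id': '003000000000001', 'errors': []},
    {'success': False, 'created': False, 'id': None,
     'errors': [{'statusCode': 'REQUIRED_FIELD_MISSING', 'message': 'LastName is required'}]},
]
summary = summarize_bulk_results(sample_records, sample_results)
print(summary['failed'], summary['failures'][0]['external_id'])
```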


In [99]:
# Print upsert results to local file

# keys = contact_upsert[0].keys()

# with open('contact_results', 'w', newline='') as csv_file:
#     writer = csv.DictWriter(csv_file, keys)
#     writer.writeheader()
#     writer.writerows(contact_upsert)


# CONTACT > REGISTER ENTRIES


In [100]:
import pandas as pd

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Replace spaces in column names with underscores
               .drop(index=0) # Drops the extra row that replicates the labels
)

df

Unnamed: 0,Record_Number,Common_Name,Sort_Name,Type(s),Clergy_Status,Religious_Status,Login_ID,Password,Password_Must_be_Changed,Access_Permission,...,CARA_Ethnicity,Seminarian_Status,Other_Diaconal_Ministry,Spiritual_Director_Authorized,Link_to_Religious_Community,Place_of_Work,Volunteer_Place,Type_of_Work,Work_Load,Work_Title
1,2766,Rev. Stephen Abaukaka,abaukaka stephen ozovehe,Priest,Transferred Out,,sabukaka,def2a990be60a7998b1ed7c820101f3bd02d33b8992518...,Yes,,...,,,,,0,,,,,
2,2337,Mr. Rogelio Acevedo,acevedo rogelio,Staff,,,,,,,...,,,,,0,,,,,
3,3244,Mr. Sean Ackroyd,ackroyd sean,Staff,,,,,,,...,,,,,0,,,,,
4,3295,Ms. Sherril Acton,acton sherril,Staff,,,,,,,...,,,,,0,,,,,
5,2164,Ms. Barbara Adams,adams barbara,Staff,,,,,,,...,,,,,0,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3012,1670,Ms. Jenny Zomerdyk,zomerdyk jenny,Staff,,,,,,,...,,,,,0,,,,,
3013,2755,"Br. Daniel Zorrilla, MSpS",zorrilla daniel,Religious,,Active,dzorrilla,391eedf7c936f63d3d0a7d9ea7e506a84709662fd31ba9...,Yes,,...,,,,,14,,,,,
3014,1962,Ms. Kim Zuber,zuber kim,Staff,,,,,,,...,,,,,0,,,,,
3015,2202,Ms. Agnes Zueger,zueger agnes,Staff,,,,,,,...,,,,,0,,,,,


In [101]:
# @title Parse Sacrament & Notation Types

import pandas as pd

# Define the structure of your column sets with correct attribute names
column_sets = [
    {'date': 'Baptism_Date', 'place': 'Place_of_Baptism', 'notation_type': 'Proof of Baptism'},
    {'date': 'Confirmation_Date', 'place': 'Place_of_Confirmation', 'notation_type': 'Notice of Confirmation'},
    {'date': 'Received_Date', 'place': 'Parish_of_Record', 'notation_type': 'Notation of Profession of Faith'},
    {'date': 'Marriage_Date', 'place': 'Place_of_Marriage', 'notation_type': 'Notice of Matrimony'},
    {'date': 'Diaconal_Ordination_Date', 'place': 'Diaconal_Ordination_Place', 'prelate': 'Diaconal_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Diaconate'},
    {'date': 'Presbyteral_Ordination_Date', 'place': 'Presbyteral_Ordination_Place', 'prelate': 'Presbyteral_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Presbyteral'},
    {'date': 'Episcopal_Ordination_Date', 'place': 'Episcopal_Ordination_Place', 'prelate': 'Episcopal_Ordination_Prelate', 'notation_type': 'Notice of Holy Orders', 'ordination_type': 'Episcopal'}
]

# New DataFrame for entries
register_entries = pd.DataFrame(columns=['RecordNumber', 'mbfc__Register_Entry_Type__c', 'mbfc__Type__c', 'mbfc__Notation_Type__c', 'mbfc__Ordination_Type__c', 'Date', 'Place', 'Prelate'])
new_entries = []  # List to store entries before final concatenation

# Processing rows
for row in df.itertuples():
    for column_set in column_sets:
        date_value = getattr(row, column_set['date'], None)
        if pd.notna(date_value):  # Check if date field is not NaN
            entry = {
                'RecordNumber': getattr(row, 'Record_Number', None),
                'Date': date_value,
                'Place': getattr(row, column_set['place'], None)
            }
            # Add Prelate if applicable
            if 'prelate' in column_set:
                entry['Prelate'] = getattr(row, column_set['prelate'], None)

            # Set 'mbfc__Register_Entry_Type__c', and conditionally add 'mbfc__Type__c' or 'mbfc__Notation_Type__c'
            if 'sacrament_type' in column_set:
                entry['mbfc__Type__c'] = column_set['sacrament_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Sacrament'
            if 'notation_type' in column_set:
                entry['mbfc__Notation_Type__c'] = column_set['notation_type']
                entry['mbfc__Register_Entry_Type__c'] = 'Notation'

            # Handle ordination type specific updates
            if 'ordination_type' in column_set:
                entry['mbfc__Ordination_Type__c'] = column_set['ordination_type']

            new_entries.append(entry)

# Concatenate all new entries to the DataFrame at once
if new_entries:
    register_entries = pd.concat([register_entries, pd.DataFrame(new_entries)], ignore_index=True)

print(f"Total records added: {len(register_entries)}")

# Optionally, save the new DataFrame to a CSV
register_entries.to_csv('Register_Entries.csv', index=False)

# Display the DataFrame
register_entries.sample(10)


Total records added: 1534


Unnamed: 0,RecordNumber,mbfc__Register_Entry_Type__c,mbfc__Type__c,mbfc__Notation_Type__c,mbfc__Ordination_Type__c,Date,Place,Prelate
1378,2070,Notation,,Proof of Baptism,,1965-05-26,"Star of the Sea Parish, Astoria, OR",
1325,2773,Notation,,Proof of Baptism,,1983-11-06,"Immaculate Conception Church, Las Vegas, NM",
685,2030,Notation,,Notice of Matrimony,,2012-04-21,,
236,278,Notation,,Notice of Matrimony,,1990-03-17,,
654,2080,Notation,,Notice of Confirmation,,1964-04-25,"St. Patrick’s Church, Larkspur, CA",
371,319,Notation,,Notice of Holy Orders,Diaconate,2008-10-25,"Cathedral of the Immaculate Conception, Portla...",
164,2079,Notation,,Proof of Baptism,,1944-10-22,"St. Patrick Church, Tacoma, WA",
679,2840,Notation,,Notice of Matrimony,,2000-08-19,"Lutheran Church, Regina, Saskatchewan, Canada",
1181,597,Notation,,Notice of Holy Orders,Presbyteral,2009-06-13,"Cathedral of the Immaculate Conception, Portla...",Most Rev. John G. Vlazny
558,2771,Notation,,Proof of Baptism,,1996-07-13,"San Jose Church, Albuquerque, NM",


In [102]:
from nameparser import HumanName
from nameparser.config import CONSTANTS

# Add dataset-specific Titles and Suffix constants for parsing
CONSTANTS.titles.add('Very', 'Rev.', 'Very Rev.', 'Sr.', 'Most Rev.')
CONSTANTS.suffix_acronyms.add('FRS', 'J.C.L.', 'J.C.L., D.D.', 'D.D.', 'OMI', 'OSA', 'OCD', 'OP', 'OC', 'FSE', 'OMV', 'SDB', 'SM', 'SFX', 'SP', 'OP', 'O.S.M', 'SNJM', 'OSF', 'HMRF', 'DD', 'CSJP', 'SDD', 'BVM', 'BVM - President', 'SJ', 'SL', 'IX', 'SSJ', 'J.C.L.', 'J.C.L', 'OFM', 'MSpS', 'Fco.' )


def parse_name(name):
    if pd.isna(name):  # Checks if the name is NaN or None
        return {
            'Salutation': '',
            'FirstName': '',
            'MiddleName': '',
            'LastName': '',
            'Suffix': ''
        }
    else:
        name = HumanName(name)
        return {
            'Salutation': name.title,
            'FirstName': name.first,
            'MiddleName': name.middle,
            'LastName': name.last,
            'Suffix': name.suffix
        }

# Apply the parsing function only where 'Prelate' exists and is not NaN
for entry in new_entries:
    if 'Prelate' in entry and pd.notna(entry['Prelate']):
        parsed_name = parse_name(entry['Prelate'])
        entry.update(parsed_name)

# Build the DataFrame from new_entries and parse each Prelate name once;
# parse_name returns empty strings for missing values
register_entries = pd.DataFrame(new_entries)
if 'Prelate' in register_entries.columns:
    name_columns = ['Salutation', 'FirstName', 'MiddleName', 'LastName', 'Suffix']
    parsed_names = pd.DataFrame(register_entries['Prelate'].map(parse_name).tolist(),
                                index=register_entries.index, columns=name_columns)
    for name_column in name_columns:
        register_entries[name_column] = parsed_names[name_column]


# Display the DataFrame
print(f"Total records added: {len(register_entries)}")
register_entries.sample(10)



Total records added: 1534


Unnamed: 0,RecordNumber,Date,Place,Prelate,mbfc__Notation_Type__c,mbfc__Register_Entry_Type__c,mbfc__Ordination_Type__c,Salutation,FirstName,MiddleName,LastName,Suffix
244,606,1984-06-09,"Cathedral of the Immaculate Conception, Portla...",Most Rev. Cornelius M. Power,Notice of Holy Orders,Notation,Presbyteral,Most Rev.,Cornelius,M.,Power,
67,2082,2008-12-27,,,Notice of Matrimony,Notation,,,,,,
1321,330,1976-05-21,,,Notice of Matrimony,Notation,,,,,,
1206,2078,1969-03-12,"Beaverton Foursquare Church, Beaverton, Oregon",,Proof of Baptism,Notation,,,,,,
1043,1602,2004-07-10,,,Notice of Holy Orders,Notation,Presbyteral,,,,,
451,1075,1999-06-03,,,Notice of Holy Orders,Notation,Presbyteral,,,,,
885,120,1985-01-01,,,Notice of Matrimony,Notation,,,,,,
158,647,1957-07-25,Dominican,,Notice of Holy Orders,Notation,Presbyteral,,,,,
1336,178,1953-03-07,,,Notice of Matrimony,Notation,,,,,,
153,428,1982-04-28,,,Notation of Profession of Faith,Notation,,,,,,


In [103]:
# @title Query Salesforce for existing contacts and create a dictionary for mapping

from simple_salesforce import Salesforce

query = """
SELECT Id, Archdpdx_Migration_Id__c
FROM Contact
"""
result = sf.query_all(query)
contact_map = {rec['Archdpdx_Migration_Id__c']: rec['Id'] for rec in result['records']}
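A dictionary lookup only succeeds when the key types match exactly. If the migration ID were returned from Salesforce as a number while `RecordNumber` was read from the CSV as text, `.map(contact_map)` would produce missing values without raising an error. Whether that applies here depends on field types not shown in this notebook, so the sketch below is only a precaution. `normalize_key` is a hypothetical helper, and the sample record is made up.

```python
def normalize_key(value):
    """Return a canonical string key so 716, 716.0 and ' 716 ' all match."""
    if value is None:
        return None
    text = str(value).strip()
    try:
        number = float(text)
    except ValueError:
        return text  # non-numeric IDs are kept as trimmed text
    if number != number:  # NaN never equals itself
        return None
    return str(int(number)) if number.is_integer() else text


# Made-up sample record for illustration only
sf_records = [{'Id': '003A', 'Archdpdx_Migration_Id__c': 716.0}]
normalized_map = {normalize_key(rec['Archdpdx_Migration_Id__c']): rec['Id'] for rec in sf_records}
print(normalized_map.get(normalize_key('716')))
```

Applying the same function to both the dictionary keys and the column being mapped keeps the two sides consistent.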


In [104]:
# Get RecordTypeId for Contact.Priest

priest_contact_recordtype_id = df_sf_recordTypes.loc[
    (df_sf_recordTypes['DeveloperName'] == 'Priest') & (df_sf_recordTypes['SobjectType'] == 'Contact'),
    'Id'
    ].iloc[0]  # Use .iloc[0] to get the first item if you're expecting exactly one match


In [105]:
# @title Query for Contacts by Names and Create New Contacts

from simple_salesforce import SFType, SalesforceError, format_soql

contact = SFType('Contact', sf.session_id, sf.sf_instance)
for index, row in register_entries.iterrows():
    first_name, last_name = row.get('FirstName'), row.get('LastName')

    if pd.isna(first_name) or pd.isna(last_name) or first_name.strip() == '' or last_name.strip() == '':
        # If either first name or last name is missing or empty, skip this row
        print(f"Skipping row {index} due to missing name information.")
        continue

    try:
        # Search for contact by First and Last Name; format_soql quotes and escapes the values,
        # so names containing an apostrophe do not break the query
        query = format_soql("SELECT Id FROM Contact WHERE FirstName = {} AND LastName = {}", first_name, last_name)
        result = sf.query(query)
        if result['totalSize'] > 0:
            contact_id = result['records'][0]['Id']
        else:
            # Create a new contact if no match found
            new_contact = {
                'FirstName': first_name,
                'LastName': last_name,
                'Archdpdx_Job_Id__c': curr_job_id,
                'RecordTypeId': priest_contact_recordtype_id
            }
            create_result = contact.create(new_contact)
            contact_id = create_result['id']

        # Update DataFrame with the Salesforce Contact ID
        register_entries.at[index, 'mbfc__Celebrant__c'] = contact_id

    except SalesforceError as e:
        print(f"Error processing row {index}: {e}")



Skipping row 2 due to missing name information.
Skipping row 3 due to missing name information.
Skipping row 4 due to missing name information.
Skipping row 5 due to missing name information.
Skipping row 6 due to missing name information.
Skipping row 8 due to missing name information.
Skipping row 9 due to missing name information.
Skipping row 10 due to missing name information.
Skipping row 11 due to missing name information.
Skipping row 12 due to missing name information.
Skipping row 13 due to missing name information.
Skipping row 16 due to missing name information.
Skipping row 17 due to missing name information.
Skipping row 18 due to missing name information.
Skipping row 19 due to missing name information.
Skipping row 20 due to missing name information.
Skipping row 21 due to missing name information.
Skipping row 22 due to missing name information.
Skipping row 23 due to missing name information.
Skipping row 24 due to missing name information.
Skipping row 26 due to miss

In [106]:
# @title Map Contact IDs to Register Entries

register_entries_2 = register_entries

register_entries_2['mbfc__Contact__c'] = register_entries['RecordNumber'].map(contact_map)


In [107]:
# @title Append Job_Id__c
register_entries_2['Archdpdx_Job_Id__c'] = curr_job_id

## Generate an External ID


In [108]:
def create_external_id(row):
    record_number = str(row['RecordNumber']).replace(' ', '').replace('-', '')
    entry_type = str(row['mbfc__Register_Entry_Type__c']).replace(' ', '').replace('-', '')

    # Check whether to use Type or Notation Type based on what's available
    if 'mbfc__Type__c' in row and not pd.isna(row['mbfc__Type__c']):
        type_field = str(row['mbfc__Type__c']).replace(' ', '').replace('-', '')
    elif 'mbfc__Notation_Type__c' in row and not pd.isna(row['mbfc__Notation_Type__c']):
        type_field = str(row['mbfc__Notation_Type__c']).replace(' ', '').replace('-', '') + str(row['mbfc__Ordination_Type__c']).replace(' ', '').replace('-', '')
    else:
        type_field = 'Unknown'

    return f"{record_number}_{entry_type}_{type_field}"
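Note that `str()` of a missing value yields the text `nan`, so a notation row without an ordination type would end in `nan` under the function above. The sketch below shows an alternative that appends only the parts that are present. `build_external_id` and its helpers are hypothetical, and the IDs it produces differ in format from those this notebook generates, so switching after records have been loaded would break upsert matching.

```python
def is_missing(value):
    """True for None, NaN, or an empty string."""
    return value is None or value != value or value == ''


def clean(value):
    """Remove spaces and hyphens, as the notebook's function does."""
    return str(value).replace(' ', '').replace('-', '')


def build_external_id(row):
    """Join the record number, entry type, and any type fields that have a value."""
    parts = [clean(row['RecordNumber']), clean(row['mbfc__Register_Entry_Type__c'])]
    for field in ('mbfc__Type__c', 'mbfc__Notation_Type__c', 'mbfc__Ordination_Type__c'):
        value = row.get(field)
        if not is_missing(value):
            parts.append(clean(value))
    return '_'.join(parts)


baptism = {'RecordNumber': 2070, 'mbfc__Register_Entry_Type__c': 'Notation',
           'mbfc__Notation_Type__c': 'Proof of Baptism',
           'mbfc__Ordination_Type__c': float('nan')}
print(build_external_id(baptism))
```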

In [109]:
# Build the external ID for each register entry
register_entries_2['Archdpdx_Migration_Id__c'] = register_entries_2.apply(create_external_id, axis=1)

if register_entries_2['Archdpdx_Migration_Id__c'].duplicated().any():
    print("Warning: There are duplicate external IDs.")
    # Show every row that shares an external ID with another row
    duplicates = register_entries_2[register_entries_2['Archdpdx_Migration_Id__c'].duplicated(keep=False)]
    print(duplicates)
else:
    print("All external IDs are unique.")


All external IDs are unique.


In [110]:
# Drop unnecessary columns:
register_entries_2.drop(['RecordNumber', 'Prelate', 'Salutation', 'FirstName', 'MiddleName', 'LastName', 'Suffix'], axis=1, inplace=True)

In [111]:
register_entries_staging = register_entries_2

In [112]:
# Remove all NaN values:
register_entries_staging.fillna('', inplace=True)

# Rename columns
register_entries_staging = register_entries_staging.rename(columns={
    'Place': 'Location_text__c',
    'Date': 'mbfc__Event_Date__c'
})


In [113]:
# @title Send to CSV
# register_entries_staging.to_csv('register_entries_staging.csv', encoding='utf-8-sig')

In [114]:
# @title Upsert Register Entry Records

bulk_data = []
for row in register_entries_staging.itertuples(index=False):
    d = row._asdict()
    # del d['Index']
    bulk_data.append(d)

# Keep batch_size at 100 or below; larger batches have returned exceptionCode 'InvalidBatch' with exceptionMessage 'Records not processed'
reg_entry_upsert = sf.bulk.mbfc__Sacrament__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=100, use_serial=False)
reg_entry_upsert_results = pd.DataFrame(reg_entry_upsert)
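
The batch-size note above can also be handled by splitting the payload before submission, so each chunk can be sent and inspected on its own. The following is a minimal sketch on toy data; `chunk_records` is a hypothetical helper and is not part of simple-salesforce.

```python
from typing import Iterator


def chunk_records(records: list[dict], size: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive slices of `records`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Toy records standing in for `bulk_data` (values are invented)
sample_records = [{"Archdpdx_Migration_Id__c": f"demo_{i}"} for i in range(250)]

batches = list(chunk_records(sample_records, size=100))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Each element of `batches` could then be passed as the `data` argument of a separate upsert call, which makes it easier to tell which slice triggered an `InvalidBatch` response.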

In [115]:
# Write upsert results to a local CSV file
import csv

keys = reg_entry_upsert[0].keys()

with open('register_entry_results', 'w', newline='') as csv_file:
    writer = csv.DictWriter(csv_file, keys)
    writer.writeheader()
    writer.writerows(reg_entry_upsert)

# Users


In [116]:
# df_users = df_contacts[df_contacts['Access Permission'].isna() == False]
# df_users = df_users[['Record Number', 'Common Name', 'Sort Name', 'Type(s)', 'ContactRecordType', 'Login ID', 'Access Permission']]
# df_users.sort_values('Access Permission')
# df_users.to_csv(f'/content/drive/Shareddrives/Clients/ADPDX (Portland)/Data/Clergy DB/working/users_working.csv')

# CONTACT > AFFILIATIONS


## Education (Affiliations)

This section takes multiple sets of columns (all related to a person's education) from the Contacts table, and combines them into a single set of columns in a new dataframe for insertion into Salesforce as Affiliation records.
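
As an illustration of the reshaping step alone, here is a minimal sketch on toy data. The column names follow the pattern used in `People.csv`, but every value is invented, and the Salesforce lookups performed by the real cell are left out.

```python
import pandas as pd

# Toy frame with two degree column sets; all values are invented
people = pd.DataFrame({
    "Record_Number": ["1", "2"],
    "Bachelor_Degree_Year": ["1976", None],
    "Bachelor_Degree_Type": ["BA Liberal Arts", None],
    "Bachelor_Degree_Institution": ["Example College", None],
    "Graduate_1_Degree_Year": ["1980", "2013"],
    "Graduate_1_Degree_Type": ["M.Div.", "MA Pastoral Studies"],
    "Graduate_1_Degree_Institution": ["Example Seminary", "Example University"],
})

prefixes = ["Bachelor_Degree", "Graduate_1_Degree"]

# Pull each column set out under common names, then stack the pieces
frames = []
for prefix in prefixes:
    subset = people[
        ["Record_Number", f"{prefix}_Year", f"{prefix}_Type", f"{prefix}_Institution"]
    ].copy()
    subset.columns = ["Record_Number", "Year", "Type", "Institution"]
    frames.append(subset)

education_long = (
    pd.concat(frames, ignore_index=True)
    .dropna(subset=["Year"])  # keep only degrees that have a year, as the real cell does
    .sort_values(["Record_Number", "Year"])
    .reset_index(drop=True)
)
print(education_long)
```

The result has one row per person per degree, which is the shape an Affiliation record needs.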

In [251]:
df.columns

Index(['Record_Number', 'Common_Name', 'Sort_Name', 'Type(s)', 'Clergy_Status',
       'Religious_Status', 'Login_ID', 'Password', 'Password_Must_be_Changed',
       'Access_Permission',
       ...
       'CARA_Ethnicity', 'Seminarian_Status', 'Other_Diaconal_Ministry',
       'Spiritual_Director_Authorized', 'Link_to_Religious_Community',
       'Place_of_Work', 'Volunteer_Place', 'Type_of_Work', 'Work_Load',
       'Work_Title'],
      dtype='object', length=142)

In [261]:
import pandas as pd
from functools import lru_cache

# Load CSV
df = (pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/People.csv')
               .rename(columns=lambda x: x.replace(' ', '_')) # Replace spaces in column names with underscores
               .drop(index=0) # Drops the extra row that replicates the labels
)


# Define the structure of your column sets with correct attribute names
degree_sets = [
    {'year': 'Bachelor_Degree_Year', 'type': 'Bachelor_Degree_Type', 'institution': 'Bachelor_Degree_Institution'},
    {'year': 'Graduate_1_Degree_Year', 'type': 'Graduate_1_Degree_Type', 'institution': 'Graduate_1_Degree_Institution'},
    {'year': 'Graduate_2_Degree_Year', 'type': 'Graduate_2_Degree_Type', 'institution': 'Graduate_2_Degree_Institution'},
    {'year': 'Graduate_3_Degree_Year', 'type': 'Graduate_3_Degree_Type', 'institution': 'Graduate_3_Degree_Institution'},
    {'year': 'Graduate_4_Degree_Year', 'type': 'Graduate_4_Degree_Type', 'institution': 'Graduate_4_Degree_Institution'}
]

# Query for the Record Type ID for 'Organization'
record_type_result = sf.query("SELECT Id FROM RecordType WHERE SobjectType = 'Account' AND DeveloperName = 'Organization'")
organization_record_type_id = record_type_result['records'][0]['Id'] if record_type_result['records'] else None

# Initialize the DataFrame for the staging table
education_staging = pd.DataFrame()

# Function to check and create institution account
@lru_cache(maxsize=None)
def get_or_create_institution_account(institution_name):
    if pd.isna(institution_name):
        return None  # Return None or handle as appropriate if institution name is NaN

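    # Query Salesforce to find the institution.
    # Escape backslashes and single quotes so names containing an apostrophe don't break the SOQL string.
    safe_name = str(institution_name).replace('\\', '\\\\').replace("'", "\\'")
    query = f"SELECT Id, Name FROM Account WHERE Name = '{safe_name}' LIMIT 1"
    results = sf.query(query)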
    
    # If exists, return the ID
    if results['records']:
        return results['records'][0]['Id']
    else:
        # NaN names were already filtered out at the top of the function
        account_data = {
            'Name': institution_name,
            'RecordTypeId': organization_record_type_id,
            'Organization_Type__c': 'School'
        }
        # Drop keys with None values (e.g. RecordTypeId if the record type query returned nothing)
        account_data = {k: v for k, v in account_data.items() if v is not None}
        
        new_account = sf.Account.create(account_data)
        return new_account['id']

# Get Contact record ID from Salesforce
@lru_cache(maxsize=None)
def get_contact_id_by_record_number(record_number):
    if pd.isna(record_number):
        return None
    query = f"SELECT Id FROM Contact WHERE Archdpdx_Migration_Id__c = '{record_number}'"
    results = sf.query(query)
    if results['records']:
        return results['records'][0]['Id']
    return None


# Initialize an empty list to collect DataFrames or dictionaries
new_entries = []

# Process each row and each degree set
for index, row in df.iterrows():
    for degree_set in degree_sets:
        year = row[degree_set['year']]
        if pd.notna(year):  # Only proceed if the year column is not NaN
            formatted_year = f"{int(year)}-01-01"  # Convert year to YYYY-MM-DD format
            institution_name = row[degree_set['institution']]
            account_id = get_or_create_institution_account(institution_name)
            contact_id = get_contact_id_by_record_number(row['Record_Number'])
            
            # Create a record for the staging table
            affiliation_record = {
                'mbfc__Person__c': contact_id,
                'mbfc__Completion_Date__c': formatted_year,
                'mbfc__Context__c': account_id,
                'mbfc__Category__c': 'Education (non-degree)',
                'mbfc__Affiliation__c': row[degree_set['type']]
                # 'Institution_Name': institution_name
            }
            new_entries.append(affiliation_record)

# Convert all collected records to a DataFrame in one go
education_staging = pd.DataFrame(new_entries)


#FIXME: There are 4 rows where no INSTITUTION is listed. This makes it impossible to import an Affiliation record. Need to figure out how to handle this with Client. 
#FIXME: There are about 15 rows where no DEGREE is listed.
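
To review the rows behind the two FIXME notes with the client, they can be pulled out of the staging table before loading. This is a minimal sketch on toy data: the column names match `education_staging`, while the IDs and values are invented placeholders.

```python
import pandas as pd

# Toy stand-in for `education_staging`; IDs are placeholders, not real Salesforce IDs
staging = pd.DataFrame({
    "mbfc__Person__c": ["003A", "003A", "003B"],
    "mbfc__Context__c": [None, "001X", "001Y"],
    "mbfc__Affiliation__c": ["Theology", "MA Pastoral Studies", None],
})

# Rows with no institution account (cannot be linked to a Context)
missing_institution = staging[staging["mbfc__Context__c"].isna()]

# Rows with no degree listed
missing_degree = staging[staging["mbfc__Affiliation__c"].isna()]

# Rows that have both values and are safe to load
loadable = staging.dropna(subset=["mbfc__Context__c", "mbfc__Affiliation__c"])

print(len(missing_institution), len(missing_degree), len(loadable))
```

Writing `missing_institution` and `missing_degree` to separate CSV files would give the client a short list to resolve while the `loadable` rows proceed.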

In [262]:
# Function to create a unique ID from the person, completion date, and affiliation type
def create_unique_id(row):
    # Concatenate the three fields with mbfc__Person__c at the front
    combined = f"{row['mbfc__Person__c']}{row['mbfc__Completion_Date__c']}{row['mbfc__Affiliation__c']}"
    # Remove unwanted characters and convert to lowercase
    clean_id = ''.join(combined.split()).replace('-', '').replace('.', '').lower()
    # Limit the string to 50 characters
    return clean_id[:50]

# Apply the function to each row and create a new column with the unique ID
education_staging['Archdpdx_Migration_Id__c'] = education_staging.apply(create_unique_id, axis=1)

# Check the first few rows to verify the new column
education_staging.head()

Unnamed: 0,mbfc__Person__c,mbfc__Completion_Date__c,mbfc__Context__c,mbfc__Category__c,mbfc__Affiliation__c,Archdpdx_Migration_Id__c
0,003Dx00000m0Oa7IAE,1996-01-01,,Education (non-degree),Theology,003dx00000m0oa7iae19960101theology
1,003Dx00000m0Oa7IAE,2013-01-01,001Dx00001FaYqrIAF,Education (non-degree),MA Pastoral Studies,003dx00000m0oa7iae20130101mapastoralstudies
2,003Dx00000m0OXKIA2,1976-01-01,001Dx00001FaYqwIAF,Education (non-degree),BA Liberal Arts,003dx00000m0oxkia219760101baliberalarts
3,003Dx00000m0OXKIA2,1980-01-01,001Dx00001FaYr1IAF,Education (non-degree),M.Div.,003dx00000m0oxkia219800101mdiv
4,003Dx00000m0OajIAE,2004-01-01,001Dx00001FaYr6IAF,Education (non-degree),"Bachelor, Philosophy","003dx00000m0oajiae20040101bachelor,philosophy"


In [263]:
# Save the staging table to CSV
education_staging.to_csv('staging_files/education_staging.csv', index=False)


# AFFILIATIONS


In [195]:
# @title Import Assignments.csv

import pandas as pd

# No longer needed...
# Organization_mapping = {
#     'Offices': 'Organization',
#     'Parishes': 'Church',
#     'RelCommunities': 'Religious',
#     'Schools': 'School',
#     'Vicariates': 'Deanery',
#     'NewmanCenters': 'Organization'
# }

df_affiliations = (
    pd.read_csv('/Users/matthewmartin/Library/CloudStorage/GoogleDrive-matt@meribahflow.com/Shared drives/Clients/ADPDX (Portland)/Data/Clergy DB/reports from clergypdx/Assignments (1).csv')
    .set_index('Record Number', verify_integrity=True)
    .drop(index='recNum', errors='ignore')  # Added errors='ignore' to prevent errors if 'recNum' does not exist
    .drop(columns=['Historic Name'], errors='ignore')  # Added errors='ignore' for the same reason
    .rename(columns=lambda x: x.replace(' ', '_'))  # Replace spaces in column names with underscores
    .assign(Account_Ext_Id=lambda df: df['Organization_Table_Name'] + '_' + df['Organization_Table_Link'])
    .assign(mbfc__Person__r=lambda df: df['Assigned_Person'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    .assign(mbfc__Context__r=lambda df: df['Account_Ext_Id'].apply(lambda x: {'Archdpdx_Migration_Id__c': x}))
    # .assign(mbfc__Use_Custom_Title__c= True)
    .assign(mbfc__Category__c= 'Any All')
    # .assign(Archdpdx_Migration_Id__c= df_affiliations.index)
    .drop(columns=[
        'Assigned_Person'
        ,'Organization_Table_Name'
        ,'Organization_Table_Link'
        ,'Projected_Term_End_Date'
        ,'Term_Number'
        ,'Leave_Type' # Leave out 'Leave_Type' until mapped properly
        ])
    .rename(columns={
        'Duty_Load': 'Duty_Load__c',
        'Start_Date': 'mbfc__Start_Date__c',
        'End_Date': 'mbfc__Completion_Date__c',
        'Assignment_Title': 'mbfc__Affiliation__c',
        'Archdiocesan_Assignment': 'ADPDX_Archdiocesan_Assignment__c',
    })
    .replace({'ADPDX_Archdiocesan_Assignment__c': {'Yes': True, 'No': False, None: False}})
    .fillna('')
)

# Display a sample of the DataFrame to check the new structure
df_affiliations.sample(10)



Record Number,mbfc__Affiliation__c,ADPDX_Archdiocesan_Assignment__c,Duty_Load__c,mbfc__Start_Date__c,mbfc__Completion_Date__c,Account_Ext_Id,mbfc__Person__r,mbfc__Context__r,mbfc__Category__c
3296,Business Manager,False,,2023-01-01,,Parishes_9,{'Archdpdx_Migration_Id__c': '3148'},{'Archdpdx_Migration_Id__c': 'Parishes_9'},Any All
3349,In Residence,True,Full Time,2023-10-28,,RelCommunities_2,{'Archdpdx_Migration_Id__c': '3023'},{'Archdpdx_Migration_Id__c': 'RelCommunities_2'},Any All
2752,Children’s Faith Formation Coordinator,False,,,,Parishes_53,{'Archdpdx_Migration_Id__c': '2878'},{'Archdpdx_Migration_Id__c': 'Parishes_53'},Any All
3051,Director of Campus Ministry,False,Full Time,1995-07-01,2006-07-01,Schools_56,{'Archdpdx_Migration_Id__c': '629'},{'Archdpdx_Migration_Id__c': 'Schools_56'},Any All
165,Administrator,True,,,2018-06-30,Parishes_75,{'Archdpdx_Migration_Id__c': '804'},{'Archdpdx_Migration_Id__c': 'Parishes_75'},Any All
970,Business Manager,False,,,,Parishes_53,{'Archdpdx_Migration_Id__c': '1749'},{'Archdpdx_Migration_Id__c': 'Parishes_53'},Any All
1784,Parish Secretary,False,,,,Parishes_79,{'Archdpdx_Migration_Id__c': '2287'},{'Archdpdx_Migration_Id__c': 'Parishes_79'},Any All
2866,Music Coordinator,False,,,,Parishes_153,{'Archdpdx_Migration_Id__c': '2963'},{'Archdpdx_Migration_Id__c': 'Parishes_153'},Any All
3165,On Leave,True,Full Time,2022-12-31,,Offices_21,{'Archdpdx_Migration_Id__c': '564'},{'Archdpdx_Migration_Id__c': 'Offices_21'},Any All
2828,Music Director,False,,,,Parishes_112,{'Archdpdx_Migration_Id__c': '2933'},{'Archdpdx_Migration_Id__c': 'Parishes_112'},Any All


In [196]:
#FIXME

#TODO: Required fields are missing: [mbfc__Category__c, mbfc__Affiliation__c] 
#TODO: INVALID_TYPE_ON_FIELD_IN_RECORD: Archdiocesan Assignment: value not of required type:  [ADPDX_Archdiocesan_Assignment__c]


In [197]:
# Set Archdpdx_Migration_Id__c External ID
df_affiliations['Archdpdx_Migration_Id__c'] = df_affiliations.index

# Create Job ID
df_affiliations['Archdpdx_Job_Id__c'] = curr_job_id



In [198]:
# Final cleanup
df_affiliations.drop(columns=['Account_Ext_Id'], inplace=True)

#FIXME: INVALID_FIELD: Foreign key external ID: relcommunities_23 not found for field Archdpdx_Migration_Id__c
#FIXME: INVALID_FIELD: Foreign key external ID: offices_0 not found for field Archdpdx_Migration_Id__c
#FIXME: Record #115 > FIELD_INTEGRITY_EXCEPTION: Start Date: invalid date: Tue Aug 01 00:00:00 GMT 1021 [mbfc__Start_Date__c]
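
Both kinds of failure listed above (unknown foreign-key external IDs and out-of-range dates) can be caught locally before the upsert. This is a minimal sketch on toy data; `known_account_ids` is a hypothetical set that would in practice come from the Accounts already loaded, and the 1700 cutoff is an assumed lower bound to adjust as needed.

```python
import pandas as pd

# Toy stand-in for `df_affiliations`; record numbers and values are invented
affiliations = pd.DataFrame(
    {
        "mbfc__Context__r": [
            {"Archdpdx_Migration_Id__c": "Parishes_9"},
            {"Archdpdx_Migration_Id__c": "RelCommunities_23"},
            {"Archdpdx_Migration_Id__c": "Offices_0"},
            {"Archdpdx_Migration_Id__c": "Parishes_9"},
        ],
        "mbfc__Start_Date__c": ["2023-01-01", "", "2020-05-01", "1021-08-01"],
    },
    index=pd.Index(["3296", "9001", "9002", "115"], name="Record Number"),
)

# Hypothetical: external IDs of Accounts known to exist in Salesforce
known_account_ids = {"Parishes_9", "Offices_21"}

# Flag rows whose Context lookup points at an Account that was never loaded
context_ids = affiliations["mbfc__Context__r"].apply(
    lambda ref: ref["Archdpdx_Migration_Id__c"]
)
unknown_accounts = affiliations[~context_ids.isin(known_account_ids)]

# Flag rows whose start year is implausibly early; blank dates become NaN and are skipped
years = pd.to_numeric(affiliations["mbfc__Start_Date__c"].str[:4], errors="coerce")
suspect_dates = affiliations[years.notna() & (years < 1700)]

print(list(unknown_accounts.index), list(suspect_dates.index))
```

Rows in either result can be corrected in the source or held back, so the upsert only receives records that are expected to pass.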

In [234]:
df_affiliations.to_csv('staging_files/affiliations_staging.csv', encoding='utf-8', index=False)

In [217]:
# @title Build Affiliation Records for Upsert

bulk_data = []
for row in df_affiliations.itertuples(index=False):
    d = row._asdict()
    bulk_data.append(d)

In [243]:
# Attempt to use simple-salesforce's Bulk 2.0 API
# with open('staging_files/affiliations_staging.csv', 'r', encoding='utf-8') as file:
#     csv_data = file.read()


# affiliation_upsert = sf.bulk2.mbfc__Placement__c.upsert('staging_files/affiliations_staging.csv', external_id_field='Archdpdx_Migration_Id__c', encode='utf-8')

In [244]:
# Upsert Salesforce records
# FIXME: Encoding is getting messed up and I'm unsure how to pass in a parameter that will fix this. 
affiliation_upsert = sf.bulk.mbfc__Placement__c.upsert(data=bulk_data, external_id_field='Archdpdx_Migration_Id__c', batch_size=1000, use_serial=False)
affiliation_upsert_results = pd.DataFrame(affiliation_upsert)

affiliation_upsert_results

Unnamed: 0,success,created,id,errors
0,False,False,,"[{'statusCode': 'INVALID_FIELD', 'message': 'U..."
1,False,False,,"[{'statusCode': 'INVALID_FIELD', 'message': 'J..."
2,False,True,,[{'statusCode': 'FIELD_CUSTOM_VALIDATION_EXCEP...
3,False,False,,"[{'statusCode': 'INVALID_FIELD', 'message': 'J..."
4,False,False,,"[{'statusCode': 'INVALID_FIELD', 'message': 'J..."
...,...,...,...,...
7795,False,False,,"[{'statusCode': 'INVALID_FIELD', 'message': 'J..."
7796,False,True,,[{'statusCode': 'FIELD_CUSTOM_VALIDATION_EXCEP...
7797,False,False,,"[{'statusCode': 'INVALID_FIELD', 'message': 'J..."
7798,False,False,,"[{'statusCode': 'INVALID_FIELD', 'message': 'J..."


In [245]:
affiliation_upsert_results.to_csv('results_files/affiliation_upsert_results')
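
With several thousand failed rows, a count of errors per status code is a quicker starting point than reading the truncated output above. This is a minimal sketch on toy data that mirrors the shape of the results frame (`success`, `created`, `id`, `errors`); the messages and the ID are invented.

```python
import pandas as pd

# Toy stand-in for `affiliation_upsert_results`
results = pd.DataFrame([
    {"success": False, "created": False, "id": None,
     "errors": [{"statusCode": "INVALID_FIELD", "message": "example message"}]},
    {"success": False, "created": True, "id": None,
     "errors": [{"statusCode": "FIELD_CUSTOM_VALIDATION_EXCEPTION", "message": "example message"}]},
    {"success": False, "created": False, "id": None,
     "errors": [{"statusCode": "INVALID_FIELD", "message": "example message"}]},
    {"success": True, "created": True, "id": "a0XDEMO", "errors": []},
])

# One row per individual error dict, taken from failed records only
error_rows = results.loc[~results["success"], "errors"].explode().dropna()

# Count how often each status code occurs
status_counts = error_rows.apply(lambda err: err["statusCode"]).value_counts()
print(status_counts.to_dict())
```

The same `error_rows` series can be expanded with `pd.json_normalize(error_rows.tolist())` when the full messages are needed alongside the codes.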


# Post-Migration Manual Updates

1. Convert 'Offices' that are ADPDX Pastoral Centre offices into record type: 'Groups', and set their parentID to the Diocese (there are just 6 of these accounts).
1. Update the 'Religious Superior' lookup on the Religious Order records.
1. Set the 'organization type' field value for each account in the 'organization' load: Offices, Newman Centres, Schools, Organizations.
