Integration options for Salesforce Sales Cloud

Deployment configuration for Salesforce Sales Cloud

| Parameter | Meaning | Default Value | Description |
| --- | --- | --- | --- |
| SFDC.deployCDC | Deploy CDC | true | Generate CDC processing scripts to run as DAGs in Cloud Composer. See the documentation for the different ingestion options for Salesforce Sales Cloud. |
| SFDC.createMappingViews | Create mapping views | true | The provided DAGs that fetch new records from the Salesforce APIs update records on landing. When set to true, views are generated in the CDC processed dataset to surface tables with the "latest version of the truth" from the Raw dataset. When false and SFDC.deployCDC is true, DAGs are generated with change data capture processing based on SystemModstamp. See details on CDC processing for Salesforce. |
| SFDC.createPlaceholders | Create placeholders | true | Create empty placeholder tables if they are not generated by the ingestion process, so the downstream reporting deployment can execute without failure. |
| SFDC.datasets.raw | Raw landing dataset | - | Used by the CDC process; this is where the replication tool lands the data from Salesforce. If using test data, create an empty dataset. |
| SFDC.datasets.cdc | CDC processed dataset | - | Dataset that works as a source for the reporting views and as a target for the DAGs that process records. If using test data, create an empty dataset. |
| SFDC.datasets.reporting | Reporting dataset SFDC | "REPORTING_SFDC" | Name of the dataset that is accessible to end users for reporting, where views and user-facing tables are deployed. |
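
For reference, these parameters correspond to a nested SFDC block in the deployment's config.json file. A minimal, hedged sketch (the raw and cdc dataset names below are placeholders, not defaults):

{
  "SFDC": {
    "deployCDC": true,
    "createMappingViews": true,
    "createPlaceholders": true,
    "datasets": {
      "raw": "RAW_SFDC",
      "cdc": "CDC_SFDC",
      "reporting": "REPORTING_SFDC"
    }
  }
}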

Loading Salesforce Sales Cloud data into BigQuery

We provide a replication solution based on Python scripts scheduled in Apache Airflow and the Salesforce Bulk API 2.0. These Python scripts can be adapted and scheduled in your tool of choice. See the section below for detailed setup instructions.

There are three sets of processing options for data integration:

  • API call and load into Raw datasets, updating existing records if needed
  • Source-to-target structure mapping views
  • CDC processing scripts

If you have datasets already loaded through a different tool in append-always mode, the CDC processing scripts contain mapping files to map the schema of the tables as generated by your tool into the names and data types of the structure required by the reporting views in Cortex Data Foundation. You can also add custom fields in the schema definition so they are incorporated in the CDC processing.

Note: For CDC scripts to work, the Id for each API object (for example, Account Id) and the SystemModstamp need to be present in the source table. These fields should either keep their original names (Id, SystemModstamp) or be mapped to {object_name}Id and SystemModstamp, respectively.

For example, the source table with Account data should have the original Id and SystemModstamp fields. If these fields have different names, the src/SFDC/src/table_schema/accounts.csv file must be updated to map the name of the id field to AccountId and the name of the system modification timestamp field to SystemModstamp, as illustrated in the sketch below.
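
As an illustration, assuming your replication tool lands the Account key in a field named your_id_field and the modification timestamp in a field named your_modstamp_field (hypothetical names; the data types shown are also illustrative), the corresponding rows in src/SFDC/src/table_schema/accounts.csv would look like this:

your_id_field,AccountId,STRING
your_modstamp_field,SystemModstamp,TIMESTAMP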

If you already have the replication and CDC working for Salesforce APIs and only need the mapping, you can edit the mapping files to generate views that translate the structure generated by the integration tool to the structure expected by Cortex Data Foundation reporting views.

Salesforce Sales Cloud data requirements

  • The structure of the source tables follows snake_case naming in plural form (for example, some_objects). The columns have the same data types as how Salesforce represents them internally. Some fields have been renamed for better readability in the reporting layer.
  • Any required tables that did not exist within the raw dataset will be created as empty tables during deployment. This is to ensure the CDC deployment step runs correctly.
  • For CDC scripts to work, the Id for each API object (for example, Account Id) and the SystemModstamp need to be present in the source table. The provided Raw processing scripts fetch these fields automatically from the APIs and update the target replication table.
  • The provided Raw processing scripts do not require additional change data capture processing. This behavior is set during deployment by default.

Source tables for Currency Conversion in Salesforce Sales Cloud

The currency conversion functionality of Salesforce relies on the existence of the objects CurrencyTypes and DatedConversionRates within the source Salesforce system, which are available only if Advanced Currency Management is enabled. If not, you may want to remove relevant entries from src/SFDC/config/ingestion_settings.yaml to avoid running into errors during Salesforce to Raw extraction.

If these tables are not available, empty placeholder tables are created for them automatically during deployment to avoid breaking the Reporting logic.

Configure Salesforce Sales Cloud integration with Cortex ingestion templates

Configure API integration and CDC for Salesforce Sales Cloud

Following a principle of openness, customers are free to use the provided replication scripts for Salesforce or a data replication tool of their choice, as long as the data meets the same structure and level of aggregation as provided by the Salesforce APIs. If you are using another tool for replication, that tool can either append updates as new records (append-always pattern) or update existing records with changes when landing the data in BigQuery. If the tool does not update the records and replicates any changes as new records into a target (Raw) table, Cortex Data Foundation provides the option to create change data capture (CDC) processing scripts.

To ensure that the names of tables, names of fields, and data types are consistent with the structures expected by Cortex regardless of the replication tool, you can modify the mapping configuration to map your replication tool's output or existing schemata. This generates mapping views compatible with the structure expected by Cortex Data Foundation.

Three options depending on replication tool

You can use the configuration in ingestion_settings.yaml to configure the generation of scripts that call the Salesforce APIs and replicate the data into the Raw dataset (section salesforce_to_raw_tables) and the generation of scripts that process changes landing in the Raw dataset into the CDC processed dataset (section raw_to_cdc_tables).

By default, the scripts provided to read from APIs update changes into the Raw dataset, so CDC processing scripts are not required, and mapping views to align the source schema to the expected schema are created instead.

CDC processing scripts are not generated if SFDC.createMappingViews in the config.json file remains true (default behavior). If CDC scripts are required, set SFDC.createMappingViews to false. This second step also allows for mapping from the source schemata to the schemata required by Cortex Data Foundation.

The following example of an ingestion_settings.yaml configuration file illustrates the generation of mapping views when a replication tool updates the data directly into the replicated dataset, as illustrated in option 3 (i.e., no CDC is required, only re-mapping of tables and field names). Since no CDC is required, this option executes as long as the parameter SFDC.createMappingViews in the config.json file remains true.

ingestion_settings.yaml example

In this example, removing the configuration for a base table (or for all of them) from a section skips the generation of DAGs for that base table (or for the entire section), as illustrated for salesforce_to_raw_tables. For this scenario, setting the parameter deployCDC: False has the same effect, as no CDC processing scripts need to be generated.
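
A hedged sketch of what such a configuration could look like; the key names and values are illustrative only, so refer to the ingestion_settings.yaml file shipped with the repository for the exact schema:

# salesforce_to_raw_tables: section removed, so no extraction DAGs are generated
raw_to_cdc_tables:
  - base_table: accounts        # table to surface in the CDC processed dataset (illustrative)
    load_frequency: "@daily"    # illustrative scheduling value
  - base_table: cases
    load_frequency: "@daily"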

The following example illustrates the mapping of the field unicornId as landed by a replication tool to the name and type expected by Cortex Data Foundation, AccountId as a String.

Only remap
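
Following the source name, target name, data type convention of the mapping CSV files (see the polymorphic field example below), the remap described above would be a single row such as:

unicornId,AccountId,STRING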

Polymorphic field type names (e.g., Who.Type) can be replicated by adding a [Field Name]_Type item in the respective mapping CSV file. For example, if you need the Who.Type field of the Task object to be replicated, add the line Who_Type,Who_Type,STRING to src/SFDC/src/table_schema/tasks.csv.

Make any changes to the DAG templates for CDC or for Raw as required by your instance of Airflow or Cloud Composer. You will find more information in the Appendix - Gathering Cloud Composer settings.

If you do not need any DAGs for Raw data generation from API calls or CDC processing, set parameter deployCDC to false. Alternatively, you can remove the contents of the sections in ingestion_settings.yaml. If data structures are known to be consistent with those expected by Cortex Data Foundation, you can skip the generation of mapping views with parameter SFDC.createMappingViews set to false.

Configuring Salesforce to BigQuery extraction module

These are the generic steps to use the Salesforce to BigQuery extraction module provided by Data Foundation. Your requirements and flow may vary depending on your system and existing configuration. You can alternatively use other available tools.

Setting up required credentials & Connected App in Salesforce

You need to log in as an administrator to your Salesforce instance to complete the following:

  1. Create or identify a profile in Salesforce that
    • Is granted permission for Apex REST Services & API Enabled under System Permissions.
    • Is granted View All permission for all objects that you would like to replicate. For example, Account, Cases, etc. Check for restrictions or issues with your security administrator.
    • Is ideally not granted any permissions related to user interface login, such as Salesforce Anywhere in Lightning Experience, Salesforce Anywhere on Mobile, Lightning Experience User, or Lightning Login User. Check for restrictions or issues with your security administrator.
  2. Create or identify an existing user in Salesforce. You need to know the user's user name, password, and security token.
    • This should ideally be a user dedicated to executing this replication.
    • The user should be assigned to the profile you have created or identified in Step 1.
    • You can see User Name and reset Password here.
    • You can reset the security token if you do not have it and it is not used by another process.
  3. Create a Connected App. It will be the only communication channel for establishing a connection to Salesforce from the external world, using the profile, the Salesforce API, the standard user credentials, and the security token.
    • Follow the instructions to enable OAuth Settings for API Integration.
    • Make sure Require Secret for Web Server Flow and Require Secret for Refresh Token Flow are enabled in the API (Enable OAuth Settings) section.
    • See the documentation on how to get your consumer key (which will be later used as your Client ID). Check with your security administrator for issues or restrictions.
  4. Assign your Connected App created in Step 3 to the profile created in Step 1.
    • Select Setup from the top right of the Salesforce home screen.
    • In the Quick Find box, enter profile, then select Profile. Search for the profile created or identified in Step 1.
    • Open the profile.
    • Click the Assigned Connected Apps link.
    • Click Edit.
    • Add the newly created Connected App from Step 3.
    • Click on the Save button.

Note down the User Name, Password, Security Token, and Client ID from the steps above.

Setting up Google Cloud Secret Manager

The Salesforce-to-BigQuery module relies on Google Cloud Secret Manager to work. This process is thoroughly documented in the documentation for Cloud Composer.

Please create a secret as follows:

Secret Name:

airflow-connections-salesforce-conn

Secret Value:

http://<username>:<password>@https%3A%2F%2F<instance-name>.lightning.force.com?client_id=<client_id>&security_token=<secret-token>

Where <username>, <password>, <secret-token>, and <client_id> correspond to the User Name, Password, Security Token, and Client ID noted in the steps above.

See the documentation on how to find your Instance Name.
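
For example, the secret could be created with the gcloud CLI as follows (replace the placeholders with your values):

$ echo -n "http://<username>:<password>@https%3A%2F%2F<instance-name>.lightning.force.com?client_id=<client_id>&security_token=<secret-token>" \
    | gcloud secrets create airflow-connections-salesforce-conn \
        --replication-policy="automatic" \
        --data-file=-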

Cloud Composer libraries for Salesforce replication

To execute the Python scripts in the DAGs provided by the Data Foundation, you need to install some dependencies.

For Airflow version 1.10, follow the documentation to install the following packages, in order:

tableauserverclient==0.17
apache-airflow-backport-providers-salesforce==2021.3.3

For Airflow version 2.x, follow the documentation to install apache-airflow-providers-salesforce~=5.2.0.

Here is a command to install each required package:

$ gcloud composer environments update <ENVIRONMENT_NAME> \
    --location <LOCATION> \
    --update-pypi-package <PACKAGE_NAME><EXTRAS_AND_VERSION>

For example,

$ gcloud composer environments update my-composer-instance \
    --location us-central1 \
    --update-pypi-package "apache-airflow-backport-providers-salesforce>=2021.3.3"
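
For an Airflow 2.x environment, the equivalent command would be, for example:

$ gcloud composer environments update my-composer-instance \
    --location us-central1 \
    --update-pypi-package "apache-airflow-providers-salesforce~=5.2.0"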

Enable Secret Manager as a backend

Enable Google Secret Manager as the security backend. See details here.
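
One way to do this for an Airflow 2.x environment is an Airflow configuration override on the Composer environment (shown here as a hedged example; check the linked details for the authoritative property values):

$ gcloud composer environments update <ENVIRONMENT_NAME> \
    --location <LOCATION> \
    --update-airflow-configs=secrets-backend=airflow.providers.google.cloud.secrets.secret_manager.CloudSecretManagerBackend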

Allow the Composer service account to access secrets

Make sure your Composer service account (default: GCE service account) has Secret Manager Secret Accessor permission. See details in the access control documentation.
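
For example, the role can be granted at the project level as follows (granting it on the individual secret is also possible):

$ gcloud projects add-iam-policy-binding <PROJECT_ID> \
    --member="serviceAccount:<COMPOSER_SERVICE_ACCOUNT_EMAIL>" \
    --role="roles/secretmanager.secretAccessor"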

BigQuery connection in Airflow

Make sure to create the connection sfdc_cdc_bq according to the instructions.
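
As a hedged example, the connection could be created through the Airflow CLI exposed by Cloud Composer; connection fields and extras depend on your Airflow and provider versions, so follow the linked instructions for the authoritative steps:

$ gcloud composer environments run <ENVIRONMENT_NAME> \
    --location <LOCATION> \
    connections add -- sfdc_cdc_bq \
    --conn-type=google_cloud_platform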