Capabilities


This document outlines the capabilities that the platform currently supports.

Datasets

Microsoft Graph data connect supports the following data sets today. To view the schema for a data set, create a new data set in ADF and use the Schema tab. More detailed steps are found in the UX walkthrough at the bottom of Run Azure Data Factory pipelines to copy Office 365 Data.

If you would like to see additional data sets supported, please let us know.

| Data set name | Description |
| --- | --- |
| BasicDataSet_v0.Contact_v0 | This table contains contact information from each user's address book. This corresponds to Outlook personal contacts. |
| BasicDataSet_v0.Event_v0 | This table contains the events in each user's calendar. This corresponds to events in the Outlook calendar API. |
| BasicDataSet_v0.Message_v0 | This table contains the messages in each user's mailbox. This corresponds to messages in the Outlook mail API. |
| BasicDataSet_v0.SentItem_v0 | This table contains the messages sent from each user's mailbox. This corresponds to messages in the Outlook mail API. |
| BasicDataSet_v0.User_v0 | This table contains user information (DisplayName, UserPrincipalName, etc.). This corresponds to users in the Outlook user API. |
| BasicDataSet_v0.MailboxSettings_v0 | This table contains the mailbox settings of each user. This corresponds to the Outlook mailbox settings API. |
| BasicDataSet_v0.MailFolder_v0 | This table contains the mail folders from each user's mailbox. This corresponds to the Outlook mail folder API. |

To explore the Graph API further, check out the Graph Explorer.

ADF Sinks

The following Sinks are supported for a copy activity with Office 365 as the source:

  • Azure Data Lake Storage Gen 1
  • Azure Storage Blob
  • Azure Data Lake Storage Gen 2

Refer here for samples of each sink type.
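
For illustration, here is a minimal sketch of an Azure Storage Blob sink linked service in the same ARM template style as the samples below; the variable names are placeholders you would define yourself, and the connection string should be supplied as a secure value:

{
    "type": "linkedservices",
    "name": "[variables('destinationLinkedServiceName')]",
    "comments": "Hypothetical example: creates the link to the Azure Storage Blob account used as the sink of the copy activity.",
    "dependsOn": [
        "[variables('dataFactoryName')]"
    ],
    "apiVersion": "2017-09-01-preview",
    "properties": {
        "type": "AzureBlobStorage",
        "description": "Destination blob storage linked service",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "[variables('destinationStorageConnectionString')]"
            }
        }
    }
}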

The output files will be in JSON Lines format (one JSON object per line). The output format is fixed and there is no support for modifying it.
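
For example, each line of an output file is a standalone JSON object, one per extracted row. A hypothetical record from BasicDataSet_v0.Message_v0 (field values invented for illustration) might look like this:

{"Subject": "Quarterly review", "HasAttachments": false, "Importance": "normal", "SentDateTime": "2018-06-20T08:50:27Z"}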

Policies

The following Azure policies are supported for an Azure Managed Application built using Office 365 data:

Inclusion of the following policies is coming soon:

When you select any of these policies during Azure Marketplace publishing, the policy compliance status will be checked and enforced for all installations of your application. All selected policies that are compliant will be shown to the data approvers as part of the data request. Any policy compliance violation will cause the pipeline run to fail and stop the data movement.

User Selection

In the SourceLinkedService, you can specify a list of groups to reduce the scope of the pipeline from your entire organization to only the members of those groups. To do so, add a new field allowedGroups under typeProperties and set it to a list of up to 10 group object IDs. To see how to create a group, refer to Create a group and add members in Azure Active Directory.

If no groups are specified, data will be extracted for the entire organization by default.

Example of a SourceLinkedService with two groups:

{
    "type": "linkedservices",
    "name": "[variables('sourceLinkedServiceName')]",
    "comments": "Creates the link to O365 which is used as the source of the data extraction. Using service principal supplied by the source ADLS owner.",
    "dependsOn": [
        "[variables('dataFactoryName')]"
    ],
    "apiVersion": "2017-09-01-preview",
    "properties": {
        "type": "Office365",
        "description": "Source O365 linked service",
        "typeProperties": {
            "office365tenantId": "[subscription().tenantId]",
            "PrivacyPolicyUri": "http://www.wkw.com/privacy",
            "TermsOfUseUri": "http://www.wkw.com/tos",
            "servicePrincipalId": "[variables('sourceLinkedServicePrincipalId')]",
            "servicePrincipalKey": {
                "type": "SecureString",
                "value": "[variables('sourceLinkedServicePrincipalKey')]"
            },
            "servicePrincipalTenantId": "[variables('sourceLinkedServicePrincipalTenantId')]",
            "allowedGroups": ["a61c9441-7298-4d30-9847-ef8ff8482083","ee018a32-8e5d-4c4e-a3b4-4788a4a2582c"]
        }
    }
}

Filters

A filter or predicate expression that evaluates to True or False can be used to filter out rows of data extracted from the Office 365 dataset.

Currently, predicate based filtering is supported on the following columns of Office 365 datasets:

| Dataset Name | Columns that support filtering | Default Range |
| --- | --- | --- |
| BasicDataSet_v0.Contact_v0 | CreatedDateTime, LastModifiedDateTime | now - 30d < CreatedDateTime < now |
| BasicDataSet_v0.Event_v0 | CreatedDateTime, LastModifiedDateTime | now - 30d < CreatedDateTime < now |
| BasicDataSet_v0.Message_v0 | CreatedDateTime, LastModifiedDateTime, ReceivedDateTime, SentDateTime | now - 30d < ReceivedDateTime < now |
| BasicDataSet_v0.SentItem_v0 | CreatedDateTime, LastModifiedDateTime, ReceivedDateTime, SentDateTime | now - 30d < SentDateTime < now |

If no predicate is specified, by default the last 30 days of data is fetched.

The predicate expression specified in the ARM template must follow these rules:

  • An expression can contain at most two predicates, separated by an AND, OR, && or || operator, and each predicate must evaluate to a boolean value
  • Each predicate should be in the format <Filterable column of table> <Predicate operator> <Datetime value>
  • Filterable column of table - This should be one of the filterable columns of the table that needs to be extracted. Please refer to the table above to see the list of filterable columns for each supported table.
  • Predicate operator - This can be one of the following: <, >, <=, >=, !=, gt, lt, ge, le, ne
  • DateTime value - This is the value of the column that the rows will be filtered on. The following datetime patterns are supported:
    • Short date pattern, example: 2016/10/28
    • General date/time pattern (long time), example: 6/15/2009 1:45:30 PM
    • Roundtrip date/time pattern, examples: 2018-06-20T00:00:27Z, 2018-06-20, 2017-10-28T08:50:27, 2017-10-28T08:50:27.000, 2017-10-28T08:50:27.0000000+03:00
  • Examples of valid predicates for BasicDataSet_v0.Event_v0 dataset:
    • CreatedDateTime > 2016-10-28T15:50:27 || LastModifiedDateTime < 2018-06-20T08:50:27.000
    • CreatedDateTime gt 2016/10/28 10:50:27 AM OR CreatedDateTime le 2018-06-20
    • CreatedDateTime != 2016/10/28 AND LastModifiedDateTime ne 2018-06-20T08:50:27.0000000+03:00
    • CreatedDateTime >= 2016/10/28 && LastModifiedDateTime lt 2018-06-20T13:45:27.0000000-07:00
    • CreatedDateTime lt 2018-06-20
  • Expressions and functions can also appear inside a predicate, using a feature called string interpolation where expressions are wrapped in @{ ... }. For details, refer to Expressions and functions in Azure Data Factory. For example (see also the dataset fragment after this list):
    • CreatedDateTime >= @{formatDateTime(adddays(utcnow(), -5), 'yyyy-MM-dd')}
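
As a sketch, such a dynamic predicate can be wired into the dataset definition as follows (the table name and the 5-day window are illustrative; see the full dataset example below for context):

"typeProperties": {
    "tableName": "BasicDataSet_v0.Message_v0",
    "predicate": "ReceivedDateTime >= @{formatDateTime(adddays(utcnow(), -5), 'yyyy-MM-dd')}"
}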

In the InputDataset, you can add a predicate field under typeProperties of Office365Table source dataset. Here's an example of the Office365Table source dataset with a predicate:

{
    "type": "datasets",
    "name": "[variables('inputEventDatasetName')]",
    "dependsOn": [
        "[variables('dataFactoryName')]",
        "[variables('sourceLinkedServiceName')]"
    ],
    "apiVersion": "2017-09-01-preview",
    "properties": {
        "published": false,
        "type": "Office365Table",
        "linkedServiceName": {
            "referenceName": "[variables('sourceLinkedServiceName')]",
            "type": "LinkedServiceReference"
        },
        "structure": [
            {
                "name": "Subject",
                "type": "string"
            },
            {
                "name": "HasAttachments",
                "type": "boolean"
            },
            {
                "name": "End",
                "type": "string"
            },
            {
                "name": "Start",
                "type": "string"
            },
            {
                "name": "ResponseStatus",
                "type": "string"
            },
            {
                "name": "Organizer",
                "type": "string"
            },
            {
                "name": "Attendees",
                "type": "string"
            },
            {
                "name": "Importance",
                "type": "string"
            },
            {
                "name": "Sensitivity",
                "type": "string"
            },
            {
                "name": "Recurrence",
                "type": "string"
            },
            {
                "name": "WebLink",
                "type": "string"
            }
        ],
        "typeProperties": {
            "tableName": "BasicDataSet_v0.Event_v0",
            "predicate": "CreatedDateTime >= 2016/10/28 OR CreatedDateTime <= 2018-06-20"
        },
        "external": true
    }
}

Data Regions

Every data factory has an Azure Integration Runtime (IR) that is auto-resolved by default, unless an Azure IR is explicitly defined. For the auto-resolved Azure IR, the region will be the location of the sink; if that is not detectable, it will be the region of the data factory. Refer here to understand more details of how ADF computes the data movement region.
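
If you need to pin the data movement region explicitly, one approach is to define an Azure IR with a fixed location and have linked services opt into it via a connectVia reference of type IntegrationRuntimeReference. Below is a minimal sketch, assuming the Managed integration runtime resource shape; the name and location are placeholders:

{
    "type": "integrationruntimes",
    "name": "WestEuropeIR",
    "comments": "Hypothetical example: pins the data movement region to West Europe instead of relying on auto-resolve.",
    "dependsOn": [
        "[variables('dataFactoryName')]"
    ],
    "apiVersion": "2017-09-01-preview",
    "properties": {
        "type": "Managed",
        "typeProperties": {
            "computeProperties": {
                "location": "West Europe"
            }
        }
    }
}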

For connecting to Office 365, the data movement region determined above must be the same region where the Office 365 tenant users' mailboxes are located. Users whose mailboxes are not located in the computed data movement region are dropped during extraction. If none of the users' mailboxes match the Azure data movement region, the pipeline will fail. Below is a mapping between Office regions and Azure regions.

| Office Region | Azure Region |
| --- | --- |
| North America | East US 2, East US, West Central US, West US, West US 2 |
| Europe | West Europe, North Europe |
| Asia-Pacific | Southeast Asia |
| Australia | Australia East*, Australia Southeast* |

[*] These regions support only Azure Storage Blob as the destination sink.

Integration with Privileged Access Management

Overview

Pipelines that copy data from Office 365 into Azure must be approved by a member of a data access request approver group. Refer here on how to set up the data access approver group. Approvals persist for 6 months, and one approval is needed per copy activity in the pipeline.

Approval request emails will be sent to each member of the approver group to notify them when copy activities request access to extract Office 365 data. Refer here on how to approve requests. Approvers can approve or deny these requests, or revoke a previously approved request.

NOTE: In the future, approvers will be able to approve, deny, revoke and view all requests through the Office Admin portal.

Every request will always include details about the data set and the users from whom data is being extracted:

  • DataTable: The data set being extracted (e.g., Sent Items)
  • Columns: The list of columns being extracted from the data table (e.g., Sent DateTime)
  • AllowedGroups: The group(s) of users against whom the pipeline is extracting data. If the list of groups is empty, then the pipeline is requesting access to data from ALL users in the tenant.
  • OutputUri: The output path in which the extracted data will be stored.

There are also some fields in the request that will be available only in some cases:

  • Application Name and the Marketplace URI (available only for Azure Marketplace Applications)
  • Links to the application's privacy policy and terms of service (available only if the application provides it)
  • The compliance policies that the application enforces, such as data encryption at rest in the output storage location. (available only for Azure Marketplace Applications)

Approval Behavior

Approval requests are based on the Azure Data Factory, Pipeline and Copy Activity names. Every copy activity run will verify that the Office 365 admin has approved the copy activity's request to access Office data, and will validate the important parameters of the copy activity run against the parameters of the approval.

Under certain conditions, a new approval request will automatically be triggered. An Office 365 Data Access approver will have to approve the new request before the copy activity can access Office 365 data.

  • If the data factory, pipeline or copy activity names change, then a new approval request will be triggered.
  • If the parameters of the copy activity run change, then a new approval request will be triggered.
    For example: A new approval will be required if the data table or set of columns that the copy activity is accessing changes.
  • Copy activities will have to be approved once every 6 months. If the original request was approved 6 months ago, then a new approval request will automatically be triggered.

If an Office 365 Data Access approver has denied an approval request or revoked a previously approved request, then the copy activity will fail continually. You should work with the approver to understand the reason for the denial or revocation and fix the parameters of the copy activity accordingly. A new copy activity will have to be deployed, or the name of the existing copy activity will have to be changed, in order to trigger a new approval request.

An approval request will expire in 24 hours unless an Office 365 data access approver acts on it. A new request will be submitted every 24 hours. If you see your copy activity waiting for approval (in the Consent Pending stage), work with your Office 365 data access approvers to get the request approved.
