Adding GDPR annotations #34997

Merged
merged 39 commits into from Sep 26, 2017

kieferrm commented Sep 25, 2017

This PR adds comments that describe our telemetry events. For each property of each event, it records what kind of data the property contains and why the data is collected.

How the annotations work

Let's assume we send the following event and the timer data is dynamic, i.e. the properties of timer cannot be known statically:

logEvent('E1', {
    E1P1: 'activitybar',
    ...f1,
    ...f4,
    timer: {
        waited: 536,
        processing: 43,
        queued: 97,
        elapsed: 812
    }
});

The event is constructed in multiple steps, normally scattered across several files:

function logEvent(eventName, eventData) {
    eventData.CP1 = getSQMUserId();
    service.sendEvent(eventName, eventData);
}

let f1 : F1 = { F1P1 : 23 };

let f2 : F2 = { F2P1 : document.getLine(1) };
 
let f3 : F3 = { F3P1: publisher.displayName };

let f4 : F4 = {
    F4P1: extension.extensionName,
    F4P2: {
        ...f2,
        ...f3
    }
};

logEvent('E1', {
    E1P1: 'activitybar',
    ...f1,
    ...f4,
    timer: {
        waited: 536,
        processing: 43,
        queued: 97,
        elapsed: 812
    }
});

In order to extract the event descriptions with simple scanners we use specific comments to describe telemetry events and their properties. We place those comments as close as possible to where the events and properties are generated. The code above would be annotated as follows:

// __GDPR__COMMON__ "CP1" : { "endPoint": "SqmUserId", "classification": "EndUserPseudonymizedInformation", "purpose": "BusinessInsight" }
function logEvent(eventName, eventData) {
    eventData.CP1 = getSQMUserId();
    service.sendEvent(eventName, eventData);
}

/* __GDPR__FRAGMENT__
   "F1" : {
      "F1P1": { "classification": "SystemMetaData", "purpose": "FeatureInsight" }
   }
 */
let f1 : F1 = { F1P1 : 23 };

/* __GDPR__FRAGMENT__
   "F2" : {
      "F2P1" : { "classification": "CustomerContent", "purpose": "PerformanceAndHealth" }
   }
 */
let f2 : F2 = { F2P1 : document.getLine(1) };
 
/* __GDPR__FRAGMENT__
   "F3" : {
      "F3P1" : { "classification": "PublicPersonalData", "purpose": "FeatureInsight" }
   }
 */
let f3 : F3 = { F3P1: publisher.displayName };

/* __GDPR__FRAGMENT__
   "F4" : {
      "F4P1" : { "classification": "PublicNonPersonalData", "purpose": "FeatureInsight" },
      "F4P2": { 
          "${inline}": [
              "${F2}",
              "${F3}"
            ] 
        }
   }
 */
let f4 : F4 = {
    F4P1: extension.extensionName,
    F4P2: {
        ...f2,
        ...f3
    }
};

/* __GDPR__
   "E1" : {
      "E1P1" : { "classification": "SystemMetaData", "purpose": "FeatureInsight" },
      "${include}": [ 
          "${F1}",
          "${F4}"
        ],
      "${wildcard}": [
         {
            "${prefix}": "timer.",
            "${classification}": { "classification": "SystemMetaData", "purpose": "FeatureInsight" }
         }
      ]
   }
 */
logEvent('E1', {
    E1P1: 'activitybar',
    ...f1,
    ...f4,
    timer: {
        waited: 536,
        processing: 43,
        queued: 97,
        elapsed: 812
    }
});

The comments are processed and result in the following "final" description of the E1 event. Every property that starts with timer., such as timer.waited, is classified as system metadata that is collected to gain insight into how the feature is being used.

   "E1" : {
      "E1P1" : { "classification": "SystemMetaData", "purpose": "FeatureInsight" },
      "F1P1": { "classification": "SystemMetaData", "purpose": "FeatureInsight" },
      "F4P1" : { "classification": "PublicNonPersonalData", "purpose": "FeatureInsight" },
      "F4P2.F2P1" : { "classification": "CustomerContent", "purpose": "PerformanceAndHealth" },
      "F4P2.F3P1" : { "classification": "PublicPersonalData", "purpose": "FeatureInsight" },
      "${wildcard}": [
         {
            "${prefix}": "timer.",
            "${classification}": { "classification": "SystemMetaData", "purpose": "FeatureInsight" }
         }
      ],
      "CP1" : { "endPoint": "SqmUserId", "classification": "EndUserPseudonymizedInformation", "purpose": "BusinessInsight" }
   }
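
The processing step described above can be sketched roughly as follows. This is a minimal illustration, not the tool used by the PR: the names resolve, fragmentName, and isInline are hypothetical, and the sketch assumes fragments have already been parsed into plain objects keyed by fragment name. It copies a ${wildcard} entry through unchanged, does not merge common properties such as CP1, and does not guard against fragments that reference each other cyclically.

```typescript
// Hypothetical shape: a parsed event or fragment declaration.
type Declaration = { [key: string]: unknown };

// Turns a reference such as "${F1}" into the fragment name "F1".
function fragmentName(reference: string): string {
    const match = /^\$\{(.+)\}$/.exec(reference);
    if (!match) {
        throw new Error("Not a fragment reference: " + reference);
    }
    return match[1];
}

// True if a property value has the form { "${inline}": [ ... ] }.
function isInline(value: unknown): value is { "${inline}": string[] } {
    return typeof value === "object" && value !== null && "${inline}" in value;
}

// Flattens one declaration into a map from property name to description.
function resolve(
    declaration: Declaration,
    fragments: { [name: string]: Declaration }
): Declaration {
    const result: Declaration = {};
    const lookup = (reference: string): Declaration => {
        const name = fragmentName(reference);
        const fragment = fragments[name];
        if (!fragment) {
            throw new Error("Unknown fragment: " + name);
        }
        return resolve(fragment, fragments);
    };
    for (const [key, value] of Object.entries(declaration)) {
        if (key === "${include}") {
            // Include: add the fragment's properties at the same level.
            for (const reference of value as string[]) {
                Object.assign(result, lookup(reference));
            }
        } else if (isInline(value)) {
            // Inline: add the fragment's properties under "<key>.<name>".
            for (const reference of value["${inline}"]) {
                for (const [inner, description] of Object.entries(lookup(reference))) {
                    result[key + "." + inner] = description;
                }
            }
        } else {
            // Plain property description (or a wildcard array): copy as-is.
            result[key] = value;
        }
    }
    return result;
}
```

Called with the E1 declaration and the fragments F1 to F4 from the example, such a function would yield the keys E1P1, F1P1, F4P1, F4P2.F2P1, F4P2.F3P1, and ${wildcard}; the common property CP1 would still have to be merged in separately.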

More about the syntax

All GDPR comments are tagged with one of the following tags and are otherwise well-formed JSON.

  • __GDPR__ - describes the name and the properties of a telemetry event
  • __GDPR__FRAGMENT__ - describes the name and the properties of a fragment of the data of an event; fragments are either included or inlined by other fragments or events
  • __GDPR__COMMON__ - describes a property added to every telemetry event
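
To illustrate what a "simple scanner" for these tags might look like, here is a rough sketch. The function name extractGdprComments and the GdprComment type are hypothetical and not part of the PR. The sketch assumes the tag spelling used in the annotated example (__GDPR__COMMON__ with double underscores), that the comment body after the tag is a list of JSON members that becomes valid JSON once wrapped in braces, and that the body never contains the sequence */.

```typescript
type GdprTag = "__GDPR__" | "__GDPR__FRAGMENT__" | "__GDPR__COMMON__";

// Hypothetical result type: the tag plus the parsed JSON body.
interface GdprComment {
    tag: GdprTag;
    declaration: { [key: string]: unknown };
}

function extractGdprComments(source: string): GdprComment[] {
    const found: GdprComment[] = [];
    // First alternative:  /* __GDPR__...  <body>  */   (block comment)
    // Second alternative: // __GDPR__...  <body>       (line comment)
    const pattern =
        /\/\*\s*(__GDPR__(?:FRAGMENT__|COMMON__)?)([\s\S]*?)\*\/|\/\/\s*(__GDPR__(?:FRAGMENT__|COMMON__)?)(.*)$/gm;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(source)) !== null) {
        const tag = (match[1] ?? match[3]) as GdprTag;
        const body = (match[2] ?? match[4] ?? "").trim();
        // The body looks like `"E1" : { ... }`; wrap it to get a JSON object.
        found.push({ tag, declaration: JSON.parse("{" + body + "}") });
    }
    return found;
}
```

A regular expression is used here only because the comments are designed to be easy to find without parsing the surrounding code; a real tool may well work differently.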

Each property is described with an object that looks like this:

{ 
    endPoint?: "none" | "SqmUserId" | "SqmMachineId",
    classification: "SystemMetaData" | "CustomerContent" | "EndUserPseudonymizedInformation" | "PublicPersonalData" | "PublicNonPersonalData",
    purpose: "FeatureInsight" | "PerformanceAndHealth" | "BusinessInsight" | "SecurityAndAuditing",
    isMeasurement?: boolean
}

If endPoint is omitted, it defaults to none. That's appropriate for pretty much all properties with the exception of a couple of common properties.
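
The description object and the endPoint default could be written down in TypeScript along these lines. The type names and the withDefaults helper are illustrative assumptions; only the property names and the allowed values come from the description above.

```typescript
type EndPoint = "none" | "SqmUserId" | "SqmMachineId";
type Classification =
    | "SystemMetaData"
    | "CustomerContent"
    | "EndUserPseudonymizedInformation"
    | "PublicPersonalData"
    | "PublicNonPersonalData";
type Purpose =
    | "FeatureInsight"
    | "PerformanceAndHealth"
    | "BusinessInsight"
    | "SecurityAndAuditing";

interface PropertyDescription {
    endPoint?: EndPoint;
    classification: Classification;
    purpose: Purpose;
    isMeasurement?: boolean;
}

// Hypothetical helper: makes the documented default for endPoint explicit.
function withDefaults(
    description: PropertyDescription
): PropertyDescription & { endPoint: EndPoint } {
    return { ...description, endPoint: description.endPoint ?? "none" };
}

// A description without endPoint is treated as endPoint "none".
const f1p1 = withDefaults({ classification: "SystemMetaData", purpose: "FeatureInsight" });
```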

The values for classification are mostly self-explanatory. EndUserPseudonymizedInformation is what allows us to tell whether two separate actions were performed by the same user, although we don't know who the user is; machineId and instanceId fall into this category. PublicPersonalData and PublicNonPersonalData are information that users provide us with, for example publisher information on the marketplace. CustomerContent is information the user generated, such as URLs of repositories or custom snippets. Everything else is SystemMetaData.

purpose is usually FeatureInsight or PerformanceAndHealth. Only events generated by NPS surveys are sent to gain BusinessInsight.

isMeasurement is only used when describing properties that are added as custom measurements to events sent by VS Code extensions.

Special constructs

${include}

If A includes B, the result is the union of the properties of A and B. Fragments are referenced using ${FragmentName}.

${inline}

If A inlines B at property P, all properties of B are added to A under the key "P.<Name in B>". Fragments are referenced using ${FragmentName}.

${wildcard}

Wildcards can be used as a temporary workaround to describe dynamic properties that share a common prefix. In the long run, all dynamic properties need to be removed and sent as values of a static property instead. A wildcard is an array of wildcard entries. Each entry has a ${prefix} and a ${classification} property. The value of ${prefix} is a string representing the common prefix of all properties the entry matches. The value of ${classification} is a property description as detailed above.
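
As a rough sketch of how a consumer of the final description might apply a wildcard, the lookup below first checks the statically described properties and then falls back to the first wildcard entry whose prefix matches. The function describeProperty and the rule that static properties take precedence are assumptions made for illustration; the PR does not specify either.

```typescript
interface Description {
    classification: string;
    purpose: string;
}

interface WildcardEntry {
    "${prefix}": string;
    "${classification}": Description;
}

// Hypothetical lookup: static properties first, then prefix wildcards.
function describeProperty(
    name: string,
    staticProperties: { [name: string]: Description },
    wildcards: WildcardEntry[]
): Description | undefined {
    if (Object.prototype.hasOwnProperty.call(staticProperties, name)) {
        return staticProperties[name];
    }
    const entry = wildcards.find(w => name.startsWith(w["${prefix}"]));
    return entry ? entry["${classification}"] : undefined;
}

const timerWildcard: WildcardEntry[] = [
    {
        "${prefix}": "timer.",
        "${classification}": { classification: "SystemMetaData", purpose: "FeatureInsight" },
    },
];

// "timer.waited" has no static description, so the wildcard applies.
const waited = describeProperty("timer.waited", {}, timerWildcard);
```

A property such as "timerX" would not match here, because the prefix includes the trailing dot.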

@kieferrm kieferrm merged commit 5ebdf73 into Microsoft:master Sep 26, 2017


@kieferrm kieferrm referenced this pull request Nov 21, 2017

Merged

Adding GDPR annotations #179
