# Windows Metadata Structure and Value Issues

This notebook shows a few examples of the variance that complicates parsing Windows metadata extracted and serialised via `Get-EventMetadata.ps1` into the file `.\Extracted\EventMetadata.json.zip`.

Below is the number of records in my sample metadata extract.

In [24]:

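import os, zipfile, json, pandas as pd

# Change into the 'Windows Event Metadata' folder if not already there.
if 'Windows Event Metadata' not in os.getcwd():
  os.chdir('Windows Event Metadata')
# Read the JSON straight out of the zip archive and close both handles afterwards.
with zipfile.ZipFile('./Extracted/EventMetadata.json.zip', 'r') as z, z.open('EventMetadata.json') as f:
  json_import = json.load(f)
df = pd.json_normalize(json_import)
n_records = len(df)
n_records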

1258



## Null vs empty lists for Keywords, Tasks, Opcodes and Levels

It's very common for parts of the provider or message structure, e.g. Keywords, to go unused. How these unused or undefined values are handled is highly inconsistent. Windows provider metadata has at least 3 variations for undefined metadata:

- Null value
- Empty list
- List which may contain a null value
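
The three variations above can be told apart with a small helper. The sketch below is illustrative only: `classify_undefined`, `is_null_item` and the sample values are hypothetical and not part of the notebook, and it works on raw values from the parsed JSON rather than on DataFrame cells.

```python
def is_null_item(item):
    """True for a None item, or a dict whose 'Name' and 'DisplayName' are both None."""
    if item is None:
        return True
    return (isinstance(item, dict)
            and item.get('Name') is None
            and item.get('DisplayName') is None)


def classify_undefined(value):
    """Label how a Keywords/Tasks/Opcodes/Levels value encodes 'nothing defined'."""
    if value is None:
        return 'null'
    if isinstance(value, list):
        if len(value) == 0:
            return 'empty list'
        if any(is_null_item(item) for item in value):
            return 'list with null item'
        return 'populated list'
    return 'other'


# Hypothetical sample values, one per variation plus a fully defined list.
samples = [
    None,
    [],
    [{'Name': None, 'Value': 0, 'DisplayName': None}],
    [{'Name': 'Example', 'Value': 1, 'DisplayName': 'Example'}],
]
labels = [classify_undefined(s) for s in samples]
print(labels)
```

Counting these labels per column would give a single tally of all variations, instead of a separate query for each one.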

## Null values

Keyword nodes for Providers can have null values or empty lists. For example, the metadata for 'Microsoft-Windows-EtwCollector' is serialised with null for Keywords (and likewise for Levels, Opcodes, Tasks and Events):

```json
{
  "Name": "Microsoft-Windows-EtwCollector",
  "Id": "9e5f9046-43c6-4f62-ba13-7b19896253ff",
  "MessageFilePath": "C:\\WINDOWS\\system32\\ieetwcollectorres.dll",
  "ResourceFilePath": "C:\\WINDOWS\\system32\\ieetwcollectorres.dll",
  "ParameterFilePath": null,
  "HelpLink": null,
  "DisplayName": null,
  "LogLinks": [],
  "Levels": null,
  "Opcodes": null,
  "Keywords": null,
  "Tasks": null,
  "Events": null,
  "ProviderName": "Microsoft-Windows-EtwCollector"
}
```

The per-column null counts below show that a handful of Providers didn't use lists for Keywords, Tasks, Opcodes and Levels at all, but were simply null. E.g. 21 Providers had null for Keywords.


In [25]:

df.isnull().sum()

Name                    0
Id                      0
MessageFilePath        13
ResourceFilePath      311
ParameterFilePath    1141
HelpLink               67
DisplayName           580
LogLinks                0
Levels                 21
Opcodes                21
Keywords               21
Tasks                  25
Events                 23
ProviderName            0
dtype: int64

## Empty lists

For providers, an empty list quite often indicates that no keywords are defined. E.g. note the `"Keywords": []` for the PowerShell provider (JSON object truncated for brevity).

```json
{
  "Name": "PowerShell",
  "Id": "00000000-0000-0000-0000-000000000000",
  "MessageFilePath": "C:\\WINDOWS\\system32\\WindowsPowerShell\\v1.0\\pwrshmsg.dll",
  "ResourceFilePath": null,
  "ParameterFilePath": null,
  "HelpLink": "https://go.microsoft.com/fwlink/events.asp?CoName=Microsoft%20Corporation&ProdName=Microsoft%c2%ae%20Windows%c2%ae%20Operating%20System&ProdVer=10.0.18362.1&FileName=pwrshmsg.dll&FileVer=10.0.18362.1",
  "DisplayName": null,
  "LogLinks": [
    {
      "LogName": "Windows PowerShell",
      "IsImported": true,
      "DisplayName": null
    }
  ],
  "Levels": [],
  "Opcodes": [],
  "Keywords": [],
  "Tasks": [
    {
      "Name": "Engine Health\r\n",
      "Value": 1,
      "DisplayName": "Engine Health",
      "EventGuid": "00000000-0000-0000-0000-000000000000"
    },
    {
      "Name": "Command Health\r\n",
      "Value": 2,
      "DisplayName": "Command Health",
      "EventGuid": "00000000-0000-0000-0000-000000000000"
    }
  ]
}
```

Overall there were 684 empty list values in Keywords.

In [26]:
empty_counts = {}
for c in ['Keywords', 'Tasks', 'Opcodes', 'Levels']:
  empty_counts.update(
    {c: len(df[df[c].apply(lambda i: isinstance(i, list) and len(i) == 0)])}
  )
empty_counts

{'Keywords': 684, 'Tasks': 591, 'Opcodes': 580, 'Levels': 365}



## Null values in Keyword lists

### Event Keywords

At the Event metadata level, keywords can be defined as an empty list, but more often they are serialised as a list that includes a null-named item, regardless of how many other valid keywords are defined.

Keywords at the Provider metadata level don't seem to have nullified name values (both 'DisplayName' and 'Name').

In [27]:
df_e = pd.json_normalize(json_import, record_path='Events', meta_prefix='Provider.', meta=['Id', 'Name'])
len(df_e)

53822

Sometimes Keywords at the Event metadata level are empty lists, but not often. Only ~1200 of the 53822 events used an empty list.

In [28]:
len(df_e[df_e['Keywords'].apply(lambda i: isinstance(i, list) and len(i) == 0)])

1228

A sample of events using an empty keyword list:

In [29]:
df_e[df_e['Keywords'].apply(lambda i: isinstance(i, list) and len(i) == 0)].head()

Unnamed: 0,Id,Version,Keywords,Template,Description,LogLink.LogName,LogLink.IsImported,LogLink.DisplayName,Level.Name,Level.Value,Level.DisplayName,Opcode.Name,Opcode.Value,Opcode.DisplayName,Task.Name,Task.Value,Task.DisplayName,Task.EventGuid,Provider.Id,Provider.Name
62,0,0,[],"<template xmlns=""http://schemas.microsoft.com/win/2004/08/events"">\r\n <data name=""stringPtr"" i...",%1,,False,,win:LogAlways,0,Log Always,,0,,,0,,00000000-0000-0000-0000-000000000000,fdc7b3f9-eb64-4043-9d47-bf2b7457baa6,EsifLfEtwProvider
1757,702,0,[],"<template xmlns=""http://schemas.microsoft.com/win/2004/08/events"">\r\n <data name=""EventType"" i...","ChatStoreChanged event: event type [%1], item type [%2]",,False,,win:Verbose,5,Verbose,,0,,,0,,00000000-0000-0000-0000-000000000000,fb19ee2c-0d22-4a2e-969e-dd41ae0ce1a9,Microsoft-Windows-UserDataAccess-UserDataService
1764,713,0,[],"<template xmlns=""http://schemas.microsoft.com/win/2004/08/events"">\r\n <data name=""RcsChatId"" i...","ComposingStatusChanged event: source [RCS], chat id [%1], is group [%2], teluri [%3], is composi...",,False,,win:Verbose,5,Verbose,,0,,,0,,00000000-0000-0000-0000-000000000000,fb19ee2c-0d22-4a2e-969e-dd41ae0ce1a9,Microsoft-Windows-UserDataAccess-UserDataService
1769,718,0,[],"<template xmlns=""http://schemas.microsoft.com/win/2004/08/events"">\r\n <data name=""serviceType""...","RcsServiceStatusChanged event: service type [%1], is supported [%2]",,False,,win:Verbose,5,Verbose,,0,,,0,,00000000-0000-0000-0000-000000000000,fb19ee2c-0d22-4a2e-969e-dd41ae0ce1a9,Microsoft-Windows-UserDataAccess-UserDataService
1771,720,0,[],,Rcs service initialization started,,False,,win:Informational,4,Information,,0,,,0,,00000000-0000-0000-0000-000000000000,fb19ee2c-0d22-4a2e-969e-dd41ae0ce1a9,Microsoft-Windows-UserDataAccess-UserDataService


Most Keywords at the Event metadata level do seem to have at least one item with both 'DisplayName' and 'Name' as null.

In [30]:
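def has_null_names(o):
  # True if any item in the list has both 'Name' and 'DisplayName' set to null.
  if isinstance(o, list):
    for i in o:
      if i['Name'] is None and i['DisplayName'] is None:
        return True
  elif isinstance(o, dict):
    # A single object rather than a list: check it directly.
    return o['Name'] is None and o['DisplayName'] is None
  return False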

len(df_e[df_e['Keywords'].apply(has_null_names)])


43516

And a sample of the keywords with both names null:

In [31]:
pd.options.display.max_colwidth = 100
df_e[df_e['Keywords'].apply(has_null_names)][['Id','Keywords','Description','LogLink.LogName','Provider.Name']].head()

Unnamed: 0,Id,Keywords,Description,LogLink.LogName,Provider.Name
0,55,"[{'Name': None, 'Value': -9223372036854775808, 'DisplayName': None}]",A corruption was discovered in the file system structure on volume %1.\r\n\r\n%8,System,Ntfs
1,130,"[{'Name': None, 'Value': -9223372036854775808, 'DisplayName': None}]",The file system structure on volume %2 has now been repaired.,System,Ntfs
2,131,"[{'Name': None, 'Value': -9223372036854775808, 'DisplayName': None}]",The file system structure on volume %2 cannot be corrected.\r\nPlease run the chkdsk utility on ...,System,Ntfs
3,132,"[{'Name': None, 'Value': -9223372036854775808, 'DisplayName': None}]",Too many repair events have occurred in a short period of time.\r\nTemporarily suspending postin...,System,Ntfs
4,133,"[{'Name': None, 'Value': -9223372036854775808, 'DisplayName': None}]",Skipped posting of %1 repair events. Repair event posting will now be resumed.\r\n Here are the...,System,Ntfs


With over 40,000 events having the nullified keyword name present, it would be interesting to observe the events that don't. E.g. for Keywords:

In [32]:
pd.reset_option('display.max_colwidth')
df_e[df_e['Keywords'].apply(lambda k: not has_null_names(k))].head()

Unlike Keywords, Task, Opcode and Level objects were already flattened by `json_normalize()` into column labels (as these are not nested in a list like Keywords). E.g. a sample of nullified tasks:

In [33]:
df_e[df_e['Task.Name'].isnull() & df_e['Task.DisplayName'].isnull()][['Id','Task.Value','Task.Name','Task.DisplayName','Description','LogLink.LogName','Provider.Name']].head()

Unnamed: 0,Id,Task.Value,Task.Name,Task.DisplayName,Description,LogLink.LogName,Provider.Name
0,55,0,,,A corruption was discovered in the file system structure on volume %1.\r\n\r\n%8,System,Ntfs
1,130,0,,,The file system structure on volume %2 has now been repaired.,System,Ntfs
2,131,0,,,The file system structure on volume %2 cannot be corrected.\r\nPlease run the chkdsk utility on ...,System,Ntfs
3,132,0,,,Too many repair events have occurred in a short period of time.\r\nTemporarily suspending postin...,System,Ntfs
4,133,0,,,Skipped posting of %1 repair events. Repair event posting will now be resumed.\r\n Here are the...,System,Ntfs
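
To illustrate why Task, Opcode and Level end up as dotted columns while Keywords stays a list, here is a simplified stand-in for that flattening behaviour. The `flatten` helper is hypothetical and is not how pandas implements `json_normalize()`. The sample record is trimmed to a few fields, with values taken from the Ntfs event 55 rows shown in this notebook.

```python
def flatten(record, prefix=''):
    """Flatten nested dicts into dotted keys; lists are left untouched."""
    flat = {}
    for key, value in record.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            # Nested object: recurse and prefix its keys, e.g. 'Task.Name'.
            flat.update(flatten(value, prefix=f'{name}.'))
        else:
            # Scalars and lists are kept as a single value under one key.
            flat[name] = value
    return flat


# Trimmed event record based on the Ntfs event 55 sample.
event = {
    'Id': 55,
    'Task': {'Name': None, 'Value': 0, 'DisplayName': None},
    'Keywords': [
        {'Name': None, 'Value': -9223372036854775808, 'DisplayName': None}
    ],
}
flat_event = flatten(event)
print(sorted(flat_event))
```

Because the Keywords list survives flattening as a single cell, checks on it need a per-row function such as `has_null_names`, while the Task, Opcode and Level columns can be tested directly with `isnull()`.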


Counts of the nullified names for Levels, Tasks and Opcodes:

In [34]:
display_name_and_name_null_count = {}
for c in ['Level', 'Task', 'Opcode']:
  display_name_and_name_null_count.update(
    {c: len(df_e[df_e[f'{c}.Name'].isnull() & df_e[f'{c}.DisplayName'].isnull()])}
  )
display_name_and_name_null_count

{'Level': 3863, 'Task': 15626, 'Opcode': 10938}

So while not being lists, the Task, Opcode and Level metadata for events is often nullified. Even 3863 events had no level name defined.

### Provider Keywords, Tasks, Opcodes and Levels

However, the Keyword metadata for Providers doesn't include the nullified name items seen in the Event metadata. The same holds for Tasks, Opcodes and Levels.

In [35]:
has_null_names_in_list_counts = {}
for c in ['Keywords', 'Tasks', 'Opcodes', 'Levels']:
  has_null_names_in_list_counts.update(
    {c: len(df[df[c].apply(has_null_names)])}
  )
has_null_names_in_list_counts

{'Keywords': 0, 'Tasks': 0, 'Opcodes': 0, 'Levels': 0}

## Conclusion

Undefined Keywords, Tasks, Opcodes and Levels have widely divergent data structures. Sometimes it's a simple null value and other times an empty list. The metadata level (Provider vs Event) also affects the structure used. Keyword lists are particularly awkward and often include a special nullified item with a null 'DisplayName' and 'Name'. This nullified item seems to be unnecessarily included in the list alongside the defined, non-null keywords.
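
One way a parser could cope with these divergent shapes is to coerce them all into a plain list of defined entries. This is a minimal sketch, not part of `Get-EventMetadata.ps1` or this notebook: `normalise_names` is a hypothetical helper, and the example keyword named 'Example' is made up for illustration.

```python
def normalise_names(value):
    """Return a list of defined entries, whatever shape the input had."""
    if value is None:
        # Null and empty list both mean 'nothing defined'.
        return []
    # Accept a single object as well as a list of objects.
    items = value if isinstance(value, list) else [value]
    return [
        item for item in items
        if isinstance(item, dict)
        # Drop nullified entries: both 'Name' and 'DisplayName' are None.
        and not (item.get('Name') is None and item.get('DisplayName') is None)
    ]


nullified = {'Name': None, 'Value': -9223372036854775808, 'DisplayName': None}
defined = {'Name': 'Example', 'Value': 1, 'DisplayName': 'Example'}

print(normalise_names(None))
print(normalise_names([nullified, defined]))
```

Dropping the nullified entries also discards their 'Value', so this only suits consumers that need the names. A parser that needs the raw values would have to keep those entries and handle the null names downstream.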