O365beat is an open source log shipper used to fetch Office 365 audit logs from the Office 365 Management Activity API and forward them with all the flexibility and capability provided by the beats platform (specifically, libbeat).
Note: Filebeat officially supports o365 log collection using the o365 module as of version 7.7.0 (source). For most users we expect the best choice is to move to that solution, to ensure the greatest compatibility with the overall Elastic Stack.
Thank you so much to the users, especially those who reached out with issues, including feature requests. We hope this tool added value - if there's anything we can do to help, please contact us. We'll have one more release to encompass recent updates, but will not be adding features. Please continue to help us and the community by opening issues or submitting pull requests if you notice problems in testing or production. We appreciate the feedback!
The easiest way to get started with o365beat is to use the pre-built binaries available in the latest release.
These pre-built packages include configuration files which contain all the necessary credential information to connect to the audit logs for your tenancy. The default configuration file (o365beat.yml) pulls this information from your environment or beats keystores (see this issue or the filebeat docs), like so:
o365beat:
  # pull secrets from environment (e.g., > set -a; . ./ENV_FILE; set +a;)
  # or a key store (https://www.elastic.co/guide/en/beats/filebeat/current/keystore.html)
  # or hard-code here:
  tenant_domain: ${O365BEAT_TENANT_DOMAIN:}
  client_secret: ${O365BEAT_CLIENT_SECRET:}
  client_id: ${O365BEAT_CLIENT_ID:} # aka application id (GUID)
  directory_id: ${O365BEAT_DIRECTORY_ID:} # aka tenant id (GUID)
  registry_file_path: ${O365BEAT_REGISTRY_PATH:./o365beat.state}
  certificate_path: ${O365BEAT_CERTIFICATE_PATH:} # path to your .pfx file
  certificate_pwd: ${O365BEAT_CERTIFICATE_PWD:} # password of your .pfx file
  # the following content types will be pulled from the API
  # for available types, see https://docs.microsoft.com/en-us/office/office-365-management-api/office-365-management-activity-api-reference#working-with-the-office-365-management-activity-api
  content_types:
    - Audit.AzureActiveDirectory
    - Audit.Exchange
    - Audit.SharePoint
    - Audit.General
NOTE 1: In pre-packaged releases before v1.5.0, the packaged config file contains an additional processors section that gets merged into the o365beat.yml and shadows the custom processors used by this beat. You must manually remove the second processors section, or merge the two, to avoid problems. This is due to a quirk in the libbeat build system which was fixed in release v1.5.0. v1.5.0 packages (and later) do not exhibit this issue, but if you retain your old configuration files you may still have the problematic processors section. Please see this issue for more information on how to fix it.
NOTE 2: If you decide to hard-code your configuration values, be sure to replace the entire ${:} syntax, which pulls from the environment. For example, use tenant_domain: acme.onmicrosoft.com, not tenant_domain: ${acme.onmicrosoft.com:}.
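NOTE 2 is easy to get wrong, so here is a minimal Python sketch of how a ${NAME:default} placeholder behaves. It only imitates the syntax described above and is not libbeat's actual implementation; the expand helper and its regular expression are assumptions made for illustration.

```python
import re

# Matches ${NAME:default}; the default may be empty, as in ${O365BEAT_CLIENT_ID:}.
# This pattern is an illustrative assumption, not libbeat's real parser.
PLACEHOLDER = re.compile(r"\$\{([^}:]+):([^}]*)\}")


def expand(value, env):
    """Replace each ${NAME:default} with env[NAME], or the default if NAME is unset."""
    return PLACEHOLDER.sub(lambda m: env.get(m.group(1), m.group(2)), value)


env = {"O365BEAT_TENANT_DOMAIN": "acme.onmicrosoft.com"}

# Placeholder with the variable set: the environment value is used.
from_env = expand("${O365BEAT_TENANT_DOMAIN:}", env)

# Placeholder with the variable unset: the default after the colon is used.
default_path = expand("${O365BEAT_REGISTRY_PATH:./o365beat.state}", env)

# The mistake from NOTE 2: the domain is treated as a variable NAME,
# nothing is set under that name, and the value silently becomes empty.
mistake = expand("${acme.onmicrosoft.com:}", env)

# A hard-coded value contains no placeholder and passes through unchanged.
hard_coded = expand("acme.onmicrosoft.com", env)
```

In this sketch the mistaken form yields an empty string, which is why a hard-coded value should replace the whole placeholder rather than sit inside it.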
O365beat requires that you enable audit log search for your Office 365 tenancy, done through the Security and Compliance Center in the Office 365 Admin Portal. If you want detailed Exchange events, you also have to enable mailbox auditing (on by default since January 2019, but worth checking).
It also needs access to the Office 365 Management API: instructions for setting this up are available in the Microsoft documentation.
Once you have these set up, you'll be able to get the information needed in the config file. The naming conventions for the settings are a bit odd, and o365beat.yml notes some of the synonyms: the client id is also called the application id, and the directory id is also called the tenant id. In the Azure portal, go to "App registrations" and you'll see the Application (Client) ID, a GUID, right there in the application list. If you click on the application you'll see the application (client) id and the directory (tenant) id in the top area.
The next step is API authentication, which can be done in one of the two ways outlined below.
You can create client secrets by clicking the "Certificates & secrets" link on the left. Be sure to copy the secret somewhere or you'll have to create a new one; there's no facility for viewing it later. The default config file expects these config values to be in your environment (i.e., as environment variables) or in a keystore, named O365BEAT_TENANT_DOMAIN, O365BEAT_CLIENT_SECRET, etc. You can hard-code them in that file if you like, especially when testing, just be smart about the permissions. If you choose this method be sure to leave the O365BEAT_CERTIFICATE_PATH and O365BEAT_CERTIFICATE_PWD fields empty.
Alternatively, you can authenticate via certificates, which can be generated using openssl, as described here. Then upload the certificate (the .crt file) in the "Certificates & secrets" tab on the left of the application registration menu.
The default config file expects these config values to be in your environment (i.e., as environment variables) or in a keystore, named O365BEAT_CERTIFICATE_PATH and O365BEAT_CERTIFICATE_PWD in addition to common fields like O365BEAT_TENANT_DOMAIN. You can hard-code them in that file if you like, especially when testing, just be smart about the permissions. If you choose this method be sure to leave the O365BEAT_CLIENT_SECRET field empty.
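The two authentication options are meant to be mutually exclusive: set the client secret or the certificate fields, and leave the other empty. As a rough pre-flight check you might run against your own settings before starting the beat, here is a minimal Python sketch. The choose_auth_method helper is hypothetical and not part of o365beat; only the setting names come from the config file shown earlier.

```python
def choose_auth_method(settings):
    """Return 'client_secret' or 'certificate' for a settings dict.

    Hypothetical helper for illustration: it mirrors the advice to fill in
    exactly one of the two credential types and leave the other empty.
    """
    has_secret = bool(settings.get("client_secret"))
    has_cert = bool(settings.get("certificate_path"))
    if has_secret and has_cert:
        raise ValueError("set either client_secret or certificate_path, not both")
    if not has_secret and not has_cert:
        raise ValueError("set client_secret or certificate_path")
    return "client_secret" if has_secret else "certificate"


# Client secret configured, certificate fields left empty.
secret_settings = {"client_secret": "example-secret", "certificate_path": ""}
secret_method = choose_auth_method(secret_settings)

# Certificate configured, client secret left empty.
cert_settings = {"client_secret": "", "certificate_path": "./example.pfx"}
cert_method = choose_auth_method(cert_settings)
```

How the beat itself behaves when both are set is not documented here, so treat the error cases as a conservative choice rather than a description of its behavior.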
Finally, the Azure app registration permissions should look like this:
You can edit those using that “API permissions” link on the left, with more detailed instructions available from Microsoft. The beat should automatically subscribe you to the right feeds, though that functionality is currently undergoing testing.
To run O365beat with all debugging output enabled, run:
./o365beat --path.config . -c o365beat.yml -e -d "*" # add --strict.perms=false under WSL 1
State is maintained in the registry_file_path location, by default in the working directory as o365beat.state. This file currently contains only a timestamp representing the creation date of the last content blob retrieved, to prevent repeat downloads.
NOTE: Unless it's installed, o365beat doesn't know where to look for its configuration, so you have to specify that explicitly. If you see authentication errors, the beat may not be seeing your config. Future versions will have more helpful error messages in this regard.
If you're receiving o365beat logs with logstash, use the input type beats:
input {
  beats {
    port => "5044"
  }
}
As of v1.2.0, o365beat includes a processor to map the raw API-provided events to Elastic Common Schema (ECS) fields. This allows this beat to work with standard Kibana dashboards, including capabilities in Elastic SIEM. Updates in v1.4.0 and v1.4.1 corrected some parsing issues and included at least one more ECS field.
Implementing this as a processor means you can disable it if you don't use the ECS functionality, or change from "copy" to "rename" if you only use ECS. We may end up adding some ECS stuff in the "core" of the beat as well, but this is a decent start. These processors are critical for the proper functioning of the beat and its visualizations. Disabling or modifying them can lead to dropped events or other issues. Please update with caution.
See the Office 365 Management API schema documentation for details on the raw events. The ECS mapping is as follows (excerpt from o365beat.yml):
# from: https://docs.microsoft.com/en-us/office/office-365-management-api/office-365-management-activity-api-schema
# to: https://www.elastic.co/guide/en/ecs/current/ecs-client.html
processors:
  - convert:
      fields:
        - {from: Id, to: 'event.id', type: string} # ecs core
        - {from: RecordType, to: 'event.code', type: string} # ecs extended
        - {from: Operation, to: 'event.action', type: string} # ecs core
        - {from: OrganizationId, to: 'cloud.account.id', type: string} # ecs extended
        - {from: Workload, to: 'event.category', type: string} # ecs core
        - {from: ResultStatus, to: 'event.outcome', type: string} # ecs extended
        - {from: UserId, to: 'user.id', type: string} # ecs core
        - {from: ClientIP, to: 'client.ip', type: ip} # ecs core
        - {from: 'dissect.clientip', to: 'client.ip', type: ip} # ecs core
        - {from: Severity, to: 'event.severity', type: string} # ecs core
        # the following fields use the challenging array-of-name-value-pairs format
        # converting them to strings fixes issues in elastic, eases non-script parsing
        # easier to rehydrate into arrays from strings than vice versa:
        - {from: Parameters, type: string} # no ecs mapping
        - {from: ExtendedProperties, type: string} # no ecs mapping
        - {from: ModifiedProperties, type: string} # no ecs mapping
Please open an issue or a pull request if you have suggested improvements to this approach.
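To make the "array-of-name-value-pairs" format concrete, here is a minimal Python sketch of turning such an array into a plain object downstream, and of rehydrating it from a string. It assumes each pair uses the keys Name and Value and that the stringified field holds JSON; both are assumptions to check against your own events, and the helper names are hypothetical.

```python
import json


def pairs_to_object(pairs):
    """Collapse [{'Name': k, 'Value': v}, ...] into {k: v, ...}.

    Assumes the keys are spelled 'Name' and 'Value'; if a name repeats,
    the last value wins.
    """
    return {pair["Name"]: pair["Value"] for pair in pairs}


def rehydrate(field_value):
    """Parse a stringified field (assumed to be JSON) and collapse it."""
    return pairs_to_object(json.loads(field_value))


# An illustrative ExtendedProperties-style value, not taken from a real event.
sample_pairs = [
    {"Name": "UserAgent", "Value": "ExampleAgent/1.0"},
    {"Name": "RequestType", "Value": "Login"},
]
as_object = pairs_to_object(sample_pairs)

# The same data after being shipped as a string.
as_string = json.dumps(sample_pairs)
round_trip = rehydrate(as_string)
```

The object form is easier to query by property name, at the cost of losing duplicate names and the original ordering.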
- Why can't I see events from Exchange (or some other source)?
  - Confirm all the content types are listed under the content_types key in o365beat.yml, like so:
      content_types:
        - Audit.AzureActiveDirectory
        - Audit.Exchange
        - Audit.SharePoint
        - Audit.General
  - Confirm audit log search is enabled for your tenancy.
  - Many Exchange events require mailbox auditing to be enabled. Confirm mailbox auditing is enabled.
  - Some audit events take time to create. If this is a test tenancy, or if you just enabled new audit subscriptions, it can take up to 12 hours for all the data to start showing up in the results.
  - Check the logs created by o365beat for any errors. You can do this by running it at the command line with all debugging enabled:
      ./o365beat --path.config . -c o365beat.yml -e -d "*"
- Why can't I see the ECS fields like client.ip in my events?
  Due to a quirk in the libbeat build system, the default config file in packages before v1.5.0 contains an additional processors section that gets merged into the o365beat.yml and shadows the custom processors used by this beat. You must manually remove the second processors section (the one that contains add_host_metadata and add_cloud_metadata, neither of which is particularly useful), or merge the two, to avoid problems. The build issue was fixed in v1.5.0, but configuration files retained from older releases may still have the extra section. Please see this issue for more information.
- I'm seeing non-200 errors in my debugging output for some API calls, am I getting all events?
  Please update to release v1.4.3 or later. There were a few cases where the PublisherIdentifier was not appended to requests, which could cause API throttling. This has now been fixed.
- Can I use this beat with GCC High endpoints, or other non-standard Office 365 deployments?
  Yes! As of version 1.5.0, the beat pulls Login URL and Resource URL values from the config file. The default values work for typical Office 365 situations, but you can connect to GCC High endpoints by modifying the following keys:
    o365beat:
      login_url: login.microsoftonline.us # default is login.microsoftonline.com
      resource_url: manage.office365.us # default is manage.office.com
      # rest of your config ...
- Why am I getting timeout errors when retrieving certain content types?
  For busy tenants or certain networking environments the default api_timeout of 30 seconds might be insufficient. You can extend this in o365beat.yml. Additionally, you can minimize the risk of timeouts by reducing the content_max_age setting (default 7 days, or 168 hours) to something like 1 day (1d) or a few hours (say, 5h). Generally this will only affect you the first time you run the beat, as every request thereafter will only request data for the preceding period (default 5 minutes). See this issue for additional discussion.
- Can I parse event fields like ExtendedProperties and Parameters that contain arrays of name-value pairs on the client side before shipping them?
  As of version 1.5.1, the beat imports the script processor and provides a sample processor script in o365beat.reference.yml to convert fields that contain arrays of name-value pairs into a "normal" object. See this issue for more discussion.
- Why are the authentication events (especially logon failures and errors) so confusing?
  Please see this issue for an in-depth discussion of some of the idiosyncrasies of the audit log events themselves. This beat just ships them; Microsoft makes decisions about what's in them.
- I don't see my problem listed here, what gives?
  Please review this full README and the issues list, and submit a new issue if you can't find a solution. And you can always contact us for assistance. Thanks!
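The timeout answer above says a large content_max_age mostly matters on the first run, while later requests only cover the preceding period. Here is a minimal Python sketch of that reasoning. The lookback_start function is hypothetical and only illustrates the idea; how the beat actually combines its saved state with content_max_age is an assumption.

```python
from datetime import datetime, timedelta, timezone


def lookback_start(now, content_max_age, last_seen=None):
    """Return the earliest time a request would need to cover.

    Assumption for illustration: with no saved state, look back the full
    content_max_age; with saved state, resume from it, but never look back
    further than content_max_age.
    """
    earliest = now - content_max_age
    if last_seen is None:
        return earliest
    return max(last_seen, earliest)


now = datetime(2020, 1, 8, tzinfo=timezone.utc)
max_age = timedelta(days=7)    # documented default for content_max_age
period = timedelta(minutes=5)  # documented default for period

# First run: no state, so the request spans the whole content_max_age.
first_run_span = now - lookback_start(now, max_age)

# Later run: state from one period ago, so the request spans one period.
later_run_span = now - lookback_start(now, max_age, last_seen=now - period)

# A shorter content_max_age shrinks only the first-run span.
short_first_run_span = now - lookback_start(now, timedelta(hours=5))
```

Under these assumptions the first request covers 7 days of content and later ones cover 5 minutes, which is why lowering content_max_age helps most with timeouts on the initial run.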
If you'd like to build yourself, read on.
- Golang 1.7
To build the binary for O365beat run the command below. This will grab vendor dependencies if you don't have them already, and generate a binary in the same directory with the name o365beat.
make
To test O365beat, run the following command:
make testsuite
alternatively:
make unit-tests
make system-tests
make integration-tests
make coverage-report
The test coverage is reported in the folder ./build/coverage/
Each beat has a template for the mapping in elasticsearch and documentation for the fields, both automatically generated from fields.yml by running the following command.
make update
To format the o365beat source code, run the following command:
make fmt
To clean up the build directory and generated artifacts, run:
make clean
To clone o365beat from the git repository, run the following commands:
mkdir -p ${GOPATH}/src/github.com/counteractive/o365beat
git clone https://github.com/counteractive/o365beat ${GOPATH}/src/github.com/counteractive/o365beat
For further development, check out the beat developer guide.
The beat framework provides tools to cross-compile and package your beat for different platforms. This requires docker and vendoring as described above. To build packages of your beat, run the following command:
make release
Be sure you have python, virtualenv, gcc, and docker installed, and that the user you're using to build the release is in the docker group (if not, it'll just hang with no helpful error message).
This will fetch and create all images required for the build process. The whole process can take several minutes to finish.
- Support multiple tenancies with a single beat instance
- Support client certificates (in addition to client secrets)
- Tests
- ECS field mappings beyond the API's common schema
- Add visualizations and dashboard
- ECS field mappings for API's common schema
- v1.5.1 - Added support for the script processor (to fix #41), updated README and config files to highlight options to help avoid timeouts (#39), updated README to link to references on API event data (#37)
- v1.5.0 - Added and documented feature to customize API endpoints (#25), updated libbeat to v7.5.1, properly parses certain ClientIP field formats (#16, #31), fixed build issue that caused important processors to be shadowed in config (#9), fixed issue parsing corrupted state/registry files (#19)
- v1.4.3 - Fixed bugs related to throttling and troubleshooting (closes issues #17 and #21)
- v1.4.2 - Fixed multiple processor bugs (closes issues #12, #13, and #14)
- v1.4.1 - Added kibana visualizations and dashboard and updated processors to better handle fields containing data arrays
- v1.4.0 - Bumped libbeat to v7.4.0 and fixed throttling issue
- v1.3.1 - Updated documentation and improved error messages
- v1.3.0 - Fixed auto-subscribe logic and updated documentation
- v1.2.0 - Initial production release