# Unusual Account Activity
 <details>
     <summary>&nbsp;<u>Notebook Details...</u></summary>

 **Notebook Version:** 2.0<br>
 **Python Version:** Python 3.8+<br>
 **Required Packages**: msticpy, msticnb<br>

 **Data Sources Required**:
 - Sentinel - SecurityAlert, SecurityEvent, HuntingBookmark, Syslog, AAD SigninLogs, AzureActivity, OfficeActivity, ThreatIndicator
 - (Optional) - VirusTotal, AlienVault OTX, IBM XForce, Open Page Rank, (all require accounts and API keys)
 </details>

## TBD

<!DOCTYPE html>
<html>
  <head>
  </head>
  <body>
    <h1>Contents<span class="tocSkip"></span></h1>
    <div class="toc">
      <ul class="toc-item">
        <li><span><a href="TBD">TBD</a></span></li>
        <li><span><a href="TBD">TBD</a></span></li>
      </ul>
    </div>
    
  </body>
</html>


## Hunting Hypothesis
TBD


Flow:
- Query - risk-flagged sign-ins
- Add supplemental queries
- Query alerts for related accounts
- TI lookup for source IP (other?)

---
# Notebook initialization
This should complete without errors. If you encounter errors or warnings, look at the following notebooks:

- <a href="https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/A%20Getting%20Started%20Guide%20For%20Azure%20Sentinel%20ML%20Notebooks.ipynb">Getting Started Notebook</a>
- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)
- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)

<details>
    <summary>&nbsp;<u>Details...</u></summary>
The next cell:
- Checks for the correct Python version
- Checks versions and optionally installs required packages
- Imports the required packages into the notebook
- Sets a number of configuration options.

If you are running in the Azure Sentinel Notebooks environment (Azure Notebooks or Azure ML) you can run live versions of these notebooks:
- [Getting Started](./A Getting Started Guide For Azure Sentinel ML Notebooks.ipynb)
- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)
- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)

You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. 
There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:
- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)
- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)
</details>

In [1]:
from datetime import datetime, timedelta, timezone

REQ_PYTHON_VER = "3.8"
REQ_MSTICPY_VER = "2.3.0"

# %pip install --upgrade msticpy

import msticpy as mp
mp.init_notebook()


In [3]:
# papermill default parameters
ws_name = "Default"
end = datetime.now(timezone.utc)
start = end - timedelta(days=2)
baseline_period = 28
run_date = end

### Get Workspace and Authenticate

<details>
    <summary><u>Authentication help...</u></summary>
    If you want to use a workspace other than one you have defined in your<br>
msticpyconfig.yaml create a connection string with your AAD TENANT_ID and<br>
your WORKSPACE_ID (these should both be quoted UUID strings).

```python
  workspace_cs = "loganalytics://code().tenant('TENANT_ID').workspace('WORKSPACE_ID')"
```
e.g.
```python
  workspace_cs = "loganalytics://code().tenant('c3de0f06-dcb8-40fb-9d1a-b62faea29d9d').workspace('c62d3dc5-11e6-4e29-aa67-eac88d5e6cf6')"
```
Then in the Authentication cell replace
the call to `qry_prov.connect` with the following:
```python
  qry_prov.connect(connect_str=workspace_cs)
```
The cell should now look like this:

```python
...
  # Authentication
  qry_prov = QueryProvider(data_environment="MSSentinel")
  qry_prov.connect(connect_str=workspace_cs)
...
```

On successful authentication you should see a ```popup schema``` button.
To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.
</details>
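If you build the connection string from variables rather than pasting it, a small helper can catch malformed IDs early. The following is a minimal sketch: `build_workspace_cs` is a hypothetical name (not part of msticpy), and it only reproduces the string format shown in the authentication help above.

```python
import uuid


def build_workspace_cs(tenant_id: str, workspace_id: str) -> str:
    """Return a connection string in the format shown in the help text.

    Hypothetical helper: uuid.UUID raises ValueError for malformed IDs.
    """
    tenant = str(uuid.UUID(tenant_id))
    workspace = str(uuid.UUID(workspace_id))
    return f"loganalytics://code().tenant('{tenant}').workspace('{workspace}')"


# The example IDs from the help text
workspace_cs = build_workspace_cs(
    "c3de0f06-dcb8-40fb-9d1a-b62faea29d9d",
    "c62d3dc5-11e6-4e29-aa67-eac88d5e6cf6",
)
print(workspace_cs)
```

The resulting string can then be passed as `connect_str` as described in the help text.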

In [4]:
import ipywidgets as widgets

# Workspaces defined in msticpyconfig.yaml (msticpy is imported as `mp` above)
workspace_names = list(mp.settings.get_config("AzureSentinel.Workspaces").keys())
print("Configured workspaces: ", ", ".join(workspace_names))
ws_param = widgets.Combobox(
    description="Workspace Name",
    value=ws_name,
    options=workspace_names,
)
ws_param

Configured workspaces:  ASIHuntOMSWorkspaceV4, CCIS, Centrica, CyberSecuritySoc, Default, GovCyberSecuritySOC, NationalGrid, RedmondSentinelDemoEnvironment


Combobox(value='Default', description='Workspace Name', options=('ASIHuntOMSWorkspaceV4', 'CCIS', 'Centrica', …

In [6]:
from msticpy.common.timespan import TimeSpan
from msticpy.context.azure import MicrosoftSentinel

# Authentication
qry_prov = mp.QueryProvider(data_environment="MSSentinel")
qry_prov.connect(workspace=ws_param.value)

sentinel = MicrosoftSentinel(workspace=ws_param.value, connect=True)

nb_timespan = TimeSpan(start, end)
qry_prov.query_time.timespan = nb_timespan
md("<hr>")
md("Confirm time range to search", "bold")
qry_prov.query_time

Connecting... connected


VBox(children=(HTML(value='<h4>Set query time boundaries</h4>'), HBox(children=(DatePicker(value=datetime.date…

# Notebook Logic

{period} = query time period

{baseline} = {period}.start - 28 days ... {period}.start

1. Find users with high risk and unmitigated signin for {period}
2. Find users with high risk signins for {baseline}
3. Divide 1 into:
   a. Users with on-going high risk - for triage
   b. Users with new high risk status
4. For users in 3.a, check:
   - Azure activity - any activity types in {period} not in baseline {baseline}
   - Azure audit - any activity types in {period} not in baseline {baseline}

Output (dynamic summary):
- List of ongoing high risk users
- New high risk users:
  - Signin types and locations
  - Novel Azure Activity and Audit types
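
Step 3 of the logic above is essentially a set comparison between the users flagged in {period} and those flagged in {baseline}. A minimal sketch, assuming the user names are already available as plain strings; `triage_risk_users` and the account names are hypothetical and not part of the notebook:

```python
from typing import Dict, Set


def triage_risk_users(period_users: Set[str], baseline_users: Set[str]) -> Dict[str, Set[str]]:
    """Split users flagged in {period} by whether they also appear in {baseline}."""
    # Compare case-insensitively, since UPN casing can vary between records
    period = {user.casefold() for user in period_users}
    baseline = {user.casefold() for user in baseline_users}
    return {
        "ongoing": period & baseline,  # step 3.a: risk seen in both windows
        "new": period - baseline,      # step 3.b: risk seen only in {period}
    }


# Made-up account names for illustration
result = triage_risk_users(
    period_users={"alice@contoso.com", "Bob@contoso.com"},
    baseline_users={"bob@contoso.com", "carol@contoso.com"},
)
print(result)
```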

In [50]:
import re
import urllib.parse
from collections import namedtuple, defaultdict
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, NamedTuple, Optional

import httpx
import pandas as pd
import yaml
from tqdm.auto import tqdm

from msticpy.context.azure.sentinel_dynamic_summary import DynamicSummary, DynamicSummaryItem


# Summary report classes
class SummaryItem(NamedTuple):
    """Data report collection for summary."""
    key: str
    data: pd.DataFrame
    properties: Dict[str, Any]


class SummaryReport:
    """Class to hold summary reports during exec of notebook."""
    def __init__(self):
        self._summary_reports: Dict[str, Dict[str, SummaryItem]] = defaultdict(dict)

    def add_summary_data(self, data: pd.DataFrame, user_column: str, section: str, **kwargs):
        """Add data for users to the summary report"""
        for user, user_data in data.groupby(user_column):
            summary = SummaryItem(
                key=user,
                data=user_data,
                properties=kwargs
            )
            self._summary_reports[user.casefold()][section] = summary

    @property
    def users(self):
        return sorted(self._summary_reports)

    @property
    def report_types(self):
        return sorted({
            report for user_reports in self._summary_reports.values()
            for report in user_reports
        })


summary_report = SummaryReport()


# DF display function
def df_caption(data: pd.DataFrame, caption: str):
    """Display dataframe with a caption."""
    caption_css = "; ".join([
        "caption-side: top",
        "text-align: left",
        "font-size: 15pt",
        "font-weight: bold",
        "padding: 5pt",
    ])
    display(
        data.style.set_caption(f"{caption}").set_table_styles(
            [
                {
                    "selector": "caption",
                    "props": caption_css,
                }
            ]
        )
    )


def get_user_param(data: pd.DataFrame) -> str:
    """Return user names from DataFrame as comma-sep string."""
    return  ",".join([
        f"'{user}'" for user
        in data.UserPrincipalName.values
    ])


# update any changes to start/end datetimes
start = qry_prov.query_time.start
end = qry_prov.query_time.end

# Get risk-flagged sign-ins

This query retrieves user sign-ins that Azure Identity Protection has flagged
as at risk. See [Azure Identity Protection](https://learn.microsoft.com/azure/active-directory/identity-protection/overview-identity-protection)
for more background.

In [51]:
signing_risk_query = """
SigninLogs
| where TimeGenerated between (datetime({start}) .. datetime({end}))
| where RiskState != "none"
| project UserPrincipalName, ResultDescription, RiskState, RiskDetail, RiskEventTypes,
  RiskEventTypes_V2, RiskLevelAggregated, RiskLevelDuringSignIn, IPAddress
| extend SigninRisk = case(
        RiskLevelDuringSignIn == "high", 5,
        RiskLevelDuringSignIn == "medium", 3,
        RiskLevelDuringSignIn == "low", 1,
        0
    ),
    AggRisk = case(
        RiskLevelAggregated == "high", 5,
        RiskLevelAggregated == "medium", 3,
        RiskLevelAggregated == "low", 1,
        0
    )
| extend RiskEventDyn = parse_json(RiskEventTypes), RiskEventV2Dyn = parse_json(RiskEventTypes_V2)
| mv-expand RiskEventDyn, RiskEventV2Dyn
| summarize SignIns=count(AggRisk), MeanAggRisk=avg(AggRisk), MeanSigninRisk=avg(SigninRisk), 
  RiskStates=make_set(RiskState), RiskEvents=make_set(RiskEventDyn), RiskEventsV2=make_set(RiskEventV2Dyn),
  SourceIPs=make_set(IPAddress)
  by UserPrincipalName
| order by MeanAggRisk, MeanSigninRisk asc nulls last
"""

# run the query
signin_risk_users_df = qry_prov.exec_query(
    signing_risk_query.format(start=start, end=end)
)
# expand RiskStates (list)
risk_states_df = signin_risk_users_df.explode("RiskStates")
# Extract list of users where risk was mitigated 
safe_users_df = risk_states_df[risk_states_df["RiskStates"].isin(["remediated", "confirmedSafe"])].UserPrincipalName.drop_duplicates()

# Separate unmitigated from mitigated risk users
risk_users_df = signin_risk_users_df[~signin_risk_users_df["UserPrincipalName"].isin(safe_users_df)]
mitigated_users_df = signin_risk_users_df[signin_risk_users_df["UserPrincipalName"].isin(safe_users_df)]

df_caption(risk_users_df[["UserPrincipalName"]], "Unmitigated risk users")
df_caption(mitigated_users_df[["UserPrincipalName"]], "Mitigated risk users")

Unnamed: 0,UserPrincipalName
1,tamuto@seccxpninja.onmicrosoft.com


Unnamed: 0,UserPrincipalName
0,pdemo@seccxpninja.onmicrosoft.com
2,dwilliams@seccxp.ninja
3,adm_pwatkins@seccxpninja.onmicrosoft.com
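
The split between unmitigated and mitigated users in the cell above can be tried on toy data. This sketch uses made-up rows shaped like the query output. Note that a user with at least one sign-in in a mitigated state lands in the mitigated group, even if other sign-ins are still `atRisk`.

```python
import pandas as pd

# Toy data shaped like the query output (values are made up)
users_df = pd.DataFrame({
    "UserPrincipalName": ["a@contoso.com", "b@contoso.com", "c@contoso.com"],
    "RiskStates": [["atRisk"], ["atRisk", "confirmedSafe"], ["remediated"]],
})

# One row per (user, risk state)
states_df = users_df.explode("RiskStates")
mitigated_states = ["remediated", "confirmedSafe"]
safe_users = states_df.loc[
    states_df["RiskStates"].isin(mitigated_states), "UserPrincipalName"
].drop_duplicates()

# Partition the original frame on membership in the safe list
is_mitigated = users_df["UserPrincipalName"].isin(safe_users)
unmitigated = users_df[~is_mitigated]
mitigated = users_df[is_mitigated]
print(unmitigated["UserPrincipalName"].tolist())
print(mitigated["UserPrincipalName"].tolist())
```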


## Retrieve the historical risk level for previous N days

This is used to distinguish accounts that have a new "At Risk"
designation from accounts that have a history of risky sign-ins.

> Note: "N" is the `baseline_period` parameter for the notebook - default is 28 days
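
As a quick illustration of how the baseline window relates to the query period, the sketch below computes it in Python for a made-up period start date. The notebook itself does this arithmetic inside the KQL query.

```python
from datetime import datetime, timedelta, timezone

baseline_period = 28  # days, the notebook default

# Example query period start (made-up date)
period_start = datetime(2023, 2, 1, tzinfo=timezone.utc)

# The baseline ends where the query period begins
baseline_end = period_start
baseline_start = baseline_end - timedelta(days=baseline_period)
print(baseline_start.isoformat(), "->", baseline_end.isoformat())
```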

## Sign-in Summary for users with unmitigated risk

In [53]:
_AADSIL_DISPLAY_COLUMNS = [
    'TimeGenerated', 'ResultType', 'ResultDescription', 'UserPrincipalName', 'UserId',
    'Location', 'IPAddress', 'AppDisplayName', 'ClientAppUsed', 'AppId',
    'AuthenticationDetails', 'AuthenticationMethodsUsed',
    'RiskEventTypes', 'RiskEventTypes_V2', 'RiskLevelAggregated',
    'RiskLevelDuringSignIn', 'RiskState', 'ResourceDisplayName',
    'LocationDetails', 'MfaDetail', 'NetworkLocationDetails',
    'UserAgent', 'UserDisplayName', 'UserType', 'IPAddressFromResourceProvider',
    'ResourceTenantId', 'HomeTenantId', 'AutonomousSystemNumber', 'Type'
]


# Function to summarize the history data
def weekly_signin_summary(data) -> pd.DataFrame:
    """Create signin summary from historical data."""
    return (
        data
        [_AADSIL_DISPLAY_COLUMNS]
        .explode(["RiskEventTypes"])
        .groupby(["UserPrincipalName", pd.Grouper(key="TimeGenerated", freq="W")])
        .agg(
            LoginCount=pd.NamedAgg("ResultType", "count"),
            ResultTypes=pd.NamedAgg("ResultType", "unique"),
            RiskEventTypes=pd.NamedAgg("RiskEventTypes", "unique"),
            RiskLevels=pd.NamedAgg("RiskLevelAggregated", "unique"),
            RiskLevelSignins=pd.NamedAgg("RiskLevelDuringSignIn", "unique"),
            IPs=pd.NamedAgg("IPAddress", "nunique"),
            Locations=pd.NamedAgg("Location", "nunique"),
            Apps=pd.NamedAgg("AppDisplayName", "nunique"),
            UserAgents=pd.NamedAgg("UserAgent", "nunique"),
            StartDate=pd.NamedAgg("TimeGenerated", "min"),
            EndDate=pd.NamedAgg("TimeGenerated", "max"),
        )
        .sort_index()
    )


# Get historical risk level for previous {period} days
risk_hist_query = """
let q_end = datetime({start});
let q_start = datetime_add("day", -{period}, q_end);
SigninLogs
| where TimeGenerated between (q_start .. q_end)
| where RiskState != "none"
| where UserPrincipalName in ({users})
| extend RiskEventTypes = parse_json(RiskEventTypes), RiskEventTypes_V2 = parse_json(RiskEventTypes_V2)
"""

# Unmitigated risk users
risk_user_hist_df = qry_prov.exec_query(
    risk_hist_query.format(
        users=get_user_param(risk_users_df),
        start=start,
        period=baseline_period,
    )
)

risk_users_history = weekly_signin_summary(risk_user_hist_df).reset_index()

# Isolate users that have no history of risk in previous period
users_with_past_risk_criteria = risk_users_df.UserPrincipalName.isin(risk_user_hist_df.UserPrincipalName.unique())
risk_users_df = risk_users_df.copy()
risk_users_df.loc[~users_with_past_risk_criteria, "RiskHistory"] = "New"
risk_users_df.loc[users_with_past_risk_criteria, "RiskHistory"] = "Existing"

summary_report.add_summary_data(
    data=risk_users_df,
    user_column="UserPrincipalName",
    section="Risk Users Summary",
)
summary_report.add_summary_data(
    data=risk_users_history,
    user_column="UserPrincipalName",
    section="Risk Users History",
)

df_caption(risk_users_df, "Sign-in risk summary - unmitigated")

Unnamed: 0,UserPrincipalName,SignIns,MeanAggRisk,MeanSigninRisk,RiskStates,RiskEvents,RiskEventsV2,SourceIPs,RiskHistory
1,tamuto@seccxpninja.onmicrosoft.com,3,1.0,1.0,['atRisk'],"['unfamiliarFeatures', 'unlikelyTravel']","['unfamiliarFeatures', 'unlikelyTravel']",['20.25.98.192'],Existing


In [54]:
risk_users_history

Unnamed: 0,UserPrincipalName,TimeGenerated,LoginCount,ResultTypes,RiskEventTypes,RiskLevels,RiskLevelSignins,IPs,Locations,Apps,UserAgents,StartDate,EndDate
0,tamuto@seccxpninja.onmicrosoft.com,2023-01-22 00:00:00+00:00,1,[0],[mcasImpossibleTravel],[low],[none],1,1,1,1,2023-01-18 01:27:19.258557600+00:00,2023-01-18 01:27:19.258557600+00:00


## Sign-in Summary for users with mitigated risk
### [info only]

In [13]:
# History of mitigated risk users
mit_risk_user_hist_df = qry_prov.exec_query(
    risk_hist_query.format(
        users=get_user_param(mitigated_users_df),
        start=start,
        period=baseline_period
    )
)


# Isolate users that have no history of risk in previous period
users_with_past_risk_criteria = mitigated_users_df.UserPrincipalName.isin(mit_risk_user_hist_df.UserPrincipalName.unique())
mitigated_users_df = mitigated_users_df.copy()
mitigated_users_df.loc[~users_with_past_risk_criteria, "RiskHistory"] = "New"
mitigated_users_df.loc[users_with_past_risk_criteria, "RiskHistory"] = "Existing"
df_caption(mitigated_users_df, "Sign-in risk summary - mitigated")

Unnamed: 0,UserPrincipalName,SignIns,MeanAggRisk,MeanSigninRisk,RiskStates,RiskEvents,RiskEventsV2,SourceIPs,RiskHistory
0,pdemo@seccxpninja.onmicrosoft.com,268,1.619403,4.276119,"['atRisk', 'dismissed', 'confirmedSafe']","['unfamiliarFeatures', 'unlikelyTravel']","['unfamiliarFeatures', 'unlikelyTravel']","['84.59.133.96', '49.207.205.157', '182.48.225.204', '202.171.187.206', '83.6.102.205', '94.239.55.19', '51.142.235.76', '110.49.50.142', '49.37.163.128', '165.225.120.88', '89.211.239.104', '94.245.87.14', '50.208.71.66', '140.186.246.113', '201.191.218.23', '190.104.120.0', '109.48.220.202', '73.43.36.19', '76.67.108.134', '148.64.97.101', '51.142.111.1', '37.186.51.21', '76.184.244.1', '172.13.62.40', '107.129.128.34', '99.248.154.225', '87.187.23.105', '108.34.158.176', '80.187.114.167', '157.49.156.108', '194.107.2.82', '167.220.24.243', '187.56.121.165', '73.25.210.7', '90.146.97.205', '195.97.138.43', '14.187.179.30', '114.79.170.132', '147.161.199.96', '8.23.71.2', '20.122.92.1', '189.249.64.2', '168.149.166.14', '47.231.129.2', '212.180.224.82', '107.11.97.170', '96.234.155.228', '147.235.216.117', '39.9.193.150', '109.147.153.136', '168.149.166.78', '70.164.213.113', '147.161.199.101', '24.98.48.107', '180.218.164.251', '94.174.54.38', '79.107.37.34', '208.104.177.188', '24.15.125.185', '104.219.136.49', '159.196.229.180', '209.65.150.148', '61.68.47.232']",Existing
2,dwilliams@seccxp.ninja,2,0.5,3.0,"['atRisk', 'confirmedSafe']",['unfamiliarFeatures'],['unfamiliarFeatures'],['20.227.3.22'],New
3,adm_pwatkins@seccxpninja.onmicrosoft.com,24,0.0,4.833333,['remediated'],['anonymizedIPAddress'],['anonymizedIPAddress'],"['185.220.100.247', '192.42.116.216']",Existing


# Retrieve and Run UEBA hunting queries on risk-flagged users

> UEBA = User Entity Behavior Analytics

The next cell retrieves the current UEBA hunting
queries and runs them against the risk-flagged users.

For more information see [Microsoft Sentinel UEBA](https://learn.microsoft.com/azure/sentinel/identify-threats-with-entity-behavior-analytics)

In [14]:
# Hunting Queries
_SENTINEL_REPO = "https://raw.githubusercontent.com/Azure/Azure-Sentinel/master"
_SI_LOG_ROOT = f"{_SENTINEL_REPO}/Hunting%20Queries/SigninLogs"
_GEN_HUNTING_QRY = [
    # "AnomalousUserAppSigninLocationIncreaseDetail.yaml",
    # "LegacyAuthAttempt.yaml",
    # "Signins-From-VPS-Providers.yaml",
    # "UserAccountsMeasurableincreaseofsuccessfulsignins.yaml",
    # "riskSignInWithNewMFAMethod.yaml",
    # "signinBurstFromMultipleLocations.yaml",
]

# UEBA Hunting Queries
_UEBA_HQ_ROOT = f"{_SENTINEL_REPO}/Solutions/UEBA%20Essentials/Hunting%20Queries"
_UEBA_HUNTING_QRY = [
    "anomaliesOnVIPUsers.yaml",
    "Anomalous AAD Account Manipulation.yaml",
    "Anomalous Account Creation.yaml",
    "Anomalous Activity Role Assignment.yaml",
    "Anomalous Code Execution.yaml",
    "Anomalous Data Access.yaml",
    "Anomalous Defensive Mechanism Modification.yaml",
    "Anomalous Failed Logon.yaml",
    "Anomalous Geo Location Logon.yaml",
    "Anomalous Login to Devices.yaml",
    "Anomalous Password Reset.yaml",
    "Anomalous RDP Activity.yaml",
    "Anomalous Resource Access.yaml",
    "Anomalous Role Assignment.yaml",
    "Anomalous Sign-in Activity.yaml",
    "anomalousActionInTenant.yaml",
    "dormantAccountActivityFromUncommonCountry.yaml",
    "firstConnectionFromGroup.yaml",
    "loginActivityFromBotnet.yaml",
    "newAccountAddedToAdminGroup.yaml",
    # "terminatedEmployeeAccessHVA.yaml",
    # "terminatedEmployeeActivity.yaml",
    "updateKeyVaultActivity.yaml",
]

ALL_QUERIES = {qry: _SI_LOG_ROOT for qry in _GEN_HUNTING_QRY}
ALL_QUERIES.update({qry: _UEBA_HQ_ROOT for qry in _UEBA_HUNTING_QRY})

TIME_TOKEN = re.compile(r"(\{\{StartTimeISO\}\}|\{\{EndTimeISO\}\})")
# Match single braces only; lookarounds leave the neighboring characters untouched
_LEFT_BRACE = r"(?<!\{)\{(?!\{)"
_RIGHT_BRACE = r"(?<!\})\}(?!\})"
_LB_TOKEN = "%%~[~%%"
_RB_TOKEN = "%%~]~%%"


def replace_time_params(query):
    repl_query = re.sub(_LEFT_BRACE, _LB_TOKEN, query)
    repl_query = re.sub(_RIGHT_BRACE, _RB_TOKEN, repl_query)
    repl_query = repl_query.replace("{{StartTimeISO}}", "{start}").replace("{{EndTimeISO}}", "{end}")
    return repl_query.replace(_LB_TOKEN, "{{").replace(_RB_TOKEN, "}}")


QueryProps = namedtuple("QueryProps", "name, query, req_time, description, url, raw_query")


def fetch_queries(query_dict: Dict[str, str], verbose: bool = False) -> Dict[str, QueryProps]:
    """Fetch queries from Sentinel GitHub repo."""
    discover_queries: Dict[str, QueryProps] = {}
    error_queries: Dict[str, str] = {}
    for query, path in tqdm(query_dict.items()):
        q_path = f"{path}/{urllib.parse.quote(query)}"
        resp = httpx.get(q_path)
        if resp.status_code != 200:
            print(f"could not fetch query {query} from {q_path} (HTTP {resp.status_code})")
            continue
        try:
            q_dict = yaml.safe_load(resp.content)
        except yaml.YAMLError:
            print(f"could not parse query {query} at {q_path}")
            error_queries[query] = resp.text
            continue

        query_text = q_dict.get("query")
        req_time = False
        if re.search(TIME_TOKEN, query_text):
            query_text = replace_time_params(query_text)
            req_time = True

        if "UEBA" in path:
            query_text = add_ueba_time_params(query_text)
        if verbose:
            print(f"Query {query}, {q_dict['name']}, req time: {req_time}")
        discover_queries[query] = QueryProps(
            name=q_dict.get("name"),
            query=query_text,
            req_time=req_time,
            description=q_dict.get("description"),
            url=q_path,
            raw_query=q_dict.get("query"),
        )
    return discover_queries


PRIM_TABLE_EXP = r"(?P<prefix>^|\n)(?P<table>(BehaviorAnalytics|AuditLogs|IdentityInfo|SigninLogs))(?=[\s\n\)\|])"
PRIM_TABLE_REPL = r"\g<prefix>\g<table>\n| where TimeGenerated > datetime({start})"
JOIN_TABLE_EXP = r"(?P<join>\|\s+join[^(]*\(\s*[^\s]+)(?=[\s\)\|])"
JOIN_TABLE_REPL = r"\g<join>\n| where TimeGenerated > datetime({start})"


def add_ueba_time_params(query):
    if isinstance(query, tuple):
        if query.req_time:
            return query.query
        query = query.query
    return re.sub(
        JOIN_TABLE_EXP,
        JOIN_TABLE_REPL,
        re.sub(PRIM_TABLE_EXP, PRIM_TABLE_REPL, query)
    )


def display_query_table(queries):
    ht_table = "<table>{rows}</table>"
    rows = [f"<tr><td>{q.name}</td><td>{q.url}</td></tr>"
        for q in queries.values()]
    from IPython.display import HTML
    display(HTML(ht_table.format(rows="".join(rows))))


hunting_queries = fetch_queries(ALL_QUERIES)

display_query_table(hunting_queries)

100%|██████████| 21/21 [00:04<00:00,  5.14it/s]


0,1
Anomalies on users tagged as VIP,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/anomaliesOnVIPUsers.yaml
Anomalous AAD Account Manipulation,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20AAD%20Account%20Manipulation.yaml
Anomalous AAD Account Creation,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Account%20Creation.yaml
Anomalous Activity Role Assignment,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Activity%20Role%20Assignment.yaml
Anomalous Code Execution,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Code%20Execution.yaml
Anomalous Data Access,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Data%20Access.yaml
Anomalous Defensive Mechanism Modification,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Defensive%20Mechanism%20Modification.yaml
Anomalous Failed Logon,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Failed%20Logon.yaml
Anomalous Geo Location Logon,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Geo%20Location%20Logon.yaml
Anomalous Login to Devices,https://raw.githubusercontent.com/Azure/Azure-Sentinel/master/Solutions/UEBA%20Essentials/Hunting%20Queries/Anomalous%20Login%20to%20Devices.yaml
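
The `replace_time_params` helper above turns the `{{StartTimeISO}}` and `{{EndTimeISO}}` tokens into `str.format` fields while keeping literal KQL braces intact. The sketch below shows the same idea with a different ordering (protect the tokens first, then double all remaining braces). `to_format_template`, the placeholder strings, and the sample query are hypothetical and only for illustration.

```python
# Placeholders that contain no braces (assumed not to occur in real queries)
_START_MARK = "\x00start\x00"
_END_MARK = "\x00end\x00"


def to_format_template(query: str) -> str:
    """Convert time tokens to str.format fields and escape literal braces."""
    # 1. Protect the time tokens so the brace escaping below does not touch them
    protected = query.replace("{{StartTimeISO}}", _START_MARK)
    protected = protected.replace("{{EndTimeISO}}", _END_MARK)
    # 2. Double every remaining brace so str.format treats it as a literal
    escaped = protected.replace("{", "{{").replace("}", "}}")
    # 3. Restore the tokens as named format fields
    return escaped.replace(_START_MARK, "{start}").replace(_END_MARK, "{end}")


raw_query = (
    "T | where TimeGenerated >= datetime({{StartTimeISO}}) "
    '| extend d = dynamic({"k": 1})'
)
template = to_format_template(raw_query)
final_query = template.format(start="2023-02-01", end="2023-02-02")
print(final_query)
```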


## Browser for UEBA queries - not used in notebook

In [35]:
import ipywidgets as widgets
import difflib

def browse_queries(queries: Dict[str, QueryProps]):
    """
    Browse hunting queries.

    Shows the original query, the modified query and a diff of the two.
    """
    select_query = widgets.Select(
        description="Query",
        options=[(qry.name, idx) for idx, qry in queries.items()],
        layout=widgets.Layout(height="200px", width="50%", padding="5pt")
    )
    layout_query = lambda x, y: widgets.Layout(height=x, width=y, padding="5pt")
    layout_w = lambda x: widgets.Layout(width=x, padding="5pt")
    qry_view = widgets.Textarea(layout=layout_query("200px", "95%"))
    qry_view_repl = widgets.Textarea(layout=layout_query("200px", "95%"))
    qry_view_diff = widgets.Textarea(layout=layout_query("150px", "50%"))
    qry_file = widgets.Label(layout=layout_w("60%"))
    orig_lbl = widgets.Label(value="Original query", layout=layout_w("60%"))
    mod_lbl = widgets.Label(value="Modified query", layout=layout_w("60%"))
    vbox = widgets.VBox([
        select_query,
        qry_file,
        widgets.HBox([
            widgets.VBox([orig_lbl, qry_view], layout=layout_query("250px", "45%")),
            widgets.VBox([mod_lbl, qry_view_repl], layout=layout_query("250px", "45%"))
        ]),
        qry_view_diff
    ])

    def update_query(change):
        query = queries[select_query.value]
        qry_file.value = query.url
        qry_view.value = query.raw_query
        qry_view_repl.value = query.query
        qry_view_diff.value = "\n".join(difflib.unified_diff(qry_view.value.splitlines(), qry_view_repl.value.splitlines()))

    select_query.observe(update_query, names="value")
    update_query(None)
    return vbox

# Uncomment the following line to browse the hunting queries
# browse_queries(hunting_queries)

VBox(children=(Select(description='Query', layout=Layout(height='200px', padding='5pt', width='50%'), options=…
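
The diff pane in the browser compares the original query with the version that has a time filter injected. A minimal sketch of that comparison outside the widget, using a simplified form of the table pattern (only two table names) and a made-up query:

```python
import difflib
import re

# Simplified version of the primary-table pattern (only two tables listed)
TABLE_EXP = r"(?P<prefix>^|\n)(?P<table>(BehaviorAnalytics|SigninLogs))(?=[\s\n\)\|])"
TABLE_REPL = r"\g<prefix>\g<table>\n| where TimeGenerated > datetime({start})"

# Made-up query for illustration
original = "BehaviorAnalytics\n| where ActivityType == 'LogOn'\n| take 10"
modified = re.sub(TABLE_EXP, TABLE_REPL, original)

# lineterm="" keeps the control lines free of trailing newlines,
# which suits input produced by splitlines()
diff = "\n".join(
    difflib.unified_diff(original.splitlines(), modified.splitlines(), lineterm="")
)
print(diff)
```

The only added line in the diff is the injected `| where TimeGenerated > datetime({start})` filter; the `{start}` field is filled in later when the query is run.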

## Run Hunting queries for time range on risky accounts

In [15]:

def run_ueba_queries(queries, start, end) -> pd.DataFrame:
    dfs = []
    query_params = {"end": end, "start": start}
    for query in tqdm(queries.values()):
        if "UEBA" not in query.url:
            continue
        try:
            repl_query = query.query
            if "{start}" in repl_query or "{end}" in repl_query:
                try:
                    repl_query = repl_query.format(**query_params)
                except KeyError:
                    print(f"Format error: {query.name}")
            result_df = qry_prov.exec_query(repl_query)
            result_df["UEBAQuery"] = query.name
            dfs.append(result_df)
        except Exception as err:
            print("Exception:", type(err), query.name)
    return pd.concat(dfs)

ueba_df = run_ueba_queries(hunting_queries, start=start, end=end)

100%|██████████| 21/21 [00:27<00:00,  1.33s/it]
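
The run loop above follows a common pattern: execute each query, tag the rows with the query name, skip failures, and concatenate. The sketch below shows that pattern with a stand-in executor so it can run without a workspace connection. `run_queries` and `fake_exec` are hypothetical names; unlike the cell above, this version returns an empty DataFrame when nothing succeeds, because `pd.concat` raises `ValueError` on an empty list.

```python
from typing import Callable, Dict

import pandas as pd


def run_queries(
    queries: Dict[str, str], exec_query: Callable[[str], pd.DataFrame]
) -> pd.DataFrame:
    """Run each query, tag results with the query name, return one DataFrame."""
    results = []
    for name, query in queries.items():
        try:
            result_df = exec_query(query)
        except Exception as err:  # keep going if one query fails
            print(f"Query failed: {name} ({type(err).__name__})")
            continue
        results.append(result_df.assign(UEBAQuery=name))
    # Guard against an empty list before concatenating
    return pd.concat(results, ignore_index=True) if results else pd.DataFrame()


def fake_exec(query: str) -> pd.DataFrame:
    """Stand-in for a query provider (hypothetical, returns canned rows)."""
    if "bad" in query:
        raise ValueError("simulated query failure")
    return pd.DataFrame({"UserPrincipalName": ["a@contoso.com"]})


combined = run_queries({"Query A": "good query", "Query B": "bad query"}, fake_exec)
print(combined)
```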


In [62]:
ueba_summary = (
    ueba_df[ueba_df["UserPrincipalName"].str.lower().isin(risk_users_df.UserPrincipalName.str.lower())]
    .groupby(["UserPrincipalName", "UEBAQuery"])
    .agg(
        UEBAEventCount=pd.NamedAgg("TimeGenerated", "count"),
        StartTime=pd.NamedAgg("TimeGenerated", "min"),
        EndTime=pd.NamedAgg("TimeGenerated", "max"),
    )
)
summary_report.add_summary_data(
    data=ueba_summary.reset_index(),
    user_column="UserPrincipalName",
    section="UEBA Summary",
)
df_caption(
    ueba_summary,
    caption="UEBA entries for unmitigated risk users"
)

Unnamed: 0_level_0,Unnamed: 1_level_0,UEBAEventCount,StartTime,EndTime
UserPrincipalName,UEBAQuery,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
tamuto@seccxpninja.onmicrosoft.com,Anomalous Sign-in Activity,84,2023-01-31 00:54:39+00:00,2023-02-01 13:04:04+00:00
tamuto@seccxpninja.onmicrosoft.com,Anomalous action performed in tenant by privileged user,1,2023-02-01 06:00:31+00:00,2023-02-01 06:00:31+00:00


In [61]:
ueba_summary.reset_index()

Unnamed: 0,UserPrincipalName,UEBAQuery,UEBAEventCount,StartTime,EndTime
0,tamuto@seccxpninja.onmicrosoft.com,Anomalous Sign-in Activity,84,2023-01-31 00:54:39+00:00,2023-02-01 13:04:04+00:00
1,tamuto@seccxpninja.onmicrosoft.com,Anomalous action performed in tenant by privileged user,1,2023-02-01 06:00:31+00:00,2023-02-01 06:00:31+00:00


In [17]:
df_caption(
    ueba_df[ueba_df["UserPrincipalName"].str.lower().isin(mitigated_users_df.UserPrincipalName.str.lower())]
    .groupby(["UserPrincipalName", "UEBAQuery"])
    .agg(
        UEBAEventCount=pd.NamedAgg("TimeGenerated", "count"),
        StartTime=pd.NamedAgg("TimeGenerated", "min"),
        EndTime=pd.NamedAgg("TimeGenerated", "max"),
    ),
    caption="UEBA entries for mitigated risk users"
)

Unnamed: 0_level_0,Unnamed: 1_level_0,UEBAEventCount,StartTime,EndTime
UserPrincipalName,UEBAQuery,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
PDemo@seccxpninja.onmicrosoft.com,Anomalous Sign-in Activity,1064,2023-01-31 06:41:25+00:00,2023-02-01 23:47:02+00:00
adm_pwatkins@seccxpninja.onmicrosoft.com,Anomalies on users tagged as VIP,33,2023-01-31 12:51:07+00:00,2023-01-31 13:17:32+00:00
adm_pwatkins@seccxpninja.onmicrosoft.com,Anomalous Sign-in Activity,30,2023-01-31 12:59:44+00:00,2023-01-31 13:17:32+00:00
adm_pwatkins@seccxpninja.onmicrosoft.com,"Anomalous login activity originated from Botnet, Tor proxy or C2",25,2023-01-31 13:02:17+00:00,2023-01-31 13:17:32+00:00
dwilliams@seccxp.ninja,Anomalous Sign-in Activity,3,2023-01-31 12:12:42+00:00,2023-02-01 00:53:23+00:00
pdemo@seccxpninja.onmicrosoft.com,Anomalous Sign-in Activity,369,2023-01-30 23:51:12+00:00,2023-01-31 16:27:02+00:00
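
The output above lists `PDemo@...` and `pdemo@...` as separate rows because the grouping uses the user name exactly as recorded. If a single row per account is preferred, one option is to normalize case on both sides before matching and grouping. A sketch on made-up data (the `upn_key` column name is an assumption):

```python
import pandas as pd

# Made-up events with inconsistent casing of the same account
events = pd.DataFrame({
    "UserPrincipalName": ["PDemo@contoso.com", "pdemo@contoso.com", "other@contoso.com"],
})
risk_users = pd.Series(["pdemo@contoso.com"])

# Normalize both sides before comparing
events["upn_key"] = events["UserPrincipalName"].str.casefold()
matched = events[events["upn_key"].isin(risk_users.str.casefold())]

# Group on the normalized key so both spellings count toward one account
counts = matched.groupby("upn_key").size()
print(counts)
```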


# Signin Summaries for prior week

In [18]:
user_summary_query = """
let si_history = SigninLogs
| where TimeGenerated between (datetime({start}) .. datetime({end}))
| where UserPrincipalName in~ ({users})
| summarize count() by UserPrincipalName, ResultType, RiskLevelAggregated, RiskLevelDuringSignIn, ClientAppUsed, UserAgent, IPAddress, Location;
si_history
| summarize OpCount=sum(count_) by UserPrincipalName, ClientAppUsed
| project UserPrincipalName, Attribute="ClientAppUser", Value=ClientAppUsed, OpCount
| union ( 
si_history
| summarize OpCount=sum(count_) by UserPrincipalName, IPAddress
| project UserPrincipalName, Attribute="IPAddress", Value=IPAddress, OpCount
)
| union ( 
si_history
| summarize OpCount=sum(count_) by UserPrincipalName, UserAgent
| project UserPrincipalName, Attribute="UserAgent", Value=UserAgent, OpCount
)
| union ( 
si_history
| summarize OpCount=sum(count_) by UserPrincipalName, Location
| project UserPrincipalName, Attribute="Location", Value=Location, OpCount
)
"""
week_ago = (end - timedelta(7))
user_summary_df = qry_prov.exec_query(user_summary_query.format(
    users=get_user_param(risk_users_df),
    start=week_ago,
    end=end
))

summary_report.add_summary_data(
    data=user_summary_df,
    user_column="UserPrincipalName",
    section="Sign-in summary for previous week"
)
df_caption(
    user_summary_df.groupby(["UserPrincipalName", "Attribute"]).agg(
        Values=pd.NamedAgg("Value", "unique"),
        NumUniqueValues=pd.NamedAgg("Value", "nunique"),
        OpCount=pd.NamedAgg("OpCount", "sum"),
    )
    .reset_index()
    .pivot(index=['UserPrincipalName'], columns='Attribute', values=["Values", "NumUniqueValues"]),
    caption="Sign-in summary for previous week"
)

Unnamed: 0_level_0,Values,Values,Values,Values,NumUniqueValues,NumUniqueValues,NumUniqueValues,NumUniqueValues
Attribute,ClientAppUser,IPAddress,Location,UserAgent,ClientAppUser,IPAddress,Location,UserAgent
UserPrincipalName,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2,Unnamed: 8_level_2
tamuto@seccxpninja.onmicrosoft.com,['Browser'],['118.200.55.233' '20.25.98.192'],['SG' 'US'],"['Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.70'  'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.61'  'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36 Edg/109.0.1518.70']",1,2,2,3
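
The summary table above is built by grouping the long-format (user, attribute, value) rows and pivoting the attributes into columns. A small sketch of that reshaping on made-up data, here keeping the summed operation count as well:

```python
import pandas as pd

# Made-up long-format rows shaped like the query output
summary = pd.DataFrame({
    "UserPrincipalName": ["a@contoso.com"] * 3,
    "Attribute": ["IPAddress", "IPAddress", "Location"],
    "Value": ["10.0.0.1", "10.0.0.2", "US"],
    "OpCount": [5, 2, 7],
})

wide = (
    summary.groupby(["UserPrincipalName", "Attribute"])
    .agg(
        NumUniqueValues=pd.NamedAgg("Value", "nunique"),
        OpCount=pd.NamedAgg("OpCount", "sum"),
    )
    .reset_index()
    # One row per user, one column per (measure, attribute) pair
    .pivot(
        index="UserPrincipalName",
        columns="Attribute",
        values=["NumUniqueValues", "OpCount"],
    )
)
print(wide)
```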


# Related alerts

## 1. Alerts that name the account explicitly

In [38]:
related_alerts_df = pd.concat([
    (
        qry_prov.SecurityAlert.list_related_alerts(account_name=acct)
        .assign(UserPrincipalName=acct)
    )
    for acct in tqdm(risk_users_df.UserPrincipalName)
])
summary_report.add_summary_data(
    data=related_alerts_df,
    user_column="UserPrincipalName",
    section="Related alerts for user"
)
df_caption(related_alerts_df.drop(
    columns=["Description", "RemediationSteps", "ExtendedProperties"]),
    caption="Related alerts for account")

100%|██████████| 1/1 [00:02<00:00,  2.26s/it]


## 2. Alerts related to sign-in IP addresses

In [88]:
related_alerts_ip_df = pd.concat([
    (
        qry_prov.SecurityAlert.list_alerts_for_ip(source_ip_list=ip_addr)
        .assign(UserPrincipalName=acct, IPAddress=ip_addr)
    )
    for acct, ip_addr in tqdm(
        risk_users_df.explode("SourceIPs")[["UserPrincipalName", "SourceIPs"]].apply(tuple, axis=1)
    )
])
summary_report.add_summary_data(
    data=related_alerts_ip_df,
    user_column="UserPrincipalName",
    section="Related alerts for user signin IP address"
)

df_caption(related_alerts_ip_df, caption="Related alerts for sign-in IP Address")


100%|██████████| 3/3 [00:07<00:00,  2.35s/it]


# Threat Intelligence reports for sign-in IPs

In [82]:
# look up IP addresses - join UserPrincipalName from source DF to output
ti_user_ip = IpAddress.tilookup_ip(
    risk_users_df.explode("SourceIPs")[["UserPrincipalName", "SourceIPs"]],
    column="SourceIPs",
    join="left"
).query("Severity != 'information'")

summary_report.add_summary_data(
    data=ti_user_ip,
    user_column="UserPrincipalName",
    section="Threat intel reports for user sign-in IP address(es)"
)

df_caption(ti_user_ip, caption="Threat intel reports for risky sign-in IPs")

Observables processed: 100%|██████████| 6/6 [00:00<00:00, 600.07obs/s]


Unnamed: 0,UserPrincipalName,SourceIPs,QuerySubtype,Result,Details,RawResult,Reference,Status,Ioc,IocType,SafeIoc,Severity,Provider
2,tamuto@seccxpninja.onmicrosoft.com,20.25.98.192,,True,"{'summary': {'resolutions': 0, 'certificates': 0, 'malware_hashes': 0, 'projects': 0, 'articles': 30, 'total': 30, 'netblock': '20.0.0.0/11', 'os': 'n/a', 'asn': 'AS8075 - MICROSOFT-CORP-MSN-AS-BLOCK', 'hosting_provider': 'n/a', 'link': 'https://community.riskiq.com/search/20.25.98.192', 'links': {'resolutions': 'https://community.riskiq.com/search/20.25.98.192/resolutions', 'services': 'https://community.riskiq.com/search/20.25.98.192/services', 'certificates': 'https://community.riskiq.com/search/20.25.98.192/certificates', 'projects': 'https://community.riskiq.com/search/20.25.98.192/projects', 'articles': 'https://community.riskiq.com/research?query=20.25.98.192', 'trackers': 'https://community.riskiq.com/search/20.25.98.192/trackers', 'components': 'https://community.riskiq.com/search/20.25.98.192/components', 'host_pairs': 'https://community.riskiq.com/search/20.25.98.192/hostpairs', 'reverse_dns': 'https://community.riskiq.com/search/20.25.98.192/dns', 'cookies': 'https://community.riskiq.com/search/20.25.98.192/cookies', 'malware_hashes': 'https://community.riskiq.com/search/20.25.98.192/hashes'}, 'services': 0}, 'reputation': {'score': 4, 'classification': 'UNKNOWN', 'rules': []}}","{'summary': {'resolutions': 0, 'certificates': 0, 'malware_hashes': 0, 'projects': 0, 'articles': 30, 'total': 30, 'netblock': '20.0.0.0/11', 'os': 'n/a', 'asn': 'AS8075 - MICROSOFT-CORP-MSN-AS-BLOCK', 'hosting_provider': 'n/a', 'link': 'https://community.riskiq.com/search/20.25.98.192', 'links': {'resolutions': 'https://community.riskiq.com/search/20.25.98.192/resolutions', 'services': 'https://community.riskiq.com/search/20.25.98.192/services', 'certificates': 'https://community.riskiq.com/search/20.25.98.192/certificates', 'projects': 'https://community.riskiq.com/search/20.25.98.192/projects', 'articles': 'https://community.riskiq.com/research?query=20.25.98.192', 'trackers': 
'https://community.riskiq.com/search/20.25.98.192/trackers', 'components': 'https://community.riskiq.com/search/20.25.98.192/components', 'host_pairs': 'https://community.riskiq.com/search/20.25.98.192/hostpairs', 'reverse_dns': 'https://community.riskiq.com/search/20.25.98.192/dns', 'cookies': 'https://community.riskiq.com/search/20.25.98.192/cookies', 'malware_hashes': 'https://community.riskiq.com/search/20.25.98.192/hashes'}, 'services': 0}, 'reputation': {'score': 4, 'classification': 'UNKNOWN', 'rules': []}}",https://community.riskiq.com,0,20.25.98.192,ipv4,20.25.98.192,high,RiskIQ


# Unusual Azure Audit entries

Look for Azure audit operations by the selected accounts: operation
types that an account used in the current time window but had not
used in the baseline period (default: the prior 30 days).
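
The comparison described above is an anti-join: keep the current (user, operation) pairs that have no match in the baseline. The following is a minimal plain-Python sketch of that logic only, not the KQL the notebook runs. The sample events and the helper name `new_operations` are hypothetical.

```python
from collections import Counter

# Hypothetical (user, operation) pairs; not taken from real logs.
baseline_events = [
    ("alice@contoso.com", "Update user"),
    ("alice@contoso.com", "Update user"),
    ("alice@contoso.com", "Add member to group"),
    ("bob@contoso.com", "Update user"),
]
current_events = [
    ("alice@contoso.com", "Update user"),
    ("alice@contoso.com", "Add service principal"),
    ("bob@contoso.com", "Add member to group"),
]


def new_operations(current, baseline, threshold=0):
    """Return current (user, operation) pairs not established in the baseline.

    A pair counts as established only if it occurred more than
    `threshold` times in the baseline window.
    """
    counts = Counter(baseline)
    established = {pair for pair, count in counts.items() if count > threshold}
    return [pair for pair in current if pair not in established]


unusual = new_operations(current_events, baseline_events)
print(unusual)
```

Matching is per user, so an operation that is routine for one account is still reported when another account uses it for the first time. Raising `threshold` also reports operations that appeared only occasionally in the baseline.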

In [20]:
# Azure Audit
# Find any operation types for current period that weren't seen for
# that user in previous baseline period
azure_audit_query = """
let start = datetime("{start}");
let end = datetime("{end}");
let baseline_start = start - ({baseline_period} * 1d);
let bl_threshold = {threshold};
let operation_history = AuditLogs
| where TimeGenerated between(baseline_start .. start)
| where Identity !in ("Azure AD Cloud Sync", "Managed Service Identity", "Microsoft.Azure.SyncFabric")
| where bag_has_key(InitiatedBy, "user")
| extend UserPrincipalName = tostring(InitiatedBy["user"]["userPrincipalName"])
| where UserPrincipalName in~ ({users})
| summarize EventCount=count() by UserPrincipalName, OperationName
| where EventCount > bl_threshold;
AuditLogs
| where TimeGenerated between(start .. end)
| where Identity !in ("Azure AD Cloud Sync", "Managed Service Identity", "Microsoft.Azure.SyncFabric")
| where bag_has_key(InitiatedBy, "user")
| extend UserPrincipalName = tostring(InitiatedBy["user"]["userPrincipalName"]), IPAddress = InitiatedBy["user"]["ipAddress"]
| where UserPrincipalName in~ ({users})
| join kind=leftanti (operation_history) on UserPrincipalName, OperationName
| project Identity, UserPrincipalName, OperationName, LoggedByService, InitiatedBy, AdditionalDetails, TargetResources
"""

from datetime import datetime, timezone, timedelta

end = datetime.now(tz=timezone.utc)
start = end - timedelta(1)
fmt_query = azure_audit_query.format(
    start=start,
    end=end,
    baseline_period=baseline_period,
    threshold=0,
    users=get_user_param(risk_users_df),
)
az_audit_df = qry_prov.exec_query(fmt_query)
summary_report.add_summary_data(
    data=az_audit_df,
    user_column="UserPrincipalName",
    section="Unusual Azure Audit log entries for user"
)
df_caption(az_audit_df, caption="Azure audit activity types not seen in baseline period.")

Unnamed: 0,Identity,UserPrincipalName,OperationName,LoggedByService,InitiatedBy,AdditionalDetails,TargetResources


# New or unusual Office 365 activity

Office operations in the measured period that did not occur in the
baseline period, or that occurred more often than the baseline
threshold (mean plus N standard deviations of the daily count).
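
As a rough illustration of the baseline threshold used in the next cell, the sketch below computes the mean plus N sample standard deviations for a list of daily counts, and substitutes 1.0 when the deviation is zero. The function names and the sample counts are hypothetical. An operation with no baseline history gets a threshold of 0, so any occurrence is flagged.

```python
from statistics import mean, stdev


def baseline_threshold(daily_counts, num_stddev=2.0):
    """Return mean + num_stddev * stdev for the baseline daily counts.

    A zero (or undefined) standard deviation is replaced by 1.0.
    An empty baseline yields a threshold of 0.
    """
    if not daily_counts:
        return 0.0
    op_mean = mean(daily_counts)
    # Sample standard deviation needs at least two data points
    op_stdev = stdev(daily_counts) if len(daily_counts) > 1 else 0.0
    return op_mean + num_stddev * (op_stdev if op_stdev > 0 else 1.0)


def is_unusual(current_count, daily_counts, num_stddev=2.0):
    """Flag a current count that exceeds the baseline threshold."""
    return current_count > baseline_threshold(daily_counts, num_stddev)


# Hypothetical daily operation counts for one user/operation pair
steady_baseline = [2, 2, 2, 2]
varied_baseline = [1, 3, 5]

print(baseline_threshold(steady_baseline))
print(baseline_threshold(varied_baseline))
print(is_unusual(8, varied_baseline))
```

Substituting 1.0 for a zero deviation keeps a perfectly steady operation from being flagged by a small fluctuation: with two standard deviations, a steady two-per-day operation is flagged only above four per day.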

In [31]:
o365_baseline_activity_query = """
let num_stddev = {std_dev_scale};
let bl_period = datetime_add("day", -{baseline_period}, datetime({start}));
OfficeActivity
| where TimeGenerated between (bl_period .. datetime({start}))
| where UserId in~ ({users})
// count operations by user and op type per day
| summarize OpCount = count() by UserId, OfficeWorkload, Operation, bin(TimeGenerated, 1d)
// calculate the standard deviation and mean for each user/op combination
| summarize OpStdev = stdev(OpCount), OpMean = avg(OpCount) by UserId, OfficeWorkload, Operation
// Calculate the baseline threshold: Mean + (N * StdDev), using a StdDev of 1 if the variance is 0
| extend OpBase = OpMean + (num_stddev * iif(OpStdev > 0, OpStdev, 1.0))
| extend RecType="baseline"
"""

o365_current_activity_query = """
OfficeActivity
| where TimeGenerated between (datetime({start}) .. datetime({end}))
| where UserId in~ ({users})
| summarize OpCount = count() by UserId, OfficeWorkload, Operation
| extend RecType="current"
"""

# set number of std deviations from mean to use as indicating
# anomalous activity
_STD_THRESHOLD = 2

end = datetime.now(tz=timezone.utc)
start = end - timedelta(1)
office_baseline_df = qry_prov.exec_query(
    o365_baseline_activity_query.format(
        users=get_user_param(risk_users_df),
        std_dev_scale=_STD_THRESHOLD,
        start=start,
        baseline_period=baseline_period,
    )
)
office_current_df = qry_prov.exec_query(
    o365_current_activity_query.format(
        users=get_user_param(risk_users_df),
        start=start,
        end=end
    )
)

# Pull out any current activity that exceeds the baseline threshold (mean + N*stddev)
office_activity_df = (
    office_current_df
    .merge(office_baseline_df, on=["UserId", "OfficeWorkload", "Operation"], how="left")
    .fillna({"OpBase": 0})
    .query("OpCount > OpBase")
)

In [83]:
df_caption(office_baseline_df, "Office baseline operations.")
df_caption(office_current_df, "Office current operations.")
df_caption(office_activity_df, "Office anomalous operations.")
summary_report.add_summary_data(
    data=office_activity_df,
    user_column="UserId",
    section="Unusual Office activity for user"
)
summary_report.add_summary_data(
    data=office_current_df,
    user_column="UserId",
    section="Summarized current Office activity for user"
)

Unnamed: 0,UserId,OfficeWorkload,Operation,OpStdev,OpMean,OpBase,RecType
0,tamuto@seccxpninja.onmicrosoft.com,SharePoint,FileUploaded,0.188982,2.964286,3.34225,baseline
1,tamuto@seccxpninja.onmicrosoft.com,Exchange,MailItemsAccessed,2.572629,2.25,7.395258,baseline
2,tamuto@seccxpninja.onmicrosoft.com,Exchange,Create,0.0,1.0,3.0,baseline
3,tamuto@seccxpninja.onmicrosoft.com,SharePoint,FileAccessed,0.0,2.0,4.0,baseline


Unnamed: 0,UserId,OfficeWorkload,Operation,OpCount,RecType
0,tamuto@seccxpninja.onmicrosoft.com,Exchange,MailItemsAccessed,2,current
1,tamuto@seccxpninja.onmicrosoft.com,SharePoint,FileUploaded,3,current


Unnamed: 0,UserId,OfficeWorkload,Operation,OpCount,RecType_x,OpStdev,OpMean,OpBase,RecType_y


# Unusual Azure activity

Azure activity operations occurring in the measured period that had
not occurred in the baseline period.


In [85]:
azure_activity_df.columns

Index(['TimeGenerated', 'UserPrincipalName', 'OperationNameValue', 'IPAddress',
       'EventDataId', 'ActivityStatusValue', 'ResourceGroup', 'SubscriptionId',
       'TenantId'],
      dtype='object')

In [90]:
# Azure Activity
azure_activity_query = """
let start = datetime("{start}");
let end = datetime("{end}");
let baseline_start = start - ({period} * 1d);
let bl_threshold = {threshold};
let operation_history = AzureActivity
| where TimeGenerated between(baseline_start .. start)
| where Caller in~ ({users})
| project UserPrincipalName=Caller, OperationNameValue
| summarize EventCount=count() by UserPrincipalName, OperationNameValue
| where EventCount > bl_threshold;
AzureActivity
| where TimeGenerated between(start .. end)
| where Caller in~ ({users})
| project-rename UserPrincipalName=Caller
| join kind=leftanti (operation_history) on UserPrincipalName, OperationNameValue
| project TimeGenerated, UserPrincipalName, OperationNameValue, IPAddress=CallerIpAddress,
  EventDataId, ActivityStatusValue, ResourceGroup, SubscriptionId, TenantId
"""

# Compute both ends of the query window together so they stay consistent
end = datetime.now(tz=timezone.utc)
start = end - timedelta(1)
fmt_query = azure_activity_query.format(
    start=start,
    end=end,
    period=28,
    threshold=0,
    users=get_user_param(risk_users_df),
)
azure_activity_df = qry_prov.exec_query(fmt_query)

aa_summary_cols = [
    "UserPrincipalName",
    "OperationNameValue",
    "IPAddress",
    "ResourceGroup",
    "SubscriptionId",
    "TenantId",
]

azure_activity_summary_df = azure_activity_df.groupby(aa_summary_cols).agg(
    EventCount=pd.NamedAgg("TimeGenerated", "count"),
    ActivityStatusValue=pd.NamedAgg("ActivityStatusValue", "unique"),
    StartTime=pd.NamedAgg("TimeGenerated", "min"),
    EndTime=pd.NamedAgg("TimeGenerated", "max"),
).reset_index().sort_values("StartTime", ascending=True)

summary_report.add_summary_data(
    data=azure_activity_summary_df,
    user_column="UserPrincipalName",
    section="Unusual Azure activity for user"
)
df_caption(azure_activity_summary_df, "Azure activity operations not seen in baseline period.")

Unnamed: 0,UserPrincipalName,OperationNameValue,IPAddress,ResourceGroup,SubscriptionId,TenantId,EventCount,ActivityStatusValue,StartTime,EndTime
3,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.AUTOMATION/AUTOMATIONACCOUNTS/HYBRIDRUNBOOKWORKERGROUPS/GETWORKERCOUNT/ACTION,147.243.27.203,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,10,['Success' 'Start'],2023-02-01 05:52:53.605463700+00:00,2023-02-01 06:00:53.770754+00:00
6,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.AUTOMATION/AUTOMATIONACCOUNTS/HYBRIDRUNBOOKWORKERGROUPS/HYBRIDRUNBOOKWORKERS/WRITE,147.243.27.203,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,2,['Success' 'Start'],2023-02-01 06:00:13.837572300+00:00,2023-02-01 06:00:15.603256700+00:00
12,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.COMPUTE/VIRTUALMACHINES/EXTENSIONS/WRITE,147.243.27.203,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,2,['Accept' 'Start'],2023-02-01 06:00:15.803323200+00:00,2023-02-01 06:00:21.787995800+00:00
11,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.COMPUTE/VIRTUALMACHINES/EXTENSIONS/DELETE,147.243.27.203,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,7,['Success' 'Accept' 'Start'],2023-02-01 06:00:19.999565200+00:00,2023-02-01 06:12:47.627469900+00:00
5,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.AUTOMATION/AUTOMATIONACCOUNTS/HYBRIDRUNBOOKWORKERGROUPS/HYBRIDRUNBOOKWORKERS/DELETE,147.243.27.203,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,4,['Success' 'Start' 'Failure'],2023-02-01 06:00:52.906841700+00:00,2023-02-01 06:01:17.421008100+00:00
20,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.NETWORK/NETWORKINTERFACES/EFFECTIVENETWORKSECURITYGROUPS/ACTION,118.200.55.233,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,18,['Accept' 'Start' 'Success'],2023-02-01 06:02:56.308497800+00:00,2023-02-01 12:52:20.724925900+00:00
24,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.RECOVERYSERVICES/LOCATIONS/BACKUPSTATUS/ACTION,118.200.55.233,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,4,['Success' 'Start'],2023-02-01 06:03:00.808491100+00:00,2023-02-01 06:04:01.636515100+00:00
4,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.AUTOMATION/AUTOMATIONACCOUNTS/HYBRIDRUNBOOKWORKERGROUPS/GETWORKERCOUNT/ACTION,147.243.28.22,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,10,['Success' 'Start'],2023-02-01 07:05:00.205528100+00:00,2023-02-01 10:46:02.598749700+00:00
7,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.AUTOMATION/AUTOMATIONACCOUNTS/HYBRIDRUNBOOKWORKERGROUPS/HYBRIDRUNBOOKWORKERS/WRITE,147.243.28.22,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,2,['Success' 'Start'],2023-02-01 07:05:49.958148300+00:00,2023-02-01 07:05:52.100110600+00:00
13,tamuto@seccxpninja.onmicrosoft.com,MICROSOFT.COMPUTE/VIRTUALMACHINES/EXTENSIONS/WRITE,147.243.28.22,SIMULAND,d1d8779d-38d7-4f06-91db-9cbc8de0176f,8ecf8077-cf51-4820-aadd-14040956f35d,5,['Success' 'Accept' 'Start'],2023-02-01 07:05:52.301304600+00:00,2023-02-01 07:18:07.123937800+00:00


# Summarizing data

Create dynamic summaries for each user and upload them to Sentinel.

> Note: we could offer the option to group by report type instead
> of user. That would result in a Dynamic Summary entry for each
> report type (with consistent schema) but with data from (potentially)
> multiple users.
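
The alternative grouping mentioned in the note could look roughly like the sketch below. It uses plain dictionaries and lists as a hypothetical stand-in for the notebook's report store (the real store holds DataFrames), and the function name `regroup_by_report_type` and the sample rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for the per-user store: {user: {report_type: rows}}
per_user_reports = {
    "alice@contoso.com": {
        "Related alerts for user": [{"AlertName": "Example alert A"}],
        "Unusual Azure activity for user": [{"OperationNameValue": "EXAMPLE/WRITE"}],
    },
    "bob@contoso.com": {
        "Related alerts for user": [{"AlertName": "Example alert B"}],
    },
}


def regroup_by_report_type(reports_by_user):
    """Invert {user: {report_type: rows}} into {report_type: rows}.

    Each row is tagged with the user it came from, so every report type
    ends up with one consistent schema covering all users.
    """
    by_type = defaultdict(list)
    for user, reports in reports_by_user.items():
        for report_type, rows in reports.items():
            for row in rows:
                by_type[report_type].append({"UserPrincipalName": user, **row})
    return dict(by_type)


per_type_reports = regroup_by_report_type(per_user_reports)
for report_type, rows in per_type_reports.items():
    print(report_type, "->", len(rows), "row(s)")
```

Each report type would then map to a single Dynamic Summary entry in which the user appears as a column rather than as the summary key.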

In [110]:
# Iterate through summary reports and create a summary for each user

dynamic_summaries = []
for user, reports in summary_report._summary_reports.items():
    # Create a summary for each user
    user_ds = DynamicSummary(
        summary_name=f"AccountEvaluation - {user}",
        summary_description="Summary generated from AccountSignInEvaluation notebook.",
        source_info="AccountSignInEvaluation.ipynb"
    )

    for report_type, summary_item in reports.items():
        ds_item_params = {
            "event_time_utc": end,
            "search_key": user,
            "observable_type": "report_type",
            "observable_value": report_type
        }
        user_ds.add_summary_items(
            data=summary_item.data,
            **ds_item_params
        )
    dynamic_summaries.append(user_ds)

for dyn_summary in dynamic_summaries:
    # Create or update the report
    # sentinel.create_dynamic_summary(dyn_summary)
    print(dyn_summary.summary_name)


AccountEvaluation - tamuto@seccxpninja.onmicrosoft.com


## Appendix - Pickling and restoring data

In [111]:
import pickle
obj = pickle.dumps(dynamic_summaries)

with open("acct_nb_summaries.pkl", "wb") as pickle_file:
    pickle_file.write(obj)

In [112]:
# Note: the DynamicSummary class must be imported (see earlier in the
# notebook) for the summary report to be restored successfully.
from msticpy.context.azure.sentinel_dynamic_summary import DynamicSummary

with open("acct_nb_summaries.pkl", "rb") as pickle_file:
    summary_obj = pickle_file.read()
# Restore from the bytes read back from the file, not the in-memory `obj`
dynamic_summaries_copy = pickle.loads(summary_obj)

print([ds.summary_name for ds in dynamic_summaries_copy])

['AccountEvaluation - tamuto@seccxpninja.onmicrosoft.com']
