# Entity Explorer - Account
 <details>
     <summary>&nbsp;<u>Details...</u></summary>

 **Notebook Version:** 1.0<br>
 **Python Version:** Python 3.6 (including Python 3.6 - AzureML)<br>
 **Required Packages**: kqlmagic, msticpy, pandas, numpy, matplotlib, networkx, ipywidgets, ipython, dnspython, ipwhois, folium, maxminddb_geolite2<br>

 **Data Sources Required**:
 - Log Analytics - SecurityAlert, SecurityEvent, HuntingBookmark, Syslog, AAD SigninLogs, AzureActivity, OfficeActivity, ThreatIndicator
 - (Optional) - VirusTotal, AlienVault OTX, IBM XForce, Open Page Rank (all require accounts and API keys)
 </details>

This notebook brings together a series of queries and visualizations to help you determine the security state of an account. The account can be a Windows or Linux account or an Azure Active Directory/Office 365 account.

The Notebook contains sections for reviewing activity for Host accounts (Linux and Windows) and for Azure Active Directory accounts. It also has a general section that looks for related items independent of the account type.

<h1>Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Hunting-Hypothesis" data-toc-modified-id="Hunting-Hypothesis-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Hunting Hypothesis</a></span><ul class="toc-item"><li><span><a href="#Notebook-initialization" data-toc-modified-id="Notebook-initialization-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Notebook initialization</a></span></li><li><span><a href="#Get-Workspace-and-Authenticate" data-toc-modified-id="Get-Workspace-and-Authenticate-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Get Workspace and Authenticate</a></span><ul class="toc-item"><li><span><a href="#Authentication-and-Configuration-Problems" data-toc-modified-id="Authentication-and-Configuration-Problems-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Authentication and Configuration Problems</a></span></li></ul></li></ul></li><li><span><a href="#Enter-account-name-and-query-time-window" data-toc-modified-id="Enter-account-name-and-query-time-window-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Enter account name and query time window</a></span></li><li><span><a href="#Data-Sources-available-to-query" data-toc-modified-id="Data-Sources-available-to-query-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data Sources available to query</a></span></li><li><span><a href="#Search-for-Account-Name-in-Host,-Azure-Active-Directory-(AAD),-Azure-and-Office-365-Data." 
data-toc-modified-id="Search-for-Account-Name-in-Host,-Azure-Active-Directory-(AAD),-Azure-and-Office-365-Data.-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Search for Account Name in Host, Azure Active Directory (AAD), Azure and Office 365 Data.</a></span><ul class="toc-item"><li><span><a href="#Query-Data-Sources" data-toc-modified-id="Query-Data-Sources-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Query Data Sources</a></span></li></ul></li><li><span><a href="#Display-logons-from-account-sources-and-choose-an-account-to-explore" data-toc-modified-id="Display-logons-from-account-sources-and-choose-an-account-to-explore-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Display logons from account sources and choose an account to explore</a></span></li><li><span><a href="#Related-Alerts-and-Hunting-Bookmarks" data-toc-modified-id="Related-Alerts-and-Hunting-Bookmarks-6"><span class="toc-item-num">6&nbsp;&nbsp;</span>Related Alerts and Hunting Bookmarks</a></span><ul class="toc-item"><li><span><a href="#Alerts" data-toc-modified-id="Alerts-6.1"><span class="toc-item-num">6.1&nbsp;&nbsp;</span>Alerts</a></span></li><li><span><a href="#Hunting/Investigation-Bookmarks" data-toc-modified-id="Hunting/Investigation-Bookmarks-6.2"><span class="toc-item-num">6.2&nbsp;&nbsp;</span>Hunting/Investigation Bookmarks</a></span></li></ul></li><li><span><a href="#Further-Investigation" data-toc-modified-id="Further-Investigation-7"><span class="toc-item-num">7&nbsp;&nbsp;</span>Further Investigation</a></span></li><li><span><a href="#Windows-Host" data-toc-modified-id="Windows-Host-8"><span class="toc-item-num">8&nbsp;&nbsp;</span>Windows Host</a></span><ul class="toc-item"><li><span><a href="#Host-Logon-Summary" data-toc-modified-id="Host-Logon-Summary-8.1"><span class="toc-item-num">8.1&nbsp;&nbsp;</span>Host Logon Summary</a></span></li><li><span><a href="#Threat-Intelligence-for-logon-IP-Addresses" 
data-toc-modified-id="Threat-Intelligence-for-logon-IP-Addresses-8.2"><span class="toc-item-num">8.2&nbsp;&nbsp;</span>Threat Intelligence for logon IP Addresses</a></span></li><li><span><a href="#Geolocation-and-ownership-for-source-logon-IP-addresses" data-toc-modified-id="Geolocation-and-ownership-for-source-logon-IP-addresses-8.3"><span class="toc-item-num">8.3&nbsp;&nbsp;</span>Geolocation and ownership for source logon IP addresses</a></span></li><li><span><a href="#Additional-Alerts-for-logged-on-hosts" data-toc-modified-id="Additional-Alerts-for-logged-on-hosts-8.4"><span class="toc-item-num">8.4&nbsp;&nbsp;</span>Additional Alerts for logged-on hosts</a></span><ul class="toc-item"><li><span><a href="#Additional-alerts-for-source-IP-addresses" data-toc-modified-id="Additional-alerts-for-source-IP-addresses-8.4.1"><span class="toc-item-num">8.4.1&nbsp;&nbsp;</span>Additional alerts for source IP addresses</a></span></li></ul></li><li><span><a href="#Additional-Investigation-Bookmarks-for-logged-on-hosts" data-toc-modified-id="Additional-Investigation-Bookmarks-for-logged-on-hosts-8.5"><span class="toc-item-num">8.5&nbsp;&nbsp;</span>Additional Investigation Bookmarks for logged-on hosts</a></span></li></ul></li><li><span><a href="#Linux-Host" data-toc-modified-id="Linux-Host-9"><span class="toc-item-num">9&nbsp;&nbsp;</span>Linux Host</a></span><ul class="toc-item"><li><span><a href="#Host-Logon-Summary" data-toc-modified-id="Host-Logon-Summary-9.1"><span class="toc-item-num">9.1&nbsp;&nbsp;</span>Host Logon Summary</a></span></li><li><span><a href="#Threat-Intelligence-for-logon-IP-Addresses" data-toc-modified-id="Threat-Intelligence-for-logon-IP-Addresses-9.2"><span class="toc-item-num">9.2&nbsp;&nbsp;</span>Threat Intelligence for logon IP Addresses</a></span></li><li><span><a href="#Geolocation-and-ownership-for-source-logon-IP-addresses" data-toc-modified-id="Geolocation-and-ownership-for-source-logon-IP-addresses-9.3"><span 
class="toc-item-num">9.3&nbsp;&nbsp;</span>Geolocation and ownership for source logon IP addresses</a></span></li><li><span><a href="#Additional-Alerts-for-logged-on-hosts" data-toc-modified-id="Additional-Alerts-for-logged-on-hosts-9.4"><span class="toc-item-num">9.4&nbsp;&nbsp;</span>Additional Alerts for logged-on hosts</a></span><ul class="toc-item"><li><span><a href="#Additional-alerts-for-source-IP-addresses" data-toc-modified-id="Additional-alerts-for-source-IP-addresses-9.4.1"><span class="toc-item-num">9.4.1&nbsp;&nbsp;</span>Additional alerts for source IP addresses</a></span></li></ul></li><li><span><a href="#Additional-Investigation-Bookmarks-for-logged-on-hosts" data-toc-modified-id="Additional-Investigation-Bookmarks-for-logged-on-hosts-9.5"><span class="toc-item-num">9.5&nbsp;&nbsp;</span>Additional Investigation Bookmarks for logged-on hosts</a></span></li></ul></li><li><span><a href="#AAD/Office-Account" data-toc-modified-id="AAD/Office-Account-10"><span class="toc-item-num">10&nbsp;&nbsp;</span>AAD/Office Account</a></span><ul class="toc-item"><li><span><a href="#Azure/Office-Summary" data-toc-modified-id="Azure/Office-Summary-10.1"><span class="toc-item-num">10.1&nbsp;&nbsp;</span>Azure/Office Summary</a></span></li><li><span><a href="#Threat-Intelligence-for-IP-Addresses" data-toc-modified-id="Threat-Intelligence-for-IP-Addresses-10.2"><span class="toc-item-num">10.2&nbsp;&nbsp;</span>Threat Intelligence for IP Addresses</a></span></li><li><span><a href="#Geolocation-and-ownership-for-source-IP-addresses" data-toc-modified-id="Geolocation-and-ownership-for-source-IP-addresses-10.3"><span class="toc-item-num">10.3&nbsp;&nbsp;</span>Geolocation and ownership for source IP addresses</a></span></li><li><span><a href="#Additional-alerts-for-source-IP-addresses" data-toc-modified-id="Additional-alerts-for-source-IP-addresses-10.4"><span class="toc-item-num">10.4&nbsp;&nbsp;</span>Additional alerts for source IP 
addresses</a></span></li></ul></li><li><span><a href="#Appendices" data-toc-modified-id="Appendices-11"><span class="toc-item-num">11&nbsp;&nbsp;</span>Appendices</a></span><ul class="toc-item"><li><span><a href="#Available-DataFrames" data-toc-modified-id="Available-DataFrames-11.1"><span class="toc-item-num">11.1&nbsp;&nbsp;</span>Available DataFrames</a></span></li><li><span><a href="#Saving-data-to-Excel" data-toc-modified-id="Saving-data-to-Excel-11.2"><span class="toc-item-num">11.2&nbsp;&nbsp;</span>Saving data to Excel</a></span></li></ul></li><li><span><a href="#Configuration" data-toc-modified-id="Configuration-12"><span class="toc-item-num">12&nbsp;&nbsp;</span>Configuration</a></span><ul class="toc-item"><li><span><a href="#msticpyconfig.yaml-configuration-File" data-toc-modified-id="msticpyconfig.yaml-configuration-File-12.1"><span class="toc-item-num">12.1&nbsp;&nbsp;</span><code>msticpyconfig.yaml</code> configuration File</a></span></li></ul></li></ul></div>

## Hunting Hypothesis
Our broad initial hunting hypothesis is that we have received an account name entity that is suspected to be compromised and is being used maliciously on internal networks. We will need to hunt from a range of different positions to validate or disprove this hypothesis.

---
### Notebook initialization
The next cell:
- Checks for the correct Python version
- Checks versions and optionally installs required packages
- Imports the required packages into the notebook
- Sets a number of configuration options.

This should complete without errors. If you encounter errors or warnings, look at the following two notebooks:
- [TroubleShootingNotebooks](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/TroubleShootingNotebooks.ipynb)
- [ConfiguringNotebookEnvironment](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb)

If you are running in the Microsoft Sentinel Notebooks environment (Azure Notebooks or Azure ML), you can run live versions of these notebooks:
- [Run TroubleShootingNotebooks](./TroubleShootingNotebooks.ipynb)
- [Run ConfiguringNotebookEnvironment](./ConfiguringNotebookEnvironment.ipynb)

You may also need to do some additional configuration to successfully use functions such as Threat Intelligence service lookup and Geo IP lookup. 
There are more details about this in the `ConfiguringNotebookEnvironment` notebook and in these documents:
- [msticpy configuration](https://msticpy.readthedocs.io/en/latest/getting_started/msticpyconfig.html)
- [Threat intelligence provider configuration](https://msticpy.readthedocs.io/en/latest/data_acquisition/TIProviders.html#configuration-file)


In [None]:
from pathlib import Path
from IPython.display import display, HTML

REQ_PYTHON_VER = "3.6"
REQ_MSTICPY_VER = "1.0.0"

display(HTML("<h3>Starting Notebook setup...</h3>"))

# If not using Azure Notebooks, install msticpy with
# %pip install msticpy

from msticpy.nbtools import nbinit
nbinit.init_notebook(
    namespace=globals(),
    extra_imports=["ipwhois, IPWhois"]
);

### Get Workspace and Authenticate
<details>
    <summary>&nbsp;<u>Details...</u></summary>
If you are using user/device authentication, run the following cell. 
- Click the 'Copy code to clipboard and authenticate' button.
- This will pop up an Azure Active Directory authentication dialog (in a new tab or browser window). The device code will have been copied to the clipboard. 
- Select the text box and paste (Ctrl-V/Cmd-V) the copied value. 
- You should then be redirected to a user authentication page where you should authenticate with a user account that has permission to query your Log Analytics workspace.

Use the following syntax if you are authenticating using an Azure Active Directory AppId and Secret:
```
%kql loganalytics://tenant(aad_tenant).workspace(WORKSPACE_ID).clientid(client_id).clientsecret(client_secret)
```
instead of
```
%kql loganalytics://code().workspace(WORKSPACE_ID)
```

Note: you may occasionally see a JavaScript warning displayed at the end of the authentication - you can safely ignore this.<br>
On successful authentication you should see a `popup schema` button.
To find your Workspace Id go to [Log Analytics](https://ms.portal.azure.com/#blade/HubsExtension/Resources/resourceType/Microsoft.OperationalInsights%2Fworkspaces). Look at the workspace properties to find the ID.
</details>

In [None]:
# See if we have a Microsoft Sentinel Workspace defined in our config file.
# If not, let the user specify Workspace and Tenant IDs

ws_config = WorkspaceConfig()
if not ws_config.config_loaded:
    ws_config.prompt_for_ws()
    
qry_prov = QueryProvider(data_environment="AzureSentinel")

In [None]:
# Authenticate to Microsoft Sentinel workspace
qry_prov.connect(ws_config)
table_index = qry_prov.schema_tables

#### Authentication and Configuration Problems

<br>
<details>
    <summary>Click for details about configuring your authentication parameters</summary>
    
    
The notebook is expecting your Microsoft Sentinel Tenant ID and Workspace ID to be configured in one of the following places:
- `config.json` in the current folder
- `msticpyconfig.yaml` in the current folder or location specified by `MSTICPYCONFIG` environment variable.
    
For help with setting up your `config.json` file (if this hasn't been done automatically) see the [`ConfiguringNotebookEnvironment`](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb) notebook in the root folder of your Microsoft-Sentinel-Notebooks project. This shows you how to obtain your Workspace and Subscription IDs from the Microsoft Sentinel Portal. You can use the Subscription ID to find your Tenant ID. To view the current `config.json`, run the following in a code cell.

`%pfile config.json`

For help with setting up your `msticpyconfig.yaml`, see the [Configuration](#Configuration) section at the end of this notebook and the [ConfiguringNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb).
</details>

## Enter account name and query time window
Type the account name that you want to search for and the time bounds over which you want to search. 

You can specify the account as:

- a simple user name (e.g. `alice`)
- a user principal name (`alice@contoso.com`)
- a qualified Windows user name (`mydomain\alice`)

In the last two cases the domain qualifier is stripped off before the search. The search is not case sensitive and matches whole terms only. E.g. `bob` will match `domain\bob` and `bob@contoso.com` but not `bobg` or `bo`.
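The name normalization and matching rules described above can be approximated in plain Python. This is a minimal sketch for illustration only: `strip_domain` and `matches_account` are hypothetical helpers, not msticpy functions, and the term-boundary rule (letters, digits and underscore count as part of a name) is an assumption that may not exactly match how the KQL queries tokenize text.

```python
import re


def strip_domain(account: str) -> str:
    """Drop a `domain\\` prefix or an `@domain` suffix (hypothetical helper)."""
    if "\\" in account:
        account = account.split("\\", 1)[1]
    if "@" in account:
        account = account.split("@", 1)[0]
    return account


def matches_account(search: str, candidate: str) -> bool:
    """Case-insensitive whole-term match of the stripped name inside `candidate`."""
    name = re.escape(strip_domain(search))
    # Assumption: a term boundary is any character that is not a letter, digit or underscore
    pattern = rf"(?<![A-Za-z0-9_]){name}(?![A-Za-z0-9_])"
    return re.search(pattern, candidate, flags=re.IGNORECASE) is not None
```

With these rules `matches_account("bob", "bob@contoso.com")` is true while `matches_account("bob", "bobg")` is false, matching the examples in the text.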

In [None]:
WIDGET_DEFAULTS = {
    "layout": widgets.Layout(width="95%"),
    "style": {"description_width": "initial"},
}
accountname_text = widgets.Text(description='Enter the Account name to search for:', **WIDGET_DEFAULTS)
display(accountname_text)

In [None]:
query_times = nbwidgets.QueryTime(units='day', max_before=200, before=5, max_after=7)
query_times.display()

In [None]:
# Set up function to allow easy reference to common parameters for queries
def acct_query_params():
    return {
        "start": query_times.start,
        "end": query_times.end,
        "account_name": accountname_text.value,
    }

## Data Sources available to query
This shows all of the tables in the workspace with a string matching the account name entered.
Note that these matches may be accidental and not necessarily relate to the account that you are interested in.

> **Note**: "search" queries can be long-running and resource intensive. Feel free to skip this cell.

In [None]:
# KQL full-text search for the account name across all tables, counting matching rows per table
datasource_status = '''
search '{account_name}'
| where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})
| summarize RowCount=count() by Table=$table
'''.format(**acct_query_params())
datasource_status_df = qry_prov.exec_query(datasource_status)

# Display the tables that contain matches for the query period
if len(datasource_status_df) > 0:
    display(Markdown("###  <span style='color:blue'> "
                     + "Datasources available to query for Account "
                     + f"*{acct_query_params()['account_name']}* </span>"))
    display(datasource_status_df)
else:
    display(Markdown('### <span style="color:orange"> No datasources available to query for the query period </span>'))
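The query above interpolates the raw text-box value into a single-quoted KQL string, so an account name containing a quote or a backslash (for example `o'neil` or `mydomain\alice`) could break the query. A minimal sketch of a more defensive builder, assuming KQL's backslash escaping inside single-quoted string literals; `build_search_query` is a hypothetical helper, not part of msticpy.

```python
from datetime import datetime


def build_search_query(account_name: str, start: datetime, end: datetime) -> str:
    """Return a KQL `search` query for `account_name` between `start` and `end`."""
    # Escape backslashes first, then single quotes, so the value stays inside
    # one single-quoted KQL string literal (assumes KQL backslash escaping).
    escaped = account_name.replace("\\", "\\\\").replace("'", "\\'")
    return (
        f"search '{escaped}'\n"
        f"| where TimeGenerated >= datetime({start}) and TimeGenerated <= datetime({end})\n"
        "| summarize RowCount=count() by Table=$table"
    )
```

In the notebook this could be called as `qry_prov.exec_query(build_search_query(accountname_text.value, query_times.start, query_times.end))`.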

## Search for Account Name in Host, Azure Active Directory (AAD), Azure and Office 365 Data.

This section searches for activity related to the account name entered earlier. It looks for the most recent activity in the following sources:
- Azure Active Directory Signin logs
- Azure Activity log
- Office Activity log
- Windows Security events for logon and logon failure
- Linux Syslog events for logons

### Query Data Sources

In [None]:
pd.set_option("display.html.table_schema", False)

# AAD
md("Searching for AAD activity...")
summarize_clause = """
| summarize arg_max(TimeGenerated, *) by UserPrincipalName, OperationName, 
  Identity, IPAddress, tostring(LocationDetails)
| project TimeGenerated, UserPrincipalName, Identity, IPAddress, LocationDetails"""

aad_signin_df = (qry_prov.Azure
                 .list_aad_signins_for_account(**acct_query_params(),
                                               add_query_items=summarize_clause)
                )

md("Searching for Azure activity...")
# Azure Activity
summarize_clause = """
| summarize arg_max(TimeGenerated, *) by Caller, OperationName, 
  CallerIpAddress, ResourceId
| project TimeGenerated, UserPrincipalName=Caller, IPAddress=CallerIpAddress"""

azure_activity_df = (qry_prov.Azure
                     .list_azure_activity_for_account(**acct_query_params(),
                                                      add_query_items=summarize_clause)
                    )

md("Searching for Office365 activity...")
# Office Activity
summarize_clause = """
| project TimeGenerated, UserId = tolower(UserId), OfficeWorkload, Operation, ClientIP, UserType
| summarize arg_max(TimeGenerated, *) by UserId, OfficeWorkload, ClientIP
| order by TimeGenerated desc"""

o365_activity_df = (qry_prov.Office365
                    .list_activity_for_account(**acct_query_params(),
                                               add_query_items=summarize_clause)
                    )

md("Searching for Windows logon activity...")
# Windows Host
summarize_clause = """
| extend LogonStatus = iff(EventID == 4624, "success", "failed")
| project TimeGenerated, TargetUserName, TargetDomainName, Computer, LogonType, SubjectUserName, 
  SubjectDomainName, TargetUserSid, EventID, IpAddress, LogonStatus 
| summarize arg_max(TimeGenerated, *) by TargetUserName, TargetDomainName, LogonType, Computer, LogonStatus"""

win_logon_df = (qry_prov.WindowsSecurity
                .list_logon_attempts_by_account(**acct_query_params(),
                                                add_query_items=summarize_clause)
               )

md("Searching for Linux logon activity...")
# Linux host
summarize_clause = """
| summarize arg_max(TimeGenerated, *) by LogonType, SourceIP, Computer, LogonResult"""

linux_logon_df = (qry_prov.LinuxSyslog
                  .list_logons_for_account(**acct_query_params(),
                                           add_query_items=summarize_clause)
                 )

rec_count = (
    len(aad_signin_df) + len(azure_activity_df) 
    + len(o365_activity_df) + len(win_logon_df) 
    + len(linux_logon_df)
)
md(f"Found {rec_count} records...")
md(f"  {len(aad_signin_df)} records in AAD")
md(f"  {len(azure_activity_df)} records in Azure Activity")
md(f"  {len(o365_activity_df)} records in Office Activity")
md(f"  {len(win_logon_df)} records in Windows logon data")
md(f"  {len(linux_logon_df)} records in Linux logon data")
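Each `summarize arg_max(TimeGenerated, *) by ...` clause above keeps only the most recent row for every combination of the `by` columns, so the counts reported are distinct activity combinations rather than raw events. If you pull raw records instead, a rough pandas equivalent is sketched below. `latest_per_group` is a hypothetical helper; note that pandas `groupby` drops rows whose grouping keys are null by default (pass `dropna=False` if you need them).

```python
import pandas as pd


def latest_per_group(df: pd.DataFrame, keys: list, time_col: str = "TimeGenerated") -> pd.DataFrame:
    """Keep the newest row for each combination of `keys` (like KQL arg_max(time_col, *))."""
    # idxmax returns index labels, so make them unique first
    df = df.reset_index(drop=True)
    newest_idx = df.groupby(keys)[time_col].idxmax()
    return df.loc[newest_idx.values].reset_index(drop=True)
```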

## Display logons from account sources and choose an account to explore
If any records were found in the previous search, they are displayed in a selection list. You can filter this list to reduce the number of items shown. The list shows unique combinations of account name and activity source, so you may see the same account listed against multiple activity types, e.g. one entry for `alex@xyz.com` with source `O365Activity` and another for `alex@xyz.com` with source `AADLogon`.

As you select each account, the records from the previous search are displayed.

Following this selection list there is a general section (applicable to accounts from all sources) and sections that are specific to accounts from particular sources (Linux, Windows or Azure/Office).

Choosing an account affects which later parts of the notebook are applicable. For example, if the account chosen is from a Linux logon, only the Linux section applies; the Windows and AAD/Office sections do not. If you have multiple accounts listed, you can come back, choose a different account and re-run the later parts of the notebook for each account.


In [None]:
from collections import namedtuple
AccountDFs = namedtuple("AccountDFs", ["linux", "windows", "aad", "azure", "o365"])
account_dfs = AccountDFs(
    linux=linux_logon_df,
    windows=win_logon_df,
    aad=aad_signin_df,
    azure=azure_activity_df,
    o365=o365_activity_df,
)

# Combine into single data frame

lx_df = (linux_logon_df[["AccountName", "TimeGenerated"]]
        .groupby("AccountName")
        .max()
        .reset_index()
        .assign(Source="LinuxHostLogon"))

win_df = (win_logon_df[["TargetUserName", "TimeGenerated"]]
          .groupby("TargetUserName")
          .max()
          .reset_index()
          .rename(columns={"TargetUserName": "AccountName"})
          .assign(Source="WindowsHostLogon"))

o365_df = (o365_activity_df[["UserId", "TimeGenerated"]]
           .groupby("UserId")
           .max()
           .reset_index()
           .rename(columns={"UserId": "AccountName"})
           .assign(Source="O365Activity"))

aad_df = (aad_signin_df[["UserPrincipalName", "TimeGenerated"]]
          .groupby("UserPrincipalName")
          .max()
          .reset_index()
          .rename(columns={"UserPrincipalName": "AccountName"})
          .assign(Source="AADLogon"))

azure_df = (azure_activity_df[["UserPrincipalName", "TimeGenerated"]]
            .groupby("UserPrincipalName")
            .max()
            .reset_index()
            .rename(columns={"UserPrincipalName": "AccountName"})
            .assign(Source="AzureActivity"))


all_sources_df = pd.concat([lx_df, win_df, o365_df, aad_df, azure_df])


# Display the results that we've found
format_tuple = (lambda x: 
                (x.AccountName + "   " + x.Source
                 + " (Last activity: " + str(x.TimeGenerated) + ")",
                 x.AccountName + " " + x.Source))
accts_dict = {item[0]: item[1] for item in all_sources_df.apply(format_tuple, axis=1)}


def display_activity(selected_item):
    acct, source = selected_account(selected_item)
    outputs = []
    title = HTML(f"<b>{acct} (source: {source})</b>")
    outputs.append(title)
    if source == "LinuxHostLogon":
        outputs.append(
            linux_logon_df[linux_logon_df["AccountName"] == acct]
            .sort_values("TimeGenerated", ascending=True)
        )
    if source == "WindowsHostLogon":
        outputs.append(
            win_logon_df[win_logon_df["TargetUserName"] == acct]
            .sort_values("TimeGenerated", ascending=True)
        )
    if source == "AADLogon":
        outputs.append(
            aad_signin_df[aad_signin_df["UserPrincipalName"] == acct]
            .sort_values("TimeGenerated", ascending=True)
        )
    if source == "AzureActivity":
        outputs.append(
            azure_activity_df[azure_activity_df["UserPrincipalName"] == acct]
            .sort_values("TimeGenerated", ascending=True)
        )
    if source == "O365Activity":
        outputs.append(
            o365_activity_df[o365_activity_df["UserId"] == acct]
            .sort_values("TimeGenerated", ascending=True)
        )
    return outputs

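def selected_account(selected_acct):
    """Split an "<account> <source>" item key back into its two parts."""
    if not selected_acct:
        return "", ""
    # Split on the last space only: source names contain no spaces, but account names might
    acct, source = selected_acct.rsplit(" ", 1)
    return acct, source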

select_acct = nbwidgets.SelectItem(
    item_dict=accts_dict,
    auto_display=True,
    description="Select an account to explore",
    action=display_activity,
    height="200px",
    width="100%")
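The five blocks that build `lx_df`, `win_df`, `o365_df`, `aad_df` and `azure_df` repeat the same steps. A sketch of one way to factor them into a helper, assuming each input DataFrame has a `TimeGenerated` column plus the account column named in the call; `last_activity` is a hypothetical name, and its output keeps the original `AccountName`, `TimeGenerated`, `Source` layout.

```python
import pandas as pd


def last_activity(df: pd.DataFrame, acct_col: str, source: str) -> pd.DataFrame:
    """One row per account with its latest TimeGenerated, tagged with `source`."""
    return (
        df[[acct_col, "TimeGenerated"]]
        .groupby(acct_col)
        .max()
        .reset_index()
        .rename(columns={acct_col: "AccountName"})
        .assign(Source=source)
    )
```

For example, `last_activity(win_logon_df, "TargetUserName", "WindowsHostLogon")` should produce the same rows as `win_df`, and the five results can be combined with `pd.concat([...], ignore_index=True)`.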

## Related Alerts and Hunting Bookmarks
### Alerts
Any alerts with a matching account name are shown here. Select an alert to view the contents.

In [None]:
account_name, account_source = selected_account(select_acct.value)
related_alerts = qry_prov.SecurityAlert.list_related_alerts(
    **acct_query_params()
)

def print_related_alerts(alertDict, entityType, entityName):
    if len(alertDict) > 0:
        md(f"Found {len(alertDict)} different alert types related to this {entityType} (`{entityName}`)",
           "large, bold"
        )
        for (k, v) in alertDict.items():
            print(f"- {k}, # Alerts: {v}")
    else:
        md(f"No alerts for {entityType} entity `{entityName}`")


if isinstance(related_alerts, pd.DataFrame) and not related_alerts.empty:
    alert_items = (
        related_alerts[["AlertName", "TimeGenerated"]]
        .groupby("AlertName")
        .TimeGenerated.agg("count")
        .to_dict()
    )
    print_related_alerts(alert_items, "account", account_name)
    nbdisplay.display_timeline(
        data=related_alerts, title="Alerts", source_columns=["AlertName"], height=200
    )
else:
    display(Markdown("No related alerts found."))

def disp_full_alert(alert):
    global related_alert
    related_alert = SecurityAlert(alert)
    return nbdisplay.format_alert(related_alert, show_entities=True)

if related_alerts is not None and not related_alerts.empty:
    related_alerts["CompromisedEntity"] = related_alerts["src_accountname"]
    display(Markdown("### Click on alert to view details."))
    rel_alert_select = nbwidgets.SelectAlert(
        alerts=related_alerts,
        action=disp_full_alert,
    )
    rel_alert_select.display()

### Hunting/Investigation Bookmarks
Any bookmarks created that reference the selected account are shown here. Select a bookmark to view the contents.

In [None]:
acct_name = acct_query_params()["account_name"]
related_bkmark_df = qry_prov.AzureSentinel.list_bookmarks_for_entity(
    **acct_query_params(), entity_id=acct_name
)

def print_related_bkmk(bookmarks, entityType, entityName):
    if len(bookmarks) > 0:
        md(f"Found {len(bookmarks)} different bookmarks related to this {entityType} (`{entityName}`)",
           "large, bold"
        )
    else:
        md(f"No bookmarks for {entityType} entity `{entityName}`")


if isinstance(related_bkmark_df, pd.DataFrame) and not related_bkmark_df.empty:
    bookmarks = (related_bkmark_df
                 .apply(lambda x: (f"{x.BookmarkName} {x.Tags}  {x.TimeGenerated}", x.BookmarkId),
                        axis=1)
                 .tolist())
    print_related_bkmk(bookmarks, "account", acct_name)
    nbdisplay.display_timeline(
        data=related_bkmark_df,
        title="Bookmarks",
        source_columns=["BookmarkName", "Tags"], height=200
    )
else:
    display(Markdown("No related bookmarks found."))

def disp_bookmark(bookmark_id):
    return related_bkmark_df[related_bkmark_df["BookmarkId"] == bookmark_id].T

if related_bkmark_df is not None and not related_bkmark_df.empty:
    display(Markdown("### Click on bookmark to view details."))
    rel_bkmk_select = nbwidgets.SelectItem(
        item_list=bookmarks,
        action=disp_bookmark,
        auto_display=True
    )
    

## Further Investigation
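Depending on the type of account (AAD or Host/Endpoint account), we can drill deeper to look at data specific to that account type.

The next cell defines helper functions (WhoIs, threat intelligence and geolocation lookups) used by the later sections, and tells you which section of the notebook applies to the selected account. Run it before continuing.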

In [None]:
# Function definitions used below
# This cell should be executed before continuing further.

# WHOIS lookup function
from functools import lru_cache
from ipwhois import IPWhois
from ipaddress import ip_address

@lru_cache(maxsize=1024)
def get_whois_info(ip_lookup, show_progress=False):
    try:
        ip = ip_address(ip_lookup)
    except ValueError:
        return "Not an IP Address", {}
    if ip.is_private:
        return "private address", {}
    if not ip.is_global:
        return "other address", {}
    whois = IPWhois(ip)
    whois_result = whois.lookup_whois()
    if show_progress:
        print(".", end="")
    return whois_result["asn_description"], whois_result


ti_lookup = TILookup()
def check_ip_ti(df, ip_col):

    ip4_rgx = r"((?:[0-9]{1,3}\.){3}[0-9]{1,3})"
    df = (df
          .assign(IP_ext=lambda x: x[ip_col].str.extract(ip4_rgx, expand=False))
          .rename(columns={ip_col: ip_col + "_orig"})
          .rename(columns={"IP_ext": ip_col})
         )
    src_ip_addrs = (df[[ip_col]]
                    .dropna()
                    .drop_duplicates()
                   )
    md(f"Querying TI for {len(src_ip_addrs)} indicators...")
    ti_results = ti_lookup.lookup_iocs(data=src_ip_addrs, obs_col=ip_col)
    ti_results = ti_results[ti_results["Severity"].isin(['warning', 'high'])]

    ti_merged_df = df.merge(ti_results, how="left", left_on=ip_col, right_on="Ioc")
    return ti_results, ti_merged_df, src_ip_addrs


geo_lookup = GeoLiteLookup()
def check_geo_whois(ip_df, df, ip_col):
    
    ip4_rgx = r"((?:[0-9]{1,3}\.){3}[0-9]{1,3})"
    df = (df
          .assign(IP_ext=lambda x: x[ip_col].str.extract(ip4_rgx, expand=False))
          .rename(columns={ip_col: ip_col + "_orig"})
          .rename(columns={"IP_ext": ip_col})
         )
    md(f"Querying geolocation for {len(ip_df)} ip addresses...")
    
    geo_ips = geo_lookup.lookup_ip(ip_addr_list=list(ip_df[ip_col].values))
    # TODO replace
    ip_dicts = [{**ent.Location.properties, "IpAddress": ent.Address} for ent in geo_ips[1]]
    df_out = pd.DataFrame(data=ip_dicts)
    geo_df = df.merge(df_out, how="left", left_on=ip_col, right_on="IpAddress")

    md(f"Querying WhoIs for {len(ip_df)} ip addresses...")
    whois_df = ip_df.copy()
    # Get the WhoIs results
    whois_df[["ASNDesc", "WhoisResult"]] = (
        ip_df
        .apply(lambda x: get_whois_info(x[ip_col], show_progress=True),
               axis=1, result_type="expand"))
    geo_whois_df = geo_df.merge(whois_df, how="left", right_on=ip_col, left_on=ip_col)
    return geo_whois_df

# Based on the account type, advise the user where to go next.

acct, source = selected_account(select_acct.value)
md(f"Account '{acct}'. Source is '{source}'", "bold, large, blue")

goto = lambda x: display(Markdown(f"### For further analysis go to {x}"))
if source == "LinuxHostLogon":
    goto("[Linux Host](#Linux-Host)")
if source == "WindowsHostLogon":
    goto("[Windows Host](#Windows-Host)")
if source in ["AADLogon", "AzureActivity", "O365Activity"]:
    goto("[AAD/Office Account](#AAD/Office-Account)")
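`check_ip_ti` and `check_geo_whois` pull addresses out of the IP column with an IPv4 regular expression, and `get_whois_info` skips private and other non-global addresses before calling `IPWhois`. The regex alone also accepts strings such as `999.1.1.1`, and IPv6 addresses are not extracted at all. A minimal sketch that adds validation with the standard `ipaddress` module; `extract_ipv4` and `classify_ip` are hypothetical helpers that mirror, but do not replace, the notebook's functions.

```python
import re
from ipaddress import ip_address
from typing import Optional

# Same pattern as ip4_rgx in the cell above
IP4_RGX = re.compile(r"((?:[0-9]{1,3}\.){3}[0-9]{1,3})")


def extract_ipv4(value: Optional[str]) -> Optional[str]:
    """Validate the first IPv4-looking token in `value`; return it or None."""
    match = IP4_RGX.search(value or "")
    if not match:
        return None
    try:
        return str(ip_address(match.group(1)))
    except ValueError:
        # The regex matched something like 999.1.1.1 that is not a real address
        return None


def classify_ip(ip: str) -> str:
    """Apply the same checks get_whois_info makes before a WhoIs lookup."""
    addr = ip_address(ip)
    if addr.is_private:
        return "private address"
    if not addr.is_global:
        return "other address"
    return "global address"
```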

## Windows Host
For Windows accounts we look for the following types of data:

- Logon Summary
- Threat Intelligence reports for logon source IP Address(es)
- Geolocation and WhoIs lookup for logon source IP Address(es)
- Additional alerts for the hosts where the account had logged on
- Additional bookmarks for the hosts where the account had logged on

In [None]:
md("Fetching logon data...")
ext_logon_status = "| extend LogonStatus = iff(EventID == 4624, 'success', 'failed')"
all_win_logons = (qry_prov.WindowsSecurity
                  .list_logon_attempts_by_account(**acct_query_params(),
                                                 add_query_items=ext_logon_status))
md("done")
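The `ext_logon_status` clause labels every event that is not 4624 (successful logon) as `failed`. That is only accurate if the logon query returns nothing but 4624 and 4625 (failed logon) events; the query text itself is not shown in this notebook. The same mapping in Python, written as a sketch with a hypothetical `logon_status` helper, reports unrecognized IDs as `unknown` instead:

```python
# Windows Security log event IDs: 4624 = successful logon, 4625 = failed logon.
LOGON_EVENT_STATUS = {4624: "success", 4625: "failed"}


def logon_status(event_id):
    """Map a logon event ID to a status string.

    Unlike iff(EventID == 4624, 'success', 'failed'), IDs outside the
    mapping come back as 'unknown' rather than being counted as failures.
    """
    return LOGON_EVENT_STATUS.get(event_id, "unknown")


print([logon_status(eid) for eid in (4624, 4625, 4634)])
# ['success', 'failed', 'unknown']
```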

### Host Logon Summary

In [None]:
logon_summary = (all_win_logons
 .groupby("Computer")
 .agg(
     TotalLogons=pd.NamedAgg(column="EventID", aggfunc="count"),
     LogonResult=pd.NamedAgg(column="LogonStatus", aggfunc=lambda x: x.value_counts().to_dict()),
     IPAddresses=pd.NamedAgg(column="IpAddress", aggfunc=lambda x: x.unique().tolist()),
     LogonTypeCount=pd.NamedAgg(column="LogonType", aggfunc=lambda x: x.value_counts().to_dict()),
     FirstLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="min"),
     LastLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="max"),
  )
)

display(logon_summary)
if len(all_win_logons) > 1:
  nbdisplay.display_timeline(
    data=all_win_logons,
    group_by="IpAddress",
    source_columns=["Computer", "LogonStatus", "LogonType"],
    title="Logons"
  )
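Each keyword argument to `.agg()` in the summary cell becomes one output column, defined by `pd.NamedAgg(column=..., aggfunc=...)` (named aggregation, available from pandas 0.25). A small sketch on made-up rows shows what one summary row contains, including the dict produced by `value_counts().to_dict()` and the list produced by `unique().tolist()`:

```python
import pandas as pd

# Made-up rows with the same columns the summary cell uses.
logons = pd.DataFrame({
    "Computer": ["web01", "web01", "web01", "db01"],
    "EventID": [4624, 4625, 4624, 4624],
    "LogonStatus": ["success", "failed", "success", "success"],
    "IpAddress": ["10.0.0.5", "10.0.0.9", "10.0.0.5", "10.0.0.7"],
    "TimeGenerated": pd.to_datetime(["2020-01-01 08:00", "2020-01-01 09:00",
                                     "2020-01-02 10:00", "2020-01-03 11:00"]),
})

summary = logons.groupby("Computer").agg(
    # Output column name = NamedAgg(input column, aggregation function).
    TotalLogons=pd.NamedAgg(column="EventID", aggfunc="count"),
    LogonResult=pd.NamedAgg(column="LogonStatus",
                            aggfunc=lambda x: x.value_counts().to_dict()),
    IPAddresses=pd.NamedAgg(column="IpAddress", aggfunc=lambda x: x.unique().tolist()),
    FirstLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="min"),
    LastLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="max"),
)
print(summary.loc["web01"])
```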

### Threat Intelligence for logon IP Addresses
<details>
    <summary>TI Configuration</summary>
If you have not used msticpy threat intelligence lookups before, you will need to supply API keys for the
TI Providers that you want to use. Please see the section on configuring [msticpyconfig.yaml](#msticpyconfig.yaml-configuration-File).

Then reload provider settings:
```python
from msticpy.sectools.tilookup import TILookup

mylookup = TILookup()
mylookup.reload_provider_settings()
```
</details>

In [None]:
ti_results, all_win_logons_ti, src_ip_addrs_win = check_ip_ti(df=all_win_logons, ip_col="IpAddress")
if not ti_results.empty:
    md(f"{len(ti_results)} threat intelligence hits have been "
       + "matched on one or more source IP addresses.", "bold, red, large")
    md("You should investigate the hosts accessed from these addresses "
       + "(see the Host Logon Summary above for the hosts accessed from each IP address) "
       + "using the 'Entity Explorer - Windows Host' notebook.", "bold, red")
    md("Logon details for TI matches are in the `all_win_logons_ti` DataFrame")
    display(ti_results)
else:
    md("No threat intelligence matches found for the logon source IP addresses.")
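When TI hits are reported, the follow-up is to find which hosts each flagged address logged on to. A minimal sketch on made-up rows; `ti_hit_ips` is a hypothetical set of flagged addresses, because the column of `ti_results` that holds the IP is not shown in this section.

```python
import pandas as pd

# Made-up logon rows; in the notebook the equivalent data is all_win_logons.
logons = pd.DataFrame({
    "Computer": ["web01", "web01", "db01", "app01"],
    "IpAddress": ["203.0.113.5", "10.0.0.9", "203.0.113.5", "198.51.100.7"],
})
# Hypothetical set of source IPs that produced TI hits.
ti_hit_ips = {"203.0.113.5"}

# Keep only logons from flagged IPs, then list the distinct hosts per IP.
matched = logons[logons["IpAddress"].isin(ti_hit_ips)]
hosts_by_ip = {
    ip: sorted(group["Computer"].unique())
    for ip, group in matched.groupby("IpAddress")
}
print(hosts_by_ip)  # {'203.0.113.5': ['db01', 'web01']}
```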

### Geolocation and ownership for source logon IP addresses
We use the source IP addresses for the activity to perform a geolocation lookup and a WhoIs lookup to try to identify the IP address owner.

In [None]:
# src_ip_addrs_win = all_win_logons[["IpAddress"]].drop_duplicates()
all_win_logons_geo = check_geo_whois(src_ip_addrs_win, all_win_logons, "IpAddress")
md("Geolocations and ASN Owner for account logon source IP addresses. Information only", "bold")

if len(all_win_logons_geo) < 5:
  display(all_win_logons_geo)
else:
  display(
    all_win_logons_geo[~all_win_logons_geo["CountryName"].isna()]
  .groupby(["Computer", "IpAddress", "CountryCode","CountryName", "City", "ASNDesc"])
  .agg(
      TotalLogons=pd.NamedAgg(column="EventID", aggfunc="count"),
      LogonResult=pd.NamedAgg(column="LogonStatus", aggfunc=lambda x: x.value_counts().to_dict()),
      LogonTypeCount=pd.NamedAgg(column="LogonType", aggfunc=lambda x: x.value_counts().to_dict()),
      FirstLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="min"),
      LastLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="max"),
    )
  )
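The `check_geo_whois` helper defined earlier follows a common enrichment pattern: look up each distinct source IP once, then left-merge the results back onto the event rows so that rows without a match are kept (with empty lookup columns). A minimal sketch of that pattern, with a hypothetical `lookup_owner` function and a made-up `FAKE_OWNERS` table standing in for the real geolocation and WhoIs services:

```python
import pandas as pd

# Hypothetical stand-in for a geolocation/WhoIs service: IP -> owner description.
FAKE_OWNERS = {"203.0.113.5": "EXAMPLE-NET", "198.51.100.7": "DOC-ASN"}
lookup_calls = []


def lookup_owner(ip):
    """Pretend remote lookup; records each call so the calls can be counted."""
    lookup_calls.append(ip)
    return FAKE_OWNERS.get(ip)


def enrich_ips(df, ip_col):
    """Look up each distinct IP once, then attach the result to every row."""
    unique_ips = df[[ip_col]].drop_duplicates().copy()
    unique_ips["ASNDesc"] = unique_ips[ip_col].map(lookup_owner)
    # A left merge keeps every original row, even when the lookup found nothing.
    return df.merge(unique_ips, how="left", on=ip_col)


logons = pd.DataFrame({
    "Computer": ["host1", "host1", "host2", "host2"],
    "IpAddress": ["203.0.113.5", "203.0.113.5", "198.51.100.7", "192.0.2.1"],
})
enriched = enrich_ips(logons, "IpAddress")
print(enriched)
print(f"{len(lookup_calls)} lookups for {len(logons)} rows")  # 3 lookups for 4 rows
```

Deduplicating first matters because WhoIs and geolocation services are slow and often rate-limited, while logon tables frequently repeat the same address many times.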


### Additional Alerts for logged-on hosts

In [None]:
related_host_alerts = []
for host in all_win_logons["Computer"].unique():
    host_alerts = qry_prov.SecurityAlert.list_related_alerts(
        start=acct_query_params()["start"],
        end=acct_query_params()["end"],
        host_name=host
    )
    related_host_alerts.append(host_alerts)
    
related_host_alerts_df = pd.concat(related_host_alerts)

# Show host alerts that were not in the Account alerts list
related_host_alerts_df = related_host_alerts_df[~related_host_alerts_df["SystemAlertId"]
                                                .isin(related_alerts["SystemAlertId"])]
if not related_host_alerts_df.empty:
    md(f"{len(related_host_alerts_df)} additional alerts have been "
       + "triggered on one or more hosts.", "bold, red, large")
    md(" You should investigate these hosts using "
       + "the 'Entity Explorer - Windows Host' notebook", "bold, red" )
    display(related_host_alerts_df)
else:
    md("No additional alerts found")
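The host-alert loop ends with `pd.concat(related_host_alerts)`, which raises `ValueError` when the list is empty (for example, if no logons were returned and the loop never runs). A minimal sketch of an empty-safe variant that also performs the `~isin(...)` filtering against IDs already reported; `concat_new_items` is a hypothetical helper, and the same shape would apply to the bookmark cells using `BookmarkId`.

```python
import pandas as pd


def concat_new_items(frames, seen_ids, id_col):
    """Concatenate per-host query results, keeping only rows not seen before.

    Returns an empty DataFrame when there is nothing to concatenate, where
    pd.concat([]) would raise ValueError("No objects to concatenate").
    """
    frames = [frame for frame in frames if frame is not None]
    if not frames:
        return pd.DataFrame(columns=[id_col])
    combined = pd.concat(frames, ignore_index=True, sort=False)
    # Anti-join: drop rows whose ID already appeared in the account results.
    return combined[~combined[id_col].isin(seen_ids)]


per_host = [
    pd.DataFrame({"SystemAlertId": ["a1", "a2"], "Computer": ["web01", "web01"]}),
    pd.DataFrame({"SystemAlertId": ["a3"], "Computer": ["db01"]}),
]
account_alert_ids = pd.Series(["a2"])
new_alerts = concat_new_items(per_host, account_alert_ids, "SystemAlertId")
print(new_alerts["SystemAlertId"].tolist())  # ['a1', 'a3']
```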

#### Additional alerts for source IP addresses
We can also search for alerts that contain the IP addresses that were the origin of logons to the host.

In [None]:
ip_list = ",".join(list(all_win_logons["IpAddress"].unique()))
related_ip_alerts_df = qry_prov.SecurityAlert.list_alerts_for_ip(
    start=acct_query_params()["start"],
    end=acct_query_params()["end"],
    source_ip_list=ip_list
)
# remove Account and host alerts already seen
related_ip_alerts_df = related_ip_alerts_df[~related_ip_alerts_df["SystemAlertId"]
                                            .isin(related_alerts["SystemAlertId"])]
related_ip_alerts_df = related_ip_alerts_df[~related_ip_alerts_df["SystemAlertId"]
                                            .isin(related_host_alerts_df["SystemAlertId"])]
if not related_ip_alerts_df.empty:
    md(f"{len(related_ip_alerts_df)} additional alerts have been "
       + "triggered from one or more source IPs.", "bold, red, large")
    md(" You should investigate these IPs using "
       + "the 'Entity Explorer - IP Address' notebook", "bold, red" )
    display(related_ip_alerts_df)
else:
    md("No additional alerts found.")
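`ip_list` is built from every distinct value in the `IpAddress` column. Windows logon events often carry placeholder values there (for example `-` when there is no network source address), which only add noise to the alert query. A minimal sketch of filtering the values to syntactically valid addresses with the standard-library `ipaddress` module before joining; `clean_ip_list` is a hypothetical helper, and since the format `list_alerts_for_ip` expects is not shown here, the comma-joined output matches the cell above.

```python
import ipaddress


def clean_ip_list(values):
    """Return sorted, de-duplicated, syntactically valid IP address strings.

    Empty values, None, and placeholders such as "-" are dropped.
    """
    valid = set()
    for value in values:
        if not value:
            continue
        try:
            valid.add(str(ipaddress.ip_address(str(value).strip())))
        except ValueError:
            continue  # not an IP address, e.g. "-" or "nan"
    return sorted(valid)


ips = ["10.0.0.5", "-", "", None, "10.0.0.5", "::1", "203.0.113.9"]
ip_list = ",".join(clean_ip_list(ips))
print(ip_list)  # 10.0.0.5,203.0.113.9,::1
```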

### Additional Investigation Bookmarks for logged-on hosts

In [None]:
related_host_bkmks = []
for host in all_win_logons["Computer"].unique():
    host_bkmks = qry_prov.AzureSentinel.list_bookmarks_for_entity(
        start=acct_query_params()["start"],
        end=acct_query_params()["end"],
        entity_id=f"'{host}'"
    )
    related_host_bkmks.append(host_bkmks)
    
related_host_bkmks_df = pd.concat(related_host_bkmks)

# Show host bookmarks that were not in the Account bookmarks list
related_host_bkmks_df = related_host_bkmks_df[~related_host_bkmks_df["BookmarkId"]
                                              .isin(related_bkmark_df["BookmarkId"])]
if not related_host_bkmks_df.empty:
    md(f"{len(related_host_bkmks_df)} additional investigation bookmarks have been "
       + "found for one or more hosts.", "bold, red, large")
    md(" You should investigate these hosts using "
       + "the 'Entity Explorer - Windows Host' notebook", "bold, red" )
    display(related_host_bkmks_df)
else:
    md("No additional items found for logged on hosts")

## Linux Host
For Linux accounts we look for the following types of data:

- Logon Summary
- Threat Intelligence reports for logon source IP Address(es)
- Geolocation and WhoIs lookup for logon source IP Address(es)
- Additional alerts for the hosts where the account had logged on
- Additional bookmarks for the hosts where the account had logged on


In [None]:
md("Fetching logon data...")
all_lx_logons = (qry_prov.LinuxSyslog
                 .list_logons_for_account(**acct_query_params()))
md("done")

### Host Logon Summary

In [None]:
logon_summary = (all_lx_logons
 .groupby("Computer")
 .agg(
     TotalLogons=pd.NamedAgg(column="TimeGenerated", aggfunc="count"),
     LogonResult=pd.NamedAgg(column="LogonResult", aggfunc=lambda x: x.value_counts().to_dict()),
     IPAddresses=pd.NamedAgg(column="SourceIP", aggfunc=lambda x: x.unique().tolist()),
     LogonTypeCount=pd.NamedAgg(column="LogonType", aggfunc=lambda x: x.value_counts().to_dict()),
     FirstLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="min"),
     LastLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="max"),
  )
)

display(logon_summary)
if len(all_lx_logons) > 1:
    nbdisplay.display_timeline(data=all_lx_logons,
                               group_by="SourceIP",
                               source_columns=["Computer", "LogonResult", "LogonType"],
                               title="Logons")

### Threat Intelligence for logon IP Addresses
<details>
    <summary>TI Configuration</summary>
If you have not used msticpy threat intelligence lookups before, you will need to supply API keys for the
TI Providers that you want to use. Please see the section on configuring [msticpyconfig.yaml](#msticpyconfig.yaml-configuration-File).

Then reload provider settings:
```python
from msticpy.sectools.tilookup import TILookup

mylookup = TILookup()
mylookup.reload_provider_settings()
```
</details>

In [None]:
ti_results_lx, all_lx_logons_ti, src_ip_addrs_lx = check_ip_ti(df=all_lx_logons, ip_col="SourceIP")

if not ti_results_lx.empty:
    md(f"{len(ti_results_lx)} threat intelligence hits have been "
       + "matched on one or more source IP addresses.", "bold, red, large")
    md("You should investigate the hosts accessed from these addresses "
       + "(see the Host Logon Summary above for the hosts accessed from each IP address) "
       + "using the 'Entity Explorer - Linux Host' notebook.", "bold, red")
    display(ti_results_lx)
else:
    md("No threat intelligence matches found for the logon source IP addresses.")

### Geolocation and ownership for source logon IP addresses
We use the source IP addresses for the activity to perform a geolocation lookup and a WhoIs lookup to try to identify the IP address owner.

In [None]:
all_lx_logons_geo = check_geo_whois(src_ip_addrs_lx, all_lx_logons, "SourceIP")

md("Geolocations and ASN Owner for account logon source IP addresses. Information only", "bold")

(all_lx_logons_geo[~all_lx_logons_geo["CountryName"].isna()]
 .groupby(["Computer", "SourceIP", "CountryCode","CountryName", "City", "ASNDesc"])
 .agg(
     TotalLogons=pd.NamedAgg(column="SourceSystem", aggfunc="count"),
     LogonResult=pd.NamedAgg(column="LogonResult", aggfunc=lambda x: x.value_counts().to_dict()),
     LogonTypeCount=pd.NamedAgg(column="LogonType", aggfunc=lambda x: x.value_counts().to_dict()),
     FirstLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="min"),
     LastLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="max"),
  )
)

### Additional Alerts for logged-on hosts

In [None]:
related_host_alerts = []
for host in all_lx_logons["Computer"].unique():
    host_alerts = qry_prov.SecurityAlert.list_related_alerts(
        start=acct_query_params()["start"],
        end=acct_query_params()["end"],
        host_name=host
    )
    related_host_alerts.append(host_alerts)
    
related_host_alerts_df = pd.concat(related_host_alerts)

# Show host alerts that were not in the Account alerts list
related_host_alerts_df = related_host_alerts_df[~related_host_alerts_df["SystemAlertId"]
                                                .isin(related_alerts["SystemAlertId"])]
if not related_host_alerts_df.empty:
    md(f"{len(related_host_alerts_df)} additional alerts have been "
       + "triggered on one or more hosts.", "bold, red, large")
    md(" You should investigate these hosts using "
       + "the 'Entity Explorer - Linux Host' notebook", "bold, red" )
    display(related_host_alerts_df[['TenantId','TimeGenerated','AlertDisplayName','ConfidenceLevel','ConfidenceScore','Computer','ExtendedProperties','Entities']])
else:
    md("No additional alerts found.")

#### Additional alerts for source IP addresses
We can also search for alerts that contain the IP addresses that were the origin of logons to the host.

In [None]:
ip_list = ",".join(list(all_lx_logons["SourceIP"].unique()))
related_ip_alerts_df = qry_prov.SecurityAlert.list_alerts_for_ip(
    start=acct_query_params()["start"],
    end=acct_query_params()["end"],
    source_ip_list=ip_list
)
# remove Account and host alerts already seen
related_ip_alerts_df = related_ip_alerts_df[~related_ip_alerts_df["SystemAlertId"]
                                            .isin(related_alerts["SystemAlertId"])]
related_ip_alerts_df = related_ip_alerts_df[~related_ip_alerts_df["SystemAlertId"]
                                            .isin(related_host_alerts_df["SystemAlertId"])]
if not related_ip_alerts_df.empty:
    md(f"{len(related_ip_alerts_df)} additional alerts have been "
       + "triggered from one or more source IPs.", "bold, red, large")
    md(" You should investigate these IPs using "
       + "the 'Entity Explorer - IP Address' notebook", "bold, red" )
    display(related_ip_alerts_df)
else:
    md("No additional alerts found.")

### Additional Investigation Bookmarks for logged-on hosts

In [None]:
related_host_bkmks = []
for host in all_lx_logons["Computer"].unique():
    host_bkmks = qry_prov.AzureSentinel.list_bookmarks_for_entity(
        start=acct_query_params()["start"],
        end=acct_query_params()["end"],
        entity_id=host
    )
    related_host_bkmks.append(host_bkmks)
    
related_host_bkmks_df = pd.concat(related_host_bkmks)

# Show host bookmarks that were not in the Account bookmarks list
related_host_bkmks_df = related_host_bkmks_df[~related_host_bkmks_df["BookmarkId"]
                                              .isin(related_bkmark_df["BookmarkId"])]
if not related_host_bkmks_df.empty:
    md(f"{len(related_host_bkmks_df)} additional investigation bookmarks have been "
       + "found for one or more hosts.", "bold, red, large")
    md(" You should investigate these hosts using "
       + "the 'Entity Explorer - Linux Host' notebook", "bold, red")
    display(related_host_bkmks_df)
else:
    md("No additional items found for logged on hosts")

## AAD/Office Account
For an Azure Active Directory account we look for the following data:
- AAD sign-in activity
- Azure Activity
- Office 365 operations
- Threat intelligence reports for the client IP Address used in any of these activities
- Geolocation and WhoIs lookup for logon source IP Address(es)
- Additional alerts for the logon source IP Address(es)

In [None]:
md("Fetching Azure/Office data...")
# Fetch the data
aad_sum_qry = """
| extend UserPrincipalName=tolower(UserPrincipalName)
| project-rename Operation=OperationName, AppResourceProvider=AppDisplayName"""
aad_signin_df = (qry_prov.Azure
                 .list_aad_signins_for_account(**acct_query_params(),
                                              add_query_items=aad_sum_qry)
                )

az_sum_qry = """
| extend UserPrincipalName=tolower(Caller)
| project-rename IPAddress=CallerIpAddress, Operation=OperationName,
  AppResourceProvider=ResourceProvider"""
azure_activity_df = (qry_prov.Azure
                     .list_azure_activity_for_account(**acct_query_params(),
                                                      add_query_items=az_sum_qry)
                    )

o365_sum_qry = """
| extend UserPrincipalName=tolower(UserId)
| project-rename IPAddress=ClientIP, ResourceId=OfficeObjectId,
  AppResourceProvider=OfficeWorkload"""
o365_activity_df = (qry_prov.Office365
                    .list_activity_for_account(**acct_query_params(),
                                               add_query_items=o365_sum_qry)
                    )
md("done")
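The `add_query_items` clauses above use KQL `project-rename` so that the three result sets share `Operation`, `IPAddress`, and `AppResourceProvider` columns before they are concatenated in the next cell; columns a table already names that way are left as-is. The same normalization can be sketched in pandas, which may help if you pull the raw tables without those clauses. The `RENAMES` mapping mirrors the KQL above; the sample rows are made up.

```python
import pandas as pd

# Column renames mirroring the project-rename clauses in the queries above.
RENAMES = {
    "aad": {"OperationName": "Operation", "AppDisplayName": "AppResourceProvider"},
    "azure": {"CallerIpAddress": "IPAddress", "OperationName": "Operation",
              "ResourceProvider": "AppResourceProvider"},
    "o365": {"ClientIP": "IPAddress", "OfficeObjectId": "ResourceId",
             "OfficeWorkload": "AppResourceProvider"},
}

# Made-up single-row samples using each table's native column names.
raw = {
    "aad": pd.DataFrame({"IPAddress": ["203.0.113.5"], "OperationName": ["SignIn"],
                         "AppDisplayName": ["PortalApp"]}),
    "azure": pd.DataFrame({"CallerIpAddress": ["203.0.113.5"], "OperationName": ["CreateVM"],
                           "ResourceProvider": ["Microsoft.Compute"]}),
    "o365": pd.DataFrame({"ClientIP": ["198.51.100.7"], "Operation": ["FileAccessed"],
                          "OfficeObjectId": ["doc-001"], "OfficeWorkload": ["SharePoint"]}),
}

# After renaming, shared columns line up; columns unique to one source become NaN elsewhere.
combined = pd.concat(
    [frame.rename(columns=RENAMES[name]) for name, frame in raw.items()],
    sort=False, ignore_index=True,
)
print(combined[["IPAddress", "Operation", "AppResourceProvider"]])
```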

### Azure/Office Summary

In [None]:
az_all_data = pd.concat([aad_signin_df, azure_activity_df, o365_activity_df], sort=False)

nbdisplay.display_timeline(data=az_all_data,
                          group_by="AppResourceProvider",
                          source_columns=["Operation", "IPAddress", "AppResourceProvider"],
                          title="Azure Signin activity by Provider")
nbdisplay.display_timeline(data=az_all_data,
                          group_by="IPAddress",
                          source_columns=["Operation", "IPAddress", "AppResourceProvider"],
                          title="Azure Operations by Source IP")
nbdisplay.display_timeline(data=az_all_data,
                          group_by="Operation",
                          source_columns=["Operation", "IPAddress", "AppResourceProvider"],
                          title="Azure Operations by Operation");

In [None]:
(az_all_data
.groupby(["UserPrincipalName", "Type", "IPAddress", "AppResourceProvider", "UserType"])
.agg(
     OperationCount=pd.NamedAgg(column="Type", aggfunc="count"),
     OperationTypes=pd.NamedAgg(column="Operation", aggfunc=lambda x: x.unique().tolist()),
     Resources=pd.NamedAgg(column="ResourceId", aggfunc="nunique"),
     FirstOperation=pd.NamedAgg(column="TimeGenerated", aggfunc="min"),
     LastOperation=pd.NamedAgg(column="TimeGenerated", aggfunc="max"),
  )
)
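The summary above groups on five columns, including `Type` and `UserType`. When `pd.concat` combines tables in which a column exists only in some sources, the missing cells become NaN, and `groupby` drops rows whose key is NaN by default, so those operations silently disappear from the summary. Whether any of these three tables lacks `UserType` depends on your workspace schema; this sketch on made-up rows only demonstrates the effect and one workaround (`fillna` before grouping; pandas 1.1 and later also accept `groupby(..., dropna=False)`).

```python
import pandas as pd

# Made-up rows: the second source has no UserType column at all.
signins = pd.DataFrame({"IPAddress": ["203.0.113.5"], "UserType": ["Member"]})
activity = pd.DataFrame({"IPAddress": ["203.0.113.5", "198.51.100.7"]})
combined = pd.concat([signins, activity], sort=False, ignore_index=True)

# By default, groupby drops rows whose group key is NaN.
default_counts = combined.groupby(["IPAddress", "UserType"]).size()
print(int(default_counts.sum()))  # 1 (two rows dropped)

# Filling the missing key first keeps every row in the summary.
filled_counts = (combined.fillna({"UserType": "n/a"})
                 .groupby(["IPAddress", "UserType"]).size())
print(int(filled_counts.sum()))  # 3
```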

### Threat Intelligence for IP Addresses
<details>
    <summary>TI Configuration</summary>
If you have not used msticpy threat intelligence lookups before, you will need to supply API keys for the
TI Providers that you want to use. Please see the section on configuring [msticpyconfig.yaml](#msticpyconfig.yaml-configuration-File).

Then reload provider settings:
```python
from msticpy.sectools.tilookup import TILookup

mylookup = TILookup()
mylookup.reload_provider_settings()
```
</details>

In [None]:
ti_results_az, all_az_ti, src_ip_addrs_az = check_ip_ti(df=az_all_data, ip_col="IPAddress")

if not ti_results_az.empty:
    md(f"{len(ti_results_az)} threat intelligence hits have been "
       + "matched on one or more source IP addresses.", "bold, red, large")
    md(" You should investigate these IP addresses using "
       + "the 'Entity Explorer - IP Address' notebook", "bold, red" )
    display(ti_results_az)
else:
    md("No threat intelligence matches found for the source IP addresses.")

### Geolocation and ownership for source IP addresses
We use the source IP addresses for the activity to perform a geolocation lookup and a WhoIs lookup to try to identify the IP address owner. To keep lookup times manageable, only the first 50 entries in `src_ip_addrs_az` are looked up.

In [None]:
all_az_geo = check_geo_whois(src_ip_addrs_az.iloc[0:50], az_all_data, "IPAddress")

md("Geolocations and ASN Owner for source IP addresses. Information only", "bold")

(all_az_geo[~all_az_geo["CountryName"].isna()]
 .groupby(["UserPrincipalName", "IPAddress", "CountryCode","CountryName", "City", "ASNDesc"])
 .agg(
     TotalOperations=pd.NamedAgg(column="SourceSystem", aggfunc="count"),
     Operations=pd.NamedAgg(column="Operation", aggfunc=lambda x: x.value_counts().to_dict()),
     AppResources=pd.NamedAgg(column="AppResourceProvider", aggfunc=lambda x: x.unique().tolist()),
     FirstLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="min"),
     LastLogon=pd.NamedAgg(column="TimeGenerated", aggfunc="max"),
  )
)
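The lookup above passes `src_ip_addrs_az.iloc[0:50]`, so which addresses get enriched depends on row order. If you would rather enrich the busiest addresses, one alternative (not what the notebook does) is to rank IPs by how often they appear and keep the top N. A sketch using `collections.Counter`, with a hypothetical `top_ips` helper:

```python
from collections import Counter


def top_ips(ip_values, limit=50):
    """Return up to `limit` distinct IPs, most frequently seen first."""
    counts = Counter(ip for ip in ip_values if ip)  # skip empty/None values
    return [ip for ip, _ in counts.most_common(limit)]


events = ["198.51.100.7", "203.0.113.5", "203.0.113.5",
          "192.0.2.1", "203.0.113.5", "198.51.100.7", None]
print(top_ips(events, limit=2))  # ['203.0.113.5', '198.51.100.7']
```

On the notebook data you would call it on `az_all_data["IPAddress"].dropna()` (pandas NaN is truthy, so it would not be skipped otherwise). Feeding the result back into `check_geo_whois` would require rebuilding the IP DataFrame in whatever shape `check_ip_ti` produces; check those helper definitions before relying on this.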

### Additional alerts for source IP addresses

In [None]:
ip_list = ",".join(list(src_ip_addrs_az["IPAddress"].unique()))
related_ip_alerts_df = qry_prov.SecurityAlert.list_alerts_for_ip(
    start=acct_query_params()["start"],
    end=acct_query_params()["end"],
    source_ip_list=ip_list
)
# remove Account alerts already seen
related_ip_alerts_df = related_ip_alerts_df[~related_ip_alerts_df["SystemAlertId"]
                                            .isin(related_alerts["SystemAlertId"])]
if not related_ip_alerts_df.empty:
    md(f"{len(related_ip_alerts_df)} additional alerts have been "
       + "triggered from one or more source IPs.", "bold, red, large")
    md(" You should investigate these IPs using "
       + "the 'Entity Explorer - IP Address' notebook", "bold, red" )
    display(related_ip_alerts_df)
else:
    md("No related alerts for source IPs found.")

## Appendices

### Available DataFrames

In [None]:
print('List of current DataFrames in Notebook')
print('-' * 50)
current_vars = list(locals().keys())
for var_name in current_vars:
    if isinstance(locals()[var_name], pd.DataFrame) and not var_name.startswith('_'):
        print(var_name)
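The cell above walks `locals()` and prints DataFrame names. A slightly more informative sketch also reports each DataFrame's shape; `list_dataframes` is a hypothetical helper that takes the namespace as an argument so it can be tried outside the notebook (in the notebook you would pass `globals()`).

```python
import pandas as pd


def list_dataframes(namespace):
    """Return sorted (name, rows, columns) tuples for public DataFrames."""
    return sorted(
        (name, *value.shape)
        for name, value in namespace.items()
        if isinstance(value, pd.DataFrame) and not name.startswith("_")
    )


example_ns = {
    "logons": pd.DataFrame({"a": [1, 2, 3]}),
    "_scratch": pd.DataFrame(),
    "count": 3,
}
for name, rows, cols in list_dataframes(example_ns):
    print(f"{name:<30} {rows:>8} rows x {cols} cols")
```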

### Saving data to Excel
To save the contents of a pandas DataFrame to an Excel spreadsheet,
use the following syntax:
```python
with pd.ExcelWriter('myWorksheet.xlsx') as writer:
    my_data_frame.to_excel(writer, sheet_name='Sheet1')
```

## Configuration

### `msticpyconfig.yaml` configuration File
You can configure primary and secondary TI providers and any required parameters in the `msticpyconfig.yaml` file. This is read from the current directory or you can set an environment variable (`MSTICPYCONFIG`) pointing to its location.

To configure this file, see the [ConfiguringNotebookEnvironment notebook](https://github.com/Azure/Azure-Sentinel-Notebooks/blob/master/ConfiguringNotebookEnvironment.ipynb).
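The two ways of locating `msticpyconfig.yaml` described above can be sketched as a small resolver. This illustrates the described behavior and is not msticpy's own code: the precedence (the `MSTICPYCONFIG` path first, then the current directory) is an assumption, and `find_msticpy_config` is a hypothetical helper. The demonstration uses throwaway directories and empty files.

```python
import os
import tempfile
from pathlib import Path

CONFIG_ENV_VAR = "MSTICPYCONFIG"
CONFIG_FILE_NAME = "msticpyconfig.yaml"


def find_msticpy_config(env=None, cwd=None):
    """Return the config file path to use, or None if no file is found.

    Assumed precedence: the file named by MSTICPYCONFIG, then
    msticpyconfig.yaml in the current directory.
    """
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    env_path = env.get(CONFIG_ENV_VAR)
    if env_path and Path(env_path).is_file():
        return Path(env_path)
    local_path = cwd / CONFIG_FILE_NAME
    return local_path if local_path.is_file() else None


# Demonstration with temporary directories instead of a real setup.
with tempfile.TemporaryDirectory() as work_dir, tempfile.TemporaryDirectory() as other_dir:
    (Path(work_dir) / CONFIG_FILE_NAME).write_text("")
    env_config = Path(other_dir) / "custom.yaml"
    env_config.write_text("")
    found_local = find_msticpy_config(env={}, cwd=work_dir)
    found_env = find_msticpy_config(env={CONFIG_ENV_VAR: str(env_config)}, cwd=work_dir)
    found_none = find_msticpy_config(env={}, cwd=other_dir)
    print(found_local.name, found_env.name, found_none)
```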