# Scan and Manage Data Sources in Microsoft Purview

This notebook demonstrates how to scan, classify, and manage data sources in Microsoft Purview using the PVW CLI. It covers classification rules, data sources, filters, Key Vault connections, scan runs and history, scan rulesets, scans, system scan rulesets, and triggers.

**For detailed CLI and API documentation, see the main [README](../../README.md) and [PVW_and_PurviewClient.md](../../docs/PVW_and_PurviewClient.md).**


# Scanning Data Plane

In [1]:
# Install or upgrade the pvw-cli package
%pip install --upgrade pvw-cli

In [None]:
# Environment Variables (replace the YOUR_* placeholders with your own values)
import os

%env PURVIEW_ACCOUNT_NAME=YOUR_PURVIEW_ACCOUNT_NAME
%env AZURE_CLIENT_ID=YOUR_CLIENT_ID
%env AZURE_TENANT_ID=YOUR_TENANT_ID
%env AZURE_CLIENT_SECRET=YOUR_CLIENT_SECRET

PURVIEW_ACCOUNT_NAME = os.getenv("PURVIEW_ACCOUNT_NAME")
AZURE_CLIENT_ID = os.getenv("AZURE_CLIENT_ID")
AZURE_TENANT_ID = os.getenv("AZURE_TENANT_ID")
AZURE_CLIENT_SECRET = os.getenv("AZURE_CLIENT_SECRET")

print(f"Purview Name: {PURVIEW_ACCOUNT_NAME}")
print(f"Client ID: {AZURE_CLIENT_ID}")

env: PURVIEW_ACCOUNT_NAME="my-purview-account"
env: AZURE_CLIENT_ID=""
env: AZURE_TENANT_ID=""
env: AZURE_CLIENT_SECRET=""
Purview Name: "my-purview-account"
Client ID: ""
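
Empty values like the ones printed above usually mean the placeholders were never replaced. A minimal sketch of a pre-flight check follows; the helper name `missing_settings` is hypothetical and not part of pvw-cli, and treating a leftover `YOUR_*` value as missing is an assumption based on the placeholders used in this notebook.

```python
import os

# The four variables this notebook sets with %env.
REQUIRED_VARS = (
    "PURVIEW_ACCOUNT_NAME",
    "AZURE_CLIENT_ID",
    "AZURE_TENANT_ID",
    "AZURE_CLIENT_SECRET",
)


def missing_settings(env, required=REQUIRED_VARS):
    """Return required names that are unset, empty, or still a YOUR_* placeholder."""
    missing = []
    for name in required:
        # Strip whitespace and stray quotes, since %env keeps quotes literally.
        value = (env.get(name) or "").strip().strip('"')
        if not value or value.startswith("YOUR_"):
            missing.append(name)
    return missing


problems = missing_settings(os.environ)
if problems:
    print("Set these before running pvw:", ", ".join(problems))
else:
    print("All required settings are present.")
```

The check reads names only and never prints the values, so the client secret does not end up in the cell output.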


In [None]:
# Commands
!pvw scan --help
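
The `!pvw ...` cells rely on IPython's shell escape, which is not available in a plain Python script. The sketch below shows one way the same commands might be driven with `subprocess`. The helpers `pvw_args` and `run_pvw` are hypothetical names, only flags that already appear in this notebook are used, and it is an assumption that `pvw` exits with a non-zero status on failure.

```python
import shlex
import subprocess


def pvw_args(group, command, **options):
    """Build the argument list for `pvw <group> <command> --key value ...`."""
    args = ["pvw", group, command]
    for key, value in options.items():
        args += [f"--{key}", str(value)]
    return args


def run_pvw(group, command, **options):
    """Run pvw and return its standard output.

    Raises subprocess.CalledProcessError if the process exits non-zero
    (assumed, not verified, to be how pvw signals failure).
    """
    result = subprocess.run(
        pvw_args(group, command, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Build (but do not run) the scan-history command used later in this notebook.
args = pvw_args(
    "scan",
    "readScanHistory",
    dataSourceName="AzureDataLakeStorage-kl2",
    scanName="Scan-HcH",
)
print(shlex.join(args))
```

Passing the arguments as a list avoids shell quoting problems when a data source or scan name contains spaces.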

## Classification Rules

In [None]:
# Create or Update
!pvw scan putClassificationRule --classificationRuleName "BANK_ACCOUNT_NUMBER" --payloadFile "../json/scan/classification_rule.json"
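
The contents of `classification_rule.json` are not shown in this notebook. As a rough sketch, a custom rule payload might be built as below. The field names follow the Purview scanning REST API as I understand it and should be checked against the actual file; the regular expressions, the description, and the helper name `classification_rule_payload` are illustrative assumptions.

```python
import json
import re


def classification_rule_payload(classification_name, data_regex, column_regex, min_match=60):
    """Build a custom classification rule body (assumed schema, see lead-in)."""
    # Fail early on a malformed pattern. Python's regex dialect may differ from
    # the one Purview uses, so this is only a rough sanity check.
    re.compile(data_regex)
    re.compile(column_regex)
    return {
        "kind": "Custom",
        "properties": {
            "classificationName": classification_name,
            "ruleStatus": "Enabled",
            "description": "Illustrative rule; replace with your own description.",
            "dataPatterns": [{"kind": "Regex", "pattern": data_regex}],
            "columnPatterns": [{"kind": "Regex", "pattern": column_regex}],
            "minimumPercentageMatch": min_match,
        },
    }


# Hypothetical patterns: 8 to 12 digits in the data, "account" in the column name.
payload = classification_rule_payload(
    "BANK_ACCOUNT_NUMBER",
    data_regex=r"^\d{8,12}$",
    column_regex=r"(?i)account",
)
print(json.dumps(payload, indent=2))
```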

In [None]:
# Delete
!pvw scan deleteClassificationRule --classificationRuleName "BANK_ACCOUNT_NUMBER"

In [None]:
# Get
!pvw scan readClassificationRule --classificationRuleName "BANK_ACCOUNT_NUMBER"

In [None]:
# List All
!pvw scan readClassificationRules

In [None]:
# List Versions By Classification Rule Name
!pvw scan readClassificationRuleVersions --classificationRuleName "BANK_ACCOUNT_NUMBER"

## Data Sources

In [None]:
# Create or Update
!pvw scan putDataSource --dataSourceName "DataSource2" --payloadFile "../json/scan/scan_source.json"

In [None]:
# Delete
!pvw scan deleteDataSource --dataSourceName "DataSource2"

In [None]:
# Get
!pvw scan readDataSource --dataSourceName "DataSource2"

In [None]:
# List All
!pvw scan readDataSources

## Filters

In [None]:
# Create or Update
!pvw scan putFilter --dataSourceName "AzureDataLakeStorage-kl2" --scanName "Scan-HcH" --payloadFile "../json/scan/scan_filter.json"
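
A scan filter narrows which paths a scan visits. The sketch below builds a filter body with include and exclude prefixes and rejects excludes that fall outside every include. The `includeUriPrefixes` and `excludeUriPrefixes` field names are assumed from the Purview scanning REST API, the storage URLs are made up, and the nesting check is a convenience of this sketch, not a documented service requirement.

```python
def filter_payload(include_prefixes, exclude_prefixes):
    """Build a scan filter body (assumed schema) from URI prefixes."""
    # An exclude that is not under any include has no effect, which usually
    # points to a typo, so treat it as an error here.
    orphans = [
        prefix
        for prefix in exclude_prefixes
        if not any(prefix.startswith(included) for included in include_prefixes)
    ]
    if orphans:
        raise ValueError(f"Exclude prefixes outside every include prefix: {orphans}")
    return {
        "properties": {
            "includeUriPrefixes": list(include_prefixes),
            "excludeUriPrefixes": list(exclude_prefixes),
        }
    }


# Hypothetical storage account and container names.
payload = filter_payload(
    include_prefixes=["https://examplelake.dfs.core.windows.net/raw/"],
    exclude_prefixes=["https://examplelake.dfs.core.windows.net/raw/tmp/"],
)
print(payload["properties"])
```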

In [None]:
# Get
!pvw scan readFilters --dataSourceName "AzureDataLakeStorage-kl2" --scanName "Scan-HcH"

## Key Vault Connections

In [None]:
# Create
!pvw scan putKeyVault --keyVaultName "babylon" --payloadFile "../json/scan/scan_keyvault.json"

In [None]:
# Delete
!pvw scan deleteKeyVault --keyVaultName "babylon"

In [None]:
# Get
!pvw scan readKeyVault --keyVaultName "babylon"

In [None]:
# List All
!pvw scan readKeyVaults

## Scan Result

In [None]:
# List Scan History
!pvw scan readScanHistory --dataSourceName "AzureDataLakeStorage-kl2" --scanName "Scan-HcH"

In [None]:
# Run Scan
!pvw scan runScan --dataSourceName "AzureDataLakeStorage-kl2" --scanName "Scan-HcH"

## Scan Rulesets

In [None]:
# Create or Update
!pvw scan putScanRuleset --scanRulesetName "myScanRuleset" --payloadFile "../json/scan/scan_ruleset.json"

In [None]:
# Delete
!pvw scan deleteScanRuleset --scanRulesetName "myScanRuleset"

In [None]:
# Get
!pvw scan readScanRuleset --scanRulesetName "myScanRuleset"

In [None]:
# List All
!pvw scan readScanRulesets

## Scans

In [None]:
# Create or Update
!pvw scan putScan --dataSourceName "DataSource2" --scanName "myScan" --payloadFile "../json/scan/scan.json"
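
Every `put*` command in this notebook reads its request body from a `--payloadFile`. The sketch below generates such a file for a scan definition. The `kind` value and the `scanRulesetName` and `scanRulesetType` properties are assumptions drawn from the Purview scanning REST API for an ADLS Gen2 source using managed identity; the helper names are hypothetical, and a temporary directory is used so nothing in the repository is overwritten.

```python
import json
import tempfile
from pathlib import Path


def scan_payload(ruleset_name, ruleset_type="System", kind="AdlsGen2Msi"):
    """Build a scan definition body (assumed schema, see lead-in)."""
    if ruleset_type not in ("System", "Custom"):
        raise ValueError("ruleset_type must be 'System' or 'Custom'")
    return {
        "kind": kind,
        "properties": {
            "scanRulesetName": ruleset_name,
            "scanRulesetType": ruleset_type,
        },
    }


def write_payload(payload, path):
    """Write the payload as indented JSON, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


# Write to a temporary folder and read the file back to confirm it is valid JSON.
with tempfile.TemporaryDirectory() as tmp:
    written = write_payload(scan_payload("AdlsGen2"), Path(tmp) / "scan" / "scan.json")
    round_trip = json.loads(written.read_text(encoding="utf-8"))

print(round_trip["properties"])
```

In practice the path would point at the file passed to `--payloadFile`, for example `../json/scan/scan.json`.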

In [None]:
# Delete
!pvw scan deleteScan --dataSourceName "DataSource2" --scanName "myScan"

In [None]:
# Get
!pvw scan readScan --dataSourceName "DataSource2" --scanName "myScan"

In [None]:
# List By Data Source
!pvw scan readScans --dataSourceName "DataSource2"

## System Scan Rulesets

In [None]:
# Get
!pvw scan readSystemScanRuleset --dataSourceType "AdlsGen2"

In [None]:
# Get By Version
!pvw scan readSystemScanRulesetVersion --version "2" --dataSourceType "AdlsGen2"

In [None]:
# Get Latest
!pvw scan readSystemScanRulesetLatest --dataSourceType "AdlsGen2"

In [None]:
# List All
!pvw scan readSystemScanRulesets

In [None]:
# List Versions By Data Source
!pvw scan readSystemScanRulesetVersions --dataSourceType "AdlsGen2"

## Triggers

In [None]:
# Create
!pvw scan putTrigger --dataSourceName "DataSource2" --scanName "myScan" --payloadFile "../json/scan/scan_trigger.json"
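
A trigger attaches a recurrence schedule to an existing scan. The sketch below builds a weekly schedule and formats the start time in UTC. The `recurrence`, `schedule`, and `scanLevel` fields are assumed from the Purview scanning REST API and may not match `scan_trigger.json` exactly; the helper name and default values are illustrative.

```python
from datetime import datetime, timezone


def weekly_trigger_payload(start, week_days=("Sunday",), hour=2, minute=0,
                           scan_level="Incremental"):
    """Build a weekly trigger body (assumed schema, see lead-in)."""
    if start.tzinfo is None:
        # A naive datetime is ambiguous; require an explicit time zone.
        raise ValueError("start must be timezone-aware")
    if scan_level not in ("Full", "Incremental"):
        raise ValueError("scan_level must be 'Full' or 'Incremental'")
    return {
        "properties": {
            "scanLevel": scan_level,
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                # Normalize to UTC and use the ISO 8601 "Z" form.
                "startTime": start.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "schedule": {
                    "weekDays": list(week_days),
                    "hours": [hour],
                    "minutes": [minute],
                },
            },
        }
    }


payload = weekly_trigger_payload(datetime(2024, 1, 7, 2, 0, tzinfo=timezone.utc))
print(payload["properties"]["recurrence"]["startTime"])
```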

In [None]:
# Delete
!pvw scan deleteTrigger --dataSourceName "DataSource2" --scanName "myScan"

In [None]:
# Get
!pvw scan readTrigger --dataSourceName "DataSource2" --scanName "myScan"