# Create Data Sources and Collections in Microsoft Purview

This notebook demonstrates how to create data sources and collections in Microsoft Purview using the PVW CLI, including both single and batch operations.

**For detailed CLI and API documentation, see the main [README](../../README.md).**


# Sources
Example notebook on how to create sources (including collections).

In [None]:
# Create a Source
# Note: See samples/json/sources for examples of how to construct the JSON payloads for different kinds of sources (e.g. AdlsGen2, AmazonS3, Hive).
!pvw scan putDataSource --dataSourceName "AzureSynapseWorkspace" --payloadFile "..\json\source\AzureSynapseWorkspace.json" --purviewName "purview-sandbox"
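
The payload path in the cell above uses Windows-style backslashes, which a POSIX shell will not treat as directory separators. A minimal sketch of building the same relative path portably is shown below; it assumes the payload sits at the same relative location as in the command above, and `payload_file` and `payload_arg` are illustrative names only.

```python
from pathlib import Path

# Assumption: the payload lives at the same relative location used in the cell above.
payload_file = Path("..") / "json" / "source" / "AzureSynapseWorkspace.json"

# as_posix() renders the path with forward slashes regardless of the host OS.
payload_arg = payload_file.as_posix()
print(payload_arg)
```

In a notebook cell, the value can then be interpolated into the existing command, for example `--payloadFile "{payload_arg}"`.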

In [None]:
# Create Sources
# Note: Microsoft Purview does not currently expose a bulk endpoint for creating multiple sources in a single operation.
# The sample code below works around this by reading a JSON document that contains an array of sources and registering them one at a time.
import json
import os

# Open the JSON document that contains an array of sources (replace this path with your own)
with open('/Users/taygan/Desktop/purviewcli/sources/data_lz_sources.json') as f:
    sources = json.load(f)

# Write each source to a temporary JSON document and run the purviewcli command (pvw scan putDataSource)
cwd = os.getcwd()
filepath = os.path.join(cwd, 'temp_source.json')
for source in sources:
    with open(filepath, 'w') as out_file:
        json.dump(source, out_file, indent=4, sort_keys=True)
    # Quote the interpolated values so names or paths containing spaces are passed as single arguments
    !pvw scan putDataSource --dataSourceName "{source['name']}" --payloadFile "{filepath}" --purviewName "pvtest"

# Clean up the temporary JSON document (it will not exist if the array was empty)
if os.path.exists(filepath):
    os.remove(filepath)
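
Outside a notebook the `!` shell syntax is not available. The following is a minimal sketch of the same loop as a plain Python script, assuming `pvw` is on the `PATH` and accepts the same flags used in the cells above. The helper names `build_command` and `put_sources` are hypothetical, not part of purviewcli, and the `run` parameter exists only so the function can be exercised without calling the real CLI.

```python
import json
import os
import subprocess
import tempfile


def build_command(source_name, payload_file, purview_name):
    """Return the pvw argument list for one source (same flags as the cells above)."""
    return [
        "pvw", "scan", "putDataSource",
        "--dataSourceName", source_name,
        "--payloadFile", payload_file,
        "--purviewName", purview_name,
    ]


def put_sources(sources, purview_name, run=subprocess.run):
    """Write each source to a temporary file, invoke pvw, and always clean up."""
    results = []
    for source in sources:
        # delete=False so the file is closed before pvw opens it (needed on Windows).
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
            json.dump(source, tmp, indent=4, sort_keys=True)
            path = tmp.name
        try:
            # Passing a list avoids shell quoting issues with spaces in names.
            results.append(run(build_command(source["name"], path, purview_name)))
        finally:
            # Remove the temporary payload even if the command raises.
            os.remove(path)
    return results
```

A call such as `put_sources(sources, "pvtest")` would then register each entry of the loaded array in turn. Each source gets its own temporary file, so a failure partway through does not leave a stale payload behind.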
