# Create a new pandas Datasource
Use this notebook to configure a new pandas Datasource and add it to your project.

In [1]:
import great_expectations as gx
from great_expectations.cli.datasource import sanitize_yaml_and_save_datasource, check_if_datasource_name_exists
context = gx.get_context()

## Customize Your Datasource Configuration

**If you are new to Great Expectations Datasources,** you should check out our [how-to documentation](https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview)

**My configuration is not so simple - are there more advanced options?**
Glad you asked! Datasources are versatile. Please see our [How To Guides](https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/connect_to_data_overview)!

Give your datasource a unique name:

In [2]:
datasource_name = "black_friday"

### For files based Datasources:
Here we are creating an example configuration.  The configuration contains an **InferredAssetFilesystemDataConnector** which will add a Data Asset for each file in the base directory you provided. It also contains a **RuntimeDataConnector** which can accept filepaths.   This is just an example, and you may customize this as you wish!

Also, if you would like to learn more about the **DataConnectors** used in this configuration, including other methods to organize assets, handle multi-file assets, name assets based on parts of a filename, please see our docs on [InferredAssetDataConnectors](https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/how_to_configure_an_inferredassetdataconnector) and [RuntimeDataConnectors](https://docs.greatexpectations.io/docs/guides/connecting_to_your_data/how_to_configure_a_runtimedataconnector).


In [3]:
example_yaml = f"""
name: {datasource_name}
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetFilesystemDataConnector
    base_directory: ../data_raw/Ingestion
    default_regex:
      group_names:
        - data_asset_name
      pattern: (.*)
    default_runtime_data_connector_name:
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        assets:
          my_runtime_asset_name:
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
            batch_identifiers:
              - runtime_batch_identifier_name
"""
print(example_yaml)


name: black_friday
class_name: Datasource
execution_engine:
  class_name: PandasExecutionEngine
data_connectors:
  default_inferred_data_connector_name:
    class_name: InferredAssetFilesystemDataConnector
    base_directory: ../data_raw/Ingestion
    default_regex:
      group_names:
        - data_asset_name
      pattern: (.*)
    default_runtime_data_connector_name:
        class_name: RuntimeDataConnector
        module_name: great_expectations.datasource.data_connector
        assets:
          my_runtime_asset_name:
            class_name: Asset
            module_name: great_expectations.datasource.data_connector.asset
            batch_identifiers:
              - runtime_batch_identifier_name



# Test Your Datasource Configuration
Here we will test your Datasource configuration to make sure it is valid.

This `test_yaml_config()` function is meant to enable fast dev loops. **If your
configuration is correct, this cell will show you some snippets of the data
assets in the data source.** You can continually edit your Datasource config
yaml and re-run the cell to check until the new config is valid.

If you instead wish to use python instead of yaml to configure your Datasource,
you can use `context.add_datasource()` and specify all the required parameters.

In [4]:
context.test_yaml_config(yaml_config=example_yaml)

Attempting to instantiate class from config...
	Instantiating as a Datasource, since class_name is Datasource
	Successfully instantiated Datasource


ExecutionEngine class name: PandasExecutionEngine
Data Connectors:
	default_inferred_data_connector_name : InferredAssetFilesystemDataConnector

	Available data_asset_names (3 of 946):
		.DS_Store (1 of 1): ['.DS_Store']
		data_1.csv (1 of 1): ['data_1.csv']
		data_10.csv (1 of 1): ['data_10.csv']

	Unmatched data_references (0 of 0):[]



<great_expectations.datasource.new_datasource.Datasource at 0x1066235e0>

In [5]:
print(context.get_available_data_asset_names())

{'my_datasource': {'default_inferred_data_connector_name': ['data_605.csv', 'data_333.csv', 'data_732.csv', 'data_784.csv', 'data_987.csv', 'data_447.csv', 'data_660.csv', 'data_964.csv', 'data_117.csv', 'data_313.csv', 'data_389.csv', 'data_86.csv', 'data_223.csv', 'data_364.csv', 'data_624.csv', 'data_173.csv', 'data_59.csv', 'data_685.csv', 'data_155.csv', 'data_764.csv', 'data_93.csv', 'data_98.csv', 'data_936.csv', 'data_417.csv', 'data_309.csv', 'data_214.csv', 'data_961.csv', 'data_739.csv', 'data_827.csv', 'data_706.csv', 'data_777.csv', 'data_89.csv', 'data_996.csv', 'data_769.csv', 'data_200.csv', 'data_914.csv', 'data_647.csv', 'data_670.csv', 'data_911.csv', 'data_574.csv', 'data_463.csv', 'data_123.csv', 'data_16.csv', 'data_537.csv', 'data_924.csv', 'data_910.csv', 'data_457.csv', 'data_202.csv', 'data_470.csv', 'data_611.csv', 'data_32.csv', 'data_139.csv', 'data_499.csv', 'data_145.csv', 'data_175.csv', 'data_877.csv', 'data_199.csv', 'data_34.csv', 'data_346.csv', 'dat

## Save Your Datasource Configuration
Here we will save your Datasource in your Data Context once you are satisfied with the configuration. Note that `overwrite_existing` defaults to False, but you may change it to True if you wish to overwrite. Please note that if you wish to include comments you must add them directly to your `great_expectations.yml`.

In [6]:
sanitize_yaml_and_save_datasource(context, example_yaml, overwrite_existing=True)
context.list_datasources()

[{'name': 'my_datasource',
  'class_name': 'Datasource',
  'module_name': 'great_expectations.datasource',
  'execution_engine': {'class_name': 'PandasExecutionEngine',
   'module_name': 'great_expectations.execution_engine'},
  'data_connectors': {'default_inferred_data_connector_name': {'class_name': 'InferredAssetFilesystemDataConnector',
    'module_name': 'great_expectations.datasource.data_connector',
    'base_directory': '../data_raw/Ingestion',
    'default_regex': {'group_names': ['data_asset_name'], 'pattern': '(.*)'}},
   'default_runtime_data_connector_name': {'class_name': 'RuntimeDataConnector',
    'module_name': 'great_expectations.datasource.data_connector',
    'assets': {'my_runtime_asset_name': {'class_name': 'Asset',
      'module_name': 'great_expectations.datasource.data_connector.asset',
      'batch_identifiers': ['runtime_batch_identifier_name']}}}}},
 {'name': 'datasource',
  'class_name': 'Datasource',
  'module_name': 'great_expectations.datasource',
  'ex

Now you can close this notebook and delete it!