docs(ingest/sql-queries): Add documentation (datahub-project#9406)
asikowitz authored and Salman-Apptware committed Dec 15, 2023
1 parent f343209 commit fa02eb3
Showing 3 changed files with 33 additions and 2 deletions.
8 changes: 8 additions & 0 deletions metadata-ingestion/docs/sources/sql-queries/sql-queries.md
@@ -0,0 +1,8 @@
### Example Queries File

```json
{"query": "SELECT x FROM my_table", "timestamp": 1689232738.051, "user": "user_a", "downstream_tables": [], "upstream_tables": ["my_database.my_schema.my_table"]}
{"query": "INSERT INTO my_table VALUES (1, 'a')", "timestamp": 1689232737.669, "user": "user_b", "downstream_tables": ["my_database.my_schema.my_table"], "upstream_tables": []}
```

Note that this is not a valid standard JSON file, but rather a file containing one JSON object per line.
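
For illustration, here is a minimal Python sketch (not part of this commit) of writing query records in this line-delimited format; the records and the `queries.json` path are placeholders:

```python
import json
import time

# Hypothetical query records shaped like the example lines above.
records = [
    {
        "query": "SELECT x FROM my_table",
        "timestamp": time.time(),  # seconds since the epoch
        "user": "user_a",
        "downstream_tables": [],
        "upstream_tables": ["my_database.my_schema.my_table"],
    },
]

# Write one JSON object per line, not a single JSON document.
with open("queries.json", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")
```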
@@ -0,0 +1,9 @@
datahub_api: # Only necessary if using a non-DataHub sink, e.g. the file sink
  server: http://localhost:8080
  timeout_sec: 60
source:
  type: sql-queries
  config:
    platform: "snowflake"
    default_db: "SNOWFLAKE"
    query_file: "./queries.json"
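
The same recipe can also be run without YAML. Below is a minimal sketch using DataHub's programmatic pipeline API; it assumes the `Pipeline.create(...).run()` interface from the `datahub` package, names the `datahub-rest` sink explicitly, and uses placeholder values for the server URL and file path:

```python
from datahub.ingestion.run.pipeline import Pipeline

# Same source configuration as the YAML recipe above, expressed as a dict.
pipeline = Pipeline.create(
    {
        "source": {
            "type": "sql-queries",
            "config": {
                "platform": "snowflake",
                "default_db": "SNOWFLAKE",
                "query_file": "./queries.json",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```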
18 changes: 16 additions & 2 deletions metadata-ingestion/src/datahub/ingestion/source/sql_queries.py
@@ -88,11 +88,25 @@ def compute_stats(self) -> None:

@platform_name("SQL Queries")
@config_class(SqlQueriesSourceConfig)
-@support_status(SupportStatus.TESTING)
+@support_status(SupportStatus.INCUBATING)
@capability(SourceCapability.LINEAGE_COARSE, "Parsed from SQL queries")
@capability(SourceCapability.LINEAGE_FINE, "Parsed from SQL queries")
class SqlQueriesSource(Source):
-    # TODO: Documentation
+    """
+    This source reads a newline-delimited JSON file of SQL queries and parses them to generate lineage.
+
+    This file should contain one JSON object per line, with the following fields:
+    - query: string - The SQL query to parse.
+    - timestamp (optional): number - The timestamp of the query, in seconds since the epoch.
+    - user (optional): string - The user who ran the query.
+      This user value will be directly converted into a DataHub user urn.
+    - operation_type (optional): string - Platform-specific operation type, used if the operation type can't be parsed.
+    - downstream_tables (optional): string[] - Fallback list of tables that the query writes to, used if the query can't be parsed.
+    - upstream_tables (optional): string[] - Fallback list of tables the query reads from, used if the query can't be parsed.
+    """

    urns: Optional[Set[str]]
    schema_resolver: SchemaResolver
    builder: SqlParsingBuilder
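
To make the field contract in the docstring concrete, here is a small sketch (a hypothetical helper, not part of the source) that checks a queries file line by line against the documented fields:

```python
import json

REQUIRED_FIELDS = {"query"}
OPTIONAL_FIELDS = {
    "timestamp",
    "user",
    "operation_type",
    "downstream_tables",
    "upstream_tables",
}


def check_queries_file(path: str) -> None:
    """Report lines that do not match the documented query-file fields."""
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            entry = json.loads(line)  # each line must be a standalone JSON object
            missing = REQUIRED_FIELDS - entry.keys()
            unknown = entry.keys() - REQUIRED_FIELDS - OPTIONAL_FIELDS
            if missing or unknown:
                print(f"line {lineno}: missing={sorted(missing)}, unknown={sorted(unknown)}")


check_queries_file("./queries.json")
```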
