Background
Building on #746, we want robust fingerprinting and destination info to include in `LoadInfo` and pipeline traces. The fingerprint is used to anonymously identify cloud destinations and allows us to build proper tracing and data lineage.
Tasks
- Improve the filesystem fingerprint by including only the scheme and netloc. Currently we also include the path inside the bucket, which is too much. For a local filesystem configuration, return a hash of an empty string.
- Add a method to the destination factory that returns destination info as a named tuple/dict.
- Improve fingerprinting of destinations that run locally (duckdb/filesystem/postgres).
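The first task can be sketched as follows. This is a minimal sketch, not dlt's implementation: `digest128` and `filesystem_fingerprint` are hypothetical helpers, and treating bare paths and Windows drive letters as local configurations is an assumption on top of the `file://` case named above.

```python
import hashlib
from urllib.parse import urlparse


def digest128(value: str) -> str:
    """Hypothetical helper: SHA-256 hex digest truncated to 128 bits (32 hex chars)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:32]


def filesystem_fingerprint(bucket_url: str) -> str:
    """Fingerprint a bucket URL from its scheme and netloc only."""
    parsed = urlparse(bucket_url)
    # Local configuration (assumed): file:// URL, bare path, or a Windows
    # drive letter such as C:\ (urlparse reports it as a 1-letter scheme)
    if parsed.scheme in ("", "file") or len(parsed.scheme) == 1:
        return digest128("")
    # Remote bucket: drop the path inside the bucket, keep scheme + netloc
    return digest128(f"{parsed.scheme}://{parsed.netloc}")
```

With this rule, `s3://bucket/a` and `s3://bucket/b` share one fingerprint, while `s3://bucket` and `gs://bucket` do not. Note that `netloc` can carry `user:password@`; a real implementation may prefer to hash only the host part.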
Implementation
Please open 3 PRs, one per task.
Destination info contains:
- destination type, name, environment and fingerprint
- destination `str()` representation (display-safe)
- local/remote flag
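One way to hold the fields listed above is a `typing.NamedTuple`, which gives both the tuple form and, through `_asdict()`, the dict form the task mentions. The class name `DestinationInfo`, its field names and the `destination_info_as_dict` helper are hypothetical, chosen for illustration only.

```python
from typing import Any, Dict, NamedTuple, Optional


class DestinationInfo(NamedTuple):
    """Hypothetical container for the destination info fields."""

    destination_type: str          # e.g. "duckdb", "filesystem"
    destination_name: str          # name given to the destination instance
    environment: Optional[str]     # e.g. "staging"; None when not configured
    fingerprint: str               # anonymous, hash-based identifier
    display_str: str               # display-safe str() of the destination (no secrets)
    is_local: bool                 # True when the destination runs locally


def destination_info_as_dict(info: DestinationInfo) -> Dict[str, Any]:
    # _asdict() returns the fields in declaration order; dict() makes it a plain dict
    return dict(info._asdict())
```

A named tuple is immutable, so the info attached to `LoadInfo` or a trace cannot be altered after creation, and the dict form serializes directly.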
Local destinations:
- For destinations that may run locally (duckdb, postgres, Weaviate, Qdrant, filesystem etc.), we should start generating fingerprints.
- If a destination runs locally (always for duckdb; `file://` for filesystem; `localhost`, `127.0.0.1` or the IPv6 localhost `::1` in a connection string), the local flag in the info must be set to true.
- The fingerprint must include the `.anonymous_id` used by telemetry and, when applicable, the path to the file/database.
- `.anonymous_id` should be detached from the telemetry code and become independent (maybe part of the `paths.py` module).
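The local-destination rules above could look roughly like this. It is a sketch under stated assumptions: all function names are hypothetical, storing the id in a `.anonymous_id` file under a given home directory is an assumed layout (not the current telemetry code), and hosts are checked with the standard `ipaddress` module, whose `is_loopback` also accepts other `127.x.x.x` addresses.

```python
import hashlib
import ipaddress
import uuid
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse


def get_anonymous_id(home_dir: Path) -> str:
    """Hypothetical telemetry-independent helper: load a random id persisted
    in `home_dir`, creating it on first use (assumed file layout)."""
    id_file = home_dir / ".anonymous_id"
    if id_file.exists():
        return id_file.read_text(encoding="utf-8").strip()
    home_dir.mkdir(parents=True, exist_ok=True)
    anonymous_id = uuid.uuid4().hex
    id_file.write_text(anonymous_id, encoding="utf-8")
    return anonymous_id


def is_local_host(host: Optional[str]) -> bool:
    """True for localhost and loopback IPs such as 127.0.0.1 and ::1."""
    if not host:
        return False  # no host in the URL: this sketch does not guess
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # an ordinary DNS name


def is_local_destination(destination_type: str, location: str) -> bool:
    """duckdb is always local; filesystem is local for file://, a bare path or
    a drive letter; anything else is local when its host is loopback."""
    if destination_type == "duckdb":
        return True
    parsed = urlparse(location)
    if destination_type == "filesystem":
        return parsed.scheme in ("", "file") or len(parsed.scheme) == 1
    # urlparse strips the [] around IPv6 hosts and lowercases the hostname
    return is_local_host(parsed.hostname)


def local_fingerprint(anonymous_id: str, path: str = "") -> str:
    """Hash the machine-scoped anonymous id together with the file/database path."""
    return hashlib.sha256(f"{anonymous_id}|{path}".encode("utf-8")).hexdigest()[:32]
```

Combining the anonymous id with the path keeps two local duckdb files on one machine apart, while the same path on two machines still yields different fingerprints.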