Terms
Alex Merced edited this page May 21, 2026
Welcome to the comprehensive dictionary of Data, Lakehouse, and Artificial Intelligence terms. Below is an alphabetical listing of the terms it defines, spanning concepts, tools, and architectures.
A | B | C | D | E | F | G | H | I | J | K | L | M
- ACID
- Active Learning
- Ad-hoc Query
- Agentic AI
- Aggregation
- AI-ism
- Airflow
- Analytics Engineer
- ANN (Approximate Nearest Neighbor)
- Anomaly Detection
- Apache Hudi
- Apache Iceberg
- Apache Polaris
- Apache Spark Structured Streaming
- API (Application Programming Interface)
- API Gateway
- Artificial Intelligence
- Artificial Neural Network
- Athena
- Attention Mechanism
- Autoencoder
- Avro
- AVX-512
- AWS (Amazon Web Services)
- AWS Glue
- Azure (Microsoft Azure)
- Azure Blob Storage
- Backpropagation
- Bag-of-Words
- BASE
- Batch Size
- Bfloat16
- BI (Business Intelligence)
- Bias-Variance Tradeoff
- BigQuery
- Blob Storage
- Boosting
- Bootstrap Aggregating (Bagging)
- Business Glossary
- Cache
- Catalog
- CDC (Change Data Capture)
- CDN (Content Delivery Network)
- Classification
- ClickHouse
- Cloud Storage
- Clustering
- Clustering Algorithm
- Colocation
- Columnar Database
- Compaction
- Compaction Job
- Compute-Storage Separation
- Computer Cluster
- Computer Vision
- Consensus Protocol
- Containerization
- Context Window
- Copy-on-Write
- Correlation
- Cosine Similarity
- Cross-Validation
- DAG (Directed Acyclic Graph)
- Dagster
- Dashboard
- Data Analyst
- Data Asset
- Data Catalog
- Data Cube
- Data Democratization
- Data Engineer
- Data Fabric
- Data Federation
- Data Governance
- Data Ingestion
- Data Lake
- Data Lakehouse
- Data Lakehouse Catalog
- Data Lineage
- Data Lineage Graph
- Data Marketplace
- Data Mart
- Data Mesh
- Data Modeling
- Data Pipeline
- Data Quality
- Data Refresh
- Data Retention
- Data Science
- Data Scientist
- Data Sharing
- Data Silo
- Data Source
- Data Sovereignty
- Data Steward
- Data Stewardship
- Data Transformation
- Data Virtualization
- Data Warehouse
- Database Index
- Databricks
- DBT (Data Build Tool)
- Decoder
- Deep Learning
- Deep Neural Network
- Delta Lake
- Denormalization
- Differentiable Programming
- Dimension Table
- Dimensional Modeling
- Distributed Database
- Distributed Query
- Docker
- Document Store
- Dot Product
- Double-bracket link
- DuckDB
- ELT (Extract, Load, Transform)
- Embedding
- Embedding Layer
- Embedding Model
- Encoder
- Entity Resolution
- Epoch
- ETL (Extract, Transform, Load)
- ETL Metadata
- Euclidean Distance
- Event-driven Architecture
- Execution Plan
- Exploratory Data Analysis (EDA)
- F1 Score
- Fact Table
- False Negative
- False Positive
- Feature Engineering
- Feature Selection
- Feature Store
- Feature Vector
- Federated Query Engine
- Few-Shot Learning
- Fine-tuning
- Flink (Apache Flink)
- Fully Connected Layer
- GCP (Google Cloud Platform)
- Generative Adversarial Network (GAN)
- Generative AI
- Glowmorphism
- Google Cloud Storage
- GPT (Generative Pre-trained Transformer)
- GPU (Graphics Processing Unit)
- Gradient Descent
- Graph Database
- GraphQL
- Guardrails
- Hallucination
- Heuristic
- Hidden Partitioning
- Hive Metastore
- HNSW (Hierarchical Navigable Small World)
- Hot Tier
- Hudi Timeline
- Hyperparameter
- IaC (Infrastructure as Code)
- IAM (Identity and Access Management)
- Iceberg Manifest
- Iceberg REST Catalog
- In-database Analytics
- In-memory Database
- Incremental Load
- Indexing
- Inference
- Internet of Things (IoT) Data
- IVF-FLAT
- K-Means Clustering
- K-Nearest Neighbors
- Kafka (Apache Kafka)
- Kafka Connect
- Key-Value Store
- KPI (Key Performance Indicator)
- Kubernetes