ETL-Pipeline-using-Azure-Databricks

Create your first ETL Pipeline using Azure Databricks

What is Azure Databricks?

Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Databricks was designed with the founders of Apache Spark and is integrated with Azure. It provides one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. In short, Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics service.

Reference: https://docs.microsoft.com/en-us/azure/databricks/scenarios/what-is-azure-databricks#:~:text=Azure%20Databricks%20is%20an%20Apache,Microsoft%20Azure%20cloud%20services%20platform.&text=For%20a%20big%20data%20pipeline,Event%20Hub%2C%20or%20IoT%20Hub.

Apache Spark-based analytics platform

Azure Databricks comprises the complete open-source Apache Spark cluster technologies and capabilities. Spark in Azure Databricks includes the following components:

Spark SQL and DataFrames: Spark SQL is the Spark module for working with structured data. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python.

Streaming: Real-time data processing and analysis for analytical and interactive applications. Integrates with HDFS, Flume, and Kafka.

MLlib: Machine Learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as underlying optimization primitives.

GraphX: Graphs and graph computation for a broad scope of use cases from cognitive analytics to data exploration.

Spark Core API: Includes support for R, SQL, Python, Scala, and Java.
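In a typical ETL job, the Spark SQL and DataFrames component does most of the work: data is read into a table of named columns, transformed, and written out. The following minimal sketch shows that extract-transform-load shape in plain Python. It uses only the standard library, so it runs without a Spark cluster. This is not Azure Databricks or PySpark code. The sample data, the column names (`region`, `product`, `amount`), and the function names (`extract`, `transform`, `load`) are all hypothetical. The comments point to the PySpark DataFrame calls (`spark.read.csv`, `filter`, `groupBy(...).agg(...)`, `df.write`) that would roughly fill each role in a Databricks notebook.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample input. In Databricks this would typically be read from
# storage with spark.read.csv(path, header=True) instead of an in-memory string.
RAW_CSV = """region,product,amount
east,widget,120.0
west,widget,80.5
east,gadget,-5.0
west,gadget,40.0
east,widget,30.0
"""


def extract(text: str) -> list[dict[str, str]]:
    """Extract: parse CSV text into rows keyed by column name (named columns)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    """Transform: cast amounts to float, drop non-positive ones, total per region.

    Roughly what a PySpark DataFrame expresses as
    df.filter(F.col("amount") > 0).groupBy("region").agg(F.sum("amount").alias("total"))
    """
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        amount = float(row["amount"])
        if amount > 0:
            totals[row["region"]] += amount
    # Sort by region so the output order is deterministic.
    return [{"region": region, "total": total} for region, total in sorted(totals.items())]


def load(rows: list[dict[str, object]]) -> str:
    """Load: serialize the result back to CSV text (a stand-in for df.write...)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["region", "total"])
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    result = transform(extract(RAW_CSV))
    print(load(result))
```

When run, the script prints a `region,total` header followed by one total per region: `east` 150.0 (the negative row is dropped) and `west` 120.5. In Spark, the same logic would be written declaratively on a DataFrame and executed in parallel across the cluster, not in a Python loop.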

AthiraSPillai/ETL-Pipeline-using-Azure-Databricks