AthiraSPillai/ETL-Pipeline-using-Azure-Databricks


ETL-Pipeline-using-Azure-Databricks

Create your first ETL Pipeline using Azure Databricks

What is Azure Databricks

Azure Databricks is a fast, easy, and collaborative Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform. Designed with the founders of Apache Spark, Databricks is integrated with Azure to provide one-click setup, streamlined workflows, and an interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.

ref: https://docs.microsoft.com/en-us/azure/databricks/scenarios/what-is-azure-databricks#:~:text=Azure%20Databricks%20is%20an%20Apache,Microsoft%20Azure%20cloud%20services%20platform.&text=For%20a%20big%20data%20pipeline,Event%20Hub%2C%20or%20IoT%20Hub.

Apache Spark-based analytics platform

Azure Databricks comprises the complete open-source Apache Spark cluster technologies and capabilities. Spark in Azure Databricks includes the following components:

Spark SQL and DataFrames: Spark SQL is the Spark module for working with structured data. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R/Python.

Streaming: Real-time data processing and analysis for analytical and interactive applications. Integrates with HDFS, Flume, and Kafka.

MLlib: Machine Learning library consisting of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, dimensionality reduction, as well as underlying optimization primitives.

GraphX: Graphs and graph computation for a broad scope of use cases from cognitive analytics to data exploration.

Spark Core API: Includes support for R, SQL, Python, Scala, and Java.
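The Spark SQL and DataFrames component can be illustrated with a small sketch. It assumes `pyspark` is available (as it is on a Databricks cluster); the sample rows, column names, the `employees` view name, and the helper functions `run_example` and `expected_avg_by_department` are all made up for illustration and are not part of this repository. The same aggregation is expressed once through the DataFrame API and once through Spark SQL, with a plain-Python version alongside as a reference for what the query computes.

```python
# Illustrative sketch only: sample data and names below are hypothetical.
from collections import defaultdict

SAMPLE_ROWS = [
    ("alice", "sales", 3000),
    ("bob", "sales", 4000),
    ("carol", "engineering", 5000),
]
COLUMNS = ["name", "department", "salary"]


def expected_avg_by_department(rows):
    """Plain-Python reference: average salary per department."""
    totals = defaultdict(lambda: [0, 0])  # department -> [sum, count]
    for _name, department, salary in rows:
        totals[department][0] += salary
        totals[department][1] += 1
    return {dept: total / count for dept, (total, count) in totals.items()}


def run_example():
    """Run the same aggregation with the DataFrame API and with Spark SQL."""
    # Imported here so the module loads even where pyspark is not installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # On Databricks, getOrCreate() returns the session the notebook already has.
    spark = SparkSession.builder.appName("dataframe-sketch").getOrCreate()

    # A DataFrame: rows organized into named columns.
    df = spark.createDataFrame(SAMPLE_ROWS, COLUMNS)

    # DataFrame API version of the aggregation.
    by_dept = df.groupBy("department").agg(F.avg("salary").alias("avg_salary"))

    # Spark SQL version: register the DataFrame as a temporary view and query it.
    df.createOrReplaceTempView("employees")
    by_dept_sql = spark.sql(
        "SELECT department, AVG(salary) AS avg_salary "
        "FROM employees GROUP BY department"
    )

    by_dept.show()
    by_dept_sql.show()
    return by_dept, by_dept_sql
```

Calling `run_example()` from a notebook cell should display the same per-department averages twice, which is the point of the comparison: a DataFrame can be queried like a relational table without changing the underlying data.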
