chasinggreg/DE_Roadmap

My Roadmap to become an Azure Data Engineer

100 Days of Data Engineering

TLDR:

  1. Learn Python And SQL
  2. Learn Data Modeling
  3. Learn Data Pipelines
  4. Learn About The Cloud
  5. Everything Else

Additional Tips

  1. Pass AZ-900 and DP-900 before attempting DP-203. They reinforce the core concepts and help you tell the different Azure service options apart.

  2. Focus on topics that cover moving and storing data securely between services.

  3. Focus on data partitioning for optimisation in terms of query speed and cost.

  4. Know how to monitor performance of batch and stream processing.

  5. Get hands-on experience setting up managed identities and key vaults for Azure Databricks, Synapse, and Azure Storage.

  6. If you have no prior experience with databases, focus on learning SQL, Python, and PySpark.

  7. Try all the lab exercises from the MS Learn GitHub. Practice moving data from ingestion into and out of the data store in all of the supported languages, in Azure Databricks and Synapse Analytics.

  8. Read the topics for Data Engineer path line by line in MS Learn. Don’t skip!
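
As a rough illustration of tip 3, here is a minimal sketch of date-based partitioning using only the Python standard library. The folder layout (`year=.../month=...`), the file name `part-0000.csv`, the column names, and the helper functions are all hypothetical choices for this example. Azure services apply the same idea at a much larger scale, but this is not how any particular service implements it.

```python
import csv
import tempfile
from pathlib import Path


def write_partitioned(rows, root):
    """Write rows into Hive-style folders such as year=2024/month=01/."""
    for row in rows:
        year, month, _day = row["event_date"].split("-")
        folder = Path(root) / f"year={year}" / f"month={month}"
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / "part-0000.csv"  # hypothetical file name
        is_new = not path.exists()
        with path.open("a", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["event_date", "amount"])
            if is_new:
                writer.writeheader()
            writer.writerow(row)


def read_partition(root, year, month):
    """Read only the files under one partition folder, skipping all others."""
    folder = Path(root) / f"year={year}" / f"month={month}"
    rows = []
    for path in sorted(folder.glob("*.csv")):
        with path.open(newline="") as f:
            rows.extend(csv.DictReader(f))
    return rows


def demo():
    """Total the January amounts without touching the February files."""
    rows = [
        {"event_date": "2024-01-15", "amount": "10"},
        {"event_date": "2024-01-20", "amount": "5"},
        {"event_date": "2024-02-03", "amount": "7"},
    ]
    with tempfile.TemporaryDirectory() as root:
        write_partitioned(rows, root)
        january = read_partition(root, "2024", "01")
        return sum(int(r["amount"]) for r in january)


if __name__ == "__main__":
    print(demo())  # prints 15
```

A query that filters on the partition key only opens the matching folder, which is where the savings in query time and cost come from.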


I have also included the 6 months to Azure Data Engineer and notes for AZ-900 Azure Fundamentals.

Language Requirements

As a Data Engineer, you will be working with data technologies, tools, and systems that may require proficiency in certain programming languages. Here are some language requirements that you should consider for the DP-203 Data Engineer certification:

SQL: Structured Query Language (SQL) is essential for data management and querying relational databases. As a Data Engineer, you should have a good understanding of SQL and its syntax.

Python: Python is a popular language for data processing, analysis, and machine learning. It has a wide range of libraries and tools that make it ideal for data engineering tasks such as data extraction, transformation, and loading (ETL).

R: R is a programming language for statistical computing and graphics. It is commonly used for data analysis, machine learning, and data visualization tasks.

Java: Java is a general-purpose programming language that is widely used in enterprise applications, including big data processing and analytics.

Scala: Scala is a programming language that is used for big data processing with Apache Spark. It combines object-oriented and functional programming features and is known for its concise syntax and scalability.

It's important to note that the DP-203 exam does not require knowledge of all of these languages. However, proficiency in one or more of them can help you perform better in your job as a Data Engineer and prepare for the DP-203 exam. You should also consider the specific data technologies and systems that you will be working with.

To be a successful Data Engineer, it's crucial to possess a solid understanding of at least one programming language.

Data engineering commonly uses several languages. Python is preferred for its ease of use and its broad set of libraries for data processing, analysis, and visualization.

Additionally, SQL is extensively used for managing relational databases and manipulating large datasets. Java is another frequently used language for constructing large-scale distributed systems, while Scala is chosen for processing massive datasets and constructing distributed systems that operate on the Java Virtual Machine (JVM). R is frequently used for statistical computing and graphics, with applications in data analysis and visualization.

Ultimately, specific language requirements for a Data Engineer position may differ based on the job, company, and project. Thus, it's critical to research the specific language needs and continually develop skills in the languages most applicable to your work.
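
To make the SQL and Python descriptions above more concrete, here is a minimal ETL sketch that combines the two. It uses Python's built-in `sqlite3` module as a stand-in for a relational database. The table name, columns, sample records, and function names are all invented for this example, and a real Azure pipeline would target a service such as Azure SQL or Synapse instead.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    ("2024-01-15", "north", "10.50"),
    ("2024-01-15", "south", "4.00"),
    ("2024-01-16", "north", "n/a"),  # bad record, dropped in transform
    ("2024-01-16", "north", "6.25"),
]


def transform(records):
    """Keep rows whose amount parses as a number and convert it to float."""
    clean = []
    for order_date, region, amount in records:
        try:
            clean.append((order_date, region, float(amount)))
        except ValueError:
            continue  # skip rows with an unparseable amount
    return clean


def load_and_summarize(records):
    """Load rows into an in-memory SQLite table and total sales per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_date TEXT, region TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM orders "
            "GROUP BY region ORDER BY region"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Extract (RAW_ORDERS), transform in Python, then load and query with SQL.
    print(load_and_summarize(transform(RAW_ORDERS)))
```

Python handles the record-level cleanup, while SQL handles the set-based aggregation. That split is a common division of labour between the two languages.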

Worth it?

Whether the DP-203 certification (Microsoft Certified: Azure Data Engineer Associate) is worth it depends on several factors, including your career goals, current job responsibilities, and the demand for Azure Data Engineers in your industry.

If you are looking to develop your skills in designing and implementing data solutions on the Azure cloud platform, the DP-203 certification could be a valuable asset. It demonstrates that you have the knowledge and expertise to build such solutions, including data storage, data processing, and data security.

In terms of job opportunities, Azure Data Engineers are in high demand, as organizations increasingly adopt cloud technologies and need professionals with the skills to manage their data solutions in the cloud.

However, if your career goals and job responsibilities do not align with the skills covered by the DP-203 certification, it may not be worth the investment of time and resources for you.

Ultimately, whether the DP-203 certification is worth it for you depends on your individual career goals and the opportunities available in your industry. Consider your current skills, experience, and the market demand for Azure Data Engineers before making a decision.
