Behnam Yazdanpanahi edited this page May 9, 2024 · 5 revisions

Python for Data Engineering Course

Welcome to the PythonForDataEngineeringCourse wiki!

01: Python Basics

Python Fundamentals

Assignment: Ass1

02: Data Manipulation and Analysis

Data Manipulation with Pandas

  • Introduction to Pandas and its data structures (Series, DataFrame)
  • Data manipulation (filtering, sorting, grouping)
  • Reading and writing data from/to different file formats
  • Data cleaning and transformation techniques
  • Handling missing data
  • Merging and joining DataFrames

Numerical Computations with NumPy

  • Introduction to NumPy arrays and mathematical operations
  • Array operations (slicing, indexing, etc.)
  • Array manipulation (reshaping, stacking, splitting)
  • Working with random numbers
  • Linear algebra operations

Assignment: Ass2

03: Data Visualization

Data Visualization with Matplotlib and Seaborn

  • Data visualization basics
  • Line plots, scatter plots, bar plots
  • Histograms, box plots, violin plots
  • Customizing plots and adding labels, titles, etc.

Assignment: Ass3

04: Web Scraping & APIs

Web Crawling with Requests, BeautifulSoup and Selenium

  • Introduction to APIs
  • Accessing data from APIs
  • Introduction to web scraping
  • Working with HTML structure
  • Scraping data from websites using BeautifulSoup
  • Handling dynamic content with Selenium (JavaScript rendering)
  • Parsing data from web pages

Assignment: Ass4

05: Object-Oriented Programming, Working with Data Sources and Storages, and Serialization

Object-Oriented Programming (OOP)

  • Object-oriented programming concepts (classes, objects, inheritance, polymorphism)
  • Design patterns and best practices in OOP

Working with Data Sources and Storages, and Serialization

  • Reading and writing data from/to various file formats
  • Introduction to data serialization formats (JSON, Parquet, Pickle)
  • Serializing and deserializing data objects
  • Best practices for data serialization and storage
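The serialize/deserialize round trip mentioned above might look like the following sketch, which uses only the standard-library `json` and `pickle` modules. The sample `record` is made up for illustration. Parquet is left out here because it needs a third-party library.

```python
import json
import pickle

# Made-up sample record, for illustration only.
record = {"id": 1, "name": "sensor-a", "readings": [20.5, 21.0]}

# JSON: text-based and language-neutral.
json_text = json.dumps(record)
from_json = json.loads(json_text)

# Pickle: binary and Python-specific.
pickled = pickle.dumps(record)
from_pickle = pickle.loads(pickled)

print(from_json == record, from_pickle == record)
```

One practical difference: JSON output is readable text that other languages can parse, while pickle output is Python-only bytes and should only be loaded from trusted sources, since unpickling can execute arbitrary code.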

Assignment: Ass5

06: SQL & NoSQL Databases with Python

Working with SQL Databases

  • Introduction to SQL and relational databases
  • SQL basics (SELECT, FROM, WHERE, JOIN)
  • Creating and managing databases, tables, and indexes
  • CRUD operations (Create, Read, Update, Delete)
  • Connecting to databases
  • Executing SQL queries
  • Fetching and manipulating data with SQL
  • Using SQLAlchemy for database interaction
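A minimal sketch of connecting, running CRUD statements, and fetching results is shown below. It assumes the standard-library `sqlite3` module with an in-memory database rather than SQLAlchemy, so it runs without extra installs; the `users` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create a table (hypothetical schema for this example).
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create: insert rows using parameter placeholders, not string formatting.
conn.executemany(
    "INSERT INTO users (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Linus", "Helsinki")],
)

# Update and Delete.
conn.execute("UPDATE users SET city = ? WHERE name = ?", ("Cambridge", "Ada"))
conn.execute("DELETE FROM users WHERE name = ?", ("Linus",))
conn.commit()

# Read: fetch the remaining rows as a list of tuples.
rows = conn.execute("SELECT name, city FROM users ORDER BY id").fetchall()
conn.close()

print(rows)
```

The `?` placeholders let the driver handle quoting, which avoids SQL injection when values come from user input.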

Working with NoSQL Databases

  • Understanding NoSQL databases (e.g., MongoDB, Redis)
  • Connecting to NoSQL databases
  • Querying and manipulating data in NoSQL databases
  • Handling document-based and key-value data models

Assignment: Ass6

07: Building Data Pipelines

Data Pipelines

  • Understanding data pipelines and their components
  • ETL (Extract, Transform, Load) concepts
  • Designing and architecting data pipelines
  • Implementing data ingestion, transformation, and loading
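The extract, transform, and load stages listed above can be sketched as three small functions. This is only an illustrative outline under simple assumptions: the input is a made-up CSV string (`RAW_CSV`), and the "warehouse" is a plain Python list standing in for a real target such as a database.

```python
import csv
import io

# Made-up input data; the second row has an invalid amount on purpose.
RAW_CSV = "name,amount\nalice,10\nbob,not-a-number\ncarol,5\n"


def extract(text: str) -> list:
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: convert types, normalize names, drop invalid rows."""
    cleaned = []
    for row in rows:
        try:
            amount = int(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not an integer
        cleaned.append({"name": row["name"].title(), "amount": amount})
    return cleaned


def load(rows: list, target: list) -> int:
    """Load: append rows to the target and return how many were written."""
    target.extend(rows)
    return len(rows)


warehouse = []  # stand-in for a real storage target
loaded = load(transform(extract(RAW_CSV)), warehouse)
print(loaded, warehouse)
```

Keeping each stage as a separate function makes it easier to test the stages independently and to swap the source or target later.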

Assignment: Ass7

Capstone Project

Project Development

  • Apply all the concepts learned in a real-world data engineering project
  • Work with various data sources including web data and APIs
  • Implement ETL pipelines, data processing, and analysis using Python libraries and tools

Project: Final Project