SergeySenigov/data-engineer-practicum-portfolio

Data Engineer Course Projects

| Project | Stack, tools, libraries |
| --- | --- |
| Data quality checks. RFM datamart. | SQL, Common Table Expressions, Window Functions, PostgreSQL, CloudBeaver |
| Modifying DWH. Migration to the new model. | SQL, Window Functions, PostgreSQL, CloudBeaver |
| Modifying ETL and datamarts. Implementing idempotency. | Airflow, SQL, PostgreSQL, CloudBeaver, bash, pandas, SQLAlchemy, PostgresOperator, BashOperator |
| Data quality checks in ETL | Airflow, SQL, PostgreSQL |
| Datamart in DWH based on multiple sources | Airflow, PostgreSQL, MongoDB Compass, pendulum, Jupyter Notebook, bash, SQLAlchemy, PostgresHook |
| Datamart based on Analytical Database Vertica | Airflow, Yandex S3 Storage, Common Table Expressions, SQL, Vertica, CloudBeaver, pandas |
| Working with PySpark in Hadoop. Working with HDFS. | Hadoop, Spark, PySpark, YARN, MapReduce, Window Functions, HDFS, Airflow, SparkSubmitOperator, Parquet |
| Processing stream data with Spark | Kafka, PySpark, Airflow, kcat, Jupyter Notebook, SQL, PostgreSQL, Spark Streaming |
| Cloud services | Yandex Cloud Services, DataLens, Kubernetes, kubectl, Kafka, kcat, confluent_kafka, flask, Docker Compose, Helm, Redis |
| Combining data streams. Analytics datamart. | Yandex S3, DWH, Vertica, boto3, Airflow, TriggerDagRunOperator, Metabase |
