Data Engineering: standard Extract, Transform, Load (ETL) practices
A data warehouse is a large, centralized repository for storing and managing an organization's data from various sources. Its purpose is to provide a single source of truth for all data in an organization, enabling easy analysis and reporting.
Terraform is an open-source infrastructure-as-code (IaC) tool developed by HashiCorp. It allows developers to manage and provision infrastructure resources such as virtual machines, networks, and storage using code.
Terraform uses a declarative language to define the desired state of infrastructure resources, allowing developers to easily create, modify, and destroy infrastructure resources using version-controlled configuration files. This enables teams to automate infrastructure provisioning and ensure consistency across environments.
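As a minimal sketch of that declarative style (the provider, resource, and bucket name below are hypothetical placeholders, not taken from this repo), a Terraform configuration describing a single storage bucket might look like:

```hcl
# Declarative desired state: Terraform creates, updates, or destroys
# resources until reality matches this description.
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

# Hypothetical bucket for staging raw ETL extracts.
resource "aws_s3_bucket" "etl_staging" {
  bucket = "example-etl-staging-bucket"
}
```

Running `terraform plan` previews the changes needed to reach this state, and `terraform apply` makes them, which is what allows the same version-controlled file to reproduce identical infrastructure across environments.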
This repository experiments with several data engineering practices for preparing data from different sources into formats that are easily and readily available for descriptive and predictive analysis, using:
- SQL
- Python3
- Pandas
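The ETL flow these tools support can be sketched in a few lines of pandas; the column names and an in-memory CSV string are illustrative stand-ins for a real source file:

```python
import io
import sqlite3

import pandas as pd

# Extract: read raw records (a CSV string stands in for a source file).
raw = io.StringIO(
    "user_id,amount,ts\n"
    "1,10.5,2021-01-01\n"
    "1,4.0,2021-01-02\n"
    "2,7.25,2021-01-01\n"
)
df = pd.read_csv(raw, parse_dates=["ts"])

# Transform: aggregate per-user totals for descriptive analysis.
totals = df.groupby("user_id", as_index=False)["amount"].sum()

# Load: write the transformed table into a database.
conn = sqlite3.connect(":memory:")
totals.to_sql("user_totals", conn, index=False)

print(conn.execute(
    "SELECT user_id, amount FROM user_totals ORDER BY user_id"
).fetchall())  # [(1, 14.5), (2, 7.25)]
```

The same extract/transform/load shape applies whether the target is SQLite, PostgreSQL, or Cassandra; only the load step's connection changes.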
This repo contains three folders, each of which analyzes a different data modeling scheme.
The relational DB section covers the management of structured, relational database systems, using SQL (Structured Query Language) for querying and maintenance. Here, the PostgreSQL engine is used for data modeling operations, which include: table creation, joins, normalization, denormalization, schemas, and warehousing.
Requirements: 'python3', 'postgresql', 'sql', 'pandas', 'numpy' and 'json'.
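The kind of modeling this section covers — normalized tables joined back together at query time — can be sketched with Python's built-in sqlite3 module as a runnable stand-in for PostgreSQL (the table and column names below are illustrative, not taken from this repo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized schema: artists are factored out of songs into their own
# table, so each artist's name is stored exactly once.
cur.execute("CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE songs (
    song_id INTEGER PRIMARY KEY,
    title TEXT,
    artist_id INTEGER REFERENCES artists(artist_id))""")

cur.executemany("INSERT INTO artists VALUES (?, ?)",
                [(1, "Nina Simone"), (2, "Miles Davis")])
cur.executemany("INSERT INTO songs VALUES (?, ?, ?)",
                [(10, "Feeling Good", 1), (11, "So What", 2)])

# Join: recombine the normalized tables into one denormalized view.
rows = cur.execute("""
    SELECT s.title, a.name
    FROM songs s JOIN artists a ON s.artist_id = a.artist_id
    ORDER BY s.song_id""").fetchall()
print(rows)  # [('Feeling Good', 'Nina Simone'), ('So What', 'Miles Davis')]
```

Against a real PostgreSQL instance the same DDL and join would run through a psycopg2 connection instead of sqlite3; the modeling ideas are identical.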
The non-relational database section implements a non-tabular schema that is optimized for the specific requirements of the type of data being stored. Here, CQL on the Cassandra engine is used for data modeling operations, which include: table creation, denormalization (in place of joins, which CQL does not support), and query clauses.
Requirements: 'python3', 'cassandra', 'psycopg2', 'pandas', 'numpy' and 'json'.
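Cassandra modeling is query-first: you design one denormalized table per access pattern, with the partition key chosen to match the query's WHERE clause. A hedged sketch (the table, columns, and literal values here are hypothetical, not taken from this repo's notebooks):

```sql
-- Hypothetical query-first CQL table: the partition key (session_id)
-- matches the query below, and artist data is denormalized into each
-- row because CQL has no JOIN.
CREATE TABLE songs_by_session (
    session_id int,
    item_in_session int,
    artist_name text,
    song_title text,
    PRIMARY KEY (session_id, item_in_session)
);

SELECT artist_name, song_title
FROM songs_by_session
WHERE session_id = 338 AND item_in_session = 4;
```

A second query pattern (say, songs by user) would get its own table with a different primary key, duplicating the data rather than joining.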
The Data_Warehousing section uses PostgreSQL and CQL to manage schemas on the Pagila dataset, including ETL, fact and dimension tables, OLAP cubes, and OLTP operations.
Requirements: 'python3', 'postgresql', 'sql', 'pandas', 'numpy' and 'json'.
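A minimal star-schema sketch in PostgreSQL-flavored SQL illustrates what fact and dimension tables and an OLAP cube look like; the column subset is illustrative and not taken verbatim from the Pagila schema:

```sql
-- Dimension table: descriptive attributes of each calendar date.
CREATE TABLE dim_date (
    date_key serial PRIMARY KEY,
    day int, month int, year int
);

-- Fact table: one row per rental payment, with a foreign key into
-- the dimension and a numeric measure to aggregate.
CREATE TABLE fact_sales (
    sales_key serial PRIMARY KEY,
    date_key int REFERENCES dim_date(date_key),
    amount numeric
);

-- OLAP cube: subtotals for every combination of month and year,
-- including grand totals, in a single query.
SELECT d.month, d.year, SUM(f.amount) AS revenue
FROM fact_sales f JOIN dim_date d ON f.date_key = d.date_key
GROUP BY CUBE (d.month, d.year);
```

OLTP workloads hit the normalized source tables with small single-row reads and writes, while the star schema above serves the analytical (OLAP) side.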
The Pagila PostgreSQL movie rental dataset is used for analysis in this work. You can find the licensing for the data and other descriptive information at the link available here. Otherwise, feel free to use the code here as you would like.