Skip to content

carteakey/world-energy-stats

Repository files navigation

world-energy-stats

https://world-energy-stats.fly.dev

Overview

Significant shifts in global energy dynamics over the past 50 years, driven by technology advancements, emerging energy sources, and growing climate awareness, highlight the need for understanding and analyzing changes in energy consumption.

Using Big Data tools, this project analyzes consumption trends for primary energy sources on global/country level over the past few decades and generate insights.

Final Project for MDS @ TMU Course - DS8003.

Data Pipeline Architecture

process

Contributors

Amarpreet Kaur
Amarpreet Kaur

🔣💻
Ruchi
Ruchi

🔣 💻

Citation

This project uses the awesome data from OWID.

Hannah Ritchie, Max Roser and Pablo Rosado (2023) - “Energy” Published online at OurWorldInData.org. Retrieved from: 'https://ourworldindata.org/energy' [Online Resource]

Running Locally

docker compose up airflow-init
docker ps --format "table {{.ID}}\t{{.Names}}\t{{.Status}}"
docker compose down --volumes --remove-orphans
docker exec spark-notebook jupyter server list
docker exec -it hive-server hive
hdfs dfs -ls -R /app-logs

Services

App Server Link
Hadoop ResourceManager UI http://localhost:8088/cluster
Hadoop Namenode UI http://localhost:9870/
Hadoop NodeManager UI http://localhost:8842
Jupyter Notebook UI http://localhost:8888/
Airflow Web Server UI http://localhost:8082/
Spark Master http://localhost:8080/
Spark Worker 1 http://localhost:8081/
Spark Worker 2 http://localhost:8083/

Project Structure

.
├── Procfile
├── README.md
├── airflow
│   ├── dags
│   │   ├── hadoop_setup_python.py
│   │   ├── hdfs_data_download.py
│   │   ├── hdfs_data_upload.py
│   │   ├── hive_create_database.py
│   │   ├── run_all.py
│   │   ├── run_data_categorization.py
│   │   ├── run_data_transformation.py
│   │   ├── run_hive_sql.py
│   │   └── run_mapred_genstats.py
├── app.py
├── assets
│   ├── big-players.png
│   ├── data
│   │   ├── 1_energy_overview.csv
│   │   ├── 2_energy_consumption_pct_rem.csv
│   │   ├── 2_energy_consumption_pct_top15.csv
│   │   ├── 2_energy_consumption_top15.csv
│   │   ├── 3_energy_breakdown_top15.csv
│   │   ├── 4_electricity_gen_top15.csv
│   │   ├── 4_electricity_share_top15.csv
│   │   ├── 5_population_correlation.csv
│   │   └── energy_share.csv
│   ├── electricity-mix.png
│   ├── energy-consumption.png
│   ├── energy-gdp-pop.png
│   ├── energy-mix.png
│   └── styles.css
├── components
│   ├── insight_1.py
│   ├── insight_2.py
│   ├── insight_3.py
│   ├── insight_4.py
│   └── insight_5.py
├── docker-compose.env
├── docker-compose.yml
├── energy-data
│   ├── README.md
│   ├── owid-energy-codebook.csv
│   └── owid-energy-data.csv
├── fly.toml
├── notebooks
│   ├── eda.ipynb
│   ├── hive_queries_ak.ipynb
│   ├── hive_queries_kc.ipynb
│   ├── output
│   ├── spark_etl_countries.ipynb
│   ├── spark_etl_world.ipynb
│   ├── sql
│   └── utils.py
├── process.png
├── requirements.txt
├── scripts
│   ├── hadoop
│   │   ├── eda_pandas_mapper.py
│   │   ├── eda_pandas_mapper_orig.py
│   │   ├── eda_pandas_reducer.py
│   │   ├── eda_pandas_reducer_orig.py
│   │   ├── null_percent_mapper.py
│   │   ├── null_percent_reducer.py
│   │   └── test
│   ├── install_python.sh
│   └── pyspark
│       ├── data_categorization.py
│       ├── data_transformation.py
│       ├── run_hive_query.py
│       └── utils.py
├── setup.sh
└── sql
    ├── 1_energy_overview.sql
    ├── 2_energy_consumption_pct_rem.sql
    ├── 2_energy_consumption_pct_top15.sql
    ├── 2_energy_consumption_top15.sql
    ├── 3_energy_breakdown_top15.sql
    ├── 4_electricity_gen_top15.sql
    ├── 4_electricity_share_top15.sql
    ├── 5_population_correlation.sql
    ├── combined_energy_data.sql
    └── energy_share.sql

About

World Energy Statistics using various data tools. Project @ MDS TMU.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published