Capstone Project: Retrieving, Processing, and Visualizing Data with Python (University of Michigan through Coursera)
Hello everyone!
To run the full project, follow these steps:
1- Register at https://www.eia.gov/opendata/register.php to get your API key.
2- Once you have it, open extract.py and add your API key on line 33.
3- Run extract.py
4- Run clean.py
5- Run index.py
6- Run analyze.py
7- Run visualize.py
8- Open visualize.html
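If you prefer to run steps 3 to 7 in one go, a small helper like the hypothetical run_all.py below can do it. It is a sketch, not part of the original project: it assumes the five scripts sit in the same folder and should be run with the same Python interpreter, and it stops at the first script that fails.

```python
import subprocess
import sys

# Steps 3-7 above, in order; visualize.html is still opened by hand afterwards.
PIPELINE = ("extract.py", "clean.py", "index.py", "analyze.py", "visualize.py")


def run_pipeline(scripts=PIPELINE, workdir="."):
    """Run each script with the current interpreter, stopping at the first failure."""
    for name in scripts:
        print(f"Running {name} ...")
        # cwd=workdir so relative paths inside the scripts resolve against the project folder.
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed step never feeds stale data into the next one.
        subprocess.run([sys.executable, name], cwd=workdir, check=True)
    print("Done. Open visualize.html in a browser.")
```

From the project folder, `python -c "from run_all import run_pipeline; run_pipeline()"` runs the whole chain; visualize.html still has to be opened in a browser afterwards.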
Now, let me give you an overview of this project.
First, I identified a data source: the U.S. Energy Information Administration (EIA), https://www.eia.gov/
This source provides data and statistics on different energy sources, such as petroleum, natural gas, and electricity, covering their production, consumption, imports, exports, and more.
I wanted to analyze petroleum consumption in several locations and its impact on CO2 emissions around the world.
I believe this subject is useful to me both personally and professionally.
Next, I connected to the data source (https://www.eia.gov) through its API using my key, pulled the data, parsed the JSON responses, and inserted the records into a database. The data source contains almost 4 million rows, and the API returns at most 5,000 records per request, so large pulls have to be split across many requests.
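Because each request is capped at 5,000 records, the extraction has to page through the results. The sketch below shows one way to do that; it is not the project's extract.py. It assumes the EIA API v2 conventions of `offset`/`length` query parameters and a JSON body with the records under `response.data`. The table name `raw_data` and the fields `period`, `location`, `value` are hypothetical placeholders; map your endpoint's actual field names onto them before storing.

```python
import json
import sqlite3
import urllib.parse
import urllib.request

PAGE_SIZE = 5000  # the API returns at most 5,000 records per request


def fetch_page(base_url, api_key, offset, length=PAGE_SIZE):
    """Fetch one page of records (assumes EIA API v2 offset/length paging)."""
    params = urllib.parse.urlencode({"api_key": api_key, "offset": offset, "length": length})
    sep = "&" if "?" in base_url else "?"  # base_url may already carry query parameters
    with urllib.request.urlopen(f"{base_url}{sep}{params}") as resp:
        payload = json.load(resp)
    return payload["response"]["data"]  # assumed response layout


def fetch_all(get_page, page_size=PAGE_SIZE):
    """Yield every record, advancing the offset until a short page signals the end."""
    offset = 0
    while True:
        rows = get_page(offset, page_size)
        yield from rows
        if len(rows) < page_size:
            break
        offset += page_size


def store(rows, conn):
    """Insert raw records into a hypothetical raw_data table."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_data (period TEXT, location TEXT, value)")
    conn.executemany(
        "INSERT INTO raw_data (period, location, value) VALUES (?, ?, ?)",
        ((r.get("period"), r.get("location"), r.get("value")) for r in rows),
    )
    conn.commit()
```

To pull real data, pass a function that calls `fetch_page` for your endpoint, for example `store(fetch_all(lambda off, n: fetch_page(url, api_key, off, n)), sqlite3.connect("raw.sqlite"))`. Keeping `fetch_all` independent of HTTP makes the paging logic easy to test without hitting the API.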
All the extracted data has to be processed and cleaned before it can be analyzed and visualized. Using Pandas, I dropped duplicate records and deleted some special characters, then inserted the result into a second database, as shown in the data program structure below.
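A minimal pandas sketch of this cleaning step follows. It assumes the raw table is called `raw_data`, the cleaned one `clean_data`, and that "special characters" means anything other than letters, digits, whitespace and basic punctuation. The table names and the character set are assumptions to adapt, not the project's actual clean.py.

```python
import sqlite3

import pandas as pd

# Keep letters, digits, underscore, whitespace and . , ( ) - ; everything else is stripped.
# This character set is an assumption; adjust it to what the raw data actually contains.
SPECIAL_CHARS = r"[^\w\s.,()\-]"


def clean(df, text_columns):
    """Strip special characters from text columns, then drop duplicate rows."""
    df = df.copy()
    for col in text_columns:
        df[col] = df[col].astype(str).str.replace(SPECIAL_CHARS, "", regex=True).str.strip()
    # Deduplicate after cleaning so rows that differed only by stray symbols collapse.
    return df.drop_duplicates().reset_index(drop=True)


def copy_clean(raw_conn, clean_conn, table="raw_data", text_columns=("location",)):
    """Read the raw table, clean it, and write it to a second database."""
    df = pd.read_sql_query(f"SELECT * FROM {table}", raw_conn)
    cleaned = clean(df, list(text_columns))
    cleaned.to_sql("clean_data", clean_conn, if_exists="replace", index=False)
    return len(cleaned)
```

The order matters: `United States*` and `United States` only become duplicates once the stray symbol is removed, so cleaning runs before `drop_duplicates`.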
In addition, we need to design a database index model, shown below, that improves performance and speeds up the retrieval of results.
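In SQLite, such an index is a `CREATE INDEX` on the columns the analysis filters on. The sketch below assumes a `clean_data` table with hypothetical `period` and `location` columns (the index names are also made up) and uses `EXPLAIN QUERY PLAN` to check that a query actually uses the index instead of scanning the whole table.

```python
import sqlite3


def create_indexes(conn):
    """Index the (hypothetical) columns the analysis filters on."""
    conn.execute("CREATE INDEX IF NOT EXISTS idx_clean_period ON clean_data (period)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_clean_location ON clean_data (location)")
    conn.commit()


def query_plan(conn, sql, params=()):
    """Return SQLite's query-plan details for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail, e.g. "SCAN ..." or "SEARCH ...".
    return " | ".join(str(row[-1]) for row in rows)
```

Indexes speed up reads but make every insert slightly more expensive, so building them once the data is loaded is usually the cheaper order.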
That brings us to the final step of this capstone project. The outcome of the process is presented in the two pictures below. The first shows the data analysis: the top 10 largest petroleum-consuming locations in 2021 (in thousand barrels per day) and worldwide CO2 emissions over time (in million metric tonnes of carbon dioxide). The second shows the data visualization, rendered in the browser as HTML with D3.js.
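The top-10 ranking boils down to a sorted, limited query, and D3 needs the result in a form the browser can load. The sketch below is an illustration under assumptions, not the project's analyze.py or visualize.py: it reuses a hypothetical `clean_data` table (`period`, `location`, `value`) and writes a hypothetical `data.js` file defining a JavaScript variable that an HTML page could include with a `<script>` tag.

```python
import json
import sqlite3

# CAST guards against values stored as text, which would otherwise sort alphabetically.
TOP_N_SQL = """
    SELECT location, CAST(value AS REAL) AS value
    FROM clean_data
    WHERE period = ?
    ORDER BY CAST(value AS REAL) DESC
    LIMIT ?
"""


def top_consumers(conn, year="2021", n=10):
    """Return (location, value) pairs for the n largest values in the given year."""
    return conn.execute(TOP_N_SQL, (year, n)).fetchall()


def write_d3_data(rows, path="data.js", var_name="topConsumers"):
    """Write rows as a JavaScript variable that an HTML page can load with <script src=...>."""
    records = [{"location": loc, "value": val} for loc, val in rows]
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"var {var_name} = {json.dumps(records, indent=2)};\n")
```

Without the cast, "9.5" would rank above "10.2". If the table also holds aggregate regions such as World, filter them out before ranking, or they will dominate the list.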
All in all, this capstone project let me try my hand at some really interesting data extraction, processing, and visualization. I learned much more than I expected, especially about how to overcome different problems and find solutions, and how to manage all the stages of a project.
Thank you all for your attention. It's a pleasure to share my work with you, and I remain available for any feedback or further information you may need about my project.
Best regards.