This project summarizes a dataset of tweets about the forthcoming 2023 general election in Nigeria, focused on the two leading presidential aspirants in the country.

OLAMIDE100/Data-Engineering-Project

Logo

DataTalks Data & Analytical Engineering ZoomCamp Project

A wrap-up project marking the completion of the 7 weeks of the ZoomCamp!
Explore the docs »

View Dashboard · Report Bug

Table of Contents
  1. About The Project
  2. Usage
  3. Roadmap
  4. Acknowledgments
  5. Contact

About The Project

2023 NIGERIA GENERAL ELECTION POLITICAL ARENA

Logo

Microblogging has become a common way for people to exchange opinions on many aspects of their lives. As a result, microblogging sites are a substantial source of data for sentiment analysis and opinion mining. Twitter is a popular microblogging site where about 500 million tweets are posted every day. This project summarizes a dataset of tweets about the forthcoming 2023 general election in Nigeria, focused on the two leading presidential aspirants in the country.

The dataset will be scraped daily from Twitter, then cleaned and transformed, with sentiment analysis run on the tweets. The results will be loaded into the data lake and then into the data warehouse for storage and staging, which supplies Google Data Studio with clean data for presenting insights through well-defined charts and dashboards. All of these steps will be carried out with the cloud engineering and DevOps tools associated with data and analytics engineering.
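As a rough illustration of the clean-then-score step described above, here is a minimal sketch using only the Python standard library. The word lists and the function names `clean_tweet` and `label_sentiment` are hypothetical and chosen for illustration; the project's actual sentiment method is not documented here and is likely more sophisticated than this toy lexicon count.

```python
import re

# Toy word lists, purely illustrative (an assumption, not the project's lexicon).
POSITIVE = {"good", "great", "win", "hope", "support"}
NEGATIVE = {"bad", "corrupt", "fail", "lose", "fraud"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, @mentions, and non-letter characters."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


def label_sentiment(cleaned: str) -> str:
    """Label cleaned text by counting positive and negative lexicon words."""
    words = cleaned.split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    raw = "Great rally today! @someone https://t.co/abc #Hope"
    cleaned = clean_tweet(raw)
    print(cleaned, "->", label_sentiment(cleaned))
```

Keeping cleaning and scoring as separate functions means the scoring step could later be swapped for a trained model without touching the cleaning logic.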

(back to top)

Architecture

architecture diagram

  • Nigeria Political Tweets: the dataset used throughout this project.
  • Pandas: a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.
  • Google BigQuery: serverless data warehouse (central repository of integrated data from one or more disparate sources).
  • Airflow: workflow management platform for data engineering pipelines. In other words, a pipeline orchestration tool.
  • Docker: a set of platform as a service (PaaS) products that use OS-level virtualization to deliver software in packages called containers.
  • Google Cloud Storage: a RESTful online file storage web service for storing and accessing data on Google Cloud Platform infrastructure.
  • Google Data Studio: turns your data into fully customizable, informative reports and dashboards that are easy to read and share.
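To make the hand-off from the data lake to the warehouse more concrete, the sketch below serializes cleaned records as newline-delimited JSON (a format BigQuery can load) and builds a date-partitioned object name for the daily file. The `raw/tweets/dt=...` layout, the record fields, and the function names are assumptions for illustration only, not the project's actual bucket structure.

```python
import json
from datetime import date


def object_name(day: date, prefix: str = "raw/tweets") -> str:
    """Build a date-partitioned object name for one day's file (hypothetical layout)."""
    return f"{prefix}/dt={day.isoformat()}/tweets.jsonl"


def to_ndjson(records: list[dict]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(
        json.dumps(record, ensure_ascii=False, sort_keys=True) + "\n"
        for record in records
    )


if __name__ == "__main__":
    rows = [{"text": "great rally today hope", "sentiment": "positive"}]
    print(object_name(date(2023, 1, 15)))
    print(to_ndjson(rows), end="")
```

Partitioning object names by date keeps each daily run idempotent: re-running a day overwrites that day's file instead of appending duplicates.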

Built With

Languages, frameworks, libraries, services, and tools used to bootstrap this project.

  • Python
  • MySQL
  • Pandas
  • Docker
  • Google Cloud
  • Terraform
  • Apache Airflow
  • Git
  • GitHub
  • Visual Studio Code
  • Twitter
  • Linux

(back to top)

Usage

For a real-time dashboard of the data and its analysis, please refer to the Political Arena Dashboard.

dashboard diagram

(back to top)

Roadmap

  • Create a GCP project, get the Google service account key, and store it at a known file path
  • Install Terraform and create the main.tf and variable.tf files
  • Provision the required Google Cloud resources using Terraform
  • Create an Airflow folder with dags, logs, and plugins folders inside it
  • Install Docker and Docker Compose
  • Add a custom Dockerfile based on the Airflow image that sets up the Airflow environment, the Python environment, and the Google Cloud SDK
  • Build the Airflow image
  • Add the Docker Compose file with the Airflow services and variables, together with the Google variables
  • Build the bash data ingestion script
  • Build the DAG Python file with the operators needed to execute the various tasks
  • Run docker compose up to build and start the containers that execute the project
  • Connect the ingested dataset in the data warehouse to Google Data Studio
  • Build dashboards that communicate the necessary information effectively
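The DAG step in the list above amounts to declaring which task must finish before the next one starts. The sketch below shows that idea with the standard library's `graphlib` (Python 3.9+) rather than Airflow itself, so it runs without any services. The task names are hypothetical, and a real Airflow DAG would express the same ordering with operators and dependencies between them.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
# Task names are illustrative assumptions, not the project's actual task ids.
DEPENDENCIES = {
    "scrape_tweets": set(),
    "clean_and_score": {"scrape_tweets"},
    "upload_to_data_lake": {"clean_and_score"},
    "load_to_warehouse": {"upload_to_data_lake"},
}


def run_order(dependencies: dict) -> list:
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, task in enumerate(run_order(DEPENDENCIES), start=1):
        print(step, task)
```

Because this chain is strictly linear, there is only one valid order; with branching dependencies an orchestrator is free to run independent tasks in parallel.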

See the open issues for a full list of proposed features (and known issues).

(back to top)

Acknowledgments

I am extremely grateful for the time these wonderful people invested to ensure we understood the various aspects of data and analytics engineering.

(back to top)

Contact

Adesoba Adewale Olamide

Project Link: 2023 Political Arena

(back to top)
