This repository contains data engineering and data science projects and exercises using open data sources as part of the MADE/SAKI course, taught by the FAU Chair for Open-Source Software (OSS) in the Winter 23/24 semester. This repo is forked from the made-template repository.
The Paphos International Airport plays a crucial role in connecting Cyprus to the global network of air travel. As a vital transportation hub, it is essential to understand how daily air traffic at this airport is influenced by weather conditions. This data science project focuses on analyzing the relationship between weather patterns and air traffic to provide valuable insights for airlines, passengers, and decision-makers. By examining these factors, we aim to help stakeholders better plan and manage their operations, ultimately benefiting Cyprus's economy and promoting efficient and safe air travel.
For this project we use two open data sources: the European Data Portal, which provides air traffic data for Paphos International Airport, and Meteostat, which provides daily weather and climate data for the same airport.
The project follows a structured ETL (Extract, Transform, Load) pipeline, organized into directories and modules with specific responsibilities. "etl_main.py" is the entry point: running "python ./project/etl_main.py" executes the pipeline and generates the final dataset, which is stored in an SQLite database named "project.sqlite".
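As an illustration of the extract, transform, and load stages, here is a minimal sketch using only the Python standard library. It is not the code in "etl_main.py": the sample CSV data, the column names (flights, tavg, prcp), the table name daily_traffic_weather, and the function names are all assumptions made for this example.

```python
import csv
import io
import sqlite3

# Hypothetical sample inputs; the real sources and column names may differ.
TRAFFIC_CSV = """date,flights
2023-01-01,42
2023-01-02,38
2023-01-03,
"""

WEATHER_CSV = """date,tavg,prcp
2023-01-01,14.2,0.0
2023-01-02,12.9,5.3
2023-01-03,13.5,1.1
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(traffic_rows, weather_rows):
    """Transform: drop incomplete rows, cast types, and join on date."""
    weather_by_date = {row["date"]: row for row in weather_rows}
    merged = []
    for row in traffic_rows:
        weather = weather_by_date.get(row["date"])
        if not row["flights"] or weather is None:
            continue  # skip days with a missing value on either side
        merged.append(
            (
                row["date"],
                int(row["flights"]),
                float(weather["tavg"]),
                float(weather["prcp"]),
            )
        )
    return merged


def load(rows, conn):
    """Load: write the merged rows into a SQLite table (name is assumed)."""
    conn.execute("DROP TABLE IF EXISTS daily_traffic_weather")
    conn.execute(
        "CREATE TABLE daily_traffic_weather "
        "(date TEXT PRIMARY KEY, flights INTEGER, tavg REAL, prcp REAL)"
    )
    conn.executemany(
        "INSERT INTO daily_traffic_weather VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()


# Demo against an in-memory database so nothing is written to disk.
demo_conn = sqlite3.connect(":memory:")
load(transform(extract(TRAFFIC_CSV), extract(WEATHER_CSV)), demo_conn)
for record in demo_conn.execute("SELECT * FROM daily_traffic_weather"):
    print(record)
demo_conn.close()
```

Passing the connection into load keeps the sketch easy to try against an in-memory database; pointing sqlite3.connect at "project.sqlite" instead would write a file like the one the pipeline produces.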
- Clone the repository
- Install Python, then create a virtual environment for the project.
- Install the dependencies specified in the requirements.txt file by running: "pip install -r requirements.txt"
- Run the project by executing the pipeline shell script "./project/pipeline.sh"
- To run the tests, execute the test shell script "./project/tests.sh"
- Finally, run and explore the report at "./project/report.ipynb"
- (Optional) Check the related project slides and the project presentation video
During the semester you will need to complete exercises, sometimes using Python, sometimes using Jayvee. You must place your submissions in the exercises folder in your repository and name them according to their number from one to five: exercise<number from 1-5>.<jv or py>.
At regular intervals, exercises will be given as homework to complete during the semester. We will divide you into two groups, one completing an exercise in Jayvee, the other in Python, switching each exercise. Details and deadlines will be discussed in the lecture; also see the course schedule. At the end of the semester, you will therefore have the following files in your repository:
- ./exercises/exercise1.jv or ./exercises/exercise1.py
- ./exercises/exercise2.jv or ./exercises/exercise2.py
- ./exercises/exercise3.jv or ./exercises/exercise3.py
- ./exercises/exercise4.jv or ./exercises/exercise4.py
- ./exercises/exercise5.jv or ./exercises/exercise5.py
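To check the naming convention locally before pushing, a small script along these lines could help. This is a hedged sketch and not part of the course tooling; the function names exercise_candidates and missing_exercises are made up for illustration.

```python
from pathlib import Path


def exercise_candidates(number):
    """Return the two accepted file names for one exercise (Jayvee or Python)."""
    return [f"exercise{number}.jv", f"exercise{number}.py"]


def missing_exercises(exercises_dir, count=5):
    """List exercise numbers for which neither a .jv nor a .py file exists."""
    folder = Path(exercises_dir)
    return [
        number
        for number in range(1, count + 1)
        if not any(
            (folder / name).is_file() for name in exercise_candidates(number)
        )
    ]


# Report which exercises are still missing in ./exercises (may be all of them).
print(missing_exercises("./exercises"))
```

An empty list means every exercise from one to five has either a Jayvee or a Python submission in place.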
We provide automated exercise feedback using a GitHub action (defined in .github/workflows/exercise-feedback.yml).
To view your exercise feedback, navigate to Actions -> Exercise Feedback in your repository.
The exercise feedback is executed whenever you change files in the exercises folder and push your local changes to the repository on GitHub. To see the feedback, open the latest GitHub Action run, then open the exercise-feedback job and the Exercise Feedback step. You should see command line output like this:
Found exercises/exercise1.jv, executing model...
Found output file airports.sqlite, grading...
Grading Exercise 1
Overall points 17 of 17
---
By category:
Shape: 4 of 4
Types: 13 of 13
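The categories in this output suggest that the shape and column types of the resulting table matter. The sketch below shows one way to inspect both locally with Python's sqlite3 module. It does not reproduce the grader's logic, which is not described here; the helper name describe_table and the table name "airports" are assumptions for illustration.

```python
import sqlite3


def describe_table(conn, table):
    """Return (row_count, column_count, {column: declared_type}) for a table."""
    # PRAGMA table_info yields one row per column: (cid, name, type, ...).
    columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    if not columns:
        raise ValueError(f"table {table!r} not found")
    row_count = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
    declared_types = {column[1]: column[2] for column in columns}
    return row_count, len(columns), declared_types


# Demo on a tiny in-memory table; the table name "airports" is assumed.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE airports (id INTEGER, name TEXT)")
demo.executemany("INSERT INTO airports VALUES (?, ?)", [(1, "A"), (2, "B")])
print(describe_table(demo, "airports"))
demo.close()
```

To inspect a real output file, open it with sqlite3.connect("airports.sqlite") and pass the name of the table your exercise creates.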