# Netherlands R&D Expenditure in relation with Employees from Abroad.

This repository contains data engineering and data science projects and exercises using open data sources as part of the AMSE/SAKI course, taught by the FAU Chair for Open-Source Software (OSS) in the Winter '23 semester. The repository was forked from the 2023-amse-template repository.

The task was to build a data engineering project that takes at least two publicly available data sources, processes them with an automated data pipeline, and reports findings from the result.

## Overview

The aim of this project is to investigate whether there is a relationship between R&D expenditure in the Netherlands and the number of employees working in the Netherlands who come from abroad. It therefore uses openly available data sources provided by

For details see the project plan.

## Project Structure

project/
├── pipeline.py                 # ETL data pipeline implementation
├── pipeline.sh                 # Bash script for running the ETL data pipeline
├── requirements.txt            # Dependencies on external libraries
├── test_pipeline.py            # Test cases for component and system testing
├── tests.sh                    # Bash script for running all the test cases
├── Know_data_sources.ipynb     # Notebook for data exploration
├── report.ipynb                # Notebook for the final project report
└── project-plan.md             # Project plan and documentation

Important files of the project and their roles:

project/pipeline.sh: A Bash script that runs the automated ETL pipeline. It creates two SQLite databases, employees_data.sqlite and R&D_Expenditure.sqlite, whose tables represent the project's two open data sources.
project/tests.sh: A Bash script that runs the component and system-level tests for the project.
project/report.ipynb: The Jupyter notebook that serves as the final report. Based on the data in employees_data.sqlite and R&D_Expenditure.sqlite, it investigates how far the growth in employees from abroad is related to the growth in nationwide R&D expenditure in the Netherlands from 2013 to 2017, and addresses several related questions. See the report.
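
The extract, transform, and load steps that pipeline.sh automates can be illustrated with a minimal sketch. It is not the code in pipeline.py: the sample CSV, its `year` and `value` columns, and the helper names `extract`, `transform`, and `load` are all assumptions made for illustration, and only the Python standard library is used.

```python
import csv
import io
import sqlite3

# Hypothetical placeholder input; the real sources and their columns
# are not shown in this README.
SAMPLE_CSV = "year;value\n2013;10\n2014;n/a\n2015;12\n"


def extract(text, delimiter=";"):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def transform(rows):
    """Keep rows whose fields parse as numbers; drop the rest."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["year"]), float(row["value"])))
        except (KeyError, TypeError, ValueError):
            continue  # skip malformed or missing values
    return clean


def load(rows, conn, table):
    """Write rows to a SQLite table, replacing any earlier version."""
    # Quote the identifier so names such as R&D_Expenditure are valid SQL.
    quoted = '"' + table.replace('"', '""') + '"'
    with conn:  # commits on success, rolls back on error
        conn.execute(f"DROP TABLE IF EXISTS {quoted}")
        conn.execute(f"CREATE TABLE {quoted} (year INTEGER, value REAL)")
        conn.executemany(f"INSERT INTO {quoted} VALUES (?, ?)", rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(SAMPLE_CSV)), conn, "R&D_Expenditure")
    print(conn.execute('SELECT * FROM "R&D_Expenditure"').fetchall())
    conn.close()
```

Passing an open connection to `load` rather than a file path is a design choice of this sketch: it lets the same function write to an in-memory database in tests and to a file such as employees_data.sqlite in a real run.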

### Continuous Integration Pipeline using GitHub Actions

A Continuous Integration pipeline is implemented as a GitHub Action, defined in Continuous Integration. It is triggered when changes to the project/ directory (except Know_data_sources.ipynb, report.ipynb, and project-plan.md) are pushed to the GitHub repository, or when a pull request is created and merged into the main branch. The Project_feedback.yml workflow runs the project/tests.sh test script and, if any test fails, sends an error message to the owner's email address, as shown in the sample screenshot below:

## Project Setup

Clone this git repository

git clone git@github.com:Malik-Naeem-Awan/made-project-FAU.git

Install Python. Then create a virtual environment inside the repo and activate it.

python3 -m venv <env_name>
source <env_name>/bin/activate

Go to the project/ directory, then download and install the required Python packages for the project.

pip install -r requirements.txt

To run the project, go to the project/ directory and run the pipeline.sh Bash script. It runs the whole ETL pipeline and generates two SQLite databases, employees_data.sqlite and R&D_Expenditure.sqlite, which contain the tables employees and R&D_Expenditure, one for each of the project's two open data sources.

chmod +x pipeline.sh
sh pipeline.sh
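
After the pipeline has run, one way to check its output is to list the tables in each generated database and count their rows. The sketch below assumes the database files sit in the current directory under the names given above; `list_tables` and `count_rows` are hypothetical helpers, not part of the project.

```python
import os
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection."""
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in cur]


def count_rows(conn, table):
    """Count the rows of a table; the name is quoted because of the '&'."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]


if __name__ == "__main__":
    # File names are taken from the text above; adjust the paths if the
    # pipeline writes the databases elsewhere.
    for db_file in ("employees_data.sqlite", "R&D_Expenditure.sqlite"):
        if not os.path.exists(db_file):
            print(f"{db_file}: not found (run pipeline.sh first)")
            continue
        conn = sqlite3.connect(db_file)
        try:
            for table in list_tables(conn):
                print(f"{db_file}: {table} has {count_rows(conn, table)} rows")
        finally:
            conn.close()
```

The existence check matters because `sqlite3.connect` silently creates an empty database file when the path does not exist, which would hide a failed pipeline run.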

To run the test script, which executes the component and system-level tests for the project, run the following commands.

chmod +x tests.sh
sh tests.sh
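
The difference between component and system-level testing can be sketched with the standard `unittest` module. This is an illustration only: the contents of test_pipeline.py are not shown in this README, and `to_number` and `run_pipeline` are hypothetical stand-ins for the real pipeline code.

```python
import sqlite3
import unittest


def to_number(text):
    """Hypothetical transform helper: parse a number or return None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def run_pipeline(conn, raw_values):
    """Hypothetical stand-in for the pipeline: clean values and load them."""
    values = [(v,) for v in map(to_number, raw_values) if v is not None]
    conn.execute("CREATE TABLE employees (value REAL)")
    conn.executemany("INSERT INTO employees VALUES (?)", values)


class ComponentTest(unittest.TestCase):
    """Checks one building block in isolation."""

    def test_to_number_handles_bad_input(self):
        self.assertEqual(to_number("3.5"), 3.5)
        self.assertIsNone(to_number("n/a"))


class SystemTest(unittest.TestCase):
    """Runs the whole flow and inspects the resulting database."""

    def test_pipeline_fills_table(self):
        conn = sqlite3.connect(":memory:")
        self.addCleanup(conn.close)
        run_pipeline(conn, ["1", "n/a", "2.5"])
        count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
        self.assertEqual(count, 2)


def run_tests():
    """Run both test cases and return the unittest result object."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite([
        loader.loadTestsFromTestCase(ComponentTest),
        loader.loadTestsFromTestCase(SystemTest),
    ])
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling `run_tests()` executes both cases; saved in a file, the same classes would also be picked up by `python -m unittest`.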

Finally, run and explore the project/report.ipynb notebook, and feel free to modify it.
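
As a starting point for modifications, the relationship between two yearly series can be quantified by correlating their year-over-year growth rates. This is a sketch of one possible approach, not necessarily the method used in the report; `growth_rates` and `pearson` are hypothetical helpers, and the input lists would have to be read from the two databases first.

```python
import math


def growth_rates(values):
    """Year-over-year relative change: (v[t] - v[t-1]) / v[t-1]."""
    return [(b - a) / a for a, b in zip(values, values[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)
```

With two lists of yearly values for 2013 to 2017, `pearson(growth_rates(expenditure), growth_rates(employees))` gives a value between -1 and 1. Five years yield only four growth rates per series, so any such coefficient should be read with caution.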

## Exercises (not part of the project)

During the semester we had to complete exercises, sometimes using Python, and sometimes using Jayvee. Automated exercise feedback is provided using a GitHub action that is defined in .github/workflows/exercise-feedback.yml.

The exercise feedback runs whenever we change files in the exercises/ directory and push our local changes to the repository on GitHub. To see the feedback, open the latest GitHub Action run, then open the exercise-feedback job and its Exercise Feedback step.
