# 🧪 Day 3: Testing Exercise

Welcome to **Day 3** of the Research Software Engineering course!  
In this notebook, you'll practice writing and running **unit tests** for Python code.

---

### 🗒️ Learning Objectives
By the end of this session, you should be able to:
- ✅ Clone a GitHub repository
- ✅ Set up your environment
- ✅ Write and run simple tests using `pytest`
- ✅ Understand test failures and fix broken code

Let's get started! 🚀


In [None]:
# 📥 Clone the course repository from GitHub
!git clone https://github.com/likeajumprope/RSE_Juelich.git

Cloning into 'RSE_Juelich'...
remote: Enumerating objects: 182, done.
remote: Counting objects: 100% (182/182), done.
remote: Compressing objects: 100% (133/133), done.
remote: Total 182 (delta 74), reused 103 (delta 26), pack-reused 0 (from 0)
Receiving objects: 100% (182/182), 91.83 KiB | 5.40 MiB/s, done.
Resolving deltas: 100% (74/74), done.


In [None]:
# 🧹 Optional: Remove the repository if you need to start fresh
# !rm -rf RSE_Juelich

In [None]:
# 📂 Navigate into the Day 3 folder (adjust the path if necessary)
%cd /content/RSE_Juelich/day3

/content/RSE_Juelich/day3


In [None]:
# 📦 Install required packages
!pip install -r requirements.txt



## Write tests

We start by writing simple `assert` statements. These statements can be inserted directly into the code.

First, use an `assert` statement to check that the cleaned data contains no NaNs.

Example:
# load the data into `file` (this is mock code)
assert file.isna().sum().sum() == 0

In [9]:
import os
import pandas as pd

In [10]:
os.chdir('/content/RSE_Juelich/day3/reproducible-research-project/data/clean')
clean_data = pd.read_csv('clean.csv')
len(clean_data)

909

In [None]:
assert clean_data.isna().sum().sum() == 0, "Data contains missing values"

Write a small test function that performs the same check.
Then run your function on the clean data.

In [7]:
def test_clean_data(data):
    assert data.isna().sum().sum() == 0, "Data contains missing values"

In [8]:
test_clean_data(clean_data)

Write a second function that checks that the number of rows is the expected one (909 for this data set).

In [9]:
def test_number_of_row(data):
    assert len(data) == 909, "Data does not have the expected number of rows"

Create a folder named `tests`.
Create a file that contains the two tests.

In the next step, run the tests from the root of the project folder using the command `pytest`.

Create a `conftest.py` and add a fixture that you can use in your tests.
(For example, the fixture could read the file from the `data/clean` folder.)


Note:
Make sure the name of the test file matches the pattern `test_*.py` or `*_test.py`; otherwise pytest will not collect it.

In [11]:
!pwd
os.chdir('/content/RSE_Juelich/day3/reproducible-research-project')
!pytest --disable-warnings -q

/content/RSE_Juelich/day3/reproducible-research-project/data/clean
..                                                                       [100%]
2 passed in 0.03s


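Create a make file (or Snakemake file) that automates running your tests.

1. Create a file named `Makefile` (no extension) in the root of the reproducible-research-project folder. On a case-sensitive file system, `make` does not find a file called `MAKEFILE`.

2. Add a `test` target and put the pytest command in its recipe, so that `make test` runs the tests. The recipe line must be indented with a tab.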
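One possible way to create this file from inside the notebook is to write it with Python, as sketched below. The target name `test` matches the `make test` call used in this notebook, and the recipe reuses the pytest command from the earlier cell; the helper `write_makefile`, the constant `MAKEFILE_TEXT`, and the `.PHONY` line are additions for illustration only. Note that `make` looks for a file called `Makefile` (or `makefile`) and that recipe lines must start with a tab character.

```python
from pathlib import Path

# Hypothetical Makefile content; "\t" is the tab that make requires
# in front of every recipe line.
MAKEFILE_TEXT = (
    ".PHONY: test\n"
    "\n"
    "test:\n"
    "\tpytest --disable-warnings -q\n"
)


def write_makefile(project_root):
    """Write the Makefile into project_root and return its path."""
    path = Path(project_root) / "Makefile"
    path.write_text(MAKEFILE_TEXT)
    return path
```

Calling `write_makefile(".")` from the project root creates the file, after which `!make test` runs the test suite.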



In [12]:
!pwd

/content/RSE_Juelich/day3/reproducible-research-project


In [13]:
!make test

..                                                                       [100%]
2 passed in 0.02s


Export your environment (the list of installed packages and their versions) to `requirements.txt`.

In [14]:
!pip freeze > requirements.txt

Run the functions in `src`.

In [27]:
import sys
sys.path.append('/content/RSE_Juelich/day3/reproducible-research-project/src/')

inputfile = '/content/RSE_Juelich/day3/reproducible-research-project/data/raw/student_habits_performance.csv'
outputfile = '/content/RSE_Juelich/day3/reproducible-research-project/data/clean/clean.csv'

from process_data import clean_data

clean_data(inputfile, outputfile)



Cleaned data saved to: /content/RSE_Juelich/day3/reproducible-research-project/data/clean/clean.csv


Add this execution step to the make file.