# Course material 10
## Lesson 11 (21.12.2023)

> Disclaimer: Material is taken from 
> 
> + mCoding with James Murphy, "Automated Testing in Python with pytest, tox, and GitHub Actions" retrieved from [https://www.youtube.com/watch?v=DhUpxWjOhME&t=430s](https://www.youtube.com/watch?v=DhUpxWjOhME&t=430s) (20.12.2023)
> + freeCodeCamp with @iamrithmic, "Pytest Tutorial – How to Test Python Code" retrieved from [https://www.youtube.com/watch?v=cHYq1MRoyI0](https://www.youtube.com/watch?v=cHYq1MRoyI0) (20.12.2023)

## Setting up the folder structure

+ Before we start writing tests, we have to restructure our folder a bit in order to set up a testing environment.
+ We start by making some changes in the `pyproject.toml` file. Replace the current content with the following:

+ Next, we create a new file called `setup.cfg`.
+ The `setup.cfg` file is a configuration file commonly used in Python projects to specify metadata, options, and dependencies for a Python package.
    + [metadata]
        + name: The name of the project/package, which is "example_project" in this case.
        + description: A brief description of the project, indicating that it is the "First example project."
        + url: The URL of the project's GitHub repository.
        + author: The name of the author, specified as "Florence Bockting."
        + license: The license under which the project is distributed, specified as the MIT License.
        + license_files: The file(s) containing the text of the license, specified as "LICENSE."
        + classifiers: Metadata classifiers providing information about the project. It indicates compatibility with Python 3, specifically versions 3.11.7 and 3.12.0.
    + [options]
        + python_requires: Specifies the minimum Python version required for the project, set to be >=3.6.
        + packages: Specifies the Python packages to include in the distribution, and in this case, it includes the "example_project" package.
        + install_requires: Lists the dependencies required by the project, including specific versions for libraries such as numpy, pandas, polars, pyarrow, scipy, seaborn, matplotlib, and Requests.
    + [options.extras_require]
        + testing: Specifies additional dependencies required for testing. These include tox, pytest, and pytest-cov.
        + docs: Specifies additional dependencies required for generating documentation. These include sphinx, sphinx-book-theme, numpydoc, myst_nb, and sphinx_design.

+ Create a new file called `setup.cfg` in your main example-project directory and copy and paste the following code into it:

+ Then we create a new file called `tox.ini`.
+ The `tox.ini` configuration file is used to define and run multiple test environments for your Python project.
    + Tox Configuration:
        + `minversion = 3.8.0`: Specifies the minimum required version of tox.
        + `envlist = py3117, py3120`: Defines the list of environment names to be created and tested.
        + `isolated_build = true`: Tells tox to build the package in an isolated build environment (PEP 517/518) before installing it into the test environments.
    + GitHub Actions Configuration:
        + Provides a mapping between Python versions specified in the GitHub Actions workflow and the environment names defined in tox.
        + For example, if the GitHub Actions workflow specifies Python 3.11.7, it will use the py3117 environment.
    + Test Environment Configuration:
        + `setenv`: Sets environment variables; in this case, it sets PYTHONPATH to the project directory.
        + `deps`: Installs dependencies specified in the `requirements_dev.txt` file.
        + `commands`: Executes the specified command (`pytest --basetemp={envtmpdir}`) for running tests. The {envtmpdir} is a directory that tox provides for temporary files related to the specific environment.

The following tox.ini file sets up two test environments (py3117 and py3120) and uses them to run tests using pytest. The GitHub Actions section helps map Python versions used in GitHub Actions to tox environment names. 

+ The `requirements_dev.txt` file contains the development dependencies needed for testing. Let's create this file now.
+ Create a `requirements_dev.txt` file and copy and paste the following content:

+ We also have to create a new workflow so that GitHub knows to run our tests.
+ Create a new file called `tests.yml` in `.github/workflows/` and copy and paste the following content:

## Testing with pytest
+ activate your virtual environment
    + `$ conda activate example-project`
+ make sure your git repo is up-to-date
    + `$ git pull`  
+ install pytest
    + `$ pip install pytest`
+ pytest ([https://docs.pytest.org/en/7.1.x/contents.html](https://docs.pytest.org/en/7.1.x/contents.html)) is a Python library that allows for very flexible testing of your code.
+ In the following, I will only give a short introduction to what is possible with pytest.

### Prepare our Code
+ Before we write our first test, we write two new simple functions that compute the mean and variance of a given sample:
    + create a new file called `expectation.py` in the folder `example_project` and open it
        + `$ touch example_project/expectation.py`
        + `$ start example_project/expectation.py`
    + write the following two functions:
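
A minimal sketch of what these two functions might look like, assuming plain-Python implementations and the population variance (division by `n`, which is also the default of `numpy.var`). The name `variance` and both function bodies are assumptions for illustration, not necessarily the exact course code:

```python
def mean(x):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(x) / len(x)


def variance(x):
    """Return the population variance (divides by n, not n - 1)."""
    m = mean(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)
```

Both functions raise a `ZeroDivisionError` for an empty input, which is a case worth covering with a test later on.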

+ now let us write our first test:
    + create a new file in our folder `tests` and open it:
        + `$ touch tests/test_expectation.py`
        + `$ start tests/test_expectation.py`
    + write the following lines: 

+ save the file, go to the terminal and type
    + `$ cd tests`
    + `$ pytest test_expectation.py`
+ you should get something similar to:

+ pytest reports that our single test passed (100%). This is because the only test function we have so far does nothing but `pass`, so it cannot fail.
+ Let us now modify the test a bit:
    + in `test_mean()` we
        + provide an input vector `[1,2,3]` to our function and then
        + check whether the result matches the expected value `2` using the Python keyword `assert`
    + in `test_variance()` we do basically the same, except that we compute the expected variance with the `numpy.var` function.
+ Let us run the test again. 
    + `$ pytest test_expectation.py`  

+ Now, you should get something similar to the following:

+ The second test failed. But why? Remember that we talked about floating-point representation and the rounding issues that can occur. pytest already hints at this: our computed variance is almost identical to the variance computed by numpy and differs only in the 16th digit.
+ For our current purpose, we would rather treat the two numbers as equal.
+ For exactly this situation, pytest provides a function called `pytest.approx()`.
+ Let's change the code for `test_variance` as follows and run pytest again:
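
A sketch of the adjusted test, assuming `numpy` and `pytest` are installed. The `variance` function defined here is only a stand-in for the one in `expectation.py`, so that the block runs on its own:

```python
import numpy as np
import pytest


def variance(x):
    """Stand-in for expectation.variance (population variance)."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)


def test_variance():
    x = [1, 2, 3]
    # pytest.approx compares within a tolerance (by default a relative
    # tolerance of 1e-6) instead of requiring bit-for-bit equality
    assert variance(x) == pytest.approx(np.var(x))
```

If the default tolerance is too loose or too strict for a given test, `pytest.approx` accepts the keyword arguments `rel` and `abs` to set it explicitly.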

+ all tests should pass now without problems.

### Pytest Fixtures
+ As you have seen above, we repeated the input `[1,2,3]` in both tests. It would be better to declare it once and then use the assigned variable wherever we need it.
+ For such cases, we can use `pytest.fixture`, which is a particular *decorator*.

#### Small excursus: Decorators
+ In Python, a decorator is a design pattern and a special type of syntactic construct that allows you to extend or modify the behavior of functions or methods.
+ The decorator itself is a function that takes another function (or method) as its argument.
+ The decorator function can modify the behavior of the passed function or perform additional actions before or after its execution.
+ Consider the following example:
    + first, we write a very simple function `hello` and call it
    + then we define a new function, `my_decorator`, that modifies the behavior of `hello` (our decorator)
    + finally, we add the decorator to the definition of our `hello` function and call the function again

```python
def hello():
    print("Hello!")

hello()
```

```
Hello!
```

```python
def my_decorator(func):
    def wrapper():
        print("Something is happening before the function is called.")
        func()
        print("Something is happening after the function is called.")
    return wrapper

@my_decorator
def hello():
    print("Hello!")

hello()
```

```
Something is happening before the function is called.
Hello!
Something is happening after the function is called.
```

+ Let us return to our example.
+ In the following, we want to have one sample from a uniform distribution that we can use in all our tests.
+ Therefore, we use a fixture, which allows us to share the same information between test definitions.
+ Importantly, we define the scope of the fixture to be `scope = "class"`, that is, the fixture is invoked once per test class.
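
A minimal sketch of such a class-scoped fixture, assuming `pytest` is installed. The names `draw_uniform_sample`, `uniform_sample`, and `TestExpectation`, the sample size, and the seed are hypothetical, and `mean` is a stand-in for `expectation.mean` so that the block runs on its own:

```python
import random

import pytest


def mean(x):
    """Stand-in for expectation.mean so the sketch runs on its own."""
    return sum(x) / len(x)


def draw_uniform_sample(n=100, seed=42):
    """Draw n values from a uniform distribution on [0, 1) with a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


@pytest.fixture(scope="class")
def uniform_sample():
    # Created once per test class and shared by all tests in that class
    return draw_uniform_sample()


class TestExpectation:
    def test_mean_within_bounds(self, uniform_sample):
        # The mean of values in [0, 1) must itself lie in [0, 1)
        assert 0 <= mean(uniform_sample) < 1

    def test_sample_size(self, uniform_sample):
        assert len(uniform_sample) == 100
```

Keeping the sampling logic in a plain helper function is a design choice: pytest does not allow fixture functions to be called directly, so the helper can still be reused outside of tests.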

### Another very useful pytest decorator is `@pytest.mark.parametrize`
+ The `pytest.mark.parametrize` decorator enables parametrization of arguments for a test function.
+ It allows one to define multiple sets of arguments and fixtures at the test function or class level.
+ Suppose we want to test the function `expectation.mean` for negative, positive, and mixed values.
+ We could define three different tests:

```python
def test_mean_v1():
    x = [2,3]
    expected_result = 2.5
    assert expectation.mean(x) == expected_result

def test_mean_v2():
    x = [-2,-3]
    expected_result = -2.5
    assert expectation.mean(x) == expected_result

def test_mean_v3():
    x = [-2,3]
    expected_result = 0.5
    assert expectation.mean(x) == expected_result
```

+ However, we could also define only one function `test_mean` and then pass the vectors as well as the expected output values as parameters to our test function.
+ We can use the `@pytest.mark.parametrize` decorator and define each test case as a tuple consisting of the vector and the expected result (e.g., `([2,3], 2.5)`).
+ The first argument of the parametrize decorator specifies the parameter name for each component of the tuple (e.g., `"x, expected_result"`).
+ The full test function then looks as follows (note: pytest runs three separate tests here):

```python
import pytest

# `expectation` is assumed to be imported at the top of the test file
@pytest.mark.parametrize("x, expected_result", [([2,3], 2.5), ([-2,-3],-2.5), ([-2,3],0.5)])
def test_mean2(x, expected_result):
    assert expectation.mean(x) == expected_result
```

### Tests for download_data.py (@pytest.fixture, @pytest.mark.parametrize, and mocking)

+ Let us now write some tests for the file `download_data.py`.
+ First, we open the file `example_project\download_data.py` and make one small change in the function `set_cwd`:

+ Save the change and go into the folder `tests`.
+ Then open the file `test_download_data.py`.
+ First, we want to write a test for the function `set_cwd`.
+ This test should ensure that the `set_cwd` method of the `DownloadData` class correctly changes the current working directory to the specified path and that the printed output reflects this change.
+ Therefore we do the following:
    + The test uses the `download_data_instance` fixture to create an instance of the `DownloadData` class with a temporary directory for testing.
    + The `test_set_cwd` function then calls the `set_cwd` method of this instance with the argument `"examples"` (which refers to the `examples` folder in our main directory).
    + After that, it captures the printed output using the `capsys.readouterr()` method.
    + The test asserts that the printed output (the current working directory) ends with the expected path `examples`.
    + It does this by converting both the actual and the expected path to `pathlib.Path` objects and comparing the last components of the paths using the `parts` attribute.
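
The steps above can be sketched as follows, assuming `pytest` is installed. The `DownloadData` class shown here is a hypothetical stand-in (the real one lives in `download_data.py` and may differ), and the fixture body is likewise an assumption; only the pattern of `capsys` plus `pathlib.Path.parts` is the point:

```python
import os
from pathlib import Path

import pytest


class DownloadData:
    """Hypothetical stand-in for the class in download_data.py."""

    def set_cwd(self, path):
        # Change the working directory and print the new location
        os.chdir(path)
        print(os.getcwd())


@pytest.fixture
def download_data_instance(tmp_path, monkeypatch):
    # Work inside a temporary directory that contains an "examples" folder;
    # monkeypatch.chdir restores the original directory after the test
    (tmp_path / "examples").mkdir()
    monkeypatch.chdir(tmp_path)
    return DownloadData()


def test_set_cwd(download_data_instance, capsys):
    download_data_instance.set_cwd("examples")
    # capsys captures everything printed to stdout during the test
    printed = capsys.readouterr().out.strip()
    # Compare only the last path component, not the full absolute path
    assert Path(printed).parts[-1:] == Path("examples").parts
```

Comparing path components instead of raw strings keeps the test independent of the operating system's path separator.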

+ Second, we want to create a test for the function `download_data`
+ This test is essentially checking whether the download_data method works correctly by ensuring that it creates a file with the expected content when given a specific URL. 
    + Again we create a fixture for the instance of the DownloadData class
    + This time, we want to check whether our code correctly retrieves data sets from different URLs.
    + Mocking the download process: The code uses the requests-mock library to mock (that is, imitate) the download. The `patch` function temporarily replaces `requests.get` with a mock object whose `content` attribute is set to "mocked data content".
    + Assertions: After the `download_data` method has been called with the mock in place, the code verifies that:
        + The file specified by file_name exists.
        + The content of the file matches the expected "mocked data content."
        + The requests.get method is called exactly once with the provided data_url.
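
A minimal sketch of this mocking pattern, assuming `pytest` and `requests` are installed. Note that it uses `patch` from the standard library's `unittest.mock` rather than the requests-mock library; the `DownloadData` class, the signature of `download_data`, the file name, and the URLs are hypothetical stand-ins:

```python
from pathlib import Path
from unittest.mock import patch

import pytest
import requests


class DownloadData:
    """Hypothetical stand-in for the class in download_data.py."""

    def download_data(self, data_url, file_name):
        # Fetch the URL and write the raw response body to disk
        response = requests.get(data_url)
        Path(file_name).write_bytes(response.content)


@pytest.fixture
def download_data_instance():
    return DownloadData()


@pytest.mark.parametrize(
    "data_url",
    ["https://example.com/data1.csv", "https://example.com/data2.csv"],
)
def test_download_data(download_data_instance, tmp_path, data_url):
    file_name = tmp_path / "data.csv"
    # Replace requests.get for the duration of the with-block, so that
    # no real network request is made
    with patch("requests.get") as mock_get:
        mock_get.return_value.content = b"mocked data content"
        download_data_instance.download_data(data_url, file_name)

    assert file_name.exists()
    assert file_name.read_bytes() == b"mocked data content"
    mock_get.assert_called_once_with(data_url)
```

Because the network call is replaced, the test runs offline and stays fast, and it checks only our own code rather than the availability of a remote server.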

### Tests for politeness_data.py (@pytest.fixture)

Now let us switch to the file `politeness_data.py` and make the following small change in the `plot_data` function:

+ Then go to the `tests` folder and open the file `test_politeness_data.py`
+ We will create first a test that ensures that the PolitenessData class, when provided with an example data file, processes the data correctly, and the output has the expected properties.
+ Therefore, we do the following:
    + **Fixture Setup** (example_data_file):
        + A Pytest fixture named `example_data_file` is defined. Fixtures are used to set up resources needed for tests.
        + This fixture creates a temporary CSV file (`example_data.csv`) containing sample data related to politeness, using the `tmp_path` fixture to get a temporary directory path.
        + The fixture returns the path to the created CSV file as a string.
    + **Test Function** (test_data_preprocessed):
        + The actual test function is defined. It takes two parameters: example_data_file (the fixture) and capsys (a fixture to capture output created during the test).
        + An instance of the `PolitenessData` class is created, initialized with the path to the example data file.
        + The `data_preprocessed` method of the PolitenessData class is then called with the example data file.
    + **Assertions** are made on the returned values:
        + It checks if the result of data_preprocessed is a `polars.DataFrame` (pl.DataFrame), assuming the class uses the Polars library for data manipulation.
        + It checks if the summaries variable is also a `polars.DataFrame`.
        + It verifies that the length of the processed data (`df_joined`) is 3, assuming there are three rows in the example data.
        + It checks that the length of the `summaries` DataFrame is 2, assuming there are two unique combinations of 'attitude' and 'gender' in the example data.
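
The fixture-setup step can be sketched as follows, assuming `pytest` is installed. Only the columns `attitude` and `gender` are taken from the description above; the remaining column names, all values, and the helper `write_example_csv` are made up for illustration:

```python
import csv

import pytest

# Hypothetical example rows: three observations with two distinct
# (attitude, gender) combinations. Column names and values are assumptions.
EXAMPLE_ROWS = [
    {"subject": "F1", "gender": "F", "attitude": "pol", "frequency": 213.3},
    {"subject": "F2", "gender": "F", "attitude": "pol", "frequency": 204.5},
    {"subject": "M1", "gender": "M", "attitude": "inf", "frequency": 115.2},
]


def write_example_csv(directory):
    """Write the example rows to example_data.csv and return the path as a string."""
    path = directory / "example_data.csv"
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(EXAMPLE_ROWS[0]))
        writer.writeheader()
        writer.writerows(EXAMPLE_ROWS)
    return str(path)


@pytest.fixture
def example_data_file(tmp_path):
    # tmp_path is pytest's built-in per-test temporary directory (a pathlib.Path)
    return write_example_csv(tmp_path)
```

A test that lists `example_data_file` as a parameter then receives the path to a fresh CSV file, which pytest cleans up automatically.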

+ Finally, we also add a test for the function `plot_data`
+ The test should ensure that the plot_data method in the PolitenessData class behaves as expected.
+ Specifically, it should verify that the method returns a Matplotlib Axes object and that the generated plot has the expected number of legend items.
+ Therefore, we do the following:
  + **Fixture Setup** (example_data_file and capsys):
      + Similar to the previous example, the `example_data_file` fixture is used to create a temporary CSV file with example data.
      + The capsys fixture is used to capture the output during the test.

  + **Test Function** (test_plot_data):
       + An instance of the `PolitenessData` class is created and initialized with the path to the example data file.
       + The `data_preprocessed` method of the PolitenessData class is called to obtain processed data (df_joined).
       + The `plot_data` method is then called with the processed data, and the result is captured in the axs variable.
  + **Assertions** are made on the results:
       + It checks if the `axs` variable is an instance of plt.Axes, indicating that the method returns a Matplotlib Axes object.
       + It checks that the number of legend items in the plot is 2, assuming there are two items in the legend. 

### @pytest.mark.skip and @pytest.mark.xfail

+ pytest offers many more options than we can cover here.
+ Among other things, it is possible to skip certain tests or to `xfail` them (that is, you already know that a test will fail but have no fix for the problem at the moment).
+ The following shows two examples for each case:
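
A minimal sketch with two skipped and two expected-to-fail tests, assuming `pytest` is installed. The test names, reasons, and the version condition are hypothetical and chosen only for illustration:

```python
import sys

import pytest


@pytest.mark.skip(reason="feature not implemented yet")
def test_future_feature():
    assert False  # never executed; pytest reports the test as skipped


@pytest.mark.skipif(sys.version_info < (3, 11), reason="requires Python 3.11+")
def test_needs_recent_python():
    # Only runs when the condition in skipif is False
    assert sys.version_info >= (3, 11)


@pytest.mark.xfail(reason="floating-point rounding, no fix yet")
def test_float_equality():
    assert 0.1 + 0.2 == 0.3  # fails; pytest reports the test as xfailed


@pytest.mark.xfail(raises=ZeroDivisionError, reason="empty input not handled yet")
def test_mean_of_empty_list():
    x = []
    assert sum(x) / len(x) == 0  # raises ZeroDivisionError
```

Running pytest with the option `-rsx` lists the skipped and xfailed tests together with their reasons in the summary.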

## Using tox for testing multiple environments
+ [tox](https://tox.wiki/en/4.11.4/) is a generic virtual environment management and test command line tool which aims to automate and standardize testing in Python
+ Let's first install tox:
    + `pip install tox`
+ Then we can run tox (make sure you are in the main directory of your package)
    + `tox`

## Adding test badge to README.md

+ Finally, let us also add a "badge" for our tests to our README.md.
+ Open the `README.md` and add the following line:
    + replace "USERNAME" with your personal GitHub user name

+ Now we can add, commit, and push our changes to our GitHub repo:
    + `$ git add --all`
    + `$ git commit -m "add tests"`
    + `$ git push`

Let's open our GitHub repo site and check whether all deployment stages work properly and the tests pass.