<div align="center">
    <h1>Classes, unit tests and other good coding practices 👌 </h1>
    <h3> Weizmann AI Hub for Scientific Discovery </h3>
    <h4>Nathan LEVY</h4>
    nathan.levy@weizmann.ac.il
    <p>with inputs from M.Kim</p>
    <p>Summer 2024</p>
</div>

The goal of this tutorial is to build a Python package for k-nn classification. We will implement the k-nn algorithm, write unit tests for it, and package it in a clean and organized way.

Specifically, we will cover the following topics:
- Object-oriented programming (implementing the k-nn classifier as a class inspired by the scikit-learn API)
- Unit tests (using the `unittest` module)
- Package management (using `pyproject.toml`)
- Code documentation (using docstrings)
- Code linting and formatting (using `ruff`)


Overall, this tutorial aims to give you hands-on experience with good coding practices and the tools to develop your own Python packages: *escaping from Jupyter notebooks!* 🎉

💡 This tutorial is built for everyone in the hub and assumes no specific knowledge apart from Python programming. We assume that you have already completed the `ex-home-knn` exercise before starting this tutorial.

💡 We will not cover version control tools (Git), but we strongly encourage you to start a Git repository for this project.

💡 We recommend using VS Code for this project.

Over the course of its development, Python has gained many enhancements and new features, each described in a PEP (Python Enhancement Proposal). For instance:

- PEP 8 is the style guide for Python code: it defines naming styles, indentation, and other conventions (see https://realpython.com/python-pep8/#toc).
- PEP 257 describes docstring conventions (see https://peps.python.org/pep-0257/).

Feel free to skim these PEPs. In this tutorial, we will see how to use dedicated tools to make your code compliant with these conventions!

## 1. Package structure in Python

A Python package is a way of organizing related Python code into a directory hierarchy. Most of the libraries you use in Python, such as NumPy and pandas, are organized as packages. Packages can be installed with package managers like pip and are often distributed through the Python Package Index ([PyPI](https://pypi.org/)). This system makes it easy for Python developers to share and reuse code across projects and with the wider Python community.

The `pyproject.toml` file was introduced in PEP 518 (see the [pip documentation](https://pip.pypa.io/en/stable/reference/build-system/pyproject-toml/)). It is the configuration file of a package: it specifies the project's dependencies, its build system, and other configuration options.

It must be located at the root of the project directory and has 3 main sections: `[build-system]`, `[project]`, and `[tool]`.

##### 1. `[build-system]` section

This section specifies the build backend to use. A build backend is the tool that turns your project's source code into distributable archives (source distributions and wheels) that pip can install. Here we keep the default choice, Hatchling, so you can leave this section as is.


##### 2. `[project]` section

This section specifies the project's metadata. You can fill in the following fields:
- `name`: the name of the project (as it will appear when you want to install it with pip)
- `version`: the version of the project
- `description`: a one-line description of the project
- `readme`: the path to the README file
- `requires-python`: the minimum Python version supported by the project
- `authors`: the authors of the project
- `license`: the license of the project
- `dependencies`: the dependencies of the project
- `urls`: the URLs of the project, e.g. a public Git repository


The only non-trivial field is `dependencies`. It lists the dependencies of the project, i.e. the packages that must be installed in order to use it. For instance, if your project needs `numpy` and `scipy`, you can write:

`dependencies = ["numpy", "scipy"]`

You can also constrain the versions of the dependencies. For instance, if you need `numpy` version 1.20.0 or later and exactly version 1.6.0 of `scipy`, you can write:

`dependencies = ["numpy>=1.20.0", "scipy==1.6.0"]`


We'll fill this field as we progress on our project.

We can also specify optional dependencies, which are only needed for specific features of our package. In our case, these are the packages required to run the tests, namely `pytest`, so we write:

`[project.optional-dependencies]`

`test = ["pytest"]`

##### 3. `[tool]` section

In this last section, we configure the external tools used by the project. For our project, we want to do **linting**, i.e. check the source code for programmatic and stylistic errors. Linters are tools that perform static analysis of your code (they inspect it without running it) to find potential bugs and stylistic issues early in the development process, keeping your code clean, readable, and maintainable. In our case, we will use `ruff`. Let's have a short overview of this tool before filling the `[tool]` section.

## 2. Linting your code with `ruff`


Useful references:
- Ruff tutorial: https://docs.astral.sh/ruff/tutorial/
- Ruff configuration: https://docs.astral.sh/ruff/configuration/
- A guide to linting with Ruff on dev.to: https://dev.to/ken_mwaura1/enhancing-python-code-quality-a-comprehensive-guide-to-linting-with-ruff-3d6g

First of all, we need to install `ruff`. If you are working with VS Code, you can install the Ruff extension from the marketplace to get linting directly in the editor. The commands below also require the `ruff` command-line tool in your environment, which you can install with `pip install ruff`.

Ruff can check more than 800 rules, listed in the [ruff documentation](https://docs.astral.sh/ruff/rules/). To begin with, you can stick to the default rule set, which enables all `F` rules (Pyflakes) and a subset of the `E` rules (pycodestyle) and is enough to catch the most common errors.
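For instance, the following file is valid Python and runs fine, yet several of its lines would be reported by Ruff's default rules. The comments paraphrase the rule each line violates, using codes from the Ruff rule list (F401, F841, E711, E731); the function names are made up for this example. Some of these, such as the unused import, can be removed automatically with `--fix`.

```python
import os  # F401: `os` imported but unused


def count_missing(values):
    total = len(values)  # F841: local variable `total` assigned but never used
    # E711: comparison to `None` should be `is None`
    return sum(1 for v in values if v == None)


square = lambda x: x**2  # E731: do not assign a lambda expression, use a def


# Equivalent versions that pass the default rules
def count_missing_clean(values):
    return sum(1 for v in values if v is None)


def square_clean(x):
    return x**2
```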

- We reproduced the [tutorial](https://docs.astral.sh/ruff/tutorial/#getting-started) example file in `test_ruff.py`. You can run the linting with the following command:

    `ruff check test_ruff.py`

    You can also run the linting on a whole directory with:

    `ruff check .`

- You can then fix the auto-fixable errors:

    `ruff check --fix .`

- Finally, format your files with the Ruff formatter. Among other things, it rewraps lines longer than the configured line length (88 characters by default) where it can:

    `ruff format .`


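Coming back to the package structure of Section 1: a regular Python package is a directory that contains a special file called `__init__.py`. This file can be empty, but it tells Python that the directory should be treated as a package. (Since Python 3.3, a directory without `__init__.py` can still be imported as a *namespace package*, see PEP 420, but regular packages with `__init__.py` remain the standard choice.) The package directory can also contain other Python modules and subpackages.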
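To see this in action, the sketch below creates a throwaway package in a temporary directory and imports it. The package name `knn_demo` and its module `distances.py` are hypothetical; the point is that a directory containing `__init__.py` becomes importable once its parent directory is on `sys.path` (installing the package with pip takes care of this for you).

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway project root containing a hypothetical `knn_demo` package
root = Path(tempfile.mkdtemp())
package = root / "knn_demo"
package.mkdir()

# __init__.py makes `knn_demo` a regular package; here it also re-exports a function
(package / "__init__.py").write_text("from .distances import euclidean\n")

# A module inside the package
(package / "distances.py").write_text(
    "import math\n"
    "\n"
    "\n"
    "def euclidean(a, b):\n"
    "    return math.dist(a, b)\n"
)

# Put the project root on the import path, then import the package by name
sys.path.insert(0, str(root))
importlib.invalidate_caches()
knn_demo = importlib.import_module("knn_demo")

print(knn_demo.euclidean((0, 0), (3, 4)))  # 5.0
```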

## Classes

https://ccbskillssem.github.io/assets/Catamura_Skills_Sem.pdf


Type hints
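As a first example of a class with type hints, here is a deliberately simple baseline classifier that always predicts the most frequent training label, similar in spirit to scikit-learn's `DummyClassifier(strategy="most_frequent")`. It follows two scikit-learn conventions: `fit`/`predict` methods, and a trailing underscore for attributes learned during `fit`. The class name and details are our own illustration.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


class MajorityClassifier:
    """Predict the most frequent label seen during training."""

    def __init__(self) -> None:
        # Learned attribute, set by `fit` (trailing underscore, as in scikit-learn)
        self.majority_label_: Hashable | None = None

    def fit(
        self, X: Sequence[Sequence[float]], y: Sequence[Hashable]
    ) -> "MajorityClassifier":
        # The features X are ignored; only the label counts matter
        self.majority_label_ = Counter(y).most_common(1)[0][0]
        return self  # returning self allows chaining: clf.fit(X, y).predict(X)

    def predict(self, X: Sequence[Sequence[float]]) -> list[Hashable]:
        if self.majority_label_ is None:
            raise RuntimeError("Call fit() before predict().")
        return [self.majority_label_ for _ in X]
```

For example, `MajorityClassifier().fit([[0.0], [1.0], [2.0]], ["a", "b", "b"]).predict([[5.0]])` returns `['b']`. The type hints are not enforced at runtime, but they document the expected inputs and let editors and type checkers catch mistakes.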



## Documentation 

https://www.datacamp.com/tutorial/documenting-python-code
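As an example, here is a function documented with a NumPy-style docstring, one of several common formats (Google style and reStructuredText are others). PEP 257 itself only fixes general conventions, such as a one-line summary followed by a blank line before the longer description. The function is a hypothetical helper for our k-nn package.

```python
import math
from collections.abc import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Compute the Euclidean distance between two points.

    Parameters
    ----------
    a : Sequence[float]
        Coordinates of the first point.
    b : Sequence[float]
        Coordinates of the second point, same length as ``a``.

    Returns
    -------
    float
        The Euclidean (L2) distance between ``a`` and ``b``.

    Raises
    ------
    ValueError
        If ``a`` and ``b`` do not have the same length.

    Examples
    --------
    >>> euclidean_distance([0, 0], [3, 4])
    5.0
    """
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

`help(euclidean_distance)` prints this docstring, and editors such as VS Code display it on hover. The `Examples` section can also be executed as a test with `python -m doctest`.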

## Unit testing


https://www.datacamp.com/tutorial/coding-best-practices-and-guidelines
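A unit test checks one small piece of code in isolation. The sketch below uses the standard `unittest` module to test a hypothetical `majority_vote` helper, the kind of function a k-nn classifier needs to combine its neighbours' labels. In a real package, the function would live in the package and the tests in a separate file, for instance `tests/test_vote.py`.

```python
import unittest
from collections import Counter


def majority_vote(labels):
    """Return the most common label (ties go to the label seen first)."""
    if not labels:
        raise ValueError("labels must not be empty")
    return Counter(labels).most_common(1)[0][0]


class TestMajorityVote(unittest.TestCase):
    def test_clear_majority(self):
        self.assertEqual(majority_vote(["a", "b", "b"]), "b")

    def test_tie_returns_first_seen(self):
        # Counter.most_common orders equal counts by first insertion
        self.assertEqual(majority_vote(["a", "b"]), "a")

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            majority_vote([])
```

Saved in a file, these tests can be run with `pytest` (installed through the `test` extra), which also collects `unittest.TestCase` classes, or with `python -m unittest`.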

## Application: build a k-nn classifier

https://scikit-learn.org/1.5/auto_examples/datasets/plot_iris_dataset.html
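Combining the previous pieces, a compact k-nn classifier could look like the sketch below: `fit` simply stores the training data, and `predict` finds the `n_neighbors` closest training points under the Euclidean distance and takes a majority vote. It is written in pure Python for readability; the class and parameter names mimic scikit-learn's `KNeighborsClassifier`, but this is not its implementation, and a NumPy version would be much faster on real data such as the iris dataset.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


class KNNClassifier:
    """k-nearest-neighbours classifier using the Euclidean distance."""

    def __init__(self, n_neighbors: int = 3) -> None:
        if n_neighbors < 1:
            raise ValueError("n_neighbors must be at least 1")
        self.n_neighbors = n_neighbors

    def fit(
        self, X: Sequence[Sequence[float]], y: Sequence[Hashable]
    ) -> "KNNClassifier":
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        # k-nn is a "lazy" learner: training just memorises the data
        self.X_ = [list(x) for x in X]
        self.y_ = list(y)
        return self

    def predict(self, X: Sequence[Sequence[float]]) -> list[Hashable]:
        return [self._predict_one(x) for x in X]

    def _predict_one(self, x: Sequence[float]) -> Hashable:
        # Sort training indices by distance to x and keep the k closest
        distances = [math.dist(x, x_train) for x_train in self.X_]
        nearest = sorted(range(len(distances)), key=distances.__getitem__)
        neighbours = nearest[: self.n_neighbors]
        # Majority vote; on a tie, the label met first (closest) wins
        votes = Counter(self.y_[i] for i in neighbours)
        return votes.most_common(1)[0][0]
```

Once your version works, comparing its predictions on the iris data with those of scikit-learn's `KNeighborsClassifier` is a good sanity check, and a natural first unit test.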

## Build the package

https://docs.google.com/presentation/d/1AKVx6vlzv6sAVBoyT7gLJnJtRaNXMFf1/edit#slide=id.p22