# Deeplay Developer Tutorial

In the following tutorials, we intend to provide a comprehensive guide to the Deeplay library, focused on helping developers contribute to the project. We will cover the following topics:
- Project file structure
- Style guide 
    - Naming conventions
    - Code formatting
    - Documentation
    - Testing
- The base classes and knowing what to subclass
- The methods and attributes that can be overridden to customize the behavior of the base classes


## Project file structure

### Root level files

The project contains the following files at the root level:
- `LICENSE.txt`: The license file for the project
- `.pylintrc`: The configuration file for the pylint tool. It contains the rules for code formatting and style, in particular the warnings to be ignored.
- `README.md`: The project's README file
- `requirements.txt`: The file containing the dependencies for the project
- `setup.cfg`: The configuration file for the setup tool. It contains the metadata for the project.
- `setup.py`: The setup file for the project. It contains the instructions for installing the project.
- `stylestubgen.py`: The script for generating the style stubs for the project. These are type hints for the style system. It creates `.pyi` files for select classes in the project, and adds overrides to the `style` method to enforce the type hints. It handles the docstrings for the styles in the same way.

### Root level directories

The project contains the following directories at the root level:
- `.github`: Contains the GitHub actions workflow files for the project. These run the continuous integration tests for the project.
- `.vscode`: Contains the Visual Studio Code settings for the project. These settings are used to configure the editor for the project. They include good defaults for the editor, such as the code formatter and the linter.
- `deeplay`: Contains the source code for the project. This is where the main code for the project is located.
- `guidelines`: Contains the guidelines for contributing to the project. These are the instructions for developers who want to contribute to the project, and should be followed when making changes to the project.
- `tutorials`: Contains the tutorial files for the project. These are Jupyter notebooks that provide a comprehensive guide to the Deeplay library, focused on helping users and developers get started with, and make the most of, the library.

### Deeplay directory

The Deeplay source code is organized in a hierarchical structure. The guiding principle is that a file should only depend on other files in the same directory or in directories closer to the root. This prevents circular dependencies and makes the codebase easier to understand. For example, consider the structure:
``` bash
a_folder/
    __init__.py
    a.py
    b_folder/
        __init__.py
        b.py
        ...
    c_folder/
        __init__.py
        c.py
        c_extra.py
        ...
    ...
```

`b.py` and `c.py` can import `a.py`, but `a.py` cannot import `b.py` or `c.py`. Moreover, `b.py` should not import `c.py` or `c_extra.py`. However, `c.py` can import `c_extra.py` and vice versa.

This means that files close to the root contain the most general classes and functions, while files deeper in the hierarchy contain more specific ones. This makes it easier to understand the codebase and to find the code you are looking for.
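
As a rough illustration, the rule can be written as a small check on dotted module names. This is a minimal sketch, assuming the rule is read as "a file may import only from its own directory or from directories closer to the root". The helper names `_package_of` and `is_allowed_import` are hypothetical and not part of Deeplay, and the check only makes sense for modules inside the project (third-party imports are out of scope).

```python
from typing import Tuple


def _package_of(module: str) -> Tuple[str, ...]:
    """Return the directory part of a dotted module name.

    For example, "a_folder.b_folder.b" lives in ("a_folder", "b_folder").
    """
    return tuple(module.split(".")[:-1])


def is_allowed_import(importer: str, imported: str) -> bool:
    """Check whether `importer` may import `imported` under the hierarchy rule.

    The import is allowed when the imported module's directory is the same as,
    or an ancestor of, the importing module's directory.
    """
    importer_pkg = _package_of(importer)
    imported_pkg = _package_of(imported)
    # The imported directory must be a prefix of the importer's directory.
    return importer_pkg[: len(imported_pkg)] == imported_pkg


# Deeper files may import files closer to the root, but not the reverse.
print(is_allowed_import("a_folder.b_folder.b", "a_folder.a"))
print(is_allowed_import("a_folder.a", "a_folder.b_folder.b"))
# Sibling folders must not import each other; files in one folder may.
print(is_allowed_import("a_folder.b_folder.b", "a_folder.c_folder.c"))
print(is_allowed_import("a_folder.c_folder.c", "a_folder.c_folder.c_extra"))
```

Working on dotted names keeps the sketch independent of the file system. A real check would also need to resolve relative imports before applying it.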

## File scope

In general, each file should export a single class or function. This makes it easier to understand the codebase and to find the code you are looking for. If a file exports multiple classes or functions, they should be related to each other and should be used together. If a file exports multiple unrelated classes or functions, it should be split into multiple files. It is better to organize the codebase such that related objects are in the same folder instead of the same file.

## Deeplay root level files

Here is a brief overview of the files at the root level of the `deeplay` directory.

#### `module.py`

This file contains the `DeeplayModule` class, which is the base class for all modules in the Deeplay library. It also contains the configuration logic and the selection logic.

#### `meta.py`

This file contains the metaclass that all `DeeplayModule` subclasses should use.

#### `list.py`

This file contains list-like classes (most importantly `LayerList` and `Sequential`), which are used as containers for layers, blocks and components in the Deeplay library.

#### `decorators.py`

This file contains the decorators used in the Deeplay library. These are mainly method decorators that are used to modify the behavior of methods in the library to ensure methods are called at the right point in the lifecycle of the object.

#### `trainer.py`

This file contains the `Trainer` class, which is used to train models in the Deeplay library. It extends the lightning `Trainer` class.

## Deeplay subdirectories

The `deeplay` directory contains the following subdirectories:

### `activelearning`

This directory contains the classes and functions related to active learning in the Deeplay library. This includes application wrappers, criteria, and dataset classes.

### `applications`

This directory contains the classes and functions related to applications in the Deeplay library. Applications are classes that contain the training logic for specific tasks, such as classification, regression, segmentation, etc. They handle all the details of training a model for a specific task, except for the model architecture.

Generally, the individual applications will be placed in further subdirectories, such as `classification`, `regression`, `segmentation`, etc. However, this is less strict than the root level file structure.

### `blocks`

This directory contains the classes and functions related to blocks in the Deeplay library. Blocks are the building blocks of models in the Deeplay library. They are used to define the architecture of a model, and can be combined to create complex models. The most important block classes are in the subfolders `conv`, `linear`, `sequence` and in the files `base.py` and `sequential.py`.

### `callbacks`

Contains Deeplay-specific callbacks, mainly for logging the training history and for the custom progress bar.

### `components`

Contains the reusable components of the library. These are generally built as a combination of blocks. They are more flexible than full models, but less flexible than blocks.

### `external`

Contains logic for interacting with external classes and objects, such as those from `torch`. The most important objects are `Layer` and `Optimizer`.

### `initializers`

Contains the classes for initializing the weights of the models.

### `models`

Contains the models of the library. These are the full models that are used for training and inference. They are built from blocks and components, and are less flexible than both. They generally represent a specific architecture, such as `ResNet`, `UNet`, etc. 

### `ops`

Contains individual operations that are used in the blocks and components. These are generally low-level, non-trainable operations, such as `Reshape`, `Cat`, etc. They act like individual layers.

### `tests`

Contains the tests for the library. These are used to ensure that the library is working correctly and to catch any bugs that may arise.


## Style guide

The code style should follow the [PEP 8](https://www.python.org/dev/peps/pep-0008/) guidelines. The code should be formatted using
[black](https://black.readthedocs.io/en/stable/). We are not close to lint-compliance yet, but we are working on it.

Use type hints extensively to make the code more readable and maintainable. The type hints should be as specific as possible.
For example, if a string can be one of several values, use a `Literal` type hint. Similarly, if a function takes a list of integers,
the type hint should be `List[int]` instead of `List`. We currently support Python 3.8 and above, so some newer typing features cannot be used directly, such as the `|` operator for union type hints (introduced in Python 3.10). You can get around this with `from __future__ import annotations`,
which defers the evaluation of annotations so that the newer syntax is accepted inside them.
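
The recommendations above can be combined as in the following minimal sketch. The function `reduce_values` and its parameters are hypothetical and only serve to show a `Literal` hint, a parametrized `List[int]`, and the `|` syntax enabled by the `__future__` import.

```python
# Must be the first statement of the file; it defers annotation evaluation,
# so `float | None` is accepted in annotations even on Python 3.8.
from __future__ import annotations

from typing import List, Literal


def reduce_values(
    values: List[int],
    reduction: Literal["sum", "mean"] = "sum",
    scale: float | None = None,
) -> float:
    """Reduce a non-empty list of integers to a single, optionally scaled, number."""
    result = float(sum(values))
    if reduction == "mean":
        result /= len(values)
    if scale is not None:
        result *= scale
    return result


print(reduce_values([1, 2, 3]))
print(reduce_values([1, 2, 3], reduction="mean"))
print(reduce_values([1, 2, 3], scale=0.5))
```

Note that the deferred evaluation only covers annotations. An expression such as `int | None` used outside an annotation is still evaluated at runtime and fails on Python versions older than 3.10.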


Classes should have their attribute types defined before the `__init__` method. An example is shown below:

```python
class MyClass:
    attribute: int

    def __init__(self, attribute: int):
        self.attribute = attribute
```

### Naming conventions

Beyond what is defined in the PEP 8 guidelines, we have the following naming conventions:

- Minimize the use of abbreviations. If an abbreviation is used, it should be well-known and not ambiguous.
- Use the following names:
  - "layer" for a class that represents a single layer in a neural network, typically the learnable part of a block.
  - "activation" for a class that represents a non-learnable activation function.
  - "normalization" for a class that represents a normalization layer.
  - "dropout" for a class that represents a dropout layer.
  - "pool" for a class that represents a pooling layer.
  - "block" / "blocks" for a class that represents a block in a neural network, typically a sequence of layers.
  - "backbone" for a class that represents the main part of a neural network, typically a sequence of blocks.
  - "head" for a class that represents the final part of a neural network, typically a single layer followed by an optional activation function.
  - "model" for a class that represents a full neural network architecture.
  - "optimizer" for a class that represents an optimizer.
  - "loss" for a class that represents a loss function.
  - "metric" for a class that represents a metric.
- If there is a naming conflict within a class, append an underscore followed by a 0-indexed number to every conflicting name. For example, `layer_0` and `layer_1`.
  - This is correct: `layer_0`, `layer_1`, `layer_2`.
  - This is incorrect: `layer_1`, `layer_2`, `layer_3`.
  - This is incorrect: `layer`, `layer_1`, `layer_3`.
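
The numbering rule can be expressed as a short check. This is only an illustrative sketch; the helper `follows_numbering` is hypothetical and not part of Deeplay. It accepts a list of names only if they share one base name and are numbered `_0`, `_1`, ... in order.

```python
import re
from typing import List

# A conforming name is "<base>_<number>", for example "layer_0".
_NUMBERED_NAME = re.compile(r"^(?P<base>.+)_(?P<index>\d+)$")


def follows_numbering(names: List[str]) -> bool:
    """Return True if `names` are `<base>_0`, `<base>_1`, ... with a shared base."""
    matches = [_NUMBERED_NAME.match(name) for name in names]
    if not names or any(match is None for match in matches):
        return False
    bases = {match.group("base") for match in matches}
    indices = [int(match.group("index")) for match in matches]
    # One shared base name, and indices counting up from 0 without gaps.
    return len(bases) == 1 and indices == list(range(len(names)))


print(follows_numbering(["layer_0", "layer_1", "layer_2"]))
print(follows_numbering(["layer_1", "layer_2", "layer_3"]))
print(follows_numbering(["layer", "layer_1", "layer_3"]))
```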

### Imports

Use absolute imports for all imports, unless the target is at the same level in the hierarchy.

```python
# deeplay/external/optimizers/optimizer.py
from deeplay.module import DeeplayModule  # correct
from ...module import DeeplayModule  # incorrect

from deeplay.external.optimizers.adam import Adam  # correct
from .adam import Adam  # also allowed
```

In the `__init__.py` files, you may use `*` imports from directories (packages), but not from files (modules). From files, you should import the specific classes or functions you need.

```python
# deeplay/external/__init__.py
from .optimizers import *  # correct
from .optimizers.adam import *  # incorrect
from .optimizers.adam import Adam  # correct
```

## Documentation

Documentation should follow the [NumpyDoc style guide](https://numpydoc.readthedocs.io/en/latest/format.html#style-guide).

In general, all non-trivial classes and methods should be documented. The documentation should include a description of the class or method, the parameters, the return value, and any exceptions that can be raised. We sincerely appreciate any effort to improve the documentation, particularly by including examples of how to use the classes and methods.
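
As an illustration of the expected level of detail, here is a minimal sketch of a NumpyDoc-style docstring. The function `moving_average` is a hypothetical example and not part of Deeplay; it only serves to show the description, parameters, return value, raised exceptions, and a usage example.

```python
from typing import List


def moving_average(values: List[float], window: int) -> List[float]:
    """Compute the simple moving average of a sequence.

    Parameters
    ----------
    values : list of float
        The input sequence.
    window : int
        Number of consecutive elements averaged per output value.
        Must be between 1 and ``len(values)``.

    Returns
    -------
    list of float
        The averages, with ``len(values) - window + 1`` elements.

    Raises
    ------
    ValueError
        If `window` is smaller than 1 or larger than ``len(values)``.

    Examples
    --------
    >>> moving_average([1.0, 2.0, 3.0, 4.0], window=2)
    [1.5, 2.5, 3.5]

    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```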

## Testing

All new features should be tested. The tests should be as comprehensive as possible and aim to cover all code paths. They should be written and run using Python's `unittest` module, and placed in the `tests` folder. Not all tests follow our guidelines yet, but we are working on improving them.

In general, we aim to mirror the structure of the `deeplay` package in the `tests` package. For example, `deeplay.external.layer` should have a corresponding `tests/external/test_layer.py` file. The name of the file should be the same as the module it tests, but with `test_` prepended. Note that when adding a folder, an `__init__.py` file should be added to the folder to make it a package.

Each test file should contain one `unittest.TestCase` class per class or method being tested. The test methods should be as descriptive as possible. For example, `test_forward` is not a good name for a test method. Instead, use `test_forward_with_valid_input` or `test_forward_with_invalid_input`. The test methods should be as independent as possible, but we value coverage over independence.

It is fine to use multiple subtests using `with self.subTest()` to test multiple inputs or edge cases. This is particularly useful when testing a method that has many possible inputs.

It is fine and preferred to use mocking where appropriate. For example, if a method calls an external API, the API call should be mocked. The `unittest.mock` module is very useful for this purpose.
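
The guidelines above can be sketched in a single test file. The functions `scale` and `fetch_scaled`, as well as the `client` object and its `fetch_values` method, are hypothetical stand-ins for the code under test and for an external API. They are not part of Deeplay.

```python
import unittest
from typing import List
from unittest.mock import Mock


def scale(values: List[float], factor: float) -> List[float]:
    """Divide every value by `factor` (hypothetical code under test)."""
    if factor == 0:
        raise ValueError("factor must be non-zero")
    return [value / factor for value in values]


def fetch_scaled(client, factor: float) -> List[float]:
    """Fetch values from an external client and scale them (hypothetical)."""
    return scale(client.fetch_values(), factor)


class TestScale(unittest.TestCase):
    def test_scale_with_valid_input(self):
        cases = [
            ([2.0, 4.0], 2.0, [1.0, 2.0]),
            ([], 3.0, []),
            ([-3.0], 3.0, [-1.0]),
        ]
        for values, factor, expected in cases:
            # Each case is reported separately if it fails.
            with self.subTest(values=values, factor=factor):
                self.assertEqual(scale(values, factor), expected)

    def test_scale_with_zero_factor_raises(self):
        with self.assertRaises(ValueError):
            scale([1.0], 0.0)


class TestFetchScaled(unittest.TestCase):
    def test_fetch_scaled_with_mocked_client(self):
        # The mock replaces the external API, so no real call is made.
        client = Mock()
        client.fetch_values.return_value = [10.0, 20.0]
        self.assertEqual(fetch_scaled(client, 10.0), [1.0, 2.0])
        client.fetch_values.assert_called_once_with()
```

There is one `TestCase` class per function under test, the method names describe the scenario, and the subtests keep related inputs together. A file like this can be run with `python -m unittest`.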