<h1 align = "center">BOILERPLATE</h1>

---

**Objective:** This file provides a simple *boilerplate* so that you can concentrate on what is necessary and stop repeating the same setup tasks. The boilerplate is also configured with certain [**nbextensions**](https://gitlab.com/ZenithClown/computer-configurations-and-setups) that I personally use. Install them if required; otherwise ignore them, as they play no part in any code optimization. For any new competition, *edit* this file or use `File > Make a Copy` to get started with the project. Some settings and configurations are already provided, as described below.

**Table of Contents**

Generally, I use [**`Table of Contents (2)`**](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/toc2/README.html) to easily create and manage my notebook sections. The *pre-defined* sections of this notebook are explained here:
 1. **Code Imports** - As per [PEP8 Conventions](https://riptutorial.com/python/example/11956/pep8-rules-for-imports), all imports should be placed at the beginning of the file. This section is meant for that.
 2. **Global Arguments** - These are defined to control the project flow, for example a **`PROJECT_CODE`** that serves as an identifier for each sub-competition.
 3. **Read & Process Input File(s)** - Process the input file(s) and prepare them for any ML/AI/Analysis model.

In [11]:
# show current code version
# use https://semver.org/
# the VERSION file keeps track of the progress of each
# individual project/competition
# the actual tag is represented as: <PROJECT_CODE>:<version>
open("VERSION", 'rt').read() # bump codecov

'development #semver-2.0.0'
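
The `<PROJECT_CODE>:<version>` tag mentioned in the comments above can be sketched as follows. This is only an illustration, not part of the template: it writes a throwaway `VERSION` file into a temporary directory so that nothing in the project is touched, and the helper name `read_version` is hypothetical.

```python
import tempfile
from os.path import join

PROJECT_CODE = "BOILERPLATE"  # same value the notebook defines under global arguments


def read_version(path):
    """Return the content of a VERSION file without surrounding whitespace."""
    with open(path, "rt") as handle:  # the context manager closes the file handle
        return handle.read().strip()


with tempfile.TemporaryDirectory() as workdir:
    version_file = join(workdir, "VERSION")
    with open(version_file, "wt") as handle:
        handle.write("development #semver-2.0.0\n")  # sample content with a trailing newline

    version = read_version(version_file)

tag = f"{PROJECT_CODE}:{version}"
print(tag)
```

Stripping the content is a design choice of this sketch: a stray newline at the end of `VERSION` would otherwise end up inside the tag.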

## Code Imports

**PEP8 Style Guide** lists out the following *guidelines* for imports:
 1. Imports should be on separate lines,
 2. Import order should be:
    * standard library/modules,
    * related third party imports,
    * local application/user defined imports
 3. Wildcard imports (`*`) should be avoided; if one is unavoidable, tag it explicitly with **`# noqa: F403`** as per `flake8`
 4. Prefer absolute imports; explicit relative imports are an acceptable alternative, while implicit relative imports should never be used.
 
See the [PEP8 section on imports](https://peps.python.org/pep-0008/#imports) for more details. Note that the actual `flake8` configuration file is currently missing from the template and will be added later if required. In addition, the `logging` module is imported and configured.
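
The import guidelines listed above can be sketched as a short module header. The third-party and local imports are commented out so that the sketch runs anywhere, and `my_project` is a hypothetical package name used only for illustration.

```python
# 1. standard library modules, one import per line
import os
import sys
from pathlib import Path

# 2. related third-party imports (commented out so the sketch runs without them)
# import pandas as pd

# 3. local application / user-defined imports (`my_project` is a hypothetical package)
# from my_project import helpers

# a wildcard import, if it cannot be avoided, carries the flake8 tag:
# from os.path import *  # noqa: F403

current_dir = Path(os.getcwd()).name
print(f"Running Python {sys.version_info.major} in `{current_dir}`")
```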

[**`logging`**](https://docs.python.org/3/howto/logging.html) is a standard Python module for tracking events that happen while software/code runs. It is especially helpful for debugging. The next section defines a `logging` configuration that writes to the **`/logs/`** directory. Each project gets its own `<PROJECT_CODE>/<VERSION>/<DATE>.log` file. The directory is created automatically if it does not exist. Use logging operations like:

```python
>>> logging.debug("This is a Debug Message.")
>>> logging.info("This is an Information Message.")
>>> logging.warning("This is a Warning Message.")
>>> logging.error("This is an ERROR Message.")
>>> logging.critical("This is a CRITICAL Message.")
```
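
To see how the five levels relate to each other, here is a small sketch that sends the same messages to an in-memory stream with a `WARNING` threshold. It is an illustration only: the logger name `levels_demo` and the short format string are assumptions of the sketch, and no log file is written.

```python
import io
import logging

stream = io.StringIO()  # in-memory target, so no log file is written
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the demo records away from the root logger
logger.setLevel(logging.WARNING)  # records below WARNING are dropped

logger.debug("This is a Debug Message.")
logger.info("This is an Information Message.")
logger.warning("This is a Warning Message.")
logger.error("This is an ERROR Message.")
logger.critical("This is a CRITICAL Message.")

emitted = stream.getvalue().splitlines()
print(emitted)  # only the WARNING, ERROR and CRITICAL records remain
```

With the threshold set to `logging.DEBUG`, all five messages would be kept.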

In [6]:
import logging # configured in the `global arguments` section

In [4]:
from time import ctime # will be used in logging, file/output directory creation etc.
from os import makedirs # create directories dynamically, if not already created manually
from os.path import join # keep directories `os`-independent
from copy import deepcopy # `pd.DataFrame` is mutable, so any `df` operation may need `deepcopy`
from tqdm import tqdm as TQ # provide progress bar for code completions
from uuid import uuid1 as UUID # keep output file name unique
from datetime import datetime as dt # formatting datetime objects
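
Why a mutable object may need `deepcopy` can be sketched with a plain nested dictionary standing in for a `pd.DataFrame`, so that the example runs without pandas. The variable names are hypothetical. For actual dataframes, `DataFrame.copy()` (deep by default) is an alternative.

```python
from copy import deepcopy

# a nested, mutable structure standing in for a table of values
table = {"score": [1, 2, 3]}

alias = table                  # plain assignment: both names point at the same object
shallow = dict(table)          # shallow copy: new dict, but the inner list is shared
independent = deepcopy(table)  # deep copy: the inner list is duplicated too

table["score"].append(4)       # mutate the original in place

print(alias["score"])        # [1, 2, 3, 4]
print(shallow["score"])      # [1, 2, 3, 4]
print(independent["score"])  # [1, 2, 3]
```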

In [5]:
# import numpy as np
import pandas as pd

## Define Global Arguments

In [7]:
# a single project can have multiple sub-projects and/or output
# generally, each sub-project has its own `notebook` and code files
# use the `PROJECT_CODE` tag to create a directory of the format
# <execution date>/<PROJECT_CODE> thus giving a unique identity to
# each run of code. Once defined, keep this code the same throughout.
# this code can also be used for keeping track of progress at the
# sub-project level.
PROJECT_CODE = "BOILERPLATE"

In [8]:
ROOT = "." # current directory
DATA = join(ROOT, "data")

In [9]:
# define output directory
# this is named after the current date
# `today` is formatted so that it is a valid file/directory name on both windows and *nix
today = dt.strftime(dt.strptime(ctime(), "%a %b %d %H:%M:%S %Y"), "%a, %b %d %Y")

print(f"Code Execution Started on: {today}") # only date

Code Execution Started on: Wed, Apr 06 2022
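
The reason for keeping only the date can be sketched as below: the time part of a timestamp contains `:`, which Windows does not accept in file/directory names. The fixed timestamp and the names `FORBIDDEN`, `full` and `date_only` are assumptions made for this illustration.

```python
from datetime import datetime

FORBIDDEN = set('<>:"/\\|?*')  # characters Windows rejects in file/directory names

stamp = datetime(2022, 4, 6, 14, 30, 5)  # fixed timestamp so the result is reproducible

full = stamp.strftime("%a %b %d %H:%M:%S %Y")  # the time part contains `:`
date_only = stamp.strftime("%a, %b %d %Y")     # same format as used for `today`

print(full, "->", sorted(FORBIDDEN & set(full)))
print(date_only, "->", sorted(FORBIDDEN & set(date_only)))
```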


In [12]:
OUTPUT_DIR = join(ROOT, "output", today, PROJECT_CODE)
makedirs(OUTPUT_DIR, exist_ok = True) # create dir if not exist

# also create directory for `logs`
LOGS_DIR = join("/", "logs", PROJECT_CODE, open("VERSION", 'rt').read().strip()) # strip stray whitespace/newline
makedirs(LOGS_DIR, exist_ok = True)

In [13]:
logging.basicConfig(
    filename = join(LOGS_DIR, f"{today}.log"), # change `reports` file name
    filemode = "a", # append logs to existing file, if file exists
    format = "%(asctime)s - %(name)s - CLASS:%(levelname)s:%(levelno)s:L#%(lineno)d - %(message)s",
    level = logging.DEBUG
)
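
What a record looks like with this format string can be sketched by sending one message to an in-memory stream instead of a file. This is an illustration only; the logger name `format_demo` and the message text are hypothetical.

```python
import io
import logging

# same format string as in the configuration above
LOG_FORMAT = "%(asctime)s - %(name)s - CLASS:%(levelname)s:%(levelno)s:L#%(lineno)d - %(message)s"

stream = io.StringIO()  # capture the record in memory instead of a log file
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter(LOG_FORMAT))

logger = logging.getLogger("format_demo")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the demo record away from the root logger
logger.setLevel(logging.DEBUG)

logger.info("Output file created.")

record = stream.getvalue().strip()
print(record)  # timestamp, logger name, level name and number, line number, message
```

Because `basicConfig` configures the root logger, calls such as `logging.info(...)` show `root` in the `%(name)s` field.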

In [14]:
# set/change output file name
OUTPUT_FILE = f"{UUID()}.xlsx" # unique name from a time-based UUID (version 1)

# log/inform users of current output file name
logging.info(f"Output File : {join(OUTPUT_DIR, OUTPUT_FILE)}")
print(f"Output File : {join(OUTPUT_DIR, OUTPUT_FILE)}") # use this syntax

Output File : .\output\Wed, Apr 06 2022\BOILERPLATE\1ab9acd9-b5a4-11ec-92d7-5405db104a4e.xlsx
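
A quick sketch of why `uuid1` keeps output file names unique: it derives the value from the current time and a node identifier (usually the MAC address) rather than from random bits. The list `names` is a hypothetical variable, and `uuid4` is shown only for comparison.

```python
from uuid import uuid1, uuid4

names = [f"{uuid1()}.xlsx" for _ in range(5)]  # same pattern as OUTPUT_FILE

print(len(set(names)) == len(names))  # every generated name is distinct

first = uuid1()
print(first.version)    # 1: built from a timestamp and a node identifier
print(uuid4().version)  # 4: built from random bits
```

If exposing the node identifier in file names is a concern, `uuid4` could be swapped in without changing the rest of the code.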


In [10]:
INPUT_FILENAME = join(DATA, "<give-file-name-here-with-extension>")

## Read & Process Input File(s)

In [None]:
data = pd.read_excel(INPUT_FILENAME)
data.sample()