<h1 align = "center">Boilerplate/Template Design</h1>

---

**Objective:** The file provides a simple *boilerplate* to concentrate on what is necessary and to stop repeating the same tasks (`DRY` - Don't Repeat Yourself)! The boilerplate is also configured with certain [**nbextensions**](https://gitlab.com/ZenithClown/computer-configurations-and-setups) that I personally use. Install them if required, else ignore them, as they do not take part in any code optimization. For any new project, *edit* this file or use `File > Make a Copy` to get started. Some settings and configurations are already provided, as mentioned below. In addition, some user-defined modules are available to import. Check `CHANGELOG.md` for more details; however, specific *user-defined* imports may be documented/versioned separately. Any dependent [**`submodule(s)`**](https://www.atlassian.com/git/tutorials/git-submodule) are available under the `../utilities/submodules` directory.

In [1]:
# show current code version using https://semver.org/ convention
# version release information is also available under CHANGELOG.md
with open("../VERSION", "rt") as f: # context manager closes the file handle
    __version__ = f.read().strip() # strip the trailing newline, if any
print(f"Current Code Version: {__version__}")

# the author name is skipped, however copyright is provided as such
# commit level author is available on git commits, and details can be setup
# the template repository is designed to keep code simple, create or edit copyright
__copyright__ = "Copyright © 2023 Debmalya Pramanik"

Current Code Version: v0.1.2


## Code Imports

Code must be written such that it is always _production ready_. The guidelines provided under [**PEP8**](https://peps.python.org/pep-0008/#imports) define the conventional ways of laying out and styling Python code. The necessary guidelines w.r.t. code imports are mentioned below, and the basic libraries and import settings are defined.

 1. Imports should be on separate lines,
 2. Import order should be:
    * standard library/modules,
    * related third party imports,
    * local application/user defined imports
 3. Wildcard imports (`*`) should be avoided, else specifically tagged with **`# noqa: F403`** as per `flake8` or **`# pylint: disable=wildcard-import`** as per `pylint`
 4. Prefer absolute imports; avoid implicit relative imports.

In [2]:
import os   # miscellaneous os interfaces
import sys  # configuring python runtime environment
# import time # library for time manipulation, and logging

In [3]:
# use `datetime` to control and perceive the environment
# in addition `pandas` also provides date time functionalities
import datetime as dt

In [4]:
# from copy import deepcopy      # dataframe is mutable
# from tqdm import tqdm as TQ    # progress bar for loops
# from uuid import uuid4 as UUID # unique identifier for objs

In [5]:
# import warnings # module for warnings management

### Code Debugging & Logging

[**`logging`**](https://docs.python.org/3/howto/logging.html) is a standard python module meant for tracking events that happen during any software/code operation. This module is powerful and helpful for code debugging and other purposes. The configuration section below sets up `logging` to write to the **`../logs/`** directory. Modify the **`LOGS_DIR`** variable under *Global Arguments* to change the default directory. The module is configured with a simple approach, such that any `print()` statement can be updated to `logging.LEVEL_NAME()` and the code will work. Use logging operations like:

```python
 >>> logging.debug("This is a Debug Message.")
 >>> logging.info("This is an Information Message.")
 >>> logging.warning("This is a Warning Message.")
 >>> logging.error("This is an ERROR Message.")
 >>> logging.critical("This is a CRITICAL Message.")
```

Note: some directories related to logging are created by default. This can be updated/changed in the following configuration section.
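As a minimal sketch of the `print()` to `logging` switch described above, the block below configures a file based logger and writes two messages. The temporary directory and the file name `example.log` are assumptions made for illustration, so that nothing is written to the actual `../logs/` path; the format string is also simplified.

```python
import logging
import os
import tempfile

# hypothetical stand-in for `LOGS_DIR`; a temporary directory is used so
# that the sketch does not touch the real `../logs/` directory
LOGS_DIR = tempfile.mkdtemp(prefix = "logs-")
LOG_FILE = os.path.join(LOGS_DIR, "example.log") # hypothetical file name

logging.basicConfig(
    filename = LOG_FILE,
    filemode = "a", # append to the file, if it already exists
    format = "%(asctime)s - %(levelname)s - %(message)s",
    level = logging.DEBUG,
    force = True, # python >= 3.8, replaces handlers configured earlier
)

logging.info("Data loaded.") # instead of print("Data loaded.")
logging.warning("Missing values found.")

logging.shutdown() # flush and close the handlers before reading back
with open(LOG_FILE, "rt") as f:
    lines = f.read().splitlines()

print(f"Messages written to file: {len(lines)}")
```

The `force = True` argument is used here because a notebook session may already have handlers attached to the root logger, in which case `basicConfig()` would otherwise do nothing.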

In [6]:
# import logging # configure logging on `global arguments` section, as file path is required

### Data Analysis and AI/ML Libraries

Import of data analysis and AI/ML libraries required at different stages. Check settings and configurations [here](https://gitlab.com/ZenithClown/computer-configurations-and-setups) and code snippets [here](https://gitlab.com/ZenithClown/computer-configurations-and-setups/-/tree/master/template/snippets/vscode) to understand the settings that are used in this notebook. The code uses `matplotlib.style` with a custom `.mplstyle` file recognised by `matplotlib`, downloadable from [this link](https://gitlab.com/ZenithClown/computer-configurations-and-setups/-/tree/master/settings/python/matplotlib).

In [7]:
# import swifter # https://github.com/jmcarpenter2/swifter
import numpy as np
import pandas as pd

%precision 3
pd.set_option('display.max_rows', 50) # max. rows to show
pd.set_option('display.max_columns', 17) # max. cols to show
np.set_printoptions(precision = 3, threshold = 15) # set np options
pd.options.display.float_format = '{:,.3f}'.format # float precisions

In [8]:
import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline
sns.set_style('whitegrid');
# plt.style.use('default-style'); # http://tinyurl.com/mpl-default-style

In [9]:
# sklearn metrics for analysis can be imported as below
# considering a `regression` problem, rmse is the imported metric
# for rmse, use `squared = False` : https://stackoverflow.com/a/18623635/
# note: newer scikit-learn (>= 1.4) provides `root_mean_squared_error` instead
# from sklearn.metrics import mean_squared_error as MSE

In [10]:
import tensorflow as tf
print(f"Tensorflow Version: {tf.__version__}", end = "\n") # required >= 2.8

# check physical devices, and gpu compute capability (if available)
if len(tf.config.list_physical_devices(device_type = "GPU")):
    # https://stackoverflow.com/q/38009682/6623589
    # https://stackoverflow.com/a/59179238/6623589
    print("GPU Computing Available.", end = " ")
    
    # experimentally, get the gpu details and computation power
    # https://www.tensorflow.org/api_docs/python/tf/config/experimental/get_device_details
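    devices = tf.config.list_physical_devices(device_type = "GPU")[0] # first
    details = tf.config.experimental.get_device_details(devices) # only first
    device_name = details.get("device_name", "unknown") # keys may be missing
    compute_capability = details.get("compute_capability", "unknown")
    print(f"EXPERIMENTAL : {device_name} (compute capability: {compute_capability})")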
else:
    print("GPU Computing Not Available. If `GPU` is present, check configuration. Detected Devices:")
    print("  > ", tf.config.list_physical_devices())

Tensorflow Version: 2.12.0
GPU Computing Not Available. If `GPU` is present, check configuration. Detected Devices:
  >  [PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]


### Additional Libraries

In [11]:
# import xlwings as xw # https://www.xlwings.org/

### User Defined Function(s)

It is recommended that any UDFs are defined outside the scope of the *jupyter notebook* such that development/editing of functions can be done more practically. As per *programming guidelines*, a [`src`](https://fileinfo.com/extension/src) file/directory is beneficial in code development and/or production release. However, a *jupyter notebook* needs a *kernel restart* (or an explicit module reload) if any imported code file is changed on disk; for this reason, frequently changing functions can be defined in this section.
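As an alternative to a kernel restart, an already imported module can be refreshed with the standard library's `importlib.reload()`. The sketch below is only an illustration: the module name `udf_example` and its `scale()` function are hypothetical, and the file is written to a temporary directory instead of `src`.

```python
import importlib
import os
import sys
import tempfile

# skip writing `.pyc` files, so that a quick edit within the same second
# is never masked by a cached bytecode file
sys.dont_write_bytecode = True

# hypothetical module `udf_example.py`, created in a temporary directory
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "udf_example.py")

with open(module_path, "wt") as f:
    f.write("def scale(x):\n    return x * 2\n")

sys.path.append(module_dir) # make the directory importable
import udf_example

first = udf_example.scale(10)

# simulate an edit of the source file on disk
with open(module_path, "wt") as f:
    f.write("def scale(x):\n    return x * 10\n")

importlib.reload(udf_example) # re-execute the module from its source
second = udf_example.scale(10)

print(first, second)
```

In IPython/Jupyter, the `autoreload` extension (`%load_ext autoreload` followed by `%autoreload 2`) automates the same idea, though names imported with `from module import name` may still need care.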

**Getting Started** with **`PYTHONPATH`**

One must know what [Environment Variables](https://medium.com/chingu/an-introduction-to-environment-variables-and-how-to-use-them-f602f66d15fa) are and how to call/use them in the programming language of choice. Note that environment variable names are *case sensitive* on all operating systems except Windows, where they are case insensitive. Generally, environment variables can be accessed from the terminal/shell/command prompt as:

```shell
# macOS/*nix
echo $VARNAME

# windows
echo %VARNAME%
```

Once the system is set up with [`PYTHONPATH`](https://bic-berkeley.github.io/psych-214-fall-2016/using_pythonpath.html), the directories it lists are, as per the [*python documentation*](https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH), added to the search path that any `import` statement looks through in order. If a source code/module is not available, check the necessary environment variables and/or ask the administrator for the source files. For testing purposes, the template uses `src`, `utils` and `config` directories. However, these directories are available at the `ROOT` level, and thus `sys.path.append()` is used to add them before importing.
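A minimal sketch of both ideas, reading `PYTHONPATH` from the environment and extending the search path at runtime, is given below. The `../src` location is an assumption based on the directory layout described in this notebook.

```python
import os
import sys

# `PYTHONPATH` may not be set at all, thus an empty default is used
pythonpath = os.environ.get("PYTHONPATH", "")

# entries are separated by `:` on macOS/*nix and by `;` on windows
entries = [path for path in pythonpath.split(os.pathsep) if path]
print(f"PYTHONPATH entries: {entries}")

# assumed project directory, resolved relative to the working directory
SRC_DIR = os.path.abspath(os.path.join("..", "src"))

# append only once, as re-running a cell would otherwise add duplicates
if SRC_DIR not in sys.path:
    sys.path.append(SRC_DIR)

print(f"`src` is on the search path: {SRC_DIR in sys.path}")
```

Appending keeps the directory at the lowest priority, so a module with the same name installed elsewhere is found first; `sys.path.insert(0, ...)` reverses that order.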

In [12]:
# append `src` and sub-modules to call additional files; these directories are
# project specific and not to be added under environment or $PATH variable
# sys.path.append(os.path.join("..", "src", "agents")) # agents for reinforcement modelling
# sys.path.append(os.path.join("..", "src", "engine")) # derivative engines for model control
# sys.path.append(os.path.join("..", "src", "models")) # actual models for decision making tools

In [13]:
# also append the `utilities` directory for additional helpful codes
# sys.path.append(os.path.join("..", "utilities"))

# you may also want to append the `utilities/submodules` directory
# sys.path.append(os.path.join("..", "utilities", "submodules"))

<div class="alert alert-block alert-danger">
<b>DISCLAIMER:</b> The following codes are designed and created by the <a href = "https://github.com/ZenithClown">author</a> of this repository.
Please read the <a href = "https://github.com/ZenithClown/.github/blob/master/.github/CODE_OF_CONDUCT.md">CODE OF CONDUCT</a> and
<a href = "https://github.com/ZenithClown/.github/blob/master/.github/CONTRIBUTING.md">CONTRIBUTING</a> guidelines for more information.
</div>

<div class="alert alert-block alert-info">
<b>NOTE:</b> More information on Alert Box is available <a href = "https://gist.github.com/DanielKotik/4b81480c479a57e0dd13ac4d153e4451">here</a> for Markdown/Jupyter Notebooks.
</div>

In [14]:
# libraries hosted in pypi
# import nlpurify # natural language utility functions, https://pypi.org/project/nlpurify/
# import pandaswizard as pdw # wrapper function for the pandas, https://pypi.org/project/pandas-wizard/

In [15]:
import sqlparser # https://gist.github.com/ZenithClown/3fc21f94cf9567003b153bcfca738f6d
import datetime_ as dt_ # https://gist.github.com/ZenithClown/d2dd294c5f528459e16b139c04c0b182

## Global Argument(s)

The global arguments are *notebook* specific, however they may also be extended to external libraries and functions on import. The *boilerplate* provides a basic ML directory structure which contains a directory for `data` and a separate directory for `output`. In addition, a separate directory (`data/processed`) is created to save processed dataset such that preprocessing can be avoided.
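The directory structure described above can be created programmatically, as in the following sketch. A temporary directory stands in for the project root here (an assumption for illustration only), so nothing is created inside the actual repository.

```python
import os
import tempfile

# a temporary directory stands in for the project root (`..`)
ROOT = tempfile.mkdtemp()

DATA = os.path.join(ROOT, "data")
PROCESSED_DATA = os.path.join(DATA, "processed")
OUTPUT_DIR = os.path.join(ROOT, "output")

for directory in (DATA, PROCESSED_DATA, OUTPUT_DIR):
    os.makedirs(directory, exist_ok = True) # no error if it already exists

print(sorted(os.listdir(ROOT)))
```

Using `exist_ok = True` makes the cell safe to re-run after a kernel restart.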

In [16]:
ROOT = ".." # the document root is one level up, that contains all code structure
DATA = os.path.join(ROOT, "data") # the directory contains all data files, subdirectory (if any) can also be used/defined

# processed data directory can be used, such that preprocessing steps are not
# required to run again-and-again each time on kernel restart
PROCESSED_DATA = os.path.join(DATA, "processed")

In [17]:
# long projects can be overwhelming, and keeping track of files, outputs and
# saved models can be tricky! to help with this, `today` can be used. for
# instance output can be stored at `output/<today>/` etc.
# `today` is so configured that it permits windows/*nix file/directory names

# also, if used, update the `OUTPUT_DIR` configuration as required

# today = dt.datetime.now().strftime("%a, %b %d %Y") # same format, `time` import is not required
# print(f"Code Execution Started on: {today}") # only date, name of the sub-directory
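As a hedged sketch of the dated output directory idea, the block below builds `today` in the same `"%a, %b %d %Y"` format and checks that it contains none of the characters that windows rejects in file/directory names. The temporary root directory is an assumption for illustration.

```python
import datetime as dt
import os
import tempfile

# same format as used for `today` in this notebook, e.g. "Mon, Jan 02 2023"
today = dt.datetime.now().strftime("%a, %b %d %Y")

# characters that windows does not permit in file/directory names
forbidden = set('<>:"/\\|?*')
is_safe = not (set(today) & forbidden)

# a temporary directory stands in for `ROOT` in this sketch
OUTPUT_DIR = os.path.join(tempfile.mkdtemp(), "output", today)
os.makedirs(OUTPUT_DIR, exist_ok = True)

print(f"Safe directory name: {is_safe}")
```

A format such as `"%Y-%m-%d"` would additionally sort chronologically in a file browser, if that is preferred.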

In [18]:
OUTPUT_DIR = os.path.join(ROOT, "output")

# OUTPUT_DIR = os.path.join(ROOT, "output", today)
# os.makedirs(OUTPUT_DIR, exist_ok = True) # create dir if not exist

# also create directory for `logs`
# LOGS_DIR = os.path.join(ROOT, "logs", __version__.strip()) # reuse the version read at the top
# os.makedirs(LOGS_DIR, exist_ok = True)

In [19]:
# logging.basicConfig(
#     filename = os.path.join(LOGS_DIR, f"{today}.log"), # change `reports` file name
#     filemode = "a", # append logs to existing file, if file exists
#     format = "%(asctime)s - %(name)s - CLASS:%(levelname)s:%(levelno)s:L#%(lineno)d - %(message)s",
#     level = logging.DEBUG
# )

## Model Development & PoC Section

A typical machine learning project revolves around six important stages (as available in the [Amazon ML Life Cycle Documentation](https://docs.aws.amazon.com/wellarchitected/latest/machine-learning-lens/well-architected-machine-learning-lifecycle.html)). This notebook boilerplate can be used to understand the data file, perform statistical tests and other EDA as required for any AI/ML application. Later, using the study below, a *full-fledged* application can be generated using other sections of the boilerplate.

## Reporting & End Note(s)

In [20]:
# wb = xw.Book(os.path.join(ROOT, "template", "template.xlsx"))

# # populate the sheets with sheet selection, and defining output cell, like:
# wb.sheets["sheet"]["cell"].options(header = False, index = False).value = data

# # finally, close and save the object like:
# outfile = f"[{dt.datetime.now().date()} #{str(UUID()).upper()[:3]}] Output File Name.xlsx"
# print(f"Output File Generated as: {outfile}")

# wb.save(os.path.join(OUTPUT_DIR, outfile))
# wb.close()