# Datalabframework

The datalabframework is a productivity framework for ETL and ML applications. It simplifies common data pipeline activities such as project scaffolding, data ingestion, star schema generation, and forecasting.

In [1]:
import datalabframework as dlf
from datalabframework import logging as log

## Logging

A central goal is to keep configuration and code in separate files. A project is essentially a set of correctly configured working directories in which to run and find your notebooks, Python files and configuration files. When a datalabframework project is loaded, it first searches for a `__main__.py` file, following Python module file naming conventions. When such a file is found, its directory becomes the root path of the project. All module and alias paths are relative to the project root path.
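
The root lookup described above can be sketched with the standard library alone. This is a minimal illustration, not the framework's own code: the helper name `find_project_root` is hypothetical, and the upward search direction is an assumption, since the text only says that a `__main__.py` file is searched for.

```python
import os
import tempfile

def find_project_root(start_dir, marker="__main__.py"):
    """Hypothetical helper: walk upward from start_dir until `marker` is found."""
    current = os.path.abspath(start_dir)
    while True:
        if os.path.isfile(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without a match
            return None
        current = parent

# Build a throwaway project layout and resolve its root from a nested directory.
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    nested = os.path.join(root, "notebooks", "etl")
    os.makedirs(nested)
    open(os.path.join(root, "__main__.py"), "w").close()

    found = find_project_root(nested)
    # Paths inside the project can then be expressed relative to the root.
    relative = os.path.relpath(nested, found)
```

Resolving every path against a single discovered root is what lets a notebook run unchanged from any subdirectory of the project.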

### Metadata

Logging can be configured via the `metadata.yml` file. The logging section of the metadata lets you define three types of handlers: a stdout handler, a file handler, and a Kafka handler. The configuration details are shown below:

```
loggers:
    root:
        severity: info

    datalabframework:
        name: dlf
        stdio:
            enable: true
            severity: notice
        file:
            enable: true
            severity: notice
        kafka:
            enable: false
            severity: info
            hosts:
                - kafka-node1:9092
                - kafka-node2:9092
            topic: dlf
```
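
To make the configuration more concrete, here is a rough sketch of how such a section could be turned into standard-library handlers. It is not how the datalabframework does it internally: `build_logger`, the `SEVERITY` mapping and the value 25 for `notice` are all assumptions made for illustration, and the Kafka handler is left out because the standard library does not provide one.

```python
import logging
import sys

# Assumed numeric values; NOTICE = 25 is a hypothetical slot between INFO and WARNING.
SEVERITY = {"info": 20, "notice": 25, "warning": 30, "error": 40, "fatal": 50}

# The `datalabframework` entry of the `loggers` section, written as a dict.
config = {
    "name": "dlf",
    "stdio": {"enable": True, "severity": "notice"},
    "file": {"enable": False, "severity": "notice"},  # disabled so the sketch writes no file
    "kafka": {"enable": False, "severity": "info"},
}

def build_logger(cfg, logfile="dlf.log"):
    """Hypothetical helper: attach one stdlib handler per enabled section of cfg."""
    logger = logging.getLogger(cfg["name"])
    logger.setLevel(logging.DEBUG)  # let each handler apply its own threshold
    logger.handlers.clear()
    if cfg["stdio"]["enable"]:
        stream = logging.StreamHandler(sys.stdout)
        stream.setLevel(SEVERITY[cfg["stdio"]["severity"]])
        logger.addHandler(stream)
    if cfg["file"]["enable"]:
        file_handler = logging.FileHandler(logfile)
        file_handler.setLevel(SEVERITY[cfg["file"]["severity"]])
        logger.addHandler(file_handler)
    # A Kafka handler is omitted here: the standard library has none.
    return logger

logger = build_logger(config)
```

Keeping the threshold on each handler rather than on the logger is what allows, for example, the Kafka sink to receive `info` records while stdout only shows `notice` and above.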

### Logs

Logging via the datalabframework supports five levels:
  - info
  - notice
  - warning
  - error
  - fatal
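
Of these levels, `notice` is the only one without a counterpart in Python's standard `logging` module. The sketch below shows one way such a level could be registered with the standard library. The numeric value 25 and the logger name `levels_demo` are assumptions for illustration, not values taken from the framework.

```python
import io
import logging

NOTICE = 25  # assumed value, between logging.INFO (20) and logging.WARNING (30)
logging.addLevelName(NOTICE, "NOTICE")
# `fatal` needs no registration: logging.FATAL is an alias of logging.CRITICAL.

# Capture output in memory so the result can be inspected.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

demo = logging.getLogger("levels_demo")
demo.propagate = False
demo.handlers.clear()
demo.addHandler(handler)
demo.setLevel(NOTICE)  # everything below NOTICE is filtered out

demo.info("filtered out")
demo.log(NOTICE, "kept")
demo.warning("also kept")

lines = buffer.getvalue().splitlines()
```

With the threshold set to the custom level, `lines` holds the notice and warning records, while the info record is dropped.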

#### No project metadata loaded
Logging works without loading any project metadata configuration, but in that case it uses the default configuration of the Python root logger. By default, the `debug`, `info` and `notice` levels are filtered out. To enable the full functionality, including logging to Kafka and logging custom project information (session id, username, etc.), you must load a project first.

In [3]:
log.debug('debug')
log.info('info')
log.notice('notice')
log.warning('a warning message')
log.error('this is an error')
log.critical('critical condition')

this is an error
critical condition


#### Loading a metadata profile
If a logging configuration is loaded, extra functionality becomes available. In particular, log records include datalabframework-specific information, such as the session id, and data can be passed as a dictionary, optionally with a custom message.

In [4]:
dlf.project.load()

Loading packages:
  -  org.apache.hadoop:hadoop-aws:3.1.1
  -  com.microsoft.sqlserver:mssql-jdbc:6.4.0.jre8
  -  mysql:mysql-connector-java:8.0.12
  -  org.postgresql:postgresql:42.2.5


<datalabframework.project.Project at 0x7faf1fed2e48>

In [5]:
# custom message
dlf.logging.notice('hello world')

NOTICE - run_code - hello world - {}


In [6]:
# custom data
dlf.logging.warning({'test_value':42})



In [7]:
# custom data and message
dlf.logging.warning('custom message', extra={'more':123})



In [8]:
# from a function

def my_nested_function():
    log.warning('another message')
    log.error('custom',extra=[1,2,3])
    
def my_function():
    log.notice({'a':'text', 'b':2})
    my_nested_function()
    
my_function()

NOTICE - my_function - data - {'b': 2, 'a': 'text'}
ERROR - my_nested_function - custom - [1, 2, 3]
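
The `LEVEL - function - message - data` layout shown above can be approximated with the standard `logging` module. The following is only a sketch under assumptions: `emit` is a hypothetical helper, the `data` record attribute is an invented name, and the rule that a dictionary passed as the message is logged under the message `data` is inferred from the output above rather than from the framework's source. It requires Python 3.8 or later for the `stacklevel` argument.

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(
    logging.Formatter("%(levelname)s - %(funcName)s - %(message)s - %(data)s")
)

sketch = logging.getLogger("format_demo")
sketch.propagate = False
sketch.handlers.clear()
sketch.addHandler(handler)
sketch.setLevel(logging.DEBUG)

def emit(level, message, data=None):
    """Hypothetical helper: a dict passed as the message becomes the data payload."""
    if isinstance(message, dict):
        message, data = "data", message
    payload = {} if data is None else data
    # stacklevel=2 makes %(funcName)s report the caller of emit(), not emit() itself.
    sketch.log(level, message, extra={"data": payload}, stacklevel=2)

def my_nested_function():
    emit(logging.ERROR, "custom", data=[1, 2, 3])

def my_function():
    emit(logging.WARNING, {"a": "text", "b": 2})
    my_nested_function()

my_function()
lines = buffer.getvalue().splitlines()
```

Passing the payload through `extra` keeps the message and the structured data separate, so a formatter or a downstream consumer can handle each part independently.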
