# Describe Function
Compute a table's summary statistics and generate summary plots.

    :param context:         the function context.
    :param table:           MLRun input pointing to a pandas DataFrame (CSV or Parquet file path).
    :param label_column:    name of the ground-truth (label) column.
    :param class_labels:    label for each class, used in tables and plots.
    :param plot_hist:       whether to plot histograms (default True); set this to False for large tables.
    :param plots_dest:      destination folder for the summary plots (relative to artifact_path).
    :param update_dataset:  when the table is a registered dataset, update its charts in place.

## Output artifacts

The function outputs the following artifacts, computed from the columns of the DataFrame (depending on their data types):
1. histogram chart
2. violin chart
3. imbalance chart
4. correlation-matrix chart
5. correlation-matrix csv
6. imbalance-weights-vec csv
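
To make the two tabular artifacts more concrete, here is a minimal pandas sketch of how a correlation matrix and a per-class weight vector could be derived from a table. It is an illustration only: the toy data and variable names are hypothetical, and the assumption that the weights are normalized class frequencies is ours, not something this page confirms about `describe`.

```python
import pandas as pd

# Hypothetical toy table; "label" plays the role of label_column.
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 6.3, 5.8, 7.1],
        "sepal_width": [3.5, 3.0, 3.3, 2.7, 3.0],
        "label": [0, 0, 1, 1, 1],
    }
)

label_column = "label"
features = df.drop(columns=[label_column])

# Pairwise Pearson correlation between the feature columns (square, symmetric table).
corr_matrix = features.corr()

# Assumption: class weights are the share of rows in each class, ordered by class value.
class_weights = df[label_column].value_counts(normalize=True).sort_index()

# Both tables can be serialized as CSV text.
corr_csv = corr_matrix.to_csv()
weights_csv = class_weights.to_csv()
```

A strongly skewed `class_weights` vector is the kind of signal an imbalance chart is meant to surface.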


### MLconfig

In [1]:
import os
import mlrun

mlrun.set_environment(
    api_path="http://mlrun-api:8080", artifact_path=os.path.abspath("./")
)

('default', '/User/functions/describe')

## Save

In [2]:
import yaml

with open("item.yaml") as item_file:
    # safe_load only builds plain Python objects, which is all item.yaml needs
    items = yaml.safe_load(item_file)

# create a job function object from the source file named in item.yaml
fn = mlrun.code_to_function(
    items["name"],
    kind=items["spec"]["kind"],
    handler=items["spec"]["handler"],
    filename=items["spec"]["filename"],
    image=items["spec"]["image"],
    description=items["description"],
    categories=items["categories"],
    labels=items["labels"],
    requirements=items["spec"]["requirements"],
)

fn.export("describe.yaml")

> 2021-02-18 07:42:48,095 [info] function spec saved to path: describe.yaml


<mlrun.runtimes.kubejob.KubejobRuntime at 0x7f2d4f12ea50>
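
The cell above reads a fixed set of keys from `item.yaml`. The sketch below lists those keys and checks a parsed dictionary for them before it is handed to `mlrun.code_to_function`. The key names are taken from the cell; every value in the sample dictionary is a placeholder, and `missing_keys` is a hypothetical helper, not part of MLRun.

```python
# Hypothetical stand-in for the parsed contents of item.yaml.
# All values are placeholders chosen for illustration.
items = {
    "name": "describe",
    "description": "placeholder description",
    "categories": [],
    "labels": {},
    "spec": {
        "kind": "job",
        "handler": "summarize",
        "filename": "describe.py",
        "image": "placeholder/image",
        "requirements": [],
    },
}

# Keys read by the cell that builds the function object.
REQUIRED_TOP = ("name", "description", "categories", "labels", "spec")
REQUIRED_SPEC = ("kind", "handler", "filename", "image", "requirements")


def missing_keys(parsed):
    """Return dotted names of the expected keys that `parsed` lacks."""
    missing = [key for key in REQUIRED_TOP if key not in parsed]
    spec = parsed.get("spec", {})
    missing += [f"spec.{key}" for key in REQUIRED_SPEC if key not in spec]
    return missing
```

Running such a check first turns a `KeyError` in the middle of the cell into a readable list of what the file is missing.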

## Examples

In [3]:
fn.apply(mlrun.platforms.auto_mount())

<mlrun.runtimes.kubejob.KubejobRuntime at 0x7f2d4f12ea50>

In [4]:
from describe import summarize

DATA_URL = "https://s3.wasabisys.com/iguazio/data/iris/iris_dataset.csv"

task = mlrun.NewTask(
    name="tasks-describe",
    handler=summarize,
    inputs={"table": DATA_URL},
    params={"update_dataset": True, "label_column": "label"},
)
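
The task above sets only `update_dataset` and `label_column`. As a rough sketch, the other documented parameters could be supplied through the same `params` dictionary. The parameter names come from the list at the top of this page; the values, and the small name check, are illustrative assumptions.

```python
# Parameter names documented for the function (context and table are not passed as params).
DOCUMENTED_PARAMS = {
    "label_column",
    "class_labels",
    "plot_hist",
    "plots_dest",
    "update_dataset",
}

# Illustrative values only.
params = {
    "label_column": "label",
    "plot_hist": False,      # skip histograms, as suggested for large tables
    "plots_dest": "plots",   # hypothetical folder name, relative to artifact_path
    "update_dataset": False,
}

# Catch misspelled parameter names before the task is created.
unknown = set(params) - DOCUMENTED_PARAMS

# The dictionary could then be passed as params=params to mlrun.NewTask.
```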

### Run Locally

In [5]:
run = mlrun.run_local(task)

> 2021-02-18 07:42:48,687 [info] starting run tasks-describe uid=1d606b11b11a4559964dde6419ca2329 DB=http://mlrun-api:8080


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...19ca2329 | 0 | Feb 18 07:42:48 | completed | tasks-describe | v3io_user=eyals, kind=handler, owner=eyals, host=jupyter-eyals-666bf556fc-5v7bf | table | update_dataset=True, label_column=label | | histograms, violin, imbalance, imbalance-weights-vec, correlation-matrix, correlation |


to track results use .show() or .logs() or in CLI:
!mlrun get run 1d606b11b11a4559964dde6419ca2329 --project default
!mlrun logs 1d606b11b11a4559964dde6419ca2329 --project default
> 2021-02-18 07:42:53,632 [info] run executed, status=completed


### Run Remotely

In [6]:
fn.run(task, inputs={"table": DATA_URL})

> 2021-02-18 07:42:53,637 [info] starting run tasks-describe uid=967f203d8e7647a0b5ff988694565988 DB=http://mlrun-api:8080
> 2021-02-18 07:42:53,790 [info] Job is running in the background, pod: tasks-describe-r2xlr
> 2021-02-18 07:43:02,614 [info] run executed, status=completed
final state: completed


| project | uid | iter | start | state | name | labels | inputs | parameters | results | artifacts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| default | ...94565988 | 0 | Feb 18 07:42:58 | completed | tasks-describe | v3io_user=eyals, kind=job, owner=eyals, host=tasks-describe-r2xlr | table | update_dataset=True, label_column=label | | histograms, violin, imbalance, imbalance-weights-vec, correlation-matrix, correlation |


to track results use .show() or .logs() or in CLI:
!mlrun get run 967f203d8e7647a0b5ff988694565988 --project default
!mlrun logs 967f203d8e7647a0b5ff988694565988 --project default
> 2021-02-18 07:43:02,949 [info] run executed, status=completed


<mlrun.model.RunObject at 0x7f2d4626a450>