# Using Draco for Visualization Design Space Exploration

In this example, we use Draco to explore the visualization design space for the Seattle weather dataset.
Starting with nothing but a raw dataset, we use the reusable building blocks that Draco provides to generate a wide range
of recommendations, and then investigate the produced designs using the debugger module.

In [1]:
# Suppressing warnings raised by altair in the background
# (iteration-related deprecation warnings)
import warnings

warnings.filterwarnings("ignore")

In [2]:
# Display utilities
from pprint import pprint
from IPython.display import display

## Loading the Data

We will use the Seattle weather dataset from the [Vega Datasets](https://vega.github.io/vega-datasets/) for this example.

In [3]:
import draco as drc
import pandas as pd
from vega_datasets import data as vega_data

# Loading data to be explored
df: pd.DataFrame = vega_data.seattle_weather()
df.head()

| | date | precipitation | temp_max | temp_min | wind | weather |
|---|---|---|---|---|---|---|
| 0 | 2012-01-01 | 0.0 | 12.8 | 5.0 | 4.7 | drizzle |
| 1 | 2012-01-02 | 10.9 | 10.6 | 2.8 | 4.5 | rain |
| 2 | 2012-01-03 | 0.8 | 11.7 | 7.2 | 2.3 | rain |
| 3 | 2012-01-04 | 20.3 | 12.2 | 5.6 | 4.7 | rain |
| 4 | 2012-01-05 | 1.3 | 8.9 | 2.8 | 6.1 | rain |


We can use the `schema_from_dataframe` function to generate the schema of the dataset, which records the data type of each column along with its statistical properties.

In [4]:
data_schema = drc.schema_from_dataframe(df)
pprint(data_schema)

{'field': [{'entropy': 7287,
            'name': 'date',
            'type': 'datetime',
            'unique': 1461},
           {'entropy': 2422,
            'max': 55,
            'min': 0,
            'name': 'precipitation',
            'std': 6,
            'type': 'number',
            'unique': 111},
           {'entropy': 3934,
            'max': 35,
            'min': -1,
            'name': 'temp_max',
            'std': 7,
            'type': 'number',
            'unique': 67},
           {'entropy': 3596,
            'max': 18,
            'min': -7,
            'name': 'temp_min',
            'std': 5,
            'type': 'number',
            'unique': 55},
           {'entropy': 3950,
            'max': 9,
            'min': 0,
            'name': 'wind',
            'std': 1,
            'type': 'number',
            'unique': 79},
           {'entropy': 1201,
            'freq': 714,
            'name': 'weather',
            'type': 'string',
            'unique': 5}
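
The `entropy` values in the schema are integers, which suggests a scaled statistic. As a hedged illustration (an inference from the output above, not a description of Draco's internals), the sketch below computes the Shannon entropy of a column in nats and scales it by 1000. For a column whose 1461 values are all distinct, such as `date`, this reproduces the 7287 shown above. The helper name `scaled_entropy` and the scaling factor are assumptions.

```python
import math
from collections import Counter


def scaled_entropy(values, scale=1000):
    """Shannon entropy (natural log) of `values`, scaled and rounded to an int.

    Hypothetical helper: the scaling by 1000 is an assumption inferred from
    the schema output, not taken from Draco's source code.
    """
    counts = Counter(values)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return round(entropy * scale)


# A column with 1461 distinct values, like `date` in the weather dataset
print(scaled_entropy(range(1461)))  # 7287
```

A column with few distinct values that are unevenly distributed, such as `weather`, yields a much lower value, which is consistent with the schema above.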

We transform the data schema into a set of facts that Draco can use to reason about the data when generating recommendations. To do so, we use the `dict_to_facts` function, which takes a dictionary and returns a list of facts.
The output list encodes the same information as the input dictionary; it is simply a different representation that can be fed into [Clingo](https://potassco.org/clingo/) under the hood.

In [5]:
data_schema_facts = drc.dict_to_facts(data_schema)
pprint(data_schema_facts)

['attribute(number_rows,root,1461).',
 'entity(field,root,0).',
 'attribute((field,name),0,date).',
 'attribute((field,type),0,datetime).',
 'attribute((field,unique),0,1461).',
 'attribute((field,entropy),0,7287).',
 'entity(field,root,1).',
 'attribute((field,name),1,precipitation).',
 'attribute((field,type),1,number).',
 'attribute((field,unique),1,111).',
 'attribute((field,entropy),1,2422).',
 'attribute((field,min),1,0).',
 'attribute((field,max),1,55).',
 'attribute((field,std),1,6).',
 'entity(field,root,2).',
 'attribute((field,name),2,temp_max).',
 'attribute((field,type),2,number).',
 'attribute((field,unique),2,67).',
 'attribute((field,entropy),2,3934).',
 'attribute((field,min),2,-1).',
 'attribute((field,max),2,35).',
 'attribute((field,std),2,7).',
 'entity(field,root,3).',
 'attribute((field,name),3,temp_min).',
 'attribute((field,type),3,number).',
 'attribute((field,unique),3,55).',
 'attribute((field,entropy),3,3596).',
 'attribute((field,min),3,-7).',
 'attribute(
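
To make the mapping between the dictionary and the facts more tangible, here is a simplified sketch of such a conversion. It is not Draco's implementation: the function name `sketch_dict_to_facts` is hypothetical, and it only handles scalar values and lists of dictionaries. Scalars become `attribute` facts, and each dictionary inside a list becomes an `entity` with a running numeric identifier.

```python
from itertools import count


def sketch_dict_to_facts(data: dict) -> list[str]:
    """Flatten a nested dict into entity/attribute fact strings.

    Illustrative only; assumes values are scalars or lists of dicts.
    """
    ids = count()  # running entity identifiers: 0, 1, 2, ...
    facts: list[str] = []

    def walk(node: dict, node_id, node_type) -> None:
        # Emit scalar properties of the current node first
        for key, value in node.items():
            if isinstance(value, list):
                continue
            prop = key if node_type is None else f"({node_type},{key})"
            facts.append(f"attribute({prop},{node_id},{value}).")
        # Then declare each nested dict as a child entity and recurse
        for key, value in node.items():
            if isinstance(value, list):
                for child in value:
                    child_id = next(ids)
                    facts.append(f"entity({key},{node_id},{child_id}).")
                    walk(child, child_id, key)

    walk(data, "root", None)
    return facts


# A reduced version of the schema above, with a single field
schema = {
    "number_rows": 1461,
    "field": [
        {"name": "date", "type": "datetime", "unique": 1461, "entropy": 7287},
    ],
}
for fact in sketch_dict_to_facts(schema):
    print(fact)
```

The printed facts follow the same `entity(type,parent,id).` and `attribute((type,property),id,value).` pattern as the output of `dict_to_facts`.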

## Generating Recommendations

We start by defining `input_spec_base`, a list of facts comprising the data schema, a single view, and a single mark.
This is the minimal set of facts that Draco needs to generate recommendations that can be rendered into charts.

We instantiate a `Draco` object, using the default knowledge base, and an `AltairRenderer` object which will be used to render the recommendations into Vega-Lite charts.

In [6]:
from draco.renderer import AltairRenderer

input_spec_base = data_schema_facts + [
    "entity(view,root,v0).",
    "entity(mark,v0,m0).",
]
d = drc.Draco()
renderer = AltairRenderer()

We can now use the `complete_spec` method of the `Draco` object to generate recommendations from incomplete specifications.
The function below is a reusable utility for this example, responsible for generating, rendering and displaying the recommendations.

In [7]:
def recommend_charts(
    spec: list[str], num: int = 5, labeler=lambda i: f"CHART {i+1}"
) -> dict[str, list[str]]:
    """Generate, render and display up to `num` completions of `spec`."""
    # Dictionary to store the generated recommendations, keyed by chart name
    chart_specs = {}
    for i, model in enumerate(d.complete_spec(spec, num)):
        chart_name = labeler(i)
        # Use a separate name so that the `spec` argument is not shadowed
        chart_spec = drc.answer_set_to_dict(model.answer_set)
        # Store each recommendation as a list of facts
        chart_specs[chart_name] = drc.dict_to_facts(chart_spec)

        print(chart_name)
        print(f"COST: {model.cost}")
        display(renderer.render(spec=chart_spec, data=df))

    return chart_specs

We use `input_spec_base` as the starting point for our exploration: we specify only the data schema and that the recommendations must have at least one view and one mark.

In [8]:
input_spec = input_spec_base
rec = recommend_charts(input_spec)

CHART 1
COST: [3]


CHART 2
COST: [4]


CHART 3
COST: [4]


CHART 4
COST: [4]


CHART 5
COST: [5]


While the above recommendations are valid, they are not very diverse. We can extend the input specification to describe more precisely the design space we want recommendations for.
Suppose we want the fields `date` and `temp_max` of the weather dataset to be encoded in the charts.
We also specify that the chart should be faceted, using the `col` channel for the facet.
Note that we specify neither the mark type, nor the encoding channels for the fields, nor the field to facet by. We leave these decisions to Draco and its underlying knowledge base.

In [9]:
input_spec = input_spec_base + [
    # We want to encode the `date` field
    "entity(encoding,m0,e0).",
    "attribute((encoding,field),e0,date).",
    # We want to encode the `temp_max` field
    "entity(encoding,m0,e1).",
    "attribute((encoding,field),e1,temp_max).",
    # We want the chart to be a faceted chart
    "entity(facet,v0,f0).",
    "attribute((facet,channel),f0,col).",
]
rec = recommend_charts(input_spec, 5)

CHART 1
COST: [16]


CHART 2
COST: [16]


CHART 3
COST: [17]


CHART 4
COST: [17]


CHART 5
COST: [17]
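
Writing partial specifications by hand becomes repetitive as more encodings are added. The sketch below shows one way to build the same facts with small helpers. The functions `encoding_facts` and `facet_facts` are hypothetical and not part of Draco; they only format strings.

```python
def encoding_facts(enc_id: str, field: str, mark_id: str = "m0") -> list[str]:
    """Facts declaring an encoding of `field` on mark `mark_id` (hypothetical helper)."""
    return [
        f"entity(encoding,{mark_id},{enc_id}).",
        f"attribute((encoding,field),{enc_id},{field}).",
    ]


def facet_facts(facet_id: str, channel: str, view_id: str = "v0") -> list[str]:
    """Facts declaring a facet on `channel` for view `view_id` (hypothetical helper)."""
    return [
        f"entity(facet,{view_id},{facet_id}).",
        f"attribute((facet,channel),{facet_id},{channel}).",
    ]


# Same partial specification as above: two encoded fields and a column facet
partial_spec = (
    encoding_facts("e0", "date")
    + encoding_facts("e1", "temp_max")
    + facet_facts("f0", "col")
)
print("\n".join(partial_spec))
```

Appending `partial_spec` to `input_spec_base` should give the same list of facts as the hand-written `input_spec` above.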


## Debugging Recommendations

We can use the `DracoDebug` class to investigate the recommendations generated by Draco and see which soft constraints they violate.
We start by instantiating a `DracoDebug` object, passing the recommendations and the `Draco` object used to generate them.
Its `chart_preferences` property returns a `DataFrame` that lists, for each recommendation, every soft constraint together with the number of times the recommendation violates it (`count`) and the weight associated with the constraint.

In [10]:
debugger = drc.DracoDebug(specs=rec, draco=d)
chart_preferences = debugger.chart_preferences
chart_preferences.head()

| | chart_name | pref_name | pref_description | count | weight |
|---|---|---|---|---|---|
| 0 | CHART 1 | cartesian_coordinate | Cartesian coordinates. | 1 | 0 |
| 1 | CHART 1 | summary_point | Point mark for summary tasks. | 1 | 0 |
| 2 | CHART 1 | linear_y | Linear scale with y channel. | 1 | 0 |
| 3 | CHART 1 | linear_x | Linear scale with x channel. | 1 | 0 |
| 4 | CHART 1 | c_c_point | Continuous by continuous for point mark. | 1 | 0 |


Let's take a look at the number of violated preferences:

In [11]:
num_violations = len(
    set(chart_preferences[chart_preferences["count"] != 0]["pref_name"])
)
num_all = len(set(chart_preferences["pref_name"]))
print(
    f"{num_violations} preferences are violated out of a total of {num_all} preferences (soft constraints)"
)

19 preferences are violated out of a total of 147 preferences (soft constraints)
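
The `count` and `weight` columns can also be combined per chart. The sketch below assumes that a chart's reported cost is the sum of `count * weight` over its preferences. This is the usual way weighted soft constraints are aggregated, but it is worth verifying against your Draco version. The rows are made-up values with the same columns as `chart_preferences`. Similar records could be obtained with `chart_preferences.to_dict("records")`.

```python
from collections import defaultdict

# Made-up rows shaped like the records of `chart_preferences`
rows = [
    {"chart_name": "CHART A", "pref_name": "linear_x", "count": 1, "weight": 0},
    {"chart_name": "CHART A", "pref_name": "aggregate_mean", "count": 1, "weight": 1},
    {"chart_name": "CHART A", "pref_name": "ordinal_color", "count": 2, "weight": 8},
    {"chart_name": "CHART B", "pref_name": "linear_x", "count": 0, "weight": 0},
    {"chart_name": "CHART B", "pref_name": "aggregate_mean", "count": 3, "weight": 1},
]


def weighted_cost_per_chart(records):
    """Sum count * weight for each chart (assumed cost aggregation)."""
    costs = defaultdict(int)
    for row in records:
        costs[row["chart_name"]] += row["count"] * row["weight"]
    return dict(costs)


print(weighted_cost_per_chart(rows))  # {'CHART A': 17, 'CHART B': 3}
```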


To get a better overview of the soft constraint violations, we can use the `DracoDebugPlotter` class to visualize the debug `DataFrame` produced by `DracoDebug`.

In [12]:
plotter = drc.DracoDebugPlotter(chart_preferences, plot_size=(800, 400))
chart_config = drc.DracoDebugPlotter.__DEFAULT_CONFIGS__[1]
plotter.create_chart(chart_config, violated_prefs_only=True)

## Generating Input Specifications Programmatically

To get a better impression of the space of possible visualizations and to produce examples that might be covered by more soft constraints, we can generate further input specifications programmatically.
We define lists of possible values for the mark type, fields, and encoding channels that we want to appear in the recommendations, and combine them using a nested list comprehension.
We also filter out designs with fewer than 3 encodings and exclude multi-layer or multi-view designs for now.
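
The two filters are written as ASP integrity constraints, the lines beginning with `:-` in the cell below. An integrity constraint rejects every candidate answer for which its body holds. As a rough Python analogy (the solver applies these rules during search rather than as a post-filter, and the helper names here are hypothetical), the checks amount to counting entity facts:

```python
def count_entities(facts: list[str], entity_type: str) -> int:
    """Count `entity(<entity_type>,_,_).` facts in a list of fact strings."""
    prefix = f"entity({entity_type},"
    return sum(fact.startswith(prefix) for fact in facts)


def passes_filters(facts: list[str]) -> bool:
    # ":- {entity(encoding,_,_)} <= 2." rejects answers with 2 or fewer encodings
    if count_entities(facts, "encoding") <= 2:
        return False
    # ":- {entity(mark,_,_)} >= 2." rejects answers with 2 or more marks
    if count_entities(facts, "mark") >= 2:
        return False
    return True


candidate = [
    "entity(view,root,v0).",
    "entity(mark,v0,m0).",
    "entity(encoding,m0,e0).",
    "entity(encoding,m0,e1).",
]
print(passes_filters(candidate))  # False: only two encodings
print(passes_filters(candidate + ["entity(encoding,m0,e2)."]))  # True
```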

In [13]:
marks = ["point", "bar", "line", "rect"]
fields = ["weather", "temp_min", "date"]
encoding_channels = ["color", "shape", "size"]

input_specs = [
    (
        (mark, field, enc_ch),
        input_spec_base
        + [
            f"attribute((mark,type),m0,{mark}).",
            "entity(encoding,m0,e0).",
            f"attribute((encoding,field),e0,{field}).",
            f"attribute((encoding,channel),e0,{enc_ch}).",
            # filter out designs with fewer than 3 encodings
            ":- {entity(encoding,_,_)} <= 2.",
            # exclude multi-layer or multi-view designs
            ":- {entity(mark,_,_)} >= 2.",
        ],
    )
    for mark in marks
    for field in fields
    for enc_ch in encoding_channels
]
recs = {}
for cfg, input_spec in input_specs:
    labeler = lambda i: f"CHART {i+1} ({' | '.join(cfg)})"
    recs = recs | recommend_charts(input_spec, 1, labeler)

CHART 1 (point | weather | color)
COST: [25]


CHART 1 (point | weather | shape)
COST: [28]


CHART 1 (point | weather | size)
COST: [30]


CHART 1 (point | temp_min | color)
COST: [27]


CHART 1 (point | temp_min | shape)
COST: [41]


CHART 1 (point | date | color)
COST: [28]


CHART 1 (point | date | shape)
COST: [42]


CHART 1 (point | date | size)
COST: [19]


CHART 1 (bar | weather | color)
COST: [25]


CHART 1 (bar | temp_min | color)
COST: [27]


CHART 1 (bar | date | color)
COST: [28]


CHART 1 (line | weather | color)
COST: [45]


CHART 1 (line | temp_min | color)
COST: [47]


CHART 1 (line | date | color)
COST: [48]


CHART 1 (rect | weather | color)
COST: [71]


CHART 1 (rect | temp_min | color)
COST: [39]


CHART 1 (rect | date | color)
COST: [40]
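
The nested comprehension enumerates every combination of mark, field, and channel. A minimal sketch of the same enumeration with `itertools.product`, shown here only to make the size of the input space explicit:

```python
from itertools import product

marks = ["point", "bar", "line", "rect"]
fields = ["weather", "temp_min", "date"]
encoding_channels = ["color", "shape", "size"]

# product() varies the last iterable fastest, like the innermost `for` clause
combinations = list(product(marks, fields, encoding_channels))

print(len(combinations))  # 36
print(combinations[0])  # ('point', 'weather', 'color')
print(combinations[-1])  # ('rect', 'date', 'size')
```

Only 17 labelled charts appear in the output above, so the remaining combinations evidently produced no recommendation.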


Some of the above recommendations are clearly not very useful for communicating the data. Nevertheless, they are valid visualizations from the space of possibilities. Following the workflow introduced above, we can use `DracoDebug` to investigate the soft constraint violations of the generated recommendations. If there are recommendations we are not happy with, we can extend the knowledge base to cover them so that they do not appear in the future.

In [14]:
debugger = drc.DracoDebug(specs=recs, draco=d)
chart_preferences = debugger.chart_preferences
chart_preferences.head()

| | chart_name | pref_name | pref_description | count | weight |
|---|---|---|---|---|---|
| 0 | CHART 1 (point \| weather \| color) | cartesian_coordinate | Cartesian coordinates. | 1 | 0 |
| 1 | CHART 1 (point \| weather \| color) | summary_point | Point mark for summary tasks. | 1 | 0 |
| 2 | CHART 1 (point \| weather \| color) | aggregate_mean | Mean as aggregate op. | 1 | 1 |
| 3 | CHART 1 (point \| weather \| color) | aggregate_count | Count as aggregate op. | 1 | 0 |
| 4 | CHART 1 (point \| weather \| color) | ordinal_color | Ordinal scale with color channel. | 1 | 8 |


Let's take a look at the number of violated preferences:

In [15]:
num_violations = len(
    set(chart_preferences[chart_preferences["count"] != 0]["pref_name"])
)
num_all = len(set(chart_preferences["pref_name"]))
print(
    f"{num_violations} preferences are violated out of a total of {num_all} preferences (soft constraints)"
)

37 preferences are violated out of a total of 147 preferences (soft constraints)


In [16]:
plotter = drc.DracoDebugPlotter(chart_preferences, plot_size=(800, 400))
chart_config = drc.DracoDebugPlotter.__DEFAULT_CONFIGS__[1]
plotter.create_chart(chart_config, violated_prefs_only=True)