# Draco 1 vs. Draco 2

_Draco 2_ builds upon the core idea of _Draco 1_, that is, using constraints based on Answer Set Programming (ASP) to represent design knowledge about effective visualization designs. However, _Draco 2_ is a complete rewrite of _Draco 1_, with various improvements and new features.

In this notebook, we compare and contrast the capabilities of _Draco 1_ and _Draco 2_ through hands-on examples.

> ⚠️ This notebook requires a Node.js runtime so that the `draco1` bindings work as expected. Draco 2 **does not** require any non-Python dependencies.

In [1]:
# Display utilities
from IPython.display import display, Markdown
from typing import Callable
import json

def md(markdown: str):
    display(Markdown(markdown))

def md_json(dct: dict):
    md(f"```json\n{json.dumps(dct, indent=2)}\n```")

def run_guarded(func: Callable):
    try:
        func()
    except Exception as e:
        md(f'**Error:** <i style="color: red;">{e}</i>')

We install a fork of Draco 1 named `draco1` to avoid conflicts with the currently installed `draco` package, which refers to Draco 2 (i.e., this repository). The `draco1` fork does not modify the original functionality of Draco 1; it only renames the package. This lets us clearly distinguish between the two versions of Draco and use them side by side within the same notebook.

In [2]:
# Installing `clyngor` prior to `draco1`, as it is a build requirement
!pip -qq install --upgrade pip && pip -qq install clyngor
!pip install -qq 'git+https://github.com/peter-gy/draco.git@named-to-draco1#draco1'

In [3]:
import draco1 as drc1
import draco as drc2

md(f'Comparing _Draco 1: v{drc1.__version__}_ with _Draco 2: v{drc2.__version__}_')

Comparing _Draco 1: v0.0.9_ with _Draco 2: v2.0.0b5_

## API Implementation Comparison

We set off by comparing and contrasting the APIs of _Draco 1_ and _Draco 2_ as well as investigating the features and technical characteristics of the two versions.

> ✅: Feature is implemented <br/>
> ✓: Feature is implemented, but it is limited in some way compared to the other version <br/>
> 🚫: Feature is not implemented <br/>
> _spec_: Refers to a chart specification <br/>
> _ASP_: Refers to Answer Set Programming <br/>

|                                              | Draco 1                                                              | Draco 2                                                                         |
|----------------------------------------------|----------------------------------------------------------------------|---------------------------------------------------------------------------------|
| Execute an ASP problem                       | ✅ `drc1.run`                                                         | ✅ `drc2.run_clingo`                                                             |
| Access results of a run                      | ✅ `drc1.Result`                                                      | ✅ `drc2.run.Model`                                                              |
| Check whether an ASP problem is satisfiable  | ✅ `drc1.is_valid`                                                    | ✅ `drc2.check_spec`                                                             |
| List ASP problem violations                  | ✅ `drc1.Result.violations.keys`                                      | ✅ `drc2.get_violations`                                                         |
| Show how often a spec violates a preference  | ✅ `drc1.Result.violations`                                           | ✅ `drc2.count_preferences`                                                      |
| Generate ASP definitions from data           | ✅ `drc1.data_to_asp`                                                 | ✅ `drc2.schema_from_dataframe`                                                  |
| Conversion between spec formats              | ✅ ASP ↔️ Vega-Lite, CompassQL                                        | ✓ ASP ↔️ Nested dictionary                                                      |
| Render recommendations                       | ✅ `drc1.Result.as_vl`                                                | ✅ `drc2.renderer.BaseRenderer.render`                                           |
| Constraint weight learning                   | ✓ separate project `draco-learn`                                     | ✅ `drc2.learn`                                                                  |
| Web browser support                          | ✓ [`draco-vis`](https://github.com/uwdata/draco-vis) - deviating API | ✅ [`draco-pyodide`](https://www.npmjs.com/package/draco-pyodide) - identical API |
| Compatibility with `altair`                  | 🚫                                                                   | ✅                                                                               |
| Standalone function for completing a partial spec | 🚫                                                              | ✅ `drc2.complete_spec`                                                          |
| RESTful interface                            | 🚫                                                                   | ✅ `drc2.server`                                                                 |
| Recommendation & constraint weight debugging | 🚫                                                                   | ✅ `drc2.debug`                                                                  |
| Full Python compatibility                    | 🚫                                                                   | ✅                                                                               |

### API Differences in Practice

While the last row of the comparison table above may seem like a minor detail at first sight, it's important to note that _Draco 2_ is written entirely in Python, whereas _Draco 1_ is written in TypeScript and its Python API is only a wrapper around a Node.js subprocess. This means that _Draco 2_ is much easier to install and use, as it doesn't require any non-Python dependencies. Furthermore, it integrates much more seamlessly with the Python ecosystem, as it can be used in conjunction with other Python libraries without having to worry about serialization issues.

We demonstrate this particular advantage of _Draco 2_ over _Draco 1_ in the cells below through a common use case: generating the schema of a dataset in preparation for the generation of recommendations.

We start by loading the [Seattle Weather](https://github.com/vega/vega/blob/main/docs/data/seattle-weather.csv) dataset from the [Vega Datasets](https://pypi.org/project/vega-datasets/) package. We then use Draco 1 (`drc1`) and Draco 2 (`drc2`) to generate the schema of the dataset. In both versions of Draco, we represent the data schema as a list of Answer Set Programming (ASP) facts.

In [4]:
import pandas as pd
from vega_datasets import data as vega_data

df: pd.DataFrame = vega_data.seattle_weather()
df.head()

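|   | date       | precipitation | temp_max | temp_min | wind | weather |
|---|------------|---------------|----------|----------|------|---------|
| 0 | 2012-01-01 | 0.0           | 12.8     | 5.0      | 4.7  | drizzle |
| 1 | 2012-01-02 | 10.9          | 10.6     | 2.8      | 4.5  | rain    |
| 2 | 2012-01-03 | 0.8           | 11.7     | 7.2      | 2.3  | rain    |
| 3 | 2012-01-04 | 20.3          | 12.2     | 5.6      | 4.7  | rain    |
| 4 | 2012-01-05 | 1.3           | 8.9      | 2.8      | 6.1  | rain    |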


#### Draco 1

As the cells below show, while _Draco 1_ exposes the `data_to_asp` function to generate the schema of a dataset, it is not directly compatible with a Pandas `DataFrame`. What's more, even after converting the dataframe to a list of dictionaries (under the assumption that it will be JSON serializable without issues), the function still fails to generate the schema, because the values in the `date` column of the dataset are stored as `Timestamp` objects, which are not JSON serializable.

We succeed with the schema generation only after converting the `date` column to a string of the format `YYYY-MM-DD`.

In [5]:
# Attempt to generate the schema of the dataframe directly
run_guarded(lambda: drc1.data_to_asp(df))

**Error:** <i style="color: red;">Object of type DataFrame is not JSON serializable</i>

In [6]:
# Attempt to generate the schema of the dataframe after converting it to a list of dictionaries
data_records = df.to_dict('records')
run_guarded(lambda: drc1.data_to_asp(data_records))

**Error:** <i style="color: red;">Object of type Timestamp is not JSON serializable</i>

In [7]:
# Attempt to generate the schema of the dataframe after converting it to a list of dictionaries
# and converting the `date` column to a string of the format `YYYY-MM-DD`
df_serializable = df.copy()
df_serializable['date'] = df_serializable['date'].apply(lambda x: x.strftime('%Y-%m-%d'))
data_records = df_serializable.to_dict('records')
drc1.data_to_asp(data_records)

['num_rows(1461).',
 '',
 'fieldtype("date",string).',
 'cardinality("date", 1461).',
 'fieldtype("precipitation",number).',
 'cardinality("precipitation", 111).',
 'fieldtype("temp_max",number).',
 'cardinality("temp_max", 67).',
 'fieldtype("temp_min",number).',
 'cardinality("temp_min", 55).',
 'fieldtype("wind",number).',
 'cardinality("wind", 79).',
 'fieldtype("weather",string).',
 'cardinality("weather", 5).',
 '']
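
The serialization errors above are not specific to Draco 1: any API that passes data through `json.dumps` rejects date-like objects. The following minimal sketch reproduces the problem and the fix with the standard library only. It assumes a plain `datetime` as a stand-in for the pandas `Timestamp` in the `date` column; the `record` dictionary is a hypothetical single row, not actual output of `df.to_dict('records')`.

```python
import json
from datetime import datetime

# Hypothetical stand-in for one row of `df.to_dict('records')`;
# `datetime` plays the role of the pandas `Timestamp` in the `date` column.
record = {"date": datetime(2012, 1, 1), "temp_max": 12.8, "weather": "drizzle"}

try:
    json.dumps(record)
    error_message = None
except TypeError as e:
    # The standard JSON encoder does not know how to encode date-like objects
    error_message = str(e)

print(error_message)

# Converting the offending value to a `YYYY-MM-DD` string makes the row serializable
serializable = {**record, "date": record["date"].strftime("%Y-%m-%d")}
payload = json.dumps(serializable)
print(payload)
```

The same conversion is what the `strftime` call in the cell above applies to the whole `date` column before handing the records to `drc1.data_to_asp`.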

#### Draco 2

Because _Draco 2_ is written entirely in Python, it accepts a Pandas `DataFrame` directly as input for schema generation, with none of the issues we encountered with _Draco 1_.

In [8]:
data_schema = drc2.schema_from_dataframe(df)
drc2.dict_to_facts(data_schema)

['attribute(number_rows,root,1461).',
 'entity(field,root,0).',
 'attribute((field,name),0,date).',
 'attribute((field,type),0,datetime).',
 'attribute((field,unique),0,1461).',
 'attribute((field,entropy),0,7287).',
 'entity(field,root,1).',
 'attribute((field,name),1,precipitation).',
 'attribute((field,type),1,number).',
 'attribute((field,unique),1,111).',
 'attribute((field,entropy),1,2422).',
 'attribute((field,min),1,0).',
 'attribute((field,max),1,55).',
 'attribute((field,std),1,6).',
 'entity(field,root,2).',
 'attribute((field,name),2,temp_max).',
 'attribute((field,type),2,number).',
 'attribute((field,unique),2,67).',
 'attribute((field,entropy),2,3934).',
 'attribute((field,min),2,-1).',
 'attribute((field,max),2,35).',
 'attribute((field,std),2,7).',
 'entity(field,root,3).',
 'attribute((field,name),3,temp_min).',
 'attribute((field,type),3,number).',
 'attribute((field,unique),3,55).',
 'attribute((field,entropy),3,3596).',
 'attribute((field,min),3,-7).',
 'attribute(

## Visualization Specification Language Differences

To express knowledge about visualizations, we first need a language to describe them. Both versions of Draco use sets of logical facts to describe visualizations and their context. While _Draco 1_ and _Draco 2_ share the fundamental approach, the underlying language designs are quite different.

The language used to express visualizations in _Draco 1_ is based entirely on [Vega-Lite](https://vega.github.io/vega-lite/), a concise, yet expressive high-level visualization language. Although this choice makes the conversion between the ASP facts and a Vega-Lite spec easy in _Draco 1_, the design space is bounded by the capabilities of Vega-Lite. Furthermore, the language cannot be extended with user-defined details, making it more rigid overall.

On the other hand, the visualization specification language of _Draco 2_ was designed with flexibility and extensibility in mind. It can be used to specify all the visualizations that _Draco 1_ can express, **and more**, by using a nested specification format based on entities and attributes.

We demonstrate the different approaches _Draco 1_ and _Draco 2_ take for specifying visualizations by showing how the same chart can be specified using their languages.

### Language Differences in Practice

**The chart to be encoded**

Still having the Seattle Weather dataset at hand, let's suppose that we are interested in how the maximum temperatures across different weather conditions compare. For this very simple analytical task, we can create a bar chart that encodes the `weather` field on the `x` channel and the mean of the `temp_max` field on the `y` channel.

In [9]:
import altair as alt

alt.Chart(df).mark_bar().encode(
    x=alt.X(field='weather', type='ordinal'),
    y=alt.Y(field='temp_max', type='quantitative', aggregate='mean', scale=alt.Scale(zero=True))
)
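
To make the `aggregate='mean'` encoding concrete, the sketch below computes the per-category mean that each bar represents, using only the five rows printed by `df.head()` earlier and the standard library. It illustrates the aggregation itself, not how Altair or Vega-Lite compute it internally. The full dataset contains five weather categories, so the actual chart shows more bars and different values.

```python
from collections import defaultdict
from statistics import mean

# (weather, temp_max) pairs copied from the five rows of `df.head()`
rows = [
    ("drizzle", 12.8),
    ("rain", 10.6),
    ("rain", 11.7),
    ("rain", 12.2),
    ("rain", 8.9),
]

# Group the `temp_max` values by `weather` (the field on the x channel) ...
groups = defaultdict(list)
for weather, temp_max in rows:
    groups[weather].append(temp_max)

# ... and average each group (the y channel with `aggregate='mean'`)
bar_heights = {weather: mean(values) for weather, values in groups.items()}
print(bar_heights)
```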

#### Draco 1

As _Draco 1_ is based on the Vega-Lite specification, the ASP function names and values it uses to declare facts are identical to the JSON attributes in a Vega-Lite specification.

In [10]:
drc1_asp = [
   # Use a bar mark
  'mark(bar).',
   # Declare the existence of our first encoding, identified by `e0`
  'encoding(e0).',
   # The encoding `e0` uses the `x` channel
  'channel(e0,x).',
   # The encoding `e0` encodes the `weather` field of the dataset
  'field(e0,"weather").',
   # The encoding `e0` has the `ordinal` type
  'type(e0,ordinal).',
  
   # Declare the existence of our second encoding, identified by `e1`
  'encoding(e1).',
   # The encoding `e1` uses the `y` channel
  'channel(e1,y).',
   # The encoding `e1` encodes the `temp_max` field of the dataset
  'field(e1,"temp_max").',
   # The encoding `e1` has the `quantitative` type
  'type(e1,quantitative).',
   # The encoding `e1` uses `mean` for aggregation
  'aggregate(e1,mean).',
   # On the scale of the encoding `e1`, the `zero` attribute is set to `true`
  'zero(e1).',
]

We can use the `drc1.asp2vl` function to convert this chart specification from _Draco 1_'s ASP format into a Vega-Lite specification, the dictionary-based format _Draco 1_ supports.

In [11]:
drc1_spec_converted = drc1.asp2vl(drc1_asp)
md_json(drc1_spec_converted)

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "data": {
    "url": "data/cars.json"
  },
  "mark": "bar",
  "encoding": {
    "x": {
      "type": "ordinal",
      "field": "weather"
    },
    "y": {
      "type": "quantitative",
      "aggregate": "mean",
      "field": "temp_max",
      "scale": {
        "zero": true
      }
    }
  }
}
```

```{note}
Why does the output above have `"url": "data/cars.json"` inside the `data` attribute?
This shortcoming of _Draco 1_ also stems from its lack of first-class Python support: because it runs in a Node.js environment under the hood, data must be serialized every time a function in the `drc1` module is invoked. The `data` attribute therefore holds a hard-coded placeholder, so that the data to be rendered does not need to make a round trip between the Python and the Node.js processes.
```
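
Because the fact names in _Draco 1_ mirror Vega-Lite property names, the conversion is close to a direct renaming. The following is a minimal sketch of that idea, assuming only the handful of fact shapes used in this notebook. `facts_to_vl_sketch` is a hypothetical helper written for illustration; it is not the implementation of `drc1.asp2vl`, and it omits the `$schema` and `data` entries.

```python
import re


def facts_to_vl_sketch(facts: list) -> dict:
    """Toy converter from Draco 1-style facts to a Vega-Lite-like dict."""
    spec = {"encoding": {}}
    encodings = {}  # encoding id (e.g. "e0") -> its properties
    for fact in facts:
        match = re.fullmatch(r"(\w+)\((.*)\)\.", fact.strip())
        if match is None:
            continue  # skip anything that is not a simple fact
        name, raw_args = match.groups()
        args = [arg.strip().strip('"') for arg in raw_args.split(",")]
        if name == "mark":
            spec["mark"] = args[0]
        elif name == "encoding":
            encodings.setdefault(args[0], {})
        elif name == "zero":
            encodings.setdefault(args[0], {})["scale"] = {"zero": True}
        elif len(args) == 2:
            # e.g. `field(e0,"weather").` sets {"field": "weather"} on `e0`
            encodings.setdefault(args[0], {})[name] = args[1]
    # Vega-Lite keys encodings by channel rather than by an identifier
    for props in encodings.values():
        channel = props.pop("channel", None)
        if channel is not None:
            spec["encoding"][channel] = props
    return spec


facts = [
    'mark(bar).',
    'encoding(e0).', 'channel(e0,x).', 'field(e0,"weather").', 'type(e0,ordinal).',
    'encoding(e1).', 'channel(e1,y).', 'field(e1,"temp_max").',
    'type(e1,quantitative).', 'aggregate(e1,mean).', 'zero(e1).',
]
vl_sketch = facts_to_vl_sketch(facts)
print(vl_sketch)
```

Apart from `zero`, every branch reuses the fact name as the dictionary key, which is exactly why this design is easy to convert but hard to extend beyond what Vega-Lite offers.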

#### Draco 2

In [12]:
drc2_asp = [
 # Declare the existence of a top-level `field` entity with ASP identifier "temp_max"
 'entity(field,root,temp_max).',
 # ... set the `name` attribute of the `field` entity identified by `temp_max` to "temp_max"
 'attribute((field,name),temp_max,temp_max).',
 # ... set the `type` attribute of the `field` entity identified by `temp_max` to "number"
 'attribute((field,type),temp_max,number).',
 
 # Declare the existence of a top-level `field` entity with ASP identifier "weather"
 'entity(field,root,weather).',
 # ... set the `name` attribute of the `field` entity identified by `weather` to "weather"
 'attribute((field,name),weather,weather).',
 # ... set the `type` attribute of the `field` entity identified by `weather` to "string"
 'attribute((field,type),weather,string).',
 
 # Declare the existence of a top-level `view` entity with ASP identifier "v0"
 'entity(view,root,v0).',
 # ... set the `coordinates` attribute of the `view` entity identified by `v0` to "cartesian"
 'attribute((view,coordinates),v0,cartesian).',

 # Declare the existence of a `mark` entity with ASP identifier "m0", nested into the `v0` view entity
 'entity(mark,v0,m0).',
 # ... set the `type` attribute of the `mark` entity identified by `m0` to "bar"
 'attribute((mark,type),m0,bar).',
 
 # Declare the existence of an `encoding` entity with ASP identifier "e0", nested into the `m0` mark entity
 'entity(encoding,m0,e0).',
 # ... set the `channel` attribute of the `encoding` entity identified by `e0` to "x"
 'attribute((encoding,channel),e0,x).',
 # ... set the `field` attribute of the `encoding` entity identified by `e0` to "weather"
 'attribute((encoding,field),e0,weather).',
 
 # Declare the existence of an `encoding` entity with ASP identifier "e1", nested into the `m0` mark entity
 'entity(encoding,m0,e1).',
 # ... set the `channel` attribute of the `encoding` entity identified by `e1` to "y"
 'attribute((encoding,channel),e1,y).',
 # ... set the `field` attribute of the `encoding` entity identified by `e1` to "temp_max"
 'attribute((encoding,field),e1,temp_max).',
 # ... set the `aggregate` attribute of the `encoding` entity identified by `e1` to "mean"
 'attribute((encoding,aggregate),e1,mean).',

 # Declare the existence of a `scale` entity with ASP identifier "s0", nested into the `v0` view entity
 'entity(scale,v0,s0).',
 # ... set the `channel` attribute of the `scale` entity identified by `s0` to "x"
 'attribute((scale,channel),s0,x).',
 # ... set the `type` attribute of the `scale` entity identified by `s0` to "ordinal"
 'attribute((scale,type),s0,ordinal).',
 
 # Declare the existence of a `scale` entity with ASP identifier "s1", nested into the `v0` view entity
 'entity(scale,v0,s1).',
 # ... set the `channel` attribute of the `scale` entity identified by `s1` to "y"
 'attribute((scale,channel),s1,y).',
 # ... set the `type` attribute of the `scale` entity identified by `s1` to "linear"
 'attribute((scale,type),s1,linear).',
 # ... set the `zero` attribute of the `scale` entity identified by `s1` to "true"
 'attribute((scale,zero),s1,true).'
]

We can use the `drc2.facts_to_dict` function to convert this chart specification from _Draco 2_'s ASP format into a nested, dictionary-based format.

In [13]:
drc2_spec_converted = drc2.facts_to_dict(drc2_asp)
md_json(drc2_spec_converted)

```json
{
  "field": [
    {
      "name": "temp_max",
      "type": "number"
    },
    {
      "name": "weather",
      "type": "string"
    }
  ],
  "view": [
    {
      "mark": [
        {
          "encoding": [
            {
              "channel": "x",
              "field": "weather"
            },
            {
              "channel": "y",
              "field": "temp_max",
              "aggregate": "mean"
            }
          ],
          "type": "bar"
        }
      ],
      "scale": [
        {
          "channel": "x",
          "type": "ordinal"
        },
        {
          "channel": "y",
          "type": "linear",
          "zero": "true"
        }
      ],
      "coordinates": "cartesian"
    }
  ]
}
```

```{note}
In contrast to the specification language of _Draco 1_, when specifying the chart using _Draco 2_'s approach, we only used two functions: `entity` and `attribute`. This is the key idea that makes _Draco 2_'s specification language easy to extend: a new nested object (`entity`) can be declared at any nesting level, and a new property with any value (`attribute`) can be attached to any entity.
```
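
To illustrate why two fact types are enough, here is a minimal sketch of how a nested dictionary could be flattened into `entity` and `attribute` facts. `dict_to_facts_sketch` is a hypothetical, simplified helper, not the implementation of `drc2.dict_to_facts`: it assumes that lists contain only dictionaries, assigns numeric identifiers in depth-first order, and does not handle booleans or other special values. The `my_custom_flag` attribute is made up to show that a new property requires no change to the converter.

```python
from itertools import count


def dict_to_facts_sketch(data: dict, parent="root", path=(), ids=None) -> list:
    """Toy flattening of a nested dict into entity/attribute facts."""
    ids = ids if ids is not None else count()
    facts = []
    for key, value in data.items():
        if isinstance(value, list):
            # A list of dicts declares one nested entity per item
            for item in value:
                entity_id = next(ids)
                facts.append(f"entity({key},{parent},{entity_id}).")
                facts.extend(dict_to_facts_sketch(item, entity_id, (key,), ids))
        else:
            # Any other value is an attribute of the current entity
            name = f"({','.join(path + (key,))})" if path else key
            facts.append(f"attribute({name},{parent},{value}).")
    return facts


spec = {
    "view": [
        {
            "coordinates": "cartesian",
            "mark": [
                {
                    "type": "bar",
                    "my_custom_flag": "on",  # hypothetical, not part of Draco 2
                    "encoding": [{"channel": "x", "field": "weather"}],
                }
            ],
        }
    ]
}
facts = dict_to_facts_sketch(spec)
print("\n".join(facts))
```

The converter never inspects key names, so adding `my_custom_flag` simply yields one more `attribute` fact.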