### EuroVis Dataset
- **Raw Data:** Academic publications metadata from the EuroVis conference, including titles, abstracts, authors, and awards.
- **Prepared Data:** [merged_artifacts.parquet](https://www.dropbox.com/scl/fi/i285q892wjmm6f9oak41g/merged_artifacts.parquet?rlkey=1y32rk8uzbiet9u18no760jad&dl=1) (5599 rows, 18 columns)
  - **Potential columns for visualization:**
    - **X & Y Coordinates:** `x`, `y`
    - **Point Size:** `n_tokens` (number of tokens in the abstract)
    - **Color:** Cluster labels (`cluster_05`, `cluster_08`, etc.)
    - **Label:** `title`
  - **Related code file:** [eurovis.py](https://github.com/thorwhalen/imbed_data_prep/blob/main/imbed_data_prep/eurovis.py)

## Get data

In [None]:
# Install and import

import os
if not os.getenv('IN_COSMO_DEV_ENV'):
    %pip install -q cosmograph tabled cosmodata

import cosmodata
from functools import partial
import tabled
from cosmograph import cosmo

In [29]:
get_parquet = partial(tabled.get_table, ext='.parquet')
src = 'https://www.dropbox.com/scl/fi/i285q892wjmm6f9oak41g/merged_artifacts.parquet?rlkey=1y32rk8uzbiet9u18no760jad&dl=1'
data = cosmodata.acquire_data(src, 'eurovis_merged_artifacts.parquet', getter=get_parquet)

## Peep at the data

In [44]:
cosmodata.print_dataframe_info(data)

DataFrame shape: (5599, 18)
First row
------------------------------------------------------------
conference                                                      EuroVis
year                                                               2024
title                 A Prediction-Traversal Approach for Compressin...
doi                                                   10.1111/cgf.15097
abstract              We explore an error-bounded lossy compression ...
authorNamesDeduped           Congrong Ren;Xin Liang 0001;Hanqi Guo 0001
award                                                              None
resources                                                             P
link                     https://vispubs.com/?paper=10.1111%2Fcgf.15097
segment               ##A Prediction-Traversal Approach for Compress...
n_tokens                                                            318
x                                                            -16.307396
y                                    

## Visualize data

### 💡 Visualization Mapping

🧭 What You’ll See
- Each point = one paper
- Position (`x`, `y`) → semantic similarity in embedding space
- Color (`cluster_13`) → paper topic cluster
- Size (`year`) → publication recency
- Hovering or zooming reveals paper titles interactively

🧠 Optional Extensions
- Replace `cluster_13` with `cluster_21` or `cluster_34` for finer granularity.
- Color points by `conference` if merging across multiple conferences.
- Filter by `resources == 'P'` or a non-null `award` to highlight papers with supplementary resources or awards.
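The filtering extension can be sketched with plain pandas. The frame below is a toy stand-in for `data` with made-up values, and the helper names `with_resources` and `with_award` are hypothetical, not part of `cosmodata` or `cosmograph`.

```python
import pandas as pd

# Toy stand-in for `data`; titles, resource codes, and award values are invented.
toy = pd.DataFrame(
    {
        "title": ["Paper A", "Paper B", "Paper C", "Paper D"],
        "resources": ["P", None, "PV", "P"],
        "award": [None, "BP", None, "HM"],
    }
)


def with_resources(df, code="P"):
    """Keep rows whose `resources` value equals `code` exactly."""
    return df[df["resources"] == code]


def with_award(df):
    """Keep rows whose `award` is not missing (None or NaN)."""
    return df[df["award"].notna()]


p_only = with_resources(toy)
awarded = with_award(toy)
print(p_only["title"].tolist())
print(awarded["title"].tolist())
```

Assuming `cosmo` accepts any DataFrame with the same columns, a filtered frame such as `with_award(data)` could presumably be passed in place of `data`.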

In [40]:
cosmo(
    data,
    point_x_by="x",
    point_y_by="y",
    point_color_by="cluster_13",     # Try other cluster granularities too (05, 08, 21, 34)
    point_label_by="title",
    point_size_by="year",             # optional, newer papers slightly larger
    point_timeline_by="year",         # enables time slider by year
    show_dynamic_labels=True,
    disable_point_color_legend=False,
    background_color="#111111",
    fit_view_on_init=True,
)

Cosmograph(background_color='#111111', disable_point_color_legend=False, fit_view_on_init=True, focused_point_…

### Resource & Abstract-Length Lens

- Grey background (`#aaaaaa`).
- Color encodes the **type of resource** in `resources` (e.g., poster or talk).
- Size reflects **abstract length** (`n_tokens`) to surface denser papers.
- Top labels show the **wordiest** items; hover for full titles.
- Extra columns are included for richer tooltips.
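To preview which papers are likely to receive a top label, one option is to rank by `n_tokens` in pandas first. This sketch assumes top labels go to the rows with the largest values of the chosen column; the toy frame and the helper `wordiest` are illustrative only.

```python
import pandas as pd

# Toy stand-in for `data`; titles and token counts are invented.
toy = pd.DataFrame(
    {
        "title": ["Short", "Long", "Medium", "Longest"],
        "n_tokens": [120, 410, 260, 530],
    }
)


def wordiest(df, limit=40):
    """Return the `limit` rows with the largest `n_tokens`, largest first."""
    return df.nlargest(limit, "n_tokens")[["title", "n_tokens"]]


top = wordiest(toy, limit=2)
print(top)
```

On the real frame, `wordiest(data)` would list the 40 candidates matching the label limit set in the cell that follows.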

In [43]:
cosmo(
    data,
    point_x_by="x",
    point_y_by="y",
    point_color_by="resources",
    point_size_by="n_tokens",
    show_top_labels=True,
    show_top_labels_by="n_tokens",
    show_top_labels_limit=40,
    point_label_by="title",
    point_include_columns=["authorNamesDeduped", "doi", "link", "resources", "year"],
    fit_view_on_init=True,
    background_color="#aaaaaa",
)


Cosmograph(background_color='#aaaaaa', fit_view_on_init=True, focused_point_ring_color=None, hovered_point_rin…

### Chronological Heatmap of Topics

- Color encodes publication `year` to reveal temporal waves across the embedding.
- Optionally labels points with their fine-grained `cluster_34` topic cluster for context.
- Point size is uniform so the color gradient stands out.
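A tabular cross-check of the temporal pattern is a year-by-cluster count table. The sketch below uses a toy frame with invented values and assumes `cluster_34` holds one cluster identifier per paper (integers here, which may not match the real column's dtype).

```python
import pandas as pd

# Toy stand-in for `data`; years and cluster ids are invented.
toy = pd.DataFrame(
    {
        "year": [2020, 2020, 2021, 2021, 2021],
        "cluster_34": [3, 7, 3, 3, 7],
    }
)

# Rows are years, columns are cluster ids, cells are paper counts.
counts = pd.crosstab(toy["year"], toy["cluster_34"])
print(counts)
```

Clusters whose counts rise or fall over consecutive years should correspond to regions of the map that shift color along the year gradient.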

In [45]:
cosmo(
    data,
    point_x_by="x",
    point_y_by="y",
    point_color_by="year",
    # Optional: faint labels for dense topical areas
    show_top_labels=True,
    show_top_labels_by="cluster_34",
    show_top_labels_limit=60,
    point_label_by="cluster_34",
    point_size=3.5,
    point_include_columns=["title", "year", "cluster_34", "doi", "link"],
    fit_view_on_init=True,
    background_color="#0e0e0e",
)


Cosmograph(background_color='#0e0e0e', fit_view_on_init=True, focused_point_ring_color=None, hovered_point_rin…