Let's download a big dataset of English words (from WordNet), plus some hyponym-hypernym relationships between them. A hyponym-hypernym relationship is a “type-of” relationship: the hyponym is the more specific term (e.g., “dog”) and the hypernym is the broader one (e.g., “animal”).
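To make the “type-of” idea concrete, here is a tiny sketch that follows hyponym → hypernym links upward until it runs out of parents. The `HYPERNYM` map and the helpers `hypernym_chain` / `is_a` are hypothetical, hand-written toy data, not read from the dataset. Real WordNet terms can have more than one hypernym, which this single-parent dict deliberately ignores.

```python
# Hypothetical toy hierarchy (hyponym -> its single hypernym); not from the dataset.
HYPERNYM = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "mammal",
    "mammal": "animal",
}


def hypernym_chain(term):
    """Walk hyponym -> hypernym links until reaching a term with no parent."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(hyponym, hypernym):
    """True if `hypernym` appears somewhere above `hyponym` in the toy hierarchy."""
    return hypernym in hypernym_chain(hyponym)[1:]


print(hypernym_chain("dog"))  # ['dog', 'canine', 'carnivore', 'mammal', 'animal']
print(is_a("dog", "animal"))  # True
```

The relation is transitive: “dog” is a type of “animal” because it is a type of “canine”, which is a type of “carnivore”, and so on up the chain.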

In [1]:
import pandas as pd
from cosmograph import cosmo

df = pd.read_parquet('https://www.dropbox.com/scl/fi/4mnk1e2wx31j9mdsjzecy/wordnet_feature_meta.parquet?rlkey=ixjiiso80s1uk4yhx1v38ekhm&dl=1')
hyponyms = pd.read_parquet('https://www.dropbox.com/scl/fi/pl72ixv34soo1o8zanfrz/hyponyms.parquet?rlkey=t4d606fmq1uinn29qmli7bx6r&dl=1')

Take a peek at the data:

In [2]:
print(f"{df.shape=}")
df.iloc[0]

df.shape=(123587, 8)


word                                                          a
frequency                                              0.015441
definition    a metric unit of length equal to one ten billi...
lexname                                           noun.quantity
name                                              angstrom.n.01
pos                                                        noun
umap_x                                                 3.027916
umap_y                                                 3.760965
Name: angstrom.n.01.a, dtype: object

In [3]:
print(f"{hyponyms.shape=}")
hyponyms.iloc[0]

hyponyms.shape=(258896, 2)


source           vitamin_a.n.01.a
target    vitamin_a1.n.01.retinol
Name: 0, dtype: object
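Judging only from the rows printed above (an inference, not documented behaviour), the row labels of `df` (e.g. `angstrom.n.01.a`, shown as `Name:`) look like a WordNet-style synset name (`lemma.pos.NN`, e.g. `angstrom.n.01`) followed by a dot and the word, and the `source`/`target` columns of `hyponyms` use the same IDs. In the first row, `vitamin_a1` (retinol) is a kind of `vitamin_a`, so `source` appears to hold the broader term and `target` the narrower one. Assuming that ID layout, a hypothetical helper to split such IDs could look like this:

```python
import re

# Assumed ID layout, inferred from the printed rows: "<lemma>.<pos>.<NN>.<word>",
# where <pos> is one of WordNet's n/v/a/s/r tags. Not an official format spec.
ID_PATTERN = re.compile(r"^(?P<synset>.+?\.[nvasr]\.\d+)\.(?P<word>.+)$")


def split_point_id(point_id):
    """Split an ID such as 'vitamin_a1.n.01.retinol' into (synset, word)."""
    match = ID_PATTERN.match(point_id)
    if match is None:
        raise ValueError(f"unexpected ID format: {point_id!r}")
    return match.group("synset"), match.group("word")


print(split_point_id("angstrom.n.01.a"))          # ('angstrom.n.01', 'a')
print(split_point_id("vitamin_a1.n.01.retinol"))  # ('vitamin_a1.n.01', 'retinol')
```

The non-greedy `.+?` stops at the first `.pos.NN.` segment, so any dots inside the trailing word stay with the word.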

Let's plot the data using the [UMAP projection](https://umap-learn.readthedocs.io/en/latest/)
of the (OpenAI) [embeddings](https://www.deepset.ai/blog/the-beginners-guide-to-text-embeddings)
of the words, coloring points by part of speech (`pos`) and sizing them by each word's usage frequency.
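Conceptually, “color by” assigns one color per category and “size by” maps a numeric column onto a range of point sizes. The sketch below illustrates that idea with a hypothetical `encode_points` helper, a made-up palette, made-up rows, and linear min-max scaling. Cosmograph's actual palette and size mapping are its own and will differ; `size_scale` only loosely mirrors the role of `point_size_scale`.

```python
# Hypothetical palette; Cosmograph picks its own colors.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd"]


def encode_points(rows, size_scale=6.0, min_size=1.0):
    """Map each row's 'pos' to a color and its 'frequency' to a point size."""
    rows = list(rows)
    # One color per distinct category, assigned in sorted order (cycling if needed)
    categories = sorted({r["pos"] for r in rows})
    color_of = {c: PALETTE[i % len(PALETTE)] for i, c in enumerate(categories)}
    # Linear min-max scaling of frequency onto [min_size, min_size + size_scale]
    freqs = [r["frequency"] for r in rows]
    lo, hi = min(freqs), max(freqs)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all frequencies match
    return [
        {
            "word": r["word"],
            "color": color_of[r["pos"]],
            "size": min_size + size_scale * (r["frequency"] - lo) / span,
        }
        for r in rows
    ]


# Made-up rows for illustration only
sample = [
    {"word": "alpha", "pos": "noun", "frequency": 0.02},
    {"word": "beta", "pos": "verb", "frequency": 0.5},
    {"word": "gamma", "pos": "noun", "frequency": 1.0},
]
for point in encode_points(sample):
    print(point)
```

Because raw frequencies are small numbers, some multiplier is needed to turn them into visible sizes, which is why a scale parameter usually has to be tuned by eye.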


In [4]:
cosmo(
    df,
    point_id_by='lemma',
    point_label_by='word',
    point_x_by='umap_x',
    point_y_by='umap_y',
    point_color_by='pos',
    point_size_by='frequency',
    point_size_scale=6,  # often have to play with this number to get the size right
    disable_point_size_legend=True
)


Now let's add the hyponym-hypernym links and let a force-directed simulation pull the network into a stable layout (try it yourself; the convergence is pretty!).
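Cosmograph runs the simulation for you. To get a feel for what “converging” means, here is a toy force-directed layout in plain Python in the spirit of Fruchterman-Reingold: every pair of points repels, linked points attract like springs, and a shrinking “temperature” caps each step so the layout settles. This is a generic illustration, not Cosmograph's own algorithm; `force_layout`, `_unit`, and their parameters are hypothetical.

```python
import math
import random


def _unit(p, q):
    """Unit vector pointing from q to p, plus the distance between them."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    d = math.hypot(dx, dy) or 1e-9  # guard against coincident points
    return dx / d, dy / d, d


def force_layout(nodes, edges, iterations=200, seed=0):
    """Toy Fruchterman-Reingold-style layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for n in nodes}
    k = math.sqrt(4.0 / len(nodes))  # preferred spacing for the 2x2 starting box
    temperature = 0.1                # max distance a point may move in one step
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of points pushes apart with force k^2 / d
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                ux, uy, d = _unit(pos[a], pos[b])
                f = k * k / d
                disp[a][0] += ux * f
                disp[a][1] += uy * f
                disp[b][0] -= ux * f
                disp[b][1] -= uy * f
        # Attraction: linked points pull together with force d^2 / k
        for a, b in edges:
            ux, uy, d = _unit(pos[a], pos[b])
            f = d * d / k
            disp[a][0] -= ux * f
            disp[a][1] -= uy * f
            disp[b][0] += ux * f
            disp[b][1] += uy * f
        # Move each point along its net force, capped by the temperature
        for n in nodes:
            ux, uy, d = _unit(disp[n], (0.0, 0.0))
            step = min(d, temperature)
            pos[n][0] += ux * step
            pos[n][1] += uy * step
        temperature *= 0.98  # cool down so the layout settles
    return {n: (x, y) for n, (x, y) in pos.items()}


nodes = list("abcdef")
# Two triangles joined by a single bridge edge c-d
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
layout = force_layout(nodes, edges)
for node, (x, y) in sorted(layout.items()):
    print(f"{node}: ({x:+.2f}, {y:+.2f})")
```

After settling, linked points sit roughly `k` apart while unlinked ones drift farther away, so the two triangles separate into visible clusters. The all-pairs repulsion costs O(n²) per step, fine for six points but far too slow for ~120k words, which is why real layout engines typically approximate it.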

In [5]:
cosmo(
    points=df,
    links=hyponyms,
    link_source_by='source',
    link_target_by='target',
    point_id_by='lemma',
    point_label_by='word',
    # UMAP coordinates left out so the force simulation places the points
    # point_x_by='umap_x',
    # point_y_by='umap_y',
    point_color_by='pos',
    point_size_by='frequency',
    point_size_scale=0.2,  # often have to play with this number to get the size right
    disable_point_size_legend=True
)
