# Gephi (GEXF)

GEXF (Graph Exchange XML Format) is a common interchange format used by Gephi and other tools.
This notebook covers a small local sample and a medium-sized dataset with GEXF viz metadata.
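
For orientation, a GEXF file is plain XML: a `graph` element holding `nodes` and `edges`, with optional `viz:` children on each node for color, size, and position. The sketch below parses a tiny hand-written document using only the standard library. The document contents are made up for illustration, the namespace URIs assume GEXF 1.2draft (other versions use different URIs), and the conversion of `r`/`g`/`b` to a hex string is just one way to get values shaped like the `viz_color` column shown later. It is not meant to describe how `graphistry.gexf` parses files internally.

```python
import xml.etree.ElementTree as ET

# Hand-written example document (illustrative; not one of this notebook's files).
GEXF_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft"
      xmlns:viz="http://www.gexf.net/1.2draft/viz" version="1.2">
  <graph defaultedgetype="undirected">
    <nodes>
      <node id="a" label="Alpha">
        <viz:color r="239" g="173" b="66"/>
        <viz:size value="2.5"/>
        <viz:position x="10.0" y="20.5" z="0.0"/>
      </node>
      <node id="b" label="Beta"/>
    </nodes>
    <edges>
      <edge id="0" source="a" target="b"/>
    </edges>
  </graph>
</gexf>
"""

# Assumption: GEXF 1.2draft namespaces.
NS = {
    "g": "http://www.gexf.net/1.2draft",
    "viz": "http://www.gexf.net/1.2draft/viz",
}

root = ET.fromstring(GEXF_TEXT.encode("utf-8"))

nodes = []
for node in root.findall("./g:graph/g:nodes/g:node", NS):
    color = node.find("viz:color", NS)
    size = node.find("viz:size", NS)
    nodes.append({
        "node_id": node.get("id"),
        "label": node.get("label"),
        # Nodes without viz children get None rather than a guessed default.
        "viz_color": None if color is None else "#{:02X}{:02X}{:02X}".format(
            int(color.get("r")), int(color.get("g")), int(color.get("b"))
        ),
        "viz_size": None if size is None else float(size.get("value")),
    })

edges = [
    (edge.get("source"), edge.get("target"))
    for edge in root.findall("./g:graph/g:edges/g:edge", NS)
]

print(nodes)
print(edges)
```

Viz children are optional per node, which is why the second node ends up with `None` for both color and size.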


In [1]:
import os
from pathlib import Path
from urllib.request import urlretrieve


import graphistry

# To specify Graphistry account & server, use:
# graphistry.register(api=3, username='...', password='...', protocol='https', server='hub.graphistry.com')
# For more options: https://pygraphistry.readthedocs.io/en/latest/server/register.html


In [2]:
GRAPHISTRY_SERVER = os.environ.get("GRAPHISTRY_SERVER", "hub.graphistry.com")
GRAPHISTRY_PROTOCOL = os.environ.get("GRAPHISTRY_PROTOCOL", "https")
GRAPHISTRY_USERNAME = os.environ.get("GRAPHISTRY_USERNAME")
GRAPHISTRY_PASSWORD = os.environ.get("GRAPHISTRY_PASSWORD")

if not GRAPHISTRY_USERNAME or not GRAPHISTRY_PASSWORD:
    raise RuntimeError("Set GRAPHISTRY_USERNAME and GRAPHISTRY_PASSWORD to upload.")

graphistry.register(
    api=3,
    protocol=GRAPHISTRY_PROTOCOL,
    server=GRAPHISTRY_SERVER,
    username=GRAPHISTRY_USERNAME,
    password=GRAPHISTRY_PASSWORD,
)


<graphistry.pygraphistry.GraphistryClient at 0x7a5cbe8579b0>

In [3]:
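# Look for the sample relative to the repo root first,
# then fall back to a copy in the current working directory.
gexf_path = Path("demos/demos_databases_apis/gexf/sample.gexf")
if not gexf_path.exists():
    gexf_path = Path("sample.gexf")
g = graphistry.gexf(str(gexf_path))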

g._nodes.head()


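| | node_id | label | category | viz_color | viz_opacity | viz_x | viz_y | viz_z | viz_size | viz_shape | viz_shape_icon |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | n10 | Delta | typeA | #EFAD42 | 0.5 | 10.0 | 20.5 | 0.0 | 2.5 | disc | circle |
| 1 | n11 | Epsilon | typeB | #0A141E | 1.0 | -5.0 | 7.5 | 0.0 | 1.25 | square | square |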


GEXF viz attributes are loaded as `viz_*` node columns and mapped to Graphistry bindings (color, size, position, opacity, icons).
You can plot directly with these GEXF defaults:


In [4]:
g.name("GEXF sample").plot()


## Medium GEXF demo: SiS Words

This dataset includes GEXF viz encodings for node color, size, and position.
Every node in the source file carries the same color and size, so the default plot looks uniform.
Below we first plot with the default GEXF bindings, then drop the GEXF colors and sizes
while keeping the layout, and finally apply Graphistry encodings.
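
The claim that color and size are uniform can be checked by counting distinct values in each viz column. The sketch below does this on a small stand-in table whose rows are invented for illustration; with the real data loaded, the same two lines would be run against `g._nodes`. The variable names are hypothetical.

```python
import pandas as pd

# Stand-in rows shaped like the node table; with the real data, use g._nodes instead.
nodes = pd.DataFrame({
    "node_id": ["w1", "w2", "w3"],
    "viz_color": ["#999999", "#999999", "#999999"],
    "viz_size": [10.0, 10.0, 10.0],
    "viz_x": [-649.5, 789.3, 1131.0],
})

# Count distinct values per viz column; a count of 1 means the column
# carries no visual information.
viz_cols = [col for col in nodes.columns if col.startswith("viz_")]
distinct = nodes[viz_cols].nunique()
constant_cols = distinct[distinct <= 1].index.tolist()

print(constant_cols)
```

Columns reported here are candidates for dropping from the bindings, since encoding a constant adds nothing to the plot.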


In [5]:
DATA_URL = "https://raw.githubusercontent.com/medialab/medialab-network-dataset/master/SiS%20Words.gexf"
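# Prefer the repo-relative data directory; fall back to ./data in the working directory.
DATA_DIR = Path("demos/demos_databases_apis/gexf/data")
if not DATA_DIR.exists():
    DATA_DIR = Path("data")
GEXF_PATH = DATA_DIR / "sis_words.gexf"

# Download once and reuse the cached copy on later runs.
DATA_DIR.mkdir(parents=True, exist_ok=True)
if not GEXF_PATH.exists():
    urlretrieve(DATA_URL, GEXF_PATH)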

GEXF_PATH.exists()


True
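
The existence check above only confirms that some file was written. A failed download can still leave an error page or an empty file on disk. As a hedged extra step, the sketch below checks that a file parses as XML and that its root element is `gexf`. `looks_like_gexf` is a hypothetical helper, not part of PyGraphistry, and the two temporary files are made up to exercise it.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def looks_like_gexf(path):
    """Return True if the file's root element is <gexf> (hypothetical helper)."""
    try:
        with open(path, "rb") as handle:
            # The first "start" event is the root element, so large files
            # are not parsed in full.
            for _event, elem in ET.iterparse(handle, events=("start",)):
                # Namespaced tags look like "{uri}gexf"; keep the local name.
                return elem.tag.rsplit("}", 1)[-1] == "gexf"
    except (ET.ParseError, OSError):
        return False
    return False


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.gexf"
    good.write_text(
        '<gexf xmlns="http://www.gexf.net/1.2draft"><graph/></gexf>',
        encoding="utf-8",
    )
    bad = Path(tmp) / "bad.gexf"
    bad.write_text("404: Not Found", encoding="utf-8")
    results = (looks_like_gexf(good), looks_like_gexf(bad))

print(results)
```

In this notebook the call would be `looks_like_gexf(GEXF_PATH)`; on a `False` result, deleting the cached file and re-running the download cell is a reasonable next step.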

In [6]:
g = graphistry.gexf(str(GEXF_PATH))
counts = {"nodes": len(g._nodes), "edges": len(g._edges)}
bindings = {
    "point_color": g._point_color,
    "point_size": g._point_size,
    "point_x": g._point_x,
    "point_y": g._point_y,
    "edge_color": g._edge_color,
    "play": g._url_params.get("play"),
}
counts, bindings


({'nodes': 6704, 'edges': 71744},
 {'point_color': 'viz_color',
  'point_size': 'viz_size',
  'point_x': 'viz_x',
  'point_y': 'viz_y',
  'edge_color': None,
  'play': 0})

In [7]:
g._nodes.head()


| | node_id | label | class | main | occurences | viz_size | viz_x | viz_y | viz_z | viz_color |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | w70401 | populations indigènes | populations et amélioration des conditions de vie | True | 3 | 10.0 | -649.47797 | -996.46686 | 0.0 | #999999 |
| 1 | w70416 | impact des activités humaines | réchauffement climatique et elavation du nivea... | True | 2 | 10.0 | 789.2527 | 10.201024 | 0.0 | #999999 |
| 2 | w70453 | préservation de la qualité | développement durable et environnement | True | 3 | 10.0 | 1131.0421 | -927.3175 | 0.0 | #999999 |
| 3 | w70455 | préservation de la nature | préservation de la nature et de la biodiversité | True | 2 | 10.0 | 1068.5127 | -995.344 | 0.0 | #999999 |
| 4 | w70454 | préservation des ressources naturelles | développement durable et environnement | True | 4 | 10.0 | 982.2927 | -890.7963 | 0.0 | #999999 |


In [8]:
g._edges.head()


| | source | target |
|---|---|---|
| 0 | w70401 | w69745 |
| 1 | w70401 | w69741 |
| 2 | w70401 | w54632 |
| 3 | w70401 | w53692 |
| 4 | w70401 | w53637 |


In [9]:
g.name("SiS Words (GEXF defaults)").plot()


## Drop GEXF colors/sizes (keep layout)

Pass `bind_node_viz` / `bind_edge_viz` to `graphistry.gexf` to keep only the bindings you want.
Here we keep position for layout and drop the color, size, opacity, and icon bindings.


In [10]:
g_layout_only = graphistry.gexf(
    str(GEXF_PATH),
    bind_node_viz=["position"],
    bind_edge_viz=[],
)


In [11]:
g_layout_only.name("SiS Words (layout only)").plot()


## Apply Graphistry encodings

After dropping the GEXF bindings, use Graphistry encodings for color and size.
Here we color by `class` using a categorical mapping for the eight most frequent
classes (with a default gray for everything else), and size by `occurences`
(the column name is spelled this way in the dataset).
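
To preview which color each node would receive, the "top classes plus a default" lookup can be mimicked locally with pandas. This is only a sketch of the mapping logic on an invented stand-in column, not Graphistry's internal implementation; `top_n`, `DEFAULT_COLOR`, and `resolved` are hypothetical names.

```python
import pandas as pd

# Stand-in class column; with the real data, use g_layout_only._nodes["class"].
classes = pd.Series(["a", "b", "a", "c", "a", "b", "d"], name="class")

# Keep only the most frequent classes and pair each with a palette color.
top_n = 2
top_classes = classes.value_counts().head(top_n).index.tolist()
palette = ["#4C78A8", "#F58518"]
class_color_map = dict(zip(top_classes, palette))

# Classes missing from the mapping become NaN under map() and then
# fall back to the default color.
DEFAULT_COLOR = "#D0D0D0"
resolved = classes.map(class_color_map).fillna(DEFAULT_COLOR)

print(class_color_map)
print(resolved.tolist())
```

Capping the mapping at a handful of classes keeps the palette readable: with many categories, distinct colors become hard to tell apart, so the long tail shares one neutral color.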


In [12]:
required_cols = ["class", "occurences"]
missing_cols = [col for col in required_cols if col not in g_layout_only._nodes.columns]
assert not missing_cols, f"Missing expected node columns: {missing_cols}"

class_counts = g_layout_only._nodes["class"].value_counts()
top_classes = class_counts.head(8).index.tolist()
palette = ["#4C78A8", "#F58518", "#54A24B", "#E45756", "#72B7B2", "#EECA3B", "#B279A2", "#FF9DA6"]
class_color_map = dict(zip(top_classes, palette))

g_encoded = (
    g_layout_only
    .encode_point_color(
        "class",
        categorical_mapping=class_color_map,
        default_mapping="#D0D0D0",
        as_categorical=True,
    )
    .encode_point_size("occurences")
)

class_color_map


{'génétique': '#4C78A8',
 'préservation de la nature et de la biodiversité': '#F58518',
 'commerce équitable': '#54A24B',
 'développement durable et environnement': '#E45756',
 'gaz à effet de serre et pollution de l air': '#72B7B2',
 'maîtrise de l énergie': '#EECA3B',
 'connaissance et domaine scientifique': '#B279A2',
 'culture scientifique et technique et pédagogie des sciences': '#FF9DA6'}

In [13]:
g_encoded.name("SiS Words (layout + encodings)").plot()
