ItemGraph load_from_file() does not load URL nodes correctly #2415

@PancoSH

Description

Describe the bug
When loading a graph with the ItemGraph module's load_from_file() function, each node is split on underscores to get the item ID, regardless of whether the node actually contains an item ID. This breaks the nodes and edges associated with any URL that contains underscores, which only becomes apparent when calling .required_by() on the URL nodes.

For instance, "https://services.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer"
is stored in the .gml as "node_https://services.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer"
but loaded into Python as "https://services.arcgisonline.com/ArcGIS/rest/services/World"

From what I can tell the issue comes from line 469 of _item_graph.py (itemid = data.split("_")[1]) in the destringize_node() sub-function.
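
To illustrate how that one expression truncates the URL, here is a minimal sketch. Only the quoted expression comes from _item_graph.py; the function name extract_id_split_all and the 32-character item ID are made up for this example, and the rest of destringize_node() is not reproduced.

```python
# Stand-in for the quoted line (itemid = data.split("_")[1]).
# The function name is hypothetical; it is not the library's code.
def extract_id_split_all(data: str) -> str:
    # Splits on EVERY underscore and keeps only the second piece.
    return data.split("_")[1]


# Made-up item ID with no underscores: survives the split intact.
stored_item = "node_0123456789abcdef0123456789abcdef"

# URL node as it appears in the .gml file from this report.
stored_url = (
    "node_https://services.arcgisonline.com/ArcGIS/rest/services/"
    "World_Imagery/MapServer"
)

print(extract_id_split_all(stored_item))
print(extract_id_split_all(stored_url))  # cut off at "World"
```

An ID without underscores is unaffected, which would explain why the problem only shows up on URL nodes.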

To Reproduce
Steps to reproduce the behavior:

  • Create an ItemGraph that contains an item that depends on a web item with an underscore in its URL.
  • Save the ItemGraph as a .gml file and load it back in with load_from_file()
  • Identify the associated node and run brokennode.required_by()

Alternatively, using the code samples written here:

nodes = new_graph.all_items()

# define our sort function
def count_reqs(node):
    return len(node.required_by("id"))

# this orders our nodes properly
sorted_nodes = sorted(nodes, key=count_reqs)

# remove ones that don't have items and aren't dependencies of others
for node in sorted_nodes:
    if not node.item and not node.contained_by():
        new_graph.remove_node(node.id)

error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File /opt/conda/lib/python3.11/site-packages/networkx/classes/digraph.py:954, in DiGraph.predecessors(self, n)
    953 try:
--> 954     return iter(self._pred[n])
    955 except KeyError as err:

KeyError: 'https://services.arcgisonline.com/ArcGIS/rest/services/World'

The above exception was the direct cause of the following exception:

NetworkXError                             Traceback (most recent call last)
Cell In[35], line 7
      4     return len(node.required_by("id"))
      6 # this orders our nodes properly
----> 7 sorted_nodes = sorted(nodes, key=count_reqs)
      9 # remove ones that don't have items and aren't dependencies of others
     10 for node in sorted_nodes:

Cell In[35], line 4, in count_reqs(node)
      2 def count_reqs(node):
      3     # print(f"{node.id} - {node.required_by('id')}")
----> 4     return len(node.required_by("id"))

File /opt/conda/lib/python3.11/site-packages/arcgis/apps/itemgraph/_item_graph.py:175, in ItemNode.required_by(self, out_format)
    152 def required_by(self, out_format: str = "node"):
    153     """
    154     Compiles a deep list of all items that require this item to exist. For example, if this
    155     item is a Feature Service found in a WebMap that is then itself found in a Dashboard,
   (...)
    172         A list of item ID's or items.
    173     """
--> 175     return self._handle_nodes(list(nx.ancestors(self.graph, self.id)), out_format)

File <class 'networkx.utils.decorators.argmap'> compilation 34:3, in argmap_ancestors_31(G, source, backend, **backend_kwargs)
      1 import bz2
      2 import collections
----> 3 import gzip
      4 import inspect
      5 import itertools

File /opt/conda/lib/python3.11/site-packages/networkx/utils/backends.py:967, in _dispatchable.__call__(self, backend, *args, **kwargs)
    965     if backend is not None and backend != "networkx":
    966         raise ImportError(f"'{backend}' backend is not installed")
--> 967     return self.orig_func(*args, **kwargs)
    969 # Use `backend_name` in this function instead of `backend`.
    970 # This is purely for aesthetics and to make it easier to search for this
    971 # variable since "backend" is used in many comments and log/error messages.
    972 backend_name = backend

File /opt/conda/lib/python3.11/site-packages/networkx/algorithms/dag.py:110, in ancestors(G, source)
     76 @nx._dispatchable
     77 def ancestors(G, source):
     78     """Returns all nodes having a path to `source` in `G`.
     79 
     80     Parameters
   (...)
    108     descendants
    109     """
--> 110     return {child for parent, child in nx.bfs_edges(G, source, reverse=True)}

File /opt/conda/lib/python3.11/site-packages/networkx/algorithms/dag.py:110, in <setcomp>(.0)
     76 @nx._dispatchable
     77 def ancestors(G, source):
     78     """Returns all nodes having a path to `source` in `G`.
     79 
     80     Parameters
   (...)
    108     descendants
    109     """
--> 110     return {child for parent, child in nx.bfs_edges(G, source, reverse=True)}

File /opt/conda/lib/python3.11/site-packages/networkx/algorithms/traversal/breadth_first_search.py:194, in bfs_edges(G, source, reverse, depth_limit, sort_neighbors)
    190     yield from generic_bfs_edges(
    191         G, source, lambda node: iter(sort_neighbors(successors(node))), depth_limit
    192     )
    193 else:
--> 194     yield from generic_bfs_edges(G, source, successors, depth_limit)

File /opt/conda/lib/python3.11/site-packages/networkx/algorithms/traversal/breadth_first_search.py:93, in generic_bfs_edges(G, source, neighbors, depth_limit)
     91 n = len(G)
     92 depth = 0
---> 93 next_parents_children = [(source, neighbors(source))]
     94 while next_parents_children and depth < depth_limit:
     95     this_parents_children = next_parents_children

File /opt/conda/lib/python3.11/site-packages/networkx/classes/digraph.py:956, in DiGraph.predecessors(self, n)
    954     return iter(self._pred[n])
    955 except KeyError as err:
--> 956     raise NetworkXError(f"The node {n} is not in the digraph.") from err

NetworkXError: The node https://services.arcgisonline.com/ArcGIS/rest/services/World is not in the digraph.

Expected behavior
Non-item nodes should be loaded with their full IDs so that looking up what depends on them (for example with .required_by()) does not raise an error.
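
One way this could be achieved is to split only on the first underscore. The sketch below is an assumption-laden illustration, not the library's implementation: stringize_id and destringize_id are hypothetical helpers, and the "node_" prefix is inferred from the single .gml example in this report (other prefixes may exist).

```python
# Hypothetical helpers for illustration; not part of the arcgis package.
PREFIX = "node_"  # assumed from the .gml example in this report


def stringize_id(node_id: str) -> str:
    """Build the stored form of a node ID (assumed format)."""
    return PREFIX + node_id


def destringize_id(data: str) -> str:
    """Recover the node ID by splitting on the FIRST underscore only."""
    _prefix, sep, rest = data.partition("_")
    if not sep:
        raise ValueError(f"no underscore-separated prefix in {data!r}")
    return rest


samples = [
    "0123456789abcdef0123456789abcdef",  # made-up item ID
    "https://services.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer",
    "https://example.com/a_b_c/d_e",  # made-up URL with several underscores
]

# Round-trip check: every ID should come back unchanged.
for node_id in samples:
    assert destringize_id(stringize_id(node_id)) == node_id
```

str.partition leaves everything after the first underscore untouched, so URLs with any number of underscores round-trip; data.split("_", 1)[1] would behave the same way for these inputs.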

Platform (please complete the following information):

  • OS: Windows 11
  • Browser: Firefox
  • Python API Version: 2.4.1

Additional context
This bug was found trying to set up an automatically updating Org-Wide Graph using the associated sample document in an ArcGIS Online notebook.
