Currently, Arachne requires edge attributes and node attributes to be loaded separately, from different dataframes, via the load_edge_attributes and load_node_attributes functions. This issue proposes allowing node attribute columns to be specified during edge insertion, so that the data an edge carries about each of its endpoint vertices can be stored as node attributes.
This should improve performance, since a user may currently have to perform a large merge-join on the edge attribute dataframe to gather all of the data for each vertex.
Example of desired functionality from Tom:
import arkouda as ak
import arachne as akg
from glob import glob
import pandas as pd
import numpy as np
import socket
import timeit
import os

ak.connect("xxxx")
rawfilelist = ["file1", "file2", "file3"]
rawfilelist = rawfilelist[0:500]
columns = ["src_ip", "src_port", "dst_ip", "dst_port", "protocol"]
rawdata = ak.readmethod(rawfilelist, datasets=columns)  # substitute with method to read the appropriate file types
raw_df = ak.DataFrame(rawdata)
raw_df.columns
["src_ip",
"src_port",
"dst_ip",
"dst_port",
"protocol"]
## TEMP FIX FOR PROPERTY GRAPH AS CODE LOOKS for "src" and "dst"
#
filtered_df = raw_df
filtered_df["src"] = filtered_df["src_ip"]
filtered_df["dst"] = filtered_df["dst_ip"]
prop_graph = akg.PropGraph()
## Add new collection to indicate properties to gather from src/dst nodes when they are created.
# In my case I calculated those values to use in the load_node_attributes method like the following:
#
# prop_graph.load_node_attributes(node_df, node_column="nodes", label_columns=["ip", "port", "protocol"])
#
# BELOW IS THE DESIRED CODE WHICH AVOIDS HAVING TO PERFORM MERGE-JOINS.
#
# NOTE: Some items left to decide. Does load_edge_attributes calculate the node_columns, or are they
# provided in the filtered_df? If there are multiple node_column values for each vertex, are they
# stored as a collection? For instance, src has two different protocols (two edges) or two ports, etc.
#
prop_graph.load_edge_attributes(filtered_df,
                                source_column="src",
                                destination_column="dst",
                                relationship_columns=["protocol", "src_port", "dst_port"],
                                node_columns=["ip", "port", "protocol"])
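To make the open question in the NOTE concrete, here is a minimal pure-Python sketch of one possible semantics: node attributes are gathered in a single pass over the edge rows, and each attribute is stored as a set, so a vertex seen on several edges keeps every distinct value. This is not Arachne code. The `collect_node_attributes` helper and the `NODE_COLUMN_SOURCES` mapping (which edge column feeds which node attribute) are hypothetical names introduced only for illustration, and the sample rows are made up.

```python
from collections import defaultdict

# Hypothetical mapping (not part of Arachne): node attribute name ->
# (edge column describing the source vertex, edge column describing the destination vertex).
NODE_COLUMN_SOURCES = {
    "ip": ("src_ip", "dst_ip"),
    "port": ("src_port", "dst_port"),
    "protocol": ("protocol", "protocol"),
}


def collect_node_attributes(edges, source_column="src", destination_column="dst"):
    """Gather per-vertex attribute values in one pass over the edge rows.

    Each attribute is kept as a set, so a vertex that appears on several
    edges with different ports or protocols retains every distinct value.
    """
    node_attrs = defaultdict(lambda: defaultdict(set))
    for edge in edges:
        src, dst = edge[source_column], edge[destination_column]
        for attr, (src_col, dst_col) in NODE_COLUMN_SOURCES.items():
            node_attrs[src][attr].add(edge[src_col])
            node_attrs[dst][attr].add(edge[dst_col])
    # Convert the nested defaultdicts to plain dicts for the caller.
    return {vertex: dict(attrs) for vertex, attrs in node_attrs.items()}


# Made-up sample edges: 10.0.0.1 is the source of two edges with
# different ports and protocols.
edges = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
     "src_port": 5000, "dst_port": 80, "protocol": "tcp"},
    {"src": "10.0.0.1", "dst": "10.0.0.3", "src_ip": "10.0.0.1", "dst_ip": "10.0.0.3",
     "src_port": 5001, "dst_port": 53, "protocol": "udp"},
]

node_attrs = collect_node_attributes(edges)
```

Under this choice, the vertex 10.0.0.1 ends up with two ports and two protocols. Whether Arachne should keep such collections, or instead keep only the first or last value seen, is exactly the decision the NOTE leaves open.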