Combined edge and node attribute insertion #118

Open
alvaradoo opened this issue May 1, 2024 · 0 comments
@alvaradoo

Arachne currently requires edge attributes and node attributes to be loaded separately, from different dataframes, via the load_edge_attributes and load_node_attributes functions. This issue proposes letting users specify node attribute columns during edge insertion, so that the data an edge row carries about its source or destination vertex can be stored as a node attribute of that vertex.

This will improve performance, since today a user may have to run a large merge-join on the edge attribute dataframe just to gather all of the data for each vertex.
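
To make the extra step concrete, here is a minimal sketch of the kind of preprocessing a user does today before calling load_node_attributes. It uses plain Python dicts as a stand-in for Arkouda dataframe rows, and it de-duplicates endpoints with a set rather than performing a real merge-join. The helper name `build_node_rows` and the sample rows are hypothetical and not part of Arachne.

```python
# Edge rows as plain dicts: a stand-in for an Arkouda dataframe (assumption).
edges = [
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.2", "dst_port": 80, "protocol": "tcp"},
    {"src_ip": "10.0.0.1", "src_port": 5001, "dst_ip": "10.0.0.3", "dst_port": 443, "protocol": "tcp"},
    {"src_ip": "10.0.0.2", "src_port": 6000, "dst_ip": "10.0.0.3", "dst_port": 443, "protocol": "udp"},
]


def build_node_rows(edge_rows):
    """Hypothetical helper: one (ip, port, protocol) row per distinct endpoint."""
    seen = set()
    for e in edge_rows:
        # Each edge contributes data about both of its endpoints.
        seen.add((e["src_ip"], e["src_port"], e["protocol"]))
        seen.add((e["dst_ip"], e["dst_port"], e["protocol"]))
    return sorted(seen)


# This separate node table is what would then be handed to load_node_attributes.
node_rows = build_node_rows(edges)
```

With the proposed change, this second pass over the edge data would no longer be the user's responsibility.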

Example of desired functionality from Tom:

import arkouda as ak
import arachne as akg
from glob import glob
import pandas as pd
import numpy as np
import socket
import timeit
import os

ak.connect("xxxx")

rawfilelist = ["file1","file2","file3"]
rawfilelist = rawfilelist[0:500]

columns = ["src_ip","src_port","dst_ip","dst_port","protocol"]

rawdata = ak.readmethod(rawfilelist,datasets = columns) # substitute with method to read the appropriate file types
raw_df = ak.DataFrame(rawdata)

raw_df.columns

["src_ip",
 "src_port",
 "dst_ip",
 "dst_port",
 "protocol"]

#
# TEMP FIX FOR PROPERTY GRAPH AS CODE LOOKS for "src" and "dst"
#
filtered_df = raw_df
filtered_df["src"] = filtered_df["src_ip"]
filtered_df["dst"] = filtered_df["dst_ip"]

prop_graph = akg.PropGraph()

#
# Add new collection to indicate properties to gather from src/dst nodes when they are created.
# In my case I calculated those values to use in the load_node_attributes method like the following:
#
# prop_graph.load_node_attributes(node_df,node_column="nodes",label_columns=["ip","port","protocol"])
#
# BELOW IS THE DESIRED CODE WHICH AVOIDS HAVING TO PERFORM MERGE-JOINS.
#
# NOTE: Some items left to decide.  Does load_edge_attributes calculate the node_columns, or are they
#       provided in filtered_df?  If there are multiple node_column values for a vertex, are they
#       stored as a collection?  For instance, src may have two different protocols (two edges) or two ports.
#
prop_graph.load_edge_attributes(filtered_df,
                                source_column="src",
                                destination_column="dst",
                                relationship_columns=["protocol","src_port","dst_port"],
                                node_columns=["ip","port","protocol"])
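
One possible answer to the open question about multiple values per vertex is to store each node attribute as a collection. The sketch below illustrates that semantics in plain Python, gathering per-vertex ports and protocols in a single pass over the edges. It is only an illustration under stated assumptions: `collect_node_attributes`, the dict-based rows, and the choice of sets are hypothetical and do not reflect a decided Arachne behavior.

```python
from collections import defaultdict

# Edge rows as plain dicts: a stand-in for filtered_df (assumption).
edges = [
    {"src_ip": "10.0.0.1", "src_port": 5000, "dst_ip": "10.0.0.2", "dst_port": 80, "protocol": "tcp"},
    {"src_ip": "10.0.0.1", "src_port": 5001, "dst_ip": "10.0.0.3", "dst_port": 443, "protocol": "tcp"},
    {"src_ip": "10.0.0.2", "src_port": 6000, "dst_ip": "10.0.0.3", "dst_port": 443, "protocol": "udp"},
]


def collect_node_attributes(edge_rows):
    """Hypothetical: gather per-vertex attribute sets while reading the edges once."""
    attrs = defaultdict(lambda: {"port": set(), "protocol": set()})
    for e in edge_rows:
        # Visit the source endpoint, then the destination endpoint of this edge.
        for ip_key, port_key in (("src_ip", "src_port"), ("dst_ip", "dst_port")):
            node = attrs[e[ip_key]]
            node["port"].add(e[port_key])
            node["protocol"].add(e["protocol"])
    return dict(attrs)


node_attrs = collect_node_attributes(edges)
```

Here a vertex that appears on several edges, such as 10.0.0.1 with two source ports, ends up with a set of values per attribute instead of a single value.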
@alvaradoo added the enhancement (New feature or request) label May 1, 2024
@alvaradoo self-assigned this May 1, 2024
@Bears-R-Us deleted a comment from mdindoost May 15, 2024