Make edges #162

Merged
merged 121 commits into from
May 23, 2020

justaddcoffee
Collaborator

@justaddcoffee justaddcoffee commented May 19, 2020

Code to emit positive and negative train/test/validation edges for ML

Given a graph (from formatted node and edge TSVs), output positive edges and negative
edges for use in machine learning.

To generate positive edges: a set of test (and optionally validation) positive edges, equal in number to [(1 - train_fraction) * number of edges in input graph], is randomly selected from the edges in the input graph, such that both nodes participating in each edge have a degree greater than min_degree (to avoid creating disconnected components). These edges are emitted as positive test [and optionally positive validation] edges. They are then removed from the input graph, and the edges that remain are the training edges.
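
The selection step described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `split_positive_edges`, its parameters, the use of `round()` for the held-out count, and the choice to decrement node degrees as edges are held out are all assumptions made for the example.

```python
import random
from collections import Counter


def split_positive_edges(edges, train_fraction=0.8, min_degree=1, seed=None):
    """Hypothetical helper: split (subject, object) pairs into train and held-out edges."""
    rng = random.Random(seed)

    # Degree of every node in the input graph.
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # Target number of held-out (test/validation) edges.
    # round() is an assumption; it avoids float truncation such as 1.9999 -> 1.
    n_held_out = round((1 - train_fraction) * len(edges))

    candidates = list(edges)
    rng.shuffle(candidates)

    train, held_out = [], []
    for u, v in candidates:
        eligible = degree[u] > min_degree and degree[v] > min_degree
        if len(held_out) < n_held_out and eligible:
            held_out.append((u, v))
            # Assumption: degrees are updated as edges are removed, so a node
            # never drops to min_degree or below in the training graph.
            degree[u] -= 1
            degree[v] -= 1
        else:
            train.append((u, v))
    return train, held_out
```

If too few edges satisfy the degree constraint, this sketch simply returns fewer held-out edges than requested; whether to raise an error instead is a design choice left open here.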

Negative edges are generated by randomly selecting pairs of nodes that are not connected by an edge in the input graph. The number of negative edges emitted is equal to the number of positive edges emitted above.
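
Negative sampling of this kind might look like the sketch below. It is illustrative only: `sample_negative_edges` and its parameters are hypothetical names, and it assumes edges are undirected, self-loops are never emitted, and node IDs are sortable values such as strings.

```python
import random


def sample_negative_edges(nodes, positive_edges, n_samples, seed=None):
    """Hypothetical helper: sample node pairs that are not edges in the input graph."""
    rng = random.Random(seed)
    nodes = list(dict.fromkeys(nodes))  # de-duplicate, keep order
    node_set = set(nodes)

    # Assumption: edges are undirected, so (a, b) and (b, a) are the same edge.
    existing = {frozenset(edge) for edge in positive_edges}

    # Guard against asking for more negatives than the graph can supply.
    blocked = {e for e in existing if len(e) == 2 and e <= node_set}
    max_possible = len(nodes) * (len(nodes) - 1) // 2 - len(blocked)
    if n_samples > max_possible:
        raise ValueError(f"only {max_possible} negative edges are available")

    negatives = set()
    while len(negatives) < n_samples:
        # random.sample draws two distinct nodes, so no self-loops.
        pair = frozenset(rng.sample(nodes, 2))
        if pair not in existing:
            negatives.add(pair)  # the set silently drops repeated draws
    return [tuple(sorted(pair)) for pair in negatives]
```

Rejection sampling like this is cheap on sparse graphs, where almost every random pair is a non-edge. On a dense graph it would be better to enumerate the non-edges and sample from that list.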

Outputs these files in [output_dir]:

    pos_train_edges.tsv - input graph with test [and validation] positive edges
                          removed
    pos_test_edges.tsv - positive edges for test
    pos_valid_edges.tsv (optional) - positive edges for validation
    neg_train.tsv - a set of negative edges equal in number to pos_train_edges.tsv
    neg_test.tsv - a set of negative edges equal in number to pos_test_edges.tsv
    neg_valid.tsv (optional) - a set of negative edges equal in number to pos_valid_edges.tsv

    pos_train_nodes.tsv - identical to input nodes tsv

@justaddcoffee justaddcoffee marked this pull request as ready for review May 20, 2020 19:28
@deepakunni3
Member

@justaddcoffee Looks good. 👍

@deepakunni3 deepakunni3 merged commit f702427 into master May 23, 2020
@justaddcoffee justaddcoffee deleted the make_edges branch June 9, 2020 04:06