Make edges #162

justaddcoffee · 2020-05-19T02:08:59Z

Code to emit positive and negative train/test/validation edges for ML

Given a graph (from formatted node and edge TSVs), output positive edges and negative
edges for use in machine learning.
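A minimal sketch of reading the graph from the node and edge TSVs. It assumes KGX-style column names (`id` in the node file, `subject` and `object` in the edge file). These column names and the helpers `load_graph` and `isolated_nodes` are illustrative and are not taken from the PR. `isolated_nodes` shows one way a check for disconnected nodes could look.

```python
import csv


def load_graph(nodes_fh, edges_fh):
    """Read node IDs and (subject, object) pairs from open TSV file handles.

    Column names 'id', 'subject' and 'object' are assumptions (KGX-style).
    Extra columns such as edge_label or relation are ignored here.
    """
    node_ids = [row["id"] for row in csv.DictReader(nodes_fh, delimiter="\t")]
    edges = [(row["subject"], row["object"])
             for row in csv.DictReader(edges_fh, delimiter="\t")]
    return node_ids, edges


def isolated_nodes(node_ids, edges):
    """Return nodes that appear in no edge (degree 0), in input order."""
    connected = {n for edge in edges for n in edge}
    return [n for n in node_ids if n not in connected]
```

Call `load_graph` with handles opened via `open(path, newline="")`, as the `csv` module documentation recommends.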

To generate positive edges: a set of test (and optionally validation) positive edges, equal in number to [(1 - train_fraction) * number of edges in input graph], is randomly selected from the edges in the input graph, such that both nodes participating in each edge have a degree greater than min_degree (to avoid creating disconnected components). These edges are emitted as positive test [and optionally positive validation] edges. They are then removed from the input graph, and the remaining edges are the training edges.
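The positive split can be sketched as below. `split_positive_edges`, its parameter names, and the even test/validation split when `make_validation=True` are assumptions for illustration. This sketch also decrements node degrees as each edge is held out, so the `min_degree` check stays valid for later picks; the PR description does not say whether the actual code does this.

```python
import random
from collections import Counter


def split_positive_edges(edges, train_fraction=0.8, min_degree=1,
                         make_validation=False, seed=None):
    """Hold out positive test (and optionally validation) edges.

    Returns (train, test, valid) lists of (subject, object) tuples.
    """
    rng = random.Random(seed)
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    # round() rather than int(): (1 - 0.8) * 10 is 1.999..., which int() truncates to 1
    n_holdout = round((1 - train_fraction) * len(edges))

    # Shuffle edge indices (not edges) so duplicate edges are handled correctly
    order = list(range(len(edges)))
    rng.shuffle(order)

    held_idx = []
    for i in order:
        if len(held_idx) >= n_holdout:
            break
        u, v = edges[i]
        if u == v:
            continue  # this sketch never holds out self-loops
        if degree[u] > min_degree and degree[v] > min_degree:
            held_idx.append(i)
            # Update degrees so later picks see the graph with this edge removed
            degree[u] -= 1
            degree[v] -= 1

    held_set = set(held_idx)
    train = [e for i, e in enumerate(edges) if i not in held_set]
    held_out = [edges[i] for i in held_idx]

    if make_validation:
        # Assumption: split held-out edges evenly between validation and test
        n_valid = len(held_out) // 2
        return train, held_out[n_valid:], held_out[:n_valid]
    return train, held_out, []
```

Because eligible edges can run out, the held-out set may be smaller than the target on sparse graphs; for example, in the path 0-1-2-3 with `min_degree=1`, only the middle edge can ever be held out.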

Negative edges are selected by randomly selecting pairs of nodes that are not connected by an edge in the input graph. The number of negative edges emitted is equal to the number of positive edges emitted above.
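Negative selection as described can be sketched with rejection sampling. The sketch treats the graph as undirected (a pair counts as connected if an edge exists in either direction) and never samples self-pairs; both choices, and the helper name `sample_negative_edges`, are assumptions rather than details confirmed by the PR.

```python
import random


def sample_negative_edges(nodes, edges, n, seed=None):
    """Sample n distinct node pairs that are not joined by any edge."""
    rng = random.Random(seed)
    nodes = list(dict.fromkeys(nodes))  # de-duplicate, keep order
    node_set = set(nodes)
    # Undirected view of existing edges among known nodes, ignoring self-loops
    existing = {frozenset(e) for e in edges
                if e[0] != e[1] and e[0] in node_set and e[1] in node_set}
    n_possible = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    if n > n_possible:
        raise ValueError(f"only {n_possible} non-adjacent pairs available, asked for {n}")

    seen = set()
    negatives = []
    while len(negatives) < n:
        u, v = rng.sample(nodes, 2)  # two distinct nodes
        pair = frozenset((u, v))
        if pair in existing or pair in seen:
            continue  # reject: an edge exists, or this pair was already drawn
        seen.add(pair)
        negatives.append((u, v))
    return negatives
```

To keep the negative train and test sets disjoint, one option (again an assumption) is to draw `len(pos_train) + len(pos_test)` negatives in a single call and slice the result. Rejection sampling is cheap on sparse graphs but slows down as `n` approaches the number of available non-adjacent pairs.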

Outputs these files in [output_dir]:

    pos_train_edges.tsv - input graph with test [and validation] positive edges
                          removed
    pos_test_edges.tsv - positive edges for test
    pos_valid_edges.tsv (optional) - positive edges for validation
    neg_train.tsv - a set of negative edges equal in number to the edges in pos_train_edges.tsv
    neg_test.tsv - a set of negative edges equal in number to the edges in pos_test_edges.tsv
    neg_valid.tsv (optional) - a set of negative edges equal in number to the edges in pos_valid_edges.tsv

    pos_train_nodes.tsv - identical to the input nodes TSV
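Writing the outputs could look like the following hypothetical `write_outputs` helper, which uses the file names listed above and copies the input nodes TSV to `pos_train_nodes.tsv`. It writes only `subject` and `object` columns, whereas the real positive training file presumably keeps all columns of the input edge TSV (including `edge_label` and `relation`).

```python
import csv
import os
import shutil


def write_edge_tsv(path, edges, header=("subject", "object")):
    """Write (subject, object) pairs as a TSV with a header row (columns assumed)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(edges)


def write_outputs(output_dir, nodes_tsv, pos_train, pos_test, neg_train, neg_test,
                  pos_valid=None, neg_valid=None):
    """Write the train/test[/validation] edge files and copy the nodes file."""
    os.makedirs(output_dir, exist_ok=True)
    files = {
        "pos_train_edges.tsv": pos_train,
        "pos_test_edges.tsv": pos_test,
        "neg_train.tsv": neg_train,
        "neg_test.tsv": neg_test,
    }
    # Validation files are written only when validation edges were requested
    if pos_valid is not None:
        files["pos_valid_edges.tsv"] = pos_valid
    if neg_valid is not None:
        files["neg_valid.tsv"] = neg_valid
    for name, edges in files.items():
        write_edge_tsv(os.path.join(output_dir, name), edges)
    # Node set is unchanged, so the nodes file is copied as-is
    shutil.copyfile(nodes_tsv, os.path.join(output_dir, "pos_train_nodes.tsv"))
```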

deepakunni3 · 2020-05-23T03:17:31Z

@justaddcoffee Looks good. 👍

justaddcoffee added 30 commits May 14, 2020 09:15

Add edges command

dff92df

Merge stuff from master

09fdb8b

Merge again

884ce03

Add edgespy

d116035

Add click edges() command

71df96e

Tidy up args

8537ce3

Tidy up args

e4d7b05

Fix click args

bdcaf1c

Add test files

b016c40

Documentation

bc5fcce

Tests for edges command

73a65b7

Tweak args

acc03be

Fix test

3a3c4cd

Add args to make_edges()

e340c5d

Add test of query command

b5a8ffb

Code/tests for tsv to df

f66e50c

Prettify

4ec3463

Bigger test graph

e0f76c5

Add edge_label and relation col to edge TSV

a543d79

Add pass

3e16911

Check graph for disconnected nodes

361174c

Check graph for disconnected nodes

651bb2f

Stubs for new methods

5eec87c

Implementing negative edge selection

ac0d13d

Tests for negative edges

2d75711

Tests for negative edges

cc2d427

Neg edges passing shape test

6e6fb61

Add test for column names

54e15a9

Refactor

15c3fad

Test edge_label and relation columns

4ff7ef7

justaddcoffee added 22 commits May 19, 2020 19:04

Fix documentation about min_degree

172a8fa

Documentation

887b49f

Documentation

18e0861

Documentation

d6e4561

Documentation

4517e42

Update outfile names

36e26f9

Added failing tests for make_edges

4e84012

Tidying up

28ff256

Refactor

4758a4c

Passing df_to_tsv tests

6d54dbc

Refactor

1f82c5b

Tests for output files

23dafbb

Refactor

22513d2

Tests for node output file

cdf80ea

Lint

00e7ee8

More tests

7c44198

Write validation set

214cfff

More tests

1d288f8

All tests passing but 2

75ea858

All tests passing

c8596d0

Better progress messages

1a80bce

Better progress messages

ee7f270

justaddcoffee marked this pull request as ready for review May 20, 2020 19:28

justaddcoffee requested a review from deepakunni3 May 20, 2020 19:40

justaddcoffee added 2 commits May 20, 2020 14:21

Simple e2e test

f848bdb

Remove e2e tests, path issues

3baec06

deepakunni3 approved these changes May 23, 2020

deepakunni3 merged commit f702427 into master May 23, 2020

justaddcoffee deleted the make_edges branch June 9, 2020 04:06