Better approach to ids in WriteGraphML & WriteGEXF #168

greebie · 2018-01-29T15:49:07Z

Currently, the WriteGraphML and WriteGEXF scripts use the urls themselves as ids. This causes problems for the igraph library and is not good practice for producing unique ids.

The challenge is that, with Apache Spark partitioning and our current graph output ((url1, url2, date), count), you cannot just create an autonumber on the fly. For instance, if I assign ids using .zipWithIndex and get (((1, url1), (2, url2), 20180129), 456), how would I model the next entry, ((url2, url3, 20180129), 265), using a regular map, given that I would want the output to be (((2, url2), (3, url3), 20180129), 265)?

The current script identifies nodes using a flatMap followed by .distinct(), but the node ids are also used to define edges / links. If we cannot ensure a standard practice for assigning node ids, it will be impossible to match edges properly.
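To make the matching problem concrete, here is a minimal sketch in plain Python (not the toolkit's actual code) of one way to keep node and edge ids consistent: number the distinct urls once, then relabel every record through that single lookup. The sample records and the helper names `build_id_lookup` and `relabel` are hypothetical. In Spark, the lookup step would presumably correspond to calling .distinct() and .zipWithIndex() on the url collection before mapping the edges, but that is an assumption rather than what the scripts currently do.

```python
# Records shaped like the graph output in this issue: ((src, dst, date), count).
# The urls and counts are the illustrative values used in the issue text.
records = [
    (("url1", "url2", "20180129"), 456),
    (("url2", "url3", "20180129"), 265),
]


def build_id_lookup(records):
    """Assign one integer id per distinct url (sorted so the ids are repeatable)."""
    urls = sorted({url for (src, dst, _date), _count in records for url in (src, dst)})
    return {url: index for index, url in enumerate(urls, start=1)}


def relabel(records, lookup):
    """Rewrite each record so both endpoints carry their (id, url) pair."""
    return [
        (((lookup[src], src), (lookup[dst], dst), date), count)
        for (src, dst, date), count in records
    ]


lookup = build_id_lookup(records)
edges = relabel(records, lookup)
```

Because every record goes through the same lookup, url2 gets the same id whether it appears as a source or as a target, which is the property the edge list depends on.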

Possible approaches:

Simply use hash(urlname) to produce a unique id. Since the hash is the same wherever a given url appears, whether in the node list or in an edge, nodes and edges should match up.
Create an id-assigning function and map with that.
There may be some tools in the GraphX library that can help produce the correct result.
Apply labels in some different manner, perhaps with quotes around them to confirm that they are strings, and see if that works in igraph.
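The hash-based approach from the list above could look roughly like the following sketch in plain Python. It is only an illustration: the choice of MD5 via `hashlib`, and the names `url_id` and `nodes_and_edges`, are assumptions and not taken from the scripts. A digest from `hashlib` is used instead of Python's built-in `hash()` because the built-in is salted per process for strings, so its values would not be stable between runs.

```python
import hashlib


def url_id(url):
    """Return a stable hex id derived only from the url string (MD5 is an assumed choice)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def nodes_and_edges(records):
    """Split ((src, dst, date), count) records into a node table and an edge list.

    No shared counter or lookup pass is needed, because each id depends
    only on the url it was computed from.
    """
    nodes = {}
    edges = []
    for (src, dst, date), count in records:
        nodes[url_id(src)] = src
        nodes[url_id(dst)] = dst
        edges.append((url_id(src), url_id(dst), date, count))
    return nodes, edges


# Same illustrative records as in the issue text.
sample = [
    (("url1", "url2", "20180129"), 456),
    (("url2", "url3", "20180129"), 265),
]
nodes, edges = nodes_and_edges(sample)
```

The trade-off is that hashed ids are long strings rather than small integers, and two different urls could in principle collide, although that is unlikely at this scale.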

In the end, both Gephi and igraph will re-assign node and edge ids using their own graph map.

ianmilligan1 · 2018-02-17T18:55:13Z

Closed with #170

ianmilligan1 added the enhancement label Jan 29, 2018

ianmilligan1 assigned greebie Feb 6, 2018

greebie mentioned this issue Feb 13, 2018

Graphml Improvements #170

Merged

ianmilligan1 closed this as completed Feb 17, 2018
