Currently, the WriteGraphML and WriteGEXF scripts use the URLs themselves as node ids. This causes problems for the igraph library and is not good practice for producing unique ids.
The challenge is that, with Apache Spark partitioning and our current graph output, ((url1, url2, date), count), you cannot just create an auto-number on the fly. For instance, if I assign ids using .zipWithIndex and get (((1, url1), (2, url2), 20180129), 456), how would I handle the next entry, ((url2, url3, 20180129), 265), with a regular map, given that I would want the output to be (((2, url2), (3, url3), 20180129), 265)?
The current script identifies nodes using a flatMap followed by .distinct(), but the node ids are also used to define the edges (links). Unless node ids are assigned in one consistent way, it will be impossible to match edges to nodes properly.
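To make the requirement concrete, here is a minimal sketch in plain Python, with lists standing in for the RDD. It is not the current script: the names `node_ids` and `with_ids` are hypothetical, and the in-memory dict is an assumption that would have to become a join or a broadcast lookup on a real cluster. In Spark terms the numbering step corresponds roughly to flatMap, distinct and zipWithIndex.

```python
# Sample edges in the current output shape: ((url1, url2, date), count).
edges = [
    (("url1", "url2", 20180129), 456),
    (("url2", "url3", 20180129), 265),
]

# Collect every distinct URL once, in sorted order so the numbering is
# repeatable, then number the URLs starting at 1.
nodes = sorted({url for (src, dst, _date), _count in edges for url in (src, dst)})
node_ids = {url: idx for idx, url in enumerate(nodes, start=1)}


def with_ids(edge):
    """Attach the shared node id to both endpoints of one edge."""
    (src, dst, date), count = edge
    return (((node_ids[src], src), (node_ids[dst], dst), date), count)


# Every edge is translated through the same lookup, so url2 gets the
# same id whether it appears as a source or as a target.
labelled = [with_ids(edge) for edge in edges]
```

Because the node list and the edge list are both derived from `node_ids`, the ids cannot drift apart, which is the property the edge matching depends on.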
Possible approaches:
Simply use hash(urlname) to produce a unique id. Since hashing the same URL always gives the same result, the ids in the node list and the edge list should match.
Create an id-assigning function and map with that.
There may be some tools in the GraphX library that can help produce the correct result.
Apply labels in some different manner, perhaps with quotes around them to confirm that they are strings, and see if that works in igraph.
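The hashing approach from the list above could look something like the following sketch. It is again plain Python rather than the Spark job, and the choice of SHA-256 and the helper name `node_id` are assumptions made for illustration. Python's built-in `hash()` is salted per process for strings, so a digest from `hashlib` is used to keep ids stable across runs and workers.

```python
import hashlib


def node_id(url: str) -> str:
    """Return a deterministic id for a URL: the hex SHA-256 of its UTF-8 bytes."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# Sample edges in the current output shape: ((url1, url2, date), count).
edges = [
    (("url1", "url2", 20180129), 456),
    (("url2", "url3", 20180129), 265),
]

# Node rows: one (id, label) pair per distinct URL.
node_rows = sorted({(node_id(u), u) for (src, dst, _d), _c in edges for u in (src, dst)})

# Edge rows: endpoints are replaced by ids computed independently per row,
# with no shared counter and no lookup table.
edge_rows = [(node_id(src), node_id(dst), date, count) for (src, dst, date), count in edges]
```

Each id depends only on the URL, so no coordination between partitions is needed. The trade-off is that the ids are long strings rather than small integers, and a shorter hash would raise the chance of two URLs colliding.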
In the end, both Gephi and igraph will re-assign node and edge ids using their graph map.