Better approach to ids in WriteGraphML & WriteGEXF #168

Closed
greebie opened this issue Jan 29, 2018 · 1 comment

@greebie
Contributor

greebie commented Jan 29, 2018

Currently, the WriteGraphML and WriteGEXF scripts use the URLs themselves as ids. This causes problems for the igraph library and is not good practice for producing unique ids.

The challenge is that, with Apache Spark partitioning and our current graph output, ((url1, url2, date), count), you cannot just create an autonumber on the fly. For instance, if I assign ids using .zipWithIndex and get (((1, url1), (2, url2), 20180129), 456), how would I model the next entry, ((url2, url3, 20180129), 265), using a regular map, since I would want the output to be (((2, url2), (3, url3), 20180129), 265)?

The current script identifies nodes using a flatMap followed by .distinct(), but the node ids are also used to define edges (links). If we cannot ensure that node ids are assigned consistently, it will be impossible to match edges to nodes properly.
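To make the matching problem concrete, here is a minimal sketch of numbering the distinct URLs once and then looking both endpoints up in that shared table. It uses plain Scala collections rather than Spark, and the names `IndexedIds`, `nodeIds`, and `withIds` are hypothetical, not part of the existing scripts. On an RDD the lookup would presumably become a join or a broadcast map, which is not shown here.

```scala
object IndexedIds {
  // Edge record in the shape described above: ((source, target, date), count).
  type Edge = ((String, String, String), Int)

  // Collect every URL that appears as a source or target, de-duplicate,
  // sort so the numbering is reproducible, and number the result from 0.
  def nodeIds(edges: Seq[Edge]): Map[String, Long] =
    edges
      .flatMap { case ((src, dst, _), _) => Seq(src, dst) }
      .distinct
      .sorted
      .zipWithIndex
      .map { case (url, i) => (url, i.toLong) }
      .toMap

  // Rewrite each edge so both endpoints carry the id from the shared table.
  def withIds(edges: Seq[Edge]): Seq[(((Long, String), (Long, String), String), Int)] = {
    val ids = nodeIds(edges)
    edges.map { case ((src, dst, date), count) =>
      (((ids(src), src), (ids(dst), dst), date), count)
    }
  }
}
```

Because every edge reads from the same table, url2 gets the same id whether it appears as a target in one record or as a source in the next.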

Possible approaches:

  1. Simply use hash(urlname) to produce a unique id. Since hashing the same URL always gives the same result, the ids used for nodes and for edges will match.
  2. Create an id-assigning function and map with that.
  3. There may be some tools in the GraphX library that can help produce the correct result.
  4. Apply labels in some different manner, perhaps with quotes around them to confirm that they are strings, and see if that works in igraph.
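As a rough sketch of the first approach, the id can be derived from the URL alone, so no shared counter or lookup table is needed across partitions. MD5 is an assumption here (the issue only says "hash"), and `HashedIds`, `urlId`, and `withIds` are hypothetical names.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object HashedIds {
  // Hex-encoded MD5 digest of the URL (MD5 is an assumed choice of hash).
  // The same input always yields the same 32-character id.
  def urlId(url: String): String =
    MessageDigest.getInstance("MD5")
      .digest(url.getBytes(StandardCharsets.UTF_8))
      .map(b => "%02x".format(b & 0xff))
      .mkString

  // Attach a hashed id to both endpoints of one ((source, target, date), count) record.
  def withIds(edge: ((String, String, String), Int)): (((String, String), (String, String), String), Int) = {
    val ((src, dst, date), count) = edge
    (((urlId(src), src), (urlId(dst), dst), date), count)
  }
}
```

Each record can be mapped independently, which is what makes this fit a regular map over partitioned data. The trade-off is that ids are long strings rather than small integers, and distinct URLs could in principle collide.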

In the end, both Gephi and igraph will re-assign node and edge ids using their own graph map.

@ianmilligan1
Member

Closed with #170
