What's the format of config? #5

PJthunder · 2019-08-07T18:45:53Z

It looks like users are expected to drive training/evaluation through a config file (yaml), but I have no idea what format it should follow. Could you add some simple examples to the readme file? Maybe just running LINE on the karate club network (adjacency matrix/edge list)?

In addition, it is not very clear what is included in the output pickle file. Could you please add a short explanation of that?

KiddoZhu · 2019-08-07T18:58:45Z

For the yaml format, please refer to the documentation page.

The pickled object is a dict that contains all embeddings, and their index mappings to the original node names. The following lines may help you understand what is in the pickled object.

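import pickle

# Load the pickled output written after training.
with open("line_blogcatalog.pkl", "rb") as fin:
    blogcatalog = pickle.load(fin)
print(blogcatalog.keys())

# id2name maps an embedding index back to the original node name.
names = blogcatalog.id2name
embeddings = blogcatalog.vertex_embeddings
print(names[1024], embeddings[1024])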

PJthunder · 2019-08-08T00:40:22Z

Thank you for your prompt reply. The documentation is helpful but a little confusing to me. So if I want to run on my own edge list file, do I need to create a separate yaml file and change the global yaml file to point to the dataset path? Also, I am not quite sure which data formats are accepted by your project.

This information may already be somewhere in the documentation, but an easy starting point in the readme file would still be very helpful for people who only want to compute embeddings quickly.

KiddoZhu · 2019-08-08T01:02:40Z

The dataset path in the global yaml file is for downloading/caching standard datasets. You don't necessarily need to modify it in most cases.

To run your own dataset:

1. Fork an existing yaml.
2. Change the dataset paths in graph and evaluation. It's better to use absolute paths.
3. Change hyperparameters if necessary.

For all applications, the dataset format is strings separated by delimiters. There can be a comment at the end of each line. By default, the delimiters are any whitespace characters, and the comment prefix is "#". For node embeddings, the following lines are valid.

# this is a comment line
xxx yyy 1.5 # some comment
xxx yyy # some comment

The edge weight is optional.

If you're using Python, you can also pass a list of (string, string, float) tuples to the interface.

Sorry for not clarifying the data formats. We will add documentation of the dataset format.
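To make the text format above concrete, here is a minimal sketch of a parser that turns such lines into (string, string, float) tuples. It is not GraphVite's own loader: the function name `parse_edge_list` is hypothetical, and the default weight of 1.0 for lines without a weight is an assumption made for illustration.

```python
def parse_edge_list(lines, delimiters=None, comment="#"):
    """Parse 'head tail [weight]' lines into (str, str, float) tuples.

    Hypothetical helper for illustration; not part of GraphVite.
    """
    edges = []
    for line in lines:
        # Drop everything from the comment prefix to the end of the line.
        content = line.split(comment, 1)[0].strip()
        if not content:
            continue  # blank line or comment-only line
        # delimiters=None makes str.split break on any run of whitespace.
        tokens = content.split(delimiters)
        if len(tokens) == 2:
            head, tail = tokens
            weight = 1.0  # assumed default when the weight is omitted
        elif len(tokens) == 3:
            head, tail = tokens[0], tokens[1]
            weight = float(tokens[2])
        else:
            raise ValueError("expected 2 or 3 fields, got: %r" % line)
        edges.append((head, tail, weight))
    return edges


sample = [
    "# this is a comment line",
    "xxx yyy 1.5 # some comment",
    "xxx yyy # some comment",
]
edges = parse_edge_list(sample)
print(edges)
```

The comment-only line is skipped, and the two remaining lines each yield one tuple, with and without an explicit weight.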

PJthunder · 2019-08-08T23:10:45Z

Thanks for your explanation!
It looks like I just need to provide a plain edge list.
I have opened another issue about the evaluation file format. I hope it will be easy for you to add some simple documentation for that one as well.

kalufinnle · 2019-08-08T23:19:11Z

I also found it a bit confusing what the expected dataset format is. Could you add some simple examples to the readme file, or maybe a tutorial on how to run the code on our own data? Thank you.

PJthunder · 2019-08-13T21:57:23Z

Sorry to bother you again, but I don't know what the input format for LargeVis is. I know it should be vectors for nodes, but I don't know exactly how the vectors should be formatted.

KiddoZhu · 2019-08-13T23:28:13Z

That's good. I will add it to the document.

If you call LargeVis from yaml, there are two formats, depending on the task. For graph visualization, it's an edge list. For vector visualization, it's an n*d text matrix, i.e. n lines of d-dimensional samples.

The command line graphvite visualize is only designed for vector visualization, but you can also pass a numpy dump of an n*d matrix as input, with a .npy suffix.
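As an illustration of the n*d text matrix, here is a minimal sketch that writes one sample per line and reads it back. The helper names and the file name `vectors.txt` are hypothetical, and separating the values with a single space is an assumption based on the whitespace delimiter mentioned earlier in this thread.

```python
import os
import tempfile


def write_vectors(path, vectors):
    """Write n samples of dimension d as n lines of d numbers each."""
    dim = len(vectors[0])
    with open(path, "w") as fout:
        for vector in vectors:
            if len(vector) != dim:
                raise ValueError("all samples must have the same dimension")
            # repr() keeps enough digits for the floats to round-trip exactly.
            fout.write(" ".join(repr(float(x)) for x in vector) + "\n")


def read_vectors(path):
    """Read the matrix back as a list of n lists of d floats."""
    with open(path) as fin:
        return [[float(x) for x in line.split()] for line in fin if line.strip()]


vectors = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]  # n = 2 samples, d = 3
path = os.path.join(tempfile.mkdtemp(), "vectors.txt")
write_vectors(path, vectors)
loaded = read_vectors(path)
print(len(loaded), len(loaded[0]))
```

For the .npy variant, the usual way to produce the file would be `numpy.save` on a 2-D array of the same shape.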

PJthunder · 2019-08-13T23:29:27Z

Thank you for the quick reply!

PJthunder · 2019-08-19T23:41:57Z

Just want to double check: the label file for LargeVis is just an array (length = n) of strings or integers, right?

KiddoZhu · 2019-08-20T17:48:14Z

Yes. It can be either n lines of strings (*.txt) or a 1d numpy array (*.npy).
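A minimal sketch of the text variant, assuming one label per line in the same order as the n samples. The helper names and the file name `labels.txt` are hypothetical; integers are simply written as their string form.

```python
import os
import tempfile


def write_labels(path, labels):
    """Write one label per line; integer labels are stored as strings."""
    with open(path, "w") as fout:
        for label in labels:
            fout.write(str(label) + "\n")


def read_labels(path):
    """Read the labels back as a list of strings, one per line."""
    with open(path) as fin:
        return [line.rstrip("\n") for line in fin]


n_samples = 3  # should equal the number of rows in the vector file
labels = ["cat", "dog", 3]
path = os.path.join(tempfile.mkdtemp(), "labels.txt")
write_labels(path, labels)
loaded_labels = read_labels(path)

# A label file is only usable if it has exactly one entry per sample.
if len(loaded_labels) != n_samples:
    raise ValueError("expected %d labels, got %d" % (n_samples, len(loaded_labels)))
print(loaded_labels)
```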

KiddoZhu · 2019-10-12T01:55:21Z

Added in v0.2.0

KiddoZhu closed this as completed Aug 7, 2019

KiddoZhu reopened this Aug 8, 2019

KiddoZhu added the documentation label Aug 8, 2019

KiddoZhu mentioned this issue Aug 9, 2019

Does DeepWalk support weighted edges? #9

Closed

KiddoZhu closed this as completed Oct 12, 2019
