Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pi output type #8

Closed
alfonsotecnalia opened this issue Sep 3, 2020 · 17 comments
Closed

pi output type #8

alfonsotecnalia opened this issue Sep 3, 2020 · 17 comments

Comments

@alfonsotecnalia
Copy link
Collaborator

At this moment, we are considering the following output types: scalar, vector, matrix, labelled_matrix and string.

We see in the output file that you are considering output types such as vector_of_vector and vector_of_matricies. Could they be considered as a matrices (type matrix)? Would there be any reason to add vector_of_vector/vector_of_matrices types to the list of accepted output types?

@dzhvansky
Copy link
Owner

There is a reason for 5 parameters: 'pattern_fwhm', 'pattern_coa', 'patterns_similarity', 'synergies_similarity', 'matching_standard_reference_index'.
For these parameters, we can get a different number of synergies at different speeds (in the case of using 'BLF' or 'R2=0.90' as NMF stop criteria). For example, we can get [[96.3, 9.8, 41.3], [93.2, 3.4, 36.4, 49.0, 90.4], [93.4, 6.6, 21.4, 38.6]] for 'pattern_coa' parameter.
For the rest of the parameters, we have reduced the types to the standard ones.

@alfonsotecnalia
Copy link
Collaborator Author

You mean that the parameters which result is of type vector_of_vector could have variable number of vectors of different sizes, don't you?

@dzhvansky
Copy link
Owner

dzhvansky commented Sep 10, 2020

The number of vector elements will always be 3 because we have 3 different speeds in this setup. But the size of each element of the vector can be different (minimum 3, maximum ~ 6 depending on the number of synergies calculated).

@alfonsotecnalia
Copy link
Collaborator Author

Ok, then I think we have to extend the types to accomodate vector_of_vector and vector_of_matrices. What do you think @aremazeilles?

@aremazeilles
Copy link
Collaborator

No issue with that. I am only wondering whether we should add for labels for the lines / columns, to enrich (potentially) the visualization and comparison of such results

@aremazeilles
Copy link
Collaborator

@dzhvansky

We have updated the PI output format: see the doc

In your case, the following changes would be needed:

patterns_similarity.yaml : change vector_of_matricies to vector_of_matrix. Right now your format is:

type: vector_of_matricies
value: [[[0.46952, -0.61304, 0.87954, 0.64039], [0.93869, -0.28111, 0.38969, 0.88667], [-0.25053, 0.95214, -0.70183, -0.51035], [0.10085, 0.54473, -0.6915, -0.27004], [0.1191, -0.45409, 0.77165, 0.34175]], [[0.68391, -0.4158, 0.82883, 0.024115], [0.8097, -0.38895, 0.34239, 0.64038], [-0.24222, 0.95568, -0.61256, -0.46461], [-0.3614, 0.50409, -0.43918, -0.44096], [-0.056503, -0.43976, 0.46824, 0.71175]], [[0.96272, -0.19, 0.39977, 0.1049], [-0.094108, 0.47302, -0.30215, 0.16371], [-0.29254, 0.96314, -0.4772, -0.37951], [0.38344, -0.38879, 0.26328, 0.96102], [0.69427, -0.43879, 0.94155, 0.12186]]]

We would suggest shifting it to:

type: vector_of_matrix
value:
  - value: [[0.46952, -0.61304, 0.87954, 0.64039], [0.93869, -0.28111, 0.38969, 0.88667], [-0.25053, 0.95214, -0.70183, -0.51035], [0.10085, 0.54473, -0.6915, -0.27004], [0.1191, -0.45409, 0.77165, 0.34175]]
  - value: [[0.68391, -0.4158, 0.82883, 0.024115], [0.8097, -0.38895, 0.34239, 0.64038], [-0.24222, 0.95568, -0.61256, -0.46461], [-0.3614, 0.50409, -0.43918, -0.44096], [-0.056503, -0.43976, 0.46824, 0.71175]]
  - value: [[0.96272, -0.19, 0.39977, 0.1049], [-0.094108, 0.47302, -0.30215, 0.16371], [-0.29254, 0.96314, -0.4772, -0.37951], [0.38344, -0.38879, 0.26328, 0.96102], [0.69427, -0.43879, 0.94155, 0.12186]]

Meaning in matrix of the vector is placed on a single line.

Same change would be needed for the synergies_similarity.

Then, for all format, we give the opportunity to define labels per row, cols, etc. If it is relevant for you, please check the format in the doc.

If you need help with this update, please let me know.

@aremazeilles
Copy link
Collaborator

ping

@dzhvansky
Copy link
Owner

Prepared an update for data types. Several questions also popped up:

  1. Does the doc indicate the structure for type vector_of_vector correctly? Should the key be values or value like for other pi types?
  2. For vector_of_vector and vector_of_matrix data types, I suggest adding a high-level label to tag the main vector:

type: 'vector_of_vector'
label: [vec_label_1, vec_label_2]
values:

  • label: [label_1, label_2, label_3, label_4]
    value: [val_1, val_2, val_3, val_4]
  • label: [label_1, label_2, label_3]
    value: [val_1, val_2, val_3]

  1. why is the indent 3 spaces and not 2? As far as I know, the common practice for yaml is 2 space indentation.

@aremazeilles
Copy link
Collaborator

You are right,, it should be value and not values.
Your proposition for adding a label item sounds good to me as well.

This is unclear where there is 3 spaces and not 2. Could you clarify this?

@dzhvansky
Copy link
Owner

This is unclear where there is 3 spaces and not 2. Could you clarify this?

I mean there is one extra space before the hyphen in - label keys:
image

What I suggest (common yaml format):
image

And the last question is whether quotes are needed for string variables in our output yaml format?

@aremazeilles
Copy link
Collaborator

In this documentation, they are suggesting to add an indentation.

Actually I am using in the cI process yamllint to verify all yaml file in the repository. If I launch it with your suggestion:

type: 'vector_of_vector'
values:
- label: [label_1, label_2]
  value: [val_1, val_2]
- label: [label_1, label_2, label_3]
  value: [val_1, val_2, val_3]

It states:

╭─anthony@PRT0962AL ~/tmo 
╰─$ yamllint pi.yaml
pi.yaml
  1:1       warning  missing document start "---"  (document-start)
  3:1       error    wrong indentation: expected 2 but found 0  (indentation)

Note that under python, loading of both versions works, and it is probable that both version works. But as I use yamlint I would be more in favour in having an indentation, as we suggest.
Actually there might be a differentiation whether the list is at the root of the yaml file (in that case, no indentation) or whether the list is associated to a key (which is our case).

And the last question is whether quotes are needed for string variables in our output yaml format?

I would say that this is not mandatory

@aremazeilles
Copy link
Collaborator

Actually to respect the yamllint indication, we should even go for an indentation of two spaces:

type: 'vector_of_vector'
value:
  - label: [label_1, label_2]
    value: [val_1, val_2]
  - label: [label_1, label_2, label_3]
    value: [val_1, val_2, val_3]

But the tool seems to accept indentation of one space, so I think we are flexible in that, whether you use 1 or 2 space indentation

@dzhvansky
Copy link
Owner

OK, thank you! I indent 2 spaces and updated the output files. Please check the reproducibility on linux machine and replace output files if necessary.

@aremazeilles
Copy link
Collaborator

The result I get now is slightly differing from the reference output data. I attach the files that get generated. Can you confirm these variations are minor to you (so that I can place them as reference output data?)

output.zip

I guess this is why the CI process fails right now

@aremazeilles
Copy link
Collaborator

Appart from that, I think the output format are correct

@dzhvansky
Copy link
Owner

Yes, I confirm the differences are minor, so you can replace the reference output.

@aremazeilles
Copy link
Collaborator

Reference updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants