Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cosym: choose cluster containing the most identity reindexing ops #1514

Merged
merged 2 commits into from
Dec 11, 2020

Conversation

rjgildea
Copy link
Contributor

@rjgildea rjgildea commented Dec 7, 2020

This can be important if the lattice symmetry is only very approximately related by pseudosymmetry to the true symmetry. If all potential reindexing ops are genuine indexing ambiguities, then it doesn't matter which one is chosen, however if not, then choosing the wrong one will distort the true unit cell. In such cases it is likely that the input datasets were already indexed consistently, therefore default to choosing the cluster that contains the most identity reindexing ops.

@codecov
Copy link

codecov bot commented Dec 9, 2020

Codecov Report

Merging #1514 (6159374) into master (b36efe5) will decrease coverage by 0.00%.
The diff coverage is 96.19%.

@@            Coverage Diff             @@
##           master    #1514      +/-   ##
==========================================
- Coverage   65.62%   65.61%   -0.01%     
==========================================
  Files         614      614              
  Lines       68965    68958       -7     
  Branches     9529     9523       -6     
==========================================
- Hits        45257    45250       -7     
- Misses      21866    21868       +2     
+ Partials     1842     1840       -2     

Copy link
Contributor

@graeme-winter graeme-winter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Popped a couple of comments inline, mostly asking for small improvements to the documentation of the code as there are some sgtbx type rabbit holes in this... I assume that the code does what the commit messages say and overall the spirit of the change set makes a lot of sense. I would wonder if this is a case for not squash merging however as the structure of the PR does have bug fix and tidying commits separately. Maybe squash down to 2 commits? So bugfix+news, cleaning+test changes

algorithms/symmetry/cosym/target.py Show resolved Hide resolved
@@ -175,10 +175,12 @@ def _map_space_group_to_input_cell(intensities, space_group):
sg_best = sg_primitive.change_basis(cb_op_best_primitive.inverse())
# best_subgroup above is the bravais type, so create thin copy here with the
# user-input space group instead
best_subsym = best_subsym.customized_copy(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this stuff could do with some kind of annotation somewhere... I am certain you have gained some understanding as a side-effect of these changes which should probably be documented inline?i.e. the why of est_subsym.change_basis(cb_op_inp_best.inverse()) etc.

I ask this as someone who may have to debug this one day :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just to obtain the subsym in the input setting rather than the "best" setting -> cb_op_inp_best.inverse()

@@ -370,28 +372,6 @@ def _analyse_symmetry(self):
)
self.params.cluster.n_clusters = len(cosets.partitions)

def _space_group_for_dataset(self, dataset_id, sym_ops):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not wrong that retaining this adds confusion, as I was trying to work out how things still work with all this stuff removed...

algorithms/symmetry/cosym/__init__.py Show resolved Hide resolved
@@ -177,9 +178,17 @@ def run(self):
self.cosym_analysis.run()

reindexing_ops = {}
sym_op_counts = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is behaviour well defined in the unlikely event of a tie?

@@ -177,9 +178,17 @@ def run(self):
self.cosym_analysis.run()

reindexing_ops = {}
sym_op_counts = {
cluster_id: collections.Counter(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, what is this syntax?

Learning opportunity for YT

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, dictionary creation from iterator...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly re-arranging the code slightly (e.g. a dummy variable to move everything onto a single line) would make it look less magic?

newsfragments/1514.bugfix Outdated Show resolved Hide resolved
* Deprecate target.get_sym_ops(), use target.sym_ops instead

* Remove CosymAnalysis.space_groups - this has been superceded by the
pointless-style symmetry analysis, and retaining this likely just causes
added complexity or confusion.

* Tidy up CosymAnalysis._reindexing_ops_for_dataset method

* Call cosym run() directly instead of via procrunner

* Set n_clusters param after clustering - n_clusters may have been Auto,
in which case we now know how many clusters were found by clustering

* Cleaner to use 'if i_cluster in reindexing_ops: break'; add some
docstrings to the method.

* Minimal test for CosymAnalysis._reindexing_ops_for_dataset()
…exing ops

This can be important if the lattice symmetry is only very approximately related by
pseudosymmetry to the true symmetry. If all potential reindexing ops are genuine
indexing ambiguities, then it doesn't matter which one is chosen, however if not,
then choosing the wrong one will distort the true unit cell. In such cases it is
likely that the input datasets were already indexed consistently, therefore default
to choosing the cluster that contains the most identity reindexing ops.
@rjgildea rjgildea merged commit fcfd7b4 into master Dec 11, 2020
@rjgildea rjgildea deleted the cosym_reindexing_ops branch December 11, 2020 13:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants