This repository has been archived by the owner on Jun 5, 2024. It is now read-only.

Problem with the cluster indexing after DBSCAN #41

Closed
PavelKavrigin opened this issue Aug 25, 2021 · 3 comments

Comments

@PavelKavrigin
Contributor

After DBSCAN is applied in epix, the clusters are indexed for further merging:

epix/epix/clustering.py

Lines 33 to 41 in 53d48c8

# Splitting into individual events and time cluster and apply space clustering space:
groups = df.groupby(['entry', 'time_cluster'])
groups = groups.apply(lambda x: _find_cluster(x, cluster_size_space))
for i in np.unique(groups.index.get_level_values(0)):
    add_to_cluster = 0
    for j in range(len(groups[i])):
        groups[i][j]+=add_to_cluster
        add_to_cluster = np.max(groups[i][j])+1

The order of the cluster indices in the column created here does not match the row order of the interactions dataframe, which causes a problem in this line:

df['cluster_id'] = np.concatenate(groups.values)

As a result, many clusters contain interactions that do not belong to them. Because cluster merging applies a weighted average, the issue is not very noticeable in the resulting output. A detailed description of the issue and a possible solution is here:

/dali/lgrandi/pkavrigin/2021-08-25_FixEpixDemo/2021-08-25_FixForEpix_Demo.ipynb

NB: The notebook requires the 'dbg_out' branch of epix if you want to re-run it.
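
For illustration only, here is a minimal sketch of how such a mismatch can arise, assuming the interaction rows are not already sorted by ('entry', 'time_cluster'). The data are made up, `toy_labels` is a hypothetical stand-in for `_find_cluster`, and the per-event index offset from the snippet above is left out for brevity. The index-aligned assignment at the end is one possible way to keep labels attached to their rows; it is not necessarily the solution proposed in the notebook.

```python
import numpy as np
import pandas as pd

# Toy interactions table; rows are deliberately NOT sorted by the group keys.
df = pd.DataFrame({
    "entry": [0, 0, 0, 0],
    "time_cluster": [1, 0, 1, 0],
    "x": [10.0, 0.0, 10.1, 50.0],
})


def toy_labels(group):
    # Hypothetical stand-in for _find_cluster: two labels, split at x = 25.
    return (group["x"].to_numpy() > 25).astype(int)


positional = []  # bare arrays, collected in group-key order
aligned = []     # the same labels, but carrying the original row index

# groupby iterates over groups in sorted key order: (0, 0) first, then (0, 1).
for _, group in df.groupby(["entry", "time_cluster"]):
    labels = toy_labels(group)
    positional.append(labels)
    aligned.append(pd.Series(labels, index=group.index))

# Positional assignment: labels land on rows 0..3 in group order,
# regardless of which rows they were computed from.
df["cluster_id_positional"] = np.concatenate(positional)

# Index-aligned assignment: pandas matches each label to its row index.
df["cluster_id_aligned"] = pd.concat(aligned)

print(df)
```

In this toy case the only interaction with x > 25 is row 3, but the positional assignment puts its label on row 1, while the index-aligned column keeps it on row 3.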

@PavelKavrigin
Contributor Author

If you agree with the proposed solution, I can create a new branch containing only this fix, without the debugging code from the 'dbg_out' branch.

@ramirezdiego
Collaborator

Hey @PavelKavrigin, thank you very much for the super explanatory notebook. I want to have another look at the proposed solution tomorrow, but the investigation seems conclusive in any case.

@HenningSE
Contributor

Issue was solved in #45.
