Yadav clustering #2
Awesome work Kelle! Glad you were able to figure out pull requests. I'm going to quote part of your email to me just so I can keep the conversation here:
I'll step through the code and may leave questions/comments as I work through it.
Awesome work Kelle! I have a few comments/suggestions for further conversation.
1D clustering Yadav.R
spines_in_cluster_test_1D <- cluster_freq_test_1D %>% group_by(is_clustered) %>% summarise(num_clusters_test_1D = sum(Freq)) #count how many spines that are in a true cluster or not
spines_clustered_test_1D <- spines_in_cluster_test_1D[2,2] # define how many spines are in a cluster
spines_not_test_1D <- as.numeric(total_spines - spines_clustered_test_1D) # calculate how many spines are not clustered
spines_clustered_test_1D[is.na(spines_clustered_test_1D)] <- 0 #returns 0 instead of Na if no spines are clustered in the random sample
This would be the case when there are no clusters that contain more than one spine?
Correct. Sometimes there would be "no clusters" (no groups with more than one spine), and that would return an NA and break the rest of the code.
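One way to sidestep the NA altogether is to sum only the rows flagged as clustered, since `sum()` over zero rows returns 0. The sketch below is a minimal illustration, assuming `is_clustered` is a logical column; the table values are made up and only the column names come from the snippet above.

```r
# Hypothetical frequency table; column names follow the snippet above,
# the values are made up for illustration.
cluster_freq_test_1D <- data.frame(
  is_clustered = c(FALSE, FALSE),  # no group has more than one spine
  Freq         = c(40, 10)
)
total_spines <- sum(cluster_freq_test_1D$Freq)

# Sum only the rows flagged as clustered; sum() over zero rows is 0, not NA.
spines_clustered_test_1D <- sum(
  cluster_freq_test_1D$Freq[cluster_freq_test_1D$is_clustered]
)
spines_not_test_1D <- total_spines - spines_clustered_test_1D
```

With this form the "not clustered" count is computed from a value that is already 0 rather than NA, so the order of the two assignments no longer matters.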
1D clustering Yadav.R
# 3D random spines for loop
for(j in 1:100){
test_data_X <- data.frame(sample(df$X), df$Y, df$Z) # randomize the X's, Y's, and Z's to make a "biologically plausible" dataframe.
I think only the first line and last line in this block are necessary:
test_data <- data.frame(sample(df$X), sample(df$Y), sample(df$Z))
test_dist_3D <- as.matrix(dist(test_data)) # creates distance matrix for random sample
But I think I see your thought process to constrain how "random" the datapoints are, so I'll just pin this for further conversation.
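For reference, here is a self-contained version of the two-line suggestion. The `df` below is a made-up stand-in for the real spine coordinates, and the seed is only there to make the illustration reproducible.

```r
set.seed(1)  # reproducible shuffle, for illustration only

# Hypothetical stand-in for the real spine coordinates in `df`.
df <- data.frame(
  X = runif(20, 0, 50),
  Y = runif(20, -5, 5),
  Z = runif(20, -5, 5)
)

# Shuffle each axis independently so X, Y and Z are re-paired at random.
test_data <- data.frame(X = sample(df$X), Y = sample(df$Y), Z = sample(df$Z))

# Pairwise Euclidean distances between the shuffled points.
test_dist_3D <- as.matrix(dist(test_data))
```

Each axis keeps exactly the observed values; only the pairing between axes changes.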
I think I have an idea for how to constrain the datapoints more accurately, since the random points still seem to be a little too random. (We should expect clustering on most dendritic branches; we just want to see changes in the degree.)
I'm thinking that we need to set up some sort of selection criteria for the random coordinates.
The dendrites and spines essentially live in a known cylindrical space: the length is the length of the dendrite, which we have 3D coordinates to regenerate, and the width/radius is the max spine length. Currently, when I map the random 3D coordinates, it doesn't really look like spines along a dendritic length.
So, if we require that the random 3D coordinates fall within the known limits, we may be able to get better randomization. I'm going to try thinking about that this week, but let me know what you think.
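A rough sketch of what sampling inside the cylinder could look like is below. It assumes the dendrite is a straight segment along the x-axis, which is a simplification (a real dendrite would need its traced 3D path), and every number and variable name here is a placeholder rather than a measurement.

```r
set.seed(42)

# All values below are made-up placeholders, not measurements.
dendrite_length  <- 50   # um, length of the dendrite segment
max_spine_length <- 5    # um, used as the cylinder radius
n_spines         <- 150

# Simplifying assumption: the dendrite is a straight line along the x-axis.
x <- runif(n_spines, 0, dendrite_length)
# sqrt() spreads points uniformly over the disc area instead of
# bunching them near the centre.
r     <- max_spine_length * sqrt(runif(n_spines))
theta <- runif(n_spines, 0, 2 * pi)

random_spines  <- data.frame(X = x, Y = r * cos(theta), Z = r * sin(theta))
random_dist_3D <- as.matrix(dist(random_spines))
```

Every generated point falls within the length and radius limits by construction, so no reselection step is needed for that constraint.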
That sounds like a good idea, but let me think through it.
I'll make up some numbers (that may not be biologically plausible).
If we have a 50um dendrite segment with 150 spine heads, then we have 150 x-coordinates, 150 y-coordinates and 150 z-coordinates.
If we randomized using the data available, each shuffled point could be any of 150*150*150 = 150^3 = 3375000 possible xyz-coordinates.
If we generate a list of possible coordinates within a cylinder incrementing by 0.01um along each axis, this list should include the 3375000 coordinates produced by the previous method (since all those x-, y- and z-values were actually observed and recorded) and several billion others.
napkin math (if I assume a radius of 5um):
volume of cylinder: 50*(pi*5^2) ~ 3927um^3
increments of 0.01 along each axis: 3927 / 0.01^3 ~ 3.9 * 10^9 coordinates
all combinations of selecting 150 points: (3.9 * 10^9) choose 150 ~ 10^1176 possible combinations of 150 coordinates
(I did not use a napkin).
From this example, randomizing the observed data constrains the space where spine heads "can" be by many orders of magnitude. I think the hop from 1D space to 3D space adds more dimensions for data to move and thus completely random data are less likely to cluster, and I don't know if there is an easy way to get around that, besides adding some conditionals that say spines that are too far from the next closest spine should be reselected.
Using the observed data should look closer to the biological reality (but probably not anywhere close to perfect), and the resulting coordinates should be within the cylinder of interest since we are using the observed data to generate "random" data points. Generating all possible datapoints will create a bunch of new places the spine heads can be, but I cannot think of a reason why they would be more clustered since we are still dealing with the problem of 3 dimensions versus 1 dimension (it's easier to cluster in 1 dimension than 3). Does this make sense?
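The napkin math above can be reproduced without printing enormous integers by working in powers of ten. This sketch assumes the 0.01 increment is applied along each of the three axes and reuses the made-up 50um length, 5um radius and 150 spines; a coarser grid would give smaller but still enormous counts. `lchoose()` is used because `choose()` overflows to `Inf` at this scale.

```r
dendrite_length <- 50    # um (made-up value from the comment above)
radius          <- 5     # um (assumed)
n_spines        <- 150
step            <- 0.01  # um, assumed grid spacing along each axis

volume       <- dendrite_length * pi * radius^2  # um^3
grid_cells   <- round(volume / step^3)           # candidate positions on the grid
shuffle_pool <- n_spines^3                       # xyz triples available by shuffling

# lchoose() returns log(choose(n, k)); dividing by log(10) gives the power of ten.
log10_grid    <- lchoose(grid_cells, n_spines) / log(10)
log10_shuffle <- lchoose(shuffle_pool, n_spines) / log(10)

# How many orders of magnitude separate the two search spaces.
log10_gap <- log10_grid - log10_shuffle
```

Comparing `log10_grid` with `log10_shuffle` puts both approaches on the same footing (ways to pick 150 points from each pool), which is the comparison the "orders of magnitude" argument relies on.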
Since the coordinates do not look right, do we need to include information about SOMA-DISTANCE in addition to the x-y-z coordinates? Perhaps I'm still confused about what each variable represents in space.
1D clustering Yadav.R
std_curve_3D <- sd(curve_dnorm_3D)
mean_curve_3D <- mean(curve_dnorm_3D)

Cscore_3D <- pnorm(spines_clustered_3D, mean_test_3D, std_test_3D)
Yeah, I think you are right that we will not see much useful output here, since the amount of observed clustering is above anything that was simulated. We can leverage the z-score instead:
zscore <- (spines_clustered_3D - mean_test_3D) / std_test_3D
With the file you shared with me, I got about 6.23, so the observed density was 6.23 standard deviations above the mean of the simulated random distribution.
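To illustrate why the z-score is more informative than `pnorm()` here, the sketch below uses made-up numbers: 100 simulated counts and one observed count well above them. Only the variable names `spines_clustered_3D`, `mean_test_3D`, `std_test_3D`, `Cscore_3D` and `zscore` come from the code above; `sim_clustered` and all values are hypothetical.

```r
set.seed(7)

# Made-up stand-ins: 100 simulated clustered-spine counts and one observed count.
sim_clustered       <- rnorm(100, mean = 40, sd = 5)
spines_clustered_3D <- 75

mean_test_3D <- mean(sim_clustered)
std_test_3D  <- sd(sim_clustered)

# pnorm() saturates near 1 once the observation is far above the simulations,
# so it cannot distinguish "well above" from "extremely far above".
Cscore_3D <- pnorm(spines_clustered_3D, mean_test_3D, std_test_3D)

# The z-score still reports how many standard deviations above the mean it is.
zscore <- (spines_clustered_3D - mean_test_3D) / std_test_3D
```

Two dendrites with very different degrees of clustering could both return a `Cscore_3D` that prints as 1, while their z-scores would remain distinguishable.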
Co-Authored-By: kellnett <44405714+kellnett@users.noreply.github.com>
Current code I'm using. I worked on it separately outside of GitHub, so a lot is different.