
Yadav clustering #2

Open: wants to merge 12 commits into master
Conversation

kellnett

No description provided.

@kellnett kellnett changed the title Create 1D clustering Yadav.R Yadav clustering Mar 15, 2019
@jdkent
Member

jdkent commented Mar 18, 2019

Awesome work Kelle! Glad you were able to figure out pull requests, I'm going to quote part of your email to me just so I can keep the conversation here:

Just as a quick summary, I found the function "IdClusters", which performs the UPGMA analysis and reports how many clusters fall within the given cutoff, and how many spines are in each cluster. From there I find the number of clustered spines (counting only clusters with >1 spine in them). I run that with the random data a number of times to get a distribution of the number of clustered spines given the total number of spines.
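(For illustration, the counting step described above can be sketched in Python. This uses a simple nearest-neighbour gap rule on sorted 1D positions as a stand-in for DECIPHER's IdClusters UPGMA cutoff, so the grouping is not identical to the R analysis; the positions here are made up.)

```python
def count_clustered_spines(positions, cutoff):
    """Group 1D spine positions into clusters: consecutive sorted
    positions closer than `cutoff` share a cluster (single linkage).
    Returns the number of spines in clusters of size > 1."""
    pts = sorted(positions)
    clusters, current = [], [pts[0]]
    for p in pts[1:]:
        if p - current[-1] <= cutoff:
            current.append(p)          # within cutoff: same cluster
        else:
            clusters.append(current)   # gap too large: start a new cluster
            current = [p]
    clusters.append(current)
    return sum(len(c) for c in clusters if len(c) > 1)

# Example: 3 spines near 1.0 form a cluster; 5.0 and 9.0 are isolated
print(count_clustered_spines([1.0, 1.2, 1.3, 5.0, 9.0], cutoff=0.5))  # 3
```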

This next part I'm really trying to think through conceptually, so I'm not quite sure if I have it right yet, but I use dnorm() on the number of clusters (and the mean/std of that sample) to find the probability density function, then I take the pnorm() of that calculated dnorm to get a Cscore. I think that is how the Yadav paper (attached) calculates Cscore, but like I said, I'm still trying to think through it.
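(If the Yadav et al. C-score is read as the normal CDF of the observed clustered-spine count against the null distribution's mean and sd, it is equivalent to R's `pnorm(observed, mean, sd)` directly; the intermediate `dnorm` step would not be needed. A Python sketch of that reading, with made-up numbers:)

```python
import math

def c_score(observed, null_mean, null_sd):
    """P(X <= observed) under Normal(null_mean, null_sd**2);
    equivalent to R's pnorm(observed, null_mean, null_sd)."""
    z = (observed - null_mean) / null_sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Observed count 30 against a null of mean 20, sd 5 (z = 2)
print(round(c_score(30, 20, 5), 4))  # 0.9772
```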

I like the idea of using Z-scores too. My goal is to keep pursuing all of these possible analysis routes so we can really compare the kinds of values we get. Once we have working code, I can run it on the whole dataset to see whether your new analysis roughly aligns with the original analysis I was using.

I'll step through the code and may leave questions/comments as I work through it.

@jdkent left a comment:

Awesome work Kelle! I have a few comments/suggestions for further conversation.

1D clustering Yadav.R: two outdated review threads (resolved)
spines_in_cluster_test_1D <- cluster_freq_test_1D %>% group_by(is_clustered) %>% summarise(num_clusters_test_1D = sum(Freq)) # count how many spines are / are not in a true cluster
spines_clustered_test_1D <- spines_in_cluster_test_1D[2,2] # number of spines in a cluster
spines_clustered_test_1D[is.na(spines_clustered_test_1D)] <- 0 # use 0 instead of NA when no spines are clustered in the random sample (must run before the subtraction below, or it propagates NA)
spines_not_test_1D <- as.numeric(total_spines - spines_clustered_test_1D) # number of spines not clustered
jdkent (Member):

This would be the case when there are no clusters that contain more than one spine?

kellnett (Author):

Correct. Sometimes there would be "no clusters" (no groups with more than one spine), which returned an NA and broke the rest of the code.

1D clustering Yadav.R: outdated review thread (resolved)

# 3D random spines for loop
for(j in 1:100){
test_data_X <- data.frame(sample(df$X), df$Y, df$Z) # randomize the X's while keeping each spine's observed Y and Z, to stay "biologically plausible"
jdkent (Member):

I think only the first line and last line in this block are necessary:

test_data <- data.frame(sample(df$X), sample(df$Y), sample(df$Z))
test_dist_3D <- as.matrix(dist(test_data)) # creates distance matrix for random sample

But I think I see your thought process to constrain how "random" the datapoints are, so I'll just pin this for further conversation.
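(The difference between the two randomizations is whether the (x, y, z) triples stay paired: shuffling only X keeps each spine's observed Y/Z, while shuffling all three columns independently breaks the joint structure entirely. A Python sketch with made-up coordinates:)

```python
import random

random.seed(42)
xs = [1.0, 2.0, 3.0, 4.0]   # hypothetical spine coordinates
ys = [10.0, 20.0, 30.0, 40.0]
zs = [0.1, 0.2, 0.3, 0.4]

# Constrained version: permute X only, keep Y/Z paired per spine
x_only = list(zip(random.sample(xs, len(xs)), ys, zs))

# Fully random version: permute each axis independently
full = list(zip(random.sample(xs, len(xs)),
                random.sample(ys, len(ys)),
                random.sample(zs, len(zs))))

# Either way the marginal values are preserved; only the pairing changes
print(sorted(p[0] for p in full) == xs)  # True
```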

kellnett (Author):

I think I have an idea for how to more accurately constrain the datapoints, since the random points still seem a little too random (we should expect clustering on most dendritic branches; we just want to see changes in the degree).

I'm thinking that we need to set up some sort of selection criteria for the random coordinates. The dendrites and spines essentially live in a known cylindrical space (the length being the length of the dendrite, which we have 3D coordinates to regenerate, and the width/radius being the max spine length). Currently, when I map the random 3D coordinates, it doesn't really look like spines along a dendritic length.

So, if we require that the random 3D coordinates fall within those known limits, we may get better randomization. I'm going to think about that this week, but let me know what you think.
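(One way to implement that constraint: sample each random spine uniformly inside the cylinder, drawing the axial position uniformly along the dendrite length, the angle uniformly, and the radius as R * sqrt(u) so that points are uniform over the cross-section. A Python sketch under assumed dimensions of 50 µm length and 5 µm radius:)

```python
import math
import random

def random_point_in_cylinder(length, radius, rng=random):
    """Uniform random point inside a cylinder aligned with the x-axis."""
    x = rng.uniform(0.0, length)                    # position along the dendrite
    theta = rng.uniform(0.0, 2.0 * math.pi)         # angle around the axis
    r = radius * math.sqrt(rng.uniform(0.0, 1.0))   # sqrt keeps density uniform over the disc
    return (x, r * math.cos(theta), r * math.sin(theta))

random.seed(0)
pts = [random_point_in_cylinder(50.0, 5.0) for _ in range(1000)]
# every sampled point respects the cylinder limits
print(all(0 <= x <= 50 and math.hypot(y, z) <= 5 for x, y, z in pts))  # True
```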

@jdkent (Member), Mar 18, 2019:

That sounds like a good idea, but let me think through it.
I'll make up some numbers (that may not be biologically plausible).

if we have a 50um dendrite segment with 150 spine heads, then we have 150 x-coordinates, 150 y-coordinates and 150 z-coordinates.
If we randomized using the data available, we would have 150*150*150 or 150^3 or 3375000 possible combinations of 150 coordinates.

If we generate a list of possible coordinates within a cylinder incrementing by 0.01, this list should include the 3375000 coordinates produced by the previous method (since all those xyz-coordinates were actually observed and recorded) and several billion others.
napkin math (if I assume a radius of 5um):
volume of cylinder: 50 * (pi * 5^2) ≈ 3927 µm³
increments of 0.01: 3927 / 0.01 = 392700 coordinates
all combinations of selecting 150 points: 392700 choose 150 ≈ 2.187 × 10^576 possible combinations of 150 coordinates

(I did not use a napkin).

From this example, randomizing the observed data constrains the space where spine heads "can" be by many orders of magnitude. I think the hop from 1D space to 3D space adds more dimensions for the data to move in, so completely random data are less likely to cluster. I don't know of an easy way around that, besides adding conditionals that reselect spines that land too far from their next closest spine.

Using the observed data should look closer to the biological reality (but probably not anywhere close to perfect), and the resulting coordinates should be within the cylinder of interest since we are using the observed data to generate "random" data points. Generating all possible datapoints will create a bunch of new places the spine heads can be, but I cannot think of a reason why they would be more clustered since we are still dealing with the problem of 3 dimensions versus 1 dimension (it's easier to cluster in 1 dimension than 3). Does this make sense?
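(The napkin math above can be sanity-checked on a log scale without printing the full integer, using the log-gamma form of the binomial coefficient:)

```python
import math

def log10_choose(n, k):
    """log10 of the binomial coefficient C(n, k), via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(10)

print(150 ** 3)                          # 3375000 combinations from shuffling observed coordinates
print(round(log10_choose(392700, 150)))  # 576, i.e. a ~577-digit number of combinations
```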

jdkent (Member):

Since the coordinates do not look right, do we need to include information about soma distance in addition to the x-y-z coordinates? Perhaps I'm still confused about what each variable represents in space.

std_curve_3D <- sd(curve_dnorm_3D)
mean_curve_3D <- mean(curve_dnorm_3D)

Cscore_3D <- pnorm(spines_clustered_3D, mean_test_3D, std_test_3D)
jdkent (Member):

Yeah, I think you are right that we will not see much useful output here since the amount of observed clustering is above anything that was simulated. We can leverage the z-score:

zscore <- (spines_clustered_3D - mean_test_3D) / std_test_3D

With the file you shared with me, I got about 6.23, so the observed clustering was 6.23 standard deviations above the mean of the simulated random distribution.
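(The z-score calculation end to end, in a Python sketch: take the clustered-spine counts produced by the randomization loop as the null distribution, then standardize the observed count. The numbers below are made up, not the real data:)

```python
from statistics import mean, stdev

# Hypothetical null distribution: clustered-spine counts from the
# randomization loop (invented values for illustration)
null_counts = [12, 15, 11, 14, 13, 16, 12, 14, 13, 15]
observed = 27

# Same formula as the R line: (spines_clustered - mean) / sd
z = (observed - mean(null_counts)) / stdev(null_counts)
print(round(z, 2))  # 8.54: the observed count sits ~8.5 SDs above the null
```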

kellnett and others added 4 commits October 28, 2019 15:06
current code I'm using
I worked on it separately outside of GitHub so a lot is different