
Timeline

Brian Pho edited this page Nov 5, 2022 · 51 revisions

This page describes my progression through my master's project. Although it's only a summary, it covers the main milestones and achievements. Detailed tasks can be found on the programming and project Kanban boards.

September 2020 - April 2021

I started this project in September 2020 and, as with starting anything, I had to learn a lot very quickly. For the first few months, I was

  • Getting access to the systems and servers. This includes NoMachine and WinSCP for remotely accessing the server.
  • Managing my schedule with meetings, classes, and deadlines. I had two lab meetings (since I had two supervisors) and a one-on-one meeting with my main supervisor. I was also taking two classes at the time and was TAing an intro psych course.
  • Writing scholarship essays for the Borealis AI fellowship and the NSERC CGS-M scholarship.
  • Going through a Coursera MRI course.
  • Reading a lot of literature on the development of intelligence.

After getting access to the server, I started learning how the data was organized and began writing the Matlab preprocessing script. This took some time as the script was very outdated and didn't follow good coding practices. In the meantime, I set up this GitHub repo and prepared for my first advisory committee meeting in January. Near the end of 2020, I set up Anaconda and Jupyter Notebook and started generating the functional connectivity matrices for each subject. At the same time, I had just finished preprocessing all of the data. Then I took a few weeks off for Christmas and New Year's.

Once I came back, I had to deal with the committee meeting, which went fine. Even though the Matlab AA preprocessing was done, I still had to check that the preprocessing was valid and ensure that bad subjects didn't slip through. Manually going through every subject's structural and functional MRI scans took some time. Since I didn't have the cognitive measures at this point, I did some exploratory data analysis using PCA, but that didn't result in much.

Once I got the measures, I started exploring the cognitive measures of the dataset, mainly the WISC and IQ, using supervised models. Early results didn't yield much, so I started work on the hypothesis-driven analysis using the Yeo atlas. This meant I had to generate another set of FCs, which took time. A few months passed at this point, and we came across a PLS paper that predicted age from FC, so we tried it on our dataset.

It was successful and replicated the paper's results well. We then applied it to predict IQ instead of age, but that didn't go well until we did feature selection using mutual information. That was the moment we finally hit upon good results. With this model, we started digging deeper by using age bins and exploring the connections. That's where I'm currently at.

May 2021

Reducing Overfit

So Yalda gave me some suggestions to reduce the overfitting (train r^2 > test r^2) of the PLS model, and I've implemented them with mixed results.

  • Grouping and averaging: dividing the dataset into groups of 6 and averaging the 6 subjects to form one 'subject'. This results in 96 subjects and a test r^2 of 0.18. However, the training r^2 is still above the test r^2.
  • Adding noise: adding some Gaussian noise to the FC and IQ scores. Adding the noise to the entire dataset resulted in poor results (a test score around -0.6), and adding the noise to just the training set also gave poor results. Bobby suggested that I add noise only to the FC, not to the IQ, which is what I'll try next.
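
A minimal sketch of that next attempt (Gaussian noise on the training FC only, leaving the IQ targets untouched); the noise level and shapes here are made-up placeholders:

```python
import numpy as np

def augment_train_fc(X_train, noise_sd=0.05, seed=0):
    """Return training FC features with Gaussian noise added.

    Per Bobby's suggestion, noise goes only on the features (FC), not
    on the IQ targets, and only on the training set.
    """
    rng = np.random.RandomState(seed)
    return X_train + rng.normal(0.0, noise_sd, size=X_train.shape)

X_train = np.zeros((4, 3))  # stand-in training FC block
X_noisy = augment_train_fc(X_train, noise_sd=0.1)
```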

PLS Coefficient Similarity

So I checked the similarity of the PLS coefficients across the different age bins; the results are shown here: https://github.com/Brian-Pho/MSc_Research-Project/wiki/PLS#cosine-similarity I get the same results using a normalized Euclidean distance measure, so the results are consistent. Bobby and I discussed this result and found it interesting that the similarity correlates with age, although the strength differs across age bins. The next step is to compare how many connections overlap between the age bins.
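
For reference, the two similarity measures agree by construction: for unit-norm vectors, squared Euclidean distance equals 2 − 2·cosine, so they rank pairs identically. A tiny sketch with made-up coefficient vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def normalized_euclidean(a, b):
    # Distance between the unit-normalized vectors
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)
    return float(np.linalg.norm(a - b))

w1 = np.array([1.0, 2.0, 3.0])  # stand-in PLS coefficients, bin 1
w2 = np.array([1.1, 1.9, 3.2])  # stand-in PLS coefficients, bin 2
cos = cosine_similarity(w1, w2)
# For unit vectors, ||a - b||^2 = 2 - 2*cos, so the measures agree:
assert abs(normalized_euclidean(w1, w2) ** 2 - (2 - 2 * cos)) < 1e-9
```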

PLS Primary Index for Age Bins

I ran the MI hyperparameter search for the age bins on the WISC primary indices; the results are here: https://github.com/Brian-Pho/MSc_Research-Project/wiki/Mutual-Information#wisc They're similar to the results for all ages, with visual spatial and verbal comprehension doing best in the age bins too.

Ridge Results

Finally, the last result is from a hyperparameter search using the Ridge model on each age bin. Results are here: https://github.com/Brian-Pho/MSc_Research-Project/wiki/Regularization#ridge It's somewhat consistent with the PLS model in that all ages and bin 2 both score close to the PLS, while bins 1 and 3 do poorly. But bin 2 doesn't do as well as the PLS, and the Ridge scores for bins 1 and 3 are negative, whereas they're positive for PLS. The next step is to check the coefficients against each age bin and against the PLS. Also, a hyperparameter search using Lasso doesn't do well at all, with a score of 4%.
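
The Ridge hyperparameter search can be sketched with GridSearchCV; the data is synthetic and the alpha grid is illustrative, not the grid actually used:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

rng = np.random.RandomState(0)
X = rng.randn(120, 300)                 # stand-in FC features
y = X[:, 0] * 2 + rng.randn(120) * 0.5  # stand-in IQ target

# Grid search over regularization strength; r^2 is scikit-learn's
# default scorer for regressors.
search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": np.logspace(-2, 4, 13)},
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
best_alpha = search.best_params_["alpha"]
```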

June 2021

  • Linear regression between FC variance and IQ
  • Checked sex distribution
  • Checked IQ distribution per diagnosis
  • Internship (ask for reference)
  • Abstract deadline
  • Advisory committee meeting
  • Which journal for submission?
  • Timeline for summer

July 2021

  • After fixing the leaking testing data, the r^2 goes down to 0.09 for the all-ages, all-IQ PLS.
  • Standard scaling with Ridge gives reduced r^2?!? Never mind, it just needed more regularization strength.
  • Can't use Neurosynth since these aren't exactly fMRI activations. We can use it to determine the anatomical location though.
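
One way to implement the leakage fix mentioned above is to put the scaler inside a scikit-learn Pipeline, so it's re-fit on each training fold and never sees the test fold. Sketch with synthetic data; alpha=10 is an arbitrary placeholder (scaling changes the effective weight scale, which is why more regularization strength was needed):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(100, 50)           # stand-in FC features
y = X[:, 0] + 0.3 * rng.randn(100)

# The scaler is fit inside each CV training fold, so no test-fold
# statistics leak into preprocessing.
model = make_pipeline(StandardScaler(), Ridge(alpha=10.0))
scores = cross_val_score(model, X, y,
                         cv=KFold(5, shuffle=True, random_state=0))
mean_r2 = scores.mean()
```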

August 2021

  • How to split dataset (only ADHD, reduce ADHD, keep same)
  • Check feature weight reliability
  • ADHD vs healthy control group (priority) [one paper]
    • Redo all experiments
    • Need more healthy controls
      • Either download it, or find it in my data
  • Equal num of each sample
  • Compare PLS weights for multivariate vs univariate
  • Check PLS multivariate math
  • Get HBN access (so I can download and explore the site myself)
  • Read circular analysis and power atlas papers
  • ADHD co-morbid or not
  • Figure out where the healthy subjects are going (what's biasing it)
  • Current status of the dataset
  • Matlab preprocessing error
  • What do you think is the timeline for my masters? Defend around June 2022.
  • Discuss dataset (num adhd vs controls, adhd co-morbid)
  • Discuss whether to include development or not
  • Downloaded and preprocessing healthy subjects (running into issues)
  • Flux society abstract acceptance
  • Advisory committee report
  • Get subjects with only ADHD vs ADHD co-morbid
  • We're doing everything (all diagnosis, all age bins, all models)
  • In the next week, do as much analysis for the poster (Aug 26 done analysis for poster)
    • Do analysis on 300 adhd and 50 healthy controls
    • Only do age binning for adhd, but run all ages for both groups
  • Try ICC on both models for all age bins
  • Meet next Thursday to discuss analysis
  • Inkscape/Illustrator/Powerpoint for the poster

September 2021

  • Submitted Advisory Committee report
  • ADHD non-comorbid results done
  • Applied to neuroscience travel awards
  • Poster draft
  • Test multivariate PLS vs uni to save time
  • Updates on flux poster
  • Updates on preprocessing (4th last step)
  • Updates on ICC (issues with ICC for high number of repeats (> 50))
  • Go over univariate vs multivariate PLS notes (I believe univariate PLS reduces down to PCR)
  • Go over master data pipeline
    • Repeat modeling for PLS and Ridge
    • Only save model weights for significant values (can't do it with perm testing func)
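
Since ICC keeps coming up for checking weight reliability, here's one minimal version, the one-way random-effects ICC(1,1) for an (items × repeats) matrix. This is a generic sketch; the project may well use a different variant (e.g. ICC(2,1) or the ICCk mentioned later):

```python
import numpy as np

def icc_1_1(X):
    """One-way random-effects ICC(1,1) for an (n_items, k_repeats)
    matrix, e.g. model weights across CV repeats."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    row_means = X.mean(axis=1)
    # One-way ANOVA mean squares: between items and within items
    msb = k * np.sum((row_means - X.mean()) ** 2) / (n - 1)
    msw = np.sum((X - row_means[:, None]) ** 2) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

w = np.array([[1.0, 1.1], [2.0, 1.9], [3.0, 3.2]])  # 3 weights x 2 repeats
icc = icc_1_1(w)
```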

October 2021

  • Don't use a fixed random state; may need to rerun all results depending on whether repeated k-fold is randomized across repeats
  • Check if repeated-k-fold cross-validation is randomized across fold and repeats
    • It is randomized across folds and repeats, but isn't across runs
  • Check how the repeated-k-fold random state affects the ICC
    • Random state doesn't affect ICC, same results as before
    • Find explanation for high ICC in ICCk cases
  • Do both out-sample and in-sample ICC
  • No longer doing
    • Comparing PLS and Ridge
    • No healthy age bins
    • No cosine similarity
  • Check that the permutation test is two-tailed (not only for positive correlations)
  • CSCN Lightning talk submission
  • Results
    • Subsampling
    • FDR
    • Data distribution
    • ICC
  • Possible explanations for why we can do better for ADHD > Healthy
    • When you have a neurodevelopmental disorder, it constrains the space that cognition develops
    • Cognition is more fluid in non-adhd individuals
  • Do cross population applied to weights
  • Poster/lightning is uncorrected, but correcting for multiple comparison
  • ICC between ridge and multivariate pls in healthy controls
  • Exhaust all computational reasons before thinking about brain
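
The repeated-k-fold behavior checked above can be demonstrated directly (small stand-in data; random_state=42 is arbitrary):

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(20).reshape(-1, 1)  # 20 stand-in samples

rkf = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)
splits = [test for _, test in rkf.split(X)]

# The folds are reshuffled between repeats...
assert [set(s) for s in splits[:5]] != [set(s) for s in splits[5:]]

# ...but a fixed random_state reproduces the same splits across runs,
# which is why results are reproducible per run but not across seeds.
rkf2 = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)
splits2 = [test for _, test in rkf2.split(X)]
assert all(np.array_equal(a, b) for a, b in zip(splits, splits2))
```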

November 2021

  • Present in both Bobby's and Yalda's lab meetings
  • Consider using classifier or clustering to separate ADHD and healthy populations
  • Complete out-of-sample ICC
  • Check standard scaler normalization
    • Not an error as we're not scaling IQ but the FC values, so we expect an intercept
  • Healthy subjects
    • Continue investigating getting more healthy subjects
    • Check where 6 missing healthy subjects went
    • Ignore or delete the last two bold nii to generate FC
    • Try resetting the origin
  • Cross prediction
    • Exchanging weights between different populations, models, and age bins
    • Only keep what's needed for testing (model weights)
    • Rerun the cross-prediction (FSIQ) for ten different adhd group splits
  • Ideas explaining cross-prediction results
    • The healthy group is noisier/more variable, so training on it lets the model generalize and prevents overfitting.
      • Healthy model = more flexible, this structure is present in adhd g1 and g2, we pull out a main component of clinical brains
      • ADHD model = captures some structure, this is present in adhd and healthy
      • Shows the shared information between two groups
    • ICC: there is some similarity between the groups and that is what allows it to predict between the groups
      • Validation and replication of Zalesky paper
      • We're reaching the limit of what linear models can do
    • We find important differences between age groups, populations, wisc measures, and models

December 2021

  • Start paper draft
    • Drafted "methods" section
  • Cross-prediction
    • Average p-values from regression function for all cross folds (doesn't work because p-value isn't comparable to other p-values)
    • Report three scores: MSE, Pearson r, and p-value
    • Instead of averaging for all cross folds, do permutation test like the within-diagnosis-group analysis (3000 permutations)
    • Only run for significant WISC measures (5 measures)
    • Match testing set size of the cross-groups to match variance seen with the within-group
  • Don't interpret non-significant p-values
  • Filtering for subjects 80 < IQ < 140 changes the final results so not using this range
  • Repeated modeling for final dataset (filtered for > 70 IQ, loses 10 ADHD subjects and 0 healthy subjects)
  • Only do PLS vs Ridge for ADHD and Healthy, then stick with Ridge for the rest of the modeling (cross-prediction, visualization)

January 2022

  • Work out cross-prediction methodology with Bobby
    • Presplit all bins using 10-repeat-10-fold cross validation
    • Get the true r-score for within-bin after training on in-group bin, get the true r-score for between-bin by testing the model on test sets for out-group bins
    • Shuffle/permute the targets for all bins, train on the in-group bin training set, test on the in-group bin testing set and the two out-group testing sets
    • Compare the true r-score to the permuted r-scores within each bin to get the p-value for each r-value
  • Cross-prediction method notes
    • Maintains near-equal testing set size for all bins
    • Achieves same p-values as the built-in scikit-learn permutation_test_score function
    • Returns three r-values and three p-values respectively
    • Only shuffles within-bin, no data is ever mixed between bins
  • Do lit review on these key words: kids, healthy, linear model, cognitions, movie-watching fmri, intelligence
  • Schedule advisory committee meeting
  • Get statistics for the dataset (lowest, highest, mean, std)
  • Experiment with visualizing permutation results
  • Experiment with number of permutations to run (Between 100, 500, and 3000, decided on 500)
  • We've been talking about the age axis, but ignoring the wisc axis
  • Visualize cross-prediction results using heatmaps
  • Submit two abstracts to NRD and WRF
  • No ICC for cross-prediction, the cross-prediction encompasses ICC
  • Do lit review on kid data, healthy, linear model, cognition, movie-watching fmri
  • Don't need lowest and highest for measures
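
The cross-prediction permutation scheme worked out above can be boiled down to a heavily simplified sketch: a single split instead of 10-repeat-10-fold CV, Ridge and synthetic data as stand-ins, and targets shuffled within the training bin only (no data is ever mixed between bins):

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

def cross_predict_pvalue(X_in, y_in, X_out, y_out, n_perm=200, seed=0):
    """Train on the in-group bin, score on the out-group bin, and
    build the null by permuting targets within the training bin."""
    rng = np.random.RandomState(seed)
    true_r = pearsonr(y_out, Ridge().fit(X_in, y_in).predict(X_out))[0]
    null = np.array([
        pearsonr(y_out,
                 Ridge().fit(X_in, rng.permutation(y_in)).predict(X_out))[0]
        for _ in range(n_perm)
    ])
    # Two-tailed p-value with the standard +1 correction
    p = (np.sum(np.abs(null) >= abs(true_r)) + 1) / (n_perm + 1)
    return true_r, p

rng = np.random.RandomState(1)
X_in, X_out = rng.randn(60, 20), rng.randn(40, 20)
y_in = X_in[:, 0] + 0.5 * rng.randn(60)    # same brain->score link
y_out = X_out[:, 0] + 0.5 * rng.randn(40)  # in both stand-in bins
true_r, p = cross_predict_pvalue(X_in, y_in, X_out, y_out)
```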

February 2022

  • Word choice: intelligence vs cognition (use cognition), healthy vs typically developing (use TD)?
    • Change healthy references to TD
  • We leak testing data through the ridge alpha parameter
    • Acceptable leak though, alternative is to get alpha parameter from only training set to prevent leak
  • Need FDR correction with permutation testing?
    • Maybe, sticking with FDR correction
  • Diagnosis cross-prediction only done on one adhd split (t-test for age and IQ, chi-squared for sex)
    • Results don't matter since the ADHD groups have more samples than TD
  • Is mean r what we want? Maybe median is better. Plot the 100 r-values.
    • Send ADHD and healthy fsiq all age ridge
    • Results show that mean and median are nearly identical, doesn't matter which one to use, sticking with mean
  • Maybe we can't predict in healthy because most of the data falls into bin 1
    • Quantify num of individuals in each age bin for TD group
    • Most of the healthy data does fall into bin 1
  • Advisory committee meeting
    • Visualization
      • Reduce down visualization from Power nodes into Power networks (take mean/median of nodes in network)
      • Feature importance (permutation) connections at the network-level, remove blocks of weights (17x17)
      • Gray matter lesion (removing nodes) vs white matter lesion (removing connections)
    • Questions
      • Could I generate a dataset to get this result?
      • Do we have wonky data in TD? (Can try bootstrapping aka averaging TD samples to create new samples)
    • Cross-prediction (CP)
      • Don't split adhd into two groups for CP; it's confusing and doesn't match num samples
      • CP is a measure of similarity, but we find asymmetries
      • However, for the symmetries, there's some shared information there
      • Jorn not convinced about CP results
    • To Do
      • Compare hyperparameters between TD and ADHD
      • Next committee meeting May
      • Generate random dataset (but match stats of original data)
      • Compare variance for connections between ADHD and TD
      • CP with matched sample size of ADHD = TD
      • CP with random data
      • Within-group TD with bootstrapping
      • Save visualization stuff for the end
      • Create CP diagram
    • ADHD results are fine, but address TD results
    • Think of a case where we can cross-predict but can't self-predict
    • We hypothesize that ADHD-ADHD and TD-TD predict, but ADHD-TD doesn't. We think the developmental trajectories of ADHD and TD differ
  • CP with matched sample size of ADHD = TD
    • Remember to ignore results between ADHD groups because they share data (mix between train and test data)
    • Some CP still exists
  • CP with random data
    • Some results are significant, but some aren't
  • Within-group TD with bootstrapping
  • Problems with TD
    • Either the method is bad or the data is bad (check both)
    • Too much noise?
    • Maybe common problem is age (Bin 1, Bin 3, and TD)
    • Maybe common problem is num_samples (Bin 1, Bin 3, and TD)
    • If TD is binned like ADHD, most of the data falls into Bin 1
  • Do pairwise correlations for every TD subject to check how much the dataset varies
    • There is some pattern of similarity in the data
    • If we had noise, we'd expect a distribution that matches noise, which we don't see
  • High-level questions
    • Do the features weights change in development?
    • Tags: ADHD, developmental ADHD, model comparison, cross-prediction
    • Why does the cross-prediction work but not within-prediction?
  • Problems with trying to fix the CP results
    • When do we stop checking? When we find a difference? The difference between the groups is the diagnosis (matched for sample size).
    • We start leaking testing data through us, the experimenter
  • Last week of Feb meeting agenda
    • Conferences: NRD (presentation and questions), WRF (poster), OTU prosem
    • Bootstrapping results (no difference)
    • Visualization
      • Should use all weights
      • Split into pos and neg images
      • Use chord graph (looks too messy)
      • Main issues: too many connections (sum or mean them)
      • Use glass brain with summed nodes (add ROI labels)
      • Maybe don't interpret model weights since they're not reliable
    • Possible TD fixes
      • Search a larger hyperparameter space
      • Do with age binning, but combine bins 2 + 3 to match bin 1 sample numbers
  • For fri, bin 2-bin 1, get the fdr q for that cell (p=0.027944, q=0.027778)
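
The FDR step mentioned above (converting permutation p-values into q-values) can be made explicit with a hand-rolled Benjamini-Hochberg adjustment; this should match statsmodels' `multipletests(..., method='fdr_bh')`:

```python
import numpy as np

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted q-values."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards
    q = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.clip(q, 0, 1)
    return out

qvals = fdr_bh([0.001, 0.02, 0.03, 0.8])
```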

March 2022

  • Use ADHD and TD to predict age and sex
    • Age: TD r2=0.19, ADHD r2=0.38, ADHD r2(n=106)=-0.016, 0.061, 0.438, 0.493, 0.204
    • Sex: TD accuracy=62%, ADHD accuracy=73%, ADHD (n=106 with 100 subsamplings) accuracy=76%
  • Run ADHD to predict age subsampling 100 times
  • For TD, search a larger hyperparameter space
    • Doesn't change significance
  • Run TD with age binning
    • Don't result in any significant results
  • Cluster the data using k-means
    • ADHD
      • n_clusters = 2: average silhouette score = 0.12292706
      • n_clusters = 3: average silhouette score = 0.12527709
      • n_clusters = 4: average silhouette score = 0.06564607
      • n_clusters = 5: average silhouette score = 0.0434154
      • n_clusters = 6: average silhouette score = 0.028786479
    • Healthy
      • n_clusters = 2: average silhouette score = 0.095638305
      • n_clusters = 3: average silhouette score = 0.033691477
      • n_clusters = 4: average silhouette score = 0.051742084
      • n_clusters = 5: average silhouette score = 0.0518758
      • n_clusters = 6: average silhouette score = 0.0060232934
    • Get mean for each WISC measure, mean age, proportion of sex
  • Run self-correlation with leave-one-out to identify noisy TD samples
    • Run for ADHD age bins
    • Run cross prediction with clean-TD and num-sample-matched ADHD
    • We can predict now but result is biased due to leaking testing data through experimenter
    • Dropping TD results and thus diagnosis cross-prediction (CP)
  • Try two age bins (6-10 (n=180) and 11-16 (n=200))
    • Bin 1 can predict all but WMI and PSI. Bin 2 can predict all but VSI and PSI.
  • Do modeling with sample-size-matched age bins (Bin 2 n=114)
    • No result comes out as significant
  • Do cross-prediction with two age bins
    • Didn't do because dataset will change
  • KMeans
    • Switch to Bayesian Information Criteria (BIC) metric instead of silhouette
    • Hyperparameter search on num_clusters using BIC, results suggest 5 clusters
    • Send mean and standard deviation for all WISC measures
    • Send Bobby WISC measures as clustered by kmeans
  • Check over cross-prediction code
    • Didn't do
  • Visualization
    • Hypothesis driven (grey out regions we don't care about)
    • Visualize difference between model weights (all-bin1, all-bin2, bin1-bin2, bin2-bin1, all-td, td-all)
    • Can't have uncertain Power nodes, plot them and try extrapolation
    • Extrapolate Power atlas uncertain nodes by grabbing the closest node's label (works well)
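
The silhouette sweep above can be reproduced with a few lines of scikit-learn; synthetic blobs stand in for the FC data (the notes later switch to a BIC criterion, which this sketch doesn't cover):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
# Stand-in 'FC' data: two well-separated blobs in 10 dimensions
X = np.vstack([rng.randn(50, 10) - 2, rng.randn(50, 10) + 2])

scores = {}
for n_clusters in range(2, 7):
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=0).fit_predict(X)
    scores[n_clusters] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```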

April 2022

  • KMeans clustered one sample as its own cluster, suggesting something weird. Inspecting that sample's FC revealed it was noise, so now we have to recheck all of the FCs
  • To do
    • Inspect all functional connectivity matrices
      • Removed seven subjects, including the one singled out by kmeans
    • Remove bad subjects
    • Rerun modeling with three -> two -> one age bins
    • Rerun age cross-prediction (original data and equal sample size)
    • Run PCA (num components = 90% explained variance) -> KMeans -> Send Bobby clustered-cognition vectors in csv
    • Rerun sex prediction with balanced classes (previous run wasn't balanced for sex)
  • See how other papers deal with "uncertain" Power atlas nodes
  • Finalize thesis defense committee and dates (people going on vacation)
  • Present paper at lab meeting (Reproducible brain-wide association studies require thousands of individuals)
  • Notes
    • can we predict sex? yes, can we predict age? yes, can we predict cognition? kind-of
    • If we collapse from three bins -> two bins, is it sample size?
    • ridge: is there anything special by age? is there a developmental difference
    • kmeans: independent of age, confirm brain data link to cognition
  • ADHD redo modeling
    • Remove bad subjects
    • PLS / Ridge All Bins PTS and CV (save weights)
    • ICC between weights
    • Ridge Two Bins
    • Ridge All Equal Bins
    • Age cross-prediction
    • Sex prediction with balanced classes
      • ridge_adhd_sex_all=0.7240967283072547
    • PCA -> KMeans
    • Send Bobby csv (both before and after PCA)
    • Redo dataset statistics (histograms and self-correlation)
    • Cross-prediction with equal bin sizes
  • Removing the noisy subjects removed the small bump in the self-correlation distribution
  • PCA -> KMeans
    • Graph num components vs explained variance
    • Need 145 out of 373 components for 80% explained variance
  • Max-statistic correction
    • https://youtu.be/zg0tmnGGwBI
    • Can get p-value from percentage of true score to combined null distribution
    • Don't need exact p-values, if significant say p < 0.05
  • Simplified cross-prediction
    • Mean one age bin, then correlate every sample from another bin to that mean
    • Results in repo, all correlations are similar
  • Generate WISC distributions for each bin
  • Work on visualization
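
The max-statistic correction noted above can be sketched as: per permutation, take the maximum |statistic| across all tests, then compare each true statistic to that pooled null (this controls family-wise error); the numbers below are random stand-ins:

```python
import numpy as np

def max_stat_pvalues(true_stats, perm_stats):
    """maxT correction: p-value per test against the distribution of
    the per-permutation maximum |statistic| over all tests.

    perm_stats has shape (n_permutations, n_tests)."""
    max_null = np.max(np.abs(perm_stats), axis=1)
    n_perm = len(max_null)
    return np.array([
        (np.sum(max_null >= abs(t)) + 1) / (n_perm + 1)
        for t in np.asarray(true_stats)
    ])

rng = np.random.RandomState(0)
perm = rng.randn(999, 5)  # stand-in null statistics for 5 tests
p = max_stat_pvalues([4.5, 0.1], perm)
```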

May 2022

  • Work on Power atlas network-level visualizations
    • Add Python functions to label each connection in the 34,000 vector with two networks
  • Final advisory committee meeting
    • Decide on defense committee
    • Try dropping standard scaling for cross-prediction
    • Got permission to write thesis
  • AC Presentation
    • Null hypothesis is that there's no link between brain data and cognition (randomly shuffle)
    • Decided on a well-motivated sample (ADHD only, ADHD vs control, all together)
      • Initially wanted ADHD vs control but couldn't get model to work for control, so sticking with just ADHD
    • Explore a larger model space
      • KMeans and PCA
      • Show a bad FC subject so had to check all FCs
    • How PLS and Ridge differ
      • No intercept for PLS
      • Training time differs
      • But both are fundamentally regularization models
    • Changes
      • Switch from FDR to maxT (FDR too conservative when used with perm testing)
      • With no TD model, abandon diagnosis CP
    • Discuss age and sex prediction
    • Discuss self-correlation results
    • Discuss rerunning all modeling with no changes to results
    • Discuss sample size power exploration (two bins, matched bin 2)
    • Discuss age cross prediction
    • Discuss visualizations and relating it back to the brain
    • Answer why we can't predict Bin 1 and Bin 3
    • Answer why we can't predict PSI
    • Interpret model weights for the significant results (FSIQ and VSI)
    • Must address
      • Why not TD
      • Why not Bin 3
      • Why not cross prediction
  • Old cross prediction notes
    • Prove to me that the data isn't noise
      • Could I generate a dataset to get this result?
      • Generative model argument (NN for cat, can't predict for cat, but can predict for dogs and fire hydrants)
    • Suppose an adhd person and a td person both like rocks, they would synchronize with each other, but not with others in their group
    • Maybe the mean from each group can be used to predict the out-group, but each person in the group differs too much from the mean
    • It's ok to get between-pred if within-pred works
    • CP works because the model recognizes the link between brain -> cognition
  • To do
    • Create cross-prediction diagram
    • Try cross-prediction without standard scaling (we only scale the train but not the test)
      • Tried but results don't change, same goes for TD data
    • Visualization (Do the features weights change in development?)
  • Complete correlation cross-prediction for within network (FPN, SMH, SAL, VAN, DAN, CON, DMN)
    • High correlations suggest that only brain data is good
    • When adding the cognitive scores is when it falls apart
    • The neural mechanisms seem to be the same, but the link between them and cognition differs
  • Visualization
    • Circular graph with weights per network (mean, sum, sum of absolute value)
    • Difference between model weights (all-bin1, all-bin2, bin1-bin2, bin2-bin1)
    • Merge CG plots into table like r-results
  • Discuss unscaled results (makes results worse, sticking with scaling)
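
The connection-labelling step above can be sketched by mapping each entry of the vectorized upper-triangle FC to the network pair of its two nodes. The network names below are hypothetical placeholders; with the 264 Power nodes, 264·263/2 = 34,716 connections give the "34,000 vector" mentioned earlier:

```python
import numpy as np

def connection_network_labels(node_networks):
    """Label each upper-triangle FC entry with its two networks."""
    n = len(node_networks)
    # Same (row-major) ordering as np.triu_indices-based vectorization
    i_idx, j_idx = np.triu_indices(n, k=1)
    return [(node_networks[i], node_networks[j])
            for i, j in zip(i_idx, j_idx)]

# Toy example with 4 nodes -> 6 connections
labels = connection_network_labels(["DMN", "DMN", "FPN", "SMH"])
```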

June 2022

  • Project
    • How to reduce down all edges into networks (When to mean? Before filtering or after?)
    • Go over results
      • ICC between model weights of different age bins
      • separate red and blue, use different scales
      • developmental connections (subtract weights only for significant WISC)
      • Bin 3 plot FSIQ, VSI, VCI, PSI
      • code review with Brandon
        • Try using the same CV
    • Circular graph doesn't show within-network connections
      • Use FC matrix
    • Plot WMI for Bin 2 - Bin 1 for comparison
    • Discard non-matched Bin 2 (n=147) results?
      • How to subsample?
      • Further complicates results, either keep original results or spend more time running subsampling results
      • Only did 113 to show it wasn't a power issue, but now we need it for all results. Too much work and issues
    • Plot mean FC per bin
    • Don't need CG legend since nodes have labels (redundant)
    • Redo modeling for missing data
      • PLS and Ridge for Bin 2 Equal (n=113)
      • Redo TD Ridge with proper alpha range
      • Redo age CP with full Bin 2 to get perm scores
  • Thesis/paper
    • Don't want to commit to writing a section if it isn't needed
    • Should have two rounds of revision
    • P-values for ICC table? Just say all are p<.001
    • Show redundant info in the cross-pred purple image by showing within-pred? Sure
    • Use correct binomial dist to get p-value for sex prediction
  • Try ICC on network-level-reduce feature weights
  • Generate FC matrix to complement CG graphs
  • Regenerate CG graphs with set scales (set the scale to be the same for the rows, within each cognition measure)
  • Regenerate CP purple table without Bin 3 and without within-prediction
  • Generate feature weight diff and similar plots
    • Merge using Inkscape
    • Add WMI
    • Save images as svg
    • Threshold plots using top/bottom 10 connections
    • Switch diff plots to absolute diff plots (take absolute value of diff)
    • Similarity plot is just the inverse cbar of the diff plots (the reciprocal doesn't work because small values explode when inverted)
    • Changed titles for top 10 diff -> distinct connectivity profiles, and bottom 10 diff -> shared connectivity profiles
    • Use diff colors for distinct (pink) and shared (lime) profiles
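
The "correct binomial dist" p-value for sex prediction mentioned above is a one-sided binomial test against 50% chance (valid for balanced classes). Sketch using the accuracy and n from the March notes as stand-in numbers:

```python
from scipy.stats import binomtest

# Suppose the classifier got 76% of n=106 subjects right, against a
# 50% chance level for balanced classes.
n = 106
k = round(0.76 * n)  # 81 correct
result = binomtest(k, n, p=0.5, alternative="greater")
p_value = result.pvalue
```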

July - August 2022

Stopped work on programming and focused on writing the thesis.

August - September 2022

Worked on thesis defense presentation and successfully defended Oct 18, 2022.