Add linear model analysis #26

Merged
merged 7 commits into from
Jan 19, 2023

Conversation

gwaybio
Member

@gwaybio gwaybio commented Jan 19, 2023

Adding CP feature linear model and visualizing results.

Member

@jenna-tomkinson jenna-tomkinson left a comment

LGTM! Not many comments on this one as I feel a bit better at interpreting the R files. I have included two small questions.

My main question is about the figure "How features contribute to NF1 genotype and cell density". Do the negative values on the x-axis mean that cell density is not contributing, or is contributing less? And if so, should we look at the features that are to the left of the red line and have a high genotype contribution value?


# Define inputs and outputs
data_dir = pathlib.Path("..", "..", "..", "4_processing_features", "data")
cp_file = pathlib.Path(data_dir, "nf1_sc_norm_cellprofiler.csv.gz")
Member

Is there a reason why you use commas instead of `/`, or is this just a preference?

Member Author

I don't think that I am very consistent with this... thanks for pointing it out!

To be fair, the docs are not consistent either (https://docs.python.org/3/library/pathlib.html), but they do seem to use `/` more often, so I will switch and try to be more consistent in the future!
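
For reference, the two spellings discussed here build the same path, so the choice is stylistic. A minimal sketch, reusing the path segments from the diff above; the names `with_commas` and `with_slashes` are illustrative and do not appear in the notebook:

```python
import pathlib

# Passing several segments to the constructor joins them into one path
with_commas = pathlib.Path("..", "..", "..", "4_processing_features", "data")

# The / operator joins segments the same way
with_slashes = pathlib.Path("..") / ".." / ".." / "4_processing_features" / "data"

# Both forms produce equal Path objects
assert with_commas == with_slashes

# Appending the file name with / reads like a normal file system path
cp_file = with_slashes / "nf1_sc_norm_cellprofiler.csv.gz"
print(cp_file.name)
```

Either form works on any platform, since `pathlib` inserts the correct separator itself.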

X = cp_df.loc[:, variables]

# Add dummy matrix of categorical genotypes
genotype_x = pd.get_dummies(data=cp_df.Metadata_genotype)
Member

Based on this comment, can you only fit a linear regression model if the metadata is integers (or a categorical)? So no strings like "WT" or "Null"?

Member Author

Yes, the sklearn implementation requires numerical data (integers or floats), not categorical data, so string labels like "WT" or "Null" have to be encoded first (hence the dummy matrix).
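
To make the encoding step concrete, here is a minimal sketch on made-up data. It assumes the model is `sklearn.linear_model.LinearRegression`; the toy values, the `Metadata_cell_count` column name, and the choice to keep only the `Null` indicator column are assumptions for illustration, not the notebook's actual code:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data (invented): feature = 0.1 * cell_count + 0.8 * (genotype == "Null")
toy_df = pd.DataFrame(
    {
        "Metadata_genotype": ["WT", "WT", "WT", "Null", "Null", "Null"],
        "Metadata_cell_count": [10, 20, 30, 10, 20, 30],
        "feature": [1.0, 2.0, 3.0, 1.8, 2.8, 3.8],
    }
)

# Encode the string genotype labels as 0/1 indicator columns
genotype_x = pd.get_dummies(data=toy_df.Metadata_genotype, dtype=float)

# Keep one indicator (Null = 1, WT = 0) so its coefficient reads as
# "change in the feature when the genotype shifts from WT to Null"
X = pd.DataFrame(
    {
        "cell_count": toy_df.Metadata_cell_count.astype(float),
        "Null": genotype_x["Null"],
    }
)
y = toy_df["feature"]

# Fit the linear model on purely numerical inputs
lm = LinearRegression().fit(X, y)
coef = dict(zip(X.columns, lm.coef_))
print(coef)
```

Because the toy data are exactly linear, the fitted coefficients recover the values used to build them.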

@@ -15,3 +15,5 @@ dependencies:
- conda-forge::seaborn
- conda-forge::umap-learn
- conda-forge::matplotlib
- pip:
- git+https://github.com/cytomining/pycytominer@afac3ea16818ad25f37318ecd5c5090c0eff5806
Member

@gwaybio

Looks like you need a new line at the end of the file!

Member

Wait now the red thing is gone? Can you double check this please 😄

Member Author

arggg, it got me! 242d3c8

Member Author

gwaybio commented Jan 19, 2023

"How features contribute to NF1 genotype and cell density", I was wondering if the negative values on the x-axis would mean that cell density is not contributing or contributing less? And if so, should we look at the features that are on the left of the red line and with a high genotype contribution value?

This is a very good question. Thank you for asking!

We really should be looking at the absolute value to determine the magnitude of the impact of each variable. However, we can look to the sign to see the direction of the impact.

A beta value is the expected change in the outcome Y when the predictor X differs by 1 unit. In other words, the given CP feature would change by 0.8 (for example) if the genotype shifted from WT to Null or, conversely, by -0.8 if the genotype shifted from Null to WT (remember the dummy matrix). The cell density contribution looks smaller because it is per cell, and differences in cell count are much larger than the single-unit change in genotype: to get from 11 cells to 47, multiply the contribution on the x-axis by 36. In other words, cell count has a HUGE impact in our dataset, but mostly for nuclei features and not so much for ER features :D
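
The scaling argument can be worked through with a few lines of arithmetic. This is only a sketch: both coefficient values below are hypothetical and are not taken from the linear model results.

```python
# Hypothetical coefficients for one CP feature (not from the actual results)
beta_genotype = 0.8     # change in the feature when genotype goes WT -> Null
beta_cell_count = 0.05  # change in the feature per one additional cell

# Genotype is a 0/1 dummy variable, so it can change by at most 1 unit
genotype_effect = beta_genotype * 1

# Cell count spans many units: going from 11 to 47 cells is 36 units
cell_count_delta = 47 - 11
cell_count_effect = beta_cell_count * cell_count_delta

print(f"genotype effect:   {genotype_effect:.2f}")
print(f"cell count effect: {cell_count_effect:.2f}")
```

With these made-up numbers the per-unit cell count coefficient is far smaller than the genotype coefficient, yet its effect over the 11-to-47 range is larger.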

Member Author

gwaybio commented Jan 19, 2023

The question has caused me to update the plot, see 12a853c

Thanks again!

@gwaybio gwaybio merged commit 596e5f7 into WayScience:main Jan 19, 2023
@gwaybio gwaybio deleted the add-lm branch January 19, 2023 16:44
d33bs added a commit to d33bs/NF1_SchwannCell_data that referenced this pull request Jan 31, 2023
commit fc76764
Merge: 0ba2392 d2b68b7
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Mon Jan 30 09:52:25 2023 -0700

    Merge pull request WayScience#38 from jenna-tomkinson/run_plates_with_cp_pipelines

    Run plates with cp pipelines

commit d2b68b7
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Jan 27 12:56:42 2023 -0700

    edit documentation

commit 68e3ded
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Jan 27 09:38:08 2023 -0700

    rerun plate 2 cellprofiler ic to confirm sc count

commit 62dadfa
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Jan 27 09:19:10 2023 -0700

    fix issues wioth extract sc for plate2

commit b3bce04
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Thu Jan 26 15:37:13 2023 -0700

    reorganize repo and run all plates in cp pipelines

commit 347c191
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Jan 24 15:49:47 2023 -0700

    add documentation and converted notebooks

commit 6b917a4
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Jan 24 13:25:07 2023 -0700

    run all plates with cellprofiler pipelines

commit a5221b0
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Jan 23 15:59:10 2023 -0700

    run plate 1 through cp

commit 0ba2392
Merge: 263a94c 310c12f
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Mon Jan 23 11:00:11 2023 -0700

    Merge pull request WayScience#35 from jenna-tomkinson/plate2_cellprofiler

    run plate 2 through cp and extract sc

commit 310c12f
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Jan 23 09:51:03 2023 -0700

    fix red error symbol issue

commit 9404554
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Jan 23 09:39:28 2023 -0700

    save optimized parameters for actin segmenation

commit ec6eb12
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Jan 23 09:18:30 2023 -0700

    run plate 2 through cp and extract sc

commit 263a94c
Merge: a90feed 933a4b1
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Fri Jan 20 12:00:19 2023 -0700

    Merge pull request WayScience#33 from jenna-tomkinson/plate2_illum_correction

    perform IC on plate 2

commit 933a4b1
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Jan 20 11:22:14 2023 -0700

    perform IC on plate 2

commit a90feed
Author: Gregory Way <gregory.way@gmail.com>
Date:   Fri Jan 20 09:15:18 2023 -0700

    DP feature linear modeling and visualization (WayScience#31)

    * perform linear model with DP features

    * add DP visualization for linear model

    * add DP lm figure and small tweak to CP lm fig

    * remove extra empty cell

    * add DP cyto feature power analysis and viz

    * add title to CP power analysis viz

commit 62170be
Author: Gregory Way <gregory.way@gmail.com>
Date:   Fri Jan 20 08:43:01 2023 -0700

    Perform power analysis for CP features (WayScience#30)

    * perform power analysis

    * visualize power analysis

    * visualize top ER feature from KS test (outside scope of PR, sorry!)

    * fix variable name

commit 49fc39a
Author: Gregory Way <gregory.way@gmail.com>
Date:   Fri Jan 20 08:38:29 2023 -0700

    Create umap figure with ggplot (WayScience#29)

    * create umap figure with ggplot

    * add DP figure

commit 77ca026
Merge: 3a615f4 8da9f1c
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Thu Jan 19 15:36:30 2023 -0700

    Merge pull request WayScience#28 from jenna-tomkinson/nf1_dp_statistics

    DeepProfiler data stat analysis

commit 8da9f1c
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Thu Jan 19 15:35:23 2023 -0700

    remove blank figures and edit code

commit 3a615f4
Author: Gregory Way <gregory.way@gmail.com>
Date:   Thu Jan 19 11:28:54 2023 -0700

    Create ComplexHeatmaps for Pilot Data plate 1 (WayScience#27)

    * add notebook to generate complexheatmaps

    * add complex heatmaps

    * add complexheatmap to env

    * save pngs too

    * add pngs and recreate pdfs

commit 0f2e400
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Thu Jan 19 10:58:15 2023 -0700

    DeepProfiler data stat analysis (heatmap and umap)

commit 596e5f7
Author: Gregory Way <gregory.way@gmail.com>
Date:   Thu Jan 19 09:44:06 2023 -0700

    Add linear model analysis (WayScience#26)

    * add code to fit a linear model

    * add linear model results

    * add notebook to visualize linear model results

    * add pycytominer to analyze conda env

    * actually save the figure

    * update figure to show sign

    * remove red symbol

commit 95ecdb6
Author: Gregory Way <gregory.way@gmail.com>
Date:   Wed Jan 18 17:26:48 2023 -0700

    Visualizing the KS test results (WayScience#25)

    * add figure generaetion environment

    * add ks test visualizations

    * add visualization notebook for ks test results

    * get rid of red symbol EOF

    * nbconvert ipynb

    * nbconverting to r instead of python, good catch!

commit e71bb1f
Merge: 1ae36c3 3d18b66
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Wed Jan 18 08:55:10 2023 -0700

    Merge pull request WayScience#23 from jenna-tomkinson/nf1_dp_normalization

    deepprofiler project processing features

commit 3d18b66
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Wed Jan 18 08:50:36 2023 -0700

    edits and update README

commit 46868d3
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Jan 17 15:53:31 2023 -0700

    fix feature selection

commit 3137e54
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Jan 17 15:33:49 2023 -0700

    deeprofiler project processing features

commit 1ae36c3
Merge: 8da88a4 a52b9c7
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Tue Jan 17 11:27:50 2023 -0700

    Merge pull request WayScience#22 from jenna-tomkinson/nf1_analysis

    heatmap and ks-test analysis

commit a52b9c7
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Jan 17 11:27:14 2023 -0700

    edits and add figures

commit f758340
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Jan 17 09:32:01 2023 -0700

    edits

commit ab68cb7
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Jan 13 15:45:52 2023 -0700

    heatmap and ks-test analysis

commit 8da88a4
Merge: 610854c 283d170
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Wed Jan 11 11:05:31 2023 -0700

    Merge pull request WayScience#21 from jenna-tomkinson/nf1_statistics

    NF1 UMAP Stats

commit 283d170
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Wed Jan 11 11:00:48 2023 -0700

    edit plot of number of sc

commit ec78e8c
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Wed Jan 11 10:10:43 2023 -0700

    improved code to remove extra lines

commit 08bbb0f
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Jan 10 15:54:06 2023 -0700

    complete review edits

commit 74d1917
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Jan 9 15:19:21 2023 -0700

    UMAP notebooks and visualizations

commit 9adc2b4
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Jan 6 15:43:45 2023 -0700

    add notebook for sc counts

commit 5782633
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Jan 6 15:41:31 2023 -0700

    start of NF1 statistics

commit 610854c
Merge: 1e70c20 b7bec10
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Wed Dec 14 14:13:53 2022 -0700

    Merge pull request WayScience#19 from jenna-tomkinson/add_new_dataset

    Add second NF1 dataset and correct the images

commit b7bec10
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Wed Dec 14 14:08:04 2022 -0700

    fix barcode platemap

commit ab702df
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Wed Dec 14 11:09:49 2022 -0700

    update .py file

commit 17fa768
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Wed Dec 14 11:06:52 2022 -0700

    update code and run second plate

commit 8d90300
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Dec 13 14:50:50 2022 -0700

    add updated dataset (from 12/12) and edit code

commit c6bf525
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Dec 12 15:54:05 2022 -0700

    updating code

commit ec3321c
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Dec 9 16:50:49 2022 -0700

    update main README

commit a6b9987
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Dec 9 16:49:38 2022 -0700

    update main README

commit a073b94
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Dec 9 16:40:58 2022 -0700

    edits from review

commit 0d0b2cb
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Dec 9 09:33:59 2022 -0700

    add black format to .py file

commit 23209b0
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Dec 9 09:31:37 2022 -0700

    edit the documentation in the .py file

commit a3cb4b5
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Dec 9 09:06:08 2022 -0700

    edit README

commit 6c878e5
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Fri Dec 9 08:56:38 2022 -0700

    edit README

commit 3a928d2
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Thu Dec 8 18:08:13 2022 -0700

    edit notebook

commit da69b4e
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Thu Dec 8 18:02:36 2022 -0700

    small edits to .py file

commit ed150bc
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Thu Dec 8 18:01:48 2022 -0700

    corrected metadata and finished code

commit 8e855e0
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Thu Dec 8 17:44:04 2022 -0700

    update data and reordered the metadata

commit 6fd34de
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Tue Dec 6 14:27:31 2022 -0700

    add second plate

commit 1e70c20
Merge: 5536d86 861d71b
Author: Jenna Tomkinson <107513215+jenna-tomkinson@users.noreply.github.com>
Date:   Mon Nov 28 13:06:10 2022 -0700

    Merge pull request WayScience#17 from jenna-tomkinson/edit_features_module

    Edit 4_processing_features module

commit 861d71b
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Nov 28 13:04:49 2022 -0700

    edits

commit 1c8d46d
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Nov 28 12:51:42 2022 -0700

    edits

commit be76e1b
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Nov 28 12:50:06 2022 -0700

    edited documentation

commit cb7f39f
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Nov 28 09:45:35 2022 -0700

    edited instructions and added file

commit 39f1bb0
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Nov 14 14:32:25 2022 -0700

    correct metadata to rerun dp module

commit df7cedd
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Nov 14 13:54:50 2022 -0700

    updated main README

commit 7c26bd6
Author: jenna-tomkinson <jenna.tomkinson@ucdenver.edu>
Date:   Mon Nov 14 13:51:13 2022 -0700

    edit data and create readme