LoadData CSV File Missing #162

Open
jenna-tomkinson opened this issue Dec 16, 2022 · 1 comment

@jenna-tomkinson

Hello!

I will be rerunning the Cell Health dataset, which was originally processed with CellProfiler 3.0 (I assume), using the latest version, CellProfiler 4.2.4, for a project to assess any differences.

While setting up the LoadData module, I noticed it would not let me select metadata for grouping. I tried both .csv files in the 0.download_data/IDR folder, but neither worked.

I figured out that this is because the files contain no metadata columns (e.g. Metadata_Plate).
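
A quick way to check whether a given CSV has any such columns is to look at its header row. This is only a minimal sketch: `list_metadata_columns` is a hypothetical helper name, and it assumes a plain comma-separated header with no commas inside the column names.

```shell
# Hypothetical helper: print every header column that starts with "Metadata_".
# Assumes the first line is a simple comma-separated header.
list_metadata_columns() {
  head -n 1 "$1" |      # header row only
    tr -d '\r"' |       # drop carriage returns and quote characters
    tr ',' '\n' |       # one column name per line
    grep '^Metadata_'   # keep the metadata columns (exit status 1 if none)
}

# Example call (file name is illustrative):
#   list_metadata_columns load_data.csv || echo "no Metadata_ columns found"
```

An empty result would be consistent with LoadData offering nothing to group by.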

Do you happen to have the exact .csv file that you used to run Cell Health with CellProfiler 3.0? If you do, it would greatly help me reproduce the results with the newer version.

Thank you!

@shntnu
Collaborator

shntnu commented Dec 19, 2022

I found this on my laptop!

Archive.zip

But they are also in our S3 bucket; see details below.

It would be worth @gwaybio downloading the files listed below (using his AWS account) and checking whether they match the archive. The S3 version is more reliable.
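
One way to do that check, as a rough sketch: it assumes Archive.zip has been unzipped into one local directory and the S3 objects downloaded into another with the same relative layout. `compare_load_data` and both directory names are hypothetical.

```shell
# Hypothetical helper: compare every load_data_with_illum.csv under the first
# directory with the file at the same relative path under the second directory.
compare_load_data() {
  (cd "$1" && find . -name 'load_data_with_illum.csv') |
    sed 's|^\./||' |
    sort |
    while IFS= read -r rel; do
      # cmp -s is silent and returns non-zero if the files differ or one is missing
      if cmp -s "$1/$rel" "$2/$rel"; then
        echo "same $rel"
      else
        echo "DIFFERS $rel"
      fi
    done
}

# Example call (directory names are illustrative):
#   compare_load_data archive_unzipped s3_download
```

Any `DIFFERS` line would point to a plate whose archived CSV should be replaced with the S3 copy.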

aws s3 ls --profile imaging-amazon --recursive s3://imaging-platform/projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/ |grep load_data_with_illum.csv
2020-03-05 10:02:10    6421131 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014610/load_data_with_illum.csv
2020-03-05 10:02:05    6422953 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014611/load_data_with_illum.csv
2020-03-05 10:02:11    6422935 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014612/load_data_with_illum.csv
2020-03-05 10:02:12    6422966 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014613/load_data_with_illum.csv
2020-03-05 10:02:13    6422958 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014614/load_data_with_illum.csv
2020-03-05 10:02:09    6422918 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014615/load_data_with_illum.csv
2020-03-05 10:02:07    6422919 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014616/load_data_with_illum.csv
2020-03-05 10:02:08    6422929 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014617/load_data_with_illum.csv
2020-03-05 10:02:06    6423015 projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/CRISPR_PILOT_B1/SQ00014618/load_data_with_illum.csv

To restore them, run the commands below, then wait ~12h for the restore to complete:

aws s3 ls --profile imaging-amazon --recursive s3://imaging-platform/projects/2015_07_01_Cell_Health_Vazquez_Cancer_Broad/workspace/load_data_csv/ |grep load_data_with_illum.csv|tr -s ' '|cut -d" " -f4 > /tmp/load_data_files.txt
parallel -a /tmp/load_data_files.txt aws s3api --profile imaging-amazon restore-object --bucket imaging-platform --key {} --restore-request GlacierJobParameters={"Tier"="Standard"}
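
To see whether a restore has finished, one option is to read the `Restore` field that `aws s3api head-object` reports for an object. The sketch below only interprets that string; `restore_state` is a hypothetical helper, and the labels it prints are my own, not AWS terminology.

```shell
# Hypothetical helper: classify the Restore string reported by head-object.
restore_state() {
  case "$1" in
    *'ongoing-request="true"'*)  echo "in-progress" ;;
    *'ongoing-request="false"'*) echo "restored" ;;
    *)                           echo "not-requested" ;;  # e.g. "None"
  esac
}

# Example call for one key from /tmp/load_data_files.txt (needs AWS credentials):
#   r=$(aws s3api head-object --profile imaging-amazon --bucket imaging-platform \
#         --key "$key" --query Restore --output text)
#   echo "$key: $(restore_state "$r")"
```

Once every key reports `restored`, the files should be downloadable as usual.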
