Data posted on AWS Open Data is inconsistent #10

Open
shntnu opened this issue Mar 12, 2020 · 4 comments

shntnu commented Mar 12, 2020

[Screenshot attached: Screen Shot 2020-03-12 at 7 45 48 PM]

There are new folders called profiles_cp, and they were presumably previously named profiles. It's possible this was done when we were reprocessing the data.

@jccaicedo Any clue what might have happened here?

Only Bioactives-BBBC022-Gustafsdottir has both profiles_cp and profiles. The others have only profiles_cp.

shntnu commented Mar 13, 2020

There's also something strange about the aggregated profiles in profiles and profiles_cp: they don't match up. For plate 20585, the Cells_AreaShape_Area values differ for the same wells:

Bioactives-BBBC022-Gustafsdottir$ pwd
/Users/shsingh/work/projects/2018_cytodata/datasets/Bioactives-BBBC022-Gustafsdottir
Bioactives-BBBC022-Gustafsdottir$ csvcut -c1-3 profiles/Bioactives-BBBC022-Gustafsdottir/20585/20585.csv|head
Image_Metadata_Plate,Image_Metadata_Well,Cells_AreaShape_Area
20585,A01,3878.0
20585,A02,3552.0
20585,A03,4059.0
20585,A04,3005.5
20585,A05,3549.0
20585,A06,3904.0
20585,A07,3928.0
20585,A08,4541.0
20585,A09,4039.0
Bioactives-BBBC022-Gustafsdottir$ csvcut -c1-3 profiles_cp/20585/20585.csv|head
Metadata_Plate,Metadata_Well,Cells_AreaShape_Area
20585,A01,2684.5
20585,A02,1959.5
20585,A03,2393
20585,A04,2076
20585,A05,2257
20585,A06,2167
20585,A07,2080
20585,A08,2751.5
20585,A09,1965

shntnu commented Mar 13, 2020

I compared these with data from s3://cellpainting-datasets/Bioactives-BBBC022-Gustafsdottir/workspace/backend/BBBC022_2013/

$ csvcut -c Metadata_Plate,Metadata_Well,Cells_AreaShape_Area 20585_augmented.csv|head
Metadata_Plate,Metadata_Well,Cells_AreaShape_Area
20585,A01,3878.0
20585,A02,3552.0
20585,A03,4059.0
20585,A04,3005.5
20585,A05,3549.0
20585,A06,3904.0
20585,A07,3928.0
20585,A08,4541.0
20585,A09,4039.0

which matches the CSV in profiles.
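The comparisons above can also be run programmatically. Below is a minimal sketch in Python, assuming only the standard library. The function names load_feature and differing_wells are hypothetical, and the inline rows are just the first three wells from each listing quoted in this thread; in practice you would pass open file handles for the full CSVs instead.

```python
import csv

# First three wells of each listing quoted in this thread (plate 20585).
PROFILES = """Image_Metadata_Plate,Image_Metadata_Well,Cells_AreaShape_Area
20585,A01,3878.0
20585,A02,3552.0
20585,A03,4059.0""".splitlines()

PROFILES_CP = """Metadata_Plate,Metadata_Well,Cells_AreaShape_Area
20585,A01,2684.5
20585,A02,1959.5
20585,A03,2393""".splitlines()

BACKEND = """Metadata_Plate,Metadata_Well,Cells_AreaShape_Area
20585,A01,3878.0
20585,A02,3552.0
20585,A03,4059.0""".splitlines()


def load_feature(lines, feature="Cells_AreaShape_Area"):
    """Map (plate, well) -> feature value from an iterable of CSV lines.

    Handles both header styles seen in the thread by dropping an optional
    "Image_" prefix (Image_Metadata_Plate vs. Metadata_Plate).
    """
    values = {}
    for row in csv.DictReader(lines):
        norm = {
            (k[len("Image_"):] if k.startswith("Image_") else k): v
            for k, v in row.items()
        }
        key = (norm["Metadata_Plate"], norm["Metadata_Well"])
        # float() makes "2393" and "2393.0" compare equal.
        values[key] = float(norm[feature])
    return values


def differing_wells(a, b, tol=1e-6):
    """Return {(plate, well): (a_value, b_value)} for shared wells that differ."""
    shared = sorted(set(a) & set(b))
    return {k: (a[k], b[k]) for k in shared if abs(a[k] - b[k]) > tol}


if __name__ == "__main__":
    profiles = load_feature(PROFILES)
    print("profiles vs backend:", differing_wells(profiles, load_feature(BACKEND)))
    print("profiles vs profiles_cp:", differing_wells(profiles, load_feature(PROFILES_CP)))
```

Keying on (plate, well) rather than row order means the check still works if the two files list wells in a different order, and the tolerance is an arbitrary choice to absorb formatting differences, not a value taken from the pipeline.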

shntnu changed the title from "Profiles have been moved around - figure out why" to "Data posted on AWS Open Data is inconsistent" on Mar 13, 2020

shntnu commented Mar 14, 2020

@cells2numbers You are our best bet for figuring this out :) Do you recollect renaming the profiles folder to profiles_cp? Any clues you might have will be super useful. Note that it's not just the renaming: one of the datasets, Bioactives-BBBC022-Gustafsdottir, has what I think are the "right" profiles in profiles but not in profiles_cp.

@ErinWeisbart

I just bumped into this while getting ready to copy select data to cellpainting-gallery.
Based on this thread I'm going to:

  • copy s3://cytodata/datasets/LUAD-BBBC043-Caicedo/profiles_cp/ to s3://cellpainting-gallery/cpg0031-caicedo-cmvip/broad/workspace/profiles/
  • copy s3://cytodata/datasets/Bioactives-BBBC022-Gustafsdottir/profiles/ to s3://cellpainting-gallery/cpg0030-gustafsdottir-cellpainting/broad/workspace/profiles/

unless you think I should do anything differently, @shntnu?
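As a rough sketch of how the two copies could be previewed before running them, the Python snippet below builds AWS CLI commands for the source and destination pairs listed above. It assumes the AWS CLI is installed and has credentials for both buckets. Using `aws s3 sync` with `--dryrun` is an assumption about tooling, not something stated in this thread, and the names COPY_PLAN and sync_commands are hypothetical.

```python
# Source/destination pairs copied verbatim from the comment above.
COPY_PLAN = [
    (
        "s3://cytodata/datasets/LUAD-BBBC043-Caicedo/profiles_cp/",
        "s3://cellpainting-gallery/cpg0031-caicedo-cmvip/broad/workspace/profiles/",
    ),
    (
        "s3://cytodata/datasets/Bioactives-BBBC022-Gustafsdottir/profiles/",
        "s3://cellpainting-gallery/cpg0030-gustafsdottir-cellpainting/broad/workspace/profiles/",
    ),
]


def sync_commands(plan, dry_run=True):
    """Build one `aws s3 sync` argument list per (source, destination) pair.

    With dry_run=True each command carries --dryrun, so the CLI only lists
    what it would transfer and copies nothing.
    """
    commands = []
    for src, dst in plan:
        cmd = ["aws", "s3", "sync", src, dst]
        if dry_run:
            cmd.append("--dryrun")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Print the commands for review; run them by hand (or via subprocess)
    # once the dry-run output looks right.
    for cmd in sync_commands(COPY_PLAN):
        print(" ".join(cmd))
```

Note that the Bioactives copy reads from profiles rather than profiles_cp, matching the observation earlier in the thread that profiles holds the values consistent with the backend data.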
