Add an AnnData experiment to our local SCXA #340

Open
3 tasks done
ke4 opened this issue Apr 24, 2023 · 14 comments
Assignees
Labels
high priority Do this first, then the rest

Comments

@ke4
Contributor

ke4 commented Apr 24, 2023

  • Karoly
  • Upendra
  • Lingyun

After a discussion with Pedro, we agreed that the most suitable AnnData experiment for our local environment is the one with the accession E-ANND-3.
It can be found here: /nfs/production/irene/ma/anndata-ingest/datasets/tabula_sapiens/E-ANND-3/*

Steps:

  1. Go to the folder on your local machine where you would like to download the file bundle.
  2. Download the experiment file bundle to your local machine: scp -r codon-login:/nfs/production/irene/ma/sc_experiments/E-ANND-3 . Unfortunately this is necessary because these experiments are not yet available on our FTP. The download should take less than 1 hour.
  3. The idf file is missing from the file bundle for now (@pmb59 said that they are going to fix this), so download it from here: https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/blob/feature/add_E-ANND-3/ANND/E-ANND-3/E-ANND-3.idf.txt. Put it into the root folder of the experiment (E-ANND-3).
  4. Rename umap.tsv to E-ANND-3.umap.tsv.
  5. Create a temporary container that mounts the PostgreSQL data volume: docker container create --name pgvol -v scxa_atlas-data-exp:/atlas-data/exp ubuntu:jammy
  6. Copy the file bundle into the PostgreSQL volume: docker cp E-ANND-3 pgvol:/atlas-data/exp/magetab/
  7. Add E-ANND-3 to your test-data.env file in the docker/prepare-dev-environment folder.
  8. Run the Postgres step: ./docker/prepare-dev-environment/postgres/run.sh -r -l pg-anndata.log
  9. Execute the PostgreSQL step to add the experiment's data to the DB: SCHEMA_VERSION=latest docker compose --env-file ./docker/dev.env -f ./docker/docker-compose-postgres.yml up
  10. To add the experiment's metadata, execute the Solr step: ./docker/prepare-dev-environment/solr/run.sh -r -l solr.log
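
Steps 3 and 4 are easy to forget, so they could be wrapped in a small check that runs before the docker cp in step 6. This is only a sketch: prepare_bundle is a hypothetical helper name, not something that exists in the repository, and it assumes the bundle layout described in the steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repo): prepares a downloaded bundle
# before it is copied into the Docker volume.
prepare_bundle() {
    bundle_dir=$1
    accession=$2

    # Step 4: rename umap.tsv to <accession>.umap.tsv unless that is already done.
    if [ -f "$bundle_dir/umap.tsv" ] && [ ! -e "$bundle_dir/${accession}.umap.tsv" ]; then
        mv "$bundle_dir/umap.tsv" "$bundle_dir/${accession}.umap.tsv"
        echo "renamed umap.tsv to ${accession}.umap.tsv"
    fi

    # Step 3: the idf file has to sit in the root folder of the experiment.
    if [ ! -f "$bundle_dir/${accession}.idf.txt" ]; then
        echo "missing ${accession}.idf.txt" >&2
        return 1
    fi
    return 0
}
```

Possible usage from the folder that contains the bundle: prepare_bundle ./E-ANND-3 E-ANND-3 && docker cp E-ANND-3 pgvol:/atlas-data/exp/magetab/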
@ke4
Contributor Author

ke4 commented Apr 24, 2023

I have started working on this ticket and updating its description with details about the dataset.
I will keep updating the ticket as I work through loading this experiment into our local environment.

@ke4
Contributor Author

ke4 commented Apr 24, 2023

On my first run of the PostgreSQL step I got a couple of missing-file errors:

2023-04-24 13:06:54.176  INFO 154 --- [           main] .a.e.a.c.e.CreateUpdateExperimentCommand : Starting loading/updating experiments:
2023-04-24 13:06:54.177  INFO 154 --- [           main] .a.e.a.c.e.CreateUpdateExperimentCommand : Loading E-ANND-3
2023-04-24 13:06:55.543 ERROR 154 --- [           main] .a.e.a.c.e.CreateUpdateExperimentCommand : Could not load E-ANND-3 due to java.nio.file.NoSuchFileException: /atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.idf.txt
2023-04-24 13:06:55.543  WARN 154 --- [           main] u.a.e.a.cli.AbstractPerAccessionCommand  : 1 experiments failed

and also these error lines:

E-ANND-3: Matrix file /atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.aggregated_filtered_normalised_counts.mtx.gz missing, exiting.
ls: cannot access '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.tsne*.tsv': No such file or directory
ls: cannot access '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.umap*.tsv': No such file or directory
[04/24/2023 13:06:56]      Clusters: Create data file for E-ANND-3...
Error in fread(opt$clusters_path, header = TRUE, check.names = FALSE,  : 
  File '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.clusters.tsv' does not exist or is non-readable. getwd()=='/root/db-scxa/bin'
Execution halted
scxa-postgres:5432 - accepting connections
ls: cannot access '/atlas-data/scxa/magetab/E-ANND-3/E-ANND-3.marker_genes_*.tsv': No such file or directory

@ke4
Contributor Author

ke4 commented Apr 24, 2023

@pmb59 told me that for now we can download the missing idf file from this gitlab repo:

https://gitlab.ebi.ac.uk/ebi-gene-expression/scxa-metadata/-/blob/feature/add_E-ANND-3/ANND/E-ANND-3/E-ANND-3.idf.txt

I discussed with him that the file is currently in a feature branch and that it should be merged into the repo's main branch (master, I think) ASAP.

@ke4
Contributor Author

ke4 commented Apr 24, 2023

We still have these missing files according to the log:

  1. ...normalised_counts.mtx.gz --> Not resolved yet!!!
  2. ...tsne*.tsv --> this specific experiment file bundle does not have tsne file(s)
  3. ...umap*.tsv --> the bundle has a umap.tsv; @pmb59 and I assume that we have to rename it to E-ANND-3.umap.tsv
  4. ...clusters.tsv --> the bundle has a clusters_for_bundle.txt file, I assume that we have to rename it to E-ANND-3.clusters.tsv
  5. ...marker_genes_*.tsv --> Not resolved yet!!!

This is the current state. I still need more info from the curators/bioinformaticians...
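
To re-check a bundle after each fix without rerunning the whole PostgreSQL step, something like the following could be used. It is a sketch only: check_bundle is a hypothetical name, and the file patterns are simply the ones the loader complained about in the log above. The tsne pattern is left out because this bundle has no tsne files.

```shell
#!/bin/sh
# Hypothetical helper: lists which of the files named in the loading log
# are absent from a bundle directory. Returns non-zero if any is missing.
check_bundle() {
    dir=$1
    acc=$2
    missing=0
    for pattern in \
        "${acc}.idf.txt" \
        "${acc}.aggregated_filtered_normalised_counts.mtx.gz" \
        "${acc}.umap*.tsv" \
        "${acc}.clusters.tsv" \
        "${acc}.marker_genes_*.tsv"
    do
        # $pattern is left unquoted on purpose so the shell expands the glob;
        # ls fails when nothing matches.
        if ! ls "$dir"/$pattern >/dev/null 2>&1; then
            echo "missing: $pattern"
            missing=$((missing + 1))
        fi
    done
    [ "$missing" -eq 0 ]
}
```

Possible usage: check_bundle ./E-ANND-3 E-ANND-3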

@ke4
Contributor Author

ke4 commented May 3, 2023

We have the files ready at this location if you join codon:

/nfs/production/irene/ma/sc_experiments_failed/E-ANND-2

but E-ANND-2 is too big to be used in a local dev environment.

Now we are waiting for @pmb59 and/or @irisdianauy to fix the files for E-ANND-2. That is currently the smallest dataset among the anndata experiments.

@ke4
Contributor Author

ke4 commented May 3, 2023

The cell counts in the various anndata experiments:

  • E-ANND-1: 584 884
  • E-ANND-2: 209 126
  • E-ANND-3: 27 051
  • E-ANND-4: 456 101

@ke4
Contributor Author

ke4 commented May 15, 2023

I am going to move this task to the next sprint as I really hope that we are going to get the relevant files from the data prod / curation team in the next 2 weeks.

@ke4
Contributor Author

ke4 commented Jun 20, 2023

I tried to load the E-ANND-3 experiment locally using the files from /nfs/production/irene/ma/sc_experiments/E-ANND-3. It looks like E-ANND-3.clusters.tsv does not contain the correct data.

I got this error:

[06/20/2023 11:21:10]   Copying cell groups data to the db...
ERROR:  null value in column "value" violates not-null constraint
DETAIL:  Failing row contains (1966, E-ANND-3, 149, null).
CONTEXT:  COPY scxa_cell_group, line 130: "E-ANND-3|149|"
Cell groups  write failed

After @alfonsomunozpomer helped me investigate this error, it looks like the 3rd column in the E-ANND-3.clusters.tsv file should be numerical, as in the other experiments, but it contains this value: type i pneumocyte.
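
A quick way to spot this kind of problem before loading could be an awk check on the 3rd column. This is a sketch under assumptions: the file is taken to be tab-separated with a single header row, and find_non_numeric_clusters is a hypothetical name. Empty values are reported as well, since they do not match the integer pattern.

```shell
#!/bin/sh
# Hypothetical helper: prints every data row whose 3rd column is not an
# integer. Assumes a tab-separated file with one header row.
# Returns non-zero if at least one such row is found.
find_non_numeric_clusters() {
    awk -F '\t' '
        NR > 1 && $3 !~ /^-?[0-9]+$/ {
            printf "line %d: %s\n", NR, $3
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}
```

Possible usage: find_non_numeric_clusters E-ANND-3/E-ANND-3.clusters.tsv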

@ke4
Contributor Author

ke4 commented Jun 20, 2023

I asked @YalanBi and @irisdianauy to look into this issue.

@ke4
Contributor Author

ke4 commented Jul 18, 2023

The data production team needs to reanalyse the E-ANND-3 and E-ANND-4 experiments. Currently there is a problem with the cluster text file.

ke4 added the "high priority" label (Do this first, then the rest) on Aug 1, 2023

@ke4
Contributor Author

ke4 commented Oct 12, 2023

@irisdianauy notified me that the files for the test loading of E-ANND-3 are available now in /nfs/production/irene/ma/sc_experiments/E-ANND-3.

@ke4
Contributor Author

ke4 commented Oct 19, 2023

I successfully loaded E-ANND-3 into my dev env, but the result does not show up in the web app because there is no data for plotTypesAndOptions in the JSON content in the page's HTML.
I did some investigation and found that there is no data in the scxa_dimension_reduction table.
I also looked at the DB loading log, which contains: ls: cannot access '/atlas-data/exp/magetab/E-ANND-3/E-ANND-3.umap*.tsv': No such file or directory.
After a discussion with Iris, it looks like there should be a UMAP-related data file, but for some reason it is not in the experiment's folder.
Iris is investigating this bug.

@ke4
Contributor Author

ke4 commented Nov 29, 2023

@irisdianauy fixed the above-mentioned UMAP data file problem, and that file is now provided with the other data files.
Loading the E-ANND-3 experiment into my local environment was successful.
Please follow the steps in the ticket's description.

@upendrakumbham
Contributor

Hi @ke4, after loading the AnnData experiment ('E-ANND-3') into my local DB, I want to highlight a few corrections to the above steps.

  • We forgot to add this step: ./docker/prepare-dev-environment/postgres/run.sh -r -l pg-anndata.log # anndata support (this takes care of the DB loading part)

  • The directory mount is wrong: atlas-data/exp instead of atlas-data/scxa

Please let me know if you want me to update these steps.


5 participants