Hi,
Following @tropfenameimer's comment in issue #4 about building a Gallus gallus cisTarget database, I tried to take similar steps to build a pig cisTarget database.
First, I obtained the regulatory FASTA file through Ensembl/BioMart.
Then, I converted the TF motifs (JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt) to Cluster-Buster format and ran create_cistarget_motif_databases.py with these inputs. However, the job stopped with the following error:
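Because the motif conversion step is only mentioned in passing, here is a minimal sketch of turning a JASPAR-format PFM file into Cluster-Buster motifs. It assumes each JASPAR record is a `>ID name` line followed by four count rows (`A [ ... ]`, `C [ ... ]`, `G [ ... ]`, `T [ ... ]`), and that Cluster-Buster expects one row per motif position with A, C, G, T columns. The helpers `parse_jaspar` and `to_cluster_buster` are hypothetical; how the converted motifs must be named and laid out on disk for create_cistarget_motif_databases.py should be checked against the script's `--help`.

```python
import re


def parse_jaspar(jaspar_text):
    """Parse JASPAR-format PFMs into {motif_id: {"A": [...], "C": [...], ...}}.

    Counts are kept as strings so they can be written back out unchanged.
    """
    motifs = {}
    motif_id = None
    for raw in jaspar_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header such as ">MA0004.1\tArnt": keep only the matrix ID.
            motif_id = line[1:].split()[0]
            motifs[motif_id] = {}
        elif motif_id is not None:
            # Count row such as "A  [ 4 19  0  0  0  0 ]".
            base = line[0].upper()
            motifs[motif_id][base] = re.findall(r"\d+(?:\.\d+)?", line[1:])
    return motifs


def to_cluster_buster(motif_id, rows):
    """Transpose base rows into one Cluster-Buster line per position (A C G T)."""
    positions = zip(rows["A"], rows["C"], rows["G"], rows["T"])
    body = "\n".join("\t".join(counts) for counts in positions)
    return f">{motif_id}\n{body}\n"
```

Each returned string can then be written to its own file, one motif per file, following whatever naming the database script expects.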
Traceback (most recent call last):
  File "~/stg_000??/SCENIC/create_cisTarget_databases/create_cistarget_motif_databases.py", line 504, in <module>
    main()
  File "~/stg_000??/SCENIC/create_cisTarget_databases/create_cistarget_motif_databases.py", line 289, in main
    region_or_gene_ids = RegionOrGeneIDs.get_region_or_gene_ids_from_fasta(
  File "~/stg_000??/SCENIC/create_cisTarget_databases/cistarget_db.py", line 150, in get_region_or_gene_ids_from_fasta
    region_id = line[1:].split(maxsplit=1)[0]
IndexError: list index out of range
I don't understand what causes the error above. Could you please help me fix it?
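Going by the last frame of the traceback, the ID is taken as the first whitespace-separated token after `>`, so `IndexError` is raised when a header line has nothing, or only whitespace, after the `>`. Assuming that line 150 is only reached for header lines (the traceback alone does not show this), a minimal sketch to list such headers could look like this; `find_empty_headers` and `report_empty_headers` are hypothetical helpers, not part of create_cisTarget_databases.

```python
def find_empty_headers(fasta_lines):
    """Return (line_number, header) for every FASTA header without an ID.

    Checks the condition that makes line[1:].split(maxsplit=1)[0] fail:
    the split result is empty when only whitespace follows ">".
    """
    bad = []
    for line_number, line in enumerate(fasta_lines, start=1):
        if line.startswith(">") and not line[1:].split(maxsplit=1):
            bad.append((line_number, line.rstrip("\n")))
    return bad


def report_empty_headers(fasta_path):
    """Print every header without an ID found in the given FASTA file."""
    with open(fasta_path) as fh:
        for line_number, header in find_empty_headers(fh):
            print(f"line {line_number}: {header!r}")
```

For example, `report_empty_headers("pig_regions.fa")`, where the file name is a placeholder for the BioMart export.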
The last error I got was:
ValueError: Error: region ID "SEPTIN3|ENSSSCG00000000040" is not unique in FASTA file
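The `ValueError` reports that the same region ID occurs more than once in the FASTA file. One possible explanation, not confirmed here, is that the BioMart export contains several sequences for some genes that all carry the same header. The sketch below takes IDs the same way as the line shown in the traceback (first token after `>`) and reports those that repeat; `duplicated_ids` is a hypothetical name.

```python
from collections import Counter


def duplicated_ids(fasta_lines):
    """Return {region_id: count} for IDs that occur more than once.

    The ID is the first whitespace-separated token after ">", as in
    cistarget_db.py; headers without any ID are skipped here.
    """
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split(maxsplit=1)
            if fields:
                counts[fields[0]] += 1
    return {region_id: n for region_id, n in counts.items() if n > 1}
```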
I labelled the FASTA headers with both the gene name and the Ensembl gene ID, because some pig genes are not annotated with a name and only have an ID. I wanted the headers to match the gene names in my SCENIC input, where the gene ID is used whenever the gene name is absent. As a result, however, most headers contain both (for example, SEPTIN3|ENSSSCG00000000040). How can I trim the headers so that they contain only the gene name when both a gene name and a gene ID are present, and only the gene ID otherwise?
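A minimal sketch for the header question, assuming every header has the form `>NAME|ENSEMBL_ID` and that unnamed genes appear as `>|ENSEMBL_ID` or `>ENSEMBL_ID` (the exact BioMart layout should be checked on the file itself). It keeps the gene name when present and falls back to the gene ID otherwise; `simplify_header`, `simplify_fasta` and `simplify_fasta_file` are hypothetical helpers.

```python
def simplify_header(header):
    """Return ">NAME" if the header has a gene name, else ">ENSEMBL_ID".

    Assumes headers of the form ">NAME|ENSEMBL_ID", ">|ENSEMBL_ID" or
    ">ENSEMBL_ID"; anything after a second "|" would end up in gene_id.
    """
    name, _, gene_id = header[1:].strip().partition("|")
    return ">" + (name or gene_id)


def simplify_fasta(fasta_lines):
    """Yield FASTA lines with every header simplified; sequences unchanged."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield simplify_header(line) + "\n"
        else:
            yield line


def simplify_fasta_file(in_path, out_path):
    """Write a copy of in_path with simplified headers to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(simplify_fasta(src))
```

Shortening headers can make previously distinct IDs identical (for example, two sequences whose headers differ only in the gene ID part), so the rewritten file should be checked for unique IDs again before running the database script.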
Thanks for your time and help.
Best regards
Samaneh