
pig cisTarget database issues! #16

@Samaneh14


Hi,

Following @tropfenameimer's comment in issue #4 on building a Gallus gallus cisTarget database, I tried to take similar steps to build a pig cisTarget database.

First, I got the regulatory FASTA file through Ensembl/BioMart:
[Screenshot: Ensembl_biomart]
Then, I converted the TF motifs (JASPAR2022_CORE_vertebrates_non-redundant_pfms_jaspar.txt) to Cluster-Buster format and ran create_cistarget_motif_databases.py with these inputs. However, the job stopped with:
Traceback (most recent call last):
  File "~/stg_000??/SCENIC/create_cisTarget_databases/create_cistarget_motif_databases.py", line 504, in <module>
    main()
  File "~/stg_000??/SCENIC/create_cisTarget_databases/create_cistarget_motif_databases.py", line 289, in main
    region_or_gene_ids = RegionOrGeneIDs.get_region_or_gene_ids_from_fasta(
  File "~/stg_000??/SCENIC/create_cisTarget_databases/cistarget_db.py", line 150, in get_region_or_gene_ids_from_fasta
    region_id = line[1:].split(maxsplit=1)[0]
IndexError: list index out of range

I don't know the reason for the error above. Would you please help me fix it?
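
A note on the traceback: the failing expression, line[1:].split(maxsplit=1)[0], raises IndexError in Python whenever the text after the first character is empty or only whitespace, because splitting such a string yields an empty list. Presumably this happens on a FASTA header that consists of ">" with no ID after it. The following is a minimal sketch for locating such headers, assuming a plain-text FASTA read line by line; the function name find_empty_headers is hypothetical and not part of create_cisTarget_databases.

```python
from typing import Iterable, List, Tuple


def find_empty_headers(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, raw_line) for FASTA headers with no ID after '>'.

    Hypothetical helper: it mirrors the expression from the traceback,
    line[1:].split(maxsplit=1), which is an empty list for such headers.
    """
    bad = []
    for line_number, line in enumerate(lines, start=1):
        if line.startswith(">") and not line[1:].split(maxsplit=1):
            bad.append((line_number, line.rstrip("\n")))
    return bad


# Small in-memory example; lines 3 and 5 are headers without an ID.
example = [
    ">SEPTIN3|ENSSSCG00000000040\n",
    "ACGT\n",
    ">\n",
    "TTGA\n",
    "> \n",
    "CCGG\n",
]
print(find_empty_headers(example))
```

For a real file, the same function can be given an open file handle, for example find_empty_headers(open(path)), where path is whatever FASTA file was passed to the script.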

I also got this error as the last one:
ValueError: Error: region ID "SEPTIN3|ENSSSCG00000000040" is not unique in FASTA file
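
To see how many headers are affected by this second error, the repeated IDs can be counted before running the database script. This is a minimal sketch under the assumption that the region ID is the first whitespace-delimited token after ">", as the traceback line suggests; duplicated_region_ids is a hypothetical name and not a function of the tool.

```python
from collections import Counter
from typing import Dict, Iterable


def duplicated_region_ids(lines: Iterable[str]) -> Dict[str, int]:
    """Map each region ID that occurs more than once to its occurrence count.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    Headers without any ID are skipped here.
    """
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split(maxsplit=1)
            if tokens:
                counts[tokens[0]] += 1
    return {region_id: n for region_id, n in counts.items() if n > 1}


# Small in-memory example with one repeated header.
example_fasta = [
    ">SEPTIN3|ENSSSCG00000000040\n",
    "ACGT\n",
    ">SEPTIN3|ENSSSCG00000000040\n",
    "ACGA\n",
    ">|ENSSSCG00000000041\n",
    "TTGA\n",
]
print(duplicated_region_ids(example_fasta))
```

Whether repeated records should be dropped, merged, or renamed depends on why they repeat, so the sketch only reports them.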

I prepared the FASTA file with both the gene name and the Ensembl gene ID in each header, because some pig genes are not annotated and only have an ID. The idea was to unify the gene names used as input for SCENIC, with the gene ID used wherever the gene name is absent. As a result, however, most headers contain both. How can I remove part of each header so that it keeps just the gene name when both the gene name and the gene ID are present?
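
One way the header clean-up described above could look, as a sketch only: it assumes headers of the form ">NAME|ID" as in the error message, that a missing gene name leaves an empty field (for example ">|ENSSSCG00000000041"), and that pig gene IDs start with "ENSSSCG". The names simplify_header and rewrite_fasta are hypothetical.

```python
from typing import Iterable, Iterator


def simplify_header(header: str, sep: str = "|", id_prefix: str = "ENSSSCG") -> str:
    """Keep the gene name if the header has one, otherwise keep the gene ID.

    Assumptions: fields are separated by `sep`, and any field starting with
    `id_prefix` is an Ensembl gene ID rather than a gene name.
    """
    fields = [f.strip() for f in header.lstrip(">").strip().split(sep)]
    fields = [f for f in fields if f]  # drop empty fields (missing gene name)
    if not fields:
        raise ValueError(f"header has no usable ID: {header!r}")
    names = [f for f in fields if not f.startswith(id_prefix)]
    return ">" + (names[0] if names else fields[0])


def rewrite_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines with simplified headers; sequence lines are unchanged."""
    for line in lines:
        if line.startswith(">"):
            yield simplify_header(line) + "\n"
        else:
            yield line


records = [
    ">SEPTIN3|ENSSSCG00000000040\n",
    "ACGT\n",
    ">|ENSSSCG00000000041\n",
    "TTGA\n",
]
print("".join(rewrite_fasta(records)), end="")
```

Gene names alone are not guaranteed to be unique, so the rewritten IDs would still need to be checked for duplicates before building the database.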

Thanks for your time and help.

Best regards
Samaneh
