You can read FASTA files using the ``read_fasta()`` function:

In [1]:
import aaanalysis as aa
file_path = "data/example_FASTA.fasta"
df_seq = aa.read_fasta(file_path)
aa.display_df(df_seq, n_rows=4)

,entry,sequence
1,"SEMA4A,38.4",LAAQQSYWPHFVTVT...IILVASPLRALRARG
2,"SEMA4B,47.0",WGADRSYWKEFLVMC...LFLLYRHRNSMKVFL
3,"SEMA4C,86.6",EARAPLENLGLVWLA...LLLVLSLRRRLREEL
4,"SEMA4D,19.1",TMYLKSSDNRLLMSL...FFYNCYKGYLPRQCL

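For orientation, a FASTA record consists of a header line starting with ``>`` followed by one or more sequence lines. The block below is a minimal pure-Python sketch of how such records map to the ``entry`` and ``sequence`` columns shown above. It is not the ``aaanalysis`` implementation: the helper name ``parse_fasta_text`` is hypothetical, and the sequences are truncated fragments used only for illustration.

```python
def parse_fasta_text(text):
    """Parse FASTA-formatted text into a list of {'entry', 'sequence'} dicts."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there is one
            if header is not None:
                records.append({"entry": header, "sequence": "".join(chunks)})
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # a sequence may span several lines
    if header is not None:
        records.append({"entry": header, "sequence": "".join(chunks)})
    return records


# Hypothetical two-record input (truncated sequences, for illustration only)
example = """>SEMA4A,38.4
LAAQQSYW
PHFVTVT
>SEMA4B,47.0
WGADRSYWKEF
"""
records = parse_fasta_text(example)
```

In this sketch the whole header ends up in ``entry``, which mirrors the unsplit ``"SEMA4A,38.4"`` values in the table above.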

To rename the identifier and sequence columns, use the ``col_id`` and ``col_seq`` parameters:

In [2]:
df_seq = aa.read_fasta(file_path, col_id="ENTRY", col_seq="SEQUENCE")
aa.display_df(df_seq, n_rows=4)

,ENTRY,SEQUENCE
1,"SEMA4A,38.4",LAAQQSYWPHFVTVT...IILVASPLRALRARG
2,"SEMA4B,47.0",WGADRSYWKEFLVMC...LFLLYRHRNSMKVFL
3,"SEMA4C,86.6",EARAPLENLGLVWLA...LLLVLSLRRRLREEL
4,"SEMA4D,19.1",TMYLKSSDNRLLMSL...FFYNCYKGYLPRQCL


The ``col_id`` column should contain only the unique identifier. If the FASTA headers carry additional information, use the ``sep`` argument (default ``'|'``) to split it into additional columns, named ``info1`` to ``info(n)``:

In [3]:
df_seq = aa.read_fasta(file_path, sep=",")
aa.display_df(df_seq, n_rows=4)

,entry,sequence,info1
1,SEMA4A,LAAQQSYWPHFVTVT...IILVASPLRALRARG,38.4
2,SEMA4B,WGADRSYWKEFLVMC...LFLLYRHRNSMKVFL,47.0
3,SEMA4C,EARAPLENLGLVWLA...LLLVLSLRRRLREEL,86.6
4,SEMA4D,TMYLKSSDNRLLMSL...FFYNCYKGYLPRQCL,19.1

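The effect of ``sep`` can be pictured as a plain string split on each header. The following is a rough sketch under that assumption, not the library's actual code; the helper name ``split_header`` is hypothetical.

```python
def split_header(header, sep="|"):
    """Split a FASTA header into an identifier plus numbered info fields."""
    parts = [p.strip() for p in header.split(sep)]
    row = {"entry": parts[0]}  # first field is treated as the identifier
    for i, value in enumerate(parts[1:], start=1):
        row[f"info{i}"] = value  # remaining fields become info1, info2, ...
    return row


row = split_header("SEMA4A,38.4", sep=",")
# The default '|' does not occur in this header, so nothing is split off
unsplit = split_header("SEMA4A,38.4")
```

The values stay strings in this sketch; whether ``read_fasta()`` converts them to numbers is not covered here.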

To rename the additional columns, pass a list of column names to ``cols_info``:

In [4]:
df_seq = aa.read_fasta(file_path, sep=",", cols_info=["prediction"])
aa.display_df(df_seq, n_rows=4)

,entry,sequence,prediction
1,SEMA4A,LAAQQSYWPHFVTVT...IILVASPLRALRARG,38.4
2,SEMA4B,WGADRSYWKEFLVMC...LFLLYRHRNSMKVFL,47.0
3,SEMA4C,EARAPLENLGLVWLA...LLLVLSLRRRLREEL,86.6
4,SEMA4D,TMYLKSSDNRLLMSL...FFYNCYKGYLPRQCL,19.1


FASTA headers can start with a database abbreviation (e.g., 'sp' for Swiss-Prot). To store this abbreviation in its own column, pass a column name to the ``col_db`` parameter:

In [5]:
file_path = "data/example_FASTA_db.fasta"
df_seq = aa.read_fasta(file_path, col_db="database", sep=",")
aa.display_df(df_seq, n_rows=4)

,entry,sequence,database,info1
1,SEMA4A,LAAQQSYWPHFVTVT...IILVASPLRALRARG,sp,38.4
2,SEMA4B,WGADRSYWKEFLVMC...LFLLYRHRNSMKVFL,sp,47.0
3,SEMA4C,EARAPLENLGLVWLA...LLLVLSLRRRLREEL,sp,86.6
4,SEMA4D,TMYLKSSDNRLLMSL...FFYNCYKGYLPRQCL,sp,19.1

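To see why ``col_db`` matters, consider what happens to a header with a leading database field when it is split naively. The sketch below assumes a header layout such as ``sp,SEMA4A,38.4``; the actual layout of the example file is not shown here, and the helper ``split_header_with_db`` is hypothetical rather than part of ``aaanalysis``.

```python
def split_header_with_db(header, sep="|", col_db=None):
    """Split a header, optionally treating the first field as a database tag."""
    parts = [p.strip() for p in header.split(sep)]
    row = {}
    if col_db is not None and len(parts) > 1:
        # Assumption: the first field is the database abbreviation
        row[col_db] = parts.pop(0)
    row["entry"] = parts[0]
    for i, value in enumerate(parts[1:], start=1):
        row[f"info{i}"] = value
    return row


with_db = split_header_with_db("sp,SEMA4A,38.4", sep=",", col_db="database")
without_db = split_header_with_db("sp,SEMA4A,38.4", sep=",")
```

Without a database column name, this sketch would treat ``sp`` as the identifier and shift the real identifier into ``info1``.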