# Introduction

In this tutorial, we will see how to perform correspondence analysis with CodonU.

In [1]:
from CodonU import correspondence_analysis as CA

# Setting file path

In [2]:
in_file = 'Nucleotide/Staphylococcus_aureus.fasta'

# Building Contingency Tables
This section shows how to build contingency tables for codon counts, codon RSCU values, and amino acid (aa) counts.
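
To make the idea concrete, here is a minimal sketch of what a codon-count contingency table holds: one row per gene and one column per codon. It does not use CodonU. The helper `count_codons` and the toy sequences are hypothetical and only illustrate the layout; the real functions may filter sequences or columns differently (for example by length or by excluding stop codons).

```python
from collections import Counter
from itertools import product

# All 64 codons in TCAG order (TTT, TTC, TTA, TTG, TCT, ...)
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]

def count_codons(seq):
    """Count non-overlapping codons in a coding sequence (hypothetical helper)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    return [counts.get(codon, 0) for codon in CODONS]

# Toy sequences, not taken from the Staphylococcus aureus file
genes = {
    "gene_a": "ATGTTTTTCTTATAA",
    "gene_b": "ATGTTTTTTTCTTAA",
}

# One row per gene, one column per codon
table = {name: count_codons(seq) for name, seq in genes.items()}
for name, row in table.items():
    print(name, dict(zip(CODONS[:4], row[:4])))
```

Each row of such a table is the codon usage profile of one gene, which is the input that correspondence analysis works on.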

## Codon Frequency

In [3]:
codon_count_cont = CA.build_contingency_table_codon_count(
    handle=in_file,
    genetic_table_num=11
)

You can also pass values for `min_len_threshold`, `save_file`, `file_name`, and `folder_path`.
If you save the file, it will be saved in `.xlsx` format.

In [7]:
print(codon_count_cont.head())

                                                   TTT TTC TTA TTG TCT TCC  \
AP009351.1|Staphylococcus aureus subsp. aureus ...  13   8  25   3   5   0   
AP009351.1|Staphylococcus aureus subsp. aureus ...   9   8  27   4  11   1   
AP009351.1|Staphylococcus aureus subsp. aureus ...   9   1  26   7   5   1   
AP009351.1|Staphylococcus aureus subsp. aureus ...  10   9  34   8  12   1   
AP009351.1|Staphylococcus aureus subsp. aureus ...   9  12  49  11   8   0   

                                                   TCA TCG TAT TAC  ... GCA  \
AP009351.1|Staphylococcus aureus subsp. aureus ...   8   3  11   6  ...  13   
AP009351.1|Staphylococcus aureus subsp. aureus ...  11   2   6   1  ...   5   
AP009351.1|Staphylococcus aureus subsp. aureus ...   1   3  12   3  ...   5   
AP009351.1|Staphylococcus aureus subsp. aureus ...   8   2  23   6  ...  23   
AP009351.1|Staphylococcus aureus subsp. aureus ...  25   1  16   5  ...  28   

                                                   GCG G

## Codon RSCU

In [8]:
codon_rscu_cont = CA.build_contingency_table_codon_rscu(in_file, 11)
print(codon_rscu_cont.head())

                                                         TTT       TTC  \
AP009351.1|Staphylococcus aureus subsp. aureus ...  1.238095  0.761905   
AP009351.1|Staphylococcus aureus subsp. aureus ...  1.058824  0.941176   
AP009351.1|Staphylococcus aureus subsp. aureus ...       1.8       0.2   
AP009351.1|Staphylococcus aureus subsp. aureus ...  1.052632  0.947368   
AP009351.1|Staphylococcus aureus subsp. aureus ...  0.857143  1.142857   

                                                         TTA       TTG  \
AP009351.1|Staphylococcus aureus subsp. aureus ...  4.054054  0.486486   
AP009351.1|Staphylococcus aureus subsp. aureus ...      4.32      0.64   
AP009351.1|Staphylococcus aureus subsp. aureus ...  3.319149  0.893617   
AP009351.1|Staphylococcus aureus subsp. aureus ...  3.813084  0.897196   
AP009351.1|Staphylococcus aureus subsp. aureus ...   3.62963  0.814815   

                                                         TCT       TCC  \
AP009351.1|Staphylococcus aureus sub
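
For reference, RSCU (relative synonymous codon usage) is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below is a plain-Python illustration of that definition, not CodonU's implementation. The function `rscu` and its arguments are hypothetical. It uses only the phenylalanine family (TTT, TTC) because those are the only complete family counts visible in the codon count output earlier in this tutorial.

```python
def rscu(counts, synonymous_families):
    """Return RSCU per codon: observed count / mean count within its synonymous family."""
    values = {}
    for family in synonymous_families:
        total = sum(counts.get(codon, 0) for codon in family)
        if total == 0:
            # RSCU is undefined when the amino acid is absent; skip it in this sketch
            continue
        expected = total / len(family)  # count per codon under equal usage
        for codon in family:
            values[codon] = counts.get(codon, 0) / expected
    return values

# Phenylalanine is encoded by TTT and TTC
phe_family = [("TTT", "TTC")]

# TTT and TTC counts from the first row of the codon count table
first_gene = {"TTT": 13, "TTC": 8}
result = rscu(first_gene, phe_family)
print({codon: round(value, 6) for codon, value in result.items()})
```

With counts of 13 and 8, this gives 1.238095 for TTT and 0.761905 for TTC, which agrees with the first row of the RSCU output above.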

## AA Frequency

In [9]:
aa_count_cont = CA.build_contingency_table_aa_count(
    handle='Protein/Staphylococcus_aureus.fasta',
    genetic_table_num=11
)
print(aa_count_cont.head())

                                                     K   N   T   R   S   I  \
BAF66273.1 chromosomal replication initiator pr...  34  26  30  19  23  45   
BAF66274.1 DNA polymerase III beta subunit          25  22  27  12  29  36   
BAF66275.1 DNA replication and repair protein RecF  26  24  21  17  20  29   
BAF66276.1 DNA gyrase B subunit                     42  32  38  39  30  44   
BAF66277.1 DNA gyrase A subunit                     40  47  53  75  52  70   

                                                     M   Q   H   P   L   E  \
BAF66273.1 chromosomal replication initiator pr...   4  22  12  20  36  47   
BAF66274.1 DNA polymerase III beta subunit           6  11   5  16  37  30   
BAF66275.1 DNA replication and repair protein RecF  11  26  11   8  47  27   
BAF66276.1 DNA gyrase B subunit                     13  25  16  17  53  57   
BAF66277.1 DNA gyrase A subunit                     22  32  14  25  80  84   

                                                     D   A   G

# Performing Correspondence Analysis (COA)

## Codon Frequency

In [11]:
ca_codon_count_obj = CA.ca_codon(
    contingency_table=codon_count_cont,
    n_components=20,
)

          eigenvalue % of variance % of variance (cumulative)
component                                                    
0              0.036        10.74%                     10.74%
1              0.031         9.30%                     20.04%
2              0.017         5.10%                     25.14%
3              0.015         4.56%                     29.69%
4              0.011         3.43%                     33.12%
5              0.011         3.16%                     36.28%
6              0.009         2.66%                     38.94%
7              0.008         2.33%                     41.27%
8              0.008         2.32%                     43.59%
9              0.007         2.15%                     45.74%
10             0.006         1.78%                     47.52%
11             0.006         1.70%                     49.22%
12             0.006         1.68%                     50.90%
13             0.005         1.53%                     52.43%
14      

The returned object is of type `CA`. You can find more about this object and its methods [here](https://maxhalford.github.io/prince/ca/).

The function also has a `save_file` option. If you set it to `True`, the results shown above will be saved.
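
The `% of variance` column in the summary can be read as each eigenvalue's share of the table's total inertia. In correspondence analysis, total inertia equals the chi-square statistic of the contingency table divided by its grand total. The sketch below illustrates that relationship in plain Python. It is an assumption about how the summary is derived, not code taken from CodonU or prince, and the helper names and toy numbers are hypothetical.

```python
def total_inertia(table):
    """Total inertia: chi-square statistic of the table divided by its grand total.

    Assumes every row and every column has a nonzero sum.
    """
    grand_total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / grand_total
            chi_square += (observed - expected) ** 2 / expected
    return chi_square / grand_total

def explained_inertia(eigenvalues, inertia):
    """Return per-component and cumulative shares of the total inertia."""
    shares = [value / inertia for value in eigenvalues]
    cumulative = []
    running = 0.0
    for share in shares:
        running += share
        cumulative.append(running)
    return shares, cumulative

# Toy 2 x 2 table with perfect association, not real codon data
toy_table = [[10, 0], [0, 10]]
inertia = total_inertia(toy_table)

# Hypothetical eigenvalues, chosen only to illustrate the arithmetic
shares, cumulative = explained_inertia([0.5, 0.3], inertia)
for component, (share, cum) in enumerate(zip(shares, cumulative)):
    print(f"{component}  {share:.2%}  {cum:.2%}")
```

Because the shares are taken relative to the total inertia rather than to the retained components, the cumulative column need not reach 100% when `n_components` is smaller than the number of available dimensions.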

## Codon RSCU

In [12]:
ca_codon_rscu_obj = CA.ca_codon(contingency_table=codon_rscu_cont)

          eigenvalue % of variance % of variance (cumulative)
component                                                    
0              0.025        12.98%                     12.98%
1              0.015         7.86%                     20.83%
2              0.011         5.66%                     26.50%
3              0.008         3.93%                     30.42%
4              0.007         3.65%                     34.07%
5              0.006         3.24%                     37.31%
6              0.006         3.16%                     40.47%
7              0.006         3.01%                     43.48%
8              0.005         2.80%                     46.28%
9              0.005         2.77%                     49.05%
10             0.005         2.69%                     51.74%
11             0.005         2.47%                     54.21%
12             0.005         2.40%                     56.61%
13             0.005         2.33%                     58.94%
14      

## AA Frequency

In [13]:
ca_aa_count_obj = CA.ca_aa(contingency_table=aa_count_cont)

          eigenvalue % of variance % of variance (cumulative)
component                                                    
0              0.033        21.48%                     21.48%
1              0.018        12.07%                     33.55%
2              0.015        10.18%                     43.73%
3              0.010         6.86%                     50.59%
4              0.009         5.98%                     56.57%
5              0.009         5.91%                     62.48%
6              0.006         4.27%                     66.76%
7              0.006         3.69%                     70.45%
8              0.005         3.45%                     73.89%
9              0.005         3.22%                     77.11%
10             0.005         3.07%                     80.18%
11             0.004         2.91%                     83.10%
12             0.004         2.68%                     85.78%
13             0.004         2.63%                     88.41%
14      