# Calculating Relative Abundance and Absolute Abundance
Author: Lindsay Hopson<br>Date: April 30th 2021<br>Written in: Jupyter Notebook<br>Availability: https://github.com/GW-HIVE/microbiome


### Objective 

The goal of this code is to calculate the relative and absolute abundance of bacteria from the number of 'hits' produced by Hexagon computations. Here, absolute abundance is an organism's hits divided by all reads (aligned plus unaligned), and relative abundance is its hits divided by the aligned hits only. After downloading the Hexagon output CSV files from Hive to a local computer, rename each CSV file to match the sample the Hexagon computation was performed on, and save all of the files in the same folder as this code. This code does the following tasks:
1. Parses each Hexagon output CSV file saved in the current directory
2. Calculates the absolute and relative abundance (rounded to the 8th decimal place) of each organism present in the file
3. Writes the absolute and relative abundance columns into the original CSV file saved in the current directory
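
To make the two calculations concrete, here is a minimal sketch on a made-up table. It assumes the layout the implementation relies on (first row is the unaligned reads, last row is the total of aligned hits); the organism names and hit counts are hypothetical and do not come from a real Hexagon run.

```python
import pandas as pd

# Hypothetical Hexagon-style table: the first row holds the unaligned reads,
# the last row holds the total of aligned hits. All values are made up.
chart = pd.DataFrame({
    "Reference": ["Unaligned", "Organism A", "Organism B", "Total"],
    "Hits": [200, 600, 200, 800],
})

unaligned_hits = chart["Hits"].iloc[0]    # 200 unaligned reads
total_hits = chart["Hits"].iloc[-1]       # 800 aligned hits
all_reads = unaligned_hits + total_hits   # 1000 reads overall

# Keep only the organism rows (skip the unaligned row and the total row).
organisms = chart.iloc[1:-1].copy()
organisms["Absolute Abundance"] = (organisms["Hits"] / all_reads).round(8)
organisms["Relative Abundance"] = (organisms["Hits"] / total_hits).round(8)

print(organisms)
```

In this toy table, Organism A has an absolute abundance of 0.6 (600 of 1000 reads) and a relative abundance of 0.75 (600 of 800 aligned hits).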

#### Important Note: <br> <p style="color:red;">Before running this code, save a copy of all the original Hexagon output CSV files into a separate directory on your computer, because this code overwrites each CSV file in place.</p>
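One way to make that copy is sketched below. The function name `backup_csv_files` and the default folder name `hexagon_originals` are hypothetical choices for illustration, not part of the original notebook; copying the files by hand works just as well.

```python
import glob
import os
import shutil


def backup_csv_files(source_dir=".", backup_dir="hexagon_originals"):
    """Copy every CSV file in source_dir into backup_dir.

    Both names are illustrative defaults. Returns the list of copied paths.
    """
    os.makedirs(backup_dir, exist_ok=True)
    copied = []
    for path in sorted(glob.glob(os.path.join(source_dir, "*.csv"))):
        target = os.path.join(backup_dir, os.path.basename(path))
        shutil.copy2(path, target)  # copy2 also preserves file timestamps
        copied.append(target)
    return copied
```

Calling `backup_csv_files()` from the folder that holds the Hexagon files, before running the cells below, leaves untouched copies in the backup folder. Because the implementation only looks for `*.csv` directly in the current directory, files inside a subfolder are not processed.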

### Implementation

In [None]:
import glob
import pandas as pd
import numpy as np

# Collect every CSV file in the current directory (the Hexagon output files).
files = glob.glob('*.csv')

In [86]:
for f in files:
    chart = pd.read_csv(f)

    # Layout assumed by this code: the first row holds the unaligned reads
    # and the last row holds the total number of aligned hits.
    unaligned_hits = chart['Hits'].iloc[0]
    total_hits = float(chart['Hits'].iloc[-1])
    unaligned_aligned = unaligned_hits + total_hits

    # Drop the total row so only the unaligned row and the organisms remain.
    chart = chart.iloc[:-1].reset_index(drop=True)

    # Absolute abundance: hits as a fraction of all reads (aligned + unaligned).
    chart["Absolute Abundance"] = (chart['Hits'] / unaligned_aligned).round(8)

    # Relative abundance: hits as a fraction of the aligned hits only.
    # The unaligned row (row 0) has no relative abundance, so it is left empty.
    chart["Relative Abundance"] = (chart['Hits'] / total_hits).round(8)
    chart.loc[0, "Relative Abundance"] = np.nan

    # Overwrite the original CSV file with the two new columns added.
    chart.to_csv(f, index=False)

In [None]:
print("Complete")
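
After the run, a quick sanity check on each processed file can catch a file that was skipped or processed twice. The sketch below is an optional addition, not part of the original notebook; the helper name `check_abundance_columns` is hypothetical, and the range check assumes that no row has more hits than the total, so every abundance should fall between 0 and 1.

```python
import pandas as pd


def check_abundance_columns(path):
    """Return a list of problems found in one processed CSV file (empty if none)."""
    result = pd.read_csv(path)
    problems = []
    for column in ("Absolute Abundance", "Relative Abundance"):
        if column not in result.columns:
            problems.append(f"missing column: {column}")
            continue
        values = result[column].dropna()
        # Assumption: each abundance is a fraction of a read count,
        # so it should lie in the range [0, 1].
        if ((values < 0) | (values > 1)).any():
            problems.append(f"values outside [0, 1] in: {column}")
    return problems
```

For example, looping over `glob.glob('*.csv')` and printing any file for which `check_abundance_columns` returns a non-empty list gives a short report of files worth inspecting by hand.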