# LOQ COMPLIANCE PROGRAM NOTEBOOK

Printing "Hello World" as a test (to check that the notebook exports usable code)

In [None]:
print("Hello World")

Exporting works, so the next step is to list the functions of the program to be created, ordered from the most basic to the bells and whistles. The most basic options (points 1-XX) are what is necessary for a "successful" program. The options beyond that are the "bells and whistles".

### The "Must Haves":

Input by user:

1. Total Diet Study `.txt` file, which is tab-delimited; the program formats it to allow for analysis
   
2. "Analyte Type", which will be the column containing the "Analyte" that is desired for output
   
3. "Analyte", which is what is being measured in the Total Diet Study
   
4. Optional: New Cut off concentration (default = 0)

The program must take all of the above files and parameters and output a new `.txt` file containing only rows that have the Analyte requested.

Additional, "behind the scenes", processing.

1. Do NOT include any rows that contain "RAP" (generally in the "Sample Qualifier" column)
   
2. Check the provided LOQ (limit of quantitation) column and compare it with the "Conc" (concentration) observed column. If the LOQ is GREATER than the Conc, then the row is not included in the output, as any values below the LOQ cannot be certain.
   
3. If no new cut off concentration is provided (item #4 from user input), then the program defaults to "zero tolerance", meaning anything above both zero and the LOQ is output. If a new cut off IS provided, then the program outputs anything above the new cut off that is also above the LOQ.
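
The three rules above can be sketched on a small in-memory table. This is a minimal illustration, not the program itself: the function name `select_compliant` and the sample values are made up, and the column names `Conc`, `LOQ` and `Sample Qualifier` are assumed to match the Total Diet Study file. It uses a strict `>` for the LOQ comparison, as the notebook's own code does further down, so a concentration exactly equal to the LOQ is dropped.

```python
import pandas as pd


def select_compliant(df, cutoff=None):
    """Apply the three 'behind the scenes' rules to a table of results."""
    # Rule 3: with no cutoff, fall back to "zero tolerance".
    threshold = 0.0 if cutoff is None else cutoff

    # Rule 1: drop rows whose qualifier mentions RAP (missing qualifiers are kept).
    qualifier = df["Sample Qualifier"].fillna("").astype(str)
    not_rap = ~qualifier.str.contains("RAP", regex=False)

    # Rule 2: keep only concentrations above the reported LOQ.
    above_loq = df["Conc"] > df["LOQ"]

    # Rule 3: keep only concentrations above the cutoff as well.
    above_cutoff = df["Conc"] > threshold

    return df[not_rap & above_loq & above_cutoff]


# Made-up sample rows, for illustration only.
sample = pd.DataFrame({
    "Conc": [0.05, 0.005, 0.2, 0.0, 0.02],
    "LOQ": [0.01, 0.01, 0.01, 0.01, 0.01],
    "Sample Qualifier": [None, None, "RAP", None, ""],
})

default_rows = select_compliant(sample)
cutoff_rows = select_compliant(sample, cutoff=0.03)
print(default_rows["Conc"].tolist())
print(cutoff_rows["Conc"].tolist())
```

Combining the three boolean masks with `&` keeps each rule visible on its own line, which should make it easier to change one rule without touching the others.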

### The "Bells and Whistles":

1. Have the option to halt the run if the requested concentration cut off is less than the LOQ provided (see item #3 in the "Behind the Scenes" processing).

2. Find and convert any units in the "Unit" column that are not `mg/kg`.

3. Optional: Food Number (ignore if not provided)
    * If provided, and if the requested new cut off concentration is less than the LOQ, a warning will be printed "Your requested concentration cut off is less than the LOQ provided for your requested analyte"
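
None of these extras are implemented yet, so the following is only a rough sketch of how the cut-off check and the unit conversion might look. Everything named here is an assumption: the helper names `check_cutoff` and `to_mg_per_kg`, the `strict` switch, the choice to compare the cut off against the lowest LOQ in the selected rows, and the conversion table (the unit labels actually present in the files may differ).

```python
import warnings

import pandas as pd

# Hypothetical conversion factors to mg/kg; real files may use other labels.
TO_MG_PER_KG = {"mg/kg": 1.0, "ug/kg": 0.001, "g/kg": 1000.0}

WARNING_TEXT = ("Your requested concentration cut off is less than the LOQ "
                "provided for your requested analyte")


def check_cutoff(cutoff, loq_values, strict=False):
    """Return True if the cutoff is usable; warn, or raise when strict."""
    if cutoff is None:
        return True
    # Assumption: compare against the lowest LOQ among the selected rows.
    if cutoff < min(loq_values):
        if strict:
            raise ValueError(WARNING_TEXT)
        warnings.warn(WARNING_TEXT)
        return False
    return True


def to_mg_per_kg(df):
    """Return a copy with Conc and LOQ rescaled so every Unit is mg/kg."""
    factors = df["Unit"].map(TO_MG_PER_KG)
    if factors.isna().any():
        unknown = sorted(set(df.loc[factors.isna(), "Unit"]))
        raise ValueError("No conversion factor for units: %s" % unknown)
    converted = df.copy()
    converted["Conc"] = converted["Conc"] * factors
    converted["LOQ"] = converted["LOQ"] * factors
    converted["Unit"] = "mg/kg"
    return converted


# Made-up rows, for illustration only.
mixed = pd.DataFrame({
    "Conc": [50.0, 0.02],
    "LOQ": [10.0, 0.01],
    "Unit": ["ug/kg", "mg/kg"],
})
uniform = to_mg_per_kg(mixed)
print(uniform)
```

Converting the LOQ together with the concentration keeps the two columns comparable after the units change.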

In [3]:
# testing to see what this notebook is running
from platform import python_version
print(python_version())

3.7.3


## THE CODE BEGINS

#### Import packages and parse/define arguments

In [4]:
# import the necessary packages
import csv
import pandas as pd
import numpy as np
from pandas import DataFrame
import argparse
import sys
import chardet

In [7]:
# Parsing arguments with argparse
parser = argparse.ArgumentParser(description = 'This script allows data selection for various requested analytes from Total Diet Studies at the FDA.')
parser.add_argument('--file', required=True, help='The Total Diet Study file to be analyzed.')
parser.add_argument('--analyte', required=True, help='The analyte that is to be extracted, e.g. Arsenic.')
parser.add_argument('--type', required=True, help='The type of analyte that the Total Diet Study input file is measuring, e.g. Element.')
parser.add_argument('--number', required=False, help='optional: The Food Number associated with a specific food.')
parser.add_argument('--cutoff', required=False, type=float, help='optional: Specify a new cut-off concentration, default=None.')
parser.add_argument('--filename', required=False, default='outfile', help='optional: name of the file, default=outfile.txt, output as TSV')
args = parser.parse_args()

usage: ipykernel_launcher.py [-h] --file FILE --analyte ANALYTE --type TYPE
                             --out OUT [--number NUMBER] [--cutoff CUTOFF]
ipykernel_launcher.py: error: the following arguments are required: --file, --analyte, --type, --out


SystemExit: 2

#### TESTING: Tested argparse by generating `test_argparse.py` in the `tests` folder of the project. Used it by running the following command:


`python test_argparse.py --file ~/Desktop/Python_program/Individual\ Year\ Analytical\ Results_0/Elements_2003.txt --analyte Arsenic --type Element --out ~/Desktop/Python_program/LOQ_Compliance/tests/`


And retrieved the following output:

`Namespace(analyte='Arsenic', cutoff=None, file='/Users/brittany.ott/Desktop/Python_program/Individual Year Analytical Results_0/Elements_2003.txt', number=None, out='/Users/brittany.ott/Desktop/Python_program/LOQ_Compliance/tests/', type='Element')`

So, the next step is being able to process the `.txt` file containing our data.

This little script was used to test many things by having it print out different results (such as the type of one of my arguments).
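
The `SystemExit: 2` in the notebook cell above happens because `parse_args()` reads `sys.argv`, which inside a notebook holds the kernel's own arguments rather than ours. One way to try the parser without leaving the notebook is to pass the arguments as an explicit list. The sketch below rebuilds a shortened version of the parser; the `build_parser` wrapper is a hypothetical helper, and the file name is only a placeholder.

```python
import argparse


def build_parser():
    """Shortened copy of the notebook's parser (help texts omitted)."""
    parser = argparse.ArgumentParser(
        description="Data selection for requested analytes.")
    parser.add_argument("--file", required=True)
    parser.add_argument("--analyte", required=True)
    parser.add_argument("--type", required=True)
    parser.add_argument("--number", required=False)
    parser.add_argument("--cutoff", required=False, type=float)
    parser.add_argument("--filename", required=False, default="outfile")
    return parser


# Passing a list bypasses sys.argv, so this also runs inside a notebook cell.
args = build_parser().parse_args(
    ["--file", "Elements_2003.txt", "--analyte", "Arsenic", "--type", "Element"]
)
with_cutoff = build_parser().parse_args(
    ["--file", "Elements_2003.txt", "--analyte", "Arsenic",
     "--type", "Element", "--cutoff", "0.001"]
)

# Inspect the parsed values and their types, as the test script did.
print(args.cutoff, type(with_cutoff.cutoff), args.filename)
```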

#### Cleaning up the data

In [None]:
# determining the file encoding for reading by pandas
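# read the raw bytes (closing the file afterwards) and let chardet guess the encoding
with open(args.file, 'rb') as raw_file:
    rawdata = raw_file.read()
result = chardet.detect(rawdata)
charenc = result['encoding']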
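
If `chardet` is not available, or its guess turns out to be wrong for a particular file, a simple fallback is to try a short list of likely encodings in order and keep the first one that decodes cleanly. This is a different technique from the `chardet` call above, not a description of what `chardet` does, and the candidate list is an assumption about which encodings the files are likely to use.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the bytes without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Plain ASCII/UTF-8 text decodes with the first candidate.
utf8_guess = guess_encoding("Arsenic\t0.05".encode("utf-8"))

# A lone 0xE9 byte is invalid UTF-8, so the next candidate is used.
legacy_guess = guess_encoding("caf\xe9".encode("latin-1"))

print(utf8_guess, legacy_guess)
```

Because `latin-1` accepts every byte value, it works as a last resort that never fails, at the cost of possibly mis-reading accented characters.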

#### Processing the input file

In [None]:
# reading in the file
file = pd.read_csv(args.file, sep='\t', encoding=charenc)

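# obtaining the rows that contain the desired analyte (missing values count as no match)
df1 = file[file[args.type].str.contains(args.analyte, na=False)]

#incorporating food number; currently sorting this out
#df2 = df1[df1['Food No.'].str.contains(args.number)]

# removes rows containing RAP: the ~ inverts the match so matching rows are dropped;
# astype(str) turns missing qualifiers into text, so those rows are kept
df_remove = df1[~df1['Sample Qualifier'].astype(str).str.contains('RAP', regex=False)]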
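
One possible way to finish the commented-out Food No. step is sketched below. It is an assumption-laden illustration: the helper name `filter_food_number` is made up, and the column name `Food No.` is taken from the commented line. The values are compared as text because pandas may read the column as integers, in which case the `.str` accessor is not available.

```python
import pandas as pd


def filter_food_number(df, number=None, column="Food No."):
    """Keep only rows for one food number; return df unchanged if none is given."""
    if number is None:
        return df
    # Compare as stripped text so integer and text columns behave the same.
    as_text = df[column].astype(str).str.strip()
    return df[as_text == str(number).strip()]


# Made-up rows, for illustration only.
foods = pd.DataFrame({
    "Food No.": [12, 12, 305],
    "Conc": [0.05, 0.02, 0.1],
})

one_food = filter_food_number(foods, "12")
all_foods = filter_food_number(foods)
print(one_food)
```

An exact match is used instead of `str.contains` so that requesting food number 12 does not also return 112 or 120. If the column contains missing values, pandas reads it as floats and the text form becomes `12.0`, which this sketch does not handle.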

#### Checking against LOQ

In [None]:
# generating a dataframe that compares the concentration detected to a cut off
above_LOQ = df_remove['Conc'] > df_remove['LOQ']
if args.cutoff is None:
    LOQ_compliant = df_remove[above_LOQ] # if no cutoff is provided
else:
    # if a cutoff is provided, rows must be above the cutoff AND above the LOQ
    LOQ_compliant = df_remove[above_LOQ & (df_remove['Conc'] > args.cutoff)]

# rows where nothing was detected
No_detect = df_remove[df_remove['Conc'] == 0]

#### Generating output files

In [None]:
# Code for printing LOQ compliant to a file
# (--filename defaults to 'outfile', so the None branch is only a fallback)
if args.filename is None:
    output_LOQ = 'outfile.txt'
else:
    output_LOQ = args.filename + '_outfile.txt'

# to_csv opens and closes the file itself when given a path
LOQ_compliant.to_csv(output_LOQ, sep='\t')

# Code for printing the No Detect to a file
if args.filename is None:
    output_nodetect = 'No_detect.txt'
else:
    output_nodetect = args.filename + '_nodetect.txt'

No_detect.to_csv(output_nodetect, sep='\t')
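
To check that the two output files are written as tab-separated text and can be read back, something like the following can be run on made-up data. The helper `write_tsv_outputs` and its `directory` parameter are hypothetical; the `_outfile.txt` and `_nodetect.txt` suffixes follow the code above.

```python
import os
import tempfile

import pandas as pd


def write_tsv_outputs(compliant, no_detect, prefix="outfile", directory="."):
    """Write both tables as TSV files and return the two paths."""
    paths = (
        os.path.join(directory, prefix + "_outfile.txt"),
        os.path.join(directory, prefix + "_nodetect.txt"),
    )
    for frame, path in zip((compliant, no_detect), paths):
        # Given a path, to_csv opens and closes the file itself.
        frame.to_csv(path, sep="\t")
    return paths


# Made-up tables, written to a temporary folder that is removed afterwards.
detected = pd.DataFrame({"Conc": [0.05, 0.02]})
nothing = pd.DataFrame({"Conc": [0.0]})

with tempfile.TemporaryDirectory() as tmp:
    out_path, nodetect_path = write_tsv_outputs(
        detected, nothing, "Elements_2003", tmp)
    # The first column of each file is the DataFrame index written by to_csv.
    round_trip = pd.read_csv(out_path, sep="\t", index_col=0)
    nodetect_back = pd.read_csv(nodetect_path, sep="\t", index_col=0)

print(os.path.basename(out_path), os.path.basename(nodetect_path))
```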

#### Conclusions

Test iteration 6 is the program that has the most basic functionality.

If you run `python test_csv6.py --help`:

In [None]:
usage: test_csv6.py [-h] --file FILE --analyte ANALYTE --type TYPE
                    [--number NUMBER] [--cutoff CUTOFF] [--filename FILENAME]

This script allows data selection for various requested analytes from Total
Diet Studies at the FDA.

optional arguments:
  -h, --help           show this help message and exit
  --file FILE          The Total Diet Study file to be analyzed.
  --analyte ANALYTE    The analyte that is to be extracted, e.g. Arsenic.
  --type TYPE          The type of analyte that the Total Diet Study input
                       file is measuring, e.g. Element.
  --number NUMBER      optional: The Food Number associated with a specific
                       food.
  --cutoff CUTOFF      optional: Specify a new cut-off concentration,
                       default=None.
  --filename FILENAME  optional: name of the file, default=outfile.txt, output
                       as TSV


If you run the following command:

In [None]:
python test_csv6.py --file /<PATH>/Elements_2003.txt --analyte Arsenic --type Element --cutoff 0.001 --filename Elements_2003

Two files are written, `Elements_2003_outfile.txt` and `Elements_2003_nodetect.txt`: the first contains all samples above the requested cutoff, and the second contains all samples where nothing was detected for the requested analyte. If you run it without a cutoff, *everything* above the LOQ provided for that element in that specific food type/category (as specified in the file) is returned.