# Information Entry Script for Metagenomic Analysis
This script collects the information needed by the Jupyter_metagenomic_analysis.ipynb file. It prompts you for the relevant information and helps you locate the files required for the analysis, then stores everything in a .csv file that Jupyter_metagenomic_analysis.ipynb reads. To verify the output, open this .csv file (named --.csv) and check that the information looks correct before running the analysis script.

Run the cells in order, one at a time, and finish editing each cell's inputs before running the next one. This ensures the script runs smoothly.

For questions about this script, please contact Akhil Gupta (gupta305@umn.edu). 

---

## Import UI Elements Script
This loads all the necessary widgets. You can ignore the warning message for now.

In [1]:
# Importing necessary functions from another .ipynb script
%run ui_elem.ipynb

----------------

## Text Entry
Here you should specify the column name (the name of the metadata column that holds the sample IDs) and give names for the folders where the graphs and statistics should be stored.

In [2]:
direct_name_data

VBox(children=(Text(value='ID', description='Column Name:', layout=Layout(width='70%'), placeholder='What is t…

In [3]:
# Check that the values entered in the text boxes were saved
print("Sample column ID: " + sample_column_id.value)
print("Graph Output Directory Name: " + graph_output_dir.value)
print("Stats Output Directory Name: " + stats_output_dir.value)

Sample column ID: ID
Graph Output Directory Name: graphs
Stats Output Directory Name: stats
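
Before moving on, it can help to confirm that the column name entered above actually appears in your metadata file. The following is a minimal sketch, assuming the metadata is a comma-separated file with a header row. The helper `column_in_csv` is hypothetical and is not part of `ui_elem.ipynb`.

```python
import csv


def column_in_csv(csv_path, column_name):
    """Return True if column_name appears in the header row of csv_path."""
    with open(csv_path, newline="") as handle:
        # Read only the first row; an empty file yields an empty header
        header = next(csv.reader(handle), [])
    return column_name in [name.strip() for name in header]
```

For example, `column_in_csv(path_to_metadata, sample_column_id.value)` would return `False` if the column name was mistyped, where `path_to_metadata` stands for the location of your own metadata file.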


--------------------------------------------------

## File Paths

Before running this portion of the script, make sure that all the files to be used in the analysis have been uploaded to the "data" folder in the directory this script was opened from. Open the directory named "data" and upload every file needed to run the analysis. All the files required for the analysis are listed below.

Once the files have been uploaded, click the circular icon in the top left of the screen to restart the Python kernel. Jupyter doesn't update the lists below while the kernel is running, so a restart is required. After the restart, each uploaded file should appear in the drop-down menus below, where you can select it. The current placeholders will work fine if you don't have a certain file on hand.
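
To check that an upload actually landed in the data folder, you can list the folder contents directly. This is a minimal sketch, assuming the folder is named `data` and sits next to this notebook. The helper `list_data_files` is hypothetical and is not defined in `ui_elem.ipynb`.

```python
import os


def list_data_files(data_dir="data"):
    """Return the sorted names of the regular files currently in data_dir."""
    return sorted(
        name
        for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    )
```

Calling `list_data_files()` after an upload shows what the drop-down menus should offer once the kernel has been restarted.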

__*This section needs more work in terms of descriptions*__

#### Loading in resistome files

In [4]:
resistome_filenames

VBox(children=(Dropdown(description='AMR Count Matrix File:', index=7, layout=Layout(width='50%'), options=('t…

#### Loading in microbiome files

In [5]:
microbiome_filenames

VBox(children=(Dropdown(description='Biom File:', index=4, layout=Layout(width='50%'), options=('test_16S_dna-…

#### Save the file names
This also prints the saved file paths so you can double-check them.

In [6]:
save_filepath_button

Button(description='Save the filepaths for analysis', icon='save', layout=Layout(width='70%'), style=ButtonSty…

AMR Count Matrix Filepath: test_AMR_analytic_matrix.csv
AMR Metadata Filepath: test_AMR_metadata.csv
MEGARes Annotation Filepath: test_AMR_megares_annotations_v1.03.csv
Biom Filepath: test_16S_otu_table_json.biom
Fasta Filepath: test_16S_tree.nwk
Taxonomy Filepath: test_16S_taxonomy.tsv
Microbiome Temp Metadata Filepath: test_16S_metadata.csv

All filepaths saved


---

## Exploratory Variables
The following cells let you enter variables for the analyses. These should be variables from your metadata.csv file that you want to use for EXPLORATORY analysis (NMDS, PCA, alpha rarefaction, barplots).

### Sliders
The sliders let you choose how many separate analyses to run: one slider sets the number of AMR analyses and the other sets the number of Microbiome analyses.


### Explanatory Variables
Every time the slider value is changed, the code to generate the boxes should be rerun in order to update them. They will automatically contain the correct number of boxes based on the values entered in the slider. 

**Name**: 

**Subset**: Subset lets you keep the variables of interest and remove unnecessary ones. To select variables of interest, the format should be “*column_2 == column_variable_1*”. This is exactly how you might enter it in R.

To remove certain variables from the analysis, the format should be “*column_2 != column_variable_2*”. The key point is using the *!* symbol in place of the first equals sign.

**Explanatory Variable**: This should also be the name of a column of interest for analysis. 

*NOTE*: Exploratory variables cannot be numeric. 

**Order**: This should describe the order that will be used during the analysis and when printing out result plots. *Each item in the list should be separated by a comma.*
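
The Subset and Order formats described above can be illustrated in Python. This is only a sketch of how such entries might be interpreted, since the actual filtering happens in the analysis script and its implementation is not shown here. The function names and the example column `Treatment` with its values are hypothetical.

```python
def parse_subset(expression):
    """Split 'column == value' or 'column != value' into (column, operator, value)."""
    for operator in ("!=", "=="):
        if operator in expression:
            column, value = expression.split(operator, 1)
            return column.strip(), operator, value.strip()
    raise ValueError(f"Expected '==' or '!=' in subset: {expression!r}")


def apply_subset(rows, expression):
    """Keep the rows (dicts keyed by column name) that satisfy the expression."""
    column, operator, value = parse_subset(expression)
    if operator == "==":
        return [row for row in rows if row[column] == value]
    return [row for row in rows if row[column] != value]


def parse_order(text):
    """Turn a comma-separated Order entry into a list of level names."""
    return [item.strip() for item in text.split(",") if item.strip()]


# Hypothetical metadata rows used only for illustration
rows = [{"Treatment": "Control"}, {"Treatment": "Treated"}]
kept = apply_subset(rows, "Treatment != Control")
levels = parse_order("Control, Treated")
```

Here `kept` holds only the row whose `Treatment` is not `Control`, and `levels` is the ordered list of level names taken from the Order entry.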

In [7]:
display(exp_graph_var_amr)
display(exp_graph_var_microbiome)

IntSlider(value=5, continuous_update=False, description='AMR', max=10)

IntSlider(value=5, continuous_update=False, description='Microbiome', max=10)

In [8]:
var_info(exp_graph_var_amr.value, exp_graph_var_microbiome.value)

Tab(children=(Accordion(children=(VBox(children=(Text(value='', layout=Layout(width='70%'), placeholder='name'…

--------

## IMPORTANT STEP
Make sure to run the code below once you have finished entering the data into the boxes above. This will ensure that the data is stored correctly and will be output into the .csv file. 

In [16]:
# Saves and prints the variables entered above into a list to be used when creating the .csv file
list_vals_a, list_vals_m = save_print_variables(exp_graph_var_amr.value, exp_graph_var_microbiome.value)

---

## Outputting information into .csv file
The cell below stores everything entered above in a .csv file that the analysis script can read. The .csv file is saved in the current working directory, which is the directory containing this script. If your analysis script is located in another directory, move the .csv file into that directory before running the analysis script.

In [10]:
display(vars_save_button)
vars_save_button.on_click(vars_to_csv)

Button(description='Save variables for analysis script', icon='save', layout=Layout(width='50%'), style=Button…

Variables Exported. Check directory for .csv file
Variables Exported. Check directory for .csv file
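
To verify an export like this one, you can write the entries to a .csv file and read them back. The sketch below uses Python's standard `csv` module. The column names (`name`, `subset`, `exploratory_var`, `order`) and both function names are assumptions for illustration, and the layout actually produced by `vars_to_csv` may differ.

```python
import csv

# Hypothetical column layout; the real export may use different headers
FIELDNAMES = ("name", "subset", "exploratory_var", "order")


def write_variables_csv(path, records, fieldnames=FIELDNAMES):
    """Write one row per analysis; keys missing from a record are left blank."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(records)


def read_variables_csv(path):
    """Read the file back as a list of dicts for a quick visual check."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Reading the file back with `read_variables_csv` and printing the result is a quick way to confirm that each analysis has the expected name, subset, variable, and order before you run the analysis script.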


---

# Next Steps

The results from this script are written to a .csv file that the R script then uses to run the rest of the analysis. You don't need to edit the .csv file. All you need to do now is open the script named "Jupyter_metagenomic_analysis.ipynb", located in the same directory as this file, and run it.

## Run staging script with R magic

In [17]:
import rpy2.rinterface
%load_ext rpy2.ipython

The rpy2.ipython extension is already loaded. To reload it, use:
  %reload_ext rpy2.ipython


In [None]:
%%R -o AMR_melted_analytic -o microbiome_melted_analytic

# I basically turned your jupyter notebook "Jupyter_metagenomic_analysis.ipynb" into the staging script.
source("staging_script.R")

From cffi callback <function _processevents at 0x7f76c49badd0>:
Traceback (most recent call last):
  File "/srv/conda/envs/notebook/lib/python3.7/site-packages/rpy2/rinterface_lib/callbacks.py", line 264, in _processevents
    @ffi_proxy.callback(ffi_proxy._processevents_def,
KeyboardInterrupt

In [None]:
AMR_melted_analytic

In [22]:
#%%R

# This part takes longer, so for now we'll keep it separate.
## Run code to make some exploratory figures, zero inflated gaussian model, and output count matrices.
#suppressMessages(source('scripts/print_figures.R'))

# Example of python widget

In [20]:
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
import os
from IPython.display import Image

## The example below has to be fixed because I had to hard code the directory path
You'll probably have an easier time editing this than I did.

We can either make one widget per dataset type (AMR/Microbiome), or modify the widget to add another control that lets you select the dataset type first, then the exploratory variable, and then the figure to view.

In [21]:
# Create microbiome widgets
directory = widgets.Dropdown(options=os.listdir('graphs/Microbiome/'))
images = widgets.Dropdown(options=os.listdir('graphs/Microbiome/' + directory.value))

# Updates the image options based on directory value
def update_images(*args):
    # os.listdir needs the full path, not just the subdirectory name
    images.options = os.listdir('graphs/Microbiome/' + directory.value)

# Tie the image options to directory value
directory.observe(update_images, 'value')

# Show the images
def show_images(fdir, file):
    display(Image(f'{fdir}/{file}'))

_ = interact(show_images, fdir=('graphs/Microbiome/' + directory.value), file=images)

interactive(children=(Text(value='graphs/Microbiome/Treatment', description='fdir'), Dropdown(description='fil…
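
One way to avoid hard-coding the directory path is to pass the base folder in as a parameter and link the two dropdowns with `observe`. This is a hedged sketch, not a tested replacement. The helper names (`list_subdirs`, `list_files`, `build_viewer`) are hypothetical, and it assumes the base folder contains at least one subfolder and that every file in each subfolder is an image.

```python
import os


def list_subdirs(base_dir):
    """Return the sorted names of the subdirectories of base_dir."""
    return sorted(
        name
        for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )


def list_files(base_dir, subdir):
    """Return the sorted names of the regular files in base_dir/subdir."""
    folder = os.path.join(base_dir, subdir)
    return sorted(
        name
        for name in os.listdir(folder)
        if os.path.isfile(os.path.join(folder, name))
    )


def build_viewer(base_dir):
    """Show two linked dropdowns: one for the subfolder, one for the figure."""
    # Imported here so the helpers above can be used outside Jupyter
    import ipywidgets as widgets
    from IPython.display import Image, display

    directory = widgets.Dropdown(options=list_subdirs(base_dir),
                                 description="Variable")
    images = widgets.Dropdown(options=list_files(base_dir, directory.value),
                              description="Figure")

    def update_images(change):
        # Refresh the figure list whenever a different subfolder is selected
        images.options = list_files(base_dir, change["new"])

    directory.observe(update_images, names="value")

    def show_image(subdir, file):
        if file is not None:
            display(Image(filename=os.path.join(base_dir, subdir, file)))

    return widgets.interact(show_image, subdir=directory, file=images)
```

With this layout, `build_viewer('graphs/Microbiome')` and `build_viewer('graphs/AMR')` could serve as the one-widget-per-dataset option mentioned above, assuming an AMR folder with the same structure exists.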