# Information Entry Script for Metagenomic Analysis
This script collects the information needed by the Jupyter_metagenomic_analysis.ipynb file. It prompts you to enter the relevant settings and to locate the files required for the analysis, then stores this information in a .csv file that Jupyter_metagenomic_analysis.ipynb can read. To verify the output, open this .csv file (named --.csv) and check that the information looks correct before running the analysis script.
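If you prefer to check the exported file from Python rather than opening it by hand, something like the following may help. It is only a sketch: the function name `preview_vars_csv` is hypothetical, and it assumes the layout that the R cell later in this notebook reads, namely eleven rows with a label in the first column and a value in the second, followed by the analysis rows.

```python
import csv


def preview_vars_csv(path, n_settings=11):
    """Return (settings, remaining_rows) read from the exported .csv file.

    Assumption: the first `n_settings` rows are label/value pairs and every
    row after them belongs to the exploratory-analysis sections.
    """
    with open(path, newline="") as handle:
        rows = [row for row in csv.reader(handle) if row]  # skip blank lines
    settings = {
        row[0]: (row[1] if len(row) > 1 else "")
        for row in rows[:n_settings]
    }
    return settings, rows[n_settings:]


# Example usage (uncomment once the file has been exported):
# settings, analyses = preview_vars_csv("metagenome_analysis_vars.csv")
# for label, value in settings.items():
#     print(label, "=", value)
```

The file name in the commented example is the one the R cell reads; pass a different path if your file is stored elsewhere.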

Run the cells sequentially, and finish editing each cell before running the next one. This will ensure the script runs smoothly.

For questions about this script, please contact Akhil Gupta (gupta305@umn.edu). 

---

## Import UI Elements Script
This will load all of the necessary widgets. Ignore the warning message for now.

In [2]:
# Importing necessary functions from another .ipynb script
%run ui_elem.ipynb

----------------

## Text Entry
Here you should specify the column name (shown as the sample column ID in the test output below) and name the output directories where the graphs and statistics should be stored.

In [3]:
direct_name_data

VBox(children=(Text(value='ID', description='Column Name:', layout=Layout(width='70%'), placeholder='What is t…

In [4]:
# Test to see if the value in the text boxes are saved
print("Sample column ID: " + sample_column_id.value)
print("Graph Output Directory Name: " + graph_output_dir.value)
print("Stats Output Directory Name: " + stats_output_dir.value)

Sample column ID: ID
Graph Output Directory Name: graphs
Stats Output Directory Name: stats


--------------------------------------------------

## File Paths


Here you should load the files that will be used during the analysis. *This loads only the file path, not the actual file itself*. Each button will open a pop-up window that will prompt you to select a file using your operating system's default file explorer. Make sure to either 1) select a file or 2) close the file explorer window before pressing another button. If no file is selected, a default file path will be used, which can be seen under these buttons.

The first set of buttons prompts you to select files for resistome analysis, and the second set prompts you to select microbiome datasets.

#### Loading in resistome files

In [5]:
file_path_buttons_resistomes

VBox(children=(Button(description='Load the data, MEGARes annotations, and metadata', icon='upload', layout=La…

#### Loading in microbiome files

In [6]:
file_path_buttons_microbiome

VBox(children=(Button(description='Load the .biom file', icon='upload', layout=Layout(width='100%'), style=But…

#### Showing the final file paths
The paths can be changed by selecting a different file path above.

In [7]:
### Print the file paths entered. If no file path was entered, the default file path will be returned.
### Here you should be able to see the final file paths that will be used. 
view_filepaths(amr_count_matrix_filepath,
              amr_metadata_filepath,
              megares_annotation_filename,
              biom_file,
              tre_file,
              tax_fasta,
              taxa_file,
              microbiome_temp_metadata_file)

# The output should be made much neater than this

# Assign these variables to be global variables first, then they don't need to be passed into the function
# like this and can still be printed. 

AMR Count Matrix: data/test_data/strict_SNP_confirmed_AMR_analytic_matrix.csv
AMR Metadata: data/test_data/FC_meat_AMR_metadata.csv
Megares Annotation: data/amr/megares_annotations_v1.03.csv
.biom File: data/test_data/exported-biom-table/otu_table_json.biom
tre File: data/test_data/exported-tree/tree.nwk
tax fasta File: data/test_data/exported-rep-seqs/dna-sequences.fasta
Taxa File: data/test_data/exported-biom-table-taxa/taxonomy.tsv
Microbiome Temp Metadata File: data/test_data/FC_meat_metadata.csv
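Because only the paths are stored, a mistyped or moved file will not be noticed until the analysis runs. A quick existence check can catch that earlier. The sketch below is not part of ui_elem.ipynb; `find_missing_paths` is a hypothetical helper, and the dictionary simply repeats the default paths printed above.

```python
from pathlib import Path


def find_missing_paths(paths):
    """Return the labels whose path does not point to an existing file."""
    return [label for label, path in paths.items() if not Path(path).is_file()]


# Labels and default paths copied from the output above.
default_paths = {
    "AMR Count Matrix": "data/test_data/strict_SNP_confirmed_AMR_analytic_matrix.csv",
    "AMR Metadata": "data/test_data/FC_meat_AMR_metadata.csv",
    "Megares Annotation": "data/amr/megares_annotations_v1.03.csv",
    ".biom File": "data/test_data/exported-biom-table/otu_table_json.biom",
    "tre File": "data/test_data/exported-tree/tree.nwk",
    "tax fasta File": "data/test_data/exported-rep-seqs/dna-sequences.fasta",
    "Taxa File": "data/test_data/exported-biom-table-taxa/taxonomy.tsv",
    "Microbiome Temp Metadata File": "data/test_data/FC_meat_metadata.csv",
}

missing = find_missing_paths(default_paths)
print("Missing files:", missing if missing else "none")
```

Relative paths are resolved against the current working directory, so run the check from the same directory as the notebook.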


---

## Exploratory Variables
The following will allow you to enter variables for the analyses. These are the variables in your metadata.csv file that you want to use for EXPLORATORY analysis (NMDS, PCA, alpha rarefaction, barplots).

### Sliders
The slider allows you to choose how many separate analyses should be run, and how many of them should be AMR analyses and how many should be Microbiome. 


### Exploratory Variables
Every time the slider value is changed, the code to generate the boxes should be rerun in order to update them. They will automatically contain the correct number of boxes based on the values entered in the slider. 

**Name**: 

**Subset**: Subset will allow you to filter for variables of interest and remove unnecessary variables. To select variables of interest, the format should be “*column_2 == column_variable_1*”. This is exactly how you might enter this into R. 

To remove certain variables from the analysis, the format should be “*column_2 != column_variable_2*”. The key point is having the *!* symbol in place of the first equals sign.

**Exploratory Variable**: This should also be the name of a column of interest for analysis. 

*NOTE*: Exploratory variables cannot be numeric. 

**Order**: This should describe the order that will be used during the analysis and when printing out result plots. *Each item in the list should be separated by a comma.*
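The saved lists later in this notebook show the subset stored as an R expression such as `list('sub')` and the order stored as `c("")`. The sketch below shows one way such a conversion could work. It is an illustration only: `subset_format_sketch` and `order_format_sketch` are hypothetical names, not the helpers from ui_elem.ipynb, and treating the subset box as a comma-separated list of conditions is an assumption.

```python
def _split_items(text):
    """Split comma-separated text into stripped items ([''] if empty)."""
    if not text.strip():
        return [""]
    return [part.strip() for part in text.split(",")]


def subset_format_sketch(text):
    """Turn 'cond1, cond2' into the R expression list('cond1', 'cond2').

    Assumption: conditions are separated by commas and contain no
    single quotes, since those would break the generated R string.
    """
    items = _split_items(text)
    return "list({})".format(", ".join("'{}'".format(item) for item in items))


def order_format_sketch(text):
    """Turn 'a, b' into the R expression c("a", "b")."""
    items = _split_items(text)
    return "c({})".format(", ".join('"{}"'.format(item) for item in items))


print(subset_format_sketch("Treatment != Control"))
print(order_format_sketch("Low, Medium, High"))
```

The column names and values in the two example calls are made up; replace them with columns from your own metadata file.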

In [8]:
display(exp_graph_var_amr)
display(exp_graph_var_microbiome)

IntSlider(value=5, continuous_update=False, description='AMR', max=10)

IntSlider(value=5, continuous_update=False, description='Microbiome', max=10)

In [9]:
var_info(exp_graph_var_amr.value, exp_graph_var_microbiome.value)

Tab(children=(Accordion(children=(VBox(children=(Text(value='', layout=Layout(width='70%'), placeholder='name'…

In [45]:
print(name_a3.value)

three


In [49]:
list_vals_a = []

# Look up each widget by name instead of building code strings for exec()
for i in range(exp_graph_var_amr.value):
    temp_list = [globals()[prefix + str(i)].value
                 for prefix in ("name_a", "subset_a", "exploratory_a", "order_a")]
    list_vals_a.append(temp_list)
    print(temp_list)


print()
print(list_vals_a)

['zero', '', '', '']
['one', '', '', '']
['two', '', '', '']
['three', '', '', '']
['four', '', '', '']

[['zero', '', '', ''], ['one', '', '', ''], ['two', '', '', ''], ['three', '', '', ''], ['four', '', '', '']]


## IMPORTANT STEP
Make sure to run the code below once you have finished entering the data into the boxes above. This will ensure that the data is stored correctly and will be output into the .csv file. 

In [39]:
# Saves and prints the variables entered above into a list to be used when 
# creating the .csv file
list_vals_a, list_vals_m = save_print_variables(exp_graph_var_amr.value, exp_graph_var_microbiome.value)

# This should be made to look cleaner later
# This can also be integrated into a button probably

In [40]:
list_vals_a

[['zero', "list('')", '', 'c("")'],
 ['one', "list('')", '', 'c("")'],
 ['two', "list('')", '', 'c("")'],
 ['three', "list('')", '', 'c("")'],
 ['four', "list('')", '', 'c("")']]

In [41]:
list_vals_m

[['name0', "list('sub')", '', 'c("")'],
 ['name1', "list('')", '', 'c("")'],
 ['name2', "list('')", '', 'c("")'],
 ['name3', "list('')", '', 'c("")'],
 ['name4', "list('')", '', 'c("")']]

In [None]:
list_vals_a = []
list_vals_m = []

def save_print_variables(amr, mic):
    """Collect the values entered for each AMR and microbiome analysis.

    amr and mic are the slider values, i.e. the number of analyses of each type.
    """
    results = {"_a": [], "_m": []}

    for analysis, count in (("_a", amr), ("_m", mic)):
        for j in range(count):
            tag = "{}{}".format(analysis, j)
            # Widgets are looked up by name (e.g. name_a0) rather than through
            # exec(), which cannot reliably create local variables in a function
            results[analysis].append([
                globals()["name" + tag].value,
                subset_format(globals()["subset" + tag].value),
                globals()["exploratory" + tag].value,
                order_format(globals()["order" + tag].value),
            ])

    return results["_a"], results["_m"]

---

## Outputting information into .csv file
The cell below stores everything entered above in a .csv file that can be read by the analysis script. The .csv file is saved in the current working directory, where this script is also stored. If your analysis script is located in another directory, be sure to move the .csv file into that directory before running the analysis script. 

In [30]:
display(vars_save_button)
vars_save_button.on_click(vars_to_csv)

Button(description='Save variables for analysis script', icon='save', layout=Layout(width='50%'), style=Button…

Variables Exported. Check directory for .csv file
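For reference, the R cell later in this notebook reads the file as eleven setting rows, one row that it skips, the AMR analysis rows, a row whose first column is `microbiome_exploratory_analyses`, and then the microbiome analysis rows. The sketch below writes a file in that shape. It is not the actual `vars_to_csv` implementation; the function name is hypothetical, and the label `AMR_exploratory_analyses` on row 12 is an assumption, since the R cell never checks that row.

```python
import csv


def write_vars_csv_sketch(path, settings, amr_rows, microbiome_rows):
    """Write settings and analysis rows in the order the R cell reads them.

    settings: list of 11 (label, value) pairs
    amr_rows, microbiome_rows: lists of [name, subset, variable, order]
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for label, value in settings:  # rows 1-11: label, value
            writer.writerow([label, value])
        # Row 12: section marker (label assumed, R skips this row)
        writer.writerow(["AMR_exploratory_analyses"])
        writer.writerows(amr_rows)  # rows 13 onward: one row per AMR analysis
        # Marker row that the R cell searches for in the first column
        writer.writerow(["microbiome_exploratory_analyses"])
        writer.writerows(microbiome_rows)


# Example usage with the lists saved earlier in this notebook:
# write_vars_csv_sketch("metagenome_analysis_vars.csv", settings,
#                       list_vals_a, list_vals_m)
```

Keeping the marker row between the two sections is what lets the reader work out how many AMR and microbiome analyses were entered.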


---

# Integrated Analysis Script (in R)
The following is the analysis script, written in R and run inside the same Jupyter notebook through R magic (rpy2).

In [50]:
%load_ext rpy2.ipython

  from pandas.core.index import Index as PandasIndex


In [51]:
%%R

# Loading in data for use in the analysis script from the .csv file generated above
file <- read.csv(paste(getwd(), "/metagenome_analysis_vars.csv", sep = ""),
                 colClasses = "character",
                 header = FALSE,
                 col.names = c("V1", "V2", "V3","V4"))

sample_column_id              <- file[1,2]
graph_output_dir              <- file[2,2]
stats_output_dir              <- file[3,2]
amr_count_matrix_filepath     <- file[4,2]
amr_metadata_filepath         <- file[5,2]
megares_annotation_filename   <- file[6,2]
biom_file                     <- file[7,2]
tre_file                      <- file[8,2]
tax_fasta                     <- file[9,2]
taxa_file                     <- file[10,2]
microbiome_temp_metadata_file <- file[11,2]

# Creates a list of all the AMR variables; its length depends on the number entered in the previous script
# seq_len() gives an empty loop when no AMR analyses were requested, unlike 1:0
AMR_exploratory_analyses <- list()
for (i in seq_len(which(file$V1 == "microbiome_exploratory_analyses") - 13)){
  subset_list = eval(parse(text=file[(12+i),2]))
  AMR_exploratory_analyses <- append(AMR_exploratory_analyses, 
                                     list(list(name = file[(12+i),1],
                                               subset = subset_list,
                                               exploratory_var = file[(12+i),3],
                                               order = file[(12+i),4])))
}


y <- which(file$V1 == "microbiome_exploratory_analyses")
microbiome_exploratory_analyses <- list()
for (i in seq_len(nrow(file) - y)){
  subset_list = eval(parse(text=file[(y+i),2]))
  microbiome_exploratory_analyses <- append(microbiome_exploratory_analyses, 
                                     list(list(name = file[(y+i),1],
                                               subset = subset_list,
                                               exploratory_var = file[(y+i),3],
                                               order = file[(y+i),4])))
}

# This part is important for us to improve. STOP HERE
## Right now, we have to manually select which scripts to run based on the analysis you need
* All of the objects above will be used by the scripts we run below



Some things to work on:
* I'm getting a warning now due to some changes in other packages. It would be great if we could update the code to work with the new syntax.
* Make a drop-down list where users can select which combination of analyses they need:
 * AMR/16S qiime2 results/kraken2 results

In [None]:
%%R
BiocManager::install("phyloseq")

In [4]:
%%R

####### END OF USER CONTROLS ######

## Pick the correct script that handles resistome data and/or microbiome data.
#### If shotgun microbiome and megares analysis, run:
#source('scripts/metagenomeSeq_megares_kraken.R')

#### If 16S microbiome and megares analysis, run:
source('scripts/metagenomeSeq_megares_qiime.R')


Error in import_biom(biom_file, tre_file, tax_fasta) : 
  could not find function "import_biom"



# Now, you can print exploratory figures to your local directory. 

This part takes a long time, but creates relative abundance barplots, diversity barplots, NMDS, PCA, and heatmaps. 
For now it's commented out because the NMDS function creates a bunch of messages that look ugly. We can probably find a way to fix this though. 

Other little things:
* I want to add better functionality to the functions in the "scripts/meg_utility_functions.R" file where we can better handle cases with factors > 20 in length

In [None]:
%%R

######## THEN print figures #

# After running this script, these are the useful objects that contain all the data aggregated to different levels
# The metagenomeSeq objects are contained in these lists "AMR_analytic_data" and "microbiome_analytic_data"
# Melted counts are contained in these data.table objects "amr_melted_analytic" "microbiome_melted_analytic"

## Run code to make some exploratory figures, zero inflated gaussian model, and output count matrices.
suppressMessages(source('scripts/print_figures.R'))

# Everything after this is where we can get creative to summarize our results. For now, let's focus on streamlining how we use everything above
## Here, we can have an area to show them how to play around with ggplot2
    
### The main objects to use are
* AMR
  * amr_melted_analytic/amr_melted_raw_analytic
   * Object of all counts in long form
  * AMR_analytic_data
   * List of MRexperiment objects at each level; Class, Mechanism, Group, Gene
* Microbiome
  * microbiome_melted_analytic/microbiome_raw_melted_analytic
  * microbiome_analytic_data

First, combine the normalized count tables with the metadata file.

In [18]:
%%R

head(amr_melted_analytic)


Error in head(amr_melted_analytic) : 
  object 'amr_melted_analytic' not found



In [None]:
%%R

### Start of code for figures, combine table objects to include meta
setkey(amr_melted_raw_analytic,ID) 
setkey(amr_melted_analytic,ID) 

setkey(microbiome_melted_analytic,ID)
# Set keys for both metadata files
setkey(metadata,ID)
setkey(microbiome_metadata,ID)
microbiome_melted_analytic <- microbiome_melted_analytic[microbiome_metadata]
amr_melted_raw_analytic <- amr_melted_raw_analytic[metadata]
amr_melted_analytic <- amr_melted_analytic[metadata]

In [None]:
%%R

## Figure 1 showing resistome composition
AMR_class_sum <- amr_melted_analytic[Level_ID=="Class", .(sum_class= sum(Normalized_Count)),by=.(ID, Name, Packaging, Treatment)][order(-Packaging )]
AMR_class_sum[,total:= sum(sum_class), by=.(ID)]
AMR_class_sum[,percentage:= sum_class/total ,by=.(ID, Name) ]
AMR_class_sum$Class <- AMR_class_sum$Name
fig1 <- ggplot(AMR_class_sum, aes(x = ID, y = percentage, fill = Class)) + 
  geom_bar(stat = "identity",colour = "black")+
  facet_wrap( ~ Treatment, scales='free',ncol = 2) +
  #scale_fill_brewer(palette="Dark2") +
  theme(
    panel.grid.major=element_blank(),
    panel.grid.minor=element_blank(),
    strip.text.x=element_text(size=22),
    strip.text.y=element_text(size=22, angle=0),
    axis.text.x=element_blank(), #element_text(size=16, angle=20, hjust=1)
    axis.text.y=element_text(size=20),
    axis.title=element_text(size=22),
    legend.position="right",
    panel.spacing=unit(0.1, "lines"),
    plot.title=element_text(size=22, hjust=0.5),
    legend.text=element_text(size=10),
    legend.title=element_text(size=20),
    panel.background = element_rect(fill = "white")
  ) +
  ggtitle("\t\tResistome composition by sample") +
  xlab('Sample') +
  ylab('Relative abundance') +
  scale_fill_tableau("Tableau 20") 
fig1