# User Interface

This Jupyter notebook assists with the post-processing analysis of single-cell proteomics data.






## Imports

The following cell contains all of the imports this notebook needs. Run it every time you open the notebook, before running any other cell.

In [1]:
import read_files as rf


## Prep Files

Read in the protein/peptide files as well as the settings file.

### Read Files

The following cell calls a single function, rf.read_files, which takes one argument called filelist. filelist is a list of dictionaries, so you can analyze several protein/peptide file pairs at the same time.

**Syntax** 

The general syntax is a pair of square brackets "[]" enclosing one or more dictionaries, each written in curly brackets "{}" and separated by commas.

Each dictionary is itself a list of items separated by commas, where every item is a key: value pair. The key names the kind of entry (for example "protein_file") and the value is the entry itself (for example the file path "input/report.pg_matrix.tsv").

Example of a dictionary:
{"protein_file": "input/report.pg_matrix.tsv",
 "peptide_file": "input/report.pr_matrix.tsv",
 "processing_app": "diann"}

Example of a list: 
[        
{dictionary from above},   
     
{second dictionary}     
]
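Because a misspelled key is easy to miss, a quick check before calling rf.read_files can help. The following is a minimal sketch, not part of read_files: the helper name check_filelist is hypothetical, and the set of required keys is an assumption taken from the examples in this notebook.

```python
# Keys assumed to be required, based on the examples in this notebook.
REQUIRED_KEYS = {"protein_file", "peptide_file", "processing_app"}


def check_filelist(filelist):
    """Return a list of human-readable problems; the list is empty if none are found."""
    if not isinstance(filelist, list):
        return ["filelist must be a list of dictionaries"]
    problems = []
    for i, entry in enumerate(filelist):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not a dictionary")
            continue
        # Report every expected key that this dictionary lacks.
        for key in sorted(REQUIRED_KEYS - set(entry)):
            problems.append(f"entry {i} is missing key '{key}'")
    return problems


example = [
    {"protein_file": "input/report.pg_matrix.tsv",
     "peptide_file": "input/report.pr_matrix.tsv",
     "processing app": "diann"},  # deliberate typo: space instead of underscore
]
print(check_filelist(example))
```

An empty result only means the keys are present; it does not confirm that the files exist or that rf.read_files accepts them.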


**Notes**

_Note 1_: Place the file name in quotes. If the file is in a subfolder, write the name of the subfolder, followed by a "/", then the name of the file.

_Note 2_: The following columns are required in the tsv files for the program to run:
- Protein.Names
- Protein.Group (just in protein)
- Precursor.Id (just in peptide)
- Sample columns headed by file names (with a "/" in the name)
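The named columns in Note 2 can be checked before loading a file. This is a minimal sketch using only the standard library; the helper missing_columns is hypothetical, it treats the names in Note 2 as exact header strings (an assumption), and it does not check the file-name columns.

```python
import csv
import io

# Column names taken from Note 2; exact-match comparison is an assumption.
REQUIRED = {
    "protein": ["Protein.Names", "Protein.Group"],
    "peptide": ["Protein.Names", "Precursor.Id"],
}


def missing_columns(handle, kind):
    """Read only the header row of a tab-delimited file and list the absent columns."""
    header = next(csv.reader(handle, delimiter="\t"), [])
    return [name for name in REQUIRED[kind] if name not in header]


# In-memory stand-in for a peptide file; the sample column name is made up.
demo = io.StringIO("Protein.Names\tProtein.Group\tD:/runs/sample_01.raw\n")
print(missing_columns(demo, "peptide"))
```

For a real file, pass an open handle instead of the in-memory text, for example `open("input/report.pr_matrix.tsv", newline="")`.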

In [None]:
filelist = [
    {"protein_file": "input/report.pg_matrix.tsv", 
     "peptide_file": "input/report.pr_matrix.tsv",
     "processing_app": "diann"
     },
     
    {"protein_file": "input/HYE_prot_matrix.tsv", 
     "peptide_file": "input/HYE_pep_matrix.tsv",
     "processing_app": "diann"
     },

    {"protein_file": "input/fragpipe_combined_protein.tsv", 
     "peptide_file": "input/fragpipe_combined_peptide.tsv",
     "processing_app": "fragpipe"
     }
]

data_obj = rf.read_files(filelist)

### Settings File

#### Description

The Settings File can be used to tell the program how to filter, group, and name the data.

The program expects the settings file to be tab-delimited: the file has a grid-like structure in which tabs separate the columns. To create or edit a settings file, open it in Excel, make your edits, then save it as a tab-delimited text (.txt) file. This format should be available under "Save As" in Excel.

The settings file has four required columns: "Conditions", "filter_in", "filter_out", and "Organism".

- Conditions:
  - This is the name of the group that the row defines. Each row of the settings file creates one group.
  - Ex: "human a" or "bacteria b"

- filter_in:
  - This is a part of the file name (column) that you wish to include in the group. If you wish to add multiple snippets, separate them by commas.
  - Ex: "Hela_1cpw" or "HYEA_250pg"

- filter_out:
  - This is a part of the file name (column) that you wish to specifically exclude from the group. If you wish to add multiple snippets, separate them by commas.
  - Ex: "Hela_1cpw" or "HYEA_250pg"

- Organism:
  - This is the organism you wish to include in this group. It must be written in ALL CAPS, exactly as it appears in the Protein.Names column.
  - Ex: "HUMAN" or "ECOLI"
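To make the filter_in and filter_out columns concrete, here is a minimal sketch of one way such filtering could work. It is an illustration only and not the program's actual code: it assumes plain substring matching, that a column is kept when it contains any filter_in snippet and no filter_out snippet, and the function names and file names are hypothetical.

```python
def split_snippets(cell):
    """Split a comma-separated settings cell into trimmed, non-empty snippets."""
    return [s.strip() for s in cell.split(",") if s.strip()]


def select_columns(columns, filter_in, filter_out=""):
    """Keep columns with any filter_in snippet and no filter_out snippet (assumed rule)."""
    wanted = split_snippets(filter_in)
    unwanted = split_snippets(filter_out)
    return [
        col for col in columns
        if any(s in col for s in wanted) and not any(s in col for s in unwanted)
    ]


# Hypothetical run file names used as column headers.
columns = [
    "D:/runs/Hela_1cpw_rep1.raw",
    "D:/runs/Hela_1cpw_rep2_blank.raw",
    "D:/runs/HYEA_250pg_rep1.raw",
]
print(select_columns(columns, "Hela_1cpw", "blank"))
```

Under these assumptions, the second column is dropped because it contains the filter_out snippet, and the third is dropped because it matches no filter_in snippet.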

After these columns, you may add other columns to group the data further. For example, to group all yeast and human data together, add a fifth column called "Combined" and enter "Yeast_Human" on every yeast row and every human row, then "Other" on every remaining row.
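If you prefer to create the settings file from Python rather than Excel, the sketch below writes a tab-delimited table with the four required columns plus the optional "Combined" column. The rows are illustrative only: the condition names, snippets, and the "YEAST" organism tag are assumptions, and write_settings is a hypothetical helper.

```python
import csv
import io

FIELDS = ["Conditions", "filter_in", "filter_out", "Organism", "Combined"]

# Illustrative rows; replace them with the groups in your own data.
rows = [
    {"Conditions": "human a", "filter_in": "Hela_1cpw", "filter_out": "",
     "Organism": "HUMAN", "Combined": "Yeast_Human"},
    {"Conditions": "yeast a", "filter_in": "HYEA_250pg", "filter_out": "",
     "Organism": "YEAST", "Combined": "Yeast_Human"},
    {"Conditions": "bacteria b", "filter_in": "HYEA_250pg", "filter_out": "",
     "Organism": "ECOLI", "Combined": "Other"},
]


def write_settings(handle, rows):
    """Write the header and rows as tab-delimited text to an open file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Written to memory here; for a real file, use open("settings.txt", "w", newline="").
buffer = io.StringIO()
write_settings(buffer, rows)
print(buffer.getvalue())
```

The file name settings.txt in the comment is a placeholder; use whatever name your notebook passes to the program.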