# User Interface

This Jupyter notebook assists with the post-processing analysis of single-cell proteomics data.






## Imports

The following cell contains all of the imports this notebook needs. Run this cell every time you work with the notebook.

In [1]:
import python_files.read_files as rf
import python_files.create_defaults as cd

## Prep Files

Read in the protein and peptide files, as well as the settings file.

### Generate Default Settings



If Create_Defaults is False, no template files will be created.

If Create_Defaults is True, the following files will be created:    
- default_file_template.yaml
  - Set number_of_files to the number of input files you need, so the template has the right number of entries.

WARNING: This function will overwrite any existing file with the same name, so rename your files before running the cell.



In [2]:
Create_Defaults = False

if Create_Defaults:
    
    number_of_files = 13
    cd.generate_input_files_template(number_of_files)
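
Because the generator overwrites any file with the same name, one way to protect an edited template is to copy it aside before regenerating. The following is a minimal sketch using only the standard library. `backup_if_exists` is a hypothetical helper, not part of `python_files`; the file name `default_file_template.yaml` is taken from the description above.

```python
import shutil
from pathlib import Path


def backup_if_exists(path):
    """Copy `path` to a numbered .bak file if it exists.

    Returns the backup path, or None when there was nothing to back up.
    (Hypothetical helper for illustration only.)
    """
    source = Path(path)
    if not source.exists():
        return None
    # Find the first unused backup name: file.yaml.bak1, file.yaml.bak2, ...
    counter = 1
    backup = source.with_name(f"{source.name}.bak{counter}")
    while backup.exists():
        counter += 1
        backup = source.with_name(f"{source.name}.bak{counter}")
    shutil.copy2(source, backup)
    return backup
```

Calling `backup_if_exists("default_file_template.yaml")` just before `cd.generate_input_files_template(number_of_files)` would leave the earlier version available under a `.bak` name.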

### Read Data

The following cell calls one function, rf.read_files, which accepts one variable called yaml_input_files. This is the name of a .yaml file that carries the information about the protein and peptide matrices to be processed.     

To generate a template for this file, change "Create_Defaults" to True above. (Set number_of_files so the template has the right number of entries.)    
Be sure to rename your file so that it is not overwritten the next time the template is generated.

**Notes**

_Note 1_: Place the file name in quotes.     
_Note 2_: If the file is in a subfolder, write the name of the subfolder, followed by a "/", then the name of the file.
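
The two notes above can be illustrated with a small existence check that runs before the file name is handed to `rf.read_files`. This is only a sketch: `check_input_file` is a hypothetical helper, and the subfolder name `my_inputs` in the comments is illustrative.

```python
from pathlib import Path


def check_input_file(file_name):
    """Return the path as a string if the file exists, else raise FileNotFoundError.

    (Hypothetical helper for illustration only.)
    """
    path = Path(file_name)
    if not path.is_file():
        raise FileNotFoundError(
            f"Could not find '{path}' (looked relative to '{Path.cwd()}')"
        )
    return str(path)


# A file next to the notebook (name in quotes, as in Note 1):
#   yaml_input_files = check_input_file("testing.yaml")
# A file inside a subfolder (subfolder, then "/", then file name, as in Note 2):
#   yaml_input_files = check_input_file("my_inputs/testing.yaml")
```

Checking the path first gives an error that names the missing file and the folder that was searched.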


In [3]:
yaml_input_files = "testing.yaml"

data_obj = rf.read_files(yaml_input_files)

Peptide file must be the combined_ion.tsv file.


FileNotFoundError: Peptide file must be the combined_ion.tsv file.

In [None]:
display(data_obj["pep_abundance"].head(20))

Unnamed: 0,Sequence,Protein Name,0-0,0-1,0-2,0-3,0-4,0-5,0-6,0-7,...,0-50,0-51,0-52,0-53,0-54,0-55,0-56,0-57,0-58,0-59
0,AAAAAAAAAPAAAATAPTTAATTAATAAQ3,SRP14_HUMAN,,,,,,,593309.9,490824.3,...,722264.9,,,646607.56,780311.2,737640.94,1099972.0,1373296.5,1167776.4,1301979.0
1,AAAAAAALQAK2,RL4_HUMAN,,,,,,,1191644.5,1837376.2,...,1657824.6,2056592.2,2609519.2,2202425.5,1706271.4,1999661.6,2546828.5,1846868.4,2990392.2,2838274.8
2,AAAAADLANR2,CLPX_HUMAN,,,,,,,,,...,49234.21,,,,31686.605,38781.656,,,,
3,AAAAVQGGR2,SPCS2_HUMAN,,,,,,,,,...,,,,,,,,,,
4,AAAEELLAR2,SHIP2_HUMAN,,,,,,,,,...,33976.69,,,,,30296.02,,,,
5,AAAEEQIK2,IPO9_HUMAN,,,,,,,,,...,,,,,,,,72528.29,123060.98,
6,AAAEQAISVR2,EXOSX_HUMAN,,,,,,,,,...,,,,,,,,,,
7,AAAEVAGQFVIK2,TFR1_HUMAN,,,,,,,382105.1,453756.03,...,443569.03,561349.94,698817.56,549023.6,531625.3,570722.25,698484.2,686860.44,755311.44,680468.75
8,AAAEVNQDYGLDPK2,FUMH_HUMAN,,,,,,,,,...,412727.3,,,,,,757692.56,,,
9,AAAGEDYK2,SYWC_HUMAN,,,,,,,,,...,55201.57,,,,,119927.734,,97465.305,176991.44,213294.0


In [None]:
# data_obj["pep_abundance"].to_csv("data_obj/pep_abundance.tsv", sep="\t")

In [None]:
# data_obj["prot_abundance"].to_csv("data_obj/prot_abundance.tsv", sep="\t")

### Settings File

#### Description

The Settings File can be used to tell the program how to filter, group, and name the data.

The program expects the settings file to be tab-delimited. This means the file has a grid-like structure, with tabs separating the columns. To create or edit a settings file, open it in Excel, make the edits, then save the file as tab-delimited text (.txt). In Excel's "Save As" dialog, this option is listed as "Text (Tab delimited) (*.txt)".

The settings file has four required columns: "Conditions", "filter_in", "filter_out", and "Organism".

- Conditions:
  - This is the name of the group (condition) that the row defines. Each row of the settings file creates one group with this name.
  - Ex: "human a" or "bacteria b"

- filter_in:
  - This is a part of the file name (column) that you wish to include in the group. If you wish to add multiple snippets, separate them by commas.
  - Ex: "Hela_1cpw" or "HYEA_250pg"

- filter_out:
  - This is a part of the file name (column) that you wish to specifically exclude from the group. If you wish to add multiple snippets, separate them by commas.
  - Ex: "Hela_1cpw" or "HYEA_250pg"

- Organism:
  - This is the organism you wish to include in this group. It must be in ALL CAPS, as it appears in the Protein.Names column.
  - Ex: "HUMAN" or "ECOLI"
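
To make the filter_in and filter_out columns concrete, here is a minimal sketch of how such snippets could select data columns. It assumes plain substring matching, where a column is kept if it contains any filter_in snippet and none of the filter_out snippets. This is an assumption for illustration, not the program's confirmed logic. The helper names and the sample column names are hypothetical.

```python
def split_snippets(cell):
    """Split a comma-separated settings cell into a list of non-empty snippets."""
    return [part.strip() for part in cell.split(",") if part.strip()]


def select_columns(column_names, filter_in, filter_out=""):
    """Keep columns containing any filter_in snippet and no filter_out snippet.

    (Assumed matching rule; the real program may behave differently.)
    """
    include = split_snippets(filter_in)
    exclude = split_snippets(filter_out)
    selected = []
    for name in column_names:
        if not any(snippet in name for snippet in include):
            continue  # no filter_in snippet matched
        if any(snippet in name for snippet in exclude):
            continue  # a filter_out snippet matched
        selected.append(name)
    return selected


# Illustrative column (file) names, not taken from a real data set.
columns = ["Hela_1cpw_rep1", "Hela_1cpw_rep2", "Hela_1cpw_blank", "HYEA_250pg_rep1"]
print(select_columns(columns, "Hela_1cpw", "blank"))
```

Under this assumed rule, the call keeps the two `Hela_1cpw` replicate columns and drops the one containing "blank".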

After these columns, you may add other columns to further group the data together. For example, if you wish to group all yeast and human data, you can add a fifth column called "Combined", with the value "Yeast_Human" on each yeast row and each human row, and "Other" on every other row.
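
As an alternative to editing in Excel, a settings file with the four required columns plus an optional "Combined" column could be written from Python. The sketch below uses only the standard `csv` module. The helper `write_settings`, the row values, and the organism code "YEAST" are illustrative assumptions, not values taken from a real experiment.

```python
import csv

# Required columns first, then any optional grouping columns.
FIELDNAMES = ["Conditions", "filter_in", "filter_out", "Organism", "Combined"]

# Illustrative rows only; replace with your own conditions and snippets.
rows = [
    {"Conditions": "human a", "filter_in": "HYEA_250pg", "filter_out": "",
     "Organism": "HUMAN", "Combined": "Yeast_Human"},
    {"Conditions": "yeast a", "filter_in": "HYEA_250pg", "filter_out": "",
     "Organism": "YEAST", "Combined": "Yeast_Human"},
    {"Conditions": "bacteria b", "filter_in": "HYEA_250pg", "filter_out": "",
     "Organism": "ECOLI", "Combined": "Other"},
]


def write_settings(path, settings_rows):
    """Write the rows as a tab-delimited text file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES, delimiter="\t")
        writer.writeheader()
        writer.writerows(settings_rows)
```

For example, `write_settings("settings_example.txt", rows)` would produce a tab-delimited file (the file name is hypothetical) that can still be opened and edited in Excel.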