# A basic guide to crema as a package

The purpose of this vignette is to demonstrate the basics of using crema as a Python package: how to feed crema some data and obtain the confidence estimates we're looking for.

Note that the following calculations are performed in a [Jupyter notebook](https://jupyter.org/) to make the walkthrough easy to follow.

## Following along locally

To run this notebook, you'll need to have [crema](https://crema-ms.readthedocs.io/en/latest/) installed. You'll also need a file containing a set of peptide-spectrum matches (PSMs). The data we'll be using comes from a single-cell proteomics experiment described in this [paper](https://www.biorxiv.org/content/10.1101/665307v4):

> Specht, Harrison, et al. "Single-cell proteomic and transcriptomic analysis of macrophage heterogeneity." bioRxiv. 01 Jan 2020, doi: https://doi.org/10.1101/665307

The files used in this example are slightly modified. You can download them from the crema repository ([example_psms_target.txt](https://raw.githubusercontent.com/Noble-Lab/crema/master/data/example_psms_target.txt), [example_psms_decoy.txt](https://raw.githubusercontent.com/Noble-Lab/crema/master/data/example_psms_decoy.txt)) and then set the paths to your input files:

```python
input_files = ["../../../data/example_psms_target.txt", "../../../data/example_psms_decoy.txt"]
```

## Step 1: Set up our Python environment

The first thing we need to do is import the Python packages we'll need; crema is super lightweight, so this is easy! We're also going to create an output directory to save our results in later.

```python
import os
import crema

# Create an output directory
out_dir = "example_crema_output_dir"
os.makedirs(out_dir, exist_ok=True)
```

## Step 2: Read the PSMs

Now we need to give crema the input file (or files) to read PSMs from. The [read_file()](https://crema-ms.readthedocs.io/en/latest/api/functions.html#crema.read_file) function returns a [PsmDataset](https://crema-ms.readthedocs.io/en/latest/api/dataset.html#crema.dataset.PsmDataset) object, which stores the columns from the input files that are needed to calculate confidence estimates in a pandas DataFrame.

```python
psms = crema.read_file(input_files, spectrum_col='scan', score_col='combined p-value', target_col='target/decoy')
```

Note that the [read_file()](https://crema-ms.readthedocs.io/en/latest/api/functions.html#crema.read_file) function requires only one parameter: the input files. The remaining parameters need to be specified only when the column names in the input files differ from the default names that [read_file()](https://crema-ms.readthedocs.io/en/latest/api/functions.html#crema.read_file) looks for. In this example we pass them for clarity only; our input files already use the default column names, so these parameters could be omitted.
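
To illustrate what those defaults mean, here is a minimal standard-library sketch of a reader that keeps only the three columns needed for confidence estimation. It is not crema's implementation: the function name `read_psms`, the assumption that the input files are tab-delimited, and the assumption that the target/decoy column holds the literal strings `True`/`False` are all hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

Value = Union[int, float, bool]


def read_psms(
    handle: TextIO,
    spectrum_col: str = "scan",
    score_col: str = "combined p-value",
    target_col: str = "target/decoy",
) -> List[Dict[str, Value]]:
    """Hypothetical reader: keep only the spectrum, score and target/decoy columns.

    Assumes tab-delimited input whose target/decoy column contains "True" or "False".
    """
    rows: List[Dict[str, Value]] = []
    for record in csv.DictReader(handle, delimiter="\t"):
        rows.append(
            {
                spectrum_col: int(record[spectrum_col]),
                score_col: float(record[score_col]),
                # Anything other than the literal string "True" is treated as a decoy.
                target_col: record[target_col].strip() == "True",
            }
        )
    return rows


# Usage with made-up rows: the extra "peptide" column is ignored, and the
# header already matches the default column names, so no overrides are needed.
example = io.StringIO(
    "scan\tcombined p-value\ttarget/decoy\tpeptide\n"
    "101\t1.7e-05\tTrue\tPEPTIDEK\n"
    "102\t0.018082\tFalse\tKEDITPEP\n"
)
psm_rows = read_psms(example)
```

Passing the column-name arguments explicitly, as in the `read_file()` call above, only matters when a file's header differs from the defaults.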

Here are the first few rows of the data extracted from `input_files` and stored in our [PsmDataset](https://crema-ms.readthedocs.io/en/latest/api/dataset.html#crema.dataset.PsmDataset) object:

```python
psms.data.head()
```

|   | scan | combined p-value | target/decoy |
|---|---|---|---|
| 0 | 11510 | 1.7e-05 | True |
| 1 | 17317 | 0.018082 | True |
| 2 | 11896 | 0.003307 | True |
| 3 | 7676 | 0.008335 | True |
| 4 | 9993 | 0.002828 | True |


## Step 3: Calculate confidence estimates

Once our [PsmDataset](https://crema-ms.readthedocs.io/en/latest/api/dataset.html#crema.dataset.PsmDataset) object has been created, we can calculate confidence estimates using one of crema's confidence estimation methods. These methods return a [Result](https://crema-ms.readthedocs.io/en/latest/api/result.html#crema.result.Result) object, which stores the data from the [PsmDataset](https://crema-ms.readthedocs.io/en/latest/api/dataset.html#crema.dataset.PsmDataset) object, along with the calculated confidence estimates, in a pandas DataFrame.

For this vignette, we'll use the [calculate_tdc()](https://crema-ms.readthedocs.io/en/latest/api/functions.html#crema.calculate_tdc) function, crema's most basic confidence estimation method, which is based on target-decoy competition (TDC).
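
As a rough illustration of the idea, the sketch below implements a simplified target-decoy competition in plain Python: for each spectrum only the best-scoring PSM (target or decoy) is kept, the winners are ranked from best to worst score, and the FDR at each rank is estimated as (decoys + 1) / targets. This is a hypothetical sketch, not crema's code. The +1 correction is an assumption, chosen because it is consistent with the FDR values of 1.0, 0.5, 0.333333, ... reported for the top-ranked target PSMs in the results table later in this vignette. The names `compete` and `tdc_fdr` are illustrative only.

```python
from typing import Dict, List, Tuple

# A PSM as (scan, p_value, is_target). Lower p-values are better, matching the
# "combined p-value" score column used in this vignette.
Psm = Tuple[int, float, bool]


def compete(psms: List[Psm]) -> List[Psm]:
    """Keep only the best-scoring PSM for each spectrum (target-decoy competition)."""
    best: Dict[int, Psm] = {}
    for psm in psms:
        scan, p_value, _ = psm
        if scan not in best or p_value < best[scan][1]:
            best[scan] = psm
    return list(best.values())


def tdc_fdr(psms: List[Psm]) -> List[Tuple[Psm, float]]:
    """Rank the winning PSMs by score and estimate FDR as (decoys + 1) / targets."""
    ranked = sorted(compete(psms), key=lambda psm: psm[1])
    targets = decoys = 0
    estimates: List[Tuple[Psm, float]] = []
    for psm in ranked:
        if psm[2]:
            targets += 1
        else:
            decoys += 1
        # max(..., 1) avoids dividing by zero if the ranked list starts with decoys.
        estimates.append((psm, (decoys + 1) / max(targets, 1)))
    return estimates


# Usage with made-up PSMs: spectrum 1 has both a target and a decoy match,
# and the better-scoring target wins the competition.
toy_psms = [
    (1, 1e-5, True),
    (1, 1e-3, False),
    (2, 1e-4, True),
    (3, 1e-2, False),
    (4, 5e-2, True),
]
for (scan, p_value, is_target), fdr in tdc_fdr(toy_psms):
    print(scan, p_value, is_target, round(fdr, 3))
```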

```python
results = crema.calculate_tdc(psms)
```

Keep in mind that crema has other confidence estimation methods that can be used in a similar fashion; they are listed in the [crema functions documentation](https://crema-ms.readthedocs.io/en/latest/api/functions.html).

This is what our [Result](https://crema-ms.readthedocs.io/en/latest/api/result.html#crema.result.Result) object looks like after running the confidence estimation method:

```python
results.data.head()
```

|   | scan | combined p-value | target/decoy | FDR | Q_Value |
|---|---|---|---|---|---|
| 0 | 15869 | 2.9e-31 | True | 1.0 | 0.000303 |
| 1 | 11368 | 9.73e-29 | True | 0.5 | 0.000303 |
| 2 | 11505 | 1.64e-28 | True | 0.333333 | 0.000303 |
| 3 | 15515 | 3.21e-26 | True | 0.25 | 0.000303 |
| 4 | 15987 | 5.83e-26 | True | 0.2 | 0.000303 |
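
Because a TDC-style FDR estimate can rise and fall as you move down the ranked list, it is usually converted into a q-value. A common definition, and the one assumed in this sketch, is that the q-value at a given rank is the smallest FDR estimate at that rank or any lower-confidence rank. Under that definition, the top PSMs above can have FDR values of 1.0, 0.5 and 0.333333 while sharing a Q_Value of 0.000303, because a much smaller FDR estimate is reached further down the list. The helper names below are hypothetical, and the thresholds are only examples.

```python
from typing import List


def fdr_to_qvalues(fdrs: List[float]) -> List[float]:
    """Convert FDR estimates, ordered from best- to worst-ranked PSM, into q-values.

    Each q-value is the minimum FDR at that rank or any later (lower-confidence)
    rank, computed as a running minimum from the bottom of the list.
    """
    qvalues = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        qvalues[i] = running_min
    return qvalues


def count_accepted(qvalues: List[float], threshold: float = 0.01) -> int:
    """Count PSMs whose q-value is at or below the chosen threshold."""
    return sum(1 for q in qvalues if q <= threshold)


# Usage with made-up FDR estimates, ordered from best to worst score.
toy_fdrs = [1.0, 0.5, 1.0, 0.4, 0.8]
toy_qvalues = fdr_to_qvalues(toy_fdrs)
accepted = count_accepted(toy_qvalues, threshold=0.5)
```

Taking the running minimum from the bottom is what makes q-values non-decreasing with rank, so a single threshold (1% is a common choice) selects a contiguous set of top-ranked PSMs.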


## Step 4: Save and export the results

Great, we're nearly done! All that is left to do is call the Result object's [write_file()](https://crema-ms.readthedocs.io/en/latest/api/result.html#crema.result.Result.write_csv) method, which exports our results to the specified output directory and returns the path of the file it wrote.

```python
result_files = results.write_file(output_dir=out_dir)
result_files
```

```
'example_crema_output_dir\\crema.psm_results.txt'
```
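
The path returned above uses a Windows separator, shown as `\\` because the notebook displays the string in its escaped form; on macOS or Linux the path would presumably use a forward slash instead. For readers curious about what such an export involves, here is a standard-library sketch that writes rows as tab-delimited text and builds the path with `os.path.join`, which picks the platform's separator. The function `write_results`, its default file name, and the tab-delimited format are assumptions for illustration, not crema's documented behavior.

```python
import csv
import os
from typing import List, Sequence


def write_results(
    rows: List[Sequence[object]],
    header: Sequence[str],
    output_dir: str,
    file_name: str = "crema.psm_results.txt",
) -> str:
    """Hypothetical writer: save rows as tab-delimited text and return the file's path."""
    os.makedirs(output_dir, exist_ok=True)
    # os.path.join inserts the platform-specific separator ("\" on Windows, "/" elsewhere).
    path = os.path.join(output_dir, file_name)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
    return path
```

Calling `write_results(rows, header, out_dir)` would return a path of the same shape as the one printed above.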

## Wrapping Up

Congrats! You are now capable of using crema as a Python package! If you'd like to take your crema skills to the next level, check out some of the other vignettes. For more details about any of the crema functions and classes that we used, see the [crema Python API documentation](https://crema-ms.readthedocs.io/en/latest/api/index.html).