# PUPPI Tutorial

This Jupyter notebook demonstrates how to use the `puppi` Python package to process a BioID/AP-MS dataset. We will:

1. Load an example intensity file
2. Run feature engineering
3. Train a PU-learning model and estimate FDR
4. Save the output

In [None]:
import pandas as pd
from puppi.feature_engineering import run_feature_engineering
from puppi.training_and_fdr import run_training_and_fdr

## Load Example Data

In [None]:
input_df = pd.read_csv("tutorial/example_input.csv")
input_df.head()

## Run Feature Engineering

You must specify control keywords: substrings that identify the control-sample columns (here, `EGFP` and `Empty`).

In [None]:
features_df = run_feature_engineering(input_df, control_keywords=["EGFP", "Empty"])
features_df.head()
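
To make the idea of control keywords concrete, here is a minimal sketch of substring matching on column names. It is not `puppi`'s implementation: the helper `find_control_columns` and the example column names are hypothetical, and the sketch assumes case-sensitive matching, which may differ from what the package does internally.

```python
def find_control_columns(columns, control_keywords):
    """Return the column names that contain any control keyword as a substring."""
    return [
        col for col in columns
        if any(keyword in col for keyword in control_keywords)
    ]

# Hypothetical column names, for illustration only
example_columns = ["Protein", "BAIT1_rep1", "BAIT1_rep2", "EGFP_rep1", "Empty_rep1"]
control_columns = find_control_columns(example_columns, ["EGFP", "Empty"])
print(control_columns)  # ['EGFP_rep1', 'Empty_rep1']
```

On your own data, calling `find_control_columns(input_df.columns, ["EGFP", "Empty"])` before feature engineering is a quick way to check that your keywords select the columns you expect.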

## Train PU-learning Model and Estimate FDR

In [None]:
final_df = run_training_and_fdr(features_df, initial_positives=10, initial_negatives=200)
final_df.head()

## Save Output

In [None]:
final_df.to_csv("tutorial/example_output.csv", index=False)
print("Done! Output saved to tutorial/example_output.csv")

## Notes

- You can replace `example_input.csv` with your own file.
- The input must have a "Protein" column and replicate intensity columns named like `BAIT1_rep1`, `BAIT1_rep2`, etc.
- Control column names should contain an identifying substring such as "EGFP" or "Empty", which you pass as `control_keywords`.
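
The requirements above can be checked before running the pipeline. The sketch below is one possible pre-flight check, not part of `puppi`: the helper `check_input_columns` is hypothetical, and the `NAME_repN` pattern is an assumption inferred from the `BAIT1_rep1` naming example. It also assumes every column other than "Protein" is an intensity column.

```python
import re

# Assumed replicate naming pattern: "<name>_rep<number>", e.g. "BAIT1_rep1"
REPLICATE_PATTERN = re.compile(r"^.+_rep\d+$")

def check_input_columns(columns, control_keywords):
    """Return a list of human-readable problems found in the column layout."""
    columns = list(columns)
    problems = []

    if "Protein" not in columns:
        problems.append('missing "Protein" column')

    # Treat every column other than "Protein" as an intensity column (assumption)
    intensity_columns = [c for c in columns if c != "Protein"]

    unmatched = [c for c in intensity_columns if not REPLICATE_PATTERN.match(c)]
    if unmatched:
        problems.append(f"columns not named like NAME_repN: {unmatched}")

    has_control = any(k in c for c in intensity_columns for k in control_keywords)
    if not has_control:
        problems.append(f"no column contains any of {control_keywords}")

    return problems

# Hypothetical column names, for illustration only
good = check_input_columns(
    ["Protein", "BAIT1_rep1", "BAIT1_rep2", "EGFP_rep1"], ["EGFP", "Empty"]
)
bad = check_input_columns(["BAIT1_rep1", "BAIT1_second"], ["EGFP", "Empty"])
print(good)
print(bad)
```

An empty list means the layout matches these assumptions. With a real file, pass `input_df.columns` as the first argument.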