# PUPPI Tutorial

This Jupyter notebook demonstrates how to use the `puppi` Python package to process a BioID/AP-MS dataset. We will:

1. Load an example intensity file
2. Run feature engineering
3. Train a positive-unlabeled (PU) learning model and estimate the false discovery rate (FDR)
4. Save the output

In [None]:
import pandas as pd
from puppi.features import feature_engineering
from puppi.train import train_and_score

## Load Example Data

In [None]:
input_df = pd.read_csv("input_intensity_dataset.tsv", sep='\t')
input_df.head(10)

## Run Feature Engineering

You must specify `controls`, a list of substrings that identify the control samples.

In [None]:
features_df = feature_engineering(input_df, controls=["EGFP", "Empty", "NminiTurbo"])
features_df.head()
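
To make the idea of substring-based control selection concrete, here is a minimal sketch of how a list of substrings could pick out control samples. It is not taken from the `puppi` source: the helper `find_control_samples` and the sample names are hypothetical, and the package may match controls differently (for example, case-insensitively or against a specific column).

```python
def find_control_samples(sample_names, controls):
    """Return the sample names that contain any of the control substrings.

    Hypothetical helper for illustration; not part of the puppi API.
    Matching here is case-sensitive and uses plain substring containment.
    """
    return [name for name in sample_names if any(tag in name for tag in controls)]


# Made-up sample names, for illustration only
example_samples = ["EGFP_rep1", "EGFP_rep2", "BaitA_rep1", "Empty_rep1"]
matched = find_control_samples(example_samples, ["EGFP", "Empty", "NminiTurbo"])
print(matched)
```

Running a check like this on your own sample names before calling `feature_engineering` can reveal a substring that matches nothing (as `"NminiTurbo"` does in this made-up list) or one that accidentally matches a bait sample.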

## Train PU-learning Model and Estimate FDR

In [None]:
final_df = train_and_score(features_df, initial_positives=20, initial_negatives=200)
final_df.head()

## Save Output

In [None]:
final_df.to_csv("puppi_output.csv", index=False)