Skip to content

StructuralGenomicsConsortium/EASMS-data-processing

Repository files navigation

ASMS Data Curation Pipeline

This repository contains a Python-based data curation pipeline for processing Affinity Selection Mass Spectrometry (ASMS) datasets. The pipeline prepares data for machine learning by performing cleaning, labeling, and fingerprint extraction.

Main Features

  • Splits protein-specific data into separate files
  • Detects and filters out anomalous entries
  • Handles isomer corrections
  • Adds negative samples from a master list
  • Generates binary labels for machine learning
  • Extracts chemical fingerprints (e.g., ECFP4, FCFP6, MACCS)
  • Saves curated data in both CSV and Parquet formats

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages