Crosslinking Datasets Repository

This repository contains SDRF (Sample and Data Relationship Format) files for crosslinking proteomics datasets deposited in ProteomeXchange.

Purpose

The goal of this repository is to provide standardized metadata annotations for crosslinking proteomics datasets submitted to ProteomeXchange. Each dataset has its own SDRF file that describes the experimental design, samples, and data relationships following the SDRF-Proteomics format, which is derived from the MAGE-TAB SDRF.

Repository Structure

crosslinking-datasets/
├── datasets/
│   ├── PXD000001/
│   │   └── PXD000001.sdrf.tsv
│   ├── PXD000002/
│   │   └── PXD000002.sdrf.tsv
│   └── ...
├── templates/
│   └── sdrf-template.tsv
└── README.md
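The layout above can be checked mechanically. The sketch below is illustrative only and is not a script shipped with this repository: the function names are hypothetical, and it assumes accessions are "PXD" followed by six digits, as in the examples. It flags SDRF files under datasets/ that are not stored as datasets/<accession>/<accession>.sdrf.tsv.

```python
import re
from pathlib import Path, PurePosixPath

# Assumption: accessions are "PXD" followed by six digits, as in the examples above.
ACCESSION_RE = re.compile(r"PXD\d{6}")


def check_dataset_paths(paths):
    """Check repository-relative paths against datasets/<accession>/<accession>.sdrf.tsv.

    Returns a list of human-readable problem descriptions (empty if all paths conform).
    """
    problems = []
    for raw in paths:
        parts = PurePosixPath(raw).parts
        # Only SDRF files under datasets/ are subject to the layout rule.
        if not parts or parts[0] != "datasets" or not raw.endswith(".sdrf.tsv"):
            continue
        if len(parts) != 3:
            problems.append(f"{raw}: expected datasets/<accession>/<accession>.sdrf.tsv")
            continue
        _, folder, filename = parts
        if not ACCESSION_RE.fullmatch(folder):
            problems.append(f"{raw}: '{folder}' does not look like a PXD accession")
        if filename != f"{folder}.sdrf.tsv":
            problems.append(f"{raw}: file name should be {folder}.sdrf.tsv")
    return problems


def collect_sdrf_paths(root):
    """Collect every *.sdrf.tsv file under root as a POSIX-style relative path."""
    root = Path(root)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.sdrf.tsv"))
```

Run from the repository root, check_dataset_paths(collect_sdrf_paths(".")) returns an empty list when every SDRF file sits in its own accession directory. Keeping the path check separate from the filesystem walk makes it easy to reuse on, for example, the list of files changed in a pull request.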

SDRF Format

SDRF (Sample and Data Relationship Format) is a tab-delimited format that describes:

Sample characteristics and experimental variables
Protocols and protocol parameters
Raw and processed data files
Relationships between samples and data files
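Because SDRF is plain tab-delimited text, the sample-to-file relationship can be read with Python's standard csv module. The sketch below is a minimal illustration, not part of any official tooling: samples_to_files and the example rows are hypothetical, and it assumes the file uses the lowercase column names listed under Required Columns. It reads rows as lists rather than dictionaries because SDRF headers may repeat a column name (for example several comment[modification parameters] columns).

```python
import csv
import io
from collections import defaultdict


def samples_to_files(sdrf_text):
    """Map each 'source name' value to the raw files listed in 'comment[data file]'."""
    # QUOTE_NONE treats quote characters literally, which suits plain TSV.
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t", quoting=csv.QUOTE_NONE))
    header = [name.strip().lower() for name in rows[0]]
    source_col = header.index("source name")
    file_col = header.index("comment[data file]")
    mapping = defaultdict(list)
    for row in rows[1:]:
        if not row:
            continue  # skip blank lines
        mapping[row[source_col]].append(row[file_col])
    return dict(mapping)


# Hypothetical two-sample example; file names are placeholders.
EXAMPLE = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun01.raw\n"
    "sample 1\tHomo sapiens\trun02.raw\n"
    "sample 2\tHomo sapiens\trun03.raw\n"
)

print(samples_to_files(EXAMPLE))
# {'sample 1': ['run01.raw', 'run02.raw'], 'sample 2': ['run03.raw']}
```

Each row of an SDRF file links one sample to one data file, so a sample measured in several runs appears on several rows.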

Required Columns

Common SDRF columns for crosslinking experiments include:

source name: Biological source identifier
characteristics[organism]: Species/organism
characteristics[cell type]: Cell type (if applicable)
characteristics[disease]: Disease state (if applicable)
comment[data file]: Raw data file names
comment[fraction identifier]: Fraction information
comment[technical replicate]: Technical replicate number
comment[biological replicate]: Biological replicate number
comment[label]: Labeling information
comment[instrument]: Mass spectrometry instrument
comment[modification parameters]: PTMs and crosslinker information
comment[cleavage agent details]: Protease used
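A quick local check that a header line contains the columns listed above might look like the sketch below. It is a convenience under stated assumptions, not a replacement for the validator: EXPECTED_COLUMNS simply mirrors this README's list, and the rules enforced by sdrf-pipelines remain authoritative. Column names are compared case-insensitively, and a set is used because some columns (such as comment[modification parameters]) may legitimately appear more than once.

```python
# Columns listed in this section; the validator's own rules remain authoritative.
EXPECTED_COLUMNS = [
    "source name",
    "characteristics[organism]",
    "characteristics[cell type]",
    "characteristics[disease]",
    "comment[data file]",
    "comment[fraction identifier]",
    "comment[technical replicate]",
    "comment[biological replicate]",
    "comment[label]",
    "comment[instrument]",
    "comment[modification parameters]",
    "comment[cleavage agent details]",
]


def missing_columns(header_line, expected=EXPECTED_COLUMNS):
    """Return the expected column names absent from a tab-separated SDRF header line."""
    # Normalize names and collapse repeated columns into a set.
    present = {name.strip().lower() for name in header_line.rstrip("\r\n").split("\t")}
    return [name for name in expected if name not in present]
```

Passing the first line of an SDRF file to missing_columns returns the listed columns that are absent, in the order they appear above.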

Adding a New Dataset

Create a new directory under datasets/ with the ProteomeXchange accession ID (e.g., PXD012345)
Create an SDRF file named <accession>.sdrf.tsv in that directory
Follow the SDRF format guidelines, using templates/sdrf-template.tsv as a reference
Ensure all required columns are present and properly formatted
Validate the SDRF file locally, for example with sdrf-pipelines
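The first two steps can be scripted. The sketch below is hypothetical (there is no such helper in the repository): it assumes it is run against a checkout containing templates/sdrf-template.tsv and that accessions are "PXD" followed by six digits. It refuses to overwrite an existing SDRF file.

```python
import re
import shutil
from pathlib import Path

# Assumption: accessions are "PXD" followed by six digits (e.g. PXD012345).
ACCESSION_RE = re.compile(r"PXD\d{6}")


def sdrf_path_for(accession):
    """Return datasets/<accession>/<accession>.sdrf.tsv, relative to the repository root."""
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"not a ProteomeXchange accession: {accession!r}")
    return Path("datasets") / accession / f"{accession}.sdrf.tsv"


def scaffold_dataset(repo_root, accession):
    """Copy templates/sdrf-template.tsv to the new dataset's SDRF path."""
    root = Path(repo_root)
    target = root / sdrf_path_for(accession)
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(root / "templates" / "sdrf-template.tsv", target)
    return target
```

For example, scaffold_dataset(".", "PXD012345") would create datasets/PXD012345/PXD012345.sdrf.tsv as a copy of the template, which you then fill in and validate.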

SDRF Validation

SDRF files should be validated before submission. You can use tools like:

sdrf-pipelines for validation
ProteomeXchange submission validation tools

Automated validation: every pull request that modifies *.sdrf.tsv files is validated with the sdrf-pipelines tool before it can be merged into the main branch. The validation checks:

File format and structure
Required columns presence
Ontology term correctness
Data consistency
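Before opening a pull request, a lightweight pre-check can catch simple structural mistakes. The sketch below is illustrative and covers only a small part of what sdrf-pipelines checks (it does nothing about ontology terms): basic_consistency_problems is a hypothetical helper that reports rows whose field count differs from the header and cells left empty. Flagging empty cells assumes the SDRF-Proteomics convention of writing "not available" or "not applicable" instead of leaving a cell blank.

```python
import csv
import io


def basic_consistency_problems(sdrf_text):
    """Report rows whose field count differs from the header, and empty cells."""
    rows = list(csv.reader(io.StringIO(sdrf_text), delimiter="\t", quoting=csv.QUOTE_NONE))
    if not rows or not rows[0]:
        return ["missing header"]
    width = len(rows[0])
    problems = []
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # blank line
        if len(row) != width:
            problems.append(f"line {line_no}: {len(row)} fields, header has {width}")
        for col_no, cell in enumerate(row, start=1):
            if not cell.strip():
                problems.append(f"line {line_no}, column {col_no}: empty cell")
    return problems
```

An empty result only means the file passed these basic checks; the automated sdrf-pipelines validation is still required.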

Contributing

To contribute a new SDRF file:

Fork this repository
Add your SDRF file following the structure above
Submit a pull request with a description of the dataset
Ensure your SDRF file passes automated validation

Resources

Contact

For questions or issues, please open an issue in this repository.
