Compute TPA and protein concentration #10

Merged 4 commits on Apr 21, 2023
Changes from 3 commits
105 changes: 105 additions & 0 deletions bin/compute_tpa.py
@@ -0,0 +1,105 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import click
import pandas as pd
from pandas import DataFrame, Series
F401: 'pandas.DataFrame' imported but unused

3 similar findings in this PR: bin/compute_tpa.py, lines 6, 9, 12.

from pyopenms import *
F403: 'from pyopenms import *' used; unable to detect undefined names


from bin.compute_ibaq import print_help_msg, parse_uniprot_name
from ibaq.ibaqpy_commons import PROTEIN_NAME, NORM_INTENSITY, SAMPLE_ID, CONDITION
E501: line too long (82 > 79 characters)

14 similar findings in this PR. The 10 shown: bin/compute_tpa.py, lines 17, 18, 20, 21, 22, 24, 41, 61, 75, 83.


from ibaq.ibaqpy_commons import plot_distributions, plot_box_plot
import numpy as np
import os

@click.command()
E302: expected 2 blank lines, found 1


@click.option("-f", "--fasta", help="Protein database")
@click.option("-p", "--peptides", help="Peptide identifications with intensities following the peptide intensity output")
@click.option("-r", "--ruler", help="Whether to use proteomicRuler", default=True)
@click.option("-n", "--ploidy", help="ploidy number", default=2)
@click.option("-c", "--cpc", help="cellular protein concentration(g/L)", default=200)
@click.option("-o", "--output", help="Output file with the proteins and ibaq values")
def tpa_compute(fasta: str, peptides: str, ruler: bool, ploidy: int, cpc: float, output: str) -> None:
    """
    This command computes the protein copies and concentrations according to a file output of peptides with the
    format described in peptide_contaminants_file_generation.py.
    :param fasta: Fasta file used to perform the peptide identification
    :param peptides: Peptide intensity file.
    :param ruler: Whether to compute protein copies, weight and concentration.
    :param ploidy: ploidy number.
    :param cpc: cellular protein concentration(g/L).
    :param output: output format containing the ibaq values.
    :return:
    """
    if peptides is None or fasta is None:
        print_help_msg(ibaq_compute)
F405: 'ibaq_compute' may be undefined, or defined from star imports: pyopenms

2 similar findings in this PR: bin/compute_tpa.py, lines 49, 53.
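To illustrate why this finding matters: Python resolves a bare name only when the line that uses it executes, so an undefined name in a rarely taken branch can go unnoticed until that branch runs. The sketch below is a stand-in and not the PR's code; `check_inputs` and `missing_command` are hypothetical names, and it assumes no star import happens to supply the name.

```python
def check_inputs(peptides, fasta):
    """Mimic the guard in tpa_compute, using a deliberately undefined name."""
    if peptides is None or fasta is None:
        # `missing_command` is never defined, mirroring the flagged reference.
        # Python looks the name up only when this line runs.
        print(missing_command)
        return False
    return True


def branch_fails_with_name_error():
    """Return True if the error branch raises NameError when it is taken."""
    try:
        check_inputs(None, None)
    except NameError:
        return True
    return False
```

With both arguments supplied the guard is skipped and nothing fails, which is why a normal run would not surface the problem. If the intent was to print this command's own help, passing `tpa_compute` would be the likely fix.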

        exit(1)

    data = pd.read_csv(peptides, sep=",")
    print(data.head())

    res = pd.DataFrame(data.groupby([PROTEIN_NAME, SAMPLE_ID, CONDITION])[NORM_INTENSITY].sum())
    res = res.reset_index()
    proteins = res["ProteinName"].unique().tolist()
    proteins = sum([i.split(";") for i in proteins], [])

    # calculate molecular weight of quantified proteins
    mw_dict = dict()
    fasta_proteins = list() # type: list[FASTAEntry]
    FASTAFile().load(fasta, fasta_proteins)
    for entry in fasta_proteins:
        accession, name = entry.identifier.split("|")[1:]
        if name in proteins:
            mw = AASequence().fromString(entry.sequence).getMonoWeight()
            mw_dict[name] = mw

W293: blank line contains whitespace
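A side note on the molecular-weight loop above: `entry.identifier.split("|")[1:]` unpacks cleanly only when the identifier has exactly three `|`-separated fields, as in `sp|P02768|ALBU_HUMAN`. A minimal sketch of a more tolerant variant follows. The helper names are hypothetical, and `weight_of` is a placeholder for whatever computes the weight (the PR uses pyopenms for that step), so this is an illustration under those assumptions and not the PR's implementation.

```python
def parse_uniprot_identifier(identifier):
    """Split 'db|accession|entry_name' into (accession, entry_name).

    Returns None when the identifier does not have exactly three fields.
    """
    parts = identifier.split("|")
    if len(parts) != 3:
        return None
    _db, accession, entry_name = parts
    return accession, entry_name


def molecular_weights(entries, quantified, weight_of):
    """Map entry name -> weight for quantified proteins only.

    entries    : iterable of (identifier, sequence) pairs
    quantified : iterable of entry names seen in the peptide table
    weight_of  : callable taking a sequence and returning its weight
                 (placeholder for the real weight calculation)
    """
    wanted = set(quantified)  # set membership is O(1) per lookup
    weights = {}
    for identifier, sequence in entries:
        parsed = parse_uniprot_identifier(identifier)
        if parsed is None:
            continue  # skip headers that are not in UniProt style
        _accession, name = parsed
        if name in wanted:
            weights[name] = weight_of(sequence)
    return weights
```

Headers that do not follow the three-field layout are skipped here instead of raising `ValueError`; whether skipping is the right policy depends on the FASTA files the tool is expected to see.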


    # calculate TPA for every protein group
    def get_protein_group_mw(group: str) -> float:
        mw_list = [mw_dict[i] for i in group.split(";")]
        return sum(mw_list)

    res["MolecularWeight"] = res.apply(lambda x: get_protein_group_mw(x["ProteinName"]), axis=1)
    res["MolecularWeight"] = res["MolecularWeight"].fillna(1)
    res["MolecularWeight"] = res["MolecularWeight"].replace(0, 1)
    res["TPA"] = res["NormIntensity"] / res["MolecularWeight"]
    plot_distributions(res, "TPA", SAMPLE_ID, log2=True)
    plot_box_plot(res, "TPA", SAMPLE_ID, log2=True,
                    title="TPA Distribution", violin=False)
E127: continuation line over-indented for visual indent
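For readers following the TPA step above, here is a small pure-Python sketch of the same arithmetic: the weight of a protein group is the sum of its members' weights, and TPA is the summed intensity divided by that weight. Note that `mw_dict[i]` in the diff would raise `KeyError` for a group member absent from the FASTA file, so the `fillna(1)` fallback may never be reached. The sketch assumes the intent was to fall back to a weight of 1 in that case; the function names are hypothetical.

```python
def protein_group_mw(group, mw_by_protein):
    """Sum the weights of the ';'-separated members of a protein group.

    Returns None if any member has no known weight.
    """
    total = 0.0
    for name in group.split(";"):
        if name not in mw_by_protein:
            return None
        total += mw_by_protein[name]
    return total


def tpa(intensity, group, mw_by_protein):
    """Intensity divided by group weight, with a weight of 1 as fallback.

    The fallback covers both an unknown weight (None) and a weight of 0,
    matching the fillna(1) / replace(0, 1) pair in the diff.
    """
    mw = protein_group_mw(group, mw_by_protein)
    if not mw:
        mw = 1.0
    return intensity / mw
```

For example, with weights `{"A": 100.0, "B": 50.0}`, a group `"A;B"` with intensity 300.0 gives a TPA of 2.0, while a group containing an unknown member keeps its raw intensity.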


    # calculate protein weight(ng) and concentration(nM)
    if ruler:
        avogadro = 6.02214129e23
        average_base_pair_mass = 617.96 # 615.8771
E261: at least two spaces before inline comment

4 similar findings in this PR: bin/compute_tpa.py, lines 85, 86, 93, 94.

        organism = res.loc[0, "ProteinName"].split("_")[1].lower()
        histone_df = pd.read_json(open(os.path.split(__file__)[0] + os.sep + "histones.json", "rb")).T
        target_histones = histone_df[histone_df["name"] == organism.lower()]
        genome_size = target_histones["genome_size"].values[0]
        histones_list = target_histones["histone_entries"].values[0]

W293: blank line contains whitespace


        dna_mass = ploidy * genome_size * average_base_pair_mass / avogadro

        def calculate(protein_intensity, histone_intensity, mw):
            copy = (protein_intensity / histone_intensity) * dna_mass * avogadro / mw
            # The number of moles is equal to the number of particles divided by Avogadro's constant
            moles = copy * 1e9 / avogadro # unit nmol
            weight = moles * mw # unit ng
            return tuple([copy, moles, weight])

        def proteomicRuler(df):
            histone_intensity = df[df["ProteinName"].isin(histones_list)]["NormIntensity"].sum()
            histone_intensity = histone_intensity if histone_intensity > 0 else 1
            df[["Copy", "Moles[nmol]", "Weight[ng]"]] = df.apply(lambda x: calculate(x["NormIntensity"], histone_intensity, x["MolecularWeight"]), axis=1, result_type="expand")
            volume = df["Weight[ng]"].sum() * 1e-9 / cpc # unit L
            df["Concentration[nM]"] = df["Moles[nmol]"] / volume # unit nM
            return df

        res = res.groupby(["Condition"]).apply(proteomicRuler)

        plot_distributions(res, "Concentration[nM]", SAMPLE_ID, log2=True)
        plot_box_plot(res, "Concentration[nM]", SAMPLE_ID, log2=True,
                        title="Concentration[nM] Distribution", violin=False)
E127: continuation line over-indented for visual indent
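The unit bookkeeping in the proteomic ruler block above is easy to lose track of, so here is a pure-Python sketch of the same formulas without pandas. The constants are the values used in the diff; the function names are hypothetical, and any genome size passed in is the caller's assumption, since the contents of `histones.json` are not shown in this PR view.

```python
AVOGADRO = 6.02214129e23         # 1/mol, value used in the diff
AVERAGE_BASE_PAIR_MASS = 617.96  # g/mol per base pair, value used in the diff


def dna_mass_per_cell(ploidy, genome_size):
    """DNA mass per cell in grams (genome_size in base pairs)."""
    return ploidy * genome_size * AVERAGE_BASE_PAIR_MASS / AVOGADRO


def ruler(protein_intensity, histone_intensity, mw, dna_mass):
    """Return (copies, moles in nmol, weight in ng) for one protein.

    The histone signal is scaled to the DNA mass, so the ratio
    protein_intensity / histone_intensity converts to protein mass.
    """
    copies = (protein_intensity / histone_intensity) * dna_mass * AVOGADRO / mw
    moles_nmol = copies * 1e9 / AVOGADRO  # mol -> nmol
    weight_ng = moles_nmol * mw           # nmol * g/mol = ng
    return copies, moles_nmol, weight_ng


def concentrations_nm(moles_nmol, weights_ng, cpc):
    """Concentration in nM for each protein, given cpc in g/L."""
    volume_l = sum(weights_ng) * 1e-9 / cpc  # ng -> g, then g / (g/L) = L
    return [m / volume_l for m in moles_nmol]
```

Two sanity checks follow from the algebra: a protein whose intensity equals the total histone intensity gets a weight equal to the DNA mass (in ng), and for a sample containing a single protein the concentration reduces to `1e9 * cpc / mw` nM, independent of the genome size.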


    res.to_csv(output, index=False)

if __name__ == '__main__':
    tpa_compute()