# Figure 1: Raw data visualization

## Library imports

If you are running this Jupyter notebook in **Binder** (the lightest version, which ships with the saved data needed to reproduce the plots), we suggest that you uncomment and run the cell below to install all Python packages required by this notebook.

In [66]:
## Install all packages directly in the notebook
!pip install numpy pandas plotly alphatims holoviews psutil datashader pyarrow

If you would like to read the raw data on your local machine, please uncomment the cell below and install the additional packages required to read the Thermo data.

> #### Installing pythonnet on Windows
> Install pythonnet directly with pip.

> #### Installing pythonnet on Ubuntu (Linux)
> 1. sudo apt-get install build-essential
> 2. Install Mono from the Mono project website: [install mono on Linux](https://www.mono-project.com/download/stable/#download-lin)
> 3. pip install pythonnet

> #### Installing pythonnet on MacOS
> 1. brew install pkg-config
> 2. Install Mono from the Mono project website.
> 3. Run "export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig:/usr/lib/pkgconfig:/Library/Frameworks/Mono.framework/Versions/6.12.0/lib/pkgconfig:$PKG_CONFIG_PATH", `or` add this PKG_CONFIG_PATH line to ~/.bash_profile and run "source ~/.bash_profile". Here 6.12.0 is the installed Mono version; adjust it to yours.
> 4. pip install pythonnet

In [None]:
# only on Windows
# !pip install numba tqdm pythonnet
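
After installing, it can be useful to confirm that the optional packages are visible to the current Python environment before running the raw-file cells. The sketch below uses only the standard library; the helper name `is_installed` is hypothetical, and `clr` is the module name under which pythonnet is imported.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# pythonnet is imported under the module name `clr`
for name in ("clr", "numba", "tqdm"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing, rerun the corresponding install step above in the same environment that runs the notebook kernel.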

In [20]:
# import all necessary libraries
import pandas as pd
import numpy as np

import holoviews as hv
from holoviews import opts
from holoviews.operation.datashader import dynspread, rasterize, shade
import plotly.graph_objects as go

import utils
import alphatims.bruker

hv.extension('plotly')

## Figure 1 A: TIC and BPI

For this figure we used the Thermo raw file from the [PXD012867 PRIDE project](https://www.ebi.ac.uk/pride/archive/projects/PXD012867). To reproduce this step on your machine, please download the specified file from the PRIDE repository and adjust the path below to its location on your computer.

In [2]:
# # specify a path to the Thermo .raw file
# sample_path = '.../Data/PXD012867_yeast_project/20190124_QX3_JuSc_SA_JS7_1_wt_4h_1.raw'

# # upload the thermo raw file
# data = load_thermo_raw(sample_path)

# # save ms1 separately 
# df_ms1 = pd.DataFrame({'scan': data['scan_list_ms1'], 
#                     'RT': data['rt_list_ms1'],
#                    'intensity': data['int_list_ms1'],
#                     'order': 'ms1'})

# # calculate summed and max intensity per each scan
# df_ms1['summed_intensity'] = df_ms1.intensity.apply(lambda x: sum(x))
# df_ms1['max_intensity'] = df_ms1.intensity.apply(lambda x: max(x))
# df_ms1.head()
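
The per-scan aggregation in the commented cell above can be illustrated on toy data without the raw file. This is a minimal sketch: the arrays are hypothetical, the TIC value of a scan is the sum of its peak intensities, and the BPI value is the intensity of its most intense peak.

```python
import numpy as np

# hypothetical toy data: three MS1 scans with their retention times (min)
# and the intensities of all peaks recorded in each scan
rt = np.array([0.1, 0.2, 0.3])
intensities_per_scan = [
    np.array([10, 50, 5]),
    np.array([200, 20]),
    np.array([30, 30, 30, 10]),
]

# TIC: sum of all peak intensities in a scan
tic = np.array([scan.sum() for scan in intensities_per_scan])
# BPI: intensity of the most intense peak in a scan
bpi = np.array([scan.max() for scan in intensities_per_scan])

for t, total, base in zip(rt, tic, bpi):
    print(f"RT={t:.1f} min  TIC={total}  BPI={base}")
```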

> To simplify this step and make it reproducible on Binder, load the dataframe `thermo_chrom_ms1`, in which the MS1 scans and their summed and maximum intensities across the retention time are already saved in a parquet file for you.

In [21]:
thermo_chrom_ms1 = pd.read_parquet('Data/20190124_QX3_JuSc_SA_JS7_1_wt_4h_1_chrom', engine='pyarrow')
thermo_chrom_ms1.head()

Unnamed: 0,scan,RT,order,summed_intensity,max_intensity
0,1,0.00204,ms1,739489458,57035932
1,17,0.015798,ms1,2000855366,124119728
2,33,0.02804,ms1,1815002750,101905192
3,49,0.040419,ms1,2054335850,100841456
4,65,0.053017,ms1,1774525781,104155856


In [22]:
def plot_tic(
    df: pd.DataFrame, 
    title: str, 
    width: int = 900,
    height: int = 500
):
    """Create a total ion chromatogram (TIC) and Base Peak chromatogram (BPI) for the MS1 data.

    Parameters
    ----------
    df : pandas Dataframe
        A table with the extracted MS1 data.
    title : str
        The title of the plot.
    width : int
        The width of the plot.
        Default is 900.
    height : int
        The height of the plot.
        Default is 500.

    Returns
    -------
    a Plotly line plot
        The line plot containing TIC and BPI for MS1 data of the provided dataset.
    """
    fig = go.Figure()
    
    total_ion_col = ['RT', 'summed_intensity']
    base_peak_col = ['RT', 'max_intensity']
    
    for chrom_type in ['TIC MS1', 'BPI MS1']:
        if chrom_type == 'TIC MS1':
            data = df[total_ion_col]
        elif chrom_type == 'BPI MS1':
            data = df[base_peak_col]
        fig.add_trace(
            go.Scatter(
                x=data.iloc[:, 0],
                y=data.iloc[:, 1],
                name=chrom_type,
                hovertemplate='<b>RT:</b> %{x};<br><b>Intensity:</b> %{y}.',
            )
        )
    
    fig.update_layout(
        title=dict(
            text=title,
            font=dict(
                size=16,
            ),
            x=0.5,
            xanchor='center',
            yanchor='top'
        ),
        xaxis=dict(
            title='RT, min',
            titlefont_size=14,
            tickmode = 'auto',
            tickfont_size=14,
        ),
        yaxis=dict(
            title='Intensity',
        ),
        legend=dict(
            orientation="h",
            yanchor="bottom",
            y=1.02,
            xanchor="right",
            x=1
        ),
        legend_title_text='Select:',
        hovermode="x",
        template="plotly_white",
        width=width,
        height=height
    )

    fig.update_xaxes(range=[0, df.RT.max()])
    return fig
    

In [23]:
plot_tic(thermo_chrom_ms1, 'Chromatogram').show(config=utils.config)

## Figure 1 B: MS1 map

For this figure we used the same file as for Figure 1 A (the `20190124_QX3_JuSc_SA_JS7_1_wt_4h_1.raw` file from the [PXD012867 PRIDE project](https://www.ebi.ac.uk/pride/archive/projects/PXD012867)).

In [112]:
# # specify a path to the Thermo .raw file
# sample_path = '.../Data/PXD012867_yeast_project/20190124_QX3_JuSc_SA_JS7_1_wt_4h_1.raw'

# # upload the thermo raw file
# data = load_thermo_raw(sample_path)

# df = pd.DataFrame({
#     'scan': data['scan_list_ms1'], 
#     'RT': data['rt_list_ms1'], 
#     'mz': data['mass_list_ms1']
# })
# lst_col = 'mz'
# ms1 = pd.DataFrame({col:np.repeat(df[col].values, df[lst_col].str.len()) for col in df.columns.drop(lst_col)})
# ms1['mz'] = np.concatenate(data['mass_list_ms1'])
# ms1['intensity'] = np.concatenate(data['int_list_ms1'])
# ms1.head()
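
The commented cell above turns one row per scan (with arrays of m/z and intensity values) into one row per peak. The same reshaping can be sketched with NumPy alone on hypothetical toy data: each scan's retention time is repeated once per peak, and the per-scan arrays are concatenated.

```python
import numpy as np

# hypothetical toy data: two scans with 2 and 3 peaks
rt_per_scan = np.array([0.5, 0.6])
mz_per_scan = [np.array([300.1, 450.2]), np.array([300.1, 512.3, 700.4])]
int_per_scan = [np.array([10, 20]), np.array([30, 40, 50])]

# number of peaks in every scan
peaks_per_scan = [len(mz) for mz in mz_per_scan]

# repeat each scan's RT once per peak and flatten the per-scan arrays
rt_long = np.repeat(rt_per_scan, peaks_per_scan)
mz_long = np.concatenate(mz_per_scan)
int_long = np.concatenate(int_per_scan)

for row in zip(rt_long, mz_long, int_long):
    print(row)
```

All three resulting arrays have one entry per peak, so they can be used directly as dataframe columns.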

> To simplify this step and make it reproducible on Binder, load the dataframe `thermo_ms1`, in which the MS1 precursor information (m/z and intensity) is already saved in parquet files for you.

In [108]:
thermo_ms1 = pd.read_parquet(
    ['Data/20190124_QX3_JuSc_SA_JS7_1_wt_4h_1_ms1_part1',
     'Data/20190124_QX3_JuSc_SA_JS7_1_wt_4h_1_ms1_part2'], 
    engine='pyarrow'
)
thermo_ms1.head()

Unnamed: 0,RT,mz,intensity
0,0.00204,300.143433,60268
1,0.00204,300.160583,64932
2,0.00204,300.179596,51113
3,0.00204,300.476562,13740
4,0.00204,300.595093,24882


In [113]:
# this function is taken from the AlphaViz package (https://github.com/MannLabs/alphaviz) and modified
def plot_heatmap_ms1(
    df: pd.DataFrame,
    x_axis_label: str = "m/z, Th",
    y_axis_label: str = "RT, min",
    z_axis_label: str = "Intensity",
    title: str = "",
    width: int = 700,
    height: int = 400,
    background_color: str = "black",
    colormap: str = "fire",
):
    """Create a heatmap of m/z versus retention time with color coding for signal intensity.

    Parameters
    ----------
    df : pandas Dataframe
        A table with the extracted MS1 data (columns "RT", "mz" and "intensity").
    x_axis_label : str
        An x-axis label.
        Default is "m/z, Th".
    y_axis_label : str
        A y-axis label.
        Default is "RT, min".
    z_axis_label : str
        A z-axis label used for the coloring.
        Default is "Intensity".
    title: str
        The title of the plot.
         Default is "".
    width : int
        The width of the plot.
        Default is 700.
    height : int
        The height of the plot.
        Default is 400.
    background_color : str
        The background color of the plot.
        Default is "black".
    colormap : str
        The name of the colormap in Plotly.
        Default is "fire".

    Returns
    -------
    a HoloViews plot
        The rasterized heatmap of the provided MS1 data, rendered with the active HoloViews backend (Plotly in this notebook).
    """
    labels = {
        'm/z, Th': "mz",
        'RT, min': "RT",
        'Inversed IM, V·s·cm\u207B\u00B2': "mobility_values",
        'Intensity': "intensity",
    }
    x_dimension = labels[x_axis_label]
    y_dimension = labels[y_axis_label]
    z_dimension = labels[z_axis_label]

    def hook(plot, element):
        plot.handles['layout']['xaxis']['gridcolor'] = background_color
        plot.handles['layout']['yaxis']['gridcolor'] = background_color

    opts_ms1=dict(
        width=width,
        height=height,
        title=title,
        xlabel=x_axis_label,
        ylabel=y_axis_label,
        bgcolor=background_color,
        hooks=[hook],
    )
    dmap = hv.DynamicMap(
        hv.Points(
            df,
            [x_dimension, y_dimension],
            z_dimension
        )
    )
    agg = rasterize(
        dmap,
        width=width,
        height=height,
        aggregator='sum'
    )
    fig = dynspread(
        shade(
            agg,
            cmap=colormap
        )
    ).opts(plot=opts_ms1)

    return fig

In [114]:
plot_heatmap_ms1(thermo_ms1, title='MS1 retention time map')
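
Conceptually, `rasterize` with `aggregator='sum'` bins the points onto a pixel grid and sums the intensities that fall into each pixel. The sketch below is a rough NumPy analogue on hypothetical toy data, not Datashader's actual implementation; the 2×2 grid and the axis ranges are chosen only for illustration.

```python
import numpy as np

# hypothetical toy peaks: m/z (Th), retention time (min) and intensity
mz = np.array([300.0, 300.2, 810.0, 811.0])
rt = np.array([1.0, 1.1, 9.0, 9.5])
intensity = np.array([10.0, 5.0, 7.0, 3.0])

# sum the intensities on a 2 x 2 grid: rows are m/z bins, columns are RT bins
grid, mz_edges, rt_edges = np.histogram2d(
    mz,
    rt,
    bins=(2, 2),
    range=[(300, 1300), (0, 10)],
    weights=intensity,
)

print(grid)
```

The first two peaks share the low m/z, early RT pixel and the last two share the high m/z, late RT pixel, so only those two cells are non-zero.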



## Figure 1 C: XIC

Here we use the same data as for Figure 1 B.

> To simplify this step and make it reproducible on Binder, load the dataframe `thermo_ms1`, in which the MS1 precursor information (m/z and intensity) is already saved in parquet files for you.

In [115]:
thermo_ms1 = pd.read_parquet(
    ['Data/20190124_QX3_JuSc_SA_JS7_1_wt_4h_1_ms1_part1',
     'Data/20190124_QX3_JuSc_SA_JS7_1_wt_4h_1_ms1_part2'], 
    engine='pyarrow'
)
thermo_ms1.head()

Unnamed: 0,RT,mz,intensity
0,0.00204,300.143433,60268
1,0.00204,300.160583,64932
2,0.00204,300.179596,51113
3,0.00204,300.476562,13740
4,0.00204,300.595093,24882


In [119]:
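def sum_binned_data(rt_values, intensity_values, min_value, max_value, bins):
    """ 
    Sum the intensities over retention time bins.
    """
    bin_delta = (max_value - min_value) / bins
    bins_array = np.linspace(min_value, max_value, bins+1)
    # compute the bin centers once, outside the loop, so they are defined even for empty input
    bin_centers = bins_array[1:] - bin_delta/2
    rt_bins = ((rt_values - min_value) / bin_delta).astype(np.int64)
    # values equal to max_value would land in an extra bin; assign them to the last regular bin
    rt_bins = np.minimum(rt_bins, bins - 1)
    intensity_bins = np.zeros(bins, dtype=np.int64)
    for rt_bin, intensity in zip(rt_bins, intensity_values):
        intensity_bins[rt_bin] += intensity
    # bin k now pairs with the center of bin k (no shift by one bin)
    return bin_centers, intensity_bins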
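
Summing intensities per retention-time bin can also be done without an explicit Python loop. The following is an alternative sketch based on `np.histogram` with `weights`, not the function used in this notebook; the helper name `sum_binned_histogram` and the toy values are hypothetical. Values outside `[min_value, max_value]` are ignored by `np.histogram`.

```python
import numpy as np


def sum_binned_histogram(rt_values, intensity_values, min_value, max_value, bins):
    """Sum intensities in equally spaced retention-time bins (vectorized sketch)."""
    sums, edges = np.histogram(
        rt_values,
        bins=bins,
        range=(min_value, max_value),
        weights=intensity_values,
    )
    # the center of each bin is the midpoint of its two edges
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, sums


# hypothetical toy data: four peaks binned into four 1-minute bins
rt = np.array([0.1, 0.4, 1.2, 3.9])
intensity = np.array([10, 20, 5, 7])
centers, sums = sum_binned_histogram(rt, intensity, 0.0, 4.0, 4)
print(centers)
print(sums)
```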

In [117]:
def plot_xic(
    df: pd.DataFrame, 
    xic_mz: float,
    mz_tol_value: int,
    rt_min: float,
    rt_max: float,
    bins: int,
    width: int = 900,
    height: int = 500
):
    """Create an Extracted ion chromatogram (XIC) for the selected m/z.

    Parameters
    ----------
    df : pandas Dataframe
        A table with the extracted MS1 data.
    xic_mz : float
        An m/z value of the precursor/feature that should be used for the XIC.
    mz_tol_value : int
        An m/z tolerance value in ppm.
    rt_min : float
        Start of the retention time window.
    rt_max : float
        End of the retention time window.
    bins: int
        The number of bins for the plot's creation.
    width : int
        The width of the plot.
        Default is 900.
    height : int
        The height of the plot.
        Default is 500.

    Returns
    -------
    a Plotly line plot
        The line plot showing XIC for the selected m/z of the provided dataset.
    """
    fig = go.Figure()
    
    xic_mz_low_mz = xic_mz / (1 + mz_tol_value / 10**6)
    xic_mz_high_mz = xic_mz * (1 + mz_tol_value / 10**6)

    d = df[(df.mz >= xic_mz_low_mz) & (df.mz <= xic_mz_high_mz)]

    bin_centers, intensity_bins = sum_binned_data(d.RT, d.intensity, rt_min, rt_max, bins)
    
    fig.add_trace(
        go.Scatter(
            x=bin_centers,
            y=intensity_bins,
            hovertemplate='<b>RT:</b> %{x};<br><b>Intensity:</b> %{y}.',
        )
    )
    
    fig.update_layout(
        title=dict(
            text=f'XIC for the m/z = {xic_mz}, m/z tolerance = {mz_tol_value} ppm.',
            font=dict(
                size=16,
            ),
            x=0.5,
            xanchor='center',
            yanchor='top'
        ),
        xaxis=dict(
            title='RT, min',
            titlefont_size=14,
            tickmode = 'auto',
            tickfont_size=14,
        ),
        yaxis=dict(
            title='Intensity',
        ),
        hovermode="x",
        template="plotly_white",
        width=width,
        height=height
    )

    fig.update_xaxes(range=[0, df.RT.max()])
    return fig

In [120]:
# let's plot the XIC for the analyte with m/z = 457.997855 and m/z tolerance = 5 ppm
plot_xic(
    df=thermo_ms1, 
    xic_mz=457.997855, 
    mz_tol_value=5, 
    rt_min=0, 
    rt_max=thermo_ms1.RT.max(), 
    bins=300
).show(config=utils.config)
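
The ppm tolerance passed to `plot_xic` translates into an absolute m/z window around the target value. As a small worked example, the sketch below repeats the window calculation from `plot_xic` for the values used above; the helper name `ppm_window` is hypothetical. At m/z 457.997855 a tolerance of 5 ppm corresponds to roughly ±0.0023 Th.

```python
def ppm_window(mz: float, tol_ppm: float) -> tuple:
    """Return the lower and upper m/z bounds for a tolerance given in ppm."""
    low = mz / (1 + tol_ppm / 10**6)
    high = mz * (1 + tol_ppm / 10**6)
    return low, high


low, high = ppm_window(457.997855, 5)
print(f"m/z window: {low:.6f} to {high:.6f} (width {high - low:.6f} Th)")
```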

## Figure 1 D: m/z vs. IM heatmap

The Bruker raw file comes from the [Project PXD017703 from ProteomeXchange](https://www.ebi.ac.uk/pride/archive/projects/PXD017703).

To read the raw timsTOF data we use the recently published [AlphaTims package](https://github.com/MannLabs/alphatims).

In [15]:
file_path = '../Data/PXD017703_diaPASEF/20200428_Evosep_60SPD_SG06-16_MLHeLa_200ng_py8_S3-A6_1_2452.hdf'

In [16]:
raw_data = alphatims.bruker.TimsTOF(file_path)

In [17]:
# this function is taken from the AlphaViz package (https://github.com/MannLabs/alphaviz) and modified
def plot_heatmap(
    df: pd.DataFrame,
    x_axis_label: str = "m/z, Th",
    y_axis_label: str = "Inversed IM, V·s·cm\u207B\u00B2",
    z_axis_label: str = "Intensity",
    title: str = "",
    width: int = 700,
    height: int = 400,
    background_color: str = "black",
    colormap: str = "fire",
):
    """Create a heatmap showing a correlation of m/z and ion mobility with color coding for signal intensity.

    Parameters
    ----------
    df : pandas Dataframe
        A dataframe obtained by slicing an alphatims.bruker.TimsTOF object.
    x_axis_label : str
        An x-axis label.
        Default is "m/z, Th".
    y_axis_label : str
        A y-axis label.
        Default is "Inversed IM, V·s·cm\u207B\u00B2".
    z_axis_label : str
        A z-axis label used for the coloring.
        Default is "Intensity".
    title: str
        The title of the plot.
         Default is "".
    width : int
        The width of the plot.
        Default is 700.
    height : int
        The height of the plot.
        Default is 400.
    background_color : str
        The background color of the plot.
        Default is "black".
    colormap : str
        The name of the colormap in Plotly.
        Default is "fire".

    Returns
    -------
    a HoloViews plot
        The rasterized heatmap of m/z versus ion mobility for the provided data, rendered with the active HoloViews backend (Plotly in this notebook).
    """
    labels = {
        'm/z, Th': "mz_values",
        'RT, min': "rt_values",
        'Inversed IM, V·s·cm\u207B\u00B2': "mobility_values",
        'Intensity': "intensity_values",
    }
    x_dimension = labels[x_axis_label]
    y_dimension = labels[y_axis_label]
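    z_dimension = labels[z_axis_label]

    # convert retention time from seconds to minutes on a copy, so the caller's dataframe is not modified
    df = df.assign(rt_values=df["rt_values"] / 60)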

    def hook(plot, element):
        plot.handles['layout']['xaxis']['gridcolor'] = background_color
        plot.handles['layout']['yaxis']['gridcolor'] = background_color

    opts_ms1=dict(
        width=width,
        height=height,
        title=title,
        xlabel=x_axis_label,
        ylabel=y_axis_label,
        bgcolor=background_color,
        hooks=[hook],
    )
    dmap = hv.DynamicMap(
        hv.Points(
            df,
            [x_dimension, y_dimension],
            z_dimension
        )
    )
    agg = rasterize(
        dmap,
        width=width,
        height=height,
        aggregator='sum'
    )
    fig = dynspread(
        shade(
            agg,
            cmap=colormap
        )
    ).opts(plot=opts_ms1)

    return fig

In [18]:
raw_data[6004]

Unnamed: 0,raw_indices,frame_indices,scan_indices,precursor_indices,push_indices,tof_indices,rt_values,rt_values_min,mobility_values,quad_low_mz_values,quad_high_mz_values,mz_values,intensity_values
0,279083635,6004,33,0,5517709,211464,633.716165,10.561936,1.601114,-1.0,-1.0,694.773080,9
1,279083636,6004,35,0,5517711,326378,633.716165,10.561936,1.598886,-1.0,-1.0,1252.148662,9
2,279083637,6004,36,0,5517712,35698,633.716165,10.561936,1.597771,-1.0,-1.0,157.530243,9
3,279083638,6004,36,0,5517712,50307,633.716165,10.561936,1.597771,-1.0,-1.0,187.655130,9
4,279083639,6004,36,0,5517712,64580,633.716165,10.561936,1.597771,-1.0,-1.0,219.631055,9
...,...,...,...,...,...,...,...,...,...,...,...,...,...
379511,279463146,6004,916,0,5518592,182636,633.716165,10.561936,0.602828,-1.0,-1.0,580.518001,73
379512,279463147,6004,916,0,5518592,253798,633.716165,10.561936,0.602828,-1.0,-1.0,881.147624,93
379513,279463148,6004,917,0,5518593,184218,633.716165,10.561936,0.601681,-1.0,-1.0,586.522010,78
379514,279463149,6004,917,0,5518593,6192,633.716165,10.561936,0.601681,-1.0,-1.0,104.719141,100


In [19]:
# this heatmap is generated for the MS1 frame № 6004 in the middle of the gradient
plot_heatmap(raw_data[6004], title='MS1 ion mobility heatmap')



## Additional figure: MS1 feature map

This plot is not included in the manuscript but could be generated using the MaxQuant output features file.

In [121]:
# path_to_features_mq_file = '../Data/20190402_QX1_SeVW_MA_HeLa_500ng_LC11.features.tsv'

In [122]:
# features = pd.read_csv(path_to_features_mq_file, sep='\t')
# features['id'] = features.index
# features.head()

Unnamed: 0,mz,mostAbundantMz,charge,rtStart,rtApex,rtEnd,fwhm,nIsotopes,nScans,averagineCorr,mass,massCalib,intensityApex,intensitySum,id
0,401.848359,401.848359,3,1.407345,1.437862,1.485042,0.087918,2,19,0.758352,1202.515967,1202.515603,333899.279297,5775192.0,0
1,414.837956,414.837956,1,1.376529,1.450545,1.523679,0.263146,2,37,0.999643,413.830402,413.830372,412544.174316,12626900.0,1
2,476.198018,476.198018,1,1.376529,1.435587,1.48892,0.499391,3,28,0.973631,475.188811,475.188842,741800.526855,17980060.0,2
3,493.838298,493.838298,1,1.416147,1.445596,1.485042,0.025322,2,13,0.964025,492.83095,492.830919,244041.166626,3760315.0,3
4,550.216764,550.216764,1,1.420375,1.45544,1.531359,0.126799,3,28,0.997846,549.208431,549.208429,482958.62793,15129000.0,4


Load the raw data.

In [None]:
# # specify a path to the Thermo .raw file
# path_to_raw_file = '../Data/20190402_QX1_SeVW_MA_HeLa_500ng_LC11.raw'

# # upload the thermo raw file
# data_f = load_thermo_raw(path_to_raw_file)

# df = pd.DataFrame({
#     'scan': data_f['scan_list_ms1'], 
#     'RT': data_f['rt_list_ms1'], 
#     'mz': data_f['mass_list_ms1']
# })
# lst_col = 'mz'
# ms1 = pd.DataFrame({col:np.repeat(df[col].values, df[lst_col].str.len()) for col in df.columns.drop(lst_col)})
# ms1['mz'] = np.concatenate(data_f['mass_list_ms1'])
# ms1['intensity'] = np.concatenate(data_f['int_list_ms1'])
# ms1.head()

In [123]:
def plot_features_map(
    ms1: pd.DataFrame,
    features: pd.DataFrame,
    rt_range: tuple,
    mz_range: tuple,
    mz_tol_ppm: int = 2,
    title: str = "Feature map",
    width: int = 850,
    height: int = 550
):
    """Create a feature map for the selected m/z and RT ranges.

    Parameters
    ----------
    ms1 : pandas Dataframe
        A table with the extracted MS1 data.
    features : pandas Dataframe
        A table of features found by MaxQuant.
    rt_range : tuple
        Start and end of the retention time range.
    mz_range : tuple
        Start and end of m/z range.
    mz_tol_ppm : int
        An m/z tolerance value in ppm.
    title: str
        The title of the plot.
         Default is "Feature map".
    width : int
        The width of the plot.
        Default is 850.
    height : int
        The height of the plot.
        Default is 550.

    Returns
    -------
    a Plotly scatter plot
        The scatter plot showing all found features in the specified rt and m/z ranges of the provided dataset.
    """
    # slice the raw data
    ms1_sliced = ms1[
        (ms1.mz >= mz_range[0]) & \
        (ms1.mz <= mz_range[1]) & \
        (ms1.RT >= rt_range[0]) & \
        (ms1.RT <= rt_range[1])
    ].copy()  # work on a copy to avoid pandas' SettingWithCopyWarning when adding the 'id' column
    ms1_sliced['id'] = -1
    
    # filter the features and assign the feature id to the raw data
    for feat in features[
        (features.mz >= mz_range[0]) & \
        (features.mz <= mz_range[1]) & \
        (features.rtStart >= rt_range[0]) & \
        (features.rtEnd <= rt_range[1])
    ].itertuples():
        mz_low = feat.mz / (1 + mz_tol_ppm / 10**6)
        mz_high = feat.mz * (1 + mz_tol_ppm / 10**6)
        ms1_sliced.loc[
            (ms1_sliced.mz >= mz_low) & \
            (ms1_sliced.mz <= mz_high) & \
            (ms1_sliced.RT >= feat.rtStart) & \
            (ms1_sliced.RT <= feat.rtEnd), 
            'id'
        ] = feat.id
    
#     fig = px.scatter(d, x='mz', y='RT', color='id', color_discrete_sequence=["red", "blue"])
    
    fig = go.Figure()

    fig.add_trace(
        go.Scatter(
            x=ms1_sliced[ms1_sliced.id == -1].mz,
            y=ms1_sliced[ms1_sliced.id == -1].RT,
            mode='markers',
            marker=dict(color='lightgrey', size=4),
            hovertext=ms1_sliced[ms1_sliced.id == -1].intensity,
            hovertemplate='<b>m/z:</b> %{x};<br><b>RT:</b> %{y};<br><b>Intensity:</b> %{hovertext}.',
            name='',
            showlegend=False
        )
    )

    for feat in ms1_sliced.id.unique():
        if feat != -1:
            fig.add_trace(
                go.Scatter(
                    x=ms1_sliced[ms1_sliced.id == feat].mz,
                    y=ms1_sliced[ms1_sliced.id == feat].RT,
                    mode='markers',
                    marker=dict(size=4),
                    hovertext=ms1_sliced[ms1_sliced.id == feat].intensity,
                    hovertemplate='<b>m/z:</b> %{x};<br><b>RT:</b> %{y};<br><b>Intensity:</b> %{hovertext}.',
                    name='',
                    showlegend=False
                )
            )
    
    fig.update_layout(
        title=dict(
            text=title,
            font=dict(
                size=16,
            ),
            x=0.5,
            y=0.88,
            xanchor='center',
#             yanchor='middle'
        ),
        xaxis=dict(
            title='m/z, Th',
            titlefont_size=14,
            tickmode = 'auto',
            tickfont_size=14,
        ),
        yaxis=dict(
            title='RT, min',
        ),
        hovermode="closest",
        template="plotly_white",
        width=width,
        height=height
    )
    
    return fig

Specify the retention time and m/z ranges to show in the plot.

In [126]:
# plot_features_map(ms1=ms1, features=features, rt_range=(54, 56), mz_range=(457, 459)).show(config=utils.config)
# at the center of the plot we see a feature with m/z ~ 457.997855
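
The feature assignment inside `plot_features_map` labels every raw peak that falls into a feature's m/z tolerance window and retention time span. The logic can be sketched on hypothetical toy data with NumPy alone; the peak values, the feature tuple and the id `7` below are made up for illustration.

```python
import numpy as np

# hypothetical toy peaks: m/z (Th) and retention time (min)
mz = np.array([457.9979, 457.9981, 458.5000, 457.9979])
rt = np.array([54.8, 55.0, 55.0, 57.0])

# hypothetical feature list: (id, mz, rtStart, rtEnd)
features = [(7, 457.997855, 54.5, 55.5)]
tol_ppm = 2

# -1 marks peaks that do not belong to any feature
feature_id = np.full(mz.shape, -1)
for fid, f_mz, rt_start, rt_end in features:
    mz_low = f_mz / (1 + tol_ppm / 10**6)
    mz_high = f_mz * (1 + tol_ppm / 10**6)
    inside = (mz >= mz_low) & (mz <= mz_high) & (rt >= rt_start) & (rt <= rt_end)
    feature_id[inside] = fid

print(feature_id)
```

Only the first two peaks match the feature in both dimensions; the third is outside the m/z window and the fourth is outside the retention time span.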