<div style="text-align: justify; padding:5px; background-color:rgb(252, 253, 255); border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
    <font color='red'>To begin: Click anywhere in this cell and press <kbd>Run</kbd> on the menu bar. This executes the current cell and then highlights the next cell. There are two types of cells: a <i>text cell</i> and a <i>code cell</i>. When you <kbd>Run</kbd> a text cell (<i>we are in a text cell now</i>), you advance to the next cell without executing any code. When you <kbd>Run</kbd> a code cell (<i>identified by <span style="font-family: courier; color:black; background-color:white;">In[ ]:</span> to the left of the cell</i>), you advance to the next cell after executing all the Python code within that cell. Any visual results produced by the code (text/figures) are reported directly below that cell. Press <kbd>Run</kbd> again. Repeat this process until the end of the notebook. <b>NOTE:</b> All the cells in this notebook can be executed sequentially and automatically by clicking <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart and Run All</kbd>. Should anything crash, restart the Jupyter kernel by clicking <kbd>Kernel</kbd><font color='black'>→</font><kbd>Restart</kbd>, and start again from the top.</font>
        
</div>

<div style="text-align: justify; padding:5px; background-color:rgb(252, 253, 255); border: 1px solid lightgrey; padding-left: 1em; padding-right: 1em;">
<img src="https://github.com/CIMCB/MetabComparisonBinaryML/blob/master/cimcb_logo.png?raw=true" width="180px" align="right" style="padding: 20px">

<a id="introduction"></a>

<h1> Notebook 3: Reporting (Single Batch)</h1>

<br>
<br>
<br>
<p  style="text-align: justify">[Enter Text Here]</p>

</div>

<div style="background-color:rgb(255, 250, 250); padding:5px; padding-left: 1em; padding-right: 1em;">
    
<a id="1"></a>
<h2 style="text-align: justify">1. Import Packages</h2>

<p  style="text-align: justify">[Enter Text Here]</p>

<ul>
<li style="text-align: justify"><a href="http://www.numpy.org/"><code>numpy</code></a>: A standard package primarily used for the manipulation of arrays</li>

<li style="text-align: justify"><a href="https://pandas.pydata.org/"><code>pandas</code></a>: A standard package primarily used for the manipulation of data tables</li>

<li style="text-align: justify"><a href="https://github.com/CIMCB/qcrsc"><code>qcrsc</code></a>: A library of helpful functions and tools provided by the authors</li>

</ul>

<br>

</div>

In [1]:
import numpy as np
import pandas as pd
import qcrsc           

print('All packages successfully loaded')

# Development only: reload edited modules automatically (remove before release)
%load_ext autoreload
%autoreload 2

All packages successfully loaded


<div style="background-color:rgb(255, 250, 250); padding:5px; padding-left: 1em; padding-right: 1em;">

<a id="2"></a>
<h2 style="text-align: justify">2. Load Data & Peak Sheet</h2>

<p  style="text-align: justify">[Enter Text Here]</p>

<p  style="text-align: justify"><code>qcrsc.load_dataXL()</code> parameters:</p> 

<ul>
    <li><code>filename</code>: The name of the Excel file (.xlsx file)</li>
    <li><code>DataSheet</code>: The name of the data sheet in the file. Requires the columns Order, SampleType, and Batch.</li>
    <li><code>PeakSheet</code>: The name of the peak sheet in the file. Requires the columns Idx, Name, and Label.</li>
</ul>   
<br>

</div>
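
The required columns listed above can be checked before loading. The following is a minimal, dependency-free sketch and is not part of <code>qcrsc</code>: the helper name <code>missing_columns</code> and the example header rows are hypothetical. In this notebook the same helper could be given <code>DataTable.columns</code> or <code>PeakTable.columns</code>.

```python
# Required columns as listed above for the data sheet and the peak sheet.
REQUIRED_DATA_COLUMNS = ("Order", "SampleType", "Batch")
REQUIRED_PEAK_COLUMNS = ("Idx", "Name", "Label")


def missing_columns(columns, required):
    """Return the required column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in required if name not in present]


# Hypothetical header rows standing in for the sheets of a real workbook.
data_header = ["Idx", "Order", "SampleType", "Batch", "M1", "M2"]
peak_header = ["Idx", "Name"]

print(missing_columns(data_header, REQUIRED_DATA_COLUMNS))  # nothing missing
print(missing_columns(peak_header, REQUIRED_PEAK_COLUMNS))  # 'Label' is missing
```

An empty list means the sheet has every column named in the parameter description.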

In [2]:
# Import data

home = 'data/'
file = 'MTBLS290.xlsx' 

DataTable, PeakTable = qcrsc.load_dataXL(home + file,'Data','Peak')
DataTableX, PeakTableX = qcrsc.load_dataXL(home + file,'DataTableX','PeakTableX')


Loadings PeakFile: Peak
Loadings DataFile: Data
Data Table is suitable for use with QCRSC
TOTAL SAMPLES: 95 TOTAL PEAKS: 236
Done!
Loadings PeakFile: PeakTableX
Loadings DataFile: DataTableX
Data Table is suitable for use with QCRSC
TOTAL SAMPLES: 95 TOTAL PEAKS: 236
Done!


<div style="background-color:rgb(255, 250, 250); padding:5px; padding-left: 1em; padding-right: 1em;">

<a id="2"></a>
<h2 style="text-align: justify">Clean Step</h2>

<p  style="text-align: justify">[Enter Text Here]</p>


</div>

In [3]:
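# Keep peaks with RSD < 20 and D-ratio < 30 (taken from the *_QCT columns)

rsd = PeakTableX['RSD*_QCT']
dratio = PeakTableX['DRatio*_QCT']
PeakTableXClean = PeakTableX[(rsd < 20) & (dratio < 30)]
PeakTableXClean = PeakTableXClean.reset_index()  # Optional: renumber rows (the old index is kept as an 'index' column)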

print("Number of Peaks remaining: {}".format(len(PeakTableXClean)))

Number of Peaks remaining: 159
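
For readers unfamiliar with the two metrics used in the filter, the sketch below computes them from raw intensities using the commonly used parametric definitions: RSD is the QC standard deviation as a percentage of the QC mean, and D-ratio is the QC standard deviation as a percentage of the sample standard deviation. This is an illustration under those assumptions. It is not necessarily how <code>qcrsc</code> fills the <code>RSD*_QCT</code> and <code>DRatio*_QCT</code> columns, and the peak names and intensities are made up.

```python
from statistics import mean, stdev


def rsd(qc_values):
    """Relative standard deviation of the QC intensities, in percent."""
    return 100.0 * stdev(qc_values) / mean(qc_values)


def d_ratio(qc_values, sample_values):
    """QC spread as a percentage of the spread seen in the study samples."""
    return 100.0 * stdev(qc_values) / stdev(sample_values)


# Made-up intensities for two hypothetical peaks.
peaks = {
    "M1": {"qc": [100, 102, 98, 101, 99], "sample": [80, 120, 95, 140, 60]},
    "M2": {"qc": [100, 150, 60, 130, 70], "sample": [90, 110, 100, 105, 95]},
}

# Same thresholds as the cell above: RSD < 20 and D-ratio < 30.
kept = [
    name
    for name, p in peaks.items()
    if rsd(p["qc"]) < 20 and d_ratio(p["qc"], p["sample"]) < 30
]
print("Peaks kept:", kept)
```

Here the second peak is dropped because its QC measurements vary far more than the thresholds allow.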


<div style="background-color:rgb(255, 250, 250); padding:5px; padding-left: 1em; padding-right: 1em;">

<a id="2"></a>
<h2 style="text-align: justify">3. View Correction Per Peak</h2>

<p  style="text-align: justify">[Enter Text Here]</p>

<br>
<p  style="text-align: justify"><code>qcrsc.peak()</code> parameters:</p> 

<ul>
    <li><code>DataTable</code>: DataTable</li>
    <li><code>PeakTable</code>: PeakTable</li>
    <li><code>batch</code>: e.g. 1 or [1] or [1, 2, 3] or -1 for all</li>
    <li><code>peak</code>: e.g. 'M1' or 'R' for random</li>
    <li><code>gamma</code>: False or (min, max, step) (default (0.5, 5, 0.2))</li>
    <li><code>transform</code>: 'log' or False (default 'log')</li>
    <li><code>parametric</code>: True or False (default True)</li>
    <li><code>plot</code>: list e.g. ['Sample', 'QC', 'Blank'] or ['Sample', 'QC'] (default ['Sample', 'QC'])</li>
    <li><code>zeroflag</code>: True or False (default True)</li>
    <li><code>control_limit</code>: False or ('RSD', value) or ('D-ratio', value) (default False)</li>
</ul>

</div>
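
The <code>gamma</code> tuple presumably describes a range of candidate values to search. The sketch below shows one way a <code>(min, max, step)</code> tuple can be expanded into a grid. It assumes the values start at <code>min</code> and never exceed <code>max</code>; how <code>qcrsc</code> actually treats the end point is not stated here, and <code>gamma_grid</code> is a hypothetical helper.

```python
import math


def gamma_grid(gamma_min, gamma_max, step):
    """Expand (min, max, step) into a list of values that never exceeds max."""
    # A small tolerance guards against floating-point error in the division.
    n_steps = math.floor((gamma_max - gamma_min) / step + 1e-9)
    # Rounding removes accumulated floating-point noise from the products.
    return [round(gamma_min + i * step, 10) for i in range(n_steps + 1)]


grid = gamma_grid(0.5, 5, 0.2)
print(len(grid), "candidate values from", grid[0], "to", grid[-1])
```

With the default tuple the step does not land exactly on the maximum, so under this assumption the last candidate is the largest value below 5.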

In [4]:

qcrsc.peak(DataTable, 
           PeakTableXClean,
           batch=1,
           peak='r', 
           gamma=(0.5, 5, 0.2), 
           transform='log', 
           parametric=True,
           zero_remove=True, 
           plot=['QC', 'Sample', 'QCT', 'Blank'],
           control_limit={'RSD':30}, 
           colormap='Accent',
           fill_points=False,
           scale_x=1, 
           scale_y=1)

<div style="background-color:rgb(255, 250, 250); padding:5px; padding-left: 1em; padding-right: 1em;">

<a id="2"></a>
<h2 style="text-align: justify">4. PCA Plot</h2>

<p  style="text-align: justify">[Enter Text Here]</p>

<br>
<p  style="text-align: justify"><code>qcrsc.pca_plot()</code> parameters:</p> 

<ul>
    <li><code>DataTable</code>: DataTable</li>
    <li><code>PeakTable</code>: PeakTable</li>
    <li><code>pcx</code>: pc on x-axis e.g. 1 (default 1)</li>
    <li><code>pcy</code>: pc on y-axis e.g. 2 (default 2)</li>
    <li><code>batch</code>: e.g. 1 or [1] or [1, 2, 3] or 'all' (default 'all')</li>
    <li><code>transform</code>: 'log' or False (default 'log')</li>
    <li><code>scale</code>: 'auto', 'unit', 'pareto', 'vast', 'level' or False (default 'auto')</li>
    <li><code>knn</code>: any value (default 3)</li>
    <li><code>plot</code>: list e.g. ['Sample', 'QC', 'QCT', 'Blank'], ['Sample', 'QC'], or 'all' (default 'all')</li>
    <li><code>control_limit</code>: False or dict: {'RSD': 20, 'D-ratio': 30} (default False)</li>
    <li><code>colormap</code>: https://matplotlib.org/3.1.1/gallery/color/colormap_reference.html (default 'Accent')</li>
    <li><code>plot_ellipse</code>: True or False (default True)</li>
    <li><code>alpha_ellipse</code>: tuple or list e.g. (0.05, 0.1) (SAMPLE, QC) (default (0.1, 0.1))</li>
    <li><code>plot_points</code>: True or False (default True)</li>
    <li><code>fill_points</code>: True or False (default True)</li>
    <li><code>scale_x</code>: any value (default 1)</li>
    <li><code>scale_y</code>: any value (default 1)</li>
</ul>

</div>
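
The <code>scale</code> options name standard column-scaling methods. The sketch below applies the usual textbook definitions of 'auto', 'pareto', 'vast' and 'level' scaling to a single column of intensities. It is an illustration only: <code>scale_column</code> is a hypothetical helper, the sample standard deviation is assumed, and 'unit' is left out because its exact definition in <code>qcrsc</code> is not given here.

```python
from math import sqrt
from statistics import mean, stdev


def scale_column(values, method="auto"):
    """Mean-centre one column, then divide by a method-specific factor."""
    mu = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    centred = [v - mu for v in values]
    if method == "auto":      # unit variance
        return [c / sd for c in centred]
    if method == "pareto":    # square root of the standard deviation
        return [c / sqrt(sd) for c in centred]
    if method == "vast":      # autoscaling weighted by mean / sd
        return [(c / sd) * (mu / sd) for c in centred]
    if method == "level":     # relative to the column mean
        return [c / mu for c in centred]
    raise ValueError("Unknown scaling method: {}".format(method))


column = [2, 4, 6]  # made-up intensities for one peak
for method in ("auto", "pareto", "vast", "level"):
    print(method, scale_column(column, method))
```

Every method centres the column on zero; the methods differ only in how strongly they shrink peaks with large variance.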

In [5]:
# Before
print("Before PCA")
qcrsc.pca_plot(DataTable, 
               PeakTable, 
               pcx=1,
               pcy=2, 
               batch='all', 
               transform='log', 
               scale='unit',
               knn=3,
               plot=['QC', 'Sample', 'QCT', 'Blank'],
               control_limit={'RSD':20},
               colormap = 'Accent',
               plot_ellipse='all',
               alpha_ellipse = (0.1, 0.2),
               plot_points = True,  
               fill_points = False,
               scale_y = 1,
               scale_x = 1)

# After
print("After PCA")
qcrsc.pca_plot(DataTableX, 
               PeakTableX, 
               pcx=1,
               pcy=2, 
               batch='all', 
               transform='log', 
               scale='unit',
               knn=3,
               plot=['QC', 'Sample', 'QCT', 'Blank'],
               control_limit={'RSD':20},
               colormap = 'Accent',
               plot_ellipse='all',
               alpha_ellipse = (0.1, 0.2),
               plot_points = True,  
               fill_points = False,
               scale_y = 1,
               scale_x = 1)


print("After PCA (Cleaned)")
qcrsc.pca_plot(DataTableX, 
               PeakTableXClean, 
               pcx=1,
               pcy=2, 
               batch='all', 
               transform='log', 
               scale='unit',
               knn=3,
               plot=['QC', 'Sample', 'QCT', 'Blank'],
               control_limit={'RSD':20},
               colormap = 'Accent',
               plot_ellipse='all',
               alpha_ellipse = (0.1, 0.2),
               plot_points = True,  
               fill_points = False,
               scale_y = 1,
               scale_x = 1)

Before PCA


After PCA


After PCA (Cleaned)


<div style="background-color:rgb(255, 250, 250); padding:5px; padding-left: 1em; padding-right: 1em;">

<a id="2"></a>
<h2 style="text-align: justify">5. Distribution Plot</h2>

<p  style="text-align: justify">[Enter Text Here]</p>

<br>
<p  style="text-align: justify"><code>qcrsc.dist_plot()</code> parameters:</p> 

<ul>
    <li><code>DataTable</code>: Requires DataTable </li>
    <li><code>PeakTable</code>: Requires PeakTable </li>
    <li><code>metric</code>: RSD or Dratio</li>
    <li><code>transform</code>: False or 'log'</li>
    <li><code>parametric</code>: True or False (default True) </li>
    <li><code>batch</code>: e.g. 1 or [1] (default 1)</li>
    <li><code>plot</code>: ["QC", "QCT"]</li>
    <li><code>colormap</code>: based on categorical colormaps https://matplotlib.org/tutorials/colors/colormaps.html </li>
</ul>
</div>
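
The <code>parametric</code> switch suggests a choice between mean/standard-deviation estimates and robust alternatives. The sketch below contrasts the parametric RSD with a common robust version based on the median and the median absolute deviation (MAD, scaled by 1.4826). Whether <code>parametric=False</code> uses exactly this estimator in <code>qcrsc</code> is an assumption, and the QC intensities are made up.

```python
from statistics import mean, median, stdev


def rsd_parametric(values):
    """RSD from the mean and sample standard deviation, in percent."""
    return 100.0 * stdev(values) / mean(values)


def rsd_robust(values):
    """RSD from the median and scaled MAD, in percent."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    return 100.0 * 1.4826 * mad / med


qc = [1, 2, 3, 4, 100]  # one extreme value dominates the parametric estimate
print("Parametric RSD: {:.1f}".format(rsd_parametric(qc)))
print("Robust RSD:     {:.1f}".format(rsd_robust(qc)))
```

A single outlying QC injection inflates the parametric value far more than the robust one, which is the usual motivation for offering both.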

In [6]:
print("Before DIST")
qcrsc.dist_plot(DataTable,
                PeakTable, 
                parametric = True, 
                batch = [1], 
                plot = 'all',
                colormap = 'Accent',
                scale_x = 1, 
                scale_y = 1,
                padding = 0.10,
                smooth = None,
                alpha = 0.05,
                legend= True)

print("After DIST")
qcrsc.dist_plot(DataTableX,
                PeakTableX, 
                parametric = True, 
                batch = [1], 
                plot = 'all',
                colormap = 'Accent',
                scale_x = 1, 
                scale_y = 1,
                padding = 0.10,
                smooth = None,
                alpha = 0.05,
                legend= True)

print("After DIST (Cleaned)")
qcrsc.dist_plot(DataTableX,
                PeakTableXClean, 
                parametric = True, 
                batch = [1], 
                plot = 'all',
                colormap = 'Accent',
                scale_x = 1, 
                scale_y = 1,
                padding = 0.10,
                smooth = None,
                alpha = 0.05,
                legend= True)

Before DIST


After DIST


After DIST (Cleaned)
