<a href="https://colab.research.google.com/github/chenweioh/GCP-Inspector-Toolkit/blob/main/MassLynx_Export_Data_Inspection_Toolkit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# MassLynx Export Data Inspection Toolkit - User Manual

---

## Introduction:
This toolkit allows users to easily inspect and analyze data exported from MassLynx 4.2. With a few steps, users can upload their data, visualize the internal standard, check time gaps between each injection, and identify any instances of manual integration flags.

## Prerequisites:
- Ensure you have the necessary data files exported from MassLynx 4.2 in a `.txt` format.
- Google Chrome is recommended for the best user experience, as this tool was developed in Colab using Chrome.

## Step-by-step Guide:

### 1. Data Upload:
Begin by uploading the data files you wish to inspect. The toolkit processes the uploaded `.txt` files automatically and reads only the data that appears before the "Compound 2:" line in each file.
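
The truncation at "Compound 2:" can be sketched with the standard library alone. The helper name `read_first_compound` and the shortened sample export below are assumptions for illustration only; a real MassLynx export has more header lines and columns, and the notebook itself parses the text with pandas.

```python
import csv
import io

def read_first_compound(text, stop_marker="Compound 2:"):
    """Return the lines of `text` that come before the first line containing `stop_marker`."""
    kept = []
    for line in text.splitlines():
        if stop_marker in line:
            break
        kept.append(line)
    return kept

# Hypothetical, heavily shortened export: a title line, then one table per compound.
sample = "\n".join([
    "Compound 1:  Analyte A",
    "",
    "\tName\tType\tIS Area",
    "1\tBlank_01\tBlank\t",
    "2\tStd_01\tStandard\t15234",
    "",
    "Compound 2:  Analyte B",
    "",
    "\tName\tType\tIS Area",
    "1\tBlank_01\tBlank\t",
])

lines = read_first_compound(sample)

# Parse the tab-separated rows after the title line, dropping blank lines.
reader = csv.reader(io.StringIO("\n".join(lines[1:])), delimiter="\t")
table = [row for row in reader if row]
header, rows = table[0], table[1:]

print(header)
print(rows)
```

Everything from the "Compound 2:" line onward is ignored, so only the first compound's table is inspected.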

### 2. Visualizing Internal Standard:
Once the data is uploaded:
- For each uploaded file, the toolkit generates one scatter plot of 'IS Area' against the injection index, with a separate series for each unique value in the 'Type' column.
- Blank 'IS Area' cells are treated as 0.
- The plots are bundled into a single Word document for you.
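
The grouping behind each plot can be sketched without any plotting library. The function name `group_is_area_by_type` and the sample rows are hypothetical; the sketch assumes that blank or non-numeric 'IS Area' values should count as 0, and each resulting group would become one series in the scatter plot.

```python
from collections import defaultdict

def group_is_area_by_type(rows):
    """Group (index, IS area) points by sample type; blank or non-numeric areas become 0.0."""
    series = defaultdict(list)
    for index, sample_type, is_area in rows:
        try:
            area = float(is_area)
        except (TypeError, ValueError):
            area = 0.0
        series[sample_type].append((index, area))
    return dict(series)

# Hypothetical rows: (injection index, Type, IS Area as exported text)
rows = [
    (1, "Blank", ""),
    (2, "Standard", "15234"),
    (3, "Analyte", "14980.5"),
    (4, "Standard", "15102"),
]

grouped = group_is_area_by_type(rows)
for sample_type, points in grouped.items():
    print(sample_type, points)
```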

### 3. Time Gap Inspection:
This feature examines the time gaps between each injection:
- The average time gap for each file is calculated.
- Any gap more than 20% above the file's average is flagged and detailed in a comprehensive report.
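
As a worked example of the 20% rule, the sketch below uses hypothetical acquisition times and a hypothetical helper, `find_long_gaps`. It assumes the average is taken over the gaps between consecutive injections only, and that times are given as `HH:MM:SS` strings.

```python
from datetime import datetime

def find_long_gaps(times, tolerance=0.20):
    """Return (average gap, [(row number, gap)]) for gaps more than `tolerance` above the average."""
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    # timedelta.seconds wraps around midnight, so a run that crosses 00:00:00 still gives a positive gap
    gaps = [(later - earlier).seconds for earlier, later in zip(parsed, parsed[1:])]
    if not gaps:
        return 0.0, []
    average = sum(gaps) / len(gaps)
    # Gap i sits between rows i+1 and i+2 (1-based), so it is reported against row i+2
    flagged = [(i + 2, gap) for i, gap in enumerate(gaps) if gap > (1 + tolerance) * average]
    return average, flagged

times = ["10:00:00", "10:05:00", "10:10:00", "10:20:00", "10:25:00"]
average, flagged = find_long_gaps(times)
print(average, flagged)
```

Here the gaps are 300, 300, 600 and 300 seconds, so the average is 375 seconds and the threshold is 450 seconds. Only the 600-second gap before the fourth injection is flagged.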

### 4. Manual Integration Flag Check:
To ensure data integrity:
- You will be prompted to enter a keyword for manual integration (default keyword is 'mm').
- The toolkit then checks every cell in the data for an exact match with the keyword. For each match, the filename and the row's 'Name' value are recorded.
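
A minimal sketch of this check on plain dictionaries follows. The helper `rows_with_flag`, the 'Detection Flags' column name, and the sample rows are assumptions for illustration. The sketch matches whole cells only, so a name that merely contains the keyword is not flagged.

```python
def rows_with_flag(rows, keyword="mm", name_field="Name"):
    """Return the Name of every row in which some cell equals `keyword` exactly."""
    return [
        row[name_field]
        for row in rows
        if any(str(value) == keyword for value in row.values())
    ]

# Hypothetical rows; the third row contains 'mm' only as part of a longer string.
rows = [
    {"Name": "Std_01", "Area": 1200, "Detection Flags": "bb"},
    {"Name": "Sample_07", "Area": 980, "Detection Flags": "mm"},
    {"Name": "Summary_mm_check", "Area": 1010, "Detection Flags": "bb"},
]

print(rows_with_flag(rows))
```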
  
## File Outputs:
- Internal Standard Plots: `plots.docx`
- Time Gap Analysis Report: `time_gap_analysis.docx`
- Manual Integration Review: `manual_integration_review.docx`

**Note**: The aforementioned files will automatically download upon completion of their respective processes. They can typically be found in your browser's default download location or the "Downloads" folder.

## Conclusion:
The MassLynx Export Data Inspection Toolkit provides a streamlined and efficient approach to inspecting and reviewing your exported MassLynx 4.2 data. Ensure you regularly review this manual for any updates or additional features added to enhance your data analysis experience.

Happy Analyzing!

In [None]:
from google.colab import files
import pandas as pd

def upload_and_load_txt():
    uploaded = files.upload()
    list_of_dataframes = []
    filenames = []

    for filename in uploaded.keys():
        # Read the file line by line and stop before "Compound 2:".
        # Try UTF-8 first, then fall back to Latin-1 if the file cannot be decoded.
        for encoding in ('utf-8', 'latin1'):
            lines = []
            try:
                with open(filename, 'r', encoding=encoding) as f:
                    for line in f:
                        if "Compound 2:" in line:
                            break
                        lines.append(line)
                break
            except UnicodeDecodeError:
                continue

        # Each line already ends with a newline, so join without adding another
        data_str = ''.join(lines)

        # Now you can use pandas to convert this string to DataFrame
        separator = "\t"  # Change to "," or other as needed
        header_row = 5  # Adjust as needed

        try:
            from io import StringIO
            df = pd.read_csv(StringIO(data_str), sep=separator, header=header_row)
            print(f"Data from {filename}:")
            print(df.head())  # Debug: print the first few rows
            list_of_dataframes.append(df)
            filenames.append(filename)
        except Exception as e:
            print(f"Error processing file: {filename}. Error: {e}")

    return list_of_dataframes, filenames

# Execute the function and store the list of DataFrames along with their filenames
list_of_dataframes, filenames = upload_and_load_txt()


In [None]:
!pip install python-docx

import matplotlib.pyplot as plt
from docx import Document
import io
from google.colab import files
import pandas as pd
import docx.shared

def plot_and_save_graphs(dataframes, filenames):
    # Create a new Word document
    doc = Document()

    for df, filename in zip(dataframes, filenames):
        # Treat blank cells in 'IS Area' as 0 and convert 'IS Area' and 'Unnamed: 0' to numeric

        df['IS Area'] = pd.to_numeric(df['IS Area'], errors='coerce')
        df['IS Area'] = df['IS Area'].fillna(0)
        df['Unnamed: 0'] = pd.to_numeric(df['Unnamed: 0'], errors='coerce')

        # Indicate that the plot is being generated
        print(f"Generating plot for {filename}...")

        # Plotting the graph
        plt.figure(figsize=(10, 6))
        for t in df['Type'].unique():
            subset = df[df['Type'] == t]
            plt.scatter(subset['Unnamed: 0'], subset['IS Area'], label=t)

        # Additional plot settings
        plt.xlabel('Index')
        plt.ylabel('IS Area')
        plt.title(f'Plot from {filename}')
        plt.legend()

        # Save the plot to a BytesIO object
        buf = io.BytesIO()
        plt.savefig(buf, format='png')
        buf.seek(0)

        # Add the plot to the Word document
        doc.add_picture(buf, width=docx.shared.Inches(6))
        plt.close()

        # Indicate that the plot has been generated and added to the document
        print(f"Plot for {filename} generated and added to the document.")

    # Save the Word document to disk and trigger the browser download
    doc.save("plots.docx")
    files.download("plots.docx")

    print("The Word document with plots has been downloaded. Check your 'Downloads' folder or the browser's default download location.")

# Assume list_of_dataframes and filenames are already defined
plot_and_save_graphs(list_of_dataframes, filenames)


Generating plot for 010822_S005&S006_01_00 (17).txt...
Plot for 010822_S005&S006_01_00 (17).txt generated and added to the document.
Generating plot for 010822_S007&S008_01_00 (11).txt...
Plot for 010822_S007&S008_01_00 (11).txt generated and added to the document.
Generating plot for 020822_S009&S010_01_00 (9).txt...
Plot for 020822_S009&S010_01_00 (9).txt generated and added to the document.
Generating plot for 020822_S011&S012_01_00 (3).txt...
Plot for 020822_S011&S012_01_00 (3).txt generated and added to the document.
Generating plot for 020822_S011&S012_01_01 (3).txt...
Plot for 020822_S011&S012_01_01 (3).txt generated and added to the document.
Generating plot for 030822_S013&S014_01_00 (3).txt...
Plot for 030822_S013&S014_01_00 (3).txt generated and added to the document.
Generating plot for 030822_S013&S014_01_01 (3).txt...
Plot for 030822_S013&S014_01_01 (3).txt generated and added to the document.
Generating plot for 030822_S015&S016_01_00 (3).txt...
Plot for 030822_S015&S016

The Word document with plots has been downloaded. Check your 'Downloads' folder or the browser's default download location.


In [None]:
from datetime import datetime
from docx import Document
from google.colab import files
import pandas as pd

# Function to process each dataframe
def process_dataframe(df, filename, document):
    # Convert the 'Acq.Time' column to datetime format
    df['Acq.Time'] = pd.to_datetime(df['Acq.Time'], format='%H:%M:%S').dt.time

    # Compute time gaps in seconds between consecutive injections
    gaps = [(datetime.combine(datetime.min, t) - datetime.combine(datetime.min, s)).seconds for s, t in zip(df['Acq.Time'], df['Acq.Time'][1:])]
    time_gap = [0] + gaps  # The first row has no preceding injection
    df['Time Gap'] = time_gap

    # Compute the average over real gaps only, excluding the placeholder zero for the first row
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0

    document.add_paragraph(f"File: {filename}")
    document.add_paragraph(f"Average Time Gap: {avg_gap:.2f} seconds")

    # Identify rows whose time gap is more than 20% above the average and record them in the report
    excessive_gaps = []

    for idx, gap in enumerate(time_gap):
        if gap > 1.2 * avg_gap:
            excessive_gaps.append(f"Line {idx + 1} (Time: {df.iloc[idx]['Acq.Time']}): Time Gap of {gap} seconds is more than 20% above average. Corresponding row: {df.iloc[idx]['Name']}.")

    if excessive_gaps:
        for message in excessive_gaps:
            document.add_paragraph(message)
    else:
        document.add_paragraph("No time gaps more than 20% above average noted.")
    document.add_paragraph("\n")  # Add a space for clarity

document = Document()
document.add_heading('Subject Run review: Time Gap')

for i, df in enumerate(list_of_dataframes):
    process_dataframe(df, filenames[i], document)

document.save('time_gap_analysis.docx')
files.download('time_gap_analysis.docx')


In [None]:
from docx import Document

# Prompt the user for the keyword for manual integration, default to "mm"
keyword = input("Enter the keyword for manual integration (default is 'mm'): ")
if not keyword:
    keyword = "mm"

document = Document()
document.add_heading('Manual Integration Review')
document.add_paragraph(f"Keyword searched for manual integration: '{keyword}'")

# Loop through each DataFrame and its corresponding filename
for i, df in enumerate(list_of_dataframes):
    filename = filenames[i]

    # A flag to check if any instance of the keyword was found in the current DataFrame
    found = False

    # Check each row for a cell whose text exactly matches the keyword
    for idx, row in df.iterrows():
        if keyword in row.astype(str).values:
            # If found, set the flag to True and add a note to the document
            found = True
            document.add_paragraph(f"In file {filename}, manual integration found at row with Name: {row['Name']}.")

    # If keyword was not found in the entire DataFrame, add a note to the document
    if not found:
        document.add_paragraph(f"In file {filename}, no instances of manual integration were found.")

# Save the document
document.save('manual_integration_review.docx')

# Download the document
from google.colab import files
files.download('manual_integration_review.docx')


Enter the keyword for manual integration (default is 'mm'): mm


In [None]:
# Diagnostic tool: print the column names and dtypes of each loaded DataFrame
for df in list_of_dataframes:
    print(df.columns)
for df in list_of_dataframes:
    print(df.dtypes)