# FNCT Data Read In and Transformation

This Jupyter Notebook supports research data management (RDM) of FNCT data. It was developed in the associated laboratory at the [Bundesanstalt für Materialforschung und -prüfung (BAM)](https://www.bam.de), using data obtained there.
Specifically, the script reads Excel files generated by an FNCT device and transforms them into several files. (Note: More information on the FNCT device used here can be found in a corresponding [paper](https://www.doi.org/10.1016/j.polymertesting.2017.09.043).)
In doing so, the data is separated into primary data, secondary data, and metadata. Furthermore, the resulting files are reviewed and edited with regard to the vocabulary used (terminology), bringing them into a form that can be easily processed by both humans and machines.

Mapping scripts, for instance [this mapping script](https://github.com/MarkusSchilling/fnct-data-transformation/blob/main/FNCT_Data_Mapping.ipynb) also provided in the [FNCT Data Transformation repository](https://github.com/MarkusSchilling/fnct-data-transformation), can then be used to transform the data into interoperable data. The developments that emerged from the [ontoFNCT](https://github.com/MarkusSchilling/ontoFNCT) project are well suited for this. Documentation of the ontoFNCT ontology can be found at the respective namespace [https://w3id.org/ontofnct](https://w3id.org/ontofnct).

The working folder path has to be specified manually (see below, third cell): 
```python
folder_path = r'Path_to_your_folder'
```

In [1]:
%%capture
# Installation of necessary Python packages
%pip install pandas
%pip install xlrd

In [2]:
# Import of relevant packages, including specific methods and functions stored in "Module"
import os
import sys
import pandas as pd
import numpy as np
from Module import primary_data_read_in as prim
from Module import metadata_read_in as meta
from Module import secondary_data_read_in as sec

In [None]:
# Specify the folder path where the Excel files to be processed are located.
# On Windows, either use a raw string (r"C:\Path\To\Your\Folder") or escape the
# backslashes with double backslashes ("C:\\Path\\To\\Your\\Folder").
folder_path = r'Path_to_your_folder'
# To use the current working folder instead, call os.getcwd()
# folder_path = os.getcwd()

# Dealing with primary data
# Check if "primary_data" folder exists, otherwise create it
primary_data_folder = os.path.join(folder_path, "primary_data")
if not os.path.exists(primary_data_folder):
    os.makedirs(primary_data_folder)
    print(f"Created '{primary_data_folder}' folder.")

# Dealing with metadata
# Check if "metadata" folder exists, otherwise create it
metadata_folder = os.path.join(folder_path, "metadata")
if not os.path.exists(metadata_folder):
    os.makedirs(metadata_folder)
    print(f"Created '{metadata_folder}' folder.")

# Dealing with secondary data
# Check if "secondary_data" folder exists, otherwise create the folder
secondary_data_folder = os.path.join(folder_path, "secondary_data")
if not os.path.exists(secondary_data_folder):
    os.makedirs(secondary_data_folder)
    print(f"Created '{secondary_data_folder}' folder.")
else:
    print(f"Referring to existing '{secondary_data_folder}' concerning secondary data considerations.")

# List all files in the folder
files = os.listdir(folder_path)

# Filter for Excel files only (only Excel files will be considered)
excel_files = [f for f in files if f.endswith(('.xls', '.xlsx'))]

# Process all Excel files included in specified folder
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    prim.primary_data_read_in(file_path, primary_data_folder)
    meta.metadata_read_in(file_path, metadata_folder)
    sec.secondary_data_read_in(file_path, secondary_data_folder, primary_data_folder)

print("Primary Data of all files processed successfully.")
print("Metadata of all files processed successfully.")
print("Secondary Data of all files processed successfully.")
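
The folder setup and file selection in the cell above could also be written more compactly. The following is a minimal sketch and not part of the repository: the helper names `ensure_subfolders` and `list_excel_files` are hypothetical, and both the case-insensitive extension check and the skipping of Excel lock files (names starting with `~$`) are assumptions about what may be desirable, not behavior of the original cell.

```python
import os
import tempfile


def ensure_subfolders(folder_path, names=("primary_data", "metadata", "secondary_data")):
    """Create the output subfolders if they are missing and return their paths by name."""
    paths = {}
    for name in names:
        path = os.path.join(folder_path, name)
        # exist_ok=True makes the call safe when the folder already exists
        os.makedirs(path, exist_ok=True)
        paths[name] = path
    return paths


def list_excel_files(folder_path):
    """Return the sorted Excel file names in folder_path, ignoring Excel lock files."""
    return sorted(
        f for f in os.listdir(folder_path)
        if f.lower().endswith((".xls", ".xlsx")) and not f.startswith("~$")
    )


# Small demonstration in a temporary folder (no real FNCT data involved)
with tempfile.TemporaryDirectory() as tmp:
    for name in ("test_a.xls", "TEST_B.XLSX", "~$test_a.xls", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    folders = ensure_subfolders(tmp)
    excel_files = list_excel_files(tmp)
    print(excel_files)
```

With such helpers, the processing loop itself would stay unchanged; only the three `if not os.path.exists(...)` blocks and the list comprehension would be replaced.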


## Creation of Input Mask
Another CSV file is created that allows for manual input of missing data or information (e.g., using MS Excel as an editor).\
In another script ([FNCT_Data_Transformation_Postprocess](https://github.com/MarkusSchilling/fnct-data-transformation/blob/main/FNCT_Data_Transformation_Postprocess.ipynb)), this information can be added to the corresponding CSV files.
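
The secondary data CSV is read with two header rows (`header=[0, 1]`), which is why the next cell has to deal with `Unnamed` labels. The following minimal sketch illustrates that behavior with made-up sample content rather than real FNCT data. It assumes that the second header row carries units and is empty for columns without a unit.

```python
import io

import pandas as pd

# Made-up sample: first header row holds names, second header row holds units (assumed)
sample_csv = (
    "Process_ID;Notch depth measured nm\n"
    ";mm\n"
    "P001;1.6\n"
)

df = pd.read_csv(io.StringIO(sample_csv), sep=";", header=[0, 1], dtype=str)
# pandas fills empty cells of the second header row with 'Unnamed: ...' placeholders
print(list(df.columns))

# Replace those placeholder labels with empty strings
df.columns = pd.MultiIndex.from_tuples(
    [(top, "") if "Unnamed" in sub else (top, sub) for top, sub in df.columns]
)
print(list(df.columns))
```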

In [4]:
%%capture
# Read in FNCT_secondary_data file and extract headers 
# Specify the name of the secondary data CSV file
secondary_data_file = "FNCT_secondary_data.csv"
# Path to existing secondary data CSV file
secondary_data_file_path = os.path.join(secondary_data_folder, secondary_data_file)

# Check if the secondary data CSV file exists
if os.path.exists(secondary_data_file_path):
    # Load existing secondary data into DataFrame
    secondary_data = pd.read_csv(secondary_data_file_path, lineterminator="\n", sep=';', header=[0, 1], dtype=str)
else:
    print(f"No existing secondary data found at {secondary_data_file_path}. No specific input mask can be created.")
    sys.exit()

selected_columns = ['Process_ID', 'Specimen_ID', 'Material', 'Medium', 'Residual fracture surface measured AL1', 'Residual fracture surface measured AL2', 'Notch depth measured nm']
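# Work on an explicit copy so that adding columns below does not act on a view of secondary_data
df_secondary_data_selected = secondary_data[selected_columns].copy()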

meta_columns = ["Funding Party", "Funding Party ID", "Grant number"]
# Add these columns to the new dataframe
for col in meta_columns:
    df_secondary_data_selected[col] = pd.NA

# Replace 'Unnamed' second-level header labels (caused by empty header cells) with empty strings
new_columns = []
for col in df_secondary_data_selected.columns:
    new_col = (col[0], '') if 'Unnamed' in col[1] else col
    new_columns.append(new_col)
df_secondary_data_selected.columns = pd.MultiIndex.from_tuples(new_columns)

# Avoid NA and NaN entries by filling them with an empty string
df_secondary_data_selected = df_secondary_data_selected.fillna('')

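# Pre-fill the funding columns from the per-process metadata files, where available
meta_list = os.listdir(metadata_folder)
for idx, ID in enumerate(df_secondary_data_selected['Process_ID']):
    meta_file = ID + "_metadata.csv"
    if meta_file in meta_list:
        # os.path.join keeps the path portable across operating systems
        meta_file_path = os.path.join(metadata_folder, meta_file)
        meta_data = pd.read_csv(meta_file_path, lineterminator="\n", sep=';', header=[0, 1], dtype=str)
        if not meta_data[meta_columns].dropna().empty:
            content = np.array(meta_data[meta_columns]).flatten()
            # The funding columns were appended last, so they are the trailing columns
            df_secondary_data_selected.iloc[idx, -len(meta_columns):] = content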


# Write the final DataFrame to a new CSV file in a newly created folder named 'input'
# Check if "input" folder exists, otherwise create it
input_folder = os.path.join(folder_path, "input")
if not os.path.exists(input_folder):
    os.makedirs(input_folder)
    print(f"Created '{input_folder}' folder.")

output_excel_path = os.path.join(input_folder, 'FNCT_Input_Information.csv')
df_secondary_data_selected.to_csv(output_excel_path, index=False, sep=";", encoding='utf-8-sig')
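
After manual editing, the input mask has to be read back with matching settings. The sketch below shows one way this round trip could look, using a made-up one-row mask in a temporary folder. How the postprocess notebook actually reads the file is not shown here, so the `read_csv` arguments (in particular `keep_default_na=False`) are assumptions.

```python
import os
import tempfile

import pandas as pd

# Made-up input mask with two column levels (name, unit), mimicking the structure above
columns = pd.MultiIndex.from_tuples(
    [("Process_ID", ""), ("Notch depth measured nm", "mm"), ("Funding Party", "")]
)
mask = pd.DataFrame([["P001", "1.6", ""]], columns=columns)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "FNCT_Input_Information.csv")
    # utf-8-sig prepends a byte order mark, which helps MS Excel detect UTF-8
    mask.to_csv(path, index=False, sep=";", encoding="utf-8-sig")

    # Reading with the same encoding strips the byte order mark again;
    # keep_default_na=False keeps empty cells as empty strings instead of NaN
    restored = pd.read_csv(
        path, sep=";", header=[0, 1], dtype=str,
        encoding="utf-8-sig", keep_default_na=False,
    )

# Level-0 names survive the round trip; empty level-1 labels come back as 'Unnamed: ...'
print(list(restored.columns.get_level_values(0)))
```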