# Data Reduction Input File Generator

Author: Eike Gericke

This script generates an input file for the script >pyASAXS_DataReduction<. \
It scans a specified directory (data_dir) and lists all files of the target file type (*.dat) that it contains. \
These target files are assumed to be ASCII files generated from HDF5 files from the PTB-FCM beamline (Physikalisch-Technische Bundesanstalt, Four-Crystal Monochromator beamline). Each file has a header (about 375 lines) containing the initial positions and values of all motors and detectors in the beamline. The header is always followed by a table with one row per experimental point. For SAXS and ASAXS experiments, each row holds the information relevant to a single scattering image. 
\
The information to read from the files is defined in >deviceList<. \
Each piece of information is first searched for in the table, on the assumption that it varies between rows. If it is not in the table, the header is searched. If it is not in the header either, >NaN< is written. 
\
Information to be set by hand later is defined in >deviceByHandList<. \
Information with a standard value is defined in >deviceStandardNameList< and >deviceStandardValueList<. \
\
Finally, an output ASCII file is generated containing the information relevant for the data treatment. \
Some changes (such as filling the NaN entries with reasonable values and defining sample names) must be made by the user. 
\
\
The following information about the files must be entered: \
The file directory \
The file type, e.g. .dat \
The length of the data header \
The structure of entries in the header (separators, length of the value after the identifier)
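
The lookup order described above (table column first, then header, then NaN) can be sketched on a small in-memory example. The miniature file content, the header layout `\t<name>\t = <value>`, and the helper name `lookup_device` are assumptions made for illustration; this is a simplified sketch, not the code used in the cells of this notebook.

```python
import io
import math

import pandas as pd

# Hypothetical miniature file: two header lines, then a tab-separated table.
sample_text = (
    "#\tEnergy\t = 7000.002692\n"
    "#\tPilatusPos\t = 0.802\n"
    "# Time\tKeysight1\n"
    "1.0\t0.11\n"
    "2.0\t0.12\n"
)
n_header = 2  # number of header lines before the table (assumed)

# Read the table and strip the leading '# ' from the first column name.
table = pd.read_csv(io.StringIO(sample_text), sep="\t", skiprows=n_header)
table = table.rename(columns={table.columns[0]: table.columns[0][2:]})


def lookup_device(text, table, device, before="\t", after="\t = "):
    """Return the table column if present, else the header constant, else NaN."""
    if device in table.columns:
        return table[device]
    marker = before + device + after
    pos = text.find(marker)
    if pos == -1:
        return math.nan
    # Take everything after the marker up to the end of that line.
    value = text[pos + len(marker):].split("\n", 1)[0]
    return float(value)


print(lookup_device(sample_text, table, "Time").tolist())  # varies per row
print(lookup_device(sample_text, table, "Energy"))         # constant from header
print(lookup_device(sample_text, table, "NotThere"))       # neither: NaN
```

Unlike the notebook code, this sketch reads the header value up to the end of the line instead of using a fixed number of characters, which is one possible alternative when the value length is not known in advance.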

## Libraries

In [1]:
import pandas as pd
import os

## Data location and structure

In [3]:
# give the metadata directory (built with os.path.join to avoid backslash escapes in the string)
data_dir = os.path.join('files', 'Data_for_ASAXS')

# give the target file type:
target_file_type = '.dat'

# give the output file name
outputFileName = data_dir.split(os.sep)[-1] + '_ASCII_out.txt'

# Concerning the header
# Number of header lines
header_lines = 375
# Structure of information in header
seperatorBeforeInfo = '\t'
seperatorAfterInfo = '\t = '
digitsAfterIdentifier = 12 # this is the length of the entry after the identifier you are searching for


# List of experimental devices/information necessary for the data reduction
deviceList = ['Time','VacSampleX', 'VacSampleY','Energy','Pilatus_Tiff','Pilatus_Trigger','Pilatus_filename','PilatusAcqTime','Keysight1','Keysight2','Keysight3','Keysight4','Keysight1:StandardDeviation','Keysight2:StandardDeviation','Keysight3:StandardDeviation','Keysight4:StandardDeviation','PilatusPos']
# List of values to set by hand
deviceByHandList = ['SampleName','SampleThickness','Empty_Index','Background_Index','Reference_Index','MaskFile']
# List of values to change by hand if necessary
deviceStandardNameList = ['BackGround_SubtractionFactor','x_Center','y_Center','SampleToDet_Distance']
deviceStandardValueList = ['1.0','457.0','557.0','0.802'] # x/y Center in pixel, distance in meter
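
The two standard-value lists are paired by position, so they need to have the same length and the same order. A minimal sketch of a consistency check follows; the dictionary name `standard_values` is hypothetical and not part of the original script, and the two lists are repeated here only to keep the example self-contained.

```python
# Values copied from the configuration cell above.
deviceStandardNameList = ['BackGround_SubtractionFactor', 'x_Center', 'y_Center', 'SampleToDet_Distance']
deviceStandardValueList = ['1.0', '457.0', '557.0', '0.802']

# A length mismatch would silently drop or misassign standard values later on.
if len(deviceStandardNameList) != len(deviceStandardValueList):
    raise ValueError('deviceStandardNameList and deviceStandardValueList differ in length')

# Pair each name with its value by position (hypothetical helper dictionary).
standard_values = dict(zip(deviceStandardNameList, deviceStandardValueList))
print(standard_values)
```

Keeping names and values in a single dictionary is one way to avoid the two lists drifting apart when entries are added or removed.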

## Data handling

In [6]:
# scan the directory
fileList = list()
for entry in os.scandir(data_dir):
    if entry.path.endswith(target_file_type) and entry.is_file():
        fileList.append(entry.path)
# make a path iterator from the found files
fileIterator = iter(fileList)
metaFrame=pd.DataFrame()

for element in fileIterator:
    path_to_data = element
    print(element)
    dF = pd.read_csv(path_to_data, sep='\t',header=header_lines)
    dF = dF.drop([0]) # drop the NaN lines
    # assuming there is a '# ' directly before the first column name, drop it
    dfInitialColumnName = dF.columns[0]
    dfNewColumnName = dF.columns.tolist()[0][2:]
    dF = dF.rename(columns={dfInitialColumnName: dfNewColumnName}) # get rid of the '# ' in the first column name
    list_dF_columns = dF.columns.values.tolist()
    
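    deviceIterator = iter(deviceList)
    device = next(deviceIterator)
    if device in list_dF_columns:
        # the first device starts the frame that all other devices are joined to
        deviceFrame = pd.DataFrame({device:dF[device]})
    else:
        # without the first device there is no frame to attach the others to, so skip this file
        print('Error: First device is not in experimental data:',element)
        continue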
    
    
    for device in deviceIterator:
        if device in list_dF_columns:
            deviceFrame = deviceFrame.join(dF[device])
        else:
            searchString = seperatorBeforeInfo + device + seperatorAfterInfo
            InfoStringLength = len(device) + len(seperatorAfterInfo)
        
            # read the file once; the with-block closes it again
            with open(path_to_data, 'r') as dataFile:
                fileText = dataFile.read()
            resultPosition = fileText.find(searchString)
        
            if resultPosition != -1:
                # cut out the value that follows the identifier in the header
                valueStart = resultPosition + InfoStringLength
                result = fileText[valueStart:valueStart + digitsAfterIdentifier]
                value = float(result)
                deviceFrame[device] = value
                print('Device >>>', device, '<<< is taken as constant from the header with a value of', value)
            else:
                deviceFrame[device]='NaN'
                print('Error: Device >>>',device, '<<< not in the data and not in header taken as NaN :(')
                
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    metaFrame = pd.concat([metaFrame, deviceFrame])

metaFrame = metaFrame.reset_index(drop=True)
for device in deviceByHandList:
    metaFrame[device] = 'NaN'
for device, standardValue in zip(deviceStandardNameList, deviceStandardValueList):
    metaFrame[device] = standardValue
    

out_path = os.path.join(data_dir, outputFileName)
metaFrame.to_csv(out_path, sep='\t')

files\Data_for_ASAXS\fcm_2020kw29tg14_00091.dat
Device >>> Energy <<< is taken as constant from the header with a value of 7000.002692
files\Data_for_ASAXS\fcm_2020kw29tg14_00093.dat
Device >>> Energy <<< is taken as constant from the header with a value of 8003.995109
files\Data_for_ASAXS\fcm_2020kw29tg14_00095.dat
Device >>> Energy <<< is taken as constant from the header with a value of 8238.985877
files\Data_for_ASAXS\fcm_2020kw29tg14_00097.dat
Device >>> Energy <<< is taken as constant from the header with a value of 8303.985331
files\Data_for_ASAXS\fcm_2020kw29tg14_00099.dat
Device >>> Energy <<< is taken as constant from the header with a value of 8323.985784
files\Data_for_ASAXS\fcm_2020kw29tg14_00101.dat
Device >>> Energy <<< is taken as constant from the header with a value of 8329.990438
files\Data_for_ASAXS\fcm_2020kw29tg14_00103.dat
Device >>> Energy <<< is taken as constant from the header with a value of 8973.015959
files\Data_for_ASAXS\fcm_2020kw29tg14_00104.dat
Device 