# FAIR and scalable management of small-angle X-ray scattering data  
## Module 3: Data analysis and visualization
### 3.1: Lorentzian fit with SAS-tools

> Authors: Torsten Giess, Richard Schoemig 
> Last modified: 19.12.2022

---

### **Abstract** <a class="anchor" name="abstract"></a>

Using the novel packages [sastools](https://github.com/FAIRChemistry/SAS-tools) (version 0.3.2) and [pyAnIML](https://github.com/FAIRChemistry/pyAnIML) (version 1.0.0), as well as packages from the Python 3 standard library, this notebook provides an integrated peak fitting tool based on the Python packages lmfit and SciPy (`scipy.signal`).

---

### **Table of Contents** <a class="anchor" name="table_of_contents"></a>

- [Abstract](#abstract)
- [Workflow](#workflow)
    - [User guide](#user_guide)
    - [Preparation](#preparation)
    - [Lorentzian fit with the internal peak fitting tool](#origin)
- [Disclosure](#disclosure)

---

### **Workflow** <a class="anchor" name="workflow"></a>

The following is the workflow for Module 3.1: Lorentzian fit with SAS-tools of FAIR and scalable management of small-angle X-ray scattering data.

#### **User guide** <a class="anchor" name="user_guide"></a>

This notebook can be used to extract data(sets) stored in the AnIML file and to fit Lorentzian models to their diffraction maxima with the peak fitting tool of sastools.

#### **Preparation** <a class="anchor" name="preparation"></a>

This section contains the necessary preparations for using this module. Code cells in this section are required regardless of which functionality of this notebook is used. First, the required packages from the [Python 3 standard library](https://docs.python.org/3/library/), the Python Package Index ([PyPI](https://pypi.org/)), and *ad hoc* modules of this work are imported. Then, the current date and the current working directory are retrieved and stored in the desired formats.

In [None]:
print("Importing standard library packages.")
from datetime import date
from pathlib import Path
print("Done.")

: 

In [None]:
print("Importing PyPI packages.")
import numpy as np                         # third-party package (PyPI), not part of the standard library
from pyaniml import AnIMLDocument
from sastools.analyzer import PeakFitting
from sastools.readers import SeriesReader
print("Done.")

: 

In [None]:
date_suffix = str(date.today()).replace("-", "")[2:]

: 
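
The slicing above turns the ISO date `YYYY-MM-DD` into a compact `YYMMDD` suffix, the same pattern that appears in the dataset name `fairsaxs_220512` used further below. A minimal sketch of an equivalent formulation with `strftime` follows; the helper names `make_date_suffix` and `make_date_suffix_by_slicing` are hypothetical and not part of sastools or pyAnIML.

```python
from datetime import date


def make_date_suffix(day: date) -> str:
    """Return the date as a compact YYMMDD string using strftime."""
    return day.strftime("%y%m%d")


def make_date_suffix_by_slicing(day: date) -> str:
    """Variant used in this notebook: strip the dashes from the ISO date
    and drop the two century digits."""
    return str(day).replace("-", "")[2:]


example_day = date(2022, 5, 12)
print(make_date_suffix(example_day))             # 220512
print(make_date_suffix_by_slicing(example_day))  # 220512
```

Both variants give the same result; `strftime` states the intended format explicitly, which may be easier to read.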

In [None]:
cwd = Path.cwd()
path_to_datasets = cwd / "datasets"
print(cwd)
print(path_to_datasets)

: 

---

#### **Lorentzian fit with the internal peak fitting tool** <a class="anchor" name="origin"></a>

Extract the data from the AnIML file and store it in a pandas DataFrame.

In [None]:
path_to_AnIML_file = path_to_datasets / "download/fairsaxs_220512/fairsaxs_220512.animl"

: 

In [None]:
with path_to_AnIML_file.open("r") as f:
    xml_string = f.read()                                # read the complete AnIML (XML) file into a string
animl_doc = AnIMLDocument.fromXMLString(xml_string)      # parse the XML string into a pyAnIML document

: 

In [None]:
reader = SeriesReader(animl_doc)

: 

In [None]:
list_of_IDs = reader.available_seriesIDs
for series_ID in list_of_IDs:
    print(series_ID)

: 
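
To make clear what the printed series IDs refer to: in an AnIML document, measured values are stored in `Series` elements, each carrying a `seriesID` attribute. The sketch below lists these IDs using only the standard library. The XML fragment is hand-written and heavily simplified, its series IDs are invented for illustration, and the helper `list_series_ids` is hypothetical; it is not a description of how `SeriesReader` is implemented.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified AnIML-like fragment (illustrative only).
SIMPLIFIED_ANIML = """
<AnIML xmlns="urn:org:astm:animl:schema:core:draft:0.90" version="0.90">
  <ExperimentStepSet>
    <ExperimentStep name="example step" experimentStepID="step_1">
      <Result name="example result">
        <SeriesSet name="example series set" length="2">
          <Series name="q" seriesID="example_q" dependency="independent" seriesType="Float32"/>
          <Series name="I" seriesID="example_I" dependency="dependent" seriesType="Float32"/>
        </SeriesSet>
      </Result>
    </ExperimentStep>
  </ExperimentStepSet>
</AnIML>
"""


def list_series_ids(xml_string):
    """Return the seriesID attribute of every Series element, in document order."""
    root = ET.fromstring(xml_string.strip())
    # "{*}" matches elements in any XML namespace (requires Python 3.8 or newer).
    return [series.get("seriesID") for series in root.findall(".//{*}Series")]


for series_id in list_series_ids(SIMPLIFIED_ANIML):
    print(series_id)
```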

In [None]:
reader.selected_seriesIDs = [list_of_IDs[0]]  # select the first series; append further entries (e.g. list_of_IDs[3]) to select more
dataframe = reader.create_dataframe()
print(dataframe)

: 

In [None]:
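file_name = reader.selected_seriesIDs[0]
# Derive the compound name from the series ID; it is needed below to build the output paths.
if 'OTAB' in file_name:
    compound = 'OTAB'
elif 'OTAC' in file_name:
    compound = 'OTAC'
else:
    compound = 'CholPal'  # assumption: the CholPal directory follows the same naming scheme
print(compound, 'is selected')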

: 

In [None]:
path_to_sastools_peakfitting = path_to_datasets / f'raw/{compound}_measurement_data/Lorentzian_fitting_data/sastools-curvefitting'
path_to_plots = path_to_sastools_peakfitting / 'plots'
path_to_fitting_data = path_to_sastools_peakfitting / 'fitting_data'

: 
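
Whether `PeakFitting` creates missing output directories itself is not stated here, so it may be safer to create them before any results are written. The following sketch shows one way to do this with `pathlib`; `ensure_directories` is a hypothetical helper, and the directories in the example are placeholders inside a temporary folder.

```python
import tempfile
from pathlib import Path


def ensure_directories(*directories):
    """Create each directory (including missing parents) if it does not exist yet."""
    created = []
    for directory in directories:
        directory = Path(directory)
        # exist_ok=True makes the call safe to repeat on existing directories.
        directory.mkdir(parents=True, exist_ok=True)
        created.append(directory)
    return created


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "sastools-curvefitting"
    plots, fitting_data = ensure_directories(base / "plots", base / "fitting_data")
    print(plots.is_dir(), fitting_data.is_dir())  # True True
```

In this notebook, the corresponding call would be `ensure_directories(path_to_plots, path_to_fitting_data)`.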

In [None]:
analyzer = PeakFitting(                         # create a PeakFitting instance and initialize it with:
    dataframe,                                  # the experimental data as pd.DataFrame
    file_name,                                  # the ID of the selected series
    path_to_plots,                              # the output directory for plots
    path_to_fitting_data                        # the output directory for fitting data
)
# analyzer.plot_data()
analyzer.find_peaks_cwt(                        # search for peaks using the CWT method
    peak_widths=5,                              # width(s) of the peaks of interest; alternatively a range, e.g. np.arange(0.5, 10.)
    cutoff_amplitude=0.3
)

: 
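
`find_peaks_cwt` locates candidate peaks before any model is fitted. As a rough illustration of what an amplitude cutoff does, the sketch below uses a plain local-maximum search instead of a continuous wavelet transform. It is a deliberately simpler stand-in and not the method sastools uses; reading `cutoff_amplitude` as a minimum peak height is an assumption, and `find_local_maxima` is a hypothetical helper.

```python
def find_local_maxima(ys, cutoff_amplitude=0.0):
    """Return the indices of interior points that are higher than their left
    neighbour, at least as high as their right neighbour, and not below
    cutoff_amplitude."""
    peaks = []
    for i in range(1, len(ys) - 1):
        is_local_maximum = ys[i] > ys[i - 1] and ys[i] >= ys[i + 1]
        if is_local_maximum and ys[i] >= cutoff_amplitude:
            peaks.append(i)
    return peaks


# Toy intensities with three local maxima of height 1.0, 0.25 and 0.5.
intensities = [0.0, 0.1, 0.2, 1.0, 0.2, 0.1, 0.25, 0.1, 0.0, 0.5, 0.1]

print(find_local_maxima(intensities))                        # [3, 6, 9]
print(find_local_maxima(intensities, cutoff_amplitude=0.3))  # [3, 9]
```

Unlike this sketch, a wavelet-based search takes the expected peak widths into account, which is why `find_peaks_cwt` takes a `peak_widths` argument.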

In [None]:
analyzer.set_specifications_automatically(
    model_type='LorentzianModel',             # model type ('GaussianModel', 'LorentzianModel' or 'VoigtModel')
    tolerance=0.01                            # tolerated deviation of the peak location between the model parameters given by
                                              # the automatic peak finding and the final fit parameters (default is 0.5)
)

: 
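
For orientation, the Lorentzian line shape in the parametrization documented for lmfit's `LorentzianModel` is `f(x) = (amplitude / pi) * sigma / ((x - center)**2 + sigma**2)`. The peak height is then `amplitude / (pi * sigma)` and the full width at half maximum is `2 * sigma`. The sketch below evaluates these relations with the standard library only; the function names are hypothetical and the parameter values are arbitrary.

```python
import math


def lorentzian(x, amplitude, center, sigma):
    """Lorentzian line shape with area `amplitude`, position `center`
    and half width at half maximum `sigma`."""
    return (amplitude / math.pi) * sigma / ((x - center) ** 2 + sigma ** 2)


def peak_height(amplitude, sigma):
    """Value of the Lorentzian at its center."""
    return amplitude / (math.pi * sigma)


def fwhm(sigma):
    """Full width at half maximum of the Lorentzian."""
    return 2.0 * sigma


amplitude, center, sigma = 0.5, 2.6, 0.1  # arbitrary illustrative values

height = peak_height(amplitude, sigma)
half_width = fwhm(sigma) / 2.0

# At a distance of half the FWHM from the center, the curve has dropped to half its height.
print(math.isclose(lorentzian(center, amplitude, center, sigma), height))
print(math.isclose(lorentzian(center + half_width, amplitude, center, sigma), height / 2.0))
```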

In [None]:
# analyzer.set_specifications_manually(           # setting up the specifications for the fitting process
#     number_of_models=1,                         # number of models to fit the data 
#     model_specifications = [                    # specifications for every single model to be used for fitting.
#                                                 # Unlike for the automatically set specifications, different model types can be mixed! (See model type)
#                                                 # number_of_models parameter has to match with the actual number of models provided in the input.
#                                                 # (This restriction will be lifted and excess models will be fitted automatically and used as
#                                                 # 'auxiliary models' for small peaks/noise/biases).
#         [
#             'LorentzianModel',                  # model type ('GaussianModel', 'LorentzianModel' or 'VoigtModel')
#             [2.3, 0.85, 1.],                     # model parameters (center, amplitude, sigma) see also https://lmfit.github.io/lmfit-py/builtin_models.html
#             [2.3, 2.4]                           # restriction parameters (lower and upper bound of center parameter)
#         ],                                      # ...
#         [                                       # ..
#             'LorentzianModel',                  # .
#             [2.6, 0.5, 0.1],
#             [2.55, 2.65]
#         ],
#         [
#             'LorentzianModel',
#             [2.7, 1.7, 0.1],
#             [2.65, 2.75]
#         ],
#         [
#             'LorentzianModel',
#             [2.85, 0.27, 0.1],
#             [2.8, 2.9]
#         ],
#         [
#             'LorentzianModel',
#             [3., 0.1, 0.1],
#             [2.95, 3.05]
#         ],
#         [
#             'LorentzianModel',
#             [3.14, 5., 0.05],
#             [3.1, 3.16]
#         ],
#         [
#             'LorentzianModel',
#             [5.9, 0.1, 0.1],
#             [5.85, 5.95]
#         ],
#         [
#             'LorentzianModel',
#             [6.2, 0.7, 0.1],
#             [6.15, 6.25]
#         ]
#     ]
# )

: 
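
The initial values in the manual specification (center, amplitude, sigma) have to be chosen by hand. One possible way to obtain rough starting values, offered here as an independent sketch and not as part of sastools or lmfit, exploits the fact that the reciprocal of a Lorentzian is a parabola in `x`: a parabola through the highest data point and its two neighbours yields all three parameters. The helper `estimate_lorentzian_parameters` is hypothetical, and the approach assumes a single, isolated peak with positive intensities and little noise.

```python
import math


def lorentzian(x, amplitude, center, sigma):
    """Lorentzian in the (amplitude, center, sigma) parametrization."""
    return (amplitude / math.pi) * sigma / ((x - center) ** 2 + sigma ** 2)


def estimate_lorentzian_parameters(xs, ys):
    """Estimate (center, amplitude, sigma) from the highest interior point
    and its two neighbours.

    Since 1/y = a*x**2 + b*x + c with a = pi / (amplitude * sigma),
    the three coefficients determine the three Lorentzian parameters.
    """
    i = max(range(1, len(ys) - 1), key=lambda k: ys[k])
    x0, x1, x2 = xs[i - 1], xs[i], xs[i + 1]
    z0, z1, z2 = 1.0 / ys[i - 1], 1.0 / ys[i], 1.0 / ys[i + 1]

    # Parabola through the three points (Newton divided differences).
    d1 = (z1 - z0) / (x1 - x0)
    d2 = (z2 - z1) / (x2 - x1)
    a = (d2 - d1) / (x2 - x0)
    b = d1 - a * (x0 + x1)
    c = z0 - a * x0 ** 2 - b * x0

    center = -b / (2.0 * a)
    sigma = math.sqrt(c / a - center ** 2)
    amplitude = math.pi / (a * sigma)
    return center, amplitude, sigma


# Synthetic, noise-free test peak on an evenly spaced grid.
xs = [2.0 + 0.01 * k for k in range(61)]
ys = [lorentzian(x, amplitude=0.85, center=2.3, sigma=0.1) for x in xs]

center, amplitude, sigma = estimate_lorentzian_parameters(xs, ys)
print(round(center, 6), round(amplitude, 6), round(sigma, 6))
```

For noisy or overlapping peaks such an estimate is only a starting point; the bounds on the center parameter in the specification still have to be chosen so that they enclose it.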

In [None]:
analyzer.fit()                                  # fit the model using the generated set of specifications (dict in JSON format)
analyzer.plot_fit()                             # plot the fit result

: 

In [None]:
analyzer.list_of_model_centers()                # list the centers of the fitted models

: 

---

### **Disclosure** <a class="anchor" name="disclosure"></a>

**Contributions**

If you wish to contribute to the FAIR Chemistry project, find us on [GitHub](https://github.com/FAIRChemistry)!

**MIT License**

Copyright (c) 2022 FAIR Chemistry

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.