# FAIR and scalable management of small-angle X-ray scattering data  
## Module 3: Data analysis and visualisation

> Authors: Torsten Giess, Selina Itzigehl, Jan Range, Johanna Bruckner, Juergen Pleiss  
> Last modified: 16.03.2022

---

### **Abstract** <a class="anchor" name="abstract"></a>

Lorem ipsum dolor ...

---

### **Table of Contents** <a class="anchor" name="table_of_contents"></a>

- [Abstract](#abstract)
- [Workflow](#workflow)
    - [User guide](#user_guide)
    - [Preparation](#preparation)
    - [Submodule 3.1&nbsp;&nbsp;&nbsp;Lorentzian fit with Origin](#three_one)
    - [Submodule 3.2&nbsp;&nbsp;&nbsp;Analysis and visualization with Python](#three_two)
- [Disclosure](#disclosure)

---

### **Workflow** <a class="anchor" name="workflow"></a>

The following describes the workflow for Module 3 (Submodules 3.1 and 3.2) of *FAIR and scalable management of small-angle X-ray scattering data*.

#### **User guide** <a class="anchor" name="user_guide"></a>

Lorem ipsum...

#### **Preparation** <a class="anchor" name="preparation"></a>

This section contains the preparations required for using this module. The code cells in this section must be run regardless of which functionality of this notebook is used. First, the required packages are imported from the [Python 3 standard library](https://docs.python.org/3/library/), the Python Package Index ([PyPI](https://pypi.org/)), and the *ad hoc* modules of this work. Then, the current date is stored as a `YYMMDD` string and the path to the `datasets` directory is derived from the current working directory.
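The date suffix (`date_suffix`) computed further below follows a compact `YYMMDD` pattern, which appears to match file names such as `fairsaxs_220316.animl`. A minimal sketch, assuming this pattern is the intended format and using a fixed example date so the result is reproducible, shows that slicing the ISO date string and `strftime("%y%m%d")` give the same result:

```python
from datetime import date

# Fixed example date (16 March 2022, the "last modified" date of this notebook).
example_date = date(2022, 3, 16)

# Approach used in this notebook: remove the dashes from "2022-03-16", then drop the century digits.
suffix_from_slicing = str(example_date).replace("-", "")[2:]

# Equivalent, more explicit formatting: two-digit year, month, day.
suffix_from_strftime = example_date.strftime("%y%m%d")

print(suffix_from_slicing, suffix_from_strftime)
```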

In [1]:
print("Importing standard library packages.")
from datetime import date
from pathlib import Path
from typing import List
print("Done.")

Importing standard library packages.
Done.


In [36]:
print("Importing PyPI packages.")
import numpy as np
import pandas as pd
from plotly import express as px
import plotly.io as pio
pio.renderers.default = "iframe"
from pyaniml import AnIMLDocument
print("Done.")

Importing PyPI packages.
Done.


In [3]:
print("Importing local packages.")
from modules.originreader import LorentzianReader
from modules.tsvwriter import TSVWriter
print("All done.")

Importing local packages.
Initializing logger for 'modules.originreader'.
Loading logger configuration from 'logcfg.json'.
Done.
Initializing logger for 'modules.tsvwriter'.
Loading logger configuration from 'logcfg.json'.
Done.
All done.


In [4]:
# Current date as a YYMMDD string, e.g. "220316" for 16 March 2022
date_suffix = str(date.today()).replace("-", "")[2:]

In [5]:
cwd = Path.cwd()
path_to_datasets = cwd / "datasets"

In [6]:
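def calculate_scattering_vector(d: float) -> float:
    """Return q = 2*pi/d in nm^-1 for a lattice spacing d given in Angstrom."""
    q = (2 * np.pi) / (d / 10)
    return q

def calculate_linear_regression(m: float, x: float, b: float) -> float:
    """Evaluate the straight line y = m*x + b (used to apply the q calibration)."""
    y = m * x + b
    return y

def calculate_lattice_plane(q: float) -> float:
    """Return the lattice spacing d = 2*pi/q (in nm if q is given in nm^-1)."""
    d = (2 * np.pi) / q
    return d

def calculate_lattice_ratio(d: float, d_0: float) -> float:
    """Return the ratio d/d_0 of a lattice spacing to the first-order spacing d_0."""
    d_ratio = d / d_0
    return d_ratio

def determine_phase(d_ratios: List[float], tolerance: float = 0.03) -> str:
    """Assign a phase by comparing the ratios d_n/d_1 with reference sequences.

    Every available ratio must lie within `tolerance` of the reference value at
    the same position. Phases are tested in the order hexagonal, cubic, lamellar.
    """
    # Reference d_n/d_1 sequences: H1 (hexagonal), V1 (cubic), La (lamellar)
    H1 = [
        (1 / np.sqrt(3)),
        (1 / np.sqrt(4)),
        (1 / np.sqrt(7)),
        (1 / np.sqrt(9)),
    ]
    V1 = [
        (1 / np.sqrt(2)),
        (1 / np.sqrt(3)),
        (1 / np.sqrt(4)),
        (1 / np.sqrt(5)),
    ]
    La = [(1 / 2), (1 / 3), (1 / 4), (1 / 5)]

    references = {"hexagonal": H1, "cubic": V1, "lamellar": La}
    for phase, reference in references.items():
        if d_ratios and all(
            abs(ratio - expected) < tolerance
            for ratio, expected in zip(d_ratios, reference)
        ):
            return phase
    return "indeterminate"

def calculate_a_H1(d: float, h: int, k: int) -> float:
    """Return the 2D hexagonal lattice parameter a = d * sqrt(4/3 * (h^2 + k^2 + hk))."""
    a_H1 = d * np.sqrt((4 / 3) * (h**2 + k**2 + h * k))
    return a_H1

def calculate_a_V1(d: float, h: int, k: int, l: int) -> float:
    """Return the cubic lattice parameter a = d * sqrt(h^2 + k^2 + l^2)."""
    a_V1 = d * np.sqrt(h**2 + k**2 + l**2)
    return a_V1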
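The reference sequences hard-coded in `determine_phase` follow from the Miller indices of each lattice. As a sketch for the hexagonal and lamellar cases (the variable names are illustrative only), the hexagonal values can be derived from the same $(h, k)$ indices that are later passed to `calculate_a_H1`, because for a 2D hexagonal lattice $1/d_{hk}^2 = 4(h^2 + hk + k^2)/(3a^2)$:

```python
import numpy as np

# Miller indices (h, k) of the first reflections of a 2D hexagonal lattice,
# ordered by increasing q: (10), (11), (20), (21), (30).
hexagonal_indices = [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0)]

# From 1/d^2 = 4 (h^2 + hk + k^2) / (3 a^2) it follows that
# d_hk / d_10 = 1 / sqrt(h^2 + hk + k^2); the (10) reflection itself is skipped.
hexagonal_ratios = [1 / np.sqrt(h**2 + h * k + k**2) for h, k in hexagonal_indices[1:]]

# A lamellar phase shows equidistant orders n, so d_n / d_1 = 1 / n.
lamellar_ratios = [1 / n for n in range(2, 6)]

print(hexagonal_ratios)
print(lamellar_ratios)
```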

## Mockup To-Do

### Module 3.1
1. Export data from AnIML to TSV (for use in Origin) -> later
2. Import TXT from Origin via LorentzianReader

### Module 3.2
3. Calibration
4. Calculations
5. Visualisation  
    5.1 plotly  
    5.2 matplotlib  
6. Phase diagram

---

#### **Submodule 3.1&nbsp;&nbsp;&nbsp;Lorentzian fit with Origin** <a class="anchor" name="three_one"></a>

lorem ipsum...

Export the scattering vector $q$ and the intensity $I$ to a TSV file for import into Origin:

In [7]:
path_to_AnIML_file = path_to_datasets / "download/fairsaxs_220316.animl"

In [8]:
with path_to_AnIML_file.open("r") as f:
    xml_string = f.read()
    animl_doc = AnIMLDocument.fromXMLString(xml_string)

In [None]:
path_to_TSV_file = path_to_datasets / "processed/test/fairsaxs_220316.tsv"

In [None]:
writer = TSVWriter(animl_doc)

In [None]:
list_of_IDs = writer.available_seriesIDs()
print(list_of_IDs)

In [None]:
writer.add_seriesID([list_of_IDs[0], list_of_IDs[3]])
dataframe = writer.create_dataframe()
writer.create_tsv(df=dataframe, path=path_to_TSV_file)
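`TSVWriter` is an *ad hoc* module of this work whose implementation is not shown here. For readers without it, a rough equivalent with plain pandas might look as follows, assuming that each series is stored as a pair of `<seriesID>_q` and `<seriesID>_i` columns (as in the dataframe printed further below) and that Origin accepts a tab-separated file with a header row. The function name `write_tsv_for_origin` and the example values are hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_tsv_for_origin(df: pd.DataFrame, path: Path) -> None:
    """Write q/I columns to a tab-separated file with a header row (hypothetical helper)."""
    # index=False omits the pandas row index; NaN padding is written as empty fields.
    df.to_csv(path, sep="\t", index=False)


# Made-up values in the assumed "<seriesID>_q" / "<seriesID>_i" column layout.
example = pd.DataFrame({"sample_q": [0.1, 0.2, 0.3], "sample_i": [1.0, 2.0, 3.0]})

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.tsv"
    write_tsv_for_origin(example, target)
    # Read the file back to check that the columns survive the round trip.
    roundtrip = pd.read_csv(target, sep="\t")
    print(target.read_text())
```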

Import the peak positions ($x_c$) of the Lorentzian fits from the Origin TXT output files:

In [9]:
# Sort the files: later cells select them by index, and glob() order is platform-dependent
available_txt_files = sorted((path_to_datasets / "raw/test").glob("*.txt"))
print([file.name for file in available_txt_files])

['CholPal_20220214_lorentz.txt', 'OTAB_078wtp_T060_lorentz.txt', 'OTAB_082wtp_T025_lorentz.txt', 'OTAB_095wtp_T030_lorentz.txt']


In [10]:
dict_of_df = {
    file.name: LorentzianReader(file).get_xc_dataframe()
    for file in available_txt_files
}
print(dict_of_df.keys())

21:35:30 - modules.tsvwriter - DEBUG: Constructor called, 'LorentzianReader'@0x22491af56c0 initialised.
21:35:30 - modules.tsvwriter - DEBUG: Data extracted from 'C:\Users\ac138949\Documents\GitHub\SAXS-workflow\notebooks\datasets\raw\test\CholPal_20220214_lorentz.txt'.
21:35:30 - modules.tsvwriter - DEBUG: Destructor called, 'LorentzianReader'@0x22491af56c0 deleted.
21:35:30 - modules.tsvwriter - DEBUG: Constructor called, 'LorentzianReader'@0x224915f3eb0 initialised.
21:35:30 - modules.tsvwriter - DEBUG: Data extracted from 'C:\Users\ac138949\Documents\GitHub\SAXS-workflow\notebooks\datasets\raw\test\OTAB_078wtp_T060_lorentz.txt'.
21:35:30 - modules.tsvwriter - DEBUG: Destructor called, 'LorentzianReader'@0x224915f3eb0 deleted.
21:35:30 - modules.tsvwriter - DEBUG: Constructor called, 'LorentzianReader'@0x224915f3eb0 initialised.
21:35:30 - modules.tsvwriter - DEBUG: Data extracted from 'C:\Users\ac138949\Documents\GitHub\SAXS-workflow\notebooks\datasets\raw\test\OTAB_082wtp_T025_lor

---

#### **Submodule 3.2&nbsp;&nbsp;&nbsp;Analysis and visualization with Python** <a class="anchor" name="three_two"></a>

lorem ipsum...

In [11]:
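# Literature d-spacings of the CholPal calibration standard (in Angstrom, as expected by calculate_scattering_vector)
d_from_literature = [52.49824535, 26.24912267, 17.49941512]
q_cholpal_literature = [calculate_scattering_vector(d) for d in d_from_literature]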
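`calculate_scattering_vector` divides $d$ by 10 before computing $q = 2\pi/d$, which suggests that the literature spacings are given in Å and that $q$ is obtained in nm⁻¹; `calculate_lattice_plane` then returns $d$ in nm. A minimal check of this assumed unit convention, using the first CholPal literature spacing:

```python
import numpy as np

# Assumed units: literature d-spacing in Angstrom, q in nm^-1.
d_angstrom = 52.49824535  # first CholPal reflection from the cell above

# q = 2*pi / d, with d converted from Angstrom to nm (1 nm = 10 Angstrom).
q_nm = (2 * np.pi) / (d_angstrom / 10)

# Converting back with d = 2*pi / q yields the spacing in nm, not in Angstrom.
d_nm = (2 * np.pi) / q_nm

print(f"q = {q_nm:.4f} nm^-1, d = {d_nm:.4f} nm = {d_nm * 10:.4f} Angstrom")
```

The round trip therefore changes the unit of $d$ by a factor of 10, which is worth keeping in mind when comparing `d_measured` below with the literature values.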

In [12]:
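# Linear calibration: fit the literature q values of the CholPal standard
# against the peak positions fitted in Origin for the CholPal measurement.
calibrant_peaks = dict_of_df[available_txt_files[0].name]["value"].tolist()
slope, intercept = np.polyfit(x=calibrant_peaks, y=q_cholpal_literature, deg=1)

# Apply the calibration line to the peak positions of the sample (third file in available_txt_files).
sample_peaks = dict_of_df[available_txt_files[2].name]["value"].tolist()
q_corrected = [calculate_linear_regression(slope, value, intercept) for value in sample_peaks]
print(q_corrected)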

[2.5338410141474608, 4.432579919820964]
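The calibration fits a straight line that maps the peak positions found for the CholPal standard onto the literature $q$ values, and then applies that line to the peaks of another sample. A self-contained sketch of this two-step procedure with `np.polyfit` and `np.polyval`, using made-up "measured" peak positions (not real data) that deviate from the literature values by an assumed linear distortion:

```python
import numpy as np

# Literature d-spacings of CholPal in Angstrom (from above), converted to q in nm^-1.
d_literature = np.array([52.49824535, 26.24912267, 17.49941512])
q_literature = 2 * np.pi / (d_literature / 10)

# Made-up "measured" peak positions: q_true = 1.02 * q_measured + 0.05.
measured_peaks = (q_literature - 0.05) / 1.02

# Step 1: least-squares fit of q_true = slope * q_measured + intercept.
slope, intercept = np.polyfit(measured_peaks, q_literature, deg=1)

# Step 2: apply the calibration line to peak positions of another sample.
sample_peaks = np.array([2.4, 4.2])  # made-up values for illustration
q_corrected = np.polyval([slope, intercept], sample_peaks)

print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(q_corrected)
```

Because the made-up peaks are exactly linear in the literature values, the fit recovers the assumed slope of 1.02 and intercept of 0.05; with real data the residuals of the fit indicate how well a linear calibration describes the detector geometry.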


In [13]:
d_measured = [calculate_lattice_plane(q) for q in q_corrected]
print(d_measured)

[2.479707792279791, 1.4175007379073652]


In [14]:
d_ratio = [calculate_lattice_ratio(d, d_measured[0]) for d in d_measured[1:]]
print(d_ratio)

[0.5716402320953087]


In [15]:
phase = determine_phase(d_ratio)
if phase == "hexagonal":
    # Miller indices (hk) of the reflections (10), (11), (20), (21), (30)
    h = [1, 1, 2, 2, 3]
    k = [0, 1, 0, 1, 0]
    # One lattice-parameter estimate per measured reflection, then the mean
    a_hex = [calculate_a_H1(d, h_i, k_i) for d, h_i, k_i in zip(d_measured, h, k)]
    phase_information = [phase, np.mean(a_hex)]

elif phase == "cubic":
    # Miller indices (hkl) assigned to the measured reflections
    h = [1, 1, 2, 2, 2]
    k = [0, 1, 0, 1, 2]
    l = [0, 1, 0, 1, 2]
    a_cub = [calculate_a_V1(d, h_i, k_i, l_i) for d, h_i, k_i, l_i in zip(d_measured, h, k, l)]
    phase_information = [phase, np.mean(a_cub)]

elif phase == "lamellar":
    # The lamellar repeat distance is the first-order spacing
    phase_information = [phase, d_measured[0]]

else:
    phase_information = ["indeterminate", "-"]

In [16]:
print(d_measured)

[2.479707792279791, 1.4175007379073652]


In [17]:
print(phase_information)

['hexagonal', 2.849160699291715]
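For the hexagonal phase, each reflection $(hk)$ yields one estimate of the lattice parameter, $a = d_{hk}\sqrt{\tfrac{4}{3}(h^2 + hk + k^2)}$, and the reported value is their mean. Recomputing it from the two spacings printed above, assigned (as in the code) to the (10) and (11) reflections, reproduces the result; the unit is nm if `d_measured` is in nm:

```python
import numpy as np

# Lattice spacings printed above and the Miller indices (10), (11) assigned to them.
d_measured = [2.479707792279791, 1.4175007379073652]
miller_hk = [(1, 0), (1, 1)]

# a = d * sqrt(4/3 * (h^2 + hk + k^2)) for each reflection of a 2D hexagonal lattice.
a_estimates = [d * np.sqrt(4 / 3 * (h**2 + h * k + k**2)) for d, (h, k) in zip(d_measured, miller_hk)]

# The notebook reports the mean over all reflections.
a_mean = float(np.mean(a_estimates))
print([round(a, 4) for a in a_estimates], round(a_mean, 4))
```

The two individual estimates differ by about 1 %, which gives a rough impression of the internal consistency of the peak assignment.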


**Visualisation**

In [18]:
writer = TSVWriter(animl_doc)

21:35:36 - modules.tsvwriter - DEBUG: Constructor called, 'TSVWriter'@0x22491af7d30 initialised.


In [19]:
list_of_IDs = writer.available_seriesIDs()
print(list_of_IDs)

['CholPal_20220214', 'OTAB_078wtp_T060', 'OTAB_082wtp_T025', 'OTAB_095wtp_T030', 'OTAB_100wtp_T090']


In [40]:
writer.add_seriesID(list(list_of_IDs))
dataframe = writer.create_dataframe()

In [54]:
# Collect every column after the first series as a list. The TSVWriter dataframe
# is assumed to alternate "<seriesID>_q" and "<seriesID>_i" columns.
newlist = [dataframe.iloc[:, index].tolist() for index in range(2, len(dataframe.columns))]
new_df = dataframe.iloc[:, [0, 1]]
q_list = newlist[0::2]  # even positions: scattering vectors q
i_list = newlist[1::2]  # odd positions: intensities I

In [41]:
new_df = dataframe.iloc[:, [0, 1]]
print(new_df)
plot_df = pd.concat([new_df, dataframe.iloc[:, [2, 3]]], ignore_index=True)
print(plot_df)

      CholPal_20220214_q  CholPal_20220214_i
0               0.114488        2.772433e-12
1               0.121128        8.453862e-12
2               0.127769        1.096632e-12
3               0.134409        1.764940e-12
4               0.141050        7.023565e-13
...                  ...                 ...
1245                 NaN                 NaN
1246                 NaN                 NaN
1247                 NaN                 NaN
1248                 NaN                 NaN
1249                 NaN                 NaN

[1250 rows x 2 columns]
      CholPal_20220214_q  CholPal_20220214_i  OTAB_078wtp_T060_q  \
0               0.114488        2.772433e-12                 NaN   
1               0.121128        8.453862e-12                 NaN   
2               0.127769        1.096632e-12                 NaN   
3               0.134409        1.764940e-12                 NaN   
4               0.141050        7.023565e-13                 NaN   
...                  ...   
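`plotly.express` draws one trace per series most easily from data in long format (one row per data point, plus a column identifying the series); stacking the wide columns with `pd.concat` along the rows, as in the cell above, leaves most cells empty. A sketch of a reshaping helper, assuming the alternating `<seriesID>_q` / `<seriesID>_i` column naming of the `TSVWriter` dataframe; the name `wide_to_long` and the example values are hypothetical:

```python
import pandas as pd


def wide_to_long(df: pd.DataFrame) -> pd.DataFrame:
    """Reshape "<ID>_q"/"<ID>_i" column pairs into columns series, q, I (hypothetical helper)."""
    frames = []
    for column in df.columns:
        if not column.endswith("_q"):
            continue
        series_id = column[: -len("_q")]
        q_col, i_col = f"{series_id}_q", f"{series_id}_i"
        # Drop the NaN padding of shorter series, then use generic column names.
        pair = df[[q_col, i_col]].dropna().rename(columns={q_col: "q", i_col: "I"})
        frames.append(pair.assign(series=series_id))
    return pd.concat(frames, ignore_index=True)[["series", "q", "I"]]


# Made-up example: two series of unequal length, padded with NaN like the TSVWriter output.
wide = pd.DataFrame({
    "A_q": [0.1, 0.2, 0.3],
    "A_i": [1.0, 2.0, 3.0],
    "B_q": [0.1, 0.2, None],
    "B_i": [5.0, 6.0, None],
})
long_df = wide_to_long(wide)
print(long_df)
```

The result could then be plotted with `px.line(long_df, x="q", y="I", color="series", log_y=True)`; a logarithmic intensity axis is a common choice for SAXS curves but is an assumption here.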

In [21]:
figure = px.line(dataframe, x="CholPal_20220214_q", y="CholPal_20220214_i")

In [22]:
figure.show()

---

### **Disclosure** <a class="anchor" name="disclosure"></a>

**Contributions**

If you wish to contribute to the FAIR Chemistry project, find us on [GitHub](https://github.com/FAIRChemistry)!

**MIT License**

Copyright (c) 2022 FAIR Chemistry

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.