# Exercises for using NumPy
Import the NumPy module

In [1]:
import numpy as np


## Warm-Up Exercises

1. **Create a NumPy array**: Create a NumPy array of integers from 1 to 10.

2. **Array shape**: Find the shape of the array you created.

3. **Array data type**: Find the data type of the array. Try to convert it to a different data type.

4. **Array operations**: Perform basic arithmetic operations (addition, subtraction, multiplication, division) on the array.

5. **Reshape array**: Reshape the array into a 2x5 matrix.

6. **Indexing and slicing**: Access the third element of the array and slice the array to get the first 5 elements.

7. **Array statistics**: Calculate the sum, mean, and standard deviation of the array.

8. **Boolean indexing**: Create a boolean array that selects only the even numbers from the original array.

9. **Broadcasting**: Add a scalar (e.g., 5) to the original array using broadcasting.

In [2]:
# 1
arr1 = np.arange(1, 11)

# 2
print(arr1.shape)

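# 3 (arr1.astype(float) would convert it to another data type)
print(arr1.dtype)

# 4 (only subtraction is shown; +, * and / work element-wise in the same way)
print(arr1 - np.random.rand(10))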

# 5
print(np.reshape(arr1, (2,5)))

# 6
print(arr1[2], arr1[:5])

# 7
print(np.sum(arr1), np.mean(arr1), np.std(arr1))

# 8
print(arr1[arr1 % 2 == 0])

# 9
print(arr1 + 5)


(10,)
int64
[0.97003961 1.35910972 2.50627871 3.87222668 4.45183602 5.18331718
 6.81442792 7.44675419 8.46672628 9.56540818]
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
3 [1 2 3 4 5]
55 5.5 2.8722813232690143
[ 2  4  6  8 10]
[ 6  7  8  9 10 11 12 13 14 15]
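
Exercises 3 and 4 also ask for a data type conversion and for all four arithmetic operations. A minimal sketch of those parts follows; the names `arr`, `arr_float` and `quotient` are chosen here for illustration only.

```python
import numpy as np

arr = np.arange(1, 11)

# Exercise 3: astype returns a converted copy; the original array keeps its dtype
arr_float = arr.astype(np.float64)
print(arr.dtype, arr_float.dtype)

# Exercise 4: the arithmetic operators act element-wise
print(arr + arr)  # addition
print(arr - 1)    # subtraction
print(arr * 2)    # multiplication

# True division of an integer array yields a floating-point array
quotient = arr / 2
print(quotient.dtype)
```

The default integer type printed for `arr` depends on the platform and NumPy version, which is why no fixed output is shown for it.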


## Working with XRD data
The file `../Data/exercise/fco.txt` contains a truncated output file from a crystallographic refinement. For simplicity, you can rename the variables to $A$-$F$, as the crystallographic context is not essential for these exercises.

Here are the exercises for working with the single-crystal X-ray diffraction (SC-XRD) data:

1. **Read the data**: Use NumPy to read the data from the file `../Data/exercise/fco.txt` mentioned above. The data contains six columns:
    - $h$ : Integer, Miller index
    - $k$ : Integer, Miller index
    - $l$ : Integer, Miller index
    - $F^2_\text{calc,i}$ : Calculated scaled intensity from the model
    - $F^2_\text{obs,i}$ : Observed scaled intensity from the measurement
    - $\sigma_i$ : Estimated standard deviation of the scaled observed intensity

Note: The data file already contains the squared values, so do not square them again.


In [3]:
h, k, l, fsq_calc, fsq_obs, sigma = np.loadtxt('../Data/exercise/fco.txt', unpack=True, dtype='i8,i8,i8,f8,f8,f8')
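
An alternative way to read the file, sketched here on made-up in-memory data rather than on `fco.txt`, is to load every column as a float and cast the Miller indices afterwards. The three reflections below are invented for illustration and only assume the six-column order listed above.

```python
import io
import numpy as np

# Three made-up reflections in the assumed column order: h k l F2calc F2obs sigma
text = """\
1 0 0 10.50 11.20 0.40
0 2 0 250.00 243.70 3.10
-1 1 2 0.80 1.10 0.35
"""

# Read everything as floats; unpack=True returns one array per column
h, k, l, fsq_calc, fsq_obs, sigma = np.loadtxt(io.StringIO(text), unpack=True)

# Cast the Miller indices back to integers so that % and == behave as expected
h, k, l = (index.astype(int) for index in (h, k, l))

print(h, k, l)
print(fsq_obs / sigma)
```

For the real file, the `io.StringIO(text)` argument would simply be replaced by the file path.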


2. **Calculate the mean I/sigma**: Calculate the mean of the ratio of observed intensity to the estimated standard deviation for the dataset. The formula for calculating the mean is:

   $$\overline{I/\sigma} = \frac{\sum_{i=1}^{n} \frac{F^2_\text{obs,i}}{\sigma_i}}{n}$$

   where $F^2_\text{obs,i}$ is the observed absolute squared structure factor and $\sigma_i$ is the estimated standard deviation for the $i$th data point.


In [4]:
np.mean(fsq_obs / sigma)

np.float64(22.894137052287043)
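
To make the link between the formula and the one-line solution explicit, the following sketch compares a literal loop with the vectorised expression. The numbers are made up and are not taken from `fco.txt`. It also shows that the mean of the ratios is not the same quantity as the ratio of the means.

```python
import numpy as np

# Made-up values for illustration only
fsq_obs = np.array([11.2, 243.7, 1.1, 58.0])
sigma = np.array([0.4, 3.1, 0.35, 1.6])

# Literal translation of the formula: sum the ratios, then divide by n
total = 0.0
for intensity, esd in zip(fsq_obs, sigma):
    total += intensity / esd
mean_loop = total / len(fsq_obs)

# Vectorised form used in the solution
mean_vectorised = np.mean(fsq_obs / sigma)

# A different quantity: the ratio of the two means
ratio_of_means = np.mean(fsq_obs) / np.mean(sigma)

print(mean_loop, mean_vectorised, ratio_of_means)
```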


3. **Calculate $R_1$**: Calculate the R1 value for the dataset. R1 is a measure of the agreement between the observed and modelled absolute squared structure factors. The formula for calculating R1 is:

   $$R_1 = \frac{\sum_{i=1}^{n} \left| F^2_\text{calc,i} - F^2_\text{obs,i} \right|}{\sum_{i=1}^{n} \left| F^2_\text{obs,i} \right|}$$

   where $F^2_\text{obs,i}$ is the observed absolute squared structure factor and $F^2_\text{calc,i}$ is the modelled absolute squared structure factor for the $i$th data point.


In [5]:
np.sum(np.abs(fsq_calc - fsq_obs)) / np.sum(np.abs(fsq_obs))

np.float64(0.08166633291590505)
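
A quick way to check the expression is to run it on values small enough to verify by hand. The three reflections below are made up for that purpose.

```python
import numpy as np

# Made-up values chosen so the result can be checked by hand
fsq_calc = np.array([10.0, 20.0, 30.0])
fsq_obs = np.array([12.0, 18.0, 30.0])

# Numerator: |10-12| + |20-18| + |30-30| = 4; denominator: 12 + 18 + 30 = 60
r1 = np.sum(np.abs(fsq_calc - fsq_obs)) / np.sum(np.abs(fsq_obs))
print(r1)
```

The printed value should equal 4/60.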


4. **Calculate wR2**: Calculate the weighted R2 value for the dataset. wR2 is another measure of the agreement between the observed and modelled absolute squared structure factors, taking into account the estimated standard deviation. The formula for calculating wR2 is:

   $$wR_2 = \sqrt{\frac{\sum_{i=1}^{n} \left( \frac{F^2_\text{obs,i} - F^2_\text{calc,i}}{\sigma_i} \right)^2}{\sum_{i=1}^{n} \left( \frac{F^2_\text{obs,i}}{\sigma_i} \right)^2}}$$

   where $F^2_\text{obs,i}$ is the observed absolute squared structure factor, $F^2_\text{calc,i}$ is the modelled absolute squared structure factor, and $\sigma_i$ is the estimated standard deviation for the $i$th data point.


In [6]:
np.sqrt(np.sum(((fsq_calc - fsq_obs) / sigma)**2) / np.sum((fsq_obs/sigma)**2))

np.float64(0.15807699516903526)
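
The effect of the $\sigma_i$ weighting can be seen in a small sketch. The function name `wr2` and all numbers are illustrative assumptions, and the function implements only the formula given above.

```python
import numpy as np

def wr2(fsq_calc, fsq_obs, sigma):
    """Weighted R2 following the formula above (each term divided by sigma)."""
    numerator = np.sum(((fsq_obs - fsq_calc) / sigma) ** 2)
    denominator = np.sum((fsq_obs / sigma) ** 2)
    return np.sqrt(numerator / denominator)

# Made-up values for illustration only
fsq_calc = np.array([10.0, 20.0, 30.0])
fsq_obs = np.array([12.0, 18.0, 30.0])

uniform = wr2(fsq_calc, fsq_obs, np.array([1.0, 1.0, 1.0]))
# Only the first reflection's sigma is increased, so its misfit counts less
downweighted = wr2(fsq_calc, fsq_obs, np.array([4.0, 1.0, 1.0]))

print(uniform, downweighted)
```

With these inputs the second value is smaller, because the poorly determined first reflection contributes less to the sum.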

## Bonus exercise: Systematic absences
The high-temperature phase of the molecule crystallises in the space group $C2/c$, while the low-temperature phase crystallises in $P2_1/c$. Decide which space group this structure belongs to by calculating the $\overline{I/\sigma}$ of the reflections that should be absent in each space group.
Use the following information:
 - For $C2/c$ the following reflections should be absent, with $n$ being any integer value:
    - C-centred lattice: $h + k = 2n + 1$ 
    - $c$-glide plane perpendicular to b: $k = 0$ and $l = 2n + 1$
 - For $P2_1/c$ the following reflections should be absent, with $n$ being any integer value:
    - $2_1$ screw axis parallel to b: $h = 0$ and $k = 2n + 1$ and $l = 0$
    - $c$-glide plane perpendicular to b: $k = 0$ and $l = 2n + 1$

Hint: Use the modulo (`%`) operator to test whether the division by two leaves a remainder of one.

In [7]:
c_centred_condition = (h + k) % 2 == 1
c_glide_condition = (k == 0) & (l % 2 == 1)

c2ovc_condition = np.logical_or(c_glide_condition, c_centred_condition)
print('In C2/c the mean I/sigma for values that should be zero is ', np.mean(fsq_obs[c2ovc_condition] / sigma[c2ovc_condition]))

screw_condition = (h == 0) & (k % 2 == 1) & (l == 0)
p21ovc_condition = np.logical_or(c_glide_condition, screw_condition)
print('In P2(1)/c the mean I/sigma for values that should be zero is ', np.mean(fsq_obs[p21ovc_condition] / sigma[p21ovc_condition]))

In C2/c the mean I/sigma for values that should be zero is  15.923100074221722
In P2(1)/c the mean I/sigma for values that should be zero is  0.6655857510376365
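
Read against the output above, a mean $I/\sigma$ of about 16 for the reflections forbidden in $C2/c$ means that these reflections are clearly observed, whereas a value below 1 for $P2_1/c$ is consistent with true absences, which points to $P2_1/c$. The sketch below applies the same masks to a handful of made-up reflections (not taken from `fco.txt`) so that each condition can be checked by eye. It also shows that `%` handles negative Miller indices as needed.

```python
import numpy as np

# Made-up reflections for illustration only
h = np.array([1, 2, -3, 0, 0, 1])
k = np.array([0, 0, 0, 3, 2, 2])
l = np.array([1, 2, -1, 0, 0, 0])

# For a positive divisor, % never returns a negative value in Python or NumPy,
# so negative odd indices also give a remainder of 1
print(np.array([-3, -1, 3]) % 2)

c_centred = (h + k) % 2 == 1
c_glide = (k == 0) & (l % 2 == 1)
screw = (h == 0) & (k % 2 == 1) & (l == 0)

# | is the operator form of np.logical_or for boolean arrays
absent_c2c = c_centred | c_glide
absent_p21c = screw | c_glide

print(absent_c2c)
print(absent_p21c)
```

In this toy set only the last reflection, (1, 2, 0), is forbidden in $C2/c$ but allowed in $P2_1/c$, so it is the kind of reflection that distinguishes the two space groups.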
