# Basic I/O of DFPT
The density functional perturbation theory (DFPT) specific methods and attributes are available in 'file_readwrite.py' since version 2022.x.x.

## Class Crystal_output

Class Crystal_output can be used to extract the harmonic frequency calculation data from the output.

### Methods
**get_qpoint()**  
Get the DFT total energy of the system and the list of q points at which the vibrational frequencies are calculated. Dispersion output (multiple q points) is supported. 

To avoid an ambiguous definition, the search for the DFT total energy is integrated into this method. To account for possible empirical corrections, the total energy reported at the central point of the numerical second-derivative calculation is used instead of the converged value from the final SCF step. Unit: kJ/mol cell
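
As a rough illustration of the unit handling, the sketch below picks the energy from a 'CENTRAL POINT' line and converts it from Hartree to kJ/mol. The conversion factor is the one used in the source attached at the bottom of this page. The helper name `central_point_energy` and the sample lines are hypothetical; they only mimic the layout the parser expects, with the energy as the third whitespace-separated field.

```python
import re

HARTREE_TO_KJMOL = 2625.500256  # factor used in the attached source


def central_point_energy(lines):
    """Return the 'CENTRAL POINT' energy in kJ/mol, or None if it is absent."""
    for line in lines:
        if re.match(r'\s*CENTRAL POINT', line):
            # Third field is assumed to hold the energy in Hartree
            return float(line.split()[2]) * HARTREE_TO_KJMOL
    return None


# Hypothetical output fragment, for illustration only
sample = [
    ' SOME OTHER LINE',
    '       CENTRAL POINT  -1.0000000000E+00    10     0.0000',
]
energy = central_point_energy(sample)
print(energy)
```

Returning `None` when the line is missing is a choice made for this sketch, so that a caller can tell a missing energy from a parsed one.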

**get_mode()**  
Get frequencies of all modes at all the q points sampled. Unit: THz.

**get_eigenvector()**  
Get the eigenvectors of all normal modes. Eigenvectors are normalised to classical amplitudes. Unit: Angstrom

**clean_imaginary()**  
Substitute negative (imaginary) frequencies with numpy NaN values while keeping the original shape of the numpy arrays. Not a standalone method; it must be called after get_mode(). A frequency is treated as negative when it is at or below -0.0001 THz.
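
The masking idea can be sketched with plain numpy. This is a minimal illustration under the stated threshold, not the method itself: the function name `mask_imaginary` is hypothetical, the two negative values are taken from the Form II paracetamol output shown later, and the last value is made up.

```python
import numpy as np

THRESHOLD = -1e-4  # THz, as stated above


def mask_imaginary(frequency, threshold=THRESHOLD):
    """Return a copy with frequencies at or below the threshold set to NaN."""
    cleaned = np.array(frequency, dtype=float, copy=True)
    cleaned[cleaned <= threshold] = np.nan
    return cleaned


# One q point, four modes: two negative, one zero, one positive
freq = np.array([[-1.7656, -0.3511, 0.0, 1.2345]])
cleaned = mask_imaginary(freq)
print(cleaned.shape)      # the array shape is preserved
print(np.isnan(cleaned))  # True marks the masked modes
```

Working on a copy leaves the input untouched, which differs from the in-place substitution described above and is only a convenience for the example.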

### Attributes

* **self.lattice**, \_\_init\_\_, pymatgen structure object, for geometry information.
* **self.edft**, get_edft, float, DFT total energy of the simulation cell. Unit: kJ / mol cell.
* **self.nqpoint**, get_qpoint, int, number of q points where the frequencies are calculated.  
* **self.qpoint**, get_qpoint, nqpoint * 3 numpy array, the fractional coordinates of the q points in the reciprocal lattice.  
* **self.nmode**, get_mode, nqpoint * 1 numpy array, the number of modes at each q point.  
* **self.frequency**, get_mode and clean_imaginary, nqpoint * nmode numpy array, the vibrational frequencies ($\nu$) of each mode at each q point. Unit: THz.

* **self.eigenvector**, get_eigenvector, nqpoint * nmode * natom * 3 numpy array, the eigenvectors of the dynamical matrix for each atom, each vibrational mode and each q point. Classical amplitude (298.15 K), expressed in Cartesian coordinates. Unit: Angstrom.

Note: 

Angular frequency: $\omega = 2\pi\nu$  
Eigenvalue of the dynamical matrix: $\lambda = \omega^{2}$ (in atomic units)
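
The relations in the note can be checked numerically. The sketch below is an illustration only: the function names are hypothetical, the physical constants are standard values, and the sample numbers (19.8656 THz, 0.9116E-05 Hartree**2, 662.6443 cm**-1) are those of mode 3 in the graphene output excerpt shown in the tests. It assumes a non-negative eigenvalue.

```python
import math

C_CM_PER_S = 2.99792458e10       # speed of light in cm/s
HARTREE_TO_CM1 = 219474.6313632  # 1 Hartree expressed in cm^-1


def thz_to_angular(nu_thz):
    """Angular frequency (rad/s) from a frequency in THz: omega = 2 * pi * nu."""
    return 2.0 * math.pi * nu_thz * 1.0e12


def thz_to_wavenumber(nu_thz):
    """Wavenumber (cm^-1) from a frequency in THz."""
    return nu_thz * 1.0e12 / C_CM_PER_S


def eigenvalue_to_wavenumber(eigv_hartree2):
    """Wavenumber (cm^-1) from an eigenvalue in Hartree**2.

    The square root of the eigenvalue is the mode energy in Hartree.
    """
    return math.sqrt(eigv_hartree2) * HARTREE_TO_CM1


nu = 19.8656
print(thz_to_angular(nu))
print(thz_to_wavenumber(nu))
print(eigenvalue_to_wavenumber(0.9116e-05))
```

Both conversions land close to the 662.6443 cm**-1 printed in the output; the small differences come from the rounding of the tabulated values.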

**Modifications are made at the end of the Crystal_output class (lines 737~957). The source code is attached in the bottom cell.**

## Tests

Three tests are performed. The first is for $\Gamma$ point frequencies, the second is for dispersion calculations, and the last contains negative frequencies in order to examine the method `clean_imaginary`. 

1. Form I paracetamol, 'freqf1-r0.out', $\Gamma$ point HA frequencies, with occasional warning messages. nqpoint = 1, nmode = 240, natom = 80.  
2. A graphene primitive cell, 'nostr_modes.out', phonon dispersion calculated along the $\Gamma-K-M$ path. nqpoint = 123, nmode = 6, natom = 2.  
3. Form II paracetamol, 'f2p2q-r1.out', $\Gamma$ point HA frequencies, with imaginary modes. nqpoint = 1, nmode = 480, natom = 160.  

### Get energy, qpoints and frequencies at each qpoint

nqpoint = 1, nmode = 240 for the paracetamol case; nqpoint = 123, nmode = 6 for the graphene case. In both cases, nqpoint should be an integer and nmode should be a numpy array.

```python
from crystal_functions.file_readwrite import Crystal_output

paracetamol = Crystal_output()
paracetamol.read_cry_output('freqf1-r0.out')
paracetamol.get_mode()
print('Paracetamol EDFT:', paracetamol.edft)
print('Paracetamol qpoint:', paracetamol.nqpoint, paracetamol.qpoint)
print('Paracetamol modes:', len(paracetamol.nmode), paracetamol.nmode)
```

```
Paracetamol EDFT: -5402457.523570631
Paracetamol qpoint: 1 [0. 0. 0.]
Paracetamol modes: 1 [240.]
```


```python
graphene = Crystal_output()
graphene.read_cry_output('nostr_modes.out')
graphene.get_mode()
print('Graphene EDFT:', graphene.edft)
print('Graphene qpoints:', graphene.nqpoint)
print('Graphene modes:', len(graphene.nmode), graphene.nmode)
```

```
Graphene EDFT: -28768041.661716238
Graphene qpoints: 123
Graphene modes: 123 [6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.
 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.
 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.
 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.
 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.
 6. 6. 6.]
```


### Get eigenvectors
`self.eigenvector` should be a 1 (nqpoint) * 240 (nmode) * 80 (natom) * 3 (xyz) numpy array for the paracetamol case, and a 123 * 6 * 2 * 3 array for the graphene case.
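
The expected layout can be pictured with a small numpy sketch. It assumes that each mode is first read as one row of 3 * natom displacements (x, y, z of atom 1, then of atom 2, and so on) and uses `reshape` for brevity, whereas the attached source uses `np.split`. The array contents are placeholder numbers, not parsed data.

```python
import numpy as np

nqpoint, nmode, natom = 123, 6, 2  # graphene case from the tests above

# Placeholder for parsed data: one row of 3 * natom displacements per mode
flat = np.arange(nqpoint * nmode * natom * 3, dtype=float)
flat = flat.reshape(nqpoint, nmode, natom * 3)

# Group every three consecutive values into the (x, y, z) of one atom
eigenvector = flat.reshape(nqpoint, nmode, natom, 3)
print(eigenvector.shape)

# x, y, z of atom 2 in the 3rd mode at the 75th q point (0-based indices)
print(eigenvector[74, 2, 1])
```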

Example: find the q point coordinates, frequency and eigenvectors of all atoms in the graphene primitive cell for a given q point (the 75th) and vibrational mode (the 3rd).

```python
paracetamol.get_eigenvector()

print('Paracetamol')
print('nqpoint=', len(paracetamol.eigenvector),
      'nmode=', len(paracetamol.eigenvector[0]),
      'natom=', len(paracetamol.eigenvector[0, 0]),
      'first eigenvector=', paracetamol.eigenvector[0, 0, 0])

graphene.get_eigenvector()
print('Graphene')
print('nqpoint=', len(graphene.eigenvector),
      'nmode=', len(graphene.eigenvector[0]),
      'natom=', len(graphene.eigenvector[0, 0]),
      'first eigenvector=', graphene.eigenvector[0, 0, 0])
```

```
Paracetamol
nqpoint= 1 nmode= 240 natom= 80 first eigenvector= [ 0.02143167  0.         -0.00211671]

ValueError: array split does not result in an equal division
```


```python
print('Wavevector coordinates:', graphene.qpoint[74])
print('Frequency:', graphene.frequency[74, 2])
print('Eigenvectors:', graphene.eigenvector[74, 2])
```

```
Wavevector coordinates: [0.47083333 0.05833333 0.        ]
Frequency: 19.8656
Eigenvectors: [[ 5.2920000e-05  1.0795680e-01 -5.2920000e-05]
 [-5.2920000e-04  1.0615752e-01  5.2920000e-05]]
```


Compare with the crystal output file:

At the start point of $K-M$ path:  
```
 *******************************************************************************

  PHONONS ALONG PATH:   2 NUMBER OF K POINTS:   41

  FROM K  (   2   2   0 ) TO K  (   3   0   0 ) WITH DENOMINATOR    6

  THE POSITION OF THE POINTS IS EXPRESSED IN UNITS        OF DENOMINATOR  240

 *******************************************************************************
```
 
 At the 75th qpoint:  
```
 DISPERSION K POINT NUMBER    75 COORD:  C( 113  14   0 )    WEIGHT:    1.

    MODES         EIGV          FREQUENCIES     IRREP
             (HARTREE**2)   (CM**-1)     (THZ)
    1-   1    0.4928E-05    487.1942   14.6057  (  1)
    2-   2    0.8383E-05    635.4696   19.0509  (  1)
    3-   3    0.9116E-05    662.6443   19.8656  (  1)
    4-   4    0.3933E-04   1376.3503   41.2619  (  1)
    5-   5    0.4131E-04   1410.6170   42.2892  (  1)
    6-   6    0.4439E-04   1462.3150   43.8391  (  1)

 MODES IN PHASE

 FREQ(CM**-1)    487.19    635.47    662.64   1376.35   1410.62   1462.32

 AT.   1 C  X     0.0001    0.0001    0.0001   -0.2006    0.1997    0.0072
            Y     0.0000   -0.0001    0.2040    0.0015   -0.0002   -0.1963
            Z     0.2041   -0.2007   -0.0001    0.0001    0.0001    0.0000
 AT.   2 C  X    -0.0001    0.0001   -0.0010    0.2041    0.1963    0.0003
            Y     0.0000   -0.0001    0.2006   -0.0004   -0.0077    0.1996
            Z     0.2007    0.2041    0.0001    0.0001   -0.0001   -0.0000 
```
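
The wavevector printed by the parser follows from the two excerpts above: the integer coordinates of the 75th point, divided by the denominator 240. A minimal sketch of that step, using the same field positions as the attached source; the function name `parse_qpoint` is hypothetical.

```python
import numpy as np


def parse_qpoint(line, denominator):
    """Fractional q point coordinates from a 'DISPERSION K POINT NUMBER' line."""
    fields = line.split()
    # Fields 7 to 9 hold the three integer coordinates inside 'C( ... )'
    return np.array(fields[7:10], dtype=float) / denominator


line = ' DISPERSION K POINT NUMBER    75 COORD:  C( 113  14   0 )    WEIGHT:    1.'
qpoint = parse_qpoint(line, 240)
print(qpoint)
```

The result, 113/240 and 14/240, agrees with the wavevector coordinates reported for `graphene.qpoint[74]`.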


### Get rid of negative frequencies
A test is performed on a harmonic phonon output with negative frequencies. In principle, the user is expected to redo the calculations to get rid of imaginary modes rather than continue with the thermodynamic analysis. Here we assume the user is not aware of the existence of imaginary modes.

The test is performed on Form II paracetamol at the $\Gamma$ point, stretched by a 2% volumetric strain. There are 160 atoms, 480 modes and 1 q point. The output is attached in the same directory as 'f2p2q-r1.out'. Modes 1-5 are negative, as shown below:

```
    MODES         EIGV          FREQUENCIES     IRREP  IR   INTENS    RAMAN
             (HARTREE**2)   (CM**-1)     (THZ)             (KM/MOL)
    1-   1   -0.7201E-07    -58.8941   -1.7656  (B3g)   I (     0.00)   A
    2-   2   -0.6031E-07    -53.9001   -1.6159  (Ag )   I (     0.00)   A
    3-   3   -0.3525E-07    -41.2052   -1.2353  (Au )   I (     0.00)   I
    4-   4   -0.1932E-07    -30.5064   -0.9146  (B2g)   I (     0.00)   A
    5-   5   -0.2848E-08    -11.7124   -0.3511  (B1g)   I (     0.00)   A
    6-   6   -0.4132E-19      0.0000    0.0000  (B2u)   A (     0.00)   I
```

PS: Never use pob-TZVP basis set for energies, that is miserable :-)

```python
neg_freq = Freq_output('f2p2q-r1.out')
neg_freq.get_mode()
neg_freq.get_eigenvector()
neg_freq.clean_imaginary()

print(neg_freq.frequency[0, :6])
print(neg_freq.eigenvector[0, :6, 0, :])
```

```
[nan nan nan nan nan  0.]
[[       nan        nan        nan]
 [       nan        nan        nan]
 [       nan        nan        nan]
 [       nan        nan        nan]
 [       nan        nan        nan]
 [-0.0152403 -0.         0.       ]]
```


```python
import re

import numpy as np

from crystal_functions.file_readwrite import Crystal_output
from crystal_functions.convert import cry_out2pmg


class Freq_output(Crystal_output):
    """
    Class Freq_output, inherited from Crystal_output, with thermodynamics-
    specific attributes, including:
        self.lattice: __init__, Lattice information
        self.edft: get_edft, DFT total energy, with probable corrections. Unit: kJ / mol cell
        self.nqpoint: get_qpoint, Number of q points
        self.qpoint: get_qpoint, Fractional coordinates of qpoints
        self.nmode: get_mode, Number of vibrational modes at all qpoints
        self.frequency: get_mode, Frequencies of all modes at all qpoints, Unit: THz
        self.eigenvector: get_eigenvector, Eigenvectors (classical amplitude) of
                          all atoms, all modes at all qpoints. Unit: Angstrom
    """
    def __init__(self, output_name):
        """
        Input:
            The name of '.out' file
        Output:
            self, Crystal_functions output object
            self.lattice, pymatgen structure object, lattice and atom information
        """
        super(Freq_output, self).__init__(output_name)
        self.lattice = cry_out2pmg(self, initial=False, vacuum=500)

    def get_edft(self):
        '''
        Get the DFT total energy of simulation cell. Unit: kJ / mol cell. To include
        probable energy corrections, the value at 'CENTRAL POINT' of force constant
        matrix is adopted.
        Input:
            -
        Output:
            self.edft, float, DFT total energy. Unit: kJ / mol cell
        '''
        for line in self.data:
            if re.match(r'\s*CENTRAL POINT', line):
                # Hartree to kJ/mol
                self.edft = float(line.strip().split()[2]) * 2625.500256
                break

        return self.edft

    def get_qpoint(self):
        """
        Get the qpoints at which the phonon frequency is calculated.
        Input:
            -
        Output:
            self.nqpoint, int, Number of q points where the frequencies are calculated.
            self.qpoint, nq * 3 numpy float array, Fractional coordinates of qpoints.
        """
        self.nqpoint = 0
        self.qpoint = np.array([], dtype=float)

        for line in self.data:
            if re.search(r'EXPRESSED IN UNITS\s*OF DENOMINATOR', line):
                shrink = int(line.strip().split()[-1])

            if re.match(r'\s*DISPERSION K POINT NUMBER', line):
                coord = np.array(line.strip().split()[7:10], dtype=float)
                self.qpoint = np.append(self.qpoint, coord / shrink)
                self.nqpoint += 1

        self.qpoint = np.reshape(self.qpoint, (-1, 3))
        # No dispersion block found: Gamma point calculation
        if self.nqpoint == 0:
            self.nqpoint = 1
            self.qpoint = np.array([0, 0, 0], dtype=float)

        return self.nqpoint, self.qpoint

    def get_mode(self):
        """
        Get the vibrational frequencies of all modes and compute the total
        number of vibrational modes (natoms * 3).

        Input:
            -
        Output:
            self.nmode, nqpoint * 1 numpy array, Number of vibration modes at each
                        qpoint.
            self.frequency: nqpoint * nmode numpy float array, Harmonic vibrational
                        frequency. Unit: THz
        """
        if not hasattr(self, 'nqpoint'):
            self.get_qpoint()

        self.frequency = np.array([], dtype=float)

        countline = 0
        while countline < len(self.data):
            is_freq = False
            if re.match(r'\s*DISPERSION K POINT NUMBER\s*\d', self.data[countline]):
                countline += 2
                is_freq = True

            if re.match(r'\s*MODES\s*EIGV\s*FREQUENCIES\s*IRREP', self.data[countline]):
                countline += 2
                is_freq = True

            while self.data[countline].strip() and is_freq:
                line_data = re.findall(r'\-*[\d\.]+[E\d\-\+]*', self.data[countline])
                if line_data:
                    nm_a = int(line_data[0].strip('-'))
                    nm_b = int(line_data[1])
                    freq = float(line_data[4])

                    # Degenerate modes share one line: repeat the frequency
                    for mode in range(nm_a, nm_b + 1):
                        self.frequency = np.append(self.frequency, freq)

                countline += 1

            countline += 1

        self.frequency = np.reshape(self.frequency, (self.nqpoint, -1))
        self.nmode = np.array([len(i) for i in self.frequency], dtype=float)

        return self.nmode, self.frequency

    def get_eigenvector(self):
        """
        Get the mode eigenvectors of all modes on all atoms in the supercell.

        Input:
            -
        Output:
            self.eigenvector, nqpoint * nmode * natom * 3 numpy float array,
                              Eigenvectors expressed in Cartesian coordinate,
                              at all atoms, all modes and all qpoints. Classical
                              amplitude. Unit: Angstrom
        """
        if not hasattr(self, 'nmode'):
            self.get_mode()

        total_mode = np.sum(self.nmode)
        countline = 0
        # Multiple blocks for 1 mode. Maximum 6 columns for 1 block.
        if np.max(self.nmode) >= 6:
            countmode = 6
        else:
            countmode = total_mode

        # Read the eigenvector region as its original shape
        block_label = False
        total_data = []
        while countline < len(self.data) and countmode <= total_mode:
            # Gamma point / phonon dispersion calculation
            if re.match(r'\s*MODES IN PHASE', self.data[countline]) or\
               re.match(r'\s*NORMAL MODES NORMALIZED', self.data[countline]):
                block_label = True
            elif re.match(r'\s*MODES IN ANTI-PHASE', self.data[countline]):
                block_label = False

            # Enter a block
            if re.match(r'\s*FREQ\(CM\*\*\-1\)', self.data[countline]) and\
               block_label:
                countline += 2
                block_data = []
                while self.data[countline].strip():
                    # Trim annotation part (13 characters)
                    line_data = re.findall(r'\-*[\d\.]+[E\d\-\+]*',
                                           self.data[countline][13:])
                    if line_data:
                        block_data.append(line_data)

                    countline += 1

                countmode += len(line_data)
                total_data.append(block_data)

            countline += 1

        total_data = np.array(total_data, dtype=float)

        # Rearrange eigenvectors
        block_per_q = len(total_data) / self.nqpoint
        self.eigenvector = []
        for q in range(self.nqpoint):
            # 1st dimension, nqpoint
            index_bg = int(q * block_per_q)
            index_ed = int((q + 1) * block_per_q)
            q_data = np.hstack([i for i in total_data[index_bg : index_ed]])
            # 2nd dimension, nmode
            q_data = np.transpose(q_data)
            # 3rd dimension, natom
            natom = len(self.lattice.sites)
            # natom = int(self.nmode[0] / 3)
            q_rearrange = [np.split(m, natom, axis=0) for m in q_data]

            self.eigenvector.append(q_rearrange)

        # Bohr to Angstrom
        self.eigenvector = np.array(self.eigenvector) * 0.529177

        return self.eigenvector

    def clean_imaginary(self):
        """
        Substitute imaginary modes and corresponding eigenvectors with numpy
        NaN format and print warning message.

        Input:
            -
        Output:
            cleaned attributes.
            self.frequency
            self.eigenvector
        """
        for q, freq in enumerate(self.frequency):
            # Frequencies are listed in ascending order: check the lowest one
            if freq[0] > -1e-4:
                continue

            print('WARNING: Negative frequencies detected - Calculated thermodynamics might be inaccurate.')
            print('WARNING: Negative frequencies will be substituted with NaN.')

            neg_rank = np.where(freq <= -1e-4)[0]
            self.frequency[q, neg_rank] = np.nan

            if hasattr(self, 'eigenvector'):
                natom = len(self.lattice.sites)
                nan_eigvt = np.full([natom, 3], np.nan)
                self.eigenvector[q, neg_rank] = nan_eigvt

        if hasattr(self, 'eigenvector'):
            return self.frequency, self.eigenvector
        else:
            return self.frequency
```