Although the DataFrames produced by **ASDtools** are similar to those from **astroquery.NIST**, there are a handful of differences. The following sections describe each column and its meaning, and show one way to check the accuracy of the processed data.

```python
import ASDtools as asdt
```

## **Basic DataFrame Format**

The default DataFrame produced by __ASDtools__ has 14 columns:

1. `Element`: The element symbol and ionization for a given row.
2. `Observed (nm)`: The vacuum wavelength at which this line has been observed.
3. `Ritz (nm)`: The Ritz wavelength inferred from the energy levels of the transition.
4. `Transition Type`: The transition type, i.e., which selection rules apply (e.g., E1 for electric dipole, E2 for electric quadrupole, M1 for magnetic dipole).
5. `Rel.`: The qualitative relative intensity of the line.
6. `Ei (eV)`: Energy of the lower level (in eV).
7. `Ek (eV)`: Energy of the upper level (in eV).
8. `Aki`: Spontaneous emission transition probability (Einstein A coefficient).
9. `gi`: Lower level statistical weight.
10. `gk`: Upper level statistical weight.
11. `S`: Line strength (electric dipole matrix element squared).
12. `Acc.`: Accuracy grade of the given transition strength (see the NIST ASD help page).
13. `Lower level`: Electronic configuration, term symbol, and J-value of the lower level.
14. `Upper level`: Electronic configuration, term symbol, and J-value of the upper level.
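Because `Ritz (nm)` is inferred from the level energies, one way to spot-check these columns is to recompute the wavelength from `Ei (eV)` and `Ek (eV)` using λ = hc / (Ek - Ei). The sketch below assumes all three columns are numeric and that the Ritz wavelengths are vacuum wavelengths; `ritz_from_levels`, `ritz_mismatches`, and the default tolerance are illustrative choices, not part of ASDtools.

```python
import numpy as np
import pandas as pd

# hc in eV·nm, built from the exact SI values of h, c and e (about 1239.842)
HC_EV_NM = 6.62607015e-34 * 299_792_458 / 1.602176634e-19 * 1e9


def ritz_from_levels(df: pd.DataFrame) -> pd.Series:
    """Vacuum wavelength (nm) implied by each row's level energies: hc / |Ek - Ei|."""
    delta_e = (df["Ek (eV)"] - df["Ei (eV)"]).abs()
    # A zero or missing energy difference cannot define a wavelength, so it becomes NaN.
    return HC_EV_NM / delta_e.where(delta_e > 0)


def ritz_mismatches(df: pd.DataFrame, rtol: float = 1e-5) -> pd.DataFrame:
    """Rows whose 'Ritz (nm)' value disagrees with the recomputed wavelength."""
    recomputed = ritz_from_levels(df)
    # Only compare rows where both wavelengths are available.
    comparable = recomputed.notna() & df["Ritz (nm)"].notna()
    close = np.isclose(recomputed, df["Ritz (nm)"], rtol=rtol)
    return df[comparable & ~close]
```

For example, `ritz_mismatches(lines.df)` would list any rows that disagree. The tolerance is only a starting point: small energy differences amplify any rounding in the level energies, and air wavelengths would not match this vacuum formula.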

To arrive at the final __ASDtools__ DataFrame, several columns have to be filtered or processed. For some columns, this only requires converting the column to the appropriate data type. Other columns, however, contain <u>_flags_</u>: non-numeric characters that communicate information about the line in question. Although __ASDtools__ has been extensively tested, the numerical values in this DataFrame may still be extracted incorrectly. Passing `keep_original_columns=True` keeps an unprocessed copy of each altered column, so you can check the processed values manually.

```python
lines = asdt.NIST_lines("He I",
                        keep_original_columns=True,
                        update=False)
lines.df.head()
```

|   | Element | Observed_original | Observed (nm) | Ritz_original | Ritz (nm) | Transition Type | Rel._original | Rel. | Ei (eV)_original | Ei (eV) | ... | Aki_original | Aki | gi_original | gi | gk_original | gk | S | Acc. | Lower level | Upper level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | He I | 32.02926 | 32.02926 | 32.02935 | 32.02935 | E1 | 10\* | 10.0 | [20.96421885051] | 20.964219 | ... |  |  | 1 | 1.0 | 9 | 9.0 |  |  | 1s.2p \| 3P\* \| 0 | 2p2 \| 3P \| 0,1,2 |
| 1 | He I | 32.02926 | 32.02926 | 32.02925 | 32.02925 | E1 | 10\* | 10.0 | [20.96409636464] | 20.964096 | ... |  |  | 3 | 3.0 | 9 | 9.0 |  |  | 1s.2p \| 3P\* \| 1 | 2p2 \| 3P \| 0,1,2 |
| 2 | He I | 32.02926 | 32.02926 | 32.02924 | 32.02924 | E1 | 10\* | 10.0 | [20.96408688908] | 20.964087 | ... |  |  | 5 | 5.0 | 9 | 9.0 |  |  | 1s.2p \| 3P\* \| 2 | 2p2 \| 3P \| 0,1,2 |
| 3 | He I | 50.561 | 50.561 | 50.5500354 | 50.550035 | E1 | 2 | 2.0 | [0.00000000] | 0.0 | ... |  |  | 1 | 1.0 | 3 | 3.0 |  |  | 1s2 \| 1S \| 0 | 1s.15p \| 1P\* \| 1 |
| 4 | He I | 50.575 | 50.575 | 50.5684333 | 50.568433 | E1 | 3 | 3.0 | [0.00000000] | 0.0 | ... |  |  | 1 | 1.0 | 3 | 3.0 |  |  | 1s2 \| 1S \| 0 | 1s.14p \| 1P\* \| 1 |


The original columns follow the naming convention "{column}_original" (e.g., `Observed_original`, `Rel._original`). Some columns (e.g., `Element`, `S`, `Lower level`, `Upper level`) have no "original" counterpart, either because they have not been altered or because they are not part of the default __astroquery__ output.
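With `keep_original_columns=True`, part of that manual check can be automated. The sketch below is a rough, independent re-parse, not ASDtools' own logic: it pulls the first number out of each `{column}_original` entry with a regular expression and reports rows where that number disagrees with the processed column. `audit_column` and its tolerance are hypothetical names and choices.

```python
import numpy as np
import pandas as pd

# First plain or scientific-notation number in a raw entry, e.g. "10*" gives "10".
NUMBER = r"([-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?)"


def audit_column(df: pd.DataFrame, processed: str, original: str,
                 rtol: float = 1e-6) -> pd.DataFrame:
    """Rows where the number re-read from `original` disagrees with `processed`."""
    # astype(str) turns missing values into "nan"/"None", which contain no digits.
    raw = df[original].astype(str)
    reparsed = pd.to_numeric(raw.str.extract(NUMBER, expand=False), errors="coerce")
    value = pd.to_numeric(df[processed], errors="coerce")
    # Two missing values count as agreement (e.g. a bare "m" flag with no number).
    agree = np.isclose(reparsed, value, rtol=rtol, equal_nan=True)
    return df.loc[~agree, [original, processed]].assign(reparsed=reparsed[~agree])
```

For example, `audit_column(lines.df, "Rel.", "Rel._original")` would return an empty DataFrame if every processed `Rel.` value matches the number in its raw string.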

## **All NIST-Provided Flags**

If you pass `keep_flag_columns=True`, __ASDtools__ will attempt to extract and filter out "flags" found within the DataFrame. The term "flag" refers to any non-numeric entry in a numeric column. __astroquery__ does not filter these out, so they can interfere with using data directly from the NIST ASD without extra filtering. Flags are stored as a list in a new column named "{column}_flags". Their meanings are all taken from the **[NIST ASD Help Page](https://physics.nist.gov/PhysRefData/ASD/Html/lineshelp.html#OUTRELINT)**.

- `*`: Intensity is shared by several lines (typically, for multiply classified lines).
- `:`: Observed value given is actually the rounded Ritz value, e.g., Ar IV, λ = 443.40 Å.
- `-`: Somewhat lower intensity than the value given.
- `?`: This level/line may not be real.
- `†`: Term assignment of the level is questionable.
- `a`: Observed in absorption.
- `bl`: Blended with another line that may affect the wavelength and intensity.
- `b`: Band head.
- `B`: Line or feature having large width due to autoionization broadening.
- `c`: Complex line.
- `d`: Diffuse line.
- `D`: Double line.
- `E`: Broad due to overexposure in the quoted reference.
- `f`: Forbidden line.
- `g`: Transition involving a level of the ground term.
- `G`: Line position roughly estimated.
- `H`: Very hazy line.
- `hfs`: Line has hyperfine structure.
- `h`: Hazy line (same as 'diffuse').
- `i`: Identification uncertain.
- `j`: Wavelength smoothed along isoelectronic sequence.
- `l`: Shaded to longer wavelengths; NB: This may look like a 'one' at the end of the number!
- `m`: Masked by another line (no wavelength measurement).
- `p`: Perturbed by a close line. Both wavelength and intensity may be affected.
- `q`: Asymmetric line.
- `r`: Easily reversed line.
- `s`: Shaded to shorter wavelengths.
- `t`: Tentatively classified line.
- `u`: Unresolved from a close line.
- `+x`, `+y`, `+z`: The relative positions of the levels within such a system are accurate within experimental uncertainties, but no experimental connection between this system and the other levels of the spectrum has been made.
- `w`: Wide line.
- `x`: Extrapolated wavelength.
- `()`: Theoretical value.
- `[]`: This level was determined by interpolation or extrapolation of known experimental values or by semiempirical calculation.
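ASDtools' own parser is not shown in this tutorial, so the following is only an illustrative sketch of one way to split a raw NIST entry into a numeric value and a list of flags. `split_flags` and the `FLAGS` list are hypothetical, built from the flags documented above. The key design point is matching longer flags such as `bl` and `hfs` before single-letter flags such as `b` and `h`.

```python
from __future__ import annotations

import re

# Flags from the list above (the wrapping "[]" and "()" are handled separately).
FLAGS = ["*", ":", "-", "?", "†", "a", "bl", "b", "B", "c", "d", "D", "E", "f",
         "g", "G", "H", "hfs", "h", "i", "j", "l", "m", "p", "q", "r", "s", "t",
         "u", "+x", "+y", "+z", "w", "x"]
# Try longer flags first so "bl" is not read as "b" + "l", nor "hfs" as "h" + "f" + "s".
FLAG_PATTERN = re.compile("|".join(re.escape(f) for f in sorted(FLAGS, key=len, reverse=True)))
NUMBER = re.compile(r"\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")


def split_flags(entry: str) -> tuple[float | None, list[str]]:
    """Split a raw entry such as "25bl" into its value and flags: (25.0, ["bl"])."""
    text = entry.strip()
    flags: list[str] = []
    # Brackets and parentheses wrap the whole value instead of trailing it.
    for pair in ("[]", "()"):
        if text.startswith(pair[0]) and text.endswith(pair[1]):
            flags.append(pair)
            text = text[1:-1]
    match = NUMBER.search(text)
    value = float(match.group()) if match else None
    # Blank out the number, then collect every known flag in what remains.
    flags.extend(FLAG_PATTERN.findall(NUMBER.sub(" ", text)))
    return value, flags
```

For example, `split_flags("25bl")` returns `(25.0, ['bl'])` rather than `(25.0, ['b', 'l'])`, and `split_flags("m")` returns `(None, ['m'])`.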

```python
lines = asdt.NIST_lines("Ne II",
                        keep_flag_columns=True,
                        keep_original_columns=True,
                        update=False)
lines.df.head()
```

|   | Element | Observed_original | Observed (nm) | Ritz_original | Ritz (nm) | Transition Type | Rel._original | Rel. | Rel._flags | Ei (eV)_original | ... | Aki_original | Aki | gi_original | gi | gk_original | gk | S | Acc. | Lower level | Upper level |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Ne II | 28.6308 | 28.6308 | 28.6308 | 28.6308 | E1 | 6 | 6.0 |  | 0.0 | ... |  |  | 4 | 4.0 | 2 | 2.0 |  |  | 2s2.2p5 \| 2P\* \| 3/2 | 2s2.2p4.(1D).8d \| 2S \| 1/2 |
| 1 | Ne II |  |  | 28.6949 | 28.6949 | E1 | m |  | [m] | 0.09676024 | ... |  |  | 2 | 2.0 | 2 | 2.0 |  |  | 2s2.2p5 \| 2P\* \| 1/2 | 2s2.2p4.(1D).8d \| 2S \| 1/2 |
| 2 | Ne II | 28.8073 | 28.8073 | 28.80732 | 28.80732 | E1 | 12 | 12.0 |  | 0.0 | ... |  |  | 4 | 4.0 | 2 | 2.0 |  |  | 2s2.2p5 \| 2P\* \| 3/2 | 2s2.2p4.(1D).7d \| 2S \| 1/2 |
| 3 | Ne II | 28.872 | 28.872 | 28.8722 | 28.8722 | E1 | 6 | 6.0 |  | 0.09676024 | ... |  |  | 2 | 2.0 | 2 | 2.0 |  |  | 2s2.2p5 \| 2P\* \| 1/2 | 2s2.2p4.(1D).7d \| 2S \| 1/2 |
| 4 | Ne II | 29.0831 | 29.0831 | 29.083 | 29.083 | E1 | 48 | 48.0 |  | 0.0 | ... |  |  | 4 | 4.0 | 2 | 2.0 |  |  | 2s2.2p5 \| 2P\* \| 3/2 | 2s2.2p4.(1D).6d \| 2S \| 1/2 |
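Because each `{column}_flags` entry is a list (such as the `[m]` in row 1 above), flagged lines can be filtered with an ordinary boolean mask. A minimal sketch follows; `has_any_flag` is a hypothetical helper, and it assumes rows without flags hold a missing value or empty list rather than a populated list, as the blank `Rel._flags` cells above suggest.

```python
from __future__ import annotations

from collections.abc import Iterable

import pandas as pd


def has_any_flag(df: pd.DataFrame, column: str, flags: Iterable[str]) -> pd.Series:
    """Boolean mask that is True where `{column}_flags` contains any of `flags`."""
    wanted = set(flags)
    # Unflagged rows may hold None/NaN instead of a list, so check the type first.
    return df[f"{column}_flags"].apply(
        lambda entry: isinstance(entry, (list, tuple)) and not wanted.isdisjoint(entry)
    )
```

For instance, `lines.df[~has_any_flag(lines.df, "Rel.", {"m", "bl"})]` would drop masked and blended lines before any intensity comparison.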


If you want to see which flags were identified in your data, along with their total number of occurrences and their meanings, the `explain_column_flags` helper will attempt to do just that:

```python
lines.explain_column_flags("Rel.")
```

```text
   Flag: *
 Counts: 101
Meaning: Intensity is shared by several lines (typically, for multiply classified lines).

   Flag: ?
 Counts: 7
Meaning: This level/line may not be real.

   Flag: bl
 Counts: 83
Meaning: Blended with another line that may affect the wavelength and intensity.

   Flag: c
 Counts: 3
Meaning: Complex line.

   Flag: l
 Counts: 1
Meaning: Shaded to longer wavelengths; NB: This may look like a 'one' at the end of the number!

   Flag: m
 Counts: 9
Meaning: Masked by another line (no wavelength measurement).

   Flag: s
 Counts: 1
Meaning: Shaded to shorter wavelengths.

   Flag: w
 Counts: 46
Meaning: Wide line.
```
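`explain_column_flags` handles this bookkeeping for you. Purely to illustrate the idea (this is not its actual implementation), similar counts can be produced with pandas by exploding the flag lists and tallying them. `count_flags` and the abbreviated `MEANINGS` dictionary are hypothetical, and the sketch counts every occurrence of a flag, which may differ from how ASDtools counts.

```python
import pandas as pd

# A few meanings copied from the flag list above; extend as needed.
MEANINGS = {
    "*": "Intensity is shared by several lines (typically, for multiply classified lines).",
    "bl": "Blended with another line that may affect the wavelength and intensity.",
    "m": "Masked by another line (no wavelength measurement).",
    "w": "Wide line.",
}


def count_flags(flag_column: pd.Series) -> pd.DataFrame:
    """Count each flag in a `{column}_flags` Series and attach its meaning."""
    # explode() yields one flag per row; unflagged rows become None/NaN and are dropped.
    counts = flag_column.explode().dropna().value_counts().sort_index()
    return pd.DataFrame({
        "Counts": counts,
        "Meaning": [MEANINGS.get(flag, "(unknown)") for flag in counts.index],
    })
```

For example, `count_flags(lines.df["Rel._flags"])` would return a DataFrame indexed by flag, with any flag missing from `MEANINGS` labelled `(unknown)`.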

