### Text parsing examples

This notebook demonstrates methods from the Python standard library to:

+ find specific text within a larger text string
+ replace the text with something else

Methods on Python `str` objects are used first, for exact matches.
Regular expressions from the `re` module are then used where the text has to be matched by pattern.
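
As a rough preview of what pattern matching adds over exact matching, the sketch below finds and replaces text with `re.search` and `re.sub`. The sample string and the patterns are assumptions chosen for illustration only. They are far too loose to validate real email addresses.

```python
import re

# Hypothetical sample text, used only for this illustration
text = "Contact: user@example.com"

# re.search returns a Match object for the first match, or None if there is none
match = re.search(r"(\w+)@([\w.]+)", text)
local_part = match.group(1)  # text captured by the first pair of brackets
domain = match.group(2)      # text captured by the second pair of brackets

# re.sub replaces every match of the pattern with the replacement text
masked = re.sub(r"\w+@", "***@", text)

print(local_part, domain, masked)
```

The brackets in the pattern define groups, so the parts of the match can be pulled out separately.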

The [pythex](https://pythex.org/) website is a handy place to test regular expressions interactively, and it includes a cheatsheet.

Beware: regular expressions are notoriously fiddly and can make a problem harder rather than easier (see https://xkcd.com/1171/).

For further reading, see the [Real Python Regular Expression tutorial](https://realpython.com/regex-python/).

In [14]:
from pathlib import Path
import re

In [15]:
# Simple text string
EMAIL_ADDRESS = "user@example.com"

# A Volcanic Ash Advisory notice is a more complicated text string
VAA_TEXT = Path('VAA_EXAMPLE.DAT').read_text()
print(VAA_TEXT)

FVFE01 RJTD 090552                                              2014068 0553
VA ADVISORY
DTG: 20140309/0552Z
VAAC: TOKYO
VOLCANO: SAKURAJIMA 0802-08
PSN: N3135E13040
AREA: JAPAN
SUMMIT ELEV: 1060M
ADVISORY NR: 2014/90
INFO SOURCE: MTSAT-2
AVIATION COLOUR CODE: NIL
ERUPTION DETAILS: VA CONTINUOUSLY OBS ON SATELLITE IMAGERY.
OBS VA DTG: 09/0515Z
OBS VA CLD: SFC/FL120 N3105 E13115 - N3125 E13150 - N3115 E13210 -
N3
130 E13235 - N3115 E13245 - N3055 E13205 - N3100 E13150 - N3050
E1312
0 MOV SE 25KT
FCST VA CLD +6 HR: 09/1115Z SFC/FL110 N3010 E13225 - N3115 E13730 -
N
2945 E13730 - N2900 E13500 - N2900 E13230
FCST VA CLD +12 HR: 09/1715Z SFC/FL090 N2830 E13350 - N2835 E13720 -
N3030 E14105 - N2855 E14150 - N2705 E13905 - N2700 E13400
FCST VA CLD +18 HR: 09/2315Z SFC/FL080 N2735 E14035 - N2950 E14440 -
N2820 E14555 - N2545 E14200 - N2455 E13455 - N2620 E13455
RMK: NIL
NXT ADVISORY: 20140309/1200Z=



### String methods

In [16]:
# Check that text exists
"user" in EMAIL_ADDRESS

True

In [17]:
# Check where it is found
EMAIL_ADDRESS.find("example")

5
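
The result above is a zero-based index. It is worth knowing what happens when the text is absent: `str.find` returns `-1`, whereas the related `str.index` raises a `ValueError`. A minimal sketch, using a local `email` variable (an assumption, set to the same value as `EMAIL_ADDRESS`) so that it runs on its own:

```python
email = "user@example.com"  # same value as EMAIL_ADDRESS in the notebook

found_at = email.find("example")  # index of the first character of the match
missing = email.find("missing")   # -1 signals "not found"

# str.index behaves like str.find, but raises when the text is absent
try:
    email.index("missing")
except ValueError:
    index_raised = True
else:
    index_raised = False

print(found_at, missing, index_raised)
```

Because `-1` is also a valid index (the last character), check the result of `find` before using it to slice.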

In [18]:
# Use .casefold on both sides for case-insensitive comparison
"USER".casefold() in EMAIL_ADDRESS.casefold()

True
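
For plain ASCII text, `.casefold` and `.lower` give the same result. The difference shows up with some non-ASCII characters, as in this small sketch (the German example words are assumptions picked for illustration):

```python
german = "Straße"
upper = "STRASSE"

# .lower keeps the "ß" character, so the two strings still differ
lower_equal = german.lower() == upper.lower()

# .casefold converts "ß" to "ss", so the two strings compare equal
casefold_equal = german.casefold() == upper.casefold()

print(lower_equal, casefold_equal)
```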

In [19]:
# Use .replace to replace text
EMAIL_ADDRESS.replace("example.com", "bgs.ac.uk")

'user@bgs.ac.uk'
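
Two details of `.replace` are easy to miss: it returns a new string and leaves the original untouched, and it replaces every occurrence unless an optional count is given. A short sketch, where `email` and `dotted` are hypothetical local variables:

```python
email = "user@example.com"  # same value as EMAIL_ADDRESS in the notebook

# Strings are immutable: .replace returns a new string
updated = email.replace("example.com", "bgs.ac.uk")

# The optional third argument limits the number of replacements
dotted = "a.b.c.d"
all_replaced = dotted.replace(".", "-")
first_only = dotted.replace(".", "-", 1)

# If the search text is absent, the text comes back unchanged
unchanged = email.replace("missing", "x")

print(email, updated, all_replaced, first_only, unchanged)
```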

In [20]:
# Use split to break text into lines
for line_no, line in enumerate(VAA_TEXT.split('\n')):
    print(f"{line_no}: {line}")

0: FVFE01 RJTD 090552                                              2014068 0553
1: VA ADVISORY
2: DTG: 20140309/0552Z
3: VAAC: TOKYO
4: VOLCANO: SAKURAJIMA 0802-08
5: PSN: N3135E13040
6: AREA: JAPAN
7: SUMMIT ELEV: 1060M
8: ADVISORY NR: 2014/90
9: INFO SOURCE: MTSAT-2
10: AVIATION COLOUR CODE: NIL
11: ERUPTION DETAILS: VA CONTINUOUSLY OBS ON SATELLITE IMAGERY.
12: OBS VA DTG: 09/0515Z
13: OBS VA CLD: SFC/FL120 N3105 E13115 - N3125 E13150 - N3115 E13210 -
14: N3
15: 130 E13235 - N3115 E13245 - N3055 E13205 - N3100 E13150 - N3050
16: E1312
17: 0 MOV SE 25KT
18: FCST VA CLD +6 HR: 09/1115Z SFC/FL110 N3010 E13225 - N3115 E13730 -
19: N
20: 2945 E13730 - N2900 E13500 - N2900 E13230
21: FCST VA CLD +12 HR: 09/1715Z SFC/FL090 N2830 E13350 - N2835 E13720 -
22: N3030 E14105 - N2855 E14150 - N2705 E13905 - N2700 E13400
23: FCST VA CLD +18 HR: 09/2315Z SFC/FL080 N2735 E14035 - N2950 E14440 -
24: N2820 E14555 - N2545 E14200 - N2455 E13455 - N2620 E13455
25: RMK: NIL
26: NXT ADVISORY: 20140309/1200Z=
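
An alternative to `.split('\n')` is `.splitlines()`, which also copes with Windows-style `\r\n` line endings and does not leave an empty string after a trailing newline. A minimal sketch on a short, made-up two-line string (an assumption, not the real file contents):

```python
# Hypothetical text with Windows-style line endings and a trailing newline
text = "VA ADVISORY\r\nVAAC: TOKYO\r\n"

# Splitting on "\n" leaves the "\r" characters and a final empty string
split_lines = text.split("\n")

# splitlines recognises both "\n" and "\r\n" and drops the trailing empty item
clean_lines = text.splitlines()

print(split_lines)
print(clean_lines)
```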

#### Exercise

+ Write a function, `extract_vaac`, to extract the VAAC from the VAA text.
  <details><summary>Hint</summary>
  Split twice, first on newlines, then on `:`
  </details>
+ Confirm the function works by running `assert extract_vaac(VAA_TEXT) == "TOKYO"`