In [1]:
import re

## Metacharacters

|Character|Description|Example|
|:---|:---|:---|
|[]|A set of characters|"[a-m]"|
|\\|Signals a special sequence (can also be used to escape special characters)|"\d"|
|.|Any character|"he..o"|
|^|Starts with|"^hello"|
|\$|Ends with|"planet\$"|
|*|Zero or more occurrences|"he.*o"|
|+|One or more occurrences|"he.+o"|
|?|Zero or one occurrence|"he.?o"|
|{}|Exactly the specified number of occurrences|"he.{2}o"|
|\||Either or|"falls\|stays"|
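
As a rough sketch of how the table reads in practice, the patterns from the "Example" column can be run against one sample string. The string `hello planet` and the names `patterns` and `results` are assumptions chosen for illustration. `re.findall` returns every non-overlapping match, so an empty list means the pattern did not match.

```python
import re

# Sample text (an assumption, chosen so that most patterns have something to match).
txt = "hello planet"

# Patterns copied from the "Example" column, written as raw strings.
patterns = [
    r"[a-m]", r"\d", r"he..o", r"^hello", r"planet$",
    r"he.*o", r"he.+o", r"he.?o", r"he.{2}o", r"falls|stays",
]

# Map each pattern to the list of all its matches in txt.
results = {p: re.findall(p, txt) for p in patterns}

for p, found in results.items():
    print(f"{p!r:>14} -> {found}")
```

Note that `he.?o` finds nothing here: between "he" and "o" it allows at most one character, while "hello" has two.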

## Special Sequences

|Character|Description|Example|
|:---|:---|:---|
|\A|Returns a match if the specified characters are at the beginning of the string|"\AThe"|
|\b|Returns a match where the specified characters are at the beginning or at the end of a word<br>(the "r" prefix makes Python treat the pattern as a "raw string", so "\b" is not read as a backspace character)|r"\bain"<br>r"ain\b"|
|\d|Returns a match where the string contains digits (numbers from 0-9)|"\d"|
|\D|Returns a match where the string DOES NOT contain digits|"\D"|
|\s|Returns a match where the string contains a white space character|"\s"|
|\S|Returns a match where the string DOES NOT contain a white space character|"\S"|
|\w|Returns a match where the string contains any word characters (letters from a to z and A to Z, digits from 0-9, and the underscore _ character)|"\w"|
|\W|Returns a match where the string DOES NOT contain any word characters|"\W"|
|\Z|Returns a match if the specified characters are at the end of the string|"Spain\Z"|
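
The special sequences can be tried out the same way. This is a minimal sketch, assuming a sample string with a year appended (`The rain in Spain 2024`); all variable names are illustrative.

```python
import re

# Sample text (an assumption): the notebook's sentence plus some digits.
txt = "The rain in Spain 2024"

starts_with_the = re.search(r"\AThe", txt) is not None   # anchored at the start
ends_with_year = re.search(r"2024\Z", txt) is not None   # anchored at the end

word_start_ain = re.findall(r"\bain", txt)   # "ain" at the start of a word
word_end_ain = re.findall(r"ain\b", txt)     # "ain" at the end of a word

digits = re.findall(r"\d", txt)              # every single digit
spaces = re.findall(r"\s", txt)              # every whitespace character
word_chars = re.findall(r"\w", "a_1 !")      # letters, digits, underscore
non_word_chars = re.findall(r"\W", "a_1 !")  # everything else

print(starts_with_the, ends_with_year)  # True True
print(word_start_ain)                   # []
print(word_end_ain)                     # ['ain', 'ain']
print(digits)                           # ['2', '0', '2', '4']
print(len(spaces))                      # 4
print(word_chars, non_word_chars)       # ['a', '_', '1'] [' ', '!']
```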

## Sets

|Set|Description|
|:---|:---|
|[arn]|Returns a match where one of the specified characters (a, r, or n) is present|
|[a-n]|Returns a match for any lower case character, alphabetically between a and n|
|[^arn]|Returns a match for any character EXCEPT a, r, and n|
|[0123]|Returns a match where any of the specified digits (0, 1, 2, or 3) are present|
|[0-9]|Returns a match for any digit between 0 and 9|
|[0-5][0-9]|Returns a match for any two-digit number from 00 to 59|
|[a-zA-Z]|Returns a match for any character alphabetically between a and z, lower case OR upper case|
|[+]|In sets, "+", "*", ".", "\|", "()", "\$", "{}" have no special meaning, so [+] means: return a match for any + character in the string|
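
A few of the sets above, sketched on small inputs. The input strings and variable names are assumptions picked for illustration.

```python
import re

# Any of the characters a, r, or n.
arn = re.findall(r"[arn]", "The rain in Spain")

# Any character except a, r, and n.
not_arn = re.findall(r"[^arn]", "rain")

# Two-digit numbers from 00 to 59: the lone "8" is skipped.
minutes = re.findall(r"[0-5][0-9]", "8 times before 11:45 AM")

# Inside a set, + is a literal plus sign, not a quantifier.
plus = re.findall(r"[+]", "2+2=4")

print(arn)      # ['r', 'a', 'n', 'n', 'a', 'n']
print(not_arn)  # ['i']
print(minutes)  # ['11', '45']
print(plus)     # ['+']
```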

#### Finding the position of whitespace

In [27]:
txt = 'The rain in Spain'
x = re.search(r'\s', txt)

print("The first white-space character is located in position:", x.start())

The first white-space character is located in position: 3
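
`re.search` returns `None` when nothing matches, so calling `.start()` directly on the result raises an `AttributeError` in that case. A minimal sketch of guarding against this, using a pattern (`\d`) that is assumed not to occur in the sample text:

```python
import re

txt = "The rain in Spain"

# There are no digits in txt, so search() returns None here.
match = re.search(r"\d", txt)

if match is None:
    message = "No digit found"
else:
    message = f"The first digit is located in position: {match.start()}"

print(message)  # No digit found
```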


#### Splitting on whitespace

In [28]:
txt = 'The rain in Spain'
x = re.split(r'\s', txt)

print(x)

['The', 'rain', 'in', 'Spain']


In [30]:
import re

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x)

['The', 'rain in Spain']
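
Splitting on `\s` treats every whitespace character as its own separator, so runs of spaces produce empty strings. A small sketch of the difference between `\s` and `\s+`, assuming an input with irregular spacing (the sample string is made up for illustration):

```python
import re

# Sample text (an assumption): double space, a tab, and a triple space.
txt = "The  rain\tin   Spain"

# One split per whitespace character: empty strings appear between neighbours.
single = re.split(r"\s", txt)

# One split per run of whitespace: no empty strings.
runs = re.split(r"\s+", txt)

print(single)  # ['The', '', 'rain', 'in', '', '', 'Spain']
print(runs)    # ['The', 'rain', 'in', 'Spain']
```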


#### Replacing text

In [33]:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)

print(x)

The9rain9in9Spain


In [34]:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)  # replace only the first two matches

print(x)

The9rain9in Spain


#### Match objects

In [35]:
txt = "The rain in Spain"
x = re.search("ai", txt)

print(x)

<re.Match object; span=(5, 7), match='ai'>


In [41]:
txt = "The rain in Spain"
x = re.search(r'\bS\w+', txt)

print(x.span())

(12, 17)


In [43]:
txt = "The rain in Spain"
x = re.search(r'\bS\w+', txt)

print(x.string)

The rain in Spain


In [42]:
txt = "The rain in Spain"
x = re.search(r'\bS\w+', txt)

print(x.group())

Spain
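
The three cells above use the same match object, and its attributes are consistent with one another: `span()` gives the `(start, end)` indices into `string`, and slicing `string` with them yields `group()`. A short sketch that ties them together (variable names are illustrative):

```python
import re

txt = "The rain in Spain"
m = re.search(r"\bS\w+", txt)

# span() is the pair (start(), end()); end is exclusive, as in slicing.
start, end = m.span()

print(m.start(), m.end())        # 12 17
print(m.string[start:end])       # Spain
print(m.string[start:end] == m.group())  # True
```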
