## re

[RE](https://docs.python.org/3.5/library/re.html): regex, regexp, regular expression

### Common Functions / Methods

1. [fullmatch](https://docs.python.org/3.5/library/re.html#re.fullmatch)
2. [match](https://docs.python.org/3.5/library/re.html#re.regex.match)
3. [search](https://docs.python.org/3.5/library/re.html#re.regex.search)
4. [findall](https://docs.python.org/3.5/library/re.html#re.regex.findall)
5. [sub](https://docs.python.org/3.5/library/re.html#re.regex.sub)
6. [split](https://docs.python.org/3.5/library/re.html#re.regex.split)
7. [compile](https://docs.python.org/3.5/library/re.html#re.compile)

### Useful Tool

- https://regex101.com/#python

In [1]:
import re

In [2]:
# the r prefix marks a raw string literal (backslashes are kept as-is); it does not mean "regex"
match = re.match(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}', '127.0.0.1')
match # -> a match object if the pattern matches at the start of the string

<_sre.SRE_Match object; span=(0, 9), match='127.0.0.1'>

In [3]:
# match only anchors at the start of the string, so the trailing 'aaaa' is ignored
match = re.match(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}', '127.0.0.1aaaa')
match

<_sre.SRE_Match object; span=(0, 9), match='127.0.0.1'>

In [4]:
match = re.fullmatch(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}', 'x.x.x.x')
match # -> None if no match

In [5]:
# more methods of the match object:
# https://docs.python.org/3.5/library/re.html#match-objects
match = re.fullmatch(r'[0-9]{1,3}(?:\.[0-9]{1,3}){3}', '127.0.0.1')
match.group(0) # -> the entire matched string (group 0 is the whole match)

'127.0.0.1'
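
To make the difference between `group(0)` and numbered groups concrete, here is a minimal sketch. The name `octet_re` and the capturing parentheses are illustrative assumptions, not part of the original pattern, which uses only a non-capturing `(?:...)` group.

```python
import re

# Hypothetical variant of the IP pattern with four capturing groups.
# Capturing groups are numbered from 1; group(0) is always the whole match.
octet_re = re.compile(r'([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})')
m = octet_re.fullmatch('127.0.0.1')

whole = m.group(0)   # the entire matched string
first = m.group(1)   # text captured by the first pair of parentheses
octets = m.groups()  # tuple of all captured groups, in order

print(whole)
print(first)
print(octets)
```

With the non-capturing pattern used in the cells above, `match.groups()` would be an empty tuple, while `match.group(0)` still returns the whole match.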

In [6]:
# or compile the pattern string into a regex object first:
# https://docs.python.org/3.5/library/re.html#regular-expression-objects
ip_re = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')
match = ip_re.fullmatch('192.168.0.1')
match.group(0)

'192.168.0.1'

In [7]:
raw_text = 'My IP is 127.0.0.1.'

In [8]:
match = ip_re.fullmatch(raw_text)
match # -> None, because fullmatch requires the whole string to match

In [9]:
match = ip_re.search(raw_text)
match.group(0)

'127.0.0.1'

In [10]:
ip_re.findall('127.0.0.1, 127.0.0.2')

['127.0.0.1', '127.0.0.2']

In [11]:
for m in ip_re.finditer('127.0.0.1, 127.0.0.2'):
    print(m)

<_sre.SRE_Match object; span=(0, 9), match='127.0.0.1'>
<_sre.SRE_Match object; span=(11, 20), match='127.0.0.2'>
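
The function list above also mentions `sub` and `split`, which the cells do not exercise. The following is a small sketch of both; the sample strings and the `<ip>` placeholder are made up for illustration.

```python
import re

ip_re = re.compile(r'[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}')

# sub: replace every non-overlapping match with the replacement string.
masked = ip_re.sub('<ip>', 'My IP is 127.0.0.1, not 127.0.0.2.')
print(masked)

# split: cut the string wherever the pattern matches
# (here: a comma with optional surrounding whitespace).
parts = re.split(r'\s*,\s*', '127.0.0.1 , 127.0.0.2,127.0.0.3')
print(parts)
```

Both are available as module-level functions (`re.sub`, `re.split`) and as methods of a compiled regex object.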


## json

[JSON](https://docs.python.org/3.5/library/json.html): JavaScript Object Notation

- [load](https://docs.python.org/3.5/library/json.html#json.load)/[loads](https://docs.python.org/3.5/library/json.html#json.loads): JSON → Python object
- [dump](https://docs.python.org/3.5/library/json.html#json.dump)/[dumps](https://docs.python.org/3.5/library/json.html#json.dumps): Python object → JSON

In [12]:
import json
from pprint import pprint

# https://www.pinkoi.com/offers/92
json_text = '''{
  "locale": {
    "currency": "TWD",
    "code": "zh_TW",
    "geo_name": "\u53f0\u7063"
  },
  "result": [
    {
      "total": 147,
      "user_voted": "",
      "vote_option_sum": {
        "1": 44,
        "3": 7,
        "2": 30,
        "4": 66
      }
    }
  ]
}'''

d = json.loads(json_text)
print(d)
print(d['result'][0]['total'])

{'locale': {'currency': 'TWD', 'code': 'zh_TW', 'geo_name': '台灣'}, 'result': [{'total': 147, 'user_voted': '', 'vote_option_sum': {'1': 44, '3': 7, '2': 30, '4': 66}}]}
147
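
The cell above covers only the JSON → Python direction. As a sketch of the reverse, `json.dumps` can serialize a dict back to text; the small dict below is a trimmed-down stand-in for the parsed data, not the full response.

```python
import json

# Trimmed-down stand-in for the parsed data above.
d = {'locale': {'currency': 'TWD', 'geo_name': '台灣'}, 'total': 147}

# By default, non-ASCII characters are escaped as \uXXXX.
compact = json.dumps(d, sort_keys=True)
print(compact)

# ensure_ascii=False keeps them readable; indent pretty-prints the output.
readable = json.dumps(d, ensure_ascii=False, indent=2, sort_keys=True)
print(readable)

# Either form parses back to an equal Python object.
round_trip_ok = json.loads(compact) == d and json.loads(readable) == d
print(round_trip_ok)
```

`json.dump` takes the same options and writes to a file object instead of returning a string.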


## csv

In [13]:
# make a mock file first

from io import StringIO
 
csv_f = StringIO('''\
moskytw,mosky.tw@gmail.com
moskyliu,mosky.liu@pinkoi.com
moskybot,mosky.bot@gmail.com\
''')
# ≈ csv_f = open(path)

for line in csv_f:
    print(line, end='')
print()
    
csv_f.seek(0)

moskytw,mosky.tw@gmail.com
moskyliu,mosky.liu@pinkoi.com
moskybot,mosky.bot@gmail.com


0

In [14]:
# read from the file
import csv
reader = csv.reader(csv_f)
rows = list(reader)
print(rows)

[['moskytw', 'mosky.tw@gmail.com'], ['moskyliu', 'mosky.liu@pinkoi.com'], ['moskybot', 'mosky.bot@gmail.com']]


In [15]:
# write to a file (sys.stdout here, so the rows are printed)
import sys
writer = csv.writer(sys.stdout)
for row in rows:
    writer.writerow(row)
    
# see also:
# csvkit is a suite of utilities for converting to and working with CSV
# https://csvkit.readthedocs.org/en/0.9.1/

moskytw,mosky.tw@gmail.com
moskyliu,mosky.liu@pinkoi.com
moskybot,mosky.bot@gmail.com
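
One reason to use `csv` rather than splitting on commas by hand is quoting. The sketch below writes a field that itself contains a comma and reads it back; the `'Liu, Mosky'` value is invented for illustration, and `StringIO` again stands in for a real file.

```python
import csv
from io import StringIO

# Hypothetical rows; the second name contains a comma on purpose.
rows = [['moskytw', 'mosky.tw@gmail.com'],
        ['Liu, Mosky', 'mosky.liu@pinkoi.com']]

buf = StringIO()  # for a real file, use open(path, 'w', newline='')
writer = csv.writer(buf)
writer.writerows(rows)  # the writer quotes fields that contain the delimiter
text = buf.getvalue()
print(text)

buf.seek(0)
round_tripped = list(csv.reader(buf))  # the reader undoes the quoting
print(round_tripped)
```

When opening real files for `csv`, the standard library documentation recommends passing `newline=''` so the module can handle line endings itself.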


## pickle

In [16]:
import pickle

print(rows)
print()

# a negative protocol number selects pickle.HIGHEST_PROTOCOL
dumped_bin = pickle.dumps(rows, protocol=-1)
print(dumped_bin)
print()

print(pickle.loads(dumped_bin))

[['moskytw', 'mosky.tw@gmail.com'], ['moskyliu', 'mosky.liu@pinkoi.com'], ['moskybot', 'mosky.bot@gmail.com']]

b'\x80\x04\x95s\x00\x00\x00\x00\x00\x00\x00]\x94(]\x94(\x8c\x07moskytw\x94\x8c\x12mosky.tw@gmail.com\x94e]\x94(\x8c\x08moskyliu\x94\x8c\x14mosky.liu@pinkoi.com\x94e]\x94(\x8c\x08moskybot\x94\x8c\x13mosky.bot@gmail.com\x94ee.'

[['moskytw', 'mosky.tw@gmail.com'], ['moskyliu', 'mosky.liu@pinkoi.com'], ['moskybot', 'mosky.bot@gmail.com']]
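
`pickle.dump` and `pickle.load` are the file-based counterparts of `dumps` and `loads`. The following sketch uses an in-memory `io.BytesIO` in place of a file opened in binary mode; the `record` dict is made up to show that sets and tuples survive the round trip, which JSON cannot represent directly.

```python
import io
import pickle

# Hypothetical record with types that JSON has no native form for.
record = {'name': 'moskytw', 'tags': {'admin', 'dev'}, 'pair': (1, 2)}

buf = io.BytesIO()  # stands in for open(path, 'wb')
pickle.dump(record, buf, protocol=pickle.HIGHEST_PROTOCOL)

buf.seek(0)         # rewind, as if reopening the file with 'rb'
restored = pickle.load(buf)
print(restored)
```

Unpickling can execute arbitrary code, so `pickle.load` and `pickle.loads` should only be used on data from a trusted source.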
