# Exercise - Webserver Log Parser


Using named groups, write a pattern to capture the values in a log entry.

Each entry consists of the IP address of the requester, the HTTP method, the resource, the number of bytes, and the request duration.

Use the following group names: ip, http, resource, bytes, duration

I recommend using the learn regex tool to build the pattern visually, then testing it with the code provided.
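As a reminder of the syntax, a named group is written `(?P<name>...)` and its value can be read back by name. The sketch below deliberately uses an unrelated, made-up date string so it does not give away the exercise; the names `sample`, `date_pattern`, `year`, `month`, and `day` are illustrative assumptions only.

```python
import re

# Hypothetical sample, unrelated to the log format in this exercise
sample = "2024-01-15"

# Each (?P<name>...) group captures one part of the date
date_pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
date_match = re.match(date_pattern, sample)

# Named groups can be read one at a time or all at once as a dict
print(date_match.group("year"))
print(date_match.groupdict())
```

The same `groupdict()` call is what the test code in this exercise uses to print each captured value.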

**Input**

192.168.0.20 GET /index.html 32504 1.030

192.168.0.55 GET /index.html 32504 0.500

**Output (first match shown below):**

ip:192.168.0.20

http:GET

resource:/index.html

bytes:32504

duration:1.030



In [1]:
import re # python regex module
 
pattern = r""
 
# Sample multi-line text
text = """192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500"""
 
print('Pattern:', pattern)
print('Text:', text)
print()
 
match_iter = re.finditer(pattern, text)
 
print ('Match:')
for match in match_iter:
    print('', match.group(0), 'at index:', match.start())
    
    for key,value in match.groupdict().items():
        print('  ', key,':',value)

Pattern: 
Text: 192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500

Match:
  at index: 0
  at index: 1
  at index: 2
  ...

(The empty pattern matches the empty string at every position, so the output continues for every index in the text and captures no groups.)

In [9]:
text = """192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500"""

pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'  # captures the IP: four dot-separated groups of 1-3 digits
print(re.findall(pattern, text))

['192.168.0.20', '192.168.0.55']


In [11]:
# \w+ is meant for the HTTP method, but on its own it matches every run of word characters
pattern = r'(?P<http>\w+)'
print(re.findall(pattern, text))

['192', '168', '0', '20', 'GET', 'index', 'html', '32504', '1', '030', '192', '168', '0', '55', 'GET', 'index', 'html', '32504', '0', '500']
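The output above shows that `\w+` needs surrounding context before it picks out only the HTTP method. One possible way to add that context, sketched here under the assumption that the method is always the second whitespace-separated field on a line, is to anchor at the start of each line and skip the first field:

```python
import re

text = """192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500"""

# Assumption: the method is always the second whitespace-separated field.
# (?m) makes ^ match at the start of every line; \S+ skips the first field.
pattern = r"(?m)^\S+\s+(?P<http>\w+)"

# Read only the named group from each match
methods = [m.group("http") for m in re.finditer(pattern, text)]
print(methods)
```

In the full pattern the preceding IP group plays the role of `\S+` here, so `\w+` can stay as simple as it is.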


In [12]:
# This pattern captures a "/" followed by one or more word characters or "." characters.
# For example, this pattern will match the text /index.html

pattern = r'(?P<resource>/[\w.]+)'
print(re.findall(pattern, text))

['/index.html', '/index.html']


In [20]:
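# Duration value. Without the multiline flag, $ matches only at the end of the
# whole text, so only the last line's duration is found.
pattern = r'(?P<duration>\d+\.\d+)$'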
print(re.findall(pattern, text))

['0.500']
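Only one duration is returned because `$` by default matches at the end of the whole string. A minimal sketch of the difference, using the same text and passing `re.MULTILINE` (equivalent to the inline `(?m)` flag) so that `$` also matches at the end of each line; the variable names `last_only` and `every_line` are illustrative:

```python
import re

text = """192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500"""

pattern = r"(?P<duration>\d+\.\d+)$"

# Default: $ matches only at the very end of the text
last_only = re.findall(pattern, text)

# re.MULTILINE: $ also matches just before each newline
every_line = re.findall(pattern, text, flags=re.MULTILINE)

print(last_only)
print(every_line)
```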


## Solution

In [21]:
import re # python regex module
 
pattern = r"(?m)^(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?P<http>\w+)\s+(?P<resource>/[\w.]+)\s+(?P<bytes>\d+)\s+(?P<duration>\d+\.\d+)$"
 
# Sample multi-line text
text = """192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500"""
 
print('Pattern:', pattern)
print('Text:', text)
print()
 
match_iter = re.finditer(pattern, text)
 
print ('Match:')
for match in match_iter:
    print('', match.group(0), 'at index:', match.start())
    
    for key,value in match.groupdict().items():
        print('  ', key,':',value)

Pattern: (?m)^(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?P<http>\w+)\s+(?P<resource>/[\w.]+)\s+(?P<bytes>\d+)\s+(?P<duration>\d+\.\d+)$
Text: 192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500

Match:
 192.168.0.20 GET /index.html 32504 1.030 at index: 0
   ip : 192.168.0.20
   http : GET
   resource : /index.html
   bytes : 32504
   duration : 1.030
 192.168.0.55 GET /index.html 32504 0.500 at index: 41
   ip : 192.168.0.55
   http : GET
   resource : /index.html
   bytes : 32504
   duration : 0.500
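Every captured value is a string. As a possible follow-up, and only as a sketch, the numeric fields can be converted before further use; the names `entries`, `fields`, and `total_bytes` below are illustrative and not part of the exercise.

```python
import re

text = """192.168.0.20 GET /index.html 32504 1.030
192.168.0.55 GET /index.html 32504 0.500"""

# Same pattern as the solution above
pattern = (r"(?m)^(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s+(?P<http>\w+)\s+"
           r"(?P<resource>/[\w.]+)\s+(?P<bytes>\d+)\s+(?P<duration>\d+\.\d+)$")

entries = []
for m in re.finditer(pattern, text):
    fields = m.groupdict()
    # Convert the numeric fields from str to int / float
    fields["bytes"] = int(fields["bytes"])
    fields["duration"] = float(fields["duration"])
    entries.append(fields)

# Typed values can now be aggregated directly
total_bytes = sum(e["bytes"] for e in entries)
print(entries[0])
print(total_bytes)
```

Note that `\d{1,3}` accepts values such as 999, so this pattern checks the shape of an IP address rather than its validity.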
