### The dataset file in [grades.txt](grades.txt) contains a newline-separated list of people with their grade in 
### a class. Let's create a regex to generate a list of just those students who received a B in the course.

In [1]:
# importing regex
import re

In [2]:
# reading the file
with open("grades.txt", "r") as file:
    grades = file.read()

In [3]:
grades

'Ronald Mayr: A\nBell Kassulke: B\nJacqueline Rupp: A \nAlexander Zeller: C\nValentina Denk: C \nSimon Loidl: B \nElias Jovanovic: B \nStefanie Weninger: A \nFabian Peer: C \nHakim Botros: B\nEmilie Lorentsen: B\nHerman Karlsen: C\nNathalie Delacruz: C\nCasey Hartman: C\nLily Walker : A\nGerard Wang: C\nTony Mcdowell: C\nJake Wood: B\nFatemeh Akhtar: B\nKim Weston: B\nNicholas Beatty: A\nKirsten Williams: C\nVaishali Surana: C\nCoby Mccormack: C\nYasmin Dar: B\nRomy Donnelly: A\nViswamitra Upandhye: B\nKendrick Hilpert: A\nKillian Kaufman: B\nElwood Page: B\nMukti Patel: A\nEmily Lesch: C\nElodie Booker: B\nJedd Kim: A\nAnnabel Davies: A\nAdnan Chen: B\nJonathan Berg: C\nHank Spinka: B\nAgnes Schneider: C\nKimberly Green: A\nLola-Rose Coates: C\nRose Christiansen: C\nShirley Hintz: C\nHannah Bayer: B'

In [4]:
# extracting the names of only those students who received a B;
# the lookahead (?=[:]\sB) requires ": B" right after the name without capturing it
result_B = re.findall(r'([A-Z][a-z]* [A-Z][a-z]*)(?=[:]\sB)', grades)
result_B

['Bell Kassulke',
 'Simon Loidl',
 'Elias Jovanovic',
 'Hakim Botros',
 'Emilie Lorentsen',
 'Jake Wood',
 'Fatemeh Akhtar',
 'Kim Weston',
 'Yasmin Dar',
 'Viswamitra Upandhye',
 'Killian Kaufman',
 'Elwood Page',
 'Elodie Booker',
 'Adnan Chen',
 'Hank Spinka',
 'Hannah Bayer']
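
The pattern above assumes a name is exactly two capitalised words directly followed by a colon. As a minimal sketch of a more tolerant alternative, the cell below parses each line into a name and a grade, so that entries such as `Lily Walker : A` or `Lola-Rose Coates: C` are also handled. The names `sample`, `line_pat` and `students_with_grade` are hypothetical helpers introduced for illustration; the sample lines are copied from the output of `grades` above.

```python
import re

# A few lines copied from the grades output above, including the
# irregular "Lily Walker : A" and the hyphenated "Lola-Rose Coates".
sample = (
    "Ronald Mayr: A\n"
    "Bell Kassulke: B\n"
    "Simon Loidl: B \n"
    "Lily Walker : A\n"
    "Lola-Rose Coates: C\n"
    "Hannah Bayer: B"
)

# name  = everything before the colon (trailing blanks excluded)
# grade = a single letter after the colon; A-F is an assumed grade range
line_pat = re.compile(
    r"^(?P<name>[^:\n]+?)[ \t]*:[ \t]*(?P<grade>[A-F])[ \t]*$",
    re.MULTILINE,
)

def students_with_grade(text, grade):
    """Return the names of all students in `text` whose grade equals `grade`."""
    return [m.group("name") for m in line_pat.finditer(text)
            if m.group("grade") == grade]

b_students = students_with_grade(sample, "B")
print(b_students)
```

Because the grade is captured rather than hard-coded in a lookahead, the same helper can be reused for any grade, for example `students_with_grade(sample, "A")`.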

### Consider the standard web log file in [logdata.txt](logdata.txt). This file records each request a user makes when visiting a web page (like this one!). Each line of the log has the following items:
#### * a host (e.g., '146.204.224.152') 
#### * a user_name (e.g., 'feest6811' **note: sometimes the user name is missing! In this case, use '-' as the value for the username.**)
#### * the time a request was made (e.g., '21/Jun/2019:15:45:24 -0700')
#### * the request (e.g., 'POST /incentivize HTTP/1.1' **note: not everything is a POST!**)
### Let's convert this into a list of dictionaries, where each dictionary looks like the following:
 ```
 example_dict = {"host":"146.204.224.152", 
                 "user_name":"feest6811", 
                 "time":"21/Jun/2019:15:45:24 -0700",
                 "request":"POST /incentivize HTTP/1.1"}
 ```

In [5]:
# reading the file
with open("logdata.txt", "r") as file:
    logdata = file.read()

In [6]:
result = []

In [7]:
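# composing the regex pattern in verbose mode (raw string, so \d and \s reach the regex engine unchanged)
pat = r'''
    (?P<host>\d+[.]\d+[.]\d+[.]\d+)   # host IP address
    \s-\s                             # " - " separator
    (?P<user_name>\w+|-)              # user name, or "-" when it is missing
    \s\[                              # space and opening bracket
    (?P<time>[^\]]+)                  # timestamp, up to the closing bracket
    \]\s"                             # closing bracket, space, opening quote
    (?P<request>[^"]+)                # request line, up to the closing quote
    (?=")'''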

In [8]:
# using the regex in verbose mode with the finditer() method and appending the extracted information to the list
for item in re.finditer(pat, logdata, re.VERBOSE):
    result.append(item.groupdict())
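
A pattern like this can be sanity-checked on a couple of hand-written lines before it is run on the whole file. The cell below is a minimal sketch under stated assumptions: the first sample line is assembled from the example dictionary above, the second (host `10.0.0.1`, missing user name, a `GET` request) is entirely hypothetical, and the trailing status code and byte count on both lines are assumptions about the file layout. The names `sample_log`, `log_pat` and `entries` are illustrative and do not touch `pat` or `result`.

```python
import re

# Hypothetical log lines; only the first four fields are described in the
# task, the trailing "302 4622" / "200 1024" are assumed extra fields.
sample_log = (
    '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] '
    '"POST /incentivize HTTP/1.1" 302 4622\n'
    '10.0.0.1 - - [21/Jun/2019:15:45:25 -0700] '
    '"GET /engage HTTP/2.0" 200 1024\n'
)

log_pat = re.compile(r'''
    (?P<host>\d+\.\d+\.\d+\.\d+)   # IPv4 address
    \s-\s                          # literal separator between host and user
    (?P<user_name>\S+)             # user name, or a lone dash when missing
    \s\[(?P<time>[^\]]+)\]         # timestamp between square brackets
    \s"(?P<request>[^"]+)"         # request line between double quotes
''', re.VERBOSE)

# One dictionary per matched line, keyed by the named groups.
entries = [m.groupdict() for m in log_pat.finditer(sample_log)]

for entry in entries:
    print(entry)
```

The second entry shows both notes from the task description at once: the user name comes back as `'-'`, and the request is a `GET` rather than a `POST`.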

In [10]:
# show first 5 dictionaries in the resulting list

result[:5]

[{'host': '146.204.224.152',
  'user_name': 'feest6811',
  'time': '21/Jun/2019:15:45:24 -0700',
  'request': 'POST /incentivize HTTP/1.1'},
 {'host': '197.109.77.178',
  'user_name': 'kertzmann3129',
  'time': '21/Jun/2019:15:45:25 -0700',
  'request': 'DELETE /virtual/solutions/target/web+services HTTP/2.0'},
 {'host': '156.127.178.177',
  'user_name': 'okuneva5222',
  'time': '21/Jun/2019:15:45:27 -0700',
  'request': 'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1'},
 {'host': '100.32.205.59',
  'user_name': 'ortiz8891',
  'time': '21/Jun/2019:15:45:28 -0700',
  'request': 'PATCH /architectures HTTP/1.0'},
 {'host': '168.95.156.240',
  'user_name': 'stark2413',
  'time': '21/Jun/2019:15:45:31 -0700',
  'request': 'GET /engage HTTP/2.0'}]

In [11]:
# show the log entry at index 3
result[3]

{'host': '100.32.205.59',
 'user_name': 'ortiz8891',
 'time': '21/Jun/2019:15:45:28 -0700',
 'request': 'PATCH /architectures HTTP/1.0'}

In [12]:
# user_name of the log entry at index 15
result[15]['user_name']

'luettgen1860'
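
The values in each dictionary are still plain strings. As a possible follow-up, and only a sketch, the cell below turns the `time` field into a timezone-aware `datetime` and splits the `request` field into its three parts. The entry is copied from the output for index 3 above; the variable names are illustrative, and the `%b` directive assumes English month abbreviations (the default C locale).

```python
from datetime import datetime

# Entry copied from the output of result[3] above.
entry = {
    "host": "100.32.205.59",
    "user_name": "ortiz8891",
    "time": "21/Jun/2019:15:45:28 -0700",
    "request": "PATCH /architectures HTTP/1.0",
}

# %d/%b/%Y:%H:%M:%S matches "21/Jun/2019:15:45:28"; %z parses the "-0700" offset.
timestamp = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")

# The request line has three space-separated parts: method, path, protocol.
method, path, protocol = entry["request"].split(" ", 2)

print(timestamp.isoformat())
print(method, path, protocol)
```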