# IO Process

This note is not closely related to OO modeling, but to build our case study we need to practice the basic techniques for reading from and writing to files and processing their contents. In this note we practice the following topics:
- reading from / writing to a file
- handling JSON files
- handling CSV files
- regular expressions as a tool to process content


## Files: Read / Write

When working with files, there is a simple but important rule to remember: open the file, use (read/write) the file, close the file.

There is plenty of material available that explains this topic in detail. Here, I try to summarize the important points and practice with some examples.

In Python:
- one can open a file using the built-in function **open(name, mode)**, where name is the name (path) of the file and mode specifies the purpose of the file access: (r)ead, (w)rite (creates the file, or empties it if it already exists) or (a)ppend. To open a file for both reading and writing, the mode 'r+' can be used (the file must already exist).
- after a successful call to **open(...)**, a file object is returned. The returned file object contains all the attributes and methods that are required to handle the file.
- the most important methods to start with are **write(content)**, **read()** and **close()**.
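The open / use / close rule can also be enforced by Python's `with` statement, which closes the file automatically when the block ends, even if an exception is raised inside it. The sketch below is a minimal illustration; it assumes a hypothetical file name `sample_with.txt` in a temporary directory so that it does not touch the notebook's own files.

```python
import os
import tempfile

# Work in a throw-away directory so the example is self-contained.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'sample_with.txt')  # hypothetical file name

# Mode 'w' creates the file (or empties an existing one) for writing.
with open(path, 'w') as fo:
    fo.write('[Note] first line\n')
# Here the file is already closed, although close() was never called explicitly.
print(fo.closed)  # True

# Mode 'r' opens the same file for reading.
with open(path, 'r') as fo:
    content = fo.read()
print(content)
```

The explicit `open(...)` / `close()` style used in the cells below does the same job; `with` simply makes it impossible to forget the last step.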

The following examples practice the basics of reading and writing files.

In [36]:
fo = open(r'./iofiles/sample.txt','w')  # Open a file for writing: mode is 'w', the path is relative (the folder iofiles must already exist)
fo.write('[Note] This is a sample content to be written in a file \n') # Write the given string: fo is the file object
fo.close()  # Do not forget to close the file when you no longer need it.
# Let's read the content
fo = open(r'./iofiles/sample.txt','r')  # Here we open the same file for reading
content = fo.read()  # read() returns the whole content as one string
print('The content of the file after the read:\n',content)
fo.close()

The content of the file after the read:
 [Note] This is a sample content to be written in a file 



In [37]:
fo = open('sample.txt','a') # No directory given: the file is in the current folder (a different file from ./iofiles/sample.txt above)
fo.write('We expect the file exists and this line will be added ...\n')  # Mode 'a' adds at the end; it also creates the file if it does not exist
fo.close()
fo = open('sample.txt','r')
content = fo.read()
print('The latest content of our extended file is:\n',content)
fo.close()

The latest content of our extended file is:
 We expect the file exists and this line will be added ...
We expect the file exists and this line will be added ...
We expect the file exists and this line will be added ...
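The output above shows the appended line three times, which is consistent with the cell having been run three times: every run in mode 'a' adds one more line, while mode 'w' would start from an empty file each time. The sketch below contrasts the two modes; the file name `modes.txt` and the helper functions are hypothetical, introduced only for this illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical file name

def write_line(mode, text):
    # Open with the given mode, write one line, and close the file again.
    with open(path, mode) as fo:
        fo.write(text + '\n')

def read_all():
    with open(path, 'r') as fo:
        return fo.read()

write_line('w', 'first')    # 'w' creates the file
write_line('w', 'second')   # 'w' again: the previous content is discarded
after_w = read_all()

write_line('a', 'third')    # 'a' keeps the content and adds at the end
write_line('a', 'fourth')
after_a = read_all()

print(repr(after_w))  # 'second\n'
print(repr(after_a))  # 'second\nthird\nfourth\n'
```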



The following example introduces:
- the methods **tell()** and **seek()**, which can be used to get and set the current reading / writing position, together with **readline()**, which reads a single line.
- the string methods **split()** and **strip()**, as very simple tools to parse and process content.

In [76]:
# A simple example to practice with tell() , seek(), readline()

fo = open(r'csvexample.csv','r')
line = fo.readline()  # This reads one line from fo (including its newline character)
print(line)
pos = fo.tell()  # This tells us the current position of fo
print('position now is at:', pos)
fo.seek(pos+10)  # Jump 10 positions past pos; for text files only offsets returned by tell() (or 0) are guaranteed to work, but this works for this simple ASCII file
line = fo.readline()  # Reads from the new position to the end of that line
print(line)
pos = fo.tell()  # This tells us the current position of fo
print('position now is at:', pos)
fo.seek(0)  # This asks fo to jump to the beginning
line = fo.readline()  # This reads a line from the beginning ...
print(line)  # Here we print the content of the line
print(line.split(',')) # Here we split the content of the line by comma
cl = line.strip('\n')  # Let's remove the newline character
print(cl.split(','))  # Now it is cleaner
fo.close()

Name,Class Year,Dorm,Room,GPA

position now is at: 30
taker,2018,McCarren House,312,3.75

position now is at: 75
Name,Class Year,Dorm,Room,GPA

['Name', 'Class Year', 'Dorm', 'Room', 'GPA\n']
['Name', 'Class Year', 'Dorm', 'Room', 'GPA']
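To experiment with readline(), tell() and seek() without depending on csvexample.csv, the sketch below uses `io.StringIO`, an in-memory text stream that offers the same reading methods as a text file; the tiny CSV content is made up for the example. With StringIO, tell() returns a plain character offset, whereas for real text files the value is best treated as an opaque position that is only passed back to seek().

```python
import io

# In-memory stand-in for a small CSV file (made-up content).
fo = io.StringIO('Name,GPA\nSally,3.75\nJeff,3.20\n')

header = fo.readline()        # reads 'Name,GPA\n', including the newline
after_header = fo.tell()      # position right after the first line
first_row = fo.readline()     # 'Sally,3.75\n'
fo.seek(after_header)         # jump back to a position previously returned by tell()
again = fo.readline()         # the same row is read a second time
fo.seek(0)                    # rewind to the beginning
fields = fo.readline().strip().split(',')  # strip the newline, then split on commas
fo.close()

print(after_header)           # 9
print(again == first_row)     # True
print(fields)                 # ['Name', 'GPA']
```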


## JSON Files

In this section we practice the basic steps of handling JSON files. To load information from a JSON file we can use the module *json*. See the example below:

In [38]:
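import json

fo = open(r'example.json','r')
content = fo.read()  # The whole file as one string
fo.close()           # We no longer need the file
info = json.loads(content)  # Deserialize: here the result is a list of dictionaries

print('[Content of json load is]:',info)
print('[Check the value of one of the keys]:',info[0]['first_name'])
print('[All the keys]:',info[0].keys())
print('[Print all the emails]:')
for person in info:  # Iterate over the records directly instead of using indices
    print(person['email'])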

[Content of json load is]: [{'id': 1, 'first_name': 'Jeanette', 'last_name': 'Penddreth', 'email': 'jpenddreth0@census.gov', 'gender': 'Female', 'ip_address': '26.58.193.2'}, {'id': 2, 'first_name': 'Giavani', 'last_name': 'Frediani', 'email': 'gfrediani1@senate.gov', 'gender': 'Male', 'ip_address': '229.179.4.212'}, {'id': 3, 'first_name': 'Noell', 'last_name': 'Bea', 'email': 'nbea2@imageshack.us', 'gender': 'Female', 'ip_address': '180.66.162.255'}, {'id': 4, 'first_name': 'Willard', 'last_name': 'Valek', 'email': 'wvalek3@vk.com', 'gender': 'Male', 'ip_address': '67.76.188.26'}]
[Check the value of one of the keys]: Jeanette
[All the keys]: dict_keys(['id', 'first_name', 'last_name', 'email', 'gender', 'ip_address'])
[Print all the emails]:
jpenddreth0@census.gov
gfrediani1@senate.gov
nbea2@imageshack.us
wvalek3@vk.com


As you can see from the content of a JSON file, all the information is stored as a string (text). The function **json.loads()** converts this string into Python objects; here the result is a list of dictionaries, one per record. This is called *deserialization*. To store the information from a dictionary into a JSON file, we have to convert the dictionary into a sequence of characters. This process is known as *serialization*.
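The two directions can be checked with a small round trip. The sketch below reuses a few fields of the records shown above (the list itself is just illustrative data) and also shows `json.dump` / `json.load`, which write to and read from a file object directly instead of going through an intermediate string; the file name `people.json` is hypothetical.

```python
import json
import os
import tempfile

# A small list of dictionaries, shaped like the records in example.json above.
people = [
    {'id': 1, 'first_name': 'Jeanette', 'email': 'jpenddreth0@census.gov'},
    {'id': 2, 'first_name': 'Giavani', 'email': 'gfrediani1@senate.gov'},
]

# Serialization: Python objects -> JSON text (a str).
text = json.dumps(people)
print(type(text).__name__)  # str

# Deserialization: JSON text -> Python objects (a list of dicts again).
back = json.loads(text)
print(back == people)  # True

# json.dump / json.load do the same directly on file objects.
path = os.path.join(tempfile.mkdtemp(), 'people.json')  # hypothetical file name
with open(path, 'w') as fo:
    json.dump(people, fo, indent=2)
with open(path, 'r') as fo:
    loaded = json.load(fo)
print(loaded[1]['email'])  # gfrediani1@senate.gov
```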

The following example shows a simple case of serialization and how the result can be stored in a file.

In [39]:
import json

class Address:
    def __init__(self, cnt, cty, pcode, st, num):
        self.country = cnt
        self.city = cty
        self.postcode = pcode
        self.street = st
        self.number = num

if __name__ == '__main__':
    ad = Address('The Netherlands','Rotterdam','2323AX','Zwartehondstraat','24a')
    # json.dumps cannot serialize an Address directly, so we pass its attribute dictionary
    adic = json.dumps(ad.__dict__, indent=4)  # dumps takes care of serialization
    # The result is a string that can be stored or printed
    print(adic)
    # Here we can write the content into a file (the folder ./folder must exist)
    # fo = open(r'./folder/address.json', 'w')
    # fo.write(adic)
    # fo.close()


{
    "country": "The Netherlands",
    "city": "Rotterdam",
    "postcode": "2323AX",
    "street": "Zwartehondstraat",
    "number": "24a"
}


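Now the challenge is to serialize a complex object. By complex we mean an object that contains other objects as attributes. json.dumps cannot serialize such objects directly (for an arbitrary class instance it raises a TypeError), so in general we have to implement a procedure that converts the object, including its nested objects, into a dictionary. See the example below.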

In [40]:
import json

class Address:
    def __init__(self, cnt, cty, pcode, st, num):
        self.country = cnt
        self.city = cty
        self.postcode = pcode
        self.street = st
        self.number = num

class Person:
    def __init__(self, fn, ln, ad):
        self.first_name = fn
        self.last_name = ln
        self.address = ad
    def toDict(self):
        sd = dict(self.__dict__)  # Copy, so that the object's own attributes are not modified
        sd['address'] = self.address.__dict__  # Replace the nested object by its dictionary
        return sd

if __name__ == '__main__':
    ad = Address('The Netherlands','Rotterdam','2323AX','Zwartehondstraat','24a')
    p = Person('John','Johanssen', ad)
    pdict = p.toDict() # Check: here we get the dictionary (p.address is still an Address)
    res = json.dumps(pdict, indent=2)  # We pass the dictionary of the object to serialize
    print(res)


{
  "first_name": "John",
  "last_name": "Johanssen",
  "address": {
    "country": "The Netherlands",
    "city": "Rotterdam",
    "postcode": "2323AX",
    "street": "Zwartehondstraat",
    "number": "24a"
  }
}
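An alternative to writing a toDict() method for every class is the `default` parameter of `json.dumps`: it is called for each object the encoder cannot handle, and whatever it returns is serialized instead, so nested objects are handled recursively. The sketch below uses simplified versions of the classes above (fewer attributes) and a hypothetical helper `to_serializable`; it assumes every nested object keeps its data in plain attributes. Deserialization only gives back dictionaries, so rebuilding the objects is still done by hand.

```python
import json

# Simplified versions of the classes above (fewer attributes, for brevity).
class Address:
    def __init__(self, country, city):
        self.country = country
        self.city = city

class Person:
    def __init__(self, first_name, address):
        self.first_name = first_name
        self.address = address

def to_serializable(obj):
    # json.dumps calls this for every object it cannot serialize itself;
    # returning the attribute dictionary lets it continue into nested objects.
    return obj.__dict__

p = Person('John', Address('The Netherlands', 'Rotterdam'))
res = json.dumps(p, default=to_serializable, indent=2)
print(res)

# Deserialization gives plain dictionaries; rebuilding the objects is up to us.
data = json.loads(res)
p2 = Person(data['first_name'], Address(**data['address']))
print(p2.address.city)  # Rotterdam
```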


## CSV Files

A CSV (comma-separated values) file is simply delimited text in which a comma is used to separate the values. It is well suited to representing tabular data. A table consists of rows containing fields, and each field contains a simple value: number, string, date, boolean, etc. In a CSV file, fields are separated by commas (or sometimes by other separator characters such as tabs, semicolons or spaces). By default, the csv module returns every field as a string. Below is a simple example that reads the contents of a CSV file.

In [56]:
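import csv

fo = open(r'csvexample.csv', 'r', newline='')  # The csv documentation recommends newline='' for files used with csv

content = csv.reader(fo, delimiter=',')  # A reader yields each row as a list of strings

for row in content:
    print(row)  # Check the structure of a row given by a csv reader

fo.close()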

['Name', 'Class Year', 'Dorm', 'Room', 'GPA']
['Sally Whittaker', '2018', 'McCarren House', '312', '3.75']
['Belinda Jameson', '2017', 'Cushing House', '148', '3.52']
['Jeff Smith', '2018', 'Prescott House', '17-D', '3.20']
['Sandy Allen', '2019', 'Oliver House', '108', '3.48']
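Writing works the same way in the other direction with `csv.writer`. A useful reason to use the module instead of joining strings with commas by hand is quoting: a field that itself contains the separator is wrapped in quotes, and the reader removes them again. The sketch below uses a few of the rows above, with one name deliberately rewritten as 'Smith, Jeff' to trigger quoting; the file name `students.csv` is hypothetical.

```python
import csv
import os
import tempfile

rows = [
    ['Name', 'Class Year', 'Dorm', 'Room', 'GPA'],
    ['Sally Whittaker', '2018', 'McCarren House', '312', '3.75'],
    ['Smith, Jeff', '2018', 'Prescott House', '17-D', '3.20'],  # a field containing a comma
]

path = os.path.join(tempfile.mkdtemp(), 'students.csv')  # hypothetical file name

# The csv documentation recommends opening files with newline='' for the csv module.
with open(path, 'w', newline='') as fo:
    writer = csv.writer(fo, delimiter=',')
    writer.writerows(rows)  # writes every row, inserting the separators for us

with open(path, 'r', newline='') as fo:
    raw = fo.read()          # the raw text, to see the quoting
    fo.seek(0)               # rewind before parsing
    read_back = list(csv.reader(fo, delimiter=','))

print(raw)
print(read_back == rows)  # True
```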


The csv module also provides **DictReader**, which returns each row as a dictionary that maps field names to values. By default, DictReader assumes that the first line of the input file contains the field names.

In [73]:
import csv

fo = open(r'csvexample.csv', 'r', newline='')

fields = fo.readline()  # Let's see the field names
sf = fields.strip()  # This removes the newline character from the end of the line
print(sf.split(','))  # This provides field names in a list
fo.seek(0)  # Go back to the beginning so that DictReader also sees the header line

content = csv.DictReader(fo, delimiter=',') # Each row is a dict (an OrderedDict before Python 3.8)

for row in content:
    print(row['Name'],'with GPA = ',row['GPA'])  # Access the fields by their names

fo.close()

['Name', 'Class Year', 'Dorm', 'Room', 'GPA']
Sally Whittaker with GPA =  3.75
Belinda Jameson with GPA =  3.52
Jeff Smith with GPA =  3.20
Sandy Allen with GPA =  3.48
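Because every value from DictReader is a string, numbers must be converted before any arithmetic. The sketch below keeps the same rows in memory (via `io.StringIO`, so no file is needed), computes the average GPA, and writes a filtered selection with `csv.DictWriter`; the 3.5 threshold and the choice of output columns are just assumptions for the illustration. `extrasaction='ignore'` tells DictWriter to skip the keys that are not in `fieldnames` instead of raising an error.

```python
import csv
import io

# The same rows as csvexample.csv above, held in memory for a self-contained example.
data = (
    'Name,Class Year,Dorm,Room,GPA\n'
    'Sally Whittaker,2018,McCarren House,312,3.75\n'
    'Belinda Jameson,2017,Cushing House,148,3.52\n'
    'Jeff Smith,2018,Prescott House,17-D,3.20\n'
    'Sandy Allen,2019,Oliver House,108,3.48\n'
)

rows = list(csv.DictReader(io.StringIO(data)))

# Every value is a string, so convert before doing arithmetic.
gpas = [float(row['GPA']) for row in rows]
average = sum(gpas) / len(gpas)
print(round(average, 4))  # 3.4875

# Write only the students with GPA >= 3.5, keeping two of the columns.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['Name', 'GPA'], extrasaction='ignore')
writer.writeheader()
for row in rows:
    if float(row['GPA']) >= 3.5:
        writer.writerow(row)
print(out.getvalue())
```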


## Processing: Regular Expressions

**Motivation**: Sometimes we need to search a file for specific patterns. For example, we may want to find where in the file an amount of some value is stated. Texts like 'amount=20', 'amount is 20', 'amount can be 20' or 'amount equals to 20' should all be accepted. We need a technique to define such a general pattern. In the following we introduce regular expressions, which are very helpful to define and find such patterns.


**Definition**: A regular expression (RE) is a sequence of symbols that specifies a pattern to be searched for within a text. In Python we usually write a pattern as a raw string, i.e. with the prefix **r** (as in r'\d+'), so that backslashes reach the regular expression engine unchanged. Here we list the basic symbols:
- \w : Any word character (letters, digits, and the underscore _ character)
- \W : Any character that is NOT a word character (anything that is not in \w)
- \d : Any digit
- \D : Any character that is NOT a digit (anything that is not in \d)
- . : Any character except a newline
- [ ] : A set of characters; r'[a-m]' means any letter between a and m is a match, r'[arn]' means any of {a, r, n}. Inside [ ] the character | has no special meaning.
- \* : Zero or more occurrences; r'ab*' means a followed by zero or more b's, like: a, ab, abbb, abbbb.
- \+ : One or more occurrences; r'aix+' means ai followed by one or more x's
- ?  : Zero or one occurrence of the preceding character; r'ab?' means a or ab
- {} : Exactly the specified number of occurrences; r'al{2}' means an a followed by exactly two l's
- | : Either the left or the right side; like r'falls|stays' (spaces around | would become part of the pattern)
- () : Makes a group; the text matched by a group can be retrieved later with group(n)
- ^ and $ : The start and the end of the string (of each line when the flag re.MULTILINE is used)
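A quick way to check what each symbol accepts is `re.fullmatch(pattern, s)`, a standard function of the re module that returns a match object only when the whole string fits the pattern, and None otherwise. The helper `accepts` below is a hypothetical convenience wrapper for this illustration; the patterns are the ones from the list above.

```python
import re

def accepts(pattern, s):
    # True if the WHOLE string s fits the pattern.
    return re.fullmatch(pattern, s) is not None

print(accepts(r'ab*', 'a'), accepts(r'ab*', 'abbb'))        # True True   (* : zero or more)
print(accepts(r'aix+', 'ai'), accepts(r'aix+', 'aixx'))     # False True  (+ : one or more)
print(accepts(r'ab?', 'ab'), accepts(r'ab?', 'abb'))        # True False  (? : zero or one)
print(accepts(r'al{2}', 'all'), accepts(r'al{2}', 'al'))    # True False  ({2} : exactly two)
print(accepts(r'[a-m]', 'k'), accepts(r'[a-m]', 'x'))       # True False  ([ ] : character set)
print(accepts(r'\d+', '2024'), accepts(r'\w+', 'task_1'))   # True True
print(accepts(r'falls|stays', 'stays'))                      # True  (| : either side)
print(accepts(r'falls | stays', 'stays'))                    # False: the spaces belong to the pattern

# ( ) : groups capture parts of the match.
m = re.fullmatch(r'(\w+)=(\d+)', 'amount=20')
print(m.group(1), m.group(2))  # amount 20
```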



**Programming**: 
To process regular expressions in Python we import the module **re**. The module **re** provides a function to search for a pattern in a given string:

**re.search(pattern, string, flags=0)** scans through **string** looking for the first location where the regular expression **pattern** produces a match, and returns a corresponding **match object**, or None if no position in the string matches.

A match object provides the following methods and attributes to process the result of the search:
- **span()**: a method that returns a tuple containing the start and end positions of the match
- **string**: an attribute that holds the string passed into the function
- **group()**: a method that returns the part of the string where there was a match



In [41]:
import re

text = 'Class diagrams can specify static aspects of an entity. It defines the classes and relationships. \n ' \
       'For the examples above, we can define classes Light and Task. But, how can we specify / model the behaviour of \n' \
       'the created objects from these classes? How can we specify that what happens to the objects during their lifetime? \n'

r1 = r'([cC])(lass)([a-z]*)'  # [cC] matches c or C; inside [ ] a | would match a literal '|'
mo = re.search(r1, text)  # This gives the first match (or None if there is none)
print('The match object is:',mo) # This prints the match object
print('The start and end pos of the match',mo.span()) # Tuple containing the start and end positions of the match
print(mo.string) # This prints the string passed into the function
print(mo.group()) # This prints the part of the string where there was a match
moall = re.findall(r1, text)  # This returns all the matches; with groups, each match is a tuple of the groups
print('The result for all the matches is:',moall) # This prints the list of all the matches

The match object is: <_sre.SRE_Match object; span=(0, 5), match='Class'>
The start and end pos of the match (0, 5)
Class diagrams can specify static aspects of an entity. It defines the classes and relationships. 
 For the examples above, we can define classes Light and Task. But, how can we specify / model the behaviour of 
the created objects from these classes? How can we specify that what happens to the objects during their lifetime? 

Class
The result for all the matches is: [('C', 'lass', ''), ('c', 'lass', 'es'), ('c', 'lass', 'es'), ('c', 'lass', 'es')]
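Coming back to the motivation example, one possible pattern for the four phrasings is 'amount', then one or more non-digits (\D+), then the number captured in a group; it is essentially the RE r'(amount)(\D+)(\d+)' used in the exercise below, with only the number captured. The sketch assumes the amount is always written with digits after the word 'amount' (a phrasing like 'amount is twenty' would not match).

```python
import re

# The four phrasings from the motivation above.
sentences = ['amount=20', 'amount is 20', 'amount can be 20', 'amount equals to 20']

# 'amount', then one or more non-digits (\D+), then the number captured as group 1.
pattern = r'amount\D+(\d+)'

for s in sentences:
    mo = re.search(pattern, s)
    print(mo.group(), '->', mo.group(1), mo.span())

# re.search returns None when there is no match, so check before using the result.
print(re.search(pattern, 'no quantity here'))  # None

# With a group in the pattern, re.findall returns the captured text instead of the whole match.
print(re.findall(pattern, 'amount=20 and later amount is 35'))  # ['20', '35']
```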


**Exercise**: Read a text file and search for the RE r'([cC])(lass)([a-z]*)' in its contents.

**Exercise**: Define a regular expression and search in the content of a text file.

**Exercise**: The following code is given. It shows how to search for a list of REs in a text and store the results in a log file. Moreover, it can be used as an example of how to read the content of a file line by line. Copy the following code into your editor and try the following exercises:
- Check each RE and see which words of the provided text match it.
- Add some terms at the end of the text that can be found by the list of REs.
- Change the code to read the content of a given file and find the REs in that content.


In [42]:
import re

text = 'Class diagrams can specify static aspects of an entity. ' \
       'It defines the classes and relationships. \n ' \
       'For the examples above, we can define classes Light and Task. But, ... / model the behaviour of \n' \
       'the created objects from these classes? .... during their lifetime? \n' \
       'used to specify: Real-time / Mission-Critical systems, e.g. Defense Systems; ' \
       'Special-purpose devices, like ATM; Games.\n' \
       'some terms added to be found,,, June 29th, July-16, ' \
       'the amount of some products can be 0 but we have a product with amount = 20. \n'

# let's define a list of all regular expressions
# (note: in r'e.g.' each '.' matches ANY character; r'e\.g\.' would match only the literal text 'e.g.')
regexs = [r'e.g.' , r'([Tt])(ask)([a-z]*)', r'([a-zA-Z]+)-([a-zA-Z]+)', r'(([a-zA-Z]+)( )([0-9]+))', 
          r'(([a-zA-Z]+).([0-9]+))' , r'(amount)(\D+)(\d+)', r'^[A-Z]', r'(\w+|\d+)([.])$']

result = ''
for reg in regexs:
    ms = re.findall(reg, text, re.MULTILINE)  # All matches; re.MULTILINE makes ^ and $ match at every line
    result = result + '\n'+'RE is: '+reg+'\n'+'All the matches are:'+str(ms)

# In order to practice, uncomment the following lines 
#filename = 'regexp-result.txt'
#filecontent = '[ Text ]:'+text+'\n'+result
#fo = open(filename ,'w')
#fo.write(filecontent)
#fo.close()
#fo = open(filename,'r')
#l = fo.readline() # reads the first line
#for l in fo:  # iterating over fo reads the remaining lines one by one
#    print(l)
#fo.close()
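As a further illustration of reading a file line by line, the sketch below searches each line for a list of REs and records the line number of every match. It writes a small hypothetical file `notes.txt` first so that it runs on its own; in practice you would open an existing file. It uses `re.finditer`, which yields match objects (so group() and span() are available) instead of the group tuples returned by findall.

```python
import os
import re
import tempfile

# A small text file to search; in practice this would be an existing file.
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')  # hypothetical file name
with open(path, 'w') as fo:
    fo.write('Class diagrams specify static aspects.\n')
    fo.write('We define classes Light and Task.\n')
    fo.write('The amount = 20 for this product.\n')

regexs = [r'([cC])(lass)([a-z]*)', r'(amount)(\D+)(\d+)']

hits = []
with open(path, 'r') as fo:
    for lineno, line in enumerate(fo, start=1):  # iterating a file yields one line at a time
        for reg in regexs:
            for mo in re.finditer(reg, line):    # finditer yields match objects, not tuples
                hits.append((lineno, reg, mo.group()))

for lineno, reg, found in hits:
    print(f'line {lineno}: {found!r} matched {reg}')
```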