## Chapter 8: Working with Files in Python
***

In practice, most data lives in files, so reading and writing files are among the most common input/output (I/O) operations.

In Python, we must open a file before we can use it and close it when we are done with it.

### §8.1 Reading from Files

Python provides an `open()` function to create a file object, and we can read data from the file object.

In [None]:
f = open('../folder/data.txt', 'r')

In the above example, `'../folder/data.txt'` is the path to the file we want to read, and `'r'` stands for reading mode.
If the file does not exist, `open()` raises a `FileNotFoundError` (a subclass of `OSError`; in Python 3, `IOError` is an alias of `OSError`).

![image.png](attachment:image.png)
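
As a minimal sketch of handling the missing-file case, the example below wraps `open()` in a `try`/`except` block. The helper name `read_text_or_none` and the file name are hypothetical, chosen only for illustration.

```python
def read_text_or_none(path):
    """Return the text of the file at path, or None if it does not exist."""
    try:
        f = open(path, 'r')
    except FileNotFoundError:
        # open() failed, so there is no file object to close
        print('File not found:', path)
        return None
    try:
        return f.read()
    finally:
        f.close()  # runs even if read() raises


# Hypothetical file name that is assumed not to exist
print(read_text_or_none('no_such_file_for_demo.txt'))
```

Catching the specific exception, rather than every exception, keeps unrelated bugs visible.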

If the specified file can be successfully opened, we can call the file object's `read()` method to get the contents from it:

In [None]:
f.read()

The final step is to call the `close()` method to close the file. Open files must be closed to release system resources.

In [None]:
f.close()

If the file is small, we can conveniently read all of its data with a single `read()` call. For a large file, however, reading everything at once is impractical. Instead, we can pass a size argument to `read()` and call it repeatedly, or iterate over the file object to process one line at a time, as in the following example:

In [5]:
gcfile = open("../data/graudate course type in Singapore.txt", "r")

for aline in gcfile:
    values = aline.split(",")
    print('In Year ', values[0], ", There are ",values[3], " ", values[1], " graduates in course ", values[2] )

gcfile.close()

In Year  year , There are  no_of_graduates
   sex  graduates in course  type_of_course
In Year  1993 , There are  na
   Males  graduates in course  Education
In Year  1993 , There are  na
   Males  graduates in course  Applied Arts
In Year  1993 , There are  481
   Males  graduates in course  Humanities & Social Sciences
In Year  1993 , There are  na
   Males  graduates in course  Mass Communication
In Year  1993 , There are  295
   Males  graduates in course  Accountancy
In Year  1993 , There are  282
   Males  graduates in course  Business & Administration
In Year  1993 , There are  92
   Males  graduates in course  Law
In Year  1993 , There are   Physical & Mathematical Sciences"   Males  graduates in course  "Natural
In Year  1993 , There are  95
   Males  graduates in course  Medicine
In Year  1993 , There are  14
   Males  graduates in course  Dentistry
In Year  1993 , There are  10
   Males  graduates in course  Health Sciences
In Year  1993 , There are  264
   Males  graduates 

In Year  1999 , There are  218
   Females  graduates in course  Information Technology
In Year  1999 , There are  167
   Females  graduates in course  Architecture & Building
In Year  1999 , There are  492
   Females  graduates in course  Engineering Sciences
In Year  1999 , There are  na
   Females  graduates in course  Services
In Year  2000 , There are  35
   Males  graduates in course  Education
In Year  2000 , There are  na
   Males  graduates in course  Applied Arts
In Year  2000 , There are  574
   Males  graduates in course  Humanities & Social Sciences
In Year  2000 , There are  34
   Males  graduates in course  Mass Communication
In Year  2000 , There are  250
   Males  graduates in course  Accountancy
In Year  2000 , There are  328
   Males  graduates in course  Business & Administration
In Year  2000 , There are  74
   Males  graduates in course  Law
In Year  2000 , There are   Physical & Mathematical Sciences"   Males  graduates in course  "Natural
In Year  2000 , There ar

   Males  graduates in course  Mass Communication
In Year  2006 , There are  176
   Males  graduates in course  Accountancy
In Year  2006 , There are  432
   Males  graduates in course  Business & Administration
In Year  2006 , There are  70
   Males  graduates in course  Law
In Year  2006 , There are   Physical & Mathematical Sciences"   Males  graduates in course  "Natural
In Year  2006 , There are  135
   Males  graduates in course  Medicine
In Year  2006 , There are  18
   Males  graduates in course  Dentistry
In Year  2006 , There are  24
   Males  graduates in course  Health Sciences
In Year  2006 , There are  319
   Males  graduates in course  Information Technology
In Year  2006 , There are  166
   Males  graduates in course  Architecture & Building
In Year  2006 , There are  2934
   Males  graduates in course  Engineering Sciences
In Year  2006 , There are  na
   Males  graduates in course  Services
In Year  2006 , There are  302
   Females  graduates in course  Education
In Y

In Year  2010 , There are  121
   Females  graduates in course  Applied Arts
In Year  2010 , There are  1474
   Females  graduates in course  Humanities & Social Sciences
In Year  2010 , There are  147
   Females  graduates in course  Mass Communication
In Year  2010 , There are  554
   Females  graduates in course  Accountancy
In Year  2010 , There are  859
   Females  graduates in course  Business & Administration
In Year  2010 , There are  90
   Females  graduates in course  Law
In Year  2010 , There are   Physical & Mathematical Sciences"   Females  graduates in course  "Natural
In Year  2010 , There are  94
   Females  graduates in course  Medicine
In Year  2010 , There are  25
   Females  graduates in course  Dentistry
In Year  2010 , There are  196
   Females  graduates in course  Health Sciences
In Year  2010 , There are  154
   Females  graduates in course  Information Technology
In Year  2010 , There are  178
   Females  graduates in course  Architecture & Building
In Year  2
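
The size argument to `read()` mentioned earlier can be sketched as follows. This is an illustrative example only: the helper `read_in_chunks`, the chunk size, and the throwaway demo file are assumptions, not part of the original notebook.

```python
import os
import tempfile


def read_in_chunks(path, chunk_size=1024):
    """Yield pieces of at most chunk_size characters until the file ends."""
    f = open(path, 'r')
    try:
        while True:
            chunk = f.read(chunk_size)  # read up to chunk_size characters
            if not chunk:               # an empty string means end of file
                break
            yield chunk
    finally:
        f.close()                       # always release the file


# Demonstration on a small throwaway file (hypothetical content)
demo_path = os.path.join(tempfile.mkdtemp(), 'chunks_demo.txt')
f = open(demo_path, 'w')
f.write('abcdefghij')
f.close()

chunks = list(read_in_chunks(demo_path, chunk_size=4))
print(chunks)  # ['abcd', 'efgh', 'ij']
```

Only one chunk is held in memory at a time, which is what makes this approach suitable for large files.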

Python's `with` statement closes an opened file for us automatically:

In [2]:
with open('../data/graudate course type in Singapore.csv', 'r') as f:
    print(f.read())

year,sex,type_of_course,no_of_graduates
1993,Males,Education,na
1993,Males,Applied Arts,na
1993,Males,Humanities & Social Sciences,481
1993,Males,Mass Communication,na
1993,Males,Accountancy,295
1993,Males,Business & Administration,282
1993,Males,Law,92
1993,Males,"Natural, Physical & Mathematical Sciences",404
1993,Males,Medicine,95
1993,Males,Dentistry,14
1993,Males,Health Sciences,10
1993,Males,Information Technology,264
1993,Males,Architecture & Building,132
1993,Males,Engineering Sciences,1496
1993,Males,Services,na
1993,Females,Education,na
1993,Females,Applied Arts,na
1993,Females,Humanities & Social Sciences,1173
1993,Females,Mass Communication,na
1993,Females,Accountancy,396
1993,Females,Business & Administration,708
1993,Females,Law,93
1993,Females,"Natural, Physical & Mathematical Sciences",588
1993,Females,Medicine,61
1993,Females,Dentistry,11
1993,Females,Health Sciences,40
1993,Females,Information Technology,215
1993,Females,Architecture & Building,144
1993,Females,Engine
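
To check that `with` really closes the file, including when an error occurs inside the block, here is a small sketch. The temporary file and the simulated `ValueError` are assumptions made for the demonstration.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(demo_path, 'w') as f:
    f.write('hello\n')
closed_after_write = f.closed  # the file was closed on leaving the block

try:
    with open(demo_path, 'r') as f:
        first_line = f.readline()
        # simulate a failure in the middle of processing
        raise ValueError('simulated failure while processing')
except ValueError:
    pass
closed_after_error = f.closed  # still closed, despite the exception

print(closed_after_write, closed_after_error)
```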

Instead of reading the whole file at once, we can read it one line at a time.
The table below summarizes the methods we can use.
At the end of the file, `readline()` returns an empty string and `readlines()` returns an empty list.

![image.png](attachment:image.png)
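
The end-of-file behaviour of these methods can be verified with a short sketch. The two-line demo file is a made-up example written to a temporary directory.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
with open(demo_path, 'w') as f:
    f.write('first\nsecond\n')

with open(demo_path, 'r') as f:
    line1 = f.readline()       # one line, including its trailing '\n'
    rest = f.readlines()       # all remaining lines as a list
    eof_line = f.readline()    # at end of file: empty string
    eof_lines = f.readlines()  # at end of file: empty list

print(repr(line1), rest, repr(eof_line), eof_lines)
```

Because an empty string is falsy, a loop such as `while line:` stops naturally at the end of the file.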

In [25]:
infile = open("../data/graudate course type in Singapore.txt", "r")

In [28]:
line = infile.readline()
while line:
    values = line.split(",")
    print('In Year ', values[0], ", There are ",values[3], " ", values[1], " graduates in course ", values[2] )
    line = infile.readline()

infile.close()

In Year  1993 , There are  na
   Males  graduates in course  Applied Arts
In Year  1993 , There are  481
   Males  graduates in course  Humanities & Social Sciences
In Year  1993 , There are  na
   Males  graduates in course  Mass Communication
In Year  1993 , There are  295
   Males  graduates in course  Accountancy
In Year  1993 , There are  282
   Males  graduates in course  Business & Administration
In Year  1993 , There are  92
   Males  graduates in course  Law
In Year  1993 , There are   Physical & Mathematical Sciences"   Males  graduates in course  "Natural
In Year  1993 , There are  95
   Males  graduates in course  Medicine
In Year  1993 , There are  14
   Males  graduates in course  Dentistry
In Year  1993 , There are  10
   Males  graduates in course  Health Sciences
In Year  1993 , There are  264
   Males  graduates in course  Information Technology
In Year  1993 , There are  132
   Males  graduates in course  Architecture & Building
In Year  1993 , There are  1496
   Mal

   Males  graduates in course  Law
In Year  2001 , There are   Physical & Mathematical Sciences"   Males  graduates in course  "Natural
In Year  2001 , There are  97
   Males  graduates in course  Medicine
In Year  2001 , There are  19
   Males  graduates in course  Dentistry
In Year  2001 , There are  17
   Males  graduates in course  Health Sciences
In Year  2001 , There are  249
   Males  graduates in course  Information Technology
In Year  2001 , There are  124
   Males  graduates in course  Architecture & Building
In Year  2001 , There are  2517
   Males  graduates in course  Engineering Sciences
In Year  2001 , There are  na
   Males  graduates in course  Services
In Year  2001 , There are  147
   Females  graduates in course  Education
In Year  2001 , There are  na
   Females  graduates in course  Applied Arts
In Year  2001 , There are  1520
   Females  graduates in course  Humanities & Social Sciences
In Year  2001 , There are  77
   Females  graduates in course  Mass Communica

In Year  2005 , There are  363
   Males  graduates in course  Information Technology
In Year  2005 , There are  138
   Males  graduates in course  Architecture & Building
In Year  2005 , There are  2887
   Males  graduates in course  Engineering Sciences
In Year  2005 , There are  na
   Males  graduates in course  Services
In Year  2005 , There are  278
   Females  graduates in course  Education
In Year  2005 , There are  11
   Females  graduates in course  Applied Arts
In Year  2005 , There are  1027
   Females  graduates in course  Humanities & Social Sciences
In Year  2005 , There are  110
   Females  graduates in course  Mass Communication
In Year  2005 , There are  495
   Females  graduates in course  Accountancy
In Year  2005 , There are  799
   Females  graduates in course  Business & Administration
In Year  2005 , There are  125
   Females  graduates in course  Law
In Year  2005 , There are   Physical & Mathematical Sciences"   Females  graduates in course  "Natural
In Year  20

   Females  graduates in course  Mass Communication
In Year  2011 , There are  507
   Females  graduates in course  Accountancy
In Year  2011 , There are  944
   Females  graduates in course  Business & Administration
In Year  2011 , There are  208
   Females  graduates in course  Law
In Year  2011 , There are   Physical & Mathematical Sciences"   Females  graduates in course  "Natural
In Year  2011 , There are  119
   Females  graduates in course  Medicine
In Year  2011 , There are  26
   Females  graduates in course  Dentistry
In Year  2011 , There are  213
   Females  graduates in course  Health Sciences
In Year  2011 , There are  195
   Females  graduates in course  Information Technology
In Year  2011 , There are  247
   Females  graduates in course  Architecture & Building
In Year  2011 , There are  1215
   Females  graduates in course  Engineering Sciences
In Year  2011 , There are  50
   Females  graduates in course  Services
In Year  2012 , There are  116
   Males  graduates i
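
In the output above, rows such as `"Natural, Physical & Mathematical Sciences"` come out wrong because `split(",")` also splits on the comma inside the quoted field, and the last value keeps its trailing newline. One possible remedy, sketched here under the assumption that the file follows ordinary CSV quoting rules, is the standard-library `csv` module. The sample rows are copied from the data shown in this chapter and are read from an in-memory string rather than the real file.

```python
import csv
import io

# A few rows in the same layout as the data file, held in memory for the demo
sample = (
    'year,sex,type_of_course,no_of_graduates\n'
    '1993,Males,Law,92\n'
    '1993,Males,"Natural, Physical & Mathematical Sciences",404\n'
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # first row holds the column names
rows = list(reader)    # each row is a list of strings without the newline

for year, sex, course, count in rows:
    print('In year', year, 'there are', count, sex, 'graduates in course', course)
```

When reading a real file, the `csv` documentation recommends opening it with `newline=''` before passing the file object to `csv.reader`.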

In [34]:
with open('../data/graudate course type in Singapore.txt', 'r') as f:
    lines = f.readlines()  # list of lines, each ending with '\n'
lines[1:]  # skip the header row

['1993,Males,Education,na\n',
 '1993,Males,Applied Arts,na\n',
 '1993,Males,Humanities & Social Sciences,481\n',
 '1993,Males,Mass Communication,na\n',
 '1993,Males,Accountancy,295\n',
 '1993,Males,Business & Administration,282\n',
 '1993,Males,Law,92\n',
 '1993,Males,"Natural, Physical & Mathematical Sciences",404\n',
 '1993,Males,Medicine,95\n',
 '1993,Males,Dentistry,14\n',
 '1993,Males,Health Sciences,10\n',
 '1993,Males,Information Technology,264\n',
 '1993,Males,Architecture & Building,132\n',
 '1993,Males,Engineering Sciences,1496\n',
 '1993,Males,Services,na\n',
 '1993,Females,Education,na\n',
 '1993,Females,Applied Arts,na\n',
 '1993,Females,Humanities & Social Sciences,1173\n',
 '1993,Females,Mass Communication,na\n',
 '1993,Females,Accountancy,396\n',
 '1993,Females,Business & Administration,708\n',
 '1993,Females,Law,93\n',
 '1993,Females,"Natural, Physical & Mathematical Sciences",588\n',
 '1993,Females,Medicine,61\n',
 '1993,Females,Dentistry,11\n',
 '1993,Females,Health 

Calling the `readlines()` method reads the file into a list of lines, so that we can process it line by line:

In [35]:
with open('../data/graudate course type in Singapore.txt', 'r') as f:
    for line in f.readlines()[1:]:  # skip the header row
        print(line.strip())  # strip() removes surrounding whitespace, including the trailing '\n'

1993,Males,Education,na
1993,Males,Applied Arts,na
1993,Males,Humanities & Social Sciences,481
1993,Males,Mass Communication,na
1993,Males,Accountancy,295
1993,Males,Business & Administration,282
1993,Males,Law,92
1993,Males,"Natural, Physical & Mathematical Sciences",404
1993,Males,Medicine,95
1993,Males,Dentistry,14
1993,Males,Health Sciences,10
1993,Males,Information Technology,264
1993,Males,Architecture & Building,132
1993,Males,Engineering Sciences,1496
1993,Males,Services,na
1993,Females,Education,na
1993,Females,Applied Arts,na
1993,Females,Humanities & Social Sciences,1173
1993,Females,Mass Communication,na
1993,Females,Accountancy,396
1993,Females,Business & Administration,708
1993,Females,Law,93
1993,Females,"Natural, Physical & Mathematical Sciences",588
1993,Females,Medicine,61
1993,Females,Dentistry,11
1993,Females,Health Sciences,40
1993,Females,Information Technology,215
1993,Females,Architecture & Building,144
1993,Females,Engineering Sciences,254
1993,Females,Services

In `'rb'` mode, we can read binary files such as images and videos.

In [None]:
f = open('binary_file.bin', 'rb')
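
As a small illustration of binary mode, the sketch below writes a few raw bytes to a temporary file and reads them back. The file name and byte values are invented for the demo. In binary mode, `read()` returns a `bytes` object instead of a `str`.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
payload = bytes([0, 1, 2, 255])  # four arbitrary byte values

with open(demo_path, 'wb') as f:  # 'wb': write binary
    f.write(payload)

with open(demo_path, 'rb') as f:  # 'rb': read binary
    data = f.read()

print(type(data), list(data))
```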

### §8.2 Writing to Files
Writing files is similar to reading files, but we use `'w'` or `'wb'` mode to write text or binary files respectively.

In [41]:
f = open('../data/data.txt', 'w')
f.write('01 02 03')
f.close()


Remember to close the file; otherwise the last part of the data may be lost because it is still held in a buffer.  
Similarly, we can write files using a `with` statement, which closes the file for us:

In [45]:
with open('../data/data.txt', 'a') as f:
    f.write('this is a new line\n')

If the file already exists, opening it in `'w'` mode replaces its contents. To append new content to an existing file, use `'a'` mode instead of `'w'`, as in the example above.
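
The difference between the two modes can be seen in a short sketch. The temporary file and the strings written to it are made up for this demonstration.

```python
import os
import tempfile

demo_path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(demo_path, 'w') as f:  # creates the file
    f.write('first\n')

with open(demo_path, 'w') as f:  # 'w' again: the old contents are discarded
    f.write('second\n')

with open(demo_path, 'a') as f:  # 'a': new text goes after the existing text
    f.write('third\n')

with open(demo_path, 'r') as f:
    content = f.read()

print(repr(content))  # 'second\nthird\n'
```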

### §8.3 Working with JSON Data


JSON is one of the most popular formats for transferring data through APIs nowadays.  
Python comes with a built-in module for encoding and decoding JSON data.

In [14]:
import json

#### Requesting data online
Nowadays, most programs and applications require access to the internet to download or upload data. In this part, let's look at how we can download data in Python in a convenient way.

#### Using requests
Making a request to a URL with the `requests` module is very simple. First, install the `requests` module by running
```
pip install requests
```
then import it in your code

In [48]:
import requests
import json

Now, let's try to get some data from an API:

In [50]:
r = requests.get('https://api.github.com')
print(r.text)

{
  "current_user_url": "https://api.github.com/user",
  "current_user_authorizations_html_url": "https://github.com/settings/connections/applications{/client_id}",
  "authorizations_url": "https://api.github.com/authorizations",
  "code_search_url": "https://api.github.com/search/code?q={query}{&page,per_page,sort,order}",
  "commit_search_url": "https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}",
  "emails_url": "https://api.github.com/user/emails",
  "emojis_url": "https://api.github.com/emojis",
  "events_url": "https://api.github.com/events",
  "feeds_url": "https://api.github.com/feeds",
  "followers_url": "https://api.github.com/user/followers",
  "following_url": "https://api.github.com/user/following{/target}",
  "gists_url": "https://api.github.com/gists{/gist_id}",
  "hub_url": "https://api.github.com/hub",
  "issue_search_url": "https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}",
  "issues_url": "https://api.github.com/issues

By passing the URL to the `requests.get` function, we get the response from that URL. For an ordinary web page the response is usually HTML; this API returns JSON.  
If you copy and paste the URL into your browser, you will see the same result.

Try the URL `https://api.github.com` in your browser. Right-click the page and select `View Page Source` to see the data.

If you want to extract certain information from a web page, especially when you are writing a web crawler, you have to deal with strings containing HTML tags.

We can convert this JSON object (saved as a Python string) to a Python dictionary using `json.loads`:

In [51]:
json.loads(r.text)

{'current_user_url': 'https://api.github.com/user',
 'current_user_authorizations_html_url': 'https://github.com/settings/connections/applications{/client_id}',
 'authorizations_url': 'https://api.github.com/authorizations',
 'code_search_url': 'https://api.github.com/search/code?q={query}{&page,per_page,sort,order}',
 'commit_search_url': 'https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}',
 'emails_url': 'https://api.github.com/user/emails',
 'emojis_url': 'https://api.github.com/emojis',
 'events_url': 'https://api.github.com/events',
 'feeds_url': 'https://api.github.com/feeds',
 'followers_url': 'https://api.github.com/user/followers',
 'following_url': 'https://api.github.com/user/following{/target}',
 'gists_url': 'https://api.github.com/gists{/gist_id}',
 'hub_url': 'https://api.github.com/hub',
 'issue_search_url': 'https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}',
 'issues_url': 'https://api.github.com/issues',
 'keys_url': '

In [52]:
import json
r = requests.get('https://api.github.com')
user_url = json.loads(r.text)['current_user_url']
print(user_url)

https://api.github.com/user


If the response is not in JSON format, calling `json.loads` raises an exception, so it is better to wrap this code in a try-except block.

In [17]:
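try:
    data = json.loads(r.text)
except json.JSONDecodeError:
    # raised when the text is not valid JSON
    print('invalid JSON data')
else:
    # runs only if parsing succeeded
    print(data['emails_url'])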

https://api.github.com/user/emails
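
To see the failure case without making a network request, here is a minimal sketch that parses two hard-coded strings, one valid JSON and one HTML. The helper name `parse_json_or_none` and both sample strings are assumptions for illustration.

```python
import json


def parse_json_or_none(text):
    """Return the decoded object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.msg holds a short description of what went wrong
        print('invalid JSON data:', err.msg)
        return None


good = parse_json_or_none('{"emails_url": "https://api.github.com/user/emails"}')
bad = parse_json_or_none('<html><body>Not JSON</body></html>')

print(good)
print(bad)
```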


It is possible to encounter network issues while requesting online resources, so it is important to wrap your requests in try-except blocks, check the response status, and set a timeout.

In [22]:
try:
    # set a timeout of 5 seconds to prevent endless waiting
    r = requests.get('https://api.github.com/', timeout=5)
    if r.status_code == 200:
        # HTTP 200 status response code indicates the request has succeeded
        print('Valid response status')
    else:
        print('Invalid response status')
except requests.exceptions.RequestException:
    # base class for connection errors, timeouts and other request failures
    print('Error occurred')

Valid response status


`json.dumps` can convert a Python dictionary to a JSON formatted string.

In [53]:
json.dumps({'key': 'abc', 'value': 'Hello World', 'valid': True})

'{"key": "abc", "value": "Hello World", "valid": true}'
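
Since this chapter is about files, it may help to note that the `json` module also offers `json.dump` and `json.load`, which write to and read from a file object directly. The sketch below assumes a temporary file and reuses the dictionary from the example above.

```python
import json
import os
import tempfile

record = {'key': 'abc', 'value': 'Hello World', 'valid': True}
demo_path = os.path.join(tempfile.mkdtemp(), 'record.json')

with open(demo_path, 'w') as f:
    json.dump(record, f)   # serialize the dictionary into the file

with open(demo_path, 'r') as f:
    loaded = json.load(f)  # parse the file contents back into a dictionary

print(loaded == record)  # True
```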