## Reading Files

The *open()* function opens a file and returns a file object.

In [18]:
file = open('text_file.txt', 'r')
file

<_io.TextIOWrapper name='text_file.txt' mode='r' encoding='cp1252'>

The second argument specifies the mode: r - read only (the default), w - write only (truncates an existing file), a - append, r+ - read and write. Adding b to any of these (for example rb) opens the file in binary mode.

In [3]:
bin_file = open('text_file.txt', 'rb')
bin_file

<_io.BufferedReader name='text_file.txt'>
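
The modes listed above can be compared side by side. The following is a minimal sketch, assuming a scratch file in a temporary directory (the name modes_demo.txt is made up for illustration and is not part of the notebook):

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not depend on text_file.txt
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w', encoding='utf-8') as f:   # 'w' creates or truncates the file
    f.write('File')

with open(path, 'a', encoding='utf-8') as f:   # 'a' adds to the end of the file
    f.write(' modes')

with open(path, 'r', encoding='utf-8') as f:   # text mode returns str
    text = f.read()

with open(path, 'rb') as f:                    # binary mode returns bytes
    raw = f.read()

print(type(text).__name__, repr(text))
print(type(raw).__name__, raw)
```

Text mode decodes the bytes on disk into str, while binary mode hands them back unchanged.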

A better approach is to open the file in a *with* statement, which closes it automatically when the block ends:

In [None]:
with open('text_file.txt', 'r') as f:
    pass  # operations on the file object go here
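
To see what the *with* statement does, the sketch below checks the file object's *closed* attribute inside and after the block. The scratch file name is an assumption made for illustration:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this sketch
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')

with open(path, 'r', encoding='utf-8') as f:
    first_line = f.readline()
    closed_inside = f.closed    # the file is still open inside the block

closed_after = f.closed         # the block has ended, so the file is closed

print(closed_inside, closed_after)
```

The file is also closed if the block exits because of an exception, which is the main reason to prefer this form over calling *close()* by hand.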

The *read()* function reads the entire content of the file. It also takes an optional argument specifying the number of characters to read (text mode) or the number of bytes to read (binary mode).

In [4]:
contents = file.read()
contents

'File is a named location on disk to store related information. It is used to permanently store data in a non-volatile memory.\n\nSince, random access memory (RAM) is volatile which loses its data when computer is turned off, we use files for future use of the data.\n\nWhen we want to read from or write to a file we need to open it first. When we are done, it needs to be closed, so that resources that are tied with the file are freed.\nHence, in Python, a file operation takes place in the following order.'

In [5]:
bin_contents = bin_file.read(2)
bin_contents

b'Fi'
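
The difference between counting characters and counting bytes only shows up with non-ASCII text. A small sketch, assuming a scratch file written as UTF-8 (file name chosen for illustration):

```python
import os
import tempfile

# Hypothetical scratch file containing multi-byte characters
path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('今有一人 File')

with open(path, 'r', encoding='utf-8') as f:
    two_chars = f.read(2)   # two characters, regardless of their byte width
    rest = f.read()         # everything that is left
    at_eof = f.read()       # at end of file, read() returns an empty string

with open(path, 'rb') as f:
    two_bytes = f.read(2)   # two bytes: only part of the first character

print(repr(two_chars), repr(rest), repr(at_eof))
print(two_bytes)
```

In binary mode the two bytes returned are an incomplete piece of a three-byte character, so they cannot be decoded on their own.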

To read one line at a time, use the *readline()* function, or iterate over the file object with a for loop.

In [19]:
for line in file:
    print(line, end='')

File is a named location on disk to store related information. It is used to permanently store data in a non-volatile memory.

Since, random access memory (RAM) is volatile which loses its data when computer is turned off, we use files for future use of the data.

When we want to read from or write to a file we need to open it first. When we are done, it needs to be closed, so that resources that are tied with the file are freed.
Hence, in Python, a file operation takes place in the following order.
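
The *readline()* alternative to the for loop can be sketched as follows. It returns an empty string at end of file, which is used here as the stop condition. The scratch file and its contents are assumptions made for illustration:

```python
import os
import tempfile

# Hypothetical three-line scratch file
path = os.path.join(tempfile.mkdtemp(), 'readline_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('first\nsecond\nthird')

lines = []
with open(path, 'r', encoding='utf-8') as f:
    while True:
        line = f.readline()
        if not line:        # '' means end of file; a blank line would be '\n'
            break
        lines.append(line)

print(lines)
```

Each line keeps its trailing newline, except possibly the last one, which is why the loop in the notebook prints with end=''.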

We can see that the file object is an iterator.

In [20]:
content_iterator = iter(file)
next(content_iterator)

StopIteration: 

The earlier code raised StopIteration because the for loop had already consumed the iterator to the end of the file. Moving back to the start with *seek(0)* makes it usable again.

In [21]:
file.seek(0)
next(content_iterator)

'File is a named location on disk to store related information. It is used to permanently store data in a non-volatile memory.\n'
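
The same behaviour can be reproduced in a few lines. This sketch, which assumes a small scratch file, shows that a file object is its own iterator, that a second pass yields nothing, and that *seek(0)* rewinds it:

```python
import os
import tempfile

# Hypothetical two-line scratch file
path = os.path.join(tempfile.mkdtemp(), 'iter_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('first\nsecond\n')

with open(path, 'r', encoding='utf-8') as f:
    same_object = iter(f) is f   # iter() returns the file object itself
    first_pass = list(f)         # consumes every line
    second_pass = list(f)        # nothing left to read
    f.seek(0)                    # rewind to the beginning
    third_pass = list(f)         # the lines are available again

print(same_object, first_pass, second_pass, third_pass)
```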

## Encoding Issue

The *open()* function also accepts **encoding** as a keyword argument: pass encoding='value'. When it is omitted, the default depends on the platform and locale; on this Windows machine it is CP-1252.

In [22]:
file.encoding

'cp1252'
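
Because the default varies between machines, it can help to look it up and to pass the encoding explicitly. A hedged sketch: *locale.getpreferredencoding(False)* normally reports the encoding that *open()* falls back to, although the exact rule depends on the Python version and its settings. The scratch file name is an assumption:

```python
import locale
import os
import tempfile

# What this machine would normally use when no encoding is given
default_encoding = locale.getpreferredencoding(False)
print(default_encoding)

# Hypothetical scratch file; the explicit encoding makes the result portable
path = os.path.join(tempfile.mkdtemp(), 'encoding_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('naïve café')

with open(path, 'r', encoding='utf-8') as f:
    round_trip = f.read()

print(round_trip)
```

Writing and reading with the same explicit encoding gives the same text on every platform, whatever the default is.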

In [25]:
utf_file = open('text_file.txt', 'r', encoding='utf-8')
utf_file.read()

'File is a named location on disk to store related information. It is used to permanently store data in a non-volatile memory.\n\nSince, random access memory (RAM) is volatile which loses its data when computer is turned off, we use files for future use of the data.\n\nWhen we want to read from or write to a file we need to open it first. When we are done, it needs to be closed, so that resources that are tied with the file are freed.\nHence, in Python, a file operation takes place in the following order.'

In [30]:
chinese_text = open('chinese_text.txt')
print(chinese_text.encoding)
chinese_text.read()

cp1252


UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 43: character maps to <undefined>

In [36]:
chinese_text_utf = open('chinese_text.txt', encoding='utf-8')
chinese_text_utf.read()

'\ufeff今有一人，入人園圃，竊其桃李，眾聞則非之，上為政者得則罰之。此何也？以虧人自利也。至攘人犬豕雞豚者，其不義又甚入人園圃竊桃李。是何故也？以虧人愈多，其不仁茲甚，罪益厚。至入人欄廄，取人馬牛者，其不仁義又甚攘人犬豕雞豚。此何故也？以其虧人愈多。苟虧人愈多，其不仁茲甚，罪益厚。至殺不辜人也，扡其衣裘，取戈劍者，其不義又甚入人欄廄取人馬牛。此何故也？以其虧人愈多。苟虧人愈多，其不仁茲甚矣，罪益厚。當此，天下之君\n子1皆知而非之，謂之不義。今至大為攻國，則弗知非，從而譽之，謂之義。此可2謂知義與不義之別乎？'

In [37]:
chinese_text_utf.seek(0)
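chinese_text_utf.read(1)  # returns the BOM '\ufeff' (not displayed: only the last expression is echoed)
chinese_text.tell()  # note: this is the cp1252 file object from In [30], not chinese_text_utf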

754

In [33]:
chinese_text_utf.read(1)

'今'
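
The leading '\ufeff' is a byte order mark (BOM) stored at the start of the file. One way to avoid handling it by hand is Python's 'utf-8-sig' codec, which skips a BOM if one is present. A minimal sketch, assuming a scratch file that starts with a BOM (this is not the notebook's chinese_text.txt):

```python
import codecs
import os
import tempfile

# Hypothetical scratch file: UTF-8 BOM followed by UTF-8 text
path = os.path.join(tempfile.mkdtemp(), 'bom_demo.txt')

with open(path, 'wb') as f:
    f.write(codecs.BOM_UTF8 + '今有一人'.encode('utf-8'))

with open(path, 'r', encoding='utf-8') as f:
    with_bom = f.read()        # the BOM comes through as '\ufeff'

with open(path, 'r', encoding='utf-8-sig') as f:
    without_bom = f.read()     # the BOM is stripped while decoding

print(repr(with_bom))
print(repr(without_bom))
```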

In [47]:
chinese_text_utf.seek(0)
print(chinese_text_utf.read(1))
chinese_text_utf.seek(3)
print(chinese_text_utf.read(1))
chinese_text_utf.seek(6)
print(chinese_text_utf.read(1))
chinese_text_utf.seek(9)
print(chinese_text_utf.read(1))

﻿
今
有
一


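In the above example, the BOM and each Chinese character take 3 bytes in UTF-8, which is a variable-length encoding, so the offsets passed to *seek()* here are byte positions rather than character positions. Seeking into the middle of a multi-byte character raises an error.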

In [48]:
chinese_text_utf.seek(2)
chinese_text_utf.read(1)

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbf in position 0: invalid start byte
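
The same effect can be shown without a file by encoding a string and slicing the bytes. This is a small sketch with made-up sample strings:

```python
# UTF-8 uses a different number of bytes per character
widths = [len(ch.encode('utf-8')) for ch in 'A¢今']
print(widths)

data = '今有一'.encode('utf-8')   # three characters, three bytes each
print(len(data))

# A slice that lines up with a character boundary decodes cleanly
second = data[3:6].decode('utf-8')
print(second)

# A slice that starts inside a character does not
try:
    data[2:5].decode('utf-8')
    error = None
except UnicodeDecodeError as exc:
    error = exc
    print('decode failed:', exc.reason)
```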

## Writing to File

In [49]:
with open('another_file.txt', 'a') as f:
    f.write('The first line.\nThe second one')

In [51]:
with open('another_file.txt', 'r') as f:
    print(f.read())

The first line.
The second one


In [52]:
with open('another_file.txt', 'r') as f:
    f.write('Adding content in read only mode')

UnsupportedOperation: not writable
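
The error above is *io.UnsupportedOperation*. A sketch of how it can be anticipated with *writable()* or caught explicitly, assuming a scratch file created for the purpose:

```python
import io
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'readonly_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('existing content')

with open(path, 'r', encoding='utf-8') as f:
    can_write = f.writable()           # False in read-only mode
    try:
        f.write('Adding content in read only mode')
        message = None
    except io.UnsupportedOperation as exc:
        message = str(exc)

with open(path, 'r+', encoding='utf-8') as f:
    can_write_rplus = f.writable()     # True: r+ allows reading and writing

print(can_write, message, can_write_rplus)
```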

In [53]:
with open('another_file.txt', 'w') as f:
    f.write('New content added!')

In [54]:
with open('another_file.txt', 'r') as f:
    print(f.read())

New content added!


In [55]:
with open('another_file.txt', 'a') as f:
    f.write('Adding again the old line.\nThe second one.')

In [57]:
with open('another_file.txt', 'a') as f:
    f.seek(0)
    f.write('Trying to add to the start.\n') # Doesn't work: in append mode every write goes to the end of the file
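
The sketch below confirms that *seek(0)* has no effect on where append mode writes, and shows one simple way to put a line at the start instead: read the existing content and rewrite the file. This is only one possible approach and reads the whole file into memory; the scratch file name is an assumption:

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'append_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('old line\n')

with open(path, 'a', encoding='utf-8') as f:
    f.seek(0)                       # ignored for writing in append mode
    f.write('appended anyway\n')

with open(path, 'r', encoding='utf-8') as f:
    after_append = f.read()

# To add text at the start, rewrite the file with the new line first
with open(path, 'r+', encoding='utf-8') as f:
    existing = f.read()
    f.seek(0)
    f.write('new first line\n' + existing)

with open(path, 'r', encoding='utf-8') as f:
    after_prepend = f.read()

print(after_append)
print(after_prepend)
```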

In [63]:
with open('utf_file.txt', 'w', encoding='utf-8') as f:
    f.write('是. The JSON format is commonly used by modern applications to allow for data exchange.\
    \nMany programmers are already familiar with it, which makes it a good choice for interoperability.')

In [69]:
with open('utf_file.txt', 'rb') as f:
    print(f.read(5))
    print(f.read())

b'\xe6\x98\xaf. '
b'The JSON format is commonly used by modern applications to allow for data exchange.    \r\nMany programmers are already familiar with it, which makes it a good choice for interoperability.'
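
The '\r\n' in the binary output comes from newline translation: in text mode on Windows, each '\n' written is stored as '\r\n' by default. The *newline* argument of *open()* controls this. A minimal sketch, assuming a scratch file:

```python
import os
import tempfile

# Hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')

# newline='\n' disables translation: '\n' is written unchanged
with open(path, 'w', encoding='utf-8', newline='\n') as f:
    f.write('one\ntwo\n')
with open(path, 'rb') as f:
    unix_bytes = f.read()

# newline='\r\n' forces Windows-style line endings on any platform
with open(path, 'w', encoding='utf-8', newline='\r\n') as f:
    f.write('one\ntwo\n')
with open(path, 'rb') as f:
    windows_bytes = f.read()

print(unix_bytes)
print(windows_bytes)
```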


In [80]:
with open('random_characters.txt', 'r+b') as f:
    f.write(b'\xe6\x99\xad\xe5\x9f\xad\xe2\x94\xae')

with open('random_characters.txt', 'r', encoding='utf-8') as f:
    print(f.read())

晭埭┮


In [89]:
with open('random_characters2.txt', 'ab') as f:
    n = '1100001010100010'
    data = int(n,2).to_bytes((len(n)+7)//8, 'big')  # converting binary to bytes object  
    f.write(data)

with open('random_characters2.txt', 'r', encoding='utf-8') as f:
    print(f.read())

¢
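
The conversion in the cell above can be checked step by step without touching a file. The 16 bits are the two bytes C2 A2, which is how UTF-8 encodes the cent sign:

```python
bits = '1100001010100010'

number = int(bits, 2)                 # parse the string as a base-2 integer
size = (len(bits) + 7) // 8           # round up to a whole number of bytes
data = number.to_bytes(size, 'big')   # most significant byte first

print(hex(number))
print(data)
print(data.decode('utf-8'))
```

Encoding the character goes the other way: '¢'.encode('utf-8') gives the same two bytes.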


In [91]:
with open('ferris.jpg', 'rb') as f:
    print(bin(int.from_bytes(f.read(10), 'big')))

0b11111111110110001111111111100000000000000001000001001010010001100100100101000110
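
The ten bytes printed above are easier to interpret in hexadecimal. The sketch below uses a hard-coded copy of those bytes rather than ferris.jpg itself. They begin with FF D8, the JPEG start-of-image marker, and end with the ASCII text JFIF:

```python
# The same ten bytes as the binary output above, written in hex
header = bytes.fromhex('ffd8ffe000104a464946')

# Two hex digits per byte are easier to scan than 80 binary digits
hex_view = ' '.join(f'{b:02x}' for b in header)
print(hex_view)

is_jpeg = header[:2] == b'\xff\xd8'   # start-of-image marker
label = header[6:10]                  # format identifier stored as ASCII
print(is_jpeg, label)

# Converting back to one large integer reproduces the binary string
as_binary = bin(int.from_bytes(header, 'big'))
print(as_binary)
```

Checking the first few bytes like this is a common way to recognise a file type without relying on its extension.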
