# File objects in Python

In Python, a file operation takes place in the following order.

1. Open a file
2. Read or write (perform operation)
3. Close the file

## Opening a file

### Mode while opening a file

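The `mode` argument is a string that combines two independent choices:

1. Read/write mode: `'r'` (read, default), `'w'` (write; creates the file or truncates an existing one), `'a'` (append to the end)
2. Text/binary mode: `'t'` (text, default), `'b'` (binary)

e.g. `open('file.txt', 'wb')` opens `file.txt` for writing in binary mode.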

In text mode, reading from the file returns strings (`str`).

In binary mode, reading returns `bytes`; use this mode for non-text files such as images or executables.
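
A minimal sketch of the difference, assuming a throwaway file created with `tempfile` (the path is hypothetical, for illustration only): the same file is read once in text mode and once in binary mode.

```python
import os
import tempfile

# Create a small text file in a temporary directory (hypothetical path)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello\n')

# Text mode ('r' is the same as 'rt'): reading returns a str
with open(path, 'r', encoding='utf-8') as f:
    text = f.read()

# Binary mode: reading returns raw bytes, no decoding takes place
with open(path, 'rb') as f:
    data = f.read()

print(type(text), repr(text))
print(type(data), repr(data))
```

On Windows, text mode translates `'\n'` to `'\r\n'` when writing, so the raw bytes there may end in `b'\r\n'` instead of `b'\n'`.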

### File encoding

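The default encoding depends on the platform (it comes from the locale settings):

- Windows: typically `cp1252`
- Linux: typically `utf-8`

Hence, we should not rely on the platform's default encoding and should pass `encoding` explicitly: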

```py
f = open("test.txt", mode='r', encoding='utf-8')
```
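
To see why the encoding matters, here is a small sketch (using a hypothetical temporary file rather than `test.txt`): text written as UTF-8 and read back as `cp1252` is silently garbled instead of raising an error.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'cafe.txt')   # hypothetical path

# Write a non-ASCII string with an explicit UTF-8 encoding
with open(path, 'w', encoding='utf-8') as f:
    f.write('café')

# Reading it back with the same encoding round-trips correctly
with open(path, encoding='utf-8') as f:
    correct = f.read()

# Reading it with cp1252 decodes the two UTF-8 bytes of 'é'
# (0xC3 0xA9) as two separate characters
with open(path, encoding='cp1252') as f:
    garbled = f.read()

print(correct)   # café
print(garbled)   # cafÃ©
```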

```py
f = open('data/sample_file.txt', 'r', encoding='utf-8')
```

## Closing a file

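Closing a file frees the resources the operating system allocated for it, most importantly the file descriptor (file handle), and writes out any data still sitting in the write buffer.

Many kinds of errors can occur while working with a file, and an opened file must be closed properly even when one occurs. A `try...finally` block guarantees this; the preferred way is the `with` statement, which closes the file automatically when the block exits, whether normally or through an exception.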

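```py
f = open("test.txt", encoding='utf-8')   # open outside try: if open() fails, there is nothing to close
try:
    # perform file operations
    ...
finally:
    f.close()   # runs even if an exception was raised in the try block
```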

```py
with open("test.txt", encoding='utf-8') as f:
    # perform file operations
    ...
# f is closed automatically here, even if an exception was raised inside the block
```
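
A small sketch that checks the claim about `with`: it writes a throwaway file via `tempfile` (standing in for `test.txt`), then raises a `RuntimeError` on purpose inside the block.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')   # hypothetical path
with open(path, 'w', encoding='utf-8') as f:
    f.write('some text\n')

try:
    with open(path, encoding='utf-8') as f:
        f.read()
        raise RuntimeError('something went wrong while processing')
except RuntimeError:
    pass

# The with statement closed the file even though the block raised
print(f.closed)   # True
```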

```py
with open('data/test.txt', 'w', encoding='utf-8') as f:
    f.write("This is my first file\n")
    f.write("This file\n\n")
    f.write("contains three lines\n")
```

## Reading a file

A file can be read in multiple ways:

1. Read a few chars with `read(n)`. Bytes are decoded into chars using the encoding passed to `open()`.
2. Read the entire contents of the file with `read()`. Newline chars are included as ordinary chars.
3. Read the file line by line in a loop. The file object is iterable, i.e. you can use `for line in f`.
4. Read one line at a time with `readline()`.
5. Read all lines into a list with `readlines()`.

```py
f = open('data/test.txt', encoding='utf-8')
f.read(4)   # read the first 4 chars
```

```
'This'
```

```py
f.read(4)   # read the next 4 chars
```

```
' is '
```

```py
f.read()    # read until EOF
# the returned string includes the newline chars '\n'
```

```
'my first file\nThis file\n\ncontains three lines\n'
```

```py
f.read()    # once EOF is reached, read() returns an empty string
```

```
''
```

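The position up to which the file has been read is tracked by a 'file cursor'. `f.tell()` returns the current cursor position and `f.seek()` moves it. In text mode, `tell()` returns an opaque position marker; here it equals the number of chars read because the file is ASCII-only with `\n` line endings.

```py
f.tell()
```

```
54
```

```py
f.seek(0)   # go to the start of the file
```

```
0
```

```py
f.read(10)
```

```
'This is my'
```

```py
f.tell()
```

```
10
```

```py
# read all contents of the file
f.seek(0)
len(f.read())  # \n is counted as a char
```

```
54
```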

```py
f.seek(0)
contents = f.read()
contents[21]   # \n is also a char in the read contents
```

```
'\n'
```

```py
# read the lines in a loop
# the file object is iterable
f.seek(0)
for line in f:
    print(line, end='')  # each line already ends with '\n', so end='' avoids an extra newline from print()
```

```
This is my first file
This file

contains three lines
```


```py
f.seek(0)
for line in f:
    print(len(line), line, end='')   # len(line) includes the trailing '\n'
```

```
22 This is my first file
10 This file
1 
21 contains three lines
```


```py
# read one line at a time
f.seek(0)
f.readline()   # reads up to and including the next newline character
```

```
'This is my first file\n'
```

```py
f.readline()
```

```
'This file\n'
```

```py
f.readline()   # a blank line is returned as '\n'
```

```
'\n'
```
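
Because a blank line comes back as `'\n'` and only EOF gives `''`, a `readline()` loop can safely stop on the empty string. A minimal sketch, using `io.StringIO` as an in-memory stand-in for `data/test.txt` so the example is self-contained:

```python
import io

# io.StringIO behaves like a file opened in text mode
f = io.StringIO('This is my first file\nThis file\n\ncontains three lines\n')

lines_read = []
while True:
    line = f.readline()
    if line == '':      # '' means EOF; a blank line is '\n', not ''
        break
    lines_read.append(line)

print(len(lines_read))   # 4
```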

```py
# read all lines into a list
f.seek(0)
lines = f.readlines()
lines
```

```
['This is my first file\n', 'This file\n', '\n', 'contains three lines\n']
```
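
`readlines()` keeps the trailing `'\n'` on every line. If the newlines are not wanted, one way to drop them is sketched below, again with `io.StringIO` standing in for `data/test.txt`:

```python
import io

f = io.StringIO('This is my first file\nThis file\n\ncontains three lines\n')

# readlines() keeps the trailing '\n' on every line
with_newlines = f.readlines()

# Strip only the trailing newline, leaving other whitespace alone
f.seek(0)
stripped = [line.rstrip('\n') for line in f]

# Alternatively, read everything and split on line boundaries
f.seek(0)
split = f.read().splitlines()

print(stripped)  # ['This is my first file', 'This file', '', 'contains three lines']
```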

## Other methods and attributes of the file object

```py
f.fileno()   # return the integer file descriptor the OS uses for this file
```

```
56
```

```py
f.seek(0)
f.readline(4)   # read at most 4 chars of the current line
```

```
'This'
```

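```py
f.name    # name (path) the file was opened with
```

```
'data/test.txt'
```

```py
f.mode    # mode the file was opened with
```

```
'r'
```

```py
f.close()   # frees up the file descriptor
# If files are never closed, the process may run out of available file descriptors.
```

```py
f.closed    # True once the file has been closed
```

```
True
```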
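
Once a file object is closed, any further I/O on it raises `ValueError`, while calling `close()` again is harmless. A sketch with a hypothetical temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')   # hypothetical path
with open(path, 'w', encoding='utf-8') as f:
    f.write('some text\n')

f = open(path, encoding='utf-8')
f.close()
f.close()   # closing an already-closed file has no effect

error = None
try:
    f.read()   # I/O on a closed file object raises ValueError
except ValueError as exc:
    error = exc

print(f.closed)                 # True
print(type(error).__name__)     # ValueError
```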

## Reading a large file in chunks of data

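Reading a fixed number of chars at a time keeps memory use bounded, however large the file is.

```py
with open('data/test.txt', 'r', encoding='utf-8') as f:
    chunk_size = 10
    contents = f.read(chunk_size)   # read at most chunk_size chars
    while len(contents) > 0:        # read() returns '' once EOF is reached
        print(contents, end='')
        contents = f.read(chunk_size)
```

```
This is my first file
This file

contains three lines
```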
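
The same loop can be written more compactly with an assignment expression (Python 3.8+). A sketch, wrapped in a hypothetical generator `read_in_chunks` and using `io.StringIO` as a stand-in for `data/test.txt`:

```python
import io

def read_in_chunks(f, chunk_size=10):
    """Yield successive chunks of at most chunk_size from an open file (hypothetical helper)."""
    # read() returns '' (text mode) or b'' (binary mode) at EOF; both are falsy,
    # so the loop stops there
    while chunk := f.read(chunk_size):
        yield chunk

text = 'This is my first file\nThis file\n\ncontains three lines\n'
with io.StringIO(text) as f:
    chunks = list(read_in_chunks(f))

for chunk in chunks:
    print(chunk, end='')
```

The 54-char sample yields five chunks of 10 chars and a final chunk of 4.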


## Writing to a file

```py
def print_test_file():
    """Helper: print the current contents of data/testw.txt."""
    with open('data/testw.txt', encoding='utf-8') as f:
        print(f.read())
```

```py
with open('data/testw.txt', 'w', encoding='utf-8') as f:   # 'w' creates the file if it doesn't exist
    f.write('Testing write')

print_test_file()
```

```
Testing write
```


```py
# When using separate open() and close() calls, the file must be closed
# (or flushed) so that the buffered contents are written out
f = open('data/testw.txt', 'w', encoding='utf-8')  # 'w' overwrites the existing contents
f.write('Testing write again')
f.close()

print_test_file()
```

```
Testing write again
```
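
Closing is not the only way to push buffered text out: `f.flush()` writes the buffer without closing the file. A minimal sketch, using a hypothetical temporary file instead of `data/testw.txt`:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'testw.txt')   # hypothetical path

f = open(path, 'w', encoding='utf-8')
f.write('Testing write again')
# The text may still sit in Python's write buffer at this point,
# so another reader is not guaranteed to see it yet

f.flush()  # push the buffered text to the operating system

with open(path, encoding='utf-8') as reader:
    after_flush = reader.read()

f.close()
print(after_flush)  # Testing write again
```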


```py
with open('data/testw.txt', 'w', encoding='utf-8') as f:
    f.write('line 1')   # write() does not add a newline
    f.write('line 2')

print_test_file()
```

```
line 1line 2
```


```py
with open('data/testw.txt', 'w', encoding='utf-8') as f:
    f.write('line 1')
    f.seek(0)          # move back to the beginning of the file
    f.write('line 2')  # overwrites 'line 1'

print_test_file()
```

```
line 2
```


```py
with open('data/testw.txt', 'w', encoding='utf-8') as f:
    f.write('line 1')
    f.seek(0)
    f.write('row')  # only the first 3 chars are overwritten; the rest is kept

print_test_file()
```

```
rowe 1
```
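
Mode `'w'` always truncates the file first; mode `'a'` keeps the existing contents and writes at the end. A sketch with a hypothetical temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'testw.txt')   # hypothetical path

with open(path, 'w', encoding='utf-8') as f:   # 'w' creates or truncates the file
    f.write('line 1\n')

with open(path, 'a', encoding='utf-8') as f:   # 'a' appends after the existing content
    f.write('line 2\n')

with open(path, encoding='utf-8') as f:
    contents = f.read()

print(contents, end='')
```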


## Ex: Copy one file to another

```py
with open('data/sample_file.txt', 'r', encoding='utf-8') as rf:
    with open('data/sample_file_copy.txt', 'w', encoding='utf-8') as wf:
        for line in rf:
            wf.write(line)

with open('data/sample_file_copy.txt', 'r', encoding='utf-8') as rf:
    print(rf.read())
```

```
line 1
line 2
line 3
this is the last line
```
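
For a plain copy, the standard library's `shutil.copyfile()` does the same job without an explicit loop, and since it copies raw bytes it works for binary files too. A sketch, assuming throwaway files created with `tempfile` in place of `data/sample_file.txt`:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'sample_file.txt')        # hypothetical paths
dst = os.path.join(tmp_dir, 'sample_file_copy.txt')

with open(src, 'w', encoding='utf-8') as f:
    f.write('line 1\nline 2\nline 3\nthis is the last line\n')

# Copy the file contents byte for byte
shutil.copyfile(src, dst)

with open(dst, encoding='utf-8') as f:
    copied = f.read()

print(copied, end='')
```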


## Ex: Copy an image file

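```py
# read and write the file in binary mode ('rb' / 'wb')
# iterating a binary file yields pieces split at b'\n' bytes, which can be
# arbitrarily long; reading fixed-size chunks (preferably a multiple of 1024
# bytes) keeps memory use bounded
with open('data/octocat.jpg', 'rb') as rf:
    with open('data/octocat_copy.jpg', 'wb') as wf:
        for line in rf:
            wf.write(line)
```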
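
A sketch of the chunked alternative mentioned in the comment above. `copy_binary` is a hypothetical helper, 64 KiB (a multiple of 1024) is an arbitrary chunk size, and the stand-in "image" is generated bytes in a temporary file rather than `data/octocat.jpg`:

```python
import functools
import os
import tempfile

def copy_binary(src_path, dst_path, chunk_size=64 * 1024):
    """Copy a file in fixed-size binary chunks (hypothetical helper; 64 KiB by default)."""
    with open(src_path, 'rb') as rf, open(dst_path, 'wb') as wf:
        # iter(callable, sentinel) calls rf.read(chunk_size) repeatedly
        # and stops when it returns the sentinel b'' (end of file)
        for chunk in iter(functools.partial(rf.read, chunk_size), b''):
            wf.write(chunk)

# Stand-in for data/octocat.jpg: 256,000 bytes of generated binary data
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, 'image.bin')
dst = os.path.join(tmp_dir, 'image_copy.bin')
with open(src, 'wb') as f:
    f.write(bytes(range(256)) * 1000)

copy_binary(src, dst)

# Verify the copy byte for byte
with open(src, 'rb') as a, open(dst, 'rb') as b:
    same = a.read() == b.read()
print(same)  # True
```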

```py
with open('data/octocat.jpg', 'rb') as rf:
    print(rf.readline())  # prints the raw bytes of the image up to the first b'\n'
```

b'\xff\xd8\xff\xe1\x00\x18Exif\x00\x00II*\x00\x08\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xec\x00\x11Ducky\x00\x01\x00\x04\x00\x00\x00P\x00\x00\xff\xe1\x03-http://ns.adobe.com/xap/1.0/\x00<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27        "> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmpMM:DocumentID="xmp.did:86F1F2DF995F11E29A15BC1046A8904D" xmpMM:InstanceID="xmp.iid:86F1F2DE995F11E29A15BC1046A8904D" xmp:CreatorTool="Adobe Photoshop CS6 (Macintosh)"> <xmpMM:DerivedFrom stRef:instanceID="xmp.did:6C7547580D2168118083F148A6B5326D" stRef:documentID="xmp.did:6C7547580D2168118083F148A6B5326D"/> </rdf:Description> </rdf:RDF> </x:xmpmeta> <?xpacket end="r"?>\xff\xee\x00\