---   
 <img align="left" width="75" height="75"  src="https://upload.wikimedia.org/wikipedia/en/c/c8/University_of_the_Punjab_logo.png"> 

<h1 align="center">Department of Data Science</h1>
<h1 align="center">Course: Tools and Techniques for Data Science</h1>

---
<h3><div align="right">Instructor: Muhammad Arif Butt, Ph.D.</div></h3>    

<h1 align="center">Lecture 2.7</h1>

## _file_handling.ipynb_

### Learning agenda of this notebook

1. Open a text file
2. Reading from a file
    - using read()
    - using seek()
    - using readlines()
    - using readline()
    - Iterating a file object
3. Writing to a file
    - using write()
    - using append mode
4. Close a file
5. Delete a file
6. Rename a file
7. Get file attributes using stat()
8. Checking file types
9. Directory handling
    - Create a directory
    - Delete a directory
    - Rename a directory
    - Changing the current working directory of a process
    

### What Are Files, and Why Do We Need Them?
Files and directories are an important component of an operating system. A file is a named location on disk that stores related information. We use files to organize our data into different directories on a hard disk.

RAM (Random Access Memory) is volatile: it holds data only while the computer is powered on. We therefore use files to store data permanently.

### File Types
There are many types of files, including **image**, **audio**, **video**, **text**, and **executable** files. However, Python's `open()` treats every file as one of only two types:
- **_Text File_**: structured as a sequence of lines, where each line is a sequence of characters (for example, prose or source code). Every line terminates with a special character known as EOL (end of line): `'\n'` on Unix-like systems and `'\r\n'` on Windows.
     
- **_Binary File_**: in Python, any file that is not a text file; its contents are read and written as raw bytes (0's and 1's) rather than characters.
   
   
### File Handling in python
File handling is an important part of almost any application. In Python, basic file handling consists of the **CRUD** operations:
 - Create
 - Read
 - Update
 - Delete
 
There are other file operations as well, e.g., copying a file or changing a file's properties. However, the CRUD operations are the basic file handling operations.

### 1. Opening a file
Python has a built-in **_open()_** function to open a file. It returns a file object, whose type depends on the mode, through which the standard file operations such as reading and writing are performed. 
```
open(file,  mode='r',  buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
```
Where,
- file is the only required argument: the name (relative or absolute path) of the file to be opened
- mode is the file access mode, described below
- buffering sets the buffering policy: 0 switches buffering off (binary mode only), 1 selects line buffering (text mode only), an integer > 1 gives the buffer size in bytes, and the default -1 uses the system default (line buffering for interactive text files, fixed-size chunks otherwise)
- encoding is the name of the encoding used to decode or encode the file in text mode (see the codecs module for supported encodings); the default is platform dependent
- errors is an optional string that specifies how encoding and decoding errors are to be handled (e.g., 'strict', 'ignore', 'replace')
- newline controls how universal newlines mode works (it only applies to text mode). It can be None, '', '\n', '\r', or '\r\n'.
- closefd is True by default, which closes the underlying file descriptor when the file is closed
- opener is None by default; however, advanced users can pass a custom opener to obtain the underlying file descriptor

**FILE ACCESS MODES**

 * ***`Read Only ('r')`*** : Opens a file for reading. If the file does not exist, raises `FileNotFoundError` (a subclass of `OSError`).
 * ***`Read and Write ('r+')`*** : Opens the file for reading and writing. Raises `FileNotFoundError` if the file does not exist.
 * ***`Write Only ('w')`*** : Opens the file for writing only. An existing file is truncated; the file is created if it does not exist.
 * ***`Write and Read ('w+')`*** : Opens the file for reading and writing. An existing file is truncated; the file is created if it does not exist.
 * ***`Append Only ('a')`*** : Opens the file for writing, appending to the end of the file if it exists. The file is created if it does not exist.
 * ***`Append and Read ('a+')`*** : Opens the file for reading and writing. The file is created if it does not exist. Data written is inserted at the end, after the existing data.
 * ***`Exclusive creation ('x')`*** : Opens a file for exclusive creation. If the file already exists, the operation fails with `FileExistsError`.
 
Along with the above file access modes, you can specify whether the file should be handled as text or binary by adding one of the following characters (the default mode is 'rt'):
  * ***`Text file ('t')`*** : (default) Opens a file in text mode
  * ***`Binary file ('b')`*** : Opens a file in binary mode
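The modes above can be tried out safely with a short sketch. It assumes a throwaway directory created with `tempfile` and a hypothetical file name `demo.txt`; it shows that 'r' fails on a missing file, that 'x' refuses to overwrite an existing file, and that adding 'b' to the mode yields bytes instead of a string.

```python
import os
import tempfile

# Work inside a throwaway directory so the demo does not touch real files
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # 'r' on a missing file raises FileNotFoundError
    try:
        open(path, "r")
        r_error = None
    except FileNotFoundError as e:
        r_error = type(e).__name__

    # 'x' creates the file because it does not exist yet
    with open(path, "x") as f:
        f.write("first\n")

    # a second 'x' on the same path fails with FileExistsError
    try:
        open(path, "x")
        x_error = None
    except FileExistsError as e:
        x_error = type(e).__name__

    # 'rb' combines read mode with binary mode, so read() returns bytes
    with open(path, "rb") as f:
        raw = f.read()

print(r_error, x_error, raw)
```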


In [1]:
# Open a file named f1.txt in the current working directory (giving a relative or absolute path)
# On Windows, the path looks a bit different, e.g., "C:\\Users\\Kakamanna\\f1.txt"
fd = open("f1.txt")
fd.close()   # close the first file object before reusing the name fd
fd = open("/Users/arif/Documents/DS-522/Demo-Files/Section-2/Lec-2.7/f1.txt")

# fd = open('nofile.txt')   # if the file does not exist, the interpreter raises FileNotFoundError
fd

<_io.TextIOWrapper name='/Users/arif/Documents/DS-522/Demo-Files/Section-2/Lec-2.7/f1.txt' mode='r' encoding='UTF-8'>

### 2. Reading Contents of a file
Once a file is open, Python offers three methods to read from it:
- read([n])
- readline([n])
- readlines()
    - For read(), n specifies the number of characters (text mode) or bytes (binary mode) to read from the stream. If n is negative or omitted, it reads until EOF.
    - For readline(), n limits the number of characters/bytes read; if n is negative or omitted, it reads up to and including the next newline (or until EOF on the last line).
    - readlines() returns a list of all remaining lines in the stream (i.e., it reads the entire file).
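A small sketch of how the three methods and the n argument interact, assuming a hypothetical temporary file `sample.txt` with three lines. Each call continues from where the previous one stopped.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("Hello Students.\nWelcome to File Handling\nHappy Learning\n")

    with open(path, "r") as f:
        first5 = f.read(5)           # at most 5 characters: 'Hello'
        rest_of_line = f.readline()  # rest of the current line, newline included
        partial = f.readline(7)      # at most 7 characters, never past the newline
        remaining = f.readlines()    # list of all remaining lines
        at_eof = f.read()            # '' once the end of the file is reached

print(first5, rest_of_line, partial, remaining, at_eof, sep="|")
```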



### a.  Using read() method

In [2]:
# By default a file opens in read-only text mode; however, we can specify the mode "r" explicitly
fd = open("f1.txt","r")  # fd=open("f1.txt") and fd=open("f1.txt", "rt") are equivalent

# read first 5 characters from f1.txt
rv = fd.read(5)
rv

'Hello'

In [3]:
# read the rest of the file, up to EOF (end of file)
fd.read()

' Students.\nWelcome to File Handling with Arif Butt\nI learn a lot when I teach my students\nHope learning will be fun for you all, at least it is for me\nHappy Learning\n'

In [4]:
# try to read again, and notice what happens
fd.read()

''

In [5]:
# We got an empty string because the file offset was already at the end of the file.
# To read the contents again, either move the offset back with fd.seek(0),
# or close the file and open it again, as done below.
fd.close()

In [6]:
# now open and read the file again
fd = open("f1.txt","r")
print(fd.read())

# close the file
fd.close()

Hello Students.
Welcome to File Handling with Arif Butt
I learn a lot when I teach my students
Hope learning will be fun for you all, at least it is for me
Happy Learning



### b. Change File Offset using seek() method
In Python, the seek() method changes the current position (the file offset) of a file object, i.e., the position from which data will next be read or written. It returns the new absolute position.
```
seek(offset, whence)
```
Where,
 - offset means the number of positions to move forward/backward. It is interpreted relative to the position indicated by whence
 - whence can take following values: 
     - 0:  start of stream (the default); offset should be zero or positive 
     - 1:  current stream position; offset may be negative
     - 2:  end of stream; offset is usually negative
     
**Note:** 
- In text mode, seeks relative to the current position (whence=1) or to the end of the file (whence=2) are allowed only with offset 0.
- Negative offsets therefore only work in binary mode, with whence=1 or whence=2; a negative absolute position (whence=0) is always an error.
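The two rules in the note can be checked with a short sketch. It writes a hypothetical 15-byte file `seek_demo.txt` in a temporary directory, shows the text-mode restriction (CPython raises `io.UnsupportedOperation`), and then performs negative relative seeks in binary mode.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "seek_demo.txt")  # hypothetical file
    with open(path, "wb") as f:
        f.write(b"Hello Students.")  # 15 bytes

    # Text mode: a nonzero offset relative to the end (or current position) is rejected
    with open(path, "r") as f:
        try:
            f.seek(-3, 2)
            text_error = None
        except io.UnsupportedOperation as e:
            text_error = str(e)
        end_pos = f.seek(0, 2)  # offset 0 with whence=2 is allowed: jump to the end

    # Binary mode: negative offsets work with whence=2 and whence=1
    with open(path, "rb") as f:
        pos = f.seek(-9, 2)   # 9 bytes before the end -> absolute position 6
        tail = f.read()       # everything from position 6 onwards
        back = f.seek(-4, 1)  # 4 bytes back from the current position (15) -> 11
        last4 = f.read()

print(text_error, end_pos, pos, tail, back, last4)
```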

In [7]:
# open a file in read mode
fd = open("f1.txt","r")
# seek(0, 1) moves zero positions from the current offset, so it simply returns the current offset
print("Cursor is pointing at the location: ", fd.seek(0, 1))


# Let us read five characters and check the position of file offset
fd.read(5)
print("Cursor is pointing at the location: ", fd.seek(0, 1))


# Let us read remaining portion of file and check the position of file offset
fd.read()
print("Cursor is pointing at the location: ", fd.seek(0, 1))


# close the file
fd.close()
# fd.seek(0, 1) returns the same value as fd.tell()

Cursor is pointing at the location:  0
Cursor is pointing at the location:  5
Cursor is pointing at the location:  171


In [8]:
# Let us do some more practice with the seek() function
# open a file in append mode
fd = open("f1.txt","a")
print("Cursor is pointing at the location: ", fd.seek(0, 1))

# set the cursor to beginning
cur = fd.seek(0)   # equivalent to fd.seek(0, 0)
print("Cursor is pointing at the location: ", cur)

# set the cursor to 100 position from beginning
cur = fd.seek(100)   # equivalent to fd.seek(100, 0)
print("Cursor is pointing at the location: ", cur)


# move the cursor back to position 50 (measured from the beginning);
# in text mode a relative move such as fd.seek(-50, 1) would raise io.UnsupportedOperation
cur = fd.seek(50, 0)
print("Cursor is pointing at the location: ", cur)

#close the file
fd.close()

Cursor is pointing at the location:  171
Cursor is pointing at the location:  0
Cursor is pointing at the location:  100
Cursor is pointing at the location:  50
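Moving the offset in append mode affects reads, but not where data is written. The sketch below, assuming a hypothetical temporary file `append_demo.txt`, seeks to the beginning and writes anyway; on common platforms the data still lands at the end of the file. The Python documentation describes this as platform behavior ("on some Unix systems"), so treat the result as typical rather than guaranteed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "append_demo.txt")  # hypothetical file
    with open(path, "w") as f:
        f.write("hello")

    with open(path, "a+") as f:
        f.seek(0)       # move the offset to the beginning...
        f.write("!")    # ...but in append mode the write still goes to the end
        f.seek(0)       # move back to the beginning to read everything
        content = f.read()

print(content)
```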


In [9]:
# You can read data from a binary file as well.
# To open a file in binary mode, include a 'b' character in the mode parameter.
# A binary stream object has many of the same methods and attributes as a text stream object.
# One difference is that a binary stream has no encoding attribute: you read and write bytes (not strings),
# so there is no decoding or encoding for Python to do.
# What you get out of a binary file is exactly what you put into it.
f = open("image.png","rb")


# reading the image file
print(f.read())

#close the file
f.close()

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x02\x1e\x00\x00\x02\x1e\x08\x02\x00\x00\x01\x0c\xa6\xa1\x8b\x00\x00\x00\x19tEXtSoftware\x00Adobe ImageReadyq\xc9e<\x00\x00\x03"iTXtXML:com.adobe.xmp\x00\x00\x00\x00\x00<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.3-c011 66.145661, 2012/02/06-14:56:27        "> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about="" xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/" xmlns:stRef="http://ns.adobe.com/xap/1.0/sType/ResourceRef#" xmp:CreatorTool="Adobe Photoshop CS6 (Windows)" xmpMM:InstanceID="xmp.iid:CC8C1453C90311E6997E86E5265323A5" xmpMM:DocumentID="xmp.did:CC8C1454C90311E6997E86E5265323A5"> <xmpMM:DerivedFrom stRef:instanceID="xmp.iid:CC8C1451C90311E6997E86E5265323A5" stRef:documentID="xmp.did:CC8C1452C90311E6997E86E5265323A5"/> </rdf:Description> </rdf:RDF> </x:xmpmeta> <?xpacket end="r
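Binary mode is handy for checking a file's format from its first bytes. As a sketch, assuming a hypothetical file `fake.png` that contains only the standard 8-byte PNG signature (visible at the start of the output above) plus a few placeholder bytes, `read(8)` returns a `bytes` object that can be compared directly.

```python
import os
import tempfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # fixed 8-byte header at the start of every PNG file

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fake.png")  # hypothetical file, not a real image
    with open(path, "wb") as f:
        f.write(PNG_SIGNATURE + b"\x00\x00\x00\x0dIHDR")

    with open(path, "rb") as f:
        header = f.read(8)  # binary mode: returns up to 8 bytes as a bytes object

is_png = header == PNG_SIGNATURE
print(type(header).__name__, is_png)
```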

### c. Using readline() method

In [10]:
# To read a file line by line, use the readline() method (print() adds its own newline, hence the blank lines)
fd = open("f1.txt","r")
print(fd.readline())
print(fd.readline())
print(fd.readline())

# close the file
fd.close()

Hello Students.

Welcome to File Handling with Arif Butt

I learn a lot when I teach my students
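The blank lines in the output appear because each line keeps its trailing newline and print() adds another. A sketch of reading until EOF with readline(), assuming a hypothetical temporary file `lines.txt` and Python 3.8+ for the `:=` operator; readline() returns an empty string only at end of file, while an empty line in the file comes back as `'\n'`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("Hello Students.\nWelcome\nHappy Learning\n")

    lines = []
    with open(path, "r") as f:
        # the loop stops when readline() returns '' (end of file)
        while (line := f.readline()):
            print(line, end="")              # the line already ends with '\n'
            lines.append(line.rstrip("\n"))  # keep the text without the newline
```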



### d. Using readlines() method


In [11]:
# readlines() reads all lines of the file into a list, one string per line (newline included)

fd = open("f1.txt","r")
rd = fd.readlines()

print(rd)

# close the file
fd.close()

['Hello Students.\n', 'Welcome to File Handling with Arif Butt\n', 'I learn a lot when I teach my students\n', 'Hope learning will be fun for you all, at least it is for me\n', 'Happy Learning\n']


### e. Iterating Contents of a File (Line by Line)

The same result can be achieved by looping over the file object with a for loop; the file object yields one line per iteration.

In [12]:
# open a file in read mode
fd = open("f1.txt","r")

# use for loop to print all the lines one by one
for line in fd:
    print(line)

# close the file
fd.close()

Hello Students.

Welcome to File Handling with Arif Butt

I learn a lot when I teach my students

Hope learning will be fun for you all, at least it is for me

Happy Learning
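Iterating over the file object combines naturally with enumerate() when line numbers are needed. A sketch, again assuming a hypothetical temporary file `lines.txt`; rstrip() removes the trailing newline that caused the blank lines above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")  # hypothetical sample file
    with open(path, "w") as f:
        f.write("Hello Students.\nWelcome\nHappy Learning\n")

    numbered = []
    with open(path, "r") as f:
        # the file object yields one line at a time, so the whole file is never held in memory
        for number, line in enumerate(f, start=1):
            numbered.append(f"{number}: {line.rstrip()}")

print("\n".join(numbered))
```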



#### Count the words in the file using **split()** method

In [13]:
# Words in a file are separated by whitespace
# str.split() with no argument returns a list of the words in the string, splitting on any run of whitespace
# (str.split(' ') would produce empty strings for repeated spaces and keep the trailing '\n' on the last word)
# Let us use the split() method to count the words in each line of the file

fd = open("f1.txt","r")
totalwords = 0
for line in fd:
    listoftokens = line.split()
    print(line, "Number of words in the line: ", len(listoftokens))
    totalwords = totalwords + len(listoftokens)
fd.close()
print("Total words in this file are: ", totalwords)

Hello Students.
 Number of words in the line:  2
Welcome to File Handling with Arif Butt
 Number of words in the line:  7
I learn a lot when I teach my students
 Number of words in the line:  9
Hope learning will be fun for you all, at least it is for me
 Number of words in the line:  14
Happy Learning
 Number of words in the line:  2
Total words in this file are:  34
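The same count can be written more compactly with a generator expression. This sketch uses hypothetical helper names (`count_words`, `count_tokens_single_space`) and a made-up sample file containing a double space, a tab and a blank line, to show why split() with no argument is more robust than split(' ').

```python
import os
import tempfile

def count_words(path):
    """Return the number of whitespace-separated words in a text file."""
    with open(path, "r") as f:
        # split() with no argument splits on any run of whitespace and drops empty tokens
        return sum(len(line.split()) for line in f)

def count_tokens_single_space(path):
    """The split(' ') variant, for comparison."""
    with open(path, "r") as f:
        return sum(len(line.split(" ")) for line in f)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words.txt")  # hypothetical sample file
    with open(path, "w") as f:
        # a double space, a tab and a blank line: cases split(' ') miscounts
        f.write("Hello  Students.\nWelcome to\tFile Handling\n\n")
    total = count_words(path)
    naive = count_tokens_single_space(path)

print(total, naive)
```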


### 3. Writing to a File

### a. Using write() method
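The write() method writes a string to a file and returns the number of characters written. Note that it is the 'w' mode, not write() itself, that discards existing data: opening a file with 'w' truncates it, and if the file doesn't exist, it is created. 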

In [14]:
# let's open a file in write-only mode
fd1 = open('out.txt','w')
# write() returns the number of characters written (for ASCII text this equals the number of bytes)
rv = fd1.write('Python is Awesome!')
print("Number of bytes written in the file: ", rv)
#close the file
fd1.close()

# Let us open the file in read mode and read its contents
fd1 = open('out.txt')
print(fd1.read())
fd1.close()


Number of bytes written in the file:  18
Python is Awesome!
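In text mode, write() returns the number of characters written, which equals the number of bytes only for ASCII text. A sketch with a hypothetical file `utf8.txt` and an explicit UTF-8 encoding, so the byte count does not depend on the platform default:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "utf8.txt")  # hypothetical file
    with open(path, "w", encoding="utf-8") as f:
        chars_written = f.write("caf\u00e9")  # 'café': text mode returns the character count
    size_on_disk = os.path.getsize(path)      # 'é' takes two bytes in UTF-8

print(chars_written, size_on_disk)
```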


In [15]:
# let's open a file in read-write mode
fd1 = open('out.txt','w+') # Due to w+ all the data is truncated


# write() returns the number of characters written
rv = fd1.write('Writing again in the out.txt file')
print("Number of bytes written in the file: ", rv)

# move the file offset back to the beginning and read the file; note that w+ truncated the previous data
fd1.seek(0)
print(fd1.read())

#close the file
fd1.close()

Number of bytes written in the file:  33
Writing again in the out.txt file


### b. How to append data to a file
Open the file in append mode ('a' or 'a+'). Every write then goes to the end of the file, after the existing data.

In [16]:
# let's open a file in append mode
fd1 = open('out.txt','a+') # the '+' after 'a' means you can read as well as append
# write() returns the number of characters written
rv = fd1.write('\nPython is Awesome!')
print("Number of bytes written in the file: ", rv)

# move to the beginning and read the file to confirm
fd1.seek(0)
print(fd1.read())

#close the file
fd1.close()

Number of bytes written in the file:  19
Writing again in the out.txt file
Python is Awesome!
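The difference between 'a' and 'w' on an existing file can be seen side by side. A sketch with a hypothetical temporary file `modes.txt`:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "modes.txt")  # hypothetical file

    with open(path, "w") as f:
        f.write("line 1\n")
    with open(path, "a") as f:  # 'a' keeps the existing data and adds to the end
        f.write("line 2\n")
    with open(path) as f:
        after_append = f.read()

    with open(path, "w") as f:  # 'w' truncates the file to zero length first
        f.write("line 3\n")
    with open(path) as f:
        after_write = f.read()

print(repr(after_append), repr(after_write))
```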


In [17]:
# creating a list
fruits = ["\nApple","\nBanana\n","\nOranges"]

# open the file in append-and-read mode
fd = open("out.txt", mode="a+")

# write each list item to the file
for fruit in fruits:
    fd.write(fruit)

# set the cursor to the beginning
fd.seek(0)

# read the data from the file line by line
for line in fd:
    print(line)
    
# close the file
fd.close()

Writing again in the out.txt file

Python is Awesome!

Apple

Banana



Oranges


### 4. Closing a File

### a. Using close() method
Closing a file frees up the resources tied to it and flushes any buffered writes to disk. Python has a garbage collector that cleans up unreferenced objects, but we must not rely on it to close files.
Files are closed using the close() method.

In [18]:
# open a file
f = open("f1.txt", "r")

# perform some file operations

#close the file
f.close()

### b. Use close() in try...finally Block
Calling close() directly is not entirely safe. If an exception occurs while we are performing some operation
on the file, the code exits without closing the file.

A safer way is to use a try...finally block: the finally clause runs whether or not an exception occurs.

In [19]:
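# open the file first, then put the file operations in the try block;
# if open() itself failed, there would be no file to close in finally
fd = open("f1.txt", "r")
try:
    # perform file operations
    data = fd.read()
finally:
    fd.close()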

### c. Open-Read-Write-Close Using the with Statement
The best way to close a file is to use the with statement. It ensures that the file is closed when the block inside the with statement is exited, even if an exception occurs.

We don't need to explicitly call the close() method. It is done internally.

In [20]:
# open the file in read mode using with statement
with open("f1.txt", "r") as f:
    # perform file operations
    print(f.read())
   

Hello Students.
Welcome to File Handling with Arif Butt
I learn a lot when I teach my students
Hope learning will be fun for you all, at least it is for me
Happy Learning
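The with statement closes the file even when the block exits because of an exception. A sketch, assuming a hypothetical temporary file and a deliberately raised RuntimeError; the file object's `closed` attribute confirms the result.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file
    with open(path, "w") as f:
        f.write("data\n")

    try:
        with open(path, "r") as f:
            first = f.readline()
            raise RuntimeError("simulated failure while processing the file")
    except RuntimeError:
        pass

    # the with block closed the file even though an exception escaped from it
    closed_after_error = f.closed

print(first.strip(), closed_after_error)
```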



### 5. Delete a file

In [21]:
# you can delete a file using the remove() function of the os module
# first import the os module
import os

# first check whether the file exists
if os.path.exists("temp.txt"):
    os.remove("temp.txt")
    print("File removed")
    
else:
    print("file doesn't exist")

file doesn't exist
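Checking with os.path.exists() and then removing leaves a small window in which another process could delete the file first. An alternative sketch (the helper name `remove_if_exists` is hypothetical) simply attempts the removal and handles FileNotFoundError:

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete path if it exists; return True if a file was removed."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "temp.txt")  # hypothetical file
    open(path, "w").close()               # create an empty file
    first = remove_if_exists(path)        # the file existed
    second = remove_if_exists(path)       # it is already gone
```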


### 6. Rename a file

In [22]:
# you can also rename a file using the rename() function of the os module
# os.rename(src, dst) takes two arguments: the old name and the new name
# (lines starting with ! are IPython shell escapes that run the touch and ls commands)
!touch temp.txt
!ls
os.rename('temp.txt','newtemp.txt')
print("\n\n")
!ls

aaanew              f1.txt              image.png           out.txt
bbb                 f2.txt              new.txt             temp.txt
dir1                file_handling.ipynb newtemp.txt



aaanew              f1.txt              image.png           out.txt
bbb                 f2.txt              new.txt
dir1                file_handling.ipynb newtemp.txt
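In the listing above, newtemp.txt already existed before the rename; on POSIX systems os.rename() silently replaces it, but on Windows it raises FileExistsError. os.replace() overwrites the destination on every platform, as this sketch with hypothetical temporary files shows:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "temp.txt")     # hypothetical source file
    dst = os.path.join(tmp, "newtemp.txt")  # hypothetical existing destination
    with open(src, "w") as f:
        f.write("new\n")
    with open(dst, "w") as f:
        f.write("old\n")

    # os.replace() overwrites dst on every platform
    os.replace(src, dst)

    with open(dst) as f:
        dst_content = f.read()
    src_exists = os.path.exists(src)
```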


### 7. Reading Attributes of a File using stat() method
This function returns the status of the specified path.

**os.stat(path)**

path is a string or bytes object representing a valid path. The returned os.stat_result object holds attributes including the _file mode_, _inode number_, _file owner_, _group owner_, number of _hard links_, _file size_, etc.

In [23]:
import os
# providing path
path =  'f1.txt' 
  
# Get the status of the specified path
status = os.stat(path)

status


os.stat_result(st_mode=33188, st_ino=8619937371, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=171, st_atime=1632484966, st_mtime=1632396344, st_ctime=1632484965)

In [24]:
# You can extract the information individually
import os
# providing path
path = 'f1.txt'
  
# Get the status of the specified path
status = os.stat(path)

# extract the file size
print("file size: ", status.st_size)

# extract the file mode: file type bits and permission bits combined
print("file mode: ", status.st_mode)

# user identifier (UID) of the file owner
print("owner user ID: ", status.st_uid)

# time of most recent access, in seconds since the epoch
print("Last access time: ", status.st_atime)

# time of most recent content modification, in seconds since the epoch
print("Last modification time: ", status.st_mtime)

# time of most recent metadata change on Unix (creation time on Windows)
print("Last status change time: ", status.st_ctime)

file size:  171
file mode:  33188
owner user ID:  501
Last access time:  1632484966.9258096
Last modification time:  1632396344.419134
Last status change time:  1632484965.3663845
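The raw numbers can be decoded with the standard `stat` and `datetime` modules. A sketch, assuming a hypothetical temporary file; it also decodes the value 33188 from the output above (0o100644, a regular file with rw-r--r-- permissions).

```python
import datetime
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "f1.txt")  # hypothetical file
    with open(path, "w") as f:
        f.write("Hello\n")
    st = os.stat(path)

is_regular = stat.S_ISREG(st.st_mode)                    # file-type bits: regular file?
perm_string = stat.filemode(st.st_mode)                  # permissions in ls -l style
modified = datetime.datetime.fromtimestamp(st.st_mtime)  # epoch seconds -> local datetime

# decode the st_mode value shown in the output above
decoded = stat.filemode(33188)
print(is_regular, perm_string, modified, decoded)
```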


### 8. Identifying Type of File

In [25]:
import os
!ls
path = input("Enter name of the file/directory: ")

if os.path.isfile(path):
    print("Provided path is a file")
elif os.path.isdir(path):
    print("Provided path is a directory")

else:
    print("Path does not exist or is neither a regular file nor a directory")

aaanew              f1.txt              image.png           out.txt
bbb                 f2.txt              new.txt
dir1                file_handling.ipynb newtemp.txt
Enter name of the file/directory: new.txt
Provided path is a file
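The same classification can be done with `pathlib`. In this sketch the helper name `describe` is hypothetical, and it separates paths that do not exist from other file types such as sockets or devices.

```python
import tempfile
from pathlib import Path

def describe(path):
    """Classify a path as a file, a directory, missing, or something else."""
    p = Path(path)
    if p.is_file():
        return "file"
    if p.is_dir():
        return "directory"
    if not p.exists():
        return "does not exist"
    return "other (e.g. socket or device)"

with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "new.txt"  # hypothetical file
    file_path.write_text("x")
    results = [describe(file_path), describe(tmp), describe(Path(tmp) / "missing")]

print(results)
```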


## Python Directories
If there are a large number of files to handle in our Python program, we can arrange them
in different directories to make things more manageable. A directory or folder is a collection of files and subdirectories.

### 1. Some Basic Functions on Directories

`os.listdir([path])`: Return a list containing the names of the entries (files and subdirectories) in the directory (default is the cwd). <br>
`os.getcwd()`:        Return a string representing the absolute path of the current working directory. <br>
`os.mkdir(path)`:     Create a new directory. <br>
`os.rmdir(path)`:     Remove an empty directory. <br>
`os.rename(oldname, newname)`: Rename a file or directory. <br>
`os.chdir(path)`:     Change the current working directory to the specified path. <br>


### Example Code Snippets


In [26]:
# Example: Create a new directory
import os
# check the absolute path of your current working directory
print("Current working directory is: ", os.getcwd())

# check the contents of the cwd
print("\nContents:\n", os.listdir()) # you can pass the path of a directory to listdir()

# create a new directory named test
!rmdir test
os.mkdir('test') # raises FileExistsError if 'test' already exists (hence the !rmdir above)

# check the contents of cwd again
print("\nContents:\n", os.listdir())

Current working directory is:  /Users/arif/Documents/DS-522/Demo-Files/Section-2/Lec-2.7

Contents:
 ['newtemp.txt', 'out.txt', 'f1.txt', 'f2.txt', 'bbb', 'file_handling.ipynb', '.ipynb_checkpoints', 'dir1', 'image.png', 'aaanew', 'new.txt']
rmdir: test: No such file or directory

Contents:
 ['newtemp.txt', 'out.txt', 'f1.txt', 'test', 'f2.txt', 'bbb', 'file_handling.ipynb', '.ipynb_checkpoints', 'dir1', 'image.png', 'aaanew', 'new.txt']


In [27]:
# Example: Removing an empty directory (os.rmdir() raises OSError on a non-empty directory)

os.rmdir('test')
# check the contents of cwd again
print("\nContents:\n", os.listdir())



Contents:
 ['newtemp.txt', 'out.txt', 'f1.txt', 'f2.txt', 'bbb', 'file_handling.ipynb', '.ipynb_checkpoints', 'dir1', 'image.png', 'aaanew', 'new.txt']
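Since os.rmdir() refuses to remove a non-empty directory, nested directories have to be removed from the innermost level outwards. A sketch in a temporary directory (the names a/b/c are hypothetical), also using os.makedirs(), which creates missing parent directories and, with exist_ok=True, does not fail if the directory already exists:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "a", "b", "c")  # hypothetical nested path

    # os.makedirs() creates every missing parent; exist_ok=True makes a repeat call a no-op
    os.makedirs(nested)
    os.makedirs(nested, exist_ok=True)

    # os.rmdir() only removes empty directories; 'a' still contains 'b'
    try:
        os.rmdir(os.path.join(tmp, "a"))
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # removing from the innermost level outwards works
    os.rmdir(nested)
    os.rmdir(os.path.join(tmp, "a", "b"))
    os.rmdir(os.path.join(tmp, "a"))
    remaining = os.listdir(tmp)
```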


In [28]:
# Example: Renaming a directory

import os

!mkdir aaa
print("Contents of CWD:\n", os.listdir())
os.rename('aaa','aaanew')
print("\nContents of CWD:\n",os.listdir())


Contents of CWD:
 ['newtemp.txt', 'out.txt', 'f1.txt', 'f2.txt', 'bbb', 'file_handling.ipynb', 'aaa', '.ipynb_checkpoints', 'dir1', 'image.png', 'aaanew', 'new.txt']

Contents of CWD:
 ['newtemp.txt', 'out.txt', 'f1.txt', 'f2.txt', 'bbb', 'file_handling.ipynb', '.ipynb_checkpoints', 'dir1', 'image.png', 'aaanew', 'new.txt']


In [29]:
# you can change the current working directory by using the chdir() method.
# The new path that we want to change into must be supplied as a string 
import os
print(os.getcwd())
os.chdir('../')

print(os.getcwd())

/Users/arif/Documents/DS-522/Demo-Files/Section-2/Lec-2.7
/Users/arif/Documents/DS-522/Demo-Files/Section-2
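Because os.chdir() changes the working directory for the whole process, it is safer to restore the original directory afterwards. A sketch using try...finally in a temporary directory (the subdirectory name `sub` is hypothetical; Python 3.11+ also provides contextlib.chdir() as a context manager for this):

```python
import os
import tempfile

original = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    try:
        os.chdir(tmp)        # relative paths now resolve inside tmp
        inside = os.getcwd()
        os.mkdir("sub")
        os.chdir("sub")
        os.chdir("..")       # '..' moves up one level, like the cell above
        back_up = os.getcwd()
        # samefile() tolerates symlinked temp paths (e.g. /var vs /private/var on macOS)
        same = os.path.samefile(inside, tmp) and os.path.samefile(back_up, tmp)
    finally:
        os.chdir(original)   # always restore, even if something failed
```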
