# FILE HANDLING IN PYTHON

One of the most common tasks that you can do with Python is reading and writing files. Whether it’s writing to a simple text file, reading a complicated server log, or even analyzing raw byte data, all of these situations require reading or writing a file.

# What Is a File?
Before we can go into how to work with files in Python, it’s important to understand what exactly a file is and how modern operating systems handle some of their aspects.

At its core, a file is a contiguous set of bytes used to store data. This data is organized in a specific format and can be anything from a simple text file to a complicated program executable. Each of those bytes is simply a group of eight bits (1s and 0s), which is the form in which the computer ultimately stores and processes the data.

Files on most modern file systems are composed of three main parts:

1. Header: metadata about the contents of the file (file name, size, type, and so on)
2. Data: contents of the file as written by the creator or editor
3. End of file (EOF): the point where the data ends. On modern file systems this is determined from the file size recorded in the metadata, not by a special character stored in the file


What this data represents depends on the format specification used, which is typically indicated by the file extension. For example, a file with the extension .gif most likely conforms to the Graphics Interchange Format specification. There are hundreds, if not thousands, of file extensions out there. For this tutorial, you’ll mostly deal with .txt files, plus one binary .png image.


# File Paths
When you access a file on an operating system, a file path is required. The file path is a string that represents the location of a file. It’s broken up into three major parts:

1. Folder Path: the location of the file’s folder on the file system, where nested folders are separated by a forward slash / (Unix) or a backslash \ (Windows)
2. File Name: the actual name of the file
3. Extension: the end of the file path, prefixed with a period (.), used to indicate the file type



# 1. A file located within a file structure

Here’s a quick example. Let’s say you have a file located within a file structure like this:

- animals.csv
- path/
  - dog_breeds.txt
  - to/
    - cats.gif

Let’s say you wanted to access the cats.gif file, and your current location was in the same folder as path. In order to access the file, you need to go through the path folder and then the to folder, finally arriving at the cats.gif file. The Folder Path is path/to/. The File Name is cats. The File Extension is .gif. So the full path is path/to/cats.gif.
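As a small sketch (not part of the original walkthrough), the standard library’s pathlib module can split the example path into the same three parts. It uses PurePosixPath so the forward-slash path is parsed the same way on every operating system; no file needs to exist, because "pure" paths never touch the disk.

```python
from pathlib import PurePosixPath

# Parse the example path from the text; pure paths do no file system access.
path = PurePosixPath("path/to/cats.gif")

folder_path = path.parent   # the Folder Path
file_name = path.stem       # the File Name without its extension
extension = path.suffix     # the Extension, including the leading period

print(folder_path, file_name, extension)  # path/to cats .gif
```

Note that pathlib reports the folder path without the trailing slash (path/to rather than path/to/), and path.name returns the file name and extension together (cats.gif).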


# 2. The file can be referenced simply by the file name and extension

Now let’s say that your current location, or current working directory (cwd), is the to folder of our example folder structure. Instead of referring to cats.gif by the full path path/to/cats.gif, the file can be referenced simply by its file name and extension: cats.gif.


But what about dog_breeds.txt? How would you access that without using the full path? You can use the special double-dot (..) entry to move one directory up. This means that ../dog_breeds.txt will reference the dog_breeds.txt file from the to directory.

The double-dot (..) can be chained together to traverse multiple directories above the current directory. For example, to access animals.csv from the to folder, you would use ../../animals.csv.
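To check how these relative references resolve, here is a minimal sketch that joins each one onto the to folder and collapses the .. parts. It assumes the example folder structure above and uses posixpath (the Unix flavor of os.path) so the output looks the same on every platform. It only manipulates strings and does not check that the files exist; the helper name resolve is just for illustration.

```python
import posixpath

# Current working directory in the example: the "to" folder.
cwd = "path/to"

def resolve(relative: str) -> str:
    """Join a relative path onto cwd and collapse any '..' components."""
    return posixpath.normpath(posixpath.join(cwd, relative))

print(resolve("cats.gif"))           # path/to/cats.gif
print(resolve("../dog_breeds.txt"))  # path/dog_breeds.txt
print(resolve("../../animals.csv"))  # animals.csv
```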

# 1. Line Endings

One problem often encountered when working with file data is the representation of a new line or line ending. The line ending has its roots in the Morse code era, when a specific pro-sign was used to communicate the end of a transmission or the end of a line.

Later, this was standardized for teleprinters by both the International Organization for Standardization (ISO) and the American Standards Association (ASA). The ASA standard states that line endings should use the sequence of the Carriage Return (CR or \r) and the Line Feed (LF or \n) characters (CR+LF or \r\n). The ISO standard, however, allowed for either the CR+LF characters or just the LF character.

Windows uses the CR+LF characters to indicate a new line, while Unix and macOS use just the LF character. This can cause complications when you’re processing a file on an operating system different from the one the file was created on.
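Python’s text mode hides most of this difference. By default (newline=None), open() reads with "universal newlines", translating \r\n and \r to \n; passing newline='' turns that translation off, and binary mode never translates. This sketch writes a temporary file with Windows-style endings (the contents are made up for illustration) and reads it back all three ways.

```python
import os
import tempfile

# Write raw bytes with Windows-style (CR+LF) line endings to a temporary file.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"line one\r\nline two\r\n")
    path = tmp.name

try:
    # Default text mode (newline=None): CR+LF is translated to "\n" on read.
    with open(path, "r", encoding="utf-8") as f:
        translated = f.read()

    # newline="": line endings are returned exactly as stored.
    with open(path, "r", encoding="utf-8", newline="") as f:
        untranslated = f.read()

    # Binary mode never translates anything.
    with open(path, "rb") as f:
        raw = f.read()
finally:
    os.remove(path)

print(repr(translated))    # 'line one\nline two\n'
print(repr(untranslated))  # 'line one\r\nline two\r\n'
print(raw)                 # b'line one\r\nline two\r\n'
```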


# 2. Character Encodings

Another common problem that you may face is the encoding of the byte data. An encoding is a translation from byte data to human-readable characters, typically done by assigning a numerical value to each character. The two most common character sets are ASCII and Unicode. ASCII can only represent 128 characters, while Unicode can represent up to 1,114,112 characters (code points); UTF-8 is the most widely used encoding for storing Unicode text as bytes.

ASCII is actually a subset of Unicode (UTF-8), meaning that ASCII and Unicode share the same numeric values for those 128 characters, and ASCII text is byte-for-byte identical when encoded as UTF-8. It’s important to note that parsing a file with the incorrect character encoding can lead to failures or misrepresentation of characters. For example, if a file was created using the UTF-8 encoding and you try to parse it using the ASCII encoding, any character outside those 128 values will cause Python to raise a UnicodeDecodeError.
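The sketch below shows both failure modes from the paragraph above, using only in-memory strings and bytes. The sample word café is an arbitrary choice; any non-ASCII character would do.

```python
text = "café"

# Encode the string to UTF-8 bytes; "é" needs two bytes (0xC3 0xA9).
data = text.encode("utf-8")
print(data)  # b'caf\xc3\xa9'

# For characters inside the ASCII range, ASCII and UTF-8 produce identical bytes.
same_bytes = "cats".encode("ascii") == "cats".encode("utf-8")
print(same_bytes)  # True

# Decoding with ASCII fails because byte 0xC3 is outside its 128 values.
try:
    data.decode("ascii")
    failed = False
except UnicodeDecodeError as err:
    failed = True
    print("UnicodeDecodeError:", err.reason)

# Latin-1 maps every byte to some character, so it "succeeds" with the wrong text.
mojibake = data.decode("latin-1")
print(mojibake)  # cafÃ©
```

In practice, this is why it’s worth passing encoding='utf-8' (or whatever encoding the file actually uses) to open() explicitly: without it, Python falls back to a platform-dependent default.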



# Opening and Closing a File in Python

When you want to work with a file, the first thing to do is to open it. This is done by invoking the built-in open() function. open() has a single required argument, the path to the file, and it returns a file object:

In [None]:
file = open('Daily Activities.txt')
print(file)
file.close()  # release the file handle once you're done with it

It’s important to remember that it’s your responsibility to close the file.

In most cases, a file will be closed eventually when an application or script terminates. However, there is no guarantee of exactly when that will happen, which can lead to unwanted behavior such as resource leaks or written data not yet being flushed to disk. It’s also a best practice within Python (Pythonic) to make sure that your code behaves in a well-defined way that reduces unwanted behavior.

# 1. First way of closing the file: the try-finally block

When you’re manipulating a file, there are two ways that you can use to ensure that a file is closed properly, even when encountering an error. The first way to close a file is to use the try-finally block:

In [34]:
reader = open('Daily Activities.txt')
try:
    for line in reader:
        print(line)
finally:
    reader.close()

# 2. The second way to close a file is to use the with statement:

The with statement automatically takes care of closing the file once it leaves the with block, even in cases of error. I highly recommend that you use the with statement as much as possible, as it allows for cleaner code and makes handling any unexpected errors easier for you.
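Here is a minimal sketch of the with statement in action. It creates its own throwaway file in a temporary directory (the name example.txt is arbitrary) and checks the file object’s closed attribute inside the block, after the block, and after an exception escapes the block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # A throwaway sample file; the name "example.txt" is only for illustration.
    path = os.path.join(tmpdir, "example.txt")
    with open(path, "w", encoding="utf-8") as writer:
        writer.write("first line\nsecond line\n")

    with open(path, "r", encoding="utf-8") as reader:
        contents = reader.read()
        closed_inside = reader.closed   # False while inside the block
    closed_after = reader.closed        # True once the block has exited

    # The file is closed even if an exception escapes the block.
    try:
        with open(path, "r", encoding="utf-8") as failing:
            raise ValueError("simulated error while processing the file")
    except ValueError:
        pass
    closed_after_error = failing.closed  # True

print(closed_inside, closed_after, closed_after_error)  # False True True
```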

Most likely, you’ll also want to use the second positional argument, mode. This argument is a string of one or more characters that describes how you want to open the file. The default, and most common, is 'r', which opens the file in read-only mode as a text file:

In [50]:
with open('Daily Activities.txt', 'r') as reader:
    # No try-finally is needed: the with statement closes reader automatically
    for line in reader:
        print(line)

| Character | Meaning |
|---|---|
| 'r' | Open for reading (default) |
| 'w' | Open for writing, truncating (overwriting) the file first |
| 'rb' or 'wb' | Open in binary mode (read/write using byte data) |
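The sketch below exercises the modes in the table against a throwaway file (the name modes_demo.txt is arbitrary). It shows that opening with 'w' a second time discards the earlier contents, and that 'rb' returns bytes rather than str.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "modes_demo.txt")  # hypothetical file name

    # 'w' creates the file (or truncates an existing one) and writes text.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first draft\n")

    # Opening with 'w' again truncates: the first draft is gone.
    with open(path, "w", encoding="utf-8") as f:
        f.write("second draft\n")

    # 'r' reads the contents back as a str.
    with open(path, "r", encoding="utf-8") as f:
        text = f.read()

    # 'rb' reads the same contents as raw bytes.
    with open(path, "rb") as f:
        raw = f.read()

print(repr(text))  # 'second draft\n'
print(raw)         # b'second draft\n' on Unix, b'second draft\r\n' on Windows
```

The last line ties back to line endings: text-mode writes translate '\n' to the platform’s line ending, while binary reads show the bytes exactly as stored.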


Let’s go back and talk a little about file objects. A file object is:

“an object exposing a file-oriented API (with methods such as read() or write()) to an underlying resource.” (Python glossary)

There are three different categories of file objects:

1. Text files
2. Buffered binary files
3. Raw binary files

Each of these file types is defined in the io module. Here’s a quick rundown of how everything lines up.

# 1. Text File Types

A text file is the most common file that you’ll encounter. Here’s an example of how one is opened:

In [57]:
read_file=open('Daily Activities.txt','r')
print(type(read_file))

<class '_io.TextIOWrapper'>


With these types of files, open() returns a TextIOWrapper file object. This is the default file object returned by open().

In [60]:
read_file.close()

# 2. Buffered binary files

A buffered binary file type is used for reading and writing binary files. Here are some examples of how these files are opened:

In [66]:
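binary_file = open('Placement_Records.txt', 'rb')
print(type(binary_file))
binary_file.close()
# Caution: 'wb' truncates the file it opens, so write to a separate (hypothetical) output file
binary_file = open('Placement_Records_copy.bin', 'wb')
print(type(binary_file))
binary_file.close()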

<class '_io.BufferedReader'>
<class '_io.BufferedWriter'>


With these types of files, open() returns either a BufferedReader or a BufferedWriter file object.

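# 3. Raw binary files

A raw file type is “generally used as a low-level building-block for binary and text streams” (Python io module documentation). You rarely need one directly; opening a file in binary mode with buffering=0 returns one: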

In [67]:
raw_file=open('5 and 7.png','rb',buffering=0)
print(type(raw_file))

<class '_io.FileIO'>
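To see how the three categories line up, this sketch opens a temporary file in text mode and walks down the layers: a TextIOWrapper exposes the buffered binary stream beneath it through its .buffer attribute, and a BufferedReader exposes its raw FileIO through .raw. It assumes an ordinary file on disk (the name layers.txt is arbitrary); other kinds of streams may be layered differently.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "layers.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as f:
        f.write("hello\n")

    with open(path, "r", encoding="utf-8") as text_file:
        buffered = text_file.buffer   # the buffered binary layer
        raw = buffered.raw            # the raw, unbuffered layer
        layers = [type(text_file), type(buffered), type(raw)]

# Text wraps buffered, which wraps raw.
matches_io = layers == [io.TextIOWrapper, io.BufferedReader, io.FileIO]

for layer in layers:
    print(layer.__name__)
# TextIOWrapper
# BufferedReader
# FileIO
```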


# Reading and Writing Opened Files

Once you’ve opened up a file, you’ll want to read or write to the file. First off, let’s cover reading a file. There are multiple methods that can be called on a file object to help you out:

Methods for reading a file:

.read(size=-1): reads at most size characters (text mode) or bytes (binary mode) from the file. If no argument, None, or -1 is passed, the rest of the file is read.

.readline(size=-1): reads at most size characters from the current line, stopping at the end of the line; the next call continues where the previous one stopped. If no argument, None, or -1 is passed, the entire line (or the rest of the line) is read.

.readlines(): reads the remaining lines from the file object and returns them as a list.
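The following sketch shows how these methods share a single read position. It writes a three-line sample file (contents invented for the example) to a temporary directory and then mixes .read(), .readline(), and .readlines() on one file object.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\nthird line\n")

    with open(path, "r", encoding="utf-8") as reader:
        chunk = reader.read(5)            # 'first': exactly 5 characters
        rest_of_line = reader.readline()  # ' line\n': continues where read() stopped
        partial = reader.readline(3)      # 'sec': at most 3 characters
        remaining = reader.readlines()    # ['ond line\n', 'third line\n']
        at_end = reader.read()            # '': nothing left to read

print(chunk, repr(rest_of_line), partial, remaining, repr(at_end))
```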

In [104]:
raw_file.close()

In [110]:
with open('Snippets.txt', 'r') as reader:
    print(reader.read())

data.describe()
data.info()
data.isnull().sum()
sns.pairplot(data)
rows=2
cols=7

fig.ax=plt.subplots(nrows=rows,ncols=cols,figsize=(16,4))
plt.tight_layout()
cols=data.columns
index=0

for i in range(rows):
   for j in range(columns):
	sns.displot(data[col[index],ax[i][j])
   index=index+1






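Here’s an example of how to read up to 10 characters of a line at a time using the Python .readline() method. Each call stops at the end of the current line, so the second call returns only the rest of the first line: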

In [118]:
with open('Snippets.txt', 'r') as reader:
    print(reader.readline(10))
    print(reader.readline(10))
    print(reader.readline(10))
    print(reader.readline(10))

data.descr
ibe()

data.info(
)



Here’s an example of how to read the entire file as a list using the Python .readlines() method:

In [120]:
with open('Placement_Records.txt', 'r') as reader:
    print(reader.readlines())

['1. Asif Tandel\n', '2. Sanjay Sarkar\n', '3. Utkarsha Desale\n', '4. Shalaka Patil\n', '5. Yogesh Patole\n']


The above example can also be done by using list() to create a list out of the file object:

In [126]:
f=open('Placement_Records.txt','r')
print(list(f),end="")
f.close()

['1. Asif Tandel\n', '2. Sanjay Sarkar\n', '3. Utkarsha Desale\n', '4. Shalaka Patil\n', '5. Yogesh Patole\n']

# Iterating Over Each Line in the File

A common thing to do while reading a file is to iterate over each line. Here’s an example of how to use the Python .readline() method to perform that iteration:

In [134]:
# File Handling using While Loop
with open('Placement_Records.txt','r') as reader:
    line=reader.readline()
    while line!='':
        print(line)
        line=reader.readline() # Read the next line; '' means end of file

1. Asif Tandel

2. Sanjay Sarkar

3. Utkarsha Desale

4. Shalaka Patil

5. Yogesh Patole



Another way you could iterate over each line in the file is to use the Python .readlines() method of the file object. Remember, .readlines() returns a list where each element in the list represents a line in the file:

In [136]:
# File Handling using For Loop
with open('Placement_Records.txt','r') as reader:
    for line in reader.readlines():
        print(line)

1. Asif Tandel

2. Sanjay Sarkar

3. Utkarsha Desale

4. Shalaka Patil

5. Yogesh Patole



However, the above examples can be further simplified by iterating over the file object itself:

In [138]:
with open('Placement_Records.txt','r') as reader:
    for line in reader:
        print(line)

1. Asif Tandel

2. Sanjay Sarkar

3. Utkarsha Desale

4. Shalaka Patil

5. Yogesh Patole
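The blank lines in these outputs appear because each line read from the file still ends with '\n', and print() adds another newline. A small sketch, assuming a temporary file with placeholder names (illustrative data, not the tutorial’s Placement_Records.txt), strips the trailing newline with rstrip('\n') while iterating over the file object:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "names.txt")  # hypothetical sample file
    with open(path, "w", encoding="utf-8") as f:
        f.write("1. Alpha\n2. Beta\n3. Gamma\n")

    with open(path, "r", encoding="utf-8") as reader:
        # Iterate over the file object itself and drop each trailing newline.
        names = [line.rstrip("\n") for line in reader]

for name in names:
    print(name)
# 1. Alpha
# 2. Beta
# 3. Gamma
```

Calling print(line, end='') inside the loop is an equivalent alternative when you don’t need to keep the stripped strings.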

