### Introduction to Data File Handling in Python

* Data file handling is an important part of programming in almost every language.
* Programs need to write, modify, delete and read data on persistent storage such as a disk, so that the data can be kept for future use.
* Python provides several built-in functions for performing these operations on disk files.
* The **open()** function is the key function used to open a file on disk for these operations.

#### Opening data files

* A file name given without a folder path is resolved relative to the current working directory, which is usually the folder that contains the script or notebook. We can also give a full path to store data files in a different location, depending on application needs.
* Python uses various modes for opening a data file.
* The **open()** function returns a file object and is most commonly used with two arguments: **open(filename, mode)**.
* There are four basic modes for opening a data file
    * "r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist
    * "a" - Append - Opens a file for appending; creates the file if it does not exist. Does not erase previous data
    * "w" - Write - Opens a file for writing; creates the file if it does not exist. Erases previous data
    * "x" - Create - Creates the specified file; raises an error if the file already exists
    
* In addition you can specify if the file should be handled as binary or text mode
    * "t" - Text - Default value. Text mode
    * "b" - Binary - Binary mode (e.g. images)
    
* We can also open a file for both reading and writing using the following modes
    * "r+" - Read and write access to an existing text file; previous data is kept
    * "r+b" or "rb+" - Read and write access to an existing binary file
    * "w+" and "w+b" - Similar to the above, but the file is created if needed and previous data gets erased
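
The behaviour of the four basic modes can be checked with a small experiment. The following is only a sketch: it works in a temporary folder created with the standard `tempfile` module, and the file name `modes_demo.txt` is a made-up example, not a file used elsewhere in these notes.

```python
import os
import tempfile

# Work in a throwaway folder so no real files are touched.
# "modes_demo.txt" is a hypothetical file name used only for this sketch.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "modes_demo.txt")

    # "r" on a file that does not exist raises FileNotFoundError.
    try:
        open(path, "r")
        missing_error = False
    except FileNotFoundError:
        missing_error = True

    # "x" creates the file, but fails if the file already exists.
    with open(path, "x") as f:
        f.write("first\n")
    try:
        open(path, "x")
        exists_error = False
    except FileExistsError:
        exists_error = True

    # "a" keeps the existing data and writes at the end.
    with open(path, "a") as f:
        f.write("second\n")
    with open(path, "r") as f:
        after_append = f.read()

    # "w" erases the existing data before writing.
    with open(path, "w") as f:
        f.write("third\n")
    with open(path) as f:  # mode omitted, so it defaults to "r" in text mode
        after_write = f.read()

print(missing_error, exists_error)  # True True
print(repr(after_append))           # 'first\nsecond\n'
print(repr(after_write))            # 'third\n'
```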

### Using Google Drive to perform Data File handling operations
* Run the following code to authenticate and mount your Google Drive for data file storage

In [None]:
from google.colab import drive
drive.mount('/drive')


Mounted at /drive


#### Opening a datafile for writing


* In the following example we use the **'w'** mode with the **open()** function to create a data file **Test.txt** in a folder on the mounted drive.
* When we use **w** mode, a new file is created (replacing any existing file with the same name), and we can then use the **write()** function to write into the file.
* The write() function writes strings to the file; we can see the result by double-clicking the file at its location.
* After writing, the file should be closed with the **close()** function so that the data is flushed to disk and the file handle is released.
* write() and close() are methods of the **file** object returned by open().

In [None]:
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','w')
file.write('Hello Students\n')
file.write('Welcome to INFO 617')
file.close()
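
One detail worth checking is that, in text mode, write() accepts only strings, so a number has to be converted with str() first. The sketch below illustrates this in a temporary folder; the file name `write_demo.txt` and the value 95 are made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "write_demo.txt")  # hypothetical file name

    file = open(path, "w")
    count = file.write("Score: ")  # write() returns the number of characters written

    # Passing a number directly raises TypeError in text mode.
    try:
        file.write(95)
        type_error = False
    except TypeError:
        type_error = True

    file.write(str(95) + "\n")     # convert to a string first
    file.close()

    file = open(path, "r")
    content = file.read()
    file.close()

print(count)          # 7
print(type_error)     # True
print(repr(content))  # 'Score: 95\n'
```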

### Opening the file for reading

* Now we are going to open the file in the Python environment for reading by using the **r** mode. (*If you do not specify a mode, the file is opened in the default read mode*)

#### Reading the whole content using read() function
* In the following example the file Test.txt is opened for reading and all of its contents are stored as a string in a user-defined object **res**.
* To read the data we use the **read()** function.
* Once the data from the file is stored in the string, we can manipulate it by using string functions.

In [None]:
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r')
res = file.read()
print(res)
file.close()

Hello Students
Welcome to INFO 617
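
As a small illustration of manipulating the string returned by read(), the sketch below recreates the same two lines in a temporary file (an assumption, so that it runs without Google Drive) and applies a few common string methods.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the Test.txt file on the mounted drive.
    path = os.path.join(tmp_dir, "Test.txt")
    file = open(path, "w")
    file.write("Hello Students\nWelcome to INFO 617")
    file.close()

    file = open(path, "r")
    res = file.read()  # the whole file as one string
    file.close()

lines = res.splitlines()       # split into lines, dropping the newline characters
word_count = len(res.split())  # split on any whitespace and count the words
shout = res.upper()            # ordinary string methods work on the result

print(lines)       # ['Hello Students', 'Welcome to INFO 617']
print(word_count)  # 6
print(shout)
```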


### What will happen when we open the same file again in **w** mode
* The existing information will be erased and new information will replace it
* Let us see the following example

In [None]:
# Opening the file for writing
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','w')
file.write('Dear Students\n')
file.write('This is file handling\n')
file.close()

# Opening the file for reading
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r')
res = file.read()
print(res)
file.close()

Dear Students
This is file handling



### Using the append or **a** mode for adding information to the end of the file

* Below we reuse the file Test.txt created above, which already contains some information.
* We open the file in **a** mode and write two more lines.
* We can see that the additional information has been appended to the existing information.

In [None]:
# Opening the file Test.txt in append mode to add extra information
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','a')
file.write('\nWe are learning Python Text File operations\n')
file.write('This is really interesting')
file.close()

# Opening the file for reading
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r')
res = file.read()
print(res)
file.close()

Hello Students
Welcome to INFO 617We are learning Text File operations
This is really interesting
We are learning Python Text File operations
This is really interesting
We are learning Python Text File operations
This is really interesting
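
Note that append mode does not insert a newline on its own: new text starts exactly where the old text ended. The sketch below shows this in a temporary file (the file name `append_demo.txt` is made up), which is why appended text can end up joined to the last existing line unless it begins with `\n`.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "append_demo.txt")  # hypothetical file name

    file = open(path, "w")
    file.write("Welcome to INFO 617")  # no newline at the end
    file.close()

    file = open(path, "a")
    file.write("We are learning")      # continues on the same line
    file.close()

    file = open(path, "a")
    file.write("\nThis starts on a new line")  # leading \n forces a line break
    file.close()

    file = open(path, "r")
    res = file.read()
    file.close()

print(res)
# Welcome to INFO 617We are learning
# This starts on a new line
```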


### Using read() function with an integer argument to read a specific number of characters
* We can also read a specific number of characters from the file by providing an integer argument to the read() function, as illustrated in the following example.
* The first read(5) call reads the first five characters.
* The next read(5) call continues from where the previous one stopped and reads the next five characters, and so on.

In [None]:
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r')
res = file.read(5)
print(res)
res = file.read(5)
print(res)
res = file.read(3)
print(res)
file.close()

Hello
 Stud
ent
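
The same idea can be used to read a whole file in fixed-size pieces, since read(n) returns an empty string once the end of the file is reached. The following is a minimal sketch that uses `io.StringIO` as an in-memory stand-in for an opened text file (an assumption, so that it runs without the Drive file).

```python
import io

# io.StringIO behaves like a text file opened for reading.
file = io.StringIO("Hello Students\nWelcome to INFO 617")

chunks = []
while True:
    chunk = file.read(5)  # read up to 5 characters from the current position
    if chunk == "":       # an empty string means the end of the file
        break
    chunks.append(chunk)
file.close()

print(chunks)
# ['Hello', ' Stud', 'ents\n', 'Welco', 'me to', ' INFO', ' 617']
```

The last chunk is shorter than five characters because only four characters remain at that point.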


### Reading line by line using readline() and readlines() function
* We can read one line at a time by using the **readline()** function, or read all the lines at once into a list by using the **readlines()** function.

### Using readline() function
* The call of readline() function will read one line at a time and store that line as a **string** as illustrated below

In [None]:
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r')
res = file.readline() # Reading a single line as a string
print(res)
res = file.readline() # Reading the next line in the file
print(res)
file.close()

Hello Students

Welcome to INFO 617We are learning Text File operations
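
The blank lines in the output above appear because readline() keeps the newline character at the end of each line and print() then adds one more. A minimal sketch of this, using `io.StringIO` as an in-memory stand-in for the file (an assumption so it runs anywhere):

```python
import io

file = io.StringIO("Hello Students\nWelcome to INFO 617")

first = file.readline()   # includes the trailing newline
second = file.readline()  # last line of this text has no newline
third = file.readline()   # empty string: nothing left to read
file.close()

print(repr(first))   # 'Hello Students\n'
print(repr(second))  # 'Welcome to INFO 617'
print(repr(third))   # ''

# Removing the newline before printing avoids the blank lines.
print(first.rstrip("\n"))
print(second.rstrip("\n"))
```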



### Using readlines() to read all the lines

* readlines() can be used to read all the lines from a file; the extracted data is returned as a **list of strings**, as illustrated below

In [None]:
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r')
y = file.readlines() # The data is extracted as a list of strings
print(y)
# Printing the second line only, using list indexing
print(y[1])
file.close() # Close the file once the lines have been read

['Hello Students\n', 'Welcome to INFO 617We are learning Text File operations\n', 'This is really interesting\n', 'We are learning Python Text File operations\n', 'This is really interesting\n', 'We are learning Python Text File operations\n', 'This is really interesting']
Welcome to INFO 617We are learning Text File operations



### Alternative way of reading lines using for loop
* We can also use a for loop on the file object to read one line at a time.
* In the given example the line object stores one line at a time as a string

In [None]:
file = open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r')
for line in file:
    print(line)
file.close()

Hello Students

Welcome to INFO 617We are learning Text File operations

This is really interesting

We are learning Python Text File operations

This is really interesting

We are learning Python Text File operations

This is really interesting
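
Each line produced by the loop still ends with its newline character, which is why the output above is double-spaced. One possible variation, sketched here with `io.StringIO` as an in-memory stand-in for the file, strips the newline and numbers the lines with enumerate().

```python
import io

file = io.StringIO("Dear Students\nThis is file handling\n")

numbered = []
for number, line in enumerate(file, start=1):
    # rstrip() removes the trailing newline before the line is stored
    numbered.append(f"{number}: {line.rstrip()}")
file.close()

print("\n".join(numbered))
# 1: Dear Students
# 2: This is file handling
```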


### Using with keyword to open a datafile
* It is good practice to use the **with** keyword when dealing with file objects.
* The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point.


In [None]:

with open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','w') as f:
    f.write('Some stuff has been written')

with open('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/Test.txt','r') as f:
    s = f.read()
    print(s)

Some stuff has been written
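
To see that the file really is closed even when something goes wrong, the sketch below raises an exception on purpose inside the **with** block and then checks the `closed` attribute of the file object. The temporary folder and the file name `with_demo.txt` are assumptions made for this example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "with_demo.txt")  # hypothetical file name

    try:
        with open(path, "w") as f:
            f.write("Some stuff has been written")
            raise ValueError("simulated failure")  # error raised on purpose
    except ValueError:
        pass

    closed_after_error = f.closed  # the with block closed the file for us

    with open(path, "r") as f2:
        saved = f2.read()          # the text written before the error was saved

print(closed_after_error)  # True
print(saved)               # Some stuff has been written
```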


Another type of data file we frequently handle is the .csv file.

A CSV file, or comma-separated values file, is a plain text file that stores data in a structured format using commas to separate values and newlines to separate records. We demonstrate how to load .csv files in the following.
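
As a small illustration of that structure, the sketch below parses a tiny CSV text with Python's standard `csv` module before we move on to pandas. The column names and values are invented for the example and are not taken from the course data set.

```python
import csv
import io

# A made-up CSV text: the first line is the header, each later line is one record.
csv_text = "name,score\nAsha,91\nRavi,84\n"

reader = csv.reader(io.StringIO(csv_text))
header = next(reader)               # first row holds the column names
records = [row for row in reader]   # remaining rows, each a list of strings

print(header)   # ['name', 'score']
print(records)  # [['Asha', '91'], ['Ravi', '84']]
```

Every value is read as a string here; converting columns to numbers is one of the steps that pandas handles automatically.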

In [None]:
import pandas as pd
liwc = pd.read_csv('/drive/My Drive/Teaching/2025 Spring/INFO 617/Lecture 1 Jan 13/INFO 617 Mental Health.csv')


In [None]:
print(liwc.head(5))

                    ID     Funct   Pronoun     PPron         I        We  \
0  human_latest_3000_0  0.471591  0.107955  0.073864  0.017046  0.000000   
1  human_latest_3000_1  0.483696  0.076087  0.070652  0.021739  0.027174   
2  human_latest_3000_2  0.541126  0.183983  0.138528  0.028139  0.000000   
3  human_latest_3000_3  0.477901  0.093923  0.071823  0.027624  0.000000   
4  human_latest_3000_4  0.476923  0.092308  0.080000  0.015385  0.000000   

        You     SheHe  They     iPron  ...  WordPerSentence  RateDicCover  \
0  0.051136  0.011364   0.0  0.034091  ...          58.6667      0.931818   
1  0.000000  0.021739   0.0  0.005435  ...          46.0000      0.880435   
2  0.075758  0.032468   0.0  0.041126  ...          28.8750      0.971861   
3  0.041437  0.002762   0.0  0.022099  ...          10.9697      0.895028   
4  0.043077  0.012308   0.0  0.012308  ...          12.5000      0.901538   

   RateNumeral  RateSixLtrWord  RateFourCharWord  RateLatinWord  NumAtMention  \