# Reading Files in Python


We can use Python to read and write the contents of <b>files</b>.
Text files are the easiest to manipulate. Before a file can be edited, it must be opened using the <b>open</b> function.

In [59]:
myfile = open("Example1.txt")

<h3>Note</h3>
The argument of the open function is the <b>path</b> to the file. If the file is in the current working directory of the program, you can specify only its name.
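
As a small sketch of the difference between a bare file name and a full path, the snippet below builds both with the standard `pathlib` module. The file name `Example1.txt` comes from this notebook; the variable names are only illustrative.

```python
from pathlib import Path

# A bare name is resolved relative to the current working directory.
relative_path = Path("Example1.txt")

# The same file expressed as an absolute path.
absolute_path = Path.cwd() / "Example1.txt"

print(relative_path.is_absolute())  # False
print(absolute_path.is_absolute())  # True
print(absolute_path.name)           # Example1.txt
```

Either form can be passed to `open`; the absolute form keeps working if the program is started from a different directory.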

<h3>Opening Files</h3>
You can specify the mode used to <b>open</b> a file by passing a second argument to the <b>open</b> function.

- 'r' means read mode
- 'w' means write mode
- 'b' means binary mode, which is used for non-text files (such as image and sound files)
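
The mode letters can be combined, for example `'wb'` to write bytes and `'rb'` to read them back. The following is a minimal sketch that works in a temporary directory; the file name `demo.bin` is a hypothetical placeholder.

```python
import os
import tempfile

# Work in a temporary directory so no existing file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.bin")  # hypothetical file name

    # 'w' + 'b': write raw bytes.
    with open(demo_path, "wb") as f:
        f.write(bytes([0, 1, 2, 255]))

    # 'r' + 'b': read the bytes back; the result is a bytes object, not a str.
    with open(demo_path, "rb") as f:
        raw = f.read()

print(type(raw).__name__, list(raw))  # bytes [0, 1, 2, 255]
```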

In [2]:
## let the following code cell run.

import urllib.request
url = 'https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/data/example1.txt'
filename = 'Example1.txt'
urllib.request.urlretrieve(url, filename)

## Download Example file
#!wget -O /resources/data/Example1.txt https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/data/example1.txt

('Example1.txt', <http.client.HTTPMessage at 0x7ff49e43ca00>)

In [3]:
## all imports
import pandas as pd
from IPython.display import HTML
import numpy as np
import bs4 #this is beautiful soup
import time
import operator
import socket
import re # regular expressions

from pandas import Series
from pandas import DataFrame
import urllib.request as urllib2
# pyfetch exists only in Pyodide-based environments (such as JupyterLite).
# The import stays commented out here, so calling download() below raises NameError.
#from pyodide.http import pyfetch

filename = "https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/data/example1.txt"

async def download(url, filename):
    response = await pyfetch(url)
    if response.status == 200:
        with open(filename, "wb") as f:
            f.write(await response.bytes())


await download(filename, "Example1.txt")

NameError: name 'pyfetch' is not defined

<h2 id="read">Reading Text Files</h2>


One way to read or write a file in Python is to use the built-in <code>open</code> function. The <code>open</code> function returns a **file object** that provides the methods and attributes you need to read, save, and manipulate the file. In this notebook, we will only cover **.txt** files. The first argument is the path to the file, including the file name, for example <code>open("Example1.txt", "r")</code>.


The mode argument is optional and the default value is **r**. In this notebook we only cover two modes:

<ul>
    <li>**r**: Read mode for reading files </li>
    <li>**w**: Write mode for writing files</li>
</ul>


In [71]:
# Read the Example1.txt
example1 = "Example1.txt"
file1 = open(example1, "r")


In [72]:
#We can view the attributes of the file. The name of the file:
file1.name

'Example1.txt'

In [73]:
#The mode the file object is in:
file1.mode

'r'

The content of a file that has been opened in text mode can be read using the <b>read</b> method. We can read the file and assign its content to a variable:


In [74]:
# Read all of the content of the file into a string
file_read = file1.read()
file_read

'This is line 1 \nThis is line 2\nThis is line 3'

In [65]:
# Print the file with '\n' as a new line
print(file_read)

This is line 1 
This is line 2
This is line 3


In [66]:
# Type of file content

type(file_read)

str

It is important to close the file when you are done with it. Closing releases the resources held by the file object and makes sure any buffered data is written to disk.


In [67]:
# Close file after finish

file1.close()

<h2 id="better">A Better Way to Open a File</h2>


Using the <code>with</code> statement is better practice because it automatically closes the file, even if the code encounters an exception. The code runs everything in the indented block and then closes the file object.
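
To make the benefit concrete, here is a rough sketch that compares a manual `try`/`finally` with a `with` block whose body raises an exception. The temporary file and the simulated `ValueError` are assumptions made only for this illustration.

```python
import os
import tempfile

# Create a small throwaway file to read from.
tmp = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
tmp.write("This is line 1\n")
tmp.close()

# Manual version: close() must sit in a finally block to run on errors too.
manual = open(tmp.name, "r")
try:
    manual_text = manual.read()
finally:
    manual.close()

# with version: the file is closed even though the block raises.
try:
    with open(tmp.name, "r") as managed:
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

print(manual.closed, managed.closed)  # True True

os.remove(tmp.name)  # clean up the throwaway file
```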


In [68]:
# Open file using with

with open(example1, "r") as file1:
    FileContent = file1.read()
    print(FileContent)

This is line 1 
This is line 2
This is line 3


In [69]:
file1.closed

True

In [70]:
# The file object is closed, but the content read inside the with block is still available:
print(FileContent)

This is line 1 
This is line 2
This is line 3


The syntax can be a little confusing because the file object comes after the <code>as</code> keyword, and we never close the file explicitly. The figure below summarizes the steps:


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/ReadWith.png" width="500">


To read only part of a file, pass a number as an argument to the <b>read</b> method. In text mode, this is the maximum number of characters to read.
Each further call to <b>read</b> on the same file object continues from where the previous call stopped. With no argument, <b>read</b> returns the rest of the file.

We don’t have to read the entire file. For example, we can read the first four characters by passing 4 as an argument to the method **.read()**:


In [88]:
# Read first four characters

with open(example1, "r") as file1:
    print(file1.read(4))
    

This


Once the method <code>.read(4)</code> is called, the first 4 characters are read. If we call the method again, the next 4 characters are read. The output of the following cell demonstrates the process for different arguments to the method <code>read()</code>:


In [89]:
# Read certain amount of characters

with open(example1, "r") as file1:
    print(file1.read(4))
    print(file1.read(4))
    print(file1.read(7))
    print(file1.read(15))

This
 is 
line 1 

This is line 2


The process is illustrated in the figure below; each color represents the part of the file read by one call to the method <code>read()</code>:


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/read.png" width="500">


Here is an example using the same file, but instead we read 16, 5, and then 9 characters at a time:


In [90]:
# Read certain amount of characters

with open(example1, "r") as file1:
    print(file1.read(16))
    print(file1.read(5))
    print(file1.read(9))

This is line 1 

This 
is line 2
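
Repeated calls to `read(n)` can be wrapped in a loop to process a file in fixed-size chunks, because `read` returns an empty string once the end of the file is reached. The sketch below recreates the sample text in a temporary file; the chunk size of 16 and the variable names are arbitrary choices.

```python
import os
import tempfile

sample_text = "This is line 1 \nThis is line 2\nThis is line 3"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "Example1.txt")
    # newline="" keeps the line endings exactly as written on every platform.
    with open(path, "w", newline="") as f:
        f.write(sample_text)

    chunks = []
    with open(path, "r", newline="") as f:
        while True:
            chunk = f.read(16)   # at most 16 characters per call
            if chunk == "":      # an empty string signals end of file
                break
            chunks.append(chunk)

print(chunks)
```

Joining the chunks gives back the original text, so nothing is lost between calls.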


We can also read one line of the file at a time using the method <code>readline()</code>:


In [93]:
# Read one line

with open(example1, "r") as file1:
    print("first line: " + file1.readline())

first line: This is line 1 



We can also pass an argument to readline() to specify the number of characters we want to read. However, unlike read(), readline() reads at most one line.

In [94]:
with open(example1, "r") as file1:
    print(file1.readline(20)) # does not read past the end of line
    print(file1.read(20)) # Returns the next 20 chars

This is line 1 

This is line 2
This 


In [95]:
# Iterate through the lines

with open(example1, "r") as file1:
    for i, line in enumerate(file1):
        print("Iteration", str(i), ": ", line)

Iteration 0 :  This is line 1 

Iteration 1 :  This is line 2

Iteration 2 :  This is line 3
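
The blank lines in the output above appear because each line keeps its trailing newline and `print` adds another one. A minimal sketch of stripping it follows; `io.StringIO` is used here only as a stand-in for an open text file with the same three lines.

```python
import io

# io.StringIO stands in for an open text file with the same three lines.
sample = io.StringIO("This is line 1 \nThis is line 2\nThis is line 3")

cleaned = []
for number, line in enumerate(sample):
    # Each line still ends with "\n" (except possibly the last one);
    # rstrip("\n") removes only that trailing newline.
    cleaned.append(line.rstrip("\n"))
    print("Iteration", number, ":", cleaned[-1])
```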


In [96]:
# Read all lines and save them as a list: readlines() returns every remaining line, readline() returns only the next line

with open(example1, "r") as file1:
    FileasList = file1.readlines()
    print(FileasList)

['This is line 1 \n', 'This is line 2\n', 'This is line 3']


In [97]:
# Print the first line

FileasList[0]

'This is line 1 \n'

In [98]:
# Print the third line

FileasList[2]

'This is line 3'

<h2 id="write">Writing Files</h2>


 We can use the method <code>write()</code> of a file object to save text to a file. To write to a file, set the mode argument to **w**. Let’s write a file **Example2.txt** with the line: **“This is line A”**


In [99]:
# Write line to file
exmp2 = 'Example2.txt'
with open(exmp2, 'w') as writefile:
    writefile.write("This is line A")

In [100]:
# Read file
with open(exmp2,'r') as readfile:
    print(readfile.read())

This is line A


We can write multiple lines:


In [101]:
# Write lines to file

with open(exmp2, 'w') as writefile:
    writefile.write("This is line A\n")
    writefile.write("This is line B\n")

In [102]:
# Check whether the lines were written to the file

with open(exmp2, 'r') as testwritefile:
    print(testwritefile.read())

This is line A
This is line B



In [103]:
# Create a Sample list of text

Lines = ["This is line A\n", "This is line B\n", "This is line C\n"]
Lines

['This is line A\n', 'This is line B\n', 'This is line C\n']

In [108]:
# Write the strings in the list to text file name Example2.txt

with open('Example2.txt', 'w') as writefile:
    for line in Lines:
        # You can check what you are writing by uncommenting the next line
        #print(line) 
        writefile.write(line)

In [109]:
# Verify if writing to file is successfully executed

with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

This is line A
This is line B
This is line C



However, note that setting the mode to __w__ overwrites all the existing data in the file.


In [30]:
with open('Example2.txt', 'w') as writefile:
    writefile.write("Overwrite\n")
with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

Overwrite



<hr>
<h2 id="Append">Appending Files</h2>


 We can write to files without losing any of the existing data by setting the mode argument to append: **a**. You can append new lines as follows:


In [110]:
# Write a new line to text file

with open('Example2.txt', 'a') as testwritefile:
    testwritefile.write("This is line C\n")
    testwritefile.write("This is line D\n")
    testwritefile.write("This is line E\n")

In [111]:
# Verify if the new line is in the text file

with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

This is line A
This is line B
This is line C
This is line C
This is line D
This is line E



<hr>
<h2 id="add">Additional modes</h2> 


It's fairly inefficient to open the file in **a** or **w** and then reopen it in **r** to read any lines. Luckily, we can access the file in the following modes:
- **r+** : Reading and writing. Does not truncate the file when it is opened.
- **w+** : Writing and reading. Truncates the file.
- **a+** : Appending and reading. Creates a new file if none exists.

You don't have to dwell on the specifics of each mode for this lab. 
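
For readers who do want to compare the three modes, the sketch below opens the same two-line file in each mode and reports the position right after opening and the content that survived. The helper `describe_mode` and the file name `modes_demo.txt` are hypothetical, introduced only for this illustration.

```python
import os
import tempfile

def describe_mode(mode):
    """Open a fresh two-line file in `mode` and report what happens."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical name
        with open(path, "w") as f:
            f.write("old A\nold B\n")

        with open(path, mode) as f:
            start = f.tell()      # position right after opening
            f.seek(0)
            content = f.read()    # what survived the open() call
        return start, content

for mode in ("r+", "w+", "a+"):
    print(mode, describe_mode(mode))
```

With **r+** the position starts at 0 and the old content is intact, with **w+** the file is already empty, and with **a+** the content is intact but the position starts at the end of the file.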


In [114]:
#Let's try out the a+ mode:
with open('Example2.txt', 'a+') as testwritefile:
    testwritefile.write("This is line E\n")
    print(testwritefile.read())




There were no errors but <code>read()</code> also did not output anything. This is because of our location in the file.


Most of the file methods we've looked at work at a certain location in the file. <code>.write()</code> writes at that location, <code>.read()</code> reads from it, and so on. You can think of this as moving your cursor around in a text editor to make changes at a specific location.


Opening the file in **w** is akin to opening the .txt file, moving your cursor to the beginning of the text file, writing new text and deleting everything that follows.
Whereas opening the file in **a** is similar to opening the .txt file, moving your cursor to the very end and then adding the new pieces of text. <br>
It is often very useful to know where the 'cursor' is in a file and to be able to control it. The following methods allow us to do precisely this:
- <code>.tell()</code> - returns the current position in bytes
- <code>.seek(offset,from)</code> - changes the position by 'offset' bytes with respect to 'from'. 'from' can take the values 0, 1, or 2, corresponding to the beginning of the file, the current position, and the end of the file
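
The three values of 'from' can be tried out in a short sketch. It opens the file in binary mode, because in text mode Python only accepts non-zero offsets measured from the beginning of the file. The file name `seek_demo.txt` is a hypothetical placeholder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "seek_demo.txt")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"This is line A\nThis is line B\n")

    with open(path, "rb") as f:   # binary mode: all three 'from' values work
        f.seek(0, 2)              # 0 bytes from the end
        size = f.tell()           # position at the end == file size in bytes

        f.seek(-15, 2)            # 15 bytes before the end
        last_line = f.read()

        f.seek(5, 0)              # 5 bytes from the beginning
        f.seek(3, 1)              # 3 more bytes from the current position
        position = f.tell()

print(size, last_line, position)  # 30 b'This is line B\n' 8
```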


In [115]:
with open('Example2.txt', 'a+') as testwritefile:
    print("Initial Location: {}".format(testwritefile.tell()))
    
    data = testwritefile.read()
    if (not data):  # an empty string is falsy in Python
        print('Read nothing') 
    else: 
        print(data)
            
    testwritefile.seek(0,0) # move 0 bytes from beginning.
    
    print("\nNew Location : {}".format(testwritefile.tell()))
    data = testwritefile.read()
    if (not data): 
            print('Read nothing') 
    else: 
            print(data)
    
    print("Location after read: {}".format(testwritefile.tell()) )

Initial Location: 135
Read nothing

New Location : 0
This is line A
This is line B
This is line C
This is line C
This is line D
This is line E
This is line E
This is line E
This is line E

Location after read: 135


Finally, a note on the difference between **w+** and **r+**. Both of these modes allow access to read and write methods. However, opening a file in **w+** overwrites it and deletes all pre-existing data. <br>
To work with the existing data in a file, use **r+** or **a+**. While using **r+**, it can be useful to call the <code>.truncate()</code> method after writing your data. This cuts the file off at the current position, deleting everything that follows. <br>
In the following code block, run the code as it is first, and then run it again with the <code>.truncate()</code> line uncommented.


In [116]:
with open('Example2.txt', 'r+') as testwritefile:
    data = testwritefile.readlines()
    testwritefile.seek(0,0) #write at beginning of file
   
    testwritefile.write("Line 1" + "\n")
    testwritefile.write("Line 2" + "\n")
    testwritefile.write("Line 3" + "\n")
    testwritefile.write("finished\n")
    #Uncomment the line below
    #testwritefile.truncate()
    testwritefile.seek(0,0)
    print(testwritefile.read())

Line 1
Line 2
Line 3
finished
This is line C
This is line C
This is line D
This is line E
This is line E
This is line E
This is line E



<h2 id="copy">Copy a File</h2> 


Let's copy the file **Example2.txt** to the file **Example3.txt**:


In [117]:
# Copy file to another

with open('Example2.txt','r') as readfile:
    with open('Example3.txt','w') as writefile:
        for line in readfile:
            writefile.write(line)

We can read the file to see if everything works:


In [118]:
# Verify if the copy is successfully executed

with open('Example3.txt','r') as testwritefile:
    print(testwritefile.read())

Line 1
Line 2
Line 3
finished
This is line C
This is line C
This is line D
This is line E
This is line E
This is line E
This is line E



 After reading files, we can also write data into files and save them in different file formats like **.txt, .csv, .xls (for Excel files), etc**. You will come across these in further examples.
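
As a brief preview of one such format, the sketch below writes and reads a small .csv file with the standard `csv` module. The rows and the file name `members.csv` are made up for illustration.

```python
import csv
import os
import tempfile

# Illustrative rows; the first one acts as the header.
rows = [["Membership No", "Active"], ["70525", "yes"], ["43201", "no"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "members.csv")  # hypothetical file name

    # The csv module documentation recommends opening files with newline="".
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)

    with open(csv_path, "r", newline="") as f:
        read_back = list(csv.reader(f))

print(read_back)
```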


**NOTE:** If you wish to open and view the `Example3.txt` file, download this lab [here](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/PY0101EN-4-2-WriteFile.ipynb) and run it locally on your machine. Then go to the working directory to ensure the `Example3.txt` file exists and contains the summary data that we wrote.


<h3> Exercise</h3>

Your local university's Raptors fan club maintains a register of its active members in a .txt document. Every month they update the file by removing the members who are not active. You have been tasked with automating this with your Python skills. <br>
Given the file `currentMem`, remove each member with a 'no' in their Active column. Keep track of each of the removed members and append them to the `exMem` file. Make sure that the format of the original files is preserved. (*Hint: Do this by reading/writing whole lines and ensuring the header remains.*)
<br>
Run the code block below prior to starting the exercise. The skeleton code has been provided for you. Edit only the `cleanFiles` function.


In [38]:
#Run this prior to starting the exercise
from random import randint as rnd

memReg = 'members.txt'
exReg = 'inactive.txt'
fee =('yes','no')

def genFiles(current,old):
    with open(current,'w+') as writefile: 
        writefile.write('Membership No  Date Joined  Active  \n')
        data = "{:^13}  {:<11}  {:<6}\n"

        for rowno in range(20):
            date = str(rnd(2015,2020))+ '-' + str(rnd(1,12))+'-'+str(rnd(1,25))
            writefile.write(data.format(rnd(10000,99999),date,fee[rnd(0,1)]))


    with open(old,'w+') as writefile: 
        writefile.write('Membership No  Date Joined  Active  \n')
        data = "{:^13}  {:<11}  {:<6}\n"
        for rowno in range(3):
            date = str(rnd(2015,2020))+ '-' + str(rnd(1,12))+'-'+str(rnd(1,25))
            writefile.write(data.format(rnd(10000,99999),date,fee[1]))


genFiles(memReg,exReg)


Now that you've run the prerequisite code cell above, which prepared the files for this exercise, you are ready to move on to the implementation.

#### **Exercise:** Implement the cleanFiles function in the code cell below.


In [39]:
def cleanFiles(currentMem, exMem):
    '''
    The two arguments for this function are the files:
        - currentMem: File containing list of current members
        - exMem: File containing list of old members

    This function should remove all rows from currentMem containing 'no'
    in the 'Active' column and append them to exMem.
    '''
    # TODO: Open the currentMem file in r+ mode
        # TODO: Open the exMem file in a+ mode

        # TODO: Read each member in the currentMem file (1 member per row) into a list.
        # Hint: Recall that the first line in the file is the header.

        # TODO: Iterate through the members and create a new list of the inactive members

        # Go to the beginning of the currentMem file
        # TODO: Iterate through the members list.
        # If a member is inactive, add them to exMem, otherwise write them into currentMem

        
    with open(currentMem,'r+') as writeFile: 
        with open(exMem,'a+') as appendFile:
            #get the data
            writeFile.seek(0)
            members = writeFile.readlines()
            #remove header
            header = members[0]
            members.pop(0)
                
            inactive = [member for member in members if ('no' in member)]
            # The comprehension above is equivalent to:
            #
            # inactive = []
            # for member in members:
            #     if 'no' in member:
            #         inactive.append(member)
            #go to the beginning of the write file
            writeFile.seek(0) 
            writeFile.write(header)
            for member in members:
                if (member in inactive):
                    appendFile.write(member)
                else:
                    writeFile.write(member)      
            writeFile.truncate()

# The code below is to help you view the files.
# Do not modify this code for this exercise.
memReg = 'members.txt'
exReg = 'inactive.txt'
cleanFiles(memReg,exReg)

# code to help you see the files

headers = "Membership No  Date Joined  Active  \n"

with open(memReg,'r') as readFile:
    print("Active Members: \n\n")
    print(readFile.read())
    
with open(exReg,'r') as readFile:
    print("Inactive Members: \n\n")
    print(readFile.read())
                

Active Members: 


Membership No  Date Joined  Active  
    70525      2015-3-19    yes   
    77860      2020-8-4     yes   
    96038      2018-7-9     yes   
    35119      2020-12-5    yes   
    71757      2017-4-5     yes   
    90544      2020-10-19   yes   
    29891      2015-3-22    yes   
    59792      2016-1-14    yes   
    97923      2015-4-14    yes   
    14381      2017-10-25   yes   

Inactive Members: 


Membership No  Date Joined  Active  
    43201      2016-4-9     no    
    46206      2018-11-7    no    
    32960      2020-8-1     no    
    68455      2015-2-12    no    
    12118      2020-7-1     no    
    86044      2018-11-12   no    
    95512      2018-5-14    no    
    29921      2016-2-16    no    
    48957      2020-6-17    no    
    45812      2020-9-10    no    
    34831      2016-6-12    no    
    52904      2015-5-7     no    
    95576      2016-9-4     no    



In [40]:
def testMsg(passed):
    if passed:
        return 'Test Passed'
    else:
        return 'Test Failed'

testWrite = "testWrite.txt"
testAppend = "testAppend.txt" 
passed = True

genFiles(testWrite,testAppend)

with open(testWrite,'r') as file:
    ogWrite = file.readlines()

with open(testAppend,'r') as file:
    ogAppend = file.readlines()

try:
    cleanFiles(testWrite,testAppend)
except Exception as err:
    print('Error:', err)

with open(testWrite,'r') as file:
    clWrite = file.readlines()

with open(testAppend,'r') as file:
    clAppend = file.readlines()
        
# checking if total no of rows is same, including headers

if (len(ogWrite) + len(ogAppend) != len(clWrite) + len(clAppend)):
    print("The number of rows do not add up. Make sure your final files have the same header and format.")
    passed = False
    
for line in clWrite:
    if  'no' in line:
        passed = False
        print("Inactive members in file")
        break
    else:
        if line not in ogWrite:
            print("Data in file does not match original file")
            passed = False
print ("{}".format(testMsg(passed)))
    



Test Passed


In [None]:
#Q1 Create a text file called mytxt.txt
# Save the following content in the text file:
# "In this section, we will use Python's built-in open function to create a file
# and obtain the data from a "txt" file.
# We will use Python's open function to get a file object.
# We can apply a method to that object to read data from the file.
# The first argument is the file path, the second parameter is the mode."

In [None]:
txt1 = ("In this section, we will use Python's built-in open function to create a file\n"
        "and obtain the data from a \"txt\" file.\n"
        "We will use Python's open function to get a file object.\n"
        "We can apply a method to that object to read data from the file.\n"
        "The first argument is the file path, the second parameter is the mode.")
with open('mytxt.txt', 'w+') as mytxt:
    mytxt.write(txt1)
    mytxt.seek(0)
    print(mytxt.read())

In [57]:
#You can create the text file in the directory you are working in, or
#you can create it by writing:
f2 = "testme.txt"
with open(f2, 'w') as writetest:
    writetest.write("This is line A")
    

In [58]:
with open(f2,'r') as readtest:
    print(readtest.read())
    #print(readtest.readline(4))

This is line A


In [142]:
#Q2 Write a Python program to read the entire mytxt file
with open("mytxt.txt", 'r') as f:
    for line in f:
        print(line)

In this section, we will use Python's built-in open function to create a file

and obtain the data from a "txt" file.

We will use Python's open function to get a file object.

We can apply a method to that object to read data from the file.

The first argument is the file path, the second parameter is the mode.


In [143]:
with open("mytxt.txt", 'r') as f:
    print(f.readlines())

["In this section, we will use Python's built-in open function to create a file\n", 'and obtain the data from a "txt" file.\n', "We will use Python's open function to get a file object.\n", 'We can apply a method to that object to read data from the file.\n', 'The first argument is the file path, the second parameter is the mode.']


In [151]:
#Q3 Write a Python program to read last 3 lines of mytxt file 
with open("mytxt.txt", "r") as file:
    FileasList = file.readlines()
    lines = FileasList[-3:]
    # The lines keep their trailing '\n', so joining with '\n' prints a blank line between them.
    print('\n'.join(lines))

We will use Python's open function to get a file object.

We can apply a method to that object to read data from the file.

The first argument is the file path, the second parameter is the mode.


In [155]:
#Q3 Write a Python program to read last n lines of mytxt file
#[Hint: write a function]
def read_last_n_lines(n):
    with open("mytxt.txt") as myfile:
        lines = myfile.readlines()
        last_lines = lines[-n:]
        return '\n'.join(last_lines)
print(read_last_n_lines(2))

We can apply a method to that object to read data from the file.

The first argument is the file path, the second parameter is the mode.
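
For large files, holding every line in memory just to keep the last few can be wasteful. One possible alternative, sketched below, feeds the file into a `collections.deque` with a maximum length. The helper name `last_n_lines` is hypothetical, and `io.StringIO` stands in for an open text file.

```python
from collections import deque
import io

def last_n_lines(file_obj, n):
    """Return the last n lines of an open text file as a list of strings."""
    # A deque with maxlen=n discards older lines as newer ones arrive.
    return [line.rstrip("\n") for line in deque(file_obj, maxlen=n)]

sample = io.StringIO("line 1\nline 2\nline 3\nline 4\n")
print(last_n_lines(sample, 2))  # ['line 3', 'line 4']
```

Unlike the slice `lines[-n:]`, this version returns an empty list when `n` is 0.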


In [133]:
#Q4 count the number of lines in mytxt file
with open("mytxt.txt", "r") as file:
    lines = file.readlines()
    # len() also works for an empty file, where the loop variable would be undefined
    print("Number of lines:", len(lines))

Number of lines: 5


In [152]:
#Q5 Write a Python program to generate 26 text files named A.txt, B.txt,... Z.txt
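import string
for letter in string.ascii_uppercase:
    # "x" creates the file and raises FileExistsError if it already exists,
    # so re-running this cell fails once the files have been created.
    with open(letter + ".txt", "x"):
        pass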

FileExistsError: [Errno 17] File exists: 'A.txt'

In [158]:
#Q6 Write a Python program to append "End of file" to mytxt file and
# display the text
with open("mytxt.txt","a") as filend:
    filend.write('end of the file')
with open("mytxt.txt", 'r') as myfile:
    a = myfile.read()
    print(a)

In this section, we will use Python's built-in open function to create a file
and obtain the data from a "txt" file.
We will use Python's open function to get a file object.
We can apply a method to that object to read data from the file.
The first argument is the file path, the second parameter is the mode.
 end of the fileend of the file


In [159]:
#Q7 Write a Python program to remove newline characters from mytxt file
def remove_char(file):
    with open(file, 'r') as myFile:
        return myFile.read().replace('\n', '')

print(remove_char("mytxt.txt"))

In this section, we will use Python's built-in open function to create a fileand obtain the data from a "txt" file.We will use Python's open function to get a file object.We can apply a method to that object to read data from the file.The first argument is the file path, the second parameter is the mode. end of the fileend of the file


In [160]:
# Q8 Write a python program to find the longest word in mytxt file
def LongestWord(file):
    with open(file) as files:
        words = files.read().split()
    max_len = len(max(words, key=len))
    return[word for word in words if len(word)==max_len]
LongestWord("mytxt.txt")
    

['parameter']

In [161]:
#Q9 Write a Python program to count the frequency "We" in mytxt file
with open("mytxt.txt") as myfile:
    words=myfile.read().split()
    a = words.count('We')  # Case-sensitive: counts "We" but not "we".
    print(a)

2
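
If the count should ignore case and punctuation, one possible approach is to lowercase the text and count the words with `collections.Counter`. The sample text and the regular expression below are assumptions chosen for this sketch, not part of the exercise.

```python
import re
from collections import Counter

text = "We will use the open function.\nWe can apply a method, and we can read."

# Lowercase the text and keep only runs of letters and apostrophes as words.
words = re.findall(r"[a-z']+", text.lower())
counts = Counter(words)

print(counts["we"])  # 3
```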
