# Write and Save Files in Python


Estimated time needed: **25** minutes
    

## Objectives

After completing this lab you will be able to:

* Write to files using Python libraries

## Table of Contents

<div class="alert alert-block alert-info" style="margin-top: 20px">
    <ul>
        <li><a href="#write">Writing Files</a></li>
        <li><a href="#Append">Appending Files</a></li>
        <li><a href="#add">Additional File modes</a></li>
        <li><a href="#copy">Copy a File</a></li>
    </ul>

</div>

---

## Shell and Magic Commands

Before working with this notebook, it is good practice to clean up the existing output files in the current directory.

In IPython, the exclamation mark (`!`) runs a shell command from inside a Jupyter Notebook code cell. Each such command is executed in a temporary subshell, so its effects (for example, a directory change made with `!cd`) do not persist. For effects that last across cells, use the shell-like magic functions, e.g. `%cd`, `%ls`, `%pwd`.

In [1]:
## List all the .txt files
## 1) Platform dependent shell commands
# !dir *.txt
## 2) Magic commands provided by IPython kernel
%ls *.txt

 Volume in drive C is OS
 Volume Serial Number is 9CF3-BBDB

 Directory of c:\Users\Asus\Coding\course-cognitiveclass-python-for-data-science\module04

2023-03-04  16:07               331 CurrentMembers.txt
2023-03-04  16:12                45 DownloadedFile.txt
2023-03-04  16:07               583 InactiveMembers.txt
2023-03-04  16:07               403 TestAppend.txt
2023-03-04  16:07               511 TestWrite.txt
2023-03-04  16:07                75 WiteFile.txt
2023-03-04  16:07                75 WiteFile1.txt
               7 File(s)          2,023 bytes
               0 Dir(s)  52,255,006,720 bytes free


In [2]:
## Remove all the .txt files
## 1) For Linux & Windows Powershell
# !rm *.txt
## 2) For Windows Command Prompt/CMD
!del *.txt
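
As a platform-independent alternative to the shell commands above, the same clean-up can be sketched in plain Python with `pathlib`. The helper name `remove_txt_files` and the file names below are illustrative assumptions, not part of the lab; the demo runs in a temporary directory so that no real files are deleted.

```python
import tempfile
from pathlib import Path


def remove_txt_files(directory):
    """Delete every .txt file directly inside `directory` and return the removed names."""
    removed = []
    for path in sorted(Path(directory).glob("*.txt")):
        path.unlink()
        removed.append(path.name)
    return removed


## Demonstrate on a throwaway directory (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp_dir:
    (Path(tmp_dir) / "a.txt").write_text("first\n")
    (Path(tmp_dir) / "b.txt").write_text("second\n")
    (Path(tmp_dir) / "keep.csv").write_text("x,y\n")

    removed_names = remove_txt_files(tmp_dir)
    remaining_names = sorted(p.name for p in Path(tmp_dir).iterdir())

print(removed_names)
print(remaining_names)
```

To clean the notebook's own directory, the helper could be called as `remove_txt_files(".")`.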

In [3]:
## Run the notebook specified
%run ".\Lab-04-01-Reading-Files.ipynb"

DownloadedFile.txt
r
<class 'str'>
'This is line 1 \nThis is line 2\nThis is line 3'
This is line 1 
This is line 2
This is line 3
This is line 1 
This is line 2
This is line 3
This is line 1 
This is line 2
This is line 3
This
This
 is 
line 1 

This is line 2
This is line 1 

This 
is line 2
Read a line: This is line 1 

Read a line: This is line 2

This is line 1 

This is line 2
This 
Iteration 0: This is line 1 

Iteration 1: This is line 2

Iteration 2: This is line 3
['This is line 1 \n', 'This is line 2\n', 'This is line 3']
This is line 1 

This is line 2

This is line 3


'wget' is not recognized as an internal or external command,
operable program or batch file.


<h2 id="write">Writing Files</h2>

We can open a file object and use its method <code>write()</code> to save text to a file. To write to a file, the mode argument must be set to **w**.

In [4]:
## Write line to file
written_file_name = 'WiteFile.txt'
with open(written_file_name, 'w') as written_file:
	written_file.write("This is line 1")

 We can read the file to see if it worked:


In [5]:
## Read the written file above
with open(written_file_name, 'r') as read_written_file:
	print(read_written_file.read())

This is line 1


We can write multiple lines:


In [6]:
## Write lines to file
with open(written_file_name, 'w') as written_file:
	written_file.write("This is line 1\n")
	written_file.write("This is line 2\n")

The method <code>.write()</code> works similarly to the method <code>.readline()</code>, except that instead of reading a line it writes one. Note that <code>.write()</code> does not add a newline on its own, which is why each string above ends with <code>\n</code>. The process is illustrated in the figure. The different colour coding of the grid represents a new line added to the file after each method call.


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/WriteLine.png" width="500">


You can check the file to see if your results are correct:


In [7]:
## Check the written file above
with open(written_file_name, 'r') as read_written_file:
	print(read_written_file.read())

This is line 1
This is line 2
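
The following minimal sketch illustrates two details of `write()`: it returns the number of characters written, and it never adds a line break by itself. The file name `WriteDemo.txt` is a hypothetical scratch file created in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    ## Hypothetical scratch file, removed when the block ends
    demo_path = os.path.join(tmp_dir, "WriteDemo.txt")

    with open(demo_path, "w") as demo_file:
        ## No "\n" at the end, so the next write continues on the same line
        first_count = demo_file.write("no newline here, ")
        second_count = demo_file.write("so this stays on the same line\n")

    with open(demo_path, "r") as demo_file:
        demo_lines = demo_file.readlines()

print(first_count, second_count)
print(demo_lines)
```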



We can write a list of strings to a **.txt** file as follows:


In [8]:
## List of lines
lines = ["This is line A\n", "This is line B\n", "This is line C\n"]
print(lines)

print("-"*50)

## Write the strings in the list to text file
with open('WiteFile.txt', 'w') as written_file:
	for line in lines:
		print(line)
		written_file.write(line)

print("-"*50)

## Verify the content written above
with open('WiteFile.txt', 'r') as read_written_file:
	print(read_written_file.read())

['This is line A\n', 'This is line B\n', 'This is line C\n']
--------------------------------------------------
This is line A

This is line B

This is line C

--------------------------------------------------
This is line A
This is line B
This is line C
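
The loop above can also be expressed with the file method `writelines()`, which writes every string in a list in one call. This is a sketch using a hypothetical scratch file (`WritelinesDemo.txt`) in a temporary directory; like `write()`, `writelines()` does not add line breaks, so each string must already end with `\n`.

```python
import os
import tempfile

lines = ["This is line A\n", "This is line B\n", "This is line C\n"]

with tempfile.TemporaryDirectory() as tmp_dir:
    ## Hypothetical scratch file, removed when the block ends
    demo_path = os.path.join(tmp_dir, "WritelinesDemo.txt")

    with open(demo_path, "w") as demo_file:
        ## Writes each string as-is, in order
        demo_file.writelines(lines)

    with open(demo_path, "r") as demo_file:
        written_text = demo_file.read()

print(written_text)
```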



However, note that setting the mode to __w__ overwrites all the existing data in the file.


In [9]:
with open('WiteFile.txt', 'w') as written_file:
	written_file.write("Overwrite\n")
with open('WiteFile.txt', 'r') as read_written_file:
	print(read_written_file.read())

Overwrite



<h2 id="Append">Appending Files</h2>

We can write to files without losing any of the existing data by setting the mode argument to append: **a**. You can append new lines as follows:


In [10]:
## Write to the existing file without overwriting 
with open('WiteFile.txt', 'a') as written_file:
	written_file.write("This is line 1\n")
	written_file.write("This is line 2\n")
	written_file.write("This is line 3\n")

 You can verify the file has changed by running the following cell:


In [11]:
## Verify the new lines were appended to the file 
with open('WiteFile.txt', 'r') as read_written_file:
	print(read_written_file.read())

Overwrite
This is line 1
This is line 2
This is line 3



<h2 id="add">Additional modes</h2>

It's fairly inefficient to open the file in **a** or **w** and then reopen it in **r** to read any lines. Luckily we can access the file in the following modes:
- **r+** : Reading and writing. Does not truncate the file when it is opened.
- **w+** : Writing and reading. Truncates the file.
- **a+** : Appending and reading. Creates a new file if none exists.

You don't have to dwell on the specifics of each mode for this lab.
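
As a rough illustration of the three modes listed above, the sketch below opens the same hypothetical scratch file (`ModesDemo.txt`, created in a temporary directory) in each mode and records what can be read back.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    ## Hypothetical scratch file, removed when the block ends
    demo_path = os.path.join(tmp_dir, "ModesDemo.txt")

    ## Start from a known two-line file
    with open(demo_path, "w") as demo_file:
        demo_file.write("line 1\nline 2\n")

    ## r+ : existing content is kept and reading starts at the beginning
    with open(demo_path, "r+") as demo_file:
        after_r_plus = demo_file.read()

    ## a+ : existing content is kept and writes go to the end of the file
    with open(demo_path, "a+") as demo_file:
        demo_file.write("line 3\n")
        demo_file.seek(0)  ## go back to the start so read() sees everything
        after_a_plus = demo_file.read()

    ## w+ : the file is emptied as soon as it is opened
    with open(demo_path, "w+") as demo_file:
        after_w_plus = demo_file.read()

print(repr(after_r_plus))
print(repr(after_a_plus))
print(repr(after_w_plus))
```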

Let's try out the __a+__ mode:


In [12]:
with open('WiteFile.txt', 'a+') as file_append_plus:
	file_append_plus.write("This is line 5\n")
	print(file_append_plus.read())




There were no errors but <code>read()</code> also did not output anything. This is because of our location in the file.


Most of the file methods we've looked at work at a certain location in the file. <code>.write()</code> writes at the current location, <code>.read()</code> reads from the current location, and so on. You can think of this as moving your cursor around in a text editor to make changes at a specific location.


Opening the file in **w** is akin to opening the .txt file, moving your cursor to the beginning of the text file, writing new text and deleting everything that follows.
Opening the file in **a**, by contrast, is similar to opening the .txt file, moving your cursor to the very end and then adding the new pieces of text. <br>
It is often very useful to know where the 'cursor' is in a file and to be able to control it. The following methods allow us to do precisely this:
- <code>.tell()</code> - returns the current position in bytes
- <code>.seek(offset, whence)</code> - changes the position by 'offset' bytes with respect to 'whence', which can take the value 0, 1 or 2, corresponding to the beginning of the file, the current position and the end of the file
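
The two methods above can be tried out with a small sketch. It uses binary mode (`'rb'`) on a hypothetical scratch file, because text-mode files only support seeking relative to the beginning (apart from a few special cases such as jumping to the very end), whereas binary files accept all three reference points.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    ## Hypothetical scratch file holding exactly ten bytes
    demo_path = os.path.join(tmp_dir, "SeekDemo.bin")
    with open(demo_path, "wb") as demo_file:
        demo_file.write(b"0123456789")

    with open(demo_path, "rb") as demo_file:
        demo_file.seek(3, 0)   ## 3 bytes from the beginning
        pos_from_start = demo_file.tell()

        demo_file.seek(2, 1)   ## 2 bytes forward from the current position
        pos_from_current = demo_file.tell()

        demo_file.seek(-4, 2)  ## 4 bytes before the end
        pos_from_end = demo_file.tell()

        tail = demo_file.read()  ## everything from the current position onwards

print(pos_from_start, pos_from_current, pos_from_end)
print(tail)
```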


Now let's revisit **a+**:


In [13]:
with open('WiteFile.txt', 'a+') as file_append_plus:
	print("Initial Location: {}".format(file_append_plus.tell()))
	
	data = file_append_plus.read()
	if (not data):  ## Empty strings are falsy in Python
		print("Nothing was read")
	else:
		print(data)
	
	## Move 0 bytes from the beginning of the file
	file_append_plus.seek(0, 0)

	print("New Location: {}".format(file_append_plus.tell()))
	
	data = file_append_plus.read()
	if (not data):
		print("Nothing was read")
	else:
		print(data)
	
	print("New Location: {}".format(file_append_plus.tell()) )

Initial Location: 75
Nothing was read
New Location: 0
Overwrite
This is line 1
This is line 2
This is line 3
This is line 5

New Location: 75


Finally, a note on the difference between **w+** and **r+**. Both modes allow access to the read and write methods; however, opening a file in **w+** overwrites it and deletes all pre-existing data. <br>
To work with a file's existing data, use **r+** or **a+**. When using **r+**, it can be useful to call the <code>.truncate()</code> method after writing your data. This cuts the file off at the current position and deletes everything that follows. <br>
In the following code block, run the code as it is first, and then run it again with the <code>.truncate()</code> line uncommented.


In [14]:
with open('WiteFile.txt', 'r+') as file_read_plus:
	data = file_read_plus.readlines()
	print(data)
	## Move 0 bytes from the beginning of the file
	file_read_plus.seek(0, 0)
	file_read_plus.write("Line 1" + "\n")
	file_read_plus.write("Line 2" + "\n")
	file_read_plus.write("Line 3" + "\n")
	file_read_plus.write("Finished\n")
	## Truncates the file's size
	# file_read_plus.truncate()
	## Move 0 bytes from the beginning of the file
	file_read_plus.seek(0, 0)
	print(file_read_plus.read())

['Overwrite\n', 'This is line 1\n', 'This is line 2\n', 'This is line 3\n', 'This is line 5\n']
Line 1
Line 2
Line 3
Finished
 line 2
This is line 3
This is line 5
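
To see the effect of `.truncate()` in isolation, here is a minimal sketch that performs the same **r+** overwrite twice, once without and once with truncation. The helper `rewrite_start` and the file name `TruncateDemo.txt` are illustrative assumptions; the strings contain no line breaks so that the result is the same on every platform.

```python
import os
import tempfile


def rewrite_start(path, new_text, truncate):
    """Overwrite the start of `path` with `new_text`, optionally cutting off the rest."""
    with open(path, "r+") as target:
        target.write(new_text)
        if truncate:
            ## Cut the file off at the current position (just after new_text)
            target.truncate()
        target.seek(0)
        return target.read()


with tempfile.TemporaryDirectory() as tmp_dir:
    ## Hypothetical scratch file, removed when the block ends
    demo_path = os.path.join(tmp_dir, "TruncateDemo.txt")

    with open(demo_path, "w") as demo_file:
        demo_file.write("ABCDEFGHIJ")
    without_truncate = rewrite_start(demo_path, "xyz", truncate=False)

    ## Reset the file to its original content before the second run
    with open(demo_path, "w") as demo_file:
        demo_file.write("ABCDEFGHIJ")
    with_truncate = rewrite_start(demo_path, "xyz", truncate=True)

print(without_truncate)
print(with_truncate)
```

Without truncation the old tail (`DEFGHIJ`) survives after the new text, which mirrors the leftover lines in the output above.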



<h2 id="copy">Copy a File</h2> 


In [15]:
## Copy file to another
with open('WiteFile.txt', 'r') as file_read:
	with open('WiteFile1.txt', 'w') as file_copied:
		for line in file_read:
			file_copied.write(line)

We can read the file to see if everything works:


In [16]:
## Check if the file content was copied successfully
with open('WiteFile1.txt','r') as file_read_1:
	print(file_read_1.read())

Line 1
Line 2
Line 3
Finished
 line 2
This is line 3
This is line 5
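
For whole-file copies, the standard library also offers `shutil.copyfile`, which avoids the explicit loop. This is an alternative sketch rather than the lab's method; the file names `Source.txt` and `Copy.txt` are hypothetical and live in a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    ## Hypothetical source and destination paths
    source_path = os.path.join(tmp_dir, "Source.txt")
    copy_path = os.path.join(tmp_dir, "Copy.txt")

    with open(source_path, "w") as source_file:
        source_file.write("Line 1\nLine 2\n")

    ## Copy the contents of source_path into copy_path (overwriting it if it exists)
    shutil.copyfile(source_path, copy_path)

    with open(copy_path, "r") as copied_file:
        copied_text = copied_file.read()

print(copied_text)
```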



Besides reading files, we can also write data into files and save them in different file formats such as **.txt, .csv, .xls (for Excel files), etc**. You will come across these in further examples.
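
As a small preview of writing a different format, the sketch below saves a few rows to a **.csv** file with the standard library's `csv` module and reads them back. The rows and the file name `Members.csv` are made-up illustration data, not output from this lab.

```python
import csv
import os
import tempfile

## Hypothetical rows: a header followed by two members
rows = [
    ["Membership ID", "Date Joined", "Active Status"],
    [12345, "2020-1-15", "Yes"],
    [67890, "2019-7-4", "No"],
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "Members.csv")

    ## newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as csv_file:
        csv.writer(csv_file).writerows(rows)

    with open(csv_path, "r", newline="") as csv_file:
        read_back = list(csv.reader(csv_file))

## Every value comes back as a string
print(read_back)
```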


**NOTE:** If you wish to open and view the `example3.txt` file, download this lab [here](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/PY0101EN-4-2-WriteFile.ipynb) and run it locally on your machine. Then go to the working directory to ensure the `example3.txt` file exists and contains the summary data that we wrote.


<h2> Exercise </h2>


Your local university's Raptors fan club maintains a register of its active members in a .txt document. Every month they update the file by removing the members who are not active. You have been tasked with automating this with your Python skills. <br>
Given the file `cur_mem_file_name`, remove each member with a 'No' in their Active Status column. Keep track of each of the removed members and append them to the `ex_mem_file_name` file. Make sure that the format of the original files is preserved. (*Hint: Do this by reading/writing whole lines and ensuring the header remains*)
<br>
Run the code block below prior to starting the exercise. The skeleton code has been provided for you. Edit only the `clean_files` function.


In [17]:
## Generate text files for the exercise
from random import randint as rnd

cur_mem_file_name = 'CurrentMembers.txt'
ex_mem_file_name = 'InactiveMembers.txt'
fee = ('Yes', 'No')

def gen_files(current, old):
	col_name_1 = "Membership ID"
	col_name_2 = "Date Joined"
	col_name_3 = "Active Status"
	global file_header
	file_header = f"{col_name_1}\t\t{col_name_2}\t\t{col_name_3}\n"

	with open(current, 'w+') as file_current:
		## Write file header
		file_current.write(file_header)
		## Number of rows in the file
		for row_num in range(20):
			## Generate random dates
			date = str(rnd(2015, 2020)) + "-" + str(rnd(1, 12)) + "-" + str(rnd(1, 28))
			## Member status active and inactive in cur_mem_file_name
			file_current.write("{:^13}\t\t{:<11}\t\t{:<6}\n".format(rnd(10000, 99999), date, fee[rnd(0, 1)]))
	
	with open(old, 'w+') as file_old:
		## Write file header
		file_old.write(file_header)
		## Number of rows in the file
		for row_num in range(3):
			## Generate random dates
			date = str(rnd(2015, 2020)) + "-" + str(rnd(1, 12)) + "-" + str(rnd(1, 28))
			## Member status always inactive in ex_members
			file_old.write("{:^13}\t\t{:<11}\t\t{:<6}\n".format(rnd(10000, 99999), date, fee[1]))

gen_files(cur_mem_file_name, ex_mem_file_name)

Run the prerequisite code cell above to prepare the files for this exercise, and then implement the `clean_files()` function in the code cell below.

The function takes two arguments:
- `cur_mem_file_name`: Name of file containing list of current members
- `ex_mem_file_name`: Name of file containing list of former members

This function should remove all rows from cur_mem_file_name containing "No" in the "Active Status" column and append them to ex_mem_file_name.

In [18]:
def clean_files(cur_mem_file_name, ex_mem_file_name):
	'''
	This function moves inactive members from cur_mem_file_name to ex_mem_file_name.
	'''
	# TODO: Open cur_mem_file in r+ mode
	with open(cur_mem_file_name, 'r+') as cur_mem_file:
		# TODO: Open ex_mem_file in a+ mode
		with open(ex_mem_file_name, 'a+') as ex_mem_file:
			# TODO: Read each member in cur_mem_file_name (1 member per row) into a list
			# HINT: Recall that the 1st line in the file is the header
			member_list = cur_mem_file.readlines()
			## Remove the header row (1st row)
			member_list = member_list[1:]
			print(f"Number of all current members: {len(member_list)}")
			print(f"{'-'*100}\nmember_list:\n{member_list}")
			# TODO: Iterate through the members and create a new list of the inactive members
			# TODO: Go to the beginning of cur_mem_file
			# TODO: Iterate through the members list, add inactive members to ex_mem_file and write active ones to cur_mem_file  
			active_members = []
			inactive_members = []
			for member in member_list: 
				if "No" in member:
					inactive_members.append(member)
				elif "Yes" in member:
					active_members.append(member)
				else: 
					print("Error!".upper())
					print(f"The following member was not properly put into the active or inactive member list:\n{member}")
			print(f"Number of active_members: {len(active_members)}")
			print(f"{'-'*100}\nactive_members:\n{active_members}")
			print(f"Number of inactive members: {len(inactive_members)}")
			print(f"{'-'*100}\ninactive_members:\n{inactive_members}")
			## Move the file pointer to the beginning of the file
			print(f"Current position in the file cur_mem_file: {cur_mem_file.tell()}")
			cur_mem_file.seek(0, 0)
			print(f"Current position in the file cur_mem_file: {cur_mem_file.tell()}")
			## Write the header row
			cur_mem_file.write(file_header)
			print(f"Current position in the file cur_mem_file: {cur_mem_file.tell()}")
			for ac_mem in active_members: 
				cur_mem_file.write(ac_mem)
			print(f"Current position in the file cur_mem_file: {cur_mem_file.tell()}")
			## Need to limit the file size to remove the original lines before writing
			cur_mem_file.truncate()
			## File pointer should be at the end to append content to the file
			print(f"Current position in the file ex_mem_file: {ex_mem_file.tell()}")
			for ex_mem in inactive_members: 
				ex_mem_file.write(ex_mem)


clean_files(cur_mem_file_name, ex_mem_file_name)

Number of all current members: 20
----------------------------------------------------------------------------------------------------
member_list:
['    56709    \t\t2018-1-15  \t\tNo    \n', '    52219    \t\t2017-5-14  \t\tYes   \n', '    52832    \t\t2016-8-24  \t\tYes   \n', '    51529    \t\t2019-1-14  \t\tYes   \n', '    80226    \t\t2017-7-4   \t\tYes   \n', '    54893    \t\t2020-2-18  \t\tYes   \n', '    74674    \t\t2020-7-24  \t\tYes   \n', '    43651    \t\t2019-7-4   \t\tNo    \n', '    24896    \t\t2018-8-28  \t\tYes   \n', '    85396    \t\t2020-9-6   \t\tNo    \n', '    45566    \t\t2016-7-16  \t\tYes   \n', '    79471    \t\t2016-9-17  \t\tNo    \n', '    25937    \t\t2020-11-7  \t\tYes   \n', '    76519    \t\t2015-12-27 \t\tYes   \n', '    86599    \t\t2016-5-9   \t\tYes   \n', '    56606    \t\t2017-12-8  \t\tYes   \n', '    57381    \t\t2018-11-5  \t\tNo    \n', '    43566    \t\t2016-7-16  \t\tYes   \n', '    73838    \t\t2016-11-8  \t\tNo    \n', '    11434    \

In [19]:
## View the files

with open(cur_mem_file_name, 'r') as cur_mem_file:
	header = "Active Members"
	print(f"{'-'*int((100-len(header))/2)}{header.upper()}{'-'*int((100-len(header))/2)}")
	print(cur_mem_file.read())
	print("-"*100)

with open(ex_mem_file_name, 'r') as ex_mem_file:
	header = "Inactive Members"
	print(f"{'-'*int((100-len(header))/2)}{header.upper()}{'-'*int((100-len(header))/2)}")
	print(ex_mem_file.read())
	print("-"*100)

-------------------------------------------ACTIVE MEMBERS-------------------------------------------
Membership ID		Date Joined		Active Status
    52219    		2017-5-14  		Yes   
    52832    		2016-8-24  		Yes   
    51529    		2019-1-14  		Yes   
    80226    		2017-7-4   		Yes   
    54893    		2020-2-18  		Yes   
    74674    		2020-7-24  		Yes   
    24896    		2018-8-28  		Yes   
    45566    		2016-7-16  		Yes   
    25937    		2020-11-7  		Yes   
    76519    		2015-12-27 		Yes   
    86599    		2016-5-9   		Yes   
    56606    		2017-12-8  		Yes   
    43566    		2016-7-16  		Yes   

----------------------------------------------------------------------------------------------------
------------------------------------------INACTIVE MEMBERS------------------------------------------
Membership ID		Date Joined		Active Status
    19939    		2016-4-3   		No    
    86549    		2015-4-24  		No    
    68329    		2015-12-27 		No    
    56709    		2018-1-15  		No    
    43651    		20

Run the code cell below to test `clean_files()`.

In [20]:
def test_msg(passed):
	if passed:
		return 'Test Passed'
	else:
		return 'Test Failed'

test_write = "TestWrite.txt"
test_append = "TestAppend.txt"
passed = True

gen_files(test_write, test_append)

with open(test_write, 'r') as test_file_write:
	gen_files_write_lines = test_file_write.readlines()

with open(test_append, 'r') as test_file_append:
	gen_files_append_lines = test_file_append.readlines()

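try:
	clean_files(test_write, test_append)
except Exception as error:
	## Report what went wrong instead of silently swallowing every exception
	print("Error!".upper())
	print(error)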

with open(test_write, 'r') as test_file_write:
	clean_files_write_lines = test_file_write.readlines()

with open(test_append, 'r') as test_file_append:
	clean_files_append_lines = test_file_append.readlines()

## Check if the total no. of rows, including header row, in each file is same

if (len(gen_files_write_lines) + len(gen_files_append_lines) != len(clean_files_write_lines) + len(clean_files_append_lines)):
	print("The numbers of rows do not add up. Make sure your final files have the same header and format.")
	passed = False

for line in clean_files_write_lines:
	if 'No' in line:
		passed = False
		print("Inactive members in file")
		break
	else:
		if line not in gen_files_write_lines:
			print("Data in file does not match original file")
			passed = False
print("{}".format(test_msg(passed)))

Number of all current members: 20
----------------------------------------------------------------------------------------------------
member_list:
['    56686    \t\t2020-3-12  \t\tYes   \n', '    62961    \t\t2019-9-7   \t\tNo    \n', '    28453    \t\t2019-10-3  \t\tYes   \n', '    32646    \t\t2015-12-15 \t\tNo    \n', '    44630    \t\t2019-1-19  \t\tYes   \n', '    93336    \t\t2018-3-18  \t\tYes   \n', '    60758    \t\t2018-3-17  \t\tYes   \n', '    96862    \t\t2019-11-27 \t\tYes   \n', '    46159    \t\t2020-2-23  \t\tYes   \n', '    67798    \t\t2015-5-4   \t\tNo    \n', '    88449    \t\t2018-6-8   \t\tYes   \n', '    67769    \t\t2020-6-19  \t\tYes   \n', '    97229    \t\t2016-12-26 \t\tNo    \n', '    61338    \t\t2018-9-3   \t\tYes   \n', '    40270    \t\t2018-8-26  \t\tYes   \n', '    54357    \t\t2020-10-4  \t\tNo    \n', '    44842    \t\t2020-7-25  \t\tNo    \n', '    76442    \t\t2020-5-17  \t\tNo    \n', '    55467    \t\t2019-3-27  \t\tNo    \n', '    26574    \

<details><summary>Click here for the solution</summary>

```python
def clean_files(cur_mem_file_name,ex_mem_file_name):
    with open(cur_mem_file_name,'r+') as writeFile: 
        with open(ex_mem_file_name,'a+') as appendFile:
            #get the data
            writeFile.seek(0)
            members = writeFile.readlines()
            #remove header
            header = members[0]
            members.pop(0)
                
            inactive = [member for member in members if ('No' in member)]
            '''
            The above is the same as 

            inactive = []
            for member in members:
                if 'No' in member:
                    inactive.append(member)
            '''
            #go to the beginning of the write file
            writeFile.seek(0) 
            writeFile.write(header)
            for member in members:
                if (member in inactive):
                    appendFile.write(member)
                else:
                    writeFile.write(member)      
            writeFile.truncate()
                
current_members = 'members.txt'
ex_members = 'inactive.txt'
clean_files(current_members,ex_members)

# code to help you see the files

headers = "Membership No  Date Joined  Active  \n"

with open(current_members,'r') as readFile:
    print("Active Members: \n\n")
    print(readFile.read())
    
with open(ex_members,'r') as readFile:
    print("Inactive Members: \n\n")
    print(readFile.read())
    
```

</details>
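
Both versions of `clean_files()` decide a member's status with a substring test on the whole row. A slightly stricter variant, sketched below under the assumption that the status is always the last whitespace-separated field of a row (as in the files produced by `gen_files`), compares only that field. The helper name `split_members` and the sample rows are hypothetical.

```python
def split_members(rows):
    """Split member rows into (active, inactive) lists based on the last field."""
    active, inactive = [], []
    for row in rows:
        fields = row.split()
        if not fields:
            ## Skip blank lines
            continue
        status = fields[-1]
        if status == "Yes":
            active.append(row)
        elif status == "No":
            inactive.append(row)
        else:
            raise ValueError(f"Unexpected status in row: {row!r}")
    return active, inactive


## Hypothetical rows in the same layout as the generated files
sample_rows = [
    "    11111    \t\t2018-1-15  \t\tNo    \n",
    "    22222    \t\t2017-5-14  \t\tYes   \n",
    "    33333    \t\t2016-8-24  \t\tYes   \n",
]

active_rows, inactive_rows = split_members(sample_rows)
print(len(active_rows), len(inactive_rows))
```

The rows are returned unchanged, so they can be written back with `write()` exactly as in the solution.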

    


---

Author(s):

- [Joseph Santarcangelo](https://www.linkedin.com/in/joseph-s-50398b136/?utm_medium=Exinfluencer&utm_source=Exinfluencer&utm_content=000026UJ&utm_term=10006555&utm_id=NA-SkillsNetwork-Channel-SkillsNetworkCoursesIBMDeveloperSkillsNetworkPY0101ENSkillsNetwork1005-2022-01-01)

Other Contributor(s):

- [Mavis Zhou](https://www.linkedin.com/in/jiahui-mavis-zhou-a4537814a)