# LiveCoding: Read and Write a .txt File (solutions)

by [Luciano Gabbanelli](https://www.linkedin.com/in/luciano-gabbanelli-ph-d-75302218)

<img width=80 src="https://media.giphy.com/media/KAq5w47R9rmTuvWOWa/giphy.gif">

<img width=150 src="Images/Assembler.png">

***

## What is a .txt file?

- A .txt file is a standard text document that contains plain text.
- Text documents contain little to no formatting.
- They are used to store text-based information.
- They can be opened and edited in any text-editing or word-processing program.

## What is a CSV file?

CSV stands for “Comma Separated Values”. It is the simplest form of storing data in tabular form as plain text. Because it’s a plain text file, it can contain only actual text data—in other words, printable ASCII or Unicode characters.

They are capable of handling large amounts of data, and they are a convenient way to export this data from spreadsheets and databases as well as import or use it in other programs. 

It is important to know how to work with CSV files because data scientists rely on CSV data in their day-to-day work. For example, you might export the results of a data mining program to a CSV file and then import that into a spreadsheet to analyze the data, generate graphs for a presentation, or prepare a report for publication.

- See for example a 'Salary_Data.csv' file. Its structure goes as follows:

> $\hspace{10pt}$ years_experience,salary <br>
> $\hspace{10pt}$ 1.1,39343.00 <br>
> $\hspace{10pt}$ 1.3,46205.00 <br>
> $\hspace{10pt}$ 1.5,37731.00 <br>
> $\hspace{10pt}$ 2.0,43525.00 <br>
> $\hspace{10pt}$ ...

**Structure:**
1. Each piece of data is separated by a comma.
2. Normally, the first line identifies each piece of data—in other words, the name of a data column.
3. Every subsequent line after that is actual data and is limited only by file size constraints.
4. The separator character is called a delimiter, and the comma is not the only one used. Other popular delimiters include the tab (\t), colon (:) and semi-colon (;) characters.
5. Properly parsing a CSV file requires us to know which delimiter is being used.
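
As a minimal sketch of points 4 and 5, the standard-library `csv` module takes the delimiter as an argument. The text below reuses the first rows of the 'Salary_Data.csv' sample shown above; the semicolon variant and the `parse` helper are illustrative assumptions, not part of the course files.

```python
import csv
import io

# First rows of the sample shown above, held in memory instead of on disk.
comma_text = "years_experience,salary\n1.1,39343.00\n1.3,46205.00\n"
# Hypothetical variant of the same data that uses ';' as the delimiter.
semicolon_text = comma_text.replace(",", ";")

def parse(text, delimiter):
    """Return the header row and the data rows of a delimited text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)   # first line holds the column names
    rows = list(reader)     # every remaining line is data
    return header, rows

header, rows = parse(comma_text, ",")
print(header)  # ['years_experience', 'salary']
print(rows)    # [['1.1', '39343.00'], ['1.3', '46205.00']]

# Parsing with the wrong delimiter leaves every line as a single field.
wrong_header, _ = parse(semicolon_text, ",")
print(wrong_header)  # ['years_experience;salary']
```

Note that `csv.reader` returns every value as a string; converting to numbers is a separate step.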

## .txt reading methods

Let us look at several ways to read text files in Python.

- Drag and drop the 'Assembler_School.txt' file into a folder called 'Files' (create it if you haven't already) inside your MDS root folder, where this Jupyter Notebook is located.


- To open the file, use the built-in `open()` function.

> **Syntax:**
> 
> `open(path_to_file, mode, encoding)`
>
> | Mode  | Description  |
> | :---: | :---: |
> | `'r'` | Open a text file for reading text |
> | `'w'` | Open a text file for writing text |
> | `'a'` | Open a text file for appending text |
>

- The `open()` function returns a file object, which has `read()`, `readline()`, or `readlines()` methods for reading the content of the file.


- Be careful with the location of the file. You will need to specify the file path if the .txt is not in the same folder as this notebook.


-  Reading text methods:

    1. Loop through the file line by line  &rarr; **Go to the first code cell**
    
    2.  `read()` : read all text from a file into a string. This method is useful if you have a small file and you want to manipulate the whole text of that file. You can also specify how many characters you want to return.
    
    3. `readline()` : read a single line per call and return it as a string. Calling it repeatedly walks through the file line by line.
    
    4. `readlines()` : read all the lines of the text file and return them as a list of strings.


- If no encoding is specified, `open()` falls back to a platform-dependent default, which works fine with ASCII text files ([American Standard Code for Information Interchange](https://en.wikipedia.org/wiki/ASCII))<a name="cite_ref-1"></a>[<sup>[1]</sup>](#cite_note-1). However, if your text contains non-ASCII characters (for example, accented letters in other languages), you should specify the `encoding` parameter to determine the character encoding<a name="cite_ref-2"></a>[<sup>[2]</sup>](#cite_note-2), because computers store text as numbers and need to know which mapping to use. By specifying the encoding as `open(path_to_file, mode, encoding='utf-8')`, accents and other characters in a UTF-8 file will be displayed correctly. You are now using the UTF-8 ([8-bit Unicode (or Universal Coded Character Set) Transformation Format](https://en.wikipedia.org/wiki/UTF-8))<a name="cite_ref-3"></a>[<sup>[3]</sup>](#cite_note-3) encoding for text files.

<br>

<a name="cite_note-1"></a> [[1]](#cite_ref-1) ASCII is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Most modern character-encoding schemes are based on ASCII.

<a name="cite_note-2"></a> [[2]](#cite_ref-2) Character encodings are systems that map characters to numbers. Each character is given a specific ID number. This way, computers can actually read and understand characters. Try, for instance, the commands `ord("💩")` or `ord("a")`.

<a name="cite_note-3"></a> [[3]](#cite_ref-3) UTF-8 is a variable-width character encoding used for electronic communication, defined by the [Unicode Standard](https://en.wikipedia.org/wiki/Unicode). UTF-8 is capable of encoding all 1_112_064 valid character code points in Unicode using one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that valid ASCII text is valid UTF-8-encoded Unicode as well. UTF-8 is the dominant encoding for the World Wide Web (and internet technologies), accounting for 98% of all web pages, and up to 100.0% for some languages, as of 2022.
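
The effect of the `encoding` parameter can be sketched with a throwaway file. This is only an illustration: the file name and the sample text are assumptions, and the file lives in a temporary folder so that nothing is left behind.

```python
import tempfile
from pathlib import Path

text = "Café, señor, naïve"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "accents.txt"   # hypothetical file name

    # Write the text as UTF-8 bytes.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the same encoding gives the text back unchanged.
    with open(path, "r", encoding="utf-8") as f:
        good = f.read()

    # Reading with a different encoding misinterprets the accented bytes.
    with open(path, "r", encoding="cp1252") as f:
        bad = f.read()

print(good)          # Café, señor, naïve
print(bad)           # CafÃ©, seÃ±or, naÃ¯ve
print(good == text)  # True
```

The plain ASCII letters survive in both cases, which is why the problem only shows up once accents or other non-ASCII characters appear.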

### Line by line loop

In [1]:
# Type the code here:
f = open("Files/Assembler_School.txt", "r")
for x in f:
    print(x)
# Why is there a blank line between lines?
# Each line already ends with '\n', and print() adds another one by default (end='\n'). Try:
# print(x, end='')
f.close()

Hello! Welcome to Assembler_School.txt

This file is for testing purposes.



Good Luck!


It is a good practice to always close the file when you are done with it. Use the file `close()` method or your file will remain open.
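
If an error occurs between `open()` and `close()`, the `close()` call is skipped. One way to guard against that, sketched below under the assumption of a small throwaway file called `demo.txt`, is a `try`/`finally` block; the `closed` attribute of the file object tells you whether the file is still open.

```python
import tempfile
from pathlib import Path

# The temporary folder only exists so that the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"      # hypothetical file name
    path.write_text("Hello!\n", encoding="utf-8")

    f = open(path, "r", encoding="utf-8")
    try:
        first_line = f.readline()
    finally:
        f.close()                      # runs even if readline() raises

    is_closed = f.closed

print(repr(first_line))  # 'Hello!\n'
print(is_closed)         # True
```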


#### How do we close a file?

**There is another way to close a file: the `with` statement closes it automatically when its block ends, without calling the `close()` method.**

**Task:** Count the lines of the file.

In [44]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    num_lines = sum(1 for line in f)
    print(f'The file contains {num_lines} lines')

The file contains 4 lines


**Task:** Count also the non-blank lines of the file.

In [16]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    num_lines = 0
    nb_lines = 0
    for line in f:
        num_lines += 1
        if line != '\n':
            nb_lines += 1
print(f'The file contains {num_lines} lines, of which only {nb_lines} are not blank')

The file contains 4 lines, of which only 3 are not blank
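
The check `line != '\n'` counts a line that holds only spaces or tabs as non-blank. A possible variant, sketched here on an in-memory sample rather than on the course file, strips the whitespace first:

```python
import io

# In-memory stand-in for a text file; the third line holds only spaces.
sample = io.StringIO("first\n\n   \nlast")

total = 0
non_blank = 0
for line in sample:
    total += 1
    if line.strip():        # an empty string after stripping means a blank line
        non_blank += 1

print(total, non_blank)  # 4 2
```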


#### Combine the loop with the `enumerate()` function

When files are large and we don't want to use unnecessary memory, we can use the `enumerate()` function, which adds a counter to each line.

`enumerate(file_pointer)` doesn't load the entire file into memory, so this is an efficient way to count the lines in a file.

In [10]:
# Type the code here:
with open("Files/Assembler_School.txt", 'r') as fp:
    for count, line in enumerate(fp):
        print(count, line)

print('\nTotal lines: ', count + 1)

0 Hello! Welcome to Assembler_School.txt

1 This file is for testing purposes.

2 

3 Good Luck!

Total lines:  4
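
Note that `count + 1` after the loop raises a `NameError` if the file is empty, because the loop body never runs and `count` is never assigned. A small sketch of a line counter that handles this case follows; the helper name `count_lines` is an assumption, and `io.StringIO` stands in for a real file.

```python
import io

def count_lines(file_obj):
    """Count lines by iterating, so only one line is held in memory at a time."""
    count = 0                      # stays 0 when the file is empty
    for count, _ in enumerate(file_obj, start=1):
        pass                       # nothing to do: we only need the counter
    return count

print(count_lines(io.StringIO("a\nb\n\nc")))  # 4
print(count_lines(io.StringIO("")))           # 0
```

Starting the counter at 1 with `start=1` also removes the need for the final `+ 1`.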


**Go up for the remaining methods** &uarr;

### The `read()` method

In [11]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    contents = f.read()
    print(contents)

Hello! Welcome to Assembler_School.txt
This file is for testing purposes.

Good Luck!


In [12]:
print(f)

<_io.TextIOWrapper name='Files/Assembler_School.txt' mode='r' encoding='cp1252'>


In [13]:
print(f.read(51))

ValueError: I/O operation on closed file.

Operations on a file must be done inside the `with` block. Once the block ends, the file is closed and any further read raises the `ValueError` shown above.

In [14]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    print(f.read(51))

Hello! Welcome to Assembler_School.txt
This file is


**Go up for the remaining methods** &uarr;

### The `readline()` method

Another option is to use the `readline()` method, which only returns one line.

In [39]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    print(f.readline())

Hello! Welcome to Assembler_School.txt



Each call to `readline()` returns the next line, so calling it twice reads the first two lines, calling it three times reads the first three, and so on.

In [136]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    print(f.readline(), end = '')
    print(f.readline(), end = '')

Hello! Welcome to Assembler_School.txt
This file is for testing purposes.


Repeating the same statement several times is rarely good practice. Let's use a loop to read all the lines.

In [138]:
# Type some code here:
with open("Files/Assembler_School.txt", "r") as f:
    for line in f.readlines() :
        print(line)

Hello! Welcome to Assembler_School.txt

This file is for testing purposes.



Good Luck!


Or you can implement a list comprehension:

In [33]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    lines = [line for line in f]
print(lines)

['Hello! Welcome to Assembler_School.txt\n', 'This file is for testing purposes.\n', '\n', 'Good Luck!']
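
Every item in the list above still carries its trailing `\n`. If you want the lines without it, two common options are `rstrip('\n')` inside the comprehension or `splitlines()` on the whole text. A minimal sketch on an in-memory sample (not the course file):

```python
import io

raw = "Hello!\nThis is a test.\n\nGood Luck!"

# Option 1: strip the newline from each line while iterating.
stripped = [line.rstrip("\n") for line in io.StringIO(raw)]
print(stripped)  # ['Hello!', 'This is a test.', '', 'Good Luck!']

# Option 2: read everything as one string and split it on line breaks.
split = raw.splitlines()
print(split == stripped)  # True
```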


Or using a `while` loop:

In [38]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    a_line = f.readline()
    while a_line:                     # readline() returns '' at the end of the file
        print(a_line, end='')
        a_line = f.readline()

Hello! Welcome to Assembler_School.txt
This file is for testing purposes.

Good Luck!

**Go up for the remaining methods** &uarr;

### The `readlines()` method

In [39]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    print(f.readlines())

['Hello! Welcome to Assembler_School.txt\n', 'This file is for testing purposes.\n', '\n', 'Good Luck!']


The information in the list can be extracted with a `for` statement. 

**Task:** Count again how many lines the file has.

In [143]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    lines = f.readlines()
    
count = 0
print(f'The file has {len(lines)} lines. They are:\n')
for line in lines:
    count += 1
    print(f'Line {count}: {line}')    

The file has 4 lines. They are:

Line 1: Hello! Welcome to Assembler_School.txt

Line 2: This file is for testing purposes.

Line 3: 

Line 4: Good Luck!


In [41]:
lines

['Hello! Welcome to Assembler_School.txt\n',
 'This file is for testing purposes.\n',
 '\n',
 'Good Luck!']

## The `seek()` method

Python file method `seek()` sets the file's current position at the offset.

> **Syntax:** `file.seek(offset)`
>
> `offset`: a number giving the position to move the file stream to

This method returns the new position.

In [1]:
# Type the code here:
with open("Files/Assembler_School.txt", "r") as f:
    # returns the offset position
    print(f.seek(30))
    # and the process start from that position
    lines = f.readlines()
    print(lines)

30
['hool.txt\n', 'This file is for testing purposes.\n', '\n', 'Good Luck!']


In [3]:
f.seek(10)

ValueError: I/O operation on closed file.

## The `tell()` method

The `tell()` method returns the current file position in a file stream.

> **Syntax:** `file.tell()`
>
> No parameter values

**Tip:** You can change the current file position with the `seek()` method.

In [62]:
# Type some code here:
with open("Files/Assembler_School.txt", "r") as f:
    print(f.tell())

0


In [63]:
print(f.tell())

ValueError: I/O operation on closed file.

In [64]:
with open("Files/Assembler_School.txt", "r") as f:
    print(f.readline())
    print(f.tell())

Hello! Welcome to Assembler_School.txt

40
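
`tell()` and `seek()` work well together: you can store a position and come back to it later. The sketch below assumes a throwaway three-line file named `demo.txt`. It does not print the value of the bookmark because that number depends on the platform's line endings. In the output above, 40 after a 38-character line is consistent with a two-character Windows line ending.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.txt"     # hypothetical file name
    path.write_text("one\ntwo\nthree\n", encoding="utf-8")

    with open(path, "r", encoding="utf-8") as f:
        f.readline()                  # consume the first line
        bookmark = f.tell()           # remember where the second line starts
        second = f.readline()
        third = f.readline()
        f.seek(bookmark)              # jump back to the bookmark
        second_again = f.readline()
        f.seek(0)                     # rewind to the start of the file
        first = f.readline()

print(repr(second), repr(second_again))  # 'two\n' 'two\n'
print(repr(first))                       # 'one\n'
```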


## .txt writing methods

First, to write to an existing file, you must add a parameter to the `open()` function for writing or appending:

- `'w'` : will overwrite any existing content; if the specified file does not exist, it will be created.


- `'a'` : will append to the end of the file; if the specified file does not exist, it  will be created.


- `'x'` : will create a file; raises an error if the file already exists.

Second, write to the text file using the `write()` or `writelines()` method.
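
The `'x'` mode is the least familiar of the three modes, so here is a minimal sketch of its behaviour. The file name is an assumption and the file is created in a temporary folder.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "new_file.txt"   # hypothetical file name

    # First call: the file does not exist yet, so it is created.
    with open(path, "x", encoding="utf-8") as f:
        f.write("created\n")

    # Second call: the file exists now, so open() raises FileExistsError.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("never written\n")
        outcome = "created again"
    except FileExistsError:
        outcome = "already exists"

    content = path.read_text(encoding="utf-8")

print(outcome)        # already exists
print(repr(content))  # 'created\n'
```

This makes `'x'` a safe choice when overwriting an existing file by accident would be a problem.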

### The `write()` method

Use the `write()` method to write a string to a text file. To write a list of strings, call it once per item:

In [85]:
# Type the code here:
lines = ['Readme','Now let us do something new', 'How to write text files in Python']

with open('Files/readme.txt', 'w') as f:
    for line in lines:
        f.write(line)

Now, let us open and read the newly created file. Also check what has been created in your 'Files' folder.

In [86]:
# Type the code here:
with open('Files/readme.txt', 'r') as f:
    print(f.read())

ReadmeNow let us do something newHow to write text files in Python


If you treat each item in the list as a line, you must concatenate it with the newline character:

In [128]:
# Type some code here:
# break the line after each element:
with open('Files/readme2.txt', 'w') as f:
    for line in lines:
        f.write(line)
        f.write('\n')

In [129]:
with open('Files/readme2.txt', 'r') as f:
    print(f.read())

Readme
Now let us do something new
How to write text files in Python



But the line breaks after each list item were added by hand, and the file now ends with a trailing newline.

To avoid both, build the text with the `join()` method, which places the newline only between items:

In [99]:
# Type the code here:
print(lines)
with open('Files/readme.txt', 'w') as f:
    x = '\n'.join(lines)
    print('Joined lines: \n\n',x, sep = '')
    f.write(x)

['Readme', 'Now let us do something new', 'How to write text files in Python']
Joined lines: 

Readme
Now let us do something new
How to write text files in Python


**Warning!** We are printing the joined lines of the list that has been stored in a variable. We are not printing the content of the readme.txt file.

Print the content:

In [101]:
# Type the code here:
with open('Files/readme.txt', 'r') as f:
    print(f.read())

Readme
Now let us do something new
How to write text files in Python


### The `writelines()` method

The `writelines()` method writes the items of a list to the file, one after another. It does not add line breaks, so each item must carry its own `\n` where needed.

In [121]:
new_lines = ['Woops!', '\nI have deleted the content!', '\nDo you understand why?']

# Type the code here:
with open('Files/readme.txt', 'w') as f:
    f.writelines(new_lines)
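
Despite its name, `writelines()` does not add any line breaks. The sketch below shows this on an in-memory buffer (`io.StringIO` stands in for a file opened with `'w'`; the list of items is made up for the example) and one possible way to add the breaks with a generator expression.

```python
import io

items = ["Readme", "Second line", "Third line"]

# Without newlines in the items, everything is glued together.
buffer = io.StringIO()
buffer.writelines(items)
glued = buffer.getvalue()
print(repr(glued))      # 'ReadmeSecond lineThird line'

# Appending '\n' to each item on the fly gives one item per line.
buffer = io.StringIO()
buffer.writelines(item + "\n" for item in items)
separated = buffer.getvalue()
print(repr(separated))  # 'Readme\nSecond line\nThird line\n'
```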

### Appending text files

To append to a text file, you need to open the text file in appending mode.
Let us append new lines to the `readme.txt` file

In [123]:
# Type the code here:
more_lines = ['Append text files', 'The End']
with open('Files/readme.txt', 'a') as f:
    # join() returns a single string, so write() is the matching method
    f.write('\n'.join(more_lines))

In [124]:
with open('Files/readme.txt', 'r') as f:
    print(f.read())

Woops!
I have deleted the content!
Do you understand why?Append text files
The End


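The appended text started on the same line as the previous content. To start on a new line, insert an empty string as the first element of the list, so that `join()` puts a newline before the first item: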

In [125]:
# Type the code here:
more_lines = ['', 'Append text files', 'The', 'End']

with open('Files/readme.txt', 'a') as f:
    # join() returns a single string, so write() is the matching method
    f.write('\n'.join(more_lines))

In [126]:
with open('Files/readme.txt', 'r') as f:
    print(f.read())

Woops!
I have deleted the content!
Do you understand why?Append text files
The End
Append text files
The
End
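
An alternative convention, sketched here as one option rather than the required approach, is to end every line with `\n` when writing. Appended text then always starts on a fresh line and no empty first element is needed. The file name `log.txt` is an assumption.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "log.txt"        # hypothetical file name

    # Every line is written with its own trailing newline.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in ["first", "second"])

    # Appending follows the same rule, so it starts on a new line.
    with open(path, "a", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in ["third", "fourth"])

    result = path.read_text(encoding="utf-8")

print(result, end="")
# first
# second
# third
# fourth
```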


## Delete a File

To delete a file, import the `os` module and call its `os.remove()` function:

In [127]:
# Type the code here:
import os
os.remove("Files/readme.txt")

In [130]:
with open('Files/readme.txt', 'r') as f:
    print(f.read())

FileNotFoundError: [Errno 2] No such file or directory: 'Files/readme.txt'

To avoid getting an error, you might want to check if the file exists before you try to delete it:

In [131]:
import os
if os.path.exists("Files/readme2.txt"):
    os.remove("Files/readme2.txt")
else:
    print("The file does not exist")

Run the same cell one more time:

In [132]:
import os
if os.path.exists("Files/readme2.txt"):
    os.remove("Files/readme2.txt")
else:
    print("The file does not exist")

The file does not exist
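
Checking with `os.path.exists()` first is not the only option. Two alternatives are sketched below on a throwaway file: catching `FileNotFoundError`, and `Path.unlink(missing_ok=True)` from `pathlib` (this argument requires Python 3.8 or later).

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "readme2.txt"
    path.write_text("temporary\n", encoding="utf-8")

    # Option 1: try to delete and handle the error if the file is missing.
    messages = []
    for _ in range(2):
        try:
            os.remove(path)
            messages.append("removed")
        except FileNotFoundError:
            messages.append("The file does not exist")

    # Option 2: pathlib ignores a missing file when missing_ok=True.
    path.unlink(missing_ok=True)
    exists_after = path.exists()

print(messages)      # ['removed', 'The file does not exist']
print(exists_after)  # False
```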


You can also delete an entire folder with the `os.rmdir()` function, which only removes empty folders:

In [133]:
import os
os.rmdir("myfolder")

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'myfolder'
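
`os.rmdir()` also fails when the folder exists but is not empty. For that case the standard library offers `shutil.rmtree()`, which deletes a folder together with everything inside it, so use it with care. A minimal sketch on a temporary folder (the names `myfolder` and `note.txt` are only for illustration):

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "myfolder"
    folder.mkdir()
    (folder / "note.txt").write_text("x", encoding="utf-8")

    # os.rmdir() refuses to remove a folder that still contains files.
    try:
        os.rmdir(folder)
        rmdir_worked = True
    except OSError:
        rmdir_worked = False

    # shutil.rmtree() removes the folder and all of its contents.
    shutil.rmtree(folder)
    still_there = folder.exists()

print(rmdir_worked)  # False
print(still_there)   # False
```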