### File Handling
---

> So far we have seen different Python data types. We usually store our data in files of different formats. In addition to handling files, this section also covers several file formats (.txt, .json, .xml, .csv, .tsv, .xlsx). First, let us get familiar with handling files in the most common format, plain text (.txt).

> File handling is an important part of programming that allows us to create, read, update and delete files. To open a file in Python we use the built-in open() function.


In [None]:
from pathlib import Path
path = Path('pi_digits.txt')
contents = path.read_text()
print(contents)

3.1415926535
 8979323846
 2643383279


In [2]:
pi_digits = Path("pi_digits.txt")
read_contents = pi_digits.read_text()
read_contents = read_contents.rstrip()
print(read_contents)

3.1415926535
 8979323846
 2643383279


In [3]:
pi = Path("pi_digits.txt")
read_pi = pi.read_text().lstrip()
print(read_pi)

3.1415926535
 8979323846
 2643383279


You can also tell Python exactly where the file is on your computer,
regardless of where the program that’s being executed is stored. This is
called an absolute file path. You can use an absolute path if a relative path
doesn’t work. For instance, if you’ve put text_files in some folder other than
python_work, then just passing Path the path 'text_files/filename.txt' won’t
work because Python will only look for that location inside python_work.
You’ll need to write out an absolute path to clarify where you want Python
to look.
Absolute paths are usually longer than relative paths, because they start
at your system’s root folder:

In [6]:
js_docs = Path("C:/Users/adams/Documents/bluetooth")
read_docs = js_docs.read_text()

PermissionError: [Errno 13] Permission denied: 'C:\\Users\\adams\\Documents\\bluetooth'
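
The PermissionError above is what Windows typically reports when read_text() is given a folder rather than a file, which may be the case for bluetooth here. The sketch below is one way to check a path before reading it. It assumes a temporary folder and a hypothetical file name, notes.txt, so that it runs on any machine:

```python
import tempfile
from pathlib import Path

# Work in a throwaway folder so the example does not depend on your machine.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp).resolve()             # absolute path to the temp folder
    file_path = folder / "notes.txt"         # hypothetical file name
    file_path.write_text("hello", encoding="utf-8")

    relative = Path("notes.txt")             # interpreted relative to the current working directory
    relative_is_absolute = relative.is_absolute()
    file_path_is_absolute = file_path.is_absolute()

    # read_text() only works on files, so check before reading.
    folder_is_file = folder.is_file()
    contents = file_path.read_text(encoding="utf-8") if file_path.is_file() else None

print(relative_is_absolute, file_path_is_absolute, folder_is_file, contents)
```

Checking is_file() first turns a confusing error into a condition we can handle ourselves.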

In [None]:
# Syntax
open('filename', mode) # mode: 'r' (read), 'a' (append), 'w' (write) or 'x' (create), optionally followed by 't' (text) or 'b' (binary)

"r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist
"a" - Append - Opens a file for appending; creates the file if it does not exist
"w" - Write - Opens a file for writing and erases any existing content; creates the file if it does not exist
"x" - Create - Creates the specified file; raises an error if the file already exists
"t" - Text - Default value. Text mode
"b" - Binary - Binary mode (e.g. images)

#### Opening Files for Reading

The default mode of open() is reading, so we do not have to specify 'r' or 'rt'. I have created and saved a file named reading_file_example.txt in the files directory. Let us see how it is done:

f = open('./files/reading_file_example.txt')
print(f) # <_io.TextIOWrapper name='./files/reading_file_example.txt' mode='r' encoding='UTF-8'>
As you can see in the example above, printing the opened file shows some information about it. An opened file has several reading methods: read(), readline() and readlines(). An opened file has to be closed with the close() method.

read(): reads the whole text as a string. To limit the number of characters read, pass an int to read(number).

In [3]:
f = open('oop.ipynb')
txt = f.read()
print(type(txt))
print(txt)
f.close()

<class 'str'>
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# chapter 9 CLASSES\n",
    "\n",
    ">Making an object from a class is called instantiation, and you work with instances of a class.\n",
    "\n",
    ">Learning about object-oriented programming will help you see the world as a programmer does. It’ll help you understand your code—not just what’s happening line by line, but also the bigger concepts behind it. Knowing the logic behind classes will train you to think logically, so you can write programs that effectively address almost any problem you encounter\n",
    "\n",
    ">Classes also make life easier for you and the other programmers you’ll work with as you take on increasingly complex challenges. \n",
    ">When you and other programmers write code based on the same kind of logic, you’ll be able to understand each other’s work.\n",
    "> Your programs will make sense to the people you work with, allowing everyone to accomplish 

In [1]:
f = open('./files/reading_file_example.txt')
print(f) # <_io.TextIOWrapper name='./files/reading_file_example.txt' mode='r' encoding='UTF-8'>

FileNotFoundError: [Errno 2] No such file or directory: './files/reading_file_example.txt'

In [2]:
f = open('./files/reading_file_example.txt')
txt = f.read()
print(type(txt))
print(txt)
f.close()

FileNotFoundError: [Errno 2] No such file or directory: './files/reading_file_example.txt'

Instead of printing all the text, let us print the first 10 characters of the text file.

In [3]:
f = open('./files/reading_file_example.txt')
txt = f.read(10)
print(type(txt))
print(txt)
f.close()

FileNotFoundError: [Errno 2] No such file or directory: './files/reading_file_example.txt'
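
The cells above raise FileNotFoundError because reading_file_example.txt is not present on this machine. As a self-contained sketch, the example below first creates a small sample file in a temporary folder (the file content is made up for illustration) and then shows that each read(number) call continues from where the previous one stopped:

```python
import tempfile
from pathlib import Path

sample_text = "first line\nsecond line\nthird line\n"   # made-up content

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "reading_file_example.txt"
    path.write_text(sample_text, encoding="utf-8")

    f = open(path, encoding="utf-8")
    first_ten = f.read(10)    # at most 10 characters
    next_five = f.read(5)     # continues after the first 10 characters
    rest = f.read()           # everything that is left
    at_end = f.read()         # empty string once the end of the file is reached
    f.close()

print(repr(first_ten))
print(repr(next_five))
print(repr(rest))
print(repr(at_end))
```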

readline(): reads a single line; right after opening the file, that is the first line

In [4]:
f = open('./files/reading_file_example.txt')
line = f.readline()
print(type(line))
print(line)
f.close()

FileNotFoundError: [Errno 2] No such file or directory: './files/reading_file_example.txt'

readlines(): reads all the text line by line and returns a list of lines

In [6]:
f = open('./files/reading_file_example.txt')
lines = f.readlines()
print(type(lines))
print(lines)
f.close()

FileNotFoundError: [Errno 2] No such file or directory: './files/reading_file_example.txt'

Another way to get all the lines as a list is using splitlines():

In [7]:
f = open('./files/reading_file_example.txt')
lines = f.read().splitlines()
print(type(lines))
print(lines)
f.close()

FileNotFoundError: [Errno 2] No such file or directory: './files/reading_file_example.txt'
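
The three approaches differ mainly in how they treat the newline character. A minimal sketch, assuming a made-up three-line file created in a temporary folder, shows that readline() and readlines() keep the trailing '\n' while splitlines() drops it:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "lines_demo.txt"       # hypothetical file name
    path.write_text("alpha\nbeta\ngamma\n", encoding="utf-8")

    f = open(path, encoding="utf-8")
    first = f.readline()         # one line, newline included
    second = f.readline()        # each call moves on to the next line
    remaining = f.readlines()    # whatever has not been read yet
    f.close()

    f = open(path, encoding="utf-8")
    kept = f.readlines()         # list of lines, each ending with '\n'
    f.close()

    f = open(path, encoding="utf-8")
    stripped = f.read().splitlines()   # list of lines without '\n'
    f.close()

print(repr(first), repr(second), remaining)
print(kept)
print(stripped)
```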

After we open a file, we should close it, and it is easy to forget to do so. The with statement opens a file and closes it automatically when the block ends. Let us rewrite the previous example using with:

In [8]:
with open('./files/reading_file_example.txt') as f:
    lines = f.read().splitlines()
    print(type(lines))
    print(lines)

FileNotFoundError: [Errno 2] No such file or directory: './files/reading_file_example.txt'
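
To see that with really closes the file, we can inspect the closed attribute of the file object. The sketch below uses a made-up file in a temporary folder and also spells out a try/finally block that does roughly the same job by hand:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "with_demo.txt"        # hypothetical file name
    path.write_text("one\ntwo\n", encoding="utf-8")

    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
        open_inside = not f.closed            # still open inside the block
    closed_after = f.closed                   # closed as soon as the block ends

    # Roughly what the with block does for us, written out by hand.
    g = open(path, encoding="utf-8")
    try:
        same_lines = g.read().splitlines()
    finally:
        g.close()                             # runs even if reading fails

print(lines, open_inside, closed_after)
```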

#### Opening Files for Writing and Updating

To write to an existing file, we must pass a mode argument to the open() function:

"a" - append - appends to the end of the file; if the file does not exist, it creates a new file.
"w" - write - overwrites any existing content; if the file does not exist, it creates a new file.

Let us append some text to the file we have been reading:

In [None]:
with open('./files/reading_file_example.txt','a') as f:
    f.write('This text has to be appended at the end')

The code below creates a new file if the file does not exist:

In [9]:
with open('./files/writing_file_example.txt','w') as f:
    f.write('This text will be written in a newly created file')

FileNotFoundError: [Errno 2] No such file or directory: './files/writing_file_example.txt'
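
The FileNotFoundError above is most likely caused by the missing files folder: open() in 'w' mode creates a missing file, but it does not create missing folders. A minimal sketch, assuming a temporary folder stands in for the project directory, creates the folder first with Path.mkdir() and then compares the 'w', 'a' and 'x' modes:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    files_dir = Path(tmp) / "files"
    target = files_dir / "writing_file_example.txt"

    # The folder does not exist yet, so even 'w' mode fails.
    try:
        with open(target, "w", encoding="utf-8") as f:
            f.write("never written")
        missing_folder_error = False
    except FileNotFoundError:
        missing_folder_error = True

    files_dir.mkdir(parents=True, exist_ok=True)   # create the folder first

    with open(target, "w", encoding="utf-8") as f:  # creates the file
        f.write("This text will be written in a newly created file")
    with open(target, "a", encoding="utf-8") as f:  # adds to the end
        f.write(" | appended")
    after_append = target.read_text(encoding="utf-8")

    with open(target, "w", encoding="utf-8") as f:  # erases the old content
        f.write("overwritten")
    after_overwrite = target.read_text(encoding="utf-8")

    # 'x' refuses to touch a file that already exists.
    try:
        with open(target, "x", encoding="utf-8") as f:
            f.write("never written")
        create_refused = False
    except FileExistsError:
        create_refused = True

print(missing_folder_error, create_refused)
print(after_append)
print(after_overwrite)
```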

#### Deleting Files

We saw in the previous section how to make and remove a directory using the os module. To remove a file, we use the os module again.

In [10]:
import os
os.remove('./files/example.txt')


FileNotFoundError: [WinError 3] The system cannot find the path specified: './files/example.txt'

If the file does not exist, the remove method will raise an error, so it is good to use a condition like this:

In [11]:
import os
if os.path.exists('./files/example.txt'):
    os.remove('./files/example.txt')
else:
    print('The file does not exist')

The file does not exist
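
Checking with os.path.exists() works, but the file could still disappear between the check and the removal. As an alternative sketch (not the approach used above), we can simply try to delete and handle the error, or use Path.unlink(), whose missing_ok argument is available from Python 3.8. The file below is a made-up one in a temporary folder:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "example.txt"        # hypothetical file
    target.write_text("temporary", encoding="utf-8")

    target.unlink()                           # pathlib equivalent of os.remove
    exists_after_unlink = target.exists()

    target.unlink(missing_ok=True)            # no error even though the file is gone

    # Try first, handle the failure afterwards.
    try:
        os.remove(target)
        remove_raised = False
    except FileNotFoundError:
        remove_raised = True
        print('The file does not exist')

print(exists_after_unlink, remove_raised)
```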


### File Types

#### File with txt Extension

A file with the txt extension is a very common form of data, and we have covered it in the previous section. Let us move on to the JSON file.

#### File with json Extension

JSON stands for JavaScript Object Notation. In practice, it is a stringified JavaScript object or Python dictionary, written with double-quoted keys and strings.

Example:

In [12]:
# dictionary
person_dct = {
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
# JSON: a string form of a dictionary (JSON requires double quotes inside the string)
person_json = '{"name": "Asabeneh", "country": "Finland", "city": "Helsinki", "skills": ["JavaScript", "React", "Python"]}'

# we use three quotes and spread it over multiple lines to make it more readable
person_json = '''{
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}'''

#### Changing JSON to Dictionary

To change a JSON string to a dictionary, first we import the json module and then we use the loads() method.

In [13]:
import json
# JSON
person_json = '''{
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}'''
# let's change JSON to dictionary
person_dct = json.loads(person_json)
print(type(person_dct))
print(person_dct)
print(person_dct['name'])

<class 'dict'>
{'name': 'Asabeneh', 'country': 'Finland', 'city': 'Helsinki', 'skills': ['JavaScript', 'React', 'Python']}
Asabeneh
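
One detail worth checking: JSON requires double quotes, so the printed form of a Python dictionary (which uses single quotes) is not valid JSON. The sketch below illustrates this and also shows how JSON true and null map to Python values. The is_married and age fields are hypothetical additions for illustration only:

```python
import json

# Valid JSON: double quotes, lowercase true, null.
valid_json = '{"name": "Asabeneh", "skills": ["React", "Python"], "is_married": true, "age": null}'
data = json.loads(valid_json)

# The repr of a Python dict uses single quotes, which JSON does not allow.
python_repr = "{'name': 'Asabeneh'}"
try:
    json.loads(python_repr)
    repr_is_valid_json = True
except json.JSONDecodeError:
    repr_is_valid_json = False

print(data["is_married"], data["age"], repr_is_valid_json)
```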


#### Changing Dictionary to JSON

To change a dictionary to a JSON string, we use the dumps() method from the json module.

In [14]:
import json
# python dictionary
person = {
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
# let's convert it to  json
person_json = json.dumps(person, indent=4) # indent could be 2, 4, 8. It beautifies the json
print(type(person_json))
print(person_json)

<class 'str'>
{
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": [
        "JavaScript",
        "React",
        "Python"
    ]
}


#### Saving as JSON File

We can also save our data as a JSON file. To write a JSON file we use the json.dump() method, which takes the dictionary and the output file, plus optional arguments such as ensure_ascii and indent.

In [15]:
import json
# python dictionary
person = {
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"]
}
with open('./files/json_example.json', 'w', encoding='utf-8') as f:
    json.dump(person, f, ensure_ascii=False, indent=4)

FileNotFoundError: [Errno 2] No such file or directory: './files/json_example.json'

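In the code above, we set the encoding and the indentation. Indentation makes the JSON file easy to read, and ensure_ascii=False writes non-ASCII characters as they are instead of as \uXXXX escapes.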
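
The cell above fails only because the files folder is missing. As a self-contained sketch, the example below writes the same dictionary into a temporary folder, reads it back with json.load(), and compares the two ensure_ascii settings. The city name Jyväskylä is a made-up value chosen because it contains non-ASCII characters:

```python
import json
import tempfile
from pathlib import Path

person = {
    "name": "Asabeneh",
    "country": "Finland",
    "city": "Helsinki",
    "skills": ["JavaScript", "React", "Python"],
}

with tempfile.TemporaryDirectory() as tmp:
    files_dir = Path(tmp) / "files"
    files_dir.mkdir()                          # the folder must exist before writing
    json_path = files_dir / "json_example.json"

    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(person, f, ensure_ascii=False, indent=4)

    with open(json_path, encoding="utf-8") as f:
        loaded = json.load(f)                  # file -> dictionary

# ensure_ascii controls whether non-ASCII characters are escaped.
escaped = json.dumps({"city": "Jyväskylä"})
unescaped = json.dumps({"city": "Jyväskylä"}, ensure_ascii=False)

print(loaded == person)
print(escaped)
print(unescaped)
```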

#### File with csv Extension

CSV stands for comma-separated values. CSV is a simple file format used to store tabular data, such as a spreadsheet or database table. CSV is a very common data format in data science.

Example:

"name","country","city","skills"
"Asabeneh","Finland","Helsinki","JavaScript"

In [17]:
import csv
with open('./files/csv_example.csv', newline='') as f:
    csv_reader = csv.reader(f, delimiter=',') # we use the reader method to read the csv file
    line_count = 0
    for row in csv_reader:
        if line_count == 0:
            print(f'Column names are: {", ".join(row)}')
            line_count += 1
        else:
            print(
                f'\t{row[0]} is a teacher. He lives in {row[1]}, {row[2]}.')
            line_count += 1
    print(f'Number of lines:  {line_count}')

FileNotFoundError: [Errno 2] No such file or directory: './files/csv_example.csv'
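
The cell above fails because csv_example.csv is not present. As a self-contained sketch, the example below writes the two sample rows into a temporary folder with csv.writer and reads them back with csv.DictReader, which uses the header row as dictionary keys so that we can refer to columns by name instead of by index:

```python
import csv
import tempfile
from pathlib import Path

rows = [
    ["name", "country", "city", "skills"],
    ["Asabeneh", "Finland", "Helsinki", "JavaScript"],
]

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "csv_example.csv"

    # newline='' lets the csv module handle line endings itself.
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

    with open(csv_path, newline="", encoding="utf-8") as f:
        records = list(csv.DictReader(f))      # one dictionary per data row

sentences = [f"{r['name']} lives in {r['city']}, {r['country']}." for r in records]
print(sentences)
```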

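#### File with xlsx Extension

To read Excel files we need to install a third-party package. The example below uses xlrd, which from version 2.0 onwards reads only the legacy .xls format; for .xlsx files a package such as openpyxl is needed. We will cover this after we cover package installation using pip.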

In [19]:
import xlrd
excel_book = xlrd.open_workbook('sample.xls')
print(excel_book.nsheets)
print(excel_book.sheet_names())

ModuleNotFoundError: No module named 'xlrd'

#### File with xml Extension

XML is another structured data format which looks like HTML. In XML the tags are not predefined. The first line is an XML declaration. The person tag is the root of the XML. The person has a gender attribute. Example XML:

<?xml version="1.0"?>
<person gender="female">
  <name>Asabeneh</name>
  <country>Finland</country>
  <city>Helsinki</city>
  <skills>
    <skill>JavaScript</skill>
    <skill>React</skill>
    <skill>Python</skill>
  </skills>
</person>

For more information on how to read an XML file, check the [documentation](https://docs.python.org/3/library/xml.etree.elementtree.html).

In [21]:
import xml.etree.ElementTree as ET
tree = ET.parse('./files/xml_example.xml')
root = tree.getroot()
print('Root tag:', root.tag)
print('Attribute:', root.attrib)
for child in root:
    print('field: ', child.tag)

FileNotFoundError: [Errno 2] No such file or directory: './files/xml_example.xml'
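
The cell above fails because xml_example.xml is not present. As a self-contained sketch, the same sample document can be parsed directly from a string with ET.fromstring(), which returns the root element. The example also shows find() and iter() for reaching nested elements:

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0"?>
<person gender="female">
  <name>Asabeneh</name>
  <country>Finland</country>
  <city>Helsinki</city>
  <skills>
    <skill>JavaScript</skill>
    <skill>React</skill>
    <skill>Python</skill>
  </skills>
</person>"""

root = ET.fromstring(xml_text)                 # parse from a string instead of a file
root_tag = root.tag
gender = root.attrib["gender"]                 # attributes behave like a dictionary
name = root.find("name").text                  # first direct child called <name>
child_tags = [child.tag for child in root]     # direct children only
skills = [skill.text for skill in root.iter("skill")]   # all <skill> elements, at any depth

print('Root tag:', root_tag)
print('Attribute:', gender)
print('Fields:', child_tags)
print('Skills:', skills)
```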