**Starting with Text Files**

In [2]:
# 1. Locating the File
filepath = "C:/Users/nrmmw/Documents/Flatiron/dsc-file-io/zen_of_python.txt"

In [3]:
# 2. To open the file, we will use the open() function
file_obj = open(filepath)

In [4]:
# 3. Reading from the file using readlines()
file_contents = file_obj.readlines()
# file_contents is now a list
file_contents[0] # First Line

'The Zen of Python, by Tim Peters\n'

In [5]:
for line in file_contents:
    print(line, end = '')
# end = '' stops print() from adding its own newline after each line's '\n'

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!



In [6]:
# 4. Closing the file
# Disconnecting Python from the file object
file_obj.close()

# Reading from file_obj after this point raises a ValueError
# The data already stored in file_contents can still be analyzed
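
The behaviour described in the comments above can be checked with a small sketch. It assumes a throwaway temporary file instead of the real zen_of_python.txt, and the names `tmp_path`, `demo_obj`, `demo_lines` and `error_message` are made up for illustration.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on a real path
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first line\nsecond line\n")
    tmp_path = tmp.name

demo_obj = open(tmp_path)
demo_lines = demo_obj.readlines()  # data is copied into a list of strings
demo_obj.close()

# Reading from the closed file object raises ValueError
try:
    demo_obj.readlines()
    error_message = None
except ValueError as err:
    error_message = str(err)

print(demo_obj.closed)   # the file object reports that it is closed
print(error_message)     # the text of the ValueError
print(demo_lines)        # the list read earlier is still usable

os.remove(tmp_path)      # clean up the temporary file
```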

In [7]:
# Count how many lines contain the phrase "is better than"
is_better_than_count = 0
for line in file_contents:
    if "is better than" in line:
        is_better_than_count += 1
print(f"The phrase appears {is_better_than_count} times in {file_contents[0]}")

The phrase appears 7 times in The Zen of Python, by Tim Peters



There is a shorter syntax for opening, reading and closing a file using the *with* keyword. The file is closed automatically when the indented block ends.

From:

file_obj = open(filepath)

file_contents = file_obj.readlines()

file_obj.close()

To:

with open(filepath) as file_obj:

    file_contents = file_obj.readlines()
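
One reason to prefer *with* is that the file is closed even if the code inside the block fails. The sketch below illustrates this under some assumptions: it uses a temporary file, and the `RuntimeError` is raised on purpose to simulate a failure. The names `demo_path`, `demo_obj` and `closed_after_error` are hypothetical.

```python
import os
import tempfile

# Throwaway file standing in for the real text file
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("Beautiful is better than ugly.\n")
    demo_path = tmp.name

try:
    with open(demo_path) as demo_obj:
        first_line = demo_obj.readline()
        open_inside_block = not demo_obj.closed
        # Simulated failure while the file is still open
        raise RuntimeError("simulated failure while processing")
except RuntimeError:
    pass

# The with block closed the file on the way out, despite the error
closed_after_error = demo_obj.closed
print(open_inside_block, closed_after_error)

os.remove(demo_path)  # clean up the temporary file
```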

**Other Reading methods**

- file.readlines() reads all of the lines into a list of strings
- file.read() reads everything into a single string
- file.read(100) reads up to 100 characters and returns them in a single string
- file.readline() reads only one line of the file into a string

In [8]:
with open(filepath) as file_obj:
    file_contents = file_obj.readline()
    # Reads only the first line
file_contents

'The Zen of Python, by Tim Peters\n'

In [9]:
with open(filepath) as file_obj:
    file_contents = file_obj.read()
    # Reads the whole file into a single string
file_contents

"The Zen of Python, by Tim Peters\n\nBeautiful is better than ugly.\nExplicit is better than implicit.\nSimple is better than complex.\nComplex is better than complicated.\nFlat is better than nested.\nSparse is better than dense.\nReadability counts.\nSpecial cases aren't special enough to break the rules.\nAlthough practicality beats purity.\nErrors should never pass silently.\nUnless explicitly silenced.\nIn the face of ambiguity, refuse the temptation to guess.\nThere should be one-- and preferably only one --obvious way to do it.\nAlthough that way may not be obvious at first unless you're Dutch.\nNow is better than never.\nAlthough never is often better than *right* now.\nIf the implementation is hard to explain, it's a bad idea.\nIf the implementation is easy to explain, it may be a good idea.\nNamespaces are one honking great idea -- let's do more of those!\n\n"

In [10]:
with open(filepath) as file_obj:
    # Reads only the first 100 characters
    file_contents = file_obj.read(100)
file_contents

'The Zen of Python, by Tim Peters\n\nBeautiful is better than ugly.\nExplicit is better than implicit.\nS'

In [11]:
with open(filepath) as file_obj:
    file_contents = file_obj.readlines()
    # Reads every line into a list
file_contents

['The Zen of Python, by Tim Peters\n',
 '\n',
 'Beautiful is better than ugly.\n',
 'Explicit is better than implicit.\n',
 'Simple is better than complex.\n',
 'Complex is better than complicated.\n',
 'Flat is better than nested.\n',
 'Sparse is better than dense.\n',
 'Readability counts.\n',
 "Special cases aren't special enough to break the rules.\n",
 'Although practicality beats purity.\n',
 'Errors should never pass silently.\n',
 'Unless explicitly silenced.\n',
 'In the face of ambiguity, refuse the temptation to guess.\n',
 'There should be one-- and preferably only one --obvious way to do it.\n',
 "Although that way may not be obvious at first unless you're Dutch.\n",
 'Now is better than never.\n',
 'Although never is often better than *right* now.\n',
 "If the implementation is hard to explain, it's a bad idea.\n",
 'If the implementation is easy to explain, it may be a good idea.\n',
 "Namespaces are one honking great idea -- let's do more of those!\n",
 '\n']

In [12]:
import string

file_contents_cleaned = []
for line in file_contents:
    words = line.split() # Splits the line into words on whitespace
    cleaned_words = [word.strip(string.punctuation).lower() for word in words]
    # Strips leading and trailing punctuation from each word and converts it to lower case
    cleaned_line = " ".join(cleaned_words) + "\n" # Joins the words with single spaces
    file_contents_cleaned.append(cleaned_line) # Appends to the new list
file_contents_cleaned

['the zen of python by tim peters\n',
 '\n',
 'beautiful is better than ugly\n',
 'explicit is better than implicit\n',
 'simple is better than complex\n',
 'complex is better than complicated\n',
 'flat is better than nested\n',
 'sparse is better than dense\n',
 'readability counts\n',
 "special cases aren't special enough to break the rules\n",
 'although practicality beats purity\n',
 'errors should never pass silently\n',
 'unless explicitly silenced\n',
 'in the face of ambiguity refuse the temptation to guess\n',
 'there should be one and preferably only one obvious way to do it\n',
 "although that way may not be obvious at first unless you're dutch\n",
 'now is better than never\n',
 'although never is often better than right now\n',
 "if the implementation is hard to explain it's a bad idea\n",
 'if the implementation is easy to explain it may be a good idea\n',
 "namespaces are one honking great idea  let's do more of those\n",
 '\n']
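
Note the double space in "idea  let's" in the last cleaned line: the token "--" consists only of punctuation, so stripping leaves an empty string that still gets joined. A possible variation, shown here only as a sketch on a single sample line, drops empty strings before joining. The names `sample_line`, `stripped` and `non_empty` are made up for this example.

```python
import string

sample_line = "Namespaces are one honking great idea -- let's do more of those!\n"

words = sample_line.split()
stripped = [word.strip(string.punctuation).lower() for word in words]

# "--" becomes "" after stripping, so filter out empty strings before joining
non_empty = [word for word in stripped if word]
cleaned_line = " ".join(non_empty) + "\n"

print(repr(cleaned_line))
```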

In [13]:
# Let's say we want to store the cleaned lines above in another file
# To open a file for writing, we pass mode "w" (write)

output_file_obj = open("C:/Users/nrmmw/Documents/Flatiron/dsc-file-io/zen_of_python_cleaned.txt", mode = "w")
# writelines() writes each string in the list to the file
output_file_obj.writelines(file_contents_cleaned)


In [14]:
# Close the file to write it on disk
output_file_obj.close()
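
The write step can also be sketched with the *with* keyword, so that the file is closed without an explicit close() call. This is a minimal sketch that assumes the cleaned lines are a list of strings ending in '\n'. It writes into a temporary directory rather than the path used above, and `cleaned_lines`, `output_path` and `written_back` are hypothetical names.

```python
import os
import tempfile

# A short stand-in for file_contents_cleaned
cleaned_lines = [
    "the zen of python by tim peters\n",
    "\n",
    "beautiful is better than ugly\n",
]

with tempfile.TemporaryDirectory() as tmp_dir:
    output_path = os.path.join(tmp_dir, "zen_of_python_cleaned.txt")

    # mode="w" creates the file, or overwrites it if it already exists
    with open(output_path, mode="w") as output_file_obj:
        output_file_obj.writelines(cleaned_lines)

    # Read the file back to confirm what was written
    with open(output_path) as check_obj:
        written_back = check_obj.readlines()

print(written_back)
```

writelines() does not add line separators of its own, which is why each string in the list already ends in '\n'.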

**Comma Separated Values**


In [20]:
# If we read it as a plain text file, each row comes back as a single string
with open("C:/Users/nrmmw/Documents/Flatiron/dsc-file-io/food_prices.csv") as f:
    print(f.readline())
    print(f.readline())
    print(f.readline())
    print(f.readline())

City,Bread,Burger,Milk,Oranges,Tomatoes

ATLANTA,24.5,94.5,73.9,80.1,41.6

BALTIMORE,26.5,91.0,67.5,74.6,53.3

BOSTON,29.7,100.8,61.4,104.0,59.6



In [23]:
# It would be better to use the csv module.
import csv

# DictReader turns each row into a dictionary keyed by the column names in the header row
with open("C:/Users/nrmmw/Documents/Flatiron/dsc-file-io/food_prices.csv") as f:
    reader = csv.DictReader(f)
    print(next(reader))
    print(next(reader))

{'City': 'ATLANTA', 'Bread': '24.5', 'Burger': '94.5', 'Milk': '73.9', 'Oranges': '80.1', 'Tomatoes': '41.6'}
{'City': 'BALTIMORE', 'Bread': '26.5', 'Burger': '91.0', 'Milk': '67.5', 'Oranges': '74.6', 'Tomatoes': '53.3'}
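
In the output above every value is a string, including the prices. A minimal sketch of converting them to numbers follows. It assumes the rows shown earlier and reads them from an in-memory string via io.StringIO instead of the real food_prices.csv. The names `csv_text`, `bread_prices` and `highest_city` are made up for illustration.

```python
import csv
import io

# The header and first three rows shown earlier, held in memory
csv_text = (
    "City,Bread,Burger,Milk,Oranges,Tomatoes\n"
    "ATLANTA,24.5,94.5,73.9,80.1,41.6\n"
    "BALTIMORE,26.5,91.0,67.5,74.6,53.3\n"
    "BOSTON,29.7,100.8,61.4,104.0,59.6\n"
)

reader = csv.DictReader(io.StringIO(csv_text))

# Convert the Bread column from strings to floats, keyed by city
bread_prices = {row["City"]: float(row["Bread"]) for row in reader}

# City with the highest bread price among these rows
highest_city = max(bread_prices, key=bread_prices.get)

print(bread_prices)
print(highest_city)
```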


**Excel**

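Excel files are stored in a binary format rather than as plain text, so they cannot be read the way a txt file is.

We will use a third-party library called xlrd. Note that xlrd versions 2.0 and later read only the older .xls format.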

In [28]:
import xlrd

# No need to call open/close with this library
book = xlrd.open_workbook(r"C:\Users\nrmmw\Documents\Flatiron\dsc-file-io\cities.xls")
sheet = book.sheet_by_name("Sheet1")
cols = [sheet.cell_value(0, col) for col in range(sheet.ncols)]

for row in range(1, 4):
    row_dict = {}
    for col_index, col_value in enumerate(cols):
        row_dict[col_value] = sheet.cell_value(row, col_index)
    print(row_dict)

{'City': 'Solta', 'Country': 'Croatia', 'Population': 1700.0, 'Area': 59.0}
{'City': 'Greenville', 'Country': 'USA', 'Population': 84554.0, 'Area': 68.0}
{'City': 'Buenos Aires', 'Country': 'Argentina', 'Population': 13591863.0, 'Area': 4758.0}


**JSON**

In [29]:
import json

with open(r"C:\Users\nrmmw\Documents\Flatiron\dsc-file-io\leia.json") as f:
    leia_data = json.load(f)
leia_data

{'name': 'Leia Organa',
 'height': '150',
 'mass': '49',
 'hair_color': 'brown',
 'skin_color': 'light',
 'eye_color': 'brown',
 'birth_year': '19BBY',
 'gender': 'female',
 'homeworld': 'http://swapi.dev/api/planets/2/',
 'films': ['http://swapi.dev/api/films/1/',
  'http://swapi.dev/api/films/2/',
  'http://swapi.dev/api/films/3/',
  'http://swapi.dev/api/films/6/'],
 'species': [],
 'vehicles': ['http://swapi.dev/api/vehicles/30/'],
 'starships': [],
 'created': '2014-12-10T15:20:09.791000Z',
 'edited': '2014-12-20T21:17:50.315000Z',
 'url': 'http://swapi.dev/api/people/5/'}
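
json.load() returns ordinary Python dictionaries and lists, so the result can be indexed like any other nested structure. The sketch below assumes a shortened copy of the data above, parsed from a string with json.loads() instead of the real leia.json. The names `leia_json`, `leia_sample`, `height_value`, `first_film` and `film_count` are hypothetical.

```python
import json

# A shortened stand-in for the contents of leia.json
leia_json = """
{
  "name": "Leia Organa",
  "height": "150",
  "films": ["http://swapi.dev/api/films/1/", "http://swapi.dev/api/films/2/"],
  "species": []
}
"""

leia_sample = json.loads(leia_json)

# Numeric fields are stored as strings in this data, so convert explicitly
height_value = int(leia_sample["height"])

# Nested lists are indexed like any Python list
first_film = leia_sample["films"][0]
film_count = len(leia_sample["films"])

print(leia_sample["name"], height_value, first_film, film_count)
```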

**XML**

In [31]:
import xml.etree.ElementTree as ET

tree = ET.parse(r"C:\Users\nrmmw\Documents\Flatiron\dsc-file-io\leia.xml")
root = tree.getroot()
print("Tree:", tree)
print("Root:", root)
print("Child nodes:")

for child in root:
    if len(list(child)) > 0:
        print(child.tag, "| [", child[0], "... ]")
    elif child.text:
        print(child.tag, "|", child.text)

Tree: <xml.etree.ElementTree.ElementTree object at 0x000002791C696340>
Root: <Element 'person' at 0x000002791C6B77C0>
Child nodes:
name | Leia Organa
height | 150
mass | 49
hair_color | brown
skin_color | light
eye_color | brown
birth_year | 19BBY
gender | female
homeworld | http://swapi.dev/api/planets/2/
films | [ <Element 'film' at 0x000002791C669680> ... ]
vehicles | [ <Element 'vehicle' at 0x000002791C66EEF0> ... ]
created | 2014-12-10T15:20:09.791000Z
edited | 2014-12-20T21:17:50.315000Z
url | http://swapi.dev/api/people/5/
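
The nested elements such as films can be read with find() and by iterating over the element. This is only a sketch: the XML string below is a guess at the layout of leia.xml, in particular the assumption that each film element holds a URL as its text. It is parsed with ET.fromstring() so that no file is needed, and `xml_text`, `sample_root` and `film_urls` are made-up names.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened version of leia.xml (the real layout may differ)
xml_text = """
<person>
  <name>Leia Organa</name>
  <films>
    <film>http://swapi.dev/api/films/1/</film>
    <film>http://swapi.dev/api/films/2/</film>
  </films>
</person>
""".strip()

sample_root = ET.fromstring(xml_text)

# find() returns the first direct child with the given tag
name = sample_root.find("name").text

# Iterating over an element yields its child elements
film_urls = [film.text for film in sample_root.find("films")]

print(sample_root.tag, name, film_urls)
```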
