# Lab 7: Reading and writing CSV, JSON and XML with Python

### Author: <font color='red'> Charles Moore </font>

### Part A - Reading and writing CSV files (20 points)

In [51]:
# Import necessary libraries
import os
import csv

#### Read a TXT file 'classic_books.txt' using Python's open() function

Files are read in Python using the built-in __open()__ function.
- Additional information about the __open()__ function can be found here:<br>
https://www.w3schools.com/python/python_file_open.asp <br>
https://docs.python.org/3/tutorial/inputoutput.html <br>

In addition to calling __open()__ directly, you can use the __with open() as XXXX__ form.<br>
Using __with open()__ closes the file automatically when the block ends, which eliminates the need to call __close()__.
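
As a minimal sketch of the two styles, the example below writes a small throwaway file and reads it back both ways. The file name `demo_open.txt` and its contents are made up for illustration and are not part of the lab files.

```python
import os
import tempfile

# Hypothetical demo file created in a temporary directory (not a lab file)
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_open.txt')
with open(demo_path, 'w') as f:
    f.write('first line\nsecond line\n')

# Manual style: the file must be closed explicitly
f = open(demo_path, 'r')
manual_lines = f.readlines()
f.close()

# Context-manager style: the file is closed when the with block exits
with open(demo_path, 'r') as f:
    with_lines = [line.strip() for line in f]

print(manual_lines)   # newline characters are still attached
print(with_lines)     # strip() removed the trailing newlines
print(f.closed)       # the with block already closed the file
```

Note that `readlines()` keeps the trailing newline on each line, which is why stripping whitespace is usually the next step.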


<span style="color:blue">
    Make sure file <strong>classic_books.txt</strong> is located in the current directory.<br>
    Since this file is a <strong>.txt</strong> file, use Python <strong>open()</strong> to read the file line-by-line<br><br>
    - The file has 4 columns of data: Rank, Title, Author, Year<br>
    - Each data entry is separated by the <strong>'|'</strong> character, so use Python's <strong>split()</strong> method to parse the input line.<br>
    - Save each line as an entry in a Python List.<br><br>
    1. Create an empty list assigned to a variable named <strong>classics</strong> <br>
    2. Read file <strong>classic_books.txt</strong> using Python's <strong>open()</strong> function. <br>
    - NOTE: Use <strong>with open(...) as XXXXX:</strong> to eliminate having to <strong>close()</strong> the file.<br>
    - HINT: Remember to <strong>strip()</strong> whitespace <br>
    3. Parse the file using the <strong>.split()</strong> method <br>
    4. Save each line/row in a variable named <strong>book_data</strong> <br>
    5. Add <strong>book_data</strong> to <strong>classics</strong> <br>
</span>

In [52]:
# Read a TXT file 'classic_books.txt' using Python's open() function

# INSERT CODE FOR STEPS 1-5
classics = []
with open('classic_books.txt','r') as file:
    reader = file.readlines()
reader = [x.strip() for x in reader]
for i in reader:
    book_data = i.split('|')
    classics.append(book_data)

In [53]:
# DO NOT MODIFY !!!
# Print each item in classics
for i in classics:
    print(i)

['Rank', 'Title', 'Author', 'Year']
['1', 'Pride and Prejudice', 'Jane Austen', '1813']
['2', 'To Kill a Mockingbird', 'Harper Lee', '1960']
['3', 'The Great Gatsby', 'F. Scott Fitzgerald', '1925']
['4', 'One Hundred Years of Solitude', 'Gabriel Garcia Marquez', '1967']
['5', 'In Cold Blood', 'Truman Capote', '1965']
['6', 'Wide Sargasso Sea', 'Jean Rhys', '1966']
['7', 'Brave New World', 'Aldous Huxley', '1932']
['8', 'I Capture The Castle', 'Dodie Smith', '1948']
['9', 'Jane Eyre', 'Charlotte Bronte', '1847']
['10', 'Crime and Punishment', 'Fyodor Dostoevsky', '1866']


#### Write a CSV file 'classic_books.csv' using Python's csv.writer() function

CSV files are plain text files used to store data in a tabular format, with each piece of data separated by a comma (,).<br>
Python has a __csv__ library that allows you to read, parse and write CSV files.
- Additional information about the __csv__ library can be found here: <br> 
https://docs.python.org/3/library/csv.html

A few of the most commonly used methods are:
- csv.reader(_csvfile, dialect='excel', \*\*fmtparams_) ~ returns a reader object which will iterate over the lines in the CSV file
- csv.writer(_csvfile, dialect='excel', \*\*fmtparams_) ~ returns a writer object responsible for converting the user's data into delimited strings
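
The two methods listed above can be tried without touching the disk. The sketch below uses an in-memory `io.StringIO` buffer in place of a real file, and the rows (including the title that contains a comma) are hypothetical sample data chosen to show how the writer quotes fields.

```python
import csv
import io

# Hypothetical sample rows; the last title contains a comma on purpose
rows = [['Rank', 'Title', 'Author'],
        ['1', 'Pride and Prejudice', 'Jane Austen'],
        ['2', 'Emma, a novel', 'Jane Austen']]

# newline='' lets the csv module control line endings itself
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerows(rows)          # writes every row in one call
csv_text = buffer.getvalue()
print(csv_text)

# Read the text back; the quoted field is restored to a single value
read_back = list(csv.reader(io.StringIO(csv_text, newline='')))
print(read_back == rows)
```

The reader always returns strings, so numeric columns such as a rank or a year would need an explicit `int()` conversion if numbers are required.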

<div class="alert alert-warning">
<strong>IMPORTANT:</strong> Use the <strong>classics</strong> list created in the previous cell to write a new CSV file named <strong>classic_books.csv</strong>.
</div>

<span style="color:blue">
    6. Open a file named <strong>classic_books.csv</strong> and use Python's <strong>csv.writer()</strong> method to write each entry in your <strong>classics</strong> list as a row in a CSV file.<br>
    - 6a. Iterate through the <strong>classics</strong> list using <strong>.writerow</strong> to populate the file <br>
    - 6b. HINT: Specify <strong>newline=''</strong> to keep from getting extra blank lines in CSV file
</span>

In [54]:
# Write a CSV file named 'classic_books.csv' using Python's csv.writer() function

# INSERT CODE FOR STEP 6
with open('classic_books.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for x in classics:
        writer.writerow(x)

<div class="alert alert-success">
    Verify your <strong>classic_books.csv</strong> looks like the following image when opened with Excel <br><br>
    <img src="classic_books_xclx.jpg"><br>
    You can also open the file in Jupyter Notebook. It will look "similar" to Excel but not exactly.
</div>

#### Read a CSV file 'classic_books.csv' using pandas' read_csv() function

<div class="alert alert-warning">
<strong>IMPORTANT:</strong> Use the <strong>classic_books.csv</strong> file created in the previous cell.
</div>

<span style="color:blue">
    7. Read file <strong>classic_books.csv</strong> using the pandas <strong>read_csv()</strong> function. <br>
    8. Save the file contents in a <strong>DataFrame</strong> named <strong>classics_df</strong> <br>
</span>

In [55]:
# Import pandas library
import pandas as pd

# Read a CSV file named 'classic_books.csv' using pandas' read_csv() function

# INSERT CODE FOR STEPS 7-8
classics_df = pd.read_csv('classic_books.csv')

<div class="alert alert-success">
 <strong>NOTE how much less code is required to read a CSV file using Pandas!</strong>
</div>

In [56]:
# DO NOT MODIFY !!!
# Display the first 5 records in the file
classics_df.head()

   Rank                          Title                  Author  Year
0     1            Pride and Prejudice             Jane Austen  1813
1     2          To Kill a Mockingbird              Harper Lee  1960
2     3               The Great Gatsby     F. Scott Fitzgerald  1925
3     4  One Hundred Years of Solitude  Gabriel Garcia Marquez  1967
4     5                  In Cold Blood           Truman Capote  1965


### Part B - Reading and writing JSON files (20 points)

#### Read JSON file 'classic_books.json' using Python

<div class="alert alert-warning">
    <strong>IMPORTANT: </strong>Make sure file <strong>classic_books.json</strong> is located in the current directory.<br>
    - The file is the JSON version of the CSV file created in the previous step.<br>
    - There are 4 keys: rank, title, author, year
</div>

JSON can easily be read into, or written from, a Python dictionary.<br> 
To read a JSON file into a Python dictionary, you use the __json.load()__ method<br>
- json.load(...) ~ deserializes fp (a .read()-supporting file object containing JSON) into a Python object

Additional information about Python & JSON can be found here:<br>
- https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/ <br>
- https://docs.python.org/3/library/json.html 
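
As a small illustration of __json.load()__, the sketch below parses JSON from an in-memory `io.StringIO` object, which stands in for an open file. The JSON text is hypothetical: it uses the four keys named above, but the real layout of `classic_books.json` may differ.

```python
import io
import json

# Hypothetical JSON text using the four keys rank, title, author, year
json_text = ('[{"rank": 1, "title": "Pride and Prejudice", '
             '"author": "Jane Austen", "year": 1813}]')

# io.StringIO behaves like an open text file, so json.load() can read it
sample_data = json.load(io.StringIO(json_text))

print(type(sample_data))          # a JSON array becomes a Python list
print(sample_data[0]['title'])    # a JSON object becomes a Python dict
print(type(sample_data[0]['year']))  # JSON numbers become int or float
```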


<span style="color:blue">
    1. Read file <strong>classic_books.json</strong> <br> 
    2. Use the <strong>json.load()</strong> function to parse the JSON into a variable named <strong>json_data</strong>.
</span>

In [57]:
# Import necessary libraries
import json

# Read JSON file 'classic_books.json' using Python

# INSERT CODE FOR STEPS 1-2
with open('classic_books.json') as json_file:
    json_data = json.load(json_file)

#### Create JSON formatted string from Python dictionary using json.dumps() function

If you look at the format of a JSON file, you will notice how similar it is to Python's Dictionary format.<br>
So it should be no surprise that you can use a Python Dictionary to create a JSON file.<br>
To write a Python dictionary to a JSON string, you use the __json.dumps()__ method<br>
- json.dumps(...) ~ serializes an object as a JSON string

Additional information about Python & JSON can be found here:<br>
- https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/ <br>
- https://docs.python.org/3/library/json.html 
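
To see what __json.dumps()__ returns, here is a minimal sketch on a made-up dictionary (`shapes` is a hypothetical name used only for this illustration). It compares the compact and indented forms and checks that __json.loads()__ reverses the conversion.

```python
import json

# Hypothetical dictionary used only for this illustration
shapes = {'round': ['circle', 'oval'], 'angular': ['square']}

compact = json.dumps(shapes)             # single-line JSON string
pretty = json.dumps(shapes, indent=4)    # same data, indented for readability

print(compact)
print(pretty)
print(type(pretty))                      # the result is a str, not a dict

# json.loads() parses the string back into an equal dictionary
print(json.loads(pretty) == shapes)
```

Note that the result is an ordinary Python string that happens to contain JSON, and that JSON always uses double quotes even though the dictionary was written with single quotes.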

<span style="color:blue">
    4. Use <strong>json.dumps()</strong> with <strong>indent=4</strong> to create a JSON formatted string named <strong>json_instruments</strong> <br>
    *** Use the <strong>instruments_dict</strong> dictionary provided to create <strong>json_instruments</strong> ***<br>
    5. Print <strong>json_instruments</strong><br>
</span>

In [59]:
# Dictionary of instruments (PROVIDED)
instruments_dict = {'keyboard': ['piano', 'organ', 'syntesizer'],
               'brass': ['trumpet', 'tuba', 'trombone', 'french horn'],
               'woodwind': ['clarinet', 'oboe', 'bassoon'],
               'percussion': ['drum', 'xylophone'],
               'strings': ['violin', 'cello']}

In [64]:
# Convert Python dictionary (instruments_dict) to JSON formatted string

# INSERT CODE FOR STEPS 4-5
json_instruments = json.dumps(instruments_dict, indent=4)
print(json_instruments)

{
    "keyboard": [
        "piano",
        "organ",
        "syntesizer"
    ],
    "brass": [
        "trumpet",
        "tuba",
        "trombone",
        "french horn"
    ],
    "woodwind": [
        "clarinet",
        "oboe",
        "bassoon"
    ],
    "percussion": [
        "drum",
        "xylophone"
    ],
    "strings": [
        "violin",
        "cello"
    ]
}


#### Write instruments_dict to file in JSON format using json.dump()

To write a Python dictionary to a JSON <strong>file</strong>, you use the __json.dump()__ method<br>
- json.dump(...) ~ serializes an object as a JSON formatted stream (fp - .write-supporting file-like object)<br>

Additional information about JSON's .dump() method can be found here:<br>
- https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/ <br>
- https://docs.python.org/3/library/json.html 
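
The sketch below shows __json.dump()__ writing a dictionary to a file and reading it back. The dictionary is a small sample and the path `demo.json` in a temporary directory is hypothetical. It also illustrates a common pitfall: serializing a string that is already JSON encodes it a second time, so loading it returns a string instead of a dictionary.

```python
import json
import os
import tempfile

data = {'percussion': ['drum', 'xylophone']}              # small sample dictionary
out_path = os.path.join(tempfile.mkdtemp(), 'demo.json')  # hypothetical file name

# json.dump() takes the Python object and the open file
with open(out_path, 'w') as f:
    json.dump(data, f, indent=4)

# Reading the file back yields an equal dictionary
with open(out_path) as f:
    reloaded = json.load(f)
print(reloaded == data)

# Pitfall: serializing an already-serialized string encodes it twice
double_encoded = json.dumps(json.dumps(data))
print(type(json.loads(double_encoded)))   # a str, not a dict
```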


<span style="color:blue">
    6. Write a JSON file named <strong>instruments.json</strong> using Python's <strong>json.dump()</strong>. <br><br>
</span>

In [65]:
# Write instruments_dict to file in JSON format using Python

# INSERT CODE FOR STEP 6
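with open('instruments.json', 'w') as file:
    # Pass the dictionary itself; json_instruments is already a JSON string and would be encoded twice
    json.dump(instruments_dict, file, indent=4)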

<div class="alert alert-success">
 <strong>NOTE:</strong>  You should visually verify the existence of this new file in your current directory. <br>Does the file <strong>instruments.json</strong> now appear in the Jupyter Notebook directory list (left pane)?
</div>

### Part C - Reading and parsing XML files (20 points)

#### Read and parse XML file 'classic_books.xml' using ElementTree

<div class="alert alert-warning">
<strong>IMPORTANT: </strong>Make sure file <strong>classic_books.xml</strong> is located in the current directory.  
</div>

<span style="color:blue">
 <strong>REFERENCE:</strong> https://www.edureka.co/blog/python-xml-parser-tutorial/ <br><br>
Use <strong>ElementTree()</strong> to read and parse file <strong>classic_books.xml</strong> <br><br>
    1. Use <strong>ET.parse('classic_books.xml')</strong> to read and parse the XML file into <strong>mytree</strong> <br>
    2. Use <strong>mytree.getroot()</strong> to get the root of the tree in <strong>myroot</strong> <br>
    3. Print <strong>myroot.tag</strong> <br>
</span>

In [67]:
# Import necessary libraries
import xml.etree.ElementTree as ET

# Read and parse XML file using ElementTree

# INSERT CODE FOR STEPS 1-3
mytree = ET.parse('classic_books.xml')
myroot = mytree.getroot()
print(myroot.tag)

CLASSICS


<span style="color:blue">
Now that you have the root of the tree, use it to find ALL the book titles <br>
    4. Use a <strong>for loop</strong> to <strong>findall('BOOK')</strong> elements <br>
    5. Use <strong>.find('TITLE').text</strong> to get the title text from each book element <br>
    6. Print the <strong>TITLE</strong> value/text <br><br>
</span>
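
The same calls can be tried on a small XML string. In the sketch below the tag names `CLASSICS`, `BOOK` and `TITLE` come from this lab, while the `AUTHOR` tag and the overall layout are assumptions, since the real `classic_books.xml` is not shown here. `ET.fromstring()` returns the root element directly, so no `getroot()` call is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML; the AUTHOR tag and layout are assumed for illustration
xml_text = """<CLASSICS>
    <BOOK><TITLE>Pride and Prejudice</TITLE><AUTHOR>Jane Austen</AUTHOR></BOOK>
    <BOOK><TITLE>Jane Eyre</TITLE><AUTHOR>Charlotte Bronte</AUTHOR></BOOK>
</CLASSICS>"""

# fromstring() parses text and returns the root element
sample_root = ET.fromstring(xml_text)
print(sample_root.tag)

# findall() returns the direct BOOK children; find() returns the first match
sample_titles = [book.find('TITLE').text for book in sample_root.findall('BOOK')]
print(sample_titles)

# findtext() returns a default instead of failing when a child is missing
missing_year = sample_root.find('BOOK').findtext('YEAR', default='unknown')
print(missing_year)
```

Because `find()` returns `None` when no child matches, calling `.text` on the result raises an `AttributeError` for a missing tag; `findtext()` with a default is one way to avoid that.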

In [73]:
# Find and print ALL the TITLE values (text) for every BOOK element

# INSERT CODE FOR STEPS 4-6
for x in myroot.findall('BOOK'):
    title = x.find('TITLE').text
    print(title)

Pride and Prejudice
To Kill a Mockingbird
The Great Gatsby
One Hundred Years of Solitude
In Cold Blood
Wide Sargasso Sea
Brave New World
I Capture The Castle
Jane Eyre
Crime and Punishment


### Part D - Reading and parsing YAML files (20 points)

#### Read and parse YAML file 'classic_books.yaml' 

<div class="alert alert-warning">
<strong>IMPORTANT: </strong>Make sure file <strong>classic_books.yaml</strong> is located in the current directory.
</div>

The YAML file syntax is structured very much like a Python Dictionary, but without the __{ }s__<br>
However, the YAML file syntax does have a close tie-in to Python -- INDENTATION! <br>
- Indentation is a key aspect of YAML and is integral to defining the structure of a YAML file.<br>
- Indentation problems cause ERRORS, so you have to be very careful when creating YAML files (NO TABS!).<br>
- The good news is there are lots of YAML verification tools available, here are a few:<br>
http://www.yamllint.com/<br>
https://codebeautify.org/yaml-validator<br>

**Additional information** about reading & writing YAML in Python can be found here:
https://stackabuse.com/reading-and-writing-yaml-to-a-file-in-python/

<span style="color:blue">
    1. Open file <strong>classic_books.yaml</strong> <br>
    - NOTE: Use <strong>with open(...) as XXXXX:</strong> to eliminate having to <strong>close()</strong> the file.<br>
    2. Use <strong>yaml.load()</strong> with <strong>Loader=yaml.FullLoader</strong> to read file <strong>classic_books.yaml</strong> into a variable named  <strong>classics_dict</strong>.<br>
</span>

In [75]:
# Import necessary libraries
import yaml

# Read and parse YAML file using yaml.load() with Loader=yaml.FullLoader

# INSERT CODE FOR STEPS 1-2
with open('classic_books.yaml') as file:
    classics_dict = yaml.load(file,Loader=yaml.FullLoader)

In [76]:
# DO NOT MODIFY !!!
print(classics_dict, '\n')

{'classics': {'books': [{'rank': 1, 'title': 'Pride and Prejudice', 'author': 'Jane Austen', 'year': 1813}, {'rank': 2, 'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'year': 1960}, {'rank': 3, 'title': 'The Great Gatsby', 'author': 'F. Scott Fitzgerald', 'year': 1925}, {'rank': 4, 'title': 'One Hundred Years of Solitude', 'author': 'Gabriel Garcia Marquez', 'year': 1967}, {'rank': 5, 'title': 'In Cold Blood', 'author': 'Truman Capote', 'year': 1965}, {'rank': 6, 'title': 'Wide Sargasso Sea', 'author': 'Jean Rhys"', 'year': 1966}, {'rank': 7, 'title': 'Brave New World', 'author': 'Aldous Huxley', 'year': 1932}, {'rank': 8, 'title': 'I Capture The Castle', 'author': 'Dodie Smith', 'year': 1948}, {'rank': 9, 'title': 'Jane Eyre,', 'author': 'Charlotte Bronte', 'year': 1847}, {'rank': 10, 'title': 'Crime and Punishment', 'author': 'Fyodor Dostoevsky', 'year': 1866}]}} 



<div class="alert alert-success">
    <strong>NOTE</strong> the structure of the data -- you have a dictionary within a dictionary, and the inner dictionary holds a list of book dictionaries.
</div>
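
To make the nesting concrete, here is a small sketch on a shortened, hand-written copy of the structure printed above (the variable names `sample`, `inner` and `entries` are hypothetical). Each level is unwrapped with one key, and the `books` value turns out to be a list whose items are dictionaries.

```python
# Shortened, hand-written copy of the printed structure (illustration only)
sample = {'classics': {'books': [
    {'rank': 1, 'title': 'Pride and Prejudice', 'author': 'Jane Austen', 'year': 1813},
    {'rank': 2, 'title': 'To Kill a Mockingbird', 'author': 'Harper Lee', 'year': 1960},
]}}

inner = sample['classics']     # outer dict -> inner dict
entries = inner['books']       # inner dict -> list of book dicts

# Show the type found at each level of the structure
print(type(inner).__name__, type(entries).__name__, type(entries[0]).__name__)

# The same lookup can be chained in a single expression
first_title = sample['classics']['books'][0]['title']
print(first_title)
```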

<span style="color:blue">   
    3. Create a dictionary named <strong>books_dict</strong> containing the values where KEY = <strong>classics</strong><br>
    4. Create a variable named <strong>book_info_dict</strong> containing the values (a list of book dictionaries) where KEY = <strong>books</strong><br>
</span>

In [81]:
# INSERT CODE FOR STEP 3-4

# Get the books using 'classics' as the KEY
books_dict = classics_dict['classics']
# Get the books info using 'books' as the KEY
book_info_dict = books_dict['books']

In [82]:
# DO NOT MODIFY !!!
# Use for loop to loop thru the books printing each data field
for info in book_info_dict:
    print(info['rank'],info['title'],info['author'],info['year'])

1 Pride and Prejudice Jane Austen 1813
2 To Kill a Mockingbird Harper Lee 1960
3 The Great Gatsby F. Scott Fitzgerald 1925
4 One Hundred Years of Solitude Gabriel Garcia Marquez 1967
5 In Cold Blood Truman Capote 1965
6 Wide Sargasso Sea Jean Rhys" 1966
7 Brave New World Aldous Huxley 1932
8 I Capture The Castle Dodie Smith 1948
9 Jane Eyre, Charlotte Bronte 1847
10 Crime and Punishment Fyodor Dostoevsky 1866


### Part E - Writing a ZIP file (20 points)

#### Write a ZIP file 'CSC221Lab7.zip' containing 6 files from the lab

<span style="color:blue">
Use <strong>zipfile</strong> to create a ZIP file containing the TXT, CSV, JSON, XML and YAML files from this lab <br>
    1. Create a list named <strong>files_to_zip</strong> with the following filenames:<br>
        <ol>
        <li>classic_books.txt</li>
        <li>classic_books.csv</li> 
        <li>classic_books.json</li>
        <li>classic_books.xml</li>
        <li>classic_books.yaml</li>
        <li>instruments.json</li>
        </ol>
    2. Use <strong>zipfile.ZipFile()</strong> to create a ZIP file named <strong>CSC221Lab7.zip</strong> containing the <strong>SIX (6)</strong> files in <strong>files_to_zip</strong> <br>
    3. Print the message <strong>"<i>zipfilename</i> created successfully"</strong> where <i>zipfilename</i> is the name of the ZIP file.
</span>
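
As a minimal sketch of the <strong>zipfile.ZipFile()</strong> workflow, the example below archives two throwaway files created in a temporary directory. The names `a.txt`, `b.txt` and `demo.zip` are hypothetical and unrelated to the lab files. Opening the archive in a `with` block closes it automatically, and `namelist()` confirms what was stored.

```python
import os
import tempfile
import zipfile

# Create two hypothetical files in a temporary working directory
work_dir = tempfile.mkdtemp()
demo_files = []
for name in ['a.txt', 'b.txt']:
    path = os.path.join(work_dir, name)
    with open(path, 'w') as f:
        f.write('contents of ' + name)
    demo_files.append(path)

# Write the archive; the with block closes it when done
demo_zip = os.path.join(work_dir, 'demo.zip')
with zipfile.ZipFile(demo_zip, 'w') as archive:
    for path in demo_files:
        # arcname stores just the file name instead of the full path
        archive.write(path, arcname=os.path.basename(path))

# Reopen the archive in read mode to list its contents
with zipfile.ZipFile(demo_zip) as archive:
    archived_names = archive.namelist()
print(archived_names)
```

By default `ZipFile` stores files without compression; passing `compression=zipfile.ZIP_DEFLATED` is one option if smaller archives are wanted.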

In [90]:
# Import zipfile library
import zipfile

# Write a ZIP file using Python zipfile.ZipFile()

# INSERT CODE FOR STEPS 1-3
files_to_zip = ['classic_books.txt','classic_books.csv',
               'classic_books.json','classic_books.xml',
               'classic_books.yaml','instruments.json']
zip_file_name = 'CSC221Lab7.zip'
with zipfile.ZipFile(zip_file_name, 'w') as zipObj:
    # The with block closes the archive automatically
    for i in files_to_zip:
        zipObj.write(i)
print(zip_file_name, 'created successfully')

CSC221Lab7.zip created successfully
