# Text versus Binary Files and JSON

This worksheet is based on sections 2.4 

This in-class activity illustrates some of what you can do with JSON, and should help you to understand the difference between text and binary files.

Please run the cell below.

In [2]:
import json
import os

file1 = 'numbers.json'
file2 = 'numbers.dat'

# Create a simple list consisting of integer values
vars = [198, 247, 0, 128]

# Convert the list of integer values into a single bytes object (one byte per integer)
bvars = bytes(vars)

# Text/Strings versus Binary

- A **string** is a sequence of text characters, where each character comes from the set of visible characters (available on a keyboard) plus certain control characters, like newline, (horizontal) tab, and carriage return.

- When a **file** consists of a sequence of text characters (in some encoding like UTF-8), it is called a **text file**.  A file that contains "raw" bytes, where integers, floats, and object variables retain their in-memory representation, is called a **binary file**.

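- In Python, the **bytes** data type holds a sequence of "raw" bytes, each an integer from 0 to 255. Calling `bytes()` on a list of such integers, as in `bvars = bytes(vars)` above, packs each integer into a single byte. Other values, such as floating point numbers, need an explicit conversion (for example with the `struct` module) before they can be stored as bytes.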
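
The distinction drawn in the bullets above can be sketched with a small list. The names `sample`, `as_text`, and `as_bytes` are illustrative and not part of the worksheet, and the sketch assumes every integer lies in the range 0 to 255, which `bytes()` requires.

```python
# Illustrative names; not part of the worksheet's own cells.
sample = [65, 66, 0, 255]

# Text form: one character per visible symbol, including brackets, commas, and spaces.
as_text = str(sample)

# Binary form: exactly one byte per integer (each must be in the range 0..255).
as_bytes = bytes(sample)

print(as_text, len(as_text))    # the string and its character count
print(as_bytes, len(as_bytes))  # the bytes object and its byte count
```

The text form needs many more characters than the list has elements, while the bytes form has one byte per element.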

### Initial Questions

1. Is `vars`, as it resides in memory in a Python program, a sequence of characters?  ... if we were able to "look" at how this list was stored, would we be able to see, in that storage, the `'['`, the `','`, the character `'1'`, etc?
2. Write down a string with that sequence of characters and name it `vars2`. Is it a legal operation to access, for instance, `vars2[0]`?  What do you get if you do?

## Writing from Python to Files

**Q** In the next cell, create a **string** that corresponds to the `vars` list of integers and then write that out to `numbers0.txt`.

**Q** In the next cell, write the binary (bytes) version of `vars` to the file `numbers0.dat`

**Q** In the next cell, use the `json` `dump` function to write `vars` to `numbers.json`.  Here is the link to the documentation: https://docs.python.org/3/library/json.html#basic-usage, and here is a link to the book section: https://tcbressoud.github.io/datasystems-bookweb/2-ch-filesystems.html#writing-data-structures-to-json

## Exploring text and binary files using Atom

Now, follow along with your instructor looking at the hex representation of the actual bytes of text and binary files.
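
If an editor with a hex view is not at hand, a rough equivalent can be sketched in Python using `bytes.hex()`. The names `values`, `text_form`, and `binary_form` are hypothetical; the integers mirror the `vars` list defined at the top of the worksheet.

```python
values = [198, 247, 0, 128]

# Text form encoded as UTF-8: every character of the string becomes its own byte.
text_form = str(values).encode("utf-8")

# Binary form: one byte per integer.
binary_form = bytes(values)

# bytes.hex() returns two hexadecimal digits per byte.
print(len(text_form), text_form.hex())
print(len(binary_form), binary_form.hex())
```

In the text form, the bytes are character codes (for example `5b` for `[`), whereas in the binary form each byte is the integer value itself.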

## Reading from JSON

**Q** Create a text file named `test.json` (using either Atom or the Text Editor of Jupyter Lab) and add contents for a single top level data structure that contains a combination of one or more dictionaries, one or more lists, and uses integer and string values.  Make sure you delimit every string value with one double quote at the beginning and one double quote at the end.
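
To illustrate the quoting rule, here is a hedged sketch that parses JSON text directly from a string with `json.loads` rather than from a file. The sample contents and the names `valid_text`, `invalid_text`, and `rejected` are made up for illustration.

```python
import json

# Hypothetical contents mixing a dictionary, a list, integers, and strings.
valid_text = '{"course": "data systems", "scores": [90, 85], "meta": {"year": 2020}}'
parsed = json.loads(valid_text)
print(type(parsed), parsed["scores"])

# Single-quoted strings are legal in Python but not in JSON.
invalid_text = "{'course': 'data systems'}"
try:
    json.loads(invalid_text)
    rejected = False
except json.JSONDecodeError as err:
    rejected = True
    print("not valid JSON:", err)
```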

**Q** In the next cell, open the file you created above, and then use the `load` function of the JSON module to read the value into a variable, `vars2`. Then use the `type` function to see its type.

# JSON

JSON files are text files that make it easy to move between Python data structures (e.g., lists, dictionaries) and files containing data (e.g., csv, txt). Recall that *binary files* are files whose bytes directly represent underlying data types, like integers and floats, rather than textual data. You have probably struggled to open a binary file in the past, since a text editor cannot always convert the bytes into meaningful information for you. Because a JSON file is plain text, any text editor can display it, and the `json` module handles the conversion to and from Python data structures.
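
As a minimal sketch of the point that JSON is text, the string-based functions `json.dumps` and `json.loads` can be used without any file at all. The dictionary `record` is a hypothetical example.

```python
import json

record = {"name": "Emma", "ranks": [3, 1, 2]}

# json.dumps returns a str: the JSON text itself.
text = json.dumps(record)
print(type(text), text)

# Encoding that text yields ordinary character bytes, readable in any editor.
encoded = text.encode("utf-8")
print(encoded)

# json.loads parses the text back into an equal Python structure.
restored = json.loads(text)
print(restored == record)
```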

**Q1** (warmup) In the `babyNames()` function below, open the file named `babynames.csv`. Create a dictionary to contain the names as keys and the numbers as values. Read each line of the file, rather than reading the entire file all at once. Make sure you close the file. Populate and return the dictionary you have created.

Note: if time is of the essence, you can skip this problem.

In [1]:
# Solution cell

def babyNames():
    """Open the comma-separated value file babynames.csv and return a 
    dictionary with names as keys and numbers of applications as values.
    
    """
    ### BEGIN SOLUTION
    fh = open("babynames.csv", 'r')
    bN = {}
    for line in fh:
        name = line.split(',')[1].strip()
        number = line.split(',')[0]
        bN[name] = number
    fh.close()
    return bN
    ### END SOLUTION

In [2]:
assert isinstance(babyNames(), dict)
assert len(babyNames().keys()) == 6
assert '22127' == babyNames()['Jacob']

In case you didn't have time for the problem above, the cell below gives you the dictionary the function above would have created, which you can use in the problems below.

In [5]:
bN = {"Jacob":"22127", "Ethan":"18002", "Michael":"17350", "Jayden":"17179", "William":"17051", "Alexander":"16756"}


**Q2** In the `babyNamesDict()` function below, write the dictionary created using `babynames.csv` into a JSON formatted file without using any of the JSON utilities.  Make sure you close the file. Return the name of the file you have written. Ensure the file extension is ".json" and remember that a JSON object (the counterpart of a Python dictionary) requires curly braces { } at the beginning and end.

Note: in case time is short, please be aware that the folder already contains the json file you are being asked to create here, as `babynames.json`. If you solve this problem and run your code, it will overwrite that file (hopefully with an identical one, if you solve this correctly!)

In [3]:
# Solution cell

def babyNamesDict():
    """Writes the dictionary from babyNames() out to a file. 
    Returns the name of the file.
    
    """
    ### BEGIN SOLUTION
    bN = babyNames()
    fN = "babynames.json"
    fh = open(fN, 'w')
    names = list(bN.keys())
    fh.write("{")
    count = 0
    total = len(names)
    for n in names:
        fh.write('"'+ n +'":"' + bN[n] + '"')
        if count < total-1:
            fh.write(", ")
        count += 1
    fh.write("}")
    fh.close()
    return fN
    ### END SOLUTION

In [9]:
output = babyNamesDict()
assert isinstance(output, str)
assert output[-5:] == '.json'
assert os.path.isfile(output)

The exercises above showed how to convert a csv file into a dictionary, and how to write a JSON file based on that dictionary "by hand." In practice, you would rarely do this. Instead, you would use the `dump` function from the `json` module. Recall from the reading that the `dump` function works as illustrated in the following code. We begin with a data structure in memory, then open a ".json" file to write into (this file does not need to already exist), then `json.dump(data, file)` writes the data structure `data` (a list in the code below, but it could equally be a dictionary) into `file` as JSON. We then close the file at the end. Please run the code below, then go check out the file `names.json` you have created. You could also change the name in the code and it will create a new file.

In [3]:
names = ["Isabella", "Sophia", "Emma", "Olivia", "Ava"]
f = open("names.json", 'w')

json.dump(names, f)

f.close()
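
As an alternative sketch, a `with` block closes the file automatically when the block ends, even if `dump` raises an error. The file name `names_with.json` and the use of the system temporary directory are assumptions, chosen so that the worksheet's own `names.json` is left untouched.

```python
import json
import os
import tempfile

names = ["Isabella", "Sophia", "Emma", "Olivia", "Ava"]

# Hypothetical location: a file in the system temporary directory.
path = os.path.join(tempfile.gettempdir(), "names_with.json")

# The file is closed automatically at the end of the with block.
with open(path, "w") as f:
    json.dump(names, f)

print(f.closed)
```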

**Q3a** Please mimic the code above to do the same thing with the dictionary `bN` given above, containing the dictionary of babynames and counts. Call your file `babynames2.json`, and don't forget to close the file after you dump into it. You can also experiment with other file names.

In [6]:
# Solution cell

### BEGIN SOLUTION
fN = "babynames2.json"
fh = open(fN, 'w')
json.dump(bN, fh)
fh.close()
### END SOLUTION


**Q3b** Turn your code from above into a function. In the `babyNamesDictJSON()` function below, write the dictionary created using `babynames.csv` into a JSON formatted file named `"babynames2.json"`, using the JSON `dump` function. Make sure you close the file after `dump`. Return the name of the file you have written.

Note: if time is short, please be aware that the file `babynames2.json` is already in the folder, so you can complete the next problem even if you don't do this one.

In [7]:
# Solution cell

def babyNamesDictJSON():
    """Writes the dictionary created by babyNames() into a file named 'babynames2.json' using json utilities.
       Returns the name of the file.
    
    """
    ### BEGIN SOLUTION
    bN = babyNames()
    fN = "babynames2.json"
    fh = open(fN, 'w')
    json.dump(bN, fh)
    fh.close()
    return fN
    ### END SOLUTION

In [8]:
output2 = babyNamesDictJSON()
#assert output != output2
#assert output2 == "babynames2.json"
assert os.path.isfile(output2)

Please note the difference between the JSON files `names.json` and `babynames.json`, and remember that the first came from a list while the second came from a dictionary. 

The exercise above shows how to write from a Python data structure into a JSON file. To go the other way, we use the `load` function. The command `json.load(file)` will return a data structure based on a JSON file. For the file `names.json`, the `load` function returns a list. For `babynames.json` it should return a dictionary. Please carefully read the code below. Note that you must first `open` the file before you load, and you should `close` it when you're done.

In [12]:
# Sample code

fileNames = open("names.json", 'r')
L = json.load(fileNames)
fileNames.close()
print(L)

['Isabella', 'Sophia', 'Emma', 'Olivia', 'Ava']
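
The type that `load` returns follows from the top-level structure of the JSON text. The sketch below uses `io.StringIO` as an in-memory stand-in for an open text file, an assumption made so that no worksheet files are touched; the contents are hypothetical.

```python
import io
import json

# io.StringIO behaves like an open text file held in memory.
list_file = io.StringIO('["Isabella", "Sophia"]')
dict_file = io.StringIO('{"Jacob": "22127", "Ethan": 18002}')

from_list = json.load(list_file)
from_dict = json.load(dict_file)

print(type(from_list))
print(type(from_dict))

# Quoted values stay strings; unquoted whole numbers become ints.
print(type(from_dict["Jacob"]), type(from_dict["Ethan"]))
```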


**Q4a** Please mimic the code above, but now do it for the file `"babynames.json"`. What data type is your result, and why?

In [None]:
# Solution cell

### BEGIN SOLUTION
fileBaby = open("babynames.json", 'r')
D = json.load(fileBaby)
fileBaby.close()
print(D)
### END SOLUTION

**Q4b** In the `compareJSON()` function below, write a utility function that compares the contents of two JSON files given their filenames. Assume that the filenames contain the file extension. Please make use of the `load` function from the `json` module, so that the comparison is made on the loaded data structures rather than on the raw text.

In [29]:
# Solution cell

def compareJSON(filename1, filename2):
    """ Compare the data loaded from two JSON files given their 
    filenames. Return True if they are the same.
    
    """
    ### BEGIN SOLUTION
    fh = open(filename1, 'r')
    fh2 = open(filename2, 'r')
    stuff = json.load(fh)
    stuff2 = json.load(fh2)
    fh.close()
    fh2.close()
    return stuff == stuff2
    ### END SOLUTION

In [30]:
assert compareJSON("babynames2.json", "babynames2.json")
assert compareJSON(output, output2)
assert not compareJSON("names.json","babynames.json")
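
Comparing loaded structures is not the same as comparing file text character by character. The hedged sketch below uses two hypothetical JSON strings that hold the same dictionary with different key order and spacing.

```python
import json

# Same data, different key order and spacing.
text_a = '{"Jacob": "22127", "Ethan": "18002"}'
text_b = '{"Ethan":"18002","Jacob":"22127"}'

# Character-by-character comparison of the raw text.
same_text = text_a == text_b

# Comparison of the parsed dictionaries; key order does not matter for dict equality.
same_data = json.loads(text_a) == json.loads(text_b)

print(same_text, same_data)
```

This is one reason a function like `compareJSON` can report two files as equal even when their bytes on disk differ.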