# Reading and Writing Data

Up to this point, you have seen some examples of using files and you are probably familiar with computer files in general. In this lesson, we are going to provide a little more depth.

Files are a key concept of the stored-program computer, the design behind the common computer, smartphone, tablet, and just about every other computing device. A file is simply a sequence of bytes that has a name and a starting location within the file system. The key difference between files and memory is that files are persistent: if the system is powered down, the files will still be available when the system is powered back on.

Let's look at an image from earlier in the boot camp:

![Computer Programming](../images/ComputerProgram.png)

In the case of computer programs, the input, the output, or both can be files. This is often the case when you are using Python to massage and un-mangle data that exists in a collection of files.

## File Type

The concept of a file type is nearly irrelevant for a file stored on the secondary storage system.  However, we often conceptually divide files into two classes:

  *  Text files, i.e., files that contain plain, readable characters
  *  Binary files, i.e., files of bytes whose structure is meaningful to a particular application

Why do we lump hundreds of kinds of binary files (PDF, Office documents, audio, video, and so on) into one type, binary? Because the choice between text (character) mode and binary mode is usually all that matters when opening a file to read or write within a program. In either case, the file is a sequence of bytes. In a text file, those bytes encode characters.
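
To make the text versus binary distinction concrete, here is a minimal sketch that writes a short string and reads it back in both modes. The file name `demo.txt` and the temporary directory are hypothetical, chosen only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file inside a temporary directory (not part of the lesson data)
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Write four characters; in UTF-8 the character "é" takes two bytes
with open(demo_path, "w", encoding="utf-8") as fh:
    fh.write("café")

# Text mode: the bytes are decoded into a str
with open(demo_path, "r", encoding="utf-8") as fh:
    as_text = fh.read()

# Binary mode: the same file comes back as raw bytes
with open(demo_path, "rb") as fh:
    as_bytes = fh.read()

print(type(as_text), len(as_text))    # a str of 4 characters
print(type(as_bytes), len(as_bytes))  # a bytes object of 5 bytes
```

The file on disk is identical in both cases; only the way Python interprets its bytes changes.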

## Using Files

The general pattern of file operations is:

  *  Open file (either in read or write mode)
  *  Read or Write file
  *  Close file

By opening a file, we are telling a computer system to look at the sequence of bytes referenced by the file name and prepare to read and/or write bytes.

Just as a file name is a handle to reference the sequence of bytes in the file system, opening a file assigns the open file to a handle in Python. The labs at the end of this module will get into various examples for file input and output.
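
The open, read/write, close pattern can be sketched in a few lines. This is only an illustration: the file name `pattern.txt` and the temporary directory are assumptions, not files used elsewhere in this lesson.

```python
import os
import tempfile

# Hypothetical scratch file for this illustration
pattern_path = os.path.join(tempfile.mkdtemp(), "pattern.txt")

fh = open(pattern_path, "w")  # 1. open: fh is the handle to the open file
fh.write("hello\n")           # 2. write through the handle
fh.close()                    # 3. close
was_closed = fh.closed        # the handle records that it has been closed

fh = open(pattern_path, "r")  # the same three steps, this time for reading
content = fh.read()
fh.close()

print(was_closed, repr(content))
```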


## File Access Mode

Access modes govern the type of operations possible on the opened file, that is, how the file will be used once it is opened. These modes also define the initial location of the file handle in the file. The file handle acts like a cursor that marks where data will be read or written. Common access modes are:

* **Read Only ('r'):** Opens the file for reading. We can read the contents but cannot modify them. The handle is positioned at the beginning of the file. This is the default mode: if no mode is given, the file is opened in read mode.
* **Write Only ('w'):** Opens the file for writing. If the file exists, its contents are truncated (erased) and overwritten; if it does not exist, it is created. The handle is positioned at the beginning of the file.
* **Write and Read ('w+'):** Opens the file for reading and writing. If the file exists, its contents are truncated and overwritten; if it does not exist, it is created. The handle is positioned at the beginning of the file.
* **Append Only ('a'):** Opens the file for writing. The file is created if it does not exist. The handle is positioned at the end of the file, so new data is written after the existing data.

You can learn about other modes [here](https://www.tutorialspoint.com/python/python_files_io.htm).
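
The effect of the modes described above can be checked with a small experiment. This is a sketch under assumptions: `modes.txt` and `missing.txt` are hypothetical names inside temporary directories.

```python
import os
import tempfile

# Hypothetical scratch file for this illustration
modes_path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(modes_path, "w") as fh:
    fh.write("first\n")

# Opening in 'w' again truncates the file: "first" is lost
with open(modes_path, "w") as fh:
    start_pos_w = fh.tell()  # position of the handle right after opening
    fh.write("second\n")

# 'a' keeps the existing data and writes after it
with open(modes_path, "a") as fh:
    fh.write("third\n")

with open(modes_path, "r") as fh:
    start_pos_r = fh.tell()
    final_content = fh.read()

# 'r' does not create files: opening a missing file raises an error
try:
    open(os.path.join(tempfile.mkdtemp(), "missing.txt"), "r")
    missing_error = None
except FileNotFoundError as exc:
    missing_error = type(exc).__name__

print(repr(final_content), start_pos_w, start_pos_r, missing_error)
```

`tell()` reports the current position of the handle, which is how the "cursor" idea shows up in code.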

## File Operations

To go through the file operations, we will first create a file and save it to the disk. Then we will open the file that we created and print the contents of the file. 

There are two ways to write to a file.

* **write():** Writes the string `str1` to the text file exactly as given; no newline is added.
  ```File_object.write(str1)```
* **writelines():** For a list of strings, writes each string to the text file in order, again without adding newlines. Used to write multiple strings in a single call.
  ```File_object.writelines(L)```, where  ```L = [str1, str2, str3]``` 
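
As a quick check of how the two methods treat newlines, the sketch below writes to an in-memory text buffer (`io.StringIO`), which is an assumption made here to avoid touching the disk; it behaves like a text file opened for writing.

```python
import io

buffer = io.StringIO()  # in-memory stand-in for a text file

count = buffer.write("one")            # write() returns the number of characters written
buffer.writelines(["two", "three\n"])  # writelines() adds no separators of its own

result = buffer.getvalue()
print(count, repr(result))
```

Because neither method inserts line breaks, any `\n` that should appear in the file has to be part of the strings themselves.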

In [None]:
filepath = "mydata/files/myfile01.txt"

# Note that the mydata/files directory may not exist yet. We can create it either manually or programmatically.
# To create the directory programmatically, we can take advantage of the `os` package
import os
os.makedirs("mydata/files", exist_ok=True)
# once the above line is executed, two directories exist within the labs dir: mydata, and files inside it

# open the file in write mode
fh = open(filepath, 'w')  


# fh stands for file handle. You can treat this as a file object. Apply methods on the object 
# to perform various file operations

# Writing a string to file
fh.write("PREAMBLE")

# write doesn't insert a new line (\n). You need to add it at end of the string (e.g., "PREAMBLE\n" )
# or you can call the write function again
fh.write("\n")

# Writing a string to file 
fh.write("""We the People of the United States, in Order to form a more perfect Union, establish Justice, 
insure domestic Tranquility, provide for the common defense, promote the general Welfare, and 
secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this 
Constitution for the United States of America.\n""")

# Writing multiple strings 
# at a time 
fh.writelines(['ARTICLE I\n', 'SECTION 1\n', 
               """All legislative Powers herein granted shall be vested in a Congress of the United States, 
which shall consist of a Senate and House of Representatives.\n"""])

text = ["SECTION 2\n", """The House of Representatives shall be composed of Members chosen every second Year by the 
People of the several States, and the Electors in each State shall have the Qualifications requisite for Electors 
of the most numerous Branch of the State Legislature."""]

for line in text:
    fh.write(line)
    

# Closing file. (very important)
fh.close() 



<mark>With the above syntax, it is crucial that you close the file you have opened. The close() method of a file object flushes any unwritten information and closes the file object, after which no more writing can be done.</mark>

Because it is easy to forget to close a file, Python offers the `with` statement (see Lesson M2-L1 on [Flow Control](../../Module2/labs/M2-L1-ControlStructures.ipynb#The-with-control-statement)). With the `with` statement, the interpreter makes sure that certain clean-up actions (e.g., closing a file) are taken.
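
The clean-up guarantee also holds when something goes wrong inside the block. The following sketch simulates a failure with a deliberately raised `RuntimeError`; the file name `with_demo.txt` is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file for this illustration
with_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

try:
    with open(with_path, "w") as fh:
        fh.write("partial\n")
        raise RuntimeError("simulated failure")  # something goes wrong mid-way
except RuntimeError:
    pass

closed_after_error = fh.closed  # the with statement closed the file anyway

# A closed file cannot be written to
try:
    fh.write("more")
    write_error = None
except ValueError as exc:
    write_error = str(exc)

# Data written before the failure was flushed to disk when the file was closed
with open(with_path) as check:
    saved = check.read()

print(closed_after_error, write_error, repr(saved))
```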

Let's open the above file in append mode ('a') using the `with` statement.


In [None]:
with open(filepath, 'a') as fh:
    fh.write("\n")
    fh.write("""No Person shall be a Representative who shall not have attained to the Age of twenty five Years, 
and been seven Years a Citizen of the United States, and who shall not, when elected, be an Inhabitant of that 
State in which he shall be chosen.""")

You can check the content by opening the file. 

Now we will open this file in read mode, and print the contents line by line. 

In [None]:
with open(filepath, 'r') as fh:
    for line in fh:  # reading line by line
        print(line)

# Note: we don't need to close the file here as with statement will take care of that

We can also use `readline()` or `readlines()` methods to read files line by line.

In [None]:
with open(filepath, 'r') as fh:
    line = fh.readline()  # read a line
    while line:  # continue until readline() returns an empty string, which signals EOF (end of file)
        print(line)
        line = fh.readline()

# Note: we don't need to close the file here as with statement will take care of that

In [None]:
# it is possible to read all of the lines at once

with open(filepath, 'r') as fh:
    lines = fh.readlines()  # read all the lines and store them to a list
    print(lines)
    
print("--" * 50)
with open(filepath, 'r') as fh:
    block = fh.read()  # read the entire file as a single string
    print(block)
    
    lines = block.split("\n") # split the block into lines
    print(lines)
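
Each line read from a file keeps its trailing `\n`, which is why `print(line)` shows a blank line between lines. The sketch below, which uses an in-memory `io.StringIO` buffer as a stand-in for an open file (an assumption for illustration), shows a few ways to deal with that.

```python
import io

sample_text = "PREAMBLE\nWe the People\n"

# Iterating over a file-like object yields lines that still end in "\n"
raw_lines = list(io.StringIO(sample_text))

# rstrip("\n") removes the trailing newline from each line
stripped = [line.rstrip("\n") for line in io.StringIO(sample_text)]

# split("\n") leaves an empty string after the final newline; splitlines() does not
via_split = sample_text.split("\n")
via_splitlines = sample_text.splitlines()

print(raw_lines)
print(stripped)
print(via_split)
print(via_splitlines)
```

Alternatively, `print(line, end="")` prints each line without adding a second newline.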


## Handling special types of files

Although the above methods are sufficient to read and write almost any type of file, some file types have a special structure, and it is better to use dedicated packages to handle them because doing so makes the code simpler.

The following examples show how to read and write two kinds of text files: a) csv and b) json. We will be using two libraries, `csv` and `json`, to deal with these kinds of files. Note that there are other libraries that can read and write both csv and json files. In the next lesson we will see how to use `pandas` to read and write csv files. 

### CSV File
A comma-separated values (csv) file is a text file where each line represents a record and the values within a record are separated by commas. 
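
One reason to use the `csv` library rather than splitting each line on commas is that a value may itself contain a comma, in which case it is quoted. Here is a minimal sketch with made-up rows, written to and read from an in-memory buffer:

```python
import csv
import io

# Made-up sample rows; the second record has a comma inside a value
rows = [["name", "city"], ["Ada", "London, UK"]]

buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
raw_text = buffer.getvalue()  # the writer quotes "London, UK" automatically

# csv.reader understands the quoting and recovers the original values
parsed = list(csv.reader(io.StringIO(raw_text, newline="")))

# A naive split on commas breaks the quoted value into two pieces
naive = raw_text.splitlines()[1].split(",")

print(repr(raw_text))
print(parsed)
print(naive)
```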

Reading data from a CSV file and printing it with line numbers:

In [None]:
# importing csv package/library
import csv  

filepath = "/dsa/data/all_datasets/SyriaIDPSites2015LateJunHIUDoS.csv"

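# newline='' lets the csv module handle line endings itself, as its documentation recommends
with open(filepath, encoding='latin-1', newline='') as fh:
    file_data = csv.reader(fh)  # creating a csv reader object, which will process each line

    # enumerate() supplies the line number, starting from 1
    for line_number, data_row in enumerate(file_data, start=1):
        print(f'{line_number:0>4} {data_row}')
        # here {line_number:0>4} means the line number is printed in a field 4 characters wide.
        # if the line number has fewer than 4 digits, it is padded with 0, and > indicates right alignment.

# Note: the with statement closes the file for us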


### JSON File

**JavaScript Object Notation (JSON) is an increasingly popular data interchange format.** There are a variety of specialty derivatives of JSON, such as GeoJSON. JSON objects are constructed of key-value pairs, written like this: `{ "key1" : "value1", "key2" : "value2" }`. Values can also be arrays of objects, where each `{...}` represents an object in the array: `[ {...}, {...} ]`. This data format is very close to a Python dictionary. 
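
As a small illustration of how closely JSON maps onto Python data, the sketch below parses a made-up JSON string with `json.loads` and converts it back with `json.dumps`. The keys and values are invented for this example.

```python
import json

# Made-up JSON text: an object containing a string, an array, a boolean, and a null
text = '{"name": "Ada", "languages": ["Python", "C"], "active": true, "manager": null}'

record = json.loads(text)  # JSON text -> Python objects

print(type(record))         # the JSON object becomes a dict
print(record["languages"])  # the JSON array becomes a list
print(record["active"])     # true becomes True
print(record["manager"])    # null becomes None

round_trip = json.dumps(record, sort_keys=True)  # Python objects -> JSON text
print(round_trip)
```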
 
In the following exercise, we will be reading data from a CSV and writing to a JavaScript Object Notation (JSON) file:



In [None]:
import csv
import json

filepath = "/dsa/data/all_datasets/SyriaIDPSites2015LateJunHIUDoS.csv"

# Open an output file, in write 'w' mode
jsonfile = open('mydata/files/myjson.json', 'w')


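# Read file data using the dictionary reader
# Define the field names. Because fieldnames is passed explicitly, DictReader treats every row
# of the file as data, including a header row if the file has one. See the csv library docs for details.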
fieldnames = ('Description', 'Country', 'ADM1', 'ADM2', 'ADM3', 'ADM4', 'Latitude', 'Longitude', 'Name', 
              'pcode', 'fips', 'iso_alpha2', 'iso_alpha3', 'iso_num', 'stanag', 'tld')

fh = open(filepath, encoding='latin-1')
file_data = csv.DictReader(fh, fieldnames)

# Now transform
for data_row in file_data:
    json.dump(data_row, jsonfile)
    jsonfile.write('\n')
    
jsonfile.close()
fh.close()

In [None]:
# read the above json file
import json

with open('mydata/files/myjson.json', encoding='latin-1') as fh:
    for line in fh:
        file_data = json.loads(line)
        print(file_data)

JSON data look very similar to Python data when printed. Something to remember is the correspondence between JSON and Python names:

**JSON Object = Python (Dict) Dictionary, i.e., *name - value* pairs**

  * Read more about Dict here : https://docs.python.org/3.3/library/stdtypes.html#dict

**JSON Array = Python list**

  * Read more about List here : https://docs.python.org/3.3/library/stdtypes.html#lists

A common operation is to produce JSON-formatted files from other data, such as CSV files.
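
A compact variant of that conversion is sketched below. It is only an illustration under assumptions: the CSV text is made up, it lives in an in-memory buffer rather than a file, and the field names are taken from the header row instead of being passed explicitly. The result is a single JSON array rather than one JSON object per line.

```python
import csv
import io
import json

# Made-up CSV text with a header row (not the lesson's dataset)
csv_text = "Name,Latitude,Longitude\nSite A,36.2,37.1\nSite B,35.9,36.6\n"

# With no fieldnames argument, DictReader uses the first row as the keys
reader = csv.DictReader(io.StringIO(csv_text, newline=""))
records = [dict(row) for row in reader]

# One JSON array holding every record; csv values are always read as strings
json_text = json.dumps(records, indent=2)
print(json_text)
```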
