# File I/O, plus a few more notes on string formatting, some NumPy for binary files, and the os module for directory info
* How to read/write different file types - .txt, .csv, .json

[useful link to Python Software Foundation doc](https://docs.python.org/3/tutorial/inputoutput.html)

[link to colab specific file I/O info](https://colab.research.google.com/notebooks/io.ipynb#scrollTo=p2E4EKhCWEC5)

## Quick notes on string formatting because we'll use this to generate some data files to read/write
* We discussed the str.format() method a few weeks ago
* Remember that we pass in the index of the argument that we want to insert into each { }
* The number after the colon sets the minimum field width, i.e. how many places we want to set aside for spacing the output (the `d` marks the value as an integer)
* Using this convention it's easy to make nice, neat output using print or writing to a file

In [0]:
for x in range(0, 11):
  print('{0:2d} {1:4d} {2:5d}'.format(x, 2**x, 3**x))

 0    1     1
 1    2     3
 2    4     9
 3    8    27
 4   16    81
 5   32   243
 6   64   729
 7  128  2187
 8  256  6561
 9  512 19683
10 1024 59049
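
The same field widths also work with f-strings (Python 3.6 and later). This is just a sketch of an alternative spelling of the cell above; the `rows` list is a hypothetical name used here to collect the lines.

```python
# Same table as above, built with f-strings instead of str.format().
# The number after the colon is still the minimum field width.
rows = []
for x in range(0, 11):
    rows.append(f'{x:2d} {2**x:4d} {3**x:5d}')

print('\n'.join(rows))
```

The expression goes directly inside the braces, so there is no argument index to keep track of.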


## Quick note about file I/O on Google Colab
* Usually when we write a new file on a local install of Python (running e.g. Jupyter), the file is written directly to the current working directory
* However, on Google Colab it is written to the 'content' folder of a temporary virtual machine, and your Google Drive only shows up there if you mount it via a special authentication process
* You can do this if you'd like, but I'm not sure I'd suggest it just for the purpose of this class because it will open all your Drive files to the Google file stream service - it's probably fine, but it's a decision you might want to investigate a bit more on your own.
* If you ever do want to do this, use the code in the cell below - it will generate an authentication link, and then you'll be able to mount the drive and access Google Drive just like a local hard drive.

In [0]:
# read the text cell above FIRST before running this code
# from google.colab import drive
# drive.mount('/content/drive')

# %ls #line magic to list out the contents of the current working directory. 
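
Outside of line magics like `%ls`, the os module can report the same directory info from plain Python. A minimal sketch (the variable names are just for illustration, and the working directory being `/content` on Colab is an assumption about the default setup):

```python
import os

# current working directory (on Colab this is typically /content)
cwd = os.getcwd()
print(cwd)

# names of the files and folders in that directory, sorted for readability
entries = sorted(os.listdir(cwd))
for name in entries:
    # os.path.isdir distinguishes folders from regular files
    kind = 'dir ' if os.path.isdir(os.path.join(cwd, name)) else 'file'
    print(kind, name)
```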

## Open a file
* open() creates a file object, and usually takes two arguments - a filename and the mode

* The first argument is the filename. The second argument describes how the file will be used - read mode ('r'), write mode ('w'), read/write mode ('r+') or append mode ('a').
    * read mode 'r' will be assumed if the second arg is omitted

* By default, files are opened in text mode, so you're reading and writing strings to the file.

* Binary mode is enabled by appending 'b' to the read/write/append arg (e.g. 'rb' is read binary).

* In binary mode, you're reading/writing in units of bytes - this will often be the case for non-text files like image files and so forth

In [0]:
# for writing to a txt file
# 'w' will overwrite the file with that name!
f = open('test.txt', 'w')
f.close()

In [0]:
# for reading
f = open('test.txt', 'r')
f.close()

In [0]:
# for appending
f = open('test.txt', 'a')
f.close()

In [0]:
# for reading or writing
f = open('test.txt', 'r+')
f.close()

In [0]:
# for writing binary file
f = open('test', 'wb')
f.close()
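
To see the modes actually do something, here is a small sketch that writes, appends, and then reads the same file back in text mode and in binary mode. The file name and the temporary folder are hypothetical, chosen so that nothing in the working directory gets overwritten.

```python
import os
import tempfile

# hypothetical scratch file in a temporary folder
path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

# 'w' creates the file (or wipes an existing one) and writes text
f = open(path, 'w')
f.write('abc\n')
f.close()

# 'a' keeps what is there and adds to the end
f = open(path, 'a')
f.write('def\n')
f.close()

# 'r' is text mode, so read() hands back a str
f = open(path, 'r')
as_text = f.read()
f.close()

# 'rb' is binary mode, so read() hands back a bytes object
f = open(path, 'rb')
as_bytes = f.read()
f.close()

print(type(as_text), repr(as_text))
print(type(as_bytes), as_bytes)
```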

### An alternative to mounting the drive: this will open a download dialog box that you can use to download any files that you create if you want to view them

In [0]:
from google.colab import files
# of course, thus far it's an empty file because all we did was create it and
# then close it. 
files.download('test.txt')

### Now let's try it out by actually writing something to the text file.

In [0]:
f = open('test.txt', 'w')
for x in range(0, 11):
  # include the \n newline character - the text file will need that specified
  # to properly know what line to put the text on
  f.write('{0:2d} {1:4d} {2:5d}\n'.format(x, 2**x, 3**x))
  
f.close()

# download and take a look!
# NOTE - if you're running Windows and everything shows up on one line, use
# WordPad instead of Notepad - older versions of Notepad ignore \n newline chars
files.download('test.txt')

In [0]:
# QUESTION: what happens if you don't close it? 
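
One way to explore the question is to check the file size on disk before and after closing. This is only a sketch with a hypothetical scratch file; how much sits in the buffer before close() depends on the Python implementation and buffer size, so no particular "before" value is promised.

```python
import os
import tempfile

# hypothetical scratch file, kept out of the working directory
path = os.path.join(tempfile.mkdtemp(), 'unclosed_demo.txt')

f = open(path, 'w')
f.write('hello\n')

# the text is usually still sitting in Python's write buffer at this point,
# so the size on disk may be reported as 0 bytes
print('size before close:', os.path.getsize(path))

# close() flushes the buffer to disk and releases the file handle
f.close()
size_after = os.path.getsize(path)
print('size after close:', size_after)
```

If the program crashes or the handle is never closed, whatever is still in the buffer may never reach the file.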

In [0]:
# a better way...this will ensure that the file is properly closed when you're 
# done dealing with it (as many errors are caused by failing to close a file after open)
with open('test.txt', 'w') as f:
  for x in range(0, 11):
    # include the \n newline character
    f.write('{0:2d} {1:4d} {2:5d}\n'.format(x, 2**x, 3**x))

# confirm that it's closed
print(f.closed)
files.download('test.txt')

True
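
The `with` form also closes the file when something goes wrong inside the block. A minimal sketch, using a hypothetical scratch file and a deliberately raised error:

```python
import os
import tempfile

# hypothetical scratch file, kept out of the working directory
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('first line\n')
        # simulate something going wrong part way through
        raise ValueError('something went wrong')
except ValueError as err:
    print('caught:', err)

# the with block closed the file on the way out, despite the error
print(f.closed)

# and the text written before the error made it to disk
with open(path, 'r') as f2:
    saved = f2.read()
print(repr(saved))
```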


### The 'read' method - f.read(size)
* Will read in at most **size** units of data from the file, where size counts characters in text mode or bytes in binary mode (more on binary reads later)
* If you leave this blank, then it will read the entire file. That can be very problematic if the file is REALLY big, because the whole thing has to fit in your computer's memory.
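
For big files, a common pattern is to call read(size) in a loop and handle one chunk at a time. A sketch under stated assumptions: the scratch file, the `chunk_size` of 50, and the counters are all hypothetical, and a real chunk size would be much larger.

```python
import os
import tempfile

# build a small table file to read back (hypothetical scratch file)
path = os.path.join(tempfile.mkdtemp(), 'chunk_demo.txt')
with open(path, 'w') as f:
    for x in range(0, 11):
        f.write('{0:2d} {1:4d} {2:5d}\n'.format(x, 2**x, 3**x))

chunk_size = 50  # characters per read, tiny here just for illustration
n_chunks = 0
n_chars = 0

with open(path, 'r') as f:
    while True:
        chunk = f.read(chunk_size)
        # read() returns an empty string once the end of the file is reached
        if chunk == '':
            break
        n_chunks += 1
        n_chars += len(chunk)

print(n_chunks, n_chars)
```

Only one chunk is held in memory at a time, so the file size no longer matters.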

In [0]:
# open our file for reading...
with open('test.txt', 'r') as f:
  # go ahead and read the entire file...
  out = f.read()
    
# print it out
print(out)

 0    1     1
 1    2     3
 2    4     9
 3    8    27
 4   16    81
 5   32   243
 6   64   729
 7  128  2187
 8  256  6561
 9  512 19683
10 1024 59049



In [0]:
# open our file for reading...just grab 15 characters and print
with open('test.txt', 'r') as f:
  
  out = f.read(15)
    
# print it out
print(out)

 0    1     1
 


In [0]:
# a better way to read a line of text
# open our file for reading...
with open('test.txt', 'r') as f:
  # read a line
  out = f.readline()
    
# print it out
print(out)

 0    1     1



In [0]:
# loop line by line and print out...
with open('test.txt', 'r') as f:
  # loop over all lines
  for line in f:
    print(line, end='')

 0    1     1
 1    2     3
 2    4     9
 3    8    27
 4   16    81
 5   32   243
 6   64   729
 7  128  2187
 8  256  6561
 9  512 19683
10 1024 59049
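
Each line comes back as a string, so to do math with the values you need to split the line and convert the pieces. A minimal sketch (the scratch file and the `table` list are hypothetical names):

```python
import os
import tempfile

# write the same table to a hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'parse_demo.txt')
with open(path, 'w') as f:
    for x in range(0, 11):
        f.write('{0:2d} {1:4d} {2:5d}\n'.format(x, 2**x, 3**x))

table = []
with open(path, 'r') as f:
    for line in f:
        # split() with no arguments splits on any run of whitespace and
        # also drops the trailing newline; int() converts each piece
        table.append([int(s) for s in line.split()])

print(table[3])
```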


### Append mode

In [0]:
# open our test.txt file and append to it - will just pick up where you left off!
with open('test.txt', 'a') as f:
  for x in range(0, 11):
    # include the \n newline character
    f.write('{0:2d} {1:4d} {2:5d}\n'.format(x, 2*x, 3*x))

# confirm that it's closed
print(f.closed)
files.download('test.txt')

True


### Read/write mode...
* when you open in r+ mode, the file must already exist and the stream is positioned at the start of the file

* when you open in w+ mode, the file is created if it does not exist and otherwise it's truncated, and the stream starts at the beginning
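
The difference between the two modes is easiest to see on a file that already has something in it. A small sketch with a hypothetical scratch file:

```python
import os
import tempfile

# hypothetical scratch file with some starting contents
path = os.path.join(tempfile.mkdtemp(), 'rw_demo.txt')
with open(path, 'w') as f:
    f.write('abcdef')

# r+ keeps the existing contents; writing starts at position 0 and
# overwrites characters in place
with open(path, 'r+') as f:
    f.write('XY')
    f.seek(0)
    after_r_plus = f.read()

# w+ truncates the file to zero length as soon as it is opened
with open(path, 'w+') as f:
    after_w_plus = f.read()

print(repr(after_r_plus))
print(repr(after_w_plus))
```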

In [0]:
# first remove our test file so that we start fresh and don't fill up the drive
# with a bunch of leftover files. Use the os module,
# which contains many handy functions, esp for figuring out directory structure in 
# this context. 
import os

if os.path.exists("test.txt"):
  os.remove("test.txt")
else:
  print("The file does not exist")

# set up an empty list to store the info read from file
output = []

# open a file for reading and writing
with open('test.txt', 'w+') as f:
  for x in range(0, 11):
    # include the \n newline character
    f.write('{0:2d} {1:4d} {2:5d}\n'.format(x, 2**x, 3**x))

    
  # figure out where we are in the file when we're done writing
  print(f.tell())  
    
  # before reading, we need to 'rewind' or seek back to the
  # top of the file
  f.seek(0)
  
  # now actually do the reading!
  # instead of f.read() you can use the list function to return a list of the 
  # lines in the file!
  output = list(f)
  
# now the file should be closed...
# print out a line...
print(output[3])

# note the newline character at the end of each line!
# 
# files.download('test.txt')

154
 3    8    27



### What happens when you try to write an integer to a text file?

In [0]:
# open a file for writing
with open('test.txt', 'w') as f:
  for x in range(0, 11):
    f.write(x)
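
In text mode, write() only accepts strings, so the cell above raises a TypeError. A sketch of catching that error and then converting with str() before writing (the scratch file and variable names are hypothetical):

```python
import os
import tempfile

# hypothetical scratch file, kept out of the working directory
path = os.path.join(tempfile.mkdtemp(), 'int_demo.txt')

with open(path, 'w') as f:
    try:
        # passing an int straight to write() fails in text mode
        f.write(5)
    except TypeError as err:
        error_message = str(err)
        print('TypeError:', error_message)

    # convert each number to a string first (str() or format() both work)
    for x in range(0, 11):
        f.write(str(x) + '\n')

with open(path, 'r') as f:
    contents = f.read()

print(contents)
```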
    

## Binary file I/O
* So far we've just been dealing with text files where everything is a string (of characters)
* Binary files store raw bytes rather than characters, which is usually denser and easier to interpret (for the machine, not for you!)

In [0]:
# open a file for writing binary
with open('test', 'wb') as f:
  # generate a list of numbers, use bytearray to convert
  # numbers over the range 0:255 to binary format
  bytes_to_write = bytearray([0,1,2,3,4,5])

  # write to file!
  f.write(bytes_to_write)

# have a look!
files.download('test')

In [0]:
# now read it back in
with open('test', 'rb') as f:
  bytes_read = f.read()

# notice that f.read() returns a bytes object (shown with a b'...' prefix), not a str
print(bytes_read)

b'\x00\x01\x02\x03\x04\x05'


In [0]:
# can use numpy to make it more usable and read as numbers
import numpy as np

# now read it back in - note that you HAVE to know the data type!
with open('test', 'rb') as f:
  bytes_read = np.fromfile(f, dtype=np.int8)
  # bytes_read = np.fromfile(f, dtype=np.int16)  # wrong type: pairs of bytes get merged
  
print(bytes_read)

[0 1 2 3 4 5]
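
To see why the data type matters, here is a sketch that reads the same six bytes two ways. The scratch file name is hypothetical, and `'<i2'` (little-endian 16-bit integer) is spelled out explicitly so the result does not depend on the machine.

```python
import os
import tempfile

import numpy as np

# write the same six bytes to a hypothetical scratch file
path = os.path.join(tempfile.mkdtemp(), 'dtype_demo.bin')
with open(path, 'wb') as f:
    f.write(bytearray([0, 1, 2, 3, 4, 5]))

# one number per byte: matches how the file was written
as_uint8 = np.fromfile(path, dtype=np.uint8)

# two bytes per number: each pair is combined as low + 256 * high,
# e.g. bytes (0, 1) become 0 + 256 * 1 = 256
as_int16 = np.fromfile(path, dtype='<i2')

print(as_uint8)
print(as_int16)
```

Nothing in the file itself says which reading is right, which is why binary formats usually come with documentation of their layout.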


## JSON (JavaScript Object Notation) format
* straightforward and standardized way of storing and exchanging data files
* kind of like a csv or a txt file in nature, but more sophisticated
* developed as a way of transferring JavaScript objects between browsers and servers, but now frequently used for all types of data and languages
* built from a small set of data types:
  * objects (like dicts)
  * arrays (like lists)
  * strings (sequence of characters in double quotes)
  * numbers
  * true, false, and null (like True, False, and None)

[link to main page](http://json.org/)

In [0]:
# import json module
import json

# build a dictionary with a bunch of different data types, including a sub list
# of dictionaries
user_profile = {
  "name": "John",
  "age": 30,
  "kids": True,
  "pets": None,
  "bicycles": [
    {"type": "Road", "Make": "Giant"},
    {"type": "Mountain", "Make": "Cannondale"}
  ]
}

print(json.dumps(user_profile))

{"name": "John", "age": 30, "kids": true, "pets": null, "bicycles": [{"type": "Road", "Make": "Giant"}, {"type": "Mountain", "Make": "Cannondale"}]}
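
For something easier on the eyes, json.dumps takes an `indent` argument, and json.loads turns the string back into Python objects. A minimal sketch that repeats the dictionary from above so it runs on its own (`text` and `restored` are hypothetical names):

```python
import json

user_profile = {
    "name": "John",
    "age": 30,
    "kids": True,
    "pets": None,
    "bicycles": [
        {"type": "Road", "Make": "Giant"},
        {"type": "Mountain", "Make": "Cannondale"}
    ]
}

# indent=2 spreads the output over several lines with 2-space indentation
text = json.dumps(user_profile, indent=2)
print(text)

# loads ("load string") parses JSON text back into dicts, lists, etc.
restored = json.loads(text)
print(restored == user_profile)
```

Note that Python's True and None are written as true and null in the JSON text, and are converted back when the text is loaded.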


### Now write a .json file to disk - very similar to file creating/writing that we did above

In [0]:
import json
with open('test.json', 'w') as outfile:
  json.dump(user_profile, outfile)
  
files.download('test.json')

In [0]:
with open('test.json', 'r') as infile:
  x = json.load(infile)
  
# and you get back a dictionary!
x['bicycles'][1]['Make']

'Cannondale'