# CSV - Comma-separated values


https://tools.ietf.org/html/rfc4180.html  
https://en.wikipedia.org/wiki/Comma-separated_values

## Simple Python code
`sample1.csv`

Contents:

    Title,Release Date,Director
    And Now For Something Completely Different,1971,Ian MacNaughton
    Monty Python And The Holy Grail,1975,Terry Gilliam and Terry Jones
    Monty Python's Life Of Brian,1979,Terry Jones
    Monty Python Live At The Hollywood Bowl,1982,Terry Hughes
    Monty Python's The Meaning Of Life,1983,Terry Jones

In [None]:
with open('sample1.csv') as f:
    content = f.readlines()

content

In [None]:
header = content[0]
header

In [None]:
# strip() removes surrounding whitespace, including the trailing newline
header = header.strip()
header

In [None]:
header = header.split(',')
header

In [None]:
row = content[1]
row = row.strip().split(',')
row

In [None]:
rows = []
for row in content[1:]:
    row = row.strip().split(',')
    rows.append(row)
    
rows

In [None]:
rows = [row.strip().split(',') for row in content[1:]]
    
rows

__Parsing line by line__

In [None]:
with open('sample1.csv') as f:
    rows = [line.strip().split(',') for line in f]  # for each line in the file

In [None]:
rows

## CSV reader

https://docs.python.org/3/library/csv.html

In [None]:
import csv

### reader
`csv.reader(csvfile, dialect='excel', **fmtparams)`

---
__Footnotes__  
(1, 2) If `newline=''` is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use `\r\n` line endings an extra `\r` will be added on write. It should always be safe to specify `newline=''`, since the csv module does its own (universal) newline handling.
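The footnote above can be illustrated with a small in-memory sketch. The data and the `name`/`comment` columns are made up for this example, and `io.StringIO` stands in for a file opened with `newline=''`:

```python
import csv
import io

# Hypothetical data: the second field of the first record contains a line break.
data = 'name,comment\r\n"Ann","first line\r\nsecond line"\r\nBob,ok\r\n'

# newline='' passes line endings through untouched, so the csv module
# can decide which ones end a record and which belong to a quoted field.
buffer = io.StringIO(data, newline='')
rows = list(csv.reader(buffer))

print(len(rows))  # header plus two records
print(rows[1])
```

The quoted line break stays inside a single field, so the file yields three rows rather than four.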

In [None]:
with open('sample1.csv', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    rows = [row for row in csv_reader]
    
rows

In [None]:
import sys

filename = 'sample1.csv'
with open(filename, newline='') as f:
    reader = csv.reader(f)
    try:
        for row in reader:
            print(row)
    except csv.Error as e:
        # Report the file and the line at which parsing failed
        sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e))

### DictReader

`class csv.DictReader(csvfile, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)`

In [None]:
with open('sample1.csv', newline='') as csvfile:
    rows = [row for row in csv.DictReader(csvfile)]  # To experiment, try passing a wrong delimiter
    
rows

In [None]:
rows[0]['Director']

## More complicated file

`sample2.csv`  

Contents:

    John,Doe,120 jefferson st.,Riverside, NJ, 08075
    Jack,McGinnis,220 hobo Av.,Phila, PA,09119
    "John ""Da Man""",Repici,120 Jefferson St.,Riverside, NJ,08075
    Stephen,Tyler,"7452 Terrace ""At the Plaza"" road",SomeTown,SD, 91234
    ,Blankman,,SomeTown, SD, 00298
    "Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123

__Custom Python code__

In [None]:
with open('sample2.csv') as f:
    rows = [line.strip().split(',') for line in f]
    
rows

In [None]:
[len(row) for row in rows]

__CSV reader__

In [None]:
with open('sample2.csv', newline='') as csvfile:
    csv_reader = csv.reader(csvfile, delimiter=',')
    rows = [row for row in csv_reader]
    
rows

In [None]:
[len(row) for row in rows]
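The difference between the two approaches is easiest to see on a single record. The sketch below takes the last line of `sample2.csv` and parses it both ways; the variable names are chosen for this example only:

```python
import csv

line = '"Joan ""the bone"", Anne",Jet,"9th, at Terrace plc",Desert City,CO,00123'

# Naive splitting cuts at every comma, including those inside quoted fields.
naive = line.split(',')

# csv.reader accepts any iterable of lines; it honours the quotes
# and unescapes the doubled "" into a single quote character.
parsed = next(csv.reader([line]))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive split produces 8 fragments, while `csv.reader` returns the 6 intended fields.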

__DictReader__

In [None]:
with open('sample2.csv', newline='') as csvfile:
    # Without fieldnames, the first line would be treated as the header:
    # rows = [row for row in csv.DictReader(csvfile)]
    rows = [row for row in csv.DictReader(csvfile, fieldnames=['firstname', 'lastname', 'street', 'city', 'state', 'postal_code'])]
    
rows

In [None]:
[len(row) for row in rows]
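Some fields in `sample2.csv` start with a space after the delimiter (for example ` NJ`). One possible way to deal with this, sketched here on the first line of the file, is the `skipinitialspace` formatting parameter, which `DictReader` forwards to the underlying reader:

```python
import csv
import io

sample = 'John,Doe,120 jefferson st.,Riverside, NJ, 08075\n'
fieldnames = ['firstname', 'lastname', 'street', 'city', 'state', 'postal_code']

# Default behaviour: whitespace after the delimiter is kept as part of the field.
plain = next(csv.DictReader(io.StringIO(sample), fieldnames=fieldnames))

# skipinitialspace=True drops whitespace that directly follows a delimiter.
trimmed = next(csv.DictReader(io.StringIO(sample), fieldnames=fieldnames,
                              skipinitialspace=True))

print(repr(plain['state']), repr(trimmed['state']))
```

This matters when filtering by state, since `' NJ' == 'NJ'` is false.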

## Writing to CSV file

`class csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)`

https://docs.python.org/3/library/csv.html#csv.DictWriter

In [None]:
rows[0].keys()  # get the header, i.e. all the keys

In [None]:
fieldnames = list(rows[0].keys())
fieldnames

In [None]:
with open('sample2_1.csv', 'w', newline='') as csvfile:
    fieldnames = list(rows[0].keys())
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)  #, dialect='unix')  #, quoting=csv.QUOTE_MINIMAL)
    
    writer.writeheader()
    writer.writerows(rows)

In [None]:
csv.list_dialects()
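The `restval` and `extrasaction` parameters from the `DictWriter` signature above control what happens when a row does not match `fieldnames` exactly. A minimal sketch, using made-up records and an in-memory buffer instead of a file:

```python
import csv
import io

# Hypothetical records: the first has an extra key, the second lacks 'lastname'.
records = [
    {'firstname': 'John', 'lastname': 'Doe', 'nickname': 'JD'},
    {'firstname': 'Jack'},
]

buffer = io.StringIO(newline='')
writer = csv.DictWriter(
    buffer,
    fieldnames=['firstname', 'lastname'],
    restval='n/a',          # written for keys missing from a row
    extrasaction='ignore',  # silently drop keys not listed in fieldnames
)
writer.writeheader()
writer.writerows(records)

output = buffer.getvalue()
print(output)
```

With the default `extrasaction='raise'`, the first record would raise a `ValueError` because of the unexpected `nickname` key.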

## Loading and saving matrices

In [None]:
import numpy as np

In [None]:
a = np.random.random((10,10))
a

__Save matrix__

In [None]:
with open('sample_a.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerows(a)

__Load matrix__

In [None]:
with open('sample_a.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    a_bis = []
    for row in reader:
        a_bis.append(row)
    a_bis = np.array(a_bis, float)  # np.float was removed in NumPy 1.24; use the builtin float

In [None]:
a_bis

In [None]:
a == a_bis

In [None]:
np.allclose(a, a_bis)
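As an alternative to the `csv` module, NumPy provides `np.savetxt` and `np.loadtxt` for the same task. The following is a sketch only; the temporary directory and the file name `matrix.csv` are chosen for this example:

```python
import os
import tempfile

import numpy as np

matrix = np.random.random((3, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'matrix.csv')  # hypothetical file name
    # Write one matrix row per line, values separated by commas.
    np.savetxt(path, matrix, delimiter=',')
    # Read the file back into a float array of the same shape.
    loaded = np.loadtxt(path, delimiter=',')

print(loaded.shape)
print(np.allclose(matrix, loaded))
```

Here the conversion between text and floats is handled by NumPy, so no explicit `np.array(..., float)` step is needed.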

## CSV Exercises

### Loading data, filtering and saving

1. Load data from `sample2.csv`
2. Save rows with `state==NJ` to file `exercise_1_NJ.csv`
3. Save rows with `state==CO` to file `exercise_1_CO.csv`
4. Save rows with a state other than `NJ` or `CO` to file `exercise_1_rest.csv`

### Loading and saving matrices
1. Load matrices from files `csv_exercise_2_a.csv` and `csv_exercise_2_b.csv`.  
2. Create matrix __c = a·b__ (matrix multiplication, i.e. `a @ b` in NumPy, not the element-wise `a * b`).  
3. Save matrix __c__ to file `csv_exercise_2_c.csv`

# JSON

http://json.org/  
https://docs.python.org/3/library/json.html

In [None]:
import json

In [None]:
print(json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',', ':')))

__Pretty printing__

In [None]:
print(json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',', ':'), indent=4))

In [None]:
print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4))
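JSON has fewer data types than Python, so a round trip through `json.dumps` and `json.loads` does not always give back an identical object. A small sketch with a made-up dictionary:

```python
import json

# Hypothetical object with an integer key, a tuple and None.
original = {1: 'one', 'pair': (1, 2), 'missing': None}

text = json.dumps(original)
restored = json.loads(text)

print(text)
print(restored)
print(restored == original)
```

The integer key comes back as the string `'1'`, the tuple comes back as a list, and `None` is written as `null`.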

## Loading from file

In [None]:
with open('sample2.json') as f:
    sample2 = json.load(f)

sample2

## Saving to file

In [None]:
with open('sample2_bis.json', 'w') as f:
    f.write(json.dumps(rows))

In [None]:
with open('sample2_bis_pretty.json', 'w') as f:
    f.write(json.dumps(rows, sort_keys=True, indent=4))
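The same result can be obtained with `json.dump`, which writes straight to the file object instead of building the string first. The records and the file name below are made up for this sketch:

```python
import json
import os
import tempfile

# Hypothetical records standing in for the rows loaded earlier.
records = [
    {'firstname': 'John', 'state': 'NJ'},
    {'firstname': 'Joan', 'state': 'CO'},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'records.json')  # hypothetical file name
    with open(path, 'w') as f:
        # json.dump accepts the same formatting options as json.dumps.
        json.dump(records, f, sort_keys=True, indent=4)
    with open(path) as f:
        reloaded = json.load(f)

print(reloaded == records)
```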

## JSON Exercises

### Loading data, filtering and saving

1. Load data from `sample2.json`
2. Save rows with `state==NJ` to file `exercise_1_NJ.json`
3. Save rows with `state==CO` to file `exercise_1_CO.json`
4. Save rows with a state other than `NJ` or `CO` to file `exercise_1_rest.json`

# Pickle


The pickle module implements binary protocols for serializing and de-serializing a Python object structure.  
“Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation, whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.  
Pickling (and unpickling) is alternatively known as “serialization”, “marshalling”, or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”.

https://docs.python.org/3/library/pickle.html
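The byte-stream conversion described above can be tried without any file by using `pickle.dumps` and `pickle.loads`. The `record` dictionary is a made-up example:

```python
import pickle

# Hypothetical object mixing types that JSON cannot represent directly.
record = {'name': 'John', 'tags': ('a', 'b'), 'scores': {1, 2, 3}}

payload = pickle.dumps(record)    # object hierarchy -> bytes
restored = pickle.loads(payload)  # bytes -> object hierarchy

print(type(payload).__name__)
print(restored == record)
print(type(restored['tags']).__name__, type(restored['scores']).__name__)
```

Unlike JSON, pickle keeps tuples and sets as they are. Only unpickle data from sources you trust, since loading a pickle can execute arbitrary code.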

In [None]:
import pickle

## Unpickle file

`pickle.load(file, *, fix_imports=True, encoding='ASCII', errors='strict')`

In [None]:
with open('sample2.pickle', 'rb') as f:
    sample2_p = pickle.load(f)

In [None]:
sample2_p

In [None]:
sample2_p == sample2

## Pickle object

`pickle.dump(obj, file, protocol=None, *, fix_imports=True)`

In [None]:
with open('sample2_bis.pickle', 'wb') as f:
    pickle.dump(sample2, f)

__NumPy objects__

In [None]:
a = np.random.random((10,10))
a

In [None]:
with open('_numpy_a.pickle', 'wb') as f:
    pickle.dump(a, f)

In [None]:
with open('_numpy_a.pickle', 'rb') as f:
    a_bis = pickle.load(f)

In [None]:
a == a_bis

In [None]:
with open('_numpy_a2.pickle', 'wb') as f:
    a.dump(f)

In [None]:
with open('_numpy_a2.pickle', 'rb') as f:
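    # ndarray.dump writes a pickle, so np.load needs allow_pickle=True (the default is False since NumPy 1.16.3)
    a_bis2 = np.load(f, allow_pickle=True)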

In [None]:
a_bis2
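For arrays, NumPy's own `.npy` format is a common alternative to pickle. A minimal sketch using `np.save` and `np.load`, with an in-memory `io.BytesIO` buffer standing in for a file opened in binary mode:

```python
import io

import numpy as np

a = np.random.random((10, 10))

buffer = io.BytesIO()
np.save(buffer, a)   # write the array in .npy format
buffer.seek(0)       # rewind before reading back
a_loaded = np.load(buffer)

print(a_loaded.shape)
print(np.array_equal(a, a_loaded))
```

Because `.npy` stores the raw array data, plain numeric arrays load without `allow_pickle=True`.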