### Working with formatted data

* we often need to keep information in some data format
* we also need a way to access that data in our application
* simple file I/O may work, but it requires additional work to parse the data


### Reading a CSV file.

* CSV stands for comma-separated values
* the first row generally contains the field names

In [1]:
%%file books.csv
title,author,price,rating
The Accursed God, Vivek Dutta Mishra, 299, 4.6
Rashmirathi, Ramdhari Singh Dinkar, 109, 4.8
Kurukshetra,Ramdhari Singh Dinkar, 99, 4.1
Manas, Vivek Dutta Mishra,199, 4.2
Asura, Anant Neelkanthan,499,3.6
 

Writing books.csv


#### We can read and use the data without any additional module

In [4]:
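def read_csv(path):
    rows = []

    with open(path) as file:
        for line in file:
            # split on commas; spaces around fields and blank lines are NOT handled here
            rows.append(line.strip().split(','))

    return {
        'header': rows[0],   # first row holds the field names
        'data': rows[1:]     # remaining rows hold the records
    }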

In [5]:
books= read_csv('books.csv')
print(books)

{'header': ['title', 'author', 'price', 'rating'], 'data': [['The Accursed God', ' Vivek Dutta Mishra', ' 299', ' 4.6'], ['Rashmirathi', ' Ramdhari Singh Dinkar', ' 109', ' 4.8'], ['Kurukshetra', 'Ramdhari Singh Dinkar', ' 99', ' 4.1'], ['Manas', ' Vivek Dutta Mishra', '199', ' 4.2'], ['Asura', ' Anant Neelkanthan', '499', '3.6'], ['']]}


In [7]:
for book in books["data"]:
    for i in range(len(book)):
        print(books["header"][i],book[i])
    print()

title The Accursed God
author  Vivek Dutta Mishra
price  299
rating  4.6

title Rashmirathi
author  Ramdhari Singh Dinkar
price  109
rating  4.8

title Kurukshetra
author Ramdhari Singh Dinkar
price  99
rating  4.1

title Manas
author  Vivek Dutta Mishra
price 199
rating  4.2

title Asura
author  Anant Neelkanthan
price 499
rating 3.6

title 
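The output above shows two weaknesses of the hand-written parser: the spaces after each comma stay in the fields, and the trailing blank line becomes an extra row. A minimal sketch of a cleaner hand-written version, assuming no field ever contains a quoted comma (plain `split(',')` cannot handle that); `read_csv_manual` is a hypothetical name, and `io.StringIO` stands in for the real file:

```python
import io

def read_csv_manual(lines):
    """Parse CSV-like lines by hand (hypothetical helper).

    Assumes no field contains a quoted comma.
    """
    rows = []
    for line in lines:
        if not line.strip():      # skip blank or whitespace-only lines
            continue
        # strip each field so ' 299' becomes '299' and 'rating\n' becomes 'rating'
        rows.append([field.strip() for field in line.split(',')])
    return {'header': rows[0], 'data': rows[1:]}

sample = io.StringIO(
    "title,author,price,rating\n"
    "Manas, Vivek Dutta Mishra,199, 4.2\n"
    " \n"
)
books_manual = read_csv_manual(sample)
print(books_manual)
# {'header': ['title', 'author', 'price', 'rating'], 'data': [['Manas', 'Vivek Dutta Mishra', '199', '4.2']]}
```

The same function works with a real file object, e.g. `read_csv_manual(open('books.csv'))`, since a file also yields one line per iteration.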



#### But a specialized library can help us access it more reliably.

* Python supports this through the built-in `csv` module.

In [8]:
import csv

In [9]:
print(dir(csv))

['Dialect', 'DictReader', 'DictWriter', 'Error', 'QUOTE_ALL', 'QUOTE_MINIMAL', 'QUOTE_NONE', 'QUOTE_NONNUMERIC', 'QUOTE_NOTNULL', 'QUOTE_STRINGS', 'Sniffer', 'StringIO', '_Dialect', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '__version__', 'excel', 'excel_tab', 'field_size_limit', 'get_dialect', 'list_dialects', 're', 'reader', 'register_dialect', 'types', 'unix_dialect', 'unregister_dialect', 'writer']


In [10]:
help(csv.reader)

Help on built-in function reader in module _csv:

reader(...)
    csv_reader = reader(iterable [, dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)

    The "iterable" argument can be any object that returns a line
    of input for each iteration, such as a file object or a list.  The
    optional "dialect" parameter is discussed below.  The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.

    The returned object is an iterator.  Each iteration returns a row
    of the CSV file (which can span multiple input lines).



In [12]:
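def read_csv(path):
    # the csv docs recommend newline='' so newlines inside quoted fields are handled correctly
    with open(path, newline='') as file:
        reader = csv.reader(file)
        for row in reader:
            print(row)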

In [13]:
read_csv('books.csv')

['title', 'author', 'price', 'rating']
['The Accursed God', ' Vivek Dutta Mishra', ' 299', ' 4.6']
['Rashmirathi', ' Ramdhari Singh Dinkar', ' 109', ' 4.8']
['Kurukshetra', 'Ramdhari Singh Dinkar', ' 99', ' 4.1']
['Manas', ' Vivek Dutta Mishra', '199', ' 4.2']
['Asura', ' Anant Neelkanthan', '499', '3.6']
[' ']
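`csv.reader` splits the lines for us, but it still keeps the space after each comma and returns the blank last line as a row. A minimal sketch that handles both, using the reader's `skipinitialspace` option and a simple filter for blank rows; the sample text is an abbreviated copy of books.csv held in `io.StringIO`:

```python
import csv
import io

data = io.StringIO(
    "title,author,price,rating\n"
    "Manas, Vivek Dutta Mishra,199, 4.2\n"
    "Asura, Anant Neelkanthan,499,3.6\n"
    " \n"
)

# skipinitialspace=True drops the spaces that directly follow each delimiter
reader = csv.reader(data, skipinitialspace=True)

# keep only rows that contain at least one non-blank field
rows = [row for row in reader if any(field.strip() for field in row)]

for row in rows:
    print(row)
```

Note that `skipinitialspace` only removes spaces *after* a delimiter; trailing spaces before a comma would still need `strip()`.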


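### Advantages of the csv module over plain file reading

* we don't need to split lines or strip line endings ourselves (spaces after a delimiter are kept unless we pass `skipinitialspace=True`)
* it correctly handles quoted fields, even when a field contains the delimiter
* sometimes data uses a delimiter other than a comma (for example `|` or a tab); csv can work with these, and `csv.Sniffer` can try to detect them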
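To illustrate the quoting and delimiter points, here is a sketch comparing a plain `split(',')` with `csv.reader` on a made-up record whose author field contains a comma, plus a pipe-delimited line read with `delimiter='|'`. As the `help(csv.reader)` text says, a list of strings works as input just like a file:

```python
import csv

# made-up record: the author field contains a comma, so it is quoted
line = 'Collected Works,"Dinkar, Ramdhari Singh",299\n'

naive = line.strip().split(',')       # plain split breaks the quoted field apart
parsed = next(csv.reader([line]))     # csv understands the quotes

print(naive)   # ['Collected Works', '"Dinkar', ' Ramdhari Singh"', '299']
print(parsed)  # ['Collected Works', 'Dinkar, Ramdhari Singh', '299']

# the same reader handles a different delimiter via the delimiter argument
piped = next(csv.reader(['Manas|Vivek Dutta Mishra|199'], delimiter='|'))
print(piped)   # ['Manas', 'Vivek Dutta Mishra', '199']
```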

In [16]:
def copy_csv(source, target, source_delimiter=',', target_delimiter=','):
    # newline='' lets the csv module handle line endings itself
    with open(source, newline='') as sfile, open(target, 'w', newline='') as tfile:
        reader = csv.reader(sfile, delimiter=source_delimiter)
        writer = csv.writer(tfile, delimiter=target_delimiter)
        for row in reader:
            writer.writerow(row)   # re-write each row using the target delimiter

In [17]:
copy_csv('books.csv', 'books2.csv', target_delimiter="|")
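When we don't know the delimiter in advance, `csv.Sniffer` can try to guess it. A sketch, assuming the pipe-delimited text below resembles a file like books2.csv; restricting the candidates with `delimiters="|,"` is our own choice to make the guess more reliable, and sniffing is heuristic, so it can fail on unusual data:

```python
import csv
import io

# sample pipe-delimited text (abbreviated, for illustration)
sample = (
    "title|author|price\n"
    "Manas|Vivek Dutta Mishra|199\n"
    "Asura|Anant Neelkanthan|499\n"
)

# guess the dialect, considering only '|' and ',' as candidate delimiters
dialect = csv.Sniffer().sniff(sample, delimiters="|,")
print(dialect.delimiter)  # |

# the sniffed dialect can be passed straight to csv.reader
rows = list(csv.reader(io.StringIO(sample), dialect))
print(rows[1])  # ['Manas', 'Vivek Dutta Mishra', '199']
```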

### The csv module also has DictReader and DictWriter, which read and write rows as dictionaries

In [28]:
def read_csv(path, delimiter=","):
    with open(path, newline='') as file:
        # DictReader takes the field names from the first row automatically
        reader = csv.DictReader(file, delimiter=delimiter)
        # each row becomes a dict mapping field name -> value (all values are strings)
        return list(reader)

In [29]:
books=read_csv('books.csv')
print(books)

[{'title': 'The Accursed God', 'author': ' Vivek Dutta Mishra', 'price': ' 299', 'rating': ' 4.6'}, {'title': 'Rashmirathi', 'author': ' Ramdhari Singh Dinkar', 'price': ' 109', 'rating': ' 4.8'}, {'title': 'Kurukshetra', 'author': 'Ramdhari Singh Dinkar', 'price': ' 99', 'rating': ' 4.1'}, {'title': 'Manas', 'author': ' Vivek Dutta Mishra', 'price': '199', 'rating': ' 4.2'}, {'title': 'Asura', 'author': ' Anant Neelkanthan', 'price': '499', 'rating': '3.6'}]
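DictReader, like `csv.reader`, returns every value as a string, so `price` and `rating` above are still text such as `' 299'`. A minimal sketch of converting them, assuming every row holds valid numbers; `books_typed` is a hypothetical name and `io.StringIO` stands in for the file:

```python
import csv
import io

data = io.StringIO(
    "title,author,price,rating\n"
    "Manas, Vivek Dutta Mishra,199, 4.2\n"
    "Asura, Anant Neelkanthan,499,3.6\n"
)

books_typed = []
# extra keyword arguments such as skipinitialspace are passed on to the underlying reader
for row in csv.DictReader(data, skipinitialspace=True):
    # convert the numeric fields explicitly; csv never does this for us
    row['price'] = int(row['price'])
    row['rating'] = float(row['rating'])
    books_typed.append(row)

print(books_typed[0])
# {'title': 'Manas', 'author': 'Vivek Dutta Mishra', 'price': 199, 'rating': 4.2}
```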


In [30]:
for book in books:
    for key,value in book.items():
        print(key,value)
    print()

title The Accursed God
author  Vivek Dutta Mishra
price  299
rating  4.6

title Rashmirathi
author  Ramdhari Singh Dinkar
price  109
rating  4.8

title Kurukshetra
author Ramdhari Singh Dinkar
price  99
rating  4.1

title Manas
author  Vivek Dutta Mishra
price 199
rating  4.2

title Asura
author  Anant Neelkanthan
price 499
rating 3.6
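The heading above also mentions DictWriter. A short sketch of writing a list of dictionaries back to CSV; `io.StringIO` stands in for a real file, which would normally be opened with `open(path, 'w', newline='')`, and the two records are abbreviated from the data above:

```python
import csv
import io

books_out = [
    {'title': 'Manas', 'author': 'Vivek Dutta Mishra', 'price': 199, 'rating': 4.2},
    {'title': 'Asura', 'author': 'Anant Neelkanthan', 'price': 499, 'rating': 3.6},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['title', 'author', 'price', 'rating'])
writer.writeheader()          # first row: the field names
writer.writerows(books_out)   # one row per dictionary, columns in fieldnames order

print(buffer.getvalue())
```

`fieldnames` fixes the column order, so the dictionaries' own key order does not matter.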



#### JSON file

* in modern programming, JSON is one of the most important data-exchange formats
* so it is important to be able to read and write JSON files
* Python provides the built-in **json** module for working with JSON data
* it allows us to
    * read JSON into Python objects
        * by default, JSON objects become dicts and JSON arrays become lists
    * write Python lists and dictionaries as JSON

* there are four important functions

#### load() and loads()

* both read JSON data and return Python objects
* `load` reads from a file
* `loads` reads from a string


#### dump() and dumps()

* both write Python objects as JSON
* `dump` writes to a file
* `dumps` writes to a string
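A short sketch showing all four functions on a made-up record; `io.StringIO` stands in for a real file in the `dump`/`load` half:

```python
import io
import json

text = '{"title": "Sample", "price": 199, "rating": 4.2, "tags": ["demo"], "available": true}'

book = json.loads(text)                 # string -> dict (JSON true becomes Python True)
print(book["tags"], book["available"])  # ['demo'] True

back = json.dumps(book)                 # dict -> string
print(back == text)                     # True

buffer = io.StringIO()
json.dump(book, buffer)                 # dict -> file-like object
buffer.seek(0)                          # rewind before reading it back
same = json.load(buffer)                # file-like object -> dict
print(same == book)                     # True
```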



In [31]:
import json

#### We can convert our `books` list of dictionaries to a JSON string

In [34]:
json_str= json.dumps(books,indent=4)
print(json_str)

[
    {
        "title": "The Accursed God",
        "author": " Vivek Dutta Mishra",
        "price": " 299",
        "rating": " 4.6"
    },
    {
        "title": "Rashmirathi",
        "author": " Ramdhari Singh Dinkar",
        "price": " 109",
        "rating": " 4.8"
    },
    {
        "title": "Kurukshetra",
        "author": "Ramdhari Singh Dinkar",
        "price": " 99",
        "rating": " 4.1"
    },
    {
        "title": "Manas",
        "author": " Vivek Dutta Mishra",
        "price": "199",
        "rating": " 4.2"
    },
    {
        "title": "Asura",
        "author": " Anant Neelkanthan",
        "price": "499",
        "rating": "3.6"
    }
]


### Now this information can be saved to a file with `json.dump`

In [37]:
def save_json(data, path):
    with open(path, "w") as file:
        json.dump(data,file,indent=4)

In [38]:
save_json(books, 'books.json')

### Reading the JSON file

In [39]:
def read_json(path):
    with open(path) as file:
        data = json.load(file)
        return data

In [40]:
books2= read_json('books.json')


In [41]:
print(books2)

[{'title': 'The Accursed God', 'author': ' Vivek Dutta Mishra', 'price': ' 299', 'rating': ' 4.6'}, {'title': 'Rashmirathi', 'author': ' Ramdhari Singh Dinkar', 'price': ' 109', 'rating': ' 4.8'}, {'title': 'Kurukshetra', 'author': 'Ramdhari Singh Dinkar', 'price': ' 99', 'rating': ' 4.1'}, {'title': 'Manas', 'author': ' Vivek Dutta Mishra', 'price': '199', 'rating': ' 4.2'}, {'title': 'Asura', 'author': ' Anant Neelkanthan', 'price': '499', 'rating': '3.6'}]


### json can't serialize user-defined objects

* json works with built-in types such as dict, list, str, int, float, bool and None
* but it can't serialize instances of user-defined classes
* it doesn't know how to convert them to JSON
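A small sketch of this rule: built-in types are converted (a tuple becomes a JSON array, `True` becomes `true`, `None` becomes `null`), while an instance of a user-defined class raises `TypeError`. `Point` is a hypothetical class used only for illustration:

```python
import json

class Point:  # hypothetical user-defined class
    def __init__(self, x, y):
        self.x = x
        self.y = y

# built-in types convert: tuple -> array, True -> true, None -> null
print(json.dumps({"sides": (3, 4, 5), "right": True, "label": None}))
# {"sides": [3, 4, 5], "right": true, "label": null}

message = ""
try:
    json.dumps(Point(1, 2))
except TypeError as error:
    message = str(error)
print(message)  # Object of type Point is not JSON serializable
```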


In [42]:
import sys
sys.path.append('../libs')
import books as b

In [44]:
books= b.get_books()
b.print_books(books,"All Books")

                                          All Books                                          

---------------------------------------------------------------------------------------------
|           Title           |           Author          |      Price      |      Rating     |
---------------------------------------------------------------------------------------------
|     The Acccursed God     |     Vivek Dutta Mishra    |           299   |           4.6   |
|        Rashmirathi        |   Ramdhari Singh Dinkar   |           109   |           4.8   |
|           Asura           |        Neelkanthan        |           499   |           3.6   |
|           Manas           |     Vivek Dutta Mishra    |           199   |           4.5   |
|  One Night at Call Center |       Chetan Bhagat       |           399   |           3.9   |
|        Kuruksehtra        |   Ramdhari Singh Dinkar   |            99   |           4.1   |
-----------------------------------------------------------

#### We can't directly save this data to JSON (or even CSV)

In [46]:
json_str= json.dumps(books,indent=4)

TypeError: Object of type Book is not JSON serializable

### A Python feature: `__dict__`

* Python stores an object's instance attributes in a special dictionary inside the object
* this dictionary is available as `__dict__` (unless the class uses `__slots__`)

In [47]:
class Triangle:
    pass

t=Triangle()
t.__dict__

{}

In [48]:
t.s1=3
t.s2=4
t.s3=5

t.__dict__

{'s1': 3, 's2': 4, 's3': 5}

In [49]:
books[0].__dict__

{'title': 'The Acccursed God',
 'author': 'Vivek Dutta Mishra',
 'price': 299,
 'rating': 4.6}

### We can use this feature to save our Python objects to JSON

In [50]:
def save_object_list(objects, path):
    with open(path, 'w') as file:
        data=[ obj.__dict__ for obj in objects ]
        json.dump(data,file, indent=4)

In [51]:
save_object_list(books, 'books2.json')
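An alternative worth knowing: `json.dump` and `json.dumps` accept a `default` callable that is used for any object json can't serialize, so `default=vars` (which returns `obj.__dict__`) avoids building the list of dicts by hand. Reading the file back gives plain dicts, which can be turned into objects again if the constructor's parameter names match the attribute names. The `Book` dataclass below is a hypothetical stand-in; the real `Book` in `../libs` may have a different constructor:

```python
import json
from dataclasses import dataclass

@dataclass
class Book:  # hypothetical stand-in for the Book class from ../libs
    title: str
    author: str
    price: int
    rating: float

books_demo = [
    Book('Manas', 'Vivek Dutta Mishra', 199, 4.2),
    Book('Asura', 'Anant Neelkanthan', 499, 3.6),
]

# default=vars is called for each Book; vars(obj) returns obj.__dict__
json_str = json.dumps(books_demo, default=vars, indent=4)

# loading gives plain dicts; unpack each one into the constructor to rebuild objects
restored = [Book(**item) for item in json.loads(json_str)]
print(restored[0])  # Book(title='Manas', author='Vivek Dutta Mishra', price=199, rating=4.2)
```

The rebuild step assumes the JSON keys match the constructor's parameters exactly; extra or missing keys would raise `TypeError`.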