I/O
-----

This notebook explores how to load and save data in various formats.

### Text files

In [1]:
%%file ../data/animals.txt
name|species|age|weight
arun|cat|5|7.3
bob|bird|2|1.5
coco|cat|2|5.5
dumbo|elephant|23|454
elmo|dog|5|11
fido|dog|3|24.5
gumba|bird|2|2.7

Overwriting ../data/animals.txt


### Loading a text file

#### Iterating over the file object to read one line at a time

Note the use of the `with` context manager: it closes the file automatically once the `with` block is exited, which avoids leaking system resources.

In [2]:
with open('../data/animals.txt') as f:
    for line in f:
        print(line, end='')

name|species|age|weight
arun|cat|5|7.3
bob|bird|2|1.5
coco|cat|2|5.5
dumbo|elephant|23|454
elmo|dog|5|11
fido|dog|3|24.5
gumba|bird|2|2.7
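
To make the role of `with` concrete, the following sketch shows roughly what the block above does when written by hand with `try`/`finally`. It uses a temporary file (an assumption, so that the example runs anywhere) instead of `../data/animals.txt`.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
# (the temporary path stands in for ../data/animals.txt).
path = os.path.join(tempfile.mkdtemp(), 'animals.txt')
with open(path, 'w') as f:
    f.write('name|species\narun|cat\n')

# Manual version: the file must be closed explicitly, even on error.
f_manual = open(path)
try:
    manual_lines = [line for line in f_manual]
finally:
    f_manual.close()

# Equivalent version using the context manager.
with open(path) as f_with:
    with_lines = [line for line in f_with]

print(f_manual.closed, f_with.closed)  # both files are closed at this point
print(manual_lines == with_lines)
```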

#### Reading into memory as a single string

In [3]:
with open('../data/animals.txt') as f:
    text = f.read()
print(text)

name|species|age|weight
arun|cat|5|7.3
bob|bird|2|1.5
coco|cat|2|5.5
dumbo|elephant|23|454
elmo|dog|5|11
fido|dog|3|24.5
gumba|bird|2|2.7


#### Reading into memory as a list of strings

In [4]:
with open('../data/animals.txt') as f:
    text = f.readlines()
print(text)

['name|species|age|weight\n', 'arun|cat|5|7.3\n', 'bob|bird|2|1.5\n', 'coco|cat|2|5.5\n', 'dumbo|elephant|23|454\n', 'elmo|dog|5|11\n', 'fido|dog|3|24.5\n', 'gumba|bird|2|2.7']
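
`readlines()` keeps the trailing newline on each line, as the output above shows. If the newlines are not wanted, either of the following is a common alternative. This is a sketch that uses a temporary file as a stand-in for `../data/animals.txt`.

```python
import os
import tempfile

# Temporary stand-in for ../data/animals.txt (an assumption for portability).
path = os.path.join(tempfile.mkdtemp(), 'animals.txt')
with open(path, 'w') as f:
    f.write('name|species|age|weight\narun|cat|5|7.3\nbob|bird|2|1.5\n')

# Option 1: read everything, then split on line boundaries.
with open(path) as f:
    lines = f.read().splitlines()

# Option 2: strip the newline from each line while iterating.
with open(path) as f:
    stripped = [line.rstrip('\n') for line in f]

# Splitting each line on the delimiter gives a list of fields per row.
rows = [line.split('|') for line in lines]
print(lines)
print(rows[1])
```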


#### Tabular data can also be read with numpy or pandas

In [5]:
import numpy as np

In [6]:
np.loadtxt('../data/animals.txt', dtype='object', delimiter='|')

array([["b'name'", "b'species'", "b'age'", "b'weight'"],
       ["b'arun'", "b'cat'", "b'5'", "b'7.3'"],
       ["b'bob'", "b'bird'", "b'2'", "b'1.5'"],
       ["b'coco'", "b'cat'", "b'2'", "b'5.5'"],
       ["b'dumbo'", "b'elephant'", "b'23'", "b'454'"],
       ["b'elmo'", "b'dog'", "b'5'", "b'11'"],
       ["b'fido'", "b'dog'", "b'3'", "b'24.5'"],
       ["b'gumba'", "b'bird'", "b'2'", "b'2.7'"]], dtype=object)
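
The `b'...'` wrappers in the array above suggest that the fields were read as bytes and then converted to their string representation. One possible way around this, assuming a NumPy version whose `loadtxt` accepts the `encoding` argument, is to request a string dtype and convert the numeric columns afterwards. The temporary file below is a stand-in for `../data/animals.txt`.

```python
import os
import tempfile

import numpy as np

# Temporary stand-in for ../data/animals.txt (an assumption for portability).
path = os.path.join(tempfile.mkdtemp(), 'animals.txt')
with open(path, 'w') as f:
    f.write('name|species|age|weight\n'
            'arun|cat|5|7.3\n'
            'bob|bird|2|1.5\n'
            'dumbo|elephant|23|454\n')

# Read every field as text; the encoding is given explicitly to avoid bytes.
data = np.loadtxt(path, dtype=str, delimiter='|', encoding='utf-8')

header = data[0]                     # first row holds the column names
ages = data[1:, 2].astype(int)       # convert the age column to integers
weights = data[1:, 3].astype(float)  # convert the weight column to floats

print(header)
print(ages, weights)
```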

In [7]:
import pandas as pd

In [8]:
pd.read_table('../data/animals.txt', sep='|')

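| | name | species | age | weight |
|---|---|---|---|---|
| 0 | arun | cat | 5 | 7.3 |
| 1 | bob | bird | 2 | 1.5 |
| 2 | coco | cat | 2 | 5.5 |
| 3 | dumbo | elephant | 23 | 454.0 |
| 4 | elmo | dog | 5 | 11.0 |
| 5 | fido | dog | 3 | 24.5 |
| 6 | gumba | bird | 2 | 2.7 |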
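
For comparison, the same pipe-delimited layout can also be read with the standard library's `csv` module, which can be convenient when numpy or pandas is not installed. This is a sketch on a temporary stand-in file. Unlike pandas, `csv` leaves every field as a string, so the numeric conversion is done by hand here.

```python
import csv
import os
import tempfile

# Temporary stand-in for ../data/animals.txt (an assumption for portability).
path = os.path.join(tempfile.mkdtemp(), 'animals.txt')
with open(path, 'w') as f:
    f.write('name|species|age|weight\narun|cat|5|7.3\ndumbo|elephant|23|454\n')

# newline='' is what the csv documentation recommends when opening files.
with open(path, newline='') as f:
    reader = csv.DictReader(f, delimiter='|')
    # Copy each row and replace the numeric fields with converted values.
    records = [
        {**row, 'age': int(row['age']), 'weight': float(row['weight'])}
        for row in reader
    ]

print(records[0])
```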


### Saving a text file

In [9]:
s = """
name|species|age|weight
arun|cat|5|7.3
bob|bird|2|1.5
coco|cat|2|5.5
dumbo|elephant|23|454
elmo|dog|5|11
fido|dog|3|24.5
gumba|bird|2|2.7
"""

In [10]:
with open('../data/animals2.txt', 'w') as f:
    f.write(s)

In [11]:
!cat '../data/animals2.txt'


name|species|age|weight
arun|cat|5|7.3
bob|bird|2|1.5
coco|cat|2|5.5
dumbo|elephant|23|454
elmo|dog|5|11
fido|dog|3|24.5
gumba|bird|2|2.7
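
The blank first line in the output above comes from the newline that immediately follows the opening `"""` in the definition of `s`. If that is not wanted, one option is to strip it before writing, as in this sketch (the temporary path is a stand-in for `../data/animals2.txt`, and the string is shortened).

```python
import os
import tempfile

s = """
name|species|age|weight
arun|cat|5|7.3
bob|bird|2|1.5
"""

# Temporary stand-in for ../data/animals2.txt (an assumption for portability).
path = os.path.join(tempfile.mkdtemp(), 'animals2.txt')

# lstrip('\n') drops only the leading newline; the final newline is kept.
with open(path, 'w') as f:
    f.write(s.lstrip('\n'))

with open(path) as f:
    saved = f.read()
print(saved, end='')
```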


### Web resources

### Reading an unformatted web page

In [12]:
import requests

In [13]:
# Only download once - Project Gutenberg will block you if you do this repeatedly

try:
    with open('../data/Ulysses.txt') as f:
        text = f.read()
except IOError:
    url = 'http://www.gutenberg.org/cache/epub/4300/pg4300.txt'
    resp = requests.get(url)
    text = resp.text
    with open('../data/Ulysses.txt', 'w') as f:
        f.write(text)
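
The download-once pattern above can be wrapped in a small helper. The function name `load_cached` and its `fetch` argument are hypothetical, introduced only for illustration. Passing the download step in as a callable keeps the caching logic independent of `requests`, and lets the sketch run without network access.

```python
import os
import tempfile


def load_cached(path, fetch):
    """Return the text stored at path, calling fetch() only if it is missing."""
    try:
        with open(path, encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        text = fetch()
        with open(path, 'w', encoding='utf-8') as f:
            f.write(text)
        return text


# Demonstration with a fake fetch function that records each call.
calls = []


def fake_fetch():
    calls.append(1)
    return 'Stately, plump Buck Mulligan'


cache_path = os.path.join(tempfile.mkdtemp(), 'Ulysses.txt')
first = load_cached(cache_path, fake_fetch)   # file missing: fetch is called
second = load_cached(cache_path, fake_fetch)  # file present: read from disk
print(len(calls))
```

With `requests`, the call would look like `load_cached('../data/Ulysses.txt', lambda: requests.get(url).text)`.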

In [14]:
print(text[:1000])

﻿The Project Gutenberg EBook of Ulysses, by James Joyce

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org


Title: Ulysses

Author: James Joyce

Posting Date: August 1, 2008 [EBook #4300]
Release Date: July, 2003
[Last updated: November 17, 2011]

Language: English


*** START OF THIS PROJECT GUTENBERG EBOOK ULYSSES ***




Produced by Col Choat





ULYSSES

by James Joyce




-- I --

Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of
lather on which a mirror and a razor lay crossed. A yellow dressinggown,
ungirdled, was sustained gently behind him on the mild morning air. He
held the bowl aloft and intoned:

--_Introibo ad altare Dei_.

Halted, he peered down the dark winding stairs and called out coarsely:

--Come up, Kinch! Come up, you fearful jesuit!

Solemnly h

### Getting a table from a URL

It might be necessary to install an HTML parser (`lxml`, or `beautifulsoup4` together with `html5lib`) before `pd.read_html` works.

In [27]:
url = 'http://www.marketwatch.com/investing/stock/aapl/financials'
pd.read_html(url, match="Fiscal year is October-September. All values USD millions")[0]

| | Fiscal year is October-September. All values USD millions. | 2011 | 2012 | 2013 | 2014 | 2015 | 5-year trend |
|---|---|---|---|---|---|---|---|
| 0 | Sales/Revenue | 108.6B | 155.97B | 170.87B | 183.24B | 231.28B | |
| 1 | Sales Growth | - | 43.62% | 9.55% | 7.24% | 26.22% | |
| 2 | Cost of Goods Sold (COGS) incl. D&A | 64.08B | 87.92B | 107.24B | 112.55B | 142.26B | |
| 3 | COGS excluding D&A | 62.26B | 84.64B | 100.48B | 104.61B | 131B | |
| 4 | Depreciation & Amortization Expense | 1.81B | 3.28B | 6.76B | 7.95B | 11.26B | |
| 5 | Depreciation | 1.62B | 2.6B | 5.8B | 6.85B | 9.96B | |
| 6 | Amortization of Intangibles | 192M | 677M | 960M | 1.1B | 1.3B | |
| 7 | COGS Growth | - | 37.21% | 21.98% | 4.96% | 26.39% | |
| 8 | Gross Income | 44.52B | 68.06B | 63.63B | 70.69B | 89.03B | |
| 9 | Gross Income Growth | - | 52.86% | -6.51% | 11.10% | 25.94% | |
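
The values in this table arrive as strings such as `108.6B`, `192M`, `43.62%`, or `-`, so they need converting before any arithmetic. The helper `parse_amount` below is hypothetical and assumes only the suffixes visible in the table above (`B`, `M`, `%`), with `-` treated as a missing value.

```python
def parse_amount(value):
    """Convert strings like '108.6B', '192M', '43.62%' or '-' to floats."""
    value = value.strip()
    if value in ('', '-'):
        return None                     # treat '-' as a missing value
    if value.endswith('%'):
        return float(value[:-1]) / 100  # percentage to fraction
    multipliers = {'B': 1e9, 'M': 1e6}  # assumed meaning: billions, millions
    if value[-1] in multipliers:
        return float(value[:-1]) * multipliers[value[-1]]
    return float(value)


for raw in ['108.6B', '192M', '-6.51%', '-']:
    print(raw, parse_amount(raw))
```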
