# Data Loading, Storage, and File Formats

Input and output typically fall into a few main categories: reading text files and other more efficient on-disk formats, loading data from databases, and interacting with network sources like web APIs.

## Reading and Writing Data in Text Format

pandas features a number of functions for reading tabular data as a DataFrame object. __read_csv__ and __read_table__ are likely the ones you'll use the most.

Because real-world data can be messy, some of the data loading functions (especially __read_csv__) have grown very complex in their options over time. It's normal to feel overwhelmed by the number of different parameters (__read_csv__ has over 50).

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('../examples/ex1.csv')

In [3]:
df

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [4]:
pd.read_table('../examples/ex1.csv', sep=',')

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo
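
The two calls above differ only in their default delimiter: __read_csv__ assumes a comma, while __read_table__ assumes a tab unless `sep` is given. The sketch below checks this on an in-memory buffer. The file `ex1.csv` is not reproduced in these notes, so the text in `csv_text` is an assumption reconstructed from the output shown above.

```python
import io

import pandas as pd

# Assumed contents of ex1.csv, reconstructed from the displayed output.
csv_text = (
    "a,b,c,d,message\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# read_csv splits on commas by default.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table splits on tabs by default, so the comma must be passed explicitly.
df_table = pd.read_table(io.StringIO(csv_text), sep=",")

# Both calls should parse the text into identical DataFrames.
print(df_csv.equals(df_table))
```

Using `io.StringIO` here is only a convenience for a self-contained example; a file path works the same way.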


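A file will not always have a header row. You can let pandas assign default column names with `header=None`, or supply the names yourself with `names`: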

In [8]:
pd.read_csv('../examples/ex2.csv', header=None)

Unnamed: 0,0,1,2,3,4
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [9]:
pd.read_csv('../examples/ex2.csv', names=['a','b','c','d','message'])

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo


In [10]:
names = ['a','b','c','d','message']

In [11]:
pd.read_csv('../examples/ex2.csv', names=names, index_col='message')

message,a,b,c,d
hello,1,2,3,4
world,5,6,7,8
foo,9,10,11,12
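
The three header-related options used above can be compared side by side. This is a minimal sketch on an in-memory buffer; the contents of `raw` are an assumption (the same rows as the earlier example with the header line removed), since `ex2.csv` itself is not shown.

```python
import io

import pandas as pd

# Assumed contents of ex2.csv: data rows only, no header line.
raw = "1,2,3,4,hello\n5,6,7,8,world\n9,10,11,12,foo\n"
names = ["a", "b", "c", "d", "message"]

# header=None: pandas assigns integer column labels 0..4.
default_names = pd.read_csv(io.StringIO(raw), header=None)

# names=...: supply the labels; the first line is then treated as data.
named = pd.read_csv(io.StringIO(raw), names=names)

# index_col="message": move that column out of the data and into the row index.
indexed = pd.read_csv(io.StringIO(raw), names=names, index_col="message")

print(list(default_names.columns))  # [0, 1, 2, 3, 4]
print(list(indexed.index))          # ['hello', 'world', 'foo']
```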


In [13]:
parsed = pd.read_csv('../examples/csv_mindex.csv', 
                    index_col=['key1','key2'])

In [14]:
parsed

key1,key2,value1,value2
one,a,1,2
one,b,3,4
one,c,5,6
one,d,7,8
two,a,9,10
two,b,11,12
two,c,13,14
two,d,15,16
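
Passing a list of column names to `index_col` builds a hierarchical index from those columns. The sketch below illustrates this on a shortened, assumed version of `csv_mindex.csv` (four rows instead of eight) and shows how a single value can then be looked up by a tuple of keys.

```python
import io

import pandas as pd

# Shortened, assumed version of csv_mindex.csv based on the output above.
text = (
    "key1,key2,value1,value2\n"
    "one,a,1,2\n"
    "one,b,3,4\n"
    "two,a,9,10\n"
    "two,b,11,12\n"
)

# A list of names in index_col produces a two-level MultiIndex.
parsed = pd.read_csv(io.StringIO(text), index_col=["key1", "key2"])

print(parsed.index.nlevels)                # 2
print(parsed.loc[("two", "b"), "value1"])  # 11
```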


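Some files separate fields with a variable amount of whitespace rather than a comma. In that case, pass a regular expression as the separator: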

In [15]:
result = pd.read_table('../examples/ex3.txt', sep=r'\s+')  # \s+ matches one or more whitespace characters

In [16]:
result

Unnamed: 0,A,B,C
aaa,-0.264438,-1.026059,-0.6195
bbb,0.927272,0.302904,-0.032399
ccc,-0.264273,-0.386314,-0.217601
ddd,-0.871858,-0.348382,1.100491
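
In the output above, the labels `aaa` through `ddd` became the index. When the header line has one fewer field than the data rows, pandas infers that the first column is the index. The following sketch reproduces this with an assumed two-row excerpt of `ex3.txt`; the exact spacing of the original file is not known.

```python
import io

import pandas as pd

# Assumed excerpt of ex3.txt: three header fields, four fields per data row,
# separated by a varying number of spaces.
text = (
    "A         B         C\n"
    "aaa -0.264438 -1.026059 -0.619500\n"
    "bbb  0.927272  0.302904 -0.032399\n"
)

# A raw string avoids Python treating "\s" as an (invalid) escape sequence.
result = pd.read_csv(io.StringIO(text), sep=r"\s+")

print(list(result.columns))  # ['A', 'B', 'C']
print(list(result.index))    # ['aaa', 'bbb']
```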


In [18]:
pd.read_csv('../examples/ex4.csv', skiprows=[0,2,3])

Unnamed: 0,a,b,c,d,message
0,1,2,3,4,hello
1,5,6,7,8,world
2,9,10,11,12,foo
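
The `skiprows` argument takes zero-based line numbers to drop before parsing. As a sketch of why lines 0, 2, and 3 are skipped above, assume the file has comment lines at those positions. The comment text below is hypothetical, since `ex4.csv` is not shown in these notes.

```python
import io

import pandas as pd

# Hypothetical layout for ex4.csv: comment lines at positions 0, 2, and 3.
text = (
    "# hypothetical comment line\n"
    "a,b,c,d,message\n"
    "# another hypothetical comment\n"
    "# and one more\n"
    "1,2,3,4,hello\n"
    "5,6,7,8,world\n"
    "9,10,11,12,foo\n"
)

# Lines 0, 2, and 3 are dropped, so line 1 becomes the header row.
df = pd.read_csv(io.StringIO(text), skiprows=[0, 2, 3])

print(list(df.columns))  # ['a', 'b', 'c', 'd', 'message']
print(len(df))           # 3
```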


In [19]:
result = pd.read_csv('../examples/ex5.csv')

In [20]:
result

Unnamed: 0,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo


In [21]:
pd.isnull(result)

Unnamed: 0,something,a,b,c,d,message
0,False,False,False,False,False,True
1,False,False,False,True,False,False
2,False,False,False,False,False,False


In [22]:
result = pd.read_csv('../examples/ex5.csv', na_values=['NULL'])

In [23]:
result

Unnamed: 0,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,two,5,6,,8,world
2,three,9,10,11.0,12,foo


In [24]:
sentinels = {'message': ['foo','NA'], 'something': ['two']}

In [25]:
pd.read_csv('../examples/ex5.csv', na_values=sentinels)

Unnamed: 0,something,a,b,c,d,message
0,one,1,2,3.0,4,
1,,5,6,,8,world
2,three,9,10,11.0,12,
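
A dictionary passed to `na_values` applies each list of sentinels only to the named column, in addition to the default markers pandas already recognizes (such as an empty field or `NA`). The sketch below counts missing values before and after, using assumed contents for `ex5.csv` inferred from the outputs above.

```python
import io

import pandas as pd

# Assumed contents of ex5.csv, inferred from the displayed outputs.
text = (
    "something,a,b,c,d,message\n"
    "one,1,2,3,4,NA\n"
    "two,5,6,,8,world\n"
    "three,9,10,11,12,foo\n"
)

# Default parsing: the empty field and the string "NA" become NaN.
base = pd.read_csv(io.StringIO(text))

# Per-column sentinels: "foo" is missing only in message, "two" only in something.
sentinels = {"message": ["foo", "NA"], "something": ["two"]}
masked = pd.read_csv(io.StringIO(text), na_values=sentinels)

print(base.isna().sum().sum())    # 2
print(masked.isna().sum().sum())  # 4
```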


### Reading Text Files in Pieces

### Writing Data to Text Format

### Working with Delimited Formats

### XML and HTML: Web Scraping

## Binary Data Formats

### Using HDF5 Format

### Reading Microsoft Excel Files

## Interacting with Web APIs

## Interacting with Databases