
# Reading Data into Pandas
Pandas is a versatile package that can read and write a variety of formats:

- csv
- excel
- hdf
- sql
- json
- msgpack
- html
- gbq
- stata
- clipboard
- pickle

This lesson deals with CSV and Excel, as they are the more common file formats to import.
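As a rough illustration of the read/write pairing, the sketch below writes a small DataFrame to CSV text and reads it back. The two-row table is made up for the example and is not the housing file; `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# A tiny made-up table, used only for illustration.
df = pd.DataFrame({"Boro": ["Queens", "Bronx"], "Units": [14, 53]})

# Writers are DataFrame methods named to_<format>.
csv_text = df.to_csv(index=False)

# Readers are top-level functions named read_<format>.
round_trip = pd.read_csv(io.StringIO(csv_text))

print(round_trip)
```

The other formats in the list generally follow the same naming pattern, although some of them need extra optional dependencies to be installed.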

In [16]:
import pandas as pd

# let's look at the housing data:
data = '../../../data/data/housingNew.csv'

with open(data) as f:
    print(f.read())


"Neighborhood","Class","Units","YearBuilt","SqFt","Income","IncomePerSqFt","Expense","ExpensePerSqFt","NetIncome","Value","ValuePerSqFt","Boro"
"ELMHURST","RR-CONDOMINIUM",14,2006,8400,184800,22,57120,6.8,127680,868000,103.33,"Queens"
"OCEAN HILL","R2-CONDOMINIUM",40,1900,54600,683046,12.51,348353,6.38,334693,2242996,41.08,"Brooklyn"
"GREENWICH VILLAGE-WEST","R9-CONDOMINIUM",169,1925,100633,2727154,27.1,1106963,11,1620191,11693000,116.19,"Manhattan"
"MIDTOWN EAST","R4-CONDOMINIUM",100,1970,60900,2142462,35.18,673554,11.06,1468908,10864000,178.39,"Manhattan"
"WILLIAMSBURG-CENTRAL","R4-CONDOMINIUM",14,2004,25452,463735,18.22,190131,7.47,273604,1969004,77.36,"Brooklyn"
"KINGSBRIDGE/JEROME PARK","R4-CONDOMINIUM",53,2005,56512,1186752,21,534604,9.46,652148,4792003,84.8,"Bronx"
"HARLEM-CENTRAL","RR-CONDOMINIUM",123,2004,99275,2216811,22.33,764418,7.7,1452393,10503000,105.8,"Manhattan"
"TRIBECA","R4-CONDOMINIUM",107,1986,87479,3637377,41.58,1061120,12.13,2576257,19449002,222.33,"Manhattan"
"M

## CSV

In [17]:
# the basic way to read in a csv file
pd.read_csv(data)

,Neighborhood,Class,Units,YearBuilt,SqFt,Income,IncomePerSqFt,Expense,ExpensePerSqFt,NetIncome,Value,ValuePerSqFt,Boro
0,ELMHURST,RR-CONDOMINIUM,14,2006,8400,184800,22.00,57120,6.80,127680,868000,103.33,Queens
1,OCEAN HILL,R2-CONDOMINIUM,40,1900,54600,683046,12.51,348353,6.38,334693,2242996,41.08,Brooklyn
2,GREENWICH VILLAGE-WEST,R9-CONDOMINIUM,169,1925,100633,2727154,27.10,1106963,11.00,1620191,11693000,116.19,Manhattan
3,MIDTOWN EAST,R4-CONDOMINIUM,100,1970,60900,2142462,35.18,673554,11.06,1468908,10864000,178.39,Manhattan
4,WILLIAMSBURG-CENTRAL,R4-CONDOMINIUM,14,2004,25452,463735,18.22,190131,7.47,273604,1969004,77.36,Brooklyn
5,KINGSBRIDGE/JEROME PARK,R4-CONDOMINIUM,53,2005,56512,1186752,21.00,534604,9.46,652148,4792003,84.80,Bronx
6,HARLEM-CENTRAL,RR-CONDOMINIUM,123,2004,99275,2216811,22.33,764418,7.70,1452393,10503000,105.80,Manhattan
7,TRIBECA,R4-CONDOMINIUM,107,1986,87479,3637377,41.58,1061120,12.13,2576257,19449002,222.33,Manhattan
8,MORRISANIA/LONGWOOD,R9-CONDOMINIUM,110,2005,99240,1271264,12.81,691703,6.97,579561,3870000,39.00,Bronx
9,UPPER WEST SIDE (79-96),R4-CONDOMINIUM,22,1910,28436,974217,34.26,246256,8.66,727961,5393999,189.69,Manhattan


You can also use the 'index_col' parameter to specify which column supplies the row index.
It accepts either a column name or the integer position of a column in your data.

For example:
index_col=0
index_col='date'
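A minimal sketch of both forms, assuming a small in-memory CSV built from two rows of the housing output (only three columns are kept, to keep the example short). `io.StringIO` stands in for the file path.

```python
import io

import pandas as pd

# Two rows and three columns taken from the housing data, for illustration only.
csv_text = """Neighborhood,Class,Units
ELMHURST,RR-CONDOMINIUM,14
OCEAN HILL,R2-CONDOMINIUM,40
"""

# Select the index column by integer position...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)

# ...or by column name; both pick the same column here.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="Neighborhood")

print(by_position)
print(by_name.index.name)
```

In both cases the chosen column is removed from the regular columns and becomes the index.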

You can also pass a list to create a hierarchical index.
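As a small sketch of the list form, the example below uses column names only and the same kind of in-memory sample (three rows and three columns taken from the housing output, chosen for illustration).

```python
import io

import pandas as pd

# Three rows from the housing data, trimmed to three columns for illustration.
csv_text = """Neighborhood,Class,Units
ELMHURST,RR-CONDOMINIUM,14
OCEAN HILL,R2-CONDOMINIUM,40
TRIBECA,R4-CONDOMINIUM,107
"""

# A list of columns produces a MultiIndex with one level per column.
multi = pd.read_csv(io.StringIO(csv_text), index_col=["Neighborhood", "Class"])

print(multi.index.names)
print(multi.index.nlevels)
```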

You can also use read_csv() to read other delimited files (e.g., TSV) by setting the 'sep' parameter.
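For instance, a tab-separated file can be read by passing `sep="\t"`. The sketch below assumes a small in-memory TSV string built from two housing rows rather than a real file.

```python
import io

import pandas as pd

# Tab-separated sample text; the rows are taken from the housing data for illustration.
tsv_text = (
    "Neighborhood\tUnits\tBoro\n"
    "ELMHURST\t14\tQueens\n"
    "TRIBECA\t107\tManhattan\n"
)

# sep tells read_csv which delimiter separates the fields.
tsv_df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(tsv_df)
```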

In [18]:
pd.read_csv(data, index_col=[0, 'Class'])

Neighborhood,Class,Units,YearBuilt,SqFt,Income,IncomePerSqFt,Expense,ExpensePerSqFt,NetIncome,Value,ValuePerSqFt,Boro
ELMHURST,RR-CONDOMINIUM,14,2006,8400,184800,22.00,57120,6.80,127680,868000,103.33,Queens
OCEAN HILL,R2-CONDOMINIUM,40,1900,54600,683046,12.51,348353,6.38,334693,2242996,41.08,Brooklyn
GREENWICH VILLAGE-WEST,R9-CONDOMINIUM,169,1925,100633,2727154,27.10,1106963,11.00,1620191,11693000,116.19,Manhattan
MIDTOWN EAST,R4-CONDOMINIUM,100,1970,60900,2142462,35.18,673554,11.06,1468908,10864000,178.39,Manhattan
WILLIAMSBURG-CENTRAL,R4-CONDOMINIUM,14,2004,25452,463735,18.22,190131,7.47,273604,1969004,77.36,Brooklyn
KINGSBRIDGE/JEROME PARK,R4-CONDOMINIUM,53,2005,56512,1186752,21.00,534604,9.46,652148,4792003,84.80,Bronx
HARLEM-CENTRAL,RR-CONDOMINIUM,123,2004,99275,2216811,22.33,764418,7.70,1452393,10503000,105.80,Manhattan
TRIBECA,R4-CONDOMINIUM,107,1986,87479,3637377,41.58,1061120,12.13,2576257,19449002,222.33,Manhattan
MORRISANIA/LONGWOOD,R9-CONDOMINIUM,110,2005,99240,1271264,12.81,691703,6.97,579561,3870000,39.00,Bronx
UPPER WEST SIDE (79-96),R4-CONDOMINIUM,22,1910,28436,974217,34.26,246256,8.66,727961,5393999,189.69,Manhattan


In [None]:
## Excel