### Reading a tabular data file into Pandas

In [1]:
import pandas as pd
%matplotlib inline

In [3]:
orders = pd.read_table('http://bit.ly/chiporders')
orders.head()

Unnamed: 0,order_id,quantity,item_name,choice_description,item_price
0,1,1,Chips and Fresh Tomato Salsa,,$2.39
1,1,1,Izze,[Clementine],$3.39
2,1,1,Nantucket Nectar,[Apple],$3.39
3,1,1,Chips and Tomatillo-Green Chili Salsa,,$2.39
4,2,2,Chicken Bowl,"[Tomatillo-Red Chili Salsa (Hot), [Black Beans...",$16.98


The file is parsed correctly because it matches the defaults of pandas.read_table: the fields are tab-separated and the first line holds the column names.
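As a minimal sketch of what those defaults mean, the snippet below parses a small in-memory tab-separated sample twice: once with read_table and no arguments, and once with read_csv with the separator and header row spelled out. The sample text and the variable names are made up for illustration and are not taken from the real file.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for the real file
tsv_text = "order_id\tquantity\titem_name\n1\t1\tIzze\n2\t2\tChicken Bowl\n"

# read_table defaults: tab separator, first line used as the column names
default_df = pd.read_table(io.StringIO(tsv_text))

# The same parse with the defaults written out explicitly via read_csv
explicit_df = pd.read_csv(io.StringIO(tsv_text), sep="\t", header=0)

# Both calls should produce the same DataFrame
print(default_df.equals(explicit_df))
```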

Let's use a data file that does not match the read_table defaults and see what happens.

In [5]:
mov_1 = pd.read_table('http://bit.ly/movieusers')
mov_1.head()

Unnamed: 0,1|24|M|technician|85711
0,2|53|F|other|94043
1,3|23|M|writer|32067
2,4|24|M|technician|43537
3,5|33|F|other|15213
4,6|42|M|executive|98101


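So, the data is not parsed as we would have liked: every line ends up in a single column, and the first record is used as the header. Let's use the sep and header parameters of read_table to tell pandas about the separator and the missing header row.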

Notice that I have used head() at the end to print only the first five rows.

In [8]:
pd.read_table('http://bit.ly/movieusers', sep = '|', header = None).head()

Unnamed: 0,0,1,2,3,4
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


This looks better.
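The effect of sep and header can be reproduced offline with a small in-memory sample. The two records below are copied from the output above. The variable names are hypothetical, and io.StringIO is used only to avoid a network call.

```python
import io
import pandas as pd

# Two pipe-separated records with no header line
pipe_text = "1|24|M|technician|85711\n2|53|F|other|94043\n"

# Defaults: no tabs are found, so each line is one field,
# and the first record is consumed as the column name
raw = pd.read_table(io.StringIO(pipe_text))
print(raw.shape)

# Telling pandas about the separator and the missing header row
parsed = pd.read_table(io.StringIO(pipe_text), sep="|", header=None)
print(parsed.shape)
print(list(parsed.columns))
```

With header=None, pandas numbers the columns starting from 0, which is why the output above shows 0 to 4 as column labels.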

Let's give the columns names.

In [9]:
user_cols = ['user', 'age', 'gender', 'occupation', 'zip_code']
users = pd.read_table('http://bit.ly/movieusers', sep = '|', header = None, names = user_cols)
users.head()

Unnamed: 0,user,age,gender,occupation,zip_code
0,1,24,M,technician,85711
1,2,53,F,other,94043
2,3,23,M,writer,32067
3,4,24,M,technician,43537
4,5,33,F,other,15213


This looks perfect.
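One detail worth knowing: according to the pandas documentation, passing names without a header argument behaves the same as header=None, so the first line is still treated as data. The sketch below checks this on an in-memory sample. The sample text and variable names are illustrative only.

```python
import io
import pandas as pd

# Two pipe-separated records with no header line
pipe_text = "1|24|M|technician|85711\n2|53|F|other|94043\n"
cols = ["user", "age", "gender", "occupation", "zip_code"]

# Explicit header=None together with names
with_header_none = pd.read_table(
    io.StringIO(pipe_text), sep="|", header=None, names=cols
)

# names alone: pandas treats the file as having no header row
names_only = pd.read_table(io.StringIO(pipe_text), sep="|", names=cols)

print(with_header_none.equals(names_only))
print(list(names_only.columns))
```

Keeping header=None in the call is still a reasonable choice, because it makes the intent obvious to the reader.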

Additional Information:

The skiprows and skipfooter parameters of read_table can be used to skip extra lines of text at the top or bottom of a file that are not part of the data.
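A minimal sketch of both parameters, assuming a file with one title line at the top and one note line at the bottom. The sample text is invented for illustration. skipfooter is not supported by the default C parser, so engine='python' is passed explicitly.

```python
import io
import pandas as pd

# Hypothetical file: a title line, two data records, and a trailing note
messy_text = (
    "Exported user list\n"
    "1|24|M|technician|85711\n"
    "2|53|F|other|94043\n"
    "End of file\n"
)

clean = pd.read_table(
    io.StringIO(messy_text),
    sep="|",
    header=None,
    skiprows=1,       # skip the title line at the top
    skipfooter=1,     # skip the note line at the bottom
    engine="python",  # skipfooter requires the Python parsing engine
)

print(clean.shape)
```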