## Reading in CSV files
The pd.read_csv() function reads data from a CSV file into a pandas DataFrame. It is similar to read_excel(), except that Excel files can have multiple sheets while a CSV file holds a single table. The function accepts the following arguments, among others:
* filepath_or_buffer: the location of the file, and the only required argument.
* sep: the separator; the default is a comma (',') for comma-separated values (CSV).
* header: the row number that contains the column headers.
* usecols: accepts a list of column names (or integer positions) so that only specific columns are read.
* nrows: accepts an integer giving the number of rows to read from the CSV file.
* index_col: the column (by position or name) to use as the row index.
* skiprows: accepts an integer n to skip the first n lines of the file, or a list of line numbers (starting at 0) to skip.
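
As a rough illustration of several of these parameters working together, the sketch below reads a small in-memory table instead of a file. The data, the column names (id, name, score), and the semicolon separator are all made up for this example and are not part of hw05_prob_01.csv.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data, invented for this sketch.
raw = io.StringIO(
    "id;name;score\n"
    "1;ann;3.5\n"
    "2;bob;4.0\n"
    "3;cy;2.5\n"
)

# sep sets the delimiter, usecols keeps two of the three columns,
# and nrows stops reading after the first two data rows.
df = pd.read_csv(raw, sep=";", usecols=["name", "score"], nrows=2)
print(df)
```

Wrapping the text in io.StringIO works because filepath_or_buffer accepts file-like objects as well as paths.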

In [None]:
import pandas as pd

The code below shows an example of read_csv() with only the one required argument and the default parameters.

In [38]:
pd.read_csv('hw05_prob_01.csv')

,A,B,C,x
0,a1,b0,C1,-6.240387
1,a1,b0,C1,-4.860661
2,a1,b0,C1,-5.066712
3,a1,b0,C1,-5.635156
4,a1,b0,C1,-4.893876
...,...,...,...,...
125,a2,b1,C2,1.155377
126,a2,b1,C2,2.818466
127,a2,b1,C2,3.294759
128,a2,b1,C2,2.398190


Below, setting index_col to 0 makes the zeroth column (A) the index.

In [37]:
pd.read_csv('hw05_prob_01.csv', index_col = 0)

A,B,C,x
a1,b0,C1,-6.240387
a1,b0,C1,-4.860661
a1,b0,C1,-5.066712
a1,b0,C1,-5.635156
a1,b0,C1,-4.893876
...,...,...,...
a2,b1,C2,1.155377
a2,b1,C2,2.818466
a2,b1,C2,3.294759
a2,b1,C2,2.398190
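
index_col can be given either as a column position or as a column name. The sketch below compares the two on a tiny made-up table; the values are invented for illustration, and only the column name A mirrors the file used above.

```python
import io

import pandas as pd

# Hypothetical data, invented for this sketch.
csv_text = "A,B,x\na1,b0,1.5\na2,b1,2.5\n"

# Select the index column by position...
by_position = pd.read_csv(io.StringIO(csv_text), index_col=0)
# ...or by its header name.
by_name = pd.read_csv(io.StringIO(csv_text), index_col="A")

print(by_position.index.name)       # name carried over from the header
print(by_position.equals(by_name))  # both calls build the same frame
```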


You can also limit the number of rows included. Below, only the first 5 rows are included.

In [39]:
pd.read_csv('hw05_prob_01.csv', nrows = 5)

,A,B,C,x
0,a1,b0,C1,-6.240387
1,a1,b0,C1,-4.860661
2,a1,b0,C1,-5.066712
3,a1,b0,C1,-5.635156
4,a1,b0,C1,-4.893876


You can also exclude rows. When skiprows is an integer n, the first n lines of the file are skipped, and that count includes the header line. Unless you also pass header=None, the first remaining row is used as the column headers. The output below shows this: the row a1, b0, C1, -4.8606610695201224 has become the header.

In [44]:
pd.read_csv('hw05_prob_01.csv', skiprows=2)

,a1,b0,C1,-4.8606610695201224
0,a1,b0,C1,-5.066712
1,a1,b0,C1,-5.635156
2,a1,b0,C1,-4.893876
3,a1,b0,C1,-6.626860
4,a1,b0,C1,-5.162203
...,...,...,...,...
123,a2,b1,C2,1.155377
124,a2,b1,C2,2.818466
125,a2,b1,C2,3.294759
126,a2,b1,C2,2.398190
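
To see how an integer skiprows interacts with header, the sketch below reads the same small table twice, once with the default header and once with header=None. The table is made up for this example.

```python
import io

import pandas as pd

# Hypothetical data, invented for this sketch: one header line, three data rows.
csv_text = "A,x\na1,1.0\na2,2.0\na3,3.0\n"

# skiprows=2 drops the header line and the first data row.
# With the default header, the next row ("a2,2.0") becomes the column labels.
default_header = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# With header=None, every remaining row is data and columns are numbered.
no_header = pd.read_csv(io.StringIO(csv_text), skiprows=2, header=None)

print(default_header)
print(no_header)
```

In the first frame a data row has been consumed as labels, so it holds one row fewer than the second.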


skiprows can also accept a list of line numbers to skip, counted from 0. This lets you leave out specific rows while keeping line 0, which contains the column headers.

In [48]:
pd.read_csv('hw05_prob_01.csv', skiprows=[3,4],header=0)

,A,B,C,x
0,a1,b0,C1,-6.240387
1,a1,b0,C1,-4.860661
2,a1,b0,C1,-4.893876
3,a1,b0,C1,-6.626860
4,a1,b0,C1,-5.162203
...,...,...,...,...
123,a2,b1,C2,1.155377
124,a2,b1,C2,2.818466
125,a2,b1,C2,3.294759
126,a2,b1,C2,2.398190
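
A small sketch of the list form of skiprows, again on made-up data. Line 0 is the header, so skipping lines 3 and 4 removes the third and fourth data rows and leaves the column names intact.

```python
import io

import pandas as pd

# Hypothetical data, invented for this sketch: line 0 is the header,
# lines 1 to 5 are data rows.
csv_text = "A,x\na1,1.0\na2,2.0\na3,3.0\na4,4.0\na5,5.0\n"

# Skip file lines 3 and 4 (the rows for a3 and a4); the header is kept.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[3, 4])
print(df)
```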


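When header=None is passed together with an integer skiprows, the columns are labeled with integers instead. Below, the first 20 lines of the file are skipped.

In [49]: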
pd.read_csv('hw05_prob_01.csv', skiprows=20,header=None)

,0,1,2,3
0,a1,b1,C1,-0.101313
1,a1,b1,C1,1.162981
2,a1,b1,C1,0.385787
3,a1,b1,C1,-0.655076
4,a1,b1,C1,-0.949339
...,...,...,...,...
106,a2,b1,C2,1.155377
107,a2,b1,C2,2.818466
108,a2,b1,C2,3.294759
109,a2,b1,C2,2.398190


### Question 1

True or False: In the function pd.read_csv(), the default value of the parameter 'header' is 1.
a) True
b) False

Answer: b) False. The default is 'infer', which behaves like header=0 when no column names are passed.
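
One way to check this yourself is to compare a default read against an explicit header=0 read. The table in this sketch is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical data, invented for this sketch.
csv_text = "A,x\na1,1.0\na2,2.0\n"

# Read once with the default header setting and once with header=0.
default_read = pd.read_csv(io.StringIO(csv_text))
explicit_read = pd.read_csv(io.StringIO(csv_text), header=0)

# Both reads treat the first line as the column headers.
print(default_read.equals(explicit_read))
```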

### Question 2

You don't want to include the first 20 rows of your CSV file. Which parameter would be most appropriate to achieve that result?
a) nrows
b) index_col
c) skiprows
d) sep

Answer: c) skiprows