# Reading data from a csv file

In most cases, we’ll read data from a file. The most commonly used file formats are csv, excel, parquet, and json. In this course, we’ll use csv files. The pandas <font color ='green'>read_csv</font> function creates a <font color ='red'>DataFrame</font> from a csv file. Pandas provides several other functions for reading data in files with different formats, such as <font color ='green'>read_json, read_parquet, read_excel</font>, and so on.

The code below demonstrates how to use the <font color ='red'>read_csv</font> function. The only required parameter is the file path, which tells the function where the file is located. The path can be absolute or relative to the current working directory. If the file is in the current working directory, we only need to pass the file name to the function.

The <font color ='green'>head</font> method returns the first five rows of the <font color ='red'>DataFrame</font> by default, which gives a quick overview of what the dataset looks like.



In [2]:
import pandas as pd

sales = pd.read_csv("sales.csv")

print(sales.head())

   product_code product_group  stock_qty    cost    price  last_week_sales  \
0          4187           PG2        498  420.76   569.91               13   
1          4195           PG2        473  545.64   712.41               16   
2          4204           PG2        968  640.42   854.91               22   
3          4219           PG2        241  869.69  1034.55               14   
4          4718           PG2       1401   12.54    26.59               50   

   last_month_sales  
0                58  
1                58  
2                88  
3                45  
4               285  
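
The path handling described above can be sketched without relying on sales.csv. This is a minimal example, assuming a made-up file named demo_sales.csv written to a temporary data folder; the file name, folder, and values are hypothetical and only serve the illustration. It also shows that head accepts an optional row count.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
csv_text = """product_code,product_group,stock_qty
1001,PG1,10
1002,PG1,20
1003,PG2,30
1004,PG2,40
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    # Place the file in a subfolder, as if it lived in "data/" under a project.
    data_dir = Path(tmp_dir) / "data"
    data_dir.mkdir()
    csv_path = data_dir / "demo_sales.csv"
    csv_path.write_text(csv_text)

    # read_csv accepts a path string or a pathlib.Path object.
    demo = pd.read_csv(csv_path)

# head takes an optional number of rows; without it, five rows are returned.
print(demo.head(2))
```

When the file sits in a subfolder of the current working directory, a relative path such as "data/demo_sales.csv" would work in the same way.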


# Using the **<font color='brown'>usecols</font>** parameter

The <font color ='red'>read_csv</font> function has several other parameters. For instance, the <font color ='red'>usecols</font> parameter lets us read only some of the columns from the csv file.

In [3]:
import pandas as pd

sales = pd.read_csv("sales.csv", usecols=["product_code","product_group","stock_qty"])

print(sales.head())

   product_code product_group  stock_qty
0          4187           PG2        498
1          4195           PG2        473
2          4204           PG2        968
3          4219           PG2        241
4          4718           PG2       1401
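
Besides column names, usecols also accepts integer column positions. The sketch below uses a small in-memory CSV with made-up values (the data is an assumption, not taken from sales.csv) to show that both forms select the same columns.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
csv_text = """product_code,product_group,stock_qty,cost,price
1001,PG1,10,1.5,2.0
1002,PG2,20,2.5,3.0
"""

# Select columns by name.
by_name = pd.read_csv(StringIO(csv_text), usecols=["product_code", "stock_qty"])

# Select the same columns by zero-based position.
by_position = pd.read_csv(StringIO(csv_text), usecols=[0, 2])

print(by_name.columns.tolist())
print(by_position.columns.tolist())
```

Selecting by name is usually easier to read, while positions can help when a file has no header row.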


A <font color ='red'>DataFrame</font> consists of rows and columns. Just as we can choose to read only some of the columns, the <font color ='red'>read_csv</font> function lets us limit the number of rows read by using the <font color ='red'>nrows</font> parameter. It’s especially useful when working with large files.

Suppose we have a file with 10 million rows. For a quick analysis or exploration, we can set the <font color ='red'>nrows</font> parameter to 1000 to read only the first 1000 rows of the file.

In [7]:
import pandas as pd

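# nrows is a keyword argument of read_csv, not an attribute of the DataFrame
sales = pd.read_csv("sales.csv", usecols=["product_code","product_group","stock_qty"], nrows=100)

print(sales.head())

   product_code product_group  stock_qty
0          4187           PG2        498
1          4195           PG2        473
2          4204           PG2        968
3          4219           PG2        241
4          4718           PG2       1401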
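
The effect of nrows can be checked on a small file built in memory. This is a minimal sketch, assuming a hypothetical 1,000-row CSV with made-up values; nrows is passed as a keyword argument to read_csv, and the header line does not count toward the limit.

```python
from io import StringIO

import pandas as pd

# Build a hypothetical 1,000-row CSV in memory; the values are made up.
lines = ["product_code,stock_qty"]
lines += [f"{1000 + i},{i * 2}" for i in range(1000)]
csv_text = "\n".join(lines)

# Read only the first 10 data rows.
preview = pd.read_csv(StringIO(csv_text), nrows=10)

print(preview.shape)  # (10, 2)
```

Because parsing stops after the requested rows, this is a cheap way to inspect the structure of a large file before loading all of it.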