# Pandas DataFrame

CSV stands for Comma-Separated Values, a popular way of representing and storing tabular, column-oriented data in persistent storage.

In [3]:
import pandas as pd

my_dict = { 
     'name' : ["a", "b", "c", "d", "e","f", "g"],
     'age' : [20,27, 35, 55, 18, 21, 35],
     'designation': ["VP", "CEO", "CFO", "VP", "VP", "CEO", "MD"]
}       # creating our dataset for the time being

In [4]:
#loading data
df = pd.DataFrame(my_dict)

In [5]:
df

Unnamed: 0,name,age,designation
0,a,20,VP
1,b,27,CEO
2,c,35,CFO
3,d,55,VP
4,e,18,VP
5,f,21,CEO
6,g,35,MD


# Persisting the DataFrame into a CSV file

Once we have the DataFrame, we can persist it in a CSV file on the local disk. Let’s first create our own CSV file from the data currently in the DataFrame, using the to_csv(...) method of the Pandas DataFrame:

In [6]:
df.to_csv('csv_example')

In [7]:
#Let’s go ahead and load the CSV file and create a new DataFrame out of it

df_csv = pd.read_csv('csv_example')

In [8]:
df_csv

Unnamed: 0.1,Unnamed: 0,name,age,designation
0,0,a,20,VP
1,1,b,27,CEO
2,2,c,35,CFO
3,3,d,55,VP
4,4,e,18,VP
5,5,f,21,CEO
6,6,g,35,MD


Here, the 'Unnamed: 0' column is generated automatically by Pandas while loading the CSV file: to_csv(...) wrote the row index as a first column with no header name, so read_csv(...) gives it a placeholder name.

This problem can be avoided by not writing the index to the CSV file, because the DataFrame will generate one anyway when the file is loaded. We can do this by passing index=False to to_csv(...):

In [9]:
df.to_csv('csv_example', index=False)

In [10]:
# Now, if we read the file as

df_csv = pd.read_csv('csv_example')

In [11]:
df_csv

Unnamed: 0,name,age,designation
0,a,20,VP
1,b,27,CEO
2,c,35,CFO
3,d,55,VP
4,e,18,VP
5,f,21,CEO
6,g,35,MD
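
If a file has already been written with its index, the extra column can also be handled at read time. The following is a minimal sketch, assuming a small stand-in DataFrame and an in-memory buffer (io.StringIO) in place of the 'csv_example' file; it uses the index_col parameter of read_csv(...) to treat the first column as the row index.

```python
import io
import pandas as pd

# Small stand-in dataset (an assumption for this sketch, not the notebook's df)
df = pd.DataFrame({"name": ["a", "b"], "age": [20, 27]})

# With no path, to_csv() returns the CSV text; the index is written as an
# unnamed first column, so the header line starts with a comma.
csv_with_index = df.to_csv()
first_line = csv_with_index.splitlines()[0]

# index_col=0 tells read_csv to use the first column as the row index
# instead of loading it as an 'Unnamed: 0' data column.
restored = pd.read_csv(io.StringIO(csv_with_index), index_col=0)

print(first_line)
print(restored)
```

Writing with index=False is usually simpler; index_col is mainly useful when the file comes from somewhere else and cannot be rewritten.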


# First row to column Header

In [20]:
df_csv = pd.read_csv('csv_example', header=1)  # use row 1 as the header; the default is header=0

In [21]:
df_csv

Unnamed: 0,a,20,VP
0,b,27,CEO
1,c,35,CFO
2,d,55,VP
3,e,18,VP
4,f,21,CEO
5,g,35,MD
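
To make the effect of the header parameter easier to compare, here is a minimal sketch that reads the same text three ways. The two-row dataset and the in-memory buffer are assumptions made for illustration; the file on disk behaves the same way.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the CSV file
csv_text = "name,age\na,20\nb,27\n"

# header=0 (the default): the first line supplies the column names
default = pd.read_csv(io.StringIO(csv_text))

# header=1: the second line supplies the column names; the line before it is dropped
shifted = pd.read_csv(io.StringIO(csv_text), header=1)

# header=None: no line is treated as a header; columns are numbered 0, 1, ...
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(list(default.columns), len(default))
print(list(shifted.columns), len(shifted))
print(list(no_header.columns), len(no_header))
```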


In [22]:
# we can also use more than one row as the header

df_csv = pd.read_csv('csv_example', header=[0,1,2])
df_csv

Unnamed: 0_level_0,name,age,designation
Unnamed: 0_level_1,a,20,VP
Unnamed: 0_level_2,b,27,CEO
0,c,35,CFO
1,d,55,VP
2,e,18,VP
3,f,21,CEO
4,g,35,MD
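
When several rows are used as the header, Pandas builds hierarchical (MultiIndex) column labels. The sketch below assumes a hypothetical file with two header lines, held in memory, to show what the resulting column labels look like.

```python
import io
import pandas as pd

# Hypothetical CSV text with two header lines followed by two data rows
csv_text = "name,age\nfirst,years\na,20\nb,27\n"

# header=[0, 1] combines both lines into two-level column labels
df_multi = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

print(df_multi.columns.nlevels)   # number of header levels
print(list(df_multi.columns))     # each column label is a tuple, one entry per level
```

A column is then selected with a tuple, for example df_multi[("age", "years")].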


# Customizing Column Names


Even though we’re reading the data from a CSV file that has column headers, we can still use our own column names by passing the names parameter to read_csv(...):

In [23]:
df_csv = pd.read_csv('csv_example', names=['first', 'second', 'third'])

In [24]:
df_csv

Unnamed: 0,first,second,third
0,name,age,designation
1,a,20,VP
2,b,27,CEO
3,c,35,CFO
4,d,55,VP
5,e,18,VP
6,f,21,CEO
7,g,35,MD


Although we have successfully added our own column names, the file's original header still appears as the first data row, which is not what we want.

This can be avoided by using the header parameter in read_csv(...) to tell Pandas which row holds the original header, so that it is replaced rather than loaded as data. In this particular case, we know that the first row, i.e. row 0, is the header, so we pass header=0:

In [26]:
df_csv = pd.read_csv('csv_example', names=['first', 'second', 'third'], header=0)
df_csv


Unnamed: 0,first,second,third
0,a,20,VP
1,b,27,CEO
2,c,35,CFO
3,d,55,VP
4,e,18,VP
5,f,21,CEO
6,g,35,MD


# Dataset separator

In [27]:
# Let’s first create a CSV file using a different separator, i.e. “:” (a colon)

df.to_csv('csv_example', index=False, sep=":")

This creates a file in which a colon (‘:’) instead of a comma (‘,’) is used as the separator. If we read the file without telling Pandas about the separator, each line ends up in a single column:



In [33]:
df_csv = pd.read_csv('csv_example')

In [34]:
df_csv

Unnamed: 0,name:age:designation
0,a:20:VP
1,b:27:CEO
2,c:35:CFO
3,d:55:VP
4,e:18:VP
5,f:21:CEO
6,g:35:MD


In [38]:
df_csv = pd.read_csv('csv_example', sep=":")  # specify the separator
df_csv

Unnamed: 0,name,age,designation
0,a,20,VP
1,b,27,CEO
2,c,35,CFO
3,d,55,VP
4,e,18,VP
5,f,21,CEO
6,g,35,MD
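
The same write/read pairing works for any single-character separator. As a minimal sketch, assuming a small stand-in DataFrame, a semicolon separator, and an in-memory buffer instead of a file, the shapes show what happens when the separator does and does not match:

```python
import io
import pandas as pd

# Small stand-in dataset (an assumption for this sketch)
df = pd.DataFrame({
    "name": ["a", "b"],
    "age": [20, 27],
    "designation": ["VP", "CEO"],
})

# Serialize with a semicolon separator; to_csv returns a string when no path is given
text = df.to_csv(index=False, sep=";")

# Reading with the default comma separator leaves every line in one column
wrong = pd.read_csv(io.StringIO(text))

# Reading with the matching separator recovers the three columns
right = pd.read_csv(io.StringIO(text), sep=";")

print(wrong.shape, right.shape)
```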


# Setting the Row Index



By default, a Pandas DataFrame generates a row index automatically. We can change this by setting any column as the index:

In [46]:
sal=df_csv.set_index('age')

In [47]:
sal

age,name,designation
20,a,VP
27,b,CEO
35,c,CFO
55,d,VP
18,e,VP
21,f,CEO
35,g,MD
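
A few properties of set_index(...) are worth seeing in code: it returns a new DataFrame rather than modifying the original, index labels do not have to be unique, and reset_index() undoes the change. The sketch below assumes a three-row stand-in dataset; it also shows index_col, which sets the index directly while reading.

```python
import io
import pandas as pd

# Small stand-in dataset with a repeated age (an assumption for this sketch)
df = pd.DataFrame({
    "name": ["a", "c", "g"],
    "age": [20, 35, 35],
    "designation": ["VP", "CFO", "MD"],
})

indexed = df.set_index("age")      # returns a new DataFrame; df itself is unchanged
same_age = indexed.loc[35]         # label lookup; a repeated label returns every matching row
restored = indexed.reset_index()   # moves the index back into an ordinary column

# read_csv can set the index at load time through index_col
loaded = pd.read_csv(io.StringIO(df.to_csv(index=False)), index_col="age")

print(indexed)
print(same_age)
```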


# Load specific number of rows

To load only part of a file, specify the number of rows to read by passing the nrows argument to read_csv(...):

In [48]:
# Load Only 3 Rows
df_csv = pd.read_csv('csv_example', sep=":", nrows=3)

In [49]:
df_csv

Unnamed: 0,name,age,designation
0,a,20,VP
1,b,27,CEO
2,c,35,CFO
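
The nrows count refers to data rows; the header line is not included in it. A minimal sketch, assuming an in-memory copy of the name and age columns, shows this, along with nrows=0 as a way to read only the column names:

```python
import io
import pandas as pd

# In-memory stand-in for the file: one header line plus seven data rows
names = "abcdefg"
ages = [20, 27, 35, 55, 18, 21, 35]
csv_text = "name,age\n" + "".join(f"{n},{a}\n" for n, a in zip(names, ages))

# nrows=3 loads the header plus the first three data rows
first_three = pd.read_csv(io.StringIO(csv_text), nrows=3)

# nrows=0 loads no data rows, only the column names
columns_only = pd.read_csv(io.StringIO(csv_text), nrows=0)

print(first_three)
print(list(columns_only.columns))
```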


# Skipping Empty Lines

By default, the read_csv(...) function skips blank lines, i.e. it ignores them while loading the file and constructing the DataFrame.

However, if you want to load blank lines for explicit calculations such as counting empty records, set skip_blank_lines to False:

In [51]:
df_csv = pd.read_csv('csv_example', skip_blank_lines=False, sep=":")
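
The file written earlier contains no blank lines, so the call above loads the same rows as before. To see the difference, here is a minimal sketch that assumes a hypothetical CSV text with one blank line in the middle, held in memory; with skip_blank_lines=False the blank line is loaded as a row of missing values, which can then be counted.

```python
import io
import pandas as pd

# Hypothetical CSV text with a blank line between the two records
csv_text = "name,age\na,20\n\nb,27\n"

# Default behaviour: the blank line is ignored
skipped = pd.read_csv(io.StringIO(csv_text))

# skip_blank_lines=False: the blank line becomes a row of NaN values
kept = pd.read_csv(io.StringIO(csv_text), skip_blank_lines=False)

# Count the records in which every field is missing
empty_records = int(kept.isna().all(axis=1).sum())

print(len(skipped), len(kept), empty_records)
```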
