# DataFrame

#### A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can hold a different value type (numbers, strings, booleans, etc.). It has both a row index and a column index, and can be thought of as a dictionary of Series that all share the same index. Row-oriented and column-oriented operations are treated roughly symmetrically.
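
A minimal sketch of that row/column symmetry, assuming pandas is installed. The frame `df` and its values are hypothetical. Many methods take an `axis` argument: `axis=0` works down the rows (one result per column), and `axis=1` works across the columns (one result per row).

```python
import pandas as pd

# Hypothetical numeric frame, used only to illustrate the axis argument
df = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

col_sums = df.sum(axis=0)  # down the rows: one total per column
row_sums = df.sum(axis=1)  # across the columns: one total per row

# Transposing swaps rows and columns: 2 rows x 3 columns
flipped = df.T
print(col_sums, row_sums, flipped, sep='\n\n')
```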

#### Internally, the data is stored as one or more two-dimensional blocks rather than as a list, dictionary, or other collection of one-dimensional arrays. Higher-dimensional data can still be represented using hierarchical indexing.
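
The next sketch illustrates hierarchical indexing with a two-level row index (a `MultiIndex`). It assumes pandas is installed, and the (state, year) rows and their values are hypothetical. Selecting on the outer level returns the inner rows for that key.

```python
import pandas as pd

# Hypothetical data: two levels of row labels (state, year) in a 2D table
index = pd.MultiIndex.from_tuples(
    [('Mah', 2000), ('Mah', 2001), ('UP', 2000), ('UP', 2001)],
    names=['state', 'year'],
)
df = pd.DataFrame({'pop': [1.1, 1.2, 1.5, 1.6]}, index=index)

# Selecting the outer label 'Mah' leaves a frame indexed by year only
mah = df.loc['Mah']
print(mah)
```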

Analogy: each column of a DataFrame is a Series, and columns + columns (i.e. several Series placed side by side) = a DataFrame.
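
The analogy can be checked directly. This is a small sketch, assuming pandas is installed, and the two Series below are hypothetical. It builds a DataFrame from a dictionary of Series and then pulls one column back out as a Series.

```python
import pandas as pd

# Each column starts out as a Series; both share the default index 0..2
state = pd.Series(['Mah', 'UP', 'MP'])
year = pd.Series([1960, 1947, 1990])

# Placing the Series side by side (here via a dict) gives a DataFrame
df = pd.DataFrame({'state': state, 'year': year})

# Taking a single column back out returns a Series again
print(type(df['year']))
```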

# Constructing the DataFrame:

In [8]:
import pandas as pd
from pandas import DataFrame

In [9]:
data = {'state': ['Mah', 'UP', 'MP', 'Kar', 'TN', 'Naga'],
        'year': [1960, 1947, 1990, 1995, 2000, 1962],
        'pop': [1.1, 1.2, 1.6, 1.1, 1.0, 2.0]}
frame = DataFrame(data)

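We are using the DataFrame constructor to create this object. Because it was imported directly above, `DataFrame(data)` is the same as `pd.DataFrame(data)`. One common way to declare a new DataFrame is to pass a dictionary whose keys are the column names and whose values are equal-length lists of entries.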

In [10]:
frame

  state  year  pop
0   Mah  1960  1.1
1    UP  1947  1.2
2    MP  1990  1.6
3   Kar  1995  1.1
4    TN  2000  1.0
5  Naga  1962  2.0


The dictionary keys become the column labels. Because no index was passed, the row labels default to an ascending integer count from 0 to n-1.
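
To confirm the default row labels, you can inspect `frame.index`. This sketch assumes pandas is installed and reuses only the first three rows of the data above. When no `index` is given, pandas generates a `RangeIndex`.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Mah', 'UP', 'MP'],
                      'year': [1960, 1947, 1990]})

# No index was passed, so pandas generated a RangeIndex 0..n-1
print(frame.index)  # RangeIndex(start=0, stop=3, step=1)
```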

We can also supply our own row (index) labels with the `index` argument:

In [11]:
pd.DataFrame({'state': ['Mah', 'UP', 'MP', 'Kar', 'TN', 'Naga'],
              'year': [1960, 1947, 1990, 1995, 2000, 1962],
              'pop': [1.1, 1.2, 1.6, 1.1, 1.0, 2.0]},
             index=['state1', 'state2', 'state3', 'state4', 'state5', 'state6'])

       state  year  pop
state1   Mah  1960  1.1
state2    UP  1947  1.2
state3    MP  1990  1.6
state4   Kar  1995  1.1
state5    TN  2000  1.0
state6  Naga  1962  2.0


If we pass a column name in the `columns` argument that does not appear in the data, the column is still created, but all of its values are NaN.
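
Here is a minimal sketch of that behaviour, assuming pandas is installed. The column name `debt` is hypothetical and is not a key in `data`. The `columns` list also controls the order in which the existing columns appear.

```python
import pandas as pd

data = {'state': ['Mah', 'UP', 'MP', 'Kar', 'TN', 'Naga'],
        'year': [1960, 1947, 1990, 1995, 2000, 1962],
        'pop': [1.1, 1.2, 1.6, 1.1, 1.0, 2.0]}

# 'debt' (hypothetical) is not in data, so it is filled with NaN
frame2 = pd.DataFrame(data, columns=['year', 'state', 'pop', 'debt'])
print(frame2)
```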

Recall that a Series has no column labels, only an index and an overall `name`. A DataFrame has both a row index and column labels.
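
The following sketch contrasts the two objects. It assumes pandas is installed, and the values are hypothetical. It also shows that a column taken out of a DataFrame carries its column label as the Series `name`.

```python
import pandas as pd

s = pd.Series([1960, 1947, 1990], name='year')
df = pd.DataFrame({'year': [1960, 1947, 1990], 'pop': [1.1, 1.2, 1.6]})

print(s.name)          # year  (a Series has one overall name)
print(df.columns)      # the DataFrame's column labels
print(df['pop'].name)  # pop   (the column label becomes the Series name)
```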

Checking the dimensions (`shape` returns a `(rows, columns)` tuple):

In [17]:
frame.shape 

(6, 3)

In [18]:
rows, columns = frame.shape

In [48]:
rows

6

In [49]:
columns

3
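
The same counts are available in other ways, as this sketch shows. It assumes pandas is installed and rebuilds the frame from the data above. `len()` gives the number of rows, and `size` gives the total number of cells.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Mah', 'UP', 'MP', 'Kar', 'TN', 'Naga'],
                      'year': [1960, 1947, 1990, 1995, 2000, 1962],
                      'pop': [1.1, 1.2, 1.6, 1.1, 1.0, 2.0]})

n_rows = len(frame)          # same as frame.shape[0]
n_cols = len(frame.columns)  # same as frame.shape[1]
n_cells = frame.size         # rows * columns
print(n_rows, n_cols, n_cells, frame.ndim)
```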

# Retrieving Columns using Indexing:

In [45]:
frame.columns

Index(['state', 'year', 'pop'], dtype='object')

In [46]:
frame[['year','state']]

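   year  state
a  1960    Mah
b  1947     UP
c  1990     MP
d  1995    Kar
e  2000     TN
f  1962   Naga

(The row labels here are `a` to `f` rather than 0 to 5 because this cell, In [46], was run after `frame.index` was reassigned in In [36] further below.)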


In [28]:
frame['year']

0    1960
1    1947
2    1990
3    1995
4    2000
5    1962
Name: year, dtype: int64

In [44]:
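frame.year  # attribute access; equivalent to frame['year'] when the column name is a valid identifier

a    1960
b    1947
c    1990
d    1995
e    2000
f    1962
Name: year, dtype: int64

(As above, the `a` to `f` labels appear because this cell, In [44], ran after the index was reassigned in In [36].)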
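
The sketch below summarises the column-selection rules. It assumes pandas is installed and uses the first three rows of the data above. A single label returns a Series, and a list of labels returns a DataFrame. Attribute access breaks down when a column name clashes with a DataFrame method. Here `frame.pop` is the `DataFrame.pop` method, not the `pop` column, so bracket syntax is the safer choice.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Mah', 'UP', 'MP'],
                      'year': [1960, 1947, 1990],
                      'pop': [1.1, 1.2, 1.6]})

one = frame['year']             # single label -> Series
two = frame[['year', 'state']]  # list of labels -> DataFrame

# frame.pop resolves to the DataFrame.pop method, so use brackets instead
col_pop = frame['pop']
print(callable(frame.pop))  # True
```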

# Retrieving rows:

In [29]:
frame.loc[frame['year'] == 1960]  # boolean mask: keep only rows where year is 1960

  state  year  pop
0   Mah  1960  1.1
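
Boolean row selection works in two steps, as this sketch shows. It assumes pandas is installed and rebuilds the frame from the data above. The comparison produces a True/False Series with one value per row, and `.loc` keeps the True rows. Several conditions combine with `&` (and) or `|` (or), and each comparison needs its own parentheses.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Mah', 'UP', 'MP', 'Kar', 'TN', 'Naga'],
                      'year': [1960, 1947, 1990, 1995, 2000, 1962],
                      'pop': [1.1, 1.2, 1.6, 1.1, 1.0, 2.0]})

mask = frame['year'] == 1960  # boolean Series, one True/False per row
print(frame.loc[mask])

# Rows after 1980 with pop strictly greater than 1.0
recent_large = frame.loc[(frame['year'] > 1980) & (frame['pop'] > 1.0)]
print(recent_large)
```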


In [36]:
frame.index = ['a','b','c','d','e','f']

In [39]:
frame.loc['a']  # .loc selects rows by index label; 'a' exists because we assigned a custom index above

state     Mah
year     1960
pop       1.1
Name: a, dtype: object
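
To contrast label-based and position-based selection, here is a sketch that assumes pandas is installed and uses the same `a` to `f` index as above. `.loc` works with labels, while `.iloc` works with integer positions. Unlike ordinary Python slices, label slices with `.loc` include both endpoints.

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Mah', 'UP', 'MP', 'Kar', 'TN', 'Naga'],
                      'year': [1960, 1947, 1990, 1995, 2000, 1962],
                      'pop': [1.1, 1.2, 1.6, 1.1, 1.0, 2.0]},
                     index=['a', 'b', 'c', 'd', 'e', 'f'])

by_label = frame.loc['a']       # row whose index label is 'a'
by_position = frame.iloc[0]     # first row, whatever its label
cell = frame.loc['c', 'state']  # single value: row 'c', column 'state'
subset = frame.loc['b':'d', ['state', 'pop']]  # rows b, c and d (inclusive)
print(subset)
```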