A DataFrame represents a tabular, spreadsheet-like data structure containing an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). The DataFrame has both a row index and a column index; it can be thought of as a dict of Series, all sharing the same index. Compared with other DataFrame-like structures you may have used before (like R’s data.frame), row-oriented and column-oriented operations in DataFrame are treated roughly symmetrically. Under the hood, the data is stored as one or more two-dimensional blocks rather than as a list, dict, or some other collection of one-dimensional arrays.
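To make the "dict of Series" picture concrete, here is a minimal sketch. The column names `x` and `y` and their values are illustrative only: each column pulled out of the frame is a Series, and every column carries the frame's row index.

```python
import pandas as pd

# A tiny frame; 'x' and 'y' are illustrative column names
df = pd.DataFrame({'x': [1, 2, 3], 'y': [0.5, 1.5, 2.5]},
                  index=['r1', 'r2', 'r3'])

# Each column is a Series...
col_x = df['x']
col_y = df['y']
print(type(col_x).__name__)  # Series

# ...and every column shares the frame's row index
print(col_x.index.equals(df.index), col_y.index.equals(df.index))  # True True

# Columns may hold different value types
print(df.dtypes)
```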

> There are numerous ways to construct a DataFrame, though one of the most common is from a dict of equal-length lists or NumPy arrays:

In [1]:
import pandas as pd
from pandas import Series, DataFrame
import numpy as np

In [2]:
data = {'state':['Loralai', 'Quetta', 'Duki', 'Sanjavi', 'Usman Bagh', 'Pshin'],
        'year':[2001, 2002, 2003, 2004, 2005, 2006],
        'pop': [1.5, 1.7, 2.0, 2.4, 2.9, 3.0]}

In [3]:
frame = DataFrame(data)

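The resulting DataFrame will have its index assigned automatically, as with Series, and the columns are placed in the order of the dict’s keys. (Older pandas versions sorted the columns alphabetically; current versions keep the insertion order, as the output below shows.)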


In [4]:
frame

|   | state | year | pop |
|---|---|---|---|
| 0 | Loralai | 2001 | 1.5 |
| 1 | Quetta | 2002 | 1.7 |
| 2 | Duki | 2003 | 2.0 |
| 3 | Sanjavi | 2004 | 2.4 |
| 4 | Usman Bagh | 2005 | 2.9 |
| 5 | Pshin | 2006 | 3.0 |
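A small check of the column order, assuming a recent pandas version (0.23 or later on Python 3.6+). The sketch uses only the first three rows of the data above:

```python
import pandas as pd

data = {'state': ['Loralai', 'Quetta', 'Duki'],
        'year': [2001, 2002, 2003],
        'pop': [1.5, 1.7, 2.0]}

frame = pd.DataFrame(data)

# Columns follow the dict's insertion order, not alphabetical order
print(list(frame.columns))    # ['state', 'year', 'pop']
print(sorted(frame.columns))  # ['pop', 'state', 'year']

# The automatically assigned index is a RangeIndex 0..n-1
print(frame.index)  # RangeIndex(start=0, stop=3, step=1)
```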


If you specify a sequence of columns, the DataFrame’s columns will be exactly what you pass:

In [5]:
DataFrame(data, columns=['state', 'pop', 'year'])

|   | state | pop | year |
|---|---|---|---|
| 0 | Loralai | 1.5 | 2001 |
| 1 | Quetta | 1.7 | 2002 |
| 2 | Duki | 2.0 | 2003 |
| 3 | Sanjavi | 2.4 | 2004 |
| 4 | Usman Bagh | 2.9 | 2005 |
| 5 | Pshin | 3.0 | 2006 |


As with Series, if you pass a column that isn’t contained in data, it will appear with NA values in the result:


In [6]:
frame2 = DataFrame(data, columns=['state', 'pop', 'year', 'debt'],
        index=['a', 'b', 'c', 'd', 'e', 'f'])

In [7]:
frame2

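|   | state | pop | year | debt |
|---|---|---|---|---|
| a | Loralai | 1.5 | 2001 | NaN |
| b | Quetta | 1.7 | 2002 | NaN |
| c | Duki | 2.0 | 2003 | NaN |
| d | Sanjavi | 2.4 | 2004 | NaN |
| e | Usman Bagh | 2.9 | 2005 | NaN |
| f | Pshin | 3.0 | 2006 | NaN |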


In [8]:
frame2.columns

Index(['state', 'pop', 'year', 'debt'], dtype='object')
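The sketch below, a trimmed two-row version of the data, confirms that the requested-but-absent column is entirely missing, and that keys of the dict not listed in `columns` are simply left out:

```python
import pandas as pd

data = {'state': ['Loralai', 'Quetta'], 'year': [2001, 2002], 'pop': [1.5, 1.7]}

frame2 = pd.DataFrame(data, columns=['state', 'pop', 'year', 'debt'],
                      index=['a', 'b'])

# 'debt' was not in data, so every entry is missing
print(frame2['debt'].isna().all())  # True

# Keys of data that are not listed in columns are dropped
print(pd.DataFrame(data, columns=['state']).columns.tolist())  # ['state']
```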

A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute:

In [9]:
frame['state'], frame2.year

(0       Loralai
 1        Quetta
 2          Duki
 3       Sanjavi
 4    Usman Bagh
 5         Pshin
 Name: state, dtype: object,
 a    2001
 b    2002
 c    2003
 d    2004
 e    2005
 f    2006
 Name: year, dtype: int64)
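Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute. The `pop` column in this example is such a clash: `frame.pop` is the `DataFrame.pop` method, not the column. A minimal sketch on a two-row frame:

```python
import pandas as pd

frame = pd.DataFrame({'state': ['Loralai', 'Quetta'],
                      'year': [2001, 2002],
                      'pop': [1.5, 1.7]})

# 'year' does not clash with any method, so attribute access gives the column
print(frame.year.equals(frame['year']))  # True

# 'pop' clashes with the DataFrame.pop method, so the attribute is the method
print(callable(frame.pop))  # True

# Bracket notation always returns the column
print(isinstance(frame['pop'], pd.Series))  # True
```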

Rows can also be retrieved by position or by name. Older tutorials use the ix indexing field for this, but ix was deprecated and has since been removed from pandas. Use loc instead for label-based selection (even when the labels happen to be integers) and iloc for integer position-based selection:

In [10]:
frame2.loc['b']

state    Quetta
pop         1.7
year       2002
debt        NaN
Name: b, dtype: object
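A short sketch contrasting the two indexers on a trimmed frame. The `nums` frame with a shuffled integer index is a hypothetical example chosen to show that `loc` uses labels, not positions, even for integer labels:

```python
import pandas as pd

frame2 = pd.DataFrame({'state': ['Loralai', 'Quetta', 'Duki'],
                       'pop': [1.5, 1.7, 2.0]},
                      index=['a', 'b', 'c'])

# loc selects by label, iloc by 0-based integer position
row_by_label = frame2.loc['b']
row_by_position = frame2.iloc[1]
print(row_by_label.equals(row_by_position))  # True: both are row 'b'

# With an integer index, loc still means "label", not "position"
nums = pd.DataFrame({'v': [10, 20, 30]}, index=[2, 0, 1])
print(nums.loc[0, 'v'], nums.iloc[0, 0])  # 20 10
```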

Columns can be modified by assignment. For example, the empty 'debt' column could be assigned a scalar value or an array of values:

In [11]:
# assign one scalar value to every row of the column

frame2['debt'] = 1.0

frame2

|   | state | pop | year | debt |
|---|---|---|---|---|
| a | Loralai | 1.5 | 2001 | 1.0 |
| b | Quetta | 1.7 | 2002 | 1.0 |
| c | Duki | 2.0 | 2003 | 1.0 |
| d | Sanjavi | 2.4 | 2004 | 1.0 |
| e | Usman Bagh | 2.9 | 2005 | 1.0 |
| f | Pshin | 3.0 | 2006 | 1.0 |


In [12]:
# assign a list with one value per row

frame2['debt'] = [16.5, 10.0, 9.5, 2.0, 1.0, 3]

frame2

|   | state | pop | year | debt |
|---|---|---|---|---|
| a | Loralai | 1.5 | 2001 | 16.5 |
| b | Quetta | 1.7 | 2002 | 10.0 |
| c | Duki | 2.0 | 2003 | 9.5 |
| d | Sanjavi | 2.4 | 2004 | 2.0 |
| e | Usman Bagh | 2.9 | 2005 | 1.0 |
| f | Pshin | 3.0 | 2006 | 3.0 |


In [13]:
# assign a NumPy array of evenly spaced values (0.0 through 5.0)

frame2['debt'] = np.arange(6.0)

frame2

|   | state | pop | year | debt |
|---|---|---|---|---|
| a | Loralai | 1.5 | 2001 | 0.0 |
| b | Quetta | 1.7 | 2002 | 1.0 |
| c | Duki | 2.0 | 2003 | 2.0 |
| d | Sanjavi | 2.4 | 2004 | 3.0 |
| e | Usman Bagh | 2.9 | 2005 | 4.0 |
| f | Pshin | 3.0 | 2006 | 5.0 |


When assigning lists or arrays to a column, the value’s length must match the length of the DataFrame. If you assign a Series, it will instead be conformed exactly to the DataFrame’s index, inserting missing values in any holes:

In [14]:
val = Series([1.2, 2.2, 3.2, 4.2], index=['a', 'c', 'e', 'f'])
val

a    1.2
c    2.2
e    3.2
f    4.2
dtype: float64

In [15]:
frame2['debt'] = val

frame2

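|   | state | pop | year | debt |
|---|---|---|---|---|
| a | Loralai | 1.5 | 2001 | 1.2 |
| b | Quetta | 1.7 | 2002 | NaN |
| c | Duki | 2.0 | 2003 | 2.2 |
| d | Sanjavi | 2.4 | 2004 | NaN |
| e | Usman Bagh | 2.9 | 2005 | 3.2 |
| f | Pshin | 3.0 | 2006 | 4.2 |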
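Both rules can be seen side by side in a minimal sketch on a three-row frame (the values are illustrative): a list of the wrong length raises `ValueError`, while a Series is aligned on its index labels regardless of their order.

```python
import pandas as pd

frame2 = pd.DataFrame({'pop': [1.5, 1.7, 2.0]}, index=['a', 'b', 'c'])

# A list or array must supply exactly one value per row
error_raised = False
try:
    frame2['debt'] = [1.0, 2.0]  # only 2 values for 3 rows
except ValueError as exc:
    error_raised = True
    print('ValueError:', exc)

# A Series is aligned on the index labels; unmatched rows get NaN
frame2['debt'] = pd.Series([9.0, 7.0], index=['c', 'a'])
print(frame2['debt'].tolist())  # [7.0, nan, 9.0]
```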


Assigning a column that doesn’t exist will create a new column. The del keyword will delete columns as with a dict:

In [16]:
frame2['eastern'] = frame2.state == 'Quetta'

frame2

|   | state | pop | year | debt | eastern |
|---|---|---|---|---|---|
| a | Loralai | 1.5 | 2001 | 1.2 | False |
| b | Quetta | 1.7 | 2002 | NaN | True |
| c | Duki | 2.0 | 2003 | 2.2 | False |
| d | Sanjavi | 2.4 | 2004 | NaN | False |
| e | Usman Bagh | 2.9 | 2005 | 3.2 | False |
| f | Pshin | 3.0 | 2006 | 4.2 | False |


In [17]:
del frame2['eastern']

frame2

|   | state | pop | year | debt |
|---|---|---|---|---|
| a | Loralai | 1.5 | 2001 | 1.2 |
| b | Quetta | 1.7 | 2002 | NaN |
| c | Duki | 2.0 | 2003 | 2.2 |
| d | Sanjavi | 2.4 | 2004 | NaN |
| e | Usman Bagh | 2.9 | 2005 | 3.2 |
| f | Pshin | 3.0 | 2006 | 4.2 |
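As an alternative to `del`, `DataFrame.drop(columns=...)` returns a new frame and leaves the original untouched. A minimal sketch; the column name `is_quetta` is hypothetical:

```python
import pandas as pd

frame2 = pd.DataFrame({'state': ['Loralai', 'Quetta'], 'year': [2001, 2002]})

# Bracket assignment of a boolean expression creates a new column
frame2['is_quetta'] = frame2['state'] == 'Quetta'

# drop() returns a new DataFrame and does not modify frame2
trimmed = frame2.drop(columns=['is_quetta'])
print(list(trimmed.columns))  # ['state', 'year']
print(list(frame2.columns))   # ['state', 'year', 'is_quetta']

# del removes the column in place, like deleting a dict key
del frame2['is_quetta']
print(list(frame2.columns))   # ['state', 'year']
```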


> Another common form of data is a nested dict of dicts:


In [18]:
pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}


If passed to DataFrame, it will interpret the outer dict keys as the columns and the inner keys as the row indices:

In [19]:
frame3 = DataFrame(pop)

# Equivalently, the dict can be passed directly:
# frame3 = DataFrame({'Nevada': {2001: 2.4, 2002: 2.9},
#                     'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}})

frame3

|   | Nevada | Ohio |
|---|---|---|
| 2001 | 2.4 | 1.7 |
| 2002 | 2.9 | 3.6 |
| 2000 | NaN | 1.5 |
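A sketch that checks the mapping of outer and inner keys, assuming a recent pandas version. If a sorted row order is wanted, `sort_index()` returns a sorted copy:

```python
import numpy as np
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

frame3 = pd.DataFrame(pop)

# Outer keys become columns, inner keys become row labels
print(list(frame3.columns))  # ['Nevada', 'Ohio']

# Nevada has no entry for 2000, so that cell is missing
print(np.isnan(frame3.loc[2000, 'Nevada']))  # True

# sort_index() returns a copy with the row labels in sorted order
print(frame3.sort_index().index.tolist())  # [2000, 2001, 2002]
```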


Of course you can always transpose the result:

In [20]:
frame3.T

|   | 2001 | 2002 | 2000 |
|---|---|---|---|
| Nevada | 2.4 | 2.9 | NaN |
| Ohio | 1.7 | 3.6 | 1.5 |
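Transposing swaps the two axes, and transposing twice restores the original frame. One caveat worth knowing: when the columns have mixed dtypes, every column of the transposed result falls back to `object` dtype. A minimal sketch (the `mixed` frame is illustrative):

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)

# Rows and columns swap places; .T twice gives back the original
print(frame3.shape, frame3.T.shape)  # (3, 2) (2, 3)
print(frame3.T.T.equals(frame3))     # True

# Mixed column dtypes become object dtype after transposing
mixed = pd.DataFrame({'state': ['Loralai', 'Quetta'], 'pop': [1.5, 1.7]})
print((mixed.T.dtypes == object).all())  # True
```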


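The keys in the inner dicts are combined (unioned) to form the index in the result. Older pandas versions also sorted them; current versions keep them in the order they are first encountered, as the output above shows. If an explicit index is specified, it is used as-is instead: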


In [21]:
DataFrame(pop, index=[2001, 2002, 2003])

|   | Nevada | Ohio |
|---|---|---|
| 2001 | 2.4 | 1.7 |
| 2002 | 2.9 | 3.6 |
| 2003 | NaN | NaN |
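One way to read the explicit-index result: it matches building the full frame and then calling `reindex` with the same labels. Labels missing from the new index (2000) are dropped, and new labels (2003) get missing values. A sketch of that equivalence:

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

explicit = pd.DataFrame(pop, index=[2001, 2002, 2003])

# Build the full frame first, then conform it to the same labels
via_reindex = pd.DataFrame(pop).reindex([2001, 2002, 2003])
print(explicit.equals(via_reindex))  # True
```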


Dicts of Series are treated much in the same way:

In [22]:
# iloc slices by position: all but the last Ohio value, the first two Nevada values
pdata = {'Ohio': frame3['Ohio'].iloc[:-1],
         'Nevada': frame3['Nevada'].iloc[:2]}

In [23]:
DataFrame(pdata)

|   | Ohio | Nevada |
|---|---|---|
| 2001 | 1.7 | 2.4 |
| 2002 | 3.6 | 2.9 |
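When the Series do not share the same labels, the index of the result is the union of their indexes, and each column gets NaN where its Series has no value. A minimal sketch with two illustrative Series:

```python
import pandas as pd

# Two Series with partly overlapping indexes
s1 = pd.Series([1.0, 2.0], index=['a', 'b'])
s2 = pd.Series([3.0, 4.0], index=['b', 'c'])

df = pd.DataFrame({'s1': s1, 's2': s2})

# Labels are aligned; a label missing from a Series becomes NaN in its column
print(df.index.tolist())         # ['a', 'b', 'c']
print(df.isna().sum().tolist())  # [1, 1]
```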


If a DataFrame’s index and columns have their name attributes set, these will also be displayed:

In [24]:
frame3.index.name = 'year'; frame3.columns.name = 'state'

frame3

| state | Nevada | Ohio |
|---|---|---|
| year | | |
| 2001 | 2.4 | 1.7 |
| 2002 | 2.9 | 3.6 |
| 2000 | NaN | 1.5 |
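Instead of setting `index.name` and `columns.name` by assignment, the names can also be set with `DataFrame.rename_axis`, which returns a new frame (available in pandas 0.24 and later). A minimal sketch:

```python
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# rename_axis returns a new frame with both axis names set
frame3 = pd.DataFrame(pop).rename_axis(index='year', columns='state')
print(frame3.index.name, frame3.columns.name)  # year state
```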


Like Series, the values attribute returns the data contained in the DataFrame as a 2D ndarray:

In [25]:
frame3.values

array([[2.4, 1.7],
       [2.9, 3.6],
       [nan, 1.5]])
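Current pandas documentation recommends `DataFrame.to_numpy()` over the `values` attribute; for a frame like this one, both give the same 2D array. A short sketch:

```python
import numpy as np
import pandas as pd

pop = {'Nevada': {2001: 2.4, 2002: 2.9},
       'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
frame3 = pd.DataFrame(pop)

arr = frame3.to_numpy()      # same data as frame3.values
print(arr.shape, arr.dtype)  # (3, 2) float64

# equal_nan=True treats the NaN cells as equal
print(np.array_equal(arr, frame3.values, equal_nan=True))  # True
```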

If the DataFrame’s columns are different dtypes, the dtype of the values array will be chosen to accommodate all of the columns:


In [26]:
frame2.values

array([['Loralai', 1.5, 2001, 1.2],
       ['Quetta', 1.7, 2002, nan],
       ['Duki', 2.0, 2003, 2.2],
       ['Sanjavi', 2.4, 2004, nan],
       ['Usman Bagh', 2.9, 2005, 3.2],
       ['Pshin', 3.0, 2006, 4.2]], dtype=object)
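To get a numeric array rather than an `object` one, select only the numeric columns first. The integer column is then upcast to `float64` alongside the float column. A sketch on a trimmed frame:

```python
import pandas as pd

frame2 = pd.DataFrame({'state': ['Loralai', 'Quetta'],
                       'pop': [1.5, 1.7],
                       'year': [2001, 2002]})

print(frame2.to_numpy().dtype)                   # object (mixed types)
print(frame2[['pop', 'year']].to_numpy().dtype)  # float64 (int64 upcast)

# select_dtypes picks the numeric columns without listing them by hand
print(frame2.select_dtypes('number').columns.tolist())  # ['pop', 'year']
```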

![Possible data inputs to DataFrame constructor](../../Pictures/Possible%20data%20inputs%20to%20DataFrame%20constructor.png)
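Besides dicts, the DataFrame constructor accepts other inputs. The sketch below shows two that pandas supports: a 2D ndarray, with optional row and column labels, and a list of dicts, where the keys become columns and each dict becomes a row. The labels and values are illustrative.

```python
import numpy as np
import pandas as pd

# From a 2D ndarray, with optional column and row labels
arr = np.arange(6).reshape(3, 2)
from_array = pd.DataFrame(arr, columns=['x', 'y'], index=['r1', 'r2', 'r3'])

# From a list of dicts: keys become columns, each dict becomes a row
records = [{'state': 'Loralai', 'pop': 1.5},
           {'state': 'Quetta', 'year': 2002}]
from_records = pd.DataFrame(records)

print(from_array.shape)                 # (3, 2)
print(sorted(from_records.columns))     # ['pop', 'state', 'year']
print(from_records.isna().sum().sum())  # 2
```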