## Pandas DataFrame

In [1]:
import pandas as pd
pd.__version__

'1.3.4'

### DataFrame as a Dictionary

The first analogy we will consider is the DataFrame as a dictionary of related Series objects. As an example, let's build one Series each for the population, sex ratio, and literacy rate of several places:

In [2]:
Population = {'Gralazala': 66070,
              'Dachepalli': 75233,
              'Piduguralla': 122319,
              'Karempudi': 52365,
              'Rentachinthala': 49827,
              'Durgi': 49095,
              'Macherla': 113048}
Population = pd.Series(Population)

Population

Gralazala          66070
Dachepalli         75233
Piduguralla       122319
Karempudi          52365
Rentachinthala     49827
Durgi              49095
Macherla          113048
dtype: int64

In [3]:
SRatio = {'Gralazala': 1005,
          'Dachepalli': 1000,
          'Piduguralla': 994,
          'Karempudi': 974,
          'Rentachinthala': 988,
          'Durgi': 997,
          'Macherla': 978}

SRatio = pd.Series(SRatio)
SRatio

Gralazala         1005
Dachepalli        1000
Piduguralla        994
Karempudi          974
Rentachinthala     988
Durgi              997
Macherla           978
dtype: int64

In [4]:
LRate = {'Gralazala': 50.3,
         'Dachepalli': 51.9,
         'Piduguralla': 55.4,
         'Karempudi': 49.9,
         'Rentachinthala': 48.2,
         'Durgi': 46.5,
         'Macherla': 55.6}
LRate = pd.Series(LRate)
LRate

Gralazala         50.3
Dachepalli        51.9
Piduguralla       55.4
Karempudi         49.9
Rentachinthala    48.2
Durgi             46.5
Macherla          55.6
dtype: float64

In [5]:
data = pd.DataFrame({'Population':Population, 'SRatio':SRatio, 'LRate':LRate})
data

| | Population | SRatio | LRate |
|---|---|---|---|
| Gralazala | 66070 | 1005 | 50.3 |
| Dachepalli | 75233 | 1000 | 51.9 |
| Piduguralla | 122319 | 994 | 55.4 |
| Karempudi | 52365 | 974 | 49.9 |
| Rentachinthala | 49827 | 988 | 48.2 |
| Durgi | 49095 | 997 | 46.5 |
| Macherla | 113048 | 978 | 55.6 |
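
When a DataFrame is built from a dictionary of Series, pandas aligns the Series on their index labels. The sketch below uses a small, made-up subset of the values above in which the two Series share only one label, to show that the result's index is the union of the labels and that a label missing from one Series becomes `NaN` in that column:

```python
import pandas as pd

# Two Series that share only one label ('Macherla')
population = pd.Series({'Durgi': 49095, 'Macherla': 113048})
literacy = pd.Series({'Macherla': 55.6, 'Karempudi': 49.9})

# The DataFrame index is the union of both Series indexes;
# a label missing from one Series becomes NaN in that column
df = pd.DataFrame({'Population': population, 'LRate': literacy})
print(df)
```

Because `NaN` is a float, the `Population` column in this sketch is stored as `float64` rather than `int64`.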


Like the Series object, the DataFrame has an index attribute that gives access to the index labels:

In [6]:
data.index

Index(['Gralazala', 'Dachepalli', 'Piduguralla', 'Karempudi', 'Rentachinthala',
       'Durgi', 'Macherla'],
      dtype='object')

Additionally, the DataFrame has a columns attribute, which is an Index object holding the column labels:

In [7]:
data.columns

Index(['Population', 'SRatio', 'LRate'], dtype='object')

Continuing the dictionary analogy: where a dictionary maps a key to a value, a DataFrame maps a column name to a Series of column data. For example, asking for the 'SRatio' column returns the Series object containing the sex ratios we saw earlier:

In [8]:
data['SRatio']

Gralazala         1005
Dachepalli        1000
Piduguralla        994
Karempudi          974
Rentachinthala     988
Durgi              997
Macherla           978
Name: SRatio, dtype: int64

Any other column can be retrieved the same way, using dictionary-style indexing with the column name:

In [9]:
data['Population']

Gralazala          66070
Dachepalli         75233
Piduguralla       122319
Karempudi          52365
Rentachinthala     49827
Durgi              49095
Macherla          113048
Name: Population, dtype: int64
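
The dictionary analogy extends to other operations as well. In this minimal sketch (two rows copied from the data above; the new column name `SRatioPct` is hypothetical), the `in` operator tests column labels just as it tests dictionary keys, and assigning to a new key adds a column:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 75233],
                     'SRatio': [1005, 1000]},
                    index=['Gralazala', 'Dachepalli'])

# Membership tests check column labels, like keys of a dict
print('SRatio' in data)       # True
print('Gralazala' in data)    # False: row labels are not keys

# Assigning to a new key adds a column (the name 'SRatioPct' is hypothetical)
data['SRatioPct'] = data['SRatio'] / 10
print(data)
```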

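Equivalently, we can use attribute-style access, as long as the column name is a string that is a valid Python identifier and does not clash with an existing DataFrame attribute or method: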

In [10]:
data.SRatio

Gralazala         1005
Dachepalli        1000
Piduguralla        994
Karempudi          974
Rentachinthala     988
Durgi              997
Macherla           978
Name: SRatio, dtype: int64
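
The limits of attribute-style access are easy to hit in practice. The sketch below uses two hypothetical column names: `'Sex Ratio'` contains a space, so it cannot be written as an attribute at all, and `'count'` collides with the `DataFrame.count` method. Dictionary-style indexing works for both:

```python
import pandas as pd

# Hypothetical columns: one name contains a space, one clashes with a method
df = pd.DataFrame({'Sex Ratio': [1005, 1000], 'count': [3, 4]})

# Neither name works as an attribute, but both work with brackets
print(df['Sex Ratio'].tolist())  # [1005, 1000]
print(df['count'].tolist())      # [3, 4]

# df.count is the DataFrame.count method, not the 'count' column
print(callable(df.count))        # True
```

For this reason, bracket indexing is the safer default in code that has to handle arbitrary column names.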

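We can also select columns by integer position with `iloc`. Here `:` keeps every row and `0:2` keeps the first two columns:

In [11]: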
data.iloc[:,0:2]

| | Population | SRatio |
|---|---|---|
| Gralazala | 66070 | 1005 |
| Dachepalli | 75233 | 1000 |
| Piduguralla | 122319 | 994 |
| Karempudi | 52365 | 974 |
| Rentachinthala | 49827 | 988 |
| Durgi | 49095 | 997 |
| Macherla | 113048 | 978 |


In [12]:
data.iloc[2:6,1:3]

| | SRatio | LRate |
|---|---|---|
| Piduguralla | 994 | 55.4 |
| Karempudi | 974 | 49.9 |
| Rentachinthala | 988 | 48.2 |
| Durgi | 997 | 46.5 |
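
As the output shows, `iloc` slices follow Python conventions and stop *before* the end position: `2:6` yields the rows at positions 2 to 5. `iloc` also accepts lists of positions, including negative ones counted from the end. A minimal sketch on the first four rows of the data above:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 75233, 122319, 52365],
                     'SRatio': [1005, 1000, 994, 974],
                     'LRate': [50.3, 51.9, 55.4, 49.9]},
                    index=['Gralazala', 'Dachepalli', 'Piduguralla', 'Karempudi'])

# Slices exclude the stop position: rows 1 and 2, column 0 only
print(data.iloc[1:3, 0:1])

# Lists pick arbitrary positions; -1 counts from the end
print(data.iloc[[0, -1], [0, 2]])
```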


### DataFrame as a Two-Dimensional Array

As mentioned previously, we can also view the DataFrame as an enhanced two-dimensional array. We can examine the raw underlying data array using the values attribute:

In [13]:
data.values

array([[6.60700e+04, 1.00500e+03, 5.03000e+01],
       [7.52330e+04, 1.00000e+03, 5.19000e+01],
       [1.22319e+05, 9.94000e+02, 5.54000e+01],
       [5.23650e+04, 9.74000e+02, 4.99000e+01],
       [4.98270e+04, 9.88000e+02, 4.82000e+01],
       [4.90950e+04, 9.97000e+02, 4.65000e+01],
       [1.13048e+05, 9.78000e+02, 5.56000e+01]])
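
The array above is printed in scientific notation because a NumPy array holds a single dtype: the integer columns `Population` and `SRatio` are upcast to `float64` to share an array with the float column `LRate`. The sketch below (two rows copied from the data above) shows this with `to_numpy()`, which pandas documents as the recommended alternative to `values`:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 75233],
                     'SRatio': [1005, 1000],
                     'LRate': [50.3, 51.9]},
                    index=['Gralazala', 'Dachepalli'])

# Mixed int and float columns are upcast to one common dtype
print(data.to_numpy().dtype)                            # float64

# Selecting only the integer columns keeps an integer array
print(data[['Population', 'SRatio']].to_numpy().dtype)  # int64
```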

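`keys()` returns the column labels, just as the `columns` attribute does, which continues the dictionary analogy:

In [14]: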
data.keys()

Index(['Population', 'SRatio', 'LRate'], dtype='object')


With this picture in mind, many familiar array-like operations can be performed on the DataFrame itself. For example, we can transpose the full DataFrame to swap rows and columns:

In [16]:
data.T

| | Gralazala | Dachepalli | Piduguralla | Karempudi | Rentachinthala | Durgi | Macherla |
|---|---|---|---|---|---|---|---|
| Population | 66070.0 | 75233.0 | 122319.0 | 52365.0 | 49827.0 | 49095.0 | 113048.0 |
| SRatio | 1005.0 | 1000.0 | 994.0 | 974.0 | 988.0 | 997.0 | 978.0 |
| LRate | 50.3 | 51.9 | 55.4 | 49.9 | 48.2 | 46.5 | 55.6 |
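
After transposing, each original row becomes a column, so selecting a place from the transposed frame gives the same values as selecting that row with `loc`. The integers also turn into floats, because each new column mixes the integer and float values of one place. A small sketch on two rows copied from the data above:

```python
import pandas as pd

data = pd.DataFrame({'Population': [49095, 113048],
                     'SRatio': [997, 978],
                     'LRate': [46.5, 55.6]},
                    index=['Durgi', 'Macherla'])

# After transposing, each original row label becomes a column
row = data.loc['Durgi']
col = data.T['Durgi']
print(row.equals(col))  # True

# Each column of the transpose mixes ints with a float, so all become float64
print(data.T.dtypes)
```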


Indexing `values` with a single integer returns one row of the underlying array:

In [17]:
data.values[1]

array([7.5233e+04, 1.0000e+03, 5.1900e+01])
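
Indexing `values` discards the labels. Selecting the same row with `iloc` returns a Series that keeps the column names as its index and the row label as its name, as this sketch on two rows of the data above shows:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 75233],
                     'SRatio': [1005, 1000],
                     'LRate': [50.3, 51.9]},
                    index=['Gralazala', 'Dachepalli'])

# values[1] is a bare NumPy row; iloc[1] is a Series that keeps the labels
row = data.iloc[1]
print(row.name)            # Dachepalli
print(row['Population'])   # 75233.0
```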

Pandas provides the `loc` and `iloc` indexers. Using the `iloc` indexer, we can index the underlying array as if it were a simple NumPy array (using implicit, Python-style integer positions), but the DataFrame index and column labels are kept in the result:

In [18]:
data.iloc[:5, :2]

| | Population | SRatio |
|---|---|---|
| Gralazala | 66070 | 1005 |
| Dachepalli | 75233 | 1000 |
| Piduguralla | 122319 | 994 |
| Karempudi | 52365 | 974 |
| Rentachinthala | 49827 | 988 |


Similarly, using the `loc` indexer we can index the underlying data in an array-like style, but using the explicit index labels and column names:

In [19]:
data.loc['Karempudi', 'SRatio']

974
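
When only a single value is needed, pandas also offers the `at` (label-based) and `iat` (position-based) accessors, which are specialised for reading or writing one cell. A minimal sketch on two rows of the data above:

```python
import pandas as pd

data = pd.DataFrame({'SRatio': [974, 988]},
                    index=['Karempudi', 'Rentachinthala'])

# at / iat are specialised for reading or writing a single value
print(data.at['Karempudi', 'SRatio'])  # 974
print(data.iat[1, 0])                  # 988
```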

In [20]:
data.loc[:'Rentachinthala', :'LRate']

| | Population | SRatio | LRate |
|---|---|---|---|
| Gralazala | 66070 | 1005 | 50.3 |
| Dachepalli | 75233 | 1000 | 51.9 |
| Piduguralla | 122319 | 994 | 55.4 |
| Karempudi | 52365 | 974 | 49.9 |
| Rentachinthala | 49827 | 988 | 48.2 |
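
Note that the result includes 'Rentachinthala' and 'LRate': label slices with `loc` include the end label, while position slices with `iloc` stop before the end position. The contrast on three rows of the data above:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 75233, 122319]},
                    index=['Gralazala', 'Dachepalli', 'Piduguralla'])

# Label slices with loc include the end label ...
by_label = data.loc[:'Dachepalli']
# ... while position slices with iloc stop before the end position
by_position = data.iloc[:1]

print(list(by_label.index))     # ['Gralazala', 'Dachepalli']
print(list(by_position.index))  # ['Gralazala']
```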


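### Finding the Maximum of Rows and Columns

By default `max()` works down each column; passing `axis=1` takes the maximum across each row instead: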

In [21]:
Max_Column = data.max()
print(Max_Column)

Population    122319.0
SRatio          1005.0
LRate             55.6
dtype: float64


In [22]:
Max_Rows = data.max(axis=1)
print(Max_Rows)

Gralazala          66070.0
Dachepalli         75233.0
Piduguralla       122319.0
Karempudi          52365.0
Rentachinthala     49827.0
Durgi              49095.0
Macherla          113048.0
dtype: float64
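
Because the columns measure different quantities, the row-wise maximum here simply picks out each place's population. To collect several column statistics at once, `agg` accepts a list of function names and returns one row per statistic. A sketch on three rows of the data above:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 122319, 49095],
                     'LRate': [50.3, 55.4, 46.5]},
                    index=['Gralazala', 'Piduguralla', 'Durgi'])

# agg with a list of function names returns one row per statistic
summary = data.agg(['min', 'max'])
print(summary)
```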


To get the maximum of a single column, call `max()` on that column:

In [23]:
Max_SColumn = data['LRate'].max()
print(Max_SColumn)

55.6


For multiple columns, select them with a list of column names first:

In [24]:
Max_MColumns = data[['Population', 'LRate']].max()
print(Max_MColumns)

Population    122319.0
LRate             55.6
dtype: float64


`idxmax()` returns the row index label of the maximum value in every column:

In [25]:
max_labels = data.idxmax()  # avoid shadowing the built-in max()
max_labels

Population    Piduguralla
SRatio          Gralazala
LRate            Macherla
dtype: object

To get the row index label of the maximum in a particular column, call `idxmax()` on that column:

In [26]:
Max_PCol = data['Population'].idxmax()
Max_PCol

'Piduguralla'
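
`idxmax()` returns only the single top label. When the top few rows are wanted, `nlargest(n, column)` returns them sorted in descending order, and `idxmin()` is the counterpart of `idxmax()`. A sketch on four rows of the data above:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 75233, 122319, 113048],
                     'LRate': [50.3, 51.9, 55.4, 55.6]},
                    index=['Gralazala', 'Dachepalli', 'Piduguralla', 'Macherla'])

# idxmax gives only the top label; nlargest returns the top n rows
top2 = data.nlargest(2, 'Population')
print(top2)

# idxmin is the counterpart of idxmax
print(data['LRate'].idxmin())  # Gralazala
```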

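Slicing a DataFrame directly with integers selects rows by position, so `data[1:3]` returns the rows at positions 1 and 2:

In [27]: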
data[1:3]

| | Population | SRatio | LRate |
|---|---|---|---|
| Dachepalli | 75233 | 1000 | 51.9 |
| Piduguralla | 122319 | 994 | 55.4 |


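Indexing with a Boolean Series keeps only the rows where the condition is `True`, here the places with a literacy rate above 50:

In [28]: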
data[data.LRate > 50]

| | Population | SRatio | LRate |
|---|---|---|---|
| Gralazala | 66070 | 1005 | 50.3 |
| Dachepalli | 75233 | 1000 | 51.9 |
| Piduguralla | 122319 | 994 | 55.4 |
| Macherla | 113048 | 978 | 55.6 |
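
Several conditions can be combined with `&` (and) or `|` (or). Each comparison needs its own parentheses because these operators bind more tightly than `>`. `DataFrame.query` expresses the same filter as a string. A sketch on four rows of the data above; the population threshold of 100000 is an arbitrary choice for illustration:

```python
import pandas as pd

data = pd.DataFrame({'Population': [66070, 122319, 52365, 113048],
                     'LRate': [50.3, 55.4, 49.9, 55.6]},
                    index=['Gralazala', 'Piduguralla', 'Karempudi', 'Macherla'])

# Combine conditions with & (and) or | (or); each needs parentheses
mask = (data['LRate'] > 50) & (data['Population'] > 100000)
print(data[mask])

# query expresses the same filter as a string
print(data.query('LRate > 50 and Population > 100000'))
```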
