We continue with the same DataFrames as in the previous tutorial, so you can keep working in the same notebook. If you start a new one, remember to import the packages and recreate `df` and `df2` first.

In [5]:
import numpy as np
import pandas as pd

In [6]:
dates = pd.date_range('20130101', periods=6)

In [7]:
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))

# Getting

The examples below serve as a small cheat sheet for getting values out of a DataFrame.

When an indexer takes two coordinates, the first always selects the rows and the second selects the columns.

In [8]:
df[0:3]

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-01 | 0.000076 | -0.362764 | -1.081886 | -0.385399 |
| 2013-01-02 | 1.965094 | -0.030204 | 0.830541 | -0.349466 |
| 2013-01-03 | 1.195461 | 0.038819 | 0.461198 | -0.618108 |


In [11]:
df['20130102' : '20130104']

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-02 | 1.965094 | -0.030204 | 0.830541 | -0.349466 |
| 2013-01-03 | 1.195461 | 0.038819 | 0.461198 | -0.618108 |
| 2013-01-04 | -1.278339 | 0.724638 | -0.620409 | -0.208204 |
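
The two slices above behave differently at their end points, which is easy to miss. The following is a minimal sketch, assuming a hypothetical frame `demo` with fixed values in place of the random `df`: a positional slice excludes its end, while a label slice includes it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `df`, with deterministic values
dates = pd.date_range('20130101', periods=6)
demo = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# Positional slice: rows 0, 1 and 2; the end position 3 is excluded
by_position = demo[0:3]

# Label slice: rows for January 2, 3 and 4; the end label is included
by_label = demo['20130102':'20130104']

print(by_position.index[-1])
print(by_label.index[-1])
```

Both results have three rows, but only the label slice contains the row named by its end point.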


# Selection by Label

In [12]:
df.loc['2013-01-01']

A    0.000076
B   -0.362764
C   -1.081886
D   -0.385399
Name: 2013-01-01 00:00:00, dtype: float64

In [13]:
df.loc[:, ['A', 'B']]

|  | A | B |
|---|---|---|
| 2013-01-01 | 0.000076 | -0.362764 |
| 2013-01-02 | 1.965094 | -0.030204 |
| 2013-01-03 | 1.195461 | 0.038819 |
| 2013-01-04 | -1.278339 | 0.724638 |
| 2013-01-05 | 0.632169 | 1.501716 |
| 2013-01-06 | 1.245204 | 1.669894 |


Remember that the type of the returned object depends on its dimension: selecting a single row or column returns a Series, and selecting a single cell returns a scalar.

In [19]:
type(df.loc['20130101'])

pandas.core.series.Series

In [23]:
df.loc[dates[0], 'A']

7.636768909909155e-05

In [21]:
type(df.loc[dates[0], 'A'])

numpy.float64
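
If the dimension should be preserved, one option is to pass a list of labels instead of a single label. This is a small sketch under the assumption of a hypothetical frame `demo` with fixed values; it is not taken from the original notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `df`, with deterministic values
dates = pd.date_range('20130101', periods=6)
demo = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list('ABCD'))

row_as_series = demo.loc[dates[0]]       # single label: returns a Series
row_as_frame = demo.loc[[dates[0]]]      # list with one label: returns a one-row DataFrame
single_value = demo.loc[dates[0], 'A']   # row label and column label: returns a scalar

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(type(single_value).__name__)
```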

# Selection by Position

Python counts from 0, so `df.iloc[3]` returns the 4th row.

In [24]:
df.iloc[3:5, 0:2]

|  | A | B |
|---|---|---|
| 2013-01-04 | -1.278339 | 0.724638 |
| 2013-01-05 | 0.632169 | 1.501716 |


In [25]:
df.iloc[3]

A   -1.278339
B    0.724638
C   -0.620409
D   -0.208204
Name: 2013-01-04 00:00:00, dtype: float64

In [26]:
df.iloc[1:3, :]

|  | A | B | C | D |
|---|---|---|---|---|
| 2013-01-02 | 1.965094 | -0.030204 | 0.830541 | -0.349466 |
| 2013-01-03 | 1.195461 | 0.038819 | 0.461198 | -0.618108 |
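
Besides slices, `iloc` also accepts lists of positions and negative positions. The sketch below assumes a hypothetical frame `demo` with fixed values so the selections are easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `df`, with deterministic values
dates = pd.date_range('20130101', periods=6)
demo = pd.DataFrame(np.arange(24).reshape(6, 4), index=dates, columns=list('ABCD'))

# Lists of positions: rows 0, 2 and 4; columns 0 and 3 (A and D)
picked = demo.iloc[[0, 2, 4], [0, 3]]

# Negative positions count from the end: -1 is the last row
last_row = demo.iloc[-1]

# All rows, columns at positions 1 and 2 (B and C); the end position 3 is excluded
middle_cols = demo.iloc[:, 1:3]

print(picked)
```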


# Selection by dtype

In [27]:
df = pd.DataFrame({'string': list('abc'),
                   'int64': list(range(1, 4)),
                   'uint8': np.arange(3, 6).astype('u1'),
                   'float64': np.arange(4.0, 7.0),
                   'bool1': [True, False, True],
                   'bool2': [False, True, False],
                   'dates': pd.date_range('now', periods=3),
                   'category': pd.Series(list("ABC")).astype('category')})

In [28]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype         
---  ------    --------------  -----         
 0   string    3 non-null      object        
 1   int64     3 non-null      int64         
 2   uint8     3 non-null      uint8         
 3   float64   3 non-null      float64       
 4   bool1     3 non-null      bool          
 5   bool2     3 non-null      bool          
 6   dates     3 non-null      datetime64[ns]
 7   category  3 non-null      category      
dtypes: bool(2), category(1), datetime64[ns](1), float64(1), int64(1), object(1), uint8(1)
memory usage: 368.0+ bytes


In [31]:
df

|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category |
|---|---|---|---|---|---|---|---|---|
| 0 | a | 1 | 3 | 4.0 | True | False | 2022-04-13 17:49:14.719132 | A |
| 1 | b | 2 | 4 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B |
| 2 | c | 3 | 5 | 6.0 | True | False | 2022-04-15 17:49:14.719132 | C |


In [32]:
df.select_dtypes(include=[bool])

|  | bool1 | bool2 |
|---|---|---|
| 0 | True | False |
| 1 | False | True |
| 2 | True | False |
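
`select_dtypes` also takes dtype families such as `'number'`, and an `exclude` argument. The following is a brief sketch on a hypothetical, reduced frame called `typed`; note that boolean columns are not treated as numbers.

```python
import pandas as pd

# Hypothetical reduced version of the frame above
typed = pd.DataFrame({'string': list('abc'),
                      'int64': [1, 2, 3],
                      'float64': [4.0, 5.0, 6.0],
                      'bool1': [True, False, True]})

# Keep only numeric columns (integer and float; bool is not included)
numeric = typed.select_dtypes(include='number')

# Keep everything except numeric columns
non_numeric = typed.select_dtypes(exclude='number')

print(list(numeric.columns))
print(list(non_numeric.columns))
```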


# Boolean Indexing

In [33]:
df[df['float64'] >= 5]

|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category |
|---|---|---|---|---|---|---|---|---|
| 1 | b | 2 | 4 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B |
| 2 | c | 3 | 5 | 6.0 | True | False | 2022-04-15 17:49:14.719132 | C |
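
Several conditions can be combined with `&` (and) and `|` (or). Each condition needs its own parentheses, because these operators bind more tightly than comparisons. The sketch below uses a hypothetical, reduced frame called `demo`.

```python
import pandas as pd

# Hypothetical reduced version of the frame above
demo = pd.DataFrame({'int64': [1, 2, 3],
                     'float64': [4.0, 5.0, 6.0],
                     'bool1': [True, False, True]})

# Rows where float64 >= 5 AND bool1 is True
both = demo[(demo['float64'] >= 5) & (demo['bool1'])]

# Rows where int64 == 1 OR float64 >= 6
either = demo[(demo['int64'] == 1) | (demo['float64'] >= 6)]

print(both)
print(either)
```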


In [39]:
df['E'] = ['one', 'two', 'three']

In [40]:
df2 = df.copy()

In [42]:
df2['E'].isin(['one', 'two'])

0     True
1     True
2    False
Name: E, dtype: bool

In [43]:
df2[df2['E'].isin(['one', 'two'])]

|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | E |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 1 | 3 | 4.0 | True | False | 2022-04-13 17:49:14.719132 | A | one |
| 1 | b | 2 | 4 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B | two |
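
A boolean mask can be inverted with `~`, which gives the rows that are not in the list. This is a minimal sketch on a hypothetical one-column frame called `demo`.

```python
import pandas as pd

# Hypothetical frame holding only the 'E' column
demo = pd.DataFrame({'E': ['one', 'two', 'three']})

mask = demo['E'].isin(['one', 'two'])

kept = demo[mask]        # rows whose 'E' value is in the list
dropped = demo[~mask]    # rows whose 'E' value is NOT in the list

print(dropped)
```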


# Setting Values by Position

In [44]:
df.iat[0, 1] = -1 # .iat sets a single value by integer position: row 0, column 1 ('int64')

In [45]:
df

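|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | E |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a | -1 | 3 | 4.0 | True | False | 2022-04-13 17:49:14.719132 | A | one |
| 1 | b | 2 | 4 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B | two |
| 2 | c | 3 | 5 | 6.0 | True | False | 2022-04-15 17:49:14.719132 | C | three |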


In [46]:
df.iloc[0, 1] = 2 # .iloc also sets by integer position: row 0, column 1

In [47]:
df

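|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | E |
|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 2 | 3 | 4.0 | True | False | 2022-04-13 17:49:14.719132 | A | one |
| 1 | b | 2 | 4 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B | two |
| 2 | c | 3 | 5 | 6.0 | True | False | 2022-04-15 17:49:14.719132 | C | three |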


In [50]:
df.at[0, 1] = 2 # .at is label-based: no column is labeled 1, so this adds a new column named 1

In [51]:
df

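|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | E | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 2 | 3 | 4.0 | True | False | 2022-04-13 17:49:14.719132 | A | one | 2.0 |
| 1 | b | 2 | 4 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B | two | NaN |
| 2 | c | 3 | 5 | 6.0 | True | False | 2022-04-15 17:49:14.719132 | C | three | NaN |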
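
The difference between `.iat` (integer position) and `.at` (label) is easy to trip over, as the extra column `1` above shows. The sketch below, on a hypothetical two-column frame `demo`, sets the same column both ways and uses `columns.get_loc` to translate a column label into its position.

```python
import pandas as pd

# Hypothetical reduced version of the frame above
demo = pd.DataFrame({'string': list('abc'), 'int64': [1, 2, 3]})

# By position: row 0, column 1 (the 'int64' column)
demo.iat[0, 1] = -1

# By label: row label 0, column label 'int64' (overwrites the value set above)
demo.at[0, 'int64'] = 2

# Translate a column label into its integer position for use with .iat
col_pos = demo.columns.get_loc('int64')
demo.iat[1, col_pos] = 20

print(demo)
```

Because every call names an existing row and column, no new column is created here.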


# Setting Values by Label

In [53]:
df.at[0, 'float64'] # .at reads a single value by row label 0 and column label 'float64'

4.0

In [54]:
df.loc[0, 'float64'] = -10

In [55]:
df

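|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | E | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 2 | 3 | -10.0 | True | False | 2022-04-13 17:49:14.719132 | A | one | 2.0 |
| 1 | b | 2 | 4 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B | two | NaN |
| 2 | c | 3 | 5 | 6.0 | True | False | 2022-04-15 17:49:14.719132 | C | three | NaN |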


In [59]:
np.array([50] * len(df))

array([50, 50, 50])

In [60]:
df.loc[:, 'uint8'] = np.array([50] * len(df))

In [61]:
df

|  | string | int64 | uint8 | float64 | bool1 | bool2 | dates | category | E | 1 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a | 2 | 50 | -10.0 | True | False | 2022-04-13 17:49:14.719132 | A | one | 2.0 |
| 1 | b | 2 | 50 | 5.0 | False | True | 2022-04-14 17:49:14.719132 | B | two | NaN |
| 2 | c | 3 | 50 | 6.0 | True | False | 2022-04-15 17:49:14.719132 | C | three | NaN |
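
Label-based setting can also be combined with a boolean mask, so that only matching rows are changed. This is a minimal sketch on a hypothetical two-column frame `demo`; the threshold and the value 50 are chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical reduced version of the frame above
demo = pd.DataFrame({'float64': [4.0, 5.0, 6.0],
                     'uint8': np.arange(3, 6).astype('u1')})

# Set 'uint8' to 50 only in rows where 'float64' is at least 5
demo.loc[demo['float64'] >= 5, 'uint8'] = 50

# Set a single cell by row label and column label
demo.loc[0, 'float64'] = -10

print(demo)
```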
