#### "Slicing" DataFrames 

"Slicing" is a way to get a "subset" of a DataFrame using its indices.

In [2]:
import pandas as pd

In [17]:
data = {'Integer': [0, 1, 2, 3, 4, 5],
        'English': ['zero', 'one', 'two', 'three', 'four', 'five'],
        'Mandarin': ['líng', 'yī', 'èr', 'sān', 'sì', 'wǔ'],
        'Spanish': ['cero', 'uno', 'dos', 'tres', 'cuatro', 'cinco']}

In [18]:
df_count = pd.DataFrame(data)

print(df_count, '\n') 

   Integer English Mandarin Spanish
0        0    zero     líng    cero
1        1     one       yī     uno
2        2     two       èr     dos
3        3   three      sān    tres
4        4    four       sì  cuatro
5        5    five       wǔ   cinco 



In [19]:
df_rows = df_count[0:5] # select rows 0 through 4; note that the end of the slice (5) is excluded

print(df_rows, '\n')

   Integer English Mandarin Spanish
0        0    zero     líng    cero
1        1     one       yī     uno
2        2     two       èr     dos
3        3   three      sān    tres
4        4    four       sì  cuatro 



In [21]:
df_rows2 = df_count[:5] # if you omit the start index, the slice begins at the first row of the DataFrame

print(df_rows2, '\n')

   Integer English Mandarin Spanish
0        0    zero     líng    cero
1        1     one       yī     uno
2        2     two       èr     dos
3        3   three      sān    tres
4        4    four       sì  cuatro 



In [22]:
df_rows3 = df_count[1:] # if you omit the end index, the slice runs to the last row of the DataFrame

print(df_rows3, '\n')

   Integer English Mandarin Spanish
1        1     one       yī     uno
2        2     two       èr     dos
3        3   three      sān    tres
4        4    four       sì  cuatro
5        5    five       wǔ   cinco 



In [23]:
df_rows4 = df_count[:] # omitting both indices selects all rows of the DataFrame

print(df_rows4, '\n')

   Integer English Mandarin Spanish
0        0    zero     líng    cero
1        1     one       yī     uno
2        2     two       èr     dos
3        3   three      sān    tres
4        4    four       sì  cuatro
5        5    five       wǔ   cinco 
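
Slices also accept a step and negative positions, just as Python lists do. The sketch below is a minimal illustration on a trimmed-down copy of the data; the name `df_demo` is only a placeholder and is not part of the notebook above.

```python
import pandas as pd

# Illustrative copy of the data, trimmed to two columns for brevity
df_demo = pd.DataFrame({'Integer': [0, 1, 2, 3, 4, 5],
                        'English': ['zero', 'one', 'two', 'three', 'four', 'five']})

every_other = df_demo[::2]  # step of 2: rows at positions 0, 2 and 4
last_two = df_demo[-2:]     # negative start: the final two rows

print(every_other, '\n')
print(last_two)
```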



In [24]:
df_cols = df_count['English'] # you can return an entire column (as a Series) by specifying its name in quotes

print(df_cols)

0     zero
1      one
2      two
3    three
4     four
5     five
Name: English, dtype: object


In [25]:
df_cols2 = df_count[['English', 'Spanish']] # you can return several columns by specifying each name in quotes. Note that a second set of square brackets is required in this case, because the column names are passed as a list.

print(df_cols2)

  English Spanish
0    zero    cero
1     one     uno
2     two     dos
3   three    tres
4    four  cuatro
5    five   cinco
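
The second set of square brackets also changes the type of the result. As a minimal sketch (again using a placeholder `df_demo` rather than the notebook's own variable), a single name returns a Series, while a one-element list returns a DataFrame:

```python
import pandas as pd

df_demo = pd.DataFrame({'Integer': [0, 1, 2],
                        'English': ['zero', 'one', 'two']})

as_series = df_demo['English']    # single brackets: a pandas Series
as_frame = df_demo[['English']]   # double brackets: a one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```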


In [26]:
df_cols3 = df_count[df_count.columns.difference(['Mandarin'])] # if you only know the name of the column you wish to exclude, you can take the difference of the column labels (note that the remaining columns come back sorted by name)

print(df_cols3)

  English  Integer Spanish
0    zero        0    cero
1     one        1     uno
2     two        2     dos
3   three        3    tres
4    four        4  cuatro
5    five        5   cinco
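
The columns in the output above appear in alphabetical order because `difference()` sorts its result by default. If the original order matters, two options are sketched here, assuming a reasonably recent pandas in which `Index.difference` accepts `sort=False`; `df_demo` is a placeholder name.

```python
import pandas as pd

df_demo = pd.DataFrame({'Integer': [0, 1],
                        'English': ['zero', 'one'],
                        'Mandarin': ['líng', 'yī'],
                        'Spanish': ['cero', 'uno']})

# Default behaviour: the remaining labels are sorted
sorted_cols = df_demo.columns.difference(['Mandarin'])

# sort=False keeps the labels in their original order
ordered_cols = df_demo.columns.difference(['Mandarin'], sort=False)

# drop() returns a new DataFrame without the column; df_demo itself is unchanged
dropped = df_demo.drop(columns=['Mandarin'])

print(list(sorted_cols))
print(list(ordered_cols))
print(list(dropped.columns))
```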


In [27]:
df_cols4 = df_count[[x for x in df_count.columns if x != 'Mandarin']] # another way to exclude a column; this one keeps the original column order

print(df_cols4, '\n')

   Integer English Spanish
0        0    zero    cero
1        1     one     uno
2        2     two     dos
3        3   three    tres
4        4    four  cuatro
5        5    five   cinco 



The last example is an apt transition to "conditional selections".

In [31]:
df_cond = df_count['Integer'] >= 1 # evaluate the condition for each row and assign a Boolean value. This returns a Boolean Series, not the matching rows; I illustrate it because stopping here is a common mistake.

print(df_cond)

0    False
1     True
2     True
3     True
4     True
5     True
Name: Integer, dtype: bool


In [32]:
df_cond2 = df_count[df_count['Integer'] >= 1] #return rows with positive numbers

print(df_cond2)

   Integer English Mandarin Spanish
1        1     one       yī     uno
2        2     two       èr     dos
3        3   three      sān    tres
4        4    four       sì  cuatro
5        5    five       wǔ   cinco


In [37]:
df_cond3 = df_count[(df_count['Integer'] >= 1) & (df_count['Integer'] % 2 == 0)] # return rows whose number is positive AND divisible by two

print(df_cond3)

   Integer English Mandarin Spanish
2        2     two       èr     dos
4        4    four       sì  cuatro
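
The parentheses around each condition are not optional: in Python, `&` binds more tightly than comparison operators such as `>=` and `==`. The following sketch shows what happens with and without them. It uses a standalone Series `s` (a placeholder name) and writes the evenness test explicitly as `% 2 == 0`.

```python
import pandas as pd

s = pd.Series([0, 1, 2, 3, 4, 5], name='Integer')

# With parentheses: each comparison is evaluated first, then combined with &
mask = (s >= 1) & (s % 2 == 0)
print(s[mask].tolist())

# Without parentheses the expression is parsed as the chained comparison
#   s >= (1 & (s % 2)) == 0
# which asks for a single True/False from a whole Series and raises ValueError
try:
    s[s >= 1 & s % 2 == 0]
    raised = False
except ValueError:
    raised = True
print(raised)
```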


For the sake of readability you may assign the conditions to variables.

In [42]:
positive = df_count['Integer'] >= 1
divisibleBy2 = df_count['Integer'] % 2 == 0 # '%' is the modulo operator; a remainder of zero means the number is even

df_cond3b = df_count[positive & divisibleBy2] # return rows whose number is positive AND divisible by two

print(df_cond3b)

   Integer English Mandarin Spanish
2        2     two       èr     dos
4        4    four       sì  cuatro


In [52]:
nullEnglish = ~df_count['English'].notnull() # '~' is the element-wise NOT operator, so ~notnull() flags missing values; I used it because most datasets have empty values
nullMandarin = ~df_count['Mandarin'].notnull()
nullSpanish = ~df_count['Spanish'].notnull()

df_cond4 = df_count[~( nullEnglish | nullMandarin | nullSpanish)] # a "double negative" of sorts to illustrate the use of OR: keep the rows where no column is null

print(df_cond4)

   Integer English Mandarin Spanish
0        0    zero     líng    cero
1        1     one       yī     uno
2        2     two       èr     dos
3        3   three      sān    tres
4        4    four       sì  cuatro
5        5    five       wǔ   cinco


Learning Activity: What effect does NOT have on AND? OR? 
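
One way to explore the activity is to try the operators on two small Boolean Series and compare the results. This is only a sketch; `a` and `b` are placeholder names.

```python
import pandas as pd

a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

# Compare NOT applied to a combined condition with NOT applied to each part
not_or_matches = (~(a | b)).equals(~a & ~b)
not_and_matches = (~(a & b)).equals(~a | ~b)

print(not_or_matches)
print(not_and_matches)
```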

You can also "slice" specific rows and columns at the same time. To do so, use the _.loc_ (label-based) or _.iloc_ (position-based) indexers. Previously you could also use _.ix_, but it was deprecated in pandas 0.20.0 and removed in pandas 1.0, so only older versions of pandas (not of Python) support it.
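
As a minimal sketch of the two indexers, the example below selects the same two columns both ways. The name `df_demo` is a placeholder. Note that a label slice with _.loc_ includes its end point, while a position slice with _.iloc_ excludes it.

```python
import pandas as pd

df_demo = pd.DataFrame({'Integer': [0, 1, 2, 3, 4, 5],
                        'English': ['zero', 'one', 'two', 'three', 'four', 'five'],
                        'Spanish': ['cero', 'uno', 'dos', 'tres', 'cuatro', 'cinco']})

# .loc selects by label: rows labelled 1 through 3 (inclusive), columns by name
by_label = df_demo.loc[1:3, ['English', 'Spanish']]

# .iloc selects by position: rows at positions 1 and 2 (3 is excluded), columns 1 and 2
by_position = df_demo.iloc[1:3, [1, 2]]

print(by_label, '\n')
print(by_position)
```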