This material should help clarify the ideas from the first meeting:

In [4]:
names=["Tomás", "Pauline", "Pablo", "Bjork","Alan","Juana"]
woman=[False,True,False,False,False,True]
ages=[32,33,28,30,32,27]
country=["Chile", "Senegal", "Spain", "Norway","Peru","Peru"]
education=["Bach", "Bach", "Master", "PhD","Bach","Master"]

# now in a dict:
data={'name':names, 'age':ages, 'girl':woman,'born In':country, 'degree':education}

#now into a DF
import pandas as pd

friends=pd.DataFrame.from_dict(data)
# seeing it:
friends

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,33,True,Senegal,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach
5,Juana,27,True,Peru,Master


The result is what you expected, but you need to be sure of what data structure you have:

In [5]:
#what is it?
type(friends)
#this shows that friends is a dataframe

pandas.core.frame.DataFrame

In [6]:
#this is good
friends.age

0    32
1    33
2    28
3    30
4    32
5    27
Name: age, dtype: int64

In [7]:
#what is it?
type(friends.age)
#a Series is what pandas uses for a single column... some operations work on a Series but not on a DataFrame or other types

pandas.core.series.Series

In [8]:
#this is good
friends['age']

0    32
1    33
2    28
3    30
4    32
5    27
Name: age, dtype: int64

In [9]:
#what is it?
type(friends['age'])
#this is the same result as type(friends.age)... bracket notation is the more general one: it accepts any column name in quotation marks (even one with spaces), while dot notation only works when the name is a valid Python identifier

pandas.core.series.Series
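
The number of square brackets decides what comes back. A minimal sketch, using a shortened copy of the table (only three rows and two columns, for illustration), that compares single and double brackets:

```python
import pandas as pd

# shortened copy of the friends table, for illustration only
friends = pd.DataFrame({'name': ["Tomás", "Pauline", "Pablo"],
                        'age': [32, 33, 28]})

as_series = friends['age']    # one name -> a Series (a single column)
as_frame = friends[['age']]   # a list with one name -> a DataFrame with one column

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

Both hold the same values, but only the second one accepts dataframe-only operations.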

In [19]:
#this is bad
friends.iloc[['age']]
# .iloc selects by integer position, never by name; it accepts:
#An integer, e.g. 5.
#A list or array of integers, e.g. [4, 3, 0].
#A slice object with ints, e.g. 1:7.
#A boolean array.

TypeError: cannot perform reduce with flexible type

In [22]:
#correct example, added by brady
friends.iloc[:,[1,2]]
#note: .loc ca

Unnamed: 0,age,girl
0,32,False
1,33,True
2,28,False
3,30,False
4,32,False
5,27,True
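
As a rough check of how positions and names relate, the sketch below rebuilds the same table and selects the same two columns both ways. `get_loc` is the pandas Index method that translates a name into its position.

```python
import pandas as pd

friends = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 33, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
    'born In': ["Chile", "Senegal", "Spain", "Norway", "Peru", "Peru"],
    'degree': ["Bach", "Bach", "Master", "PhD", "Bach", "Master"],
})

by_position = friends.iloc[:, [1, 2]]        # columns 1 and 2, counting from 0
by_label = friends.loc[:, ['age', 'girl']]   # the same columns, by name
print(by_position.equals(by_label))          # True

# get_loc translates a column name into its position
position = friends.columns.get_loc('born In')
print(position)                              # 3
```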


In [23]:
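#this is bad: with a single list, .loc looks for ROWS with those labels, not columns
#to get the column by name use friends.loc[:,['age']]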
friends.loc[['age']]

age    33
Name: 1, dtype: object

In [13]:
#this is bad
friends['age','born In']
#to select several columns you must pass a list of names: that needs a second pair of square brackets

KeyError: ('age', 'born In')

In [14]:
#this is good
friends[['age','born In']]

Unnamed: 0,age,born In
0,32,Chile
1,33,Senegal
2,28,Spain
3,30,Norway
4,32,Peru
5,27,Peru


In [24]:
# what is it?
type(friends[['age','born In']])

pandas.core.frame.DataFrame

In [25]:
#this is bad
friends.'born In'

SyntaxError: invalid syntax (<ipython-input-25-a1aa66ff4520>, line 2)
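
Dot notation only works when the column name could also be a Python variable name. The sketch below checks that with the built-in `str.isidentifier()` and shows two ways around it. The new column name `born_in` is just a name chosen for this example.

```python
import pandas as pd

friends = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 33, 28, 30, 32, 27],
    'born In': ["Chile", "Senegal", "Spain", "Norway", "Peru", "Peru"],
})

print('age'.isidentifier())       # True  -> friends.age can be written
print('born In'.isidentifier())   # False -> dot notation is not possible

born = friends['born In']         # brackets work for any column name

# optional: rename the column so that dot notation works too
tidy = friends.rename(columns={'born In': 'born_in'})
print(tidy.born_in.tolist())
```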

In [17]:
#this is good
friends.loc[:,['age','born In']]

Unnamed: 0,age,born In
0,32,Chile
1,33,Senegal
2,28,Spain
3,30,Norway
4,32,Peru
5,27,Peru


In [26]:
type(friends.loc[:,['age','born In']])

pandas.core.frame.DataFrame

In [27]:
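#this is bad: a slice (:) cannot be written inside a list
friends.loc[:,['age':'born In']]


#loc uses names; it does accept a slice of names, but without the list brackets: friends.loc[:,'age':'born In']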

SyntaxError: invalid syntax (<ipython-input-27-5fb6a0a5d253>, line 2)
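
The syntax error comes from writing the slice inside a list, not from `.loc` itself. A short sketch of a slice of names with `.loc`, next to the equivalent slice of positions with `.iloc`:

```python
import pandas as pd

friends = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 33, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
    'born In': ["Chile", "Senegal", "Spain", "Norway", "Peru", "Peru"],
    'degree': ["Bach", "Bach", "Master", "PhD", "Bach", "Master"],
})

by_name = friends.loc[:, 'age':'born In']   # slice of names: BOTH ends are included
by_position = friends.iloc[:, 1:4]          # slice of positions: the end (4) is excluded

print(list(by_name.columns))                # ['age', 'girl', 'born In']
print(by_name.equals(by_position))          # True
```

The different treatment of the end point is the detail to watch when switching between the two.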

In [None]:
#this is bad
friends.iloc[:,['age','born In']]


#iloc uses positions to get columns (or rows) and can use slices (see below)

In [22]:
# this is good (but different)
friends.iloc[:,1:4]

Unnamed: 0,age,girl,born In
0,32,False,Chile
1,33,True,Senegal
2,28,False,Spain
3,30,False,Norway
4,32,False,Peru
5,27,True,Peru


In [23]:
# what is it?
type(friends.iloc[:,1:4])


pandas.core.frame.DataFrame

In [24]:
# this is good
friends.iloc[:,[1,3]]
#iloc accepts a list of indexes but not a list of names

Unnamed: 0,age,born In
0,32,Chile
1,33,Senegal
2,28,Spain
3,30,Norway
4,32,Peru
5,27,Peru


In [25]:
#what is it?
type(friends.iloc[:,[1,3]])

pandas.core.frame.DataFrame

In [26]:
friends[friends.age>30]

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,33,True,Senegal,Bach
4,Alan,32,False,Peru,Bach


Some people prefer to store the condition in a variable (a filter) first:

In [32]:
# 
filter1=friends.age>30
friends[filter1]

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,33,True,Senegal,Bach
4,Alan,32,False,Peru,Bach


In [33]:
friends.where(filter1)
#where() keeps every row: rows that do not meet the condition are filled with missing values (NaN, Not a Number)
#this could also be written as: friends.where(friends.age>30)

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32.0,0.0,Chile,Bach
1,Pauline,33.0,1.0,Senegal,Bach
2,,,,,
3,,,,,
4,Alan,32.0,0.0,Peru,Bach
5,,,,,
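
To make the difference between the two results concrete, this sketch compares their shapes. It rebuilds the same table and reuses the name `filter1` from above.

```python
import pandas as pd

friends = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 33, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
    'born In': ["Chile", "Senegal", "Spain", "Norway", "Peru", "Peru"],
    'degree': ["Bach", "Bach", "Master", "PhD", "Bach", "Master"],
})

filter1 = friends.age > 30
subset = friends[filter1]         # rows that fail the condition are removed
masked = friends.where(filter1)   # rows that fail the condition stay, filled with NaN

print(subset.shape)   # (3, 5)
print(masked.shape)   # (6, 5)

# dropping only the rows where EVERYTHING is missing recovers the same rows
recovered = masked.dropna(how='all')
print(list(recovered.index))      # [0, 1, 4]
```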


In [34]:
filter1a='age>30'
friends.query(filter1a)
#query takes the condition as text (a string), in a different format than where(): column names are written directly, without the name of the dataframe...
#the text is evaluated on whatever dataframe query() is called on... so the same text can be reused on different dataframes or subsets

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,33,True,Senegal,Bach
4,Alan,32,False,Peru,Bach
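
Two more pieces of query syntax are worth knowing. The sketch below assumes a reasonably recent pandas version (backticks are not available in old releases), and `min_age` is a hypothetical variable introduced only for this example.

```python
import pandas as pd

friends = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 33, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
    'born In': ["Chile", "Senegal", "Spain", "Norway", "Peru", "Peru"],
    'degree': ["Bach", "Bach", "Master", "PhD", "Bach", "Master"],
})

min_age = 30   # hypothetical threshold, stored in a regular Python variable

older = friends.query('age > @min_age')            # @ refers to a Python variable
peruvians = friends.query('`born In` == "Peru"')   # backticks for names with spaces
older_boys = friends.query('age > @min_age and not girl')

print(older['name'].tolist())        # ['Tomás', 'Pauline', 'Alan']
print(peruvians['name'].tolist())    # ['Alan', 'Juana']
print(older_boys['name'].tolist())   # ['Tomás', 'Alan']
```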


In [35]:
isinstance(friends[filter1], pd.DataFrame), \
isinstance(friends.where(filter1), pd.DataFrame), \
isinstance(friends.query(filter1a), pd.DataFrame)
#the backslash means that the code continues on the next line
#The isinstance() function checks if the object (first argument) is an instance or subclass of classinfo class (second argument).
#The syntax of isinstance() is: isinstance(object, classinfo)

(True, True, True)

When you have Boolean values (True/False) you can simplify:

In [36]:
#from:
friends[friends.girl==False]
#Note: The order matters! As long as the result of an operation is still a dataframe you can keep applying dataframe operations... if you first reduce it to, say, the country column only (a series) and then ask for a dataframe-only command, it will give an error

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach


In [32]:
# to...
friends[~friends.girl]
#same result: ~ negates Boolean values

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach


You can have two filters:

In [33]:
# this will not work because there are no parentheses (& is evaluated before ==)
friends[~friends.girl & friends.degree=='Bach']

  result = method(y)


TypeError: invalid type comparison
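
The error comes from the order in which Python evaluates operators: `&` goes before `==`. A small sketch with plain numbers shows the order, followed by one possible way to avoid the trap, which is to give each condition a name. The names `is_boy` and `has_bach` are only illustrative.

```python
import pandas as pd

# plain Python shows the evaluation order: & comes before ==
print(1 & 2 == 2)      # False, read as (1 & 2) == 2, that is 0 == 2
print(1 & (2 == 2))    # 1, read as 1 & True

# shortened copy of the friends table, for illustration only
friends = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'girl': [False, True, False, False, False, True],
    'degree': ["Bach", "Bach", "Master", "PhD", "Bach", "Master"],
})

# naming each condition makes parentheses unnecessary
is_boy = ~friends.girl
has_bach = friends.degree == 'Bach'
boys_with_bach = friends[is_boy & has_bach]
print(boys_with_bach['name'].tolist())   # ['Tomás', 'Alan']
```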

In [34]:
# this will (with parentheses)
friends[(~friends.girl) & (friends.degree=='Bach')]

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
4,Alan,32,False,Peru,Bach


Other times you want a value once a filter has been applied:

In [35]:
# youngest male:
friends[(~friends.girl) & (friends.age.min())] # this is wrong!
#friends.age.min() is just a number: it is not compared to anything, so it adds no real condition (the output is simply every boy)... in the next one there is a comparison for the filter to work with...


Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach


In [36]:
friends[(~friends.girl) & (friends.age==friends.age.min())] # this is wrong too!
#...so it asks "who is a boy and also the youngest of everyone?" The youngest friend is a girl, so it provides no results!
#semantically incorrect (the filter comes from a valid comparison, but that comparison makes no sense for the result we want)
#... but it is not syntactically incorrect

Unnamed: 0,name,age,girl,born In,degree


In [37]:
friends.age.min()

27

You got an empty answer because there is no man aged 27.

In [43]:
# this is correct
friends[~friends.girl].age.min()
#what comes after each period operates on the result of what comes before it

28

Once you know the right age, you have to put it in the right place:

In [38]:
friends[friends.age==friends[~friends.girl].age.min()]
#this asks: give me the friends whose age equals the value in the square brackets (the age of the youngest boy)... this could still be incorrect, since some girls may also have that age
#remember that filters must be comparisons... work through the steps until each filter tells you what you want before moving to the next step/specification
#if you can't solve something within 30ish minutes, leave, come back, and try a new approach. Your first approach is probably wrong

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28,False,Spain,Master


In [46]:
# or
friends.where(friends.age==friends[~friends.girl].age.min())


Unnamed: 0,name,age,girl,born In,degree
0,,,,,
1,,,,,
2,Pablo,28.0,0.0,Spain,Master
3,,,,,
4,,,,,
5,,,,,


In [50]:
# or
friends.where(friends.age==friends[~friends.girl].age.min()).dropna()
#dropna() drops every row with at least one missing value. This can be dangerous: rows that were already incomplete in the original data are dropped too

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28.0,0.0,Spain,Master
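
A sketch of why plain dropna() can be dangerous here. It assumes a variation of the data in which Pablo's degree is missing (that gap is invented for this example and is not in the table above).

```python
import pandas as pd

# same friends, but Pablo's degree is missing (an assumption made for this example)
friends_gap = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 33, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
    'degree': ["Bach", "Bach", None, "PhD", "Bach", "Master"],
})

youngest_boy_age = friends_gap[~friends_gap.girl].age.min()
masked = friends_gap.where(friends_gap.age == youngest_boy_age)

print(len(masked.dropna()))            # 0 -> Pablo is lost: his own row has a missing value
print(len(masked.dropna(how='all')))   # 1 -> only the fully empty rows are dropped
```

With `how='all'` a row is dropped only when every value in it is missing, which is what where() produces for the rows that fail the condition.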


The problem is that chaining where() does not subset 'friends': the second condition is still computed on the whole dataframe, so the minimum age keeps being that of the youngest woman:

In [51]:
# bad:
friends.where(~friends.girl).where(friends.age==friends.age.min())

Unnamed: 0,name,age,girl,born In,degree
0,,,,,
1,,,,,
2,,,,,
3,,,,,
4,,,,,
5,,,,,
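
To see where the empty result comes from, the sketch below evaluates the second condition on its own. It then shows one possible fix without query: store the first step in a variable (here called `boys_only`, a name chosen for this example) and build the second condition from it.

```python
import pandas as pd

# shortened copy of the friends table, for illustration only
friends = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 33, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
})

cond_youngest = friends.age == friends.age.min()     # computed on the WHOLE table
print(friends.loc[cond_youngest, 'name'].tolist())   # ['Juana']

# no row is both a boy and the youngest of everyone
print((cond_youngest & ~friends.girl).any())         # False

# storing the first step lets the second condition use the masked table
boys_only = friends.where(~friends.girl)
youngest_boy = boys_only.where(boys_only.age == boys_only.age.min()).dropna(how='all')
print(youngest_boy['name'].tolist())                 # ['Pablo']
```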


That's the advantage of **query**:

In [52]:
friends.query('~girl').query('age==age.min()')
#each query operates on the result of the previous one! This is a big advantage over the where() function, whose condition refers to the whole DF unless you specify otherwise
#remember that for queries 

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28,False,Spain,Master


In [62]:
#but

students=friends.copy() #makes a copy of friends

students.where(~students.girl,inplace=True) #blanks out the girls' rows (NaN) and, with inplace=True, changes students itself
students.where(students.age==students.age.min())['born In']


0      NaN
1      NaN
2    Spain
3      NaN
4      NaN
5      NaN
Name: born In, dtype: object

Let's vary the data a little:

In [63]:
#NOTE: Pauline and Pablo have the same age, which muddles the results (see below)

names=["Tomás", "Pauline", "Pablo", "Bjork","Alan","Juana"]
woman=[False,True,False,False,False,True]
ages=[32,28,28,30,32,27]
country=["Chile", "Senegal", "Spain", "Norway","Peru","Peru"]
education=["Bach", "Bach", "Master", "PhD","Bach","Master"]

# now in a dict:
data={'name':names, 'age':ages, 'girl':woman,'born In':country, 'degree':education}

#now into a DF
import pandas as pd

friends2=pd.DataFrame.from_dict(data)
# seeing it:
friends2

Unnamed: 0,name,age,girl,born In,degree
0,Tomás,32,False,Chile,Bach
1,Pauline,28,True,Senegal,Bach
2,Pablo,28,False,Spain,Master
3,Bjork,30,False,Norway,PhD
4,Alan,32,False,Peru,Bach
5,Juana,27,True,Peru,Master


Now there is a girl with the same age as the youngest boy, so:

In [64]:
friends2.where(friends2.age==friends2[~friends2.girl].age.min()).dropna()
#friends2[~friends2.girl].age.min() = the age of the youngest boy but in this case there is also a girl with the same age

Unnamed: 0,name,age,girl,born In,degree
1,Pauline,28.0,1.0,Senegal,Bach
2,Pablo,28.0,0.0,Spain,Master
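
Pauline appears because the comparison runs over everyone. One way to keep her out, sketched here with a shortened copy of friends2, is to make a real subset first and compare inside it. The name `boys` is just an illustrative choice.

```python
import pandas as pd

# shortened copy of friends2, for illustration only
friends2 = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 28, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
})

boys = friends2[~friends2.girl]                    # step 1: real subset, girls removed
youngest_boys = boys[boys.age == boys.age.min()]   # step 2: compare inside the subset
print(youngest_boys['name'].tolist())              # ['Pablo']

# idxmin gives the row label of the minimum (only the first one if several boys tie)
label = boys.age.idxmin()
print(friends2.loc[label, 'name'])                 # Pablo
```

The comparison version returns every boy who shares the minimum age, while `idxmin` returns a single label, so the choice depends on how ties should be handled.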


We need the earlier strategy of combining two conditions:

In [65]:
# bad implementation: without parentheses, & is evaluated before ==, and the two == become a chained comparison
friends2.where(friends2.age==friends2[~friends2.girl].age.min() & friends2.girl==False).dropna()

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [66]:
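# bad implementation: same problem, & is evaluated before == (no error this time, but the filter is not the one intended)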
friends2.where(friends2.age==friends2[~friends2.girl].age.min() & ~friends2.girl).dropna()

Unnamed: 0,name,age,girl,born In,degree


In [69]:
# just parentheses to make it work!
friends2.where((friends2.age==friends2[~friends2.girl].age.min()) & (~friends2.girl)).dropna()

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28.0,0.0,Spain,Master


This one still works!

In [71]:
friends2.query('~girl').query('age==age.min()')
#query is a little simpler since each query works on the result of the previous one, unlike the where() function (see below, where where() needs inplace=True to make a real subset first)
#query may be simple, but it leaves out code (the dataframe name), which makes things less clear: this is a slight disadvantage.

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28,False,Spain,Master


In [72]:
students2=friends2.copy()

students2.where(~students2.girl,inplace=True) #real subset
students2.where(students2.age==students2.age.min()).dropna()

Unnamed: 0,name,age,girl,born In,degree
2,Pablo,28.0,0.0,Spain,Master
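
The outputs above differ in one detail: the where() results show 28.0 while the query result shows 28. The sketch below runs the approaches side by side on a shortened copy of friends2 and checks the column types with the helpers in `pd.api.types`. The where() version here uses an intermediate variable instead of inplace=True, which is a substitution made for this example.

```python
import pandas as pd

# shortened copy of friends2, for illustration only
friends2 = pd.DataFrame({
    'name': ["Tomás", "Pauline", "Pablo", "Bjork", "Alan", "Juana"],
    'age': [32, 28, 28, 30, 32, 27],
    'girl': [False, True, False, False, False, True],
})

youngest_boy_age = friends2[~friends2.girl].age.min()
with_filter = friends2[(~friends2.girl) & (friends2.age == youngest_boy_age)]
with_query = friends2.query('~girl').query('age == age.min()')
boys = friends2.where(~friends2.girl)
with_where = boys.where(boys.age == boys.age.min()).dropna(how='all')

# all three find the same person
for result in (with_filter, with_query, with_where):
    print(result['name'].tolist())

# but where() had to store NaN in the age column, so the ages became decimals
print(pd.api.types.is_integer_dtype(with_filter['age']))   # True
print(pd.api.types.is_float_dtype(with_where['age']))      # True
```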
