### This is an introductory notebook on the pandas library and DataFrames. In this notebook you will become familiar with:
- How to store and manipulate single dimensional indexed data in the Series object
- Querying a Series
- DataFrame Data Structure
- DataFrame Indexing and Loading
- Querying DataFrames
- Indexing DataFrames
- Missing Values
- Manipulating DataFrames



In [1]:
import pandas as pd 


In [2]:
# List of three students
students = ['Alice', 'Jack', 'Molly']

# Call the Series function in pandas and pass in the students list
pd.Series(students)


0    Alice
1     Jack
2    Molly
dtype: object

In [3]:
# Values are indexed with integers, so the output has two columns: the index
# and the data we provided. The dtype is set to object because the values are strings.

# We don't have to use strings. If we pass whole numbers, the dtype is
# set to int64. The pandas library is built on top of the NumPy library.

# Create a list of numbers
numbers = [1,2,3]

pd.Series(numbers)


0    1
1    2
2    3
dtype: int64

In [4]:
# Below is an explanation of how pandas and NumPy deal with missing data

# In Python, None indicates a lack of data. If we have a list of strings
# and one element is None, pandas stores it as None and uses the object dtype
# for the underlying array.

# Difference between None and NaN: NaN is a floating point value, so it can live
# in a numeric array, whereas None is a Python object meaning "nonexistent" or "empty"

students = ['Alice', 'Jack', None]
pd.Series(students)

0    Alice
1     Jack
2     None
dtype: object

In [5]:
# NaN example
numbers = [1,2, None]
pd.Series(numbers)

0    1.0
1    2.0
2    NaN
dtype: float64

In [6]:
# Pandas represents NaN as a floating point number, and because integers can be typecast to floats,
# pandas converted our integers to floats.

# NaN is not equivalent to None


import numpy as np

np.nan == None

False
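
Because `==` cannot be used to detect missing values (NaN is not even equal to itself), missing entries are usually found with `isna`. The following is a minimal sketch; the variable names `numbers` and `missing` are illustrative only.

```python
import numpy as np
import pandas as pd

# NaN never compares equal to anything, including itself
print(np.nan == np.nan)

# pd.isna treats both None and NaN as missing
print(pd.isna(None), pd.isna(np.nan))

# Series.isna returns a Boolean Series marking the missing entries
numbers = pd.Series([1, 2, None])
missing = numbers.isna()
print(missing.tolist())
```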

In [10]:
# More about pandas series
# Series can be created directly from dictionary data.

students_scores = {
    'Alice':'Physics',
    'Jack': 'Chemistry',
    'Molly': 'English'
}

s = pd.Series(students_scores)
print(s)

# Print the index of the Series
print(s.index)

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object
Index(['Alice', 'Jack', 'Molly'], dtype='object')


In [12]:
# A Series can also hold more complex data, such as a list of tuples
students = [('Alice','Brown'), ('Jack', 'White'), ('Molly', 'Green')]
pd.Series(students)

0    (Alice, Brown)
1     (Jack, White)
2    (Molly, Green)
dtype: object

In [14]:
# You can also separate your index creation from the data by passing the index
# explicitly to the Series as a list
s = pd.Series(['Physics', 'Chemistry', 'English'], index=['Alice', 'Jack', 'Molly'])
s

Alice      Physics
Jack     Chemistry
Molly      English
dtype: object

In [15]:
# What happens if the values in your index list are not
# aligned with the keys in your dictionary when creating the Series?
# Pandas overrides the automatic index creation and uses exactly the index values you provided.
# It ignores all dictionary keys that are not in your index,
# and it adds a missing value (None or NaN) for any index value that is not a key in your dictionary

# Example
students_scores = {'Alice': 'Physics',
                   'Jack' : 'Chemistry',
                   'Molly': 'English'}

s = pd.Series(students_scores, index=['Alice', 'Molly' ,'Sam'])

print(s)

Alice    Physics
Molly    English
Sam          NaN
dtype: object


### Querying a Series

In [16]:
# A pandas Series can be queried either by the index position or the index label.
# To query by the index label, you can use the loc attribute

students_scores = {'Alice': 'Physics',
                   'Jack' : 'Chemistry',
                   'Molly': 'English',
                   'Sam': 'History'}

s = pd.Series(students_scores)
s

Alice      Physics
Jack     Chemistry
Molly      English
Sam        History
dtype: object

In [17]:
# To query the fourth entry:
s.iloc[3]

'History'

In [18]:
# To query Molly's class by label
s.loc['Molly']

'English'

In [19]:
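# iloc and loc are not methods, they are attributes. So you don't use parentheses
# to query them, but square brackets instead, which is called the indexing operator.

# You can also apply the indexing operator to the Series directly instead of using iloc and loc.
# Note: s[3] falls back to a positional lookup only because the index holds strings; recent
# pandas versions deprecate this fallback, so s.iloc[3] is the safer spelling.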

print(s[3])
print(s['Molly'])

History
English


In [20]:
# What happens if your index is a list of integers? The indexing operator then looks up
# labels, not positions, so it is best to use loc and iloc explicitly

# Example where classes are indexed by class codes
class_code = {99: 'Physics',
              100: 'Chemistry',
              101: 'English',
              102: 'History'}

s = pd.Series(class_code)

# If we try to call s[0], there is no label 0, so pandas raises a KeyError
s[0]

KeyError: 0

In [22]:
# But if we call the item using iloc
s.iloc[1]


'Chemistry'
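
As a small illustration of the difference, the sketch below looks up the same entry once by label and once by position. It rebuilds the class-code Series so that it runs on its own; `by_label` and `by_position` are names chosen for this example.

```python
import pandas as pd

class_code = pd.Series({99: 'Physics',
                        100: 'Chemistry',
                        101: 'English',
                        102: 'History'})

# loc looks up the index label 99
by_label = class_code.loc[99]

# iloc looks up the first position, whatever its label is
by_position = class_code.iloc[0]

print(by_label, by_position)
```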

In [23]:
# Getting data out of the Series
# One approach is to iterate over the items in the Series and apply the operation
# you are interested in.

grades = pd.Series([90,80,70,60])

total = 0
for grade in grades:
    total += grade
print(total/len(grades))

75.0


In [24]:
# This works, but it is slow on large data because the loop runs one element at a time in Python.
# It is better to use vectorization, where the operation is applied to the whole array at once

total = np.sum(grades)
print(total/len(grades))

75.0
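
The same idea extends to other whole-Series operations. The sketch below, which assumes the same four grades, computes the mean with the built-in `Series.mean` method and adds a constant to every grade without a loop; the 2-point adjustment is a made-up example.

```python
import numpy as np
import pandas as pd

grades = pd.Series([90, 80, 70, 60])

# Vectorized mean, two equivalent ways
mean_numpy = np.sum(grades) / len(grades)
mean_builtin = grades.mean()

# Broadcasting: the scalar 2 is added to every element at once
adjusted = grades + 2

print(mean_numpy, mean_builtin)
print(adjusted.tolist())
```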


In [31]:
# Appending Series together
# pd.concat takes a list of Series and returns a new Series; the originals are left unchanged.
# Index labels do not have to be unique, so 'Kelly' can appear twice in the result.

import pandas as pd

students_classes = pd.Series({'Alice': 'Physics',
                   'Jack' : 'Chemistry',
                   'Molly': 'English',
                   'Sam': 'History'})

kelly_classes = pd.Series(['Philosophy', 'Arts'], index = ['Kelly', 'Kelly'])
all_students = pd.concat([students_classes, kelly_classes])
print(all_students)


Alice       Physics
Jack      Chemistry
Molly       English
Sam         History
Kelly    Philosophy
Kelly          Arts
dtype: object
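
Duplicate labels change what a query returns. As a minimal sketch on a shortened version of the data, a label that occurs once gives back a single value, while a repeated label gives back a Series:

```python
import pandas as pd

students_classes = pd.Series({'Alice': 'Physics', 'Jack': 'Chemistry'})
kelly_classes = pd.Series(['Philosophy', 'Arts'], index=['Kelly', 'Kelly'])
all_students = pd.concat([students_classes, kelly_classes])

# 'Alice' is unique, so loc returns the plain value
alice = all_students.loc['Alice']

# 'Kelly' appears twice, so loc returns a Series with both entries
kelly = all_students.loc['Kelly']

print(alice)
print(kelly)
```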


## DataFrame Data Structure

In [33]:
# Example
record1 = pd.Series({
            'Name' : 'Alice',
            'CLass':'Physics',
            'Score': 85
})

record2 = pd.Series({
            'Name' : 'Jack',
            'CLass':'Chemistry',
            'Score': 82
})

record3 = pd.Series({
            'Name' : 'Helen',
            'CLass':'Biology',
            'Score': 90
})

In [34]:
# Like a Series, the DataFrame object is indexed.
df = pd.DataFrame([record1, record2, record3], index=['School1', 'School2', 'School1'])

df.head()

          Name      CLass  Score
School1  Alice    Physics     85
School2   Jack  Chemistry     82
School1  Helen    Biology     90


In [36]:
# The output shows the index on the left, as with a Series, followed by the columns

# Alternatively, you can build the DataFrame from a list of dictionaries

students = [
            {'Name' : 'Alice',
            'CLass': 'Physics',
            'Score': 85},
            {'Name' : 'Jack',
            'CLass':'Chemistry',
            'Score':82},
            {'Name' : 'Helen',
            'CLass':'Biology',
            'Score':90}
            ]

# Pass this list of dictionaries into the DataFrame function
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])
df.head()

          Name      CLass  Score
school1  Alice    Physics     85
school2   Jack  Chemistry     82
school1  Helen    Biology     90


In [39]:
# To select the data associated with school2, query
# the .loc attribute with one parameter
print(df.loc['school2'])

# You can also check the data type of the return
type(df.loc['school2'])

Name          Jack
CLass    Chemistry
Score           82
Name: school2, dtype: object


pandas.core.series.Series

In [40]:
# Let's try this with school1, which labels two rows, so a DataFrame is returned
print(df.loc['school1'])

          Name    CLass  Score
school1  Alice  Physics     85
school1  Helen  Biology     90
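
The return type of `.loc` therefore depends on how many rows carry the label. The sketch below uses a reduced version of the student data (only the Name and Score columns) and also shows that passing a list of labels gives a DataFrame even when only one row matches:

```python
import pandas as pd

students = [{'Name': 'Alice', 'Score': 85},
            {'Name': 'Jack', 'Score': 82},
            {'Name': 'Helen', 'Score': 90}]
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

# A label that occurs once returns the row as a Series
unique_row = df.loc['school2']

# A label that occurs several times returns a DataFrame
repeated_rows = df.loc['school1']

# A list of labels returns a DataFrame, even for a single matching row
always_frame = df.loc[['school2']]

print(type(unique_row), type(repeated_rows), type(always_frame))
```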


In [41]:
# If we are only interested in school1's student names
df.loc['school1', 'Name']

school1    Alice
school1    Helen
Name: Name, dtype: object

In [43]:
# What do we do if we want to select a single column?
# There are multiple ways. First, we could transpose the matrix.
# This pivots all of the rows into columns and all of the columns into rows
df.T

       school1    school2  school1
Name     Alice       Jack    Helen
CLass  Physics  Chemistry  Biology
Score       85         82       90


In [44]:
# Then we can call .loc on the transpose to get the student names only
df.T.loc['Name']

school1    Alice
school2     Jack
school1    Helen
Name: Name, dtype: object

In [47]:
# You can also select a column by passing its label to the indexing operator on the DataFrame
print(df['Name'])

# df.loc['Name'] does not work because the first argument of .loc selects row labels

type(df['Name'])

school1    Alice
school2     Jack
school1    Helen
Name: Name, dtype: object


pandas.core.series.Series

In [50]:
# You can chain the operations to pick rows first and then a specific column
print(df.loc['school1']['Name'])  # This yields a Series
print(df.loc['school1'])  # This yields a DataFrame

school1    Alice
school1    Helen
Name: Name, dtype: object
          Name    CLass  Score
school1  Alice  Physics     85
school1  Helen  Biology     90


In [53]:
# If we want to select all rows, we can use a colon to indicate a full slice

# Example: ask for the names and scores for all schools
print(df.loc[:,['Name', 'Score']]) # rows, columns

          Name  Score
school1  Alice     85
school2   Jack     82
school1  Helen     90
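
Rows and columns can also be combined in a single `.loc` call, which avoids chaining two indexing operations. A minimal sketch on a reduced version of the student data; the new score of 88 is an invented value for illustration:

```python
import pandas as pd

students = [{'Name': 'Alice', 'Score': 85},
            {'Name': 'Jack', 'Score': 82},
            {'Name': 'Helen', 'Score': 90}]
df = pd.DataFrame(students, index=['school1', 'school2', 'school1'])

# Row label first, then a list of column labels
subset = df.loc['school1', ['Name', 'Score']]
print(subset)

# A single .loc call is also a dependable way to assign a value
df.loc['school2', 'Score'] = 88
print(df.loc['school2', 'Score'])
```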


In [57]:
# To drop rows or columns from a DataFrame, we can use the drop() method.
# It does not modify the DataFrame; it returns a copy with the given labels removed.

df.drop('school1') # drop rows with school1

         Name      CLass  Score
school2  Jack  Chemistry     82


In [58]:
df

          Name      CLass  Score
school1  Alice    Physics     85
school2   Jack  Chemistry     82
school1  Helen    Biology     90


In [60]:
# drop() has two useful optional parameters. The first is called inplace,
# and if it's set to True, the DataFrame is updated in place instead of a copy being returned.
# The second is axis, which says what should be dropped. By default, this value is 0,
# indicating the row axis. If you want to drop a column, set axis to 1

copy_df = df.copy()

# Drop the name column
copy_df.drop('Name', inplace=True, axis=1)
copy_df

             CLass  Score
school1    Physics     85
school2  Chemistry     82
school1    Biology     90
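
`drop` also accepts the `index=` and `columns=` keywords, which some readers find easier to follow than `axis`. A minimal sketch on a reduced version of the data; without `inplace=True`, the original DataFrame stays untouched:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# Drop a column by name; equivalent to drop('Name', axis=1)
without_name = df.drop(columns=['Name'])

# Drop every row labelled school1; equivalent to drop('school1')
without_school1 = df.drop(index='school1')

print(without_name)
print(without_school1)
```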


In [63]:
# The second way to drop a column is directly through the indexing
# operator, using the del keyword. This way of dropping data, however, takes immediate
# effect on the DataFrame and does not return a copy
del copy_df['CLass']
copy_df

         Score
school1     85
school2     82
school1     90


In [64]:
# To add a column, assign to a new label with the indexing operator; the scalar is broadcast to every row

df['ClassRanking'] = None
df

          Name      CLass  Score  ClassRanking
school1  Alice    Physics     85          None
school2   Jack  Chemistry     82          None
school1  Helen    Biology     90          None
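
A new column does not have to be a single broadcast value. As a sketch with invented rankings and a hypothetical pass mark of 85, a column can also be filled from a list with one entry per row, or computed from an existing column:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Jack', 'Helen'],
                   'Score': [85, 82, 90]},
                  index=['school1', 'school2', 'school1'])

# A list must supply exactly one value per row (rankings are made up here)
df['ClassRanking'] = [2, 3, 1]

# A column can also be derived from another column (85 is a hypothetical pass mark)
df['Passed'] = df['Score'] >= 85

print(df)
```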


## Load data from a comma separated file into a dataframe

In [1]:
!cat resources/Admission_Predict.csv

Serial No.,GRE Score,TOEFL Score,University Rating,SOP,LOR ,CGPA,Research,Chance of Admit 
1,337,118,4,4.5,4.5,9.65,1,0.92
2,324,107,4,4,4.5,8.87,1,0.76
3,316,104,3,3,3.5,8,1,0.72
4,322,110,3,3.5,2.5,8.67,1,0.8
5,314,103,2,2,3,8.21,0,0.65
6,330,115,5,4.5,3,9.34,1,0.9
7,321,109,3,3,4,8.2,1,0.75
8,308,101,2,3,4,7.9,0,0.68
9,302,102,1,2,1.5,8,0,0.5
10,323,108,3,3.5,3,8.6,0,0.45
11,325,106,3,3.5,4,8.4,1,0.52
12,327,111,4,4,4.5,9,1,0.84
13,328,112,4,4,4.5,9.1,1,0.78
14,307,109,3,4,3,8,1,0.62
15,311,104,3,3.5,2,8.2,1,0.61
16,314,105,3,3.5,2.5,8.3,0,0.54
17,317,107,3,4,3,8.7,0,0.66
18,319,106,3,4,3,8,1,0.65
19,318,110,3,4,3,8.8,0,0.63
20,303,102,3,3.5,3,8.5,0,0.62
21,312,107,3,3,2,7.9,1,0.64
22,325,114,4,3,2,8.4,0,0.7
23,328,116,5,5,5,9.5,1,0.94
24,334,119,5,5,4.5,9.7,1,0.95
25,336,119,5,4,3.5,9.8,1,0.97
26,340,120,5,4.5,4.5,9.6,1,0.94
27,322,109,5,4.5,3.5,8.8,0,0.76
28,298,98,2,1.5,2.5,7.5,1,0.44
29,295,93,1,2,2,7.2,0,0.46
30,310,99,2,1.5,2,7.3,0,0.54
31,300,97,2,3,3,8.1,1,0.65
32,327,103,3,

In [3]:
# Pandas makes it easy to turn a CSV file into a DataFrame using the read_csv() function
import pandas as pd 
df = pd.read_csv('./resources/Admission_Predict.csv')

df.head()

   Serial No.  GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
0           1        337          118                  4  4.5  4.5  9.65         1             0.92
1           2        324          107                  4  4.0  4.5  8.87         1             0.76
2           3        316          104                  3  3.0  3.5   8.0         1             0.72
3           4        322          110                  3  3.5  2.5  8.67         1              0.8
4           5        314          103                  2  2.0  3.0  8.21         0             0.65


In [4]:
# By default, the index starts at 0 while the students' serial numbers
# start from 1. If you jump back to the CSV output, you'll see that pandas has created a new index.
# Instead, we can set the serial number as the index by using the index_col parameter
df = pd.read_csv('./resources/Admission_Predict.csv', index_col=0)

df.head()

            GRE Score  TOEFL Score  University Rating  SOP  LOR  CGPA  Research  Chance of Admit
Serial No.
1                 337          118                  4  4.5  4.5  9.65         1             0.92
2                 324          107                  4  4.0  4.5  8.87         1             0.76
3                 316          104                  3  3.0  3.5   8.0         1             0.72
4                 322          110                  3  3.5  2.5  8.67         1              0.8
5                 314          103                  2  2.0  3.0  8.21         0             0.65
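
`index_col` also accepts a column name instead of a position. The sketch below reads a two-row excerpt of the file from an in-memory string (via `io.StringIO`, used here only so the example runs without the CSV on disk) and keeps the trailing spaces found in the real header:

```python
import io
import pandas as pd

# Two rows and four columns copied from the CSV, header spacing preserved
csv_text = ("Serial No.,GRE Score,LOR ,Chance of Admit \n"
            "1,337,4.5,0.92\n"
            "2,324,4.5,0.76\n")

# Use the column named 'Serial No.' as the index
df = pd.read_csv(io.StringIO(csv_text), index_col='Serial No.')

print(df.index.name)
print(list(df.columns))
```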


In [6]:
# Rename columns SOP and LOR
new_df = df.rename(columns={
        'GRE Score' : 'GRE Score',
        'TOEFL Score' : 'TOEFL Score', 
        'University Rating' : 'University Rating',
        'SOP' : 'Statement of Purpose',
        'LOR' : 'Letter of Recommendation',
        'CGPA' : 'CGPA',
        'Research' : 'Researcha'
})

new_df.head()

            GRE Score  TOEFL Score  University Rating  Statement of Purpose  LOR  CGPA  Researcha  Chance of Admit
Serial No.
1                 337          118                  4                   4.5  4.5  9.65          1             0.92
2                 324          107                  4                   4.0  4.5  8.87          1             0.76
3                 316          104                  3                   3.0  3.5   8.0          1             0.72
4                 322          110                  3                   3.5  2.5  8.67          1              0.8
5                 314          103                  2                   2.0  3.0  8.21          0             0.65


In [7]:
# Only SOP changed. Let's investigate why LOR didn't change
new_df.columns

Index(['GRE Score', 'TOEFL Score', 'University Rating', 'Statement of Purpose',
       'LOR ', 'CGPA', 'Researcha', 'Chance of Admit '],
      dtype='object')

In [8]:
# We can see that the LOR column name actually has a trailing space

# You can rename the column by including the space after LOR
new_df = new_df.rename(columns={'LOR ': 'Letter of Recommendation'})
new_df.head()

            GRE Score  TOEFL Score  University Rating  Statement of Purpose  Letter of Recommendation  CGPA  Researcha  Chance of Admit
Serial No.
1                 337          118                  4                   4.5                       4.5  9.65          1             0.92
2                 324          107                  4                   4.0                       4.5  8.87          1             0.76
3                 316          104                  3                   3.0                       3.5   8.0          1             0.72
4                 322          110                  3                   3.5                       2.5  8.67          1              0.8
5                 314          103                  2                   2.0                       3.0  8.21          0             0.65


In [9]:
# But that approach is fragile. Instead, we can pass a function
# that does the cleaning and tell rename() to apply it to every column label.
# Python comes with a handy string function to strip white space called strip().

new_df = new_df.rename(mapper=str.strip, axis='columns')
new_df.head()

            GRE Score  TOEFL Score  University Rating  Statement of Purpose  Letter of Recommendation  CGPA  Researcha  Chance of Admit
Serial No.
1                 337          118                  4                   4.5                       4.5  9.65          1             0.92
2                 324          107                  4                   4.0                       4.5  8.87          1             0.76
3                 316          104                  3                   3.0                       3.5   8.0          1             0.72
4                 322          110                  3                   3.5                       2.5  8.67          1              0.8
5                 314          103                  2                   2.0                       3.0  8.21          0             0.65


In [10]:
# We can also assign a list of column names to the df.columns attribute,
# which directly renames the columns. Here we clean every name programmatically, so minor
# problems such as the white space we encountered are handled for all columns at once.

cols = list(df.columns)
# Then a little list comprehension
cols = [x.lower().strip() for x in cols]
# Overwrite the columns attribute
df.columns = cols
# Check the results
df.head()

            gre score  toefl score  university rating  sop  lor  cgpa  research  chance of admit
Serial No.
1                 337          118                  4  4.5  4.5  9.65         1             0.92
2                 324          107                  4  4.0  4.5  8.87         1             0.76
3                 316          104                  3  3.0  3.5   8.0         1             0.72
4                 322          110                  3  3.5  2.5  8.67         1              0.8
5                 314          103                  2  2.0  3.0  8.21         0             0.65
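
The two cleaning approaches can be compared side by side on a tiny stand-in DataFrame. This is only a sketch: the one-row frame below is made up to mimic the trailing spaces in the real header, and the `.str` accessor on the column index is used as a shorter alternative to the list comprehension.

```python
import pandas as pd

# One made-up row whose column names carry trailing spaces, as in the CSV
df = pd.DataFrame({'SOP': [4.5], 'LOR ': [4.5], 'Chance of Admit ': [0.92]})

# Approach 1: rename() applies str.strip to every column label and returns a copy
renamed = df.rename(columns=str.strip)
print(list(renamed.columns))

# Approach 2: overwrite the columns attribute, here via the .str accessor
df.columns = df.columns.str.strip().str.lower()
print(list(df.columns))
```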


## Querying a DataFrame

In [None]:
# The first step is to understand Boolean masking. It is the heart of fast and efficient querying
# in NumPy and pandas, and it's analogous to the bit masking used in other areas of computational science.

# A Boolean mask is an array, which can be one-dimensional like a Series or two-dimensional like a DataFrame,
# where each value is either True or False. This array is essentially overlaid on top of the data structure
# that we're querying, and any cell aligned with a True value is admitted into our final result.
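
Before applying this to the admissions data, the idea can be seen on a three-element Series. A minimal sketch with made-up values and labels:

```python
import pandas as pd

scores = pd.Series([0.92, 0.65, 0.80], index=['a', 'b', 'c'])

# Comparing a Series with a scalar yields a Boolean mask of the same shape
mask = scores > 0.7
print(mask.tolist())

# Overlaying the mask keeps only the entries aligned with True
selected = scores[mask]
print(selected.index.tolist())
```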

In [11]:
import pandas as pd 

# Load CSV file
df = pd.read_csv('./resources/Admission_Predict.csv', index_col=0)

# Clean the columns
df.columns = [x.lower().strip() for x in df.columns]
df.head()


            gre score  toefl score  university rating  sop  lor  cgpa  research  chance of admit
Serial No.
1                 337          118                  4  4.5  4.5  9.65         1             0.92
2                 324          107                  4  4.0  4.5  8.87         1             0.76
3                 316          104                  3  3.0  3.5   8.0         1             0.72
4                 322          110                  3  3.5  2.5  8.67         1              0.8
5                 314          103                  2  2.0  3.0  8.21         0             0.65


In [12]:
# Boolean masks are created by applying operators directly to the pandas Series or DataFrame objects.
# For instance, in our graduate admission dataset, we might be interested in seeing only those students
# that have a chance higher than 0.7 of being admitted

# Project the chance of admit column using the indexing operator and apply the
# greater-than operator with a comparison value of 0.7.

admit_mask=df['chance of admit'] > 0.7
admit_mask



Serial No.
1       True
2       True
3       True
4       True
5      False
       ...  
396     True
397     True
398     True
399    False
400     True
Name: chance of admit, Length: 400, dtype: bool

In [13]:
# The result of broadcasting a comparison operator is a Boolean mask:
# a Series object filled with True or False values.

# What do you do with it once you have it? You can lay it on top of the data to "hide" the data
# you don't want, which is represented by all of the False values.

df.where(admit_mask).head()

            gre score  toefl score  university rating  sop  lor  cgpa  research  chance of admit
Serial No.
1               337.0        118.0                4.0  4.5  4.5  9.65       1.0             0.92
2               324.0        107.0                4.0  4.0  4.5  8.87       1.0             0.76
3               316.0        104.0                3.0  3.0  3.5   8.0       1.0             0.72
4               322.0        110.0                3.0  3.5  2.5  8.67       1.0              0.8
5                 NaN          NaN                NaN  NaN  NaN   NaN       NaN              NaN


In [14]:
# The resulting DataFrame keeps the original index, but only the rows that met the condition keep their data.
# All of the rows that did not meet the condition hold NaN values instead

# Then we can drop the rows with NaN values
df.where(admit_mask).dropna().head()

            gre score  toefl score  university rating  sop  lor  cgpa  research  chance of admit
Serial No.
1               337.0        118.0                4.0  4.5  4.5  9.65       1.0             0.92
2               324.0        107.0                4.0  4.0  4.5  8.87       1.0             0.76
3               316.0        104.0                3.0  3.0  3.5   8.0       1.0             0.72
4               322.0        110.0                3.0  3.5  2.5  8.67       1.0              0.8
6               330.0        115.0                5.0  4.5  3.0  9.34       1.0              0.9


In [17]:
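# There is a shorthand that does the work of where() and dropna() in one step: pass the
# Boolean mask to the indexing operator. Unlike where(), it also keeps the integer dtypes.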

df[df['chance of admit'] > 0.7].head()

            gre score  toefl score  university rating  sop  lor  cgpa  research  chance of admit
Serial No.
1                 337          118                  4  4.5  4.5  9.65         1             0.92
2                 324          107                  4  4.0  4.5  8.87         1             0.76
3                 316          104                  3  3.0  3.5   8.0         1             0.72
4                 322          110                  3  3.5  2.5  8.67         1              0.8
6                 330          115                  5  4.5  3.0  9.34         1              0.9
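
The two routes select the same rows but differ in dtype, which the outputs above hint at (337.0 versus 337). A minimal sketch on three rows taken from the dataset; `via_where` and `via_index` are names chosen for this example:

```python
import pandas as pd

df = pd.DataFrame({'gre score': [337, 314, 330],
                   'chance of admit': [0.92, 0.65, 0.90]},
                  index=[1, 5, 6])
mask = df['chance of admit'] > 0.7

# where() inserts NaN for masked rows, which forces integer columns to float
via_where = df.where(mask).dropna()

# Boolean indexing simply omits the masked rows, so dtypes are preserved
via_index = df[mask]

print(via_where['gre score'].dtype, via_index['gre score'].dtype)
```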


In [18]:
# The indexing operator can also be called with a string parameter to project a single column
df['gre score'].head()

Serial No.
1    337
2    324
3    316
4    322
5    314
Name: gre score, dtype: int64

In [19]:
# or you can send it a list of columns as strings
df[['gre score', 'toefl score']].head()

            gre score  toefl score
Serial No.
1                 337          118
2                 324          107
3                 316          104
4                 322          110
5                 314          103


In [20]:
# Or you can send it a boolean mask
df[df['gre score']>320].head()

            gre score  toefl score  university rating  sop  lor  cgpa  research  chance of admit
Serial No.
1                 337          118                  4  4.5  4.5  9.65         1             0.92
2                 324          107                  4  4.0  4.5  8.87         1             0.76
4                 322          110                  3  3.5  2.5  8.67         1              0.8
6                 330          115                  5  4.5  3.0  9.34         1              0.9
7                 321          109                  3  3.0  4.0   8.2         1             0.75


In [21]:
# Combining multiple Boolean masks

# Take two Boolean Series and try to combine them with the Python keyword and
(df['chance of admit'] > 0.7) and (df['chance of admit'] < 0.9)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

In [22]:
# It doesn't work because the Python keywords and and or ask each Series for a single truth value, which is ambiguous.
# Instead, the pandas authors have overloaded the pipe | and ampersand & operators to combine masks element-wise

(df['chance of admit'] > 0.7) & (df['chance of admit'] < 0.9)

Serial No.
1      False
2       True
3       True
4       True
5      False
       ...  
396     True
397     True
398    False
399    False
400    False
Name: chance of admit, Length: 400, dtype: bool

In [23]:
# One thing to look out for is the order of operations. The & operator binds more tightly than
# the comparison operators, so without the parentheses Python evaluates 0.7 & df['chance of admit'] first and fails

df['chance of admit'] > 0.7 & df['chance of admit'] < 0.9

TypeError: Cannot perform 'rand_' with a dtyped [float64] array and scalar of type [bool]

In [24]:
# Alternatively, use the comparison methods gt() and lt(), which need no extra parentheses
df['chance of admit'].gt(0.7) & df['chance of admit'].lt(0.9)

Serial No.
1      False
2       True
3       True
4       True
5      False
       ...  
396     True
397     True
398    False
399    False
400    False
Name: chance of admit, Length: 400, dtype: bool

In [25]:
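# Be careful when chaining: this is not the same mask. lt(0.9) is applied to the Boolean result of gt(0.7),
# and since True counts as 1 and False as 0, the output is simply the inverse of the > 0.7 mask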
df['chance of admit'].gt(0.7).lt(0.9)

Serial No.
1      False
2      False
3      False
4      False
5       True
       ...  
396    False
397    False
398    False
399     True
400    False
Name: chance of admit, Length: 400, dtype: bool
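
For a range check in a single call, `Series.between` is one option. The sketch below uses four made-up chances and assumes a pandas version in which `inclusive` accepts the string `'neither'` (strict bounds on both sides); it also shows how the chained `gt().lt()` form differs.

```python
import pandas as pd

chance = pd.Series([0.92, 0.76, 0.65, 0.90])

# Reference mask: strictly between 0.7 and 0.9
expected = (chance > 0.7) & (chance < 0.9)

# Same mask in one call, excluding both endpoints
with_between = chance.between(0.7, 0.9, inclusive='neither')

# Chaining compares the Boolean result of gt(0.7) with 0.9 instead
chained = chance.gt(0.7).lt(0.9)

print(expected.tolist())
print(with_between.equals(expected))
print(chained.tolist())
```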