## Introduction to Pandas

Source: [DataCamp](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python)

Pandas is a popular Python package for data science, and with good reason: it offers powerful, expressive and flexible data structures that make data manipulation and analysis easy, among many other things. The DataFrame is one of these structures.

Pandas is built on top of NumPy.

In [1]:
# import pandas and numpy

import pandas as pd
import numpy as np

#### DataFrames

DataFrames are the workhorse of pandas and are directly inspired by the data frames of the R programming language.
We can think of a DataFrame as a collection of pandas Series objects put together to share the same index.
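
To make the "Series sharing one index" picture concrete, here is a minimal sketch. The two Series and the names `letter`, `numeric`, and `grades_df` are made up for illustration.

```python
import pandas as pd

# Two Series that carry the same index labels (hypothetical sample data)
letter = pd.Series(["A", "B+"], index=[0, 1], name="Letter Grade")
numeric = pd.Series([92, 86], index=[0, 1], name="Numeric Base")

# Placing them side by side (axis=1) yields a DataFrame with one shared index
grades_df = pd.concat([letter, numeric], axis=1)

# Pulling out a single column hands back a Series with that same index
col = grades_df["Numeric Base"]
print(type(col).__name__)
print(grades_df.index.equals(col.index))
```

Each column of the resulting DataFrame is itself a Series, which is why single-column selection later in this notebook returns a Series rather than a DataFrame.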

In [2]:
# Simple Dataframe

tbl = [["A",92], ["B+",86]]
sample_df = pd.DataFrame(tbl,columns=["Letter Grade","Numeric Base"])

# Dump
sample_df


Unnamed: 0,Letter Grade,Numeric Base
0,A,92
1,B+,86


In [4]:
# Copy data structure from Canvas

students = [{"name":"Alice","course":"BSME","strand":"STEM"},
            {"name":"Bob","course":"BSME","strand":"HUMSS"},
            {"name":"Carol","course":"BSITE","strand":"ABM"},
            {"name":"Charlie","course":"BSITE","strand":"ABM"},
            {"name":"Chuck","course":"BSLM","strand":"ABM"},
            {"name":"Charlie","course":"BSME","strand":"ABM"},
            {"name":"Dave","course":"BSMGTH","strand":"GAS"},
            {"name":"Eve","course":"BSMGT","strand":"HUMSS"},
            {"name":"Frank","course":"BSITE","strand":"Arts and Design"},
            ]



# Dump list

listdf = pd.DataFrame(students)

#students

listdf

Unnamed: 0,name,course,strand
0,Alice,BSME,STEM
1,Bob,BSME,HUMSS
2,Carol,BSITE,ABM
3,Charlie,BSITE,ABM
4,Chuck,BSLM,ABM
5,Charlie,BSME,ABM
6,Dave,BSMGTH,GAS
7,Eve,BSMGT,HUMSS
8,Frank,BSITE,Arts and Design


In [6]:
# select a subset of columns (the list also sets their order)

listdf[["course","strand"]]
# dump

Unnamed: 0,course,strand
0,BSME,STEM
1,BSME,HUMSS
2,BSITE,ABM
3,BSITE,ABM
4,BSLM,ABM
5,BSME,ABM
6,BSMGTH,GAS
7,BSMGT,HUMSS
8,BSITE,Arts and Design


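Let's say we would like to slice our DataFrame and select only specific portions of our data. There are two main indexers for doing so. A third, `.ix[]`, was deprecated in pandas 0.20 and removed in pandas 1.0, so it is listed only for reference.
```
1. .loc[]
2. .iloc[]
3. .ix[] (removed in pandas 1.0)
```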

In [28]:
# filter where name == "Alice"

alice_condition=listdf["name"]=="Alice"
#listdf.loc[:,"name"]=="Alice"

listdf.loc[alice_condition,:]

Unnamed: 0,name,course,strand
0,Alice,BSME,STEM


In [26]:
listdf

Unnamed: 0,name,course,strand
0,Alice,BSME,STEM
1,Bob,BSME,HUMSS
2,Carol,BSITE,ABM
3,Charlie,BSITE,ABM
4,Chuck,BSLM,ABM
5,Charlie,BSME,ABM
6,Dave,BSMGTH,GAS
7,Eve,BSMGT,HUMSS
8,Frank,BSITE,Arts and Design


In [31]:
# filter where course=="BSME"

bsme_condition = listdf.loc[:,"course"]=="BSME"
listdf.loc[bsme_condition,:]

Unnamed: 0,name,course,strand
0,Alice,BSME,STEM
1,Bob,BSME,HUMSS
5,Charlie,BSME,ABM


In [33]:
listdf.loc[listdf.loc[:,"course"]=="BSME"]

Unnamed: 0,name,course,strand
0,Alice,BSME,STEM
1,Bob,BSME,HUMSS
5,Charlie,BSME,ABM


.loc[] takes two selectors separated by a comma. Each selector can be a single label, a list of labels, a label slice, or a boolean mask. The first one selects rows and the second one selects columns.
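
The different kinds of selector that .loc[] accepts can be sketched on a small stand-in frame. The DataFrame `demo` below is a hypothetical three-row sample, not the full `listdf`.

```python
import pandas as pd

# Hypothetical three-row sample with the same columns as listdf
demo = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "course": ["BSME", "BSME", "BSITE"],
        "strand": ["STEM", "HUMSS", "ABM"],
    }
)

single = demo.loc[0, "name"]                 # one row label, one column label -> scalar
some_cols = demo.loc[:, ["name", "strand"]]  # all rows, a list of column labels
row_range = demo.loc[0:1, "name":"course"]   # label slices include both endpoints
masked = demo.loc[demo["course"] == "BSME", "name"]  # boolean mask selects rows
```

Note that a label slice such as `0:1` includes the row labelled 1, unlike ordinary Python slicing.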

In [34]:
# Use .loc, filter course=="BSMGT"

listdf.loc[listdf.loc[:,"course"]=="BSMGT"]


Unnamed: 0,name,course,strand
7,Eve,BSMGT,HUMSS


In [35]:
# slice rows and columns, course=="BSMGTH", column: strand

listdf.loc[listdf["course"]=="BSMGTH",["strand"]]



Unnamed: 0,strand
6,GAS


### .iloc
.iloc[] selects by integer position, whereas .loc[] selects by index labels and column names. Here are some examples:

In [36]:
# get row 0

listdf.iloc[0,:]

name      Alice
course     BSME
strand     STEM
Name: 0, dtype: object

In [37]:
# Get first two rows using .iloc

listdf.iloc[0:2,:]

Unnamed: 0,name,course,strand
0,Alice,BSME,STEM
1,Bob,BSME,HUMSS
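
With the default 0, 1, 2, ... index, positions and labels coincide, so the difference between .iloc[] and .loc[] is easy to miss. The sketch below uses a hypothetical frame with index labels 10, 20, 30 to separate the two.

```python
import pandas as pd

# Hypothetical sample whose index labels differ from row positions
demo = pd.DataFrame(
    {"name": ["Alice", "Bob", "Carol"], "course": ["BSME", "BSME", "BSITE"]},
    index=[10, 20, 30],
)

by_position = demo.iloc[0:2, :]  # positions 0 and 1; the stop position is excluded
by_label = demo.loc[10:20, :]    # labels 10 through 20; the stop label is included
last_row = demo.iloc[-1]         # negative positions count from the end
```

Both slices return the same two rows here, even though one is written `0:2` and the other `10:20`.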


#### Length and Size

In [40]:
# len
print(len(listdf))

# shape
print(listdf.shape)

# shape[0]
print(listdf.shape[0])


9
(9, 3)
9


In [41]:
listdf

Unnamed: 0,name,course,strand
0,Alice,BSME,STEM
1,Bob,BSME,HUMSS
2,Carol,BSITE,ABM
3,Charlie,BSITE,ABM
4,Chuck,BSLM,ABM
5,Charlie,BSME,ABM
6,Dave,BSMGTH,GAS
7,Eve,BSMGT,HUMSS
8,Frank,BSITE,Arts and Design


#### Add column

In [43]:
# randomize cumulative QPI

listdf["qpi"] = np.random.randint(250,400,len(listdf))/100

# dump
listdf



Unnamed: 0,name,course,strand,qpi
0,Alice,BSME,STEM,3.21
1,Bob,BSME,HUMSS,3.36
2,Carol,BSITE,ABM,3.84
3,Charlie,BSITE,ABM,3.28
4,Chuck,BSLM,ABM,2.68
5,Charlie,BSME,ABM,3.84
6,Dave,BSMGTH,GAS,3.81
7,Eve,BSMGT,HUMSS,3.98
8,Frank,BSITE,Arts and Design,3.11


In [44]:
listdf.columns

Index(['name', 'course', 'strand', 'qpi'], dtype='object')

#### Apply Function

In [45]:
# Copy function definition from Canvas


def latin_honors(qpi):
     if qpi >= 3.87:
        return "Summa Cum Laude"
     elif qpi >= 3.70:
        return "Magna Cum Laude"
     elif qpi >= 3.50:
        return "Cum Laude"
     elif qpi >= 3.35:
        return "Honorable Mention"
     else:
        return "NA"






In [46]:
# apply function latin_honors

listdf["qpi"].apply(latin_honors)


0                   NA
1    Honorable Mention
2      Magna Cum Laude
3                   NA
4                   NA
5      Magna Cum Laude
6      Magna Cum Laude
7      Summa Cum Laude
8                   NA
Name: qpi, dtype: object
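
`apply` calls `latin_honors` once per value. For larger data, a vectorized alternative is possible; the sketch below swaps in `numpy.select` with the same cut-offs. The sample `qpi` values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical QPI values
qpi = pd.Series([3.21, 3.36, 3.84, 3.98, 2.68], name="qpi")

# Conditions are checked in order, mirroring the if/elif chain in latin_honors
conditions = [qpi >= 3.87, qpi >= 3.70, qpi >= 3.50, qpi >= 3.35]
labels = ["Summa Cum Laude", "Magna Cum Laude", "Cum Laude", "Honorable Mention"]

# np.select takes the label of the first True condition per row, else the default
honors = pd.Series(np.select(conditions, labels, default="NA"), index=qpi.index)
```

The order of `conditions` matters, just as the order of the `elif` branches does.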

In [47]:
listdf["qpi"]

0    3.21
1    3.36
2    3.84
3    3.28
4    2.68
5    3.84
6    3.81
7    3.98
8    3.11
Name: qpi, dtype: float64

#### Create new column based on function

In [48]:
# create honors column
listdf["honors"] = listdf["qpi"].apply(latin_honors)
#dump
listdf


Unnamed: 0,name,course,strand,qpi,honors
0,Alice,BSME,STEM,3.21,
1,Bob,BSME,HUMSS,3.36,Honorable Mention
2,Carol,BSITE,ABM,3.84,Magna Cum Laude
3,Charlie,BSITE,ABM,3.28,
4,Chuck,BSLM,ABM,2.68,
5,Charlie,BSME,ABM,3.84,Magna Cum Laude
6,Dave,BSMGTH,GAS,3.81,Magna Cum Laude
7,Eve,BSMGT,HUMSS,3.98,Summa Cum Laude
8,Frank,BSITE,Arts and Design,3.11,


In [49]:
# Download file from Canvas


# Load csv file to dataframe

# assign file name to filename variable
filename = "senior_high_schools.csv"

# load to dataframe
df = pd.read_csv(filename)


In [51]:
# head
df.head()


Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
0,United Arab Emirates,United International Private School,ABM|HUMSS|STEM,Claret School of Lamitan,STEM|ABM|GAS|TVL
1,ARMM|Basilan,City of Lamitan,406921,"Hardam Furigay Colleges Foundation, Inc.",ABM|GAS|TVL
2,ARMM|Basilan,City of Lamitan,407069,"The Mariam School of Nursing, Inc.",TVL
3,ARMM|Basilan,Maluso,406059,Claret School of Maluso,GAS|TVL
4,ARMM|Basilan,Sumisip,406060,Claret School of Tumahubong,GAS|TVL
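
`pd.read_csv` accepts options that can head off type surprises at load time. The sketch below reads a hypothetical two-row stand-in for the file from memory. It keeps `SCHOOL ID` as text and treats the placeholder `(Blank)` as missing. Whether that is the right policy for the real file is an assumption.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-in for the real CSV file
sample_csv = io.StringIO(
    "MUNICIPALITY,SCHOOL ID,SCHOOL NAME\n"
    "City of Lamitan,406921,Sample School A\n"
    "Quezon City,(Blank),Sample School B\n"
)

schools_sample = pd.read_csv(
    sample_csv,
    dtype={"SCHOOL ID": str},  # keep IDs as text instead of numbers
    na_values=["(Blank)"],     # treat the placeholder as a missing value
)

missing_ids = schools_sample["SCHOOL ID"].isna().sum()
```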


In [52]:
# get first 20 entries

df.head(20)


Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
0,United Arab Emirates,United International Private School,ABM|HUMSS|STEM,Claret School of Lamitan,STEM|ABM|GAS|TVL
1,ARMM|Basilan,City of Lamitan,406921,"Hardam Furigay Colleges Foundation, Inc.",ABM|GAS|TVL
2,ARMM|Basilan,City of Lamitan,407069,"The Mariam School of Nursing, Inc.",TVL
3,ARMM|Basilan,Maluso,406059,Claret School of Maluso,GAS|TVL
4,ARMM|Basilan,Sumisip,406060,Claret School of Tumahubong,GAS|TVL
5,ARMM|Lanao del Sur,Balabagan,406063,San Isidro High School of Balabagan,ABM|GAS|TVL
6,ARMM|Lanao del Sur,Ditsaan-Ramain,600120,Adiong Memorial Polytechnic State College,ABM|HUMSS|TVL
7,ARMM|Lanao del Sur,Malabang,406066,Our Lady of Peace High School,HUMSS
8,ARMM|Lanao del Sur,Malabang,476002,Felix A. Panganiban (FAP) Academy of the Phili...,ABM|GAS
9,ARMM|Lanao del Sur,Marawi City,406008,Jamiatul Philippine Al-Islamia,ABM|STEM|HUMSS|TVL


#### Count Rows


In [53]:
# use shape
df.shape



(4905, 5)

In [54]:
# shape[0]
df.shape[0]



4905

In [55]:
# len
len(df)


4905

#### Get columns

In [56]:
# .columns
df.columns



Index(['TAXONOMY', 'MUNICIPALITY', 'SCHOOL ID', 'SCHOOL NAME',
       'PROGRAM OFFERINGS'],
      dtype='object')

In [60]:
# list comprehension of columns
#list(df.columns)
[c for c in df.columns]

['TAXONOMY', 'MUNICIPALITY', 'SCHOOL ID', 'SCHOOL NAME', 'PROGRAM OFFERINGS']

#### Get specific column

In [65]:
# SCHOOL NAME

df["SCHOOL NAME"]

0                       Claret School of Lamitan
1       Hardam Furigay Colleges Foundation, Inc.
2             The Mariam School of Nursing, Inc.
3                        Claret School of Maluso
4                    Claret School of Tumahubong
                          ...                   
4900                           Middle East|Qatar
4901            Middle East|United Arab Emirates
4902                        United Arab Emirates
4903             The Philippine School Abu Dhabi
4904                          ABM|STEM|HUMSS|GAS
Name: SCHOOL NAME, Length: 4905, dtype: object

In [66]:
# head
df["SCHOOL NAME"].head()



0                    Claret School of Lamitan
1    Hardam Furigay Colleges Foundation, Inc.
2          The Mariam School of Nursing, Inc.
3                     Claret School of Maluso
4                 Claret School of Tumahubong
Name: SCHOOL NAME, dtype: object

#### Get multiple columns

In [67]:
# pass a list of column names: SCHOOL NAME, PROGRAM OFFERINGS

df[["SCHOOL NAME","PROGRAM OFFERINGS"]]


Unnamed: 0,SCHOOL NAME,PROGRAM OFFERINGS
0,Claret School of Lamitan,STEM|ABM|GAS|TVL
1,"Hardam Furigay Colleges Foundation, Inc.",ABM|GAS|TVL
2,"The Mariam School of Nursing, Inc.",TVL
3,Claret School of Maluso,GAS|TVL
4,Claret School of Tumahubong,GAS|TVL
...,...,...
4900,Middle East|Qatar,Qatar
4901,Middle East|United Arab Emirates,United Arab Emirates
4902,United Arab Emirates,Philippine-Emirates Private School (PISCO Priv...
4903,The Philippine School Abu Dhabi,STEM|ABM


In [68]:
# head

df[["SCHOOL NAME","PROGRAM OFFERINGS"]].head()


Unnamed: 0,SCHOOL NAME,PROGRAM OFFERINGS
0,Claret School of Lamitan,STEM|ABM|GAS|TVL
1,"Hardam Furigay Colleges Foundation, Inc.",ABM|GAS|TVL
2,"The Mariam School of Nursing, Inc.",TVL
3,Claret School of Maluso,GAS|TVL
4,Claret School of Tumahubong,GAS|TVL


In [69]:
# TAXONOMY, SCHOOL NAME, PROGRAM OFFERINGS (head)

df[["TAXONOMY","SCHOOL NAME","PROGRAM OFFERINGS"]].head()



Unnamed: 0,TAXONOMY,SCHOOL NAME,PROGRAM OFFERINGS
0,United Arab Emirates,Claret School of Lamitan,STEM|ABM|GAS|TVL
1,ARMM|Basilan,"Hardam Furigay Colleges Foundation, Inc.",ABM|GAS|TVL
2,ARMM|Basilan,"The Mariam School of Nursing, Inc.",TVL
3,ARMM|Basilan,Claret School of Maluso,GAS|TVL
4,ARMM|Basilan,Claret School of Tumahubong,GAS|TVL


In [71]:
# TAXONOMY
df[["TAXONOMY"]]


Unnamed: 0,TAXONOMY
0,United Arab Emirates
1,ARMM|Basilan
2,ARMM|Basilan
3,ARMM|Basilan
4,ARMM|Basilan
...,...
4900,
4901,
4902,Far Eastern Private School - Al Shahba Campus
4903,STEM|ABM|HUMSS|GAS


In [73]:
print(type(df["TAXONOMY"]))
print(type(df[["TAXONOMY"]]))

<class 'pandas.core.series.Series'>
<class 'pandas.core.frame.DataFrame'>


#### Get Unique Values

In [74]:
# TAXONOMY, unique()

df["TAXONOMY"].unique()

array(['United Arab Emirates', 'ARMM|Basilan', 'ARMM|Lanao del Sur',
       'ARMM|Maguindanao', 'ARMM|Sulu', 'ARMM|Tawi-Tawi', 'CAR|Abra',
       'CAR|Apayao', 'CAR|Benguet', 'CAR|Ifugao', 'CAR|Kalinga',
       'CAR|Mt. Province', 'CARAGA|Agusan del Norte',
       'CARAGA|Agusan del Sur', 'CARAGA|Dinagat Islands',
       'CARAGA|Surigao del Norte', 'CARAGA|Surigao del Sur',
       'NCR|Metro Manila', 'NIR|Negros Occidental', 'NIR|Negros Oriental',
       'Region I|Ilocos Norte', 'Region I|Ilocos Sur',
       'Region I|La Union', 'Region I|Pangasinan', 'Region II|Cagayan',
       'Region II|Isabela', 'Region II|Nueva Vizcaya',
       'Region II|Quirino', 'Region III|Aurora', 'Region III|Bataan',
       'Region III|Bulacan', 'Region III|Nueva Ecija',
       'Region III|Pampanga', 'Region III|Tarlac', 'Region III|Zambales',
       'Region IV-A|Batangas', 'Region IV-A|Cavite', 'Region IV-A|Laguna',
       'Region IV-A|Quezon', 'Region IV-A|Rizal',
       'Region IV-B|Marinduque', 'Region I

#### Conditional Expression

In [78]:
# TAXONOMY = "NCR|Metro Manila"

is_ncr = df["TAXONOMY"]=="NCR|Metro Manila"
df[is_ncr]
ncr_df = df[is_ncr]
ncr_df

Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,NCR|Metro Manila,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,NCR|Metro Manila,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,NCR|Metro Manila,Binondo,406335,Tiong Se Academy,GAS|ABM
266,NCR|Metro Manila,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,NCR|Metro Manila,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...,...
1014,NCR|Metro Manila,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,NCR|Metro Manila,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,NCR|Metro Manila,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,NCR|Metro Manila,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


#### Update columns

In [79]:
# assign "NCR" to "TAXONOMY" column

ncr_df["TAXONOMY"] = "NCR"


# Note: you may get a SettingWithCopyWarning here (a warning, not an error)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy


In [80]:
# Fix using .copy()
# (copy and paste code above and modify with .copy())


is_ncr = df["TAXONOMY"]=="NCR|Metro Manila"
df[is_ncr]
ncr_df = df[is_ncr].copy()
ncr_df

Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,NCR|Metro Manila,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,NCR|Metro Manila,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,NCR|Metro Manila,Binondo,406335,Tiong Se Academy,GAS|ABM
266,NCR|Metro Manila,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,NCR|Metro Manila,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...,...
1014,NCR|Metro Manila,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,NCR|Metro Manila,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,NCR|Metro Manila,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,NCR|Metro Manila,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


In [81]:
# dump

ncr_df

Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,NCR|Metro Manila,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,NCR|Metro Manila,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,NCR|Metro Manila,Binondo,406335,Tiong Se Academy,GAS|ABM
266,NCR|Metro Manila,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,NCR|Metro Manila,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...,...
1014,NCR|Metro Manila,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,NCR|Metro Manila,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,NCR|Metro Manila,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,NCR|Metro Manila,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


In [82]:
# should not raise any more warnings after this
ncr_df["TAXONOMY"] = "NCR"


In [83]:
# dump

ncr_df

Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,NCR,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,NCR,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,NCR,Binondo,406335,Tiong Se Academy,GAS|ABM
266,NCR,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,NCR,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...,...
1014,NCR,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,NCR,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,NCR,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,NCR,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


In [84]:
# head

ncr_df.head()

Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,NCR,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,NCR,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,NCR,Binondo,406335,Tiong Se Academy,GAS|ABM
266,NCR,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,NCR,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
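
The warning appears because pandas cannot tell whether a filtered frame is an independent object or still tied to the original. Two ways of making the intent explicit are sketched below on a hypothetical three-row frame named `schools`.

```python
import pandas as pd

# Hypothetical sample standing in for df
schools = pd.DataFrame(
    {
        "TAXONOMY": ["NCR|Metro Manila", "ARMM|Basilan", "NCR|Metro Manila"],
        "MUNICIPALITY": ["Binondo", "Maluso", "Pasig City"],
    }
)
is_ncr = schools["TAXONOMY"] == "NCR|Metro Manila"

# Option 1: take an explicit copy, then modify only the copy
ncr_only = schools[is_ncr].copy()
ncr_only["TAXONOMY"] = "NCR"

# Option 2: update matching rows of a frame in a single .loc assignment
updated = schools.copy()
updated.loc[is_ncr, "TAXONOMY"] = "NCR"
```

Option 1 leaves `schools` untouched. Option 2 changes the matching rows of `updated` and keeps the other rows.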


#### Remove columns

In [85]:
# not in place unless specified; drop column TAXONOMY

ncr_df.drop(columns=["TAXONOMY"])


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,Binondo,406335,Tiong Se Academy,GAS|ABM
266,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...
1014,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


In [87]:
# Still there!
# dump

ncr_df

Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,NCR,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,NCR,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,NCR,Binondo,406335,Tiong Se Academy,GAS|ABM
266,NCR,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,NCR,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...,...
1014,NCR,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,NCR,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,NCR,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,NCR,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


In [88]:
# Make drop permanent using inplace=True

ncr_df.drop(columns=["TAXONOMY"], inplace=True)

In [89]:
# Check
ncr_df

Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,Binondo,406335,Tiong Se Academy,GAS|ABM
266,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...
1014,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL
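
An alternative to `inplace=True` is to assign the result of `drop` to a name. The sketch below uses a hypothetical two-row frame. The `errors="ignore"` option is shown as one way to make a drop safe to re-run after the column is already gone.

```python
import pandas as pd

# Hypothetical two-row sample
demo = pd.DataFrame(
    {"TAXONOMY": ["NCR", "NCR"], "MUNICIPALITY": ["Binondo", "Pateros"]}
)

# drop returns a new DataFrame; demo itself keeps its TAXONOMY column
trimmed = demo.drop(columns=["TAXONOMY"])

# errors="ignore" skips labels that are absent instead of raising KeyError
trimmed_again = trimmed.drop(columns=["TAXONOMY"], errors="ignore")
```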


In [90]:
# can also drop this way (using axis)

# reconstruct ncr_df...
is_ncr = df["TAXONOMY"]=="NCR|Metro Manila"
df[is_ncr]
ncr_df = df[is_ncr].copy()
ncr_df


Unnamed: 0,TAXONOMY,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,NCR|Metro Manila,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,NCR|Metro Manila,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,NCR|Metro Manila,Binondo,406335,Tiong Se Academy,GAS|ABM
266,NCR|Metro Manila,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,NCR|Metro Manila,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...,...
1014,NCR|Metro Manila,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,NCR|Metro Manila,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,NCR|Metro Manila,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,NCR|Metro Manila,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


In [91]:
#drop using axis=1 and inplace=True

ncr_df.drop(["TAXONOMY"], axis=1, inplace=True)

# dump to check
ncr_df


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
265,Binondo,406335,Tiong Se Academy,GAS|ABM
266,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL
267,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
...,...,...,...,...
1014,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM
1015,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS
1016,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS
1017,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL


#### Select Rows (More examples)

In [92]:
# more selection and filtering examples: MUNICIPALITY = "City of Makati"

ncr_df[ncr_df["MUNICIPALITY"]=="City of Makati"]
# hmmm why only two entries?


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
342,City of Makati,401728,SoFa Design Institute,Arts and Design


In [93]:
# how about Quezon City?


ncr_df[ncr_df["MUNICIPALITY"]=="Quezon City"]


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
674,Quezon City,401443,ABE International Business College-Commonwealth,ABM|STEM|TVL
675,Quezon City,401445,ABE International College of Business and Acco...,ABM|GAS|HUMSS|STEM|TVL
676,Quezon City,401447,AMA Computer College-Farview,ABM|GAS|HUMSS|TVL
677,Quezon City,401448,AMA Computer University,ABM|STEM|GAS|HUMSS|TVL
678,Quezon City,401450,Asian College of Science and Technology Founda...,ABM|GAS|TVL
...,...,...,...,...
869,Quezon City,482956,Eclaro Academy,STEM|GAS|TVL
870,Quezon City,482971,"Datamex College of Saint Adeline, Inc.-Fairview",ABM|STEM|GAS|HUMSS|Arts and Design|Sports|TVL
871,Quezon City,482974,Mary the Queen College of Quezon City,ABM|HUMSS|STEM|GAS|TVL
872,Quezon City,(Blank),Jbest School of Technology and Practical Skill...,TVL


#### Select subset of rows and columns

In [96]:
# MUNICIPALITY = Quezon City, display MUNICIPALITY and PROGRAM OFFERINGS

ncr_df[ncr_df["MUNICIPALITY"]=="Quezon City"][["MUNICIPALITY","PROGRAM OFFERINGS"]]


Unnamed: 0,MUNICIPALITY,PROGRAM OFFERINGS
674,Quezon City,ABM|STEM|TVL
675,Quezon City,ABM|GAS|HUMSS|STEM|TVL
676,Quezon City,ABM|GAS|HUMSS|TVL
677,Quezon City,ABM|STEM|GAS|HUMSS|TVL
678,Quezon City,ABM|GAS|TVL
...,...,...
869,Quezon City,STEM|GAS|TVL
870,Quezon City,ABM|STEM|GAS|HUMSS|Arts and Design|Sports|TVL
871,Quezon City,ABM|HUMSS|STEM|GAS|TVL
872,Quezon City,TVL


#### Regex

`Series.str.contains(self, pat, case=True, flags=0, na=nan, regex=True)`
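
The sketch below shows the effect of the `case`, `na`, and `regex` parameters on a small hypothetical Series. The values in `towns` are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical values, including a typo variant and a missing entry
towns = pd.Series(["Quezon City", "Quezon CIty", "Makati City", np.nan])

# Case-sensitive by default; na=False maps missing values to False
exact_case = towns.str.contains("City", na=False)

# case=False ignores letter case
any_case = towns.str.contains("city", case=False, na=False)

# With regex=True (the default), "|" means "or"
either = towns.str.contains("Quezon|Makati", na=False)

# regex=False matches the pattern as literal text, so "|" is just a character
literal = pd.Series(["ABM|STEM", "ABM"]).str.contains("|", regex=False)
```

Passing `na=False` keeps the result strictly boolean, which matters when the mask is used to filter rows.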

In [99]:
ncr_df["MUNICIPALITY"].str.contains("Quezon")

263    False
264    False
265    False
266    False
267    False
Name: MUNICIPALITY, dtype: bool

In [101]:
# str.contains(...)
ncr_df[ncr_df["MUNICIPALITY"].str.contains("Quezon")]



Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
674,Quezon City,401443,ABE International Business College-Commonwealth,ABM|STEM|TVL
675,Quezon City,401445,ABE International College of Business and Acco...,ABM|GAS|HUMSS|STEM|TVL
676,Quezon City,401447,AMA Computer College-Farview,ABM|GAS|HUMSS|TVL
677,Quezon City,401448,AMA Computer University,ABM|STEM|GAS|HUMSS|TVL
678,Quezon City,401450,Asian College of Science and Technology Founda...,ABM|GAS|TVL
...,...,...,...,...
869,Quezon City,482956,Eclaro Academy,STEM|GAS|TVL
870,Quezon City,482971,"Datamex College of Saint Adeline, Inc.-Fairview",ABM|STEM|GAS|HUMSS|Arts and Design|Sports|TVL
871,Quezon City,482974,Mary the Queen College of Quezon City,ABM|HUMSS|STEM|GAS|TVL
872,Quezon City,(Blank),Jbest School of Technology and Practical Skill...,TVL


In [102]:
# MUNICIPALITY contains Makati

ncr_df[ncr_df["MUNICIPALITY"].str.contains("Makati")]


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
342,City of Makati,401728,SoFa Design Institute,Arts and Design
389,Makati City,401015,ABE International Business College-Makati City,STEM|ABM|HUMSS|GAS|TVL
390,Makati City,401016,"AMA Computer College, Inc.-Makati",STEM|ABM|HUMSS|GAS|TVL
391,Makati City,401027,AMA Computer Learning Center of Guadalupe,STEM|ABM|HUMSS|GAS|TVL
392,Makati City,401034,Asia Pacific College,GAS|ABM|STEM
393,Makati City,401056,Career Academy Asia- PHINMA Education,Arts and Design|TVL
394,Makati City,401059,Centro Escolar University-Makati,STEM|ABM|HUMSS
395,Makati City,401067,"iACADEMY, Belair",ABM|HUMSS|GAS|Arts and Design|TVL
396,Makati City,401071,Infotech Institute of Arts and Sciences-Makati...,TVL


In [103]:
# SCHOOL NAME contains Ateneo

ncr_df[ncr_df["SCHOOL NAME"].str.contains("Ateneo")]

Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
793,Quezon City,406575,Ateneo de Manila University,ABM|STEM|HUMSS|GAS


In [104]:
# SCHOOL NAME contains Ateneo or Xavier


ncr_df[ncr_df["SCHOOL NAME"].str.contains("Ateneo|Xavier")]

Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
771,Quezon City,406488,St. Francis Xavier Catholic School,GAS|HUMSS
793,Quezon City,406575,Ateneo de Manila University,ABM|STEM|HUMSS|GAS
913,San Juan City,406954,Xavier School,GAS|ABM|STEM|HUMSS


In [106]:
# PROGRAM OFFERINGS contains STEM


ncr_df[ncr_df["PROGRAM OFFERINGS"].str.contains("STEM")]


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS
264,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM
267,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL
268,Caloocan City,400951,AMA Computer Learning Center of Caloocan City,STEM|ABM|HUMSS|GAS|TVL
271,Caloocan City,400960,Datamex Institute of Computer Technology-Caloo...,STEM|ABM|HUMSS|GAS|Arts and Design|Sports|TVL
...,...,...,...,...
1004,Valenzuela City,407184,Our Lady of Lourdes College,STEM|ABM|HUMSS|GAS|TVL
1005,Valenzuela City,407185,St. Bernadette College of Valenzuela,ABM|STEM|HUMSS|GAS|TVL
1006,Valenzuela City,407186,"St. Catherine College of Valenzuela, Inc.",ABM|HUMSS|STEM|GAS
1016,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS


#### Add Columns

In [107]:
# First check what happens when we invoke str.contains ...
ncr_df["PROGRAM OFFERINGS"].str.contains("STEM")


263      True
264      True
265     False
266     False
267      True
        ...  
1014    False
1015    False
1016     True
1017    False
1018     True
Name: PROGRAM OFFERINGS, Length: 756, dtype: bool

In [108]:
# Introduce columns
# STEM
# ABM
# HUMSS
# GAS
# TVL
# Arts and Design
# Sports

ncr_df["STEM"]=ncr_df["PROGRAM OFFERINGS"].str.contains("STEM")
ncr_df["ABM"]=ncr_df["PROGRAM OFFERINGS"].str.contains("ABM")
ncr_df["HUMSS"]=ncr_df["PROGRAM OFFERINGS"].str.contains("HUMSS")
ncr_df["GAS"]=ncr_df["PROGRAM OFFERINGS"].str.contains("GAS")
ncr_df["TVL"]=ncr_df["PROGRAM OFFERINGS"].str.contains("TVL")
ncr_df["Arts and Design"]=ncr_df["PROGRAM OFFERINGS"].str.contains("Arts and Design")
ncr_df["Sports"]=ncr_df["PROGRAM OFFERINGS"].str.contains("Sports")




In [109]:
# Check new columns

ncr_df.columns

Index(['MUNICIPALITY', 'SCHOOL ID', 'SCHOOL NAME', 'PROGRAM OFFERINGS', 'STEM',
       'ABM', 'HUMSS', 'GAS', 'TVL', 'Arts and Design', 'Sports'],
      dtype='object')
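
The seven near-identical assignments can be written more compactly. The sketch below shows two options on a hypothetical three-row frame: a loop over track names, and `Series.str.get_dummies`, which splits on a separator. The names `demo`, `tracks`, and `dummies` are illustrative.

```python
import pandas as pd

# Hypothetical sample of the PROGRAM OFFERINGS column
demo = pd.DataFrame(
    {"PROGRAM OFFERINGS": ["STEM|ABM", "TVL", "ABM|Arts and Design"]}
)
tracks = ["STEM", "ABM", "TVL", "Arts and Design"]

# Option 1: one boolean column per track; regex=False treats names as plain text
for track in tracks:
    demo[track] = demo["PROGRAM OFFERINGS"].str.contains(track, regex=False)

# Option 2: split on "|" and build one indicator column per distinct value
dummies = demo["PROGRAM OFFERINGS"].str.get_dummies(sep="|").astype(bool)
```

Option 2 matches whole entries between separators, while Option 1 matches substrings, so the two can differ if one track name is contained in another.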

#### Conditional Selection

In [111]:
# Check new columns
ncr_df[['MUNICIPALITY', 'SCHOOL ID', 'SCHOOL NAME', 'PROGRAM OFFERINGS', 'STEM',
       'ABM', 'HUMSS', 'GAS', 'TVL', 'Arts and Design', 'Sports']]


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS,STEM,ABM,HUMSS,GAS,TVL,Arts and Design,Sports
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS,True,True,True,True,False,False,False
264,Binondo,406330,Lorenzo Ruiz Academy,ABM|STEM,True,True,False,False,False,False,False
265,Binondo,406335,Tiong Se Academy,GAS|ABM,False,True,False,True,False,False,False
266,Caloocan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL,False,True,False,False,True,False,False
267,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...
1014,Valenzuela City,487015,"St. Michael School of Canumay, Inc.",GAS|HUMSS|ABM,False,True,True,True,False,False,False
1015,Valenzuela City,487020,Nuestra Señora de Guia Academy,ABM|HUMSS|GAS,False,True,True,True,False,False,False
1016,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS,True,True,True,True,False,False,False
1017,Valenzuela City,487036,Sta. Cecilia College,ABM|HUMSS|GAS|TVL,False,True,True,True,True,False,False


In [113]:
# Which Schools in NCR offer both STEM and HUMSS?
# copy and paste column list above

stem = ncr_df["STEM"]==True
humss = ncr_df["HUMSS"]==True

ncr_df[(stem)&(humss)][['MUNICIPALITY', 'SCHOOL ID', 'SCHOOL NAME', 'PROGRAM OFFERINGS', 'STEM',
       'ABM', 'HUMSS', 'GAS', 'TVL', 'Arts and Design', 'Sports']]

Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS,STEM,ABM,HUMSS,GAS,TVL,Arts and Design,Sports
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS,True,True,True,True,False,False,False
267,Caloocan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False
268,Caloocan City,400951,AMA Computer Learning Center of Caloocan City,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False
271,Caloocan City,400960,Datamex Institute of Computer Technology-Caloo...,STEM|ABM|HUMSS|GAS|Arts and Design|Sports|TVL,True,True,True,True,True,True,True
274,Caloocan City,400972,Martinez Memorial Colleges,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False
...,...,...,...,...,...,...,...,...,...,...,...
1004,Valenzuela City,407184,Our Lady of Lourdes College,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False
1005,Valenzuela City,407185,St. Bernadette College of Valenzuela,ABM|STEM|HUMSS|GAS|TVL,True,True,True,True,True,False,False
1006,Valenzuela City,407186,"St. Catherine College of Valenzuela, Inc.",ABM|HUMSS|STEM|GAS,True,True,True,True,False,False,False
1016,Valenzuela City,487031,St. Gregory College of Valenzuela City,ABM|HUMSS|STEM|GAS,True,True,True,True,False,False,False
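
Boolean masks combine with `&` (and), `|` (or), and `~` (not). The sketch below uses a hypothetical three-row frame. Because the columns already hold booleans, comparing them with `== True` is optional.

```python
import pandas as pd

# Hypothetical sample with boolean track columns
demo = pd.DataFrame(
    {
        "SCHOOL NAME": ["School A", "School B", "School C"],
        "STEM": [True, True, False],
        "HUMSS": [True, False, False],
    }
)

both = demo[demo["STEM"] & demo["HUMSS"]]        # offers STEM and HUMSS
either = demo[demo["STEM"] | demo["HUMSS"]]      # offers at least one of them
stem_only = demo[demo["STEM"] & ~demo["HUMSS"]]  # offers STEM but not HUMSS
```

When a mask is built from a comparison, such as `demo["SCHOOL NAME"] == "School A"`, wrap it in parentheses before combining, since `&` and `|` bind more tightly than `==`.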


#### `.loc[rowindex,colindex]`

In [114]:
# define row_idx and col_idx variables

row_idx=ncr_df["MUNICIPALITY"].str.contains("Makati")
col_idx=["MUNICIPALITY","PROGRAM OFFERINGS"]
# use .loc
ncr_df.loc[row_idx,col_idx]




Unnamed: 0,MUNICIPALITY,PROGRAM OFFERINGS
263,City of Makati,STEM|ABM|HUMSS|GAS
342,City of Makati,Arts and Design
389,Makati City,STEM|ABM|HUMSS|GAS|TVL
390,Makati City,STEM|ABM|HUMSS|GAS|TVL
391,Makati City,STEM|ABM|HUMSS|GAS|TVL
392,Makati City,GAS|ABM|STEM
393,Makati City,Arts and Design|TVL
394,Makati City,STEM|ABM|HUMSS
395,Makati City,ABM|HUMSS|GAS|Arts and Design|TVL
396,Makati City,TVL


### Lightweight Data Cleanup

In [115]:
# dump unique MUNICIPALITIES first to check

ncr_df["MUNICIPALITY"].unique()


array(['City of Makati', 'Binondo', 'Caloocan City', 'City of Las Piñas',
       'City of Muntinlupa', 'City of Valenzuela', 'Kalookan City',
       'Las Piñas City', 'Makati City', 'Malabon City', 'Malate',
       'Mandaluyong City', 'Manila', 'Manila City', 'Marikina City',
       'Muntinlupa City', 'Navotas City', 'Paco', 'Pandacan',
       'Parañaque City', 'Pasay City', 'Pasig City', 'Pateros',
       'Quezon City', 'Quezon CIty', 'Quiapo', 'Sampaloc',
       'San Juan City', 'Santa Ana', 'Santa Cruz', 'Taguig City', 'Tondo',
       'Valenzuela City'], dtype=object)

In [116]:
# Standardize MUNICIPALITY values: merge spelling variants and fold Manila districts into "Manila City"

ncr_df.loc[ncr_df["MUNICIPALITY"]=="Makati City",["MUNICIPALITY"]]="City of Makati"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Quezon CIty",["MUNICIPALITY"]]="Quezon City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Caloocan City",["MUNICIPALITY"]]="Kalookan City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Las Piñas City",["MUNICIPALITY"]]="City of Las Piñas"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Valenzuela City",["MUNICIPALITY"]]="City of Valenzuela"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Muntinlupa City",["MUNICIPALITY"]]="City of Muntinlupa"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Manila",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Binondo",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Malate",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Paco",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Pandacan",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Quiapo",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Sampaloc",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Santa Cruz",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Santa Ana",["MUNICIPALITY"]]="Manila City"
ncr_df.loc[ncr_df["MUNICIPALITY"]=="Tondo",["MUNICIPALITY"]]="Manila City"
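
One possible way to shorten the cell above is to collect the old-to-new pairs in a dictionary and pass it to `Series.replace`. This is a minimal sketch on a toy frame, not the notebook's own method; `toy_df` and `canonical` are hypothetical names, and the dictionary lists only three of the pairs.

```python
import pandas as pd

# Toy column with a few of the variants seen above (hypothetical frame)
toy_df = pd.DataFrame(
    {"MUNICIPALITY": ["Makati City", "Quezon CIty", "Tondo", "Pateros"]}
)

# old label -> canonical label (only a subset of the pairs, for illustration)
canonical = {
    "Makati City": "City of Makati",
    "Quezon CIty": "Quezon City",
    "Tondo": "Manila City",
}

# replace() swaps exact matches and leaves every other value untouched
toy_df["MUNICIPALITY"] = toy_df["MUNICIPALITY"].replace(canonical)
print(toy_df["MUNICIPALITY"].tolist())
```

Keeping the pairs in one dictionary makes the mapping easier to review, and adding a new variant becomes a one-line change.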



In [117]:
# check unique values again
ncr_df["MUNICIPALITY"].unique()





array(['City of Makati', 'Manila City', 'Kalookan City',
       'City of Las Piñas', 'City of Muntinlupa', 'City of Valenzuela',
       'Malabon City', 'Mandaluyong City', 'Marikina City',
       'Navotas City', 'Parañaque City', 'Pasay City', 'Pasig City',
       'Pateros', 'Quezon City', 'San Juan City', 'Taguig City'],
      dtype=object)

In [118]:
# count the distinct MUNICIPALITY values left after cleanup
len(ncr_df["MUNICIPALITY"].unique())


17
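
As a side note, `Series.nunique()` gives the distinct count directly. The two approaches can differ when a column has missing values, as this small sketch with a made-up Series (`names` is a hypothetical example) suggests:

```python
import pandas as pd

# Made-up Series with one missing value
names = pd.Series(["Pateros", "Pasig City", "Pateros", None])

# unique() keeps the missing value as its own entry
count_with_missing = len(names.unique())
# nunique() skips missing values by default (dropna=True)
count_without_missing = names.nunique()

print(count_with_missing, count_without_missing)
```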

In [119]:
# define offerings variable with a list of offerings
offerings = ["STEM","ABM","HUMSS","GAS","TVL","Arts and Design","Sports"]
# dump
ncr_df[offerings]



Unnamed: 0,STEM,ABM,HUMSS,GAS,TVL,Arts and Design,Sports
263,True,True,True,True,False,False,False
264,True,True,False,False,False,False,False
265,False,True,False,True,False,False,False
266,False,True,False,False,True,False,False
267,True,True,True,True,True,False,False
...,...,...,...,...,...,...,...
1014,False,True,True,True,False,False,False
1015,False,True,True,True,False,False,False
1016,True,True,True,True,False,False,False
1017,False,True,True,True,True,False,False


In [120]:
# Count number of offerings per school using sum (note: True == 1)
ncr_df[offerings].sum(axis=1)



263     4
264     2
265     2
266     2
267     5
       ..
1014    3
1015    3
1016    4
1017    4
1018    4
Length: 756, dtype: int64
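
To see why `sum(axis=1)` yields a per-school count, it may help to compare both axes on a tiny boolean table. The frame below is made up for illustration (`toy_df`, "School X" and "School Y" are assumptions):

```python
import pandas as pd

# Two hypothetical schools and three strands
toy_df = pd.DataFrame(
    {"STEM": [True, False], "ABM": [True, True], "TVL": [False, False]},
    index=["School X", "School Y"],
)
offerings = ["STEM", "ABM", "TVL"]

# axis=1 adds across the columns: one total per row (per school)
per_school = toy_df[offerings].sum(axis=1)
# axis=0 adds down the rows: one total per column (per strand)
per_strand = toy_df[offerings].sum(axis=0)

print(per_school)
print(per_strand)
```

Because `True` counts as 1 and `False` as 0, each total is simply the number of `True` cells along that axis.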

In [121]:
# Add new column OFFERINGS COUNT
ncr_df["OFFERINGS COUNT"]=ncr_df[offerings].sum(axis=1)
ncr_df.head()


Unnamed: 0,MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS,STEM,ABM,HUMSS,GAS,TVL,Arts and Design,Sports,OFFERINGS COUNT
263,City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS,True,True,True,True,False,False,False,4
264,Manila City,406330,Lorenzo Ruiz Academy,ABM|STEM,True,True,False,False,False,False,False,2
265,Manila City,406335,Tiong Se Academy,GAS|ABM,False,True,False,True,False,False,False,2
266,Kalookan City,400942,ABE International Business College-Caloocan Ca...,ABM|TVL,False,True,False,False,True,False,False,2
267,Kalookan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False,5


### Group By

In [123]:
# group schools by MUNICIPALITY, then count the schools in each group
muni_gb = ncr_df.groupby("MUNICIPALITY")
muni_gb["MUNICIPALITY"].count()


MUNICIPALITY
City of Las Piñas      43
City of Makati         33
City of Muntinlupa     35
City of Valenzuela     30
Kalookan City          77
Malabon City           14
Mandaluyong City       19
Manila City           106
Marikina City          35
Navotas City            4
Parañaque City         44
Pasay City             22
Pasig City             39
Pateros                 4
Quezon City           200
San Juan City          15
Taguig City            36
Name: MUNICIPALITY, dtype: int64



#### `sort_values()`

In [124]:
# sort the per-municipality counts from largest to smallest

muni_gb["MUNICIPALITY"].count().sort_values(ascending=False)

MUNICIPALITY
Quezon City           200
Manila City           106
Kalookan City          77
Parañaque City         44
City of Las Piñas      43
Pasig City             39
Taguig City            36
City of Muntinlupa     35
Marikina City          35
City of Makati         33
City of Valenzuela     30
Pasay City             22
Mandaluyong City       19
San Juan City          15
Malabon City           14
Navotas City            4
Pateros                 4
Name: MUNICIPALITY, dtype: int64
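
For a single column, `value_counts()` appears to give the same tally in one call, already sorted from largest to smallest. A minimal sketch on made-up data (`toy_df` is a hypothetical frame):

```python
import pandas as pd

# Hypothetical frame with five schools in two municipalities
toy_df = pd.DataFrame(
    {"MUNICIPALITY": ["Pateros", "Pasig City", "Pasig City", "Pasig City", "Pateros"]}
)

# Approach used above: group, count, then sort descending
by_groupby = (
    toy_df.groupby("MUNICIPALITY")["MUNICIPALITY"]
    .count()
    .sort_values(ascending=False)
)

# Alternative: value_counts() counts and sorts descending by default
by_value_counts = toy_df["MUNICIPALITY"].value_counts()

print(by_groupby)
print(by_value_counts)
```

The `groupby` form remains the more general tool, since the same grouped object can be reused for other aggregations.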

In [127]:
# How many STEM schools per municipality?

muni_gb_stem = ncr_df.loc[ncr_df["STEM"],["MUNICIPALITY","STEM"]].groupby(["MUNICIPALITY"])
muni_gb_stem.count().sort_values(by="STEM",ascending=False)


MUNICIPALITY,STEM
Quezon City,84
Manila City,68
Kalookan City,30
City of Las Piñas,21
City of Makati,19
Marikina City,16
Parañaque City,15
Mandaluyong City,13
Pasig City,12
City of Valenzuela,11
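
Since `STEM` is a boolean column, another option is to sum it per group instead of filtering first. The sketch below, on made-up rows (`toy_df` is hypothetical), compares the two; one difference is that the summed version keeps municipalities with zero STEM schools.

```python
import pandas as pd

# Hypothetical rows; Navotas City has no STEM school here
toy_df = pd.DataFrame(
    {
        "MUNICIPALITY": ["Pateros", "Pasig City", "Pasig City",
                         "Pateros", "Pasig City", "Navotas City"],
        "STEM": [True, True, False, False, True, False],
    }
)

# Filter to STEM rows first, then count rows per municipality
filtered = (
    toy_df.loc[toy_df["STEM"], ["MUNICIPALITY", "STEM"]]
    .groupby("MUNICIPALITY")
    .count()
)

# Alternative: sum the boolean column per municipality (True counts as 1)
summed = toy_df.groupby("MUNICIPALITY")["STEM"].sum()

print(filtered)
print(summed)
```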


### Writing to CSV

`to_csv(...)`

In [128]:
# turn off index when writing to csv
ncr_df.to_csv("ncr_shs.csv",index=False)
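
The effect of `index=False` can be checked without touching the disk: when no path is given, `to_csv` returns the CSV text as a string. The sketch below uses a made-up two-row frame (`toy_df` and its school names are assumptions) and reads the text back with `read_csv`.

```python
import io
import pandas as pd

# Hypothetical two-row frame with a non-default index
toy_df = pd.DataFrame(
    {"SCHOOL NAME": ["School X, Inc.", "School Y"], "STEM": [True, False]},
    index=[263, 264],
)

# With no path argument, to_csv returns the CSV text
with_index = toy_df.to_csv()
without_index = toy_df.to_csv(index=False)

# The first version has an extra, unnamed leading column for the index
print(with_index.splitlines()[0])
print(without_index.splitlines()[0])

# Reading the index-free text back gives a fresh 0-based index
round_trip = pd.read_csv(io.StringIO(without_index))
print(round_trip)
```

Values that contain a comma, such as the first school name, are quoted on output, so they survive the round trip as a single field.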


In [129]:
!cat ncr_shs.csv

MUNICIPALITY,SCHOOL ID,SCHOOL NAME,PROGRAM OFFERINGS,STEM,ABM,HUMSS,GAS,TVL,Arts and Design,Sports,OFFERINGS COUNT
City of Makati,407698,"Globetek Science Foundation, Inc.",STEM|ABM|HUMSS|GAS,True,True,True,True,False,False,False,4
Manila City,406330,Lorenzo Ruiz Academy,ABM|STEM,True,True,False,False,False,False,False,2
Manila City,406335,Tiong Se Academy,GAS|ABM,False,True,False,True,False,False,False,2
Kalookan City,400942,ABE International Business College-Caloocan Campus,ABM|TVL,False,True,False,False,True,False,False,2
Kalookan City,400947,AMA Computer College-Caloocan City,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False,5
Kalookan City,400951,AMA Computer Learning Center of Caloocan City,STEM|ABM|HUMSS|GAS|TVL,True,True,True,True,True,False,False,5
Kalookan City,400952,Asian Institute of Computer Studies - Caloocan,ABM|GAS|TVL,False,True,False,True,True,False,False,3
Kalookan City,400959,Colegio de San Gabriel Arcangel of Caloocan,ABM|GAS|TVL,False,True,False