<img src="https://courses.edx.org/asset-v1:ACCA+ML001+2T2021+type@asset+block@acca-logo.jpg" alt="ACCA logo" style="width: 400px;"/>

# Python for data analysis
## Part 2 - First steps with pandas

* **Course:** __Machine learning with Python for finance professionals__ by ACCA
* **Instructor:** [Coefficient](https://coefficient.ai) / [@CoefficientData](https://twitter.com/CoefficientData)

---

<div class="alert alert-block alert-info" style="background-color: #BA001E; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">
<h2 style="color: white">
pandas 101
</h2><br>
</div>

<img src="https://courses.edx.org/asset-v1:ACCA+ML001+2T2021+type@asset+block@pandas.png" alt="pandas" style="width: 300px;"/>

> **[pandas](https://pandas.pydata.org) is a Python library for data analysis & manipulation**. The name is a contraction of "[panel data](https://en.wikipedia.org/wiki/Panel_data) analysis" and refers to the kind of tabular data common in financial applications. It was released in 2008 by Wes McKinney and has been called "_[the most important tool in data science](https://qz.com/1126615/the-story-of-the-most-important-tool-in-data-science/)_".
>
> pandas is built on top of NumPy and enables the storage and manipulation of Excel-like tables in Python. These special tables are called DataFrames, the primary object in `pandas`.
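You can also build a DataFrame directly from a dictionary of column lists, which lets you experiment without the Excel file. The sketch below uses a few columns from the first three bookings shown later in this notebook. The variable name `sample` is only for illustration.

```python
import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values
sample = pd.DataFrame(
    {
        "Booking ID": ["DDID57035", "DDSG57036", "DDMY57037"],
        "Gender": ["Female", "Male", "Female"],
        "Age": [51, 46, 25],
        "Destination Country": ["Ireland", "Maldives", "Canada"],
    }
)

print(sample)
print(sample.dtypes)  # pandas infers a data type for each column
```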

In [1]:
# We will import pandas using the alias "pd" (it's shorter, i.e. quicker to type)
import pandas as pd

In [2]:
# The ? suffix shows the inline help (the docstring) in Jupyter/IPython
pd.read_excel?

In [3]:
# Let's read in the Dream Destination hotel data.
orders = pd.read_excel(
    "Hotel Industry - Orders Database - 2019.xlsx", sheet_name="Order Database"
)

In [4]:
# What does it look like?
orders

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,Ireland,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,Maldives,Viligili,4,2019-01-15,2,2019-01-17,2,Four Points,4.3
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,Canada,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8
3,DDSG57038,2019-01-01,2019,11:46:28,SG10308,Male,22,Singapore,North-East,Hougang,Maldives,Fuvahmulah,5,2019-01-18,1,2019-01-19,3,Classio Hotel,3.7
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,France,Nice,7,2019-01-02,1,2019-01-03,4,Adam Lake B&B,4.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9496,DDSG66531,2019-12-31,2019,23:36:16,SG12034,Female,42,Singapore,Central,Orchard,Germany,Berlin,4,2020-01-06,4,2020-01-10,2,Silver Cloud Inn,4.3
9497,DDSG66532,2019-12-31,2019,14:41:01,SG12035,Female,54,Singapore,Central,Geylang,Israel,Holon,4,2020-04-09,4,2020-04-13,2,The Elet,4.2
9498,DDSG66533,2019-12-31,2019,19:11:16,SG12036,Female,57,Singapore,Central,Downtown Core,Canada,Ottawa,7,2020-01-09,1,2020-01-10,4,The Elet,4.4
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,Maldives,Viligili,3,2020-01-01,1,2020-01-02,2,Sunset Lodge,4.2


In [5]:
# This DataFrame has 9501 rows and 19 columns
orders.shape

(9501, 19)

In [6]:
# The Python "length" function, len(), counts the items in a container (e.g. the keys of a dictionary)
len({"a": 1, "b": 2, "c": 3})

3

In [7]:
# The "length" of a DataFrame is the number of rows it has
len(orders)

9501
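The snippet below shows how `.shape` and `len()` relate. It uses a small stand-in DataFrame named `df`, built from the first rows shown above, instead of the full `orders` data.

```python
import pandas as pd

# A small stand-in for the orders DataFrame (values from the first rows above)
df = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Female", "Male"],
        "Age": [51, 46, 25, 22],
    }
)

n_rows, n_cols = df.shape  # .shape is a (rows, columns) tuple
print(n_rows, n_cols)      # 4 2

# len() of a DataFrame counts rows; len() of .columns counts columns
assert len(df) == n_rows
assert len(df.columns) == n_cols
```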

---

In this next section we will aim to answer the following questions:
1. How do we select a single column?
2. How do we select several columns?
3. How do we select the top 5 rows? The top 10 rows?
4. How do we select the bottom 5 rows?
5. How do we select only bookings made by women? How about only bookings made by people aged 20 or under? How about bookings made by men aged 40-49?

### 1. How do we select a single column?

In [8]:
# Remember that square brackets are used for "selecting things" in Python.
numbers = [1, 2, 3]
numbers[0]

1

In [9]:
# Select the value associated with a dictionary key...
capitals = {"Germany": "Berlin", "France": "Paris", "Slovenia": "Ljubljana", "Tanzania": "Dodoma"}
capitals["Slovenia"]

'Ljubljana'

In [10]:
orders

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,Ireland,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,Maldives,Viligili,4,2019-01-15,2,2019-01-17,2,Four Points,4.3
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,Canada,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8
3,DDSG57038,2019-01-01,2019,11:46:28,SG10308,Male,22,Singapore,North-East,Hougang,Maldives,Fuvahmulah,5,2019-01-18,1,2019-01-19,3,Classio Hotel,3.7
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,France,Nice,7,2019-01-02,1,2019-01-03,4,Adam Lake B&B,4.5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9496,DDSG66531,2019-12-31,2019,23:36:16,SG12034,Female,42,Singapore,Central,Orchard,Germany,Berlin,4,2020-01-06,4,2020-01-10,2,Silver Cloud Inn,4.3
9497,DDSG66532,2019-12-31,2019,14:41:01,SG12035,Female,54,Singapore,Central,Geylang,Israel,Holon,4,2020-04-09,4,2020-04-13,2,The Elet,4.2
9498,DDSG66533,2019-12-31,2019,19:11:16,SG12036,Female,57,Singapore,Central,Downtown Core,Canada,Ottawa,7,2020-01-09,1,2020-01-10,4,The Elet,4.4
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,Maldives,Viligili,3,2020-01-01,1,2020-01-02,2,Sunset Lodge,4.2


In [11]:
# The same is true for DataFrames. Here we get the single column with the name "Location".
orders['Location']

0             Jakarta
1              Novena
2         Johor Bahru
3             Hougang
4           West Java
            ...      
9496          Orchard
9497          Geylang
9498    Downtown Core
9499         Ko Samui
9500          Gia Lai
Name: Location, Length: 9501, dtype: object

In [12]:
# We can also use "dot" notation to access a column, here the "Age" column.
orders.Age

0       51
1       46
2       25
3       22
4       45
        ..
9496    42
9497    54
9498    57
9499    44
9500    52
Name: Age, Length: 9501, dtype: int64
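Square brackets and dot notation usually return the same Series. The toy DataFrames `df` and `df2` below (with a hypothetical column called `count`) show that they match, and also show one case where dot notation misleads: a column whose name clashes with a DataFrame method.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [51, 46, 25],
        "Destination Country": ["Ireland", "Maldives", "Canada"],
    }
)

# Square brackets and dot notation return the same Series for "Age"
same = df["Age"].equals(df.Age)
print(same)  # True

# A column name containing a space only works with square brackets
countries = df["Destination Country"]
print(countries.tolist())  # ['Ireland', 'Maldives', 'Canada']

# Dot notation returns the DataFrame.count method here, not the column
df2 = pd.DataFrame({"count": [1, 2]})
print(callable(df2.count))    # True
print(df2["count"].tolist())  # [1, 2]
```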

In [13]:
# We can turn a column (here "Location") into a normal Python list
orders.Location.tolist()

['Jakarta',
 'Novena',
 'Johor Bahru',
 'Hougang',
 'West Java',
 'Ipoh',
 'Central Java',
 'Cabuyao',
 'Mandai',
 'Sakon Nakhon',
 'Binh Thuan',
 'Nakhon Pathom',
 'Phitsanulok',
 'Papua',
 'Marina South',
 'Quang Binh',
 'Gia Lai',
 'Gia Lai',
 'Seberang Perai',
 'Paya Lebar',
 'Lam Dong',
 'Tien Giang',
 'Nakhon Pathom',
 'Kuching',
 'Hougang',
 'Pattaya',
 'Siem Reap',
 'Alor Setar',
 'Samut Sakhon',
 'Papua',
 'Yala',
 'Bacoor',
 'Mae Sot',
 'Central Java',
 'Kuching',
 'Nghe An',
 'George Town',
 'Choa Chu Kang',
 'Om Noi',
 'Kuching',
 'Samut Prakan',
 'Central Sulawesi',
 'Sakon Nakhon',
 'Rayong',
 'Jakarta',
 'Newton',
 'Papua',
 'Kota Kinabalu',
 'Chiang Mai',
 'Khon Kaen',
 'Paya Lebar',
 'Bukit Panjang',
 'Miri',
 'Johor Bahru',
 'Kota Kinabalu',
 'Petaling Jaya',
 'Dasmariñas',
 'Bukit Batok',
 'Central Java',
 'Chiang Rai',
 'Boon Lay',
 'Tacloban',
 'Thura Thien-Hue',
 'Miri',
 'Battambang',
 'Seberang Perai',
 'Phuket',
 'Queenstown',
 'Jambi',
 'Melaka',
 'Changi',
 ...]

In [14]:
# Dot notation doesn't work with column names containing spaces, so we must use square brackets here.
orders['Destination Country']

0        Ireland
1       Maldives
2         Canada
3       Maldives
4         France
          ...   
9496     Germany
9497      Israel
9498      Canada
9499    Maldives
9500       Egypt
Name: Destination Country, Length: 9501, dtype: object

In [15]:
# We call this a "pandas Series"
type(orders['Location'])

pandas.core.series.Series

A [Series](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#series) is a one-dimensional array of values with an index (row labels). Each column of a DataFrame is a Series, so a DataFrame is made up of many Series (i.e. columns) that share the same index, each with its own column name.
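The snippet below illustrates the relationship between a Series and its DataFrame. It uses a toy DataFrame `df` built from values shown above. Each column comes back as a Series that carries its column name and shares the DataFrame's index.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Location": ["Jakarta", "Novena", "Johor Bahru"],
        "Age": [51, 46, 25],
    }
)

location = df["Location"]
print(type(location).__name__)  # Series
print(location.name)            # Location: the Series keeps its column name
print(location.index.tolist())  # [0, 1, 2]: the same row labels as the DataFrame
print(location.values)          # the underlying NumPy array of values

# Every column shares the DataFrame's index
assert all(df[col].index.equals(df.index) for col in df.columns)
```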

### 2. How do we select several columns?

In [16]:
# We can select multiple columns by passing in a list of column names
# into the first set of square brackets.
orders[['Location', 'Destination Country']]

,Location,Destination Country
0,Jakarta,Ireland
1,Novena,Maldives
2,Johor Bahru,Canada
3,Hougang,Maldives
4,West Java,France
...,...,...
9496,Orchard,Germany
9497,Geylang,Israel
9498,Downtown Core,Canada
9499,Ko Samui,Maldives


In [17]:
# Note there's nothing special about double brackets [[]] in pandas!

# It's just a Python list...
columns = ['Location', 'Destination Country']

# ...being placed inside the normal pandas "selector" brackets.
orders[columns]

,Location,Destination Country
0,Jakarta,Ireland
1,Novena,Maldives
2,Johor Bahru,Canada
3,Hougang,Maldives
4,West Java,France
...,...,...
9496,Orchard,Germany
9497,Geylang,Israel
9498,Downtown Core,Canada
9499,Ko Samui,Maldives


In [18]:
# A list containing a single column name returns a one-column DataFrame, not a Series
orders[['Location']]

,Location
0,Jakarta
1,Novena
2,Johor Bahru
3,Hougang
4,West Java
...,...
9496,Orchard
9497,Geylang
9498,Downtown Core
9499,Ko Samui


### 3. How do we select the top 5 rows? The top 10 rows?

In [19]:
# We can use the pandas .head() method for this
orders.head()

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,Ireland,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,Maldives,Viligili,4,2019-01-15,2,2019-01-17,2,Four Points,4.3
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,Canada,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8
3,DDSG57038,2019-01-01,2019,11:46:28,SG10308,Male,22,Singapore,North-East,Hougang,Maldives,Fuvahmulah,5,2019-01-18,1,2019-01-19,3,Classio Hotel,3.7
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,France,Nice,7,2019-01-02,1,2019-01-03,4,Adam Lake B&B,4.5


In [20]:
# Let's take a look at the inline help - you can see the default is 5
orders.head?

In [21]:
# Let's try to get 10 rows
orders.head(n=10)

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,Ireland,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,Maldives,Viligili,4,2019-01-15,2,2019-01-17,2,Four Points,4.3
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,Canada,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8
3,DDSG57038,2019-01-01,2019,11:46:28,SG10308,Male,22,Singapore,North-East,Hougang,Maldives,Fuvahmulah,5,2019-01-18,1,2019-01-19,3,Classio Hotel,3.7
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,France,Nice,7,2019-01-02,1,2019-01-03,4,Adam Lake B&B,4.5
5,DDMY57040,2019-01-01,2019,23:00:00,MY10284,Female,46,Malaysia,Perak,Ipoh,China,Shanghai,5,2019-01-16,3,2019-01-19,3,Reefs Resort & Club,4.2
6,DDID57041,2019-01-01,2019,20:46:02,ID10299,Female,26,Indonesia,Sukolilo,Central Java,Canada,Toronto,2,2019-03-03,2,2019-03-05,1,The Awua,4.1
7,DDPH57042,2019-01-01,2019,13:35:07,PH05497,Female,49,Philippines,Laguna,Cabuyao,Denmark,Odense,3,2019-01-02,4,2019-01-06,2,Flying Fox Motel,4.3
8,DDSG57043,2019-01-01,2019,19:31:53,SG10309,Female,54,Singapore,North,Mandai,Maldives,Fuvahmulah,2,2019-02-04,4,2019-02-08,1,Starlight Motel,4.6
9,DDTH57044,2019-01-01,2019,22:30:50,TH10427,Female,20,Thailand,Sakon Nakhon,Sakon Nakhon,France,Lyon,5,2019-03-08,1,2019-03-09,3,Thompson,4.4


### 4. How do we select the bottom 5 rows?

In [22]:
# The equivalent command for the bottom rows is .tail()
orders.tail?
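`.head(n)` and `.tail(n)` return the same rows as positional slices with `.iloc`. The check below uses a toy DataFrame `df` holding the ten ages from the `head(10)` output above.

```python
import pandas as pd

# Ten rows of toy data with the default 0..9 index
df = pd.DataFrame({"Age": [51, 46, 25, 22, 45, 46, 26, 49, 54, 20]})

top_three = df.head(3)   # same rows as df.iloc[:3]
bottom_two = df.tail(2)  # same rows as df.iloc[-2:]

assert top_three.equals(df.iloc[:3])
assert bottom_two.equals(df.iloc[-2:])
print(bottom_two.index.tolist())  # [8, 9]: tail keeps the original row labels
```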

> ### 🚩 Exercise
> Use the `.tail()` method to get the bottom 2 rows only.

In [23]:
# ✏️ ENTER YOUR SOLUTION HERE
orders.tail(2)


,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,Maldives,Viligili,3,2020-01-01,1,2020-01-02,2,Sunset Lodge,4.2
9500,DDVN66535,2019-12-31,2019,00:51:52,VN05959,Female,52,Vietnam,Pleiku,Gia Lai,Egypt,Luxor,5,2020-01-24,4,2020-01-28,3,Coastal bay hotel,4.3


> ### 🚩 Exercise
> Select the `Gender`, `Age` and `Destination Country` columns only. Add `.head()` to the end to get the top 5 rows for just these three columns.

In [24]:
# ✏️ ENTER YOUR SOLUTION HERE

orders[["Gender", "Age", "Destination Country"]].head()


,Gender,Age,Destination Country
0,Female,51,Ireland
1,Male,46,Maldives
2,Female,25,Canada
3,Male,22,Maldives
4,Male,45,France


### 5. How do we select only bookings made by women? How about only bookings made by people aged 20 or under? How about bookings made by men aged 40-49?

Here we need to know how to query our data based on a condition. There are two ways to apply this "conditional filter" in pandas:
1. Using square brackets ("masking").
2. Using the `.query()` method.
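The two approaches select the same rows. The sketch below checks this on a toy DataFrame `df` built from the first five bookings shown above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Female", "Male", "Male"],
        "Age": [51, 46, 25, 22, 45],
    }
)

# Option 1: a boolean mask inside the selector brackets
by_mask = df[df["Gender"] == "Female"]

# Option 2: the same condition written as a query string
by_query = df.query("Gender == 'Female'")

assert by_mask.equals(by_query)
print(by_mask.index.tolist())  # [0, 2]
```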

#### Filter to only bookings made by women: option 1 (using a mask)

In [25]:
# Let's create a "mask" filter. This contains True where the condition is matched.

mask = (orders.Gender == "Female")  # the round brackets are optional but may aid readability
mask

0        True
1       False
2        True
3       False
4       False
        ...  
9496     True
9497     True
9498     True
9499     True
9500     True
Name: Gender, Length: 9501, dtype: bool
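A mask can also answer "how many?" directly. When summed, `True` counts as 1 and `False` as 0. The toy Series `gender` below stands in for `orders.Gender`.

```python
import pandas as pd

gender = pd.Series(["Female", "Male", "Female", "Male", "Male"])
mask = gender == "Female"

# Summing a boolean mask counts the matches
print(mask.sum())         # 2
print(len(gender[mask]))  # 2: same answer, by counting the filtered rows
print(mask.mean())        # 0.4: the share of rows that match
```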

In [26]:
# When you pass the mask into the pandas DataFrame selector brackets,
# it returns only the rows containing True, i.e. the rows where Gender is "Female"
orders[mask]

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,Ireland,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,Canada,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8
5,DDMY57040,2019-01-01,2019,23:00:00,MY10284,Female,46,Malaysia,Perak,Ipoh,China,Shanghai,5,2019-01-16,3,2019-01-19,3,Reefs Resort & Club,4.2
6,DDID57041,2019-01-01,2019,20:46:02,ID10299,Female,26,Indonesia,Sukolilo,Central Java,Canada,Toronto,2,2019-03-03,2,2019-03-05,1,The Awua,4.1
7,DDPH57042,2019-01-01,2019,13:35:07,PH05497,Female,49,Philippines,Laguna,Cabuyao,Denmark,Odense,3,2019-01-02,4,2019-01-06,2,Flying Fox Motel,4.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9496,DDSG66531,2019-12-31,2019,23:36:16,SG12034,Female,42,Singapore,Central,Orchard,Germany,Berlin,4,2020-01-06,4,2020-01-10,2,Silver Cloud Inn,4.3
9497,DDSG66532,2019-12-31,2019,14:41:01,SG12035,Female,54,Singapore,Central,Geylang,Israel,Holon,4,2020-04-09,4,2020-04-13,2,The Elet,4.2
9498,DDSG66533,2019-12-31,2019,19:11:16,SG12036,Female,57,Singapore,Central,Downtown Core,Canada,Ottawa,7,2020-01-09,1,2020-01-10,4,The Elet,4.4
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,Maldives,Viligili,3,2020-01-01,1,2020-01-02,2,Sunset Lodge,4.2


In [27]:
# This is usually done all in one go
orders[orders.Gender == "Female"]

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,Ireland,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,Canada,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8
5,DDMY57040,2019-01-01,2019,23:00:00,MY10284,Female,46,Malaysia,Perak,Ipoh,China,Shanghai,5,2019-01-16,3,2019-01-19,3,Reefs Resort & Club,4.2
6,DDID57041,2019-01-01,2019,20:46:02,ID10299,Female,26,Indonesia,Sukolilo,Central Java,Canada,Toronto,2,2019-03-03,2,2019-03-05,1,The Awua,4.1
7,DDPH57042,2019-01-01,2019,13:35:07,PH05497,Female,49,Philippines,Laguna,Cabuyao,Denmark,Odense,3,2019-01-02,4,2019-01-06,2,Flying Fox Motel,4.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9496,DDSG66531,2019-12-31,2019,23:36:16,SG12034,Female,42,Singapore,Central,Orchard,Germany,Berlin,4,2020-01-06,4,2020-01-10,2,Silver Cloud Inn,4.3
9497,DDSG66532,2019-12-31,2019,14:41:01,SG12035,Female,54,Singapore,Central,Geylang,Israel,Holon,4,2020-04-09,4,2020-04-13,2,The Elet,4.2
9498,DDSG66533,2019-12-31,2019,19:11:16,SG12036,Female,57,Singapore,Central,Downtown Core,Canada,Ottawa,7,2020-01-09,1,2020-01-10,4,The Elet,4.4
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,Maldives,Viligili,3,2020-01-01,1,2020-01-02,2,Sunset Lodge,4.2


#### Filter to only bookings made by women: option 2 (using the `.query()` method)

In [28]:
# .query() takes a string expression and evaluates it against the DataFrame's columns.
# For example, bookings made by people aged 20 or under:
orders.query("Age <= 20")

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
9,DDTH57044,2019-01-01,2019,22:30:50,TH10427,Female,20,Thailand,Sakon Nakhon,Sakon Nakhon,France,Lyon,5,2019-03-08,1,2019-03-09,3,Thompson,4.4
12,DDTH57047,2019-01-01,2019,14:53:48,TH10429,Female,19,Thailand,Phitsanulok,Phitsanulok,Mexico,Ecatepec,2,2019-01-29,6,2019-02-04,1,The Huntington Hotel,3.7
14,DDSG57049,2019-01-01,2019,08:52:38,SG10310,Male,20,Singapore,Central,Marina South,New Zealand,Auckland,6,2019-02-17,1,2019-02-18,3,Primland,4.4
103,DDID57138,2019-01-05,2019,01:45:48,ID10312,Male,20,Indonesia,Kenyam,Papua,Nepal,Patan,1,2019-03-24,9,2019-04-02,1,Eurostars Magnificent Mile,3.7
121,DDMY57156,2019-01-06,2019,21:44:01,MY10311,Male,19,Malaysia,Kedah,Alor Setar,Canada,Calgary,7,2019-01-07,1,2019-01-08,4,Big Dreams Hotel,4.1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9442,DDPH66477,2019-12-29,2019,11:42:32,PH06380,Male,20,Philippines,Cebu,Bogo,India,Bangalore,4,2020-02-20,1,2020-02-21,2,Grantham Inn,4.5
9447,DDVN66482,2019-12-29,2019,22:45:56,VN05953,Male,19,Vietnam,Pleiku,Gia Lai,Canada,Montreal,1,2020-01-01,1,2020-01-02,1,River Park Hotel,4.6
9458,DDSG66493,2019-12-29,2019,21:31:53,SG12026,Female,19,Singapore,Central,Queenstown,Maldives,Naifaru,4,2020-01-22,1,2020-01-23,2,The St. Regis,4.6
9476,DDTH66511,2019-12-30,2019,18:45:43,TH12166,Female,19,Thailand,Chonburi,Chaophraya Surasak,Iceland,Reykjavik,1,2020-02-05,1,2020-02-06,1,Four Seasons Hotel Gresham Palace,4.3


In [29]:
# Notice we need double equals (==) for equality, and single quotes around 'Female'
# because the whole query string is already wrapped in double quotes
orders.query("Gender == 'Female'")

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,Ireland,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,Canada,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8
5,DDMY57040,2019-01-01,2019,23:00:00,MY10284,Female,46,Malaysia,Perak,Ipoh,China,Shanghai,5,2019-01-16,3,2019-01-19,3,Reefs Resort & Club,4.2
6,DDID57041,2019-01-01,2019,20:46:02,ID10299,Female,26,Indonesia,Sukolilo,Central Java,Canada,Toronto,2,2019-03-03,2,2019-03-05,1,The Awua,4.1
7,DDPH57042,2019-01-01,2019,13:35:07,PH05497,Female,49,Philippines,Laguna,Cabuyao,Denmark,Odense,3,2019-01-02,4,2019-01-06,2,Flying Fox Motel,4.3
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9496,DDSG66531,2019-12-31,2019,23:36:16,SG12034,Female,42,Singapore,Central,Orchard,Germany,Berlin,4,2020-01-06,4,2020-01-10,2,Silver Cloud Inn,4.3
9497,DDSG66532,2019-12-31,2019,14:41:01,SG12035,Female,54,Singapore,Central,Geylang,Israel,Holon,4,2020-04-09,4,2020-04-13,2,The Elet,4.2
9498,DDSG66533,2019-12-31,2019,19:11:16,SG12036,Female,57,Singapore,Central,Downtown Core,Canada,Ottawa,7,2020-01-09,1,2020-01-10,4,The Elet,4.4
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,Maldives,Viligili,3,2020-01-01,1,2020-01-02,2,Sunset Lodge,4.2


#### Bookings made by men aged 40-49: option 1 (using a mask)

In [30]:
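# Round brackets and the & symbol are both essential here: & binds more tightly than
# comparisons like == and >=, and Python's "and" keyword does not work element-wise on a Series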
orders[(orders.Gender == "Male") & (orders.Age >= 40) & (orders.Age <= 49)]

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,Maldives,Viligili,4,2019-01-15,2,2019-01-17,2,Four Points,4.3
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,France,Nice,7,2019-01-02,1,2019-01-03,4,Adam Lake B&B,4.5
15,DDVN57050,2019-01-01,2019,13:51:27,VN05100,Male,40,Vietnam,Dong Hoi,Quang Binh,Nepal,Birgunj,1,2019-03-28,1,2019-03-29,1,The Inn on Lombard,4.2
19,DDSG57054,2019-01-01,2019,15:23:11,SG10311,Male,42,Singapore,East,Paya Lebar,Brazil,Belo Horizonte,1,2019-01-12,4,2019-01-16,1,Hillsong B&B,4.6
22,DDTH57057,2019-01-01,2019,11:40:20,TH10430,Male,40,Thailand,Nakhon Pathom,Nakhon Pathom,Kenya,Kisumu,2,2019-01-03,3,2019-01-06,1,Land’s End Resort,3.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9467,DDMY66502,2019-12-30,2019,04:38:33,MY12044,Male,46,Malaysia,Penang,George Town,Italy,Bari,7,2020-01-14,1,2020-01-15,4,Serene Stay,4.2
9473,DDPH66508,2019-12-30,2019,17:02:41,PH06383,Male,40,Philippines,La Union,San Fernando,Maldives,Eydhafushi,7,2020-01-04,1,2020-01-05,4,Comfort Kingdom,4.2
9480,DDTH66515,2019-12-31,2019,04:15:03,TH12167,Male,46,Thailand,Surat Thani,Ko Samui,Iceland,Reykjavik,3,2020-01-12,3,2020-01-15,2,Consulate Hotel,4.2
9493,DDMY66528,2019-12-31,2019,17:53:01,MY12048,Male,46,Malaysia,Kedah,Alor Setar,Canada,Ottawa,5,2020-01-01,3,2020-01-04,3,Hotel Triton,4.1


#### Bookings made by men aged 40-49: option 2 (using the `.query()` method)

In [31]:
# We can use the "and" keyword here
orders.query("Gender == 'Male' and Age >= 40 and Age <= 49")

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,Maldives,Viligili,4,2019-01-15,2,2019-01-17,2,Four Points,4.3
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,France,Nice,7,2019-01-02,1,2019-01-03,4,Adam Lake B&B,4.5
15,DDVN57050,2019-01-01,2019,13:51:27,VN05100,Male,40,Vietnam,Dong Hoi,Quang Binh,Nepal,Birgunj,1,2019-03-28,1,2019-03-29,1,The Inn on Lombard,4.2
19,DDSG57054,2019-01-01,2019,15:23:11,SG10311,Male,42,Singapore,East,Paya Lebar,Brazil,Belo Horizonte,1,2019-01-12,4,2019-01-16,1,Hillsong B&B,4.6
22,DDTH57057,2019-01-01,2019,11:40:20,TH10430,Male,40,Thailand,Nakhon Pathom,Nakhon Pathom,Kenya,Kisumu,2,2019-01-03,3,2019-01-06,1,Land’s End Resort,3.8
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9467,DDMY66502,2019-12-30,2019,04:38:33,MY12044,Male,46,Malaysia,Penang,George Town,Italy,Bari,7,2020-01-14,1,2020-01-15,4,Serene Stay,4.2
9473,DDPH66508,2019-12-30,2019,17:02:41,PH06383,Male,40,Philippines,La Union,San Fernando,Maldives,Eydhafushi,7,2020-01-04,1,2020-01-05,4,Comfort Kingdom,4.2
9480,DDTH66515,2019-12-31,2019,04:15:03,TH12167,Male,46,Thailand,Surat Thani,Ko Samui,Iceland,Reykjavik,3,2020-01-12,3,2020-01-15,2,Consulate Hotel,4.2
9493,DDMY66528,2019-12-31,2019,17:53:01,MY12048,Male,46,Malaysia,Kedah,Alor Setar,Canada,Ottawa,5,2020-01-01,3,2020-01-04,3,Hotel Triton,4.1
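For a range condition like "aged 40-49", `Series.between()` is a compact alternative to two separate comparisons, and it includes both endpoints by default. The sketch below uses a small toy DataFrame `df` with illustrative values only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Female", "Male", "Male", "Male"],
        "Age": [51, 46, 25, 22, 45, 40],
    }
)

# Two explicit comparisons, combined with &
explicit = df[(df.Gender == "Male") & (df.Age >= 40) & (df.Age <= 49)]

# .between() includes both endpoints by default
compact = df[(df.Gender == "Male") & df.Age.between(40, 49)]

assert explicit.equals(compact)
print(compact.Age.tolist())  # [46, 45, 40]
```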


> ### 🚩 Exercise
> Find all bookings made by women aged 30 whose destination country was Italy.
> 
> _**Tip**: [you can use backticks](https://stackoverflow.com/a/56157729/3279076) inside `.query()` to reference columns containing a space._

In [32]:
orders.columns

Index(['Booking ID', 'Date of Booking', 'Year', 'Time', 'Customer ID',
       'Gender', 'Age', 'Origin Country', 'State', 'Location',
       'Destination Country', 'Destination City', 'No. Of People',
       'Check-in date', 'No. Of Days', 'Check-Out Date', 'Rooms', 'Hotel Name',
       'Hotel Rating'],
      dtype='object')

In [33]:
# ✏️ ENTER YOUR SOLUTION HERE

orders[(orders.Age == 30) & (orders.Gender == 'Female') & (orders["Destination Country"] == 'Italy')]


,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,Destination Country,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating
1168,DDVN58203,2019-02-15,2019,18:26:02,VN05200,Female,30,Vietnam,Tam Ky,Quang Nam,Italy,Naples,4,2019-03-02,3,2019-03-05,2,The Watson Hotel,4.5
1964,DDMY58999,2019-03-16,2019,19:25:44,MY10647,Female,30,Malaysia,Sabah,Kota Kinabalu,Italy,Naples,6,2019-03-21,8,2019-03-29,3,Qualia,4.2
2200,DDTH59235,2019-03-25,2019,20:40:00,TH10841,Female,30,Thailand,Nonthaburi,Nonthaburi,Italy,Rome,7,2019-04-13,1,2019-04-14,4,Big Dreams Hotel,4.5
2891,DDID59926,2019-04-21,2019,20:40:00,ID10807,Female,30,Indonesia,Malang,East Java,Italy,Rome,7,2019-05-13,1,2019-05-14,4,Serene Stay,4.5
4339,DDMY61374,2019-06-17,2019,19:25:44,MY11097,Female,30,Malaysia,Selangor,Shah Alam,Italy,Naples,6,2019-06-18,1,2019-06-19,3,Always Welcome,4.2
5576,DDSG62611,2019-08-02,2019,01:36:44,SG11350,Female,30,Singapore,Central,Queenstown,Italy,Rome,4,2019-08-10,3,2019-08-13,2,The Manhattan,4.6
7878,DDSG64913,2019-10-30,2019,18:26:02,SG11767,Female,30,Singapore,Central,Downtown Core,Italy,Naples,4,2019-11-02,1,2019-11-03,2,The Lowell,4.5
8248,DDMY65283,2019-11-13,2019,01:36:44,MY11798,Female,30,Malaysia,Sabah,Kota Kinabalu,Italy,Rome,4,2019-11-14,3,2019-11-17,2,Oxford Suites,4.6
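The tip suggests `.query()` with backticks. The same filter as the mask solution above can also be written as a query string. In this sketch, a toy DataFrame `df` with illustrative values stands in for `orders`. It assumes pandas 0.25 or later, where backtick quoting of column names is supported.

```python
import pandas as pd

# Toy stand-in for orders; values are illustrative only
df = pd.DataFrame(
    {
        "Gender": ["Female", "Male", "Female", "Female"],
        "Age": [30, 30, 30, 51],
        "Destination Country": ["Italy", "Italy", "France", "Italy"],
    }
)

# Backticks let .query() refer to a column whose name contains a space
by_query = df.query(
    "Gender == 'Female' and Age == 30 and `Destination Country` == 'Italy'"
)

# The equivalent boolean mask, as in the solution above
by_mask = df[
    (df.Age == 30) & (df.Gender == "Female") & (df["Destination Country"] == "Italy")
]

assert by_query.equals(by_mask)
print(by_query.index.tolist())  # [0]
```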


> ### 🚩 Exercise
> **How many** bookings by people aged 50 had a destination country of either Germany OR France?
> 
> _**Tips**:_
>   - _You may want to use brackets to help keep the logic clear._
>   - _The keyword in a `.query()` for "A or B" is `or`._
>   - _The "or" equivalent of `&` is `|` a.k.a. the "pipe operator"._
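When mixing `&` and `|`, precedence matters. Python evaluates `&` before `|`, so `A & B | C` means `(A & B) | C`. To mean "A and (B or C)", put brackets around the "or" part. `Series.isin()` is another way to say "one of these values". The toy DataFrame `df` below is illustrative only.

```python
import pandas as pd

# Toy data: only row 0 is aged 50 AND going to Germany or France
df = pd.DataFrame(
    {
        "Age": [50, 45, 50, 20],
        "Destination Country": ["Germany", "France", "Italy", "France"],
    }
)

is_50 = df.Age == 50
is_germany = df["Destination Country"] == "Germany"
is_france = df["Destination Country"] == "France"

# & is evaluated before |, so this means (is_50 & is_germany) | is_france
loose = df[is_50 & is_germany | is_france]
# Brackets group the "or" first: aged 50 AND (Germany OR France)
strict = df[is_50 & (is_germany | is_france)]
# .isin() expresses "one of these values" without an explicit |
with_isin = df[is_50 & df["Destination Country"].isin(["Germany", "France"])]

print(loose.index.tolist())   # [0, 1, 3]
print(strict.index.tolist())  # [0]
assert strict.equals(with_isin)
```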

In [34]:
# ✏️ ENTER YOUR SOLUTION HERE

# Brackets around the "or" part make it evaluate before the "and";
# len() then counts the matching rows
len(orders[(orders.Age == 50) & ((orders["Destination Country"] == 'Germany') | (orders["Destination Country"] == 'France'))])


<div class="alert alert-block alert-info" style="background-color: #BA001E; border: 0px; -moz-border-radius: 10px; -webkit-border-radius: 10px;">
<h2 style="color: white">
Adding columns with map & apply
</h2><br>
</div>

In [35]:
# Adding a column to a pandas DataFrame is like adding a new key-value pair to a dictionary.
# A single value like this one is repeated in every row.
orders['Continent'] = 'PLACEHOLDER'

In [36]:
# We can derive new columns from existing columns
orders['No. Of People'] / orders['Rooms']

0       2.000000
1       2.000000
2       1.666667
3       1.666667
4       1.750000
          ...   
9496    2.000000
9497    2.000000
9498    1.750000
9499    1.500000
9500    1.666667
Length: 9501, dtype: float64

In [37]:
# We need to save this information into a new column if we want to use it later
orders['People Per Room'] = orders['No. Of People'] / orders['Rooms']
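Column arithmetic happens row by row (element-wise), and a single assigned value is repeated down the whole column. The sketch below uses a toy DataFrame `df` with the people and room counts from the first three bookings.

```python
import pandas as pd

df = pd.DataFrame({"No. Of People": [2, 4, 5], "Rooms": [1, 2, 3]})

# A single value is broadcast to every row
df["Continent"] = "PLACEHOLDER"

# Arithmetic between two columns is done row by row (element-wise)
df["People Per Room"] = df["No. Of People"] / df["Rooms"]

print(df["People Per Room"].round(2).tolist())  # [2.0, 2.0, 1.67]
```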

In [38]:
# Let's take a look at our new column
orders.head(3)

,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,...,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating,Continent,People Per Room
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,...,Tallaght,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2,PLACEHOLDER,2.0
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,...,Viligili,4,2019-01-15,2,2019-01-17,2,Four Points,4.3,PLACEHOLDER,2.0
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,...,North York,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8,PLACEHOLDER,1.666667


### The `.map()` method

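`.map()` replaces each value in a column with the value it is associated with in a "lookup dictionary", somewhat like a VLOOKUP in Excel. Values that have no matching key in the dictionary become `NaN` (missing).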

In [39]:
continent_lookup = {
    # Africa
    'Egypt': 'Africa',
    'Kenya': 'Africa',
    # Asia
    'China': 'Asia',
    'India': 'Asia',
    'Israel': 'Asia',
    'Iran': 'Asia',
    'Japan': 'Asia',
    'Maldives': 'Asia',
    'Nepal': 'Asia',
    # Australia
    'Australia': 'Australia',
    'New Zealand': 'Australia',
    # Europe
    'Denmark': 'Europe',
    'France': 'Europe',
    'Germany': 'Europe',
    'Iceland': 'Europe',
    'Ireland': 'Europe',
    'Italy': 'Europe',
    # North America
    'Canada': 'North America',
    'Mexico': 'North America',
    # South America
    'Brazil': 'South America',
    'Colombia': 'South America',
}

In [40]:
# We can now "look up" the continent associated with a country,
# e.g. Brazil, by using it as the dictionary key
continent_lookup['Brazil']

'South America'

The format of `.map()` is:

```python
DATAFRAME[COLUMN].map(DICTIONARY)
```
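Values missing from the lookup dictionary come back as `NaN`. In the example below, the country name "Atlantis" is hypothetical and deliberately absent from the toy dictionary `lookup`. `.fillna()` can then replace the gaps with an explicit label.

```python
import pandas as pd

countries = pd.Series(["Ireland", "Maldives", "Canada", "Atlantis"])
lookup = {"Ireland": "Europe", "Maldives": "Asia", "Canada": "North America"}

continents = countries.map(lookup)
print(continents.tolist())  # ['Europe', 'Asia', 'North America', nan]

# Unmatched values can be given an explicit label instead of NaN
print(continents.fillna("Unknown").tolist())
# ['Europe', 'Asia', 'North America', 'Unknown']
```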

In [41]:
# Let's map the Destination Country of our orders DataFrame
# using the continent_lookup dictionary.
orders['Destination Country'].map(continent_lookup)

0              Europe
1                Asia
2       North America
3                Asia
4              Europe
            ...      
9496           Europe
9497             Asia
9498    North America
9499             Asia
9500           Africa
Name: Destination Country, Length: 9501, dtype: object

In [42]:
# This looks good, but it isn't saved. Let's save it into a new column called "Continent".
orders['Continent'] = orders['Destination Country'].map(continent_lookup)

In [43]:
# Great, time to take a look!
orders.tail(10)

Unnamed: 0,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,...,Destination City,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating,Continent,People Per Room
9491,DDVN66526,2019-12-31,2019,15:27:30,VN05957,Male,52,Vietnam,Phan Thiet,Binh Thuan,...,Holon,2,2020-01-01,1,2020-01-02,1,Slumber Falls,3.8,Asia,2.0
9492,DDID66527,2019-12-31,2019,04:18:40,ID11986,Female,30,Indonesia,Senayan,Jakarta,...,Beersheba,4,2020-03-28,1,2020-03-29,2,Creek Quest,3.8,Asia,2.0
9493,DDMY66528,2019-12-31,2019,17:53:01,MY12048,Male,46,Malaysia,Kedah,Alor Setar,...,Ottawa,5,2020-01-01,3,2020-01-04,3,Hotel Triton,4.1,North America,1.666667
9494,DDVN66529,2019-12-31,2019,04:31:42,VN05958,Female,19,Vietnam,Dien Ban,Quang Nam,...,Thinadhoo,1,2020-01-01,1,2020-01-02,1,The Hot Springs Hotel,3.8,Asia,1.0
9495,DDMY66530,2019-12-31,2019,04:15:03,MY12049,Male,46,Malaysia,Johor,Johor Bahru,...,Reykjavik,3,2020-01-09,1,2020-01-10,2,Hotel The Pie,4.2,Europe,1.5
9496,DDSG66531,2019-12-31,2019,23:36:16,SG12034,Female,42,Singapore,Central,Orchard,...,Berlin,4,2020-01-06,4,2020-01-10,2,Silver Cloud Inn,4.3,Europe,2.0
9497,DDSG66532,2019-12-31,2019,14:41:01,SG12035,Female,54,Singapore,Central,Geylang,...,Holon,4,2020-04-09,4,2020-04-13,2,The Elet,4.2,Asia,2.0
9498,DDSG66533,2019-12-31,2019,19:11:16,SG12036,Female,57,Singapore,Central,Downtown Core,...,Ottawa,7,2020-01-09,1,2020-01-10,4,The Elet,4.4,North America,1.75
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,...,Viligili,3,2020-01-01,1,2020-01-02,2,Sunset Lodge,4.2,Asia,1.5
9500,DDVN66535,2019-12-31,2019,00:51:52,VN05959,Female,52,Vietnam,Pleiku,Gia Lai,...,Luxor,5,2020-01-24,4,2020-01-28,3,Coastal bay hotel,4.3,Africa,1.666667


> ### 🚩 Exercise
> Let's further group the continents into "[Old World](https://en.wikipedia.org/wiki/Old_World)" and "[New World](https://en.wikipedia.org/wiki/New_World)". The mapping dictionary is provided to you below.
> 
> Create a new column called `World` with the continents mapped according to the dictionary below.

In [44]:
world_lookup = {
    # Old World
    "Africa": "Old World",
    "Asia": "Old World",
    "Europe": "Old World",
    
    # New World
    "North America": "New World",
    "South America": "New World",
    
    # Australia
    "Australia": "Australia",
}

In [45]:
world_lookup['Europe']

'Old World'

In [46]:
# ✏️ ENTER YOUR SOLUTION HERE

orders['World'] = orders['Continent'].map(world_lookup)


In [47]:
orders.head(5)

Unnamed: 0,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,...,No. Of People,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating,Continent,People Per Room,World
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,...,2,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2,Europe,2.0,Old World
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,...,4,2019-01-15,2,2019-01-17,2,Four Points,4.3,Asia,2.0,Old World
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,...,5,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8,North America,1.666667,New World
3,DDSG57038,2019-01-01,2019,11:46:28,SG10308,Male,22,Singapore,North-East,Hougang,...,5,2019-01-18,1,2019-01-19,3,Classio Hotel,3.7,Asia,1.666667,Old World
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,...,7,2019-01-02,1,2019-01-03,4,Adam Lake B&B,4.5,Europe,1.75,Old World
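Because `.map()` returns a new Series, the two lookups can also be chained without storing the intermediate `Continent` column. The sketch below assumes small, hypothetical subsets of `continent_lookup` and `world_lookup` and a toy Series of destinations rather than the full `orders` data.

```python
import pandas as pd

# Hypothetical subsets of the two lookup dictionaries used in this notebook
continent_lookup = {"Japan": "Asia", "Canada": "North America", "Kenya": "Africa"}
world_lookup = {"Asia": "Old World", "North America": "New World", "Africa": "Old World"}

destinations = pd.Series(["Japan", "Canada", "Kenya"])

# Country -> continent, then continent -> world, in one expression
world = destinations.map(continent_lookup).map(world_lookup)
print(world.tolist())  # ['Old World', 'New World', 'Old World']
```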


### The `.apply()` method
The `.map()` method cycles through the values in the specified column and runs each value through a dictionary lookup.

In contrast, the `.apply()` method cycles through the values in the specified column and passes each value to a function. Let's see it in action.
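A minimal side-by-side sketch of the difference, using a toy Series of hotel names together with a hypothetical `short_codes` dictionary and a hypothetical `name_length()` function (neither is part of the course data):

```python
import pandas as pd

hotels = pd.Series(["Four Points", "The Elet", "Sunset Lodge"])

# .map() with a dictionary: each value is used as a lookup key
short_codes = {"Four Points": "FP", "The Elet": "TE", "Sunset Lodge": "SL"}
print(hotels.map(short_codes).tolist())  # ['FP', 'TE', 'SL']

# .apply() with a function: each value is passed to the function as its argument
def name_length(name):
    """Return the number of characters in a hotel name."""
    return len(name)

print(hotels.apply(name_length).tolist())  # [11, 8, 12]
```

Strictly speaking, `Series.map()` will also accept a function, but keeping dictionaries for `.map()` and functions for `.apply()` mirrors how they are used in this notebook.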

In [48]:
# First let's see how the Python round() function works
for number in [4.2, 4.4, 4.5, 4.8, 5.5]:
    print(number, "rounds to", round(number))

4.2 rounds to 4
4.4 rounds to 4
4.5 rounds to 4
4.8 rounds to 5
5.5 rounds to 6


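> _Sidenote: Python 3's built-in `round()` implements "[Banker's Rounding](https://www.mathsisfun.com/numbers/rounding-methods.html)" (round half to even): a value exactly halfway between two integers is rounded to the nearest even integer, so 4.5 becomes 4 and 5.5 becomes 6. This reduces the upward bias that always rounding halves up would introduce into calculations performed on the rounded numbers. It is also the standard method of rounding taught at school in some countries._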
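If you ever need the "round half up" rule instead, one option from the standard library is the `decimal` module. This sketch compares the two rounding rules on a few halfway values; the numbers are passed as strings so that no binary floating-point error creeps in before rounding.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

WHOLE = Decimal("1")  # quantize to zero decimal places

for text in ["4.5", "5.5", "6.5"]:
    value = Decimal(text)
    half_even = value.quantize(WHOLE, rounding=ROUND_HALF_EVEN)  # same rule as round()
    half_up = value.quantize(WHOLE, rounding=ROUND_HALF_UP)      # ties go away from zero
    print(text, "half-even:", half_even, "| half-up:", half_up)

# 4.5 half-even: 4 | half-up: 5
# 5.5 half-even: 6 | half-up: 6
# 6.5 half-even: 6 | half-up: 7
```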

In [49]:
# Let's apply the round function to the Hotel Rating column
orders['Hotel Rating'].apply(round)

0       4
1       4
2       4
3       4
4       4
       ..
9496    4
9497    4
9498    4
9499    4
9500    4
Name: Hotel Rating, Length: 9501, dtype: int64

The format of `.apply()` is:

```python
DATAFRAME[COLUMN].apply(FUNCTION)
```
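`.apply()` calls the function once per value. For simple arithmetic such as rounding, pandas also offers a vectorised alternative, `Series.round()`, which works on the whole column in one call and is typically faster on large data. The sketch below uses a toy Series of ratings; `Series.round()` returns floats, hence the `.astype(int)`, and NumPy likewise rounds exact halves to the nearest even number, so both approaches agree here.

```python
import pandas as pd

ratings = pd.Series([4.2, 4.5, 4.8, 5.5])

# Element by element: Python's round() is called once per value
rounded_apply = ratings.apply(round)

# Vectorised: Series.round() rounds the whole column at once (returns floats)
rounded_vectorised = ratings.round().astype(int)

print(rounded_apply.tolist())       # [4, 4, 5, 6]
print(rounded_vectorised.tolist())  # [4, 4, 5, 6]
```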

In [50]:
# As before, let's save our calculation into a new column in the DataFrame
orders['Hotel Rating (rounded)'] = orders['Hotel Rating'].apply(round)

In [51]:
orders.head(3)

Unnamed: 0,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,...,Check-in date,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating,Continent,People Per Room,World,Hotel Rating (rounded)
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,...,2019-03-24,1,2019-03-25,1,Blooming Bed And Breakfast,4.2,Europe,2.0,Old World,4
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,...,2019-01-15,2,2019-01-17,2,Four Points,4.3,Asia,2.0,Old World,4
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,...,2019-01-16,9,2019-01-25,3,Hotel Joy Stick,3.8,North America,1.666667,New World,4


---

> ### 🚩 Exercise
> Create a function that generates an age range string from an integer. We've broken this exercise out step by step for you.

In [52]:
# Let's import the floor() function from Python's math module
from math import floor

The `floor()` function from [Python's math module](https://docs.python.org/3/library/math.html) rounds a decimal number down to the nearest integer.

In [53]:
print(floor(3.1))
print(floor(3.9999999))

3
3


In [54]:
# We can use this to round down to the nearest decade.
age = 33
floor(age / 10) * 10

30
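For positive whole numbers, Python's floor-division operator `//` gives the same result without importing anything. The sketch below checks a few ages both ways, and also shows that `floor()` always rounds *down* (towards minus infinity), which differs from `int()` truncation for negative numbers.

```python
from math import floor

for age in [33, 39, 40, 59]:
    # Both expressions round the age down to the start of its decade
    print(age, floor(age / 10) * 10, age // 10 * 10)

# floor() rounds towards minus infinity; int() simply drops the decimals
print(floor(-3.5), int(-3.5))  # -4 -3
```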

In [55]:
# We can format strings in Python using "f-strings". These inject anything inside the curly brackets {}
# into the string and are great for taking variables and formatting them into the string.
age = 33
print(f"age is {age}")

age is 33


In [56]:
# You can put any Python code inside the curly brackets. It will be evaluated,
# and then the result will be inserted into the string.
f"{age}-{age + 10}"

'33-43'
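f-strings can also control *how* a value is displayed: a format specification after a colon inside the curly brackets sets the number of decimal places, adds thousands separators or shows a percentage. The variable names and figures below are made up for illustration.

```python
people_per_room = 5 / 3
revenue = 1234567.891

# The part after the colon is a format specification
print(f"{people_per_room:.2f}")  # 1.67  (two decimal places)
print(f"{revenue:,.2f}")         # 1,234,567.89  (thousands separator)
print(f"{0.256:.1%}")            # 25.6%  (multiplied by 100, one decimal place)
```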

> Now it's your turn! We've given you all the clues, and we've even created the function outline for you. Your function should take the `age` input, round it down to the nearest decade (e.g. 33 becomes 30, and 40 stays 40), calculate the age range and build it into an f-string. **Don't forget to return the formatted string.**
> 
> _Example: if `age` is 59, the nearest decade below this is 50, so the age bracket is "50-59"._

In [57]:
# ✏️ ENTER YOUR SOLUTION HERE

def calculate_age_bracket(age):
    # Round the age down to the start of its decade, e.g. 33 -> 30
    decade = floor(age / 10) * 10
    # Return (rather than print) the bracket string, e.g. "30-39"
    return f"{decade}-{decade + 9}"


In [58]:
# Let's test out the function. This should display "30-39".
calculate_age_bracket(age=33)

'30-39'

In [59]:
# This should display "30-39"
calculate_age_bracket(age=39)

'30-39'

In [60]:
# This should display "40-49"
calculate_age_bracket(age=40)

'40-49'

> ### 🚩 Exercise
> Apply this function, using the `.apply()` method, to the `Age` column of the `orders` DataFrame, and create a new column called `Age Group`. Double-check the previous examples; in fact, it may be easiest to copy, paste and adapt them in the cell below to get started. Remember to enter just the name of the function you want to "apply" inside the apply method's brackets, i.e. `.apply(calculate_age_bracket)`.

In [61]:
# ✏️ ENTER YOUR SOLUTION HERE

orders['Age Group'] = orders['Age'].apply(calculate_age_bracket)


In [62]:
orders

Unnamed: 0,Booking ID,Date of Booking,Year,Time,Customer ID,Gender,Age,Origin Country,State,Location,...,No. Of Days,Check-Out Date,Rooms,Hotel Name,Hotel Rating,Continent,People Per Room,World,Hotel Rating (rounded),Age Group
0,DDID57035,2019-01-01,2019,13:23:47,ID10297,Female,51,Indonesia,Tambora,Jakarta,...,1,2019-03-25,1,Blooming Bed And Breakfast,4.2,Europe,2.000000,Old World,4,50-59
1,DDSG57036,2019-01-01,2019,16:14:22,SG10307,Male,46,Singapore,Central,Novena,...,2,2019-01-17,2,Four Points,4.3,Asia,2.000000,Old World,4,40-49
2,DDMY57037,2019-01-01,2019,09:49:48,MY10283,Female,25,Malaysia,Johor,Johor Bahru,...,9,2019-01-25,3,Hotel Joy Stick,3.8,North America,1.666667,New World,4,20-29
3,DDSG57038,2019-01-01,2019,11:46:28,SG10308,Male,22,Singapore,North-East,Hougang,...,1,2019-01-19,3,Classio Hotel,3.7,Asia,1.666667,Old World,4,20-29
4,DDID57039,2019-01-01,2019,13:57:50,ID10298,Male,45,Indonesia,Bekasi,West Java,...,1,2019-01-03,4,Adam Lake B&B,4.5,Europe,1.750000,Old World,4,40-49
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9496,DDSG66531,2019-12-31,2019,23:36:16,SG12034,Female,42,Singapore,Central,Orchard,...,4,2020-01-10,2,Silver Cloud Inn,4.3,Europe,2.000000,Old World,4,40-49
9497,DDSG66532,2019-12-31,2019,14:41:01,SG12035,Female,54,Singapore,Central,Geylang,...,4,2020-04-13,2,The Elet,4.2,Asia,2.000000,Old World,4,50-59
9498,DDSG66533,2019-12-31,2019,19:11:16,SG12036,Female,57,Singapore,Central,Downtown Core,...,1,2020-01-10,4,The Elet,4.4,North America,1.750000,New World,4,50-59
9499,DDTH66534,2019-12-31,2019,05:12:29,TH12170,Female,44,Thailand,Surat Thani,Ko Samui,...,1,2020-01-02,2,Sunset Lodge,4.2,Asia,1.500000,Old World,4,40-49
9500,DDVN66535,2019-12-31,2019,00:51:52,VN05959,Female,52,Vietnam,Pleiku,Gia Lai,...,4,2020-01-28,3,Coastal bay hotel,4.3,Africa,1.666667,Old World,4,50-59
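Looping a Python function over every value with `.apply()` is perfectly fine at this size. As an optional comparison, the same brackets can also be built in a vectorised way, using whole-column arithmetic and string concatenation. This sketch uses a hypothetical handful of ages rather than `orders['Age']`.

```python
import pandas as pd

# A hypothetical handful of ages standing in for orders['Age']
ages = pd.Series([19, 33, 39, 40, 59])

# Start of each decade, computed for the whole column at once
decade = ages // 10 * 10

# Build "30-39"-style strings by concatenating string columns
brackets = decade.astype(str) + "-" + (decade + 9).astype(str)
print(brackets.tolist())  # ['10-19', '30-39', '30-39', '40-49', '50-59']
```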


---
<div class="alert alert-block alert-info">
    <b>Please proceed to the next part of the course when you are ready.</b> We recommend you download a copy of the <a href="https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf"><b>pandas cheatsheet</b></a> and start taking some notes on which methods and techniques you've seen so far.
</div>