# Assignment 4

Before working on this assignment please read these instructions fully. In the submission area, you will notice that you can click the link to **Preview the Grading** for each step of the assignment. These are the criteria that will be used for peer grading. Please familiarize yourself with the criteria before beginning the assignment.

This assignment requires that you find **at least** two datasets on the web which are related, and that you visualize these datasets to answer a question with the broad topic of **religious events or traditions** (see below) for the region of **Singapore, None, Singapore**, or **Singapore** more broadly.

You can merge these datasets with data from different regions if you like! For instance, you might want to compare **Singapore, None, Singapore** to Ann Arbor, USA. In that case at least one source file must be about **Singapore, None, Singapore**.

You are welcome to choose datasets at your discretion, but keep in mind **they will be shared with your peers**, so choose appropriate datasets. Sensitive, confidential, illicit, and proprietary materials are not good choices for datasets for this assignment. You are welcome to upload datasets of your own as well, and link to them using a third-party repository such as GitHub, Bitbucket, Pastebin, etc. Please be aware of the Coursera terms of service with respect to intellectual property.

Also, you are welcome to preserve data in its original language, but for the purposes of grading you should provide English translations. You are welcome to provide multiple visuals in different languages if you would like!

As this assignment is for the whole course, you must incorporate principles discussed in the first week, such as having a high data-ink ratio (Tufte) and aligning with Cairo’s principles of truth, beauty, function, and insight.

Here are the assignment instructions:

 * State the region and the domain category that your data sets are about (e.g., **Singapore, None, Singapore** and **religious events or traditions**).
 * You must state a question about the domain category and region that you identified as being interesting.
 * You must provide at least two links to available datasets. These could be links to files such as CSV or Excel files, or links to websites which might have data in tabular form, such as Wikipedia pages.
 * You must upload an image which addresses the research question you stated. In addition to addressing the question, this visual should follow Cairo's principles of truthfulness, functionality, beauty, and insightfulness.
 * You must contribute a short (1-2 paragraph) written justification of how your visualization addresses your stated research question.

What do we mean by **religious events or traditions**?  For this category you might consider calendar events, demographic data about religion in the region and neighboring regions, participation in religious events, or how religious events relate to political events, social movements, or historical events.

## Tips
* Wikipedia is an excellent source of data, and I strongly encourage you to explore it for new data sources.
* Many governments run open data initiatives at the city, region, and country levels, and these are wonderful resources for localized data sources.
* Several international agencies, such as the [United Nations](http://data.un.org/), the [World Bank](http://data.worldbank.org/), and the [Global Open Data Index](http://index.okfn.org/place/), are other great places to look for data.
* This assignment requires you to convert and clean datafiles. Check out the discussion forums for tips on how to do this from various sources, and share your successes with your fellow students!

## Example
Looking for an example? Here's what our course assistant put together for the **Ann Arbor, MI, USA** area using **sports and athletics** as the topic. [Example Solution File](./readonly/Assignment4_example.pdf)

## Region and Domain

Singapore and religious events or traditions

## Research Question


Do more Singapore residents travel overseas during school holidays?

## Datasets

Singapore residents travelling outbound: https://www.tablebuilder.singstat.gov.sg/publicfacing/api/csv/title/15300.csv

Historically, school holidays in Singapore usually fall in March, June and December of each year.

## Data Cleaning

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import urllib.request, urllib.parse, urllib.error

%matplotlib notebook

In [2]:
df = pd.read_csv('https://www.tablebuilder.singstat.gov.sg/publicfacing/api/csv/title/15300.csv')
df

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,Unnamed: 17,Unnamed: 18,Unnamed: 19,Unnamed: 20,Unnamed: 21,Unnamed: 22,Unnamed: 23,Unnamed: 24,Unnamed: 25,Unnamed: 26,Unnamed: 27,Unnamed: 28,Unnamed: 29,Unnamed: 30,Unnamed: 31,Unnamed: 32,Unnamed: 33,Unnamed: 34,Unnamed: 35,Unnamed: 36,Unnamed: 37,Unnamed: 38,Unnamed: 39,Unnamed: 40,Unnamed: 41,Unnamed: 42,Unnamed: 43,Unnamed: 44,Unnamed: 45,Unnamed: 46,Unnamed: 47,Unnamed: 48,Unnamed: 49,Unnamed: 50,Unnamed: 51,Unnamed: 52,Unnamed: 53,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58,Unnamed: 59,Unnamed: 60,Unnamed: 61,Unnamed: 62,Unnamed: 63,Unnamed: 64,Unnamed: 65,Unnamed: 66,Unnamed: 67,Unnamed: 68,Unnamed: 69,Unnamed: 70,Unnamed: 71,Unnamed: 72,Unnamed: 73,Unnamed: 74,Unnamed: 75,Unnamed: 76,Unnamed: 77,Unnamed: 78,Unnamed: 79,Unnamed: 80,Unnamed: 81,Unnamed: 82,Unnamed: 83,Unnamed: 84,Unnamed: 85,Unnamed: 86,Unnamed: 87,Unnamed: 88,Unnamed: 89,Unnamed: 90,Unnamed: 91,Unnamed: 92,Unnamed: 93,Unnamed: 94,Unnamed: 95,Unnamed: 96,Unnamed: 97,Unnamed: 98,Unnamed: 99,Unnamed: 100,Unnamed: 101,Unnamed: 102,Unnamed: 103,Unnamed: 104,Unnamed: 105,Unnamed: 106,Unnamed: 107,Unnamed: 108,Unnamed: 109,Unnamed: 110,Unnamed: 111,Unnamed: 112,Unnamed: 113,Unnamed: 114,Unnamed: 115,Unnamed: 116,Unnamed: 117,Unnamed: 118,Unnamed: 119,Unnamed: 120,Unnamed: 121,Unnamed: 122,Unnamed: 123,Unnamed: 124,Unnamed: 125,Number
Variables,2011 Jan,2011 Feb,2011 Mar,2011 Apr,2011 May,2011 Jun,2011 Jul,2011 Aug,2011 Sep,2011 Oct,2011 Nov,2011 Dec,2012 Jan,2012 Feb,2012 Mar,2012 Apr,2012 May,2012 Jun,2012 Jul,2012 Aug,2012 Sep,2012 Oct,2012 Nov,2012 Dec,2013 Jan,2013 Feb,2013 Mar,2013 Apr,2013 May,2013 Jun,2013 Jul,2013 Aug,2013 Sep,2013 Oct,2013 Nov,2013 Dec,2014 Jan,2014 Feb,2014 Mar,2014 Apr,2014 May,2014 Jun,2014 Jul,2014 Aug,2014 Sep,2014 Oct,2014 Nov,2014 Dec,2015 Jan,2015 Feb,2015 Mar,2015 Apr,2015 May,2015 Jun,2015 Jul,2015 Aug,2015 Sep,2015 Oct,2015 Nov,2015 Dec,2016 Jan,2016 Feb,2016 Mar,2016 Apr,2016 May,2016 Jun,2016 Jul,2016 Aug,2016 Sep,2016 Oct,2016 Nov,2016 Dec,2017 Jan,2017 Feb,2017 Mar,2017 Apr,2017 May,2017 Jun,2017 Jul,2017 Aug,2017 Sep,2017 Oct,2017 Nov,2017 Dec,2018 Jan,2018 Feb,2018 Mar,2018 Apr,2018 May,2018 Jun,2018 Jul,2018 Aug,2018 Sep,2018 Oct,2018 Nov,2018 Dec,2019 Jan,2019 Feb,2019 Mar,2019 Apr,2019 May,2019 Jun,2019 Jul,2019 Aug,2019 Sep,2019 Oct,2019 Nov,2019 Dec,2020 Jan,2020 Feb,2020 Mar,2020 Apr,2020 May,2020 Jun,2020 Jul,2020 Aug,2020 Sep,2020 Oct,2020 Nov,2020 Dec,2021 Jan,2021 Feb,2021 Mar,2021 Apr,2021 May,2021 Jun
Total,519371,552139,631472,589780,658311,762285,592057,537969,624327,615413,739630,930172,575380,528376,687345,618408,672397,785014,567318,573261,615076,660133,795232,969868,534554,586868,737164,605614,697209,870655,587318,639716,697010,717544,895821,1077593,626646,572401,757244,670221,730861,896499,606034,617255,713298,757472,872185,1082589,602818,612946,758811,695965,789999,868634,630628,674736,728534,715482,904417,1142361,626184,686304,829374,739805,797697,879990,645430,629682,779910,759911,914424,1185494,696318,608797,863448,789348,814350,917203,683886,731503,776151,785667,983609,1238572,669652,686178,950291,774528,872265,955818,719883,780692,809744,832460,1043293,1283525,700668,729915,953556,811810,844277,1048217,738793,814768,848901,875060,1072972,1271780,759018,420253,207985,5237,6265,8528,10304,13078,16452,16512,28202,51173,39195,43352,56512,50834,48587,50251
Air,403091,418658,493565,454456,512788,598743,447246,424227,496243,486524,598638,748441,454129,422488,547932,500320,550627,624639,451896,453977,498279,536331,661194,783456,431617,466876,583066,492933,561989,703498,476835,501336,561024,589119,728060,864075,492364,448791,604105,539291,595285,725083,487019,494973,580512,614387,716594,865374,476524,481183,616144,562934,633056,709001,509449,532272,590148,589746,750693,920227,497410,542520,675124,600065,652406,728965,532170,519051,647384,627721,773054,974723,564600,502463,708879,636536,684044,766096,560828,597199,631680,648313,817157,982892,550455,545445,756088,632948,718874,773912,576097,617271,639916,671392,861092,1017890,568322,572580,761095,650310,686037,842881,585753,626076,680835,705809,874748,1029844,593898,336590,169903,5156,6213,8399,10151,12936,16320,16312,18843,23810,24025,23013,26336,25895,29754,30694
Sea,116280,133481,137907,135324,145523,163542,144811,113742,128084,128889,140992,181731,121251,105888,139413,118088,121770,160375,115422,119284,116797,123802,134038,186412,102937,119992,154098,112681,135220,167157,110483,138380,135986,128425,167761,213518,134282,123610,153139,130930,135576,171416,119015,122282,132786,143085,155591,217215,126294,131763,142667,133031,156943,159633,121179,142464,138386,125736,153724,222134,128774,143784,154250,139740,145291,151025,113260,110631,132526,132190,141370,210771,131718,106334,154569,152812,130306,151107,123058,134304,144471,137354,166452,255680,119197,140733,194203,141580,153391,181906,143786,163421,169828,161068,182201,265635,132346,157335,192461,161500,158240,205336,153040,188692,168066,169251,198224,241936,165120,83663,38082,81,52,129,153,142,132,200,9359,27363,15170,20339,30176,24939,18833,19557
SOURCE: IMMIGRATION AND CHECKPOINTS AUTHORITY,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Generated by: SingStat Table Builder,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Data last updated: 02/08/2021,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Date generated: 12/08/2021,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Contact: info@singstat.gov.sg,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
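The raw frame above mixes a label row, three data rows, and several footer notes whose cells are all empty. As a minimal sketch of one way to trim such a table, the example below reads a tiny stand-in for the file and drops every row that has missing values. The exact raw layout of the real CSV is an assumption here; the numbers are copied from the output above, and `sample` and `wide` are illustrative names.

```python
import io
import pandas as pd

# Tiny stand-in for the downloaded file (values copied from the output above;
# the real file's raw layout is an assumption).
sample = io.StringIO(
    "Variables,2011 Jan,2011 Feb,2011 Mar\n"
    "Total,519371,552139,631472\n"
    "Air,403091,418658,493565\n"
    "Sea,116280,133481,137907\n"
    "SOURCE: IMMIGRATION AND CHECKPOINTS AUTHORITY,,,\n"
    "Generated by: SingStat Table Builder,,,\n"
)

wide = pd.read_csv(sample, index_col="Variables")

# Footer notes have no values in the month columns, so dropna removes them;
# the remaining rows can then be converted from float to int.
wide = wide.dropna(how="any").astype(int)
print(wide)
```

Dropping rows by missing values avoids hard-coding row positions, which may shift if the publisher adds or removes footer notes.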


In [3]:
df.reset_index(inplace=True)
df

Unnamed: 0,level_0,level_1,level_2,level_3,level_4,level_5,level_6,level_7,level_8,level_9,...,level_117,level_118,level_119,level_120,level_121,level_122,level_123,level_124,level_125,Number
0,Variables,2011 Jan,2011 Feb,2011 Mar,2011 Apr,2011 May,2011 Jun,2011 Jul,2011 Aug,2011 Sep,...,2020 Sep,2020 Oct,2020 Nov,2020 Dec,2021 Jan,2021 Feb,2021 Mar,2021 Apr,2021 May,2021 Jun
1,Total,519371,552139,631472,589780,658311,762285,592057,537969,624327,...,16452,16512,28202,51173,39195,43352,56512,50834,48587,50251
2,Air,403091,418658,493565,454456,512788,598743,447246,424227,496243,...,16320,16312,18843,23810,24025,23013,26336,25895,29754,30694
3,Sea,116280,133481,137907,135324,145523,163542,144811,113742,128084,...,132,200,9359,27363,15170,20339,30176,24939,18833,19557
4,SOURCE: IMMIGRATION AND CHECKPOINTS AUTHORITY,,,,,,,,,,...,,,,,,,,,,
5,Generated by: SingStat Table Builder,,,,,,,,,,...,,,,,,,,,,
6,Data last updated: 02/08/2021,,,,,,,,,,...,,,,,,,,,,
7,Date generated: 10/08/2021,,,,,,,,,,...,,,,,,,,,,
8,Contact: info@singstat.gov.sg,,,,,,,,,,...,,,,,,,,,,


In [4]:
df.rename(columns=df.iloc[0], inplace=True)
df

Unnamed: 0,Variables,2011 Jan,2011 Feb,2011 Mar,2011 Apr,2011 May,2011 Jun,2011 Jul,2011 Aug,2011 Sep,...,2020 Sep,2020 Oct,2020 Nov,2020 Dec,2021 Jan,2021 Feb,2021 Mar,2021 Apr,2021 May,2021 Jun
0,Variables,2011 Jan,2011 Feb,2011 Mar,2011 Apr,2011 May,2011 Jun,2011 Jul,2011 Aug,2011 Sep,...,2020 Sep,2020 Oct,2020 Nov,2020 Dec,2021 Jan,2021 Feb,2021 Mar,2021 Apr,2021 May,2021 Jun
1,Total,519371,552139,631472,589780,658311,762285,592057,537969,624327,...,16452,16512,28202,51173,39195,43352,56512,50834,48587,50251
2,Air,403091,418658,493565,454456,512788,598743,447246,424227,496243,...,16320,16312,18843,23810,24025,23013,26336,25895,29754,30694
3,Sea,116280,133481,137907,135324,145523,163542,144811,113742,128084,...,132,200,9359,27363,15170,20339,30176,24939,18833,19557
4,SOURCE: IMMIGRATION AND CHECKPOINTS AUTHORITY,,,,,,,,,,...,,,,,,,,,,
5,Generated by: SingStat Table Builder,,,,,,,,,,...,,,,,,,,,,
6,Data last updated: 02/08/2021,,,,,,,,,,...,,,,,,,,,,
7,Date generated: 10/08/2021,,,,,,,,,,...,,,,,,,,,,
8,Contact: info@singstat.gov.sg,,,,,,,,,,...,,,,,,,,,,


In [5]:
df.set_index("Variables", inplace = True)
df = df.drop(df.index[0])
df = df.drop(df.index[range(3,8)])
df

Variables,2011 Jan,2011 Feb,2011 Mar,2011 Apr,2011 May,2011 Jun,2011 Jul,2011 Aug,2011 Sep,2011 Oct,...,2020 Sep,2020 Oct,2020 Nov,2020 Dec,2021 Jan,2021 Feb,2021 Mar,2021 Apr,2021 May,2021 Jun
Total,519371,552139,631472,589780,658311,762285,592057,537969,624327,615413,...,16452,16512,28202,51173,39195,43352,56512,50834,48587,50251
Air,403091,418658,493565,454456,512788,598743,447246,424227,496243,486524,...,16320,16312,18843,23810,24025,23013,26336,25895,29754,30694
Sea,116280,133481,137907,135324,145523,163542,144811,113742,128084,128889,...,132,200,9359,27363,15170,20339,30176,24939,18833,19557


In [6]:
df1 = df.T
df1

Variables,Total,Air,Sea
2011 Jan,519371,403091,116280
2011 Feb,552139,418658,133481
2011 Mar,631472,493565,137907
2011 Apr,589780,454456,135324
2011 May,658311,512788,145523
...,...,...,...
2021 Feb,43352,23013,20339
2021 Mar,56512,26336,30176
2021 Apr,50834,25895,24939
2021 May,48587,29754,18833


In [7]:
df1.index = pd.to_datetime(df1.index).strftime('%Y-%m')
df1

Variables,Total,Air,Sea
2011-01,519371,403091,116280
2011-02,552139,418658,133481
2011-03,631472,493565,137907
2011-04,589780,454456,135324
2011-05,658311,512788,145523
...,...,...,...
2021-02,43352,23013,20339
2021-03,56512,26336,30176
2021-04,50834,25895,24939
2021-05,48587,29754,18833


In [8]:
df1['Date'] = df1.index
df1

Variables,Total,Air,Sea,Date
2011-01,519371,403091,116280,2011-01
2011-02,552139,418658,133481,2011-02
2011-03,631472,493565,137907,2011-03
2011-04,589780,454456,135324,2011-04
2011-05,658311,512788,145523,2011-05
...,...,...,...,...
2021-02,43352,23013,20339,2021-02
2021-03,56512,26336,30176,2021-03
2021-04,50834,25895,24939,2021-04
2021-05,48587,29754,18833,2021-05


In [9]:
df1["Year"] = df1['Date'].apply(lambda x: x[:4])
df1["Month"] = df1['Date'].apply(lambda x: x[-2:])
df1

Variables,Total,Air,Sea,Date,Year,Month
2011-01,519371,403091,116280,2011-01,2011,01
2011-02,552139,418658,133481,2011-02,2011,02
2011-03,631472,493565,137907,2011-03,2011,03
2011-04,589780,454456,135324,2011-04,2011,04
2011-05,658311,512788,145523,2011-05,2011,05
...,...,...,...,...,...,...
2021-02,43352,23013,20339,2021-02,2021,02
2021-03,56512,26336,30176,2021-03,2021,03
2021-04,50834,25895,24939,2021-04,2021,04
2021-05,48587,29754,18833,2021-05,2021,05
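The year and month can also be taken directly from parsed dates rather than by slicing strings. The sketch below assumes the labels follow the `"2011 Jan"` pattern shown earlier and that English month abbreviations are in effect; `labels`, `dates`, and `parts` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Labels in the same "YYYY Mon" form as the column headers shown earlier.
labels = pd.Index(["2011 Jan", "2011 Feb", "2019 Dec"])

# An explicit format avoids relying on pandas to guess the date layout.
dates = pd.to_datetime(labels, format="%Y %b")

parts = pd.DataFrame(
    {"Year": dates.year, "Month": dates.strftime("%b")},
    index=labels,
)
print(parts)
```

Here `Year` is numeric rather than a string, which would matter if later steps select years by label.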


In [10]:
df2=df1.set_index(["Year"])
df2

Year,Total,Air,Sea,Date,Month
2011,519371,403091,116280,2011-01,01
2011,552139,418658,133481,2011-02,02
2011,631472,493565,137907,2011-03,03
2011,589780,454456,135324,2011-04,04
2011,658311,512788,145523,2011-05,05
...,...,...,...,...,...
2021,43352,23013,20339,2021-02,02
2021,56512,26336,30176,2021-03,03
2021,50834,25895,24939,2021-04,04
2021,48587,29754,18833,2021-05,05


In [11]:
# Keep only the total and the month; drop the per-mode and date columns by name
df2 = df2.drop(columns=['Air', 'Sea', 'Date'])
df2

Year,Total,Month
2011,519371,01
2011,552139,02
2011,631472,03
2011,589780,04
2011,658311,05
...,...,...
2021,43352,02
2021,56512,03
2021,50834,04
2021,48587,05


In [12]:
df2['Month']=pd.to_datetime(df2['Month'],format='%m').dt.strftime('%b')
df2=df2.pivot(columns='Month',values='Total').rename_axis(columns=None)
df2

Year,Apr,Aug,Dec,Feb,Jan,Jul,Jun,Mar,May,Nov,Oct,Sep
2011,589780,537969.0,930172.0,552139,519371,592057.0,762285,631472,658311,739630.0,615413.0,624327.0
2012,618408,573261.0,969868.0,528376,575380,567318.0,785014,687345,672397,795232.0,660133.0,615076.0
2013,605614,639716.0,1077593.0,586868,534554,587318.0,870655,737164,697209,895821.0,717544.0,697010.0
2014,670221,617255.0,1082589.0,572401,626646,606034.0,896499,757244,730861,872185.0,757472.0,713298.0
2015,695965,674736.0,1142361.0,612946,602818,630628.0,868634,758811,789999,904417.0,715482.0,728534.0
2016,739805,629682.0,1185494.0,686304,626184,645430.0,879990,829374,797697,914424.0,759911.0,779910.0
2017,789348,731503.0,1238572.0,608797,696318,683886.0,917203,863448,814350,983609.0,785667.0,776151.0
2018,774528,780692.0,1283525.0,686178,669652,719883.0,955818,950291,872265,1043293.0,832460.0,809744.0
2019,811810,814768.0,1271780.0,729915,700668,738793.0,1048217,953556,844277,1072972.0,875060.0,848901.0
2020,5237,13078.0,51173.0,420253,759018,10304.0,8528,207985,6265,28202.0,16512.0,16452.0


In [13]:
from calendar import month_abbr

df2.columns=pd.Categorical(df2.columns,month_abbr[1:],ordered=True)
df2

Year,Apr,Aug,Dec,Feb,Jan,Jul,Jun,Mar,May,Nov,Oct,Sep
2011,589780,537969.0,930172.0,552139,519371,592057.0,762285,631472,658311,739630.0,615413.0,624327.0
2012,618408,573261.0,969868.0,528376,575380,567318.0,785014,687345,672397,795232.0,660133.0,615076.0
2013,605614,639716.0,1077593.0,586868,534554,587318.0,870655,737164,697209,895821.0,717544.0,697010.0
2014,670221,617255.0,1082589.0,572401,626646,606034.0,896499,757244,730861,872185.0,757472.0,713298.0
2015,695965,674736.0,1142361.0,612946,602818,630628.0,868634,758811,789999,904417.0,715482.0,728534.0
2016,739805,629682.0,1185494.0,686304,626184,645430.0,879990,829374,797697,914424.0,759911.0,779910.0
2017,789348,731503.0,1238572.0,608797,696318,683886.0,917203,863448,814350,983609.0,785667.0,776151.0
2018,774528,780692.0,1283525.0,686178,669652,719883.0,955818,950291,872265,1043293.0,832460.0,809744.0
2019,811810,814768.0,1271780.0,729915,700668,738793.0,1048217,953556,844277,1072972.0,875060.0,848901.0
2020,5237,13078.0,51173.0,420253,759018,10304.0,8528,207985,6265,28202.0,16512.0,16452.0


In [14]:
df2=df2.sort_index(axis=1)
df3=df2.T
df3=df3.drop(['2020','2021'],axis=1)
df3

Year,2011,2012,2013,2014,2015,2016,2017,2018,2019
Jan,519371,575380,534554,626646,602818,626184,696318,669652,700668
Feb,552139,528376,586868,572401,612946,686304,608797,686178,729915
Mar,631472,687345,737164,757244,758811,829374,863448,950291,953556
Apr,589780,618408,605614,670221,695965,739805,789348,774528,811810
May,658311,672397,697209,730861,789999,797697,814350,872265,844277
Jun,762285,785014,870655,896499,868634,879990,917203,955818,1048217
Jul,592057,567318,587318,606034,630628,645430,683886,719883,738793
Aug,537969,573261,639716,617255,674736,629682,731503,780692,814768
Sep,624327,615076,697010,713298,728534,779910,776151,809744,848901
Oct,615413,660133,717544,757472,715482,759911,785667,832460,875060
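The pivot and month-ordering steps above can be sketched more compactly on a small long-format table. This is only an illustration under stated assumptions: `long` and `by_month` are hypothetical names, the values are copied from the 2011 and 2012 rows above, and `reindex` is swapped in for the ordered `Categorical` used in the notebook.

```python
from calendar import month_abbr
import pandas as pd

# Long-format sample, deliberately out of calendar order.
long = pd.DataFrame({
    "Year": ["2011", "2011", "2011", "2012", "2012", "2012"],
    "Month": ["Mar", "Jan", "Feb", "Feb", "Mar", "Jan"],
    "Total": [631472, 519371, 552139, 528376, 687345, 575380],
})

# One row per month, one column per year.
by_month = long.pivot(index="Month", columns="Year", values="Total")

# pivot sorts labels alphabetically (Feb, Jan, Mar);
# reindexing against month_abbr restores calendar order.
month_order = [m for m in month_abbr[1:] if m in by_month.index]
by_month = by_month.reindex(month_order)
print(by_month)
```

Pivoting straight to months-by-years also removes the need for the later transpose.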


In [15]:
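# astype returns a new frame, so the result must be assigned back.
# Strip any thousands separators, then convert every column to numbers.
df3 = df3.astype(str)
df3 = df3.replace(',', '', regex=True)
df3 = df3.apply(pd.to_numeric)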
df3

Year,2011,2012,2013,2014,2015,2016,2017,2018,2019
Jan,519371,575380,534554,626646,602818,626184,696318,669652,700668
Feb,552139,528376,586868,572401,612946,686304,608797,686178,729915
Mar,631472,687345,737164,757244,758811,829374,863448,950291,953556
Apr,589780,618408,605614,670221,695965,739805,789348,774528,811810
May,658311,672397,697209,730861,789999,797697,814350,872265,844277
Jun,762285,785014,870655,896499,868634,879990,917203,955818,1048217
Jul,592057,567318,587318,606034,630628,645430,683886,719883,738793
Aug,537969,573261,639716,617255,674736,629682,731503,780692,814768
Sep,624327,615076,697010,713298,728534,779910,776151,809744,848901
Oct,615413,660133,717544,757472,715482,759911,785667,832460,875060


In [16]:
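import matplotlib.ticker as mticker

fig, ax = plt.subplots()
# One line per year, with month abbreviations (not 0-11) on the x-axis
ax.plot(df3.index.astype(str), df3.to_numpy(), '-o')

ax.set_xlabel('Month')
ax.set_ylabel('Residents Travelling Overseas (Millions)')

ax.legend(list(df3.columns), title='Year')
# Show y-axis ticks in millions instead of raw counts
ax.yaxis.set_major_formatter(
    mticker.FuncFormatter(lambda x, pos: f'{x / 1e6:.1f}'))

ax.set_title('Residents Tend to Travel More During School Holidays')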


Text(0.5, 1.0, 'Residents Tend to Travel More During School Holidays')
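To back the chart title with a number, one option is to compare the average monthly total in the school-holiday months named under Datasets (March, June, December) with the average for the remaining months. The sketch below does this for 2019 only, using the 2019 totals from the tables above. `totals_2019`, `holiday_months`, and the other names are illustrative, and treating only those three months as holiday months is an assumption taken from the note under Datasets.

```python
import pandas as pd

# 2019 monthly totals, copied from the tables above.
totals_2019 = pd.Series({
    "Jan": 700668, "Feb": 729915, "Mar": 953556, "Apr": 811810,
    "May": 844277, "Jun": 1048217, "Jul": 738793, "Aug": 814768,
    "Sep": 848901, "Oct": 875060, "Nov": 1072972, "Dec": 1271780,
})

# Holiday months as stated under Datasets (an assumption for this sketch).
holiday_months = ["Mar", "Jun", "Dec"]
is_holiday = totals_2019.index.isin(holiday_months)

holiday_mean = totals_2019[is_holiday].mean()
other_mean = totals_2019[~is_holiday].mean()

print(f"Holiday months average: {holiday_mean:,.0f}")
print(f"Other months average:   {other_mean:,.0f}")
print(f"Ratio: {holiday_mean / other_mean:.2f}")
```

The same comparison could be repeated per year to check whether the pattern holds across 2011 to 2019.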