# Tourism

-----

### Requirements

Extract the data from tab 2 and tab 4.

#### Observations & Dimensions

For `observations` we want the `purpose` data; we **don't** want the `all visits` data.


The required dimensions are:

* **Geography** - it's all UK level data (the code for UK is "K02000001")
* **Time** - in the format MMM YYYY
* **Purpose** - one of Holiday, Business, Visiting friends or relatives, Miscellaneous
* **Direction of Travel** - either "overseas visits to the uk", or "uk visits abroad"
* **Units** - constant value of '1000'
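
The dimension rules above can be turned into a quick row check. The following is a minimal sketch, not part of the original transform: `check_row` and its constants are hypothetical names, and it assumes each output row is available as a plain dict keyed by the dimension names listed above. Direction is compared case-insensitively because the casing of the final labels may differ from the requirement text.

```python
import re

# Allowed values, taken from the requirements list above.
PURPOSES = {"Holiday", "Business", "Visiting friends or relatives", "Miscellaneous"}
DIRECTIONS = {"overseas visits to the uk", "uk visits abroad"}
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"
TIME_PATTERN = re.compile(rf"^(?:{MONTHS}) \d{{4}}$")


def check_row(row):
    """Return a list of problems found in one output row (hypothetical helper)."""
    problems = []
    if row.get("Geography") != "K02000001":
        problems.append("Geography must be K02000001")
    if not TIME_PATTERN.match(str(row.get("Time", ""))):
        problems.append("Time must look like 'Jan 2015'")
    if row.get("Purpose") not in PURPOSES:
        problems.append("unexpected Purpose")
    # Case-insensitive comparison: label casing is an assumption, not a requirement.
    if str(row.get("Direction", "")).lower() not in DIRECTIONS:
        problems.append("unexpected Direction")
    return problems
```

An empty list means the row satisfies every rule checked here; the Units value is deliberately left unchecked.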

-----
Notes:

* We don't want the data markings against the 2019 dates
* We don't want the ad hoc summary data at the bottom ("latest three months..." etc.)
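
One way to handle the first note is to strip the marking characters before building the time value. This is a minimal sketch that assumes the markings are superscript digits attached to the month label (as in `May¹`); `strip_marking` is a hypothetical helper, not something the source workbook or databaker provides.

```python
import re

# Superscript 1-3 are U+00B9, U+00B2, U+00B3; superscript 0 is U+2070
# and superscript 4-9 are U+2074..U+2079.
SUPERSCRIPTS = re.compile("[\u00b9\u00b2\u00b3\u2070\u2074-\u2079]+")


def strip_marking(label):
    """Remove superscript footnote markers from a cell label."""
    return SUPERSCRIPTS.sub("", label).strip()
```

For example, `strip_marking("June¹")` gives `"June"`, and labels without a marking pass through unchanged.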

In [13]:
%cd mock-transformations/

[Errno 2] No such file or directory: 'mock-transformations/'
/workspace/mock-transformations


In [14]:
from databaker.framework import *
import pandas as pd

tabs = loadxlstabs("./sources/tourism.xls") # load tabs

Loading ./sources/tourism.xls which has size 180736 bytes
Table names: ['Index', 'Table 1', 'Table 2', 'Table 3', 'Table 4', 'Table 5', 'Table 6']


In [15]:
work = []

In [16]:
for tab in tabs:
    if tab.name not in ["Table 2", "Table 4"]:
        continue
    else:
        print("😍 " + tab.name)

    # Direction of Travel comes from the table title in cell B1
    direction = tab.excel_ref("B1")
    # Geography is a constant: all the data is UK level
    geography = "K02000001"
    # Units are a constant: the counts are in thousands
    units = "thousands"
    # Time (MMM YYYY) is built later in pandas from separate Year and Month columns
    # Year labels are in column A, looked up as CLOSEST UP
    year = tab.excel_ref("A7").expand(DOWN).is_not_blank()
    # Month labels are in column B, looked up as DIRECTLY LEFT
    month = tab.excel_ref("B7").expand(DOWN).is_not_blank()
    # Purpose headers are in row 4, looked up as DIRECTLY ABOVE
    purpose = tab.excel_ref("G4").expand(RIGHT).is_not_blank()
    # Observations are the cells where a month row meets a purpose column
    observations = month.waffle(purpose)

    dimensions = [HDimConst("Geography", geography),
                  HDimConst("Units", units),
                  HDimConst("Direction", direction.value),
                  HDim(year, "Year", CLOSEST, UP),
                  HDim(month, "Month", DIRECTLY, LEFT),
                  HDim(purpose, "Purpose", DIRECTLY, ABOVE)]

    df = ConversionSegment(tab, dimensions, observations).topandas()

    work.append(df)

😍 Table 2

😍 Table 4
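
The two lookup styles used for `Year` and `Month` behave differently, and a toy model may help. This is only an illustration in plain Python with made-up row numbers, not databaker's implementation: a `DIRECTLY` lookup needs a header on exactly the same row, while a `CLOSEST` lookup is assumed here to walk back to the nearest header at or above the observation's row.

```python
from bisect import bisect_right

# Hypothetical layout: the year appears only on the first row of each block,
# while every data row carries its own month label.
year_rows = {7: "2015", 19: "2016"}
month_rows = {7: "Jan", 8: "Feb", 19: "Jan", 20: "Feb"}


def closest_above(headers, row):
    """Return the header at the greatest row number <= row, or None."""
    rows = sorted(headers)
    i = bisect_right(rows, row)
    return headers[rows[i - 1]] if i else None


def directly(headers, row):
    """Return the header on exactly this row, or None."""
    return headers.get(row)
```

In this toy layout, row 8 has no year label of its own, so `directly(year_rows, 8)` finds nothing while `closest_above(year_rows, 8)` falls back to the label on row 7. That is the reason the year is looked up with `CLOSEST` and the month with `DIRECTLY`.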



In [17]:
output = pd.concat(work)

In [18]:
output

| | OBS | Geography | Units | Direction | Year | Month | Purpose |
|---|---|---|---|---|---|---|---|
| 0 | 662.512925 | K02000001 | thousands | Purpose of overseas residents' visits to the U... | 2015.0 | Jan | Holiday |
| 1 | 688.776441 | K02000001 | thousands | Purpose of overseas residents' visits to the U... | 2015.0 | Jan | Business |
| 2 | 899.146350 | K02000001 | thousands | Purpose of overseas residents' visits to the U... | 2015.0 | Jan | Visiting friends or relatives |
| 3 | 168.449570 | K02000001 | thousands | Purpose of overseas residents' visits to the U... | 2015.0 | Jan | Miscellaneous |
| 4 | 639.824439 | K02000001 | thousands | Purpose of overseas residents' visits to the U... | 2015.0 | Feb | Holiday |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 211 | 120.000000 | K02000001 | thousands | Purpose of UK residents' visits abroad by month | 2019.0 | May¹ | Miscellaneous |
| 212 | 4850.000000 | K02000001 | thousands | Purpose of UK residents' visits abroad by month | 2019.0 | June¹ | Holiday |
| 213 | 610.000000 | K02000001 | thousands | Purpose of UK residents' visits abroad by month | 2019.0 | June¹ | Business |
| 214 | 1240.000000 | K02000001 | thousands | Purpose of UK residents' visits abroad by month | 2019.0 | June¹ | Visiting friends or relatives |


In [19]:
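# Assign the result back rather than calling replace(..., inplace=True) on a
# column selection, which newer pandas versions warn about or ignore.
output["Direction"] = output["Direction"].replace({
    "Purpose of overseas residents' visits to the UK by month": "Overseas visits to the UK",
    "Purpose of UK residents' visits abroad by month": "UK visits abroad",
})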
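
`replace` leaves any value it does not recognise untouched, so a reworded table title would pass through silently. A stricter alternative is sketched below, assuming the two titles shown in the cell above are the only ones expected; `normalise_direction` and `DIRECTION_LABELS` are hypothetical names.

```python
DIRECTION_LABELS = {
    "Purpose of overseas residents' visits to the UK by month": "Overseas visits to the UK",
    "Purpose of UK residents' visits abroad by month": "UK visits abroad",
}


def normalise_direction(title):
    """Map a table title to its Direction label, failing loudly if unknown."""
    # Collapse stray whitespace that may come from the spreadsheet cell.
    key = " ".join(title.split())
    try:
        return DIRECTION_LABELS[key]
    except KeyError:
        raise ValueError(f"Unrecognised table title: {title!r}") from None
```

If this approach suits the data, it could be applied with something like `output["Direction"].map(normalise_direction)`.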

In [28]:
# Time dimension
output["Month"] = output["Month"].str[:3]
output["Year"] = output["Year"].astype(float).astype(int).astype(str)
output["Time"] = output["Month"] + " " + output["Year"]
output.drop(columns=["Month", "Year"], inplace=True)
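
Because `Time` is now a string, it sorts alphabetically (`Apr 2015` before `Jan 2015`). If chronological order matters downstream, the label can be parsed back into a date. This is a minimal sketch using only the standard library; `time_key` is a hypothetical helper, and `%b` assumes English month abbreviations in the active locale.

```python
from datetime import datetime


def time_key(label):
    """Parse a 'MMM YYYY' label into a datetime usable as a sort key."""
    return datetime.strptime(label, "%b %Y")


labels = ["Jun 2019", "Jan 2015", "Feb 2015", "Apr 2015"]
ordered = sorted(labels, key=time_key)
```

A label that does not match the `MMM YYYY` pattern raises `ValueError`, so the same helper doubles as a format check.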

In [29]:
output

| | OBS | Geography | Units | Direction | Purpose | Time |
|---|---|---|---|---|---|---|
| 0 | 662.512925 | K02000001 | thousands | Overseas visits to the UK | Holiday | Jan 2015 |
| 1 | 688.776441 | K02000001 | thousands | Overseas visits to the UK | Business | Jan 2015 |
| 2 | 899.146350 | K02000001 | thousands | Overseas visits to the UK | Visiting friends or relatives | Jan 2015 |
| 3 | 168.449570 | K02000001 | thousands | Overseas visits to the UK | Miscellaneous | Jan 2015 |
| 4 | 639.824439 | K02000001 | thousands | Overseas visits to the UK | Holiday | Feb 2015 |
| ... | ... | ... | ... | ... | ... | ... |
| 211 | 120.000000 | K02000001 | thousands | UK visits abroad | Miscellaneous | May 2019 |
| 212 | 4850.000000 | K02000001 | thousands | UK visits abroad | Holiday | Jun 2019 |
| 213 | 610.000000 | K02000001 | thousands | UK visits abroad | Business | Jun 2019 |
| 214 | 1240.000000 | K02000001 | thousands | UK visits abroad | Visiting friends or relatives | Jun 2019 |
