# Purpose
Knowledge of trade event data in a specific market can be very helpful. Uses include:
1. Demand and capacity planning. Certain industries (eg: travel, hospitality, F&B) would benefit from knowing when demand for their services would peak, and alter their capacity and pricing accordingly.
2. Comparison against same event in the past. It is troublesome to search the internet for the exact dates of an biennial-recurring event (eg: Singapore Air Show). Looking up the exacts dates in a searchable database would be more convenient.


## Notes
- If ever need to update records, the combined key of "title + dt_from" should be unique.

## Helpful links
- https://stackoverflow.com/questions/27835619/urllib-and-ssl-certificate-verify-failed-error
- https://stackoverflow.com/questions/27652543/how-to-use-python-requests-to-fake-a-browser-visit
- https://stackoverflow.com/questions/24330230/python-how-to-crawl-past-viewstate
- https://blog.scrapinghub.com/2016/04/20/scrapy-tips-from-the-pros-april-2016-edition


In [1]:
import pandas as pd
from pandas import DataFrame, Series
import datetime as dt
import requests
from bs4 import BeautifulSoup

# pandas options
pd.set_option('display.max_columns', None)  # Shows all columns in DataFrames. See http://pandas.pydata.org/pandas-docs/stable/options.html
pd.set_option('display.max_rows', None) # Shows all rows in DataFrames.
pd.set_option('display.width', 5000)
pd.set_option('display.multi_sparse', False)  #  Display every cell (for multi-level index).
pd.set_option('display.max_colwidth', -1)  # Display full contents of each column.


In [19]:
def get_event_data(url):
    """ Given the URL to scrape, return a DataFrame of the data. Code is tailored to specific URLs.
    """
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36'}
    res = requests.get(url, verify=False, headers=headers)  # Don't verify the SSL cert from the server.

    bs = BeautifulSoup(res.text)
    el_article = bs.find(name='article', attrs={'class': 'article listing'})

    df_all = DataFrame(columns=['title', 'dt_from', 'dt_to', 'location', 'url', 'desc'])

    for section in el_article.findAll("section"):
        a = section.find(name='a')
        str_title = a.contents[0]  # The text in the anchor tag
        str_url = a['href']  # The href attribute of the anchor tag
        str_desc = section.find(name='p').contents[0].strip()
        str_location = section.find(name='span', attrs={'class': 'location-label'}).contents[0].strip()

        # Find the event start date and end date.
        s = section.find(name='span', attrs={'class': 'timestamp'}).contents[0].strip()
        l = s.split(' - ')
        if len(l) > 1:
            dt_from = pd.to_datetime(l[0])
            dt_to = pd.to_datetime(l[1])
        else:  # Only 1 date, is not a range.
            dt_from = pd.to_datetime(l[0])
            dt_to = pd.to_datetime(l[0])

        df_all = df_all.append(DataFrame({'title':[str_title], 'dt_from':[dt_from], 'dt_to':[dt_to], 'location':[str_location], 'url':[str_url], 'desc':[str_desc]}), ignore_index=True)
    
    df_all['title'] = df_all['title'].str.upper()  # Uppercase event titles, for ease of searching later.
    df_all['snapshot_dt'] = dt.datetime.now()  # For recording when data was copied from data source.
    return df_all

In [None]:
str_url_past = 'https://ie.enterprisesg.gov.sg/Events/Past-Events?mk=2CADECAC544E47AAB230EAC334F3ACC3&sa=1'  # Past events
str_url_upcoming = 'https://ie.enterprisesg.gov.sg/Events/Upcoming-Events?mk=2CADECAC544E47AAB230EAC334F3ACC3&sa=1'  # sa => 'show all'; mk => market (SG)

df_past = get_event_data(str_url_past)
df_upcoming = get_event_data(str_url_upcoming)
df_all = pd.concat([df_past, df_upcoming], ignore_index=True)
df_all = df_all.sort_values(by=['dt_from']).reset_index(drop=True)

# Examining the dataset
- Observe that there's a URL that you can navigate to for more info.
- The dataset only goes back to Mar 2016. And is only for events happening in Singapore (as intended).
- Upcoming events are also present.

In [21]:
df_all

Unnamed: 0,title,dt_from,dt_to,location,url,desc,snapshot_dt
0,SWEETS & BAKES ASIA 2016,2016-03-03,2016-03-04,Singapore,https://gems.gevme.com/sweets-bakes-asia-2016,,2019-03-24 12:15:06.915008
1,FURNIPRO 2016,2016-03-10,2016-03-11,Singapore,https://gems.gevme.com/furnipro-2016,"Going into its third edition, furniPRO Asia continues to play a strategic role for exhibitors in the woodworking and furniture production industry by facilitating their development and outreach into highly dynamic ASEAN region. Riding it on the wave, furniPRO Asia has now changed its dates to 10 – 12 March 2016 which will be held in-conjunction wi...",2019-03-24 12:15:06.915008
2,INTERNATIONAL FURNITURE FAIR SINGAPORE (IFFS),2016-03-10,2016-03-12,Singapore,https://gems.gevme.com/international-furniture-fair-singapore-iffs,,2019-03-24 12:15:06.915008
3,ASIA PACIFIC MARITIME (APM) 2016,2016-03-16,2016-03-17,Singapore,https://gems.gevme.com/asia-pacific-maritime-apm-2016,,2019-03-24 12:15:06.915008
4,IDEM SINGAPORE 2016,2016-04-08,2016-04-09,Singapore,https://gems.gevme.com/idem-singapore-2016,,2019-03-24 12:15:06.915008
5,PROWINE ASIA 2016,2016-04-12,2016-04-14,"Singapore, Singapore",https://gems.gevme.com/prowine-asia-2016,"ProWine Asia 2016 will bring together the best practices, credibility and recognition from Wine & Spirits Asia - Asia’s international trade exhibition for wines and spirits, and ProWein - World’s leading international trade fair for wines and spirits. ProWine Asia 2016, the largest trade fair of its kind in Asia, will feature a wide congregation o...",2019-03-24 12:15:06.915008
6,CHILDREN BABY MATERNITY EXPO,2016-04-13,2016-04-14,Singapore,https://gems.gevme.com/children-baby-maternity-expo,"CBME is a leading event in the world in products for Children, Babies and Maternity industry. Organised by UBM, it is expanding its scope and gaining foothold across the world. In line with the growth in the South East Asia and Australasia’s GDP Purchasing Power Parity among the top six markets in this region namely Indonesia, Malaysia, Philippine...",2019-03-24 12:15:06.915008
7,CARDS AND PAYMENT ASIA 2016,2016-04-20,2016-04-20,Singapore,https://gems.gevme.com/cards-and-payment-asia-2016,,2019-03-24 12:15:06.915008
8,SMART-FACILITIES MANAGEMENT SOLUTION EXPO 2016,2016-04-26,2016-04-27,Singapore,https://gems.gevme.com/smart-facilities-management-solution-expo-2016,"Back in its third year, SMART Facilities Management Solutions is the region’s most comprehensive trade event servicing the facilities management industry. SMART FMSE 2016 provides an arena for suppliers, end users and professionals to network, exchange knowledge, share best practices and stay updated on the latest industry needs for future readine...",2019-03-24 12:15:06.915008
9,NXT@COMMUNICASIA 2016,2016-05-31,2016-06-02,Singapore,https://gems.gevme.com/nxt-communicasia-2016,,2019-03-24 12:15:06.915008


# Querying the data

In [33]:
# When has the event FLAsia been held in past years? What about this year?
df_all[df_all['title'].str.contains('FRANCHISING')][['title', 'dt_from', 'dt_to']]

Unnamed: 0,title,dt_from,dt_to
30,FRANCHISING & LICENSING ASIA 2016 (FLASIA 2016),2016-10-13,2016-10-14
79,FRANCHISING & LICENSING ASIA 2017 (FLASIA 2017),2017-10-12,2017-10-14
149,FRANCHISING & LICENSING ASIA 2018 (FLASIA) 2018,2018-10-18,2018-10-20
217,FRANCHISING & LICENSING ASIA 2019 (FLASIA 2019),2019-10-24,2019-10-26


In [35]:
# What events are in the latter half of 2019 (2H19) ?
df_all[(df_all['dt_from'] > '2019-07-01') & (df_all['dt_from'] > '2019-12-31')][['title', 'dt_from', 'dt_to', 'url']]

Unnamed: 0,title,dt_from,dt_to,url
220,SINGAPORE AIRSHOW 2020,2020-02-11,2020-02-16,https://gems.gevme.com/singapore-airshow-2020-25606537
221,RADIOLOGYASIA 2020,2020-02-20,2020-02-21,https://gems.gevme.com/radiologyasia-2020-16462867
222,BEAUTYASIA SINGAPORE 2020,2020-02-24,2020-02-26,https://gems.gevme.com/beautyasia-singapore-2020-17915707
223,FOODSERVICE EQUIPMENT 2020,2020-03-03,2020-03-06,https://gems.gevme.com/foodservice-equipment-2020-23967069
224,SPECIALITY COFFEE & TEA 2020,2020-03-03,2020-03-06,https://gems.gevme.com/specialitycoffeetea2020-52316877
225,INTERNATIONAL FURNITURE FAIR SINGAPORE,2020-03-09,2020-03-12,https://gems.gevme.com/international-furniture-fair-singapore-86217343
226,CAFE ASIA 2020,2020-03-14,2020-03-16,https://gems.gevme.com/cafe-asia-2020-44417768
227,ASIA PACIFIC MARITIME (APM) 2020,2020-03-18,2020-03-20,https://gems.gevme.com/asia-pacific-maritime-apm-2020-14677159
228,LAST MILE FULFULMENT ASIA 2020,2020-03-25,2020-03-26,https://gems.gevme.com/last-mile-fulfulment-asia-2020-82300563
229,IOT ASIA 2020,2020-03-25,2020-03-26,https://gems.gevme.com/iot-asia-2020-93276619


In [30]:
# Which events should I be interested in? I'm in the food business.
df_all[df_all['title'].str.contains('FOOD')][['title', 'dt_from', 'dt_to', 'url']]

Unnamed: 0,title,dt_from,dt_to,url
123,SPECIALITY & FINE FOOD ASIA 2018,2018-07-17,2018-07-19,https://gems.gevme.com/speciality-fine-food-asia-2018-75229096
137,VITAFOODS ASIA 2018,2018-09-11,2018-09-12,https://gems.gevme.com/vitafoods-asia-2018-35312507
170,CPTPP OUTREACH SERIES: BENEFITS FOR THE FOOD PROCESSING SECTOR,2018-11-28,2018-11-28,https://gems.gevme.com/cptpp-outreach-series-benefits-for-the-food-processing-sector-78812548
193,FOOD INDUSTRY: UNVEILING THE HIDDEN GAP THAT IMPEDES YOUR BUSINESS GROWTH,2019-04-09,2019-04-09,https://gems.gevme.com/food-industry-human-capital-workshop
204,SPECIALTY FINE FOOD ASIA 2019,2019-07-17,2019-07-19,https://gems.gevme.com/specialty-fine-food-asia-2019-66417022
223,FOODSERVICE EQUIPMENT 2020,2020-03-03,2020-03-06,https://gems.gevme.com/foodservice-equipment-2020-23967069
231,FOODTECH 2020,2020-03-31,2020-04-03,https://gems.gevme.com/foodtech-2020-55473776


# Saving the data

In [22]:
# Export to CSV
df_all.to_csv('C:/1/trade_events.csv', index=False)