This repository contains data that has been scraped from the following sources:
- Malaysian newspaper The Star's election dashboard https://election.thestar.com.my/
- Malaysian newspaper Malaysiakini's election dashboard https://undi.info/
You can access the CSV files of the cleaned datasets in the cleaned subfolder under the data folder.
The cleaned data is separated into one folder for the data from The Star and one for undi.info.
In each source's folder, there are files for
- results: which contains data about the votes obtained by each candidate
- constituency_info: which contains data about each constituency
- CODEBOOK: which contains metadata about the variables in each data file
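As a minimal sketch of reading one of the cleaned files, the snippet below uses only Python's standard `csv` module. The function name `load_rows` and the path in the usage comment are assumptions for illustration; consult the CODEBOOK in each folder for the actual file and column names.

```python
import csv
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a list of dicts keyed by the header row."""
    with Path(csv_path).open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


# Hypothetical usage; the exact path depends on the repo layout:
# rows = load_rows("data/cleaned/<source>/<results file>.csv")
# print(len(rows), "rows; columns:", list(rows[0].keys()))
```

All values are returned as strings, so vote counts would need to be converted with `int()` before doing arithmetic.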
The data was scraped using Python scripts contained in the following Jupyter notebooks:
SVG files of the constituency boundaries available from The Star were also scraped (code is in the same notebook), and the raw SVG files are in the svgs folder.
Scraping and cleaning data from The Star's election dashboard
Data from The Star was scraped from the HTML pages, then parsed and cleaned using code from this notebook. Raw data in CSV files can be found here.
Data from The Star was available for the 2018 general elections (GE14), covering both the parliamentary and the state elections.
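The general idea of parsing results out of an HTML page can be sketched as follows. This is not the notebook's actual code: it uses Python's standard `html.parser` rather than whichever library the notebook relies on, and it assumes the results sit in a plain HTML table, which may not match The Star's real markup. The names `TableRowParser` and `parse_table_rows` are hypothetical.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table_rows(html):
    """Return the table rows found in an HTML string as lists of cell text."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

The returned rows could then be written out with `csv.writer` to produce raw CSV files before cleaning.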
Scraping and cleaning data from Malaysiakini's undi.info site
Data from undi.info was scraped from the website's API using code from this notebook. Raw data in JSON files can be found here.
Data from undi.info was available for the 2004, 2008, 2013, and 2018 general elections (GE11, GE12, GE13, GE14) for the parliamentary elections and the state elections.
Note: undi.info also contains results of state elections in Sabah (2020), Malacca (2021) and Sarawak (2016, 2021) but this data has not been scraped and added to this repo yet.
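The API-based approach used for undi.info can be sketched as fetching a JSON response and saving it unchanged before any cleaning. This is an illustration under assumptions, not the notebook's code: the helper names `fetch_json` and `save_raw`, the output path, and the placeholder URL are all hypothetical, and no real undi.info endpoint is shown.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def save_raw(payload, out_path):
    """Write the payload unchanged to disk so cleaning can be re-run offline."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(
        json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return out_path


# Hypothetical usage; API_URL is a placeholder, not a real undi.info endpoint:
# API_URL = "https://example.invalid/api/results"
# save_raw(fetch_json(API_URL), "raw/results.json")
```

Keeping the raw JSON separate from the cleaned CSV files means the cleaning step can be revised without hitting the site again.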
Scraping and cleaning data from dashboard.spr.gov.my site
The scripts are written in JavaScript. Use the following instructions to run them.
Note: Make sure you have Node.js installed on your system.
>> cd scripts
>> yarn install
>> yarn scrape // this will fetch the data from the dashboard and then wrangle it
>> yarn fetch // this will fetch the data from the dashboard
>> yarn wrangle // this will wrangle the data
Raw data from the dashboard will be stored here and cleaned data will be stored here.
This repo is part of a collaborative project by Southeast Asian civic tech groups (including Thibi and Data-N) to provide open data and open-source data visualisations for the Malaysian elections.
We plan to add data for the 2022 General Elections (GE15) as soon as possible after the official results are announced.
All the data used for the data visualisations will be available in this repo and we will be sharing the website where you can find the data visualisations very soon.