
Thibico/malaysia-election-data-scraping


Malaysian Elections Data Repository

This repository contains data scraped from the following sources:

  • The Star's election dashboard
  • Malaysiakini's undi.info site
  • The dashboard.spr.gov.my site

Cleaned datasets and code books

You can access the CSV files of the cleaned datasets in the cleaned subfolder under the data folder.

The cleaned data is separated into one folder for the data from The Star and one for the data from undi.info.

Each source's folder contains the following files:

  • results: contains data about the votes obtained by each candidate
  • constituency_info: contains data about each constituency
  • CODEBOOK: contains metadata about the variables in each data file
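
As an illustration of how a results file might be used, the sketch below reads candidate-level rows and picks the candidate with the most votes in each constituency. The column names (`constituency`, `candidate`, `party`, `votes`) and the sample rows are hypothetical; check the CODEBOOK in each source's folder for the actual variable names before adapting it.

```python
import csv
import io

# Hypothetical sample standing in for a results CSV file. The real column
# names are documented in each source's CODEBOOK and may differ.
SAMPLE_RESULTS = """constituency,candidate,party,votes
P001,Candidate A,Party X,12000
P001,Candidate B,Party Y,9500
P002,Candidate C,Party X,4000
P002,Candidate D,Party Z,7100
"""


def winners_by_constituency(csv_file):
    """Return {constituency: row} for the candidate with the most votes."""
    winners = {}
    for row in csv.DictReader(csv_file):
        row["votes"] = int(row["votes"])
        best = winners.get(row["constituency"])
        if best is None or row["votes"] > best["votes"]:
            winners[row["constituency"]] = row
    return winners


winners = winners_by_constituency(io.StringIO(SAMPLE_RESULTS))
for seat, row in sorted(winners.items()):
    print(seat, row["candidate"], row["votes"])
```

To run this against a real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample.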

Methodology, code and raw data

The data was scraped using Python scripts contained in the following Jupyter notebooks:

SVG files of the constituency boundaries available from The Star were also scraped (the code is in the same notebook), and the raw SVG files are in the svgs folder.
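
For readers who want to work with the raw SVG files, here is a minimal sketch of pulling boundary outlines out of an SVG document with Python's standard library. The sample markup and the idea that each `<path>` carries a constituency code in its `id` attribute are assumptions made for illustration; the files in the svgs folder may be structured differently.

```python
import xml.etree.ElementTree as ET

# SVG elements live in this XML namespace, so tag names must be qualified.
SVG_NS = "{http://www.w3.org/2000/svg}"

# Hypothetical sample; the real files may use other ids or nesting.
SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <path id="P001" d="M0 0 L10 0 L10 10 Z"/>
  <g>
    <path id="P002" d="M20 20 L30 20 L30 30 Z"/>
  </g>
</svg>"""


def constituency_paths(svg_text):
    """Map each path's id to its outline data (the 'd' attribute)."""
    root = ET.fromstring(svg_text)
    return {
        path.get("id"): path.get("d")
        for path in root.iter(SVG_NS + "path")  # iter() also walks nested groups
        if path.get("id")
    }


paths = constituency_paths(SAMPLE_SVG)
for code, outline in sorted(paths.items()):
    print(code, outline)
```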

Scraping and cleaning data from The Star's election dashboard

Data from The Star was scraped from the HTML pages, then parsed and cleaned using code from this notebook. Raw data in CSV files can be found here.
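
The general shape of parsing result rows out of an HTML page can be sketched with the standard library's `html.parser`. This is not the notebook's code: the table markup, the column order, and the names `TableRowParser` and `parse_results` are all hypothetical, and the real pages will need their own selectors.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a results page.
SAMPLE_HTML = """
<table class="results">
  <tr><td>Candidate A</td><td>Party X</td><td>12,000</td></tr>
  <tr><td>Candidate B</td><td>Party Y</td><td>9,500</td></tr>
</table>
"""


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_results(html):
    """Turn table rows into dicts and clean the vote counts into integers."""
    parser = TableRowParser()
    parser.feed(html)
    return [
        {"candidate": name, "party": party, "votes": int(votes.replace(",", ""))}
        for name, party, votes in parser.rows
    ]


rows = parse_results(SAMPLE_HTML)
for row in rows:
    print(row)
```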

Data from The Star was available for the 2018 general election (GE14), covering both the parliamentary and state elections.

Scraping and cleaning data from Malaysiakini's undi.info site

Data from undi.info was scraped from the website's API using code from this notebook. Raw data in JSON files can be found here.
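
Turning nested JSON into flat, CSV-ready rows might look like the sketch below. The payload layout (`election`, `seats`, `candidates`) is invented for illustration and is not the actual undi.info API response; inspect the raw JSON files for the real structure.

```python
import json

# Hypothetical payload; the real API response is structured differently.
SAMPLE_JSON = """
{
  "election": "GE14",
  "seats": [
    {"code": "P001", "candidates": [
      {"name": "Candidate A", "party": "Party X", "votes": 12000},
      {"name": "Candidate B", "party": "Party Y", "votes": 9500}
    ]},
    {"code": "P002", "candidates": [
      {"name": "Candidate C", "party": "Party Z", "votes": 7100}
    ]}
  ]
}
"""


def flatten_results(payload):
    """Produce one flat row per candidate from the nested seat structure."""
    rows = []
    for seat in payload["seats"]:
        for candidate in seat["candidates"]:
            rows.append({
                "election": payload["election"],
                "constituency": seat["code"],
                "candidate": candidate["name"],
                "party": candidate["party"],
                "votes": candidate["votes"],
            })
    return rows


flat_rows = flatten_results(json.loads(SAMPLE_JSON))
for row in flat_rows:
    print(row)
```

Rows in this shape can be written out with `csv.DictWriter`, using the dictionary keys as the header.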

Data from undi.info was available for the 2004, 2008, 2013, and 2018 general elections (GE11, GE12, GE13, GE14), covering both the parliamentary and state elections.

Note: undi.info also contains results of state elections in Sabah (2020), Malacca (2021) and Sarawak (2016, 2021) but this data has not been scraped and added to this repo yet.

Scraping and cleaning data from the dashboard.spr.gov.my site

The scripts are written in JavaScript. Use the following commands to run them.

Note: Make sure you have Node.js and Yarn installed on your system.

>> cd scripts
>> yarn install
>> yarn scrape   # scrape the data from the dashboard and wrangle it
>> yarn fetch    # only fetch the data from the dashboard
>> yarn wrangle  # only wrangle the data

Raw data from the dashboard will be stored here and cleaned data will be stored here

About

This repo is part of a collaborative project by Southeast Asian civic tech groups (including Thibi and Data-N) to provide open data and open-source data visualisations for the Malaysian elections.

We plan to add data for the 2022 General Elections (GE15) as soon as possible after the official results are announced.

All the data used for the data visualisations will be available in this repo and we will be sharing the website where you can find the data visualisations very soon.
