
Don't forget to hit the ⭐ if you like this repo.

About Us

The information in this GitHub repository is part of the materials for the course High Performance Data Processing (SECP3133). This folder contains general big data information as well as big data case studies using Malaysian datasets. The case studies were created by students of the Bachelor of Computer Science (Data Engineering) programme at Universiti Teknologi Malaysia.

Case Study

Contents:

🌟 Web Scraping

| Team | Library | Website | GitHub |
| --- | --- | --- | --- |
| Group 10 | Beautiful Soup | StudyMalaysia.com | Open in GitHub |
| High Five | Beautiful Soup | EduSpiral Consultant Services | Open in GitHub |
| QwQ | Beautiful Soup | States and federal territories of Malaysia | Open in GitHub |
| SDS | Scrapy | Book Depository | Open in GitHub |
| BigMac | Scrapy | CompAsia.com | Open in GitHub |
| SIX | Scrapy | bukukita.com | Open in GitHub |
| AdMiPeQa | Selenium | Lazada | Open in GitHub |
| SamVerse | Selenium | Malaysia General Election (GE-15) | Open in GitHub |
| Group 9 | Selenium | Lazada Shopee | Open in GitHub |
| No Name | Requests | Puma: sneakers | Open in GitHub |
| Quad | lxml | Jobstreet.com | Open in GitHub |

🌟 Exploratory Data Analysis

| Team | Title | Colab | GitHub |
| --- | --- | --- | --- |
| 404 Error | Property in Kuala Lumpur | Open in Colab | Open in GitHub |
| Alrite | The Exportation of Plantation in Sarawak | Open in Colab | Open in GitHub |
| BEFE | COVID-19 Clusters in Malaysia | Open in Colab | Open in GitHub |
| Boboiboy | Property Listings in Kuala Lumpur | Open in Colab | Open in GitHub |
| COLBY | Malaysia GE-14 Result | Open in Colab | Open in GitHub |
| FANTOM | Daily recorded COVID-19 cases at state level in Malaysia | Open in Colab | Open in GitHub |
| HAHA | Foreign Direct Investment in Malaysia | Open in Colab | Open in GitHub |
| HD | Guna Tanah (Land Use) Tampin 2021 | Open in Colab | Open in GitHub |
| KIA | Malaysia State Election 2018 | Open in Colab | Open in GitHub |
| LAB | Malaysia Air Pollution Analysis | Open in Colab | Open in GitHub |
| MAAM | Malaysia Hospital Patient Movement Analysis | Open in Colab | Open in GitHub |
| MEOW | Capacity and utilisation of Intensive Care Unit (ICU) beds during COVID-19 | Open in Colab | Open in GitHub |
| MM | Malaysia's 14th State Election Result | Open in Colab | Open in GitHub |
| PIXALATED | Number of deaths in Malaysia from 2001 to 2018 | Open in Colab | Open in GitHub |
| POTATO | Death by state, sex and age group Malaysia 2001-2018 | Open in Colab | Open in GitHub |
| QnX | Real Estate Kuala Lumpur Malaysia | Open in Colab | Open in GitHub |
| SAMVERSE | Restaurant Rating in Malaysia | Open in Colab | Open in GitHub |
| SMOL | Population in Malaysia from 2010-2019 | Open in Colab | Open in GitHub |
| SQ | Number of Cases and Incidence Rate of Communicable Disease by State | Open in Colab | Open in GitHub |
| TUK | Number of Government School Pupils by District Education Office and State 2017-2018 | Open in Colab | Open in GitHub |
| UWU | Property Listings in Kuala Lumpur | Open in Colab | Open in GitHub |

🌟 Pandas - Data Processing

| Team | Title | GitHub |
| --- | --- | --- |
| 404 Error | Sales Analysis | Open in GitHub |
| Alrite | EDA on the NASA JPL Asteroid | Open in GitHub |
| BEFE | Summary of Google Play Store Application | Open in GitHub |
| Boboiboy | Car Sales Data | Open in GitHub |
| COLBY | Banking Loan Credit | Open in GitHub |
| FANTOM | Google Playstore App | Open in GitHub |
| HAHA | Car sales in Russia by region | Open in GitHub |
| HD | New York Yellow Taxi Trip Data 2016-03 | Open in GitHub |
| KIA | US Road Construction and Closures 2016-2021 Analysis | Open in GitHub |
| LAB | Fraudulent Transaction Analysis and Prediction | Open in GitHub |
| MAAM | US Accidents (2016 - 2021) Analysis | Open in GitHub |
| MEOW | Apple AppStore App Data | Open in GitHub |
| MM | 2015 Flight Delays and Cancellations | Open in GitHub |
| PIXALATED | Google Playstore Application Summary | Open in GitHub |
| POTATO | Flight Delays and Cancellations in 2015 | Open in GitHub |
| QnX | Trump vs Biden on Twitter | Open in GitHub |
| SAMVERSE | Google Playstore Management | Open in GitHub |
| SMOL | USA House Listing | Open in GitHub |
| SQ | Online Payment Fraud Detection | Open in GitHub |
| TUK | Fraud Detection in Online Payment | Open in GitHub |
| UWU | Airline Delay 2017 | Open in GitHub |

🌟 Alternatives to Pandas for Processing Large Datasets

| Team | Library | Title | GitHub |
| --- | --- | --- | --- |
| AdMiPeQa | DataTable | Health Insurance Marketplace | Open in GitHub |
| QwQ | Polars | Health Insurance Marketplace | Open in GitHub |
| BigMac | Vaex | Health Insurance Marketplace | Open in GitHub |
| Sepuluh | PySpark | Health Insurance Marketplace | Open in GitHub |
| High Five | Koalas | Health Insurance Marketplace | Open in GitHub |
| SIX | cuDF | Health Insurance Marketplace | Open in GitHub |
| No name | DataTable | Health Insurance Marketplace | Open in GitHub |
| QUAD | Polars | NYC yellow taxi trip data | Open in GitHub |
| Rojak | Vaex | Health Insurance Marketplace | Open in GitHub |
| SamVerse | PySpark | 1000000 Sales Records | Open in GitHub |
| SDS | Koalas | NYC yellow taxi trip data | Open in GitHub |

🌟 Processing Large Datasets: Library Comparison

| Team | Library | Title | GitHub |
| --- | --- | --- | --- |
| AdMiPeQa | Pandas vs DataTable | Health Insurance Marketplace | Open in GitHub |
| QwQ | Pandas vs Polars | Health Insurance Marketplace | Open in GitHub |
| BigMac | Pandas vs Vaex | Health Insurance Marketplace | Open in GitHub |
| SamVerse | Pandas vs PySpark | 1000000 Sales Records | Open in GitHub |
| High Five | Pandas vs Koalas | Health Insurance Marketplace | Open in GitHub |
| SIX | Pandas vs cuDF | Health Insurance Marketplace | Open in GitHub |
| No name | Pandas vs DataTable | Health Insurance Marketplace | Open in GitHub |
| QUAD | Pandas vs Polars | NYC Yellow Taxi Trip | Open in GitHub |
| Rojak | Pandas vs Vaex | Health Insurance Marketplace | Open in GitHub |
| Sepuluh | Pandas vs PySpark | Health Insurance Marketplace | Open in GitHub |
| SDS | Pandas vs Koalas | NYC Yellow Taxi Trip | Open in GitHub |

🌟 Project

| Team | Library 1 | Library 2 | Library 3 | Dataset | Open in GitHub |
| --- | --- | --- | --- | --- | --- |
| AdMiPeQa | Pandas | Dask | Koalas | Air Flight Analysis | :octocat: |
| BigMac | Vaex | Koalas | PySpark | Airline Delay and Cancellation Data 2016 - 2018 | :octocat: |
| No Name | Pandas | PySpark | Koalas | Amazon Book Review | :octocat: |
| QUAD | Polars | Koalas | DataTable | NYC yellow taxi trip data | :octocat: |
| QwQ | Koalas | PySpark | Dask | NYC Automated Traffic Volume Counts | :octocat: |
| Rojak | Pandas | Vaex | Koalas | 15 Million Chess Games from Lichess (2013-2014) | :octocat: |
| SDS | Pandas | Polars | Koalas | Analysis of Amazon Books Review | :octocat: |
| SIX | Dask | PySpark | Koalas | NYC Parking Tickets | :octocat: |
| SamVerse | Pandas | PySpark | Koalas | Spotify Charts | :octocat: |
| Sepuluh | PySpark | Polars | Pandas | Airline Delay and Cancellation Data 2017 - 2018 | :octocat: |
| High Five | Pandas | Koalas | Modin | Airline Delay and Cancellation Data 2015 - 2016 | :octocat: |

Contribution 🛠️

Please create an Issue to suggest improvements or report errors in the content.

You can also contact me on LinkedIn with any other queries or feedback.
