xiaodaigh/README.md

About Me

Hi, my name is ZJ. I love contributing to the open-source data science community. I mainly code in R and Julia with a dash of Python and a minimal amount of Scala thrown in.

Projects

| Project | Language | Notes |
|---|---|---|
| {disk.frame} | R | {disk.frame} (https://diskframe.com) is the most popular larger-than-RAM data manipulation framework in R. |
| SortingLab.jl | Julia | Some of the fastest sorting algorithms in Julia, including faster String sorting and sortperm (i.e. R's order) algorithms. |
| JDF.jl | Julia | A fast DataFrames serialization format and package. |
| ShortStrings.jl | Julia | A package that uses integer bits types to store strings more efficiently. Great for sorting and group-by operations. It has been handed over to the JuliaString org. Note: you should probably use InlineStrings.jl instead. |
| TableScraper.jl | Julia | A simple scraper for well-formed tables from webpages. |
| PkgVersionHelper.jl | Julia | A one-function package, upcheck(), for checking whether the packages in your Project.toml are at their latest versions. |
| DataConvenience.jl | Julia | Convenience functions for data manipulation and other data-related tasks. |
| CuCountMap | Julia | A fast CUDA.jl-based countmap for small types, e.g. UInt8. |
| Parquet writer in Julia | Julia | I wrote the Parquet writer in pure Julia; it was contributed back to Parquet.jl. |
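The idea behind ShortStrings.jl, storing a short string inside an integer so that sorting and grouping compare machine integers instead of strings, can be illustrated in a few lines. This is a minimal Python sketch under my own assumptions (a fixed 8-byte width, a big-endian zero-padded layout, no NUL bytes in the string); the helpers pack_short_string and unpack_short_string are hypothetical and this is not ShortStrings.jl's actual memory layout.

```python
def pack_short_string(s: str, width: int = 8) -> int:
    """Pack a short string into an unsigned integer (hypothetical helper).

    Assumed layout: UTF-8 bytes, big-endian, right-padded with zero bytes.
    Integer order then matches lexicographic byte order, as long as the
    string fits in `width` bytes and contains no NUL bytes.
    """
    data = s.encode("utf-8")
    if len(data) > width or b"\x00" in data:
        raise ValueError("string too long or contains a NUL byte")
    return int.from_bytes(data.ljust(width, b"\x00"), "big")


def unpack_short_string(x: int, width: int = 8) -> str:
    """Inverse of pack_short_string: strip the zero padding and decode."""
    return x.to_bytes(width, "big").rstrip(b"\x00").decode("utf-8")


words = ["pear", "apple", "fig", "banana", "app"]
keys = [pack_short_string(w) for w in words]
# Sorting the integer keys yields the strings in byte-wise lexicographic order.
sorted_words = [unpack_short_string(k) for k in sorted(keys)]
print(sorted_words)  # ['app', 'apple', 'banana', 'fig', 'pear']
```

Zero padding is what makes a prefix such as "app" sort before "apple": the padded byte 0 is smaller than any real byte.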

Other contributions: I have also contributed a fast countmap for small types to StatsBase.jl.
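One common reason a countmap can be much faster for small types is that a type such as UInt8 has only 256 possible values, so a plain counting array can replace the hash table a general countmap needs. The Python sketch below shows that counting-array idea under the assumption that every input value is an integer in 0..255; countmap_uint8 is a hypothetical name, not the StatsBase.jl API or its implementation.

```python
def countmap_uint8(values):
    """Count occurrences of each value in 0..255 using a fixed-size array.

    Hypothetical helper: with only 256 possible keys, indexing a list by the
    value replaces hashing. Returns a dict of value -> count for values seen.
    """
    counts = [0] * 256
    for v in values:
        if not 0 <= v <= 255:
            raise ValueError(f"value {v} is outside the UInt8 range 0..255")
        counts[v] += 1
    # Keep only the values that actually occurred, like a countmap would.
    return {v: c for v, c in enumerate(counts) if c > 0}


data = bytes([3, 1, 3, 255, 1, 3])
print(countmap_uint8(data))  # {1: 2, 3: 3, 255: 1}
```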


Social Media & websites

| Platform | Handle/URL | Notes |
|---|---|---|
| Twitter | @evalparse | |
| LinkedIn | daizj | |
| YouTube | Data Science ZJ | |
| Baduk Go Weiqi Ratings | | A ranking list of professional Go players' strengths, estimated from the last 365 days of games. |
| Baduk Go Weiqi Ratings2 | | A ranking list of professional Go players' strengths, estimated from the last 365 days of games. |

Interesting Projects I can no longer find time to maintain

| Project | Language | Notes |
|---|---|---|
| JLBoost.jl | Julia | A pure Julia implementation of XGBoost-like boosted trees. |
| ShinySky | R | A collection of Shiny widgets. One of the earliest "popular" Shiny packages. |
| FastGroupBy | Julia | Fast group-by functionality. I shared my ideas with the main developer of DataFrames.jl, who was already pursuing similar ones, and DataFrames.jl ended up optimizing its group-by. That made it unnecessary to keep maintaining the package. |

Awesome lists

| Category | Notes |
|---|---|
| awesome-eda | Exploratory data analysis |
| awesome-ml-fraemworks | Machine learning frameworks |
| awesome-data-science-notebook-engines | Data science notebook engines such as Jupyter |
| awesome-visual-flow-data-science | |
| awesome-markdown-table-editors | |
| awesome-big-medium-data-frameworks | |
| awesome-feature-engineering | |
| awesome-flow | |

Pinned

  1. DiskFrame/disk.frame (R, 592 stars, 40 forks): Fast Disk-Based Parallelized Data Manipulation Framework for Larger-than-RAM Data
  2. JDF.jl (Julia, 86 stars, 8 forks): Julia DataFrames serialization format
  3. DataConvenience.jl (Julia, 24 stars): Convenience functions missing in Julia
  4. SortingLab.jl (Julia, 23 stars, 4 forks): Faster sorting algorithms (sort and sortperm) for Julia
  5. AnalytixWare/ShinySky (R, 187 stars, 65 forks): Various UI widgets/components not part of Shiny, e.g. alerts, styled buttons
  6. TableScraper.jl (Julia, 28 stars): Scrape WELL-FORMED tables from webpages