🏠
Working from home

Hi there 👋

Here's some information to help you get to know me. Let's go!

TL;DR: Check out this PDF version of my CV.


Experience

  • 01/2022 -> present: Data Engineer at MoMo (M_service), joined through the MoMo Talents Program.

Education


Skills

  • Agile / Scrum concept
  • Programming Languages (C/C++, Java, Kotlin, Python, SQL, ...)
  • MS SQL Server / Oracle OCI / BigQuery / Vertica / Trino
  • Open Table Format (Delta Lake / Apache Iceberg)
  • Command Line (with or without Linux/Unix system)
  • Git and Version Control
  • CI / CD
  • Shell / Linux
  • Docker
  • Kubernetes
  • ETL / ELT
  • Spark Application
  • Data modeling
  • Data Observability / Data Quality / Data Catalog / Data Security
  • Data Governance
  • Google Cloud Platform (BigQuery / Pub/Sub / Dataproc / GKE / GCS / Cloud Functions / Resource monitoring / Looker / GCP gRPC API)
  • Oracle APEX
  • Scikit-learn
  • Machine Learning Algorithms
  • Generative AI / Agentic AI
  • MS Office
  • Kubectl / Helm / Skaffold
  • Bazel
  • Infrastructure as code (IaC) with Pulumi
  • Policy as code

Tools

Contributions


Project

Company Projects (newest first)

  • Data Agent
    Developing GenAI and Agentic AI agents that help users quickly extract insights from internal data and documents. The agents reduce engineers' time spent on periodic data analysis by 80%, enable autonomous AI-generated insights for customer reports, and provide a chatbot that engineers and customers can use to query their data and documents.
    Fluent in: GenAI, Agentic AI, LangChain, SMTP Email, FastAPI, Chatbots

  • Access Management
    Develop a SOC 2-compliant platform to manage time-based privileged access to all data, sensitive data, and policy tags across multiple data warehouses, data lakehouses, and services. The Access Management tool centralizes the approval process for 100% of data access requests within the data platform.
    Fluent in: SOC 2 compliance, SMTP Email, FastAPI, OpenID Connect

  • Data Pipeline Migration
    Build a transpiling tool on top of open-source projects to migrate SQL end to end from the current production environment to the Lakehouse, reducing the human cost of the migration phase at MoMo by up to 90%.
    Fluent in: SQLGlot, Trino/Presto, BigQuery, Airflow

  • Data Lakehouse
    Collaborate with the team to build a lakehouse solution that reduces the cost of all workloads at MoMo. Trino and Spark run on GKE as query engines to process large batch data stored in GCS. Reduce cost per workload by up to 70% thanks to Spot instances, without any data SLA.
    Fluent in: Trino, Spark, GKE, GCS, BigQuery Storage, dbt, Airflow, Apache Ranger, Delta Lake, Apache Iceberg

  • Cost Optimization - Reduce cost on GCP
    Support other teams in optimizing queries and in moving services, ETL, and ELT jobs to on-premises Kubernetes. Explore shifting workloads from BigQuery to Vertica. Manage each MoMo team's GCP resources following a divide-and-conquer principle.
    Result: 40% cost savings without any stuck workloads.
    Fluent in: BigQuery, Vertica, Kubernetes, Oracle APEX, GCP gRPC API.

  • Golden Record - A process for building high-value data marts at MoMo
    Build tools and services on top of open-source projects to control the data model's quality, freshness, and extensibility. Golden Record currently serves many data flows, such as events and transactions from the MoMo Super App.
    Used: dbt, Great Expectations, Airflow, GitLab, Kubernetes, Oracle OCI, and Oracle APEX.
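The time-based privileged access described under Access Management can be illustrated with a small sketch. It assumes a hypothetical `AccessGrant` record in which access is valid only from the moment of approval until a fixed expiry. The class, its fields, and `active_grants` are illustrative names, not the actual tool's API, and the approval workflow, SOC 2 auditing, and policy-tag enforcement are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """Hypothetical time-bound grant (illustrative only, not the real tool's model).

    A principal may use a resource only after approval and only until
    the grant expires.
    """
    principal: str
    resource: str
    approved_at: datetime
    duration: timedelta

    @property
    def expires_at(self) -> datetime:
        # The access window closes a fixed duration after approval.
        return self.approved_at + self.duration

    def is_active(self, now: datetime) -> bool:
        # Active from approval (inclusive) until expiry (exclusive).
        return self.approved_at <= now < self.expires_at


def active_grants(grants: list[AccessGrant], now: datetime) -> list[AccessGrant]:
    """Return only the hypothetical grants still inside their access window at `now`."""
    return [g for g in grants if g.is_active(now)]
```

For example, a grant approved at 09:00 UTC with `duration=timedelta(hours=4)` is active at 10:00 and inactive from 13:00 onward. Storing an expiry rather than a manual revoke step means access lapses automatically even if nobody remembers to remove it.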

University projects


Badges

I have earned many Google Cloud Platform badges covering AI, Machine Learning, Deep Learning, and Data Science.

Check out my Qwiklabs public profile.

Programming Languages


Duy's GitHub stats


Contact

Website

GitHub Page: viplazylmht.github.io

Pinned

  1. sql-datalineage

    A Python project to build and visualize data lineage from SQL. It supports column-level lineage and can be combined with a metadata retriever for better results.

    Python

  2. PublicIDConverter

    Public ID Converter is an Android tool that converts IDs when porting or modifying APKs.

    Java

  3. LongInteger

    C++

  4. Predict_Covid19

    Forked from caotatcuong/Predict_Covid19

    Jupyter Notebook