I'm an enthusiastic data scientist with over eight years of experience in data analysis, data visualization, and data storytelling. I enjoy solving challenging problems, harnessing the power of machine learning to derive valuable insights, and effectively communicating complex information.
Category | Skill |
---|---|
Programming | |
Data Manipulation | |
Data Visualization | |
Machine Learning | |
Big Data | |
Web Framework | |
Cloud Computing | |
Version Control | |
Code Editors | |
- Motivation: Simplify the process of finding rental properties in Singapore's expensive real estate market by using machine learning to estimate rental prices.
- Data collection: Scraped 1680 property listings from an online property portal, including information on price, size, address, bedrooms, bathrooms and more.
- Exploratory data analysis: Visualized property locations on an interactive map, generated a word cloud to extract insights from property agent descriptions, and examined descriptive statistics, distributions, and correlations.
- Data preprocessing: Handled missing address data, engineered location-related features using the Google Maps API, extracted property features from agent descriptions, and systematically evaluated multiple outlier handling methods.
- Model training: Trained five machine learning models with baseline configurations, selected an XGBoost regression model with optimized hyperparameters, and achieved an RMSE of 995, a MAPE of 0.13, and an R² of 0.90 on the test set.
- Model deployment: Created a web application for serving the XGBoost model using the Flask framework. Containerized this application using Docker and successfully deployed the Docker container on render.com.
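
The three test metrics reported for the rental price model can be computed as in the following minimal sketch. It uses only the Python standard library; the function names and the toy rent values are illustrative assumptions and are not taken from the project code.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same unit as the target (here: rent)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction (0.13 means 13%)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical monthly rents (actual vs. predicted), for illustration only.
actual = [3000, 4500, 6000, 8000]
predicted = [3300, 4200, 6500, 7600]
print(rmse(actual, predicted), mape(actual, predicted), r2(actual, predicted))
```

RMSE is expressed in the currency of the target, while MAPE and R² are unitless, which is why the three are usually reported together.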
- Motivation: Develop a hate speech detector for social media comments.
- Data: Utilized the ETHOS Hate Speech Detection Dataset.
- Models: Trained and evaluated three deep learning models using TensorFlow and scikit-learn. The fine-tuned BERT model performed best (78.0% accuracy), ahead of the LSTM (70.7%) and SimpleRNN (66.3%) models.
- Deployment: Prepared the fine-tuned BERT model for production by integrating it into a web application and an API endpoint using the Flask web framework.
Figures: Fine-tuned BERT confusion matrix | Model deployment
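
A confusion matrix and an accuracy score like those reported for the three models can be derived from true and predicted labels as in this sketch. The helper names and the eight toy labels are assumptions made for illustration; the project itself may well rely on scikit-learn for this step.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return m where m[i][j] counts samples with true label labels[i] predicted as labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def accuracy(matrix):
    """Share of correct predictions: diagonal sum divided by total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical labels: 1 = hate speech, 0 = not hate speech.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
matrix = confusion_matrix(y_true, y_pred)
print(matrix)            # rows: true label, columns: predicted label
print(accuracy(matrix))
```

The off-diagonal cells separate false positives from false negatives, which a single accuracy number hides.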
- Developed an AI-assisted cover letter generator that helps job seekers write personalized, professional cover letters tailored to specific job offers.
- Scraped job postings with Python and Beautiful Soup and used the ChatGPT API to extract key information, including requirements and tasks, in JSON format.
- Used the ChatGPT API again to generate cover letter suggestions that align the candidate's education, work experience, skills, and motivation with the job's requirements and tasks.
{
  "employer": "OpenAI",
  "job title": "Research Scientist",
  "requirements": [
    "Track record of coming up with new ideas or improving upon existing ideas in machine learning",
    "Ability to own and pursue a research agenda",
    "Excitement about OpenAI's approach to research",
    "Nice to have: Interested in and thoughtful about the impacts of AI technology",
    "Nice to have: Past experience in creating high-performance implementations of deep learning algorithms"
  ],
  "tasks": [
    "Develop innovative machine learning techniques",
    "Advance the research agenda of the team",
    "Collaborate with peers across the organization"
  ],
  "contact person": "unknown",
  "address": "San Francisco, California, United States"
}
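
Extracted JSON of this shape could be validated and turned into a generation prompt roughly as follows. This is a sketch under assumptions: the key names come from the sample above, while `parse_job_posting`, `build_prompt`, the prompt wording, and the candidate profile are hypothetical and do not reproduce the project's actual code or any ChatGPT API call.

```python
import json

# Keys taken from the sample posting; treating them as required is an assumption.
REQUIRED_KEYS = {"employer", "job title", "requirements", "tasks"}

def parse_job_posting(raw_json):
    """Parse the extracted posting and check that the expected keys are present."""
    posting = json.loads(raw_json)
    missing = REQUIRED_KEYS - posting.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return posting

def build_prompt(posting, candidate_profile):
    """Combine the posting and the candidate profile into one prompt string."""
    requirements = "\n".join(f"- {item}" for item in posting["requirements"])
    tasks = "\n".join(f"- {item}" for item in posting["tasks"])
    return (
        f"Write a cover letter for the position {posting['job title']} "
        f"at {posting['employer']}.\n"
        f"Requirements:\n{requirements}\n"
        f"Tasks:\n{tasks}\n"
        f"Candidate profile:\n{candidate_profile}"
    )

raw = json.dumps({
    "employer": "OpenAI",
    "job title": "Research Scientist",
    "requirements": ["Ability to own and pursue a research agenda"],
    "tasks": ["Advance the research agenda of the team"],
})
prompt = build_prompt(parse_job_posting(raw), "Hypothetical profile text")
print(prompt)
```

Validating the keys before building the prompt means a malformed extraction fails early instead of producing a cover letter with missing sections.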
Advanced SQL: MySQL for Ecommerce & Web Analytics, Udemy, February 2024, 🔗 see certificate
Skills: MySQL · SQL
AWS Certified Cloud Practitioner, AWS, January 2024, 🔗 see certificate
Skills: Amazon Web Services (AWS)
Ultimate AWS Certified Cloud Practitioner CLF-C02, Udemy, January 2024, 🔗 see certificate
Skills: Amazon Web Services (AWS)
Spark and Python for Big Data with PySpark, Udemy, January 2024, 🔗 see certificate
Skills: Spark · PySpark · AWS · Python · Machine Learning · Linear Regression · Logistic Regression · Decision Trees · Random Forest · Gradient Boosting · k-means clustering · Recommender Systems · Natural Language Processing (NLP)
Microsoft Power BI Data Analyst, Udemy, November 2023, 🔗 see certificate
Skills: Power BI
Deep Learning, alfatraining Bildungszentrum GmbH, April 2023
Skills: TensorFlow · NumPy · Natural Language Processing (NLP) · Python · Deep Learning · Recurrent Neural Networks (RNN) · Neural Networks · Scikit-Learn · Reinforcement Learning · Transfer Learning · Convolutional Neural Networks (CNN) · Time Series Analysis
Machine Learning by Stanford University & DeepLearning.AI, Coursera, April 2023, 🔗 see certificate
Skills: Decision Trees · Recommender Systems · Anomaly Detection · Python · Linear Regression · Neural Networks · Logistic Regression · Reinforcement Learning · Principal Component Analysis · k-means clustering
Python for Machine Learning & Data Science Masterclass, Udemy, March 2023, 🔗 see certificate
Skills: Decision Trees · Support Vector Machine (SVM) · Matplotlib · Random Forest · Naive Bayes · NumPy · Seaborn · Hierarchical Clustering · Natural Language Processing (NLP) · Pandas · Python · Linear Regression · Scikit-Learn · Logistic Regression · Principal Component Analysis · Gradient Boosting · DBSCAN · k-means clustering · K-Nearest Neighbors (KNN)
Machine Learning, alfatraining Bildungszentrum GmbH, February 2023
Skills: Decision Trees · Support Vector Machine (SVM) · Matplotlib · Naive Bayes · NumPy · Hierarchical Clustering · Pandas · Python · Linear Regression · Neural Networks · Scikit-Learn · Principal Component Analysis · DBSCAN · k-means clustering · K-Nearest Neighbors (KNN)
The Ultimate MySQL Bootcamp: Go from SQL Beginner to Expert, Udemy, December 2022, 🔗 see certificate
Skills: MySQL · SQL
Profile banner GIF based on the video by RDNE Stock project from Pexels