Hi, I am a junior linguist with a current interest in computational stylometry. Because this field draws on linguistics, statistics, and computational methods, I use data science and machine learning to explore it.
Currently, I'm learning how to extract informative linguistic features from text data and how to experiment with machine learning models for text classification. I am also exploring how to apply statistical techniques to authorship attribution. In addition, I am working on data science projects in a business context to get more comfortable with numbers.
Key Projects
-
PREDICTIVE MODELING
-
Optimizing Ride Fares: A Dynamic Pricing Model for Ride-Sharing Services
- Currently, ride-sharing prices are primarily set based on ride duration, overlooking fluctuating demand and supply. This project explores a dynamic pricing model powered by machine learning to enhance profitability while keeping prices appealing to customers. By experimenting with 12 ML algorithms and two feature engineering techniques, the project developed a model that, when tested with a simulation of 100 customers, showed that increasing the expected ride duration by 20% through a promotional campaign could generate a net profit of $2.4K. (Read More)
-
Addressing Customer Churn in an E-Commerce Company
- This project seeks to reduce an e-commerce company's customer churn rate from 16.8% to 10%. Using diagnostic analysis and a classification model, we focused on minimizing false negatives due to their higher financial impact. After testing various techniques and algorithms, we chose XGBoost and identified tenure and cashback amount as key factors for intervention. Simulations showed that with targeted strategies, achieving the 10% churn rate is feasible. (Read More)
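To illustrate why false negatives can drive the modeling choice, here is a minimal sketch (not the project's actual code) that picks a decision threshold by minimizing total misclassification cost. The cost values, the function names `total_cost` and `best_threshold`, and the toy labels and scores are all assumptions made for illustration.

```python
def total_cost(y_true, scores, threshold, fn_cost, fp_cost):
    """Cost of flagging as churn every customer whose score >= threshold."""
    cost = 0.0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if label == 1 and predicted == 0:
            cost += fn_cost  # missed churner (false negative)
        elif label == 0 and predicted == 1:
            cost += fp_cost  # unnecessary retention offer (false positive)
    return cost


def best_threshold(y_true, scores, fn_cost=5.0, fp_cost=1.0):
    """Return the candidate threshold (0.01 to 0.99) with the lowest total cost."""
    candidates = [i / 100 for i in range(1, 100)]
    return min(
        candidates,
        key=lambda t: total_cost(y_true, scores, t, fn_cost, fp_cost),
    )


# Hypothetical toy data: 1 = churned; scores = predicted churn probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.6, 0.35, 0.4, 0.2, 0.1, 0.55, 0.8]
threshold = best_threshold(y_true, scores)
```

When a missed churner is assumed to cost more than an unnecessary offer, the selected threshold falls below the default 0.5, trading some false positives for fewer false negatives.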
-
DATA ANALYSIS
-
Evaluating Marketing Campaign Effectiveness for New Menu Items: An A/B Testing Approach
- This project assesses which promotional campaign best boosts sales for a fast-food company's new menu items. Because sales were non-normally distributed and contained outliers, the analysis used the Kruskal-Wallis H test followed by Dunn's post-hoc test. Results showed the first campaign achieved the highest median sales, but the practical difference ($\eta^2$) between campaigns was minor. It is recommended that the Marketing Manager re-evaluate marketing strategies and target customers to improve campaign impact. (Read More)
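As a rough sketch of how a Kruskal-Wallis H statistic and a rank-based $\eta^2$ can be computed, the snippet below uses only the standard library. It assumes the commonly used estimate $\eta^2_H = (H - k + 1)/(n - k)$, where $k$ is the number of groups and $n$ the total sample size; the project's exact effect-size formula is not stated here. The function names and the sample numbers are hypothetical.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def kruskal_h(groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [x for group in groups for x in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    ties = sum(c ** 3 - c for c in Counter(pooled).values())
    return h / (1 - ties / (n ** 3 - n))


def eta_squared_h(h, k, n):
    """Rank-based eta-squared estimate (assumed formula, see lead-in)."""
    return (h - k + 1) / (n - k)


# Hypothetical sales figures for three campaigns.
campaigns = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_h(campaigns)
effect = eta_squared_h(h_stat, k=len(campaigns), n=9)
```

In practice, `scipy.stats.kruskal` returns the same statistic together with a p-value; the hand-rolled version is only meant to make the arithmetic visible.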
-
Improving the Number of Reviews: Exploring Review Patterns in Bangkok's Airbnb Landscape
- Despite an increase in reviews, about 36% (5.7 thousand) of Airbnb listings in Bangkok received none from 2012 to 2022. This project explores why some listings lack reviews and offers recommendations for Airbnb Thailand. It finds that unreviewed listings often have higher prices and longer minimum stays, which may deter bookings and reviews. In contrast, reviewed listings are typically entire homes or apartments, more centrally located, and closer to popular areas. Recommendations include adjusting prices and minimum stays for unreviewed listings, running promotions to boost reviews, and improving marketing to highlight unique features and attractions. (Read More)
-
NATURAL LANGUAGE PROCESSING
-
Regular Expression for Rule-Based Content Moderation
- This project addresses taboo expressions in computer-mediated communications by detecting and censoring specific elements of messages (e.g., "Shit, I forgot!" $\rightarrow$ "****, I forgot!"). A rule-based approach using regular expressions was chosen over machine learning for its efficient implementation, high explainability to stakeholders, and reliable detection of inappropriate content through rule matching. (Read More)
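A minimal sketch of the rule-based idea, assuming a small hypothetical word list (`TABOO_WORDS`) and a helper named `censor`; the project's real rules are likely more extensive than this.

```python
import re

# Hypothetical word list for illustration only.
TABOO_WORDS = ["shit", "damn"]

# \b anchors restrict matches to whole words; re.escape guards against
# special characters if the list is extended later.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(word) for word in TABOO_WORDS) + r")\b",
    re.IGNORECASE,
)


def censor(message):
    """Replace each matched word with asterisks of the same length."""
    return PATTERN.sub(lambda match: "*" * len(match.group()), message)


print(censor("Shit, I forgot!"))  # prints "****, I forgot!"
```

Because each rule is an explicit pattern, it is easy to show a stakeholder exactly why a given message was censored.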
-
Using Personal Names to Predict Gender: A 3-Character N-Gram Approach
- This project investigated whether conventional machine learning algorithms with character n-grams could outperform Long Short-Term Memory (LSTM) models, which achieved an F1 score of 0.93 (Septiandri, 2017). Using 3-character n-grams focusing on word boundaries to capture spacing between name parts, the Support Vector Machine with a linear kernel performed best, achieving an F1 score of 0.94. The results suggest that conventional models can match or exceed LSTM performance when using word-boundary 3-character n-grams. (Read More)
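To show what word-boundary 3-character n-grams look like, here is a small standard-library sketch. It loosely mirrors the behavior of scikit-learn's `analyzer="char_wb"` option, padding each name part with a space so that prefixes and suffixes become distinct features. The function name `boundary_trigrams` and the sample name are assumptions, and the project's actual feature pipeline may differ.

```python
def boundary_trigrams(name, n=3):
    """Extract character n-grams from each name part, padded with spaces."""
    grams = []
    for part in name.lower().split():
        padded = f" {part} "  # mark the start and end of the name part
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


# Hypothetical example name.
features = boundary_trigrams("Siti Nur")
```

Trigrams such as `"ti "` (a name ending) and `" nu"` (a name beginning) can then be counted and passed to a linear classifier such as a Support Vector Machine.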