{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# MACHINE LEARNING BUSINESS CASE STUDY" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Disclaimer: This case study is included for demonstration purposes in my personal portfolio, and the problem itself is NOT my own creation. However, the solution presented here is solely my own work. While there are various methods to approach this problem, both with and without Machine Learning, using my solution for plagiarism is not advised. Anyone doing so will bear full responsibility for their actions." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Short Description of the Problem:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As Lyft's pricing manager for a new city route, your challenge is to refine the pricing model to ensure profitability and high driver availability. With riders accustomed to paying \\$25 and drivers expecting \\$19 per trip, the current 60% match rate is suboptimal. A previous test showed that reducing Lyft's share from \\$6 to \\$3 per trip increased the match rate to approximately 93%, without a clear revenue gain. You must devise a strategy that maintains rider fees, adjusts driver pay, and considers acquisition costs and churn rates, all to enhance Lyft's annual net revenue for this route. This requires a strategic, analytical approach to determine the most effective commission structure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Detailed Case:" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Lyft Pricing Strategy Problem\n", "\n", "## Introduction\n", "Imagine you're the pricing product manager for Lyft's ride-scheduling feature, and you're launching in a new city. In this city, people are accustomed to paying \\$25 for rides from the airport to downtown, in either direction, one way. 
Drivers typically expect to earn \\$19 for this trip.\n", "\n", "For your launch in this new city, you decide to set the price exactly at the prevailing rate: \\$25 per ride charged to the rider, with \\$19 per ride paid to the driver. However, you quickly realize that only about 60% of the ride requests find a driver willing to accept rides at this price point.\n", "\n", "For the purpose of this exercise, we will focus on this specific route, even though there may be multiple routes available in the new city.\n", "\n", "## Current Unit Economics\n", "\n", "### Drivers\n", "- The customer acquisition cost (CAC) of a new driver falls in the range of \\$400 to \\$600. CAC is sensitive to the rate of acquisition due to limited marketing channels.\n", "- At the prevailing wage, drivers experience a 5% monthly churn rate and complete an average of 100 rides per month.\n", "\n", "### Riders\n", "- The CAC for acquiring a new rider is between \\$10 and \\$20 and, like driver CAC, is sensitive to the rate of acquisition due to limited marketing channels.\n", "- On average, each rider requests only 1 ride per month.\n", "- Churn varies based on rider experience: riders who don't encounter a "failed to find driver" event churn at a rate of 10% per month, while riders who experience one or more such events churn at a higher rate of 33% per month.\n", "\n", "## Previous Pricing Experiment\n", "You've already conducted one pricing experiment where you reduced Lyft's take from \\$6 per ride to \\$3 per ride across the board for a few weeks. This raised the match rate from 60% to approximately 93%.\n", "\n", "## Objective\n", "Your primary objective is to maximize the company's net revenue, which is the difference between the amount riders pay and the amount Lyft pays out to drivers, for this specific route in Toledo for the next 12 months. Keep in mind that you cannot charge riders more than the prevailing rate. 
\n", "\n", "## Key Question\n", "The central question to address is: How should you adjust the amount you pay drivers per trip (by modifying Lyft's take) to maximize net revenue over the next 12 months on this route?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "_" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# The Solution" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "To maximize Lyft's net revenue for the route in Toledo over the next 12 months, we'll use a Machine Learning (ML) model. The ML model will help us predict the optimal balance between Lyft's take and driver payment to maximize net revenue while considering various factors like match rate, driver churn, and rider churn." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### ML Model Approach:\n", "\n", "1. Data Collection: \n", " - Collect historical data on match rates, driver and rider churn rates, and net revenue at different take rates.\n", "\n", "2. Feature Engineering: \n", " - Features can include Lyft's take, match rate, driver churn rate, rider churn rate, and any other relevant factors (e.g., time of day, special events).\n", "\n", "3. Model Selection: \n", " - A regression model (like a Random Forest Regressor) can be used to predict the net revenue based on the features.\n", "\n", "4. Model Training and Validation: \n", " - Train the model on historical data and validate it to ensure accuracy.\n", "\n", "5. Optimization: \n", " - Use the model to predict the net revenue for different take rates and find the optimal rate." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import Necessary Libraries\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "from sklearn.model_selection import train_test_split\n", "from sklearn.ensemble import RandomForestRegressor\n", "from sklearn.metrics import mean_squared_error\n", "import seaborn as sns\n", "from mpl_toolkits.mplot3d import Axes3D\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Set Up and Data Simulation\n", " \n", "We'll start by setting up the environment and simulating a dataset, as we don't have access to real-world data for this scenario. Lyft's take, the match rate, the driver churn rate, and the rider churn rate are each drawn independently from a uniform range, and the net revenue is then computed for each draw." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Example Lyft takes: [4.64644051 5.1455681 4.80829013 4.63464955 4.2709644 ]\n", "Example Match rates: [0.79565049 0.60332102 0.75702264 0.83389423 0.61451189]\n", "Example Driver churn rates: [0.06246074 0.04904336 0.05092624 0.04002082 0.05420172]\n", "Example Rider churn rates: [0.19521137 0.24481221 0.27907438 0.29585829 0.28777492]\n", "Example Net Revenues: [369.69426669711515, 310.44293918083395, 363.99845090234544, 386.48075123896683, 262.4558415315587]\n" ] } ], "source": [ "# Simulate 1,000 pricing scenarios; the fixed seed makes the draws reproducible\n", "\n", "np.random.seed(0)\n", "data_size = 1000\n", "lyft_takes = np.random.uniform(3, 6, data_size) # Randomly generated Lyft's takes between $3 and $6\n", "match_rates = np.random.uniform(0.6, 0.93, data_size) # Random match rates between 60% and 93%\n", "driver_churn_rates = np.random.uniform(0.03, 0.07, data_size) # Random driver churn rates between 3% and 7%\n", "rider_churn_rates = np.random.uniform(0.1, 0.33, data_size) # Random rider churn rates between 10% 
and 33%\n", "\n", "# Example prints for visibility\n", "print("Example Lyft takes:", lyft_takes[:5])\n", "print("Example Match rates:", match_rates[:5])\n", "print("Example Driver churn rates:", driver_churn_rates[:5])\n", "print("Example Rider churn rates:", rider_churn_rates[:5])\n", "\n", "# Constants\n", "prevailing_rate_per_ride = 25 # $25 per ride charged to rider\n", "monthly_rides_initial = 100 # Total rides considered per month\n", "\n", "# Calculating net revenue for each data point\n", "net_revenues = []\n", "for lyft_take, match_rate in zip(lyft_takes, match_rates):\n", " successful_rides = monthly_rides_initial * match_rate\n", " total_revenue = successful_rides * prevailing_rate_per_ride\n", " driver_cost = successful_rides * (prevailing_rate_per_ride - lyft_take)\n", " net_revenue = total_revenue - driver_cost\n", " net_revenues.append(net_revenue)\n", "\n", "# Example prints for net revenues\n", "print("Example Net Revenues:", net_revenues[:5])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. DataFrame for the model " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
1000 rows × 5 columns
RandomForestRegressor(random_state=42)