<a href="https://colab.research.google.com/github/Michael-Santoro/airport-ride-share-pricing-strategy/blob/q_table/q_table.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
N_DRIVERS = 1000

# Bellman Equation Approach
**Michael Santoro - michael.santoro@du.edu**

The Bellman equation is a fundamental equation in reinforcement learning. It defines the value of a state (or state-action pair) recursively, as the immediate reward plus the discounted value of what follows. It is named after Richard Bellman, who made significant contributions to the field of dynamic programming.

The Bellman equation can be written in two forms: one for state values (V-values) and another for action values (Q-values).

Bellman equation for state values (V-values):

$$V(s) = \max_a Q(s, a)$$

$$V(s) = \max_a \left[ R(s, a) + \gamma V(s') \right]$$

Both maxima are taken over all possible actions $a$. Here, $V(s)$ is the value of state $s$, $Q(s, a)$ is the action value of taking action $a$ in state $s$, $R(s, a)$ is the immediate reward obtained by taking action $a$ in state $s$, $\gamma$ (gamma) is the discount factor ($0 \le \gamma \le 1$) that determines the importance of future rewards, and $V(s')$ is the value of the next state $s'$ that the agent transitions to after taking action $a$ in state $s$. As written, the equation treats the transition to $s'$ as deterministic.
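
The recursion above can be checked on a toy problem. The following is a minimal sketch of value iteration, assuming a hypothetical three-state chain with deterministic transitions; the states, actions, rewards, and the names `step` and `value_iteration` are illustrative and are not part of this notebook's pricing model.

```python
import numpy as np

# Hypothetical 3-state chain (illustration only): states 0, 1, 2; state 2 is terminal.
# Actions: 0 = stay, 1 = move right. Entering state 2 pays a reward of 1.
GAMMA = 0.9
N_STATES, N_ACTIONS = 3, 2

def step(s, a):
    """Deterministic transition: return (next_state, reward)."""
    if s == 2:                      # terminal state: nothing more to earn
        return 2, 0.0
    if a == 1:                      # move right
        s_next = s + 1
        return s_next, 1.0 if s_next == 2 else 0.0
    return s, 0.0                   # stay in place

def value_iteration(n_sweeps=100):
    """Repeatedly apply V(s) = max_a [R(s, a) + gamma * V(s')]."""
    V = np.zeros(N_STATES)
    Q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(n_sweeps):
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                s_next, r = step(s, a)
                Q[s, a] = r + GAMMA * V[s_next]
        V = Q.max(axis=1)           # V(s) = max_a Q(s, a)
    return V, Q

V, Q = value_iteration()
print(V)
```

State 1 is one step from the reward, so its value is 1; state 0 is two steps away, so its value is discounted once to 0.9.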

In [None]:
import numpy as np

def sample_poisson(lambd, size=1):
    """
    Generate random samples from a Poisson distribution with parameter lambda.

    Parameters:
        lambd (float): The parameter lambda of the Poisson distribution.
        size (int, optional): The number of samples to generate (default 1).

    Returns:
        np.ndarray: An array of random samples from the Poisson distribution.
    """
    return np.random.poisson(lam=lambd, size=size)

In [None]:
# Draw 1000 samples from a Poisson distribution with lambda = 1
arr = sample_poisson(1, 1000)

In [None]:
# Get the distinct values
distinct_values = np.unique(arr)
distinct_values

array([0, 1, 2, 3, 4, 5, 6, 7])

In [None]:
q_table = {}
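
The notebook does not show how this dictionary gets filled. As one possibility, here is a minimal sketch of a tabular Q-learning update on a dictionary keyed by `(state, action)`. The update rule, the constants `ALPHA` and `GAMMA`, and the names `q_update` and `q_sketch` are assumptions made for illustration, not the author's method.

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (assumed value)
GAMMA = 0.9   # discount factor (assumed value)

def q_update(q, state, action, reward, next_state, actions):
    """Apply one Q-learning update and return the new Q(state, action)."""
    # Best value reachable from the next state, per the Bellman equation
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + GAMMA * best_next
    # Move the current estimate a fraction ALPHA toward the target
    q[(state, action)] += ALPHA * (td_target - q[(state, action)])
    return q[(state, action)]

q_sketch = defaultdict(float)   # unseen (state, action) pairs default to 0.0
actions = [0, 1]
new_value = q_update(q_sketch, state=0, action=1, reward=1.0,
                     next_state=1, actions=actions)
print(new_value)
```

A `defaultdict` avoids having to pre-populate every state-action pair, which is convenient when the state space is discovered during simulation.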

In [None]:
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the driver acceptance data; the first CSV column is used as the index
df = pd.read_csv('/content/driverAcceptanceData - driverAcceptanceData.csv', index_col=0)


# Split data into features and target
X = df.drop('ACCEPTED', axis=1).values
y = df['ACCEPTED'].values

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit logistic regression model
lr = LogisticRegression()
lr.fit(X_train, y_train)

# Make predictions on test set
y_pred = lr.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

Accuracy: 0.825


In [None]:
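def driver_desc(price, n_drivers=N_DRIVERS):
  # Class probabilities at this price, in lr.classes_ order (assumed to be [reject, accept])
  p = lr.predict_proba(np.array([price]).reshape(-1, 1))[0]
  return np.any(np.random.choice([False, True], size=n_drivers, p=p))  # True if any driver accepts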
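
`driver_desc` simulates every driver and then checks whether any of them accepted. If the drivers are independent and share the same acceptance probability, as the simulation assumes, the chance that at least one accepts has a closed form, $1 - (1 - p)^n$. The helper below is a sketch of that formula; the name `prob_any_accept` is hypothetical and is not used elsewhere in the notebook.

```python
def prob_any_accept(p_accept, n_drivers):
    """Probability that at least one of n independent drivers accepts."""
    # Complement of "every driver rejects"
    return 1.0 - (1.0 - p_accept) ** n_drivers

# With a 0.1% per-driver acceptance chance and 1000 drivers,
# at least one acceptance happens roughly 63% of the time.
print(prob_any_accept(0.001, 1000))
```

Comparing this value with the fraction of `True` results from repeated simulation runs at a fixed price is a quick sanity check on the sampling code.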

In [None]:
# Quick check: does any driver accept at a price of 1?
driver_desc(1)

True

In [None]:
# Cost array: 3000 evenly spaced values from 0.01 to 30 (step 0.01)
c = np.linspace(0.01, 30, 3000)

In [None]:
m = 11

# Apply the function to each element using np.vectorize()
driver_choice = np.vectorize(driver_desc)(c)


In [None]:
driver_choice[:25]

array([ True,  True, False,  True,  True,  True,  True, False, False,
        True,  True,  True, False,  True, False,  True,  True,  True,
       False,  True,  True,  True,  True, False,  True])