Pandas starter file: the CSV is large, so Spark might be faster, but it is also possible to read it with pandas and do the machine learning here. Reading the CSV into pandas from the URL took 5 minutes 25 seconds on my computer. Downloading the file manually first might speed this up.
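
One way to try the manual-download idea is to save the file once and read it from disk on later runs, parsing only as many rows as needed. This is a minimal sketch, not the notebook's method: `load_trips` is an illustrative helper, and the in-memory CSV is a tiny stand-in built from two rows of the dataset.

```python
import io
import pandas as pd

def load_trips(source, nrows=None):
    """Read the taxi CSV from a path, URL, or file-like object.

    nrows limits how many rows are parsed, which helps when only a sample is needed.
    """
    return pd.read_csv(source, nrows=nrows, parse_dates=['pickup_datetime'])

# Tiny in-memory stand-in for the real file (columns trimmed for brevity)
demo_csv = io.StringIO(
    "id,vendor_id,pickup_datetime,trip_duration\n"
    "id2875421,2,2016-03-14 17:24:55,455\n"
    "id2377394,1,2016-06-12 00:43:35,663\n"
)
demo = load_trips(demo_csv, nrows=1)
print(demo.shape)
print(demo.dtypes)
```

With the real data, this could be called as `load_trips('train.csv', nrows=30000)`, where `train.csv` is an assumed local file name for the downloaded copy.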

In [37]:
import pandas as pd

# URL of the CSV file
url = 'https://project4-nyctaxi.s3.us-east-1.amazonaws.com/train.csv'

# Read the CSV file into a DataFrame
df = pd.read_csv(url)

# Display the first few rows of the DataFrame
df.head()


Unnamed: 0,id,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,store_and_fwd_flag,trip_duration
0,id2875421,2,2016-03-14 17:24:55,2016-03-14 17:32:30,1,-73.982155,40.767937,-73.96463,40.765602,N,455
1,id2377394,1,2016-06-12 00:43:35,2016-06-12 00:54:38,1,-73.980415,40.738564,-73.999481,40.731152,N,663
2,id3858529,2,2016-01-19 11:35:24,2016-01-19 12:10:48,1,-73.979027,40.763939,-74.005333,40.710087,N,2124
3,id3504673,2,2016-04-06 19:32:31,2016-04-06 19:39:40,1,-74.01004,40.719971,-74.012268,40.706718,N,429
4,id2181028,2,2016-03-26 13:30:55,2016-03-26 13:38:10,1,-73.973053,40.793209,-73.972923,40.78252,N,435


In [38]:
df.shape

(1458644, 11)

Create a sample to test the model.

In [39]:
# Take only the first 30,000 rows as a sample
# (copy the slice so later column assignments work on an independent frame)
df = df[:30000].copy()
df.shape

(30000, 11)
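
Taking the first 30,000 rows is only representative if the file is not ordered in some meaningful way. A possible alternative, sketched here on a small synthetic frame, is a seeded random sample with `DataFrame.sample`. The frame `full` and its values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the full DataFrame
full = pd.DataFrame({'trip_duration': np.arange(1000)})

# Draw a reproducible random sample instead of the leading rows
sample = full.sample(n=100, random_state=42).reset_index(drop=True)
again = full.sample(n=100, random_state=42).reset_index(drop=True)

print(sample.shape)          # (100, 1)
print(sample.equals(again))  # True: same seed, same rows
```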

In [40]:
# Check to see if the 'id' repeats
df['id'].nunique()

30000

In [41]:
# Remove id column
df = df.drop(columns= 'id')

The data is not yet in a format that Keras can use. Before scaling, we'll do some feature engineering.

In [42]:
df.dtypes

vendor_id               int64
pickup_datetime        object
dropoff_datetime       object
passenger_count         int64
pickup_longitude      float64
pickup_latitude       float64
dropoff_longitude     float64
dropoff_latitude      float64
store_and_fwd_flag     object
trip_duration           int64
dtype: object

In [43]:
# Convert to datetime format
df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'])

# Extract features
df['hour'] = df['pickup_datetime'].dt.hour  # Hour of the day (0-23)
df['day_of_week'] = df['pickup_datetime'].dt.dayofweek  # Day of the week (0=Monday, 6=Sunday)
df['month'] = df['pickup_datetime'].dt.month  # Month (1-12)
df['day_of_year'] = df['pickup_datetime'].dt.dayofyear  # Day of the year (1-366)
df['is_weekend'] = df['pickup_datetime'].dt.dayofweek >= 5  # Weekend flag (True/False)

df.head()

Unnamed: 0,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,store_and_fwd_flag,trip_duration,hour,day_of_week,month,day_of_year,is_weekend
0,2,2016-03-14 17:24:55,2016-03-14 17:32:30,1,-73.982155,40.767937,-73.96463,40.765602,N,455,17,0,3,74,False
1,1,2016-06-12 00:43:35,2016-06-12 00:54:38,1,-73.980415,40.738564,-73.999481,40.731152,N,663,0,6,6,164,True
2,2,2016-01-19 11:35:24,2016-01-19 12:10:48,1,-73.979027,40.763939,-74.005333,40.710087,N,2124,11,1,1,19,False
3,2,2016-04-06 19:32:31,2016-04-06 19:39:40,1,-74.01004,40.719971,-74.012268,40.706718,N,429,19,2,4,97,False
4,2,2016-03-26 13:30:55,2016-03-26 13:38:10,1,-73.973053,40.793209,-73.972923,40.78252,N,435,13,5,3,86,True


We've created more scalable columns: hour, day of the week, month, day of the year, and a weekend flag. Now we need to handle the exact time.

In [44]:
import numpy as np

For cyclic features like hour of the day and day of the week, use sine and cosine transformations to retain the cyclical nature.

In [45]:
# Encode hour of the day
df['hour_sin'] = np.sin(2 * np.pi * df['hour'] / 24)
df['hour_cos'] = np.cos(2 * np.pi * df['hour'] / 24)

# Encode day of the week
df['day_of_week_sin'] = np.sin(2 * np.pi * df['day_of_week'] / 7)
df['day_of_week_cos'] = np.cos(2 * np.pi * df['day_of_week'] / 7)

df[['hour', 'hour_sin', 'hour_cos', 'day_of_week', 'day_of_week_sin', 'day_of_week_cos']]


Unnamed: 0,hour,hour_sin,hour_cos,day_of_week,day_of_week_sin,day_of_week_cos
0,17,-0.965926,-2.588190e-01,0,0.000000,1.000000
1,0,0.000000,1.000000e+00,6,-0.781831,0.623490
2,11,0.258819,-9.659258e-01,1,0.781831,0.623490
3,19,-0.965926,2.588190e-01,2,0.974928,-0.222521
4,13,-0.258819,-9.659258e-01,5,-0.974928,-0.222521
...,...,...,...,...,...,...
29995,22,-0.500000,8.660254e-01,4,-0.433884,-0.900969
29996,8,0.866025,-5.000000e-01,3,0.433884,-0.900969
29997,1,0.258819,9.659258e-01,3,0.433884,-0.900969
29998,18,-1.000000,-1.836970e-16,5,-0.974928,-0.222521
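
A quick check of why the sine/cosine pair helps: as raw integers, hour 23 and hour 0 look 23 units apart, but on the unit circle they are neighbors. The sketch below uses a hypothetical helper, `encode_hour`, to compare distances between encoded hours.

```python
import numpy as np

def encode_hour(hour):
    """Map an hour (0-23) to a point (sin, cos) on the unit circle."""
    angle = 2 * np.pi * hour / 24
    return np.array([np.sin(angle), np.cos(angle)])

# Euclidean distance between encoded hours
late_to_midnight = np.linalg.norm(encode_hour(23) - encode_hour(0))
midnight_to_noon = np.linalg.norm(encode_hour(0) - encode_hour(12))

print(round(float(late_to_midnight), 3))  # 0.261
print(round(float(midnight_to_noon), 3))  # 2.0
```

Hours that are one step apart across midnight end up close together, while midnight and noon sit at opposite sides of the circle.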


Now let's adjust the latitude and longitude columns to make them better for scaling.

In [46]:
# Define a function for haversine distance
def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Radius of Earth in kilometers
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    return R * c

# Apply haversine to calculate distance
df['distance_km'] = haversine(df['pickup_latitude'], df['pickup_longitude'],
                              df['dropoff_latitude'], df['dropoff_longitude'])

df[['pickup_latitude', 'pickup_longitude', 'dropoff_latitude', 'dropoff_longitude', 'distance_km']]


Unnamed: 0,pickup_latitude,pickup_longitude,dropoff_latitude,dropoff_longitude,distance_km
0,40.767937,-73.982155,40.765602,-73.964630,1.498521
1,40.738564,-73.980415,40.731152,-73.999481,1.805507
2,40.763939,-73.979027,40.710087,-74.005333,6.385098
3,40.719971,-74.010040,40.706718,-74.012268,1.485498
4,40.793209,-73.973053,40.782520,-73.972923,1.188588
...,...,...,...,...,...
29995,40.777569,-73.979042,40.771542,-73.956421,2.019304
29996,40.762169,-73.983887,40.753639,-73.974396,1.240406
29997,40.755260,-73.994858,40.745785,-73.994324,1.054611
29998,40.772476,-73.948959,40.781132,-73.985283,3.206423
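
As a sanity check on the distance function, one degree of latitude should come out near 111 km, and the first trip should reproduce the value in the table above. The sketch repeats the `haversine` definition so that it runs on its own.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    R = 6371  # Radius of Earth in kilometers, as in the cell above
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2.0)**2
    return R * 2 * np.arcsin(np.sqrt(a))

# One degree of latitude along a meridian
one_degree = haversine(0.0, 0.0, 1.0, 0.0)
print(round(float(one_degree), 2))  # 111.19

# First trip from the table (pickup -> dropoff)
first_trip = haversine(40.767937, -73.982155, 40.765602, -73.964630)
print(round(float(first_trip), 3))
```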


In [47]:
from sklearn.cluster import KMeans

# Stack coordinates for clustering
coords = np.vstack((df[['pickup_latitude', 'pickup_longitude']].values,
                    df[['dropoff_latitude', 'dropoff_longitude']].values))

# Apply k-means clustering
kmeans = KMeans(n_clusters=20, random_state=42)
kmeans.fit(coords)

# Assign cluster labels for pickup and dropoff
# (pass plain arrays, matching the array the model was fitted on)
df['pickup_zone'] = kmeans.predict(df[['pickup_latitude', 'pickup_longitude']].values)
df['dropoff_zone'] = kmeans.predict(df[['dropoff_latitude', 'dropoff_longitude']].values)

df[['pickup_latitude', 'pickup_longitude', 'pickup_zone', 'dropoff_zone']]




Unnamed: 0,pickup_latitude,pickup_longitude,pickup_zone,dropoff_zone
0,40.767937,-73.982155,19,15
1,40.738564,-73.980415,3,12
2,40.763939,-73.979027,19,0
3,40.719971,-74.010040,0,0
4,40.793209,-73.973053,4,4
...,...,...,...,...
29995,40.777569,-73.979042,4,1
29996,40.762169,-73.983887,19,8
29997,40.755260,-73.994858,6,6
29998,40.772476,-73.948959,1,4
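
To judge whether the chosen number of zones is reasonable, it can help to look at how many points fall into each cluster. This is a minimal sketch on synthetic coordinates. The blob centers, `n_init=10`, and the use of 3 clusters are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated synthetic "neighborhoods" of 100 points each
rng = np.random.default_rng(0)
centers = np.array([[40.70, -74.01], [40.75, -73.98], [40.80, -73.95]])
toy_coords = np.vstack([c + rng.normal(scale=0.002, size=(100, 2)) for c in centers])

toy_kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(toy_coords)
toy_labels = toy_kmeans.predict(toy_coords)

# Number of points assigned to each cluster
counts = np.bincount(toy_labels, minlength=3)
print(counts)
```

On the real data, very small clusters would suggest that some zones mostly capture isolated points rather than neighborhoods.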


In [48]:
def calculate_bearing(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlon = lon2 - lon1
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    bearing = np.arctan2(x, y)
    return np.degrees(bearing) % 360

df['bearing'] = calculate_bearing(df['pickup_latitude'], df['pickup_longitude'],
                                   df['dropoff_latitude'], df['dropoff_longitude'])
df[['bearing']]


Unnamed: 0,bearing
0,99.970196
1,242.846232
2,200.319835
3,187.262300
4,179.473585
...,...
29995,109.376431
29996,139.871262
29997,177.554978
29998,287.479321
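
Bearing is measured clockwise from north, so due north should give 0 and due east 90. Because 359 degrees and 1 degree point in almost the same direction, bearing is also a cyclic quantity, and the same sine/cosine idea used for the hour could be applied to it. The sketch below checks both points. `encode_bearing` is a hypothetical helper and is not used elsewhere in this notebook.

```python
import numpy as np

def calculate_bearing(lat1, lon1, lat2, lon2):
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    dlon = lon2 - lon1
    x = np.sin(dlon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(dlon)
    return np.degrees(np.arctan2(x, y)) % 360

north = calculate_bearing(40.0, -74.0, 41.0, -74.0)  # same longitude, higher latitude
east = calculate_bearing(0.0, 0.0, 0.0, 1.0)         # along the equator
print(round(float(north), 1), round(float(east), 1))  # 0.0 90.0

def encode_bearing(bearing_deg):
    """Map a bearing in degrees to a (sin, cos) pair."""
    rad = np.radians(bearing_deg)
    return np.array([np.sin(rad), np.cos(rad)])

# 359 and 1 degrees are far apart as numbers but close once encoded
gap = np.linalg.norm(encode_bearing(359.0) - encode_bearing(1.0))
print(round(float(gap), 3))
```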


Let's take a look at all the columns to see what to keep before scaling.

In [49]:
df.head()

Unnamed: 0,vendor_id,pickup_datetime,dropoff_datetime,passenger_count,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,store_and_fwd_flag,trip_duration,...,day_of_year,is_weekend,hour_sin,hour_cos,day_of_week_sin,day_of_week_cos,distance_km,pickup_zone,dropoff_zone,bearing
0,2,2016-03-14 17:24:55,2016-03-14 17:32:30,1,-73.982155,40.767937,-73.96463,40.765602,N,455,...,74,False,-0.965926,-0.258819,0.0,1.0,1.498521,19,15,99.970196
1,1,2016-06-12 00:43:35,2016-06-12 00:54:38,1,-73.980415,40.738564,-73.999481,40.731152,N,663,...,164,True,0.0,1.0,-0.781831,0.62349,1.805507,3,12,242.846232
2,2,2016-01-19 11:35:24,2016-01-19 12:10:48,1,-73.979027,40.763939,-74.005333,40.710087,N,2124,...,19,False,0.258819,-0.965926,0.781831,0.62349,6.385098,19,0,200.319835
3,2,2016-04-06 19:32:31,2016-04-06 19:39:40,1,-74.01004,40.719971,-74.012268,40.706718,N,429,...,97,False,-0.965926,0.258819,0.974928,-0.222521,1.485498,0,0,187.2623
4,2,2016-03-26 13:30:55,2016-03-26 13:38:10,1,-73.973053,40.793209,-73.972923,40.78252,N,435,...,86,True,-0.258819,-0.965926,-0.974928,-0.222521,1.188588,4,4,179.473585


In [50]:
df.columns

Index(['vendor_id', 'pickup_datetime', 'dropoff_datetime', 'passenger_count',
       'pickup_longitude', 'pickup_latitude', 'dropoff_longitude',
       'dropoff_latitude', 'store_and_fwd_flag', 'trip_duration', 'hour',
       'day_of_week', 'month', 'day_of_year', 'is_weekend', 'hour_sin',
       'hour_cos', 'day_of_week_sin', 'day_of_week_cos', 'distance_km',
       'pickup_zone', 'dropoff_zone', 'bearing'],
      dtype='object')

Drop: 'pickup_datetime', 'dropoff_datetime', 'pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude', 'hour', 'day_of_week'

Get dummies: 'vendor_id', 'store_and_fwd_flag', 'pickup_zone', 'dropoff_zone'

In [51]:
df = df.drop(columns= ['pickup_datetime', 'dropoff_datetime', 'pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude', 'hour', 'day_of_week'])
df.head()

Unnamed: 0,vendor_id,passenger_count,store_and_fwd_flag,trip_duration,month,day_of_year,is_weekend,hour_sin,hour_cos,day_of_week_sin,day_of_week_cos,distance_km,pickup_zone,dropoff_zone,bearing
0,2,1,N,455,3,74,False,-0.965926,-0.258819,0.0,1.0,1.498521,19,15,99.970196
1,1,1,N,663,6,164,True,0.0,1.0,-0.781831,0.62349,1.805507,3,12,242.846232
2,2,1,N,2124,1,19,False,0.258819,-0.965926,0.781831,0.62349,6.385098,19,0,200.319835
3,2,1,N,429,4,97,False,-0.965926,0.258819,0.974928,-0.222521,1.485498,0,0,187.2623
4,2,1,N,435,3,86,True,-0.258819,-0.965926,-0.974928,-0.222521,1.188588,4,4,179.473585


In [52]:
df.shape

(30000, 15)

In [53]:
df['dropoff_zone'].nunique()

20

In [54]:
df['pickup_zone'].nunique()

19
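
The two counts differ because no pickup in this sample landed in one of the 20 clusters, so `get_dummies` will produce fewer pickup columns than dropoff columns. If a stable set of columns is wanted, for example when a different sample is encoded later, one option is to declare the categories up front. This is a sketch with made-up zone labels and 4 clusters.

```python
import pandas as pd

k = 4  # number of clusters in this toy example
zones = [0, 1, 3, 1]  # zone 2 never occurs

# Declaring all k categories keeps a column for the unobserved zone
zones_cat = pd.Series(pd.Categorical(zones, categories=range(k)), name='pickup_zone')
encoded = pd.get_dummies(zones_cat, prefix='pickup_zone', dtype=int)

print(list(encoded.columns))
print(encoded['pickup_zone_2'].sum())  # 0: the column exists but is all zeros
```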

In [55]:
# Get dummies for categorical columns
dummies = pd.get_dummies(df, columns=['vendor_id', 'store_and_fwd_flag','pickup_zone', 'dropoff_zone'], drop_first=True)


In [56]:
# Cast every column in dummies to int: booleans become 0/1, but any float columns are truncated too
dummies = dummies.astype(int)

In [61]:
df['is_weekend'] = df['is_weekend'].astype(int)

In [57]:
# Concatenate dummies with the original DataFrame
df = pd.concat([df, dummies], axis=1)
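
Note that `pd.get_dummies(df, columns=[...])` already returns every non-encoded column alongside the new indicator columns. Concatenating its result with `df` therefore yields each numeric column twice, which is why names such as `passenger_count` and `trip_duration` appear two times in the `df.columns` listing. The `astype(int)` cast also truncates the float columns in that second copy. The sketch below reproduces the effect on a toy frame with made-up values and shows one possible variant that avoids it by passing `dtype=int` to `get_dummies`.

```python
import pandas as pd

toy = pd.DataFrame({
    'trip_duration': [455, 663, 2124],
    'hour_sin': [-0.97, 0.0, 0.26],
    'is_weekend': [False, True, False],
    'vendor_id': [2, 1, 2],
})

# Pattern from the cells above: duplicated column names, truncated floats
dummies = pd.get_dummies(toy, columns=['vendor_id'], drop_first=True).astype(int)
combined = pd.concat([toy, dummies], axis=1)
print(combined.columns.is_unique)       # False
print(combined['trip_duration'].shape)  # (3, 2): selecting the name returns two columns
print(dummies['hour_sin'].tolist())     # [0, 0, 0]

# Variant: encode once, keep floats intact, cast only the boolean flag
clean = pd.get_dummies(toy, columns=['vendor_id'], drop_first=True, dtype=int)
clean['is_weekend'] = clean['is_weekend'].astype(int)
print(list(clean.columns))
```

With the variant, no concatenation and no later drop of the original categorical columns is needed, because `get_dummies` removes the encoded columns itself.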

In [58]:
df

Unnamed: 0,vendor_id,passenger_count,store_and_fwd_flag,trip_duration,month,day_of_year,is_weekend,hour_sin,hour_cos,day_of_week_sin,...,dropoff_zone_10,dropoff_zone_11,dropoff_zone_12,dropoff_zone_13,dropoff_zone_14,dropoff_zone_15,dropoff_zone_16,dropoff_zone_17,dropoff_zone_18,dropoff_zone_19
0,2,1,N,455,3,74,False,-0.965926,-2.588190e-01,0.000000,...,0,0,0,0,0,1,0,0,0,0
1,1,1,N,663,6,164,True,0.000000,1.000000e+00,-0.781831,...,0,0,1,0,0,0,0,0,0,0
2,2,1,N,2124,1,19,False,0.258819,-9.659258e-01,0.781831,...,0,0,0,0,0,0,0,0,0,0
3,2,1,N,429,4,97,False,-0.965926,2.588190e-01,0.974928,...,0,0,0,0,0,0,0,0,0,0
4,2,1,N,435,3,86,True,-0.258819,-9.659258e-01,-0.974928,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29995,2,1,N,567,5,148,False,-0.500000,8.660254e-01,-0.433884,...,0,0,0,0,0,0,0,0,0,0
29996,2,1,N,917,5,126,False,0.866025,-5.000000e-01,0.433884,...,0,0,0,0,0,0,0,0,0,0
29997,1,1,N,232,4,119,False,0.258819,9.659258e-01,0.433884,...,0,0,0,0,0,0,0,0,0,0
29998,1,1,N,1065,3,86,True,-1.000000,-1.836970e-16,-0.974928,...,0,0,0,0,0,0,0,0,0,0


In [59]:
df.columns

Index(['vendor_id', 'passenger_count', 'store_and_fwd_flag', 'trip_duration',
       'month', 'day_of_year', 'is_weekend', 'hour_sin', 'hour_cos',
       'day_of_week_sin', 'day_of_week_cos', 'distance_km', 'pickup_zone',
       'dropoff_zone', 'bearing', 'passenger_count', 'trip_duration', 'month',
       'day_of_year', 'is_weekend', 'hour_sin', 'hour_cos', 'day_of_week_sin',
       'day_of_week_cos', 'distance_km', 'bearing', 'vendor_id_2',
       'store_and_fwd_flag_Y', 'pickup_zone_1', 'pickup_zone_2',
       'pickup_zone_3', 'pickup_zone_4', 'pickup_zone_5', 'pickup_zone_6',
       'pickup_zone_7', 'pickup_zone_8', 'pickup_zone_9', 'pickup_zone_10',
       'pickup_zone_11', 'pickup_zone_12', 'pickup_zone_13', 'pickup_zone_15',
       'pickup_zone_16', 'pickup_zone_17', 'pickup_zone_18', 'pickup_zone_19',
       'dropoff_zone_1', 'dropoff_zone_2', 'dropoff_zone_3', 'dropoff_zone_4',
       'dropoff_zone_5', 'dropoff_zone_6', 'dropoff_zone_7', 'dropoff_zone_8',
       'dropoff_zone_

In [60]:
# Drop the original categorical columns now that they are one-hot encoded
df = df.drop(columns=['vendor_id', 'store_and_fwd_flag', 'pickup_zone', 'dropoff_zone'])
df.head()

Unnamed: 0,passenger_count,trip_duration,month,day_of_year,is_weekend,hour_sin,hour_cos,day_of_week_sin,day_of_week_cos,distance_km,...,dropoff_zone_10,dropoff_zone_11,dropoff_zone_12,dropoff_zone_13,dropoff_zone_14,dropoff_zone_15,dropoff_zone_16,dropoff_zone_17,dropoff_zone_18,dropoff_zone_19
0,1,455,3,74,False,-0.965926,-0.258819,0.0,1.0,1.498521,...,0,0,0,0,0,1,0,0,0,0
1,1,663,6,164,True,0.0,1.0,-0.781831,0.62349,1.805507,...,0,0,1,0,0,0,0,0,0,0
2,1,2124,1,19,False,0.258819,-0.965926,0.781831,0.62349,6.385098,...,0,0,0,0,0,0,0,0,0,0
3,1,429,4,97,False,-0.965926,0.258819,0.974928,-0.222521,1.485498,...,0,0,0,0,0,0,0,0,0,0
4,1,435,3,86,True,-0.258819,-0.965926,-0.974928,-0.222521,1.188588,...,0,0,0,0,0,0,0,0,0,0


In [62]:
df.head()

Unnamed: 0,passenger_count,trip_duration,month,day_of_year,is_weekend,hour_sin,hour_cos,day_of_week_sin,day_of_week_cos,distance_km,...,dropoff_zone_10,dropoff_zone_11,dropoff_zone_12,dropoff_zone_13,dropoff_zone_14,dropoff_zone_15,dropoff_zone_16,dropoff_zone_17,dropoff_zone_18,dropoff_zone_19
0,1,455,3,74,0,-0.965926,-0.258819,0.0,1.0,1.498521,...,0,0,0,0,0,1,0,0,0,0
1,1,663,6,164,1,0.0,1.0,-0.781831,0.62349,1.805507,...,0,0,1,0,0,0,0,0,0,0
2,1,2124,1,19,0,0.258819,-0.965926,0.781831,0.62349,6.385098,...,0,0,0,0,0,0,0,0,0,0
3,1,429,4,97,0,-0.965926,0.258819,0.974928,-0.222521,1.485498,...,0,0,0,0,0,0,0,0,0,0
4,1,435,3,86,1,-0.258819,-0.965926,-0.974928,-0.222521,1.188588,...,0,0,0,0,0,0,0,0,0,0


In [63]:
df.columns

Index(['passenger_count', 'trip_duration', 'month', 'day_of_year',
       'is_weekend', 'hour_sin', 'hour_cos', 'day_of_week_sin',
       'day_of_week_cos', 'distance_km', 'bearing', 'passenger_count',
       'trip_duration', 'month', 'day_of_year', 'is_weekend', 'hour_sin',
       'hour_cos', 'day_of_week_sin', 'day_of_week_cos', 'distance_km',
       'bearing', 'vendor_id_2', 'store_and_fwd_flag_Y', 'pickup_zone_1',
       'pickup_zone_2', 'pickup_zone_3', 'pickup_zone_4', 'pickup_zone_5',
       'pickup_zone_6', 'pickup_zone_7', 'pickup_zone_8', 'pickup_zone_9',
       'pickup_zone_10', 'pickup_zone_11', 'pickup_zone_12', 'pickup_zone_13',
       'pickup_zone_15', 'pickup_zone_16', 'pickup_zone_17', 'pickup_zone_18',
       'pickup_zone_19', 'dropoff_zone_1', 'dropoff_zone_2', 'dropoff_zone_3',
       'dropoff_zone_4', 'dropoff_zone_5', 'dropoff_zone_6', 'dropoff_zone_7',
       'dropoff_zone_8', 'dropoff_zone_9', 'dropoff_zone_10',
       'dropoff_zone_11', 'dropoff_zone_12', 'drop

OK, now we can split the data into training and test sets and scale it.

In [64]:
# Import our dependencies
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import tensorflow as tf

In [None]:
# Remove target from features data
y = df.trip_duration.values
X = df.drop(columns="trip_duration").values

# Split training/test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [67]:
# Preprocess numerical data for neural network

# Create a StandardScaler instance
scaler = StandardScaler()

# Fit the StandardScaler
X_scaler = scaler.fit(X_train)

# Scale the data
X_train_scaled = X_scaler.transform(X_train)
X_test_scaled = X_scaler.transform(X_test)
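
Fitting the scaler on the training split only keeps information from the test set out of the scaling parameters. The transformation itself is simple: subtract the training mean and divide by the training standard deviation. The sketch below reproduces it with NumPy on made-up numbers, using the population standard deviation (`ddof=0`), which is what `StandardScaler` uses.

```python
import numpy as np

toy_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]])
toy_test = np.array([[4.0, 30.0]])

# Statistics come from the training rows only
mean = toy_train.mean(axis=0)
std = toy_train.std(axis=0)  # ddof=0

toy_train_scaled = (toy_train - mean) / std
toy_test_scaled = (toy_test - mean) / std  # reuses the training statistics

print(toy_train_scaled.mean(axis=0).round(6))
print(toy_train_scaled.std(axis=0).round(6))
print(toy_test_scaled.round(3))
```

The scaled training columns have mean 0 and standard deviation 1, while the test row is simply expressed in the same units.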

In [71]:
X.shape

(30000, 59)

In [76]:
# Define the model architecture
nn_model = tf.keras.Sequential([
    tf.keras.Input(shape=(X_train_scaled.shape[1],)),  # One input per feature column
    tf.keras.layers.Dense(128, activation='relu'),  # First hidden layer with 128 units and ReLU activation
    tf.keras.layers.Dense(64, activation='relu'),   # Second hidden layer with 64 units and ReLU activation
    tf.keras.layers.Dense(32, activation='relu'),   # Third hidden layer with 32 units and ReLU activation
    tf.keras.layers.Dense(1)  # Output layer with 1 unit (for predicting trip duration)
])

# Compile the model
nn_model.compile(loss='mean_squared_error',   # MSE for regression
                 optimizer='adam',            # Adam is a common default optimizer
                 metrics=['mae'])             # Mean Absolute Error as a metric for evaluation

# Train the model
fit_model = nn_model.fit(X_train_scaled, y_train, epochs=50, batch_size=32, validation_split=0.2)


Epoch 1/50
563/563 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - loss: 6907739.0000 - mae: 636.3627 - val_loss: 6108054.5000 - val_mae: 432.8905
Epoch 2/50
563/563 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - loss: 7201754.5000 - mae: 426.2437 - val_loss: 6090110.0000 - val_mae: 383.9971
Epoch 3/50
563/563 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - loss: 6974503.0000 - mae: 395.4896 - val_loss: 6098663.0000 - val_mae: 410.2178
Epoch 4/50
563/563 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - loss: 11027380.0000 - mae: 467.3323 - val_loss: 6081035.0000 - val_mae: 375.8071
Epoch 5/50
563/563 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - loss: 8306034.0000 - mae: 411.6190 - val_loss: 6079998.0000 - val_mae: 389.8233
Epoch 6/50
563/563 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - loss: 9026545.0000 - mae: 434.1517 - val_loss: 6086450.5000 - val_mae: 

In [77]:
# Evaluate model
model_loss, model_mae = nn_model.evaluate(X_test_scaled, y_test, verbose=2)
print(f"Test Loss (MSE): {model_loss}")
print(f"Test MAE: {model_mae}")

235/235 - 0s - 741us/step - loss: 12900175.0000 - mae: 565.9164
Test Loss (MSE): 12900175.0
Test MAE: 565.9164428710938


This means that the model's predictions are off by about 566 seconds (over 9 minutes) on average. It's not very precise. Outliers might be affecting this. Let's look at some ways to optimize this model. A MinMaxScaler could be used on `trip_duration` to bring the target into a smaller range for training. A different loss function, like Huber loss, might be less sensitive to outliers than the MSE loss used here.
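
To see why Huber loss is less affected by outliers, compare per-sample penalties: squared error grows quadratically with the residual, while Huber loss grows only linearly beyond a threshold `delta`. This is a NumPy sketch with made-up residuals in seconds and an assumed `delta` of 300. In Keras, the corresponding option would be to pass `tf.keras.losses.Huber(delta=...)` as the `loss` in `compile`.

```python
import numpy as np

def huber(residual, delta=300.0):
    """Huber penalty: quadratic for small residuals, linear for large ones."""
    abs_r = np.abs(residual)
    quadratic = 0.5 * abs_r ** 2
    linear = delta * (abs_r - 0.5 * delta)
    return np.where(abs_r <= delta, quadratic, linear)

residuals = np.array([60.0, 300.0, 6000.0])  # the last one plays the outlier
squared = residuals ** 2
robust = huber(residuals)

print(squared)
print(robust)
```

For the 6000-second residual, the squared error is 36,000,000 while the Huber penalty is 1,755,000, so a single extreme trip pulls on the gradient far less.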

Start a new file called nn_model_optimization to experiment with some of these ideas.
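
For the target-scaling idea, the model's predictions come back in scaled units and have to be mapped back to seconds before computing MAE. This is a sketch with scikit-learn's `MinMaxScaler` on the five durations shown in the first rows above. `fake_pred_scaled` stands in for a model output and is made up.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# MinMaxScaler expects a 2D array, so the target is reshaped to one column
y_demo = np.array([455, 663, 2124, 429, 435], dtype=float).reshape(-1, 1)

y_scaler = MinMaxScaler()
y_demo_scaled = y_scaler.fit_transform(y_demo)  # values now lie in [0, 1]

# A hypothetical prediction in scaled units, converted back to seconds
fake_pred_scaled = np.array([[0.5]])
pred_seconds = y_scaler.inverse_transform(fake_pred_scaled)
print(pred_seconds)
```

As with the feature scaler, the target scaler would be fitted on the training targets only and reused for the test set.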