# Exploratory data analysis

First, we explore the data to look for quality problems and early insights.

Importing the required libraries and loading the data

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
df = pd.read_parquet('../data/raw/yellow_tripdata_2025-07.parquet')

Now let's look at the data itself.

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3898963 entries, 0 to 3898962
Data columns (total 20 columns):
 #   Column                 Dtype         
---  ------                 -----         
 0   VendorID               int32         
 1   tpep_pickup_datetime   datetime64[us]
 2   tpep_dropoff_datetime  datetime64[us]
 3   passenger_count        float64       
 4   trip_distance          float64       
 5   RatecodeID             float64       
 6   store_and_fwd_flag     object        
 7   PULocationID           int32         
 8   DOLocationID           int32         
 9   payment_type           int64         
 10  fare_amount            float64       
 11  extra                  float64       
 12  mta_tax                float64       
 13  tip_amount             float64       
 14  tolls_amount           float64       
 15  improvement_surcharge  float64       
 16  total_amount           float64       
 17  congestion_surcharge   float64       
 18  Airport_fee           

Data dictionary from the official [website](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page):

| Field Name | Description |
|------------|-------------|
| **VendorID** | A code indicating the TPEP provider that provided the record.<br>1 = Creative Mobile Technologies, LLC<br>2 = Curb Mobility, LLC<br>6 = Myle Technologies Inc<br>7 = Helix |
| **tpep_pickup_datetime** | The date and time when the meter was engaged. |
| **tpep_dropoff_datetime** | The date and time when the meter was disengaged. |
| **passenger_count** | The number of passengers in the vehicle. |
| **trip_distance** | The elapsed trip distance in miles reported by the taximeter. |
| **RatecodeID** | The final rate code in effect at the end of the trip.<br>1 = Standard rate<br>2 = JFK<br>3 = Newark<br>4 = Nassau or Westchester<br>5 = Negotiated fare<br>6 = Group ride<br>99 = Null/unknown |
| **store_and_fwd_flag** | This flag indicates whether the trip record was held in vehicle memory before sending to the vendor, aka "store and forward," because the vehicle did not have a connection to the server.<br>Y = store and forward trip<br>N = not a store and forward trip |
| **PULocationID** | TLC Taxi Zone in which the taximeter was engaged. |
| **DOLocationID** | TLC Taxi Zone in which the taximeter was disengaged. |
| **payment_type** | A numeric code signifying how the passenger paid for the trip.<br>0 = Flex Fare trip<br>1 = Credit card<br>2 = Cash<br>3 = No charge<br>4 = Dispute<br>5 = Unknown<br>6 = Voided trip |
| **fare_amount** | The time-and-distance fare calculated by the meter. For additional information on the following columns, see https://www.nyc.gov/site/tlc/passengers/taxi-fare.page |
| **extra** | Miscellaneous extras and surcharges. |
| **mta_tax** | Tax that is automatically triggered based on the metered rate in use. |
| **tip_amount** | Tip amount – This field is automatically populated for credit card tips. Cash tips are not included. |
| **tolls_amount** | Total amount of all tolls paid in trip. |
| **improvement_surcharge** | Improvement surcharge assessed trips at the flag drop. The improvement surcharge began being levied in 2015. |
| **total_amount** | The total amount charged to passengers. Does not include cash tips. |
| **congestion_surcharge** | Total amount collected in trip for NYS congestion surcharge. |
| **airport_fee** | For pick up only at LaGuardia and John F. Kennedy Airports. |
| **cbd_congestion_fee** | Per-trip charge for MTA's Congestion Relief Zone starting Jan. 5, 2025. |

In [4]:
df.describe()

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
count,3898963.0,3898963,3898963,2860208.0,3898963.0,2860208.0,3898963.0,3898963.0,3898963.0,3898963.0,3898963.0,3898963.0,3898963.0,3898963.0,3898963.0,3898963.0,2860208.0,2860208.0,3898963.0
mean,1.887506,2025-07-17 04:53:36.198563,2025-07-17 05:10:42.166691,1.31591,7.104756,3.005781,159.3296,159.0559,0.9270137,18.54769,1.107153,0.4758367,2.686127,0.5004544,0.945986,26.82744,2.145727,0.1587607,0.5356467
min,1.0,2009-01-01 08:52:26,2009-01-01 10:00:26,0.0,0.0,1.0,1.0,1.0,0.0,-1591.3,-7.5,-0.5,-52.75,-112.84,-1.0,-1634.75,-2.5,-1.75,-0.75
25%,2.0,2025-07-09 20:58:59,2025-07-09 21:12:59,1.0,1.06,1.0,114.0,107.0,0.0,8.82,0.0,0.5,0.0,0.0,1.0,15.8,2.5,0.0,0.0
50%,2.0,2025-07-17 09:48:52,2025-07-17 10:07:44,1.0,1.91,1.0,161.0,161.0,1.0,14.2,0.0,0.5,2.0,0.0,1.0,21.42,2.5,0.0,0.75
75%,2.0,2025-07-24 19:51:14.500000,2025-07-24 20:08:03,1.0,3.9,1.0,230.0,231.0,1.0,23.3,2.5,0.5,3.85,0.0,1.0,30.78,2.5,0.0,0.75
max,7.0,2025-07-31 23:59:59,2025-08-03 17:03:02,9.0,397994.4,99.0,265.0,265.0,4.0,2495.0,15.0,5243.38,500.0,204.18,1.0,5297.87,2.5,6.75,1.5
std,0.7337028,,,0.7589974,670.0148,13.49034,66.24882,70.62472,0.7939307,20.07138,1.803865,2.659264,3.935143,2.132647,0.3001838,24.27461,1.008222,0.5558867,0.3599308


In [5]:
df['store_and_fwd_flag'].isna().sum()

np.int64(1038755)

1) The columns 'passenger_count', 'RatecodeID', 'congestion_surcharge', 'Airport_fee' and 'store_and_fwd_flag' each have about a million fewer values than the others because of NaNs (1,038,755 rows, roughly 27% of the data).  
2) Some min and max values look wrong, such as passenger_count = 0, a pickup in the year 2009, or negative tips.
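
The missing-value counts from point 1 can also be tabulated for every column at once instead of reading them off `describe()`. Below is a minimal sketch on a made-up toy frame (the `toy` data and the `missing` table are illustrative names, not part of the notebook); it also counts rows where the listed columns are all missing together, which is one way to check whether the gaps come from the same records.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the trip table; the values are made up for illustration.
toy = pd.DataFrame({
    "passenger_count": [1.0, np.nan, 2.0, np.nan],
    "RatecodeID": [1.0, np.nan, 1.0, np.nan],
    "trip_distance": [1.2, 0.8, 3.4, 2.2],
})

# isna() yields booleans; their sum is the NaN count and their mean is the NaN share.
missing = pd.DataFrame({
    "n_missing": toy.isna().sum(),
    "share_missing": toy.isna().mean(),
})

# Keep only the columns that actually have gaps.
missing = missing[missing["n_missing"] > 0]
print(missing)

# Rows where both sparse columns are missing at the same time.
sparse_cols = ["passenger_count", "RatecodeID"]
n_all_missing = toy[sparse_cols].isna().all(axis=1).sum()
print(n_all_missing)
```

On the real data, the same two expressions would be applied to `df`, with `sparse_cols` set to the five columns named above.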

Let's start with the first problem. RatecodeID has a dedicated value, 99, for null/unknown, so let's use it to fill the gaps.

In [6]:
df['RatecodeID'] = df['RatecodeID'].fillna(99)

In [7]:
df.head(30)

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
0,1,2025-07-01 00:29:37,2025-07-01 00:45:30,1.0,7.3,1.0,N,138,74,1,29.6,7.75,0.5,9.0,6.94,1.0,54.79,0.0,1.75,0.0
1,1,2025-07-01 00:23:28,2025-07-01 01:07:44,1.0,17.7,2.0,N,132,142,1,70.0,4.25,0.5,5.0,0.0,1.0,80.75,2.5,1.75,0.0
2,2,2025-07-01 00:53:50,2025-07-01 01:27:12,1.0,9.98,1.0,N,138,48,1,43.6,6.0,0.5,10.87,0.0,1.0,66.97,2.5,1.75,0.75
3,2,2025-07-01 00:58:49,2025-07-01 01:15:55,1.0,10.27,1.0,N,138,229,1,38.7,6.0,0.5,14.1,6.94,1.0,72.24,2.5,1.75,0.75
4,2,2025-07-01 00:09:22,2025-07-01 00:23:54,1.0,2.94,1.0,N,211,97,1,17.0,1.0,0.5,3.0,0.0,1.0,25.75,2.5,0.0,0.75
5,1,2025-07-01 00:39:14,2025-07-01 00:55:21,1.0,11.8,1.0,N,132,155,1,44.3,1.0,0.5,14.05,0.0,1.0,60.85,0.0,0.0,0.0
6,2,2025-07-01 00:15:26,2025-07-01 00:29:39,1.0,3.87,1.0,N,79,263,1,17.7,1.0,0.5,4.69,0.0,1.0,28.14,2.5,0.0,0.75
7,2,2025-07-01 00:40:58,2025-07-01 00:44:15,1.0,0.85,1.0,N,140,262,1,5.8,1.0,0.5,2.16,0.0,1.0,12.96,2.5,0.0,0.0
8,2,2025-07-01 00:28:12,2025-07-01 00:39:49,2.0,2.54,1.0,N,114,50,1,14.2,1.0,0.5,3.99,0.0,1.0,23.94,2.5,0.0,0.75
9,2,2025-07-01 00:38:17,2025-07-01 00:55:44,1.0,6.37,1.0,N,132,197,1,26.8,1.0,0.5,5.86,0.0,1.0,36.91,0.0,1.75,0.0


Now we found something weird. If you look at rows 23-24, you can see practically the same trip twice, except that the fare and fee columns have opposite signs. We'll remember that, but for now let's focus on simpler things.
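
One possible way to find such mirrored records later is a self-merge of negative-total rows against positive-total rows. This is only a sketch under assumptions: the matching key (vendor, both timestamps, both zones) and the "totals cancel out" criterion are our guesses at what "practically the same trip" means, and the toy rows are made up.

```python
import pandas as pd

# Made-up rows: 0 and 1 mirror each other, 2 is an ordinary trip.
toy = pd.DataFrame({
    "VendorID": [2, 2, 1],
    "tpep_pickup_datetime": pd.to_datetime(
        ["2025-07-01 00:10:00", "2025-07-01 00:10:00", "2025-07-01 00:20:00"]),
    "tpep_dropoff_datetime": pd.to_datetime(
        ["2025-07-01 00:25:00", "2025-07-01 00:25:00", "2025-07-01 00:40:00"]),
    "PULocationID": [161, 161, 138],
    "DOLocationID": [230, 230, 74],
    "total_amount": [-21.42, 21.42, 54.79],
})

# Assumed definition of "the same trip".
key = ["VendorID", "tpep_pickup_datetime", "tpep_dropoff_datetime",
       "PULocationID", "DOLocationID"]

# reset_index() keeps the original row labels in an "index" column.
neg = toy[toy["total_amount"] < 0].reset_index()
pos = toy[toy["total_amount"] > 0].reset_index()

# Pair every negative row with positive rows sharing the same key.
pairs = neg.merge(pos, on=key, suffixes=("_neg", "_pos"))

# Keep only the pairs whose totals cancel out (within a cent).
cancels = (pairs["total_amount_neg"] + pairs["total_amount_pos"]).abs() < 0.01
pairs = pairs[cancels]
print(pairs[["index_neg", "index_pos", "total_amount_neg", "total_amount_pos"]])
```

The `index_neg` and `index_pos` columns give the labels of both halves of each pair, so either half (or both) can be dropped once we decide how to treat them.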

There were dates from 2009 in the table, but the data is supposed to be July 2025. Let's sort it out.

In [8]:
df.sort_values(by='tpep_pickup_datetime').head(10)

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
1328239,2,2009-01-01 08:52:26,2009-01-01 10:00:26,1.0,11.95,1.0,N,138,163,1,64.6,0.0,0.5,16.65,13.88,1.0,99.88,2.5,0.0,0.75
840,2,2025-06-30 22:15:24,2025-06-30 22:25:07,1.0,1.65,1.0,N,234,246,1,11.4,1.0,0.5,3.43,0.0,1.0,20.58,2.5,0.0,0.75
841,2,2025-06-30 22:39:17,2025-06-30 22:42:38,1.0,0.65,1.0,N,234,107,1,5.8,1.0,0.5,2.31,0.0,1.0,13.86,2.5,0.0,0.75
609,2,2025-06-30 23:55:06,2025-07-01 00:14:27,1.0,4.35,1.0,N,234,142,1,23.3,1.0,0.5,5.81,0.0,1.0,34.86,2.5,0.0,0.75
33,2,2025-06-30 23:56:19,2025-07-01 00:01:50,3.0,0.7,1.0,N,161,233,1,7.2,1.0,0.5,3.88,0.0,1.0,16.83,2.5,0.0,0.75
232,2,2025-06-30 23:57:41,2025-07-01 00:19:08,1.0,6.44,1.0,N,114,42,1,27.5,1.0,0.5,8.31,0.0,1.0,41.56,2.5,0.0,0.75
229,2,2025-06-30 23:59:59,2025-07-01 00:07:12,3.0,1.37,1.0,N,238,236,1,9.3,1.0,0.5,1.0,0.0,1.0,15.3,2.5,0.0,0.0
2860286,2,2025-07-01 00:00:00,2025-07-01 00:15:00,,2.89,99.0,,158,87,0,14.53,0.0,0.5,0.0,0.0,1.0,19.28,,,0.75
2860876,2,2025-07-01 00:00:00,2025-07-01 00:16:00,,3.8,99.0,,230,87,0,19.58,0.0,0.5,0.0,0.0,1.0,24.33,,,0.75
2860558,2,2025-07-01 00:00:02,2025-07-01 00:08:54,,1.7,99.0,,230,234,0,-4.75,0.0,0.5,0.0,0.0,1.0,2.22,,,0.75


In [9]:
df.sort_values(by='tpep_pickup_datetime').tail(10)

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
2857730,2,2025-07-31 23:59:54,2025-08-01 00:06:26,1.0,0.97,1.0,N,231,144,1,7.9,1.0,0.5,2.73,0.0,1.0,16.38,2.5,0.0,0.75
2857770,2,2025-07-31 23:59:54,2025-08-01 00:04:23,5.0,0.76,1.0,N,249,158,1,6.5,1.0,0.5,2.45,0.0,1.0,14.7,2.5,0.0,0.75
2857377,2,2025-07-31 23:59:55,2025-08-01 00:10:04,2.0,1.82,1.0,N,48,170,1,11.4,1.0,0.5,3.43,0.0,1.0,20.58,2.5,0.0,0.75
2860042,2,2025-07-31 23:59:56,2025-08-01 00:16:18,1.0,3.23,1.0,N,186,140,1,17.7,1.0,0.5,3.5,0.0,1.0,26.95,2.5,0.0,0.75
3898843,2,2025-07-31 23:59:57,2025-08-01 00:08:13,,1.76,99.0,,90,144,0,10.96,0.0,0.5,0.0,0.0,1.0,15.71,,,0.75
2858041,2,2025-07-31 23:59:57,2025-08-01 00:20:41,3.0,5.74,1.0,N,113,239,2,26.1,1.0,0.5,0.0,0.0,1.0,31.85,2.5,0.0,0.75
3898084,2,2025-07-31 23:59:57,2025-08-01 00:07:40,,1.72,99.0,,130,10,0,9.26,0.0,0.5,0.0,0.0,1.0,10.76,,,0.0
3897860,1,2025-07-31 23:59:58,2025-08-01 00:11:44,,2.6,99.0,,238,74,0,14.02,0.0,0.5,0.0,0.0,1.0,18.02,,,0.0
2859097,2,2025-07-31 23:59:58,2025-08-01 00:08:38,1.0,1.35,1.0,N,114,249,1,10.0,1.0,0.5,3.15,0.0,1.0,18.9,2.5,0.0,0.75
3897951,2,2025-07-31 23:59:59,2025-08-01 00:12:30,,0.98,99.0,,186,137,0,9.79,0.0,0.5,0.0,0.0,1.0,14.54,,,0.75


So only one ride has this serious error. Let's remove it.

In [10]:
# Drop the 2009 record by its index label
df.drop(1328239, inplace=True)
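
Dropping by index label works for a single row, but it depends on that exact label. An alternative is a boolean mask on the pickup time, sketched below on toy rows. The window bounds `start` and `end` are a hypothetical choice (the file's month padded by one day at the start, so the late June 30 pickups stay in), and `inclusive="left"` assumes pandas 1.3 or newer.

```python
import pandas as pd

# Toy pickups; the index labels only mimic the style of the real table.
toy = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2009-01-01 08:52:26",
        "2025-06-30 23:55:06",
        "2025-07-15 12:00:00",
        "2025-07-31 23:59:59",
    ]),
}, index=[1328239, 609, 42, 3897951])

# Hypothetical window: start is inclusive, end is exclusive.
start = pd.Timestamp("2025-06-30")
end = pd.Timestamp("2025-08-01")

in_window = toy["tpep_pickup_datetime"].between(start, end, inclusive="left")

print(toy[~in_window])  # rows that would be removed
toy = toy[in_window]
```

The mask also makes it easy to inspect the rejected rows before removing them.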

Analyzing the dropoff time

In [11]:
df.sort_values(by='tpep_dropoff_datetime').head()

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
840,2,2025-06-30 22:15:24,2025-06-30 22:25:07,1.0,1.65,1.0,N,234,246,1,11.4,1.0,0.5,3.43,0.0,1.0,20.58,2.5,0.0,0.75
841,2,2025-06-30 22:39:17,2025-06-30 22:42:38,1.0,0.65,1.0,N,234,107,1,5.8,1.0,0.5,2.31,0.0,1.0,13.86,2.5,0.0,0.75
1352,7,2025-07-01 00:00:40,2025-07-01 00:00:40,5.0,19.79,5.0,N,132,13,4,85.0,0.0,0.5,0.0,0.0,1.0,91.5,2.5,1.75,0.75
1377,7,2025-07-01 00:01:28,2025-07-01 00:01:28,4.0,0.51,1.0,N,100,230,1,5.8,0.0,0.5,2.31,0.0,1.0,13.86,2.5,0.0,0.75
33,2,2025-06-30 23:56:19,2025-07-01 00:01:50,3.0,0.7,1.0,N,161,233,1,7.2,1.0,0.5,3.88,0.0,1.0,16.83,2.5,0.0,0.75


In [12]:
df.sort_values(by='tpep_dropoff_datetime').tail()

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
2854755,2,2025-07-31 22:39:00,2025-08-01 21:15:50,1.0,20.13,2.0,N,132,143,1,70.0,0.0,0.5,12.52,6.94,1.0,95.96,2.5,1.75,0.75
2850393,2,2025-07-31 21:36:49,2025-08-01 21:28:03,2.0,2.3,1.0,N,161,113,1,14.9,1.0,0.5,0.0,0.0,1.0,20.65,2.5,0.0,0.75
2854160,2,2025-07-31 22:27:10,2025-08-01 21:59:56,1.0,1.81,1.0,N,107,158,1,12.1,1.0,0.5,3.57,0.0,1.0,21.42,2.5,0.0,0.75
2854416,2,2025-07-31 22:24:29,2025-08-01 22:24:04,2.0,2.65,1.0,N,79,230,1,18.4,1.0,0.5,4.83,0.0,1.0,28.98,2.5,0.0,0.75
2796403,1,2025-07-31 12:16:42,2025-08-03 17:03:02,1.0,0.0,99.0,N,222,222,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


The last ride took a little over 3 days, but that isn't too bad. Let's check for date/time inconsistencies.

In [13]:
df[df['tpep_dropoff_datetime'] < df['tpep_pickup_datetime']]

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
275438,1,2025-07-04 09:30:00,2025-07-04 09:19:27,1.0,1.9,99.0,N,133,22,1,19.5,0.0,0.5,0.0,0.0,0.0,20.0,0.0,0.0,0.0


We've got a time traveller. Let's remove it.

In [14]:
# Drop the record whose dropoff precedes its pickup
df.drop(275438, inplace=True)
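
The dropoff checks can also be expressed through a single derived duration column. Here is a small sketch, using three pairs of timestamps copied from the rows shown above; the 24-hour threshold and the names `minutes` and `summary` are our own illustrative choices.

```python
import pandas as pd

# Timestamps copied from rows 275438, 1352 and 2796403 shown above.
toy = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime(
        ["2025-07-04 09:30:00", "2025-07-01 00:00:40", "2025-07-31 12:16:42"]),
    "tpep_dropoff_datetime": pd.to_datetime(
        ["2025-07-04 09:19:27", "2025-07-01 00:00:40", "2025-08-03 17:03:02"]),
}, index=[275438, 1352, 2796403])

# Subtracting datetimes gives a timedelta; convert it to minutes.
duration = toy["tpep_dropoff_datetime"] - toy["tpep_pickup_datetime"]
minutes = duration.dt.total_seconds() / 60

# Count the suspicious cases in one place.
summary = pd.Series({
    "negative": (minutes < 0).sum(),
    "zero": (minutes == 0).sum(),
    "over_24h": (minutes > 24 * 60).sum(),  # assumed threshold
})
print(minutes)
print(summary)
```

With such a column on `df`, each class of problem becomes a one-line filter instead of a separate sort.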

Let's see if there were rides that lasted zero seconds.

In [15]:
no_time_rides = df[df['tpep_dropoff_datetime'] == df['tpep_pickup_datetime']]
no_time_rides

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
10,7,2025-07-01 00:44:56,2025-07-01 00:44:56,1.0,0.00,1.0,N,233,233,3,3.70,0.0,0.5,0.00,0.00,1.0,9.45,2.5,0.0,0.75
607,7,2025-07-01 00:02:38,2025-07-01 00:02:38,1.0,3.75,1.0,N,141,144,1,16.30,0.0,0.5,5.51,0.00,1.0,27.56,2.5,0.0,0.75
635,7,2025-07-01 00:30:33,2025-07-01 00:30:33,1.0,3.31,1.0,N,79,141,1,14.20,0.0,0.5,3.99,0.00,1.0,23.94,2.5,0.0,0.75
648,7,2025-07-01 00:46:07,2025-07-01 00:46:07,1.0,2.24,1.0,N,79,229,1,10.70,0.0,0.5,3.29,0.00,1.0,19.74,2.5,0.0,0.75
654,7,2025-07-01 00:44:27,2025-07-01 00:44:27,2.0,3.97,1.0,N,114,97,1,20.50,0.0,0.5,9.96,6.94,1.0,43.15,2.5,0.0,0.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3810727,2,2025-07-29 19:10:33,2025-07-29 19:10:33,,0.97,99.0,,164,164,0,18.37,0.0,0.5,0.00,0.00,1.0,23.12,,,0.75
3825496,2,2025-07-30 07:33:00,2025-07-30 07:33:00,,0.02,99.0,,65,65,0,20.83,0.0,0.5,0.00,0.00,1.0,22.33,,,0.00
3827384,2,2025-07-30 08:10:00,2025-07-30 08:10:00,,0.05,99.0,,170,170,0,14.98,0.0,0.5,0.00,0.00,1.0,19.73,,,0.75
3841814,1,2025-07-30 18:21:24,2025-07-30 18:21:24,,0.00,99.0,,246,246,0,18.36,0.0,0.5,0.00,0.00,1.0,23.11,,,0.75


56063 rides, which is not a small number. Maybe they were cancelled, but in that case the trip distance should be 0, so let's keep only the zero-duration rides with zero distance.

In [16]:
# Keep a ride if it has a non-zero duration or a zero distance
df = df[(df['tpep_dropoff_datetime'] != df['tpep_pickup_datetime']) | (df['trip_distance'] == 0)]
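
The condition above is easier to read when turned around: a ride is removed only if it has zero duration and a non-zero distance. Below is a sketch of that equivalent form with named masks, on three rows whose timestamps and distances are copied from the outputs above (`zero_duration`, `moved` and `impossible` are illustrative names).

```python
import pandas as pd

# Values copied from rows 10, 607 and 4 shown above.
toy = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime(
        ["2025-07-01 00:44:56", "2025-07-01 00:02:38", "2025-07-01 00:09:22"]),
    "tpep_dropoff_datetime": pd.to_datetime(
        ["2025-07-01 00:44:56", "2025-07-01 00:02:38", "2025-07-01 00:23:54"]),
    "trip_distance": [0.00, 3.75, 2.94],
}, index=[10, 607, 4])

zero_duration = toy["tpep_dropoff_datetime"] == toy["tpep_pickup_datetime"]
moved = toy["trip_distance"] != 0

# Impossible: the meter ran for zero seconds but a distance was recorded.
impossible = zero_duration & moved
kept = toy[~impossible]

# Same result as the original "non-zero duration OR zero distance" filter.
original = toy[~zero_duration | (toy["trip_distance"] == 0)]
print(kept)
```

Naming the masks also gives us `impossible.sum()` for free, which tells how many rows the filter removes.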

In [17]:
no_time_rides = df[df['tpep_dropoff_datetime'] == df['tpep_pickup_datetime']]
no_time_rides

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
10,7,2025-07-01 00:44:56,2025-07-01 00:44:56,1.0,0.0,1.0,N,233,233,3,3.70,0.0,0.5,0.00,0.00,1.0,9.45,2.5,0.0,0.75
5383,7,2025-07-01 06:44:33,2025-07-01 06:44:33,1.0,0.0,1.0,N,234,230,2,7.20,0.0,0.5,0.00,0.00,1.0,11.95,2.5,0.0,0.75
6883,7,2025-07-01 07:18:38,2025-07-01 07:18:38,1.0,0.0,1.0,N,170,186,2,7.90,0.0,0.5,0.00,0.00,1.0,12.65,2.5,0.0,0.75
10298,7,2025-07-01 08:00:30,2025-07-01 08:00:30,1.0,0.0,1.0,N,161,161,1,6.50,0.0,0.5,3.38,0.00,1.0,14.63,2.5,0.0,0.75
10299,7,2025-07-01 08:25:43,2025-07-01 08:25:43,1.0,0.0,1.0,N,163,161,1,6.50,0.0,0.5,3.64,6.94,1.0,21.83,2.5,0.0,0.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3745860,2,2025-07-27 16:30:00,2025-07-27 16:30:00,,0.0,99.0,,41,41,0,13.64,0.0,0.5,0.00,0.00,1.0,15.14,,,0.00
3768919,2,2025-07-28 12:49:00,2025-07-28 12:49:00,,0.0,99.0,,113,113,0,4.64,0.0,0.5,0.00,0.00,1.0,9.39,,,0.75
3784920,2,2025-07-28 22:16:00,2025-07-28 22:16:00,,0.0,99.0,,158,158,0,-4.75,0.0,0.5,0.00,0.00,1.0,5.00,,,0.75
3788929,1,2025-07-29 06:56:17,2025-07-29 06:56:17,,0.0,99.0,,239,239,0,17.18,0.0,0.5,0.00,0.00,1.0,21.93,,,0.75


Now there are far fewer of them, and they still have inconsistencies, such as different PU and DO locations, so we can just remove them.

In [18]:
# Drop the remaining zero-duration rides
df = df[(df['tpep_dropoff_datetime'] != df['tpep_pickup_datetime'])]

In [19]:
df

Unnamed: 0,VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,congestion_surcharge,Airport_fee,cbd_congestion_fee
0,1,2025-07-01 00:29:37,2025-07-01 00:45:30,1.0,7.30,1.0,N,138,74,1,29.60,7.75,0.5,9.00,6.94,1.0,54.79,0.0,1.75,0.00
1,1,2025-07-01 00:23:28,2025-07-01 01:07:44,1.0,17.70,2.0,N,132,142,1,70.00,4.25,0.5,5.00,0.00,1.0,80.75,2.5,1.75,0.00
2,2,2025-07-01 00:53:50,2025-07-01 01:27:12,1.0,9.98,1.0,N,138,48,1,43.60,6.00,0.5,10.87,0.00,1.0,66.97,2.5,1.75,0.75
3,2,2025-07-01 00:58:49,2025-07-01 01:15:55,1.0,10.27,1.0,N,138,229,1,38.70,6.00,0.5,14.10,6.94,1.0,72.24,2.5,1.75,0.75
4,2,2025-07-01 00:09:22,2025-07-01 00:23:54,1.0,2.94,1.0,N,211,97,1,17.00,1.00,0.5,3.00,0.00,1.0,25.75,2.5,0.00,0.75
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
3898958,2,2025-07-31 23:43:43,2025-07-31 23:47:49,,0.64,99.0,,100,170,0,6.39,0.00,0.5,0.00,0.00,1.0,11.14,,,0.75
3898959,2,2025-07-31 23:05:39,2025-07-31 23:34:07,,6.19,99.0,,249,265,0,-15.81,0.00,0.0,0.00,14.06,1.0,10.00,,,0.75
3898960,2,2025-07-31 23:08:32,2025-07-31 23:34:06,,7.80,99.0,,151,79,0,32.53,0.00,0.5,0.00,0.00,1.0,37.28,,,0.75
3898961,2,2025-07-31 23:13:56,2025-07-31 23:25:09,,2.09,99.0,,148,170,0,12.17,0.00,0.5,0.00,0.00,1.0,16.92,,,0.75
