# Dataset generation

In [1]:
%load_ext autoreload
%autoreload 2


import data_extractor as de
import constants as c

de.generate_dataset(
    c.DATASET.YELLOW_TAXI.PATTERN,
    start_date="01-12-1959",
    end_date="01-10-2020",
    verbose=True
)

file yellow_tripdata_2023_01.parquet filtered out because 1959-12-01 > 2023-01-01 or  2023-01-01 > 2020-10-01
file yellow_tripdata_2023_02.parquet filtered out because 1959-12-01 > 2023-02-01 or  2023-02-01 > 2020-10-01
file yellow_tripdata_2023_03.parquet filtered out because 1959-12-01 > 2023-03-01 or  2023-03-01 > 2020-10-01
file yellow_tripdata_2023_04.parquet filtered out because 1959-12-01 > 2023-04-01 or  2023-04-01 > 2020-10-01
file yellow_tripdata_2023_05.parquet filtered out because 1959-12-01 > 2023-05-01 or  2023-05-01 > 2020-10-01
file yellow_tripdata_2023_06.parquet filtered out because 1959-12-01 > 2023-06-01 or  2023-06-01 > 2020-10-01
file yellow_tripdata_2023_07.parquet filtered out because 1959-12-01 > 2023-07-01 or  2023-07-01 > 2020-10-01
file yellow_tripdata_2023_08.parquet filtered out because 1959-12-01 > 2023-08-01 or  2023-08-01 > 2020-10-01
file yellow_tripdata_2023_09.parquet filtered out because 1959-12-01 > 2023-09-01 or  2023-09-01 > 2020-10-01
file yello

Downloading data in parallel using 12 threads to build yellow_tripdata dataset: 0it [00:00, ?it/s]
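
The log above suggests that a file is kept only when the month in its name falls between `start_date` and `end_date`, with both dates written as `DD-MM-YYYY`. Below is a minimal sketch of such a filter. It assumes the file name shape `yellow_tripdata_YYYY_MM.parquet` seen in the log. The helpers `parse_ddmmyyyy`, `file_month`, and `in_range` are hypothetical names for illustration and are not the `data_extractor` API.

```python
import re
from datetime import date, datetime
from typing import Optional

# Hypothetical helpers for illustration; not the data_extractor API.
FILE_RE = re.compile(r"_(\d{4})_(\d{2})\.parquet$")

def parse_ddmmyyyy(text: str) -> date:
    """Parse a 'DD-MM-YYYY' string such as '01-10-2020'."""
    return datetime.strptime(text, "%d-%m-%Y").date()

def file_month(filename: str) -> Optional[date]:
    """Return the first day of the month encoded in the file name, or None."""
    match = FILE_RE.search(filename)
    if match is None:
        return None
    return date(int(match.group(1)), int(match.group(2)), 1)

def in_range(filename: str, start: str, end: str) -> bool:
    """True when the file's month lies within [start, end], bounds included."""
    month = file_month(filename)
    if month is None:
        return False
    return parse_ddmmyyyy(start) <= month <= parse_ddmmyyyy(end)

print(in_range("yellow_tripdata_2023_01.parquet", "01-12-1959", "01-10-2020"))
print(in_range("yellow_tripdata_2020_01.parquet", "01-12-1959", "01-10-2020"))
```

With the range used in the cell above, every 2023 file falls after the end date, which is consistent with the "filtered out" messages.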


# Data transformation



| Column                 | Description                                                                                                                                                             |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| VendorID               | A code indicating the TPEP provider that supplied the record. 1= Creative Mobile Technologies, LLC; 2= VeriFone Inc.                                                    |
| tpep_pickup_datetime   | Date and time when the meter was engaged.                                                                                                                               |
| tpep_dropoff_datetime  | Date and time when the meter was disengaged.                                                                                                                            |
| Passenger_count        | Number of passengers in the vehicle. Driver-entered value.                                                                                                              |
| Trip_distance          | Trip distance in miles, reported by the taximeter.                                                                                                                      |
| PULocationID           | ID of the TLC Taxi Zone in which the meter was engaged.                                                                                                                 |
| DOLocationID           | ID of the TLC Taxi Zone in which the meter was disengaged.                                                                                                              |
| RateCodeID             | Final rate code in effect at the end of the trip. 1= Standard rate, 2= JFK, 3= Newark, 4= Nassau or Westchester, 5= Negotiated fare, 6= Group ride.                     |
| Store_and_fwd_flag     | Indicates whether the trip record was held in vehicle memory before being sent to the vendor ("store and forward") because the vehicle had no connection to the server. |
| Payment_type           | Numeric code indicating how the passenger paid. 1= Credit card, 2= Cash, 3= No charge, 4= Dispute, 5= Unknown, 6= Voided trip.                                          |
| Fare_amount            | Time-and-distance fare calculated by the meter.                                                                                                                         |
| Extra                  | Miscellaneous extras and surcharges. Currently includes the $0.50 and $1 rush hour and overnight charges.                                                               |
| MTA_tax                | $0.50 MTA tax that is automatically triggered based on the metered rate in use.                                                                                         |
| Improvement_surcharge  | $0.30 improvement surcharge assessed on trips at the flag drop. The surcharge began being levied in 2015.                                                               |
| Tip_amount             | Tip amount. This field is automatically populated for credit card tips. Cash tips are not included.                                                                     |
| Tolls_amount           | Total of all tolls paid in the trip.                                                                                                                                    |
| Total_amount           | Total amount charged to passengers. Does not include cash tips.                                                                                                         |
| Congestion_Surcharge   | Total amount collected in the trip for the New York State congestion surcharge.                                                                                         |
| Airport_fee            | $1.25 fee for pickups only at LaGuardia and John F. Kennedy airports.                                                                                                   |



------------------


| Column                 | PySpark type       | Rationale                                                                                       |
|------------------------|--------------------|-------------------------------------------------------------------------------------------------|
| VendorID               | IntegerType        | Numeric code that identifies the provider.                                                      |
| tpep_pickup_datetime   | TimestampType      | Stores date and time, useful for time-based and time-series analysis.                           |
| tpep_dropoff_datetime  | TimestampType      | Same as 'tpep_pickup_datetime', but for the end of the trip.                                    |
| Passenger_count        | IntegerType        | The number of passengers must be a whole number.                                                |
| Trip_distance          | FloatType          | The trip distance can be a decimal value.                                                       |
| PULocationID           | IntegerType        | Pickup location ID, represented as an integer.                                                  |
| DOLocationID           | IntegerType        | Drop-off location ID, same as 'PULocationID'.                                                   |
| RateCodeID             | IntegerType        | The rate code is a numeric value.                                                               |
| Store_and_fwd_flag     | StringType         | Holds 'Y' or 'N', so a string is appropriate.                                                   |
| Payment_type           | IntegerType        | Numeric codes that indicate the payment method.                                                 |
| Fare_amount            | FloatType          | The fare can be a decimal value, so a floating-point type is used.                              |
| Extra                  | FloatType          | Same as 'Fare_amount': a decimal value.                                                         |
| MTA_tax                | FloatType          | The MTA tax can be a decimal value.                                                             |
| Improvement_surcharge  | FloatType          | The improvement surcharge is a decimal value.                                                   |
| Tip_amount             | FloatType          | The tip can be a decimal value.                                                                 |
| Tolls_amount           | FloatType          | The toll total is a decimal value.                                                              |
| Total_amount           | FloatType          | The total charged to passengers is a decimal value.                                             |
| Congestion_Surcharge   | FloatType          | The congestion surcharge is a decimal value.                                                    |
| Airport_fee            | FloatType          | The airport fee can be a decimal value.                                                         |
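
The mapping in the table above can be written down once and reused. The sketch below stores it as a plain dictionary and renders it as a DDL-style schema string (`name TYPE, name TYPE, ...`), a form that Spark accepts wherever a schema string is allowed. The names `EXPECTED_TYPES` and `to_ddl` are hypothetical; the real schema lives in `c.DATASET.YELLOW_TAXI.SQUEMA`, whose format is not shown here. Lowercase column names are assumed, matching the notebook output.

```python
# Hypothetical mapping of column name -> Spark SQL type name,
# transcribed from the table above (IntegerType -> INT, FloatType -> FLOAT, ...).
EXPECTED_TYPES = {
    "vendorid": "INT",
    "tpep_pickup_datetime": "TIMESTAMP",
    "tpep_dropoff_datetime": "TIMESTAMP",
    "passenger_count": "INT",
    "trip_distance": "FLOAT",
    "pulocationid": "INT",
    "dolocationid": "INT",
    "ratecodeid": "INT",
    "store_and_fwd_flag": "STRING",
    "payment_type": "INT",
    "fare_amount": "FLOAT",
    "extra": "FLOAT",
    "mta_tax": "FLOAT",
    "improvement_surcharge": "FLOAT",
    "tip_amount": "FLOAT",
    "tolls_amount": "FLOAT",
    "total_amount": "FLOAT",
    "congestion_surcharge": "FLOAT",
    "airport_fee": "FLOAT",
}

def to_ddl(types: dict) -> str:
    """Render the mapping as a DDL-style schema string."""
    return ", ".join(f"{name} {spark_type}" for name, spark_type in types.items())

ddl = to_ddl(EXPECTED_TYPES)
print(ddl)
```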


In [1]:
%load_ext autoreload
%autoreload 2


from data_transform import DataTransformer
import constants as c

dt = DataTransformer()

df = dt.load_data(path_parquet="../datasets/yellow_tripdata/2020/01.parquet")

# enforce schema
df = dt.enforce_schema(df, c.DATASET.YELLOW_TAXI.SQUEMA, verbose=True)
df.printSchema()

Column 'vendorid' has type 'LongType()' in DataFrame, but type 'IntegerType()' in schema.
Column 'tpep_pickup_datetime' has type 'TimestampNTZType()' in DataFrame, but type 'TimestampType()' in schema.
Column 'tpep_dropoff_datetime' has type 'TimestampNTZType()' in DataFrame, but type 'TimestampType()' in schema.
Column 'passenger_count' has type 'DoubleType()' in DataFrame, but type 'IntegerType()' in schema.
Column 'trip_distance' has type 'DoubleType()' in DataFrame, but type 'FloatType()' in schema.
Column 'ratecodeid' has type 'DoubleType()' in DataFrame, but type 'IntegerType()' in schema.
Column 'pulocationid' has type 'LongType()' in DataFrame, but type 'IntegerType()' in schema.
Column 'dolocationid' has type 'LongType()' in DataFrame, but type 'IntegerType()' in schema.
Column 'payment_type' has type 'LongType()' in DataFrame, but type 'IntegerType()' in schema.
Column 'fare_amount' has type 'DoubleType()' in DataFrame, but type 'FloatType()' in schema.
Column 'extra' has typ
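
The messages above compare each column's actual type with the type requested by the schema. A minimal sketch of that comparison in plain Python follows. It assumes types are given as the short names that PySpark's `df.dtypes` reports (for example `bigint` for `LongType` and `int` for `IntegerType`). `find_mismatches` is a hypothetical helper, not the implementation of `enforce_schema`.

```python
def find_mismatches(actual: dict, expected: dict) -> list:
    """Return (column, actual_type, expected_type) for each differing column."""
    return [
        (name, actual[name], wanted)
        for name, wanted in expected.items()
        if name in actual and actual[name] != wanted
    ]

# Illustrative subset. The first three entries mirror the messages above;
# the type of store_and_fwd_flag is an assumption added to show a match.
actual = {
    "vendorid": "bigint",
    "passenger_count": "double",
    "trip_distance": "double",
    "store_and_fwd_flag": "string",
}
expected = {
    "vendorid": "int",
    "passenger_count": "int",
    "trip_distance": "float",
    "store_and_fwd_flag": "string",
}

for name, have, want in find_mismatches(actual, expected):
    print(f"Column '{name}' has type '{have}' in DataFrame, but type '{want}' in schema.")
```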

In [3]:
# Show a summary of null values per column
dt.show_nulls_per_column(df)

vendorid: 0 | 0.0%
tpep_pickup_datetime: 0 | 0.0%
tpep_dropoff_datetime: 0 | 0.0%
passenger_count: 65441 | 1.022%
trip_distance: 0 | 0.0%
ratecodeid: 65441 | 1.022%
store_and_fwd_flag: 65441 | 1.022%
pulocationid: 0 | 0.0%
dolocationid: 0 | 0.0%
payment_type: 0 | 0.0%
fare_amount: 0 | 0.0%
extra: 0 | 0.0%
mta_tax: 0 | 0.0%
tip_amount: 0 | 0.0%
tolls_amount: 0 | 0.0%
improvement_surcharge: 0 | 0.0%
total_amount: 0 | 0.0%
congestion_surcharge: 65441 | 1.022%
airport_fee: 6405008 | 100.0%
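
Each line of the summary above pairs a null count with its share of the total row count (6405008 rows, since `airport_fee` is 100% null with that many nulls). The sketch below reproduces the formatting from already-computed counts. `null_summary` is a hypothetical helper, and rounding to three decimals is inferred from the output rather than taken from `show_nulls_per_column`.

```python
def null_summary(null_counts: dict, total_rows: int) -> list:
    """Format 'column: nulls | percent%' lines from per-column null counts."""
    lines = []
    for name, nulls in null_counts.items():
        # Guard against an empty dataset to avoid dividing by zero.
        pct = round(100 * nulls / total_rows, 3) if total_rows else 0.0
        lines.append(f"{name}: {nulls} | {pct}%")
    return lines

counts = {"vendorid": 0, "passenger_count": 65441, "airport_fee": 6405008}
for line in null_summary(counts, total_rows=6405008):
    print(line)
```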


Drop the columns in which more than 30% of the values are null.

In [4]:
df = dt.drop_null_columns(df, threshold_percentage=30)
df.show(5)

Deleting columns that exceed 30% of null values
Column airport_fee marked to remove, 100.0% of null values found
+--------+--------------------+---------------------+---------------+-------------+----------+------------------+------------+------------+------------+-----------+-----+-------+----------+------------+---------------------+------------+--------------------+
|vendorid|tpep_pickup_datetime|tpep_dropoff_datetime|passenger_count|trip_distance|ratecodeid|store_and_fwd_flag|pulocationid|dolocationid|payment_type|fare_amount|extra|mta_tax|tip_amount|tolls_amount|improvement_surcharge|total_amount|congestion_surcharge|
+--------+--------------------+---------------------+---------------+-------------+----------+------------------+------------+------------+------------+-----------+-----+-------+----------+------------+---------------------+------------+--------------------+
|       1| 2020-01-01 00:28:15|  2020-01-01 00:33:03|              1|          1.2|         1|                
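
The selection step behind this cell can be sketched in a few lines. Given each column's null percentage, it returns the columns that exceed the threshold. `columns_to_drop` is a hypothetical helper, and the strict comparison (`>`) is an assumption based on the word "exceed" in the log message, not on the code of `drop_null_columns`.

```python
def columns_to_drop(null_pct: dict, threshold_percentage: float) -> list:
    """Return the columns whose null percentage is strictly above the threshold."""
    return [name for name, pct in null_pct.items() if pct > threshold_percentage]

# Percentages taken from the null summary printed earlier in the notebook.
null_pct = {
    "passenger_count": 1.022,
    "ratecodeid": 1.022,
    "congestion_surcharge": 1.022,
    "airport_fee": 100.0,
}
print(columns_to_drop(null_pct, threshold_percentage=30))
```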

Total_amount is described as "The total amount charged to passengers. Does not include cash tips." I therefore treat it as the sum of the following columns:
* Fare_amount: The time-and-distance fare calculated by the meter.
* Extra: Miscellaneous extras and surcharges. Currently, this only includes the $0.50 and $1 rush hour and overnight charges.
* MTA_tax: $0.50 MTA tax that is automatically triggered based on the metered rate in use.
* Improvement_surcharge: $0.30 improvement surcharge assessed on trips at the flag drop. The improvement surcharge began being levied in 2015.
* Tip_amount: Tip amount. This field is automatically populated for credit card tips. Cash tips are not included.
* Tolls_amount: Total amount of all tolls paid in the trip.
* Congestion_Surcharge: Total amount collected in the trip for the NYS congestion surcharge.
* Airport_fee: $1.25 for pickups only at LaGuardia and John F. Kennedy airports.

So:

```total_amount = Fare_amount + Extra + MTA_tax + Improvement_surcharge + Tip_amount + Tolls_amount + Congestion_Surcharge + Airport_fee```

Since Airport_fee was dropped because all of its values were null, the formula becomes:


```total_amount = Fare_amount + Extra + MTA_tax + Improvement_surcharge + Tip_amount + Tolls_amount + Congestion_Surcharge```
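
As a worked instance of the formula, the sketch below sums the seven components for a single trip in plain Python. One caveat applies: in Spark SQL, adding NULL to a number yields NULL, and `congestion_surcharge` is null in about 1% of rows, so how nulls are handled matters. Treating a missing value as 0.0 here is an assumption, and `compute_total` is a hypothetical helper, not `create_sum_col`.

```python
import math

# Columns that make up the total once airport_fee has been dropped.
COMPONENTS = [
    "fare_amount", "extra", "mta_tax", "improvement_surcharge",
    "tip_amount", "tolls_amount", "congestion_surcharge",
]

def compute_total(row: dict) -> float:
    """Sum the fare components, treating a missing (None) value as 0.0."""
    values = [row.get(name) or 0.0 for name in COMPONENTS]
    # Round to cents so float noise does not leak into the result.
    return round(math.fsum(values), 2)

# Made-up trip for illustration; these numbers are not taken from the dataset.
trip = {
    "fare_amount": 10.0, "extra": 0.5, "mta_tax": 0.5,
    "improvement_surcharge": 0.3, "tip_amount": 2.0,
    "tolls_amount": 0.0, "congestion_surcharge": 2.5,
}
print(compute_total(trip))
```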


In [5]:
# check whether total_amount is correctly computed
cols_to_sum = ["fare_amount", "extra", "mta_tax", "tip_amount", "tolls_amount", "improvement_surcharge", "congestion_surcharge"]
df = dt.create_sum_col(df, cols_to_sum)
df.show(5)

+--------+--------------------+---------------------+---------------+-------------+----------+------------------+------------+------------+------------+-----------+-----+-------+----------+------------+---------------------+------------+--------------------+---------------------+
|vendorid|tpep_pickup_datetime|tpep_dropoff_datetime|passenger_count|trip_distance|ratecodeid|store_and_fwd_flag|pulocationid|dolocationid|payment_type|fare_amount|extra|mta_tax|tip_amount|tolls_amount|improvement_surcharge|total_amount|congestion_surcharge|total_amount_computed|
+--------+--------------------+---------------------+---------------+-------------+----------+------------------+------------+------------+------------+-----------+-----+-------+----------+------------+---------------------+------------+--------------------+---------------------+
|       1| 2020-01-01 00:28:15|  2020-01-01 00:33:03|              1|          1.2|         1|                 N|         238|         239|           1|     

As the output shows, the values in the total_amount column do not always match those in total_amount_computed. The approach taken here is to drop total_amount and rename total_amount_computed to total_amount.
This choice may not be appropriate in every context. For example, if the recomputed total were used to bill customers, it could amount to an extra charge and, in turn, reduce demand for taxis.

Solving the problem properly would require knowing how total_amount was originally calculated and locating the error, so that a consistent fix can be applied. Since that information is not available, I consider the replacement a reasonable option.
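
Before replacing the column, it can help to measure how often the two totals actually disagree. The sketch below counts mismatches with a small absolute tolerance, because 32-bit floats (Spark's `FloatType`) can differ by rounding even when the amounts agree. The one-cent tolerance, the helper `count_mismatches`, and the sample pairs are all assumptions for illustration.

```python
import math

def count_mismatches(pairs, tolerance: float = 0.01) -> int:
    """Count (reported, computed) pairs that differ by more than the tolerance."""
    return sum(
        1
        for reported, computed in pairs
        if not math.isclose(reported, computed, abs_tol=tolerance)
    )

# Made-up (total_amount, total_amount_computed) pairs, not real rows.
pairs = [
    (10.0, 10.0),        # exact match
    (12.5, 15.0),        # genuine disagreement
    (15.8, 15.800001),   # rounding noise only
]
print(count_mismatches(pairs))
```

A low mismatch rate would point to rounding or a few bad records, while a high rate would suggest that the formula itself differs from the one used to produce total_amount.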

In [6]:
df = dt.replace_column(df, col_to_replace="total_amount", replace_with="total_amount_computed")
df.show(5)

+--------+--------------------+---------------------+---------------+-------------+----------+------------------+------------+------------+------------+-----------+-----+-------+----------+------------+---------------------+--------------------+------------+
|vendorid|tpep_pickup_datetime|tpep_dropoff_datetime|passenger_count|trip_distance|ratecodeid|store_and_fwd_flag|pulocationid|dolocationid|payment_type|fare_amount|extra|mta_tax|tip_amount|tolls_amount|improvement_surcharge|congestion_surcharge|total_amount|
+--------+--------------------+---------------------+---------------+-------------+----------+------------------+------------+------------+------------+-----------+-----+-------+----------+------------+---------------------+--------------------+------------+
|       1| 2020-01-01 00:28:15|  2020-01-01 00:33:03|              1|          1.2|         1|                 N|         238|         239|           1|        6.0|  3.0|    0.5|      1.47|         0.0|                  0.3

# DEBUG TESTS

In [21]:
import os
import sys

# add the test directory to sys.path
test_dir = "../test/"
if test_dir not in sys.path:
    sys.path.append(test_dir)

import test_data_extractor as tde

tde.test_convert_str_date_to_date_valid_date()

In [2]:
import os
import sys

# add the test directory to sys.path
test_dir = "../test/"
if test_dir not in sys.path:
    sys.path.append(test_dir)

import test_data_transform as tdt

tdt.test_cast_column()

Py4JJavaError: An error occurred while calling o274.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 9 in stage 1.0 failed 1 times, most recent failure: Lost task 9.0 in stage 1.0 (TID 10) (192.168.1.21 executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:203)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:109)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:124)
	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:174)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:67)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.net.SocketTimeoutException: Accept timed out
	at java.base/sun.nio.ch.NioSocketImpl.timedAccept(NioSocketImpl.java:701)
	at java.base/sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:745)
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:698)
	at java.base/java.net.ServerSocket.platformImplAccept(ServerSocket.java:663)
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:639)
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:585)
	at java.base/java.net.ServerSocket.accept(ServerSocket.java:543)
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:190)
	... 34 more

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2844)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2780)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2779)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2779)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1242)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1242)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1242)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3048)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2982)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2971)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: org.apache.spark.SparkException: Python worker failed to connect back.
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:203)
	at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:109)
	at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:124)
	at org.apache.spark.api.python.BasePythonRunner.compute(PythonRunner.scala:174)
	at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:67)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
	at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.net.SocketTimeoutException: Accept timed out
	at java.base/sun.nio.ch.NioSocketImpl.timedAccept(NioSocketImpl.java:701)
	at java.base/sun.nio.ch.NioSocketImpl.accept(NioSocketImpl.java:745)
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:698)
	at java.base/java.net.ServerSocket.platformImplAccept(ServerSocket.java:663)
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:639)
	at java.base/java.net.ServerSocket.implAccept(ServerSocket.java:585)
	at java.base/java.net.ServerSocket.accept(ServerSocket.java:543)
	at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:190)
	... 34 more
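
The trace ends in "Python worker failed to connect back", which means the JVM timed out while waiting for a Python worker process to start. One commonly reported cause is that Spark launches a different Python interpreter from the one running the notebook. The trace alone does not confirm that this is the cause here. The sketch below pins both the driver and the workers to the current interpreter through the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables. It would need to run before the Spark session is created.

```python
import os
import sys

# Use the interpreter that runs this notebook; fall back to "python"
# in the unusual case where sys.executable is empty.
python_path = sys.executable or "python"

# Point both the Spark driver and the Python workers at the same interpreter.
os.environ["PYSPARK_PYTHON"] = python_path
os.environ["PYSPARK_DRIVER_PYTHON"] = python_path

print(os.environ["PYSPARK_PYTHON"])
```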
