Project Scenario
You are a data engineer at a data analytics consulting company. You have been assigned to a project that aims to de-congest the national highways by analyzing road traffic data from different toll plazas. Each highway is operated by a different toll operator with its own IT setup, and each setup uses a different file format. In the first hands-on lab, your job is to collect the data available in these different formats and consolidate it into a single file.
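To make the consolidation task concrete, here is a minimal sketch. It assumes, purely for illustration, that one operator supplies comma-separated data and another supplies tab-separated data; the scenario does not name the actual formats. The sample rows and the function names `read_delimited` and `consolidate` are hypothetical.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimited text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


def consolidate(sources):
    """Merge rows from several (text, delimiter) sources into one CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for text, delimiter in sources:
        writer.writerows(read_delimited(text, delimiter))
    return out.getvalue()


# Hypothetical sample data: the formats and values are assumptions.
csv_data = "1,car,4001\n2,truck,4002\n"
tsv_data = "3\tvan\t4003\n"

combined = consolidate([(csv_data, ","), (tsv_data, "\t")])
print(combined)
```

In the lab itself you would read from the operators' files rather than from in-memory strings, but the idea is the same: normalize every source to a common row shape, then write all rows to one output.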
As a vehicle passes a toll plaza, the vehicle's data, such as vehicle_id, vehicle_type, toll_plaza_id, and timestamp, is streamed to Kafka. In the second hands-on lab, your job is to create a data pipeline that collects the streaming data and loads it into a database.
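The loading step of such a pipeline can be sketched as follows. This is only an illustration under several assumptions: messages are assumed to be JSON-encoded, a plain Python list stands in for the Kafka consumer, and an in-memory SQLite database stands in for the target database. The table name `toll_events`, the function `load_messages`, and the sample values are hypothetical; only the four field names come from the scenario.

```python
import json
import sqlite3


def load_messages(messages, conn):
    """Insert toll events from an iterable of JSON strings; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS toll_events ("
        "vehicle_id INTEGER, vehicle_type TEXT, "
        "toll_plaza_id INTEGER, timestamp TEXT)"
    )
    rows = []
    for raw in messages:
        event = json.loads(raw)
        rows.append(
            (
                event["vehicle_id"],
                event["vehicle_type"],
                event["toll_plaza_id"],
                event["timestamp"],
            )
        )
    conn.executemany("INSERT INTO toll_events VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Hypothetical messages standing in for what a Kafka consumer would yield.
sample = [
    '{"vehicle_id": 101, "vehicle_type": "car", '
    '"toll_plaza_id": 4001, "timestamp": "2021-08-01 10:15:00"}',
    '{"vehicle_id": 102, "vehicle_type": "truck", '
    '"toll_plaza_id": 4002, "timestamp": "2021-08-01 10:15:03"}',
]

conn = sqlite3.connect(":memory:")
loaded = load_messages(sample, conn)
```

In a real pipeline, the `messages` iterable would be replaced by a loop over a Kafka consumer, and the SQLite connection by a connection to the project's database.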