# Producing IPinYou data to Kafka

In this notebook we'll produce raw IPinYou data to our <b>ipinyou</b> Kafka topic in order to simulate a live stream of data on which to perform our aggregation.

#### Import packages

In [None]:
%ShowTypes on

In [None]:
import java.io.File
import java.util.Properties
import io.confluent.kafka.serializers.{KafkaAvroDecoder, KafkaAvroSerializer}
import org.apache.avro.Schema
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericRecord, GenericRecordBuilder}
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}

#### Create Kafka Producer

In [None]:
val props = new Properties()
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[KafkaAvroSerializer])
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[KafkaAvroSerializer])
props.put("schema.registry.url", "http://schema_registry:8081")
val producer = new KafkaProducer[String, GenericRecord](props)

#### Read in IPinYou data from avro file

In [None]:
val file = new File("ipinyou_hour.avro")

In [None]:
val rawSchema = new org.apache.avro.Schema.Parser().parse("{\"type\":\"record\",\"name\":\"Ipinyou\",\"namespace\":\"com.conversantmedia.cake.avro\",\"doc\":\"Action\",\"fields\":[{\"name\":\"bid_id\",\"type\":[\"null\",\"string\"]},{\"name\":\"timestamp\",\"type\":[\"null\",\"long\"]},{\"name\":\"log_type\",\"type\":[\"null\",\"int\"]},{\"name\":\"ipinyou_id\",\"type\":[\"null\",\"string\"]},{\"name\":\"user_agent\",\"type\":[\"null\",\"string\"]},{\"name\":\"ip_address\",\"type\":[\"null\",\"string\"]},{\"name\":\"region\",\"type\":[\"null\",\"int\"]},{\"name\":\"city\",\"type\":[\"null\",\"int\"]},{\"name\":\"ad_exchange\",\"type\":[\"null\",\"int\"]},{\"name\":\"domain\",\"type\":[\"null\",\"string\"]},{\"name\":\"url\",\"type\":[\"null\",\"string\"]},{\"name\":\"anonymous_url_id\",\"type\":[\"null\",\"string\"]},{\"name\":\"ad_slot_id\",\"type\":[\"null\",\"string\"]},{\"name\":\"ad_slot_width\",\"type\":[\"null\",\"int\"]},{\"name\":\"ad_slot_height\",\"type\":[\"null\",\"int\"]},{\"name\":\"ad_slot_visibility\",\"type\":[\"null\",\"string\"]},{\"name\":\"ad_slot_format\",\"type\":[\"null\",\"string\"]},{\"name\":\"ad_slot_floor_price\",\"type\":[\"null\",\"int\"]},{\"name\":\"creative_id\",\"type\":[\"null\",\"string\"]},{\"name\":\"bidding_price\",\"type\":[\"null\",\"int\"]},{\"name\":\"paying_price\",\"type\":[\"null\",\"int\"]},{\"name\":\"landing_page_url\",\"type\":[\"null\",\"string\"]},{\"name\":\"advertiser_id\",\"type\":[\"null\",\"int\"]},{\"name\":\"user_tags\",\"type\":[\"null\",\"string\"]}],\"schemaId\":\"2\"}")
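
Every field in the schema above is a union of `null` and one concrete type, and Avro numbers fields in declaration order starting at 0. The following is a minimal sketch of what that means when reading and writing a record, assuming a trimmed two-field schema named `IpinyouMini` and a made-up `bid_id` value, both of which are illustrations and not part of the dataset.

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}

// Trimmed, two-field version of the schema above (hypothetical, for illustration only)
val miniSchema = new Schema.Parser().parse(
  """{"type":"record","name":"IpinyouMini","fields":[
    |  {"name":"bid_id","type":["null","string"]},
    |  {"name":"timestamp","type":["null","long"]}
    |]}""".stripMargin)

// Fields are numbered in declaration order, starting at 0
val timestampPos = miniSchema.getField("timestamp").pos()

val sample: GenericRecord = new GenericData.Record(miniSchema)
sample.put("bid_id", "example-bid")      // write by field name (made-up value)
sample.put(timestampPos, 1382752800L)    // write by field position

// A read by name sees the value that was written by position
val readBack = sample.get("timestamp").asInstanceOf[Long]
```

In the full schema `timestamp` is also the second field, so it sits at position 1 there as well.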

In [None]:
val datumReader = new GenericDatumReader[GenericRecord](rawSchema)
val dataFileReader = new DataFileReader[GenericRecord](file, datumReader)

#### Fast-forward each record's timestamp to simulate current events and produce to <b>ipinyou</b> Kafka topic

In [None]:
val currentTime = System.currentTimeMillis

In [None]:
val minTimestamp = 1382752800000L // earliest timestamp found in avro data file

while (dataFileReader.hasNext) {
    val record = dataFileReader.next
    // Shift the record's timestamp so the earliest event lines up with currentTime
    val recordTimestamp = record.get("timestamp").asInstanceOf[Long] * 1000
    val newRecordTimestamp = (recordTimestamp - minTimestamp) + currentTime
    record.put("timestamp", newRecordTimestamp)

    // Wait until the shifted timestamp is reached, then send the record
    val waitTime = newRecordTimestamp - System.currentTimeMillis
    val producerRecord = new ProducerRecord[String, GenericRecord]("ipinyou", record)
    if (waitTime > 0) Thread.sleep(waitTime)

    producer.send(producerRecord)
}

// Push out any records still buffered by the producer
producer.flush()
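
The timestamp shift in the loop above can be isolated as plain arithmetic, which makes it easy to check by hand. This is a minimal sketch, assuming two hypothetical helpers (`rebase` and `waitMillis`) and a made-up replay start time; none of these names appear in the notebook.

```scala
// Hypothetical helper: shift an event time so that the earliest event
// in the file maps to the moment the replay started.
def rebase(eventMillis: Long, minEventMillis: Long, replayStartMillis: Long): Long =
  (eventMillis - minEventMillis) + replayStartMillis

// Hypothetical helper: how long to sleep before sending; never negative.
def waitMillis(targetMillis: Long, nowMillis: Long): Long =
  math.max(0L, targetMillis - nowMillis)

val minEvent    = 1382752800000L // earliest timestamp, as used in the loop above
val replayStart = 1700000000000L // made-up start time for illustration

// The earliest event is replayed immediately
val first = rebase(minEvent, minEvent, replayStart)

// An event 90 seconds into the file is replayed 90 seconds after the start
val later = rebase(minEvent + 90000L, minEvent, replayStart)

// 30 seconds into the replay, that event is still 60 seconds away
val remaining = waitMillis(later, replayStart + 30000L)
```

Because each record keeps its original offset from the earliest event, an hour of recorded data takes about an hour to replay.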

Now we are producing a steady stream of data to our <b>ipinyou</b> Kafka topic that will continue for the next hour.

Notice that there is now an [**ipinyou** schema in Schema Registry.](http://localhost:8081/subjects/ipinyou-value/versions/latest)
<hr>
#### [4. Aggregating iPinYou Data](4. ipinyou_agg.ipynb)