# Prepare environment

This notebook sets up the environment required by the rest of the demo.

<hr>
#### Inject the spark-avro package needed to aggregate IPinYou Avro data with Spark

In [None]:
echo "spark.jars                          /cnvr/spark-avro/spark-avro_2.11-0.1.jar" \
    >> ${SPARK_HOME}/conf/spark-defaults.conf

In [None]:
cat ${SPARK_HOME}/conf/spark-defaults.conf
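Because the cell above appends with `>>`, re-running it adds the same `spark.jars` line a second time. The following is a minimal sketch of an idempotent alternative, assuming a POSIX shell with `grep`; `append_once` is a hypothetical helper name introduced here for illustration and is not part of Spark.

```shell
# Hypothetical helper: append line $1 to file $2 only if that exact
# line is not already present in the file.
append_once() {
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Only touch the Spark config when SPARK_HOME is set.
if [ -n "${SPARK_HOME:-}" ]; then
    append_once "spark.jars                          /cnvr/spark-avro/spark-avro_2.11-0.1.jar" \
        "${SPARK_HOME}/conf/spark-defaults.conf"
fi
```

With this variant the cell can be executed repeatedly without growing `spark-defaults.conf`.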

<hr>
#### Prepare Kafka Connect to talk with Elasticsearch

In [None]:
curl -X POST -H "Content-Type: application/json" \
     --data '{"name" : "elasticsearch-sink", "config" : {"connector.class" : "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector", "tasks.max" : "1", "topics" : "ipinyou-agg" , "key.ignore" : "true", "connection.url" : "http://elasticsearch:9200", "type.name" : "kafka-connect", "name" : "elasticsearch-sink"}}' \
     connect:8083/connectors

Verify that the Elasticsearch connector has been registered:

[Kafka Connectors](http://localhost:8083/connectors)
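The connector list only shows that the connector exists. As a hedged sketch, the Kafka Connect REST API also exposes a per-connector status endpoint that can be queried from a notebook cell. The helper name `connector_status` and the `CONNECT_URL` variable are assumptions made for this example; the host `connect:8083` is taken from the cell above.

```shell
# Hypothetical helper: print the status JSON for the connector named in $1.
# CONNECT_URL is an assumed override; it defaults to the host used above.
connector_status() {
    curl -s "${CONNECT_URL:-connect:8083}/connectors/$1/status"
}
```

Calling `connector_status elasticsearch-sink` should return a JSON document that reports a state (for example `RUNNING` or `FAILED`) for the connector and for each of its tasks.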

<hr>
#### Add explicit field mappings to the **ipinyou-agg** index

The Kafka Elasticsearch Connector automatically creates a Kafka topic as well as an Elasticsearch index named <b>ipinyou-agg</b>. However, the index does not yet contain the field mappings we want.

In [None]:
curl -X GET 'elasticsearch:9200/ipinyou-agg?pretty'

We will add explicit mappings for three fields:

* ad_exchange (integer)
* count (integer)
* ten_second_floor (date in yyyy-MM-dd HH:mm:ss format)

In [None]:
curl -XPUT 'elasticsearch:9200/ipinyou-agg/_mapping/kafka-connect?pretty' -H 'Content-Type: application/json' -d '{
    "properties" : {
        "ad_exchange" : { "type" : "integer" },
        "count" : { "type" : "integer" },
        "ten_second_floor" : {
            "type" : "date",
            "format" : "yyyy-MM-dd HH:mm:ss"
        }
    }
}'
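To confirm that the mapping request took effect, the mapping can be read back from the index. This is a minimal sketch using the Elasticsearch `_mapping` endpoint; the helper name `index_mapping` and the `ES_URL` variable are assumptions for illustration, while the host `elasticsearch:9200` comes from the cells above.

```shell
# Hypothetical helper: print the current field mappings of the index named in $1.
# ES_URL is an assumed override; it defaults to the host used above.
index_mapping() {
    curl -s "${ES_URL:-elasticsearch:9200}/$1/_mapping?pretty"
}
```

If the PUT succeeded, `index_mapping ipinyou-agg` should list `ad_exchange`, `count`, and `ten_second_floor` with the types given above.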

In [None]:
curl 'elasticsearch:9200/_cat/indices?v'

<hr>
#### [2. Create <b>ipinyou</b> Kafka topic](2. Create Topic.ipynb)