# Enriching data by joining to LOOKUP tables
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

Introductory paragraph - for example:

This tutorial demonstrates how to work with [feature](link to feature doc). In this tutorial you perform the following tasks:

- Task 1
- Task 2
- Task 3
- etc



## Table of contents

- [Prerequisites](#Prerequisites)
- [Initialization](#Initialization)
- [Next section](#Nextsection)
- etc

## Prerequisites

This tutorial works with Druid XX.0.0 or later.

#### Run with Docker

<!-- Profiles are:
`druid-jupyter` - just Jupyter and Druid
`all-services` - includes Jupyter, Druid, and Kafka
 -->

Launch this tutorial and all prerequisites using the ....... profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).
   
#### Run without Docker

If you do not use the Docker Compose environment, you need the following:

* A running Apache Druid instance, with a `DRUID_HOST` local environment variable containing the server name of your Druid router
* [druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md), a Python client for Apache Druid. Follow the instructions in the Install section of the README file.

 <!-- Remove as needed -->
* A running Apache Kafka instance, with a `KAFKA_HOST` local environment variable containing the broker server name.
* [matplotlib](https://matplotlib.org/), a library for creating visualizations in Python.
* [pandas](https://pandas.pydata.org/), a data analysis and manipulation tool.

### Initialization

Run the next cell to set up the Druid Python client's connection to Apache Druid.

If successful, the Druid version number will be shown in the output.

In [2]:
import druidapi
import os

# Use DRUID_HOST when it is set; otherwise fall back to localhost.
druid_host = f"http://{os.environ.get('DRUID_HOST', 'localhost')}:8888"
    
print(f"Opening a connection to {druid_host}.")
druid = druidapi.jupyter_client(druid_host)

display = druid.display
sql_client = druid.sql
status_client = druid.status

status_client.version

Opening a connection to http://router:8888.


'27.0.0-SNAPSHOT'

<!-- Include these cells if you're relying on someone ingesting example data through the console -->

### Load example data

Once your Druid environment is up and running, ingest the sample data for this tutorial.

Run the following cell to create a table called `example-dataset-notebook`. Notice {the use of X as a timestamp | only required columns are ingested | WHERE / expressions / GROUP BY are front-loaded | partitions on X period and clusters by Y}.

When completed, you'll see a description of the final table.

<!--

Replace `example-dataset-notebook` with a unique table name for this notebook.

- Always prefix your table name with `example-`
- If using the standard example datasets, use the following standard values for `dataset`:

    wikipedia       wikipedia
    koalas          KoalasToTheMax one day
    koalanest       KoalasToTheMax one day (nested)
    nyctaxi3        NYC Taxi cabs (3 files)
    nyctaxi         NYC Taxi cabs (all files)
    flights         FlightCarrierOnTime (1 month)

-->

Monitor the ingestion task process in the Druid console.

In [None]:
# Replace `example-dataset-notebook` with your table name here.
# Remember to apply good data modelling practice to your INSERT / REPLACE.

sql='''
'''

sql_client.run_task(sql)
sql_client.wait_until_ready('example-dataset-notebook')
display.table('example-dataset-notebook')

Run the following cell to import additional Python modules that you will use to X, Y, Z.

In [12]:
# Add your modules here, remembering to align this with the prerequisites section

import json
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd

In [None]:
To set up a lookup, send a POST request to the Coordinator's lookup configuration endpoint, as the next code cell does:
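
On a cluster where lookups have never been configured, the Druid lookups API expects a one-time initialization first: a POST of an empty JSON object `{}` to `/druid/coordinator/v1/lookups/config`. The sketch below only builds that request with the standard library so that it can be inspected without a running cluster. The host value and the helper name `build_init_request` are assumptions for illustration, not part of `druidapi`.

```python
import json
import urllib.request

# Assumption: the Router listens on localhost:8888, as in the Initialization cell.
DRUID_HOST = "http://localhost:8888"
LOOKUP_CONFIG_PATH = "/druid/coordinator/v1/lookups/config"


def build_init_request(host: str = DRUID_HOST) -> urllib.request.Request:
    """Build (but do not send) a POST that carries an empty JSON object."""
    return urllib.request.Request(
        url=host + LOOKUP_CONFIG_PATH,
        data=json.dumps({}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


init_request = build_init_request()
print(init_request.get_method(), init_request.full_url, init_request.data)

# To send it against a running cluster:
# urllib.request.urlopen(init_request)
```

Inside this notebook, the equivalent call would presumably be `rest_client.post_json('/druid/coordinator/v1/lookups/config', {})`, passing a Python dict rather than a JSON string.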

In [20]:
rest_client = druid.rest

# Lookup definition, written as JSON text.
post_request='''
{
  "version": "v1",
  "lookupExtractorFactory": {
    "type": "map",
    "map": {
      "847632": "Internal Use Only"
    }
  }
}
'''

# Parse the JSON text into a dict before posting. post_json serializes its body
# to JSON itself, so passing json.dumps(post_request) would encode the text a
# second time and the server would receive a JSON string, not a JSON object.
# Note: this is the bulk endpoint, which expects specs nested as {tier: {lookup id: spec}}.
rest_client.post_json('/druid/coordinator/v1/lookups/config', json.loads(post_request))

## Awesome!

The main body of your notebook goes here!

### This is a step

Here things get done

### And so is this!

Wow! Awesome!

ClientError: Cannot construct instance of `java.util.LinkedHashMap` (although at least one Creator exists): no String-argument constructor/factory method to deserialize from String value ('"\n{\n  \"__default\": {\n    \"country_code\": {\n      \"version\": \"v0\",\n      \"lookupExtractorFactory\": {\n        \"type\": \"map\",\n        \"map\": {\n          \"77483\": \"United States\"\n        }\n      }\n    }\n"')
 at [Source: (org.eclipse.jetty.server.HttpInputOverHTTP); line: 1, column: 1]
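
The `ClientError` above says the server received a JSON string where it expected a JSON object. That is consistent with the request body having been JSON-encoded twice, for example by calling `json.dumps` on text that was already JSON. The sketch below builds the nested `{tier: {lookup id: spec}}` structure as a Python dict and compares single and double encoding using only the standard `json` module. The helper name `build_lookup_config` is hypothetical; the tier, lookup name, version, and map entry are copied from the error message.

```python
import json


def build_lookup_config(tier, lookup_id, version, mapping):
    """Nest one map-type lookup spec as {tier: {lookup_id: spec}}."""
    return {
        tier: {
            lookup_id: {
                "version": version,
                "lookupExtractorFactory": {"type": "map", "map": dict(mapping)},
            }
        }
    }


payload = build_lookup_config(
    "__default", "country_code", "v0", {"77483": "United States"}
)

# Encoding once yields JSON object text, which decodes back to a dict.
encoded_once = json.dumps(payload)
# Encoding that text again yields a JSON string literal that wraps it.
encoded_twice = json.dumps(encoded_once)

print(type(json.loads(encoded_once)).__name__)   # dict
print(type(json.loads(encoded_twice)).__name__)  # str
```

Passing `payload` itself to a client method that serializes its body, rather than a pre-encoded string, avoids the second encoding.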

## Summary

* You learned this
* Remember this

## Go further

* Try this out on your own data
* Solve for problem X that isn't covered here

## Learn more

* Read the [full API documentation](https://druid.apache.org/docs/27.0.0/api-reference/lookups-api)
* Watch or read something cool from the community
* Do some exploratory stuff on your own

In [None]:
# STANDARD CODE BLOCKS

# When just wanting to display some SQL results
display.sql(sql)

# When ingesting data:
sql_client.run_task(sql)
sql_client.wait_until_ready('wikipedia-en')
display.table('wikipedia-en')

# When you want to make an EXPLAIN look pretty
print(json.dumps(json.loads(sql_client.explain_sql(sql)['PLAN']), indent=2))

# When you want a simple plot
df = pd.DataFrame(sql_client.sql(sql))
df.plot(x='Tail_Number', y='Flights', marker='o')
plt.xticks(rotation=45, ha='right')
plt.gca().get_legend().remove()
plt.show()

# When you want to add some query context parameters
req = sql_client.sql_request(sql)
req.add_context("useApproximateTopN", "false")
resp = sql_client.sql_query(req)

# When you want to compare two different sets of results
df3 = df1.compare(df2, keep_equal=True)
df3