# (Result) by (action) using (feature)
<!--
  ~ Licensed to the Apache Software Foundation (ASF) under one
  ~ or more contributor license agreements.  See the NOTICE file
  ~ distributed with this work for additional information
  ~ regarding copyright ownership.  The ASF licenses this file
  ~ to you under the Apache License, Version 2.0 (the
  ~ "License"); you may not use this file except in compliance
  ~ with the License.  You may obtain a copy of the License at
  ~
  ~   http://www.apache.org/licenses/LICENSE-2.0
  ~
  ~ Unless required by applicable law or agreed to in writing,
  ~ software distributed under the License is distributed on an
  ~ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  ~ KIND, either express or implied.  See the License for the
  ~ specific language governing permissions and limitations
  ~ under the License.
  -->

(Common situation or problem and how this approach helps).
In this tutorial, (step 1, 2, 3).

## Prerequisites

This tutorial works with Druid 26.0.0 or later.

### Run using Docker

<!-- Profiles are:
`druid-jupyter` - just Jupyter and Druid
`all-services` - includes Jupyter, Druid, and Kafka
 -->

Launch this tutorial and all prerequisites using the ....... profile of the Docker Compose file for Jupyter-based Druid tutorials. For more information, see [Docker for Jupyter Notebook tutorials](https://druid.apache.org/docs/latest/tutorials/tutorial-jupyter-docker.html).
   
### Run Druid without Docker

If you do not use the Docker Compose environment, you need the following:

* A running Apache Druid instance, with a `DRUID_HOST` local environment variable containing the host name of your Druid router (the notebook connects to port 8888)
* [druidapi](https://github.com/apache/druid/blob/master/examples/quickstart/jupyter-notebooks/druidapi/README.md), a Python client for Apache Druid. Follow the instructions in the Install section of the README file.

 <!-- Remove as needed -->
* A running Apache Kafka instance, with a `KAFKA_HOST` local environment variable containing the host name of the Kafka broker (the notebook connects to port 9092)
* [matplotlib](https://matplotlib.org/), a library for creating visualizations in Python
* [pandas](https://pandas.pydata.org/), a data analysis and manipulation tool



### Initialization

Run the next cell to attempt a connection to Druid services. If the connection succeeds, the output shows the Druid version number.

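```python
import os

import druidapi

# Connect to the Druid router on port 8888. DRUID_HOST holds the router's
# host name; fall back to localhost when it is not set.
if 'DRUID_HOST' not in os.environ:
    druid_host = "http://localhost:8888"
else:
    druid_host = f"http://{os.environ['DRUID_HOST']}:8888"

print(f"Opening a connection to {druid_host}.")
druid = druidapi.jupyter_client(druid_host)

# Shortcuts to the druidapi clients used throughout the notebook
display = druid.display
sql_client = druid.sql
status_client = druid.status

# Show the Druid version to confirm that the connection works
status_client.version
```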
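The host lookup in the cell above can also be written as a small function, which makes the localhost fallback easy to check outside the notebook. This is a minimal sketch: the function name `druid_router_url` and its `env` parameter are hypothetical, and it assumes, as the cell above does, that the router listens on port 8888.

```python
import os
from typing import Mapping, Optional

# Router port assumed by the connection cell above
DRUID_ROUTER_PORT = 8888


def druid_router_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: build the router URL from DRUID_HOST, defaulting to localhost."""
    if env is None:
        env = os.environ
    host = env.get("DRUID_HOST", "localhost")
    return f"http://{host}:{DRUID_ROUTER_PORT}"


print(druid_router_url({}))
print(druid_router_url({"DRUID_HOST": "router"}))
```

Passing the environment in as a mapping, rather than reading `os.environ` directly, lets you try both branches without changing your shell environment.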

Run the next cell to set the address of the Kafka broker.

```python
# INCLUDE THIS CELL IF YOUR NOTEBOOK USES KAFKA
# Use the kafka_host variable when connecting to Kafka.
# Kafka bootstrap servers are given as host:port, without a URL scheme.
import os

if 'KAFKA_HOST' not in os.environ:
    kafka_host = "localhost:9092"
else:
    kafka_host = f"{os.environ['KAFKA_HOST']}:9092"
```
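Setting `kafka_host` does not check that a broker is actually listening. If you want the notebook to fail early, a plain TCP check such as the following can help. It is a sketch that uses only the Python standard library; the function name `broker_reachable` is hypothetical, and a successful connection shows only that something accepts connections on that port, not that it is a healthy Kafka broker.

```python
import socket


def broker_reachable(address: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to "host:port" succeeds within `timeout` seconds.

    Hypothetical helper: it only opens and closes a socket and does not speak
    the Kafka protocol. A leading URL scheme such as "http://" is ignored.
    """
    hostport = address.split("://", 1)[-1]
    host, _, port = hostport.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except (OSError, ValueError):
        # Connection refused or timed out, or the port was not a number
        return False


# Usage in the notebook, after the cell above has set kafka_host:
# broker_reachable(kafka_host)
```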

<!-- Include this if you're relying on someone ingesting example data through the console -->

### Example Data

Once your Druid environment is up and running, ingest the sample data for this tutorial.

Open the Druid console, and ingest the data as follows:

1. Select **Load data** from the top-level navigation.
2. Select **Batch - SQL**.
3. For the input type, select **Example data**.
4. Select **FlightCarrierOnTime (1 month)**.
5. Click **Use example**.

Complete the remaining steps of the data loader wizard.

<!-- Add something here about the target table name and any steps to follow in the wizard. -->

Run the following cell to describe the table. This is a quick way to confirm that the table this notebook needs is present.

```python
# REPLACE THE TABLE NAME HERE
display.table('On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2005_11')
```
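`display.table` shows the table schema so you can check it by eye. If you would rather check in code, for example at the top of a longer notebook, you could query Druid's `INFORMATION_SCHEMA.TABLES` instead. The sketch below assumes that `sql_client.sql` returns rows as a list of dictionaries keyed by column name (the snippet cell at the end of this notebook passes its result straight to `pd.DataFrame`); `table_exists` and `TABLES_QUERY` are hypothetical names.

```python
# Druid lists its datasources under the 'druid' schema
TABLES_QUERY = """
SELECT TABLE_NAME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'druid'
"""


def table_exists(sql_client, table_name: str) -> bool:
    """Return True if `table_name` is listed as a Druid datasource.

    Hypothetical helper. Assumes sql_client.sql(query) returns a list of
    dicts, one per row, keyed by column name.
    """
    rows = sql_client.sql(TABLES_QUERY)
    return any(row["TABLE_NAME"] == table_name for row in rows)


# Usage in the notebook:
# table_exists(sql_client, 'On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2005_11')
```

Comparing the name in Python, rather than splicing it into the SQL text, avoids having to quote table names that contain characters such as parentheses.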

Finally, run the following cell to import additional Python modules that you will use to X, Y, Z.

```python
# INCLUDE THIS CELL AND THE ABOVE IF YOU HAVE MORE IMPORTS, VARIABLES, ETC. FOR EXAMPLE:
import json
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
```

## Awesome!

The main body of your notebook goes here!

### This is a step

Here things get done

### And so is this!

Wow! Awesome!

## Conclusion

* You learned this
* Remember this

## Learn more

* Read docs pages
* Watch or read something cool from the community
* Do some exploratory stuff on your own

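```python
# HERE ARE SOME USEFUL FUNCTIONS!
# Each snippet below is independent: copy the one you need into its own cell
# instead of running this cell as a whole. They use display, sql_client,
# json, pd, and plt from the cells above.

# An example query to use with the snippets. Replace it with your own;
# the plot snippet expects the Tail_Number and Flights columns it returns.
sql = '''
SELECT "Tail_Number", COUNT(*) AS "Flights"
FROM "On_Time_Reporting_Carrier_On_Time_Performance_(1987_present)_2005_11"
GROUP BY "Tail_Number"
ORDER BY "Flights" DESC
LIMIT 10
'''

# To display the results of a SQL query
display.sql(sql)

# To ingest data: set sql to an INSERT or REPLACE statement, run it as a task,
# wait until the new table is ready to query, then describe it
sql_client.run_task(sql)
sql_client.wait_until_ready('wikipedia-en')
display.table('wikipedia-en')

# To pretty-print a query's EXPLAIN plan
print(json.dumps(json.loads(sql_client.explain_sql(sql)['PLAN']), indent=2))

# To draw a simple plot of query results
df = pd.DataFrame(sql_client.sql(sql))

df.plot(x='Tail_Number', y='Flights', marker='o')
plt.xticks(rotation=45, ha='right')
plt.gca().get_legend().remove()
plt.show()

# To add query context parameters
req = sql_client.sql_request(sql)
req.add_context("useApproximateTopN", "false")
resp = sql_client.sql_query(req)

# To compare two sets of results: df1 and df2 are DataFrames from two queries
# with the same columns and the same row order
df3 = df1.compare(df2, keep_equal=True)
df3
```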
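`DataFrame.compare` only works on frames with identical row and column labels, so make sure both queries return the same columns and rows in the same order, for example with the same `ORDER BY`. The example below uses two small, made-up DataFrames in place of real query results (the tail numbers and flight counts are invented) to show what the output looks like.

```python
import pandas as pd

# Hypothetical results from two runs of the same query, for example with
# and without useApproximateTopN. These values are made up for illustration.
df1 = pd.DataFrame({"Tail_Number": ["N101", "N202", "N303"], "Flights": [10, 8, 5]})
df2 = pd.DataFrame({"Tail_Number": ["N101", "N202", "N303"], "Flights": [10, 7, 5]})

# keep_equal=True shows matching values instead of NaN in the rows and
# columns that are kept; the default keep_shape=False drops rows and
# columns in which nothing differs.
diff = df1.compare(df2, keep_equal=True)
print(diff)
```

Here only row 1 and the `Flights` column remain, split into `self` (values from `df1`) and `other` (values from `df2`) sub-columns.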

