# Getting to know the ClickHouse-driver Client

This notebook has samples that were included in the [Altinity blog article that introduces the clickhouse-driver client library](https://www.altinity.com/blog/clickhouse-and-python-getting-to-know-the-clickhouse-driver-client).

_WARNING_: Do not run the whole notebook at once. One of the samples is designed to hang and must be cancelled manually, so run the samples one by one.

It's easy to load the clickhouse driver. The `Client` class is the main client interface. 

In [None]:
from clickhouse_driver import Client

If you are running against an unencrypted local server, setting up a connection is as simple as the following. Instantiating a client does not actually connect to ClickHouse. It just sets up the data structure used to connect later, when your code issues a command.

In [None]:
client = Client('localhost')

Servers with sensitive data should be protected with a user name and password as well as encrypted communications. The following command shows how to connect to a server with a self-signed certificate using an explicit database name. It also turns on compression.

In [None]:
client = Client('localhost', 
                user='python', 
                password='secret', 
                secure=True, 
                verify=False, 
                database='default',
                compression=True)

The Python driver uses the `Client.execute()` method to issue SELECT commands. Results are returned as a list of tuples, one tuple per row. Let's send a very simple query and take apart the results to see values and types.

*NOTE*: If you get an error about an unknown timezone, ensure your server has the timezone set properly.  

In [None]:
result = client.execute('SELECT now(), version()')
print("RESULT: {0}: {1}".format(type(result), result))
for t in result:
    print(" ROW: {0}: {1}".format(type(t), t))
    for v in t:
        print("  COLUMN: {0}: {1}".format(type(v), v))
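
Because each row is a plain tuple, a one-row result can be unpacked directly into named variables. The sketch below uses a hand-written stand-in for the result rather than a live server. The timestamp and version string are made-up values, and `only_row` is a hypothetical helper, not part of clickhouse-driver.

```python
from datetime import datetime

# Hand-written stand-in for client.execute('SELECT now(), version()').
# The values are invented for illustration; a real server returns its own.
sample_result = [(datetime(2020, 1, 1, 12, 0, 0), '20.1.2.4')]


def only_row(result):
    """Return the single row of a one-row result; raise if the shape differs."""
    if len(result) != 1:
        raise ValueError('expected exactly one row, got {0}'.format(len(result)))
    return result[0]


# Tuple unpacking assigns one variable per selected column.
server_time, server_version = only_row(sample_result)
print(server_time.isoformat(), server_version)
```

Checking the row count before unpacking gives a clear error if the query unexpectedly returns zero or several rows.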

Create the iris table, dropping any previously existing table of the same name.  The print statements show that result sets from DDL are empty. 

In [None]:
r1 = client.execute('DROP TABLE IF EXISTS iris')
print(r1)
r2 = client.execute('CREATE TABLE iris ('
                    'sepal_length Float64, sepal_width Float64, '
                    'petal_length Float64, petal_width Float64, '
                    'species String) ENGINE = MergeTree '
                    ' PARTITION BY species ORDER BY (species)')
print(r2)

Add some data to the table.  Note that the values are given in a separate list of tuples, not embedded in the SQL text. 

In [None]:
client.execute(
    'INSERT INTO iris (sepal_length, sepal_width, petal_length, petal_width, species) VALUES',
    [(5.1, 3.7, 1.5, 0.4, 'Iris-setosa'), (4.6, 3.6, 1.0, 0.2, 'Iris-setosa')]
)
print(client.execute("SELECT * FROM iris"))

Here is an example of how to insert CSV.  We read the values line by line using `csv.DictReader()` running inside the generator function `row_reader()`.  This yields a dictionary for each line, keyed by column name.  Note that *you must* convert each value to the right type yourself, because the csv module returns everything as a string. 

In [None]:
client.execute("TRUNCATE TABLE iris")

import csv

# Create a generator to fetch parsed rows. CSV must have variable names in header row.
def row_reader():
    with open('iris_with_names.csv') as iris_csv:
        # Use DictReader to get values as a dictionary with column names.
        for line in csv.DictReader(iris_csv):
            yield {
                'sepal_length': float(line['sepal_length']), 
                'sepal_width': float(line['sepal_width']), 
                'petal_length': float(line['petal_length']), 
                'petal_width': float(line['petal_width']), 
                'species': str(line['species']), 
            }

# Pass the generator directly; rows are streamed to the server as dictionaries.
client.execute("INSERT INTO iris VALUES", row_reader())
client.execute("SELECT count(*) FROM iris")
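
To see why the explicit conversions matter, the following sketch parses a small in-memory CSV instead of `iris_with_names.csv` and compares raw and converted values. The `CONVERTERS` table and the `typed_rows` helper are names introduced here for illustration; the two data rows are the ones inserted earlier in this notebook.

```python
import csv
import io

# In-memory stand-in for the CSV file, with the header row DictReader needs.
SAMPLE_CSV = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.7,1.5,0.4,Iris-setosa\n"
    "4.6,3.6,1.0,0.2,Iris-setosa\n"
)

# Column name -> converter, assumed to mirror the iris table definition.
CONVERTERS = {
    'sepal_length': float,
    'sepal_width': float,
    'petal_length': float,
    'petal_width': float,
    'species': str,
}


def typed_rows(csv_file):
    """Yield one dictionary per CSV line with every value converted."""
    for line in csv.DictReader(csv_file):
        yield {name: convert(line[name]) for name, convert in CONVERTERS.items()}


raw = next(csv.DictReader(io.StringIO(SAMPLE_CSV)))
typed = next(typed_rows(io.StringIO(SAMPLE_CSV)))

# DictReader hands back strings; the converter table restores numeric types.
print(type(raw['sepal_length']).__name__, type(typed['sepal_length']).__name__)
```

Keeping the converters in one dictionary means a new column needs a single new entry rather than another hand-written line in the generator.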

That was painful. We dislike pain. A better approach to non-toy CSV files is to use Pandas, which has a very good method for reading CSV that automatically coerces types. This is much simpler! 

In [None]:
client.execute("TRUNCATE TABLE iris")

import pandas as pd
df = pd.read_csv('iris_with_names.csv')

client.execute("INSERT INTO iris VALUES", [tuple(x) for x in df.values])
client.execute("SELECT count(*) FROM iris")
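
The type coercion can be checked without a server. This sketch reads an in-memory CSV (a stand-in for `iris_with_names.csv`, using the two rows inserted earlier) and builds the list of tuples with `DataFrame.itertuples()`, which is an alternative to iterating over `df.values`, not what the cell above does.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file.
SAMPLE_CSV = (
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.7,1.5,0.4,Iris-setosa\n"
    "4.6,3.6,1.0,0.2,Iris-setosa\n"
)

df_sample = pd.read_csv(io.StringIO(SAMPLE_CSV))

# read_csv infers a float dtype for the numeric columns.
print(df_sample.dtypes)

# index=False drops the row index; name=None yields plain tuples.
rows = list(df_sample.itertuples(index=False, name=None))
print(rows[0])
```

The resulting `rows` list has the same shape as the list of tuples passed to the `INSERT` statement above.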

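The next few queries show examples of select statements. The first embeds the filter value in the SQL text. The second passes it as a parameter, using a `%(name)s` placeholder in the query and a dictionary of values as the second argument. 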

In [None]:
result = client.execute('SELECT COUNT(*), species FROM iris '
                        'WHERE petal_length > 3.4 '
                        'GROUP BY species ORDER BY species')
print(result)

In [None]:
result = client.execute('SELECT COUNT(*), species FROM iris '
                        'WHERE petal_length > %(max_len)s '
                        'GROUP BY species ORDER BY species', 
                        {'max_len': 3.4})
print(result)
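
The `%(max_len)s` placeholder follows Python's mapping-style `%` formatting. As a rough illustration of what substitution involves, the sketch below renders each parameter as a SQL literal before formatting the query. `quote_literal` and `render` are hypothetical helpers written for this example; they are not the driver's implementation and handle only numbers and strings. In real code, keep passing the dictionary to `client.execute()` as in the cell above.

```python
def quote_literal(value):
    """Toy renderer: numbers as-is, strings single-quoted with escapes."""
    if isinstance(value, bool):
        raise TypeError('booleans are not handled in this sketch')
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace('\\', '\\\\').replace("'", "\\'")
        return "'" + escaped + "'"
    raise TypeError('unsupported parameter type: {0}'.format(type(value)))


def render(query, params):
    """Fill %(name)s placeholders with rendered literals."""
    literals = {name: quote_literal(value) for name, value in params.items()}
    return query % literals


numeric_sql = render('SELECT COUNT(*) FROM iris WHERE petal_length > %(max_len)s',
                     {'max_len': 3.4})
string_sql = render('SELECT * FROM iris WHERE species = %(species)s',
                    {'species': 'Iris-setosa'})
print(numeric_sql)
print(string_sql)
```

Note that the string parameter gains its quotes during rendering, which is why the placeholder in the query text is written without quotes.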

Passing `with_column_types=True` returns the column names along with the rows. Each column is described by a `(name, type)` tuple, so we also get the column types, which is convenient for conversions. 

In [None]:
result, columns = client.execute('SELECT COUNT(*), species FROM iris '
                                 'WHERE petal_length > %(max_len)s '
                                 'GROUP BY species ORDER BY species', 
                                 {'max_len': 3.4},
                                 with_column_types=True)
print(result)
print(columns)
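
One use for the column names is to turn each row into a dictionary. The sketch below works on hand-written stand-ins shaped like the `(rows, columns)` pair; they follow the iris table definition and the rows inserted earlier rather than live query output, and `rows_as_dicts` is a hypothetical helper.

```python
# Stand-ins shaped like the (rows, columns) pair returned with
# with_column_types=True; written by hand, not fetched from a server.
sample_rows = [
    (5.1, 3.7, 1.5, 0.4, 'Iris-setosa'),
    (4.6, 3.6, 1.0, 0.2, 'Iris-setosa'),
]
sample_columns = [
    ('sepal_length', 'Float64'),
    ('sepal_width', 'Float64'),
    ('petal_length', 'Float64'),
    ('petal_width', 'Float64'),
    ('species', 'String'),
]


def rows_as_dicts(rows, columns):
    """Pair every value with its column name; the type strings are ignored."""
    names = [name for name, _type in columns]
    return [dict(zip(names, row)) for row in rows]


records = rows_as_dicts(sample_rows, sample_columns)
print(records[0]['species'], records[0]['petal_length'])
```

Dictionaries make downstream code independent of column order, at the cost of repeating the names in every row.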

This final example shows how to put a result set into a pandas data frame.  We'll use the column names so that the DataFrame has correct labels.

In [None]:
import pandas as pd

result, columns = client.execute('SELECT * FROM iris WHERE species = %(species)s LIMIT 5', 
                                 {'species': "Iris-setosa"}, with_column_types=True)
# Each entry in columns is a (name, type) pair; keep only the names as labels.
df = pd.DataFrame(result, columns=[name for name, _type in columns])
df.tail()

Since we're using pandas and may like to put this data into graphs, etc., we need to ensure the data types are correct.  Let's describe the data set to ensure that the columns with numbers really appear as numbers.  The following should show metrics for length and width values but nothing for species, which is a string. 

In [None]:
df.describe()