# PyDrill Demonstration
This notebook demonstrates how to use the PyDrill module to connect to Apache Drill and query data. The complete documentation for PyDrill is available at http://pydrill.readthedocs.io.

The essential steps are:
1.  Import the module
2.  Open a connection to Drill
3.  Execute a query
4.  Do something with the results

You will first need to install PyDrill. To do so, open a terminal and run:
```bash
pip install pydrill
```
## Step 1:  Import the PyDrill module
Once PyDrill is installed, you can import the `PyDrill` class from the `pydrill.client` module.

In [1]:
from pydrill.client import PyDrill

## Step 2:  Open a connection to Drill
The next step is to open a connection to Drill. The example below connects to a Drill instance on `localhost` at port 8047, the default port for Drill's REST interface. Before executing any queries, verify that the connection was opened successfully; PyDrill includes an `is_active()` method for this purpose.

In [2]:
from pydrill.exceptions import ImproperlyConfigured

# Open a connection to Drill
drill = PyDrill(host='localhost', port=8047)

# Verify the connection is active; raise an error if not.
if not drill.is_active():
    raise ImproperlyConfigured('Please run Drill first')
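
The check above can also be wrapped in a small helper so that the error message names the endpoint that failed. This is a minimal sketch, not part of PyDrill: `require_active` is a hypothetical name, and it assumes only that the client object exposes the `is_active()` method used above.

```python
def require_active(client, endpoint="localhost:8047"):
    """Return `client` if it reports an active connection, else raise.

    `require_active` is a hypothetical helper, not a PyDrill function.
    It relies only on the client's `is_active()` method.
    """
    if not client.is_active():
        raise RuntimeError(
            "Cannot reach Drill at {}; is it running?".format(endpoint)
        )
    return client


# Usage with the connection opened above (needs a running Drill instance):
# drill = require_active(drill, "localhost:8047")
```

Returning the client lets the check sit on the same line as the assignment, which keeps later cells from running against a dead connection.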

## Step 3: Execute a query and get the results
The next step is to execute a query in Drill. When you call the `.query()` method, PyDrill returns an iterable object from which you can extract the rows of your results; each row is a dictionary keyed by column name. You can also get PyDrill to return a pandas DataFrame.

In [5]:
#Execute query in Drill
query_result = drill.query('''
SELECT JobTitle, 
AVG( TO_NUMBER( AnnualSalary, '¤' )) AS avg_salary, 
COUNT( DISTINCT `EmpName` ) AS number
FROM dfs.drillclass.`baltimore_salaries_2016.csvh`
GROUP BY JobTitle
ORDER BY avg_salary DESC
LIMIT 50
''')

#Iterate through the rows.
for row in query_result:
    print( row )

{'avg_salary': '238772.0', 'JobTitle': "STATE'S ATTORNEY", 'number': '1'}
{'avg_salary': '200000.0', 'JobTitle': 'Police Commissioner', 'number': '1'}
{'avg_salary': '182500.0', 'JobTitle': 'Executive Director V', 'number': '1'}
{'avg_salary': '171635.0', 'JobTitle': 'MAYOR', 'number': '1'}
{'avg_salary': '171306.5', 'JobTitle': 'Executive Director III', 'number': '10'}
{'avg_salary': '169800.0', 'JobTitle': 'CITY SOLICITOR', 'number': '1'}
{'avg_salary': '169800.0', 'JobTitle': 'DIRECTOR PUBLIC WORKS', 'number': '1'}
{'avg_salary': '163000.0', 'JobTitle': 'CITY AUDITOR', 'number': '1'}
{'avg_salary': '154900.0', 'JobTitle': 'Deputy Police Commissioner', 'number': '2'}
{'avg_salary': '153905.0', 'JobTitle': 'Executive Director I', 'number': '4'}
{'avg_salary': '149825.0', 'JobTitle': 'Executive Director IV', 'number': '4'}
{'avg_salary': '146500.0', 'JobTitle': 'Assistant Fire Chief', 'number': '3'}
{'avg_salary': '140800.0', 'JobTitle': 'Chief of Utility Finances', 'number': '1'}
...
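
Note that every value in the rows printed above is a string, including the numeric columns. A minimal sketch of converting them in plain Python follows. `coerce_row` is a hypothetical helper, and the two sample rows are copied from the output above so that the snippet runs without a Drill connection.

```python
def coerce_row(row, float_cols=("avg_salary",), int_cols=("number",)):
    """Return a copy of `row` with the named columns converted to numbers."""
    converted = dict(row)
    for col in float_cols:
        if col in converted:
            converted[col] = float(converted[col])
    for col in int_cols:
        if col in converted:
            converted[col] = int(converted[col])
    return converted


# Two rows copied from the output above, standing in for `query_result`.
sample_rows = [
    {'avg_salary': '238772.0', 'JobTitle': "STATE'S ATTORNEY", 'number': '1'},
    {'avg_salary': '171306.5', 'JobTitle': 'Executive Director III', 'number': '10'},
]

typed_rows = [coerce_row(row) for row in sample_rows]
for row in typed_rows:
    print(row['JobTitle'], row['avg_salary'], row['number'])
```

With a live connection, the same list comprehension could be applied to `query_result` directly, since it is iterable.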

### Retrieving a DataFrame
You can also get PyDrill to directly return a DataFrame by using the `.to_dataframe()` method of the results object.

In [6]:
df = query_result.to_dataframe()
df.head()

|   | JobTitle | avg_salary | number |
|---|---|---|---|
| 0 | STATE'S ATTORNEY | 238772.0 | 1 |
| 1 | Police Commissioner | 200000.0 | 1 |
| 2 | Executive Director V | 182500.0 | 1 |
| 3 | MAYOR | 171635.0 | 1 |
| 4 | Executive Director III | 171306.5 | 10 |
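
If the DataFrame's numeric columns arrive as strings, as the values in the row output do, they can be converted with pandas before any arithmetic or plotting. The sketch below assumes that this is the case (your PyDrill version may already convert types) and builds a small stand-in frame from rows shown earlier so that it runs without Drill.

```python
import pandas as pd

# Stand-in for `query_result.to_dataframe()`, built from rows shown earlier.
df = pd.DataFrame(
    [
        {"JobTitle": "STATE'S ATTORNEY", "avg_salary": "238772.0", "number": "1"},
        {"JobTitle": "Executive Director III", "avg_salary": "171306.5", "number": "10"},
    ]
)

# Convert the string columns to numeric dtypes.
for col in ("avg_salary", "number"):
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```

Checking `df.dtypes` first is a cheap way to see whether the conversion is needed at all.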


## In-Class Exercise
Using the data in the `dailybots.csv` file, use Drill to:
1.  Query the file to produce a summary of infections by day.
2.  Store this data in a DataFrame using the `to_dataframe()` method.
3.  Create a line plot of this data by calling the `.plot()` method on the DataFrame.