# Install and test software for using SQL

This notebook will guide you through installing the software to use SQL directly in JupyterLab notebooks and within Python programs.

We will be working with **DuckDB**, a lightweight SQL database engine that runs inside the Python process rather than as a separate server. You can learn more about DuckDB [here](https://duckdb.org).

In addition, we will install **JupySQL** to act as an interface between DuckDB and Jupyter. More information is available [here](https://jupysql.ploomber.io/en/latest/quick-start.html#).

We'll start by installing the software, then download a sample dataset and test whether everything works as expected. The installation steps need no data files, but the final test requires the file `s_orders.csv` to be in the working directory (usually the folder that contains this notebook).

## Install software   
You only need to carry out the following steps once!

In [None]:
%pip install duckdb==0.9.2 --quiet

In [None]:
%pip install jupysql --quiet

In [None]:
%pip install duckdb-engine --quiet
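If you want to confirm that the three packages were installed, the following minimal sketch uses the standard-library `importlib.metadata` module to look up each package's installed version. The helper name `installed_versions` is our own and is not part of DuckDB or JupySQL.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names as used in the install cells above.
PACKAGES = ["duckdb", "jupysql", "duckdb-engine"]


def installed_versions(packages):
    """Map each package name to its installed version string, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver if ver else 'NOT INSTALLED'}")
```

A `NOT INSTALLED` line suggests that the corresponding install cell should be run again.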

## Load extension

Next, we need to load an extension that tells Jupyter how to talk to a relational database. This extension is provided by JupySQL, which we installed earlier.

Note the `%` symbol at the beginning of the line in the code cell. This is called a "magic" and tells Jupyter that what follows on this line is **not** ordinary Python code. Here, `%load_ext sql` tells Jupyter to load the `sql` extension that JupySQL provides. There are many other magic commands (we will see some examples later).

In [None]:
%load_ext sql

## Test the installed software

Now that software is installed, we want to make sure everything is working as expected. We'll start by downloading the famous `penguins` dataset. 

In [None]:
from pathlib import Path                   # Path is a class from Python's standard library
from urllib.request import urlretrieve     # urlretrieve is a standard-library download function

if not Path("penguins.csv").is_file():     # download only if the file is not already present
    urlretrieve(
        "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv",
        "penguins.csv",
    )
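Before querying, it can help to confirm that the download worked. This minimal sketch uses only the standard-library `csv` module to read the first row of a file, which normally holds the column names. The helper `csv_header` is a hypothetical name introduced here for illustration.

```python
import csv
from pathlib import Path


def csv_header(path):
    """Return the values in the first row of a CSV file (an empty list if the file is empty)."""
    with Path(path).open(newline="") as f:
        return next(csv.reader(f), [])


# Only inspect the file if the download cell above has created it.
if Path("penguins.csv").is_file():
    header = csv_header("penguins.csv")
    print(len(header), header)
```

For `penguins.csv` you should see seven column names. The same helper can be pointed at `s_orders.csv` to see what its first row contains.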

Next, we use a magic to connect to DuckDB. Because the connection string `duckdb://` names no database file, this creates a temporary database that lives only in memory.

In [None]:
%sql duckdb://

We will now use a magic to issue an SQL statement that queries `penguins.csv`. The magic `%sql` tells Jupyter that what follows is SQL code, not Python (the default assumption). Note that DuckDB can interact with this `.csv` file as if it were already part of a relational database.   

After running the command below, you should see a table with seven columns and five rows.

In [None]:
%sql SELECT * FROM penguins.csv WHERE species = 'Chinstrap' LIMIT 5;
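To make the query's clauses concrete, here is a rough plain-Python analogue using the standard `csv` module: `WHERE` becomes a filter applied to each row, and `LIMIT` becomes `itertools.islice`. This only illustrates what the query asks for; `first_matching_rows` is a hypothetical helper, and DuckDB does not execute queries this way internally.

```python
import csv
from itertools import islice
from pathlib import Path


def first_matching_rows(path, column, value, limit):
    """Rough analogue of: SELECT * FROM <file> WHERE <column> = <value> LIMIT <limit>."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)  # each row becomes a dict keyed by the header names
        matches = (row for row in reader if row[column] == value)  # the WHERE clause
        return list(islice(matches, limit))  # the LIMIT clause (consumed before the file closes)


# Mirrors the %sql query above, if penguins.csv has been downloaded.
if Path("penguins.csv").is_file():
    for row in first_matching_rows("penguins.csv", "species", "Chinstrap", 5):
        print(row)
```

One difference worth knowing: in SQL, `LIMIT` without `ORDER BY` does not guarantee which matching rows come back, while this Python version always returns the first matches in file order. Also, `csv` reads every value as a string, whereas DuckDB infers column types.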

Finally, we'll create a real SQL table and query it. The code below creates a table called `orders`, then reads the file `s_orders.csv` to load data into it. The process of inserting data into a table is called *populating* the table. For now, don't worry about the syntax; we will cover how to create and manipulate tables later.

In [None]:
%%sql
DROP TABLE IF EXISTS orders;
CREATE TABLE orders(
    order_ioc VARCHAR PRIMARY KEY,
    seq SMALLINT NOT NULL,
    familiar_order VARCHAR,
    taxonomy VARCHAR
    );
COPY orders FROM 's_orders.csv';

Now we can interact with the table using a magic followed by a standard SQL query.

After running the command below, you should see a short table with four columns and five rows.

In [None]:
%sql SELECT * FROM orders ORDER BY seq LIMIT 5;
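The cells above run SQL through Jupyter magics, but the same three steps (create a table, populate it, query it) can also be run from an ordinary Python program. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in because it needs no installation; it is SQLite, not DuckDB, and the rows are hypothetical placeholders rather than the contents of `s_orders.csv`.

```python
import sqlite3

# SQLite stands in for DuckDB here; the table definition matches the orders cell above.
con = sqlite3.connect(":memory:")  # temporary in-memory database
con.execute(
    """
    CREATE TABLE orders(
        order_ioc VARCHAR PRIMARY KEY,
        seq SMALLINT NOT NULL,
        familiar_order VARCHAR,
        taxonomy VARCHAR
    )
    """
)  # SQLite accepts these type names but applies them loosely ("type affinity")

# Hypothetical placeholder rows, deliberately inserted out of order.
rows = [
    ("ORDER_C", 3, "placeholder C", "placeholder"),
    ("ORDER_A", 1, "placeholder A", "placeholder"),
    ("ORDER_B", 2, "placeholder B", "placeholder"),
]
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)  # populate the table

# Same query as the cell above: sort by seq, keep at most five rows.
result = con.execute("SELECT * FROM orders ORDER BY seq LIMIT 5").fetchall()
for row in result:
    print(row)
con.close()
```

DuckDB also ships its own Python package (the `duckdb` module installed earlier), which can be used from Python programs in a similar spirit; consult its documentation for the exact API.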