# Introduction to Smartnoise-SQL

[Smartnoise-SQL](https://docs.smartnoise.org/sql/index.html) is a Python library for running differentially private SQL queries.

SmartNoise is intended for scenarios where the analyst is trusted by the data owner.

## Step 1: Install the library

Smartnoise-sql is available on PyPI and can be installed with the `pip` command. We will use the latest version of the library to date: version 1.0.6.

In [2]:
!pip install smartnoise-sql==1.0.6

Defaulting to user installation because normal site-packages is not writeable
Collecting smartnoise-sql==1.0.6
  Downloading smartnoise_sql-1.0.6-py3-none-any.whl.metadata (9.6 kB)
Collecting antlr4-python3-runtime==4.9.3 (from smartnoise-sql==1.0.6)
  Downloading antlr4-python3-runtime-4.9.3.tar.gz (117 kB)
  Preparing metadata (setup.py) ... done
Collecting graphviz<1.0,>=0.17 (from smartnoise-sql==1.0.6)
  Downloading graphviz-0.21-py3-none-any.whl.metadata (12 kB)
Collecting opendp<0.13.0,>=0.8.0 (from smartnoise-sql==1.0.6)
  Downloading opendp-0.12.1-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.1 kB)
Collecting sqlalchemy<3.0.0,>=2.0.0 (from smartnoise-sql==1.0.6)
  Downloading sqlalchemy-2.0.43-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (9.6 kB)
Collecting deprecated (from opendp<0.13.0,>=0.8.0->smartnoise-sql==1.0.6)
  Downloading Deprecated-1.2.18-py2.py3-none-any.whl.metadata (5.7 kB)
Collecting wrapt<2,>=1.10 (from

For this notebook, we will also use the `pandas` library, which is one of the main Python libraries for working with tables. We also install it via `pip`.

In [6]:
!pip install pandas==2.2.3

Defaulting to user installation because normal site-packages is not writeable
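
Since the notebook pins exact versions, it can be useful to confirm what actually ended up in the environment. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of `smartnoise-sql` or `pandas`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for this notebook, not part of any library.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Report the versions of the two packages installed above.
for name in ("smartnoise-sql", "pandas"):
    print(name, installed_version(name))
```

If a package prints `None`, the corresponding `pip install` cell did not install it into the environment the kernel is using.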


## Step 2: Load and prepare data

In this notebook, we will work with the [penguin dataset](https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv) from the [seaborn datasets](https://github.com/mwaskom/seaborn-data) repository.
We load the dataset with pandas into a dataframe `df`.

In [7]:
import pandas as pd

In [8]:
path_to_data = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/penguins.csv"
df = pd.read_csv(path_to_data)

We can look at the first rows of the dataframe to get to know the data:

In [9]:
df.head()

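|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---------|-----------|------|------|-------|--------|--------|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | MALE |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE |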


We see that there are 7 columns: 'species', 'island', 'bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g' and 'sex' with various data types.
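
To make the "various data types" concrete, the sketch below inspects column types and missing values. So that it runs offline, it re-types the five rows shown above into an in-memory CSV instead of downloading the file; the names `sample`, `column_types`, and `missing_per_column` are introduced here for illustration only.

```python
import io

import pandas as pd

# The five rows displayed by df.head(), re-typed as CSV so the sketch runs offline.
sample_csv = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
Adelie,Torgersen,39.1,18.7,181.0,3750.0,MALE
Adelie,Torgersen,39.5,17.4,186.0,3800.0,FEMALE
Adelie,Torgersen,40.3,18.0,195.0,3250.0,FEMALE
Adelie,Torgersen,,,,,
Adelie,Torgersen,36.7,19.3,193.0,3450.0,FEMALE
"""
sample = pd.read_csv(io.StringIO(sample_csv))

# Data type inferred by pandas for each column.
column_types = sample.dtypes.astype(str).to_dict()

# Number of missing values per column (row 3 has no measurements and no sex).
missing_per_column = sample.isna().sum().to_dict()

print(column_types)
print(missing_per_column)
```

The same two calls, `df.dtypes` and `df.isna().sum()`, can be applied to the full dataframe `df`. Knowing which columns are numeric and which contain missing values is helpful when describing the table in the metadata.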

## Step 3: Query with Smartnoise-SQL

### Step 3.a: Prepare the metadata

In the next steps, `smartnoise-sql` will require metadata to run the differentially private queries. The expected format is explained [here](https://docs.smartnoise.org/sql/metadata.html#metadata) in the `smartnoise-sql` documentation. It can be provided in different formats, such as an external `yaml` file or a dictionary. In this notebook we will use the [dictionary format](https://docs.smartnoise.org/sql/metadata.html#dictionary-format).
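
As a rough illustration of what such a dictionary might look like for the penguin table, here is a minimal sketch. It is not the notebook's actual metadata: the table name `penguins`, the variable name `penguin_metadata`, and every `lower`/`upper` bound are hypothetical placeholders, and the key names (`row_privacy`, `type`, `lower`, `upper`) and the nesting are assumed to follow the documentation linked above. Bounds should come from domain knowledge rather than from the private data itself.

```python
# Illustrative sketch only: key names and nesting are assumed to follow the
# smartnoise-sql metadata documentation; all bounds are hypothetical placeholders.
penguin_metadata = {
    "": {  # collection / database level (left unnamed in this sketch)
        "": {  # schema level (left unnamed in this sketch)
            "penguins": {  # hypothetical table name
                "row_privacy": True,  # assumption: each row is one individual
                "species": {"type": "string"},
                "island": {"type": "string"},
                "bill_length_mm": {"type": "float", "lower": 30.0, "upper": 65.0},
                "bill_depth_mm": {"type": "float", "lower": 13.0, "upper": 23.0},
                "flipper_length_mm": {"type": "float", "lower": 150.0, "upper": 250.0},
                "body_mass_g": {"type": "float", "lower": 2000.0, "upper": 7000.0},
                "sex": {"type": "string"},
            }
        }
    }
}

# List the columns that declare bounds; numeric columns need them so that
# the sensitivity of sums and averages can be bounded.
table = penguin_metadata[""][""]["penguins"]
bounded_columns = sorted(
    name
    for name, spec in table.items()
    if isinstance(spec, dict) and "lower" in spec and "upper" in spec
)
print(bounded_columns)
```

Wider bounds are safer against clipping real values but add more noise to results, so the choice is a trade-off between bias and accuracy.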

All global parameters have default values that we will