# Connecting to Postgres from a Jupyter notebook

Jupyter provides "magic" commands for this: a command prefixed with `%` (a line magic) applies to a single line, while one prefixed with `%%` (a cell magic) applies to the whole code cell.

We first run `%load_ext sql` to load the SQL extension.

Then we establish a connection with `%sql postgresql://<username>:<password>@localhost[/<dbname>]`, where the database name is optional.
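If the password contains characters such as `@` or `/`, the credentials have to be percent-encoded before they go into the URL. A minimal sketch of building the connection string in Python; the helper name `postgres_url` and the example credentials are made up for illustration:

```python
from urllib.parse import quote_plus


def postgres_url(username, password, host="localhost", dbname=None):
    """Build a postgresql:// URL, percent-encoding the credentials."""
    url = f"postgresql://{quote_plus(username)}:{quote_plus(password)}@{host}"
    # The database name is optional, mirroring the [/<dbname>] part above.
    if dbname:
        url += f"/{quote_plus(dbname)}"
    return url


# Hypothetical credentials, for illustration only.
print(postgres_url("postgres", "p@ss/word", dbname="discogs"))
# prints: postgresql://postgres:p%40ss%2Fword@localhost/discogs
```

After assigning the result to a variable such as `url`, it should be usable as `%sql $url`, relying on IPython's variable expansion in magics.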

Creating a dedicated database isolates everything we do and makes it easy to drop it later and start again.

We create a database with `CREATE DATABASE <dbname>` and drop it (if it exists) with `DROP DATABASE IF EXISTS <dbname>`.
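One practical wrinkle: Postgres refuses to run `CREATE DATABASE` inside a transaction block, so the connection usually has to be in autocommit mode. Below is a sketch of a reset helper written against the generic Python DB-API. The function names and the `discogs` database name are illustrative, and the connection is assumed to come from a driver such as psycopg2 with autocommit already enabled:

```python
def quote_ident(name):
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reset_statements(dbname):
    """Return the statements that drop and then recreate a database."""
    ident = quote_ident(dbname)
    return [f"DROP DATABASE IF EXISTS {ident}", f"CREATE DATABASE {ident}"]


def reset_database(conn, dbname):
    """Run the reset statements on a DB-API connection.

    Assumption: `conn` is already in autocommit mode, because
    CREATE DATABASE cannot run inside a transaction block.
    """
    cur = conn.cursor()
    for stmt in reset_statements(dbname):
        cur.execute(stmt)
    cur.close()


# Show the statements without touching a server.
for stmt in reset_statements("discogs"):
    print(stmt)
```

Quoting the identifier keeps unusual database names from breaking the statement, at the cost of making the name case-sensitive.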

# Configuring MADlib

To set up MADlib, go to http://madlib.apache.org/download.html and get the appropriate package (for Ubuntu pick the 4th option). After installation, we need to install MADlib into our Postgres database (here, `discogs`):

```
/usr/local/madlib/bin/madpack -s madlib -p postgres -c postgres@localhost/discogs install
```

Then, to check if everything went OK, we run the `SELECT madlib.version()` query.

# Defining the database schema

We start by running `DROP TABLE IF EXISTS <table1>[, <table2>, ...]` to delete any tables left over from a previous run. This makes the code block _idempotent_, so we can run it multiple times without errors.

Then we define the schema for all tables. The relational model is the following:

```
artists : (artist_id : int, name : varchar(256)?, realname : text?, profile : text?, url : text?)
    key: artist_id

releases : (release_id : int, released : date, title : text, country : varchar(256)?, genre : varchar(256))
    key: release_id

released_by : (release_id : int, artist_id : int)
    key: release_id, artist_id
    foreign key: release_id : releases(release_id), artist_id : artists(artist_id)

tracks : (release_id : int, position : varchar(128), title : text?, duration : int?)
    key: release_id, position
    foreign key: release_id : releases(release_id)

```

We use the `?` sign to denote attributes that are nullable.
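The drop-then-create pattern and the model above can be sketched as runnable code. Two assumptions are worth flagging: the DDL is one possible reading of the model (attributes without `?` become `NOT NULL`), and Python's built-in `sqlite3` stands in for Postgres only so the sketch runs without a server. In the notebook, the equivalent statements would go in a `%%sql` cell.

```python
import sqlite3

# Referencing tables come first, so nothing is dropped while still referenced.
TABLES = ["released_by", "tracks", "releases", "artists"]

SCHEMA = """
CREATE TABLE artists (
    artist_id int PRIMARY KEY,
    name      varchar(256),
    realname  text,
    profile   text,
    url       text
);
CREATE TABLE releases (
    release_id int PRIMARY KEY,
    released   date NOT NULL,
    title      text NOT NULL,
    country    varchar(256),
    genre      varchar(256) NOT NULL
);
CREATE TABLE released_by (
    release_id int NOT NULL REFERENCES releases(release_id),
    artist_id  int NOT NULL REFERENCES artists(artist_id),
    PRIMARY KEY (release_id, artist_id)
);
CREATE TABLE tracks (
    release_id int NOT NULL REFERENCES releases(release_id),
    position   varchar(128) NOT NULL,
    title      text,
    duration   int,
    PRIMARY KEY (release_id, position)
);
"""


def create_schema(conn):
    """Drop and recreate all four tables; safe to call repeatedly."""
    for table in TABLES:
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")  # stand-in for the Postgres database
create_schema(conn)
create_schema(conn)  # the second run succeeds because of the DROP step
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)
print([r[0] for r in rows])
# prints: ['artists', 'released_by', 'releases', 'tracks']
```

Without the `DROP TABLE IF EXISTS` step, the second call would fail because the tables already exist.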

# Visualizing the ER diagram of the database

We can use the eralchemy tool to display the ER diagram of the database schema we have just created. We import the `render_er` function from the `eralchemy` module and use the `Image` class from `IPython.display` to show the PNG that eralchemy outputs.

***

_Note:_ To install eralchemy, you will need to first run

`sudo apt-get install graphviz libgraphviz-dev`

and then install the Python package with:

`pip install eralchemy`

***
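Once both packages are installed, the rendering step described above might look like the following sketch. The wrapper name `render_schema_png` and the default file name are hypothetical; only `render_er(input, output)` and `Image(filename=...)` come from the libraries themselves.

```python
def render_schema_png(db_url, png_path="er_diagram.png"):
    """Render the ER diagram of the database at `db_url` into a PNG file."""
    # Imported lazily so the helper can be defined before eralchemy is installed.
    from eralchemy import render_er

    # eralchemy reflects the schema from the database URL and writes the image.
    render_er(db_url, png_path)
    return png_path


# In a notebook cell (needs a running database; the URL is a placeholder):
# from IPython.display import Image
# Image(filename=render_schema_png("postgresql://<username>:<password>@localhost/<dbname>"))
```

Returning the path keeps the display step a one-liner, and the connection URL is the same one passed to `%sql` earlier.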

# Loading the data