# CrateDB UK Offshore Wind Farms Data Workshop

![Turbines forming part of a wind farm near the UK coastline.](multi-model-offshore-wind-farms.jpg "A wind farm near the UK coastline.")

This workshop explores multi-model data modeling and queries with [CrateDB](https://cratedb.com), using data from [The Crown Estate](https://www.thecrownestate.co.uk/our-business/marine/offshore-wind), which manages the UK's offshore wind farms.  It is derived from a conference presentation that you can [watch on YouTube](https://www.youtube.com/watch?v=xqiLGjaTlBk).

You'll work with tables containing data for:

* **Wind Farms**.  Details of the UK's 45 offshore wind farms are loaded into a CrateDB table from the supplied JSONL file.  Each record includes an ID for the wind farm as well as a name, description and geospatial data in [WKT format](https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry) that describes the shape of the wind farm as one or more polygons.  The co-ordinates of each turbine are also included, where known.

* **Hourly Wind Farm Performance Data**.  This table contains time-series data pertaining to the power output of each wind farm on an hourly basis for the period 19th August 2024 to 28th October 2024.  The data is supplied as a compressed JSONL file.

## Install Dependencies

First, install the required dependencies by executing the `pip install` command below.

In [None]:
! pip install -U ipyleaflet sqlalchemy-cratedb pandas

## Connect to CrateDB

Before going any further, you'll need to update the code below to include a connection string for your CrateDB cluster.  If you prefer, you can set the environment variable `CRATEDB_CONNECTION_STRING` instead.

The code below assumes that you're using a managed CrateDB Cloud database cluster.  [Sign up here](https://console.cratedb.cloud/) to create a free cluster.

Alternatively, if you are running CrateDB locally (for example with [Docker](https://hub.docker.com/_/crate)), uncomment the `localhost` code block and use it to establish a database connection instead.

In [None]:
import os
import sqlalchemy as sa

# Define database address when using CrateDB Cloud.
# Please find these settings on your cluster overview page.
CONNECTION_STRING = os.environ.get(
    "CRATEDB_CONNECTION_STRING",
    "crate://<USERNAME>:<PASSWORD>@<HOST>/?ssl=true",
)

# # Define database address when using CrateDB on localhost.
# CONNECTION_STRING = os.environ.get(
#    "CRATEDB_CONNECTION_STRING",
#    "crate://crate@localhost/",
# )

# Connect to CrateDB using SQLAlchemy.
engine = sa.create_engine(
    CONNECTION_STRING,
    echo=sa.util.asbool(os.environ.get("DEBUG", "false")),
)
connection = engine.connect()
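
If your username or password contains characters such as `@` or `/`, they need to be percent-encoded before they are placed in the connection string.  The sketch below shows one way to do that with the Python standard library.  `build_connection_string` is a hypothetical helper written for this example, not part of SQLAlchemy or `sqlalchemy-cratedb`, and the credentials and host shown are placeholders.

```python
from urllib.parse import quote


def build_connection_string(username, password, host, ssl=True):
    """Assemble a crate:// URL, percent-encoding the credentials.

    Hypothetical helper for illustration only.
    """
    credentials = quote(username, safe="")
    if password:
        credentials += ":" + quote(password, safe="")
    suffix = "/?ssl=true" if ssl else "/"
    return f"crate://{credentials}@{host}{suffix}"


# Placeholder values only; substitute the details of your own cluster.
print(build_connection_string("admin", "p@ss/word", "example.invalid:4200"))
# prints: crate://admin:p%40ss%2Fword@example.invalid:4200/?ssl=true
```

The result can be assigned to `CONNECTION_STRING` or exported as the `CRATEDB_CONNECTION_STRING` environment variable.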

## Create Tables

Next, we'll create two tables as follows:

* `windfarms`: Contains data about each wind farm, including geospatial data, the number, type and location of each turbine, and a free-text description providing an overview of the wind farm and its history.

* `windfarm_output`: Hourly records for each wind farm containing details of the actual output for that hour and the percentage of the maximum output that the wind farm was operating at.

Run the code below to create them, taking a moment to understand the table schemas.

In [None]:
# Drop any previous version of the tables.
_ = connection.execute(sa.text("DROP TABLE IF EXISTS windfarms;"))
_ = connection.execute(sa.text("DROP TABLE IF EXISTS windfarm_output;"))

# Create the tables.

_ = connection.execute(sa.text(
"""
    CREATE TABLE windfarms (
        id TEXT PRIMARY KEY,
        name TEXT,
        description TEXT INDEX USING fulltext WITH (analyzer='english'),
        location GEO_POINT,
        territory TEXT,
        boundaries GEO_SHAPE INDEX USING geohash WITH (PRECISION='1m', DISTANCE_ERROR_PCT=0.025),
        turbines OBJECT(STRICT) AS (
            brand TEXT,
            model TEXT,
            locations ARRAY(GEO_POINT),
            howmany SMALLINT
        ),
        capacity DOUBLE PRECISION,
        url TEXT
    );
"""    
))

_ = connection.execute(sa.text(
"""
    CREATE TABLE windfarm_output (
        windfarmid TEXT,
        ts TIMESTAMP WITHOUT TIME ZONE,
        day TIMESTAMP WITH TIME ZONE GENERATED ALWAYS AS date_trunc('day', ts),
        output DOUBLE PRECISION,
        outputpercentage DOUBLE PRECISION
    ) PARTITIONED BY (day);
"""    
))

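A few points to note about these schemas:

* `description` has a full-text index that uses the `english` analyzer, so it can be searched with full-text queries.

* `location` is a `GEO_POINT` and `boundaries` is a `GEO_SHAPE` with a `geohash` index, so both can be used in geospatial queries.

* `turbines` is an `OBJECT(STRICT)` column, meaning that it can only contain the sub-columns declared in the schema.  One of these, `locations`, is an array of `GEO_POINT` values.

* `windfarm_output` is partitioned by `day`, a generated column that CrateDB computes from `ts`.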
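
The `day` column in `windfarm_output` is generated from `ts` with `date_trunc('day', ts)`, and the table is partitioned by that column, so rows that share a day end up in the same partition.  The following sketch imitates that truncation in plain Python to show how hourly timestamps collapse into daily partition values.  It assumes UTC, and the sample timestamps are hypothetical rather than taken from the dataset.

```python
from collections import Counter
from datetime import datetime, timezone


def truncate_to_day(ts):
    """Return midnight (UTC) of the day containing ts, similar to date_trunc('day', ts)."""
    return ts.astimezone(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)


# Hypothetical hourly timestamps for illustration only.
sample_timestamps = [
    datetime(2024, 8, 19, 22, 0, tzinfo=timezone.utc),
    datetime(2024, 8, 19, 23, 0, tzinfo=timezone.utc),
    datetime(2024, 8, 20, 0, 0, tzinfo=timezone.utc),
]

# Count how many sample rows would share each partition value.
rows_per_partition = Counter(
    truncate_to_day(ts).date().isoformat() for ts in sample_timestamps
)
for day, row_count in sorted(rows_per_partition.items()):
    print(day, row_count)
# prints:
# 2024-08-19 2
# 2024-08-20 1
```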

## Loading the Data

We'll load the data from files contained in the `cratedb-datasets` public GitHub repository.  There's one JSONL file for each table.  The file containing the hourly output data has been compressed.

The code that follows populates each table in turn, using `COPY FROM` statements.

In [None]:
def display_results(table_name, info):
    print(f"{table_name}: loaded {info['success_count']}, errors: {info['error_count']}")

    if info["error_count"] > 0:
        print(f"Errors: {info['errors']}")


# Load the wind farm data.
result = connection.execute(sa.text("""
    COPY windfarms 
    FROM 'https://github.com/crate/cratedb-datasets/raw/main/devrel/uk-offshore-wind-farm-data/wind_farms.json'
    RETURN SUMMARY;
"""))

display_results("windfarms", result.mappings().first())

In [None]:
# Load the wind farm output data.
result = connection.execute(sa.text("""
    COPY windfarm_output
    FROM 'https://github.com/crate/cratedb-datasets/raw/main/devrel/uk-offshore-wind-farm-data/wind_farm_output.json.gz' 
    WITH (compression='gzip')
    RETURN SUMMARY;
"""))

display_results("windfarm_output", result.mappings().first())
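
If you would rather have the notebook stop when a load fails than read the printed summary by eye, the check can be automated.  This is a minimal sketch: `check_copy_summary` is a hypothetical helper, and the summary dictionaries below are made-up stand-ins for the row returned by `result.mappings().first()`, using the same `success_count`, `error_count` and `errors` keys as the code above.

```python
def check_copy_summary(table_name, info):
    """Raise if a COPY FROM ... RETURN SUMMARY row reports failed records.

    Hypothetical helper for illustration; returns the number of loaded records.
    """
    if info["error_count"] > 0:
        raise RuntimeError(
            f"{table_name}: {info['error_count']} records failed to load: {info['errors']}"
        )
    return info["success_count"]


# Made-up summary rows for illustration, not real load results.
clean_summary = {"success_count": 3, "error_count": 0, "errors": {}}
failed_summary = {"success_count": 2, "error_count": 1, "errors": {"example error": 1}}

print(check_copy_summary("windfarms", clean_summary))

try:
    check_copy_summary("windfarm_output", failed_summary)
except RuntimeError as error:
    print(f"Load check failed: {error}")
```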

Once the data's loaded, verify that the output shows 0 errors for each table.  Next, we'll run `REFRESH` and `ANALYZE` commands to make sure that the data's ready for immediate querying.  This isn't normally necessary as CrateDB will perform these tasks automatically in the background.  We're invoking them manually here to ensure that everyone in the workshop is on the same page at the same time.

In [None]:
_ = connection.execute(sa.text("REFRESH TABLE windfarms, windfarm_output"))
_ = connection.execute(sa.text("ANALYZE"))

TODO... the bulk of the query examples content!

## Continue your Learning Journey

To learn more about CrateDB, sign up for our free courses at the CrateDB Academy.  We recommend the [CrateDB Fundamentals course](https://learn.cratedb.com/cratedb-fundamentals) for a comprehensive overview, and our [Advanced Time Series course](https://learn.cratedb.com/time-series) for a deep dive into time series data modelling, queries and aggregations.