# Introduction to TileDB Arrays

* This notebook is part 1 of the FOSS4G workshop "Universal data management for all geospatial data in TileDB" 
* Find the notebook on [GitHub]() or [TileDB Cloud]()

## Outline

* [A simple table](#table)
    * [Create a table](#table1)
    * [Convert the table to a TileDB array](#table2)
    * [Explore the TileDB array](#table3)
* [Dense arrays](#dense)
    * [Dense array schema](#dense1)
* Sparse arrays
* TileDB Cloud    

In [1]:
import numpy as np
import pandas as pd

import tiledb

<a id="table"></a>
## A simple table

<a id="table1"></a>
### Create a table 

> The [original dataset](https://simplemaps.com/data/world-cities) is cleaned up in [this notebook]()

In [2]:
capitals = pd.read_csv("./data/capitals.csv")
capitals.head()

| | Unnamed: 0 | city | lat | lon | country | iso3 | population |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Tokyo | 35.6897 | 139.6922 | Japan | JPN | 37977000.0 |
| 1 | 1 | Jakarta | -6.2146 | 106.8451 | Indonesia | IDN | 34540000.0 |
| 2 | 4 | Manila | 14.5958 | 120.9772 | Philippines | PHL | 23088000.0 |
| 3 | 7 | Seoul | 37.5833 | 127.0 | Korea, South | KOR | 21794000.0 |
| 4 | 8 | Mexico City | 19.4333 | -99.1333 | Mexico | MEX | 20996000.0 |


<a id="table2"></a>
### Convert the table to a TileDB array

With [pandas](https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#tiledb.from_pandas):

In [3]:
uri = "arrays/capitals1"

tiledb.from_pandas(uri, capitals)

Or directly [from the csv file](https://tiledb-inc-tiledb.readthedocs-hosted.com/projects/tiledb-py/en/stable/python-api.html#tiledb.from_csv):

In [4]:
uri = "arrays/capitals2"

tiledb.from_csv(uri, "data/capitals.csv")

That is all! You have created your first TileDB array! 

<a id="table3"></a>
### Explore the TileDB array

But what do these arrays look like on disk, and how can you work with the data in them?

> Find more info about the TileDB format specification [here](https://github.com/TileDB-Inc/TileDB/blob/dev/format_spec/FORMAT_SPEC.md) or in the [docs](https://docs.tiledb.com/main/basic-concepts/data-format)

An array is stored in a directory. For `capitals1` this looks like: 

In [5]:
%ls arrays/capitals1

__1626970989823_1626970989823_b9cf74a787674969adb4664bbc1034ef_8/
__1626970989823_1626970989823_b9cf74a787674969adb4664bbc1034ef_8.ok
__array_schema.tdb
__lock.tdb
__meta/


Also have a look inside the first folder, which holds the data that was written. **Update the path in the cell below to match the folder name on your system**:

In [8]:
%ls arrays/capitals1/__1626970989823_1626970989823_b9cf74a787674969adb4664bbc1034ef_8/

Unnamed%3A 0.tdb         country.tdb              lat.tdb
__fragment_metadata.tdb  country_var.tdb          lon.tdb
city.tdb                 iso3.tdb                 population.tdb
city_var.tdb             iso3_var.tdb


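In the listing above you will recognise the column names from the table. Each column is stored in its own `.tdb` file, and the string columns (`city`, `country`, `iso3`) have an additional `_var.tdb` file because their values are variable-length.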
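To make the link between file names and columns concrete, here is a minimal sketch that uses only the Python standard library. It assumes the file names from the listing above (format version 8, as the `_8` suffix of the folder suggests); newer TileDB format versions may lay out this folder differently. The helper `attributes_from_files` is a hypothetical name introduced for this illustration and is not part of the TileDB API.

```python
from urllib.parse import unquote

# File names copied from the directory listing above.
fragment_files = [
    "Unnamed%3A 0.tdb", "country.tdb", "lat.tdb",
    "__fragment_metadata.tdb", "country_var.tdb", "lon.tdb",
    "city.tdb", "iso3.tdb", "population.tdb",
    "city_var.tdb", "iso3_var.tdb",
]


def attributes_from_files(names):
    """Map each column name to True if it has a companion ``_var`` file.

    Hypothetical helper: it only parses file names and does not read TileDB data.
    """
    # Drop the ".tdb" suffix and skip internal files such as __fragment_metadata.
    stems = {
        n[: -len(".tdb")]
        for n in names
        if n.endswith(".tdb") and not n.startswith("__")
    }
    # A stem "x_var" counts as a companion file only if "x" also exists.
    companions = {
        s for s in stems if s.endswith("_var") and s[: -len("_var")] in stems
    }
    # File names are percent-encoded ("%3A" is ":"), so decode them.
    return {
        unquote(s): (s + "_var") in companions
        for s in sorted(stems - companions)
    }


columns = attributes_from_files(fragment_files)
for name, is_var in columns.items():
    print(f"{name!r}: variable-length={is_var}")
```

The columns flagged as variable-length here should be the same ones that appear with `var=True` when the schema is printed.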

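An array is defined by its schema, which describes the dimensions (the domain), the attributes and their data types, the cell and tile order, and whether the array is dense or sparse.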

Load the schema:

In [6]:
uri = "arrays/capitals1"
A = tiledb.open(uri)
print(A.schema)

ArraySchema(
  domain=Domain(*[
    Dim(name='__tiledb_rows', domain=(0, 207), tile='207', dtype='uint64'),
  ]),
  attrs=[
    Attr(name='Unnamed: 0', dtype='int64', var=False, nullable=False),
    Attr(name='city', dtype='<U0', var=True, nullable=False),
    Attr(name='lat', dtype='float64', var=False, nullable=False),
    Attr(name='lon', dtype='float64', var=False, nullable=False),
    Attr(name='country', dtype='<U0', var=True, nullable=False),
    Attr(name='iso3', dtype='<U0', var=True, nullable=False),
    Attr(name='population', dtype='float64', var=False, nullable=False),
  ],
  cell_order='row-major',
  tile_order='row-major',
  capacity=10000,
  sparse=False,
  coords_filters=FilterList([ZstdFilter(level=-1)]),
)



Load all data:

Load a slice of the data:

Load filtered data:

This first example is a dense array with a single dimension, the row index (note `sparse=False` in the schema above). Let's now go into a little more detail on dense arrays.

<a id="dense"></a>
## Dense arrays

### Dense array schema