# Stage 2: prepare station metadata in the DocumentDB

#### 🚀 Targets
1. Verify the DocumentDB connection (correct URI and read/write permission).
2. Populate the DocumentDB with station metadata.

#### ⚠️ Checklist
1. Make sure the DocumentDB cluster is running and `DOCDB_ENDPOINT_URI` in [parameters.py](../sb_catalog/src/parameters.py) is set correctly.
2. This notebook must run on an EC2 instance in the same VPC and security group as the DocumentDB cluster.
3. Make sure the certificate bundle `global-bundle.pem` exists in your current directory. If not, download it using the command below.
   ```
       $ wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem
   ```
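
Checklist item 3 can also be checked from inside the notebook. The snippet below is a minimal sketch, not part of the original workflow: the helper name `ca_bundle_present` is hypothetical, and it only tests that the file exists and is non-empty, not that its contents are valid.

```python
from pathlib import Path


def ca_bundle_present(directory=".", name="global-bundle.pem"):
    """Return True if the certificate bundle exists and is non-empty.

    Hypothetical helper for illustration; it does not validate the PEM contents.
    """
    path = Path(directory) / name
    return path.is_file() and path.stat().st_size > 0


if not ca_bundle_present():
    print("global-bundle.pem not found; download it with the wget command above.")
```

Running this before connecting makes a missing bundle easier to spot than a TLS error raised later by the database client.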

In [None]:
import glob
import sys

import pandas as pd
from tqdm import tqdm

sys.path.append("../sb_catalog")

from src.parameters import DOCDB_ENDPOINT_URI
from src.utils import SeisBenchDatabase

## 1. Populate the DocumentDB

Connect to the DocumentDB and write the station metadata.

In [None]:
db = SeisBenchDatabase(DOCDB_ENDPOINT_URI, "earthscope")

In [None]:
for netfile in tqdm(sorted(glob.glob("../networks/*.zip"))):
    # pandas infers zip compression from the extension (one CSV per archive)
    stations = pd.read_csv(netfile)
    # Collapse channel codes to their unique two-character prefixes (e.g. HHZ -> HH)
    stations["channels"] = stations["channels"].apply(
        lambda cha: ",".join(sorted({c[:2] for c in cha.split(",")}))
    )
    # The location code is the last dot-separated field of the station id
    stations["location_code"] = stations["id"].apply(lambda x: x.split(".")[-1])
    db.write_stations(stations)
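
To see what the transformation in the loop does without touching the database, here is a small sketch on made-up data. The station ids and channel lists are invented for illustration, and the `network.station.location` id layout is an assumption inferred from the `split(".")[-1]` call. The helper name `collapse_channels` is also hypothetical.

```python
import pandas as pd


def collapse_channels(channels):
    """Reduce a comma-separated channel list to its unique two-character prefixes."""
    return ",".join(sorted({c[:2] for c in channels.split(",")}))


# Made-up rows; real station files may carry more columns.
toy = pd.DataFrame(
    {
        "id": ["NC.ABC.00", "BK.XYZ."],
        "channels": ["HHE,HHN,HHZ,EHZ", "BHZ"],
    }
)

toy["channels"] = toy["channels"].map(collapse_channels)
# Last dot-separated field of the id; empty string when no location code is given
toy["location_code"] = toy["id"].str.split(".").str[-1]

print(toy)
```

Sorting the prefixes keeps the stored string deterministic across runs, which a plain `set` does not guarantee.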

## 2. Check the DocumentDB

Confirm that the DB has been populated with the station metadata and that we can read from it.

In [None]:
network = "BG,BK,BP,NC,PG,WR"
db.get_stations(None, network)