# Catalog: analytic_tables

Before we can download datasets from the _analytic_tables_ catalog, we need a
username and password to access the remote machine over SSH.

Our metadata catalog provides all the information needed to open an SFTP
connection to the machine that holds the dataset. First, we check which
datasets the catalog contains.

In [None]:
import src.utils as ut
import src.ftp as ftp

# Setup the root path of the application
project_path = ut.project_path()

# Inspect the catalog metadata file to see which datasets it contains
meta_filename = f"{project_path}/meta/mosquito_alert/analytic_tables.json"
ut.info_meta(meta_filename)
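
`ut.info_meta` is a helper from this project's `src.utils` module, so its exact output is not reproduced here. As a rough sketch of the idea, assuming the metadata file is JSON with a top-level _hasPart_ list whose entries carry a `name` and a `distribution` list (the `EXAMPLE_META` content, the host and the `another_table` entry below are hypothetical), listing the datasets by position could look like this:

```python
import json

# Hypothetical metadata content: the real analytic_tables.json may carry more
# fields; this sketch only assumes a top-level "hasPart" list whose entries
# have a "name" and a "distribution" list with a "contentUrl".
EXAMPLE_META = """
{
  "name": "analytic_tables",
  "hasPart": [
    {"name": "tigaserver_app_tigauser",
     "distribution": [{"name": "sftp",
                       "contentUrl": "sftp://user@example.org:22/data/tigauser.csv"}]},
    {"name": "another_table",
     "distribution": [{"name": "sftp",
                       "contentUrl": "sftp://user@example.org:22/data/another.csv"}]}
  ]
}
"""


def list_datasets(meta: dict) -> list[str]:
    """Return dataset names in hasPart order (index 0 is the first part)."""
    return [part["name"] for part in meta.get("hasPart", [])]


meta = json.loads(EXAMPLE_META)
for idx, name in enumerate(list_datasets(meta)):
    print(idx, name)
# 0 tigaserver_app_tigauser
# 1 another_table
```

For the real catalog, `with open(meta_filename) as f: meta = json.load(f)` would replace the inline string; the printed index is the position of each dataset in _hasPart_.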

## Dataset: _tigaserver_app_tigauser_

### 1. Distribution via SFTP from the MosquitoAlert web server


To download the dataset _tigaserver_app_tigauser_ we only need its
contentUrl. Note that _analytic_tables_ has many parts (datasets);
_tigaserver_app_tigauser_ is the first entry in the _hasPart_ list of the
metadata file, which is why we pass `idx_hasPart=0`.

In [None]:
# Get the dataset tigaserver_app_tigauser from the analytic_tables catalog
contentUrl, dataset_name, distr_name = ut.get_meta(
    meta_filename, idx_distribution=0, idx_hasPart=0, parse=True
)

# Make folders for data download
path = f"{project_path}/data/{dataset_name}/{distr_name}"
ut.makedirs(path)
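
With `parse=True`, `ut.get_meta` returns a `contentUrl` object exposing `hostname`, `port`, `username` and `path`, which matches what the standard library's `urllib.parse.urlparse` produces. The sketch below is a hypothetical stand-in (`get_meta_sketch` is not part of `src.utils`), assuming a JSON layout where each _hasPart_ entry holds a `distribution` list of objects with `name` and `contentUrl`: it picks the part and the distribution by index and parses the URL.

```python
from urllib.parse import urlparse


def get_meta_sketch(meta: dict, idx_distribution: int = 0, idx_hasPart: int = 0):
    """Hypothetical stand-in for ut.get_meta(..., parse=True); not the project's code."""
    part = meta["hasPart"][idx_hasPart]  # e.g. tigaserver_app_tigauser
    distribution = part["distribution"][idx_distribution]  # e.g. the SFTP distribution
    # urlparse splits "sftp://user@host:port/path" into the attributes used by
    # the download cell: .hostname, .port, .username and .path
    content_url = urlparse(distribution["contentUrl"])
    return content_url, part["name"], distribution["name"]


# Hypothetical metadata: names, host and path are placeholders
meta = {
    "hasPart": [
        {
            "name": "tigaserver_app_tigauser",
            "distribution": [
                {"name": "sftp", "contentUrl": "sftp://user@example.org:2222/data/tigauser.csv"}
            ],
        }
    ]
}

contentUrl, dataset_name, distr_name = get_meta_sketch(meta, idx_distribution=0, idx_hasPart=0)
print(contentUrl.hostname, contentUrl.port, contentUrl.username, contentUrl.path)
# example.org 2222 user /data/tigauser.csv
```

If a contentUrl omits the port, `urlparse` returns `None` for `.port`, so a caller would typically fall back to 22, the default SSH port.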

In [None]:
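import getpass

# Prompt for the SFTP password without echoing it on screen
password = getpass.getpass("SFTP password: ")

# Download the remote CSV over SFTP into a pandas DataFrame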
df = ftp.read_csv_sftp(
    hostname=contentUrl.hostname,
    port=contentUrl.port,
    username=contentUrl.username,
    password=password,
    remotepath=contentUrl.path,
)

df.info()
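
`ftp.read_csv_sftp` lives in this project's `src.ftp` module and is not shown here. One plausible way to implement such a function, assuming the third-party `paramiko` library (not mentioned in the original), is to open an SSH session, fetch the remote file over SFTP and hand the bytes to `pandas.read_csv`. `read_csv_sftp_sketch` and `frame_from_bytes` are hypothetical names:

```python
import io

import pandas as pd


def frame_from_bytes(raw: bytes) -> pd.DataFrame:
    """Parse raw CSV bytes (as downloaded) into a DataFrame."""
    return pd.read_csv(io.BytesIO(raw))


def read_csv_sftp_sketch(hostname, port, username, password, remotepath):
    """Hypothetical equivalent of ftp.read_csv_sftp built on paramiko."""
    # Imported here so frame_from_bytes stays usable without paramiko installed
    import paramiko

    client = paramiko.SSHClient()
    # Trust hosts listed in ~/.ssh/known_hosts; paramiko's default policy
    # rejects unknown hosts instead of silently accepting them
    client.load_system_host_keys()
    client.connect(hostname, port=port or 22, username=username, password=password)
    try:
        with client.open_sftp() as sftp, sftp.open(remotepath, "rb") as fh:
            fh.prefetch()  # pipeline read requests to speed up the download
            return frame_from_bytes(fh.read())
    finally:
        client.close()
```

Reading the whole file into memory keeps the parsing step independent of the network code, which is convenient for testing and fine for moderately sized tables; very large files would call for streaming instead.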

In [None]:
# Save the dataset as both Parquet and CSV
filename = f"{path}/dataset"
df.to_parquet(f"{filename}.parquet")  # compact binary format; requires pyarrow (or fastparquet)
df.to_csv(f"{filename}.csv", index=False)  # plain text, about 10x larger than the Parquet file
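
The size difference between the two formats depends on the data, so it is worth measuring rather than assuming. A minimal sketch, using a synthetic and deliberately repetitive frame (not the _tigaserver_app_tigauser_ data) and a hypothetical `save_both` helper, writes both files and reports their sizes; if no Parquet engine is installed, pandas raises `ImportError` and the sketch records `None`:

```python
import os
import tempfile

import pandas as pd


def save_both(df: pd.DataFrame, stem: str) -> dict:
    """Write df as CSV and Parquet and return the file sizes in bytes."""
    sizes = {}
    df.to_csv(f"{stem}.csv", index=False)
    sizes["csv"] = os.path.getsize(f"{stem}.csv")
    try:
        # pandas raises ImportError when neither pyarrow nor fastparquet is installed
        df.to_parquet(f"{stem}.parquet")
        sizes["parquet"] = os.path.getsize(f"{stem}.parquet")
    except ImportError:
        sizes["parquet"] = None
    return sizes


# Synthetic, highly repetitive data: Parquet's dictionary and run-length
# encoding shrink this far more than typical real-world columns would
demo = pd.DataFrame({"city": ["Barcelona"] * 100_000, "reports": [1] * 100_000})
with tempfile.TemporaryDirectory() as tmp:
    sizes = save_both(demo, os.path.join(tmp, "dataset"))
print(sizes)
```

Running the same helper on the downloaded `df` gives the actual ratio for this dataset.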