Data Service

The Kids First Data Service provides a REST API to the Kids First data.

👩‍💻 Development

Run API

If you're developing an application that talks to the Dataservice API, the fastest way to get a development service of your own running is with Docker Compose.

This means you do not need to install anything on your local machine (besides Docker).

git clone git@github.com:kids-first/kf-api-dataservice.git
cd kf-api-dataservice

# Create the environment variables used by docker-compose
cp .env.sample .env

# Bring up the dataservice and postgres
docker-compose up --build

This will start the Dataservice API on port 5000, with a backing Postgres database initialized with the current data model.
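Once the stack is up, you can talk to the service from Python with only the standard library. A minimal sketch, assuming the compose defaults above (localhost, port 5000); the DATASERVICE_HOST/DATASERVICE_PORT variable names are conveniences for this sketch, not variables the service itself reads:

```python
import os
import urllib.request

def service_url(path="/", host=None, port=None):
    # Fall back to the docker-compose defaults used above.
    # DATASERVICE_HOST / DATASERVICE_PORT are hypothetical convenience
    # variables for this sketch, not ones the service itself reads.
    host = host or os.environ.get("DATASERVICE_HOST", "localhost")
    port = port or os.environ.get("DATASERVICE_PORT", "5000")
    return "http://{}:{}{}".format(host, port, path)

def ping(timeout=5):
    # GET the root of the API, where the swagger docs are served.
    with urllib.request.urlopen(service_url("/"), timeout=timeout) as resp:
        return resp.status

print(service_url("/studies"))
```

Calling `ping()` against a running stack should return 200; the exact resource paths (e.g. `/studies`) are listed in the swagger docs at the root.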

Develop API

If you're developing features of the API and the data model, you can set up your development environment using one of two options:

Option 1: Develop API in Docker

  • Pro: Quick setup, no installation of dependencies
  • Con: Since everything is running in containers, you need to use docker exec to run commands inside them
  • Con: You'll need to make a couple of small changes to run the dockerized service in debug mode (live updates)
  1. In docker-compose.yml, change the following under the dataservice block:
    # command: /bin/ash -c "sleep 5; ./bin/run.sh"
    command: /bin/ash -c "sleep 5; flask db upgrade; ./manage.py"

    ports:
        - "5000:5000"
  2. Bind the host to all interfaces in manage.py:
    if __name__ == '__main__':
        app.run(host="0.0.0.0")
  3. Follow the instructions in Run API

Your service should now be running at http://localhost:5000 inside the docker-compose stack. The changes above run the service in debug mode: when you edit the code, the service reloads automatically, so you can see your updates in real time instead of having to bring the stack down and up again.

Option 2: Develop API on Machine

  • Pro: May be more performant. You don't have to use docker exec -it <command>
  • Con: You need to install several things on your local machine, including Python 3.7.11

In this setup we will run Postgres in a docker container and the Dataservice on your local machine:

# Get source from github
git clone git@github.com:kids-first/kf-api-dataservice.git
cd kf-api-dataservice

# Follow steps to install pyenv
https://realpython.com/intro-to-pyenv/

# Setup python environment and install dependencies
pyenv virtualenv 3.7.11 dataservice_venv
pyenv local dataservice_venv

# Important but *temporary*
# See https://github.com/yaml/pyyaml/issues/724
pip install "cython<3.0.0" && pip install --no-build-isolation "pyyaml==5.4.0"

pip install -r dev-requirements.txt
pip install -r requirements.txt
pip install -e .

# Configure and run postgres 
cp .env.sample .env
docker-compose up dataservice_pg

# Configure and run migrations 
source ./env_local.sh 
flask db upgrade

# Run the flask web application
./manage.py

Database

Running postgres inside of a container and binding back to the host should be sufficient for most development needs. If you want to access psql directly, you can always connect using the following command:

docker exec dataservice_pg psql -U postgres dataservice

If you'd like to use a system install of postgres, or a database running remotely, the dataservice can be configured with the following environment variables:

  • PG_HOST - the host postgres is running on
  • PG_PORT - the port postgres is listening on
  • PG_NAME - the name of the database in postgres
  • PG_USER - the postgres user to connect with
  • PG_PASS - the password of the user
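A script outside the service can assemble these variables into a standard Postgres connection URI. A sketch, assuming fallback defaults that mirror the docker-compose setup above (the defaults are assumptions, not values read from .env.sample):

```python
import os

def pg_uri():
    # Build a postgresql:// URI from the PG_* variables documented above.
    # The fallback defaults mirror the docker-compose setup and are assumptions.
    host = os.environ.get("PG_HOST", "localhost")
    port = os.environ.get("PG_PORT", "5432")
    name = os.environ.get("PG_NAME", "dataservice")
    user = os.environ.get("PG_USER", "postgres")
    password = os.environ.get("PG_PASS", "")
    auth = user if not password else "{}:{}".format(user, password)
    return "postgresql://{}@{}:{}/{}".format(auth, host, port, name)

print(pg_uri())
```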

Indexd

Gen3/Indexd is used for tracking most of the file information in the data model. It requires some environment variables to be set for full functionality; however, that requires a deployment of Indexd, which is currently difficult to do for development. The INDEXD_URL can be set to None so that files may still be registered in the data model, though many of their fields will not be persisted.

  • INDEXD_URL - the url of the indexd api
  • INDEXD_USER - the username of a user in the indexd api
  • INDEXD_PASS - the password of the user in the indexd api

Alternatively, an INDEXD_SECRET may be used in place of INDEXD_USER and INDEXD_PASS to load the secrets from vault.
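A client talking to Indexd directly would typically send these credentials as HTTP basic auth, and skip Indexd entirely when INDEXD_URL is unset. A rough sketch of that pattern; the helper name and header handling here are illustrative, not the dataservice's actual code:

```python
import base64
import os

def indexd_config():
    # Return (url, headers) for Indexd, or None when INDEXD_URL is unset,
    # mirroring the "set INDEXD_URL to None" behavior described above.
    # This helper is illustrative, not part of the dataservice.
    url = os.environ.get("INDEXD_URL")
    if not url or url == "None":
        return None
    creds = "{}:{}".format(os.environ.get("INDEXD_USER", ""),
                           os.environ.get("INDEXD_PASS", ""))
    token = base64.b64encode(creds.encode()).decode()
    return url, {"Authorization": "Basic " + token}
```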

✅ Testing

Unit tests and pep8 linting are run via pytest tests. Depending on your development environment setup, you can run tests like this:

When everything is running in Docker

docker exec dataservice pytest tests

When Dataservice is running locally

pytest tests

📝 Documentation

The swagger docs are served at the root of the service: localhost:5000/.

Generate a Data Model Diagram

An ERD (entity relationship diagram) may be found in the docs/ directory, or may be regenerated after changes to the data schema. Doing so requires the ERAlchemy library.

Unfortunately, the original source code currently has a bug that causes cardinality labels to be drawn backwards (e.g. 1 to N vs N to 1), so you must install the following dev version, which does not have that bug:

pip install -e git+git@github.com:msladecek/eralchemy.git@msladecek/switch-cardinality-labels#egg=eralchemy

This also requires that GraphViz and PyGraphViz be installed. PyGraphViz may have trouble finding GraphViz; in that case, see this article.

Once dependencies are installed, run:

flask erd

A new diagram will be created at docs/erd.png.

Populating the Development Database with mock data

To populate the database, run:

flask populate_db

To clear the database, run:

flask clear_db

🚀 Deployment

Any commit to any non-master branch that passes tests and contains a Jenkinsfile in the root will be built and deployed to the dev environment.

Merges to master will be built and deployed to the QA environment once tests have passed.