DAP Council Backend ReadMe

Important URLs

  1. App: portal.displacementalert.org
  2. Staging app: staging.portal.displacementalert.org
  3. API: api.displacementalert.org
  4. API docs: api.displacementalert.org/docs
  5. Tasks: tasks.displacementalert.org

Installation

  1. Install Docker (https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-docker-ce) and Docker Compose (sudo apt-get install docker-compose)
  2. Install git and clone this repository

Restarting Server (Not Database Content)

PRODUCTION RESTART

  1. SSH in with ssh anhd@138.197.79.10 in a terminal (make sure your device is whitelisted with DigitalOcean)
  2. cd /var/www/anhd-council-backend
  3. Run sudo docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d. Note: you may need to rebuild/redeploy the app (instead of running the restart command above) if model changes have been made.

DEV RESTART

In a terminal, navigate to the anhd-council-backend root folder on your local device and run: sudo docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d. The restart progress should also appear in Docker (you'll be able to see the resources restart, etc.). Build with sh build.dev.sh. A migration is typically only needed if a dataset's structure has changed (i.e. fields added or removed). Note: you may need to rebuild the app with sudo sh build.dev.sh (instead of running the restart command above) if model changes have been made.

DEV REBUILD

Run: sudo sh build.dev.sh

(You may have to restart the dev environment after a rebuild if the database hasn't finished loading.)

Production / Dev Startup

  1. Clone repo

  2. Get .env file from dev.

  3. Run the build script, sh build.prod.sh or sudo sh build.dev.sh, depending on your environment

    DEV: If the build fails with an error about the database still starting up, you may run sudo docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d to restart it

  4. (First-time startup) Shell into the app container with docker exec -it app /bin/bash and:

  • Create a superuser: python manage.py createsuperuser (NOTE: check your email, because the app will auto-generate a password even though you set one in the wizard)
  • Seed datasets: python manage.py loaddata /app/core/fixtures/datasets.yaml
  • Seed crontabs: python manage.py loaddata /app/core/fixtures/crontabs.yaml
  • Seed automation tasks: python manage.py loaddata /app/core/fixtures/tasks.yaml
  5. Upload the initial data files and run updates (or download the pre-seeded database .tar from here: https://www.dropbox.com/s/lxdzcjkoezsn086/dap_council_pgvol1.tar?dl=0)
  • councils
  • pluto properties
  • buildings
  • padrecord
  • hpdbuildings
  • tax liens
  • coredata
  • public housing data
  • taxbills
  • j51 data
  • 421a data

Development Setup (after cloning this repo)

  1. Run sh build.dev.sh
  2. Download a pre-seeded database from Dropbox here and move it to the project root: https://www.dropbox.com/s/8iqkuk0ip39mtle/dap.gz?dl=0 This database comes with all the councils, communities, properties, buildings, address records, and subsidy programs pre-loaded.
  3. Run this command to load the data: gzip -d dap.gz && cat dap | docker exec -i postgres psql -U anhd -d anhd
  4. If the site does not run as is, run docker exec -it app /bin/bash to connect to the running docker container, and then run python manage.py migrate

Migrations

To add a migration, run docker exec -it app /bin/bash and then run python manage.py makemigrations

Dev Startup (post setup)

  1. After setting up the dev environment, you can always restart it with docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d. However, you may want to run a non-dockerized, non-daemonized version of the app for debugging purposes. (Note: PDB debugging is possible if you attach to the app container with docker attach app)
  2. (optional) To eject the app for local debugging, stop the dockerized app: docker-compose stop app
  3. (optional) If the app is ejected, you'll need to eject the celery workers too if you plan on using them: docker-compose stop app celery_default celery_update
  4. (optional) Start the celery_update worker manually with the shell script sh celery1.sh
  5. (optional) Start the celery_default worker manually with the shell script sh celery2.sh
  6. (optional) Start the app in a terminal: python manage.py runserver
  7. Reset the cache at: http://localhost:8000/admin/django_celery_beat/periodictask/

You can view logs in production with docker-compose logs -f app

To add environment variables to running workers, refer to https://stackoverflow.com/questions/27812548/how-to-set-an-environment-variable-in-a-running-docker-container

  • docker exec -i CONTAINER_ID /bin/bash -c "export VAR1=VAL1 && export VAR2=VAL2 && your_cmd"

Continuous deployment

  • The production branch is master.
  • Run the remote task below to update the production server.
  • Updating the server will interrupt any running workers and clear the Redis cache. Keep this in mind if any long-running tasks are currently in progress.

IMPORTANT NOTE: Please do not deploy while any tasks are in progress. You can check the status at https://tasks.displacementalert.org/. If an update must be done, you may revoke the dataset updates in progress - but note that if it's an automated update (i.e. a monthly periodic task), it will not run again until the following day/month, etc., unless it is run manually under periodic tasks. Additionally, if seeding has already begun, you may need to clear the "API LAST CHECKED" value for that dataset in the production database so that it re-imports the data (otherwise, if the API date matches the data already being imported, the import may be skipped).
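If you ever need to clear that value from a shell rather than the admin, a minimal sketch from the Django shell might look like the following; the model path, the api_last_updated field name, and the dataset name are assumptions, so confirm them against the Dataset model (or simply edit the value in the admin panel):

    # Sketch only: clear the "API LAST CHECKED" value for one dataset so the
    # next update re-imports its data. The model path, field name, and the
    # dataset name used here are assumptions / placeholders.
    from core.models import Dataset

    Dataset.objects.filter(name='hpdviolations').update(api_last_updated=None)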

  1. Run sh deploy.sh
  • Or, if already SSHed in, run the build script: sh build.prod.sh

Note: If the deploy runs fast, the database may need time to start up. You can also restart the server after deployment (e.g. on dev: sudo docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d).

  • The cache is preserved on deploy.

Maintaining this App

3rd Party Services

  • Rollbar - account through anhd github auth.

Opening a live shell

  1. SSH into the server
  2. Open a shell into the container: sudo docker exec -it app /bin/bash
  3. Open a Django shell: python manage.py shell
  4. Close the shell when finished (important!!) with exit
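For example, a quick read-only sanity check from the Django shell might look like this (the model import is an assumption based on the datasets app; adjust it to whatever model you actually need):

    # Inside `python manage.py shell` - a quick, read-only sanity check.
    # `Council` is assumed to be the model behind the datasets_council table.
    from datasets.models import Council

    print(Council.objects.count())   # how many council districts are loaded
    print(Council.objects.first())   # spot-check a single record

    exit()                           # always exit the shell when finished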

Adding new async tasks

You can load tasks with python manage.py loaddata crontabs and python manage.py loaddata tasks, or add them manually in the periodic tasks section of the admin panel.

I recommend adding new tasks manually through the admin panel, because bulk-rewriting all the tasks with the loaddata command has run into problems. If you add a task manually in the backend, please also make sure to add it to the cron YAML fixture.
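For reference, a periodic task can also be registered programmatically from the Django shell with django_celery_beat, which is what the admin panel does under the hood. This is a minimal sketch; the task path below is a placeholder, not a confirmed task name in this codebase:

    # Minimal sketch: register a nightly periodic task via django_celery_beat.
    # The task path is a placeholder - point it at a task that exists in this app.
    from django_celery_beat.models import CrontabSchedule, PeriodicTask

    schedule, _ = CrontabSchedule.objects.get_or_create(
        minute='0', hour='1',
        day_of_week='*', day_of_month='*', month_of_year='*',
    )

    PeriodicTask.objects.get_or_create(
        name='Example nightly task',
        task='app.tasks.example_task',   # placeholder task path
        crontab=schedule,
    )

Whichever route you take, remember to mirror the task in the cron YAML fixture so a future loaddata doesn't drop it.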

Manually triggering tasks

  1. Login to admin
  2. go to https://api.displacementalert.org/admin/django_celery_beat/periodictask/
  3. Select the task
  4. in the "action" dropdown, select "run selected tasks" and click "go".
  5. monitor tasks in flower https://tasks.displacementalert.org (or localhost:8888 on dev)

Updating Pluto or PAD

Updating either one of these will leave the old entries intact and overwrite any existing entries that conflict with the new entries. To update these records, create an update in the admin panel using the appropriate file containing the new data (Property will be an automated update, and AddressRecord will not need a file). Update these one at a time, not all at once; wait for one to succeed before moving on to the next.

If you update Pluto / Property, you also need to update Buildings, PADRecord, and AddressRecord to make sure all the data for the frontend gets surfaced.

  1. Update Property with Pluto (NOT MAPPluto) data (automated)
  2. Update Building with PAD dataset
  3. Update PADRecord with PAD dataset (same file as Building)
  4. Update AddressRecord (no file needed - create manual update in the api panel for AddressRecord and don't choose a file)

Some of the automated, backend steps involved in the AddressRecord update include:

  • Database - Batch upserts completed for AddressRecord.
  • Building search index...
  • Updating address search vector: AddressRecord
  • Address Record seeding complete!
  • Deleting older records...

The celery_update log may show errors about missing BBLs or duplicate key values (e.g. [06/Dec/2023 21:49:54] ERROR [app:323] Database Error * - unable to upsert single record. Error: duplicate key value violates unique constraint "datasets_addressrecord_bbl_number_street_b8e5351f_uniq" DETAIL: Key (bbl, number, street)=(3049360053, 1177, Brooklyn Avenue) already exists.), and may also show no progress in the admin panel. This is expected based on how the app was built. The entire AddressRecord update takes 2-4 hours.

Downloading an open-data file to your local computer

  1. Login to admin
  2. go to https://api.displacementalert.org/admin/core/dataset/
  3. select a dataset
  4. click "Download CSV"

Manually updating datasets

If the dataset is automated,

  1. Login to admin
  2. navigate to https://api.displacementalert.org/admin/core/dataset/
  3. Select an automated dataset
  4. Click the "Update Dataset" button, which will run the normally automated task on command.

If the dataset ISN'T automated, you need to download the file to your local computer, upload it and associate it with a dataset manually, then create an update manually. Each model file has a link to the download endpoint. One exception to this is the AddressRecord dataset, which is built off of Properties, Buildings, and PADRecords.

  1. Login to admin
  2. navigate to https://api.displacementalert.org/admin/core/update/
  3. Click "Add update"
  4. Click the green "+" icon in the "File" field.
  5. In the popup window, upload your file where it says "choose file"
  6. Select the dataset to associate this file with in the "Dataset" field.
  7. Click "Save" and monitor its progress in Flower: https://tasks.displacementalert.org

Manually updating property shark data

  1. Download the monthly pre-foreclosures from property shark and manually upload it via admin associating it with the PSPreforeclosure dataset.
  2. Download the monthly foreclosure auctions from property shark and manually upload it via admin associating it with the PSForeclosure dataset.

Building the address table

Whenever you update Pluto or PAD, you'll need to update the address records to make the new properties searchable. Updating the address records will delete all address records and seed new ones from the existing property and building records within an atomic transaction, meaning that if it fails, the old records are preserved, and there is no interruption to the live address data while this is happening.

To do so, create an update within the admin panel with only the dataset attribute selected, and set it to AddressRecord.

This process requires around 6GB of available RAM because of the atomic transaction performed in the database. Existing address records are kept in memory while the new records populate, to ensure continuous operation of the search feature over the several hours the process takes; the existing records are only deleted once the process is complete. Because of this, please restart the app and postgres containers in Docker first to clear memory held by long-lived workers (SSH in and run sh build.prod.sh).
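For orientation, the delete-and-reseed pattern described above looks roughly like the sketch below. This is not the actual implementation (the real seeding logic lives in the app's dataset code); it only illustrates how transaction.atomic keeps the old rows serving search until the new data commits:

    # Sketch of the atomic delete-and-reseed pattern described above.
    # The real AddressRecord seeding code is more involved; this only shows
    # why a failure preserves the old records.
    from django.db import transaction
    from datasets.models import AddressRecord   # assumed model path

    def reseed_address_records(build_new_rows):
        with transaction.atomic():
            # Changes inside the transaction are invisible to other
            # connections until commit, so search keeps serving old rows.
            AddressRecord.objects.all().delete()
            AddressRecord.objects.bulk_create(build_new_rows())
        # Any exception above rolls the transaction back, leaving the
        # previous address records untouched.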

Caveats:

  1. It's best to run the Property, Building, PADRecord, and AddressRecord updates around noon so they finish before 7pm (which is when the daily updates start).
  2. Space the updates out by a day (Property on day 1, Building + PAD on day 2, AddressRecord on day 3).

Maintaining the daily cache.

Every night at around 1am (at the time of this writing), a task runs which caches ALL of the community and council district endpoints that serve property data to the District Dashboard in the frontend. The code that runs this task is in cache.py.

This script uses a unique token for authentication so it can cache both the authenticated and unauthenticated responses.

It visits each GET endpoint that the frontend calls when users visit this page, so if the client ever changes these endpoints, make sure to also update them in cache.py.
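As a rough illustration of what that nightly job does - the actual logic is in cache.py, and the endpoint paths, auth scheme, and token handling below are assumptions made for the sketch:

    # Rough sketch of the nightly cache warm-up described above.
    # The real implementation is in cache.py; the URL paths, auth header,
    # and token below are assumptions, not the app's actual values.
    import requests

    API_ROOT = 'https://api.displacementalert.org'        # assumed base URL
    CACHE_TOKEN = 'replace-with-the-unique-cache-token'   # assumed token source

    def warm_district_endpoints(district_type, district_ids):
        for district_id in district_ids:
            url = f'{API_ROOT}/{district_type}/{district_id}/properties/'
            # Hit the endpoint unauthenticated and authenticated so both
            # cached responses get refreshed.
            requests.get(url)
            requests.get(url, headers={'Authorization': f'Token {CACHE_TOKEN}'})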

Here's an example of the kind of query the frontend sends:

If you want "All properties with 10 HPD violations after 2018/01/01 AND EITHER (10 DOB violations after 2018/01/01 OR 10 ECB violations after 2018/01/01)":

The query string would look like this:

localhost:3000/properties?q=*condition_0=AND filter_0=condition_1 filter_1=hpdviolations__approveddate__gte=2018-01-01,hpdviolations__count__gte=10 *condition_1=OR filter_0=dobviolations__issueddate__gte=2018-01-01,dobviolations__count__gte=10 filter_1=ecbviolations__issueddate__gte=2018-01-01,ecbviolations__count__gte=10

Let's break this down first.

  • This query has 2 conditions. An AND condition (HPD Violations AND...) and an OR condition (DOB violations OR ECB violations.)
  • Each filter ("10 HPD violations after 2018/01/01" is a filter) has 2 parameters (after 2018/01/01 is a parameter, and >= 10 is a parameter).
  • The first condition (the AND condition) has a single nested condition (the OR condition is nested inside it).

With these in mind, this is how you start defining a new condition in the query string:

  • *condition_0=AND - define the TYPE and give it a unique id ("0")
  • The first condition MUST have a unique ID of "0", but all subsequent conditions can have any unique ID you want.
  • Next, each filter is separated with a SPACE
  • In this case, a nested condition is assigned as a filter. In this example, filter_0=condition_1 references condition_1 (the unique ID here is "1" but it can be anything as long as it's referenced correctly.)
  • Then, the last filter of this condition is added with each parameter separated by a COMMA like so: filter_1=hpdviolations__approveddate__gte=2018-01-01,hpdviolations__count__gte=10
  • The parameters are raw django query language. Reference
  • When condition_0's expression is complete, you can begin the next condition's expression after a SPACE using the same format.

Please view the test suite PropertyAdvancedFilterTests in datasets/tests/filters/test_property.py. There are numerous examples of this language that cover all of the special cases and advanced query types. This feature was very well tested!
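If you want to exercise this query language from a script rather than the browser, the example above can be sent as a single q parameter. A minimal sketch, assuming a local dev API on port 8000 (adjust the host and path to whatever your environment exposes):

    # Sketch: send the advanced-filter example above as a single `q` parameter.
    # The host/port and endpoint path are assumptions for a local dev setup.
    import requests

    q = (
        '*condition_0=AND '
        'filter_0=condition_1 '
        'filter_1=hpdviolations__approveddate__gte=2018-01-01,hpdviolations__count__gte=10 '
        '*condition_1=OR '
        'filter_0=dobviolations__issueddate__gte=2018-01-01,dobviolations__count__gte=10 '
        'filter_1=ecbviolations__issueddate__gte=2018-01-01,ecbviolations__count__gte=10'
    )

    response = requests.get('http://localhost:8000/properties/', params={'q': q})
    print(response.status_code)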

Debugging

  1. attach to app with docker attach app
  2. use PDB to create a breakpoint: import pdb; pdb.set_trace()
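For example, drop a breakpoint into whatever view or task you're investigating (the function below is just a placeholder), trigger it, and the attached terminal will pause at the pdb prompt:

    # Placeholder example: put the breakpoint anywhere in the code path you
    # are debugging; execution pauses in the terminal attached via
    # `docker attach app`.
    def some_view_or_task(request):
        import pdb; pdb.set_trace()   # pdb prompt appears in the attached terminal
        ...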

Running tests

  1. bash into the app docker exec -it app bash
  2. run python manage.py test

Database Dumps

To create a database dump, run the following at the directory root (/var/www/anhd-council-backend)

docker exec -t postgres pg_dump --column-inserts -v -t datasets_council -t datasets_community -t datasets_stateassembly -t datasets_statesenate -t datasets_zipcode -t datasets_coresubsidyrecord -t datasets_property -t datasets_building -t datasets_padrecord -t datasets_addressrecord -t datasets_publichousingrecord -t django_celery_beat* -t core_dataset -c -U anhd | gzip > dap.gz

Then SFTP in, transfer the file to your local machine, and DELETE it from the production server - it's a big file!

  • Restore it with gzip -d dap.gz && cat dap | docker exec -i postgres psql -U anhd -d anhd on your local machine at the repo root
  • Be sure to create a superuser (python manage.py createsuperuser inside the app docker container)

Other example commands - a single-table dump, a compressed single-table dump, and the restore:

docker exec postgres pg_dump -U anhd anhd -t datasets_council > dap.sql

docker exec -t postgres pg_dump --column-inserts -v -t datasets_council -c -U anhd | gzip > dap.gz

gzip -d dap.gz && cat dap | docker exec -i postgres psql -U anhd -d anhd

CRON / Periodic Tasks Not Running

If the Flower periodic tasks fail to run automatically (e.g. the nightly cache reset or any automatic updates):

  1. Log into the droplet / remote server via terminal or the DigitalOcean console
  2. Delete the celerybeat PID file from the backend folder
  3. Redeploy the backend

Viewing the OCA Housing Raw Data (As of 8/15/23)

If you need to view the two files that are joined to update the OCA Housing Dataset, here are the instructions:

  1. After installing the AWS CLI, run aws configure in your command line and enter the credentials from the .env file. It will prompt you for the following: AWS Access Key ID - enter your OCA_AWS_SECRET_KEY_ID. AWS Secret Access Key - enter your OCA_AWS_SECRET_ACCESS_KEY. Default region name - Default. Default output format - you can leave this as the default (blank).
  2. Download the files by directly accessing the buckets. You can use the following commands to download to the current directory on your local device (make sure it's not the app's directory, or the files may be added to the repo): aws s3 cp s3://BUCKET_NAME/public/oca_addresses_with_bbl.csv . and aws s3 cp s3://BUCKET_NAME/public/oca_index.csv .

If you see a table like this with TCP port 8000 in use, run sudo service nginx stop to stop NGINX. It should come back up automatically (dockerized) when the app is created and run. You may then proceed with rebuilding the app.

I'm getting Python migration errors after table changes and cannot build the app. What do I do?

WARNING: Make sure you have a backup before doing these steps.

  1. If you're 100% sure that you need to skip certain migration steps (e.g. they've already been done), navigate to the specific migration causing the issue (e.g. root@anhdnyc:/var/www/anhd-council-backend# nano ./datasets/migrations/0112_auto_20230822_2217.py) and edit the migration file (see the sketch after these steps).

  2. After this, shell into the app container from the root of the project (docker exec -it app /bin/bash) and re-create the migration with python manage.py makemigrations, then re-run sh build.prod.sh. Alternatively, you may skip the entire migration (not recommended) by faking it as complete from inside the app container, e.g. python manage.py migrate --fake datasets 0118_delete_hpdproblem.
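For example, if a single operation in the offending migration has already been applied by hand, you can comment it out of the operations list before rebuilding. This is purely illustrative - the dependency and the commented operation below are placeholders, not the contents of the real migration:

    # Illustrative only: an edited datasets/migrations/0112_auto_20230822_2217.py
    # with an already-applied operation commented out. The dependency and field
    # names below are placeholders.
    from django.db import migrations

    class Migration(migrations.Migration):

        dependencies = [
            ('datasets', '0111_previous_migration'),   # placeholder dependency
        ]

        operations = [
            # migrations.AddField(                     # already applied manually,
            #     model_name='property',               # so skipped here
            #     name='example_field',
            #     field=models.TextField(null=True),
            # ),
        ]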

How can I see the network usage?

  • sudo iftop can be run in the DigitalOcean console or locally
  • To see what is using which sockets: sudo netstat -tulpn

How can I view what system resources the app is using?

  • To check the usage of a specific CPU core, run mpstat -P ALL 1, which continuously updates CPU usage data every second (to exit while the command is running, press Ctrl + C)
  • To identify high-CPU-usage processes: ps -eo pid,psr,comm,%cpu,%mem --sort=-%cpu
    • To investigate processes further, run the following in your terminal, substituting the PIDs of the tasks you'd like to investigate for 25508 and 748: ps -p 25508,748 -o %cpu,%mem,cmd
  • To get more real-time data, use: vmstat 1

Upon restarting the ANHD production server, re-deployment fails with the error "ERROR: for app Cannot start service app: driver failed programming external connectivity on endpoint app (etc): Error starting userland proxy: listen tcp 0.0.0.0:8000: bind: address already in use, ERROR: Encountered errors while bringing up the project."

This is LIKELY because NGINX is already running on system boot, or a prior docker image/container is running.

  1. Log into the server via SSH: ssh anhd@45.55.44.160
  2. Check whether NGINX already running is the issue: sudo lsof -i :8000. If it is running, stop NGINX (it will start up again during deployment).
  3. If that didn't resolve it, while still on the server, delete all current containers (no data will be lost): docker rm -f $(docker ps -aq). (Please make sure the database YAML files are up to date prior to this, or data could be altered.)
