- App: portal.displacementalert.org
- Staging app: staging.portal.displacementalert.org
- API: api.displacementalert.org
- API docs: api.displacementalert.org/docs
- Tasks: tasks.displacementalert.org
- Install Docker (https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-docker-ce) and Docker Compose: `sudo apt-get install docker-compose`
- Install git, clone repo
- SSH in via `ssh anhd@138.197.79.10` in a terminal (make sure your device is whitelisted with DigitalOcean), then run:
  cd /var/www/anhd-council-backend
  sudo docker-compose -f docker-compose.yml -f docker-compose.prod.yml up -d
- Note: you may need to rebuild/redeploy the app (instead of the restart command above) if model changes have been made.
In a terminal, navigate to the `anhd-council-backend` root folder on your local device and run:
sudo docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d
The dev restart progress should also appear in Docker (you'll be able to see the resources restart, etc.).
Note: you may need to rebuild the app (instead of the restart command above) if model changes have been made. Rebuild with:
sudo sh build.dev.sh
A migration will typically only be needed if a dataset's structure has changed (i.e. fields added or removed). You may have to restart dev after the rebuild if the DB hasn't finished loading.
- Clone the repo.
- Get the `.env` file from a dev.
- Run the build script: `sh build.prod.sh` or `sudo sh build.dev.sh`, depending on your environment.
- DEV: if the build fails with an error regarding the database starting up, you may run `sudo docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d` to restart it.
- (First-time startup) Shell into the app container with `docker exec -i -t app /bin/bash` and:
- Create a superuser: `python manage.py createsuperuser` (NOTE: check your email, because the app will auto-generate a password despite the one you create in the wizard.)
- Seed datasets: `python manage.py loaddata /app/core/fixtures/datasets.yaml`
- Seed crontabs: `python manage.py loaddata /app/core/fixtures/crontabs.yaml`
- Seed automation tasks: `python manage.py loaddata /app/core/fixtures/tasks.yaml`
- Upload the initial datafiles and update (or download the pre-seeded database `.tar` from here: https://www.dropbox.com/s/lxdzcjkoezsn086/dap_council_pgvol1.tar?dl=0):
- councils
- pluto properties
- buildings
- padrecord
- hpdbuildings
- tax liens
- coredata
- public housing data
- taxbills
- j51 data
- 421a data
- Run `sh build.dev.sh`
- Download a pre-seeded database from Dropbox and move it to the project root: https://www.dropbox.com/s/8iqkuk0ip39mtle/dap.gz?dl=0 This database comes with all the councils, communities, properties, buildings, address records, and subsidy programs pre-loaded.
- Run this command to copy in the data:
gzip -d dap.gz && cat dap | docker exec -i postgres psql -U anhd -d anhd
- If the site does not run as is, run `docker exec -it app /bin/bash` to connect to the running docker container, and then run `python manage.py migrate`.
- To add a migration, run `docker exec -it app /bin/bash` and then run `python manage.py makemigrations`.
- After setting up the dev environment, you can always restart it with `docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d`. However, you may want a non-dockerized, non-daemonized version of the app running for debugging purposes. (Note: PDB debugging is possible if you attach to the app container with `docker attach app`.)
- (optional) To detach for local debugging, stop the docker app: `docker-compose stop app`
- (optional) If the app is ejected, you'll need to eject the celery workers too if you plan on using them: `docker-compose stop app celery_default celery_update`
- (optional) Start the `celery_update` worker manually with the shell script: `sh celery1.sh`
- (optional) Start the `celery_default` worker manually with the shell script: `sh celery2.sh`
- (optional) Start the app in the terminal: `python manage.py runserver`
- Reset cache at: http://localhost:8000/admin/django_celery_beat/periodictask/
You can view logs in production with docker-compose logs -f app
To add environmental variables into running workers, refer to https://stackoverflow.com/questions/27812548/how-to-set-an-environment-variable-in-a-running-docker-container
docker exec -i CONTAINER_ID /bin/bash -c "export VAR1=VAL1 && export VAR2=VAL2 && your_cmd"
- The production branch is `master`
- Run this remote task to update the production server.
- Updating the server will interrupt any running workers and clear the redis cache. Keep this in mind if any long running tasks are currently running.
IMPORTANT NOTE: Please do not deploy while any tasks are in progress. You can check the status at https://tasks.displacementalert.org/. If an update must be done, you may revoke the dataset updates in progress, but note that if it's an automated update (i.e. a monthly periodic task), it will not run again until the following day/month, etc., unless run manually under Periodic Tasks. As well, if the seeding has already begun, you may need to clear the "API LAST CHECKED" value for that dataset in the production database so it tries to re-import the data (otherwise, if the API date matches the data being imported, it may skip the import).
sh deploy.sh
- Or, if already SSHed in, run the build script:
sh build.prod.sh
Note: If the deploy runs fast, the db may need time to start up. You can also restart the server after deployment (i.e. on dev: `sudo docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d`).
- cache is preserved on deploy
- Rollbar - account through anhd github auth.
- SSH into the server
- Open a shell into the container: `sudo docker exec -it app /bin/bash`
- Open a django shell: `python manage.py shell`
- Close the shell when finished (important!!) with `exit`
You can load tasks with `python manage.py loaddata crontabs` and `python manage.py loaddata tasks`, or add them manually in the "Periodic tasks" section of the admin panel.
I recommend going through the admin panel and adding them manually, because rewriting all the tasks with a new bulk upload using the `loaddata` command has run into problems. Please also make sure to add them to the cron YAML if you add them manually to the backend.
- Login to admin
- go to https://api.displacementalert.org/admin/django_celery_beat/periodictask/
- Select the task
- in the "action" dropdown, select "run selected tasks" and click "go".
- monitor tasks in flower https://tasks.displacementalert.org (or localhost:8888 on dev)
Updating either one of these will leave the old entries intact and overwrite any existing entries that conflict with the new entries. To update these records, create an update in the admin panel using the appropriate file containing the new data (Property will be an automated update, and AddressRecord will not need a file). Update one of these at a time - not all at once. Wait for one to be successful before going to the next.
If you update Pluto / Property, you also need to update Buildings, PADRecord, and AddressRecord to make sure all the data for the frontend gets surfaced.
- Update Property with Pluto (NOT MapPLUTO) data (automated)
- Update Building with the PAD dataset
- Update PADRecord with the PAD dataset (same file as Building)
- Update AddressRecord (no file needed - create a manual update in the API panel for AddressRecord and don't choose a file)
Some of the automated, backend steps involved in the AddressRecord update include:
- Database - Batch upserts completed for AddressRecord.
- Building search index...
- Updating address search vector: AddressRecord
- Address Record seeding complete!
- Deleting older records...
The celery_update log may show errors about missing BBLs or duplicate key values (e.g. `[06/Dec/2023 21:49:54] ERROR [app:323] Database Error * - unable to upsert single record. Error: duplicate key value violates unique constraint "datasets_addressrecord_bbl_number_street_b8e5351f_uniq" DETAIL: Key (bbl, number, street)=(3049360053, 1177, Brooklyn Avenue) already exists.`) and may also not show any progress in the admin panel. This is expected based on how the app was built. The entire address update takes 2-4 hours.
- Login to admin
- go to https://api.displacementalert.org/admin/core/dataset/
- select a dataset
- click "Download CSV"
If the dataset is automated,
- Login to admin
- navigate to https://api.displacementalert.org/admin/core/dataset/
- Select an automated dataset
- Click the "Update Dataset" button, which will run the normally automated task on command.
If the dataset ISN'T automated, you need to download the file to your local computer, upload & associate it to a dataset manually, then create an update manually. Each model file has a link to the download endpoint. One exception to this is the AddressRecords dataset - which is built off of Properties, Buildings, and PAD Record.
- Login to admin
- navigate to https://api.displacementalert.org/admin/core/update/
- Click "Add update"
- Click the green "+" icon in the "File" field.
- In the popup window, upload your file where it says "Choose file"
- Select the dataset to associate this file with in the "Dataset" field.
- Click "Save" and monitor its progress in flower https://tasks.displacementalert.org
- Download the monthly pre-foreclosures from property shark and manually upload it via admin associating it with the PSPreforeclosure dataset.
- Download the monthly foreclosure auctions from property shark and manually upload it via admin associating it with the PSForeclosure dataset.
Whenever you update Pluto or PAD, you'll need to update the address records to make the new properties searchable. Updating the address records will delete all address records and seed new ones from the existing property and building records within an atomic transaction, so if it fails, the old records are preserved and there is no interruption to the live address data while this is happening.
To do so, create an update within the admin panel with only the dataset attribute selected, and set it to AddressRecord.
This process requires around 6GB of available RAM due to performing an atomic transaction in the DB. Existing address records will be stored in memory while the new records populate, to ensure continuous operation of the search feature while the process takes place over several hours. The existing records will only be deleted once the process is complete. Because of this, please restart the app and postgres containers in docker first to clear up memory usage from long-lived workers (SSH in and run `sh build.prod.sh` to clear memory).
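The delete-and-reseed behavior described above can be sketched with a generic database transaction. This is purely illustrative (sqlite3 in memory, with an invented table); the real app does this through Django on Postgres:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE address_record (bbl TEXT, street TEXT)")
conn.execute("INSERT INTO address_record VALUES ('1000010001', 'Old St')")
conn.commit()

# Rebuild inside one atomic transaction: the DELETE and the re-seed either
# both commit or neither does, so a failure preserves the old records.
try:
    with conn:  # sqlite3 connection as context manager = one transaction
        conn.execute("DELETE FROM address_record")
        conn.execute("INSERT INTO address_record VALUES ('1000010001', 'New St')")
except sqlite3.Error:
    pass  # on failure the transaction rolls back; old rows survive

rows = [street for (_bbl, street) in conn.execute("SELECT * FROM address_record")]
```

If the re-seed step raises, a reader can confirm that `rows` would still contain the old street, which is the guarantee the AddressRecord update relies on.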
Caveats:
- Best to run Property, Building, PADRecord, and AddressRecord updates around noon so they finish before 7pm (which is when the daily updates start).
- Space out the updates by a day (Property on day 1, Building + PAD on day 2, Address on day 3).
Every night at around 1am (at the time of this writing) a task runs which caches ALL of the community and council district endpoints that serve the property data to the District Dashboard in the frontend. The file which runs this task is cache.py.
This script uses a unique token for authentication to cache both the authenticated and unauthenticated responses.
It visits each GET endpoint that the frontend calls when users visit this page, so if the client ever changes this endpoint, make sure to also update the endpoint in cache.py
If you want "All properties with 10 HPD violations after 2018/01/01 AND EITHER (10 DOB violations after 2018/01/01 OR 10 ECB violations after 2018/01/01)", the query string would look like this:
localhost:3000/properties?q=*condition_0=AND filter_0=condition_1 filter_1=hpdviolations__approveddate__gte=2018-01-01,hpdviolations__count__gte=10 *condition_1=OR filter_0=dobviolations__issueddate__gte=2018-01-01,dobviolations__count__gte=10 filter_1=ecbviolations__issueddate__gte=2018-01-01,ecbviolations__count__gte=10
Let's break this down first.
- This query has 2 conditions: an AND condition (HPD violations AND...) and an OR condition (DOB violations OR ECB violations).
- Each filter ("10 HPD violations after 2018/01/01" is a filter) has 2 parameters ("after 2018/01/01" is a parameter, and ">= 10" is a parameter).
- The first condition (the AND condition) has a single nested condition (the OR condition is nested inside it).
With these in mind, this is how you start defining a new condition in the query string:
*condition_0=AND
- Define the TYPE and give it a unique ID ("0"). The first condition MUST have a unique ID of "0", but all subsequent conditions can have any unique ID you want.
- Next, each filter is separated with a SPACE.
- In this case, a nested condition is assigned as a filter. In this example, filter_0=condition_1 references condition_1 (the unique ID here is "1", but it can be anything as long as it's referenced correctly).
- Then, the last filter of this condition is added with each parameter separated by a COMMA, like so: filter_1=hpdviolations__approveddate__gte=2018-01-01,hpdviolations__count__gte=10
- The parameters are raw Django query language. Reference.
- When condition_0's expression is complete, you can begin the next condition's expression after a SPACE using the same format.
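To make the assembly rules concrete, here is a small helper that builds the example query string from nested condition definitions. This is an illustrative sketch only (the function and data structure are invented for this doc, not part of the codebase):

```python
def build_query(conditions):
    """Assemble the advanced-filter query string.

    `conditions` maps a condition ID to (type, filters), where each filter is
    either a string referencing a nested condition (e.g. "condition_1") or a
    list of raw Django-query parameters to be joined by commas.
    """
    parts = []
    for cond_id, (cond_type, filters) in conditions.items():
        tokens = ["*condition_{}={}".format(cond_id, cond_type)]
        for i, f in enumerate(filters):
            value = f if isinstance(f, str) else ",".join(f)
            tokens.append("filter_{}={}".format(i, value))
        parts.append(" ".join(tokens))  # filters separated by SPACE
    return " ".join(parts)  # each condition's expression follows the last

query = build_query({
    "0": ("AND", [
        "condition_1",  # the nested OR condition, assigned as a filter
        ["hpdviolations__approveddate__gte=2018-01-01",
         "hpdviolations__count__gte=10"],
    ]),
    "1": ("OR", [
        ["dobviolations__issueddate__gte=2018-01-01",
         "dobviolations__count__gte=10"],
        ["ecbviolations__issueddate__gte=2018-01-01",
         "ecbviolations__count__gte=10"],
    ]),
})
```

Running this reproduces the example query string above, starting with `*condition_0=AND filter_0=condition_1 ...`.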
Please view the test suite `PropertyAdvancedFilterTests` in `datasets/tests/filters/test_property.py`. There are numerous examples of this language that cover all of the special cases and advanced query types. This feature was very well tested!
- Attach to the app with `docker attach app`
- Use PDB to create a breakpoint: `import pdb; pdb.set_trace()`
- Bash into the app: `docker exec -it app bash`
- Run: `python manage.py test`
To create a database dump, run the following at the directory root (/var/www/anhd-council-backend)
docker exec -t postgres pg_dump --column-inserts -v -t datasets_council -t datasets_community -t datasets_stateassembly -t datasets_statesenate -t datasets_zipcode -t datasets_coresubsidyrecord -t datasets_property -t datasets_building -t datasets_padrecord -t datasets_addressrecord -t datasets_publichousingrecord -t django_celery_beat* -t core_dataset -c -U anhd | gzip > dap.gz
Then SFTP in, transfer the file locally, and DELETE it from the production server - it's a big file!
- Restore it with `gzip -d dap.gz && cat dap | docker exec -i postgres psql -U anhd -d anhd` on your local machine at the repo root.
- Be sure to create a superuser (`python manage.py createsuperuser` inside the `app` docker container).
docker exec postgres pg_dump -U anhd anhd -t datasets_council > dap.sql
docker exec -t postgres pg_dump --column-inserts -v -t datasets_council -c -U anhd | gzip > dap.gz
gzip -d dap.gz && cat dap | docker exec -i postgres psql -U anhd -d anhd
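Because the full dump one-liner is long and easy to mistype, a small script can assemble it from a table list. This is a sketch (not part of the repo); the table names and flags simply mirror the dump command above:

```python
# Tables copied from the full dump command in this doc.
TABLES = [
    "datasets_council", "datasets_community", "datasets_stateassembly",
    "datasets_statesenate", "datasets_zipcode", "datasets_coresubsidyrecord",
    "datasets_property", "datasets_building", "datasets_padrecord",
    "datasets_addressrecord", "datasets_publichousingrecord",
    "django_celery_beat*", "core_dataset",
]

def pg_dump_command(tables, user="anhd", container="postgres", out="dap.gz"):
    """Build the docker/pg_dump one-liner with a -t flag per table."""
    table_flags = " ".join("-t {}".format(t) for t in tables)
    return ("docker exec -t {} pg_dump --column-inserts -v {} -c -U {} "
            "| gzip > {}".format(container, table_flags, user, out))

cmd = pg_dump_command(TABLES)
```

Print `cmd` and paste it at the directory root, or trim `TABLES` to dump a subset (as in the single-table `datasets_council` example above).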
If the Flower periodic tasks fail to run automatically (like the nightly cache reset or any automatic updates):
- Log into the droplet / remote server via terminal or the DigitalOcean console
- Delete the celerybeat PID file from its backend folder
- Redeploy the backend
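The PID-file cleanup step above can be sketched as follows. The filename and location are assumptions (celerybeat's default is a `celerybeat.pid` file in the directory it was started from); check your deployment for the actual path:

```python
import os

def remove_stale_pidfile(path="celerybeat.pid"):
    """Delete a stale celerybeat PID file if present.

    Returns True if a file was removed, False if there was nothing to do.
    """
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

After the file is removed, redeploy the backend so celerybeat starts with a fresh PID.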
- If you need to view the two files that are joined to update the OCA Housing dataset, here are the instructions:
- After installing the AWS CLI, run `aws configure` in your command line, entering the credentials from the env file.
- It will prompt you for the following:
  - AWS Access Key ID: enter your OCA_AWS_SECRET_KEY_ID.
  - AWS Secret Access Key: enter your OCA_AWS_SECRET_ACCESS_KEY.
  - Default region name: default.
  - Default output format: you can leave this as the default (blank).
- Download the files by directly accessing the buckets. You can use the following commands to download to the current directory on your local device (make sure it's not the app's directory, or it may add files to the repo):
aws s3 cp s3://BUCKET_NAME/public/oca_addresses_with_bbl.csv .
aws s3 cp s3://BUCKET_NAME/public/oca_index.csv .
If you see a table like this with TCP port 8000 in use, run `sudo service nginx stop` to stop NGINX. It should be redeployed in docker automatically when the app is created and run. You may then proceed with rebuilding the app. WARNING: Make sure you have a backup before doing these steps.
- If you're 100% sure that you need to skip certain migration steps (e.g. if they've already been done), you may navigate to the specific migration causing the issue (e.g. `nano ./datasets/migrations/0112_auto_20230822_2217.py`) and edit the migration file.
After this, re-run `docker exec -it app /bin/bash` in the root of the project and then re-create the migration:
root@anhdnyc:/var/www/anhd-council-backend# docker exec -it app /bin/bash
root@7fd4271bcdd8:/app# python manage.py makemigrations
Then re-run `sh build.prod.sh`.
Alternatively, you may skip the entire migration (not recommended) by logging into the postgres db (via docker exec) and faking that migration as complete, e.g. `python manage.py migrate --fake datasets 0118_delete_hpdproblem`
- `sudo iftop` can be run in the DigitalOcean console or locally.
- To see what is using which sockets: `sudo netstat -tulpn`
- To check the usage of a specific CPU core: `mpstat -P ALL 1`, which continuously updates CPU usage data every second. (To exit while the command is running, press Ctrl + C.)
- Identify high CPU usage processes: `ps -eo pid,psr,comm,%cpu,%mem --sort=-%cpu`
- To investigate processes further, use the following command in your terminal, substituting the PIDs of the tasks you'd like to investigate for 25508 and 748: `ps -p 25508,748 -o %cpu,%mem,cmd`
- To get more real-time data, use: `vmstat 1`
Upon restarting the anhd production server, re-deployment fails with the error: "ERROR: for app Cannot start service app: driver failed programming external connectivity on endpoint app (etc): Error starting userland proxy: listen tcp 0.0.0.0:8000: bind: address already in use. ERROR: Encountered errors while bringing up the project."
This is LIKELY because NGINX is already running on system boot, or a prior docker image/container is running.
- Log into the server via SSH: `ssh anhd@45.55.44.160`
- Check if NGINX already running is the issue: `sudo lsof -i :8000`. If it is running, stop NGINX (it will start up again during deployment).
- If that didn't resolve it, while still on the server, delete all current containers (no data will be lost): `docker rm -f $(docker ps -aq)` (Please make sure the database YAML files are up to date prior to this, or it could alter data.)