This repository has been archived by the owner on Jan 4, 2022. It is now read-only.

Ideas for improving AATAMS' performance #72

Closed
jkburges opened this issue Feb 18, 2014 · 5 comments

Comments

@jkburges
Contributor

Ref: #71

We should be able to speed things up and simplify the code a lot by implementing some of the following recommendations. My initial (but brief) prototyping has demonstrated that the following suggestions are feasible.

  1. Load detections into the DB directly using PostgreSQL's COPY command.
  2. Validate and join to deployments/recoveries on the fly:
    • select from the smaller table first (e.g. receiver), then join to detections - this should be pretty fast, provided appropriate indexes are in place (see the examples in "A couple of SQL examples which demonstrate the performance of querying d..." #71);
    • removes the chance of data inconsistency, e.g. when deployment info is added after detections, there is no need to rescanForDeployment;
    • no need for the materialised view (and the associated refresh, which is putting a lot of load on the DB).
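A minimal sketch of what (2) could look like. The table and column names below are assumed for illustration only (apart from raw_det, defined in a later comment) and may not match the actual AATAMS schema:

```sql
-- Sketch only: receiver_deployment and its columns are assumptions,
-- not taken from the AATAMS schema.
-- Drive the query from the small deployments table, then join each
-- deployment to its slice of the big detections table; an index
-- leading on (receiver_name, "timestamp") keeps each range scan cheap.
SELECT det.*
FROM receiver_deployment dep
JOIN raw_det det
  ON det.receiver_name = dep.receiver_name
 AND det."timestamp" >= dep.deployment_timestamp
 AND det."timestamp" <  dep.recovery_timestamp;
```

Because the deployment window bounds the timestamp range per receiver, detections that arrive before their deployment info simply fall out of (or into) the join result automatically - no rescan step needed.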
@jkburges
Contributor Author

jkburges commented Aug 8, 2014

An outage which just happened was possibly (likely) caused by the materialised view refresh.

@jkburges
Contributor Author

I'm not sure if I've made it clear enough in the past w.r.t. the seriousness of this issue.

There are two main problems occurring as a result of the issues listed above:

  1. data consistency - by having "join" tables and materialised views, the chance of having inconsistent data goes up immensely; this is related to the work @xhoenner is currently doing on reconciliation;
  2. performance and reliability - the materialised view refresh puts an enormous load on the system every night, and is in fact making the app unavailable while it is happening. With continued growth of the DB, this outage period is only going to increase, possibly to the point where it starts overlapping with business hours.

@jkburges
Contributor Author

jkburges commented Dec 1, 2014

I just prototyped (1) from above (loading detections using COPY).

I was able to load ~3.5M detections in around 11s (~300k records/s) - this is around 1500x as fast as the web app currently loads detections (including all the validation and what not).

FWIW, we could re-load every uploaded detection (~90M) in around 5 minutes this way :-)

@jkburges
Contributor Author

jkburges commented Dec 1, 2014

To be fairer, I added a compound index and a primary key constraint. It now takes ~40s to load 3.5M records - still fast enough.

Table definition:

-- Table: raw_det

-- DROP TABLE raw_det;

CREATE TABLE raw_det
(
  "timestamp" timestamp with time zone,
  receiver_name text,
  transmitter_id text,
  id bigint NOT NULL,
  CONSTRAINT raw_det_pkey PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE raw_det
  OWNER TO aatams;

-- Index: pagination_index

-- DROP INDEX pagination_index;

CREATE INDEX pagination_index
  ON raw_det
  USING btree
  ("timestamp", receiver_name COLLATE pg_catalog."default", transmitter_id COLLATE pg_catalog."default");

COPY command (including some awk magic to add an id column; we would need to do a similar thing to add a receiver_download_file_id column):

COPY raw_det
FROM PROGRAM 'awk -F, ''{$(NF+1)=++i;}1'' OFS=, /tmp/big_dets.csv'
CSV HEADER DELIMITER ',';
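For reference, the awk stage of that COPY can be exercised on its own. The sample data below is made up; `-v OFS=,` is used in place of the trailing `OFS=,` operand purely so the one-liner reads from stdin:

```shell
# Same awk idea as in the COPY ... FROM PROGRAM call: append an
# incrementing counter as a new last column on every line.
# Note the header line gets a value too (1), which COPY's HEADER
# option then skips, so ids on data rows simply start at 2.
printf 'timestamp,receiver_name,transmitter_id\n2014-12-01T00:00:00Z,VR2W-101,A69-1303-12345\n' \
  | awk -F, -v OFS=, '{$(NF+1)=++i;}1'
```

This prints the two input lines with `,1` and `,2` appended respectively.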

@jkburges
Contributor Author

Note that in 1), the file would have to exist on the same server as the DB. Possibly, rather than storing the file on disk, we could store it in the DB (as a BLOB or some such), and COPY from that.

This would also simplify deployment and backups somewhat (because only the DB has state, not DB + filesystem).
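One possible sketch of that approach, assuming the upload is kept as a PostgreSQL large object referenced from a hypothetical receiver_download_file table (lo_export writes to the DB server's filesystem, which fits COPY's same-server requirement):

```sql
-- Sketch only: receiver_download_file and its columns are assumptions,
-- not part of the current schema.
-- 1. Export the stored upload (a large object) to a server-side temp file.
SELECT lo_export(csv_oid, '/tmp/det_upload.csv')
FROM receiver_download_file
WHERE id = 1;

-- 2. Bulk-load it with COPY, reading from the DB server's filesystem.
COPY raw_det FROM '/tmp/det_upload.csv' CSV HEADER DELIMITER ',';
```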
