Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HERE Data Refresh #285

Closed
3 of 6 tasks
radumas opened this issue Mar 21, 2020 · 5 comments
Closed
3 of 6 tasks

HERE Data Refresh #285

radumas opened this issue Mar 21, 2020 · 5 comments
Assignees
Labels

Comments

@radumas
Copy link
Member

radumas commented Mar 21, 2020

On March 24th the map is updated to 2019Q4 and data will be refreshed. Need to clarify how far back data will be refreshed. Assuming Jan 2019 for now.

  • Backup data from 2019-present to S3
  • Import new map
  • Compare similarities between both maps
  • Import refreshed 2020 data
  • Run comparison between new and old data
  • Import all refreshed data and port over.
@radumas radumas added the here label Mar 21, 2020
@radumas radumas self-assigned this Mar 21, 2020
@radumas
Copy link
Member Author

radumas commented Mar 21, 2020

created a Python function that will backup a partitioned table month to a zipped file in a temporary directory and then will copy that zipped file to s3

def backup_to_s3_month(con, month, s3):
    '''Backup a month of ta data to specified s3
    '''
    with tempfile.TemporaryDirectory() as tempdir:
        #backup month data to compressed file
        copy_query = sql.SQL("COPY here.{} TO STDOUT WITH (FORMAT 'csv', HEADER TRUE)")
        #compress results of the copy command to a file in the compressed directory
        os.chdir(tempdir)
        data_file_path = 'here_{}.csv.gz'.format(month)

        with gzip.GzipFile(filename=data_file_path, mode='w') as data_file:
            with con.cursor() as cur:
                cur.copy_expert(copy_query.format(sql.Identifier('ta_'+month)), data_file)      
        #copy file to s3 bucket
        subprocess.check_call(['aws','s3','cp',data_file_path, s3])

@radumas
Copy link
Member Author

radumas commented Mar 31, 2020

Comparing the difference in the number of links and the amount they match between versions of the map

Using routing streets

These are directional, filtered to actually be navigable by auto, and have some other processing for routing

SELECT COUNT(nu.link_dir) new_links, COUNT(ole.link_dir) ole_links,
SUM(CASE WHEN ole.link_dir = nu.link_dir THEN 1 ELSE 0 END) AS matches,
SUM(CASE WHEN ole.link_dir IS NOT NULL AND  nu.link_dir IS NULL THEN 1 ELSE 0 END) AS old_not_new,
SUM(CASE WHEN ole.link_dir IS NULL AND nu.link_dir IS NOT NULL THEN 1 ELSE 0 END) AS new_not_old
FROM here.routing_streets_18_3 ole
FULL OUTER JOIN  here.routing_streets_19_4 nu on nu.link_dir = ole.link_dir
new_links ole_links matches old_not_new new_not_old
140,765 120,727 108,446 12,281 32,319

Streets

If we look at all links however (not directional)

SELECT COUNT(nu.link_id) new_links, COUNT(ole.link_id) ole_links,
SUM(CASE WHEN ole.link_id = nu.link_id THEN 1 ELSE 0 END) AS matches,
SUM(CASE WHEN ole.link_id IS NOT NULL AND  nu.link_id IS NULL THEN 1 ELSE 0 END) AS old_not_new,
SUM(CASE WHEN ole.link_id IS NULL AND nu.link_id IS NOT NULL THEN 1 ELSE 0 END) AS new_not_old
FROM here_gis.streets_18_3 ole
FULL OUTER JOIN  here_gis.streets_19_4 nu on nu.link_id = ole.link_id
new_links ole_links matches old_not_new new_not_old
161,588 94,950 86,419 8,531 75,169

Traffic Streets

And if we look at "traffic streets", filtered to be paved and allowing cars and traffic

SELECT COUNT(nu.link_id) new_links, COUNT(ole.link_id) ole_links,
SUM(CASE WHEN ole.link_id = nu.link_id THEN 1 ELSE 0 END) AS matches,
SUM(CASE WHEN ole.link_id IS NOT NULL AND  nu.link_id IS NULL THEN 1 ELSE 0 END) AS old_not_new,
SUM(CASE WHEN ole.link_id IS NULL AND nu.link_id IS NOT NULL THEN 1 ELSE 0 END) AS new_not_old
FROM here_gis.traffic_streets_18_3 ole
FULL OUTER JOIN  here_gis.traffic_streets_19_4 nu on nu.link_id = ole.link_id
new_links ole_links matches old_not_new new_not_old
76,734 66,290 59,791 6,499 16,943

Traffic_streets taking into account direction of travel

WITH traffic_link_dirs_19_4 AS(
	SELECT link_id || 'F'::text AS link_dir
	FROM here_gis.traffic_streets_19_4
	WHERE dir_travel = ANY (ARRAY['F', 'B'])
	UNION ALL
	SELECT link_id || 'T'::text AS link_dir
	FROM here_gis.traffic_streets_19_4
	WHERE dir_travel = ANY (ARRAY['T', 'B'])
	)
	,traffic_link_dirs_18_3 AS(
	SELECT link_id || 'F'::text AS link_dir
	FROM here_gis.traffic_streets_18_3
	WHERE dir_travel = ANY (ARRAY['F', 'B'])
	UNION ALL
	SELECT link_id || 'T'::text AS link_dir
	FROM here_gis.traffic_streets_18_3
	WHERE dir_travel = ANY (ARRAY['T', 'B'])
	)
SELECT COUNT(nu.link_dir) new_links, COUNT(ole.link_dir) ole_links,
SUM(CASE WHEN ole.link_dir = nu.link_dir THEN 1 ELSE 0 END) AS matches,
SUM(CASE WHEN ole.link_dir IS NOT NULL AND  nu.link_dir IS NULL THEN 1 ELSE 0 END) AS old_not_new,
SUM(CASE WHEN ole.link_dir IS NULL AND nu.link_dir IS NOT NULL THEN 1 ELSE 0 END) AS new_not_old
FROM traffic_link_dirs_18_3 ole
FULL OUTER JOIN  traffic_link_dirs_19_4 nu on nu.link_dir = ole.link_dir
new_link_dirs ole_link_dirs matches old_not_new new_not_old
141,022 120,980 108,643 12,337 32,379

Nodes

I forgot that nodes are not unique on node_id, they also have a link_id field.

SELECT COUNT(nu.node_id) new_nodes, COUNT(ole.node_id) ole_nodes,
SUM(CASE WHEN ole.node_id = nu.node_id THEN 1 ELSE 0 END) AS matches,
SUM(CASE WHEN ole.node_id IS NOT NULL AND  nu.node_id IS NULL THEN 1 ELSE 0 END) AS old_not_new,
SUM(CASE WHEN ole.node_id IS NULL AND nu.node_id IS NOT NULL THEN 1 ELSE 0 END) AS new_not_old
FROM here_gis.zlevels_18_3 ole
FULL OUTER JOIN  here_gis.zlevels_19_4 nu on nu.node_id = ole.node_id AND nu.link_id = ole.link_id
WHERE (ole.node_id IS NOT NULL AND ole.intrsect = 'Y') OR
(nu.node_id IS NOT NULL AND nu.intrsect = 'Y');
new_nodes ole_nodes matches old_not_new new_not_old
322,883 189,695 172,367 17,328 150,516

@radumas
Copy link
Member Author

radumas commented Apr 9, 2020

2019 data

ALTER TABLE here.ta_201901 RENAME TO ta_201901_old; ALTER TABLE here.ta_201901_old NO INHERIT here.ta;
ALTER TABLE here.ta_201902 RENAME TO ta_201902_old; ALTER TABLE here.ta_201902_old NO INHERIT here.ta;
ALTER TABLE here.ta_201903 RENAME TO ta_201903_old; ALTER TABLE here.ta_201903_old NO INHERIT here.ta;
ALTER TABLE here.ta_201904 RENAME TO ta_201904_old; ALTER TABLE here.ta_201904_old NO INHERIT here.ta;
ALTER TABLE here.ta_201905 RENAME TO ta_201905_old; ALTER TABLE here.ta_201905_old NO INHERIT here.ta;
ALTER TABLE here.ta_201906 RENAME TO ta_201906_old; ALTER TABLE here.ta_201906_old NO INHERIT here.ta;
ALTER TABLE here.ta_201907 RENAME TO ta_201907_old; ALTER TABLE here.ta_201907_old NO INHERIT here.ta;
ALTER TABLE here.ta_201908 RENAME TO ta_201908_old; ALTER TABLE here.ta_201908_old NO INHERIT here.ta;
ALTER TABLE here.ta_201909 RENAME TO ta_201909_old; ALTER TABLE here.ta_201909_old NO INHERIT here.ta;
ALTER TABLE here.ta_201910 RENAME TO ta_201910_old; ALTER TABLE here.ta_201910_old NO INHERIT here.ta;
ALTER TABLE here.ta_201911 RENAME TO ta_201911_old; ALTER TABLE here.ta_201911_old NO INHERIT here.ta;
ALTER TABLE here.ta_201912 RENAME TO ta_201912_old; ALTER TABLE here.ta_201912_old NO INHERIT here.ta;
DO $do$
DECLARE
	startdate DATE;
	yyyymm TEXT;
	basetablename TEXT := 'ta_';
	tablename TEXT;
	yyyy INT := 2019;
BEGIN

		FOR mm IN 01..12 LOOP
			startdate:= to_date(yyyy||'-'||mm||'-01', 'YYYY-MM-DD');
			IF mm < 10 THEN
				yyyymm:= yyyy||'0'||mm;
			ELSE
				yyyymm:= yyyy||''||mm;
			END IF;
			tablename:= basetablename||yyyymm;
			EXECUTE format($$CREATE TABLE here.%I 
				(CHECK (tx >= DATE '$$||startdate ||$$'AND tx < DATE '$$||startdate ||$$'+ INTERVAL '1 month'),
				UNIQUE(link_dir, tx)
				) INHERITS (here.ta);
				ALTER TABLE here.%I OWNER TO here_admins;
				$$
				, tablename, tablename);
			PERFORM here.create_link_dir_idx(tablename);
			PERFORM here.create_tx_idx(tablename);
		END LOOP;
END;
$do$ LANGUAGE plpgsql

@radumas
Copy link
Member Author

radumas commented May 15, 2020

The routing streets issue was a simple bug 😭, edited the number in the table above ☝️ #285 (comment)

@radumas
Copy link
Member Author

radumas commented Jun 1, 2020

Hmm, if I create a routing table with ar_pedest = 'Y' the number of links for 19_4 is 314,762

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants