Feature/la data #117

allentran · 2020-06-23T18:41:17Z

addresses #114

Motivation and context

Speeds up processing of the pruning schedules and parsing of the Santa Monica (SM) trees data by avoiding two brute-force O(N^2) computations, and adds some initial parsing for other LA datasets.

What I did

Parsers for LA city/county and for the individual cities in Matt Stiles' data, working alphabetically up to Beverly Hills
Geohash the street line segments and the tree lat/lon points, then match on equal geohashes while decrementing the geohash precision one digit at a time (an O(N) calculation)
Use GeoPandas sjoin() to do a spatial join on the predicate contains (it uses an R-tree to avoid the N^2 calculation)
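
To make the geohash matching step concrete, here is a minimal sketch, not the code from this PR. It assumes each street is represented by a single midpoint (the PR hashes line segments), and it uses a small hand-written encoder instead of whichever geohash library pruning_planting.py imports. The names encode, build_index, candidate_streets and the sample coordinates are all hypothetical.

```python
from collections import defaultdict

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def encode(lat, lon, precision=8):
    """Standard geohash: interleave longitude/latitude bisection bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def build_index(streets, max_precision=8):
    """Map every geohash prefix (length 1..max_precision) to street names."""
    index = defaultdict(set)
    for name, (lat, lon) in streets.items():
        code = encode(lat, lon, max_precision)
        for n in range(1, max_precision + 1):
            index[code[:n]].add(name)
    return index


def candidate_streets(index, lat, lon, max_precision=8):
    """Shorten the tree's geohash one digit at a time until a bucket matches."""
    code = encode(lat, lon, max_precision)
    for n in range(max_precision, 0, -1):
        hits = index.get(code[:n])
        if hits:
            return set(hits)
    return set()


# Hypothetical street midpoints (lat, lon); assumption for illustration only.
streets = {
    "Main St": (34.0195, -118.4912),
    "Ocean Ave": (34.0100, -118.4960),
}
index = build_index(streets)
nearby = candidate_streets(index, 34.0196, -118.4911)
```

Each tree costs at most max_precision dictionary lookups, so the whole pass is linear in the number of trees rather than trees times streets. One caveat of pure prefix matching: two points on opposite sides of a cell boundary can be close together yet share only a short prefix, so the candidates may still need a distance check.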

Bump node version

use geopandas/rtree capability to do a spatial join for the pruning schedules
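
As a rough illustration of the spatial join, the sketch below assigns tree points to the polygon that contains them. The zone polygons, column names and coordinates are made up for the example and are not taken from the pruning schedule data. It assumes a recent GeoPandas, where the keyword is predicate (older releases called it op).

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical pruning zones as simple rectangles (assumption, not real data).
zones = gpd.GeoDataFrame(
    {"zone": ["west", "east"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Hypothetical tree points; the third one lies outside every zone.
trees = gpd.GeoDataFrame(
    {"tree_id": [1, 2, 3]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(3.0, 3.0)],
    crs="EPSG:4326",
)

# Keep (zone, tree) pairs where the zone polygon contains the tree point.
# sjoin consults a spatial index instead of testing every pair.
joined = gpd.sjoin(zones, trees, how="inner", predicate="contains")

tree_to_zone = dict(zip(joined["tree_id"], joined["zone"]))
```

With how="inner", trees that fall in no zone simply drop out of the result, so unmatched trees have to be handled separately if they matter.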

emillipede · 2020-06-25T02:36:36Z

data/stiles_data/parse_la_data.py

+
+    def get_maximal_df(self):
+        df = super().get_maximal_df()
+        df = df.assign(


question from @daniellezegelstein @Cdower @cajaks2

why is the variable df reassigned in every function? why isn't the variable renamed? @allentran

allentran · 2020-06-30 (edited)

There is a parent class, CityParser, that does some useful things common to all cities.

Each city's actual parser is a subclass of CityParser, e.g. BeverlyHillsParser. Each subclass overrides (reimplements) the method get_maximal_df() to do all the custom things required for that city. The name df refers to a dataframe, and each operation just adds to or changes that dataframe. I don't know if this is understood, but each get_maximal_df() is a different implementation specific to the city. (This is a common pattern in object-oriented programming.)

You could rename it, but why would you when you're just operating on the same dataframe? I could rename every single instance of the dataframe, e.g. df1, df2, ..., dfn, but 1) since a lot of Pandas operations are not in-place, I'd be wasting memory by keeping every intermediate copy alive, and 2) it would be even more confusing to someone reading the code, who would start to wonder why the engineer created N different copies of the dataframe if they didn't need them. If I just see one variable/dataframe that's being operated on, the intent is clear (even without documentation): the code just does stuff to a single dataframe and then returns it.
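
The pattern described in this reply might look roughly like the following sketch. Only the names CityParser, BeverlyHillsParser and get_maximal_df() come from the PR. The constructor, the columns and the sample rows are assumptions made up for illustration.

```python
import pandas as pd


class CityParser:
    """Hypothetical stand-in for the shared base class."""

    def __init__(self, raw):
        self.raw = raw

    def get_maximal_df(self):
        # Work common to every city: here, just lower-case the column names.
        df = self.raw.rename(columns=str.lower)
        return df


class BeverlyHillsParser(CityParser):
    def get_maximal_df(self):
        # Start from the parent's result, then apply city-specific steps.
        df = super().get_maximal_df()
        # assign() returns a new DataFrame; rebinding df drops the reference
        # to the previous intermediate so it can be garbage collected.
        df = df.assign(city="Beverly Hills")
        df = df.assign(name_common=df["species"].str.title())
        return df


raw = pd.DataFrame({"SPECIES": ["coast live oak", "jacaranda"]})
parsed = BeverlyHillsParser(raw).get_maximal_df()
```

Because none of these calls mutate their input, raw is left untouched, and the single name df always points at the latest version of the one dataframe being built.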

Cdower · 2020-07-23T02:37:43Z

pruning_planting.py


-import numpy as np
+import geohash


Which geohash library is this? It needs to be added to environment.yml.

emillipede and others added 6 commits June 6, 2020 06:46

Merge pull request #111 from Public-Tree-Map/bump-node-version

19b7f89

Bump node version

la city/county and agoura hills parsers

09dd6e7

parsers upto all the As

b62a64e

bellflower and bell gardens

4cfcadc

beverly hills and raise error if missing category for cat parser

cc2e959

geohash to find streets (removes n^2 computations),

06a196e

use geopandas/rtree capability to do a spatial join for the pruning schedules

allentran requested a review from emillipede June 23, 2020 18:46

emillipede added this to for review in santa monica tree map Jun 24, 2020

emillipede reviewed Jun 25, 2020

emillipede mentioned this pull request Jul 11, 2020

implement choropleth map / marker clustering (by city) Public-Tree-Map/public-tree-map#301

Open

emillipede added this to the la county expansion milestone Jul 22, 2020

emillipede merged commit 454606e into test-circleci Jul 23, 2020

santa monica tree map automation moved this from for review to done Jul 23, 2020

Cdower reviewed Jul 23, 2020


