Onboarding new city: Brisbane #120

Closed
terryf82 opened this issue Jun 7, 2018 · 6 comments

Comments

@terryf82
Collaborator

terryf82 commented Jun 7, 2018

No description provided.

@terryf82 terryf82 created this issue from a note in v1.2 - tech upgrade (To do) Jun 7, 2018
@terryf82
Collaborator Author

terryf82 commented Jun 7, 2018

I need to set up handling for cities that OSM doesn't have a polygon for (e.g. Brisbane), and then test for additional breakage points in the pipeline.

@j-t-t I'll send you a PR when this first part is set up and we can move forward from there.
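
For illustration, one way the missing-polygon case could be handled is to fall back to a bounding box around the city's geocoded centre point. This is only a sketch, not the project's actual approach: the function name `boundary_or_bbox`, the `radius_km` default, and the input shape (a Nominatim-style result dict with `lat`, `lon` and an optional `geojson` entry) are all assumptions made for the example.

```python
import math


def boundary_or_bbox(geocode_result, radius_km=15.0):
    """Pick a city boundary from a geocoder result (hypothetical helper).

    Returns ("polygon", geojson) when the geocoder supplied an area,
    otherwise ("bbox", (south, west, north, east)) around the centre point.
    """
    geojson = geocode_result.get("geojson") or {}
    if geojson.get("type") in ("Polygon", "MultiPolygon"):
        return "polygon", geojson

    # No area geometry (e.g. only a Point): build a box around the centre.
    lat = float(geocode_result["lat"])
    lon = float(geocode_result["lon"])
    # Roughly 111.32 km per degree of latitude; a degree of longitude
    # shrinks by cos(latitude) away from the equator.
    dlat = radius_km / 111.32
    dlon = radius_km / (111.32 * math.cos(math.radians(lat)))
    return "bbox", (lat - dlat, lon - dlon, lat + dlat, lon + dlon)
```

The returned tag lets the rest of the pipeline decide whether it is working with a true boundary or an approximate box; the radius would need tuning per city.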

@terryf82 terryf82 self-assigned this Jun 7, 2018
@terryf82 terryf82 moved this from To do to In progress in v1.2 - tech upgrade Jun 20, 2018
@terryf82 terryf82 moved this from In progress to On Hold in v1.2 - tech upgrade Jul 11, 2018
@j-t-t j-t-t removed this from On Hold in v1.2 - tech upgrade Aug 8, 2018
@terryf82 terryf82 mentioned this issue Aug 30, 2018
@terryf82
Collaborator Author

terryf82 commented Sep 1, 2018

I've gone back to have another go at onboarding Brisbane.

When running the pipeline, I hit the following error as part of make_dataset.

It looks to me like the lambda on line 144 of make_canon_dataset is trying to cast an invalid value (in this case "signals", in the osm_speed column) to an integer, presumably so that it can be used in predictions.

I haven't been able to confirm this with basic debugging, but it feels like Brisbane might contain a segment that has invalid data. If this is the case, do we need to add validation of the features data at some point in the process?

@j-t-t @bpben


Traceback (most recent call last):
  File "/opt/conda/envs/crash-model/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/crash-model/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/src/features/make_canon_dataset.py", line 224, in <module>
    concerns=args.concern_info
  File "/app/src/features/make_canon_dataset.py", line 144, in aggregate_roads
    aggregated = aggregated.apply(lambda x: x.astype('int'))
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/frame.py", line 4877, in apply
    ignore_failures=ignore_failures)
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/frame.py", line 4973, in _apply_standard
    results[i] = func(v)
  File "/app/src/features/make_canon_dataset.py", line 144, in <lambda>
    aggregated = aggregated.apply(lambda x: x.astype('int'))
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/util/_decorators.py", line 118, in wrapper
    return func(*args, **kwargs)
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/generic.py", line 4004, in astype
    **kwargs)
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/internals.py", line 3462, in astype
    return self.apply('astype', dtype=dtype, **kwargs)
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/internals.py", line 3329, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/internals.py", line 544, in astype
    **kwargs)
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/internals.py", line 625, in _astype
    values = astype_nansafe(values.ravel(), dtype, copy=True)
  File "/opt/conda/envs/crash-model/lib/python3.6/site-packages/pandas/core/dtypes/cast.py", line 692, in astype_nansafe
    return lib.astype_intsafe(arr.ravel(), dtype).reshape(arr.shape)
  File "pandas/_libs/lib.pyx", line 854, in pandas._libs.lib.astype_intsafe
  File "pandas/_libs/src/util.pxd", line 91, in util.set_value_at_unsafe
ValueError: ("invalid literal for int() with base 10: 'signals'", 'occurred at index osm_speed')
Traceback (most recent call last):
  File "/opt/conda/envs/crash-model/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/envs/crash-model/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/app/src/data/make_dataset.py", line 179, in <module>
    ] + features)
  File "/opt/conda/envs/crash-model/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['python', '-m', 'features.make_canon_dataset', '-d', '/app/data/brisbane/', '-features', 'width', 'lanes', 'hwy_type', 'osm_speed', 'signal', 'oneway', 'intersection_segments', 'width_per_lane']' returned non-zero exit status 1.
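
As a rough sketch of the kind of validation asked about above, the snippet below first reports which raw values in each feature column cannot be read as numbers, then coerces those values to a default before the integer cast. The helper names, the sample frame, and the choice of `0` as the fallback are assumptions for illustration; they are not taken from make_canon_dataset.

```python
import pandas as pd


def find_non_numeric_values(df, columns):
    """Map each column to the distinct raw values that are not numeric."""
    problems = {}
    for col in columns:
        converted = pd.to_numeric(df[col], errors="coerce")
        # Values that were present but became NaN are the invalid ones.
        bad = df.loc[converted.isna() & df[col].notna(), col]
        if not bad.empty:
            problems[col] = sorted(bad.astype(str).unique())
    return problems


def coerce_to_int(df, columns, default=0):
    """Return a copy with the columns cast to int, invalid values -> default."""
    out = df.copy()
    for col in columns:
        numeric = pd.to_numeric(out[col], errors="coerce")
        out[col] = numeric.fillna(default).astype("int")
    return out


# Hypothetical sample resembling the failing data: one segment carries
# the string "signals" where a speed is expected.
segments = pd.DataFrame({
    "osm_speed": ["50", "60", "signals", "0"],
    "lanes": ["2", "1", "2", "3"],
})

problems = find_non_numeric_values(segments, ["osm_speed", "lanes"])
cleaned = coerce_to_int(segments, ["osm_speed", "lanes"])
print(problems)
```

Reporting the offending values before coercing them makes it easier to tell whether the bad data comes from OSM tagging or from an earlier parsing step.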

@j-t-t
Collaborator

j-t-t commented Sep 1, 2018

@terryf82 do you have your config file somewhere I can use to run this? As you say, it's probably an edge case, but I'll dig into it.
And do you have a link for the Brisbane data?

@terryf82
Collaborator Author

terryf82 commented Sep 1, 2018

@j-t-t

The config is in the PR I submitted, config/config_brisbane.yml
The Brisbane data is uploaded to data.world (https://query.data.world/s/n6j3rdmqrewmxm3nf5lnwennk5sdaj); there's only crash data for now.

If you want to run it through the pipeline you'll need to use the new crash_import branch, because the Brisbane crash data doesn't supply day of month.
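
To illustrate the missing day-of-month problem, here is a minimal sketch of one possible way to build a timestamp from partial date fields. It is not necessarily what the crash_import branch does: the function name, its parameters, and the choice to default to the first of the month are all assumptions for the example.

```python
from datetime import datetime


def crash_datetime(year, month, day=None, hour=0):
    """Build a crash timestamp from partial date fields (hypothetical helper).

    When the source data has no day of month, the first of the month is
    used and the returned flag is False so callers know the day is a guess.
    """
    day_known = day is not None
    resolved_day = int(day) if day_known else 1
    timestamp = datetime(int(year), int(month), resolved_day, int(hour))
    return timestamp, day_known
```

Keeping the `day_known` flag alongside the timestamp avoids treating the placeholder day as real data in any day-level aggregation.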

Thanks, let me know if I've missed anything obvious.

@bpben
Collaborator

bpben commented Sep 1, 2018 via email

@terryf82
Collaborator Author

At long last, we have a non-US city onboarded =)

http://tf-ecs-insightlane-198544780.ap-northeast-1.elb.amazonaws.com/brisbane
