Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1989 guessing of namedplaces #2809

Merged
merged 14 commits into from
Mar 30, 2015
Merged

1989 guessing of namedplaces #2809

merged 14 commits into from
Mar 30, 2015

Conversation

rafatower
Copy link
Contributor

There's the namedplaces guessing, taking into account countries

@Kartones this is pretty much ready for a "final" review, can you please take another look?

@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@javisantana
Copy link
Contributor

why "this does not seem to scale well" ? what do you mean?


def count_namedplaces(sample, column_name_sym)
sql_array = sample.map{|row| "'" + row[column_name_sym] + "'"}.join(',')
query = "WITH geo_function as (SELECT (geocode_namedplace(Array[#{sql_array}])).*) select count(success) FROM geo_function where success = TRUE"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying @javisantana this is the critical part. this is gonna be called through sql api once per column with an array of up to sample size.

Due to the nature of the sampling and the dataset itself, I do not expect having a good hit rate in cache for these queries, nor for the geocodings on the full datasets.

If you're ok with it, then me too and more than happy of releasing it :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so why don't we add a metric there? also, if you feel like the call is going to be really expensive we have the option of having a different geocode_namedplace that gets not only a single array but an array per column. Are we doing this only for text columns, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to add a metric and check whether performance is a problem or not

Yes, we're just querying text columns.

About sending one single query with all text columns of the sample: do you think it will be more efficient for the general case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be more efficient because you only need to open a connection (at least this).
Let's add that metric and see how it looks

@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@@ -199,12 +233,13 @@ def geocode(formatter, geometry_type, kind)
geometry_type: geometry_type,
kind: kind,
max_rows: nil,
country_column: nil
country_column: country_column_name,
countries: "'#{country}'"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note for myself: this should be nil if country is nil

@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@rafatower rafatower changed the title 1989 guessing of namedplaces [do NOT merge] 1989 guessing of namedplaces Mar 26, 2015
@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@Cartofante
Copy link
Collaborator

Frontend tests were OK 👍 (details)

@Kartones
Copy link
Contributor

👍 Nice tests!

rafatower pushed a commit that referenced this pull request Mar 30, 2015
@rafatower rafatower merged commit ac248f4 into master Mar 30, 2015
@rafatower rafatower deleted the 1989-guessing-of-namedplaces branch March 30, 2015 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants