Searching non-ASCII characters in geocode_tbl() results in Unicode characters in the result #35

Open
JMPivette opened this issue Apr 14, 2021 · 1 comment

Comments

@JMPivette

I guess this issue is not directly linked to banR but to the underlying API.

If one of the searched addresses contains non-ASCII characters, the results come back with escaped raw bytes instead of valid UTF-8 characters (\xe2 instead of â, for example).

In the following example, searching for évron instead of evron changes the encoding of the result for my second search (Chatelaillon).

location_tbl <- tibble::tibble(city = c("évron", "Chatelaillon"))
banR::geocode_tbl(location_tbl, city) 
#> Writing tempfile to.../var/folders/dc/9dbfr9sx23jcx1tmfdlxqr3m0000gq/T//RtmpyiSyId/filef0946d58acb.csv
#> If file is larger than 8 MB, it must be splitted
#> Size is : 25 bytes
#> SuccessOKSuccess: (200) OK
#> # A tibble: 2 x 17
#>   city   latitude longitude result_label      result_score result_type result_id
#>   <chr>     <dbl>     <dbl> <chr>                    <dbl> <chr>       <chr>    
#> 1 évron      45.5      4.57 "Rue a C Victime…         0.2  street      42103_o6…
#> 2 Chate…     46.1     -1.09 "Ch\xe2telaillon…         0.62 municipali… 17094    
#> # … with 10 more variables: result_housenumber <chr>, result_name <chr>,
#> #   result_street <chr>, result_postcode <chr>, result_city <chr>,
#> #   result_context <chr>, result_citycode <chr>, result_oldcitycode <chr>,
#> #   result_oldcity <chr>, result_district <chr>

location_tbl <- tibble::tibble(city = c("evron", "Chatelaillon"))
banR::geocode_tbl(location_tbl, city)
#> Writing tempfile to.../var/folders/dc/9dbfr9sx23jcx1tmfdlxqr3m0000gq/T//RtmpyiSyId/filef096d8b39c1.csv
#> If file is larger than 8 MB, it must be splitted
#> Size is : 24 bytes
#> SuccessOKSuccess: (200) OK
#> # A tibble: 2 x 17
#>   city     latitude longitude result_label    result_score result_type result_id
#>   <chr>       <dbl>     <dbl> <chr>                  <dbl> <chr>       <chr>    
#> 1 evron        48.1    -0.425 Évron                   0.94 municipali… 53097    
#> 2 Chatela…     46.1    -1.09  Châtelaillon-P…         0.62 municipali… 17094    
#> # … with 10 more variables: result_housenumber <chr>, result_name <chr>,
#> #   result_street <chr>, result_postcode <chr>, result_city <chr>,
#> #   result_context <chr>, result_citycode <chr>, result_oldcitycode <chr>,
#> #   result_oldcity <chr>, result_district <chr>
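
For anyone who needs to repair results that are already affected, here is a minimal sketch. It assumes the stray bytes are Latin-1 (0xE2 is â in Latin-1, which matches the output above, but the API's actual encoding is not confirmed here). `fix_latin1()` is a hypothetical helper, not part of banR.

```r
# Hypothetical helper (not part of banR).
# ASSUMPTION: strings that are not valid UTF-8 are Latin-1 encoded.
fix_latin1 <- function(x) {
  # Only touch entries that are not valid UTF-8; leave the rest unchanged
  bad <- !is.na(x) & !validUTF8(x)
  x[bad] <- iconv(x[bad], from = "latin1", to = "UTF-8")
  x
}

# First entry mimics the broken label from the output above;
# second entry is already valid UTF-8 ("Évron")
labels <- c("Ch\xe2telaillon", "\u00c9vron")
fixed <- fix_latin1(labels)
```

In the first example this could be applied to the character columns of the result, such as `result_label`.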
@JMPivette
Author

I found the underlying issue here:
https://github.com/etalab/adresse.data.gouv.fr/issues/622

So it happens only when there are fewer than 5 rows in the tibble and the search terms contain non-ASCII characters.

For information, my workaround so far is to transliterate my search terms to ASCII using stringi::stri_trans_general(id = "Latin-ASCII") before geocoding.
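
A minimal sketch of that workaround, in case it helps someone else. The `to_ascii()` helper and the `city_ascii` column are names made up for this illustration, and the "fewer than 5 rows" check simply encodes the condition described above rather than anything documented by the API.

```r
# Hypothetical helper (not part of banR): strip accents from search terms
to_ascii <- function(x) stringi::stri_trans_general(x, id = "Latin-ASCII")

# "\u00e9vron" is "évron", written with an escape so the script does not
# depend on the encoding of the source file
location_tbl <- tibble::tibble(city = c("\u00e9vron", "Chatelaillon"))

# Condition reported above: fewer than 5 rows and a non-ASCII search term
affected <- nrow(location_tbl) < 5 &&
  !all(stringi::stri_enc_isascii(location_tbl$city))

# Transliterate only when the input matches that condition
location_tbl$city_ascii <- if (affected) {
  to_ascii(location_tbl$city)
} else {
  location_tbl$city
}

# Network call, left commented out in this sketch:
# banR::geocode_tbl(location_tbl, city_ascii)
```

The original `city` column is kept, so the accented names are still available after geocoding.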
