Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import gpx with uppercase in <desc> #9336

Closed
xjurasek opened this issue Nov 1, 2020 · 22 comments · Fixed by #9559
Closed

Import gpx with uppercase in <desc> #9336

xjurasek opened this issue Nov 1, 2020 · 22 comments · Fixed by #9559
Assignees
Labels
Bug Issues classified as a bug Unverified Issue not yet confirmed/reproduced or feature requests not yet checked for plausibility

Comments

@xjurasek
Copy link

xjurasek commented Nov 1, 2020

Describe the bug:
Importing from .gpx file with 5 points imports only the last point.

To Reproduce:
Steps to reproduce the behavior:

  1. Select list
  2. Import from gpx file

Actual behavior/state after performing these steps:
If I import from file LAB_KAREL_GOTT.big.gpx, I can see only one point. If I change first word in tag desc from uppercase to camelcase, import works as expected - import 5 points.

I attach 2 files (renamed from .gpx to .txt to enable upload to github):

As you can see, there is difference only this tag desc.

In gpx you can see that name of cache is set to GC8XYT7-X, but c:geo imports it with GC code "KAREL" (of course in upper case variant, in camelcase works as expected). Screenshot_20201101-230429_cgeo

It looks like a problem with xml parsing.

Version of c:geo used:
2020.10.29, but this problem is older.

Is the problem reproducible:
Yes

@xjurasek xjurasek added Bug Issues classified as a bug Unverified Issue not yet confirmed/reproduced or feature requests not yet checked for plausibility labels Nov 1, 2020
@murggel
Copy link
Contributor

murggel commented Nov 7, 2020

My research:

Problem is not the Uppercase, but the name of the cache, especially the hyphen "-" in the name. The algorithm to find / extract the Geocode for the cache has problems to interpret it then and hence the description will be parsed for the Geocode.
Now the Lowercase / Uppercase get involved - the Uppercase will now be interpreted as Geocode.

So, question is, where does the "-0" come from? Is that normal for Lab-Caches? And if yes - can this be probably ignored for the algorithm to find the Geocode?

I can't probably answer that.
@Lineflyer: since I don't know who would be best to answer it, probably you know?

@moving-bits
Copy link
Member

moving-bits commented Nov 8, 2020

So, question is, where does the "-0" come from? Is that normal for Lab-Caches? And if yes - can this be probably ignored for the algorithm to find the Geocode?

AFAIK Lab Caches don't have any GC code (and never had). With the old lab caches you got an option to download a GPX file for it, but that did not include any GC code. For the new lab adventures you cannot even download a GPX file anymore. So I guess the "GC code" in the GPX has been chosen manually.

Edit: With a slight trick, see Lineflyer's comment below, you can download a very basic GPX for Lab Adventures again, but that's only the adventure itself, no waypoints, and no "GC code".

@Lineflyer
Copy link
Member

You can download GPX for Labs, see #9202

@moving-bits
Copy link
Member

You can download GPX for Labs, see #9202

Ah, hadn't recognized that this is in GPX format, not JSON. Thanks for pointing that out.
But, still, there's no "GC code" for lab adventures, only the "goto" link.

@murggel
Copy link
Contributor

murggel commented Nov 8, 2020

Thanks, I will further investigate with these information.

@xjurasek
Copy link
Author

Hi,

this gpx is part of "community project" on https://labgpx.cz. Primary goal was to enable statistics on lab caches in GeoGet. In case of that each lab cache must have uniq GC code. To avoid conflict with existing caches we add string "-X" to existing GC code (usualy bonus). Second efect of these .gpx files (main feature for me :-)) is importing .gpx files to c:geo and use c:geo to navigate to caches, see all stages for all adventures in one prefered application and using "Adventure lab app" only for log.

If you think, that the reason is "-" in GC code - why there is this problem only when there are upper case letters in desc tag?

@murggel
Copy link
Contributor

murggel commented Nov 10, 2020

If you think, that the reason is "-" in GC code - why there is this problem only when there are upper case letters in desc tag?

Edit:
Not really a problem, but it explain probably the different behaviour: because of the name will not be interpreted as a valid geocode, the description is used to search for. The uppercase will be interpreted as geocode, the lowercase not.
I have to investigate further, on why the other waypoints are not read - probably it is aborted then due to duplicate geocodes, cause the desc starts with the same uppercase word.

I will check, on how the geocode-check for labcaches can be improved ...

@eddiemuc
Copy link
Contributor

this gpx is part of "community project" on https://labgpx.cz.

That's a really cool project!

@moving-bits
Copy link
Member

@xjurasek
Any chance to get this artificial GC-code changed to something unique and clearly non-GC-code? Maybe even including the identifier gc.com uses internally?

See the gpx file linked in this article: #9202 (comment) - there are URL for each lab adventure, and the part after the "goto/" is unique for each adventure. This could lead to a cross-software unique identifier for each lab adventure, enabling the possibility for linking, exchanging data/status etc. Just an idea.

Eg: The URL "https://labs.geocaching.com/goto/61545" could lead to an id "LC61545" (the "*" at the beginning marking it clearly as non GC-code, not only for gc.com, but for other platforms as well).
A different example: "https://labs.geocaching.com/goto/Salmrohr" would lead to "LCSalmrohr" - the lab id can be alphanumeric and vary in length.

@xjurasek
Copy link
Author

Hi,
its more difficult:

  • first "short name" can be modified in time, this will cause problem in gpx refresh.
  • second GS recomends to use ASCII letters, but don't requires it. So we have many adventures with diacritics and i'm afraid that diacritics in "GC-code" fileld will cause to more problems in other programs.
  • last but not least - short name can be uppercase (https://labs.geocaching.com/goto/ZOOPRAHA) and using "ZOOPRAHA" in tag cause the same problem.

This shows, that GS Lab adventures is still beta version with many bugs - if you have adventure with diacritics in short name, you can't show it browser (https://labs.geocaching.com/goto/Blaník)

Programmers of labgpx.cz made workaround - when downloads gpx, convert tag from uppercase to camelcase. So if it's difficult to repair parser in c:geo, you can close this bug.

@murggel
Copy link
Contributor

murggel commented Nov 12, 2020

If labgpx can stay with the workaround of converting ...

Or we can check, if for waypoints with URL contains "labs.geocaching.com" (or other attribute to recognize it as a lab-cache) the name is be used as geocode without further check. This happens as well, if there is no valid geocode found.
I don't know, if the url is always included in the gpx.

@moving-bits : What do you think?

@xjurasek
Copy link
Author

Hi,
now I found another .gpx file, where camelcase doesn't work. If you import
LAB_Stary_Prostejov_-_Wichterle_a_Kovarik_as_WIKOV.gpx.txt:

  • file contains 5 points
  • c:geo imports only one file

Same situation as in "Karel Gott". But in this file is tag in camelcase. Why c:geo imports only the last one? So camelcase helps only in some kind of situations :-(.

@murggel
Copy link
Contributor

murggel commented Nov 13, 2020

now I found another .gpx file, where camelcase doesn't work. If you import
LAB_Stary_Prostejov_-_Wichterle_a_Kovarik_as_WIKOV.gpx.txt:

I will take a look on it...

Edit:
Weared - because in the description there is a min. 5 letter uppercase word (WIKOV) - this will be taken as geocode.

So probably the best way would be for labcaches to ignore geocode-search and just use the name as geocode...

@murggel
Copy link
Contributor

murggel commented Nov 15, 2020

@xjurasek: may I use the stary_prostejov-gpx for creating a test-case? So it would be here on github and probably in the releases version... I did not check yet, if the test-cases will be deployed in the released version...

@xjurasek
Copy link
Author

Of course you can.

@xjurasek
Copy link
Author

Hi,
I repaired .gpx file and found another examples. Short recapitulation:

  • Karel Gott (from begining of this thread) - to enable import I must convert only first word from uppercase to camelcase
  • Stary_Prostejov - to enable import I must convert the last word (but it's the only one with big letters) from uppercase to camelcase
  • Vyhled_do_dalsiho_stoleti - to enable import I must convert the last word (there are more words with big letters) from uppercase to camelcase
  • Brno_Bystrc - to enable import I must convert the second word to camelcase. I don't know, why there's not the same situation as in Karel_Gott
  • Velka_Bystrice - the special - if you import original file, c:geo imports 2 "caches". To import all of them I must convert string "geoZELVY" to "geoZelvy" and in one of record string "SOKOL" to camelcase "Sokol".

In all examples is problem in tag desc. Original files imports only one (or two in last example) "caches". I think, that c:geo made "uniq" on lab cache name and during import "rewrites" previous record with the next one.

All files (for easiest diff):
LAB_BRNO_-_BYSTRC.repaired.gpx.txt
LAB_Velka_Bystrice.repaired.gpx.txt
LAB_Stary_Prostejov_-_Wichterle_a_Kovarik_as_WIKOV.repaired.gpx.txt
LAB_VYHLED_DO_DALSIHO_STOLETI.repaired.gpx.txt
LAB_VYHLED_DO_DALSIHO_STOLETI.gpx.txt
LAB_Velka_Bystrice.gpx.txt
LAB_Stary_Prostejov_-_Wichterle_a_Kovarik_as_WIKOV.gpx.txt
LAB_BRNO_-_BYSTRC.gpx.txt

@murggel
Copy link
Contributor

murggel commented Nov 19, 2020

I think, that c:geo made "uniq" on lab cache name and during import "rewrites" previous record with the next one.

@xjurasek Yes, you are right. c:geo uses the geocode as identifier for the cache. I don't know, why c:geo tries to find a "valid" geocode out of the gpx and does not use simply the "name"-element. Probably there were some other cache-formats, which does not have set the name correctly.
So the geocode is searched in the name, desc and cmt-elements of the wpt-element (gpx-schema). Now as 5 consecutive uppercase letters / numbers are interpreted as geocode, they were found in your examples. c:geo does not care, if those 5 are inside a word a not...
If the same combination occurs in a second waypoint, it is used as identifier and so the previous waypoint with the same identifier will be overwritten by the following one.

@moving-bits what do you think?
Since it is not clear yet, if there will be a specific geocode for lab-caches, we can skip the search for a geocode and just use the content of the name-element for lab-caches.

@moving-bits
Copy link
Member

Without having looked into the source yet I imagine the following reason for parsing for a valid geocode:

Each cache in c:geo belongs to a geocaching service (either online or offline). Which service it belongs to is determined by the geocode, as their namespaces are disjunct. Thus import cannot be done for a waypoint without finding a valid geocode.

A possible solution could be to extend the GPX import to automatically create user-defined caches from waypoints with invalid or no geocode.
But be aware: A side-effect of this would be, that reimporting such a GPX file would lead to another set of user-defined caches being created, as without a unique geocode they cannot be recognized. => Not sure if we want this behavior or stay with skipping such waypoints.

@murggel
Copy link
Contributor

murggel commented Nov 20, 2020

@moving-bits ok, thanks for the information. You already proposed in #9202 (comment) some ideas how to handle the geocodes of labs, maybe with own prefix and connector.

So it would not make sense to start here with a workaround (like skipping the search for a geocode for labcaches).

I just thought about handle the lab cache as one cache with child-waypoints and not to create for each stage a own cache. But probably this will collide then with the "real" cache on GC (bonus cache) ...

@xjurasek Try to use GC...1 instead of GC...-1 as name, that should import all caches (since the "-" rejects the name as a valid geocode and so the description will be parsed). I don't know, what following effects that could have, but at least all caches should be imported.
Probably there will be a more general support for lab-caches later (see #9202) and your case can be handled then.

@moving-bits
Copy link
Member

Maybe for now it would help to add an option to GPX import to import all found waypoints as user-defined geocaches (and not trying to guess any geocodes). Menu option could be named "Import GPX (user-defined caches)" or something like this, to distinguish it from "Import GPX".

@xjurasek
Copy link
Author

I think, that first step is to find out why there is mechanism "guess geocode from other fields". Is there any comment in code or github issue?

@murggel - we can't skip dash, without it we can make collision with existing caches (now and maybe later when will be actual geocaching id extended from 7 to 8 characters).

@murggel
Copy link
Contributor

murggel commented Nov 23, 2020

I already looked for any reason why description etc are parsed for geocode, but did not find anything. The code is since initial commit to this repository. I guess some gpx were lacking of the geocode in the name or some other reason.

@moving-bits You were right in not handling labcaches differently and guess some valid geocodes for them, but I don't really like the idea of having different "Import GPX"- options for the user. Can he distinguish, which he should use for the gpx? Probably there is a mix of different caches in one GPX.
So I looked once more in the code and found another option, which I missed the whole time. I found, that there were the different services asked, if they can handle the string as geocode. And as "fallback" there is an "unknown service" (not the "internal service" for UDC) which can handle any string as geocode. In my opinion, this service should not be the winner on finding geocode on name and description. As a fallback the name would be set as geocode and hence the "unknown service" would be taken anyway if no other service can handle the string.

So this would mean:

  • check all services (except the "unknown serivce") if they can handle a geocode which is found in name or description. If one of them can handle it, this one is taken.
  • If no service can handle it, the name will be taken as geocode (and the "unknown service" can always handle the name, since it can handle any string).
  • So there will be no special treatement of LabCaches and if there will be in future a service which can handle LabCache-Codes, no workaround has to be considered
  • Still they are distinct from real UDC and can be later probably converted to LabCaches

murggel added a commit to murggel/cgeo that referenced this issue Nov 23, 2020
murggel added a commit to murggel/cgeo that referenced this issue Dec 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Issues classified as a bug Unverified Issue not yet confirmed/reproduced or feature requests not yet checked for plausibility
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants