Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Naming convention for temporal types #18

Closed
chaitan94 opened this issue Jul 18, 2020 · 3 comments
Closed

Naming convention for temporal types #18

chaitan94 opened this issue Jul 18, 2020 · 3 comments

Comments

@chaitan94
Copy link
Member

chaitan94 commented Jul 18, 2020

Overview

Right now there are two ways of naming being followed for temporal types.

After a small discussion with @estebanzimanyi and @mahmsakr, we felt it might be a good idea to get a more external perspective on this before making the final call.

Approach 1

This approach is followed currently by MobilityDB's main database codebase

Temporal Duration Name Base Type Specific Examples
Temporal temporal tint, tgeompoint
Instant temporalinst tintinst, tgeompointinst
Instant Set temporali tinti, tgeompointi
Sequence temporalseq tintseq, tgeompointseq
Sequence Set temporals tints, tgeompoints

This approach is also followed currently by MobilityDB's python driver codebase

(Although a slight change in terms of case sensitivity - which I think is fine considering C vs Python)

Temporal Duration Class Name Base Type Specific Examples
Temporal Temporal TInt, TGeomPoint
Instant TemporalInst TIntInst, TGeomPointInst
Instant Set TemporalI TIntI, TGeomPointI
Sequence TemporalSeq TIntSeq, TGeomPointSeq
Sequence Set TemporalS TIntS, TGeomPointS

Approach 2

This approach is followed currently by MEOS's codebase

Temporal Duration Class Name Base Type Specific Examples
Temporal Temporal Temporal<int>, Temporal<Geometry>
Instant TInstant TInstant<int>, TInstant<Geometry>
Instant Set TInstantSet TInstantSet<int>, TInstantSet<Geometry>
Sequence TSequence TSequence<int>, TSequence<Geometry>
Sequence Set TSequenceSet TSequenceSet<int>, TSequenceSet<Geometry>

This approach is also followed currently by PyMEOS

(Although a slight change being the base type being included within the class name itself as Python doesn't support template parameters)

Temporal Duration Class Name Base Type Specific Examples
Temporal Temporal TemporalInt, TemporalGeom
Instant TInstant TInstantInt, TInstantGeom
Instant Set TInstantSet TInstantSetInt, TInstantSetGeom
Sequence TSequence TSequenceInt, TSequenceGeom
Sequence Set TSequenceSet TSequenceSetInt, TSequenceSetGeom

Comparision

The case for Approach 1

  • Has already been used/mentioned in a few papers and other media, and might need them to change accordingly
  • Is less technical (not using terms like Set), more close to English and hence sounds more natural

The case for Approach 2

  • Readability - More explicit on the type of duration currently being dealt with.
  • Reduces ease of errors - There has also been a point raised about this approach being a bit resilient against accidental typos (compared in Approach 1 where it is easy to accidentally use temporals instead of temporal)
  • It is more technical and hence is always a bit unambiguous while communicating

One other point to note is that in Approach 1, base type comes before the temporal duration, while it is the reverse in Approach 2. Example: TIntInst vs TInstantInt. I cannot objectively say which is better here. But the rationale behind these were:

Base Type before Temporal Duration

  • It was natural choice on postgres extension side, as the publicly exposed types were tint, and tgeompoint
  • It is also natural to say "integer sequence" rather than "sequence (of) integer"

Temporal Duration before Base Type

  • It was a natural choice for using template classes in C++. Example: TInstant<int>
  • In terms of Python, I felt in future this suffix can be dropped altogether if we are able to add dtype like inference similar to how numpy functions, leaving TInstantInt to become something like TInstant with a dtype on the object representing the base type

Conclusion

I hope the above gives more context and insight to make the right decision. We can use this issue to collect community feedback and decide accordingly.

@chaitan94 chaitan94 changed the title Naming convetion for temporal types Naming convention for temporal types Jul 18, 2020
@estebanzimanyi
Copy link
Member

estebanzimanyi commented Jul 19, 2020

Many thanks Krishna for this great analysis. My personal perspective is based on the following premises:

  1. We should differentiate between users and developers because they have potentially different needs.
  2. We should be consistent across different programming languages and their established conventions.
  3. In order to achieve the two previous premises we should find the best compromise that optimizes both of them.

The rationale for having the user (external) types named as tint, tfloat, or tgeompoint is that these are the only types known at the SQL level and thus by the database user. Therefore we should keep them like this and add eventually a duration suffix, e.g., for Python such as TGeomPointInst. Please notice that SQL is case insensitive.

As for the suffixes one could favor Inst, InstSet, Seq, and SeqSet because of their precision even if they become very long, e.g., for TGeomPointSeqSet. This was the reason I used the shorter suffixes Inst, I, Seq, and S especially thinking of temporal geometry/geography points, that is, tgeompoint vs. tgeogpoint. Please notice that this applies also to methods in the API for transforming a value to a duration, e.g., tgeompointseqset(temp) vs. tgeompoints(temp). We must decide whether we keep the shorter or the longer suffixes.

On the other hand, regarding the developers' world, a precise naming should be used to convey the idea that the algorithms vary considerably for the different durations. For this reason, I would favor the more precise class names you used in MEOS, that is Temporal, TemporalSet, TInstant, TInstantSet, TSequence, and TSequenceSet, where the second one is the abstract superclass of both TInstantSet and TSequence.

Regarding the issue of temporal duration before or after the base type we can follow usual practices in C++. Indeed, when defining instantiations of STL template classes it is often the case that the base type comes before the template type as in

typedef vector<int> IntArray;

This is taught, e.g., in MIT Programming courses and used in well-known libraries such as SWIG, JGL, and MS Visual Studio. From a user perspective I think IntArray is much more intuitive than ArrayInt. Therefore, I would advocate to use the convention such as TFloatSeq or TGeomPointInst, which would also be more compatible with the SQL only types tfloat and tgeompoint.

Regarding the naming of temporal geometry/geography points, in the context of a Master's thesis we have been working on a first implementation of a generic TGeometry type for arbitrary non-deforming geometries, where the geometry represented at a timestamp can be, e.g., a rotating 3D polygon. This is in particular specified in the ISO 19141 standard. For the moment a few methods of the full MobilityDB API have been implemented for TGeometry but there are technical barriers that prevent us from covering the full API.

As an example, when determining the temporal distance in MobilityDB we use derivatives to understand when two moving points are at the shortest distance to add this additional timestamp as a turning point when computing the distance. This is possible since these are derivatives of linear functions, but this is no longer the case for rotating 3D geometries. We don't even know whether a derivative for the function representing the movement of 3D rotating polygons exists.

Therefore, although we are aiming to have in the future a generic TGeometry type that depending on whether the underlying geometry is a point or a line/polygon call either the current MobilityDB function or a specific one for complex geometries, it is for the moment difficult to assess whether this will be possible.

To summarize, these are my preferred naming conventions.

Temporal Duration Class Name Base Type Specific Examples SQL types and transformation functions
Temporal Temporal TInt, TGeomPoint tint, tgeompoint
Temporal Set TemporalSet TIntSet, TGeomPointSet NA
Instant TInstant TIntInst, TGeomPointInst tintinst(), tgeompointinst()
Instant Set TInstantSet TIntInstSet, TGeomPointInstSet tintinstset(), tgeompointinstset()
Sequence TSequence TIntSeq, TGeomPointSeq tintseq(), tgeompointseq()
Sequence Set TSequenceSet TIntSeqSet, TGeomPointSeqSet tintseqset(), tgeompointseqset()

@chaitan94
Copy link
Member Author

I think this is a very good middle ground approach. I think I would be good to follow the TGeomPointSeqSet-like naming throughout MEOS and PyMEOS codebases once we have the consensus towards it.

I was thinking that only downside would be that in C++, because of how templating works, I would still have the base type at the end. i.e, TSequenceSet<int> and TSequenceSet<GeomPoint>. However, as you mentioned, this is not the case as I can simply do typedef TSequenceSet<int> TIntSeqSet; That way both TIntSeqSet and TSequenceSet<int> work, and maybe users can always use TIntSeqSet and might never need to know about TSequenceSet<int>. However, at the same time, I also do not want to expose too many options for the same thing which might lead to confusion.

Also, just to clarify further, when you say we should differentiate between users and developers - do the developers here mean in a general sense or the developers/contributors working specifically on MobilityDB? I understand is that by users you mean to refer to the public/external API and by developers you mean the more internal classes and functions - But in the case of MEOS it becomes a bit more overlapping as its users would be developers in the general sense, who might want to interact with these to build their projects.

@estebanzimanyi
Copy link
Member

estebanzimanyi commented Aug 10, 2020

The type names have been changed in the develop branch. This implied a renaming also on many C functions. The resulting code is much more clear since it better highlights the variant of the temporal type that we are manipulating in a function.
Many thanks Krishna for this great suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants