This repository has been archived by the owner on Feb 8, 2019. It is now read-only.

Usefulness of DiplomaticRelations? #26

Open
amaury1093 opened this issue Sep 18, 2018 · 11 comments

Comments

@amaury1093
Contributor

Probably not for v1, but still putting it here.

Assuming we have one abstract Entity class, and PoliticalEntity, Event and Person are subclasses. Does DiplomaticRelation only represent the PoliticalEntity<->PoliticalEntity relationship? How about PoliticalEntity<->Person, Person<->Event or Person<->Person relationships? Each will have a new model? Or should we have a generic Relation model to translate all Entity<->Entity relations?

One elegant solution, proposed by Ville, is to only use Events. So:

  • Event: Belarus became a puppet state of the USSR in 1922
  • Event: the USSR got dissolved in 1991.

From these two events we can know that Belarus was a puppet state of the USSR between 1922 and 1991. Probably there should be a DiplomaticRelation derivative table in the db, for optimization purposes. But users don't input DiplomaticRelations, only Events.
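A minimal sketch of that derivation in plain Python. The `Event` fields and the type names `BECOMES_VASSAL` and `DISSOLUTION` are assumptions for illustration, not existing models:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    year: int
    type: str                     # hypothetical type names, e.g. "BECOMES_VASSAL"
    subject: str                  # entity the event happens to
    other: Optional[str] = None   # second entity involved, if any


def derive_relations(events):
    """Return (vassal, overlord, start_year, end_year) tuples.

    end_year is None when nothing has ended the relation yet.
    """
    open_relations = {}  # (vassal, overlord) -> start year
    closed = []
    for ev in sorted(events, key=lambda e: e.year):
        if ev.type == "BECOMES_VASSAL":
            open_relations[(ev.subject, ev.other)] = ev.year
        elif ev.type == "DISSOLUTION":
            # A dissolved entity can no longer be anyone's overlord or vassal.
            for pair in [p for p in open_relations if ev.subject in p]:
                closed.append((pair[0], pair[1], open_relations.pop(pair), ev.year))
    closed.extend((v, o, start, None) for (v, o), start in open_relations.items())
    return closed


events = [
    Event(1991, "DISSOLUTION", "USSR"),
    Event(1922, "BECOMES_VASSAL", "Belarus", "USSR"),
]
relations = derive_relations(events)
print(relations)  # [('Belarus', 'USSR', 1922, 1991)]
```

The result is what the derived DiplomaticRelation table would hold; users only ever enter the two events.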

@quorth0n
Contributor

Is the idea here more to expand the DipRel model to include other entities? Regardless, I agree that we shouldn't lock down the database to only allow PE<->PE relationships. I don't think this will be a common form of data submission for a long time, but since it could be difficult to change down the road, I think we should provide the capability in the models, just not the functionality, for v1 at least. What do you think?

@amaury1093
Contributor Author

The idea is: Remove all E<->E relationships, including what we call today DipRel model. Instead, replace them with Events, cf example above.

And after that, optional optimization step: Recalculate (i.e. derive, i.e. not inputted by users) E<->E relationships based on the Events.

@quorth0n
Contributor

Right, I guess a better way to phrase my question would be what data specifically would an Event hold and how would it be different from a DipRel model, other than, of course, the encompassing of Entity implementation models?

@amaury1093
Contributor Author

This needs to be defined: the main difference is that Event will only have one date field (compared to 2 for DipRel). Obviously a type field (e.g. BECOMES_VASSAL), and maybe parent_entity and child_entity fields.

In the end we store of course the same data, but it's just presented differently to the user: the user inputs events, instead of relationships.

The very difficult part is to define cascade relationships between events. E.g. if an Event of type DISSOLUTION happens on a particular day, then all events of type VASSAL should also end on that day.
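To illustrate "the same data, presented differently", here is a rough sketch that converts one two-date DipRel row into two single-date Events and back. The field names follow the ones mentioned above; the `CEASES_VASSAL` type is a made-up placeholder, since in practice the end could just as well come from a cascading event such as a DISSOLUTION:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DipRel:
    parent_entity: str
    child_entity: str
    start_year: int
    end_year: int


@dataclass(frozen=True)
class Event:
    year: int            # the single date field
    type: str            # e.g. "BECOMES_VASSAL"; "CEASES_VASSAL" is hypothetical
    parent_entity: str
    child_entity: str


def to_events(rel):
    """Split one relationship row into its start and end events."""
    return [
        Event(rel.start_year, "BECOMES_VASSAL", rel.parent_entity, rel.child_entity),
        Event(rel.end_year, "CEASES_VASSAL", rel.parent_entity, rel.child_entity),
    ]


def to_diprel(start, end):
    """Rebuild the relationship row from a matching pair of events."""
    if (start.parent_entity, start.child_entity) != (end.parent_entity, end.child_entity):
        raise ValueError("events do not describe the same pair of entities")
    return DipRel(start.parent_entity, start.child_entity, start.year, end.year)


rel = DipRel("USSR", "Belarus", 1922, 1991)
start, end = to_events(rel)
print(to_diprel(start, end) == rel)  # True
```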

@quorth0n
Contributor

Difficult indeed. I like this idea and will think over what the best way to implement this behavior may be.

@wtokumaru

wtokumaru commented Sep 30, 2018

Please pardon my nomenclature and let me know what terms are appropriate as I try to orient myself with the current architecture.

tl;dr: Keep it simple and modular. Just have one base class that people can input data for and handle the complex options and meta information with separate and independent modules.

I see two interfaces here.
A: Mappers create polygons and compile whatever meta-data they want.
B: The front end visualizes a map at a given date, which we can make complex with layers.

It makes sense to me that the back end exists primarily to connect these two.
At interface (A) we use form inputs to convert mapper data into "rows" (models) in a chosen (Nation, Territory, or DiplomaticRelation) database "tab" (to use spreadsheet terminology; model attributes are "columns"). The most important data are polygons and dates.
At interface (B) we display a cached map if we have an up-to-date one, or render (and then cache) a map by querying the database. The typical use case wants to see borders defining some color-coded notion of possession.

For simplicity, I will consider a request for the "political" world map of the year 1700, which should show whichever abstraction layer is defined by international sovereignty.

  1. The front end checks the cache for the exact (or sufficiently close) parameterization (1700, 'sovereignty') and displays that image if it exists and is up to date. If not, it continues to the next step.
  2. The front end queries the database for the set of polygons to render into an image to display and cache. I do not know much about databases but we probably want to optimize for search efficiency. We need to make sure that interface (B) works well. I see a few options, in increasing complexity and efficiency:
  • 2a. For O(n) we could search through all Territories and check their ranges, building up a set of "active" polygons as we parse. This may take too much time. However, we would not need to order the database.
  • 2b. If we discretize time, we could instead have interface (A) pre-populate a finite set of Territories for each date, putting the computational time on the upload rather than the rendering. This may make it difficult to extend the layering options as we might need to re-parse all data every time we change them. It also forces precise times to snap to the closest discrete time, or requires making a new one (like how we do caching).
  • 2c. If we focus on continuous time and Events, we could have the database sorted (and thus indexed) by event dates and then render maps by going directly to the most recent event and working backwards until there are no more relevant events. This might be difficult to terminate as we would need to not waste time parsing irrelevant events on the way to primordial ones. Alternatively, we only parse until we find a pre-rendered map for an earlier date and we are good. Because each individual event also includes meta information, this is easy to extend to other layers.
  3. It is not immediately intuitive for mappers to upload events. Instead, they usually have a set of durations across when a polygon is or is not valid. It makes sense to me then to have the form input accommodate that structure and then have the back end convert to whichever database representation is best. There are several options that are convenient for interface (A):
  • 3a. Require all polygons to input start date, end date, and layer. It is not reasonable to assume we will have a finite set of possible meta layer tags, so we can only require the bare minimum. If a polygon is relevant for different durations (i. e. a territory that changes hands but not shape), we would need a separate row for each duration (or at least a pointer in each to the polygon data).
  • 3b. Only require polygons and make start/end dates optional and then upload Events separately. This works well with having the database store pointers to polygon data but makes it complicated to decide what meta information belongs with which class.
  • 3c. Only upload events, which include the meta layer information. These events solely define "polygon deltas" and make rendering more efficient to code (if more convoluted to understand). This assumes that we have "initial" and "final" (present day) sets of polygons and only use events as edits to those maps. This simplifies border stitching and layer choices but I do not really know if our mapping software can create "polygon deltas."
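As a rough illustration of step 1 combined with option 2a, the sketch below checks a cache keyed by (year, layer) and falls back to an O(n) scan over territories with start/end dates (option 3a). `Territory`, `MapStore`, and the whole-cache invalidation on upload are assumptions made for the example, not existing code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Territory:
    polygon_id: str
    layer: str        # e.g. "sovereignty"
    start_year: int   # inclusive
    end_year: int     # exclusive


class MapStore:
    def __init__(self, territories=()):
        self._territories = list(territories)
        self._cache = {}   # (year, layer) -> frozenset of polygon ids
        self.scans = 0     # counts full scans, only to make cache hits visible

    def add(self, territory):
        self._territories.append(territory)
        self._cache.clear()  # crude: any upload invalidates every cached map

    def active_polygons(self, year, layer):
        key = (year, layer)
        if key not in self._cache:
            self.scans += 1
            # Option 2a: O(n) pass over all territories, checking their ranges.
            self._cache[key] = frozenset(
                t.polygon_id
                for t in self._territories
                if t.layer == layer and t.start_year <= year < t.end_year
            )
        return self._cache[key]


store = MapStore([
    Territory("poly-a", "sovereignty", 1650, 1750),
    Territory("poly-b", "sovereignty", 1701, 1800),
])
print(sorted(store.active_polygons(1700, "sovereignty")))  # ['poly-a']
print(sorted(store.active_polygons(1700, "sovereignty")))  # served from the cache
print(store.scans)  # 1
```

Clearing the whole cache on every upload is the simplest correct policy; a finer-grained version would only drop the keys whose year falls inside the new territory's range.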

As an arbitrary middle ground in all of this, I propose the following modules (in increasing abstraction) so that any option works and it is easy to change when we change our minds:

  • Polygon: A shape file or whatever we are using these days. The only information associated with each Polygon should be a unique ID (not a name) which we can point to from the database. We can have any number of these and they can be as redundant or hierarchical as we want since the responsibility to parse them is defined entirely outside. A mapper can upload these with as much meta information as they want but the "row" only stores the pointer to the file. This does not even need a corresponding python module as in theory the name of each shapefile (or whatnot) is its ID. We can have Python helper methods that break apart or combine shapefiles into new ones to use as needed.

  • Entity/Polity/etc.: A "row" (models.Model IIUC) of the database which ascribes meaning to a collection of Polygons. We can have as many children as we want (e. g. Nation, Territory, DiplomaticRelation, PotatoSalad) without breaking the core functionality. Each child is a separate "tab" that people can use without having to worry about the others. A mapper can upload a file through any of these children and we can then isolate and save the file to a Polygon the same way regardless while filling the appropriate tab row(s). It is thus possible to upload an Entity or to have a child Entity that has no Polygon(s) or has other types of files (i. e. narratives) without breaking anything. Every input/attribute is optional by default, except for the choice of child, which in turn may have required inputs.

  • Event/Change/etc.: A separate class which the back end can use to extrapolate and interpolate Entity rows from mapper inputs. The front end can use these abstractions to minimize render time and cache (or modify cached) maps. We probably want to change the name of either Event or Entity to avoid acronym clashes. Mappers could also upload events manually through some sort of shim layer if we wanted. We would also need logic to handle cases where data is not internally consistent.

  • Layer/Map/etc.: An abstraction above Entity and Event which compiles a list (order and meta info do not matter) of pointers to polygons associated with a particular date, zoom/perspective, or perhaps narrative. We can have a child for each front end display option we wish to support and make it so that none meddle with each other in development. We can then render these into images.

In short, interface (A) usually makes Entities and interface (B) reads Layers. The back end uses Events/etc. to populate and parse a table of Entities, a hash map of cached Layers, and a directory of Polygon files.

Finally, let us consider the aforementioned example as a sanity check.

  1. The mapper hereafter known as "Mappo" makes a file that outlines the area of Belarus.
  2. All Mappo cares about this territory is that the USSR owned it from 1922 to 1997. The 1997 is incorrect but we will get back to that later. Mappo does not care about what happened before or after. Mappo refers to this colloquially as the "Byelorussian Soviet Socialist Republic." Mappo has a Wikipedia link to Reference.
  3. Mappo also has a file of the USSR, which inherently includes Belarus in its outline, making part of the border (with Poland) redundant. Mappo likes the USSR to be a specific shade of red and notes its full color scheme for good measure.
  4. Mappo has written a narrative about how Belarus was integrated into the USSR which asserts that Belarus "belongs with Russia" or something and cares passionately about getting their argument out there by any means necessary. They think that by making the map, they deserve to have their narrative available on the site, where it is clearly stated to be a subjective opinion.
  5. Mappo goes to our site and clicks "Upload" then selects from a drop down menu or whatever "SSR." SSR is a child class of VassalState, which is in turn a child class of Entity (for example).
  6. The required inputs for a VassalState are a polygon (which Mappo uploads), start/end dates (which Mappo inputs), the territory owner (which is complicated), and the relationship (which is even more complicated).
  7. Normally, the territory owner requires you to search through the database for the right unique row/Entity ID or know it from memory, which is janky. We can reduce this by having a drop down menu of valid Entities based on the given start/end date but that is not necessary. It turns out that the SSR child automatically populates this with the USSR so Mappo does not (and cannot) input the territory owner. The relationship option is just a link input for now so Mappo must link a "fact" resource to populate it. The SSR also has the 1997 date incorrectly, so nothing points out to Mappo that the date is questionable.
  8. Mappo then clicks "add narrative" or something then uploads their narrative.
  9. Mappo clicks Submit to close the Upload window, which sends the form input to the back end.
  10. The back end uses the relevant classes to parse the form input into the database. It fills out a row of the SSR tab and of any others we choose to fill out (could make sense to be all parents).
  11. The back end stores the polygon and narrative in the appropriate directories.
  12. Some admin makes these changes "official" so that they will start showing next time someone visits the front end. This also notes which cached images are outdated and either remakes them now or waits for someone to query something that has been changed.
  13. The User hereafter known as "Looko" navigates to chronoscio.
  14. Looko sets the date to 1955 January 27th out of curiosity.
  15. The front end queries the database to find that 1955 January 1st and 1956 January 1st currently have identical images with none between. The front end then displays the 1955 January 1st map because it is closest. This should not take long.
  16. Looko goes to a drop down menu to select the "VassalView" or whatever, which is a child of PoliticalMap, which is a child of Layer.
  17. The front end either displays the map cached in step (12) or realizes that the map is outdated now because of the introduction of Belarus. Assuming the latter, it parses the new Entity row of Belarus from the SSR tab (which it was already using in this example). From the information in the row, it knows to load the USSR and Belarus Polygon files, then creates a new one that shows Belarus as a vassal of the USSR. It then updates the relevant layer and renders the map.
  18. Looko sees a new map which has Belarus shown as a subdivision of the USSR, under whatever aesthetics we choose. Looko can also click on the territory to navigate to a page which has the Wikipedia link, narrative, and whatever else the class specifies. Previously, clicking there would go to a page about the USSR instead. Looko is happy.
  19. Mappo later makes a more precise file and tries to upload it with all the same information. They are unable to because the ID is unique, so they either make a new one or select an option that says this upload should replace the previous one.
  20. Later, someone else decides to upload a manual event that says the USSR dissolved in 1991 instead of 1997. After this uploads, the back end revises all SSR Entities to end in 1991 and then makes new rows that point to the same polygons from 1991 to 1997 but with different context.
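Step 20 could look roughly like the sketch below: a corrected dissolution year truncates every row owned by the dissolved entity and adds a follow-up row pointing to the same polygon. `EntityRow` and its fields are hypothetical, and representing the "different context" as `owner=None` is an assumption made only for this example:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class EntityRow:
    name: str
    polygon_id: str        # pointer to the Polygon, not the shape itself
    owner: Optional[str]   # None stands in for "no owner / different context"
    start_year: int
    end_year: int


def apply_dissolution(rows, dissolved, year):
    """Split every row owned by `dissolved` that spans `year`."""
    revised = []
    for row in rows:
        if row.owner == dissolved and row.start_year < year < row.end_year:
            revised.append(replace(row, end_year=year))                  # 1922-1991
            revised.append(replace(row, owner=None, start_year=year))    # 1991-1997
        else:
            revised.append(row)
    return revised


rows = [EntityRow("Belarus", "belarus-poly", "USSR", 1922, 1997)]
for row in apply_dissolution(rows, "USSR", 1991):
    print(row)
```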

Overall, I think the most important part of all this is to keep in mind the minutiae but not get bogged down by trying to be overly general. I think we just want something that is easy to extend and change more so than something that is perfect in the first or second or tenth architecture. I honestly do not really care about 99% of the features we could have until we have a strong foundation that can cover the single one that matters most to me: having a 4D map that is what all previous attempts out there could not be.

@amaury1093
Contributor Author

Wow, thanks for this message @wtokumaru!

First things first, let's agree on terminology:

| Name | Spreadsheet | SQL | Other names |
|---|---|---|---|
| All Models | Whole spreadsheet | Database | |
| Model | Tab | Table | Class |
| Attributes | Column | Column | Fields |
| Instance | Row | Row | |

So I think there was a slight confusion between models and rows in your message. You also talk about modules, I assume they are Models?

I disagree with:

  • Polygons: they shouldn't be files, but stored directly in the database, in a format called geojson. FS i/o is generally slower than DB i/o. Moreover, with PostGIS, we can do operations on geojsons. In our case, it's the Territory model, and each has a start/end date.
  • You don't attach a narrative to an Entity, Event, Person etc. A narrative shouldn't be seen as an attachment (a Territory is "attached"). Rather, it should be seen as more high-level than these models. Concretely, a narrative is, once all the relevant instances are created, a simple chronological series of instances. So you could have a Narrative with: Event->Person->Nation->Event etc, and each step has a description/media chosen by the author.
  • Cached images. Instead of having the backend sending images to the frontend (which are quite heavy), we just send plain JSON, in the geojson format. On the frontend, Mapbox will render these geojson very nicely, and make them clickable, hoverable etc, which is impossible with images.
  • Event/Changes: you want to derive those from Territory start/end dates, whereas I want to derive DiplomaticRelations (=who is vassal of who) from Events.
  • Event should be a child of Entity. (see Create an abstract "Entity" model #22).
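To make the geojson point concrete, here is a small sketch of the payload the backend could send for a given date. Plain dicts stand in for Territory rows, the field names are assumptions, and the coordinates are a made-up rectangle rather than a real border:

```python
import json

# Stand-ins for Territory rows; the geometry is a placeholder box, not real data.
territories = [
    {
        "entity": "Belarus",
        "start_year": 1922,
        "end_year": 1991,
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [23.0, 51.0], [33.0, 51.0], [33.0, 56.0], [23.0, 56.0], [23.0, 51.0],
            ]],
        },
    },
]


def feature_collection(territories, year):
    """Build a GeoJSON FeatureCollection of the territories valid in `year`."""
    features = [
        {
            "type": "Feature",
            "geometry": t["geometry"],
            "properties": {"entity": t["entity"]},  # usable for click/hover on the map
        }
        for t in territories
        if t["start_year"] <= year < t["end_year"]
    ]
    return {"type": "FeatureCollection", "features": features}


print(json.dumps(feature_collection(territories, 1955), indent=2))
```

The `properties` of each feature are what the frontend can use to make a territory clickable or hoverable, which a pre-rendered image cannot offer.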

So my version of your 20-point user story. This is a version without DiplomaticRelation, because the whole point of this thread is to remove it.

  1. Same.
  2. Same, except that for 1922 & 1997, Mappo needs to input an event (via interface (A)) describing why it started and why it ended. So UX-wise, Mappo inputs start/end dates + metadata, but behind the scenes, it creates 2 Events and 1 Territory.
    • Note: if e.g. the "Dissolution of USSR" Event already exists in the db, then an error message should show up on the frontend, saying "you cannot create a territory until 1997".
  3. OK.
  4. For now, the "belongs to" information is not in a narrative, but considered as a fact. So instead of writing something, Mappo in interface (A) just needs to choose from a dropdown that Belarus is a vassal state of USSR, and paste a reference on where he has taken this information from (already done in step 2, actually). Later on, he can create a narrative referencing this fact.
  5. VassalState is not a child class of Entity. PoliticalEntity is, and "vassalship" is deduced from the fact that there was an Event in 1922 that was "becomes vassal of", which involved 2 PoliticalEntities, namely Belarus and USSR.
  6. The required inputs are (assuming USSR is already created):
    • Metadata about Belarus PoliticalEntity, i.e. color, name, description
    • Mappo adds a Territory for Belarus, uploads shapefile, inputs start/end date
    • A sub-form appears, and prompts the user to input Event metadata about those start/end date (or select existing ones from the db). So Mappo inputs "1922 Belarus becomes vassal of USSR", and selects "1997 dissolution of USSR" from a dropdown (Event already in db, incorrectly inputted by someone else).
  7. Not relevant anymore.
  8. See comment above, narratives come later.
  9. OK.
  10. OK. In our case, it creates the following instances:
    • 1 PoliticalEntity (Belarus)
    • 1 Territory (the uploaded shapefile)
    • 1 Event ("1922 Belarus becomes vassal of USSR", assuming the 1997 dissolution already existed)
  11. The shapefile is in the Territory table, let's forget about narratives for now.
  12. Disagree with images, but agree that some optimizations/pre-gathering of raw data could be done on this step.
  13. OK.
  14. OK, with geojson.
  15. OK.
  16. Why not. So to confirm, PoliticalMap/Layer are purely derived tables that pick data from other tables, for the sole purpose of not running O(n) queries on each frontend request, correct? If yes, then why not.
  17. Unclear to me. In the current model, it's just a simple SQL query with filter on current date, to get only the relevant geojsons. To not get duplicate data, see Allow the territory api to exclude by id #37.
  18. Yes.
  19. Yes, you can replace a Territory.
  20. If another user creates a new Event "USSR dissolved in 1991", the backend should throw an error. If he edits the existing event, then everything's good. And yes, the backend should revise those changes.
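The two validation rules mentioned above (the note under step 2 and step 20) could be sketched as follows. This is an in-memory stand-in, not the real models: `Database`, `add_event`, `add_territory`, and the event type names are all hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    year: int
    type: str     # hypothetical type names, e.g. "DISSOLUTION"
    entity: str


@dataclass
class Database:
    events: list = field(default_factory=list)
    territories: list = field(default_factory=list)

    def add_event(self, event):
        if event in self.events:
            return event  # the user selected an existing event from the dropdown
        if event.type == "DISSOLUTION" and any(
            e.type == "DISSOLUTION" and e.entity == event.entity for e in self.events
        ):
            # Step 20: a second dissolution must be an edit, not a new Event.
            raise ValueError(f"{event.entity} already has a dissolution event; edit it instead")
        self.events.append(event)
        return event

    def add_territory(self, name, overlord, start_year, end_year):
        for e in self.events:
            if e.type == "DISSOLUTION" and e.entity == overlord and e.year < end_year:
                # Note under step 2: the overlord no longer exists at end_year.
                raise ValueError(f"you cannot create a territory until {end_year}")
        self.territories.append((name, overlord, start_year, end_year))


db = Database()
db.add_event(Event(1991, "DISSOLUTION", "USSR"))
try:
    db.add_territory("Belarus", "USSR", 1922, 1997)
except ValueError as err:
    print(err)  # you cannot create a territory until 1997
db.add_territory("Belarus", "USSR", 1922, 1991)  # accepted
```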

Overall, we are not rigid yet on db architecture, so improvements are welcome. However I don't see how exactly your architecture would work. If you think it's simpler/more universal/more performant, can you create a google sheet with tables/rows/columns, a bit like this, so that we have a clearer idea?

Of course, we are not the only ones thinking about this, cf CIDOC. Since that methodology is a standard, I'm trying as much as possible to converge towards their model.

@wtokumaru

wtokumaru commented Oct 7, 2018

Ah, your terminology makes sense. I often used "models" to refer to rows but what I should have said was "instances of models" so yeah.


Polygons

When I was last working on the project, all polygons were qgis shapefiles so it appears we have moved to geojson now. Will take your word for it on I/O speed. The geojson-territory relation makes sense being 1-to-1 but:

  • It is not necessarily a bijection as multiple territories will have identical geojsons, just for different date ranges (e. g. anything that declares independence or trades hands without borders changing). This means we will have redundant data in our database taking up more space than it should. I am not sure how much of a concern this is though.
  • Consider the case where a historical border changes based on new research. We might want to then edit multiple territories that have this border and it would be convenient to edit all that have it at once.
  • Most of macro history involves exchanges of polygons without altering borders. Particularly in the case of unification and collapse, we need to have a robust way to consider subdivisions that are sometimes at a different zoom as discussed in Autonomy #41. I would not want to have to deal with the USSR having a completely independent geojson from Russia or Belarus, for example. Or have several rows for Alaska when the territory never really had substantially different borders from what it has now.
    This is why I proposed having a library of "atomic" territories that we can draw from and then just having each row point to the relevant one(s).
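A minimal sketch of what I mean by rows pointing to shared "atomic" polygons. The dict layout and the helper names are made up for illustration, and the coordinates are placeholders:

```python
# Library of atomic polygons, stored once and keyed by a unique ID.
polygons = {
    "belarus": {
        "type": "Polygon",
        "coordinates": [[[23.0, 51.0], [33.0, 51.0], [33.0, 56.0], [23.0, 56.0], [23.0, 51.0]]],
    },
}

# Two territory rows for different date ranges that share one polygon.
territories = [
    {"entity": "Byelorussian SSR", "polygon_id": "belarus", "start_year": 1922, "end_year": 1991},
    {"entity": "Belarus", "polygon_id": "belarus", "start_year": 1991, "end_year": None},
]


def geometry_of(territory):
    """Resolve a territory row to its geometry through the pointer."""
    return polygons[territory["polygon_id"]]


def update_border(polygon_id, coordinates):
    """Correct a border once; every row pointing at it sees the change."""
    polygons[polygon_id] = {"type": "Polygon", "coordinates": coordinates}


update_border("belarus", [[[23.5, 51.0], [33.0, 51.0], [33.0, 56.0], [23.5, 56.0], [23.5, 51.0]]])
print(all(geometry_of(t)["coordinates"][0][0] == [23.5, 51.0] for t in territories))  # True
```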

Narrative

By "attached" I mean uploading a link, text file, pdf, video, etc. while filling out the form. I agree with your point though.

Caching

When I refer to "layers" AKA "maps" they can be stored in any format. Caching them as geojson makes sense to me so long as the stitching is already done before the front end needs to render. We should not have to perform any logic to decide which geojsons to display as we should have just a single complex one readily available.

Events

What I am kind of imagining is that geojsons are non-temporal and events just point to them as needed. I agree that we should probably derive DiplomaticRelations. I am not set on deriving Territories from Events+Polygons, but I am biased in favor of atomic modularity. Territories feel "molecular" to me.

Entity

After reading more, I agree that Events should be children of Entities.


  1. The file is specifically a geojson file or one converted to be geojson? We could also have code in the back end which automatically converts different file types into geojson.
  2. So Mappo needs to submit 2 events and a Territory separately? Why not force Mappo to submit the Events first (like how we force there to be a Nation first) and then have the start and end dates set by specifying events? I do like being able to derive Events from start/end dates but I see it being much trickier to implement the parsing. The behind the scenes makes sense. Just need to make sure the metadata UI also makes sense.
  3. See Options when submitting territories? #45 for my concern. Would Mappo select the USSR as the nation for this territory? Or would they select Belarus? What if an alternate version of Mappo wanted to submit the exact same file and non-date metadata for Belarus as a Nation? Would we not then have all the metadata and the geojson duplicated?
  4. The more I think about the Diplomatic Relations, the more it seems we should not worry about them too much yet as per Implement Political Entity types #40. Deferred there.
  5. ^
  6. ^
  7. n/a
  8. n/a
  9. n/a
  10. Sounds good. We will need to work out how we derive relations from Events but that can come later.
  11. Same thought about atomic vs. molecular territories.
  12. Deferred to Allow the territory api to exclude by id #29.
  13. n/a
  14. What do you mean by with geojson? I assumed they set the date via the text box or calendar.
  15. n/a
  16. Yep.
  17. Same except in theory it would only need to grab a single geojson that has internal borders, which we then split apart with lower autonomy levels. I do not know the exact details of how Mapbox works so deferring to Use geobuf to compress geojsons #38.
  18. n/a
  19. Yeah, or edit one. What about the Mappo who wants to add the exact geojson and metadata but with a different time? AKA Belarus as described in step (3).
  20. Sounds good.

Once we remove all the topics better covered by other issue discussions, that leaves me with 3 thoughts:

  1. Can we separate the geojson data from the Territory and have it as a separate Polygon (or whatever) Entity that different territories can point to (like how they point to a Nation/PoliticalEntity)? We need this in order to avoid having like 20 Belaruses or Romes or Alsace-Lorraines. Even the notion of a territory being something that trades hands, or an administrative border/subdivision retained between social changes, inherently cannot be limited to a single continuous time range...
  2. How do we determine DiplomaticRelations from Events? I am on board if we have a clear vision.
  3. What do you think about Layers/Maps/etc?

Finally, here is a link to a simplified database architecture: https://docs.google.com/spreadsheets/d/11_ceUZohBJ35ddNHaUTBAM_DVzDBiw0Ld3CQM06LwbQ/edit#gid=0

Note that we could still have an interface that takes in "Territories" the same way we currently do and then just save the geojson separately from the rest in the same way we create Events separately.

@amaury1093
Contributor Author

amaury1093 commented Oct 10, 2018

1. Addressed in #53.
2. Can be discussed further down in this thread (I have some ideas involving dependency graphs).
3. Caching (i.e. stitching the polygons beforehand) is definitely needed, it would take too much time if it was done on each frontend request. However, I don't get how it's modeled in your spreadsheet. Are EmpireLayer and ReligionLayer new models? At what point do we decide to put a PoliticalEntity into EmpireLayer?

@wtokumaru

Layers are maps which stitch together polygons in different ways. I picked the term "Layer" while thinking about vaguely similar stuff like this: http://desktop.arcgis.com/en/arcmap/10.3/map/working-with-layers/a-quick-tour-of-map-layers.htm

The default layer, which I assume to be SoverigntyLayer or something, would show a map of stitched polygons into PoliticalEntitys that are each uniquely sovereign. In contrast, a ReligionLayer would stitch together the same polygons but into ReligionEntitys which have color/etc. like a PoliticalEntity would but are instead sorted by the more prominent religion. For example, an early modern religion map of Europe would look quite different from its sovereignty map, despite being composed of roughly the same smaller polygons stitched together, and the Layers would handle these representations.
I do not think that we will want to support Layers in the initial build but should we plan to some time later it would help to have polygons as described in #53.
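A rough sketch of the Layer idea: the same atomic polygons grouped by a different attribute per layer. The names `atoms` and `build_layer` and all the labels are invented placeholders, and the actual geometric stitching (e.g. a union in PostGIS) is left out:

```python
from collections import defaultdict

# Atomic polygons with per-polygon metadata; all values are invented placeholders.
atoms = [
    {"polygon_id": "p1", "sovereign": "Realm A", "religion": "Faith X"},
    {"polygon_id": "p2", "sovereign": "Realm A", "religion": "Faith Y"},
    {"polygon_id": "p3", "sovereign": "Realm B", "religion": "Faith Y"},
]


def build_layer(atoms, key):
    """Group polygon ids by the attribute that defines the layer."""
    groups = defaultdict(list)
    for atom in atoms:
        groups[atom[key]].append(atom["polygon_id"])
    return {name: sorted(ids) for name, ids in groups.items()}


sovereignty_layer = build_layer(atoms, "sovereign")
religion_layer = build_layer(atoms, "religion")
print(sovereignty_layer)  # {'Realm A': ['p1', 'p2'], 'Realm B': ['p3']}
print(religion_layer)     # {'Faith X': ['p1'], 'Faith Y': ['p2', 'p3']}
```

Each group would then be stitched into one shape and cached, so the two layers differ only in the grouping key.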

@amaury1093
Contributor Author

@wtokumaru Answered in the Slack thread, but I want to put it here too:

Your SoverigntyLayer covers the whole world at one particular date, correct? So if I'm only interested in the history of Europe, the frontend will still fetch the whole SoverigntyLayer? In this case, I think it's probably more performant network-wise to return an array of cached PoliticalEntities.
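A sketch of what I have in mind: each cached PoliticalEntity carries a bounding box, and the API only returns the ones that intersect the viewport. The function names are hypothetical, the boxes are rough illustrative values, and boxes crossing the antimeridian are not handled:

```python
def bbox_intersects(a, b):
    """Boxes are (min_lon, min_lat, max_lon, max_lat)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def entities_in_view(cached_entities, viewport):
    """Return only the cached entities whose bounding box touches the viewport."""
    return [e for e in cached_entities if bbox_intersects(e["bbox"], viewport)]


# Rough, illustrative bounding boxes (not real data).
cached_entities = [
    {"name": "Belarus", "bbox": (23.0, 51.0, 33.0, 56.0)},
    {"name": "Japan", "bbox": (129.0, 31.0, 146.0, 46.0)},
]
europe = (-10.0, 35.0, 40.0, 70.0)
print([e["name"] for e in entities_in_view(cached_entities, europe)])  # ['Belarus']
```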
