Add a view to power the facts page #165
jcjimenez
left a comment
LGTM
ops/storage-ddls/cassandra-setup.cql
Outdated
  AND tileid IS NOT NULL
  AND placeid IS NOT NULL
  AND conjunctiontopic1 IS NOT NULL
PRIMARY KEY ((pipelinekey), eventtime, eventid, conjunctiontopic1, conjunctiontopic2, conjunctiontopic3, tilez, tileid, placeid)
That MV would result in large data partitions and slow queries.
Please set the PK to the following: PRIMARY KEY ((pipelinekey, conjunctiontopic1, conjunctiontopic2, conjunctiontopic3, tilez), eventtime, eventid, tileid, placeid)
Then, when you need to query for facts, you can query by a designated tilez = 15. For example:
SELECT * FROM eventplaces WHERE pipelinekey IN ('Twitter', 'Facebook') AND tilez = 15 AND conjunctiontopic1 IN ('{INCLUDE ALL TOPIC TERMS}') AND conjunctiontopic2 = '' AND conjunctiontopic3 = '' AND eventtime > '2017-12-21' AND eventtime < '2017-12-31';
You may even want to consider setting conjunctiontopic2 = '' and conjunctiontopic3 = '' within the MV WHERE clause.
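For reference, a rough sketch of what the view might look like with the primary key suggested above. The view name `eventsbypipeline` is taken from the test session in this thread; the selected columns and the descending clustering order on `eventtime` are assumptions. Cassandra also requires the view key to contain every primary key column of the base table plus at most one other column, and the base table definition is not shown here, so this may need adjusting.

```cql
-- Sketch only: column selection and clustering order are assumptions.
CREATE MATERIALIZED VIEW IF NOT EXISTS eventsbypipeline AS
  SELECT eventid, eventtime, pipelinekey,
         conjunctiontopic1, conjunctiontopic2, conjunctiontopic3,
         tilez, tileid, placeid
  FROM eventplaces
  -- Every column in the view's primary key must be restricted to non-null values.
  WHERE pipelinekey IS NOT NULL
    AND conjunctiontopic1 IS NOT NULL
    AND conjunctiontopic2 IS NOT NULL
    AND conjunctiontopic3 IS NOT NULL
    AND tilez IS NOT NULL
    AND eventtime IS NOT NULL
    AND eventid IS NOT NULL
    AND tileid IS NOT NULL
    AND placeid IS NOT NULL
  -- Composite partition key keeps each partition limited to one
  -- pipeline / topic combination / tilez value.
  PRIMARY KEY ((pipelinekey, conjunctiontopic1, conjunctiontopic2, conjunctiontopic3, tilez),
               eventtime, eventid, tileid, placeid)
  -- Assumed ordering: newest events first within a partition.
  WITH CLUSTERING ORDER BY (eventtime DESC, eventid ASC, tileid ASC, placeid ASC);
```

With a key like this, every read has to supply all five partition key columns, which is why the example query restricts `pipelinekey`, the three topic columns, and `tilez` together.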
We could do that, but in that case, do we really need this view? If we restrict events shown by time/topics/etc. how would this component be different from the ActivityFeed component when the map is maximally zoomed out? Would it be simpler to just provide a "maximize" button for the ActivityFeed to make it take over the entire screen instead of adding a new data-source, graphql endpoint and frontend component?
Thinking about this a bit more, maybe a nice way to partition this data would be by date (day/hour-level). In that way, we can implement the infinite scroll very nicely by just requesting progressively farther back dates: every infinite scroll request then maps to a partition.
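A minimal sketch of the date-partitioned idea, to make it concrete. The table name `eventsbyday`, the `eventday` column, and the column types are all hypothetical; a materialized view cannot compute a new column, so the day bucket would presumably have to be written by the pipeline itself.

```cql
-- Hypothetical table: one partition per calendar day.
CREATE TABLE IF NOT EXISTS eventsbyday (
  eventday date,        -- assumed day bucket derived from eventtime at write time
  eventtime timestamp,
  eventid text,
  pipelinekey text,
  PRIMARY KEY ((eventday), eventtime, eventid)
) WITH CLUSTERING ORDER BY (eventtime DESC, eventid ASC);

-- First page of the infinite scroll: the most recent day.
SELECT eventid, eventtime FROM eventsbyday WHERE eventday = '2017-10-17';

-- Next scroll request: step back one day, which reads exactly one more partition.
SELECT eventid, eventtime FROM eventsbyday WHERE eventday = '2017-10-16';
```

Day-level buckets keep the mapping simple; an hour-level bucket would follow the same pattern if a single day turns out to hold too many events for one partition.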
Pinging @erikschlegel: what's the word on the facts view? Do we really want to implement a separate page for the Facts view if it'll simply surface the same information as the NewsFeed when the map is maximally zoomed out?
I'm happy to implement this, just worried that this will add extra maintenance burden for the same information for which we already have a view.
Test cql session after creating the view:

```cql
-- insert data
INSERT INTO eventplaces(
  eventid, conjunctiontopic1, conjunctiontopic2, conjunctiontopic3,
  tileid, tilez, centroidlat, centroidlon, placeid,
  insertiontime, eventtime, pipelinekey, externalsourceid
) VALUES(
  'e1', 'foo', 'bar', 'baz',
  'tile1', 1, 1.23, 2.34, 'place1',
  '2017-10-16', '2017-10-16', 'facebook', 'someone'
);

INSERT INTO eventplaces(
  eventid, conjunctiontopic1, conjunctiontopic2, conjunctiontopic3,
  tileid, tilez, centroidlat, centroidlon, placeid,
  insertiontime, eventtime, pipelinekey, externalsourceid
) VALUES(
  'e2', 'foo', 'bar', 'baz',
  'tile1', 1, 1.23, 2.34, 'place1',
  '2017-10-14', '2017-10-14', 'facebook', 'someone'
);

INSERT INTO eventplaces(
  eventid, conjunctiontopic1, conjunctiontopic2, conjunctiontopic3,
  tileid, tilez, centroidlat, centroidlon, placeid,
  insertiontime, eventtime, pipelinekey, externalsourceid
) VALUES(
  'e3', 'foo', 'bar', 'baz',
  'tile1', 1, 1.23, 2.34, 'place1',
  '2017-10-17', '2017-10-17', 'facebook', 'someone'
);

INSERT INTO eventplaces(
  eventid, conjunctiontopic1, conjunctiontopic2, conjunctiontopic3,
  tileid, tilez, centroidlat, centroidlon, placeid,
  insertiontime, eventtime, pipelinekey, externalsourceid
) VALUES(
  'e4', 'foo', 'bar', 'baz',
  'tile1', 1, 1.23, 2.34, 'place1',
  '2017-10-17', '2017-10-17', 'twitter', 'someone'
);

-- fetch twitter events, returns just e4
SELECT eventid, eventtime FROM eventsbypipeline WHERE pipelinekey='twitter';

-- fetch facebook events, returns e3, e1, e2
SELECT eventid, eventtime FROM eventsbypipeline WHERE pipelinekey='facebook';
```