# Materialized views in Cassandra

https://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views

As an example of how materialized views can be used, suppose we want to track the high scores for players of several games. We have a number of queries that we would like to be able to answer:

- Given a game, who has the highest score, and what is it?
- Given a game and a day, who had the highest score, and what was it?
- Given a game and a month, who had the highest score, and what was it?

In [None]:
%load_ext cql

In [None]:
%%cql
DROP KEYSPACE IF EXISTS demo;

In [None]:
%%cql
CREATE KEYSPACE demo 
WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 1};

In [None]:
%keyspace demo

Materialized views maintain a one-to-one correspondence between CQL rows in the base table and CQL rows in the view. Every row the views need must therefore correspond to a distinct row in the base table, which means the base table’s primary key must include every column that distinguishes those rows. For the first query, we need the game, the player, and their highest score. For the second, we need the game, the player, their high score, and the day, month, and year of that high score. For the final query, we need everything from the second except the day. The second query is the most restrictive, so it determines the primary key we use. A user can improve their high score over the course of a day, but we only need to keep the highest score for each day: writing a new score for the same user, game, and date replaces the previous row.
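The key choice above can be illustrated in Python, the notebook’s kernel language. This is a minimal in-memory sketch, not Cassandra’s storage engine. A dict keyed by the base table’s primary key keeps exactly one score per user, game, and day, and a second write with the same key replaces the first, as a plain CQL INSERT does. The helper `record_score` is hypothetical. It assumes the application only writes a score when it beats the stored one, because a plain INSERT overwrites regardless of value.

```python
# In-memory model of the `scores` base table (an illustration only, not how
# Cassandra stores data). Rows are keyed by the full primary key
# (user, game, year, month, day), so a second write with the same key replaces
# the first, mirroring the upsert behaviour of a plain CQL INSERT.
from typing import Dict, Tuple

Key = Tuple[str, str, int, int, int]  # (user, game, year, month, day)

scores: Dict[Key, int] = {}


def insert(user: str, game: str, year: int, month: int, day: int, score: int) -> None:
    """Upsert: the last write for a given primary key wins."""
    scores[(user, game, year, month, day)] = score


def record_score(user: str, game: str, year: int, month: int, day: int, score: int) -> None:
    """Hypothetical application-side helper that keeps only the day's best score.

    The table does not compare values, so the application itself must avoid
    overwriting a higher score with a lower one.
    """
    current = scores.get((user, game, year, month, day))
    if current is None or score > current:
        insert(user, game, year, month, day, score)


record_score("pcmanus", "Coup", 2015, 5, 1, 3000)
record_score("pcmanus", "Coup", 2015, 5, 1, 4000)  # better score: replaces 3000
record_score("pcmanus", "Coup", 2015, 5, 1, 3500)  # worse score: ignored
print(scores)  # {('pcmanus', 'Coup', 2015, 5, 1): 4000}
```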

In [None]:
%%cql
CREATE TABLE scores
(
  user TEXT,
  game TEXT,
  year INT,
  month INT,
  day INT,
  score INT,
  PRIMARY KEY (user, game, year, month, day)
)

Next, we’ll create the view that presents the all-time high scores. To create a materialized view, we provide a simple SELECT statement and the primary key to use for the view. The WHERE clause must restrict every column of the view’s primary key with IS NOT NULL. Specifying CLUSTERING ORDER BY (score DESC) sorts each partition by descending score, so we can get the highest score by simply selecting the first row in the partition.
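The following minimal in-memory sketch shows how the alltimehigh view orders a partition. It uses the sample rows inserted later in this notebook. Rows are grouped by the view’s partition key (game) and sorted by its clustering columns. The first of these is score, descending. The rest are user, year, month, and day, ascending, which is the default for clustering columns not named in CLUSTERING ORDER BY. Taking the first row plays the role of LIMIT 1. The function `build_alltimehigh` is an illustrative name, not part of any Cassandra API.

```python
# Sketch (in-memory, not Cassandra): how the `alltimehigh` view lays out rows.
# Partition key: game. Clustering: score DESC, then user, year, month, day ASC.
from collections import defaultdict
from typing import Dict, List, Tuple

BaseRow = Tuple[str, str, int, int, int, int]  # (user, game, year, month, day, score)
ViewRow = Tuple[int, str, int, int, int]       # (score, user, year, month, day)

# Sample rows taken from the INSERT statements used later in this notebook.
base_rows: List[BaseRow] = [
    ("pcmanus", "Coup", 2015, 5, 1, 4000),
    ("jbellis", "Coup", 2015, 5, 3, 1750),
    ("yukim", "Coup", 2015, 5, 3, 2250),
    ("tjake", "Coup", 2015, 5, 3, 500),
    ("jmckenzie", "Coup", 2015, 6, 1, 2000),
    ("iamaleksey", "Coup", 2015, 6, 1, 2500),
    ("tjake", "Coup", 2015, 6, 2, 1000),
    ("pcmanus", "Coup", 2015, 6, 2, 2000),
]


def build_alltimehigh(rows: List[BaseRow]) -> Dict[str, List[ViewRow]]:
    """Group base rows by game and sort each partition in clustering order."""
    partitions: Dict[str, List[ViewRow]] = defaultdict(list)
    for user, game, year, month, day, score in rows:
        partitions[game].append((score, user, year, month, day))
    for partition in partitions.values():
        # CLUSTERING ORDER BY (score DESC); remaining clustering columns ascending
        partition.sort(key=lambda r: (-r[0], r[1], r[2], r[3], r[4]))
    return dict(partitions)


view = build_alltimehigh(base_rows)
# Rough equivalent of: SELECT user, score FROM alltimehigh WHERE game = 'Coup' LIMIT 1
score, user, *_ = view["Coup"][0]
print(user, score)  # pcmanus 4000
```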

In [None]:
%%cql
CREATE MATERIALIZED VIEW alltimehigh AS
   SELECT user FROM scores
   WHERE game IS NOT NULL AND score IS NOT NULL AND user IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND day IS NOT NULL
   PRIMARY KEY (game, score, user, year, month, day)
   WITH CLUSTERING ORDER BY (score DESC)
        

To query the daily high scores, we create a materialized view whose partition key combines the game and the date, (game, year, month, day), so a single partition contains all scores for that game on that date. We do the same for the monthly high scores, partitioning by (game, year, month).
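The effect of the composite partition keys can be sketched the same way, again as an in-memory illustration using the notebook’s sample rows. The hypothetical helper `top_per_partition` groups rows by a partition-key function and returns the first (score, user) pair of each partition under score DESC. With these rows, the top daily score for Coup on 2015-06-01 belongs to iamaleksey, which is what the dailyhigh query further down should return.

```python
# In-memory sketch (not Cassandra itself): dailyhigh and monthlyhigh use
# composite partition keys, so each (game, date) or (game, month) gets its own
# partition, ordered by score DESC.
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

BaseRow = Tuple[str, str, int, int, int, int]  # (user, game, year, month, day, score)

# Sample rows taken from the INSERT statements used later in this notebook.
base_rows: List[BaseRow] = [
    ("pcmanus", "Coup", 2015, 5, 1, 4000),
    ("jbellis", "Coup", 2015, 5, 3, 1750),
    ("yukim", "Coup", 2015, 5, 3, 2250),
    ("tjake", "Coup", 2015, 5, 3, 500),
    ("jmckenzie", "Coup", 2015, 6, 1, 2000),
    ("iamaleksey", "Coup", 2015, 6, 1, 2500),
    ("tjake", "Coup", 2015, 6, 2, 1000),
    ("pcmanus", "Coup", 2015, 6, 2, 2000),
]


def top_per_partition(
    rows: List[BaseRow], partition_key: Callable[[BaseRow], tuple]
) -> Dict[tuple, Tuple[int, str]]:
    """Return the first (score, user) of each partition under score DESC, user ASC."""
    partitions: Dict[tuple, List[Tuple[int, str]]] = defaultdict(list)
    for row in rows:
        user, _game, _year, _month, _day, score = row
        partitions[partition_key(row)].append((score, user))
    return {k: min(v, key=lambda e: (-e[0], e[1])) for k, v in partitions.items()}


daily = top_per_partition(base_rows, lambda r: (r[1], r[2], r[3], r[4]))  # (game, year, month, day)
monthly = top_per_partition(base_rows, lambda r: (r[1], r[2], r[3]))      # (game, year, month)

print(daily[("Coup", 2015, 6, 1)])  # (2500, 'iamaleksey')
print(monthly[("Coup", 2015, 5)])   # (4000, 'pcmanus')
```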

In [None]:
%%cql
CREATE MATERIALIZED VIEW dailyhigh AS
       SELECT user FROM scores
       WHERE game IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND day IS NOT NULL AND score IS NOT NULL AND user IS NOT NULL
       PRIMARY KEY ((game, year, month, day), score, user)
       WITH CLUSTERING ORDER BY (score DESC)

In [None]:
%%cql
CREATE MATERIALIZED VIEW monthlyhigh AS
       SELECT user FROM scores
       WHERE game IS NOT NULL AND year IS NOT NULL AND month IS NOT NULL AND score IS NOT NULL AND user IS NOT NULL AND day IS NOT NULL
       PRIMARY KEY ((game, year, month), score, user, day)
       WITH CLUSTERING ORDER BY (score DESC)

We populate the materialized views with some sample data. We only insert rows into the scores base table, and Cassandra updates the materialized views automatically.
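This sketch shows what populating the views means for a single write. It assumes a simplified in-memory model, and the names `base`, `views`, `view_keys`, and `insert` are hypothetical. Each base row yields exactly one row in every view, and every view key is the base primary key columns plus score, rearranged. Because score is part of the view keys, changing a score means the old view row has to be found and removed first. The sketch models this by reading the old base value before writing, which mirrors the read-before-write Cassandra performs for tables that have views.

```python
# Illustrative sketch, not Cassandra's write path: one base row maps to
# exactly one row in each view. View keys are (partition key, clustering key)
# tuples following the CREATE MATERIALIZED VIEW statements above.
from collections import Counter
from typing import Dict, Set, Tuple

BaseKey = Tuple[str, str, int, int, int]  # (user, game, year, month, day)
ViewKey = Tuple[tuple, tuple]             # (partition key, clustering key)

base: Dict[BaseKey, int] = {}             # base primary key -> score
views: Dict[str, Set[ViewKey]] = {"alltimehigh": set(), "dailyhigh": set(), "monthlyhigh": set()}


def view_keys(key: BaseKey, score: int) -> Dict[str, ViewKey]:
    """Primary keys of the three view rows derived from one base row."""
    user, game, year, month, day = key
    return {
        "alltimehigh": ((game,), (score, user, year, month, day)),
        "dailyhigh": ((game, year, month, day), (score, user)),
        "monthlyhigh": ((game, year, month), (score, user, day)),
    }


def insert(user: str, game: str, year: int, month: int, day: int, score: int) -> None:
    """Hypothetical base-table insert that keeps the three views in sync."""
    key = (user, game, year, month, day)
    old = base.get(key)
    if old is not None:
        # score is part of every view key, so the stale view rows must be
        # located (by reading the old base value) and removed first.
        for name, vkey in view_keys(key, old).items():
            views[name].discard(vkey)
    base[key] = score
    for name, vkey in view_keys(key, score).items():
        views[name].add(vkey)


insert("pcmanus", "Coup", 2015, 5, 1, 4000)
insert("jbellis", "Coup", 2015, 5, 3, 1750)
insert("jbellis", "Coup", 2015, 5, 3, 1800)  # same base row, new score

# Each view key contains exactly the base primary key columns plus score.
for name, (partition, clustering) in view_keys(("pcmanus", "Coup", 2015, 5, 1), 4000).items():
    assert Counter(partition + clustering) == Counter(("pcmanus", "Coup", 2015, 5, 1, 4000)), name

print({name: len(rows) for name, rows in views.items()})
# {'alltimehigh': 2, 'dailyhigh': 2, 'monthlyhigh': 2}
```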

In [None]:
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 05, 01, 4000)
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('jbellis', 'Coup', 2015, 05, 03, 1750)
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('yukim', 'Coup', 2015, 05, 03, 2250)
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 05, 03, 500)
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('jmckenzie', 'Coup', 2015, 06, 01, 2000)
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('iamaleksey', 'Coup', 2015, 06, 01, 2500)
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('tjake', 'Coup', 2015, 06, 02, 1000)
%cql INSERT INTO scores (user, game, year, month, day, score) VALUES ('pcmanus', 'Coup', 2015, 06, 02, 2000)

We can now find the user with the highest score ever recorded for a game:

In [None]:
%cql SELECT user, score FROM alltimehigh WHERE game = 'Coup' LIMIT 1

And the daily high score:

In [None]:
%cql SELECT user, score FROM dailyhigh WHERE game = 'Coup' AND year = 2015 AND month = 06 AND day = 01 LIMIT 1

All of the entries have been copied into the all-time high materialized view:

In [None]:
%cql SELECT user, score FROM alltimehigh WHERE game = 'Coup'

Because the view holds one CQL row for each CQL row in the base table, ‘pcmanus’ and ‘tjake’ each appear twice in the all-time high view, once for each date on which they have a score in the base table.
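This in-memory sketch (an illustration, not Cassandra) shows why the duplicates appear. It builds the Coup partition of alltimehigh from the eight sample rows, with one view row per base row, sorted by score DESC and then user ASC. It prints the rows in the order the SELECT above is expected to return them and reports the users that occur more than once.

```python
# In-memory sketch: one alltimehigh row per base row, so a user with scores on
# several dates appears several times in the same partition.
from collections import Counter

# Sample base rows (user, game, year, month, day, score) from the INSERTs above.
base_rows = [
    ("pcmanus", "Coup", 2015, 5, 1, 4000),
    ("jbellis", "Coup", 2015, 5, 3, 1750),
    ("yukim", "Coup", 2015, 5, 3, 2250),
    ("tjake", "Coup", 2015, 5, 3, 500),
    ("jmckenzie", "Coup", 2015, 6, 1, 2000),
    ("iamaleksey", "Coup", 2015, 6, 1, 2500),
    ("tjake", "Coup", 2015, 6, 2, 1000),
    ("pcmanus", "Coup", 2015, 6, 2, 2000),
]

# Clustering order: score DESC, then user, year, month, day ascending.
partition = sorted(
    ((score, user, year, month, day)
     for user, game, year, month, day, score in base_rows if game == "Coup"),
    key=lambda r: (-r[0], r[1], r[2], r[3], r[4]),
)
for score, user, *_ in partition:
    print(user, score)

appearances = Counter(user for _score, user, *_ in partition)
print({u: n for u, n in appearances.items() if n > 1})  # {'pcmanus': 2, 'tjake': 2}
```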

Deleting rows from the base table also deletes the corresponding rows from the materialized views. We’ll delete the tjake rows from the scores table:

In [None]:
%cql DELETE FROM scores WHERE user = 'tjake'

Looking at all of the top scores again, the tjake entries are gone:

In [None]:
%cql SELECT user, score FROM alltimehigh WHERE game = 'Coup'

When a deletion occurs, Cassandra reads the deleted rows from the base table and generates a tombstone for each corresponding materialized view row. This read is necessary because the base table’s tombstone does not contain the values needed to locate the view rows: here the DELETE removes the whole user = 'tjake' partition, while the view rows are keyed by game and score. For this single base tombstone, the alltimehigh view receives two tombstones: one for (tjake, 1000) and one for (tjake, 500).
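The deletion path can also be sketched in memory. This is a simplified model of the idea, not Cassandra’s implementation. It uses a subset of the sample rows and a hypothetical `delete_user` function. The base tombstone names only the partition (user = 'tjake'), so the rows being deleted are read first. Each one is turned into an alltimehigh tombstone keyed by (game, score, user, year, month, day).

```python
# Illustrative sketch only: why a partition delete on the base table needs a
# read before the corresponding view rows can be tombstoned.
from typing import List, Tuple

BaseRow = Tuple[str, str, int, int, int, int]  # (user, game, year, month, day, score)

# A subset of the sample rows inserted earlier in this notebook.
base_rows: List[BaseRow] = [
    ("pcmanus", "Coup", 2015, 5, 1, 4000),
    ("tjake", "Coup", 2015, 5, 3, 500),
    ("tjake", "Coup", 2015, 6, 2, 1000),
    ("pcmanus", "Coup", 2015, 6, 2, 2000),
]


def delete_user(rows: List[BaseRow], user: str) -> Tuple[List[BaseRow], List[tuple]]:
    """Simulate DELETE FROM scores WHERE user = ? as seen by the alltimehigh view.

    The base tombstone only names the partition (user), but alltimehigh rows are
    keyed by (game, score, user, year, month, day), so the existing base rows
    must be read to build one view tombstone per row.
    """
    doomed = [r for r in rows if r[0] == user]  # read the rows being deleted
    view_tombstones = [
        (game, score, u, year, month, day) for u, game, year, month, day, score in doomed
    ]
    remaining = [r for r in rows if r[0] != user]
    return remaining, view_tombstones


base_rows, tombstones = delete_user(base_rows, "tjake")
for t in tombstones:
    print(t)
# ('Coup', 500, 'tjake', 2015, 5, 3)
# ('Coup', 1000, 'tjake', 2015, 6, 2)
```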