Module 1: Lab #1
Team: 12
Professor: Yugyung Lee
Name: Sneha Mishra
Class ID: 11
Email: smccr@mail.umkc.edu
MyGitHub
Technical Partner:
Name: Aditya Soman
Class ID: 19
Email: aditya.soman@mail.umkc.edu
GitHub
YouTube Link explaining the Lab work can be found here
The report for the Lab work is here
- Hadoop MapReduce Algorithm
- Use Case Based NoSQL Comparison










Consider one of the use cases from the link below:
https://umkc.box.com/s/q64fvjm6yd454w5v3ky0he4854g6m1fq
These use cases were discussed in Lecture 1: Cassandra.
- Consider one of the use cases and use a simple dataset. Describe the use case based on your assumptions, and report the dataset, its fields, their data types, etc.
- Use HBase to implement a solution for the use case. Report at least 3 queries with their input and output. Each query's relevance to solving the use case is important.
- Use Cassandra to implement a solution for the use case. Report at least 3 queries with their input and output. Each query's relevance to solving the use case is important.
- Compare Cassandra and HBase for your use case. Present a table comparing the implementation of your use case in both NoSQL systems.
When you log in to Facebook and click Messages => "See all messages", you will see a search box that lets you search your inbox.
The basic idea is that you use the user id as the partition key, and then all the information you need for an inbox search will be clustered as rows in that partition. You can then set up multiple tables like this with different types of data clustered in the partition to support different types of searches. Since Cassandra can access a partition in essentially constant time even with millions of users, the system can scale and remain fast as you add nodes and users.
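The partition layout described above can be sketched in plain Python. This is only an in-memory illustration, not Cassandra code: the class `InboxIndex`, its method names, and the choice of `(term, message_id)` as the clustering key are assumptions made for the example, and the sample messages are made up.

```python
from collections import defaultdict


class InboxIndex:
    """Toy stand-in for one Cassandra table.

    Partition key: user id. Clustering key (assumed): (term, message id).
    """

    def __init__(self):
        # user_id -> {(term, message_id): message text}
        self._partitions = defaultdict(dict)

    def index_message(self, user_id, message_id, text):
        """Store one row per distinct term of the message in the user's partition."""
        partition = self._partitions[user_id]
        for term in set(text.lower().split()):
            partition[(term, message_id)] = text

    def search(self, user_id, term):
        """Return ids of the user's messages containing the term.

        Only the partition of this user is touched. A real Cassandra table
        keeps clustering rows sorted, so this would be a range read; the
        sketch simply scans the partition.
        """
        partition = self._partitions.get(user_id, {})
        term = term.lower()
        return sorted(mid for (t, mid) in partition if t == term)


index = InboxIndex()
index.index_message("user1", 1, "Lunch on Friday")
index.index_message("user1", 2, "Friday report")
index.index_message("user2", 3, "Friday party")
print(index.search("user1", "friday"))  # only user1's partition is read
```

A second table for a different kind of search would reuse the same partition key and change only the clustering key, for example the id of the other participant in the conversation.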
HBase implementation:





Query 1: PrefixFilter: This filter takes one argument, a row key prefix. It returns only the key-values in rows whose row key starts with the specified prefix.

Query 2: MultipleColumnPrefixFilter: This filter takes a list of column prefixes. It returns the key-values in columns whose qualifier starts with any of the specified prefixes. Each column prefix must be in the form of a qualifier.

Query 3: ColumnCountGetFilter: This filter takes one argument, a limit. It returns the first limit number of columns in the table.
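The behaviour of the three filters can be illustrated with a small plain-Python model. This is a sketch of the semantics only, not the HBase API: the table is an ordinary dict of row key to `{qualifier: value}`, the function names are hypothetical, and the sample rows are invented. Applying the column limit to a single row, in sorted qualifier order, is also an assumption of this sketch.

```python
def prefix_filter(table, row_prefix):
    """Keep only rows whose row key starts with row_prefix."""
    return {row: cols for row, cols in table.items()
            if row.startswith(row_prefix)}


def multiple_column_prefix_filter(table, prefixes):
    """Keep only columns whose qualifier starts with any of the prefixes."""
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    result = {}
    for row, cols in table.items():
        kept = {q: v for q, v in cols.items() if q.startswith(prefixes)}
        if kept:  # rows with no matching column are dropped
            result[row] = kept
    return result


def column_count_get_filter(row_cols, limit):
    """Keep the first `limit` columns of one row, in sorted qualifier order."""
    return dict(sorted(row_cols.items())[:limit])


# Hypothetical inbox table: row key = "<user id>_<message id>".
inbox = {
    "user1_msg1": {"body": "see you at noon", "sender": "user2", "subject": "lunch"},
    "user1_msg2": {"body": "draft attached", "sender": "user3", "subject": "report"},
    "user2_msg1": {"body": "hello", "sender": "user1", "subject": "hi"},
}

print(prefix_filter(inbox, "user1"))                         # all messages of user1
print(multiple_column_prefix_filter(inbox, ["sen", "sub"]))  # sender and subject only
print(column_count_get_filter(inbox["user1_msg1"], 2))       # first two columns
```

For the inbox use case, the row prefix selects one user's messages, the column prefixes select which fields to return, and the column limit caps the size of a single row read.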


Cassandra implementation:






Query 3: Interaction Search: returns all conversations that a particular user ID has had with all other users.

Comparison of Cassandra and HBase based on the selected use case:
HBase:
- Has a simpler consistency model than Cassandra.
- Very good scalability and performance for Facebook's data patterns.
- Most feature-rich for Facebook's requirements: auto load balancing and failover, compression support, multiple shards per server, etc.
- HDFS, the filesystem used by HBase, supports replication, end-to-end checksums, and automatic rebalancing.
- Facebook's operational teams have a lot of experience with HDFS, because Facebook is a big user of Hadoop and Hadoop uses HDFS as its distributed file system.

Cassandra:
- Wide-column store based on ideas from BigTable and DynamoDB.
- SQL-like DML and DDL statements (CQL).
- APIs and other access methods: proprietary protocol, Thrift.
- No server-side scripts.
- User concepts: access rights for users can be defined per object.
- https://stackoverflow.com/questions/28130774/how-did-facebook-use-cassandra-for-inbox-search-if-caasandra-has-no-search-capa
- https://www.quora.com/What-is-inbox-search-on-Facebook
- http://highscalability.com/blog/2010/11/16/facebooks-new-real-time-messaging-system-hbase-to-store-135.html
- http://horicky.blogspot.com/2010/10/bigtable-model-with-cassandra-and-hbase.html
- https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/
- https://drill.apache.org/docs/querying-hbase/
- https://acadgild.com/blog/different-types-of-filters-in-hbase-shell