# Challenges with RDBMS and Solutions

In this session we look at some of the challenges of relational database management systems (RDBMS) and how data warehouses and NoSQL databases address them.

* Challenges with RDBMS
* Solutions
* NoSQL Databases
* MongoDB

### Challenges with RDBMS
* When we perform an insert, update or delete, the SQL engine checks constraints, data types, lengths and precision.
* The overhead of maintaining transactions (for example undo and redo in Oracle) hurts performance for heavyweight batch processing.
* For an INSERT statement, the engine has to read the schema and validate each record against it, which becomes a problem with large volumes of data.
* So applying rules and enforcing transactions are the major bottlenecks.
* Another challenge: modern databases run on multiple servers, so every node must see a consistent view of the data.
* Joins and aggregations become very slow, and maintaining secondary indexes becomes too expensive.
* For example, suppose you build an e-commerce platform and want to measure the performance of each department, the performance of the platform within a geographic region, and so on. To build these reports we need to process huge amounts of data from multiple databases or perform expensive joins within each database. This consumes a lot of CPU, memory and network resources and makes applications run slowly.
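
The per-row checks mentioned at the top of this list can be observed with any relational engine. Below is a minimal sketch using Python's built-in `sqlite3` module; the `orders` table and its columns are assumptions made up for this illustration, and SQLite stands in for a full RDBMS such as Oracle.

```python
import sqlite3


def try_insert(conn, row):
    """Insert one row and report whether the engine accepted it."""
    try:
        conn.execute(
            "INSERT INTO orders (order_id, order_status, order_total) "
            "VALUES (?, ?, ?)",
            row,
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        # The engine rejected the row during its constraint checks.
        return f"rejected: {exc}"


conn = sqlite3.connect(":memory:")
# Hypothetical table: every insert is validated against these rules.
conn.execute(
    """
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        order_status TEXT NOT NULL CHECK (length(order_status) <= 20),
        order_total  REAL NOT NULL CHECK (order_total >= 0)
    )
    """
)

results = [
    try_insert(conn, (1, "COMPLETE", 199.99)),  # valid row
    try_insert(conn, (1, "PENDING", 50.0)),     # duplicate primary key
    try_insert(conn, (2, None, 10.0)),          # NOT NULL violation
    try_insert(conn, (3, "CLOSED", -5.0)),      # CHECK violation
]
for result in results:
    print(result)
```

Only the first row is stored. The engine does this validation work for every record, which is the cost the points above refer to when data volumes are large.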

### Solutions
* To understand business trends you need reports, and generating reports from an operational database is resource intensive.
* The solution is data warehousing. Data warehouses are central repositories of integrated data from various sources, used to get business insights.
* Examples of data warehousing technologies: Teradata, Hadoop etc.
* Process of getting data into a data warehouse:
    * Identify all sources from which data needs to be fetched.
    * Model the data using dimensional modeling.
    * Develop and schedule ETL (Extract, Transform, Load) jobs.
* Data is fetched from the traditional source systems into the data warehouse, so the warehouse can store historical data and we need not run expensive queries on the source systems.
* This process of getting data into the data warehouse is called ETL.
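
As a rough illustration of the three ETL steps, here is a minimal sketch in Python. Two in-memory SQLite databases stand in for the source system and the warehouse; the table names, columns and sample rows are all assumptions made up for this example, and real ETL jobs would normally be built and scheduled with dedicated tools.

```python
import sqlite3


def extract(source):
    """Extract: read raw rows from the transactional source."""
    return source.execute(
        "SELECT order_id, order_date, order_status, order_total FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: keep completed orders and trim the timestamp to a date."""
    return [
        (order_id, order_date[:10], order_total)
        for order_id, order_date, status, order_total in rows
        if status == "COMPLETE"
    ]


def load(warehouse, rows):
    """Load: append the cleaned rows to the warehouse table."""
    warehouse.executemany(
        "INSERT INTO completed_orders (order_id, order_date, order_total) "
        "VALUES (?, ?, ?)",
        rows,
    )
    warehouse.commit()


# Hypothetical source system with a few sample orders.
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE orders (order_id INTEGER, order_date TEXT, "
    "order_status TEXT, order_total REAL)"
)
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "2014-01-01 00:00:00", "COMPLETE", 100.0),
        (2, "2014-01-01 00:00:00", "CANCELED", 40.0),
        (3, "2014-01-02 00:00:00", "COMPLETE", 60.0),
    ],
)

# Hypothetical warehouse table that receives the cleaned data.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE completed_orders (order_id INTEGER, order_date TEXT, "
    "order_total REAL)"
)

load(warehouse, transform(extract(source)))
loaded = warehouse.execute(
    "SELECT order_id, order_date, order_total FROM completed_orders "
    "ORDER BY order_id"
).fetchall()
print(loaded)
```

After the load, reports can query `completed_orders` in the warehouse without touching the source database again.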

### Data Modeling in Data Warehouses
* In a traditional database, we model data using normalization; the application loads data into the tables and business happens as usual.
* In a data warehouse, reports still have to meet an SLA (daily, monthly or yearly), and executive management wants to see them as soon as possible. Reports built on normalized tables can be slow because of expensive joins.
* So we use dimensional modeling to create tables made up of facts, dimensions and measures.
* Denormalized data models: star schema and snowflake schema.
* For example, a daily report runs faster if the data is already precomputed for each day.
* Data warehouse tools such as Redshift, Vertica and Hive generally do not enforce constraints such as primary and foreign keys by default.
* **Example of creating a fact table for daily revenue for each product**: we have two tables, orders and order_items. From them we create a denormalized data model that includes only the fields required for generating the report.
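
One way such a fact table could be built is sketched below with Python's `sqlite3` module. The column layouts are simplified assumptions: in particular, `product_name` is placed directly on `order_items` to keep the example short, and the table name `order_item_facts` and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical, simplified source tables.
    CREATE TABLE orders (
        order_id     INTEGER PRIMARY KEY,
        order_date   TEXT,
        order_status TEXT
    );
    CREATE TABLE order_items (
        order_item_id       INTEGER PRIMARY KEY,
        order_item_order_id INTEGER,
        product_name        TEXT,  -- assumed column, see the note above
        order_item_subtotal REAL
    );
    INSERT INTO orders VALUES
        (1, '2014-01-01', 'COMPLETE'),
        (2, '2014-01-01', 'COMPLETE'),
        (3, '2014-01-02', 'COMPLETE');
    INSERT INTO order_items VALUES
        (1, 1, 'Tent', 200.0),
        (2, 1, 'Boots', 80.0),
        (3, 2, 'Tent', 200.0),
        (4, 3, 'Boots', 80.0);

    -- Denormalized fact table at the lowest granularity: one row per
    -- order item, carrying only the columns the report needs.
    CREATE TABLE order_item_facts AS
    SELECT o.order_date, oi.product_name, oi.order_item_subtotal
    FROM orders o
    JOIN order_items oi ON o.order_id = oi.order_item_order_id;
    """
)

fact_rows = conn.execute(
    "SELECT order_date, product_name, order_item_subtotal "
    "FROM order_item_facts ORDER BY order_date, product_name"
).fetchall()
for row in fact_rows:
    print(row)
```

The join is paid once when the fact table is built, so report queries no longer need it.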

A fact table at this lowest level of granularity grows large over time. We can keep the raw data in it and build another table in which we pre-aggregate on a daily basis for each product. **order_item_subtotal** can be used as a **measure** to evaluate daily revenue or daily revenue per product, with order_date and product_name as the keys. The pre-aggregated table has one row per product sold on each day.
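
The pre-aggregation described above might look like the following sketch, again using `sqlite3`. The table names `order_item_facts` and `daily_product_revenue` and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical lowest-granularity fact table with sample data.
    CREATE TABLE order_item_facts (
        order_date          TEXT,
        product_name        TEXT,
        order_item_subtotal REAL
    );
    INSERT INTO order_item_facts VALUES
        ('2014-01-01', 'Tent', 200.0),
        ('2014-01-01', 'Boots', 80.0),
        ('2014-01-01', 'Tent', 200.0),
        ('2014-01-02', 'Boots', 80.0);

    -- Pre-aggregated table: order_date and product_name are the keys,
    -- the summed order_item_subtotal is the measure.
    CREATE TABLE daily_product_revenue AS
    SELECT order_date,
           product_name,
           SUM(order_item_subtotal) AS revenue
    FROM order_item_facts
    GROUP BY order_date, product_name;
    """
)

daily = conn.execute(
    "SELECT order_date, product_name, revenue "
    "FROM daily_product_revenue ORDER BY order_date, product_name"
).fetchall()
for row in daily:
    print(row)
```

Four raw rows collapse into three pre-aggregated rows here; on real data the reduction is usually much larger, which is why daily reports on the second table run faster.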

* Dimension: a dimension is metadata that drives your report requirements. With a date dimension we can find out how the business is performing on a particular day, and for a monthly report we extract the month from order_date. Similarly, a product dimension has the category and the product name, and using the category you can get the product names. Without dimensions, we cannot measure the facts.
* Fact table: the table that contains the keys and measures, depending on the report requirements.
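
To make the role of a date dimension concrete, here is a small Python sketch that rolls daily revenue up to monthly revenue through a date dimension. The attribute names (`year`, `month`, `weekday`) and the revenue figures are assumptions chosen for this example.

```python
from collections import defaultdict
from datetime import date


def date_dimension_row(day):
    """Describe one calendar day with the attributes reports group by."""
    return {
        "date": day.isoformat(),
        "year": day.year,
        "month": day.month,
        "weekday": day.isoweekday(),  # 1 = Monday ... 7 = Sunday
    }


# Hypothetical daily revenue facts keyed by order date.
daily_revenue = {
    date(2014, 1, 1): 480.0,
    date(2014, 1, 2): 80.0,
    date(2014, 2, 1): 120.0,
}

# Date dimension: one descriptive row per day that appears in the facts.
date_dim = {day: date_dimension_row(day) for day in daily_revenue}

# Monthly report: group the measure by attributes taken from the dimension.
monthly_totals = defaultdict(float)
for day, revenue in daily_revenue.items():
    dim = date_dim[day]
    monthly_totals[(dim["year"], dim["month"])] += revenue
monthly_revenue = dict(monthly_totals)
print(monthly_revenue)
```

The same facts could be grouped by `weekday` or `year` without changing the fact data, which is the sense in which the dimension drives the report.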

### NoSQL Databases
* Examples: Cassandra, HBase and MongoDB
* Tables are generally indexed and partitioned.
* NoSQL databases are generally not the first choice for mission-critical transactional workloads, such as order processing in e-commerce platforms, because many of them relax transactional guarantees.
* They scale well for simple tables.
* They have a flexible schema.
* Let us understand NoSQL databases with the example of LinkedIn skill endorsements. If we define the data model using an RDBMS, we need three tables: skills (id, name), endorsement (id, who, whom) and person (id, name, company name, photo). Every query has to join these three tables, which is very expensive and does not scale to millions of records.
* In such cases it is preferable to use NoSQL databases. We can create two tables: one with the raw data (each endorsement) and another with data pre-processed for our reporting requirements (pre-aggregated endorsements). Pre-aggregation can be done using Kafka and streaming technologies such as Spark Streaming.
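
The two-table idea can be sketched in plain Python, with dictionaries standing in for documents. The field names (`who`, `whom`, `skill`) and the sample data are assumptions for this illustration, and a simple batch loop replaces what a streaming job would compute incrementally.

```python
from collections import defaultdict

# Raw "table": one document per endorsement event (hypothetical data).
raw_endorsements = [
    {"who": "alice", "whom": "bob", "skill": "SQL"},
    {"who": "carol", "whom": "bob", "skill": "SQL"},
    {"who": "alice", "whom": "bob", "skill": "MongoDB"},
    {"who": "bob", "whom": "alice", "skill": "Spark"},
]


def pre_aggregate(events):
    """Build one document per person with endorsement counts per skill."""
    counts = defaultdict(lambda: defaultdict(int))
    for event in events:
        counts[event["whom"]][event["skill"]] += 1
    # Convert nested defaultdicts to plain dicts, ready to store as documents.
    return {person: dict(skills) for person, skills in counts.items()}


skill_counts = pre_aggregate(raw_endorsements)
print(skill_counts["bob"])
```

Showing a profile then needs a single lookup by person in the pre-aggregated data instead of a three-way join.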

### Working with MongoDB
* Log in to the mongodb shell: `mongo --host gw01.itversity.com`
* You can have multiple databases in one MongoDB installation.
* List databases: `show dbs`
* The term collection is used in place of table, and document in place of row.
* A collection is a group of documents.
* List collections: `show collections`
* CRUD operations
    * Inserting records: `db.demo.insert({"emp_id": 1, "emp_name": "scott tiger"})`
    * After the insert, `show collections` also lists demo.
    * `db.demo.find({})` is equivalent to selecting all records from demo.
    * Use `db.demo.find({}).pretty()` to get the data in a formatted manner.
    * `db.demo.findOne({})` fetches a single record, the first one found.
    * `db.demo.find({"emp_id": 2})` fetches the records with emp_id 2.
    * Switch to a database using `use retail_db_demo`; the database is created when data is first stored in it.
    * Using mongoimport
        * `mongoimport --db retail_db --collection departments --type json --file /example/file.csv`
        * `--columnsHaveTypes` applies only to `--type csv` or `--type tsv` input, so it is omitted here. The file passed to `--file` must match the declared `--type`.
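
With `--type json`, mongoimport by default expects one JSON document per line. The sketch below shows one way to produce such a file from Python; the department documents and the file name `departments.json` are made-up sample data, and the file is written to a temporary directory only so that the example is self-contained.

```python
import json
import os
import tempfile

# Hypothetical documents for the departments collection.
departments = [
    {"department_id": 2, "department_name": "Fitness"},
    {"department_id": 3, "department_name": "Footwear"},
]


def write_json_lines(docs, path):
    """Write one JSON document per line."""
    with open(path, "w", encoding="utf-8") as handle:
        for doc in docs:
            handle.write(json.dumps(doc) + "\n")


path = os.path.join(tempfile.mkdtemp(), "departments.json")
write_json_lines(departments, path)

# Read the file back to confirm the layout.
with open(path, encoding="utf-8") as handle:
    lines = handle.read().splitlines()
print(lines)
```

A file like this could then be passed to mongoimport through `--file` together with `--type json`.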