# OLAP vs. OLTP

You should now be familiar with the differences between OLTP and OLAP. In this exercise, you are given a list of cards, each describing a specific approach, which you will categorize as either OLAP or OLTP.

![OLAP_OLTP](/home/nero/Documents/Estudos/DataCamp/SQL/courses/database-design/oltp_olap.png)

**Great work, you've got the differences between the two down. In the rest of the course, we'll be referring to these two terms while delving deeper into their technical differences - so don't forget them!**

In [None]:
# exercise 01

"""
Which is better?

The city of Chicago receives many 311 service requests throughout the day. 311 service requests are non-urgent community requests, ranging from graffiti removal to street light outages. Chicago maintains a data repository of all these services, organized by type of request. In this exercise, Potholes has been loaded as an example of a table in this repository. It contains pothole reports made by Chicago residents over the past week.

Explore the dataset. What data processing approach is this larger repository most likely using?
"""

# Instructions

"""
Possible answers:
    
    OLTP because this table could not be used for any analysis.
    
    OLAP because each record has a unique service request number.
    
    OLTP because this table's structure appears to require frequent updates. {Answer}
    
    OLAP because this table focuses on pothole requests only.
"""

# solution



#----------------------------------#

# Conclusion

"""
That's right! This table probably uses an OLTP approach because it is updated frequently and holds only recent data (the past week).
"""

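To make the contrast concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `potholes` columns and the sample rows are assumptions made up for illustration, not the real Chicago schema. The first part shows OLTP-style work (small writes that each touch one row), the second an OLAP-style aggregate that reads many rows at once.

```python
import sqlite3

# Hypothetical, simplified potholes table; column names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE potholes (
        service_request_number TEXT PRIMARY KEY,
        status TEXT NOT NULL,
        zip_code TEXT NOT NULL
    )
    """
)

# OLTP-style work: many small writes, each touching a single row.
conn.executemany(
    "INSERT INTO potholes VALUES (?, ?, ?)",
    [
        ("SR-001", "Open", "60601"),
        ("SR-002", "Open", "60601"),
        ("SR-003", "Open", "60622"),
    ],
)
conn.execute(
    "UPDATE potholes SET status = 'Completed' WHERE service_request_number = ?",
    ("SR-002",),
)
conn.commit()

# OLAP-style work: a read-only aggregate that scans many rows at once.
open_by_zip = conn.execute(
    """
    SELECT zip_code, COUNT(*) AS open_requests
    FROM potholes
    WHERE status = 'Open'
    GROUP BY zip_code
    ORDER BY zip_code
    """
).fetchall()
print(open_by_zip)
```

In a real system the aggregate would more likely run against a separate warehouse copy of the data, so that analysis does not compete with the frequent writes.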

# Name that data type!

In the previous video, you learned about structured, semi-structured, and unstructured data. Structured data is the easiest to analyze because it is organized and cleaned. On the other hand, unstructured data is schemaless, but scales well. In the middle we have semi-structured data for everything in between.

![DATA_TYPE](/home/nero/Documents/Estudos/DataCamp/SQL/courses/database-design/data_type.png)

**Nice classifying! From these real-life examples, can you see why unstructured data is easier to scale than structured data?**
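
As a rough illustration of the three categories, the sketch below builds one tiny example of each in Python. All records are invented. The final two checks only confirm that the structured rows share one set of fields while the semi-structured documents do not.

```python
import json

# Structured: every record follows the same fixed schema (same fields, same types).
structured_rows = [
    {"order_id": 1, "item": "jeans", "price": 49.5},
    {"order_id": 2, "item": "socks", "price": 4.0},
]

# Semi-structured: self-describing records (JSON here) whose fields can vary.
semi_structured = [
    json.loads('{"user": "ana", "tags": ["sale"]}'),
    json.loads('{"user": "li", "address": {"city": "Chicago"}}'),
]

# Unstructured: no schema at all, just raw content such as free text or image bytes.
unstructured = "Loved the jacket, but shipping took two weeks."

# Compare the sets of field names across records in each collection.
same_schema = len({tuple(sorted(row)) for row in structured_rows}) == 1
varying_schema = len({tuple(sorted(doc)) for doc in semi_structured}) > 1
print(same_schema, varying_schema)
```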

# Ordering ETL Tasks

You have been hired to manage data at a small online clothing store. Their system is quite outdated because their only data repository is a traditional database that records transactions.

You decide to upgrade their system to a data warehouse after hearing that different departments would like to run their own business analytics. You reason that an ELT approach is unnecessary because there is relatively little data (< 50 GB).

![ORDER](/home/nero/Documents/Estudos/DataCamp/SQL/courses/database-design/order.png)

**Nice! In ETL, raw data is cleaned before being stored. This makes it accessible and ready to use.**
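
The extract, transform, load order can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the raw records, the `sales` table, and the helper names `transform` and `load` are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Extract: raw transaction records as they might come out of the store's database.
# These rows are invented for illustration.
raw_transactions = [
    {"id": "1", "item": " T-Shirt ", "price": "19.99"},
    {"id": "2", "item": "jeans", "price": "49.50"},
    {"id": "3", "item": "Socks", "price": ""},  # missing price
]


def transform(rows):
    """Clean raw rows: cast types, normalize item names, drop rows without a price."""
    cleaned = []
    for row in rows:
        if not row["price"]:
            continue  # skip incomplete records before they reach the warehouse
        cleaned.append((int(row["id"]), row["item"].strip().lower(), float(row["price"])))
    return cleaned


def load(conn, rows):
    """Load already-cleaned rows into a hypothetical warehouse table."""
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT NOT NULL, price REAL NOT NULL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(raw_transactions))  # transform happens before load
loaded = warehouse.execute("SELECT id, item, price FROM sales ORDER BY id").fetchall()
print(loaded)
```

In an ELT pipeline the two calls would be swapped in spirit: the raw rows would be stored first and cleaned later, inside the destination system.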

# Recommend a storage solution

When should you choose a data warehouse over a data lake?

### Possible Answers


    To train a machine learning model with 150 GB of raw image data.
    
    
    To store real-time social media posts that may be used for future analysis
    
    
    To store customer data that needs to be updated regularly
    
    
    To create accessible and isolated data repositories for other analysts {Answer}

**That's right! Analysts will appreciate working in a data warehouse more because its organized, structured data makes analysis easier.**

# Classifying data models

In the previous video, we learned about three different levels of data models: conceptual, logical, and physical.

![data_model](/home/nero/Documents/Estudos/DataCamp/SQL/courses/database-design/data_model.png)

In [1]:
# exercise 02

"""
Deciding fact and dimension tables

Imagine that you love running and data. It's only natural that you begin collecting data on your weekly running routine. You're most concerned with tracking how long you are running each week. You also record the route and the distances of your runs. You gather this data and put it into one table called Runs with the following schema:
runs
duration_mins - float
week - int
month - varchar(160)
year - int
park_name - varchar(160)
city_name - varchar(160)
distance_km - float
route_name - varchar(160)

After learning about dimensional modeling, you decide to restructure the schema for the database. Runs has been pre-loaded for you.
"""

# Instructions

"""
Question

Out of these possible answers, what would be the best way to organize the fact table and dimensional tables?

Possible answers:
    
    A fact table holding duration_mins and foreign keys to dimension tables holding route details and week details, respectively. {Answer}
    
    A fact table holding week, month, year and foreign keys to dimension tables holding route details and duration details, respectively.
    
    A fact table holding route_name, park_name, distance_km, city_name, and foreign keys to dimension tables holding week details and duration details, respectively.
---

    Create a dimension table called route that will hold the route information.
    Create a dimension table called week that will hold the week information.

"""

# solution

-- Create a route dimension table
CREATE TABLE route(
    route_id INTEGER PRIMARY KEY,
    route_name VARCHAR(160) NOT NULL,
    park_name VARCHAR(160) NOT NULL,
    distance_km FLOAT NOT NULL,
    city_name VARCHAR(160) NOT NULL
);
-- Create a week dimension table
CREATE TABLE week(
    week_id INTEGER PRIMARY KEY,
    week INTEGER NOT NULL,
    month VARCHAR(160) NOT NULL,
    year INTEGER NOT NULL
);

#----------------------------------#

# Conclusion

"""
Terrific tables! The primary keys route_id and week_id you created will be foreign keys in the fact table.
"""

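To see how those primary keys could become foreign keys, here is a hedged sketch that runs the two dimension tables in SQLite via Python's `sqlite3` module and adds a fact table. The `runs_fact` definition, the `run_id` column, and the sample rows are assumptions for illustration; the course's own fact table may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE route (
        route_id INTEGER PRIMARY KEY,
        route_name VARCHAR(160) NOT NULL,
        park_name VARCHAR(160) NOT NULL,
        distance_km FLOAT NOT NULL,
        city_name VARCHAR(160) NOT NULL
    );
    CREATE TABLE week (
        week_id INTEGER PRIMARY KEY,
        week INTEGER NOT NULL,
        month VARCHAR(160) NOT NULL,
        year INTEGER NOT NULL
    );
    -- Hypothetical fact table: one measure plus a foreign key per dimension.
    CREATE TABLE runs_fact (
        run_id INTEGER PRIMARY KEY,
        duration_mins FLOAT NOT NULL,
        route_id INTEGER NOT NULL REFERENCES route (route_id),
        week_id INTEGER NOT NULL REFERENCES week (week_id)
    );
    """
)

# Invented sample rows: one route, one week, one run that references both.
conn.execute("INSERT INTO route VALUES (1, 'Lakefront loop', 'Lincoln Park', 5.0, 'Chicago')")
conn.execute("INSERT INTO week VALUES (1, 27, 'July', 2019)")
conn.execute("INSERT INTO runs_fact VALUES (1, 31.5, 1, 1)")

# A run that points at a week_id missing from the week table should be rejected.
try:
    conn.execute("INSERT INTO runs_fact VALUES (2, 28.0, 1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

fact_rows = conn.execute("SELECT COUNT(*) FROM runs_fact").fetchone()[0]
print(rejected, fact_rows)
```

Keeping only `duration_mins` and the two keys in the fact table means route and week details are stored once, in their dimension tables, rather than repeated on every run.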

# Dimensional Model

<div style="background-color:darkgray; padding:2px">

![MODEL](/home/nero/Documents/Estudos/DataCamp/SQL/courses/database-design/dim_model.png)
</div>

In [2]:
# exercise 03

"""
Querying the dimensional model

Here it is! The schema reorganized using the dimensional model:

Let's try to run a query based on this schema. How about we try to find the number of minutes we ran in July, 2019? We'll break this up in two steps. First, we'll get the total number of minutes recorded in the database. Second, we'll narrow down that query to week_id's from July, 2019.
"""

# Instructions

"""

    Calculate the sum of the duration_mins column.
---

    Join week_dim and runs_fact.
    Get all the week_id's from July, 2019.

"""

# solution

SELECT 
	-- Select the sum of the duration of all runs
	SUM(duration_mins)
FROM 
	runs_fact;

#----------------------------------#

SELECT 
	-- Get the total duration of all runs
	SUM(duration_mins)
FROM 
	runs_fact
-- Get all the week_id's that are from July, 2019
INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
WHERE month = 'July' AND year = 2019;

#----------------------------------#

# Conclusion

"""
Nice! It looks like you ran 381.46 minutes in July 2019. Because of its structure, the dimensional model usually requires queries involving more than one table.
"""

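If you want to try the two-step query outside the course environment, the sketch below reproduces it against an in-memory SQLite database. The table definitions are simplified assumptions and the rows are made up, so the totals here will not match the course result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Simplified, assumed versions of the dimension and fact tables.
    CREATE TABLE week_dim (
        week_id INTEGER PRIMARY KEY,
        week INTEGER NOT NULL,
        month VARCHAR(160) NOT NULL,
        year INTEGER NOT NULL
    );
    CREATE TABLE runs_fact (
        duration_mins FLOAT NOT NULL,
        week_id INTEGER NOT NULL REFERENCES week_dim (week_id)
    );
    -- Made-up sample rows for illustration only.
    INSERT INTO week_dim VALUES (1, 26, 'June', 2019), (2, 27, 'July', 2019), (3, 28, 'July', 2019);
    INSERT INTO runs_fact VALUES (30.0, 1), (25.5, 2), (40.0, 3), (20.0, 3);
    """
)

# Step 1: total minutes across every recorded run.
total_all = conn.execute("SELECT SUM(duration_mins) FROM runs_fact").fetchone()[0]

# Step 2: join to the week dimension and keep only weeks from July 2019.
total_july = conn.execute(
    """
    SELECT SUM(duration_mins)
    FROM runs_fact
    INNER JOIN week_dim ON runs_fact.week_id = week_dim.week_id
    WHERE month = 'July' AND year = 2019
    """
).fetchone()[0]
print(total_all, total_july)
```

The filter columns live only in `week_dim`, which is why the second query cannot be answered from `runs_fact` alone.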