Skip to content

davidf552/Module03_Homework

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 

Repository files navigation

Module03_Homework

Data Warehouse

Creating tables in Big Query

Parameters

  • project_id = dataex-486105
  • gcp_bucket = dex_bk
  • external table = yellow_taxi
  • normal and unpartitioned table = yellow_taxi_n
  • partitioned and clustered table = yellow_taxi_partitioned_clustered

External table creation

CREATE OR REPLACE EXTERNAL TABLE `dataex-486105.ny_taxi.yellow_taxi`
OPTIONS (
  format = 'PARQUET',
  uris = ['gs://dez_bk/*.parquet']
);

Normal table creation

CREATE OR REPLACE TABLE `dataex-486105.ny_taxi.yellow_taxi_n`
AS
SELECT * FROM dataex-486105.ny_taxi.yellow_taxi;

Question 1

SELECT COUNT(*) AS records
FROM dataex-486105.ny_taxi.yellow_taxi;

Question 2

External table

SELECT COUNT(DISTINCT PULocationID)
FROM `dataex-486105.ny_taxi.yellow_taxi`;

Normal table

SELECT COUNT(DISTINCT PULocationID)
FROM `dataex-486105.ny_taxi.yellow_taxi_n`;

Question 3

SELECT PULocationID, DOLocationID
FROM `dataex-486105.ny_taxi.yellow_taxi_n`;

Question 4

SELECT COUNT(*) AS fare_zero FROM `dataex-486105.ny_taxi.yellow_taxi`
WHERE fare_amount=0;

Question 5

CREATE OR REPLACE TABLE `dataex-486105.ny_taxi.yellow_taxi_partitioned_clustered`
PARTITION BY DATE(tpep_dropoff_datetime)
CLUSTER BY (VendorID) AS
SELECT * FROM `dataex-486105.ny_taxi.yellow_taxi`;

Partition by tpep_dropoff_datetime and Cluster on VendorID

Question 6

Not partitioned nor clustered table

SELECT DISTINCT(VendorID) 
FROM `dataex-486105.ny_taxi.yellow_taxi_n`
WHERE DATE(tpep_dropoff_datetime) BETWEEN '2024-03-01' AND '2024-03-15' ;

Query cost = 310.24 MB

Partitioned and clustered table

SELECT DISTINCT(VendorID) 
FROM `dataex-486105.ny_taxi.yellow_taxi_partitioned_clustered`
WHERE DATE(tpep_dropoff_datetime) BETWEEN '2024-03-01' AND '2024-03-15' ;

Query cost = 26.84 MB

Question 9

SELECT COUNT(*)
FROM `dataex-486105.ny_taxi.yellow_taxi_partitioned_clustered`;

This query costs 0 bytes because BigQuery doesn't need to scan the table to produce the number of rows of a table.

About

Data Warehouse

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors