Please view the detailed case study info here.
- Business Task
- Questions And Solutions
- Data Exploration and Cleansing
- Interest Analysis
- Segment Analysis
- Index Analysis
Fresh Segments is a digital marketing agency that helps other businesses analyze trends in online ad click behavior for their unique customer base.
The task is to analyze the aggregated metrics and provide high-level insights for the given dataset.
This case study uses 2 datasets, `interest_metrics` and `interest_map`, to answer the following business questions.
1. Update the `interest_metrics` table by modifying the `month_year` column to be a date data type with the start of the month.
-- Drop month_year column
ALTER TABLE interest_metrics
DROP COLUMN month_year;
-- Add new month_year column
ALTER TABLE interest_metrics
ADD COLUMN month_year DATE;
-- Add data to the month_year column
UPDATE interest_metrics
SET month_year = CAST(CONCAT(_year, "-", _month, "-01") AS DATE);
-- To check the data type of each column
DESC interest_metrics;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318143909-176f814b-54b7-452c-af0f-1881d7353ef1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTQzOTA5LTE3NmY4MTRiLTU0YjctNDUyYy1hZjBmLTE4ODFkNzM1M2VmMS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01ZWFlMDljOWY4OGQ3OWE0M2YxNTFiZTIzMDdjZWM1MWEzMmQ0MzI5NzJiNTJhZjdjOWNjY2FjODM3Zjc3ZjA3JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.H-swQNjsQKUyanx0uCo-GOjAnB1yGrJLN4caYhsJ6XY)
2. What is the count of records in the `interest_metrics` for each `month_year` value sorted in chronological order (earliest to latest) with the null values appearing first?
SELECT
month_year,
COUNT(*) AS count
FROM interest_metrics
GROUP BY month_year
ORDER BY month_year;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318144514-ebb78e02-e4ce-4f78-a233-fb6d92764054.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTQ0NTE0LWViYjc4ZTAyLWU0Y2UtNGY3OC1hMjMzLWZiNmQ5Mjc2NDA1NC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0xMTBiMGI1YTcwNzFkMGNhNDg2MmFjNmY2ODlhZmNjZWZmZmQ4YTdkNDg2OTNmZDJhZDNiODllZTQ3NTI4MTg2JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.qSanAfhhb2_HKA_sU5nenoDa2tzmN0qtk2SkgkPolrs)
-- To find the percentage of null values in this dataset
SELECT
100 * COUNT(*) / (SELECT
COUNT(*)
FROM
interest_metrics) AS null_percent
FROM
interest_metrics
WHERE
interest_id IS NULL;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318149912-6efdd09c-7277-48d9-be0f-7d2e0fa83d29.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTQ5OTEyLTZlZmRkMDljLTcyNzctNDhkOS1iZTBmLTdkMmUwZmE4M2QyOS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iODU2ZGY1ZDBiNjkxYTZhYzMwMmMwMGMyMTMyNDZlOTZkOTRmMzU4ZTI5NjljNDE0YWQ0YTM3ZDg2ZjYzYTgwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.-ZEJmedhvJZT63wftP-ltOeFmQWFcaiiPp0LuzBNwrY)
Since the records are meaningless without `interest_id`, `_month`, and `_year` values, and only around 8.35% of the rows have a null `interest_id`, I suggest dropping these rows.
-- To delete null values
DELETE FROM interest_metrics
WHERE interest_id IS NULL;
-- To check if all null values are deleted
SELECT
100 * COUNT(*) / (SELECT
COUNT(*)
FROM
interest_metrics) AS null_percent
FROM
interest_metrics
WHERE
interest_id IS NULL;
![image](https://private-user-images.githubusercontent.com/148400128/318150207-a683deea-e64d-4b50-b376-42ec16b758d9.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUwMjA3LWE2ODNkZWVhLWU2NGQtNGI1MC1iMzc2LTQyZWMxNmI3NThkOS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01YzY2NTAyMzI1OTA0NWEwMTdhZGRkNWVkMzZiMzM2ZmM0ODVhMWQxZjdjMmM5YTExZmFiYjkzNTA2YTVkYjljJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.te5nej77b0qlwMIamZbJh6hmnZO2jx1CPI45UNTqsLA)
4. How many `interest_id` values exist in the `interest_metrics` table but not in the `interest_map` table? What about the other way around?
-- Outer Join two tables
WITH combine_cte AS (
SELECT * FROM interest_metrics mt LEFT JOIN interest_map mp ON mt.interest_id = mp.id
UNION
SELECT * FROM interest_metrics mt RIGHT JOIN interest_map mp ON mt.interest_id = mp.id
)
SELECT
COUNT(DISTINCT interest_id) AS metrics_id_count,
COUNT(DISTINCT id) AS maps_id_count,
SUM(IF(interest_id IS NULL, 1, 0)) AS not_in_metrics,
SUM(IF(id IS NULL, 1, 0)) AS not_in_maps
FROM
combine_cte;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318150556-35a77062-bfab-4112-904f-f9a3e37d1a40.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUwNTU2LTM1YTc3MDYyLWJmYWItNDExMi05MDRmLWY5YTNlMzdkMWE0MC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1hZmI4NWY3ZjgwYmI2M2JkOGNiYmNiM2Y2MDA3NGQwZmI0MzU0NzA5MjMyZmY0NDliMDgzMWU4YWExMzBlYjA0JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.lx5Frybyl4wp8MMT1FUsd9qmMm_EAGQ4yLGv0sVVDmA)
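As a cross-check on the outer-join approach, the two directions can also be counted separately with anti-joins. The sketch below is only an illustration: it uses Python's built-in `sqlite3` with made-up rows (not the case study data) and a reduced set of columns, and the SQLite dialect stands in for MySQL. Counting `DISTINCT interest_id` matters because an id missing from the map can appear in several metrics rows.

```python
import sqlite3

# In-memory toy database; the rows below are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interest_metrics (interest_id INTEGER, month_year TEXT);
    CREATE TABLE interest_map (id INTEGER PRIMARY KEY, interest_name TEXT);
    -- id 3 exists only in the map, id 9 exists only in the metrics (twice)
    INSERT INTO interest_metrics VALUES
        (1, '2018-07-01'), (1, '2018-08-01'),
        (2, '2018-07-01'),
        (9, '2018-07-01'), (9, '2018-08-01');
    INSERT INTO interest_map VALUES (1, 'a'), (2, 'b'), (3, 'c');
""")

# Ids present in interest_metrics but absent from interest_map
not_in_map = conn.execute("""
    SELECT COUNT(DISTINCT mt.interest_id)
    FROM interest_metrics mt
    WHERE NOT EXISTS (SELECT 1 FROM interest_map mp WHERE mp.id = mt.interest_id)
""").fetchone()[0]

# Ids present in interest_map but absent from interest_metrics
not_in_metrics = conn.execute("""
    SELECT COUNT(*)
    FROM interest_map mp
    WHERE NOT EXISTS (SELECT 1 FROM interest_metrics mt WHERE mt.interest_id = mp.id)
""").fetchone()[0]

print(not_in_map, not_in_metrics)
```

On this toy data each direction reports one id, even though id 9 occupies two metrics rows. The `NOT EXISTS` subqueries should carry over to MySQL unchanged.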
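-- Record count per interest; grouping by both selected columns keeps the query valid under ONLY_FULL_GROUP_BY
SELECT
mp.id, mp.interest_name, COUNT(*) AS record_count
FROM
interest_map mp
JOIN
interest_metrics mt ON mt.interest_id = mp.id
GROUP BY mp.id, mp.interest_name
ORDER BY record_count DESC;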
Answer:
Only showing the partial result here. There are 1,000 results from this query.
![image](https://private-user-images.githubusercontent.com/148400128/318150848-4bc0088c-c2a9-4735-b3f7-f6e4f279bd76.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUwODQ4LTRiYzAwODhjLWMyYTktNDczNS1iM2Y3LWY2ZTRmMjc5YmQ3Ni5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT05MWNhMzI1OTVkOTYwNzIwMWYwODI5NjY0MWM2MGI2NDA4MDg5NDAzNzhmZjEyYTAzYWFiNmI4NTIxN2NhYzNjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.fWJrfcLaqIKzzNWKG5xyBYq-oZ6OUoJeYEwwLU9xL1U)
6. What sort of table join should we perform for our analysis and why? Check your logic by checking the rows where `interest_id = 21246` in your joined output and include all columns from `interest_metrics` and all columns from `interest_map` except the `id` column.
The good old `INNER JOIN` should be used to connect the two tables by matching `interest_id` & `id`. This `JOIN` method would allow us to have the data records that are available in both tables.
SELECT
mt.*,
mp.interest_name,
mp.interest_summary,
mp.created_at,
mp.last_modified
FROM
interest_metrics mt
JOIN
interest_map mp ON mt.interest_id = mp.id
WHERE
mt.interest_id = 21246
AND mt._month IS NOT NULL;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318151574-5fa6b813-c9c0-42e6-8b87-ed3c69889b28.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUxNTc0LTVmYTZiODEzLWM5YzAtNDJlNi04Yjg3LWVkM2M2OTg4OWIyOC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT02YzUyMzI0OGExMTkwMmFiODlhNDZhN2FhNjgwZWUyYjZhNTQ2NGI0NzgyZjljZWU2YjhiNzAxMWE3OWRlMmU4JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.mpIyBaZmllkxC3QgbGou00s3UT9BR84u5PDkmNdEOgA)
7. Are there any records in your joined table where the `month_year` value is before the `created_at` value from the `interest_map` table? Do you think these values are valid and why?
-- To check the data counts where the month_year is before the created_at value
SELECT
COUNT(*) AS count
FROM
interest_metrics mt
JOIN interest_map mp ON mt.interest_id = mp.id
WHERE mt.month_year < mp.created_at;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318152097-c3329b36-10f3-49cd-8944-65912408b172.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUyMDk3LWMzMzI5YjM2LTEwZjMtNDljZC04OTQ0LTY1OTEyNDA4YjE3Mi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zNGY2OTY2ZWQwYTFlYjMwYmZiODhmMGFjNGQ3YWNjNjRlMDE0OWQ0YTE1NjVlYjQ1YTY2Yzk5N2ViM2NiMGIyJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.TO2o7u9Kh9qKlw5OC5SiA3dk45igj4IAoM7QJzCdJV4)
Even though there are 188 records where the `month_year` value is before the `created_at` value, we also have to keep in mind how the `month_year` column was created: it was set to the first day of each month. So I checked again whether any `month_year` value is before `created_at` without falling in the same month and year as `created_at`, and there are none, as seen below.
-- "Not in the same month and year" means the month differs OR the year differs
SELECT
COUNT(*) AS ctn
FROM
interest_metrics mt
JOIN
interest_map mp ON mt.interest_id = mp.id
WHERE
mt.month_year < mp.created_at
AND (MONTH(mt.month_year) != MONTH(mp.created_at)
OR YEAR(mt.month_year) != YEAR(mp.created_at));
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318152481-a2aa5650-a587-4286-97e2-3addff802c5d.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUyNDgxLWEyYWE1NjUwLWE1ODctNDI4Ni05N2UyLTNhZGRmZjgwMmM1ZC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT05OGNjNjcwZjQ3MGIxOTdkMTQ2Y2Y5Y2I5ZjZiYjk1ZGJiNGVmOTVhNWUyYjIxZGY1ZjViMDQwYmY5NWNhYmRjJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.tYHZFOvyNikcuwzwLiI2BcU0m9NbMqpZDZjioFCZLq4)
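Another way to express the same check is to compare `month_year` against the first day of the `created_at` month, which avoids separate month and year comparisons. The following is a minimal sketch under stated assumptions: it runs on Python's built-in `sqlite3` with invented rows, and SQLite's `date(..., 'start of month')` stands in for a MySQL expression such as `DATE_FORMAT(created_at, '%Y-%m-01')`.

```python
import sqlite3

# In-memory toy database; dates are invented and stored as ISO-8601 text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interest_metrics (interest_id INTEGER, month_year TEXT);
    CREATE TABLE interest_map (id INTEGER PRIMARY KEY, created_at TEXT);
    INSERT INTO interest_map VALUES
        (1, '2018-07-17 10:40:03'),
        (2, '2018-09-05 18:10:04');
    INSERT INTO interest_metrics VALUES
        (1, '2018-07-01'),  -- same month as created_at: side effect of truncation
        (1, '2018-08-01'),  -- after created_at
        (2, '2018-08-01');  -- a full month before created_at
""")

query = """
    SELECT
        -- rows where month_year precedes the raw created_at timestamp
        SUM(mt.month_year < mp.created_at) AS before_created,
        -- rows where month_year precedes the month in which the interest was created
        SUM(mt.month_year < date(mp.created_at, 'start of month')) AS before_created_month
    FROM interest_metrics mt
    JOIN interest_map mp ON mt.interest_id = mp.id
"""
before_created, before_created_month = conn.execute(query).fetchone()
print(before_created, before_created_month)
```

In this toy data two rows precede `created_at`, but only one of them falls in an earlier month; the other is simply the first-of-month date for the month the interest was created.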
-- To check the unique number of interests and month_year
SELECT
COUNT(DISTINCT interest_id) AS interest_count,
COUNT(DISTINCT month_year) AS month_year_count
FROM
interest_metrics;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318152699-73f6e031-f22a-42d5-9990-0a4d0a143206.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUyNjk5LTczZjZlMDMxLWYyMmEtNDJkNS05OTkwLTBhNGQwYTE0MzIwNi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0wOTIyNmRhODdkODQ4NDdlMzhlNzkwY2I2OTNjZjY2YTU5NThlMzM3N2M1NzZhMGFkMDE0MTM2MGVjMWI5M2QwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.uDqnhGyXbYWDchjGqV_RjtqWzmRG7EmScBpofaLBBf0)
-- To count the interests that are present in every month_year
WITH count_cte AS (
SELECT
interest_id,
COUNT(DISTINCT month_year) AS total_months
FROM
interest_metrics
WHERE month_year IS NOT NULL
GROUP BY
interest_id
)
SELECT
total_months,
COUNT(DISTINCT interest_id) AS interest_count
FROM
count_cte
WHERE total_months = 14;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318152864-2b9ad4bd-a121-4461-9cd5-4c247278e229.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUyODY0LTJiOWFkNGJkLWExMjEtNDQ2MS05Y2Q1LTRjMjQ3Mjc4ZTIyOS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT01NjlmMzBlZDUxZjA2OTJkMzYzN2JjNWNmOTMwYzc1NWE3NDM3ODE5OGE5YzE0ZjlkNjc0ZGJjM2M1ODU4Yjk1JlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.U9ybfGwk2yyuxD8VYJTj4ds5FQ_IN89tKYhH5AaZlFg)
2. Using this same `total_months` measure - calculate the cumulative percentage of all records starting at 14 months - which `total_months` value passes the 90% cumulative percentage value?
WITH month_count_cte AS (
SELECT
interest_id,
COUNT(DISTINCT month_year) AS total_months
FROM
interest_metrics
WHERE month_year IS NOT NULL
GROUP BY interest_id
),
interest_count AS (
SELECT
total_months,
COUNT(DISTINCT interest_id) AS interest_count
FROM
month_count_cte
GROUP BY total_months
),
percent_cte AS (
SELECT *,
ROUND(100* SUM(interest_count) OVER(ORDER BY total_months DESC)/(SELECT COUNT(DISTINCT interest_id) FROM interest_metrics), 2) AS cumulative_percent
FROM
interest_count
)
SELECT
*
FROM
percent_cte
WHERE cumulative_percent > 90;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318153165-42f73d26-6293-4b09-90ec-4ae63b2a2f01.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUzMTY1LTQyZjczZDI2LTYyOTMtNGIwOS05MGVjLTRhZTYzYjJhMmYwMS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00YWJlOTIzZTVkMzllMGFhNDcxM2IwOTkwMjBkMTQ0ZjhmZWE3OTQ2YjgxMDI5NTBlOTM4OWVlNjE2NmY3MTZhJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.DN7WaATOs_krAH6YqOCe9_OvVhjNpFhg1AOi2M6iTZ8)
3. If we were to remove all `interest_id` values which are lower than the `total_months` value we found in the previous question - how many total data points would we be removing?
WITH month_cte AS (
SELECT
COUNT(DISTINCT month_year) AS total_months,
interest_id
FROM
interest_metrics
WHERE month_year IS NOT NULL
GROUP BY interest_id
HAVING total_months <= 6
)
SELECT
COUNT(interest_id) AS interest_count
FROM
month_cte;
Answer:
![image](https://private-user-images.githubusercontent.com/148400128/318153477-fbb8e3b9-079f-427c-82ee-dffb9c355464.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0NDUzOTAsIm5iZiI6MTcyMTQ0NTA5MCwicGF0aCI6Ii8xNDg0MDAxMjgvMzE4MTUzNDc3LWZiYjhlM2I5LTA3OWYtNDI3Yy04MmVlLWRmZmI5YzM1NTQ2NC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwNzIwJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDcyMFQwMzExMzBaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1iYTI4NTAzNDg0MWMyMjMyZTUxZjEwODY2MGI3ZmVlMDg5MmJlOTk5NTg1ZjQ5NDJlZjI1MjY2MjJkNDFiYWVlJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.K-2JHSlmh1UnqeKcFxA107yRP7yN3wci-PgW-lQ6Rag)
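The question asks about data points, so it may also be worth counting the rows that belong to the removed interests, not only the interests themselves. The sketch below shows one possible way to get both numbers in a single query. It is an illustration only: it uses Python's built-in `sqlite3`, invented rows, and a hypothetical `THRESHOLD` with a strict "lower than" comparison, none of which come from the case study data.

```python
import sqlite3

# In-memory toy database; rows are invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE interest_metrics (interest_id INTEGER, month_year TEXT);
    INSERT INTO interest_metrics VALUES
        (1, '2018-07-01'), (1, '2018-08-01'), (1, '2018-09-01'),
        (2, '2018-07-01'), (2, '2018-08-01'),
        (3, '2018-07-01');
""")

THRESHOLD = 3  # hypothetical cut-off chosen for this toy data

query = """
    WITH month_cte AS (
        -- number of distinct months in which each interest appears
        SELECT interest_id, COUNT(DISTINCT month_year) AS total_months
        FROM interest_metrics
        WHERE month_year IS NOT NULL
        GROUP BY interest_id
    )
    SELECT
        COUNT(DISTINCT mt.interest_id) AS interests_removed,
        COUNT(*) AS rows_removed
    FROM interest_metrics mt
    JOIN month_cte mc ON mc.interest_id = mt.interest_id
    WHERE mc.total_months < ?
"""
interests_removed, rows_removed = conn.execute(query, (THRESHOLD,)).fetchone()
print(interests_removed, rows_removed)
```

Here interests 2 and 3 fall below the cut-off, and together they account for three rows. Joining the per-interest month counts back to the metrics table is what turns an interest count into a row count.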