![ngo_project_image](ngo_project_image.jpg)

GoodThought NGO has been a catalyst for positive change, focusing its efforts on education, healthcare, and sustainable development to make a significant difference in communities worldwide. With this mission, GoodThought has orchestrated an array of assignments aimed at uplifting underprivileged populations and fostering long-term growth.

This project offers a hands-on opportunity to explore how data-driven insights can direct and enhance these humanitarian efforts. You'll work with the GoodThought PostgreSQL database, which holds detailed records of assignments, funding, impacts, and donor activities from 2010 to 2023. The dataset includes:

- **`Assignments`:** Details about each project, including its name, duration (start and end dates), budget, geographical region, and the impact score.
- **`Donations`:** Records of financial contributions, linked to specific donors and assignments, highlighting how financial support is allocated and utilized.
- **`Donors`:** Information on individuals and organizations that fund GoodThought’s projects, including donor types.

Refer to the ERD below for a visual representation of the relationships between these tables:
<img src="erd.png" alt="ERD" width="50%" height="50%">


You will execute SQL queries to answer two questions, as listed in the instructions. Good luck!


## Visualizing the Data
The queries in this section preview the first five rows of each of the three tables and then list the data type of each column.

In [45]:
SELECT *
FROM donations
LIMIT 5;

Unnamed: 0,donation_id,donor_id,amount,donation_date,assignment_id
0,1,2733,271.36,2021-08-21 00:00:00+00:00,4226
1,2,2608,251.49,2021-10-15 00:00:00+00:00,1323
2,3,1654,528.38,2020-03-03 00:00:00+00:00,4881
3,4,3265,730.36,2021-02-06 00:00:00+00:00,1237
4,5,4932,285.96,2022-03-05 00:00:00+00:00,1626


In [46]:
SELECT * 
FROM assignments
LIMIT 5;

Unnamed: 0,assignment_id,assignment_name,start_date,end_date,budget,region,impact_score
0,1,Assignment_1,2021-10-17,2021-12-04,-32322.03,West,5.55
1,2,Assignment_2,2020-10-26,2020-11-28,57278.4,South,1.45
2,3,Assignment_3,2021-08-11,2022-03-17,40414.51,West,2.34
3,4,Assignment_4,2021-11-22,2022-05-17,31732.48,East,7.05
4,5,Assignment_5,2020-11-22,2021-07-10,13548.22,North,5.29


In [47]:
SELECT *
FROM donors
LIMIT 5;

Unnamed: 0,donor_id,donor_name,donor_type
0,1,Donor_1,Individual
1,2,Donor_2,Organization
2,3,Donor_3,Individual
3,4,Donor_4,Organization
4,5,Donor_5,Organization


In [48]:
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE table_name IN ('donations', 'assignments', 'donors');

Unnamed: 0,table_name,column_name,data_type
0,assignments,assignment_id,integer
1,assignments,assignment_name,character varying
2,assignments,start_date,character varying
3,assignments,end_date,character varying
4,assignments,budget,numeric
5,assignments,region,character varying
6,assignments,impact_score,numeric
7,donations,donation_id,integer
8,donations,donor_id,integer
9,donations,amount,numeric
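
The schema output above lists `start_date` and `end_date` as `character varying` rather than `date`. Any date arithmetic on these columns therefore needs a cast first. The following is a minimal sketch, not part of the original analysis. It assumes every value follows the `YYYY-MM-DD` pattern seen in the five-row preview, and the aliases `start_dt`, `end_dt`, and `duration_days` are made up for illustration.

```sql
-- Sketch: assumes all start_date/end_date strings are in YYYY-MM-DD form,
-- as in the five-row preview; a malformed value would make the cast fail.
SELECT assignment_id,
	start_date::date AS start_dt,
	end_date::date AS end_dt,
	-- In PostgreSQL, subtracting two dates yields an integer number of days
	end_date::date - start_date::date AS duration_days
FROM assignments
LIMIT 5;
```

Casting inside the query leaves the stored data unchanged, which suits a read-only exploration like this one.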


## Query 1: List the top five assignments based on total value of donations, categorized by donor type. The output should include four columns: 1) `assignment_name`, 2) `region`, 3) `rounded_total_donation_amount` rounded to two decimal places, and 4) `donor_type`, sorted by `rounded_total_donation_amount` in descending order. Save the result as `highest_donation_assignments`.

In [49]:
-- Creating the CTE d_per_a that calculates rounded_total_donation_amount to be used in the final query
SELECT assignments.assignment_id, 
	ROUND(SUM(donations.amount), 2) AS rounded_total_donation_amount, 
	donors.donor_type
FROM donations
LEFT JOIN assignments
	USING(assignment_id)
LEFT JOIN donors
	ON donations.donor_id = donors.donor_id
GROUP BY assignments.assignment_id, donors.donor_type
ORDER BY assignments.assignment_id ASC;

Unnamed: 0,assignment_id,rounded_total_donation_amount,donor_type
0,1,655.14,Organization
1,2,875.52,Individual
2,2,192.37,Organization
3,3,995.71,Individual
4,4,182.04,Organization
...,...,...,...
4261,4997,1157.18,Corporate
4262,4997,515.44,Individual
4263,4998,713.25,Corporate
4264,4998,603.72,Individual


In [50]:
-- highest_donation_assignments
WITH d_per_a AS (
	SELECT assignments.assignment_id, 
		ROUND(SUM(donations.amount), 2) AS rounded_total_donation_amount, 
		donors.donor_type
	FROM donations
	LEFT JOIN assignments
		USING(assignment_id)
	LEFT JOIN donors
		ON donations.donor_id = donors.donor_id
	GROUP BY assignments.assignment_id, donors.donor_type
)
SELECT assignments.assignment_name,
	assignments.region,
	d_per_a.rounded_total_donation_amount,
	d_per_a.donor_type
FROM d_per_a
LEFT JOIN assignments
	USING(assignment_id)
ORDER BY rounded_total_donation_amount DESC
LIMIT 5;

Unnamed: 0,assignment_name,region,rounded_total_donation_amount,donor_type
0,Assignment_3033,East,3840.66,Individual
1,Assignment_300,West,3133.98,Organization
2,Assignment_4114,North,2778.57,Organization
3,Assignment_1765,West,2626.98,Organization
4,Assignment_268,East,2488.69,Individual
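
The same aggregation can probably be written without the CTE, since `assignments` is otherwise joined twice. The sketch below is an alternative formulation under one assumption: every donation references an existing assignment and an existing donor. The inner joins would drop any donation that does not, whereas the left joins above keep such rows with `NULL` values. The table aliases `d`, `a`, and `dn` are introduced here for brevity.

```sql
-- Sketch: single-pass version of highest_donation_assignments.
-- Assumes no donation has a missing assignment or donor (inner joins drop those).
SELECT a.assignment_name,
	a.region,
	ROUND(SUM(d.amount), 2) AS rounded_total_donation_amount,
	dn.donor_type
FROM donations AS d
INNER JOIN assignments AS a
	USING(assignment_id)
INNER JOIN donors AS dn
	USING(donor_id)
-- Group by assignment_id as well, in case two assignments share a name
GROUP BY a.assignment_id, a.assignment_name, a.region, dn.donor_type
ORDER BY rounded_total_donation_amount DESC
LIMIT 5;
```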


## Query 2: Identify the assignment with the highest impact score in each region, ensuring that each listed assignment has received at least one donation. The output should include four columns: 1) `assignment_name`, 2) `region`, 3) `impact_score`, and 4) `num_total_donations`, sorted by `region` in ascending order. Include only the highest-scoring assignment per region, avoiding duplicates within the same region. Save the result as `top_regional_impact_assignments`.

In [51]:
-- Creating the CTE ntd that calculates num_total_donations to be used in the final query
SELECT assignments.assignment_id,
	COUNT(donations.donation_id) AS num_total_donations,
	assignments.region
FROM donations
LEFT JOIN assignments
	USING(assignment_id)
GROUP BY assignments.assignment_id, assignments.region
ORDER BY assignments.assignment_id DESC
LIMIT 5;

Unnamed: 0,assignment_id,num_total_donations,region
0,5000,1,West
1,4998,2,West
2,4997,3,South
3,4996,1,West
4,4995,2,East


In [52]:
-- Creating the CTE ranked that ranks the impact_scores per region to be used in the final query
SELECT assignment_name,
	assignment_id, 
	region,
	impact_score,
	ROW_NUMBER() OVER(PARTITION BY region ORDER BY impact_score DESC) AS rn
FROM assignments
WHERE assignments.assignment_id IN (SELECT assignment_id FROM donations);


Unnamed: 0,assignment_name,assignment_id,region,impact_score,rn
0,Assignment_316,316,East,10.00,1
1,Assignment_3196,3196,East,9.97,2
2,Assignment_2281,2281,East,9.95,3
3,Assignment_2745,2745,East,9.95,4
4,Assignment_4246,4246,East,9.94,5
...,...,...,...,...,...
3183,Assignment_1634,1634,West,1.06,808
3184,Assignment_1150,1150,West,1.05,809
3185,Assignment_3778,3778,West,1.05,810
3186,Assignment_2518,2518,West,1.04,811


In [53]:
-- top_regional_impact_assignments
WITH ntd AS (
	SELECT assignments.assignment_id,
		COUNT(donations.donation_id) AS num_total_donations,
		assignments.region
	FROM donations
	LEFT JOIN assignments
		USING(assignment_id)
	GROUP BY assignments.assignment_id, assignments.region
),
ranked AS (
	SELECT assignment_name,
		assignment_id, 
		region,
		impact_score,
		ROW_NUMBER() OVER(PARTITION BY region ORDER BY impact_score DESC) AS rn
	FROM assignments
	WHERE assignments.assignment_id IN (SELECT assignment_id FROM donations)
)
SELECT ranked.assignment_name,
	ranked.region,
	ranked.impact_score,
	ntd.num_total_donations
FROM ranked
LEFT JOIN ntd
	USING(assignment_id)
WHERE ranked.rn = 1
ORDER BY ranked.region ASC;


Unnamed: 0,assignment_name,region,impact_score,num_total_donations
0,Assignment_316,East,10.0,2
1,Assignment_2253,North,9.99,1
2,Assignment_3547,South,10.0,1
3,Assignment_2794,West,9.99,2
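
`ROW_NUMBER()` ordered only by `impact_score` makes an arbitrary choice when two assignments in a region share the same score. The ranking preview shows this kind of tie, with two East assignments at 9.95. If a tie occurred at the top of a region, the selected assignment could differ between runs. The sketch below makes the choice deterministic by adding `assignment_id` as a secondary sort key. Preferring the lower `assignment_id` is an assumption made for illustration, not a rule stated in the task.

```sql
-- Sketch: deterministic ranking per region.
-- The assignment_id tiebreaker is an assumed rule, not part of the task statement.
SELECT assignment_name,
	assignment_id,
	region,
	impact_score,
	ROW_NUMBER() OVER(
		PARTITION BY region
		ORDER BY impact_score DESC, assignment_id ASC
	) AS rn
FROM assignments
WHERE assignment_id IN (SELECT assignment_id FROM donations);
```

Using this body for the `ranked` CTE still returns one row per region, so the no-duplicates requirement is unaffected.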
