### Phase 4: From Analysis to Action
**Purpose**</br>
This final phase turns your findings into actionable insights and a repair and intervention plan. You’ll:

* Combine all prior tables into a single view, then use it to create summaries per province and town.
* Recommend engineering tasks based on source type, contamination results, and queue time.
* Create a new table called `Project_progress` to track actual improvements.

### Step-by-Step Tasks
#### 1. Combine Multiple Tables into a View
**You’ll need to join:**

* `location` (for `town_name`, `province_name`, `location_type`, `address`),
* `visits` (for `source_id`, `location_id`, and `time_in_queue`),
* `water_source` (for `type_of_water_source`, `number_of_people_served`),
* `well_pollution` (for `results`).

**Filter:** keep only rows where `visit_count = 1`.

In [None]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

In [None]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace `password` with our connection password and `db_name` with our database name. 
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:password@localhost:3306/md_water_services

In [None]:
%%sql
CREATE VIEW combined_analysis_table AS (
SELECT
    ws.type_of_water_source AS source_type,
    lc.town_name,
    lc.province_name,
    lc.location_type,
    ws.number_of_people_served AS people_served,
    vs.time_in_queue,
    wp.results
FROM
    visits AS vs
-- LEFT JOIN keeps sources that are not wells (they have no well_pollution row).
LEFT JOIN
    well_pollution AS wp
ON
    vs.source_id = wp.source_id
INNER JOIN
    location AS lc
ON
    vs.location_id = lc.location_id
INNER JOIN
    water_source AS ws
ON
    vs.source_id = ws.source_id
-- Keep one row per source so people_served is not counted once per repeat visit.
WHERE
    vs.visit_count = 1);
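
The `visit_count = 1` filter matters because a source that was visited several times appears several times in `visits`. The sketch below illustrates this with Python's built-in `sqlite3` module and a made-up three-row dataset (not the real `md_water_services` data). SQLite stands in for MySQL only so the example is self-contained, and the table contents are assumptions.

```python
import sqlite3

# Toy stand-ins for the real tables; all values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE water_source (
    source_id TEXT PRIMARY KEY,
    type_of_water_source TEXT,
    number_of_people_served INTEGER
);
CREATE TABLE visits (source_id TEXT, visit_count INTEGER, time_in_queue INTEGER);
INSERT INTO water_source VALUES ('S1', 'shared_tap', 1000), ('S2', 'river', 400);
-- Source S1 was visited twice, so it has two rows in visits.
INSERT INTO visits VALUES ('S1', 1, 45), ('S1', 2, 60), ('S2', 1, 0);
""")

query = """
SELECT SUM(ws.number_of_people_served)
FROM visits AS vs
INNER JOIN water_source AS ws ON vs.source_id = ws.source_id
{where}
"""

# Without the filter, S1's 1000 people are counted once per visit.
unfiltered_total = conn.execute(query.format(where="")).fetchone()[0]
# With the filter, each source contributes exactly once.
filtered_total = conn.execute(
    query.format(where="WHERE vs.visit_count = 1")
).fetchone()[0]
conn.close()

print(unfiltered_total, filtered_total)  # 2400 1400
```

Only the filtered total matches the 1,400 people the two toy sources actually serve.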

### Province-Level Water Source Summary

**Task:**

* Summarize water access by **province**, using:

  * Tap in home
  * Tap in home broken
  * River
  * Shared tap
  * Well

* Use `GROUP BY province_name`.
* Use **percentages**: what share of each province's population uses each source type.

This shows **regional inequalities** and **planning priorities**.

In [None]:
%%sql
WITH province_totals AS (
  SELECT province_name, SUM(people_served) AS total_ppl_serv
  FROM combined_analysis_table
  GROUP BY province_name
)
SELECT
    ct.province_name,
    ROUND(SUM(CASE WHEN source_type = 'river' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv, 0) AS river,
    ROUND(SUM(CASE WHEN source_type = 'shared_tap' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv, 0) AS shared_tap,
    ROUND(SUM(CASE WHEN source_type = 'tap_in_home' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv, 0) AS tap_in_home,
    ROUND(SUM(CASE WHEN source_type = 'tap_in_home_broken' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv, 0) AS tap_in_home_broken,
    ROUND(SUM(CASE WHEN source_type = 'well' THEN people_served ELSE 0 END) * 100.0 / pt.total_ppl_serv, 0) AS well
FROM
    combined_analysis_table ct
JOIN 
    province_totals pt 
ON 
    ct.province_name = pt.province_name
GROUP BY 
    ct.province_name
ORDER BY 
    ct.province_name;
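
To see what the percentage columns mean, here is a minimal sketch of the same CTE pattern on invented numbers, run through Python's built-in `sqlite3` module instead of MySQL. The province names and populations are assumptions chosen so the arithmetic is easy to check by hand, and only two source types are included to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- A plain table stands in for the combined_analysis_table view.
CREATE TABLE combined_analysis_table (
    province_name TEXT, source_type TEXT, people_served INTEGER
);
INSERT INTO combined_analysis_table VALUES
    ('Province A', 'river', 300),
    ('Province A', 'well', 100),
    ('Province B', 'river', 50),
    ('Province B', 'well', 150);
""")

rows = conn.execute("""
WITH province_totals AS (
    SELECT province_name, SUM(people_served) AS total_ppl_serv
    FROM combined_analysis_table
    GROUP BY province_name
)
SELECT
    ct.province_name,
    ROUND(SUM(CASE WHEN source_type = 'river' THEN people_served ELSE 0 END)
          * 100.0 / pt.total_ppl_serv, 0) AS river,
    ROUND(SUM(CASE WHEN source_type = 'well' THEN people_served ELSE 0 END)
          * 100.0 / pt.total_ppl_serv, 0) AS well
FROM combined_analysis_table AS ct
JOIN province_totals AS pt ON ct.province_name = pt.province_name
GROUP BY ct.province_name
ORDER BY ct.province_name
""").fetchall()
conn.close()

for row in rows:
    print(row)
# ('Province A', 75.0, 25.0)
# ('Province B', 25.0, 75.0)
```

Each row sums to 100 because every person in a province is assigned to exactly one source type: 300 of Province A's 400 people use rivers, which is 75%.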

**Look for:**</br>

Provinces with a high percentage of river usage (e.g., Sokoto → drill wells).</br>

Provinces where `tap_in_home_broken` is high (e.g., Amanzi → fix infrastructure).

### Town-Level Water Access Breakdown
**Task:**</br>

We do the same breakdown, but per town.</br>

Watch out for duplicate names (e.g., "Harare" exists in 2 provinces), so group by province + town.</br>

Town summaries help decide specific interventions, e.g., which towns need wells drilled or pipes fixed.</br>

In [None]:
%%sql
WITH town_totals AS (
SELECT
	town_name,
    province_name,
    SUM(people_served) AS total_people_served_in_a_town
FROM
	combined_analysis_table
GROUP BY
	province_name,
    town_name)
SELECT
	ct.province_name,
    ct.town_name,
	ROUND(SUM(CASE WHEN source_type = 'river' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS river,
    ROUND(SUM(CASE WHEN source_type = 'shared_tap' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS shared_tap,
    ROUND(SUM(CASE WHEN source_type = 'tap_in_home' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS tap_in_home,
    ROUND(SUM(CASE WHEN source_type = 'tap_in_home_broken' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS tap_in_home_broken,
    ROUND(SUM(CASE WHEN source_type = 'well' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS well
FROM
	combined_analysis_table AS ct
JOIN
	town_totals AS tt
ON
    ct.province_name = tt.province_name	
AND
    ct.town_name = tt.town_name
GROUP BY
	ct.province_name,
    ct.town_name
ORDER BY
	ct.town_name;
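
The duplicate-name warning is easy to demonstrate. The sketch below uses Python's built-in `sqlite3` module with two invented rows for a town called Harare in two placeholder provinces. The province names and populations are assumptions, not values from the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combined_analysis_table (
    province_name TEXT, town_name TEXT, people_served INTEGER
);
-- Two different towns that happen to share a name.
INSERT INTO combined_analysis_table VALUES
    ('Province A', 'Harare', 200),
    ('Province B', 'Harare', 800);
""")

# Grouping by town alone merges the two towns into one row.
by_town = conn.execute("""
SELECT town_name, SUM(people_served)
FROM combined_analysis_table
GROUP BY town_name
""").fetchall()

# Grouping by province and town keeps them apart.
by_province_and_town = conn.execute("""
SELECT province_name, town_name, SUM(people_served)
FROM combined_analysis_table
GROUP BY province_name, town_name
ORDER BY province_name
""").fetchall()
conn.close()

print(by_town)               # [('Harare', 1000)]
print(by_province_and_town)  # [('Province A', 'Harare', 200), ('Province B', 'Harare', 800)]
```

The same reasoning explains why the join to `town_totals` matches on both `province_name` and `town_name`.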

### Temporary tables 
Temporary tables in SQL are a nice way to store the results of a complex query. We run the query once, and the results are stored as a table.</br>
The catch? If you close the database connection, the table is deleted, so you have to create it again each time you start working in MySQL.</br>
The benefit is that we can use the table to do more calculations without running the whole query each time.

In [None]:
%%sql
CREATE TEMPORARY TABLE town_aggregated_water_access
WITH town_totals AS (
SELECT
	town_name,
    province_name,
    SUM(people_served) AS total_people_served_in_a_town
FROM
	combined_analysis_table
GROUP BY
	province_name,
    town_name)
SELECT
	ct.province_name,
    ct.town_name,
	ROUND(SUM(CASE WHEN source_type = 'river' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS river,
    ROUND(SUM(CASE WHEN source_type = 'shared_tap' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS shared_tap,
    ROUND(SUM(CASE WHEN source_type = 'tap_in_home' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS tap_in_home,
    ROUND(SUM(CASE WHEN source_type = 'tap_in_home_broken' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS tap_in_home_broken,
    ROUND(SUM(CASE WHEN source_type = 'well' THEN people_served ELSE 0 END)*100/tt.total_people_served_in_a_town) AS well
FROM
	combined_analysis_table AS ct
JOIN
	town_totals AS tt
ON
    ct.province_name = tt.province_name	
AND
    ct.town_name = tt.town_name
GROUP BY
	ct.province_name,
    ct.town_name
ORDER BY
	ct.town_name;
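
The connection-scoped lifetime of a temporary table can be seen in a small experiment. The sketch below uses Python's built-in `sqlite3` module, whose temporary tables are also private to the connection that created them. It illustrates the concept rather than MySQL itself, and the `town_summary` table with its single row is made up.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "demo.db")

    # The first connection creates and can read the temporary table.
    first = sqlite3.connect(db_path)
    first.execute(
        "CREATE TEMPORARY TABLE town_summary AS "
        "SELECT 'Harare' AS town_name, 42 AS river"
    )
    rows_in_first = first.execute(
        "SELECT town_name, river FROM town_summary"
    ).fetchall()

    # A second connection to the same database cannot see it.
    second = sqlite3.connect(db_path)
    try:
        second.execute("SELECT * FROM town_summary")
        visible_in_second = True
    except sqlite3.OperationalError:
        visible_in_second = False

    first.close()
    second.close()

print(rows_in_first)      # [('Harare', 42)]
print(visible_in_second)  # False
```

This is why the `CREATE TEMPORARY TABLE` cell has to be re-run at the start of each new session.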

Our final goal is to implement our plan in the database.</br>
We have a plan to improve water access in Maji Ndogo, so we need to think it through and, as our final task, create a table that gives our teams the information they need to fix, upgrade, and repair water sources. They will need the addresses of the places they should visit (street address, town, province), the type of water source they should improve, and what should be done to improve it.</br>
We should also make space in the database for them to update us on their progress. We need to know whether the repair is complete and the date it was completed, and give them space to upgrade the sources. Let's call this table Project_progress.


In [None]:
%%sql
CREATE TABLE Project_progress (
    Project_id SERIAL PRIMARY KEY,
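    -- Note: MySQL parses an inline REFERENCES clause like the one below but does not enforce it;
    -- an enforced link needs a separate table-level FOREIGN KEY (source_id) REFERENCES ... clause.
    source_id VARCHAR(20) NOT NULL REFERENCES water_source(source_id) ON DELETE CASCADE ON UPDATE CASCADE,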
    Address VARCHAR(50),
    Town VARCHAR(30),
    Province VARCHAR(30),
    Source_type VARCHAR(50),
    Improvement VARCHAR(50),
    Source_status VARCHAR(50) DEFAULT 'Backlog' CHECK (Source_status IN ('Backlog', 'In progress', 'Complete')),
    Date_of_completion DATE,
    Comments TEXT
);

**At a high level, the improvements are as follows:**
1. Rivers → Drill wells
2. Wells: if the well is contaminated with chemicals → Install RO filter
3. Wells: if the well is contaminated with biological contaminants → Install UV and RO filter
4. Shared taps: if the queue time is 30 min or longer → Install X taps nearby, where X = FLOOR(time_in_queue / 30)
5. tap_in_home_broken → Diagnose local infrastructure
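
As a quick check of the shared-tap rule, the following sketch evaluates X = FLOOR(time_in_queue / 30) for a few queue times. The helper name `taps_to_install` and the sample queue times are made up for illustration.

```python
import math


def taps_to_install(time_in_queue_minutes):
    """Return X = FLOOR(time_in_queue / 30), the number of taps to add."""
    return math.floor(time_in_queue_minutes / 30)


# Sample queue times in minutes (invented values).
examples = {minutes: taps_to_install(minutes) for minutes in (30, 59, 60, 135)}
print(examples)  # {30: 1, 59: 1, 60: 2, 135: 4}
```

A 59-minute queue still yields one tap because the formula rounds down. Queues shorter than 30 minutes would give zero, which is why those sources are filtered out before the rule is applied.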

* To make this simpler, we can start with this query:

In [None]:
%%sql
-- Project_progress_query
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id;

First things first, let's think through the logic for filtering the data down to the sources we want to improve.</br>
1. Only records with visit_count = 1 are allowed.
2. Any of the following rows can be included:</br>
a. Shared taps with queue times of 30 min or longer.</br>
b. Wells that are contaminated, so we exclude wells that are Clean.</br>
c. Any river and tap_in_home_broken sources.


In [None]:
%%sql
SELECT
    lc.address,
    lc.town_name,
    lc.province_name,
    ws.source_id,
    ws.type_of_water_source,
    wp.results
FROM
    water_source AS ws
-- LEFT JOIN so rivers and taps, which have no well_pollution row, are kept.
LEFT JOIN
    well_pollution AS wp ON ws.source_id = wp.source_id
INNER JOIN
    visits AS vs ON ws.source_id = vs.source_id
INNER JOIN
    location AS lc ON vs.location_id = lc.location_id
WHERE
    vs.visit_count = 1
AND (
        wp.results != 'Clean'
    OR
        ws.type_of_water_source = 'river'
    OR
        ws.type_of_water_source = 'tap_in_home_broken'
    OR
        (ws.type_of_water_source = 'shared_tap' AND vs.time_in_queue >= 30)
);

### Use CASE logic in SQL for:

Rivers → "Drill well"</br>

Contaminated wells → "Install UV and RO filter" / "Install RO filter"</br>

Shared taps → "Install X taps nearby" → CONCAT('Install ', FLOOR(time_in_queue/30), ' taps nearby')</br>

Broken taps → "Diagnose local infrastructure"
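
The rules above can be sketched as one `CASE` expression. The example below runs it through Python's built-in `sqlite3` module on a hypothetical `candidate_sources` table that stands in for the filtered query result. The table name and its five rows are assumptions. SQLite's `||` operator and integer division are used here in place of MySQL's `CONCAT` and `FLOOR`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-in for the filtered list of sources to improve.
CREATE TABLE candidate_sources (
    source_id TEXT,
    type_of_water_source TEXT,
    results TEXT,
    time_in_queue INTEGER
);
INSERT INTO candidate_sources VALUES
    ('R1', 'river', NULL, 0),
    ('W1', 'well', 'Contaminated: Chemical', 0),
    ('W2', 'well', 'Contaminated: Biological', 0),
    ('T1', 'shared_tap', NULL, 95),
    ('B1', 'tap_in_home_broken', NULL, 0);
""")

improvements = dict(conn.execute("""
SELECT
    source_id,
    CASE
        WHEN type_of_water_source = 'river' THEN 'Drill well'
        WHEN results = 'Contaminated: Chemical' THEN 'Install RO filter'
        WHEN results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN type_of_water_source = 'shared_tap' AND time_in_queue >= 30
            -- Integer division in SQLite plays the role of FLOOR(time_in_queue / 30).
            THEN 'Install ' || (time_in_queue / 30) || ' taps nearby'
        WHEN type_of_water_source = 'tap_in_home_broken'
            THEN 'Diagnose local infrastructure'
        ELSE NULL
    END AS improvement
FROM candidate_sources
""").fetchall())
conn.close()

for source_id, improvement in improvements.items():
    print(source_id, "->", improvement)
```

In MySQL, a `SELECT` of this shape could plausibly feed an `INSERT INTO Project_progress (...) SELECT ...` statement so that the table has rows before any `UPDATE` is run. Treat that as one possible approach, not the only one.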

### Final Insights
Sokoto: Many people rely on rivers. Prioritize drilling wells here.</br>

Amanzi/Amina: Most in-home taps are broken → Fix infrastructure.</br>

Shared taps: Queue times exceed 30 mins → Add new taps.</br>

Wells: Many are biologically or chemically unsafe → Filter installation needed.

 1. Contaminated Wells

In [None]:
%%sql
UPDATE Project_progress
SET Improvement = (
  SELECT CASE
    WHEN results = 'Contaminated: Biological' THEN 'Install UV and RO filter'
    WHEN results = 'Contaminated: Chemical' THEN 'Install RO filter'
    ELSE NULL
  END
  FROM well_pollution
  WHERE well_pollution.source_id = Project_progress.source_id
)
WHERE Source_type = 'well';

2. Shared Taps With Long Queues — install more taps</br>
Install 1 tap for every 30 minutes of queue time.

In [None]:
%%sql
UPDATE Project_progress
SET Improvement = (
  SELECT CONCAT('Install ', FLOOR(time_in_queue / 30), ' taps nearby')
  FROM visits
  WHERE visits.source_id = Project_progress.source_id
    AND visits.visit_count = 1  -- one row per source keeps the subquery scalar
)
WHERE Source_type = 'shared_tap'
  AND (
    SELECT time_in_queue
    FROM visits
    WHERE visits.source_id = Project_progress.source_id
      AND visits.visit_count = 1
  ) >= 30;

3. Broken In-Home Taps — diagnose infrastructure

In [None]:
%%sql
UPDATE Project_progress
SET Improvement = 'Diagnose local infrastructure'
WHERE Source_type = 'tap_in_home_broken';

 4. Rivers — drill a well

In [None]:
%%sql
UPDATE Project_progress
SET Improvement = 'Drill well'
WHERE Source_type = 'river';