## INTEGRATED PROJECT PART IV - `Charting the course for Maji Ndogo's water future`: From analysis to action

## Unveiling the Water Crisis in Maji Ndogo

## 1. Introduction

In this final part of the project, we finalise our data analysis using the full suite of SQL tools. We will gain our final insights, use these to classify water sources, and prepare relevant data for our engineering teams.

First, let's load our sample database:

In [2]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook.
%load_ext sql

In [3]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password and 'db_name' with your database name.
# If you get an error here, check that the database name and password are correct.

%sql mysql+pymysql://root:**********@LOCALHOST:3306/md_water_services

### 2. GETTING USED TO THE PROJECT
**Slides: 1-3**

### Information from **Aziza Naledi**

> **Last week**, we uncovered the corruption of our field workers, and I want to thank the team for bringing this to my attention. As you all know, I have no tolerance for people who put themselves first at the cost of everyone else, so I have taken the necessary steps!  
>  
> Our journey continues, as we aim to convert our data into actionable knowledge. Understanding the situation is one thing, but it’s the translation of that understanding into informed decisions that will truly make a difference.  
>  
> As we step into this next phase, you will be shaping our raw data into meaningful views — providing essential information to decision-makers. This will enable us to discern the materials we need, plan our budgets, and identify the areas requiring immediate attention. We’re not just analysing data; we’re making it speak in a language that everyone involved in this mission can understand and act upon.  
>  
> Lastly, we’ll be creating job lists for our engineers. Their expertise will be invaluable in tackling the challenges we face, but they can only do their job effectively when they have clear, data-driven directions.  
>  
> Remember, each step you take in this process contributes to a larger goal — the transformation of Maji Ndogo. Your diligence and dedication are instrumental in shaping a brighter future for our community. Thank you for being part of this journey.  
>  
> **All the best,**  
> *Aziza*


> <span style="color:red; font-weight:bold; font-size:1.2em;">CORRUPT LEADERS ARRESTED:</span>  
> - <span style="color:red; font-weight:bold;">Bello Azibo</span>  
> - <span style="color:blue; font-weight:bold;">Malachi Mavuso</span>  
> - <span style="color:green; font-weight:bold;">Lalitha Kaburi</span>  
> - <span style="color:purple; font-weight:bold;">Zuriel Matembo</span>  


<div style="text-align: center; border: 4px solid #cc0000; background-color: #ffe0e0; padding: 15px; border-radius: 12px;">

<img src="CORRUPT%20LEADERS%20DETAINED.png" alt="Corrupt Leaders Detained" width="500">

</div>



<div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">

## <span style="color:#006400;">THE LAST BIT OF ANALYSIS</span>  
### <span style="color:#228B22;">SUMMARY OF THE DATA WE NEED AND WHERE TO GET IT</span>  

**<span style="color:#000080;">SLIDE 4-10</span>**  

- <span style="color:#8B0000;">All of the information about the `location` of a water source is in the `location` table</span>, specifically the **<span style="color:#2F4F4F;">town</span>** and **<span style="color:#2F4F4F;">province</span>** of that water source.  
- **<span style="color:#006400;">water_source</span>** has the type of source and the `number of people served` by each source.  
- **<span style="color:#008B8B;">visits</span>** has queue information, and connects **source_id** to **location_id**. Some sites were visited more than once, so we need to be careful not to include duplicate data (**visit_count > 1**).  
- **<span style="color:#B22222;">well_pollution</span>** has information about the `quality of water` for wells only, so we need to keep that in mind when we join this table.  

</div>



Previously, we couldn't link `provinces` and `towns` to the `type of water sources`, the `number of people served` by those sources, `queue times`, or `pollution data`, but we can now. So, what type of relationships can we look at?

 <div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">

##  <span style="color:#006400;">Questions to be answered</span>

1. <span style="color:#8B0000">Are there any specific `provinces` or `towns` where some sources are `more abundant`?</span>  
   
2. <span style="color:#8B0000">We identified that <b style="color:#1E90FF;">tap_in_home_broken taps</b> are easy wins.</span>  
   <span style="color:#8B0000">Are there any `towns` where this is a particular problem?</span>  

</div>


***To answer question 1:***
1. We will need `province_name` and `town_name` from the `location` table.


In [4]:
%%sql
SELECT
    *
FROM
    LOCATION
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


location_id,address,province_name,town_name,location_type
AkHa00000,2 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00001,10 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00002,9 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00003,139 Addis Ababa Road,Akatsi,Harare,Urban
AkHa00004,17 Addis Ababa Road,Akatsi,Harare,Urban


2. We also need to know `type_of_water_source` and `number_of_people_served` from the `water_source` table.

In [5]:
%%sql
SELECT
    *
FROM
    water_source
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


source_id,type_of_water_source,number_of_people_served
AkHa00000224,tap_in_home,956
AkHa00001224,tap_in_home_broken,930
AkHa00002224,tap_in_home_broken,486
AkHa00003224,well,364
AkHa00004224,tap_in_home_broken,942


In [6]:
%%sql
SELECT
    *
FROM
    visits
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


record_id,location_id,source_id,time_of_record,visit_count,time_in_queue,assigned_employee_id
0,SoIl32582,SoIl32582224,2021-01-01 09:10:00,1,15,12
1,KiRu28935,KiRu28935224,2021-01-01 09:17:00,1,0,46
2,HaRu19752,HaRu19752224,2021-01-01 09:36:00,1,62,40
3,AkLu01628,AkLu01628224,2021-01-01 09:53:00,1,0,1
4,AkRu03357,AkRu03357224,2021-01-01 10:11:00,1,28,14


<div style="background-color:#e6ffe6; padding:10px; border-radius:8px; color:green;">
The problem is that the location table uses <code>location_id</code> while <code>water_source</code> only has <code>source_id</code>. So we won't be able to join these tables directly. But the <code>visits</code> table maps <code>location_id</code> and <code>source_id</code>. So if we use <code>visits</code> as the table we query from, we can join <code>location</code> where the <code>location_id</code> matches, and <code>water_source</code> where the <code>source_id</code> matches.
</div>




In [7]:
%%sql
SELECT
    lo.province_name,
    lo.town_name,
    wa.type_of_water_source,
    wa.number_of_people_served,
    lo.location_id,
    vi.visit_count
FROM 
    visits as vi
JOIN 
    location as lo
ON lo.location_id=vi.location_id
JOIN 
   water_source as wa
ON vi.source_id=wa.source_id
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,town_name,type_of_water_source,number_of_people_served,location_id,visit_count
Akatsi,Harare,tap_in_home,956,AkHa00000,1
Akatsi,Harare,tap_in_home_broken,930,AkHa00001,1
Akatsi,Harare,tap_in_home_broken,486,AkHa00002,1
Akatsi,Harare,well,364,AkHa00003,1
Akatsi,Harare,tap_in_home_broken,942,AkHa00004,1


<div style="background-color:#e6ffe6; padding:10px; border-radius:8px; color:green;">
Note that there are rows where <code>visit_count &gt; 1</code>. These are repeat visits in which our surveyors collected additional information at the same source and location. To see an example, add this to the query: <code>WHERE vi.location_id = 'AkHa00103'</code> (the query aliases <code>visits</code> as <code>vi</code>, so the alias must be used).
</div>



In [8]:
%%sql
SELECT
    lo.province_name,
    lo.town_name,
    wa.type_of_water_source,
    wa.number_of_people_served,
    lo.location_id,
    vi.visit_count
FROM 
    visits as vi
JOIN 
    location as lo
ON lo.location_id=vi.location_id
JOIN 
   water_source as wa
ON vi.source_id=wa.source_id
WHERE vi.location_id = 'AkHa00103'
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
8 rows affected.


province_name,town_name,type_of_water_source,number_of_people_served,location_id,visit_count
Akatsi,Harare,shared_tap,3340,AkHa00103,1
Akatsi,Harare,shared_tap,3340,AkHa00103,2
Akatsi,Harare,shared_tap,3340,AkHa00103,3
Akatsi,Harare,shared_tap,3340,AkHa00103,4
Akatsi,Harare,shared_tap,3340,AkHa00103,5
Akatsi,Harare,shared_tap,3340,AkHa00103,6
Akatsi,Harare,shared_tap,3340,AkHa00103,7
Akatsi,Harare,shared_tap,3340,AkHa00103,8


<div style="background-color:#d9f2d9; padding:10px; border-radius:8px; color:#1a4d1a;">
The single location <code>AkHa00103</code> has multiple records, one per visit. If we aggregate, every repeat visit is included, so the people served by that source are counted several times and our results will be incorrect. To fix this, we select only the rows where <code>visit_count = 1</code>.
</div>
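To see how much repeat visits can distort an aggregate, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL, and the tables are cut down to the columns that matter. The figure of 3340 people for `AkHa00103` comes from the output above; the source IDs, the other row, and the choice to reproduce only three of the eight visits are illustrative assumptions.

```python
import sqlite3

# In-memory SQLite database standing in for md_water_services (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE water_source (source_id TEXT PRIMARY KEY, number_of_people_served INTEGER);
CREATE TABLE visits (location_id TEXT, source_id TEXT, visit_count INTEGER);
-- Source IDs below are hypothetical; only the 3340 figure is from the notebook output.
INSERT INTO water_source VALUES ('src_shared_tap', 3340), ('src_home_tap', 956);
INSERT INTO visits VALUES
    ('AkHa00103', 'src_shared_tap', 1),
    ('AkHa00103', 'src_shared_tap', 2),
    ('AkHa00103', 'src_shared_tap', 3),
    ('AkHa00000', 'src_home_tap', 1);
""")

query = """
SELECT SUM(ws.number_of_people_served)
FROM visits AS v
JOIN water_source AS ws ON ws.source_id = v.source_id
{where}
"""

# Without a filter, the shared tap is counted once per visit.
inflated = conn.execute(query.format(where="")).fetchone()[0]
# Keeping only the first visit counts each source exactly once.
correct = conn.execute(query.format(where="WHERE v.visit_count = 1")).fetchone()[0]

print(inflated, correct)
```

The unfiltered sum counts the shared tap three times, while the filtered sum counts every source once, which is the behaviour we want before aggregating.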


In [9]:
%%sql
SELECT
    lo.province_name,
    lo.town_name,
    wa.type_of_water_source,
    wa.number_of_people_served,
    lo.location_id,
    vi.visit_count
FROM 
    visits as vi
JOIN 
    location as lo
ON lo.location_id=vi.location_id
JOIN 
   water_source as wa
ON vi.source_id=wa.source_id
WHERE vi.visit_count = 1
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,type_of_water_source,number_of_people_served,location_id,visit_count
Sokoto,Ilanga,river,402,SoIl32582,1
Kilimani,Rural,well,252,KiRu28935,1
Hawassa,Rural,shared_tap,542,HaRu19752,1
Akatsi,Lusaka,well,210,AkLu01628,1
Akatsi,Rural,shared_tap,2598,AkRu03357,1
Kilimani,Rural,river,862,KiRu29315,1
Akatsi,Rural,tap_in_home_broken,496,AkRu05234,1
Kilimani,Rural,tap_in_home,562,KiRu28520,1
Hawassa,Zanzibar,well,308,HaZa21742,1
Amanzi,Dahabu,tap_in_home,556,AmDa12214,1


<div style="background: linear-gradient(135deg, #0f9b0f, #f0f0c9); padding: 20px; border-radius: 12px; color: #1b1b1b; font-size: 1.1em; line-height: 1.6;">

### ✅ Step: Cleaning Up the Results

Now that we have **verified** the table is joined correctly,  
we can make the following adjustments to our final output:

- **Add** the `location_type` column from `location`  
- **Add** the `time_in_queue` column from `visits`  
- **Remove** the `location_id` column  
- **Remove** the `visit_count` column  

This will ensure **cleaner**, more **insightful** results, containing only the necessary data for analysis and presentation.

</div>



In [10]:
%%sql
SELECT
    lo.province_name,
    lo.town_name,
    wa.type_of_water_source,
    lo.location_type,
    wa.number_of_people_served,
    vi.time_in_queue
   
FROM 
    visits as vi
JOIN 
    location as lo
ON lo.location_id=vi.location_id
JOIN 
   water_source as wa
ON vi.source_id=wa.source_id
WHERE vi.visit_count = 1
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,type_of_water_source,location_type,number_of_people_served,time_in_queue
Sokoto,Ilanga,river,Urban,402,15
Kilimani,Rural,well,Rural,252,0
Hawassa,Rural,shared_tap,Rural,542,62
Akatsi,Lusaka,well,Urban,210,0
Akatsi,Rural,shared_tap,Rural,2598,28
Kilimani,Rural,river,Rural,862,9
Akatsi,Rural,tap_in_home_broken,Rural,496,0
Kilimani,Rural,tap_in_home,Rural,562,0
Hawassa,Zanzibar,well,Urban,308,0
Amanzi,Dahabu,tap_in_home,Urban,556,0


<div style="background: linear-gradient(135deg, #0f9b0f, #f0f0c9); padding: 20px; border-radius: 12px; color: #1b1b1b; font-size: 1.1em; line-height: 1.6;">

### 💧 Step: Adding Well Pollution Data

Now we need to **include results** from the `well_pollution` table.

This part is a bit trickier because:

- The `well_pollution` table **only** contains data for wells.
- If we use a regular `JOIN` (inner join),  
  we will only keep records **present in both** `well_pollution` **and** `visits`.
- This would **exclude** all non-well sources from our results.

✅ **Solution:** Use a **LEFT JOIN** to bring in pollution data **only where it exists** for well sources.  
- For non-well sources, pollution data will appear as **`NULL`**.  
- This ensures we **keep all visit records** while enriching well sources with pollution details.

💡 Experiment with other `JOIN` types (`INNER JOIN`, `RIGHT JOIN`)  
to **see the difference** and understand **why** `LEFT JOIN` is the best choice here. Note that MySQL does not support `FULL OUTER JOIN`.

</div>
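The difference between the two join types can be sketched on three visits taken from the `visits` output above, only one of which has a matching `well_pollution` record. This is a minimal Python illustration using in-memory SQLite as a stand-in for MySQL, with the tables reduced to the columns needed; it is not the project query itself.

```python
import sqlite3

# In-memory SQLite database standing in for md_water_services (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (source_id TEXT, visit_count INTEGER);
CREATE TABLE well_pollution (source_id TEXT, results TEXT);
INSERT INTO visits VALUES
    ('SoIl32582224', 1),   -- river
    ('KiRu28935224', 1),   -- well
    ('HaRu19752224', 1);   -- shared tap
INSERT INTO well_pollution VALUES ('KiRu28935224', 'Contaminated: Biological');
""")

# INNER JOIN keeps only sources that appear in both tables, i.e. wells.
inner_rows = conn.execute("""
    SELECT v.source_id, wp.results
    FROM visits AS v
    JOIN well_pollution AS wp ON wp.source_id = v.source_id
    ORDER BY v.source_id
""").fetchall()

# LEFT JOIN keeps every visit; non-well sources get NULL (None in Python).
left_rows = conn.execute("""
    SELECT v.source_id, wp.results
    FROM visits AS v
    LEFT JOIN well_pollution AS wp ON wp.source_id = v.source_id
    ORDER BY v.source_id
""").fetchall()

print(len(inner_rows), len(left_rows))
```

The inner join returns one row and silently drops the river and the shared tap, whereas the left join returns all three visits with `None` where no pollution record exists.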


In [11]:
%%sql
SELECT 
    *
FROM
    well_pollution
limit 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


source_id,date,description,pollutant_ppm,biological,results
KiRu28935224,2021-01-04 09:17:00,Bacteria: Giardia Lamblia,0.0,495.898,Contaminated: Biological
AkLu01628224,2021-01-04 09:53:00,Bacteria: E. coli,0.0,6.09608,Contaminated: Biological
HaZa21742224,2021-01-04 10:37:00,"Inorganic contaminants: Zinc, Zinc, Lead, Cadmium",2.715,0.0,Contaminated: Chemical
HaRu19725224,2021-01-04 11:04:00,Clean,0.0288593,9.56996e-05,Clean
SoRu35703224,2021-01-04 11:29:00,Bacteria: E. coli,0.0,22.5009,Contaminated: Biological


In [14]:
%%sql
SELECT
    lo.province_name,
    lo.town_name,
    wa.type_of_water_source,
    lo.location_type,
    wa.number_of_people_served,
    vi.time_in_queue,
    we.results
   
FROM 
    visits as vi
JOIN 
    location as lo
ON lo.location_id=vi.location_id
JOIN 
   water_source as wa
ON vi.source_id=wa.source_id
LEFT JOIN
    well_pollution as we
ON we.source_id = vi.source_id
WHERE vi.visit_count = 1
limit 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,type_of_water_source,location_type,number_of_people_served,time_in_queue,results
Sokoto,Ilanga,river,Urban,402,15,
Kilimani,Rural,well,Rural,252,0,Contaminated: Biological
Hawassa,Rural,shared_tap,Rural,542,62,
Akatsi,Lusaka,well,Urban,210,0,Contaminated: Biological
Akatsi,Rural,shared_tap,Rural,2598,28,
Kilimani,Rural,river,Rural,862,9,
Akatsi,Rural,tap_in_home_broken,Rural,496,0,
Kilimani,Rural,tap_in_home,Rural,562,0,
Hawassa,Zanzibar,well,Urban,308,0,Contaminated: Chemical
Amanzi,Dahabu,tap_in_home,Urban,556,0,


**COUNT OUR OUTPUT RECORDS**

In [15]:
%%sql
With combined_analysis_table as (
SELECT
    lo.province_name,
    lo.town_name,
    wa.type_of_water_source,
    lo.location_type,
    wa.number_of_people_served,
    vi.time_in_queue,
    we.results
   
FROM 
    visits as vi
JOIN 
    location as lo
ON lo.location_id=vi.location_id
JOIN 
   water_source as wa
ON vi.source_id=wa.source_id
LEFT JOIN
    well_pollution as we
ON we.source_id = vi.source_id
WHERE vi.visit_count = 1)

SELECT 
    COUNT(*)
FROM
     combined_analysis_table
limit 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
1 rows affected.


COUNT(*)
39650


<div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">
<h1 style="color:#0f9b0f;"><em>Optionally</em>: the same query without table aliases</h1>
</div>



In [16]:
%%sql

SELECT
    water_source.type_of_water_source,
    location.town_name,
    location.province_name,
    location.location_type,
    water_source.number_of_people_served,
    visits.time_in_queue,
    well_pollution.results
FROM
    visits
LEFT JOIN
    well_pollution
ON well_pollution.source_id = visits.source_id
INNER JOIN
    location
ON location.location_id = visits.location_id
INNER JOIN
    water_source
ON water_source.source_id = visits.source_id
    WHERE
visits.visit_count = 1

limit 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


type_of_water_source,town_name,province_name,location_type,number_of_people_served,time_in_queue,results
river,Ilanga,Sokoto,Urban,402,15,
well,Rural,Kilimani,Rural,252,0,Contaminated: Biological
shared_tap,Rural,Hawassa,Rural,542,62,
well,Lusaka,Akatsi,Urban,210,0,Contaminated: Biological
shared_tap,Rural,Akatsi,Rural,2598,28,
river,Rural,Kilimani,Rural,862,9,
tap_in_home_broken,Rural,Akatsi,Rural,496,0,
tap_in_home,Rural,Kilimani,Rural,562,0,
well,Zanzibar,Hawassa,Urban,308,0,Contaminated: Chemical
tap_in_home,Dahabu,Amanzi,Urban,556,0,


In [17]:
%%sql
with combined_analysis_table as (
SELECT
    water_source.type_of_water_source,
    location.town_name,
    location.province_name,
    location.location_type,
    water_source.number_of_people_served,
    visits.time_in_queue,
    well_pollution.results
FROM
    visits
LEFT JOIN
    well_pollution
ON well_pollution.source_id = visits.source_id
INNER JOIN
    location
ON location.location_id = visits.location_id
INNER JOIN
    water_source
ON water_source.source_id = visits.source_id
    WHERE
visits.visit_count = 1)


SELECT 
    COUNT(*)
FROM
     combined_analysis_table;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
1 rows affected.


COUNT(*)
39650


<div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">
<h1 style="color:#0f9b0f;">MAKE IT A <code>VIEW</code> OR <code>CTE</code> TO MAKE ANALYSIS EASIER</h1>
</div>


In [20]:
%%sql
DROP VIEW IF EXISTS combined_analysis_table;
CREATE VIEW combined_analysis_table AS(

SELECT
    water_source.type_of_water_source AS source_type,
    location.town_name,
    location.province_name,
    location.location_type,
    water_source.number_of_people_served AS people_served,
    visits.time_in_queue,
    well_pollution.results
FROM
    visits
LEFT JOIN
    well_pollution
ON well_pollution.source_id = visits.source_id
INNER JOIN
    location
ON location.location_id = visits.location_id
INNER JOIN
    water_source
ON water_source.source_id = visits.source_id
WHERE
    visits.visit_count = 1);

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
0 rows affected.
0 rows affected.


[]

In [21]:
%%sql
SELECT
    *
FROM
    combined_analysis_table
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


source_type,town_name,province_name,location_type,people_served,time_in_queue,results
river,Ilanga,Sokoto,Urban,402,15,
well,Rural,Kilimani,Rural,252,0,Contaminated: Biological
shared_tap,Rural,Hawassa,Rural,542,62,
well,Lusaka,Akatsi,Urban,210,0,Contaminated: Biological
shared_tap,Rural,Akatsi,Rural,2598,28,
river,Rural,Kilimani,Rural,862,9,
tap_in_home_broken,Rural,Akatsi,Rural,496,0,
tap_in_home,Rural,Kilimani,Rural,562,0,
well,Zanzibar,Hawassa,Urban,308,0,Contaminated: Chemical
tap_in_home,Dahabu,Amanzi,Urban,556,0,


<div style="background: linear-gradient(135deg, #0f9b0f, #f0f0c9); padding: 20px; border-radius: 12px; color: #1b1b1b; font-size: 1.1em; line-height: 1.6;">

### <strong>The Last Analysis</strong>
**SLIDES 11-22**

We're building another *pivot table*! This time, we want to break down our data by **province or town** and **source type**.  
If we understand where the problems are, and what we need to improve at those locations, we can make an informed decision on where to send our repair teams.

</div>


<div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">
 <h2 style="color:#0f9b0f;"><strong>1. FIND <code>Province_totals</code> for each province</strong></h2>
</div>



In [22]:
%%sql
SELECT
    province_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name
limit 10;


 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,total_ppl_serv
Sokoto,5774434
Kilimani,6584764
Hawassa,3843810
Akatsi,5993306
Amanzi,5431826


<div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">
 <h2 style="color:#0f9b0f;"><strong>2. Create a <code>Pivot table</code> for each province to show the <code>percentage of each type of water source per province</code></strong></h2>

 1. Turn the previous query into a CTE named `Province_totals`.
 2. Join `combined_analysis_table` to `Province_totals`.
 3. Calculate the `percentage of people served by each source type per province`, rounding the percentages.
 4. Start with the rows where the type of water source is `river`, and sum the people served.
 5. Divide that sum by the province total.
 6. Multiply by 100 to get the `source pct per province`.
 7. Do the same for all the other sources.

</div>

In [23]:
%%sql
##To do this, make the previous query a cte as Province_totals
## join the combined_analysis_table and the Province_totals 
with Province_totals as(
SELECT
    province_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name)

SELECT
    *
FROM
    combined_analysis_table as ct
JOIN
    province_totals as pt 
ON ct.province_name = pt.province_name
limit 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


source_type,town_name,province_name,location_type,people_served,time_in_queue,results,province_name_1,total_ppl_serv
well,Kintampo,Akatsi,Urban,246,0,Contaminated: Chemical,Akatsi,5993306
tap_in_home_broken,Kintampo,Akatsi,Urban,910,0,,Akatsi,5993306
well,Kintampo,Akatsi,Urban,326,0,Clean,Akatsi,5993306
tap_in_home,Kintampo,Akatsi,Urban,440,0,,Akatsi,5993306
well,Kintampo,Akatsi,Urban,296,0,Clean,Akatsi,5993306
tap_in_home,Kintampo,Akatsi,Urban,628,0,,Akatsi,5993306
well,Kintampo,Akatsi,Urban,222,0,Clean,Akatsi,5993306
tap_in_home,Kintampo,Akatsi,Urban,562,0,,Akatsi,5993306
well,Kintampo,Akatsi,Urban,394,0,Contaminated: Biological,Akatsi,5993306
tap_in_home_broken,Kintampo,Akatsi,Urban,316,0,,Akatsi,5993306


In [24]:
%%sql
## Start with where the type of water source is `river`
with Province_totals as(
SELECT
    province_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name)

SELECT
    ct.province_name,
    ROUND((SUM(CASE 
                    WHEN source_type = 'river' THEN people_served 
                    ELSE 0 
                END)/ pt.total_ppl_serv * 100.0),0 ) AS river
FROM
combined_analysis_table  as ct
JOIN
province_totals as pt 
ON ct.province_name = pt.province_name
GROUP BY
ct.province_name
ORDER BY
ct.province_name

limit 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,river
Akatsi,5
Amanzi,3
Hawassa,4
Kilimani,8
Sokoto,21


In [25]:
%%sql
##Do the same for all the other sources. 
with Province_totals as(
SELECT
    province_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name)

SELECT
    ct.province_name,
    ROUND((SUM(CASE 
                    WHEN source_type = 'river' THEN people_served 
                    ELSE 0 
                END)/ pt.total_ppl_serv * 100.0),0 ) AS river,
    ROUND((SUM(CASE 
                    WHEN source_type = 'shared_tap' THEN people_served 
                    ELSE 0 
                END) * 100.0 / pt.total_ppl_serv), 0) AS shared_tap,
    ROUND((SUM(CASE WHEN source_type = 'tap_in_home'
                    THEN people_served ELSE 0 
                END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home,
    ROUND((SUM(CASE WHEN source_type = 'tap_in_home_broken'
                    THEN people_served ELSE 0 
                END) * 100.0 / pt.total_ppl_serv), 0) AS tap_in_home_broken,
    ROUND((SUM(CASE WHEN source_type = 'well'
                    THEN people_served ELSE 0 
                END) * 100.0 / pt.total_ppl_serv), 0) AS well

FROM
combined_analysis_table  as ct
JOIN
province_totals as pt 
ON ct.province_name = pt.province_name
GROUP BY
ct.province_name
ORDER BY
ct.province_name

limit 6;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Akatsi,5,49,14,10,23
Amanzi,3,38,28,24,7
Hawassa,4,43,15,15,24
Kilimani,8,47,13,12,20
Sokoto,21,38,16,10,15
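
The conditional-aggregation pattern used in the pivot can be checked by hand on a tiny data set. The following is a minimal sketch in Python with in-memory SQLite standing in for MySQL; the `combined` table, the provinces `A` and `B`, and all the numbers are hypothetical. One difference is worth flagging: SQLite performs integer division on integers, so the sketch multiplies by `100.0` before dividing, whereas MySQL's `/` operator already returns a decimal.

```python
import sqlite3

# In-memory SQLite database with a hypothetical, cut-down combined table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combined (province_name TEXT, source_type TEXT, people_served INTEGER);
INSERT INTO combined VALUES
    ('A', 'river', 100), ('A', 'well', 300),
    ('B', 'river', 20),  ('B', 'well', 30), ('B', 'well', 150);
""")

rows = conn.execute("""
    WITH province_totals AS (
        SELECT province_name, SUM(people_served) AS total_ppl_serv
        FROM combined
        GROUP BY province_name
    )
    SELECT
        c.province_name,
        -- Sum only the river rows, then express them as a share of the total.
        ROUND(SUM(CASE WHEN c.source_type = 'river' THEN c.people_served ELSE 0 END)
              * 100.0 / pt.total_ppl_serv, 0) AS river,
        ROUND(SUM(CASE WHEN c.source_type = 'well' THEN c.people_served ELSE 0 END)
              * 100.0 / pt.total_ppl_serv, 0) AS well
    FROM combined AS c
    JOIN province_totals AS pt ON pt.province_name = c.province_name
    GROUP BY c.province_name, pt.total_ppl_serv
    ORDER BY c.province_name
""").fetchall()

for row in rows:
    print(row)
```

Each row's percentages add up to 100 here because every source type has its own column; with rounding on real data the sum can be off by a point or two.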


## Observable patterns
1. `Sokoto` has the largest proportion of people drinking river water (21%). We should send our drilling equipment to <span style="color:#1f77b4; font-weight:bold;">Sokoto</span> first, so people can drink safe filtered water from a well.
2. The majority of water in <span style="color:#ff7f0e; font-weight:bold;">Amanzi</span> comes from taps, but roughly `half of the home taps` don't work (24% broken against 28% working) because the infrastructure is `broken`.  
3. We need to send out engineering teams to look at the infrastructure in <span style="color:#ff7f0e; font-weight:bold;">Amanzi</span> first. Fixing a `large pump, treatment plant or reservoir` means that thousands of people will have running water. They will also no longer have to queue for water, so we improve two things at once.


<div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">
 <h2 style="color:#0f9b0f;"><strong>3. Aggregating the data per <code>town</code></strong></h2>
</div>

<div style="background-color:#e6f5e5; padding:15px; border-radius:10px;">
Recall that there are two towns in <span style="color:#2E86C1; font-weight:bold;">Maji Ndogo</span> called <span style="color:#C0392B; font-weight:bold;">Harare</span>.  
One is in <span style="color:#27AE60; font-weight:bold;">Akatsi</span>, and one is in <span style="color:#8E44AD; font-weight:bold;">Kilimani</span>.  
<span style="color:#D35400; font-weight:bold;">Amina</span> is another example.  

If we aggregate by town name alone, SQL cannot distinguish between the two <span style="color:#C0392B; font-weight:bold;">Harares</span>, so it combines their results.
- To get around that, we group by <span style="color:#2E86C1; font-weight:bold;">province</span> first, then by <span style="color:#D35400; font-weight:bold;">town</span>, so that towns sharing a name stay distinct because they are in different provinces.

</div>
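
The effect of the duplicate town names can be sketched on a few rows. This is a minimal Python illustration with in-memory SQLite standing in for MySQL; the `combined` table and the numbers of people served are hypothetical, and only the province and town names come from the text above.

```python
import sqlite3

# In-memory SQLite database with a hypothetical, cut-down combined table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE combined (province_name TEXT, town_name TEXT, people_served INTEGER);
INSERT INTO combined VALUES
    ('Akatsi',   'Harare', 100),
    ('Akatsi',   'Harare', 200),
    ('Kilimani', 'Harare', 400);
""")

# Grouping by town name alone merges the two different Harares.
by_town = conn.execute("""
    SELECT town_name, SUM(people_served)
    FROM combined
    GROUP BY town_name
""").fetchall()

# Grouping by province and town keeps them apart.
by_province_and_town = conn.execute("""
    SELECT province_name, town_name, SUM(people_served)
    FROM combined
    GROUP BY province_name, town_name
    ORDER BY province_name, town_name
""").fetchall()

print(by_town)
print(by_province_and_town)
```

The first query returns a single `Harare` row with the combined total, while the second returns one row per province, which is why the town-level queries group and join on both columns.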


In [26]:
%%sql
SELECT
    province_name,
    town_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name, town_name
ORDER BY province_name
limit 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,total_ppl_serv
Akatsi,Harare,419920
Akatsi,Kintampo,403222
Akatsi,Lusaka,568068
Akatsi,Rural,4602096
Amanzi,Abidjan,373650
Amanzi,Amina,458596
Amanzi,Asmara,834026
Amanzi,Bello,385324
Amanzi,Dahabu,747662
Amanzi,Pwani,497522


<div style="background-color:#e6f5e5; padding:15px; border-radius:10px;">

## ***To confirm this, check that the town totals add up to the province total***

</div>

In [27]:
%%sql
with town_totals as (
SELECT
    province_name,
    town_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name, town_name
ORDER BY province_name)

SELECT 
    Province_name,
    sum(total_ppl_serv) 
FROM
    town_totals
where province_name ='Akatsi'
limit 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
1 rows affected.


Province_name,sum(total_ppl_serv)
Akatsi,5993306


<div style="background-color:#e6f5e6; padding:15px; border-radius:10px;">
 <h2 style="color:#0f9b0f;"><strong>4. Create a <code>Pivot table</code> for each town in each province to show the <code>percentage of each type of water source per town</code></strong></h2>

 1. Turn the previous query into a CTE named `town_totals`.
 2. Join `combined_analysis_table` to `town_totals`.
 3. Calculate the `percentage of people served by each source type per town`, rounding the percentages.
 4. Start with the rows where the type of water source is `river`, and sum the people served.
 5. Divide that sum by the town total.
 6. Multiply by 100 to get the `source pct per town`.
 7. Do the same for all the other sources.

</div>

In [28]:
%%sql
## make the previous query a CTE as town_totals
## join the combined_analysis_table and the town_totals

with town_totals as(
SELECT
    province_name,
    town_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name, town_name
ORDER BY province_name,town_name)

SELECT
    ct.province_name,
    ct.town_name,
    tt.total_ppl_serv
FROM
    combined_analysis_table as ct
JOIN
    town_totals as tt 

## Since the town names are not unique, we have to join on a composite key
ON ct.province_name = tt.province_name AND ct.town_name = tt.town_name

GROUP BY  ## We group by province first, then by town.
    ct.province_name,
    ct.town_name
ORDER BY
     ct.province_name,ct.town_name
limit 7;


 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
7 rows affected.


province_name,town_name,total_ppl_serv
Akatsi,Harare,419920
Akatsi,Kintampo,403222
Akatsi,Lusaka,568068
Akatsi,Rural,4602096
Amanzi,Abidjan,373650
Amanzi,Amina,458596
Amanzi,Asmara,834026


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">

## ***Calculating the `percentage of people served` by each `source type per town`, rounding the percentages***

</div>

In [31]:
%%sql
## where source is river 

with town_totals as(
SELECT
    province_name,
    town_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name, town_name
ORDER BY province_name,town_name)

SELECT
    ct.province_name,
    ct.town_name,
    ROUND((SUM(CASE 
                    WHEN source_type = 'river' THEN people_served 
                    ELSE 0
                END)/ tt.total_ppl_serv * 100.0),0 ) AS river
FROM
    combined_analysis_table as ct
JOIN
    town_totals as tt 

## Since the town names are not unique, we have to join on a composite key
ON ct.province_name = tt.province_name AND ct.town_name = tt.town_name

GROUP BY  ## We group by province first, then by town.
    ct.province_name,
    ct.town_name
ORDER BY
     RIVER DESC
limit 7;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
7 rows affected.


province_name,town_name,river
Sokoto,Rural,22
Sokoto,Bahari,21
Sokoto,Kofi,20
Sokoto,Cheche,19
Sokoto,Majengo,18
Sokoto,Marang,17
Sokoto,Ilanga,16


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">

## ***Calculating the `percentage of people served` by each `source type per town` for all sources, then rounding the percentages***

</div>

In [32]:
%%sql
## for all other sources
with town_totals as(
SELECT
    province_name,
    town_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name, town_name
ORDER BY province_name,town_name)

SELECT
    ct.province_name,
    ct.town_name,
    ROUND((SUM(CASE 
                    WHEN source_type = 'river' THEN people_served 
                    ELSE 0 
                END)/ tt.total_ppl_serv * 100.0),0 ) AS river,
     ROUND((SUM(CASE 
                    WHEN source_type = 'shared_tap' THEN people_served 
                    ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS shared_tap,
    ROUND((SUM(CASE WHEN source_type = 'tap_in_home'
                    THEN people_served ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home,
    ROUND((SUM(CASE WHEN source_type = 'tap_in_home_broken'
                    THEN people_served ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home_broken,
    ROUND((SUM(CASE WHEN source_type = 'well'
                    THEN people_served ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS well
FROM
    combined_analysis_table as ct
JOIN
    town_totals as tt 

## Since the town names are not unique, we have to join on a composite key
ON ct.province_name = tt.province_name AND ct.town_name = tt.town_name

GROUP BY  ## We group by province first, then by town.
    ct.province_name,
    ct.town_name
ORDER BY
     ct.province_name,ct.town_name
limit 10;



 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Akatsi,Harare,2,17,28,27,27
Akatsi,Kintampo,2,15,31,26,26
Akatsi,Lusaka,2,17,28,28,26
Akatsi,Rural,6,59,9,5,22
Amanzi,Abidjan,2,53,22,19,4
Amanzi,Amina,8,24,3,56,9
Amanzi,Asmara,3,49,24,20,4
Amanzi,Bello,3,53,20,22,3
Amanzi,Dahabu,3,37,55,1,4
Amanzi,Pwani,3,53,20,21,4


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">

## ***Saving the result as `temporary table` for quicker access.***

> - Temporary tables in SQL are a nice way to `store the results of a complex query`. We run the query once, and the results are stored as a table. The catch? When you close the database connection, MySQL deletes the table, so you have to run the query again each time you start a new session. The benefit is that `we can use the table to do more calculations, without running the whole query each time`.
> - We add this to the start of our query: `CREATE TEMPORARY TABLE town_aggregated_water_access`
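
The session-scoped lifetime of a temporary table can be seen in a small experiment. This is a sketch only, assuming SQLite (via Python's `sqlite3`) behaves like MySQL in this one respect; the table names `visits_toy` and `town_totals_toy` and their rows are invented for illustration.

```python
import os
import sqlite3
import tempfile

# A throwaway database file, so that we can reconnect to the same database.
db_path = os.path.join(tempfile.mkdtemp(), "toy.db")

# First session: build a permanent table and a temporary summary table.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE visits_toy (town TEXT, people INTEGER)")
conn.executemany(
    "INSERT INTO visits_toy VALUES (?, ?)",
    [("Amina", 10), ("Amina", 30), ("Dahabu", 5)],
)
conn.execute(
    "CREATE TEMPORARY TABLE town_totals_toy AS "
    "SELECT town, SUM(people) AS total FROM visits_toy GROUP BY town"
)
conn.commit()
# The temporary table can be queried like any other table in this session.
first_session = conn.execute(
    "SELECT town, total FROM town_totals_toy ORDER BY town"
).fetchall()
conn.close()

# Second session: the permanent table is still there, the temporary one is not.
conn = sqlite3.connect(db_path)
try:
    conn.execute("SELECT * FROM town_totals_toy")
    temp_table_survived = True
except sqlite3.OperationalError:
    temp_table_survived = False
permanent_rows = conn.execute("SELECT COUNT(*) FROM visits_toy").fetchone()[0]
conn.close()

print(first_session, temp_table_survived, permanent_rows)
```

This is why the notebook cell that creates `town_aggregated_water_access` has to be re-run after every reconnect.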


In [33]:
%%sql
DROP TEMPORARY TABLE IF EXISTS town_aggregated_water_access;
CREATE TEMPORARY TABLE town_aggregated_water_access

## for all other sources
with town_totals as(
SELECT
    province_name,
    town_name,
    SUM(people_served) AS total_ppl_serv
FROM
    combined_analysis_table
GROUP BY
    province_name, town_name
ORDER BY province_name,town_name)

SELECT
    ct.province_name,
    ct.town_name,
    ROUND((SUM(CASE 
                    WHEN source_type = 'river' THEN people_served 
                    ELSE 0 
                END)/ tt.total_ppl_serv * 100.0),0 ) AS river,
     ROUND((SUM(CASE 
                    WHEN source_type = 'shared_tap' THEN people_served 
                    ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS shared_tap,
    ROUND((SUM(CASE WHEN source_type = 'tap_in_home'
                    THEN people_served ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home,
    ROUND((SUM(CASE WHEN source_type = 'tap_in_home_broken'
                    THEN people_served ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS tap_in_home_broken,
    ROUND((SUM(CASE WHEN source_type = 'well'
                    THEN people_served ELSE 0 
                END) * 100.0 / tt.total_ppl_serv), 0) AS well
FROM
    combined_analysis_table as ct
JOIN
    town_totals as tt 

##−− Since the town names are not unique, we have to join on a composite key
ON ct.province_name = tt.province_name AND ct.town_name = tt.town_name

GROUP BY  ## We group by province first, then by town.
    ct.province_name,
    ct.town_name
ORDER BY
     ct.province_name,ct.town_name
;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
0 rows affected.
31 rows affected.


[]

In [34]:
%%sql
SELECT * FROM town_aggregated_water_access LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Akatsi,Harare,2,17,28,27,27
Akatsi,Kintampo,2,15,31,26,26
Akatsi,Lusaka,2,17,28,28,26
Akatsi,Rural,6,59,9,5,22
Amanzi,Abidjan,2,53,22,19,4
Amanzi,Amina,8,24,3,56,9
Amanzi,Asmara,3,49,24,20,4
Amanzi,Bello,3,53,20,22,3
Amanzi,Dahabu,3,37,55,1,4
Amanzi,Pwani,3,53,20,21,4


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">

## ***Ordering the result in `desc` for quicker analysis and comparison.***

In [35]:
%%sql
SELECT * FROM town_aggregated_water_access 
ORDER BY river desc
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Sokoto,Rural,22,49,8,8,13
Sokoto,Bahari,21,11,36,12,20
Sokoto,Kofi,20,16,34,10,20
Sokoto,Cheche,19,16,35,12,18
Sokoto,Majengo,18,14,36,12,20


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">
    
## INSIGHTS
This confirms what we saw at the provincial level:
- People are drinking river water in Sokoto.
- Some of our citizens are `forced to drink unsafe water from a river`, while a `lot of people have running water` in their homes in `Sokoto`.
- Large disparities in water access like this often show that the `wealth distribution in Sokoto is very unequal`.
- We should mention this in our report.
- We should also `send our drilling teams to Sokoto` first to drill some wells for the people who are `drinking river water`, specifically in the `rural` parts and the city of `Bahari`.


In [36]:
%%sql
SELECT * FROM town_aggregated_water_access 
ORDER BY shared_tap desc
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Kilimani,Zuri,8,71,6,11,4
Akatsi,Rural,6,59,9,5,22
Kilimani,Rural,9,55,8,9,19
Amanzi,Bello,3,53,20,22,3
Amanzi,Pwani,3,53,20,21,4


In [37]:
%%sql
SELECT * FROM town_aggregated_water_access 
ORDER BY tap_in_home desc
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Amanzi,Dahabu,3,37,55,1,4
Sokoto,Majengo,18,14,36,12,20
Sokoto,Bahari,21,11,36,12,20
Sokoto,Ilanga,16,12,36,15,21
Sokoto,Cheche,19,16,35,12,18


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">
    
## SORTING THE DATA BY `tap_in_home_broken` to look at the data for `Amina` town in `Amanzi` province

In [38]:
%%sql
SELECT * FROM town_aggregated_water_access 
ORDER BY tap_in_home_broken desc
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Amanzi,Amina,8,24,3,56,9
Amanzi,Rural,3,27,30,30,10
Akatsi,Lusaka,2,17,28,28,26
Akatsi,Harare,2,17,28,27,27
Akatsi,Kintampo,2,15,31,26,26


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">
    
## INSIGHTS
- In `Amina`, only `3%` of citizens have access to running tap water in their homes. `More than half of the people (56%)` in Amina have taps installed in their homes, but they are not working.
> - We should send out teams to go and `fix the infrastructure` in Amina first.
> - Fixing taps in `people's homes` means those people don't have to `queue for water anymore`, so the queues in Amina will also get `shorter!`


In [39]:
%%sql
SELECT * FROM town_aggregated_water_access 
ORDER BY well desc
LIMIT 5;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
5 rows affected.


province_name,town_name,river,shared_tap,tap_in_home,tap_in_home_broken,well
Hawassa,Amina,2,14,19,24,42
Hawassa,Zanzibar,0,22,22,17,40
Hawassa,Yaounde,2,14,22,23,38
Hawassa,Deka,3,16,23,21,38
Hawassa,Djenne,3,18,19,23,36


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">
    
# Unveiling the Hidden Gems in this table
## 1. Which town has the `highest ratio of people who have taps, but have no running water?`


In [41]:
%%sql
SELECT
    province_name,
    town_name,
    ROUND(tap_in_home_broken / (tap_in_home_broken + tap_in_home) * 100,0) AS Pct_broken_taps
FROM
    town_aggregated_water_access
ORDER BY Pct_broken_taps DESC
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,Pct_broken_taps
Amanzi,Amina,95
Kilimani,Zuri,65
Hawassa,Amina,56
Hawassa,Djenne,55
Kilimani,Rural,53
Amanzi,Bello,52
Hawassa,Yaounde,51
Amanzi,Pwani,51
Hawassa,Serowe,50
Akatsi,Lusaka,50
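
As a sanity check, the ratio can be recomputed by hand from the rounded percentages already shown in `town_aggregated_water_access`. The snippet below is a small sketch; the helper name `pct_broken_taps` is hypothetical, and the inputs are the `tap_in_home` and `tap_in_home_broken` values copied from the tables above.

```python
# (tap_in_home, tap_in_home_broken) percentages copied from the tables above.
towns = {
    ("Amanzi", "Amina"): (3, 56),
    ("Amanzi", "Dahabu"): (55, 1),
    ("Kilimani", "Zuri"): (6, 11),
}


def pct_broken_taps(tap_in_home, tap_in_home_broken):
    """Share of installed home taps that are broken, as a whole percentage."""
    installed = tap_in_home + tap_in_home_broken
    if installed == 0:
        return None  # No home taps at all, so the ratio is undefined.
    return round(tap_in_home_broken / installed * 100)


results = {town: pct_broken_taps(*values) for town, values in towns.items()}
for town, pct in results.items():
    print(town, pct)
```

Because the inputs are already rounded to whole percentages, the ratio is an approximation; it is good enough for ranking towns, but not for exact reporting.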


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">
    
## INSIGHTS
- We can see that `Amina has infrastructure installed`, but almost `none of it is working`, and only the capital city, Dahabu, has water infrastructure that works.


In [42]:
%%sql
SELECT
    province_name,
    town_name,
    ROUND(tap_in_home_broken / (tap_in_home_broken + tap_in_home) * 100,0) AS Pct_broken_taps
FROM
    town_aggregated_water_access
ORDER BY Pct_broken_taps ASC
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


province_name,town_name,Pct_broken_taps
Amanzi,Dahabu,2
Sokoto,Kofi,23
Sokoto,Bahari,25
Sokoto,Majengo,25
Sokoto,Cheche,26
Sokoto,Ilanga,29
Sokoto,Marang,30
Akatsi,Rural,36
Kilimani,Amara,39
Kilimani,Harare,40


<div style="background-color:#e6a9e3; padding:15px; border-radius:10px;">
    
## INSIGHTS
- Dahabu's water infrastructure works.
> - Strangely enough, `all of the politicians of the past government lived in Dahabu`, so they made sure they had water. The point is, look how simple our query is now! It's like we're back at the beginning of our journey!


<div style="background-color:#a4f8fb; padding:15px; border-radius:10px;">
    
# **Summary Report**
## INSIGHTS  
## slide 23- 25
A couple of weeks ago, we found some interesting insights:

1. Most water sources are **<span style="color:blue;">rural</span>** in Maji Ndogo.  
2. **<span style="color:blue;">43%</span>** of our people are using **<span style="color:blue;">shared taps</span>** — **2000 people** often share one tap.  
3. **<span style="color:blue;">31%</span>** of our population has **<span style="color:blue;">water infrastructure in their homes</span>**, but within that group:  
   - **<span style="color:blue;">45%</span>** face **<span style="color:red;">non-functional systems</span>** due to issues with pipes, pumps, and reservoirs.  
   - Towns like **<span style="color:green;">Amina</span>**, the rural parts of **<span style="color:green;">Amanzi</span>**, and several towns across **<span style="color:black;">Akatsi</span>** and **<span style="color:green;">Hawassa</span>** have broken infrastructure.  
4. **<span style="color:blue;">18%</span>** of our people are using **<span style="color:blue;">wells</span>**, but only **<span style="color:blue;">28%</span>** are **<span style="color:green;">clean</span>**.  
   - These are mostly in **<span style="color:green;">Hawassa</span>**, **<span style="color:green;">Kilimani</span>**, and **<span style="color:black;">Akatsi</span>**.  
5. Citizens often face **<span style="color:red;">long wait times</span>** for water, averaging **more than 120 minutes**:  
   - Queues are **very long** on **<span style="color:blue;">Saturdays</span>**.  
   - Queues are **longer** in the **<span style="color:blue;">mornings</span>** and **<span style="color:blue;">evenings</span>**.  
   - **<span style="color:green;">Wednesdays</span>** and **<span style="color:green;">Sundays</span>** have the shortest queues.  


<span style="background-color:#e6a9e3; display:block; padding:12px; border-radius:8px;">

# **Plan of Action**

1. **Focus on high-impact sources**  
   - We want to focus our efforts on improving the water sources that `affect the most people`.
   - Most people will benefit if we improve the **shared taps** first.

2. **Improve well water quality**  
   - Wells are a good source of water, but many are **contaminated**. Fixing this will benefit a lot of people.

3. **Repair existing infrastructure**  
   - Restoring running water will mean people won’t have to queue, **reducing wait times** for others — solving two problems at once.

4. **Avoid low-priority upgrades**  
   - Installing **home taps** will stretch resources too thin. For now, if queue times are low, we won't improve that source.

5. **Plan for rural challenges**  
   - Most water sources are in rural areas. Our teams need to know this, because it means making repairs and upgrades in places where road conditions, supplies, and labour are harder challenges to overcome.





<div style="background-color:#a4f8fb;  border: 6px double blue; padding:15px;">

# **Practical solutions:**

1. If communities are using `rivers`, we will `dispatch trucks to those regions to provide water temporarily` in the short term, while we send out crews to drill for wells, providing a more permanent solution. **<span style="color:#0066cc;">Sokoto</span>** is the first province we will target.

2. If communities are using `wells`, we will `install filters` to purify the water. For chemically polluted wells, we can install **<span style=" color:#d6336c;">reverse osmosis (RO)</span>** filters, and for `wells with biological contamination`, we can install **<span style="color:#d6336c;">UV filters</span>** that `kill microorganisms` — but we should `install RO filters too`. In the long term, we must figure out why these sources are polluted.

3. For `shared taps`, in the short term, we can send additional water tankers to the busiest taps, on the busiest days. We can use the **queue time pivot table** we made to send tankers at the busiest times. Meanwhile, we can start the work on `installing extra taps` where they are needed. According to **<span style="color:#d6336c;">UN standards</span>**, the `maximum acceptable wait time` for water is **<span style="color:#d6336c;">30 minutes</span>**. With this in mind, our aim is to `install taps` to get queue times below **<span style="color:#d6336c;">30 min</span>**. Towns like **<span style="color:#0066cc;">Bello</span>**, **<span style="color:#0066cc;">Abidjan</span>** and **<span style="color:#0066cc;">Zuri</span>** have a lot of people using shared taps, so we will send out teams to those towns first.

4. `Shared taps` with short queue times (**<span style="color:#d6336c;">&lt; 30 min</span>**) represent a logistical challenge to further reduce waiting times. The most effective solution, `installing taps` in homes, is resource-intensive and better suited as a long-term goal.

5. Addressing `broken infrastructure` offers a significant impact even with a single intervention. It is expensive to fix, but many people can benefit from repairing one facility, for example `fixing a reservoir` or `pipe` that multiple taps are connected to. Towns like **<span style="color:#0066cc;">Amina</span>**, **<span style="color:#0066cc;">Lusaka</span>**, **<span style="color:#0066cc;">Zuri</span>**, **<span style="color:#0066cc;">Djenne</span>** and the rural parts of **<span style="color:#0066cc;">Amanzi</span>** seem to be good places to start.

</span>


<div style="border: 4px double red; padding: 10px; background-color: #fdf6f6;">
<h2 style="color: #2b6cb0; font-weight: bold;">A practical plan</h2>

## slide 26-33
    Our final goal is to implement our plan in the database.

    We have a plan to improve the water access in <span style="color: #2b6cb0;">Maji Ndogo</span>, so we need to think it through, and as our final task, 
1. we need to <code>create a table</code> where our teams have the information they need to <code>fix</code>, <code>upgrade</code> and <code>repair water sources</code>. 
2. They will need the addresses of the <code>places they should visit</code> (<span style="color: red;">street address</span>, <span style="color: blue;">town</span>, <span style="color: blue;">province</span>), the <code>type of water source</code> they should improve, and what should be done to improve it.<br><br>

3. We should also <code>make space for them in the database to update us</code> on their progress. 
4. We need to know if the <code>repair is complete</code>, and the <code>date it was completed</code>, and give them <code>space to upgrade the sources</code>.<br><br>

Let's call this table <span style="color: red;">Project_progress</span>.
</div>




In [None]:
%%sql
DROP TABLE IF EXISTS Project_progress;
CREATE TABLE Project_progress (
    Project_id SERIAL PRIMARY KEY,
    ## Project_id −− Unique key for sources in case we visit the same source more than once in the future.
    
    source_id VARCHAR(20) NOT NULL REFERENCES water_source(source_id) ON DELETE CASCADE ON UPDATE CASCADE,
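    ## source_id −− Each of the sources we want to improve should exist, and should refer to the source table. This is meant to ensure data integrity.
    ## Note: MySQL 8.x and earlier parse an inline REFERENCES clause but do not enforce it; enforcement needs a table-level FOREIGN KEY (source_id) REFERENCES water_source(source_id).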
    
    Address VARCHAR(50), ## Street address
    Town VARCHAR(30),
    Province VARCHAR(30),
    Source_type VARCHAR(50),
    Improvement VARCHAR(50), ## What the engineers should do at that place
    Source_status VARCHAR(50) DEFAULT 'Backlog' CHECK (Source_status IN ('Backlog', 'In progress', 'Complete')),
    ## Source_status −− We want to limit the type of information engineers can give us, so we limit Source_status.
    
    ##  By DEFAULT all projects are in the "Backlog" which is like a TODO list.
    ##CHECK() ensures only those three options will be accepted. This helps to maintain clean data.
    
    Date_of_completion DATE, ## Engineers will add this the day the source has been upgraded.
    Comments TEXT  ## Engineers can leave comments. We use the TEXT type, which holds up to 65,535 bytes in MySQL.
    );

In [43]:
%%sql
DROP TABLE IF EXISTS Project_progress; 
CREATE TABLE Project_progress (
    Project_id SERIAL PRIMARY KEY,
    source_id VARCHAR(20) NOT NULL REFERENCES water_source(source_id) ON DELETE CASCADE ON UPDATE CASCADE,
    Address VARCHAR(50),
    Town VARCHAR(30),
    Province VARCHAR(30),
    Source_type VARCHAR(50),
    pollution_results VARCHAR(150),
    Improvement VARCHAR(50),
    Source_status VARCHAR(50) DEFAULT 'Backlog' CHECK (Source_status IN ('Backlog', 'In progress', 'Complete')),
    Date_of_completion DATE,
    Comments TEXT
)
;


 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
0 rows affected.
0 rows affected.


[]
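
The effect of `DEFAULT 'Backlog'` and the `CHECK` constraint can be sketched on a cut-down copy of the table. This is an illustration under assumptions: it runs on SQLite through Python's `sqlite3` rather than on MySQL, the table name `project_progress_toy` is hypothetical, and only three of the columns are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for Project_progress (SQLite syntax, illustrative only).
conn.execute("""
    CREATE TABLE project_progress_toy (
        project_id INTEGER PRIMARY KEY AUTOINCREMENT,
        source_id TEXT NOT NULL,
        source_status TEXT DEFAULT 'Backlog'
            CHECK (source_status IN ('Backlog', 'In progress', 'Complete'))
    )
""")

# No status supplied: the DEFAULT puts the project in the backlog.
conn.execute("INSERT INTO project_progress_toy (source_id) VALUES ('AkHa00003224')")
default_status = conn.execute(
    "SELECT source_status FROM project_progress_toy"
).fetchone()[0]

# A status outside the allowed list is rejected by the CHECK constraint.
try:
    conn.execute(
        "INSERT INTO project_progress_toy (source_id, source_status) "
        "VALUES ('AkHa00009224', 'Done')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(default_status, rejected)
```

Restricting the status to three known values keeps later reporting simple, since a `GROUP BY Source_status` can never return unexpected spellings.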

<div style="border: 4px double PURPLE; padding: 10px; background-color: YELLOW;">
<h2 style="color: #2b6cb0; font-weight: bold;">Improvements</h2>

1. <span style="color: green; font-weight: bold;">Rivers</span> → Drill wells  
2. <span style="color: green; font-weight: bold;">wells</span>: if the well is contaminated with <span style="color: RED; font-weight: bold;">chemicals</span> → Install <span style="color: red; font-weight: bold;">RO filter</span>  
3. <span style="color: green; font-weight: bold;">wells</span>: if the well is contaminated with <span style="color: RED; font-weight: bold;">biological contaminants</span> → Install <span style="color: red; font-weight: bold;">UV</span> and <span style="color: red; font-weight: bold;">RO filter</span>  
4. <span style="color: green; font-weight: bold;">shared_taps</span>: if the queue is longer than <span style="color: blue; font-weight: bold;">30 min (30 min and above)</span> → Install <span style="color: red; font-weight: bold;">X taps</span> nearby where <span style="color: red; font-weight: bold;">X</span> = FLOOR(time_in_queue / 30).  
5. <span style="color: green; font-weight: bold;">tap_in_home_broken</span> → Diagnose local infrastructure

***For wells and shared taps we have some `IF logic`, so we should be thinking `CASE functions!` Let's take the various improvements one by one, then combine them into one query at the end.***


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>KEY STEPS-01</strong>

 Join the `location`, `visits`, and `well_pollution` tables to the `water_source` table. Since `well_pollution` only has data for wells, we join those records to the `water_source` table with a `LEFT JOIN`, and we use `visits` to link the various IDs together.

</div>
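
The reason for the `LEFT JOIN` can be seen on a three-row example. This is a sketch with invented data, run on SQLite through Python's `sqlite3`; the `_toy` table names are hypothetical stand-ins for `water_source` and `well_pollution`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE water_source_toy (source_id TEXT, type_of_water_source TEXT);
    CREATE TABLE well_pollution_toy (source_id TEXT, results TEXT);
    INSERT INTO water_source_toy VALUES
        ('S1', 'well'), ('S2', 'river'), ('S3', 'shared_tap');
    -- Only the well has a pollution record.
    INSERT INTO well_pollution_toy VALUES ('S1', 'Contaminated: Chemical');
""")

base = """
SELECT ws.source_id, ws.type_of_water_source, wp.results
FROM water_source_toy AS ws
{join} well_pollution_toy AS wp ON ws.source_id = wp.source_id
ORDER BY ws.source_id
"""
# LEFT JOIN keeps every source and fills missing pollution results with NULL.
left_rows = conn.execute(base.format(join="LEFT JOIN")).fetchall()
# INNER JOIN silently drops every source without a pollution record.
inner_rows = conn.execute(base.format(join="INNER JOIN")).fetchall()

print(left_rows)
print(inner_rows)
```

With an `INNER JOIN`, rivers, shared taps and broken home taps would vanish from the job list before any filtering happened.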



In [44]:
%%sql
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
LIMIT 10;


 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


address,town_name,province_name,source_id,type_of_water_source,results
2 Addis Ababa Road,Harare,Akatsi,AkHa00000224,tap_in_home,
10 Addis Ababa Road,Harare,Akatsi,AkHa00001224,tap_in_home_broken,
9 Addis Ababa Road,Harare,Akatsi,AkHa00002224,tap_in_home_broken,
139 Addis Ababa Road,Harare,Akatsi,AkHa00003224,well,Clean
17 Addis Ababa Road,Harare,Akatsi,AkHa00004224,tap_in_home_broken,
125 Addis Ababa Road,Harare,Akatsi,AkHa00005224,tap_in_home,
98 Addis Ababa Road,Harare,Akatsi,AkHa00006224,tap_in_home,
21 Addis Ababa Road,Harare,Akatsi,AkHa00007224,tap_in_home,
11 Addis Ababa Road,Harare,Akatsi,AkHa00008224,well,Clean
6 Addis Ababa Road,Harare,Akatsi,AkHa00009224,well,Contaminated: Biological


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>KEY STEPS -02</strong>

-  Filter the data to only contain `sources we want to improve` by thinking through the logic first:  
   1. Only records with `visit_count = 1` are allowed.  
   2. Any of the following rows can be included:  
      a. Where `shared taps have queue times of 30 min or more`.  
      b. Only `wells that are contaminated are allowed` — so we `exclude wells` that are **Clean**.  
      c. Include any `river` and `tap_in_home_broken` sources.

   3. Split up the logic into the `WHERE` and `CASE` clauses, as shown below.

      
![Key Steps Diagram](AS.png)
</div>
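
The filter rules listed above can be checked against a handful of invented rows before running them on the real tables. The sketch below assumes SQLite (via Python's `sqlite3`) and a hypothetical, already-joined table `joined_toy`; each row is labelled with the outcome the rules should produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE joined_toy (source_id TEXT, type_of_water_source TEXT, "
    "results TEXT, visit_count INTEGER, time_in_queue INTEGER)"
)
conn.executemany(
    "INSERT INTO joined_toy VALUES (?, ?, ?, ?, ?)",
    [
        ("T1", "shared_tap", None, 1, 62),               # long queue: keep
        ("T2", "shared_tap", None, 1, 12),               # short queue: drop
        ("T3", "shared_tap", None, 2, 90),               # repeat visit: drop
        ("W1", "well", "Clean", 1, 0),                   # clean well: drop
        ("W2", "well", "Contaminated: Chemical", 1, 0),  # polluted well: keep
        ("R1", "river", None, 1, 0),                     # river: keep
        ("H1", "tap_in_home_broken", None, 1, 0),        # broken home tap: keep
        ("H2", "tap_in_home", None, 1, 0),               # working home tap: drop
    ],
)

kept = [row[0] for row in conn.execute("""
    SELECT source_id
    FROM joined_toy
    WHERE visit_count = 1
      AND (
            (type_of_water_source = 'shared_tap' AND time_in_queue >= 30)
         OR (type_of_water_source = 'well' AND results != 'Clean')
         OR type_of_water_source IN ('tap_in_home_broken', 'river')
      )
    ORDER BY source_id
""")]
print(kept)
```

One difference to keep in mind: SQLite compares strings case-sensitively by default, so the literal must be spelled `'Clean'` exactly as stored, whereas MySQL's default collations are case-insensitive.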


In [45]:
%%sql
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE  visits.visit_count = 1
AND (
        (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
        OR (water_source.type_of_water_source = 'well' AND well_pollution.results != 'Clean')
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
    )
LIMIT 10;


 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


address,town_name,province_name,source_id,type_of_water_source,results
36 Pwani Mchangani Road,Ilanga,Sokoto,SoIl32582224,river,
129 Ziwa La Kioo Road,Rural,Kilimani,KiRu28935224,well,Contaminated: Biological
18 Mlima Tazama Avenue,Rural,Hawassa,HaRu19752224,shared_tap,
100 Mogadishu Road,Lusaka,Akatsi,AkLu01628224,well,Contaminated: Biological
26 Bahari Ya Faraja Road,Rural,Kilimani,KiRu29315224,river,
104 Kenyatta Street,Rural,Akatsi,AkRu05234224,tap_in_home_broken,
117 Kampala Road,Zanzibar,Hawassa,HaZa21742224,well,Contaminated: Chemical
55 Fennec Way,Rural,Sokoto,SoRu35008224,shared_tap,
52 Moroni Avenue,Rural,Sokoto,SoRu35703224,well,Contaminated: Biological
51 Addis Ababa Road,Harare,Akatsi,AkHa00070224,well,Contaminated: Chemical


In [46]:
%%sql
with project_progress as (

SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE  visits.visit_count = 1
AND (
        (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
        OR (water_source.type_of_water_source = 'well' AND well_pollution.results != 'Clean')
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
    ))
SELECT
    COUNT(*)
FROM
    project_progress
LIMIT 10;


 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
1 rows affected.


COUNT(*)
25398


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>KEY STEPS -02.1 Wells</strong> 
    
- We use some `control flow logic` to create `Install UV and RO filter` or `Install RO filter` values in the `Improvement` column where the `results` of the pollution tests were `Contaminated: Biological` and `Contaminated: Chemical` respectively. Think about the data you'll need, and which table to find it in.
- Use `ELSE NULL` for the `final` alternative.
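
The well rules can be tried in isolation first. The following sketch assumes SQLite through Python's `sqlite3` and an invented table `wells_toy`; it shows that rows matching neither `WHEN` branch fall through to `ELSE NULL`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wells_toy (source_id TEXT, type_of_water_source TEXT, results TEXT)"
)
conn.executemany(
    "INSERT INTO wells_toy VALUES (?, ?, ?)",
    [
        ("W1", "well", "Contaminated: Biological"),
        ("W2", "well", "Contaminated: Chemical"),
        ("R1", "river", None),  # Not a well, so no branch matches yet.
    ],
)

improvements = dict(conn.execute("""
    SELECT source_id,
           CASE
               WHEN type_of_water_source = 'well'
                    AND results = 'Contaminated: Biological'
                   THEN 'Install UV and RO filter'
               WHEN type_of_water_source = 'well'
                    AND results = 'Contaminated: Chemical'
                   THEN 'Install RO filter'
               ELSE NULL
           END AS improvement
    FROM wells_toy
""").fetchall())
print(improvements)
```

The `NULL` for the river row is expected at this stage; the later steps add the remaining branches.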


In [47]:
%%sql
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results,
    CASE
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Chemical' THEN 'Install RO filter'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE  visits.visit_count = 1
AND (
        (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
        OR (water_source.type_of_water_source = 'well' AND well_pollution.results != 'Clean')
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
    )
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


address,town_name,province_name,source_id,type_of_water_source,results,improvement
36 Pwani Mchangani Road,Ilanga,Sokoto,SoIl32582224,river,,
129 Ziwa La Kioo Road,Rural,Kilimani,KiRu28935224,well,Contaminated: Biological,Install UV and RO filter
18 Mlima Tazama Avenue,Rural,Hawassa,HaRu19752224,shared_tap,,
100 Mogadishu Road,Lusaka,Akatsi,AkLu01628224,well,Contaminated: Biological,Install UV and RO filter
26 Bahari Ya Faraja Road,Rural,Kilimani,KiRu29315224,river,,
104 Kenyatta Street,Rural,Akatsi,AkRu05234224,tap_in_home_broken,,
117 Kampala Road,Zanzibar,Hawassa,HaZa21742224,well,Contaminated: Chemical,Install RO filter
55 Fennec Way,Rural,Sokoto,SoRu35008224,shared_tap,,
52 Moroni Avenue,Rural,Sokoto,SoRu35703224,well,Contaminated: Biological,Install UV and RO filter
51 Addis Ababa Road,Harare,Akatsi,AkHa00070224,well,Contaminated: Chemical,Install RO filter


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>KEY STEPS -02.2 Rivers</strong> 
    
- For all water sources where `type_of_water_source` is `river`, we add `Drill well` to the `Improvement` column.  
- Drilling a well nearby gives people who currently rely on river water a cleaner source.
</div>


In [48]:
%%sql
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results,
    CASE
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Chemical' THEN 'Install RO filter'
        WHEN water_source.type_of_water_source='river' THEN 'Drill well'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE  visits.visit_count = 1
AND (
        (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
        OR (water_source.type_of_water_source = 'well' AND well_pollution.results != 'Clean')
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
    )
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


address,town_name,province_name,source_id,type_of_water_source,results,improvement
36 Pwani Mchangani Road,Ilanga,Sokoto,SoIl32582224,river,,Drill well
129 Ziwa La Kioo Road,Rural,Kilimani,KiRu28935224,well,Contaminated: Biological,Install UV and RO filter
18 Mlima Tazama Avenue,Rural,Hawassa,HaRu19752224,shared_tap,,
100 Mogadishu Road,Lusaka,Akatsi,AkLu01628224,well,Contaminated: Biological,Install UV and RO filter
26 Bahari Ya Faraja Road,Rural,Kilimani,KiRu29315224,river,,Drill well
104 Kenyatta Street,Rural,Akatsi,AkRu05234224,tap_in_home_broken,,
117 Kampala Road,Zanzibar,Hawassa,HaZa21742224,well,Contaminated: Chemical,Install RO filter
55 Fennec Way,Rural,Sokoto,SoRu35008224,shared_tap,,
52 Moroni Avenue,Rural,Sokoto,SoRu35703224,well,Contaminated: Biological,Install UV and RO filter
51 Addis Ababa Road,Harare,Akatsi,AkHa00070224,well,Contaminated: Chemical,Install RO filter


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>KEY STEPS - 02.3 Shared Taps</strong> 
    
- We use some `control flow logic` to create an `Install X taps` value in the `Improvement` column where the `type_of_water_source` is `shared_tap`.
- The number of taps to install is calculated as `FLOOR(time_in_queue / 30)`, which rounds **down** to the nearest whole number, so one tap is added for every full 30 minutes of queue time.
- For example: a queue time of `45 min` results in `1` tap, while `60 min` results in `2` taps.
- Use `ELSE NULL` for the `final` alternative.

</div>
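
The tap calculation is plain integer division, which a few lines of Python can illustrate. This is a minimal sketch; the helper name `taps_to_install` and the list of sample queue times are chosen for illustration.

```python
def taps_to_install(time_in_queue):
    """Mirror FLOOR(time_in_queue / 30) for a non-negative queue time in minutes."""
    return time_in_queue // 30


# Sample queue times in minutes, including values just below and at the threshold.
queue_times = [29, 30, 45, 60, 62, 240]
taps = {minutes: taps_to_install(minutes) for minutes in queue_times}
for minutes, count in taps.items():
    print(f"{minutes} min -> {count} taps")
```

A queue of 29 minutes yields zero taps, which is why the query also requires the result to be greater than zero before writing an improvement.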


In [49]:
%%sql
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results,
    CASE
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Chemical' THEN 'Install RO filter'
        WHEN water_source.type_of_water_source='river' THEN 'Drill well'
        WHEN water_source.type_of_water_source = 'shared_tap' AND FLOOR(visits.time_in_queue / 30) > 0 THEN CONCAT('Install ', FLOOR(visits.time_in_queue / 30), ' taps')
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE  visits.visit_count = 1
AND (
        (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
        OR (water_source.type_of_water_source = 'well' AND well_pollution.results != 'Clean')
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
    )
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


address,town_name,province_name,source_id,type_of_water_source,results,improvement
36 Pwani Mchangani Road,Ilanga,Sokoto,SoIl32582224,river,,Drill well
129 Ziwa La Kioo Road,Rural,Kilimani,KiRu28935224,well,Contaminated: Biological,Install UV and RO filter
18 Mlima Tazama Avenue,Rural,Hawassa,HaRu19752224,shared_tap,,Install 2 taps
100 Mogadishu Road,Lusaka,Akatsi,AkLu01628224,well,Contaminated: Biological,Install UV and RO filter
26 Bahari Ya Faraja Road,Rural,Kilimani,KiRu29315224,river,,Drill well
104 Kenyatta Street,Rural,Akatsi,AkRu05234224,tap_in_home_broken,,
117 Kampala Road,Zanzibar,Hawassa,HaZa21742224,well,Contaminated: Chemical,Install RO filter
55 Fennec Way,Rural,Sokoto,SoRu35008224,shared_tap,,Install 8 taps
52 Moroni Avenue,Rural,Sokoto,SoRu35703224,well,Contaminated: Biological,Install UV and RO filter
51 Addis Ababa Road,Harare,Akatsi,AkHa00070224,well,Contaminated: Chemical,Install RO filter


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>To confirm the number of taps</strong> 

In [51]:
%%sql
SELECT
    record_id,
    location_id,
    source_id,
    time_of_record,
    visit_count,
    time_in_queue,
    assigned_employee_id,
    FLOOR(time_in_queue / 30) AS taps_to_install
FROM visits
WHERE source_id IN ('SoRu35008224', 'HaRu19752224') AND visit_count = 1;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
2 rows affected.


record_id,location_id,source_id,time_of_record,visit_count,time_in_queue,assigned_employee_id,taps_to_install
2,HaRu19752,HaRu19752224,2021-01-01 09:36:00,1,62,40,2
12,SoRu35008,SoRu35008224,2021-01-01 11:04:00,1,240,1,8


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>KEY STEPS - 02.4 In-home Taps</strong> 
    
- We target `tap_in_home_broken` water sources.
- Broken in-home taps indicate **damaged local infrastructure**, which requires inspection.
- We extend the `CASE` expression so that the `Improvement` column reads `Diagnose_infrastructure` for these sources.
- Use `ELSE NULL` for the `final` alternative.
- After this step, every source that passes the `WHERE` filter should match one of the `WHEN` branches, so no `NULL` values should remain in the `Improvement` column.

</div>
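
To make the branch order easier to follow, here is a rough Python sketch of the same decision logic as the `CASE` statement. The function `suggest_improvement` and its parameters are hypothetical names used for illustration. The labels are copied from the query in this section, and the rules are assumed to be evaluated top to bottom, as in SQL.

```python
from typing import Optional


def suggest_improvement(source_type: str,
                        results: Optional[str] = None,
                        time_in_queue: int = 0) -> Optional[str]:
    """Return the improvement label for one source, or None if no rule applies."""
    if source_type == "well" and results == "Contaminated: Biological":
        return "Install UV and RO filter"
    if source_type == "well" and results == "Contaminated: Chemical":
        return "Install RO filter"
    if source_type == "river":
        return "Drill well"
    if source_type == "shared_tap" and time_in_queue // 30 > 0:
        return f"Install {time_in_queue // 30} taps"
    if source_type == "tap_in_home_broken":
        # Label as written in the query in this section.
        return "Diagnose_infrastructure"
    return None  # corresponds to ELSE NULL


print(suggest_improvement("tap_in_home_broken"))            # Diagnose_infrastructure
print(suggest_improvement("shared_tap", time_in_queue=62))  # Install 2 taps
print(suggest_improvement("well", results="Clean"))         # None
```

The last call shows why the `WHERE` filter matters: a source that matches no branch falls through to `NULL`, so it should be excluded before the data reaches the engineers.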


In [52]:
%%sql
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results,
    CASE
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Chemical' THEN 'Install RO filter'
        WHEN water_source.type_of_water_source='river' THEN 'Drill well'
        WHEN water_source.type_of_water_source = 'shared_tap' AND FLOOR(visits.time_in_queue / 30) > 0 THEN CONCAT('Install ', FLOOR(visits.time_in_queue / 30), ' taps')
        WHEN water_source.type_of_water_source='tap_in_home_broken' THEN 'Diagnose_infrastructure'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE  visits.visit_count = 1
AND (
        (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
        OR (water_source.type_of_water_source = 'well' AND well_pollution.results != 'clean')
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
    )
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


address,town_name,province_name,source_id,type_of_water_source,results,improvement
36 Pwani Mchangani Road,Ilanga,Sokoto,SoIl32582224,river,,Drill well
129 Ziwa La Kioo Road,Rural,Kilimani,KiRu28935224,well,Contaminated: Biological,Install UV and RO filter
18 Mlima Tazama Avenue,Rural,Hawassa,HaRu19752224,shared_tap,,Install 2 taps
100 Mogadishu Road,Lusaka,Akatsi,AkLu01628224,well,Contaminated: Biological,Install UV and RO filter
26 Bahari Ya Faraja Road,Rural,Kilimani,KiRu29315224,river,,Drill well
104 Kenyatta Street,Rural,Akatsi,AkRu05234224,tap_in_home_broken,,Diagnose_infrastructure
117 Kampala Road,Zanzibar,Hawassa,HaZa21742224,well,Contaminated: Chemical,Install RO filter
55 Fennec Way,Rural,Sokoto,SoRu35008224,shared_tap,,Install 8 taps
52 Moroni Avenue,Rural,Sokoto,SoRu35703224,well,Contaminated: Biological,Install UV and RO filter
51 Addis Ababa Road,Harare,Akatsi,AkHa00070224,well,Contaminated: Chemical,Install RO filter
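
One subtlety in the `WHERE` clause is worth noting. Because `well_pollution` is attached with a `LEFT JOIN`, a well with no pollution record would have `results = NULL`, and in SQL `NULL != 'clean'` evaluates to `NULL` rather than true. Such a row would be filtered out silently. The sketch below illustrates this with Python's built-in `sqlite3` module as a stand-in for MySQL. The NULL comparison behaves the same way in both, and whether any such wells exist in `md_water_services` is not checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with != yields NULL (None in Python), which WHERE treats as "not true".
null_cmp = conn.execute("SELECT NULL != 'clean'").fetchone()[0]

# A real, non-clean result compares as true (1).
contaminated_cmp = conn.execute(
    "SELECT 'Contaminated: Chemical' != 'clean'"
).fetchone()[0]

print(null_cmp)          # None
print(contaminated_cmp)  # 1
conn.close()
```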


<div style="background-color:#A5f7fb; padding:16px; border-radius:10px;">

# <strong>KEY STEPS – 06. Add Data to <code>Project_progress</code></strong> 
    
- Now that we have the cleaned and transformed data ready for the engineers, we **insert** it into the `Project_progress` table.
- Ensure the **column order** of your `SELECT` exactly matches the column list named in `INSERT INTO Project_progress (...)`; values are matched by position, not by name.
- If you accidentally insert incorrect data or mismatch columns, you can reset by running:  
  `DROP TABLE project_progress;`  and then re-running your table creation and insert queries.
- This step **finalizes** the pipeline, making the engineer-ready dataset available for review and action.

</div>
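
The positional matching in `INSERT INTO ... SELECT` can be seen in a small, self-contained sketch. It uses Python's built-in `sqlite3` module with toy tables. The `staging` table and the reduced `project_progress` definition are assumptions made for illustration, and the real table has more columns and constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for Project_progress with only a few of its columns.
conn.execute("""
    CREATE TABLE project_progress (
        Project_id INTEGER PRIMARY KEY AUTOINCREMENT,
        source_id TEXT,
        Address TEXT,
        Improvement TEXT
    )
""")
# Hypothetical staging table holding one row from the results above.
conn.execute("CREATE TABLE staging (address TEXT, source_id TEXT, improvement TEXT)")
conn.execute(
    "INSERT INTO staging VALUES ('55 Fennec Way', 'SoRu35008224', 'Install 8 taps')"
)

# The INSERT list order differs from the table's column order; each SELECT value
# is matched by position against this list, not against the table definition.
conn.execute("""
    INSERT INTO project_progress (Address, source_id, Improvement)
    SELECT address, source_id, improvement FROM staging
""")

row = conn.execute(
    "SELECT Project_id, source_id, Address, Improvement FROM project_progress"
).fetchone()
print(row)  # (1, 'SoRu35008224', '55 Fennec Way', 'Install 8 taps')
conn.close()
```

Swapping two columns in the `SELECT` without swapping them in the `INSERT` list would store addresses in `source_id` without raising an error here, which is why the order is worth double-checking.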


In [53]:
%%sql
INSERT INTO Project_progress (
    Address,
    Town,
    Province,
    source_id,
    Source_type,
    pollution_results,
    Improvement
)
SELECT
    location.address,
    location.town_name,
    location.province_name,
    water_source.source_id,
    water_source.type_of_water_source,
    well_pollution.results,
    CASE
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Biological' THEN 'Install UV and RO filter'
        WHEN water_source.type_of_water_source='well' AND well_pollution.results='Contaminated: Chemical' THEN 'Install RO filter'
        WHEN water_source.type_of_water_source='river' THEN 'Drill well'
        WHEN water_source.type_of_water_source = 'shared_tap' AND FLOOR(visits.time_in_queue / 30) > 0 THEN CONCAT('Install ', FLOOR(visits.time_in_queue / 30), ' taps')
        WHEN water_source.type_of_water_source='tap_in_home_broken' THEN 'Diagnose_infrastructure'
        ELSE NULL
    END AS improvement
FROM
    water_source
LEFT JOIN
    well_pollution ON water_source.source_id = well_pollution.source_id
INNER JOIN
    visits ON water_source.source_id = visits.source_id
INNER JOIN
    location ON location.location_id = visits.location_id
WHERE  visits.visit_count = 1
AND (
        (water_source.type_of_water_source = 'shared_tap' AND visits.time_in_queue >= 30)
        OR (water_source.type_of_water_source = 'well' AND well_pollution.results != 'clean')
        OR water_source.type_of_water_source IN ('tap_in_home_broken', 'river')
    )

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
25398 rows affected.


[]

In [54]:
%%sql
SELECT * FROM md_water_services.project_progress
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
10 rows affected.


Project_id,source_id,Address,Town,Province,Source_type,pollution_results,Improvement,Source_status,Date_of_completion,Comments
1,SoIl32582224,36 Pwani Mchangani Road,Ilanga,Sokoto,river,,Drill well,Backlog,,
2,KiRu28935224,129 Ziwa La Kioo Road,Rural,Kilimani,well,Contaminated: Biological,Install UV and RO filter,Backlog,,
3,HaRu19752224,18 Mlima Tazama Avenue,Rural,Hawassa,shared_tap,,Install 2 taps,Backlog,,
4,AkLu01628224,100 Mogadishu Road,Lusaka,Akatsi,well,Contaminated: Biological,Install UV and RO filter,Backlog,,
5,KiRu29315224,26 Bahari Ya Faraja Road,Rural,Kilimani,river,,Drill well,Backlog,,
6,AkRu05234224,104 Kenyatta Street,Rural,Akatsi,tap_in_home_broken,,Diagnose_infrastructure,Backlog,,
7,HaZa21742224,117 Kampala Road,Zanzibar,Hawassa,well,Contaminated: Chemical,Install RO filter,Backlog,,
8,SoRu35008224,55 Fennec Way,Rural,Sokoto,shared_tap,,Install 8 taps,Backlog,,
9,SoRu35703224,52 Moroni Avenue,Rural,Sokoto,well,Contaminated: Biological,Install UV and RO filter,Backlog,,
10,AkHa00070224,51 Addis Ababa Road,Harare,Akatsi,well,Contaminated: Chemical,Install RO filter,Backlog,,


In [55]:
%%sql
SELECT * FROM md_water_services.project_progress
WHERE Improvement IS NULL
LIMIT 10;

 * mysql+pymysql://root:***@LOCALHOST:3306/md_water_services
0 rows affected.


Project_id,source_id,Address,Town,Province,Source_type,pollution_results,Improvement,Source_status,Date_of_completion,Comments


<div style="border: 4px double PURPLE; padding: 10px; background-color: YELLOW;">
<h2 style="color: #2b6cb0; font-weight: bold;">THANK YOU ALL</h2>

Finally, thank you for sticking with me through this project. I know there were some tough times: window functions, `JOIN`s, and even corruption! I'm so glad you struggled through it. The Academy does its best to show you how SQL works, but it is only when you start solving problems like this that you truly understand how to use this tool to answer data questions.

I heard you are meeting up with our visualisation expert, Dalila, soon. She mentored me when I joined the team, so I'm sure you will learn a lot from her!

My friend, Pula! In Maji Ndogo, it means "rain" and signifies blessings and prosperity.

<h2 style="color: #2b6cb0; font-weight: bold;">I hope we talk soon.</h2>
<h2 style="color: #2b6cb0; font-weight: bold;">Take care!</h2>


<div style="text-align:center;">
  <img src="picture2.jpg"
       alt="Profile"
       style="width:180px; height:180px; border-radius:50%; object-fit:cover;
              border:5px solid #cc0000; box-shadow:0 4px 10px rgba(0,0,0,0.15);" />
</div>
</div>
