# Weaving the data threads of Maji Ndogo's narrative
Maji Ndogo: From analysis to action

In this stage, we integrate the auditor's report into the database. The report holds independently measured water quality scores and statements from locals, which we can compare against the records our employees collected.


# Overview:

#### 1. Generating an ERD:
Understanding the database structure

#### 2. Integrating the report:
Adding the auditor report to our database

#### 3. Linking records:
Joining employee data to the report

#### 4. Gathering evidence:
Building a complex query seeking truth

### Connecting to our MySQL database

Since we have a MySQL database, we can connect to it from the notebook using the SQL extension and the pymysql driver.

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook.

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace `password` with your connection password.

%sql mysql+pymysql://root:password@localhost:3306/md_water_services

'Connected: root@md_water_services'
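Typing a password directly into a notebook cell makes it easy to leak when the notebook is shared. The sketch below shows one way to assemble the same connection string from an environment variable instead. It is not part of the original notebook: the variable name `MD_WATER_DB_PASSWORD` and the helper `build_connection_url` are assumptions made for illustration.

```python
import os
from urllib.parse import quote_plus


def build_connection_url(password, user="root", host="localhost",
                         port=3306, database="md_water_services"):
    """Build a mysql+pymysql URL, percent-encoding special characters in the password."""
    return f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"


# MD_WATER_DB_PASSWORD is a hypothetical environment variable name.
# The fallback "password" is only a placeholder so the sketch runs anywhere.
password = os.environ.get("MD_WATER_DB_PASSWORD", "password")
connection_url = build_connection_url(password)
print(connection_url)
```

Percent-encoding matters because characters such as `!` or `@` in a password can otherwise be misread as part of the URL structure.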

# 1. Generating an ERD:
    
Before adding the ERD, it is essential to review the auditor's report and address any issues it highlights. This ensures that the ERD reflects the current state of the database, including any changes that follow from the audit findings.

The ERD gives us a visual representation of the database structure and its relationships. It serves as a reference for future analysis, decision-making, and communication with stakeholders.

Let's add the ERD and continue turning Maji Ndogo's narrative into actionable insights and positive change.

![Image Description](ERD.png)


# 2. Integrating the Auditor's report

We begin by dropping the auditor_report table if it already exists and creating a new table with the specified columns. Then we import the auditor.csv file into it.

In [3]:
%%sql

DROP TABLE IF EXISTS `auditor_report`;
CREATE TABLE `auditor_report` (
  `location_id` VARCHAR(32),
  `type_of_water_source` VARCHAR(64),
  `true_water_source_score` INT DEFAULT NULL,
  `statements` VARCHAR(255)
);

 * mysql+pymysql://root:***@localhost:3306/md_water_services
0 rows affected.
0 rows affected.


[]
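The import step itself is not shown in a cell, so here is a minimal sketch of what loading the CSV into the new table could look like. It makes two loud assumptions: `sqlite3` stands in for MySQL so the example runs on its own, and an in-memory string with two rows copied from the report stands in for the real `auditor.csv` file.

```python
import csv
import io
import sqlite3

# Two sample rows copied from the auditor report shown in this notebook.
# A real run would open auditor.csv instead of this in-memory string.
SAMPLE_CSV = """location_id,type_of_water_source,true_water_source_score,statements
SoRu34980,well,1,"Residents admired the official's commitment to enhancing urban life, praising their cooperative and inclusive approach."
AkLu02044,river,0,"Villagers were touched by the official's interactions, noting their humility, strong work ethic, and respectful attitude."
"""

# sqlite3 is used only to keep the sketch self-contained (assumption);
# the notebook itself talks to MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE auditor_report (
        location_id VARCHAR(32),
        type_of_water_source VARCHAR(64),
        true_water_source_score INT DEFAULT NULL,
        statements VARCHAR(255)
    )
    """
)

# Parse the CSV by header name and convert the score column to an integer.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = [
    (
        r["location_id"],
        r["type_of_water_source"],
        int(r["true_water_source_score"]),
        r["statements"],
    )
    for r in reader
]
conn.executemany("INSERT INTO auditor_report VALUES (?, ?, ?, ?)", rows)
conn.commit()

imported = conn.execute("SELECT COUNT(*) FROM auditor_report").fetchone()[0]
print(imported)
```

Using the `csv` module rather than splitting on commas matters here, because the statements column contains commas inside quoted text.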

##### Let's view what we just imported.

In [6]:
%%sql

SELECT *
FROM auditor_report
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
10 rows affected.


location_id,type_of_water_source,true_water_source_score,statements
SoRu34980,well,1,"Residents admired the official's commitment to enhancing urban life, praising their cooperative and inclusive approach."
AkRu08112,well,3,"Villagers spoke highly of the official's dedication and genuine interest in their lives, fostering a sense of belonging and appreciation."
AkLu02044,river,0,"Villagers were touched by the official's interactions, noting their humility, strong work ethic, and respectful attitude."
AkHa00421,well,3,"Villagers were moved by the official's visit, praising their hard work, humility, and the profound sense of connection they fostered."
SoRu35221,river,0,"A photographer's lens captures the queue, though his own struggle for water is a hidden part of the story."
HaAm16170,well,1,"With an open heart, the official created an atmosphere of unity and familial camaraderie among the villagers."
AkRu04812,well,3,"The official's presence left an indelible mark, reflecting their humility, dedication, and the genuine connections they nurtured."
AkRu08304,well,3,"The official's interactions resonated deeply with the villagers, leaving a lasting impression of respect and camaraderie."
AkRu05107,well,2,"Villagers spoke highly of the official's dedication and genuine interest in their lives, fostering a sense of belonging and appreciation."
AkRu05215,well,3,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."


Each row has a location_id, the type of water source at that location, and a quality score for the source that has now been independently measured. Our auditor also investigated each site by speaking to a few locals, and their statements are captured in his results.

We need to tackle a couple of questions here.

 1. Is there a difference in the scores?

 2. If so, are there patterns?


#### 1. Is there a difference in the scores?

For this question, we have to compare the quality scores in the water_quality table to the auditor's scores. The auditor_report table uses location_id, but the water_quality table only has a record_id. The visits table contains both location_id and record_id, so we can use it to link auditor_report to water_quality.

##### So first, we grab the location_id and true_water_source_score columns from auditor_report

In [7]:
%%sql

SELECT
    location_id,
    true_water_source_score
FROM
    auditor_report
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


location_id,true_water_source_score
SoRu34980,1
AkRu08112,3
AkLu02044,0
AkHa00421,3
SoRu35221,0


##### Now, we join the visits table to the auditor_report table, making sure to grab record_id and location_id. The subjective_quality_score lives in water_quality, so it comes in with the next join.

In [9]:
%%sql

SELECT
    auditor_report.location_id AS audit_location,
    auditor_report.true_water_source_score,
    visits.location_id AS visit_location,
    visits.record_id
FROM
    auditor_report
JOIN
    visits
    ON auditor_report.location_id = visits.location_id
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


audit_location,true_water_source_score,visit_location,record_id
SoRu34980,1,SoRu34980,5185
AkRu08112,3,AkRu08112,59367
AkLu02044,0,AkLu02044,37379
AkHa00421,3,AkHa00421,51627
SoRu35221,0,SoRu35221,28758


##### Now we retrieve the corresponding scores from the water_quality table, performing another join operation using the record_id as the connecting key

In [10]:
%%sql

SELECT
  auditor_report.location_id AS audit_location,
  auditor_report.true_water_source_score,
  visits.location_id AS visit_location,
  visits.record_id,
  water_quality.subjective_quality_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
5 rows affected.


audit_location,true_water_source_score,visit_location,record_id,subjective_quality_score
SoRu34980,1,SoRu34980,5185,1
AkRu08112,3,AkRu08112,59367,3
AkLu02044,0,AkLu02044,37379,0
AkHa00421,3,AkHa00421,51627,3
SoRu35221,0,SoRu35221,28758,0


##### Clean up the resulting table by removing duplicate columns and renaming them for clarity

In [11]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
10 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
AkRu05215,21160,3,10


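##### Ok, let's analyse! A good starting point is to check whether the auditor's and employees' scores agree. We will use a WHERE clause to check if employee_score = auditor_score. Because column aliases cannot be used in a WHERE clause, the condition refers to the original column names.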

In [13]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score = water_quality.subjective_quality_score

LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2


##### Without the LIMIT, this query returns 2505 rows. Some of the locations were visited multiple times, so their records are duplicated here. To fix this, we add visits.visit_count = 1 to the WHERE clause.

In [16]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score = water_quality.subjective_quality_score
  AND visits.visit_count = 1
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


location_id,record_id,auditor_score,employee_score
SoRu34980,5185,1,1
AkRu08112,59367,3,3
AkLu02044,37379,0,0
AkHa00421,51627,3,3
SoRu35221,28758,0,0
HaAm16170,31048,1,1
AkRu04812,1513,3,3
AkRu08304,1218,3,3
AkRu05107,8322,2,2
HaDe16541,13070,2,2
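The duplication comes from the join itself: every visit to a location produces its own joined row. The sketch below reproduces that effect on a tiny made-up dataset (one hypothetical location, `LocA`, visited twice), with `sqlite3` standing in for MySQL so it can run on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auditor_report (location_id TEXT, true_water_source_score INT);
    CREATE TABLE visits (record_id INT, location_id TEXT, visit_count INT);
    CREATE TABLE water_quality (record_id INT, subjective_quality_score INT);

    -- Hypothetical data: one audited location that was visited twice.
    INSERT INTO auditor_report VALUES ('LocA', 3);
    INSERT INTO visits VALUES (1, 'LocA', 1), (2, 'LocA', 2);
    INSERT INTO water_quality VALUES (1, 3), (2, 3);
    """
)

base_query = """
    SELECT COUNT(*)
    FROM auditor_report
    JOIN visits ON auditor_report.location_id = visits.location_id
    JOIN water_quality ON visits.record_id = water_quality.record_id
    WHERE auditor_report.true_water_source_score = water_quality.subjective_quality_score
"""

# Without the filter, the single audited location appears once per visit.
all_visits = conn.execute(base_query).fetchone()[0]
# Restricting to the first visit leaves one row per audited location.
first_visits = conn.execute(base_query + " AND visits.visit_count = 1").fetchone()[0]
print(all_visits, first_visits)  # 2 1
```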


With the duplicates removed, we now get 1518 matching records. What does this mean, considering the auditor visited 1620 sites? That is an excellent result: 1518/1620 ≈ 94% of the records the auditor checked were correct. But it also means that 102 records are incorrect, so let's look at those.
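The arithmetic behind those figures, written out as a quick check using only the two counts quoted in the text:

```python
audited_sites = 1620      # sites the auditor visited
matching_records = 1518   # first-visit records where both scores agree

mismatched_records = audited_sites - matching_records
match_rate = matching_records / audited_sites

print(mismatched_records)      # 102
print(f"{match_rate:.1%}")     # 93.7%
```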

#### Let's check for the incorrect records

In [18]:
%%sql

SELECT
  visits.location_id AS location_id,
  visits.record_id,
  auditor_report.true_water_source_score AS auditor_score,
  water_quality.subjective_quality_score AS employee_score
FROM
  auditor_report
JOIN
  visits ON auditor_report.location_id = visits.location_id
JOIN
  water_quality ON visits.record_id = water_quality.record_id
WHERE
  auditor_report.true_water_source_score != water_quality.subjective_quality_score
  AND visits.visit_count = 1
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


location_id,record_id,auditor_score,employee_score
AkRu05215,21160,3,10
KiRu29290,7938,3,10
KiHa22748,43140,9,10
SoRu37841,18495,6,10
KiRu27884,33931,1,10
KiZu31170,17950,9,10
KiZu31370,36864,3,10
AkRu06495,45924,2,10
HaRu17528,30524,1,10
SoRu38331,13192,3,10


Since we used some of this data in our previous analyses, we need to make sure those results are still valid now that we know some of the scores are incorrect. We didn't use the scores that much, but we relied a lot on the type_of_water_source, so let's check if there are any errors there.

So, to do this, we need to grab the type_of_water_source column from the water_source table and call it survey_source, using the source_id column to JOIN. Also select the type_of_water_source from the auditor_report table, and call it auditor_source.

In [20]:
%%sql

SELECT 
    v.location_id, 
    ar.type_of_water_source AS auditor_source, 
    ws.type_of_water_source AS survey_source, 
    v.record_id, 
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM 
    visits v
JOIN 
    water_source ws ON v.source_id = ws.source_id
JOIN 
    auditor_report ar ON v.location_id = ar.location_id
JOIN 
    water_quality wq ON v.record_id = wq.record_id
WHERE
  ar.true_water_source_score != wq.subjective_quality_score
  AND v.visit_count = 1
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
10 rows affected.


location_id,auditor_source,survey_source,record_id,auditor_score,employee_score
AkRu05215,well,well,21160,3,10
KiRu29290,shared_tap,shared_tap,7938,3,10
KiHa22748,tap_in_home_broken,tap_in_home_broken,43140,9,10
SoRu37841,shared_tap,shared_tap,18495,6,10
KiRu27884,well,well,33931,1,10
KiZu31170,tap_in_home_broken,tap_in_home_broken,17950,9,10
KiZu31370,shared_tap,shared_tap,36864,3,10
AkRu06495,well,well,45924,2,10
HaRu17528,well,well,30524,1,10
SoRu38331,shared_tap,shared_tap,13192,3,10


So what I can see is that the types of sources look the same! So even though the scores are wrong, the integrity of the type_of_water_source data we analysed last time is not affected.

We can now remove the columns and JOIN statement for water_source again.
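Reading ten rows by eye is suggestive rather than conclusive. A stricter check is to count the rows where the two source types disagree; on the real data, a count of 0 would confirm the observation. The sketch below shows the shape of that query on hypothetical rows, one of which is deliberately mismatched, with `sqlite3` standing in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE auditor_report (location_id TEXT, type_of_water_source TEXT);
    CREATE TABLE visits (record_id INT, location_id TEXT, source_id TEXT, visit_count INT);
    CREATE TABLE water_source (source_id TEXT, type_of_water_source TEXT);

    -- Hypothetical rows; the second location has a deliberate mismatch.
    INSERT INTO auditor_report VALUES ('LocA', 'well'), ('LocB', 'river');
    INSERT INTO visits VALUES (1, 'LocA', 'SrcA', 1), (2, 'LocB', 'SrcB', 1);
    INSERT INTO water_source VALUES ('SrcA', 'well'), ('SrcB', 'shared_tap');
    """
)

# Count first-visit rows where the auditor and the survey disagree on the source type.
mismatched_sources = conn.execute(
    """
    SELECT COUNT(*)
    FROM visits v
    JOIN water_source ws ON v.source_id = ws.source_id
    JOIN auditor_report ar ON v.location_id = ar.location_id
    WHERE v.visit_count = 1
      AND ar.type_of_water_source != ws.type_of_water_source
    """
).fetchone()[0]
print(mismatched_sources)
```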

# 3. Linking records to employees

Next up, let's look at where these errors may have come from. At some of the locations, employees assigned scores incorrectly, and those records ended up in this result set.

I think there are two reasons this can happen.
 1. These workers are all human and make mistakes, so this is expected.
 2. Unfortunately, the alternative is that someone assigned scores incorrectly on purpose!

In either case, the employees are the source of the errors, so let's use the assigned_employee_id from the visits table to JOIN the employee table to our query. Our query shows 102 incorrect records, so when we join the employee data, we can see which employees made these incorrect records.

In [23]:
%%sql

SELECT
    v.location_id,
    v.record_id,
    e.employee_name,
    ar.true_water_source_score AS auditor_score,
    wq.subjective_quality_score AS employee_score
FROM
    visits v
JOIN
    auditor_report ar ON v.location_id = ar.location_id
JOIN
    water_quality wq ON v.record_id = wq.record_id
JOIN
    employee e ON v.assigned_employee_id = e.assigned_employee_id
WHERE
    ar.true_water_source_score != wq.subjective_quality_score
    AND v.visit_count = 1
LIMIT 20;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
20 rows affected.


location_id,record_id,employee_name,auditor_score,employee_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


##### This query is getting large and complex, so it is a good idea to save it as a CTE. When we do more analysis, we can refer to that CTE as if it were a table. Let's call it Incorrect_records.

In [25]:
%%sql

WITH Incorrect_records AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT *
FROM Incorrect_records
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
10 rows affected.


location_id,record_id,employee_name,auditor_score,employee_score
AkRu05215,21160,Rudo Imani,3,10
KiRu29290,7938,Bello Azibo,3,10
KiHa22748,43140,Bello Azibo,9,10
SoRu37841,18495,Rudo Imani,6,10
KiRu27884,33931,Bello Azibo,1,10
KiZu31170,17950,Zuriel Matembo,9,10
KiZu31370,36864,Yewande Ebele,3,10
AkRu06495,45924,Bello Azibo,2,10
HaRu17528,30524,Jengo Tumaini,1,10
SoRu38331,13192,Zuriel Matembo,3,10


#### Let's retrieve the employee_name column from Incorrect_records and count the number of occurrences for each unique employee name using the COUNT(*) function.

In [26]:
%%sql

WITH Incorrect_records AS (
    SELECT
        v.location_id,
        v.record_id,
        e.employee_name,
        ar.true_water_source_score AS auditor_score,
        wq.subjective_quality_score AS employee_score
    FROM
        visits v
    JOIN
        auditor_report ar ON v.location_id = ar.location_id
    JOIN
        water_quality wq ON v.record_id = wq.record_id
    JOIN
        employee e ON v.assigned_employee_id = e.assigned_employee_id
    WHERE
        ar.true_water_source_score != wq.subjective_quality_score
        AND v.visit_count = 1
)
SELECT
    employee_name,
    COUNT(*) AS number_of_mistakes
FROM
    Incorrect_records
GROUP BY
    employee_name
ORDER BY
    number_of_mistakes DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
17 rows affected.


employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3


It looks like some of our surveyors are making a lot of "mistakes" while many of the other surveyors are only making a few. 

# 4. Gathering some evidence
Ok, let's think about this a bit. How would we go about finding out if any of our employees are corrupt?

All employees make mistakes, but if someone is corrupt, they will be making a lot of "mistakes", more than average, for example. Still, someone could just be clumsy, so we should try to get more evidence.

#### Let's start by cleaning up our code a bit.

Incorrect_records is a result we'll be using for the rest of the analysis, but repeating it as a CTE makes each query a bit less readable. So, let's convert it to a VIEW. We can then use it as if it were a table, which makes our code much simpler to read.

So, replace WITH with CREATE VIEW:

In [27]:
%%sql

CREATE VIEW Incorrect_records AS (
 SELECT
     auditor_report.location_id,
     visits.record_id,
     employee.employee_name,
     auditor_report.true_water_source_score AS auditor_score,
     wq.subjective_quality_score AS employee_score,
     auditor_report.statements AS statements
 FROM
     auditor_report
 JOIN
     visits
     ON auditor_report.location_id = visits.location_id
 JOIN
     water_quality AS wq
     ON visits.record_id = wq.record_id
 JOIN
     employee
     ON employee.assigned_employee_id = visits.assigned_employee_id
 WHERE
     visits.visit_count = 1
     AND auditor_report.true_water_source_score != wq.subjective_quality_score);

 * mysql+pymysql://root:***@localhost:3306/md_water_services
0 rows affected.


[]
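One practical wrinkle: a plain CREATE VIEW normally raises an error if the view already exists, so the cell cannot be re-run as written. A common way around this is to drop the view first (MySQL also offers `CREATE OR REPLACE VIEW`). The sketch below demonstrates the drop-then-create pattern with `sqlite3` standing in for MySQL; the `scores` table, its rows, and the helper function are hypothetical and exist only to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE scores (record_id INT, auditor_score INT, employee_score INT);
    -- Hypothetical rows: record 2 has disagreeing scores.
    INSERT INTO scores VALUES (1, 3, 3), (2, 3, 10);
    """
)


def create_incorrect_records_view(connection):
    """Drop the view if it exists, then recreate it, so the step can be re-run."""
    connection.executescript(
        """
        DROP VIEW IF EXISTS Incorrect_records;
        CREATE VIEW Incorrect_records AS
        SELECT record_id, auditor_score, employee_score
        FROM scores
        WHERE auditor_score != employee_score;
        """
    )


create_incorrect_records_view(conn)
create_incorrect_records_view(conn)  # the second run succeeds instead of raising

view_rows = conn.execute("SELECT * FROM Incorrect_records").fetchall()
print(view_rows)
```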

In [29]:
%%sql

SELECT * 
FROM Incorrect_records
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
10 rows affected.


location_id,record_id,employee_name,auditor_score,employee_score,statements
AkRu05215,21160,Rudo Imani,3,10,"Villagers admired the official's visit for its respectful interactions, hard work, and genuine concern."
KiRu29290,7938,Bello Azibo,3,10,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
KiHa22748,43140,Bello Azibo,9,10,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
SoRu37841,18495,Rudo Imani,6,10,"The official's respectful and diligent presence was met with heartfelt appreciation, creating a sense of closeness with the villagers."
KiRu27884,33931,Bello Azibo,1,10,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
KiZu31170,17950,Zuriel Matembo,9,10,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
KiZu31370,36864,Yewande Ebele,3,10,"With a keen understanding of urban challenges, the official's visit left a lasting impression of respect and commitment."
AkRu06495,45924,Bello Azibo,2,10,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
HaRu17528,30524,Jengo Tumaini,1,10,"With humility and diligence, the official formed bonds with the villagers that felt like genuine family connections."
SoRu38331,13192,Zuriel Matembo,3,10,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."


#### Next, we convert the error_count query we made earlier into a CTE.

In [30]:
%%sql

WITH error_count AS (   -- This CTE calculates the number of mistakes each employee made
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        Incorrect_records -- a view that joins the audit report to the database for records where the auditor's and employee's scores differ
    GROUP BY
        employee_name
)
-- Query: sort in the outer SELECT, since ordering inside a CTE is not guaranteed to carry over
SELECT *
FROM error_count
ORDER BY number_of_mistakes DESC;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
17 rows affected.


employee_name,number_of_mistakes
Bello Azibo,26
Malachi Mavuso,21
Zuriel Matembo,17
Lalitha Kaburi,7
Rudo Imani,5
Farai Nia,4
Enitan Zuri,4
Yewande Ebele,3
Jengo Tumaini,3
Makena Thabo,3


#### Now we calculate the average of the number_of_mistakes in error_count.

In [32]:
%%sql

WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        Incorrect_records
    GROUP BY
        employee_name
)
SELECT
    ROUND(AVG(number_of_mistakes)) AS avg_error_count_per_empl
FROM
    error_count;

 * mysql+pymysql://root:***@localhost:3306/md_water_services
1 rows affected.


avg_error_count_per_empl
6


#### To find the employees who made more mistakes than average, we need each employee's name and number of mistakes, and then we filter for the employees with an above-average number of mistakes.

In [33]:
%%sql

WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        Incorrect_records
    GROUP BY
        employee_name
)
SELECT
    employee_name,
    number_of_mistakes
FROM 
    error_count
WHERE number_of_mistakes > 
    (
    SELECT
        AVG(number_of_mistakes) AS avg_error_count_per_empl
    FROM
        error_count
    );

 * mysql+pymysql://root:***@localhost:3306/md_water_services
4 rows affected.


employee_name,number_of_mistakes
Bello Azibo,26
Zuriel Matembo,17
Malachi Mavuso,21
Lalitha Kaburi,7
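As a sanity check on the threshold, the average can be recomputed by hand from numbers already seen in this notebook: 102 incorrect records spread over the 17 employees returned by error_count. The sketch below applies that threshold to the ten counts that were displayed earlier; the other seven employees were not shown, so they are left out here.

```python
# Counts copied from the ten rows of error_count displayed earlier
# (the query returned 17 employees; the remaining seven are not shown).
shown_counts = {
    "Bello Azibo": 26,
    "Malachi Mavuso": 21,
    "Zuriel Matembo": 17,
    "Lalitha Kaburi": 7,
    "Rudo Imani": 5,
    "Farai Nia": 4,
    "Enitan Zuri": 4,
    "Yewande Ebele": 3,
    "Jengo Tumaini": 3,
    "Makena Thabo": 3,
}

total_mistakes = 102   # incorrect records found earlier
employee_count = 17    # rows returned by error_count
average = total_mistakes / employee_count

# Keep only the employees whose count is strictly above the average.
above_average = [name for name, n in shown_counts.items() if n > average]
print(average)
print(above_average)
```

Note that Lalitha Kaburi, with 7 mistakes, sits just above the average of 6, while the other three are far above it.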


These are the employees who made more mistakes than the average across their peers, so let's have a closer look at them.

We should look at the Incorrect_records view again and isolate all of the records these four employees gathered. We should also read the statements for these records to look for patterns.

First, convert the query above into a CTE called suspect_list, so we can use it to filter the records from these four employees.

In [34]:
%%sql

WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        Incorrect_records
    GROUP BY
        employee_name
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE number_of_mistakes >
        (
        SELECT
            AVG(number_of_mistakes) AS avg_error_count_per_empl
        FROM
            error_count
        )
)

-- This query filters all of the records where the "corrupt" employees gathered data.
SELECT
    employee_name,
    location_id,
    statements
FROM
    Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list);

 * mysql+pymysql://root:***@localhost:3306/md_water_services
71 rows affected.


employee_name,location_id,statements
Bello Azibo,KiRu29290,"A young artist sketches the faces in the queue, capturing the weariness of daily hours spent waiting for water."
Bello Azibo,KiHa22748,"A young girl's hopeful eyes are clouded by mistrust, her innocence tarnished by the corrupt system."
Bello Azibo,KiRu27884,"A traditional healer's empathy turns to bitterness, knowing that corrupt practices harm her community."
Zuriel Matembo,KiZu31170,"A community leader stood with his people, expressing concern for the water quality and the time lost in queues."","""
Bello Azibo,AkRu06495,"A healthcare worker in the queue expressed fears about water-borne diseases, her face etched with worry."","""
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkHa00314,"A street vendor's sales suffer from time spent waiting, her concern for the water's quality affecting her products."
Malachi Mavuso,KiRu26598,"A teenager's dreams are tempered by reality, her future threatened by the corrupt practices she sees around her."
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.


If we have a look, we notice some alarming statements about these four officials (look at these records, for example: AkRu04508, AkRu07310, KiRu29639, AmAm09607). See how the word "cash" is used a lot in these statements.

Let's filter the records that refer to "cash".

In [35]:
%%sql

WITH error_count AS (
    SELECT
        employee_name,
        COUNT(employee_name) AS number_of_mistakes
    FROM
        Incorrect_records
    GROUP BY
        employee_name
),
suspect_list AS (
    SELECT
        employee_name,
        number_of_mistakes
    FROM
        error_count
    WHERE number_of_mistakes >
        (
        SELECT
            AVG(number_of_mistakes) AS avg_error_count_per_empl
        FROM
            error_count
        )
)

-- This query filters all of the records where the "corrupt" employees gathered data.
SELECT
    employee_name,
    location_id,
    statements
FROM
    Incorrect_records
WHERE
    employee_name IN (SELECT employee_name FROM suspect_list)
    AND statements LIKE '%cash%';

 * mysql+pymysql://root:***@localhost:3306/md_water_services
19 rows affected.


employee_name,location_id,statements
Zuriel Matembo,SoRu38331,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Malachi Mavuso,AmAm09607,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Bello Azibo,KiIs23853,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The mention of cash changing hands further tainted their perception.
Bello Azibo,HaSe21323,Villagers spoke of an unsettling encounter with an official who appeared dismissive and detached. The reference to cash transactions added to their growing sense of distrust.
Zuriel Matembo,AkRu05880,Villagers' wary accounts of an official's arrogance and detachment from their concerns raised suspicions. The allusion to cash changing hands deepened their skepticism.
Bello Azibo,KiRu27065,Villagers expressed their discomfort with an official who displayed a haughty demeanor and negligence. The mention of cash transactions deepened their growing sense of unease.
Malachi Mavuso,KiRu25347,Villagers expressed their discontent with an official who appeared dismissive and neglectful. The mention of cash changing hands added to their growing sense of distrust.
Zuriel Matembo,SoIl32575,Villagers recounted unsettling encounters with an official known for their arrogance and avoidance of responsibilities. The mention of cash changing hands added to their apprehension and distrust.
Bello Azibo,AkRu04508,"An unsettling atmosphere surrounded the official, as villagers shared their experiences of arrogance and lack of dedication. The mention of cash exchanges only intensified their doubts."
Lalitha Kaburi,AkRu07310,"Villagers spoke of their unsettling encounters with an official who seemed indifferent and uninterested, hinting at potential improprieties involving cash exchanges."


So we can sum up the evidence we have for Zuriel Matembo, Malachi Mavuso, Bello Azibo and Lalitha Kaburi:
 1. They all made more mistakes than their peers on average.
 2. They all have incriminating statements made against them, and only them.
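The "and only them" part of the second point is checked by inverting the filter: look for "cash" statements among employees who are not on the suspect list. If that query returns nothing, nobody else has such statements against them. The sketch below shows the shape of that check on hypothetical rows, with `sqlite3` standing in for MySQL; it demonstrates the technique and is not a result from the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Incorrect_records (employee_name TEXT, location_id TEXT, statements TEXT);
    -- Hypothetical rows standing in for the real view.
    INSERT INTO Incorrect_records VALUES
        ('Suspect A', 'Loc1', 'The mention of cash exchanges intensified their doubts.'),
        ('Suspect A', 'Loc2', 'Villagers worried about long queues.'),
        ('Suspect A', 'Loc3', 'Cash transactions added to their distrust.'),
        ('Employee B', 'Loc4', 'Villagers praised the official.');
    """
)

# Same CTEs as in the notebook, but the final filter uses NOT IN.
others_with_cash = conn.execute(
    """
    WITH error_count AS (
        SELECT employee_name, COUNT(employee_name) AS number_of_mistakes
        FROM Incorrect_records
        GROUP BY employee_name
    ),
    suspect_list AS (
        SELECT employee_name
        FROM error_count
        WHERE number_of_mistakes > (SELECT AVG(number_of_mistakes) FROM error_count)
    )
    SELECT employee_name, location_id, statements
    FROM Incorrect_records
    WHERE employee_name NOT IN (SELECT employee_name FROM suspect_list)
      AND statements LIKE '%cash%'
    """
).fetchall()
print(others_with_cash)
```

An empty result means no employee outside the suspect list has a statement mentioning cash.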
