# Predictive Modeling: Tanzanian Water Wells

## Business Understanding

### Overview

This project uses data on water wells in Tanzania to build a predictive model that distinguishes operational wells from non-operational ones. Each record describes one well, including its longitude, latitude, funder, management, and pump type. The Tanzanian government can use this analysis and the model to decide where to allocate funding for water wells.

### Business Problem

Tanzania is a developing country with a population of over 57 million people, and it struggles to provide that population with clean water. There are many water wells throughout the country, and the Tanzanian government needs a way to predict whether each one is operational or non-operational. The two kinds of error are not equally costly: predicting that a well is operational when it is not is more costly than predicting that a well is non-operational when it actually works.

## Data Understanding

### Data Sources

The data used for this project comes from [DrivenData](https://www.drivendata.org/) and can be downloaded [here](https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/page/23/).

## Data Preparation

### Data Cleaning

In [133]:
#Importing everything needed
from sklearn.model_selection import cross_validate
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from missingpy import MissForest
import pandas as pd
import matplotlib.pyplot as plot
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [134]:
#Loading in the data as DataFrames
training_labels = pd.read_csv('Data/training_set_labels.csv')
training_values = pd.read_csv('Data/training_set_values.csv')

### Turning a Ternary Classification Problem into a Binary Classification Problem

`status_group` is the target column. In the data as provided, it takes 3 possible values:
1. functional
2. functional needs repair
3. non functional

According to the descriptions of these values, 'functional' and 'functional needs repair' both describe wells that are operational, while 'non functional' describes wells that are not. Therefore, 'functional' and 'functional needs repair' were grouped under 'operational', and 'non functional' was relabeled as 'non operational'.

In [135]:
training_labels.head()

Unnamed: 0,id,status_group
0,69572,functional
1,8776,functional
2,34310,functional
3,67743,non functional
4,19728,functional


In [136]:
#Changes the values in the 'status_group' column
training_labels['status_group'] = training_labels['status_group'].map({
    'non functional': 'non operational',
    'functional': 'operational',
    'functional needs repair': 'operational',
})
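
One thing worth checking after a relabeling like this is that every original value was covered, because `Series.map` returns NaN for anything missing from the dictionary. The following is a minimal sketch on toy labels (the `labels` Series is made up for illustration and is not the project data):

```python
import pandas as pd

# Toy labels standing in for the real 'status_group' column
labels = pd.Series(['functional', 'non functional',
                    'functional needs repair', 'functional'])

to_binary = {
    'non functional': 'non operational',
    'functional': 'operational',
    'functional needs repair': 'operational',
}

binary_labels = labels.map(to_binary)

# Series.map yields NaN for any value missing from the dictionary,
# so a count of zero confirms every original label was covered
unmapped = int(binary_labels.isna().sum())
counts = binary_labels.value_counts()

print(unmapped)
print(counts)
```

A non-zero `unmapped` would point to a label the dictionary does not handle, for example one with different spelling or capitalization.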

### Merging `training_labels` with `training_values`

The data was downloaded in 2 different files. `training_labels` contains a unique identifier and the target column 'status_group'. `training_values` contains a matching unique identifier column along with the rest of the data. The data needs to be merged in order to perform a proper train/test split later.

In [137]:
#Showing a preview of 'training_labels' after the values have been changed
training_labels.head()

Unnamed: 0,id,status_group
0,69572,operational
1,8776,operational
2,34310,operational
3,67743,non operational
4,19728,operational


In [138]:
#Showing a preview of 'training_values'
training_values.head()

Unnamed: 0,id,amount_tsh,date_recorded,funder,gps_height,installer,longitude,latitude,wpt_name,num_private,...,payment_type,water_quality,quality_group,quantity,quantity_group,source,source_type,source_class,waterpoint_type,waterpoint_type_group
0,69572,6000.0,2011-03-14,Roman,1390,Roman,34.938093,-9.856322,none,0,...,annually,soft,good,enough,enough,spring,spring,groundwater,communal standpipe,communal standpipe
1,8776,0.0,2013-03-06,Grumeti,1399,GRUMETI,34.698766,-2.147466,Zahanati,0,...,never pay,soft,good,insufficient,insufficient,rainwater harvesting,rainwater harvesting,surface,communal standpipe,communal standpipe
2,34310,25.0,2013-02-25,Lottery Club,686,World vision,37.460664,-3.821329,Kwa Mahundi,0,...,per bucket,soft,good,enough,enough,dam,dam,surface,communal standpipe multiple,communal standpipe
3,67743,0.0,2013-01-28,Unicef,263,UNICEF,38.486161,-11.155298,Zahanati Ya Nanyumbu,0,...,never pay,soft,good,dry,dry,machine dbh,borehole,groundwater,communal standpipe multiple,communal standpipe
4,19728,0.0,2011-07-13,Action In A,0,Artisan,31.130847,-1.825359,Shuleni,0,...,never pay,soft,good,seasonal,seasonal,rainwater harvesting,rainwater harvesting,surface,communal standpipe,communal standpipe


In [139]:
#Merging the two DataFrames on 'id'
df = training_values.merge(training_labels, on='id')
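
Because the merge relies on `id` being a unique identifier in both files, it can be useful to have pandas verify that. The sketch below uses two small made-up frames (the ids and values are hypothetical) and the `validate` argument of `DataFrame.merge`:

```python
import pandas as pd

# Made-up stand-ins for training_values and training_labels
values = pd.DataFrame({'id': [1, 2, 3], 'gps_height': [1390, 1399, 686]})
labels = pd.DataFrame({'id': [1, 2, 3],
                       'status_group': ['operational', 'operational',
                                        'non operational']})

# validate='one_to_one' raises pandas.errors.MergeError
# if an id is repeated in either frame
merged = values.merge(labels, on='id', how='inner', validate='one_to_one')

# With an inner join, a shorter result would mean some ids had no match
rows_lost = len(values) - len(merged)

print(rows_lost)
print(list(merged.columns))
```

If `rows_lost` is greater than zero, some rows in one file have no counterpart in the other and would silently disappear from the merged data.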

### Dropping Unnecessary/Unusable Columns
These columns were dropped because they had too many null values, contained placeholder values, or lacked the information needed to give their values meaning.

In [140]:
#Dropping the unique identifier column
df.drop("id", axis=1, inplace=True)

In [141]:
#Creating a list of all the columns to drop from the DataFrame
columns_to_drop = ["amount_tsh", "num_private", "recorded_by", "payment_type", "extraction_type", "extraction_type_group", 
                   "water_quality", "quantity_group", "scheme_name"]
#Dropping all the columns from the DataFrame using the list above
df_small = df.drop(columns_to_drop, axis = 1)

### Replacing all Placeholder Values with Null Values

There were many columns with placeholder values such as 0 or 'none'. These were replaced with null values in order to be imputed later.

In [142]:
#Replacing all placeholder values with null values
df_small_small = df_small.replace({'none': np.nan, 'unknown': np.nan, -2.00E-08: np.nan, "0": np.nan})
#In these columns a value of 0 is a placeholder; assigning the result back avoids a chained in-place replace
for column in ["district_code", "population", "construction_year"]:
    df_small_small[column] = df_small_small[column].replace(0, np.nan)

We decided not to impute longitude or latitude, and the latitude column contains some null values. Removing these rows loses only a small share of the data, so they were dropped.

In [143]:
#Finding the amount of nulls in the latitude column
df_small_small['latitude'].isna().sum()

1812

In [144]:
#Dropping the nulls from latitude
df_small_small.dropna(subset=['latitude'],inplace=True)

### Preparing Data for Imputation
MissForest is the imputer used for this project. MissForest can handle categorical data, but the values must be numeric; strings will not pass through the imputer. Many of the categorical columns hold strings, so we created a function that replaces each string with an integer equal to its rank in the column's `.value_counts()` (0 for the most frequent value), which lets the data pass through the MissForest imputer.

In [145]:
df_small_small.head()

Unnamed: 0,date_recorded,funder,gps_height,installer,longitude,latitude,wpt_name,basin,subvillage,region,...,management_group,payment,quality_group,quantity,source,source_type,source_class,waterpoint_type,waterpoint_type_group,status_group
0,2011-03-14,Roman,1390,Roman,34.938093,-9.856322,,Lake Nyasa,Mnyusi B,Iringa,...,user-group,pay annually,good,enough,spring,spring,groundwater,communal standpipe,communal standpipe,operational
1,2013-03-06,Grumeti,1399,GRUMETI,34.698766,-2.147466,Zahanati,Lake Victoria,Nyamara,Mara,...,user-group,never pay,good,insufficient,rainwater harvesting,rainwater harvesting,surface,communal standpipe,communal standpipe,operational
2,2013-02-25,Lottery Club,686,World vision,37.460664,-3.821329,Kwa Mahundi,Pangani,Majengo,Manyara,...,user-group,pay per bucket,good,enough,dam,dam,surface,communal standpipe multiple,communal standpipe,operational
3,2013-01-28,Unicef,263,UNICEF,38.486161,-11.155298,Zahanati Ya Nanyumbu,Ruvuma / Southern Coast,Mahakamani,Mtwara,...,user-group,never pay,good,dry,machine dbh,borehole,groundwater,communal standpipe multiple,communal standpipe,non operational
4,2011-07-13,Action In A,0,Artisan,31.130847,-1.825359,Shuleni,Lake Victoria,Kyanyamisa,Kagera,...,other,never pay,good,seasonal,rainwater harvesting,rainwater harvesting,surface,communal standpipe,communal standpipe,operational


In [146]:
# this function takes columns from the df and turns the value into numbers in order to impute nulls with MissForest
def transform_columns(dataframe, columns):
    #creating new df that will have integers instead of strings for values
    transformed_df = pd.DataFrame()
    
    #loops through each column in the df given and assigns the strings an integer based off the rank in .value_counts()
    for column in columns:
        unique_vals = dataframe[column].value_counts().index
        string_to_numbers = dataframe[column].replace(to_replace=unique_vals, value=list(range(len(unique_vals))))
        transformed_df[column] = string_to_numbers
        
    return transformed_df
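
The same rank encoding can also be written with a dictionary and `Series.map`, which may run faster than a list-based `replace` on columns with many distinct values. This is only a sketch of an alternative, not the function used to produce the saved file; `rank_encode` and the toy `installer` Series are hypothetical names introduced for illustration:

```python
import pandas as pd

def rank_encode(series):
    """Replace each value with its rank in value_counts(); missing values stay missing."""
    # enumerate() numbers the value_counts() index from 0 (most frequent) upward
    ranks = {value: rank for rank, value in enumerate(series.value_counts().index)}
    return series.map(ranks)

# Toy column with one missing value
installer = pd.Series(['DWE', 'UNICEF', 'DWE', None, 'Artisan', 'DWE', 'UNICEF'])
encoded = rank_encode(installer)

print(encoded.tolist())
```

Missing values are not in the dictionary, so they come out as NaN and remain available for the imputer to fill.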

### Using the Function

Running the function in the cell below takes a few minutes, so its output was saved as a .csv file for quick access.
The cell below is commented out, but it shows where `transformed_df` comes from and can be run if needed.

In [147]:
#Using the function above on all categorical columns

#transformed_df = transform_columns(df_small_small, ['funder', 'installer', 'wpt_name', 'basin', 'subvillage', 'region', 
                                                    #'region_code', 'district_code', 'lga', 'ward', 'public_meeting', 
                                                    #'scheme_management', 'permit', 'extraction_type_class',
                                                    #'management', 'management_group', 'payment', 'quality_group', 
                                                    #'quantity', 'source', 'source_type', 'source_class','waterpoint_type',
                                                    #'waterpoint_type_group', 'status_group'])

### .csv File

`working_df.csv` is the file saved from `transformed_df`.

In [148]:
#Loading in working_df.csv as transformed_df
transformed_df = pd.read_csv('Dev_Notebooks/working_df.csv')
transformed_df

Unnamed: 0.1,Unnamed: 0,funder,installer,wpt_name,basin,subvillage,region,region_code,district_code,lga,...,source_class,waterpoint_type,waterpoint_type_group,status_group,date_recorded,gps_height,longitude,latitude,population,construction_year
0,0,33.0,87.0,,6,1877.0,0,0,4.0,35,...,0.0,0.0,0.0,0,2011,1390,34.938093,-9.856322,109.0,1999.0
1,1,133.0,160.0,1.0,1,2382.0,14,14,1.0,20,...,1.0,0.0,0.0,0,2013,1399,34.698766,-2.147466,280.0,2010.0
2,2,450.0,15.0,1374.0,0,0.0,18,17,3.0,79,...,1.0,2.0,0.0,0,2013,686,37.460664,-3.821329,250.0,2009.0
3,3,7.0,32.0,29483.0,7,253.0,17,20,14.0,108,...,0.0,2.0,0.0,1,2013,263,38.486161,-11.155298,58.0,1986.0
4,4,1057.0,58.0,0.0,1,8810.0,6,5,0.0,17,...,1.0,0.0,0.0,0,2011,0,31.130847,-1.825359,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57583,59395,13.0,9.0,33426.0,0,3259.0,2,2,4.0,31,...,0.0,0.0,0.0,0,2013,1210,37.169807,-3.253847,125.0,1999.0
57584,59396,248.0,303.0,7490.0,2,345.0,0,0,3.0,0,...,1.0,0.0,0.0,0,2011,1212,35.249991,-9.070629,56.0,1996.0
57585,59397,,,142.0,2,6611.0,1,1,6.0,30,...,0.0,1.0,1.0,0,2011,0,34.017087,-8.750434,,
57586,59398,710.0,573.0,23415.0,2,159.0,12,11,3.0,71,...,0.0,1.0,1.0,0,2011,0,35.861315,-6.378573,,


### Removing Extra Column

When the DataFrame was saved, an extra column was added at the beginning; it needs to be dropped.

In [149]:
#Dropping unnecessary column
transformed_df = transformed_df.drop('Unnamed: 0', axis=1)
transformed_df

Unnamed: 0,funder,installer,wpt_name,basin,subvillage,region,region_code,district_code,lga,ward,...,source_class,waterpoint_type,waterpoint_type_group,status_group,date_recorded,gps_height,longitude,latitude,population,construction_year
0,33.0,87.0,,6,1877.0,0,0,4.0,35,531,...,0.0,0.0,0.0,0,2011,1390,34.938093,-9.856322,109.0,1999.0
1,133.0,160.0,1.0,1,2382.0,14,14,1.0,20,131,...,1.0,0.0,0.0,0,2013,1399,34.698766,-2.147466,280.0,2010.0
2,450.0,15.0,1374.0,0,0.0,18,17,3.0,79,1497,...,1.0,2.0,0.0,0,2013,686,37.460664,-3.821329,250.0,2009.0
3,7.0,32.0,29483.0,7,253.0,17,20,14.0,108,634,...,0.0,2.0,0.0,1,2013,263,38.486161,-11.155298,58.0,1986.0
4,1057.0,58.0,0.0,1,8810.0,6,5,0.0,17,1373,...,1.0,0.0,0.0,0,2011,0,31.130847,-1.825359,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
57583,13.0,9.0,33426.0,0,3259.0,2,2,4.0,31,31,...,0.0,0.0,0.0,0,2013,1210,37.169807,-3.253847,125.0,1999.0
57584,248.0,303.0,7490.0,2,345.0,0,0,3.0,0,271,...,1.0,0.0,0.0,0,2011,1212,35.249991,-9.070629,56.0,1996.0
57585,,,142.0,2,6611.0,1,1,6.0,30,70,...,0.0,1.0,1.0,0,2011,0,34.017087,-8.750434,,
57586,710.0,573.0,23415.0,2,159.0,12,11,3.0,71,852,...,0.0,1.0,1.0,0,2011,0,35.861315,-6.378573,,
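
The extra column appears because `DataFrame.to_csv` writes the index as an unnamed first column by default, and `pd.read_csv` then loads it back as `Unnamed: 0`. A minimal sketch of the behavior, using an in-memory buffer and a made-up two-row frame instead of the real file:

```python
import io
import pandas as pd

# Made-up frame whose index has a gap, as happens after rows are dropped
frame = pd.DataFrame({'funder': [33.0, 133.0], 'basin': [6, 1]}, index=[0, 4])

# Default behaviour: the index is written as an unnamed first column,
# which read_csv then loads back as 'Unnamed: 0'
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out of the file entirely
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_without_index = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_without_index.columns))
```

If the original row labels matter, passing `index_col=0` to `pd.read_csv` restores that first column as the index instead of leaving it as a regular column.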


### Combining all Columns into `transformed_df`

Now that the categorical columns have been changed to numbers, the numerical columns can be merged back into the same DataFrame.

In [150]:
#Adding all numerical columns onto 'transformed_df'
transformed_df['date_recorded'] = df_small_small['date_recorded']
transformed_df['gps_height'] = df_small_small['gps_height']
transformed_df['longitude'] = df_small_small['longitude']
transformed_df['latitude'] = df_small_small['latitude']
transformed_df['population'] = df_small_small['population']
transformed_df['construction_year'] = df_small_small['construction_year']
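
A point to keep in mind with assignments like these: pandas aligns a Series to the target DataFrame by index label, not by row position. If one frame has a fresh 0-based index (for example after being reloaded from a .csv) while the other keeps the labels it had before rows were dropped, the values may not line up. Whether that applies here depends on how the file was saved; the sketch below only illustrates the mechanism with made-up frames:

```python
import pandas as pd

# Hypothetical source frame whose index has a gap (label 1 was dropped earlier)
source = pd.DataFrame({'gps_height': [1390, 686, 263]}, index=[0, 2, 3])

# Hypothetical frame reloaded from disk, so its index restarts at 0, 1, 2
reloaded = pd.DataFrame({'funder': [33.0, 450.0, 7.0]})

# Label-based assignment: label 1 has no match in `source` and becomes NaN,
# and label 2 receives the value stored at label 2 in `source` (its second row)
by_label = reloaded.copy()
by_label['gps_height'] = source['gps_height']

# Position-based assignment: values are copied over in row order
by_position = reloaded.copy()
by_position['gps_height'] = source['gps_height'].to_numpy()

print(by_label)
print(by_position)
```

Position-based assignment is only safe when both frames have the same number of rows in the same order, so checking `len()` on both first is a reasonable precaution.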

### The Data is Clean and Ready for Train/Test Split

For the split, the target column 'status_group' is used as `y`, and all other columns are used as features in `X`. A random state of 33 is used, with a test size of 0.3, slightly larger than the default.

In [151]:
#Splitting the columns into the target and features
X = transformed_df.drop('status_group', axis=1)
y = transformed_df['status_group']

#Creating the 4 different groups resulting from a train/test split with a test size of 0.3
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33, test_size=0.3)

### Now the Data is Split, It's Time for the Imputer


In [152]:
#Instantiate the MissForest
imputer = MissForest(random_state=33, max_depth=1)

### Imputing `X_train`

Running this takes a long time, so the result was saved as a .csv file for quick access. The cell below is commented out but can be run if needed.

In [153]:
#The cat_vars parameter tells MissForest which columns are categorical

#imputer.fit(X_train, cat_vars=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
                               #21, 22, 23, 24])
#X_train_imputed = imputer.transform(X_train)

In [154]:
#Loading in the saved .csv file as X_train_imputed
X_train_imputed = pd.read_csv('Dev_Notebooks/X_train_imputed.csv')
X_train_imputed

Unnamed: 0.1,Unnamed: 0,0,1,2,3,4,5,6,7,8,...,20,21,22,23,24,25,26,27,28,29
0,0,16.0,159.0,10914.0,5.0,25.0,12.0,11.0,4.0,68.0,...,2.0,0.0,2.0,0.0,2011.0,0.0,35.891855,-6.153545,330.952508,1998.387233
1,1,12.0,2.0,22561.0,4.0,16380.0,7.0,7.0,0.0,12.0,...,0.0,0.0,0.0,0.0,2013.0,1260.0,30.914468,-3.326810,530.000000,1993.000000
2,2,12.0,50.0,4266.0,8.0,8951.0,16.0,15.0,3.0,103.0,...,3.0,1.0,0.0,0.0,2013.0,2137.0,31.631254,-7.863417,750.000000,1984.000000
3,3,1.0,4.0,7719.0,6.0,16557.0,8.0,8.0,2.0,18.0,...,0.0,0.0,0.0,0.0,2013.0,462.0,34.831606,-11.319762,96.000000,1992.000000
4,4,0.0,0.0,26690.0,2.0,6903.0,3.0,3.0,3.0,27.0,...,1.0,0.0,1.0,1.0,2011.0,295.0,36.624641,-8.410004,400.000000,1976.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40306,40306,1198.0,0.0,14653.0,0.0,157.0,10.0,9.0,4.0,80.0,...,2.0,0.0,0.0,0.0,2011.0,52.0,38.973581,-5.375739,12.000000,1995.000000
40307,40307,28.0,59.0,2.0,0.0,8.0,2.0,2.0,2.0,11.0,...,3.0,1.0,0.0,0.0,2013.0,500.0,38.078320,-4.480761,140.000000,2013.000000
40308,40308,395.0,2.0,26850.0,5.0,2428.0,3.0,3.0,5.0,25.0,...,3.0,1.0,0.0,0.0,2011.0,520.0,37.560400,-6.917776,1.000000,1985.000000
40309,40309,1.0,4.0,28238.0,2.0,210.0,8.0,8.0,4.0,21.0,...,3.0,1.0,2.0,0.0,2013.0,844.0,36.122400,-10.463274,250.000000,1982.000000


### Dropping Unnecessary Column

Once again, saving the DataFrame as a .csv file added an extra column.

In [155]:
X_train_imputed = X_train_imputed.drop('Unnamed: 0', axis=1)
X_train_imputed

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,20,21,22,23,24,25,26,27,28,29
0,16.0,159.0,10914.0,5.0,25.0,12.0,11.0,4.0,68.0,331.0,...,2.0,0.0,2.0,0.0,2011.0,0.0,35.891855,-6.153545,330.952508,1998.387233
1,12.0,2.0,22561.0,4.0,16380.0,7.0,7.0,0.0,12.0,66.0,...,0.0,0.0,0.0,0.0,2013.0,1260.0,30.914468,-3.326810,530.000000,1993.000000
2,12.0,50.0,4266.0,8.0,8951.0,16.0,15.0,3.0,103.0,1003.0,...,3.0,1.0,0.0,0.0,2013.0,2137.0,31.631254,-7.863417,750.000000,1984.000000
3,1.0,4.0,7719.0,6.0,16557.0,8.0,8.0,2.0,18.0,421.0,...,0.0,0.0,0.0,0.0,2013.0,462.0,34.831606,-11.319762,96.000000,1992.000000
4,0.0,0.0,26690.0,2.0,6903.0,3.0,3.0,3.0,27.0,473.0,...,1.0,0.0,1.0,1.0,2011.0,295.0,36.624641,-8.410004,400.000000,1976.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40306,1198.0,0.0,14653.0,0.0,157.0,10.0,9.0,4.0,80.0,819.0,...,2.0,0.0,0.0,0.0,2011.0,52.0,38.973581,-5.375739,12.000000,1995.000000
40307,28.0,59.0,2.0,0.0,8.0,2.0,2.0,2.0,11.0,518.0,...,3.0,1.0,0.0,0.0,2013.0,500.0,38.078320,-4.480761,140.000000,2013.000000
40308,395.0,2.0,26850.0,5.0,2428.0,3.0,3.0,5.0,25.0,539.0,...,3.0,1.0,0.0,0.0,2011.0,520.0,37.560400,-6.917776,1.000000,1985.000000
40309,1.0,4.0,28238.0,2.0,210.0,8.0,8.0,4.0,21.0,673.0,...,3.0,1.0,2.0,0.0,2013.0,844.0,36.122400,-10.463274,250.000000,1982.000000


### Need Column Names

The imputer took care of the null values but the column names were lost in the process.

In [156]:
#Renaming all the columns
X_train_imputed.rename(columns={'0': 'funder', '1': 'installer', '2': 'wpt_name', '3': 'basin', '4': 'subvillage',
                                '5': 'region', '6': 'region_code', '7': 'district_code', '8': 'lga', '9': 'ward',
                                '10': 'public_meeting', '11': 'scheme_management', '12': 'permit',
                                '13': 'extraction_type_class', '14': 'management', '15': 'management_group',
                                '16': 'payment', '17': 'quality_group', '18': 'quantity', '19': 'source',
                                '20': 'source_type', '21': 'source_class', '22': 'waterpoint_type',
                                '23': 'waterpoint_type_group', '24': 'date_recorded', '25': 'gps_height',
                                '26': 'longitude', '27': 'latitude', '28': 'population', '29': 'construction_year'},
                       inplace=True)
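
The manual rename is needed because the imputed values come back without labels. One way to avoid it, assuming the imputer returns a plain NumPy array with columns in the same order as its input, is to wrap the array in a DataFrame that reuses the original column names and index. In this sketch a simple column-mean fill stands in for the imputer (it is not MissForest), and the data is made up:

```python
import numpy as np
import pandas as pd

# Hypothetical features with one missing value
X_train = pd.DataFrame({'funder': [16.0, np.nan, 12.0],
                        'gps_height': [0.0, 1260.0, 2137.0]},
                       index=[10, 42, 7])

# Stand-in for an imputer's output: a bare NumPy array with no labels.
# The gap is filled with the column mean purely for illustration.
imputed_array = X_train.fillna(X_train.mean()).to_numpy()

# Reattach the original column names and row labels
X_train_imputed = pd.DataFrame(imputed_array,
                               columns=X_train.columns,
                               index=X_train.index)

print(X_train_imputed)
```

Saving this labeled DataFrame to a .csv file would then keep the column names, so no rename is needed after reloading.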

### Change All Categorical Columns Back to Strings

The numbers in the categorical columns are not meaningful on their own, but they can be mapped back to the strings they replaced.

In [157]:
def revert_back_to_strings(df, columns):
    #Creates a copy of the DataFrame so it isn't overwriting the original
    df_copy = df.copy()
    
    #For each column, rebuilds the rank -> string dictionary from the .value_counts() of df_small_small
    #(the same ordering used to encode the column) and uses it to replace each number with its string
    for col in columns:
        column_vc = list(df_small_small[col].value_counts().index)
        column_rank = list(range(len(column_vc)))
    
        column_vc_rank = dict(zip(column_rank, column_vc))
    
        df_copy[col] = df_copy[col].replace(column_vc_rank)
    
    return df_copy
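
To make the reverse lookup concrete, here is a small sketch of how a rank-to-string dictionary decodes a column. The `original` and `imputed_ranks` Series are made up for illustration; the cast to `int` is a precaution so that float ranks such as `1.0` match the integer keys:

```python
import pandas as pd

# Hypothetical original column; 'DWE' is the most frequent value, so its rank is 0
original = pd.Series(['DWE', 'UNICEF', 'DWE', 'DWE', 'UNICEF', 'Artisan'])

# Hypothetical imputed ranks, stored as floats as in the imputed DataFrame
imputed_ranks = pd.Series([0.0, 1.0, 2.0, 0.0])

# dict(enumerate(...)) pairs each rank with the value at that position in value_counts()
rank_to_string = dict(enumerate(original.value_counts().index))

# Cast to int so the float ranks line up with the integer dictionary keys
decoded = imputed_ranks.astype(int).map(rank_to_string)

print(decoded.tolist())
```

The decoding is only correct if the dictionary is built from the same data, in the same `.value_counts()` order, that was used for encoding.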

### Creating a List of Columns to Feed the Function

In [158]:
#Creating a list of all column names in X_train_imputed
revert_columns = list(X_train_imputed.columns)
#Only taking the categorical columns from that list
revert_columns = revert_columns[:24]

### Using the Function

In [159]:
#Using the function on all categorical columns in X_train_imputed
X_train_imputed = revert_back_to_strings(X_train_imputed, revert_columns)
X_train_imputed

### Imputing `X_test`

This cell is commented out because it takes a long time to run. Its output was saved as a .csv file for easier access, but the cell can be run if needed.

In [161]:
#X_test_imputed = imputer.transform(X_test)

### Loading in `X_test_imputed`

Once again an unnecessary column was added and now it will be dropped.

In [162]:
X_test_imputed = pd.read_csv('Dev_Notebooks/X_test_imputed.csv')

In [163]:
#Dropping the unnecessary column
X_test_imputed = X_test_imputed.drop('Unnamed: 0', axis=1)

In [164]:
#Renaming all the columns
X_test_imputed.rename(columns={'0': 'funder', '1': 'installer', '2': 'wpt_name', '3': 'basin', '4': 'subvillage',
                               '5': 'region', '6': 'region_code', '7': 'district_code', '8': 'lga', '9': 'ward',
                               '10': 'public_meeting', '11': 'scheme_management', '12': 'permit',
                               '13': 'extraction_type_class', '14': 'management', '15': 'management_group',
                               '16': 'payment', '17': 'quality_group', '18': 'quantity', '19': 'source',
                               '20': 'source_type', '21': 'source_class', '22': 'waterpoint_type',
                               '23': 'waterpoint_type_group', '24': 'date_recorded', '25': 'gps_height',
                               '26': 'longitude', '27': 'latitude', '28': 'population', '29': 'construction_year'},
                      inplace=True)

### Using `revert_back_to_strings` Function for `X_test_imputed`

In [165]:
X_test_imputed = revert_back_to_strings(X_test_imputed, revert_columns)
X_test_imputed

Unnamed: 0,funder,installer,wpt_name,basin,subvillage,region,region_code,district_code,lga,ward,...,source_type,source_class,waterpoint_type,waterpoint_type_group,date_recorded,gps_height,longitude,latitude,population,construction_year
0,Government Of Tanzania,DWE,Area Three Namba 10,Lake Nyasa,Mtanga,Mbeya,12.0,4.0,Rungwe,Mwaya,...,spring,groundwater,communal standpipe,communal standpipe,2011.0,0.0,33.624613,-9.216957,329.925974,1991.243283
1,Kaemp,DWE,Nasimgeni Shule,Rufiji,Kibomonche,Morogoro,5.0,4.0,Ulanga,Itete,...,river/lake,surface,communal standpipe,communal standpipe,2011.0,331.0,36.414944,-8.654660,280.000000,2006.000000
2,Kiliwater,Kiliwater,Malwilo Primary Tank 2,Pangani,Dusala,Kilimanjaro,3.0,1.0,Rombo,Kazazi,...,spring,groundwater,communal standpipe,communal standpipe,2013.0,1012.0,37.664298,-3.268569,1.000000,2012.000000
3,Oxfarm,OXFARM,Ubeti Dip No 3,Lake Tanganyika,Hedaru A,Kigoma,16.0,2.0,Kasulu,Mapogoro,...,shallow well,groundwater,hand pump,hand pump,2013.0,1731.0,29.783005,-4.439251,450.000000,1995.000000
4,Netherlands,DWE,Ushirika,Lake Victoria,Mwahilo,Shinyanga,17.0,2.0,Maswa,Tingi,...,shallow well,groundwater,hand pump,hand pump,2012.0,0.0,33.586406,-3.040369,408.622687,1998.090157
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17272,Danida,DWE,Kwa Ng'Onda,Lake Nyasa,Maweni,Ruvuma,10.0,2.0,Songea Rural,Moshono,...,shallow well,groundwater,hand pump,hand pump,2013.0,981.0,35.662256,-10.363942,800.000000,1990.000000
17273,Patuu,RDWS,Kwa Maneno Daudi,Rufiji,Mangala,Morogoro,5.0,1.0,Kilosa,Kidodi,...,river/lake,surface,communal standpipe,communal standpipe,2011.0,310.0,36.994441,-7.595949,150.000000,2008.000000
17274,African,SCOTT,Zahanati,Wami / Ruvu,Mwati,Morogoro,5.0,6.0,Mvomero,Mvomero,...,borehole,groundwater,communal standpipe,communal standpipe,2011.0,394.0,37.442462,-6.301995,1.000000,2009.000000
17275,District Council,District Council,Kwa Joeli Mege,Pangani,Kihara Kati,Kilimanjaro,3.0,2.0,Mwanga,Lembeni,...,spring,groundwater,communal standpipe,communal standpipe,2013.0,917.0,37.597108,-3.701546,67.000000,2013.000000
