# **Project: Predicting the Condition of Water Wells in Tanzania**

## 1. Business Understanding

### Background:

  Access to clean water from functioning wells is a critical challenge in Tanzania, where over 57 million people depend on water wells. Keeping these wells functional can significantly improve quality of life, reduce waterborne diseases, and support local economies.
  However, many wells fall into disrepair or become non-functional due to preventable issues. With over 50,000 recorded wells, the ability to predict well functionality can help optimize resource allocation for maintenance and repairs.

### Problem Statement:

  NGOs and government bodies currently rely on limited, manual assessments to determine well conditions. This approach is time-consuming and inefficient. A predictive model could provide an automated, data-driven alternative, enabling stakeholders to prioritize interventions effectively.

### Objectives:

 1. Develop a classification model to predict whether a water well is Functional, Needs Repair, or Non-functional.
 2. Identify the key factors contributing to well condition and recommend actionable strategies to improve well functionality.
 3. Compare multiple machine learning models (Logistic Regression, Decision Tree, and Random Forest) to determine the best-performing algorithm.
 4. Deliver insights to stakeholders, including feature importance and predictions, to inform policy and maintenance strategies.

### Audience:
This project targets:
  - NGOs focused on water access and sustainable development.
  - The Tanzanian government, seeking to improve public infrastructure and water security.
  - Data scientists interested in real-world applications of classification models for social impact.


### **1. Import Required Libraries**

```python
# Data handling and visualization
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Modeling and evaluation (scikit-learn)
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    accuracy_score, f1_score, confusion_matrix, classification_report,
    roc_curve, auc, precision_recall_curve, average_precision_score,
)
```
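
To make the role of these imports concrete, the following is a minimal sketch of how the three candidate models named in the objectives might be compared on a held-out split. It is an illustration under stated assumptions, not the project's actual pipeline: the data is synthetic (generated with `make_classification` as a stand-in for the well records), and the names `candidate_models` and `comparison`, the 80/20 split, and the hyperparameters are all hypothetical choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic three-class data standing in for the real well
# records (Functional, Needs Repair, Non-functional).
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=6,
    n_classes=3,
    random_state=42,
)

# Hold out 20% of the rows, keeping class proportions equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Hypothetical settings; these are illustrative defaults, not tuned values.
candidate_models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Fit each model and record accuracy and macro-averaged F1 on the test split.
comparison = {}
for name, model in candidate_models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    comparison[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "macro_f1": f1_score(y_test, y_pred, average="macro"),
    }

for name, scores in comparison.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f}, "
          f"macro F1={scores['macro_f1']:.3f}")
```

Macro-averaged F1 is reported next to accuracy because it weights every class equally, which may matter if one well status is much rarer than the others. Scores obtained on this synthetic data say nothing about performance on the real well records.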