# Data Cleaning 

## Steps

1. **Import Required Libraries:**
   - Import the necessary libraries for data manipulation and visualization, including `os`, `pandas`, `matplotlib.pyplot`, and `seaborn`.

2. **Import Data Cleaning Functions:**
   - Import the `readCSV` and `cleanData` functions from `DataCleaning.py` for use in your notebook.

3. **Read Data:**
   - Define file paths for each dataset.
   - Use the `readCSV` function to read each CSV file into a pandas DataFrame.

4. **Data Cleaning:**
   - Call the `cleanData` function with the DataFrames from step 3 as arguments.
   - The cleaning includes dropping NaN values, removing duplicates, and checking for outliers.

5. **Save Cleaned Data (Optional):**
   - Save the cleaned DataFrames to new CSV files for further analysis or sharing.

6. **Visualize Results (Optional):**
   - Generate visualizations such as pair plots to inspect the distribution of data points and identify outliers.
   - Display these plots in your notebook or save them to disk.
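
The operations named in step 4 can be sketched for a single DataFrame as follows. This is a minimal illustration and not the actual `cleanData` implementation, which lives in the project's own module and is not shown here. The helper names `clean_frame` and `flag_outliers_iqr`, the sample columns, and the 1.5 × IQR rule are all assumptions made for the example.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows containing NaN, then drop duplicate rows."""
    cleaned = df.dropna().drop_duplicates()
    return cleaned.reset_index(drop=True)


def flag_outliers_iqr(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Hypothetical helper: boolean mask of numeric values outside the IQR fences.

    The k * IQR rule is an assumed convention, not taken from cleanData.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series index to the columns
    return (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)


# Made-up sample data for illustration only
raw = pd.DataFrame(
    {
        "area": ["A", "B", "B", "C", "D", "E", "F"],
        "income": [50.0, 52.0, 52.0, None, 49.0, 51.0, 500.0],
    }
)

cleaned = clean_frame(raw)
outliers = flag_outliers_iqr(cleaned)
print(cleaned)
print(int(outliers["income"].sum()), "potential outlier(s) flagged")
```

The sketch only flags potential outliers instead of removing them, which matches the "checking for outliers" wording in step 4; whether `cleanData` removes or merely reports them is not stated here.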

## Sample Code Template

```python
# Import required libraries
import os
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Import data cleaning functions
from DataCleaning import readCSV, cleanData

# Define file paths
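# __file__ is not defined inside a notebook, so use the working directory
# (assumes the notebook is run from the folder that contains "Data")
currentDir = os.getcwd()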
BusinessPath = os.path.join(currentDir, "Data", "Businesses.csv")
IncomePath = os.path.join(currentDir, "Data", "Income.csv")
PollingPlacesPath = os.path.join(currentDir, "Data", "PollingPlaces2019.csv")
PopulationPath = os.path.join(currentDir, "Data", "Population.csv")
StopsPath = os.path.join(currentDir, "Data", "Stops.txt")

# Read CSV files
BusinessCSV = readCSV(BusinessPath)
IncomeCSV = readCSV(IncomePath)
PollingPlacesCSV = readCSV(PollingPlacesPath)
PopulationCSV = readCSV(PopulationPath)
StopsCSV = readCSV(StopsPath)

# Clean data
Business, Income, PollingPlace, Population, Stops = cleanData(BusinessCSV, IncomeCSV, PollingPlacesCSV, PopulationCSV, StopsCSV)

# Optional: Save cleaned data to new CSV files
# os.makedirs('CleanedData', exist_ok=True)  # to_csv does not create missing folders
# Business.to_csv('CleanedData/BusinessCleaned.csv', index=False)
# Income.to_csv('CleanedData/IncomeCleaned.csv', index=False)
# PollingPlace.to_csv('CleanedData/PollingPlaceCleaned.csv', index=False)
# Population.to_csv('CleanedData/PopulationCleaned.csv', index=False)
# Stops.to_csv('CleanedData/StopsCleaned.csv', index=False)

# Optional: Visualize results
# Pair plots for outlier detection, one figure per dataset
for df in (Business, Income, PollingPlace, Population, Stops):
    sns.pairplot(df)
    plt.show()