# Spare-it DS701 Project

In [3]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Client Overview and Problem Statement

## 1.1 About Spare-it


> **Spare-it: Pioneering Sustainable Waste Management with Cutting-Edge Technology**
- Spare-it is a leading sustainable waste management company, harnessing technology and environmental stewardship to help businesses, universities, and office owners reduce their environmental impact, save costs, and enhance efficiency. Its core strategy involves a robust technological framework, combining advanced hardware and software, strategic office design, and gamification to develop comprehensive waste management programs. These initiatives not only boost recycling but also aim to significantly cut waste production.
- The company’s mission centers on being a sustainability catalyst, driven by the belief that environmental awareness and behavior change require solid data intelligence. Spare-it provides real-time data on various waste management aspects, including general waste, recycling, and resource utilization like energy and water. This approach empowers organizations to make informed decisions and adopt effective waste reduction strategies.
- Spare-it's innovative approach, particularly through gamification, fosters a culture of environmental responsibility among employees and students, making waste management an engaging and meaningful endeavor. Ultimately, Spare-it positions itself as more than a waste management company; it's a collaborative partner in shaping a sustainable future, one step at a time.

## 1.2 Problem Statement

### **Task 1:** Comparative Analysis of Manual (Fullness) Versus Scale Weight Measurements

> **Overview:** This analysis compares the manual (fullness) measurements conducted by students with the scale measurements recorded in our dataset. The objective is to identify notable discrepancies between the two methods and to explore their potential causes. The results indicate how reliable and accurate manual estimates are relative to scale measurements, and where the data collection methods can be improved.
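
One possible way to line up the two measurement types is sketched below. It is an illustration only, not the method used in this notebook: it assumes `fullness` is numeric, that `date` and `createdat` parse as timestamps, that `bin` identifies the same bins in both datasets, and that the daily maximum scale weight is a reasonable summary of a bin's load. The helper `daily_rank_correlation` and the toy values are hypothetical.

```python
import pandas as pd

def daily_rank_correlation(fullness_df: pd.DataFrame, scale_df: pd.DataFrame):
    """Join manual fullness to daily scale weight and return (merged, rank correlation)."""
    # Assumption: one scale value per bin per day, taken as the daily maximum weight.
    scale = scale_df.copy()
    scale["date"] = pd.to_datetime(scale["createdat"]).dt.date
    daily_weight = scale.groupby(["bin", "date"], as_index=False)["weight"].max()

    manual = fullness_df.copy()
    manual["date"] = pd.to_datetime(manual["date"]).dt.date

    # Keep only bin/day pairs that have both a manual and a scale measurement.
    merged = manual.merge(daily_weight, on=["bin", "date"], how="inner")

    # Spearman rank correlation, computed as Pearson correlation of the ranks.
    rho = merged["fullness"].rank().corr(merged["weight"].rank())
    return merged, rho

# Made-up toy values, for illustration only.
toy_fullness = pd.DataFrame({
    "bin": ["b1", "b1", "b1"],
    "date": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "fullness": [25, 50, 100],
})
toy_scale = pd.DataFrame({
    "bin": ["b1", "b1", "b1", "b1", "b2"],
    "createdat": ["2023-01-01 08:00", "2023-01-01 17:00", "2023-01-02 09:00",
                  "2023-01-03 10:00", "2023-01-01 12:00"],
    "weight": [1.0, 2.0, 3.5, 6.0, 9.0],
})

merged, rho = daily_rank_correlation(toy_fullness, toy_scale)
print(merged)
print(rho)
```

A rank correlation is used here because fullness and weight are on different scales, so only their ordering is compared. Real timestamps may also need time zone handling before they are truncated to a date.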

### **Task 2:** Noise and Signal Ratio Analysis for Scales

# 2. EDA

## 2.1 Data Overview

> We have two datasets:
- `scale_records`: This dataset comprises automated scale measurements of waste bins. It includes the weight data along with error codes and other metadata.
- `fullness_assessments`: This dataset represents manual assessments of waste bin fullness. Each record includes details about the bin, its location, and the assessment specifics.

In [4]:
fullness_assessments = pd.read_csv("/Users/ishan/Downloads/fullness-assessments CCDS as of Oct 13 2023.csv")
scale_records = pd.read_csv("/Users/ishan/Downloads/Spare-it V5 scale records Jan 1 to Oct 16 2023-001.csv")

## 2.2 Data Cleaning

### 2.2.1 Missing Values

In [5]:
print(fullness_assessments.isna().sum())
print(scale_records.isna().sum())

bin            0
date           0
account        0
building       0
floor          0
stationName    0
binName        0
category       0
fullness       0
dtype: int64
createdat            0
iotid                0
hide                 0
bin                  0
errorcode            0
weight               0
battery              0
updatedat            0
weightdiff    26979124
year                 0
month                0
day                  0
dtype: int64
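
The raw counts above are easier to interpret as a share of all rows. The sketch below shows one way to report both. The helper `missing_summary` is hypothetical, and the toy frame uses made-up values in place of `scale_records`.

```python
import pandas as pd

def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, most-missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })
    return summary.sort_values("missing", ascending=False)

# Made-up toy values standing in for scale_records.
toy = pd.DataFrame({
    "weight": [1.2, 3.4, 5.6, 7.8],
    "weightdiff": [None, 0.5, None, None],
})

report = missing_summary(toy)
print(report)
```

Calling `missing_summary(scale_records)` would show what fraction of rows lack `weightdiff`, which helps decide whether to drop the column or recompute it.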


### 2.2.2 Duplicate records

In [7]:
duplicates = scale_records[scale_records.duplicated()]
duplicates.shape  # about 11.7M fully duplicated rows

(11684870, 12)

In [8]:
scale_records_clean = scale_records.drop_duplicates()
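
A quick sanity check on the de-duplication step is sketched below, using a made-up toy frame. By default `duplicated()` flags every repeat after the first occurrence and `drop_duplicates()` keeps that first occurrence, so the cleaned length should equal the original length minus the duplicate count. The second check, on a key subset, assumes that `bin` plus `createdat` should identify a reading; that is an assumption, not something the data documentation confirms.

```python
import pandas as pd

# Made-up toy values, for illustration only.
toy = pd.DataFrame({
    "bin": ["a", "a", "a", "b"],
    "createdat": ["t1", "t1", "t1", "t1"],
    "weight": [1.0, 1.0, 1.5, 2.0],
})

# Rows identical in every column to an earlier row.
n_dupes = int(toy.duplicated().sum())
toy_clean = toy.drop_duplicates()
assert len(toy_clean) == len(toy) - n_dupes

# Rows that repeat an earlier (bin, createdat) pair, even if the weight differs.
n_key_dupes = int(toy.duplicated(subset=["bin", "createdat"]).sum())

print(n_dupes, len(toy_clean), n_key_dupes)
```

If the key-subset count is higher than the full-row count, some readings share a bin and timestamp but disagree on other fields, and those rows survive `drop_duplicates()`.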

# 3. Comparative analysis of manual fullness and scale weight measurements

# 4. Bin weight anomaly detection

# 5. Limitations and potential risks