# Phase 1 Collection (11/04/2023 - 31/05/2023)

During phase 1 collection, two NDIR CO2 sensors (model: Sensirion SCD30) were used to measure the CO2 levels within the glass container. Both sensor modules also include temperature and humidity sensors, which allow them to measure how hot and humid the inside of the container is.

## Data Analysis

The collected data shows several patterns:

\<We also want to add data visualisations here about these key observed patterns\>

1. **Daily:** Temperatures rise according to how sunny the day is. The highest temperature reached on any given day is ___. The temperature required for decomposition of PolyTerra PLA is stated by the manufacturer to be ___.
2. **Daily:** Humidity falls as temperature rises, as the water ___ (does the water condense out?).
3. **Overall Trend:** There was no observed increase in CO2 throughout the experimentation period.
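
One way these three patterns could be quantified is sketched below. It assumes a merged frame with a datetime index and columns named `co2`, `temp` and `hum`; the helper `summarise_patterns` is hypothetical, and the demo frame is synthetic stand-in data, not measurements from the experiment.

```python
import numpy as np
import pandas as pd


def summarise_patterns(frame):
    """Summarise the three patterns for a frame with a DatetimeIndex
    and 'co2', 'temp' and 'hum' columns (assumed names)."""
    # pattern 1: highest temperature reached on each calendar day
    daily_max_temp = frame['temp'].resample('D').max()
    # pattern 2: Pearson correlation; near -1 means humidity falls as temperature rises
    temp_hum_corr = frame['temp'].corr(frame['hum'])
    # pattern 3: least-squares slope of CO2 against elapsed days (units per day)
    valid = frame['co2'].dropna()
    elapsed_days = ((valid.index - valid.index[0]).total_seconds() / 86400).to_numpy()
    co2_slope = np.polyfit(elapsed_days, valid.to_numpy(), 1)[0]
    return daily_max_temp, temp_hum_corr, co2_slope


# synthetic stand-in data (NOT measurements from the experiment)
index = pd.date_range('2023-04-11', periods=3 * 288, freq='5min', tz='Asia/Singapore')
hours = index.hour.to_numpy() + index.minute.to_numpy() / 60
temp = 28 + 5 * np.sin((hours - 8) / 24 * 2 * np.pi)
demo = pd.DataFrame({'co2': 420.0, 'temp': temp, 'hum': 100 - 1.5 * temp}, index=index)

daily_max_temp, temp_hum_corr, co2_slope = summarise_patterns(demo)
print(daily_max_temp)
print(temp_hum_corr, co2_slope)
```

A slope close to zero would be consistent with pattern 3, though a proper conclusion would also need to account for sensor noise and drift.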

Weight analysis: \<To be done\>

\<Add photos here of the probably decomposed plastics?\>

As such, we conclude that during phase 1, \<conclusion here\> PLA decomposition did/did not occur within the glass envelope.

## Process 1: ingestion of data

This phase ingests multiple CSV files and cleans them into pandas DataFrames, one per measurement type.

In [16]:
import pandas as pd
import os

# ingest data: one directory of exported CSV files per measurement type
def load_feed(directory):
    """Read every CSV file in `directory` and concatenate them into one DataFrame."""
    frames = []
    for filename in os.listdir(directory):
        f = os.path.join(directory, filename)
        # only read regular files with a .csv extension
        if os.path.isfile(f) and filename.endswith('.csv'):
            # column index 3 holds the timestamp (created_at)
            frames.append(pd.read_csv(f, parse_dates=[3], encoding='utf-8'))
    return pd.concat(frames, axis=0, ignore_index=True)

co2_directory = 'CO2'
hum_directory = 'Humidity'
temp_directory = 'Temperature'

frame_co2 = load_feed(co2_directory)
frame_hum = load_feed(hum_directory)
frame_temp = load_feed(temp_directory)

# clean data (strip metadata)
frame_co2 = frame_co2.drop(columns=['id', 'feed_id', 'lat', 'lon', 'ele'])
frame_hum = frame_hum.drop(columns=['id', 'feed_id', 'lat', 'lon', 'ele'])
frame_temp = frame_temp.drop(columns=['id', 'feed_id', 'lat', 'lon', 'ele'])

# convert the (timezone-aware) timestamps to Singapore time
frame_co2['created_at'] = frame_co2['created_at'].dt.tz_convert('Asia/Singapore')
frame_hum['created_at'] = frame_hum['created_at'].dt.tz_convert('Asia/Singapore')
frame_temp['created_at'] = frame_temp['created_at'].dt.tz_convert('Asia/Singapore')

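# dedupe the overlapping times, then make the datetime the index
# (drop_duplicates() ignores the index, so dedupe on 'created_at' before indexing)
frame_co2 = frame_co2.drop_duplicates(subset='created_at').set_index('created_at')
frame_hum = frame_hum.drop_duplicates(subset='created_at').set_index('created_at')
frame_temp = frame_temp.drop_duplicates(subset='created_at').set_index('created_at')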
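
One thing worth checking in the dedupe step: `DataFrame.drop_duplicates()` compares column values and does not look at the index. The sketch below uses two small synthetic exports (hypothetical values, overlapping on one reading) to show the difference between deduplicating on the timestamp and deduplicating after the timestamp has become the index.

```python
import pandas as pd

# two hypothetical exports that overlap on the 10:05 reading (synthetic values)
export_a = pd.DataFrame({
    'created_at': pd.to_datetime(['2023-04-11 10:00', '2023-04-11 10:05'], utc=True),
    'value': [400.0, 400.0],
})
export_b = pd.DataFrame({
    'created_at': pd.to_datetime(['2023-04-11 10:05', '2023-04-11 10:10'], utc=True),
    'value': [400.0, 410.0],
})
combined = pd.concat([export_a, export_b], ignore_index=True)

# dedupe on the timestamp column, then index by it
deduped = combined.drop_duplicates(subset='created_at').set_index('created_at')

# for comparison: deduping after set_index compares only the 'value' column,
# so the distinct 10:00 and 10:05 readings collapse into one row
by_value = combined.set_index('created_at').drop_duplicates()

print(len(deduped), len(by_value))
```

Deduplicating on the timestamp keeps one row per reading time, whereas deduplicating on values alone can silently discard legitimate readings that happen to repeat a value.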


## Process 2: datetime processing

The timestamps have to be normalized to 5-minute intervals so that the three types of data can be correlated with each other. The resulting data can be pickled and cached on disk for rapid re-computation when tweaking the visualization algorithms.

In [17]:
# combine all into one big dataframe
frame_co2 = frame_co2.rename(columns = {'value':'co2'})
frame_temp = frame_temp.rename(columns = {'value':'temp'})
frame_hum = frame_hum.rename(columns = {'value':'hum'})

frame_final = pd.merge(frame_co2, frame_hum, how='outer', on='created_at')
frame_final = pd.merge(frame_final, frame_temp, how='outer', on='created_at')

frame_final.sort_index().to_csv('debug.csv') # debug only: dump the merged data to see what it currently looks like
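
The 5-minute normalization and on-disk caching described for this process could be sketched as follows. This is a minimal sketch under assumptions: averaging the readings in each 5-minute bin is one possible aggregation, the helper `normalise_to_5min` and the file name `frame_5min.pkl` are hypothetical, and the demo frame is synthetic rather than the real merged data.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def normalise_to_5min(frame):
    """Average all readings that fall into each 5-minute bin (assumed aggregation)."""
    return frame.sort_index().resample('5min').mean()


# synthetic stand-in for the merged frame: the feeds report at slightly different times
index = pd.to_datetime(
    ['2023-04-11 10:00:20', '2023-04-11 10:01:10', '2023-04-11 10:05:40'], utc=True
).tz_convert('Asia/Singapore')
demo = pd.DataFrame(
    {
        'co2': [400.0, np.nan, 410.0],
        'hum': [np.nan, 60.0, 58.0],
        'temp': [30.0, np.nan, 31.0],
    },
    index=index,
)

normalised = normalise_to_5min(demo)

# cache on disk so later cells can reload instead of recomputing
cache_path = os.path.join(tempfile.gettempdir(), 'frame_5min.pkl')  # hypothetical file name
normalised.to_pickle(cache_path)
reloaded = pd.read_pickle(cache_path)
print(reloaded)
```

Because `mean()` skips missing values, readings from different feeds that land in the same bin end up on a single row, which is what makes the three series directly comparable.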