# Introduction 📝
🎯 **Goal:** predict air pollution measurements over time from basic weather information (temperature and humidity) and the readings of 5 sensors.

📖 **Data:** 

> **train.csv / test.csv** - the training and testing set
> - ```date_time``` - Timestamp of the recording
> - ```deg_C``` - Temperature in degrees Celsius
> - ```relative_humidity``` - Relative humidity: the amount of water vapor in the air relative to the maximum the air can hold at its current temperature
> - ```absolute_humidity``` - Absolute humidity: the amount of water vapor (moisture) in the air, regardless of temperature
> - ```sensor_1-sensor_5``` - Sensor values
> - ```target_carbon_monoxide``` - Target carbon monoxide reading
> - ```target_benzene``` - Target benzene reading
> - ```target_nitrogen_oxides``` - Target nitrogen oxides reading

📌 **Note:** ```target_carbon_monoxide```, ```target_benzene``` and ```target_nitrogen_oxides``` are blank in the test set.


🧪 **Evaluation metric:** Root Mean Squared Logarithmic Error (RMSLE)
> $$RMSLE = \sqrt{\frac{1}{n} \sum_{i=1}^n (\log(p_i + 1) - \log(a_i+1))^2 },$$
> where 
> * $n$ is the total number of observations
> * $p_i$ is your prediction
> * $a_i$ is the actual value
> * $\log(x)$ is the natural logarithm of $x$
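The logarithmic error defined above can be computed locally to score validation predictions. The following is a minimal sketch, not the competition's official scorer; the function name `rmsle` and the toy values are assumptions made for illustration.

```python
import numpy as np

def rmsle(actual, predicted):
    """Root mean squared logarithmic error, following the formula above."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # np.log1p(x) computes log(x + 1) and is more precise near zero
    log_diff = np.log1p(predicted) - np.log1p(actual)
    return float(np.sqrt(np.mean(log_diff ** 2)))

# Toy values, not taken from the competition data
example_actual = [1.0, 2.0, 3.0]
example_pred = [1.0, 2.0, 3.0]
perfect_score = rmsle(example_actual, example_pred)  # identical inputs give zero error
```

Because the error is taken on the log scale, it penalizes relative rather than absolute differences, so a fixed offset matters more for small readings than for large ones.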

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

### Read Data

In [None]:
train_data=pd.read_csv('../input/tabular-playground-series-jul-2021/train.csv')
test_data=pd.read_csv('../input/tabular-playground-series-jul-2021/test.csv')
sample_submission=pd.read_csv('../input/tabular-playground-series-jul-2021/sample_submission.csv')

In [None]:
# Exclude the temperature column (deg_C) from both sets
train_data.drop(['deg_C'], axis=1, inplace=True)
test_data.drop(['deg_C'], axis=1, inplace=True)

### Missing values🔮

There are no missing values in either the train or the test data.

In [None]:
palette = ["#7209B7","#3F88C5","#136F63","#F72585","#FFBA08"]
msno.bar(train_data,color=palette[2], sort="ascending", figsize=(10,5), fontsize=12)
plt.show()

msno.bar(test_data,color=palette[2], sort="ascending", figsize=(10,5), fontsize=12)
plt.show()
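The bar charts give a visual check; the same thing can be confirmed numerically by counting missing entries per column. This is a small sketch on a toy frame. The helper name `missing_summary` and the toy values are assumptions for illustration, and on the real data it would be called on `train_data` and `test_data`.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Return the number of missing values per column, largest first."""
    return df.isna().sum().sort_values(ascending=False)

# Toy frame with one deliberately missing value (not competition data)
toy = pd.DataFrame({
    "sensor_1": [1.0, np.nan, 3.0],
    "sensor_2": [4.0, 5.0, 6.0],
})
toy_missing = missing_summary(toy)
print(toy_missing)
```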

The train set covers 10 March 2010 to 1 January 2011.

In [None]:
print(train_data['date_time'].min())
print(train_data['date_time'].max())

The test set covers 1 January 2011 to 4 April 2011.

In [None]:
print(test_data['date_time'].min())
print(test_data['date_time'].max())
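Taking `min()` and `max()` of a string column compares the values as text, which only matches chronological order when the timestamps are in a sortable format. A safer variant parses the column first. The sketch below uses toy timestamps and assumes a `YYYY-MM-DD HH:MM:SS` layout, which may not match the actual files.

```python
import pandas as pd

# Toy timestamps (assumed format, not read from the competition files)
toy_dates = pd.DataFrame({
    "date_time": [
        "2010-03-10 18:00:00",
        "2010-12-31 23:00:00",
        "2010-03-10 19:00:00",
    ]
})

# Parse to datetime so min/max compare real points in time
parsed = pd.to_datetime(toy_dates["date_time"])
date_range = (parsed.min(), parsed.max())
span = parsed.max() - parsed.min()
print(date_range, span)
```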

**We are going to use H2O AutoML to start with**

In [None]:
import h2o
print(h2o.__version__)
from h2o.automl import H2OAutoML

h2o.init(max_mem_size='16G')

In [None]:
%%time
train = h2o.H2OFrame(train_data)
test = h2o.H2OFrame(test_data)

Model 1 predicts **target_carbon_monoxide**. First, define the three targets and the feature columns shared by all models.

In [None]:
y1 = 'target_carbon_monoxide'
y2 = 'target_benzene'
y3 = 'target_nitrogen_oxides'

# Features: every column except the three targets
x = [col for col in train.columns if col not in (y1, y2, y3)]

In [None]:
aml = H2OAutoML(max_runtime_secs = 3500, seed = 1, project_name = "target_carbon_monoxide_automl")
aml.train(x = x, y = y1, training_frame = train)

In [None]:
import gc
gc.collect()

In [None]:
aml1 = H2OAutoML(max_runtime_secs = 3500, seed = 1, project_name = "target_benzene_automl")
aml1.train(x = x, y = y2, training_frame = train)

In [None]:
import gc
gc.collect()

In [None]:
aml2 = H2OAutoML(max_runtime_secs = 3500, seed = 1, project_name = "target_nitrogen_oxides_automl")
aml2.train(x = x, y = y3, training_frame = train)

In [None]:
import gc
gc.collect()

**Find the best models for each of the targets**

In [None]:
lb = aml.leaderboard
lb.head() 

In [None]:
lb = aml1.leaderboard
lb.head() 

In [None]:
lb = aml2.leaderboard
lb.head() 

**Make Predictions**

In [None]:
pred1 = aml.predict(test)
pred1.head()

In [None]:
pred2 = aml1.predict(test)
pred2.head()

In [None]:
pred3 = aml2.predict(test)
pred3.head()

**Make Submissions**

In [None]:
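# For regression, H2O returns a single 'predict' column; take it as a 1-D array
test_data['target_carbon_monoxide'] = pred1.as_data_frame()['predict'].values
test_data['target_benzene'] = pred2.as_data_frame()['predict'].values
test_data['target_nitrogen_oxides'] = pred3.as_data_frame()['predict'].values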

In [None]:
req_data=test_data[['date_time','target_carbon_monoxide','target_benzene','target_nitrogen_oxides']]
req_data.to_csv('submission.csv', index=False)
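Since the metric takes $\log(p_i + 1)$, a prediction at or below -1 has no defined score, and a regression model can occasionally output negative values. One optional safeguard, which is not part of the pipeline above, is to clip predictions at zero and check the frame before writing it. The helper `prepare_submission` and the toy values below are hypothetical.

```python
import pandas as pd

TARGETS = ["target_carbon_monoxide", "target_benzene", "target_nitrogen_oxides"]

def prepare_submission(df, targets=TARGETS):
    """Select the submission columns, clip negatives to 0, and reject NaNs."""
    cols = list(targets)
    out = df[["date_time"] + cols].copy()
    # Assumption: the target readings are non-negative, so clip at zero
    out[cols] = out[cols].clip(lower=0)
    if out[cols].isna().any().any():
        raise ValueError("submission contains missing predictions")
    return out

# Toy predictions with one negative value (not real model output)
toy_pred = pd.DataFrame({
    "date_time": ["2011-01-01 00:00:00", "2011-01-01 01:00:00"],
    "sensor_1": [1000.0, 1100.0],
    "target_carbon_monoxide": [1.2, -0.1],
    "target_benzene": [5.0, 6.0],
    "target_nitrogen_oxides": [100.0, 120.0],
})
toy_submission = prepare_submission(toy_pred)
print(toy_submission)
```

On the real data the call would be `prepare_submission(test_data).to_csv('submission.csv', index=False)`.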