
# Project 12: IoT Devices - Time Series Plots


### Context

IoT devices have been around for quite a while now. They collect data through different kinds of sensors, such as:

- Motion sensors: These use a visual sensor to detect a change in the apparent temperature of the surroundings, or to detect when someone enters the field of view of a camera.

- Heat sensors: These are used in trucks that carry perishable goods such as fish and milk, where changes in temperature shorten the shelf life of the produce. They are also used to detect forest fires.

- Vibration sensors: These are used in car crash tests and to detect whether someone is trying to cut down a tree.

Many people use smartwatches and fitness watches to track their daily physical activity, calories burnt, average resting heart rate and sleep cycle so that they can lead a fit life. Such wearables are equipped with laser sensors to collect data.

The heat index (temperature + humidity) is one quantity commonly recorded by these IoT devices. The data arrives at a very high frequency: a sensor can take anywhere from hundreds to millions of readings per second. This data has a wide range of real-world applications, such as agriculture, weather forecasting, soil monitoring and treatment, and enterprise maintenance.

Heat stress index of India.

<img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/heat_index_india.png' width=600>


---

### Problem Statement

Put yourself in the shoes of a quality analyst whose task is to test the efficacy of new IoT devices. You need to create time-series plots for daily temperature variation for the given duration and find any inconsistencies in the temperature readings (if there are any).

In case the data collected through the device is correct, find the percentages of the yellow, orange and red zones.

---

#### Getting Started

Follow the steps described below to solve the project:

1. Click on the link provided below to open the Colab file for this project.
   
   https://colab.research.google.com/drive/1XhdWZL_DsL_8HWS58HZ2uU8C0PHQl2Yj?usp=sharing

2. Create a duplicate copy of the Colab file. Here are the steps to create the duplicate copy:

    - Click on the **File** menu. A new drop-down list will appear.

      <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/0_file_menu.png' width=500>

    - Click on the **Save a copy in Drive** option. A duplicate copy will be created and will open in a new tab of your web browser.

      <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/1_create_colab_duplicate_copy.png' width=500>

    - After creating the duplicate copy of the notebook, rename it in the **YYYY-MM-DD_StudentName_CapstoneProject12** format.

3. Now, write your code in the prescribed code cells.

---

### Data Description

This dataset contains the temperature readings from an IoT device installed outside and inside an anonymous room (labelled as the admin room) to test the device. The readings were taken between 11 January 2018 and 10 December 2018. The device was uninstalled or taken down quite frequently during the reading period. There are 5 columns and 97,606 rows in the dataset.

1. `id` - unique IDs for each reading

2. `room_id/id` - ID of the room in which the device was installed (inside and/or outside). In this dataset, only the `Room Admin` label is used as a `room_id`, for example purposes.

3. `noted_date` - date and time of reading

4. `temp` - temperature readings

5. `out/in` - whether the reading was taken from a device installed inside or outside the room

Here's the dataset link:

https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/iot-devices/IoT-device.csv

---

### Things To Do

- What is the trend in the variation in daily indoor and outdoor temperatures?

- What is the trend in the variation in monthly median indoor and outdoor temperatures?

- Find out the hottest and coldest month(s).

- Find the maximum and minimum temperatures recorded for each month.

- Find the hottest and coldest days for each month along with the temperatures.

- Get the percentage distribution of heat zones (green, yellow, orange and red) as per the heat index table shown in the **Context** section.

---

#### 1. Import Modules & Load Dataset

In [None]:
# Import the required modules and load the dataset.
import pandas as pd
import datetime as dt

In [None]:
# Get the information on DataFrame.
room_df = pd.read_csv('https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/whitehat-ds-datasets/iot-devices/IoT-device.csv')
room_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 97606 entries, 0 to 97605
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   id          97606 non-null  object
 1   room_id/id  97606 non-null  object
 2   noted_date  97606 non-null  object
 3   temp        97606 non-null  int64 
 4   out/in      97606 non-null  object
dtypes: int64(1), object(4)
memory usage: 3.7+ MB


---

#### 2. Missing Values Check

Check for the null values in the DataFrame.

In [None]:
# Check for the null values in the DataFrame.
room_df.isnull()

| | id | room_id/id | noted_date | temp | out/in |
|---|---|---|---|---|---|
| 0 | False | False | False | False | False |
| 1 | False | False | False | False | False |
| 2 | False | False | False | False | False |
| 3 | False | False | False | False | False |
| 4 | False | False | False | False | False |
| ... | ... | ... | ... | ... | ... |
| 97601 | False | False | False | False | False |
| 97602 | False | False | False | False | False |
| 97603 | False | False | False | False | False |
| 97604 | False | False | False | False | False |


**Q:** Are there any columns in the DataFrame containing the missing values? If yes, then provide the column names.

**A:** No, there are no columns in the DataFrame containing missing values.
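
`room_df.isnull()` returns one Boolean row per reading, so the answer is easier to read from a per-column count. The following is a minimal sketch on a made-up three-row frame; the sample values (including the deliberately missing date) are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample with the same column names as the dataset.
sample_df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "room_id/id": ["Room Admin"] * 3,
    "noted_date": ["08-12-2018 09:30", "08-12-2018 09:29", None],
    "temp": [29, 41, 31],
    "out/in": ["In", "Out", "In"],
})

# Count the missing values in each column.
null_counts = sample_df.isnull().sum()
print(null_counts)

# Names of the columns that contain at least one missing value.
cols_with_nulls = null_counts[null_counts > 0].index.tolist()
print(cols_with_nulls)
```

On the real data, the same two lines applied to `room_df` would give one count per column instead of 97,606 rows of `True`/`False`.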

---

#### 3. Drop Unnecessary Columns


Find out if there are unnecessary columns in the DataFrame. If there are any, then drop them from the DataFrame.

In [None]:
# Find out if there are unnecessary columns in the DataFrame. If there are any, then drop them from the DataFrame.
# Note: dropna() removes rows that contain missing values and leaves every column in place.
room_df.dropna()

| | id | room_id/id | noted_date | temp | out/in |
|---|---|---|---|---|---|
| 0 | `__export__.temp_log_196134_bd201015` | Room Admin | 08-12-2018 09:30 | 29 | In |
| 1 | `__export__.temp_log_196131_7bca51bc` | Room Admin | 08-12-2018 09:30 | 29 | In |
| 2 | `__export__.temp_log_196127_522915e3` | Room Admin | 08-12-2018 09:29 | 41 | Out |
| 3 | `__export__.temp_log_196128_be0919cf` | Room Admin | 08-12-2018 09:29 | 41 | Out |
| 4 | `__export__.temp_log_196126_d30b72fb` | Room Admin | 08-12-2018 09:29 | 31 | In |
| ... | ... | ... | ... | ... | ... |
| 97601 | `__export__.temp_log_91076_7fbd08ca` | Room Admin | 28-07-2018 07:07 | 31 | In |
| 97602 | `__export__.temp_log_147733_62c03f31` | Room Admin | 28-07-2018 07:07 | 31 | In |
| 97603 | `__export__.temp_log_100386_84093a68` | Room Admin | 28-07-2018 07:06 | 31 | In |
| 97604 | `__export__.temp_log_123297_4d8e690b` | Room Admin | 28-07-2018 07:06 | 31 | In |
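
One way to judge whether a column is unnecessary is to count its distinct values: a column that holds a single value (such as `room_id/id`, which the Data Description says only ever contains `Room Admin`) adds nothing to the analysis. The sketch below uses a made-up frame; which columns to drop from the real data remains a judgment call.

```python
import pandas as pd

# Hypothetical sample with the same column names as the dataset.
sample_df = pd.DataFrame({
    "id": ["a", "b", "c"],
    "room_id/id": ["Room Admin"] * 3,
    "noted_date": ["08-12-2018 09:30", "08-12-2018 09:29", "28-07-2018 07:06"],
    "temp": [29, 41, 31],
    "out/in": ["In", "Out", "In"],
})

# Number of distinct values per column.
print(sample_df.nunique())

# Columns holding one constant value carry no information.
constant_cols = [col for col in sample_df.columns if sample_df[col].nunique() == 1]

# drop() returns a new DataFrame, so keep the result.
trimmed_df = sample_df.drop(columns=constant_cols)
print(trimmed_df.columns.tolist())
```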


---

#### 4. Get `datetime` Objects

Convert the values contained in the `noted_date` column into the `datetime` objects.

In [None]:
# Convert the values contained in the 'noted_date' column into the 'datetime' objects.
noted_data_dt = pd.to_datetime(room_df['noted_date'])

In [None]:
# Verify whether the conversion is successful or not.
noted_data_dt

0       2018-08-12 09:30:00
1       2018-08-12 09:30:00
2       2018-08-12 09:29:00
3       2018-08-12 09:29:00
4       2018-08-12 09:29:00
                ...        
97601   2018-07-28 07:07:00
97602   2018-07-28 07:07:00
97603   2018-07-28 07:06:00
97604   2018-07-28 07:06:00
97605   2018-07-28 07:06:00
Name: noted_date, Length: 97606, dtype: datetime64[ns]
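
The raw strings appear to be day-first (`28-07-2018` cannot be read month-first), yet in the output above `08-12-2018` was read as 12 August. If the column really is `DD-MM-YYYY HH:MM`, which is an assumption about the source data rather than something the project states, passing an explicit `format` keeps every row on the same interpretation. A minimal sketch:

```python
import pandas as pd

# Two sample strings copied from the 'noted_date' column shown earlier.
raw_dates = pd.Series(["08-12-2018 09:30", "28-07-2018 07:07"], name="noted_date")

# Assumption: every value is day-first (DD-MM-YYYY HH:MM).
parsed = pd.to_datetime(raw_dates, format="%d-%m-%Y %H:%M")
print(parsed)

# With the explicit format, 08-12-2018 is read as 8 December.
print(parsed.dt.month.tolist())
```

Whether day-first is the right reading should be checked against how the device logged its timestamps, since it changes the date range of the readings.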

---

#### 5. Sort The DataFrame

Sort the DataFrame in the chronological order.

In [None]:
# Sort the datetime values in increasing order (this sorts the Series only; room_df itself is left unchanged).
noted_data_dt.sort_values(ascending=True)

16218   2018-01-11 00:06:00
16217   2018-01-11 00:07:00
16216   2018-01-11 00:09:00
16215   2018-01-11 00:13:00
16214   2018-01-11 00:23:00
                ...        
50668   2018-12-10 23:41:00
50667   2018-12-10 23:43:00
50666   2018-12-10 23:49:00
50665   2018-12-10 23:51:00
50664   2018-12-10 23:55:00
Name: noted_date, Length: 97606, dtype: datetime64[ns]
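
The `sort_values` call above acts on the standalone `noted_data_dt` Series, so `room_df` keeps its original order. One option for sorting the readings themselves is to store the parsed values back in the `noted_date` column and sort the DataFrame on it. A sketch on made-up rows:

```python
import pandas as pd

# Hypothetical readings in non-chronological order.
sample_df = pd.DataFrame({
    "noted_date": ["2018-12-08 09:30", "2018-07-28 07:06", "2018-09-15 12:00"],
    "temp": [29, 31, 35],
    "out/in": ["In", "In", "Out"],
})

# Store the datetime values in the DataFrame itself.
sample_df["noted_date"] = pd.to_datetime(sample_df["noted_date"])

# Sort the whole DataFrame chronologically and renumber the rows.
sorted_df = sample_df.sort_values("noted_date").reset_index(drop=True)
print(sorted_df)
```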

---

#### 6. Add More Features

Get the year, month, day, day name, hours and minutes values from the `datetime` values and create new columns for the same.

In [None]:
# Create new columns for year, month, day, day name, hours and minutes values and add to the DataFrame.
date = room_df['noted_date']
print(date)

0        08-12-2018 09:30
1        08-12-2018 09:30
2        08-12-2018 09:29
3        08-12-2018 09:29
4        08-12-2018 09:29
               ...       
97601    28-07-2018 07:07
97602    28-07-2018 07:07
97603    28-07-2018 07:06
97604    28-07-2018 07:06
97605    28-07-2018 07:06
Name: noted_date, Length: 97606, dtype: object


In [None]:
# Display the first five values of the 'noted_date' column.
date.head()

0    08-12-2018 09:30
1    08-12-2018 09:30
2    08-12-2018 09:29
3    08-12-2018 09:29
4    08-12-2018 09:29
Name: noted_date, dtype: object
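
The cells above print the raw `noted_date` strings but do not add any columns. Once the column holds `datetime` values, the `.dt` accessor exposes each component. The following sketch uses made-up rows, and the new column names are illustrative choices rather than names required by the project.

```python
import pandas as pd

# Hypothetical readings with an already-parsed datetime column.
sample_df = pd.DataFrame({
    "noted_date": pd.to_datetime(["2018-12-08 09:30", "2018-07-28 07:06"]),
    "temp": [29, 31],
})

# Derive one column per datetime component.
dates = sample_df["noted_date"]
sample_df["year"] = dates.dt.year
sample_df["month"] = dates.dt.month
sample_df["day"] = dates.dt.day
sample_df["day_name"] = dates.dt.day_name()
sample_df["hour"] = dates.dt.hour
sample_df["minute"] = dates.dt.minute

print(sample_df.head())
```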

---

#### 7. Line Plots & Box Plots

Create line plots and box plots for the temperature recorded in the indoor and outdoor settings.

In [None]:
# Create a DataFrame for the indoor temperature records.
# The 'out/in' column holds 'In' / 'Out' (capitalised), so select the rows with a Boolean mask.
indoor_df = room_df[room_df['out/in'] == 'In']

In [None]:
# Create a time series line plot for the indoor temperature records.


In [None]:
# Create a DataFrame for the outdoor temperature records.


In [None]:
# Create a time series line plot for the outdoor temperature records.


In [None]:
# Compare the time series line plots for both the indoor and outdoor temperature records.
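
A possible way to overlay the two series is sketched below. It assumes matplotlib as the plotting library (the notebook does not import one) and the `In` / `Out` labels seen in the rows printed earlier; the readings themselves are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line in Colab to render inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings: one indoor and one outdoor value per timestamp.
sample_df = pd.DataFrame({
    "noted_date": pd.to_datetime([
        "2018-07-28 07:00", "2018-07-28 07:00",
        "2018-07-28 13:00", "2018-07-28 13:00",
        "2018-07-28 19:00", "2018-07-28 19:00",
    ]),
    "temp": [29, 31, 32, 41, 30, 35],
    "out/in": ["In", "Out", "In", "Out", "In", "Out"],
}).sort_values("noted_date")

# Split the readings by device location.
indoor = sample_df[sample_df["out/in"] == "In"]
outdoor = sample_df[sample_df["out/in"] == "Out"]

# Draw both series on the same axes so they can be compared directly.
fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(indoor["noted_date"], indoor["temp"], label="Indoor")
ax.plot(outdoor["noted_date"], outdoor["temp"], label="Outdoor")
ax.set_xlabel("Date")
ax.set_ylabel("Temperature")
ax.set_title("Indoor vs outdoor temperature")
ax.legend()
```

Sorting by `noted_date` before plotting matters here, because a line plot connects points in row order.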


In [None]:
# Create a box plot to represent the distribution of indoor and outdoor temperatures for the whole year.


In [None]:
# Create a box plot to represent the monthly distribution of indoor and outdoor temperatures. Also label the x-axis with actual month names.


---

#### 8. Grouping, Aggregation & More Plots

Group the data by the indoor and outdoor temperatures. Also, get monthly mean, standard deviation, median, minimum and maximum values for both the indoor and outdoor groups.


In [None]:
# Group the data to get the monthly median indoor and outdoor temperatures along with the max and minimum temperatures.
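
The monthly statistics named in this section (mean, standard deviation, median, minimum and maximum) can be obtained in one pass by grouping on both the `out/in` label and the month. A sketch on made-up readings; the `month` column is assumed to have been derived from `noted_date` beforehand.

```python
import pandas as pd

# Hypothetical readings spread over two months.
sample_df = pd.DataFrame({
    "noted_date": pd.to_datetime([
        "2018-07-28 07:06", "2018-07-29 14:00", "2018-07-30 15:00",
        "2018-08-01 09:00", "2018-08-02 13:00", "2018-08-03 16:00",
    ]),
    "temp": [31, 33, 38, 30, 36, 41],
    "out/in": ["In", "In", "Out", "In", "Out", "Out"],
})
sample_df["month"] = sample_df["noted_date"].dt.month

# One row per (location, month) pair, one column per statistic.
monthly_stats = (
    sample_df
    .groupby(["out/in", "month"])["temp"]
    .agg(["mean", "std", "median", "min", "max"])
)
print(monthly_stats)
```

Groups with a single reading show `NaN` for `std`, since a standard deviation needs at least two values.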


In [None]:
# Create a line plot for the monthly median indoor temperatures.


In [None]:
# Create a line plot for the monthly median outdoor temperatures.


In [None]:
# Compare the monthly median indoor and outdoor temperatures.


In [None]:
# Create a bar plot for the monthly median indoor & outdoor temperatures in a single bar chart.


**Q:** Which months were the hottest and coldest months?

**A:**

In [None]:
# Get the maximum and minimum temperatures for each day in each month.
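
One way to approach the daily extremes, and from them the hottest and coldest day of each month, is a two-level grouping followed by `idxmax` / `idxmin`. This is a sketch on made-up readings; the `month` and `day` columns are assumed to have been derived from `noted_date`.

```python
import pandas as pd

# Hypothetical readings over two months.
sample_df = pd.DataFrame({
    "noted_date": pd.to_datetime([
        "2018-07-28 07:06", "2018-07-28 14:00", "2018-07-29 15:00",
        "2018-08-01 09:00", "2018-08-02 13:00", "2018-08-02 16:00",
    ]),
    "temp": [31, 38, 35, 30, 36, 41],
})
sample_df["month"] = sample_df["noted_date"].dt.month
sample_df["day"] = sample_df["noted_date"].dt.day

# Maximum and minimum temperature for each day in each month.
daily = sample_df.groupby(["month", "day"])["temp"].agg(["max", "min"])
print(daily)

# For each month, the (month, day) label with the highest daily maximum.
hottest_idx = daily.groupby(level="month")["max"].idxmax()
hottest_days = daily.loc[hottest_idx.tolist(), "max"]
print(hottest_days)

# For each month, the (month, day) label with the lowest daily minimum.
coldest_idx = daily.groupby(level="month")["min"].idxmin()
coldest_days = daily.loc[coldest_idx.tolist(), "min"]
print(coldest_days)
```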


In [None]:
# Get the hottest day for each month along with the temperature.


In [None]:
# Get the coldest day for each month along with the temperature.


---

#### 9. Heat Index

In [None]:
# Create a function to label each temperature value on a given day and time with the heat zones given in the heat index table in the Context section.


In [None]:
# Add the 'heat_index' column to the DataFrame, containing the heat zone corresponding to each temperature value on a given day and time.


In [None]:
# Get the counts of the heat zones.


In [None]:
# Get the percentage distribution of the heat zones.
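
The zone thresholds come from the heat index table image in the Context section, which is not reproduced as text here. The bounds in the sketch below are therefore placeholders to be replaced with the values from that table, and the readings are made up. The `heat_zone` function and `ZONE_UPPER_BOUNDS` are hypothetical names.

```python
import pandas as pd

# Placeholder upper bounds for each zone (NOT the values from the heat index table).
ZONE_UPPER_BOUNDS = [(30, "green"), (40, "yellow"), (50, "orange")]


def heat_zone(temp):
    """Return the zone label for a single temperature reading."""
    for upper, label in ZONE_UPPER_BOUNDS:
        if temp <= upper:
            return label
    return "red"  # anything above the last bound


# Hypothetical temperature readings.
sample_df = pd.DataFrame({"temp": [28, 29, 31, 35, 41, 45, 52, 38]})
sample_df["heat_index"] = sample_df["temp"].apply(heat_zone)

# Number of readings in each zone.
zone_counts = sample_df["heat_index"].value_counts()
print(zone_counts)

# Share of readings in each zone, as a percentage.
zone_percent = sample_df["heat_index"].value_counts(normalize=True) * 100
print(zone_percent.round(2))
```

`value_counts(normalize=True)` returns fractions that sum to 1, so multiplying by 100 gives the percentage distribution directly.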


**Q:** Which zone has the highest number of recordings, and how many?

**A:**

---

### Submitting the Project

Follow the steps described below to submit the project.

1. After finishing the project, click on the **Share** button on the top right corner of the notebook. A new dialog box will appear.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/2_share_button.png' width=500>

2. In the dialog box, click on the **Copy link** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/3_copy_link.png' width=500>


3. The link to the duplicate copy of the notebook (named **YYYY-MM-DD_StudentName_CapstoneProject12**) will be copied.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/4_copy_link_confirmation.png' width=500>

4. Go to your dashboard and click on the **My Projects** option.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/5_student_dashboard.png' width=800>

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/6_my_projects.png' width=800>

5. Click on the **View Project** button for the project you want to submit.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/7_view_project.png' width=800>

6. Click on the **Submit Project Here** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/8_submit_project.png' width=800>

7. Paste the link to the project file named **YYYY-MM-DD_StudentName_CapstoneProject12** in the URL box and then click on the **Submit** button.

   <img src='https://student-datasets-bucket.s3.ap-south-1.amazonaws.com/images/project-share-images/9_enter_project_url.png' width=800>


---