# 1 Scope: Do people perceive thermal comfort differently in different building types?
This notebook is my individual assignment. It investigates whether building type influences people's thermal comfort and, if so, how.


This analysis uses the ASHRAE Global Thermal Comfort Database II, available on the project website (http://www.comfortdatabase.com/), together with the **Pandas** (https://pandas.pydata.org/) and **Seaborn** (https://seaborn.pydata.org/) libraries.

# 2 Overview of the Original Dataset

The code cell generated automatically by the Kaggle notebook is kept as-is.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix
import seaborn as sns

In [None]:
os.chdir("/kaggle/input/ashrae-global-thermal-comfort-database-ii/")

In [None]:
data = pd.read_csv("ashrae_db2.01.csv")

In [None]:
data.head()

In [None]:
data.info()

## 2.1 Summarizing
I will apply a few summary methods to get a first look at the original data.

### 2.1.1 Using the `.describe()` method to summarize the statistical descriptors.
By default, only numeric columns (`float` and `int`) are summarized; non-numeric columns are not shown.

In [None]:
data.describe()
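
As a small illustration of how `.describe()` treats column types, the sketch below uses a made-up three-row frame (the values are invented and are not taken from the database). By default only numeric columns are summarized; passing `exclude="number"` summarizes the remaining columns instead.

```python
import pandas as pd

# Toy frame with invented values, only to illustrate the behaviour
toy = pd.DataFrame({
    "Building type": ["Office", "Classroom", "Office"],
    "Air temperature (C)": [23.5, 21.0, 25.0],
    "Year": [2010, 2012, 2012],
})

# Default: numeric columns only (both float and int)
numeric_summary = toy.describe()

# Non-numeric columns: count, unique, top (most frequent value) and freq
text_summary = toy.describe(exclude="number")

print(numeric_summary.columns.tolist())
print(text_summary)
```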

### 2.1.2 Getting overview of categorical columns.
For non-numeric columns, I will use methods such as `.nunique()` and `.value_counts()`.

In [None]:
data['Country'].nunique()

The `.nunique()` method shows that the data was collected from 28 countries.

In [None]:
data['Country'].value_counts()

The `.value_counts()` method shows the detailed distribution of rows across categories. From the output above, the UK contributes the most rows and Belgium the fewest.
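
The largest and smallest categories can also be read programmatically rather than by eye. This is a minimal sketch on an invented sample; the counts below are not the real ones from the database.

```python
import pandas as pd

# Invented sample, not the real database
countries = pd.Series(["UK", "UK", "UK", "India", "India", "Belgium"], name="Country")

counts = countries.value_counts()                # sorted by descending frequency
shares = countries.value_counts(normalize=True)  # same ordering, as fractions of all rows

print(counts.idxmax(), counts.max())  # most frequent category and its count
print(counts.idxmin(), counts.min())  # least frequent category and its count
```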

In [None]:
data['Building type'].nunique()

In [None]:
data['Building type'].value_counts()

The dataset contains five building types, of which "Office" has the most rows.

## 2.2 Reshaping
From the group project, we know that air temperature is the most significant factor affecting thermal sensation. We can use the `.pivot_table()` method with "Building type" as the index, "Thermal sensation" as the columns and "Air temperature" as the values to view the data from a different angle.

In [None]:
data['Thermal sensation'].nunique()

The "Thermal sensation" column has too many unique values to summarize directly, so the values are rounded to whole numbers before pivoting.

In [None]:
data['Thermal sensation_rounded'] = data['Thermal sensation'].round(0)
data.head()

In [None]:
data['Thermal sensation_rounded'].value_counts()
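
One detail worth keeping in mind when rounding votes: pandas `.round()` follows NumPy and rounds exact halves to the nearest even integer, so 0.5 becomes 0 while 1.5 becomes 2. The sketch below shows this on invented values; whether it matters here depends on how many votes in the database sit exactly on a half.

```python
import pandas as pd

# Invented votes, chosen to include exact halves
votes = pd.Series([-2.5, -0.5, 0.5, 1.5, 2.5, 0.7])

# Halves go to the nearest even integer; other values round as usual
rounded = votes.round(0)
print(rounded.tolist())
```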

Now we can pivot the dataframe more easily.

In [None]:
data_pivoted = data.pivot_table(index='Building type', columns='Thermal sensation_rounded', values='Air temperature (C)', aggfunc='mean')
data_pivoted

The raw dataset has been condensed into a single table in which each cell holds the mean air temperature for one building type and one rounded thermal sensation vote.
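
To make the pivot concrete, here is a minimal sketch on an invented six-row frame that reuses the notebook's column names. It also builds a count table alongside the means, since a mean based on very few votes is less trustworthy; adding the count is a suggestion, not part of the original analysis.

```python
import pandas as pd

# Invented rows using the same column names as the notebook
toy = pd.DataFrame({
    "Building type": ["Office", "Office", "Office",
                      "Classroom", "Classroom", "Classroom"],
    "Thermal sensation_rounded": [0.0, 0.0, 1.0, 0.0, 1.0, 1.0],
    "Air temperature (C)": [22.0, 24.0, 26.0, 21.0, 24.0, 25.0],
})

# Shared pivot layout: rows are building types, columns are rounded votes
layout = dict(index="Building type",
              columns="Thermal sensation_rounded",
              values="Air temperature (C)")

mean_table = toy.pivot_table(aggfunc="mean", **layout)    # mean temperature per cell
count_table = toy.pivot_table(aggfunc="count", **layout)  # number of votes per cell

print(mean_table)
print(count_table)
```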

To get an intuitive view, the data will be visualized next.

# 3 Analysis by Visualization

In [None]:
data.info()

The "Building type" column includes an "Others" category that cannot be interpreted properly, and the "Senior center" data was collected only in Australia and South Korea and amounts to just 821 rows. Rows with either building type are therefore dropped before visualization.

In [None]:
data_p = data[~data['Building type'].isin(['Others', 'Senior center'])]
data_p.info()
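
The filter in the cell above combines `.isin()` with the `~` (logical NOT) operator. A minimal sketch on invented rows shows what the intermediate boolean mask looks like:

```python
import pandas as pd

# Invented rows, only to illustrate the filter
toy = pd.DataFrame({
    "Building type": ["Office", "Others", "Classroom",
                      "Senior center", "Multifamily housing"],
    "Thermal sensation": [0.0, 1.0, -1.0, 0.5, 2.0],
})

excluded = ["Others", "Senior center"]
mask = toy["Building type"].isin(excluded)  # True where the row should be dropped
kept = toy[~mask]                           # ~ inverts the mask, keeping the rest

print(mask.tolist())
print(kept)
```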

Additionally, rows with null values in the relevant columns are dropped.

In [None]:
data_p_nona = data_p[["Country","Building type","Thermal sensation","Thermal sensation_rounded","Air temperature (C)"]].dropna()
data_p_nona.info()
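
The cell above selects the relevant columns first and then calls `.dropna()`. An alternative, sketched below on invented data, is `dropna(subset=...)`, which removes the same rows but keeps every other column in case it is needed later. The "Relative humidity (%)" column is used here purely as an example of an unrelated column.

```python
import numpy as np
import pandas as pd

# Invented rows with gaps in different columns
toy = pd.DataFrame({
    "Building type": ["Office", "Classroom", "Office"],
    "Thermal sensation": [0.0, np.nan, 1.0],
    "Air temperature (C)": [23.0, 22.0, np.nan],
    "Relative humidity (%)": [np.nan, 40.0, 55.0],
})

relevant = ["Building type", "Thermal sensation", "Air temperature (C)"]

selected_then_dropped = toy[relevant].dropna()  # approach used in the notebook
subset_dropped = toy.dropna(subset=relevant)    # same rows, all columns retained

print(selected_then_dropped.shape, subset_dropped.shape)
```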

Now we draw a boxplot of thermal sensation by building type alone.

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
sns.boxplot(y="Building type", x="Thermal sensation", data=data_p_nona, ax=ax)
# Style labels and ticks after plotting so the settings apply to the final axes
ax.set_xlabel('Thermal sensation', fontsize=15)
ax.set_ylabel('Building type', fontsize=15)
ax.tick_params(labelsize=15)

Next, we bring air temperature into the picture to see whether further patterns emerge.

In [None]:
fig, ax = plt.subplots(figsize=(15, 15))
sns.boxplot(x="Thermal sensation_rounded", y="Air temperature (C)", hue="Building type", data=data_p_nona, ax=ax)
# Style labels and ticks after plotting so the settings apply to the final axes
ax.set_xlabel('Thermal sensation_rounded', fontsize=15)
ax.set_ylabel('Air temperature (C)', fontsize=15)
ax.tick_params(labelsize=15)
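
The boxes in a plot like this are drawn from quartiles, so the same comparison can be read as numbers with a grouped `.quantile()`. A minimal sketch on invented data (the temperatures below are made up):

```python
import pandas as pd

# Invented temperatures for a single rounded vote in two building types
toy = pd.DataFrame({
    "Building type": ["Office"] * 4 + ["Classroom"] * 4,
    "Thermal sensation_rounded": [0.0] * 8,
    "Air temperature (C)": [22.0, 23.0, 24.0, 25.0, 19.0, 20.0, 21.0, 22.0],
})

quartiles = (
    toy.groupby(["Thermal sensation_rounded", "Building type"])["Air temperature (C)"]
       .quantile([0.25, 0.5, 0.75])  # lower quartile, median, upper quartile
       .unstack()                    # one column per quantile
)

print(quartiles)
```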

Finally, we regenerate the pivot table from the cleaned data (without null values and without the "Others" and "Senior center" building types).

In [None]:
neo_pivoted = data_p_nona.pivot_table(index='Building type', columns='Thermal sensation_rounded', values='Air temperature (C)', aggfunc='mean')
neo_pivoted

From the pivot table and the boxplots, we can see that, in general, most of the collected votes fall between thermal sensation 0 and 1. However, some patterns still emerge for particular building types once air temperature is taken into account.

In short, people in classrooms tolerate lower temperatures on the cool side and report feeling hot at lower temperatures than people in the other building types. People in multifamily housing, by contrast, report feeling hot only at higher temperatures than the others.

For a thermal sensation vote of -3, people in classrooms are at a lower air temperature than those in the other two building types. One possible explanation is that classrooms usually hold relatively more people, so the concentration of carbon dioxide is higher than elsewhere, which may keep people from feeling chilled. Another may be the higher average metabolic rate of typical classroom occupants (commonly teenagers or young adults), which would allow them to tolerate a lower temperature.

On the warm side, people in multifamily housing clearly stay comfortable at relatively high temperatures. For example, at 28℃ people in multifamily housing tend to vote around 1.0, yet this is higher than the temperature at which people in classrooms or offices vote 3.0. As with classrooms, occupancy may play a role: housing is shared by far fewer people, so the roomier space may help occupants feel comfortable even when the air temperature is higher than in other, more public buildings.