# Hot Temperature Data Cleaning and Attribute Addition

This notebook uses prepared weather data to compute the "feels like" temperature.

This metric weights the air temperature by humidity: at high humidity the same air temperature feels hotter, which can lead a person to lower the temperature on their thermostat.

My goal for this notebook is to add a new attribute to the weather data, computed with the heat index equation.

In [317]:
import pandas as pd
import numpy as np

# Load the data
miamiTotalData = pd.read_csv('WeatherData/HotnHumid/miamiComplete_User.csv')
# phoenixWeather = pd.read_csv('WeatherData/HotnDry/phoenix/phoenix_Weather_Solar.csv')

miamiTotalData.head()

Unnamed: 0.1,Unnamed: 0,Year,Month,Day,MaxTemp (F),MinTemp (F),Heat Index (F),Average Wind Speed (mph),Relative Humidity (%),Users Thermostat (F)
0,0,2015,1,1,84.0,66.0,95,4.7,85.12,78
1,1,2015,1,2,84.0,69.0,96,10.3,85.98,76
2,2,2015,1,3,83.0,74.0,92,15.0,84.27,76
3,3,2015,1,4,85.0,71.0,98,11.4,84.11,78
4,4,2015,1,5,85.0,67.0,99,7.2,86.44,76


In [313]:
miamiTotalData.insert(loc=6, column='Average Wind Speed (mph)', value=0, allow_duplicates=False)
# phoenixWeather.insert(loc=5, column='Average Wind Speed (mph)', value=0, allow_duplicates=False)

In [314]:
def ms_to_mph(wind_speed):
    return round(wind_speed * 2.23694, 1)

miamiTotalData['Average Wind Speed (mph)'] = miamiTotalData.apply(lambda row: ms_to_mph(row['Wind Speed (m/s)']), axis=1)
# phoenixWeather['Average Wind Speed (mph)'] = phoenixWeather.apply(lambda row: ms_to_mph(row['Wind Speed (m/s)']), axis=1)

miamiTotalData = miamiTotalData.drop(columns=['Wind Speed (m/s)'])
# phoenixWeather = phoenixWeather.drop(columns=['Wind Speed (m/s)'])

In [316]:
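# index=False keeps the row index out of the file, so reloading it does not add another 'Unnamed: 0' column
miamiTotalData.to_csv(path_or_buf='WeatherData/HotnHumid/miamiComplete_User.csv', index=False)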
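
The `Unnamed: 0` and `Unnamed: 0.1` columns in the preview above are typical of a DataFrame that was written with its index and then read back, more than once. A minimal sketch of that behavior, using in-memory buffers and a made-up two-row frame (the `demo` data is hypothetical, not taken from the Miami file):

```python
import io
import pandas as pd

demo = pd.DataFrame({"Year": [2015, 2015], "MaxTemp (F)": [84.0, 83.0]})

# Default to_csv() writes the row index as an unnamed first column
with_index = io.StringIO()
demo.to_csv(with_index)
with_index.seek(0)
reloaded_with_index = pd.read_csv(with_index)

# index=False leaves the index out, so the columns survive the round trip
without_index = io.StringIO()
demo.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded_with_index.columns))
print(list(reloaded_clean.columns))
```

Each save and reload without `index=False` adds one more such column, which matches the two seen in the preview.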

When dealing with temperatures in locations with consistently high humidity, it is advisable to use the **heat index** equation. The **heat index** combines air temperature and relative humidity, in shaded areas, to estimate a human-perceived equivalent temperature: how hot it would feel if the humidity were some other value in the shade.

The version of the [heat index equation](https://www.wpc.ncep.noaa.gov/html/heatindex_equation.shtml) I will use is published by the National Oceanic and Atmospheric Administration (NOAA).

NOAA supplies two versions of the **heat index equation**, each suited to different ranges of temperature and relative humidity.

The recommended procedure is to start with Steadman's regression, test the result, and recompute with Rothfusz's regression when the result is high enough.
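
For reference, the two regressions as I read them from the NOAA page linked above, with $T$ the air temperature in °F and $RH$ the relative humidity in percent. The subscripts are my own labels, not NOAA's notation.

$$
HI_{\text{simple}} = 0.5\,\bigl[T + 61.0 + 1.2\,(T - 68.0) + 0.094\,RH\bigr]
$$

$$
\begin{aligned}
HI_{\text{Rothfusz}} = {} & -42.379 + 2.04901523\,T + 10.14333127\,RH - 0.22475541\,T\,RH \\
& - 0.00683783\,T^2 - 0.05481717\,RH^2 + 0.00122874\,T^2\,RH \\
& + 0.00085282\,T\,RH^2 - 0.00000199\,T^2\,RH^2
\end{aligned}
$$

Two adjustments apply to the Rothfusz value. When $RH < 13$ and $T$ is between 80 and 112, subtract

$$
\frac{13 - RH}{4}\,\sqrt{\frac{17 - |T - 95|}{17}}
$$

and when $RH > 85$ and $T$ is between 80 and 87, add

$$
\frac{RH - 85}{10}\cdot\frac{87 - T}{5}
$$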

In [274]:
from math import sqrt, fabs

def rothfusz_heat_index_regression(temperature, relative_humidity):
    """[Computes the 'feels like' temperature based on Rothfusz's regression]

    Args:
        temperature (float): Temperature in Fahrenheit
        relative_humidity (float): Relative humidity in percentage

    Returns:
        float: Adjusted heat index temperature
    """
    T, RH = temperature, relative_humidity

    heat_index = (-42.379) + (2.04901523)*(T) + (10.14333127)*(RH) - (0.22475541)*(T)*(RH)
    heat_index = heat_index - (0.00683783)*(T)*(T) - (0.05481717)*(RH)*(RH) + (0.00122874)*(T)*(T)*(RH)
    heat_index = heat_index + (0.00085282)*(T)*(RH)*(RH) - (0.00000199)*(T)*(T)*(RH)*(RH)

    # Each adjustment is computed only inside its valid range, so the
    # square root below never receives a negative argument
    if ((RH < 13) and (80 < T < 112)):
        # Low humidity: subtract ((13 - RH)/4) * sqrt((17 - |T - 95|)/17)
        heat_index -= ((13 - RH)/4) * sqrt((17 - fabs(T - 95)) / 17)
    elif ((RH > 85) and (80 < T < 87)):
        # High humidity: add ((RH - 85)/10) * ((87 - T)/5)
        heat_index += ((RH - 85)/10) * ((87 - T)/5)

    return round(heat_index)

In [275]:
def steadman_heat_index_regression(temperature, relative_humidity):
    """[Computes the 'feels like' temperature based on Steadman's regression]

    Args:
        temperature (float): Temperature in Fahrenheit
        relative_humidity (float): Relative humidity in percentage

    Returns:
        float: Adjusted heat index temperature
    """
    T, RH = temperature, relative_humidity
    heat_index = (0.5) * (T + (61.0) + ((T - 68) * 1.2) + (RH * (0.094)))
    
    return round(heat_index)

In practice, Steadman's regression is computed first and the result is averaged with the temperature.

**If** this heat index value is 80°F or higher, the full regression equation along with any adjustment as described above is applied.
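
As a worked illustration of this test (my own arithmetic, before any rounding), take $T = 90$°F and $RH = 70$%:

$$
\begin{aligned}
HI_{\text{simple}} &= 0.5\,\bigl(90 + 61.0 + 1.2\,(90 - 68) + 0.094 \cdot 70\bigr) = 0.5 \cdot 183.98 = 91.99 \\
\tfrac{1}{2}\,\bigl(HI_{\text{simple}} + T\bigr) &= \tfrac{1}{2}\,(91.99 + 90) = 90.995 \ge 80
\end{aligned}
$$

The averaged value is at least 80, so the full regression applies; for these inputs it evaluates to roughly 105.9°F. At 70°F and 50% the same two steps give 69.05 and 69.525, which is below 80, so the averaged value is kept and the full regression is skipped.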

In [276]:
def heat_index(temperature, relative_humidity):
    """[Computes the 'feels like' temperature for usage]

    Args:
        temperature (float): Temperature in Fahrenheit
        relative_humidity (float): Relative humidity in percentage

    Returns:
        float: Adjusted heat index temperature based on NOAA's recommendation
    """
    T, RH = temperature, relative_humidity
    
    # Steadman's simple result, averaged with the air temperature
    temp = steadman_heat_index_regression(temperature=T, relative_humidity=RH)
    steadman = (0.5) * (temp + T)

    # The full regression is only evaluated when the averaged value reaches 80°F
    if (steadman >= 80):
        return rothfusz_heat_index_regression(temperature=T, relative_humidity=RH)
    
    return round(steadman)
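
Since the heat index is later applied to entire columns, a vectorized variant may be worth having. The following is a sketch under assumptions rather than the notebook's method: `heat_index_vectorized` is a hypothetical name, it skips the intermediate rounding of the simple result, it treats the adjustment ranges as inclusive, and it returns unrounded floats.

```python
import numpy as np

def heat_index_vectorized(temperature, relative_humidity):
    """Hypothetical array version of the NOAA procedure (unrounded output)."""
    T = np.asarray(temperature, dtype=float)
    RH = np.asarray(relative_humidity, dtype=float)

    # Simple (Steadman-consistent) formula, then averaged with the temperature
    simple = 0.5 * (T + 61.0 + (T - 68.0) * 1.2 + RH * 0.094)
    averaged = 0.5 * (simple + T)

    # Full Rothfusz regression, evaluated for every element
    full = (-42.379 + 2.04901523 * T + 10.14333127 * RH
            - 0.22475541 * T * RH - 0.00683783 * T * T
            - 0.05481717 * RH * RH + 0.00122874 * T * T * RH
            + 0.00085282 * T * RH * RH - 0.00000199 * T * T * RH * RH)

    # Low-humidity adjustment; clip keeps the square root argument
    # non-negative for elements outside the 80-112 range
    dry = (RH < 13) & (T >= 80) & (T <= 112)
    dry_adj = ((13 - RH) / 4) * np.sqrt(np.clip(17 - np.abs(T - 95), 0, None) / 17)

    # High-humidity adjustment
    humid = (RH > 85) & (T >= 80) & (T <= 87)
    humid_adj = ((RH - 85) / 10) * ((87 - T) / 5)

    full = full - np.where(dry, dry_adj, 0.0) + np.where(humid, humid_adj, 0.0)

    # Use the full regression only where the averaged value reaches 80
    return np.where(averaged >= 80, full, averaged)

example = heat_index_vectorized([90, 70], [70, 50])
print(example)
```

With a DataFrame, the two arguments could be the temperature and humidity columns, and the result could be rounded afterwards with `np.round` if integer values are wanted.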

However, some of our data is incomplete: a number of temperature values are missing.

We will fill each empty cell with the average of its temperature column.

In [277]:
phoenixWeather.isnull().sum()

Year                         0
Month                        0
Day                          0
Max Temp (F)                24
Min Temp (F)                37
Average Wind Speed (mph)     0
Relative Humidity (%)        0
dtype: int64

Here, we are able to see that there are missing values in the temperature columns.

Since we have a lot of data, the simplest option is to fill the missing values with the mean value of each of these columns.

In [278]:
# Assign the filled column back instead of calling fillna(inplace=True) on a column selection
lasvegasWeather['Max Temp (F)'] = lasvegasWeather['Max Temp (F)'].fillna(round(lasvegasWeather['Max Temp (F)'].mean()))
lasvegasWeather['Min Temp (F)'] = lasvegasWeather['Min Temp (F)'].fillna(round(lasvegasWeather['Min Temp (F)'].mean()))

phoenixWeather['Max Temp (F)'] = phoenixWeather['Max Temp (F)'].fillna(round(phoenixWeather['Max Temp (F)'].mean()))
phoenixWeather['Min Temp (F)'] = phoenixWeather['Min Temp (F)'].fillna(round(phoenixWeather['Min Temp (F)'].mean()))
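
A small illustration of the mean fill on made-up numbers (the `toy` frame is hypothetical). The result is assigned back to the column, because calling `fillna(..., inplace=True)` on a selected column triggers a chained-assignment warning in recent pandas versions:

```python
import pandas as pd

toy = pd.DataFrame({"Max Temp (F)": [100.0, None, 104.0, 99.0]})

# mean() skips missing values by default, so only 100, 104 and 99 are averaged
column_mean = round(toy["Max Temp (F)"].mean())

# Assign the filled column back to the frame
toy["Max Temp (F)"] = toy["Max Temp (F)"].fillna(column_mean)

print(toy["Max Temp (F)"].tolist())
print(toy.isnull().sum())
```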

In [279]:
phoenixWeather.isnull().sum()

Year                        0
Month                       0
Day                         0
Max Temp (F)                0
Min Temp (F)                0
Average Wind Speed (mph)    0
Relative Humidity (%)       0
dtype: int64

In [280]:
# We should be able to see that all the values from 'Max Temp (F)' and 'Min Temp (F)' are now filled
lasvegasWeather.isnull().sum()

Year                        0
Month                       0
Day                         0
Max Temp (F)                0
Min Temp (F)                0
Average Wind Speed (mph)    0
Relative Humidity (%)       0
dtype: int64

In [281]:
# Creating a new attribute in the dataframe, this new attribute will be the "feels like" temperature
lasvegasWeather.insert(loc=7, column='Heat Index (F)', value=0, allow_duplicates=False)
phoenixWeather.insert(loc=7, column='Heat Index (F)', value=0, allow_duplicates=False)

Now we have added an attribute that will contain the **heat index** or **feels like** temperature. Our function to create the number will be applied to the whole column.

In [283]:
lasvegasWeather['Heat Index (F)'] = lasvegasWeather.apply(lambda row: heat_index(row['Max Temp (F)'], row['Relative Humidity (%)']), axis=1)

phoenixWeather['Heat Index (F)'] = phoenixWeather.apply(lambda row: heat_index(row['Max Temp (F)'], row['Relative Humidity (%)']), axis=1)

lasvegasWeather.tail()

Unnamed: 0,Year,Month,Day,Max Temp (F),Min Temp (F),Average Wind Speed (mph),Relative Humidity (%),Heat Index (F)
1820,2019,12,27,53.0,43.0,6.9,56,52
1821,2019,12,28,50.0,38.0,6.7,44,48
1822,2019,12,29,47.0,31.0,2.2,42,45
1823,2019,12,30,55.0,35.0,5.8,51,54
1824,2019,12,31,57.0,40.0,7.8,43,56


In [298]:
import random
from math import floor

# The random module seeds its global generator automatically,
# so no manual re-seeding is needed before each draw

def user_heating(windchill_temperature):
    min_heat = 60
    avg_heat = 66
    max_heat = 71
    # Colder days get a warmer setting: 66-70 at or below 45°F, otherwise 60-65
    if (windchill_temperature <= 45):
        return random.randrange(avg_heat, max_heat, 1)
    return random.randrange(min_heat, avg_heat, 1)

def user_cooling(feels_like_temperature):
    min_cool = 74
    avg_cool = 79
    # randrange needs integer bounds; values taken from apply(axis=1) rows can arrive as floats
    max_cool = int(feels_like_temperature)
    # Hotter days get a cooler setting: 74-78 at or above 90°F, otherwise 79 up to just below the heat index
    if (feels_like_temperature >= 90):
        return random.randrange(min_cool, avg_cool, 1)
    return random.randrange(avg_cool, max_cool, 1)

def user_normal(temperature):
    return floor(temperature)

def get_user_data(weighted_temperature):
    if (weighted_temperature <= 64):
        # The WHO recommends a minimum indoor temperature of 64 degrees Fahrenheit
        return user_heating(windchill_temperature=weighted_temperature) # Run the heat
    elif (weighted_temperature < 80):
        return user_normal(weighted_temperature) # The house is at room temperature
    # 80 degrees or higher
    return user_cooling(feels_like_temperature=weighted_temperature) # Otherwise run the A/C
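
The simulated thermostat values come from Python's global random generator, so they change on every run. If reproducible output is wanted, one option is a dedicated `random.Random` instance with a fixed seed. This is a sketch only: `simulate_cooling`, `simulate_column`, the seed value, and the sample heat indexes are hypothetical, and inputs are assumed to be 80°F or higher.

```python
import random

def simulate_cooling(feels_like_temperature, rng):
    """Hypothetical variant of user_cooling that draws from a caller-supplied generator."""
    min_cool, avg_cool = 74, 79
    max_cool = int(feels_like_temperature)  # randrange needs integer bounds
    if feels_like_temperature >= 90:
        return rng.randrange(min_cool, avg_cool)
    return rng.randrange(avg_cool, max_cool)

def simulate_column(temperatures, seed):
    """Simulates one thermostat value per temperature with a fixed seed."""
    rng = random.Random(seed)  # independent of the global generator
    return [simulate_cooling(t, rng) for t in temperatures]

heat_indexes = [82, 88, 95, 101]
first_run = simulate_column(heat_indexes, seed=42)
second_run = simulate_column(heat_indexes, seed=42)

# The same seed reproduces the same simulated settings
print(first_run == second_run)
```

Passing the generator as an argument keeps the simulation functions free of global state, which makes them easier to test.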

In [299]:
# lasvegasWeather.insert(loc=8, column='Users Thermostat (F)', value=0, allow_duplicates=False)
# phoenixWeather.insert(loc=8, column='Users Thermostat (F)', value=0, allow_duplicates=False)

lasvegasWeather['Users Thermostat (F)'] = lasvegasWeather.apply(lambda row: get_user_data(row['Heat Index (F)']), axis=1)
phoenixWeather['Users Thermostat (F)'] = phoenixWeather.apply(lambda row: get_user_data(row['Heat Index (F)']), axis=1)

lasvegasWeather.head(20)

Unnamed: 0,Year,Month,Day,Max Temp (F),Min Temp (F),Average Wind Speed (mph),Relative Humidity (%),Heat Index (F),Users Thermostat (F)
0,2015,1,1,44.0,27.0,4.0,40,42,69
1,2015,1,2,48.0,26.0,4.0,40,46,61
2,2015,1,3,51.0,29.0,1.8,45,50,65
3,2015,1,4,51.0,31.0,2.5,52,50,60
4,2015,1,5,59.0,34.0,4.3,51,58,60
5,2015,1,6,68.0,38.0,5.4,53,68,68
6,2015,1,7,72.0,43.0,5.8,53,72,72
7,2015,1,8,69.0,43.0,2.7,50,68,68
8,2015,1,9,65.0,42.0,2.5,52,64,65
9,2015,1,10,65.0,47.0,2.5,66,64,64


In [304]:
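# index=False keeps the row index out of the files, so reloading them does not add an 'Unnamed: 0' column
lasvegasWeather.to_csv(path_or_buf='WeatherData/HotnDry/lasVegas/lasVegasComplete_User.csv', index=False)
phoenixWeather.to_csv(path_or_buf='WeatherData/HotnDry/phoenix/phoenixComplete_User.csv', index=False)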