# Record-Setting Temperatures

When I talk with people about global warming (which I prefer to call Chaos Weather, as it seems more accurate to me), they often point to record cold temperatures as evidence against climate change. Setting aside the question of whether an increase in record low winter temperatures would prove or disprove climate change, I decided to explore whether more records are truly being set now than a century ago.

I identified several kinds of records that might work for my research:
- High temperatures
- Low temperatures
- High precipitation (taking into account both rain and snow)
- Droughts
- High pressure readings
- Low pressure readings
- Wind speeds (maximum gust and/or maximum sustained)
- Storms (tornadoes and hurricanes)

When I first began my research, I found a NOAA data source (C00781) that listed record temperatures, which I believed would save me a lot of time that I could use to locate additional data. However, when I tried to download this database, I was informed that it is not actually a database but a piece of software on NOAA's server that checks whether a record was set on a particular day. There is no digital file available.

After downloading and starting to process temperature files from NOAA (processed and summarized on Kaggle), I discovered that each year's data file contained 1-40 million lines of data. I had originally intended to analyze 20 years (as well as additional types of data) from each decade, but computing and time constraints have limited that to 5 years (and only temperature data).

Because of the size of the files, I will attach my file processing code in a separate .py file so it doesn't freeze up your computer as you read this summary. That code reads in all of the daily data lines, then saves and processes the information needed:
- Mean high temperatures
    - Also separated by hemisphere
- Mean low temperatures
    - Also separated by hemisphere
- Record high temperatures
    - Also separated by hemisphere
- Record low temperatures
    - Also separated by hemisphere
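The processing steps listed above can be sketched roughly as follows. This is not the attached processing code, only a minimal illustration of the idea: read the file one row at a time and keep nothing but running totals and the current records, so memory use stays small however many lines the file has. The column names (`station`, `date`, `latitude`, `tmax`, `tmin`), the date format, and the choice to track records per station and calendar day are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def summarize_daily_rows(rows):
    """Stream daily rows once, keeping only running totals and records.

    Each row is a dict with the hypothetical keys 'station', 'date'
    (YYYY-MM-DD), 'latitude', 'tmax' and 'tmin' (degrees F).
    """
    # (hemisphere, month) -> [sum of highs, sum of lows, number of days]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    record_high = {}  # (station, month, day) -> (temperature, year)
    record_low = {}   # (station, month, day) -> (temperature, year)

    for row in rows:
        year, month, day = (int(part) for part in row['date'].split('-'))
        hemisphere = 'N' if float(row['latitude']) >= 0 else 'S'
        tmax, tmin = float(row['tmax']), float(row['tmin'])

        bucket = totals[(hemisphere, month)]
        bucket[0] += tmax
        bucket[1] += tmin
        bucket[2] += 1

        # A tie does not replace the existing record (strict comparison).
        key = (row['station'], month, day)
        if key not in record_high or tmax > record_high[key][0]:
            record_high[key] = (tmax, year)
        if key not in record_low or tmin < record_low[key][0]:
            record_low[key] = (tmin, year)

    mean_highs = {k: v[0] / v[2] for k, v in totals.items()}
    mean_lows = {k: v[1] / v[2] for k, v in totals.items()}
    return mean_highs, mean_lows, record_high, record_low


# Tiny made-up sample standing in for one of the large yearly files.
sample = io.StringIO(
    "station,date,latitude,tmax,tmin\n"
    "A,1950-01-01,40.0,30.0,10.0\n"
    "A,2010-01-01,40.0,36.0,20.0\n"
    "B,1950-01-01,-35.0,80.0,60.0\n"
)
mean_highs, mean_lows, record_high, record_low = summarize_daily_rows(
    csv.DictReader(sample))
print(mean_highs[('N', 1)], record_high[('A', 1, 1)])
```

For a real file, the same function could be fed with `csv.DictReader(open(path, newline=''))`, since the reader yields one row at a time instead of loading the whole file.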

And since I have the data loaded (and don't want to load it all again), I also separate out the data needed for PMF and CDF charts.
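As a rough illustration of what the PMF and CDF data look like, here is a minimal sketch that counts how often each temperature occurs and turns the counts into probabilities. The function name `pmf_and_cdf` and the choice to round temperatures to whole degrees are assumptions for the example, not taken from the attached code.

```python
from collections import Counter


def pmf_and_cdf(values):
    """Return (pmf, cdf) dicts keyed by temperature rounded to whole degrees.

    `values` can be any iterable, including a generator over a large file,
    because only the per-temperature counts are kept in memory.
    """
    counts = Counter(round(v) for v in values)
    total = sum(counts.values())

    pmf = {}
    cdf = {}
    running = 0
    for temp in sorted(counts):
        pmf[temp] = counts[temp] / total   # share of days at this temperature
        running += counts[temp]
        cdf[temp] = running / total        # share of days at or below it
    return pmf, cdf


# Made-up daily highs, only to show the shape of the result.
example_pmf, example_cdf = pmf_and_cdf([70.2, 71.0, 70.9, 72.4])
print(example_pmf)
print(example_cdf)
```

Because the result is just two small dictionaries, it can be saved alongside the monthly means and charted later without reloading the raw data.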

The first step for analyzing data is to load the packages that will be needed.

In [1]:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from statistics import mean

# Start from matplotlib's default settings
plt.rcdefaults()

The next step is to look at the histograms of the mean temperatures. I knew that I would want a bunch of histograms, so I created a function to plot them:

In [2]:
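def plot_my_hist(obj,md,yp,ylab,title):
    """Draw a bar chart of the values md at the x positions yp.

    obj holds the tick labels, ylab the y-axis label, title the chart title.
    """
    plt.bar(yp, md, align='center')
    plt.xticks(yp, obj)
    plt.ylabel(ylab)
    plt.title(title)
    plt.show()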

Then I charted the low temperatures and high temperatures:

In [3]:
def create_histograms():
    # Set up data for charting
    my_months = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep',
                  'Oct','Nov','Dec')
    y_pos = np.arange(12)
    yaxis_label='Temperature in Degrees F'

    my_data = pd.read_csv(r'I:\NOAAData\OriginalMeanLows.csv',
                          names=['Month','Temp'])
    to_chart = my_data['Temp'].to_list()
    plot_my_hist(my_months,to_chart,y_pos,yaxis_label,
                 'Mean Low Temps at Chosen Stations')

    my_data = pd.read_csv(r'I:\NOAAData\OriginalMeanHighs.csv',
                          names=['Month','Temp'])
    to_chart = my_data['Temp'].to_list()
    plot_my_hist(my_months,to_chart,y_pos,yaxis_label,
                 'Mean High Temps at Chosen Stations')

# Call the function so the charts are actually drawn
create_histograms()

When I saw how flat the charts were, I realized that I needed to separate the northern and southern hemispheres: their seasons run opposite to each other, so the two sets of temperatures "balance each other" and flatten the chart. Here are the four charts:

In [4]:
def create_histograms():
    # Set up data for charting
    my_months = ('Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep',
                  'Oct','Nov','Dec')
    y_pos = np.arange(12)
    yaxis_label='Temperature in Degrees F'

    # Print Northern Hemisphere charts
    my_data = pd.read_csv(r'I:\NOAAData\NorthHemMeanLows.csv',
                          names=['Month','Temp'])
    to_chart = my_data['Temp'].to_list()
    plot_my_hist(my_months,to_chart,y_pos,yaxis_label,
                 'Mean Low Temps at Northern Hemisphere Stations')

    my_data = pd.read_csv(r'I:\NOAAData\NorthHemMeanHighs.csv',
                          names=['Month','Temp'])
    to_chart = my_data['Temp'].to_list()
    plot_my_hist(my_months,to_chart,y_pos,yaxis_label,
                 'Mean High Temps at Northern Hemisphere Stations')

    # Print Southern Hemisphere charts
    my_data = pd.read_csv(r'I:\NOAAData\SouthHemMeanLows.csv',
                          names=['Month','Temp'])
    to_chart = my_data['Temp'].to_list()
    plot_my_hist(my_months,to_chart,y_pos,yaxis_label,
                 'Mean Low Temps at Southern Hemisphere Stations')

    my_data = pd.read_csv(r'I:\NOAAData\SouthHemMeanHighs.csv',
                          names=['Month','Temp'])
    to_chart = my_data['Temp'].to_list()
    plot_my_hist(my_months,to_chart,y_pos,yaxis_label,
                 'Mean High Temps at Southern Hemisphere Stations')

# Call the function so the charts are actually drawn
create_histograms()

Then I realized that if I shifted the temperatures for the southern hemisphere so that the high temperatures lined up with the high temps in the northern hemisphere, I would get a much more comprehensive look at the real temperature distribution:

In [5]:
def create_histograms():
    # Set up data for charting
    com_months = ('Jan/Jul','Feb/Aug','Mar/Sep','Apr/Oct','May/Nov','Jun/Dec',
                  'Jul/Jan','Aug/Feb','Sep/Mar','Oct/Apr','Nov/May','Dec/Jun')
    y_pos = np.arange(12)
    yaxis_label='Temperature in Degrees F'

    # Print corrected charts for all stations
    my_data = pd.read_csv(r'I:\NOAAData\NorthHemMeanLows.csv',
                          names=['Month','Temp'])
    to_chart_lows = my_data['Temp'].to_list()
    plot_my_hist(com_months,to_chart_lows,y_pos,yaxis_label,
                 'Mean Low Temps at Selected Stations')

    my_data = pd.read_csv(r'I:\NOAAData\NorthHemMeanHighs.csv',
                          names=['Month','Temp'])
    to_chart_highs = my_data['Temp'].to_list()
    plot_my_hist(com_months,to_chart_highs,y_pos,yaxis_label,
                 'Mean High Temps at Selected Stations')

# Call the function so the charts are actually drawn
create_histograms()
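The six-month shift itself happens in the processing file, but the idea can be sketched in a few lines. This is an illustration under assumptions, not the attached code: the helper names are hypothetical, the sample numbers are made up, and weighting each hemisphere by a count (for example, the number of stations) is my own choice for the example.

```python
def shift_six_months(monthly):
    """Rotate 12 monthly values so that July lands in the January slot."""
    if len(monthly) != 12:
        raise ValueError("expected 12 monthly values")
    return list(monthly[6:]) + list(monthly[:6])


def combine_hemispheres(north, south, n_north=1, n_south=1):
    """Average northern values with season-aligned southern values.

    n_north and n_south are weights, e.g. how many stations each
    hemisphere contributed (hypothetical parameters).
    """
    shifted = shift_six_months(south)
    total = n_north + n_south
    return [(n * n_north + s * n_south) / total
            for n, s in zip(north, shifted)]


# Made-up monthly means (Jan..Dec), only to show the alignment.
north = [10] * 12
south = list(range(12))          # Jan=0, Feb=1, ..., Jul=6, ..., Dec=11
combined = combine_hemispheres(north, south)
print(shift_six_months(south))   # southern July now sits in slot 0 ("Jan/Jul")
print(combined[0])
```

Slot 0 of the combined list then pairs northern January with southern July, which matches the `Jan/Jul` label used in the charts.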

The next step would normally be to look at the data more closely, but these calculations kept crashing my computer.

I was able to pull the data into another processing file and find the following information:

Percent of records set for high temperatures by century:
- 1900s: 72.22%
- 2000s: 27.78%

Percent of records set for low temperatures by century:
- 1900s: 25.00%
- 2000s: 75.00%

In this sample, most of the high temperature records were set in the 1900s, while most of the low temperature records were set in the 2000s. With a better computer, it would be possible to determine whether this difference is statistically significant.
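One possible way to run that significance check, once the raw record counts are available, is an exact binomial test. This is only a sketch of the approach, not something I ran on the real data. The counts below are placeholders: 13 out of 18 is one pair of counts consistent with 72.22%, but only the percentages are reported above. The null probability `p` is also an assumption. It should be the share of sampled years that fall in the century being counted, which is 0.5 only if both centuries were sampled equally.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probability of every outcome that is no more likely than
    the observed count k, under the null that each of the n records
    falls in the chosen century with probability p.
    """
    probs = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so floating-point ties count as equal.
    return sum(pr for pr in probs if pr <= observed * (1 + 1e-9))


# Placeholder counts: 13 of 18 high-temperature records in the 1900s.
p_value = binomial_two_sided_p(13, 18, p=0.5)
print(round(p_value, 4))
```

A small p-value would suggest the split between centuries is unlikely to be chance alone, but the answer depends heavily on the real counts and on choosing `p` to match how many years were sampled from each century.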