# Exploring Data - the fun stuff!

## Reading in data

Today, I'm going to show you how to read in tabular data. Next week, you'll discuss a common data format called NetCDF that contains multidimensional data.

In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import datetime

* [NumPy](https://numpy.org/) - Numerical Python package for array/matrix data manipulation
* [Pandas](https://pandas.pydata.org/) - Panel data (tabular data) reading and manipulation
* [matplotlib](https://matplotlib.org/) - plotting library
* [datetime](https://docs.python.org/3/library/datetime.html) - standard library time package
* [scipy](https://www.scipy.org/) - scientific Python (we'll import later)

## Christman Field Weather Station

[Christman Field real time data](https://www.atmos.colostate.edu/fccwx/fccwx_latest.php)

To read the data, we use Pandas. Specifically, the [`read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function.

In [None]:
filename = './christman_field_20201010-20201014.csv'
christman_field = pd.read_csv(filename, header=0)

In [None]:
print(christman_field)
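
If you don't have the CSV file handy, here is a minimal sketch of what `read_csv` does, using a tiny in-memory table instead of a file. The rows are made up for illustration (they are not real Christman Field observations), and I'm assuming the file has the `DateTime`, `Temp`, `Solar`, and `Wind` columns used later in this notebook.

```python
import io

import pandas as pd

# Made-up sample rows, NOT real Christman Field observations.
sample_csv = io.StringIO(
    "DateTime,Temp,Solar,Wind\n"
    "2020-10-10 00:00:00,12.5,0.0,3.1\n"
    "2020-10-10 12:00:00,24.0,650.0,8.4\n"
    "2020-10-11 00:00:00,10.0,0.0,2.0\n"
)

# header=0 tells read_csv that the first line holds the column names.
sample = pd.read_csv(sample_csv, header=0)

print(sample.columns.tolist())  # column names taken from the header line
print(sample.shape)             # (rows, columns)
print(sample.head())            # first few rows of the table
```

`read_csv` accepts a file path or any file-like object, which is why `io.StringIO` works here as a stand-in for the real file.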

### Extracting pieces of data from Pandas

For column data, the interface is similar to looking up a key in a Python dictionary. Selecting a column returns a Pandas `Series`, which is essentially a NumPy array with metadata (an index and a name).

We also use the `to_datetime` function in Pandas to convert the date-time strings into datetime values we can use.

In [None]:
time = pd.to_datetime(christman_field["DateTime"])
temperature = christman_field["Temp"]
solar = christman_field["Solar"]
wind = christman_field["Wind"]
print(time)
print(temperature)
print(solar)
print(wind)
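
To see what column selection and `to_datetime` give you without the full data set, here is a small sketch on a made-up two-row table. The names `toy`, `toy_temp`, and `toy_time` are just for this illustration.

```python
import numpy as np
import pandas as pd

# Made-up two-row table for illustration only.
toy = pd.DataFrame({
    "DateTime": ["2020-10-10 00:00:00", "2020-10-10 12:00:00"],
    "Temp": [12.5, 24.0],
})

# Dictionary-style lookup returns a Series.
toy_temp = toy["Temp"]
print(type(toy_temp))

# The underlying values are available as a plain NumPy array.
print(toy_temp.to_numpy())

# Strings become datetime values, so pieces like the hour can be pulled out.
toy_time = pd.to_datetime(toy["DateTime"])
print(toy_time.dt.hour.tolist())
```

Once the strings are converted, the `.dt` accessor exposes date parts (hour, day, and so on), which is what makes the time axis in the plots below behave sensibly.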

With the arrays, we can calculate some basic statistics on the temperature, wind speed, and incoming solar radiation.

In [None]:
print('Temperature (C)')
print('\tMax:', np.max(temperature))
print('\tMin:', np.min(temperature))
print('\tMean:', np.mean(temperature))
print('Wind (mph)')
print('\tMax:', np.max(wind))
print('\tMin:', np.min(wind))
print('\tMean:', np.mean(wind))
print('Incoming Solar Radiation (W/m2)')
print('\tMax:', np.max(solar))
print('\tMin:', np.min(solar))
print('\tMean:', np.mean(solar))

Let's visualize the data!

In [None]:
fig, ax = plt.subplots()
plt.plot(time, temperature, color='r')
plt.xlabel('Time')
plt.ylabel('Temperature (C)')
plt.xticks([datetime.date(2020, 10, 10), datetime.date(2020, 10, 11), datetime.date(2020, 10, 12), 
            datetime.date(2020, 10, 13), datetime.date(2020, 10, 14), datetime.date(2020, 10, 15)], rotation=45);

fig, ax = plt.subplots()
plt.plot(time, wind, color='b')
plt.xlabel('Time')
plt.ylabel('Wind (mph)')
plt.xticks([datetime.date(2020, 10, 10), datetime.date(2020, 10, 11), datetime.date(2020, 10, 12), 
            datetime.date(2020, 10, 13), datetime.date(2020, 10, 14), datetime.date(2020, 10, 15)], rotation=45);

fig, ax = plt.subplots()
plt.plot(time, solar, color='y')
plt.xlabel('Time')
plt.ylabel('Solar (W/m2)')
plt.xticks([datetime.date(2020, 10, 10), datetime.date(2020, 10, 11), datetime.date(2020, 10, 12), 
            datetime.date(2020, 10, 13), datetime.date(2020, 10, 14), datetime.date(2020, 10, 15)], rotation=45);
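
The three plots above repeat the same steps with a different variable, label, and color each time. One way to cut down on the copy and paste is to wrap those steps in a small function. This is only a sketch: `plot_time_series` is a hypothetical helper, not something from matplotlib, and the values below are made up.

```python
import datetime

import matplotlib.pyplot as plt


def plot_time_series(times, values, ylabel, color, tick_dates):
    """Draw one time series on its own figure and return (fig, ax)."""
    fig, ax = plt.subplots()
    ax.plot(times, values, color=color)
    ax.set_xlabel("Time")
    ax.set_ylabel(ylabel)
    # Put ticks at the requested dates and rotate the labels.
    ax.set_xticks(tick_dates)
    ax.tick_params(axis="x", labelrotation=45)
    return fig, ax


# Made-up values every 6 hours, for illustration only.
start = datetime.datetime(2020, 10, 10)
toy_times = [start + datetime.timedelta(hours=6 * i) for i in range(8)]
toy_temps = [10, 8, 20, 15, 9, 7, 18, 14]
tick_dates = [start + datetime.timedelta(days=d) for d in range(3)]

fig, ax = plot_time_series(toy_times, toy_temps, "Temperature (C)", "r", tick_dates)
print(ax.get_ylabel())
```

With the real data, the same function could be called once each for temperature, wind, and solar radiation, passing the `time` Series and the matching column.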


Thoughts on what is going on?

[Cheyenne radar loop](http://schubert.atmos.colostate.edu/~cslocum/nexrad/img/levelii/20201014_kcys.gif)


## Is solar radiation correlated with temperature?

We need a statistics package! Fortunately, there is one - SciPy!

We can import the [Pearson correlation coefficient](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pearsonr.html) function to calculate R and the accompanying p-value.

In [None]:
from scipy.stats import pearsonr

r, pvalue = pearsonr(temperature, solar)
print('R', r)
print('p-value', pvalue)
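
To see what `pearsonr` is calculating, here is a sketch that computes R by hand from its definition (the sum of the products of the anomalies, divided by the square root of the product of the summed squared anomalies) and compares it to SciPy. The arrays `x` and `y` are made-up numbers, not the weather station data.

```python
import numpy as np
from scipy.stats import pearsonr

# Made-up values for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])

# Anomalies: departures of each value from its own mean.
x_anom = x - x.mean()
y_anom = y - y.mean()

# Pearson R from the definition.
r_manual = np.sum(x_anom * y_anom) / np.sqrt(np.sum(x_anom**2) * np.sum(y_anom**2))

# SciPy returns the same R plus a p-value.
r_scipy, p_scipy = pearsonr(x, y)

print('R (by hand)', r_manual)
print('R (SciPy)', r_scipy)
print('p-value', p_scipy)
```

Both approaches give R = 0.8 for these numbers. Keep in mind that R only measures how well a straight line relates the two variables.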