<img src="https://whiteface.asrc.albany.edu/images/header_ASRC-WFM_750.png" alt="Atmospheric Sciences Research Center Whiteface Mountain Field Station logo"></img><p>

# Whiteface Mountain Cloud Water Data

---

# Accessing Cloud Water Data from the [ASRC](https://whiteface.asrc.albany.edu/)

## Overview
Cloud water data provide insight into the chemical processing of gases and particulates in the atmosphere. Although there is no formal API for this dataset, this notebook shows how to access a niche dataset of cloud water chemistry collected in situ at Whiteface Mountain in Wilmington, NY. Because it is a remote mountaintop observatory, the sample site serves as a relatively clean background site for atmospheric chemistry in the region.

This notebook covers:

1. Requesting data access
1. Cleaning and sorting through the data
1. Basic cloud water chemistry analysis (Coming Soon)
1. Plotting the data (Coming Soon)

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Introduction to Pandas](https://foundations.projectpythia.org/core/pandas/pandas.html) | Necessary | How to deal with dataframes and datasets |
| [Matplotlib Basics](https://foundations.projectpythia.org/core/matplotlib/matplotlib-basics.html) | Helpful | Skills for different plotting styles and techniques |

- **Time to learn**: 45 minutes
- **System requirements**:
    - <b>Email Address</b> for Data Access

---

## Imports


<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
    <p>Here we import several packages, though we might not end up using all of them.</p>
</div>

```python
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from datetime import date, datetime
import numpy as np
```

We will also set some limits to the size of data that Pandas displays, so as not to overload our screens.

```python
# Set the maximum number of rows and columns to display
pd.set_option('display.max_rows', 10)  # Set to the number of rows you want to display
pd.set_option('display.max_columns', 10)  # Set to the number of columns you want to display
```
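If you would rather not change these limits for the whole session, pandas can also apply them temporarily. A minimal sketch using `pd.option_context`, which restores the previous settings when the `with` block exits; the 25-row frame is just a throwaway example:

```python
import pandas as pd

before = pd.get_option("display.max_rows")

# Apply the display limits only inside the `with` block.
with pd.option_context("display.max_rows", 10, "display.max_columns", 10):
    inside = (pd.get_option("display.max_rows"), pd.get_option("display.max_columns"))
    print(pd.DataFrame({"x": range(25)}))  # shown truncated, since 25 rows > 10

after = pd.get_option("display.max_rows")
print(inside)           # (10, 10)
print(before == after)  # True: the earlier setting is restored
```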

---

## Accessing the Data

Currently, the data from the Whiteface Mountain summit are obtained and managed by the [Lance Research Laboratory](https://github.com/LanceLab-ASRC). Among other measurements, the available data include the chemical speciation of cloud water:

| Anions | Cations |
| --- | --- |
| Sulfate | Ammonium |
| Nitrate | Sodium |
| Chloride | Calcium |
| Formate | Magnesium |
| Acetate | Potassium |
| Oxalate | |

| Some Other Data |
| --- |
| Total Organic Carbon |
| pH |
| Conductivity |
| Liquid Water Content |
| Sample Volume |
| Sample Dump Date/Time |
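In the data files, the major inorganic ions are reported both as mass concentrations (mg L-1) and as charge-equivalent concentrations (µeq L-1). The two are related through each ion's molar mass and charge: dividing mg L-1 by the molar mass (g mol-1) gives mmol L-1, and multiplying by the absolute charge and by 1000 gives µeq L-1. The sketch below uses standard molar masses. The exact factors the lab uses are an assumption and may differ slightly. The helper names `mg_to_ueq` and `hion_ueq` are our own, and `hion_ueq` assumes the hydrogen-ion column is derived from pH as 10^(-pH) mol L-1.

```python
# Standard molar masses (g/mol) and absolute ionic charges for the major ions.
# Assumption: the lab's own conversion factors may differ slightly.
IONS = {
    "CA": (40.078, 2),
    "MG": (24.305, 2),
    "NA": (22.990, 1),
    "K": (39.098, 1),
    "NH4": (18.04, 1),
    "SO4": (96.06, 2),
    "NO3": (62.004, 1),
    "CL": (35.45, 1),
}

def mg_to_ueq(ion: str, mg_per_l: float) -> float:
    """Convert mg L-1 to µeq L-1: mg/L ÷ (g/mol) = mmol/L, then × charge × 1000."""
    molar_mass, charge = IONS[ion]
    return mg_per_l / molar_mass * charge * 1000.0

def hion_ueq(ph: float) -> float:
    """Free H+ in µeq L-1, assuming [H+] = 10**(-pH) mol/L (× 1e6 for µeq/L)."""
    return 10.0 ** (6.0 - ph)

print(round(mg_to_ueq("CA", 1.0), 1))  # 49.9
print(round(hion_ueq(4.0), 1))         # 100.0
```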

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Note:</p>
    <p>No API is needed to access the data; you only need to fill out a short Google Form at the following website:</p>

   <p>http://atmoschem.asrc.cestm.albany.edu/~cloudwater/pub/Data.htm</p>

<p><img src="notebooks/images/WFM-Data-Form.png" alt="WFM Data Form"></img></p>

</div>

Once you are granted access, you can download <b>recent</b> and <b>historical</b> data going back to 1994.<br> The data come as `*.xlsx` files, either individually or as several `*.xlsx` files in a zip archive, depending on which dataset you request.


This notebook uses the 2022 cloud water data (current as of June 18th, 2024) as an example.<br> Because the data files contain several sheets covering different aspects of quality control, we simplify this notebook by using a `*.csv` file that contains only the "valid" samples.

The full data file is available at `../files/WFC.2022.Data.R2--6_18_24.xlsx`.

---

## Reading the Data

We use the pandas package to read in our data file. We also specify the `ISO-8859-1` (Latin-1) encoding up front so that symbols such as <b>&deg;</b> and <b>&micro;</b> are decoded correctly.

```python
df = pd.read_csv('../files/WFC.2022.Data.R2--6_18_24.csv', encoding='ISO-8859-1')
```
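To see why the encoding matters, here is a minimal sketch on a tiny, hypothetical CSV held in memory. Its header contains a degree sign and a micro sign stored as single Latin-1 bytes. Reading it with the matching encoding recovers the symbols, while pandas' default (UTF-8) cannot decode those bytes:

```python
import io
import pandas as pd

# Hypothetical two-line CSV; the degree sign (°) and micro sign (µ) in the
# header are stored as single ISO-8859-1 (Latin-1) bytes.
raw = "LABNO,TEMP °C,SPCOND µS cm-1\n2215304,12.5,20.1\n".encode("ISO-8859-1")

# Decoding with the matching encoding keeps the symbols intact.
sample = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(sample.columns.tolist())  # ['LABNO', 'TEMP °C', 'SPCOND µS cm-1']

# The same bytes are not valid UTF-8 (pandas' default encoding),
# so a default read raises an error instead of returning the header.
try:
    pd.read_csv(io.BytesIO(raw))
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("UTF-8 read succeeded:", utf8_ok)
```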

Let's look at our dataframe...

```python
df
```

```text
Unnamed: 0,Atmospheric Sciences Research Center,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,...,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58
0,http://atmoschem.asrc.cestm.albany.edu/~cloudw...,,,,,...,,,,,
1,,,,,,...,,,,,
2,Whiteface 2022 VALID DATA,,,,,...,,,,,
3,,,,,,...,,,,,
4,LABNO,DUMP TIME,COLLECTION_HOURS,COLL_HR_F,POOL_VOL ml,...,Lactate_ppb,Malonate_ppb,Oxalate_ppb,Pyruvate_ppb,SuccinateMalate_ppb
...,...,...,...,...,...,...,...,...,...,...,...
117,,,,,,...,,,,,
118,,,,,,...,,,,,
119,,,,,,...,,,,,
120,,,,,,...,,,,,
```


As we can see above, the column headings are actually on the fifth row (index 4), and the data begin on the row after that.

Let's take a closer look. Notice that this particular set contains only 43 sample rows (rows 5 through 47):

```python
df.iloc[4:50, :]
```

```text
Unnamed: 0,Atmospheric Sciences Research Center,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,...,Unnamed: 54,Unnamed: 55,Unnamed: 56,Unnamed: 57,Unnamed: 58
4,LABNO,DUMP TIME,COLLECTION_HOURS,COLL_HR_F,POOL_VOL ml,...,Lactate_ppb,Malonate_ppb,Oxalate_ppb,Pyruvate_ppb,SuccinateMalate_ppb
5,2215304,6/2/2022 6:00,12.0,,2312,...,19,M,48,24,M
6,2215401,6/3/2022 18:00,6.3,,1183,...,19,M,127,22,M
7,2215901,6/8/2022 18:00,7.2,,425,...,21,M,75,22,M
8,2216003,6/9/2022 18:00,8.0,,1866,...,33,M,35,0,M
...,...,...,...,...,...,...,...,...,...,...,...
45,2226503,9/22/2022 8:00,M,,M,...,M,M,0,M,M
46,2226901,9/26/2022 18:00,M,,M,...,M,M,0,M,M
47,2227002,9/27/2022 6:00,M,,M,...,M,M,BDL,M,M
48,,,,,,...,,,,,
```
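Rather than reading these row positions off the printout, they can be located programmatically. A minimal sketch, assuming (as in this file) that the header row is the first row whose first cell is `LABNO` and that every sample row has a non-empty first cell. `locate_samples` is a hypothetical helper, and the toy frame reuses a few values from the output above:

```python
import numpy as np
import pandas as pd

def locate_samples(raw: pd.DataFrame, marker: str = "LABNO") -> tuple[int, int]:
    """Return (position of the header row, number of sample rows below it)."""
    first_col = raw.iloc[:, 0]
    # Position of the first row whose first cell equals the marker text.
    header_pos = int(np.flatnonzero(first_col.to_numpy() == marker)[0])
    # Count the non-empty first cells after the header row.
    n_samples = int(first_col.iloc[header_pos + 1:].notna().sum())
    return header_pos, n_samples

# Toy frame mimicking the raw CSV layout: title, blank row, header, samples, blank row.
raw = pd.DataFrame([
    ["Whiteface 2022 VALID DATA", np.nan],
    [np.nan, np.nan],
    ["LABNO", "DUMP TIME"],
    ["2215304", "6/2/2022 6:00"],
    ["2215401", "6/3/2022 18:00"],
    [np.nan, np.nan],
])
print(locate_samples(raw))  # (2, 2)
```

Applied to the raw `df` at this point in the notebook, the header position it reports should be 4, matching the slice used next.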


In the next cell, we use row 4 for our column headings and slice the dataframe so that it contains only the sample rows. Removing the empty rows up front helps prevent errors caused by NaNs and empty cells later on.

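```python
# Use row 4 (the LABNO, DUMP TIME, ... row) as the column headings
df.columns = df.iloc[4]
# Keep only the sample rows (positions 5 through 47)
df = df.iloc[5:48]
df
```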

```text
4,LABNO,DUMP TIME,COLLECTION_HOURS,COLL_HR_F,POOL_VOL ml,...,Lactate_ppb,Malonate_ppb,Oxalate_ppb,Pyruvate_ppb,SuccinateMalate_ppb
5,2215304,6/2/2022 6:00,12.0,,2312,...,19,M,48,24,M
6,2215401,6/3/2022 18:00,6.3,,1183,...,19,M,127,22,M
7,2215901,6/8/2022 18:00,7.2,,425,...,21,M,75,22,M
8,2216003,6/9/2022 18:00,8.0,,1866,...,33,M,35,0,M
9,2216104,6/10/2022 6:00,12.0,,4407,...,19,M,43,24,M
...,...,...,...,...,...,...,...,...,...,...,...
43,2226002,9/17/2022 6:00,M,,M,...,0,M,BDL,0,M
44,2226302,9/20/2022 6:00,M,,M,...,0,M,M,0,M
45,2226503,9/22/2022 8:00,M,,M,...,M,M,0,M,M
46,2226901,9/26/2022 18:00,M,,M,...,M,M,0,M,M
```
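The `4` in the top-left corner of the output above is the name that `df.columns` inherited from row 4 when that row was used as the header. The sketch below wraps the same header-promotion step in a hypothetical helper, clears that leftover name, and drops rows in which every cell is empty instead of hard-coding the end of the slice. The toy frame reuses values from the output above:

```python
import numpy as np
import pandas as pd

def promote_header_row(raw: pd.DataFrame, header_pos: int) -> pd.DataFrame:
    """Hypothetical helper: use row `header_pos` as column names and keep the rows below it."""
    out = raw.iloc[header_pos + 1:].copy()
    out.columns = raw.iloc[header_pos]  # the new column Index inherits this row's label as its name
    out.columns.name = None             # clear that leftover label
    return out.dropna(how="all")        # drop rows in which every cell is empty

# Toy frame: a title row, a blank row, the header row, one sample, a blank row.
raw = pd.DataFrame([
    ["Whiteface 2022 VALID DATA", np.nan],
    [np.nan, np.nan],
    ["LABNO", "DUMP TIME"],
    ["2215304", "6/2/2022 6:00"],
    [np.nan, np.nan],
])
clean = promote_header_row(raw, header_pos=2)
print(clean)
```

Unlike the fixed `iloc[5:48]` slice, `dropna(how="all")` keeps working if a later release of the file has more or fewer rows, although it assumes that no sample row is entirely blank.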


<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Some brief details about the data format...</p><br>
    <p>The <b>LABNO</b> values encode the sample date as a <b>Julian (day-of-year) date</b>: the first two digits are the <b>year</b>, and the next three are the <b>day of the year</b>. The remaining two digits are an internal identifier for the collection bottles of same-day samples.</p>
    <p>Cloud water at Whiteface Mountain is collected as bulk 12-hour samples. The time at which each accumulated sample was "dumped" into a storage container is given in the <b>DUMP TIME</b> column, and the number of hours within that 12-hour period during which the summit was <b>in cloud</b> is shown in the <b>COLLECTION_HOURS</b> column.</p>
</div>
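The LABNO layout described above can be decoded with the standard library. A minimal sketch; `decode_labno` is a hypothetical helper that assumes every LABNO has exactly this seven-digit `YYDDDNN` form. `%j` is the `strptime` directive for the day of the year:

```python
from datetime import datetime

def decode_labno(labno) -> tuple[datetime, str]:
    """Split a LABNO such as '2215304' into (sample date, bottle identifier)."""
    text = str(labno).strip()
    # First five digits: two-digit year (%y) followed by three-digit day of year (%j).
    sample_date = datetime.strptime(text[:5], "%y%j")
    bottle_id = text[5:]  # remaining two digits: internal bottle ID
    return sample_date, bottle_id

print(decode_labno("2215304"))  # (datetime.datetime(2022, 6, 2, 0, 0), '04')
```

For the rows printed above, the decoded dates agree with the dates in **DUMP TIME** (for example, `2215304` decodes to June 2, 2022). To apply it to the whole column, something like `df["LABNO"].map(lambda v: decode_labno(v)[0])` should work.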

Below, we list every column that contains at least one non-empty value.


```python
for col in df.columns:
    if not df[col].isna().all():
        print(col)
```

```text
LABNO
DUMP TIME
COLLECTION_HOURS
POOL_VOL ml
LWC g m-3
TEMP °C
WINDDIR_AVG °AZ
OCTANT
AVG_S_WSP m s-1
LABPH
SPCOND µS cm-1
HION µeq L-1
CA mg L-1
CA µeq L-1
MG mg L-1
MG µeq L-1
NA mg L-1
NA µeq L-1
K mg L-1
K µeq L-1
NH4 mg L-1
NH4 µeq L-1
SO4 mg L-1
SO4 µeq L-1
NO3 mg L-1
NO3 µeq L-1
CL mg L-1
CL µeq L-1
TOC µmols C L-1
TN_F
COMMENT
CATION_ANION_RATIO
SUM_CATIONS µeq L-1
SUM_ANIONS µeq L-1
RPD
Glyoxalate_ppb
Formate_ppb
AcetateGlycolate_ppb
Lactate_ppb
Malonate_ppb
Oxalate_ppb
Pyruvate_ppb
SuccinateMalate_ppb
```
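The loop above can also be written as a single vectorized mask. A minimal sketch with a hypothetical helper and a toy frame. Note that text codes such as `M` count as data under this test, so a column holding nothing but `M` would still be listed:

```python
import numpy as np
import pandas as pd

def non_empty_columns(frame: pd.DataFrame) -> list:
    """Return the labels of columns that hold at least one non-missing value."""
    # One boolean per column, in column order; .to_numpy() avoids label alignment.
    has_data = frame.notna().any(axis=0).to_numpy()
    return list(frame.columns[has_data])

demo = pd.DataFrame({
    "LABNO": ["2215304", "2215401"],
    "COLL_HR_F": [np.nan, np.nan],
    "LABPH": ["M", np.nan],
})
print(non_empty_columns(demo))  # ['LABNO', 'LABPH']
```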


Now that we have our data in a manageable format, we can begin any analysis or visualizations we are interested in.
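Before any numeric analysis, note that many cells hold text codes rather than numbers, such as `M` and `BDL` in the outputs above, so the columns are stored as strings. The sketch below converts one column with `pd.to_numeric(..., errors="coerce")`, which turns anything non-numeric into NaN, and keeps the original codes alongside. We assume here that `M` marks a missing value and `BDL` a value below the detection limit; check the data documentation before relying on that reading. `to_numeric_with_codes` is a hypothetical helper, and the toy column reuses values from the `Oxalate_ppb` output above.

```python
import pandas as pd

def to_numeric_with_codes(col: pd.Series) -> tuple[pd.Series, pd.Series]:
    """Return (numeric values, text codes) for a column of mixed numbers and codes.

    Assumption: non-numeric entries such as 'M' (missing) or 'BDL'
    (below detection limit) should become NaN in the numeric column.
    """
    values = pd.to_numeric(col, errors="coerce")
    # Keep the original text only where conversion failed on a non-empty cell.
    codes = col.where(values.isna() & col.notna())
    return values, codes

oxalate = pd.Series(["48", "127", "M", "BDL", None], name="Oxalate_ppb")
values, codes = to_numeric_with_codes(oxalate)
print(values.tolist())          # [48.0, 127.0, nan, nan, nan]
print(codes.dropna().tolist())  # ['M', 'BDL']
```

Treating `BDL` as NaN is only one option; some analyses substitute a value tied to the detection limit instead.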


---

## Analyzing the Data

<div class="admonition alert alert-danger">
    <p class="admonition-title" style="font-weight:bold">Coming Soon!</p>
   This section is still under development.
</div>

## Plotting the Data

<div class="admonition alert alert-danger">
    <p class="admonition-title" style="font-weight:bold">Coming Soon!</p>
   This section is still under development.
</div>

---

## Summary
In this notebook, we've covered how to access cloud water chemistry data from the [Lance Research Laboratory](https://github.com/LanceLab-ASRC) at the University at Albany's Atmospheric Sciences Research Center. We've looked at the data format and at how to clean the raw file into a usable dataframe. This is a niche dataset, updated regularly as cloud water is collected, processed, and analyzed each summer.

## Resources and references
More information about the Whiteface Mountain Field Station: https://whiteface.asrc.albany.edu/

More information about the Lance Research Laboratory: https://research.asrc.albany.edu/facstaff/lance/index.html

More information about the cloud water chemistry at Whiteface Mountain: https://acp.copernicus.org/articles/23/1619/2023/

---

<b>Information about the author: [Adam Deitsch](https://amdeitsch.github.io/)</b>