# Using pandas dataframes
This notebook demonstrates how to load data into a pandas DataFrame and perform basic operations on it.

## Loading BB3 Cruise data into a pandas df

In [1]:
# === using `import ... as ...`
# you can import pandas as usual
import pandas
# and use it
print(f"pandas.__version__: {pandas.__version__}")

# but people often give pandas another name, `pd`, when importing it.
# to do this you can use `import ... as ...`
import pandas as pd
# now we can make the same code more succinct
print(f"pd.__version__: {pd.__version__}")

# pd and pandas are exactly the same *object*.
# `is` tests object identity, which is the claim being made here.
print("pd =?= pandas")
print(f"{pd is pandas}")

# if we add something to the pd object
pd.my_randomly_named_attribute = "This is Tylar's special string object"
# the two names still refer to the same object
print(f"Are `pd` and `pandas` still the same object? {pd is pandas}")
# so we can read the attribute we put on `pd` through `pandas`
print(f"pandas.my_randomly_named_attribute: {pandas.my_randomly_named_attribute}")

# you can even import it again with a different name
import pandas as whatever_i_feel_like
print(f"whatever_i_feel_like.__version__: {whatever_i_feel_like.__version__}")
# and the new import is still the exact same object as `pd` and `pandas`
print(f"whatever_i_feel_like.my_randomly_named_attribute: {whatever_i_feel_like.my_randomly_named_attribute}")
# You can get creative with this but... please don't.
# Stick to the original name or common usages like `pandas as pd`
# Another common one is `numpy as np`
import numpy as np
print(f"np.__version__: {np.__version__}")

pandas.__version__: 1.3.3
pd.__version__: 1.3.3
pd =?= pandas
True
Are `pd` and `pandas` still the same object? True
pandas.my_randomly_named_attribute: This is Tylar's special string object
whatever_i_feel_like.__version__: 1.3.3
whatever_i_feel_like.my_randomly_named_attribute: This is Tylar's special string object
np.__version__: 1.21.2
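
The aliases above all point at one object because Python caches each imported module in `sys.modules` and hands back the cached module on every later import. The sketch below illustrates this with the standard-library `json` module as a stand-in for pandas, so it runs without any third-party packages; the alias `js` is made up for the example.

```python
import sys
import json
import json as js  # the second import reuses the cached module; nothing is reloaded

# Both names, and the entry in the import cache, refer to one module object.
same_object = (js is json) and (sys.modules["json"] is json)
print(same_object)
```

The same check should hold for `pd`, `pandas`, and `sys.modules["pandas"]` in the cell above.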


In [2]:
# Loading BB3 cruise data into pandas
import pandas

FILEPATH = "../../data/WS19266_BB3.raw"

# read the tab-separated-values file (this `.raw` file is laid out like a `.tsv`)
bb3_df = pandas.read_csv(
    FILEPATH,
    sep='\t',
    on_bad_lines='skip'  # requires pandas >= 1.3; default is 'error', 'warn' is also available
)

# now you can use the pandas dataframe
SEPARATOR = "="*302  # creates a long string like `=====================` with 302 `=` characters
print(bb3_df.describe())
print(SEPARATOR)
print(bb3_df.info())
print(SEPARATOR)
print(bb3_df.head())

       55421 records to read
count           29787.000000
mean              528.212610
std               599.439717
min               -65.000000
25%               517.000000
50%               521.000000
75%               524.000000
max             52521.000000
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 31074 entries, ('07/29/19', '12:06:15', 470.0, 2153.0, 532.0, 2043.0, 650.0, 4130.0) to ('etx', nan, nan, nan, nan, nan, nan, nan)
Data columns (total 1 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   55421 records to read  29787 non-null  float64
dtypes: float64(1)
memory usage: 2.3+ MB
None
                                                          55421 records to read
07/29/19 12:06:15 470.0 2153.0 532.0 2043.0 650.0 4130.0                  536.0
         12:06:16 470.0 2151.0 532.0 2034.0 650.0 4130.0                  536.0
         12:06:17 470.0 2121.0 532.0 1994.0 650.0 4130.0                  535.0
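
The output above shows a single column named `55421 records to read` and a MultiIndex built from the other fields. This suggests that the first line of the file is a record count rather than a column header. One possible way to handle this, as a minimal sketch, is to skip that line and supply column names explicitly. The column names below are hypothetical placeholders (check the instrument documentation for the real ones), and the in-memory sample is modeled on the `head()` output rather than read from the actual file.

```python
import io
import pandas as pd

# Stand-in for the first lines of the `.raw` file, modeled on the output above.
sample = io.StringIO(
    "55421 records to read\n"
    "07/29/19\t12:06:15\t470\t2153\t532\t2043\t650\t4130\t536\n"
    "07/29/19\t12:06:16\t470\t2151\t532\t2034\t650\t4130\t536\n"
    "07/29/19\t12:06:17\t470\t2121\t532\t1994\t650\t4130\t535\n"
)

# Hypothetical column names; they are NOT taken from the file or its documentation.
COLUMN_NAMES = [
    "date", "time",
    "wl_1", "counts_1",
    "wl_2", "counts_2",
    "wl_3", "counts_3",
    "last_field",
]

bb3_sample_df = pd.read_csv(
    sample,
    sep="\t",
    skiprows=1,          # skip the "55421 records to read" line
    header=None,         # the remaining lines contain data only, no header row
    names=COLUMN_NAMES,  # label the nine fields ourselves
)

print(bb3_sample_df.shape)
print(bb3_sample_df.dtypes)
```

With named columns, each field becomes an ordinary column instead of an index level, so `describe()` summarizes all numeric fields rather than only the last one. The real file also appears to end with an `etx` marker line, which would still need to be dropped separately.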
    

## Loading satellite image data into pandas

In [18]:
import pandas
FILEPATH = "../../data/MODA_OC_py_data/A2007143182500.L2_LAC_OC.x.nc"

# The xarray library handles arbitrary-dimensional netCDF data, and retains metadata. 
# Xarray provides a simple method of opening netCDF files, and converting them to pandas dataframes.
import xarray as xr

# create an xarray.Dataset from `.nc` file
img_dataset = xr.open_dataset(FILEPATH)

# Here we use `display()` instead of `print()` to get a prettier output.
# (Outside of a jupyter notebook, `display()` falls back to plain text.)
from IPython.display import display
display(img_dataset)

# but converting this xr.Dataset directly to a dataframe fails.
# uncomment the next line and try it yourself:
#img_df = img_dataset.to_dataframe()  # throws `ValueError: no valid index for a 0-dimensional object`

# === convert the `xr.Dataset` to a `pandas.DataFrame`
# this file has a hierarchy of groups so it is more complicated than just using
# `img_dataset.to_dataframe()` as usual.
# [SO q/a ref](https://stackoverflow.com/a/54813257/1483986)

In [22]:
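# IPython expands `$FILEPATH` from the Python variable defined above.
# (A shell assignment on its own `!` line would not carry over to the next `!` line,
# because each `!` command runs in a separate subshell.)
# `ncdump` ships with the netCDF command-line utilities and must be installed separately.
!ncdump -h $FILEPATH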

/bin/bash: ncdump: command not found


In [26]:
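NC_GROUP_TO_OPEN = "TODO: PUT_THE_GROUP_HERE"  # get this using bash `ncdump -h` or similar
# this raises OSError until the placeholder above is replaced with a real group name
img_group_dataset = xr.open_dataset(FILEPATH, group=NC_GROUP_TO_OPEN)

# img_df = img_group_dataset.to_dataframe()

# now you can use the pandas dataframe
# print(img_df.describe())
# print("="*302)

# print(img_df.columns)  # a DataFrame has `.columns`; `.variables` belongs to xr.Dataset
# print(img_df)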


OSError: [Errno group not found: TODO: PUT_THE_GROUP_HERE] 'TODO: PUT_THE_GROUP_HERE'
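
Once a valid group has been opened, converting it should look much like the ordinary `to_dataframe()` call. The sketch below shows that step on a small in-memory `xr.Dataset` standing in for one group of the file. The dimension names `line` and `pixel` and the variable name `demo_var` are hypothetical and are not taken from the satellite file.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for one group: a 2 x 3 grid holding a single variable.
demo_ds = xr.Dataset(
    data_vars={"demo_var": (("line", "pixel"), np.arange(6.0).reshape(2, 3))},
    coords={"line": [0, 1], "pixel": [0, 1, 2]},
)

# to_dataframe() yields one row per (line, pixel) pair, indexed by a MultiIndex.
demo_df = demo_ds.to_dataframe()
print(demo_df.shape)

# reset_index() turns the index levels into ordinary columns.
flat_df = demo_df.reset_index()
print(list(flat_df.columns))
```

A 2-D image becomes a "long" table with one row per pixel, which is worth keeping in mind when deciding whether a DataFrame is the right structure for satellite imagery.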