# Interact with Pandas and export datasets

Reference notebook for the final task of the Climate Geospatial Analysis with Python and Xarray project on Coursera.

Instructor: Danilo Lessa Bernardineli (https://danlessa.github.io/)

---

- Welcome back! In this final task, you will learn how to interface Xarray with Pandas by exporting and importing objects, as well as how to export Xarray datasets to NetCDF files. This knowledge is important when Xarray is one part of a larger Data Science workflow.
- Let's start by opening the task 7 notebook. Run everything.


In [1]:
import xarray as xr

In [18]:
ds = xr.open_dataset('data.nc').sel(expver=1)

- Now I'll introduce you to the to_dataframe command. Open a new block, and type with me: df equals ds to dataframe, and then df on the next line. Run it.

In [27]:
df = ds.to_dataframe()
df

| latitude | longitude | time | expver | lai_hv | skt | tp |
|---|---|---|---|---|---|---|
| 6.0 | -82.00 | 1979-01-01 | 1 | 0.000000 | 300.670105 | 0.000434 |
| 6.0 | -82.00 | 1979-02-01 | 1 | 0.000000 | 300.829926 | 0.001139 |
| 6.0 | -82.00 | 1979-03-01 | 1 | 0.000000 | 301.014832 | 0.005566 |
| 6.0 | -82.00 | 1979-04-01 | 1 | 0.000000 | 301.105957 | 0.011398 |
| 6.0 | -82.00 | 1979-05-01 | 1 | 0.000000 | 300.924347 | 0.009486 |
| ... | ... | ... | ... | ... | ... | ... |
| -16.0 | -46.75 | 2020-04-01 | 1 | 1.905231 | 295.797791 | 0.004340 |
| -16.0 | -46.75 | 2020-05-01 | 1 | 1.735271 | 293.425049 | 0.001070 |
| -16.0 | -46.75 | 2020-06-01 | 1 | 1.578286 | 293.098816 | 0.000035 |
| -16.0 | -46.75 | 2020-07-01 | 1 | | | |
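If you don't have data.nc at hand, the to dataframe call above can be tried on a small made-up dataset. This is only a sketch: the `toy` dataset, its coordinate values and the `skt` numbers are invented for illustration and are not taken from the course data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical stand-in for data.nc: 2 latitudes x 3 longitudes x 4 months.
time = pd.date_range("1979-01-01", periods=4, freq="MS")
toy = xr.Dataset(
    {"skt": (("latitude", "longitude", "time"), np.arange(24.0).reshape(2, 3, 4))},
    coords={
        "latitude": [-16.0, 6.0],
        "longitude": [-82.0, -64.0, -46.75],
        "time": time,
    },
)

# Each dimension becomes one level of the DataFrame index.
toy_df = toy.to_dataframe()
print(toy_df.index.names)

# One row per (latitude, longitude, time) combination: 2 * 3 * 4 rows.
print(len(toy_df))

# Select a single latitude through its index level.
one_latitude = toy_df.xs(6.0, level="latitude")
print(one_latitude)
```

The row count is the product of the dimension sizes, which is why the DataFrame for the real dataset is so long.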


- We have a brand new Pandas DataFrame! Notice that it has a MultiIndex. As you will see, Xarray automatically maps each dimension that you have to an index level in Pandas, which is the Pandas way of handling multidimensionality.
- A way to see this automatic MultiIndex inference in action is to export an Xarray data structure that isn't multidimensional, like the months of our time coordinate. Open a new block, and type with me: da equals ds time dt month, and then da to dataframe on the next line. Run it.

In [20]:
da = ds.time.dt.month
da.to_dataframe()

| time | expver | month |
|---|---|---|
| 1979-01-01 | 1 | 1 |
| 1979-02-01 | 1 | 2 |
| 1979-03-01 | 1 | 3 |
| 1979-04-01 | 1 | 4 |
| 1979-05-01 | 1 | 5 |
| ... | ... | ... |
| 2020-04-01 | 1 | 4 |
| 2020-05-01 | 1 | 5 |
| 2020-06-01 | 1 | 6 |
| 2020-07-01 | 1 | 7 |


- The month variable that we generated only has the time dimension, and as such we now have a simple Pandas index instead of a MultiIndex.
- An alternative in that case is to output a Pandas Series instead of a DataFrame. Open a new block, and type da to series. Run it.

In [24]:
da.to_series()

time
1979-01-01    1
1979-02-01    2
1979-03-01    3
1979-04-01    4
1979-05-01    5
             ..
2020-04-01    4
2020-05-01    5
2020-06-01    6
2020-07-01    7
2020-08-01    8
Name: month, Length: 500, dtype: int64
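To compare the two exports side by side, here is a minimal sketch on a made-up six-month time coordinate (the `toy` dataset is hypothetical, not the course data). It shows that a one-dimensional DataArray exports to a plain index, and that the Series keeps the variable name.

```python
import pandas as pd
import xarray as xr

# Hypothetical dataset holding only a monthly time coordinate.
time = pd.date_range("1979-01-01", periods=6, freq="MS")
toy = xr.Dataset(coords={"time": time})

# Same expression as in the notebook: the month number of each timestamp.
months = toy.time.dt.month

# A single dimension gives a single-level index, not a MultiIndex.
months_df = months.to_dataframe()
print(type(months_df.index).__name__)

# The Series carries the values, the index and the variable name.
months_series = months.to_series()
print(months_series.name)
print(months_series.tolist())
```

A Series is usually enough when you only need one variable; the DataFrame becomes useful when several variables share the same index.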

- Pretty similar, but a Series is a more compact primitive.
- Xarray also allows you to load datasets from Pandas DataFrames, in which case the index levels are recognized as the dimensions. Let's reverse the operation that generated the previous DataFrame.
- Open a new block, and type: xr dataset from dataframe df. Run it.

In [30]:
xr.Dataset.from_dataframe(df)

- We have our Dataset back! But notice that we have lost the metadata and the attributes in the process.
- Similarly, we can also retrieve Xarray DataArrays from Pandas Series. This is done by creating a new block, and typing xr dataarray from series df skt. Run it.

In [31]:
xr.DataArray.from_series(df.skt)
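The two round trips above, and the loss of attributes, can be checked on a small made-up dataset. This is a sketch under assumptions: the `toy` dataset, its `title` and `units` attributes and its values are invented for illustration. Copying the attributes back by hand at the end is one possible workaround, not something the course notebook does.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with attributes on the dataset and on the variable.
time = pd.date_range("1979-01-01", periods=3, freq="MS")
skt_values = np.array([[295.2, 294.8, 293.7], [300.1, 300.4, 300.9]])
toy = xr.Dataset(
    {"skt": (("latitude", "time"), skt_values, {"units": "K"})},
    coords={"latitude": [-16.0, 6.0], "time": time},
    attrs={"title": "toy dataset"},
)

# Dataset -> DataFrame -> Dataset, and Series -> DataArray.
toy_df = toy.to_dataframe()
restored = xr.Dataset.from_dataframe(toy_df)
restored_skt = xr.DataArray.from_series(toy_df["skt"])

# The values and dimensions come back, but the attributes do not.
lost_dataset_attrs = dict(restored.attrs)
lost_variable_attrs = dict(restored["skt"].attrs)
print(lost_dataset_attrs, lost_variable_attrs)

# Possible workaround: copy the attributes over from the original object.
restored.attrs.update(toy.attrs)
restored["skt"].attrs.update(toy["skt"].attrs)
print(restored.attrs, restored["skt"].attrs)
```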

- Yay, we retrieved our DataArray once again. This concludes the interface with Pandas.
- Now I'll show how to export the dataset that we have to NetCDF so that you never lose your manipulations and the artifacts that you produce. Open a new block, and type: ds to netcdf export nc. Run it.

In [21]:
ds.to_netcdf('export.nc')

- It is really that simple! If you open another block, type xr open dataset export nc and run it, you will notice that our metadata has been persisted.

In [34]:
xr.open_dataset('export.nc')
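Here is a minimal sketch of the same export and reload on a made-up dataset, so that you can verify that the attributes survive the trip through a NetCDF file. It assumes that a NetCDF backend (for example netCDF4 or h5netcdf) is installed, and the `toy` dataset and its attributes are invented for illustration. The file goes to a temporary directory so that nothing is left behind.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset with attributes on the dataset and on the variable.
time = pd.date_range("1979-01-01", periods=3, freq="MS")
toy = xr.Dataset(
    {"skt": (("time",), np.array([300.1, 300.4, 300.9]), {"units": "K"})},
    coords={"time": time},
    attrs={"title": "toy dataset"},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "export.nc")
    toy.to_netcdf(path)
    # load_dataset reads everything into memory and closes the file,
    # so the temporary directory can be removed safely afterwards.
    reloaded = xr.load_dataset(path)

# Unlike the Pandas round trip, the attributes are still there.
print(reloaded.attrs)
print(reloaded["skt"].attrs)
```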

- So this concludes the task and the course! In this last task, you have learned how to interface Xarray with Pandas and export data to NetCDF, and through this course you have worked through the entire geospatial analysis workflow. You have learned how to load multidimensional data, how to parse and read the information around the dimensions, coordinates and variables, how to visualize it, how to apply simple and grouped operations, and how to merge and concatenate data.
- You should be proud of yourself! As you saw through the course, geospatial analysis is rich in information and insights, and it is easy to get immersed in the amount of knowledge that you can extract. If you feel inspired by this course, I've left some guidance in the references so that you can download any climate data that you want!
- I hope that we see each other again. There is a large selection of Coursera Guided Projects that will help you on your journey toward technical mastery. Until next class! Bye bye.