# Loading and getting familiar with NetCDF datasets

Reference notebook for the first task of the Climate Geospatial Analysis with Python and Xarray project on Coursera.

Instructor: Danilo Lessa Bernardineli (https://danlessa.github.io/)

---
- Welcome to the Climate Geospatial Analysis with Python and Xarray guided project! My name is Danilo; I'm an atmospheric physicist at the University of São Paulo, and I'll be your instructor on the Rhyme Interactive Platform.
- I'll guide you from the right panel, while you work on your own preconfigured machine in the left panel. So let's get started.
- Throughout this project, you are going to learn how to use Xarray to handle real-world climate data. Xarray is a Python library for multidimensional data analysis, such as when you are working with geospatial coordinates, time series and other dimensions.
- Xarray is useful because it lets you combine dimensions freely while keeping your code clear and easy to use. It is similar to Pandas, but better suited to more complex datasets: Xarray works with labeled N-dimensional arrays, while Pandas is centered on one- and two-dimensional tables.
- In this first task, we are going to learn how to load a pre-downloaded NetCDF file and how to get familiar with its structure.
- So let's open our task 1 notebook. First we need to import Xarray. To do so, open a new cell and type with me: `import xarray as xr`. Run it.

In [1]:
```python
import xarray as xr
```
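The claim that Xarray lets you mix dimensions freely while staying readable can be illustrated with a tiny in-memory example. This is a minimal sketch with made-up values: the grid, the dates and the second variable `elevation` are hypothetical and not part of the course data. It shows named dimensions, selection by coordinate label, and a single Dataset holding a 3-D and a 2-D variable side by side.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical coordinates: 3 daily time steps on a small 2 x 3 lat/lon grid
time = pd.date_range("2020-01-01", periods=3, freq="D")
lat = [-5.0, -10.0]
lon = [-60.0, -55.0, -50.0]

# A 3-D variable (time, latitude, longitude) filled with placeholder values
skt = xr.DataArray(
    np.full((3, 2, 3), 300.0),
    dims=("time", "latitude", "longitude"),
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# A hypothetical 2-D variable on the same grid but without a time axis
elevation = xr.DataArray(
    np.array([[100.0, 200.0, 300.0], [150.0, 250.0, 350.0]]),
    dims=("latitude", "longitude"),
    coords={"latitude": lat, "longitude": lon},
)

# Both variables live in one Dataset even though their dimensions differ
ds_demo = xr.Dataset({"skt": skt, "elevation": elevation})

# Selection uses dimension names and coordinate labels instead of positions
point = ds_demo.sel(latitude=-10.0, longitude=-55.0)
print(point["skt"].values)        # one value per time step
print(float(point["elevation"]))  # a single value: elevation has no time dimension
```

In a Pandas DataFrame, the same data would usually have to be flattened into one long table (or a MultiIndex), whereas here each variable keeps its own named dimensions.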

- Now, to load a dataset, we use the `xr.open_dataset` function, so type with me: `ds = xr.open_dataset('data.nc')`, then run it. This `data.nc` file contains temperature and precipitation data around the Amazon rainforest, and a how-to for downloading it is in the Bonus section of this course.

In [2]:
```python
ds = xr.open_dataset('data.nc')
```
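If you don't have the course file at hand, you can still try `xr.open_dataset` on a small stand-in. The sketch below builds a synthetic dataset with the variable names (`skt`, `tp`) and dimensions described in this task, writes it to a temporary `data.nc`, and reopens it. All values, grid sizes and the `history` attribute are placeholders, and writing the file assumes a NetCDF backend such as `netCDF4` or `scipy` is installed.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the course file; every value here is a placeholder
time = pd.date_range("2020-01-01", periods=4, freq="D")
lat = np.array([0.0, -5.0, -10.0])
lon = np.array([-65.0, -60.0, -55.0, -50.0])
shape = (time.size, lat.size, lon.size)

synthetic = xr.Dataset(
    data_vars={
        # (dims, data, attrs) tuples define each data variable
        "skt": (("time", "latitude", "longitude"), np.full(shape, 300.0),
                {"units": "K", "long_name": "Skin temperature"}),
        "tp": (("time", "latitude", "longitude"), np.zeros(shape),
               {"long_name": "Total precipitation"}),
    },
    coords={"time": time, "latitude": lat, "longitude": lon},
    attrs={"history": "synthetic example file"},  # placeholder provenance
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.nc")
    # Writing requires a NetCDF backend (e.g. netCDF4 or scipy)
    synthetic.to_netcdf(path)

    # Used as a context manager, open_dataset closes the file when the block ends
    with xr.open_dataset(path) as opened:
        print(opened)
        sizes = dict(opened.sizes)
        skt_units = opened["skt"].attrs["units"]

print(sizes, skt_units)
```

In a notebook it is fine to keep `ds` open as in the cell above; `xr.load_dataset` is an alternative that reads everything into memory and closes the file immediately.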

- To see its contents, open a new cell, type `ds` and run it. Xarray gives you a full overview of what's inside, so let's take our time to walk through it.

In [3]:
```python
ds
```

- The first piece of information provided is the Dimensions entry, which tells us the size of each dimension of our object. It works much like a NumPy shape, except that each size is labeled with a dimension name. As you can see, we have three dimensions: latitude, longitude and time.
- Each of those dimensions is represented by a coordinate, and on the right you can see their data types. We also have two data variables in our dataset: skt, which is the skin (surface) temperature, and tp, which is the total precipitation.
- Notice that each variable has a tuple of dimension names next to it. This tuple expresses the dimensionality of that variable. Xarray lets you mix variables with different dimensions in the same dataset; this is arguably its most powerful feature, and we'll see how to handle it throughout the project.
- Another neat feature of Xarray is its support for metadata. Notice the Attributes section, which records the provenance of this particular file.
- Variables can carry their own attributes as well. To see what I mean, click the first icon on the right of the skt variable. Note that it has two attributes: one giving the units, kelvin (K), and another giving a long name, 'Skin temperature'.
- So that's it for this task! Now you know how to load a NetCDF file and how to get a basic summary of the data inside it. In the next task, we are going to learn how to select and query that data so that we can get insights from it! See you soon.
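The information shown in the notebook overview (dimensions, coordinates, data variables and attributes) can also be read programmatically, which is handy in scripts where there is no HTML display. This is a sketch on a small placeholder dataset named `demo` that mimics the structure described above; the values, the `float32` dtype and the `history` attribute are assumptions. With the real file, you would run the same inspection lines on the `ds` returned by `xr.open_dataset`.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Placeholder dataset standing in for the course file (values are made up)
time = pd.date_range("2020-01-01", periods=2, freq="D")
lat = np.array([0.0, -5.0])
lon = np.array([-60.0, -55.0, -50.0])
shape = (time.size, lat.size, lon.size)
demo = xr.Dataset(
    {
        "skt": (("time", "latitude", "longitude"),
                np.full(shape, 300.0, dtype="float32"),
                {"units": "K", "long_name": "Skin temperature"}),
        "tp": (("time", "latitude", "longitude"),
               np.zeros(shape, dtype="float32")),
    },
    coords={"time": time, "latitude": lat, "longitude": lon},
    attrs={"history": "synthetic example"},
)

# Dimension sizes by name (the labeled counterpart of a NumPy shape)
print(dict(demo.sizes))

# Coordinates and their data types
for name, coord in demo.coords.items():
    print(name, coord.dtype)

# Data variables, the tuple of dimensions each one spans, and its shape
for name, var in demo.data_vars.items():
    print(name, var.dims, var.shape)

# Global attributes (file provenance) and per-variable attributes
print(demo.attrs)
print(demo["skt"].attrs["units"], demo["skt"].attrs["long_name"])
```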


## Quiz

- Which of the following are valid ways to load a NetCDF file?
    - [ ] ds = xr.load_netcdf_file('example.nc')
    - [ ] ds = xr.from_pandas(pd.read_netcdf('example.nc'))
    - [x] ds = xr.open_dataset('example.nc')
    - [ ] ds = xr.from_nc('example.nc')