# Select and filter data through coordinates

Reference notebook for the second task of the Climate Geospatial Analysis with Python and Xarray project on Coursera.

Instructor: Danilo Lessa Bernardineli (https://danlessa.github.io/)

---

- Welcome back! In this task, we are going to learn how to use the Xarray selectors as well as the where method so that we can navigate around our data.
- Xarray selectors allow you to filter across multiple dimensions without hassle, and they are a very powerful feature when dealing with geospatial data.

## Load data

- So to get started, let's open the task 2 notebook. Run all the cells so that we load the previous task data. Before going further, take note of our current dimensions: it is a structure which has length 31 across the latitude, 27 across the longitude, and 937 across the time dimension. This is a 3D structure, and you could imagine our data as being points in a cube.

In [1]:
import xarray as xr

In [2]:
ds = xr.open_dataset('data.nc')

In [3]:
ds

- Now, open a new block. Let's introduce the Xarray selector method, so type with me: `ds.sel(longitude=-82)`. Run it. Notice that our structure now has dimensions of length 31 for the latitude and 937 for the time; the longitude dimension is gone.

In [4]:
ds.sel(longitude=-82)

- What we just did was select all data whose longitude equals -82, and as such we have dropped the data that had a different longitude.
- A neat feature of the Xarray selector is that you can filter by several dimensions at once. To see what I mean, open a new block, and type: `ds.sel(longitude=-82, latitude=5.75, time='2018-01-01')`. Run it.

In [5]:
ds.sel(longitude=-82, 
       latitude=5.75,
       time='2018-01-01')
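The dimension bookkeeping can also be checked on a small synthetic dataset. The sketch below is only an illustration: the `toy` dataset, its `temperature` variable, and its coordinate values are made up and are not the course data. It shows that a scalar selector removes that dimension, and that a date string keeps every time step falling on that day.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the course data (names and values are assumptions).
lat = np.arange(-16.0, -13.0, 1.0)   # 3 latitudes: -16, -15, -14
lon = np.arange(-48.0, -44.0, 1.0)   # 4 longitudes: -48 ... -45
time = pd.date_range("2018-01-01", periods=8, freq="6h")  # 2 days, 4 steps each

toy = xr.Dataset(
    {"temperature": (("time", "latitude", "longitude"),
                     np.zeros((time.size, lat.size, lon.size)))},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# A scalar selector drops the selected dimension.
one_lon = toy.sel(longitude=-46.0)

# Several selectors at once; the date string matches all time steps of that day.
point_day = toy.sel(longitude=-46.0, latitude=-15.0, time="2018-01-01")

print(dict(one_lon.sizes))
print(dict(point_day.sizes))
```

Only the time dimension survives in `point_day`, which mirrors the four times of day seen in the notebook output.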

- Now we only have four points for our two variables. To see their values, you can click on the second icon at the right corner of the variables. As you can see, we have points associated with four different times of the day, four skin temperature points and four total precipitation points. **Feel free to pause the video** to inspect a bit further, or to change the selector values.
- Now what if we pass a coordinate that doesn't exist to the selector? Open a new block, type `ds.sel(longitude=99)` and run it. And you have an error. Fortunately, Xarray has a neat feature for approximating coordinates that do not exist in the dataset.

In [6]:
ds.sel(longitude=99)

KeyError: 99.0

- Open a new block, and type with me: `ds.sel(longitude=-99, method='nearest')`. Run it. Now you don't have an error! Providing the selector with `method='nearest'` makes Xarray select the nearest existing coordinate. This is a very powerful feature for geospatial analysis, as most of the time you won't have the exact coordinates for the place that you want to study. Feel free to pause the video now to play with the selector value.

In [7]:
ds.sel(longitude=-99,
       method='nearest')
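To make the behaviour concrete, here is a minimal sketch on a made-up one-dimensional dataset (the `toy` name, the `temperature` variable, and the longitude values are assumptions, not the course data). It contrasts an exact lookup, which fails, with a nearest lookup, and it also shows that a value far outside the grid quietly resolves to the edge coordinate.

```python
import numpy as np
import xarray as xr

# Hypothetical toy grid with four longitudes.
lon = np.array([-48.0, -47.0, -46.0, -45.0])
toy = xr.Dataset(
    {"temperature": (("longitude",), np.array([10.0, 11.0, 12.0, 13.0]))},
    coords={"longitude": lon},
)

# Exact lookup of a label that is not on the grid raises KeyError.
try:
    toy.sel(longitude=-46.4)
    exact_failed = False
except KeyError:
    exact_failed = True

# Nearest lookup returns the closest existing label instead.
picked = toy.sel(longitude=-46.4, method="nearest").longitude.item()

# A value far outside the grid resolves to the closest edge.
edge = toy.sel(longitude=-99, method="nearest").longitude.item()

print(exact_failed, picked, edge)
```

Because a far-away value still returns a result, it is worth checking the coordinate that was actually picked before interpreting the data.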

- Good! Another functionality of the sel method is to select several coordinate values at once by passing a list to it. To see what I mean, open a new block and type with me: `ds.sel(latitude=[-16, -15], longitude=[-46, -47], method="nearest")`. Run it. Notice the dimensions: latitude and longitude now both have length 2.

In [8]:
ds.sel(latitude=[-16, -15],
       longitude=[-46, -47],
       method="nearest")
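As a rough illustration of what list selection with `method="nearest"` returns, the sketch below uses a made-up 3 by 3 grid (the `toy` dataset and its coordinate values are assumptions). Each requested value is matched to its own nearest label, and the result keeps one entry per requested value, in the order requested.

```python
import numpy as np
import xarray as xr

# Hypothetical grid whose labels do not coincide with the requested values.
lat = np.array([-16.25, -15.5, -14.75])
lon = np.array([-47.25, -46.5, -45.75])
toy = xr.Dataset(
    {"temperature": (("latitude", "longitude"), np.arange(9.0).reshape(3, 3))},
    coords={"latitude": lat, "longitude": lon},
)

# Lists keep the dimension; each element is resolved to its nearest label.
subset = toy.sel(latitude=[-16, -15], longitude=[-46, -47], method="nearest")

print(subset.latitude.values)
print(subset.longitude.values)
```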

- Now let's introduce other methods besides sel, starting with the index-based one. Create a new cell, and type with me: `ds.isel(latitude=0, longitude=0)`. Run it. What we just did was select the first latitude in the dataset, and the first longitude. This is analogous to the pandas iloc method, while the sel method that we were using is analogous to the pandas loc method.

In [9]:
ds.isel(latitude=0, 
        longitude=0)
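The relation between the positional and the label-based selector can be sketched on a small made-up grid (the `toy` dataset and its values are assumptions): selecting position 0 gives the same result as selecting the label stored at position 0, and negative positions count from the end, as in Python lists.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 4 grid.
lat = np.array([-16.0, -15.0, -14.0])
lon = np.array([-48.0, -47.0, -46.0, -45.0])
toy = xr.Dataset(
    {"temperature": (("latitude", "longitude"), np.arange(12.0).reshape(3, 4))},
    coords={"latitude": lat, "longitude": lon},
)

# Positional selection (like pandas iloc) ...
by_position = toy.isel(latitude=0, longitude=0)
# ... versus label selection (like pandas loc) of the same grid cell.
by_label = toy.sel(latitude=-16.0, longitude=-48.0)
same = by_position.equals(by_label)

# Negative positions count from the end of the dimension.
last_lon = toy.isel(longitude=-1).longitude.item()

print(same, last_lon)
```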

- Lastly, let's introduce the where method, which allows you to perform more complex filters. So open a new block and type with me: `QUERY = ds.longitude < -64`, then `QUERY = QUERY & (ds.latitude > -8)`, and finally `ds.where(QUERY, drop=True)`. Run it.

In [10]:
QUERY = ds.longitude < -64
QUERY = QUERY & (ds.latitude > -8)

ds.where(QUERY,
         drop=True)
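To inspect what such a query holds and what `drop=True` changes, here is a minimal sketch on a made-up grid (the `toy` dataset, the `temperature` variable, and the coordinate values are assumptions, not the course data). It builds the same kind of condition and compares `where` with and without `drop=True`.

```python
import numpy as np
import xarray as xr

# Hypothetical 4 x 3 grid straddling the thresholds used in the query.
lat = np.array([-10.0, -9.0, -7.0, -6.0])
lon = np.array([-66.0, -65.0, -63.0])
toy = xr.Dataset(
    {"temperature": (("latitude", "longitude"), np.arange(12.0).reshape(4, 3))},
    coords={"latitude": lat, "longitude": lon},
)

# The two 1D comparisons broadcast into a 2D boolean mask.
mask = (toy.longitude < -64) & (toy.latitude > -8)

# Without drop, the shape is kept and non-matching cells become NaN.
kept = toy.where(mask)
# With drop=True, labels where the mask is entirely False are removed.
trimmed = toy.where(mask, drop=True)

print(mask.dtype, dict(kept.sizes), dict(trimmed.sizes))
```

In this toy case the matching region is rectangular, so `trimmed` contains no NaN values; with an irregular mask, some NaN cells would remain inside the trimmed box.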

- What we just did was keep only the data that is west of longitude -64 and north of latitude -8. The QUERY variable is a boolean mask, just like the ones used for filtering in pandas and NumPy, and we pass that mask to the where method together with `drop=True`, which discards the coordinates where the mask is entirely false, so that we get a filtered dataset. Feel free to pause now in order to change the query parameters or to include the time in it.
- So that's it for this task! You now know how to select and filter data across several dimensions with Xarray, and in the next task we are going to visualize the data that we have been filtering all along. See you!

## Quiz

Which of these are valid ways of selecting data in Xarray?

- [x] `selected_data = ds.sel(city=["Ariquemes", "Cacoal"])`
- [ ] `selected_data = ds.query("longitude > -5.64")` 
- [x] `selected_data = ds.sel(longitude=-5.64, method="nearest")`
- [x] `selected_data = ds.where(ds.longitude > -5.64)`