add notebook on generating climate files per lat lon #1

Merged
merged 2 commits on Jan 22, 2021
208 changes: 208 additions & 0 deletions Create_climate_files_for_coordinates_of_intrest.ipynb
@@ -0,0 +1,208 @@
{
Member

you could add a link to https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5


Contributor Author
Thanks for the suggestion. I added the link :)

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook can be used as an example on how to make the preprocessing of climate data for a simulation with OGGM (e.g. from a GCM simulation) computationally significantly faster (in the order of 85% recuction in the computation time). In order to do this 3 steps need to be taken. Of those steps the first two are being described in detail in this notebook. For the last step only some hints are being given at the end. \n",
"\n",
"The first step is to select all coordinates of the climate dataset that one needs and make a table out of those. Based on this list the climate time series of each of these coordinates will be saved in a seperated file netcdf file (the second step). The advantage is that is takes less time to open a small file compared to a large file when later preprocessing the data. Additionally it reduces the amount of the same file being opened at the same time by different processors when using multi-processing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import salem\n",
"import numpy as np\n",
"import xarray as xr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here a table with all the glaciers globally is being opened and all columns that are not of relevance are being dropped. Collumns to save the coordinates of intrest are being added."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fp = '~/rgi62_allglaciers_stats.h5' # 'https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5'\n",
"df = pd.read_hdf(fp)\n",
"df = df.drop(columns=['GLIMSId', 'BgnDate', 'EndDate', 'O1Region', 'O2Region', 'Zmin', 'Zmax', 'Form', 'Surging', \n",
" 'Linkages', 'TermType', 'Area', 'Zmed', 'Slope', 'Name', 'Lmax', 'Status', 'Aspect', 'Connect',\n",
" 'GlacierType', 'TerminusType', 'IsTidewater'])\n",
"df['cesm_lat'] = pd.Series(index=df.index)\n",
"df['cesm_lon'] = pd.Series(index=df.index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here the climate dataset of intrest is being opened."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cesm = '~/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-200512.nc'\n",
"dsd = xr.open_dataset(cesm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here just the first time step of the file is being selected to make the proccess faster. (For now we're only intrested in the coordinates of the data.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dsd = dsd.TREFHT[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a loop over all the glaciers of intrest and selects the nearest coordinate in the climate data set. There might be a\n",
"difference in the longitude values being used between the datasets (-180 to 180 vs 0 to 360). Keep in mind that \n",
"that you might need to correct for a such a difference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for gl in np.arange(len(df)):\n",
" lat = df['CenLat'][gl]\n",
" lon = df['CenLon'][gl] + 360\n",
" cesm_lat = dsd.sel(lat=lat, lon=lon, method='nearest').lat.values\n",
" cesm_lon = dsd.sel(lat=lat, lon=lon, method='nearest').lon.values\n",
" df['cesm_lat'][gl] = cesm_lat\n",
" df['cesm_lon'][gl] = cesm_lon"
]
},
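{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (a suggested addition, not part of the original workflow): the longitude range of the climate file tells you which convention it uses, and a modulo operation maps between the two conventions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# which longitude convention does the climate file use?\n",
"print(float(dsd.lon.min()), float(dsd.lon.max()))  # e.g. 0.0, 357.5 -> 0 to 360 convention\n",
"\n",
"# mapping between the two conventions (the example value -70.25 is arbitrary):\n",
"lon_180 = -70.25\n",
"lon_360 = lon_180 % 360                  # -> 289.75 (0 to 360 convention)\n",
"lon_back = (lon_360 + 180) % 360 - 180   # -> -70.25 (back to -180 to 180)\n",
"print(lon_360, lon_back)"
]
},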
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It might be usefull to save the table for later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.to_hdf('look_up_table.hdf', key='df')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However for we don´t need the full table. Especially when having a climate file with a course resolution, \n",
"there can like in this case be many duplicates. Therefore the duplicates are being removed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_list = df.drop_duplicates(subset=['cesm_lat', 'cesm_lon'], keep='first')\n",
"df_list = df_list.dropna()"
]
},
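{
"cell_type": "markdown",
"metadata": {},
"source": [
"How large the reduction is can be checked directly (a small illustrative check, not in the original workflow):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# with a coarse grid, many glaciers share a grid point, so df_list is much shorter\n",
"print(len(df), 'glaciers ->', len(df_list), 'unique climate grid points')"
]
},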
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here the climate file of intrest is being opened again. For each coordinate of intrest one \n",
"file is being generated and saved with the round coordinate in it file name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dsd = xr.open_dataset(cesm)\n",
"\n",
"for ki in np.arange(len(df_list)):\n",
" ds = dsd.sel(lat=df_list.iloc[ki].cesm_lat, lon=df_list.iloc[ki].cesm_lon, method='nearest')\n",
" ds.to_netcdf(path='temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.' + \n",
" str(round(df_list.iloc[ki].cesm_lat)) + '_' \n",
" + str(round(df_list.iloc[ki].cesm_lon)) + '.nc', mode='w')"
]
},
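{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check (a suggested addition, not part of the original workflow), one of the newly created files can be opened again; it should contain the full time series for a single grid point."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# open the file belonging to the first entry of the look-up table\n",
"check_lat = str(int(round(df_list.iloc[0].cesm_lat)))\n",
"check_lon = str(int(round(df_list.iloc[0].cesm_lon)))\n",
"xr.open_dataset('temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.' +\n",
"                check_lat + '_' + check_lon + '.nc')"
]
},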
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step would be to create or look up a function that prepares your data so it can be fed for each glacier \n",
"of interrest to for instance the process_gcm_data. Examples of functions that do so are process_cesm_data and \n",
"process_cmip5_data. However those functions select that for you from a large file. That part of these functions \n",
"you would need to replace/ adjust. Here the previously saved look-up table could be handy. This context the \n",
"following lines of code could be usefull."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookupt = pd.read_hdf(look_up_table) # open the lookup table \n",
"cesm_lon = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lon))) # select coordinate as in the title of the file\n",
"cesm_lat = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lat)))\n",
"# fill the gaps in the title of the file e.g.:\n",
"# fpath_temp = 'temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.{}_{}.nc'\n",
"precpds = xr.open_dataset(fpath_precp.format(cesm_lat, cesm_lon))\n",
"tempds = xr.open_dataset(fpath_temp.format(cesm_lat, cesm_lon)) "
]
}
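,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such a wrapper (an illustrative suggestion, not an existing OGGM function; the function name, the template arguments and the use of gdir here are assumptions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def open_climate_for_glacier(gdir, lookup_path, fpath_temp, fpath_precp):\n",
"    # gdir: an OGGM glacier directory; lookup_path: the table saved above;\n",
"    # fpath_temp / fpath_precp: file name templates with two {} placeholders\n",
"    lookupt = pd.read_hdf(lookup_path)\n",
"    cesm_lat = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lat)))\n",
"    cesm_lon = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lon)))\n",
"    tempds = xr.open_dataset(fpath_temp.format(cesm_lat, cesm_lon))\n",
"    precpds = xr.open_dataset(fpath_precp.format(cesm_lat, cesm_lon))\n",
"    return tempds, precpds"
]
}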
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}