New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add notebook on generating climate files per lat lon #1
Merged
Merged
Changes from all commits
Commits
Show all changes
2 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,208 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This notebook can be used as an example on how to make the preprocessing of climate data for a simulation with OGGM (e.g. from a GCM simulation) computationally significantly faster (in the order of 85% recuction in the computation time). In order to do this 3 steps need to be taken. Of those steps the first two are being described in detail in this notebook. For the last step only some hints are being given at the end. \n", | ||
"\n", | ||
"The first step is to select all coordinates of the climate dataset that one needs and make a table out of those. Based on this list the climate time series of each of these coordinates will be saved in a seperated file netcdf file (the second step). The advantage is that is takes less time to open a small file compared to a large file when later preprocessing the data. Additionally it reduces the amount of the same file being opened at the same time by different processors when using multi-processing." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import pandas as pd\n", | ||
"import salem\n", | ||
"import numpy as np\n", | ||
"import xarray as xr" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here a table with all the glaciers globally is being opened and all columns that are not of relevance are being dropped. Collumns to save the coordinates of intrest are being added." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"fp = '~/rgi62_allglaciers_stats.h5' # 'https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5'\n", | ||
"df = pd.read_hdf(fp)\n", | ||
"df = df.drop(columns=['GLIMSId', 'BgnDate', 'EndDate', 'O1Region', 'O2Region', 'Zmin', 'Zmax', 'Form', 'Surging', \n", | ||
" 'Linkages', 'TermType', 'Area', 'Zmed', 'Slope', 'Name', 'Lmax', 'Status', 'Aspect', 'Connect',\n", | ||
" 'GlacierType', 'TerminusType', 'IsTidewater'])\n", | ||
"df['cesm_lat'] = pd.Series(index=df.index)\n", | ||
"df['cesm_lon'] = pd.Series(index=df.index)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here the climate dataset of intrest is being opened." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cesm = '~/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-200512.nc'\n", | ||
"dsd = xr.open_dataset(cesm)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here just the first time step of the file is being selected to make the proccess faster. (For now we're only intrested in the coordinates of the data.)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"dsd = dsd.TREFHT[0]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"This is a loop over all the glaciers of intrest and selects the nearest coordinate in the climate data set. There might be a\n", | ||
"difference in the longitude values being used between the datasets (-180 to 180 vs 0 to 360). Keep in mind that \n", | ||
"that you might need to correct for a such a difference." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"for gl in np.arange(len(df)):\n", | ||
" lat = df['CenLat'][gl]\n", | ||
" lon = df['CenLon'][gl] + 360\n", | ||
" cesm_lat = dsd.sel(lat=lat, lon=lon, method='nearest').lat.values\n", | ||
" cesm_lon = dsd.sel(lat=lat, lon=lon, method='nearest').lon.values\n", | ||
" df['cesm_lat'][gl] = cesm_lat\n", | ||
" df['cesm_lon'][gl] = cesm_lon" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"It might be usefull to save the table for later." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df.to_hdf('look_up_table.hdf', key='df')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"However for we don´t need the full table. Especially when having a climate file with a course resolution, \n", | ||
"there can like in this case be many duplicates. Therefore the duplicates are being removed." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"df_list = df.drop_duplicates(subset=['cesm_lat', 'cesm_lon'], keep='first')\n", | ||
"df_list = df_list.dropna()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here the climate file of intrest is being opened again. For each coordinate of intrest one \n", | ||
"file is being generated and saved with the round coordinate in it file name." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"dsd = xr.open_dataset(cesm)\n", | ||
"\n", | ||
"for ki in np.arange(len(df_list)):\n", | ||
" ds = dsd.sel(lat=df_list.iloc[ki].cesm_lat, lon=df_list.iloc[ki].cesm_lon, method='nearest')\n", | ||
" ds.to_netcdf(path='temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.' + \n", | ||
" str(round(df_list.iloc[ki].cesm_lat)) + '_' \n", | ||
" + str(round(df_list.iloc[ki].cesm_lon)) + '.nc', mode='w')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The next step would be to create or look up a function that prepares your data so it can be fed for each glacier \n", | ||
"of interrest to for instance the process_gcm_data. Examples of functions that do so are process_cesm_data and \n", | ||
"process_cmip5_data. However those functions select that for you from a large file. That part of these functions \n", | ||
"you would need to replace/ adjust. Here the previously saved look-up table could be handy. This context the \n", | ||
"following lines of code could be usefull." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"lookupt = pd.read_hdf(look_up_table) # open the lookup table \n", | ||
"cesm_lon = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lon))) # select coordinate as in the title of the file\n", | ||
"cesm_lat = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lat)))\n", | ||
"# fill the gaps in the title of the file e.g.:\n", | ||
"# fpath_temp = 'temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.{}_{}.nc'\n", | ||
"precpds = xr.open_dataset(fpath_precp.format(cesm_lat, cesm_lon))\n", | ||
"tempds = xr.open_dataset(fpath_temp.format(cesm_lat, cesm_lon)) " | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.6.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could add a link to https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I added the link :)