add notebook on generating climate files per lat lon #1

Merged
merged 2 commits on Jan 22, 2021
208 changes: 208 additions & 0 deletions Create_climate_files_for_coordinates_of_intrest.ipynb
@@ -0,0 +1,208 @@
{
Member

you could add a link to https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5


Contributor Author
Thanks for the suggestion. I added the link :)

"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook can be used as an example on how to make the preprocessing of climate data for a simulation with OGGM (e.g. from a GCM simulation) computationally significantly faster (in the order of 85% recuction in the computation time). In order to do this 3 steps need to be taken. Of those steps the first two are being described in detail in this notebook. For the last step only some hints are being given at the end. \n",
"\n",
"The first step is to select all coordinates of the climate dataset that one needs and make a table out of those. Based on this list the climate time series of each of these coordinates will be saved in a seperated file netcdf file (the second step). The advantage is that is takes less time to open a small file compared to a large file when later preprocessing the data. Additionally it reduces the amount of the same file being opened at the same time by different processors when using multi-processing."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import salem\n",
"import numpy as np\n",
"import xarray as xr"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here a table with all the glaciers globally is being opened and all columns that are not of relevance are being dropped. Collumns to save the coordinates of intrest are being added."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fp = '~/rgi62_allglaciers_stats.h5' # 'https://cluster.klima.uni-bremen.de/~oggm/rgi/rgi62_stats.h5'\n",
"df = pd.read_hdf(fp)\n",
"df = df.drop(columns=['GLIMSId', 'BgnDate', 'EndDate', 'O1Region', 'O2Region', 'Zmin', 'Zmax', 'Form', 'Surging', \n",
" 'Linkages', 'TermType', 'Area', 'Zmed', 'Slope', 'Name', 'Lmax', 'Status', 'Aspect', 'Connect',\n",
" 'GlacierType', 'TerminusType', 'IsTidewater'])\n",
"df['cesm_lat'] = pd.Series(index=df.index)\n",
"df['cesm_lon'] = pd.Series(index=df.index)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here the climate dataset of intrest is being opened."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"cesm = '~/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.TREFHT.085001-200512.nc'\n",
"dsd = xr.open_dataset(cesm)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here just the first time step of the file is being selected to make the proccess faster. (For now we're only intrested in the coordinates of the data.)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dsd = dsd.TREFHT[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This is a loop over all the glaciers of intrest and selects the nearest coordinate in the climate data set. There might be a\n",
"difference in the longitude values being used between the datasets (-180 to 180 vs 0 to 360). Keep in mind that \n",
"that you might need to correct for a such a difference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"for gl in np.arange(len(df)):\n",
" lat = df['CenLat'][gl]\n",
" lon = df['CenLon'][gl] + 360\n",
" cesm_lat = dsd.sel(lat=lat, lon=lon, method='nearest').lat.values\n",
" cesm_lon = dsd.sel(lat=lat, lon=lon, method='nearest').lon.values\n",
" df['cesm_lat'][gl] = cesm_lat\n",
" df['cesm_lon'][gl] = cesm_lon"
]
},
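{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (a suggested addition, not part of the original workflow): the longitude range of the climate file tells you which convention it uses, and a modulo operation maps between the two conventions."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# which longitude convention does the climate file use?\n",
"print(float(dsd.lon.min()), float(dsd.lon.max()))  # e.g. 0.0, 357.5 -> 0 to 360 convention\n",
"\n",
"# mapping between the two conventions (the example value -70.25 is arbitrary):\n",
"lon_180 = -70.25\n",
"lon_360 = lon_180 % 360                  # -> 289.75 (0 to 360 convention)\n",
"lon_back = (lon_360 + 180) % 360 - 180   # -> -70.25 (back to -180 to 180)\n",
"print(lon_360, lon_back)"
]
},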
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It might be usefull to save the table for later."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df.to_hdf('look_up_table.hdf', key='df')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"However for we don´t need the full table. Especially when having a climate file with a course resolution, \n",
"there can like in this case be many duplicates. Therefore the duplicates are being removed."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df_list = df.drop_duplicates(subset=['cesm_lat', 'cesm_lon'], keep='first')\n",
"df_list = df_list.dropna()"
]
},
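{
"cell_type": "markdown",
"metadata": {},
"source": [
"How large the reduction is can be checked directly (a small illustrative check, not in the original workflow):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# with a coarse grid, many glaciers share a grid point, so df_list is much shorter\n",
"print(len(df), 'glaciers ->', len(df_list), 'unique climate grid points')"
]
},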
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here the climate file of intrest is being opened again. For each coordinate of intrest one \n",
"file is being generated and saved with the round coordinate in it file name."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"dsd = xr.open_dataset(cesm)\n",
"\n",
"for ki in np.arange(len(df_list)):\n",
" ds = dsd.sel(lat=df_list.iloc[ki].cesm_lat, lon=df_list.iloc[ki].cesm_lon, method='nearest')\n",
" ds.to_netcdf(path='temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.' + \n",
" str(round(df_list.iloc[ki].cesm_lat)) + '_' \n",
" + str(round(df_list.iloc[ki].cesm_lon)) + '.nc', mode='w')"
]
},
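{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a check (a suggested addition, not part of the original workflow), one of the newly created files can be opened again; it should contain the full time series for a single grid point."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# open the file belonging to the first entry of the look-up table\n",
"check_lat = str(int(round(df_list.iloc[0].cesm_lat)))\n",
"check_lon = str(int(round(df_list.iloc[0].cesm_lon)))\n",
"xr.open_dataset('temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.' +\n",
"                check_lat + '_' + check_lon + '.nc')"
]
},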
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next step would be to create or look up a function that prepares your data so it can be fed for each glacier \n",
"of interrest to for instance the process_gcm_data. Examples of functions that do so are process_cesm_data and \n",
"process_cmip5_data. However those functions select that for you from a large file. That part of these functions \n",
"you would need to replace/ adjust. Here the previously saved look-up table could be handy. This context the \n",
"following lines of code could be usefull."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"lookupt = pd.read_hdf(look_up_table) # open the lookup table \n",
"cesm_lon = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lon))) # select coordinate as in the title of the file\n",
"cesm_lat = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lat)))\n",
"# fill the gaps in the title of the file e.g.:\n",
"# fpath_temp = 'temp_files/b.e11.BLMTRC5CN.f19_g16.001.cam.h0.temp.085001-200512.{}_{}.nc'\n",
"precpds = xr.open_dataset(fpath_precp.format(cesm_lat, cesm_lon))\n",
"tempds = xr.open_dataset(fpath_temp.format(cesm_lat, cesm_lon)) "
]
}
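,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of such a wrapper (an illustrative suggestion, not an existing OGGM function; the function name, the template arguments and the use of gdir here are assumptions):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def open_climate_for_glacier(gdir, lookup_path, fpath_temp, fpath_precp):\n",
"    # gdir: an OGGM glacier directory; lookup_path: the table saved above;\n",
"    # fpath_temp / fpath_precp: file name templates with two {} placeholders\n",
"    lookupt = pd.read_hdf(lookup_path)\n",
"    cesm_lat = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lat)))\n",
"    cesm_lon = str(int(round(lookupt.loc[str(gdir.rgi_id)].cesm_lon)))\n",
"    tempds = xr.open_dataset(fpath_temp.format(cesm_lat, cesm_lon))\n",
"    precpds = xr.open_dataset(fpath_precp.format(cesm_lat, cesm_lon))\n",
"    return tempds, precpds"
]
}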
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.10"
}
},
"nbformat": 4,
"nbformat_minor": 2
}