## Background 
The climate crisis affects every location differently, and at different times of the year. In India, rainfall patterns have changed considerably over the last few decades: some parts of the country have become drier, others have become wetter, and the occurrence of rainfall events has shifted.
As an urbanist, I am interested in how rainfall has changed over the years in any city where I am planning future infrastructure works. This notebook does exactly that. It helps anyone who is curious to download gridded rainfall data from 1901 up to 2023 for any particular location and analyse it to see how the rainfall has changed.

## Methodology
IMD Pune has published gridded rainfall data for India since 1901, on a grid of 0.25 degrees latitude by 0.25 degrees longitude, at https://www.imdpune.gov.in/cmpg/Griddata/Rainfall_25_NetCDF.html. The data is in NetCDF format and comes as a separate file for each year, not as one continuous time series since 1901. It is very comprehensive and open for download, but the files are heavy, and fetching the file for every year takes effort and time. Data for the entire country is also more than a decision maker needs when deciding about their own city or study location. So the notebook automates the download, extraction, compilation and cleaning of the entire record since 1901 for one or more locations into a single data file for future use. Because it downloads and extracts one year at a time and deletes each original file afterwards, it frees up computing resources and can be used by any casual researcher with modest resources. What this notebook does:

1. Opens the website.
2. Selects the year for which the all-India data is to be downloaded.
3. Downloads the data for that year, then waits and checks for the download to complete.
4. Arithmetically interpolates the data for the selected location from the available grid points (*see Errors or approximations below). The example here uses the four main metros of India, i.e. Delhi, Kolkata, Mumbai and Chennai, but you can replace the latitude and longitude with any location or locations that you are interested in.
5. Once the download for a year is complete, extracts the data for the selected location and then deletes the original downloaded file to save computing resources.
6. Automatically selects the file for the next year and repeats the process until the entire time series is available in separate year-wise files for only the selected locations.
7. At the end, combines the files for all the years into one single time series of rainfall since the last century.
That's it, you have done it: you now have a time series to analyse for any given location in India. For examples of how to perform the analysis and understand the change in rainfall over the last century, please visit my GitHub or Kaggle page.
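
The final combining step can be sketched as below. This is only a minimal sketch under stated assumptions, not necessarily the exact code behind this notebook: the function name `combine_yearly_csvs` and the output file name are hypothetical, and it assumes the yearly files share the same header and differ only by the four-digit year in their name, so that sorting the names also sorts them by year. It copies rows one file at a time, so the whole century is never held in memory at once.

```python
import csv
import glob
import os


def combine_yearly_csvs(folder, pattern, output_name):
    """Append every CSV in `folder` matching `pattern` into one output file.

    Returns the number of data rows written (the header is not counted).
    """
    output_path = os.path.join(folder, output_name)
    # Sorting by name also sorts by year when only the 4-digit year differs.
    # The output file itself is skipped in case it matches the pattern too.
    paths = sorted(
        p for p in glob.glob(os.path.join(folder, pattern))
        if os.path.abspath(p) != os.path.abspath(output_path)
    )

    rows_written = 0
    header_written = False
    with open(output_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        for path in paths:
            with open(path, newline="") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)  # keep the header only once
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

For example, `combine_yearly_csvs(r'C:\Users\deqin\Downloads', 'interpolated_rainfall_timeseries_Indian_Metros_*.csv', 'interpolated_rainfall_timeseries_Indian_Metros_combined.csv')` would write the combined file next to the yearly ones; replace the folder with your own file directory. Rows stay grouped by year and, within each year, by location, so sort on the Latitude and Longitude columns afterwards if you want one block per city.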

## Caution
1. The download process consumes a fair amount of data; on average each location consumes about 6 GB. The code makes sure this space is not used up on your hard disk: it downloads each file, extracts and stores only the relevant data, and deletes the original file. Still, please be aware of the large download volume and make sure you are using an internet connection with high bandwidth and low or no data costs :)
2. Working through the files one at a time also frees up computer resources such as the processor, but handling this much data is still time- and resource-heavy on your system.

## Errors or approximations
1. The original data is available on a latitude-longitude grid of 0.25 x 0.25 degrees, not for any particular location. Once you have selected your location of interest, the code interpolates the rainfall for this location from the data of the four nearest available grid points. The margin for error may not be very large, as the nearest grid point is never more than 0.125 degrees (roughly 14 km) away along either axis. While this is a reasonable method, it comes with some caveats. E.g. it does not account for large differences in topography, if any, within that distance of the selected location. But as the value is a distance-weighted average of the four nearest points, this error is reduced.
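
To make the approximation concrete, here is a minimal sketch of bilinear interpolation inside one grid cell. The function name `bilinear_interpolate` and the corner rainfall values are hypothetical, and the corner coordinates are assumed to be the 0.25-degree grid nodes around Delhi. It uses plain Python, so it does not depend on `scipy.interpolate.interp2d`, which was deprecated in SciPy 1.10 and removed in SciPy 1.14.

```python
def bilinear_interpolate(lat, lon, lat1, lat2, lon1, lon2, values):
    """Bilinearly interpolate inside one grid cell.

    values = [[v(lat1, lon1), v(lat1, lon2)],
              [v(lat2, lon1), v(lat2, lon2)]]
    """
    # Fractional position of the target inside the cell (0..1 on each axis)
    ty = (lat - lat1) / (lat2 - lat1)
    tx = (lon - lon1) / (lon2 - lon1)

    # Interpolate along longitude on the lower and upper edges of the cell ...
    lower = values[0][0] * (1 - tx) + values[0][1] * tx
    upper = values[1][0] * (1 - tx) + values[1][1] * tx
    # ... then along latitude between the two edges
    return lower * (1 - ty) + upper * ty


# Hypothetical rainfall values (mm) at the four corners around Delhi (28.7 N, 77.1 E)
corner_rain = [[2.0, 4.0],
               [6.0, 8.0]]
delhi_rain = bilinear_interpolate(28.7, 77.1, 28.5, 28.75, 77.0, 77.25, corner_rain)
print(round(delhi_rain, 2))  # prints 6.0
```

The target sits 40% of the way across the cell in longitude and 80% of the way up in latitude, so the two northern corners carry most of the weight. Corners closer to the location count more than distant ones, which is why a sharp local feature such as a hill range between grid points can still be smoothed away.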

## Implementation Notes
1. We need the Selenium WebDriver for your particular web browser, which differs from browser to browser. Please modify the code for your browser and operating system of choice. For ease of implementation and debugging, I have kept the installation and import of the necessary libraries separate and combined the main code in one section.
2. This notebook uses the combination of Windows and the Edge browser.
3. For the Mac and Safari combination, Safari already comes with a WebDriver installed, but it needs to be enabled: open Safari, go to Preferences > Advanced and check "Show Develop menu in menu bar". Then, in the Develop menu, check "Allow Remote Automation". Of course, please ping me if you encounter any problems.
4. As internet speed varies, the code is built with a delay: it checks for the downloaded file and sleeps for 3 minutes between checks, again to save resources. If you have a super fast connection and computer, you could just delete the delay section.
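
If you would rather tune the delay than delete it, the waiting logic can be pulled into a small helper. This is a sketch only: the name `wait_for_file` and its parameters are hypothetical. It assumes, as the notebook does, that the file appears under its final name once the download has finished, and it adds an upper time limit so a failed download cannot keep the loop running forever.

```python
import os
import time


def wait_for_file(path, poll_seconds=180, timeout_seconds=3600):
    """Poll until `path` exists.

    Returns True as soon as the file is found, or False if it has not
    appeared within `timeout_seconds`.
    """
    deadline = time.monotonic() + timeout_seconds
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False  # give up instead of waiting forever
        time.sleep(poll_seconds)  # wait before checking again
    return True
```

With a fast connection you could call it as `wait_for_file(file_path, poll_seconds=10)`, and on a slow one raise `timeout_seconds` instead.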
   
Thank you, happy coding :)

In [7]:
%pip install webdriver_manager

Collecting webdriver_manager
  Downloading webdriver_manager-4.0.2-py2.py3-none-any.whl (27 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.0.1-py3-none-any.whl (19 kB)
Installing collected packages: python-dotenv, webdriver_manager
Successfully installed python-dotenv-1.0.1 webdriver_manager-4.0.2
Note: you may need to restart the kernel to use updated packages.


In [10]:
from selenium import webdriver
from selenium.webdriver.edge.service import Service as EdgeService
from webdriver_manager.microsoft import EdgeChromiumDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
import numpy as np
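# interp2d interpolates the value at specific points that lie between the nodes of the 0.25x0.25 grid given by IMD.
# Note: interp2d was deprecated in SciPy 1.10 and removed in SciPy 1.14, so this import needs an older SciPy.
from scipy.interpolate import interp2d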
import netCDF4
from netCDF4 import Dataset, num2date
import pandas as pd
import os
import time as ti
import ctypes

service = EdgeService(EdgeChromiumDriverManager().install())

# Initialize WebDriver for Microsoft Edge
driver = webdriver.Edge(service=service)

In [11]:
# Prevent the computer from going to sleep while the process is running
# (Windows only: 0x80000002 is ES_CONTINUOUS | ES_DISPLAY_REQUIRED)
ctypes.windll.kernel32.SetThreadExecutionState(0x80000002)

try:
    driver.get('https://www.imdpune.gov.in/cmpg/Griddata/Rainfall_25_NetCDF.html')
    # Years to download; this run covers 1951-2023 (start the range at 1901 for the full record)
    for i in range(1951, 2024):
        # Interact with the date picker (adjust selectors as needed)
        date_picker = driver.find_element(By.ID, 'RF25')  # This picks the year that we want the data for
        date_picker.click()

        date = driver.find_element(By.XPATH, f"//*[@value='{i}']")  # We use the XPath to click on the option for year i
        date.click()

        download_button = driver.find_element(By.XPATH, "/html/body/div/div/div[2]/div/form/input")  # Example XPath
        download_button.click()

        ti.sleep(100)  # Give the download a head start before the first check

        file_ready = False
        file_path = fr'C:\Users\deqin\Downloads\RF25_ind{i}_rfp25.nc'  # Replace with your file directory

        while not file_ready:
            if os.path.exists(file_path):
                file_ready = True
                print(f'File RF25_ind{i}_rfp25.nc is ready')
            else:
                print(f'Your internet download speed is slower than you estimated. We will check again in 3 minutes if file RF25_ind{i}_rfp25.nc is ready')
                ti.sleep(180)  # Wait for 3 minutes before checking again

        nc_file = netCDF4.Dataset(file_path)  # file_path is set above; use your own file directory there
        # Extract the latitude, longitude, time, and rainfall variables
        latitude = nc_file.variables['LATITUDE'][:]
        longitude = nc_file.variables['LONGITUDE'][:]
        time = nc_file.variables['TIME'][:]
        rainfall = nc_file.variables['RAINFALL'][:]

        # Convert time to a readable format
        time_units = nc_file.variables['TIME'].units
        time_values = num2date(time, units=time_units)

        # Define the target latitudes and longitudes; in this case we take Delhi, Mumbai, Kolkata and Chennai
        locations = [
            {'lat': 28.7, 'lon': 77.1},
            {'lat': 19.07, 'lon': 72.8},
            {'lat': 22.34, 'lon': 88.22},
            {'lat': 13.08, 'lon': 80.27}
        ]

        # Initialize a list to store DataFrames for each location
        dfs = []

        # Iterate over each location
        for location in locations:
            target_lat = location['lat']
            target_lon = location['lon']

            # Find the index of the grid point just below the target on each axis,
            # i.e. the lower corner of the grid cell that contains the target point
            lat_idx = np.searchsorted(latitude, target_lat) - 1
            lon_idx = np.searchsorted(longitude, target_lon) - 1

            # Ensure indices are within bounds
            lat_idx = np.clip(lat_idx, 0, len(latitude) - 2)
            lon_idx = np.clip(lon_idx, 0, len(longitude) - 2)

            # Extract the relevant grid points for interpolation
            lat1, lat2 = latitude[lat_idx], latitude[lat_idx + 1]
            lon1, lon2 = longitude[lon_idx], longitude[lon_idx + 1]

            # Initialize an array to store interpolated rainfall values
            interpolated_rainfall = []

            # Interpolate the rainfall for each time step, as the data is only available on the 0.25x0.25 grid.
            # The value for our specific latitude and longitude comes from bilinear interpolation of the four surrounding grid points.
            for t_idx in range(len(time_values)):
                rainfall_subset = rainfall[t_idx, lat_idx:lat_idx + 2, lon_idx:lon_idx + 2]
                interp_func = interp2d([lon1, lon2], [lat1, lat2], rainfall_subset)
                interpolated_value = interp_func(target_lon, target_lat)[0]
                interpolated_rainfall.append(interpolated_value)

            # Create a DataFrame for the current location
            df = pd.DataFrame({
                'Time': time_values,
                'Latitude': [target_lat] * len(time_values),
                'Longitude': [target_lon] * len(time_values),
                'Interpolated Rainfall': interpolated_rainfall
            })

            # Append the DataFrame to the list
            dfs.append(df)

        # Concatenate all DataFrames into a single DataFrame
        final_df = pd.concat(dfs, ignore_index=True)

        # Save to CSV or view the DataFrame
        final_df.to_csv(fr'C:\Users\deqin\Downloads\interpolated_rainfall_timeseries_Indian_Metros_{i}.csv', index=False)
        print(final_df.head())

        # Close the NetCDF file, then delete the downloaded original to free disk space
        nc_file.close()
        os.remove(file_path)

finally:
    # Restore sleep settings
    ctypes.windll.kernel32.SetThreadExecutionState(0x80000000)

Your internet download speed is slower than you estimated. We will check again in 3 minutes if file RF25_ind1951_rfp25.nc is ready
File RF25_ind1951_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1951-01-01 00:00:00      28.7       77.1                    0.0
1  1951-01-02 00:00:00      28.7       77.1                    0.0
2  1951-01-03 00:00:00      28.7       77.1                    0.0
3  1951-01-04 00:00:00      28.7       77.1                    0.0
4  1951-01-05 00:00:00      28.7       77.1                    0.0
Your internet download speed is slower than you estimated. We will check again in 3 minutes if file RF25_ind1952_rfp25.nc is ready
File RF25_ind1952_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1952-01-01 00:00:00      28.7       77.1               0.000000
1  1952-01-02 00:00:00      28.7       77.1               0.064066
2  1952-01-03 00:00:00      28.7       77.1               0.000

File RF25_ind1970_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1970-01-01 00:00:00      28.7       77.1                    0.0
1  1970-01-02 00:00:00      28.7       77.1                    0.0
2  1970-01-03 00:00:00      28.7       77.1                    0.0
3  1970-01-04 00:00:00      28.7       77.1                    0.0
4  1970-01-05 00:00:00      28.7       77.1                    0.0
File RF25_ind1971_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1971-01-01 00:00:00      28.7       77.1                    0.0
1  1971-01-02 00:00:00      28.7       77.1                    0.0
2  1971-01-03 00:00:00      28.7       77.1                    0.0
3  1971-01-04 00:00:00      28.7       77.1                    0.0
4  1971-01-05 00:00:00      28.7       77.1                    0.0
File RF25_ind1972_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1972-01-01 00:00:0

File RF25_ind1989_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1989-01-01 00:00:00      28.7       77.1               0.000000
1  1989-01-02 00:00:00      28.7       77.1               0.838863
2  1989-01-03 00:00:00      28.7       77.1               0.130343
3  1989-01-04 00:00:00      28.7       77.1               0.000000
4  1989-01-05 00:00:00      28.7       77.1               0.000000
File RF25_ind1990_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1990-01-01 00:00:00      28.7       77.1                    0.0
1  1990-01-02 00:00:00      28.7       77.1                    0.0
2  1990-01-03 00:00:00      28.7       77.1                    0.0
3  1990-01-04 00:00:00      28.7       77.1                    0.0
4  1990-01-05 00:00:00      28.7       77.1                    0.0
File RF25_ind1991_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  1991-01-01 00:00:0

File RF25_ind2008_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  2008-01-01 00:00:00      28.7       77.1                    0.0
1  2008-01-02 00:00:00      28.7       77.1                    0.0
2  2008-01-03 00:00:00      28.7       77.1                    0.0
3  2008-01-04 00:00:00      28.7       77.1                    0.0
4  2008-01-05 00:00:00      28.7       77.1                    0.0
File RF25_ind2009_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  2009-01-01 00:00:00      28.7       77.1                    0.0
1  2009-01-02 00:00:00      28.7       77.1                    0.0
2  2009-01-03 00:00:00      28.7       77.1                    0.0
3  2009-01-04 00:00:00      28.7       77.1                    0.0
4  2009-01-05 00:00:00      28.7       77.1                    0.0
File RF25_ind2010_rfp25.nc is ready
                  Time  Latitude  Longitude  Interpolated Rainfall
0  2010-01-01 00:00:0