# **Spatial Data Extraction and Climate Data Averaging**

This Jupyter Notebook processes spatial data for Indian districts using a shapefile and integrates it with climate data from the NASA POWER API. It has two main sections:

## **1. Extracting District Centroids from a Shapefile**
- **Purpose**: To extract the geographic centroids (latitude and longitude) of all districts in India.
- **Data Source**: [India District Level Shapefile 2022](https://www.kaggle.com/datasets/ankitgaikar1995/india-district-level-shape-file-2022).[Kaggle source to identify lat and long values of different districts]
- **Steps**:
  1. **Loading Shapefile**: 
     - Uses `GeoPandas` to load the district boundary shapefile (`DISTRICT_BOUNDARY.shp`).
  2. **Reprojecting CRS**:
     - If the shapefile's coordinate reference system (CRS) is not WGS84 (`EPSG:4326`), it is converted to WGS84 for compatibility with the NASA POWER API.
  3. **Extracting Centroids**:
     - Calculates the geometric center of each district as its latitude (`lat`) and longitude (`lon`).
     - A dictionary is created with district names as keys and their centroids as values.
  4. **Saving Results**:
     - The dictionary is saved to a JSON file (`districts_lat_lon.json`) for later use.

- **Key Outputs**:
  - A Python dictionary:
    ```python
    {
        "Ahmedabad": {"lat": 23.0225, "lon": 72.5714},
        "Bangalore": {"lat": 12.9716, "lon": 77.5946},
        ...
    }
    ```
  - A JSON file containing the districts and their centroids.


In [1]:
import geopandas as gpd

# Path to the shapefile
shapefile_path = "shape_file_folder/DISTRICT_BOUNDARY.shp"

# Load the shapefile
gdf = gpd.read_file(shapefile_path)

# Check the current CRS
print(f"Current CRS: {gdf.crs}")

# Reproject to WGS84 (if not already in WGS84)
if gdf.crs != "EPSG:4326":
    gdf = gdf.to_crs("EPSG:4326")

# Extract district names and their centroids
districts = {
    row["District"]: {"lat": row.geometry.centroid.y, "lon": row.geometry.centroid.x}
    for _, row in gdf.iterrows()
}

# Print the updated dictionary
print(districts)

# Optional: Save to JSON
import json
with open("districts_lat_lon.json", "w") as f:
    json.dump(districts, f, indent=4)


Current CRS: PROJCS["LCC_WGS84",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0],UNIT["Degree",0.0174532925199433]],PROJECTION["Lambert_Conformal_Conic_2SP"],PARAMETER["latitude_of_origin",24],PARAMETER["central_meridian",80],PARAMETER["standard_parallel_1",12.472944],PARAMETER["standard_parallel_2",35.172806],PARAMETER["false_easting",4000000],PARAMETER["false_northing",4000000],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH]]
{'MORBI': {'lat': 22.83496706239543, 'lon': 70.90350528859985}, 'AHMAD>B>D': {'lat': 22.756915983462388, 'lon': 72.24363534665498}, '>NAND': {'lat': 22.426787938645756, 'lon': 72.7508601459052}, 'DEVBHUMI DW>RKA': {'lat': 22.12391478095626, 'lon': 69.46035913730104}, 'J>MNAGAR': {'lat': 22.284991538930687, 'lon': 70.19515170389606}, 'KACHCHH': {'lat': 23.649742756240617, 'lon': 69.93992725525901}, 'BH>VNAGAR': {'lat': 21.56725450

## **2. Fetching and Averaging Climate Data from NASA POWER API**
- **Purpose**: To fetch daily weather data for 8 climate parameters for each district and compute their averages over a specified time period (2020–2024).
- **Data Source**: [NASA POWER API](https://power.larc.nasa.gov/).
- **Steps**:
  1. **Loading District Data**:
     - Uses the `districts` dictionary created in the first cell.
  2. **Defining API Parameters**:
     - Fetches daily data for:
       - Temperature (T2M, T2M_MIN, T2M_MAX).
       - Wind Speed (WS2M).
       - Precipitation (PRECTOTCORR).
       - Relative Humidity (RH2M).
       - Solar Irradiance (ALLSKY_SFC_SW_DWN).
       - Surface Pressure (PS).
  3. **Fetching Data**:
     - Sends requests to the NASA POWER API for each parameter and district.
     - Parses the API response and calculates the average value for each parameter.
  4. **Saving Results**:
     - Saves the aggregated data for all districts, including their centroids, as a CSV file (`averaged_weather_data_with_lat_lon.csv`).

- **Key Outputs**:
  - A CSV file containing averaged climate data and geographic information:
    ```
    District,lat,lon,T2M,WS2M,PRECTOTCORR,RH2M,ALLSKY_SFC_SW_DWN,T2M_MIN,T2M_MAX,PS
    Delhi,28.6139,77.2090,25.6,2.3,5.4,65.2,6.8,22.1,28.7,1012
    Mumbai,19.0760,72.8777,27.8,3.1,4.2,70.5,7.3,24.6,30.2,1010
    ```

In [6]:
import requests
import pandas as pd
from io import StringIO
import os

# Define the districts and coordinates
districts = {
    "Delhi": {"lat": 28.6139, "lon": 77.2090},
    "Mumbai": {"lat": 19.0760, "lon": 72.8777},
    # Add other districts here
}

# Date range
start_date = "20200101"
end_date = "20231231"

# Parameters
parameters = ["YEAR", "DOY", "T2M", "WS2M", "PRECTOTCORR", "RH2M", "ALLSKY_SFC_SW_DWN", "T2M_MIN", "T2M_MAX", "PS"]

# NASA POWER API URL template
url_template = ("https://power.larc.nasa.gov/api/temporal/daily/point?"
                "parameters={param}&community=AG&longitude={lon}&latitude={lat}"
                "&start={start}&end={end}&format=CSV")

# Output directory
output_dir = "weather_data"
os.makedirs(output_dir, exist_ok=True)

# Loop through districts and fetch data
averaged_data = []

for district, coords in districts.items():
    print(f"Processing {district}...")
    district_averages = {
        "District": district,
        "lat": coords["lat"],  # Add latitude
        "lon": coords["lon"]   # Add longitude
    }
    
    for param in parameters:
        url = url_template.format(param=param, lon=coords["lon"], lat=coords["lat"], start=start_date, end=end_date)
        response = requests.get(url)
        
        if response.status_code == 200:
            # Parse data and compute averages
            data = pd.read_csv(StringIO(response.text), skiprows=9)
            data.columns = [col.strip() for col in data.columns]
            avg_value = data[data.columns[2]].mean()  # Compute average for the parameter
            district_averages[param] = avg_value
        else:
            print(f"Failed to fetch {param} for {district}. Status: {response.status_code}")
    
    averaged_data.append(district_averages)

# Save averaged data for all districts
averaged_df = pd.DataFrame(averaged_data)
averaged_csv_path = os.path.join(output_dir, "averaged_weather_data_with_lat_lon_1.csv")
averaged_df.to_csv(averaged_csv_path, index=False)

print(f"Averaged data saved at {averaged_csv_path}")


Processing Delhi...
Failed to fetch YEAR for Delhi. Status: 422
Failed to fetch DOY for Delhi. Status: 422
Processing Mumbai...
Failed to fetch YEAR for Mumbai. Status: 422
Failed to fetch DOY for Mumbai. Status: 422
Averaged data saved at weather_data/averaged_weather_data_with_lat_lon_1.csv


## **Resources Used**
1. **Shapefile for Indian Districts**:
   - **Source**: [India District Level Shapefile 2022 on Kaggle](https://www.kaggle.com/datasets/ankitgaikar1995/india-district-level-shape-file-2022).
   - **Purpose**: Provides district boundaries and enables centroid calculation.
2. **NASA POWER API**:
   - **Source**: [NASA POWER](https://power.larc.nasa.gov/).
   - **Purpose**: Provides weather data for specified geographic locations.
3. **Python Libraries**:
   - `GeoPandas`: For spatial data manipulation and CRS conversion.
   - `requests`: To interact with the NASA POWER API.
   - `pandas`: For data manipulation and saving results to CSV.

## **How to Use the Notebook**
1. **Prerequisites**:
   - Install required Python libraries:
     ```bash
     pip install geopandas pandas requests
     ```
   - Download and extract the shapefile dataset into the `shape_file_folder/` directory.
2. **Run the Notebook**:
   - Execute the first cell to extract district centroids and save them as JSON.
   - Execute the second cell to fetch and process weather data, saving results to a CSV.
3. **Verify Results**:
   - Check the JSON and CSV outputs for correctness.


In [3]:
import pandas as pd
# Load the averaged weather data with latitude and longitude
data_path = 'weather_data/averaged_weather_data_with_lat_lon_1.csv'
weather_data = pd.read_csv(data_path)

# Display the first few rows of the dataframe
weather_data.head()

Unnamed: 0,District,lat,lon,T2M,WS2M,PRECTOTCORR,RH2M,ALLSKY_SFC_SW_DWN,T2M_MIN,T2M_MAX,PS
0,Delhi,28.6139,77.209,25.015168,1.880075,2.295168,52.825243,17.0359,18.973402,32.100287,98.428234
1,Mumbai,19.076,72.8777,26.779151,2.447611,8.073073,70.225072,18.379843,22.48883,32.40013,99.771916


In [5]:
weather_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 11 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   District           2 non-null      object 
 1   lat                2 non-null      float64
 2   lon                2 non-null      float64
 3   T2M                2 non-null      float64
 4   WS2M               2 non-null      float64
 5   PRECTOTCORR        2 non-null      float64
 6   RH2M               2 non-null      float64
 7   ALLSKY_SFC_SW_DWN  2 non-null      float64
 8   T2M_MIN            2 non-null      float64
 9   T2M_MAX            2 non-null      float64
 10  PS                 2 non-null      float64
dtypes: float64(10), object(1)
memory usage: 304.0+ bytes
