### Lapse rate Equation

$$ T_{cal} = T_{obs} + (H_{Elevation} - L_{Elevation}) * (-0.0065) $$

**Where,** 

$ T_{cal} $ = High elevation calculating temperature\
$ T_{obs} $ = Low elevation observed temperature\
$ H_{Elevation} $ = High elevation (calculating temperature station's elevation)\
$ L_{Elevation} $ = Low elevation (observed temperature station's elevation)\




In [1]:
# Import libraries
import pandas as pd
import numpy as np

In [2]:
def calculate_temp(row, temp_column, df):
    if pd.isna(row[temp_column]):  # If the temperature (e.g., Tmin) is NaN
        # Filter the data for the same date and valid temperature values
        same_date_data = df[(df['date'] == row['date']) & ~pd.isna(df[temp_column])]
        
        if not same_date_data.empty:
            # Calculate the average temperature of available stations
            avg_temp = same_date_data[temp_column].mean()
            
            # Calculate the average elevation difference
            avg_elevation_diff = (same_date_data['elevation'] - row['elevation']).mean()
            
            # Apply the lapse rate adjustment using the average elevation difference
            lapse_rate = -6.5 / 1000  # Lapse rate in °C per meter
            adjusted_temp = avg_temp + avg_elevation_diff * lapse_rate
            
            return adjusted_temp  # Return the adjusted temperature
        
    return row[temp_column]  # If temperature is not NaN, return the original value

In [37]:
for_nan_fill_df = pd.read_csv(r'G:\fresh_start\paper\code_paper\main_data\raw_data\df33333333_Nan.csv')

for_nan_fill_df['date'] = pd.to_datetime(for_nan_fill_df['date'])
print(for_nan_fill_df)

             date  station        regions        lat       long  elevation  \
0      1962-01-01     1316          Tarai  26.820440  87.159170      105.0   
1      1962-01-02     1316          Tarai  26.820440  87.159170      105.0   
2      1962-01-03     1316          Tarai  26.820440  87.159170      105.0   
3      1962-01-04     1316          Tarai  26.820440  87.159170      105.0   
4      1962-01-05     1316          Tarai  26.820440  87.159170      105.0   
...           ...      ...            ...        ...        ...        ...   
512534 2022-12-27     9999  High Mountain  27.961111  86.808889     5200.0   
512535 2022-12-28     9999  High Mountain  27.961111  86.808889     5200.0   
512536 2022-12-29     9999  High Mountain  27.961111  86.808889     5200.0   
512537 2022-12-30     9999  High Mountain  27.961111  86.808889     5200.0   
512538 2022-12-31     9999  High Mountain  27.961111  86.808889     5200.0   

             Tmin      Tmax  
0             NaN       NaN  
1  

In [39]:
start_date = '1992-01-01'
end_date = '2022-12-31'

# Filter the DataFrame to include only rows between start_date and end_date
filtered_df_1992 = for_nan_fill_df[(for_nan_fill_df['date'] >= start_date) & (for_nan_fill_df['date'] <= end_date)]

# Display the first few rows of the filtered DataFrame to verify
print(filtered_df_1992)

             date  station        regions        lat       long  elevation  \
10957  1992-01-01     1316          Tarai  26.820440  87.159170      105.0   
10958  1992-01-02     1316          Tarai  26.820440  87.159170      105.0   
10959  1992-01-03     1316          Tarai  26.820440  87.159170      105.0   
10960  1992-01-04     1316          Tarai  26.820440  87.159170      105.0   
10961  1992-01-05     1316          Tarai  26.820440  87.159170      105.0   
...           ...      ...            ...        ...        ...        ...   
512534 2022-12-27     9999  High Mountain  27.961111  86.808889     5200.0   
512535 2022-12-28     9999  High Mountain  27.961111  86.808889     5200.0   
512536 2022-12-29     9999  High Mountain  27.961111  86.808889     5200.0   
512537 2022-12-30     9999  High Mountain  27.961111  86.808889     5200.0   
512538 2022-12-31     9999  High Mountain  27.961111  86.808889     5200.0   

             Tmin      Tmax  
10957         NaN       NaN  
109

In [32]:
#filtered_df_1992.to_csv(r'G:\fresh_start\paper\code_paper\main_data\raw_data\filtered_df_1992_not_filled.csv')

In [40]:
filtered_df_1992['date'] = pd.to_datetime(filtered_df_1992['date'])

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_df_1992['date'] = pd.to_datetime(filtered_df_1992['date'])


In [41]:

filtered_df_1992['Tmin'] = filtered_df_1992.apply(lambda row: calculate_temp(row, 'Tmin', filtered_df_1992), axis=1)
filtered_df_1992['Tmax'] = filtered_df_1992.apply(lambda row: calculate_temp(row, 'Tmax', filtered_df_1992), axis=1)
print(filtered_df_1992)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_df_1992['Tmin'] = filtered_df_1992.apply(lambda row: calculate_temp(row, 'Tmin', filtered_df_1992), axis=1)


             date  station        regions        lat       long  elevation  \
10957  1992-01-01     1316          Tarai  26.820440  87.159170      105.0   
10958  1992-01-02     1316          Tarai  26.820440  87.159170      105.0   
10959  1992-01-03     1316          Tarai  26.820440  87.159170      105.0   
10960  1992-01-04     1316          Tarai  26.820440  87.159170      105.0   
10961  1992-01-05     1316          Tarai  26.820440  87.159170      105.0   
...           ...      ...            ...        ...        ...        ...   
512534 2022-12-27     9999  High Mountain  27.961111  86.808889     5200.0   
512535 2022-12-28     9999  High Mountain  27.961111  86.808889     5200.0   
512536 2022-12-29     9999  High Mountain  27.961111  86.808889     5200.0   
512537 2022-12-30     9999  High Mountain  27.961111  86.808889     5200.0   
512538 2022-12-31     9999  High Mountain  27.961111  86.808889     5200.0   

             Tmin       Tmax  
10957  -10.472466   0.043769  
1

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  filtered_df_1992['Tmax'] = filtered_df_1992.apply(lambda row: calculate_temp(row, 'Tmax', filtered_df_1992), axis=1)


In [35]:
filtered_df_1992.to_csv(r'G:\fresh_start\paper\code_paper\main_data\raw_data\filtered_df_1992_filled_lapse.csv', index=False)