# Weather generator

***
## Weather conversion from hourly to daily


## Function: read_hly()

This function reads a tab-separated, UTF-16-encoded file into a pandas DataFrame, cleans it up, and returns the DataFrame.

### Parameters

- `file` (str): Path to the file to read, a mesonet download at 5-minute resolution.

### Returns

- `pandas.DataFrame`: The processed DataFrame.

### Description

The function performs the following steps:

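- Reads the file, assuming it is tab-separated and encoded in 'utf-16'. Column names are taken from the second line (`header=1`), the last line is skipped (`skipfooter=1`), the first column is parsed as dates, and commas are treated as thousands separators.
- Drops the last two columns of the DataFrame.
- Sorts the DataFrame by the 'Time (LST)' column.
- Fills missing values by linear interpolation in the forward direction only, so gaps at the very start of the record stay empty.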
- Sets the 'Time (LST)' column as the index of the DataFrame.
- Prints the number and names of the columns in the DataFrame.
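
The sorting and interpolation steps can be tried on a small synthetic frame. This is only a sketch: the values are made up, and the index is set before interpolating (the reverse of the order listed above) so that only numeric columns are interpolated.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute records (made-up values), out of order and with gaps.
demo = pd.DataFrame({
    'Time (LST)': pd.to_datetime([
        '2024-04-14 15:45', '2024-04-14 15:35', '2024-04-14 15:40',
    ]),
    'Air Temp (°F)': [62.0, 60.0, np.nan],
    'Wind Speed (mph)': [6.0, np.nan, 4.0],
})

demo = demo.sort_values(by=['Time (LST)'])   # chronological order: 15:35, 15:40, 15:45
demo = demo.set_index('Time (LST)')
demo = demo.interpolate(method='linear', limit_direction='forward', axis=0)

print(demo)
```

The interior gap in 'Air Temp (°F)' is filled with the midpoint of its neighbours, while the leading gap in 'Wind Speed (mph)' stays NaN because interpolation runs forward only.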
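
```python
import pandas as pd

# Reads the raw mesonet export into a DataFrame; no aggregation happens here.
def read_hly(file):
    # header=1: column names are on the second line; skipfooter=1: skip the last line
    indata = pd.read_csv(file, encoding='utf-16', sep='\t', header=1, skipfooter=1,
                         parse_dates=[0], thousands=',', engine='python')
    indata = indata.drop(indata.columns[[-1, -2]], axis=1)  # drop the last two columns
    indata = indata.sort_values(by=['Time (LST)'])
    # fill gaps linearly, forward in time only
    indata = indata.interpolate(method='linear', limit_direction='forward', axis=0)
    indata = indata.set_index('Time (LST)')

    # report how many columns remain and what they are called
    headlist = indata.columns.values.tolist()
    print(len(headlist), headlist)

    return indata
```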

## Function: hly2dly()

This function takes a DataFrame of sub-daily weather data (hourly, or the 5-minute records returned by `read_hly`), resamples it to daily values, derives a few extra quantities, and returns a DataFrame of daily weather data.

### Usage

```python
hly2dly(indata)
```

### Parameters

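- `indata` (pandas.DataFrame): The input DataFrame containing sub-daily weather data. It must be indexed by timestamp (a `DatetimeIndex`, as returned by `read_hly`) for resampling to work.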

### Returns

- `pandas.DataFrame`: The processed DataFrame containing daily weather data.

### Description

The function performs the following steps:

1. Prints the information of the input DataFrame using `indata.info()`.
2. Resamples the 'Wind Speed (mph)', 'Precipitation (in)', 'Relative Humidity (%)', 'Air Temp (°F)', and 'Solar Radiation (W/m²)' columns to daily data using the mean for wind speed, relative humidity, air temperature, and solar radiation, and the sum for precipitation.
3. Renames the resampled columns.
4. Converts the precipitation from inches to millimeters and the temperature from Fahrenheit to Celsius.
5. Calculates the saturation vapor pressure using the formula provided in the `cal_svp` function.
6. Calculates the vapor pressure using the relative humidity and the saturation vapor pressure.
7. Concatenates all the calculated daily data into a new DataFrame.
8. Rounds the values in the new DataFrame to 3 decimal places.
9. Returns the new DataFrame of daily weather data.
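
The difference between averaging and summing during resampling can be seen on synthetic data. This is a minimal sketch with made-up values; only the column names and the 25.4 mm/in factor come from the function.

```python
import numpy as np
import pandas as pd

# Two days of synthetic 5-minute data: 288 records per day (made-up values).
idx = pd.date_range('2024-04-15 00:00', periods=2 * 288, freq='5min')
demo = pd.DataFrame({
    'Wind Speed (mph)': np.r_[np.full(288, 4.0), np.full(288, 8.0)],
    'Precipitation (in)': np.r_[np.full(288, 0.0), np.full(288, 0.01)],
}, index=idx)

# State variables such as wind speed are averaged over the day ...
ws_daily = demo['Wind Speed (mph)'].resample('D').mean()

# ... while precipitation is an accumulation, so it is summed, then converted to mm.
pcp_daily_mm = demo['Precipitation (in)'].resample('D').sum() * 25.4

print(ws_daily)
print(pcp_daily_mm.round(3))
```

Averaging precipitation instead of summing it would understate the daily total by a factor equal to the number of records per day, which is why the two kinds of columns are treated differently.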

The same steps, statement by statement:

1. `indata.info()`: Prints the information about the DataFrame `indata`.

2. `ws_avg_dly = indata['Wind Speed (mph)'].resample('D').mean()`: Resamples the 'Wind Speed (mph)' column to daily frequency by taking the mean of each day's data.

3. `pcpin_sum_dly = indata['Precipitation (in)'].resample('D').sum()`: Resamples the 'Precipitation (in)' column to daily frequency by taking the sum of each day's data. The following statement converts the total from inches to millimeters by multiplying by 25.4.

4. `rh_avg_dly = indata['Relative Humidity (%)'].resample('D').mean()`: Resamples the 'Relative Humidity (%)' column to daily frequency by taking the mean of each day's data.

5. `tf_avg_dly = indata['Air Temp (°F)'].resample('D').mean()`: Resamples the 'Air Temp (°F)' column to daily frequency by taking the mean of each day's data. The following statement converts the mean from Fahrenheit to Celsius as `(tf_avg_dly-32)*5/9`.

6. `sr_avg_dly = indata['Solar Radiation (W/m²)'].resample('D').mean()`: Resamples the 'Solar Radiation (W/m²)' column to daily frequency by taking the mean of each day's data.

7. `cal_svp(value)`: This nested function calculates the saturation vapor pressure in kPa from a temperature in Celsius. For temperatures at or below 0 °C it evaluates the formula at a fixed 0.5 °C instead.

8. `svp_dly = tc_avg_dly.apply(cal_svp)`: Applies the `cal_svp` function to the daily average temperature in Celsius to calculate the daily saturation vapor pressure.

9. `vp_dly = rh_avg_dly/100*svp_dly*10`: Calculates the daily vapor pressure from the daily average relative humidity and the daily saturation vapor pressure. The factor of 10 converts kPa to mb.

10. `dlydata = pd.concat([ws_avg_dly, pcpmm_sum_dly, sr_avg_dly, tc_avg_dly, rh_avg_dly, svp_dly, vp_dly], axis=1)`: Concatenates all the daily data into a new DataFrame.

11. `dlydata = dlydata.round(3)`: Rounds the values in the DataFrame to three decimal places.

12. `return dlydata`: Returns the final DataFrame containing the daily weather data.

After each step, the resulting Series is renamed so that the columns of the returned DataFrame carry descriptive labels.
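
The temperature conversion and the vapor pressure calculation can be checked on a couple of hand-picked values. The sketch below uses a hypothetical vectorized helper, `svp_kpa`, which is not part of the original code; it applies the same formula and the same 0.5 °C fallback as `cal_svp`.

```python
import numpy as np
import pandas as pd

def svp_kpa(temp_c):
    """Hypothetical vectorized variant of cal_svp: saturation vapor pressure in kPa."""
    temp_c = np.asarray(temp_c, dtype=float)
    t = np.where(temp_c > 0, temp_c, 0.5)   # at or below 0 °C, fall back to 0.5 °C
    return 0.6108 * np.exp(17.27 * t / (237.3 + t))

temp_f = pd.Series([68.0, 23.0])    # made-up daily mean temperatures
temp_c = (temp_f - 32) * 5 / 9      # 20.0 °C and -5.0 °C
rh = pd.Series([50.0, 80.0])        # made-up daily mean relative humidity (%)

svp = pd.Series(svp_kpa(temp_c), index=temp_c.index)   # kPa
vp_mb = rh / 100 * svp * 10                            # kPa -> mb

print(svp.round(3))
print(vp_mb.round(3))
```

At 20 °C the saturation vapor pressure comes out at about 2.338 kPa, and the sub-zero day gets the same value as 0.5 °C would.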

```python
import numpy as np
import pandas as pd

def hly2dly(indata):
    indata.info()  # info() prints its own report and returns None

    ws_avg_dly = indata['Wind Speed (mph)'].resample('D').mean()   # daily mean wind speed
    ws_avg_dly.rename('AVG_WS (mph)', inplace=True)

    pcpin_sum_dly = indata['Precipitation (in)'].resample('D').sum()   # daily total precipitation
    pcpin_sum_dly.rename('precip (in)', inplace=True)
    pcpmm_sum_dly = pcpin_sum_dly*25.4  # convert to mm
    pcpmm_sum_dly.rename('precip (mm)', inplace=True)

    rh_avg_dly = indata['Relative Humidity (%)'].resample('D').mean()   # daily mean relative humidity
    rh_avg_dly.rename('AVG_RH (%)', inplace=True)

    tf_avg_dly = indata['Air Temp (°F)'].resample('D').mean()   # daily mean air temperature
    tf_avg_dly.rename('AVG_Temp (°F)', inplace=True)
    tc_avg_dly = (tf_avg_dly-32)*5/9  # convert to celsius
    tc_avg_dly.rename('AVG_Temp (°C)', inplace=True)  # label the Celsius series, not the °F one

    sr_avg_dly = indata['Solar Radiation (W/m²)'].resample('D').mean()   # daily mean solar radiation
    sr_avg_dly.rename('AVG_SR (W/m²)', inplace=True)  # a daily mean, so labelled AVG rather than SUM

    def cal_svp(value):
        # formula for saturation vapor pressure (kPa), temperature in °C
        # IF(K3>0,0.6108*EXP(17.27*K3/(237.3+K3)),0.6108*EXP(17.27*0.5/(237.3+0.5)))
        if value > 0:
            return np.exp(17.27*value/(237.3+value))*0.6108
        else:
            # at or below 0 °C the formula is evaluated at a fixed 0.5 °C
            return np.exp(17.27*0.5/(237.3+0.5))*0.6108
    svp_dly = tc_avg_dly.apply(cal_svp)  # saturation vapor pressure
    svp_dly.rename('Saturation Vapor Pressure (kPa)', inplace=True)

    vp_dly = rh_avg_dly/100*svp_dly*10  # vapor pressure; the factor 10 converts kPa to mb
    vp_dly.rename('Vapor Pressure (mb)', inplace=True)

    dlydata = pd.concat([ws_avg_dly, pcpmm_sum_dly, sr_avg_dly, tc_avg_dly, rh_avg_dly, svp_dly, vp_dly], axis=1)
    dlydata = dlydata.round(3)
    return dlydata
```



# Implementation block

The script below performs the following steps:

1. `import pandas as pd, numpy as np`: Importing the pandas and numpy libraries, which are used for data manipulation and mathematical operations, respectively.

2. `from datetime import date`: Importing the date class from the datetime module, which provides various functions to work with dates.

3. `today = date.today().strftime("%Y%m%d")`: Getting today's date and formatting it as a string in the format "YYYYMMDD".

4. `indata = read_hly('Table (5-min).csv')`: Calling the `read_hly` function defined above to read the file 'Table (5-min).csv' into a DataFrame. The DataFrame is stored in the variable `indata`.

5. `indata.head()`: Displaying the first 5 rows of the DataFrame `indata`.

6. `dlydata = hly2dly(indata)`: Calling the `hly2dly` function defined above with `indata` as the argument. This function aggregates the 5-minute weather data in `indata` into daily averages and sums for various weather parameters. The resulting DataFrame is stored in the variable `dlydata`.

7. `dlydata.head()`: Displaying the first 5 rows of the DataFrame `dlydata`.

8. `dlydata.to_csv('Penman_weather_'+today+'_.csv')`: Saving the DataFrame `dlydata` to a CSV file. The filename is generated by concatenating the string 'Penman_weather_' with the string representation of today's date and the string '_.csv'.
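
The date-stamped file name and the CSV export can be sketched without touching the disk. The frame below is synthetic, and the in-memory buffer stands in for the output file; reading the data back with `index_col` and `parse_dates` is an assumption about how the file might be consumed later, not something the script does.

```python
import io
from datetime import date

import pandas as pd

today = date.today().strftime("%Y%m%d")
outname = 'Penman_weather_' + today + '_.csv'   # same naming pattern as the script

# Synthetic stand-in for dlydata (made-up values).
demo = pd.DataFrame(
    {'precip (mm)': [0.0, 2.54]},
    index=pd.DatetimeIndex(['2024-04-15', '2024-04-16'], name='Time (LST)'),
)

buffer = io.StringIO()          # in-memory stand-in for the output file
demo.to_csv(buffer)             # the index is written as the first column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col='Time (LST)', parse_dates=True)

print(outname)
print(restored)
```

Because `to_csv` writes the index as the first column, the daily timestamps survive the round trip as long as the reader restores that column as the index.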

```python
import pandas as pd
import numpy as np

# get today's date as string
from datetime import date
today = date.today().strftime("%Y%m%d")

indata = read_hly('Table (5-min).csv')
indata.head()

dlydata = hly2dly(indata)
dlydata.head()

dlydata.to_csv('Penman_weather_'+today+'_.csv')
```


24 ['Air Temp (°F)', '0.5 m Air Temp (°F)', '1.5 m Air Temp (°F)', '3 m Air Temp (°F)', 'Relative Humidity (%)', 'Precipitation (in)', 'Accumulated Precip (in)', 'Solar Radiation (W/m²)', 'Wind Speed (mph)', 'Wind Direction (°)', 'Wind Gust (mph)', '4" Bare Soil Temp (°F)', '4" Grass Soil Temp (°F)', '2" Soil Temp (°F)', '2" Soil Water Content (%)', '4" Soil Temp (°F)', '4" Soil Water Content (%)', '8" Soil Temp (°F)', '8" Soil Water Content (%)', '20" Soil Temp (°F)', '20" Soil Water Content (%)', 'Inversion Strength', 'Max Inversion', 'Battery Voltage']
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 8634 entries, 2024-04-14 15:35:00 to 2024-05-14 15:00:00
Data columns (total 24 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   Air Temp (°F)               8634 non-null   float64
 1   0.5 m Air Temp (°F)         8634 non-null   float64
 2   1.5 m Air Temp (°F)         8634 non-null   float64
 3   3 m A