# ag2gcal - Texas A&M Planting Calendar to gCal

Convert the calendars at `calendar_source_urls` to Google Calendar event entries at `calendar_destination_ids`.

Example:

![](img/ag2gcal.png)

In [259]:
import datetime
from datetime import timedelta
import os

import pandas as pd
import dateparser
import gcsa
import gcsa.recurrence

from gcsa.event import Event
from gcsa.google_calendar import GoogleCalendar

### Various variables

- `region_columns` - list of all regions contained in the tables
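- `all_columns` - `region_columns` plus the leftmost `Vegetable` column that names each veggie
- `gcloud_creds_path` - path to the OAuth credentials file (read from the `GOOGLE_CLOUD_AG2GCAL_CREDS_PATH` environment variable) with permissions to access the person's calendars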
- `calendar_source_urls` - source sites for the planting schedules
- `calendar_destination_ids` - map of region to gCal calendar ID where we'll eventually write the calendar events

In [244]:
region_columns = ['Region I', 'Region II', 'Region III', 'Region IV', 'Region V']
all_columns = ['Vegetable'] + region_columns

In [273]:
gcloud_creds_path = os.environ['GOOGLE_CLOUD_AG2GCAL_CREDS_PATH']

In [140]:
calendar_source_urls = {
    'Spring': 'https://aggie-horticulture.tamu.edu/archives/parsons/earthkind/ekgarden14.html',
    'Fall': 'https://aggie-horticulture.tamu.edu/archives/parsons/fallgarden/falldirect.html'
}

In [252]:
calendar_destination_ids = {
    'Region III': 'c_00rno1r8kjpj2tcse3vdkourl8@group.calendar.google.com'
}

### Parse date ranges from tables

The dates in the tables are somewhat inconsistent, so massage each cell into a usable two-entry list of `[start_date, end_date]`. Cells without an explicit end date get a default window of `default_window_in_days` days after the start date.

In [188]:
def try_parse_date_range(val, year=datetime.datetime.now().year, default_window_in_days=14, date_format='%b %d %Y'):
    # Work on a string copy so NaN cells and other non-string values don't break .split().
    # Note: date_format is currently unused; dateparser infers the format.
    stringified_val = str(val).strip()

    start_date = None
    end_date = None

    # Empty cells come through pandas as NaN; there is nothing to parse
    if pd.isnull(val) or stringified_val.lower() in ('nan', 'nat'):
        return [start_date, end_date]

    if '-' in stringified_val:
        raw_date_range = stringified_val.split('-')

        if len(raw_date_range) == 2:
            start_date, end_date = [f'{raw_date.strip()} {year}' for raw_date in raw_date_range]
    elif stringified_val.startswith('After'):
        start_date = f'{stringified_val.replace("After", "").strip()} {year}'
    else:
        start_date = f'{stringified_val} {year}'

    if isinstance(start_date, str):
        start_date = dateparser.parse(start_date)

        if end_date is not None:
            end_date = dateparser.parse(end_date)

        # No explicit (or parseable) end date: fall back to the default planting window
        if start_date is not None and end_date is None:
            end_date = start_date + timedelta(days=default_window_in_days)

    return [start_date, end_date]
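
To make the expected inputs and outputs concrete, here is a minimal stdlib-only sketch of the same idea. It assumes cells look like `Mar 5-May 1`, `After Feb 1`, or `Feb 1` (abbreviated English month, then day), which is a guess at the table format rather than a guarantee. `parse_range_strict` is a hypothetical name, and it swaps `dateparser` for `datetime.strptime` with the `'%b %d %Y'` format, so it is stricter than the function above.

```python
from datetime import datetime, timedelta

DATE_FORMAT = '%b %d %Y'  # e.g. "Mar 5 2020"; assumes abbreviated English month names


def parse_range_strict(val, year, default_window_in_days=14):
    """Return [start, end] datetimes, or [None, None] if the cell is unparseable."""
    text = str(val).strip()
    if text.startswith('After'):
        text = text[len('After'):].strip()

    parts = [part.strip() for part in text.split('-')]
    try:
        dates = [datetime.strptime(f'{part} {year}', DATE_FORMAT) for part in parts]
    except ValueError:
        # Covers NaN cells ("nan") and anything else that is not "<Mon> <day>"
        return [None, None]

    if len(dates) == 1:
        # Open-ended cell: assume a fixed planting window after the start date
        return [dates[0], dates[0] + timedelta(days=default_window_in_days)]
    if len(dates) == 2:
        return dates
    return [None, None]


print(parse_range_strict('Mar 5-May 1', 2020))
print(parse_range_strict('After Feb 1', 2020))
print(parse_range_strict(float('nan'), 2020))
```

The strict format makes failures explicit (`[None, None]`) instead of relying on a fuzzy parser's best guess, at the cost of rejecting any cell that deviates from the assumed shape.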
    

### Parse the HTML tables and coerce them into DataFrames

In [199]:
def parse_url(season, url):
    print(url)
    
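    # read_html returns every <table> on the page; the planting calendar is the second one
    raw_tables = pd.read_html(url)

    calendar_table = raw_tables[1]

    # Some pages repeat the header as the first data row; drop it if present
    if calendar_table.iloc[0, 0] == 'Vegetables':
        calendar_table.drop([0], inplace=True)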

    calendar_table.columns = all_columns
    calendar_table['season'] = season
        
    return calendar_table

### Do it

In [200]:
calendars = []

for season, url in calendar_source_urls.items():
    calendars.append(parse_url(season, url))

# DataFrame.append was removed in pandas 2.0; pd.concat stacks the frames
# and ignore_index=True gives the combined frame a fresh 0..n-1 index
raw_calendars = pd.concat(calendars, ignore_index=True)

https://aggie-horticulture.tamu.edu/archives/parsons/earthkind/ekgarden14.html
Asparagus
https://aggie-horticulture.tamu.edu/archives/parsons/fallgarden/falldirect.html
Vegetables


In [201]:
for region_column in region_columns:
    raw_calendars[region_column] = raw_calendars[region_column].apply(try_parse_date_range)

In [224]:
unpivoted_calendar = raw_calendars.melt(id_vars=['Vegetable', 'season'], value_vars=region_columns, var_name='region', value_name='date_range')

In [225]:
unpivoted_calendar['start_date'] = unpivoted_calendar.apply(lambda row: row['date_range'][0], axis=1)
unpivoted_calendar['end_date'] = unpivoted_calendar.apply(lambda row: row['date_range'][1], axis=1)
unpivoted_calendar.drop(['date_range'], axis=1, inplace=True)
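
The reshaping above goes from one row per vegetable (with one column per region) to one row per vegetable and region. As an illustration of what `melt` produces, here is the same transformation sketched in plain Python on two made-up rows; the vegetables and dates are placeholders, not values taken from the source tables.

```python
region_columns = ['Region I', 'Region II']  # shortened for the example

# Made-up wide rows: one row per vegetable, one column per region
wide_rows = [
    {'Vegetable': 'Asparagus', 'season': 'Spring', 'Region I': 'Feb 1', 'Region II': 'Jan 15'},
    {'Vegetable': 'Beets', 'season': 'Fall', 'Region I': 'Aug 15', 'Region II': 'Sep 1'},
]

# Long rows: like melt, emit every row for the first region, then the next, and so on
long_rows = [
    {'Vegetable': row['Vegetable'], 'season': row['season'],
     'region': region, 'date_range': row[region]}
    for region in region_columns
    for row in wide_rows
]

for long_row in long_rows:
    print(long_row)
```

Two vegetables across two regions yield four long rows, which is the shape that makes the later per-region filtering a simple equality check on the `region` column.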

### Establish gcal clients

This uses a desktop OAuth flow, so each calendar will kick out a URL to authorize. You'll need to replace the `&amp;` in the URL with actual ampersands.

TODO: figure out how to output and not urlencode within a given cell.
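
One possible workaround for the TODO, as a sketch: Python's standard `html.unescape` turns `&amp;` back into `&`, so a copied URL can be cleaned up in a cell before opening it. The URL below is a made-up placeholder, not a real authorization link.

```python
import html

# Placeholder for the escaped URL copied from the cell output (not a real link)
escaped_url = 'https://example.com/auth?response_type=code&amp;client_id=EXAMPLE&amp;scope=calendar'

# html.unescape converts HTML entities such as &amp; back to literal characters
authorize_url = html.unescape(escaped_url)
print(authorize_url)
```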

In [253]:
calendar_clients = {}

for region, calendar_id in calendar_destination_ids.items():
    calendar_clients[region] = GoogleCalendar(calendar_id, credentials_path=gcloud_creds_path)

### Write events to calendars

Iterate over each of the regions and write the events for that region to the respective calendar defined in `calendar_destination_ids`. Regions without a destination calendar are skipped.
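
Before pointing the loop at a real calendar, it can help to dry-run the filtering logic. The sketch below uses a hypothetical `RecordingCalendar` stand-in that only mimics an `add_event` method and records what it receives. The rows and the client mapping are made-up sample data, and plain dicts stand in for `Event` objects.

```python
from datetime import datetime


class RecordingCalendar:
    """Hypothetical stand-in for a calendar client: records events instead of sending them."""

    def __init__(self):
        self.events = []

    def add_event(self, event):
        self.events.append(event)


# Made-up sample rows in the long (one row per vegetable and region) shape
rows = [
    {'Vegetable': 'Asparagus', 'region': 'Region III',
     'start_date': datetime(2020, 2, 1), 'end_date': datetime(2020, 2, 15)},
    {'Vegetable': 'Okra', 'region': 'Region III', 'start_date': None, 'end_date': None},
    {'Vegetable': 'Beets', 'region': 'Region I',
     'start_date': datetime(2020, 2, 1), 'end_date': datetime(2020, 3, 1)},
]

clients = {'Region III': RecordingCalendar()}  # only one region has a destination

for region, client in clients.items():
    for row in rows:
        # Skip other regions and rows whose date could not be parsed
        if row['region'] != region or row['start_date'] is None:
            continue
        client.add_event({'summary': row['Vegetable'],
                          'start': row['start_date'], 'end': row['end_date']})

print(clients['Region III'].events)
```

Only the Asparagus row should be recorded here: the Okra row has no start date, and the Beets row belongs to a region with no destination calendar.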

In [272]:
for region in region_columns:
    print(f'Adding events for {region}')
    
    if region in calendar_clients:
        calendar_client = calendar_clients[region]
    
        for index, row in unpivoted_calendar[unpivoted_calendar['region'] == region].iterrows():
            if pd.notnull(row['start_date']):  # skips both None and NaT
                event = Event(
                    row['Vegetable'],
                    start=row['start_date'],
                    end=row['end_date']
                )

                calendar_client.add_event(event)

Region I
Region II
Region III
130 --> Vegetable               Asparagus
season                     spring
region                 Region III
start_date    2020-02-01 00:00:00
end_date      2020-02-15 00:00:00
Name: 130, dtype: object --> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
131 --> Vegetable        Beans, snap bush
season                     spring
region                 Region III
start_date    2020-03-05 00:00:00
end_date      2020-05-01 00:00:00
Name: 131, dtype: object --> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
132 --> Vegetable        Beans, snap pole
season                     spring
region                 Region III
start_date    2020-03-05 00:00:00
end_date      2020-04-15 00:00:00
Name: 132, dtype: object --> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
133 --> Vegetable        Beans, Lima bush
season                     spring
region                 Region III
start_date    2020-03-15 00:00:00
end_date      2020-04-15 00:00:00
Name: 133, dtype: 