## Scrape Meetings (YouTube)

This notebook uses `YoutubeIngestionScraper` to pull live-streamed or published Board of Education meetings for several school districts in Colorado. The `scrape_meetings` function accepts start and end dates as strings in "Y,M,D" format (for example, "2023,5,1"), converts them to `datetime` objects, and scrapes the meetings published within that period.
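
As a quick illustration of the date handling described above, the sketch below parses two "Y,M,D" strings with the same `"%Y,%m,%d"` format string the notebook uses. The helper name `parse_date` and the constant `DATE_FORMAT` are hypothetical and exist only for this example; the notebook itself calls `datetime.strptime` inline.

```python
from datetime import datetime

# Format of the date strings: year, month, day, separated by commas.
DATE_FORMAT = "%Y,%m,%d"


def parse_date(date_string):
    """Convert a "Y,M,D" string such as "2023,5,1" into a naive datetime.

    Hypothetical helper for illustration; not part of cdp_scrapers.
    """
    return datetime.strptime(date_string, DATE_FORMAT)


start = parse_date("2023,5,1")
end = parse_date("2023,5,10")
print(start, end)
```

Note that the resulting datetimes are naive (they carry no timezone) and fall at midnight of the given day.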

In [1]:
from cdp_scrapers.youtube_utils import YoutubeIngestionScraper
from datetime import datetime

In [14]:
## dictionary syntax: "Channel Name": {"School District Name": "Body Search Term"}

## This dictionary is ordered by size of the school district
schools_and_body_terms = {
    "DougCoSchools" : {"Douglas County School District": "BOE Meeting"},
    "adams12fivestarschools-esc86" : {"Adams 12 Five Star Schools":"Board of Education Meeting"},
    "psdondemand3088" : {"Poudre School District R-1":"Board of Education Meeting"},
    "bouldervalleyschooldistric5781" : {"Boulder Valley School District RE-2":"Board of Education Meeting"},
}
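
To make the nesting of this dictionary easier to read, here is a minimal sketch that flattens it into (channel, district, search term) tuples. The function `list_search_targets` is a hypothetical helper written for this example, and the mapping is copied from the cell above so the snippet runs on its own.

```python
# Copy of the mapping from the cell above, so this sketch is self-contained.
schools_and_body_terms = {
    "DougCoSchools": {"Douglas County School District": "BOE Meeting"},
    "adams12fivestarschools-esc86": {"Adams 12 Five Star Schools": "Board of Education Meeting"},
    "psdondemand3088": {"Poudre School District R-1": "Board of Education Meeting"},
    "bouldervalleyschooldistric5781": {"Boulder Valley School District RE-2": "Board of Education Meeting"},
}


def list_search_targets(mapping):
    """Return (channel, district, search term) tuples from the nested mapping.

    Hypothetical helper for illustration; not part of cdp_scrapers.
    """
    targets = []
    for channel, body_terms in mapping.items():
        for district, term in body_terms.items():
            targets.append((channel, district, term))
    return targets


for channel, district, term in list_search_targets(schools_and_body_terms):
    print(f"{channel}: search for {term!r} -> {district}")
```

Each outer key is a YouTube channel handle, and each inner dictionary maps the district (body) name to the phrase searched for on that channel.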

In [28]:
def scrape_meetings(start_datetime, end_datetime):
    """Scrape meetings from every channel in schools_and_body_terms.

    Both arguments are date strings in "Y,M,D" format, e.g. "2023,5,1".
    """
    ## storing all of the events in a list
    events = []

    ## creating the datetimes to be passed into the begin and end params
    start = datetime.strptime(start_datetime, "%Y,%m,%d")
    last = datetime.strptime(end_datetime, "%Y,%m,%d")

    ## iterating over the dictionary above
    for channel_name, body_search_terms in schools_and_body_terms.items():
        scraper = YoutubeIngestionScraper(
            channel_name=channel_name,
            body_search_terms=body_search_terms,
            timezone="MST",
        )

        curr_channel_events = scraper.get_events(begin=start, end=last)
        print("Finished", channel_name, "\n")

        ## appending this channel's events to the running list
        events.extend(curr_channel_events)

    print("\nLIST OF EVENTS SCRAPED\n")
    return events
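

The returned list can be hard to read in its raw form. The following is a minimal sketch of one way to condense it, assuming each event exposes `body.name` and a `sessions` list whose items carry `session_datetime` and `video_uri`, as the printed result of the run in this notebook suggests. `summarize_events` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real events so the snippet runs without any network access.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize_events(events):
    """Return one (body name, session date, video URI) tuple per session.

    Hypothetical helper; it only relies on the attribute names that
    appear in the printed scraper output.
    """
    rows = []
    for event in events:
        for session in event.sessions:
            rows.append(
                (
                    event.body.name,
                    session.session_datetime.date().isoformat(),
                    session.video_uri,
                )
            )
    return rows


# Stand-in for a scraped event, mirroring one entry of the notebook output.
sample_events = [
    SimpleNamespace(
        body=SimpleNamespace(name="Douglas County School District"),
        sessions=[
            SimpleNamespace(
                session_datetime=datetime(2023, 5, 9),
                video_uri="https://www.youtube.com/watch?v=6rOyEnCxXys",
            )
        ],
    )
]

for row in summarize_events(sample_events):
    print(row)
```

In the notebook, the same function could be applied to the list returned by `scrape_meetings` instead of `sample_events`.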

In [30]:
scrape_meetings("2023,5,1","2023,5,10")

[youtube:tab] Extracting URL: https://www.youtube.com/@DougCoSchools/search?query=BOE+Meeting+after%3A2023-05-01+before%3A2023-05-10
[youtube:tab] @DougCoSchools/search: Downloading webpage
[download] Downloading playlist: DougCoSchools - Search - BOE Meeting after:2023-05-01 before:2023-05-10
[youtube:tab] Playlist DougCoSchools - Search - BOE Meeting after:2023-05-01 before:2023-05-10: Downloading 2 items of 2
[download] Downloading item 1 of 2
[youtube] Extracting URL: https://www.youtube.com/watch?v=6rOyEnCxXys
[youtube] Sleeping 0.5 seconds ...
[youtube] 6rOyEnCxXys: Downloading webpage
[youtube] Sleeping 0.5 seconds ...
[youtube] 6rOyEnCxXys: Downloading android player API JSON
[download] Downloading item 2 of 2
[youtube] Extracting URL: https://www.youtube.com/watch?v=OlcTKhs7Iu8
[youtube] Sleeping 0.5 seconds ...
[youtube] OlcTKhs7Iu8: Downloading webpage
[youtube] Sleeping 0.5 seconds ...
[youtube] OlcTKhs7Iu8: Downloading android player API JSON


         Install PhantomJS to workaround the issue. Please download it from https://phantomjs.org/download.html
         n = zkF6dGMAA453ZbWf14 ; player = https://www.youtube.com/s/player/8c7583ff/player_ias.vflset/en_US/base.js


[download] Finished downloading playlist: DougCoSchools - Search - BOE Meeting after:2023-05-01 before:2023-05-10
Finished DougCoSchools 

[youtube:tab] Extracting URL: https://www.youtube.com/@psdondemand3088/search?query=Board+of+Education+Meeting+after%3A2023-05-01+before%3A2023-05-10
[youtube:tab] @psdondemand3088/search: Downloading webpage
[download] Downloading playlist: PSD On Demand - Search - Board of Education Meeting after:2023-05-01 before:2023-05-10
[youtube:tab] Playlist PSD On Demand - Search - Board of Education Meeting after:2023-05-01 before:2023-05-10: Downloading 0 items of 0
[download] Finished downloading playlist: PSD On Demand - Search - Board of Education Meeting after:2023-05-01 before:2023-05-10
Finished psdondemand3088 

[youtube:tab] Extracting URL: https://www.youtube.com/@bouldervalleyschooldistric5781/search?query=Board+of+Education+Meeting+a...+before%3A2023-05-10
[youtube:tab] @bouldervalleyschooldistric5781/search: Downloading webpage
[download] Down

[EventIngestionModel(body=Body(name='Douglas County School District', is_active=True, start_datetime=None, description=None, end_datetime=None, external_source_id=None), sessions=[Session(session_datetime=datetime.datetime(2023, 5, 9, 0, 0, tzinfo=<StaticTzInfo 'MST'>), video_uri='https://www.youtube.com/watch?v=6rOyEnCxXys', session_index=1, video_start_time=None, video_end_time=None, caption_uri=None, external_source_id='6rOyEnCxXys')], event_minutes_items=None, agenda_uri=None, minutes_uri=None, static_thumbnail_uri=None, hover_thumbnail_uri=None, external_source_id=None),
 EventIngestionModel(body=Body(name='Douglas County School District', is_active=True, start_datetime=None, description=None, end_datetime=None, external_source_id=None), sessions=[Session(session_datetime=datetime.datetime(2023, 5, 8, 0, 0, tzinfo=<StaticTzInfo 'MST'>), video_uri='https://www.youtube.com/watch?v=OlcTKhs7Iu8', session_index=2, video_start_time=None, video_end_time=None, caption_uri=None, external_s