The Legistar API does define the key "EventVideoPath" in `Event`. Seattle, however, does not populate that field for its events.

In [1]:
from cdp_scrapers.legistar_utils import get_legistar_events_for_timespan
from datetime import datetime
legistar_evs = get_legistar_events_for_timespan(
    'seattle',
    begin=datetime(2021, 8, 3),
    end=datetime(2021, 8, 5)
)

In [3]:
print(legistar_evs[0]["EventVideoPath"])
#print(legistar_evs[0])

None


What is available instead is the URL of the event's web page on seattle.legistar.com, stored in "EventInSiteURL".

In [4]:
print(legistar_evs[0]["EventInSiteURL"])

https://seattle.legistar.com/MeetingDetail.aspx?LEGID=4831&GID=393&G=FFE3B678-CEF6-4197-84AC-5204EA4CFC0C


That web page has a "Meeting video" link that points to another web page. If you view the page source and search for "Meeting video", you can see that the link is an `a` tag like `<a id="ctl00_ContentPlaceHolder1_hypVideo"... href="http://seattlechannel.org..." ...>`
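
For illustration, the same lookup can also be done with only the standard library's `html.parser`, which may help where BeautifulSoup is not installed. This is a sketch, not the notebook's method: the `VideoLinkFinder` class and the HTML snippet are hypothetical, modeled on the tag described above, and the filter mirrors the id pattern and `videolink` class used by the BeautifulSoup cell. Note that `html.parser` unescapes entities in attribute values, so `&amp;` in the snippet comes back as `&`.

```python
import re
from html.parser import HTMLParser

# id pattern used by the notebook for the "Meeting video" link
VIDEO_LINK_ID = re.compile(r"ct\S*_ContentPlaceHolder\S*_hypVideo")


class VideoLinkFinder(HTMLParser):
    """Hypothetical helper: keep the href of the first matching <a> tag."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        id_ok = VIDEO_LINK_ID.search(attr_map.get("id") or "")
        class_ok = "videolink" in (attr_map.get("class") or "").split()
        if id_ok and class_ok:
            self.href = attr_map.get("href")


# hypothetical snippet modeled on the tag described above
sample_html = (
    '<a id="ctl00_ContentPlaceHolder1_hypVideo" class="videolink" '
    'href="http://seattlechannel.org/?videoid=x130988&amp;Mode2=Video">'
    "Meeting video</a>"
)

finder = VideoLinkFinder()
finder.feed(sample_html)
print(finder.href)
```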

In [6]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# load the seattle.legistar.com web page for parsing
with urlopen(legistar_evs[0]["EventInSiteURL"]) as resp:
    soup = BeautifulSoup(resp.read(), "html.parser")

# get the href attribute from the a tag with a certain id pattern
video_page_url = soup.find(
    "a",
    id=re.compile(r"ct\S*_ContentPlaceHolder\S*_hypVideo"),
    class_="videolink",
)["href"]

print(video_page_url)

http://seattlechannel.org/mayor-and-council/city-council/2020-2021-finance-and-housing-committee/?videoid=x130988&amp;Mode2=Video
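
The printed URL still carries the HTML entity `&amp;` in its query string. If a clean query string is needed, for example to read the `videoid` parameter, one option is to unescape the entity with the standard library and then parse the query. A minimal sketch, using the URL printed above as a literal:

```python
import html
from urllib.parse import parse_qs, urlparse

# URL as printed above, still carrying the HTML entity "&amp;"
raw_video_page_url = (
    "http://seattlechannel.org/mayor-and-council/city-council/"
    "2020-2021-finance-and-housing-committee/?videoid=x130988&amp;Mode2=Video"
)

# turn "&amp;" back into "&" so the query string splits on real separators
clean_url = html.unescape(raw_video_page_url)
query = parse_qs(urlparse(clean_url).query)

print(clean_url)
print(query["videoid"][0], query["Mode2"][0])
```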


Another round of page-source examination finally gives the URIs of the video and its captions. They appear inside a `script` tag, in the object passed to `playerInstance.setup()`, like
```
...
playerInstance.setup({
sources: [
    {
        file: "//video.seattle.gov/media/council/fin_080321_2612131V.mp4",
        label: "Auto"
    }
]
...
tracks: [{
    file: "documents/seattlechannel/closedcaption/2021/fin_080321_2612131.vtt",
    label: "English",
    kind: "captions",
    "default": true
}
...
```

In [8]:
# load the video page to get the actual video url
with urlopen(video_page_url) as resp:
    soup = BeautifulSoup(resp.read(), "html.parser")

# entire script tag text that has the video player setup call
video_script_text = soup.find(
    "script", string=re.compile(r"playerInstance\.setup")
).string

# beginning of the {...} inside playerInstance.setup()
player_arg_start = re.search(
    r"playerInstance\.setup\((\{)", video_script_text
).start(1)

# finding the end of the playerInstance.setup() blob by looking for
# the next playerInstance object reference following the closing )
setup_arg_len = re.search(
    r"\);\s*\n\s*playerInstance", video_script_text[player_arg_start:]
).start(0)
video_json_blob = video_script_text[player_arg_start : player_arg_start + setup_arg_len]

print(video_json_blob)

{
            sources: [
                {
                    file: "//video.seattle.gov/media/council/fin_080321_2612131V.mp4",
                    label: "Auto"
                }
            ],
            image: "images/seattlechannel/videos/2021/Q3/finance_080321.jpg",
            primary: "html5",

                
                tracks: [{
                    file: "documents/seattlechannel/closedcaption/2021/fin_080321_2612131.vtt",
                    label: "English",
                    kind: "captions",
                    "default": true
                }
                
                 ], 
                sharing: {
                        code: encodeURI(embedCode),
                        link: shareLink
                    },
                ga: {
                    idstring:'Finance &amp; Housing Committee 8/3/21'
                }
            }
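
The cell above finds the end of the `setup()` argument by looking for the next `playerInstance` reference, which depends on how the page happens to be laid out. An alternative sketch, not the notebook's method, is to scan from the opening `{` and count braces, skipping over quoted strings, until the depth returns to zero. The function name `extract_setup_arg` and the trimmed sample script (including its `playerInstance.on(...)` line) are hypothetical, modeled on the blob printed above. It assumes braces only appear as object delimiters or inside quoted strings; JavaScript comments or regex literals containing braces would confuse it.

```python
import re


def extract_setup_arg(script_text):
    """Return the {...} argument of playerInstance.setup(...), or None.

    Counts braces from the opening "{" and ignores braces that appear
    inside single- or double-quoted strings.
    """
    match = re.search(r"playerInstance\.setup\(\s*(\{)", script_text)
    if match is None:
        return None
    start = match.start(1)
    depth = 0
    quote = None  # quote character of the string we are inside, if any
    i = start
    while i < len(script_text):
        ch = script_text[i]
        if quote:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return script_text[start : i + 1]
        i += 1
    return None  # braces never balanced


# hypothetical, trimmed sample modeled on the blob printed above
sample_script = """
playerInstance.setup({
    sources: [{ file: "//video.seattle.gov/media/council/fin_080321_2612131V.mp4", label: "Auto" }],
    sharing: { code: encodeURI(embedCode), link: shareLink },
    ga: { idstring:'Finance &amp; Housing Committee 8/3/21' }
});
playerInstance.on("ready", function () {});
"""

blob = extract_setup_arg(sample_script)
print(blob)
```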


In [9]:
# where "sources:" start
videos_start = video_json_blob.find("sources:")
# where the closing ] is after "sources:"
videos_end = video_json_blob.find("],", videos_start)
# as shown above, the video URLs start with "//", so prepend "https:"
video_uris = [
    "https:" + i
    for i in re.findall(
        r"file\:\s*\"([^\"]+)",
        video_json_blob[videos_start:videos_end],
    )
]

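# the same extraction for the closed-caption tracks
captions_start = video_json_blob.find("tracks:")
captions_end = video_json_blob.find("],", captions_start)
# caption paths are relative to the site root, so prepend it
caption_uris = [
    "https://www.seattlechannel.org/" + i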
    for i in re.findall(
        r"file\:\s*\"([^\"]+)",
        video_json_blob[captions_start:captions_end],
    )
]

print(video_uris)
print(caption_uris)

['https://video.seattle.gov/media/council/fin_080321_2612131V.mp4']
['https://www.seattlechannel.org/documents/seattlechannel/closedcaption/2021/fin_080321_2612131.vtt']
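
The cell above prepends `https:` to video paths and `https://www.seattlechannel.org/` to caption paths. The standard library's `urllib.parse.urljoin` can resolve both forms against a single base, because `//host/path` is a network-path reference (it inherits only the scheme) and `documents/...` is a relative path. A sketch, assuming the base `https://www.seattlechannel.org/` that the cell already uses for captions; the helper `list_files` and the trimmed sample blob are hypothetical, modeled on the blob printed earlier:

```python
import re
from urllib.parse import urljoin

# base the notebook uses for caption paths; assumed to work for videos too
BASE_URL = "https://www.seattlechannel.org/"
FILE_RE = re.compile(r'file:\s*"([^"]+)"')


def list_files(blob, key):
    """Resolve every file: "..." entry inside the `key: [...]` array of blob."""
    start = blob.find(key + ":")
    if start == -1:
        return []
    end = blob.find("]", start)
    if end == -1:
        end = len(blob)
    return [urljoin(BASE_URL, path) for path in FILE_RE.findall(blob[start:end])]


# hypothetical, trimmed sample modeled on the blob printed earlier
sample_blob = """{
    sources: [{ file: "//video.seattle.gov/media/council/fin_080321_2612131V.mp4", label: "Auto" }],
    tracks: [{ file: "documents/seattlechannel/closedcaption/2021/fin_080321_2612131.vtt", label: "English" }],
}"""

print(list_files(sample_blob, "sources"))
print(list_files(sample_blob, "tracks"))
```

On this sample the helper yields the same two URIs as the cell above.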
