
GrayAreaorg/festival-video-archive


Gray Area Festival Video Archive

Live at: https://archive.grayareafestival.io

A filterable archive of videos from the Gray Area Festival, including in-person and virtual conference talks, presentations, and group panels. The videos are hosted on the Gray Area YouTube channel, which is also the current source of video metadata.

The archive is built statically from flat JSON files and includes English captions for each video. The captions improve accessibility and support filtering, searching, NLP analysis, and experimentation with LLMs. These .vtt caption files are stored alongside the video metadata JSON in the data/videos folder.
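The captions are WebVTT, so their cue timings and inline tags must be removed before word counting or NLP work. Below is a minimal JavaScript sketch of that step. It assumes YouTube-style auto-captions, in which cue text carries inline timing tags and each caption line repeats across consecutive cues. `vttToText` is a hypothetical helper and is not part of the repository.

```javascript
// vttToText: hypothetical helper, not one of the repository's scripts.
// Assumes YouTube-style auto-captions: inline tags like <00:00:01.234><c> word</c>
// and caption lines that repeat across consecutive cues.
function vttToText(vtt) {
  const lines = vtt.split(/\r?\n/);
  const kept = [];
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    const next = (lines[i + 1] || "").trim();
    // Skip blanks, the WEBVTT header and its metadata lines.
    if (line === "" || line.startsWith("WEBVTT") || /^(Kind|Language):/.test(line)) continue;
    // Skip cue timing lines and cue identifiers (a line directly followed by a timing line).
    if (line.includes("-->") || next.includes("-->")) continue;
    // Remove inline tags, then decode the entities WebVTT requires to be escaped.
    const text = line
      .replace(/<[^>]*>/g, "")
      .replace(/&lt;/g, "<")
      .replace(/&gt;/g, ">")
      .replace(/&amp;/g, "&")
      .trim();
    // Drop a line identical to the previous kept line (rolling captions repeat lines).
    if (text && text !== kept[kept.length - 1]) kept.push(text);
  }
  return kept.join(" ");
}
```

In practice the input would come from reading one of the .vtt files in data/videos as UTF-8 text.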

Videos can be filtered by curated topics, each ordered by keyword frequency, and by word counts per video ID. Both are generated from the video captions and stored in data/topics.

The database and caption files are built automatically and deployed to the dist branch on every push to main by the process-videos.yml workflow. To build and run locally, see Sync all data.

This project is open to submissions and suggestions for engaging with this growing collection of content. You are welcome to open a pull request or an issue for discussion.

LICENSE

This work and all content therein is licensed under a CC Attribution-NonCommercial-NoDerivatives 4.0 International License.

Dev Setup

Built Using

Installation

npm install -D tailwindcss
npx tailwindcss init

CSS Build Scripts

  • npm run css-watch to watch for changes and output CSS to ./dist/output.css.
  • npm run css-build to output minified CSS to ./dist/output.css for production.

Scripts

Sync all data

Download all video metadata and .vtt captions, generate the topic lists, and compile the JSON database.

./bin/_workflow-local.sh

Playlist URLs are stored in playlists.txt.

Generate Topic Lists

search-caption-string.sh

This script searches the .vtt files in a source directory for the given query patterns, counts the occurrences in each file, and outputs the results sorted in descending order of count.

Example Use:

Find and collate .vtt files related to the topic of "software": ./bin/search-caption-string.sh data/videos "software" "programming" "code" "algorhythm" "open source"

Example Output:

  97 data/videos/2015/History of the Future, Art & Technology from 1965 - Yesterday | Casey Reas | The Gray Area Festival [mHox98NFU3o].en.vtt
  72 data/videos/2020/Amelia Winger-Bearskin | Gray Area Festival 2020 [68gwy1W7Duo].en.vtt
  48 data/videos/2016/Situated Systems Panel | Autodesk | The 2016 Gray Area Festival [jTDMOl7MvrU].en.vtt
  47 data/videos/2015/The School of Poetic Computation | Zach Lieberman | The Gray Area Festival 2015 [0F8EZU6B-jE].en.vtt
  43 data/videos/2017/Lauren McCarthy | the 2017 Gray Area Festival [l1qeNMXccvA].en.vtt
  42 data/videos/2020/Ruha Benjamin | Gray Area Festival 2020 Keynote [GISl_8-fbuA].en.vtt
  39 data/videos/2019/Jaron Lanier | Gray Area Festival 2019 Keynote [lsNF4KfmwkY].en.vtt
  ...

Save output to file via redirection >:

Find and collate .vtt files related to the topic of "software": ./bin/search-caption-string.sh data/videos "software" "programming" "code" "algorhythm" "open source" > data/topics/software.txt
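The search-and-rank step can also be sketched in JavaScript. This is a hypothetical equivalent for illustration, not the repository's script. It assumes every case-insensitive match of any query counts once, whereas the shell script may instead count matching lines, as `grep -c` does. `rankByMatches`, `readVttFiles`, and `formatRanking` are illustrative names.

```javascript
// Hypothetical JavaScript counterpart of search-caption-string.sh (illustration only).
// Assumption: every case-insensitive match of any query counts once.
const fs = require("node:fs");
const path = require("node:path");

// Escape regex metacharacters so queries are matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// `files` maps a file path to its caption text.
// Returns [count, path] pairs for files with at least one match, highest count first.
function rankByMatches(files, queries) {
  const re = new RegExp(queries.map(escapeRegExp).join("|"), "gi");
  const ranked = [];
  for (const [file, text] of Object.entries(files)) {
    const count = (text.match(re) || []).length;
    if (count > 0) ranked.push([count, file]);
  }
  return ranked.sort((a, b) => b[0] - a[0] || a[1].localeCompare(b[1]));
}

// Recursively read every *.vtt file under `dir` into a { path: text } object.
function readVttFiles(dir) {
  const files = {};
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) Object.assign(files, readVttFiles(full));
    else if (entry.name.endsWith(".vtt")) files[full] = fs.readFileSync(full, "utf8");
  }
  return files;
}

// Right-align counts to width 4, similar to the example output.
function formatRanking(ranked) {
  return ranked.map(([count, file]) => `${String(count).padStart(4)} ${file}`).join("\n");
}
```

For example, `formatRanking(rankByMatches(readVttFiles("data/videos"), ["software", "code"]))` would produce text in the same "count path" shape as the example output.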

Generate All Topic Lists

./bin/gen-all-topics.sh

Topic List Query Arrays

Keywords used to generate the curated topics. Videos whose captions contain any of these keywords are listed in descending order of keyword frequency.

  • AI: artificial intelligence, ai
  • Art: art, fine art, gallery
  • Biology: biology, cell, genetics, evolution, physiology, biochemistry, adaptation, reproduction, biodiversity, microbio, molecular
  • Decolonialism: decolonialism, neocolonialism, colonial, indigenous, imperialism, sovereignty
  • Design: design, graphic design, web design, product design
  • Ecology: ecology, earth, climate change, ecosystem, biodiversity, conservation, environment
  • History: history, historical, past, era, ancient, civilization, culture, heritage
  • Indigenous Wisdom: indigenous, wisdom, elder, past, native, heritage, spiritual
  • Machine Learning: machine learning, ml, neural net
  • Metaverse: ar, vr, xr, augmented reality, mixed reality, virtual reality, immersive, virtual world
  • Music: music, song, concert, composition, melody
  • Philosophy: philosophy, metaphysics, ethics, aesthetic, phenomenology
  • Software: software, programming, code, algorhythm, open source
  • Systems: systems, chaos, complexity, modeling, simulation, pattern
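Several of these keywords are short (ai, ml, ar, vr, xr). If captions were searched for plain substrings, they would also match inside unrelated words such as "again", "html", or "start". This README does not say how the scripts handle that. One option, sketched below in JavaScript, is to match whole words only and mark intended prefixes explicitly. The trailing `*` for prefix keywords such as `microbio*` is a hypothetical convention for this example and is not defined by the repository.

```javascript
// Whole-word keyword matching for the topic query arrays (a sketch).
// Hypothetical convention: a trailing "*" marks a prefix keyword, e.g. "microbio*".

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one case-insensitive, global regex matching any keyword as a whole word.
// Multi-word keywords ("virtual reality") allow any whitespace between words,
// so a phrase split across caption lines still matches.
function keywordRegex(keywords) {
  const alternatives = keywords.map((kw) => {
    const isPrefix = kw.endsWith("*");
    const words = (isPrefix ? kw.slice(0, -1) : kw).trim().split(/\s+/).map(escapeRegExp);
    const body = words.join("\\s+");
    // Word boundary on both sides, or only at the start for prefix keywords.
    return isPrefix ? `\\b${body}` : `\\b${body}\\b`;
  });
  return new RegExp(alternatives.join("|"), "gi");
}

// Number of keyword matches in a caption text.
function countMatches(text, keywords) {
  return (text.match(keywordRegex(keywords)) || []).length;
}
```

With `["ar", "ai", "ml"]`, the sentence "We start again with HTML and AR headsets" yields one match ("AR"). A plain substring search for the same keywords would find four.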

TopicList.txt to JSON

topic_lists_to_json.js

Converts a topic list .txt file generated from search-caption-string.sh to JSON format, readable by the front-end.

Usage: node bin/topic_lists_to_json.js file.txt > file.json
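The conversion amounts to parsing each "count path" line. The sketch below is a hypothetical JavaScript version. The output fields (`count`, `id`, `title`, `year`, `file`) are assumptions for illustration, since the JSON shape the front-end expects isn't described here. The sketch relies on caption filenames following the `Title [videoId].en.vtt` pattern seen in the example output above, with the festival year as the parent folder.

```javascript
const path = require("node:path");

// Parse one line of search-caption-string.sh output, e.g.
// "  72 data/videos/2020/Some Talk [68gwy1W7Duo].en.vtt".
// The returned field names are hypothetical. Lines that don't match (such as "...") give null.
function parseTopicLine(line) {
  const m = line.match(/^\s*(\d+)\s+(.+\.vtt)\s*$/);
  if (!m) return null;
  const file = m[2];
  const base = path.posix.basename(file);
  // "Title [videoId].en.vtt": the greedy (.*) makes the last [...] group the video ID.
  const named = base.match(/^(.*) \[([A-Za-z0-9_-]+)\]\.[A-Za-z-]+\.vtt$/);
  const folder = path.posix.basename(path.posix.dirname(file));
  return {
    count: Number(m[1]),
    id: named ? named[2] : null,
    title: named ? named[1] : base,
    year: /^\d{4}$/.test(folder) ? Number(folder) : null,
    file,
  };
}

// Convert a whole topic list to a JSON array, skipping unparseable lines.
function topicListToJson(text) {
  const entries = text.split(/\r?\n/).map(parseTopicLine).filter(Boolean);
  return JSON.stringify(entries, null, 2);
}
```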

Convert entire folder:

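#!/bin/bash
# Run from the repository root so bin/ resolves. The folder defaults to data/topics;
# -print0 with read -d '' keeps paths containing spaces intact.
dir="${1:-data/topics}"
find "$dir" -type f -name "*.txt" -print0 | while IFS= read -r -d '' file; do
  json_file="${file%.txt}.json"
  node bin/topic_lists_to_json.js "$file" > "$json_file"
done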

Generate Wordcount from .vtt Captions

wordcount.js

Counts word occurrences across all video caption files (*.vtt) in a given directory, sorts the words by frequency, and associates them with the corresponding videos by video ID.

Usage: node ./bin/wordcount.js srcDirectory outputFile.txt [--include-video-stats]

--include-video-stats (optional) includes the wordcount per video id.
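Here is a minimal JavaScript sketch of the counting logic. It assumes the caption text has already had its WebVTT timings and tags removed. `wordCount` and `videoIdFromFilename` are hypothetical names, and the per-video `videos` field only approximates what `--include-video-stats` produces. Words are lowercased and split on anything other than letters, digits, and apostrophes.

```javascript
// Count words across caption texts and, optionally, per video ID.
// `captions` maps a YouTube video ID to caption text with timing already removed.
function wordCount(captions, includeVideoStats = false) {
  const totals = new Map();
  const perVideo = new Map(); // word -> Map(videoId -> count)
  for (const [videoId, text] of Object.entries(captions)) {
    for (const word of text.toLowerCase().match(/[a-z0-9']+/g) || []) {
      totals.set(word, (totals.get(word) || 0) + 1);
      if (includeVideoStats) {
        if (!perVideo.has(word)) perVideo.set(word, new Map());
        const counts = perVideo.get(word);
        counts.set(videoId, (counts.get(videoId) || 0) + 1);
      }
    }
  }
  // Highest frequency first; ties broken alphabetically.
  return [...totals]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .map(([word, count]) => {
      const entry = { word, count };
      if (includeVideoStats) entry.videos = Object.fromEntries(perVideo.get(word));
      return entry;
    });
}

// Extract the video ID from a yt-dlp style caption filename, e.g. "Title [68gwy1W7Duo].en.vtt".
function videoIdFromFilename(name) {
  const m = name.match(/\[([A-Za-z0-9_-]+)\]\.[A-Za-z-]+\.vtt$/);
  return m ? m[1] : null;
}
```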

Compile JSON Archive Database

compile-db.js

Usage: node ./bin/compile-db.js source dest

  • source A JSON file or folder of JSON files of YouTube metadata scraped via yt-dlp
  • dest A destination folder for the single-file JSON database. Prompts before overwriting.
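Compiling the database amounts to loading many per-video metadata objects and merging them into one array. Below is a minimal JavaScript sketch under stated assumptions: the `FIELDS` selection and the newest-first ordering are hypothetical choices, not the repository's schema. The sketch relies on yt-dlp metadata having an `id`, an `upload_date` string in `YYYYMMDD` form, and `_type: "playlist"` on playlist-level files.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical subset of yt-dlp metadata fields kept in the compiled database.
const FIELDS = ["id", "title", "description", "upload_date", "duration", "webpage_url"];

// upload_date is a "YYYYMMDD" string in yt-dlp metadata, so string order is date order.
function newestFirst(a, b) {
  const x = a.upload_date || "";
  const y = b.upload_date || "";
  return x < y ? 1 : x > y ? -1 : 0;
}

// Merge metadata objects into one array: one record per video ID (later files win),
// playlist-level metadata skipped, newest upload first.
function compileDb(entries) {
  const byId = new Map();
  for (const entry of entries) {
    if (!entry || !entry.id || entry._type === "playlist") continue;
    const record = {};
    for (const key of FIELDS) if (key in entry) record[key] = entry[key];
    byId.set(entry.id, record);
  }
  return [...byId.values()].sort(newestFirst);
}

// Load one JSON file, or every *.json file under a folder (recursively).
function loadMetadata(source) {
  if (!fs.statSync(source).isDirectory()) {
    return [JSON.parse(fs.readFileSync(source, "utf8"))];
  }
  return fs.readdirSync(source).flatMap((name) => {
    const full = path.join(source, name);
    return fs.statSync(full).isDirectory() || name.endsWith(".json") ? loadMetadata(full) : [];
  });
}
```

Writing the result could be as simple as `fs.writeFileSync(path.join(dest, "db.json"), JSON.stringify(compileDb(loadMetadata(source))))`. The file name `db.json` is hypothetical, and the real script's overwrite prompt is not reproduced here.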

Get YouTube Autogenerated Captions

get-youtube-subs.sh

Download auto-generated English captions for a given YouTube ID.

Usage: ./bin/get-youtube-subs.sh youtube_id [dest]

  • youtube_id A YouTube video ID
  • dest (optional) .vtt output destination. Default output will be alongside the source.
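yt-dlp can fetch captions without downloading the video. Below is a hedged JavaScript sketch of such a call. It assumes `yt-dlp` is installed and on the PATH. The repository's shell script may pass different options, and the default `dest` of the current folder is a hypothetical choice. The output template is yt-dlp's default `Title [id]` naming, which produces caption files like the `[mHox98NFU3o].en.vtt` names in the example output above.

```javascript
const { spawnSync } = require("node:child_process");

// yt-dlp arguments for English auto-generated captions only, in .vtt format.
// The default dest of "." is a hypothetical choice for this sketch.
function subsArgs(youtubeId, dest = ".") {
  return [
    "--skip-download", // captions only, no media
    "--write-auto-subs", // YouTube's auto-generated captions
    "--sub-langs", "en",
    "--sub-format", "vtt",
    "-P", dest, // output folder
    "-o", "%(title)s [%(id)s].%(ext)s",
    // Passing a full URL avoids IDs that begin with "-" being read as options.
    `https://www.youtube.com/watch?v=${youtubeId}`,
  ];
}

// Run yt-dlp (assumed to be on PATH) and return its exit status.
function getYoutubeSubs(youtubeId, dest) {
  const result = spawnSync("yt-dlp", subsArgs(youtubeId, dest), { stdio: "inherit" });
  if (result.error) throw result.error;
  return result.status;
}
```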

Get YouTube Subs from JSON Data

get-all-subs.sh

Download YouTube auto-generated English (en) subtitles in .vtt format, using existing YouTube JSON metadata (a prerequisite). Creates the .vtt files alongside the source JSON.

Usage: ./bin/get-all-subs.sh sourceDir

  • sourceDir A source folder of YouTube JSON metadata files
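The batch step can be sketched as deriving, for each metadata file, the video ID and the caption path next to it. The JavaScript below is a hypothetical helper and not the repository's script. It assumes yt-dlp naming (`Title [id].info.json` becoming `Title [id].en.vtt`) and adds one extra behaviour for illustration: videos whose caption file already exists are skipped. The original script is not documented to do that.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// For a yt-dlp metadata file named "Title [id].info.json", the English caption
// file written alongside it would be "Title [id].en.vtt".
function captionPathFor(jsonPath) {
  return jsonPath.replace(/(\.info)?\.json$/, ".en.vtt");
}

// List { id, dest } jobs for metadata files in sourceDir that still lack captions.
// Skipping existing captions is an assumption of this sketch.
function pendingCaptionJobs(sourceDir) {
  return fs
    .readdirSync(sourceDir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => path.join(sourceDir, name))
    .map((file) => ({ file, meta: JSON.parse(fs.readFileSync(file, "utf8")) }))
    .filter(({ file, meta }) => meta.id && meta._type !== "playlist" && !fs.existsSync(captionPathFor(file)))
    .map(({ file, meta }) => ({ id: meta.id, dest: path.dirname(file) }));
}
```

Each job's `id` and `dest` would then be handed to a per-video caption download such as get-youtube-subs.sh.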

Get YouTube Metadata

get-youtube-metadata.sh

Download YouTube metadata in JSON format from a single YouTube video, a playlist, or multiple playlists.

Usage: ./bin/get-youtube-metadata.sh url destination

  • url A YouTube video or playlist URL, as in the example below
  • destination A destination folder to save the JSON data

Example

# get all 2021 videos
./bin/get-youtube-metadata.sh https://www.youtube.com/playlist?list=PLm8zJ0HKEJIbQgPDRsUHiawVtahvShN8X ./data/videos/2021/
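yt-dlp can save a video's metadata as an `.info.json` file without downloading the media. Below is a minimal JavaScript sketch, assuming `yt-dlp` is on the PATH. The repository's shell script may use different options. The `playlistUrls` helper assumes a one-URL-per-line format with `#` comments for playlists.txt. That format is a guess and is not documented here.

```javascript
const { spawnSync } = require("node:child_process");

// yt-dlp arguments that write one "Title [id].info.json" per video without downloading
// media; `url` may be a single video or a playlist.
function metadataArgs(url, destination) {
  return [
    "--skip-download",
    "--write-info-json",
    "--no-write-playlist-metafiles", // per-video metadata only
    "-P", destination,
    "-o", "%(title)s [%(id)s].%(ext)s",
    url,
  ];
}

// Hypothetical playlists.txt format: one URL per line; blank lines and "#" comments ignored.
function playlistUrls(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line && !line.startsWith("#"));
}

// Run yt-dlp (assumed to be on PATH) for one URL and return its exit status.
function getYoutubeMetadata(url, destination) {
  const result = spawnSync("yt-dlp", metadataArgs(url, destination), { stdio: "inherit" });
  if (result.error) throw result.error;
  return result.status;
}
```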

Optional Metadata: Custom Keys via YouTube Description Field

Used to add detail to the video archive. Extra object keys can be added to the YouTube description field using the following rules:

  • Metadata delimiter (6 equals signs) "======"
  • key: value format
  • Append the raw metadata to the bottom of the YouTube description field
  • When parsed into the archive JSON, keys are prefixed with meta_; for example, festival_year: 2015 becomes { "meta_festival_year" : "2015" } in the archive JSON
======
featured: <featured person(s), or group>
featured_url: https://www.ruhabenjamin.com
festival_year: 2015
... additional metadata as needed ...
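The rules above can be sketched as a small parser. This JavaScript version is hypothetical, not the archive's actual code. It assumes the delimiter sits on its own line and that only the lines after the last delimiter are metadata. It splits each line at the first colon so URL values such as featured_url keep their own colons. Values stay strings, matching `"meta_festival_year" : "2015"`.

```javascript
// Parse custom keys appended to a YouTube description after a "======" line.
// Each "key: value" line becomes { meta_key: "value" }; values stay strings.
function parseDescriptionMeta(description, delimiter = "======") {
  const lines = (description || "").split(/\r?\n/);
  // Use the last delimiter line, since the metadata is appended at the bottom.
  const start = lines.map((l) => l.trim()).lastIndexOf(delimiter);
  const meta = {};
  if (start === -1) return meta;
  for (const line of lines.slice(start + 1)) {
    // Keys can't contain ":", so the match splits at the first colon only.
    const m = line.match(/^\s*([A-Za-z0-9_]+)\s*:\s*(.*?)\s*$/);
    if (m) meta[`meta_${m[1]}`] = m[2];
  }
  return meta;
}
```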

About

Gray Area Festival video archive of talks, panels, and presentations with English captions and transcripts.
