archive external urls too (#127)
The Open Buddhist University committed Jun 23, 2023
1 parent cc76b30 commit bcf94d3
Showing 1 changed file with 16 additions and 1 deletion.
17 changes: 16 additions & 1 deletion .github/workflows/archive.yml
@@ -5,6 +5,9 @@ on:
     - cron: "40 3 15 5,11 *"
 jobs:
   Archive:
+    env:
+      LOGFILE: "Links/7_lychee main_content prod(^content).txt"
+      GH_TOKEN: ${{ secrets.BUILD_ACTION_TOKEN }}
     runs-on: ubuntu-latest
     steps:
       - name: Checkout the Code
@@ -13,9 +16,21 @@ jobs:
           ref: main
       - name: Install Dependencies
         run: |
+          cd scripts/archivable_urls
+          RUNID=$(gh api -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" "/repos/buddhist-uni/buddhist-uni.github.io/actions/workflows/9334935/runs" -q '.workflow_runs[0].id')
+          echo "Last runid was $RUNID"
+          gh api -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" "/repos/buddhist-uni/buddhist-uni.github.io/actions/runs/$RUNID/logs" > logs.zip
+          unzip logs.zip "$LOGFILE"
+          mv "$LOGFILE" "lycheeout.txt"
+          python extracturls.py
+          python filterurls.py # creates scripts/archivable_urls/filteredurls.txt
           cd ~
           printf "${{ secrets.ARCHIVE_ORG_AUTH }}" > archive.org.auth
           pip install tqdm
-      - name: Run the Site Archiver
+      - name: Archive Archivable External Links
+        shell: bash
+        run: |
+          python -c "from scripts.archive_site import *; urls = Path('scripts/archivable_urls/filteredurls.txt').read_text().split(); archive_urls(urls)"
+      - name: Archive Internal Pages
         run: |
           python scripts/archive_site.py
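
The `python -c` one-liner in the "Archive Archivable External Links" step reads `filteredurls.txt`, splits it on whitespace, and passes the resulting list to `archive_urls`. A minimal sketch of the same flow as a small script might look like the following. The helper names `load_urls` and `archive_all` are hypothetical and not part of the repository, the de-duplication is an extra step the one-liner does not perform, and `archive_one` stands in for whatever the real `archive_urls` does per URL, which this commit does not show.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def load_urls(path: Path) -> List[str]:
    """Read whitespace-separated URLs from a file.

    Mirrors `Path(...).read_text().split()` from the workflow step, and
    additionally drops repeated URLs while preserving first-seen order
    (an assumption added for this sketch).
    """
    seen = set()
    urls = []
    for url in path.read_text().split():
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def archive_all(urls: Iterable[str], archive_one: Callable[[str], None]) -> int:
    """Call `archive_one` on each URL and return how many were submitted.

    `archive_one` is a hypothetical callback; in the workflow the whole
    list is handed to the repository's own `archive_urls` instead.
    """
    count = 0
    for url in urls:
        archive_one(url)
        count += 1
    return count
```

Keeping the file parsing separate from the archiving call would make the parsing testable without network access, which a `python -c` string does not allow.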
