# Pages and Redirects: [bikejc.org] / [new.bikejc.org](https://new.bikejc.org)

- Look up [bikejc.org] pages and redirects (from [`pages/` dir][pages] / [`redirects.json`])
- Check HTTP status codes at [new.bikejc.org] for each page, redirect source, and redirect destination
- Output results to ["Pages and redirects"] gsheet

[bikejc.org]: https://bikejc.org
[pages]: https://github.com/bikejc/bikejc.github.io/tree/main/pages
[`redirects.json`]: https://github.com/bikejc/bikejc.github.io/blob/main/redirects.json
["Pages and redirects"]: https://docs.google.com/spreadsheets/d/1dA7hR7kl74Hsvj0mBA9By3T3y3Fkd4jiwjsEWTeWRNw/edit

In [1]:
from utz import *
from requests import get, head
chdir(dirname(getcwd()))  # move to repo root

All Git-tracked files:

In [3]:
git_paths = process.lines('git', 'ls-files')
git_paths = Series(git_paths, name='path')
git_paths

Running: git ls-files


0                 .eslintrc.json
1      .github/workflows/ghp.yml
2                     .gitignore
3                CONTRIBUTING.md
4                      README.md
                 ...            
368      styles/system.theme.css
369              styles/user.css
370             styles/views.css
371                tsconfig.json
372           write-redirects.js
Name: path, Length: 373, dtype: object

Just the "pages" (`.tsx` files under `pages/`):

In [4]:
pages = (
    git_paths
    .str.extract(r'pages/(?P<path>[^_].*?)\.tsx').dropna().path
    .str.replace('^index$', '', regex=True)
    .str.replace('/index$', '/', regex=True)
    .sort_values()
)
pages

38                                                     
17                                               about/
14                                  about/board-members
15                                       about/founding
16                                        about/history
18                         about/non-profit-information
23                                            bike-bus/
19                                   bike-bus/aqua-line
20                                   bike-bus/blue-line
21                                   bike-bus/gold-line
22                                  bike-bus/green-line
25                                         bike-bus/map
26                                 bike-bus/orange-line
27                                   bike-bus/pink-line
28                                 bike-bus/purple-line
29                                    bike-bus/red-line
30                                 bike-bus/silver-line
31                                   bike-bus/te
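
The extraction above can be mirrored without pandas to spot-check individual paths. This is a minimal sketch, not the notebook's method: `page_route` and `PAGE_RE` are hypothetical names, the sample inputs are illustrative, and unlike the cell above the pattern is anchored at both ends (an assumption, so that only paths starting with `pages/` and ending in `.tsx` match).

```python
import re

# Same pattern as the cell above, but anchored at both ends (an assumption).
PAGE_RE = re.compile(r'^pages/(?P<path>[^_].*?)\.tsx$')

def page_route(git_path):
    """Map a Git-tracked file path to a site route, or None if it is not a page."""
    m = PAGE_RE.match(git_path)
    if m is None:
        return None
    path = m.group('path')
    if path == 'index':
        return ''                      # pages/index.tsx is the site root
    if path.endswith('/index'):
        return path[:-len('index')]    # pages/about/index.tsx -> about/
    return path

# Illustrative inputs; files whose name starts with "_" are skipped.
sample = [
    'pages/index.tsx', 'pages/about/index.tsx',
    'pages/about/history.tsx', 'pages/_app.tsx', 'README.md',
]
routes = sorted(r for r in map(page_route, sample) if r is not None)
print(routes)
```

Routes that end in `/` come from `index.tsx` files; the rest are the ones that later turn out to redirect to a trailing-slash version.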

Helpers for checking whether a URL:
- exists (code 200)
- doesn't exist (code 404)
- is a redirect (code 301; also returns destination "dst")

In [6]:
domain = 'bikejc.org'  # current domain
page_prefix = f'https://{domain}/'

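def http_head_page(page, domain=None):
    """Send a HEAD request for `page` (a full URL, or a path relative to `domain`)."""
    if re.match('^https?://', page):
        url = page
    else:
        if domain is None:
            domain = 'bikejc.org'
        prefix = f'https://{domain}/'
        if page.startswith('/'):
            page = page[1:]
        url = f'{prefix}{page}'
    # `requests.head` does not follow redirects by default, so a 301 is returned as-is
    resp = head(url)
    return resp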

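def http_head_page_code(page, domain=None):
    """Return the status code for `page` and, for a 301, its destination ("dst") relative to the site root."""
    resp = http_head_page(page, domain=domain)
    code = resp.status_code
    if code == 301:
        dst = resp.headers['Location']
        # Strip the prefix of the domain that was queried, so redirects on another `domain` are recognized too
        prefix = page_prefix if domain is None else f'https://{domain}/'
        if dst.startswith(prefix):
            dst = dst[len(prefix):]
        else:
            err(f"Unrecognized 301 redirect dst: {dst}")
    else:
        dst = None
    return dict(code=code, dst=dst)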

def http_head_page_codes(s, domain=None):
    return sxs(
        s.rename('path'),
        s.apply(http_head_page_code, domain=domain).apply(Series).astype({ 'code': int }),
    )
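
The helpers above treat only 301 as a redirect and expect an absolute `Location` on the same site. As a hedged sketch of a more tolerant variant, the network-free functions below build the request URL and normalize a redirect destination. `build_url`, `redirect_dst`, and `REDIRECT_CODES` are hypothetical names, and handling 302/303/307/308 or relative `Location` values is an assumption, not something the notebook does.

```python
from urllib.parse import urlsplit

# Assumption: treat every common HTTP redirect status as a redirect, not just 301.
REDIRECT_CODES = {301, 302, 303, 307, 308}

def build_url(page, domain='bikejc.org'):
    """Return `page` unchanged if it is a full URL, else join it to `domain`."""
    if page.startswith(('http://', 'https://')):
        return page
    if page.startswith('/'):
        page = page[1:]
    return f'https://{domain}/{page}'

def redirect_dst(code, location, domain='bikejc.org'):
    """Return the redirect target: site-relative if it is on `domain`, the full
    URL if it is off-site, or None when the response is not a redirect."""
    if code not in REDIRECT_CODES or not location:
        return None
    parts = urlsplit(location)
    if parts.netloc and parts.netloc != domain:
        return location                      # off-site: keep the full URL
    dst = parts.path.lstrip('/')
    return f'{dst}?{parts.query}' if parts.query else dst

print(build_url('/wt', domain='new.bikejc.org'))
print(redirect_dst(301, 'https://bikejc.org/about/history/'))
```

Keeping URL construction and response interpretation separate from the HTTP call makes both easy to check without touching the network.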

## Check pages' existence at [bikejc.org](https://bikejc.org)
Some page routes (inferred from repo file paths above) actually redirect to the same path but with a "/" appended:

In [8]:
%%time
codes = http_head_page_codes(pages)
codes

CPU times: user 2.96 s, sys: 162 ms, total: 3.12 s
Wall time: 11.1 s


Unnamed: 0,path,code,dst
38,,200,
17,about/,200,
14,about/board-members,301,about/board-members/
15,about/founding,301,about/founding/
16,about/history,301,about/history/
18,about/non-profit-information,301,about/non-profit-information/
23,bike-bus/,200,
19,bike-bus/aqua-line,301,bike-bus/aqua-line/
20,bike-bus/blue-line,301,bike-bus/blue-line/
21,bike-bus/gold-line,301,bike-bus/gold-line/


Separate the 200s from the 301s:

In [9]:
is_200 = codes.dst.isna()
redirects = codes[~is_200]
oks = codes[is_200]
redirects

Unnamed: 0,path,code,dst
14,about/board-members,301,about/board-members/
15,about/founding,301,about/founding/
16,about/history,301,about/history/
18,about/non-profit-information,301,about/non-profit-information/
19,bike-bus/aqua-line,301,bike-bus/aqua-line/
20,bike-bus/blue-line,301,bike-bus/blue-line/
21,bike-bus/gold-line,301,bike-bus/gold-line/
22,bike-bus/green-line,301,bike-bus/green-line/
25,bike-bus/map,301,bike-bus/map/
26,bike-bus/orange-line,301,bike-bus/orange-line/
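
Splitting on the status code itself, rather than on whether "dst" is set, would also keep other codes such as 404 apart. A minimal sketch under that assumption follows; `group_by_code` is a hypothetical helper, and the 404 row is made up for illustration.

```python
from collections import defaultdict

def group_by_code(rows):
    """Group page paths by HTTP status code."""
    groups = defaultdict(list)
    for row in rows:
        groups[row['code']].append(row['path'])
    return dict(groups)

# The first two rows mirror the output above; the 404 row is invented.
rows = [
    {'path': 'about/', 'code': 200, 'dst': None},
    {'path': 'about/history', 'code': 301, 'dst': 'about/history/'},
    {'path': 'no-such-page', 'code': 404, 'dst': None},
]
groups = group_by_code(rows)
print(groups)
```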


Verify that the redirected pages all resolve:

In [10]:
%%time
codes2 = http_head_page_codes(redirects.dst)
assert codes2.dst.isna().all()
codes2

CPU times: user 1.72 s, sys: 135 ms, total: 1.85 s
Wall time: 8.43 s


Unnamed: 0,path,code,dst
14,about/board-members/,200,
15,about/founding/,200,
16,about/history/,200,
18,about/non-profit-information/,200,
19,bike-bus/aqua-line/,200,
20,bike-bus/blue-line/,200,
21,bike-bus/gold-line/,200,
22,bike-bus/green-line/,200,
25,bike-bus/map/,200,
26,bike-bus/orange-line/,200,
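
The check above covers a single hop. If a redirect could itself point at another redirect, the same idea can be written as a loop. This is only a sketch: `resolve` is a hypothetical helper, `fetch` stands in for a function like `http_head_page_code` reduced to a `(code, dst)` pair, and `FAKE_SITE` is made-up data so the example runs without network access.

```python
def resolve(path, fetch, max_hops=5):
    """Follow redirects from `path`; return (final_path, final_code, hops).

    `fetch(path)` must return a (code, dst) pair, with dst None when the
    response is not a redirect.
    """
    hops = []
    while len(hops) <= max_hops:
        code, dst = fetch(path)
        if dst is None:
            return path, code, hops
        hops.append(path)
        if dst in hops:
            raise ValueError(f'redirect loop: {hops + [dst]}')
        path = dst
    raise ValueError(f'more than {max_hops} redirects: {hops}')

# Made-up responses; unknown paths answer 404.
FAKE_SITE = {
    'about/history': (301, 'about/history/'),
    'about/history/': (200, None),
    'a': (301, 'b'),
    'b': (301, 'a'),
}

def fake_fetch(path):
    return FAKE_SITE.get(path, (404, None))

print(resolve('about/history', fake_fetch))
```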


These page paths resolved without redirecting (`/index.tsx` files):

In [12]:
oks

Unnamed: 0,path,code,dst
38,,200,
17,about/,200,
23,bike-bus/,200,
37,events/,200,
35,events/bike-jcast/,200,
49,news/,200,
45,news/articles/,200,
55,projects/,200,
58,resources/,200,
65,support/,200,


## Connect to ["Pages and redirects" gsheet](https://docs.google.com/spreadsheets/d/1dA7hR7kl74Hsvj0mBA9By3T3y3Fkd4jiwjsEWTeWRNw/edit)

In [13]:
from gspread_pandas import Spread, Client

In [14]:
spread = Spread('Pages and redirects')

## Pages (old and new site)

Staging domain for new/demo site: [new.bikejc.org](https://new.bikejc.org)

In [15]:
new_domain = 'new.bikejc.org'
new_page_prefix = f'https://{new_domain}/'

Combine pages that resolved immediately + those that redirected to a trailing-slash version:

In [16]:
paths = concat([ oks, codes2 ]).path.sort_values()
paths = (
    sxs(
        paths,
        (
            paths
            .apply(lambda path: f'{page_prefix}{path}')
            .rename('url')
        ),
        (
            paths
            .apply(lambda path: f'{new_page_prefix}{path}')
            .rename('new_url')
        ),
    )
    .reset_index(drop=True)
)
paths

Unnamed: 0,path,url,new_url
0,,https://bikejc.org/,https://new.bikejc.org/
1,about/,https://bikejc.org/about/,https://new.bikejc.org/about/
2,about/board-members/,https://bikejc.org/about/board-members/,https://new.bikejc.org/about/board-members/
3,about/founding/,https://bikejc.org/about/founding/,https://new.bikejc.org/about/founding/
4,about/history/,https://bikejc.org/about/history/,https://new.bikejc.org/about/history/
5,about/non-profit-information/,https://bikejc.org/about/non-profit-information/,https://new.bikejc.org/about/non-profit-inform...
6,bike-bus/,https://bikejc.org/bike-bus/,https://new.bikejc.org/bike-bus/
7,bike-bus/aqua-line/,https://bikejc.org/bike-bus/aqua-line/,https://new.bikejc.org/bike-bus/aqua-line/
8,bike-bus/blue-line/,https://bikejc.org/bike-bus/blue-line/,https://new.bikejc.org/bike-bus/blue-line/
9,bike-bus/gold-line/,https://bikejc.org/bike-bus/gold-line/,https://new.bikejc.org/bike-bus/gold-line/


### Check pages' existence at [new.bikejc.org](https://new.bikejc.org)

In [17]:
new_pages = http_head_page_codes(paths.new_url)
new_pages

Unnamed: 0,path,code,dst
0,https://new.bikejc.org/,200,
1,https://new.bikejc.org/about/,200,
2,https://new.bikejc.org/about/board-members/,200,
3,https://new.bikejc.org/about/founding/,200,
4,https://new.bikejc.org/about/history/,200,
5,https://new.bikejc.org/about/non-profit-inform...,200,
6,https://new.bikejc.org/bike-bus/,200,
7,https://new.bikejc.org/bike-bus/aqua-line/,200,
8,https://new.bikejc.org/bike-bus/blue-line/,200,
9,https://new.bikejc.org/bike-bus/gold-line/,200,


In [19]:
new_codes = new_pages.code.rename('new_url_code')
pages = sxs(paths, new_codes)
pages

Unnamed: 0,path,url,new_url,new_url_code
0,,https://bikejc.org/,https://new.bikejc.org/,200
1,about/,https://bikejc.org/about/,https://new.bikejc.org/about/,200
2,about/board-members/,https://bikejc.org/about/board-members/,https://new.bikejc.org/about/board-members/,200
3,about/founding/,https://bikejc.org/about/founding/,https://new.bikejc.org/about/founding/,200
4,about/history/,https://bikejc.org/about/history/,https://new.bikejc.org/about/history/,200
5,about/non-profit-information/,https://bikejc.org/about/non-profit-information/,https://new.bikejc.org/about/non-profit-inform...,200
6,bike-bus/,https://bikejc.org/bike-bus/,https://new.bikejc.org/bike-bus/,200
7,bike-bus/aqua-line/,https://bikejc.org/bike-bus/aqua-line/,https://new.bikejc.org/bike-bus/aqua-line/,200
8,bike-bus/blue-line/,https://bikejc.org/bike-bus/blue-line/,https://new.bikejc.org/bike-bus/blue-line/,200
9,bike-bus/gold-line/,https://bikejc.org/bike-bus/gold-line/,https://new.bikejc.org/bike-bus/gold-line/,200


### Write to "Pages" sheet:

In [20]:
spread.df_to_sheet(pages, index=False, sheet='Pages', replace=True)

## Load [`redirects.json`](https://github.com/bikejc/bikejc.github.io/blob/main/redirects.json)
This file contains "dynamic" redirects.

The site build process calls [`write-redirects.js`](https://github.com/bikejc/bikejc.github.io/blob/main/write-redirects.js), which generates HTML files that look like:
```html
<meta http-equiv=Refresh content="0; url=https://bikejc.regfox.com/ward-tour-2023?t=ref-wrd" />
```

In Wix, we'll probably just make these permanent (301) redirects (though it was nice to be able to change them arbitrarily with the current setup…)
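
For illustration, the meta-refresh stubs described above could be produced along these lines. This is a Python sketch, not the actual `write-redirects.js`: `refresh_html` and `render_redirects` are hypothetical names, and HTML-escaping the destination is an assumption about what a careful generator would do.

```python
from html import escape

def refresh_html(dst):
    """Return a meta-refresh tag that sends the browser to `dst` immediately."""
    return f'<meta http-equiv=Refresh content="0; url={escape(dst, quote=True)}" />'

def render_redirects(redirects):
    """Map each redirect source path to the HTML that would be served there."""
    return {src: refresh_html(dst) for src, dst in redirects.items()}

stubs = render_redirects({'/wt': '/ward-tour'})
print(stubs['/wt'])
```

Because each stub is a regular HTML page, a HEAD request for a redirect source would see a normal page response rather than a 301, which is why these redirects have to be tracked separately from the trailing-slash ones.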

In [21]:
with open('redirects.json', 'r') as f:
    dyn_redirs = json.load(f)
dyn_redirs = DF([ dict(src=src, dst=dst) for src, dst in dyn_redirs.items() ])
dyn_redirs

Unnamed: 0,src,dst
0,/bbsu,mailto:bikebus@bikejc.org?subject=Sign%20up&bo...
1,/bbsuf,/bike-bus/signup
2,/bergen-pbl,https://actionnetwork.org/petitions/build-a-be...
3,/events/jersey-city-ward-tour,/ward-tour
4,/events/jersey-city-ward-tour/volunteer,/ward-tour/2022/volunteer
5,/events/jersey-city-ward-tour/ward-tour-route,/ward-tour/2022/ward-tour-route
6,/events/jersey-city-ward-tour/finish-line-fest...,/ward-tour/2022/finish-line-festival
7,/events/jersey-city-ward-tour/ward-tour-sponsors,/ward-tour/2022/ward-tour-sponsors
8,/wt,/ward-tour
9,/wt/faq,/ward-tour/faq


### Create Wix-compatible redirects sheet
- Combine static (`<page>` → `<page>/` redirects computed previously) and dynamic (`redirects.json`) redirects
- Rename columns to match Wix's desired import format

In [22]:
stat_redirs = redirects.apply(lambda r: dict(src=f'/{r.path}', dst=f'/{r.dst}'), axis=1).apply(Series)
wix_redirs = concat([ dyn_redirs, stat_redirs ]).rename(columns={
    'src': 'Old URL',
    'dst': 'New URL',
})
wix_redirs

Unnamed: 0,Old URL,New URL
0,/bbsu,mailto:bikebus@bikejc.org?subject=Sign%20up&bo...
1,/bbsuf,/bike-bus/signup
2,/bergen-pbl,https://actionnetwork.org/petitions/build-a-be...
3,/events/jersey-city-ward-tour,/ward-tour
4,/events/jersey-city-ward-tour/volunteer,/ward-tour/2022/volunteer
5,/events/jersey-city-ward-tour/ward-tour-route,/ward-tour/2022/ward-tour-route
6,/events/jersey-city-ward-tour/finish-line-fest...,/ward-tour/2022/finish-line-festival
7,/events/jersey-city-ward-tour/ward-tour-sponsors,/ward-tour/2022/ward-tour-sponsors
8,/wt,/ward-tour
9,/wt/faq,/ward-tour/faq
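
The same combination can be sketched without pandas, for example to produce a CSV file instead of a sheet. `wix_rows` and `to_csv` are hypothetical helpers, and writing CSV is an assumption (the notebook writes to the gsheet); only the "Old URL" / "New URL" column names come from the cell above.

```python
import csv
import io

def wix_rows(dyn_redirs, static_pages):
    """Combine dynamic redirects (src -> dst) with static <page> -> <page>/ ones."""
    rows = [{'Old URL': src, 'New URL': dst} for src, dst in dyn_redirs.items()]
    rows += [{'Old URL': f'/{p}', 'New URL': f'/{p}/'} for p in static_pages]
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text with an "Old URL,New URL" header."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=['Old URL', 'New URL'], lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

text = to_csv(wix_rows({'/wt': '/ward-tour'}, ['about/history']))
print(text)
```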


In [23]:
spread.df_to_sheet(wix_redirs, index=False, sheet='Page Redirects (for Wix import)', replace=True)

## Check redirects' "src" and "dst" codes at [new.bikejc.org](https://new.bikejc.org)
Combine them and write the result to the ["Redirects" sheet](https://docs.google.com/spreadsheets/d/1dA7hR7kl74Hsvj0mBA9By3T3y3Fkd4jiwjsEWTeWRNw/edit#gid=1880431312)

### Check redirects' "dst" codes

In [24]:
redir_dsts = wix_redirs['New URL']
redir_dsts = redir_dsts[redir_dsts.str.startswith('/')].drop_duplicates()
redir_dsts

1                                      /bike-bus/signup
3                                            /ward-tour
4                             /ward-tour/2022/volunteer
5                       /ward-tour/2022/ward-tour-route
6                  /ward-tour/2022/finish-line-festival
7                    /ward-tour/2022/ward-tour-sponsors
9                                        /ward-tour/faq
14                                 /ward-tour/2022/faqs
18                                            /bike-bus
19                                        /bike-bus/map
29                   /ward-tour-2023-sponsor-packet.pdf
14                                /about/board-members/
15                                     /about/founding/
16                                      /about/history/
18                       /about/non-profit-information/
19                                 /bike-bus/aqua-line/
20                                 /bike-bus/blue-line/
21                                 /bike-bus/gol

In [25]:
%%time
new_url_dst_codes = http_head_page_codes(redir_dsts, domain=new_domain)
new_url_dst_codes

Unnamed: 0,path,code,dst
1,/bike-bus/signup,200,
3,/ward-tour,200,
4,/ward-tour/2022/volunteer,200,
5,/ward-tour/2022/ward-tour-route,200,
6,/ward-tour/2022/finish-line-festival,200,
7,/ward-tour/2022/ward-tour-sponsors,200,
9,/ward-tour/faq,200,
14,/ward-tour/2022/faqs,200,
18,/bike-bus,200,
19,/bike-bus/map,200,


In [26]:
new_url_dsts = sxs(
    # drop columns with missing values (here: "dst", which is empty for non-redirects)
    new_url_dst_codes.dropna(axis=1),
    new_url_dst_codes.path.apply(lambda p: f'{new_page_prefix}{p[1:]}').rename('new_url'),
)
new_url_dsts

Unnamed: 0,path,code,new_url
1,/bike-bus/signup,200,https://new.bikejc.org/bike-bus/signup
3,/ward-tour,200,https://new.bikejc.org/ward-tour
4,/ward-tour/2022/volunteer,200,https://new.bikejc.org/ward-tour/2022/volunteer
5,/ward-tour/2022/ward-tour-route,200,https://new.bikejc.org/ward-tour/2022/ward-tou...
6,/ward-tour/2022/finish-line-festival,200,https://new.bikejc.org/ward-tour/2022/finish-l...
7,/ward-tour/2022/ward-tour-sponsors,200,https://new.bikejc.org/ward-tour/2022/ward-tou...
9,/ward-tour/faq,200,https://new.bikejc.org/ward-tour/faq
14,/ward-tour/2022/faqs,200,https://new.bikejc.org/ward-tour/2022/faqs
18,/bike-bus,200,https://new.bikejc.org/bike-bus
19,/bike-bus/map,200,https://new.bikejc.org/bike-bus/map


### Check redirects' "src" codes

In [27]:
redir_srcs = wix_redirs['Old URL']
redir_srcs

0                                                 /bbsu
1                                                /bbsuf
2                                           /bergen-pbl
3                         /events/jersey-city-ward-tour
4               /events/jersey-city-ward-tour/volunteer
5         /events/jersey-city-ward-tour/ward-tour-route
6     /events/jersey-city-ward-tour/finish-line-fest...
7      /events/jersey-city-ward-tour/ward-tour-sponsors
8                                                   /wt
9                                               /wt/faq
10                                             /wt/palm
11                                              /wt/reg
12                                         /wt/register
13                                        /wardtour/faq
14         /events/jersey-city-ward-tour/ward-tour-faqs
15                                  /ward-tour/register
16                                   /wardtour/register
17                                  /ward-tour/c

In [28]:
%%time
new_url_src_codes = http_head_page_codes(redir_srcs, domain=new_domain)
new_url_src_codes

Unnamed: 0,path,code,dst
0,/bbsu,404,
1,/bbsuf,404,
2,/bergen-pbl,404,
3,/events/jersey-city-ward-tour,200,
4,/events/jersey-city-ward-tour/volunteer,200,
5,/events/jersey-city-ward-tour/ward-tour-route,200,
6,/events/jersey-city-ward-tour/finish-line-fest...,200,
7,/events/jersey-city-ward-tour/ward-tour-sponsors,200,
8,/wt,404,
9,/wt/faq,404,


In [29]:
new_url_srcs = sxs(
    # drop columns with missing values (here: "dst", which is empty for non-redirects)
    new_url_src_codes.dropna(axis=1),
    new_url_src_codes.path.apply(lambda p: f'{new_page_prefix}{p[1:]}').rename('new_url'),
)
new_url_srcs

Unnamed: 0,path,code,new_url
0,/bbsu,404,https://new.bikejc.org/bbsu
1,/bbsuf,404,https://new.bikejc.org/bbsuf
2,/bergen-pbl,404,https://new.bikejc.org/bergen-pbl
3,/events/jersey-city-ward-tour,200,https://new.bikejc.org/events/jersey-city-ward...
4,/events/jersey-city-ward-tour/volunteer,200,https://new.bikejc.org/events/jersey-city-ward...
5,/events/jersey-city-ward-tour/ward-tour-route,200,https://new.bikejc.org/events/jersey-city-ward...
6,/events/jersey-city-ward-tour/finish-line-fest...,200,https://new.bikejc.org/events/jersey-city-ward...
7,/events/jersey-city-ward-tour/ward-tour-sponsors,200,https://new.bikejc.org/events/jersey-city-ward...
8,/wt,404,https://new.bikejc.org/wt
9,/wt/faq,404,https://new.bikejc.org/wt/faq


### Combine redirects' "src" and "dst" codes

In [31]:
all_redirs = (
    wix_redirs
    .rename(columns={
        'Old URL': 'src_path',
        'New URL': 'dst',
    })
    .merge(
        new_url_srcs.rename(columns={
            'path': 'src_path',
            'new_url': 'src_new_url',
            'code': 'src_new_code',
        }),
        how='left',
        on='src_path',
    )
)
all_redirs['src_cur_url'] = all_redirs.src_path.apply(lambda p: f'{page_prefix}{p[1:]}')
all_redirs = (
    all_redirs
    .merge(
        new_url_dsts.rename(columns={
            'path': 'dst',
            'new_url': 'dst_new_url',
            'code': 'dst_new_code',
        }),
        how='left',
        on='dst',
    )
)
all_redirs['dst_new_code'] = all_redirs.dst_new_code.apply(lambda c: '' if isna(c) else str(int(c)))
all_redirs['dst_new_url'] = all_redirs['dst_new_url'].fillna('')
all_redirs = all_redirs[[ 'src_path', 'src_cur_url', 'src_new_url', 'src_new_code', 'dst', 'dst_new_url', 'dst_new_code', ]]
all_redirs

Unnamed: 0,src_path,src_cur_url,src_new_url,src_new_code,dst,dst_new_url,dst_new_code
0,/bbsu,https://bikejc.org/bbsu,https://new.bikejc.org/bbsu,404,mailto:bikebus@bikejc.org?subject=Sign%20up&bo...,,
1,/bbsuf,https://bikejc.org/bbsuf,https://new.bikejc.org/bbsuf,404,/bike-bus/signup,https://new.bikejc.org/bike-bus/signup,200.0
2,/bergen-pbl,https://bikejc.org/bergen-pbl,https://new.bikejc.org/bergen-pbl,404,https://actionnetwork.org/petitions/build-a-be...,,
3,/events/jersey-city-ward-tour,https://bikejc.org/events/jersey-city-ward-tour,https://new.bikejc.org/events/jersey-city-ward...,200,/ward-tour,https://new.bikejc.org/ward-tour,200.0
4,/events/jersey-city-ward-tour/volunteer,https://bikejc.org/events/jersey-city-ward-tou...,https://new.bikejc.org/events/jersey-city-ward...,200,/ward-tour/2022/volunteer,https://new.bikejc.org/ward-tour/2022/volunteer,200.0
5,/events/jersey-city-ward-tour/ward-tour-route,https://bikejc.org/events/jersey-city-ward-tou...,https://new.bikejc.org/events/jersey-city-ward...,200,/ward-tour/2022/ward-tour-route,https://new.bikejc.org/ward-tour/2022/ward-tou...,200.0
6,/events/jersey-city-ward-tour/finish-line-fest...,https://bikejc.org/events/jersey-city-ward-tou...,https://new.bikejc.org/events/jersey-city-ward...,200,/ward-tour/2022/finish-line-festival,https://new.bikejc.org/ward-tour/2022/finish-l...,200.0
7,/events/jersey-city-ward-tour/ward-tour-sponsors,https://bikejc.org/events/jersey-city-ward-tou...,https://new.bikejc.org/events/jersey-city-ward...,200,/ward-tour/2022/ward-tour-sponsors,https://new.bikejc.org/ward-tour/2022/ward-tou...,200.0
8,/wt,https://bikejc.org/wt,https://new.bikejc.org/wt,404,/ward-tour,https://new.bikejc.org/ward-tour,200.0
9,/wt/faq,https://bikejc.org/wt/faq,https://new.bikejc.org/wt/faq,404,/ward-tour/faq,https://new.bikejc.org/ward-tour/faq,200.0
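
The two left merges above amount to looking each redirect up in a table of "src" codes and a table of "dst" codes, leaving the "dst" columns blank when the destination was not checked (for example `mailto:` or off-site links). A dictionary-based sketch of that logic follows; `combine` is a hypothetical helper and the `https://example.org/petition` destination is a made-up stand-in for an off-site URL.

```python
OLD = 'https://bikejc.org/'
NEW = 'https://new.bikejc.org/'

def combine(redirects, src_codes, dst_codes):
    """Left-join (src, dst) pairs with the codes observed on the new site."""
    rows = []
    for src, dst in redirects:
        checked = dst in dst_codes          # only site-relative dsts were checked
        rows.append({
            'src_path': src,
            'src_cur_url': f'{OLD}{src[1:]}',
            'src_new_url': f'{NEW}{src[1:]}',
            'src_new_code': src_codes.get(src),
            'dst': dst,
            'dst_new_url': f'{NEW}{dst[1:]}' if checked else '',
            'dst_new_code': str(dst_codes[dst]) if checked else '',
        })
    return rows

rows = combine(
    [('/wt', '/ward-tour'), ('/bergen-pbl', 'https://example.org/petition')],
    src_codes={'/wt': 404, '/bergen-pbl': 404},
    dst_codes={'/ward-tour': 200},
)
for row in rows:
    print(row)
```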


### Write to "Redirects" sheet

In [32]:
spread.df_to_sheet(all_redirs, index=False, sheet='Redirects', replace=True)