To populate `aclum_urls`:

- Open <https://www.aclum.org/en/legislation>
- Click "See more" repeatedly until the button disappears and every item is listed
- In the browser's DevTools console, run the following and copy the output:

    ```javascript
    Array.from(document.querySelectorAll('.listing-page-item .field-title-field a')).map(a => a.href)
    ```


In [1]:
aclum_urls = [
    'https://www.aclum.org/en/legislation/end-debt-based-incarceration-license-suspensions',
    'https://www.aclum.org/en/legislation/face-surveillance-regulation',
    'https://www.aclum.org/en/legislation/work-family-mobility',
    'https://www.aclum.org/en/legislation/safe-communities',
    'https://www.aclum.org/en/legislation/votes-act',
    'https://www.aclum.org/en/legislation/remote-access-open-meetings',
    'https://www.aclum.org/en/legislation/treatment-not-imprisonment-0',
    'https://www.aclum.org/en/legislation/fix-massachusetts-civil-rights-act',
    'https://www.aclum.org/en/legislation/massachusetts-information-privacy-act',
    'https://www.aclum.org/en/legislation/artificial-intelligence-commission-0',
    'https://www.aclum.org/en/legislation/automated-license-plate-readers',
    'https://www.aclum.org/en/legislation/reduce-reincarceration-technical-violations-parole',
    'https://www.aclum.org/en/legislation/no-cost-prison-phone-calls',
    'https://www.aclum.org/en/legislation/prevent-imposition-mandatory-minimums-based-juvenile-records',
    'https://www.aclum.org/en/legislation/qualified-immunity-reform',
    'https://www.aclum.org/en/legislation/ending-life-without-parole',
    'https://www.aclum.org/en/legislation/raise-age',
    'https://www.aclum.org/en/legislation/access-justice-0',
    'https://www.aclum.org/en/legislation/medication-opioid-use-disorder-all-correctional-facilities',
    'https://www.aclum.org/en/legislation/treatment-non-carceral-settings-people-not-accused-crimes',
    'https://www.aclum.org/en/legislation/alternatives-community-emergency-services',
    'https://www.aclum.org/en/legislation/full-spectrum-pregnancy-care',
    'https://www.aclum.org/en/legislation/access-emergency-contraception',
    'https://www.aclum.org/en/legislation/healthy-and-safety-sex-workers',
    'https://www.aclum.org/en/legislation/election-participation-eligible-incarcerated-voters',
    'https://www.aclum.org/en/legislation/emergency-paid-sick-time',
    'https://www.aclum.org/en/legislation/common-start-0',
    'https://www.aclum.org/en/legislation/right-counsel-evictions'
]

In [2]:
import json
import re
import requests
from bs4 import BeautifulSoup
from tqdm.notebook import tqdm

In [3]:
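def get_soup(url):
    """Fetch a page and parse it into a BeautifulSoup tree."""
    # A timeout keeps the scrape from hanging indefinitely on a stalled connection.
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return BeautifulSoup(response.text, "lxml")

def select_string(soup, selector):
    """Return the text of the first match for selector, or "" if nothing matches."""
    try:
        return ' '.join(soup.select_one(selector).stripped_strings)
    except AttributeError:
        # select_one returned None because the selector matched nothing.
        return ""
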
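def parse_bills(soup):
    """Extract bill numbers such as H.3453 or S.2304, returned without the dot."""
    bill_text = select_string(soup, '.field-legislation-bill')
    return [
        {
            "session": 192,  # hard-coded for every bill
            "number": b.replace('.', '')
        }
        for b in re.findall(r'([HS]\.\d+)', bill_text)
    ]
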
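def parse_legislation(url, soup):
    """Build one legislation record from a page URL and its parsed HTML."""
    return {
        # The last path segment of the URL serves as the record id.
        "id": url.split("/")[-1],
        "url": url,
        "title": select_string(soup, 'h1'),
        "description": select_string(soup, '.field-body p:not(.alt)'),
        "bills": parse_bills(soup),
    }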
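
As a quick check of the bill-number pattern used in `parse_bills`, the sketch below runs the same regular expression on a made-up string. `extract_bill_numbers` and the sample text are illustrative assumptions, not part of the notebook or taken from the ACLUM site.

```python
import re

def extract_bill_numbers(text):
    """Return bill numbers such as 'H3453' found in text written as 'H.3453'."""
    return [match.replace('.', '') for match in re.findall(r'([HS]\.\d+)', text)]

# Hypothetical sample text, for illustration only
sample = "Bill numbers: H.3453 (Rep. Example) and S.2304 (Sen. Example)"
print(extract_bill_numbers(sample))
```

Note that the pattern only matches an `H` or `S` followed directly by a dot and digits, so a number written as `H 3453` or `H3453` on the page would be skipped.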

In [4]:
aclum_legislation = [
    parse_legislation(url, get_soup(url))
    for url in tqdm(aclum_urls)
]
aclum_legislation[0]

{'id': 'end-debt-based-incarceration-license-suspensions',
 'url': 'https://www.aclum.org/en/legislation/end-debt-based-incarceration-license-suspensions',
 'title': 'End Debt-Based Incarceration & License Suspensions',
 'description': 'Eliminate license suspension and incarceration of drivers for reasons related to debt and poverty, rather than safety. Help get people back to work, to school, and to medical appointments.',
 'bills': [{'session': 192, 'number': 'H3453'},
  {'session': 192, 'number': 'S2304'}]}

In [5]:
for legislation in aclum_legislation:
    print(legislation['url'])
    print(legislation['title'])
    print(legislation['description'])
    print()

https://www.aclum.org/en/legislation/end-debt-based-incarceration-license-suspensions
End Debt-Based Incarceration & License Suspensions
Eliminate license suspension and incarceration of drivers for reasons related to debt and poverty, rather than safety. Help get people back to work, to school, and to medical appointments.

https://www.aclum.org/en/legislation/face-surveillance-regulation
Face Surveillance Regulation
Strengthen regulation of government use of racially biased and unreliable face recognition technology, which can screen, identify, and surveil people from a distance. Prevent the use of face surveillance to track or monitor every person in the Commonwealth and their activities in public places; require law enforcement to obtain a search warrant before conducting a facial recognition search; establish due process protections for people who are identified using facial recognition; and establish a centralized system for law enforcement agencies to access this technology and 

In [6]:
with open('../dist/legislation.json', 'w') as f:
    json.dump(aclum_legislation, f, indent=4)
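
Because `select_string` returns an empty string when a selector matches nothing, a change in the page layout would silently produce blank fields. Below is a minimal sketch of a check that could be run on the scraped records, assuming they are shaped like the output of `parse_legislation`. `find_incomplete` and the sample records are hypothetical and not part of the notebook.

```python
def find_incomplete(records):
    """Return ids of records with an empty title, description, or bill list."""
    return [
        record["id"]
        for record in records
        if not (record["title"] and record["description"] and record["bills"])
    ]

# Made-up records, for illustration only
sample_records = [
    {"id": "complete-example", "title": "Example", "description": "Text.",
     "bills": [{"session": 192, "number": "H1"}]},
    {"id": "missing-bills-example", "title": "Example", "description": "Text.",
     "bills": []},
]
print(find_incomplete(sample_records))
```

In the notebook this might be called as `find_incomplete(aclum_legislation)`; any ids it returns would be worth checking by hand against the live page.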