## **Kenya-Parliament-Bill-Scraper**

Kenya-Parliament-Bill-Scraper is an automated data extraction tool designed to retrieve bills from the official website of the Kenyan Parliament. This script navigates through the parliamentary website, specifically the "House Business" section, and collects details about bills presented in the National Assembly. It extracts the titles of the bills, their associated PDF download links, and facilitates the downloading of these bills in PDF format. This process aims at collecting bills tabled in parliament for the year 2023 and a few of 2022 by 22/09/2023. The data is to be used in another project. By automating this process, the "Kenya-Parliament-Bill-Scraper" simplifies the task of accessing and storing legislative documents for analysis, research, and documentation.


In [48]:
import os
import requests
from bs4 import BeautifulSoup
    
def scrape_bills():
    pages_to_scrape = 2
    download_folder = '/home/brian/Work/ParBills'

    if not os.path.exists(download_folder):
        os.makedirs(download_folder)

    # start scrapping
    for page_number in range(0, pages_to_scrape):
        base_url= 'http://www.parliament.go.ke/the-national-assembly/house-business/bills'
        url = f"{base_url}?page={page_number}"
        response = requests.get(url)

        if response.status_code != 200:
            break
        else:
            soup = BeautifulSoup(response.text, 'html.parser')
            bill_links = soup.find_all('span', class_='file file--mime-application-pdf file--application-pdf')

            for link in bill_links:
                pdf_link = link.find('a')['href']
                title = link.find('a')['title']

                # get pdf content and download it
                file_path = os.path.join(download_folder, f"{title}.pdf")
                pdf_response = requests.get(pdf_link)

                if pdf_response.status_code == 200:
                    with open(file_path, 'wb') as pdf_file:
                        pdf_file.write(pdf_response.content)
                        print(f"Downloaded: {title}.pdf")
                else:
                    print("Download failed!")

scrape_bills()

Downloaded: THE SOCIALHEALTH INSURANCE BILL, 2023 (NATIONAL ASSEMBLY BILLS NO. 58)-compressed.pdf.pdf
Downloaded: THE PRESERVATION OF PUBLIC SECURITY (AMENDMENT0 BILL, 2023 (NATIONAL ASSEMBLY BILLS NO. 48).pdf.pdf
Downloaded: THE PUBLIC PARTICIPATION BILL, 2023 (NATIONAL ASSEMBLY BILLS NO 52)-1.pdf.pdf
Downloaded: THE MINING (AMENDMENT), BILL, 2023 (NATIONAL ASSEMBLY BILLS NO 51)-compressed.pdf.pdf
Downloaded: THE DIGITAL HEALTH BILL, 2023 (NATIONAL ASSEMBLY NO. 57)-1.pdf.pdf
Downloaded: THE PENAL CODE (AMENDMENT) BILL, 2023 (NATIONAL ASSEMBLY BILLS NO. 55).pdf.pdf
Downloaded: THE PRISONS (AMENDMENT) BILL, 2023 (NATIONAL ASSEMBLY BILLS NO. 54).pdf.pdf
Downloaded: THE INDEPENDENT ELECTORAL AND BOUNDARIES COMMISSION (AMENDMENT) BILL, 2023 (NATIONAL ASSEMBLY BILLS NO. 50).pdf.pdf
Downloaded: Tibunals Bill, 2023.pdf.pdf
Downloaded: Gold Processing Bill, 2023.pdf.pdf
Downloaded: Senate Bill no5 of 2023 on the Cotton Industry Development Bill 2023.pdf.pdf
Downloaded: Senate Bill no6 on the N