### Use the Python requests library to download PDFs from online links. Here's a basic example of how to do this:

In [2]:
import requests
import os
from pathlib import Path

# Define the dictionary with PDF filenames and their corresponding URLs
dict_ = {
    'eastern_cape.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/Eastern-Cape-voting-station-list-(PDF)/',
    'free_state.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/Free-State-voting-station-list-(PDF)/',
    'gauteng.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/Gauteng-voting-station-list-(PDF)/',
    'kwazulu_natal.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/KwaZulu-Natal-voting-station-list-(PDF)/',
    'limpopo.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/Limpopo-voting-station-list-(PDF)/',
    'mpumalanga.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/Mpumalanga-voting-station-list-(PDF)/',
    'northern_cape.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/Northern-Cape-voting-station-list-(PDF)/',
    'north_west.pdf': 'https://www.elections.org.za/content/Documents/Voting-stations-and-Local-IEC-offices/2021-Municipal-Elections/North-West-voting-station-list-(PDF)/'
}

# Directory where the downloaded PDFs are saved (created if it does not exist)
notebook_dir = '/home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs'
os.makedirs(notebook_dir, exist_ok=True)

# Loop through the dictionary and download PDF files
for pdf_filename, pdf_url in dict_.items():
    # Combine the notebook directory and PDF filename to get the full local file path
    local_file_path = os.path.join(notebook_dir, pdf_filename)

    # Send an HTTP GET request to the PDF URL (the timeout avoids hanging on a stalled connection)
    response = requests.get(pdf_url, timeout=30)

    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Open the local file and write the content of the response to it
        with open(local_file_path, 'wb') as pdf_file:
            pdf_file.write(response.content)

        print(f"Downloaded {pdf_filename} and saved to {local_file_path}")
    else:
        print(f"Failed to download {pdf_filename}. Status code: {response.status_code}")


Downloaded eastern_cape.pdf and saved to /home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs/eastern_cape.pdf
Downloaded free_state.pdf and saved to /home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs/free_state.pdf
Downloaded gauteng.pdf and saved to /home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs/gauteng.pdf
Downloaded kwazulu_natal.pdf and saved to /home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs/kwazulu_natal.pdf
Downloaded limpopo.pdf and saved to /home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs/limpopo.pdf
Downloaded mpumalanga.pdf and saved to /home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs/mpumalanga.pdf
Downloaded northern_cape.pdf and saved to /home/mss/Desktop/aimodelx/act_data_science_workstream/data/downloaded_vd_pdfs/northern_cape.pdf
Downloaded north_west.pdf and saved to /home/mss/Desktop/aimodelx
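
A status code of 200 only confirms that the server answered; it does not confirm that the body is a PDF. The URLs above end in `/` rather than `.pdf`, so a server could in principle return an HTML page instead. The following is a minimal sketch of a sanity check, assuming it is enough to test for the `%PDF-` signature that well-formed PDF files begin with. The helper name `looks_like_pdf` and the sample payloads are hypothetical and not part of the original notebook.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF header signature."""
    return data.startswith(b"%PDF-")


# Hypothetical sample payloads standing in for response.content
pdf_bytes = b"%PDF-1.7\n..."
html_bytes = b"<!DOCTYPE html><html>...</html>"

print(looks_like_pdf(pdf_bytes))   # True
print(looks_like_pdf(html_bytes))  # False
```

In the download loop, the success condition could then become `response.status_code == 200 and looks_like_pdf(response.content)`.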

### To extract text from a PDF using Python, you can follow these steps:


### 1.) PDF Text Extraction in Python using PyPDF2

This code snippet shows how to extract text from a PDF file using Python and the PyPDF2 library. The script opens a PDF document, iterates through its pages, and collects the text of each page into a single string. The resulting string can then be used for tasks such as data analysis or text mining.

Install Required Libraries:
First, you'll need to install the necessary Python library. PyPDF2 extracts the text that is embedded in a PDF. Scanned images contain no embedded text and need OCR (Optical Character Recognition) instead, for example through PyMuPDF, which can run OCR via Tesseract; OCR is not used in this notebook.

You can install PyPDF2 using pip:

    pip install PyPDF2


### 2.) Extract Text from PDF
Use PyPDF2 to extract text from the PDF. The code snippet below shows you how to do this:

In [9]:
import PyPDF2

def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, or '' if it cannot be read."""
    try:
        # Open the PDF in read-binary mode; the with block closes it even on error
        with open(pdf_path, 'rb') as pdf_file:
            # Create a PDF reader object
            pdf_reader = PyPDF2.PdfReader(pdf_file)

            # Join the text of each page; pages without a text layer contribute ''
            return ''.join(page.extract_text() or '' for page in pdf_reader.pages)
    except Exception as e:
        # Report the problem rather than returning the error message as if it were PDF text
        print(f"Could not extract text from {pdf_path}: {e}")
        return ''
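
PyPDF2 reads only the text layer embedded in a PDF, so a scanned page typically yields an empty string. The sketch below flags pages whose extracted text is empty or very short, which can help decide whether a file needs OCR. The helper name `pages_without_text` and the `min_chars` threshold are hypothetical choices, not part of the original notebook. The function takes a plain list of strings so that it runs without a PDF at hand.

```python
from typing import List, Optional


def pages_without_text(page_texts: List[Optional[str]], min_chars: int = 1) -> List[int]:
    """Return 1-based page numbers whose text, after stripping surrounding
    whitespace, has fewer than min_chars characters."""
    flagged = []
    for page_number, text in enumerate(page_texts, start=1):
        stripped = (text or "").strip()
        if len(stripped) < min_chars:
            flagged.append(page_number)
    return flagged


# Hypothetical per-page results, e.g. [page.extract_text() for page in pdf_reader.pages]
sample_pages = ["Province Municipality Ward", "", "   \n", None, "Eastern Cape"]
print(pages_without_text(sample_pages))  # [2, 3, 4]
```

If every page of a file is flagged, the PDF is probably image-only and PyPDF2 alone will not recover its contents.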


In [15]:
provinces = dict_.keys()
extracted_text = ''
for prov in provinces:
    pdf_path = os.path.join(notebook_dir, prov)
    extracted_text += extract_text_from_pdf(pdf_path)

Province Municipality Ward VD Number Voting Station Name Address
Eastern Cape BUF - Buffalo City 29200001 10590151 PEFFERVILLE CLINIC3619 ROTTERDAM ROAD  PEFFERVILLE  EAST 
LONDON
Eastern Cape BUF - Buffalo City 29200001 10590588 EAST LONDON HIGH SCHOOL70 MAPLE LEAF AVENUE  BRAELYN  EAST 
LONDON
Eastern Cape BUF - Buffalo City 29200001 10590858SHAD MASHOLOGU MEMORIAL 
BAPTIST CHURCH117 BASHE STREET  DUNCAN VILLAGE  EAST 
LONDON
Eastern Cape BUF - Buffalo City 29200001 10590869 MASAKHE PUBLIC SCHOOL35 MAZWI STREET  DUNCAN VILLAGE  EAST 
LONDON
Eastern Cape BUF - Buffalo City 29200001 10591297 BRAELYN COMMUNITY HALL VERSUVIUS ROAD  BRAELYN   EAST LONDON
Eastern Cape BUF - Buffalo City 29200002 10590847ST PETERS CLOVER CATHOLIC 
CHURCH 2 GODLO STREET  DUNCAN VILLAGE  EAST 
LONDON
Eastern Cape BUF - Buffalo City 29200002 10591017 GOMPO COMMUNITY HALLDOUGLAS SMITH HIGHWAY  GOMPO  EAST 
LONDON
Eastern Cape BUF - Buffalo City 29200002 10591039 NCEDANI DAY CARE CENTREBEBELELE STREET  DUNCAN VI