# Health Services for "Abuse, Rape and Domestic Violence Survivor Support"

This notebook scrapes the data from https://www.healthsites.org.za <br>
Description taken from the website:
```
ABOUT HEALTH SITES
Health Sites contains an up-to-date database of health facilities in South Africa, which offer services such as free Medical Male Circumcision (MMC) , HIV Treatment, HIV Counselling and HIV Testing (HCT). Find an MMC clinic and get circumcised. Find a HIV Testing Centre and get tested.

Medical male circumcision (MMC) is the most hygienic, safest way to be circumcised, and the only way to ensure that you get the full sexual and health benefits. Medical circumcisions are performed at MMC clinics and hospitals in South Africa.
```
It specifically targets health services for "Abuse, Rape and Domestic Violence Survivor Support".<br>
The Province and health service has to be manually selected.<br>
The resultant page is saved below to be scraped.

I acquired the Name, adress, telephone number and gps coordinates for each centre.<br>
The data will be saved in csv format for further use.

There are other information that are available such as:
- services offered
- opening hours
- municipality

In [68]:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from time import sleep

In [6]:
eastern_cape_url = "https://www.healthsites.org.za/clinics-in-eastern-cape.html?field_category_tid=36620&widget-service=36620&widget-province=22950"

In [69]:
northern_cape_url = 'https://www.healthsites.org.za/clinics-in-northern-cape.html?field_category_tid=36620&widget-service=36620&widget-province=31856'

In [70]:
western_cape_url = 'https://www.healthsites.org.za/clinics-in-western-cape.html?field_category_tid=36620&widget-service=36620&widget-province=32286'

In [71]:
free_state_url = 'https://www.healthsites.org.za/clinics-in-free-state.html?field_category_tid=36620&widget-service=36620&widget-province=24747'

In [72]:
gauteng_url = 'https://www.healthsites.org.za/clinics-in-gauteng.html?field_category_tid=36620&widget-service=36620&widget-province=25366'

In [73]:
kwazulu_url = 'https://www.healthsites.org.za/clinics-in-kwazulu-natal.html?field_category_tid=36620&widget-service=36620&widget-province=27795'

In [74]:
limpopo_url = 'https://www.healthsites.org.za/clinics-in-limpopo.html?field_category_tid=36620&widget-service=36620&widget-province=29930'

In [75]:
northwest_url = 'https://www.healthsites.org.za/clinics-in-north-west.html?field_category_tid=36620&widget-service=36620&widget-province=31171'

In [76]:
mpumalanga_url = 'https://www.healthsites.org.za/clinics-in-mpumalanga.html?field_category_tid=36620&widget-service=36620&widget-province=29532'

eastern cape done, now the rest

In [77]:
province_urls = [northern_cape_url, western_cape_url, free_state_url, gauteng_url, kwazulu_url, limpopo_url, 
                 northwest_url, mpumalanga_url]

In [78]:
csv_names = ['northern_cape_clinics', 'western_cape_clinics', 'free_state_clinics', 'gauteng_clinics', 'kwazulu_clinics', 
             'limpopo_clinics', 'northwest_clinics', 'mpumalanga_clinics']

### the scraper

This should iterate through the list of province urls,  <br>and output csv's of all the details of the clinics for each province

In [86]:
def scrape_clinics(province_url, csv_name):
    '''
    Scrapes data about each clinic listed in "province_url", and saves the data to "csv_name"
    
    '''
    page_num = 0
    base_url = 'https://www.healthsites.org.za' 
    nxt_url = '1'
    clinics = []
    
    while nxt_url:
        if not page_num:
            res = requests.get(province_url)
        else:
            res = requests.get(province_url+nxt_url)
        print(f'now scraping {page_num+1}: {eastern_cape_url}{nxt_url} ...')
        soup = BeautifulSoup(res.text, "html.parser")

        divs = soup.select('.node-business')
        for div in divs:
            h2 = div.find("h2")
            name = h2.get_text()
            adress = div.find(class_="field-item even").get_text()
            url = h2.find("a")["href"]
            print(name)
        
            # get further details from clinic page
            subres = requests.get(base_url+url)
            subsoup = BeautifulSoup(subres.text, "html.parser")
            try:
                contactno = subsoup.find(class_="field-name-field-contact-dtails").find(class_="field-item").get_text()
            except:
                contactno = None
            try:
                gps = subsoup.find(class_="field-name-field-combined-gps-coordinates").find(class_="field-item").get_text()
            except:
                gps = None
        
            # append dict of data to list
            clinics.append({
                'name' : name,
                'adress' : adress,
                'telephone' : contactno,
                'gps' : gps,
            })
            sleep(1)
        
        # locate "next" button link
        nxt_btn = soup.find(class_="next")    
        nxt_url = nxt_btn.find("a")["href"] if nxt_btn else None
        page_num += 1
        sleep(1)

    df = pd.DataFrame(clinics)
    df = df[['name', 'telephone', 'adress', 'gps', 'lattitude', 'longitude']]
    df.to_csv(f'{csv_name}.csv', index=0)

In [None]:
for province_url, csv_name in zip (province_urls, csv_names):
    print()
    print(f'Scraping {csv_name}')
    print('~~~_____________________________________________________~~~')
    scrape_clinics(province_url, csv_name)


Scraping northern_cape_clinics
~~~_____________________________________________________~~~
now scraping 1: https://www.healthsites.org.za/clinics-in-eastern-cape.html?field_category_tid=36620&widget-service=36620&widget-province=229501 ...
 Abraham Esau Hospital
 Age-in-Action - De Aar, Northern Cape
 Age-in-Action - Upington, Northern Cape
 Alexander Bay Clinic
 Alexander Bay Community Health Centre
 Alexander Bay Youth Development Centre
 Britstown Clinic
 Carnarvon Community Health Centre
 Catherine Koi Koi Clinic
 Child Welfare South Africa - Douglas
 Child Welfare South Africa - Keimoes
 City Clinic
 De Aar Hospital
 Department of Justice and Constitutional Development, Office of the Family Advocate - Kimberley
 Diamond Gay and Lesbians - Kimberely
 Dikgatlong Local Municipality
 Dingleton Clinic
 Douglas Community Health Centre
 Ethembeni Community and Trauma Centre
 Ga-Segonyana Local Municipality
 Gadiboe Clinic
 Galeshewe Community Health Clinic
 Gamagara Local Municipality
 