# I&E Study 7.1 Automated Stakeholder Analysis for Hedera


In this notebook we demonstrate how to scrape, clean, analyse and visualise data from different sources in order to perform a stakeholder analysis for Hedera.

## Scraping

First we gather data from the website https://seepnetwork.org. More specifically, we collect the profile data of the SEEP Network's members.

For this we define a function that downloads a page, or, if the file already exists in our local folder, loads it from there to reduce network traffic to their website.

In [82]:
import requests
import os
from pathlib import Path

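def load_page(name, path):
    """Return the HTML of `name`, using the local copy under `path` if one exists."""
    displayname = name.replace('https://', '')
    full_path = path + displayname

    try:
        # Cache hit: read the copy stored by an earlier run.
        with open(full_path, 'r') as f:
            content = f.read()
    except (OSError, IOError):
        # Cache miss: download the page and store it for later runs.
        response = requests.get(name)
        directory = os.path.dirname(full_path)
        Path(directory).mkdir(parents=True, exist_ok=True)
        with open(full_path, 'wb') as f:
            f.write(response.content)
        content = response.text

    # Return a string without newlines or tabs in both cases.
    return content.replace("\n", "").replace("\t", "")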
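
The caching behaviour described above can be checked without touching the network. The following is a minimal sketch, not the notebook's own function: `load_cached`, `fake_fetch` and the `example.org` URL are hypothetical names introduced for illustration, and the downloader is passed in as a parameter so that a stub can stand in for `requests.get`.

```python
import tempfile
from pathlib import Path


def load_cached(url, cache_dir, fetch):
    """Return the page body for `url`, calling `fetch` only on a cache miss."""
    # Same path scheme as load_page: the URL without its scheme, under cache_dir.
    cache_file = Path(cache_dir) / url.replace("https://", "")
    if cache_file.is_file():
        return cache_file.read_text(encoding="utf-8")
    body = fetch(url)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(body, encoding="utf-8")
    return body


# Hypothetical stand-in for a real download; it records every call it receives.
calls = []


def fake_fetch(url):
    calls.append(url)
    return "<html><body>stub page</body></html>"


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached("https://example.org/Profiles", tmp, fake_fetch)
    second = load_cached("https://example.org/Profiles", tmp, fake_fetch)

# The second call is served from disk, so the stub is invoked only once.
print(first == second, len(calls))
```

Injecting the downloader keeps the example self-contained; in the notebook itself the download is done directly with `requests.get`.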


First we will scrape the main page.

In [89]:
file_dir = "webpages/"
parent_page = "https://seepnetwork.org"
main_content = load_page(parent_page + "/Profiles", file_dir)

Next we collect the links to the individual member profile pages. To do so, we select every anchor element inside the profile button of each list item and extract its href attribute.

In [90]:
from bs4 import BeautifulSoup

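# Name the parser explicitly so the result does not depend on which parsers are installed.
bs = BeautifulSoup(main_content, "html.parser")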

sub_links = []
for link in bs.select(".mapListViewItem .button.border.blue a"):
    sub_links.append(link["href"])
    
print(len(sub_links))

94


^ Number of members found.
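
To see what the selector used above matches, it can be tried on a small hand-written snippet. This is only a sketch: the markup and the href values below are hypothetical and much simpler than the real page; only the class names are taken from the selector in the cell above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup that reuses the class names from the selector.
sample_html = """
<div class="mapListViewItem">
  <h4>Member A</h4>
  <div class="button border blue"><a href="/Profiles/member-a">View profile</a></div>
</div>
<div class="mapListViewItem">
  <h4>Member B</h4>
  <div class="button border blue"><a href="/Profiles/member-b">View profile</a></div>
</div>
<div class="footer"><a href="/Contact">Contact</a></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Only anchors nested inside a list item's blue bordered button are selected;
# the footer link is ignored because it has no matching ancestors.
sample_links = [a["href"] for a in soup.select(".mapListViewItem .button.border.blue a")]
print(sample_links)
```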

To get the members' data we download each profile page and parse the needed fields into a list for further processing.

In [133]:
data = []

for link in sub_links:
    content = load_page(parent_page+link, file_dir)
    bs = BeautifulSoup(content, "html.parser")
    parts = bs.select(".sidebar.left")
    
    name = parts[0].select_one("h3").string.strip()
    # Note: [0] keeps only the first character of the paragraph text.
    years_of_membership = parts[0].select_one(".sidebarRight > p").string[0]
    location = parts[0].select_one(".sidebarRight .twoColLeft > p").contents[-1].strip()
    website = parts[0].select_one(".sidebarRight .twoColRight a")['href'].strip()
    org_type = parts[0].select_one(".sidebarRight .twoColRight > p").contents[-1].strip()
    mission_statement = parts[1].select(".sidebarRight > p")[0].contents[-1].strip()
    countries_of_involvement = parts[1].select(".sidebarRight > p")[1].contents[-1].strip()
    practice_areas = parts[1].select(".sidebarRight > p")[2].contents[-1].strip()
    
    data.append([name, years_of_membership, location, website, org_type, mission_statement, countries_of_involvement, practice_areas])
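
Several fields above are read with `.contents[-1].strip()` rather than `.string`. The sketch below illustrates why, under the assumption that a field paragraph holds a label element followed by the value as plain text; the markup, the value "Example City" and the helper `last_text` are hypothetical and not part of the original notebook.

```python
from bs4 import BeautifulSoup

# Hypothetical field markup: a label tag followed by the value as trailing text.
sample = '<div class="sidebarRight"><p><strong>Location:</strong> Example City </p></div>'
soup = BeautifulSoup(sample, "html.parser")
p = soup.select_one(".sidebarRight > p")

# .string is None because the paragraph has more than one child node.
print(p.string)

# .contents lists the child nodes in order; the last one is the trailing text.
print(repr(p.contents[-1].strip()))


def last_text(node):
    """Return the stripped trailing text of `node`, or None if it is unavailable."""
    if node is None or not node.contents:
        return None
    tail = node.contents[-1]
    # The last child may be plain text or a nested tag.
    return tail.strip() if isinstance(tail, str) else tail.get_text(strip=True)


print(last_text(p), last_text(soup.select_one(".missing")))
```

A helper of this kind returns `None` for a missing element instead of raising an exception, which may be useful if some profile pages omit a field.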

Finally we write the parsed information to a CSV file, which can later be used as input for the machine learning algorithms.

In [135]:
import csv
with open('member_data.csv', 'w', newline='') as f:  # the file is closed when the block ends
    wtr = csv.writer(f, delimiter=',', lineterminator='\n')
    wtr.writerows(data)
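
The file written above has no header row, so later steps have to rely on column positions. One possible variation, shown here only as a sketch, writes a header first and reads the rows back by name. The column names are taken from the variable names in the parsing cell; the sample row is hypothetical, and an in-memory buffer replaces the file so the example leaves nothing on disk.

```python
import csv
import io

# Column names mirror the variables used in the parsing loop.
FIELDS = [
    "name", "years_of_membership", "location", "website", "org_type",
    "mission_statement", "countries_of_involvement", "practice_areas",
]

# Hypothetical row, not real member data; two fields contain commas on purpose.
rows = [[
    "Example Org", "3", "Example City", "https://example.org", "NGO",
    "An example mission, with a comma.", "Country A, Country B", "Area 1",
]]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
writer.writerow(FIELDS)   # header row
writer.writerows(rows)    # fields containing commas are quoted automatically

buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]["name"], "|", records[0]["countries_of_involvement"])
```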

## Analyzing the data
