# OCDS for Infrastructure - Ukraine Demonstrator

## Part 1: Scraping data from CoST Ukraine portal

Navigate to the [CoST Ukraine Portal](http://portal.costukraine.org)

*Note: Use Google Chrome with auto-translation enabled, unless you speak Ukrainian*

Currently only projects of Ukravtodor, the State Highways Agency, are listed, so click the Ukravtodor logo.

Use the map to choose the region whose projects you want to list, e.g. the [Sumy region](http://portal.costukraine.org/proekti/ukravtodor/sumska-oblast/)

Auto-translate doesn't work on the projects list page, so click "ТАБЛИЦІ" in the grey header bar to get a view which can be translated.

*Note: You might need to open the "ТАБЛИЦІ" link in a new tab to get it to load*

Choose the project which you want to scrape from the list, e.g. [Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838](http://89.185.0.248:8888/UAD/SUM/PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=173&master_viewmode=0)

Set the following variables based on the URL of the project you selected:

In [51]:
#update with url segment for the public entity, e.g. UAD for State Highways Agency of Ukraine
publicEntity = "UAD"

#update with region for project, e.g. SUM for the Sumy region; note that for MFO projects the url construction is slightly different (see commented out html = line below)
region = "SUM"

#update with value of &fk0 parameter in URL of project you want to scrape - this identifies the project
foreignKey = "173"
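As a quick check before scraping, the three variables can be combined into the project URL and compared with the address shown in the browser. This is only a sketch: the host and handler name are copied from the example project link above, while `BASE_URL`, `HANDLER` and `project_url` are illustrative names that are not part of the notebook.

```python
# Host and handler name are taken from the example project URL in this notebook.
BASE_URL = "http://89.185.0.248:8888/"
HANDLER = "dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler"


def project_url(public_entity, region, foreign_key):
    """Build the URL of the first (technical details) page of a project."""
    return (
        f"{BASE_URL}{public_entity}/{region}/PROJECTS.php"
        f"?hname={HANDLER}&fk0={foreign_key}&master_viewmode=0"
    )


# Values for the example bridge project in the Sumy region.
example_url = project_url("UAD", "SUM", "173")
print(example_url)
```

If the printed URL does not match the one in the browser's address bar, one of the three values is probably wrong. MFO projects use a different URL layout, so this helper does not apply to them.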

Run the following script to scrape the data for your project:

In [52]:
from requests import get
from bs4 import BeautifulSoup
import pprint
import json

#function to scrape content of main table
def scrape(url,output):
    
    #get html and convert to nice object
    html = get(url,stream=True).content
    html = BeautifulSoup(html, "html.parser")
    
    #get name of section we are scraping and create an object for it
    section = html.body["id"]
    output[section] = {} 
    
    main_table = html.find("div", class_="well")
    
    if main_table is not None:
        for td in main_table.select("td"):
            if "data-column-name" in td.attrs:
                output[section][td["data-column-name"]] = td.text
    
    return output

#get html of first page and convert to nice object
html = get("http://89.185.0.248:8888/"+publicEntity+"/"+region+"/PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=" + foreignKey + "&master_viewmode=0",stream=True).content
# use the following line for MFO projects
# html = get("http://portal.costukraine.org/uad_mfo/PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=" + foreignKey + "&master_viewmode=0",stream=True).content
html = BeautifulSoup(html, "html.parser")

#set up array to store urls for each view of project
urls = []

#get urls of each page
navigation = html.find("ul", class_="nav nav-tabs grid-details-tabs")
for li in navigation.select("li"):
    urls.append(li.a["href"])
    
#put amendments URL in separate variable (no data found for this page yet, so we don't do anything with this)
amendmentsURL = urls.pop()

#set up object for scraped data
data = {}

#scrape summary table (appears on each page, so only do this once)
data["summary"] = {}

summary_table = html.find("div", class_="grid grid-table grid-master js-grid")

for th in summary_table.select("th"):
    data["summary"][th["data-name"]] = ""
        
for td in summary_table.select("td"):
    if "data-column-name" in td.attrs:
        data["summary"][td["data-column-name"]] = td.text

#scrape main table on each page
for url in urls:
    print("scraping " + url)
    data = scrape("http://89.185.0.248:8888/"+publicEntity+"/"+region+"/" + url, data)

print("done scraping")

scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_DETAILS_handler&fk0=173&master_viewmode=0
scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_SUBJECTS_handler&fk0=173&master_viewmode=0
scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_CUSTOMER_handler&fk0=173&master_viewmode=0
scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_FINANCING_handler&fk0=173&master_viewmode=0
scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_PROJECT_ORGANIZATION_handler&fk0=173&master_viewmode=0
scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_CONTRACTOR_handler&fk0=173&master_viewmode=0
scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_ENG_SUPERVIZORY_handler&fk0=173&master_viewmode=0
scraping PROJECTS.php?hname=dbo_PROJECTS_dbo_PROJECTS_TECH_SUPERVIZORY_handler&fk0=173&master_viewmode=0
done scraping
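To show what `scrape` collects from each page, the sketch below pulls the text of every `<td>` that carries a `data-column-name` attribute out of a small inline HTML sample. It deliberately uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages, and unlike `scrape` it does not restrict itself to the `div` with class `well`. The sample HTML, the column names `NAME` and `LENGTH`, and the class `ColumnCellParser` are all made up for illustration.

```python
from html.parser import HTMLParser


class ColumnCellParser(HTMLParser):
    """Collect the text of <td> cells that have a data-column-name attribute."""

    def __init__(self):
        super().__init__()
        self.cells = {}
        self._current = None  # column name of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            name = dict(attrs).get("data-column-name")
            if name is not None:
                self._current = name
                self.cells[name] = ""

    def handle_data(self, data):
        # Only keep text that sits inside a labelled cell.
        if self._current is not None:
            self.cells[self._current] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None


# Hypothetical markup, loosely shaped like one row of a portal table.
SAMPLE_HTML = """
<div class="well"><table><tr>
<td data-column-name="NAME">Bridge reconstruction</td>
<td>unlabelled cell</td>
<td data-column-name="LENGTH">120 m</td>
</tr></table></div>
"""

parser = ColumnCellParser()
parser.feed(SAMPLE_HTML)
cells = parser.cells
print(cells)
```

Cells without a `data-column-name` attribute are skipped, which mirrors the `if "data-column-name" in td.attrs` check in `scrape`.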


## Part 2: Finding related contracts in Prozorro


Using the **untranslated** name of the project, search the [Prozorro front end](https://prozorro.gov.ua) for related tenders.

### Example

The name of our project:

> Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838

Was translated from:

> Реконструкція мостового переходу на автомобільній дорозі Н-12 Суми-Полтава км 70+838

Searching for the name of the highway (Н-12 Суми-Полтава) returns [67 results](https://prozorro.gov.ua/tender/search?query=%D0%9D-12%20%D0%A1%D1%83%D0%BC%D0%B8-%D0%9F%D0%BE%D0%BB%D1%82%D0%B0%D0%B2%D0%B0). Manually reviewing the results to identify those relating to the bridge at km 70+838 yields 6 contracts:

https://prozorro.gov.ua/tender/UA-2017-10-24-000686-a (ecd7008713ce40898a0b8a9e725cd75a)

https://prozorro.gov.ua/tender/UA-2017-10-24-000672-a (1711f9b90d7244d68aa47d02f538e0f9)

https://prozorro.gov.ua/tender/UA-2017-06-22-000543-b (1dc0ebaf0c3e4330bf242a91f39579e9)

https://prozorro.gov.ua/tender/UA-2017-08-18-000844-c (58c516cadcfe4767852e94d63860aed6)

https://prozorro.gov.ua/tender/UA-2016-12-14-000431-a (d87604c7236a43a09b5ec4249cc5cb84)

https://prozorro.gov.ua/tender/UA-2016-08-22-000314-b (2f5da1f7f080416e91b3fd655abffc85)

The individual tender view in the Prozorro front end includes the identifier needed to retrieve the related record from the [openprocurement API](https://public.api.openprocurement.org/api/2/tenders/) (shown in brackets above).
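The search step in the example above can be reproduced in code. The sketch below percent-encodes the Ukrainian highway name into the same search link used in the example, and adds a small helper for spotting titles that mention the bridge's chainage regardless of spacing around the `+`. The function names `search_url` and `mentions_chainage` are illustrative; the helper is only a rough pre-filter and does not replace the manual review.

```python
import re
from urllib.parse import quote

# Search endpoint as it appears in the example link above.
SEARCH_URL = "https://prozorro.gov.ua/tender/search?query="


def search_url(query):
    """Return the Prozorro front-end search link for a free-text query."""
    return SEARCH_URL + quote(query)


def mentions_chainage(title, chainage="70+838"):
    """Check whether a title mentions the chainage, ignoring whitespace.

    Whitespace is removed first so that "70 + 838", "70 +838" and "70+838"
    all compare equal. This is a plain substring test, so it can give false
    positives (for example on a longer number that contains the same digits).
    """
    return chainage in re.sub(r"\s+", "", title)


highway_search = search_url("Н-12 Суми-Полтава")
print(highway_search)
```

For instance, `mentions_chainage("Reconstruction of the bridge crossing on the highway N-12 Sumy-Poltava km 70 + 838")` is true, while a title that only mentions a different kilometre marker is not.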


## Part 3: Downloading related tenders from openprocurement API

Populate the following array with the tender identifiers you found in the previous step:

In [None]:
identifiers = ["ecd7008713ce40898a0b8a9e725cd75a","1711f9b90d7244d68aa47d02f538e0f9","1dc0ebaf0c3e4330bf242a91f39579e9","58c516cadcfe4767852e94d63860aed6","d87604c7236a43a09b5ec4249cc5cb84","2f5da1f7f080416e91b3fd655abffc85"]
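It is easy to paste the public tender ID (e.g. `UA-2017-10-24-000686-a`) instead of the API identifier by mistake. The sketch below checks each entry before any request is made and builds the matching API URL. The 32-character lowercase hexadecimal pattern is an assumption based only on the six identifiers listed above, not on the API documentation, and `tender_api_urls` is an illustrative name.

```python
import re

API_ROOT = "https://public.api.openprocurement.org/api/2/tenders/"

# Assumption: identifiers look like the six examples in this notebook,
# i.e. 32 lowercase hexadecimal characters.
HEX_ID = re.compile(r"[0-9a-f]{32}")


def tender_api_urls(identifiers):
    """Return one API URL per identifier, rejecting unexpected formats."""
    urls = []
    for identifier in identifiers:
        cleaned = identifier.strip()
        if not HEX_ID.fullmatch(cleaned):
            raise ValueError(f"unexpected tender identifier: {identifier!r}")
        urls.append(API_ROOT + cleaned)
    return urls


example_urls = tender_api_urls(["ecd7008713ce40898a0b8a9e725cd75a"])
print(example_urls[0])
```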

Run the following script to download the data from the openprocurement API:

In [53]:
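#import the request submodule explicitly; "import urllib" alone does not guarantee urllib.request is available
import urllib.request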

data["tenders"] = []

for identifier in identifiers:
    with urllib.request.urlopen("https://public.api.openprocurement.org/api/2/tenders/" + identifier) as url:
        response = json.loads(url.read().decode())
        data["tenders"].append(response["data"])
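The loop above stops at the first failed request. A possible variant, sketched here, records failures and carries on with the remaining identifiers. The `opener` parameter is a hypothetical addition that defaults to `urllib.request.urlopen` and exists only so the function can be tried out without network access; `fetch_tenders` is likewise an illustrative name.

```python
import json
import urllib.error
import urllib.request

API_ROOT = "https://public.api.openprocurement.org/api/2/tenders/"


def fetch_tenders(identifiers, opener=urllib.request.urlopen):
    """Download each tender, returning (tenders, failed).

    failed is a list of (identifier, error description) pairs for requests
    that could not be completed or whose response could not be parsed.
    """
    tenders, failed = [], []
    for identifier in identifiers:
        try:
            with opener(API_ROOT + identifier) as response:
                payload = json.loads(response.read().decode("utf-8"))
            tenders.append(payload["data"])
        except (urllib.error.URLError, ValueError, KeyError) as error:
            # URLError covers HTTP and connection errors, ValueError covers
            # invalid JSON or text encoding, KeyError a missing "data" field.
            failed.append((identifier, repr(error)))
    return tenders, failed
```

In the notebook this could be used as `data["tenders"], failed = fetch_tenders(identifiers)`, followed by a look at `failed` to see which tenders need to be retried.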


In [54]:
#save results to file
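#write as UTF-8 explicitly, since ensure_ascii=False keeps Cyrillic characters unescaped
with open("data_" + publicEntity + "_" + region + "_" + foreignKey + ".json", "w", encoding="utf-8") as output: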
    json.dump(data, output, indent = 4, ensure_ascii=False)
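Because `ensure_ascii=False` writes Cyrillic text as-is rather than as `\uXXXX` escapes, the file encoding matters when the data is read back. The sketch below does a round trip through a temporary file, assuming UTF-8 is the desired encoding. The sample record and its `NAME` key are made up for illustration.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the scraped data.
sample = {"summary": {"NAME": "Реконструкція мостового переходу"}}

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data_UAD_SUM_173.json")

    # Write with the same json.dump options as the notebook, as UTF-8.
    with open(path, "w", encoding="utf-8") as output:
        json.dump(sample, output, indent=4, ensure_ascii=False)

    # Read the file back with the same encoding.
    with open(path, encoding="utf-8") as saved:
        raw_text = saved.read()

reloaded = json.loads(raw_text)
print(reloaded == sample)
```

The raw file content stays human-readable in Ukrainian, and loading it returns the original structure.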