# Public Procurement Details Scraper

#### Achievements:

I managed to extract nearly all of the relevant text from a detailed public procurement page.

#### Observations:

The information is scattered across the site in a wide variety of HTML tags and classes;
Some items, such as the names of the parties, do not appear as text;
Much of the text contains runs of whitespace such as \t, \n or " ";
The data needs some cleaning, but it is complete.
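
The whitespace noise noted above can be removed with a small helper. This is a minimal sketch, assuming that collapsing every run of whitespace into a single space is acceptable for all fields; `clean_text` is a hypothetical name, not something defined elsewhere in this notebook.

```python
def clean_text(raw):
    """Collapse runs of spaces, tabs and newlines into single spaces."""
    # str.split() with no argument splits on any whitespace and drops empty pieces
    return " ".join(raw.split())


sample = "\n\n\t    Data publicare: 18.09.2019 09:36\n                \n"
print(clean_text(sample))  # Data publicare: 18.09.2019 09:36
```

Applying it to every `.text` value before storing it should be enough for the padding seen in the outputs further down.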

#### Future goals:

Iterate over the links and store the data in a standardized format.
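
One possible shape for that standardized format is a flat record serialized as one JSON object per line. The sketch below is only an illustration: `AcquisitionRecord`, its field names, `detail_url` and `to_json_line` are all hypothetical, and the URL pattern is assumed to hold for other page IDs because it matches the single detail URL used in this notebook.

```python
import json
from dataclasses import asdict, dataclass, field

# Pattern taken from the detail URL used in this notebook; assumed to hold for other IDs
DETAIL_URL = "https://www.e-licitatie.ro/pub/direct-acquisition/view/{}"


@dataclass
class AcquisitionRecord:
    """Hypothetical flat record for one direct-acquisition page."""
    page_id: int
    series: str
    published: str
    finalized: str
    cpv: str
    value_ron: str
    items: list = field(default_factory=list)


def detail_url(page_id):
    """Build the detail-page URL for a numeric page ID."""
    return DETAIL_URL.format(page_id)


def to_json_line(record):
    # ensure_ascii=False keeps Romanian diacritics readable in the output file
    return json.dumps(asdict(record), ensure_ascii=False)


record = AcquisitionRecord(
    page_id=104273557,
    series="DA23884765",
    published="18.09.2019 09:36",
    finalized="18.09.2019 12:33",
    cpv="31531000-7 - Becuri (Rev.2)",
    value_ron="39,50",
)
print(detail_url(record.page_id))
print(to_json_line(record))
```

Keeping the values as strings at this stage avoids committing to a date or number format before the cleaning rules are settled.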

### Import useful modules

In [None]:
################################################################################################################

import time

# Core of scraping
import requests
from os.path import join as pjoin
from bs4 import BeautifulSoup

# For applying a random sleep interval between requests  
from random import randint 
from time import sleep

# Need these in order to simulate human activity in Chrome/Firefox browser (clicking)
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as ec

# Regex
import regex as re
import os 
import sys

# Zip
import gzip
import shutil

# Needed for turning the date to datetime values
import datetime as dt

# Dataframes
import pandas as pd

# Twilio enables us to send SMS
from twilio.rest import Client


### Target URL

In [90]:
################################################################################################################

# We need these specific URLs in order to run the program correctly. Do not change them!
#get_the_data_url = "https://www.e-licitatie.ro/pub/direct-acquisitions/list/1"
get_the_data_url = "https://www.e-licitatie.ro/pub/direct-acquisition/view/104273557"

################################################################################################################

### Helper functions

In [79]:
################################################################################################################

def try_connecting(get_the_data_url, max_retries=5):
    """Fetch the URL, retrying on request errors. Returns the Response, or None on failure."""
    for attempt in range(max_retries):
        try:
            # Note: verify=False disables TLS certificate verification
            source_code = requests.get(get_the_data_url, timeout=30, verify=False)
            return source_code
        except requests.ConnectionError as e:
            print("Connection error. Make sure you are connected to the Internet. Technical details below.\n")
            print(str(e))
        except requests.Timeout as e:
            print("Timeout error")
            print(str(e))
        except requests.RequestException as e:
            print("General error")
            print(str(e))
        except KeyboardInterrupt:
            print("Someone closed the program")
            break
        # Wait a random interval before retrying instead of hammering the server
        sleep(randint(1, 5))
    return None

################################################################################################################

### Request source code and create Soup

In [80]:
# Request the page
source_code = try_connecting(get_the_data_url)

# BeautifulSoup needs the HTML text, not the Response object
plain_text = source_code.text
soup = BeautifulSoup(plain_text, 'html.parser')



### Selenium package (Optional)

In [91]:
# Selenium driver
driver = webdriver.Firefox() # Firefox Driver
driver.get(get_the_data_url)

In [92]:
# New source code from driver
source_code = driver.page_source
soup = BeautifulSoup(source_code, 'html.parser')

# Data Mining

# Section 0:

In [162]:
# Transaction series (ID)
h1 = soup.find("h1", {"class": "ng-binding"})
h1.text

'cumparare directa: [nr] -  DA23884765'
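
To keep only the series from that heading, a regular expression is probably enough. This is a sketch under one loud assumption: that every series looks like `DA` followed by digits, which is inferred from this single sample. `extract_series` is a hypothetical helper, and the block uses the standard-library `re` rather than the `regex` package imported at the top.

```python
import re

# Assumes the series is 'DA' followed by digits, as in the single sample above
SERIES_RE = re.compile(r"\bDA\d+\b")


def extract_series(title):
    """Return the acquisition series found in the page heading, or None."""
    match = SERIES_RE.search(title)
    return match.group(0) if match else None


print(extract_series("cumparare directa: [nr] -  DA23884765"))  # DA23884765
```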

In [173]:
# Publication and completion dates
fa = soup.find_all("span", {"class": "ng-binding"})
#for f in fa:
  #  print(f.text,"######################")
# Indices 37 and 38 are tied to this page's layout
print(fa[37].text)
print(fa[38].text)



                    Data publicare: 18.09.2019 09:36
                


                    Data finalizare: 18.09.2019 12:33
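
These two strings can be turned into `datetime` values once the padding is stripped. A minimal sketch, assuming every date on the page follows the `dd.mm.yyyy HH:MM` layout seen in this output; `parse_labeled_date` is a hypothetical helper.

```python
import datetime as dt


def parse_labeled_date(raw):
    """Split 'Label: dd.mm.yyyy HH:MM' into (label, datetime)."""
    # Split on the first colon only; the time itself also contains a colon
    label, _, value = raw.strip().partition(":")
    return label.strip(), dt.datetime.strptime(value.strip(), "%d.%m.%Y %H:%M")


label, published = parse_labeled_date("\n     Data publicare: 18.09.2019 09:36\n   ")
print(label, published.isoformat())  # Data publicare 2019-09-18T09:36:00
```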
                


# Section 1: Identification data for bidder/contracting authority

In [157]:
# The name does not appear, but the CIF (tax identification code) does
# The address appears as a single string, not split into parts; the country is separate
# Contact details
span = soup.find_all("span", {"class": "u-displayfield__field ng-binding"})
for s in span:
    print(s.text,"#######################")


                24332317
             #######################
 Strada: Mărgeanului, nr. 8, Sector: 5, Judet: Bucuresti, Localitate: Bucuresti, Cod postal: 051041 #######################
Romania #######################
- #######################
+40 214211860/+40 0722654017 #######################
+40 214201860 #######################
 doringirleanu@yahoo.com #######################

                4192910
             #######################
 Strada: Lupu Dionisie, nr. 37, Sector: 2, Judet: Bucuresti, Localitate: Bucuresti, Cod postal: 020021 #######################
- #######################
+40 0213180736 #######################
+40 0213180736 #######################
dapmf@umfcd.ro #######################
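
Since the address arrives as one string, it could be split into labelled parts afterwards. The sketch below assumes the `Label: value` pairs are always separated by `", "` and that the street number is the only unlabelled part (`nr. 37`); both assumptions come from the two samples above and may not hold for every page. `parse_address` is a hypothetical helper.

```python
def parse_address(raw):
    """Split 'Strada: X, nr. 1, Sector: 2, ...' into a dict of labelled parts."""
    fields = {}
    for part in raw.strip().split(", "):
        if ": " in part:
            key, value = part.split(": ", 1)
            fields[key.strip()] = value.strip()
        elif part.lower().startswith("nr."):
            # The street number has no 'Label: ' prefix in the samples
            fields["nr"] = part[3:].strip()
    return fields


address = (" Strada: Lupu Dionisie, nr. 37, Sector: 2, Judet: Bucuresti, "
           "Localitate: Bucuresti, Cod postal: 020021 ")
print(parse_address(address))
```

A street name that itself contains `", "` would break this split, so it is worth checking the result against a larger set of pages first.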


# Section 2: Acquisition details

In [156]:
# This has the completion date and the submission date
# CPV code and name (3155...)
# Contract type
# EU funding: Yes/No
# Direct purchase value
# Value in RON and EUR

ngbinding = soup.find_all("div", {"class": "indent"})

for c in ngbinding:
    print(c.text,"#######################")

31531000-7 - Becuri (Rev.2) #######################
Furnizare #######################


Nu
 #######################
39,50  RON (8,34  EUR) #######################
39,50  RON (8,34  EUR) #######################
 RON ( EUR) | % #######################

 RON ( EUR) |
                                %
 #######################
18.09.2019 11:20 #######################
18.09.2019 12:33 #######################
 din  #######################
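
The value strings mix both currencies and use a decimal comma, so they need parsing before any arithmetic. A sketch, assuming Romanian number formatting ('.' as thousands separator, ',' as decimal separator) and the `<amount> RON (<amount> EUR)` layout seen above; `parse_amounts` and `to_decimal` are hypothetical helpers. Empty templates such as ` RON ( EUR) | %` simply yield `None`.

```python
import re
from decimal import Decimal

# Matches e.g. '39,50  RON (8,34  EUR)'; assumes a decimal comma is always present
AMOUNT_RE = re.compile(r"([\d.]+,\d+)\s*RON\s*\(\s*([\d.]+,\d+)\s*EUR\s*\)")


def to_decimal(text):
    # Assumed Romanian formatting: '.' groups thousands, ',' separates decimals
    return Decimal(text.replace(".", "").replace(",", "."))


def parse_amounts(raw):
    """Return (ron, eur) as Decimals, or None when the field is empty."""
    match = AMOUNT_RE.search(raw)
    if match is None:
        return None
    return to_decimal(match.group(1)), to_decimal(match.group(2))


print(parse_amounts("39,50  RON (8,34  EUR)"))
print(parse_amounts(" RON ( EUR) | %"))
```

`Decimal` is used instead of `float` so that monetary values keep their exact two-digit representation.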


In [152]:
th = soup.find_all("th", {"class": "s-center"})
for t in th:
    print(t.text,"#######################")


In [153]:
# This extracts the status of the offer (accepted or not)
# It also contains the contract details
# As well as the quantity and the unit price

ngbinding = soup.find_all("span", {"class": "ng-scope"})
for c in ngbinding:
    print(c.text,"#######################")

Oferta acceptata #######################
Romania #######################
Livrarea produselor se va face in termen de 30 zil... mai departee de la data acceptarii ofertei de catre autoritatea contractanta, conditiile DDP, la UMF “Carol Davila” – Depozitul de materiale Bdul Eroii Sanitari, nr.8, Sector 5, Bucuresti mai putin #######################
Plata produselor se va face in lei, in termen de 3... mai departe0 de zile de la data primirii facturii ,                       	dupa receptia produselor. mai putin #######################
Nu #######################

                                    1
                                 #######################

                                    39,50 
                                 #######################


# Section 3: Purchased items

In [147]:
# This one is for the quantity...

td = soup.find_all("td", {"class": "col-color-1 ng-binding"})
for t in td: # Note: if we use ng-binding we also get the table's standard text
    print(t.text,"#######################")

Solicitata autoritate / entitate #######################
Ofertata operator #######################
1 #######################


                                    1
                                

 #######################


In [151]:
# This one is for the price (RON) l...

td = soup.find_all("td", {"class": "col-color-2 ng-binding"})
for t in td:
    print(t.text,"#######################")

39,50   #######################

                                39,50 
                             #######################


In [146]:
# This one is for the price (RON), the value and the total values

valoare = soup.find_all("td", {"class": "col-color-3"})

for v in valoare:
    print(v.text,"#######################")


                                39,50 
                             #######################

                                39,50 
                             #######################

                                39,50 
                             #######################

                                39,50 
                             #######################

                                39,50 
                             #######################

                                39,50 
                             #######################


In [158]:
# BOTTOM OF THE PAGE

# Reference number, catalog price, etc.
repere = soup.find_all("b", {"class": "ng-binding"})
for r in repere:
    print(r.text,"#######################")
    

TS100/820 #######################
39,50  RON / Unitate de masura #######################
BUC #######################
31531000-7 Becuri (Rev.2) #######################
 #######################


In [159]:
# MIXED

# Here we have the product, delivery, payment and description
paragraphs = soup.find_all("p", {"class": "ng-isolate-scope"})
for p in paragraphs:
    print(p.text,"#######################")

BEC CU VAPORI DE SODIU PHILIPS 100W,DULIE E40 HID TS100/820 #######################
Livrarea produselor se va face in termen de 30 zil... mai departee de la data acceptarii ofertei de catre autoritatea contractanta, conditiile DDP, la UMF “Carol Davila” – Depozitul de materiale Bdul Eroii Sanitari, nr.8, Sector 5, Bucuresti mai putin #######################
Plata produselor se va face in lei, in termen de 3... mai departe0 de zile de la data primirii facturii ,                       	dupa receptia produselor. mai putin #######################
BEC PE BAZA DE VAPORI DE SODIU PHILIPS PIA SON MASTERS 100 W,DULIE MARE E40. #######################
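
The delivery and payment paragraphs carry the labels of the page's expand/collapse links inside the text: `... mai departe` ("read more") sits where the preview was cut, and `mai putin` ("show less") trails at the end. Removing both appears to restore the full sentence, for example `3... mai departe0 de zile` becomes `30 de zile`. A sketch, assuming those two labels are the only residue and never occur in the real text; `strip_toggle_labels` is a hypothetical helper.

```python
import re


def strip_toggle_labels(raw):
    """Remove the 'read more' / 'show less' link labels and tidy whitespace."""
    # The preview ends with '... mai departe'; the hidden remainder follows it directly
    text = raw.replace("... mai departe", "")
    # 'mai putin' is the collapse link appended after the full text
    text = re.sub(r"\s*mai putin\s*$", "", text)
    return " ".join(text.split())


sample = ("Plata produselor se va face in lei, in termen de 3... mai departe0 de zile "
          "de la data primirii facturii ,      \tdupa receptia produselor. mai putin")
print(strip_toggle_labels(sample))
```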


# Main page with the list of acquisitions

In [101]:
def show_many_show_more(driver, aquisitions_list):
    for aquisition in aquisitions_list:
        # A class-name locator accepts a single class only; use a CSS selector for compound classes
        driver.find_element(By.CSS_SELECTOR, '.title-entity.ng-binding')

In [102]:
for aquisition in aquisitions_list: 
    driver.find_element_by_class_name('title-entity ng-binding')

NameError: name 'aquisitions_list' is not defined

In [103]:
driver.find_element_by_class_name('title-entity ng-binding')

NoSuchElementException: Message: Unable to locate element: .title-entity ng-binding
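
The error message shows the selector that was actually tried, `.title-entity ng-binding`: only the first class got a dot, so `ng-binding` was read as a descendant tag name. A class-name locator takes one class at a time; for several classes a CSS selector with one dot per class is the usual alternative. The helper below is a hypothetical sketch that builds such a selector from a `class` attribute value.

```python
def classes_to_css(class_attr):
    """Turn 'title-entity ng-binding' into '.title-entity.ng-binding'."""
    return "".join("." + name for name in class_attr.split())


selector = classes_to_css("title-entity ng-binding")
print(selector)  # .title-entity.ng-binding
# With a live driver: driver.find_element(By.CSS_SELECTOR, selector)
```

This only helps if the driver has loaded a page that contains such an element; the driver above was opened on the detail page, not on the list page.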


In [70]:
driver.find_element(By.PARTIAL_LINK_TEXT,"ng-click").click()

NoSuchElementException: Message: Unable to locate element: ng-click
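
`By.PARTIAL_LINK_TEXT` matches the visible text of `<a>` elements, so it cannot find an element by its `ng-click` attribute. An attribute selector such as `[ng-click]` is one way to target it. The sketch below is hypothetical: `click_first_with_attribute` is not part of the notebook, and it passes the literal string `"css selector"` (the value of selenium's `By.CSS_SELECTOR`) so that the helper does not need selenium imported to be defined.

```python
CSS_SELECTOR = "css selector"  # the string value behind selenium's By.CSS_SELECTOR


def click_first_with_attribute(driver, attribute):
    """Click the first element that carries the given HTML attribute."""
    selector = "[{}]".format(attribute)  # e.g. '[ng-click]'
    element = driver.find_element(CSS_SELECTOR, selector)
    element.click()
    return selector

# With a live driver: click_first_with_attribute(driver, "ng-click")
```

On a page with many `ng-click` elements the first match may not be the intended one, so a narrower selector (for example combined with a class) is likely needed in practice.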
