# <b>Real Estate Market</b> analysis project - <i>Web Scraping the Storia website</i>

## A little bit about the project
I analyze the real estate market to gain insights into two-room apartments.
The data used in this analysis was scraped on <b>23 Feb 2022</b> from the <b>https://www.storia.ro/</b> website.

## A little bit about this file, which is the first step in my project
For a more accurate analysis I chose to scrape data from <b>https://www.storia.ro/</b>. During this phase I used the <b>requests</b> and <b>BeautifulSoup</b> libraries to gather the links to all the listings from the search results pages (over 1000 links). Then I used the <b>selenium</b> library to go through each of those links and gather all the necessary data: the listing pages are loaded dynamically, so they have to be rendered in a browser before the HTML can be passed to <b>BeautifulSoup</b>. </br>
The problem that occurred during this phase was that I wasn't able to scrape all 1000+ links at once: the variables storing the data kept growing and, after the first 27 links, the code stopped with a memory error. The solution I came up with was to scrape 5 links at a time, store the results in a .csv file, and then reinitialise the Python variables :). All the .csv files were stored in the CSV folder.
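Because every chunk ends up in its own numbered file in the CSV folder, the files have to be combined again before the analysis. A minimal sketch, assuming the files are named `0.csv`, `1.csv` and so on, as in the scraping loop further down; `load_chunks` is a hypothetical helper name, not part of the original notebook. Sorting by the numeric index keeps the original link order (a plain alphabetical sort would put `10.csv` before `2.csv`).

```python
import glob
import os

import pandas as pd


def load_chunks(folder="./CSV"):
    """Concatenate the per-chunk CSV files (0.csv, 1.csv, ...) into one DataFrame.

    Assumes every .csv file in `folder` is named after its integer chunk index.
    """
    files = glob.glob(os.path.join(folder, "*.csv"))
    # sort by the numeric chunk index in the file name, not alphabetically
    files.sort(key=lambda f: int(os.path.splitext(os.path.basename(f))[0]))
    frames = [pd.read_csv(f) for f in files]
    return pd.concat(frames, ignore_index=True)


# usage:
# df = load_chunks("./CSV")
# print(df.shape)
```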

### Importing Libraries

```python
# libraries for static HTML pages
from bs4 import BeautifulSoup
import requests

# library for dynamically loaded web pages:
# these pages have to be rendered in a browser before they can be scraped
from selenium import webdriver

import pandas as pd
import time
import os
```

### Scraping all the apartment links from the search results pages

We store them in a list; later we go through all of these links and scrape the data for each apartment.

```python
# we change the User-Agent because the default one sent by requests contains the word "python",
# and chances are the website would restrict our robot
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
url = 'https://www.storia.ro/vanzare/apartament/ilfov/chiajna/?search%5Bfilter_enum_rooms_num%5D%5B0%5D=2'

links_list = []  # all the links we will later iterate through to gather the info we need
for x in range(1, 45):  # iterate through the 44 result pages of the search
    if x == 1:
        r = requests.get(url, headers=headers)
    else:
        # pages 2 and up add a page parameter to the same search URL
        r = requests.get(url + '&page={}'.format(x), headers=headers)

    soup = BeautifulSoup(r.content, 'lxml')
    apartments = soup.find_all('div', class_='offer-item-details')

    for item in apartments:
        link = item.a['href']
        links_list.append(link)  # add the offer link to our list

print(len(links_list))
```


1175
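The page loop above special-cases page 1 and appends `&page=N` for the other pages. A minimal sketch of the same URL construction with the standard library's `urllib.parse`, plus an order-preserving de-duplication of the collected links. `page_urls` and `unique_links` are hypothetical helper names; that the same offer can appear on more than one result page (for example a promoted listing) is an assumption worth checking, not something measured here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(base_url, last_page):
    """Yield the search URL for pages 1..last_page; page 1 keeps the URL unchanged."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query)  # decoded (key, value) pairs of the existing query
    for page in range(1, last_page + 1):
        if page == 1:
            yield base_url
        else:
            # re-encode the original query with page=N appended
            new_query = urlencode(query + [("page", str(page))])
            yield urlunsplit(parts._replace(query=new_query))


def unique_links(links):
    """Drop repeated links, keeping the first occurrence (dicts preserve insertion order)."""
    return list(dict.fromkeys(links))


# usage in the loop above:
# for page_url in page_urls(url, 44):
#     r = requests.get(page_url, headers=headers)
#     ...
# links_list = unique_links(links_list)
```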


```python
# for memory reasons we split links_list into smaller lists of 5 links each
chunks = [links_list[x:x + 5] for x in range(0, len(links_list), 5)]
```
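Because every chunk is written to its own numbered CSV file, an interrupted run does not have to start from the first link again. A minimal sketch, assuming the `./CSV` folder and `<index>.csv` naming used in the scraping loop below; `make_chunks` and `pending_chunks` are hypothetical helpers. `pending_chunks` yields only the chunks whose file does not exist yet, together with their index, so the file numbering stays the same.

```python
import os


def make_chunks(items, size=5):
    """Split a list into consecutive sublists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def pending_chunks(chunks, folder="./CSV"):
    """Yield (index, chunk) pairs whose <index>.csv file has not been written yet."""
    for index, chunk in enumerate(chunks):
        if not os.path.exists(os.path.join(folder, "{}.csv".format(index))):
            yield index, chunk


# usage in the scraping loop:
# for counter, chunk in pending_chunks(chunks, path):
#     ...
```

Skipping by file existence is only safe if a chunk's file is written once the whole chunk is done; a file written after every link could mark a half-finished chunk as complete.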

### Scraping info from the individual links

```python
# test_link = 'https://www.storia.ro/ro/oferta/apartament-2-camere-militari-IDqCf0.html#14898098de'

from selenium.webdriver import FirefoxOptions

opts = FirefoxOptions()
opts.add_argument("--headless")  # run Firefox without opening a window

path = "./CSV"
os.makedirs(path, exist_ok=True)  # make sure the output folder exists

for counter, chunk in enumerate(chunks):  # counter numbers the CSV files: 0.csv, 1.csv, ...
    apartmentslist = []
    for link in chunk:
        driver = webdriver.Firefox(options=opts)  # open a new headless Firefox instance
        try:
            driver.get(link)  # like requests.get(), but the browser also runs the page's JavaScript
            time.sleep(3)  # wait for the dynamically loaded content before reading the page
            # read the rendered DOM so BeautifulSoup can parse it like a static page
            html = driver.execute_script('return document.documentElement.outerHTML')
        finally:
            driver.quit()  # quit() also ends the browser process; close() only closes the window
        soup = BeautifulSoup(html, 'lxml')

        # from here we scrape different data like price, orientation of the apartment, etc. to create a dataset
        details = [info.text for info in soup.find_all('div', class_='css-1ytkscc ev4i3ak0')]

        # soup.find() returns None for a missing element, so reading .text raises AttributeError
        try:
            price = soup.find('strong', class_='css-b114we eu6swcv14').text
        except AttributeError:
            price = 'no price'

        try:
            title = soup.find('h1', class_='css-11kn46p eu6swcv15').text
        except AttributeError:
            title = 'no title'

        try:
            building_seller = soup.find('span', class_='css-1yijy9r ezb2r8u5').text
        except AttributeError:
            building_seller = 'no name'

        try:
            agency = soup.find('div', class_='css-1rl7r8w ezb2r8u2').text
        except AttributeError:
            agency = 'no info'

        # one dictionary per apartment; its keys become the column names of the DataFrame
        apartment = {
            'title': title,
            'price': price,
            'details': details,
            'building_seller': building_seller,
            'agency': agency,
            'link': link,
        }
        apartmentslist.append(apartment)
        print('Saving: {}'.format(apartment['title']))

    # write one CSV per chunk once all of its links have been scraped
    df = pd.DataFrame(apartmentslist)
    df.to_csv(os.path.join(path, '{}.csv'.format(counter)), index=False)
```






Saving: Vand apartament 2 camere Militari Residence mobilat
Saving: Apartament 2 camere (DIRECT DEZVOLTATOR)- 51000 Euro- comision 0.
Saving: Apartament 2 camere (DIRECT DEZVOLTATOR)- 51000 Euro- comision 0.
Saving: NOU ! Apartament 2 camere Premium | Luminos | LUX Finisaje la alegere
Saving: Apartament 2 camere Chiajna - Militari | 0% Comision | Complex premium
Saving: NOU ! Apartament 2 camere Decomandat | Zona Militari  Finisaje Premium
Saving: Apartament cu 2 camere de vânzare - Str Orhideelor
Saving: 2 Camere Finalizat Militari Residence STB - metrou Preciziei Pacii
Saving: Apartament 2 camere Chiajna-Militari | 0% comision | Finisaje premium
Saving: *NEW* Apartament 2 camere Militari * Finisaje premium * Confort LUX
Saving: *NEW* Apartament 2 camere Militari * Confort LUX * 0% Comision
Saving: Apartament nou Militari Rezervelor 60 Aqua Garden Sector 6
Saving: Apartament 2 camere/Decomandat /Militari/Dezvoltator/Finisaje Premium
Saving: Apartament 2 camere/Decomandat /Militari/Dez
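Each field in the scraping cell above repeats the same try/except: `soup.find()` returns `None` when the CSS class is not on the page, and reading `.text` from `None` raises `AttributeError`. Generated class names such as `css-b114we eu6swcv14` are likely to change when the site is updated, so missing fields should be expected. A minimal sketch of a hypothetical `text_or_default` helper that folds the pattern into one call; it works on whatever `soup.find()` returned.

```python
def text_or_default(element, default):
    """Return element.text, or `default` when soup.find() found nothing (None)."""
    return element.text if element is not None else default


# usage with BeautifulSoup, mirroring the scraping cell:
# price = text_or_default(soup.find('strong', class_='css-b114we eu6swcv14'), 'no price')
# title = text_or_default(soup.find('h1', class_='css-11kn46p eu6swcv15'), 'no title')
```

Unlike the original bare `except:`, this only covers the "element not found" case, so other errors (for example a failed page load) are not silently turned into placeholder values.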