### Beautiful Soup: Build a Web Scraper With Python

* #### Inspect the HTML structure of your target site with your browser’s developer tools

Before you write any Python code, you need to get to know the website that you want to scrape. That should be your first step for any web scraping project you want to tackle. You’ll need to understand the site structure to extract the information that’s relevant for you. Start by opening the site you want to scrape with your favorite browser.
* #### Decipher data encoded in URLs

A programmer can encode a lot of information in a URL. Your web scraping journey will be much easier if you first become familiar with how URLs work and what they’re made of. For example, you might find yourself on a details page that has the following URL:

https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html

You can deconstruct the above URL into two main parts:

1. The base URL represents the path to the search functionality of the website. In the example above, the base URL is https://realpython.github.io/fake-jobs/.
2. The specific site location that ends with .html is the path to the job description’s unique resource.

Any job posted on this website will use the same base URL. However, the unique resources’ location will be different depending on what specific job posting you’re viewing.

URLs can hold more information than just the location of a file. Some websites use query parameters to encode values that you submit when performing a search. You can think of them as query strings that you send to the database to retrieve specific records.
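
As a minimal sketch of the ideas above, Python's standard-library `urllib.parse` module can split a URL into its parts and decode query parameters. The second URL, including its `q` and `location` parameters, is hypothetical and made up for illustration; the fake-jobs site does not necessarily accept such parameters.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

# URL of the details page from the example above.
details_url = "https://realpython.github.io/fake-jobs/jobs/senior-python-developer-0.html"
parts = urlsplit(details_url)
print(parts.scheme)  # protocol, here "https"
print(parts.netloc)  # host name
print(parts.path)    # location of the resource on that host

# Hypothetical search URL with query parameters (illustration only).
search_url = "https://example.com/jobs?q=python+developer&location=remote"
query = parse_qs(urlsplit(search_url).query)
print(query)

# Build a query string from a dictionary of search values.
encoded = urlencode({"q": "python developer", "location": "remote"})
print(encoded)
```

Everything after the `?` is the query string: key-value pairs joined by `&`, with spaces encoded as `+`.
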
* #### Use `requests` and Beautiful Soup for scraping and parsing data from the Web

Beautiful Soup is a Python library for parsing structured data. It allows you to interact with HTML in a similar way to how you interact with a web page using developer tools. The library exposes a couple of intuitive functions you can use to explore the HTML you received. 
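
To illustrate, here is a small sketch that parses an inline HTML string rather than a live page. The HTML snippet, its class names, and the company names are invented for this example. `find()` returns the first matching element and `find_all()` returns all of them.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet standing in for a downloaded page.
html = """
<div id="results">
  <div class="card">
    <h2 class="title">Senior Python Developer</h2>
    <p class="company">Example Corp</p>
    <a href="/jobs/senior-python-developer-0.html">Apply</a>
  </div>
  <div class="card">
    <h2 class="title">Energy Engineer</h2>
    <p class="company">Sample LLC</p>
    <a href="/jobs/energy-engineer-1.html">Apply</a>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Narrow the search to the results container, then collect every card.
results = soup.find(id="results")
cards = results.find_all("div", class_="card")

jobs = []
for card in cards:
    jobs.append({
        "title": card.find("h2", class_="title").text.strip(),
        "company": card.find("p", class_="company").text.strip(),
        "link": card.find("a")["href"],  # attribute access works like a dict
    })

for job in jobs:
    print(job)
```
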
* #### Step through a web scraping pipeline from start to finish
* #### Build a script that fetches job offers from the Web and displays relevant information in your console

#### Challenges of Web Scraping

* <b> Variety:</b> Every website is different. While you’ll encounter general structures that repeat themselves, each website is unique and will need personal treatment if you want to extract the relevant information.

* <b> Durability:</b> Websites constantly change. Say you’ve built a shiny new web scraper that automatically cherry-picks what you want from your resource of interest. The first time you run your script, it works flawlessly. But when you run the same script only a short while later, you run into a discouraging and lengthy stack of tracebacks!
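
One common source of such tracebacks is that Beautiful Soup's `find()` returns `None` when an element no longer exists, so accessing `.text` on the result raises an `AttributeError`. The sketch below shows one possible way to guard against that. The helper name `text_or_default` and both HTML snippets are hypothetical and exist only for illustration.

```python
from bs4 import BeautifulSoup

def text_or_default(soup, tag, class_name, default=None):
    """Return the stripped text of the first matching element, or default."""
    element = soup.find(tag, class_=class_name)
    if element is None:
        # The selector no longer matches, e.g. after a site redesign.
        return default
    return element.get_text(strip=True)

# Two made-up versions of the same page: the site renamed a CSS class.
old_page = BeautifulSoup('<div class="price-w"> 120 000 </div>', "html.parser")
new_page = BeautifulSoup('<div class="cost"> 120 000 </div>', "html.parser")

old_price = text_or_default(old_page, "div", "price-w")
new_price = text_or_default(new_page, "div", "price-w", default="N/A")
print(old_price)
print(new_price)
```

A guard like this keeps a long scraping run alive, but the missing values still need attention: a column full of defaults is usually a sign that the selectors need updating.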

### An Alternative to Web Scraping: APIs

Some website providers offer application programming interfaces (APIs) that allow you to access their data in a predefined manner. With APIs, you can avoid parsing HTML, which is primarily a way to present content to users visually. Instead, you can access the data directly using formats like JSON and XML.

When you use an API, the process is generally more stable than gathering the data through web scraping. That’s because developers create APIs to be consumed by programs rather than by human eyes.

The front-end presentation of a site might change often, but such a change in the website’s design doesn’t affect its API structure. The structure of an API is usually more permanent, which means it’s a more reliable source of the site’s data.

However, APIs can change as well. The challenges of both variety and durability apply to APIs just as they do to websites. Additionally, it’s much harder to inspect the structure of an API by yourself if the provided documentation lacks quality.

The approach and tools you need to gather information using APIs are outside the scope of this tutorial. To learn more about it, check out API Integration in Python.

In [None]:
import pandas as pd
import re
import time
import requests
from bs4 import BeautifulSoup

In [None]:
download_from_url = 'https://www.estate.am/բնակարաններ-երևանում-s3990854?page='

In [None]:
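all_apartment_urls = []
count = 0
# Walk the paginated listing until a page contains no "details" links.
for page_number in range(1, 10000):
    URL = f'{download_from_url}{page_number}'
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")
    # Each listing links to its details page through a "btn-details" anchor
    # whose title is the Armenian word for "more".
    house_url = soup.find_all('a', class_='btn-details', title="ավելին")
    if len(house_url) == 0:
        break
    for house_href in house_url:
        # Prefix the relative link so it points at the English details page.
        complete_house_urls = 'https://www.estate.am/en' + house_href.get('href')
        count += 1
        all_apartment_urls.append([count, complete_house_urls])
print("DB rows: ", len(all_apartment_urls))
# Save the collected URLs; the next cell reads them from urls_to_csv.
urls_to_csv = pd.DataFrame(all_apartment_urls, columns=['row', 'url'])
urls_to_csv.to_csv('urls.csv', index=False)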

In [None]:
apartment_db = []
# Scrape only the first 100 listings and pause after each request.
for count, apartment in enumerate(urls_to_csv.url.tolist()[:100], start=1):
    page = requests.get(apartment)
    soup = BeautifulSoup(page.content, "html.parser")
    time.sleep(1.5)
    apartment_db.append(
    {
        'addr': soup.find('strong', class_='addr').text,
        'ruler': soup.find('span', class_='ruler').text,
        # Raw string avoids an invalid escape; removes all whitespace from the price.
        'price': re.sub(r'\s+', '', soup.find('div', class_='price-w').text)
    })
    print('Observation: ', count)
apartments = pd.DataFrame(apartment_db)
apartments.to_csv('apartments.csv')