# [Jabama](https://www.jabama.com/) website data scraping (Hotels part)

In this notebook, I scrape data from the [Jabama](https://www.jabama.com/hotels/) website.

This site offers accommodation services and is one of the leaders in the Iranian accommodation and hotel market. Hotels are available on the site for almost all major cities and tourist destinations in the country.

The website is therefore a valuable source of information about booking conditions, such as prices, guest scores, and star ratings.

## Import libraries
We use three main libraries for web scraping:

1.   csv (to write the scraped data to the output file)
2.   Requests (to send HTTP requests to the web pages)
3.   BeautifulSoup (the main library for parsing and scraping the HTML)

In [4]:
import csv
import requests 
from bs4 import BeautifulSoup

## Software structure
A good analysis needs information on every hotel, but unfortunately the site has no feature for retrieving all hotels at once. I even called the website's support team, and they told me that the full data is not directly available. So I took the following steps to gather a reasonably complete booking data set.

First, I open and read a text file called *Cities.txt*, which lists the provinces and major cities of Iran. For each row in the file, I build a search URL for the website. The URL looks like *https://www.jabama.com/search?q=مازندران&kind=hotel&page-number=1*, which means page 1 of the hotel search results for the province or city *مازندران*. So if I go through all result pages (at most 10) for every city and province, I collect nearly all of the hotel booking information for Iran.
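The URL-building step can be sketched as follows. This is a minimal sketch, assuming the query parameters shown in the example URL (`q`, `kind`, `page-number`); the helper name `build_search_url` is hypothetical and not part of the notebook. It percent-encodes the Persian city name explicitly with `urllib.parse.urlencode`, so the resulting URL contains only ASCII characters.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.jabama.com/search'  # taken from the example URL in the text


def build_search_url(city, page_number):
    """Return the hotel search URL for one city (or province) and one page."""
    # urlencode percent-encodes the UTF-8 bytes of non-ASCII values
    query = urlencode({'q': city, 'kind': 'hotel', 'page-number': page_number})
    return f'{BASE_URL}?{query}'


# Build the URLs for the first three result pages of one province
urls = [build_search_url('مازندران', page) for page in range(1, 4)]
for url in urls:
    print(url)
```

Keeping the URL construction in one small function makes it easy to change the page limit or the `kind` filter in a single place.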

#### Scrap_and_Save function
This function receives the parsed search-result page, scrapes it, and appends the hotel data to a file in CSV format. To keep the code easier to follow, I separated the scraping section into its own function.

In [5]:
# Scrape the page soup passed in as a parameter
def Scrap_and_Save(soup, data_file) :
    # Find all stays
    hotels = soup.find_all('div', attrs= {'class':'listing-items__item'})
    for each_hotel in hotels:
        content = each_hotel.find('a', attrs= {'class':'vertical-card'}, recursive=False)
        if content is None:
            continue

        # Find features
        code = kind = name = price = comment = score = province = city = star = ''
        
        # kind, code
        split_url = content['href'].split('/')
        if len(split_url) == 3 :
            kind = split_url[1]
            code = split_url[2]
        
        content_card = content.find('div', attrs= {'class':'vertical-card__wrapper'}, recursive=False)
        if content_card is None:
            continue
        
        # score, comment
        content_temp = content_card.find('div', attrs={'class': 'vertical-card__rate'}, recursive=False)
        if content_temp is not None:
            spans = content_temp.find_all('span', recursive=False)
            for each_span in spans:
                if each_span.has_attr('class') and each_span['class'][0] == 'vertical-card__rate-score':
                    score = each_span.get_text().strip()
                if each_span.has_attr('class') and each_span['class'][0] == 'vertical-card__rate-count':
                    comment = each_span.get_text().strip()
        # name, star
        content_temp = content_card.find('p', attrs={'class': 'vertical-card__name'}, recursive=False)
        if content_temp is not None:
            spans = content_temp.find_all('span', recursive=False)
            for each_span in spans:
                # The hotel name is in the span without a class attribute
                if not each_span.has_attr('class'):
                    name = each_span.get_text().strip()
                if each_span.has_attr('class') and each_span['class'][0] == 'vertical-card__star':
                    star = each_span.get_text().strip()
        # province, city
        content_temp = content_card.find('p', attrs={'class': 'vertical-card__feature'}, recursive=False)
        if content_temp is not None:
            temp_city = content_temp.find('span', recursive=False)
            # Guard against a missing <span>: calling .replace() on None would raise
            if temp_city is not None:
                # The text holds the province and the city, separated by the Persian comma
                temp_city = temp_city.get_text().strip().replace('\n', '').split('،')
                if len(temp_city) == 2:
                    province = temp_city[0].strip()
                    city = temp_city[1].strip()
        # price
        content_temp = content_card.find('div', attrs={'class': 'pricing vertical-card__pricing'}, recursive=False)
        if content_temp is not None:
            content_temp = content_temp.find('div', attrs={'class': 'hotel-pricing'}, recursive=False)
            if content_temp is not None:
                content_temp = content_temp.find('p', attrs={'class': 'hotel-pricing__price'}, recursive=False)
                if content_temp is not None:
                    temp_price = content_temp.find('strong', recursive=False)
                    # Guard against a missing <strong>: calling .strip() on None would raise
                    if temp_price is not None:
                        # Keep the number before 'تومان' (toman) and drop the thousands separators
                        price = temp_price.get_text().strip().split('تومان')[0].replace(',', '').strip()
        
        new_row = [code,kind,name,price,comment,score,province,city,star]
        
        with open(data_file, 'a+', newline='', encoding='utf-8') as write_obj:
            # Create a writer object from csv module
            csv_writer = csv.writer(write_obj)
            # Add contents of list as last row in the csv file
            csv_writer.writerow(new_row)
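
The text clean-up inside the function can also be expressed as small helpers that are easy to test without any HTML. The sketch below is only an illustration: the names `parse_price` and `split_location` are hypothetical, and the sample strings are made-up inputs in the format the code above expects (a number followed by *تومان*, and a province and city separated by the Persian comma *،*).

```python
def parse_price(text):
    """Return the digits before 'تومان' (toman) without thousands separators."""
    if text is None:
        return ''
    return text.strip().split('تومان')[0].replace(',', '').strip()


def split_location(text):
    """Split a 'province، city' string on the Persian comma.

    Returns ('', '') when the text is missing or does not have two parts.
    """
    if text is None:
        return '', ''
    parts = text.strip().replace('\n', '').split('،')
    if len(parts) != 2:
        return '', ''
    return parts[0].strip(), parts[1].strip()


# Made-up sample values, not real data from the site
print(parse_price(' 1,250,000 تومان '))
print(split_location('مازندران، رامسر'))
print(split_location(None))
```

Both helpers return empty strings for missing input, which matches the empty defaults the function assigns to every feature before parsing.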


#### Main function
In the main function, I open the *Cities.txt* file, build the search URL for each city and page, and request it. Then I call the *Scrap_and_Save* function with the parsed response.

Tip 1: When a searched city is not found, the site returns a regular web page instead of a not-found error (404). Therefore, I detect this case by looking for a *div* with a specific *class*, stop requesting further pages, and move on to the next city.
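
The stopping logic described here can be sketched independently of the network. This is a hedged illustration rather than the notebook's code: `scrape_city`, `fetch`, and `is_empty` are hypothetical names, and the fake fetcher stands in for `requests.get` so that the loop can run offline.

```python
def scrape_city(city, fetch, is_empty, max_pages=10):
    """Collect the HTML of each result page for one city.

    Stops at the first failed request or the first "empty" page,
    and never requests more than max_pages pages.
    """
    pages = []
    for page_number in range(1, max_pages + 1):
        status_code, html = fetch(city, page_number)
        if status_code != 200:
            break  # a real HTTP error
        if is_empty(html):
            break  # the site answered 200, but with its empty-state page
        pages.append(html)
    return pages


# Offline stand-ins (assumptions for the demo, not real site responses)
def fake_fetch(city, page_number):
    if page_number <= 2:
        return 200, f'<div class="listing-items__item">{city} {page_number}</div>'
    return 200, '<div class="listing-empty-state"></div>'


def fake_is_empty(html):
    # A crude substring check; the notebook uses BeautifulSoup for this
    return 'listing-empty-state' in html


result = scrape_city('تبریز', fake_fetch, fake_is_empty)
print(len(result))
```

Passing the fetch and empty-page checks in as functions is a design choice for this sketch only; it keeps the loop testable without sending requests to the site.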


In [6]:
if __name__ == "__main__":
    Data_File_Name = 'Data.csv'
    City_File_Name = 'Cities.txt'
    
    # Write the headers in data csv file
    with open(Data_File_Name, mode='w', newline='', encoding='utf-8') as csv_file:
        handle = csv.writer(csv_file)
        handle.writerow(['code','kind','name','price','comment','score','province', 'city','star'])

    city_list = list()
    # Read city source file
    with open(City_File_Name,'r',encoding='utf-8') as city_file:
        lines = city_file.readlines()
        for each_line in lines:
            city_list.append(each_line.replace('\n',''))

    # Get HTML of all cities
    Jabama_Url = 'https://www.jabama.com/'
    Jabama_Url_WithoutSlash = 'https://www.jabama.com'

    for each_city in city_list :
        # Go through the result pages (at most Max_page_no pages)
        Max_page_no = 10
        for each_page in range(Max_page_no):
            temp_page = str(each_page + 1)

            print('Scraping', each_city, 'page', temp_page)
            
            temp_Url = f'{Jabama_Url}search?q={each_city}&kind=hotel&page-number={temp_page}'

            # Check if page is found
            response = requests.get(temp_Url)
            if response.status_code != 200:
                break
                
#             print (temp_Url)

            soup = BeautifulSoup(response.content, 'html.parser')

            # Jabama returns this empty-state page instead of a 404
            check_empty = soup.find_all('div', attrs= {'class':'listing-empty-state'})
            if len(check_empty) > 0:
                break
            
            Scrap_and_Save(soup, Data_File_Name)

Scraping اردبیل page 1
https://www.jabama.com/search?q=اردبیل&kind=hotel&page-number=1
Scraping اردبیل page 2
https://www.jabama.com/search?q=اردبیل&kind=hotel&page-number=2
Scraping اردبیل page 3
https://www.jabama.com/search?q=اردبیل&kind=hotel&page-number=3
Scraping اردبیل page 4
https://www.jabama.com/search?q=اردبیل&kind=hotel&page-number=4
Scraping تبریز page 1
https://www.jabama.com/search?q=تبریز&kind=hotel&page-number=1
Scraping تبریز page 2
https://www.jabama.com/search?q=تبریز&kind=hotel&page-number=2
Scraping ارومیه page 1
https://www.jabama.com/search?q=ارومیه&kind=hotel&page-number=1
Scraping ارومیه page 2
https://www.jabama.com/search?q=ارومیه&kind=hotel&page-number=2
Scraping اصفهان page 1
https://www.jabama.com/search?q=اصفهان&kind=hotel&page-number=1
Scraping اصفهان page 2
https://www.jabama.com/search?q=اصفهان&kind=hotel&page-number=2
Scraping اصفهان page 3
https://www.jabama.com/search?q=اصفهان&kind=hotel&page-number=3
Scraping اصفهان page 4
https://www.jabama.com/s

https://www.jabama.com/search?q=مازندران&kind=hotel&page-number=7
Scraping مرکزی page 1
https://www.jabama.com/search?q=مرکزی&kind=hotel&page-number=1
Scraping مرکزی page 2
https://www.jabama.com/search?q=مرکزی&kind=hotel&page-number=2
Scraping هرمزگان page 1
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=1
Scraping هرمزگان page 2
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=2
Scraping هرمزگان page 3
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=3
Scraping هرمزگان page 4
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=4
Scraping هرمزگان page 5
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=5
Scraping هرمزگان page 6
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=6
Scraping هرمزگان page 7
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=7
Scraping هرمزگان page 8
https://www.jabama.com/search?q=هرمزگان&kind=hotel&page-number=8
Scraping هرمزگان page 9
https://www.jabama.com/searc

Finally, after running the program, I have a CSV file called *Data.csv*. The file contains 9 features of each accommodation (the columns written in the header row above). This file w