# Content with notebooks

You can also create content with Jupyter Notebooks. This means that you can include
code blocks and their outputs in your book.

## Markdown + notebooks

As it is markdown, you can embed images, HTML, etc into your posts!

![](https://myst-parser.readthedocs.io/en/latest/_static/logo-wide.svg)

You can also $add_{math}$ and

$$
math^{blocks}
$$

or

$$
\begin{aligned}
\mbox{mean} la_{tex} \\ \\
math blocks
\end{aligned}
$$

But make sure you \$Escape \$your \$dollar signs \$you want to keep!

## MyST markdown

MyST markdown works in Jupyter Notebooks as well. For more information about MyST markdown, check
out [the MyST guide in Jupyter Book](https://jupyterbook.org/content/myst.html),
or see [the MyST markdown documentation](https://myst-parser.readthedocs.io/en/latest/).

## Code blocks and outputs

Jupyter Book will also embed your code blocks and output in your book.
For example, here's some sample Matplotlib code:

In [45]:
!brew update
!brew install chromium-chromedriver
%pip install requests bs4 pandas geopy selenium webdriver-manager

from bs4 import BeautifulSoup
import requests
import pandas as pd
from geopy.geocoders import Nominatim

Updated 2 taps (homebrew/core and homebrew/cask).
[34m==>[0m [1mNew Casks[0m
hapigo                                   wiso-steuer-2024
[34m==>[0m [1mSearching for similarly named formulae and casks...[0m
[34m==>[0m [1mCasks[0m
chromedriver

To install chromedriver, run:
  brew install --cask chromedriver
Collecting webdriver-manager
  Downloading webdriver_manager-4.0.1-py2.py3-none-any.whl.metadata (12 kB)
Collecting python-dotenv (from webdriver-manager)
  Using cached python_dotenv-1.0.0-py3-none-any.whl (19 kB)
Downloading webdriver_manager-4.0.1-py2.py3-none-any.whl (27 kB)
Installing collected packages: python-dotenv, webdriver-manager
Successfully installed python-dotenv-1.0.0 webdriver-manager-4.0.1
Note: you may need to restart the kernel to use updated packages.


In [20]:
import json
import re


def scrape_airbnb(check_in, check_out):
    url = f"https://www.airbnb.com/s/Madrid/homes?place_id=ChIJgTwKgJcpQg0RaSKMYcHeNsQ&refinement_paths%5B%5D=%2Fhomes&flexible_trip_dates%5B%5D=april&flexible_trip_dates%5B%5D=august&flexible_trip_dates%5B%5D=february&flexible_trip_dates%5B%5D=july&flexible_trip_dates%5B%5D=june&flexible_trip_dates%5B%5D=march&flexible_trip_dates%5B%5D=may&flexible_trip_dates%5B%5D=october&flexible_trip_dates%5B%5D=september&date_picker_type=calendar&search_type=filter_change&tab_id=home_tab&query=Madrid&monthly_start_date=2023-12-01&monthly_length=3&price_filter_input_type=0&price_filter_num_nights=1&channel=EXPLORE&flexible_trip_lengths%5B%5D=weekend_trip&source=structured_search_input_header&checkin={check_in}&checkout={check_out}&adults=1"
    response = requests.get(url).text

    soup = BeautifulSoup(response, "html.parser")

    print(soup.prettify())
    script_tag = soup.find('script', text=re.compile('niobeMinimalClientData'))
    if script_tag:
        json_data = json.loads(script_tag.string)
        return json_data
    else:
        return "Data not found"


def extract_data(json_data, check_in, check_out):
    result = []
    length = len(
        json_data["niobeMinimalClientData"][1][1]["data"]["presentation"][
            "staysSearch"
        ]["results"]["searchResults"]
    )

    for i in range(length - 1):
        # LOCATION
        location = json_data["niobeMinimalClientData"][1][1]["data"]["presentation"][
            "staysSearch"
        ]["results"]["searchResults"][i]["listing"]["coordinate"]
        latitude = location["latitude"]
        longitude = location["longitude"]

        # PRICE
        price = json_data["niobeMinimalClientData"][1][1]["data"]["presentation"][
            "staysSearch"
        ]["results"]["searchResults"][i]["pricingQuote"]["rate"]["amount"]

        room_type = json_data["niobeMinimalClientData"][1][1]["data"]["presentation"][
            "staysSearch"
        ]["results"]["searchResults"][i]["listing"]["roomTypeCategory"]

        result.append(
            {
                "check_in": check_in,
                "check_out": check_out,
                "latitude": latitude,
                "longitude": longitude,
                "price": price,
                "room_type": room_type,
            }
        )

    return result


# Load existing data from CSV, if it exists
try:
    existing_data = pd.read_csv("airbnb_data.csv")
except FileNotFoundError:
    existing_data = pd.DataFrame(
        columns=["check_in", "check_out", "latitude", "longitude", "price", "room_type"]
    )

# Scrape new data
airbnb_raw_data = []
year = 2024
for month in range(1, 13):
    for day in range(1, 28):  # Adjusted for February
        check_in = f"{year}-{month}-{day}"
        check_out = f"{year}-{month}-{day + 1}"
        json_data = scrape_airbnb(check_in, check_out)
        airbnb_raw_data.extend(extract_data(json_data, check_in, check_out))

# Convert new data into a DataFrame
new_data = pd.DataFrame(
    airbnb_raw_data,
    columns=["check_in", "check_out", "latitude", "longitude", "price", "room_type"],
)
new_data["check_in"] = pd.to_datetime(new_data["check_in"])
new_data["check_out"] = pd.to_datetime(new_data["check_out"])

# Append new data to existing data
airbnb_data = pd.concat([existing_data, new_data], ignore_index=True)

# Remove potential duplicates
airbnb_data.drop_duplicates(inplace=True)

# Save combined data back to CSV
airbnb_data.to_csv("airbnb_data.csv", index=False)

<!DOCTYPE html>
<html class="__TODO_ENABLE_REM_RESIZE__" data-hyperloop-version="2" data-is-hyperloop="true" dir="ltr" lang="en">
 <meta charset="utf-8"/>
 <meta content="en" name="locale"/>
 <meta content="notranslate" name="google"/>
 <meta content="authenticity_token" id="csrf-param-meta-tag" name="csrf-param"/>
 <meta content="" id="csrf-token-meta-tag" name="csrf-token"/>
 <meta content="" id="english-canonical-url"/>
 <meta content="on" name="twitter:widgets:csp"/>
 <meta content="yes" name="mobile-web-app-capable"/>
 <meta content="yes" name="apple-mobile-web-app-capable"/>
 <meta content="Airbnb" name="application-name"/>
 <meta content="Airbnb" name="apple-mobile-web-app-title"/>
 <meta content="#ffffff" name="theme-color"/>
 <meta content="#ffffff" name="msapplication-navbutton-color"/>
 <meta content="black-translucent" name="apple-mobile-web-app-status-bar-style"/>
 <meta content="/?utm_source=homescreen" name="msapplication-starturl"/>
 <link crossorigin="anonymous" href="

  script_tag = soup.find('script', text=re.compile('niobeMinimalClientData'))


KeyError: 'niobeMinimalClientData'

In [None]:
print(json_data["root > core-guest-spa"])

[['ExperimentsDataToken', {'china_web_revamp': {'subject': 'visitor', 'buckets': 100, 'percent_exposed': 100, 'treatments': [{'name': 'control', 'buckets': 50}, {'name': 'treatment', 'buckets': 50}], 'hashing_key': 'china_web_revamp', 'sitar_overrides': {}, 'trebuchets': []}, 'installed_pwa': {'subject': 'visitor', 'buckets': 2, 'percent_exposed': 10, 'treatments': [{'name': 'control', 'buckets': 1}, {'name': 'treatment', 'buckets': 1}], 'hashing_key': 'installed_pwa', 'sitar_overrides': {}, 'trebuchets': []}, 'installed_pwa_parallel': {'subject': 'visitor', 'buckets': 2, 'percent_exposed': 10, 'treatments': [{'name': 'control', 'buckets': 1}, {'name': 'treatment', 'buckets': 1}], 'hashing_key': 'installed_pwa_parallel', 'sitar_overrides': {}, 'trebuchets': []}, 'contact_host_sections_preload_query_v4': {'subject': 'user', 'buckets': 100, 'percent_exposed': 100, 'treatments': [{'name': 'control', 'buckets': 50}, {'name': 'treatment', 'buckets': 50}], 'hashing_key': 'contact_host_sectio

In [None]:
airbnb_data['month'] = airbnb_data['check_in'].dt.month

# Prepare data for graph
# Extract month from check_in date


# Group by month and calculate average price
airbnb_graph_data = airbnb_data.groupby('month')['price'].mean().reset_index()

airbnb_graph_data.to_csv("airbnb_graph_data.csv", index=False)
airbnb_graph_data.head()

In [2]:
airbnb_data = pd.read_csv("airbnb_data.csv")
airbnb_data["check_in"] = pd.to_datetime(airbnb_data["check_in"])
airbnb_data["check_out"] = pd.to_datetime(airbnb_data["check_out"])

airbnb_data['month'] = airbnb_data['check_in'].dt.month

# Prepare data for map

airbnb_map_data = airbnb_data.groupby(['latitude', 'longitude', 'month'])['price'].mean().reset_index()
airbnb_map_data.to_csv("airbnb_map_data.csv", index=False)

airbnb_map_data.head()

Unnamed: 0,latitude,longitude,month,price
0,40.3441,-3.691984,1,20.0
1,40.3441,-3.691984,2,21.0
2,40.3441,-3.691984,3,21.0
3,40.372076,-3.693983,7,67.0
4,40.378957,-3.670362,2,98.0


In [23]:
from geopy.geocoders import Nominatim

def get_district(latitude, longitude):
    # Initialize Nominatim API
    geolocator = Nominatim(user_agent="map_app_airbnb", timeout=7200)

    # Get location with reverse geocode
    location = geolocator.reverse((latitude, longitude), exactly_one=True)

    if location:
        address = location.raw['address']
        district = address.get('city_district')
        if district == None:
            district = address.get('suburb')         
        return district
    else:
        return "District not found"

# Example usage
latitude = 40.748817
longitude = -73.985428
print(get_district(40.3789571457818,-3.6703623401582193))


airbnb_map_data['region'] = airbnb_map_data.apply(lambda x: get_district(x['latitude'], x['longitude']), axis=1)
print(airbnb_map_data.head())
airbnb_map_data.to_csv("airbnb_map_data.csv", index=False)

Puente de Vallecas
    latitude  longitude  month  price              region
0  40.344100  -3.691984      1   20.0       San Cristóbal
1  40.344100  -3.691984      2   21.0       San Cristóbal
2  40.344100  -3.691984      3   21.0       San Cristóbal
3  40.372076  -3.693983      7   67.0               Usera
4  40.378957  -3.670362      2   98.0  Puente de Vallecas


In [24]:
airbnb_map = airbnb_map_data.groupby(['region', 'month'])['price'].mean().reset_index()

print(airbnb_map.head())
airbnb_map.to_csv("airbnb_map.csv", index=False)

       region  month      price
0  Arganzuela      1  70.924421
1  Arganzuela      2  59.229167
2  Arganzuela      3  64.654762
3  Arganzuela      4  70.192308
4  Arganzuela      5  62.500000


In [31]:
# Scrape rentalia

def scrape_rentalia(check_in, check_out):
    url = f"https://www.rentalia.com/houses/search_res.php?checkin={check_in}&checkout={check_out}&rooms=1&idgeo=314&tpre=d&chars="
    # Send a request to the URL
    response = requests.get(url)

    # Check if the request was successful
    if response.status_code == 200:
        # Parse the content of the response with BeautifulSoup
        soup = BeautifulSoup(response.content, 'html.parser')

        print(soup.prettify())

        # Find all listings (modify the class based on actual web page structure)
        listings = soup.find_all('div', class_='listing')

        # Extract information from each listing
        extracted_listings = []
        for listing in listings:
            # Extract details like title, location, price (modify selectors as needed)
            title = listing.find('h2').get_text(strip=True)
            location = listing.find('span', class_='location').get_text(strip=True)
            price = listing.find('span', class_='price').get_text(strip=True)
            
            extracted_listings.append({
                'title': title,
                'location': location,
                'price': price
            })

        # Do something with extracted_listings (e.g., print, save to file)
        print(extracted_listings)
    else:
        print("Failed to retrieve the webpage")

scrape_rentalia('01%2F01%2F2025', '02%2F01%2F2025')


<!DOCTYPE html>
<html lang="en" ng-app="rentalia" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://opengraphprotocol.org/schema/">
 <head>
  <meta charset="utf-8"/>
  <meta content="Rentalia" name="application-name"/>
  <meta content="#ffffff" name="msapplication-TileColor"/>
  <meta content="https://st-rentalia.com/images/favicons/mstile-144x144.png" name="msapplication-TileImage"/>
  <meta content="text/html;charset=utf-8" http-equiv="Content-Type"/>
  <meta content="no-cache" http-equiv="Cache-Control"/>
  <meta content="en" http-equiv="Content-Language"/>
  <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
  <meta content="width=device-width, height=device-height, initial-scale=1.0" name="viewport"/>
  <title>
   Holiday rentals Madrid. Apartments, holiday homes and villas - Self catering
  </title>
  <meta content="ON" http-equiv="x-dns-prefetch-control"/>
  <link href="https://img00.rhimg.com" rel="dns-prefetch"/>
  <link crossorigin="crossorigin" href=

In [63]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
import time

# Set up the WebDriver
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)

def scrape_kayak(check_in, check_out):
    url = f"https://www.kayak.es/hotels/Madrid,Espana-c32213/{check_in}/{check_out}/1adults?sort=rank_a"

    # Open the URL
    driver.get(url)

    time.sleep(5)  # Adjust time as needed

    # Click on accept cookies
    driver.find_element(
        By.XPATH,
        "//div[contains(@class, 'RxNS-button-content') and contains(text(), 'Aceptar')]",
    ).click()

    return driver


check_in = "2024-01-01"
check_out = "2024-01-02"

listings = scrape_kayak(check_in, check_out)
print(listings)


<selenium.webdriver.chrome.webdriver.WebDriver (session="1a4dd493b895fbb50c935729f5f370e4")>


AttributeError: type object 'By' has no attribute 'CssSelector'

In [62]:
def extract_kayak(listings, check_in, check_out):
    extracted_listings = []
    # Extract details like title, location, price (modify selectors as needed)
    locations = listings.find_elements(
        By.CLASS_NAME, "FLpo-location-name"
    )  # Adjust class name as needed
    prices = listings.find_elements(
        By.CssSelector, "[data-target='price']"
    ) # Adjust class name as needed

    if (locations.len == prices.len):
        for i in range(locations.len):
            extracted_listings.append(
                {
                    "check_in": check_in,
                    "check_out": check_out,
                    "region": locations[i],
                    "price": prices[i],
                }
            )

    # Close the browser
    # driver.quit()
    return extracted_listings

print(extract_kayak(listings, check_in, check_out))

NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[contains(@class, 'RxNS-button-content') and contains(text(), 'Aceptar')]"}
  (Session info: chrome=119.0.6045.199); For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors#no-such-element-exception
Stacktrace:
0   chromedriver                        0x00000001030d2004 chromedriver + 4169732
1   chromedriver                        0x00000001030c9ff8 chromedriver + 4136952
2   chromedriver                        0x0000000102d1f500 chromedriver + 292096
3   chromedriver                        0x0000000102d647a0 chromedriver + 575392
4   chromedriver                        0x0000000102d9f818 chromedriver + 817176
5   chromedriver                        0x0000000102d585e8 chromedriver + 525800
6   chromedriver                        0x0000000102d594b8 chromedriver + 529592
7   chromedriver                        0x0000000103098334 chromedriver + 3932980
8   chromedriver                        0x000000010309c970 chromedriver + 3950960
9   chromedriver                        0x0000000103080774 chromedriver + 3835764
10  chromedriver                        0x000000010309d478 chromedriver + 3953784
11  chromedriver                        0x0000000103072ab4 chromedriver + 3779252
12  chromedriver                        0x00000001030b9914 chromedriver + 4069652
13  chromedriver                        0x00000001030b9a90 chromedriver + 4070032
14  chromedriver                        0x00000001030c9c70 chromedriver + 4136048
15  libsystem_pthread.dylib             0x0000000180a2d034 _pthread_start + 136
16  libsystem_pthread.dylib             0x0000000180a27e3c thread_start + 8
