<a href="https://colab.research.google.com/github/blackcrowX/Data_Analytics_Projects/blob/main/Python/Extraction_EngelVoelkers_Web_Scraper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<div align="center">
<h1>Extraction - EngelVoelkers Web Scraper</h1>
<img src="https://i.postimg.cc/K8mbkyhz/Logo-Black.png"/>
</div>

## Table of Contents
1. Import Libraries
2. Extract Data
3. Data Cleaning
4. Date of Data
5. Create a CSV
6. Create a Function

## Step 1: Import Libraries

Import and configure libraries required for data extraction.

In [None]:
# Data handling
import pandas as pd

# Fetching and parsing the listing page
from bs4 import BeautifulSoup
import requests

# Pausing between scheduled runs
import time

## Step 2: Extract Data

Fetch a listing page from `engelvoelkers.com` with `requests`, then use BeautifulSoup to extract the listing's title and price.

In [None]:
url = "https://www.engelvoelkers.com/de-de/exposes/klassisches-altbremer-haus-im-fesenfeld-4621928.1541295_exp/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
page = requests.get(url, headers=headers)

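# Parse the response, then re-parse its prettified form.
# prettify() adds whitespace around text nodes, which Step 3 strips again.
soup1 = BeautifulSoup(page.content, 'html.parser')
soup2 = BeautifulSoup(soup1.prettify(), "html.parser")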

title = soup2.find(itemprop='name').get_text()
price = soup2.find(itemprop='price').get_text()
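
`find` returns `None` when no matching element exists, so the chained `.get_text()` calls above would raise an `AttributeError` if the page layout changed. A minimal sketch of a more defensive lookup, assuming a hypothetical helper named `extract_field` and a made-up HTML snippet in place of the live page:

```python
from bs4 import BeautifulSoup


def extract_field(soup, itemprop):
    """Return the stripped text of the first tag with the given itemprop.

    Hypothetical helper, not part of the original notebook.
    """
    tag = soup.find(attrs={"itemprop": itemprop})
    if tag is None:
        raise ValueError(f"No element with itemprop={itemprop!r} found")
    return tag.get_text(strip=True)


# Made-up markup standing in for the downloaded listing page.
sample_html = """
<h1 itemprop="name"> Sample House </h1>
<span itemprop="price"> 900.000 </span>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

sample_title = extract_field(sample_soup, "name")
sample_price = extract_field(sample_soup, "price")
```

Raising a `ValueError` with the missing attribute name makes a layout change easier to diagnose than a bare `AttributeError` on `None`.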

## Step 3: Data Cleaning

Strip the leading and trailing whitespace from the extracted title and price.

In [None]:
price = price.strip()
title = title.strip()

print(title)
print(price)

Klassisches Altbremer Haus im Fesenfeld
900.000
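
The price is still a string, and its dot is a German thousands separator rather than a decimal point. A small sketch of converting it to an integer, assuming a hypothetical helper named `parse_german_price` and assuming the text contains only digits and dot separators (no currency symbol, no decimal part):

```python
def parse_german_price(text):
    """Convert a German-formatted whole-number price such as '900.000' to an int.

    Hypothetical helper; assumes dots are thousands separators only.
    """
    digits = text.strip().replace(".", "")
    if not digits.isdigit():
        raise ValueError(f"Unexpected price format: {text!r}")
    return int(digits)


price_value = parse_german_price("900.000")
print(price_value)
```

Storing the integer instead of the raw string avoids any ambiguity when the CSV is read back later.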


## Step 4: Date of Data
Create a timestamp for your output to track when data was collected.

In [None]:
import datetime

today = datetime.date.today()

print(today)

2023-05-22
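
`datetime.date.today()` records only the calendar date, so two runs on the same day cannot be told apart in the output. If finer resolution is useful, one option is an ISO 8601 timestamp in UTC. This is a sketch, and the variable name `collected_at` is an assumption:

```python
import datetime

# Timezone-aware timestamp, truncated to whole seconds.
collected_at = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
print(collected_at)
```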


## Step 5: Create a CSV

Create the CSV file and write the header row followed by the first row of data.

In [None]:
import csv 

header = ['Title', 'Price', 'Date']
data = [title, price, today]


with open('EngelVolkersWebScraperDataset.csv', 'w', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerow(data)

In [None]:
df = pd.read_csv(r'/content/EngelVolkersWebScraperDataset.csv')

print(df)

                                     Title  Price        Date
0  Klassisches Altbremer Haus im Fesenfeld  900.0  2023-05-22
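
Note that the price printed above is `900.0` rather than `900.000`: by default pandas infers a float column and reads the dot as a decimal point. A sketch of one way to keep the value exactly as written, using the `dtype` parameter of `read_csv`; the in-memory CSV text and the name `df_text` are stand-ins for the file and DataFrame above:

```python
import io

import pandas as pd

# Same content as the CSV written above, held in memory for illustration.
csv_text = (
    "Title,Price,Date\n"
    "Klassisches Altbremer Haus im Fesenfeld,900.000,2023-05-22\n"
)

# Reading Price as a string prevents pandas from interpreting it as a float.
df_text = pd.read_csv(io.StringIO(csv_text), dtype={"Price": str})
print(df_text.loc[0, "Price"])
```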


Next, append another row to the existing CSV. Opening the file in append mode (`'a+'`) keeps the rows already written.

In [None]:
with open('EngelVolkersWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
    writer = csv.writer(f)
    writer.writerow(data)
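
Appending this way assumes the file already exists with its header row. A sketch of a helper that writes the header only when the file is new or empty, assuming the hypothetical name `append_row`:

```python
import csv
from pathlib import Path


def append_row(path, header, row):
    """Append row to the CSV at path, writing header first if the file is new or empty.

    Hypothetical helper, not part of the original notebook.
    """
    path = Path(path)
    needs_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="UTF8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(header)
        writer.writerow(row)
```

With this helper, the separate "create" and "append" cells collapse into a single call such as `append_row('EngelVolkersWebScraperDataset.csv', header, data)`.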

## Step 6: Create a Function
Combine the steps above into a single function that scrapes the listing and appends one row to the CSV.

In [None]:
import csv
import datetime


def check_price():
    """Scrape the listing's title and price and append them to the CSV with today's date."""
    url = "https://www.engelvoelkers.com/de-de/exposes/klassisches-altbremer-haus-im-fesenfeld-4621928.1541295_exp/"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
    page = requests.get(url, headers=headers)

    # Parse the response, then re-parse its prettified form (as in Step 2).
    soup1 = BeautifulSoup(page.content, 'html.parser')
    soup2 = BeautifulSoup(soup1.prettify(), "html.parser")

    # Extract the fields and strip surrounding whitespace (as in Step 3).
    title = soup2.find(itemprop='name').get_text().strip()
    price = soup2.find(itemprop='price').get_text().strip()

    today = datetime.date.today()
    data = [title, price, today]

    # Append one row; the header row was already written in Step 5.
    with open('EngelVolkersWebScraperDataset.csv', 'a+', newline='', encoding='UTF8') as f:
        writer = csv.writer(f)
        writer.writerow(data)

Run `check_price` once every 24 hours (86,400 seconds); each run appends a new row to the CSV.

In [None]:
# Loop until the kernel is interrupted.
while True:
    check_price()
    time.sleep(86400)

KeyboardInterrupt: ignored
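
The loop above runs until the kernel is interrupted, and an exception in a single run (for example a network error) would end it early. A sketch of a bounded, fault-tolerant variant, assuming a hypothetical helper named `run_periodically`; the `sleep` parameter exists only so the waiting can be swapped out when testing:

```python
import time


def run_periodically(task, interval_seconds, max_runs, sleep=time.sleep):
    """Call task() max_runs times, sleeping interval_seconds between calls.

    Returns the number of runs that finished without raising.
    Hypothetical helper, not part of the original notebook.
    """
    succeeded = 0
    for run in range(max_runs):
        try:
            task()
            succeeded += 1
        except Exception as exc:
            # Report the failure and continue with the next scheduled run.
            print(f"Run {run + 1} failed: {exc}")
        if run < max_runs - 1:
            sleep(interval_seconds)
    return succeeded
```

For instance, `run_periodically(check_price, 86400, max_runs=7)` would attempt one check per day for a week and then stop on its own.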

In [None]:
df = pd.read_csv(r'/content/EngelVolkersWebScraperDataset.csv')

print(df)

                                     Title  Price        Date
0  Klassisches Altbremer Haus im Fesenfeld  900.0  2023-05-22
1  Klassisches Altbremer Haus im Fesenfeld  900.0  2023-05-22
2  Klassisches Altbremer Haus im Fesenfeld  900.0  2023-05-22
