# Price and Product Tracker
---
---

This notebook tracks product prices and product details on the German classifieds website "Kleinanzeigen". It works for any kind of product but is optimized for cameras.

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

# Part A: Data Acquisition
---

## Step 1: Ask user for product-name and keywords

This cell asks the user for a product to search for on the German website "Kleinanzeigen" and for some keywords describing it. Both inputs should be entered in **German**.

**Example Product**: Nikon D 7500

**Example Keywords**: Neu Objektiv Kit OVP

In [24]:
product_input = input('Which product are you looking for? (German)')

# join the words with hyphens for the search URL, e.g. 'Nikon D 7500' -> 'Nikon-D-7500'
product = '-'.join(product_input.split())

keys_input = input('Which keywords would you like to filter for? (German, no comma, max. 4)')

# one list entry per keyword; split() also drops repeated spaces
keys = keys_input.split()
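
The input prompt asks for at most four keywords, but nothing in the cell enforces that limit. The following is a minimal, non-interactive sketch of the same input handling that can be tried without typing anything. The function name `parse_inputs` and the `max_keys` parameter are assumptions made for illustration and are not part of the notebook.

```python
def parse_inputs(product_input, keys_input, max_keys=4):
    """Return (slug, keywords) for the two raw strings typed by the user."""
    # hyphen-separated product name for the search URL
    slug = '-'.join(product_input.split())
    # tolerate commas even though the prompt asks for none, then cap the list
    keywords = keys_input.replace(',', ' ').split()[:max_keys]
    return slug, keywords


slug, keywords = parse_inputs('Nikon D 7500', 'Neu Objektiv Kit OVP')
print(slug)      # Nikon-D-7500
print(keywords)  # ['Neu', 'Objektiv', 'Kit', 'OVP']
```

Capping the list silently is only one option; raising an error for a fifth keyword would be equally reasonable.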

## Step 2: Create product list for specific product

This cell collects the listings for the product from all result pages and saves them, sorted by price, to a csv file.

In [26]:
url = f'https://www.kleinanzeigen.de/s-{product}/k0'

# count number of subpages

p = requests.get(url)
website_data = BeautifulSoup(p.text, 'html.parser')
number_of_pages = len(website_data.find_all('a', {'class': 'pagination-page'}))

# get productlist

def get_productlist(url):
    productslist = []

    # check each page; max(..., 1) makes sure the first page is read even if
    # no pagination links were found (assumes one 'pagination-page' link per result page)
    for i in range(1, max(number_of_pages, 1) + 1):
        if i > 1:
            url = f'https://www.kleinanzeigen.de/s-seite:{i}/{product}/k0'

        # get html content
        r = requests.get(url)
        html_content = BeautifulSoup(r.text, 'html.parser')

        # get productlist
        results = html_content.find_all('div', {'class': 'aditem-main'})
        for item in results:
            title_tag = item.find('a', {'class': 'ellipsis'})
            if title_tag is None:
                continue  # skip entries without a title link

            # keep only the digits before the euro sign;
            # 'VB' or 'Zu verschenken' without an amount becomes 0
            price_tag = item.find('p', {'class': 'aditem-main--middle--price-shipping--price'})
            price_text = price_tag.text.split('€')[0] if price_tag is not None else ''
            digits = re.sub(r'\D', '', price_text)

            productslist.append({
                'title': title_tag.text.replace(',', '.'),
                'price': int(digits) if digits else 0,
                'link': title_tag['href']
            })
    return productslist

def create_dataframe(productslist):
    df_products = pd.DataFrame(productslist)
    df_products = df_products.sort_values(by=['price'])
    df_products = df_products.drop_duplicates()
    df_products.to_csv('products_all.csv', index=False)
    print('You are looking for: ' + product_input)
    print(len(df_products), 'products saved to products_all.csv-file in your folder.')
    return df_products

df_products = create_dataframe(get_productlist(url))

You are looking for: Nikon D 7500
49 products saved to products_all.csv-file in your folder.
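
The page loop in this step depends on how the result pages are addressed. As a rough sketch, the URLs could be generated up front and checked by eye before any request is sent. The two URL patterns are copied from the cell above; the names `BASE_URL` and `page_urls` are assumptions for illustration, and whether the site still serves these paths is not verified here.

```python
BASE_URL = 'https://www.kleinanzeigen.de'


def page_urls(product, number_of_pages):
    """Return the search URL of every result page, starting with page 1."""
    urls = [f'{BASE_URL}/s-{product}/k0']  # the first page carries no page marker
    for page in range(2, number_of_pages + 1):
        urls.append(f'{BASE_URL}/s-seite:{page}/{product}/k0')
    return urls


for url in page_urls('Nikon-D-7500', 3):
    print(url)
```

Because the first page is always included, the function still returns one URL when no pagination links were counted.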


## Step 3: Filter for key words in title (optional)

This cell reduces the product list to the entries whose title contains at least one of the keywords. The output is a csv file.

In [20]:
# optional: reduce list products with keywords

print('Your keywords are:', keys)

def find_keywords(df_products):
    # keep rows whose title contains at least one keyword (case-sensitive);
    # any() works for any number of keywords, not only exactly four
    mask = df_products['title'].apply(lambda title: any(key in title for key in keys))
    df_products_key = df_products[mask].reset_index(drop=True)
    df_products_key.to_csv('products_key.csv', index=False)
    print(len(df_products_key), 'filtered products saved to products_key.csv-file in your folder.')

find_keywords(df_products)

Your keywords are: ['Neu', 'OVP', 'einwandfrei', 'gepflegt']
0 filtered products saved to products_key.csv-file in your folder.
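
Substring tests with `in` are case-sensitive, so a title containing "neu" does not match the keyword "Neu". A case-insensitive variant could look like the sketch below. The helper name `title_matches` and the sample titles are made up for illustration; only the keyword list is taken from the run above.

```python
def title_matches(title, keywords):
    """Return True if any keyword occurs in the title, ignoring case."""
    title_folded = title.casefold()
    return any(key.casefold() in title_folded for key in keywords)


keys = ['Neu', 'OVP', 'einwandfrei', 'gepflegt']
titles = ['Nikon D7500 Body wie neu', 'Nikon D7500 mit ovp', 'Nikon D7500 defekt']  # made-up titles
print([t for t in titles if title_matches(t, keys)])
# ['Nikon D7500 Body wie neu', 'Nikon D7500 mit ovp']
```

With pandas, the same check could be applied per row via `df_products['title'].apply(lambda t: title_matches(t, keys))`. Note that this is still plain substring matching, so "neu" also matches inside longer words such as "neuwertig".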


## Step 4: Get product description of each product

This cell fetches the description of each product from its detail page. It takes around 30 seconds to process about 50 items. The output is a csv file.

In [None]:
# reset the index first: df_products was sorted by price, so its labels no longer
# run from 0 to n-1 and .loc[i, ...] wrote the descriptions into the wrong rows
df_products = df_products.reset_index(drop=True)

# add column in dataframe (df_products_detailed is the same object as df_products)
df_products_detailed = df_products
df_products_detailed["description"] = ' '

# check each page
for i, subpage in df_products_detailed['link'].items():
    if isinstance(subpage, str):
        url = 'https://www.kleinanzeigen.de' + subpage
        description = ''
        q = requests.get(url)
        html_content = BeautifulSoup(q.text, 'html.parser')
        description_tag = html_content.find('p', {'id': 'viewad-description-text'})
        if description_tag is not None:
            description = description_tag.text.replace('\n','').replace('  ','')
        df_products_detailed.loc[i,'description'] = description

df_products_detailed.to_csv('products_details.csv', index=False)

print('Products details saved to products_details.csv-file in your folder.')

Products details saved to products_details.csv-file in your folder.
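
Deleting `'\n'` outright joins neighbouring sentences without a space, which would produce text like "an.Wurde" in the sample description used in Part B. The sketch below collapses whitespace into single spaces instead. It assumes the sentences are separated by line breaks in the extracted text; if the page uses `<br>` tags, Beautiful Soup's `get_text(separator=' ')` would be the place to add the space. The name `clean_description` and the shortened sample are assumptions for illustration.

```python
import re


def clean_description(raw_text):
    """Collapse every run of whitespace (including line breaks) into one space."""
    return re.sub(r'\s+', ' ', raw_text).strip()


raw = 'Biete meine Nikon D7500 an.\nWurde immer pfleglich behandelt.\n\n  Kein Sand.'  # shortened sample
print(clean_description(raw))
# Biete meine Nikon D7500 an. Wurde immer pfleglich behandelt. Kein Sand.
```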


# Part B: Preprocessing
---

## Step 1: Sentence & Word Tokenization

In [None]:
import spacy
nlp = spacy.load('de_core_news_sm')

doc = nlp("Biete meine Nikon D7500 als Body inklusive Buch zur Kamera an.Wurde immer pfleglich behandelt. Kein Sand, kein Wasser, kein Staub.Kleiner Mangel: Menütaste Schrift beschädigt (s. Bild) und leichte Gebrauchsspuren.Die Kamera hat nur 9800 Auslösungen.Der Akku ist im tadellosen Zustand (0=wie neu).Habe auch einige Objektive und weiteres Zubehör im Angebot.")

for sentence in doc.sents:
    for word in sentence:
        print(word)

# this part is currently under construction

Biete
meine
Nikon
D7500
als
Body
inklusive
Buch
zur
Kamera
an
.
Wurde
immer
pfleglich
behandelt
.
Kein
Sand
,
kein
Wasser
,
kein
Staub
.
Kleiner
Mangel
:
Menütaste
Schrift
beschädigt
(
s.
Bild
)
und
leichte
Gebrauchsspuren
.
Die
Kamera
hat
nur
9800
Auslösungen
.
Der
Akku
ist
im
tadellosen
Zustand
(
0=wie
neu).Habe
auch
einige
Objektive
und
weiteres
Zubehör
im
Angebot
.


## Step 2: Stemming and Lemmatization

# Part C: Feature Engineering
---

...

# Part C (alternative): Regex
---

## Step 1: Find specific products via product description (regex)

**Note:** This step is written for the product type "cameras" and cannot be used for other products. As an alternative to the keyword filter in Part A, Step 3, specific cameras can be found more reliably using **regular expressions**.

This cell filters all product descriptions for:
- expressions describing the camera as "little used"
- fewer than 10,000 shutter releases, i.e. a four-digit count (a good number for a used camera)
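
The second criterion above can also be checked numerically instead of by digit count. The following is a minimal sketch that extracts the shutter count as an integer and compares it with a limit. The names `shutter_count` and `is_low_mileage`, the `limit` parameter, and the handling of the German thousands separator ("9.800") are assumptions for illustration and are not part of the notebook.

```python
import re

# a number with optional German thousands separators, e.g. "9800" or "9.800"
_NUM = r'(\d{1,3}(?:\.\d{3})+|\d+)'
_COUNT_BEFORE = re.compile(_NUM + r'\s+Auslösung', re.IGNORECASE)          # "9800 Auslösungen"
_COUNT_AFTER = re.compile(r'Auslösung\w*\W{0,3}' + _NUM, re.IGNORECASE)    # "Auslösungen: 6181"


def shutter_count(description):
    """Return the shutter count mentioned in the text, or None if there is none."""
    match = _COUNT_BEFORE.search(description) or _COUNT_AFTER.search(description)
    if match is None:
        return None
    return int(match.group(1).replace('.', ''))


def is_low_mileage(description, limit=10_000):
    """True if a shutter count is stated and lies below the limit."""
    count = shutter_count(description)
    return count is not None and count < limit


print(shutter_count('Die Kamera hat nur 9800 Auslösungen.'))   # 9800
print(is_low_mileage('Die Kamera hat 12.500 Auslösungen.'))     # False
```

Descriptions without any count return `None` and are treated as not matching, so this check complements the "little used" wording filter rather than replacing it.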

In [None]:
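print(df_products["description"][7])

# regex with three alternatives:
#  1. exactly four digits or "wenig(e/en)", optionally followed by "mal", before "Auslös.../ausgelös..."
#  2. "Auslösung..." followed by exactly four digits
#  3. "wenig/kaum/nie/nicht/selten" before "genutzt/benutzt/verwendet/fotografiert"
# the lookarounds keep five-digit counts such as 12000 from matching on their last four digits
pattern = (
    r'(?<!\d)(?:\d{4}|[Ww]enig(?:e|en)?)(?:\smal)?\s+[Aa]us(?:ge)?lös'
    r'|[Aa]uslösung\w*\s+\d{4}(?!\d)'
    r'|(?:wenig|kaum|nie|nicht|selten)\s+(?:genutzt|benutzt|verwendet|fotografiert)'
)

for description in df_products["description"]:
    if re.search(pattern, description):
        print(description)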

***Achtung! Die abgebildete Handschlaufe wird NICHT mitverkauft und ist nicht Teil dieses Angebots! ***Sehr wenig genutzte Nikon D7500 DSLR Spiegelreflexkamera (6181 Auslösungen, es werden aber noch ein paar dazukommen).Wurde als Zweitkamera genutzt.Mittlerweile habe ich auch einfach zu viele Kameras (ja, das geht). Also Kamerapark verkleinern...Die D7500 liegt ausgezeichnet in der Handund hat wegen der vielen Knöpfe und Rädchen jede Menge Einstellmöglichkeiten, ohne in Menüstrukturen eintauchen zu müssen. Für ambitionierte Fotografen das ideale Gerät!Die Kamera hat keine sichtbaren Gebrauchspuren.Die Kamera war Teil eines Kits, das Objektiv wurde bereits vor längerem verkauft. Mitgeliefert wird der nicht genutzte und originalverpackte Gurt, ein original Akku, Ladegerät, der Kameradeckel, ein USB-Kabel und die OVP.***Achtung! Die abgebildete Handschlaufe wird NICHT mitverkauft und ist nicht Teil dieses Angebots! ***Abholung bevorzugt!Nichtraucherhaushalt.Fragen? Fragen!*** so lange die

# Part D: Model Building
---

...

# Part E: Model Evaluation
---

...


---
---

In [None]:
# import watermark
# print(watermark.watermark(packages="spacy"))

spacy: 3.8.2

