<h1><center> Python project</center></h1>
<h2><center>Current car prices and other relevant parameters from bazos.cz</center></h2>
<h3><center>Daniel Brosko, Vojtěch Suchánek</center></h3>

Our goal is to web-scrape advertisements listed on the website bazos.cz, which is currently one of the most widely used websites for selling used cars in the Czech Republic. It has more than 15 000 car ads daily. On the other hand, its search options are poor, which complicates searching for a desired car based on your own parameters.

We are going to code an algorithm that scans the ads for the current day, picks those that fulfill our conditions on date and car type, and saves their links. We then visit each link and save the text of the ad. Finally, we analyze the text of the ad to find our parameters.

This approach might also allow for analysis over a longer time period in further steps: we would collect data periodically and investigate trends in price changes, the number of ads for the selected car added on particular days, and more. However, since this project is designed as a one-time run, we decided to limit the data to the current date only.

In [1]:
import requests
from bs4 import BeautifulSoup
import re
from datetime import date, datetime, timedelta
import time
import numpy as np
import pandas as pd

The commented line below displays the versions of the installed packages so that they can be used in requirements.txt.
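
Besides `pip list`, the versions of only the packages imported above can be printed directly in `requirements.txt` format. The following is a minimal sketch using the standard-library `importlib.metadata`; the helper name `pinned_requirements` is our own, and note that distribution names can differ from import names (`beautifulsoup4` is imported as `bs4`).

```python
from importlib.metadata import PackageNotFoundError, version


def pinned_requirements(distributions):
    """Return 'name==version' lines for the installed distributions.

    Distributions that are not installed are skipped rather than guessed.
    """
    lines = []
    for name in distributions:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            # Not installed in this environment, so there is nothing to pin.
            continue
    return lines


# Distribution names for the third-party imports used in this notebook.
for line in pinned_requirements(["requests", "beautifulsoup4", "numpy", "pandas"]):
    print(line)
```

The printed lines can be pasted into `requirements.txt` as they are.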

In [2]:
#pip list

With the code in the following chunk, we checked that we are allowed to scrape the relevant parts of the bazos.cz domain.

In [3]:
bots = requests.get('https://auto.bazos.cz/robots.txt')
#print(bots.text)

From the robots.txt page we can see that the actions performed in our project are allowed, since we do not use the search commands listed there.
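
The same check can also be done programmatically instead of by reading the file. Below is a minimal sketch using the standard-library `urllib.robotparser`. The `SAMPLE_ROBOTS` text and the helper `is_allowed` are hypothetical and only illustrate the mechanism; they do not reproduce the real rules of bazos.cz.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only to illustrate the check.
# The real rules are whatever https://auto.bazos.cz/robots.txt returns.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /search-example/
"""


def is_allowed(robots_text, url, user_agent="*"):
    """Return True if robots_text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "https://auto.bazos.cz/inzerat/1/example.php"))
print(is_allowed(SAMPLE_ROBOTS, "https://auto.bazos.cz/search-example/x"))
```

In the notebook, `bots.text` from the chunk above could be passed as `robots_text`, together with one of the URLs we intend to request.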

In the next chunk, we import two .py scripts in which we defined the functions search_model and n_days_search, which filter the advertisements down to those relevant to our preferences. More details on these functions are given a few chunks below, where we print their documentation, and can also be found by looking at the .py scripts directly on GitHub.

In [4]:
from model_search import search_model
from n_days_search import n_days_search
from data_mining import *

In [5]:
print("Enter your desired car model here:")
my_search = input()


Enter your desired car model here:
octavia 3


In the next chunks, we load the search results and filter for ads added today or at most 5 days ago. For comparison, the number of ads added today (2022-08-30) was 98 at the time, while the number of all ads for the same car-model input was 2022.
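
The date condition can be sketched as follows. This is not the code of `n_days_search`; it only illustrates the idea. It assumes that each listing shows its date as `[D.M. YYYY]` (for example `[30.8. 2022]`), and the names `parse_listing_date` and `is_recent` are hypothetical.

```python
import re
from datetime import date, timedelta

# Assumption: each listing shows its date as "[D.M. YYYY]", e.g. "[30.8. 2022]".
DATE_PATTERN = re.compile(r"\[\s*(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})\s*\]")


def parse_listing_date(text):
    """Return the first date found in text, or None if none is recognised."""
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    day, month, year = (int(part) for part in match.groups())
    return date(year, month, day)


def is_recent(listing_date, today, max_age_days):
    """True if listing_date is at most max_age_days older than today."""
    max_age_days = max(0, min(max_age_days, 5))  # the notebook allows at most 5
    oldest_allowed = today - timedelta(days=max_age_days)
    return oldest_allowed <= listing_date <= today


today = date(2022, 8, 30)
samples = [  # made-up listing headers
    "Octavia 3 - [30.8. 2022]",
    "Octavia 3 - [25.8. 2022]",
    "Octavia 3 - [24.8. 2022]",
]
recent = [s for s in samples if is_recent(parse_listing_date(s), today, 5)]
print(recent)  # keeps the first two samples
```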

In [9]:
soup_list = search_model(my_search)

Loaded https://auto.bazos.cz/20/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/40/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/60/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/80/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/100/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/120/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/140/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/160/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/180/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.baz

Loaded https://auto.bazos.cz/1520/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1540/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1560/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1580/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1600/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1620/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1640/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1660/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded https://auto.bazos.cz/1680/?hledat=octavia+3&hlokalita=&humkreis=25&cenaod=&cenado=&order= from cache.
Loaded htt

In [10]:
my_days = int(input("Enter the number of past days (max. 5) that you want to include in your search here: "))

Enter the number of past days (max. 5) that you want to include in your search here: 5


In [11]:
list_of_offers_url = n_days_search(my_days, soup_list)

The number of found advertisements matching the criteria: 497.


In [12]:
# Print the documentation of the functions we imported earlier.

help(search_model)

help(n_days_search)

Help on function search_model in module model_search:

search_model(user_input: str)
    Function search_model takes string input, hence it has to be in quotes ("").
    The input should be the name of the car model you would like to get results for,
    e.g. "octávia 3" - there should be no problem even when full Czech alphabet is used.
    
    Then the string is stripped of the characters that are not supposed to be in the search input,
    and if there are more than single word in the input, they are connected by '+' (plus) sign,
    since that is the format that bazos.cz use in their URLs.
    
    Then, the prepared string is paste into the common URL format that bazos.cz use.
    
    The very next step is obtaining the number of found advertisements for user's input from the html source code.
    This number of advertisements is used to select the proper length of the adv. tabs list that we will scrape.
    
    The function returns "soup_list" - the list of html codes for each
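
The URL construction described in this docstring can be sketched as below. This is an illustration, not the code of `search_model`: the exact character-stripping rule, the helper names `build_query` and `page_urls`, and the form of the first page URL are assumptions. The offset step of 20 and the query string are taken from the cached URLs printed earlier.

```python
import re

BASE_URL = "https://auto.bazos.cz"
ADS_PER_PAGE = 20  # inferred from the offsets 20, 40, 60, ... in the cached URLs


def build_query(user_input):
    """Replace punctuation by spaces, then join the remaining words with '+'."""
    cleaned = re.sub(r"[^\w\s]", " ", user_input)
    return "+".join(cleaned.split())


def page_urls(user_input, total_ads):
    """Return one search URL per result page for the given number of ads."""
    query = build_query(user_input)
    suffix = f"?hledat={query}&hlokalita=&humkreis=25&cenaod=&cenado=&order="
    urls = []
    for offset in range(0, total_ads, ADS_PER_PAGE):
        # Assumption: the first page has no offset segment in its path.
        # Later pages use /20/, /40/, ... as in the notebook output.
        path = "/" if offset == 0 else f"/{offset}/"
        urls.append(f"{BASE_URL}{path}{suffix}")
    return urls


for url in page_urls("octavia 3", 45):
    print(url)
```

For 45 ads this yields three URLs, and the second one has the same form as the first cached URL in the output above.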

Finally, we proceed to the data-mining part, where we extract the desired parameters: year of manufacture, mileage, and price.

This is probably the most demanding part of the project: we need to extract the relevant data from unformatted text. There is no official format for the ad text, so we tried to find a way to extract this information from various formats. The results are not perfect, since our code sometimes fails to recognize an unusual format of a parameter. In further steps, implementing an ML algorithm could probably improve the recognition rate significantly.
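
To give an idea of what such extraction can look like, here is a minimal regex-based sketch. It is not the code of `get_info`; the patterns, the helper names, and the sample texts are our own assumptions and cover only a few common ways of writing the year, mileage, and price.

```python
import re

# A four-digit year between 1980 and 2029 that is not part of a longer number.
YEAR_RE = re.compile(r"(?<!\d)(19[89]\d|20[0-2]\d)(?!\d)")
# A number written as "189585", "189 585" or "189.585".
NUMBER = r"(?<!\d)(\d{1,3}(?:[ .]\d{3})+|\d+)"
# Mileage such as "189 585 km", "140tkm" or "116 tis. km".
MILEAGE_RE = re.compile(NUMBER + r"\s*(tis\.?\s*km|tkm|km)\b", re.IGNORECASE)
# Price such as "250 000 Kč", "250000 CZK" or "289.000,-".
PRICE_RE = re.compile(NUMBER + r"\s*(?:kč|czk|,-)", re.IGNORECASE)


def to_int(number_text):
    """Convert '189 585' or '289.000' to an integer."""
    return int(re.sub(r"[ .]", "", number_text))


def extract_parameters(text):
    """Return year, mileage in km and price in CZK; None where not recognised."""
    year = YEAR_RE.search(text)
    mileage = MILEAGE_RE.search(text)
    price = PRICE_RE.search(text)
    mileage_km = None
    if mileage:
        mileage_km = to_int(mileage.group(1))
        if mileage.group(2).lower() != "km":
            mileage_km *= 1000  # "tkm" and "tis. km" mean thousands of km
    return {
        "year": int(year.group(1)) if year else None,
        "mileage_km": mileage_km,
        "price_czk": to_int(price.group(1)) if price else None,
    }


print(extract_parameters("Octavia 3 2.0 TDI, rok výroby 2016, najeto 189 585 km, cena 250 000 Kč"))
print(extract_parameters("mlhovky octavia 3"))
```

An ad for a spare part, like the second sample, yields `None` for the year and mileage, which makes such ads easy to drop in a later filtering step.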

We save all of these parameters along with the URLs of the particular advertisements. We created a class ResultTable with two methods, "show_results" and "show_best", with which we can display the best recommended ads for our desired car model.
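
The kind of range filtering that such a table performs can be sketched in plain Python. This is not the implementation of `ResultTable`: the function names and sample rows are made up, and ranking by lowest price is only a stand-in, since the ranking used by "show_best" is not described here.

```python
def in_range(value, lower, upper):
    """True if value is known and lies in the closed interval [lower, upper]."""
    return value is not None and lower <= value <= upper


def filter_offers(offers, min_price, max_price, min_year, max_year,
                  min_mileage, max_mileage):
    """Keep offers whose price, year and mileage all fall into the given ranges."""
    return [
        offer for offer in offers
        if in_range(offer["price"], min_price, max_price)
        and in_range(offer["year"], min_year, max_year)
        and in_range(offer["mileage"], min_mileage, max_mileage)
    ]


def cheapest(offers, n):
    """Stand-in ranking (an assumption): the n offers with the lowest price."""
    return sorted(offers, key=lambda offer: offer["price"])[:n]


offers = [  # made-up rows, not scraped data
    {"link": "ad-1", "price": 250000, "year": 2016, "mileage": 150000},
    {"link": "ad-2", "price": 900, "year": None, "mileage": None},
    {"link": "ad-3", "price": 199000, "year": 2014, "mileage": 180000},
    {"link": "ad-4", "price": 340000, "year": 2019, "mileage": 90000},
]
selected = filter_offers(offers, 50000, 350000, 2013, 2018, 100000, 200000)
print([offer["link"] for offer in cheapest(selected, 10)])  # ['ad-3', 'ad-1']
```

Rows with unrecognized parameters (here `ad-2`) are excluded by `in_range`, so ads whose text could not be parsed never reach the ranking step.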

Hence, we can now take a look at the potentially most interesting advertisements by following the URLs and checking the entire content of several ads instead of looking at "thousands" of them.

In [13]:
# DATA/TEXT MINING PART
result = get_info(list_of_offers_url)

https://auto.bazos.cz/inzerat/158354343/skoda-octavia-3-15tsi-110kw-style-koupcr1majitel2019.php <Response [200]>




https://auto.bazos.cz/inzerat/158297198/skoda-octavia-3-20tdi-110kw-dsg-plna-zaruka-2-roky-zdarma.php <Response [200]>
https://auto.bazos.cz/inzerat/158159992/skoda-octavia-rs-20tdi-135kw-dsg-plna-zaruka-2-roky-zdarma.php <Response [200]>
https://auto.bazos.cz/inzerat/158159895/skoda-octavia-3-14tsi-g-tec-81kw-plna-zaruka-2-roky-zdarma.php <Response [200]>
https://auto.bazos.cz/inzerat/158113235/skoda-octavia-3-20tdi-110kw-dsg-lk-zaruka-2-roky-zdarma.php <Response [200]>
https://auto.bazos.cz/inzerat/158130385/octavia-iii-16-tdi-77-kw-top-stav.php <Response [200]>
https://auto.bazos.cz/inzerat/158068402/skoda-octavia-3-rs-20tdi-135kw-plna-zaruka-2-roky-zdarma.php <Response [200]>
https://auto.bazos.cz/inzerat/158042935/octavia-iii-14-tsi-dsg-business-style-dph.php <Response [200]>
https://auto.bazos.cz/inzerat/158025175/skoda-octavia-3-16tdi-85kw-dsg-plna-zaruka-2-roky-zdarma.php <Response [200]>
https://auto.bazos.cz/inzerat/157968553/octavia-iii-14-tsi-110kw-dsg-odpocet-dph-style-bus

https://auto.bazos.cz/inzerat/156512794/skoda-octavia-3-fc-16tdi-85kw-koupcr1majiteltazne2017.php <Response [200]>
https://auto.bazos.cz/inzerat/156542867/skoda-octavia-combi-20-tdi-062017.php <Response [200]>
https://auto.bazos.cz/inzerat/157700288/octavia-3-r17-zimni-sada.php <Response [200]>
https://auto.bazos.cz/inzerat/155904809/octavia-iii-16tdi-81kw-style-1maj-2016-cr-digiklima-dph.php <Response [200]>
https://auto.bazos.cz/inzerat/156511278/skoda-octavia-3-fc-20tdi-110kw-4x4-koupcr1majitel78tkm.php <Response [200]>
https://auto.bazos.cz/inzerat/158082074/octavia-3-116tiskm-16-tdi-2017-cr-puvod.php <Response [200]>
https://auto.bazos.cz/inzerat/158070595/skoda-octavia-3-combi-20tdi-110kw-dsg-fullledkuzevirtual.php <Response [200]>
https://auto.bazos.cz/inzerat/158070993/skoda-octavia-3-combi-20tdi-110kw-dsg-fullledtaznynavi.php <Response [200]>
https://auto.bazos.cz/inzerat/158071596/skoda-octavia-3-combi-20-tdi-110-kw-4x4-dsg-fullledtazne.php <Response [200]>
https://auto.bazos

https://auto.bazos.cz/inzerat/158319187/predam-cierne-klucky-krytky-kluciek-skoda-volkswagen.php <Response [200]>
https://auto.bazos.cz/inzerat/158318838/mlhovky-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158317716/leve-zadni-svetlo-skoda-octavia-3-kombi.php <Response [200]>
https://auto.bazos.cz/inzerat/158317405/chladic-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158317404/drzak-ventilatoru-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158317402/chladic-vody-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158317401/chladic-klimatizace-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158317398/kapota-octavia-3-po-faceliftu.php <Response [200]>
https://auto.bazos.cz/inzerat/158317396/5dvere-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158317389/pz-dvere-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158317386/pz-dvere-octavia-3-combi.php <Response [200]>
https://auto.bazos.cz/inzera

https://auto.bazos.cz/inzerat/158266043/5x112-r18-skoda-octavia-3-rs-nove-zimni-pneu.php <Response [200]>
https://auto.bazos.cz/inzerat/158265756/skoda-octavia-20tdi-110kw-sfyle-140tkm-cr-2017.php <Response [200]>
https://auto.bazos.cz/inzerat/158265414/orig-alu-disky-5x112-r15-volkswagen-skoda-nove.php <Response [200]>
https://auto.bazos.cz/inzerat/157343782/skoda-octavia-iii-20-tdi-110kw-dsg-2016-189585km-navi.php <Response [200]>
https://auto.bazos.cz/inzerat/158263861/alu-seat-leon-iiileon-fr-letni-22545-r17-hankook.php <Response [200]>
https://auto.bazos.cz/inzerat/158263140/treti-brzdove-svetlo-octavia-3-combi-a-superb-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158262382/skoda-octavia-3-fabia-3-kapoty-narazniky-blatniky-napravy-dv.php <Response [200]>
https://auto.bazos.cz/inzerat/158262138/skoda-octavia-3-mlhove-svetlo-leve.php <Response [200]>
https://auto.bazos.cz/inzerat/158262121/ridici-jednotka-5-dveri-skoda-vw-audi.php <Response [200]>
https://auto.bazos.cz/inzer

https://auto.bazos.cz/inzerat/158208966/prodam-naraznik-skoda-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158208623/maska-octavia-3-rs.php <Response [200]>
https://auto.bazos.cz/inzerat/158207852/16-alu-kola-5x112-audi-skoda-vw-nove-zimni.php <Response [200]>
https://auto.bazos.cz/inzerat/158207551/20-alu-kola-5x112-bmw-i8-vw-skoda-seat-audi-nove-zimni.php <Response [200]>
https://auto.bazos.cz/inzerat/158207485/nove-15-plechove-disky-skoda-octavia-3vw-golf-7seat-leon-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158207480/originalni-ocelove-disky-15-skoda-octavia-3-zimni-pneu.php <Response [200]>
https://auto.bazos.cz/inzerat/158207301/turbo-ihi-is12-18-tsitfsi.php <Response [200]>
https://auto.bazos.cz/inzerat/158205334/zimni-pneu-barum-polaris-5-22545-r18-v-xl.php <Response [200]>
https://auto.bazos.cz/inzerat/158204499/viko-kufru-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158203287/skoda-octavia-iii-lift-dily.php <Response [200]>
https://au

https://auto.bazos.cz/inzerat/158154410/difuzor-octavia-3-rs.php <Response [200]>
https://auto.bazos.cz/inzerat/157732026/skoda-octavia-3-kapotanaskanaraznik-blatnik-svetlo.php <Response [200]>
https://auto.bazos.cz/inzerat/158154358/ocel-disky-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158153298/prevodovka-octavia-3-16-tdi-nova-0a4300047l.php <Response [200]>
https://auto.bazos.cz/inzerat/157742417/octavia-superb-golf-passat-zanovni-zimni-alu-sada-18.php <Response [200]>
https://auto.bazos.cz/inzerat/157799472/skoda-octavia-3-rs-zanovni-zimni-alu-kola-17-et48.php <Response [200]>
https://auto.bazos.cz/inzerat/157736741/golf-octavia-rs-superb-zanovni-zimni-alu-kola-17.php <Response [200]>
https://auto.bazos.cz/inzerat/158151103/nove-alu-disky-vw-golf-nebo-skoda-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158150441/zadni-prave-svetlo-skoda-octavia-3.php <Response [200]>
https://auto.bazos.cz/inzerat/158150115/kapota-octavia-3.php <Response [200]>
https

https://auto.bazos.cz/inzerat/158091523/zadni-naraznik-skoda-octavia-3-combi.php <Response [200]>
https://auto.bazos.cz/inzerat/156787528/octavia-3-fl-style-16-tdi-85-kw.php <Response [200]>
https://auto.bazos.cz/inzerat/158090165/airbagova-sada-skoda-octavia-iii3.php <Response [200]>
https://auto.bazos.cz/inzerat/158089899/prevodovka-skoda-octavia-3-16tdi.php <Response [200]>
https://auto.bazos.cz/inzerat/158088760/servomotor-vzpera-5-dveri-skoda-octavia-3-combi-prava.php <Response [200]>
https://auto.bazos.cz/inzerat/158086884/volant-octavia-3-novy.php <Response [200]>


In [14]:
pd.options.display.max_colwidth = 120
test = ResultTable(result)
test.show_results(
    min_price=50000, max_price=350000,
    min_year=2013, max_year=2018,
    min_mileage=100000, max_mileage=200000,
)
test.show_best(n = 10)

                                                                                                    link  \
45                                           https://auto.bazos.cz/inzerat/158297601/skoda-octavia-3.php   
65                              https://auto.bazos.cz/inzerat/158156129/skoda-octavia-3-20-tdi-110kw.php   
213                                          https://auto.bazos.cz/inzerat/158285302/skoda-octavia-3.php   
81   https://auto.bazos.cz/inzerat/155904809/octavia-iii-16tdi-81kw-style-1maj-2016-cr-digiklima-dph.php   
401                                          https://auto.bazos.cz/inzerat/158160718/skoda-octavia-3.php   
282             https://auto.bazos.cz/inzerat/158240372/skoda-octavia-3-20tdi110kw-velmi-krasne-auto.php   
29   https://auto.bazos.cz/inzerat/157202880/skoda-octavia-3-combi-elegance-16tdi-77kw-nakup-v-klidu.php   
265                    https://auto.bazos.cz/inzerat/158252838/skoda-octavia-3-20-tdi-110kw-4x4-2014.php   
403                         

In [23]:
result.to_csv('result_filtered_backup.csv', index=False)
print(result)

                                                                                                       link  \
0          https://auto.bazos.cz/inzerat/158354343/skoda-octavia-3-15tsi-110kw-style-koupcr1majitel2019.php   
1     https://auto.bazos.cz/inzerat/158297198/skoda-octavia-3-20tdi-110kw-dsg-plna-zaruka-2-roky-zdarma.php   
2    https://auto.bazos.cz/inzerat/158159992/skoda-octavia-rs-20tdi-135kw-dsg-plna-zaruka-2-roky-zdarma.php   
3    https://auto.bazos.cz/inzerat/158159895/skoda-octavia-3-14tsi-g-tec-81kw-plna-zaruka-2-roky-zdarma.php   
4       https://auto.bazos.cz/inzerat/158113235/skoda-octavia-3-20tdi-110kw-dsg-lk-zaruka-2-roky-zdarma.php   
..                                                                                                      ...   
469    https://auto.bazos.cz/inzerat/158105721/skoda-octavia-3-kombi-20tdi-110kw-navi-zaruka-kmrok-2018.php   
470     https://auto.bazos.cz/inzerat/158105670/skoda-octavia-3-rs-kombi-20tdi-navi-xenony-zaruka-135kw.php   
4