## Introduction
This notebook presents the final product. The goal was to collect all motorcycles available on the second-hand market and identify the good deals. Because the data has no target variable that labels a listing as a good or bad deal, it is difficult to build an accurate predictive model. Instead, we rank good and bad deals with a performance metric, which was built in feature_engineering.ipynb. Below, we show how an external user could make use of the project: a function, filter_data, lets the user find good deals within a chosen category.

Example of how a user could filter the data:

    Type = "Sport"
    Power = "11 kW of minder"
    Brand = "Aprilia"

## Import packages

In [1]:
import pandas as pd
import numpy as np
# Homemade function to filter the dataframe
from FilterMoto import filter_data

In [2]:
help(filter_data)

Help on function filter_data in module FilterMoto:

filter_data(df, type_moto=None, power_moto=None, brand_moto=None)
    Function to filter the final dataframe. The user provides the dataframe and a type of motorcycle, and/or power, and/or brand.
    
    Args:
    df : User must provide a pandas dataframe.
    type_moto (str): Optional, default is all types. Otherwise, provide a valid string, example: 'Enduro'
    power_moto (str) : Optional, default is all kind of powers. Otherwise, provide a valid string. Example: 'meer dan 35 kW'
    brand_moto (str) : Optional, default is all brands available. Otherwise, provide a valid string. Example : 'BMW'
    
    Returns: Based on the input, the function returns a filtered pandas dataframe sorted by the best deal.
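
The source of FilterMoto is not shown in this notebook, so the following is only a minimal sketch of how a function with this signature could work. The name filter_data_sketch and the toy dataframe are hypothetical. The ascending sort is an assumption, inferred from the outputs further down, where the most negative metric is listed first.

```python
import pandas as pd


def filter_data_sketch(df, type_moto=None, power_moto=None, brand_moto=None):
    """Hypothetical re-implementation of filter_data, for illustration only."""
    # Start with every row selected, then narrow down for each supplied filter.
    mask = pd.Series(True, index=df.index)
    if type_moto is not None:
        mask &= df["type"] == type_moto
    if power_moto is not None:
        mask &= df["power"] == power_moto
    if brand_moto is not None:
        mask &= df["brand"] == brand_moto
    # Assumption: a lower metric means a better deal, so sort ascending.
    return df[mask].sort_values("metric")


# Tiny made-up dataframe, only to show the call pattern.
toy = pd.DataFrame(
    {
        "brand": ["BMW", "Honda", "BMW"],
        "type": ["Sport", "Sport", "Enduro"],
        "power": ["meer dan 35 kW"] * 3,
        "metric": [-0.2, -0.9, -0.6],
    }
)
result = filter_data_sketch(toy, brand_moto="BMW")
print(result)
```

Leaving an argument at None skips that filter, which matches the "default is all types / powers / brands" behaviour described in the help text.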



## Load data

In [3]:
# read final version of the dataframe
df = pd.read_csv('C:/Users/david/Desktop/courses/5_data_preprocessing/cu_moto_data/df_final.csv')
# Drop all motorcycles for which no metric is available (for example because there was not enough data to compute one; see the feature engineering notebook for more information)
df = df.dropna(subset=['metric'])

In [4]:
# All possible values that can be provided in the filter_data function
Type_set = np.unique(df["type"])
Power_set = np.unique(df["power"])
Brand_set = np.unique(df["brand"])
print(Type_set)
print(Power_set)
print(Brand_set)

['Chopper' 'Crossmotor' 'Enduro' 'Overig' 'Scooter' 'Sport' 'SuperMoto'
 'Toermotor']
['11 kW of minder' '12 t/m 35 kW' 'meer dan 35 kW']
['Aprilia' 'BMW' 'Buell' 'Cagiva' 'Ducati' 'Harley-Davidson' 'Honda'
 'Husqvarna' 'Hyosung' 'KTM' 'Kawasaki' 'MV Agusta' 'Mash' 'Moto Guzzi'
 'Piaggio' 'Royal Enfield' 'Suzuki' 'Triumph' 'Yamaha']
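
Since the filters are exact strings, a misspelled value may simply match no rows (this is an assumption, as the filter_data source is not shown here). The sketch below shows one way to check a value against the lists printed above before filtering. The helper name check_filter_value is hypothetical and not part of FilterMoto.

```python
def check_filter_value(value, allowed, name):
    """Return value unchanged if it is allowed, otherwise raise ValueError."""
    # None means "no filter", mirroring the optional arguments of filter_data.
    if value is None:
        return None
    allowed = sorted(str(a) for a in allowed)
    if value not in allowed:
        raise ValueError(f"Unknown {name} {value!r}; valid values: {allowed}")
    return value


# Values copied from the printed Power_set above.
power_values = ["11 kW of minder", "12 t/m 35 kW", "meer dan 35 kW"]
checked = check_filter_value("12 t/m 35 kW", power_values, "power_moto")
print(checked)
```

In the notebook, the arrays Type_set, Power_set and Brand_set could be passed as the allowed argument instead of a hand-written list.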


#### Looking for Sport models with a powerful engine

In [5]:
filter_data(df=df, type_moto='Sport', power_moto='meer dan 35 kW').head(20)

Unnamed: 0,id,metric,price_eur,km,age,cc,brand,type,power,title,location
3497,https://www.2dehands.be/m1822239772,-0.98659,60000,22600.0,15.010959,675.0,Triumph,Sport,meer dan 35 kW,Triumph daytona 675 motor 2007,Diest
1399,https://www.2dehands.be/m1789113895,-0.963743,0,18000.0,27.019178,1100.0,BMW,Sport,meer dan 35 kW,BMW collectie te koop,Knokke
1525,https://www.2dehands.be/m1437871470,-0.956864,150000,16000.0,9.005479,500.0,Honda,Sport,meer dan 35 kW,moteur complet cbr500r tournant,Philippeville
2817,https://www.2dehands.be/m1829921524,-0.910187,70000,28000.0,14.010959,1000.0,Kawasaki,Sport,meer dan 35 kW,Motor onderdelen kawasaki z1000,Houthalen+ Deel Van Zonhoven En Zolder
1783,https://www.2dehands.be/m1828063011,-0.873505,100000,3607.0,29.019178,,Kawasaki,Sport,meer dan 35 kW,Kawasaki ZZZ 750 Caferacer (bj 1993),Buitenland
2794,https://www.2dehands.be/m1824883785,-0.82083,220000,8929.0,16.010959,650.0,Hyosung,Sport,meer dan 35 kW,Hyosung Comet GT 650 1ste eigenaar,Bornem
3364,https://www.2dehands.be/m1824366766,-0.75871,175000,19000.0,18.013699,650.0,Suzuki,Sport,meer dan 35 kW,Suzuki sv année 2004,Verviers
308,https://www.2dehands.be/m1822718092,-0.739042,180000,15480.0,21.013699,,Honda,Sport,meer dan 35 kW,Honda hornet 600 s,Verviers
2308,https://www.2dehands.be/m1749734397,-0.704494,75000,21800.0,31.021918,600.0,Suzuki,Sport,meer dan 35 kW,Te koop suzuki gsx 600f,"Nijkerk, Nederland"
2394,https://www.2dehands.be/m1823137648,-0.68908,85000,30000.0,26.019178,750.0,Suzuki,Sport,meer dan 35 kW,***Suzuki GSXF 750***,Herzele


#### Looking for BMW motorcycles

In [6]:
filter_data(df=df, brand_moto='BMW').head(20)

Unnamed: 0,id,metric,price_eur,km,age,cc,brand,type,power,title,location
1399,https://www.2dehands.be/m1789113895,-0.963743,0,18000.0,27.019178,1100.0,BMW,Sport,meer dan 35 kW,BMW collectie te koop,Knokke
313,https://www.2dehands.be/m1829022510,-0.914107,10000,51000.0,27.019178,1083.0,BMW,Toermotor,meer dan 35 kW,BMW 1100 RT 100 EURO OM TE DEBATEREN,Drongen
1372,https://www.2dehands.be/m1806552639,-0.793611,340000,13160.0,14.010959,650.0,BMW,Overig,meer dan 35 kW,BMW F650 GS 13160 Km,Meerle
457,https://www.2dehands.be/m1829684758,-0.686728,134000,51730.0,7.005479,1200.0,BMW,Toermotor,meer dan 35 kW,R1200RT - 24m garantie,Rotselaar
3875,https://www.2dehands.be/m1830885073,-0.638598,149500,38000.0,28.019178,650.0,BMW,Enduro,meer dan 35 kW,BMW Funduro F650,Sinaai
943,https://www.2dehands.be/m1791180714,-0.638245,110000,72000.0,18.013699,1130.0,BMW,Toermotor,meer dan 35 kW,BMW R 1150 RT met schade,Zedelgem
333,https://www.2dehands.be/m1828734714,-0.610178,385000,7027.0,17.010959,652.0,BMW,Enduro,meer dan 35 kW,BMW F650 GS,Londerzeel
839,https://www.2dehands.be/m1826606486,-0.602108,275000,34000.0,22.016438,1100.0,BMW,Toermotor,meer dan 35 kW,BMW R1100S voor de echte liefhebber,Boekhoute
326,https://www.2dehands.be/m1828812647,-0.600023,320000,27472.0,20.013699,1130.0,BMW,Toermotor,meer dan 35 kW,Moto BMW 1150RT,Roeselare
3881,https://www.2dehands.be/m1830692102,-0.597252,200000,24500.0,24.016438,650.0,BMW,Toermotor,12 t/m 35 kW,BMW Funduro 650,Isnes


#### Looking for motorcycles with a mid-range engine

In [7]:
filter_data(df=df, power_moto='12 t/m 35 kW').head(20)

Unnamed: 0,id,metric,price_eur,km,age,cc,brand,type,power,title,location
1848,https://www.2dehands.be/m1776528638,-1.486912,12500,300.0,7.005479,300.0,Piaggio,Scooter,12 t/m 35 kW,Seat Vespa GTS 125 T/m 300 IE ABS V.a 2015 Ori...,Kortrijk
2365,https://www.2dehands.be/m1746690649,-1.300122,10000,10000.0,13.008219,1800.0,Honda,Toermotor,12 t/m 35 kW,Achterlichten & kofferdelen Goldwing 1800,Diepenbeek
2647,https://www.2dehands.be/m1812876731,-1.144526,6000,9000.0,6.005479,250.0,Honda,Enduro,12 t/m 35 kW,Honda crf250l uitlaat Delkevic,Herent
2598,https://www.2dehands.be/m1816115166,-1.070721,150000,100.0,2.00274,250.0,Honda,Crossmotor,12 t/m 35 kW,Moto cross 250cc,Boussu
2765,https://www.2dehands.be/m1828496514,-0.978052,250000,300.0,20.013699,426.0,Husqvarna,SuperMoto,12 t/m 35 kW,Fel blauwe SUPERMOTO Yamaha WR426F met papieren,Leopoldsburg
518,https://www.2dehands.be/m1826079567,-0.943605,170000,12000.0,21.013699,250.0,Suzuki,Chopper,12 t/m 35 kW,Suzuki Marauder GZ 250 A2 rijbewijs,Kortrijk
3347,https://www.2dehands.be/m1825517668,-0.932971,120000,15000.0,27.019178,650.0,Suzuki,Chopper,12 t/m 35 kW,Suzuki Savage bobber,Braine-L'Alleud
3136,https://www.2dehands.be/m1799769624,-0.909597,180000,9950.0,8.005479,250.0,Mash,Toermotor,12 t/m 35 kW,MASH 250,Frasnes - Lez - Buissenal
2745,https://www.2dehands.be/m1703237478,-0.895856,85000,22150.0,28.019178,248.0,Kawasaki,Chopper,12 t/m 35 kW,Kawasaki eliminator el 250 goede chopper 1994 ...,Verrebroek
521,https://www.2dehands.be/m1815665175,-0.784024,250000,8000.0,22.016438,250.0,Yamaha,Chopper,12 t/m 35 kW,Yamaha XV 250 Virago 09/2000 8000km Top condi...,Kortrijk


## Discussion/Conclusion
We have arrived at the end of the project, but several aspects can still be improved. First, we encountered a few failures while collecting the data, so the collected data is inconsistent and therefore not optimal. Second, we should improve the extraction of the motorcycle model from the listing text. Third, collecting more data from other second-hand websites would make our results more reliable. Finally, making the output of the project available on a website would benefit end users. Ideally, the project would become a fully automated process that starts with collecting the data on a weekly basis and ends with an application where users can find the good deals.