# Project 02: Boat Sales Dataset

![boat gif](https://i.pinimg.com/originals/97/69/2d/97692d9e26adff3716bc7e41366c55f8.gif)

## About Dataset


The dataset comes from a yacht and boat sales website and comprises 9,888 listings and 10 columns. It covers each boat's price, type, manufacturer, year of construction, length, width, material, and location, plus the number of views the listing received in the past 7 days.


| **Attribute**               | **Description**                                                |
| --------------------------- | -------------------------------------------------------------- |
| Price                       | The price of the boat listing                                  |
| Boat Type                   | The type or category of the boat                               |
| Manufacturer                | The manufacturer or brand of the boat                          |
| Type                        | The boat's condition and, where listed, its fuel type          |
| Year Built                  | The year when the boat was constructed                         |
| Length                      | The length of the boat in feet                                 |
| Width                       | The width of the boat in feet                                  |
| Material                    | The material used in the boat's construction                   |
| Location                    | The location of the boat, including country and city           |
| Number of views last 7 days | The number of views the boat listing received in the last week |

## Project Objective: Analyzing Boat Prices from 2019 to the Most Recent Year

**Main Objective:** The primary goal of this project is to calculate the mean and median prices of boats built between 2019 and the most recent year. 

## Step 1: Handling Null Values

In [15]:
import pandas as pd

Importing the `boat_data.csv` file into the `data` variable.


In [16]:
data = pd.read_csv("boat_data.csv")

In [17]:
# checking for total null values
data.isnull().sum()

Price                             0
Boat Type                         0
Manufacturer                   1338
Type                              6
Year Built                        0
Length                            9
Width                            56
Material                       1749
Location                         36
Number of views last 7 days       0
dtype: int64
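
Raw null counts are easier to judge next to the size of the table. The sketch below expresses missing values as a percentage per column; the small `data` frame is a made-up stand-in for `boat_data.csv`, not the real file.

```python
import pandas as pd

# Hypothetical miniature stand-in for boat_data.csv (values are illustrative).
data = pd.DataFrame({
    "Price": ["EUR 3490", "DKK 25900", "CHF 4600", "EUR 3399"],
    "Manufacturer": ["Terhi power boats", None, "Kimple power boats", None],
    "Length": [4.0, 3.0, None, 3.55],
})

# isnull() yields booleans, so the column mean is the fraction of missing values.
null_share = data.isnull().mean().mul(100).round(1)
print(null_share)
```
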

In [18]:
# data of 2019 and later years
recent_data = data[data["Year Built"] >= 2019]

In [20]:
not_useful = recent_data[recent_data[["Length", "Width"]].isnull().any(axis=1)]
not_useful

Unnamed: 0,Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days
3309,EUR 4950000,Motor Yacht,,Used boat,2019,26.7,,,Italy » Toskana,117
3310,EUR 4936000,Motor Yacht,Bugari power boats,Used boat,2019,26.51,,,Italy » Marcas,96
4006,EUR 560000,Trawler,Bénéteau power boats,"new boat from stock,Diesel",2020,14.74,,,"Spain » Denia, Espagne",107
9434,EUR 16950,Bowrider,Suzuki power boats,Unleaded,2020,,,,Germany » Nordrhein-Westfalen » WSC Hopp / M...,33
9822,DKK 52000,Sport Boat,Crescent power boats,new boat from stock,2019,4.0,,,Denmark » Svendborg,126


In [21]:
recent_data.drop(not_useful.index,inplace = True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data.drop(not_useful.index,inplace = True)
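
The warning above appears because `recent_data` was created by filtering `data`, so pandas cannot tell whether an in-place change is meant to reach the original frame. One common way to avoid it, sketched here on a made-up stand-in for the real data, is to take an explicit `.copy()` when filtering and to reassign the result of `drop` instead of passing `inplace=True`.

```python
import pandas as pd

# Hypothetical miniature stand-in for boat_data.csv (values are illustrative).
data = pd.DataFrame({
    "Year Built": [2018, 2019, 2020, 2020],
    "Length": [6.0, 26.7, None, 4.0],
    "Width": [2.0, None, None, 1.5],
})

# .copy() makes recent_data an independent DataFrame rather than a slice of data.
recent_data = data[data["Year Built"] >= 2019].copy()

# Rows that are missing Length or Width.
not_useful = recent_data[recent_data[["Length", "Width"]].isnull().any(axis=1)]

# drop() returns a new DataFrame; reassigning avoids inplace=True altogether.
recent_data = recent_data.drop(not_useful.index)
print(recent_data)
```
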


In [22]:
recent_data.isnull().sum()

Price                            0
Boat Type                        0
Manufacturer                   127
Type                             2
Year Built                       0
Length                           0
Width                            0
Material                       415
Location                         0
Number of views last 7 days      0
dtype: int64

In [23]:
# Handling the null values in 'Type'
recent_data[recent_data["Type"].isnull()]

Unnamed: 0,Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days
234,DKK 114500,Deck Boat,Crescent power boats,,2019,5.0,1.0,,Denmark » Svendborg,45
8057,DKK 351000,Bowrider,Campion power boats,,2019,5.0,2.0,,Denmark » Svendborg,41


In [39]:
recent_data["Type"][(recent_data["Boat Type"] == "Bowrider") & ((recent_data["Manufacturer"]== "Crescent power boats") | (recent_data["Location"]== "Denmark » Svendborg"))].value_counts()

Type
new boat from stock    9
Name: count, dtype: int64

In [43]:
recent_data["Type"][(recent_data["Boat Type"] == "Deck Boat") & ((recent_data["Manufacturer"]== "Campion power boats") | (recent_data["Location"]== "Denmark » Svendborg"))].value_counts()

Type
new boat from stock    6
Name: count, dtype: int64

In [44]:
recent_data["Type"].fillna(value = "new boat from stock",inplace = True )

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  recent_data["Type"].fillna(value = "new boat from stock",inplace = True )
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data["Type"].fillna(value = "new boat from stock",inplace = True )
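
The first warning above already names the alternative: assign the filled column back instead of calling `fillna(..., inplace=True)` on a column selection. A minimal sketch of that pattern follows; the three-row frame is made up for illustration.

```python
import pandas as pd

# Hypothetical rows; two of them lack a Type, as in the output above.
recent_data = pd.DataFrame({
    "Boat Type": ["Deck Boat", "Bowrider", "Sport Boat"],
    "Type": [None, None, "Used boat,Diesel"],
})

# Assign the filled column back rather than mutating a column selection in place.
recent_data["Type"] = recent_data["Type"].fillna("new boat from stock")
print(recent_data)
```
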


In [52]:
pd.set_option("display.max_rows",None)
recent_data[["Boat Type","Manufacturer","Location"]][(recent_data["Boat Type"] == "Sport Boat") & (recent_data["Location"] == "Denmark » Svendborg")].value_counts()

Boat Type   Manufacturer          Location            
Sport Boat  Campion power boats   Denmark » Svendborg    8
            Yamarin power boats   Denmark » Svendborg    8
            Jeanneau power boats  Denmark » Svendborg    2
            Pioner power boats    Denmark » Svendborg    2
            Crescent power boats  Denmark » Svendborg    1
Name: count, dtype: int64

## Step 2: Extracting Additional Features

In [54]:
recent_data["Location"] = data["Location"]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data["Location"] = data["Location"]


In [55]:
location = recent_data["Location"].str.split("»",expand = True)
location

Unnamed: 0,0,1,2
1,Germany,Bönningstedt,
3,Denmark,Svendborg,
4,Germany,Bayern,München
8,Germany,Bayern,Boote+service Oberbayern
13,Switzerland,Zugersee,Neuheim
20,Germany,Bayern,München
22,Germany,Nordrhein-Westfalen,WSC Hopp / Mönchengladbach
29,Switzerland,Horgen,
41,United Kingdom,"Burton Waters, Burton Waters",
53,Germany,Erftstadt,


In [56]:
recent_data["Country"] = location[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data["Country"] = location[0]
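
Splitting on the separator alone can leave the surrounding spaces attached to each piece. The sketch below extracts only the country and strips that whitespace as a precaution. The sample strings are illustrative, and limiting the split with `n=1` is a choice made here rather than something the notebook does.

```python
import pandas as pd

# Illustrative location strings in the same "country » region » city" shape.
locations = pd.Series([
    "Germany » Bayern » München",
    "Denmark » Svendborg",
    "Switzerland",
])

# n=1 splits only at the first separator: column 0 is the country,
# column 1 holds whatever follows (missing when there is no separator).
parts = locations.str.split("»", n=1, expand=True)

# Remove the space left over before the separator.
country = parts[0].str.strip()
print(country.tolist())
```
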


In [22]:
recent_data

Unnamed: 0,Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days,Country
1,EUR 3490,Center console boat,Terhi power boats,new boat from stock,2020,4.00,1.50,Thermoplastic,Germany » Bönningstedt,75,Germany
3,DKK 25900,Sport Boat,Pioner power boats,new boat from stock,2020,3.00,1.00,,Denmark » Svendborg,64,Denmark
4,EUR 3399,Fishing Boat,Linder power boats,new boat from stock,2019,3.55,1.46,Aluminium,Germany » Bayern » München,58,Germany
8,EUR 3333,Fishing Boat,Crescent power boats,new boat from stock,2019,3.64,1.37,,Germany » Bayern » Boote+service Oberbayern,45,Germany
13,CHF 4600,Runabout,Kimple power boats,new boat from stock,2020,4.40,1.65,Aluminium,Switzerland » Zugersee » Neuheim,113,Switzerland
...,...,...,...,...,...,...,...,...,...,...,...
9872,EUR 4799,Sport Boat,BlueCraft power boats,new boat from stock,2020,5.10,2.16,GRP,Germany » Wesel,203,Germany
9873,EUR 4799,Working Boat,,"new boat from stock,Electric",2019,3.64,1.37,,Germany » Bayern » Boote Jochum,41,Germany
9874,EUR 4790,Fishing Boat,Linder power boats,"new boat from stock,Unleaded",2019,3.55,1.46,Aluminium,Germany » Bayern » München,56,Germany
9885,EUR 4499,Sport Boat,BlueCraft power boats,"new boat from stock,Unleaded",2020,4.40,1.80,GRP,Germany » Nordrhein-Westfalen » Wesel,354,Germany


In [58]:
recent_data.head()

Unnamed: 0,Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days,Country
1,EUR 3490,Center console boat,Terhi power boats,new boat from stock,2020,4.0,1.5,Thermoplastic,Germany » Bönningstedt,75,Germany
3,DKK 25900,Sport Boat,Pioner power boats,new boat from stock,2020,3.0,1.0,,Denmark » Svendborg,64,Denmark
4,EUR 3399,Fishing Boat,Linder power boats,new boat from stock,2019,3.55,1.46,Aluminium,Germany » Bayern » München,58,Germany
8,EUR 3333,Fishing Boat,Crescent power boats,new boat from stock,2019,3.64,1.37,,Germany » Bayern » Boote+service Oberbayern,45,Germany
13,CHF 4600,Runabout,Kimple power boats,new boat from stock,2020,4.4,1.65,Aluminium,Switzerland » Zugersee » Neuheim,113,Switzerland


In [60]:
recent_data["Type"].value_counts()

Type
new boat from stock,Unleaded    781
new boat from stock             445
new boat from stock,Diesel      176
new boat on order,Unleaded      128
Used boat,Unleaded               90
Used boat,Diesel                 84
Used boat                        83
new boat on order,Diesel         58
Display Model,Unleaded           51
new boat on order                49
new boat from stock,Electric     13
Display Model,Diesel             11
Display Model                     9
Used boat,Electric                4
Display Model,Electric            4
new boat from stock,Gas           2
new boat from stock,Hybrid        1
Display Model,Gas                 1
Name: count, dtype: int64

### Categorizing boats according to their condition

In [62]:
# Categorizing boats according to their condition
boat_type = recent_data["Type"].str.split(",",expand = True)
recent_data["Condition"] = boat_type[0]

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data["Condition"] = boat_type[0]


In [63]:
mapping = {"new boat from stock": "New boat", "new boat on order": "New boat",
           "Diesel": "Used boat", "Unleaded": "Used boat", "Electric": "Used boat"}
recent_data["Condition"].replace(mapping,inplace = True)

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  recent_data["Condition"].replace(mapping,inplace = True)
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data["Condition"].replace(mapping,inplace = True)


In [64]:
recent_data["Condition"].value_counts()

Condition
New boat         1653
Used boat         261
Display Model      76
Name: count, dtype: int64
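
The two cells above can also be written as one chained expression that returns a new Series instead of modifying a column in place. This is a sketch on made-up `Type` strings. It reuses the notebook's mapping unchanged, including its treatment of fuel-only entries as used boats.

```python
import pandas as pd

# Illustrative Type strings in the "condition,fuel" shape seen above.
types = pd.Series([
    "new boat from stock,Unleaded",
    "new boat on order",
    "Used boat,Diesel",
    "Display Model,Electric",
    "Unleaded",
])

# Same mapping as in the notebook cell above.
mapping = {"new boat from stock": "New boat", "new boat on order": "New boat",
           "Diesel": "Used boat", "Unleaded": "Used boat", "Electric": "Used boat"}

# Keep the text before the first comma, then collapse it to a condition label.
# Labels that are not in the mapping (e.g. "Display Model") pass through as-is.
condition = types.str.split(",").str[0].replace(mapping)
print(condition.value_counts())
```
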

### Categorizing boats according to their size

In [66]:
min_length = recent_data["Length"].min()
min_length

1.15

In [67]:
max_length = recent_data["Length"].max()
max_length

54.4

In [68]:
min_width = recent_data["Width"].min()
min_width

0.01

In [69]:
max_width = recent_data["Width"].max()
max_width

9.98

In [70]:
recent_data[(recent_data["Length"] == recent_data["Length"].max()) | (recent_data["Width"] == recent_data["Width"].max()) ]

Unnamed: 0,Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days,Country,Condition
2066,EUR 145900,Deck Boat,Jeanneau power boats,"new boat from stock,Unleaded",2020,9.12,9.98,PVC,"France » PORNICHET, France",36,France,New boat
3325,EUR 31000000,Mega Yacht,Majesty Yachts power boats,"new boat from stock,Diesel",2020,54.4,9.6,,United Arab Emirates » Dubai & VAE,1009,United Arab Emirates,New boat


In [71]:
median_length = recent_data["Length"].median()
median_length

7.4

In [72]:
median_width = recent_data["Width"].median()
median_width

2.54

In [34]:
recent_data

Unnamed: 0,Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days,Country,Condition
1,EUR 3490,Center console boat,Terhi power boats,new boat from stock,2020,4.00,1.50,Thermoplastic,Germany » Bönningstedt,75,Germany,New boat
3,DKK 25900,Sport Boat,Pioner power boats,new boat from stock,2020,3.00,1.00,,Denmark » Svendborg,64,Denmark,New boat
4,EUR 3399,Fishing Boat,Linder power boats,new boat from stock,2019,3.55,1.46,Aluminium,Germany » Bayern » München,58,Germany,New boat
8,EUR 3333,Fishing Boat,Crescent power boats,new boat from stock,2019,3.64,1.37,,Germany » Bayern » Boote+service Oberbayern,45,Germany,New boat
13,CHF 4600,Runabout,Kimple power boats,new boat from stock,2020,4.40,1.65,Aluminium,Switzerland » Zugersee » Neuheim,113,Switzerland,New boat
...,...,...,...,...,...,...,...,...,...,...,...,...
9872,EUR 4799,Sport Boat,BlueCraft power boats,new boat from stock,2020,5.10,2.16,GRP,Germany » Wesel,203,Germany,New boat
9873,EUR 4799,Working Boat,,"new boat from stock,Electric",2019,3.64,1.37,,Germany » Bayern » Boote Jochum,41,Germany,New boat
9874,EUR 4790,Fishing Boat,Linder power boats,"new boat from stock,Unleaded",2019,3.55,1.46,Aluminium,Germany » Bayern » München,56,Germany,New boat
9885,EUR 4499,Sport Boat,BlueCraft power boats,"new boat from stock,Unleaded",2020,4.40,1.80,GRP,Germany » Nordrhein-Westfalen » Wesel,354,Germany,New boat


In [74]:
def category(frame):
    i = frame["Length"]
    x = frame["Width"]
    if i < 7.4 and x <= 2.54:
        return "Small"
    elif i == 7.4 and x == 2.54:
        return "Medium"
    else:
        return "Big"
recent_data["Category"] = recent_data.apply(category, axis=1)

recent_data.head()

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data["Category"] = recent_data.apply(category, axis=1)


Unnamed: 0,Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days,Country,Condition,Category
1,EUR 3490,Center console boat,Terhi power boats,new boat from stock,2020,4.0,1.5,Thermoplastic,Germany » Bönningstedt,75,Germany,New boat,Small
3,DKK 25900,Sport Boat,Pioner power boats,new boat from stock,2020,3.0,1.0,,Denmark » Svendborg,64,Denmark,New boat,Small
4,EUR 3399,Fishing Boat,Linder power boats,new boat from stock,2019,3.55,1.46,Aluminium,Germany » Bayern » München,58,Germany,New boat,Small
8,EUR 3333,Fishing Boat,Crescent power boats,new boat from stock,2019,3.64,1.37,,Germany » Bayern » Boote+service Oberbayern,45,Germany,New boat,Small
13,CHF 4600,Runabout,Kimple power boats,new boat from stock,2020,4.4,1.65,Aluminium,Switzerland » Zugersee » Neuheim,113,Switzerland,New boat,Small
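
The row-wise `apply` above can be expressed with `numpy.select`, which evaluates each rule on whole columns at once. The sketch keeps the same rules and the medians reported earlier (7.4 and 2.54); the four sample boats are made up. Under these rules, a boat shorter than the median length but wider than the median width is labelled "Big".

```python
import numpy as np
import pandas as pd

# Illustrative boats; the last one is short but wide.
boats = pd.DataFrame({
    "Length": [4.0, 7.4, 9.12, 6.0],
    "Width": [1.5, 2.54, 9.98, 3.0],
})

# Medians as reported by the cells above.
median_length, median_width = 7.4, 2.54

# Conditions are checked in order; rows matching none get the default label.
conditions = [
    (boats["Length"] < median_length) & (boats["Width"] <= median_width),
    (boats["Length"] == median_length) & (boats["Width"] == median_width),
]
boats["Category"] = np.select(conditions, ["Small", "Medium"], default="Big")
print(boats)
```
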


### Standardizing Price

In [75]:
currency = recent_data["Price"].str.split() 

In [76]:
currency

1           [EUR, 3490]
3          [DKK, 25900]
4           [EUR, 3399]
8           [EUR, 3333]
13          [CHF, 4600]
20          [EUR, 3999]
22          [EUR, 3930]
29          [CHF, 3975]
41            [£, 5170]
53          [EUR, 5200]
55          [CHF, 5500]
70          [EUR, 7800]
75          [CHF, 8155]
85          [EUR, 8790]
87          [EUR, 8600]
89          [CHF, 9265]
100        [CHF, 10500]
110         [EUR, 9553]
112         [EUR, 9500]
120        [EUR, 10115]
121        [CHF, 10900]
143        [EUR, 10999]
144        [EUR, 10999]
145        [EUR, 10999]
155        [EUR, 11990]
157        [CHF, 12950]
158        [DKK, 89000]
159        [DKK, 89000]
160        [CHF, 12900]
161        [CHF, 12900]
163        [EUR, 12800]
165        [EUR, 12770]
166        [DKK, 94990]
167        [EUR, 12750]
170        [EUR, 12500]
181        [EUR, 13566]
195        [CHF, 15600]
199        [CHF, 15400]
200        [EUR, 14200]
201        [EUR, 14200]
203        [EUR, 14000]
211        [EUR,

In [77]:
new = []
for i in currency:
    if i[0] == "CHF": 
        new.append(float(i[1])*1.08)
    elif i[0] == "DKK" :
        new.append(float(i[1])/0.16)
    elif i[0] == "EUR" :
        new.append(float(i[1])*1.12)
    elif i[0] == "£ (GBP)" :  
        new.append(float(i[1])*1.32)
    else:
        new.append(float(i[1]))
new = [round(i,2) for i in new] 
del recent_data["Price"]

In [79]:
recent_data["USD Price"] = new

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  recent_data["USD Price"] = new


In [80]:
price_col = recent_data.pop('USD Price')
recent_data.insert(0, 'USD Price', price_col)

In [81]:
recent_data.head()

Unnamed: 0,USD Price,Boat Type,Manufacturer,Type,Year Built,Length,Width,Material,Location,Number of views last 7 days,Country,Condition,Category
1,3908.8,Center console boat,Terhi power boats,new boat from stock,2020,4.0,1.5,Thermoplastic,Germany » Bönningstedt,75,Germany,New boat,Small
3,161875.0,Sport Boat,Pioner power boats,new boat from stock,2020,3.0,1.0,,Denmark » Svendborg,64,Denmark,New boat,Small
4,3806.88,Fishing Boat,Linder power boats,new boat from stock,2019,3.55,1.46,Aluminium,Germany » Bayern » München,58,Germany,New boat,Small
8,3732.96,Fishing Boat,Crescent power boats,new boat from stock,2019,3.64,1.37,,Germany » Bayern » Boote+service Oberbayern,45,Germany,New boat,Small
13,4968.0,Runabout,Kimple power boats,new boat from stock,2020,4.4,1.65,Aluminium,Switzerland » Zugersee » Neuheim,113,Switzerland,New boat,Small
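
Two details of the conversion loop are worth checking against the output above. DKK amounts are divided by 0.16 while the other currencies are multiplied by their rate, which is why the DKK 25900 listing appears as 161875.0. The GBP branch compares against a label that includes ` (GBP)`, whereas the split prices shown earlier carry only the currency symbol, so those rows fall through to the `else` branch unconverted. The sketch below assumes every rate is meant as USD per one unit of the currency and reuses the notebook's rates. The sample prices are illustrative, and the Step 3 figures in this notebook come from the original loop, not from this version.

```python
import pandas as pd

# Rates taken from the loop above, read as USD per one unit of the currency
# (an assumption; the notebook does not state the direction).
USD_PER_UNIT = {"CHF": 1.08, "DKK": 0.16, "EUR": 1.12, "£": 1.32}

# Illustrative prices in the "<currency> <amount>" shape of the Price column.
prices = pd.Series(["EUR 3490", "DKK 25900", "CHF 4600", "£ 5170"])

# Split once on whitespace: column 0 is the currency, column 1 the amount.
parts = prices.str.split(n=1, expand=True)
amount = parts[1].astype(float)

# Unknown currency labels fall back to a rate of 1.0, like the else branch above.
rate = parts[0].map(USD_PER_UNIT).fillna(1.0)

usd_price = (amount * rate).round(2)
print(usd_price.tolist())
```
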


## Step 3: Analyzing Boat Prices

In [83]:
views_median = recent_data["Number of views last 7 days"].median()

In [86]:
recent_data["Number of views last 7 days"].max()

1710

### Determining the price of boats according to the number of views

In [87]:
more_views = recent_data[recent_data["Number of views last 7 days"] > views_median]
more_views["USD Price"].mean().round(2)  # boats with more views have a lower mean price

701501.18

In [88]:
less_views = recent_data[recent_data["Number of views last 7 days"] <= views_median]
less_views["USD Price"].mean().round(2) 

821840.84
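
The same comparison can be made in one step by grouping on a derived label, which returns the mean and the median for both groups together. This is a sketch on made-up numbers; the group labels are names chosen here for illustration.

```python
import pandas as pd

# Illustrative listings (not taken from the dataset).
boats = pd.DataFrame({
    "Number of views last 7 days": [30, 60, 90, 120, 400],
    "USD Price": [1000.0, 2000.0, 3000.0, 4000.0, 50000.0],
})

views = boats["Number of views last 7 days"]

# Label each listing by whether its views exceed the median (hypothetical labels).
group = (views > views.median()).map({True: "more views", False: "fewer views"})

# One groupby call produces both statistics for both groups.
summary = boats.groupby(group)["USD Price"].agg(["mean", "median"])
print(summary)
```
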

### Determining the price of boats according to the condition of boats

In [90]:
used_boats = recent_data[recent_data["Condition"] == "Used boat"]
new_boats = recent_data[recent_data["Condition"] == "New boat"]

In [91]:
used_boats["USD Price"].mean().round(2)  # used boats have a higher mean price

849722.39

In [92]:
new_boats["USD Price"].mean().round(2)

774737.44

In [93]:
used_boats["USD Price"].median()

168000.0

In [94]:
new_boats["USD Price"].median()

77795.2

### Determining the price of boats according to the size of boats

In [95]:
big_boats = recent_data[recent_data["Category"] == "Big"]
small_boats = recent_data[recent_data["Category"] == "Small"]

In [96]:
big_boats["USD Price"].mean().round(2)  # big boats have a higher mean price

1224816.78

In [97]:
small_boats["USD Price"].mean().round(2)

175501.22

In [98]:
small_boats["USD Price"].median()

44688.0

In [99]:
big_boats["USD Price"].median()

179891.6
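
In every comparison above the mean sits well above the median, which is what a few very expensive listings would produce. The sketch below computes both statistics per group in a single `groupby` call and shows that effect on made-up prices, where one large value lifts the mean of the "Big" group far above its median.

```python
import pandas as pd

# Illustrative prices; the last "Big" boat is an outlier.
boats = pd.DataFrame({
    "Category": ["Small", "Small", "Big", "Big", "Big"],
    "USD Price": [10000.0, 30000.0, 100000.0, 200000.0, 3000000.0],
})

# Mean and median per category in one table.
summary = boats.groupby("Category")["USD Price"].agg(["mean", "median"]).round(2)
print(summary)
```

The same call with `"Condition"` as the grouping column would cover the new versus used comparison.
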