## Recommendation System
The car recommendation system is designed to assist users in finding similar cars based on specific criteria such as manufacturer, paint color, car type, and price range. Leveraging advanced techniques in natural language processing and machine learning, the system analyzes the features of each car and computes their similarities to provide personalized recommendations tailored to the user's preferences.

* Input Criteria: Users provide input criteria including manufacturer, paint colour, car type, and price range.
* Data Filtering: The system reduces the selection to pertinent cars by filtering the car dataset according to the given criteria.
* Feature Extraction: Each car is described in detail using pertinent characteristics including the manufacturer, model, transmission, year, odometer, size, and paint colour.
* Similarity Calculation: By utilizing sophisticated methods such as sigmoid kernel and TF-IDF (Term Frequency-Inverse Document Frequency), the system determines the degree of similarity between any pair of cars by combining their features.
* Recommendation Generation: The system identifies the top similar cars based on the calculated similarities and presents them as recommendations to the user.

To use the car recommendation system, simply provide your preferences including manufacturer, paint colour, car type, and price range. The system will then analyze the dataset and generate personalized recommendations to help you find cars that match your criteria.

In [20]:
# importing necessary libraries
import pandas as pd
import numpy as np
from nltk.corpus import stopwords
from sklearn.metrics.pairwise import sigmoid_kernel
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize import RegexpTokenizer
import re
import string
import random
from PIL import Image
import requests
from io import BytesIO
#import io
import matplotlib.pyplot as plt
%matplotlib inline

In [21]:
# reading vechicle data set from google drive
from google.colab import drive
drive.mount('/content/drive')
csv_file_name='vehicle_dataset.csv'
# csv_file_name='final_vehicle.csv'
vehicle_data_path = f'/content/drive/My Drive/vehicle_dataset/{csv_file_name}'

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [22]:
# loading data
car_data = pd.read_csv(vehicle_data_path)
car_data.head(5)

Unnamed: 0,price,year,manufacturer,model,condition,cylinders,fuel,odometer,title_status,transmission,drive,size,type,paint_color,state,lat,long
0,30990,2017.0,toyota,tundra double cab sr,good,8 cylinders,gas,41124.0,clean,other,,,pickup,red,al,32.59,-85.48
1,15000,2013.0,ford,f-150 xlt,excellent,6 cylinders,gas,128000.0,clean,automatic,rwd,full-size,truck,black,al,32.592,-85.5189
2,35000,2019.0,toyota,tacoma,excellent,6 cylinders,gas,43000.0,clean,automatic,4wd,,truck,grey,al,32.6013,-85.443974
3,29990,2016.0,chevrolet,colorado,good,6 cylinders,gas,17302.0,clean,other,4wd,,pickup,red,al,32.59,-85.48
4,38590,2011.0,chevrolet,corvette,good,8 cylinders,gas,30237.0,clean,other,rwd,,other,red,al,32.59,-85.48


In [23]:
# remove all the null values from the data set:
car_data.dropna(inplace=True)
# check the  dataframe
car_data.info()
car_data.shape

<class 'pandas.core.frame.DataFrame'>
Index: 58140 entries, 1 to 316444
Data columns (total 17 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   price         58140 non-null  int64  
 1   year          58140 non-null  float64
 2   manufacturer  58140 non-null  object 
 3   model         58140 non-null  object 
 4   condition     58140 non-null  object 
 5   cylinders     58140 non-null  object 
 6   fuel          58140 non-null  object 
 7   odometer      58140 non-null  float64
 8   title_status  58140 non-null  object 
 9   transmission  58140 non-null  object 
 10  drive         58140 non-null  object 
 11  size          58140 non-null  object 
 12  type          58140 non-null  object 
 13  paint_color   58140 non-null  object 
 14  state         58140 non-null  object 
 15  lat           58140 non-null  float64
 16  long          58140 non-null  float64
dtypes: float64(4), int64(1), object(12)
memory usage: 8.0+ MB


(58140, 17)

In [24]:
data_copy = car_data.copy()
# function to create a new column 'made', i.e name of the country where the car is manufactured.
def country(row):
  val=None
  if row['manufacturer'] in ['harley-davidson', 'chevrolet', 'pontiac', 'ram', 'ford', 'gmc', 'tesla', 'jeep', 'dodge',
                             'cadillac', 'chrysler', 'lincoln', 'buick', 'saturn', 'mercury']:
    val = 'American'
  elif (row['manufacturer'] in ['lexus', 'nissan', 'toyota', 'acura', 'honda', 'infiniti', 'subaru', 'mitsubishi',
                                'datsun', 'mazda']):
    val = 'Japanese'
  elif (row['manufacturer'] in ['volkswagen', 'mercedes-benz', 'bmw', 'audi', 'porsche']):
    val = 'German'
  elif (row['manufacturer'] in ['ferrari','fiat','alfa-romeo']):
    val = 'Italian'
  elif (row['manufacturer'] in ['kia','hyundai']):
    val = 'Korean'
  elif (row['manufacturer'] in ['volvo']):
    val = 'Swedish'
  elif (row['manufacturer'] in ['rover','mini','land rover', 'jaguar']):
    val = 'English'
  return val
car_data['Made'] = data_copy.apply(country, axis=1)
car_data.head()

Unnamed: 0,price,year,manufacturer,model,condition,cylinders,fuel,odometer,title_status,transmission,drive,size,type,paint_color,state,lat,long,Made
1,15000,2013.0,ford,f-150 xlt,excellent,6 cylinders,gas,128000.0,clean,automatic,rwd,full-size,truck,black,al,32.592,-85.5189,American
21,19900,2004.0,ford,f250 super duty,good,8 cylinders,diesel,88000.0,clean,automatic,4wd,full-size,pickup,blue,al,32.5475,-85.4682,American
23,14000,2012.0,honda,odyssey,excellent,6 cylinders,gas,95000.0,clean,automatic,fwd,full-size,mini-van,silver,al,32.628739,-85.46182,Japanese
28,22500,2001.0,ford,f450,good,8 cylinders,diesel,144700.0,clean,manual,rwd,full-size,truck,white,al,32.6304,-85.4016,American
44,3000,2004.0,chrysler,town & country,good,6 cylinders,gas,176144.0,clean,automatic,fwd,mid-size,mini-van,silver,al,32.629409,-85.484447,American


In [40]:

# create a new featue called car name by concatinatinc manufacturer and model
car_data['car_details'] = car_data[['Made','manufacturer', 'model', 'transmission', 'year', 'odometer', 'size', 'paint_color', 'size']].astype(str).agg(' '.join, axis=1)
car_data.head(2)

Unnamed: 0,price,year,manufacturer,model,condition,cylinders,fuel,odometer,title_status,transmission,drive,size,type,paint_color,state,lat,long,Made,car_details
1,15000,2013.0,ford,f-150 xlt,excellent,6 cylinders,gas,128000.0,clean,automatic,rwd,full-size,truck,black,al,32.592,-85.5189,American,American ford f-150 xlt automatic 2013.0 12800...
21,19900,2004.0,ford,f250 super duty,good,8 cylinders,diesel,88000.0,clean,automatic,4wd,full-size,pickup,blue,al,32.5475,-85.4682,American,American ford f250 super duty automatic 2004.0...


In [34]:
 # filtering data satisfying given criteria
data = car_data.loc[(car_data['paint_color']=='red')
                  & (car_data['type']=='pickup') & ((car_data['price']>=5000) & (car_data['price']<=10000))]
data.reset_index(level = 0, inplace = True)

In [35]:
# reverse mapping of indices and movie titles
indices = pd.Series(data.index, index = data['manufacturer'])
print(indices)

manufacturer
ford          0
ford          1
ford          2
ford          3
ford          4
             ..
toyota       85
ford         86
ford         87
chevrolet    88
ford         89
Length: 90, dtype: int64


In [36]:
 #get car manufacturer country into vectors and used unigram
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 1), min_df = 1, stop_words='english',token_pattern=r'\w{1,}')
tfidf_matrix = tf.fit_transform(data['car_details'])

In [39]:
# calculating similarity measures on the basis of sigmoid_kernel
sg = sigmoid_kernel(tfidf_matrix, tfidf_matrix)
sg

array([[0.76420653, 0.76204942, 0.76214584, ..., 0.76209269, 0.76195432,
        0.7620666 ],
       [0.76204942, 0.76420653, 0.76260512, ..., 0.76204883, 0.76191761,
        0.76202503],
       [0.76214584, 0.76260512, 0.76420653, ..., 0.76214512, 0.76199221,
        0.76211629],
       ...,
       [0.76209269, 0.76204883, 0.76214512, ..., 0.76420653, 0.76218357,
        0.76206599],
       [0.76195432, 0.76191761, 0.76199221, ..., 0.76218357, 0.76420653,
        0.76193502],
       [0.7620666 , 0.76202503, 0.76211629, ..., 0.76206599, 0.76193502,
        0.76420653]])

In [38]:
#index corresponding to original_manufacturer
idx = indices['ford']
print(idx)

manufacturer
ford     0
ford     1
ford     2
ford     3
ford     4
        ..
ford    82
ford    84
ford    86
ford    87
ford    89
Length: 65, dtype: int64


In [None]:
def recommend(manufacturer,paint_color,type,price_range):
    '''
    data set: car_data
    parameters: manufacturer, paint_color, car_type, price_range
    return: dataframe containing the top similar cars
    '''
    # filtering data satisfying given criteria
    data = car_data.loc[(car_data['paint_color']==paint_color)
                  & (car_data['type']==type) & ((car_data['price']>=price_range[0]) & (car_data['price']<=price_range[1]))]
    data.reset_index(level = 0, inplace = True)

    # reverse mapping of indices and movie titles
    indices = pd.Series(data.index, index = data['manufacturer'])
    print(indices)

    #get car manufacturer country into vectors and used unigram
    tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 3), min_df = 3, stop_words='english',token_pattern=r'\w{1,}')
    tfidf_matrix = tf.fit_transform(data['car_details'])


    # calculating similarity measures on the basis of sigmoid_kernel
    sg = sigmoid_kernel(tfidf_matrix, tfidf_matrix)

    #index corresponding to original_manufacturer country
    idx = indices[manufacturer]

#pairwsie similarity scores
    sig = list(enumerate(sg[idx])) #creates an iterable of tuples where each tuple consists of an index and the corresponding similarity score.
# sorting cars
    sig = sorted(sig, reverse=True)
# 6 most similar cars
    sig = sig[0:6]
# car indicies
    cars_indexes = [i[0] for i in sig]

    # recomendation for top 6 similar cars
    rec = data[['price','Made','manufacturer', 'model','type','year','condition','fuel','title_status'
                ,'transmission','paint_color','state']].iloc[cars_indexes]
    return rec


In [41]:
recommend("ford","red","pickup",(9000,10000))

manufacturer
ford       0
ford       1
ford       2
ford       3
ford       4
toyota     5
toyota     6
ford       7
ford       8
ford       9
toyota    10
ford      11
ford      12
ford      13
ford      14
toyota    15
nissan    16
dtype: int64


Unnamed: 0,price,Made,manufacturer,model,type,year,condition,fuel,title_status,transmission,paint_color,state
11,9000,American,ford,f350,pickup,1997.0,good,diesel,clean,automatic,red,pa
10,9977,Japanese,toyota,tundra,pickup,2006.0,excellent,gas,clean,automatic,red,or
9,9495,American,ford,f-150 supercab xlt 4wd,pickup,2007.0,excellent,gas,clean,automatic,red,oh
8,9495,American,ford,f-150 supercab xlt 4wd,pickup,2007.0,excellent,gas,clean,automatic,red,oh
7,9495,American,ford,f-150 supercab xlt 4wd,pickup,2007.0,excellent,gas,clean,automatic,red,oh
6,9500,Japanese,toyota,tacoma xtracab,pickup,2001.0,excellent,gas,clean,automatic,red,oh
