
In [1]:
!pip install sentence-transformers #sentence-transformers is a prerequisite for intellikit and can be installed either separately or during the intellikit installation.

Collecting sentence-transformers
  Downloading sentence_transformers-3.1.1-py3-none-any.whl.metadata (10 kB)
Downloading sentence_transformers-3.1.1-py3-none-any.whl (245 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 245.3/245.3 kB 4.0 MB/s eta 0:00:00
Installing collected packages: sentence-transformers
Successfully installed sentence-transformers-3.1.1


In [2]:
!pip install intellikit #Install the intellikit library

Collecting intellikit
  Downloading intellikit-0.0.5-py2.py3-none-any.whl.metadata (2.9 kB)
Collecting Levenshtein (from intellikit)
  Downloading levenshtein-0.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.2 kB)
Collecting rapidfuzz<4.0.0,>=3.9.0 (from Levenshtein->intellikit)
  Downloading rapidfuzz-3.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (11 kB)
Downloading intellikit-0.0.5-py2.py3-none-any.whl (10 kB)
Downloading levenshtein-0.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (162 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 162.6/162.6 kB 7.5 MB/s eta 0:00:00
Downloading rapidfuzz-3.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3.1/3.1 MB 41.6 MB/s eta 0:00:00
Installing collected packages: rapidfuzz, Levenshtein, intellikit
Successfully in

In [3]:
import pandas as pd #Importing pandas to handle the dataset (casebase)
import intellikit as ik #Importing the installed intellikit library



In [4]:
#Load the dataset (the case base) that the search system will query
#Let's import a CSV file directly from GitHub
url = "https://raw.githubusercontent.com/ArthurKakande/intellikit/main/examples/datasets/cars-1k.csv"

#Let's read the raw link into a DataFrame
df = pd.read_csv(url)

In [5]:
#Let's preview the cars dataset, which will be used to build a car search system that recommends a car based on the requirements of interest
df.head()

Unnamed: 0,price,year,manufacturer,make,fuel,miles,title_status,transmission,drive,type,paint_color
0,22168,2011,mercedes-benz,viano,diesel,203593,rebuilt,manual,fwd,van,black
1,9437,2011,ford,s-max,diesel,137316,rebuilt,manual,fwd,van,black
2,1073,2002,hyundai,matrix,gas,182000,rebuilt,manual,fwd,van,black
3,1846,2012,chrysler,town-country,gas,122800,clean,manual,fwd,van,black
4,3515,2006,fiat,doblo,diesel,155623,clean,manual,4wd,van,black


In [6]:
#Define your first query with a list of features
query = pd.DataFrame({
    'price': [5000],
    'year': [2010],
    'manufacturer': ['mercedes'],
    'fuel': ['diesel'],
    'transmission': ['automatic']

})

#Define the similarity calculation method for each feature in your project
measure_price = ik.sim_logDifference #We only want cars closest to that price
measure_year = ik.sim_logDifference #We only want cars closest to that year
measure_manufacturer = ik.sim_levenshtein #We want something close to the name specified
measure_fuel = ik.sim_stringEM #We want the exact fuel type
measure_transmission = ik.sim_stringEM #We want the exact transmission

# Assign the appropriate similarity calculation functions to each feature of interest in the query
similarity_functions = {
    'price': measure_price,
    'year': measure_year,
    'manufacturer': measure_manufacturer,
    'fuel': measure_fuel,
    'transmission': measure_transmission
}
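
To make the three kinds of measure chosen above more concrete, here is a minimal pure-Python sketch of an exact-match measure, a Levenshtein-based measure, and a log-difference measure. The function names are hypothetical and these are not intellikit's implementations; in particular, the log-difference formula is an assumption picked only to illustrate the idea that numeric values closer to the query score higher.

```python
import math


def exact_match(a: str, b: str) -> float:
    """Return 1.0 when the two strings are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the edit distance into [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def log_difference_similarity(a: float, b: float) -> float:
    """Assumed formula (not intellikit's): 1 / (1 + |ln a - ln b|).

    Both values must be positive, which holds for prices and years.
    """
    return 1.0 / (1.0 + abs(math.log(a) - math.log(b)))


print(exact_match("diesel", "gas"))
print(levenshtein_similarity("mercedes", "mercedes-benz"))
print(log_difference_similarity(5000, 8142))
```

A fuzzy string measure is what lets the query value 'mercedes' still score well against 'mercedes-benz', whereas an exact-match measure would return 0 for that pair.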

In [7]:
# Define a weight for each feature; the linear retriever uses these weights together with the similarity functions when ranking results
feature_weights = {
    'price': 0.4, #Setting price to be the most important metric when recommending
    'year': 0.2,
    'manufacturer': 0.1,
    'fuel': 0.1,
    'transmission': 0.2
}
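
As a rough illustration of how weights like these can combine per-feature similarities into one score, the sketch below computes a weighted mean. The helper `weighted_score` and the per-feature similarity values are hypothetical and made up for the arithmetic; this is an assumption about the general technique, not a description of intellikit's internals.

```python
# Same weights as in the cell above, repeated so the sketch is self-contained
example_weights = {
    'price': 0.4,
    'year': 0.2,
    'manufacturer': 0.1,
    'fuel': 0.1,
    'transmission': 0.2,
}

# The weights above sum to 1, which makes each one read as a share of the score
assert abs(sum(example_weights.values()) - 1.0) < 1e-9


def weighted_score(similarities: dict, weights: dict) -> float:
    """Weighted mean of per-feature similarities (hypothetical helper)."""
    total_weight = sum(weights.values())
    return sum(weights[f] * similarities[f] for f in weights) / total_weight


# Made-up similarity values for one candidate car, for illustration only
example_similarities = {
    'price': 0.9,
    'year': 0.8,
    'manufacturer': 0.6,
    'fuel': 1.0,
    'transmission': 0.0,  # e.g. manual when automatic was requested
}

score = weighted_score(example_similarities, example_weights)
print(round(score, 2))
```

With these numbers a complete mismatch on transmission costs at most 0.2 of the score, so a car can still rank highly if it is close on price and year.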


In [8]:
#Let's now use one of the intellikit retrievers to retrieve the most similar cases (recommend the most similar cars)

top_n = 5  # Number of top similar results to return
top_similar_cases_linear = ik.linearRetriever(df, query, similarity_functions, feature_weights, top_n)
print("Top similar cases (Linear):")
print(top_similar_cases_linear)

Top similar cases (Linear):
    price  year   manufacturer   make    fuel   miles title_status  \
12   8142  2005  mercedes-benz  viano  diesel  321000        clean   
11  12916  2005  mercedes-benz  viano  diesel  205000        clean   
17  14729  2008  mercedes-benz  viano  diesel  180000        clean   
16  14766  2011  mercedes-benz   vito  diesel  205000        clean   
55   5070  2007          mazda      2  diesel  140700      rebuilt   

   transmission drive     type paint_color  
12       manual   4wd      van       black  
11       manual   rwd      van       black  
17       manual   4wd      van       black  
16       manual   rwd      van       black  
55       manual   fwd  compact       black  
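
For readers curious about what a linear (exhaustive) retrieval step involves, here is a minimal pandas sketch: score every case against the query, then keep the highest-scoring rows. The function `linear_retrieve_sketch`, the two similarity helpers, and the example weights are all hypothetical; this is not intellikit's `linearRetriever`, only an illustration of the general idea. The five rows are the ones shown by `df.head()` earlier, reduced to two columns.

```python
import pandas as pd


def ratio_similarity(a: float, b: float) -> float:
    """Hypothetical numeric measure: smaller value divided by larger value."""
    return min(a, b) / max(a, b)


def exact_match(a, b) -> float:
    """1.0 for identical values, otherwise 0.0."""
    return 1.0 if a == b else 0.0


def linear_retrieve_sketch(casebase, query, similarity_functions, feature_weights, top_n):
    """Score every row against the single-row query and return the top_n rows."""
    query_row = query.iloc[0]
    scores = pd.Series(0.0, index=casebase.index)
    for feature, similarity in similarity_functions.items():
        # Similarity between the query value and each case's value for this feature
        per_case = casebase[feature].map(lambda value: similarity(query_row[feature], value))
        scores += feature_weights[feature] * per_case
    top_index = scores.sort_values(ascending=False).head(top_n).index
    return casebase.loc[top_index].assign(score=scores.loc[top_index])


# First five rows of the cars dataset as previewed above (price and fuel only)
sample_cases = pd.DataFrame({
    'price': [22168, 9437, 1073, 1846, 3515],
    'fuel': ['diesel', 'diesel', 'gas', 'gas', 'diesel'],
})
sample_query = pd.DataFrame({'price': [5000], 'fuel': ['diesel']})

result = linear_retrieve_sketch(
    sample_cases,
    sample_query,
    {'price': ratio_similarity, 'fuel': exact_match},
    {'price': 0.7, 'fuel': 0.3},  # illustrative weights, not the ones used above
    top_n=2,
)
print(result)
```

Because every case is scored, the cost grows linearly with the size of the case base, which is acceptable for a dataset of about a thousand cars.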


In [9]:
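#The top recommendation is the Mercedes-Benz whose price (8142) is closest to the requested 5000 among the results shown, which is consistent with price being our most heavily weighted feature.
#Note that every returned car has a manual transmission, even though the query asked for an automatic.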