In [3]:
# Install OpenAI library
!pip install -U -q openai tenacity

# Overview of the Used Car (Pre-Owned Car) Assistant AI
- The data set contains information about used cars in India.
- Each row describes the attributes of one car.
- The assistant can be used on platforms such as Cars24, OLX, CarDekho, and Quikr.
- The assistant helps the user select a used car based on the following 6 parameters:
    - Brand preferences
    - Location
    - Fuel_type (Diesel, Petrol, Electric)
    - Transmission (Manual, Automatic)
    - Owner_Type (First, Second, Third, Fourth, Fifth)
    - Price range
- The assistant gathers the user's requirements for these 6 parameters and helps the user select an appropriate used car.
- If no car fulfills the user's requirements, the assistant prints a message that it is connecting the user to a human expert for further help.
- Once the requirements are finalized, the assistant searches the data set for the best-matching cars and presents them to the user.
- All user text goes through moderation checks to avoid responding to flagged text.

## Project documentation

### Project's goals
Design and develop an AI assistant that helps users select a car based on their requirements. The assistant interacts with the user in natural language and asks questions about their preferences for buying a car. Based on the user's requirements, it shortlists a few cars and presents them to the user.

### AI model used
* OpenAI's gpt-3.5-turbo, accessed through the Python module openai
* The Chat Completions and Moderation APIs

### Data sources
* Used cars dataset from Kaggle <insert link here>

### Key design decisions
* Use of pandas for handling the datasets
* Use of OpenAI APIs for handling conversations and moderation
* Identify a subset of features to use as filters for shortlisting cars, e.g. price range, brand, fuel type, and transmission
* The prompt is designed to ask the user questions one at a time and build a machine-readable Python dictionary as output
* Use of a thought chain (a flowchart of steps) in the initial prompt to decide whether enough data has been collected
* The model generates the machine-parseable dictionary as part of the first prompt, so no second prompt is needed to extract it. Simple string-based parsing does the job.

### Challenges faced
* Understanding how the API has changed between the reference code and the latest model supported by OpenAI
* Writing the system message prompt and iterating on it to get the required results
* Anticipating user actions while writing the prompt, to handle various scenarios
* Printing the conversation array, which is a mix of dictionaries and OpenAI class objects. `types.SimpleNamespace` solved this challenge.
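
The last challenge can be illustrated with a minimal sketch. It assumes only that the response objects expose `role` and `content` attributes; `FakeMessage` is a hypothetical stand-in for the object the OpenAI client returns, and `as_message` is an illustrative helper name, not part of the notebook.

```python
from types import SimpleNamespace


def as_message(msg):
    """Return an object with .role and .content for both dicts and objects."""
    if isinstance(msg, dict):
        # Wrap the dict so its keys can be read as attributes.
        return SimpleNamespace(**msg)
    return msg


class FakeMessage:
    """Hypothetical stand-in for a message object returned by the API client."""

    def __init__(self, role, content):
        self.role = role
        self.content = content


conversation = [
    {"role": "user", "content": "I want a diesel car"},
    FakeMessage("assistant", "Which city are you in?"),
]

# Both kinds of entries can now be read the same way.
roles = [as_message(m).role for m in conversation]
for m in conversation:
    m = as_message(m)
    print(f"{m.role}: {m.content}")
```

Normalizing at the point of printing keeps the conversation list itself unchanged, so it can still be sent back to the API as-is.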



## Importing Libraries

In [94]:
import os, json, ast, openai
import pandas as pd
import types, sys


from IPython.display import display, HTML
from tenacity import retry, wait_random_exponential, stop_after_attempt
from openai import OpenAI



# Part A - DATA SET Preparation 

- This data set is downloaded from kaggle.com

In [80]:
df = pd.read_csv('used_cars_data.csv')
df.head()

| | S.No. | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | New_Price | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Maruti Wagon R LXI CNG | Mumbai | 2010 | 72000 | CNG | Manual | First | 26.6 km/kg | 998 CC | 58.16 bhp | 5.0 | NaN | 1.75 |
| 1 | 1 | Hyundai Creta 1.6 CRDi SX Option | Pune | 2015 | 41000 | Diesel | Manual | First | 19.67 kmpl | 1582 CC | 126.2 bhp | 5.0 | NaN | 12.5 |
| 2 | 2 | Honda Jazz V | Chennai | 2011 | 46000 | Petrol | Manual | First | 18.2 kmpl | 1199 CC | 88.7 bhp | 5.0 | 8.61 Lakh | 4.5 |
| 3 | 3 | Maruti Ertiga VDI | Chennai | 2012 | 87000 | Diesel | Manual | First | 20.77 kmpl | 1248 CC | 88.76 bhp | 7.0 | NaN | 6.0 |
| 4 | 4 | Audi A4 New 2.0 TDI Multitronic | Coimbatore | 2013 | 40670 | Diesel | Automatic | Second | 15.2 kmpl | 1968 CC | 140.8 bhp | 5.0 | NaN | 17.74 |


In [81]:
# Checking the Null values
df.isna().sum()

S.No.                   0
Name                    0
Location                0
Year                    0
Kilometers_Driven       0
Fuel_Type               0
Transmission            0
Owner_Type              0
Mileage                 2
Engine                 46
Power                  46
Seats                  53
New_Price            6247
Price                1234
dtype: int64

In [7]:
df.shape

(7253, 14)

### Insight
- There are 14 features in this data set.
- The 'S.No.' feature is dropped because it is not useful for analysis, and the 'New_Price' feature is dropped because about 86% of its values are missing (6247 of 7253 rows).
- Rows with missing values in the remaining columns are removed as well.
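
As a worked check of the missing-value shares, here is a small sketch that uses only the null counts and row count printed above (no DataFrame needed). The variable names are illustrative.

```python
# Null counts copied from the df.isna().sum() output above.
null_counts = {
    "Mileage": 2,
    "Engine": 46,
    "Power": 46,
    "Seats": 53,
    "New_Price": 6247,
    "Price": 1234,
}
total_rows = 7253  # first element of df.shape

# Share of missing values per column, in percent.
missing_pct = {col: 100 * n / total_rows for col, n in null_counts.items()}

# Print the columns from most to least affected.
for col, pct in sorted(missing_pct.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{col:<10} {pct:5.1f}%")
```

'New_Price' is missing in roughly 86% of rows and 'Price' in roughly 17%, while every other column is missing in under 1% of rows, which is why one column is dropped and the rest are handled by removing rows.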

In [82]:
df = df.drop(['S.No.', 'New_Price'], axis=1)

# Remove the rows that still contain missing values
df = df.dropna()

df.shape

(5975, 12)

In [83]:
# Now there are no missing values in the data set
df.isna().sum()

Name                 0
Location             0
Year                 0
Kilometers_Driven    0
Fuel_Type            0
Transmission         0
Owner_Type           0
Mileage              0
Engine               0
Power                0
Seats                0
Price                0
dtype: int64

### Extracting the 'Brand' name of the car from 'Name'

In [84]:
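# Extract the 'Brand' from the first word of 'Name'
df['Brand'] = df['Name'].apply(lambda x: x.split(' ')[0])

# Convert the number of 'Seats' to an integer
df['Seats'] = df['Seats'].apply(lambda x: int(x))

# Convert 'Price' from lakh to INR (1 lakh = 100000).
# Note: int() truncates, so a floating-point product just below a whole
# number loses one rupee (17.74 lakh becomes 1773999 in the output below).
df['Price'] = df['Price'].apply(lambda x: int(x * 100000))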

df.head()

| | Name | Location | Year | Kilometers_Driven | Fuel_Type | Transmission | Owner_Type | Mileage | Engine | Power | Seats | Price | Brand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Maruti Wagon R LXI CNG | Mumbai | 2010 | 72000 | CNG | Manual | First | 26.6 km/kg | 998 CC | 58.16 bhp | 5 | 175000 | Maruti |
| 1 | Hyundai Creta 1.6 CRDi SX Option | Pune | 2015 | 41000 | Diesel | Manual | First | 19.67 kmpl | 1582 CC | 126.2 bhp | 5 | 1250000 | Hyundai |
| 2 | Honda Jazz V | Chennai | 2011 | 46000 | Petrol | Manual | First | 18.2 kmpl | 1199 CC | 88.7 bhp | 5 | 450000 | Honda |
| 3 | Maruti Ertiga VDI | Chennai | 2012 | 87000 | Diesel | Manual | First | 20.77 kmpl | 1248 CC | 88.76 bhp | 7 | 600000 | Maruti |
| 4 | Audi A4 New 2.0 TDI Multitronic | Coimbatore | 2013 | 40670 | Diesel | Automatic | Second | 15.2 kmpl | 1968 CC | 140.8 bhp | 5 | 1773999 | Audi |
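
The Audi row above shows 17.74 lakh stored as 1773999 rather than 1774000. A small sketch of why this happens and one possible way to avoid it is shown here; the two function names are hypothetical, and using `round()` is a suggestion rather than what the notebook does.

```python
LAKH = 100_000


def lakh_to_inr_truncated(price_lakh):
    """Same conversion as the notebook: int() drops the fractional part."""
    return int(price_lakh * LAKH)


def lakh_to_inr_rounded(price_lakh):
    """Alternative: round to the nearest rupee before converting."""
    return round(price_lakh * LAKH)


# 17.74 cannot be represented exactly in binary floating point, so the
# product lands just below 1774000 and truncation loses one rupee.
print(lakh_to_inr_truncated(17.74))
print(lakh_to_inr_rounded(17.74))
```

With pandas, the rounded variant could be applied in the same `apply` call used above.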


# Part B: System Design

- Once the data set is ready for use, the assistant gathers the user's requirements for the following parameters:
    - Brand (e.g. 'Maruti', 'Hyundai', 'Honda', 'Audi', 'Nissan', etc.)
    - Location (e.g. 'Mumbai', 'Pune', 'Chennai', 'Hyderabad', etc.)
    - Fuel_type (Diesel, Petrol, Electric)
    - Transmission (Manual, Automatic)
    - Owner_Type (First, Second, Third, Fourth, Fifth)
    - Price

- The assistant asks the user for a suitable budget range for purchasing the used car.
- The assistant first matches the cars within that budget against the remaining parameters (Brand, Location, Fuel_type, Transmission, Owner_type).
- It then sorts the cars by how well they match the user's requirements and, among equally good matches, in ascending order of price.
- It presents the top 3 cars to the user and asks the user to choose one.
- Finally, the assistant presents the chosen car to the user.
- If no car fulfills the user's requirements, the assistant prints a message that it is connecting the user to a human expert for further help.
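
The ranking flow described in this list can be sketched with plain Python dictionaries instead of the DataFrame. This is an illustration under assumptions, not the notebook's implementation: `score_car` and `top_matches` are hypothetical names, the sample rows are taken from the table in Part A, and scoring every listed parameter (including transmission) is a choice made for this sketch.

```python
def score_car(car, req):
    """Score one car; 0 means the price is outside the budget."""
    low, high = min(req["budget"]), max(req["budget"])
    if not low <= car["Price"] <= high:
        return 0
    score = 1  # one point for being within budget
    brands = [b.lower() for b in req["brand"]]
    if not brands or car["Brand"].lower() in brands:
        score += 1  # an empty brand list means no preference
    pairs = [("Location", "location"), ("Fuel_Type", "fuel_type"),
             ("Transmission", "transmission"), ("Owner_Type", "ownership")]
    for column, key in pairs:
        if car[column].lower() == req[key].lower():
            score += 1
    return score


def top_matches(cars, req, limit=3):
    """Best score first; cheaper car first among equal scores."""
    scored = [(score_car(c, req), c) for c in cars]
    scored = [(s, c) for s, c in scored if s > 0]
    scored.sort(key=lambda sc: (-sc[0], sc[1]["Price"]))
    return [c for _, c in scored[:limit]]


cars = [
    {"Name": "Maruti Wagon R LXI CNG", "Brand": "Maruti", "Location": "Mumbai",
     "Fuel_Type": "CNG", "Transmission": "Manual", "Owner_Type": "First", "Price": 175000},
    {"Name": "Hyundai Creta 1.6 CRDi SX Option", "Brand": "Hyundai", "Location": "Pune",
     "Fuel_Type": "Diesel", "Transmission": "Manual", "Owner_Type": "First", "Price": 1250000},
    {"Name": "Honda Jazz V", "Brand": "Honda", "Location": "Chennai",
     "Fuel_Type": "Petrol", "Transmission": "Manual", "Owner_Type": "First", "Price": 450000},
    {"Name": "Maruti Ertiga VDI", "Brand": "Maruti", "Location": "Chennai",
     "Fuel_Type": "Diesel", "Transmission": "Manual", "Owner_Type": "First", "Price": 600000},
]
req = {"brand": ["maruti", "honda"], "location": "chennai", "fuel_type": "petrol",
       "transmission": "manual", "ownership": "first", "budget": [200000, 700000]}

matches = top_matches(cars, req)
if matches:
    for car in matches:
        print(car["Name"], car["Price"])
else:
    print("Connecting you to a human expert for further help.")
```

Treating the budget as a hard filter and everything else as a score lets near-misses (for example, the right city but a different fuel type) still appear in the shortlist.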


## Stage 1: Conversation and Information Gathering

In [115]:
# Set up the environment to use OpenAI's API.
# Read the key with a context manager so the file is closed afterwards.
with open('chat_gpt_api_key.txt', 'r') as key_file:
    openai.api_key = key_file.read().strip()
os.environ['OPENAI_API_KEY'] = openai.api_key

client = OpenAI(api_key=openai.api_key)
user_req = {}


In [116]:
def car_assist_req_prompt():
    ''' 
    A system message for used car assistant AI
    Returns a dict {"role": "system", "content": system_message}
    '''

    delimiter = "####"
    user_req = {
        'brand': '',
        'location': '',
        'ownership': '',
        'fuel_type': '',
        'transmission': '',
        'budget': '',
    }

    system_message = f"""
        You are an intelligent car assistant helping users buy pre-owned cars. Your job is to ask questions to user to gather all the requirements and be confident about it.
        Your final goal is to create a python dictionary which describes the complete user requirements with high confidence.
        {delimiter}
        The dictionary must contain following keys with valid values for each of the key:
        {list(user_req)}
        {delimiter}
        Following are the constraints on the values for each key in the dictionary:
        'brand' must be a list of car manufacturer brands. can be empty.
        'location' must be a city name in India.
        'ownership' must be either of ['first', 'second', 'third', 'fourth', 'fifth'] representing the owner of the car (e.g. first owner, second owner, etc)
        'fuel_type' must be either of ['petrol', 'diesel', 'electric']
        'transmission' must be either of ['manual', 'automatic']
        'budget' must be a list of numbers containing two values: minimum and maximum price of the car.
        Here, the minimum acceptable budget is 50000 INR. Prompt user for different values that fit this requirement.
        {delimiter}
        To build such a dictionary, start with a short welcome message and follow this process:
        {delimiter}
        step1:
        start by asking short question to the user asking for their preference. Prefer to ask for maximum of one requirements per question.
        Evaluate each user response and infer with high confidence, the values of the dictionary. If you are not confident about the values, you must ask more questions
        until you get concrete answers. If the user gives answers that do not fit the above requirements, give choices to user to select from.

        Users may not answer the questions with exact values. You must convert units correctly. (e.g. "my budget is around 90k" means the user's budget is 90000 with some 10% variance around this number)
        If users provide answers that cannot fit in the above constraints, ask them to change their response and update the dictionary values only then.
        set a confirmation count to 4
        {delimiter}
        step2:
        Ask for more questions and when you have built the above dictionary with high confidence values.
        At any cost, avoid using random values or low confidence values for any of the requirements. You will be extremely penalized for doing this.
        this time give a special response that must contain only and only the python dictionary.
        After every response from user, update the dictionary.
        {delimiter}
        step3:
        check if the requirements gathered so far covers all the keys. Values must not be 'unknown' or None or empty unless allowed in the requirements.
        Note that when I say a key must contain from a given values, any value outside that list is considered wrong and you will be penalized for that.
        if the dictionary is incomplete, values do not fit the constraints or any other issues, jump to step2 and continue.
        When all the requirements are complete, then and only then go to next step.
        {delimiter}
        step4:
        you may have overlooked some aspects for verifying if you have all the data required and have full confidence.
        Hence, I want you to reduce the confirmation count and if it is not zero, jump to step3 and evaluate the entire conversation once again
        to confirm about your confidence. Only when the confirmation count is zero, go to next step.
        {delimiter}
        step5:
        Only print the python dictionary. there must not be anything else in the response but the python dictionary. Note that the response, when parsed as
        dictionary must succeed. Do not add anything else in the final response.
        {delimiter}
    """
    return {'role': 'system', 'content': system_message}
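
The prompt asks the model to read an answer such as "my budget is around 90k" as 90000 with about 10% variance. The arithmetic it is expected to perform can be sketched as follows; `budget_range` is a hypothetical helper for illustration only (the notebook leaves this conversion to the model), and the supported suffixes are an assumption.

```python
import re


def budget_range(text, variance_pct=10):
    """Turn text such as '90k' or '4.5 lakh' into [min, max] in INR."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(k|lakh)?", text.lower())
    if not match:
        return None  # no number found in the text
    multiplier = {"k": 1_000, "lakh": 100_000}.get(match.group(2), 1)
    center = round(float(match.group(1)) * multiplier)
    # Integer arithmetic keeps the bounds exact.
    delta = center * variance_pct // 100
    return [center - delta, center + delta]


print(budget_range("my budget is around 90k"))
print(budget_range("about 4.5 lakh"))
```

The resulting list has the same shape as the 'budget' value the prompt requires: two numbers giving the minimum and maximum price.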



In [117]:
def openai_interact(conv, user_str=None):
    ''' 
    Send the previous conversation and append the response from the gpt.
    '''

    if user_str:
        conv.append({
            'role': 'user',
            'content': user_str
        })
    # Interact with GPT
    resp = client.chat.completions.create(
        messages=conv,
        model="gpt-3.5-turbo",
        # response_format = { "type": "json_object"},
    )

    conv.append(resp.choices[0].message)
    return resp


def describe_conversation(conv):
    ''' 
    Print the conversation in human readable format.
    '''

    for msg in conv:
        if isinstance(msg, dict):
            msg = types.SimpleNamespace(**msg)

        if msg.role == 'system':
            continue
        

        print(f"{msg.role}: ")
        
        for line in msg.content.split('\n'):
            print(f'     {line}')

        if msg.role == 'assistant':
            print("_" * 100)
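
The notebook installs and imports tenacity but `openai_interact` calls the API without any retry. Below is a dependency-free sketch of the retry-with-random-exponential-backoff idea, so it can run without network access. `call_with_retry` and `flaky` are hypothetical names, and the attempt count and delays are assumptions.

```python
import random
import time


def call_with_retry(func, attempts=3, base_delay=1.0, max_delay=20.0, sleep=time.sleep):
    """Call func(); on failure wait a random, exponentially growing delay and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            upper = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, upper))


# Demonstration with a function that fails twice before succeeding.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# The sleep is replaced by a no-op so the demonstration finishes instantly.
result = call_with_retry(flaky, sleep=lambda seconds: None)
print(result, calls["count"])
```

With tenacity itself, a comparable effect could likely be had by decorating the API call with `@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(3))`, using the names already imported at the top of the notebook.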



In [118]:
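def extract_dict(text):
    ''' 
    Extract the last {...} block from the text and return it as a dictionary.
    Returns None when the text contains no parseable dictionary.
    '''
    # Keep only what lies between the last '{' and the next '}'
    parts = text.split('{')
    parts = parts[-1].split('}')[0]
    # The model replies with Python-style single quotes; JSON needs double quotes
    parts = parts.replace("'", '"')
    try:
        return json.loads('{' + parts + '}')
    except json.JSONDecodeError:
        return None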
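
Replacing every single quote breaks on values that contain an apostrophe. One possible alternative, sketched here under the assumption that the model replies with a Python-style dictionary literal, is `ast.literal_eval` (the `ast` module is already imported above). `extract_dict_literal` is a hypothetical name, and unlike `extract_dict` it takes the span from the first '{' to the last '}'.

```python
import ast


def extract_dict_literal(text):
    """Parse the {...} span in text as a Python literal; None if that fails."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None  # no braces, so no dictionary
    try:
        value = ast.literal_eval(text[start:end + 1])
    except (ValueError, SyntaxError):
        return None
    return value if isinstance(value, dict) else None


reply = "Here you go: {'brand': ['honda'], 'location': 'Mumbai', 'budget': [100000, 500000]}"
print(extract_dict_literal(reply))
print(extract_dict_literal('{"brand": [], "note": "owner\'s pick"}'))
print(extract_dict_literal("What fuel type do you prefer?"))
```

`ast.literal_eval` only evaluates literals, so it does not execute arbitrary code from the model's reply.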

In [119]:
def collect_requirements(conv):
    ''' 
    Run the conversation between the user and GPT.
    Collect the user's requirements for the 6 defined parameters.
    Check every user response against the moderation layer and skip flagged text.
    Return the dictionary once all the information is gathered from the user,
    or None if the user types 'quit' or 'exit'.
    '''
    while True:
        sys.stdout.flush()
        user_input = input('Enter your message : ')
        if user_input.lower() in ['quit', 'exit']:
            break
        moderation_response = client.moderations.create(input=user_input)
        if moderation_response.results[0].flagged:
            print('user:')
            print('     ', user_input)
            print('system:')
            print('     This message was flagged by moderation system. Please try again.')
            continue
        openai_interact(conv, user_input)
        created_dict = extract_dict(conv[-1].content)
        if created_dict:
            describe_conversation(conv[-2 : -1])
            return created_dict
        describe_conversation(conv[-2:])
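
The dictionary returned by the model is used as-is. A hedged sketch of a check against the constraints stated in the system prompt is shown below; `validate_requirements` and the constant names are hypothetical and not part of the notebook.

```python
REQUIRED_KEYS = {"brand", "location", "ownership", "fuel_type", "transmission", "budget"}
ALLOWED = {
    "ownership": {"first", "second", "third", "fourth", "fifth"},
    "fuel_type": {"petrol", "diesel", "electric"},
    "transmission": {"manual", "automatic"},
}
MIN_BUDGET = 50_000  # minimum acceptable budget from the system prompt


def validate_requirements(req):
    """Return a list of problems; an empty list means the dict is usable."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(req))]
    if problems:
        return problems
    for key, allowed in ALLOWED.items():
        if str(req[key]).lower() not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}")
    if not isinstance(req["brand"], list):
        problems.append("brand must be a list (it may be empty)")
    if not isinstance(req["location"], str) or not req["location"].strip():
        problems.append("location must be a non-empty city name")
    budget = req["budget"]
    if (not isinstance(budget, list) or len(budget) != 2
            or not all(isinstance(v, (int, float)) for v in budget)):
        problems.append("budget must be a list of two numbers")
    elif min(budget) < MIN_BUDGET:
        problems.append(f"budget must start at {MIN_BUDGET} INR or more")
    return problems


good = {"brand": ["honda"], "location": "Mumbai", "ownership": "first",
        "fuel_type": "petrol", "transmission": "manual", "budget": [100000, 500000]}
bad = dict(good, fuel_type="cng", budget=[10000, 20000])
print(validate_requirements(good))
print(validate_requirements(bad))
```

A non-empty result could be fed back into the conversation as a user-side correction instead of letting a later stage fail on an unexpected value.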

    
      

## Stage 2: Product Mapping and Information Extraction

In [120]:
def match_car(row):
    ''' 
    Score one row (used car) of the data set against the user requirements.
    Returns 0 when the price is outside the budget; otherwise 1 plus one
    point for each matching parameter.
    '''

    # Out-of-budget cars are never recommended
    if row['Price'] < min(user_req['budget']) or row['Price'] > max(user_req['budget']):
        return 0

    score = 1
    # Lowercase into a local list instead of rewriting the global dict on every row
    brands = [text.lower() for text in user_req['brand']]
    # An empty brand list means the user has no brand preference
    if not brands or (row['Brand'].lower() in brands):
        score += 1

    if row['Location'].lower() == user_req['location'].lower():
        score += 1
    if row['Owner_Type'].lower() == user_req['ownership'].lower():
        score += 1
    if row['Fuel_Type'].lower() == user_req['fuel_type'].lower():
        score += 1
    # Transmission is collected from the user, so it is scored as well
    if row['Transmission'].lower() == user_req['transmission'].lower():
        score += 1

    return score
    

## Stage 3: Product Recommendations

In [122]:
def market_products(car):
    ''' 
    Returning the human readable message for the top 3 selected cars
    '''
    
    system_message = f"""
    You are a methodical car sales expert and you are tasked with the objective to  describe cars to the user based on the given parameters.
    Start with a brief summary of each car in the following format in the exact order as user provided the data.
    
    1. <car Name> : <Major specifications of the car>, <Price in Rs>
    2. <car Name> : <Major specifications of the car>, <Price in Rs>
    3. <car Name> : <Major specifications of the car>, <Price in Rs>
    """
    
    user_message = f""" These are the user's products: {car}"""
    conv = [{"role": "system", "content": system_message },
            {"role":"user","content":user_message}]

    resp = client.chat.completions.create(
        messages=conv,
        model="gpt-3.5-turbo",
        # response_format = { "type": "json_object"},
    )

    # Store the assistant message itself, not the whole response object
    conv.append(resp.choices[0].message)
    
    return resp

In [139]:
def recommend_products(cars_df):
    ''' 
    Return False and print a message that the user is being connected to a human expert if no car in the data set matches the user requirements.
    Otherwise, print a human-readable message for the top 3 used cars and return True.
    '''
    
    if cars_df[cars_df['Score'] > 3].shape[0] == 0:
        print('\n\nassistant:')
        print('     Sorry! I could not find any car that fulfills your requirements.')
        print('     I am connecting you to human assistant expert for further help.')
        return False
    
    resp = market_products(cars_df.iloc[0 : 3])
    print('\n\n    Here are some cars that best match your requirements.')
    print(resp.choices[0].message.content)
    print("_"*60)
    return True


In [132]:
def market_final_product(car):
    ''' 
    Print a short marketing message for the car the user selected.
    '''
    system_message = f""" 
        You are a marketing expert chatbot. 
        Your goal is to persuade the user to buy a pre-owned car they have shown interest in. 
        Use very short, engaging, positive, and professional language to highlight the car's best features, benefits and any unique selling points. 
        Your message should be compelling and reassuring, ultimately encouraging the user to make the purchase.
        Write the brief summary of the car.
    """

    user_message = f""" This is the user's product: {car}"""

    conv = [{'role': 'system', 
             'content': system_message},
             
             {'role': 'user', 
             'content': user_message}]
    
    resp = client.chat.completions.create(
        messages=conv,
        model='gpt-3.5-turbo',
    )

    print(resp.choices[0].message.content)


## Main Program 
### (Output - top 3 recommendations, and market the selection)

In [127]:
# Gather Requirements into dictionary
conversation = []
conversation.append(car_assist_req_prompt())
openai_interact(conversation)
describe_conversation(conversation)
user_req = collect_requirements(conversation)

# Sort the used cars data based on the scores.
df['Score'] = df.apply(match_car, axis=1)
recommended_cars_df = df.sort_values(by=['Score', 'Price'], ascending=[False, True]) 
recommended_cars_df = recommended_cars_df.reset_index(drop=True)

# Market the recommended cars
is_recommended = recommend_products(recommended_cars_df)
if is_recommended:
    print("\n\n     Please select the option '1', '2', or '3' : "  )
    selected_option = int(input('Please select the option "1", "2" or "3"'))
    print("*"*60)
    print("Excellent choice! This car is the perfect match for your requirements : \n")
    market_final_product(recommended_cars_df.iloc[selected_option - 1])
    print("*"*60)

assistant: 
     Welcome to the car assistant! I'll help you find a pre-owned car based on your requirements. Let's start by asking a few questions.
     
     What is your preferred car brand? If you have multiple preferences, please provide all of them. If you're not sure, I can give you some options to choose from.
____________________________________________________________________________________________________
user: 
     honda, maruti, tata
assistant: 
     Great choice! Now, may I know your location (city) within India? This will help me find cars available in your area.
____________________________________________________________________________________________________
user: 
     bangalore
assistant: 
     Excellent! 
     What is your preferred ownership type for the car - first, second, third, fourth, or fifth?
____________________________________________________________________________________________________
user: 
     first
assistant: 
     What type of fuel do you pre

## Main Program 
### (Output - no recommendation, connecting to human expert)

In [140]:
# Gather Requirements into dictionary
conversation = []
conversation.append(car_assist_req_prompt())
openai_interact(conversation)
describe_conversation(conversation)
user_req = collect_requirements(conversation)

# Sort the used cars data based on the scores.
df['Score'] = df.apply(match_car, axis=1)
recommended_cars_df = df.sort_values(by=['Score', 'Price'], ascending=[False, True]) 
recommended_cars_df = recommended_cars_df.reset_index(drop=True)

# Market the recommended cars
is_recommended = recommend_products(recommended_cars_df)
if is_recommended:
    print("\n\n     Please select the option '1', '2', or '3' : "  )
    selected_option = int(input('Please select the option "1", "2" or "3"'))
    print("*"*60)
    print("Excellent choice! This car is the perfect match for your requirements : \n")
    market_final_product(recommended_cars_df.iloc[selected_option - 1])
    print("*"*60)

assistant: 
     Welcome! I can help you find the perfect pre-owned car. Let's start by gathering some information about your preferences.
     
     What is your preferred car brand? If you have multiple choices, please provide a list. 
____________________________________________________________________________________________________
user: 
     bmw
assistant: 
     Great choice! Can you please provide the location where you are looking to buy the car?
____________________________________________________________________________________________________
user: 
     mumbai
assistant: 
     What ownership type are you looking for? Is it first, second, third, fourth, or fifth owner?
____________________________________________________________________________________________________
user: 
     second
assistant: 
     What type of fuel do you prefer for the car: petrol, diesel, or electric?
___________________________________________________________________________________________________