<a href="https://colab.research.google.com/github/simranroy01/lawBOT/blob/main/lawbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import pandas as pd
import nltk
import string
import random
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [5]:
# Load the legal dataset from CSV
df = pd.read_csv('ipc_sections.csv')

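# Preprocess the dataset: fill missing values before joining, because a single NaN
# field would otherwise make the whole concatenated row NaN (rendered as 'nan' by astype(str))
text_cols = ['Description', 'Offense', 'Punishment', 'Section']
df['text'] = df[text_cols].fillna('').astype(str).agg(' '.join, axis=1)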
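Joining columns with `+` propagates missing values. If any one field in a row is NaN, the whole concatenated string for that row is lost. The sketch below uses a made-up two-row frame (the values are placeholders, not rows from `ipc_sections.csv`) to compare plain `+` concatenation with a `fillna('')` join:

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder rows; the real data comes from ipc_sections.csv.
toy = pd.DataFrame({
    'Description': ['desc a', 'desc b'],
    'Offense': ['offense a', np.nan],      # second row has a missing field
    'Punishment': ['punishment a', 'punishment b'],
    'Section': ['IPC_1', 'IPC_2'],
})
cols = ['Description', 'Offense', 'Punishment', 'Section']

# Plain concatenation: the NaN in row 2 wipes out that row's text entirely.
naive = (toy['Description'] + ' ' + toy['Offense'] + ' '
         + toy['Punishment'] + ' ' + toy['Section']).astype(str)

# NaN-safe version: blank out missing values first, then join each row.
safe = toy[cols].fillna('').astype(str).agg(' '.join, axis=1)

print(naive.tolist())
print(safe.tolist())
```

In the safe version, row 2 keeps its description, punishment and section. The only trace of the empty field is a double space, and the tokenizer ignores it.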

In [6]:
# Initialize NLTK resources
nltk.download('punkt')
nltk.download('punkt_tab')  # word_tokenize loads this resource in newer NLTK releases (3.9+)
nltk.download('wordnet')
nltk.download('omw-1.4')

lemmer = nltk.stem.WordNetLemmatizer()

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...


In [7]:
# Function to lemmatize tokens
def LemToken(tokens):
    return [lemmer.lemmatize(token) for token in tokens]

remove_punc_dict = dict((ord(punct), None) for punct in string.punctuation)
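`remove_punc_dict` is a translation table for `str.translate`. It maps each punctuation code point to `None`, so `translate` deletes those characters. The standard library builds the same table with `str.maketrans`. A minimal check:

```python
import string

# Same construction as in the notebook: map every punctuation code point to None.
remove_punc_dict = dict((ord(punct), None) for punct in string.punctuation)

# str.maketrans with a third argument builds an equivalent deletion table.
assert remove_punc_dict == str.maketrans('', '', string.punctuation)

print("What is Section 302, IPC?".lower().translate(remove_punc_dict))
# what is section 302 ipc
```

Hyphens and apostrophes are in `string.punctuation` too, so they are deleted rather than replaced with a space. For example, "one-half" in the section texts becomes the single token "onehalf".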

In [8]:
# Function to lemmatize and normalize text
def LemNormalize(text):
    return LemToken(nltk.word_tokenize(text.lower().translate(remove_punc_dict)))

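# Initialize TF-IDF Vectorizer. token_pattern is ignored when a custom tokenizer is
# supplied, so pass None explicitly (this also silences scikit-learn's warning about it)
TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english', token_pattern=None)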

# Fit TF-IDF Vectorizer on legal dataset text
tfidf_matrix = TfidfVec.fit_transform(df['text'])
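With its default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`), `TfidfVectorizer` weights each term as raw count × idf. Here idf(t) = ln((1 + n) / (1 + df(t))) + 1, and each row is then scaled to unit L2 length. The sketch below reimplements that weighting in plain Python on hypothetical pre-tokenised documents (stand-ins for `LemNormalize` output) so the scoring is visible. It is an illustration, not scikit-learn's code:

```python
import math
from collections import Counter

def fit_idf(docs):
    """Smoothed IDF, as in TfidfVectorizer's defaults (smooth_idf=True)."""
    n = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in doc_freq.items()}

def tfidf_vector(doc, idf):
    """Raw counts times IDF, then L2-normalised (norm='l2').
    Terms outside the fitted vocabulary are dropped, as transform() does."""
    counts = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def cosine(u, v):
    """For L2-normalised sparse vectors, cosine similarity is the dot product."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

# Hypothetical tokenised "sections" (not taken from ipc_sections.csv).
docs = [
    ['murder', 'punishment', 'death', 'imprisonment', 'life'],
    ['theft', 'punishment', 'imprisonment', 'fine'],
    ['attempt', 'offence', 'imprisonment', 'half'],
]
idf = fit_idf(docs)
matrix = [tfidf_vector(d, idf) for d in docs]

query = tfidf_vector(['theft', 'fine', 'bail'], idf)  # 'bail' is out of vocabulary
scores = [cosine(query, row) for row in matrix]
best = max(range(len(scores)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The query matches only the second document (index 1). `imprisonment` appears in every document, so it gets the minimum idf of 1 and contributes the least to any match.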



In [9]:
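# Function to generate a response using TF-IDF and cosine similarity
def response(user_response):
    # TfidfVec already lowercases and tokenizes via LemNormalize, so pass the raw query
    query_vector = TfidfVec.transform([user_response])
    cosine_similarities = cosine_similarity(query_vector, tfidf_matrix)[0]
    idx = cosine_similarities.argmax()
    # If no query term is in the vocabulary, every score is 0 and the "best" row is arbitrary
    if cosine_similarities[idx] == 0:
        return 'Sorry, I could not find a relevant IPC section for that.'
    return df.iloc[idx]['text']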
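Returning only the single highest-scoring row hides near-ties. It also cannot tell a real match from a query that shares no vocabulary with the dataset: in that case every cosine score is 0, and an argmax over them picks an arbitrary row. The sketch below shows one alternative: return several candidates and drop zero scores. `top_k_sections` is a hypothetical helper, and the scores are illustrative numbers. With the notebook's objects, it would be fed the row `cosine_similarity(query_vector, tfidf_matrix)[0]`.

```python
import heapq

def top_k_sections(scores, k=3, min_score=0.0):
    """Hypothetical helper: return up to k (row_index, score) pairs with the
    highest scores, best first, skipping scores at or below min_score."""
    candidates = ((i, s) for i, s in enumerate(scores) if s > min_score)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])

scores = [0.0, 0.41, 0.0, 0.73, 0.12]   # illustrative similarity scores
print(top_k_sections(scores, k=2))      # [(3, 0.73), (1, 0.41)]
print(top_k_sections([0.0, 0.0]))       # [] -> no relevant section found
```

`heapq.nlargest` avoids sorting the full score list. That matters little for a few hundred IPC sections, but it keeps the selection cost at O(n log k).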

In [10]:
# Function to handle greetings: returns a random reply, or None if no greeting word is found
def greet(sentence):
    for word in sentence.split():
        if word.lower() in ('hello', 'hi', 'wassup', 'hey'):
            return random.choice(['hi', 'hey!', 'hey there!', 'hola user'])
    return None
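Splitting on whitespace and comparing `word.lower()` against the greeting list misses inputs like `Hi!` or `hey,`, because the punctuation stays attached to the word. A minimal variant strips surrounding punctuation before the lookup. `GREETING_INPUTS` and `GREETING_RESPONSES` are hypothetical names for the same word lists:

```python
import random
import string

GREETING_INPUTS = {'hello', 'hi', 'wassup', 'hey'}
GREETING_RESPONSES = ['hi', 'hey!', 'hey there!', 'hola user']

def greet(sentence):
    """Return a random greeting if any word in `sentence` is a greeting,
    ignoring case and surrounding punctuation; otherwise return None."""
    for word in sentence.split():
        if word.strip(string.punctuation).lower() in GREETING_INPUTS:
            return random.choice(GREETING_RESPONSES)
    return None

print(greet('Hi!'))                    # one of GREETING_RESPONSES
print(greet('what is section 302?'))   # None
```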

In [None]:
# Main execution loop
print('Bot: Hello, I am your legal chatbot. How can I help you?')
while True:
    user_response = input('You: ').strip().lower()
    if user_response == 'bye':
        print('Bot: Goodbye!!')
        break
    if user_response in ('thank you', 'thanks'):
        print('Bot: You are welcome')
        break
    # Call greet() once so the printed reply is the same one that was checked
    greeting = greet(user_response)
    if greeting is not None:
        print('Bot:', greeting)
    else:
        print('Bot:', response(user_response))
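The interactive loop mixes `input()`/`print()` with the decision logic, so it can only be exercised by hand. The sketch below is a hypothetical refactoring that moves one turn into a pure function, `handle_turn`, which returns the reply and a "done" flag. The responder, greeter and I/O functions are passed in, so the flow can be tested with stubs instead of the TF-IDF model:

```python
def handle_turn(user_input, respond, greet):
    """Process one user message and return (reply, done).
    Hypothetical refactoring of the notebook's loop body."""
    text = user_input.strip().lower()
    if text == 'bye':
        return 'Goodbye!!', True
    if text in ('thank you', 'thanks'):
        return 'You are welcome', True
    greeting = greet(text)
    if greeting is not None:
        return greeting, False
    return respond(text), False

def chat(respond, greet, read=input, write=print):
    """Run the conversation until handle_turn reports that it is done."""
    write('Bot: Hello, I am your legal chatbot. How can I help you?')
    done = False
    while not done:
        reply, done = handle_turn(read('You: '), respond, greet)
        write('Bot: ' + reply)

# Usage with stubs in place of the notebook's response() and greet():
def fake_respond(query):
    return f'[best match for {query!r}]'

def fake_greet(sentence):
    return 'hi' if sentence == 'hi' else None

inputs = iter(['Hi', 'theft punishment', 'bye'])
log = []
chat(fake_respond, fake_greet, read=lambda prompt: next(inputs), write=log.append)
print(log)
```

In the notebook, the same loop would run as `chat(response, greet)`.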

Bot: Hello, I am your legal chatbot. How can I help you?
Bot: Description of IPC Section 511
According to section 511 of Indian penal code, Whoever attempts to commit an offence punishable by this Code with imprisonment for life or imprisonment, or to cause such an offence to be committed, and in such attempt does any act towards the commission of the offence, shall, where no express provision is made by this Code for the punishment of such attempt, be punished with imprisonment of any description provided for the offence, for a term which may extend to one-half of the imprisonment for life or, as the case may be, one-half of the longest term of imprisonment provided for that offence, or with such fine as is provided for the offence, or with both.

IPC 511 in Simple Words
Whoever tries to commit a crime punishable by imprisonment or causes someone else to commit it, but fails, can be punished with imprisonment up to half the maximum term or a fine, as specified for that offense. Attemp