# Task -> Build a summarisation model that can summarise conversations in 2-3 sentences

# In this approach I use LangChain with Groq inference to call the new Llama 3.1 LLM. I prompt the LLM to act as a conversation summariser and summarise each conversation in 2-3 sentences, then collect the response for each conversation and store them in a separate CSV file

# Importing libraries

In [1]:
import pandas as pd

# Load the dataset
file_path = 'topical_chat_dataset-top_100.csv'
data = pd.read_csv(file_path)

# Aggregate messages by conversation_id
conversations = data.groupby('conversation_id')['message'].apply(lambda x: ' '.join(x)).reset_index()



In [2]:
print(conversations)

    conversation_id                                            message
0                 1  Are you a fan of Google or Microsoft? Both are...
1                 2  do you like dance? Yes  I do. Did you know Bru...
2                 3  Hey what's up do use Google very often?I reall...
3                 4  Hi!  do you like to dance? I love to dance a l...
4                 5  do you like dance? I love it. Did you know Bru...
..              ...                                                ...
95               96  Dogs or cats?  Hi! I have two cats, but want a...
96               97  Crazy that early humans used to have to battle...
97               98   Hi, Do you like to cook? I love to cook. just...
98               99  Hi do you like tennis? Oh yeah! I pick up a ra...
99              100  What do you think of Serena Williams? I really...

[100 rows x 2 columns]


In [3]:
conversations.to_csv('to_100_organised.csv')

In [4]:
import pandas as pd

# Load the data
file_path = 'to_100_organised.csv'
conversations_df = pd.read_csv(file_path)

# Display the first few rows to understand its structure
print(conversations_df.head())


   Unnamed: 0  conversation_id  \
0           0                1   
1           1                2   
2           2                3   
3           3                4   
4           4                5   

                                             message  
0  Are you a fan of Google or Microsoft? Both are...  
1  do you like dance? Yes  I do. Did you know Bru...  
2  Hey what's up do use Google very often?I reall...  
3  Hi!  do you like to dance? I love to dance a l...  
4  do you like dance? I love it. Did you know Bru...  
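
The `Unnamed: 0` column in the output above appears because `to_csv` writes the DataFrame index by default, and `read_csv` then reads it back as an unnamed column. A minimal sketch of one way to avoid it, assuming a small made-up frame and an in-memory buffer in place of the real CSV file (`toy` and `round_trip` are illustrative names, not part of the notebook):

```python
import io

import pandas as pd

# Toy stand-in for the aggregated conversations (illustrative data only).
toy = pd.DataFrame(
    {"conversation_id": [1, 2], "message": ["hello there", "hi again"]}
)


def round_trip(df, write_index):
    """Write df to an in-memory CSV and read it back."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=write_index)
    buffer.seek(0)
    return pd.read_csv(buffer)


with_index = round_trip(toy, write_index=True)
without_index = round_trip(toy, write_index=False)

# The first list contains an extra 'Unnamed: 0' column, the second does not.
print(list(with_index.columns))
print(list(without_index.columns))
```

Passing `index=False` when saving keeps the reloaded frame limited to the real columns, which also stops extra index columns from accumulating each time the file is saved and reloaded.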


In [5]:
conversations_df_test = conversations_df[:10]

In [6]:
conversations_df_test['message'][0]

'Are you a fan of Google or Microsoft? Both are excellent technology they are helpful in many ways. For the security purpose both are super.  I\'m not  a huge fan of Google, but I use it a lot because I have to. I think they are a monopoly in some sense.   Google provides online related services and products, which includes online ads, search engine and cloud computing.  Yeah, their services are good. I\'m just not a fan of intrusive they can be on our personal lives.  Google is leading the alphabet subsidiary and will continue to be the Umbrella company for Alphabet internet interest. Did you know Google had hundreds of live goats to cut the grass in the past?  It is very interesting. Google provide "Chrome OS" which is a light weight OS. Google provided a lot of hardware mainly in 2010 to 2015.  I like Google Chrome. Do you use it as well for your browser?  Yes.Google is the biggest search engine and Google service figure out top 100 website, including Youtube and Blogger.  By the wa

# Initialise LangChain and build a chain using Groq chat and the Llama 3.1 LLM

In [7]:
import os

from langchain_core.prompts import ChatPromptTemplate
from langchain_groq import ChatGroq

# Initialize the Groq client with your API key.
# The key is read from the GROQ_API_KEY environment variable so it is never stored in the notebook.
groq_api_key = os.environ["GROQ_API_KEY"]
chat = ChatGroq(temperature=0, groq_api_key=groq_api_key, model_name="llama-3.1-8b-instant")

# Define the system and human messages
system = "You are a helpful assistant for summarising messages into 2-3 sentences. Avoid using the sentence 'Here is a 3-sentence summary of the conversation'. Instead directly provide only the summary"
human = "{text}"

# Create the prompt template
prompt = ChatPromptTemplate.from_messages([("system", system), ("human", human)])

# Create the chain
chain = prompt | chat
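
The system prompt asks the model not to open with "Here is a 3-sentence summary of the conversation", but a model may not always follow that instruction. A minimal sketch of a fallback clean-up step, assuming a hypothetical helper `strip_preamble` (not part of the original notebook) and assuming the unwanted opener ends with a colon:

```python
import re

# Hypothetical pattern: an opener such as
# "Here is a 3-sentence summary of the conversation:" at the start of the text.
_PREAMBLE = re.compile(
    r"^\s*here(?: is|'s) a[^:\n]*summary[^:\n]*:\s*", re.IGNORECASE
)


def strip_preamble(text: str) -> str:
    """Remove a leading 'Here is a ... summary ...:' opener if present."""
    return _PREAMBLE.sub("", text, count=1).strip()


raw = "Here is a 3-sentence summary of the conversation:\n\nThey talked about Google."
print(strip_preamble(raw))
```

Text that does not start with such an opener is returned unchanged apart from surrounding whitespace, so the helper could be applied to every response.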


# Build a function that returns the summarised response for a given text

In [8]:
def summarize_conversation(conversation):
    """Return the LLM-generated summary text for one conversation."""
    # chain.invoke returns a message object; the summary text is in its .content attribute
    response = chain.invoke({"text": conversation})
    return response.content


# Apply summarization to each conversation
conversations_df['summary'] = conversations_df['message'].apply(summarize_conversation)

# Display the dataframe with summaries
print(conversations_df.head())


   Unnamed: 0  conversation_id  \
0           0                1   
1           1                2   
2           2                3   
3           3                4   
4           4                5   

                                             message  \
0  Are you a fan of Google or Microsoft? Both are...   
1  do you like dance? Yes  I do. Did you know Bru...   
2  Hey what's up do use Google very often?I reall...   
3  Hi!  do you like to dance? I love to dance a l...   
4  do you like dance? I love it. Did you know Bru...   

                                             summary  
0  The conversation started with a discussion abo...  
1  You both enjoy dance and mentioned various int...  
2  Google was founded in 1998 and has become a da...  
3  You and the other person discussed their share...  
4  You both enjoy dance and share interesting fac...  


In [9]:
conversations_df['summary'][0]

'The conversation started with a discussion about Google and Microsoft, with the speaker expressing a neutral stance towards Google due to its monopoly and intrusive nature. They mentioned various Google services and products, including Google Chrome, which they use as their browser. The conversation then took a tangent to discuss fish, dolphins, and cats, with the speaker sharing interesting facts about these animals.'
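
Since the task asks for summaries of 2-3 sentences, it can be worth checking the generated text against that target. A rough sketch, assuming a naive splitter that treats '.', '!' or '?' followed by whitespace as a sentence boundary (abbreviations would be miscounted); `count_sentences` and `within_target` are hypothetical helper names, and the example text is made up:

```python
import re


def count_sentences(text: str) -> int:
    """Naively count sentences by splitting after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])


def within_target(text: str, low: int = 2, high: int = 3) -> bool:
    """Return True if the text has between `low` and `high` sentences."""
    return low <= count_sentences(text) <= high


example = (
    "The speakers compared Google and Microsoft. "
    "One of them uses Chrome. "
    "They then moved on to animal facts."
)
print(count_sentences(example), within_target(example))
```

Applied to the notebook's data, something like `conversations_df['summary'].apply(within_target)` would flag the rows whose summaries fall outside the 2-3 sentence range.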

# conversations_df contains each message and its corresponding summary

In [10]:
conversations_df

Unnamed: 0.1,Unnamed: 0,conversation_id,message,summary
0,0,1,Are you a fan of Google or Microsoft? Both are...,The conversation started with a discussion abo...
1,1,2,do you like dance? Yes I do. Did you know Bru...,You both enjoy dance and mentioned various int...
2,2,3,Hey what's up do use Google very often?I reall...,Google was founded in 1998 and has become a da...
3,3,4,Hi! do you like to dance? I love to dance a l...,You and the other person discussed their share...
4,4,5,do you like dance? I love it. Did you know Bru...,You both enjoy dance and share interesting fac...
...,...,...,...,...
95,95,96,"Dogs or cats? Hi! I have two cats, but want a...",You and the other person discussed their prefe...
96,96,97,Crazy that early humans used to have to battle...,"Early humans had to battle giant sloths, and h..."
97,97,98,"Hi, Do you like to cook? I love to cook. just...","You both enjoy cooking, but prefer to grill or..."
98,98,99,Hi do you like tennis? Oh yeah! I pick up a ra...,"You and the other person discussed tennis, sha..."


# Saving the summaries into a separate csv file

In [11]:
conversations_df.to_csv('summarised_convo_final.csv')