This Python notebook checks that all the API connections work and runs some basic sanity checks on the LangChain project

In [70]:
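#using the dotenv library to load the keys and IDs for the different APIs from a local ".env" file, so the secrets stay out of the notebook
from dotenv import load_dotenv, find_dotenv
#find_dotenv locates the ".env" file and load_dotenv loads its variables into the environment (it returns True when at least one variable was set)
load_dotenv(find_dotenv())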

True
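
The `True` above only says that some variable was loaded, not that every key needed later is present. A minimal sketch of a stricter check follows. The key names are assumptions (`PINECONE_API_KEY` and `PINECONE_ENV` appear later in this notebook, `OPENAI_API_KEY` is the variable LangChain's OpenAI wrappers read by default), and `missing_keys` is a hypothetical helper, not part of python-dotenv.

```python
import os

def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Key names are assumptions based on the services used in this notebook.
REQUIRED = ["OPENAI_API_KEY", "PINECONE_API_KEY", "PINECONE_ENV"]

# Demonstrated on a plain dict so that no real secrets are needed.
example_env = {"OPENAI_API_KEY": "sk-placeholder", "PINECONE_ENV": ""}
print(missing_keys(REQUIRED, example_env))
```

Calling `missing_keys(REQUIRED)` without a second argument checks the real process environment instead of the example dict.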

In [71]:
#we import the OpenAI wrapper from LangChain's collection of large language models and select "text-davinci-003"
from langchain.llms import OpenAI as oa
llm = oa(model_name="text-davinci-003")
#we test it with a one-line prompt and get back a plain text completion; this runs on the free $5 grant I got from OpenAI
llm("Explain furry pets in a sentence")

'\n\nFurry pets are animals that are covered in soft, thick fur, such as cats, dogs, rabbits, and guinea pigs.'

In [72]:
#import the message schema from the LangChain library. A HumanMessage is the prompt the user sends, a SystemMessage gives the model instructions (for example a role to play) before it reads the HumanMessage, and an AIMessage holds the model's reply
from langchain.schema import (AIMessage, HumanMessage, SystemMessage)
#import the chat model wrapper; unlike the plain LLM above, it takes a list of messages and returns an AIMessage
from langchain.chat_models import ChatOpenAI
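
A rough picture of what these message classes represent: each one pairs a role with some text, and a chat model receives the whole list in order. The sketch below uses plain dictionaries with the `system`/`user`/`assistant` role names of the OpenAI chat format rather than LangChain's classes. `to_chat_payload` and the example question are hypothetical and only for illustration.

```python
# Assumed mapping from the LangChain message types to chat role names.
ROLE_FOR = {"SystemMessage": "system", "HumanMessage": "user", "AIMessage": "assistant"}

def to_chat_payload(messages):
    """Convert (message_type, content) pairs into role/content dictionaries."""
    return [{"role": ROLE_FOR[kind], "content": content} for kind, content in messages]

conversation = [
    ("SystemMessage", "You are an expert on Data Science"),
    ("HumanMessage", "What is overfitting?"),
]
payload = to_chat_payload(conversation)
for item in payload:
    print(item["role"], "->", item["content"])
```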

In [73]:
#here we use gpt-3.5-turbo because GPT-4 costs more than I can afford
chat = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.3)
messages = [
    SystemMessage(content = "You are an expert on Data Science"),
    HumanMessage(content = "Write a python Script to Split a dataset and train it in LinearRegression")
]
response = chat(messages)
print(response.content,end = "\n")

Sure! Here's a Python script that splits a dataset into training and testing sets and trains a Linear Regression model using scikit-learn library:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the dataset
data = pd.read_csv('dataset.csv')  # Replace 'dataset.csv' with your dataset file

# Split the dataset into features and target variable
X = data.drop('target_variable', axis=1)  # Replace 'target_variable' with your target variable column name
y = data['target_variable']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)  # Adjust test_size as desired

# Train the Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the model
train_score = model.score(X_train, y_train)
test_score = model.score(X_test, y_test)

print("Training score:", train_score)
print("Testing 
```

In [74]:
#PromptTemplate lets us write a prompt once and reuse it with different values filled in for the {concept} placeholder
from langchain import PromptTemplate
template = ''' You are an expert on AI that talks all the things about it like a teacher. Explain the concept of {concept} in a couple of lines in the most sensible way possible'''
prompt = PromptTemplate(input_variables = ["concept"], template= template)
prompt

PromptTemplate(input_variables=['concept'], output_parser=None, partial_variables={}, template=' You are an expert on AI that talks all the things about it like a teacher. Explain the concept of {concept} in a couple of lines in the most sensible way possible', template_format='f-string', validate_template=True)
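
The `template_format='f-string'` field in the output above suggests that placeholders follow Python's `str.format` syntax. A minimal sketch of that substitution using only the standard library follows. It is not LangChain's implementation, and `placeholders` and `fill` are hypothetical helpers.

```python
from string import Formatter

def placeholders(template):
    """List the placeholder names that appear in a format-style template."""
    return [field for _, field, _, _ in Formatter().parse(template) if field]

def fill(template, **values):
    """Substitute values, failing loudly if a placeholder is left unfilled."""
    missing = [name for name in placeholders(template) if name not in values]
    if missing:
        raise KeyError(f"missing values for: {missing}")
    return template.format(**values)

template = "Explain the concept of {concept} in a couple of lines"
print(placeholders(template))
print(fill(template, concept="Linear Regression"))
```

Checking for missing placeholders up front gives a clearer error than letting `str.format` fail halfway through a long prompt.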

In [75]:
llm(prompt.format(concept = "Linear Regression"))

'.\n\nLinear regression is a predictive modeling technique that uses a linear equation to describe the relationships between input variables (x) and a continuous output variable (y). In linear regression, a dependent variable is predicted from one or more independent variables using a linear combination of the independent variables.'

In [76]:
#now the chain part: an LLMChain bundles an LLM with a prompt template, so it can be run with just the input value. On its own it behaves like a single prompt, but chains can also be linked so that one chain's output is used as the next chain's input
from langchain.chains import LLMChain
chain = LLMChain(llm = llm, prompt = prompt)
print(chain.run("Linear Regression"))



Linear regression is a statistical method used to identify the linear relationship between one or more independent variables and a continuous dependent variable. It attempts to find the "best fit" line that describes the relationship between the variables.


In [77]:
#here we create the second chain, which rewrites a concept in layman's terms... a third chain that produces questions on the topic could be added the same way
second_prompt = PromptTemplate(input_variables=["another_concept"], template="Turn the concept of {another_concept} into a very layman term in 500 words")
chain_two = LLMChain(llm=llm, prompt=second_prompt)

In [78]:
#SimpleSequentialChain runs the chains in order and passes each chain's single output to the next chain as its single input
from langchain.chains import SimpleSequentialChain
#verbose=True also prints each intermediate output as the chain runs
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)
explanation = overall_chain.run("Linear Regression")
print(explanation)



> Entering new SimpleSequentialChain chain...

Linear regression is a supervised machine learning algorithm used to predict continuous or quantitative values. It is a statistical technique that models the relationship between a dependent variable (the value we are trying to predict) and one or more independent variables (the factors that influence the value) by fitting a linear equation to the observed data.

Linear regression is a powerful tool for making predictions about the world. Put simply, it is a way of using the relationships between different variables to guess the value of something we don’t already have. 

For example, let’s say you want to predict how much a house will cost based on its size. You could look at the prices of other houses and the sizes of those houses to determine the relationship between these two variables. This relationship can then be used to predict the price of a house based on its size. 

To do this, we start 
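
The data flow of a sequential chain can be sketched without any API calls: each step takes one string and returns one string, and the output of one step becomes the input of the next. The two step functions below are stand-ins for the LLM-backed chains, and `run_sequential` is a hypothetical helper, not LangChain's implementation.

```python
from functools import reduce

def explain(concept):
    """Stand-in for the first chain: would normally call the LLM."""
    return f"{concept} fits a straight line to data."

def simplify(text):
    """Stand-in for the second chain: would normally call the LLM."""
    return f"In plain words: {text}"

def run_sequential(steps, first_input):
    """Feed first_input to the first step and pipe each result into the next."""
    return reduce(lambda value, step: step(value), steps, first_input)

print(run_sequential([explain, simplify], "Linear Regression"))
```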

In [79]:
#this splitter breaks the long text into smaller chunks (at most 100 characters each, as set below); here we use the chunks as input for embedding (turning each chunk into a vector)
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=0)
texts = text_splitter.create_documents([explanation])
texts

[Document(page_content='Linear regression is a powerful tool for making predictions about the world. Put simply, it is a way', metadata={}),
 Document(page_content='of using the relationships between different variables to guess the value of something we don’t', metadata={}),
 Document(page_content='already have.', metadata={}),
 Document(page_content='For example, let’s say you want to predict how much a house will cost based on its size. You could', metadata={}),
 Document(page_content='look at the prices of other houses and the sizes of those houses to determine the relationship', metadata={}),
 Document(page_content='between these two variables. This relationship can then be used to predict the price of a house', metadata={}),
 Document(page_content='based on its size.', metadata={}),
 Document(page_content='To do this, we start by defining the “dependent variable”, which is the value we’re trying to', metadata={}),
 Document(page_content='predict. In our example, the dependent var
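
The chunks above stay within 100 characters and break at spaces. A simplified sketch of that behaviour follows, greedily packing whole words into chunks of at most `chunk_size` characters. This is an approximation for illustration, not the algorithm `RecursiveCharacterTextSplitter` actually uses (which tries a list of separators recursively), and `split_words` is a hypothetical helper.

```python
def split_words(text, chunk_size=100):
    """Greedily pack whitespace-separated words into chunks of <= chunk_size chars."""
    chunks, current = [], ""
    for word in text.split():
        candidate = word if not current else f"{current} {word}"
        if len(candidate) <= chunk_size or not current:
            # Either the word still fits, or it is a single over-long word
            # that has to start a chunk on its own.
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

sample = ("Linear regression is a powerful tool for making predictions about the world. "
          "Put simply, it is a way of using the relationships between different variables.")
for chunk in split_words(sample):
    print(len(chunk), repr(chunk))
```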

In [55]:
#here we use OpenAI's embedding model to convert the text chunks into numerical vectors
from langchain.embeddings import OpenAIEmbeddings
#the "ada" embedding model is trained on a very large corpus... It maps text to vectors so that texts with similar meaning end up close together; a similarity score between vectors can then be used to find the chunks most relevant to a custom prompt
embeddings = OpenAIEmbeddings(model_name="ada")

In [57]:
#this is the numerical representation (embedding vector) of the first chunk of the text we generated and split above
query_result = embeddings.embed_query(texts[0].page_content)
query_result

[-0.004647566956370373,
 0.0064872289348079735,
 0.025378162147569546,
 0.03552942737912639,
 0.02833385218622883,
 0.013514636141025646,
 0.010171649088832809,
 0.010487602136239213,
 0.056953080236966255,
 -0.009769063483850352,
 -0.04036045091362556,
 -0.027375800649710347,
 0.026336213684088026,
 -0.02078155623137294,
 0.022809770053766394,
 -0.011588341605011994,
 0.014268846310816792,
 0.022891305482870234,
 0.011955254761269586,
 0.04219501855755869,
 0.024501647902800062,
 -0.016582436929041423,
 0.04296961072198064,
 0.007929401505412752,
 0.01805009141671695,
 0.0352440515146178,
 0.0322272071101629,
 0.0386685730645919,
 0.02568392280067668,
 -0.03147299973433949,
 -0.0278038663091184,
 -0.010242993054959959,
 0.01814181924012006,
 0.024236654964244847,
 0.0010739854814750987,
 0.015624386323845519,
 0.03196221603425284,
 0.01175141432586483,
 -0.02521508942671671,
 -0.004887079840499993,
 0.010925859258623955,
 0.00017039805314606244,
 -0.0005978264405278497,
 -0.0041226772
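
Embedding vectors like the one above are commonly compared with cosine similarity, which measures the angle between two vectors regardless of their length. A minimal standard-library sketch on toy three-dimensional vectors follows. The numbers are made up for illustration and are not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score close to 1, orthogonal vectors score 0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```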

In [58]:
#here we use Pinecone, a hosted vector database (the free tier, since we are on a budget)
import os
import pinecone
from langchain.vectorstores import Pinecone
pinecone.init(api_key=os.getenv("PINECONE_API_KEY"),
              environment=os.getenv("PINECONE_ENV"))

In [61]:
#this is the name of the index I created on the Pinecone website
index_name = 'langchain-test2'
search = Pinecone.from_documents(texts, embeddings, index_name=index_name)

In [80]:
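# now time for a custom query and results
query = "How does Linear Regression work ?"
# similarity_search returns only the matching documents (4 by default); similarity_search_with_score also returns the scores
result = search.similarity_search(query)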
result

[Document(page_content='would look something like this:', metadata={}),
 Document(page_content='Linear regression works by finding a line that best fits the data points. It looks at the input', metadata={}),
 Document(page_content='The equation that it comes up with is called a linear regression equation, which is used to predict', metadata={}),
 Document(page_content='example, this would be the price.', metadata={})]
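
Conceptually, a similarity search embeds the query, scores it against every stored vector, and returns the best-scoring chunks. The toy in-memory version below shows that ranking step with hand-written two-dimensional vectors standing in for real embeddings. `top_k` is a hypothetical helper and says nothing about how Pinecone indexes or searches vectors internally.

```python
import heapq
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def top_k(query_vec, store, k=2):
    """Return the k (score, text) pairs with the highest cosine similarity."""
    scored = ((cosine(query_vec, vec), text) for text, vec in store)
    return heapq.nlargest(k, scored)

# Made-up vectors: the first axis loosely means "regression", the second "cooking".
store = [
    ("Linear regression fits a line to data.", [0.9, 0.1]),
    ("Simmer the sauce for ten minutes.", [0.1, 0.9]),
    ("The slope shows how y changes with x.", [0.8, 0.3]),
]
for score, text in top_k([1.0, 0.0], store):
    print(f"{score:.3f}  {text}")
```

With chunks as small as 100 characters, short fragments can rank highly even when they carry little meaning on their own, which is worth keeping in mind when reading the results above.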