# Reddit LangChain Pipeline Test
- This initial test follows a tutorial: it first queries a single data frame through a LangChain pandas agent, then several data frames at once.

## Setup

In [4]:
import os
import openai
import warnings

warnings.filterwarnings("ignore")

In [5]:
# OpenAI API Key
from getpass import getpass

OPENAI_API_KEY = getpass()
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")
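
Prompting with `getpass()` on every run can get tedious when the key is already exported in the shell. A minimal sketch of one way to handle that, assuming the key may already be present as `OPENAI_API_KEY`; `resolve_api_key` is a hypothetical helper name, not part of `openai` or `langchain`:

```python
import os
from getpass import getpass


def resolve_api_key(env=os.environ, prompt=getpass):
    """Return the key from the environment, prompting only if it is missing.

    Hypothetical helper for illustration; not an openai/langchain API.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        # Fall back to an interactive prompt so the key is never typed into a cell.
        key = prompt("OpenAI API key: ").strip()
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key
```

In the cell above this would be used as `os.environ["OPENAI_API_KEY"] = resolve_api_key()`. The `env` and `prompt` parameters exist only so the function can be exercised without a real key.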

In [6]:
from langchain.agents import create_pandas_dataframe_agent
from langchain.llms import OpenAI

In [7]:
import pandas as pd
nsc_posts = pd.read_csv('../Reddit Data/nsc_posts.csv')
nsc_comments = pd.read_csv('../Reddit Data/nsc_comments.csv')
tt_posts = pd.read_csv('../Reddit Data/tt_posts.csv')
tt_comments = pd.read_csv('../Reddit Data/tt_comments.csv')
preds_posts = pd.read_csv('../Reddit Data/preds_posts.csv')
preds_comments = pd.read_csv('../Reddit Data/preds_comments.csv')

## Looking at just the NSC posts dataframe

In [8]:
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), nsc_posts, verbose=True)

In [9]:
agent.run("how many rows are there?")



> Entering new AgentExecutor chain...
Thought: I need to find out how many rows are in the dataframe
Action: python_repl_ast
Action Input: len(df)
Observation: 50
Thought: I now know the final answer
Final Answer: There are 50 rows in the dataframe.

> Finished chain.


'There are 50 rows in the dataframe.'

In [10]:
agent.run("how many rows have no information in the 'content' column?")



> Entering new AgentExecutor chain...
Thought: I need to count the number of rows with no content
Action: python_repl_ast
Action Input: df['Content'].isnull().sum()
Observation: 21
Thought: I now know the final answer
Final Answer: 21

> Finished chain.


'21'
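
The `isnull()` call the agent chose only counts cells that pandas parsed as missing. A sketch of a stricter check that also treats empty or whitespace-only strings as "no information", shown on a made-up frame (the values are invented; only the `Content` column name comes from the trace above):

```python
import pandas as pd

# Toy stand-in for nsc_posts; the values are invented for illustration.
df = pd.DataFrame({"Content": ["a post body", None, "", "   ", "another body"]})

# What the agent ran: counts only NaN/None cells.
missing = df["Content"].isnull().sum()

# Stricter variant: also count empty or whitespace-only strings.
blank = df["Content"].fillna("").str.strip().eq("").sum()

print(missing, blank)
```

If the two numbers differ on the real data, the agent's answer of 21 would understate how many posts have no usable text.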

In [12]:
agent.run("How many unique values are there in the row 'id'?")



> Entering new AgentExecutor chain...
Thought: I need to find out how many unique values are in the 'id' column
Action: python_repl_ast
Action Input: len(df['id'].unique())
Observation: 31
Thought: I now know the final answer
Final Answer: There are 31 unique values in the row 'id'.

> Finished chain.


"There are 31 unique values in the row 'id'."
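
With 50 rows but only 31 unique ids, some ids evidently repeat. A sketch of inspecting that directly in pandas, on a made-up frame (the ids below are invented; only the `id` column name comes from the notebook):

```python
import pandas as pd

# Toy stand-in for nsc_posts with deliberately repeated ids.
df = pd.DataFrame({"id": ["a1", "b2", "a1", "c3", "b2", "a1"]})

# nunique() skips NaN by default, whereas len(df["id"].unique()) counts NaN as a value.
unique_ids = df["id"].nunique()

# Rows that repeat an id already seen earlier in the frame.
repeat_rows = df.duplicated(subset="id").sum()

# How often each id occurs, most frequent first.
per_id = df["id"].value_counts()

print(unique_ids, repeat_rows)
```

On the real frame, `unique_ids + repeat_rows` should equal the row count as long as the `id` column has no missing values.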

## Trying with multiple dataframes in one agent

In [13]:
agent = create_pandas_dataframe_agent(OpenAI(temperature=0), [nsc_posts, nsc_comments, tt_posts, tt_comments, preds_posts, preds_comments], verbose=True)
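
When a list is passed, the agent seems to refer to the frames by list position as `df1`, `df2`, and so on; that is an inference from the trace in the next cell, not a documented guarantee. A small sketch of a lookup table for translating those aliases back to the variable names used in this notebook:

```python
# Same order as the list passed to create_pandas_dataframe_agent above.
frame_names = [
    "nsc_posts", "nsc_comments",
    "tt_posts", "tt_comments",
    "preds_posts", "preds_comments",
]

# Assumption: the agent labels frames df1, df2, ... by list position.
alias_to_name = {f"df{i}": name for i, name in enumerate(frame_names, start=1)}

print(alias_to_name["df4"])
```

Keeping the order in one place makes it harder to misread an answer such as "df4 has 5918 rows".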

In [16]:
agent.run("How long are each of the data frames?")



> Entering new AgentExecutor chain...
Thought: I need to find out how many rows are in each data frame.
Action: python_repl_ast
Action Input: print(len(df1))
Observation: 50
Thought: I need to find out how many rows are in each of the other data frames.
Action: python_repl_ast
Action Input: print(len(df2))
Observation: 1190
Thought:

Retrying langchain.llms.openai.completion_with_retry.<locals>._completion_with_retry in 4.0 seconds as it raised RateLimitError: Rate limit reached for default-text-davinci-003 in organization org-mZpDGmkoo1WsEP9bv7bbg2OB on requests per min. Limit: 3 / min. Please try again in 20s. Contact us through our help center at help.openai.com if you continue to have issues. Please add a payment method to your account to increase your rate limit. Visit https://platform.openai.com/account/billing to add a payment method..


[32;1m[1;3m I need to find out how many rows are in each of the other data frames.
Action: python_repl_ast
Action Input: print(len(df3))[0m
Observation: [36;1m[1;3m50
[0m
Thought:


[32;1m[1;3m I need to find out how many rows are in each of the other data frames.
Action: python_repl_ast
Action Input: print(len(df4))[0m
Observation: [36;1m[1;3m5918
[0m
Thought:


[32;1m[1;3m I need to find out how many rows are in each of the other data frames.
Action: python_repl_ast
Action Input: print(len(df5))[0m
Observation: [36;1m[1;3m50
[0m
Thought:


[32;1m[1;3m I need to find out how many rows are in each of the other data frames.
Action: python_repl_ast
Action Input: print(len(df6))[0m
Observation: [36;1m[1;3m1876
[0m
Thought:


[32;1m[1;3m I now know the final answer.
Final Answer: df1 has 50 rows, df2 has 1190 rows, df3 has 50 rows, df4 has 5918 rows, df5 has 50 rows, and df6 has 1876 rows.[0m

[1m> Finished chain.[0m


'df1 has 50 rows, df2 has 1190 rows, df3 has 50 rows, df4 has 5918 rows, df5 has 50 rows, and df6 has 1876 rows.'
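
Each Thought/Action step in the trace above appears to cost one completion request, which is what runs into the 3-requests-per-minute limit. For a purely mechanical question like row counts, a sketch of getting the same kind of answer straight from pandas, with tiny invented frames standing in for the real CSVs:

```python
import pandas as pd

# Invented stand-ins; in the notebook these would be the frames loaded in Setup.
frames = {
    "nsc_posts": pd.DataFrame({"id": [1, 2, 3]}),
    "nsc_comments": pd.DataFrame({"id": [1, 2, 3, 4, 5]}),
}

# One pass over the frames, with no LLM requests involved.
row_counts = {name: len(frame) for name, frame in frames.items()}

print(row_counts)
```

This keeps the limited request budget for questions that actually need the agent's reasoning.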