## Data Cleaning

In [1]:
import pandas as pd

In [2]:
df = pd.read_csv("hf://datasets/jibrand/Plant-dataset/plant_dataset.csv")
df.head(10)


Unnamed: 0,Plant,Light,Watering,Humidity,Temperature,Fertilizer,Pruning,Propagation,Notes,Unnamed: 9
0,Rosemary,"""Full sun""","""Moderate""","""Well-drained""","""Balanced""","""Warm""","""Low""","""Regularly""","""Cuttings""",
1,Lavender,"""Full sun""","""Moderate""","""Well-drained""","""Low""","""Warm""","""Low""","""After flowering""","""Cuttings""",
2,Sage,"""Full sun""","""Moderate""","""Well-drained""","""Low""","""Warm""","""Low""","""After flowering""","""Cuttings""",
3,Thyme,"""Full sun""","""Moderate""","""Well-drained""","""Low""","""Warm""","""Low""","""After flowering""","""Cuttings""",
4,Oregano,"""Full sun""","""Moderate""","""Well-drained""","""Low""","""Warm""","""Low""","""After flowering""","""Cuttings""",
5,Basil,"""Full sun""","""Regular""","""Well-drained""","""Balanced""","""Warm""","""Moderate""","""Pinch back""","""Seeds""",
6,Mint,"""Partial shade""","""Regular""","""Well-drained""","""Balanced""","""Cool""","""Moderate""","""Pinch back""","""Division""",
7,Parsley,"""Full sun""","""Regular""","""Well-drained""","""Balanced""","""Cool""","""Moderate""","""Pinch back""","""Seeds""",
8,Dill,"""Full sun""","""Regular""","""Well-drained""","""Balanced""","""Cool""","""Moderate""","""Pinch back""","""Seeds""",
9,Chives,"""Full sun""","""Moderate""","""Well-drained""","""Balanced""","""Cool""","""Moderate""","""Regularly""","""Division""",


In [3]:
def write_stories_to_txt(df, output_file):
    # Open the output file in write mode
    with open(output_file, 'w') as f:
        # Group the dataframe by 'Plant' (or any other column you choose)
        grouped = df.groupby('Plant')
        
        # Loop through each plant and its corresponding rows
        for plant, details in grouped:
            # Write the plant name as a heading
            f.write(f"{plant}\n")
            f.write("=" * len(plant) + "\n")  # Adding a separator line
            
            # Write each detail (row) for this plant
            for _, row in details.iterrows():
                f.write(f"{row.to_dict()}\n\n")  # Convert the row to a dictionary for better formatting
            
            # Add a couple of blank lines between different plant sections
            f.write("\n\n")

    print(f"Details have been written to {output_file}")




In [4]:
write_stories_to_txt(df, "plant_details.txt")

Details have been written to plant_details.txt
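
The preview above shows string values wrapped in literal double quotes and two `Unnamed` columns. Below is a minimal sketch of one way such records could be tidied before writing them to text. The helper names `clean_value` and `clean_record` are hypothetical and not part of the original notebook, and dropping every column whose name starts with `Unnamed` is an assumption about what is wanted here.

```python
def clean_value(value):
    """Strip surrounding whitespace and literal double quotes from strings."""
    if isinstance(value, str):
        return value.strip().strip('"').strip()
    return value  # leave numbers, None, etc. untouched


def clean_record(record):
    """Return a copy of one row (as a dict) without 'Unnamed' columns.

    Assumption: columns whose name starts with 'Unnamed' carry no useful data.
    """
    return {
        key: clean_value(value)
        for key, value in record.items()
        if not str(key).startswith("Unnamed")
    }


row = {"Unnamed: 0": 0, "Plant": "Rosemary", "Light": '"Full sun"', "Unnamed: 9": None}
print(clean_record(row))
```

With pandas, the rows could be obtained via `df.to_dict(orient="records")`, cleaned one by one, and passed back to `pd.DataFrame(...)`.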


## RAG

In [5]:
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

from langchain_community.document_loaders import TextLoader #load the document
from langchain_text_splitters import RecursiveCharacterTextSplitter #for creating chunks from the loaded document
from langchain_openai import OpenAIEmbeddings #for converting chunks into embeddings
from langchain_chroma import Chroma #database for storing the embeddings

In [6]:
from dotenv import load_dotenv
load_dotenv()

True

In [7]:
import os
base_dir = os.getcwd()  # named base_dir to avoid shadowing the built-in dir()
db_dir = os.path.join(base_dir, "chroma_db")
print(db_dir)

/Users/majingyi/Downloads/H2M1/chroma_db


### Create vector DB

In [8]:
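#Read the text content from the .txt file and load it as a LangChain document
#Note: this loads 'stories.txt', not the 'plant_details.txt' file written in the Data Cleaning step
loader = TextLoader('stories.txt')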
document = loader.load()

In [9]:
#Split the document into chunks using text splitters 
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(document)

print("Document chunk info:\n")
print(f"Number of document chunks: {len(chunks)}")
print(f"Sample chunk: \n{chunks[3].page_content}\n")

Document chunk info:

Number of document chunks: 3990
Sample chunk: 
In the bustling city of Bustleton, where the traffic never seemed to move and the pigeons had perfected synchronized flying, lived a man named Bob. Bob had a peculiar talent—he could never seem to find his socks.  Every morning, Bob would rummage through his dresser drawers in search of a matching pair of socks, only to emerge with one polka-dotted sock and one striped sock. No matter how many times he bought new socks or organized his drawers, the socks seemed to vanish into thin air.  One day, in a fit of frustration, Bob decided to take matters into his own hands. He set up a surveillance camera in his bedroom to catch the elusive sock thief in action. But when he reviewed the footage the next morning, he discovered the culprit—his mischievous pet cat, Whiskers, who had been hoarding Bob's socks under the bed.  With a bemused smile, Bob realized that his sock-stealing cat was just another quirky aspect of life in B
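
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a deliberately simplified sketch using a fixed sliding window over characters. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries a list of separators such as paragraph and line breaks before falling back to smaller pieces). The function name `sliding_window_chunks` is hypothetical.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap.

    Simplified illustration only: each window starts
    (chunk_size - chunk_overlap) characters after the previous one.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# → ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the same idea as the 200-character overlap used above: text near a chunk boundary stays available in both neighbouring chunks.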

In [10]:
#create embeddings using the OpenAI embedding model
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small"
)

In [11]:
#store the embeddings and chunks into Chroma DB
Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_dir)

<langchain_chroma.vectorstores.Chroma at 0x1610ed910>

### Retrieve and generate

In [12]:
#setting up the DB for retrieval
embeddings_used = OpenAIEmbeddings(model="text-embedding-3-small")
vectorDB = Chroma(persist_directory=db_dir,embedding_function=embeddings_used)

In [13]:
#setting up the retriever
retriever = vectorDB.as_retriever(search_type="similarity", search_kwargs={"k": 3})
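
As a rough picture of what a similarity search with `k=3` does, the sketch below ranks a few toy vectors against a query vector and keeps the three best. Cosine similarity is an assumption made for illustration; the metric actually used depends on how the Chroma collection is configured. The helpers `cosine_similarity` and `top_k` are hypothetical and do not call Chroma or LangChain.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the indices of the k vectors most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), idx) for idx, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:k]]


toy_docs = [[1, 0], [0, 1], [1, 1], [-1, 0]]
print(top_k([1, 0], toy_docs, k=3))
# → [0, 2, 1]
```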

In [14]:
def getRetriever(db_path):
    """
    db_path is the directory of the persisted vector DB
    """
    # Use the same embedding model that was used to build the DB
    embeddings_used = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorDB = Chroma(persist_directory=db_path, embedding_function=embeddings_used)
    # Return the 3 most similar chunks for each query
    retriever = vectorDB.as_retriever(search_type="similarity", search_kwargs={"k": 3})
    return retriever

In [15]:
def textGeneration_langChain_RAG(msg, suggestion_type, retrieverDir):
    """
    msg is the scenario text derived from the picture (Hugging Face model output);
    suggestion_type is the kind of advice to generate: Light, Watering, Humidity,
        Temperature, Pruning, or Propagation;
    retrieverDir is the directory of the vector DB with relevant stories from the txt version of
        the plants dataset from Hugging Face - https://huggingface.co/datasets/jibrand/Plant-dataset
    """
    llm = ChatOpenAI(
            model="gpt-4o",
            temperature=0.2,
            max_tokens=200,
            timeout=None,
            max_retries=2
        )

    # Adjacent string literals are concatenated, so each sentence ends with a space
    system_prompt = (
        "You are an expert gardener who gives {suggestion_type} plant care suggestions. "
        "Provide detailed gardening advice regarding {suggestion_type}. "
        "Use the following pieces of retrieved context to generate {suggestion_type} suggestions based on the plants in the given scenario. "
        "Use a simple narrative structure to generate these suggestions. "
        "Keep the suggestions to less than 150 words."
        "\n\n"
        "{context}"
    )

    prompt = ChatPromptTemplate.from_messages(
        [
            ("system", system_prompt),
            ("human", "{scenario_lang}"),
        ]
    )

    rag_chain = prompt | llm | StrOutputParser()

    # Retrieve the chunks most similar to the scenario and join their text,
    # so the prompt receives document content rather than the retriever object
    retriever = getRetriever(retrieverDir)
    docs = retriever.invoke(msg)
    context = "\n\n".join(doc.page_content for doc in docs)

    out_message = rag_chain.invoke({
            "suggestion_type": suggestion_type,
            "context": context,
            "scenario_lang": msg,
        })

    return out_message

In [16]:
scenario = "there are many ripe strawberries growing on the plant in the garden" #example output from the Hugging Face model
suggestions = textGeneration_langChain_RAG(scenario, "Watering", db_dir)
print(suggestions)

Congratulations on your thriving strawberry plants! To ensure they continue to produce delicious fruit, proper watering is essential. Strawberries prefer consistent moisture, but they don't like to be waterlogged. Here’s how you can keep them happy:

1. **Watering Schedule**: Water your strawberries in the early morning. This allows the foliage to dry during the day, reducing the risk of fungal diseases.

2. **Soil Moisture**: Keep the soil consistently moist, but not soggy. Aim for about 1-1.5 inches of water per week, including rainfall. Use a rain gauge to track natural precipitation.

3. **Mulching**: Apply a layer of mulch around the plants to help retain moisture, suppress weeds, and keep the fruit clean.

4. **Drip Irrigation**: If possible, use drip irrigation or a soaker hose to deliver water directly to the roots, minimizing water contact with the leaves and fruit.

By following these tips, your strawberries should continue
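
The suggestion type is interpolated into the prompt as-is, so a misspelled value reaches the model unchecked. The sketch below shows one possible guard that maps the input onto the six types listed in the function's docstring. The name `normalize_suggestion_type`, the `ALLOWED_TYPES` list, and the 0.8 match cutoff are assumptions for illustration, not part of the original notebook.

```python
import difflib

# Assumed from the docstring of textGeneration_langChain_RAG
ALLOWED_TYPES = ["Light", "Watering", "Humidity", "Temperature", "Pruning", "Propagation"]


def normalize_suggestion_type(raw, allowed=ALLOWED_TYPES):
    """Map user input onto an allowed suggestion type, tolerating small typos."""
    lookup = {name.lower(): name for name in allowed}
    key = raw.strip().lower()
    if key in lookup:
        return lookup[key]
    # Fall back to the closest allowed name, if it is similar enough
    matches = difflib.get_close_matches(key, list(lookup), n=1, cutoff=0.8)
    if matches:
        return lookup[matches[0]]
    raise ValueError(f"Unknown suggestion type {raw!r}; expected one of {allowed}")


print(normalize_suggestion_type("Watring"))
# → Watering
```

The returned value could then be passed as the second argument of `textGeneration_langChain_RAG`.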
