[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1KXlDRGJ_EAenB3d4dX5PxaA6Q_HO4qph#scrollTo=vQiLdRGcsLzh)

# **FlowerLover Knowledge Graph Creation using Neo4j**

## 00. Get Started

### This notebook

By now you should already have the `FlowerLover` folder in your Google Drive. To access it, mount your drive by executing the following cell.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Then change into the working directory:

In [None]:
# Your path to FlowerLover
%cd '/content/drive/MyDrive/FlowerLover/'

/content/drive/MyDrive/FlowerLover


### If You Joined Late

No worries! 😊 Simply open the Google Drive folder [here](https://drive.google.com/drive/folders/168E2L-SF8RrSwjkAbO5ENki4J_jsZQDg?usp=drive_link) to access all the materials. Then, follow the instructions in the `Get Started.ipynb` notebook to catch up.

## 01. Introduction to Knowledge Graphs


A **Knowledge Graph (KG)** represents a network of **real-world entities** (i.e., objects, events, situations, or concepts) and illustrates the **relationships between them**. This information is usually stored in **Knowledge Bases (KB)** and visualized as a graph structure, hence the term knowledge “graph.”

![Neo4j Graph Result](https://drive.google.com/uc?export=view&id=15Ywos_6ubouEanqmKsyZJYayd6yoFIEy)


A knowledge graph is composed of three main components:

- 🟢 **Nodes** (Entities): Each node represents a distinct concept or object (e.g., "Summer", "fire lily", "canterbury bells").
- ➡️ **Edges** (Relationships): Edges are the connections or relationships between entities (e.g., "BLOOMS_IN").
- 🔖 **Properties** (Attributes): Additional pieces of information attached to nodes and edges (e.g., "Pet_Friendly": "Yes").

These components combine to form **Triplets (subject-predicate-object)**, which are the fundamental building blocks of knowledge graphs (e.g., "fire lily" -BLOOMS_IN-> "Summer").
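To make the triplet idea concrete, here is a minimal pure-Python sketch that stores a few triplets from the examples above as `(subject, predicate, object)` tuples and answers simple questions by filtering them. The `objects_of` and `subjects_with` helpers are hypothetical and only illustrate the structure; Neo4j stores and queries the same information natively.

```python
# A handful of (subject, predicate, object) triplets taken from the examples above.
triplets = [
    ("fire lily", "BLOOMS_IN", "Summer"),
    ("fire lily", "HAS_COLOR", "Red"),
    ("fire lily", "HAS_COLOR", "Orange"),
    ("canterbury bells", "BLOOMS_IN", "Spring"),
    ("canterbury bells", "BLOOMS_IN", "Summer"),
]


def objects_of(subject, predicate, facts):
    """Return every object linked to `subject` through `predicate` (hypothetical helper)."""
    return [o for s, p, o in facts if s == subject and p == predicate]


def subjects_with(predicate, obj, facts):
    """Return every subject linked to `obj` through `predicate` (hypothetical helper)."""
    return [s for s, p, o in facts if p == predicate and o == obj]


print(objects_of("fire lily", "HAS_COLOR", triplets))  # ['Red', 'Orange']
print(subjects_with("BLOOMS_IN", "Summer", triplets))  # ['fire lily', 'canterbury bells']
```

Following an edge in either direction is just a filter over the triplets; a graph database does the same thing with indexes instead of a linear scan.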


## 02. Environment Setup

### 📦 Installing and Importing Essential Libraries

In [None]:
!pip install pandas networkx py2neo &>/dev/null

In [None]:
import pandas as pd
from py2neo import Graph, Node, Relationship
import networkx as nx
import matplotlib.pyplot as plt

In [None]:
# Warning control
import warnings
warnings.filterwarnings('ignore')

### 🌐 Setting Up and Connecting to Neo4j using Neo4j Aura

To get started with Neo4j in the cloud, follow these simple steps:

1. Go to [Neo4j Aura](https://neo4j.com/product/auradb/).
2. Select **Start Free** (perfect for small projects).
3. Create an account, or sign in if you already have one.
4. Click **"Create instance"** to start a new instance.
5. Once your instance is created, note down the **URI** and **Credentials** (username and password). You'll need them to connect to the database later!

In [None]:
NEO4J_URI = "neo4j+s://<your_neo4j_uri>"
NEO4J_USERNAME = "<your_username>"
NEO4J_PASSWORD = "<your_password>"
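Hard-coding the password in a notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you export the values as environment variables named `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` (names chosen here for illustration), is to read them at runtime and fail early if any is missing. In Colab you could also prompt for the password with `getpass.getpass()`.

```python
import os


def load_neo4j_credentials(env=os.environ):
    """Read Neo4j connection settings from environment variables (hypothetical helper).

    Raises KeyError naming every missing variable, so a misconfigured
    notebook fails before any connection attempt.
    """
    names = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(env[name] for name in names)


# Usage with an explicit mapping; in the notebook you would rely on os.environ.
example_env = {
    "NEO4J_URI": "neo4j+s://<your_neo4j_uri>",
    "NEO4J_USERNAME": "<your_username>",
    "NEO4J_PASSWORD": "<your_password>",
}
NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD = load_neo4j_credentials(example_env)
```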

Now, connect to your Neo4j Aura instance:



In [None]:
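graph = Graph(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
# Running a trivial query so a wrong URI or bad credentials fail here, not later
graph.run("RETURN 1").evaluate()
print("Connected to Neo4j Aura instance.")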

Connected to Neo4j Aura instance.


## 03. Dataset Overview

Our dataset is **synthetic** 🤖: it was generated with AI to produce realistic user preferences and flower characteristics 🌸. It features flower species from the renowned *Oxford 102 dataset*.

The data is neatly organized in the `recommendation system dataset/` folder 📂.


In [None]:
# Folder holding the dataset files (the file paths below add the '/')
data = 'recommendation system dataset'

The dataset is organized as follows:

- 🌹 `flowers.txt`: Details about the flowers such as species ID, name, color, fragrance, symbolism, environment, and more.
- 👤 `users_preferences.txt`: User preferences, including favorite flower IDs, preferred colors, scent preferences, and gardening experience.
- ⭐ `flower_ratings.txt`: User interactions with flowers, including ratings and feedback.

💐 Loading and preprocessing flower species data, filling missing fragrance info with "Unscented"

In [None]:
flower_species = pd.read_csv(f'{data}/flowers.txt', sep="::", engine="python")

# Filling missing values in the 'Fragrance' column with 'Unscented'
flower_species['Fragrance'] = flower_species['Fragrance'].fillna('Unscented')

display(flower_species.head(5))
print(f"Flower Species: {len(flower_species)}")

| | Species ID | Name | Color | Flower Shape | Fragrance | Symbolism | Uses | Environment | Blooming Season | Gardening Experience Level | Watering Needs | Light Requirements | Indoor/Outdoor Suitability | Pet Friendly |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21 | fire lily | Red, Orange | Trumpet-like | Mild | Passion, Energy | Decoration, Landscaping | Tropical, Humid | Summer | Intermediate | Medium | Full Sun | Outdoor | No |
| 1 | 3 | canterbury bells | Blue, Purple | Bell-shaped | Mild | Gratitude, Admiration | Bouquets, Garden Borders | Temperate, Well-drained | Spring, Summer | Beginner | Medium | Partial Sun, Shade | Indoor/Outdoor | Yes |
| 2 | 45 | bolero deep blue | Deep Blue | Star-like | Unscented | Mystery, Calmness | Cut Flowers, Decoration | Cool, Well-drained | Summer | Beginner | Medium | Full Sun, Partial Sun | Outdoor | Yes |
| 3 | 1 | pink primrose | Pink | Rosette | Sweet | Youth, Everlasting Love | Ground Cover, Borders | Temperate, Moist | Spring | Beginner | Low | Full Sun, Partial Sun | Indoor/Outdoor | Yes |
| 4 | 34 | mexican aster | Pink, White | Daisy-like | Unscented | Innocence, Joy | Cut Flowers, Landscaping | Dry, Temperate | Summer, Fall | Beginner | Low | Full Sun | Outdoor | Yes |


Flower Species: 102
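The notebook reads these files with `sep="::"`. Because pandas treats a multi-character separator as a regular expression, only the Python parsing engine can handle it; passing `engine="python"` explicitly avoids the `ParserWarning` pandas otherwise emits when it falls back to that engine. Below is a self-contained sketch of the same load-and-fill step on a hypothetical two-row excerpt, assuming that a missing fragrance appears in the raw file as an empty field.

```python
import io

import pandas as pd

# Hypothetical excerpt in the same "::"-separated layout; the empty last field
# on the second data line stands in for a missing fragrance.
sample = io.StringIO(
    "Species ID::Name::Color::Fragrance\n"
    "21::fire lily::Red, Orange::Mild\n"
    "34::mexican aster::Pink, White::\n"
)

# "::" is interpreted as a regex, which requires the Python engine.
flowers = pd.read_csv(sample, sep="::", engine="python")

# Empty fields are parsed as NaN, which fillna then replaces.
flowers["Fragrance"] = flowers["Fragrance"].fillna("Unscented")
print(flowers)
```

Note that commas inside a field (such as `Red, Orange`) are left untouched because only `::` separates columns.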


In [None]:
print(flower_species.dtypes)

Species ID                     int64
Name                          object
Color                         object
Flower Shape                  object
Fragrance                     object
Symbolism                     object
Uses                          object
Environment                   object
Blooming Season               object
Gardening Experience Level    object
Watering Needs                object
Light Requirements            object
Indoor/Outdoor Suitability    object
Pet Friendly                  object
dtype: object


🙍 Loading user preferences data, handling missing scent preferences and cleaning favorite flower IDs



In [None]:
users_preferences = pd.read_csv(f'{data}/users_preferences.txt', sep="::", engine="python")

# Filling missing values in the 'Scent_Preference' column with 'Unscented'
users_preferences['Scent_Preference'] = users_preferences['Scent_Preference'].fillna('Unscented')
# Replacing commas with ', ' in the 'Favorite_Flower_IDs' column
users_preferences['Favorite_Flower_IDs'] = users_preferences['Favorite_Flower_IDs'].str.replace(',', ', ')

display(users_preferences.head(5))
print(f"Number of users: {len(users_preferences)}")

| | User_ID | Favorite_Flower_IDs | Preferred_Colors | Scent_Preference | Purpose | Indoor/Outdoor | Have Pets | Gardening Experience | Maintenance Difficulty Level |
|---|---|---|---|---|---|---|---|---|---|
| 0 | U001 | 73, 82 | Pink, White | Unscented | Cut Flowers | Indoor/Outdoor | No | Beginner | Medium |
| 1 | U002 | 6, 39, 31 | Blue | Spicy | Cut Flowers | Indoor/Outdoor | No | Beginner | High |
| 2 | U003 | 3, 24, 93, 7, 102 | Red | Spicy | Decoration | Indoor | Yes | Intermediate | High |
| 3 | U004 | 72, 77, 88 | Orange | Sweet | Unique Gardens | Indoor | Yes | Expert | Low |
| 4 | U005 | 42, 89, 3, 77 | Blue, Red | Unscented | Indoor Decoration | Indoor | No | Expert | High |


Number of users: 100


🌟 Loading flower ratings data and displaying interaction details


In [None]:
flower_ratings = pd.read_csv(f'{data}/flower_ratings.txt', sep="::", engine="python")

display(flower_ratings.head(5))
print(f"Number of interactions: {len(flower_ratings)}")

| | Interaction_ID | User_ID | Flower_ID | Interaction_Type | Rating | Timestamp | Feedback |
|---|---|---|---|---|---|---|---|
| 0 | I001 | U001 | 73 | Viewed | 4 | 2024-12-01 10:15:32 | "Loved the pink petals." |
| 1 | I002 | U002 | 6 | Liked | 5 | 2024-12-02 14:23:45 | "Spicy scent is perfect." |
| 2 | I003 | U003 | 93 | Purchased | 5 | 2024-12-03 16:05:12 | "Great for spicy-themed decor." |
| 3 | I004 | U004 | 72 | Added to Wishlist | 4 | 2024-12-04 08:30:45 | "Orange color stands out!" |
| 4 | I005 | U005 | 77 | Viewed | 3 | 2024-12-05 12:00:20 | "Prefer other blue flowers." |


Number of interactions: 400
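Before building the graph, it can be worth checking that every rating points at a user and a flower that actually exist: the graph-building code looks nodes up by ID with `.first()`, which returns `None` when nothing matches. A minimal pandas sketch of that check, on small hypothetical frames that mimic the three tables above (flower `999` and user `U009` are deliberately invalid):

```python
import pandas as pd

# Hypothetical miniature versions of the species, users, and ratings tables.
species = pd.DataFrame({"Species ID": [21, 3, 73]})
users = pd.DataFrame({"User_ID": ["U001", "U002"]})
ratings = pd.DataFrame({
    "Interaction_ID": ["I001", "I002", "I003"],
    "User_ID": ["U001", "U002", "U009"],
    "Flower_ID": [73, 999, 21],
})


def dangling_ratings(ratings, species, users):
    """Return ratings whose user or flower ID has no matching row."""
    unknown_flower = ~ratings["Flower_ID"].isin(species["Species ID"])
    unknown_user = ~ratings["User_ID"].isin(users["User_ID"])
    return ratings[unknown_flower | unknown_user]


print(dangling_ratings(ratings, species, users))
```

In the notebook the same function could be called as `dangling_ratings(flower_ratings, flower_species, users_preferences)`; an empty result means every interaction can be linked.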


## 04. Building the Knowledge Graph

⏰ This will take a few minutes to run

### 🕸️ Processing flower characteristics to populate the Neo4j graph with nodes and relationships


⚠️ The columns *Color*, *Environment*, *Blooming Season*, and *Light Requirements* may contain multiple comma-separated values for a single species (e.g., "Red, Orange"), so each value is split out and linked as its own node.
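Because the loop merges nodes by `name`, every distinct value in a multi-valued column becomes exactly one node, while every (species, value) pair becomes one relationship. A small pandas sketch on a hypothetical three-row excerpt of the preview shows how to predict both numbers before touching the database; `unique_values` and `relationship_count` are illustrative helpers, not part of the notebook.

```python
import pandas as pd

# Hypothetical three-row excerpt using values from the flower preview above.
species = pd.DataFrame({
    "Name": ["fire lily", "canterbury bells", "mexican aster"],
    "Color": ["Red, Orange", "Blue, Purple", "Pink, White"],
    "Blooming Season": ["Summer", "Spring, Summer", "Summer, Fall"],
})


def unique_values(df, column):
    """Distinct, stripped values of a comma-separated column (one node per value)."""
    exploded = df[column].str.split(",").explode().str.strip()
    return sorted(exploded.unique())


def relationship_count(df, column):
    """Number of (species, value) pairs, i.e. relationships the loop would merge."""
    return int(df[column].str.split(",").str.len().sum())


print(unique_values(species, "Color"))  # ['Blue', 'Orange', 'Pink', 'Purple', 'Red', 'White']
print(unique_values(species, "Blooming Season"))  # ['Fall', 'Spring', 'Summer']
print(relationship_count(species, "Blooming Season"))  # 5
```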

In [None]:
for _, row in flower_species.iterrows():
    # Creating a Species node
    species_node = Node("Species", name=row['Name'], id=row['Species ID'])
    graph.merge(species_node, "Species", "id")

    # Creating Color nodes and relationships to the Species node
    for color in row['Color'].split(', '):
        color_node = Node("Color", name=color)
        graph.merge(color_node, "Color", "name") # Merge ensures unique colors
        graph.merge(Relationship(species_node, "HAS_COLOR", color_node))

    # Creating Environment nodes and relationships to the Species node
    for env in row['Environment'].split(', '):
        env_node = Node("Environment", name=env)
        graph.merge(env_node, "Environment", "name")
        graph.merge(Relationship(species_node, "THRIVES_IN", env_node))

    # Creating Blooming Season nodes and relationships to the Species node
    for season in row['Blooming Season'].split(', '):
        season_node = Node("BloomingSeason", name=season)
        graph.merge(season_node, "BloomingSeason", "name")
        graph.merge(Relationship(species_node, "BLOOMS_IN", season_node))

    # Creating Light Requirement nodes and relationships to the Species node
    for light in row['Light Requirements'].split(', '):
        light_node = Node("LightRequirement", name=light)
        graph.merge(light_node, "LightRequirement", "name")
        graph.merge(Relationship(species_node, "REQUIRES_LIGHT", light_node))

    # Creating Fragrance nodes and relationships to the Species node
    fragrance_node = Node("Fragrance", name=row["Fragrance"])
    graph.merge(fragrance_node, "Fragrance", "name")
    graph.merge(Relationship(species_node, "HAS_FRAGRANCE", fragrance_node))

    # Creating Gardening Experience nodes and relationships to the Species node
    experience_node = Node("ExperienceLevel", name=row["Gardening Experience Level"])
    graph.merge(experience_node, "ExperienceLevel", "name")
    graph.merge(Relationship(species_node, "REQUIRES_EXPERIENCE", experience_node))

    # Creating Indoor/Outdoor Suitability nodes and relationships to the Species node
    suitability_node = Node("Suitability", name=row["Indoor/Outdoor Suitability"])
    graph.merge(suitability_node, "Suitability", "name")
    graph.merge(Relationship(species_node, "SUITABLE_FOR", suitability_node))

    # Adding additional attributes as properties to the Species node
    species_node["Flower_Shape"] = row["Flower Shape"]
    species_node["Symbolism"] = row["Symbolism"]
    species_node["Uses"] = row["Uses"]
    species_node["Pet_Friendly"] = row["Pet Friendly"]

    # Pushing the species node with its properties into the graph
    graph.push(species_node)
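The loop above issues several `merge` calls per species, and each one talks to the Aura server separately, which is why the cell takes minutes. A common alternative is to send all rows in one request and let Cypher's `UNWIND` iterate server-side. The partial sketch below (covering only species names and colors) builds the parameter list in plain Python; the `build_species_rows` helper and the query text are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Illustrative Cypher: merge every species and its colors in a single request.
SPECIES_COLOR_QUERY = """
UNWIND $rows AS row
MERGE (s:Species {id: row.id})
SET s.name = row.name
WITH s, row
UNWIND row.colors AS color
MERGE (c:Color {name: color})
MERGE (s)-[:HAS_COLOR]->(c)
"""


def build_species_rows(df):
    """Turn the flower table into plain dicts usable as the $rows parameter (hypothetical helper)."""
    rows = []
    for _, row in df.iterrows():
        rows.append({
            "id": int(row["Species ID"]),  # plain int so the driver can serialize it
            "name": row["Name"],
            "colors": [c.strip() for c in row["Color"].split(",")],
        })
    return rows


sample = pd.DataFrame({
    "Species ID": [21, 3],
    "Name": ["fire lily", "canterbury bells"],
    "Color": ["Red, Orange", "Blue, Purple"],
})
rows = build_species_rows(sample)
print(rows[0])  # {'id': 21, 'name': 'fire lily', 'colors': ['Red', 'Orange']}
```

With the `graph` object from earlier, the batch should be sendable as `graph.run(SPECIES_COLOR_QUERY, rows=build_species_rows(flower_species))`; for much larger tables you would typically split `rows` into chunks.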


### 👤 Creating User Nodes with Preferences and Properties in the KG


In [None]:
for _, row in users_preferences.iterrows():
    # Initializing the User node
    user_node = Node("User", id=row['User_ID'])
    graph.merge(user_node, "User", "id")

    # Linking User nodes to their Preferred Colors
    for color in map(str.strip, row['Preferred_Colors'].split(', ')):
        color_node = graph.nodes.match("Color", name=color).first()
        if color_node:
            graph.merge(Relationship(user_node, "PREFERS_COLOR", color_node))

    # Connecting Scent Preference as Relationship to Fragrance nodes
    fragrance_node = graph.nodes.match("Fragrance", name=row['Scent_Preference']).first()
    if fragrance_node:
        graph.merge(Relationship(user_node, "PREFERS_SCENT", fragrance_node))

    # Adding Suitability (Indoor/Outdoor) as Relationship
    suitability_node = graph.nodes.match("Suitability", name=row['Indoor/Outdoor']).first()
    if suitability_node:
        graph.merge(Relationship(user_node, "SUITABLE_FOR", suitability_node))

    # Linking Favorite Flowers as Relationships to Species nodes
    for flower_id in map(str.strip, row['Favorite_Flower_IDs'].split(', ')):
        flower_id = int(flower_id)  # Converting Flower ID to Integer format to match Species ID
        species_node = graph.nodes.match("Species", id=flower_id).first()
        if species_node:
            graph.merge(Relationship(user_node, "FAVORITE", species_node))

    # Adding Gardening Experience Level as Relationship
    experience_node = graph.nodes.match("ExperienceLevel", name=row['Gardening Experience']).first()
    if experience_node:
        graph.merge(Relationship(user_node, "HAS_EXPERIENCE", experience_node))

    # Adding Purpose, Have Pets, Maintenance Difficulty Level as Properties
    user_node["Purpose"] = row['Purpose']
    user_node["Have_Pets"] = row['Have Pets']
    user_node["Maintenance_Difficulty_Level"] = row['Maintenance Difficulty Level']

    # Persisting User node with all properties
    graph.push(user_node)
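The favorite-flower step above relies on the earlier `str.replace(',', ', ')` normalization so that `split(', ')` works. A slightly more forgiving parser, sketched below as a hypothetical helper, splits on the bare comma, lets `int()` absorb surrounding whitespace, and skips empty pieces, so it accepts both `"73,82"` and `"73, 82"` without the preprocessing step.

```python
def parse_flower_ids(value):
    """Parse a comma-separated ID string such as "73, 82" into a list of ints (hypothetical helper)."""
    if not isinstance(value, str):
        return []  # missing values (e.g. NaN from pandas) mean no favorites
    # int() ignores leading/trailing whitespace; empty pieces come from stray commas.
    return [int(part) for part in value.split(",") if part.strip()]


print(parse_flower_ids("73,82"))  # [73, 82]
print(parse_flower_ids("3, 24, 93, 7, 102"))  # [3, 24, 93, 7, 102]
```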


### 🔄 Iterating through the flower ratings and adding them to the KG

In [None]:
for _, row in flower_ratings.iterrows():
    user_node = graph.nodes.match("User", id=row['User_ID']).first()
    species_node = graph.nodes.match("Species", id=row['Flower_ID']).first()

    # Skipping interactions that reference a user or species missing from the graph
    if user_node is None or species_node is None:
        continue

    # Creating interaction relationship with rating and feedback as its properties
    interaction = Relationship(user_node, "RATED", species_node)
    interaction["rating"] = row['Rating']
    interaction["timestamp"] = row['Timestamp']
    interaction["feedback"] = row['Feedback']
    graph.merge(interaction)
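Once the `RATED` relationships carry a `rating` property, aggregate questions become possible; in Cypher, for example, `MATCH (:User)-[r:RATED]->(s:Species) RETURN s.name, avg(r.rating) AS avg_rating ORDER BY avg_rating DESC`. The same aggregation can be sketched in pandas, which is handy for cross-checking the graph against the source table. The four rows below are hypothetical.

```python
import pandas as pd

# Hypothetical ratings: two users per flower so the averages are non-trivial.
ratings = pd.DataFrame({
    "User_ID": ["U001", "U002", "U003", "U004"],
    "Flower_ID": [73, 73, 93, 93],
    "Rating": [4, 5, 5, 3],
})

# Average rating and number of ratings per flower, best-rated first.
summary = (
    ratings.groupby("Flower_ID")["Rating"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```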

## 05. Exploring the Knowledge Graph in Neo4j Aura

Once your Knowledge Graph is built, it's time to explore it in Neo4j Aura:

1. Return to **Neo4j Aura**.
2. Click on the **'Connect'** button in your instance and select **'Explore'** to enter the query interface.
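3. Run the following Cypher query, which finds pet-friendly species that thrive in a tropical environment, bloom in spring or summer, and grow in partial sun.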


```cypher
MATCH p=(s:Species)-[:THRIVES_IN]->(e:Environment {name: "Tropical"}),
      q=(s)-[:BLOOMS_IN]->(b:BloomingSeason),
      r=(s)-[:REQUIRES_LIGHT]->(l:LightRequirement)
WHERE s.Pet_Friendly = "Yes" AND b.name IN ["Spring", "Summer"] AND l.name = "Partial Sun"
RETURN p, q, r
```


🖼️ Once you run the query, you will see something like this:

![Neo4j Graph Result](https://drive.google.com/uc?export=view&id=1dwv3E7Lhf-KZueRujDWSI68D3B17IYco)
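The `WHERE` clause of the Cypher query above combines four conditions. The same logic can be sketched in plain Python over per-species records, which may help when checking why a given species does or does not appear. The records use two rows of the preview table plus one hypothetical species (`"example orchid"`) added only so that something matches. One subtlety: `MATCH` yields one result row per matching (season, light) combination, so a species blooming in both spring and summer appears twice in the tabular result even though the visual graph shows it once.

```python
# Per-species records: two rows from the preview table plus one hypothetical species.
species = [
    {"name": "fire lily", "pet_friendly": "No",
     "environments": {"Tropical", "Humid"}, "seasons": {"Summer"},
     "lights": {"Full Sun"}},
    {"name": "canterbury bells", "pet_friendly": "Yes",
     "environments": {"Temperate", "Well-drained"}, "seasons": {"Spring", "Summer"},
     "lights": {"Partial Sun", "Shade"}},
    {"name": "example orchid", "pet_friendly": "Yes",  # hypothetical, not in the dataset
     "environments": {"Tropical", "Humid"}, "seasons": {"Spring", "Summer"},
     "lights": {"Partial Sun"}},
]

WANTED_SEASONS = {"Spring", "Summer"}


def matches_query(record):
    """Mirror the Cypher filter: pet-friendly, tropical, spring/summer bloom, partial sun."""
    return (
        record["pet_friendly"] == "Yes"
        and "Tropical" in record["environments"]
        and bool(record["seasons"] & WANTED_SEASONS)
        and "Partial Sun" in record["lights"]
    )


def cypher_row_count(record):
    """Rows the MATCH would return for this species: one per matching (season, light) pair."""
    if not matches_query(record):
        return 0
    return len(record["seasons"] & WANTED_SEASONS) * len(record["lights"] & {"Partial Sun"})


print([r["name"] for r in species if matches_query(r)])  # ['example orchid']
print(cypher_row_count(species[2]))  # 2
```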