# Recommendation Engine

The foundation of this model is the set of distinct agroclimate zones across regions. States are grouped into these zones by analyzing historical meteorological data such as rainfall and temperature patterns. A dissimilarity matrix is then constructed in which each cell holds the dissimilarity score between two agroclimate zones. Higher values indicate greater dissimilarity, which is what lets the engine recommend climatically diverse farms.

Here's how the model operates: when a farmer enters their location in their profile details in the SeedShare app, the system identifies their agroclimate zone. Using the dissimilarity matrix, it recommends farms from zones dissimilar to the farmer's, reducing the risk of shared climate-related issues. Further filtering considers factors like historical yield and bid prices to offer personalized recommendations. These recommendations help farmers explore geographic regions that may offer different agricultural advantages. In essence, the model uses meteorological data to cluster regions so that farmers can make informed decisions and mitigate crop-related risks.

In [None]:
import matplotlib.pyplot as plt

# Load the image
image = plt.imread(r"E:\agroclimatezones.png")
fig = plt.figure(figsize=(10, 10), dpi=300)

# Display the image
plt.imshow(image)

# Turn off the x and y axis
plt.axis('off')

# Show the figure
plt.show()


An “Agro-climatic zone” is a land unit defined in terms of major climates and suitable for a certain range of crops and cultivars. Agro-climatic planning aims at scientific management of regional resources to meet the demand for food, fiber, fodder, and fuelwood without adversely affecting natural resources and the environment. Agro-climatic conditions mainly refer to soil types, rainfall, temperature, and water availability, which influence the type of vegetation.

# Flow Diagram of Recommendation Engine

In [None]:
import matplotlib.pyplot as plt

# Load the image
image = plt.imread(r"E:\Rengine flowdg.jpg")

# Create a figure with a specific size and resolution
fig = plt.figure(figsize=(10, 10), dpi=300)

# Display the image
plt.imshow(image)

# Turn off the x and y axis
plt.axis('off')

# Show the figure
plt.show()


# Installing Necessary Libraries


In [None]:
!pip install scikit-surprise
!pip install pandas openpyxl scikit-learn matplotlib


In [None]:
import pandas as pd

# Provide the path to your Excel file
excel_file_path = r"E:\recommendation.xlsx"  # Replace with the actual file path

# Read the Excel file into a Pandas DataFrame
data = pd.read_excel(excel_file_path, sheet_name='recommendation')  # Replace 'recommendation' with the actual sheet name.


# Optionally, you can check the first few rows of your DataFrame to verify the data
print(data.head())


Using the historical average rainfall, minimum temperature, and maximum temperature for each agroclimate zone from 1966 to 2011, a 14x14 dissimilarity matrix is computed. Each entry quantifies how dissimilar one agroclimate zone is from another; a higher value indicates greater dissimilarity.
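
As a worked definition (the standard Euclidean distance, with symbols introduced here purely for illustration): let $x_i = (x_{i1}, \dots, x_{ip})$ be the scaled climate features of zone $i$, such as average rainfall, minimum temperature, and maximum temperature. The dissimilarity between zones $i$ and $j$ is then

$$
d(i, j) = \sqrt{\sum_{k=1}^{p} \left(x_{ik} - x_{jk}\right)^2}, \qquad i, j = 1, \dots, 14 .
$$

By construction $d(i, i) = 0$ and $d(i, j) = d(j, i)$, so the matrix has a zero diagonal and is symmetric.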


In [None]:
import matplotlib.pyplot as plt

# Load the image
image = plt.imread(r"E:\rainfall.png")
fig = plt.figure(figsize=(5, 5), dpi=300)
# Display the image
plt.imshow(image)
plt.axis('off')


# Show the figure
plt.show()

The dataset's rainfall and temperature data are crucial for defining agroclimatic zones and for calculating dissimilarity between these zones. This dissimilarity matrix is then used to recommend farms in different zones to mitigate the risk of crop failure due to localized weather events.

# Evaluating Euclidean Distance

In [None]:
import pandas as pd
from sklearn.preprocessing import normalize
from sklearn.metrics import pairwise_distances

# Name of the sheet to load (replace 'Sheet5' if your workbook uses a different name)
sheet_name = 'Sheet5'

# Load a specific sheet from your Excel file
your_dataset = pd.read_excel(r'E:\recommendation.xlsx', sheet_name=sheet_name)

# Scale each row (zone) to unit L2 norm; normalize() works per row, not per column
normalized_data = normalize(your_dataset)

# Create a DataFrame with the normalized data
scaled_df = pd.DataFrame(normalized_data, columns=your_dataset.columns)

# Calculate the Euclidean distance matrix
euclidean_dissimilarity_matrix = pairwise_distances(scaled_df.values, metric='euclidean')

# Convert the dissimilarity matrix to a DataFrame for easier visualization
euclidean_dissimilarity_df = pd.DataFrame(euclidean_dissimilarity_matrix, index=scaled_df.index, columns=scaled_df.index)

# Print or use the dissimilarity matrix as needed
print(euclidean_dissimilarity_df)


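Scaling meteorological data such as rainfall and temperature is crucial because it puts all variables on a consistent scale. These variables have different units and magnitudes, so without scaling the largest-valued variable would dominate the Euclidean distance. Note that scikit-learn's `normalize` rescales each row to unit length rather than each column; the `StandardScaler` cells further below standardize each variable and produce the matrix used for the recommendations.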
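
To make the scaling point concrete, here is a small sketch in plain Python. The three rows and their rainfall and temperature values are made up for illustration and are not taken from the dataset. It contrasts per-column standardization (the computation scikit-learn's `StandardScaler` performs) with per-row unit-norm scaling (what `sklearn.preprocessing.normalize` does with its default arguments).

```python
import math
import statistics

# Hypothetical rows: [annual rainfall, minimum temperature, maximum temperature]
rows = [
    [1200.0, 5.0, 25.0],
    [2500.0, 12.0, 30.0],
    [300.0, 18.0, 42.0],
]

def standardize_columns(data):
    """Z-score each column: subtract the column mean, divide by the population std."""
    columns = list(zip(*data))
    means = [statistics.fmean(col) for col in columns]
    stds = [statistics.pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in data
    ]

def unit_norm_rows(data):
    """Scale each row to Euclidean length 1 (row-wise L2 normalization)."""
    return [[value / math.hypot(*row) for value in row] for row in data]

z = standardize_columns(rows)
u = unit_norm_rows(rows)

# After standardization every column has mean 0 and std 1, so rainfall
# no longer outweighs temperature in a Euclidean distance.
print([round(statistics.pstdev(col), 6) for col in zip(*z)])

# After row-wise normalization the rainfall entry still dominates each row.
print([round(row[0], 3) for row in u])
```

In this toy data the rainfall component stays close to 1 in every unit-norm row, which is why row-wise scaling alone does not make the variables comparable.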

In [None]:
import pandas as pd
from sklearn.preprocessing import normalize
from sklearn.metrics import pairwise_distances

# Name of the sheet to load (replace 'Sheet5' if your workbook uses a different name)
sheet_name = 'Sheet5'

# Load a specific sheet from your Excel file
your_dataset = pd.read_excel(r'E:\recommendation.xlsx', sheet_name=sheet_name)

# Create an array with the names of your agroclimate zones
agroclimate_zone_names = ['Western Himalayan Region', 'Eastern Himalayan Region', 'Lower Gangetic Plains Region', 'Middle Gangetic Plains Region', 'Upper Gangetic Plains Region', 'Trans-Gangetic Plains Region', 'Eastern Plateau and Hills Region', 'Central Plateau and Hills Region', 'Western Plateau and Hills Region', 'Southern Plateau and Hills Region', 'East Coast Plains and Hills Region', 'West Coast Plains and Ghat Region', 'Gujarat Plains and Hills Region', 'Western Dry Region'
]

# Scale each row (zone) to unit L2 norm; normalize() works per row, not per column
normalized_data = normalize(your_dataset)

# Create a DataFrame with the normalized data
scaled_df = pd.DataFrame(normalized_data, columns=your_dataset.columns)

# Calculate the Euclidean distance matrix
euclidean_dissimilarity_matrix = pairwise_distances(scaled_df.values, metric='euclidean')

# Create the dissimilarity DataFrame with labeled rows and columns
euclidean_dissimilarity_df = pd.DataFrame(euclidean_dissimilarity_matrix, index=agroclimate_zone_names, columns=agroclimate_zone_names)

# Print or use the dissimilarity matrix as needed
print(euclidean_dissimilarity_df)


In [None]:
import pandas as pd
from sklearn.preprocessing import normalize
from sklearn.metrics import pairwise_distances

# The dissimilarity matrix computed below must be 14x14 for the 14 agroclimate zones

# Name of the sheet to load (replace 'Sheet5' if your workbook uses a different name)
sheet_name = 'Sheet5'

# Load a specific sheet from your Excel file
your_dataset = pd.read_excel(r'E:\recommendation.xlsx', sheet_name=sheet_name)

# Create a list of agroclimate zone names
agroclimate_zone_names = [
    'Western Himalayan Region',
    'Eastern Himalayan Region',
    'Lower Gangetic Plains Region',
    'Middle Gangetic Plains Region',
    'Upper Gangetic Plains Region',
    'Trans-Gangetic Plains Region',
    'Eastern Plateau and Hills Region',
    'Central Plateau and Hills Region',
    'Western Plateau and Hills Region',
    'Southern Plateau and Hills Region',
    'East Coast Plains and Hills Region',
    'West Coast Plains and Ghat Region',
    'Gujarat Plains and Hills Region',
    'Western Dry Region'
]

# Scale each row (zone) to unit L2 norm; normalize() works per row, not per column
normalized_data = normalize(your_dataset)

# Create a DataFrame with the normalized data
scaled_df = pd.DataFrame(normalized_data, columns=your_dataset.columns)

# Calculate the Euclidean distance matrix
euclidean_dissimilarity_matrix = pairwise_distances(scaled_df.values, metric='euclidean')

# Convert the dissimilarity matrix to a DataFrame with labeled rows and columns
euclidean_dissimilarity_df = pd.DataFrame(euclidean_dissimilarity_matrix, index=agroclimate_zone_names, columns=agroclimate_zone_names)

# Print or use the dissimilarity matrix as needed
print(euclidean_dissimilarity_df)


- The dataset contains historical records of annual and seasonal rainfall, as well as temperature data for different regions. These records span several years, allowing us to establish long-term climate patterns.
- By calculating dissimilarity metrics like Euclidean distance between these climate variables for all pairs of zones, we can quantify how different or similar the agroclimatic conditions are between these zones.
- Zone pairs with smaller values in the dissimilarity matrix have more similar climate conditions, while pairs with larger values are more dissimilar.
- This dissimilarity matrix serves as the foundation for the recommendation engine because it helps identify zones that are climatically dissimilar to the user's current zone. Recommending farms in dissimilar zones reduces the risk associated with adverse local weather conditions.
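
The lookup described in these bullets can be sketched in a few lines of plain Python. The values below are an excerpt of the hardcoded matrix in the heatmap cell of this notebook, under the assumption that its rows follow the order of the zone name list; the helper name `most_dissimilar` is hypothetical.

```python
# Excerpt of the dissimilarity matrix for four zones (assumed row order:
# same as the agroclimate zone name list used in this notebook).
zones = [
    "Western Himalayan Region",
    "Eastern Himalayan Region",
    "Lower Gangetic Plains Region",
    "Middle Gangetic Plains Region",
]
matrix = [
    [0.000000, 4.899886, 6.995954, 6.410329],
    [4.899886, 0.000000, 4.126332, 4.809205],
    [6.995954, 4.126332, 0.000000, 2.003805],
    [6.410329, 4.809205, 2.003805, 0.000000],
]

def most_dissimilar(target, top_n):
    """Return the top_n (zone, score) pairs farthest from target, excluding target itself."""
    row = matrix[zones.index(target)]
    others = [(zone, score) for zone, score in zip(zones, row) if zone != target]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:top_n]

print(most_dissimilar("Middle Gangetic Plains Region", 2))
```

Within this excerpt, the Middle Gangetic Plains are closest to the Lower Gangetic Plains (2.003805) and farthest from the Western Himalayan Region (6.410329), so the Himalayan zone is recommended first.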


In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the dataset from the Excel file (assuming it's in the 'Sheet5' sheet)
sheet_name = 'Sheet5'
file_path = r'E:\recommendation.xlsx'  # Replace with your file path
data = pd.read_excel(file_path, sheet_name=sheet_name)

# Create a StandardScaler instance
scaler = StandardScaler()

# Standardize all columns
data = scaler.fit_transform(data)

# 'data' is now a NumPy array of standardized values (column names are dropped)
# You can save it to a new Excel file or use it for further analysis


In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load your dataset from Excel, assuming your data starts from the 1st column
sheet_name = 'Sheet5'  # Replace with the name of your sheet
data = pd.read_excel(r'E:\recommendation.xlsx', sheet_name=sheet_name)

# All columns are standardized (the sheet is assumed to hold the climate variables in columns 1 to 9)

# Initialize the StandardScaler
scaler = StandardScaler()

# Fit and transform the data; the result is a NumPy array
standardized_data = scaler.fit_transform(data)

# Print the standardized data
print(standardized_data)


# Dissimilarity Matrix

In [None]:
import pandas as pd
from sklearn.metrics import pairwise_distances

# Agroclimate zone names (assumed to be in the same order as the dataset rows)
agroclimate_zone_names = ['Western Himalayan Region', 'Eastern Himalayan Region', 'Lower Gangetic Plains Region', 'Middle Gangetic Plains Region', 'Upper Gangetic Plains Region', 'Trans-Gangetic Plains Region', 'Eastern Plateau and Hills Region', 'Central Plateau and Hills Region', 'Western Plateau and Hills Region', 'Southern Plateau and Hills Region', 'East Coast Plains and Hills Region', 'West Coast Plains and Ghat Region', 'Gujarat Plains and Hills Region', 'Western Dry Region']

# Wrap the standardized array from the previous cell in a DataFrame
standardized_df = pd.DataFrame(standardized_data, columns=data.columns)

# Calculate pairwise Euclidean distances between rows (zones)
euclidean_dissimilarity_matrix = pairwise_distances(standardized_df.values, metric='euclidean')

# Create a DataFrame with the dissimilarity matrix
euclidean_dissimilarity_df = pd.DataFrame(euclidean_dissimilarity_matrix, index=agroclimate_zone_names, columns=agroclimate_zone_names)

# Print or use the dissimilarity matrix as needed
print(euclidean_dissimilarity_df)


In [None]:
import matplotlib.pyplot as plt

# Load the image
image = plt.imread(r"C:\Users\hp\Pictures\Screenshots\dataset.png")
fig = plt.figure(figsize=(10, 10), dpi=300)
# Display the image
plt.imshow(image)
plt.axis('off')


# Show the figure
plt.show()

The objective is to create a structured 14x14 dissimilarity matrix representing the differences between the 14 agroclimate zones. Each cell in the matrix holds the dissimilarity score between two zones, with higher values indicating greater dissimilarity.
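
A quick sanity check on such a matrix can catch ordering or loading mistakes early. The sketch below is one possible check, not part of the original pipeline: the function name `check_dissimilarity_matrix` is hypothetical, and the 3x3 excerpt reuses values from the heatmap cell of this notebook.

```python
import math

def check_dissimilarity_matrix(matrix, expected_size, tol=1e-9):
    """Raise ValueError unless matrix is square, zero on the diagonal, and symmetric."""
    if len(matrix) != expected_size or any(len(row) != expected_size for row in matrix):
        raise ValueError(f"expected a {expected_size}x{expected_size} matrix")
    for i in range(expected_size):
        if not math.isclose(matrix[i][i], 0.0, abs_tol=tol):
            raise ValueError(f"diagonal entry {i} is not zero")
        for j in range(i + 1, expected_size):
            if not math.isclose(matrix[i][j], matrix[j][i], abs_tol=tol):
                raise ValueError(f"entries ({i},{j}) and ({j},{i}) differ")
    return True

# 3x3 excerpt of the matrix shown in the heatmap cell
excerpt = [
    [0.000000, 4.899886, 6.995954],
    [4.899886, 0.000000, 4.126332],
    [6.995954, 4.126332, 0.000000],
]
print(check_dissimilarity_matrix(excerpt, 3))
```

For the full matrix, the same check could be run as `check_dissimilarity_matrix(euclidean_dissimilarity_df.values.tolist(), 14)`.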

# Agro-Climatic Zone Classification

We've created a dictionary that maps states to agro-climatic zones. When a user inputs their state, this module provides the corresponding agro-climatic zone.

In [None]:
# Create a dictionary to map states to agro-climatic zones
state_to_zone = {
    'Jammu and Kashmir': 'Western Himalayan Region',
    'Uttar Pradesh': 'Western Himalayan Region',
    'Assam': 'Eastern Himalayan Region',
    'Sikkim': 'Eastern Himalayan Region',
    'West Bengal': 'Eastern Himalayan Region',
    'Arunachal Pradesh': 'Eastern Himalayan Region',
    'Nagaland': 'Eastern Himalayan Region',
    'Manipur': 'Eastern Himalayan Region',
    'Mizoram': 'Eastern Himalayan Region',
    'Tripura': 'Eastern Himalayan Region',
    'Meghalaya': 'Eastern Himalayan Region',
    'Bihar': 'Middle Gangetic Plains Region',
    'Punjab': 'Trans-Gangetic Plains Region',
    'Haryana': 'Trans-Gangetic Plains Region',
    'Delhi': 'Trans-Gangetic Plains Region',
    # Note: 'Rajasthan' maps to a single zone here (see 'Western Dry Region' below); a dict cannot hold two zones for one state
    'Maharashtra': 'Western Plateau and Hills Region',
    'Madhya Pradesh': 'Western Plateau and Hills Region',
    'Andhra Pradesh': 'Southern Plateau and Hills Region',
    'Karnataka': 'Southern Plateau and Hills Region',
    'Tamil Nadu': 'Southern Plateau and Hills Region',
    'Odisha': 'East Coast Plains and Hills Region',
    'Goa': 'West Coast Plains and Ghat Region',
    'Kerala': 'West Coast Plains and Ghat Region',
    'Gujarat': 'Gujarat Plains and Hills Region',
    'Rajasthan': 'Western Dry Region',
    # Note: 'The Islands Region' is not one of the 14 zones in the dissimilarity matrix
    'Andaman and Nicobar': 'The Islands Region',
    'Lakshadweep': 'The Islands Region'
}

# Input the state name (surrounding whitespace is ignored)
state_name = input("Enter the name of the state: ").strip()

# Check if the entered state is in the mapping dictionary
if state_name in state_to_zone:
    agro_climatic_zone = state_to_zone[state_name]
    print(agro_climatic_zone)
else:
    print(f"Agro-climatic zone information not found for {state_name}")


The user enters their current state of residence or farming location into their SeedShare profile. When the profile is updated, the SeedShare backend automatically determines the agroclimate zone for that state using the logic above.
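
Because profile text is typed by users, a backend lookup would likely need to tolerate differences in case and spacing. The following is a minimal sketch of that idea, not the actual SeedShare backend: the function name `zone_for_state` and the normalized index are assumptions, and the mapping is a small excerpt of the dictionary above.

```python
# Small excerpt of the state-to-zone mapping from this notebook.
STATE_TO_ZONE = {
    "Assam": "Eastern Himalayan Region",
    "Bihar": "Middle Gangetic Plains Region",
    "Punjab": "Trans-Gangetic Plains Region",
    "Tamil Nadu": "Southern Plateau and Hills Region",
}

def _normalize(name):
    """Collapse runs of whitespace and ignore case."""
    return " ".join(name.split()).casefold()

# Index keyed by the normalized state name.
_NORMALIZED_INDEX = {_normalize(state): zone for state, zone in STATE_TO_ZONE.items()}

def zone_for_state(state_name):
    """Return the agro-climatic zone for a state, or None if the state is unmapped."""
    return _NORMALIZED_INDEX.get(_normalize(state_name))

print(zone_for_state("  tamil  NADU "))
print(zone_for_state("Atlantis"))
```

Returning `None` for an unknown state leaves it to the caller to decide whether to prompt the user again or skip recommendations.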

In [None]:
# Define the target agroclimate zone for which you want recommendations
target_zone = input("Enter your agroclimate zone: ")  # Get the desired zone from the user

# Get the number of recommendations (top N) from the user
top_n = int(input("Enter the number of recommendations (top N): "))  # Get N from the user

# Get the dissimilarity values for the target zone
dissimilarity_values = euclidean_dissimilarity_df[target_zone]

# Sort the dissimilarity values in descending order and get the top N recommendations
top_recommendations = dissimilarity_values.sort_values(ascending=False).head(top_n)

# Print the top N recommendations
print(f"Top {top_n} Recommendations for '{target_zone}':")
for zone, dissimilarity in top_recommendations.items():
    print(f"{zone}: Dissimilarity = {dissimilarity}")


In [None]:
# Define the target agroclimate zone for which you want recommendations
target_zone = input("Enter your agroclimate zone: ")  # Get the desired zone from the user

# Get the number of recommendations (top N) from the user
top_n = int(input("Enter the number of recommendations (top N): "))  # Get N from the user

# Get the dissimilarity values for the target zone
dissimilarity_values = euclidean_dissimilarity_df[target_zone]

# Sort the dissimilarity values in descending order and get the top N recommendations
top_recommendations = dissimilarity_values.sort_values(ascending=False).head(top_n)

# Load the original dataset (assuming it's in the 'recommendation' sheet)
original_dataset = pd.read_excel(r'E:\recommendation.xlsx', sheet_name='recommendation')

# Display the records of the top N recommended zones
print(f"Top Recommendations for '{target_zone}':")
for zone, dissimilarity in top_recommendations.items():
    recommended_zone_records = original_dataset[original_dataset['AgroClimateZone'] == zone]
    print(recommended_zone_records)
    print("\n")


In SeedShare, recommendations are ranked using a variety of criteria, including historical crop yields, pricing factors (ask and bid prices), location-specific conditions (soil quality, water availability), user preferences (organic farming, crop choices), feedback and ratings from other users, availability of farms/resources, and machine learning algorithms tailored to individual farmer profiles.
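
As an illustration of how two of these criteria could be combined, the sketch below filters farms by ask price against a bid and then ranks them by yield. It is one possible ordering, not SeedShare's production ranking: the farm records are made up, the function name `rank_farms` is hypothetical, and only the field names mirror the `AverageYield` and `askprice` columns used in this notebook.

```python
# Hypothetical farm records for illustration only.
farms = [
    {"farm": "A", "AverageYield": 3.2, "askprice": 120.0},
    {"farm": "B", "AverageYield": 4.1, "askprice": 150.0},
    {"farm": "C", "AverageYield": 4.1, "askprice": 110.0},
    {"farm": "D", "AverageYield": 5.0, "askprice": 300.0},
]

def rank_farms(records, bid_price, limit):
    """Keep farms with askprice <= bid_price, best yield first, cheaper ask breaking ties."""
    affordable = [r for r in records if r["askprice"] <= bid_price]
    ranked = sorted(affordable, key=lambda r: (-r["AverageYield"], r["askprice"]))
    return ranked[:limit]

print([r["farm"] for r in rank_farms(farms, bid_price=200.0, limit=3)])
```

Slicing with `[:limit]` returns fewer records when not enough farms qualify, rather than rejecting the request outright.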


In [None]:
# Define the target agroclimate zone for which you want recommendations
target_zone = input("Enter your agroclimate zone: ")  # Get the desired zone from the user

# Get the number of recommendations (top N) from the user
top_n = int(input("Enter the top N dissimilar zones you want: "))  # Get N from the user

# Get the dissimilarity values for the target zone
dissimilarity_values = euclidean_dissimilarity_df[target_zone]

# Sort the dissimilarity values in descending order and get the top N recommendations
top_recommendations = dissimilarity_values.sort_values(ascending=False).head(top_n)

# Load the original dataset (assuming it's in the 'recommendation' sheet)
original_dataset = pd.read_excel(r'E:\recommendation.xlsx', sheet_name='recommendation')

# Initialize an empty DataFrame to store records of the top N recommended zones
recommended_records = pd.DataFrame()

# Iterate through the top N recommended zones
for zone in top_recommendations.index:
    # Filter records for the current zone
    zone_records = original_dataset[original_dataset['AgroClimateZone'] == zone]
    
    # Append the zone_records to the recommended_records DataFrame
    recommended_records = pd.concat([recommended_records, zone_records], ignore_index=True)

# Sort the recommended records by 'AverageYield' in descending order
recommended_records = recommended_records.sort_values(by='AverageYield', ascending=False)

# Ask the user how many records they want to see
num_records_to_display = int(input("Enter the number of records to display: "))

# Display the desired number of records based on yield ranking
if num_records_to_display <= len(recommended_records):
    top_records = recommended_records.reset_index(drop=True).head(num_records_to_display)
    print(f"Top {num_records_to_display} Records for '{target_zone}' based on Yield:")
    print(top_records)
else:
    print("Invalid number of records to display.")


# Final output: Recommendation Feed

In [None]:
# Requires pd and euclidean_dissimilarity_df from the cells above

# Define the target agroclimate zone for which you want recommendations
target_zone = input("Enter your agroclimate zone: ")  # Get the desired zone from the user

# Get the number of recommendations (top N) from the user
top_n = int(input("Enter the top N dissimilar zones you want: "))  # Get N from the user

# Get the dissimilarity values for the target zone
dissimilarity_values = euclidean_dissimilarity_df[target_zone]

# Sort the dissimilarity values in descending order and get the top N recommendations
top_recommendations = dissimilarity_values.sort_values(ascending=False).head(top_n)

# Load the original dataset (assuming it's in the 'recommendation' sheet)
original_dataset = pd.read_excel(r'E:\recommendation.xlsx', sheet_name='recommendation')

# Initialize an empty DataFrame to store records of the top N recommended zones
recommended_records = pd.DataFrame()

# Iterate through the top N recommended zones
for zone, _ in top_recommendations.items():
    # Filter records for the current zone
    zone_records = original_dataset[original_dataset['AgroClimateZone'] == zone]
    
    # Append the zone_records to the recommended_records DataFrame
    recommended_records = pd.concat([recommended_records, zone_records], ignore_index=True)

# Sort the recommended records by 'AverageYield' in descending order
recommended_records = recommended_records.sort_values(by='AverageYield', ascending=False)

# Ask the user to enter a bid price
bidprice = float(input("Enter your bid price: "))

# Filter records where askprice is less than or equal to bidprice
filtered_records = recommended_records[recommended_records['askprice'] <= bidprice]

# Sort by 'askprice' ascending, breaking ties by 'AverageYield' descending
sorted_records = filtered_records.sort_values(by=['askprice', 'AverageYield'], ascending=[True, False])

# Ask the user how many records they want to see
num_records_to_display = int(input("Enter the number of records to display: "))

# Display the desired number of records
if num_records_to_display <= len(sorted_records):
    top_records = sorted_records.reset_index(drop=True).head(num_records_to_display)
    print(f"Top {num_records_to_display} Records for '{target_zone}' based on Yield and Ask Price (<= {bidprice}):")
    print(top_records)
else:
    print("Invalid number of records to display.")


Interpretation of the Heatmap:
- The heatmap helps in quickly identifying which agroclimate zones are dissimilar to each other.
- Zone pairs with higher dissimilarity values (warmer colors) differ more in climate, and therefore in farming conditions.


In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Your dissimilarity matrix
dissimilarity_matrix = [
    [0.000000, 4.899886, 6.995954, 6.410329, 6.464020, 6.348189, 5.995814, 5.982116, 6.941320, 7.539730, 7.623363, 7.163017, 6.809403, 7.059275],
    [4.899886, 0.000000, 4.126332, 4.809205, 5.522054, 6.091541, 3.753073, 4.812604, 5.332584, 6.169238, 5.245247, 3.319520, 5.242065, 6.674284],
    [6.995954, 4.126332, 0.000000, 2.003805, 2.942979, 3.862565, 1.521787, 2.355930, 2.646746, 3.484249, 2.568880, 4.258436, 2.397933, 3.813717],
    [6.410329, 4.809205, 2.003805, 0.000000, 1.027308, 2.084376, 1.954008, 1.226281, 2.691307, 4.098637, 3.970358, 5.697032, 2.159548, 2.134476],
    [6.464020, 5.522054, 2.942979, 1.027308, 0.000000, 1.121012, 2.878831, 1.767628, 3.351841, 4.597447, 4.618695, 6.614956, 2.826156, 1.487923],
    [6.348189, 6.091541, 3.862565, 2.084376, 1.121012, 0.000000, 3.790949, 2.642658, 4.225918, 5.160587, 5.207088, 7.413737, 3.746555, 1.662068],
    [5.995814, 3.753073, 1.521787, 1.954008, 2.878831, 3.790949, 0.000000, 1.516726, 1.783612, 3.235368, 3.139302, 4.008638, 1.630482, 3.570693],
    [5.982116, 4.812604, 2.355930, 1.226281, 1.767628, 2.642658, 1.516726, 0.000000, 1.811445, 3.468887, 3.914007, 5.452451, 1.310691, 2.167107],
    [6.941320, 5.332584, 2.646746, 2.691307, 3.351841, 4.225918, 1.783612, 1.811445, 0.000000, 2.463975, 3.450073, 4.862271, 0.966578, 3.289849],
    [7.539730, 6.169238, 3.484249, 4.098637, 4.597447, 5.160587, 3.235368, 3.468887, 2.463975, 0.000000, 2.421785, 5.306395, 2.863766, 4.345274],
    [7.623363, 5.245247, 2.568880, 3.970358, 4.618695, 5.207088, 3.139302, 3.914007, 3.450073, 2.421785, 0.000000, 4.350234, 3.663523, 4.991659],
    [7.163017, 3.319520, 4.258436, 5.697032, 6.614956, 7.413737, 4.008638, 5.452451, 4.862271, 5.306395, 4.350234, 0.000000, 5.254454, 7.414738],
    [6.809403, 5.242065, 2.397933, 2.159548, 2.826156, 3.746555, 1.630482, 1.310691, 0.966578, 2.863766, 3.663523, 5.254454, 0.000000, 2.854312],
    [7.059275, 6.674284, 3.813717, 2.134476, 1.487923, 1.662068, 3.570693, 2.167107, 3.289849, 4.345274, 4.991659, 7.414738, 2.854312, 0.000000]
]

# Create a heatmap
plt.figure(figsize=(10, 8))
plt.imshow(dissimilarity_matrix, cmap='coolwarm', interpolation='nearest')
plt.colorbar(label='Dissimilarity')
plt.title('Dissimilarity Matrix Heatmap')
plt.xticks(range(len(dissimilarity_matrix)), range(1, len(dissimilarity_matrix) + 1))
plt.yticks(range(len(dissimilarity_matrix)), range(1, len(dissimilarity_matrix) + 1))
plt.show()


# Why is SeedShare's Recommendation Model Novel?

1. Agroclimate Zone Approach: SeedShare's model uses agroclimate zones to tailor recommendations. This approach takes regional rainfall and temperature data into account so that recommendations reflect the climate of each region.

2. Dissimilarity Matrix: The model's use of a dissimilarity matrix is a novel way to identify suitable farming alternatives. It ensures that recommended farms are in zones climatically dissimilar to the farmer's current location, enhancing risk mitigation.

3. Customization: SeedShare's model goes beyond generic recommendations. It offers farmers specific guidance on crops and strategies that align with their individual needs and preferences.
