# KropKart AI: Crop Quality & Price Prediction Model

This notebook contains the complete code for training two machine learning models that:
1. Score **crop quality** from a natural language description of the listing.
2. Predict **market prices** from the crop type and its quality score.

### Tech Stack:
- **Scikit-Learn**: For the ML Pipeline
- **Pandas/NumPy**: For data manipulation
- **Joblib**: For model serialization

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_absolute_error
import joblib

print("✅ Libraries imported successfully!")

## 1. Data Generation (Synthetic Dataset)
We build a synthetic dataset from 10 hand-written listings, each repeated 50 times (500 rows), covering a range of crops, quality scores, and prices.

In [None]:
data = {
    'description': [
        'Premium organic Basmati rice, aged for 2 years, Grade A quality',
        'Standard wheat, slightly high moisture, harvested last week',
        'Pure organic cotton, long staple length, export quality',
        'Fresh red onions, medium size, standard market grade',
        'Alphonso mangoes, organic certified, premium sweetness',
        'Yellow maize for poultry feed, standard quality',
        'Super fine Sharbati wheat, premium grade, no impurities',
        'Organic turmeric fingers, high curcumin content',
        'Potatoes, regular size, some soil attached, fair quality',
        'Green moong dal, organic, high protein, polished grade'
    ] * 50, 
    'crop_type': ['Rice', 'Wheat', 'Cotton', 'Onion', 'Mango', 'Maize', 'Wheat', 'Turmeric', 'Potato', 'Pulse'] * 50,
    'quality_score': [0.95, 0.65, 0.90, 0.70, 0.98, 0.60, 0.92, 0.88, 0.55, 0.85] * 50,
    'price_per_quintal': [6500, 2200, 7500, 1800, 12000, 2100, 2600, 8500, 1200, 7200] * 50
}

df = pd.DataFrame(data)
# Add ±10% uniform noise so each price is not an exact repeat of its template row;
# the generator is seeded so the notebook is reproducible
rng = np.random.default_rng(42)
df['price_per_quintal'] = df['price_per_quintal'] * (1 + rng.uniform(-0.1, 0.1, len(df)))
df.head()

## 2. Model 1: AI Quality Analysis (NLP)
This pipeline turns each description into a numeric vector with **TF-IDF** and fits a **Random Forest** regressor to predict the quality score.

In [None]:
X_text = df['description']
y_quality = df['quality_score']

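# Note: every description appears 50 times, so the test split contains texts
# already seen in training and the MAE reported below is optimistic.
X_train, X_test, y_train, y_test = train_test_split(X_text, y_quality, test_size=0.2, random_state=42)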

quality_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(stop_words='english')),
    ('regressor', RandomForestRegressor(n_estimators=100, random_state=42))
])

quality_pipeline.fit(X_train, y_train)
score_preds = quality_pipeline.predict(X_test)

print(f"📊 Quality Prediction Error (MAE): {mean_absolute_error(y_test, score_preds):.4f}")
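
Because the dataset repeats the same 10 descriptions, a random row split puts identical texts in both train and test. One possible way to measure performance on unseen wording is to split by description instead. The sketch below uses scikit-learn's `GroupShuffleSplit` on a small stand-in series; the placeholder texts and variable names are assumptions for illustration, not the notebook's data.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Stand-in for df['description']: 5 unique texts, each repeated 4 times
descriptions = pd.Series(
    ["premium rice", "standard wheat", "organic cotton", "red onions", "fair potatoes"] * 4
)

# Use the description itself as the group key so that all copies of a text
# land on the same side of the split
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(descriptions, groups=descriptions))

train_texts = set(descriptions.iloc[train_idx])
test_texts = set(descriptions.iloc[test_idx])
overlap = train_texts & test_texts
print(f"Texts shared between train and test: {len(overlap)}")
```

With only 10 unique listings, such a split leaves very few test texts, so the resulting MAE would be noisy; it is still a more honest signal than scoring on duplicates.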

## 3. Model 2: Price Estimation Model
This model predicts a fair market price from the crop category (one-hot encoded) and the quality score.
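
As a small illustration of the one-hot encoding step, the sketch below applies `pd.get_dummies` to a three-row toy frame. The frame `toy` is hypothetical; the call itself is the same one used on the full dataset.

```python
import pandas as pd

# Toy frame with the same two feature columns the price model uses
toy = pd.DataFrame({
    "crop_type": ["Rice", "Wheat", "Rice"],
    "quality_score": [0.95, 0.65, 0.90],
})

# Each distinct crop_type value becomes its own indicator column
encoded = pd.get_dummies(toy, columns=["crop_type"])
print(encoded.columns.tolist())
# ['quality_score', 'crop_type_Rice', 'crop_type_Wheat']
```

The model therefore sees one indicator column per crop plus the numeric quality score.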

In [None]:
# One-Hot Encoding for Crop Categories
df_price = pd.get_dummies(df, columns=['crop_type'])
X_p = df_price.drop(['description', 'price_per_quintal'], axis=1)
y_p = df_price['price_per_quintal']

Xp_train, Xp_test, yp_train, yp_test = train_test_split(X_p, y_p, test_size=0.2, random_state=42)

price_model = RandomForestRegressor(n_estimators=200, random_state=42)
price_model.fit(Xp_train, yp_train)

price_preds = price_model.predict(Xp_test)
print(f"💰 Price Prediction Error: ₹{mean_absolute_error(yp_test, price_preds):.2f}")

## 4. Saving for Production
Save the trained models as `.pkl` files.
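
For reference, a serving process would read such a file back with `joblib.load`. The sketch below shows the dump and load round trip on a tiny stand-in model written to a temporary directory; the file name `demo_model.pkl` and the toy data are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Tiny stand-in model so the example runs on its own
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(model, path)       # serialize the fitted estimator
    restored = joblib.load(path)   # load it back as a new object

# The restored model should reproduce the original predictions
same_predictions = np.allclose(model.predict(X), restored.predict(X))
print(same_predictions)
```

Pickled scikit-learn models are generally expected to be loaded with the same library versions that saved them, and only from trusted sources.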

In [None]:
joblib.dump(quality_pipeline, 'ai_quality_model.pkl')
joblib.dump(price_model, 'ai_price_model.pkl')
print("💾 Models saved: ai_quality_model.pkl, ai_price_model.pkl")

## 5. Deployment Simulation
Test how the system analyzes a completely new listing.

In [None]:
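def analyze_listing(desc):
    """Score a free-text listing and map the score to a grade label."""
    predicted_score = quality_pipeline.predict([desc])[0]

    # Grade thresholds on the predicted quality score (0 to 1)
    if predicted_score > 0.85:
        grade = "Premium (Grade A+)"
    elif predicted_score > 0.65:
        grade = "Standard (Grade A)"
    else:
        grade = "Fair (Grade B)"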
    
    return {
        "Input Description": desc,
        "AI Quality Score": f"{predicted_score*100:.2f}%",
        "Calculated Grade": grade
    }

new_crop = "Highly pure organic Sharbati wheat, no chemical fertilizers used, very low moisture"
print(analyze_listing(new_crop))
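
The simulation above grades quality only. A possible next step is to feed a quality score and a crop type into the price model. The sketch below shows one way to build a single feature row whose columns match the training layout; `build_price_features`, the toy training frame, and the fixed score of 0.90 are hypothetical and stand in for the notebook's `price_model` and a predicted score.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Toy training data; values copied from the synthetic dataset for illustration
train = pd.DataFrame({
    "crop_type": ["Rice", "Wheat", "Wheat", "Potato"],
    "quality_score": [0.95, 0.65, 0.92, 0.55],
    "price_per_quintal": [6500, 2200, 2600, 1200],
})
X_train = pd.get_dummies(train[["crop_type", "quality_score"]], columns=["crop_type"])
feature_columns = X_train.columns  # remember the training column layout

model = RandomForestRegressor(n_estimators=50, random_state=42)
model.fit(X_train, train["price_per_quintal"])


def build_price_features(crop_type, quality_score, feature_columns):
    """Encode one listing so its columns match those seen during training."""
    row = pd.DataFrame({"crop_type": [crop_type], "quality_score": [quality_score]})
    encoded = pd.get_dummies(row, columns=["crop_type"])
    # Crop columns absent from this row are filled with 0, columns unknown to
    # the model are dropped, and the training column order is restored
    return encoded.reindex(columns=feature_columns, fill_value=0)


features = build_price_features("Wheat", 0.90, feature_columns)
estimated_price = model.predict(features)[0]
print(features.columns.tolist())
print(f"Estimated price per quintal: {estimated_price:.2f}")
```

The `reindex` call matters because `pd.get_dummies` on a single row only creates the column for that row's crop, while the model expects every crop column it saw during training.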