In [1]:
# Import necessary library
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [10, 20, 30, 40, 50],
    'Yield': [100, 150, 200, 250, 300]
})

# Display the DataFrame
print("Original DataFrame:")
print(df)

# Separate features (X) and target (y)
X = df.drop('Yield', axis=1)
y = df['Yield']

# Display the results
print("\nFeatures (X):")
print(X)

print("\nTarget (y):")
print(y)


Original DataFrame:
   Feature1  Feature2  Yield
0         1        10    100
1         2        20    150
2         3        30    200
3         4        40    250
4         5        50    300

Features (X):
   Feature1  Feature2
0         1        10
1         2        20
2         3        30
3         4        40
4         5        50

Target (y):
0    100
1    150
2    200
3    250
4    300
Name: Yield, dtype: int64


# SDG-Focused Machine Learning Project

---

## 1. SDG & Problem Definition
- **Selected SDG:** SDG 2 – Zero Hunger  
- **Specific Problem:** Predict crop yields based on environmental or agricultural features to support food security and efficient farming practices.

---

## 2. ML Approach
- **Type:** Supervised Learning  
- **Algorithm(s):** Linear Regression as the initial baseline, with decision trees or ensemble methods as candidates for later comparison.

---

## 3. Dataset & Tools
- **Dataset Source:** Simulated dataset representing agricultural features and yield output. (In a full project, this could be replaced with real-world data from FAO or Kaggle.)
- **Key Features:**  
  - Feature1: e.g., Soil Quality Index  
  - Feature2: e.g., Rainfall or Fertilizer Usage  
  - Yield: Target variable representing crop output
- **Tools & Libraries:**  
  - Python  
  - Jupyter Notebook  
  - Pandas, NumPy  
  - Scikit-learn for ML modeling  
  - Matplotlib / Seaborn for visualization  

---

## 4. Model Development

### 4.1 Data Preprocessing
- Checked for missing values (none in sample)  
- Ensured correct data types  
- Split dataset into training and test sets  
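- No normalization applied: `Feature1` (1 to 5) and `Feature2` (10 to 50) sit on different scales, but ordinary least-squares predictions do not depend on feature scale. Scaling would matter for regularized or distance-based models.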
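
The preprocessing steps above can be sketched as follows. This is a minimal example on the notebook's five-row sample; the `test_size=0.2` and `random_state=42` values are assumptions chosen for illustration, not settings stated in the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Same sample data as the notebook cell at the top
df = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [10, 20, 30, 40, 50],
    'Yield': [100, 150, 200, 250, 300]
})

# Count missing values per column
missing_counts = df.isna().sum()
print(missing_counts)

# Confirm that every column has a numeric dtype
print(df.dtypes)

# Separate features and target, then hold out part of the data for testing.
# test_size and random_state are assumed values for this sketch.
X = df.drop('Yield', axis=1)
y = df['Yield']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

With only five rows, a 20% split leaves a single test row, so any test metric from this sample is indicative at best.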

### 4.2 Model Training
- Trained a regression model to predict `Yield` based on `Feature1` and `Feature2`  
- Evaluated basic linear regression and considered more complex models for comparison
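
A minimal sketch of the baseline training step, assuming scikit-learn's `LinearRegression` and the same assumed split parameters (`test_size=0.2`, `random_state=42`) as a plausible setup rather than the project's confirmed configuration:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Same sample data as the notebook cell at the top
df = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [10, 20, 30, 40, 50],
    'Yield': [100, 150, 200, 250, 300]
})
X = df.drop('Yield', axis=1)
y = df['Yield']

# Assumed split parameters, for illustration only
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the baseline model on the training rows
model = LinearRegression()
model.fit(X_train, y_train)

# Feature2 is exactly 10 * Feature1 in this sample, so the individual
# coefficients are not uniquely determined; only the predictions are.
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Predict yield for the held-out row
y_pred = model.predict(X_test)
print("Predicted:", y_pred, "Actual:", y_test.to_numpy())
```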

### 4.3 Evaluation & Visualization
- Metrics: Mean Absolute Error (MAE) and R² score  
- Scatter plot of actual vs predicted yield  
- Feature correlation heatmap  
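
The metrics and correlation matrix listed above could be computed along these lines. Because R² is undefined on a one-row test set, this sketch scores the model in-sample on all five rows; that is an assumption made for illustration and is not a substitute for held-out evaluation.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Same sample data as the notebook cell at the top
df = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [10, 20, 30, 40, 50],
    'Yield': [100, 150, 200, 250, 300]
})
X = df.drop('Yield', axis=1)
y = df['Yield']

# Fit and score on the same rows (in-sample, illustration only)
model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)
print(f"MAE: {mae:.4f}")
print(f"R^2: {r2:.4f}")

# Pairwise Pearson correlations; this matrix is the input to a heatmap
corr = df.corr()
print(corr)
```

For the plots, `y` and `y_pred` can be passed to `matplotlib.pyplot.scatter`, and `corr` to `seaborn.heatmap`.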

---

## 5. Ethical Reflection
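- **Bias & Data Limitations:** Because the dataset is synthetic and has only five rows, it lacks the diversity and variability of real-world conditions (e.g., soil type, climate zones). In the sample, `Feature2` is exactly ten times `Feature1` and `Yield` is an exact linear function of `Feature1`, so the features are perfectly collinear and any reported fit will look unrealistically good. Real data may reflect biases such as over-representation of commercial farms or specific regions.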
- **Fairness & Sustainability:** A fair model can support smallholder farmers by providing reliable yield predictions, which can influence food policy, resource allocation, and climate resilience planning. Care must be taken to ensure equitable access to technology and data-driven tools.

---

## Deliverables
1. **Notebook:** Contains code with comments for data loading, preprocessing, modeling, and evaluation.  
2. **Report:**  
   - SDG: Zero Hunger  
   - ML Approach: Linear regression to predict crop yield  
   - Results: Model performance summary and ethical implications  
3. **Presentation:**  
   - Overview of SDG 2 and problem  
   - Demo of the model and predictions  
   - Discussion on model impact and next steps  

---

## Stretch Goals
- Use real agricultural datasets (e.g., World Bank or FAO data)  
- Add real-time weather data via API for dynamic yield forecasting  
- Deploy model via Streamlit for farmer-friendly use  
- Compare different regression models (e.g., Random Forest vs. Linear)
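
One possible way to approach the model comparison is leave-one-out cross-validation, which suits very small samples. The sketch below is an assumption about how the comparison might be set up; the `n_estimators=100` and `random_state=0` values are illustrative, and results on five synthetic rows say little about real agricultural data.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Same sample data as the notebook cell at the top
df = pd.DataFrame({
    'Feature1': [1, 2, 3, 4, 5],
    'Feature2': [10, 20, 30, 40, 50],
    'Yield': [100, 150, 200, 250, 300]
})
X = df.drop('Yield', axis=1)
y = df['Yield']

# Candidate models; hyperparameters are assumed values for this sketch
models = {
    'Linear': LinearRegression(),
    'RandomForest': RandomForestRegressor(n_estimators=100, random_state=0),
}

# Leave-one-out CV: each row is held out once and predicted from the rest
cv_mae = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X, y, cv=LeaveOneOut(), scoring='neg_mean_absolute_error'
    )
    cv_mae[name] = -scores.mean()  # flip sign: scorer returns negative MAE
    print(f"{name}: cross-validated MAE = {cv_mae[name]:.2f}")
```

On this exactly linear sample the linear model is favored by construction, since tree-based models cannot extrapolate beyond the target values seen in training.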
