# QCTO - Workplace Module

### Project Title: Regional Vegetable Price Analysis: Trends and Seasonality
#### Done By: Abel Masotla

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>


**Purpose:**  
This project aims to analyze vegetable prices across different regions using data from an authorized source ([Agmarknet](https://agmarknet.gov.in/)). The main objective is to explore pricing trends, identify patterns, and understand the factors influencing price fluctuations over time.

**Problem Domain:**  
Vegetable prices are highly variable, influenced by several factors such as regional availability, seasonality, weather conditions, and market demand. These fluctuations can impact both consumers and producers, making it crucial to analyze the trends to predict future price changes and minimize uncertainties.

The project seeks to address the following key questions:
- What are the **average price trends** for specific vegetables across different regions over time?
- Are there any **patterns of price volatility** or sudden changes that occur frequently?
- How does **seasonality** influence the prices of various vegetables, and can it be predicted?
- What role do **regional factors** play in pricing differences across different geographical areas?

**Significance:**  
Understanding vegetable pricing trends can offer insights to:
- **Consumers**, by helping them plan their purchases based on expected price changes.
- **Farmers**, by enabling them to optimize their planting and selling strategies to maximize profits.
- **Policymakers**, by providing data to stabilize prices and avoid extreme volatility in the market.


---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

In [4]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>

**Purpose:**  
The dataset used in this project is sourced from Kaggle, which cites [Agmarknet](https://agmarknet.gov.in/) as the original source of the vegetable price data. It records prices for ten vegetables in India on 287 dates; the first records are dated January 2023.

**Data Characteristics:**
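- **Size:** 287 rows and 11 columns; each row holds the prices recorded on one date.
- **Scope:** Prices of ten vegetables (Bhindi, Tomato, Onion, Potato, Brinjal, Garlic, Peas, Methi, Green Chilli, and Elephant Yam). The loaded file has no region column.
- **Data Types:** 
  - Numerical (prices): three `float64` columns and seven `int64` columns
  - Temporal (dates): `Price Dates`, stored as text in day-month-year form until converted
  - Categorical: none as columns; the vegetable names appear as column headers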
---
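
The characteristics listed above can be checked directly once the data is in a DataFrame. The sketch below is illustrative only: it rebuilds a small excerpt (four rows, three vegetables) from the first rows shown in the Loading Data section instead of reading `prices.csv`, and `describe_columns` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Excerpt of the first four rows (three of the ten vegetable columns).
sample = pd.DataFrame(
    {
        "Price Dates": ["01-01-2023", "02-01-2023", "03-01-2023", "04-01-2023"],
        "Tomato": [18, 16, 16, 16],
        "Onion": [22.0, 22.0, 21.0, 21.0],
        "Potato": [20, 20, 20, 22],
    }
)


def describe_columns(frame):
    """Return dtype, missing-value count and distinct-value count per column."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "missing": frame.isna().sum(),
            "distinct": frame.nunique(),
        }
    )


summary = describe_columns(sample)
print(sample.shape)  # (rows, columns)
print(summary)
```

Run against the full DataFrame, the same helper gives a quick check that the size and data types match the description above.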


---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

In [13]:
# Load the CSV file
df = pd.read_csv("prices.csv")

# Display the first ten rows
df.head(10)


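|   | Price Dates | Bhindi (Ladies finger) | Tomato | Onion | Potato | Brinjal | Garlic | Peas | Methi | Green Chilli | Elephant Yam (Suran) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01-01-2023 | 35.0 | 18 | 22.0 | 20 | 30 | 50 | 25 | 8 | 45.0 | 25 |
| 1 | 02-01-2023 | 35.0 | 16 | 22.0 | 20 | 30 | 55 | 25 | 7 | 40.0 | 25 |
| 2 | 03-01-2023 | 35.0 | 16 | 21.0 | 20 | 30 | 55 | 25 | 7 | 40.0 | 25 |
| 3 | 04-01-2023 | 30.0 | 16 | 21.0 | 22 | 25 | 55 | 25 | 7 | 40.0 | 25 |
| 4 | 08-01-2023 | 35.0 | 16 | 20.0 | 21 | 25 | 55 | 22 | 6 | 35.0 | 25 |
| 5 | 11-01-2023 | 35.0 | 16 | 18.0 | 24 | 25 | 55 | 23 | 6 | 35.0 | 30 |
| 6 | 12-01-2023 | 40.0 | 16 | 18.0 | 22 | 30 | 65 | 23 | 6 | 35.0 | 30 |
| 7 | 15-01-2023 | 42.0 | 16 | 17.0 | 22 | 25 | 65 | 23 | 7 | 35.0 | 30 |
| 8 | 17-01-2023 | 35.0 | 16 | 18.0 | 22 | 30 | 65 | 22 | 10 | 40.0 | 35 |
| 9 | 22-01-2023 | 45.0 | 16 | 18.0 | 22 | 40 | 65 | 25 | 9 | 40.0 | 35 |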


In [12]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Price Dates             287 non-null    object 
 1   Bhindi (Ladies finger)  287 non-null    float64
 2   Tomato                  287 non-null    int64  
 3   Onion                   287 non-null    float64
 4   Potato                  287 non-null    int64  
 5   Brinjal                 287 non-null    int64  
 6   Garlic                  287 non-null    int64  
 7   Peas                    287 non-null    int64  
 8   Methi                   287 non-null    int64  
 9   Green Chilli            287 non-null    float64
 10  Elephant Yam (Suran)    287 non-null    int64  
dtypes: float64(3), int64(7), object(1)
memory usage: 24.8+ KB


---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---

In [15]:
# Convert 'Price Dates' from text to datetime (day-month-year)
df['Price Dates'] = pd.to_datetime(df['Price Dates'], format='%d-%m-%Y')

# Make sure every price column is numeric; unparseable entries become NaN
numeric_columns = ['Bhindi (Ladies finger)', 'Tomato', 'Onion', 'Potato', 'Brinjal', 
                   'Garlic', 'Peas', 'Methi', 'Green Chilli', 'Elephant Yam (Suran)']
df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce')

# Handle missing values
print("Missing values before cleaning:")
print(df.isnull().sum())
df = df.dropna()  # Drop rows with missing values

# Handle outliers using the IQR method, one price column at a time.
# Note: each column's bounds are computed on the rows that survived the
# previous columns' filters, so the result depends on the column order.
for column in numeric_columns:
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    df = df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]

# Reset index after filtering
df = df.reset_index(drop=True)

# Summary of cleaned data (df.info() prints directly, so it is not wrapped in print)
df.info()
print(df.describe())
print("Data after cleaning:")

Missing values before cleaning:
Price Dates               0
Bhindi (Ladies finger)    0
Tomato                    0
Onion                     0
Potato                    0
Brinjal                   0
Garlic                    0
Peas                      0
Methi                     0
Green Chilli              0
Elephant Yam (Suran)      0
dtype: int64
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype         
---  ------                  --------------  -----         
 0   Price Dates             169 non-null    datetime64[ns]
 1   Bhindi (Ladies finger)  169 non-null    float64       
 2   Tomato                  169 non-null    int64         
 3   Onion                   169 non-null    float64       
 4   Potato                  169 non-null    int64         
 5   Brinjal                 169 non-null    int64         
 6   Garlic                  169 non-null    int64         
 7
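
The loop above filters one column at a time, so each column's quartiles are computed on an already-reduced DataFrame; the outcome depends on column order and can remove more rows than a single pass would (here the data shrinks from 287 to 169 rows). One alternative, sketched below as an assumption and not as the method used in this notebook, computes every column's fences once from the unfiltered data and drops a row only if any column falls outside them. `iqr_inlier_mask` is a hypothetical helper, and the toy values are made up for illustration.

```python
import pandas as pd


def iqr_inlier_mask(frame, columns, k=1.5):
    """True for rows where every listed column lies within its IQR fences.

    The fences are computed once, from the unfiltered frame.
    """
    q1 = frame[columns].quantile(0.25)
    q3 = frame[columns].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    within = (
        frame[columns].ge(lower, axis="columns")
        & frame[columns].le(upper, axis="columns")
    )
    return within.all(axis=1)


# Made-up values for illustration only (not taken from prices.csv).
toy = pd.DataFrame(
    {
        "a": [10, 11, 12, 13, 100],
        "b": [5, 5, 6, 6, 5],
    }
)

mask = iqr_inlier_mask(toy, ["a", "b"])
cleaned = toy[mask].reset_index(drop=True)
print(mask.tolist())
print(cleaned)
```

Comparing the row count from this single-pass mask with the sequential loop shows how many rows are lost purely because of the filtering order.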

---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
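
As a starting point for this section, the sketch below computes three simple summaries: the average price per calendar month, the pairwise correlation between vegetables, and the spread of observation-to-observation percentage changes. It uses only the ten January rows of three columns shown in the Loading Data section, so the monthly table has a single row; on the full data it would have one row per month. The choice of columns and of these three summaries is an assumption, not a finding of the project.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Ten rows of three columns, copied from the table in the Loading Data section.
dates = pd.to_datetime(
    [
        "01-01-2023", "02-01-2023", "03-01-2023", "04-01-2023", "08-01-2023",
        "11-01-2023", "12-01-2023", "15-01-2023", "17-01-2023", "22-01-2023",
    ],
    format="%d-%m-%Y",
)
excerpt = pd.DataFrame(
    {
        "Onion": [22.0, 22.0, 21.0, 21.0, 20.0, 18.0, 18.0, 17.0, 18.0, 18.0],
        "Garlic": [50, 55, 55, 55, 55, 55, 65, 65, 65, 65],
        "Peas": [25, 25, 25, 25, 22, 23, 23, 23, 22, 25],
    },
    index=dates,
)
excerpt.index.name = "Price Dates"

# Average price per calendar month (a first look at seasonality).
monthly = excerpt.groupby(excerpt.index.to_period("M")).mean()

# Pairwise Pearson correlation between the vegetables.
corr = excerpt.corr()

# Standard deviation of percentage changes (a simple volatility measure).
volatility = excerpt.pct_change().std()

print(monthly)
print(corr.round(2))
print(volatility.round(3))

# Line plot of the raw prices over time.
fig, ax = plt.subplots(figsize=(8, 4))
excerpt.plot(ax=ax, marker="o")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.set_title("Vegetable prices (ten-row excerpt)")
fig.tight_layout()
```

Because the observations are unevenly spaced, the percentage changes above are per observation, not per day.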



---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---
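
No model has been chosen yet, so the following is only a baseline sketch under stated assumptions: it fits a straight-line trend to the ten Onion prices shown in the Loading Data section, using NumPy's `polyfit` for an ordinary least-squares fit (scikit-learn's `LinearRegression` would produce the same line). Ten rows are far too few for a real model, and a linear trend ignores seasonality.

```python
import numpy as np
import pandas as pd

# Dates and Onion prices copied from the first ten rows of the dataset.
dates = pd.to_datetime(
    [
        "01-01-2023", "02-01-2023", "03-01-2023", "04-01-2023", "08-01-2023",
        "11-01-2023", "12-01-2023", "15-01-2023", "17-01-2023", "22-01-2023",
    ],
    format="%d-%m-%Y",
)
onion = np.array([22.0, 22.0, 21.0, 21.0, 20.0, 18.0, 18.0, 17.0, 18.0, 18.0])

# Feature: days elapsed since the first observation.
days = np.asarray((dates - dates[0]).days, dtype=float)

# Least-squares straight line: price ~ slope * days + intercept.
slope, intercept = np.polyfit(days, onion, deg=1)

# Extrapolate one day past the last observation.
next_day = days[-1] + 1
forecast = slope * next_day + intercept

print(f"estimated change per day: {slope:.3f}")
print(f"forecast for day {int(next_day)}: {forecast:.2f}")
```

A baseline like this is mainly useful as a reference point: any seasonal or regional model considered later should at least beat it on held-out data.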



---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
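* **Details:** Present the metrics used to evaluate the models. Accuracy, precision, recall, and F1-score apply to classification; for a regression task such as price prediction, error metrics such as MAE, RMSE, or R² are the usual choice. Discuss the validation techniques employed, such as cross-validation or a train/test split.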
---
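
Because price prediction is a regression task on time-ordered data, a reasonable starting point is a chronological train/test split scored with error metrics. The sketch below is an assumption about how that could look, not the project's evaluation: it holds out the last three of the ten Onion prices shown in the Loading Data section and scores a naive "repeat the last known price" forecast. The 7/3 split and the helper names `mae` and `rmse` are illustrative choices.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - predicted)))


def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Onion prices from the first ten rows of the dataset, in date order.
onion = np.array([22.0, 22.0, 21.0, 21.0, 20.0, 18.0, 18.0, 17.0, 18.0, 18.0])

# Chronological split: earlier rows train, later rows test (no shuffling).
split = 7
train, test = onion[:split], onion[split:]

# Naive baseline: predict the last training price for every test point.
naive_pred = np.full(test.shape, train[-1])

print(f"MAE:  {mae(test, naive_pred):.3f}")
print(f"RMSE: {rmse(test, naive_pred):.3f}")
```

Shuffled splits would let the model see prices from after the dates it is asked to predict, which is why the split here keeps the rows in time order.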


---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---



---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---



---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---


## Additional Sections to Consider

* ### Appendix: 
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Contributors: 
If this is a group project, list the contributors and their roles or contributions to the project.
