# QCTO - Workplace Module

### Project Title: VEG-PRICES
#### Done By: SIPHOSETHU RULULU

© ExploreAI 2024

---

## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

**INTRODUCTION**


The global agricultural sector is characterized by constant fluctuations in prices, influenced by various factors such as weather conditions, supply and demand dynamics, economic policies, and consumer preferences. Vegetable prices, in particular, are sensitive to these factors, often exhibiting significant volatility over short periods. In this report, we delve into the analysis of vegetable prices specifically for the years 2023 and 2024, aiming to shed light on the intricate dynamics of the vegetable market during this timeframe. By closely examining price data and identifying trends, fluctuations, and potential driving factors, this study seeks to provide stakeholders with valuable insights into short-term price dynamics and their implications. Understanding the factors influencing vegetable prices in 2023 and 2024 is crucial for stakeholders across the supply chain, including producers, distributors, retailers, policymakers, and consumers, as it enables informed decision-making and strategic planning in response to market conditions. Moreover, by focusing on a limited timeframe, this analysis allows for a more targeted exploration of short-term trends and provides a foundation for further research into the broader trends shaping the vegetable market in the years to come.

Problem Statement:

The agricultural sector, particularly vegetable prices, experiences notable fluctuations influenced by various factors like weather conditions, supply-demand dynamics, and economic policies. Understanding these dynamics is essential for stakeholders to make informed decisions. In this study, we aim to analyze vegetable prices in 2023 and 2024 to uncover trends and factors driving short-term price variations. By examining price data and identifying key influencers, we seek to provide valuable insights for stakeholders across the supply chain, enabling them to adapt strategies and make informed decisions amidst market uncertainties. This analysis not only offers a focused exploration of short-term trends but also serves as a foundation for broader market trend research in the future.

One of the most important steps is importing all the required packages up front to avoid errors later, and making sure that the dataset is loaded and assigned to a variable correctly.

In [1]:
import pandas as pd # importing the Pandas package with an alias, pd
import numpy as np # importing the Numpy package with an alias, np
import matplotlib.pyplot as plt # importing the Matplotlib.pyplot package with an alias, plt
import seaborn as sns # importing the Seaborn package with an alias, sns
import warnings
price_df = pd.read_csv("prices.csv")

Next, we display the DataFrame to confirm that it loaded correctly.

In [2]:
price_df

Unnamed: 0,Price Dates,Bhindi (Ladies finger),Tomato,Onion,Potato,Brinjal,Garlic,Peas,Methi,Green Chilli,Elephant Yam (Suran)
0,01-01-2023,35.0,18,22.0,20,30,50,25,8,45.0,25
1,02-01-2023,35.0,16,22.0,20,30,55,25,7,40.0,25
2,03-01-2023,35.0,16,21.0,20,30,55,25,7,40.0,25
3,04-01-2023,30.0,16,21.0,22,25,55,25,7,40.0,25
4,08-01-2023,35.0,16,20.0,21,25,55,22,6,35.0,25
...,...,...,...,...,...,...,...,...,...,...,...
282,27-12-2023,45.0,16,30.0,20,70,260,40,16,40.0,25
283,28-12-2023,45.0,16,30.0,20,70,260,30,20,45.0,25
284,29-12-2023,45.0,16,30.0,22,80,260,30,18,50.0,25
285,31-12-2023,45.0,16,26.0,20,60,250,40,16,50.0,40


In [3]:
price_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Price Dates             287 non-null    object 
 1   Bhindi (Ladies finger)  287 non-null    float64
 2   Tomato                  287 non-null    int64  
 3   Onion                   287 non-null    float64
 4   Potato                  287 non-null    int64  
 5   Brinjal                 287 non-null    int64  
 6   Garlic                  287 non-null    int64  
 7   Peas                    287 non-null    int64  
 8   Methi                   287 non-null    int64  
 9   Green Chilli            287 non-null    float64
 10  Elephant Yam (Suran)    287 non-null    int64  
dtypes: float64(3), int64(7), object(1)
memory usage: 24.8+ KB


# **Data Collection and Description**

Data clean-up, also known as data cleansing or data cleaning, is the process of identifying and correcting errors, inconsistencies, and inaccuracies in a dataset to improve its quality and reliability for analysis or other purposes. It involves several tasks aimed at ensuring that the data is accurate, complete, and consistent.

It is an essential step in the data analysis process as it helps improve the quality, reliability, and usability of the data, leading to more accurate and insightful analysis results. Below are some common tasks involved in data clean up:

* **Check for and handle missing values:** Identify missing values in the DataFrame, then drop the affected rows, fill them with a specific value, or use more advanced imputation techniques.
* **Check for and handle duplicates:** Identify duplicate rows in the DataFrame and, if any are found, drop them.
* **Data type conversion:** Ensure that the data types of the columns are appropriate for analysis.
* **Check data consistency:** Look for inconsistent or erroneous data entries and correct them manually if necessary.
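
The four tasks above can be sketched on a small, made-up DataFrame. The column names mirror the dataset, but the rows in `toy_df` are hypothetical and were chosen to contain one missing value and one duplicate row. Forward filling is only one possible way to handle a missing price and is an assumption here, not a recommendation for every dataset.

```python
import pandas as pd

# Hypothetical sample, not taken from prices.csv: one missing price, one duplicated row
toy_df = pd.DataFrame({
    "Price Dates": ["01-01-2023", "02-01-2023", "02-01-2023", "03-01-2023"],
    "Tomato": [18.0, 16.0, 16.0, None],
})

# 1. Check for and handle missing values (here: carry the last known price forward)
missing_before = int(toy_df["Tomato"].isnull().sum())
toy_df["Tomato"] = toy_df["Tomato"].ffill()

# 2. Check for and handle duplicates
duplicates_before = int(toy_df.duplicated().sum())
toy_df = toy_df.drop_duplicates().reset_index(drop=True)

# 3. Data type conversion: parse the day-month-year strings into datetimes
toy_df["Price Dates"] = pd.to_datetime(toy_df["Price Dates"], format="%d-%m-%Y")

# 4. Consistency check: prices should never be negative
all_prices_valid = bool((toy_df["Tomato"] >= 0).all())

print(missing_before, duplicates_before, len(toy_df), all_prices_valid)
```

On the real data the same checks are run against `price_df`, as in the cells that follow.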

In [4]:
#1.Check for missing inputs using the function isnull()
missing_values = price_df.isnull().sum()
print(missing_values)

Price Dates               0
Bhindi (Ladies finger)    0
Tomato                    0
Onion                     0
Potato                    0
Brinjal                   0
Garlic                    0
Peas                      0
Methi                     0
Green Chilli              0
Elephant Yam (Suran)      0
dtype: int64


In [5]:
#2.Check for duplicates using the function duplicated()
duplicates = price_df.duplicated().sum()
print("Number of duplicate rows:", duplicates)

Number of duplicate rows: 0


In [6]:
#3.Convert 'Price Dates' to datetime format
price_df['Price Dates'] = pd.to_datetime(price_df['Price Dates'], format = '%d-%m-%Y')
print(price_df.dtypes)

Price Dates               datetime64[ns]
Bhindi (Ladies finger)           float64
Tomato                             int64
Onion                            float64
Potato                             int64
Brinjal                            int64
Garlic                             int64
Peas                               int64
Methi                              int64
Green Chilli                     float64
Elephant Yam (Suran)               int64
dtype: object
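
Passing an explicit `format` matters because the dates are written day-first. A minimal sketch on two made-up strings (the variable names are illustrative) shows that `02-01-2023` is read as 2 January rather than 1 February:

```python
import pandas as pd

# Illustrative day-first date strings in the same layout as 'Price Dates'
sample_dates = pd.Series(["02-01-2023", "31-12-2023"])

# '%d-%m-%Y' tells pandas the order is day, month, year
parsed = pd.to_datetime(sample_dates, format="%d-%m-%Y")

print(parsed.dt.day.tolist())    # [2, 31]
print(parsed.dt.month.tolist())  # [1, 12]
```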


In [7]:
# Get the unique price_dates
unique_price_dates = price_df['Price Dates'].unique()

# Count the number of unique price_dates
num_unique_price_dates = len(unique_price_dates)
print(num_unique_price_dates)

287


In [8]:
# Create a new column with just the year
price_df['Year'] = price_df['Price Dates'].dt.year

In [9]:
# Create a new column with just the month
price_df['Month'] = price_df['Price Dates'].dt.month

In [10]:
price_df

Unnamed: 0,Price Dates,Bhindi (Ladies finger),Tomato,Onion,Potato,Brinjal,Garlic,Peas,Methi,Green Chilli,Elephant Yam (Suran),Year,Month
0,2023-01-01,35.0,18,22.0,20,30,50,25,8,45.0,25,2023,1
1,2023-01-02,35.0,16,22.0,20,30,55,25,7,40.0,25,2023,1
2,2023-01-03,35.0,16,21.0,20,30,55,25,7,40.0,25,2023,1
3,2023-01-04,30.0,16,21.0,22,25,55,25,7,40.0,25,2023,1
4,2023-01-08,35.0,16,20.0,21,25,55,22,6,35.0,25,2023,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
282,2023-12-27,45.0,16,30.0,20,70,260,40,16,40.0,25,2023,12
283,2023-12-28,45.0,16,30.0,20,70,260,30,20,45.0,25,2023,12
284,2023-12-29,45.0,16,30.0,22,80,260,30,18,50.0,25,2023,12
285,2023-12-31,45.0,16,26.0,20,60,250,40,16,50.0,40,2023,12


# **Loading Data**

In [19]:
# Importing necessary libraries
import pandas as pd  # For data manipulation

# Loading data from a CSV file
# Replace the path below with the location of your own CSV file:
# pd.read_csv(r"C:\Users\Siphosethu\Downloads\veg_prices.csv")





# **Data Cleaning and Filtering**

In [12]:
unique_years = price_df['Year'].unique()

# Count the number of unique years
num_unique_years = len(unique_years)
print(num_unique_years)
print(unique_years)

2
[2023 2024]


In [13]:
unique_months = price_df['Month'].unique()

# Count the number of unique months
num_unique_months = len(unique_months)
print(num_unique_months)
print(unique_months)

12
[ 1  2  3  4  5  6  7  8  9 10 11 12]


In [14]:
import calendar
# Convert month numbers to month names
price_df['Month Names'] = price_df['Month'].apply(lambda x: calendar.month_name[x])

# Print the DataFrame to see the updated column
print(price_df)

    Price Dates  Bhindi (Ladies finger)  Tomato  Onion  Potato  Brinjal  \
0    2023-01-01                    35.0      18   22.0      20       30   
1    2023-01-02                    35.0      16   22.0      20       30   
2    2023-01-03                    35.0      16   21.0      20       30   
3    2023-01-04                    30.0      16   21.0      22       25   
4    2023-01-08                    35.0      16   20.0      21       25   
..          ...                     ...     ...    ...     ...      ...   
282  2023-12-27                    45.0      16   30.0      20       70   
283  2023-12-28                    45.0      16   30.0      20       70   
284  2023-12-29                    45.0      16   30.0      22       80   
285  2023-12-31                    45.0      16   26.0      20       60   
286  2024-01-01                    45.0      16    9.0      18       50   

     Garlic  Peas  Methi  Green Chilli  Elephant Yam (Suran)  Year  Month  \
0        50    25     

In [15]:
unique_month_nam = price_df['Month Names'].unique()
print(unique_month_nam)


['January' 'February' 'March' 'April' 'May' 'June' 'July' 'August'
 'September' 'October' 'November' 'December']
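
As an alternative to `calendar.month_name` with `apply`, pandas can derive month names directly from a datetime column. A short sketch on made-up dates (the `demo` frame is hypothetical), assuming the default English locale:

```python
import pandas as pd

# Hypothetical dates standing in for the 'Price Dates' column
demo = pd.DataFrame({"Price Dates": pd.to_datetime(["2023-01-01", "2023-02-15"])})

# dt.month_name() returns the full month name for each date
demo["Month Names"] = demo["Price Dates"].dt.month_name()

print(demo["Month Names"].tolist())
```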


In [16]:
price_df.drop(columns= ['Month'],inplace = True)
price_df

Unnamed: 0,Price Dates,Bhindi (Ladies finger),Tomato,Onion,Potato,Brinjal,Garlic,Peas,Methi,Green Chilli,Elephant Yam (Suran),Year,Month Names
0,2023-01-01,35.0,18,22.0,20,30,50,25,8,45.0,25,2023,January
1,2023-01-02,35.0,16,22.0,20,30,55,25,7,40.0,25,2023,January
2,2023-01-03,35.0,16,21.0,20,30,55,25,7,40.0,25,2023,January
3,2023-01-04,30.0,16,21.0,22,25,55,25,7,40.0,25,2023,January
4,2023-01-08,35.0,16,20.0,21,25,55,22,6,35.0,25,2023,January
...,...,...,...,...,...,...,...,...,...,...,...,...,...
282,2023-12-27,45.0,16,30.0,20,70,260,40,16,40.0,25,2023,December
283,2023-12-28,45.0,16,30.0,20,70,260,30,20,45.0,25,2023,December
284,2023-12-29,45.0,16,30.0,22,80,260,30,18,50.0,25,2023,December
285,2023-12-31,45.0,16,26.0,20,60,250,40,16,50.0,40,2023,December


---
<a href=#five></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
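
A minimal sketch of the kind of exploration described above, using summary statistics, a monthly average, and a correlation matrix. The `eda_df` frame and its values are made up for illustration and are not results from the dataset. On the real data the same calls would be applied to `price_df`.

```python
import pandas as pd

# Made-up prices for two vegetables over six dates (illustrative only)
eda_df = pd.DataFrame({
    "Price Dates": pd.to_datetime(
        ["2023-01-01", "2023-01-02", "2023-02-01", "2023-02-02", "2023-03-01", "2023-03-02"]
    ),
    "Tomato": [18, 16, 20, 22, 30, 28],
    "Onion": [22.0, 21.0, 24.0, 25.0, 31.0, 30.0],
})
vegetables = ["Tomato", "Onion"]

# Summary statistics: count, mean, std, min, quartiles, max per vegetable
summary = eda_df[vegetables].describe()

# Average price per calendar month to expose seasonal movement
monthly_mean = eda_df.groupby(eda_df["Price Dates"].dt.month)[vegetables].mean()

# Pairwise Pearson correlation between vegetable prices
correlation = eda_df[vegetables].corr()

print(summary.loc[["mean", "min", "max"]])
print(monthly_mean)
print(correlation)
```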


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a href=#six></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a href=#seven></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
* **Details:** Present metrics used to evaluate the models, such as accuracy, precision, recall, F1-score, etc. Discuss validation techniques employed, such as cross-validation or train/test split.
---
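
As one way to illustrate a train/test split for price data, the sketch below holds out the most recent observations and scores a naive "last observed price" forecast with mean absolute error (MAE). Everything here is an assumption for illustration: the prices are made up, the baseline is not the model used in this project, and MAE is chosen only because prices are continuous values.

```python
import numpy as np

# Made-up daily prices (illustrative only, not from prices.csv)
prices = np.array([18.0, 16.0, 16.0, 17.0, 19.0, 20.0, 22.0, 21.0, 23.0, 24.0])

# Chronological split: the first 80% for training, the last 20% for testing,
# so the forecast is never scored on dates earlier than those it has seen
split_index = int(len(prices) * 0.8)
train, test = prices[:split_index], prices[split_index:]

# Naive baseline: predict that every future price equals the last training price
predictions = np.full(len(test), train[-1])

# Mean absolute error: average size of the forecast errors, in price units
mae = float(np.mean(np.abs(test - predictions)))
print(f"Train size: {len(train)}, test size: {len(test)}, MAE: {mae:.2f}")
```

Any real model would need to beat a simple baseline like this on the held-out period to justify its extra complexity.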

In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a href=#eight></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a href=#nine></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---
