# QCTO - Workplace Module

### Project Title: Please Insert your Project Title Here
#### Done By: Name and Surname

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Introduce the project, outline its goals, and explain its significance.
* **Details:** Include information about the problem domain, the specific questions or challenges the project aims to address, and any relevant background information that sets the stage for the work.
---

---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

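```python
import pandas as pd               # data manipulation and analysis
import matplotlib.pyplot as plt   # static plots
import seaborn as sns             # statistical visualisation
import numpy as np                # numerical operations
import calendar                   # day and month names in calendar order
import plotly.express as px       # interactive visualisation
```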

---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Describe how the data was collected and provide an overview of its characteristics.
* **Details:** Mention sources of the data, the methods used for collection (e.g., APIs, web scraping, datasets from repositories), and a general description of the dataset including size, scope, and types of data available (e.g., numerical, categorical).
---

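## Data description

The dataset has one date column and one price column per vegetable:

1. **Price Dates**: the date of each price record (renamed to `Price_Dates` during cleaning)
2. **months**: the month name of each date, an additional feature created during cleaning
3. **Vegetable prices**: one numeric price column for each vegetable
4. **Vegetables covered**:
   - Bhindi (ladies finger)
   - Tomato
   - Onion
   - Potato
   - Brinjal
   - Garlic
   - Peas
   - Methi
   - Green Chilli
   - Elephant Yam (Suran)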

## Data setup and preparation

The following libraries are imported:

- **Seaborn** for visualising data
- **NumPy** for numerical operations
- **calendar** for day and month names
- **pandas** for data manipulation and analysis
- **Matplotlib (pyplot)** for visualising data
- **Plotly Express** for interactive visualisations

---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

```python
# Load the raw price data (update the path to wherever prices.csv is saved locally)
prices_df = pd.read_csv("C:/Users/f8874300/Downloads/archive/prices.csv")
prices_df
```

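|  | Price Dates | Bhindi (Ladies finger) | Tomato | Onion | Potato | Brinjal | Garlic | Peas | Methi | Green Chilli | Elephant Yam (Suran) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 01-01-2023 | 35.0 | 18 | 22.0 | 20 | 30 | 50 | 25 | 8 | 45.0 | 25 |
| 1 | 02-01-2023 | 35.0 | 16 | 22.0 | 20 | 30 | 55 | 25 | 7 | 40.0 | 25 |
| 2 | 03-01-2023 | 35.0 | 16 | 21.0 | 20 | 30 | 55 | 25 | 7 | 40.0 | 25 |
| 3 | 04-01-2023 | 30.0 | 16 | 21.0 | 22 | 25 | 55 | 25 | 7 | 40.0 | 25 |
| 4 | 08-01-2023 | 35.0 | 16 | 20.0 | 21 | 25 | 55 | 22 | 6 | 35.0 | 25 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 282 | 27-12-2023 | 45.0 | 16 | 30.0 | 20 | 70 | 260 | 40 | 16 | 40.0 | 25 |
| 283 | 28-12-2023 | 45.0 | 16 | 30.0 | 20 | 70 | 260 | 30 | 20 | 45.0 | 25 |
| 284 | 29-12-2023 | 45.0 | 16 | 30.0 | 22 | 80 | 260 | 30 | 18 | 50.0 | 25 |
| 285 | 31-12-2023 | 45.0 | 16 | 26.0 | 20 | 60 | 250 | 40 | 16 | 50.0 | 40 |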
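The raw `Price Dates` column is read as text (`object` in the `info()` output below), so it has to be converted later. As an alternative, here is a minimal sketch that parses the dates while reading, assuming every date in the file uses the same day-month-year layout: `pd.read_csv` accepts `parse_dates` and `dayfirst`. To keep the sketch self-contained, three rows copied from the output above stand in for `prices.csv` through `io.StringIO`. The names `sample_csv` and `sample_df` are illustrative only.

```python
import io

import pandas as pd

# Three rows copied from the raw output above; they stand in for prices.csv in this sketch.
sample_csv = """Price Dates,Bhindi (Ladies finger),Tomato,Onion,Potato,Brinjal,Garlic,Peas,Methi,Green Chilli,Elephant Yam (Suran)
01-01-2023,35.0,18,22.0,20,30,50,25,8,45.0,25
02-01-2023,35.0,16,22.0,20,30,55,25,7,40.0,25
31-12-2023,45.0,16,26.0,20,60,250,40,16,50.0,40
"""

# parse_dates converts the column while reading; dayfirst=True reads
# "02-01-2023" as 2 January 2023 rather than 1 February 2023.
sample_df = pd.read_csv(
    io.StringIO(sample_csv),
    parse_dates=["Price Dates"],
    dayfirst=True,
)

print(sample_df.dtypes)
print(sample_df["Price Dates"].dt.strftime("%d %B %Y").tolist())
```

With the real file, you could pass the same two keyword arguments to the `pd.read_csv(...)` call above. The separate date conversion in the cleaning step would then no longer be needed.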


---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---
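One common way to handle the outlier step mentioned above is to flag values that fall outside the interquartile-range (IQR) fences, Q1 - 1.5·IQR and Q3 + 1.5·IQR. The helper `iqr_outlier_mask` and the input values below are hypothetical and do not come from the dataset. On a full year of prices, large seasonal swings (such as garlic's rise between January and December) shift the quartiles themselves, so it is worth inspecting any flagged points before removing them.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside
    [Q1 - k * IQR, Q3 + k * IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical prices with one suspicious spike (not taken from the dataset).
prices = pd.Series([20, 21, 20, 22, 20, 95, 21], dtype=float)
mask = iqr_outlier_mask(prices)
print(prices[mask].tolist())
```

After cleaning, the same helper could be applied to a single column, for example `iqr_outlier_mask(prices_df["Garlic"])`.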

```python
prices_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Price Dates             287 non-null    object 
 1   Bhindi (Ladies finger)  287 non-null    float64
 2   Tomato                  287 non-null    int64  
 3   Onion                   287 non-null    float64
 4   Potato                  287 non-null    int64  
 5   Brinjal                 287 non-null    int64  
 6   Garlic                  287 non-null    int64  
 7   Peas                    287 non-null    int64  
 8   Methi                   287 non-null    int64  
 9   Green Chilli            287 non-null    float64
 10  Elephant Yam (Suran)    287 non-null    int64  
dtypes: float64(3), int64(7), object(1)
memory usage: 24.8+ KB
```


```python
# Coerce the integer-typed price columns to numbers; any value that cannot be parsed becomes NaN
int_price_cols = ['Tomato', 'Potato', 'Brinjal', 'Garlic', 'Peas', 'Methi', 'Elephant Yam (Suran)']
prices_df[int_price_cols] = prices_df[int_price_cols].apply(pd.to_numeric, errors='coerce')
```
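With `errors='coerce'`, any entry that cannot be parsed as a number becomes `NaN` instead of raising an error. Counting missing values after coercion therefore shows how many entries were unparseable. In this notebook, the two `info()` outputs around this step are identical (287 non-null values, `int64`), which suggests these columns contained no such entries. The hypothetical series below illustrates the behaviour:

```python
import pandas as pd

# Hypothetical raw column with two entries that are not numbers.
raw = pd.Series(["20", "22", "n/a", "-", "21.5"])

# errors="coerce" replaces unparseable entries with NaN instead of raising ValueError.
numeric = pd.to_numeric(raw, errors="coerce")

print(numeric.tolist())
print("Entries coerced to NaN:", numeric.isna().sum())
```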

```python
prices_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Price Dates             287 non-null    object 
 1   Bhindi (Ladies finger)  287 non-null    float64
 2   Tomato                  287 non-null    int64  
 3   Onion                   287 non-null    float64
 4   Potato                  287 non-null    int64  
 5   Brinjal                 287 non-null    int64  
 6   Garlic                  287 non-null    int64  
 7   Peas                    287 non-null    int64  
 8   Methi                   287 non-null    int64  
 9   Green Chilli            287 non-null    float64
 10  Elephant Yam (Suran)    287 non-null    int64  
dtypes: float64(3), int64(7), object(1)
memory usage: 24.8+ KB
```


```python
# Convert every price column (all columns except 'Price Dates') to float
price_cols = prices_df.columns.drop('Price Dates')
prices_df = prices_df.astype({col: float for col in price_cols})

# Rename the date and bhindi columns to underscore-separated names
prices_df.rename(columns={'Price Dates': 'Price_Dates', 'Bhindi (Ladies finger)': 'Bhindi_Ladies_finger'}, inplace=True)

# Parse the day-first date strings (e.g. "02-01-2023" is 2 January 2023); raises if a date cannot be parsed
prices_df['Price_Dates'] = pd.to_datetime(prices_df['Price_Dates'], dayfirst=True)
prices_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 11 columns):
 #   Column                Non-Null Count  Dtype         
---  ------                --------------  -----         
 0   Price_Dates           287 non-null    datetime64[ns]
 1   Bhindi_Ladies_finger  287 non-null    float64       
 2   Tomato                287 non-null    float64       
 3   Onion                 287 non-null    float64       
 4   Potato                287 non-null    float64       
 5   Brinjal               287 non-null    float64       
 6   Garlic                287 non-null    float64       
 7   Peas                  287 non-null    float64       
 8   Methi                 287 non-null    float64       
 9   Green Chilli          287 non-null    float64       
 10  Elephant Yam (Suran)  287 non-null    float64       
dtypes: datetime64[ns](1), float64(10)
memory usage: 24.8 KB
```
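The first rows jump from 4 January to 8 January, so prices were not recorded every day. Assume the dates are unique and span 1 January to 31 December 2023 (365 days). Under that assumption, the 287 rows leave 365 - 287 = 78 calendar days without a price. A minimal sketch for listing the missing days, using only the first five dates shown above:

```python
import pandas as pd

# The first five dates from the output above (a sample, not the full column).
price_dates = pd.to_datetime(
    ["01-01-2023", "02-01-2023", "03-01-2023", "04-01-2023", "08-01-2023"],
    format="%d-%m-%Y",
)

# Every calendar day from the first to the last observed date.
full_range = pd.date_range(price_dates.min(), price_dates.max(), freq="D")

# Days in the full range that have no price record.
missing_days = full_range.difference(price_dates)
print(missing_days.strftime("%Y-%m-%d").tolist())
```

On the full dataset, the same two steps would be applied to `prices_df['Price_Dates']`. If the assumptions above hold, `len(missing_days)` would then be 78.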


```python
# Create a new feature, months, holding the month name of each price date
prices_df["months"] = prices_df["Price_Dates"].dt.month_name()
prices_df
```

|  | Price_Dates | Bhindi_Ladies_finger | Tomato | Onion | Potato | Brinjal | Garlic | Peas | Methi | Green Chilli | Elephant Yam (Suran) | months |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2023-01-01 | 35.0 | 18.0 | 22.0 | 20.0 | 30.0 | 50.0 | 25.0 | 8.0 | 45.0 | 25.0 | January |
| 1 | 2023-01-02 | 35.0 | 16.0 | 22.0 | 20.0 | 30.0 | 55.0 | 25.0 | 7.0 | 40.0 | 25.0 | January |
| 2 | 2023-01-03 | 35.0 | 16.0 | 21.0 | 20.0 | 30.0 | 55.0 | 25.0 | 7.0 | 40.0 | 25.0 | January |
| 3 | 2023-01-04 | 30.0 | 16.0 | 21.0 | 22.0 | 25.0 | 55.0 | 25.0 | 7.0 | 40.0 | 25.0 | January |
| 4 | 2023-01-08 | 35.0 | 16.0 | 20.0 | 21.0 | 25.0 | 55.0 | 22.0 | 6.0 | 35.0 | 25.0 | January |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 282 | 2023-12-27 | 45.0 | 16.0 | 30.0 | 20.0 | 70.0 | 260.0 | 40.0 | 16.0 | 40.0 | 25.0 | December |
| 283 | 2023-12-28 | 45.0 | 16.0 | 30.0 | 20.0 | 70.0 | 260.0 | 30.0 | 20.0 | 45.0 | 25.0 | December |
| 284 | 2023-12-29 | 45.0 | 16.0 | 30.0 | 22.0 | 80.0 | 260.0 | 30.0 | 18.0 | 50.0 | 25.0 | December |
| 285 | 2023-12-31 | 45.0 | 16.0 | 26.0 | 20.0 | 60.0 | 250.0 | 40.0 | 16.0 | 50.0 | 40.0 | December |
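`dt.month_name()` returns plain strings, so grouping or plotting by `months` would sort them alphabetically (April, August, December, ...). A minimal sketch, assuming monthly averages are a likely next step, converts the column into an ordered categorical using the `calendar` module listed in the data setup. The nine-row sample (Tomato and Garlic only) is copied from the output above, and the names `df`, `month_order` and `monthly_mean` are illustrative.

```python
import calendar

import pandas as pd

# Nine rows copied from the output above (Tomato and Garlic columns only).
df = pd.DataFrame({
    "months": ["January"] * 5 + ["December"] * 4,
    "Tomato": [18.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0, 16.0],
    "Garlic": [50.0, 55.0, 55.0, 55.0, 55.0, 260.0, 260.0, 260.0, 250.0],
})

# calendar.month_name[0] is an empty string, so skip it to get January..December.
month_order = list(calendar.month_name)[1:]
df["months"] = pd.Categorical(df["months"], categories=month_order, ordered=True)

# observed=True keeps only months that actually appear in the data.
monthly_mean = df.groupby("months", observed=True)[["Tomato", "Garlic"]].mean()
print(monthly_mean)
```

Because the categorical is ordered, January sorts before December here. The same ordering would carry over to Seaborn or Plotly charts built from the full `prices_df`.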


---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
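The conclusion contrasts stable tomato prices with volatile garlic and pea prices. One way to quantify that contrast is the coefficient of variation (standard deviation divided by mean), which puts columns with very different price levels on a common scale. The sketch below applies it only to the nine rows printed in the Loading Data output, not to the full year.

```python
import pandas as pd

# Sample: the nine rows printed in the Loading Data output (not the full 287 rows).
sample = pd.DataFrame({
    "Tomato": [18, 16, 16, 16, 16, 16, 16, 16, 16],
    "Garlic": [50, 55, 55, 55, 55, 260, 260, 260, 250],
    "Peas": [25, 25, 25, 25, 22, 40, 30, 30, 40],
}, dtype=float)

# describe() gives count, mean, std, min, quartiles and max per column;
# transposing puts one vegetable per row.
summary = sample.describe().T

# Coefficient of variation: a unit-free measure of how much a price moves around its mean.
summary["cv"] = summary["std"] / summary["mean"]
print(summary[["mean", "std", "cv"]].sort_values("cv"))
```

On the full `prices_df`, the same two steps would cover all ten price columns. A ranking from this small sample should not be treated as a finding.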


---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---


---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
* **Details:** Present metrics used to evaluate the models, such as accuracy, precision, recall, F1-score, etc. Discuss validation techniques employed, such as cross-validation or train/test split.
---

---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---


---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---


To sum up, the price data show a clear contrast between vegetables. Tomato prices stayed relatively stable (16 to 18 in both the first and the last days of 2023). Garlic and peas were far more volatile: garlic rose from 50 to 55 in early January to 250 to 260 in late December. Such differences can point to seasonal trends and to shifts in supply and demand, and they reflect real decisions faced by farmers, distributors and consumers. By tracking these trends, stakeholders can better anticipate price changes, plan production and make more informed purchasing decisions. Understanding how each vegetable's price behaves is therefore a useful step towards a more resilient, responsive food system.

---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---

### 📘 Documentation and Tools
1. **Visual Studio Code (VS Code)**  
   Microsoft. (n.d.). *Visual Studio Code Documentation*.

2. **Python Programming Language**  
   Python Software Foundation. (n.d.). *Python Documentation*.

3. **Jupyter Notebooks**  
   Project Jupyter. (n.d.). *Jupyter Documentation*.

### 🛠️ Libraries Used
4. **pandas**  
   The pandas development team. (n.d.). *pandas Documentation*.

5. **numpy**  
   NumPy Developers. (n.d.). *NumPy Documentation*.

6. **matplotlib**  
   Matplotlib Development Team. (n.d.). *Matplotlib Documentation*.

7. **seaborn**  
   Waskom, M. L. (n.d.). *seaborn Documentation*.

### 📊 Data Source
8. **Vegetable Prices Dataset**  
   https://github.com/LethaboL15/lethabo-letsoalo-veg-price-analysis  
   https://trello.com/b/yqkQ1DcB/lethabo-letsoalo-veg-price-analysis

## Additional Sections to Consider

* ### Appendix: 
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Contributors: 
If this is a group project, list the contributors and their roles or contributions to the project.
