# QCTO - Workplace Module

### Project Title: Prices
#### Done By: Welsh Dube

© ExploreAI 2024

---

<a id="cont"></a>
## Table of Contents

<a href=#BC> Background Context</a>

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Data Collection and Description</a>

<a href=#three>3. Loading Data </a>

<a href=#four>4. Data Cleaning and Filtering</a>

<a href=#five>5. Exploratory Data Analysis (EDA)</a>

<a href=#six>6. Modeling </a>

<a href=#seven>7. Evaluation and Validation</a>

<a href=#eight>8. Final Model</a>

<a href=#nine>9. Conclusion and Future Work</a>

<a href=#ten>10. References</a>

---
 <a id="BC"></a>
## **Background Context**
<a href=#cont>Back to Table of Contents</a>

#### **Purpose:**
This project aims to explore and analyze the historical pricing of vegetables across different regions in India. The goal is to understand price variations over time, evaluate market trends, and build predictive models to forecast future price movements. By examining the data, the project intends to provide insights that can help farmers, vendors, and consumers make informed decisions about buying, selling, and stockpiling vegetables.

#### **Details:**
The vegetable market in India experiences significant volatility due to factors like seasonal changes, supply chain disruptions, regional demand, and weather conditions. These fluctuations can impact the livelihoods of farmers and influence the prices faced by consumers. The dataset provided offers a comprehensive view of the prices of key vegetables, such as brinjal, onion, tomato, and others, over time.

The project aims to address several critical questions:
- **Seasonal Impact**: How do different seasons influence vegetable prices?
- **Supply and Demand**: What role do supply chain disruptions or surges in demand play in the price variations?
- **Time Series Analysis**: Can we predict future price trends based on past data?

Understanding these factors will help create models for price prediction and allow stakeholders to better anticipate market changes. The analysis can support market stability, improve supply chain decisions, and promote fair pricing for consumers.

---

View the project on GitHub:

*  [Click here to view the project repo](https://github.com/WelshDube/Veg-Prices.git)


View the Link to the Trello board

---
<a id="one"></a>
## **Importing Packages**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Set up the Python environment with necessary libraries and tools.
* **Details:** List and import all the Python packages that will be used throughout the project such as Pandas for data manipulation, Matplotlib/Seaborn for visualization, scikit-learn for modeling, etc.
---

In [None]:
# Importing necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import matplotlib.pyplot as plt
import seaborn as sns

---
<a id="two"></a>
## **Data Collection and Description**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Describe how the data was collected and provide an overview of its characteristics.
* **Details:** Mention sources of the data, the methods used for collection (e.g., APIs, web scraping, datasets from repositories), and a general description of the dataset including size, scope, and types of data available (e.g., numerical, categorical).
---

In [None]:
# Load the dataset
df = pd.read_csv('prices.csv')

# View the first few rows
df.head()

# Display dataset information
df.info()

# Checking for missing values
df.isnull().sum()


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 287 entries, 0 to 286
Data columns (total 11 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Price Dates             287 non-null    object 
 1   Bhindi (Ladies finger)  287 non-null    float64
 2   Tomato                  287 non-null    int64  
 3   Onion                   287 non-null    float64
 4   Potato                  287 non-null    int64  
 5   Brinjal                 287 non-null    int64  
 6   Garlic                  287 non-null    int64  
 7   Peas                    287 non-null    int64  
 8   Methi                   287 non-null    int64  
 9   Green Chilli            287 non-null    float64
 10  Elephant Yam (Suran)    287 non-null    int64  
dtypes: float64(3), int64(7), object(1)
memory usage: 24.8+ KB


| Column | Missing values |
| --- | --- |
| Price Dates | 0 |
| Bhindi (Ladies finger) | 0 |
| Tomato | 0 |
| Onion | 0 |
| Potato | 0 |
| Brinjal | 0 |
| Garlic | 0 |
| Peas | 0 |
| Methi | 0 |
| Green Chilli | 0 |


---
<a id="three"></a>
## **Loading Data**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Load the data into the notebook for manipulation and analysis.
* **Details:** Show the code used to load the data and display the first few rows to give a sense of what the raw data looks like.
---

In [None]:
# Summary statistics of the data
df.describe()

# Check the columns and their data types
df.dtypes


| Column | Dtype |
| --- | --- |
| Price Dates | object |
| Bhindi (Ladies finger) | float64 |
| Tomato | int64 |
| Onion | float64 |
| Potato | int64 |
| Brinjal | int64 |
| Garlic | int64 |
| Peas | int64 |
| Methi | int64 |
| Green Chilli | float64 |


In [None]:
# Print the column names
print(df.columns)  # df_cleaned is only created later, in the cleaning section


Index(['Price Dates', 'Bhindi (Ladies finger)', 'Tomato', 'Onion', 'Potato',
       'Brinjal', 'Garlic', 'Peas', 'Methi', 'Green Chilli',
       'Elephant Yam (Suran)'],
      dtype='object')


In [None]:
# Check for duplicate rows
duplicates = df.duplicated()
print(f"Number of duplicate rows: {duplicates.sum()}")

# Show the duplicate rows themselves, reusing the mask computed above
duplicate_rows = df[duplicates]
print(duplicate_rows)

Number of duplicate rows: 0
Empty DataFrame
Columns: [Price Dates, Bhindi (Ladies finger), Tomato, Onion, Potato, Brinjal, Garlic, Peas, Methi, Green Chilli, Elephant Yam (Suran)]
Index: []


---
<a id="four"></a>
## **Data Cleaning and Filtering**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Prepare the data for analysis by cleaning and filtering.
* **Details:** Include steps for handling missing values, removing outliers, correcting errors, and possibly reducing the data (filtering based on certain criteria or features).
---
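The details above mention removing outliers. One common option is the interquartile-range (IQR) rule, sketched below on a tiny hypothetical sample. The `iqr_bounds` helper, the multiplier `k=1.5`, and the artificial spike of 120 are all assumptions made for illustration; they are not taken from `prices.csv` or from the project's actual cleaning steps.

```python
import pandas as pd

# Hypothetical sample; 120 is an artificial spike added to illustrate the rule
sample = pd.DataFrame({"Brinjal": [30, 30, 30, 25, 25, 120]})


def iqr_bounds(series, k=1.5):
    """Return (lower, upper) bounds of the k * IQR rule for a numeric Series."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = iqr_bounds(sample["Brinjal"])

# Keep only the rows whose price lies inside the bounds
within = sample[sample["Brinjal"].between(lower, upper)]
print(within)
```

Whether a flagged price is a data error or a genuine market spike is a judgment call, so it may be safer to inspect flagged rows before dropping them.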

In [None]:
# Handle missing values by dropping incomplete rows (df.fillna() is an alternative)
df_cleaned = df.dropna()

# Keep only the columns needed for the analysis; .copy() avoids
# SettingWithCopyWarning if df_filtered is modified later
df_filtered = df_cleaned[['Price Dates', 'Brinjal']].copy()  # Add other columns as needed

# View the filtered data
print(df_filtered.head())

  Price Dates  Brinjal
0  01-01-2023       30
1  02-01-2023       30
2  03-01-2023       30
3  04-01-2023       25
4  08-01-2023       25
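`Price Dates` is stored as `object` (plain strings), so any time-based analysis would first need it converted to a datetime type. The sketch below rebuilds the five rows printed above and parses them, assuming the format is day-month-year (the values run 01, 02, 03, 04, 08 within the same month, which suggests this but does not prove it). The `sample` and `gaps` names are illustrative.

```python
import pandas as pd

# The five rows shown in the filtered output above
sample = pd.DataFrame({
    "Price Dates": ["01-01-2023", "02-01-2023", "03-01-2023",
                    "04-01-2023", "08-01-2023"],
    "Brinjal": [30, 30, 30, 25, 25],
})

# Assumption: dates are written as day-month-year
sample["Price Dates"] = pd.to_datetime(sample["Price Dates"], format="%d-%m-%Y")

# Sort chronologically and use the dates as the index
sample = sample.sort_values("Price Dates").set_index("Price Dates")

# Number of days between consecutive observations, to reveal gaps in the series
gaps = sample.index.to_series().diff().dt.days
print(gaps)
```

The gap check matters here because the printed rows skip from 04-01-2023 to 08-01-2023, so the series is not strictly daily.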


---
<a id="five"></a>
## **Exploratory Data Analysis (EDA)**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Explore and visualize the data to uncover patterns, trends, and relationships.
* **Details:** Use statistics and visualizations to explore the data. This may include histograms, box plots, scatter plots, and correlation matrices. Discuss any significant findings.
---
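As a starting point for the exploration described above, the sketch below computes a per-month average and a correlation matrix. The prices are hypothetical values invented for illustration, not rows from `prices.csv`; only the column names match the dataset.

```python
import pandas as pd

# Hypothetical prices for illustration only; not taken from prices.csv
sample = pd.DataFrame(
    {
        "Tomato": [16, 16, 18, 20, 22, 24],
        "Onion": [11.0, 12.0, 12.0, 14.0, 15.0, 17.0],
        "Brinjal": [30, 30, 25, 25, 35, 40],
    },
    index=pd.to_datetime(
        ["2023-01-01", "2023-01-15", "2023-02-01",
         "2023-02-15", "2023-03-01", "2023-03-15"]
    ),
)

# Average price per calendar month for each vegetable
monthly_mean = sample.groupby(sample.index.month).mean()
print(monthly_mean)

# Pairwise Pearson correlation between vegetable prices
correlation = sample.corr()
print(correlation.round(2))
```

On the real data, `monthly_mean.plot()` or `sns.heatmap(correlation, annot=True)` would turn these tables into the visualizations this section calls for.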


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="six"></a>
## **Modeling**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Develop and train predictive or statistical models.
* **Details:** Describe the choice of models, feature selection and engineering processes, and show how the models are trained. Include code for setting up the models and explanations of the model parameters.
---
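The imports at the top of the notebook include `train_test_split` and `LinearRegression`. A minimal sketch of how they might be combined for a price trend is shown below. The series is synthetic (a straight line), and using "days since the first observation" as the only feature is an assumption made for illustration rather than the project's chosen design.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical daily series with a linear upward trend (illustration only)
dates = pd.date_range("2023-01-01", periods=20, freq="D")
prices = pd.Series(25.0 + 0.5 * np.arange(20), index=dates, name="Brinjal")

# Feature: number of days since the first observation
X = (prices.index - prices.index[0]).days.to_numpy().reshape(-1, 1)
y = prices.to_numpy()

# shuffle=False keeps the split chronological: train on the past, test on the future
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = LinearRegression()
model.fit(X_train, y_train)
print(f"slope: {model.coef_[0]:.3f}, intercept: {model.intercept_:.3f}")
```

Keeping the split chronological avoids training on observations that come after the ones being predicted, which would make the test score look better than a real forecast could be.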


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="seven"></a>
## **Evaluation and Validation**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Evaluate and validate the effectiveness and accuracy of the models.
* **Details:** Present metrics used to evaluate the models, such as accuracy, precision, recall, F1-score, etc. Discuss validation techniques employed, such as cross-validation or train/test split.
---
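Because price is a continuous target, regression metrics such as the `mean_squared_error` and `r2_score` functions imported earlier are likely a better fit than accuracy, precision, or recall. The sketch below applies them to hypothetical true and predicted prices; the numbers are invented for illustration and are not results from this project.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical held-out prices and model predictions (illustration only)
y_true = np.array([30.0, 30.0, 25.0, 25.0, 35.0])
y_pred = np.array([29.0, 31.0, 26.0, 24.0, 33.0])

mse = mean_squared_error(y_true, y_pred)  # average squared error
rmse = np.sqrt(mse)                       # same units as the prices
r2 = r2_score(y_true, y_pred)             # share of variance explained

print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, R^2: {r2:.3f}")
```

RMSE is often the easiest of the three to communicate, since it is expressed in the same units as the prices themselves.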

In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="eight"></a>
## **Final Model**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Present the final model and its performance.
* **Details:** Highlight the best-performing model and discuss its configuration, performance, and why it was chosen over others.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="nine"></a>
## **Conclusion and Future Work**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Summarize the findings and discuss future directions.
* **Details:** Conclude with a summary of the results, insights gained, limitations of the study, and suggestions for future projects or improvements in methodology or data collection.
---


In [None]:
#Please use code cells to code in and do not forget to comment your code.

---
<a id="ten"></a>
## **References**
<a href=#cont>Back to Table of Contents</a>

* **Purpose:** Provide citations and sources of external content.
* **Details:** List all the references and sources consulted during the project, including data sources, research papers, and documentation for tools and libraries used.
---

In [None]:
#Please use code cells to code in and do not forget to comment your code.

## Additional Sections to Consider

* ### Appendix:
For any additional code, detailed tables, or extended data visualizations that are supplementary to the main content.

* ### Contributors:
If this is a group project, list the contributors and their roles or contributions to the project.
