# 🥬 Kalimati Tarkari Dataset: Fruits & Vegetables Price Analysis

## 📊 About Dataset

### **Kalimati Tarkari Dataset**
This dataset contains historical price data for fruits and vegetables from the **Kalimati Fruits and Vegetable Market Development Board**, Nepal's largest wholesale market for agricultural produce. The data was scraped from the official website: [https://kalimatimarket.gov.np/](https://kalimatimarket.gov.np/)

The dataset records daily minimum, maximum, and average prices for a wide variety of commodities, which makes it suitable for studying market trends, seasonal variation, and price dynamics in Nepal's agricultural sector.

📦 **Dataset Source**: [Kaggle - Kalimati Tarkari Dataset](https://www.kaggle.com/datasets/nischallal/kalimati-tarkari-dataset)

---

## 🌟 Context

The Kalimati Market serves as the primary wholesale hub for fruits and vegetables in Nepal, directly influencing retail prices across the country. Understanding price patterns in this market is crucial for:

- **Farmers** seeking optimal selling times for their produce
- **Retailers** planning inventory and pricing strategies  
- **Policymakers** monitoring food security and inflation
- **Consumers** understanding seasonal price fluctuations
- **Researchers** analyzing agricultural economics and supply chain dynamics

This dataset covers more than ten years of daily price records (June 2013 to September 2023) and offers a window into the economics of food distribution in South Asia.

---

## 📦 Content

The dataset includes:

- **280,862 records** of daily price data from June 2013 to September 2023
- **Multiple commodity categories**: Vegetables (tomatoes, potatoes, leafy greens, etc.), Fruits (bananas, mangoes, apples, etc.), and specialty items
- **Price metrics**: Minimum, Maximum, and Average prices per unit (Kg/Dozen/Piece)
- **Temporal data**: Date-wise records enabling time series analysis
- **Unit specifications**: The measurement unit for each commodity

### Key Features:
- Daily price records for 136 distinct commodities
- Seasonal variation patterns across different produce types
- Price volatility indicators through min-max spreads
- Historical trends for forecasting and predictive modeling

---

## 🙏 Acknowledgements

We extend our sincere gratitude to:

- **Kalimati Fruits and Vegetable Market Development Board** for maintaining transparent and accessible price records
- The **Government of Nepal** for supporting agricultural market information systems
- **Open data initiatives** that make agricultural market data publicly available for research and analysis

This dataset would not be possible without the continuous efforts of market officials who diligently record and publish daily price information, contributing to market transparency and informed decision-making.

---

## 💡 Inspiration & Research Questions

This dataset opens doors to numerous analytical opportunities:

### 📈 Time Series Analysis:
- Can we predict future prices based on historical trends?
- What are the seasonal patterns for different commodities?
- How do prices fluctuate during festivals and special occasions?

### 🔍 Market Insights:
- Which commodities show the highest price volatility?
- How do local vs. imported produce prices compare?
- What is the relationship between minimum and maximum prices?

### 🌾 Economic Analysis:
- How do weather patterns affect vegetable prices?
- What is the impact of supply chain disruptions on prices?
- Can we identify inflationary trends in food prices?

### 🤖 Machine Learning Applications:
- Price forecasting models for different commodities
- Anomaly detection in price patterns
- Clustering analysis of similar price behaviors

---

**Let's explore the data and uncover the stories hidden in Nepal's agricultural market!** 🚀

---

## 👤 Author

**Sajjad Ali Shah**  
Data Scientist | Machine Learning Engineer  
🔗 [LinkedIn Profile](https://www.linkedin.com/in/sajjad-ali-shah47/)

*Feel free to connect for collaborations, discussions, or questions about this analysis!*

---

# 🔄 Time Series Analysis Workflow

This notebook follows a six-step workflow for time series analysis:
1. 📥 **Data Collection** - Load the dataset
2. 📅 **Datetime Handling** - Convert and parse date columns
3. 🔍 **Initial Data Inspection** - Understand data structure
4. ❓ **Missing Values Check** - Identify temporal gaps
5. 📊 **Resampling** (if needed) - Standardize time intervals
6. 📈 **Exploratory Data Analysis** - Discover patterns and trends

---

## Step 1️⃣: Data Collection

Loading the Kalimati Tarkari dataset for time series analysis.

In [7]:
# Import the libraries used throughout the notebook
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns


In [8]:
# Load the dataset
df = pd.read_csv('Dataset/Kalimati_Tarkari_Dataset.csv', low_memory=False)

# Convert price columns to numeric; non-numeric entries become NaN
df['Minimum'] = pd.to_numeric(df['Minimum'], errors='coerce')
df['Maximum'] = pd.to_numeric(df['Maximum'], errors='coerce')
df['Average'] = pd.to_numeric(df['Average'], errors='coerce')

print("✅ Dataset loaded successfully!")
print(f"Shape: {df.shape[0]:,} rows, {df.shape[1]} columns")

✅ Dataset loaded successfully!
Shape: 280,862 rows, 6 columns
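
The `errors='coerce'` conversion above silently turns unparseable entries into `NaN`. A minimal sketch, using a small made-up frame rather than the real file, of how one might count the entries that were coerced in each price column:

```python
import pandas as pd

# Hypothetical sample mimicking the raw CSV; the values are illustrative only.
raw = pd.DataFrame({
    "Commodity": ["Tomato Big", "Potato Red", "Ginger", "Banana"],
    "Minimum": ["35", "40", "-", "90"],
    "Maximum": ["40", "45", "-", "100"],
    "Average": ["37.5", "42.5", "n/a", "95"],
})

price_cols = ["Minimum", "Maximum", "Average"]
converted = raw[price_cols].apply(pd.to_numeric, errors="coerce")

# Entries that were present as text but became NaN after conversion.
coerced = converted.isna() & raw[price_cols].notna()
coerced_counts = coerced.sum()
print(coerced_counts)
```

Running the same comparison on the real columns before overwriting them would show how many of the missing prices come from placeholder text rather than from genuinely empty cells.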


---

## Step 2️⃣: Datetime Handling

Converting date columns to proper datetime format and extracting temporal features.

In [9]:
# Convert Date column to datetime format
print("=" * 100)
print("📅 DATETIME HANDLING")
print("=" * 100)

# Check original date format
print(f"\n🔍 Original Date Column:")
print(f"Data Type: {df['Date'].dtype}")
print(f"Sample values:\n{df['Date'].head()}")

# Convert to datetime.
# format='mixed' (pandas >= 2.0) infers the format for each value;
# dayfirst=False reads 6/16/2013 as month/day/year.
df['Date'] = pd.to_datetime(df['Date'], format='mixed', dayfirst=False)

print(f"\n✅ After Conversion:")
print(f"Data Type: {df['Date'].dtype}")
print(f"Sample values:\n{df['Date'].head()}")

# Extract temporal features
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df['Month_Name'] = df['Date'].dt.month_name()
df['Day'] = df['Date'].dt.day
df['Day_of_Week'] = df['Date'].dt.day_name()
df['Quarter'] = df['Date'].dt.quarter
df['Week_of_Year'] = df['Date'].dt.isocalendar().week

print(f"\n📊 Date Range:")
print(f"Start Date: {df['Date'].min().strftime('%Y-%m-%d')}")
print(f"End Date: {df['Date'].max().strftime('%Y-%m-%d')}")
print(f"Total Days: {(df['Date'].max() - df['Date'].min()).days:,}")
print(f"Total Years: {(df['Date'].max() - df['Date'].min()).days / 365.25:.2f}")

print(f"\n✅ New Temporal Features Created:")
print(f"   • Year, Month, Month_Name, Day")
print(f"   • Day_of_Week, Quarter, Week_of_Year")

📅 DATETIME HANDLING

🔍 Original Date Column:
Data Type: object
Sample values:
0    6/16/2013
1    6/16/2013
2    6/16/2013
3    6/16/2013
4    6/16/2013
Name: Date, dtype: object

✅ After Conversion:
Data Type: datetime64[ns]
Sample values:
0   2013-06-16
1   2013-06-16
2   2013-06-16
3   2013-06-16
4   2013-06-16
Name: Date, dtype: datetime64[ns]

📊 Date Range:
Start Date: 2013-06-16
End Date: 2023-09-28
Total Days: 3,756
Total Years: 10.28

✅ New Temporal Features Created:
   • Year, Month, Month_Name, Day
   • Day_of_Week, Quarter, Week_of_Year
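
`format='mixed'` infers the format separately for each value, which is convenient but can hide malformed entries. A minimal sketch, assuming every date follows the month/day/year pattern seen in the sample values above (an assumption not verified against the full file), that parses with an explicit format and counts what fails:

```python
import pandas as pd

# Hypothetical date strings; the last one is deliberately malformed.
dates = pd.Series(["6/16/2013", "12/1/2020", "9/28/2023", "2023-13-45"])

# Assumes month/day/year ordering, as the sample values suggest.
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")

# Values that did not match the expected format become NaT.
n_unparsed = int(parsed.isna().sum())
print(parsed)
print(f"Unparseable dates: {n_unparsed}")
```

If the count is zero on the real column, the explicit format is a safe and faster replacement; if not, the failing rows can be inspected before falling back to per-value inference.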


---

## Step 3️⃣: Initial Data Inspection

Understanding the structure and basic characteristics of the time series data.

In [10]:
# Dataset Overview
print("=" * 100)
print("📊 DATASET INFORMATION")
print("=" * 100)
df.info()

print("\n" + "=" * 100)
print("📐 DATASET SHAPE")
print("=" * 100)
print(f"Rows: {df.shape[0]:,}")
print(f"Columns: {df.shape[1]}")

print("\n" + "=" * 100)
print("🔍 FIRST 5 ROWS")
print("=" * 100)
print(df.head())

print("\n" + "=" * 100)
print("📈 STATISTICAL SUMMARY")
print("=" * 100)
print(df.describe())

print("\n" + "=" * 100)
print("🔎 MISSING VALUES")
print("=" * 100)
missing_values = df.isnull().sum()
if missing_values.sum() > 0:
    print(missing_values[missing_values > 0])
else:
    print("✅ No missing values found!")

print("\n" + "=" * 100)
print("📊 DUPLICATE ROWS")
print("=" * 100)
duplicates = df.duplicated().sum()
print(f"Total duplicate rows: {duplicates:,}")
if duplicates == 0:
    print("✅ No duplicate rows found!")

print("\n" + "=" * 100)
print("🏷️ UNIQUE VALUES PER COLUMN")
print("=" * 100)
for col in df.columns:
    print(f"{col}: {df[col].nunique():,} unique values")

📊 DATASET INFORMATION
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 280862 entries, 0 to 280861
Data columns (total 13 columns):
 #   Column        Non-Null Count   Dtype         
---  ------        --------------   -----         
 0   Commodity     280862 non-null  object        
 1   Date          280862 non-null  datetime64[ns]
 2   Unit          280862 non-null  object        
 3   Minimum       229706 non-null  float64       
 4   Maximum       229706 non-null  float64       
 5   Average       229706 non-null  float64       
 6   Year          280862 non-null  int32         
 7   Month         280862 non-null  int32         
 8   Month_Name    280862 non-null  object        
 9   Day           280862 non-null  int32         
 10  Day_of_Week   280862 non-null  object        
 11  Quarter       280862 non-null  int32         
 12  Week_of_Year  280862 non-null  UInt32        
dtypes: UInt32(1), datetime64[ns](1), float64(3), int32(4), object(4)
memory usage: 22.8+ MB
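
One further inspection that fits this step is a consistency check on the three price columns. The sketch below uses made-up rows (not taken from the dataset) and flags records where prices are present but `Minimum <= Average <= Maximum` does not hold. Whether the real data contains such rows is not known from the output above.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the third one violates Minimum <= Average <= Maximum.
sample = pd.DataFrame({
    "Commodity": ["Cauli Local", "Ginger", "Banana", "Kiwi"],
    "Minimum": [30.0, 80.0, 100.0, np.nan],
    "Maximum": [35.0, 90.0, 90.0, np.nan],
    "Average": [32.5, 85.0, 95.0, np.nan],
})

# Only rows with all three prices present can be checked.
has_prices = sample[["Minimum", "Maximum", "Average"]].notna().all(axis=1)
ordered = (sample["Minimum"] <= sample["Average"]) & (sample["Average"] <= sample["Maximum"])

# Rows with prices present but in an inconsistent order.
inconsistent = sample[has_prices & ~ordered]
print(inconsistent)
```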


---

## Step 4️⃣: Missing Values in Time Series

Identifying temporal gaps and missing data points in the time series.

In [11]:
# Time Series Missing Values Analysis
print("=" * 100)
print("❓ MISSING VALUES IN TIME SERIES")
print("=" * 100)

# 1. Check for missing values in columns
print("\n📊 Missing Values by Column:")
print("-" * 100)
missing_count = df.isnull().sum()
missing_percent = (missing_count / len(df)) * 100
missing_df = pd.DataFrame({
    'Column': missing_count.index,
    'Missing Count': missing_count.values,
    'Missing %': missing_percent.values
})
missing_rows = missing_df[missing_df['Missing Count'] > 0]
if not missing_rows.empty:
    print(missing_rows.to_string(index=False))
else:
    print("✅ No missing values in columns!")

# 2. Check for temporal gaps (missing dates)
print("\n" + "=" * 100)
print("📅 Temporal Continuity Check:")
print("-" * 100)

# Sort by date
df_sorted = df.sort_values('Date')

# Compare the dates present with a complete daily calendar over the same range
unique_dates = df_sorted['Date'].unique()
date_range = pd.date_range(start=df_sorted['Date'].min(), end=df_sorted['Date'].max(), freq='D')

missing_dates = date_range.difference(pd.DatetimeIndex(unique_dates))

print(f"Expected date range: {len(date_range):,} days")
print(f"Actual unique dates: {len(unique_dates):,} days")
print(f"Missing dates: {len(missing_dates):,} days")

if len(missing_dates) > 0:
    print(f"\n⚠️ Found {len(missing_dates)} missing dates in the time series")
    print(f"First 10 missing dates:")
    for date in missing_dates[:10]:
        print(f"  • {date.strftime('%Y-%m-%d')}")
else:
    print("\n✅ No temporal gaps found - continuous daily data!")

# 3. Check data frequency
print("\n" + "=" * 100)
print("📊 Time Series Frequency Analysis:")
print("-" * 100)
print(f"Total Records: {len(df):,}")
print(f"Unique Commodities: {df['Commodity'].nunique()}")
print(f"Records per Commodity (average): {len(df) / df['Commodity'].nunique():.2f}")

# 4. Check for irregular spacing
print("\n" + "=" * 100)
print("📈 Data Distribution Over Time:")
print("-" * 100)
records_per_year = df.groupby('Year').size()
print(records_per_year)

print("\n✅ Missing values analysis complete!")

❓ MISSING VALUES IN TIME SERIES

📊 Missing Values by Column:
----------------------------------------------------------------------------------------------------
 Column  Missing Count  Missing %
Minimum          51156  18.213927
Maximum          51156  18.213927
Average          51156  18.213927

📅 Temporal Continuity Check:
----------------------------------------------------------------------------------------------------
Expected date range: 3,757 days
Actual unique dates: 3,615 days
Missing dates: 142 days

⚠️ Found 142 missing dates in the time series
First 10 missing dates:
  • 2013-06-22
  • 2013-06-23
  • 2013-06-24
  • 2013-06-29
  • 2013-07-06
  • 2013-07-07
  • 2013-07-08
  • 2013-07-13
  • 2013-07-20
  • 2013-07-27

📊 Time Series Frequency Analysis:
----------------------------------------------------------------------------------------------------
Total Records: 280,862
Unique Commodities: 136
Records per Commodity (average): 2065.16
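
Several of the missing dates listed above fall on Saturdays, so it can help to summarize gaps by weekday and to see how a single series looks once it is aligned to a full calendar. A minimal sketch on a hypothetical two-week calendar (the real gap pattern may differ):

```python
import pandas as pd

# Hypothetical two-week calendar in which both Saturdays are missing.
all_days = pd.date_range("2013-06-16", "2013-06-29", freq="D")
observed = all_days[all_days.dayofweek != 5]  # 5 = Saturday

# Count the missing dates per weekday.
missing = all_days.difference(observed)
missing_by_weekday = pd.Series(missing.day_name()).value_counts()
print(missing_by_weekday)

# Reindex one (made-up) price series to the full calendar so gaps become NaN.
prices = pd.Series(range(len(observed)), index=observed, dtype="float64")
full = prices.reindex(all_days)
print(f"Gaps after reindexing: {int(full.isna().sum())}")
```

Applying `reindex` per commodity makes the gaps explicit, which matters for methods that expect a regular daily index.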


---

## Step 5️⃣: Resampling (If Needed)

Checking if resampling is required and standardizing time intervals if necessary.

In [12]:
# Resampling Analysis
print("=" * 100)
print("📊 RESAMPLING ANALYSIS")
print("=" * 100)

# Check current frequency
print(f"\n🔍 Current Data Frequency:")
print(f"   • Data appears to be DAILY (one record per commodity per day)")
print(f"   • Multiple commodities tracked simultaneously")

# Example: Resample the most frequently recorded commodity to different frequencies
sample_commodity = df['Commodity'].value_counts().index[0]
commodity_data = df[df['Commodity'] == sample_commodity].set_index('Date').sort_index()

print(f"\n📈 Sample Resampling for '{sample_commodity}':")
print("-" * 100)

# Daily (current)
print(f"\n1️⃣ Daily Frequency (Current):")
print(f"   Records: {len(commodity_data)}")
print(f"   Sample:\n{commodity_data[['Average']].head()}")

# Weekly resampling
weekly = commodity_data[['Minimum', 'Maximum', 'Average']].resample('W').mean()
print(f"\n2️⃣ Weekly Resampling (Mean prices):")
print(f"   Records: {len(weekly)}")
print(f"   Sample:\n{weekly.head()}")

# Monthly resampling
# 'ME' (month end) replaces the deprecated 'M' alias in pandas >= 2.2; use 'M' on older versions
monthly = commodity_data[['Minimum', 'Maximum', 'Average']].resample('ME').mean()
print(f"\n3️⃣ Monthly Resampling (Mean prices):")
print(f"   Records: {len(monthly)}")
print(f"   Sample:\n{monthly.head()}")

print(f"\n💡 Resampling Options:")
print(f"   • Daily (D) - Current frequency ✅")
print(f"   • Weekly (W) - For weekly trend analysis")
print(f"   • Monthly (ME) - For long-term patterns")
print(f"   • Quarterly (QE) - For seasonal analysis")

print(f"\n📌 Decision: Keep DAILY frequency for detailed analysis")
print(f"   (Can resample later for specific visualizations or modeling)")

print("\n✅ Resampling analysis complete!")

📊 RESAMPLING ANALYSIS

🔍 Current Data Frequency:
   • Data appears to be DAILY (one record per commodity per day)
   • Multiple commodities tracked simultaneously

📈 Sample Resampling for 'Cauli Local':
----------------------------------------------------------------------------------------------------

1️⃣ Daily Frequency (Current):
   Records: 3612
   Sample:
            Average
Date               
2013-06-16     32.5
2013-06-17     27.5
2013-06-18     27.5
2013-06-19     27.5
2013-06-20     22.5

2️⃣ Weekly Resampling (Mean prices):
   Records: 538
   Sample:
            Minimum  Maximum  Average
Date                                 
2013-06-16     30.0     35.0     32.5
2013-06-23     23.0     27.8     25.4
2013-06-30     23.6     29.4     26.5
2013-07-07     35.0     40.0     37.5
2013-07-14     47.0     54.0     50.5

3️⃣ Monthly Resampling (Mean prices):
   Records: 124
   Sample:
              Minimum    Maximum    Average
Date                          
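
Taking the mean of all three columns is one option. Another is to aggregate each column according to its meaning and to track how many days actually contributed to each period. The following sketch uses hypothetical daily values for a single commodity and is only one possible design:

```python
import numpy as np
import pandas as pd

# Hypothetical daily prices for one commodity over two full weeks.
idx = pd.date_range("2013-06-17", periods=14, freq="D")  # starts on a Monday
daily = pd.DataFrame({
    "Minimum": np.arange(20.0, 34.0),
    "Maximum": np.arange(30.0, 44.0),
    "Average": np.arange(25.0, 39.0),
}, index=idx)
daily.iloc[2] = np.nan  # simulate a day with no recorded prices

# Weekly low, weekly high, and mean of the daily averages (NaN days are skipped).
weekly = daily.resample("W").agg({"Minimum": "min", "Maximum": "max", "Average": "mean"})

# Number of days with a recorded price in each week.
weekly["Days_Observed"] = daily["Average"].resample("W").count()
print(weekly)
```

Keeping a `Days_Observed` column (a name chosen here for illustration) makes it easy to discard or down-weight periods that rest on only a few observations.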



---

## Step 6️⃣: Exploratory Data Analysis (EDA)

With dates parsed and the gaps quantified (about 18% of price values are missing and 142 calendar days have no records), let's explore patterns and trends in the data.

### 📅 Temporal Distribution Analysis
Understanding how data is distributed across time periods.

In [13]:
# Temporal Distribution Analysis
print("=" * 100)
print("📅 TEMPORAL DISTRIBUTION ANALYSIS")
print("=" * 100)

print(f"\n📆 Date Range Summary:")
print(f"   Start Date: {df['Date'].min().strftime('%Y-%m-%d')}")
print(f"   End Date: {df['Date'].max().strftime('%Y-%m-%d')}")
print(f"   Total Days: {(df['Date'].max() - df['Date'].min()).days:,}")
print(f"   Total Years: {(df['Date'].max() - df['Date'].min()).days / 365.25:.2f}")

print(f"\n📊 Records Per Year:")
print("-" * 100)
year_counts = df['Year'].value_counts().sort_index()
for year, count in year_counts.items():
    print(f"{year}: {count:7,} records")

print(f"\n📊 Records Per Month:")
print("-" * 100)
month_order = ['January', 'February', 'March', 'April', 'May', 'June',
               'July', 'August', 'September', 'October', 'November', 'December']
month_counts = df['Month_Name'].value_counts().reindex(month_order)
for month, count in month_counts.items():
    print(f"{month:12}: {count:7,} records")

print(f"\n📊 Records Per Day of Week:")
print("-" * 100)
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
day_counts = df['Day_of_Week'].value_counts().reindex(day_order)
for day, count in day_counts.items():
    print(f"{day:12}: {count:7,} records")

print("\n✅ Temporal distribution analysis complete!")

📅 TEMPORAL DISTRIBUTION ANALYSIS

📆 Date Range Summary:
   Start Date: 2013-06-16
   End Date: 2023-09-28
   Total Days: 3,756
   Total Years: 10.28

📊 Records Per Year:
----------------------------------------------------------------------------------------------------
2013:  10,896 records
2014:  22,170 records
2015:  25,084 records
2016:  25,596 records
2017:  24,124 records
2018:  23,295 records
2019:  26,197 records
2020:  26,622 records
2021:  34,685 records
2022:  35,084 records
2023:  27,109 records

📊 Records Per Month:
----------------------------------------------------------------------------------------------------
January     :  24,715 records
February    :  22,520 records
March       :  23,965 records
April       :  22,909 records
May         :  22,651 records
June        :  23,315 records
July        :  24,565 records
August      :  24,159 records
September   :  23,524 records
October     :  21,391 records
November    :  22,737 records
December    :  24,411 records

### 🥬 Commodity Analysis
Exploring the different commodities and their frequency in the dataset.

In [14]:
# Commodity Analysis
print("=" * 100)
print("🥬 COMMODITY ANALYSIS")
print("=" * 100)

print(f"\n📊 Total Unique Commodities: {df['Commodity'].nunique()}")

print(f"\n📈 Top 20 Most Frequently Recorded Commodities:")
print("-" * 100)
top_commodities = df['Commodity'].value_counts().head(20)
for idx, (commodity, count) in enumerate(top_commodities.items(), 1):
    print(f"{idx:2}. {commodity:30} - {count:6,} records")

print(f"\n📊 Unit Types Distribution:")
print(df['Unit'].value_counts())

print(f"\n💡 Sample Commodities by Unit Type:")
for unit in df['Unit'].unique():
    commodities = df[df['Unit'] == unit]['Commodity'].unique()[:5]
    print(f"\n{unit}:")
    for commodity in commodities:
        print(f"  • {commodity}")

🥬 COMMODITY ANALYSIS

📊 Total Unique Commodities: 136

📈 Top 20 Most Frequently Recorded Commodities:
----------------------------------------------------------------------------------------------------
 1. Cauli Local                    -  3,612 records
 2. Ginger                         -  3,612 records
 3. Chilli Dry                     -  3,609 records
 4. Banana                         -  3,604 records
 5. Coriander Green                -  3,603 records
 6. Bamboo Shoot                   -  3,603 records
 7. Potato Red                     -  3,602 records
 8. Brd Leaf Mustard               -  3,602 records
 9. French Bean(Local)             -  3,600 records
10. Cabbage(Local)                 -  3,600 records
11. Carrot(Local)                  -  3,596 records
12. Onion Green                    -  3,593 records
13. Chilli Green                   -  3,592 records
14. Garlic Dry Chinese             -  3,589 records
15. Raddish White(Local)           -  3,588 records
16. Brin

### 💰 Price Analysis
Analyzing price distributions, volatility, and trends.

In [15]:
# Price Analysis
print("=" * 100)
print("💰 PRICE ANALYSIS")
print("=" * 100)

# Overall price statistics
# Note: these figures pool commodities sold in different units, so treat them as rough indicators
print(f"\n📊 Overall Price Statistics:")
print("-" * 100)
print(f"Average Minimum Price: ₨ {df['Minimum'].mean():.2f}")
print(f"Average Maximum Price: ₨ {df['Maximum'].mean():.2f}")
print(f"Average Price: ₨ {df['Average'].mean():.2f}")

print(f"\n📈 Price Range Statistics:")
df['Price_Range'] = df['Maximum'] - df['Minimum']
print(f"Average Price Range: ₨ {df['Price_Range'].mean():.2f}")
print(f"Median Price Range: ₨ {df['Price_Range'].median():.2f}")
print(f"Max Price Range: ₨ {df['Price_Range'].max():.2f}")

# Relative daily spread: (max - min) / average, in percent.
# This is a proxy for volatility, not the coefficient of variation (std / mean).
df['Price_Volatility'] = (df['Price_Range'] / df['Average']) * 100

print(f"\n📊 Top 10 Commodities by Average Price:")
print("-" * 100)
top_priced = df.groupby('Commodity')['Average'].mean().sort_values(ascending=False).head(10)
for idx, (commodity, price) in enumerate(top_priced.items(), 1):
    print(f"{idx:2}. {commodity:30} - ₨ {price:.2f}")

print(f"\n📊 Top 10 Most Volatile Commodities (by relative price spread):")
print("-" * 100)
most_volatile = df.groupby('Commodity')['Price_Volatility'].mean().sort_values(ascending=False).head(10)
for idx, (commodity, volatility) in enumerate(most_volatile.items(), 1):
    print(f"{idx:2}. {commodity:30} - {volatility:.2f}%")

💰 PRICE ANALYSIS

📊 Overall Price Statistics:
----------------------------------------------------------------------------------------------------
Average Minimum Price: ₨ 90.34
Average Maximum Price: ₨ 100.23
Average Price: ₨ 95.28

📈 Price Range Statistics:
Average Price Range: ₨ 9.89
Median Price Range: ₨ 10.00
Max Price Range: ₨ 590.00

📊 Top 10 Commodities by Average Price:
----------------------------------------------------------------------------------------------------
 1. Strawberry                     - ₨ 467.36
 2. Asparagus                      - ₨ 436.77
 3. Kiwi                           - ₨ 372.61
 4. Avocado                        - ₨ 371.75
 5. Mushroom(Button)               - ₨ 349.11
 6. Lime                           - ₨ 322.50
 7. Grapes(Black)                  - ₨ 286.64
 8. Fish Fresh(Rahu)               - ₨ 284.43
 9. Apple(Fuji)                    - ₨ 271.49
10. Chilli Dry                     - ₨ 258.29
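
`Price_Volatility` in the cell above is the daily min-max spread divided by the average price, which is not the same thing as the coefficient of variation (standard deviation over mean across days). A minimal sketch with two made-up commodities, showing that the two measures can rank commodities differently:

```python
import pandas as pd

# Hypothetical prices for two commodities over four days.
prices = pd.DataFrame({
    "Commodity": ["A"] * 4 + ["B"] * 4,
    "Minimum": [10.0, 10.0, 10.0, 10.0, 40.0, 60.0, 30.0, 70.0],
    "Maximum": [20.0, 20.0, 20.0, 20.0, 44.0, 66.0, 33.0, 77.0],
    "Average": [15.0, 15.0, 15.0, 15.0, 42.0, 63.0, 31.5, 73.5],
})

# Within-day spread relative to the average price (what Price_Volatility measures).
prices["Relative_Spread"] = (prices["Maximum"] - prices["Minimum"]) / prices["Average"] * 100

summary = prices.groupby("Commodity").agg(
    mean_spread_pct=("Relative_Spread", "mean"),
    avg_mean=("Average", "mean"),
    avg_std=("Average", "std"),
)

# Coefficient of variation: day-to-day dispersion of the average price.
summary["cv_pct"] = summary["avg_std"] / summary["avg_mean"] * 100
print(summary.round(2))
```

Commodity A has a wide daily spread but a price that never changes from day to day, while B has a narrow daily spread and large day-to-day swings. Which measure is appropriate depends on whether within-day or across-day variation is of interest.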


---

## 📝 Summary

Congratulations! You've completed the 6-step time series analysis workflow:

✅ **Step 1: Data Collection** - Loaded the Kalimati Tarkari dataset  
✅ **Step 2: Datetime Handling** - Converted dates and extracted temporal features  
✅ **Step 3: Initial Data Inspection** - Explored data structure and characteristics  
✅ **Step 4: Missing Values Check** - Identified temporal gaps and missing data  
✅ **Step 5: Resampling** - Analyzed frequency and resampling options  
✅ **Step 6: Exploratory Data Analysis** - Discovered patterns and insights  

### 🎯 Key Findings:

- **Dataset Coverage**: 10+ years of daily price data (June 2013 to September 2023)
- **Scale**: 280,862 records across 136 commodities
- **Temporal Coverage**: 3,615 unique dates; 142 calendar days in the range have no records
- **Missing Prices**: 51,156 records (18.2%) have no minimum, maximum, or average price
- **Most Expensive** (by average price): Strawberry, Asparagus, Kiwi, Avocado
- **Most Volatile**: Cabbage varieties and Cucumber show the largest min-max spread relative to their average price

### 🚀 Next Steps:

You can now proceed with:
- Time series visualization (line plots, seasonal decomposition)
- Statistical modeling (ARIMA, Prophet, etc.)
- Price forecasting and prediction
- Anomaly detection in price patterns
- Correlation analysis between commodities
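
As a starting point for the anomaly detection item above, here is a minimal sketch of a trailing-window z-score on a synthetic price series. The window length (7 days) and threshold (4.0) are illustrative assumptions, not values tuned on the Kalimati data:

```python
import numpy as np
import pandas as pd

# Hypothetical daily average price: a gentle upward trend with one injected spike.
idx = pd.date_range("2023-01-01", periods=30, freq="D")
price = pd.Series(50.0 + 0.5 * np.arange(30), index=idx)
price.iloc[20] = 120.0

# Compare each day with the trailing 7-day window that excludes the day itself.
baseline = price.shift(1).rolling(window=7, min_periods=7)
z = (price - baseline.mean()) / baseline.std()

threshold = 4.0  # illustrative cut-off
anomalies = price[z.abs() > threshold]
print(anomalies)
```

On real prices, seasonality and missing days would need handling first (for example by working per commodity on a reindexed daily series), so this is best read as a template rather than a finished detector.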

---

**Happy Analyzing! 📊**