This project performs an in-depth exploratory data analysis (EDA) on crop production across India. Using a cleaned dataset and powerful Python libraries, we explore trends, detect outliers, analyze correlations, and visualize agricultural performance across different regions and seasons.
The analysis is structured around key objectives and follows a clean pipeline:
- Data Loading & Cleaning
- Exploratory Data Analysis (EDA)
- Objective-wise Visual Insights
- Analyze how production varies over time for top crops.
- Visualized using line plots with log-scaled y-axis.
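
The trend table behind such a line plot could be built roughly as follows. This is a minimal sketch, not the project's actual code: the column names `Crop` and `Crop_Year`, the helper `production_trend`, and the sample rows are assumptions made for illustration.

```python
import pandas as pd

def production_trend(df, top_n=5):
    """Yearly total production for the top_n crops by overall production.

    Assumes columns named Crop, Crop_Year and Production (hypothetical names).
    """
    top_crops = df.groupby("Crop")["Production"].sum().nlargest(top_n).index
    subset = df[df["Crop"].isin(top_crops)]
    # Rows are years, columns are crops, values are summed production.
    return subset.pivot_table(index="Crop_Year", columns="Crop",
                              values="Production", aggfunc="sum")

# Made-up sample data, only to show the shape of the result.
sample = pd.DataFrame({
    "Crop": ["Rice", "Rice", "Wheat", "Wheat", "Maize"],
    "Crop_Year": [2000, 2001, 2000, 2001, 2000],
    "Production": [100.0, 120.0, 80.0, 90.0, 5.0],
})
trend = production_trend(sample, top_n=2)
```

With matplotlib installed, `trend.plot(logy=True)` draws one line per crop on a log-scaled y-axis.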
- Yield is calculated as `Production / Area`.
- Compared across top 10 crops to assess efficiency.
- Log-scaled bar plot used for clarity.
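
The yield calculation and crop ranking might look like the sketch below. It assumes a `Crop` column and averages per-row yield for each crop; the source does not say whether yield is averaged per row or computed from totals, so treat that choice, the helper name `top_crops_by_yield`, and the sample data as illustrative.

```python
import numpy as np
import pandas as pd

def top_crops_by_yield(df, top_n=10):
    """Mean of per-row yield (Production / Area) for each crop, highest first."""
    df = df.copy()
    # A zero area would give an infinite yield, so treat it as missing.
    df["Yield"] = df["Production"] / df["Area"].replace(0, np.nan)
    mean_yield = df.groupby("Crop")["Yield"].mean().dropna()
    return mean_yield.nlargest(top_n)

# Made-up sample data; the Crop column name is an assumption.
sample = pd.DataFrame({
    "Crop": ["Rice", "Rice", "Wheat", "Maize"],
    "Area": [10.0, 20.0, 10.0, 0.0],
    "Production": [20.0, 60.0, 10.0, 5.0],
})
ranking = top_crops_by_yield(sample)
```

Crops whose area is always zero drop out of the ranking instead of appearing with an infinite yield.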
- Outliers in `Production` and `Yield` identified using the IQR method.
- Boxplots and scatter plots with log scale used for visualization.
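
As a rough illustration of the IQR method, the sketch below flags values outside the usual fences of 1.5 times the interquartile range. The multiplier `k=1.5` is the conventional default and an assumption here, as are the helper name `iqr_outlier_mask` and the sample values.

```python
import pandas as pd

def iqr_outlier_mask(values, k=1.5):
    """Boolean mask that is True where a value lies outside the IQR fences."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    return (values < lower) | (values > upper)

# Made-up production figures with one obvious extreme value.
production = pd.Series([10, 12, 11, 13, 12, 1000], name="Production")
mask = iqr_outlier_mask(production)
outliers = production[mask]
```

The same mask can be applied to a `Yield` column, and `production[~mask]` keeps only the non-outlying rows.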
- Examines the relationship between `Area` and `Production`.
- Uses Pearson correlation and log-log regression plot.
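
One way to quantify the relationship is sketched below. Note the substitutions: it uses pandas' `Series.corr` (Pearson by default) instead of `scipy.stats`, computes the correlation on log-transformed values, and fits the log-log line with `numpy.polyfit`. Whether the project correlates raw or log values is not stated, so this is an assumption, as is the helper name `loglog_relationship`.

```python
import numpy as np
import pandas as pd

def loglog_relationship(df):
    """Pearson r and least-squares line for log10(Production) vs log10(Area)."""
    # Logarithms need strictly positive inputs, so drop other rows first.
    positive = df[(df["Area"] > 0) & (df["Production"] > 0)]
    log_area = np.log10(positive["Area"])
    log_prod = np.log10(positive["Production"])
    r = log_area.corr(log_prod)  # Series.corr computes Pearson by default
    slope, intercept = np.polyfit(log_area, log_prod, deg=1)
    return r, slope, intercept

# Made-up data where Production is exactly 2 * Area (plus one zero-area row).
sample = pd.DataFrame({
    "Area": [1.0, 10.0, 100.0, 1000.0, 0.0],
    "Production": [2.0, 20.0, 200.0, 2000.0, 5.0],
})
r, slope, intercept = loglog_relationship(sample)
```

A slope near 1 on the log-log scale suggests production grows roughly in proportion to area.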
- Compares production across agricultural seasons.
- Boxplot used to reveal spread, median, and outliers.
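
The numbers a boxplot summarizes (quartiles and median per season) can also be tabulated directly. The sketch below assumes a `Season` column whose labels may carry stray whitespace; the column name, the helper `season_summary`, and the sample rows are illustrative.

```python
import pandas as pd

def season_summary(df):
    """Quartiles of Production per season, the values a boxplot draws."""
    df = df.copy()
    # Strip padding so "Kharif   " and "Kharif" fall into the same group.
    df["Season"] = df["Season"].str.strip()
    stats = df.groupby("Season")["Production"].describe()
    return stats[["25%", "50%", "75%"]]

# Made-up sample data; the Season column name is an assumption.
sample = pd.DataFrame({
    "Season": ["Kharif   ", "Kharif", "Kharif", "Rabi", "Rabi  "],
    "Production": [10.0, 20.0, 30.0, 5.0, 15.0],
})
summary = season_summary(sample)
```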
- Identifies top 20 districts by average production over time.
- Horizontal bar plot with log scaling used.
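
A possible reading of "average production over time" is the mean of each district's yearly totals. The sketch below follows that interpretation; the column names `District_Name` and `Crop_Year` and the helper `top_districts` are assumptions.

```python
import pandas as pd

def top_districts(df, top_n=20):
    """Districts ranked by the mean of their yearly production totals."""
    # First total production per district and year, then average over years.
    yearly = df.groupby(["District_Name", "Crop_Year"])["Production"].sum()
    return yearly.groupby(level="District_Name").mean().nlargest(top_n)

# Made-up sample data; column names are assumptions.
sample = pd.DataFrame({
    "District_Name": ["A", "A", "A", "B", "C", "C"],
    "Crop_Year": [2000, 2000, 2001, 2000, 2000, 2001],
    "Production": [10.0, 20.0, 50.0, 100.0, 1.0, 3.0],
})
ranking = top_districts(sample, top_n=2)
```

With matplotlib installed, `ranking.sort_values().plot(kind="barh", logx=True)` gives a horizontal bar plot with a log-scaled axis.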
- Heatmap showing how the top 10 crops are produced across states.
- Values are log-transformed and annotated for precision.
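
The matrix behind such a heatmap could be prepared as in this sketch. It assumes a `State_Name` column and uses `numpy.log1p` so that state and crop pairs with no production map to zero; the exact transform used in the project is not stated, and the helper `crop_state_matrix` is hypothetical.

```python
import numpy as np
import pandas as pd

def crop_state_matrix(df, top_n=10):
    """log1p of total production, states as rows and top crops as columns."""
    top_crops = df.groupby("Crop")["Production"].sum().nlargest(top_n).index
    subset = df[df["Crop"].isin(top_crops)]
    totals = subset.pivot_table(index="State_Name", columns="Crop",
                                values="Production", aggfunc="sum",
                                fill_value=0)
    # log1p(0) == 0, so missing combinations stay at zero after the transform.
    return np.log1p(totals)

# Made-up sample data; column names are assumptions.
sample = pd.DataFrame({
    "State_Name": ["Kerala", "Punjab", "Punjab"],
    "Crop": ["Rice", "Wheat", "Rice"],
    "Production": [99.0, 9.0, 999.0],
})
matrix = crop_state_matrix(sample)
```

Passing the result to `seaborn.heatmap(matrix, annot=True, fmt=".1f")` would annotate each cell with its log value.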
- Visualizes the top 5 crops by total production share using a pie chart.
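
The shares shown in such a pie chart can be computed as below. This sketch normalizes within the top crops only, which is what a pie chart of five slices displays; the helper name `top_crop_shares` and the sample data are illustrative.

```python
import pandas as pd

def top_crop_shares(df, top_n=5):
    """Percentage share of each top crop, relative to the top_n total."""
    totals = df.groupby("Crop")["Production"].sum().nlargest(top_n)
    return totals / totals.sum() * 100

# Made-up sample data; the Crop column name is an assumption.
sample = pd.DataFrame({
    "Crop": ["Rice", "Rice", "Wheat", "Maize"],
    "Production": [30.0, 20.0, 30.0, 20.0],
})
shares = top_crop_shares(sample, top_n=2)
```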
- `pandas` – Data manipulation
- `numpy` – Numerical operations
- `matplotlib` – Core plotting
- `seaborn` – Enhanced statistical plotting
- `scipy.stats` – Pearson correlation
- `openpyxl` – Excel file support (if applicable)
- Removed rows with missing or zero production.
- Standardized column names and stripped whitespace.
- Handled outliers with the IQR method.
- Applied log scales where appropriate for better visual interpretation.
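
The cleaning steps above might be sketched as follows. The naming convention (trimmed names with underscores in place of spaces) and the helper `clean` are assumptions, since the source does not spell out how column names were standardized.

```python
import numpy as np
import pandas as pd

def clean(df):
    """Tidy column names and text values, then drop unusable production rows."""
    df = df.copy()
    # Assumed convention: trimmed names, spaces replaced by underscores.
    df.columns = df.columns.str.strip().str.replace(" ", "_", regex=False)
    for col in df.columns:
        if pd.api.types.is_string_dtype(df[col]):
            df[col] = df[col].str.strip()
    # Remove rows with missing or zero production.
    df = df.dropna(subset=["Production"])
    return df[df["Production"] > 0].reset_index(drop=True)

# Made-up raw data with padded names and values.
raw = pd.DataFrame({
    " Crop ": ["Rice", "Wheat", "Maize", "Bajra"],
    "Season": ["Kharif     ", "Rabi", "Kharif", "Rabi  "],
    "Production ": [10.0, 0.0, np.nan, 5.0],
})
cleaned = clean(raw)
```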
- Line plots of crop trends
- Log-scaled boxplots for outliers
- Heatmaps showing state-wise crop dominance
- Pie charts summarizing production share