This project performs data analysis on a mock global sales dataset using the Pandas library in a Python Jupyter environment.
The analysis focuses on key business metrics such as total profit, units sold, and revenue trends over time, providing clear, formatted insights and visualizations.
- Data Loading & Cleaning: Reading data from a CSV file (SalesRecords.csv) and converting the 'Order Date' column to the proper datetime format for time-series analysis.
- Data Aggregation: Using groupby() and sum() to aggregate data by categories such as Item Type and Region.
- Multi-Metric Analysis: Grouping by one category (e.g., Region) to simultaneously analyze two related metrics (Units Sold and Total Profit).
- Data Formatting: Applying display options to clean up large numeric figures, converting scientific notation to readable currency, and using .to_string() for tidy console output.
- Data Visualization: Generating clear bar charts and line plots using matplotlib to visualize trends.
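The loading, aggregation, and formatting steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the rows are made up and read from an in-memory string instead of SalesRecords.csv, and the `%m/%d/%Y` date format is an assumption about how the file stores dates.

```python
import io

import pandas as pd

# Tiny stand-in for SalesRecords.csv (made-up rows, for illustration only).
csv_text = """Region,Item Type,Order Date,Units Sold,Total Revenue,Total Profit
Europe,Snacks,1/15/2017,100,15000.00,5500.00
Europe,Clothes,2/3/2017,200,21000.00,14000.00
Asia,Snacks,2/20/2017,300,45000.00,16500.00
Asia,Clothes,3/9/2017,50,5250.00,3500.00
"""

# Load the data and convert 'Order Date' to datetime.
# The date format is an assumption; adjust it to match the real file.
df = pd.read_csv(io.StringIO(csv_text))
df["Order Date"] = pd.to_datetime(df["Order Date"], format="%m/%d/%Y")

# Single-metric aggregation: total profit per item type.
profit_by_item = df.groupby("Item Type")["Total Profit"].sum()

# Multi-metric aggregation: units sold and total profit per region.
region_summary = df.groupby("Region")[["Units Sold", "Total Profit"]].sum()

# Show floats with thousands separators and two decimals
# instead of scientific notation.
pd.set_option("display.float_format", "{:,.2f}".format)

print(profit_by_item.to_string())
print(region_summary.to_string())
```

With the real dataset, the `io.StringIO(csv_text)` argument would be replaced by the path to SalesRecords.csv.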
- DataAnalysis.ipynb: The main Jupyter notebook containing all the Python code, analysis, and charts.
- SalesRecords.csv: The dataset used for this analysis.
This project requires Python and the following libraries:
- pandas
- NumPy
- matplotlib
You can install the required libraries using pip:
pip install pandas numpy matplotlib
- Upload Files: Upload both the DataAnalysis.ipynb notebook and the SalesRecords.csv data file to your Jupyter or Google Colab workspace.
- Open Notebook: Open DataAnalysis.ipynb.
- Run Cells: Execute the notebook cells sequentially from top to bottom.
The notebook provides the following business reports:
Grouping Field | Metrics Displayed | Visualization |
---|---|---|
Item Type | Total Profit | Bar Chart |
Region | Units Sold, Total Profit | N/A |
Order Month | Total Revenue | Line Plot |
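As a rough sketch of how the Order Month report could be produced, the example below derives a month column from 'Order Date', sums Total Revenue per month, and draws a line plot. The rows are made up for illustration, and the 'Order Month' column built with `dt.to_period("M")` is an assumption; the notebook may derive the month differently.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up orders, for illustration only.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2017-01-15", "2017-01-28", "2017-02-03", "2017-03-09"]
    ),
    "Total Revenue": [15000.0, 5000.0, 21000.0, 5250.0],
})

# Derive a monthly period from the order date (hypothetical column name).
orders["Order Month"] = orders["Order Date"].dt.to_period("M")

# Total revenue per month, sorted chronologically by the period index.
monthly_revenue = orders.groupby("Order Month")["Total Revenue"].sum()

# Line plot of the monthly totals; periods are converted to strings for the x-axis.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(monthly_revenue.index.astype(str), monthly_revenue.values, marker="o")
ax.set_xlabel("Order Month")
ax.set_ylabel("Total Revenue")
ax.set_title("Total Revenue by Order Month")
fig.tight_layout()
```

In a Jupyter notebook the figure renders inline after the cell runs; in a plain script, call `plt.show()` to display it.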