Data Analysis and Visualization with Pandas & Matplotlib
Objective
This project demonstrates how to:
Load and explore a dataset using pandas.
Perform basic data analysis (summary statistics, grouping, and insights).
Create visualizations with matplotlib (and Seaborn for styling).
The work follows the assignment requirements to showcase data wrangling, analysis, and visualization skills.
Project Structure
Ubuntu_Data_Analysis/
├── assignment.ipynb   # Jupyter Notebook (or assignment.py if script)
├── README.md          # Project documentation
├── dataset.csv        # Dataset used (optional, if using a local CSV)
└── images/            # (Optional) Saved charts or screenshots
Requirements
Install the following Python libraries before running the notebook/script:
pip install pandas matplotlib seaborn scikit-learn
How to Run
Clone the repository:
git clone https://github.com//Ubuntu_Data_Analysis.git
cd Ubuntu_Data_Analysis
Open the notebook in Jupyter:
jupyter notebook assignment.ipynb
Or run the script directly:
python assignment.py
Tasks Completed
Task 1: Load and Explore Dataset
Loaded dataset (Iris dataset via sklearn or CSV file).
Displayed first few rows with .head().
Checked dataset info and missing values, then cleaned the data.
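The steps above can be sketched as follows. This is a minimal example that assumes the sklearn route (`load_iris(as_frame=True)`) rather than a local CSV; the helper name `load_iris_frame` is hypothetical and not taken from the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Return the Iris data as a DataFrame with a readable 'species' column."""
    iris = load_iris(as_frame=True)
    df = iris.frame.copy()
    # Map the numeric target (0, 1, 2) to the species names.
    names = {i: str(name) for i, name in enumerate(iris.target_names)}
    df["species"] = df["target"].map(names)
    return df.drop(columns="target")


df = load_iris_frame()
print(df.head())          # first five rows
df.info()                 # column dtypes and non-null counts
print(df.isnull().sum())  # missing values per column

# The sklearn copy of Iris has no missing values, so dropna() changes
# nothing here; it marks where cleaning would happen for a CSV with gaps.
df = df.dropna()
```

If a local `dataset.csv` is used instead, `pd.read_csv("dataset.csv")` would replace the helper, and the column names may differ from the sklearn ones.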
Task 2: Basic Data Analysis
Computed summary statistics with .describe().
Grouped data by species and calculated mean values.
Observed patterns in sepal/petal measurements across species.
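A short sketch of the analysis described above, again assuming the sklearn version of the dataset and its column names (for example `petal length (cm)`). The variable names `summary` and `species_means` are illustrative only.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
names = {i: str(name) for i, name in enumerate(iris.target_names)}
df["species"] = iris.target.map(names)

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df.describe()
print(summary)

# Mean of every measurement, one row per species.
species_means = df.groupby("species").mean(numeric_only=True)
print(species_means)

# Rank species by average petal length to make the pattern explicit.
print(species_means["petal length (cm)"].sort_values())
```

Comparing the rows of `species_means` is one way to spot the per-species differences in sepal and petal measurements.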
Task 3: Data Visualization
Created 4 different types of plots:
Line Chart: Petal length over index.
Bar Chart: Average petal length per species.
Histogram: Distribution of sepal length.
Scatter Plot: Relationship between sepal length and petal length.
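The four plots listed above could be produced along these lines. This is a sketch using plain matplotlib on a single 2x2 figure; the layout, bin count, and titles are assumptions, and the Seaborn styling mentioned earlier is left out to keep the example small.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
names = {i: str(name) for i, name in enumerate(iris.target_names)}
df["species"] = iris.target.map(names)

fig, axes = plt.subplots(2, 2, figsize=(11, 8))

# Line chart: petal length in row order.
axes[0, 0].plot(df.index, df["petal length (cm)"])
axes[0, 0].set(title="Petal length over index",
               xlabel="Row index", ylabel="Petal length (cm)")

# Bar chart: average petal length per species.
means = df.groupby("species")["petal length (cm)"].mean()
axes[0, 1].bar(list(means.index), means.values)
axes[0, 1].set(title="Average petal length per species",
               xlabel="Species", ylabel="Petal length (cm)")

# Histogram: distribution of sepal length.
axes[1, 0].hist(df["sepal length (cm)"], bins=15)
axes[1, 0].set(title="Distribution of sepal length",
               xlabel="Sepal length (cm)", ylabel="Count")

# Scatter plot: sepal length vs. petal length, one series per species.
for name, group in df.groupby("species"):
    axes[1, 1].scatter(group["sepal length (cm)"],
                       group["petal length (cm)"], label=name)
axes[1, 1].set(title="Sepal length vs. petal length",
               xlabel="Sepal length (cm)", ylabel="Petal length (cm)")
axes[1, 1].legend(title="Species")

fig.tight_layout()
```

In a script, `plt.show()` displays the figure; `fig.savefig(...)` with a path inside `images/` would store it for the README (the directory must exist first).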
Example Visualizations
(Include screenshots or save plots here if possible)
Findings & Observations
Setosa flowers tend to have smaller petal lengths compared to Virginica and Versicolor.
The scatter plot shows clear separation between species, suggesting that petal and sepal measurements are good predictors for classification.
Histograms show a roughly normal spread for some features.
Ubuntu Principles Applied
Community: Using open datasets accessible to all.
Respect: Handling errors (e.g., missing data, file errors) gracefully.
Sharing: Organizing results and visualizations for reuse.
Practicality: Building a simple, real-world data analysis tool.
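As an illustration of the "Respect" principle above, graceful handling of file errors and missing data might look like the sketch below. The function name `load_csv_safely` and its messages are hypothetical; the notebook may handle these cases differently.

```python
from typing import Optional

import pandas as pd


def load_csv_safely(path: str) -> Optional[pd.DataFrame]:
    """Read a CSV and drop incomplete rows; return None if it cannot be read."""
    try:
        df = pd.read_csv(path)
    except FileNotFoundError:
        print(f"File not found: {path}")
        return None
    except pd.errors.EmptyDataError:
        print(f"File is empty: {path}")
        return None
    except pd.errors.ParserError as exc:
        print(f"Could not parse {path}: {exc}")
        return None

    # Report missing cells before removing the affected rows.
    missing = int(df.isnull().sum().sum())
    if missing:
        print(f"Dropping rows with missing values ({missing} empty cells found)")
        df = df.dropna()
    return df


# Returns None (with a message) when dataset.csv is absent or unreadable.
df = load_csv_safely("dataset.csv")
```

Returning `None` instead of raising lets the caller fall back to another source, such as the sklearn copy of the dataset.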