Skip to content

Swansea1234/Pandas-Matplotlib-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 
Β 
Β 

Repository files navigation

Pandas-Matplotlib-Analysis

Data Analysis and Visualization Project

Overview πŸ“Š

This project is a Python-based data analysis and visualization assignment that demonstrates fundamental skills in using the pandas and matplotlib libraries. The goal is to load, clean, analyze, and visualize a dataset to uncover insights and patterns.

The project fulfills the following objectives:

  • Loading and exploring a dataset using pandas.
  • Performing basic statistical analysis and data cleaning.
  • Creating a variety of plots and charts with matplotlib to visualize key findings.

Dataset

This project uses a sample dataset that mimics the Iris dataset, a classic dataset for classification problems. The dataset contains information about various measurements of different iris flower species.

Files in this Repository

  • data_analysis.py: The main Python script containing all the code for data loading, analysis, and visualization.
  • README.md: This file, providing an overview and instructions for the project.

How to Run the Project πŸš€

  1. Clone the Repository:

    git clone [https://github.com/your-username/Data-Analysis-Project.git](https://github.com/your-username/Data-Analysis-Project.git)
    cd Data-Analysis-Project
  2. Install Required Libraries: Make sure you have pandas and matplotlib installed. If not, you can install them using pip:

    pip install pandas matplotlib numpy
  3. Execute the Script: Run the Python script from your terminal. This will perform the analysis and display the generated plots.

    python data_analysis.py

Analysis and Findings πŸ“ˆ

The script performs the following key analyses:

  • Data Exploration: Checks for missing values and data types. Missing numerical values are filled with the mean of their respective columns.
  • Descriptive Statistics: Computes the mean, median, standard deviation, and other statistics for numerical columns to summarize the data.
  • Grouped Analysis: Calculates the average sepal_length and petal_length for each species, revealing significant differences between the groups.

Visualizations

The script generates four plots to visualize the data:

  1. Scatter Plot: Shows the relationship between sepal_length and petal_length, which clearly separates the species.
  2. Bar Chart: Compares the average petal_length across different species.
  3. Histogram: Illustrates the distribution of sepal_width values.
  4. Line Chart: Shows a trend of sepal_length over a dummy time-series, demonstrating how to plot data over time.

Feel free to explore the data_analysis.py file to see the code in detail. Contributions and feedback are welcome!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages