# 🌸 Exploratory Data Analysis with the Iris Dataset

**Author:** André Lopes Marinho  
**Goal:** Use descriptive statistics and Python to understand the structure and distribution of flower measurements in the Iris dataset.

---

## 📌 What You'll Learn

- What **mean**, **median**, and **mode** are
- How to calculate **variance** and **standard deviation**
- How to summarize a real-world dataset using Python

---

## 📊 Dataset: The Iris Flowers

This famous dataset contains 150 records, 50 for each of **three iris species** (*setosa*, *versicolor*, *virginica*), with four measurements per flower, in centimeters:

- Sepal length & width
- Petal length & width

1. 📁 **Load the data**:

In [1]:
import pandas as pd

url = "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv"
df = pd.read_csv(url)
df.head()

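|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|--------------|-------------|--------------|-------------|---------|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |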
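
As a quick sanity check after loading, it can help to confirm the shape, the numeric columns, and the species labels. The sketch below is only illustrative: it reads the five rows shown above from an in-memory string (the name `sample_csv` is an assumption made here so the example runs offline) instead of downloading the full file. `pd.read_csv` accepts a file-like object as well as a URL.

```python
import io

import pandas as pd

# The five rows shown above, held in memory so no network access is needed.
# `sample_csv` is a hypothetical stand-in for the downloaded file.
sample_csv = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,setosa\n"
    "4.9,3.0,1.4,0.2,setosa\n"
    "4.7,3.2,1.3,0.2,setosa\n"
    "4.6,3.1,1.5,0.2,setosa\n"
    "5.0,3.6,1.4,0.2,setosa\n"
)
sample = pd.read_csv(sample_csv)

# Number of rows and columns in the frame.
n_rows, n_cols = sample.shape

# Names of the columns pandas parsed as numbers.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# How many records each species label has.
species_counts = sample["species"].value_counts()

print(n_rows, n_cols)
print(numeric_cols)
print(species_counts)
```

Running the same three checks on the full `df` should report 150 rows and three species labels instead of one.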


2. 🔍 **Understanding the Data Structure**

Before analyzing or visualizing data, we need to know what we're working with. This step inspects the dataset's **structure, completeness, and summary statistics**.

We'll use the following tools from the `pandas` library:

- `.info()` – to check column names, data types, and non-null counts (which reveal missing values).
- `.describe()` – to generate summary statistics for numeric columns.

In [2]:
df.info()
df.describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


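|       | sepal_length | sepal_width | petal_length | petal_width |
|-------|--------------|-------------|--------------|-------------|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean  | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std   | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min   | 4.3 | 2.0 | 1.0 | 0.1 |
| 25%   | 5.1 | 2.8 | 1.6 | 0.3 |
| 50%   | 5.8 | 3.0 | 4.35 | 1.3 |
| 75%   | 6.4 | 3.3 | 5.1 | 1.8 |
| max   | 7.9 | 4.4 | 6.9 | 2.5 |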


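## 📌 Concept: `.info()` and `.describe()`

- `.info()` lists each column with its data type and non-null count. Here every column reports 150 non-null values, so nothing is missing.
- `.describe()` gives the count, mean, sample standard deviation (`std`), minimum, quartiles (25%, 50%, 75%), and maximum of each numeric column. The 50% row is the median.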
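
To see where the `.describe()` numbers come from, the rows can be recomputed by hand. This is a minimal sketch on a hypothetical five-row sample (the first five `sepal_length` and `petal_length` values from `df.head()`), not on the full dataset. It assumes pandas' default behaviour, where `std` is the sample standard deviation that divides by `n - 1`.

```python
import math

import pandas as pd

# Hypothetical small sample: the first five rows of two columns.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
})

summary = sample.describe()
col = sample["sepal_length"]

# Mean: sum of the values divided by how many there are.
mean_by_hand = col.sum() / len(col)

# Sample variance: squared deviations from the mean, divided by n - 1.
var_by_hand = ((col - mean_by_hand) ** 2).sum() / (len(col) - 1)

# Standard deviation: square root of the variance.
std_by_hand = math.sqrt(var_by_hand)

# The hand-computed values agree with the describe() rows.
assert math.isclose(summary.loc["mean", "sepal_length"], mean_by_hand)
assert math.isclose(summary.loc["std", "sepal_length"], std_by_hand)
assert math.isclose(summary.loc["50%", "sepal_length"], col.median())

# Missing values per column, the same information .info() reports
# indirectly through its non-null counts.
missing_per_column = sample.isna().sum()
print(missing_per_column)
```

The same comparisons can be run against the full `df` by swapping `sample` for `df`.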

