# Experiment 3: Univariate Plots for Data Exploration Using Matplotlib and Seaborn

## Aim
To explore and visualize the distribution of individual features (univariate data) using Matplotlib and Seaborn, in order to understand each variable's central tendency, spread, shape, and category frequencies.

## Objectives
- Understand univariate data exploration techniques.
- Visualize distributions, counts, and statistical properties of individual variables in a dataset.

## Tools Used
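- **Matplotlib**: Python's core plotting library; used here for titles, axis labels, the pie chart, and displaying figures.
- **Seaborn**: A Python data visualization library built on Matplotlib that provides a high-level interface for drawing attractive and informative statistical graphics.
- **Pandas**: Holds the sample dataset in a DataFrame and computes counts and summary statistics.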

## Implementation

### Step 1: Import Libraries
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
```

### Step 2: Create a Sample Dataset
```python
# Sample dataset for univariate exploration
data = {
    "Age": [22, 25, 47, 52, 46, 56, 28, 34, 42, 39],
    "Salary": [30000, 32000, 47000, 52000, 46000, 58000, 31000, 34000, 42000, 39000],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Male"],
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR", "IT", "HR", "Finance"]
}

# Convert to DataFrame
df = pd.DataFrame(data)
print("Dataset:\n")
print(df)
```
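Before choosing a plot, it can help to confirm which columns pandas treats as numeric and which as categorical: histograms, box plots, and violin plots need numeric data, while count plots and pie charts summarize categories. A minimal sketch using `DataFrame.select_dtypes`; the names `numeric_cols` and `categorical_cols` are illustrative, not part of the original experiment.

```python
import pandas as pd

# Same sample dataset as in Step 2
data = {
    "Age": [22, 25, 47, 52, 46, 56, 28, 34, 42, 39],
    "Salary": [30000, 32000, 47000, 52000, 46000, 58000, 31000, 34000, 42000, 39000],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Male"],
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR", "IT", "HR", "Finance"],
}
df = pd.DataFrame(data)

# Numeric columns suit histograms, box plots, and violin plots
numeric_cols = df.select_dtypes(include="number").columns.tolist()
# Everything else (text labels here) suits count plots and pie charts
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numeric:", numeric_cols)
print("Categorical:", categorical_cols)
```

For this dataset, `Age` and `Salary` come out as numeric and `Gender` and `Department` as categorical, which matches the plot choices in Step 3.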

### Step 3: Visualizing Univariate Data

#### 3.1 Bar Plot
```python
# Count of categories in 'Gender'
sns.countplot(x="Gender", data=df)
plt.title("Gender Distribution")
plt.xlabel("Gender")
plt.ylabel("Count")
plt.show()
```
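`sns.countplot` counts the rows in each category before drawing one bar per category. The same counts can be computed explicitly with `Series.value_counts`, which is handy for checking the plot or annotating the bars. A minimal sketch; only the `Gender` column is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Male"],
})

# One bar per category; bar height = number of rows with that value
gender_counts = df["Gender"].value_counts()
print(gender_counts)

# Every row falls into exactly one category, so the bar heights sum to the row count
assert gender_counts.sum() == len(df)
```

With Matplotlib alone, `plt.bar(gender_counts.index, gender_counts.values)` should draw an equivalent chart.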

#### 3.2 Histogram
```python
# Distribution of 'Age'
sns.histplot(df['Age'], kde=True, bins=5)
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
```
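With `bins=5`, the histogram splits the range of `Age` (22 to 56) into five equal-width intervals. `np.histogram` computes edges and counts the same way, which shows what each bar represents. This sketch assumes Seaborn's default binning (no `binwidth` or `binrange` given), where the edges should match.

```python
import numpy as np
import pandas as pd

ages = pd.Series([22, 25, 47, 52, 46, 56, 28, 34, 42, 39], name="Age")

# Five equal-width bins from min(Age) to max(Age): width = (56 - 22) / 5 = 6.8
counts, edges = np.histogram(ages, bins=5)

# Bins are half-open [left, right), except the last, which also includes its right edge
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {n}")
```

For this dataset the five bars hold 3, 1, 2, 2, and 2 ages respectively.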

#### 3.3 Box Plot
```python
# Box plot for 'Salary'
sns.boxplot(y="Salary", data=df)
plt.title("Salary Box Plot")
plt.ylabel("Salary")
plt.show()
```
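The box spans the first to third quartile (Q1 to Q3) with a line at the median. By default the whiskers reach the most extreme points within 1.5 x IQR of the box (IQR = Q3 - Q1), and anything beyond is drawn as an outlier. A minimal sketch of those quantities with pandas, assuming the default whisker length of 1.5 and linear quantile interpolation (the pandas and NumPy default).

```python
import pandas as pd

salary = pd.Series(
    [30000, 32000, 47000, 52000, 46000, 58000, 31000, 34000, 42000, 39000], name="Salary"
)

q1, median, q3 = salary.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Tukey fences: points outside [q1 - 1.5*IQR, q3 + 1.5*IQR] are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = salary[(salary < lower_fence) | (salary > upper_fence)]

# Whiskers stop at the most extreme observations still inside the fences
inside = salary[(salary >= lower_fence) & (salary <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"Whiskers: {whisker_low} to {whisker_high}; outliers: {outliers.tolist()}")
```

For this dataset no salary lies outside the fences, so the whiskers end at the minimum (30000) and maximum (58000) and no outlier points are drawn.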

#### 3.4 Pie Chart
```python
# Pie chart for 'Department'
dept_counts = df["Department"].value_counts()
plt.pie(dept_counts, labels=dept_counts.index, autopct="%1.1f%%", startangle=90, colors=sns.color_palette("pastel"))
plt.title("Department Distribution")
plt.show()
```
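`autopct="%1.1f%%"` labels each wedge with its percentage of the total, formatted to one decimal place. Those percentages are the category counts divided by the number of rows, which `value_counts(normalize=True)` returns directly. A minimal sketch; only `Department` is rebuilt here.

```python
import pandas as pd

df = pd.DataFrame({
    "Department": ["HR", "IT", "IT", "HR", "Finance", "Finance", "HR", "IT", "HR", "Finance"],
})

# Fraction of rows in each department, converted to percent
shares = df["Department"].value_counts(normalize=True) * 100

for dept, pct in shares.items():
    # Same "%1.1f%%" format string that autopct applies to each wedge label
    print(dept + ": " + "%1.1f%%" % pct)
```

For this dataset the labels come out as 40.0% for HR and 30.0% each for IT and Finance.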

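#### 3.5 Violin Plot
Grouping by `Gender` makes this a comparison of `Age` across two groups rather than a strictly univariate view; `sns.violinplot(y="Age", data=df)` draws the single-variable version.
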
```python
# Violin plot for 'Age' by 'Gender'
sns.violinplot(x="Gender", y="Age", data=df)
plt.title("Age Distribution by Gender")
plt.xlabel("Gender")
plt.ylabel("Age")
plt.show()
```
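Each violin's outline is a kernel density estimate (KDE) of `Age` within one group, drawn on both sides of the group's center line. The sketch below computes a Gaussian KDE by hand with Scott's rule for the bandwidth (h = sample std x n^(-1/5)), which is the kind of estimate Seaborn uses by default. It illustrates the technique only; Seaborn's own implementation adds details such as the `cut` parameter, which controls how far the outline extends past the data. The helper `gaussian_kde_scott` is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [22, 25, 47, 52, 46, 56, 28, 34, 42, 39],
    "Gender": ["Male", "Female", "Male", "Female", "Male", "Female", "Female", "Male", "Female", "Male"],
})

def gaussian_kde_scott(samples, grid):
    """Hypothetical helper: Gaussian KDE evaluated on `grid`, bandwidth from Scott's rule."""
    x = np.asarray(samples, dtype=float)
    h = x.std(ddof=1) * x.size ** (-1 / 5)       # Scott's rule for one dimension
    z = (grid[:, None] - x[None, :]) / h         # distance of each grid point to each sample, in bandwidths
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / h              # average of the kernels, rescaled to a density

grid = np.linspace(0, 100, 1001)                 # ages at which to evaluate the density
densities = {
    gender: gaussian_kde_scott(ages, grid)
    for gender, ages in df.groupby("Gender")["Age"]
}

for gender, dens in densities.items():
    peak_age = grid[dens.argmax()]
    print(f"{gender}: density peaks near age {peak_age:.1f}")
```

With only five ages per group, the bandwidth is wide and the curves are heavily smoothed, so the violin shapes here are rough sketches of each group's distribution.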

### Step 4: Summary and Observations
```python
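# Generate summary statistics.
# describe() reports only numeric columns (Age, Salary) by default;
# include="all" adds count, unique, top, and freq for Gender and Department.
summary = df.describe(include="all")
print("Summary Statistics:\n")
print(summary)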
```