# Data Analysis with NumPy and Pandas

## Project Overview
This notebook is an introductory reference for core data analysis techniques using Python's two foundational libraries, **NumPy** and **Pandas**. It covers array creation and manipulation, DataFrame inspection and selection, grouping and aggregation, and exploratory data analysis (EDA) with basic plots.

## Tech Stack
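- **NumPy**: the `ndarray`, a fast multidimensional array type, plus vectorized math routines that operate on it.
- **Pandas**: labeled tabular data structures (`Series`, `DataFrame`) and tools for selecting, filtering, grouping, and aggregating data.
- **Matplotlib/Seaborn**: data visualization; Seaborn builds statistical charts on top of Matplotlib.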

---


In [None]:
# Install required packages (run once); %pip installs into the environment of the running kernel
%pip install -q numpy pandas matplotlib seaborn


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Configuration for cleaner output
np.set_printoptions(precision=2, suppress=True)
pd.set_option('display.max_columns', None)
sns.set_theme(style="whitegrid")

## Part 1: NumPy Essentials
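NumPy (Numerical Python) is the foundation for numerical computing in Python. Its `ndarray` holds values of a single data type, and operations on whole arrays run in compiled code instead of Python loops.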
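The practical payoff is vectorization: arithmetic on a whole array is written as a single expression rather than an explicit loop. A minimal sketch comparing the two styles; the `values` list is an illustrative assumption, not data from this notebook.

```python
import numpy as np

# Illustrative input (hypothetical)
values = [1, 2, 3, 4, 5]

# Pure-Python approach: one interpreted operation per element
squares_loop = [v ** 2 for v in values]

# NumPy approach: one vectorized expression over the whole array
arr = np.array(values)
squares_vec = arr ** 2

print(squares_loop)                          # [1, 4, 9, 16, 25]
print(squares_vec.tolist() == squares_loop)  # True
```

Both produce the same numbers; on large arrays the vectorized form is typically much faster because the loop runs inside NumPy's compiled code.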


In [None]:
# 1. Creating Arrays
arr_1d = np.array([1, 2, 3, 4, 5])
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])
arr_zeros = np.zeros((3, 3))
arr_range = np.arange(0, 10, 2)

print("1D Array:", arr_1d)
print("2D Array Shape:", arr_2d.shape)
print("Zeros (3x3):\n", arr_zeros)
print("Range Array (0 to 10, step 2, stop excluded):", arr_range)
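A few other common constructors, shown as a short sketch. Note the contrast with `np.arange`, which excludes its stop value, whereas `np.linspace` includes the endpoint by default. The shapes and fill values below are arbitrary choices for illustration.

```python
import numpy as np

ones = np.ones((2, 3))                       # 2x3 array of 1.0 (float64 by default)
filled = np.full((2, 2), 7)                  # 2x2 array where every element is 7
evenly = np.linspace(0, 1, 5)                # 5 evenly spaced points, endpoint included
identity = np.eye(3)                         # 3x3 identity matrix
ints = np.array([1, 2, 3], dtype=np.int32)   # explicit dtype instead of the default

print(evenly.tolist())            # [0.0, 0.25, 0.5, 0.75, 1.0]
print(np.arange(0, 10, 2).tolist())  # [0, 2, 4, 6, 8]  (10 itself is excluded)
```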

In [None]:
# 2. Array Attributes
print(f"Dimensions (ndim): {arr_2d.ndim}")
print(f"Shape: {arr_2d.shape}")
print(f"Size (total elements): {arr_2d.size}")
print(f"Data Type (dtype): {arr_2d.dtype}")
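The `dtype` attribute also determines memory use. A minimal sketch of converting types with `astype` and checking the footprint via `itemsize` and `nbytes`; the array mirrors `arr_2d` above. Be aware that the default integer dtype is platform-dependent (for example, `int64` on most Linux and macOS builds), so only the float result is fixed here.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# astype returns a new array with the requested dtype; arr itself is unchanged
as_float = arr.astype(np.float64)

# Memory footprint: nbytes equals size (element count) times itemsize (bytes per element)
print(as_float.dtype)      # float64
print(as_float.itemsize)   # 8
print(as_float.nbytes)     # 48
```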

In [None]:
# 3. Reshaping and Indexing
arr_reshaped = np.arange(12).reshape(3, 4)
print("\nReshaped (3x4):\n", arr_reshaped)

# Slicing: rows 0-1, columns 1-2 (the stop index of a slice is excluded)
subset = arr_reshaped[0:2, 1:3]
print("\nSubset:\n", subset)
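One consequence of slicing that often surprises newcomers: a basic slice such as `subset` above is a *view* that shares memory with the original array, while boolean masks return a copy. A small sketch under that behavior; the variable names are illustrative.

```python
import numpy as np

grid = np.arange(12).reshape(3, -1)   # -1 lets NumPy infer the column count (4)

# Basic slicing returns a view: writing through it modifies grid
view = grid[0:2, 1:3]
view[0, 0] = 99
print(grid[0, 1])   # 99

# .copy() produces an independent array
independent = grid[0:2, 1:3].copy()
independent[0, 0] = -1
print(grid[0, 1])   # still 99

# Boolean masking selects elements by condition and returns a copy
evens = grid[grid % 2 == 0]
print(evens.tolist())   # [0, 2, 4, 6, 8, 10]
```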

In [None]:
# 4. Mathematical Operations
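rng = np.random.default_rng(42)  # Seeded generator so results are reproducible
data = rng.standard_normal(1000)  # 1000 samples from the standard normal distribution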
print(f"Mean: {np.mean(data):.4f}")
print(f"Std Dev: {np.std(data):.4f}")
print(f"Max: {np.max(data):.4f}")
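On 2D data, the same aggregations accept an `axis` argument, and broadcasting lets an array of per-column statistics be combined with the full table. A sketch that standardizes each column (z-scores); the seed, mean, and scale are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed 0 is arbitrary
samples = rng.normal(loc=50, scale=10, size=(100, 3))   # 100 rows, 3 columns

# axis=0 aggregates down the rows, giving one value per column (shape (3,))
col_means = samples.mean(axis=0)
col_stds = samples.std(axis=0)

# Broadcasting: the (3,) vectors are stretched across all 100 rows
standardized = (samples - col_means) / col_stds

print(standardized.mean(axis=0))   # approximately [0, 0, 0]
print(standardized.std(axis=0))    # approximately [1, 1, 1]
```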

## Part 2: Pandas Data Analysis
Pandas builds on NumPy to provide labeled data structures: a `Series` (a one-dimensional array with an index of labels) and a `DataFrame` (a table of columns that share a row index).
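The labels matter: arithmetic between two `Series` aligns on index labels, not on position. A minimal sketch with hypothetical headcounts per department:

```python
import pandas as pd

# Hypothetical headcounts for two quarters
q1 = pd.Series({'HR': 2, 'Engineering': 5})
q2 = pd.Series({'Engineering': 3, 'Marketing': 1})

# Labels present in only one Series produce NaN
total = q1 + q2
print(total)

# fill_value treats a missing label as 0 instead
total_filled = q1.add(q2, fill_value=0)
print(total_filled)
```

Alignment avoids silently adding mismatched rows, at the cost of having to decide how to handle labels that appear on only one side.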


In [None]:
# 1. Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'Department': ['HR', 'Engineering', 'Engineering', 'HR', 'Marketing'],
    'Salary': [50000, 80000, 90000, 60000, 45000]
}
df = pd.DataFrame(data)
df
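Transforming data usually means deriving new columns. A sketch that re-creates the sample DataFrame so it runs on its own, then adds a salary-in-thousands column and an age band; the `Salary_k` name and the band edges are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'Department': ['HR', 'Engineering', 'Engineering', 'HR', 'Marketing'],
    'Salary': [50000, 80000, 90000, 60000, 45000],
})

# Vectorized column arithmetic: no loop over rows needed
df['Salary_k'] = df['Salary'] / 1000

# assign() returns a new DataFrame and leaves df unchanged;
# pd.cut bins are right-inclusive: (0, 29], (29, 39], (39, 120]
df_banded = df.assign(
    AgeBand=pd.cut(df['Age'], bins=[0, 29, 39, 120], labels=['<30', '30-39', '40+'])
)
print(df_banded[['Name', 'Age', 'AgeBand', 'Salary_k']])
```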

In [None]:
# 2. Data Inspection
print("--- Info ---")
df.info()

print("\n--- Descriptive Statistics ---")
print(df.describe())
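The sample DataFrame has no gaps, so `info()` reports every column as fully non-null, and `describe()` summarizes only the numeric columns. Real data often has missing values; a common cleaning step is to count them and then either drop or fill them. A sketch on a small hypothetical table (the median fill is one choice among several):

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps, for illustration only
raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, np.nan, 35, 40],
    'Salary': [50000, 80000, np.nan, 60000],
})

# Count missing values per column
print(raw.isna().sum())

# Option 1: drop any row that contains a missing value
dropped = raw.dropna()

# Option 2: fill each numeric gap with that column's median
filled = raw.fillna({'Age': raw['Age'].median(), 'Salary': raw['Salary'].median()})
print(filled)
```

Dropping is simple but discards the other values in those rows; filling keeps every row but introduces estimated values.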

In [None]:
# 3. Filtering and Selection
high_earners = df[df['Salary'] > 55000]
print("High Earners (>55k):\n", high_earners)

engineers = df[df['Department'] == 'Engineering']
print("\nEngineers:\n", engineers)
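Filters can be combined and paired with column selection. A sketch of a few common patterns on the same sample data (re-created so the snippet is self-contained); the particular thresholds are arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'Department': ['HR', 'Engineering', 'Engineering', 'HR', 'Marketing'],
    'Salary': [50000, 80000, 90000, 60000, 45000],
})

# Combine conditions with & (and) / | (or); each condition needs its own parentheses
senior_engineers = df[(df['Department'] == 'Engineering') & (df['Age'] >= 35)]

# .loc selects rows by a boolean mask and columns by label in one step
hr_names = df.loc[df['Department'] == 'HR', ['Name', 'Salary']]

# isin() tests membership against a list of values
non_eng = df[df['Department'].isin(['HR', 'Marketing'])]

# query() expresses a filter as a string expression
young_high = df.query('Age < 35 and Salary > 55000')
print(young_high)
```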

In [None]:
# 4. Grouping and Aggregation
dept_stats = df.groupby('Department')['Salary'].agg(['mean', 'count'])
print("Department Stats:\n", dept_stats)
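Two extensions of `groupby` are often useful: named aggregation, which produces readable output column names, and `transform`, which returns a result aligned to the original rows so each person can be compared with their department. A sketch on the sample data; the output column names are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 22],
    'Department': ['HR', 'Engineering', 'Engineering', 'HR', 'Marketing'],
    'Salary': [50000, 80000, 90000, 60000, 45000],
})

# Named aggregation: output_name=(input_column, aggregation)
summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    headcount=('Name', 'count'),
    oldest=('Age', 'max'),
)
print(summary)

# transform broadcasts each group's mean back onto that group's rows
df['dept_avg_salary'] = df.groupby('Department')['Salary'].transform('mean')
df['vs_dept_avg'] = df['Salary'] - df['dept_avg_salary']
print(df[['Name', 'Department', 'Salary', 'vs_dept_avg']])
```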

## Part 3: Visualization
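Plotting the employee DataFrame from Part 2 to compare departments: a bar chart of mean salary and a pie chart of each department's share of headcount.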


In [None]:
plt.figure(figsize=(8, 5))
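# hue='Department' with legend=False colors each bar without a redundant legend
# (seaborn >= 0.13 warns when palette is passed without hue).
# Bar height is the mean salary; the error bar is a 95% confidence interval by default.
sns.barplot(x='Department', y='Salary', hue='Department', data=df, palette='viridis', legend=False)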
plt.title('Average Salary by Department')
plt.show()

In [None]:
plt.figure(figsize=(6, 6))
df['Department'].value_counts().plot.pie(autopct='%1.1f%%', startangle=90, cmap='Pastel1')
plt.title('Department Distribution')
plt.ylabel('')
plt.show()
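The percentages on the pie chart are each department's share of rows. They can be checked numerically without plotting; a sketch using `value_counts(normalize=True)` on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Department': ['HR', 'Engineering', 'Engineering', 'HR', 'Marketing'],
})

# normalize=True returns proportions instead of counts; scale to percent
shares = df['Department'].value_counts(normalize=True) * 100
print(shares.round(1))
```

With five employees, HR and Engineering each account for 40% and Marketing for 20%, matching the slices drawn above.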

## Conclusion
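This notebook demonstrated core NumPy operations (array creation, attributes, slicing, and summary statistics) and Pandas workflows for structured data (inspection, filtering, and grouping), followed by basic Seaborn and Matplotlib plots. These libraries underpin most data analysis work in Python.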
