# Introduction to Pandas

Pandas is a Python library for data manipulation and analysis, built on top of NumPy. Its two core data structures, Series and DataFrame, are designed for structured (tabular) data such as spreadsheets or database tables.

## What is Pandas?
- **Purpose**: Simplifies data manipulation, cleaning, and analysis with intuitive data structures.
- **Key Features**:
  - **Series**: 1D labeled arrays.
  - **DataFrame**: 2D labeled data structures (like tables).
  - Data cleaning, grouping, merging, and time-series functionality.
  - Integration with NumPy and Matplotlib.
- **Installation**: If Pandas is not already installed, run `pip install pandas`.

## 1. Getting Started with Pandas
Import Pandas with the standard alias `pd`.

In [None]:
import pandas as pd
import numpy as np  # Often used with Pandas

## 2. Core Data Structures
### Series
A Series is a 1D labeled array: each value is paired with an index label, and missing values are represented as `NaN`.

In [None]:
# Create a Series
s = pd.Series([1, 3, 5, np.nan, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])
print(s)
print(s['c'])  # Access a value by its index label
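
As a small sketch of how a Series behaves beyond single-label access, the cell below reuses the same values and labels as above. It illustrates element-wise arithmetic, NaN-skipping reductions, and label slicing; the variable names (`doubled`, `subset`, and so on) are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])

# Arithmetic is applied element-wise; a missing value stays NaN
doubled = s * 2

# Reductions such as sum() and mean() skip NaN by default
total = s.sum()
average = s.mean()  # mean of the 5 non-missing values

# Slicing by label includes BOTH endpoints ('b', 'c', and 'd')
subset = s['b':'d']

print(doubled)
print(total, average)
print(subset)
```

Note that label slices differ from ordinary Python slices, which exclude the end position.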

### DataFrame
A DataFrame is a 2D table with rows and columns, like a spreadsheet.

In [None]:
# Create a DataFrame from a dictionary
data = {'Name': ['Alice', 'Bob', 'Cathy'], 'Age': [25, 30, 22], 'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
print(df)

# Create from NumPy array
df_array = pd.DataFrame(np.random.rand(3, 2), columns=['A', 'B'])
print(df_array)
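
To make the "2D table" description concrete, here is a minimal sketch that rebuilds the same dictionary-based DataFrame and inspects its shape and column types. It also shows that selecting a single column yields a Series.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cathy'],
                   'Age': [25, 30, 22],
                   'City': ['New York', 'London', 'Paris']})

print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # one dtype per column

# Each column of a DataFrame is itself a Series sharing the row index
age = df['Age']
print(type(age).__name__)
```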

## 3. DataFrame Operations
### Viewing Data

In [None]:
print(df.head(2))  # First 2 rows
print(df.columns)  # Column names
df.info()          # DataFrame info (prints directly and returns None, so no print() needed)
print(df.describe())  # Summary statistics

### Indexing and Selection
- `loc`: Label-based indexing (uses index and column labels).
- `iloc`: Integer position-based indexing (uses 0-based row and column positions).

In [None]:
print(df['Name'])  # Select a column
print(df.loc[0])   # Select the row whose index label is 0
print(df.iloc[0, 1])  # Select the element at row 0, column 1 (by position)
print(df[df['Age'] > 25])  # Keep only rows where Age > 25
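
With the default integer index, `loc` and `iloc` can look interchangeable. The sketch below assigns hypothetical string labels (`'x'`, `'y'`, `'z'`, not part of the original example) to make the difference visible, and also shows how to combine two filter conditions.

```python
import pandas as pd

# The index labels 'x', 'y', 'z' are an assumption made for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cathy'], 'Age': [25, 30, 22]},
                  index=['x', 'y', 'z'])

by_label = df.loc['y', 'Age']     # row labeled 'y', column labeled 'Age'
by_position = df.iloc[1, 1]       # second row, second column

label_slice = df.loc['x':'y']     # label slices include the end label
position_slice = df.iloc[0:1]     # position slices exclude the end position

# Combine conditions with & (and) or | (or); each needs its own parentheses
filtered = df[(df['Age'] > 22) & (df['Name'] != 'Bob')]

print(by_label, by_position)
print(label_slice)
print(position_slice)
print(filtered)
```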

## 4. Data Manipulation
### Adding/Modifying Columns

In [None]:
df['Salary'] = [50000, 60000, 45000]
print(df)

# Apply function to a column
df['Age'] = df['Age'].apply(lambda x: x + 1)
print(df)
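
For simple arithmetic like the increment above, `apply` with a lambda is not required. As a sketch, the same result can be obtained with a vectorized expression that operates on the whole column at once; the Series `ages` here is a stand-in for `df['Age']`.

```python
import pandas as pd

ages = pd.Series([25, 30, 22], name='Age')

via_apply = ages.apply(lambda x: x + 1)   # calls the function once per value
vectorized = ages + 1                     # one operation on the whole column

print(via_apply.equals(vectorized))
```

`apply` remains useful when the per-value logic cannot be expressed with built-in column operations.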

### Handling Missing Data

In [None]:
df_with_nan = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})
print(df_with_nan)
print(df_with_nan.fillna(0))  # Fill NaN with 0
print(df_with_nan.dropna())   # Drop rows with NaN
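
Before filling or dropping, it often helps to see where the gaps are. The following sketch counts missing values per column and then applies column-specific handling; the fill values `0` and `-1` are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df_with_nan = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan]})

# Count missing values in each column
missing_per_column = df_with_nan.isna().sum()

# Drop a row only if column 'A' is missing
only_a_required = df_with_nan.dropna(subset=['A'])

# Fill each column with its own replacement value
filled = df_with_nan.fillna({'A': 0, 'B': -1})

print(missing_per_column)
print(only_a_required)
print(filled)
```

Both `fillna` and `dropna` return new objects here, so `df_with_nan` itself is left unchanged.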

## 5. Grouping and Merging
### Grouping

In [None]:
df['Department'] = ['HR', 'IT', 'HR']
grouped = df.groupby('Department').mean(numeric_only=True)
print(grouped)
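
A group can also be summarized with several statistics at once. This sketch rebuilds a DataFrame equivalent to the one above (ages already incremented) and uses named aggregation; the output column names `mean_salary` and `headcount` are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cathy'],
                   'Age': [26, 31, 23],
                   'Salary': [50000, 60000, 45000],
                   'Department': ['HR', 'IT', 'HR']})

# Each keyword names an output column: (input column, aggregation)
summary = df.groupby('Department').agg(
    mean_salary=('Salary', 'mean'),
    headcount=('Name', 'count'),
)
print(summary)
```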

### Merging

In [None]:
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Bonus': [1000, 2000]})
merged = pd.merge(df, df2, on='Name', how='left')
print(merged)
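
The `how` argument decides which rows survive a merge. As a sketch using the same two tables (reduced to the columns that matter), the cell below compares a left join with an inner join and uses `indicator=True` to show where each row came from.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cathy'], 'Age': [26, 31, 23]})
df2 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Bonus': [1000, 2000]})

# Left join: keep every row of df; Cathy has no match, so her Bonus is NaN
left = pd.merge(df, df2, on='Name', how='left')

# Inner join: keep only names present in both tables
inner = pd.merge(df, df2, on='Name', how='inner')

# indicator=True adds a '_merge' column describing each row's origin
flagged = pd.merge(df, df2, on='Name', how='left', indicator=True)

print(left)
print(inner)
print(flagged)
```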

## 6. Reading/Writing Data
Pandas supports reading from and writing to various formats like CSV, Excel, and JSON.

In [None]:
# Example: Write to CSV (commented out to avoid file I/O in this notebook)
# df.to_csv('data.csv', index=False)
# df_from_csv = pd.read_csv('data.csv')
# print(df_from_csv)
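
To try the CSV round trip without touching the file system, one option is an in-memory text buffer. This is a sketch using `io.StringIO` from the standard library in place of the `'data.csv'` path; the small DataFrame is redefined so the cell runs on its own.

```python
import io

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cathy'], 'Age': [25, 30, 22]})

# Write CSV text into an in-memory buffer instead of a file
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and read it back as if it were a file
buffer.seek(0)
df_from_csv = pd.read_csv(buffer)

print(buffer.getvalue())
print(df_from_csv)
```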

## 7. Practical Example: Data Analysis
Build a small DataFrame of continent areas and compute the average area.

In [None]:
data = {'Continent': ['Africa', 'Asia', 'Europe'], 'Area': [11.74, 10.44, 1.94]}
df_area = pd.DataFrame(data)
print(df_area)
print('Average Area:', df_area['Area'].mean())
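
The same dataset supports a few more one-line analyses. As a sketch, the cell below finds the continent with the largest area, adds each row's share of the total, and sorts the table; the `Share` column and variable names are additions for illustration, and the area values are taken as given.

```python
import pandas as pd

df_area = pd.DataFrame({'Continent': ['Africa', 'Asia', 'Europe'],
                        'Area': [11.74, 10.44, 1.94]})

# idxmax() returns the index label of the largest value
largest = df_area.loc[df_area['Area'].idxmax(), 'Continent']

# Vectorized division: each area as a fraction of the total
df_area['Share'] = df_area['Area'] / df_area['Area'].sum()

# Sort from largest to smallest area
ranked = df_area.sort_values('Area', ascending=False)

print('Largest:', largest)
print(ranked)
```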

## 8. Exercises
1. Create a DataFrame with 5 rows of student data (Name, Grade, Subject) and filter for grades > 80.
2. Group a DataFrame by a column and compute the sum of another column.
3. Merge two DataFrames on a common column.
4. Handle missing data in a DataFrame by filling NaN with the column mean.

### Example Solution for Exercise 1

In [None]:
students = pd.DataFrame({'Name': ['Alice', 'Bob', 'Cathy', 'David', 'Eve'],
                         'Grade': [85, 92, 78, 95, 88],
                         'Subject': ['Math', 'Math', 'Science', 'Math', 'Science']})
print(students[students['Grade'] > 80])

## 9. Tips for Learning Pandas
- **Practice**: Work with real datasets (e.g., from Kaggle).
- **Documentation**: Refer to pandas.pydata.org for detailed guides.
- **Integration**: Combine with NumPy and Matplotlib for full data analysis workflows.
- **Efficiency**: Use vectorized operations and avoid loops where possible.