## 1. What is a DataFrame?
A DataFrame is a two-dimensional, size-mutable, and heterogeneous tabular data structure with labeled axes (rows and columns). It’s similar to a spreadsheet or SQL table and is one of the most versatile structures in Pandas.

## 2. Creating a DataFrame
DataFrames can be created in several ways:


### From a Dictionary:
You can create a DataFrame using a dictionary where the keys are column names and the values are lists or arrays.

In [None]:
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}

df = pd.DataFrame(data)

### From a List of Dictionaries:
Another common way to create a DataFrame is from a list of dictionaries.

In [None]:
data = [
    {'Name': 'Alice', 'Age': 25, 'City': 'New York'},
    {'Name': 'Bob', 'Age': 30, 'City': 'Los Angeles'},
    {'Name': 'Charlie', 'Age': 35, 'City': 'Chicago'}
]

df = pd.DataFrame(data)

### From a CSV or Excel File:
You can load data directly from files into a DataFrame using `pd.read_csv()` or `pd.read_excel()`.

In [None]:
df = pd.read_csv('file.csv')
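
Since `file.csv` above is only a placeholder, here is a minimal sketch that can be run without any file on disk. It assumes a small, made-up CSV held in a string (`csv_text` is hypothetical) and relies on the fact that `pd.read_csv()` accepts file-like objects as well as paths.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = """Name,Age,City
Alice,25,New York
Bob,30,Los Angeles
Charlie,35,Chicago
"""

# read_csv accepts a path or any file-like object, such as StringIO.
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_csv.shape)   # (rows, columns)
print(df_csv.dtypes)  # column types inferred by the parser
```

The first line of the text is used as the header row by default, and numeric columns such as `Age` are parsed as numbers rather than strings.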

### From NumPy Arrays:
DataFrames can also be created from NumPy arrays.

In [None]:
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
df = pd.DataFrame(data, columns=['A', 'B', 'C'])

## 3. Indexing and Selecting Data
Pandas provides three main ways to select data: by label, by integer position, and by boolean condition.


### .loc[]: Access rows and columns by labels.

In [None]:
df.loc[0]  # Accesses the row labeled 0 (the first row when the default index is used)
df.loc[:, 'Age']  # Accesses the 'Age' column
df.loc[0, 'Name']  # Accesses the value in the 'Name' column for the row labeled 0

### .iloc[]: Access rows and columns by integer positions.

In [None]:
df.iloc[0]  # Accesses the first row
df.iloc[:, 1]  # Accesses the second column
df.iloc[0, 1]  # Accesses the value at the first row of the second column
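
With the default integer index, `.loc[]` and `.iloc[]` can look interchangeable. The sketch below uses a hypothetical frame (`people`) with string row labels to show where they differ, including how slices are handled.

```python
import pandas as pd

# Hypothetical frame with string row labels instead of the default 0, 1, 2.
people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['x', 'y', 'z'],
)

by_label = people.loc['y', 'Age']   # row labeled 'y', column 'Age'
by_position = people.iloc[1, 1]     # second row, second column

# Label slices include the end label; position slices exclude the end position.
label_slice = people.loc['x':'y']   # rows 'x' and 'y'
position_slice = people.iloc[0:1]   # only the first row
```

Both lookups return Bob's age, but the slices differ in length because label-based slicing is inclusive of its endpoint.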

### Boolean Indexing: Filtering data based on conditions.

In [None]:
df[df['Age'] > 30]  # Returns rows where 'Age' is greater than 30
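
Filters often need more than one condition. A minimal sketch, using a hypothetical `people` frame that mirrors the example data above:

```python
import pandas as pd

people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago'],
})

# Wrap each condition in parentheses and combine with & (and), | (or), ~ (not).
older_not_chicago = people[(people['Age'] >= 30) & (people['City'] != 'Chicago')]

# isin() tests membership in a list of values.
in_selected_cities = people[people['City'].isin(['New York', 'Chicago'])]
```

Python's `and` and `or` keywords do not work here because they cannot be applied element-wise to a Series.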

## 4. DataFrame Operations

### Add/Delete Columns:

In [None]:
df['Salary'] = [50000, 60000, 70000]  # Adding a new column
df.drop('City', axis=1, inplace=True)  # Deleting a column

### Add/Delete Rows:

In [None]:
new_row = pd.DataFrame([{'Name': 'David', 'Age': 40, 'City': 'Miami'}])
df = pd.concat([df, new_row], ignore_index=True)  # Adding a row (DataFrame.append was removed in pandas 2.0)
df.drop(0, axis=0, inplace=True)  # Deleting the row with index label 0

### DataFrame Transpose:
You can transpose a DataFrame (swap rows and columns).

In [None]:
df.T

## 5. Data Cleaning

### Handling Missing Data:

In [None]:
df.dropna()  # Returns a copy without the rows that contain missing values
df.fillna(0)  # Returns a copy with missing values replaced by 0
df['Age'] = df['Age'].fillna(df['Age'].mean())  # Fills missing ages with the column mean

### Data Type Conversion:

In [None]:
df['Age'] = df['Age'].astype(float)

### Removing Duplicates:

In [None]:
df.drop_duplicates(inplace=True)

## 6. Merging and Joining DataFrames

### Concat: Concatenate DataFrames either along rows or columns.

In [None]:
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
pd.concat([df1, df2])
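
The call above stacks rows and keeps each frame's original index, so the labels 0 and 1 appear twice. The sketch below shows two common variations: resetting the index, and concatenating along columns. `df3` is a hypothetical extra frame added for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Stack rows and renumber the index 0..3 instead of repeating 0, 1, 0, 1.
stacked = pd.concat([df1, df2], ignore_index=True)

# Hypothetical frame holding an extra column for the same two rows.
df3 = pd.DataFrame({'C': [9, 10]})

# axis=1 places the frames side by side, aligning rows on the index.
side_by_side = pd.concat([df1, df3], axis=1)
```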

### Merge: Merge DataFrames based on keys (similar to SQL JOIN).

In [None]:
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
pd.merge(df1, df2, on='key', how='inner')
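
Because both frames have a `value` column, the inner merge above renames them to `value_x` and `value_y`. The following sketch, with the frames renamed `left` and `right` for clarity, shows how `suffixes` controls those names and how an outer join with `indicator=True` records where each row came from.

```python
import pandas as pd

left = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
right = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})

# Inner join: only keys present in both frames, with explicit suffixes.
inner = pd.merge(left, right, on='key', how='inner', suffixes=('_left', '_right'))

# Outer join: all keys from both frames; unmatched cells become NaN.
# indicator=True adds a '_merge' column: 'left_only', 'right_only', or 'both'.
outer = pd.merge(left, right, on='key', how='outer', indicator=True)
```

The `how` parameter also accepts `'left'` and `'right'`, which keep every key from one side only.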

## 7. GroupBy Operations

### GroupBy: You can group data by one or more columns and then apply aggregate functions.

In [None]:
df.groupby('City').mean(numeric_only=True)  # Group by 'City' and average the numeric columns
df.groupby('City').agg({'Age': 'mean', 'Salary': 'sum'})  # Apply different aggregations
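
As a self-contained illustration, the sketch below groups a small, made-up `staff` frame by city. It uses named aggregation, an alternative to the dictionary form above that lets each output column be given its own name.

```python
import pandas as pd

# Hypothetical data with two people in the same city.
staff = pd.DataFrame({
    'City': ['New York', 'New York', 'Chicago'],
    'Age': [25, 35, 40],
    'Salary': [50000, 70000, 60000],
})

# Named aggregation: output_name=(input_column, aggregation).
summary = staff.groupby('City').agg(
    mean_age=('Age', 'mean'),
    total_salary=('Salary', 'sum'),
)
```

The group keys become the index of `summary`, so a single result can be read with `summary.loc['New York', 'mean_age']`.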

## 8. Pivot Tables
Pivot tables in Pandas are similar to pivot tables in Excel. They allow you to rearrange and summarize data.

In [None]:
df.pivot_table(values='Age', index='City', columns='Name', aggfunc='mean')
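
A pivot table is easiest to read when each cell summarizes several rows. The sketch below assumes a hypothetical `sales` frame (the `Year` and `Revenue` columns are invented for illustration) and produces one row per city and one column per year.

```python
import pandas as pd

# Hypothetical long-format data: one row per city and year.
sales = pd.DataFrame({
    'City': ['New York', 'New York', 'Chicago', 'Chicago'],
    'Year': [2023, 2024, 2023, 2024],
    'Revenue': [100, 120, 80, 90],
})

# Rows come from 'City', columns from 'Year', cells from summed 'Revenue'.
table = sales.pivot_table(values='Revenue', index='City', columns='Year', aggfunc='sum')
```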

## 9. Handling Dates

### Datetime Conversion: Convert a column to datetime.

In [None]:
df['Date'] = pd.to_datetime(df['Date'])

### DateTimeIndex: You can use date/time as an index.

In [None]:
df.set_index('Date', inplace=True)

### Resampling: Resample time-series data to a different frequency.

In [None]:
df.resample('ME').mean()  # Resample to month-end frequency and calculate the mean (use 'M' on pandas versions before 2.2)
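
The three steps in this section can be combined into one runnable sketch. The `readings` frame and its values are made up for illustration, and the `'MS'` (month start) alias is used here because it labels each month by its first day.

```python
import pandas as pd

# Hypothetical readings with dates stored as strings.
readings = pd.DataFrame({
    'Date': ['2024-01-05', '2024-01-20', '2024-02-10'],
    'Value': [10.0, 20.0, 30.0],
})

# 1. Convert the strings to datetime values.
readings['Date'] = pd.to_datetime(readings['Date'])

# 2. Use the dates as the index so that time-based methods are available.
readings = readings.set_index('Date')

# 3. Average the values within each calendar month.
monthly = readings.resample('MS').mean()
```

The two January readings collapse into a single row, leaving one row per month in `monthly`.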

## 10. Advanced Indexing

### MultiIndex: Pandas supports hierarchical indexing to work with high-dimensional data.

In [None]:
arrays = [
    ['A', 'A', 'B', 'B'],
    [1, 2, 1, 2]
]
df = pd.DataFrame(np.random.randn(4, 2), index=arrays, columns=['Data1', 'Data2'])

### Index Slicing: You can slice DataFrames with MultiIndex.

In [None]:
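df.loc['A']  # All rows whose first index level is 'A'
df.loc[('A', 1):('B', 1)]  # Rows from ('A', 1) through ('B', 1), inclusive; requires a sorted index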
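
Naming the index levels can make selections easier to read. The sketch below is one possible variant: it builds the same two-level index with `pd.MultiIndex.from_product`, uses fixed values instead of random ones so the results are predictable, and the level names `group` and `item` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Two-level index: ('A', 1), ('A', 2), ('B', 1), ('B', 2).
index = pd.MultiIndex.from_product([['A', 'B'], [1, 2]], names=['group', 'item'])
frame = pd.DataFrame(
    np.arange(8).reshape(4, 2), index=index, columns=['Data1', 'Data2']
)

group_a = frame.loc['A']            # both rows of group 'A', indexed by item
one_row = frame.loc[('B', 2)]       # a single row, returned as a Series
item_1 = frame.xs(1, level='item')  # cross-section on the inner level
```

`xs()` is convenient for selecting on an inner level, which `.loc[]` cannot do with a single plain label.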

## 11. Visualization

### Plotting: Simple plots can be created directly from DataFrames.

In [None]:
df.plot(x='Age', y='Salary', kind='scatter')

## 12. Performance Optimization

### Vectorization:
Most Pandas operations are vectorized: they act on whole columns at once in compiled code rather than looping over elements in Python, which makes them much faster on large data.

### Apply/Map:
For custom operations, `apply` and `map` are often used, but they call a Python function once per element (or per row or column), so they are usually slower than the equivalent vectorized operation.

In [None]:
df['Age'].apply(lambda x: x + 1)
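
The `apply` call above has a direct vectorized equivalent. This sketch uses a hypothetical Series (`ages`) to show that both forms give the same values. No timings are claimed here; they can be measured with the standard `timeit` module.

```python
import numpy as np
import pandas as pd

# Hypothetical column of 100,000 integers.
ages = pd.Series(np.arange(100_000))

# Element by element: the lambda is called once per value.
via_apply = ages.apply(lambda x: x + 1)

# Vectorized: a single operation over the whole column.
vectorized = ages + 1

# Both approaches produce the same values.
same_values = bool((via_apply == vectorized).all())
```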

### Memory Usage:
You can check memory usage with `memory_usage()` and reduce it by downcasting numeric columns to smaller data types.

In [None]:
df.memory_usage(deep=True)
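
A minimal sketch of downcasting, assuming a small, made-up frame. It shrinks an integer column with `pd.to_numeric(..., downcast='integer')` and converts a repetitive text column to the `category` type. The actual savings depend on the data, so none are asserted here.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({
    'Age': np.array([25, 30, 35], dtype='int64'),
    'City': ['New York', 'Chicago', 'New York'],
})

before = frame.memory_usage(deep=True).sum()

# Pick the smallest signed integer type that can hold every value (int8 here).
frame['Age'] = pd.to_numeric(frame['Age'], downcast='integer')

# Store each distinct city once and refer to it by a small integer code.
frame['City'] = frame['City'].astype('category')

after = frame.memory_usage(deep=True).sum()
print(before, after)  # total bytes before and after the conversions
```

The `category` type tends to help most when a column has few distinct values repeated across many rows.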

## 13. Common Issues

### SettingWithCopyWarning:
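This warning occurs when you assign to the result of chained indexing, such as `df[df['Age'] > 30]['Age'] = 0`. Pandas cannot tell whether the intermediate result is a view of the original DataFrame or a copy, so the assignment may silently have no effect on the original. Selecting the rows and the column in a single `.loc[]` call removes the ambiguity.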

In [None]:
df.loc[0, 'Age'] = 26  # Direct assignment avoids this warning
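
The same idea extends to conditional updates and to working on subsets. This is a sketch using a hypothetical `people` frame: the first assignment modifies the original in a single `.loc[]` call, and the second takes an explicit copy so that later changes are clearly independent of the original.

```python
import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# A row mask and a column label in a single .loc call modify people itself.
people.loc[people['Age'] > 30, 'Age'] = 0

# Take an explicit copy when an independent subset is wanted.
under_30 = people[people['Age'] < 30].copy()
under_30['Age'] = 99  # changes only the copy, not people
```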