# Pandas Basics in Python

Pandas is a Python library for data manipulation and analysis. It has two core data structures. A **Series** is a one-dimensional labeled array. A **DataFrame** is a two-dimensional table with labeled rows and columns, and each of its columns is a Series.
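Here is a minimal sketch of both structures. The names and values are made up for illustration. It shows that a Series pairs values with index labels, and that selecting one column of a DataFrame gives you a Series.

```python
import pandas as pd

# A Series is a 1-D array of values paired with an index of labels
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")
print(ages["Bob"])   # label-based lookup
print(ages.mean())   # aggregations work directly on a Series

# A DataFrame is a 2-D table; selecting a single column returns a Series
df = pd.DataFrame({"Age": [25, 30, 35]}, index=["Alice", "Bob", "Charlie"])
print(type(df["Age"]))
```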

```python
# Install pandas if not already installed
# !pip install pandas
```

## **Creating a DataFrame**

```python
import pandas as pd

# Create a dummy DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 55000, 70000]
}

df = pd.DataFrame(data)
print("DataFrame:")
print(df)
```

## **Viewing Data**

```python
# View the first few rows (5 by default)
print(df.head())

# Display column names
print(df.columns)

# Shape of the DataFrame as (rows, columns)
print(df.shape)
```
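Beyond `head()`, `columns`, and `shape`, you can inspect data types and summary statistics, and you can select specific rows and columns. The sketch below rebuilds the same dummy data so it runs on its own. Note the difference between label-based `.loc` and position-based `.iloc`: `.loc` includes the end label of a slice, while `.iloc` excludes the end position.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 55000, 70000]
})

# Column data types and non-null counts
df.info()

# Summary statistics (count, mean, std, quartiles) for the numeric columns
print(df.describe())

# .loc selects by label: index labels 1 through 2 (inclusive), two named columns
print(df.loc[1:2, ['Name', 'Salary']])

# .iloc selects by position: rows 0 and 1 (end excluded), first column
print(df.iloc[0:2, 0])
```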

## **Basic Data Manipulation**

```python
# Adding a New Column
df['Bonus'] = df['Salary'] * 0.1
print("\nDataFrame with Bonus:")
print(df)
```
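New columns can also depend on a condition, or be added without modifying the original DataFrame. In this sketch, the `Band` and `Tax` columns, the 60000 threshold, and the 20% rate are all hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Salary': [50000, 60000, 55000, 70000]
})

# np.where evaluates the condition element-wise and picks one value per row
df['Band'] = np.where(df['Salary'] >= 60000, 'Senior', 'Junior')

# assign() returns a new DataFrame and leaves df itself unchanged
with_tax = df.assign(Tax=df['Salary'] * 0.2)
print(with_tax)
```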

```python
# Filtering Data
high_salary = df[df['Salary'] > 55000]
print("\nEmployees with Salary > 55000:")
print(high_salary)
```
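A filter can also combine several conditions. Use `&` (and), `|` (or), and `~` (not) rather than Python's `and`/`or`. Wrap each condition in parentheses, because `&` and `|` bind more tightly than comparison operators. The sketch below also shows `isin()` and the equivalent string form with `query()`. The thresholds are arbitrary examples.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'Salary': [50000, 60000, 55000, 70000]
})

# Both conditions must hold; each one is parenthesized
mid_career = df[(df['Age'] >= 30) & (df['Salary'] > 55000)]
print(mid_career)

# isin() keeps rows whose value appears in the given list
subset = df[df['Name'].isin(['Alice', 'David'])]
print(subset)

# query() expresses the same filter as mid_career in a string expression
same = df.query('Age >= 30 and Salary > 55000')
print(same)
```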

```python
# Grouping and Aggregation
# Every Age value in this data is unique, so each group contains a single row
print("\nAverage Salary by Age:")
print(df.groupby('Age')['Salary'].mean())
```
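Grouping is more informative when groups contain several rows. This sketch adds a hypothetical `Department` column and uses named aggregation, which computes several statistics per group in one call. The output column names `avg_salary` and `headcount` are arbitrary.

```python
import pandas as pd

# Hypothetical 'Department' column so that each group holds more than one row
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Department': ['Sales', 'IT', 'Sales', 'IT'],
    'Salary': [50000, 60000, 55000, 70000]
})

# Named aggregation: new_column=(source_column, aggregation_function)
summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    headcount=('Name', 'count'),
)
print(summary)
```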

## **File Input/Output**

```python
# Save to CSV; index=False skips writing the row index as an extra column
df.to_csv("dummy_data.csv", index=False)

# Load from CSV
new_df = pd.read_csv("dummy_data.csv")
print("\nLoaded DataFrame:")
print(new_df)
```
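This sketch shows why `index=False` matters. It writes to an in-memory `io.StringIO` buffer instead of a file, and keeps the default `index=True`. When the data is read back, the saved index reappears as an unnamed column unless you pass `index_col=0` to restore it as the index.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Salary': [50000, 60000]})

# Write with the default index=True to an in-memory buffer
buffer = io.StringIO()
df.to_csv(buffer)

# Reading it back turns the saved index into an extra 'Unnamed: 0' column
buffer.seek(0)
reloaded = pd.read_csv(buffer)
print(reloaded.columns.tolist())

# index_col=0 restores the first column as the index instead
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)
print(restored.columns.tolist())
```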

## **Advanced Applications**

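```python
# Handling Missing Data
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
    'Salary': [50000, 60000, None, 70000]
}

df = pd.DataFrame(data)

# Fill missing ages with the column mean. Assign the result back instead of
# calling fillna(inplace=True) on a column selection: pandas 2.x warns about
# that pattern, and it does not update df under Copy-on-Write.
df['Age'] = df['Age'].fillna(df['Age'].mean())

# Drop rows that still have missing values (Charlie, whose Salary is missing)
df.dropna(inplace=True)

print("\nDataFrame after handling missing values:")
print(df)
```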
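Before choosing a strategy, it helps to count the missing values in each column. `fillna` also accepts a dict, which lets you apply a different fill value to each column. Filling `Age` with the median and `Salary` with 0 are illustrative choices here, not recommendations.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, None, 35, 40],
    'Salary': [50000, 60000, None, 70000]
})

# Number of missing values per column
print(df.isna().sum())

# Per-column fill values; returns a new DataFrame
filled = df.fillna({'Age': df['Age'].median(), 'Salary': 0})
print(filled)
```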

In [None]:
# Merging DataFrames
df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'ID': [1, 2], 'Salary': [50000, 60000]})

merged_df = pd.merge(df1, df2, on='ID')
print("\nMerged DataFrame:")
print(merged_df)
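In the example above, both tables have the same keys, so the join type makes no difference. When the keys only partly overlap, the `how` parameter decides which rows survive. This sketch uses made-up tables in which ID 3 exists only on the left and ID 4 exists only on the right. `indicator=True` adds a `_merge` column that records the source of each row.

```python
import pandas as pd

employees = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
salaries = pd.DataFrame({'ID': [1, 2, 4], 'Salary': [50000, 60000, 65000]})

# Default how='inner': keep only IDs present in both tables
inner = pd.merge(employees, salaries, on='ID')
print(inner)

# how='left': keep every employee; Charlie has no salary row, so Salary is NaN
left = pd.merge(employees, salaries, on='ID', how='left', indicator=True)
print(left)
```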

## **Visualization with Pandas**

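```python
import matplotlib.pyplot as plt

# Line plot: Salary against Age
df.plot(x='Age', y='Salary', kind='line', title='Salary by Age')
plt.show()

# Bar plot: one bar per employee, labeled by name instead of the row index
df.plot(x='Name', y='Salary', kind='bar', legend=False, title='Salary by Employee')
plt.show()
```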

## **Applications of Pandas**

1. **Data Cleaning and Preprocessing:** Handle missing values, remove duplicate rows, and convert column types.
2. **Exploratory Data Analysis (EDA):** Summarize, filter, and group datasets to find patterns and trends.
3. **Feature Engineering:** Derive new columns, such as ratios, categories, or date parts, to use as model inputs.
4. **Time-Series Analysis:** Resample, shift, and compute rolling statistics on date-indexed data to study trends over time.
5. **Input/Output:** Read and write CSV, Excel, and JSON files, and read from or write to SQL databases.
6. **Integration with Machine Learning Libraries:** Pass cleaned DataFrames to scikit-learn, or convert them to NumPy arrays for TensorFlow or PyTorch models.
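The notebook does not demonstrate two of the cleaning steps listed above: removing duplicates and converting types. The sketch below covers both on a small hypothetical raw table that contains one exact duplicate row and salaries stored as strings, including an unparseable `'n/a'`.

```python
import pandas as pd

# Hypothetical raw data: a duplicated 'Bob' row and numbers stored as strings
raw = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Bob', 'Charlie'],
    'Salary': ['50000', '60000', '60000', 'n/a']
})

# Remove rows that are exact duplicates of an earlier row
clean = raw.drop_duplicates()

# Convert strings to numbers; errors='coerce' turns unparseable values into NaN
clean = clean.assign(Salary=pd.to_numeric(clean['Salary'], errors='coerce'))
print(clean.dtypes)
print(clean)
```

Because NaN is a float value, the converted `Salary` column becomes `float64` rather than an integer type.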