**Introduction to Pandas**

Pandas is a powerful Python library for data manipulation and analysis. It provides data structures (Series and DataFrame) and functions designed for working with structured, tabular data such as spreadsheets, CSV files, and SQL tables.

Key Features:

- Efficient handling of structured data.
- Integrated tools for reading and writing data (for example CSV, Excel, and SQL).
- Data alignment and integrated handling of missing data.
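As a small illustration of the last two features, the sketch below reads CSV data and then adds two Series whose indexes only partly overlap. The CSV is held in an in-memory string (an assumption to keep the example self-contained; `pd.read_csv` also accepts a file path), and the column names and values are made up for illustration. Pandas aligns the values by label, and a label present in only one Series ends up as missing (NaN).

```python
import io

import pandas as pd

# CSV text held in memory; pd.read_csv accepts any file-like object
csv_text = "name,score\nAnna,90\nJohn,85\n"
scores = pd.read_csv(io.StringIO(csv_text))

# Writing back out: with no path argument, to_csv returns the CSV as a string
round_trip = scores.to_csv(index=False)

# Arithmetic aligns on index labels, not on position
a = pd.Series([1, 2, 3], index=['x', 'y', 'z'])
b = pd.Series([10, 20], index=['y', 'z'])
total = a + b  # 'x' has no partner in b, so its result is NaN

print(total)
```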

**Getting Started with Pandas**

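To use Pandas, you need to install it first. From a terminal, run `pip install pandas`; inside a Jupyter notebook cell, use the `%pip` magic so the package is installed into the notebook's own environment:

In [None]:
%pip install pandas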

**Import Pandas in your Python script**:

In [3]:
import pandas as pd

**Creating Data Structures**

**Series**:

A Series is a one-dimensional labeled array; its labels are called the index.

In [4]:
data = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
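Because a Series carries labels, its values can be looked up by label or by position, and arithmetic keeps the labels attached. A short sketch using the same values as the cell above (the variable names other than `data` are only illustrative):

```python
import pandas as pd

data = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

by_label = data['b']        # label-based lookup
by_position = data.iloc[0]  # position-based lookup with .iloc
subset = data[['a', 'c']]   # a list of labels returns a smaller Series
doubled = data * 2          # element-wise arithmetic; the index is kept

print(by_label, by_position)
print(doubled)
```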

**DataFrame**:

A DataFrame is a two-dimensional labeled data structure: a table of rows and columns in which each column can hold a different data type.

In [5]:
data = {'Name': ['John', 'Anna', 'Peter'],
        'Age': [28, 24, 35],
        'City': ['New York', 'Paris', 'Berlin']}
df = pd.DataFrame(data)
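After creating a DataFrame, it is common to take a quick look at its contents and structure. A minimal sketch using the same data as the cell above:

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter'],
        'Age': [28, 24, 35],
        'City': ['New York', 'Paris', 'Berlin']}
df = pd.DataFrame(data)

print(df.head())  # first rows (up to 5 by default)
print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column
df.info()         # column names, non-null counts, dtypes and memory use
```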


**Basic Operations**

**Selection and Filtering**:

In [6]:
# Selecting a column (returns a Series)
ages = df['Age']

# Filtering rows: keep only people older than 25
filtered_df = df[df['Age'] > 25]
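Beyond single columns and single conditions, `.loc` selects by label, `.iloc` selects by integer position, and several conditions can be combined with `&` (and) or `|` (or). A sketch on the same small DataFrame; note that each condition must be wrapped in parentheses because `&` binds more tightly than `>`:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, 24, 35],
                   'City': ['New York', 'Paris', 'Berlin']})

# Several columns at once: pass a list of column names
name_age = df[['Name', 'Age']]

# .loc: rows where the condition holds, and only the 'Name' column
older_names = df.loc[df['Age'] > 25, 'Name']

# .iloc: first two rows, first column (by position)
first_two = df.iloc[:2, 0]

# Combined conditions; each one needs its own parentheses
berlin_over_30 = df[(df['Age'] > 30) & (df['City'] == 'Berlin')]
```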

**Adding/Removing Columns**:

In [7]:
# Adding a new column
df['Salary'] = [50000, 60000, 70000]

# Dropping a column (drop returns a new DataFrame, so reassign the result)
df = df.drop(columns='City')
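New columns are often computed from existing ones; column arithmetic is vectorised, so no explicit loop is needed. The sketch below adds a hypothetical `MonthlySalary` column (not part of the original example) and also shows that `drop` leaves the original DataFrame unchanged unless its result is assigned back:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, 24, 35],
                   'City': ['New York', 'Paris', 'Berlin'],
                   'Salary': [50000, 60000, 70000]})

# Derived column (hypothetical): computed element-wise from 'Salary'
df['MonthlySalary'] = df['Salary'] / 12

# drop() returns a new DataFrame; df itself still has its 'City' column
without_city = df.drop(columns=['City'])
print('City' in df.columns, 'City' in without_city.columns)  # True False
```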


**Data Analysis with Pandas**

**Descriptive Statistics**:

In [8]:
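# Summary statistics (numeric columns only, by default)
summary = df.describe()

# Mean, median, mode
mean_age = df['Age'].mean()
median_age = df['Age'].median()
mode_age = df['Age'].mode()  # returns a Series, since there can be several modes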
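Several statistics can also be requested in one call with `agg`, and `describe(include='all')` extends the summary to non-numeric columns (count, unique, top, freq). A sketch on a small DataFrame with the same columns as at this point in the notebook:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, 24, 35],
                   'Salary': [50000, 60000, 70000]})

# Several statistics for one column in a single call
age_stats = df['Age'].agg(['mean', 'median', 'min', 'max'])

# include='all' adds the text column 'Name' to the summary
full_summary = df.describe(include='all')
print(age_stats)
```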

**Grouping Data**:

In [9]:
import pandas as pd

# Sample DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 28],
        'Salary': [50000, 60000, 70000, 80000]}

df = pd.DataFrame(data)

# Group by 'Age' and calculate the mean of numeric columns only
grouped = df.groupby('Age').mean(numeric_only=True)

print(grouped)


      Salary
Age         
24   60000.0
28   65000.0
35   70000.0
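When each group needs more than one summary, named aggregation in `agg` lets you choose both the statistic and the name of each output column, written as `new_name=(input_column, function)`. A sketch on the same data as the cell above; the output column names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 28],
                   'Salary': [50000, 60000, 70000, 80000]})

# Named aggregation: output column = (input column, aggregation)
summary = df.groupby('Age').agg(
    people=('Name', 'count'),
    avg_salary=('Salary', 'mean'),
    max_salary=('Salary', 'max'),
)
print(summary)
```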


**Handling Missing Data**

Pandas offers several ways to detect, fill, and drop missing data (values stored as `NaN` or `None`).

**Checking for missing data**:

In [12]:
missing_data = df.isnull()
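`isnull()` (also available as `isna()`) returns a Boolean DataFrame of the same shape as `df`. Summing it counts the missing values in each column, and `any(axis=1)` finds the rows with at least one gap. Since the `df` above has no missing values, the sketch uses a small DataFrame with deliberately missing entries (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, np.nan, 35, 28],
                   'Salary': [50000, 60000, np.nan, np.nan]})

# True counts as 1, so the column sums are the numbers of missing values
missing_per_column = df.isna().sum()
print(missing_per_column)

# Rows that contain at least one missing value
rows_with_gaps = df[df.isna().any(axis=1)]
```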

**Filling missing data**:

In [22]:
# Replace missing values in the 'Salary' column with 0 (inplace=True modifies df directly)
df.fillna({'Salary': 0}, inplace=True)
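A constant is not the only possible fill value. The sketch below fills with the column mean (computed over the non-missing values) and, alternatively, forward-fills each gap with the last observed value. The `Day`/`Temp` data is hypothetical, and both calls return new DataFrames instead of modifying `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Day': [1, 2, 3, 4],
                   'Temp': [20.0, np.nan, 22.0, np.nan]})

# Fill with a statistic of the column: mean() skips NaN, giving 21.0 here
filled_mean = df.fillna({'Temp': df['Temp'].mean()})

# Forward fill: carry the last observed value down the column
filled_ffill = df.ffill()
print(filled_ffill)
```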

**Dropping missing data**:

In [23]:
# Drop every row that still contains at least one missing value
df.dropna(inplace=True)
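By default `dropna` removes any row with a missing value, which can discard more data than intended. The sketch below shows three narrower options: `subset` (only check certain columns), `thresh` (keep rows with at least that many non-missing values), and `axis=1` (drop columns instead of rows). The data is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter'],
                   'Age': [28, np.nan, np.nan],
                   'Salary': [50000, 60000, np.nan]})

# Drop rows only when 'Age' is missing
has_age = df.dropna(subset=['Age'])

# Keep rows with at least 2 non-missing values
mostly_complete = df.dropna(thresh=2)

# Drop columns (not rows) that contain any missing value
complete_columns = df.dropna(axis=1)
```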

**Merging and Concatenation**

Pandas allows you to combine data from multiple DataFrames.

**Merging**:

In [24]:
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})
# Inner join by default: only keys present in both DataFrames ('A' and 'B') are kept
merged_df = pd.merge(df1, df2, on='key')
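The `how` parameter controls which keys survive the merge, and `indicator=True` adds a `_merge` column recording whether each row came from the left DataFrame, the right one, or both. A sketch using the same `df1` and `df2`:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key')             # keys in both: A, B
left = pd.merge(df1, df2, on='key', how='left')  # all keys of df1: A, B, C
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)  # A, B, C, D

# '_merge' is 'both', 'left_only' or 'right_only' for each row
print(outer)
```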


**Concatenation**:

In [25]:
# Stack the rows of df1 and df2; columns absent from one DataFrame are filled with NaN
concat_df = pd.concat([df1, df2])
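By default, stacked rows keep their original index labels, so the result above has the index 0, 1, 2, 0, 1, 2. A sketch of two common variations: `ignore_index=True` renumbers the rows, and `axis=1` places the DataFrames side by side, aligning rows on the index:

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value2': [4, 5, 6]})

# Stack rows and renumber the index 0..5
stacked = pd.concat([df1, df2], ignore_index=True)

# Side by side: 3 rows, columns key, value1, key, value2
side_by_side = pd.concat([df1, df2], axis=1)
print(stacked)
```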

**Example Program: Handle and Analyze Data**

Here's an example program that demonstrates data handling and analysis using Pandas:

In [27]:
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London'],
        'Salary': [50000, 60000, 70000, None]}

df = pd.DataFrame(data)

# Fill missing salary with the mean salary
df.fillna({'Salary': df['Salary'].mean()}, inplace=True)

# Group by City and calculate the average age and salary
grouped_df = df.groupby('City')[['Age', 'Salary']].mean()

# Display the original DataFrame and the grouped DataFrame
print("Original DataFrame:\n", df)
print("\nGrouped DataFrame:\n", grouped_df)


Original DataFrame:
     Name  Age      City   Salary
0   John   28  New York  50000.0
1   Anna   24     Paris  60000.0
2  Peter   35    Berlin  70000.0
3  Linda   32    London  60000.0

Grouped DataFrame:
            Age   Salary
City                   
Berlin    35.0  70000.0
London    32.0  60000.0
New York  28.0  50000.0
Paris     24.0  60000.0


**Conclusion**

Pandas is a core library for data manipulation and analysis in Python. By mastering its features, you can efficiently load, clean, combine, and analyze tabular datasets that fit in memory, which makes it a must-learn tool for data science and analysis.