## Introduction:

Pandas is a powerful Python library specifically designed for data manipulation and analysis.
It provides high-performance, easy-to-use data structures (Series and DataFrame) and tools for data cleaning, transformation, and exploration.
Pandas is widely used in various fields, including data science, machine learning, and finance.

### Series:

A one-dimensional labeled array that can hold data of any type (numeric, text, categorical, etc.); every value is paired with an index label.
Supports indexing, slicing, filtering, and vectorized mathematical operations.

In [1]:
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5])
print(s)

0    1
1    2
2    3
3    4
4    5
dtype: int64
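The Series operations listed above can be sketched on a small example. The values and the string labels 'a' to 'd' are made up for illustration. The sketch shows label access, label slicing (which includes the end label), position access with .iloc, boolean filtering, and arithmetic that aligns on the index.

```python
import pandas as pd

# A Series with explicit string labels instead of the default 0..n-1 index
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])

# Label-based access and label slicing (a label slice includes its end label)
print(s['b'])          # 20
print(s['b':'c'])      # the rows labeled b and c

# Position-based access with .iloc
print(s.iloc[0])       # 10

# Boolean filtering keeps only the values that satisfy the condition
print(s[s > 15])       # the rows labeled b, c and d

# Arithmetic is vectorized: the operation applies to every element
print(s * 2)
print(s.sum())         # 100

# Arithmetic between two Series aligns on labels; unmatched labels give NaN
other = pd.Series([1, 2], index=['a', 'z'])
aligned = s + other
print(aligned)         # a is 11.0; b, c, d and z are NaN
```

Index alignment explains why the result of the last operation has five rows even though neither input does.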


### DataFrame:

Two-dimensional labeled data structure with rows and columns.
Different columns can hold different data types, while the values within a single column share one dtype.
Supports operations such as selection, filtering, grouping, merging, and pivoting.

In [2]:
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'city': ['New York', 'Los Angeles', 'Chicago']}
df = pd.DataFrame(data)
print(df)

      name  age         city
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
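You can check the per-column typing described above with df.dtypes. This is a minimal sketch, and the height_m column is a hypothetical addition used only to get a float column. The exact dtype reported for the text column depends on the pandas version (object in pandas 2.x), so the sketch only asserts the numeric ones.

```python
import pandas as pd

# Each column gets its own dtype; values within one column share that dtype
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [25, 30, 35],
    'height_m': [1.65, 1.80, 1.75],   # hypothetical column for illustration
})

# One dtype per column: strings for name, int64 for age, float64 for height_m
print(df.dtypes)

# Selecting one column returns a Series carrying that column's dtype
print(df['age'].dtype)    # int64
print(df.shape)           # (3, 3): 3 rows, 3 columns
```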


## Data Manipulation Techniques:

### Selection and Filtering:

Accessing specific rows or columns by label (.loc) or by integer position (.iloc).
Filtering data based on conditions using boolean masks.

In [3]:
# Select rows with age greater than 30
df_filtered = df[df['age'] > 30]
print(df_filtered)

      name  age     city
2  Charlie   35  Chicago
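A short sketch of label-based and position-based selection, plus a filter that combines conditions. It rebuilds the same three-row DataFrame so that it runs on its own. Each condition needs its own parentheses because & and | bind more tightly than comparisons in Python.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35],
                   'city': ['New York', 'Los Angeles', 'Chicago']})

# A single column gives a Series; a list of columns gives a DataFrame
names = df['name']
subset = df[['name', 'city']]

# .loc selects by label: rows 0 through 1 (inclusive) and two named columns
first_two = df.loc[0:1, ['name', 'age']]

# .iloc selects by integer position: last row, first column
last_name = df.iloc[-1, 0]   # 'Charlie'

# Combine conditions with & (and) or | (or), each wrapped in parentheses
mask = (df['age'] >= 30) & (df['city'] != 'Chicago')
print(df[mask])              # only Bob's row

# .loc also accepts a boolean mask together with a column selection
print(df.loc[df['age'] < 30, 'city'])
```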


### Aggregation:

Calculating summary statistics like mean, median, sum, count, etc.
Grouping data by one or more columns and applying aggregation functions.

In [4]:
# Calculate average age
average_age = df['age'].mean()
print(average_age)

30.0
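The example above covers a single statistic. The sketch below also shows grouping, using a small hypothetical sales table. It computes several statistics at once with agg, totals per group with groupby, and uses named aggregation to produce readable output column names when grouping by two columns.

```python
import pandas as pd

# Hypothetical sales records, used only to illustrate grouping
sales = pd.DataFrame({
    'region': ['East', 'West', 'East', 'West', 'East'],
    'product': ['A', 'A', 'B', 'B', 'A'],
    'units': [10, 4, 7, 3, 5],
})

# Several summary statistics of one column at once
print(sales['units'].agg(['mean', 'median', 'sum', 'count']))

# Group by one column, then aggregate: total units per region
per_region = sales.groupby('region')['units'].sum()
print(per_region)            # East 22, West 7

# Group by two columns and name each aggregated output column
summary = sales.groupby(['region', 'product']).agg(
    total_units=('units', 'sum'),
    orders=('units', 'count'),
)
print(summary)
```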


### Transformation:

Applying functions to data to create new columns or modify existing ones.
Using built-in methods such as apply, map, and replace.

In [5]:
# Derive a new column from an existing one: age in months
df['age_in_months'] = df['age'] * 12
print(df)

      name  age         city  age_in_months
0    Alice   25     New York            300
1      Bob   30  Los Angeles            360
2  Charlie   35      Chicago            420
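Here is a minimal sketch of the three methods named above, applied to the same three-row DataFrame. The city-to-state lookup and the age groups are made up for illustration. map uses a dict to look up each value (unmatched values become NaN), apply runs a function on every element, and replace substitutes only the listed values.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35],
                   'city': ['New York', 'Los Angeles', 'Chicago']})

# Vectorized arithmetic is the simplest (and usually fastest) transformation
df['age_next_year'] = df['age'] + 1

# Series.map with a dict: look up each value (unmatched values become NaN)
state = {'New York': 'NY', 'Los Angeles': 'CA', 'Chicago': 'IL'}  # hypothetical lookup
df['state'] = df['city'].map(state)

# Series.apply with a function: call it on every element
df['age_group'] = df['age'].apply(lambda a: 'under 30' if a < 30 else '30+')

# replace: substitute specific values and leave all others unchanged
df['city'] = df['city'].replace({'Los Angeles': 'LA'})

print(df)
```

When a vectorized expression exists, as in the first step, it is generally preferable to apply, which calls a Python function once per element.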


### Joining and Merging:

Combining data from multiple DataFrames based on common keys.
Choosing the join type (inner, outer, left, right) with the how parameter; pd.merge performs an inner join by default.

In [6]:
# Merge two DataFrames based on a common column
df1 = pd.DataFrame({'key': [1, 2, 3], 'value1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'key': [2, 3, 4], 'value2': ['d', 'e', 'f']})
merged_df = pd.merge(df1, df2, on='key')
print(merged_df)

   key value1 value2
0    2      b      d
1    3      c      e
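The example above uses the default inner join, which keeps only keys 2 and 3. The sketch below runs the same two DataFrames through all four join types. It also uses indicator=True, which adds a _merge column recording which side each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 2, 3], 'value1': ['a', 'b', 'c']})
df2 = pd.DataFrame({'key': [2, 3, 4], 'value2': ['d', 'e', 'f']})

# how= picks the join type; rows without a match get NaN in the other side's columns
results = {how: pd.merge(df1, df2, on='key', how=how)
           for how in ['inner', 'left', 'right', 'outer']}
for how, result in results.items():
    print(f"how='{how}'")
    print(result)

# Keys kept: inner [2, 3], left [1, 2, 3], right [2, 3, 4], outer [1, 2, 3, 4]

# indicator=True adds a _merge column: 'left_only', 'right_only' or 'both'
flagged = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(flagged)
```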


### Reshaping:

Pivoting data to change its structure (e.g., from long to wide format).
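Using methods such as pivot_table and melt. Unlike pivot, pivot_table aggregates values that share the same index/column pair (using the mean by default).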

In [7]:
# Pivot the DataFrame to create a wide format
pivoted_df = df.pivot_table(index='city', columns='name', values='age')
print(pivoted_df)

name         Alice   Bob  Charlie
city                             
Chicago        NaN   NaN     35.0
Los Angeles    NaN  30.0      NaN
New York      25.0   NaN      NaN
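melt goes the other way, turning a wide table into a long one. This is a minimal sketch on a hypothetical table of test scores. It melts the subject columns into rows and then uses pivot to restore the wide layout. pivot only reshapes and raises an error if a student/subject pair appears twice, which is when pivot_table's aggregation is needed instead.

```python
import pandas as pd

# Hypothetical wide table: one row per student, one column per subject
wide = pd.DataFrame({
    'student': ['Alice', 'Bob'],
    'math': [90, 80],
    'physics': [85, 95],
})

# melt turns the subject columns into (subject, score) rows: wide to long
long = wide.melt(id_vars='student', var_name='subject', value_name='score')
print(long)

# pivot reverses it: long to wide (each student/subject pair must be unique)
back = long.pivot(index='student', columns='subject', values='score')
print(back)
```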
