[Reference](https://debonikpal.medium.com/why-pandas-is-the-secret-weapon-for-data-analysis-in-2024-dc7eb0376c5b)

# DataFrames and Series

In [1]:
import pandas as pd

# Creating a DataFrame
data = {'Product': ['Widget A', 'Widget B', 'Widget C'],
        'Price': [9.99, 19.99, 29.99],
        'Quantity': [30, 20, 15]}
df = pd.DataFrame(data)

# Displaying the DataFrame
print(df)

    Product  Price  Quantity
0  Widget A   9.99        30
1  Widget B  19.99        20
2  Widget C  29.99        15
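
The heading mentions Series, but the cell above only builds a DataFrame. As a minimal sketch using the same data, selecting a single column yields a Series; the `revenue` name is an assumption added for illustration and is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C'],
    'Price': [9.99, 19.99, 29.99],
    'Quantity': [30, 20, 15],
})

# Selecting one column returns a Series that shares the DataFrame's index
prices = df['Price']
print(type(prices).__name__)  # Series

# Arithmetic between two Series is element-wise and aligned on the index
revenue = df['Price'] * df['Quantity']  # hypothetical derived values
print(revenue.round(2))
```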


# Data Cleaning and Transformation

In [2]:
# Filling missing values with the column mean.
# Assign the result back to the column: df['Price'].fillna(..., inplace=True)
# acts on an intermediate copy, raises a FutureWarning in recent pandas 2.x
# releases, and will not modify df in pandas 3.0.
df['Price'] = df['Price'].fillna(df['Price'].mean())

# Dropping rows with missing values
df = df.dropna()
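
The `df` built earlier has no missing values, so the cleaning calls leave it unchanged. The sketch below shows the same two steps on a small frame that does contain gaps; the `raw` data is hypothetical and invented only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps (not from the original example)
raw = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget C', None],
    'Price': [10.0, np.nan, 30.0, 20.0],
    'Quantity': [30, 20, 15, 5],
})

# Count missing values per column before cleaning
print(raw.isna().sum())

# Fill missing prices with the mean of the observed prices (NaN is skipped)
raw['Price'] = raw['Price'].fillna(raw['Price'].mean())

# Drop rows that still have a missing value in any column
clean = raw.dropna()
print(clean)
```

Assigning the result back to the column (rather than using `inplace=True` on `raw['Price']`) is the form the pandas warning recommends.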


# Grouping and Aggregation

In [3]:
# Grouping by product and calculating the total quantity
grouped_data = df.groupby('Product')['Quantity'].sum()
print(grouped_data)

Product
Widget A    30
Widget B    20
Widget C    15
Name: Quantity, dtype: int64
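
Because every product appears only once in `df`, the grouped sums equal the original quantities. A minimal sketch with repeated products makes the aggregation visible; the `orders` data and the output column names are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical order lines: some products appear more than once
orders = pd.DataFrame({
    'Product': ['Widget A', 'Widget B', 'Widget A', 'Widget B', 'Widget C'],
    'Price': [9.99, 19.99, 9.99, 19.99, 29.99],
    'Quantity': [30, 20, 10, 5, 15],
})

# Named aggregation: each keyword becomes an output column,
# defined as (source column, aggregation function)
summary = orders.groupby('Product').agg(
    total_quantity=('Quantity', 'sum'),
    order_count=('Quantity', 'count'),
    mean_price=('Price', 'mean'),
)
print(summary)
```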


# Time Series Analysis

In [4]:
# Creating a time series DataFrame
date_range = pd.date_range(start='2024-01-01', periods=5, freq='D')
df_time = pd.DataFrame({'Date': date_range, 'Value': [100, 200, 300, 400, 500]})

# Setting the Date as the index
df_time.set_index('Date', inplace=True)

# Resampling into 2-day bins and averaging the values in each bin
resampled_data = df_time.resample('2D').mean()
print(resampled_data)

            Value
Date             
2024-01-01  150.0
2024-01-03  350.0
2024-01-05  500.0
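
The last row of the output averages a single day, since five daily values do not divide evenly into 2-day bins. As a rough sketch on the same data, adding a `count` next to the `mean` shows how many rows fell into each bin.

```python
import pandas as pd

dates = pd.date_range(start='2024-01-01', periods=5, freq='D')
df_time = pd.DataFrame({'Value': [100, 200, 300, 400, 500]}, index=dates)

# Each 2-day bin is labelled by its left edge; 'count' reports how many
# daily rows were averaged in that bin
bins = df_time['Value'].resample('2D').agg(['count', 'mean'])
print(bins)
```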


# Merging and Joining Datasets

In [5]:
# Merging two DataFrames on ID (inner join: keeps only IDs present in both)
df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 22]})

merged_df = pd.merge(df1, df2, on='ID', how='inner')
print(merged_df)

   ID   Name  Age
0   1  Alice   25
1   2    Bob   30
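
The inner merge drops ID 3 (only in `df1`) and ID 4 (only in `df2`). A minimal sketch of the alternative, assuming you want to keep unmatched rows: an outer merge with `indicator=True` adds a `_merge` column showing where each row came from.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Alice', 'Bob', 'Charlie']})
df2 = pd.DataFrame({'ID': [1, 2, 4], 'Age': [25, 30, 22]})

# how='outer' keeps rows from both sides; missing fields become NaN.
# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
outer_df = pd.merge(df1, df2, on='ID', how='outer', indicator=True)
print(outer_df)
```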
