# Day 7: Advanced Data Manipulation with Pandas

## Objectives:
- Perform more advanced operations on Pandas DataFrames.
- Understand how to clean and preprocess data.
### Topics to Cover:
#### 1. Handling Missing Data:

Use methods like `isnull()`, `dropna()`, and `fillna()` to identify and manage missing values in your DataFrame.

In [15]:
import pandas as pd

df = pd.read_csv("D:/75 Days Challenge/Day_6_bmi_dataset.csv")
df.head()

,Gender,Height,Weight,Index
0,Male,161,89,4
1,Male,179,127,4
2,Male,172,139,5
3,Male,153,104,5
4,Male,165,68,2


- Identify missing values.

In [16]:
missing_data = df.isnull().sum()
missing_data

Gender    0
Height    0
Weight    0
Index     0
dtype: int64

- Drop rows with any missing values.

In [17]:
df_cleaned = df.dropna()

- Fill missing values with a specific value.

In [None]:
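# 'column_name' is a placeholder; assign the result back instead of using
# inplace=True on a column selection, which recent pandas versions warn about.
df['column_name'] = df['column_name'].fillna(0)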
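
Because the BMI dataset above has no missing values, the three methods are easier to see on data that does. The following is a minimal sketch on a hypothetical toy DataFrame (`toy` and its values are made up for illustration and are not part of the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with gaps; not taken from the BMI dataset.
toy = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Female"],
    "Height": [161.0, np.nan, 172.0, 153.0],
    "Weight": [89.0, 127.0, np.nan, 104.0],
})

# Count missing values in each column.
missing_per_column = toy.isnull().sum()

# Keep only the rows that have no missing values at all.
complete_rows = toy.dropna()

# Fill per column: a label for Gender, the column mean for Height.
# Weight is not listed, so its gap is left untouched.
filled = toy.fillna({"Gender": "Unknown", "Height": toy["Height"].mean()})

print(missing_per_column)
print(complete_rows)
print(filled)
```

Passing a dictionary to `fillna()` lets each column get its own fill value, which is often more sensible than one value for the whole DataFrame.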

---

#### 2. Grouping and Aggregation:

- Use `groupby()` to group data based on one or more columns and then apply aggregation functions like `mean()`, `sum()`, etc.

In [23]:
grouped_data = df.groupby('Gender').mean()
grouped_data

Gender,Height,Weight,Index,Height_MM
Female,170.81,104.93,3.65,1708.1
Male,169.915,107.19,3.825,1699.15
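
When different columns need different aggregations, `agg()` with named aggregation can compute them in one pass. This is a small sketch on a hypothetical sample shaped like the BMI data (the rows and the output column names `mean_height`, `max_weight`, and `n_rows` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical sample with the same column names as the BMI dataset.
sample = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Female"],
    "Height": [161, 179, 170, 150],
    "Weight": [89, 127, 60, 80],
})

# Named aggregation: output_name=(input_column, function).
summary = sample.groupby("Gender").agg(
    mean_height=("Height", "mean"),
    max_weight=("Weight", "max"),
    n_rows=("Height", "count"),
)

print(summary)
```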


---

#### 3. Merging and Joining DataFrames:
- Use `merge()` and `join()` to combine two DataFrames based on common columns or indices.

In [34]:
data1 = {
    'common_column': [1, 2, 3, 4],
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Smartwatch'],
    'Sales': [1000, 500, 300, 250]
}

df1 = pd.DataFrame(data1)

In [35]:
data2 = {
    'common_column': [2, 3, 4, 5],
    'Price': [700, 300, 250, 400],
    'Rating': [4.5, 4.0, 3.5, 5.0]
}

df2 = pd.DataFrame(data2)

In [37]:
merged_df = pd.merge(df1, df2, on='common_column', how='inner')
merged_df

,common_column,Product,Sales,Price,Rating
0,2,Smartphone,500,700,4.5
1,3,Tablet,300,300,4.0
2,4,Smartwatch,250,250,3.5
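
The inner merge above keeps only keys present in both DataFrames. As a sketch of the alternatives, the example below shows an outer merge with `indicator=True` (which records where each row came from) and an index-based `join()`. The names `left` and `right` are illustrative stand-ins for trimmed copies of `df1` and `df2`:

```python
import pandas as pd

# Trimmed, illustrative versions of the two DataFrames used above.
left = pd.DataFrame({"common_column": [1, 2, 3, 4],
                     "Sales": [1000, 500, 300, 250]})
right = pd.DataFrame({"common_column": [2, 3, 4, 5],
                      "Price": [700, 300, 250, 400]})

# Outer merge keeps every key from both sides; the _merge column
# reports 'left_only', 'right_only', or 'both' for each row.
outer = pd.merge(left, right, on="common_column", how="outer", indicator=True)

# join() aligns on the index, so the key is moved into the index first.
left_joined = left.set_index("common_column").join(
    right.set_index("common_column"), how="left"
)

print(outer)
print(left_joined)
```

Rows without a match on the other side get `NaN` in the columns that side would have supplied.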


#### 4. Applying Functions Using `apply()`:
- Use the `apply()` method to apply a custom function to your DataFrame or Series.

In [22]:
def my_function(x):
    return x * 10

df['Height_MM'] = df['Height'].apply(my_function)
df.head()

,Gender,Height,Weight,Index,Height_MM
0,Male,161,89,4,1610
1,Male,179,127,4,1790
2,Male,172,139,5,1720
3,Male,153,104,5,1530
4,Male,165,68,2,1650
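
`apply()` can also work row by row with `axis=1`, which helps when a result depends on several columns. The sketch below assumes Height is in centimetres (consistent with the `Height_MM` conversion above) and Weight is in kilograms, which the source does not state. The `bmi` helper and the `people` DataFrame are hypothetical:

```python
import pandas as pd

# Two illustrative rows with the same columns as the BMI dataset.
people = pd.DataFrame({"Height": [161, 179], "Weight": [89, 127]})

def bmi(row):
    """Body mass index, assuming Height in cm and Weight in kg."""
    height_m = row["Height"] / 100
    return row["Weight"] / height_m ** 2

# axis=1 passes each row to the function as a Series.
people["BMI"] = people.apply(bmi, axis=1)

# For simple arithmetic, a vectorized expression gives the same result
# as apply() with a function and is usually faster.
people["Height_MM"] = people["Height"] * 10

print(people)
```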


#### 5. Working with Dates and Times:
- Use `pd.to_datetime()` (together with Python's `datetime` module where needed) to convert date strings into a proper datetime column, so the values behave as dates rather than text.

In [38]:
data = {
    'Product': ['Laptop', 'Smartphone', 'Tablet', 'Smartwatch'],
    'Sales': [1000, 500, 300, 250],
    'date_column': ['2024-01-15', '2024-02-20', '2024-03-10', '2024-04-25']  # Dates as strings
}

df = pd.DataFrame(data)

In [40]:
df['date_column'] = pd.to_datetime(df['date_column'])
df['date_column']

0   2024-01-15
1   2024-02-20
2   2024-03-10
3   2024-04-25
Name: date_column, dtype: datetime64[ns]
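
Once a column has a datetime dtype, the `.dt` accessor exposes its components. A short sketch reusing the dates from the example above (the `sales` DataFrame and the new column names are illustrative):

```python
import pandas as pd

sales = pd.DataFrame({
    "Sales": [1000, 500, 300, 250],
    "date_column": pd.to_datetime(
        ["2024-01-15", "2024-02-20", "2024-03-10", "2024-04-25"]
    ),
})

# Pull out calendar components with the .dt accessor.
sales["month"] = sales["date_column"].dt.month
sales["weekday"] = sales["date_column"].dt.day_name()

# Subtracting datetimes gives timedeltas; .dt.days converts them to whole days.
sales["days_since_first"] = (
    sales["date_column"] - sales["date_column"].min()
).dt.days

print(sales)
```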

---

### Exercises:
**1. Clean a Dataset**: Load a dataset and handle missing values using the methods discussed.

In [41]:
df = pd.read_csv('D:/75 Days Challenge/Day_6_bmi_dataset.csv')
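# Forward-fill: replace each missing value with the last valid value above it.
# fillna(method='ffill') is deprecated in recent pandas; ffill() is the replacement.
df = df.ffill()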


**2. Group Data**: Group the dataset based on categorical columns and calculate aggregations such as mean or sum.

In [43]:
grouped_data = df.groupby('Gender').agg({'Height': 'mean'})
print(grouped_data)

         Height
Gender         
Female  170.810
Male    169.915


**3. Merge Two DataFrames**: Create two DataFrames and merge them using a common key.

In [45]:
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['A', 'B', 'D'], 'value': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key', how='inner')
merged_df

,key,value_x,value_y
0,A,1,4
1,B,2,5


---