#### Data Manipulation and Analysis with Pandas
Data manipulation and analysis are central to any data science project. Pandas provides functions for cleaning, transforming, and summarizing tabular data. This lesson covers inspecting a DataFrame, handling missing values, renaming columns, changing data types, grouping and aggregating, and merging DataFrames.

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv('data.csv')
# Fetch the first 5 rows
df.head(5)
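
Since `data.csv` itself is not shown in this lesson, the following is a minimal sketch that reads a small in-memory CSV instead. The column names (`Date`, `Product`, `Region`, `Sales`, `Value`) are taken from the cells later in this notebook, but the rows themselves are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content; the real data.csv may look different.
csv_text = """Date,Product,Region,Sales,Value
2023-01-01,A,East,100,10
2023-01-02,B,West,,20
2023-01-03,A,East,250,
"""

# read_csv accepts a file-like object as well as a path.
# parse_dates asks pandas to convert the 'Date' column to datetime.
demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(demo.head())
print(demo.dtypes)
```

Empty fields in the CSV are read as `NaN`, which is why `Sales` and `Value` come back as floating-point columns.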

In [None]:
df.tail(5)

In [None]:
df.info()

In [None]:
# Descriptive statistics (count, mean, std, min, quartiles, max) for the numeric columns
df.describe()


In [None]:
df.dtypes

In [None]:
## Handling Missing Values
# Check which columns in the dataset contain at least one missing value.
df.isnull().any()

In [None]:
# df.isnull() on its own returns a Boolean mask with the same shape as df,
# which is impractical to scan by eye on a large dataset.
# df.isnull()

In [None]:
# How many values are missing in each of the columns
df.isnull().sum()
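
To make the three checks above concrete, here is a small sketch on a hand-built DataFrame. The data is hypothetical and only the column names mirror this lesson. It also shows `isnull().mean()`, which gives the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used instead of data.csv.
sample = pd.DataFrame({
    "Product": ["A", "B", "A", "C"],
    "Sales": [100.0, np.nan, 250.0, np.nan],
    "Value": [10.0, 20.0, np.nan, 40.0],
})

has_missing = sample.isnull().any()      # one Boolean per column
missing_counts = sample.isnull().sum()   # number of missing values per column
missing_share = sample.isnull().mean()   # fraction of missing values per column

print(has_missing)
print(missing_counts)
print(missing_share)
```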

In [None]:
# Fill the missing values with 0 in a new DataFrame.
# df: the original DataFrame, which may contain missing values (NaN, None, etc.).
# .fillna(0): returns a copy in which every missing value is replaced by 0.
# df_filled: the new DataFrame; the original df remains unchanged.
df_filled=df.fillna(0)

In [None]:
# df_filled

In [None]:
df_filled.isnull().any()

In [None]:
### Fill missing values with the mean of the column
df['Sales_fillNA'] = df['Sales'].fillna(df['Sales'].mean())
df
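
The mean fill above works because `Series.mean()` skips missing values by default. A short sketch on a hypothetical `Sales` series shows this, and compares it with a median fill, which can be a more robust choice when the column has outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical values; 600 acts as a mild outlier.
sales = pd.Series([100.0, np.nan, 200.0, np.nan, 600.0], name="Sales")

# mean() ignores NaN: (100 + 200 + 600) / 3 = 300
mean_filled = sales.fillna(sales.mean())

# median() also ignores NaN: the middle of 100, 200, 600 is 200
median_filled = sales.fillna(sales.median())

print(pd.DataFrame({"original": sales, "mean": mean_filled, "median": median_filled}))
```

Both calls return new Series, so `sales` still contains its original missing values.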

In [None]:
df.dtypes

In [None]:
## Renaming Columns
df=df.rename(columns={'Date':'Sales Date'})
df.head()

In [None]:
## Change data types
# Fill missing values with the column mean, then cast to int.
# Note: astype(int) truncates the fractional part; it does not round.
df['Value_new'] = df['Value'].fillna(df['Value'].mean()).astype(int)
df.head()
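
Casting a float column to `int` drops the fractional part, which matters when the fill value is a non-integer mean. The sketch below, on hypothetical values, contrasts truncation with rounding first, and shows the nullable `Int64` dtype as one possible alternative that keeps missing values instead of filling them.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the mean of 1, 2 and 5 is 8/3, about 2.67.
values = pd.Series([1.0, np.nan, 2.0, 5.0], name="Value")
filled = values.fillna(values.mean())

truncated = filled.astype(int)          # 2.67 becomes 2
rounded = filled.round().astype(int)    # 2.67 becomes 3

# Nullable integer dtype: the missing entry stays missing (<NA>).
nullable = values.astype("Int64")

print(truncated.tolist(), rounded.tolist())
print(nullable)
```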

In [None]:
# Apply a function to every element of the column; here each value is doubled.
df['New Value'] = df['Value'].apply(lambda x: x * 2)
df.head()
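
For simple arithmetic such as doubling, `apply` with a lambda and a plain vectorized expression give the same result, and the vectorized form is usually faster on large columns. A minimal sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical values, including one missing entry.
values = pd.Series([1.5, np.nan, 3.0], name="Value")

via_apply = values.apply(lambda x: x * 2)   # calls the lambda once per element
vectorized = values * 2                     # one operation over the whole column

# Series.equals treats NaN in the same position as equal.
print(via_apply.equals(vectorized))
```

`apply` remains useful when the per-element logic cannot be expressed with built-in vectorized operations.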

In [None]:
## Data Aggregation and Grouping
df.head()

In [None]:
# Mean of 'Value' for each product
grouped_mean = df.groupby('Product')['Value'].mean()
print(grouped_mean)

In [None]:
# Sum of 'Value' for each (Product, Region) pair
grouped_sum = df.groupby(['Product', 'Region'])['Value'].sum()
print(grouped_sum)

In [None]:
df.groupby(['Product','Region'])['Value'].mean()
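
Grouping by two columns returns a Series indexed by a two-level `MultiIndex`. The sketch below, on hypothetical data, shows two common ways to reshape that result: `reset_index()` for a flat table and `unstack()` for a wide table with one column per region.

```python
import pandas as pd

# Hypothetical sample data with the same column names as this lesson.
sales = pd.DataFrame({
    "Product": ["A", "A", "B", "B"],
    "Region": ["East", "West", "East", "East"],
    "Value": [10, 20, 30, 50],
})

by_pair = sales.groupby(["Product", "Region"])["Value"].mean()

flat = by_pair.reset_index()        # columns: Product, Region, Value
wide = by_pair.unstack("Region")    # rows: Product, columns: East, West

print(by_pair)
print(flat)
print(wide)
```

Combinations that never occur in the data (product B in the West region here) appear as `NaN` in the wide table.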

In [None]:
## Aggregate with multiple functions
grouped_agg = df.groupby('Region')['Value'].agg(['mean', 'sum', 'count'])
grouped_agg
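
When the output columns need descriptive names, named aggregation is one option. The sketch below uses hypothetical data, and the output names (`mean_value`, `total_value`, `n_rows`) are chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data.
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "West"],
    "Value": [10, 20, 5, 15, 40],
})

# Each keyword is an output column: name=(input column, aggregation function).
summary = sales.groupby("Region").agg(
    mean_value=("Value", "mean"),
    total_value=("Value", "sum"),
    n_rows=("Value", "count"),
).reset_index()

print(summary)
```

Note that `count` counts non-missing values only, so it can be smaller than the group size when the column has gaps.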

In [None]:
### Merging and Joining DataFrames
# Create sample DataFrames that share a 'Key' column
df1 = pd.DataFrame({'Key': ['A', 'B', 'C'], 'Value1': [1, 2, 3]})
df2 = pd.DataFrame({'Key': ['A', 'B', 'D'], 'Value2': [4, 5, 6]})

In [None]:
df1

In [None]:
df2

In [None]:
df3 = pd.DataFrame({'Key': ['A', 'B', 'E'], 'Value3': [4, 5, 6]})

In [None]:
df3

In [None]:
## Merge the DataFrames on the 'Key' column
# Inner join: keep only the keys present in both DataFrames
pd.merge(df1,df2,on="Key",how="inner")

In [None]:
# Outer join: keep the keys from both DataFrames; unmatched cells become NaN
pd.merge(df1, df2, on="Key", how="outer")

In [None]:
# Left join: keep every key from df1 and add matching rows from df2
pd.merge(df1, df2, on="Key", how="left")

In [None]:
# Left join with the arguments swapped: keep every key from df2
pd.merge(df2, df1, on="Key", how="left")

In [None]:
# Right join: keep every key from df2 and add matching rows from df1
pd.merge(df1, df2, on="Key", how="right")
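
To compare the four join types side by side, the sketch below rebuilds the two sample DataFrames (named `left` and `right` here to keep the example self-contained), counts the rows each `how` option produces, and uses `indicator=True` to label where each row of the outer join came from.

```python
import pandas as pd

# Same contents as df1 and df2 above, under illustrative names.
left = pd.DataFrame({"Key": ["A", "B", "C"], "Value1": [1, 2, 3]})
right = pd.DataFrame({"Key": ["A", "B", "D"], "Value2": [4, 5, 6]})

# Number of rows produced by each join type.
row_counts = {
    how: len(pd.merge(left, right, on="Key", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'.
outer = pd.merge(left, right, on="Key", how="outer", indicator=True)
print(outer)
```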

In [None]:
# Sample DataFrames that share two key columns, 'id' and 'group'
df1 = pd.DataFrame({'id': [1, 2, 3], 'group': ['A', 'B', 'A'], 'value1': [10, 20, 30]})
df2 = pd.DataFrame({'id': [1, 2, 4], 'group': ['A', 'B', 'C'], 'value2': [100, 200, 400]})

In [None]:
# Inner join on both keys: rows match only when 'id' and 'group' both agree
merged_df = pd.merge(df1, df2, on=['id', 'group'], how='inner')
merged_df

In [None]:
merged_df = pd.merge(df1, df2, on=['id', 'group'], how='left')
print(merged_df)
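
Two details of the multi-key left join are worth checking. The sketch below repeats the same sample data under the names `left` and `right`, and adds `validate="one_to_one"` as an optional safeguard. That argument is an addition here, not part of the cells above.

```python
import pandas as pd

# Same contents as df1 and df2 in the cells above.
left = pd.DataFrame({"id": [1, 2, 3], "group": ["A", "B", "A"], "value1": [10, 20, 30]})
right = pd.DataFrame({"id": [1, 2, 4], "group": ["A", "B", "C"], "value2": [100, 200, 400]})

# validate raises MergeError if the key combination is duplicated on either side.
left_join = pd.merge(left, right, on=["id", "group"], how="left", validate="one_to_one")

print(left_join)
print(left_join.dtypes)
```

The row with `id` 3 has no partner in `right`, so its `value2` is `NaN`, and the whole `value2` column is converted from integer to float to hold that missing value.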