#### Pandas – Grouping, Merging, Handling Missing Data, and Dates

<i>We continue with a small, clean dataset and add scenarios that involve grouping, merging, handling missing values, and working with dates.</i>

##### Import Required Libraries

In [1]:
import pandas as pd
import numpy as np

##### Sample DataFrames

In [2]:
# Main animals dataset
df1 = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat', 'dog', 'snake'],
    'age': [3, 5, np.nan, 7, 4],
    'visits': [2, 3, 2, 1, 2],
    'priority': ['yes', 'no', 'yes', 'no', 'no']
})

# Additional dataset for merging
df2 = pd.DataFrame({
    'animal': ['cat', 'dog', 'snake'],
    'type_code': ['C', 'D', 'S']
})

##### Grouping and Aggregation - Sum the visits for each priority status.

In [3]:
df1.groupby('priority')['visits'].sum()

priority
no     6
yes    4
Name: visits, dtype: int64
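Summing visits and counting rows are easy to confuse, so a minimal sketch of the difference may help. It rebuilds only the two relevant columns of `df1`; the variable names `visit_totals` and `row_counts` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Only the columns needed for this comparison, copied from df1
df1 = pd.DataFrame({
    'visits': [2, 3, 2, 1, 2],
    'priority': ['yes', 'no', 'yes', 'no', 'no'],
})

# Total number of visits in each priority group
visit_totals = df1.groupby('priority')['visits'].sum()

# Number of rows (records) in each priority group
row_counts = df1.groupby('priority').size()

print(visit_totals)
print(row_counts)
```

`.sum()` adds up the values in `visits`, while `.size()` only counts how many rows fall into each group.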

##### Calculate both the mean and max visits for each animal type.

In [4]:
df1.groupby('animal')['visits'].agg(['mean', 'max'])

| animal | mean | max |
|---|---|---|
| cat | 2.0 | 2 |
| dog | 2.0 | 3 |
| snake | 2.0 | 2 |
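If descriptive column names are preferred over `mean` and `max`, named aggregation is one option. The sketch below assumes the same `animal` and `visits` values as `df1`; the output names `mean_visits` and `max_visits` are hypothetical choices.

```python
import pandas as pd

# Same animal/visits values as df1
df1 = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat', 'dog', 'snake'],
    'visits': [2, 3, 2, 1, 2],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = df1.groupby('animal').agg(
    mean_visits=('visits', 'mean'),
    max_visits=('visits', 'max'),
)
print(summary)
```

Each keyword becomes a column in the result, which keeps later code readable when several columns are aggregated at once.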


##### Merging DataFrames

In [5]:
merged_df = pd.merge(df1, df2, on='animal')
merged_df

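| | animal | age | visits | priority | type_code |
|---|---|---|---|---|---|
| 0 | cat | 3.0 | 2 | yes | C |
| 1 | dog | 5.0 | 3 | no | D |
| 2 | cat | NaN | 2 | yes | C |
| 3 | dog | 7.0 | 1 | no | D |
| 4 | snake | 4.0 | 2 | no | S |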
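`pd.merge` performs an inner join by default, so rows without a match in both frames are dropped. The sketch below contrasts that with a left join. The frames `visits` and `codes` are hypothetical stand-ins, and the `parrot` row is an assumption added only to show an unmatched key.

```python
import pandas as pd

# Hypothetical frames: 'parrot' has no entry in the lookup table
visits = pd.DataFrame({'animal': ['cat', 'dog', 'parrot'], 'visits': [2, 3, 1]})
codes = pd.DataFrame({'animal': ['cat', 'dog', 'snake'], 'type_code': ['C', 'D', 'S']})

# Default inner join: only animals present in both frames survive
inner = pd.merge(visits, codes, on='animal')

# Left join: keep every row of `visits`; indicator=True adds a `_merge`
# column, and validate checks that `codes` has unique keys
left = pd.merge(
    visits, codes, on='animal',
    how='left', indicator=True, validate='many_to_one',
)
print(inner)
print(left)
```

In the left join, the `parrot` row is kept with a missing `type_code` and `_merge` equal to `left_only`.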


##### Handling Missing Data - Find rows where the age is missing.

In [6]:
df1[df1['age'].isnull()]

| | animal | age | visits | priority |
|---|---|---|---|---|
| 2 | cat | NaN | 2 | yes |
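Before filtering on one column, it can be useful to see how many values are missing in every column. A minimal sketch, assuming the original `df1` before any filling; the names `missing_per_column` and `rows_with_any_missing` are illustrative.

```python
import numpy as np
import pandas as pd

# df1 as originally defined, with one missing age
df1 = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat', 'dog', 'snake'],
    'age': [3, 5, np.nan, 7, 4],
    'visits': [2, 3, 2, 1, 2],
    'priority': ['yes', 'no', 'yes', 'no', 'no'],
})

# Count of missing values in each column
missing_per_column = df1.isna().sum()

# Rows that have a missing value in any column
rows_with_any_missing = df1[df1.isna().any(axis=1)]

print(missing_per_column)
print(rows_with_any_missing)
```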


##### Fill missing age values with the average age of that animal.

In [9]:
df1['age'] = df1.groupby('animal')['age'].transform(lambda x: x.fillna(x.mean()))
df1

| | animal | age | visits | priority |
|---|---|---|---|---|
| 0 | cat | 3.0 | 2 | yes |
| 1 | dog | 5.0 | 3 | no |
| 2 | cat | 3.0 | 2 | yes |
| 3 | dog | 7.0 | 1 | no |
| 4 | snake | 4.0 | 2 | no |
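The same group-wise fill can be written without a lambda, which some readers find easier to follow. This is an alternative sketch, not the notebook's method; `group_means` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Relevant columns of df1 before filling
df1 = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat', 'dog', 'snake'],
    'age': [3, 5, np.nan, 7, 4],
})

# Mean age of each animal, broadcast back to every row of that group
group_means = df1.groupby('animal')['age'].transform('mean')

# Fill only the missing ages with the matching group mean
df1['age'] = df1['age'].fillna(group_means)
print(df1)
```

If every age in a group were missing, that group's mean would itself be `NaN` and the gap would remain unfilled.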


##### Working with Dates - Create a DataFrame with datetime data.

In [12]:
date_df = pd.DataFrame({
    'animal': ['cat', 'dog', 'snake', 'cat', 'dog'],
    'visit_date': pd.to_datetime(['2023-06-01', '2023-06-02', '2023-06-02', '2023-06-03', '2023-06-03']),
    'visits': [1, 2, 1, 3, 1]
})
date_df

| | animal | visit_date | visits |
|---|---|---|---|
| 0 | cat | 2023-06-01 | 1 |
| 1 | dog | 2023-06-02 | 2 |
| 2 | snake | 2023-06-02 | 1 |
| 3 | cat | 2023-06-03 | 3 |
| 4 | dog | 2023-06-03 | 1 |


##### Extract year and weekday from the visit date:

In [14]:
date_df['year'] = date_df['visit_date'].dt.year
date_df['weekday'] = date_df['visit_date'].dt.day_name()
date_df

| | animal | visit_date | visits | year | weekday |
|---|---|---|---|---|---|
| 0 | cat | 2023-06-01 | 1 | 2023 | Thursday |
| 1 | dog | 2023-06-02 | 2 | 2023 | Friday |
| 2 | snake | 2023-06-02 | 1 | 2023 | Friday |
| 3 | cat | 2023-06-03 | 3 | 2023 | Saturday |
| 4 | dog | 2023-06-03 | 1 | 2023 | Saturday |
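Datetime columns combine naturally with the grouping shown earlier. The sketch below totals visits per day and per weekday, and filters by date. It rebuilds `date_df` so it runs on its own; the names `daily`, `by_weekday`, and `recent` are illustrative.

```python
import pandas as pd

date_df = pd.DataFrame({
    'animal': ['cat', 'dog', 'snake', 'cat', 'dog'],
    'visit_date': pd.to_datetime(
        ['2023-06-01', '2023-06-02', '2023-06-02', '2023-06-03', '2023-06-03']
    ),
    'visits': [1, 2, 1, 3, 1],
})

# Total visits on each calendar date
daily = date_df.groupby('visit_date')['visits'].sum()

# Total visits per weekday name, grouping by a derived Series
by_weekday = date_df.groupby(date_df['visit_date'].dt.day_name())['visits'].sum()

# Rows on or after 2 June 2023; the string is parsed as a date
recent = date_df[date_df['visit_date'] >= '2023-06-02']

print(daily)
print(by_weekday)
print(recent)
```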


##### Add a new row to the DataFrame and then remove it.

In [17]:
df1.loc[len(df1)] = ['parrot', 2, 1, 'yes']
df1

| | animal | age | visits | priority |
|---|---|---|---|---|
| 0 | cat | 3.0 | 2 | yes |
| 1 | dog | 5.0 | 3 | no |
| 2 | cat | 3.0 | 2 | yes |
| 3 | dog | 7.0 | 1 | no |
| 4 | snake | 4.0 | 2 | no |
| 5 | parrot | 2.0 | 1 | yes |


In [18]:
df1 = df1.drop(df1[df1['animal'] == 'parrot'].index)
df1

| | animal | age | visits | priority |
|---|---|---|---|---|
| 0 | cat | 3.0 | 2 | yes |
| 1 | dog | 5.0 | 3 | no |
| 2 | cat | 3.0 | 2 | yes |
| 3 | dog | 7.0 | 1 | no |
| 4 | snake | 4.0 | 2 | no |
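Adding and removing a row can also be done without modifying `df1` in place. The sketch below uses `pd.concat` and a boolean mask as one possible alternative; `new_row`, `with_parrot`, and `without_parrot` are illustrative names.

```python
import pandas as pd

# df1 after the missing age has been filled
df1 = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat', 'dog', 'snake'],
    'age': [3.0, 5.0, 3.0, 7.0, 4.0],
    'visits': [2, 3, 2, 1, 2],
    'priority': ['yes', 'no', 'yes', 'no', 'no'],
})

# Build the new row as a one-row DataFrame and append it
new_row = pd.DataFrame([{'animal': 'parrot', 'age': 2.0, 'visits': 1, 'priority': 'yes'}])
with_parrot = pd.concat([df1, new_row], ignore_index=True)

# Remove it again by keeping every row that is not a parrot
without_parrot = with_parrot[with_parrot['animal'] != 'parrot']

print(with_parrot)
print(without_parrot)
```

Because neither step modifies `df1`, the original frame stays available for later cells.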


##### Filtering with a Boolean Mask and with .query()

In [20]:
# Path I: boolean mask with .between() (inclusive on both ends)
df1[
    df1['age'].between(2,5)
][['animal', 'age', 'visits']].sort_values(by='age', ascending=False)

| | animal | age | visits |
|---|---|---|---|
| 1 | dog | 5.0 | 3 |
| 4 | snake | 4.0 | 2 |
| 0 | cat | 3.0 | 2 |
| 2 | cat | 3.0 | 2 |


In [21]:
# Path II: the same filter expressed with .query()
(
    df1.query('2 <= age <= 5')
    .loc[:, ['animal', 'age', 'visits']]
    .sort_values(by='age', ascending=False)
)

| | animal | age | visits |
|---|---|---|---|
| 1 | dog | 5.0 | 3 |
| 4 | snake | 4.0 | 2 |
| 0 | cat | 3.0 | 2 |
| 2 | cat | 3.0 | 2 |
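`.query()` can also reference Python variables with the `@` prefix and combine several conditions in one expression. A minimal sketch; the thresholds `min_age` and `max_age` and the extra `priority` condition are assumptions added for illustration.

```python
import pandas as pd

# df1 after the missing age has been filled
df1 = pd.DataFrame({
    'animal': ['cat', 'dog', 'cat', 'dog', 'snake'],
    'age': [3.0, 5.0, 3.0, 7.0, 4.0],
    'visits': [2, 3, 2, 1, 2],
    'priority': ['yes', 'no', 'yes', 'no', 'no'],
})

# Hypothetical thresholds held in ordinary Python variables
min_age, max_age = 2, 5

# @name refers to a Python variable; bare names refer to columns
result = df1.query('age >= @min_age and age <= @max_age and priority == "no"')
print(result)
```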


##### Filter rows where the 'Name' column contains the word "and" (case insensitive).

In [23]:
# Sample DataFrame

df = pd.DataFrame({
    'Name': [
        'Tom Hanks',
        'Emma Stone',
        'Ryan Gosling and Emma Stone',
        'Will Smith and Martin Lawrence',
        'Jennifer Lawrence',
        'Ben Affleck'
    ],
    'Movie': [
        'Forrest Gump',
        'La La Land',
        'La La Land',
        'Bad Boys',
        'Hunger Games',
        'Argo'
    ]
})

In [24]:
def contains_and(name: str) -> bool:
    # Match "and" as a whole word, ignoring case, so names such as "Sandra" are not matched
    return 'and' in name.lower().split()

In [25]:
df[df['Name'].apply(contains_and)]

| | Name | Movie |
|---|---|---|
| 2 | Ryan Gosling and Emma Stone | La La Land |
| 3 | Will Smith and Martin Lawrence | Bad Boys |
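The same filter can be sketched with the vectorized `.str.contains()` and a word-boundary regular expression instead of `.apply()`. The sample names below are assumptions chosen for illustration: `Sandra Bullock` and the upper-case `AND` are not in the original data. They show that the substring inside "Sandra" is ignored and that the match is case-insensitive.

```python
import pandas as pd

# Hypothetical sample: 'Sandra' contains the letters "and" but not the word
df = pd.DataFrame({
    'Name': [
        'Tom Hanks',
        'Sandra Bullock',
        'Ryan Gosling and Emma Stone',
        'Will Smith AND Martin Lawrence',
    ]
})

# \b marks a word boundary; case=False makes the match case-insensitive
mask = df['Name'].str.contains(r'\band\b', case=False, regex=True)
result = df[mask]
print(result)
```

If the column might contain missing values, passing `na=False` to `.str.contains()` treats them as non-matches.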
