#### Pandas Fundamentals – Creating, Selecting, Filtering and Summarizing DataFrames
<i>We will use a simple, clean dataset (think of it as records from an animal shelter) to demonstrate the key concepts clearly.</i>

##### Import Required Libraries

```python
import pandas as pd
import numpy as np
```

##### Create a Sample DataFrame

```python
# Column name -> list of values (one per row); np.nan marks a missing age
data = {
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no']
}

# Row labels 'a' to 'j' replace the default 0..9 integer index
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(data, index=labels)
df
```

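| | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| c | snake | 0.5 | 2 | no |
| d | dog | NaN | 3 | yes |
| e | dog | 5.0 | 2 | no |
| f | cat | 2.0 | 3 | no |
| g | snake | 4.5 | 1 | no |
| h | cat | NaN | 1 | yes |
| i | dog | 7.0 | 2 | no |
| j | dog | 3.0 | 1 | no |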


##### Basic DataFrame Info - Column types, non-null counts and memory usage:

```python
df.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
Index: 10 entries, a to j
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   animal    10 non-null     object 
 1   age       8 non-null      float64
 2   visits    10 non-null     int64  
 3   priority  10 non-null     object 
dtypes: float64(1), int64(1), object(2)
memory usage: 400.0+ bytes
```
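`info()` reports non-null counts. When you want the number of *missing* values instead, counting them directly is quicker. The following minimal sketch rebuilds the sample data so it runs on its own:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

# isna() marks each missing cell as True; sum() counts the Trues per column
missing = df.isna().sum()
print(missing)  # age: 2, every other column: 0
```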


##### Statistical summary of the numeric columns:

```python
df.describe()
```

| | age | visits |
|---|---|---|
| count | 8.0 | 10.0 |
| mean | 3.4375 | 1.9 |
| std | 2.007797 | 0.875595 |
| min | 0.5 | 1.0 |
| 25% | 2.375 | 1.0 |
| 50% | 3.0 | 2.0 |
| 75% | 4.625 | 2.75 |
| max | 7.0 | 3.0 |
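`describe()` ignores missing values. The `count` for `age` is 8 rather than 10, and the mean is taken over those 8 values only. The following short sketch checks that arithmetic on the same sample data:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

age = df['age']
print(age.count(), len(age))  # 8 10: count() skips NaN, len() does not
print(age.mean())             # 3.4375 = 27.5 / 8, the mean describe() reports
print(age.sum() / len(age))   # 2.75: dividing by all 10 rows understates the mean
```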


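##### Selecting and Viewing Data - Show the first 3 rows

Both expressions below return the first three rows. In a notebook cell, only the result of the last expression is displayed.

```python
# Option 1: head(n) returns the first n rows
df.head(3)

# Option 2: positional slicing with iloc (the stop position 3 is excluded)
df.iloc[:3]
```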

| | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| c | snake | 0.5 | 2 | no |
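`iloc` slices stop *before* the end position. Label-based slicing with `loc` *includes* the end label. This small sketch contrasts the two on the same data:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

by_position = df.iloc[0:3]    # positions 0, 1, 2 -> rows a, b, c
by_label = df.loc['a':'c']    # labels 'a' through 'c', end label included
print(by_position.equals(by_label))      # True
print(df.loc['a':'b'].index.tolist())    # ['a', 'b']
```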


##### Select only 'animal' and 'age' columns:

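```python
# Option 1: pass a list of column names to []
df[['animal', 'age']]

# Option 2: the same selection with .loc (all rows, listed columns)
df.loc[:, ['animal', 'age']]
```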

| | animal | age |
|---|---|---|
| a | cat | 2.5 |
| b | cat | 3.0 |
| c | snake | 0.5 |
| d | dog | NaN |
| e | dog | 5.0 |
| f | cat | 2.0 |
| g | snake | 4.5 |
| h | cat | NaN |
| i | dog | 7.0 |
| j | dog | 3.0 |
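The double brackets matter. A single column name returns a `Series`, while a list of names (even a one-element list) returns a `DataFrame`:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

one_col = df['animal']           # single brackets -> Series
one_col_frame = df[['animal']]   # list of one name -> DataFrame with one column
print(type(one_col).__name__, type(one_col_frame).__name__)  # Series DataFrame
```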


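##### Select the rows at positions 3, 4 and 8 (counting from 0) with 'animal' and 'age':

```python
# df.index[[3, 4, 8]] turns the positions into their labels ('d', 'e', 'i') for .loc
df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
```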

| | animal | age |
|---|---|---|
| d | dog | NaN |
| e | dog | 5.0 |
| i | dog | 7.0 |
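`df.index[[3, 4, 8]]` only converts positions to labels, so the same selection can be written purely by position with `iloc`. This sketch assumes `animal` and `age` are the first two columns (positions 0 and 1), as they are in the sample data:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

via_loc = df.loc[df.index[[3, 4, 8]], ['animal', 'age']]
via_iloc = df.iloc[[3, 4, 8], [0, 1]]   # row positions, then column positions
print(via_loc.equals(via_iloc))          # True
```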


##### Conditional Filtering - Rows where visits > 3:

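```python
df[df['visits'] > 3]
```

No rows satisfy this condition because the largest value in `visits` is 3. pandas therefore returns an empty DataFrame that has only the column headers:

| | animal | age | visits | priority |
|---|---|---|---|---|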
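An empty result is easy to overlook in a notebook. `DataFrame.empty` checks for it explicitly, and relaxing the condition to `>= 3` shows which rows sit at the maximum:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

high = df[df['visits'] > 3]
print(high.empty)                 # True: no row has more than 3 visits
print(df['visits'].max())         # 3

at_least_three = df[df['visits'] >= 3]
print(at_least_three.index.tolist())  # ['b', 'd', 'f']
```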


##### Rows with missing age (NaN):

```python
df[df['age'].isnull()]
```

| | animal | age | visits | priority |
|---|---|---|---|---|
| d | dog | NaN | 3 | yes |
| h | cat | NaN | 1 | yes |
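`isnull()` is an alias of `isna()`. The complementary `notna()` keeps the rows that *do* have an age, which is also what `dropna(subset=['age'])` returns:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

known_age = df[df['age'].notna()]      # keep rows whose age is present
dropped = df.dropna(subset=['age'])    # drop rows whose age is missing
print(len(known_age), known_age.equals(dropped))  # 8 True
```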


##### Rows where animal is 'cat' and age < 3:

```python
df[(df['animal'] == 'cat') & (df['age'] < 3)]
```

| | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| f | cat | 2.0 | 3 | no |
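Each comparison must be wrapped in parentheses because `&` binds more tightly than `==` and `<` in Python. Row `h` (a cat with a missing age) is excluded, since any comparison with `NaN` evaluates to `False`. `DataFrame.query` expresses the same filter as a string, and the following sketch checks that both forms agree:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

via_mask = df[(df['animal'] == 'cat') & (df['age'] < 3)]
via_query = df.query("animal == 'cat' and age < 3")
print(via_query.index.tolist())      # ['a', 'f']
print(via_query.equals(via_mask))    # True
```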


##### Rows where age is between 2 and 4 (inclusive):

```python
df[df['age'].between(2, 4)]
```

| | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| f | cat | 2.0 | 3 | no |
| j | dog | 3.0 | 1 | no |
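By default, `between` includes both endpoints. Its `inclusive` parameter changes that; string values such as `'both'` and `'neither'` are accepted in pandas 1.3 and later, and this sketch assumes such a version. The explicit `>=`/`<=` mask is included for comparison:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

both = df[df['age'].between(2, 4)]                           # 2 <= age <= 4 (default)
neither = df[df['age'].between(2, 4, inclusive='neither')]   # 2 < age < 4
manual = df[(df['age'] >= 2) & (df['age'] <= 4)]             # same as the default

print(both.index.tolist())     # ['a', 'b', 'f', 'j']
print(neither.index.tolist())  # ['a', 'b', 'j']: row f (age 2.0) drops out
```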


##### Updating and Aggregating Data - Change age in row 'f' to 1.5:

```python
# Set a single cell by row label and column name
df.loc['f', 'age'] = 1.5
df
```

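| | animal | age | visits | priority |
|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| c | snake | 0.5 | 2 | no |
| d | dog | NaN | 3 | yes |
| e | dog | 5.0 | 2 | no |
| f | cat | 1.5 | 3 | no |
| g | snake | 4.5 | 1 | no |
| h | cat | NaN | 1 | yes |
| i | dog | 7.0 | 2 | no |
| j | dog | 3.0 | 1 | no |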
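For a single cell, `.at` is a label-based accessor specialized for scalar access. Prefer `.loc` or `.at` over chained indexing such as `df['age']['f'] = 1.5`. Chained indexing is not guaranteed to update `df`, and under pandas' Copy-on-Write mode it never does. A sketch:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

df.at['f', 'age'] = 1.5     # scalar setter: one row label, one column name
print(df.at['f', 'age'])    # 1.5
```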


##### Total number of visits:

```python
df['visits'].sum()
```

```text
19
```
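The total can be broken down per animal with `groupby`, and the group sums add back up to the overall total:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

per_animal = df.groupby('animal')['visits'].sum()
print(per_animal)                                # cat 8, dog 8, snake 3
print(per_animal.sum() == df['visits'].sum())    # True: 8 + 8 + 3 = 19
```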

##### Mean age for each animal type:

```python
# Mean of 'age' within each animal group; missing ages are skipped
df.groupby('animal')['age'].mean()
```

```text
animal
cat      2.333333
dog      5.000000
snake    2.500000
Name: age, dtype: float64
```
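`mean()` skips missing ages, so a group's mean can rest on fewer values than the group has rows. Requesting the count alongside the mean makes this visible. The following sketch first applies the earlier update to row `f` so the means match the output above:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))
df.loc['f', 'age'] = 1.5  # same update as in the section above

# One row per animal: mean age and number of non-missing ages behind it
summary = df.groupby('animal')['age'].agg(['mean', 'count'])
print(summary)  # cats and dogs each have 4 rows but only 3 known ages
```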

##### Add a new row, then remove it:

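```python
# Add a row labelled 'k'; the list supplies one value per column, in column order
df.loc['k'] = ['dog', 5.5, 2, 'no']
print(df.tail(2))  # rows 'j' and 'k': the new row is present

# Remove it again by label; drop() returns a new DataFrame, so reassign it
df = df.drop('k')
df
```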
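Enlarging with `.loc` adds one row at a time. To append rows held in another DataFrame, use `pd.concat` (`DataFrame.append` was removed in pandas 2.0). In this sketch, `new_row` is a hypothetical one-row frame:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

# Hypothetical extra record, labelled 'k', with the same columns as df
new_row = pd.DataFrame(
    {'animal': ['dog'], 'age': [5.5], 'visits': [2], 'priority': ['no']},
    index=['k'],
)

extended = pd.concat([df, new_row])   # stacks rows; df itself is unchanged
print(len(extended))                  # 11
restored = extended.drop('k')         # back to the original ten rows
print(restored.equals(df))            # True
```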

##### Count the number of each animal type:

```python
df['animal'].value_counts()
```

```text
animal
cat      4
dog      4
snake    2
Name: count, dtype: int64
```
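Passing `normalize=True` to `value_counts` returns proportions instead of raw counts:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

shares = df['animal'].value_counts(normalize=True)
print(shares)  # cat 0.4, dog 0.4, snake 0.2 (each count divided by 10 rows)
```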

##### Sorting and Transforming - Sort the DataFrame by animal and then by age:

```python
# Sort by 'animal' first; ties are broken by 'age' (missing ages go last)
df.sort_values(by=['animal', 'age'])
```

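| | animal | age | visits | priority |
|---|---|---|---|---|
| f | cat | 1.5 | 3 | no |
| a | cat | 2.5 | 1 | yes |
| b | cat | 3.0 | 3 | yes |
| h | cat | NaN | 1 | yes |
| j | dog | 3.0 | 1 | no |
| e | dog | 5.0 | 2 | no |
| i | dog | 7.0 | 2 | no |
| d | dog | NaN | 3 | yes |
| c | snake | 0.5 | 2 | no |
| g | snake | 4.5 | 1 | no |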
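`ascending` also accepts one flag per sort key. The following sketch sorts animals A to Z but ages from oldest to youngest. Missing ages still go last by default; `na_position='first'` would move them to the front.

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

# animal ascending, then age descending within each animal
ordered = df.sort_values(by=['animal', 'age'], ascending=[True, False])
print(ordered.index.tolist())
# ['b', 'a', 'f', 'h', 'i', 'e', 'j', 'd', 'g', 'c']
```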


##### Create a new column 'age_times_two' by multiplying the age column by 2:

```python
# Element-wise arithmetic on the whole column; NaN * 2 stays NaN
df['age_times_two'] = df['age'] * 2
df
```

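| | animal | age | visits | priority | age_times_two |
|---|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes | 5.0 |
| b | cat | 3.0 | 3 | yes | 6.0 |
| c | snake | 0.5 | 2 | no | 1.0 |
| d | dog | NaN | 3 | yes | NaN |
| e | dog | 5.0 | 2 | no | 10.0 |
| f | cat | 1.5 | 3 | no | 3.0 |
| g | snake | 4.5 | 1 | no | 9.0 |
| h | cat | NaN | 1 | yes | NaN |
| i | dog | 7.0 | 2 | no | 14.0 |
| j | dog | 3.0 | 1 | no | 6.0 |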
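Assigning to `df['age_times_two']` modifies `df` in place. `DataFrame.assign` instead returns a new DataFrame with the extra column and leaves the original untouched:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

with_double = df.assign(age_times_two=df['age'] * 2)
print('age_times_two' in df.columns)               # False: df is unchanged
print(with_double['age_times_two'].isna().sum())   # 2: missing ages stay missing
```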


##### Increase age by 5 for rows where visits are greater than 2, and save the result as a new column:

```python
# Add 5 to age where visits > 2 and 0 otherwise; missing ages stay NaN
df['age_plus_5_if_visits_gt_2'] = df['age'] + df['visits'].apply(lambda x: 5 if x > 2 else 0)
df
```

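| | animal | age | visits | priority | age_times_two | age_plus_5_if_visits_gt_2 |
|---|---|---|---|---|---|---|
| a | cat | 2.5 | 1 | yes | 5.0 | 2.5 |
| b | cat | 3.0 | 3 | yes | 6.0 | 8.0 |
| c | snake | 0.5 | 2 | no | 1.0 | 0.5 |
| d | dog | NaN | 3 | yes | NaN | NaN |
| e | dog | 5.0 | 2 | no | 10.0 | 5.0 |
| f | cat | 1.5 | 3 | no | 3.0 | 6.5 |
| g | snake | 4.5 | 1 | no | 9.0 | 4.5 |
| h | cat | NaN | 1 | yes | NaN | NaN |
| i | dog | 7.0 | 2 | no | 14.0 | 7.0 |
| j | dog | 3.0 | 1 | no | 6.0 | 3.0 |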
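The `apply` call runs a Python function once per row. A vectorized alternative builds the 0/5 bonus with `np.where`. The following sketch checks that both approaches produce the same values:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

# 5 where visits > 2, else 0, computed for the whole column at once
bonus = np.where(df['visits'] > 2, 5, 0)
vectorized = df['age'] + bonus

# Row-by-row version from the section above, for comparison
via_apply = df['age'] + df['visits'].apply(lambda x: 5 if x > 2 else 0)
print(vectorized.equals(via_apply))  # True (NaN positions match as well)
```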


##### For rows where priority is 'yes', increase age by 10:

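```python
# Boolean mask selecting the high-priority rows
is_priority = df['priority'] == 'yes'
# Add 10 to 'age' in those rows only; missing ages remain NaN
df.loc[is_priority, 'age'] += 10
df
```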

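| | animal | age | visits | priority | age_times_two | age_plus_5_if_visits_gt_2 |
|---|---|---|---|---|---|---|
| a | cat | 12.5 | 1 | yes | 5.0 | 2.5 |
| b | cat | 13.0 | 3 | yes | 6.0 | 8.0 |
| c | snake | 0.5 | 2 | no | 1.0 | 0.5 |
| d | dog | NaN | 3 | yes | NaN | NaN |
| e | dog | 5.0 | 2 | no | 10.0 | 5.0 |
| f | cat | 1.5 | 3 | no | 3.0 | 6.5 |
| g | snake | 4.5 | 1 | no | 9.0 | 4.5 |
| h | cat | NaN | 1 | yes | NaN | NaN |
| i | dog | 7.0 | 2 | no | 14.0 | 7.0 |
| j | dog | 3.0 | 1 | no | 6.0 | 3.0 |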
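Derived columns are not recomputed automatically. In the table above, `age_times_two` and `age_plus_5_if_visits_gt_2` still reflect the ages from *before* the +10 update (row `a`: age 12.5, but `age_times_two` is 5.0). The following sketch reproduces the relevant steps and then recomputes the derived column:

```python
import numpy as np
import pandas as pd

# Rebuild the shelter sample so this snippet is self-contained
df = pd.DataFrame({
    'animal': ['cat', 'cat', 'snake', 'dog', 'dog', 'cat', 'snake', 'cat', 'dog', 'dog'],
    'age': [2.5, 3, 0.5, np.nan, 5, 2, 4.5, np.nan, 7, 3],
    'visits': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
    'priority': ['yes', 'yes', 'no', 'yes', 'no', 'no', 'no', 'yes', 'no', 'no'],
}, index=list('abcdefghij'))

df.loc['f', 'age'] = 1.5
df['age_times_two'] = df['age'] * 2            # computed from the old ages
df.loc[df['priority'] == 'yes', 'age'] += 10   # ages change afterwards

print(df.loc['a', ['age', 'age_times_two']].tolist())  # [12.5, 5.0]: stale

df['age_times_two'] = df['age'] * 2            # recompute from the current ages
print(df.at['a', 'age_times_two'])             # 25.0
```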
