
# **Pandas Practice Programs**

**1. Create and Explore DataFrames**

In [None]:
import pandas as pd

data = {
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Age': [24, 23, 22, 25, 24],
    'City': ['Kolhapur', 'Pune', 'Mumbai', 'Nashik', 'Goa'],
    'Salary': [50000, 55000, 48000, 62000, 58000]
}

df = pd.DataFrame(data)

print("First 3 Rows:\n", df.head(3))
print("\nLast 2 Rows:\n", df.tail(2))
print("\nShape:", df.shape)
print("\nColumns:", df.columns)
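print("\nInfo:")
# df.info() prints its report itself and returns None,
# so wrapping it in print() adds the trailing "None" seen in the output
print(df.info())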
print("\nDescribe:\n", df.describe())

First 3 Rows:
      Name  Age      City  Salary
0    Abhi   24  Kolhapur   50000
1  Ketaki   23      Pune   55000
2   Aarav   22    Mumbai   48000

Last 2 Rows:
     Name  Age    City  Salary
3   Riya   25  Nashik   62000
4  Soham   24     Goa   58000

Shape: (5, 4)

Columns: Index(['Name', 'Age', 'City', 'Salary'], dtype='object')

Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    5 non-null      object
 1   Age     5 non-null      int64 
 2   City    5 non-null      object
 3   Salary  5 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 292.0+ bytes
None

Describe:
              Age        Salary
count   5.000000      5.000000
mean   23.600000  54600.000000
std     1.140175   5727.128425
min    22.000000  48000.000000
25%    23.000000  50000.000000
50%    24.000000  55000.000000
75%    24.000000  58000.000000
max    25.000000  62000.000000
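
The trailing `None` in the Info output comes from printing the return value of `df.info()`. A minimal sketch of two alternatives, assuming the same five-row frame as above is rebuilt so the block runs on its own: call `df.info()` directly, or pass a text buffer through its `buf` parameter to capture the report as a string. The names `result`, `buffer`, and `report` are illustrative only.

```python
import io

import pandas as pd

# Rebuild the sample frame so this block is self-contained.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Age': [24, 23, 22, 25, 24],
    'City': ['Kolhapur', 'Pune', 'Mumbai', 'Nashik', 'Goa'],
    'Salary': [50000, 55000, 48000, 62000, 58000],
})

# info() writes the report to stdout and returns None.
result = df.info()

# Passing a buffer captures the report as text instead of printing it.
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()

print(report.splitlines()[0])
```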

**2. Selecting and Filtering**

In [None]:
# Selecting a single column
print(df['Name'])

# Selecting multiple columns
print(df[['Name', 'City']])

# Filtering rows
print(df[df['Age'] > 23])

# Combine two conditions with & (each condition needs its own parentheses)
print(df[(df['City'] == 'Pune') & (df['Salary'] > 50000)])


0      Abhi
1    Ketaki
2     Aarav
3      Riya
4     Soham
Name: Name, dtype: object
     Name      City
0    Abhi  Kolhapur
1  Ketaki      Pune
2   Aarav    Mumbai
3    Riya    Nashik
4   Soham       Goa
    Name  Age      City  Salary
0   Abhi   24  Kolhapur   50000
3   Riya   25    Nashik   62000
4  Soham   24       Goa   58000
     Name  Age  City  Salary
1  Ketaki   23  Pune   55000
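
The same selections can also be written with `.loc`, `isin`, and `query`. This is a minimal sketch, assuming the frame from section 1 is rebuilt here so the block runs on its own. The city list passed to `isin` and the variable names are illustrative, not part of the original exercise.

```python
import pandas as pd

# Rebuild the sample frame from section 1 so this block is self-contained.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Age': [24, 23, 22, 25, 24],
    'City': ['Kolhapur', 'Pune', 'Mumbai', 'Nashik', 'Goa'],
    'Salary': [50000, 55000, 48000, 62000, 58000],
})

# .loc takes a row mask and a column list in one step.
older = df.loc[df['Age'] > 23, ['Name', 'Age']]

# isin() keeps rows whose value appears in the given list (illustrative list).
in_cities = df[df['City'].isin(['Pune', 'Mumbai'])]

# query() expresses the combined condition as a string.
pune_high = df.query("City == 'Pune' and Salary > 50000")

print(older)
print(in_cities)
print(pune_high)
```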


**3. Add, Update, and Delete Columns**

In [None]:
# Add a new column
df['Experience'] = [2, 3, 1, 4, 2]

# Update values
df.loc[df['Name'] == 'Abhi', 'Salary'] = 52000

# Delete column
df.drop('City', axis=1, inplace=True)

print(df)

     Name  Age  Salary  Experience
0    Abhi   24   52000           2
1  Ketaki   23   55000           3
2   Aarav   22   48000           1
3    Riya   25   62000           4
4   Soham   24   58000           2
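
The cell above changes `df` in place. A possible alternative, sketched here under the assumption that the original frame should stay untouched, is to chain `assign` and `drop(columns=...)`, both of which return a new DataFrame. The name `updated` is hypothetical.

```python
import pandas as pd

# Rebuild the sample frame from section 1 so this block is self-contained.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Age': [24, 23, 22, 25, 24],
    'City': ['Kolhapur', 'Pune', 'Mumbai', 'Nashik', 'Goa'],
    'Salary': [50000, 55000, 48000, 62000, 58000],
})

# assign() adds the column and drop() removes one; each returns a new frame.
updated = (
    df.assign(Experience=[2, 3, 1, 4, 2])
      .drop(columns='City')
)

# Update a value in the new frame only; df keeps the original salary.
updated.loc[updated['Name'] == 'Abhi', 'Salary'] = 52000

print(updated)
print(df.columns.tolist())
```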


**4. Handle Missing Values**

In [None]:
# Introduce NaN
df.loc[2, 'Salary'] = None
print(df)

# Fill NaN with the column mean. Assign the result back instead of calling
# fillna(..., inplace=True) on df['Salary'], which pandas warns about
df['Salary'] = df['Salary'].fillna(df['Salary'].mean())

# Drop any rows that still contain NaN (none remain after the fill)
df.dropna(inplace=True)

     Name  Age   Salary  Experience
0    Abhi   24  52000.0           2
1  Ketaki   23  55000.0           3
2   Aarav   22      NaN           1
3    Riya   25  62000.0           4
4   Soham   24  58000.0           2
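
Missing values can be counted before they are filled, and the fill can be written so that it returns a new frame. A minimal sketch, assuming a reduced two-column version of the frame with the same salaries as above; `missing_per_column`, `mean_salary`, and `filled` are illustrative names.

```python
import numpy as np
import pandas as pd

# Reduced version of the frame, with the NaN already introduced.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Salary': [52000, 55000, np.nan, 62000, 58000],
})

# Count missing values per column before deciding how to handle them.
missing_per_column = df.isna().sum()

# mean() skips NaN by default, so this averages the four known salaries.
mean_salary = df['Salary'].mean()  # 56750.0

# The dict form fills only the named column and returns a new frame.
filled = df.fillna({'Salary': mean_salary})

print(missing_per_column)
print(filled)
```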


**5. Sorting and Grouping**

In [None]:
# Sort by Age
print(df.sort_values(by='Age'))

# Mean salary for each experience level
grouped = df.groupby('Experience')['Salary'].mean()
print(grouped)

     Name  Age   Salary  Experience
2   Aarav   22  56750.0           1
1  Ketaki   23  55000.0           3
0    Abhi   24  52000.0           2
4   Soham   24  58000.0           2
3    Riya   25  62000.0           4
Experience
1    56750.0
2    55000.0
3    55000.0
4    62000.0
Name: Salary, dtype: float64
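
A group can report more than one statistic at once, and a sort can run in descending order. The sketch below assumes the frame as it stands at this point (Salary already filled) and rebuilds only the columns it needs; `summary` and `top_paid` are illustrative names.

```python
import pandas as pd

# Columns needed for this example, with the filled salaries from above.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Salary': [52000.0, 55000.0, 56750.0, 62000.0, 58000.0],
    'Experience': [2, 3, 1, 4, 2],
})

# agg() with a list of names computes several statistics per group.
summary = df.groupby('Experience')['Salary'].agg(['mean', 'count', 'max'])

# ascending=False puts the highest salary first.
top_paid = df.sort_values(by='Salary', ascending=False)

print(summary)
print(top_paid)
```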


**6. Merging & Joining**

In [None]:
# Second DataFrame
dept = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Riya', 'Soham'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing']
})

merged_df = pd.merge(df, dept, on='Name', how='left')
print(merged_df)


     Name  Age   Salary  Experience Department
0    Abhi   24  52000.0           2         IT
1  Ketaki   23  55000.0           3         HR
2   Aarav   22  56750.0           1        NaN
3    Riya   25  62000.0           4    Finance
4   Soham   24  58000.0           2  Marketing
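
Aarav has no entry in `dept`, which is why the left join shows `NaN` for that row. As a rough sketch, `indicator=True` adds a `_merge` column that marks such rows, and `how='inner'` drops them instead. The reduced frames and the names `merged`, `unmatched`, and `inner` are assumptions for illustration.

```python
import pandas as pd

# Reduced versions of the two frames used in this section.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Salary': [52000.0, 55000.0, 56750.0, 62000.0, 58000.0],
})
dept = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Riya', 'Soham'],
    'Department': ['IT', 'HR', 'Finance', 'Marketing'],
})

# indicator=True adds a _merge column: 'both', 'left_only', or 'right_only'.
merged = pd.merge(df, dept, on='Name', how='left', indicator=True)
unmatched = merged[merged['_merge'] == 'left_only']

# An inner join keeps only names present in both frames.
inner = pd.merge(df, dept, on='Name', how='inner')

print(unmatched)
print(inner)
```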


**7. Apply & Lambda Functions**

In [None]:
# Add new column based on condition
df['Status'] = df['Salary'].apply(lambda x: 'High' if x > 55000 else 'Low')
print(df)


     Name  Age   Salary  Experience Status
0    Abhi   24  52000.0           2    Low
1  Ketaki   23  55000.0           3    Low
2   Aarav   22  56750.0           1   High
3    Riya   25  62000.0           4   High
4   Soham   24  58000.0           2   High
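
`apply` with a lambda calls a Python function once per value. For a simple two-way condition like this one, `numpy.where` evaluates the whole column at once and gives the same labels. A minimal sketch, assuming the salaries shown above:

```python
import numpy as np
import pandas as pd

# Salaries as they stand after the fill in section 4.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Salary': [52000.0, 55000.0, 56750.0, 62000.0, 58000.0],
})

# np.where(condition, value_if_true, value_if_false) works on the full column.
df['Status'] = np.where(df['Salary'] > 55000, 'High', 'Low')

print(df)
```

Note that 55000 itself is labelled `Low`, because the comparison is strictly greater than.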


**8. Save & Load CSV**

In [None]:
df.to_csv('Day2_Pandas_Final.csv', index=False)

new_df = pd.read_csv('Day2_Pandas_Final.csv')
print(new_df.head())

     Name  Age   Salary  Experience Status
0    Abhi   24  52000.0           2    Low
1  Ketaki   23  55000.0           3    Low
2   Aarav   22  56750.0           1   High
3    Riya   25  62000.0           4   High
4   Soham   24  58000.0           2   High
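
To check that nothing is lost on the way through a CSV file, the loaded frame can be compared with the original. This is a sketch only: it writes to a temporary directory rather than the working directory, and the file name `roundtrip.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Final frame from the sections above, rebuilt so the block is self-contained.
df = pd.DataFrame({
    'Name': ['Abhi', 'Ketaki', 'Aarav', 'Riya', 'Soham'],
    'Age': [24, 23, 22, 25, 24],
    'Salary': [52000.0, 55000.0, 56750.0, 62000.0, 58000.0],
    'Experience': [2, 3, 1, 4, 2],
    'Status': ['Low', 'Low', 'High', 'High', 'High'],
})

# Write and read back inside a temporary directory that is cleaned up afterwards.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'roundtrip.csv')
    # index=False keeps the row index out of the file, so no extra column appears.
    df.to_csv(path, index=False)
    loaded = pd.read_csv(path)

# equals() compares shape, values, and dtypes of the two frames.
print(loaded.equals(df))
```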
