### Task

Generate a synthetic dataset with the columns `age`, `salary`, `department`, `years_of_experience`, and `is_manager`.

In [None]:
import numpy as np
import pandas as pd

In [None]:
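np.random.seed(42)  # fix the seed so the dataset is reproducible
n = 100             # number of rows

data = {
    # randint excludes the upper bound: ages fall in [18, 59]
    "age": np.random.randint(18, 60, n),
    "salary": np.random.randint(30000, 120000, n),
    "department": np.random.choice(["HR", "IT", "Finance", "Marketing"], n),
    # normal(mean=5, sd=2) rounded to 1 decimal; this can produce negative values
    "years_of_experience": np.round(np.random.normal(5, 2, n), 1),
    "is_manager": np.random.choice([0, 1], n),
}

df = pd.DataFrame(data)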
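
Because `years_of_experience` is drawn from a normal distribution with mean 5 and standard deviation 2, a small share of the draws can fall below zero, which is not a meaningful amount of experience. One possible fix, shown here as a minimal sketch rather than as part of the original task, is to clip the values at zero. The array `sample_experience` is a hypothetical stand-in for the generated column.

```python
import numpy as np

# Hypothetical stand-in values, including one negative draw
sample_experience = np.array([-0.8, 3.4, 5.0, 11.4])

# Replace anything below 0 with 0; leave the upper end unbounded
clipped_experience = np.clip(sample_experience, 0, None)

print(clipped_experience)
```

Clipping piles the negative draws up at exactly zero, so it slightly distorts the distribution. Redrawing the negative values would be an alternative if that matters.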

1. Viewing the Data Structure

In [None]:
df.head()

   age  salary department  years_of_experience  is_manager
0   56   38392         HR                 -0.8           0
1   46   60535  Marketing                  3.4           1
2   32  108603         IT                  5.0           1
3   25   82256         IT                  4.2           1
4   38  119135         IT                  4.1           1


In [None]:
df.tail()

    age  salary department  years_of_experience  is_manager
95   59   82662         HR                  4.0           0
96   56   42688         IT                  4.4           1
97   58   55342  Marketing                  6.0           0
98   45   67157         IT                 11.4           0
99   24   97863         IT                  5.2           1


2. Getting DataFrame Info & Summary Statistics

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100 entries, 0 to 99
Data columns (total 5 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   age                  100 non-null    int64  
 1   salary               100 non-null    int64  
 2   department           100 non-null    object 
 3   years_of_experience  100 non-null    float64
 4   is_manager           100 non-null    int64  
dtypes: float64(1), int64(3), object(1)
memory usage: 4.0+ KB


In [None]:
df.describe()

             age        salary  years_of_experience  is_manager
count      100.0         100.0                100.0       100.0
mean       37.91      77809.16                4.823        0.47
std    12.219454  26058.643576             2.237822    0.501614
min         18.0       30206.0                 -0.8         0.0
25%        26.75       55141.0                3.475         0.0
50%         38.0       80932.0                  4.7         0.0
75%        46.25      98107.25                  6.0         1.0
max         59.0      119474.0                 11.4         1.0


3. Simple NumPy Operations

In [None]:
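print("Mean Age:", np.mean(df["age"]))
print("Max Salary:", np.max(df["salary"]))
# np.std defaults to ddof=0 (population SD), so it differs slightly from the std in df.describe()
print("Standard Deviation of Experience:", np.std(df["years_of_experience"]))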

Mean Age: 37.91
Max Salary: 119474
Standard Deviation of Experience: 2.2266052636244265
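
The standard deviation printed here (about 2.2266) does not match the `std` row of `df.describe()` (2.237822). The likely reason is the divisor: `np.std` defaults to `ddof=0` (divide by n), while pandas defaults to `ddof=1` (divide by n - 1). The sketch below illustrates the difference on a small hypothetical series, `demo_values`, chosen so the arithmetic is easy to check by hand.

```python
import numpy as np
import pandas as pd

# Hypothetical values: mean is 5, squared deviations sum to 32
demo_values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

population_std = np.std(demo_values)     # ddof=0: sqrt(32 / 8) = 2.0
sample_std = demo_values.std()           # ddof=1: sqrt(32 / 7), about 2.138

# Passing ddof explicitly makes the two libraries agree
matched_std = demo_values.std(ddof=0)

print(population_std, sample_std, matched_std)
```

Stating `ddof` explicitly in either library avoids this kind of silent mismatch between summary tables and hand-computed statistics.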


4. Filtering and Indexing Rows

In [None]:
print("Rows with salary > 100000:")
print(df[df["salary"] > 100000].head())

Rows with salary > 100000:
   age  salary department  years_of_experience  is_manager
2   32  108603         IT                  5.0           1
4   38  119135         IT                  4.1           1
6   36  107373    Finance                  6.7           1
7   40  109575  Marketing                  6.9           0
8   28  114651  Marketing                  5.8           1
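
Boolean filters like the one above can be combined. As a minimal sketch on a small hypothetical frame, `demo_df`, the example below keeps rows that satisfy two conditions at once. Each condition needs its own parentheses because `&` binds more tightly than the comparison operators.

```python
import pandas as pd

# Hypothetical rows for illustration only
demo_df = pd.DataFrame({
    "salary": [120000, 90000, 110000],
    "department": ["IT", "IT", "HR"],
})

# Element-wise AND of two boolean Series; use | for OR and ~ for NOT
high_paid_it = demo_df[(demo_df["salary"] > 100000) & (demo_df["department"] == "IT")]

print(high_paid_it)
```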


In [None]:
print("Row at index 4:")
print(df.iloc[4])

Row at index 4:
age                        38
salary                 119135
department                 IT
years_of_experience       4.1
is_manager                  1
Name: 4, dtype: object


In [None]:
print("Rows from index 12 to 24")
print(df.iloc[12:25])

Rows from index 12 to 24
    age  salary department  years_of_experience  is_manager
12   57  100592    Finance                  6.1           1
13   41   38110         IT                  2.0           1
14   20  109309  Marketing                  5.0           1
15   39   57266         HR                  5.8           1
16   19   82992    Finance                  4.7           0
17   41  112948  Marketing                  6.7           1
18   47   36910  Marketing                  7.6           0
19   55   30206    Finance                  2.3           0
20   19  117054  Marketing                  6.7           0
21   38  117897    Finance                  5.8           0
22   50   53419         IT                  5.0           0
23   29   80636    Finance                  1.9           0
24   39   80015    Finance                  8.0           1
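
Note that `iloc[12:25]` stops before position 25, which is why the last row shown is 24. Label-based slicing with `loc` includes both endpoints instead. The sketch below compares the two on a hypothetical frame, `demo_df`, that uses the default integer index, where labels and positions coincide.

```python
import pandas as pd

# Hypothetical frame with the default RangeIndex 0..29
demo_df = pd.DataFrame({"value": range(30)})

by_position = demo_df.iloc[12:25]  # positions 12..24; the end is exclusive
by_label = demo_df.loc[12:24]      # labels 12..24; the end is inclusive

print(len(by_position), len(by_label))
```

The two selections match only because the index is the default one. After filtering or sorting, labels and positions diverge and the two accessors return different rows.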


5. Adding a Column

In [None]:
df["salary_in_lakhs"] = df["salary"] / 100000  # 1 lakh = 100,000
df.head()

   age  salary department  years_of_experience  is_manager  salary_in_lakhs
0   56   38392         HR                 -0.8           0          0.38392
1   46   60535  Marketing                  3.4           1          0.60535
2   32  108603         IT                  5.0           1          1.08603
3   25   82256         IT                  4.2           1          0.82256
4   38  119135         IT                  4.1           1          1.19135


6. Grouping & Aggregation

In [None]:
avg_salary_by_dept = df.groupby("department")["salary"].mean()
print("Average Salary by Department:")
print(avg_salary_by_dept)

Average Salary by Department:
department
Finance      83124.708333
HR           75825.476190
IT           73523.052632
Marketing    77684.722222
Name: salary, dtype: float64
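
A group mean is easier to interpret alongside the group size. As a minimal sketch on a small hypothetical frame, `demo_df`, `agg` can compute several statistics per department in one pass.

```python
import pandas as pd

# Hypothetical rows for illustration only
demo_df = pd.DataFrame({
    "department": ["HR", "HR", "IT", "IT", "IT"],
    "salary": [40000, 60000, 50000, 70000, 90000],
})

# One row per department, one column per requested statistic
salary_summary = demo_df.groupby("department")["salary"].agg(["mean", "count"])

print(salary_summary)
```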


In [None]:
print(df['years_of_experience'].value_counts().head())

years_of_experience
5.8    6
6.7    4
4.7    4
1.9    4
5.3    3
Name: count, dtype: int64
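
Counting exact values of a continuous column such as `years_of_experience` produces many small groups. One alternative, sketched here with hypothetical values and bin edges, is to bin the values with `pd.cut` before counting.

```python
import pandas as pd

# Hypothetical experience values and bin edges, for illustration only
experience = pd.Series([1.9, 4.7, 5.3, 5.8, 5.8, 6.7, 11.4])
bin_edges = [0, 3, 6, 9, 12]

# Each value is assigned to a right-closed interval such as (3, 6]
experience_bins = pd.cut(experience, bins=bin_edges)

# sort_index orders the counts by interval rather than by frequency
bin_counts = experience_bins.value_counts().sort_index()

print(bin_counts)
```

Values outside the outermost edges become `NaN` and are left out of the counts, so the edges should cover the full range of the data.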
