# Employee Data Exploration

This project explores a dummy employee dataset using **pandas**.  
It demonstrates basic data exploration, filtering, aggregation, and cleaning, 
all common steps in a data engineering workflow.

## Dataset
The dataset (`dummy_employees.csv`) contains the following fields:
- Employee ID
- Name
- Department
- Age
- Salary
- Years at company
- Remote (True/False)


## Employee Data Exploration – Questions

Use **pandas** to answer the following questions about the dataset.

---

### 1. Inspecting the Data
- How many rows and columns are in the dataset?  
- What are the column names and data types?  
- Show the first 5 rows of the dataset.  
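
The inspection questions above could be approached along these lines. This is a minimal sketch on a small, made-up DataFrame rather than the real CSV. Only `name`, `salary`, and `years_at_company` appear as column names in this document; the other column names and all of the values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample rows standing in for dummy_employees.csv.
# Column names other than name, salary, years_at_company are assumed.
df = pd.DataFrame({
    "employee_id": [1, 2, 3, 4],
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "department": ["IT", "Finance", "IT", "Finance"],
    "age": [29, 45, 52, 38],
    "salary": [60000, 75000, 90000, 70000],
    "years_at_company": [2, 10, 20, 5],
    "remote": [True, True, False, False],
})

# Rows and columns: shape is a (n_rows, n_cols) tuple
n_rows, n_cols = df.shape
print(f"{n_rows} rows, {n_cols} columns")

# Column names and their data types
print(df.dtypes)

# First 5 rows (head defaults to 5)
print(df.head())
```

`df.info()` prints similar information in one call, including non-null counts per column.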

---

### 2. Selection & Filtering
- Select only the `name` and `salary` columns.  
- Show all employees in the **IT** department.  
- Find employees who are older than 40.  
- Show employees in **Finance** who also work remotely.  
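
One possible way to express these selections is with column lists and boolean masks. The sketch below uses the same made-up sample data; the `department`, `age`, and `remote` column names are assumptions, and `remote` is assumed to be a boolean column.

```python
import pandas as pd

# Hypothetical sample rows; real column names may differ.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "department": ["IT", "Finance", "IT", "Finance"],
    "age": [29, 45, 52, 38],
    "salary": [60000, 75000, 90000, 70000],
    "remote": [True, True, False, False],
})

# Select two columns (a list of names returns a DataFrame)
name_salary = df[["name", "salary"]]

# Rows where department equals "IT"
it_staff = df[df["department"] == "IT"]

# Rows where age is strictly greater than 40
over_40 = df[df["age"] > 40]

# Combine conditions with & and wrap each comparison in parentheses
remote_finance = df[(df["department"] == "Finance") & df["remote"]]

print(remote_finance)
```

In SQL terms, the column list plays the role of `SELECT` and each boolean mask plays the role of a `WHERE` clause.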

---

### 3. Aggregations
- What is the **average salary** across the company?  
- What is the **average age per department**?  
- Which department has the **highest total salary**?  
- What is the **maximum number of years at the company**? Which employee has it?  
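
The aggregation questions map naturally onto `mean`, `groupby`, and `idxmax`. The following is a sketch under the same assumptions as before: invented sample rows and assumed `department` and `age` column names.

```python
import pandas as pd

# Hypothetical sample rows; real column names may differ.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "department": ["IT", "Finance", "IT", "Finance"],
    "age": [29, 45, 52, 38],
    "salary": [60000, 75000, 90000, 70000],
    "years_at_company": [2, 10, 20, 5],
})

# Company-wide average salary
avg_salary = df["salary"].mean()

# Average age per department (like GROUP BY department)
avg_age_by_dept = df.groupby("department")["age"].mean()

# Department with the highest total salary
total_salary_by_dept = df.groupby("department")["salary"].sum()
top_dept = total_salary_by_dept.idxmax()

# Longest tenure and the employee who holds it
max_years = df["years_at_company"].max()
longest = df.loc[df["years_at_company"].idxmax(), "name"]

print(avg_salary, top_dept, max_years, longest)
```

Note that `idxmax` returns only the first match. If several employees could share the maximum, filtering with `df[df["years_at_company"] == max_years]` would list all of them.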

---

### 4. Data Cleaning / Checks
- Are there any missing values in the dataset?  
- How many employees work **remotely**?  
- What percentage of employees are remote vs in-office?  
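
These checks could be sketched as follows, again on invented sample rows. It assumes the `remote` column is boolean, so that summing it counts the `True` values; if the real file stores it as text, it would need converting first.

```python
import pandas as pd

# Hypothetical sample rows; real column names may differ.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "salary": [60000, 75000, 90000, 70000],
    "remote": [True, True, False, False],
})

# Missing values per column, and whether any exist at all
missing_per_column = df.isna().sum()
has_missing = bool(df.isna().any().any())

# Number of remote employees (True counts as 1 when summed)
remote_count = int(df["remote"].sum())

# Percentage split between remote (True) and in-office (False)
remote_pct = df["remote"].value_counts(normalize=True) * 100

# Same remote percentage as a single number
remote_share_pct = df["remote"].mean() * 100

print(missing_per_column)
print(remote_count)
print(remote_pct)
```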

---

### 5. Mini Challenges
- Create a new column called `salary_per_year` = `salary / years_at_company`.  
- Sort employees by `salary_per_year` (highest to lowest).  
- Save the sorted results to a new CSV file in the `data/` folder.  
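
A possible approach to the mini challenges is sketched below. The sample rows are made up, and the output file name `employees_sorted.csv` is a hypothetical choice; only the `data/` folder comes from the task description.

```python
from pathlib import Path

import pandas as pd

# Hypothetical sample rows; real column names may differ.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cara", "Dev"],
    "salary": [60000, 75000, 90000, 70000],
    "years_at_company": [2, 10, 20, 5],
})

# New derived column, computed element-wise
df["salary_per_year"] = df["salary"] / df["years_at_company"]

# Sort from highest to lowest
ranked = df.sort_values("salary_per_year", ascending=False)

# Make sure the data/ folder exists, then write without the index column
out_dir = Path("data")
out_dir.mkdir(exist_ok=True)
out_path = out_dir / "employees_sorted.csv"  # hypothetical file name
ranked.to_csv(out_path, index=False)

print(ranked)
```

If any employee has `years_at_company` equal to 0, the division yields `inf` rather than raising an error, so those rows would sort to the top and may need separate handling.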

---

✅ *Tip: Think about how each question connects to SQL operations you already know (SELECT, WHERE, GROUP BY, etc.)*


In [4]:
import pandas as pd

In [5]:
import sys

In [6]:
df = pd.read_csv('dummy_employees.csv')

In [7]:
type(df)

pandas.core.frame.DataFrame