# DataFrame in Pandas:

A DataFrame in Pandas is a two-dimensional labeled data structure that resembles a table or spreadsheet. It consists of rows and columns, where each column holds a single data type (e.g., integers, floats, strings), different columns can hold different types, and each column is identified by a label (labels are normally kept unique, although Pandas does not enforce this). DataFrames are widely used for data manipulation, analysis, and visualization tasks in Python.

# Example Usage:

In [17]:
import pandas as pd

# Sample data for DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Thomas'],
    'Age': [25, 30, 35, 28, 32, 18],
    'City': ['London', 'New York', 'Paris', 'Paris', 'Sydney', 'Boutersem'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 40000]
}



In [18]:
# Creating a DataFrame
df = pd.DataFrame(data)

# Displaying the DataFrame
df

      Name  Age       City  Salary
0    Alice   25     London   60000
1      Bob   30   New York   75000
2  Charlie   35      Paris   80000
3    David   28      Paris   70000
4     Emma   32     Sydney   65000
5   Thomas   18  Boutersem   40000


**DataFrame Creation:** We create a DataFrame 'df' from the dictionary 'data'. Each key becomes a column label and each list supplies that column's values (names, ages, cities, and salaries). Because no index is passed, the rows are labeled 0 to 5.

In [19]:
# Basic DataFrame Information
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    6 non-null      object
 1   Age     6 non-null      int64 
 2   City    6 non-null      object
 3   Salary  6 non-null      int64 
dtypes: int64(2), object(2)
memory usage: 324.0+ bytes


**DataFrame Information:** We use the info() method to display a summary of the DataFrame: the index range, each column's non-null count and data type, and the approximate memory usage.
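When only one piece of that summary is needed, it can be read from an attribute instead of the printed report. The following is a minimal sketch, assuming the same sample data as above (repeated so the block runs on its own):

```python
import pandas as pd

# Same sample data as above, repeated so the block is self-contained.
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Thomas'],
    'Age': [25, 30, 35, 28, 32, 18],
    'City': ['London', 'New York', 'Paris', 'Paris', 'Sydney', 'Boutersem'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 40000]
}
df = pd.DataFrame(data)

print(df.shape)             # (6, 4): six rows, four columns
print(df.columns.tolist())  # ['Name', 'Age', 'City', 'Salary']
print(df.dtypes)            # one dtype per column
print(df.isna().sum())      # missing values per column (all zero here)
```

Unlike info(), which prints its report, these expressions return values that can be stored or tested in later code.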

In [20]:
df.describe()

             Age        Salary
count   6.000000      6.000000
mean   28.000000  65000.000000
std     5.966574  14142.135624
min    18.000000  40000.000000
25%    25.750000  61250.000000
50%    29.000000  67500.000000
75%    31.500000  73750.000000
max    35.000000  80000.000000


**Summary Statistics:** The describe() method generates summary statistics for the numeric columns in the DataFrame: count, mean, sample standard deviation (std), minimum, the 25th, 50th (median), and 75th percentiles, and maximum.
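To see where those numbers come from, the 'Age' statistics can be recomputed by hand. This is a sketch for illustration only: it assumes the same six ages as above and uses the sample standard deviation (dividing by n - 1), which is the Pandas default.

```python
import pandas as pd

# The six ages from the sample data.
ages = pd.Series([25, 30, 35, 28, 32, 18], name='Age')

n = len(ages)
mean = ages.sum() / n

# Sample variance divides by n - 1, not n.
sample_var = ((ages - mean) ** 2).sum() / (n - 1)
sample_std = sample_var ** 0.5

print(mean)                  # 28.0
print(round(sample_std, 6))  # 5.966574

# Percentiles are linearly interpolated between neighbouring sorted values.
print(ages.quantile(0.25))   # 25.75
```

The hand-computed standard deviation matches the 'std' row above, which confirms that describe() reports the sample rather than the population standard deviation.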

In [21]:
filtered_df = df[df['Salary'] > 30000]
filtered_df

      Name  Age       City  Salary
0    Alice   25     London   60000
1      Bob   30   New York   75000
2  Charlie   35      Paris   80000
3    David   28      Paris   70000
4     Emma   32     Sydney   65000
5   Thomas   18  Boutersem   40000


**Filtering Data:** We filter the DataFrame to include only rows where the salary is greater than 30000. Every salary in the sample exceeds that threshold, so all six rows are returned.
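Because every salary in the sample exceeds 30000, the filter above keeps all six rows. A stricter threshold shows the effect more clearly. The sketch below rebuilds the sample data and uses 70000 as an illustrative cutoff; the variable names are chosen for this example only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Thomas'],
    'Age': [25, 30, 35, 28, 32, 18],
    'City': ['London', 'New York', 'Paris', 'Paris', 'Sydney', 'Boutersem'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 40000]
})

# The comparison yields a boolean Series with one True/False per row.
mask = df['Salary'] > 70000
print(mask.tolist())  # [False, True, True, False, False, False]

# Indexing with the mask keeps only the rows marked True.
high_earners = df[mask]
print(high_earners['Name'].tolist())  # ['Bob', 'Charlie']

# Combine conditions with & (and) or | (or); wrap each one in parentheses.
young_and_well_paid = df[(df['Salary'] >= 65000) & (df['Age'] < 30)]
print(young_and_well_paid['Name'].tolist())  # ['David']
```

Note that David (70000) is excluded by the strict `>` comparison; `>=` would include him.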

In [22]:
df['Experience'] = [3, 5, 7, 4, 6, -1]

df

      Name  Age       City  Salary  Experience
0    Alice   25     London   60000           3
1      Bob   30   New York   75000           5
2  Charlie   35      Paris   80000           7
3    David   28      Paris   70000           4
4     Emma   32     Sydney   65000           6
5   Thomas   18  Boutersem   40000          -1


**Adding a New Column:** We add a new column 'Experience' to the DataFrame, representing the number of years of work experience for each individual.
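A new column can also be derived from existing ones, and the assigned values must match the number of rows. The sketch below illustrates both points on a reduced copy of the sample data; 'MonthlySalary' is a hypothetical column name introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Thomas'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 40000]
})

# assign() returns a new DataFrame and leaves df unchanged.
# 'MonthlySalary' is a hypothetical column name used for illustration.
with_monthly = df.assign(MonthlySalary=df['Salary'] / 12)
print(with_monthly['MonthlySalary'].tolist())
print(df.columns.tolist())  # ['Name', 'Salary']

# A list whose length differs from the number of rows is rejected.
try:
    df['Experience'] = [3, 5, 7]
except ValueError as err:
    print(f"ValueError: {err}")
```

Direct assignment (`df['Experience'] = ...`) modifies the DataFrame in place, whereas assign() is convenient when the original should stay untouched.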

In [23]:
avg_salary_by_city = df.groupby('City')['Salary'].mean()
avg_salary_by_city

City
Boutersem    40000.0
London       60000.0
New York     75000.0
Paris        75000.0
Sydney       65000.0
Name: Salary, dtype: float64

**Grouping and Aggregation:** We group the DataFrame by the 'City' column and calculate the average salary for each city using the groupby() and mean() methods.
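Several aggregations can be computed per group in one call with agg(). The following is a small sketch on the same city and salary values, not the only way to do it:

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['London', 'New York', 'Paris', 'Paris', 'Sydney', 'Boutersem'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 40000]
})

# One row per city, one column per aggregation.
summary = df.groupby('City')['Salary'].agg(['mean', 'count', 'max'])
print(summary)

# Paris is the only city with two people, so its mean differs from its max.
print(summary.loc['Paris', 'mean'])   # 75000.0
print(summary.loc['Paris', 'count'])  # 2
print(summary.loc['Paris', 'max'])    # 80000
```

The 'count' column makes it clear which averages are based on a single row.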

DataFrames underpin most data analysis and manipulation work in Pandas. The operations shown here (inspecting, summarizing, filtering, adding columns, and grouping) cover the core of working with structured datasets.

# Assignment

In this lab exercise, you will explore a dataset containing information about employees in a company using Pandas. You will perform various data manipulation and analysis tasks to gain insights into the employee data.

**Dataset:**
You are provided with a dictionary containing employee information:

In [38]:
data = {
    'EmployeeID': [101, 102, 103, 104, 105, 106],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Bert'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales', 'HR'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 55000],
}

## Tasks:

**DataFrame Creation:**
- Create a Pandas DataFrame named 'employees' from the provided dictionary 'data'.

**Data Exploration:**
- Display the first 3 rows of the DataFrame to get an overview of the data. (Use the head() method for this.)
- Check the summary statistics for the 'Salary' column. (Use the describe() method for this.)
- Determine the number of employees in each department.

**Data Analysis:**
- Calculate and display the average salary of employees.
- Identify the employee(s) with the highest salary and their details.
- Determine the department with the lowest average salary.
- Add a new column 'Salary_increase' to the DataFrame, containing each employee's salary increased by 10%.

In [39]:

import pandas as pd

employees = pd.DataFrame(data)

In [48]:
first_three_employees = employees.head(3)
print(first_three_employees)

   EmployeeID     Name Department  Salary
0         101    Alice         HR   60000
1         102      Bob         IT   75000
2         103  Charlie    Finance   80000


In [47]:
salary_description = employees['Salary'].describe()
print(salary_description)

count        6.000000
mean     67500.000000
std       9354.143467
min      55000.000000
25%      61250.000000
50%      67500.000000
75%      73750.000000
max      80000.000000
Name: Salary, dtype: float64


In [46]:
employees_per_department = employees['Department'].value_counts()
print(employees_per_department)

Department
HR           2
IT           1
Finance      1
Marketing    1
Sales        1
Name: count, dtype: int64


In [49]:
avg_salary = employees['Salary'].mean()
print(avg_salary)

67500.0


In [60]:
highest_salary = employees[employees['Salary'] == employees['Salary'].max()]
print(highest_salary)

   EmployeeID     Name Department  Salary
2         103  Charlie    Finance   80000
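
The comparison against max() used above returns every row that ties for the highest salary, whereas idxmax() returns only the first. The sketch below illustrates the difference; the tie is created artificially on a copy and is not part of the assignment data.

```python
import pandas as pd

employees = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105, 106],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Bert'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales', 'HR'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 55000],
})

# idxmax() returns the index label of the first maximum only.
top_label = employees['Salary'].idxmax()
print(employees.loc[top_label, 'Name'])  # Charlie

# Hypothetical tie, made on a copy: give Bob the same salary as Charlie.
tied = employees.copy()
tied.loc[tied['Name'] == 'Bob', 'Salary'] = 80000

# The boolean comparison keeps every row that reaches the maximum.
top_rows = tied[tied['Salary'] == tied['Salary'].max()]
print(top_rows['Name'].tolist())  # ['Bob', 'Charlie']
```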


In [73]:
lowest_salary_department = employees.groupby('Department')['Salary'].mean().idxmin()
print(lowest_salary_department)

HR
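
To check that answer, the per-department averages behind idxmin() can be listed in ascending order. This is a minimal sketch using the assignment data:

```python
import pandas as pd

employees = pd.DataFrame({
    'EmployeeID': [101, 102, 103, 104, 105, 106],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emma', 'Bert'],
    'Department': ['HR', 'IT', 'Finance', 'Marketing', 'Sales', 'HR'],
    'Salary': [60000, 75000, 80000, 70000, 65000, 55000],
})

# Average salary per department, lowest first.
dept_means = employees.groupby('Department')['Salary'].mean().sort_values()
print(dept_means)

# The first entry is the department that idxmin() reports.
print(dept_means.index[0], dept_means.iloc[0])  # HR 57500.0
```

HR is the only department with two employees (60000 and 55000), which is why its average of 57500 falls below every other department's single salary.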


In [79]:
employees['Salary_increase'] = employees['Salary'] * 1.10
employees

   EmployeeID     Name Department  Salary  Salary_increase
0         101    Alice         HR   60000          66000.0
1         102      Bob         IT   75000          82500.0
2         103  Charlie    Finance   80000          88000.0
3         104    David  Marketing   70000          77000.0
4         105     Emma      Sales   65000          71500.0
5         106     Bert         HR   55000          60500.0
