## Intermediate DataFrame operations like filtering and sorting

### Import the pandas library

In [1]:
import pandas as pd

### Load the CSV file from this [link](https://raw.githubusercontent.com/Prajwalk09/Data-Analysis-with-Pandas-and-Python/refs/heads/main/DataFrame-2/employees.csv) into a DataFrame and assign it to a variable named `data`

In [2]:
url = 'https://raw.githubusercontent.com/Prajwalk09/Data-Analysis-with-Pandas-and-Python/refs/heads/main/DataFrame-2/employees.csv'
data = pd.read_csv(url)
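If you already know which columns hold dates, `read_csv` can convert them while the file is read, through its `parse_dates` parameter. This is an alternative to the separate conversion step later in this notebook, not what the notebook itself does. The sketch below uses three rows from the `head()` output, with `io.StringIO` standing in for the URL so that it runs offline.

```python
import io

import pandas as pd

# Three rows copied from the head() output; io.StringIO stands in for the
# remote CSV so the example runs without a network connection
csv_text = """First Name,Gender,Start Date,Last Login Time,Salary,Bonus %,Senior Management,Team
Douglas,Male,8/6/1993,12:42 PM,97308,6.945,True,Marketing
Thomas,Male,3/31/1996,6:53 AM,61933,4.17,True,
Maria,Female,4/23/1993,11:17 AM,130590,11.858,False,Finance
"""

# parse_dates converts the listed columns to datetime64 during the read,
# so no separate pd.to_datetime call is needed for them
sample = pd.read_csv(io.StringIO(csv_text), parse_dates=['Start Date'])

print(sample.dtypes)
```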

### Display the first 5 rows of the DataFrame

In [3]:
data.head()

|   | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team |
|---|---|---|---|---|---|---|---|---|
| 0 | Douglas | Male | 8/6/1993 | 12:42 PM | 97308 | 6.945 | True | Marketing |
| 1 | Thomas | Male | 3/31/1996 | 6:53 AM | 61933 | 4.17 | True |  |
| 2 | Maria | Female | 4/23/1993 | 11:17 AM | 130590 | 11.858 | False | Finance |
| 3 | Jerry | Male | 3/4/2005 | 1:00 PM | 138705 | 9.34 | True | Finance |
| 4 | Larry | Male | 1/24/1998 | 4:47 PM | 101004 | 1.389 | True | Client Services |


### Convert the `'Start Date'` and `'Last Login Time'` columns in the `data` DataFrame to datetime format

### Purpose of Converting Date and Time Strings to Datetime Objects

By default, `read_csv` loads date and time columns with the `object` dtype, that is, as plain strings. Converting them to `datetime` values makes date and time data easier and more reliable to work with. It enables:

- **Date-based operations**: Filtering, sorting, and date arithmetic follow chronological order. As strings, `'3/31/1996'` sorts before `'8/6/1993'` because the comparison is character by character.
- **Better performance**: `datetime64` columns store each value as a fixed-size 64-bit integer, so comparisons and calculations run as vectorized operations instead of on Python string objects.
- **Consistency**: Every value shares one canonical representation, regardless of how the original text was formatted.
- **Advanced functionality**: The `.dt` accessor exposes datetime-specific attributes and methods, such as extracting the year, month, or day, and makes time-based aggregations possible.
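The first and last points above can be sketched on three start dates taken from the `head()` output: string sorting versus chronological sorting, a `.dt` attribute, and date differences.

```python
import pandas as pd

# Start dates in the same month/day/year text form used by the CSV
start = pd.Series(['8/6/1993', '3/31/1996', '4/23/1993'])
start_dt = pd.to_datetime(start, format='%m/%d/%Y')

# As strings, values sort character by character, so 1996 comes first here
string_order = start.sort_values().tolist()

# As datetimes, sorting is chronological
date_order = start_dt.sort_values().tolist()

# The .dt accessor exposes date components such as the year
years = start_dt.dt.year

# Subtracting datetimes yields Timedeltas; .dt.days gives whole days
days_after_first = (start_dt - start_dt.min()).dt.days
```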

In [7]:
data['Start Date'] = pd.to_datetime(data['Start Date'])

In [8]:
data['Last Login Time'] = pd.to_datetime(data['Last Login Time'])

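```python
# Alternative approach: cast the original string columns with astype.
# Recent pandas versions require an explicit unit such as 'datetime64[ns]';
# a bare 'datetime64' raises an error.
data['Start Date'] = data['Start Date'].astype('datetime64[ns]')
data['Last Login Time'] = data['Last Login Time'].astype('datetime64[ns]')
```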
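Because `'Last Login Time'` holds only clock times, `pd.to_datetime` has to supply a date; in the outputs below every login carries `2024-11-22`, which appears to be the day the notebook was run. A minimal sketch of a more explicit variant, assuming only the time of day matters: pass a `format` string so pandas does not have to guess, and keep just the time part of the logins.

```python
import datetime

import pandas as pd

# Sample values in the same text form as the CSV
start = pd.Series(['8/6/1993', '3/31/1996'])
login = pd.Series(['12:42 PM', '6:53 AM'])

# An explicit format skips format guessing and fails loudly on unexpected input
start_dt = pd.to_datetime(start, format='%m/%d/%Y')

# %I is the 12-hour clock hour and %p the AM/PM marker. With no date in the
# format, the parsed values fall on 1900-01-01, so keep only the time of day
login_time = pd.to_datetime(login, format='%I:%M %p').dt.time
```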

### Convert the `'Senior Management'` column in the `data` DataFrame to boolean type

In [11]:
data['Senior Management'] = data['Senior Management'].astype(bool)
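One caveat with `astype(bool)`: if a True/False column has missing values, `read_csv` loads it as `object` dtype with `NaN` entries, and `NaN` is truthy in Python, so the cast silently turns every missing value into `True`. The sketch below uses a hypothetical three-value column to show this, plus pandas' nullable `'boolean'` dtype, which keeps missing values as `<NA>` until you decide what they should mean.

```python
import numpy as np
import pandas as pd

# Hypothetical column as read_csv would load it if some values were missing:
# object dtype holding True, False and NaN
senior = pd.Series([True, False, np.nan], dtype=object)

# NaN is truthy, so a plain bool cast turns the missing value into True
as_bool = senior.astype(bool)

# The nullable 'boolean' dtype keeps the missing value as <NA> instead
as_nullable = senior.astype('boolean')

# Before using it as a filter mask, choose explicitly what missing means
mask = as_nullable.fillna(False)
```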

### Display a summary of the `data` DataFrame, including column names, non-null counts, and data types

In [12]:
data.info()

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   First Name         933 non-null    object
 1   Gender             855 non-null    object
 2   Start Date         1000 non-null   datetime64[ns]
 3   Last Login Time    1000 non-null   datetime64[ns]
 4   Salary             1000 non-null   int64
 5   Bonus %            1000 non-null   float64
 6   Senior Management  1000 non-null   bool
 7   Team               957 non-null    object
dtypes: bool(1), datetime64[ns](2), float64(1), int64(1), object(3)
memory usage: 55.8+ KB
```
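The non-null counts imply how many values are missing; for example, 1000 - 957 = 43 rows have no team. A small sketch, on three rows from the `head()` output, of getting those numbers directly with `isna().sum()`, next to `count()`, which returns the non-null counts that `info()` prints.

```python
import numpy as np
import pandas as pd

# Three rows mirroring the head() output; Thomas has no team
df = pd.DataFrame({
    'First Name': ['Douglas', 'Thomas', 'Maria'],
    'Team': ['Marketing', np.nan, 'Finance'],
    'Salary': [97308, 61933, 130590],
})

# Missing values per column
missing = df.isna().sum()

# Non-null values per column, as in the info() summary
non_null = df.count()
```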


### Filter and display rows from the `data` DataFrame where the value in the `'Gender'` column is `'Male'`

In [14]:
data[data['Gender'] == 'Male']

|   | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team |
|---|---|---|---|---|---|---|---|---|
| 0 | Douglas | Male | 1993-08-06 | 2024-11-22 12:42:00 | 97308 | 6.945 | True | Marketing |
| 1 | Thomas | Male | 1996-03-31 | 2024-11-22 06:53:00 | 61933 | 4.170 | True |  |
| 3 | Jerry | Male | 2005-03-04 | 2024-11-22 13:00:00 | 138705 | 9.340 | True | Finance |
| 4 | Larry | Male | 1998-01-24 | 2024-11-22 16:47:00 | 101004 | 1.389 | True | Client Services |
| 5 | Dennis | Male | 1987-04-18 | 2024-11-22 01:35:00 | 115163 | 10.125 | False | Legal |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 994 | George | Male | 2013-06-21 | 2024-11-22 17:47:00 | 98874 | 4.479 | True | Marketing |
| 996 | Phillip | Male | 1984-01-31 | 2024-11-22 06:30:00 | 42392 | 19.675 | False | Finance |
| 997 | Russell | Male | 2013-05-20 | 2024-11-22 12:39:00 | 96914 | 1.421 | False | Product |
| 998 | Larry | Male | 2013-04-20 | 2024-11-22 16:45:00 | 60500 | 11.985 | False | Business Development |
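The filter works in two steps: the comparison builds a boolean Series aligned with the DataFrame's index, and indexing with that Series keeps the rows where it is `True`. A small sketch on three rows from the outputs (Henry, row 995, has no recorded gender), also showing the equivalent `.loc` form, which can select columns at the same time.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Douglas', 'Maria', 'Henry'],
    'Gender': ['Male', 'Female', np.nan],
})

# Step 1: an element-wise comparison gives one True/False per row;
# a missing gender compares as False
is_male = df['Gender'] == 'Male'

# Step 2: indexing with the mask keeps only the rows where it is True
males = df[is_male]

# .loc takes the same mask and can also pick columns
male_names = df.loc[is_male, 'First Name']
```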


### Filter and display rows from the `data` DataFrame where the `'Senior Management'` column has a value of `True`

In [16]:
data[data['Senior Management']]

|   | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team |
|---|---|---|---|---|---|---|---|---|
| 0 | Douglas | Male | 1993-08-06 | 2024-11-22 12:42:00 | 97308 | 6.945 | True | Marketing |
| 1 | Thomas | Male | 1996-03-31 | 2024-11-22 06:53:00 | 61933 | 4.170 | True |  |
| 3 | Jerry | Male | 2005-03-04 | 2024-11-22 13:00:00 | 138705 | 9.340 | True | Finance |
| 4 | Larry | Male | 1998-01-24 | 2024-11-22 16:47:00 | 101004 | 1.389 | True | Client Services |
| 6 | Ruby | Female | 1987-08-17 | 2024-11-22 16:20:00 | 65476 | 10.012 | True | Product |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 991 | Rose | Female | 2002-08-25 | 2024-11-22 05:12:00 | 134505 | 11.051 | True | Marketing |
| 992 | Anthony | Male | 2011-10-16 | 2024-11-22 08:35:00 | 112769 | 11.625 | True | Finance |
| 993 | Tina | Female | 1997-05-15 | 2024-11-22 15:53:00 | 56450 | 19.040 | True | Engineering |
| 994 | George | Male | 2013-06-21 | 2024-11-22 17:47:00 | 98874 | 4.479 | True | Marketing |


<span style="font-size:14px; color:blue;">Note that the <strong>Senior Management</strong> column now has a boolean dtype, so it can be used directly as a filter mask; there is no need to compare it with <code>== True</code>.</span>
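Since a boolean column is already a mask, the complementary selection only needs the mask inverted. A short sketch on three rows from the `head()` output; pandas uses `~` for element-wise negation, because Python's `not` cannot be applied to a whole Series.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Douglas', 'Maria', 'Jerry'],
    'Senior Management': [True, False, True],
})

# The boolean column is used as the mask directly
senior = df[df['Senior Management']]

# ~ flips every value of the mask, selecting the non-senior rows
not_senior = df[~df['Senior Management']]
```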

### Filter and display rows from the `data` DataFrame where the value in the `'Team'` column is `'Marketing'`

In [18]:
data[data['Team'] == 'Marketing']

|   | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team |
|---|---|---|---|---|---|---|---|---|
| 0 | Douglas | Male | 1993-08-06 | 2024-11-22 12:42:00 | 97308 | 6.945 | True | Marketing |
| 21 | Matthew | Male | 1995-09-05 | 2024-11-22 02:12:00 | 100612 | 13.645 | False | Marketing |
| 26 | Craig | Male | 2000-02-27 | 2024-11-22 07:45:00 | 37598 | 7.757 | True | Marketing |
| 43 | Marilyn | Female | 1980-12-07 | 2024-11-22 03:16:00 | 73524 | 5.207 | True | Marketing |
| 62 |  | Female | 2007-06-12 | 2024-11-22 17:25:00 | 58112 | 19.414 | True | Marketing |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 942 | Lori | Female | 2015-11-20 | 2024-11-22 13:15:00 | 75498 | 6.537 | True | Marketing |
| 947 |  | Male | 2012-07-30 | 2024-11-22 15:07:00 | 107351 | 5.329 | True | Marketing |
| 986 | Donna | Female | 1982-11-26 | 2024-11-22 07:04:00 | 82871 | 17.999 | False | Marketing |
| 991 | Rose | Female | 2002-08-25 | 2024-11-22 05:12:00 | 134505 | 11.051 | True | Marketing |
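To select the complementary rows, whose team is *not* `'Marketing'`, use `!=` or invert the equality mask with `~`. Note that a missing team compares as "not equal", so those rows are kept unless you exclude them explicitly. A sketch on three rows from the `head()` output (Thomas has no team):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Douglas', 'Thomas', 'Maria'],
    'Team': ['Marketing', np.nan, 'Finance'],
})

# NaN != 'Marketing' evaluates to True, so the row with no team is kept
not_marketing = df[df['Team'] != 'Marketing']

# Combine with .notna() to also drop rows whose team is missing
not_marketing_known = df[(df['Team'] != 'Marketing') & df['Team'].notna()]

# Inverting the equality mask gives the same rows as !=
negated = df[~(df['Team'] == 'Marketing')]
```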


### Filter and display rows from the `data` DataFrame where the value in the `'Salary'` column is greater than 110,000

In [28]:
data[data['Salary'] > 110000]

|   | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team |
|---|---|---|---|---|---|---|---|---|
| 2 | Maria | Female | 1993-04-23 | 2024-11-22 11:17:00 | 130590 | 11.858 | False | Finance |
| 3 | Jerry | Male | 2005-03-04 | 2024-11-22 13:00:00 | 138705 | 9.340 | True | Finance |
| 5 | Dennis | Male | 1987-04-18 | 2024-11-22 01:35:00 | 115163 | 10.125 | False | Legal |
| 9 | Frances | Female | 2002-08-08 | 2024-11-22 06:51:00 | 139852 | 7.524 | True | Business Development |
| 12 | Brandon | Male | 1980-12-01 | 2024-11-22 01:08:00 | 112807 | 17.492 | True | Human Resources |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 987 | Gloria | Female | 2014-12-08 | 2024-11-22 05:08:00 | 136709 | 10.331 | True | Finance |
| 991 | Rose | Female | 2002-08-25 | 2024-11-22 05:12:00 | 134505 | 11.051 | True | Marketing |
| 992 | Anthony | Male | 2011-10-16 | 2024-11-22 08:35:00 | 112769 | 11.625 | True | Finance |
| 995 | Henry |  | 2014-11-23 | 2024-11-22 06:09:00 | 132483 | 16.655 | False | Distribution |
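Numeric conditions can be combined and bounded as well. In the sketch below, on the five rows from the `head()` output, `&` joins two masks, and each comparison needs its own parentheses because `&` binds more tightly than `>` in Python. `Series.between` selects a range and is inclusive on both ends by default.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Douglas', 'Thomas', 'Maria', 'Jerry', 'Larry'],
    'Salary': [97308, 61933, 130590, 138705, 101004],
    'Senior Management': [True, True, False, True, True],
})

# Both conditions must hold; parentheses keep > from being applied last
high_senior = df[(df['Salary'] > 110000) & df['Senior Management']]

# Salaries from 90,000 to 110,000, both ends included
mid_range = df[df['Salary'].between(90000, 110000)]
```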


### Filter and display rows from the `data` DataFrame where the value in the `'Start Date'` column is earlier than or equal to `'1985-01-01'`

In [30]:
data[data['Start Date'] <= '1985-01-01']

|   | First Name | Gender | Start Date | Last Login Time | Salary | Bonus % | Senior Management | Team |
|---|---|---|---|---|---|---|---|---|
| 10 | Louise | Female | 1980-08-12 | 2024-11-22 09:01:00 | 63241 | 15.132 | True |  |
| 12 | Brandon | Male | 1980-12-01 | 2024-11-22 01:08:00 | 112807 | 17.492 | True | Human Resources |
| 18 | Diana | Female | 1981-10-23 | 2024-11-22 10:27:00 | 132940 | 19.082 | False | Client Services |
| 28 | Terry | Male | 1981-11-27 | 2024-11-22 18:30:00 | 124008 | 13.464 | True | Client Services |
| 37 | Linda | Female | 1981-10-19 | 2024-11-22 20:49:00 | 57427 | 9.557 | True | Client Services |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 982 | Rose | Female | 1982-04-06 | 2024-11-22 10:43:00 | 91411 | 8.639 | True | Human Resources |
| 983 | John | Male | 1982-12-23 | 2024-11-22 22:35:00 | 146907 | 11.738 | False | Engineering |
| 985 | Stephen |  | 1983-07-10 | 2024-11-22 20:10:00 | 85668 | 1.909 | False | Legal |
| 986 | Donna | Female | 1982-11-26 | 2024-11-22 07:04:00 | 82871 | 17.999 | False | Marketing |
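The comparison above works because pandas converts the string `'1985-01-01'` to a timestamp when it is compared with a datetime column; an explicit `pd.Timestamp` selects the same rows. Once the column is a datetime, its components can be filtered through `.dt` too. A sketch on four start dates taken from the outputs:

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['Louise', 'Maria', 'Stephen', 'Jerry'],
    'Start Date': pd.to_datetime(['1980-08-12', '1993-04-23', '1983-07-10', '2005-03-04']),
})

# The string is converted to a timestamp for the comparison,
# so both forms select the same rows
by_string = df[df['Start Date'] <= '1985-01-01']
by_timestamp = df[df['Start Date'] <= pd.Timestamp('1985-01-01')]

# Filter on a date component, e.g. everyone who started in the 1990s
nineties = df[df['Start Date'].dt.year.between(1990, 1999)]
```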
