






# 1. What is NumPy?
- **`NumPy`** (Numerical Python) is an open-source Python library used for numerical and scientific computing. <br><br>
- It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently. <br><br>
- The core of NumPy is the **`ndarray`** (N-dimensional array) object, which lets you work with arrays of any dimension (1D, 2D, 3D, etc.). <br><br>

-----------

## Why use ndarray instead of Python lists?
- For numerical work, the **`ndarray`** (N-dimensional array) from **NumPy** is usually much faster than a Python list, for several reasons:
1. Homogeneous Data Type:
    * **Python List:** A Python list can store elements of different data types (e.g., integers, floats, strings). This flexibility adds overhead since each element is essentially a pointer to an object in memory, and the Python interpreter has to dynamically determine the type of each element during operations.
    * **NumPy ndarray:** An ndarray stores elements of the same data type (e.g., all integers or all floats). This homogeneity allows NumPy to allocate a contiguous block of memory for the array, making operations faster because it doesn't need to check the data type of each element during computations. <br><br>
2. Memory Efficiency:
    * **Python List:** Because each element in a list is a reference to an object (which includes metadata like type information and reference count), Python lists consume more memory. This can slow down operations on large datasets.
    * **NumPy ndarray:** Since ndarray elements are stored in a contiguous block of memory without extra metadata, memory access is more efficient. This reduces memory overhead and increases cache efficiency, leading to faster execution. <br><br>
3. Built-in Functions:
    * **Python List:** Many operations on Python lists require manual implementation (e.g., element-wise addition, multiplication), often resulting in slower, loop-based Python code. <br><br>
    * **NumPy ndarray:** NumPy provides a wide range of optimized functions that operate directly on ndarray objects, leveraging the power of C and Fortran libraries like BLAS and LAPACK for efficient computation.
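
The speed difference described above can be observed with a rough timing sketch. The array size `n` and the repeat count are arbitrary choices made for illustration, and the absolute timings will differ from machine to machine.

```python
import timeit

import numpy as np

n = 100_000  # arbitrary size, chosen only for illustration
py_list = list(range(n))
np_array = np.arange(n)

# Double every element: interpreted Python loop vs. vectorized NumPy operation
doubled_list = [x * 2 for x in py_list]
doubled_array = np_array * 2

list_time = timeit.timeit(lambda: [x * 2 for x in py_list], number=20)
array_time = timeit.timeit(lambda: np_array * 2, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"ndarray multiply:   {array_time:.4f} s")

# Each ndarray element occupies a fixed number of bytes in one contiguous buffer
print(np_array.itemsize, "bytes per element,", np_array.nbytes, "bytes in total")
```

Both approaches produce the same values; only the time taken differs.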

In [None]:
!pip install numpy



In [None]:
# Import NumPy package
import numpy as np

In [None]:
my_list = [1, 2, 3, 4, 5]
my_array = np.array(my_list)
print(my_list)
print(my_array)

[1, 2, 3, 4, 5]
[1 2 3 4 5]


In [None]:
a = np.array(10)  # zero-dimensional array (a scalar)
b = np.array([10, 20])  # one-dimensional array
c = np.array([[1, 2], [3, 4]])  # two-dimensional array
d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])  # three-dimensional array


In [7]:
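import numpy as np

# Separate names are used here so that the arrays a and b from the previous cell stay unchanged
arr_a = np.array([10, 20, 30, 40, 50])
arr_b = np.array([5, 4, 3, 2, 1])

# Element-wise arithmetic
add_result = arr_a + arr_b
print(add_result)
sub_result = arr_a - arr_b
print(sub_result)
mul_result = arr_a * arr_b
print(mul_result)
div_result = arr_a / arr_b
print(div_result)

# Summary statistics
mean_A = np.mean(arr_a)
max_A = np.max(arr_a)
min_A = np.min(arr_a)
print(mean_A)
print(max_A)
print(min_A)

# Dot product: the sum of the element-wise products
dot_product = np.dot(arr_a, arr_b)
print("dot_product of A and B:", dot_product)

# Reshape the 1-D array into a column with 5 rows and 1 column
reshaped_A = arr_a.reshape(5, 1)
print(reshaped_A)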

[15 24 33 42 51]
[ 5 16 27 38 49]
[50 80 90 80 50]
[ 2.  5. 10. 20. 50.]
30.0
50
10
dot_product of A and B: 350
[[10]
 [20]
 [30]
 [40]
 [50]]


In [9]:
import pandas as pd
students_data = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [20, 22, 19, 21, 23],
    "Grade": ["A", "B", "A", "C", "B"],
    "Marks": [85, 92, 78, 60, 88]
})
print("First 3 rows:")
print(students_data.head(3))

print("Name and marks columns:")
print(students_data[["Name", "Marks"]])

print("students with grade A:")
print(students_data[students_data["Grade"] == "A"])

print("Students older than 20:")
print(students_data[students_data["Age"] > 20])

First 3 rows:
      Name  Age Grade  Marks
0    Alice   20     A     85
1      Bob   22     B     92
2  Charlie   19     A     78
Name and marks columns:
      Name  Marks
0    Alice     85
1      Bob     92
2  Charlie     78
3    David     60
4      Eve     88
students with grade A:
      Name  Age Grade  Marks
0    Alice   20     A     85
2  Charlie   19     A     78
Students older than 20:
    Name  Age Grade  Marks
1    Bob   22     B     92
3  David   21     C     60
4    Eve   23     B     88


# Indexing

In [None]:
print(d[1,1,1])

8


# Number of Dimensions

In [None]:
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

0
1
2
3


# Custom Number of Dimensions

In [None]:
my_custom_array = np.array([1, 2, 3, 4, 5], ndmin=3)
print(my_custom_array)
print(my_custom_array.ndim)

[[[1 2 3 4 5]]]
3


# Array Slicing

In [None]:
a = np.array(["A", "B", "C", "D", "E", "f"])
print(a[1:4])
print(a[:4])
print(a[2:])

['B' 'C' 'D']
['A' 'B' 'C' 'D']
['C' 'D' 'E' 'f']


In [None]:
b = np.array([["A", "B", "X"] , ["C", "D", "Y"], ["E", "F", "Z"] , ["M", "N", "O"] ])
print(b[1])
print(b[0:3,0:2])
print(b[2: ,0])

['C' 'D' 'Y']
[['A' 'B']
 ['C' 'D']
 ['E' 'F']]
['E' 'M']


# NumPy Data Types and Controlling Arrays

In [None]:
# Show each array's data type
my_array1 = np.array([1, 2, 3])
my_array2 = np.array([1.5, 20.15, 36.01])
my_array3 = np.array(["Ahmed", "8", "osama"])  # Unicode strings
print(my_array1.dtype)
print(my_array2.dtype)
print(my_array3.dtype)


int64
float64
<U5


# Creating an Array with a Specific Data Type

In [None]:
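my_array4 = np.array([1, 2, 3], dtype='f')  # 'f' means float32; dtype=float or 'float' would give float64
my_array5 = np.array([1.5, 20.15, 3.501], dtype="int")  # the fractional parts are truncated: [1, 20, 3]
# my_array6 = np.array(["Ahmed", "8", "omar"], dtype='int')  # raises ValueError: "Ahmed" is not a valid integer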
print(my_array4.dtype)
print(my_array5.dtype)

float32
int64
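
Converting floats to an integer type drops the fractional part rather than rounding. The following minimal sketch contrasts truncation with rounding first; the variable names are illustrative only.

```python
import numpy as np

values = np.array([1.5, 20.15, 3.501])

truncated = values.astype(int)          # the fractional part is simply dropped
rounded = np.rint(values).astype(int)   # round to the nearest integer first

print(truncated)  # [ 1 20  3]
print(rounded)    # [ 2 20  4]
```

If the fractional part matters, rounding before the conversion is usually the safer choice.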


# Changing the Data Type of an Existing Array

In [None]:
my_array7 = np.array([0, 1, 2, 3, 4])
print(my_array7.dtype)
print(my_array7)


int64
[0 1 2 3 4]


In [None]:

my_array7 = my_array7.astype('float')
print(my_array7)
print(my_array7.dtype)

[0. 1. 2. 3. 4.]
float64


# Arithmetic Operations

In [None]:
my_array1= np.array([10,20,30])
my_array2= np.array([5 ,2 ,4])
print(my_array1 + my_array2)
print(my_array1 - my_array2)
print(my_array1 * my_array2)
print(my_array1 / my_array2)

[15 22 34]
[ 5 18 26]
[ 50  40 120]
[ 2.  10.   7.5]


In [None]:
my_array3 = np.array([[1, 4], [5, 9]])
my_array4 = np.array([[2 ,7], [10, 5]])
print(my_array3 + my_array4)
print(my_array3 - my_array4)
print(my_array3 * my_array4)
print(my_array3 / my_array4)

[[ 3 11]
 [15 14]]
[[-1 -3]
 [-5  4]]
[[ 2 28]
 [50 45]]
[[0.5        0.57142857]
 [0.5        1.8       ]]


# Min, Max, Sum

In [None]:
my_array5= np.array([10,20,30])
print(my_array5.min())
print(my_array5.max())
print(my_array5.sum())

10
30
60


In [None]:
my_array6 = np.array([[6, 4], [3, 9]])
print(my_array6.min())
print(my_array6.max())
print(my_array6.sum())

3
9
22
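
For a 2-D array, `min()`, `max()`, and `sum()` reduce over all elements by default, as the output above shows. They also accept an `axis` argument to reduce along rows or columns instead. A short sketch using the same values (the variable names are illustrative):

```python
import numpy as np

grid = np.array([[6, 4], [3, 9]])

col_sums = grid.sum(axis=0)   # collapse the rows: one result per column
row_sums = grid.sum(axis=1)   # collapse the columns: one result per row
col_mins = grid.min(axis=0)
row_maxes = grid.max(axis=1)

print(col_sums)   # [ 9 13]
print(row_sums)   # [10 12]
print(col_mins)   # [3 4]
print(row_maxes)  # [6 9]
```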


# Ravel
`ravel()` returns a flattened, one-dimensional version of the array.

In [None]:
my_array7 = np.array([[6, 4], [3, 9]])
print(my_array7.ravel())

[6 4 3 9]


In [None]:
my_array7 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print(my_array7.ravel())

[1 2 3 4 5 6 7 8]
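
One detail worth knowing: `ravel()` returns a view of the original data when it can, while the related `flatten()` method always returns a copy. The sketch below illustrates the difference on a small contiguous array.

```python
import numpy as np

matrix = np.array([[6, 4], [3, 9]])

raveled = matrix.ravel()      # a view of the same memory when possible
flattened = matrix.flatten()  # always an independent copy

print(np.shares_memory(matrix, raveled))    # True
print(np.shares_memory(matrix, flattened))  # False

# Writing through the view changes the original array; the copy is unaffected
raveled[0] = 100
print(matrix[0, 0])  # 100
print(flattened[0])  # 6
```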


# Shape and Reshape

In [None]:
my_array1= np.array([1,2,3,4])
print(my_array1.ndim)
print(my_array1.shape)

1
(4,)


In [None]:
my_array2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(my_array2.ndim)
print(my_array2.shape)

2
(3, 3)


In [None]:
# A 1-D array with 12 elements, before reshaping
my_array3 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12])
print(my_array3.ndim)
print(my_array3.shape)

1
(12,)


In [None]:
my_array3 = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ,11, 12])

reshaped_array = my_array3.reshape(6, 2)  # 6 rows x 2 columns = 12 elements
print(reshaped_array)

[[ 1  2]
 [ 3  4]
 [ 5  6]
 [ 7  8]
 [ 9 10]
 [11 12]]
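
A reshape only works when the new shape holds exactly the same number of elements. As a small sketch, one dimension can be given as `-1` so that NumPy infers it, and an incompatible shape raises a `ValueError`.

```python
import numpy as np

values = np.arange(1, 13)  # the integers 1 through 12

three_rows = values.reshape(3, -1)  # NumPy infers 4 columns from the 12 elements
cube = values.reshape(2, 3, 2)      # 2 * 3 * 2 = 12 elements

print(three_rows.shape)  # (3, 4)
print(cube.shape)        # (2, 3, 2)

try:
    values.reshape(5, 3)  # 15 slots cannot hold 12 elements
except ValueError as error:
    print("reshape failed:", error)
```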


<h1 align="center"> Python Packages (Pandas)</h1>

<br>

# 1. Introduction to Pandas
Pandas is an open-source data manipulation and analysis library for Python. It provides data structures and functions needed to work with structured data, particularly data in tabular form, like data from spreadsheets or SQL databases.
* Pandas is built on top of **`NumPy`**

In [None]:
# Importing Pandas
import pandas as pd


* If the previous code raised an error, the `Pandas` package is probably not installed in your Python environment.
* In that case, install `Pandas` with the Python package manager **`pip`** using the following command:

In [None]:
! pip install pandas

<br>

# 2. Data Structures in Pandas

## 2.1. Series
A Series in Pandas is a one-dimensional labeled array. It is similar to a single column in an Excel spreadsheet or in a DataFrame.
* A Series has a single data type (`dtype`), such as integers, floats, or strings. If the values are of mixed types, Pandas stores them using the generic `object` dtype.

### Creating a Series

In [None]:
# Creating a simple Series from a list
data = [10, 20, 30, 40]
series = pd.Series(data)

print(series)

0    10
1    20
2    30
3    40
dtype: int64


In [None]:
# Creating a Series with a custom index
data = [15, 25, 35, 45]
series_index = ['a', 'b', 'c', 'd']
series_with_index = pd.Series(data, index=series_index)

print(series_with_index)

a    15
b    25
c    35
d    45
dtype: int64
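
Every Series reports a single `dtype`. The sketch below, with made-up values, shows how that dtype depends on the data it is given.

```python
import pandas as pd

ints = pd.Series([10, 20, 30])
mixed = pd.Series([10, "twenty", 30.5])  # mixed Python objects
upcast = pd.Series([10, 20.5])           # integers and floats together

print(ints.dtype)    # an integer dtype
print(mixed.dtype)   # object
print(upcast.dtype)  # float64
```

Numeric work is generally faster on the specific numeric dtypes than on `object`, so it is worth checking `dtype` after loading data.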


* We can access the `Series` elements using `indexing` or `slicing`
1. Indexing

In [None]:
value = series_with_index['c']
value

35

2. Slicing

In [None]:
value = series_with_index[:2]  # the first two elements, by position
value

Unnamed: 0,0
a,15
b,25
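
Slicing behaves differently depending on whether it uses labels or positions. A minimal sketch with the same values as above, using the explicit `.loc` (labels) and `.iloc` (positions) accessors:

```python
import pandas as pd

s = pd.Series([15, 25, 35, 45], index=["a", "b", "c", "d"])

by_label = s.loc["a":"c"]   # label slice: the end label is included
by_position = s.iloc[0:2]   # positional slice: the end position is excluded

print(by_label.tolist())     # [15, 25, 35]
print(by_position.tolist())  # [15, 25]
```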


<br>

<br>

<br>

## 2.2. Data Frames
A DataFrame in Pandas is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns).
* It is one of the most commonly used data structures in Pandas and is ideal for representing and manipulating structured data, such as data from a spreadsheet, SQL table, or a CSV file. <br><br>
* A DataFrame is essentially a collection of **`Series`** objects that share the same index (Each column in a DataFrame is a `series`)

### Creating a DataFrame
- We can create a DataFrame from different sources (e.g., `List`, `Dictionary`, `CSV`, etc.)
### 1. Creating a DataFrame from a list

In [None]:
# 2D List
data = [[1, 'Alice', 25],
        [2, 'omar', 35],
        [3, 'Charlie', 35]]

# Create DataFrame
df = pd.DataFrame(data, columns=['ID', 'Name', 'Age'])

In [None]:
df

Unnamed: 0,ID,Name,Age
0,1,Alice,25
1,2,omar,35
2,3,Charlie,35


### 2. Creating a DataFrame from a list of dictionaries

In [None]:
data = [{'ID': 1, 'Name': 'Emma', 'Age': 25},
        {'ID': 2, 'Name': 'Alfred', 'Age': 30},
        {'ID': 3, 'Name': 'Ahmed', 'Age': 35}]

# Create DataFrame
df = pd.DataFrame(data)

In [None]:
df

Unnamed: 0,ID,Name,Age
0,1,Emma,25
1,2,Alfred,30
2,3,Ahmed,35
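
The example above passes a list of dictionaries, one per row. A DataFrame can also be built from a single dictionary that maps each column name to a list of values. The sketch below shows that both forms give the same result; the variable names are illustrative.

```python
import pandas as pd

# One dictionary: keys are column names, values are the column contents
columns = {
    "ID": [1, 2, 3],
    "Name": ["Emma", "Alfred", "Ahmed"],
    "Age": [25, 30, 35],
}
df_from_columns = pd.DataFrame(columns)

# A list of dictionaries: each dictionary is one row
records = [
    {"ID": 1, "Name": "Emma", "Age": 25},
    {"ID": 2, "Name": "Alfred", "Age": 30},
    {"ID": 3, "Name": "Ahmed", "Age": 35},
]
df_from_records = pd.DataFrame(records)

print(df_from_columns.equals(df_from_records))  # True
```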


### 3. Creating a DataFrame from a CSV file
A CSV (**C**omma-**S**eparated **V**alues) file stores tabular data as plain text, with the values in each row separated by commas (**,**).

In [None]:
df = pd.read_csv('employees.csv')

In [None]:
df

Unnamed: 0,ID,Name,Department,Age,Salary,Start Date
0,1,Alice,Sales,25,50000,2020-01-15
1,2,Bob,Marketing,30,60000,2019-07-23
2,3,Charlie,HR,35,70000,2018-05-14
3,4,David,Finance,40,80000,2017-11-30
4,5,Eve,IT,28,75000,2021-04-25
5,6,Frank,Sales,38,55000,2019-09-10
6,7,Grace,Marketing,45,90000,2016-12-19
7,8,Heidi,IT,32,62000,2018-03-02
8,9,Ivan,Finance,29,52000,2020-09-13
9,10,Judy,HR,34,64000,2019-05-22
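
If `employees.csv` is not at hand, the same call can be tried on in-memory text. This is a sketch only: it reuses the first three rows shown above, wraps them in `io.StringIO` so that `read_csv` can treat the string like a file, and uses the optional `parse_dates` argument to turn `Start Date` into real dates.

```python
import io

import pandas as pd

csv_text = """ID,Name,Department,Age,Salary,Start Date
1,Alice,Sales,25,50000,2020-01-15
2,Bob,Marketing,30,60000,2019-07-23
3,Charlie,HR,35,70000,2018-05-14
"""

# StringIO makes the string behave like an open file
employees = pd.read_csv(io.StringIO(csv_text), parse_dates=["Start Date"])

print(employees.shape)
print(employees.dtypes)
```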


* We can take a quick look at the data in a DataFrame by using **`.head()`**
* By default, **`.head()`** displays the first 5 records in a DataFrame

In [None]:
df.head()

Unnamed: 0,ID,Name,Department,Age,Salary,Start Date
0,1,Alice,Sales,25,50000,2020-01-15
1,2,Bob,Marketing,30,60000,2019-07-23
2,3,Charlie,HR,35,70000,2018-05-14
3,4,David,Finance,40,80000,2017-11-30
4,5,Eve,IT,28,75000,2021-04-25


* We can customize the number of displayed records by passing it to the **`.head()`** method

In [None]:
# Display the first 6 records of the DataFrame
df.head(6)

Unnamed: 0,ID,Name,Department,Age,Salary,Start Date
0,1,Alice,Sales,25,50000,2020-01-15
1,2,Bob,Marketing,30,60000,2019-07-23
2,3,Charlie,HR,35,70000,2018-05-14
3,4,David,Finance,40,80000,2017-11-30
4,5,Eve,IT,28,75000,2021-04-25
5,6,Frank,Sales,38,55000,2019-09-10


<br>

### Accessing data in a DataFrame
### 1. Accessing Columns
* **Column Index:**
    * The column index refers to the labels assigned to each column. <br><br>
    These are typically the column names and are used to identify and select specific columns.

### 1.1. Using Bracket Notation **`[]`**

In [None]:
# Accessing the column 'Name' of the DataFrame 'df'
df['Name']

Unnamed: 0,Name
0,Alice
1,Bob
2,Charlie
3,David
4,Eve
5,Frank
6,Grace
7,Heidi
8,Ivan
9,Judy


In [None]:
df['Age']

Unnamed: 0,Age
0,25
1,30
2,35
3,40
4,28
5,38
6,45
7,32
8,29
9,34


<br>

### 1.2. Using Dot Notation **`(.)`**

In [None]:
df.Department

Unnamed: 0,Department
0,Sales
1,Marketing
2,HR
3,Finance
4,IT
5,Sales
6,Marketing
7,IT
8,Finance
9,HR


In [None]:
df.Salary

Unnamed: 0,Salary
0,50000
1,60000
2,70000
3,80000
4,75000
5,55000
6,90000
7,62000
8,52000
9,64000


### 1.3. Accessing Multiple Columns

In [None]:
df[['Name', 'Salary']] # Accessing the 'Name' and 'Salary' columns

Unnamed: 0,Name,Salary
0,Alice,50000
1,Bob,60000
2,Charlie,70000
3,David,80000
4,Eve,75000
5,Frank,55000
6,Grace,90000
7,Heidi,62000
8,Ivan,52000
9,Judy,64000


<br>

### 2. Accessing Rows
* **Row Index:**
    * The row index is the label assigned to each row, which can be an integer, string, or other data types <br><br>
    By default, Pandas assigns an integer index starting from 0. <br><br>

### 2.1. Using **`loc[]`** (Label-Based Indexing)

In [None]:
df.loc[0]  # Accesses the first row by its index label (0)

Unnamed: 0,0
ID,1
Name,Alice
Department,Sales
Age,25
Salary,50000
Start Date,2020-01-15


In [None]:
df

In [None]:
df.loc[df['Department'] == 'Marketing']  # Accesses rows where 'Department' is 'Marketing'

Unnamed: 0,ID,Name,Department,Age,Salary,Start Date
1,2,Bob,Marketing,30,60000,2019-07-23
6,7,Grace,Marketing,45,90000,2016-12-19


<br>

### 2.2. Using **`iloc[]`** (Integer-Based Indexing)

In [None]:
df.iloc[1]  # Accesses the second row by its integer position

Unnamed: 0,1
ID,2
Name,Bob
Department,Marketing
Age,30
Salary,60000
Start Date,2019-07-23


In [None]:
df.iloc[1:3]  # Accesses the rows at positions 1 and 2 (position 3 is excluded)

Unnamed: 0,ID,Name,Department,Age,Salary,Start Date
1,2,Bob,Marketing,30,60000,2019-07-23
2,3,Charlie,HR,35,70000,2018-05-14
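
With the default integer index, `loc[]` and `iloc[]` often return the same rows, which hides the difference between them. It becomes visible once the rows are reordered. A small sketch with made-up data:

```python
import pandas as pd

people = pd.DataFrame({"Name": ["Alice", "Bob", "Charlie"], "Age": [25, 30, 35]})

# After sorting, the index labels are in the order 2, 1, 0
oldest_first = people.sort_values("Age", ascending=False)

by_label = oldest_first.loc[0]      # the row whose label is 0
by_position = oldest_first.iloc[0]  # the row in the first position

print(by_label["Name"])     # Alice
print(by_position["Name"])  # Charlie
```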


<br>

# 3. Data Manipulation with Pandas

## 3.1. Data Cleaning
### 3.1.1. Handling missing data

In [None]:
df2 = pd.read_csv('employees2.csv')

In [None]:
df2

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,,IT
9,Jack,45.0,Sales


### 1. Identifying Missing Data
* You can identify missing data in your DataFrame using **`isnull()`**

In [None]:
df2.isnull()  # Returns a DataFrame of the same shape with True where values are missing

Unnamed: 0,Name,Age,Department
0,False,False,False
1,False,True,False
2,False,False,False
3,False,False,False
4,False,False,False
5,False,True,False
6,False,False,False
7,False,False,False
8,False,True,False
9,False,False,False


Count missing values in each column

In [None]:
df2.isnull().sum()  # Returns the count of missing values in each column

Unnamed: 0,0
Name,0
Age,3
Department,0


<br>

### 2. Drop Missing Data
You can drop rows or columns with missing values using **`dropna()`**.

In [None]:
df2

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,,IT
9,Jack,45.0,Sales


In [None]:
df_cleaned = df2.dropna() # Drops all rows with at least one missing value

In [None]:
df_cleaned

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
6,Grace,29.0,HR
7,Mohamed,33.0,HR
9,Jack,45.0,Sales


In [None]:
df2

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,,IT
9,Jack,45.0,Sales
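
`dropna()` also accepts arguments that control what gets dropped. The sketch below uses a small hypothetical DataFrame (not the contents of `employees2.csv`) to show the `axis` and `subset` parameters.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Age and one missing Department
staff = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25.0, np.nan, 35.0],
    "Department": ["Sales", "Marketing", None],
})

rows_complete = staff.dropna()            # keep rows with no missing values at all
cols_complete = staff.dropna(axis=1)      # keep columns with no missing values at all
age_known = staff.dropna(subset=["Age"])  # only look at the Age column

print(rows_complete["Name"].tolist())  # ['Alice']
print(list(cols_complete.columns))     # ['Name']
print(age_known["Name"].tolist())      # ['Alice', 'Charlie']
```

As the repeated display of `df2` above shows, `dropna()` returns a new DataFrame and leaves the original unchanged.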


### 3. Filling Missing Data
* 3.1. Filling missing values with a specific value

In [None]:
df2

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,,IT
9,Jack,45.0,Sales


In [None]:
df_filled = df2.fillna(0)  # Replace all missing values with 0

In [None]:
df_filled

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,0.0,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,0.0,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,0.0,IT
9,Jack,45.0,Sales


* 3.2. Filling missing values with the **`mean`** of a column

In [None]:
df2['Age'] = df2['Age'].fillna(df2['Age'].mean())  # Replace missing values with the mean of the column

In [None]:
df2

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,32.714286,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,32.714286,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,32.714286,IT
9,Jack,45.0,Sales
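
The value 32.714286 comes from the seven known ages: `mean()` skips missing values, so the result is 229 / 7. The sketch below reproduces that arithmetic on a standalone Series holding the same ages.

```python
import numpy as np
import pandas as pd

ages = pd.Series([25, np.nan, 35, 22, 40, np.nan, 29, 33, np.nan, 45], name="Age")

mean_age = ages.mean()          # NaN values are ignored: 229 / 7
filled = ages.fillna(mean_age)  # returns a new Series; ages itself is unchanged

print(round(mean_age, 6))     # 32.714286
print(filled.isnull().sum())  # 0
```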


<br>

## 3.2. Data Transformation

### 3.2.1 Adding a Column

In [None]:
df2

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,32.714286,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,32.714286,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,32.714286,IT
9,Jack,45.0,Sales


In [None]:
df2['Gender'] = pd.Series(['M', 'M', 'F', 'M', 'M', 'M', 'F', 'M', 'F', 'M'])

In [None]:
df2

Unnamed: 0,Name,Age,Department,Gender
0,Alice,25.0,Sales,M
1,Bob,32.714286,Marketing,M
2,Charlie,35.0,Sales,F
3,David,22.0,HR,M
4,Ahmed,40.0,IT,M
5,Frank,32.714286,Finance,M
6,Grace,29.0,HR,F
7,Mohamed,33.0,HR,M
8,Ivy,32.714286,IT,F
9,Jack,45.0,Sales,M


<br>

### 3.2.2 Removing a Column
* You can remove a column using the **`drop()`** method.

In [None]:
df2

Unnamed: 0,Name,Age,Department,Gender
0,Alice,25.0,Sales,M
1,Bob,32.714286,Marketing,M
2,Charlie,35.0,Sales,F
3,David,22.0,HR,M
4,Ahmed,40.0,IT,M
5,Frank,32.714286,Finance,M
6,Grace,29.0,HR,F
7,Mohamed,33.0,HR,M
8,Ivy,32.714286,IT,F
9,Jack,45.0,Sales,M


In [None]:
new_df = df2.drop('Gender',  axis=1)

In [None]:
new_df

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,32.714286,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,32.714286,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,32.714286,IT
9,Jack,45.0,Sales


In [None]:
df2

Unnamed: 0,Name,Age,Department,Gender
0,Alice,25.0,Sales,M
1,Bob,32.714286,Marketing,M
2,Charlie,35.0,Sales,F
3,David,22.0,HR,M
4,Ahmed,40.0,IT,M
5,Frank,32.714286,Finance,M
6,Grace,29.0,HR,F
7,Mohamed,33.0,HR,M
8,Ivy,32.714286,IT,F
9,Jack,45.0,Sales,M


* If you want to remove a column without creating a new DataFrame, use the **`inplace=True`** parameter.

In [None]:
df2.drop('Gender', axis=1, inplace=True)

In [None]:
df2

Unnamed: 0,Name,Age,Department
0,Alice,25.0,Sales
1,Bob,32.714286,Marketing
2,Charlie,35.0,Sales
3,David,22.0,HR
4,Ahmed,40.0,IT
5,Frank,32.714286,Finance
6,Grace,29.0,HR
7,Mohamed,33.0,HR
8,Ivy,32.714286,IT
9,Jack,45.0,Sales
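
Two details of `drop()` are easy to miss: the `columns=` keyword is an alternative to `axis=1`, and a call with `inplace=True` returns `None` rather than a DataFrame. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data for illustration
staff = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [25, 30], "Gender": ["F", "M"]})

# columns= is equivalent to drop("Gender", axis=1); staff is left unchanged
without_gender = staff.drop(columns="Gender")

# With inplace=True, staff itself is modified and the call returns None
result = staff.drop("Age", axis=1, inplace=True)

print(list(without_gender.columns))  # ['Name', 'Age']
print(result)                        # None
print(list(staff.columns))           # ['Name', 'Gender']
```

Because of that `None`, writing `df = df.drop(..., inplace=True)` would overwrite the DataFrame with `None`.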


<br>

# 4. Working with DataFrames
## 4.1. Filtering Data

### 4.1.1 Filtering Rows Based on a Single Condition

In [None]:
df2['Gender'] = pd.Series(['M', 'M', 'F', 'M', 'M', 'M', 'F', 'M', 'F', 'M'])

In [None]:
men_emps = df2[df2['Gender']=='M']

In [None]:
men_emps

Unnamed: 0,Name,Age,Department,Gender
0,Alice,25.0,Sales,M
1,Bob,32.714286,Marketing,M
3,David,22.0,HR,M
4,Ahmed,40.0,IT,M
5,Frank,32.714286,Finance,M
7,Mohamed,33.0,HR,M
9,Jack,45.0,Sales,M


In [None]:
df2

In [None]:
female_emps = df2[df2['Gender']=='F']

In [None]:
female_emps

Unnamed: 0,Name,Age,Department,Gender
2,Charlie,35.0,Sales,F
6,Grace,29.0,HR,F
8,Ivy,32.714286,IT,F


<br>

### 4.1.2 Filtering Rows Based on Multiple Conditions
* You can filter rows using multiple conditions by combining them with logical operators like **`& (and)`**, **`| (or)`**, and **`~ (not)`**.

In [None]:
# Let's say we want only the male employees from the Sales department
male_sales = df2[(df2['Gender']=='M') & (df2['Department']=='Sales')]

In [None]:
male_sales

Unnamed: 0,Name,Age,Department,Gender
0,Alice,25.0,Sales,M
9,Jack,45.0,Sales,M
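
The example above uses `&`. The `|` and `~` operators work the same way, as sketched below on a small hypothetical subset of the data. Each condition needs its own parentheses, because `&`, `|`, and `~` bind more tightly than comparisons such as `==`.

```python
import pandas as pd

# Hypothetical subset of the employee data
staff = pd.DataFrame({
    "Name": ["Alice", "Charlie", "David", "Ahmed", "Grace"],
    "Age": [25.0, 35.0, 22.0, 40.0, 29.0],
    "Department": ["Sales", "Sales", "HR", "IT", "HR"],
})

# | (or): employees in Sales or in HR
sales_or_hr = staff[(staff["Department"] == "Sales") | (staff["Department"] == "HR")]

# ~ (not): everyone who is not in Sales
not_sales = staff[~(staff["Department"] == "Sales")]

# & (and): HR employees younger than 30
young_hr = staff[(staff["Age"] < 30) & (staff["Department"] == "HR")]

print(sales_or_hr["Name"].tolist())  # ['Alice', 'Charlie', 'David', 'Grace']
print(not_sales["Name"].tolist())    # ['David', 'Ahmed', 'Grace']
print(young_hr["Name"].tolist())     # ['David', 'Grace']
```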


<br>

## 4.2. Sorting Data
### 4.2.1. Sorting by a column

In [None]:
df2

Unnamed: 0,Name,Age,Department,Gender
0,Alice,25.0,Sales,M
1,Bob,32.714286,Marketing,M
2,Charlie,35.0,Sales,F
3,David,22.0,HR,M
4,Ahmed,40.0,IT,M
5,Frank,32.714286,Finance,M
6,Grace,29.0,HR,F
7,Mohamed,33.0,HR,M
8,Ivy,32.714286,IT,F
9,Jack,45.0,Sales,M


In [None]:
sorted_df = df2.sort_values(['Age'])

In [None]:
sorted_df

Unnamed: 0,Name,Age,Department,Gender
3,David,22.0,HR,M
0,Alice,25.0,Sales,M
6,Grace,29.0,HR,F
1,Bob,32.714286,Marketing,M
5,Frank,32.714286,Finance,M
8,Ivy,32.714286,IT,F
7,Mohamed,33.0,HR,M
2,Charlie,35.0,Sales,F
4,Ahmed,40.0,IT,M
9,Jack,45.0,Sales,M


In [None]:
df2.sort_values(['Age'], inplace=True, ascending=False)


In [None]:
df2

Unnamed: 0,Name,Age,Department,Gender
9,Jack,45.0,Sales,M
4,Ahmed,40.0,IT,M
2,Charlie,35.0,Sales,F
7,Mohamed,33.0,HR,M
8,Ivy,32.714286,IT,F
1,Bob,32.714286,Marketing,M
5,Frank,32.714286,Finance,M
6,Grace,29.0,HR,F
0,Alice,25.0,Sales,M
3,David,22.0,HR,M


### 4.2.2. Sorting by multiple columns

In [None]:
sorted_df = df2.sort_values(['Age', 'Department'], ascending=[True,True])

In [None]:
sorted_df

Unnamed: 0,Name,Age,Department,Gender
3,David,22.0,HR,M
0,Alice,25.0,Sales,M
6,Grace,29.0,HR,F
5,Frank,32.714286,Finance,M
8,Ivy,32.714286,IT,F
1,Bob,32.714286,Marketing,M
7,Mohamed,33.0,HR,M
2,Charlie,35.0,Sales,F
4,Ahmed,40.0,IT,M
9,Jack,45.0,Sales,M
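
When sorting by multiple columns, each column can have its own direction. The sketch below, on a hypothetical subset of the data, sorts departments alphabetically and, within each department, lists the oldest employee first.

```python
import pandas as pd

# Hypothetical subset of the employee data
staff = pd.DataFrame({
    "Name": ["Alice", "Charlie", "David", "Grace", "Mohamed", "Jack"],
    "Age": [25.0, 35.0, 22.0, 29.0, 33.0, 45.0],
    "Department": ["Sales", "Sales", "HR", "HR", "HR", "Sales"],
})

# Department ascending (A to Z), then Age descending within each department
by_dept_then_age = staff.sort_values(["Department", "Age"], ascending=[True, False])

print(by_dept_then_age["Name"].tolist())
# ['Mohamed', 'Grace', 'David', 'Jack', 'Charlie', 'Alice']
```

The order of the column list matters: the first column is the primary sort key, and later columns only break ties.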


<br>

## Exercise 3: Accessing DataFrame Rows

Create a DataFrame with columns 'Student', 'Grade', and 'Pass/Fail'. Perform the following tasks:
1. Access and display the row for the student with index 1.
2. Filter the DataFrame to show only the rows where 'Pass/Fail' is 'Pass'.
3. Display the filtered DataFrame.


In [None]:
# Write your code here