## **1. Introduction to Pandas**

#### **Task 1.1: Importing Pandas**
#### Import the Pandas library and assign it the conventional alias `pd`.

In [2]:
import numpy as np
import pandas as pd

print(f"NumPy version: {np.__version__}")
print(f"Pandas version: {pd.__version__}")

NumPy version: 2.3.1
Pandas version: 2.3.0


#### **Task 1.2: Understanding Pandas Data Structures**
#### Briefly explain in a comment what the two primary data structures in Pandas are and what they represent.

In [3]:
# Pandas has two primary data structures:
# 1. Series: A one-dimensional array-like structure that holds data of any type (e.g., integers, strings) with an index for labeling.
# 2. DataFrame: A two-dimensional, table-like structure with rows and columns, where each column can be a different data type, similar to a spreadsheet or SQL table.
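
The relationship between the two structures can be made concrete with a small sketch. The name `demo_df` and its values are illustrative assumptions, not part of the exercise; the point is only that selecting a single DataFrame column yields a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Illustrative data; `demo_df` is a hypothetical name used only for this sketch.
demo_df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

col = demo_df["x"]  # selecting one column returns a Series

print(type(demo_df).__name__)            # DataFrame
print(type(col).__name__)                # Series
print(col.index.equals(demo_df.index))   # True: the column shares the row index
```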

---

## **2. Series**

#### **Task 2.1: Creating a Series from a List**
#### Create a Pandas Series named `fruits` from the following Python list: `['apple', 'banana', 'cherry', 'date']`.
#### Print the Series.

In [4]:
list_fruits = ['apple', 'banana', 'cherry', 'date']

fruits = pd.Series(list_fruits)

print(fruits)

0     apple
1    banana
2    cherry
3      date
dtype: object


#### **Task 2.2: Creating a Series with Custom Indices**
#### Create a Pandas Series named `colors` from the following Python list: `['red', 'green', 'blue']`.
#### Assign the indices as `['color1', 'color2', 'color3']`.
#### Print the Series.

In [5]:
colors = pd.Series(['red', 'green', 'blue'], index=['color1', 'color2', 'color3'])

print(colors)

color1      red
color2    green
color3     blue
dtype: object


#### **Task 2.3: Accessing Elements in a Series**
#### Using the `fruits` Series, access and print the element at index 1.
#### Using the `colors` Series, access and print the element with the index 'color2'.

In [6]:
print(fruits[1])
print(colors['color2'])

banana
green
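
`fruits[1]` works here because the default integer index makes `1` a valid label. A minimal sketch of the explicit accessors, which state whether a lookup is by label (`.loc`) or by position (`.iloc`); the two Series are rebuilt so the block runs on its own:

```python
import pandas as pd

fruits = pd.Series(['apple', 'banana', 'cherry', 'date'])
colors = pd.Series(['red', 'green', 'blue'], index=['color1', 'color2', 'color3'])

# .loc selects by index label; .iloc selects by integer position.
print(fruits.loc[1])          # banana (label 1 in the default RangeIndex)
print(fruits.iloc[1])         # banana (second element by position)
print(colors.loc['color2'])   # green (label-based access)
print(colors.iloc[1])         # green (same element, by position)
```

Using `.iloc` for positional access avoids ambiguity when a Series has a non-integer or reordered index.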


---

## **3. DataFrame**

#### **Task 3.1: Creating a DataFrame from a Dictionary of Lists**
#### Create a Pandas DataFrame named `student_data` from the following Python dictionary:
#### {
####  'name': ['Alice', 'Bob', 'Charlie'],
####  'age': [20, 21, 19],
####  'major': ['CS', 'Engineering', 'Math']
#### }
#### Print the DataFrame.

In [7]:
student_data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 21, 19],
    'major': ['CS', 'Engineering', 'Math']
})

print(student_data)

      name  age        major
0    Alice   20           CS
1      Bob   21  Engineering
2  Charlie   19         Math


#### **Task 3.2: Creating a DataFrame from a List of Dictionaries**
#### Create a Pandas DataFrame named `employee_data` from the following Python list of dictionaries:
#### [{'name': 'David', 'salary': 50000, 'department': 'Sales'},
####  {'name': 'Eve', 'salary': 60000, 'department': 'Marketing'},
####  {'name': 'Frank', 'salary': 55000, 'department': 'Sales'}]
#### Print the DataFrame.

In [8]:
employee_data = pd.DataFrame([
    {'name': 'David', 'salary': 50000, 'department': 'Sales'},
    {'name': 'Eve', 'salary': 60000, 'department': 'Marketing'},
    {'name': 'Frank', 'salary': 55000, 'department': 'Sales'}
])

print(employee_data)

    name  salary department
0  David   50000      Sales
1    Eve   60000  Marketing
2  Frank   55000      Sales


#### **Task 3.3: Creating a DataFrame from a NumPy Array**
#### Create a NumPy array `data_array` with shape (3, 2) containing random integers between 1 and 10.
#### Create a Pandas DataFrame named `random_df` from this NumPy array with column names 'Column A' and 'Column B'.
#### Print the DataFrame.

In [9]:
data_array = np.random.randint(1, 11, size=(3, 2))

random_df = pd.DataFrame(data_array, columns=['Column A', 'Column B'])

print(random_df)

   Column A  Column B
0         1        10
1         7         8
2         9         9
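
Because the array is random, the values shown above will differ on every run. If reproducible output is wanted, one option is NumPy's seeded `Generator` API. This is a sketch under the assumption that a fixed seed is acceptable; the seed value `42` is arbitrary.

```python
import numpy as np
import pandas as pd

# Assumption: the seed value 42 is arbitrary and chosen only for illustration.
rng = np.random.default_rng(seed=42)

# The upper bound is exclusive by default, so 11 yields integers from 1 to 10.
data_array = rng.integers(1, 11, size=(3, 2))

random_df = pd.DataFrame(data_array, columns=['Column A', 'Column B'])
print(random_df)
```

Re-running the block with the same seed produces the same DataFrame.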


---

## **4. Exploring Data (Series and DataFrame)**

#### **Task 4.1: Exploring a Series**
#### Using the `fruits` Series, find and print:
#### - The number of elements in the Series.
#### - The unique values in the Series.
#### - The count of each unique value in the Series.

In [12]:
num_elements = len(fruits)
print("Number of elements:", num_elements)

unique_values = fruits.unique()
print("Unique values:", unique_values)

value_counts = fruits.value_counts()
print("Count of each unique value:")
print(value_counts)

Number of elements: 4
Unique values: ['apple' 'banana' 'cherry' 'date']
Count of each unique value:
apple     1
banana    1
cherry    1
date      1
Name: count, dtype: int64
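
Every value in `fruits` is distinct, so the counts above are all 1. A small sketch with repeated values shows the difference between the three measures more clearly; the `basket` Series is a hypothetical example, not part of the exercise data.

```python
import pandas as pd

# Hypothetical Series with repeated values, used only for illustration.
basket = pd.Series(['apple', 'banana', 'apple', 'date', 'apple'])

print("Number of elements:", len(basket))        # counts every entry, duplicates included
print("Number of unique values:", basket.nunique())
print("Unique values:", basket.unique())         # in order of first appearance

counts = basket.value_counts()                   # most frequent value first
print(counts)
```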


#### **Task 4.2: Exploring a DataFrame**
#### Using the `student_data` DataFrame, find and print:
#### - The first 2 rows of the DataFrame.
#### - The last row of the DataFrame.
#### - The index of the DataFrame.
#### - The column names of the DataFrame.
#### - A concise summary of the DataFrame (including data types and non-null values).

In [13]:
print("First 2 rows:")
print(student_data.head(2))

print("Last row:")
print(student_data.tail(1))

print("Index:", student_data.index)

print("Column names:", student_data.columns.tolist())

# info() prints the summary itself and returns None, so it is called directly
print("Summary:")
student_data.info()

First 2 rows:
    name  age        major
0  Alice   20           CS
1    Bob   21  Engineering
Last row:
      name  age major
2  Charlie   19  Math
Index: RangeIndex(start=0, stop=3, step=1)
Column names: ['name', 'age', 'major']
Summary:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   name    3 non-null      object
 1   age     3 non-null      int64 
 2   major   3 non-null      object
dtypes: int64(1), object(2)
memory usage: 204.0+ bytes
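
`DataFrame.info()` writes its summary straight to standard output and returns `None`, so its result cannot usefully be passed to `print` or stored in a variable. When the summary is needed as a string, one option is the `buf` parameter, sketched here with an in-memory text buffer (the names `buffer`, `result`, and `summary_text` are illustrative):

```python
import io

import pandas as pd

student_data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 21, 19],
    'major': ['CS', 'Engineering', 'Math']
})

buffer = io.StringIO()
result = student_data.info(buf=buffer)  # summary is written into the buffer
summary_text = buffer.getvalue()

print(result is None)  # True: info() has no return value
print(summary_text)
```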


#### **Task 4.3: Selecting Columns and Rows**
#### Using the `employee_data` DataFrame:
#### - Select and print only the 'name' column.
#### - Select and print the row at index 1.
#### - Select and print the 'name' and 'salary' columns.
#### - Select and print the first two rows and the 'name' and 'department' columns.

In [14]:
print("Name column:")
print(employee_data['name'])

print("Row at index 1:")
print(employee_data.loc[1])

print("Name and salary columns:")
print(employee_data[['name', 'salary']])

print("First two rows with name and department:")
print(employee_data.loc[:1, ['name', 'department']])

Name column:
0    David
1      Eve
2    Frank
Name: name, dtype: object
Row at index 1:
name                Eve
salary            60000
department    Marketing
Name: 1, dtype: object
Name and salary columns:
    name  salary
0  David   50000
1    Eve   60000
2  Frank   55000
First two rows with name and department:
    name department
0  David      Sales
1    Eve  Marketing
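
Note that `.loc[:1]` returns two rows: label-based slices include the end label, unlike ordinary Python slices. The position-based `.iloc` accessor follows the usual end-exclusive rule. A short sketch comparing the two on the same data (rebuilt here so the block is self-contained):

```python
import pandas as pd

employee_data = pd.DataFrame([
    {'name': 'David', 'salary': 50000, 'department': 'Sales'},
    {'name': 'Eve', 'salary': 60000, 'department': 'Marketing'},
    {'name': 'Frank', 'salary': 55000, 'department': 'Sales'}
])

# Label-based: rows labelled 0 through 1, end label included.
by_label = employee_data.loc[:1, ['name', 'department']]

# Position-based: rows at positions 0 and 1, end position excluded;
# columns 0 and 2 are 'name' and 'department'.
by_position = employee_data.iloc[:2, [0, 2]]

print(by_label.equals(by_position))  # True
```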


---

## **5. Operations on DataFrames**

#### **Task 5.1: Adding a New Column**
#### Add a new column named 'city' to the `student_data` DataFrame with the values: `['New York', 'London', 'Paris']`.
#### Print the updated DataFrame.

In [16]:
student_data['city'] = ['New York', 'London', 'Paris']

print(student_data)

      name  age        major      city
0    Alice   20           CS  New York
1      Bob   21  Engineering    London
2  Charlie   19         Math     Paris


#### **Task 5.2: Removing a Column**
#### Remove the 'city' column from the `student_data` DataFrame.
#### Print the DataFrame after removal.

In [17]:
student_data = student_data.drop('city', axis=1)
print(student_data)

      name  age        major
0    Alice   20           CS
1      Bob   21  Engineering
2  Charlie   19         Math
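
`drop` returns a new DataFrame and leaves the original untouched, which is why the result is assigned back to `student_data` above. The sketch below illustrates this with the equivalent `columns=` keyword, which avoids having to remember the `axis` number; `without_city` is an illustrative name.

```python
import pandas as pd

student_data = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [20, 21, 19],
    'major': ['CS', 'Engineering', 'Math'],
    'city': ['New York', 'London', 'Paris']
})

# Equivalent to drop('city', axis=1); the result is a new DataFrame.
without_city = student_data.drop(columns='city')

print('city' in student_data.columns)   # True: the original is unchanged
print('city' in without_city.columns)   # False
```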


#### **Task 5.3: Filtering Rows Based on a Condition**
#### Using the `employee_data` DataFrame, filter and print the rows where the 'department' is 'Sales'.

In [18]:
sales_employees = employee_data[employee_data['department'] == 'Sales']
print(sales_employees)

    name  salary department
0  David   50000      Sales
2  Frank   55000      Sales
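
The expression inside the brackets is a boolean Series (a mask) with one `True`/`False` per row. Masks can be combined with `&` (and), `|` (or), and `~` (not), with each condition wrapped in parentheses. A minimal sketch, where the salary threshold of 52000 is a hypothetical value chosen for illustration:

```python
import pandas as pd

employee_data = pd.DataFrame([
    {'name': 'David', 'salary': 50000, 'department': 'Sales'},
    {'name': 'Eve', 'salary': 60000, 'department': 'Marketing'},
    {'name': 'Frank', 'salary': 55000, 'department': 'Sales'}
])

# Parentheses are required because & binds more tightly than == and >.
mask = (employee_data['department'] == 'Sales') & (employee_data['salary'] > 52000)

high_paid_sales = employee_data[mask]
print(high_paid_sales)
```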


#### **Task 5.4: Sorting a DataFrame**
#### Sort the `employee_data` DataFrame by the 'salary' column in descending order.
#### Print the sorted DataFrame.

In [19]:
sorted_data = employee_data.sort_values(by='salary', ascending=False)
print(sorted_data)

    name  salary department
1    Eve   60000  Marketing
2  Frank   55000      Sales
0  David   50000      Sales
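
The sorted result keeps the original index labels (1, 2, 0), and `employee_data` itself is not modified. As a sketch of two common variations, the block below sorts by more than one column and renumbers the rows with `reset_index(drop=True)`; the choice of sort keys is an illustrative assumption.

```python
import pandas as pd

employee_data = pd.DataFrame([
    {'name': 'David', 'salary': 50000, 'department': 'Sales'},
    {'name': 'Eve', 'salary': 60000, 'department': 'Marketing'},
    {'name': 'Frank', 'salary': 55000, 'department': 'Sales'}
])

# Department A-Z, then salary from highest to lowest within each department.
sorted_data = (
    employee_data
    .sort_values(by=['department', 'salary'], ascending=[True, False])
    .reset_index(drop=True)  # replace the old labels with 0, 1, 2
)

print(sorted_data)
```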


#### **Task 5.5: Basic Aggregation**
#### Using the `employee_data` DataFrame, calculate and print the average salary of all employees.

In [20]:
average_salary = employee_data['salary'].mean()
print("Average salary:", average_salary)

Average salary: 55000.0
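
The same aggregation can be computed per group rather than over the whole column. The following sketch uses `groupby` to get the average salary for each department; it goes slightly beyond the task and is included only to illustrate how `mean()` extends to grouped data.

```python
import pandas as pd

employee_data = pd.DataFrame([
    {'name': 'David', 'salary': 50000, 'department': 'Sales'},
    {'name': 'Eve', 'salary': 60000, 'department': 'Marketing'},
    {'name': 'Frank', 'salary': 55000, 'department': 'Sales'}
])

# Group rows by department, then average the salary within each group.
avg_by_dept = employee_data.groupby('department')['salary'].mean()

print(avg_by_dept)
```

The result is a Series indexed by department, so `avg_by_dept['Sales']` returns the Sales average directly.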
