# Lesson 2: Introduction to NumPy and Pandas
## Biostatistics and Probability
### TA: Pouya Taghipour
### Supervisor: Dr. Mehrdad Saviz

### <span style="color: blue;">1. What is NumPy?</span>
#### <span style="color: green;"><em>Introduction to NumPy</em></span>
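NumPy is a Python library for working with arrays. Its arrays hold elements of a single data type and its operations run in compiled code, so numerical computations are much faster than equivalent Python loops. This makes it a powerful tool for data manipulation and scientific computing.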

#### <span style="color: green;"><em>Installing NumPy</em></span>
Before using NumPy, you need to install it. Run the following command:

In [1]:
pip install numpy

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


### <span style="color: blue;">2. Working with NumPy Arrays</span>
#### <span style="color: green;"><em>Creating Arrays</em></span>
Arrays are the foundation of NumPy. To create an array, use the numpy.array() function:

In [2]:
import numpy as np

arr = np.array([1, 2, 3, 4])
print(arr)  # Output: [1 2 3 4]

[1 2 3 4]


<em>Examples:</em>

In [3]:
# Creating a 1D array
arr1 = np.array([10, 20, 30])
print(arr1)  # Output: [10 20 30]

# Creating a 2D array (matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print(arr2)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Creating arrays using range
arr3 = np.arange(10)
print(arr3)  # Output: [0 1 2 3 4 5 6 7 8 9]

# Creating an array of zeros
arr4 = np.zeros((3, 3))
print(arr4)
# Output:
# [[0. 0. 0.]
#  [0. 0. 0.]
#  [0. 0. 0.]]

# Creating an array of ones
arr5 = np.ones((2, 4))
print(arr5)
# Output:
# [[1. 1. 1. 1.]
#  [1. 1. 1. 1.]]

[10 20 30]
[[1 2 3]
 [4 5 6]]
[0 1 2 3 4 5 6 7 8 9]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]]
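
Every array also carries attributes that describe its layout. The following is a minimal sketch that inspects `shape`, `ndim`, `size`, and `dtype`; the variable names are illustrative and not part of the lesson's own cells.

```python
import numpy as np

# A 2D array with 2 rows and 3 columns
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(matrix.shape)  # (2, 3): rows, columns
print(matrix.ndim)   # 2: number of dimensions
print(matrix.size)   # 6: total number of elements

# np.zeros produces floating-point values unless a dtype is given
zeros_float = np.zeros((2, 2))
zeros_int = np.zeros((2, 2), dtype=int)
print(zeros_float.dtype)  # float64
print(zeros_int.dtype)    # an integer type; the exact width depends on the platform
```

Checking `shape` and `dtype` first is often the quickest way to find out why an operation on two arrays fails.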


### Exercise 1:
<span style="color: red;">Task 1:</span>
Create a 3x3 array of random integers between 1 and 50.

<span style="color: red;">Task 2:</span>
Create a 1D array of even numbers between 0 and 20.

In [4]:
# Start coding here

# End here

### <span style="color: blue;">3. Array Operations</span>
#### <span style="color: green;"><em>Basic Arithmetic on Arrays</em></span>
You can perform element-wise arithmetic operations on NumPy arrays.

In [5]:
arr = np.array([1, 2, 3, 4])
print(arr + 2)  # Output: [3 4 5 6]
print(arr * 3)  # Output: [ 3  6  9 12]

[3 4 5 6]
[ 3  6  9 12]


<em>Examples:</em>

In [6]:
# Adding two arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
result = arr1 + arr2
print(result)  # Output: [5 7 9]

# Multiplying arrays
result = arr1 * arr2
print(result)  # Output: [ 4 10 18]

# Scalar operations
arr = np.array([10, 20, 30])
print(arr - 5)  # Output: [ 5 15 25]

[5 7 9]
[ 4 10 18]
[ 5 15 25]
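
Element-wise arithmetic also works between arrays of different shapes when NumPy can "broadcast" the smaller one. The sketch below, using illustrative variable names, adds a 1D array to every row of a 2D array.

```python
import numpy as np

# A 2x3 matrix and a 1D array whose length matches the number of columns
matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])

# The 1D array is broadcast across every row of the matrix
shifted = matrix + row
print(shifted)
# [[11 22 33]
#  [14 25 36]]

# Other element-wise operators work the same way
squared = matrix ** 2   # squares each element
halved = matrix / 2     # true division, result is floating point
print(squared)
print(halved)
```

Adding a scalar, as in `arr + 2`, is the simplest case of the same rule: the scalar is treated as if it were repeated for every element.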


#### <span style="color: green;"><em>Matrix Multiplication</em></span>
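For matrix multiplication, use np.dot() (for 2D arrays, the @ operator gives the same result). Note that the * operator multiplies element by element and is not matrix multiplication: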

In [7]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.dot(A, B)
print(C)
# Output:
# [[19 22]
#  [43 50]]

[[19 22]
 [43 50]]
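
Matrix multiplication does not require square matrices, only that the number of columns of the first matrix equals the number of rows of the second. A small sketch with illustrative matrices, using the `@` operator:

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])    # shape (2, 3)
B = np.array([[1, 0], [0, 1], [1, 1]])  # shape (3, 2)

# The inner dimensions (3 and 3) match, so the result has shape (2, 2)
C = A @ B
print(C)
# [[ 4  5]
#  [10 11]]

# For 2D arrays, @ and np.dot give the same result
same = np.array_equal(C, np.dot(A, B))
print(same)  # True
```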


### Exercise 2:
<span style="color: red;">Task 1:</span>
Create two 2x2 matrices. Multiply them using both element-wise multiplication and matrix multiplication.

<span style="color: red;">Task 2:</span>
Create a 1D array and subtract a scalar from each element.

In [8]:
# Start coding here

# End here

### <span style="color: blue;">4. Array Indexing and Slicing</span>
#### <span style="color: green;"><em>Accessing Elements</em></span>
You can access array elements using indices, which start at 0:

In [9]:
arr = np.array([10, 20, 30, 40])
print(arr[0])  # Output: 10

10


<em>Examples:</em>

In [10]:
# Accessing 2D array elements
arr2 = np.array([[1, 2], [3, 4], [5, 6]])
print(arr2[0, 1])  # Output: 2
print(arr2[2, 0])  # Output: 5

# Slicing arrays
arr = np.array([0, 1, 2, 3, 4, 5])
print(arr[1:4])  # Output: [1 2 3]

# Slicing 2D arrays
print(arr2[1:, :1])
# Output:
# [[3]
#  [5]]

2
5
[1 2 3]
[[3]
 [5]]
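
Two further indexing features come up often in practice: negative indices and boolean masks. The sketch below also illustrates that a slice is a view of the original data rather than a copy; the array and variable names are illustrative.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Negative indices count from the end
print(arr[-1])    # 50
print(arr[-2:])   # [40 50]

# A boolean mask selects the elements where the condition is True
print(arr[arr > 25])  # [30 40 50]

# A slice is a view: modifying it changes the original array
view = arr[1:3]
view[0] = 99
print(arr)  # [10 99 30 40 50]

# Use .copy() when an independent array is needed
independent = arr[1:3].copy()
independent[0] = 0
print(arr)  # still [10 99 30 40 50]
```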


### Exercise 3:
<span style="color: red;">Task 1:</span>
Create a 3x3 matrix and print the second row and the third column.

<span style="color: red;">Task 2:</span>
Slice a 1D array to print only the even-indexed elements.

In [None]:
# Start coding here

# End here

### <span style="color: blue;">5. Introduction to Pandas</span>
#### <span style="color: green;"><em>What is Pandas?</em></span>
Pandas is a library used for data manipulation and analysis. It provides powerful data structures like Series and DataFrame for handling labeled data.

#### <span style="color: green;"><em>Installing Pandas</em></span>
Run the following command to install Pandas:

In [11]:
pip install pandas

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


### <span style="color: blue;">6. Pandas Series</span>
#### <span style="color: green;"><em>Creating a Series</em></span>
A Series is like a column of data with an index:

In [12]:
import pandas as pd

s = pd.Series([10, 20, 30, 40])
print(s)

0    10
1    20
2    30
3    40
dtype: int64


<em>Examples:</em>

In [13]:
# Creating a Series with custom index
s = pd.Series([100, 200, 300], index=["a", "b", "c"])
print(s)
# Output:
# a    100
# b    200
# c    300
# dtype: int64

# Accessing elements by index
print(s["a"])  # Output: 100

a    100
b    200
c    300
dtype: int64
100
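
Besides square brackets, a Series can be indexed explicitly by label with `.loc` or by position with `.iloc`. A minimal sketch, reusing the custom-index Series from the example above:

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=["a", "b", "c"])

# .loc selects by label, .iloc selects by integer position
print(s.loc["b"])   # 200
print(s.iloc[0])    # 100

# Label-based slices include the end label
first_two = s.loc["a":"b"]
print(first_two)

# Boolean conditions filter the Series
large = s[s > 150]
print(large)
```

Using `.loc` and `.iloc` makes it explicit whether a label or a position is meant, which avoids ambiguity when the index itself consists of integers.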


### Exercise 4:
<span style="color: red;">Task 1:</span>
Create a Series of 5 random numbers with custom indices. Access the value at the third index.

<span style="color: red;">Task 2:</span>
Create a Series with numbers from 1 to 5. Multiply each element by 2.

In [None]:
# Start coding here

# End here

### <span style="color: blue;">7. Pandas DataFrame</span>
#### <span style="color: green;"><em>Creating a DataFrame</em></span>
A DataFrame is a 2D labeled data structure similar to a table:

In [18]:
data = {
    "Name": ["Alice", "Bob", "Charlie", "Ali"],
    "Age": [25, 30, 35, 30],
    "City": ["New York", "Los Angeles", "Chicago", ""]
}
df = pd.DataFrame(data)
print(df)

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
3      Ali   30             


<em>Examples:</em>

In [19]:
# Accessing columns
print(df["Name"])  # Output: Alice, Bob, Charlie, Ali

# Accessing rows by index
print(df.iloc[1])  # Output: Bob's row

# Adding a new column
df["Salary"] = [50000, 60000, 70000, 55000]  # one value per row (illustrative figures)
print(df)

0      Alice
1        Bob
2    Charlie
3        Ali
Name: Name, dtype: object
Name            Bob
Age              30
City    Los Angeles
Name: 1, dtype: object
      Name  Age         City  Salary
0    Alice   25     New York   50000
1      Bob   30  Los Angeles   60000
2  Charlie   35      Chicago   70000
3      Ali   30                55000
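
Rows and columns can also be selected together. The sketch below uses a small, self-contained table (the `people` name and its values are illustrative) to show column lists, `.loc`, and `.iloc`, and it adds a column whose length matches the number of rows.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

# A list of column names returns a smaller DataFrame
subset = people[["Name", "City"]]
print(subset)

# .loc selects by row label and column name
print(people.loc[0, "Name"])   # Alice

# .iloc selects by integer position (row 2, column 1)
print(people.iloc[2, 1])       # 35

# A new column needs exactly one value per row
people["Salary"] = [50000, 60000, 70000]
print(people.shape)  # (3, 4)
```

Assigning a list with the wrong number of values raises a `ValueError`, because pandas cannot decide which rows the values belong to.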

### Exercise 5:
<span style="color: red;">Task 1:</span>
Create a DataFrame with 3 columns: Product, Price, and Quantity. Add a new column for Total Price which multiplies Price and Quantity.

<span style="color: red;">Task 2:</span>
Create a DataFrame of students with their grades in three subjects. Calculate the average grade for each student and add it as a new column.

In [None]:
# Start coding here

# End here

### <span style="color: blue;">8. DataFrame Operations</span>
#### <span style="color: green;"><em>Basic Operations</em></span>
You can perform operations on entire columns or rows of a DataFrame:

In [None]:
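# Example product table (illustrative values; the earlier df has no "Price" column)
df = pd.DataFrame({"Product": ["Pen", "Book", "Bag"], "Price": [20, 30, 50], "Quantity": [10, 4, 2]})

# Adding a new column based on existing columns
df["Discounted Price"] = df["Price"] * 0.9
print(df)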

<em>Examples:</em>

In [None]:
# Filtering rows based on a condition
discounted_df = df[df["Discounted Price"] > 25]
print(discounted_df)

# Sorting the DataFrame by a column
sorted_df = df.sort_values(by="Price")
print(sorted_df)

# Summing column values
total_quantity = df["Quantity"].sum()
print(f"Total Quantity: {total_quantity}")
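
Filters can combine several conditions, and sorting can run in descending order. The following sketch assumes a hypothetical `products` table; its names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical product data used only for this illustration
products = pd.DataFrame({
    "Product": ["Pen", "Book", "Bag", "Lamp"],
    "Price": [20, 30, 50, 40],
    "Quantity": [10, 4, 2, 5],
})

# Combine conditions with & (and) or | (or); each condition needs parentheses
selected = products[(products["Price"] > 25) & (products["Quantity"] >= 4)]
print(selected)

# Sort from highest to lowest price
by_price_desc = products.sort_values(by="Price", ascending=False)
print(by_price_desc)

# Summary statistics for a numeric column
print(products["Price"].mean())  # 35.0
print(products["Price"].max())   # 50
```

The parentheses around each condition are required because `&` and `|` bind more tightly than comparison operators in Python.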

### Exercise 6:
<span style="color: red;">Task 1:</span>
Create a DataFrame of employees with columns Name, Age, Salary. Add a new column Bonus that is 10% of the Salary.

<span style="color: red;">Task 2:</span>
Filter the employees whose Salary is greater than 50,000 and sort the DataFrame by Age.

In [None]:
# Start coding here

# End here

### <span style="color: blue;">9. Reading and Writing Data with Pandas</span>
#### <span style="color: green;"><em>Reading Data from CSV Files</em></span>
Pandas makes it easy to load data from external sources like CSV files. Use pd.read_csv() to load data from a CSV file:

In [None]:
# Reading a CSV file (data.csv must exist in the working directory)
df = pd.read_csv("data.csv")
print(df.head())

<em>Examples:</em>

In [None]:
# Reading data from a CSV file
df = pd.read_csv("students.csv")
print(df)

# Saving a DataFrame to a CSV file
df.to_csv("updated_students.csv", index=False)

# Reading data from an Excel file (.xlsx files require the openpyxl package)
df_excel = pd.read_excel("grades.xlsx")
print(df_excel)
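
The cells above need the named files to exist on disk. To try the same functions without any files, one option is to read CSV text from memory. This is a sketch under that assumption; the student names and grades are made up.

```python
import io
import pandas as pd

# CSV text held in memory; a real file path could be used instead
csv_text = "Name,Grade\nSara,18\nOmid,15\nLeila,20\n"

# read_csv accepts file-like objects as well as file paths
students = pd.read_csv(io.StringIO(csv_text))
print(students.head())

# to_csv returns the CSV text as a string when no path is given
output_text = students.to_csv(index=False)
print(output_text)
```

Passing `index=False` leaves the row index out of the output, so reading the text back gives the same columns as the original table.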

### Exercise 7:
<span style="color: red;">Task 1:</span>
Read data from a CSV file and print the first 5 rows.

<span style="color: red;">Task 2:</span>
Save a DataFrame with columns Product, Price, Quantity to a new CSV file.

In [None]:
# Start coding here

# End here

### <span style="color: blue;">10. Grouping and Aggregating Data</span>
#### <span style="color: green;"><em>GroupBy Function</em></span>
You can group data using the groupby() function, which is useful for aggregating data.

In [None]:
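# Example employee table (illustrative values) with the columns used below
df = pd.DataFrame({"City": ["Chicago", "Chicago", "Boston"], "Age": [30, 35, 30], "Salary": [50000, 70000, 60000]})

# Grouping data by a column
grouped = df.groupby("City")["Salary"].mean()
print(grouped)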

<em>Examples:</em>

In [None]:
# Grouping by multiple columns
grouped = df.groupby(["City", "Age"])["Salary"].sum()
print(grouped)

# Aggregating using multiple functions
agg_df = df.groupby("City").agg({"Salary": ["mean", "max"], "Age": "min"})
print(agg_df)
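
Aggregating with a dictionary of lists produces two-level column names. An alternative that yields flat, readable column names is named aggregation, sketched here on a hypothetical `employees` table with made-up values.

```python
import pandas as pd

# Hypothetical employee data used only for this illustration
employees = pd.DataFrame({
    "City": ["Chicago", "Chicago", "Boston", "Boston"],
    "Salary": [50000, 70000, 60000, 80000],
    "Age": [30, 35, 28, 41],
})

# Named aggregation: new_column=(source_column, function)
summary = employees.groupby("City").agg(
    mean_salary=("Salary", "mean"),
    max_salary=("Salary", "max"),
    youngest=("Age", "min"),
)
print(summary)

# reset_index turns the group labels back into a regular column
flat = summary.reset_index()
print(flat)
```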

### Exercise 8:
<span style="color: red;">Task 1:</span>
Create a DataFrame of sales data with City, Salesperson, and Amount. Group the data by City and calculate the total Amount for each city.

<span style="color: red;">Task 2:</span>
Group a DataFrame by Department and calculate the average salary for each department.

In [None]:
# Start coding here

# End here

### <span style="color: blue;">11. Merging and Concatenating DataFrames</span>
#### <span style="color: green;"><em>Concatenating DataFrames</em></span>
You can concatenate DataFrames along rows or columns using pd.concat():

In [None]:
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# Concatenating along rows
df_concat = pd.concat([df1, df2], axis=0)
print(df_concat)
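
When DataFrames are stacked along rows, each one keeps its own index, so index labels can repeat. A short sketch of how `ignore_index=True` renumbers the rows, using the same two tables:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# The original index labels are kept, so 0 and 1 each appear twice
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())     # [0, 1, 0, 1]

# ignore_index=True assigns a fresh index from 0 to n - 1
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```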

<em>Examples:</em>

In [None]:
# Concatenating along columns (rows are aligned by index; the result has columns A, B, A, B)
df_concat = pd.concat([df1, df2], axis=1)
print(df_concat)

# Merging DataFrames using a common column
left = pd.DataFrame({"key": ["A", "B", "C"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["A", "B", "D"], "value2": [4, 5, 6]})

merged_df = pd.merge(left, right, on="key", how="inner")
print(merged_df)
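
An inner merge keeps only the keys present in both tables ("A" and "B" here). The `how` parameter controls this; the sketch below shows left and outer joins on the same two tables.

```python
import pandas as pd

left = pd.DataFrame({"key": ["A", "B", "C"], "value": [1, 2, 3]})
right = pd.DataFrame({"key": ["A", "B", "D"], "value2": [4, 5, 6]})

# Left join: keep every row of `left`; unmatched rows get NaN in value2
left_join = pd.merge(left, right, on="key", how="left")
print(left_join)

# Outer join: keep keys from both tables; indicator=True adds a _merge column
# that records whether each row came from the left table, the right, or both
outer_join = pd.merge(left, right, on="key", how="outer", indicator=True)
print(outer_join)
```

Missing values introduced by a join turn integer columns into floating point, because NaN is a floating-point value.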

### Exercise 9:
<span style="color: red;">Task 1:</span>
Concatenate two DataFrames containing students’ names and grades in two different subjects.

<span style="color: red;">Task 2:</span>
Merge two DataFrames: one containing employee data and another containing department information. Use the employee ID to merge.

In [None]:
# Start coding here

# End here

## Conclusion
In this lesson, we covered the essential operations in NumPy and Pandas that will serve as the foundation for statistical and probability-related tasks in the next lesson. We explored:

* NumPy arrays and operations,
* Pandas Series and DataFrames,
* Reading, writing, grouping, and merging data.

Ensure you've understood each concept and completed the exercises. In the next session, we’ll focus on using these libraries in statistical applications.