# Welcome to a PySpark notebook in Microsoft Fabric!
---
### This is an example of a *Markdown* cell
##### Before we look at some code, let's review some aspects of the UI
1. Attaching lakehouse(s) and warehouse(s) to the notebook (see the code snippet below for environment info)
2. Setting the default language for the notebook
3. Connecting to a session
4. The Run all button

### Hey DataBard! Don't forget to tell them about Data Wrangler!

In [1]:
#1. How do we access data in a connected lakehouse?
#Databard, do the demo everyone has to do with dimension_customer!


df = spark.sql("SELECT * FROM LH_Databard_Demo.dimension_customer LIMIT 1000")
display(df)


In [None]:
#How do we write to storage?
#With the DataFrame's write API.
#Since a lakehouse is attached, I can use a path relative to it.
df.write.mode("overwrite").format("delta").save("Tables/dimension_customer_write")

#Without an attached lakehouse, use the full ABFSS path instead.
#Note: to run this in your environment, copy your own lakehouse's ABFS path in here.
df.write.mode("overwrite").format("delta").save("abfss://84e6d815-34b7-49bb-a433-ebec208e5cdb@onelake.dfs.fabric.microsoft.com/b389d6fd-e091-479b-ac43-6640a58407bd/Tables/dimension_customer_write")

#I can also write the DataFrame in other formats, such as CSV. The header option keeps the column names in the files.
df.write.mode("overwrite").format("csv").option("header", "true").save("Files/CustomerCSVs/dimension_customer_write")
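The absolute path above is the workspace ID and the lakehouse ID slotted into a fixed OneLake URI. A minimal sketch of building it from those two GUIDs, assuming the same `abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<lakehouse>/...` pattern holds for your lakehouse; `onelake_path` is a hypothetical helper, not part of any Fabric library. Inside Fabric, `fabric.get_workspace_id()` (used later in this notebook) could supply the workspace ID.

```python
def onelake_path(workspace_id: str, lakehouse_id: str, relative_path: str) -> str:
    """Build an absolute OneLake ABFSS URI (hypothetical helper).

    Assumes the pattern of the hard-coded path in the cell above:
    abfss://<workspace_id>@onelake.dfs.fabric.microsoft.com/<lakehouse_id>/<relative_path>
    """
    # Strip a leading slash so "Tables/x" and "/Tables/x" give the same result
    return (
        f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/"
        f"{lakehouse_id}/{relative_path.lstrip('/')}"
    )


# These IDs are the ones from the hard-coded path above; substitute your own
path = onelake_path(
    "84e6d815-34b7-49bb-a433-ebec208e5cdb",
    "b389d6fd-e091-479b-ac43-6640a58407bd",
    "Tables/dimension_customer_write",
)
print(path)
```

With `path` built this way, `df.write.mode("overwrite").format("delta").save(path)` targets the same location as the hard-coded version.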


In [9]:
#Let's load a data file I've prepared that contains data about artists in the music industry.
df = spark.read.format("csv").option("header","true").load("Files/Input/updated_music_industry_data.csv")
# df is now a Spark DataFrame containing the CSV data. Without inferSchema or an explicit schema, every column is read as a string.

#display() lets us view the contents of a DataFrame and create charts from it.
display(df)


In [None]:
# Code generated by Data Wrangler for pandas sample

def clean_data(pandas_df):
    # Drop duplicate rows across all columns
    pandas_df = pandas_df.drop_duplicates()
    # Change column type to int8 for column: 'Age'
    pandas_df = pandas_df.astype({'Age': 'int8'})
    return pandas_df

# Loaded variable 'df' from kernel state
pandas_df = df.limit(5000).toPandas()

pandas_df_clean = clean_data(pandas_df.copy())
pandas_df_clean.head()

In [2]:
# Code generated by Data Wrangler for PySpark DataFrame

from pyspark.sql import types as T

def clean_data(df):
    # Drop duplicate rows across all columns
    df = df.dropDuplicates()
    # Change column type to int8 for column: 'Age'
    df = df.withColumn('Age', df['Age'].cast(T.ByteType()))
    return df

df_clean = clean_data(df)
display(df_clean)


# What to do when Weird Al has taken over your data?!
# Data Wrangler to the rescue!
## Let's try the following:
### 1. Get rid of duplicates
### 2. Convert Age into an Integer
### 3. If time allows, split Name into first and last
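Before opening Data Wrangler, it can help to see what these three steps amount to. Below is a plain-Python sketch on a list of dictionaries; the sample rows are invented, and the real data is a Spark DataFrame. It assumes whole-number ages, and splitting on the last space is an assumption that happens to match how the generated data stores "Weird Al" as the first name.

```python
def clean_rows(rows):
    """Plain-Python version of the three cleanup steps, on a list of dict rows."""
    seen = set()
    cleaned = []
    for row in rows:
        # 1. Drop exact duplicates: a row is skipped only if every column matches
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        new_row = dict(row)
        # 2. Values read from CSV are strings, so convert Age to an integer
        new_row["Age"] = int(new_row["Age"])
        # 3. Split on the last space: "Weird Al Yankovic" -> "Weird Al", "Yankovic"
        first, _, last = new_row["Name"].rpartition(" ")
        new_row["FirstName"], new_row["LastName"] = (first, last) if first else (last, "")
        cleaned.append(new_row)
    return cleaned


# Hypothetical sample rows
rows = [
    {"Name": "Weird Al Yankovic", "Age": "64", "Genre": "Pop"},
    {"Name": "Weird Al Yankovic", "Age": "64", "Genre": "Pop"},
    {"Name": "Jane Smith", "Age": "31", "Genre": "Jazz"},
]
for row in clean_rows(rows):
    print(row)
```

The Data Wrangler code above covers steps 1 and 2; step 3 is the part you would add interactively.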

In [10]:
#What if we want to run this notebook from outside and make it dynamic?
#Parameter cell: these default values can be overridden by external callers, such as pipelines.

Parameter1 = ''
Parameter2 = 2

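As we understand Fabric's behavior, when a pipeline or another notebook runs this notebook with parameters, a cell is added right after the parameter cell that reassigns the passed variables. Passed values therefore win, and anything not passed keeps its default. The merge below is only a plain-Python model of that rule; `effective_parameters` is hypothetical and is not something Fabric calls.

```python
# Defaults, as declared in the parameter cell above
defaults = {"Parameter1": "", "Parameter2": 2}


def effective_parameters(defaults, passed):
    """Model of the override: values passed by the caller replace the defaults."""
    merged = dict(defaults)  # start from the parameter cell's values
    merged.update(passed)    # the injected assignments run afterwards, so they win
    return merged


params = effective_parameters(defaults, {"Parameter1": "from_pipeline"})
print(params)  # {'Parameter1': 'from_pipeline', 'Parameter2': 2}
```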

In [1]:
#What about all of these storage solutions I want to interact with? 
#How do I reach them without using the UI?
import sempy
import sempy.fabric as fabric
import json
from pyspark.sql.functions import col,lit
import pandas as pd

#Can I get the lakehouse ID for a lakehouse by finding its name?
lakehouseName = 'LH_Databard_Demo'

#Get basic details from the fabric library
workspaceID = fabric.get_workspace_id()
workspaceName = fabric.resolve_workspace_name(workspaceID)

#Call the Fabric REST API, specifically looking at the lakehouse API
client = fabric.FabricRestClient()
response = client.get(f"/v1/workspaces/{workspaceID}/lakehouses")
responseJson = response.json()
items = pd.json_normalize(responseJson['value'], sep='_')

#Create a dataframe and start looking for our values
df = spark.createDataFrame(items)
#display(df)

#This is for context, showing some of the various properties we can get about lakehouses
result_df = df.select(lit(workspaceName).alias("WorkSpace"),
                      col("id").alias("LakehouseId"),
                      col("displayName").alias("Name"),
                      col("type").alias("Type"),
                      col("description").alias("Description"),
                      col("properties_sqlEndpointProperties_connectionString").alias("ConnectionString"),
                      col("properties_oneLakeTablesPath").alias("OneLakeTablePath"),
                      col("properties_oneLakeFilesPath").alias("OneLakeFilePath"))

#Filter to the record we want
result_df = result_df.filter(result_df["Name"] == lakehouseName)
display(result_df)

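#Return the Lakehouse ID of the desired lakehouse as a string (None if there is no match)
first_match = result_df.select("LakehouseId").first()
result_lakehouseid = first_match["LakehouseId"] if first_match else None
#print(result_lakehouseid)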

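If all you need is the ID, the Spark DataFrame is optional: the parsed JSON can be searched directly. A minimal sketch, assuming the response shape implied by the columns used above (`{"value": [{"id": ..., "displayName": ...}, ...]}`); `find_lakehouse_id` is hypothetical and the sample IDs are placeholders. In the notebook you would pass `responseJson`. It reads a single response only; for workspaces with many items the list API may return a continuation token, which this sketch does not follow.

```python
def find_lakehouse_id(response_json, lakehouse_name):
    """Return the id of the first lakehouse whose displayName matches, or None."""
    for item in response_json.get("value", []):
        if item.get("displayName") == lakehouse_name:
            return item.get("id")
    return None


# Placeholder response with made-up IDs
sample = {"value": [
    {"id": "1111", "displayName": "LH_Other", "type": "Lakehouse"},
    {"id": "2222", "displayName": "LH_Databard_Demo", "type": "Lakehouse"},
]}
print(find_lakehouse_id(sample, "LH_Databard_Demo"))  # 2222
```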

In [6]:
#NotebookUtils demo
#Notice that you don't have to create environment or connection objects yourself to reach these directories.
#Fabric already knows your environment, and NotebookUtils gives you simple tools to work with its files.
import notebookutils

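#Input parameters
base_directory_absolute = 'abfss://e416ff5b-6376-4614-a9a0-3f79fe99c1b0@onelake.dfs.fabric.microsoft.com/c1d01625-3177-4082-a112-4339acf9c69d/Files/Input'  #Absolute OneLake path, shown for comparison (not used below)
base_directory = 'Files' #Simpler, right?
ignore_file_mask = 'Test'  #Files whose names contain this text are skipped
file_target = 'music_industry_data.csv'  #Pass as target_file to move only this file
input_directory = f'{base_directory}/Input'
processed_directory = f'{base_directory}/Processed'
ignored_paths = {"Input", "Processed"}  #Folders that create_base_architecture leaves in place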

#Folder creation and file move functions
def create_base_architecture(base_directory: str):
    base_list = notebookutils.fs.ls(base_directory)
    #Skip the Input and Processed folders themselves
    base_list = [file for file in base_list if file.name not in ignored_paths]
    #Create the directories if they don't already exist
    notebookutils.fs.mkdirs(input_directory)
    notebookutils.fs.mkdirs(processed_directory)
    display("Processing directories created. Checking for files...")
    if base_list:
        display("Files exist. Begin process...")
        display(base_list)
        #Assume every remaining item in the base directory is an input file
        for file in base_list:
            notebookutils.fs.mv(file.path, f"{input_directory}/{file.name}")
    else:
        display("No files moved to initial directory. No files exist.")

def move_files(input_directory: str, processed_directory: str, ignore_mask: str = ignore_file_mask, target_file: str = None):
    file_list = notebookutils.fs.ls(input_directory)
    if file_list:
        #display(file_list)
        #Skip any file whose name contains the ignore mask
        #removed_list = [file for file in file_list if ignore_mask in file.name]
        #display("The following files will be ignored because their names contain the ignore mask.")
        #display(removed_list)
        file_list = [file for file in file_list if ignore_mask not in file.name]

        for file in file_list:
            #With no target_file, move everything; otherwise move only the target
            if (file.name == target_file) or (not target_file):
                display(f"Current file: {file.name}")
                notebookutils.fs.mv(file.path, f"{processed_directory}/{file.name}")
            else:
                display(f"Current file {file.name} ignored. Not target file.")
    else:
        display("No files moved. No files found in source directory.")

#Main section
create_base_architecture(base_directory=base_directory)

move_files(input_directory, processed_directory)


'Processing directories created. Checking for files...'

'No files moved to initial directory. No files exist.'

'Current file: updated_music_industry_data.csv'
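The demo call above omits `target_file`, so every file without the mask was moved. To try the filtering rules without touching OneLake, here is a local-filesystem sketch that uses `pathlib` and `shutil` in place of `notebookutils.fs`; `move_matching_files` is a hypothetical stand-in, not a NotebookUtils API, and the file names are made up.

```python
import shutil
import tempfile
from pathlib import Path


def move_matching_files(src: Path, dst: Path, ignore_mask: str, target_file=None):
    """Local stand-in for move_files: skip names containing ignore_mask and,
    if target_file is set, move only that file. Returns the names moved."""
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(src.iterdir()):  # sorted() lists everything before we move anything
        if ignore_mask in entry.name:
            continue  # masked out, like the list comprehension in move_files
        if target_file and entry.name != target_file:
            continue  # the "Not target file" branch
        shutil.move(str(entry), str(dst / entry.name))
        moved.append(entry.name)
    return moved


with tempfile.TemporaryDirectory() as tmp:
    input_dir, processed_dir = Path(tmp, "Input"), Path(tmp, "Processed")
    input_dir.mkdir()
    for name in ["music_industry_data.csv", "Test_data.csv", "other.csv"]:
        (input_dir / name).write_text("Name,Age\n")
    moved = move_matching_files(input_dir, processed_dir, "Test", "music_industry_data.csv")
    remaining = sorted(p.name for p in input_dir.iterdir())

print(moved)      # ['music_industry_data.csv']
print(remaining)  # ['Test_data.csv', 'other.csv']
```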

# Demo End

# Debug Section Start

In [8]:
##How did I create the Weird Al Data?
from pyspark.sql.functions import lit, when, col, expr, rand, floor
import random

# Demo data. Let's create a bunch of possible values for every column
names = ["John Doe", "Jane Smith", "Alice Johnson", "Bob Brown", "Charlie Davis", "Diana Evans", "Eve Foster", "Frank Green", "Grace Harris"]
first_names = ["John", "Jane", "Alice", "Bob", "Charlie", "Diana", "Eve", "Frank", "Grace", "Smith", "Bill", "Jon", "Vivian", "Stacy", "Heidi", "Karen", "Otto", "Belinda"]
last_names = ["Doe", "Smith", "Johnson", "Brown", "Davis", "Evans", "Foster", "Green", "Harris", "Johann", "Pingel", "Kelbert", "Hiddleston", "Windsor", "Workmann", "Drews"]
genres = ["Rock", "Pop", "Jazz", "Classical", "Hip Hop", "Country", "Electronic", "Reggae", "Blues", "Metal"]
addresses = ["123 Main St", "456 Elm St", "789 Oak St", "101 Maple Ave", "202 Pine Rd", "303 Cedar Blvd", "404 Birch Ln", "505 Spruce Dr", "606 Willow Ct", "707 Aspen Pl"]
emails = ["example1@example.com", "example2@example.com", "example3@example.com", "example4@example.com", "example5@example.com", "example6@example.com", "example7@example.com", "example8@example.com", "example9@example.com", "example10@example.com"]
phone_numbers = ["555-1234", "555-5678", "555-8765", "555-4321", "555-6789", "555-9876", "555-3456", "555-6543", "555-7890", "555-0987"]

# Add in some environment info for where we're going to save our results
lakehouse_address = 'abfss://84e6d815-34b7-49bb-a433-ebec208e5cdb@onelake.dfs.fabric.microsoft.com/b389d6fd-e091-479b-ac43-6640a58407bd'
file_name = 'music_industry_data.csv'
updated_file_name = 'updated_music_industry_data.csv'

#Create some derived variables based on the environment info
file_address = f"{lakehouse_address}/Files/Input/{file_name}"
updated_file_address = f"{lakehouse_address}/Files/Input/{updated_file_name}"

# Generate a random age between 18 and 70, either a whole number or a decimal (only used by the commented-out version below)
def generate_random_age():
    return random.randint(18, 70) if random.choice([True, False]) else random.uniform(18, 70)

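# Earlier version, kept for reference. It does not randomize per row: lit(random.choice(...)) and lit(generate_random_age()) are evaluated once in Python, so every row gets the same value.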
# data = spark.range(0, 100000000).withColumn("Name", when((col("id") < 20) | (col("id") % 5 == 0), lit("Weird Al Yankovic"))
#                                             .otherwise(lit(random.choice(names)))) \
#                                 .withColumn("FirstName", when((col("id") < 20) | (col("id") % 5 == 0), lit("Weird Al"))
#                                             .otherwise(lit(random.choice(first_names)))) \
#                                 .withColumn("LastName", when((col("id") < 20) | (col("id") % 5 == 0), lit("Yankovic"))
#                                             .otherwise(lit(random.choice(last_names)))) \
#                                 .withColumn("Age", lit(generate_random_age())) \
#                                 .withColumn("Address", lit(random.choice(addresses))) \
#                                 .withColumn("Email", lit(random.choice(emails))) \
#                                 .withColumn("PhoneNumber", lit(random.choice(phone_numbers))) \
#                                 .withColumn("Genre", lit(random.choice(genres))) \
#                                 .withColumn("AgeBucket", lit("Unknown"))


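# Generate 100 million rows. rand() inside expr() is evaluated per row by Spark, so each row gets its own random pick.
data = spark.range(0, 100000000).withColumn(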
    "Name",
    when((col("id") < 20) | (col("id") % 5 == 0), lit("Weird Al Yankovic"))
    .otherwise(expr("element_at(array(" + ", ".join([f"'{name}'" for name in names]) + "), cast(rand() * " + str(len(names)) + " + 1 as int))"))
).withColumn(
    "FirstName",
    when((col("id") < 20) | (col("id") % 5 == 0), lit("Weird Al"))
    .otherwise(expr("element_at(array(" + ", ".join([f"'{first_name}'" for first_name in first_names]) + "), cast(rand() * " + str(len(first_names)) + " + 1 as int))"))
).withColumn(
    "LastName",
    when((col("id") < 20) | (col("id") % 5 == 0), lit("Yankovic"))
    .otherwise(expr("element_at(array(" + ", ".join([f"'{last_name}'" for last_name in last_names]) + "), cast(rand() * " + str(len(last_names)) + " + 1 as int))"))
).withColumn(
    "Age",
    (floor(rand() * 53) + 18).cast("int")  # Random age between 18 and 70
).withColumn(
    "Genre",
    expr("element_at(array(" + ", ".join([f"'{genre}'" for genre in genres]) + "), cast(rand() * " + str(len(genres)) + " + 1 as int))")
).withColumn(
    "Address",
    expr("element_at(array(" + ", ".join([f"'{address}'" for address in addresses]) + "), cast(rand() * " + str(len(addresses)) + " + 1 as int))")
).withColumn(
    "Email",
    expr("element_at(array(" + ", ".join([f"'{email}'" for email in emails]) + "), cast(rand() * " + str(len(emails)) + " + 1 as int))")
).withColumn(
    "PhoneNumber",
    expr("element_at(array(" + ", ".join([f"'{phone}'" for phone in phone_numbers]) + "), cast(rand() * " + str(len(phone_numbers)) + " + 1 as int))")
).withColumn(
    "AgeBucket",
    lit("Unknown")
)
#We don't want the ID anymore, so we'll remove that here
data = data.drop("id")


# Display the DataFrame
#print(data)

# Save the DataFrame as CSV. Spark writes a folder of part files at this path, not a single .csv file.
data.write.mode("overwrite").format("csv").option("header", "true").save(file_address)

# Now, let's mess with the data
# Obviously this isn't something you'd do in a production setting, this is purely for dramatic effect.
# Find the first record with the name 'Weird Al Yankovic'
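first_weird_al_record = data.filter(col("Name") == "Weird Al Yankovic").limit(1).collect()[0]

# Copy that record's values onto every Weird Al row. All Weird Al rows become exact duplicates,
# which is what the Data Wrangler demo cleans up.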
data_updated = data.withColumn("Age", when(col("Name") == "Weird Al Yankovic", lit(first_weird_al_record["Age"])).otherwise(col("Age"))) \
                   .withColumn("Address", when(col("Name") == "Weird Al Yankovic", lit(first_weird_al_record["Address"])).otherwise(col("Address"))) \
                   .withColumn("Email", when(col("Name") == "Weird Al Yankovic", lit(first_weird_al_record["Email"])).otherwise(col("Email"))) \
                   .withColumn("PhoneNumber", when(col("Name") == "Weird Al Yankovic", lit(first_weird_al_record["PhoneNumber"])).otherwise(col("PhoneNumber"))) \
                   .withColumn("Genre", when(col("Name") == "Weird Al Yankovic", lit(first_weird_al_record["Genre"])).otherwise(col("Genre"))) \
                   .withColumn("AgeBucket", when(col("Name") == "Weird Al Yankovic", lit(first_weird_al_record["AgeBucket"])).otherwise(col("AgeBucket")))


# # Display the updated DataFrame
# print(data_updated)

# Save the updated DataFrame to a new CSV file
#df.to_csv(updated_file_address, index=False)
data_updated.write.format("csv").mode("overwrite").option("header", "true").save(updated_file_address)

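The seven `expr("element_at(array(...), ...)")` strings above differ only in the list they draw from, so a small helper could build them. A sketch; `random_choice_sql` is hypothetical, and it deliberately rejects values containing single quotes instead of guessing at escaping rules.

```python
def random_choice_sql(values):
    """Build the Spark SQL snippet used above to pick one value per row at random.

    rand() is uniform on [0, 1), so cast(rand() * n + 1 as int) lands on 1..n,
    which matches element_at's 1-based indexing.
    """
    if any("'" in v for v in values):
        # No escaping is done here, so refuse values that would break the SQL literal
        raise ValueError("values must not contain single quotes")
    quoted = ", ".join(f"'{v}'" for v in values)
    return f"element_at(array({quoted}), cast(rand() * {len(values)} + 1 as int))"


print(random_choice_sql(["Rock", "Pop"]))
# element_at(array('Rock', 'Pop'), cast(rand() * 2 + 1 as int))
```

In the cell above, each `.otherwise(expr(...))` could then become `.otherwise(expr(random_choice_sql(names)))`, and likewise for the other lists.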

In [12]:
#Rollback move demo
move_files(processed_directory, input_directory)



True

In [None]:
# Demo 2
# We know our environment. Let's transform data!

from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import lit, when
import pandas as pd
import resource #Used for assessing resources used
import random
from pyspark.context import SparkContext

# Function to generate random rows for the DataFrame (used by the commented-out union below)
def generate_data(num_rows):
    names = ['Alice', 'Bob', 'Charlie']
    ages = [25, 30, 35]
    occupations = ['Engineer', 'Analyst', 'Manager']

    # Build a list of (Name, Age, Occupation) tuples; createDataFrame expects rows, not a dict of columns
    rows = [(random.choice(names), random.choice(ages), random.choice(occupations))
            for _ in range(num_rows)]

    return spark.createDataFrame(rows, ['Name', 'Age', 'Occupation'])


# Example: Create a simple DataFrame with sample data
# Here, we create a DataFrame named df using the createDataFrame method provided by the SparkSession.
# A DataFrame is a distributed collection of data organized into named columns.
# We pass a list of tuples containing the sample data and specify the column names 'Name', 'Age', and 'Occupation'.
data = [('Alice', 25, 'Manager'), ('Bob', 30, 'Analyst'), ('Charlie', 35, 'Engineer')]
df = spark.createDataFrame(data, ['Name', 'Age', 'Occupation'])

# new_data = generate_data(100000)
# df = df.union(new_data)

# Print the initial dataframe to the console in a tabular format
print("Initial DataFrame:")
df.show()

# Display the column names and data types
print("Data Types:")
df.printSchema()

# Step 1: Profile the data - Calculate the average age
# The selectExpr() method is used to select and compute an expression on the DataFrame. 
# In this case, we calculate the average of the 'Age' column and alias it as 'avg_age'. 
# The collect() method is then used to retrieve the result as a list, and we access the average age value using indexing.
average_age = df.selectExpr('avg(Age) as avg_age').collect()[0]['avg_age']
print("Average Age:", average_age)

# Step 2: Transform the data - Add a new column 'Category' using a case statement
# The when() function defines the conditions and corresponding values for the 'Category' column:
# if 'Age' is less than 30, the value is 'Young'; if 'Age' is 30 or more, the value is 'Adult'.
# Otherwise (for example, when 'Age' is null), the value is 'Unknown'. Finally, we print the DataFrame with the new column.
df = df.withColumn('Category', when(df['Age'] < 30, 'Young')
                               .when(df['Age'] >= 30, 'Adult')
                               .otherwise('Unknown'))
print("\nDataFrame with Category column:")
df.show()

# Step 3: Profile the data - Count the number of records by occupation
# Here, we perform a profiling operation on the DataFrame by counting the number of records for each occupation.
# The groupBy() method is used to group the DataFrame by the 'Occupation' column, and the count() method is applied to calculate the count for each group. 
# The result is stored in the occupation_counts DataFrame, and we print it to display the occupation counts.
occupation_counts = df.groupBy('Occupation').count()
print("\nOccupation Counts:")
occupation_counts.show()

# Step 4: Transform the data - Filter records based on age
# In this step, we filter the DataFrame to include only the records where the age is greater than 30. 
# The filter() method is used to apply the filtering condition, and the resulting DataFrame is stored in filtered_df. 
filtered_df = df.filter(df['Age'] > 30)
print("\nFiltered DataFrame:")
filtered_df.show()

# Step 5: Profile the data - Describe the DataFrame
# The describe() method computes summary statistics (count, mean, standard deviation, minimum, and maximum) for the numeric and string columns.
# The show() method then displays those statistics.
print("\nDataFrame Description:")
df.describe().show()

# Step 6: Display the DataFrame
# Finally, we display the DataFrame with the new Category column.
# display() is provided by the notebook environment (here, Fabric), not by PySpark itself. It renders the DataFrame
# as an interactive table and supports charts, which the plain-text show() method does not.
display(df)

# Bonus Content: Create a second dataframe with explicit schema
# In this part, we create a second DataFrame named df_explicit_schema with an explicit schema. 
# The schema is defined using the StructType and StructField classes from the pyspark.sql.types module. 
# The schema specifies the column names, data types, and nullability constraints. 
# We then create the DataFrame using the createDataFrame() method, passing the sample data and the explicit schema. Finally, we print the second DataFrame with the explicit schema.
schema = StructType([
    StructField('Name', StringType(), nullable=False),
    StructField('Age', IntegerType(), nullable=False),
    StructField('Occupation', StringType(), nullable=False)
])
df_explicit_schema = spark.createDataFrame(data, schema)

print("\nSecond DataFrame with Explicit Schema:")
df_explicit_schema.show()

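# Define the target directory for the Delta table (left blank here; set it to a path in your lakehouse before running)
target_directory = ""

# Save the DataFrame to the Delta table with overwrite mode (skipped while no target is set)
if target_directory:
    df.write.format("delta").mode("overwrite").save(target_directory)
else:
    print("Set target_directory before saving the Delta table.")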
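To check the Spark output by hand, here is the same logic on the three sample rows in plain Python. It is a sketch that mirrors, rather than replaces, the Spark code; `category` is a hypothetical helper. In Spark, comparing a null Age returns null, which `when()` treats as false, so only rows with a missing Age reach `otherwise()`; the `None` check models that.

```python
from collections import Counter


def category(age):
    """Mirror of the when/otherwise chain: a missing age falls through to 'Unknown'."""
    if age is None:
        return "Unknown"
    return "Young" if age < 30 else "Adult"


rows = [("Alice", 25, "Manager"), ("Bob", 30, "Analyst"), ("Charlie", 35, "Engineer")]

categories = [category(age) for _, age, _ in rows]
average_age = sum(age for _, age, _ in rows) / len(rows)  # (25 + 30 + 35) / 3
occupation_counts = Counter(occupation for _, _, occupation in rows)

print(categories)   # ['Young', 'Adult', 'Adult']
print(average_age)  # 30.0
print(occupation_counts)
```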



In [11]:
#What if I want details about my environment?

#From Bradley Schacht, 'Gathering useful notebook and environment details at runtime'

#import trident

default_lakehouse_id    = 'No default lakehouse' if spark.conf.get("trident.lakehouse.id") == '' else spark.conf.get("trident.lakehouse.id")
default_lakehouse_name  = 'No default lakehouse' if spark.conf.get("trident.lakehouse.name") == '' else spark.conf.get("trident.lakehouse.name")
notebook_item_id        = spark.conf.get("trident.artifact.id")
#notebook_item_name      = spark.conf.get("trident.artifact.name")
pool_executor_cores     = spark.sparkContext.getConf().get("spark.executor.cores")
pool_executor_memory    = spark.sparkContext.getConf().get("spark.executor.memory")
pool_min_executors      = spark.sparkContext.getConf().get("spark.dynamicAllocation.minExecutors")
pool_max_executors      = spark.sparkContext.getConf().get("spark.dynamicAllocation.maxExecutors")
# Counts the block managers, which includes the driver as well as the executors
pool_number_of_nodes    = len(str(spark.sparkContext._jsc.sc().getExecutorMemoryStatus().keys()).replace("Set(","").replace(")","").split(", "))
spark_app_name          = spark.sparkContext.getConf().get("spark.app.name")[::-1].split("_",1)[0][::-1]
workspace_id            = spark.conf.get("trident.artifact.workspace.id")
#workspace_name          = spark.conf.get("trident.artifact.workspace.name")

print(f'default_lakehouse_id:   {default_lakehouse_id}')
print(f'default_lakehouse_name: {default_lakehouse_name}')
print(f'notebook_item_id:       {notebook_item_id}')
#print(f'notebook_item_name:     {notebook_item_name}')
print(f'spark_app_name:         {spark_app_name}')
print(f'pool_executor_cores:    {pool_executor_cores}')
print(f'pool_executor_memory:   {pool_executor_memory}')
print(f'pool_min_executors:     {pool_min_executors}')
print(f'pool_max_executors:     {pool_max_executors}')
print(f'pool_number_of_nodes:   {pool_number_of_nodes}')
print(f'workspace_id:           {workspace_id}')
#print(f'workspace_name:         {workspace_name}')

#Did you attach a default lakehouse?

#What about accessing Lakehouses outside of the default? Copy Path!



default_lakehouse_id:   75de1ae4-93f7-4582-9799-ca34d6429709
default_lakehouse_name: Databard_Demo
notebook_item_id:       2dd1eb8c-34cd-491f-9575-850a064f397f
spark_app_name:         0b32b888-659c-49d6-8c27-e47d949d5074
pool_executor_cores:    8
pool_executor_memory:   56g
pool_min_executors:     1
pool_max_executors:     1
pool_number_of_nodes:   2
workspace_id:           e35b5629-0c71-450d-b52f-4e2ecc6836c7
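Two of the lines in the cell above are dense string manipulation. A plain-Python sketch of what they compute, on made-up sample strings; the `Set(...)` format assumes the string form of the Scala collection returned by `getExecutorMemoryStatus().keys()`, and both helper names are hypothetical.

```python
def app_suffix(app_name: str) -> str:
    """Text after the last underscore; same result as app_name[::-1].split("_", 1)[0][::-1]."""
    return app_name.rsplit("_", 1)[-1]


def count_set_entries(set_repr: str) -> int:
    """Count the comma-separated entries inside a 'Set(a, b, ...)' string, as the cell above does."""
    inner = set_repr.replace("Set(", "").replace(")", "")
    return len(inner.split(", "))


# Made-up sample strings, for illustration only
print(app_suffix("example_app_0b32b888-659c"))                   # 0b32b888-659c
print(count_set_entries("Set(10.0.0.4:37891, 10.0.0.5:41235)"))  # 2
```

One quirk carries over from the original: an empty `Set()` still counts as 1, because splitting an empty string yields one empty item.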


In [7]:
%%pyspark
#Word to the wise! If you are using Fabric lakehouse schemas (in preview), there is a known bug that returns a 'forbidden' error when interacting with the lakehouse.
#If you run into it, the line below works around it. This should only be an issue until schemas go GA.
!echo "spark.trident.pbiApiVersion=v1">>/home/trusted-service-user/.trident-context


## Tips for Formatting Text in Markdown

Markdown provides several options for formatting text in a Jupyter Notebook markdown cell. These options include:

### 1. Headers

Headers are used to create different levels of headings. There are six levels of headers available in markdown, denoted by the number of hash symbols (#) used before the text. For example:

```
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6
```

### 2. Emphasis

To emphasize text, you can use asterisks (*) or underscores (_) around the text. Here are some examples:

```
*Italic text*
_Italic text_

**Bold text**
__Bold text__

***Bold and italic text***
___Bold and italic text___
```

### 3. Lists

Markdown supports both ordered and unordered lists. For unordered lists, you can use asterisks (*), plus signs (+), or hyphens (-) as bullet points. For example:

```
- Item 1
- Item 2
- Item 3

* Item 1
* Item 2
* Item 3

+ Item 1
+ Item 2
+ Item 3
```

For ordered lists, you can use numbers followed by periods. For example:

```
1. Item 1
2. Item 2
3. Item 3
```

### 4. Links

To create a hyperlink, you can use square brackets [] to enclose the link text, followed by parentheses () containing the URL. For example:

```
[GitHub](https://github.com)
```

### 5. Images

To display an image, you can use an exclamation mark (!), followed by square brackets [] containing the alt text, and parentheses () containing the image URL. For example:

```
![Alt Text](https://example.com/image.jpg)
```

### 6. Code Blocks

To display code blocks, you can use triple backticks (```) before and after the code. You can also specify the programming language for syntax highlighting. For example:

````
```python
print("Hello, World!")
```
````

### 7. Horizontal Lines

To insert a horizontal line, you can use three or more hyphens (-), asterisks (*), or underscores (_). For example:

```
---
```

These are just some of the formatting options available in markdown. You can explore more advanced features and syntax by referring to the markdown documentation.

Feel free to experiment with these formatting options in your Jupyter Notebook markdown cells to create visually appealing and well-structured documentation for your workflow.
