# Introducing the Markdown Reader in Spark NLP
This notebook showcases the newly added `sparknlp.read().md()` method in Spark NLP, which parses Markdown content from local files and from URLs into a Spark DataFrame.

**Key Features:**
- Parse Markdown from local directories and from URLs.
- Use a single method for both local and remote sources, with the results returned as a Spark DataFrame.

## Setup and Initialization
Let's keep in mind a few things before we start 😊

Support for reading Markdown files was introduced in Spark NLP 6.0.5. Please make sure you have upgraded to the latest Spark NLP release.

- Let's install and set up Spark NLP in Google Colab.
- This part is easy with our setup script.

In [None]:
! wget -q http://setup.johnsnowlabs.com/colab.sh -O - | bash
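Because the Markdown reader requires Spark NLP 6.0.5 or later, it can help to check the installed release before calling `md()`. The sketch below is a minimal version guard under that assumption; `parse_version` and `supports_md_reader` are hypothetical helpers, not part of Spark NLP. You would pass them the string returned by `sparknlp.version()`.

```python
import re

# Release that introduced sparknlp.read().md(), per this notebook
MIN_MD_READER_VERSION = (6, 0, 5)


def parse_version(version: str) -> tuple:
    """Turn '6.0.5' (or '6.1.0rc1') into a tuple of ints such as (6, 0, 5)."""
    parts = []
    for piece in version.split(".")[:3]:
        # Keep only the leading digits, so pre-release suffixes are ignored
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    # Pad short versions such as '6' or '6.1' with zeros
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_md_reader(version: str) -> bool:
    """Return True if `version` is at least the minimum release for md()."""
    return parse_version(version) >= MIN_MD_READER_VERSION
```

For example, `supports_md_reader(sparknlp.version())` should be `True` after a successful upgrade; comparing integer tuples avoids the pitfalls of comparing version strings lexically.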

For the local files example, we will download a Markdown file from the Spark NLP GitHub repo:

In [None]:
!mkdir -p md-files
!wget https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/md/simple.md -P md-files

--2025-07-02 14:27:11--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/feature/SPARKNLP-1213-Adding-MarkdownReader/src/test/resources/reader/md/simple.md
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.110.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 181 [text/plain]
Saving to: ‘md-files/simple.md’


2025-07-02 14:27:11 (2.39 MB/s) - ‘md-files/simple.md’ saved [181/181]
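If you are working outside Colab, where `!` shell commands may not be available, the same setup can be done in plain Python. The sketch below uses only the standard library; `download_to_dir` is a hypothetical helper name, not part of Spark NLP. Creating the directory with `exist_ok=True` makes the step safe to re-run, which a bare `mkdir` is not.

```python
from pathlib import Path
from urllib.request import urlretrieve


def download_to_dir(url: str, target_dir: str) -> Path:
    """Download `url` into `target_dir`, creating the directory if needed.

    Re-running is harmless: an existing directory is reused and the
    file is overwritten.
    """
    directory = Path(target_dir)
    directory.mkdir(parents=True, exist_ok=True)
    # Use the last path segment of the URL as the local file name
    filename = url.rstrip("/").rsplit("/", 1)[-1]
    destination = directory / filename
    urlretrieve(url, destination)
    return destination
```

For this notebook, the call would be `download_to_dir("https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/src/test/resources/reader/md/simple.md", "md-files")`.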



## Parsing Markdown from Local Files
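Pass a directory path to the `md()` method to parse the Markdown files it contains. Each file becomes one row, with its location in the `path` column and its parsed elements in the `md` column.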

In [None]:
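import sparknlp
# Start (or reuse) a Spark session with Spark NLP loaded
spark = sparknlp.start()
# Parse the Markdown files under ./md-files into a DataFrame
md_df = sparknlp.read().md("./md-files")
md_df.show()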

+--------------------+--------------------+
|                path|                  md|
+--------------------+--------------------+
|file:/content/md-...|[{Title, Introduc...|
+--------------------+--------------------+



In [None]:
md_df.printSchema()

root
 |-- path: string (nullable = true)
 |-- md: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- elementType: string (nullable = true)
 |    |    |-- content: string (nullable = true)
 |    |    |-- metadata: map (nullable = true)
 |    |    |    |-- key: string
 |    |    |    |-- value: string (valueContainsNull = true)
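The schema shows that each row's `md` value is an array of structs with `elementType`, `content` and `metadata` fields. Once collected to the driver, these structs arrive as PySpark `Row` objects, so plain Python is enough for a quick look at a small file. The sketch below assumes you only want the text grouped by element type (for example, the `Title` element visible in the `show()` output); `group_contents_by_type` is a hypothetical helper, not part of Spark NLP.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def group_contents_by_type(elements: Iterable[Mapping]) -> Dict[str, List[str]]:
    """Collect the `content` of each parsed element under its `elementType`.

    Each element is expected to be a mapping with the fields shown in the
    schema above, e.g. the result of calling `.asDict()` on a collected Row.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for element in elements:
        grouped[element["elementType"]].append(element["content"])
    # Return a plain dict so missing keys raise instead of silently appearing
    return dict(grouped)
```

For example, `group_contents_by_type(e.asDict() for e in md_df.first().md)` builds a dictionary keyed by element type for the first file. For large inputs, it is usually better to stay in Spark, for instance by flattening the array with `pyspark.sql.functions.explode`, rather than collecting rows to the driver.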

