Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs.json
Original file line number Diff line number Diff line change
Expand Up @@ -262,6 +262,7 @@
{
"group": "Tool demos",
"pages": [
"examplecode/tools/jq",
"examplecode/tools/firecrawl",
"examplecode/tools/langflow",
"examplecode/tools/vectorshift",
Expand Down
178 changes: 178 additions & 0 deletions examplecode/tools/jq.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,178 @@
---
title: Query JSON with jq
---

[jq](https://jqlang.org/) is a lightweight and flexible command-line JSON processor. You can use `jq` on a local development machine to
slice, filter, map, and transform the JSON data that Unstructured outputs in much the same ways that tools such as `sed`, `awk`, and `grep` let you work with text.

To get `jq`, see the [Download jq](https://jqlang.org/download/) page.

<Info>
`jq` is not owned or supported by Unstructured. For questions about `jq`and
feature requests for future versions of `jq`, see the [Issues](https://github.com/jqlang/jq/issues) tab of the
`jq` repository in GitHub.
</Info>

The following command examples use `jq` with the
[spring-weather.html.json](https://github.com/Unstructured-IO/unstructured/blob/main/example-docs/spring-weather.html.json) file in the
**example-docs** directory within the **Unstructured-IO/unstructured** repository in GitHub.

Find the element with a `type` of `Address`, and print the element's `text` field's value.

```bash
jq '.[]
| select(.type == "Address")
| .text' spring-weather.html.json
```

The output is:

```bash
"Silver Spring, MD 20910"
```

Find all elements with a `type` of `Title`, and print the `text` field of each found element as a string in a JSON array.

```bash
jq '[
.[]
| select(.type == "Title")
| .text]' spring-weather.html.json
```

The output is:

```bash
[
"News Around NOAA",
"National Program",
"Are You Weather-Ready for the Spring?",
"Weather.gov >",
"News Around NOAA > Are You Weather-Ready for the Spring?",
"US Dept of Commerce",
"National Oceanic and Atmospheric Administration",
"National Weather Service",
"News Around NOAA",
"1325 East West Highway",
"Comments? Questions? Please Contact Us.",
"Disclaimer",
"Information Quality",
"Help",
"Glossary",
"Privacy Policy",
"Freedom of Information Act (FOIA)",
"About Us",
"Career Opportunities"
]
```

Find all elements with a `type` of `Title`. Of these, find the ones that have a `text` field that contains the phrase `Contact Us`, and print the contents of each found element's `metadata.link_urls` field.

```bash
jq '.[]
| select(.type == "Title")
| select(.text
| contains("Contact Us"))
| .metadata.link_urls' spring-weather.html.json
```

The output is:

```bash
[
"https://www.weather.gov/news/contact"
]
```

Find all elements with a `type` of `ListItem`. Of these, find the ones that have a `text` field that contains the phrase `Weather Safety`.
For each item in `metadata.link_texts`, print the item's value as the key, followed by the matching item in
`metadata.link_urls` as the value. Trim any leading and trailing whitespace from all values. Wrap the output in a JSON array.

```bash
jq '[
.[]
| select(.type == "ListItem")
| select(.text | test("Weather Safety"; "i"))
| [.metadata.link_texts, .metadata.link_urls]
| transpose[]
| {
(.[0] | gsub("^\\s+|\\s+$"; "")) : (.[1] | gsub("^\\s+|\\s+$"; ""))
}
]' spring-weather.html.json
```

The output is:

```bash
[
{
"Weather Safety": "http://www.weather.gov/safetycampaign"
},
{
"Air Quality": "https://www.weather.gov/safety/airquality"
},
{
"Beach Hazards": "https://www.weather.gov/safety/beachhazards"
},
{
"Cold": "https://www.weather.gov/safety/cold"
},
{
"Cold Water": "https://www.weather.gov/safety/coldwater"
},
{
"Drought": "https://www.weather.gov/safety/drought"
},
{
"Floods": "https://www.weather.gov/safety/flood"
},
{
"Fog": "https://www.weather.gov/safety/fog"
},
{
"Heat": "https://www.weather.gov/safety/heat"
},
{
"Hurricanes": "https://www.weather.gov/safety/hurricane"
},
{
"Lightning Safety": "https://www.weather.gov/safety/lightning"
},
{
"Rip Currents": "https://www.weather.gov/safety/ripcurrent"
},
{
"Safe Boating": "https://www.weather.gov/safety/safeboating"
},
{
"Space Weather": "https://www.weather.gov/safety/space"
},
{
"Sun (Ultraviolet Radiation)": "https://www.weather.gov/safety/heat-uv"
},
{
"Thunderstorms & Tornadoes": "https://www.weather.gov/safety/thunderstorm"
},
{
"Tornado": "https://www.weather.gov/safety/tornado"
},
{
"Tsunami": "https://www.weather.gov/safety/tsunami"
},
{
"Wildfire": "https://www.weather.gov/safety/wildfire"
},
{
"Wind": "https://www.weather.gov/safety/wind"
},
{
"Winter": "https://www.weather.gov/safety/winter"
}
]
```

## Additional resources

- [jq Tutorial](https://jqlang.org/tutorial/)
- [jq Manual](https://jqlang.org/manual/)
- [jq Playground](https://play.jqlang.org/)